Getting Started with Hermes Agent

Lesson 4 of 6

Choosing a Memory Provider

Estimated time: 10 minutes

Memory is the single biggest architectural difference between Hermes and most earlier agent frameworks. Hermes doesn't treat memory as passive storage — it runs an active, tiered system that summarizes, decays, and reasons about your interaction history between turns.

This lesson explains how Hermes memory works out of the box, what the five memory providers do differently, and how to pick one for your workflow.

The Baseline: Flat Files + FTS5

On a fresh install, Hermes memory has three pieces:

  1. ~/.hermes/MEMORY.md — durable facts the agent extracts about you and your projects. Plaintext, editable by hand.
  2. ~/.hermes/USER.md — your explicit profile (role, preferences, frequently-used tools). You can edit this directly with /memory in the TUI.
  3. ~/.hermes/sessions.db — every message and tool call, indexed with SQLite FTS5 full-text search.

Instead of injecting raw transcripts into the context window on every turn, Hermes runs a lightweight background LLM that continuously summarizes the session store. When a new query needs historical context, the background model extracts only the relevant facts, compresses them, and passes those to the primary reasoning model.

The practical effect: the active context window stays tight. You can have a three-month-long conversation with Hermes and its per-turn token cost doesn't grow in proportion to the transcript length.
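To make the retrieval half of this concrete, here is a minimal sketch of the kind of FTS5 lookup the background model could run against sessions.db. The table schema is an assumption for illustration; only the FTS5 mechanics (virtual tables, `MATCH`, `bm25()`) are standard SQLite.

```python
import sqlite3

# Build a toy FTS5-indexed message store (stands in for ~/.hermes/sessions.db).
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE messages USING fts5(role, content)")
con.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("user", "deploy the staging cluster with terraform"),
        ("assistant", "terraform apply completed on staging"),
        ("user", "what did we decide about the redis cache?"),
    ],
)

# bm25() ranks full-text matches; lower scores rank better in SQLite FTS5.
rows = con.execute(
    "SELECT role, content FROM messages "
    "WHERE messages MATCH ? ORDER BY bm25(messages) LIMIT 5",
    ("terraform",),
).fetchall()
for role, content in rows:
    print(role, "->", content)
```

Only the ranked hits, not the whole transcript, would then be summarized and handed to the primary model.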

The Five Memory Providers

The baseline works, but for more sophisticated workflows you can swap in a memory provider. Hermes has a pluggable system — set up one provider, and Hermes routes all memory reads and writes through it.

| Provider | Storage | Unique capability | Cost |
| --- | --- | --- | --- |
| Hindsight | Local SQLite or cloud | Extracts facts + named entities into a structured knowledge graph | Free (local) |
| OpenViking | Self-hosted | Tiered L0 / L1 / L2 loading — 80-90% token savings | Free |
| Mem0 | Cloud | Server-side LLM extraction, dual memory scope (user vs. system) | Freemium |
| Honcho | Cloud | Asynchronous Dialectic User Modeling + "Dreaming" background agent | Freemium |
| RetainDB | Cloud | Hybrid search: vector clustering + BM25 + neural reranking | Paid |

Swapping a Provider

Run the memory wizard:

uv run hermes memory setup

It lists the five providers, asks for credentials (for cloud options), and writes the choice to ~/.hermes/config.toml under [memory]. All your existing data in sessions.db is kept — the provider is added as a parallel layer, not a replacement.
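The resulting `[memory]` section might look something like the following. This is illustrative only: the exact keys the wizard writes may differ from what's shown here.

```toml
# ~/.hermes/config.toml (illustrative fragment, not the wizard's exact output)
[memory]
provider = "hindsight"

[memory.hindsight]
storage = "local"   # or "cloud", if you supplied credentials
```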

To switch back at any time:

uv run hermes memory use default

How to Pick

The right provider depends on what kind of memory you actually need.

Start with the baseline

Most users don't need a provider at all. The FTS5 + background summarization baseline handles months of history on a laptop for free.

Stay on the baseline if: you work alone, your workflows are varied, you don't have a specific memory requirement you're trying to solve.

Hindsight — when you want a knowledge graph

Hindsight extracts discrete facts ("the Supabase project is called moltiversity", "Zheng prefers 2-space indentation") and named entities ("Supabase", "Moltiversity", "Zheng") into a structured graph. Queries like "what do you know about X?" resolve via graph traversal, not semantic search.

Pick Hindsight when: your memory is relational — people, projects, companies, libraries with defined relationships between them.
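To see why graph traversal answers "what do you know about X?" differently than semantic search, here is a toy sketch in the spirit of Hindsight's structured store. The data model and function names are illustrative, not Hindsight's actual API.

```python
from collections import defaultdict

# entity -> list of (relation, other entity or fact)
graph = defaultdict(list)

def add_fact(subject, relation, obj):
    # Store the edge in both directions so traversal works from either end.
    graph[subject].append((relation, obj))
    graph[obj].append((f"inverse:{relation}", subject))

add_fact("Zheng", "prefers", "2-space indentation")
add_fact("Zheng", "works_on", "Moltiversity")
add_fact("Moltiversity", "hosted_on", "Supabase")

def what_do_you_know(entity, depth=2):
    """Answer 'what do you know about X?' by breadth-first graph traversal."""
    seen, frontier, facts = {entity}, [entity], []
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for relation, other in graph[node]:
                facts.append((node, relation, other))
                if other not in seen:
                    seen.add(other)
                    nxt.append(other)
        frontier = nxt
    return facts

for fact in what_do_you_know("Moltiversity", depth=1):
    print(fact)
```

A query about "Moltiversity" walks outward to related entities ("Supabase", "Zheng") even if those names never co-occur in a retrievable sentence, which is exactly where pure semantic search tends to miss.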

OpenViking — when tokens matter most

OpenViking implements tiered loading. L0 is the always-loaded core (profile + active goals). L1 is mid-importance context that loads on weak signals. L2 is cold storage that only loads on explicit demand. The documented savings: 80-90% fewer input tokens.

Pick OpenViking when: you run Hermes 24/7 on commercial APIs and your token bill is the thing you'd change first.
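The tiered-loading idea can be sketched in a few lines: L0 is always in the prompt, L1 is gated on a relevance signal, and L2 stays out unless explicitly requested. The threshold and the crude lexical-overlap signal below are assumptions for illustration, not OpenViking internals.

```python
MEMORY = {
    "L0": ["profile: senior TS engineer", "active goal: ship v2 auth"],
    "L1": ["project notes: moltiversity uses Supabase"],
    "L2": ["archived: full Q3 planning transcript"],
}

def signal_strength(query, item):
    # Crude lexical overlap as a stand-in for a real relevance signal.
    q, i = set(query.lower().split()), set(item.lower().split())
    return len(q & i) / max(len(q), 1)

def load_context(query, explicit_l2=False):
    ctx = list(MEMORY["L0"])                      # L0: always loaded
    ctx += [m for m in MEMORY["L1"]
            if signal_strength(query, m) > 0.1]   # L1: weak-signal gate
    if explicit_l2:                               # L2: explicit demand only
        ctx += MEMORY["L2"]
    return ctx

print(load_context("which database does moltiversity use?"))
```

The token savings come from the default path: most turns ship only L0 plus whatever L1 items clear the gate, leaving cold storage untouched.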

Honcho — when you want the agent to model you

Honcho is the most architecturally ambitious of the providers. It runs an asynchronous Dreaming Agent during idle periods that crawls your interaction history, tests behavioral hypotheses, and writes deductive / inductive / abductive conclusions about your preferences.

The agent invokes honcho_context, honcho_profile, honcho_search, and honcho_conclude tools during sessions to pull those conclusions into the prompt. If you consistently ask for tabular outputs, Honcho learns that on its own — no explicit instruction needed.
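As a rough sketch of how a turn might pull those conclusions into the prompt: the four tool names come from this lesson, but the signatures and return shapes below are hypothetical stand-ins, not Honcho's real interface.

```python
# Hypothetical stand-ins for two of the Honcho tools named above.
def honcho_profile(user_id):
    return ["prefers tabular outputs", "works in TypeScript"]

def honcho_search(user_id, query):
    return [c for c in honcho_profile(user_id) if query.lower() in c.lower()]

def build_prompt(user_id, user_message):
    # Prefer targeted conclusions; fall back to the general profile.
    conclusions = honcho_search(user_id, "tabular") or honcho_profile(user_id)
    header = "\n".join(f"- {c}" for c in conclusions)
    return f"Known about user:\n{header}\n\nUser: {user_message}"

print(build_prompt("u1", "summarize the deploy logs"))
```

The point is the direction of flow: conclusions written asynchronously by the Dreaming Agent arrive in the prompt at session time, without the user ever stating them.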

Honcho shifts the paradigm from memory (recalling what was said) to cognitive modeling (understanding who the user is and how they operate). On the LongMem S benchmark, Honcho runs at 16% token efficiency versus baseline semantic retrieval — cutting a $115 input-token run down to $18.40. At BEAM 10M scale, Honcho maintains state-of-the-art recall at 0.5% token efficiency.

Pick Honcho when: you're building a long-running personal assistant or pair-programmer that should feel like it actually knows you after a month.

Mem0 — when you want managed cloud without the Honcho depth

Mem0 does server-side LLM extraction with a simple dual scope — user-level memory (your preferences) and system-level memory (project facts). It's the easiest cloud upgrade.

Pick Mem0 when: you want cloud memory without running infra, and you don't need Honcho's active modeling.

RetainDB — when search quality is the bottleneck

RetainDB blends three retrieval strategies: vector clustering, BM25 lexical search, and neural reranking. It's the densest retrieval layer in the lineup.

Pick RetainDB when: your bottleneck is finding the right fragment in a very large corpus, and you're willing to pay for managed infrastructure.
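A toy version of score fusion shows why blending helps: lexical search catches exact terms, vector similarity catches paraphrase, and a reranker would then reorder the short list. The fusion weight and both scoring functions below are illustrative, not RetainDB's.

```python
import math

docs = [
    "supabase row level security policies",
    "tailwind v4 migration notes",
    "redis cache eviction strategy",
]

def lexical_score(query, doc):
    # BM25-style exact-term matching, reduced to word overlap.
    q, d = set(query.split()), set(doc.split())
    return len(q & d)

def vector_score(query, doc):
    # Stand-in for embedding cosine similarity: char-bigram overlap.
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    q, d = bigrams(query), bigrams(doc)
    return len(q & d) / math.sqrt(len(q) * len(d))

def hybrid_search(query, k=2, alpha=0.5):
    scored = [(alpha * lexical_score(query, d)
               + (1 - alpha) * vector_score(query, d), d) for d in docs]
    # A real system would pass these top-k hits through a neural reranker.
    return [d for _, d in sorted(scored, reverse=True)[:k]]

print(hybrid_search("redis eviction"))
```

Either signal alone misfires on some queries; the fused score degrades gracefully when one of them does.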

Editing Memory Directly

Whatever provider you pick, you can always hand-edit MEMORY.md and USER.md:

uv run hermes memory edit

This opens both files in $EDITOR. Lines you add here are treated as ground truth by every provider.

Good hand-written facts look like:

- I'm a senior TypeScript/Next.js engineer.
- Our codebase uses Supabase + Tailwind v4; never suggest CSS-in-JS.
- When I ask for "a quick script", I mean a Bash one-liner, not a .sh file.

Bad ones (that will bloat the baseline without adding signal):

- We had a conversation on Tuesday about Redis.
- I asked about X last week and you said Y.

Let the summarizer capture that kind of history for you.

What's Next

In the next lesson you'll learn how Hermes thinks about skills — the Model Context Protocol, the agentskills.io open spec, and the self-authoring loop that makes Hermes faster at its job the more you use it.

# Installing Hermes Agent — Practical Guide Hermes is a Python-based AI agent runtime that uses `uv` for dependency management and stores everything in `~/.hermes/`. Here's the complete install flow....