AI Agent Memory Providers: Comprehensive Report

Prepared: April 04, 2026
For: Hermes Agent (Nous Research / OpenClaw)
Providers Analyzed: 7

1. Executive Summary & Rankings

AI agents are fundamentally stateless — they don’t remember anything between sessions unless given that capacity. Memory providers solve this by extracting, storing, and retrieving relevant context from past interactions. They are one of the highest-leverage upgrades you can give an agent.

Below is the ranking by popularity (GitHub stars, downloads, community activity, and market presence as of April 2026), followed by a deep dive into each.

#1  Mem0 · 51.9k GitHub stars
#2  OpenViking · 20.9k GitHub stars
#3  ByteRover · ~3.7k GitHub stars
#4  Hindsight · Vectorize.io backed
#5  Honcho · Plastic Labs · $5.4M raised
#6  RetainDB · SOTA on LongMemEval
#7  Holographic · Academic / niche

Key Takeaway

The gap between Mem0 and everything else is massive — over 50k stars versus 20k for the next contender. However, raw popularity does not mean best fit. Your use case (self-hosted, multi-platform agent) aligns most naturally with OpenViking (already running), while Mem0 and Hindsight offer the richest features if you’re willing to adopt a cloud or managed service.

2. Side-by-Side Comparison

| Provider | Type | Storage | License | Cost | Key Differentiator |
|---|---|---|---|---|---|
| Mem0 | Cloud · Open-core | Pluggable (your own vector DB) | Apache 2.0 | Free tier; paid scaling | Largest ecosystem, framework-agnostic |
| OpenViking | Self-hosted | Local filesystem + SQLite | AGPL-3.0 | Free | Filesystem paradigm, tiered L0/L1/L2 |
| ByteRover | Local · SaaS | Local knowledge tree | Proprietary | Free CLI; paid cloud | Pre-compression extraction, coding focus |
| Hindsight | Local · Cloud | PostgreSQL + pgvector | Proprietary | Free tier; paid plans | Biomimetic memory, 3-layer model, knowledge graph |
| Honcho | Self-hosted + Cloud | PostgreSQL + pgvector | AGPL-3.0 | $100 free credits; paid after | User/agent modeling (Peer Paradigm) |
| RetainDB | Managed cloud | Proprietary cloud storage | Apache 2.0 (SDK) | Free 10k ops/mo; paid beyond | SOTA on LongMemEval benchmark |
| Holographic | Local only | SQLite + numpy | Built into Hermes | Free | HRR algebraic vectors, trust scoring |

3. Mem0 (mem0ai/mem0)

51.9k stars · 5.8k forks · 296 contributors

What Is It?

Mem0 (pronounced “mem-zero”) is the most popular AI memory layer in the industry. It acts as a universal, self-improving memory layer that sits between your application and the LLM. When a user sends a message, Mem0 automatically extracts facts, stores them, and retrieves only the most relevant ones on the next turn. It ships in open-source and managed tiers, with SDKs in Python and TypeScript.

Architecture

User Message
     |
     v
+----------------+      Extract Facts       +----------------+
|   Mem0 Core    | ───────────────────────► |   Vector DB    |
|  (Memory API)  | ◄─────────────────────── |  (Chroma,      |
+----------------+  Retrieve Relevant Facts |   Weaviate,    |
     |                                      |   Pinecone,    |
     v                                      |   Qdrant,      |
LLM Response + Injected Context             |   pgvector)    |
                                            +----------------+

How It Works

  1. Add: After each conversation turn, Mem0 passes the message history through an LLM (default: gpt-4.1-nano) to extract atomic facts from the conversation.
  2. Store: These facts are embedded and stored in your chosen vector database. Mem0 de-duplicates, merges, and updates conflicting facts automatically.
  3. Search: At the start of each new turn, Mem0 searches the vector DB for relevant memories based on the current query. Results are optionally reranked.
  4. Inject: Retrieved memories are injected into the system prompt, giving the LLM cross-session context without blowing the context window.
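
A minimal sketch of this add/search/inject loop with the open-source Python SDK (the calls follow Mem0's quickstart, but return shapes vary by SDK version, so treat them as assumptions to verify):

from mem0 import Memory

m = Memory()  # self-hosted mode with the default embedder + vector store

# Add: extract and store atomic facts from a conversation turn
m.add(
    [{"role": "user", "content": "I'm vegetarian and allergic to nuts."}],
    user_id="alice",
)

# Search: retrieve memories relevant to the current query
results = m.search("What should I cook for dinner?", user_id="alice")
memories = [r["memory"] for r in results["results"]]  # shape varies by version

# Inject: hydrate the system prompt before calling the LLM
system_prompt = "Relevant memories:\n" + "\n".join(memories)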

Hermes Agent Integration

pip install mem0ai
# Set env: MEM0_API_KEY=sk-... (for managed service)
# Or: configure your own vector store for self-hosted mode

Pros

  • Largest community and ecosystem (51.9k stars)
  • Pluggable storage — use your own vector DB
  • Python SDK, TypeScript SDK, CLI, and MCP server
  • Native integrations with LangGraph, CrewAI, Vercel AI SDK, Claude Code, Cursor, Codex
  • Multi-level memory: User, Session, Agent scopes
  • AWS-backed credibility
  • Benchmarked: +26% accuracy vs. OpenAI Memory, 90% lower token usage

Cons

  • Managed Cloud API is a black box — you don’t control extraction logic
  • Free tier has usage limits; costs scale with conversations
  • Self-hosted mode requires you to manage a vector DB
  • Fact extraction quality depends on the underlying LLM
  • Can lose nuance when compressing complex user preferences into flat facts
Verdict: The safe default choice. If you want memory that just works with the largest community backing it, pick Mem0. Best for production teams that need integrations across multiple frameworks.

4. OpenViking (volcengine/OpenViking)

20.9k stars · 1.5k forks

What Is It?

OpenViking is an open-source context database built by ByteDance’s Volcengine team (the TikTok parent company’s cloud division). It’s the only provider that uses a filesystem paradigm — everything is organized into a virtual viking:// directory tree that agents can browse, search, and navigate hierarchically. You’re already running it on this machine.

Architecture

Agent
  |
  |   viking://resources/memories/profile/
  |   viking://resources/memories/preferences/
  |   viking://resources/memories/entities/
  |   viking://resources/memories/events/
  |   viking://resources/memories/cases/
  |   viking://resources/memories/patterns/
  v
+---------------------+
|  OpenViking Server  |
|    viking:// tree   |
+---------------------+
      |          |
      v          v
  Local FS   SQLite + embeddings

How It Works

  1. Ingest: Resources (URLs, documents, conversation turns) are added via ov add-resource or programmatically. They are automatically parsed, embedded, and placed into a directory structure.
  2. Tiered Retrieval: Data exists at three levels:
    • L0 (Abstract) — ~100 tokens, quick relevance check
    • L1 (Overview) — ~2,000 tokens, core info for planning
    • L2 (Full) — complete data, loaded on demand
  3. Directory-Guided Search: Instead of flat vector search, OpenViking first identifies the most relevant directory, then drills down progressively. This significantly reduces token waste.
  4. Session End Extraction: After conversations end, the system asynchronously extracts long-term memories (preferences, entities, events) into the appropriate category.
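
The tiered drill-down is easy to picture in a few lines of Python (an illustrative sketch of the L0/L1/L2 idea, not the OpenViking API):

# Illustrative only: progressive L0 -> L1 -> L2 loading to cap token spend.
from dataclasses import dataclass

@dataclass
class Resource:
    path: str         # e.g. viking://resources/memories/preferences/editor
    l0_abstract: str  # ~100 tokens: quick relevance check
    l1_overview: str  # ~2,000 tokens: core info for planning
    l2_full: str      # complete data, loaded on demand

def retrieve(resources, is_relevant, needs_detail):
    # Drill down tier by tier instead of loading everything into context.
    context = []
    for r in resources:
        if not is_relevant(r.l0_abstract):  # cheap L0 screen first
            continue
        context.append(r.l1_overview)       # L1 for planning
        if needs_detail(r.l1_overview):     # L2 only when truly needed
            context.append(r.l2_full)
    return context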

Benchmark Results (LoCoMo10 Dataset)

| Setup | Task Completion Rate | Token Cost (Input) |
|---|---|---|
| OpenClaw (Native Memory) | 35.65% | 24.6M |
| OpenClaw + LanceDB | 44.55% | 51.6M |
| OpenClaw + OpenViking | 52.08% | 4.3M |

Hermes Agent Integration

pip install openviking
# Set memory.provider: openviking in config.yaml
# Server: openviking-server
# Already running at http://localhost:1933

Pros

  • Completely free and fully self-hosted
  • 83-91% reduction in token costs vs. naive approaches (benchmark-backed)
  • Filesystem organization is intuitive for human debugging
  • Tiered retrieval (L0/L1/L2) prevents context window overflow
  • Backed by ByteDance — serious engineering resources
  • Already running in your environment
  • Automatic session-end memory extraction

Cons

  • Must manage your own server infrastructure
  • Requires a VLM and embedding provider setup (config-heavy)
  • Browser tools are broken (GitHub issue #4740)
  • Smaller community than Mem0 (though still substantial)
  • AGPL-3.0 license can be restrictive for commercial use
  • Primarily designed for OpenClaw — Hermes integration is newer
Verdict: The best token-efficiency choice with excellent structured organization. Backed by ByteDance, fully free, and already running for you. Downsides are infrastructure overhead and a broken browser interface.

5. ByteRover (byterover-cli/brv)

~3.7k stars · 368 forks

What Is It?

ByteRover is a CLI-based knowledge tree purpose-built for AI coding agents. Its unique claim is pre-compression extraction — capturing key insights from long conversations before the LLM’s context window compresses or discards them. It also recently launched Cipher, an open-source memory layer specifically for coding IDEs.

How It Works

  1. Extract: During a long coding session, ByteRover runs parallel to the conversation, extracting project-specific knowledge, user preferences, architectural decisions, and patterns.
  2. Structure: These facts are organized into a hierarchical knowledge tree rather than a flat vector store.
  3. Store: The tree is persisted locally in a portable format.
  4. Retrieve: When starting a new session, the agent can load relevant branches from the knowledge tree at context window start.
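
A toy sketch of the knowledge-tree idea (illustrative Python; ByteRover's actual on-disk format is its own and not shown here):

# Illustrative only: a hierarchical knowledge tree instead of a flat fact list.
knowledge_tree = {
    "project": {
        "architecture": ["API gateway fronts three backend services"],
        "decisions": ["chose Postgres over Mongo for transactional writes"],
    },
    "preferences": {
        "style": ["user prefers small, typed functions with docstrings"],
    },
}

def load_branch(tree, path):
    # Load only the branch relevant to the new session, e.g. "project/decisions".
    node = tree
    for key in path.split("/"):
        node = node[key]
    return node

print(load_branch(knowledge_tree, "project/decisions"))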

Hermes Agent Integration

npm install -g byterover-cli  # CLI: brv
# or use the cloud service

Pros

  • Pre-compression extraction is a genuinely novel idea — capture before the window drops it
  • Portable between IDEs (VS Code, Cursor, Claude Code, neovim)
  • CLI-first design fits naturally into terminal-based workflows
  • Cipher extends it to a shared team memory concept
  • Local storage option available — no cloud dependency

Cons

  • Narrowly focused on coding agents — less applicable to general-purpose assistants
  • Smaller community than major alternatives
  • Proprietary license for the core product
  • Newer and less battle-tested than Mem0/OpenViking
  • Hermes Agent integration is thin compared to other providers
Verdict: Niche but interesting. The pre-compression extraction concept is clever and valuable for long coding sessions. Best for developers who switch between IDEs and want persistent coding context. Not a great fit for general-purpose agents.

6. Hindsight (vectorize-io/hindsight)

Vectorize.io backed · SOTA on LongMemEval

What Is It?

Hindsight is an agent memory system built by Vectorize.io that structures memory around how human memory actually works. Instead of just storing facts, it uses a three-layer biomimetic model (World, Experiences, Mental Models) and a knowledge graph to create connections between disparate pieces of information. It has a published arXiv paper and scored state-of-the-art on the LongMemEval benchmark.

Architecture — The Biomimetic Model

┌──────────────────────────────────┐
│          MENTAL MODELS           │  ← Abstracted insights formed by
│        (How things work)         │    reflecting on raw data
├──────────────────────────────────┤
│           EXPERIENCES            │  ← The agent's own past actions
│   (What I did, what happened)    │    and their outcomes
├──────────────────────────────────┤
│              WORLD               │  ← General facts and knowledge
│          (Static facts)          │    about the external world
└──────────────────────────────────┘

How It Works

  1. Retain (Store): An LLM extracts facts from conversations and stores them with context, timestamps, and entity labels into PostgreSQL with pgvector.
  2. Recall (Retrieve): Runs four retrieval strategies in parallel: Semantic (Vector), Keyword (BM25), Graph (Entity/Causal), and Temporal (Time-range). Results are combined via reciprocal rank fusion with cross-encoder reranking.
  3. Reflect (Learn): A unique capability — Hindsight can analyze its stored memories to form new insights, discover patterns, and update its own mental models without new external input.
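
Reciprocal rank fusion itself is only a few lines; a generic sketch (not Hindsight's code) of fusing the four ranked lists, before the cross-encoder reranks the fused head:

# Illustrative reciprocal rank fusion (RRF) over multiple retrieval strategies.
from collections import defaultdict

def rrf(ranked_lists, k=60):
    # Fuse ranked lists of memory ids; k=60 is the conventional RRF constant.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m3", "m1", "m7"]   # vector search
keyword  = ["m1", "m4"]         # BM25
graph    = ["m7", "m1"]         # entity/causal hops
temporal = ["m2", "m3"]         # time-range filter
print(rrf([semantic, keyword, graph, temporal]))  # "m1" wins: it appears in three lists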

Hermes Agent Integration

# Cloud: pip install hindsight-client
#        HINDSIGHT_API_KEY=sk-...
# Local: docker run --rm -it -p 8888:8888 -p 9999:9999
#          ghcr.io/vectorize-io/hindsight:latest

Pros

  • Most sophisticated memory model — genuinely learns, not just retrieves
  • SOTA on the LongMemEval benchmark (88% preference recall)
  • reflect() capability is unique — the system can synthesize new insights
  • 4-way parallel retrieval (vector + BM25 + graph + temporal) is top-tier
  • Available as Docker (fully local) and cloud managed
  • Published academic paper adds credibility
  • Built-in UI (Web dashboard)
  • Integrations for Claude Code, Telegram, Paperclip

Cons

  • Complex setup — requires heavy ML dependencies (Torch, Transformers) for full features
  • Proprietary license (not open source in the traditional sense)
  • Heavier resource requirements than simpler providers
  • Knowledge graph can be overkill for simple memory needs
  • Paid managed tier beyond free limits
  • Confusing repo situation (two different GitHub repos exist)
Verdict: The most intellectually ambitious option. If you want an AI that learns rather than just retrieves, Hindsight is the pick. The reflect() capability and biomimetic model are genuinely ahead of the curve. But it's complex to run and proprietary.

7. Honcho (plastic-labs/honcho)

Plastic Labs · $5.4M pre-seed

What Is It?

Honcho is an open-source memory library and managed service designed around the Peer Paradigm — both humans and AI agents are treated as “Peers,” each with their own evolving identity profile. Rather than just storing facts, Honcho builds psychological models of users and agents, tracking learning styles, communication preferences, and behavioral patterns over time. It is built by Plastic Labs, which has raised $5.4M in pre-seed funding.

How It Works

  1. Ingest: Messages are logged to sessions between Peers (user and agent).
  2. Derive: A background worker (“Deriver”) asynchronously analyzes the conversation to update:
    • Representations — evolving psychological profiles of each Peer
    • Summaries — compressed summaries of sessions
    • Conclusions — dialectic reasoning about what the user actually wants
  3. Retrieve: Natural language queries to a “Peer Oracle” — e.g., alice.chat("What learning styles does the user respond to best?") — to hydrate prompts with deeply personalized context.
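
A sketch of that flow with the Python SDK (modeled on the Peer Paradigm example above; the exact constructor and method signatures are assumptions to check against Honcho's docs):

from honcho import Honcho  # pip install honcho-ai

honcho = Honcho()  # assumes server/API-key configuration via environment

# 1. Ingest: log a session between two Peers
alice = honcho.peer("alice")         # the human
agent = honcho.peer("hermes-agent")  # the agent
session = honcho.session("session-1")
session.add_messages([
    alice.message("Show me the diff first, then explain it."),
    agent.message("Sure, here is the diff..."),
])

# 2. Derive: the background worker updates representations asynchronously.

# 3. Retrieve: ask the Peer Oracle in natural language
answer = alice.chat("What learning styles does the user respond to best?")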

Hermes Agent Integration

pip install honcho-ai  # or: uv add honcho-ai
# Requires: Postgres with pgvector
# Multiple LLM keys: Anthropic, Gemini, Groq (or configure alternatives)

Pros

  • Only provider that builds psychological profiles of users (not just facts)
  • “Peer Oracle” allows natural language querying of user psychology
  • Open-source core with a managed service option
  • Backed by Plastic Labs ($5.4M raised, active blog on memory theory)
  • The “Memory as Reasoning” blog post series is excellent thought leadership
  • Get Context endpoint combines messages + conclusions + summaries into prompt-ready format
  • Multi-provider LLM flexibility for different pipeline stages

Cons

  • Requires multiple LLM providers by default (Anthropic + Gemini + Groq)
  • User modeling may feel intrusive or unnecessary for simple tasks
  • AGPL-3.0 license restricts commercial use without a managed service
  • Smaller community footprint than Mem0/OpenViking
  • Heavily opinionated approach — not a generic memory store
  • $100 free credits run out; managed service pricing kicks in
Verdict: The most opinionated provider. If you want deeply personalized agents that understand how you think, not just what you said, Honcho is unmatched. But it requires significant setup infrastructure and is overkill if you don't need personality modeling.

8. RetainDB

SOTA on LongMemEval · Managed Cloud

What Is It?

RetainDB is a managed cloud memory service that positions itself as the memory layer for AI agents, requiring only two API calls. It scored state-of-the-art on the LongMemEval benchmark (88% preference recall, tied with Hindsight), and its SDK, including a Vercel AI SDK wrapper, enables memory integration in under 30 seconds.

How It Works

  1. Context Query: POST /v1/context/query retrieves relevant memories and injects them into the system prompt.
  2. LLM Generation: Your LLM generates a response with the enriched context.
  3. Learn: POST /v1/learn stores the interaction for future sessions.

RetainDB uses hybrid search (Vector + BM25) and delta compression for storage efficiency. It supports 7 memory types and works with any LLM provider.
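
A minimal sketch of the two-call loop against the endpoints named above (Python; the base URL, auth header, and payload field names are assumptions, not documented API):

import requests

BASE = "https://api.retaindb.example"            # placeholder base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

def answer(user_message, call_llm):
    # 1. Context Query: fetch relevant memories for this turn
    ctx = requests.post(
        f"{BASE}/v1/context/query", headers=HEADERS,
        json={"user_id": "alice", "query": user_message},  # assumed payload shape
    ).json()

    # 2. Your LLM generates a response with the enriched context
    reply = call_llm(system=f"Relevant memories: {ctx}", user=user_message)

    # 3. Learn: store the interaction for future sessions
    requests.post(
        f"{BASE}/v1/learn", headers=HEADERS,
        json={"user_id": "alice", "user": user_message, "assistant": reply},
    )
    return reply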

Pricing

| Tier | Price | Limits |
|---|---|---|
| Free | $0/mo | 10,000 operations/month |
| Paid | $20/mo | Higher limits, priority access |

Pros

  • Extremely easy to integrate — 2 API calls, 30-second setup
  • SOTA benchmark performance on LongMemEval
  • Hybrid search (Vector + BM25) outperforms vector-only retrieval
  • Delta compression saves storage
  • Free tier covers 10k operations/month — generous for personal use
  • Works with any LLM and framework
  • Starter templates for Next.js, Express, Python, LangChain

Cons

  • Cloud-only — no self-hosted option at all
  • $20/mo paid tier is relatively expensive compared to self-hosted alternatives
  • Smallest community and public footprint of any provider
  • Less control over data — privacy implications for sensitive use cases
  • SDK is primarily TypeScript-focused; Python support is thinner
  • Managed service = vendor lock-in risk
Verdict: The quickest to integrate but the least flexible. If you want a managed solution that just works with great benchmarks and don't mind the cloud dependency, RetainDB is straightforward. But it's the most vendor-locked option.

9. Holographic (Built into Hermes Agent)

Academic/niche · Zero external stars

What Is It?

Holographic memory is a fully local, zero-dependency SQLite-based memory plugin built directly into the Hermes Agent source code. It uses Holographic Reduced Representations (HRR), an academic approach from cognitive science (Tony Plate, 1995) that performs symbolic AI on top of real-valued vectors. Memories are stored as compressed holographic vectors that can be combined (superposition) and extracted (unbinding) algebraically.

How It Works

HRR is a mathematical framework for storing composite memories in a single vector:

  1. Binding: Two vectors A and B are combined via circular convolution to produce C = A ⊛ B (analogous to "key-value" association).
  2. Superposition: Multiple bound pairs are added together to compound information: V = A₁⊛B₁ + A₂⊛B₂ + ... + Aₙ⊛Bₙ
  3. Unbinding: To retrieve B given A, we unbind: B ≈ A* ⊛ V (where A* is the inverse/approximate inverse).
  4. Trust Scoring: Each memory is scored based on user feedback (+0.05 for helpful, -0.10 for unhelpful), influencing retrieval ranking.
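
The bind/unbind algebra is easy to verify with numpy (a generic HRR demo, not the Hermes plugin's code):

import numpy as np

rng = np.random.default_rng(0)
D = 1024  # HRR dimension; larger D means less crosstalk between superposed pairs

def bind(a, b):
    # Circular convolution A ⊛ B, computed via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    # The involution a[0], a[-1], a[-2], ...: Plate's approximate inverse A*
    return np.concatenate(([a[0]], a[1:][::-1]))

def random_vec():
    return rng.normal(0.0, 1.0 / np.sqrt(D), D)

# Superposition: two key-value bindings compounded into one trace
k1, v1, k2, v2 = (random_vec() for _ in range(4))
trace = bind(k1, v1) + bind(k2, v2)

# Unbinding with k1 recovers a noisy vector closest to v1
probe = bind(approx_inverse(k1), trace)
print(probe @ v1, probe @ v2)  # first similarity ≈ 1, second ≈ 0

The scaling con below is visible in this demo: the more pairs you superpose into the trace, the smaller the margin between the correct and incorrect similarities becomes.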

Hermes Agent Integration

# No external package needed — it's in the Hermes source tree.
# Enable via: hermes memory setup  # select "holographic"

Pros

  • Zero external dependencies — only needs numpy
  • 100% local — no API keys, no cloud, no network calls
  • Unique algebraic query capabilities:
    • probe — entity-specific recall
    • reason — compositional AND queries across entities
    • contradict — automated conflict detection
  • Trust scoring with feedback-driven weights is novel
  • Privacy-maximal — data never leaves your machine
  • Already shipped with Hermes Agent

Cons

  • Smallest community — essentially just one plugin maintainer
  • HRR is an academic approach with a steep learning curve
  • Scaling to thousands of memories degrades recall quality (interference)
  • Not well-documented outside the Hermes codebase
  • No managed service or cloud backup option
  • May require significant tuning to get good results for non-trivial use cases
  • Limited adoption means limited bug fixes and community support
Verdict: The most technically novel but least practical. HRR is fascinating math — but it's like using a neural-symbolic microscope when most people just need a filing cabinet. Good for privacy-focused, local-only setups with moderate memory volumes.

10. Bonus: /sethome Command

/sethome (and its alias /set-home) is not a memory provider — it’s a Hermes Agent gateway command that sets the current chat channel as the “home channel.”

| Property | Detail |
|---|---|
| Type | Gateway command (not a tool or MCP) |
| Scope | Session / platform-level |
| Platforms | Telegram, Discord, Slack, WhatsApp, Signal, Matrix, etc. |
| Purpose | Sets this chat as the primary delivery destination for notifications, cron jobs, scheduled tasks, and system alerts |
| CLI equivalent | Not available (cli-only=True) |

Use /sethome in the chat where you want to receive Hermes’s automated outputs (morning briefs, scheduled tasks, health alerts, etc.). Without setting a home channel, scheduled outputs may have nowhere to go.

11. Recommendations by Use Case

| If you want... | Choose... | Why |
|---|---|---|
| The default best overall | Mem0 | 51.9k stars, widest ecosystem, works with everything, proven at scale. The safe pick. |
| Lowest token cost + self-hosted | OpenViking | 83-91% token reduction backed by benchmarks. Already running on your machine. Free. |
| Agent that actually learns | Hindsight | SOTA on LongMemEval, unique reflect() capability, biomimetic memory model. Docker deployable. |
| Deep user personalization | Honcho | Psychological profiling, Peer Oracle, dialectic reasoning. Best for deeply personal assistants. |
| Zero effort, managed service | RetainDB | 2 API calls, 30-second setup. SOTA benchmark scores. But cloud-only. |
| Persistent coding context | ByteRover | Pre-compression extraction is perfect for long coding sessions. Switch IDEs with memory intact. |
| Maximum privacy, zero deps | Holographic | No network calls, no API keys, just SQLite and numpy. Data never leaves your box. |

Recommendation for This Machine

OpenViking is the best fit for your current setup: it’s already configured and running, it’s free and self-hosted, and the benchmark results are genuinely impressive. If you want to explore alternatives: