AI Agent Memory Providers: Comprehensive Report

Prepared: April 04, 2026
For: Hermes Agent (Nous Research / OpenClaw)
Providers Analyzed: 7

1. Executive Summary & Rankings

AI agents are fundamentally stateless — they don’t remember anything between sessions unless given that capacity. Memory providers solve this by extracting, storing, and retrieving relevant context from past interactions. They are one of the highest-leverage upgrades you can give an agent.

Below is the ranking by popularity (GitHub stars, downloads, community activity, and market presence as of April 2026), followed by a deep dive into each.

#1  Mem0 · 51.9k GitHub stars
#2  OpenViking · 20.9k GitHub stars
#3  ByteRover · ~3.7k GitHub stars
#4  Hindsight · Vectorize.io backed
#5  Honcho · Plastic Labs · $5.4M raised
#6  RetainDB · SOTA on LongMemEval
#7  Holographic · Academic / niche

Key Takeaway

The gap between Mem0 and everything else is massive — over 50k stars versus 20k for the next contender. However, raw popularity does not mean best fit. Your use case (self-hosted, multi-platform agent) aligns most naturally with OpenViking (already running), while Mem0 and Hindsight offer the richest features if you’re willing to adopt a cloud or managed service.

2. Side-by-Side Comparison

| Provider | Type | Storage | License | Cost | Key Differentiator |
|---|---|---|---|---|---|
| Mem0 | Cloud · Open-core | Pluggable (your own vector DB) | Apache 2.0 | Free tier; paid scaling | Largest ecosystem, framework-agnostic |
| OpenViking | Self-hosted | Local filesystem + SQLite | AGPL-3.0 | Free | Filesystem paradigm, tiered L0/L1/L2 |
| ByteRover | Local · SaaS | Local knowledge tree | Proprietary | Free CLI; paid cloud | Pre-compression extraction, coding focus |
| Hindsight | Local · Cloud | PostgreSQL + pgvector | Proprietary | Free tier; paid plans | Biomimetic memory, 3-layer model, knowledge graph |
| Honcho | Self-hosted + Cloud | PostgreSQL + pgvector | AGPL-3.0 | $100 free credits; paid after | User/agent modeling (Peer Paradigm) |
| RetainDB | Managed cloud | Proprietary cloud storage | Apache 2.0 (SDK) | Free 10k ops/mo; paid beyond | SOTA on LongMemEval benchmark |
| Holographic | Local only | SQLite + numpy | Built into Hermes | Free | HRR algebraic vectors, trust scoring |

3. Mem0 (mem0ai/mem0)

51.9k stars · 5.8k forks · 296 contributors

What Is It?

Mem0 (pronounced “mem-zero”) is the most popular AI memory layer in the industry. It acts as a universal, self-improving memory layer that sits between your application and the LLM. When a user sends a message, Mem0 automatically extracts facts, stores them, and retrieves only the most relevant ones on the next turn. It ships in open-source and managed tiers, with SDKs in Python and TypeScript.

Architecture

User Message
     |
     v
+----------------+      Extract Facts       +----------------+
|   Mem0 Core    | ───────────────────────► |   Vector DB    |
|  (Memory API)  | ◄─────────────────────── |  (Chroma,      |
+----------------+  Retrieve Relevant Facts |   Weaviate,    |
     |                                      |   Pinecone,    |
     v                                      |   Qdrant,      |
LLM Response + Injected Context             |   pgvector)    |
                                            +----------------+

How It Works

  1. Add: After each conversation turn, Mem0 passes the message history through an LLM (default: gpt-4.1-nano) to extract atomic facts from the conversation.
  2. Store: These facts are embedded and stored in your chosen vector database. Mem0 de-duplicates, merges, and updates conflicting facts automatically.
  3. Search: At the start of each new turn, Mem0 searches the vector DB for relevant memories based on the current query. Results are optionally reranked.
  4. Inject: Retrieved memories are injected into the system prompt, giving the LLM cross-session context without blowing the context window.
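
A minimal sketch of this add/search/inject loop with the open-source Python SDK (the calls follow Mem0's quickstart, but return shapes vary by SDK version, so treat them as assumptions to verify):

from mem0 import Memory

m = Memory()  # self-hosted mode with the default embedder + vector store

# Add: extract and store atomic facts from a conversation turn
m.add(
    [{"role": "user", "content": "I'm vegetarian and allergic to nuts."}],
    user_id="alice",
)

# Search: retrieve memories relevant to the current query
results = m.search("What should I cook for dinner?", user_id="alice")
memories = [r["memory"] for r in results["results"]]  # shape varies by version

# Inject: hydrate the system prompt before calling the LLM
system_prompt = "Relevant memories:\n" + "\n".join(memories)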

Hermes Agent Integration

pip install mem0ai
# Set env: MEM0_API_KEY=sk-... (for managed service)
# Or: configure your own vector store for self-hosted mode

Pros

  • Largest community and ecosystem (51.9k stars)
  • Pluggable storage — use your own vector DB
  • Python SDK, TypeScript SDK, CLI, and MCP server
  • Native integrations with LangGraph, CrewAI, Vercel AI SDK, Claude Code, Cursor, Codex
  • Multi-level memory: User, Session, Agent scopes
  • AWS-backed credibility
  • Benchmarked: +26% accuracy vs. OpenAI Memory, 90% lower token usage

Cons

  • Managed Cloud API is a black box — you don’t control extraction logic
  • Free tier has usage limits; costs scale with conversations
  • Self-hosted mode requires you to manage a vector DB
  • Fact extraction quality depends on the underlying LLM
  • Can lose nuance when compressing complex user preferences into flat facts
Verdict: The safe default choice. If you want memory that just works with the largest community backing it, pick Mem0. Best for production teams that need integrations across multiple frameworks.

4. OpenViking (volcengine/OpenViking)

20.9k stars · 1.5k forks

What Is It?

OpenViking is an open-source context database built by ByteDance’s Volcengine team (the TikTok parent company’s cloud division). It’s the only provider that uses a filesystem paradigm — everything is organized into a virtual viking:// directory tree that agents can browse, search, and navigate hierarchically. You’re already running it on this machine.

Architecture

Agent
  |
  |   viking://resources/memories/profile/
  |   viking://resources/memories/preferences/
  |   viking://resources/memories/entities/
  |   viking://resources/memories/events/
  |   viking://resources/memories/cases/
  |   viking://resources/memories/patterns/
  v
+---------------------+
|  OpenViking Server  |
|    viking:// tree   |
+---------------------+
      |          |
      v          v
  Local FS   SQLite + embeddings

How It Works

  1. Ingest: Resources (URLs, documents, conversation turns) are added via ov add-resource or programmatically. They are automatically parsed, embedded, and placed into a directory structure.
  2. Tiered Retrieval: Data exists at three levels:
    • L0 (Abstract) — ~100 tokens, quick relevance check
    • L1 (Overview) — ~2,000 tokens, core info for planning
    • L2 (Full) — complete data, loaded on demand
  3. Directory-Guided Search: Instead of flat vector search, OpenViking first identifies the most relevant directory, then drills down progressively. This significantly reduces token waste.
  4. Session End Extraction: After conversations end, the system asynchronously extracts long-term memories (preferences, entities, events) into the appropriate category.
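
The tiered drill-down is easy to picture in a few lines of Python (an illustrative sketch of the L0/L1/L2 idea, not the OpenViking API):

# Illustrative only: progressive L0 -> L1 -> L2 loading to cap token spend.
from dataclasses import dataclass

@dataclass
class Resource:
    path: str         # e.g. viking://resources/memories/preferences/editor
    l0_abstract: str  # ~100 tokens: quick relevance check
    l1_overview: str  # ~2,000 tokens: core info for planning
    l2_full: str      # complete data, loaded on demand

def retrieve(resources, is_relevant, needs_detail):
    # Drill down tier by tier instead of loading everything into context.
    context = []
    for r in resources:
        if not is_relevant(r.l0_abstract):  # cheap L0 screen first
            continue
        context.append(r.l1_overview)       # L1 for planning
        if needs_detail(r.l1_overview):     # L2 only when truly needed
            context.append(r.l2_full)
    return context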

Benchmark Results (LoCoMo10 Dataset)

| Setup | Task Completion Rate | Token Cost (Input) |
|---|---|---|
| OpenClaw (Native Memory) | 35.65% | 24.6M |
| OpenClaw + LanceDB | 44.55% | 51.6M |
| OpenClaw + OpenViking | 52.08% | 4.3M |

Hermes Agent Integration

pip install openviking
# Set memory.provider: openviking in config.yaml
# Server: openviking-server
# Already running at http://localhost:1933

Pros

  • Completely free and fully self-hosted
  • 83-91% reduction in token costs vs. naive approaches (benchmark-backed)
  • Filesystem organization is intuitive for human debugging
  • Tiered retrieval (L0/L1/L2) prevents context window overflow
  • Backed by ByteDance — serious engineering resources
  • Already running in your environment
  • Automatic session-end memory extraction

Cons

  • Must manage your own server infrastructure
  • Requires a VLM and embedding provider setup (config-heavy)
  • Browser tools are broken (GitHub issue #4740)
  • Smaller community than Mem0 (though still substantial)
  • AGPL-3.0 license can be restrictive for commercial use
  • Primarily designed for OpenClaw — Hermes integration is newer
Verdict: The best token-efficiency choice with excellent structured organization. Backed by ByteDance, fully free, and already running for you. Downsides are infrastructure overhead and a broken browser interface.

5. ByteRover (byterover-cli/brv)

~3.7k stars · 368 forks

What Is It?

ByteRover is a CLI-based knowledge tree purpose-built for AI coding agents. Its unique claim is pre-compression extraction — capturing key insights from long conversations before the LLM’s context window compresses or discards them. It also recently launched Cipher, an open-source memory layer specifically for coding IDEs.

How It Works

  1. Extract: During a long coding session, ByteRover runs parallel to the conversation, extracting project-specific knowledge, user preferences, architectural decisions, and patterns.
  2. Structure: These facts are organized into a hierarchical knowledge tree rather than a flat vector store.
  3. Store: The tree is persisted locally in a portable format.
  4. Retrieve: When starting a new session, the agent can load relevant branches from the knowledge tree at context window start.
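
A toy sketch of the knowledge-tree idea (illustrative Python; ByteRover's actual on-disk format is its own and not shown here):

# Illustrative only: a hierarchical knowledge tree instead of a flat fact list.
knowledge_tree = {
    "project": {
        "architecture": ["API gateway fronts three backend services"],
        "decisions": ["chose Postgres over Mongo for transactional writes"],
    },
    "preferences": {
        "style": ["user prefers small, typed functions with docstrings"],
    },
}

def load_branch(tree, path):
    # Load only the branch relevant to the new session, e.g. "project/decisions".
    node = tree
    for key in path.split("/"):
        node = node[key]
    return node

print(load_branch(knowledge_tree, "project/decisions"))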

Hermes Agent Integration

npm install -g byterover-cli  # CLI: brv
# or use the cloud service

Pros

  • Pre-compression extraction is a genuinely novel idea — capture before the window drops it
  • Portable between IDEs (VS Code, Cursor, Claude Code, neovim)
  • CLI-first design fits naturally into terminal-based workflows
  • Cipher extends it to a shared team memory concept
  • Local storage option available — no cloud dependency

Cons

  • Narrowly focused on coding agents — less applicable to general-purpose assistants
  • Smaller community than major alternatives
  • Proprietary license for the core product
  • Newer and less battle-tested than Mem0/OpenViking
  • Hermes Agent integration is thin compared to other providers
Verdict: Niche but interesting. The pre-compression extraction concept is clever and valuable for long coding sessions. Best for developers who switch between IDEs and want persistent coding context. Not a great fit for general-purpose agents.

6. Hindsight (vectorize-io/hindsight)

Vectorize.io backed · SOTA on LongMemEval

What Is It?

Hindsight is an agent memory system built by Vectorize.io that structures memory around how human memory actually works. Instead of just storing facts, it uses a three-layer biomimetic model (World, Experiences, Mental Models) and a knowledge graph to create connections between disparate pieces of information. It has a published arXiv paper and scored state-of-the-art on the LongMemEval benchmark.

Architecture — The Biomimetic Model

┌──────────────────────────────────┐
│          MENTAL MODELS           │  ← Abstracted insights formed by
│        (How things work)         │    reflecting on raw data
├──────────────────────────────────┤
│           EXPERIENCES            │  ← The agent's own past actions
│   (What I did, what happened)    │    and their outcomes
├──────────────────────────────────┤
│              WORLD               │  ← General facts and knowledge
│          (Static facts)          │    about the external world
└──────────────────────────────────┘

How It Works

  1. Retain (Store): An LLM extracts facts from conversations and stores them with context, timestamps, and entity labels into PostgreSQL with pgvector.
  2. Recall (Retrieve): Runs four retrieval strategies in parallel: Semantic (Vector), Keyword (BM25), Graph (Entity/Causal), and Temporal (Time-range). Results are combined via reciprocal rank fusion with cross-encoder reranking.
  3. Reflect (Learn): A unique capability — Hindsight can analyze its stored memories to form new insights, discover patterns, and update its own mental models without new external input.
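
Reciprocal rank fusion itself is only a few lines; a generic sketch (not Hindsight's code) of fusing the four ranked lists, before the cross-encoder reranks the fused head:

# Illustrative reciprocal rank fusion (RRF) over multiple retrieval strategies.
from collections import defaultdict

def rrf(ranked_lists, k=60):
    # Fuse ranked lists of memory ids; k=60 is the conventional RRF constant.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m3", "m1", "m7"]   # vector search
keyword  = ["m1", "m4"]         # BM25
graph    = ["m7", "m1"]         # entity/causal hops
temporal = ["m2", "m3"]         # time-range filter
print(rrf([semantic, keyword, graph, temporal]))  # "m1" wins: it appears in three lists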

Hermes Agent Integration

# Cloud: pip install hindsight-client
#        HINDSIGHT_API_KEY=sk-...
# Local: docker run --rm -it -p 8888:8888 -p 9999:9999
#          ghcr.io/vectorize-io/hindsight:latest

Pros

  • Most sophisticated memory model — genuinely learns, not just retrieves
  • SOTA on the LongMemEval benchmark (88% preference recall)
  • reflect() capability is unique — the system can synthesize new insights
  • 4-way parallel retrieval (vector + BM25 + graph + temporal) is top-tier
  • Available as Docker (fully local) and cloud managed
  • Published academic paper adds credibility
  • Built-in UI (Web dashboard)
  • Integrations for Claude Code, Telegram, Paperclip

Cons

  • Complex setup — requires heavy ML dependencies (Torch, Transformers) for full features
  • Proprietary license (not open source in the traditional sense)
  • Heavier resource requirements than simpler providers
  • Knowledge graph can be overkill for simple memory needs
  • Paid managed tier beyond free limits
  • Confusing repo situation (two different GitHub repos exist)
Verdict: The most intellectually ambitious option. If you want an AI that learns rather than just retrieves, Hindsight is the pick. The reflect() capability and biomimetic model are genuinely ahead of the curve. But it's complex to run and proprietary.

7. Honcho (plastic-labs/honcho)

Plastic Labs · $5.4M pre-seed

What Is It?

Honcho is an open-source memory library and managed service designed around the Peer Paradigm — both humans and AI agents are treated as “Peers,” each with their own evolving identity profile. Rather than just storing facts, Honcho builds psychological models of users and agents, tracking learning styles, communication preferences, and behavioral patterns over time. It is built by Plastic Labs, which has raised $5.4M in pre-seed funding.

How It Works

  1. Ingest: Messages are logged to sessions between Peers (user and agent).
  2. Derive: A background worker (“Deriver”) asynchronously analyzes the conversation to update:
    • Representations — evolving psychological profiles of each Peer
    • Summaries — compressed summaries of sessions
    • Conclusions — dialectic reasoning about what the user actually wants
  3. Retrieve: Natural language queries to a “Peer Oracle” — e.g., alice.chat("What learning styles does the user respond to best?") — to hydrate prompts with deeply personalized context.
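
A sketch of that flow with the Python SDK (modeled on the Peer Paradigm example above; the exact constructor and method signatures are assumptions to check against Honcho's docs):

from honcho import Honcho  # pip install honcho-ai

honcho = Honcho()  # assumes server/API-key configuration via environment

# 1. Ingest: log a session between two Peers
alice = honcho.peer("alice")         # the human
agent = honcho.peer("hermes-agent")  # the agent
session = honcho.session("session-1")
session.add_messages([
    alice.message("Show me the diff first, then explain it."),
    agent.message("Sure, here is the diff..."),
])

# 2. Derive: the background worker updates representations asynchronously.

# 3. Retrieve: ask the Peer Oracle in natural language
answer = alice.chat("What learning styles does the user respond to best?")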

Hermes Agent Integration

pip install honcho-ai  # or: uv add honcho-ai
# Requires: Postgres with pgvector
# Multiple LLM keys: Anthropic, Gemini, Groq (or configure alternatives)

Pros

  • Only provider that builds psychological profiles of users (not just facts)
  • “Peer Oracle” allows natural language querying of user psychology
  • Open-source core with a managed service option
  • Backed by Plastic Labs ($5.4M raised, active blog on memory theory)
  • The “Memory as Reasoning” blog post series is excellent thought leadership
  • Get Context endpoint combines messages + conclusions + summaries into prompt-ready format
  • Multi-provider LLM flexibility for different pipeline stages

Cons

  • Requires multiple LLM providers by default (Anthropic + Gemini + Groq)
  • User modeling may feel intrusive or unnecessary for simple tasks
  • AGPL-3.0 license restricts commercial use without a managed service
  • Smaller community footprint than Mem0/OpenViking
  • Heavily opinionated approach — not a generic memory store
  • $100 free credits run out; managed service pricing kicks in
Verdict: The most opinionated provider. If you want deeply personalized agents that understand how you think, not just what you said, Honcho is unmatched. But it requires significant setup infrastructure and is overkill if you don't need personality modeling.

8. RetainDB

SOTA on LongMemEval · Managed Cloud

What Is It?

RetainDB is a managed cloud memory service that positions itself as the memory layer for AI agents, requiring only two API calls. It scored state-of-the-art on the LongMemEval benchmark (88% preference recall, tied with Hindsight), and its SDK, including a Vercel AI SDK wrapper, enables memory integration in under 30 seconds.

How It Works

  1. Context Query: POST /v1/context/query retrieves relevant memories and injects them into the system prompt.
  2. LLM Generation: Your LLM generates a response with the enriched context.
  3. Learn: POST /v1/learn stores the interaction for future sessions.

RetainDB uses hybrid search (Vector + BM25) and delta compression for storage efficiency. It supports 7 memory types and works with any LLM provider.
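
A minimal sketch of the two-call loop against the endpoints named above (Python; the base URL, auth header, and payload field names are assumptions, not documented API):

import requests

BASE = "https://api.retaindb.example"            # placeholder base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

def answer(user_message, call_llm):
    # 1. Context Query: fetch relevant memories for this turn
    ctx = requests.post(
        f"{BASE}/v1/context/query", headers=HEADERS,
        json={"user_id": "alice", "query": user_message},  # assumed payload shape
    ).json()

    # 2. Your LLM generates a response with the enriched context
    reply = call_llm(system=f"Relevant memories: {ctx}", user=user_message)

    # 3. Learn: store the interaction for future sessions
    requests.post(
        f"{BASE}/v1/learn", headers=HEADERS,
        json={"user_id": "alice", "user": user_message, "assistant": reply},
    )
    return reply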

Pricing

| Tier | Price | Limits |
|---|---|---|
| Free | $0/mo | 10,000 operations/month |
| Paid | $20/mo | Higher limits, priority access |

Pros

  • Extremely easy to integrate — 2 API calls, 30-second setup
  • SOTA benchmark performance on LongMemEval
  • Hybrid search (Vector + BM25) outperforms vector-only retrieval
  • Delta compression saves storage
  • Free tier covers 10k operations/month — generous for personal use
  • Works with any LLM and framework
  • Starter templates for Next.js, Express, Python, LangChain

Cons

  • Cloud-only — no self-hosted option at all
  • $20/mo paid tier is relatively expensive compared to self-hosted alternatives
  • Smallest community and public footprint of any provider
  • Less control over data — privacy implications for sensitive use cases
  • SDK is primarily TypeScript-focused; Python support is thinner
  • Managed service = vendor lock-in risk
Verdict: The quickest to integrate but the least flexible. If you want a managed solution that just works with great benchmarks and don't mind the cloud dependency, RetainDB is straightforward. But it's the most vendor-locked option.

9. Holographic (Built into Hermes Agent)

Academic/niche · Zero external stars

What Is It?

Holographic memory is a fully local, zero-dependency SQLite-based memory plugin built directly into the Hermes Agent source code. It uses Holographic Reduced Representations (HRR), an academic approach from cognitive science (Tony Plate, 1995) that performs symbolic AI on top of real-valued vectors. Memories are stored as compressed holographic vectors that can be combined (superposition) and extracted (unbinding) algebraically.

How It Works

HRR is a mathematical framework for storing composite memories in a single vector:

  1. Binding: Two vectors A and B are combined via circular convolution to produce C = A ⊛ B (analogous to "key-value" association).
  2. Superposition: Multiple bound pairs are added together to compound information: V = A₁⊛B₁ + A₂⊛B₂ + ... + Aₙ⊛Bₙ
  3. Unbinding: To retrieve B given A, we unbind: B ≈ A* ⊛ V (where A* is the inverse/approximate inverse).
  4. Trust Scoring: Each memory is scored based on user feedback (+0.05 for helpful, -0.10 for unhelpful), influencing retrieval ranking.
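
The bind/unbind algebra is easy to verify with numpy (a generic HRR demo, not the Hermes plugin's code):

import numpy as np

rng = np.random.default_rng(0)
D = 1024  # HRR dimension; larger D means less crosstalk between superposed pairs

def bind(a, b):
    # Circular convolution A ⊛ B, computed via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    # The involution a[0], a[-1], a[-2], ...: Plate's approximate inverse A*
    return np.concatenate(([a[0]], a[1:][::-1]))

def random_vec():
    return rng.normal(0.0, 1.0 / np.sqrt(D), D)

# Superposition: two key-value bindings compounded into one trace
k1, v1, k2, v2 = (random_vec() for _ in range(4))
trace = bind(k1, v1) + bind(k2, v2)

# Unbinding with k1 recovers a noisy vector closest to v1
probe = bind(approx_inverse(k1), trace)
print(probe @ v1, probe @ v2)  # first similarity ≈ 1, second ≈ 0

The scaling con below is visible in this demo: the more pairs you superpose into the trace, the smaller the margin between the correct and incorrect similarities becomes.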

Hermes Agent Integration

# No external package needed — it's in the Hermes source tree.
# Enable via: hermes memory setup  # select "holographic"

Pros

  • Zero external dependencies — only needs numpy
  • 100% local — no API keys, no cloud, no network calls
  • Unique algebraic query capabilities:
    • probe — entity-specific recall
    • reason — compositional AND queries across entities
    • contradict — automated conflict detection
  • Trust scoring with feedback-driven weights is novel
  • Privacy-maximal — data never leaves your machine
  • Already shipped with Hermes Agent

Cons

  • Smallest community — essentially just one plugin maintainer
  • HRR is an academic approach with a steep learning curve
  • Scaling to thousands of memories degrades recall quality (interference)
  • Not well-documented outside the Hermes codebase
  • No managed service or cloud backup option
  • May require significant tuning to get good results for non-trivial use cases
  • Limited adoption means limited bug fixes and community support
Verdict: The most technically novel but least practical. HRR is fascinating math — but it's like using a neural-symbolic microscope when most people just need a filing cabinet. Good for privacy-focused, local-only setups with moderate memory volumes.

10. Bonus: /sethome Command

/sethome (and its alias /set-home) is not a memory provider — it’s a Hermes Agent gateway command that sets the current chat channel as the “home channel.”

| Property | Detail |
|---|---|
| Type | Gateway command (not a tool or MCP) |
| Scope | Session / platform-level |
| Platforms | Telegram, Discord, Slack, WhatsApp, Signal, Matrix, etc. |
| Purpose | Sets this chat as the primary delivery destination for notifications, cron jobs, scheduled tasks, and system alerts |
| CLI equivalent | Not available (cli-only=True) |

Use /sethome in the chat where you want to receive Hermes’s automated outputs (morning briefs, scheduled tasks, health alerts, etc.). Without setting a home channel, scheduled outputs may have nowhere to go.

11. Recommendations by Use Case

| If you want... | Choose... | Why |
|---|---|---|
| The default best overall | Mem0 | 51.9k stars, widest ecosystem, works with everything, proven at scale. The safe pick. |
| Lowest token cost + self-hosted | OpenViking | 83-91% token reduction backed by benchmarks. Already running on your machine. Free. |
| Agent that actually learns | Hindsight | SOTA on LongMemEval, unique reflect() capability, biomimetic memory model. Docker deployable. |
| Deep user personalization | Honcho | Psychological profiling, Peer Oracle, dialectic reasoning. Best for deeply personal assistants. |
| Zero effort, managed service | RetainDB | 2 API calls, 30-second setup. SOTA benchmark scores. But cloud-only. |
| Persistent coding context | ByteRover | Pre-compression extraction is perfect for long coding sessions. Switch IDEs with memory intact. |
| Maximum privacy, zero deps | Holographic | No network calls, no API keys, just SQLite and numpy. Data never leaves your box. |

Recommendation for This Machine

OpenViking is the best fit for your current setup: it’s already configured and running, it’s free and self-hosted, and the benchmark results are genuinely impressive. If you want to explore alternatives: