The Problem: Claude Code Forgets Everything
Every time you start a new Claude Code session, the slate is wiped clean. Your coding style preferences, project architecture decisions, yesterday's debugging session — all gone.
You end up repeating yourself: "We use Supabase, not Firebase. The Edge Functions are in supabase/functions/. Don't use dummy data."
claude-mem fixes this by adding persistent memory across sessions. It hit 46K GitHub stars within 48 hours of launch. I had already built a lightweight DIY alternative before discovering it, then installed claude-mem alongside it, and here's what I found.
What is claude-mem?
GitHub: https://github.com/thedotmack/claude-mem
A plugin that gives Claude Code a long-term memory. It automatically captures what you do during sessions and injects relevant context into future conversations.
Architecture
- 5 Lifecycle Hooks: SessionStart / UserPromptSubmit / PostToolUse / Stop / SessionEnd
- SQLite + Chroma: Hybrid search (keyword + vector similarity)
- Bun HTTP Worker: Background service on localhost:37777
- MCP Tools: 3-layer progressive disclosure (search → timeline → get_observations)
- Web UI: Visual memory browser
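To make "hybrid search" concrete: the idea is to blend a keyword-match score with embedding similarity into a single ranking. The following is an illustrative Python sketch of that blending, not claude-mem's actual implementation (in the real plugin, Chroma handles the vector side):

```python
import math

def keyword_score(query, doc):
    """Fraction of query terms present in the document (crude keyword match)."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by a weighted blend of vector and keyword similarity.

    docs: list of (text, embedding) pairs. alpha weights the vector score.
    """
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    return [text for _, text in sorted(scored, reverse=True)]
```

The point of the blend: a pure keyword search misses paraphrases ("auth" vs "login"), while pure vector search can rank a vaguely related document above an exact match; weighting the two covers both failure modes.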
Installation
```bash
npx claude-mem install
npx claude-mem start  # Requires Bun
```
The DIY Alternative I Built First
Before discovering claude-mem, I built a minimal memory system using just two PowerShell scripts and Claude Code's native hooks API.
PostToolUse Hook (auto-capture.ps1)
Triggered after every Bash or Write tool use. Captures git commits and new file creations to a daily markdown file:
memory/auto-capture/2026-04-13.md:

```markdown
- 09:15 [abc1234] feat: Add user authentication
- 09:32 [Write] auth_middleware.dart
- 10:01 [def5678] fix: Token refresh logic
```
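The script itself is PowerShell, but the capture logic is small enough to sketch in a few lines of Python. This uses a simplified call signature; the real hook receives a JSON event on stdin, so the parsing and git plumbing are omitted here:

```python
import datetime
import pathlib

def capture_event(tool_name, detail, base_dir="memory/auto-capture", now=None):
    """Append a timestamped line for one tool event to today's markdown file."""
    now = now or datetime.datetime.now()
    day_file = pathlib.Path(base_dir) / f"{now:%Y-%m-%d}.md"
    day_file.parent.mkdir(parents=True, exist_ok=True)
    # One bullet per event, matching the daily-log format above.
    with day_file.open("a", encoding="utf-8") as f:
        f.write(f"- {now:%H:%M} [{tool_name}] {detail}\n")
    return day_file
```

Appending to a dated file keeps each day's log small and naturally greppable; there is no database to maintain.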
SessionStart Hook (session-resume.ps1)
Reads the last 3 days of captures and injects them as context when a new session starts. The AI immediately knows what you've been working on.
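Sketched in Python (again, the original is a PowerShell script), the resume logic is just "read the last N daily files and emit them oldest-first"; a SessionStart hook injects that text by printing it to stdout:

```python
import datetime
import pathlib

def resume_context(base_dir="memory/auto-capture", days=3, today=None):
    """Collect the last N days of capture files into one context string."""
    today = today or datetime.date.today()
    chunks = []
    # Walk from oldest to newest so recent work appears last (closest to the prompt).
    for offset in range(days - 1, -1, -1):
        day = today - datetime.timedelta(days=offset)
        path = pathlib.Path(base_dir) / f"{day:%Y-%m-%d}.md"
        if path.exists():
            chunks.append(f"## {day:%Y-%m-%d}\n{path.read_text(encoding='utf-8')}")
    return "\n".join(chunks)
```
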
Registration in settings.json
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Bash|Write",
      "hooks": [{
        "type": "command",
        "command": "powershell -File auto-capture.ps1"
      }]
    }],
    "SessionStart": [{
      "hooks": [{
        "type": "command",
        "command": "powershell -File session-resume.ps1"
      }]
    }]
  }
}
```
Head-to-Head Comparison
| Feature | claude-mem | DIY Hooks |
|---|---|---|
| Setup | npx install (1 command) | 2 scripts, manual |
| Auto-capture | All tool usage | git commits + Write only |
| Search | Vector similarity + keyword | grep (text search) |
| Web UI | localhost:37777 | None |
| Dependencies | Bun + SQLite + (Chroma) | None |
| Token cost | LLM compression (Gemini = free) | Zero |
| Git-friendly | DB file (gitignored) | Markdown files (shareable) |
| Multi-instance | Session-scoped isolation | File sharing for coordination |
Running Both Together
The good news: they coexist cleanly. claude-mem registers as a plugin, while the DIY hooks register directly in settings.json. Both fire on the same events without conflict.
When claude-mem shines
- Smart compression: Uses an LLM (Gemini/Claude) to summarize tool outputs into compact observations
- Semantic search: "What did I do with the auth system last week?" actually works
- Web dashboard: Visual overview of what's been captured
When DIY hooks shine
- Zero dependencies: No server, no database, no runtime
- Team sharing: Markdown files can be committed to git and shared across instances
- Full control: You decide exactly what gets captured and how
- Truly free: No API calls whatsoever
Cost Optimization Tip
claude-mem defaults to the Claude API for compression, which consumes your tokens. Switching the provider to Gemini's free tier eliminates that cost:
```jsonc
// ~/.claude-mem/settings.json
{
  "CLAUDE_MEM_PROVIDER": "gemini",
  "CLAUDE_MEM_GEMINI_API_KEY": "your-free-key-from-aistudio.google.com"
}
```
Our 3-Layer Memory Architecture
In our project (Flutter Web + Supabase, 3 parallel Claude Code instances), we use a layered approach:
| Layer | Tool | Purpose |
|---|---|---|
| L1: Intra-session | claude-mem (SQLite) | Auto-record all tool usage, semantic search |
| L2: Inter-session | DIY hooks (markdown) | Git commit history, cross-instance sharing |
| L3: Cross-project | NotebookLM Master Brain | Deep research, long-term architectural knowledge |
Verdict
claude-mem delivers on its promise of turning Claude Code from a "disposable tool" into a "growing partner." The vector search and Web UI are genuinely useful features that are hard to replicate with simple scripts.
However, for teams that want zero dependencies, zero token cost, and git-friendly memory sharing, a DIY hook approach is a solid starting point.
My recommendation: Start with DIY hooks for minimal memory, then layer on claude-mem when you need semantic search and automatic compression.
Built with Claude Code | Project: https://my-web-app-b67f4.web.app/

Top comments (12)
The 3-layer memory architecture is a smart approach. I've been running a similar pattern with my automation clients — using CLAUDE.md for project-level rules, but the gap has always been that "what happened in the last 2 hours" context. Your DIY hooks approach for L2 is elegant because markdown files are auditable and diffable, unlike opaque database entries.
One edge case worth flagging: if you're running parallel Claude Code instances (as you mention with 3 instances), the PostToolUse hook can create race conditions writing to the same daily markdown file. A simple fix is namespacing by instance ID in the filename. Keeps the merge clean when your SessionStart hook aggregates them.
That’s a great catch — and yes, that edge case is very real.
Right now the markdown layer works because the instances are relatively scoped, but namespacing by instance ID is probably the cleaner long-term fix. It keeps the write path simpler and makes aggregation safer when SessionStart pulls recent context back together.
I also like that it preserves the main reason I chose markdown in the first place: auditable, diffable memory instead of opaque state.
Really appreciate you pointing that out.
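For concreteness, the namespacing fix could be as small as a per-instance filename. This is a hypothetical Python sketch; the CLAUDE_INSTANCE_ID environment variable is invented for illustration:

```python
import datetime
import os
import pathlib

def capture_path(base_dir="memory/auto-capture", instance_id=None, today=None):
    """Per-instance daily capture file, e.g. 2026-04-13.a1.md."""
    today = today or datetime.date.today()
    # Fall back to the process ID when no explicit instance ID is configured.
    instance_id = instance_id or os.environ.get("CLAUDE_INSTANCE_ID", f"pid{os.getpid()}")
    return pathlib.Path(base_dir) / f"{today:%Y-%m-%d}.{instance_id}.md"
```

Each instance then appends only to its own file, and the SessionStart aggregation can simply glob `YYYY-MM-DD.*.md` with no write contention.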
This comparison is useful because it surfaces a design question most memory systems dodge: what's the right unit of memory?
claude-mem captures tool outputs and compresses them via LLM. Your DIY approach captures git commits and file writes as markdown. Both are event-level. The question neither fully answers is: when does raw event data become useful knowledge?
I've been running persistent memory across 1,100+ Claude Code sessions, and I've iterated through three generations:
1. Flat markdown files (your DIY approach) — simple, greppable, zero dependencies. Worked for ~200 sessions. Failed at scale because related memories across files were invisible to search.
2. SQLite + FTS5 — solved keyword search. Added session digests (compressed summaries of what each session decided, not just what it did). This was the first system that let a new session meaningfully continue previous work.
3. Knowledge graph (Neo4j, 130k+ nodes) with typed relationships — 'supersedes', 'blocked_by', 'attempted_and_failed'. This is what finally made memory operationally useful. The key insight: knowing that Strategy X was attempted in Session 400 and failed is more valuable than knowing what Strategy X was.
Your observation about multi-instance coordination via shared markdown is something I haven't seen other memory tools address well. In a multi-agent setup, memory isolation vs. memory sharing becomes a governance question, not just a technical one. Which agent can see what, and who decides?
The Bun worker reliability concern is real — I've had background services silently die under sustained load. If claude-mem's SQLite is the only copy of memory state and the worker crashes mid-write, you're looking at potential corruption. The markdown fallback being git-friendly is actually an underrated safety property.
I've been running persistent memory for Claude Code across 1100+ sessions, so I can share what the long-term trajectory looks like for both approaches you described.
I started with exactly your DIY approach — markdown files, session captures, injected at startup. It worked great for the first 200 sessions. Then it broke in two ways:
1. The files got too large. A flat MEMORY.md file that accumulates observations from hundreds of sessions becomes a context-window tax. You end up spending tokens loading memory that's 80% irrelevant to the current task. I had to build a manual curation discipline (trim periodically, organize by topic, delete stale entries).
2. Cross-referencing became impossible. Session 400 references a contact from session 50 and a decision from session 200. Grep works, but the agent wastes turns searching instead of working.
The fix was a knowledge graph (Neo4j) behind a Python API. Contacts, sessions, facts, insights, and interactions are all separate node types with typed relationships. The agent queries semantically (memory.py search 'MCP server architecture') and gets back ranked results from across 1100 sessions in milliseconds. The flat markdown files still exist as backup, but the graph is primary.
One thing neither claude-mem nor your DIY approach addresses: memory decay. Facts from session 50 may be wrong by session 500. I use timestamped fact triples with a convention that newer facts on the same subject shadow older ones. Without this, the agent acts on outdated information confidently.
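The shadowing convention described here can be sketched in a few lines (illustrative only; the commenter's actual system is a Neo4j graph):

```python
def resolve_facts(facts):
    """Resolve timestamped fact triples so newer facts shadow older ones.

    facts: list of (timestamp, subject, value) triples in any order.
    Returns {subject: value} where the newest fact per subject wins.
    """
    latest = {}
    for ts, subject, value in sorted(facts):  # ISO timestamps sort chronologically
        latest[subject] = value               # later entries overwrite earlier ones
    return latest
```

Injecting only the resolved view into context is what prevents the agent from confidently acting on a fact that was superseded hundreds of sessions ago.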
The 3-layer approach you recommend (DIY hooks + claude-mem + cross-project knowledge) is directionally right. I'd just add: plan for the migration from layer 1 to layer 2 early, because by the time you need semantic search, you have hundreds of unstructured entries that are painful to backfill.
We've been running a similar DIY hooks approach across 1,100+ Claude Code sessions and the tradeoff you identified — markdown files shareable across agent instances vs. a local DB — is the one that actually matters in production. Our session-start hook reads the last N entries from a flat memory file and prepends a summary block; the key lesson was keeping that injection under ~600 tokens or it visibly degrades reasoning quality on complex tasks. One thing your comparison doesn't surface: claude-mem's background Bun worker is a reliability risk if sessions get killed mid-run (we've seen DB corruption in similar setups). For teams running Claude Code in CI or headless environments, the zero-dependency markdown approach is more resilient even if it's less capable. Worth calling out in the comparison table.
The DIY hooks approach is underrated. SQLite + markdown files covers 90% of persistent context needs without any background daemons or vector search overhead.
The interesting gap I keep running into is multimodal session context — if your agent is working with sensor readings, images, or structured telemetry alongside code, the "yesterday's decisions" you want to recall aren't just text. SQLite handles that awkwardly; most pure vector stores don't handle it at all.
For embedded or edge AI use cases specifically, I've been using moteDB (a Rust-native embedded database designed for AI agents — github.com/motedb/motedb) to handle exactly this. The model I'm moving toward: text decisions in markdown like your approach, structured/multimodal context in moteDB, retrieved together at session start.
What types of context turned out to be most valuable to persist across sessions in your setup? Code structure decisions, or more conversational style preferences?
Memory is the hidden technical debt in AI-assisted development. Without it, every conversation starts from zero, which means you're re-explaining your project architecture, coding conventions, and business context every session.
The claude-mem approach is interesting, but I think the real insight is in the DIY lightweight alternative. Most developers don't need a complex memory system — they need a well-maintained CLAUDE.md file and a few structured context documents. That's the 80/20 solution.
The technical debt angle here is important: memory systems accumulate stale information over time. If you stored a decision 3 months ago that's no longer valid, the AI will confidently follow outdated guidance. Memory without a cleanup/review process creates a new kind of tech debt that's invisible until it causes a bug.
I've been thinking about this a lot while building tokrepo.com — the context management problem is fundamental to every AI coding tool.
Really thoughtful comment — I agree with your “memory as hidden technical debt” framing.
I think that’s exactly why a plain CLAUDE.md + a few structured docs is such a strong baseline. It’s simple, reviewable, and much less likely to drift silently.
For me, the gap is that static docs capture explicit rules well, but they don’t capture recent working context very well — things like “what changed yesterday,” “which files were just touched,” or “what this instance was debugging 30 minutes ago.” That’s the part I’m trying to cover with lightweight hooks first, and then semantic memory only when the project complexity justifies it.
So I’m converging on almost the same conclusion: static docs for durable truth, lightweight memory for short-term continuity, and some cleanup discipline so the memory layer doesn’t become the next mess.
Really appreciate this perspective.
ran into this same problem and ended up rolling my own markdown-based memory files instead of claude-mem. works for my setup but the 46k stars are telling - most people want zero config.
That makes sense — I think that’s exactly why markdown-based memory feels so appealing.
For a lot of setups, “good enough and transparent” beats “powerful but heavier.” The 46k stars definitely suggest there’s huge demand for zero-config memory, but I still think the markdown route has a strong advantage when you want auditability, git-friendliness, and full control over what gets captured.
So my current view is pretty close to yours: start simple, then add heavier memory only when the project complexity actually earns it.
yeah the auditability point is underrated - being able to git blame your agent's memory is genuinely useful when something breaks. most of the zero-config solutions treat state as a black box
The DIY memory file approach works surprisingly well until the project has >5 parallel contexts — at that point the file grows and Claude starts skipping sections when loading (the "laziness" failure mode). What I've found fixes it: split memory into ~20-line category files with clear filenames, keep a 1-line index. Claude reads the index first, then only the relevant files. Less context bloat, less skipping. Anyone else dealt with the context rot from a single big CLAUDE.md?