# agent-memory

Persistent memory for AI coding agents that build, maintain, and enhance long-lived projects.

Most memory solutions assume your relationship with a project ends at `git push`. This one doesn't. If you maintain production systems, ship continuous improvements, and need your agent to remember why that Docker port was changed 4 months ago — agent-memory is built for you.

agent-memory records what was learned, built, fixed, and decided during each session, then makes it searchable via hybrid semantic + full-text search. Claude Code's built-in `MEMORY.md` gives you 200 lines of pinned notes; agent-memory gives you a searchable journal spanning thousands of observations, so accumulated context becomes a competitive advantage instead of a truncated file.

Works with Claude Code out of the box. Designed to support any AI coding agent via REST API or MCP.

## Quick Start

```bash
git clone https://github.com/metazen11/agent-memory.git
cd agent-memory
node install.js
```

The installer handles everything:
- Creates Python venv and installs dependencies
- Downloads embedding model (~400MB) and observation LLM (~1GB)
- Generates `.env` with random Postgres password
- Starts Docker (PostgreSQL + pgvector)
- Starts FastAPI server on port 3377
- Registers MCP server, hooks, and skills in Claude Code

### Commands

```bash
node install.js              # Full setup + install
node install.js --status     # Show what's installed and running
node install.js --start      # Start services (Docker + FastAPI)
node install.js --stop       # Stop services
node install.js --migrate    # Run pending database migrations
node install.js --migrate --dry-run  # Preview migrations (no changes)
node install.js --migrate --backup   # Backup tables, then migrate
node install.js --backup     # Backup mem_* tables only
node install.js --uninstall  # Remove hooks, MCP, skills
```

### Prerequisites

- **Docker** *(or external PostgreSQL)* — macOS: `brew install --cask docker` | Linux: `sudo apt install docker.io docker-compose-plugin`
- **Python 3.12+** — macOS: `brew install [email protected]` | Linux: `sudo apt install python3.12 python3.12-venv`
- **Node.js** — for the installer and hooks

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│  Claude Code Session                                    │
│                                                         │
│  session-start hook ──► Health check → auto-start       │
│                     └──► Inject MCP guide + context     │
│  post-tool-use hook ──► POST /api/queue (fire & forget) │
│  session-end hook   ──► PATCH /api/sessions/:id         │
└──────────────┬──────────────────────────────────────────┘
               │ HTTP (localhost:3377)
┌──────────────▼──────────────────────────────────────────┐
│  FastAPI Server (uvicorn, port 3377)                    │
│                                                         │
│  /api/queue ──► mem_observation_queue                   │
│  /api/observations ──► CRUD + hybrid search             │
│  /api/sessions ──► session lifecycle                    │
│  /api/admin ──► stats, re-embed                         │
│                                                         │
│  Queue Worker (background asyncio task)                 │
│  ├─ Dequeue pending items (FOR UPDATE SKIP LOCKED)      │
│  ├─ Generate observation via LLM (local GGUF → Haiku)   │
│  ├─ Embed via sentence-transformers (in-process)        │
│  └─ Insert into mem_observations with vector            │
└──────────────┬──────────────────────────────────────────┘
               │
┌──────────────▼──────────────────────────────────────────┐
│  MCP Server (stdio, separate process)                   │
│  Registered in ~/.claude/.mcp.json                      │
│                                                         │
│  Tools: search, timeline, get_observations, save_memory │
│  Own DB pool + embedding model (zero FastAPI deps)      │
└──────────────┬──────────────────────────────────────────┘
               │
┌──────────────▼──────────────────────────────────────────┐
│  PostgreSQL 16 + pgvector (Docker)                      │
│  Tables: mem_* prefixed (avoids collisions)             │
└─────────────────────────────────────────────────────────┘
```

## How It Works

### Recording (write path)

Every tool call in your coding session is captured:

1. **PostToolUse hook** fires (fire-and-forget, ~40ms)
2. Tool call data queued to `/api/queue`
3. Background worker dequeues with `FOR UPDATE SKIP LOCKED`
4. Local LLM extracts structured observation (title, type, narrative, facts)
5. Sentence-transformers generates 768-dim embedding
6. Inserted into PostgreSQL with pgvector index
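
The dequeue in step 3 is a standard Postgres claim pattern. A minimal sketch, assuming hypothetical `status`, `created_at`, and `payload` columns (the real queue schema may differ):

```python
# Sketch of the "claim one pending item" pattern behind FOR UPDATE SKIP LOCKED.
# Concurrent workers each lock a different row instead of blocking on the same one.
DEQUEUE_SQL = """
UPDATE mem_observation_queue
   SET status = 'processing'
 WHERE id = (
     SELECT id
       FROM mem_observation_queue
      WHERE status = 'pending'
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT 1
 )
RETURNING id, payload;
"""
# A worker would run this inside a transaction, e.g. with asyncpg:
#   row = await conn.fetchrow(DEQUEUE_SQL)
```

The `SKIP LOCKED` clause is what lets multiple workers drain the same queue safely: a row already claimed by another transaction is skipped rather than waited on.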

### Retrieval (read path)

Search past sessions via MCP tools (3-layer workflow):

1. `search(query)` — hybrid vector + full-text search, returns IDs (~50-100 tokens/result)
2. `timeline(anchor=ID)` — context around interesting results
3. `get_observations([IDs])` — full details only for filtered IDs

Never skip straight to step 3; always filter first. Fetching full details only for filtered IDs yields roughly 10x token savings.
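
As a sketch, the filter-then-fetch discipline looks like this. The stubs below use assumed return shapes for illustration; the real MCP tools return richer payloads:

```python
# Illustrative stubs for the 3-layer retrieval workflow (assumed shapes).

def search(query):
    # Layer 1: hybrid search returns lightweight hits (id + title only).
    return [
        {"id": 101, "title": "Changed Docker port to 5433"},
        {"id": 102, "title": "Rebuilt pgvector HNSW index"},
        {"id": 103, "title": "Unrelated lint cleanup"},
    ]

def get_observations(ids):
    # Layer 3: full narratives fetched only for ids that survived filtering.
    return [{"id": i, "narrative": f"full detail for observation {i}"} for i in ids]

# Layer 2 (timeline) is omitted here; it widens context around a chosen hit.
hits = search("why was the docker port changed")
keep = [h["id"] for h in hits if "Docker" in h["title"] or "pgvector" in h["title"]]
details = get_observations(keep)  # full payloads for 2 of the 3 hits only
```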

### Auto-start

The session-start hook automatically starts services if they're not running. No manual intervention needed after initial install.

## Configuration

### .env

Generated by `install.js`. Key settings:

| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_USER` | `agentmem` | PostgreSQL user |
| `POSTGRES_PASSWORD` | *(generated)* | PostgreSQL password |
| `POSTGRES_HOST` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `5433` | PostgreSQL port |
| `POSTGRES_DB` | `agent_memory` | Database name |
| `DATABASE_URL` | *(built from above)* | Full URL override |
| `EMBEDDING_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Sentence-transformers model |
| `OBSERVATION_LLM_MODEL` | *(path to .gguf)* | Local LLM for observation extraction |
| `ANTHROPIC_API_KEY` | *(empty)* | Haiku fallback if no local LLM |
| `PORT` | `3377` | FastAPI server port |
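
For illustration, a minimal sketch of how the connection URL could be assembled from these variables, with an explicit `DATABASE_URL` taking precedence (the real `config.py` may differ):

```python
import os

def database_url(env=os.environ):
    """Build the Postgres URL from the variables above; an explicit
    DATABASE_URL overrides the parts (sketch; config.py may differ)."""
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    user = env.get("POSTGRES_USER", "agentmem")
    pw = env.get("POSTGRES_PASSWORD", "")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5433")
    db = env.get("POSTGRES_DB", "agent_memory")
    return f"postgresql://{user}:{pw}@{host}:{port}/{db}"
```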

### Existing Database (Bring Your Own Postgres)

If you already have a PostgreSQL 16+ instance with pgvector, set `DATABASE_URL` in `.env`:

```bash
DATABASE_URL=postgresql://user:pass@host:5433/dbname
```

When `DATABASE_URL` is set, the installer:
- Skips Docker entirely (no container needed)
- Runs versioned SQL migrations against your database
- Creates all `mem_`-prefixed tables (avoids collisions with other apps)

Requirements for external databases:
- PostgreSQL 16+ with the `vector` extension (pgvector)
- A database and user with CREATE TABLE / CREATE EXTENSION permissions

### Schema Migrations

The database schema is managed by versioned SQL migrations in `scripts/migrations/`:

```
scripts/migrations/
├── 001-initial-schema.sql     # Tables, indexes, pgvector extension
├── 002-add-new-feature.sql    # Future migrations...
└── ...
```

Migrations run automatically:
- During `node install.js` (step 7)
- On every FastAPI server startup
- Via `python scripts/run_migrations.py` (manual)

Each migration runs exactly once. A `mem_schema_migrations` table tracks which have been applied.
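
The run-once bookkeeping reduces to a set difference. A minimal sketch, assuming migrations apply in filename order:

```python
def pending_migrations(available, applied):
    # Migrations are ordered by filename; anything already recorded in
    # mem_schema_migrations is skipped, so each file runs exactly once.
    done = set(applied)
    return [m for m in sorted(available) if m not in done]
```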

## Components

### FastAPI Server (`app/`)

| File | Purpose |
|------|---------|
| `main.py` | App lifecycle (pool init, migrations, queue worker) |
| `migrate.py` | Versioned SQL migration runner |
| `config.py` | Pydantic settings from `.env` |
| `db.py` | asyncpg connection pool |
| `models.py` | Pydantic schemas |
| `embeddings.py` | Sentence-transformers in-process embeddings (768-dim) |
| `observation_llm.py` | Local GGUF (Qwen2.5-1.5B) with Anthropic Haiku fallback |
| `queue_worker.py` | Background asyncio task, processes queue items |
| `routes/` | Health, observations, sessions, admin endpoints |

### MCP Server (`mcp_server.py`)

Self-contained stdio MCP server. Own DB pool and embedding model — zero dependency on FastAPI.

### Hooks (`hooks/`)

| Hook | Event | Timeout | Description |
|------|-------|---------|-------------|
| `session-start.js` | SessionStart | 60s | Health check, auto-start services, inject context |
| `post-tool-use.js` | PostToolUse | 5s | Fire-and-forget observation capture |
| `session-end.js` | Stop | 10s | Mark session completed |
| `ensure-services.js` | *(internal)* | — | Starts Docker + FastAPI when called by session-start |

### Skills (`skills/`)

`/mem-search` — User-invocable skill for searching past sessions.

## API Endpoints

### Health & Admin

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/health` | DB, embeddings, queue depth |
| `GET` | `/api/admin/stats` | Counts and type breakdown |
| `POST` | `/api/admin/re-embed` | Background re-embed job |

### Observations

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/queue` | Queue tool call for async extraction |
| `POST` | `/api/observations` | Create observation directly |
| `GET` | `/api/observations` | List with filters |
| `POST` | `/api/observations/search` | Hybrid search |

### Sessions

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/sessions` | Start new session |
| `PATCH` | `/api/sessions/{id}` | Update session status |
| `GET` | `/api/sessions` | List sessions |

## Database Schema

Tables use the `mem_` prefix to avoid collisions with other apps; `embedding_models` is the one unprefixed registry table.

| Table | Purpose |
|-------|---------|
| `embedding_models` | Registry of embedding models |
| `mem_projects` | Auto-created from working directory |
| `mem_sessions` | One per coding session |
| `mem_observations` | Core memory unit with embeddings |
| `mem_observation_queue` | Async processing queue |

### Search Strategy

Hybrid search using **Reciprocal Rank Fusion (RRF)** with k=60:
1. **Vector search** — cosine similarity via pgvector HNSW index
2. **Full-text search** — PostgreSQL tsvector with weighted fields
3. **RRF fusion** — `score = sum(1/(60+rank))` across both result sets
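
The fusion step can be written in a few lines. A minimal sketch of the formula above:

```python
def rrf_fuse(vector_ranked, text_ranked, k=60):
    """Reciprocal Rank Fusion: score = sum(1 / (k + rank)) across both
    result sets, where rank is the 1-based position in each list."""
    scores = {}
    for ranked in (vector_ranked, text_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With k=60, a document ranked 1st in only one list scores 1/61 ≈ 0.0164, while one ranked 2nd in both scores 2/62 ≈ 0.0323, so agreement across both signals beats a single top hit.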

## Multi-Agent Support

The system is agent-agnostic. The hooks are the Claude-specific integration layer.

**REST API** — Any agent can POST to `/api/queue` and GET from `/api/observations`.

**MCP** — Register `mcp_server.py` in any MCP-compatible agent's config.

**Direct SQL** — Query `mem_observations` with pgvector operators.

See **[docs/PRIMER.md](docs/PRIMER.md)** for the full multi-agent integration guide with config snippets for Claude Code, Cursor, Windsurf, Cline, Codex CLI, Zed, VS Code Copilot, and custom agents.
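
For the REST path, a minimal Python sketch that builds (but does not send) a queue request. The payload fields are illustrative assumptions, not the documented schema:

```python
import json
import urllib.request

# Sketch: a non-Claude agent queueing a tool call for observation extraction.
# Field names below are hypothetical; see docs/PRIMER.md for the real schema.
payload = {
    "session_id": "example-session",
    "tool": "Edit",
    "input": {"file": "app/main.py"},
    "output": "ok",
}
req = urllib.request.Request(
    "http://localhost:3377/api/queue",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment with the FastAPI server running
```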

## Why Replace claude-mem?

This project was built as a direct replacement for [claude-mem](https://github.com/thedotmack/claude-mem) after hitting persistent stability issues:

- **PostToolUse hook hangs** — claude-mem's `PostToolUse` hook uses `matcher: "*"` with a 120-second timeout. It fires on every single tool call, spawns worker-service daemons, and frequently hangs waiting for ChromaDB sync. This blocks Claude Code after every tool use. The fix (removing the hook from `hooks.json`) gets overwritten on every plugin update.
- **Zombie processes** — The worker-service daemons accumulate. We've seen 50-80+ zombie `worker-service` processes in a single session, consuming memory and CPU.
- **ChromaDB crashes on Apple Silicon** — ChromaDB 1.5.0's Rust bindings (`chromadb_rust_bindings.abi3.so`) segfault on macOS ARM64 due to a thread-safety bug. Multiple tokio workers contend on a mutex, causing SIGSEGV.
- **No real vector search** — claude-mem uses ChromaDB/SQLite locally, which doesn't scale well and lacks proper hybrid search. agent-memory uses PostgreSQL + pgvector with HNSW indexes and Reciprocal Rank Fusion (vector + full-text).
- **No auto-recovery** — When claude-mem's database or services go down, they stay down. agent-memory's session-start hook auto-detects unhealthy services and restarts Docker containers and the FastAPI server automatically.
- **Fire-and-forget hooks** — agent-memory's PostToolUse hook writes stdout immediately and exits in ~30ms. The HTTP POST to the queue is unref'd so it never blocks the Node.js event loop. claude-mem's hook blocks until its worker completes.

If you're currently using claude-mem and experiencing hangs, crashes, or zombie processes, agent-memory is a drop-in replacement with a migration script included.

## Migration from claude-mem

```bash
source .venv/bin/activate
python scripts/migrate_claude_mem.py       # migrate without embeddings
python scripts/migrate_claude_mem.py --embed  # migrate with embeddings
python scripts/re_embed.py --only-missing  # embed missing observations
```

## Debug

| Hook | Default | Toggle |
|------|---------|--------|
| session-start | ON | `AGENT_MEMORY_DEBUG=0` |
| post-tool-use | OFF | `AGENT_MEMORY_DEBUG=1` |
| session-end | ON | `AGENT_MEMORY_DEBUG=0` |

```bash
AGENT_MEMORY_DEBUG=1 claude   # enable all
```

## Docker

```bash
cd docker && docker compose up -d     # start
cd docker && docker compose down      # stop
cd docker && docker compose down -v   # reset (destroys data)
```