harness

Documentation

<p align="center">
  <img src="logo.svg" alt="Harness Logo" width="400" height="400">
</p>

# Harness

A Claude Code plugin for autonomous multi-session development. Based on [Anthropic's research on effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).

## Philosophy

Large projects can't be completed in a single session. This plugin provides a framework for:

- **Breaking work into discrete tasks** - 50-200 testable tasks per project
- **Clarifying requirements upfront** - Asks questions before generating tasks
- **Documentation-driven development** - Each task includes relevant doc URLs
- **Native task integration** - Uses Claude Code's built-in task system for tracking
- **Autonomous execution** - Runs through ALL tasks without human intervention
- **Atomic commits** - Each task gets its own commit
- **Tracking progress visibly** - Native UI, session logs, and git history tell the story

## Installation

### Option A: Add as Marketplace (Recommended)

This method persists across sessions and can be shared with your team.

**Step 1: Add the marketplace**

```bash
# Inside Claude Code
/plugin marketplace add mikkelkrogsholm/harness
```

**Step 2: Install the plugin**

```bash
# Install for yourself (user scope - works across all projects)
/plugin install harness@mikkelkrogsholm-harness

# OR install for this project only (project scope - shared with team via git)
/plugin install harness@mikkelkrogsholm-harness --scope project
```

Commands will be namespaced as `/harness:init`, `/harness:continue`, `/harness:status`.

### Option B: Test with --plugin-dir (Session Only)

For quick testing without permanent installation:

```bash
git clone https://github.com/mikkelkrogsholm/harness /tmp/harness-plugin
claude --plugin-dir /tmp/harness-plugin
```

Note: This only loads the plugin for that session.

### Option C: Manual Installation (Copy to .claude/)

For maximum control or when marketplace isn't available:

```bash
# Clone the repo
git clone https://github.com/mikkelkrogsholm/harness /tmp/harness-plugin

# Create .claude directories in your project
mkdir -p .claude/{commands,agents,scripts}

# Copy components
cp /tmp/harness-plugin/commands/*.md .claude/commands/
cp /tmp/harness-plugin/agents/*.md .claude/agents/
cp /tmp/harness-plugin/scripts/*.sh .claude/scripts/

# Make scripts executable
chmod +x .claude/scripts/*.sh

# Fix hook paths for manual installation
sed -i '' 's|\${CLAUDE_PLUGIN_ROOT}/scripts/|.claude/scripts/|g' .claude/agents/*.md
```

Commands will be `/harness-init`, `/harness-continue`, `/harness-status` (no namespace prefix).

### Option D: Project-Level Plugin Configuration

Add to your project's `.claude/settings.json` to auto-prompt team members:

```json
{
  "extraKnownMarketplaces": ["mikkelkrogsholm/harness"],
  "enabledPlugins": ["harness@mikkelkrogsholm-harness"]
}
```

When team members open the project and trust the folder, they'll be prompted to install.

## Usage

### 1. Initialize a Project

```
/harness:init Build a task management app with auth, projects, and real-time sync
```

(Or `/harness-init` for manual installation)

The bootstrap agent will:

1. **Ask clarifying questions** about:
   - Tech stack (frontend, backend, database)
   - Authentication method
   - Scope (MVP vs full product)
   - Third-party integrations
   - Special requirements

2. **Research documentation** for chosen technologies

3. **Create tasks and files**:
   - **Native tasks** in Claude Code's task system with metadata:
     - Priority (1-5)
     - Category (core, functional, ui, integration, polish)
     - Documentation URLs
     - Verification steps
   - `claude-progress.txt` - Session log with assumptions documented
   - `init.sh` - Development environment setup

### 2. Continue Development

```
/harness:continue
```

(Or `/harness-continue` for manual installation)

This runs autonomously until complete:

```
WHILE incomplete tasks exist:
    1. Get next pending task (by priority)
    2. Call incremental-workflow agent (fresh context)
    3. Agent consults documentation URLs from task metadata
    4. Agent implements, verifies, commits
    5. Mark task completed, continue to next
END WHILE
```

Stops only when all tasks are completed or all remaining are blocked.

### 3. Check Status

```
/harness:status
```

(Or `/harness-status` for manual installation)

Shows progress without making changes:
- Overall completion percentage
- Progress by category (from task metadata)
- Recent session activity
- Git state
- Next task to implement
- Native Claude Code task UI integration

## How It Works

### Orchestrator Pattern

The `/harness:continue` command acts as an **orchestrator** that loops through tasks. For each task, it delegates to the `incremental-workflow` agent which runs in its own **isolated context**. This gives each task implementation:

- Fresh context window (no token buildup)
- Isolated execution
- Clean return to orchestrator

This is the same pattern as the proven [Motlin /todo-all method](https://motlin.com/blog/claude-code-running-for-hours).

### Native Task Integration

Tasks are stored in Claude Code's native task system, providing:

- **Visual tracking** in the Claude Code UI
- **No external dependencies** - no jq required
- **Persistent across sessions** via SessionStart/SessionEnd hooks
- **Metadata support** for priority, category, verification, and documentation URLs
- **Clean API** via TaskCreate, TaskList, TaskGet, TaskUpdate tools

### Two Agents

| Agent | Purpose | Hooks |
|-------|---------|-------|
| `project-bootstrap` | Asks questions, researches docs, creates tasks | Stop: Auto-commits bootstrap files |
| `incremental-workflow` | Consults docs, implements ONE task per invocation | Stop: Requires clean git state |

### Hook Flow

```
Session Lifecycle:
  SessionStart → load-tasks.sh → Loads .claude/tasks/*.json into native task system
  [Work happens...]
  SessionEnd → save-tasks.sh → Saves native tasks back to .claude/tasks/*.json

Bootstrap (once per project):
  project-bootstrap agent runs
  Stop hook → bootstrap-commit.sh → Auto-commits progress log and init.sh

Workflow (per task):
  incremental-workflow agent runs
  [Implementation work...]
  Stop hook → verify-clean-state.sh → Ensures clean git before stopping
```

### Documentation-Driven Development

Each task includes metadata with relevant documentation URLs:

```json
{
  "subject": "Set up Next.js 14 project with App Router",
  "description": "Initialize Next.js project with TypeScript and App Router...",
  "activeForm": "Setting up Next.js project",
  "metadata": {
    "priority": 1,
    "category": "core",
    "documentation": [
      "https://nextjs.org/docs/getting-started/installation",
      "https://nextjs.org/docs/app/building-your-application/routing"
    ],
    "verification": [
      "npm run dev works",
      "TypeScript compiles"
    ]
  }
}
```

The `incremental-workflow` agent **fetches and reads these docs** before implementing each task, ensuring:
- Correct API usage
- Following official patterns
- Avoiding common pitfalls

### Hook Enforcement

The plugin uses hooks to enforce discipline and manage task persistence:

**SessionStart Hook** (setup-harness.sh):
- Creates `.claude/tasks/` directory structure
- Runs load-tasks.sh to load persisted tasks into native system

**SessionEnd Hook** (save-tasks.sh):
- Saves all native tasks to `.claude/tasks/*.json`
- Ensures task state persists across sessions

**Stop Hook** (incremental-workflow via verify-clean-state.sh):
- Checks for uncommitted changes
- Blocks stopping if working tree is dirty
- Forces clean handoffs between sessions

**Stop Hook** (project-bootstrap via bootstrap-commit.sh):
- Auto-commits bootstrap files after initialization
- Creates clean starting point for development

### Files Created

| File/Directory | Purpose |
|----------------|---------|
| `.claude/tasks/*.json` | Task persistence files (auto-managed by hooks) |
| `claude-progress.txt` | Session-by-session log with assumptions documented |
| `init.sh` | Script to set up development environment |

### Task Structure

Tasks are stored in Claude Code's native task system with rich metadata:

```json
{
  "taskId": "task-001",
  "subject": "Set up Next.js 14 project with App Router",
  "description": "Initialize Next.js project with TypeScript, App Router, and configure Tailwind CSS.\n\nVerification:\n- npm run dev works\n- TypeScript compiles\n- Tailwind styles apply",
  "activeForm": "Setting up Next.js project",
  "status": "pending",
  "metadata": {
    "priority": 1,
    "category": "core",
    "documentation": [
      "https://nextjs.org/docs/getting-started/installation",
      "https://nextjs.org/docs/app/building-your-application/routing"
    ],
    "verification": [
      "npm run dev works",
      "TypeScript compiles",
      "Tailwind styles apply"
    ],
    "tech_stack": {
      "frontend": "Next.js 14",
      "backend": "Node.js with tRPC",
      "database": "PostgreSQL with Drizzle ORM",
      "auth": "Better Auth"
    }
  }
}
```

**Metadata Fields:**
- `priority`: 1 (critical) → 5 (nice-to-have)
- `category`: `core`, `functional`, `ui`, `integration`, `polish`
- `documentation`: Array of relevant documentation URLs
- `verification`: Array of verification steps
- `tech_stack`: Technology choices (stored in first task only)

## Directory Structure

```
harness/
├── .claude-plugin/
│   └── plugin.json          # Plugin manifest
├── agents/
│   ├── project-bootstrap.md # Bootstrap agent (asks questions, creates tasks)
│   └── incremental-workflow.md # Workflow agent (implements tasks)
├── commands/
│   ├── init.md              # /harness:init
│   ├── continue.md          # /harness:continue
│   └── status.md            # /harness:status
├── hooks/
│   └── hooks.json           # Hook configuration
├── scripts/
│   ├── setup-harness.sh     # SessionStart hook (creates .claude/tasks/)
│   ├── load-tasks.sh        # Loads persisted tasks
│   ├── save-tasks.sh        # SessionEnd hook (saves tasks)
│   ├── bootstrap-commit.sh  # Stop hook for bootstrap
│   ├── verify-clean-state.sh # Stop hook for workflow
│   └── migrate-features.sh  # Migration utility
└── README.md
```

## Requirements

- Claude Code 1.0.33+ (for plugins with agents and hooks)
- Git (for commit hooks)
- No external dependencies (uses native task tools)

## Migration from feature_list.json

If you have an existing project using the old `feature_list.json` format, you can migrate to the task-based system:

### Automatic Migration

The `/harness:init` command now detects existing `feature_list.json` files and offers to migrate them:

```bash
# If feature_list.json exists
/harness:init

# Follow the prompts to migrate to task-based system
```

The migration will:
1. Convert each feature to a native task
2. Preserve priority, category, verification, and documentation
3. Back up `feature_list.json` to `feature_list.json.bak`
4. Create `.claude/tasks/` directory with persisted tasks

### Manual Migration

If you prefer manual migration, use the migration script:

```bash
# Run the migration script directly
./.claude/scripts/migrate-features.sh

# Or from the plugin directory
./scripts/migrate-features.sh /path/to/your/project
```

This creates a migration guide in `MIGRATION_GUIDE.json` that you can use with Claude Code's task tools.

## Benefits of Native Task Integration

The task-based architecture provides significant advantages over the previous `feature_list.json` approach:

### Visual Progress Tracking
- Tasks appear directly in Claude Code's native UI
- See task status, priorities, and categories at a glance
- No need to open external files to check progress

### No External Dependencies
- No `jq` installation required
- Uses Claude Code's built-in TaskCreate, TaskList, TaskGet, TaskUpdate tools
- Cleaner, more maintainable codebase

### Better Persistence
- SessionStart/SessionEnd hooks automatically manage task state
- Tasks persist across sessions via `.claude/tasks/*.json`
- Git-friendly format for team collaboration

### Rich Metadata Support
- Store priority, category, verification steps, and documentation URLs
- Access task details programmatically via TaskGet
- Filter and query tasks by metadata

### Cleaner Hook System
- Removed need for prevent-feature-edit.sh hook
- Task specifications are naturally immutable via TaskCreate
- Simpler hook configuration in hooks.json

### Improved Developer Experience
- Native tool integration feels more natural
- Better error messages from built-in tools
- Consistent API across all task operations

## Tips

### Starting Fresh
If you want to restart a project, delete the generated files:
```bash
rm -rf .claude/tasks/ claude-progress.txt init.sh
```

### Resuming After Context Limits
If a session hits context limits before completing, just run `/harness:continue` again. It picks up where it left off thanks to task persistence.

### Viewing Progress
Progress is visible directly in the Claude Code UI via the native task panel. You can also check:

```bash
# View progress log
cat claude-progress.txt

# List persisted task files
ls -la .claude/tasks/

# View git history
git log --oneline
```

### Managing the Plugin
```bash
# Update to latest version
/plugin update harness@mikkelkrogsholm-harness

# Disable temporarily
/plugin disable harness@mikkelkrogsholm-harness

# Re-enable
/plugin enable harness@mikkelkrogsholm-harness

# Uninstall
/plugin uninstall harness@mikkelkrogsholm-harness
```

## License

MIT
Keywords

Commands

Documentation