Harnesses for Long-Running Agents

Overview

The core problem: Long-running agents work in discrete sessions, and each new context window starts with no memory of the last, like engineers arriving for a shift with no record of what the previous shift did. Compaction alone isn't sufficient: agents tend to try to one-shot the whole task or to declare completion prematurely.
Two-part solution: An initializer agent sets up the environment on first run, while a coding agent makes incremental progress in subsequent sessions, leaving clear artifacts for the next session.
Feature list scaffolding: The initializer creates a comprehensive JSON file of feature requirements (200+ for a web app clone), all initially marked "failing," giving coding agents a clear outline of what full functionality looks like.
Incremental progress with clean states: Coding agents work on one feature at a time, commit to git with descriptive messages, and write progress summaries, so a bad change can be reverted and the next session does not have to guess what was done.
Testing with browser automation: Explicit prompting to use tools like Puppeteer MCP for end-to-end testing dramatically improved performance, catching bugs invisible from code alone.
Session startup routine: Each coding agent runs pwd, reads git logs and progress files, runs init.sh to start the dev server, and verifies basic functionality before implementing new features.
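
The feature list and the one-feature-at-a-time rule above can be sketched in a few lines of Python. This is a minimal illustration, not the harness's actual code: the file name feature_list.json, the field names (id, description, passes), and all three helper functions are assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical file name; the source only says the list is a JSON file.
FEATURE_FILE = Path("feature_list.json")


def init_feature_list(descriptions, path=FEATURE_FILE):
    """Initializer step: write every feature to disk, all marked as failing."""
    features = [
        {"id": i, "description": d, "passes": False}
        for i, d in enumerate(descriptions, start=1)
    ]
    path.write_text(json.dumps(features, indent=2))
    return features


def next_failing_feature(path=FEATURE_FILE):
    """Coding-agent step: return the first feature not yet passing, or None."""
    features = json.loads(path.read_text())
    return next((f for f in features if not f["passes"]), None)


def mark_passing(feature_id, path=FEATURE_FILE):
    """Flip one feature to passing once it has been verified end to end."""
    features = json.loads(path.read_text())
    for f in features:
        if f["id"] == feature_id:
            f["passes"] = True
    path.write_text(json.dumps(features, indent=2))
```

A session would call next_failing_feature(), implement and test that one feature, then call mark_passing() before committing. Keeping the state in a file rather than in the conversation is what lets a fresh context window pick up where the last one stopped.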

Takeaways

Anthropic developed this approach for the Claude Agent SDK to solve context window limitations. The key insight is structured handoffs—feature lists, git commits, and progress files—that let each new agent quickly understand project state.

In practice, an agent starting with a fresh context window gets up to speed by reading the claude-progress.txt file alongside the git history.
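
As a rough sketch of that handoff, the Python below appends a summary at the end of a session and reads the most recent summaries at the start of the next one. Only the file name claude-progress.txt comes from the text; the entry layout, the helper names, and the choice of 20 commits for the git log command are assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

PROGRESS_FILE = Path("claude-progress.txt")  # file name taken from the text

# Command a session could run to review recent history; 20 is an arbitrary choice.
GIT_LOG_CMD = ["git", "log", "--oneline", "-n", "20"]


def append_progress(summary, next_steps, path=PROGRESS_FILE):
    """End of session: record what was done and what should come next."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Hypothetical entry layout; entries are separated by a blank line.
    entry = f"## Session {stamp}\nDone: {summary}\nNext: {next_steps}\n\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)


def read_recent_progress(path=PROGRESS_FILE, max_entries=3):
    """Start of session: return the last few entries, oldest first."""
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8")
    entries = [e.strip() for e in text.split("\n\n") if e.strip()]
    return entries[-max_entries:]
```

The sketch assumes summaries contain no blank lines, since a blank line is used as the entry separator. Appending rather than overwriting keeps earlier sessions visible, which complements the commit messages that GIT_LOG_CMD would surface.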