
Claude implementation harness scaffolding, the five layers, file by file

Most writing on Claude harness engineering stops at the theory: tools, memory, context, permissions. Useful, but a reader who wants to build one is left guessing at the actual files. This is the opposite: the production harness mk0r ships, broken into five layers with the real file paths, real line counts, and real behavior.

Matthew Diakonov · 11 min read
Direct answer, verified 2026-05-12

A real Claude implementation harness has five layers, each a separate file you can open today:

  1. Process boot: docker/e2b/files/opt/startup.sh (71 lines)
  2. Agent protocol bridge: docker/e2b/files/opt/acp-bridge.js (799 lines)
  3. MCP toolbelt: src/core/e2b.ts buildMcpServersConfig, lines 175 to 213
  4. Layered context: src/core/vm-claude-md.ts (2,438 lines, generates 3 files)
  5. Event-capture instrumentation: docker/e2b/files/opt/patched-acp-entry.mjs (intercepts 6 dropped event types)

All five files live in the same repo. Boot order: 1 → 2 → 3 → 4 → 5.

Why the harness matters more than the model

A wrapper sends a prompt and prints a reply. A harness gives the model an environment to act inside, with a filesystem, a tool surface, a permission gate, a memory layer, and a feedback loop that closes when the work is verified. Anthropic's own engineering post on this estimates that roughly 98% of an agent's observed behavior comes from what surrounds the model, not from the model itself. The corollary is unsettling: the choice of GPT versus Claude versus the next thing matters less than the harness you put around whichever one you picked.

That puts a lot of weight on the scaffolding. So this page does something the existing writing on harnesses does not: walks through one real harness, top to bottom, with every layer anchored to a file you can open and grep.

The five layers, in boot order

These layers boot bottom-up: process boot starts the daemons the bridge needs, the bridge connects the agent that the MCP toolbelt extends, the MCP toolbelt provides the tools that the layered context teaches the agent to use, and the event-capture instrumentation sits on top so the client UI can finally see what the agent is doing.

Layer 1, process boot

What gets started before the agent's first turn. The OS daemons, the browser, the dev server, the tunnels. Boring infrastructure that has to come up in the right order or every subsequent layer falls over.

In mk0r this is 71 lines of bash at docker/e2b/files/opt/startup.sh. Notice the absence of set -e: a single failing daemon should not tear down the whole boot. Each background process is fire-and-forget except the final node /opt/proxy.js which is exec'd as PID 1. The pattern that matters: every later layer assumes Chromium's CDP is on 9222, Playwright MCP is on 3001, the ACP bridge is on 3002, and Vite is on 5173. Those port numbers are the API contract between the layers.
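
If you wrote that contract down in one place, it might look like the constants module below. This is documentation, not real code from the repo: mk0r hard-codes these numbers in each layer rather than importing a shared module.

// Hypothetical ports module; mk0r hard-codes these numbers in each
// layer, so treat this as a written-down contract, not shipped code.
export const PORTS = {
  cdp: 9222,            // Chromium DevTools Protocol, consumed by Playwright MCP
  playwrightMcp: 3001,  // Playwright MCP server
  acpBridge: 3002,      // JSON-RPC bridge to the Claude Code agent
  vite: 5173,           // dev server for the app under construction
  publicProxy: 3000,    // the only foreground process, PID 1
  vnc: 5900,            // x11vnc
  vncWebSocket: 5901,   // websockify, what the browser client connects to
} as const;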

Layer 2, agent protocol bridge

A process that owns one stdin/stdout pipe to the Claude Code agent and exposes JSON-RPC over HTTP to the host. Every prompt, every cancel, every credential refresh goes through here.

docker/e2b/files/opt/acp-bridge.js, 799 lines. The interesting bits: it manages a credentials file at /root/.claude/.credentials.json and never overwrites a newer on-disk token with a stale host-pushed one (Claude refresh tokens rotate and are single-use, so blindly overwriting forces re-auth). It buffers the last 200 stderr lines from the ACP subprocess so when the subprocess dies you get a death rattle instead of a bare error. And it spawns either /opt/patched-acp-entry.mjs (the wrapped agent) or falls back to the stock @agentclientprotocol/claude-agent-acp@0.25.0 entry on older templates.
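
The token rule is worth seeing in miniature. A minimal sketch of the guard, assuming the credentials file carries an expiresAt timestamp; the real field names in acp-bridge.js may differ.

// Sketch only: field names are assumptions, not the real file layout.
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const CREDS_PATH = "/root/.claude/.credentials.json";

function acceptHostCredentials(pushed: { refreshToken: string; expiresAt: number }): boolean {
  if (existsSync(CREDS_PATH)) {
    const onDisk = JSON.parse(readFileSync(CREDS_PATH, "utf8"));
    // Refresh tokens rotate and are single-use. If the agent already
    // refreshed, the on-disk token is newer; clobbering it with the
    // host's stale copy would force the user to re-authenticate.
    if (onDisk.expiresAt >= pushed.expiresAt) return false;
  }
  writeFileSync(CREDS_PATH, JSON.stringify(pushed, null, 2));
  return true;
}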

Layer 3, MCP toolbelt

The list of named MCP servers the agent can call. Each one expands the agent's tool surface. Playwright gives it a browser. A scheduler gives it cron. A provisioning server gives it the ability to ask for a database.

src/core/e2b.ts function buildMcpServersConfig at lines 175-213 declares three: playwright (the @playwright/mcp@0.0.70 binary spawned with --cdp-endpoint http://127.0.0.1:9222), scheduler (/opt/scheduler-mcp.js, reads /run/mk0r-session.json at tool-call time so paused-and-resumed sandboxes still authenticate), and mk0r-provisioning (/opt/provisioning-mcp.js, lazily creates a Resend audience or a Neon Postgres on demand and writes the env vars into /app/.env). The shape of the config block is the same shape Claude Code itself reads from ~/.claude/settings.json, so any MCP server you build elsewhere works here unchanged.

Layer 4, layered context

The persistent instructions the agent reads before every session. Role, memory rules, design guardrails, file conventions. The single highest-leverage place to change agent behavior, because the model is the same and the tools are the same, but a better CLAUDE.md ships better apps.

src/core/vm-claude-md.ts at 2,438 lines generates three files: /root/.claude/CLAUDE.md (global, applies to any project the agent touches), /app/CLAUDE.md (project, Vite + React + Tailwind v4 specifics), and /root/.claude/settings.json (permission mode: acceptEdits). It also installs six pre-built skills under /root/.claude/skills/ (frontend-design, copywriting, backend-services, algorithmic-art, website-builder, seo-page) so the agent can match a user request to a skill and load a deeper playbook than the global config carries. The whole context layer is data, not code, which means iterating on agent behavior is iterating on prose.
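
The output side is easy to picture. A toy sketch of the write step: the real generator spends its 2,438 lines composing the markdown bodies, which are elided here, and the settings.json shape is an assumption based on Claude Code's documented permission modes.

// Toy sketch; the prose-building that makes up most of vm-claude-md.ts
// is elided. The settings.json shape is an assumption.
import { mkdirSync, writeFileSync } from "node:fs";

function writeContextLayer(globalMd: string, projectMd: string): void {
  mkdirSync("/root/.claude", { recursive: true });
  writeFileSync("/root/.claude/CLAUDE.md", globalMd); // global: any project the agent touches
  writeFileSync("/app/CLAUDE.md", projectMd);         // project: Vite + React + Tailwind specifics
  writeFileSync(
    "/root/.claude/settings.json",
    JSON.stringify({ permissions: { defaultMode: "acceptEdits" } }, null, 2),
  );
}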

Layer 5, event-capture instrumentation

The stock Claude Code agent quietly drops several session-update event types. If you want a UI that shows real progress, real rate-limit state, and real context compaction, you have to recapture them.

docker/e2b/files/opt/patched-acp-entry.mjs wraps ClaudeAcpAgent.prototype.createSession and replaces session.query.next() with a function that forwards six event types the stock SDK ignores: api_retry, rate_limit_event, compact_boundary, task_notification, tool_progress, and tool_use_summary. It also records the per-turn cost in USD (session._lastCostUsd) and the typed last-error (session._lastApiError) so the bridge can attach _meta.lastApiError to the prompt response. This is the file no one writes about because it only matters when you ship the harness to real users. Then it matters every day.

Layer 1 in code, the boot script

The whole boot is 71 lines. Trimmed for readability, the shape looks like this. Every later layer assumes these ports are open and these daemons are alive.

# docker/e2b/files/opt/startup.sh  (71 lines total)

# Layer 1: the virtual display so the agent has a screen
Xvfb :99 -screen 0 1600x1600x24 -ac &
export DISPLAY=:99

# Layer 1: the residential proxy daemon (dormant until configured)
node /opt/brd-proxy.js &

# Layer 1: Chromium with CDP open on 9222
chromium --no-sandbox --disable-gpu \
  --user-data-dir=/root/.chromium-profile \
  --proxy-server=http://127.0.0.1:3003 \
  --remote-debugging-port=9222 about:blank &

# Layer 1: VNC and WebSocket bridge (the user watches here)
x11vnc -display :99 -nopw -forever -shared -rfbport 5900 -q &
websockify 0.0.0.0:5901 localhost:5900 &

# Layer 3 of the harness: Playwright MCP wired to CDP
npx @playwright/mcp --cdp-endpoint http://127.0.0.1:9222 \
  --port 3001 --host 0.0.0.0 --allowed-hosts '*' &

# Layer 2: the agent protocol bridge
node /opt/acp-bridge.js &

# Vite dev server for the app the agent is about to write
cd /app && npx vite --host 0.0.0.0 --port 5173 &

# The single foreground process, the public proxy
exec node /opt/proxy.js

The deliberate omission of set -e at the top of the script is the part most people get wrong on the first pass. A failing VNC daemon must not tear down the agent. The single foreground process, the public proxy on port 3000, is the only thing whose death should end the container.

Layer 3 in code, the MCP toolbelt

One block, three servers. The shape of this block is identical to what Claude Code reads from ~/.claude/settings.json, which means any MCP server you build elsewhere drops in here unchanged.

// src/core/e2b.ts  (lines 175-213)

export function buildMcpServersConfig(): Array<Record<string, unknown>> {
  return [
    {
      name: "playwright",
      command: "npx",
      args: [
        "@playwright/mcp",
        "--cdp-endpoint",
        "http://127.0.0.1:9222",
      ],
      env: [
        {
          name: "PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH",
          value: "/usr/bin/chromium",
        },
      ],
    },
    {
      // Reads /run/mk0r-session.json at tool-call time so pool reuse
      // and reconnect-after-pause both work without restarting.
      name: "scheduler",
      command: "node",
      args: ["/opt/scheduler-mcp.js"],
      env: [],
    },
    {
      // Lazy on-demand provisioning of Resend + Neon Postgres.
      name: "mk0r-provisioning",
      command: "node",
      args: ["/opt/provisioning-mcp.js"],
      env: [],
    },
  ];
}

The two custom servers, scheduler and mk0r-provisioning, both read /run/mk0r-session.json at tool-call time rather than at startup. That tiny detail matters: when a sandbox is paused and later resumed, or when a pool sandbox is reassigned to a new project, the session credentials change, but the MCP subprocesses are already running. Reading at call time, not startup time, is what makes the pool work.
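
In code, the difference is one line of placement. A sketch, with the session-file fields invented for illustration; the real layout of /run/mk0r-session.json is not documented here.

// Sketch of the call-time read; field names are made up.
import { readFileSync } from "node:fs";

function currentSession(): { projectId: string; token: string } {
  // Read inside the tool handler, never cached at module load:
  // pause/resume and pool reassignment rewrite this file while the
  // MCP subprocess keeps running.
  return JSON.parse(readFileSync("/run/mk0r-session.json", "utf8"));
}

async function handleScheduleTool(args: { cron: string; prompt: string }) {
  const { projectId, token } = currentSession(); // fresh on every call
  // ... authenticate the scheduling call with token, scoped to projectId ...
}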

Layer 5 in code, the patched ACP entry

This is the file that earns its own section. The stock @agentclientprotocol/claude-agent-acp@0.25.0 entry point ships, runs, and on its surface looks fine. The problem is that several session-update event types come out of the Claude Code SDK and never make it to the client. Without them, your UI lies about progress during long tool calls, disappears during context compaction, and silently freezes during rate limits.

mk0r ships a patched entry, around 200 lines, that wraps ClaudeAcpAgent.prototype.createSession and replaces session.query.next() with a function that forwards six specific event types the stock SDK drops: api_retry, rate_limit_event, compact_boundary, task_notification, tool_progress, and tool_use_summary. It also captures per-turn cost (session._lastCostUsd) and the typed last error (session._lastApiError) so the bridge can attach a _meta block to the prompt response.

// docker/e2b/files/opt/patched-acp-entry.mjs

// Patch query.next() so events the stock SDK drops still reach the client.
const originalNext = session.query.next.bind(session.query);
session.query.next = async function (...args) {
  const item = await originalNext(...args);

  if (item.value?.type === "system") {
    const subtype = item.value.subtype;
    if (subtype === "compact_boundary") { /* forward */ }
    else if (subtype === "task_notification") { /* forward */ }
    else if (subtype === "api_retry") { /* forward + record */ }
  }

  if (item.value?.type === "rate_limit_event") {
    // includes resetsAt, utilization, overageStatus, isUsingOverage
    /* forward */
  }

  if (item.value?.type === "tool_progress")    { /* forward */ }
  if (item.value?.type === "tool_use_summary") { /* forward */ }

  return item;
};

If you write a harness and your "agent is working" spinner stops moving for 30 seconds at a time, this is almost certainly why. The agent did not freeze. The event that says it's still working never made it through.
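
What a /* forward */ branch expands to depends on the ACP version. The shape is roughly the sketch below, with the connection API and update payload treated as assumptions rather than lines from the real patched-acp-entry.mjs.

// Hypothetical expansion of one forward branch; method and field names
// follow the general shape of ACP session-update notifications but are
// not copied from the real file.
function forwardToolProgress(
  connection: { sessionUpdate(params: unknown): void },
  sessionId: string,
  ev: { toolUseId: string; elapsedMs: number },
): void {
  // Re-emit the dropped SDK event as a session update so the client's
  // "still running" indicator keeps moving during slow tool calls.
  connection.sessionUpdate({
    sessionId,
    update: {
      sessionUpdate: "tool_call_update",
      toolCallId: ev.toolUseId,
      _meta: { elapsedMs: ev.elapsedMs },
    },
  });
}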

Harness versus wrapper, side by side

A useful way to feel the gap is to put a harness next to the shape most people start with, a thin HTTP wrapper that POSTs to the API. The wrapper is fine for a chatbot. It falls apart the moment you want the agent to ship working code.

Feature            | Wrapper (raw API call)                             | Harness (mk0r-style)
Surface area       | Stdin and stdout                                   | OS, filesystem, real browser, tool servers
State across turns | None unless you build it                           | Persistent /root and /app, pause-and-resume sandbox
Tool surface       | Whatever you remember to put in the system prompt  | MCP servers register tools the agent discovers
Verification       | User reads the reply and checks manually           | Agent navigates and snapshots its own output
Observability      | HTTP 200 with a body                               | Cost, usage, retries, rate-limit state per turn
Failure mode       | Opaque exception                                   | Death rattle in stderr buffer, typed error metadata

Where Anthropic and the open reference harnesses sit

For context: Anthropic's engineering post on effective harnesses for long-running agents is the canonical theory. Chachamaru's claude-code-harness on GitHub is a worker / reviewer / scaffolder layout. cc-mini is roughly 1,000 lines of Python that show you what the minimum looks like. None of them ship as a product to non-technical users, so none of them have to solve the boring parts: paused sandboxes that resume cleanly, rotated refresh tokens that survive a host restart, MCP servers that read session creds at call time instead of startup, and a patched entry point that forwards the events the stock SDK drops.

None of those problems are interesting in isolation. All of them are the difference between a harness you can demo and a harness you can run.

What to copy if you're building your own

Three patterns are worth lifting whole, regardless of what your harness is for:

  1. Pin every harness dependency, including transitive ones. A floating @playwright/mcp version is the same kind of mistake as a floating database driver version. The harness is a contract; treat it like one.
  2. Buffer the agent subprocess's stderr. The first time it dies, you will want the last 200 lines, not the bare error message. mk0r keeps a rolling buffer at the bridge level and attaches it to the rejection (a sketch of the pattern follows this list).
  3. Patch the agent SDK if you have to. Forking a dependency feels heavy, but if the cost is 200 lines of instrumentation and the benefit is a UI that does not lie about progress, fork it.
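
For the second pattern, the buffer itself is a few lines. A minimal sketch, assuming a Node bridge and mk0r's 200-line cap; the variable names are illustrative, not lifted from acp-bridge.js.

// Minimal sketch of the rolling stderr buffer.
import { spawn } from "node:child_process";

const MAX_LINES = 200;
const stderrTail: string[] = [];

const agent = spawn("node", ["/opt/patched-acp-entry.mjs"]);

agent.stderr.setEncoding("utf8");
agent.stderr.on("data", (chunk: string) => {
  for (const line of chunk.split("\n")) {
    stderrTail.push(line);
    if (stderrTail.length > MAX_LINES) stderrTail.shift(); // keep the last 200
  }
});

agent.on("exit", (code) => {
  // The death rattle: attach the tail to the error instead of
  // surfacing a bare "subprocess exited" message.
  const err = new Error(`agent exited with code ${code}`);
  (err as Error & { stderrTail?: string }).stderrTail = stderrTail.join("\n");
  // ... reject any in-flight prompt with err ...
});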

The honest limits

mk0r's harness is shaped for one job: take a one-sentence app idea from a non-technical user and ship a working Vite + React app, in under 30 seconds for static apps and within a few minutes for ones that need a database. It is not a general-purpose Claude Code replacement. If you need parallel agents in a multi-process workflow, look at Anthropic's C-compiler post. If you need something small enough to read end-to-end, cc-mini is honest. What mk0r offers is a working answer to one specific question: what does a Claude harness look like when the user is not a developer?

Want to look at the harness with someone who built it?

Half an hour, screenshare on the repo, ask whatever you want about the wiring.

Frequently asked questions

What does a Claude implementation harness actually contain?

Five layers stacked on top of the model. (1) Process boot: a script that starts the OS-level services the agent will touch, in mk0r's case a 71-line docker/e2b/files/opt/startup.sh that brings up Xvfb, Chromium with CDP on port 9222, x11vnc, websockify, Playwright MCP on 3001, the ACP bridge on 3002, Vite on 5173, and a reverse proxy on 3000. (2) Agent protocol bridge: a process that translates between the host's JSON-RPC client and the Claude Code SDK's stdio interface, mk0r's is docker/e2b/files/opt/acp-bridge.js at 799 lines. (3) MCP toolbelt: a config block listing the MCP servers the agent can call, in mk0r at src/core/e2b.ts lines 175-213, declaring playwright, scheduler, and mk0r-provisioning. (4) Layered context: a global CLAUDE.md at /root/.claude/CLAUDE.md, a project CLAUDE.md at /app/CLAUDE.md, and a settings.json with permissions, all generated by src/core/vm-claude-md.ts (2,438 lines). (5) Event-capture instrumentation: a patched ACP entry point at docker/e2b/files/opt/patched-acp-entry.mjs that wraps query.next() and forwards six event types the stock @agentclientprotocol/claude-agent-acp@0.25.0 silently drops.

What is the difference between a harness and a wrapper that calls the Claude API?

A wrapper sends a prompt and prints the reply. A harness gives the model an environment it can act inside: a filesystem to write to, tools to call, a stdout to observe, a way to ask for permission, a memory layer that survives between turns, and a feedback loop that closes when the work is verified. The Anthropic engineering post on harnesses calls this 'everything you build around the model.' mk0r is one example. Claude Code itself is another. cc-mini is an open-source reference. They differ in what the agent can touch, not in what the model is.

Why use ACP (Agent Client Protocol) instead of the raw Anthropic API?

ACP is the protocol Claude Code speaks. By spawning @agentclientprotocol/claude-agent-acp@0.25.0 in a subprocess and piping JSON-RPC over its stdin/stdout, you inherit the agent's full toolchain (file edits, bash, todo lists, planning, memory) instead of rebuilding them. The cost is the protocol's own quirks: the stock entry point drops several session-update events that you need for an honest UI. mk0r's fix is a 200-line patched-acp-entry.mjs that wraps the agent and forwards those events through.
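
The mechanics are a subprocess and newline-delimited JSON. A sketch of the spawn-and-pipe step, with the initialize params simplified to the idea rather than copied from the ACP schema:

// Sketch of the bridge's spawn step; initialize params abbreviated.
import { spawn } from "node:child_process";

// Default stdio is pipes: JSON-RPC rides stdin/stdout, stderr gets buffered.
const acp = spawn("node", ["/opt/patched-acp-entry.mjs"]);

// One JSON-RPC message per line, newline-delimited.
function send(msg: object): void {
  acp.stdin.write(JSON.stringify(msg) + "\n");
}

send({ jsonrpc: "2.0", id: 1, method: "initialize", params: { protocolVersion: 1 } });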

Which events does the stock ACP agent drop, and why does it matter?

Six of them, all forwarded by mk0r's patched entry: api_retry (so the UI can show 'retrying because of rate limit'), rate_limit_event (so the UI can show the actual reset time and overage status), compact_boundary (so the UI can show when the agent compressed its context), task_notification (so the UI can show task plan transitions), tool_progress (so the UI can show 'still running, elapsed 8s' for slow tools), and tool_use_summary (so the UI can collapse a chain of tool calls into one human-readable summary). Without these, your harness UI either lies about progress or appears frozen during compaction and rate limits.

Do I need three CLAUDE.md files or is one enough?

One is enough for a hobby project, three are useful when the agent works inside many different repos. mk0r has a global CLAUDE.md at /root/.claude/CLAUDE.md (role identity, memory rules, design guardrails, browser testing rules), a project CLAUDE.md at /app/CLAUDE.md (Vite + React + Tailwind specifics, MCP discovery, file conventions for this stack), and a settings.json with the permission mode. The split lets the global file stay stable across every new app the agent builds, while the project file changes with the stack. Both files are generated in src/core/vm-claude-md.ts.

How does the agent get its hands on a real browser?

Chromium boots with --remote-debugging-port=9222 in startup.sh. Then @playwright/mcp is pinned at version 0.0.70 in the Dockerfile and started with --cdp-endpoint http://127.0.0.1:9222 --port 3001. The MCP server declares its tools (navigate, snapshot, click, fill, console messages) over MCP, the agent sees them in the tool list, and every browser action goes through Chrome DevTools Protocol on a real running browser. There is no screenshot OCR and no headless trickery. The same browser is exported over VNC at port 5901 so the user can watch the agent click.

Why pin every dependency in the Dockerfile?

Because the harness is the contract. If @playwright/mcp ships a breaking change to its CDP wire format, the agent's browser tools silently start failing in production with no code change on your side. The mk0r Dockerfile pins @playwright/mcp@0.0.70 and @agentclientprotocol/claude-agent-acp@0.25.0 explicitly. The lesson from running this in production for months: floating versions on harness primitives is the same kind of mistake as floating versions on a database driver.

What does the harness do that the model cannot?

Boot the OS-level services, route auth, persist credentials between turns, enforce permissions, expose a stable tool surface, recover from a paused VM, write structured logs, and verify work happened. Anthropic's own post estimates roughly 98% of an agent's behavior comes from this scaffolding, not from the prompt. The decisions to put Playwright behind MCP, to use ACP instead of the raw API, to layer the CLAUDE.md files, and to write a patched entry that recaptures dropped events: those are the harness, and they determine whether the agent feels reliable or feels like a demo.

Can I see this harness running before I read the code?

Open mk0r.com with no signup, type one sentence describing an app, and watch a VNC stream of the agent driving Chromium while writing files into a sandboxed /app. Every harness primitive in this page is in that session: the ACP bridge between you and the agent, the patched entry forwarding events, the MCP toolbelt the agent calls, the three CLAUDE.md files shaping its behavior, and the startup.sh that wired it all together a few seconds before you typed.
