What this unit solves
The same model behaves wildly differently depending on the execution shell around it. The harness is that shell: tool set, permission boundaries, memory, loop control, observability, kill switch. The model sets the capability ceiling; the harness determines how much of it you actually get, how stable it is, whether results can be reproduced, and whether things can be contained when they go wrong. This unit unpacks the components and design criteria of a harness so you have a clear evaluation framework when choosing or building an agent system, rather than stumbling toward a working configuration by trial and error.
Learning objectives
- Explain in your own words where the boundary between “model” and “harness” lies, and name concrete problems that arise from confusing the two.
- List the six components of a sound harness and use them to audit an existing tool or a self-built system.
- Design sensible security boundaries: approval gate, sandbox isolation, heartbeat dead-man switch, kill switch, and know which of these cannot be skipped in which situations.
- Explain why “reproducible” matters more than “clever”, and identify the minimum observability configuration that makes agent behavior reproducible.
1. What a harness is: the model sets the ceiling; the harness determines what you get
Start with a comparison. The same Claude model, driven two ways: one approach is a bare loop you write yourself (throw a prompt, receive a response, manually execute the tool calls the model requests, paste the results back, no permission checks, no step limit, no logs of any kind). The other is Claude Code, the same model, but wrapped in permission rules, a sandbox, hooks, session persistence, and an interruptible loop. The stability, reproducibility, and safety of the two differ by an order of magnitude. The difference is not the model; it is the harness.
Anthropic describes the basic building block of an agentic system as “an LLM augmented with retrieval, tools, memory, and other capabilities” [1]. That framing draws the boundary:
- The model owns: reasoning, generation, deciding “which tool to call next, what to remember, what to look up.”
- The harness owns: how tools are actually executed, where permission boundaries sit, how state is persisted, when the loop stops, who applies the brakes when something goes wrong, and whether a record of the whole process is kept.
The model will not stop its own rm -rf, will not cry halt at step 50 of an infinite retry, and will not terminate a process when the agent locks up. Those are the harness’s responsibilities. Conflating the two leads you to believe “a well-written prompt means safety,” while the actual safety and stability gap goes completely unaddressed.
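To make the division of labor concrete, here is a minimal sketch of a loop in which the harness, not the model, owns execution, permission checks, and the brakes. Everything in it is an assumption for illustration: `next_action` is a hypothetical stand-in for the model, and `run_harnessed`, `ToolCall`, and `RunResult` are invented names, not part of any SDK.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class RunResult:
    status: str                       # "done", "denied", "timeout", or "max_steps"
    steps: int                        # tool calls attempted
    log: list = field(default_factory=list)


def run_harnessed(
    next_action: Callable[[list], Optional[ToolCall]],  # hypothetical stand-in for the model
    tools: Dict[str, Callable[..., str]],
    is_allowed: Callable[[ToolCall], bool],
    max_steps: int = 20,
    timeout_s: float = 60.0,
) -> RunResult:
    history: list = []
    log: list = []
    deadline = time.monotonic() + timeout_s
    for step in range(1, max_steps + 1):
        if time.monotonic() > deadline:          # wall-clock brake, checked between steps
            return RunResult("timeout", step - 1, log)
        call = next_action(history)              # the model decides what to do next
        if call is None:                         # the model reports the task is complete
            return RunResult("done", step - 1, log)
        if not is_allowed(call):                 # permission boundary enforced outside the model
            log.append({"step": step, "tool": call.name, "approval": "blocked"})
            return RunResult("denied", step, log)
        output = tools[call.name](**call.args)   # the harness, not the model, executes the tool
        history.append((call, output))
        log.append({"step": step, "tool": call.name, "approval": "allowed"})
    return RunResult("max_steps", max_steps, log)  # step-limit brake
```

A stub model that retries forever ends with `status == "max_steps"` instead of running indefinitely, and a stub that requests a disallowed tool ends with `status == "denied"`. Stopping the run on a denial is one design choice; feeding the denial back to the model is another. Note that the timeout here is only checked between steps, so a tool that hangs mid-call still needs an external supervisor.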
2. The six components of a sound harness
When auditing any agent system, whether a commercial tool or self-built, check these six components one by one and the gaps become immediately visible. The first four correspond to Anthropic’s augmented LLM and agent loop; the last two are what you must add when taking it into production.
Component 1: tool set and permission boundaries. The shape of a tool directly affects model decision quality. Anthropic recommends treating the agent-computer interface (ACI) with the same care as a human-computer interface (HCI), investing comparable effort: giving the model enough tokens to think before acting, using formats close to natural web text, and eliminating error-prone formatting overhead [1]. Beyond shape comes boundary: the files, domains, and command scopes each tool can reach determine the blast radius when something goes wrong. The principle of least privilege matters more in an agent context than in a traditional system, because the decision-maker is a model that improvises, not a program whose behavior is fixed and predictable.
Component 2: memory and state. Three layers, each with its applicable scenarios and contamination risks:
- Short-term (conversation context): the conversation history within a session. The Claude Agent SDK explicitly states that the context window within a session only grows, never resets: system prompt, tool definitions, conversation, tool inputs and outputs all accumulate [2]. This is precisely the context rot discussed in 01-4.
- Working memory (structured state within a session): storing intermediate results as key-value pairs or summaries rather than leaving them floating in the conversation where they dilute.
- Long-term (cross-session persistence): content loaded across sessions, such as CLAUDE.md and memory files (see 04-1). This is the highest-risk layer, because contaminated memory can quietly reassemble itself in a future session.
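The working-memory layer is the easiest of the three to sketch. The example below is a minimal illustration, assuming bounded key-value summaries are enough; `WorkingMemory` and its methods are hypothetical names, not an API from any SDK.

```python
import json


class WorkingMemory:
    """Structured per-session state: short summaries stored under stable keys."""

    def __init__(self, max_chars_per_entry=200):
        self._entries = {}
        self._max = max_chars_per_entry

    def put(self, key, summary):
        # Keep a bounded summary rather than the raw tool output, so one noisy
        # result cannot crowd out everything else. Re-using a key overwrites it.
        self._entries[key] = summary[: self._max]

    def render(self):
        # Compact block the harness can inject instead of replaying raw history.
        return json.dumps(self._entries, indent=2, sort_keys=True)
```

Overwriting by key is the point of the design: a stale intermediate result is replaced rather than left in the conversation next to its correction.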
session_id to resume, restoring files read, analyses done, and actions taken [2]). Section 4 expands on this.
Audit your current tool against all six
Take Claude Code as an example and map each component: tool set and permissions (permissions in settings.json), memory (session context + CLAUDE.md + memory files), loop termination (task-completion judgment + you can interrupt any time), observability (transcript + hook logs), security boundaries (permissions + sandbox + hook gate), reproducibility (session resume). All six have counterparts, which is why it is more stable than a bare API loop. Apply the same list to your self-built agent and ask “do I have this?” for each item. The first one you can’t answer is where it will break first.
3. Security boundaries: approval gate, sandbox, heartbeat, kill switch
These four are the parts of harness design most commonly skipped out of laziness, because a system without them still appears to run. The problem only surfaces at the moment something goes wrong, by which point it is usually too late.
Approval gate. Which actions proceed automatically and which require human confirmation? There is only one criterion: is the harm caused by this action reversible? Reversible actions such as reading files or running tests can be allowed through automatically. Irreversible or externally visible actions such as deleting files, git push, network egress, or writes to paths outside the repo should be held for confirmation. Claude Code’s permissions setting divides rules into three categories (allow, ask, deny), evaluated in the order deny, then ask, then allow; the first matching rule wins, so deny always takes precedence (as of 2026-05) [3].
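The deny, then ask, then allow ordering is simple to sketch for a self-built approval layer. The example below is an illustration under stated assumptions: the "kind:target" action strings, the glob patterns, and the `decide` function are all hypothetical and are not Claude Code's actual rule syntax. Falling back to "ask" when nothing matches is also an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical rules over "kind:target" action strings (illustrative syntax only).
RULES = {
    "deny": ["shell:rm -rf *", "read:*.env"],
    "ask": ["shell:git push*", "write:/etc/*", "net:*"],
    "allow": ["read:*", "shell:pytest*"],
}


def decide(action, rules=RULES, default="ask"):
    """Return "deny", "ask", or "allow" for one action string."""
    # Categories are checked in a fixed order, so a deny match always beats
    # an ask or allow rule that matches the same action.
    for verdict in ("deny", "ask", "allow"):
        if any(fnmatchcase(action, pattern) for pattern in rules.get(verdict, [])):
            return verdict
    # Assumption: anything no rule covers is held for a human.
    return default
```

With these rules, `decide("read:config/.env")` returns "deny" even though the broader `read:*` allow rule also matches, which is the precedence behavior the paragraph above describes.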
date +%s > /tmp/agent.heartbeat). The value of this mechanism is not in ordinary runs; it is in the one time the agent actually locks up.
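The heartbeat dead-man switch and the process-group kill switch fit together in one small supervisor. The sketch below is a minimal POSIX-only illustration, assuming the agent refreshes a heartbeat file on every loop iteration (any write that updates the file's modification time will do). The `supervise` function and its parameters are hypothetical names; the library calls are standard `subprocess` and `os` functions.

```python
import os
import signal
import subprocess
import time


def supervise(cmd, heartbeat_path, timeout_s, poll_s=0.5):
    """Run cmd in its own process group; kill the group if the heartbeat goes stale.

    Returns "exited" if the command finished by itself, "killed" otherwise.
    """
    # Write an initial heartbeat so startup time counts against the same timeout.
    with open(heartbeat_path, "w") as f:
        f.write(str(int(time.time())))
    # start_new_session=True makes the child call setsid(), so it leads a new
    # process group that anything it spawns inherits by default.
    proc = subprocess.Popen(cmd, start_new_session=True)
    while proc.poll() is None:
        age = time.time() - os.path.getmtime(heartbeat_path)
        if age > timeout_s:
            try:
                # Signal the whole group, not just the parent process.
                os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process exited between the check and the kill
            proc.wait()
            return "killed"
        time.sleep(poll_s)
    return "exited"
```

Signaling the group rather than the single PID is what prevents orphaned child processes. A production version would likely send SIGTERM first and write an audit record before escalating to SIGKILL; both are left out here to keep the sketch short.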
This section covers design criteria only; the threat model belongs in 03-3
This section is about which security mechanisms a harness should have and the design trade-offs involved. Concrete attack-surface analysis (prompt injection, supply chain, memory poisoning, relevant CVEs) belongs to the threat model and is covered in 03-3; it is not repeated here.
4. Observability and reproducibility: why “reproducible” is worth more than “clever”
An agent that occasionally produces impressive results but differs every time has lower engineering value than one that reliably produces 80-out-of-100 output. The reason is practical: behavior that cannot be reproduced cannot be improved, and cannot be demonstrated to stakeholders. “It worked that way last time” is not an engineering answer; it is a luck report.
Making behavior reproducible requires two things:
- Observability (seeing after the fact what happened). The minimum configuration is the structured log fields from Component 4 in Section 2. The key is using a structured log (one JSON entry per call) rather than a full transcript: the former is cheap, queryable, and sufficient; the latter is expensive and hard to search. Observability has an underrated secondary use: anomaly pattern detection. Hijacked or out-of-control runs typically show anomalous signals in the tool call trace (suddenly reading a pile of .env files, connecting to domains the agent has no business reaching) before you notice anything manually.
- Replayability and resumability (reconstructing state after the fact). The Claude Agent SDK’s session mechanism lets you grab a session_id and resume, restoring files previously read, analyses done, and actions taken [2]. At a minimum, preserve the tool call sequence; a full token stream is not needed.
grep approval=blocked or grep .env and land directly on the anomaly. That is what makes a structured log more valuable than a transcript.
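A structured log with one JSON entry per tool call takes only a few lines to write and to query. The sketch below is an illustration, not a prescribed schema: the field names (`ts`, `step`, `tool`, `args`, `approval`, `duration_ms`) and both function names are assumptions chosen for the example.

```python
import json
import time


def log_tool_call(path, *, step, tool, args, approval, duration_ms):
    """Append one JSON object per tool call (JSON Lines format)."""
    entry = {
        "ts": time.time(),
        "step": step,
        "tool": tool,
        "args": args,
        "approval": approval,       # e.g. "allowed" or "blocked"
        "duration_ms": duration_ms,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def find_anomalies(path):
    """Return entries that were blocked or whose arguments mention .env."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            touches_env = ".env" in json.dumps(entry["args"])
            if entry["approval"] == "blocked" or touches_env:
                hits.append(entry)
    return hits
```

Because each line is a complete JSON object, the same file also works with line-oriented tools such as grep, and the ordered `step` field preserves the tool call sequence needed for replay.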
5. What a mature harness looks like: from “it runs” to “well-designed”
The gap between “it runs” and “a well-designed harness” is clearest as a comparison table:
| Dimension | It runs (barely) | Well-designed harness |
|---|---|---|
| Permissions | Wide open, or --dangerously-skip-permissions | deny high-risk actions, ask for irreversible ones, allow the rest |
| Isolation | Runs directly on the host under your personal account | Untrusted work goes into a sandbox / container; network closed by default |
| Loop | No step limit; someone watches and shouts stop | Explicit success condition + maximum steps + timeout exit |
| Memory | Everything packed into the conversation window | Short-term / working / long-term layered; cross-session persistence |
| Observability | Dig through chat logs when something breaks | Structured log per tool call; replayable |
| Emergency stop | Close the terminal window | Kill process group + audit record + heartbeat |
- OpenClaw: see how its tool set design and approval gate are implemented, corresponding to Component 1 and Section 3 of this unit.
- Hermes-Agent: see how its memory layering and loop termination conditions are designed, corresponding to Components 2 and 3.
6. Positioning on the concept ladder: harness is the outermost layer, L4
Arranged from innermost to outermost (see 01-3, 01-4, 01-5), the harness is the outer shell:
Tool comparison
Mapping harness components to mainstream agent systems reveals consistent concepts under different names (as of 2026-05; exact mechanisms and configuration paths are subject to each vendor’s current documentation):
| Component | Anthropic Claude (primary) | OpenAI | Google Gemini | Self-built (bare SDK / framework) |
|---|---|---|---|---|
| Tool set and ACI | Claude Code tools + MCP; Agent SDK custom tools [2] | Agents SDK function tools | Gemini API function calling | Custom function schema |
| Permissions / approval gate | settings.json permissions (deny, ask, allow) [3] | Agents SDK guardrails / tool_use approval | Needs source verification | Implement your own approval layer |
| Loop termination | Task-completion judgment; Stop hook convergence sentinel | Agents SDK max_turns limit | Needs source verification | Set your own step / timeout limit |
| Memory and state | Session context + CLAUDE.md + memory files | Agents SDK sessions | Needs source verification | Manage your own state store |
| Observability | Transcript + hook logs | Agents SDK tracing | Needs source verification | Wire your own logging / OpenTelemetry |
| Sandbox / isolation | OS-level sandbox (Bash child processes only) [3] | Needs source verification | Needs source verification | Container / devcontainer |
The comparison table gives coordinates, not details
Exact names, versions, and configuration paths for each cell are fast-moving facts; entries that cannot be confirmed are marked “needs source verification”; refer to each vendor’s official documentation. The final column uses “self-built” instead of GitHub Copilot and Cursor (the usual comparison targets) because neither currently offers a programmable agent harness layer equivalent to the others; the natural comparison point for harness engineering is always “off-the-shelf tool” vs. “roll your own.” The table’s purpose is to tell you what the same harness component is called in your tool; for deeper configuration detail see Part II and Part IV.
Hands-on exercises
Two exercises: one audit, one hands-on fix for the component most commonly missing.
- Audit your existing agent configuration against the six components. Take the system you are currently using (Claude Code’s .claude/settings.json, or your self-built agent) and check each of the six components from Section 2 one by one: tool set and permissions, memory and state, loop termination, observability, security boundaries, reproducibility. For the first one you cannot check off, write down “in what scenario will this be the first thing to break?” Most people find the gap is observability or a kill switch, because neither is needed under normal conditions and both are needed the moment something goes wrong.
- Add a heartbeat to an unattended loop. Take the supervisor.sh skeleton from Section 3 and adapt it for your own agent: decide a reasonable TIMEOUT (slightly longer than the slowest normal single-step duration), add a one-line heartbeat write to each iteration of the agent loop, and use setsid to put it in its own process group. Verification: deliberately cause the agent to lock in an infinite loop and confirm that the supervisor actually kills the entire group after TIMEOUT, rather than leaving orphan processes behind.
Common pitfalls
Self-check
The bar for passing this unit
- Can you state in one sentence what “model” and “harness” each own? Have you hit or observed a concrete problem from confusing the two?
- Applying the six components from Section 2 to the agent tool you currently use, which items are missing? In what scenario would the first missing item be the first thing to break?
- Are your approval boundaries divided by action reversibility, or is everything allowed through / everything blocked? Give one action you would allow and one you would block.
- Does your unattended loop have a heartbeat and a process-group-level kill switch? Does the termination signal go to the parent process or the whole group?
- If you needed to reconstruct right now what your agent did in its last run, is your observability data sufficient? If not, what piece is missing?
Sources and further reading
Factual claims are grounded in official documentation; fast-changing items are annotated as of 2026-05. IEEE numbering convention used throughout.
- [1] E. Schluntz and B. Zhang, “Building Effective Agents,” Anthropic Engineering, Dec. 19, 2024. (augmented LLM as fundamental building block = LLM + retrieval + tools + memory; agent obtains environment ground truth at each step; stopping conditions maintain control; ACI should be designed with the same care as HCI; production requires sandboxed testing + guardrails) https://www.anthropic.com/engineering/building-effective-agents (as of 2026-05)
- [2] Anthropic, “How the agent loop works,” Claude Agent SDK Docs. (gather context, take action, verify work, repeat; the harness powering Claude Code can power other agents; context accumulates within a session without reset; session_id resume restores full context) https://code.claude.com/docs/en/agent-sdk/agent-loop (as of 2026-05)
- [3] Anthropic, “Configure permissions,” Claude Code Docs. (permissions allow / ask / deny; evaluation order deny, ask, allow first-match-wins; sandbox is OS-level enforcement, applies only to Bash and child processes; permissions and sandboxing are complementary defense-in-depth layers) https://code.claude.com/docs/en/permissions (as of 2026-05)
- Related: 01-4 on memory layering and the causes of context rot; 01-5 on workflow orchestration that the harness grounds; 03-3 on the threat model and attack surface behind security boundaries; 04-1 on long-term memory design; 04-6 on using hooks to implement deterministic approval gates and convergence sentinels; 05-1 and 05-2 for the actual design of two open-source harnesses.