What this unit solves
Workflow orchestration handles multi-step tasks that a single prompt cannot: decomposition, chaining, parallelism, verification, and convergence. Prompt engineering governs “how to phrase this one request”; context engineering governs “what belongs in the window”; workflow engineering governs “what structure connects these steps, who goes first, and where to place checkpoints”. Getting this layer right is what makes multi-step flows repeatable and auditable rather than luck-dependent, with different results on every run.
Learning objectives
- Identify when a task exceeds the scope of a single prompt and requires a workflow to run reliably, and distinguish the boundary between “workflow” and “agent”.
- Decompose a compound goal into steps with dependency relationships, and identify which can run in parallel and which must be sequential (DAG thinking).
- Apply primitives such as pipeline, fan-out, barrier, and loop-until, and map them to the five named patterns Anthropic identifies.
- Place adversarial verification and convergence checkpoints at critical nodes to prevent upstream errors from propagating silently.
1. When you need a workflow: where the single-prompt ceiling is
When any step’s output is the next step’s input, and that intermediate result is hard to maintain reliably within the same context, you need a workflow. Conversely, anything that can be completed stably within one prompt and one context window should not be split; splitting only adds latency and state management overhead (see Section 8 on common pitfalls).
A single prompt hits three ceilings:
- Output length: the model’s per-response token limit means asking it to produce a long report plus complete code plus tests in one shot either truncates or skims every section.
- Context depth: when a task requires “remembering the intermediate result computed earlier” before proceeding, those intermediates crowd the same window and dilute each other, triggering the lost-in-the-middle and context-rot effects covered in 01-4.
- External retrieval: some steps must first query a database, run tests, or fetch a page, then use the result to decide what comes next. A single-round prompt has no structure for “pause, retrieve, then continue”.
2. Decomposition and dependencies: DAG thinking
The most reliable way to break a compound goal into steps is to ask backwards from the end: “What does this final output directly require?” For each answer, trace one layer further up: “And where does that come from?” Continue until you reach atomic steps that cannot be further divided. Compared to working forward from “what is step one”, backward reasoning is less likely to miss hidden prerequisites.
After decomposing, draw the steps as a directed graph: nodes are steps, edges are dependencies (A → B means B uses A’s output). This graph must be acyclic; a cycle means A waits on B and B waits on A, which deadlocks. This is DAG (directed acyclic graph) thinking.
Once the DAG is drawn, parallelism opportunities surface on their own: nodes with no path between them can run simultaneously. Consider a graph in which B cleans the data, C1, C2, and C3 each produce a summary, and D compares the summaries. The three summarization steps are independent and can run in parallel, but all three must wait for B to finish (they need the cleaned data), and all three must finish before D starts (the comparison needs all three summaries). D is a natural barrier point (see Section 3).
Decomposition most commonly produces two opposite mistakes:
- Chaining parallelizable steps: C1/C2/C3 are entirely independent but are run one after another. No error occurs; you just pay three times the time with nothing to show for it.
- Parallelizing dependent steps: launching D (comparison) and C (summarization) simultaneously means D receives incomplete summaries and produces output built on inconsistent or empty input. This mistake is harder to spot because it “runs fast”, but the results are wrong.
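The dependency reasoning above can be checked mechanically. The following is a minimal sketch, assuming the step names B, C1, C2, C3, and D from the example and using Python's standard-library `graphlib`; the helper name `parallel_batches` is hypothetical. It groups steps into batches that may run in parallel and rejects a graph that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps a step to the set of steps whose output it needs.
# Step names follow the example: B cleans, C1..C3 summarize, D compares.
DEPS = {
    "C1": {"B"},
    "C2": {"B"},
    "C3": {"B"},
    "D": {"C1", "C2", "C3"},
}


def parallel_batches(deps):
    """Return a list of batches; steps inside one batch have no path between them."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are all done
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished, unlocking the next one
    return batches


batches = parallel_batches(DEPS)
print(batches)  # [['B'], ['C1', 'C2', 'C3'], ['D']]

# A cycle (A waits on B, B waits on A) is rejected before anything runs.
try:
    parallel_batches({"A": {"B"}, "B": {"A"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Each inner list is a set of steps that can be launched together; the boundary between two lists is where a barrier belongs.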
3. Common patterns: from mechanical primitives to named patterns
Four primitives are the actual building blocks you manipulate when orchestrating; Anthropic combines them into five higher-level named patterns. Master the primitives first, then learn to recognize the combinations; after that, a new framework is just a change of syntax.
Four mechanical primitives:
- Pipeline (chaining): A → B → C, each step’s output feeds the next. Suited for linear transformations. Corresponds to Anthropic’s prompt chaining: decompose a complex task into sequential steps, with programmatic checkpoints between steps to verify progress before continuing [1].
- Fan-out: the same input is sent to multiple parallel steps simultaneously. Suited for analyzing the same data from multiple perspectives (checking security, performance, and types in parallel).
- Barrier: wait for all parallel branches to complete before proceeding. Fan-out is usually followed by a barrier and then aggregation. Fan-out + barrier together make up Anthropic’s parallelization, which has two variants: sectioning (split the task into independent sub-tasks and run them simultaneously) and voting (run the same task multiple times to get independent results and take a vote) [1].
- Loop-until (convergence loop): repeat until verification passes or the limit is reached. Suited for quality iteration and self-correction. Corresponds to Anthropic’s evaluator-optimizer: one LLM call generates output, another scores and gives feedback, iterating toward a passing result [1].
Two further named patterns layer dynamic decisions on top of these primitives:
- Routing: use a classification step first to determine which category the input falls into, then direct it to the appropriate specialized handler. This is still fundamentally a workflow (you define the routing table), with an added layer of dynamic dispatch.
- Orchestrator-workers: a central LLM dynamically decomposes the task, dispatches sub-tasks to workers, and synthesizes the results [1]. The key difference from parallelization is: the sub-tasks are not predefined; the orchestrator decides them after seeing the input. At this point you have stepped into agent territory and partially surrendered control to the model.
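The primitives above can be sketched in a few lines of `asyncio`. This is an illustration under stated assumptions, not any framework's API: `call_model` is a hypothetical stub standing in for an LLM call, and the role names, the keyword-based `classify` function, and the `ROUTES` table are invented for the example.

```python
import asyncio


async def call_model(role: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call; it just labels its input."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{role}] {text}"


async def pipeline(text: str) -> str:
    # Pipeline: each step's output is the next step's input.
    cleaned = await call_model("clean", text)
    return await call_model("summarize", cleaned)


async def fan_out_with_barrier(text: str) -> str:
    # Fan-out: the same input goes to three independent reviewers.
    lenses = ["security", "performance", "types"]
    tasks = [call_model(lens, text) for lens in lenses]
    # Barrier: gather returns only when every branch has finished,
    # with results in the same order as the inputs.
    reviews = await asyncio.gather(*tasks)
    # Aggregation happens after the barrier, once all branches are in.
    return await call_model("aggregate", " | ".join(reviews))


# Routing: a predefined table maps each category to a specialized handler.
ROUTES = {"bug": "debugger", "question": "explainer"}


def classify(text: str) -> str:
    # Toy classifier; a real one would itself be a model call.
    return "bug" if "error" in text.lower() else "question"


async def routed(text: str) -> str:
    return await call_model(ROUTES[classify(text)], text)


print(asyncio.run(pipeline("raw")))
print(asyncio.run(fan_out_with_barrier("diff")))
print(asyncio.run(routed("Error on line 3")))
```

In all three functions the path is fixed in code, which is what keeps them on the workflow side of the boundary; an orchestrator-workers design would instead let a model call decide the contents of `tasks` at run time.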
4. Verification and convergence: placing adversarial verification at critical nodes
The consequences of skipping verification are concrete. One small upstream error is treated as a correct premise by every downstream step and amplified. By the time the output three steps later is obviously wrong, tracing which step broke is already difficult, and the tokens consumed at every intermediate step are wasted. The value of a verification checkpoint is catching errors while they are still cheap.
The weakest form of “verification” is having the same model check its own output: models tend to agree with what they just produced, so this is minimally effective. Stronger is adversarial verification: use a different perspective, or an independent model call, to challenge the previous step’s output rather than confirm it.
Two key design moves:
- Switch the role to critic: don’t write “check whether this result is correct” for the verification step; write “try to refute this conclusion; when uncertain, default to not-established”. Reversing the burden of proof is what suppresses false positives.
- Multiple perspectives rather than multiple copies: if a conclusion can fail in multiple ways, give each verifier a different lens (correctness, security, reproducibility) rather than running three identical checks. This is exactly the voting variant of Anthropic’s parallelization [1].
A convergence loop needs explicit exit conditions:
- Pass criterion: use a structured pass / fail schema rather than a vague judgment like “looks acceptable”.
- Maximum retry count: if the Nth iteration still fails, stop; never iterate indefinitely.
- Timeout exit: abort if a single round or the whole loop exceeds the time limit.
Without a hard cap such as MAX_ROUNDS, any case the model can never pass will keep burning tokens until you notice the bill. Claude Code’s Stop hook can serve as this convergence sentinel, running a deterministic pass / fail check at the end of each round (as of 2026-05) [2].
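A minimal sketch of a bounded convergence loop, assuming stubbed model calls: `generate` and `critic` are hypothetical stand-ins, and the values of `MAX_ROUNDS` and `TIMEOUT_S` are arbitrary. It shows the three exit conditions together with a critic that defaults to "not established" unless it finds evidence.

```python
import time
from dataclasses import dataclass
from typing import Optional

MAX_ROUNDS = 3    # hard cap on iterations (illustrative value)
TIMEOUT_S = 30.0  # wall-clock budget for the whole loop (illustrative value)


@dataclass
class Verdict:
    """Structured pass / fail result instead of a vague judgment."""
    passed: bool
    reason: str


def generate(task: str, feedback: Optional[str]) -> str:
    # Hypothetical generator stub: the first draft omits evidence,
    # a revision adds it once the critic has asked for it.
    if feedback is None:
        return f"{task}: claim"
    return f"{task}: claim + evidence"


def critic(draft: str) -> Verdict:
    # Hypothetical adversarial verifier: the burden of proof is reversed,
    # so anything without evidence is treated as not established.
    if "evidence" in draft:
        return Verdict(True, "evidence present")
    return Verdict(False, "no evidence; treat as not established")


def loop_until(task, generate_fn=generate, critic_fn=critic,
               max_rounds=MAX_ROUNDS, timeout_s=TIMEOUT_S):
    deadline = time.monotonic() + timeout_s
    feedback, draft = None, None
    for round_no in range(1, max_rounds + 1):
        if time.monotonic() > deadline:  # timeout exit
            return {"status": "timeout", "rounds": round_no - 1, "draft": draft}
        draft = generate_fn(task, feedback)
        verdict = critic_fn(draft)
        if verdict.passed:  # pass criterion met
            return {"status": "passed", "rounds": round_no, "draft": draft}
        feedback = verdict.reason  # feed the objection into the next round
    return {"status": "gave_up", "rounds": max_rounds, "draft": draft}  # retry cap hit


result = loop_until("report")
print(result["status"], result["rounds"])  # passed 2
```

Returning a status instead of raising keeps the failure visible to whatever step comes next, which can then escalate to a human rather than retry silently.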
5. Positioning on the concept ladder
In the four-layer engineering stack, workflow occupies L3: it depends downward on the quality of L2 context, and requires L4 harness as a safety net above it (from shallow to deep: 01-3 prompt, 01-4 context, this unit’s workflow, 01-6 harness):
- Connects downward to context: every node in a workflow is fundamentally a context assembly. The quality ceiling of a node’s output is set by the quality of the context you give it (see 01-4). No matter how elegant the workflow structure, if a single node’s context design is poor, its output is poor. Workflow amplifies good design and amplifies bad design equally.
- Connects upward to harness: a workflow only describes “how steps are chained”; it does not handle “who applies the brakes when something breaks”. Adding monitoring, permission boundaries, memory persistence, and a kill switch is what makes a complete agentic harness (see 01-6). Building a solid L3 without an L4 kill switch is fitting a racing engine with no brakes.
Tool comparison
Three main vehicles put multi-step orchestration into practice, each suited to different scenarios (as of 2026-05):
- Claude Code sub-agent orchestration: spawn multiple sub-agents using the Task tool inside an orchestrator prompt; each sub-agent has its own independent context and tool set and returns only the final conclusion, not the process [2]. Best for situations where the orchestration target is LLM work and you want to stay within the Claude ecosystem. On 2026-05-28 Anthropic launched a research preview of Dynamic Workflows, an orchestration layer for larger tasks, and of Agent Teams, which collaborate via a shared git workspace (research preview as of 2026-05) [2].
- n8n: visual low-code workflow where nodes correspond to steps; strongest at integrating a large number of external services (databases, APIs, messaging platforms). Its AI capability is built on LangChain, providing 70+ AI nodes; the AI Agent node defaults to Tools Agent and supports human-in-the-loop approval before tool execution (as of 2026-05) [4]. Best for flows that span many SaaS services and where non-engineering roles need to read and modify the flow.
- LangGraph: describes flows as graphs in Python / TypeScript; the three primitives are State (shared across all nodes), Node (a function that reads and writes state), and Edge (direct or conditional transitions). Version 1.2 (released 2026-05-11) treats an agent run as a persistent graph execution (durable execution), checkpointing at every node so the server can resume from the last breakpoint after a restart [3]. Best for agentic flows requiring fine-grained control, state persistence, and human-in-the-loop.
| Concept | Anthropic Claude (primary) | OpenAI | Google Gemini | GitHub Copilot | Cursor |
|---|---|---|---|---|---|
| Multi-step orchestration entry point | Claude Code Task tool / sub-agents; SDK orchestrator pattern | Agents SDK / Responses API (with tool calls) | Vertex AI Agent Builder; Gemini API function calling | Copilot coding agent (cloud-autonomous, opens PRs) | Background agents |
| Parallel sub-tasks (fan-out) | Spawn multiple Task sub-agents inside the orchestrator prompt, each with independent context | Agents SDK multi-agent handoff | Needs original source verification | Needs original source verification | Background agents parallel |
| Loop-until / self-correction | Orchestrator conditional re-call; Stop hook as convergence sentinel | Agents SDK loop + guardrails | Needs original source verification | Needs original source verification | Needs original source verification |
| External workflow integration | n8n, LangGraph both call the Claude API | n8n, LangGraph both support OpenAI | n8n, LangGraph both support Gemini | Via API integration | Via API integration |
The comparison table gives coordinates, not detail
The precise mechanism names, versions, and configuration paths in each cell change quickly, so some cells deliberately use more stable descriptions. Cells that cannot be verified are marked “needs original source verification”; use that tool’s official documentation for exact names. The table’s purpose is to tell you “what your tool calls the same orchestration concept”; for deep configuration see Part II and Part IV.
Hands-on exercises
Two exercises, from paper decomposition to actual orchestration:
- Draw an existing manual task as a DAG. Pick a multi-step flow you run manually (for example: collect data → summarize individually → cross-compare → write report). Draw the nodes and dependency edges, label which nodes can run in parallel (no path between them) and which must be sequential. Then ask three questions for every node: where does the input come from, who consumes the output, and what is the pass criterion? Which node most deserves a verification checkpoint (usually the one where an error would be amplified by downstream steps as a correct premise)?
- Implement a minimal pipeline in Claude Code. Chain three steps, with the second step verifying the first step’s output.
Making “verification” a standalone node with a pass / fail flag and a retry cap is the point of this exercise. Add a Stop hook for a deterministic end-of-run check, and this pipeline has a convergence sentinel.
Common pitfalls
Self-check
The bar for passing this unit
- Given a multi-step task, can you explain clearly why it exceeds the scope of a single prompt (output length, context depth, or external retrieval)? If you can’t, it may not need splitting at all.
- For each node in a workflow you design, can you answer “where does the input come from, who consumes the output, what is the pass criterion”? Any node you can’t answer for means the design is not yet mature.
- Is the barrier after your fan-out genuinely needed because the next step requires global cross-branch information, or did you just put it there out of habit?
- Does your loop-until have an explicit maximum iteration count and a timeout exit?
- Can you distinguish whether what you are building is a workflow (path predefined) or an agent (model decides the path)? The reproducibility cost of the two differs.
Sources and further reading
Facts are sourced from official documentation; fast-changing items are annotated “as of 2026-05”; IEEE numbering style.
- [1] E. Schluntz and B. Zhang, “Building Effective Agents,” Anthropic Engineering, Dec. 19, 2024. (definitions of workflow and agent; the five named patterns: prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer) https://www.anthropic.com/engineering/building-effective-agents (as of 2026-05)
- [2] Anthropic, “Create custom subagents” (sub-agent independent context and tool set, returns conclusion only; Task tool orchestration; Dynamic Workflows and Agent Teams research preview, 2026-05-28), Claude Code Docs. https://code.claude.com/docs/en/sub-agents (as of 2026-05)
- [3] LangChain, “Durable execution,” LangGraph Docs (State / Node / Edge, checkpointing, durable execution; LangGraph 1.2 released 2026-05-11). https://docs.langchain.com/oss/python/langgraph/durable-execution (as of 2026-05)
- [4] n8n, “AI Agent node documentation” (LangChain-based, Tools Agent default, human-in-the-loop approval before tool execution; 70+ AI nodes), n8n Docs. https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.agent/ (as of 2026-05)
- Connections: 01-3 prompt quality at a single node; 01-4 context assembly per node and sub-agent isolation; 01-6 wrapping a workflow into a harness with monitoring and a kill switch; 05-1 and 05-2 real orchestration designs from two open-source agents.