Prompt chaining
Prompt chaining is a design pattern in which a complex task is decomposed into a sequence of individual large language model calls, where the output of each call becomes part of the input for the next. Rather than asking a single prompt to perform a multi-step task end-to-end, a prompt chain breaks the problem into discrete, manageable stages — each handled by an LLM invocation tuned for that specific subtask. The chain as a whole accomplishes what no single prompt reliably could.
A practical example: a customer support automation system needs to classify an inbound message, extract the relevant entities (order number, product name, issue type), draft a response, and then verify that the draft addresses the classified issue. Each step has different requirements in terms of output format and model behavior. Doing all four in a single prompt is possible but brittle — the model must balance competing objectives simultaneously. A prompt chain assigns each step to a focused invocation, using the prior step's structured output as input to the next.
When prompt chaining makes sense
Prompt chaining is the right design pattern when two or more of these conditions hold:
- The task requires sequential dependencies: Step B cannot start until Step A's output is known. Classification before drafting, extraction before lookup, validation after generation.
- The full task exceeds reliable single-prompt performance: Complex reasoning, multi-step workflows, or tasks requiring both broad synthesis and precise formatting tend to degrade when jammed into a single prompt. Splitting them creates cleaner separation of concerns.
- Different steps benefit from different configurations: Classification may call for a lower-temperature, deterministic model; creative drafting may need higher temperature; validation may need a specific output schema. A chain allows per-step configuration.
- Intermediate outputs need inspection or routing: In a chain, outputs between steps are explicit and inspectable. A step that classifies intent can route to different downstream chains based on the classification result — a pattern called conditional chaining or branching.
Prompt chaining vs single-shot prompting vs ReAct
Single-shot prompting asks one model in one call to handle the full task. It is appropriate for simple, well-defined tasks where the model's capabilities are sufficient end-to-end. Prompt chaining is appropriate when single-shot performance is insufficient and the task has clear sequential structure.
The ReAct agent pattern differs from prompt chaining in that ReAct interleaves reasoning traces with tool calls within a single extended LLM session — the model itself decides what to do next, rather than having the decisions predetermined by the chain's structure. Prompt chaining is deterministic in structure: the developer specifies the sequence of steps at design time. ReAct is dynamic: the agent determines the sequence at inference time based on observations from tool calls. Prompt chaining is more predictable and auditable; ReAct is more flexible for open-ended tasks.
Context management across chain steps
A key engineering consideration in prompt chaining is what context to pass between steps. Passing the full output of every prior step forward causes the context window to grow with each step, eventually exceeding limits and inflating token costs. Most production chains pass only the relevant extracted output from each step — structured JSON, a classification label, a short summary — rather than the full prior response. This selective forwarding requires careful output schema design for each step so that subsequent steps receive exactly what they need in a predictable format.
Context engineering governs how each step's context is assembled: what system prompt governs that step, what prior outputs are included, in what format, and at what level of detail. In a well-designed prompt chain, each step has a clear input contract (what it receives) and output contract (what it produces), making the chain debuggable and modifiable at the step level without affecting the rest.
Failure modes and mitigations
Prompt chains fail in characteristic ways that single-prompt systems do not. Error propagation is the primary risk: if Step 2 produces a malformed or incorrect output, every downstream step operates on bad data. Step 2's error may not manifest as an obvious failure until the final output, making debugging difficult. Mitigations include output validation at each step boundary (checking that the output conforms to the expected schema before passing it forward), and retry logic with fallback prompts for steps prone to formatting failures.
Latency accumulates across chain steps. Each LLM call adds network round-trip time and inference latency; a five-step chain with 500ms per step introduces 2.5 seconds of irreducible latency before the final output is available. Steps that can run in parallel — because they do not depend on each other's outputs — should be executed concurrently. AI observability tooling that traces individual steps in the chain and logs their latency, token usage, and outputs is essential for diagnosing bottlenecks and optimizing chain performance in production.

