
What is context engineering? Everything the model sees before it answers

Context engineering is the shift from writing a prompt to deciding everything that fills the context window for a given call: the system prompt, instructions, examples, retrieved facts, history, tool definitions, and the user input, all competing for one finite budget. The reframe: a model only ever sees its window, so the window is the real unit of control. This page covers why most 'the model got it wrong' problems are context problems, not model problems, and introduces the vocabulary the rest of this subcategory uses. Principle 10: context is a budget, not a bucket.

Reference · Last updated 2026-06-03 · Drafted by Lira · Edited by German Medina

A model never sees your codebase, your CRM, your intentions, or the last thing you told it five minutes ago unless one of those made it into the context window — the block of tokens the model reads before it produces a single word. That window is the model's entire world for that call. Nothing outside it exists. This is the fact that everything in this subcategory turns on, and it reframes the whole job. "Prompt engineering" narrowly means the words you write in the prompt. Context engineering is the wider discipline: deciding everything that goes into the window for a given call — and that, not the wording of one instruction, is the real unit of control.

The shift is from authoring a sentence to assembling a payload. The words you type are one input among several, and often not the one that decides whether the answer is right. This page draws the line, names everything that competes for space in the window, and lays out the vocabulary the rest of the subcategory leans on. Get the context right and a whole class of bugs you would otherwise blame on the model simply stops happening (principle 2 — ground before you generate, the same logic one layer up).

What fills the window

Everything the model reads for a call is assembled into one window before the call runs. It is worth naming the parts once, because each is a separate decision and each spends from the same budget:

  • The system prompt — who the model is for this call, what its job is, and the rules it must hold. Role, task, and boundaries live here. This is the highest-leverage text in the window; writing it well is covered in System prompts and instructions.
  • Instructions — the specific task for this call: what to do with the input, in what format, under what constraints. The system prompt sets the standing rules; the instructions set the immediate ask.
  • Few-shot examples — worked examples of input-and-output you put in the window to show the model the shape you want, instead of only describing it. Two good examples often beat a paragraph of instruction.
  • Retrieved context — real facts pulled in at query time so the model answers from your data rather than from training. This is grounding, and it is its own discipline, covered in What is grounding.
  • Conversation history / state — what came before in a multi-turn exchange, plus any scratch state the system carries forward. In a long conversation this is the part that quietly grows until it crowds out everything else.
  • Tool definitions — the typed functions the model is allowed to call, each with a name, a description it reads, and arguments it fills. Every tool you expose is text in the window the model must read before it can choose; designing them is covered in Tools and actions.
  • The user input — the actual question or content for this turn. The one part people think of as "the prompt," and frequently the smallest slice of what the model actually sees.

The list is the point. When you picture a call as just "the prompt," you optimize one box and ignore the six around it. When you picture it as a window seven things share, you start asking the questions that matter: which of these earns its space, which is missing, which is so bloated it is burying the rest.
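One way to make the seven-part picture concrete is to tally what each part costs before a call goes out. The sketch below is illustrative only: `rough_tokens`, `Part`, and `window_report` are hypothetical names, the sample texts are made up, and counting whitespace-separated words is a crude stand-in for whatever tokenizer your model actually uses.

```python
from dataclasses import dataclass


def rough_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


@dataclass
class Part:
    name: str
    text: str

    @property
    def cost(self) -> int:
        return rough_tokens(self.text)


def window_report(parts: list[Part]) -> list[tuple[str, int, float]]:
    """Return (name, cost, share of the window) for each part, largest first."""
    total = sum(p.cost for p in parts) or 1  # avoid dividing by zero on an empty window
    rows = [(p.name, p.cost, p.cost / total) for p in parts]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical contents for one call; every string here is invented for illustration.
parts = [
    Part("system_prompt", "You are a support assistant. Answer only from the provided facts."),
    Part("instructions", "Reply in two sentences."),
    Part("few_shot_examples", "Q: Where is my order? A: It shipped Monday."),
    Part("retrieved_context", "Order 1001 shipped Monday via ground."),
    Part("history", "user: hi assistant: hello " * 20),
    Part("tool_definitions", "lookup_order(order_id: str) -> dict: fetch an order by id."),
    Part("user_input", "Where is order 1001?"),
]

for name, cost, share in window_report(parts):
    print(f"{name:18} {cost:4d} tokens  {share:6.1%}")
```

In this made-up call the history dominates the window and the user input is the smallest slice, which is the kind of imbalance a report like this is meant to surface.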

The window is finite, and it is ordered

Two properties of the window govern everything downstream. First, it is finite. There is a hard ceiling on how many tokens fit, and every part above draws from that one budget — this is principle 10, context is a budget, not a bucket. Add a longer system prompt and there is less room for retrieved facts. Let the history grow unbounded and it eventually crowds out the instruction that matters. More is not better; past a point it is actively worse, because the signal the model needs drowns in the context you added "just in case."
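The budget arithmetic can be sketched as a small trimming step: charge the fixed parts first, then keep only as many of the most recent history turns as the remainder allows. This is a minimal sketch under stated assumptions, not a recommended policy; `fit_history` and `rough_tokens` are hypothetical names, and the word count is a placeholder for a real tokenizer.

```python
def rough_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_history(fixed_parts: list[str], history: list[str], budget: int) -> list[str]:
    """Keep the most recent history turns that fit in what the fixed parts leave over."""
    remaining = budget - sum(rough_tokens(part) for part in fixed_parts)
    if remaining < 0:
        raise ValueError("fixed parts alone exceed the budget")

    kept: list[str] = []
    for turn in reversed(history):  # walk from newest to oldest
        cost = rough_tokens(turn)
        if cost > remaining:
            break  # stop at the first turn that does not fit
        kept.append(turn)
        remaining -= cost

    kept.reverse()  # restore chronological order
    return kept


# Hypothetical call: the fixed parts cost 6 of a 15-token budget.
fixed = ["system rules here", "user question now"]
history = ["turn one is long " * 5, "turn two short", "turn three short"]
print(fit_history(fixed, history, budget=15))
```

Dropping the oldest turns first is only one choice. Summarizing old turns, or dropping retrieved chunks instead, spends the same budget differently.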

Second, the window is ordered, and position carries weight. The same instruction is not equally effective everywhere. A key rule buried in the middle of a long window gets under-weighted — models attend most reliably to what sits near the start and near the end, and a critical instruction stranded in the middle of ten thousand tokens of history can be effectively invisible. So context engineering is not only what goes in the window but where and in what order — the same facts arranged differently produce different answers.
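One possible arrangement that follows from this: put the critical rule at the start, let the bulky material sit in the middle, and restate the rule just before the user input. The sketch below is an assumption about layout, not a verified best practice for any particular model, and `order_window` is a hypothetical helper.

```python
def order_window(critical_rule: str, bulk_sections: list[str], user_input: str) -> str:
    """Place the critical rule first and restate it just before the user input."""
    sections = [
        critical_rule,                   # near the start of the window
        *bulk_sections,                  # history, retrieved facts, examples
        f"Reminder: {critical_rule}",    # restated near the end
        user_input,                      # the final thing the model reads
    ]
    return "\n\n".join(sections)


# Hypothetical inputs, invented for illustration.
window = order_window(
    critical_rule="Never reveal account numbers.",
    bulk_sections=["(long conversation history)", "(retrieved facts)"],
    user_input="What is my balance?",
)
print(window)
```

Restating a rule costs tokens, so this trades a little budget for position. Whether it pays off is something to measure on your own prompts.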

Managing this deliberately — measuring what each part costs, deciding what to keep and what to drop as a conversation grows, ordering for attention — is its own depth: how the window is structured, how the token budget is spent, and how techniques like prompt caching change the arithmetic. Treat it as the operational layer beneath this page; meet it in depth later in the subcategory.

Most "the model got it wrong" is a context problem

Here is the honest framing, and it is the most useful idea on this page: most failures people pin on the model are failures of context, not of reasoning. The model usually reasoned fine — over a window that was missing the right instruction, missing the retrieved fact, or so stuffed that the signal it needed drowned in noise.

This is the same pattern grounding names one layer down. When a grounded answer is fluent but wrong, the What is grounding page tells you to suspect retrieval before the model, because the model defended an answer built on the wrong three chunks. Context engineering generalizes it: retrieval is one of the things that fills the window, and any part of the window can be the culprit. A missing instruction, stale history, a malformed tool definition, or the right fact placed where the model under-weights it: each produces an answer that looks like a model failure and is not. The same instinct shows up agent-side in Agent gotchas, where a confident wrong answer traces back to what the agent was handed. Get the context right and the bug disappears, because it was never the model's fault.
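A quick pre-flight check can catch some of these context-shaped causes before anyone blames the model. The sketch below is a rough illustration: `audit_context`, the part names, the 50% threshold, and the word-count token estimate are all assumptions. It only detects empty parts and bloat, not stale or misplaced content.

```python
def rough_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def audit_context(
    parts: dict[str, str],
    required: list[str],
    budget: int,
    max_share: float = 0.5,  # hypothetical threshold for calling a part "bloated"
) -> list[str]:
    """Return human-readable findings about missing, oversized, or over-budget parts."""
    findings: list[str] = []

    for name in required:
        if not parts.get(name, "").strip():
            findings.append(f"missing: {name}")

    costs = {name: rough_tokens(text) for name, text in parts.items()}
    total = sum(costs.values())
    if total > budget:
        findings.append(f"over budget: {total} > {budget}")

    for name, cost in costs.items():
        if total and cost / total > max_share:
            findings.append(f"bloated: {name} is {cost / total:.0%} of the window")

    return findings


# Hypothetical window: retrieval returned nothing and history has grown unchecked.
parts = {
    "system_prompt": "Answer only from the provided facts.",
    "retrieved_context": "",
    "history": "old turn " * 50,
    "user_input": "What is the refund window?",
}
for finding in audit_context(parts, ["system_prompt", "retrieved_context", "user_input"], budget=200):
    print(finding)
```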

This is why "context engineering" is the right name for the discipline and "prompt engineering" sits inside it. Wording matters, and the next pages go deep on it. But the wording of a perfect instruction the model never reaches — because it is buried, or because the fact it depends on was never retrieved — fixes nothing. The window is the unit. The words are one part of it.

The vocabulary

This subcategory leans on a small set of terms. Here they are once, plainly; you will meet each in depth as the subcategory goes on.

  • System prompt — the standing instructions for a call: role, task, rules, boundaries. Set once per call, holds for the whole call.
  • Instruction — the specific ask for this turn: what to do, in what format, under what constraints. The immediate request, distinct from the standing rules.
  • Few-shot / multishot — putting worked input-output examples in the window to demonstrate the shape you want. "Few-shot" is a handful of examples; "zero-shot" is none, instruction only.
  • Chain-of-thought — prompting the model to reason step by step before it answers, so the working is in the window and the final answer stands on it rather than arriving cold.
  • Structured output — constraining the model to return a fixed shape (JSON, a schema, a set of fields) instead of free prose, so code downstream can consume it reliably. It has its own page, Structured output.
  • Context window — the finite block of tokens the model reads for a call. The container everything above shares.
  • Token budget — the accounting of that finite space: how many tokens each part costs and how many remain. The thing you are spending when you add to the window.
  • Prompt caching — reusing the model's processing of a stable prefix of the window across calls, so the unchanging part is not re-paid every time. The lever that makes a large, stable context affordable to send repeatedly.

Each of these gets a page or a section of its own as the subcategory goes on. Here they are just the words, so the rest reads cleanly.
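To make one of these terms concrete, here is roughly what "code downstream can consume it reliably" looks like for structured output: parse the reply as JSON and reject anything that does not match the expected shape. This is a minimal sketch using only the standard library. The field names, the sample replies, and `parse_structured` are hypothetical, and a real system might use a schema validator instead.

```python
import json

# Hypothetical shape the instructions asked the model to return.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def parse_structured(raw: str) -> dict:
    """Parse a model reply as JSON and check the expected fields and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on free prose
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data


# A well-formed reply parses; a prose reply is rejected instead of leaking downstream.
good = parse_structured('{"answer": "30 days", "sources": ["policy.md"]}')
print(good["answer"])

try:
    parse_structured("Sure! The refund window is 30 days.")
except ValueError as err:
    print(f"rejected: {type(err).__name__}")
```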

Where to go next

From here, the subcategory builds outward from the window this page framed. The most leveraged part of it, how to write the system prompt and the instructions so the model holds its job and its boundaries, is covered in System prompts and instructions. When you need the model to return something code can consume rather than prose a human reads, see Structured output. And the failure modes to know before you hit them in production, the context-shaped bugs that masquerade as model failures, are in Prompting gotchas.

The same discipline applies across platforms, which are complementary tools an engineer composes, not rivals. The context that fills an agent's window is assembled the same way whether the agent runs on Agentforce inside the Salesforce security model or on the Claude API outside it; the parts are identical, only the source of each differs. The platform-specific surfaces are framed in What is an agent and the agent build pages it points to.

Related

  • System prompts and instructions — the highest-leverage text in the window: role, task, rules, and how to write them so they hold
  • Structured output — constraining the model to a shape code can consume, not prose a human reads
  • Prompting gotchas — the context-shaped failure modes — buried instructions, bloated history, missing facts — and how to catch them
  • What is grounding — retrieved context is one part of the window; this is the discipline of getting it right
  • Tools and actions — tool definitions are text in the window too; how to design the functions a model can call
  • Agent gotchas — the same lesson agent-side: a confident wrong answer traces to what the agent was handed
  • Prompting Style Guide — the bar a prompt clears before it ships
  • Debugging prompts — isolate the variable, then fix
  • AI Engineering principles — context is a budget not a bucket (10), ground before you generate (2), the model is the easy part (4)
