AI ENGINEERING / PROMPTING & CONTEXT ENGINEERING
Prompting & Context Engineering
Steering models reliably: system prompts, instructions, context windows, structured output, and prompt caching. The craft that turns a capable model into a dependable component.
Foundation · 2
Production note
Prompting gotchas: how a prompt breaks in production
A prompt that works in the playground works on the input you typed, on the model you typed it into, on a path you walked yourself. Production runs it on inputs nobody scripted, with untrusted text in the window, against a model version that shifts under you — and a prompt failure looks like a confident answer, not an error. Ten gotchas that break a prompt after the demo, each with the question to answer first and the cost of getting it wrong.
Decision framework
Prompting Style Guide: the bar a prompt clears before it ships
The opinionated rules Cleon applies to every prompt — the first decision (prompt, ground, or fine-tune), the prompt-quality bar a prompt clears before it ships, and how we compose Agentforce Prompt Templates and the Claude API by where the prompt runs rather than pick a camp. The discipline document that turns the prompting gotchas into a gate and the principles into practice, treating the prompt as engineered context, not a string you tweak by feel.
Reference · 5
Reference
What is context engineering? Everything the model sees before it answers
Context engineering is the shift from writing a prompt to deciding everything that fills the context window for a given call — the system prompt, instructions, examples, retrieved facts, history, tool definitions, and the user input, all competing for one finite budget. The reframe: a model only ever sees its window, so the window is the real unit of control. Why most 'the model got it wrong' problems are context problems, not model problems. And the vocabulary the rest of this subcategory uses. Principle 10: context is a budget, not a bucket.
Reference
System prompts and instructions: steering the model on purpose
The steering part of the context, written on purpose: system-prompt anatomy — role, task, boundaries, output format — and why system content sets a stronger foundation than the user turn. The techniques that actually hold: be clear and direct, give the model the why behind a rule, show worked examples (few-shot), let it reason out loud for multi-step work, and decompose a too-big task into a chain. Stated honestly: a model cannot think privately, so reasoning only helps if it is allowed to output it. Every change scored against an eval set (principle 3) — structure and ordering beat volume.
Reference
Structured output: when you need JSON, not prose
When the model's output feeds a system instead of a human, the shape is a contract — and parsing prose for it is the fragile bet that breaks on the first input you didn't try. The reliable paths in order of strength: Structured Outputs for guaranteed schema compliance, tool calling for typed arguments, and a precise format spec in the system prompt for flexibility. Prefill is not the technique — it is unsupported on newer Claude models. Whatever the path, validate every output before you use it and have a deterministic fallback when validation fails.
Reference
Context windows: the token budget every call spends
The context window is every token the model can reference for a call — including its own response — and it is two things at once: finite and ordered. This is the operational depth under context engineering: the window is one budget that the system prompt, instructions, examples, retrieved facts, history, tool definitions, and user input all draw from, so more is not better; and it is ordered, so a key fact stranded in the middle gets under-weighted. How to manage a window that grows, why a million-token ceiling is still not a license to fill it, and the levers — summarize, drop, retrieve on demand, server-side compaction — that keep it honest. Principle 10: context is a budget, not a bucket.
Reference
Prompt caching: stop re-paying for the same prefix
The large static preamble — system prompt, tool defs, examples, a long grounding doc — re-sent on every call gets re-processed every time, and at production volume that is a bill nobody approved. Prompt caching fixes it: mark a stable prefix with cache_control and later calls reuse the model's processing instead of re-paying for it. The mechanic — caching runs in order over tools, then system, then messages, up to and including the marked block, with up to 4 breakpoints, automatic or explicit. The economics stated honestly: a cache read costs about 0.1x a base input token while a cache write costs more than one, so caching wins on a reused prefix and can lose on a one-shot call. And the design rule that makes it pay — stable content first, the variable input last.