Prompting Style Guide: the bar a prompt clears before it ships
The opinionated rules Cleon applies to every prompt — the first decision (prompt, ground, or fine-tune), the prompt-quality bar a prompt clears before it ships, and how we compose Agentforce Prompt Templates and the Claude API by where the prompt runs rather than pick a camp. The discipline document that turns the prompting gotchas into a gate and the principles into practice, treating the prompt as engineered context, not a string you tweak by feel.
This is the page where Cleon stops describing what a prompt is and says what we do before one ships. The reference pages lay out the parts — what context engineering is, the system prompts and instructions that steer the model, the structured output a system can consume, the context window every call spends and the prompt caching that makes a stable one affordable. The gotchas lay out where each part bites. This Style Guide is the discipline that decides whether to reach for a prompt at all, and the bar it has to clear before its output touches a customer or a record.
The rules are short on purpose. When a rule needs an explanation, the explanation lives in the page it links to. This is the operational form of the AI Engineering principles — each rule below is one of those principles with its sleeves rolled up, and we cite the number so you can trace the rule back to the reasoning. It is the prompt-side companion to the agent Style Guide and the grounding Style Guide: an agent runs on prompts and reads the world through retrieval, so all three bars meet here at the window.
The first decision: prompt, ground, or fine-tune?
Before any of the rest, place the problem on the right lever. Most of what looks like "the model got it wrong" is not a model problem at all — it is a context problem, and the cheapest fix is almost always to engineer the window before you touch anything heavier (principle 12 — start from the problem). The three levers, in the order you reach for them:
- Prompt first. The default for nearly every task: shape the context the model sees — the system prompt, the instructions, the examples, the order — and most "wrong answer" bugs disappear, because they were never the model's fault (see what is context engineering). Engineer the window before anything else; it is the fastest, cheapest lever and the one most problems actually need.
- Ground when the answer depends on facts the model doesn't have. If the right answer turns on your data — a record, this quarter's policy, a document written last week — or on anything that changes, no prompt cleverness conjures the fact; you have to retrieve it (see what is grounding). Prompting steers the model over facts it can reach; grounding is how the fact gets in reach. Different lever, asked separately.
- Fine-tune rarely, and last. The conceptual third lever: bake behavior into the model's weights rather than its context. Reach for it only for a narrow, high-volume, latency- or cost-sensitive task where prompting and grounding have been genuinely exhausted — and know what you are buying. It is expensive to do, it ages with your data the moment that data moves, and it is almost never the first move. Prefer prompt-then-ground; treat fine-tuning as the exception you justify, not the reflex you reach for.
The prompt-quality bar
A prompt that ships is structured for salience, enforced in its output shape, defended against the untrusted text in its window, ordered so the model weights what matters, scored on every change, and budgeted in cost — or it is a prompt that worked once in the playground and has not yet met an input you didn't write (principle 1). Before this prompt's output touches a customer or a record, confirm every box. Each one closes a gotcha that has turned a clean demo into a prompt that breaks confidently.
- [ ] Instructions clear, direct, and structurally salient. The hard constraints are pulled out of the prose — their own line or section, stated once and clearly — not the eleventh sentence in a dense paragraph hoping to be noticed. (Principle 10 · gotcha 1.)
- [ ] Examples where the task's shape is hard to describe. When a tone, an edge case, or an output style resists prose, a few worked examples (few-shot) carry it — and they cover the hard cases, not just restate the easy one the model already gets. (Principle 10 · gotcha 4.)
- [ ] Output structured and validated when a system consumes it. When the output feeds a parser, a Flow, or the next step rather than a human, the shape is a contract — asked for as a schema and validated, never prose you regex-scrape. (Principle 8 · gotchas 2, 7.)
- [ ] Untrusted text fenced off from instructions. Every retrieved chunk, user message, or pasted document in the window is delimited and labelled as data, structurally separated from your directives — and what a document says never expands what the model is allowed to do. (Principle 5 · gotcha 5.)
- [ ] The key instruction or fact placed where the model weights it. What matters most sits near the start or the end, not stranded in the middle of a long window where it is read but under-weighted. (Principle 10 · gotcha 6.)
- [ ] A deterministic fallback when output is invalid. When validation fails or the model returns something unusable, a boring deterministic path fires — never a blank, an error string, or a hallucination rendered in front of someone. (Principle 8 · gotcha 2.)
- [ ] The stable prefix cache-structured. The unchanging content — system prompt, tool definitions, fixed examples — sits first and the per-request input last, so prompt caching can actually cover a contiguous prefix instead of being broken by an early variable token. (Principle 6 · gotcha 10.)
- [ ] Every prompt change scored against an eval set before it ships. The prompt is the most-edited artifact in the system; an edit has to score better on real cases, not just look better on the one input you tried, before it ships. (Principle 3 · gotchas 8, 9.)
If any box is unchecked, the prompt isn't ready — and a prompt failure looks exactly like a correct answer until someone downstream acts on it. See debugging prompts for how to find the failing layer when a prompt goes wrong instead of guessing at the words.
Composing the toolkit
The same prompt discipline travels across surfaces. It is one craft applied in two places, composed by where the prompt runs — not rival camps you choose between (principle 7). The bar above is identical on both; only the surface that carries the prompt differs.
- Agentforce Prompt Templates when the prompt runs on-platform. Prompt Builder is where the template is authored, grounded via merge fields, Flow, or Apex, and run inside the Salesforce security model — the same instructions, governed by the platform, acting on data the running user is allowed to see. See Agentforce agents.
- The Claude API when the prompt runs off-platform. The system and messages structure, structured outputs, and prompt caching are the surface here — the same craft of steering, shaping, and budgeting the window, governed by you. See prompt caching for the cost lever and structured output for the contract.
Same craft, different surface: a constraint stated once and made salient is the right move whether it lands in a Prompt Builder template or a Claude system prompt. The one decision this page does not own is the surface-level Agentforce vs. external AI call inside Marketing Cloud — that's a different framework, and it lives in full in the Marketing Cloud AI Style Guide.
Patterns to prefer
- Structure over volume — pull the constraint onto its own line and make it salient, rather than adding another sentence to a wall of text and hoping it gets noticed. More prompt is not more control. (Principle 10 · gotchas 1, 3.)
- Examples over explanation — when the task's shape resists description, show two or three worked examples covering the hard cases instead of writing another paragraph that still doesn't transfer. (Principle 10 · gotcha 4.)
- Structured output over parsing prose — when a system consumes the output, ask for a schema and validate it, rather than scraping structure out of a sentence with a regex. (Principle 8 · gotchas 2, 7.)
- Eval every change — score an edit against real cases before it ships, and re-score when the model version changes; a model swap is a change to test, not a free upgrade. (Principle 3 · gotchas 8, 9.)
- Cache the stable prefix — order the unchanging content first so prompt caching covers it, and stop re-paying to re-process the same leading tokens on every call. (Principle 6 · gotcha 10.)
- Fence untrusted text — delimit and label every chunk, message, or document you didn't write, and keep your directives outside that boundary. (Principle 5 · gotcha 5.)
Patterns to refuse
- Rewriting the whole prompt on a hunch — the swing that "looks better" on one input is an unscored guess; change one thing, score it, keep what wins. (Gotcha 9 · principle 3.)
- Piling rules until they contradict — every past failure bolted on as one more "always" or "never" until the task drowns and nobody can say which instruction produced which behavior. (Gotcha 3 · principle 10.)
- Parsing prose for a contract — building a regex to scrape a field out of a sentence is reinventing, badly, a feature the model already supports. (Gotcha 7 · principle 8.)
- Prefill on newer models — prefilling the assistant turn to force a shape is not the technique; it is unsupported on newer Claude models. Use Structured Outputs or tool calling instead. (See structured output · gotcha 2.)
- Shipping a prompt change unscored — "it looked good in three tries" is a vibe, not a test, and the regression ships silently next to the fix. (Gotcha 9 · principle 3.)
- A giant static preamble re-sent uncached — the right answer at a token bill that scales linearly with traffic and never had to, because the stable prefix was never cached. (Gotcha 10 · principle 6.)
Closing
None of these rules is hard to apply up front; all of them are expensive to discover live, because a prompt bug looks exactly like a correct answer until someone checks. The throughline is the one that runs through every page in this subcategory: the playground makes the easy path look like the whole path, and production is everything the easy path left out. The prompt is the most-edited artifact in an AI system and usually the least-tested — treat it as engineered context, not a string you tweak by feel, and the bar above is how we make sure the prompt we ship holds under an input the demo never showed.
If you spot a rule missing — or one of these rules being violated in our public work — write to hello@wearecleon.com. We add it, or we fix it and we say so.
Related
- Prompting gotchas — the failures this Style Guide is designed to prevent
- What is context engineering — the window as the unit of control, behind the first decision
- System prompts and instructions — where salience, ordering, and "structure over volume" are decided
- Structured output — the contract behind "structured and validated," and why prefill is not the technique
- Context windows — the token budget the cache-structure and ordering rules spend
- Prompt caching — the cost lever behind "cache the stable prefix"
- Debugging prompts — finding the failing layer when a prompt goes wrong
- Agent Style Guide — the agent-side anchor whose prompts this bar steers
- Grounding Style Guide — the retrieval-side anchor whose facts a prompt can't conjure
- AI Engineering principles — the meta-rules these specifics operationalize
Reference: