Skip to main content

Grounding Style Guide: the bar retrieval clears before it ships

The opinionated rules Cleon applies to every grounded system — the first decision (do you even need retrieval), the retrieval-quality bar a pipeline clears before it ships, and how we compose Agentforce retrievers and external RAG by where the data lives rather than pick a camp. The discipline document that turns the grounding gotchas into a gate and the principles into practice, scoring retrieval before anyone touches the prompt.

Decision framework·Last updated 2026-06-03·Drafted by Lira · Edited by German Medina

This is the page where Cleon stops describing what grounding is and says what we do before retrieval ships. The reference pages lay out the parts — what grounding is, the chunking and embeddings that set the ceiling, the retrieval quality you measure and tune. The gotchas lay out where each part bites. This Style Guide is the discipline that decides whether to ground at all, and the bar retrieval has to clear before an answer it feeds touches a customer or a record.

The rules are short on purpose. When a rule needs an explanation, the explanation lives in the page it links to. This is the operational form of the AI Engineering principles — each rule below is one of those principles with its sleeves rolled up, and we cite the number so you can trace the rule back to the reasoning. It is the grounding-side companion to the agent Style Guide: an agent reads the world through retrieval, so its bar starts here.

The first decision: do you even need grounding?

Before any of the rest, answer this: does the answer actually need retrieval? The honest test is one question — does the answer depend on facts the model wasn't trained on, or that change? (See what is grounding.) If yes, you ground; no prompt cleverness conjures a fact the model never saw. If no, you don't — and bolting RAG onto a task whose facts already sit in the prompt buys you latency, cost, and a new failure mode for nothing (principle 12 — start from the problem).

The split, by where the facts live:

  • Ground it when the answer depends on your data — a customer's order history, this quarter's policy, a record in the CRM, a document written last week — or on anything that changes faster than the model is retrained: prices, inventory, status. The fact is not in training; retrieval is the only honest way to put it in reach.
  • Don't when the task is self-contained — everything the model needs is already in the prompt. Summarize this email, classify this ticket, rewrite this paragraph, extract these dates. The input carries its own facts; there is nothing to go fetch.

The retrieval-quality bar

A grounded system that ships is chunked to be findable, embedded consistently, kept fresh, measured on retrieval and not just the answer, budgeted in context, permission-aware, and traceable — or it is a demo that retrieved the one document you tested (principle 1). Before retrieval feeds an answer to a customer or a record, confirm every box. Each one closes a gotcha that has turned a working demo into a system that confidently misinforms.

  • [ ] Chunked to be findable. Each chunk carries enough context to stand on its own, and boundaries fall on natural seams — sections, paragraphs, complete rows — not a fixed character count that cuts through meaning. (Principle 2 · gotcha 1.)
  • [ ] Embedded consistently. Every vector in the index — documents and queries alike — is produced by the same embedding model and version, and that model fits this domain's vocabulary. (Principle 2 · gotcha 2.)
  • [ ] Index fresh. A re-index trigger fires when a source changes, and you've set a staleness budget per source you can actually tolerate — a vector index is a cache, wrong the moment the source moves. (Principle 2 · gotcha 3.)
  • [ ] Retrieval measured on its own. A retrieval eval set of real query → correct chunk pairs scores recall and position separately from answer quality, and it runs before the prompt is tuned. (Principle 3 · gotcha 5.)
  • [ ] k tuned to the smallest that clears recall. You retrieve to the smallest number of chunks that reliably holds the answer on the eval set, not the largest you can afford — no over-retrieval "to be safe." (Principle 10 · gotcha 6.)
  • [ ] Permission-filtered at retrieval. A chunk the requesting user can't see is filtered out before ranking, on metadata indexed for exactly this — not retrieved and hopefully suppressed at the UI. (Principle 5 · gotcha 8.)
  • [ ] An "I don't know" path. When retrieval comes back empty or low-relevance, abstaining — "not in the knowledge base," route to a human, ask — is a first-class, tested outcome, not an accident. (Principle 8 · gotchas 4, 7.)
  • [ ] Citations traceable to the chunk used. Every grounded claim carries a citation back to the chunk it actually came from, and you've verified the citations are real — the cited chunk genuinely supports the claim. (Principle 11 · gotcha 10.)

If any box is unchecked, the grounded answer isn't ready — and a confident wrong answer is the worst output a grounded system can produce, because it looks exactly like a correct one until someone checks the source. See debugging grounding for how the eval set turns a wrong answer into a diagnosis instead of a guess.

Composing the toolkit

The toolkit is one set of complementary instruments, not rival camps you choose between. A real system composes them; the skill is fitting each to where the data lives and who governs it, not pledging loyalty to one (principle 7).

  • Agentforce retrievers over Data 360 when the data lives in Salesforce. Retrieval runs inside the security model, permission-aware by construction — the running user's sharing rules and field-level security hold at retrieval, not just the UI — so gotcha 8 is closed by the platform rather than by code you write. The precondition is a clean, governed model: the Data 360 agent-readiness check. See Agentforce retrievers.
  • External RAG when the corpus is off-platform — LangChain to wire the pipeline, a vector store to hold the embeddings, an embeddings provider for the vectors, and the Claude API for the generation. The same pipeline, governed by you: the permission filter of gotcha 8 is yours to build, and it is not optional. See external RAG.

A common shape is Agentforce retrievers grounding the in-platform answer while an external pipeline grounds a corpus the platform doesn't reach — composed by where the data lives, never Agentforce retrievers vs. external RAG. The one decision this page does not own is the surface-level Agentforce vs. external AI call inside Marketing Cloud — that's a different framework, and it lives in full in the Marketing Cloud AI Style Guide.

Patterns to prefer

  • Measure retrieval before tuning the prompt — when an answer is wrong but fluent, score recall and position first; a prompt tuned against broken retrieval just memorizes the breakage. (Principle 3 · gotcha 5.)
  • Hybrid search where exact terms matter — semantic plus keyword (BM25), merged, so codes, IDs, names, and jargon match literally instead of being quietly dropped. (Principle 2 · gotchas 1–2.)
  • The smallest k that clears recall — the smallest number of chunks that reliably holds the answer, tuned against the eval set, never padding the context "just in case." (Principle 10 · gotcha 6.)
  • A permission filter at retrieval — access control holds at retrieval, not the UI; on Agentforce the security model does it for you, off-platform you build it. (Principle 5 · gotcha 8.)
  • An "I don't know" path wired and tested, so an out-of-scope question becomes an abstention rather than a hallucination. (Principle 8 · gotchas 4, 7.)
  • Traceable citations back to the chunk actually used, verified real, so a grounded answer can be audited and replayed. (Principle 11 · gotcha 10.)
  • The Data 360 agent-readiness check passing before grounding on the platform path — a clean model is where safe retrieval starts.

Patterns to refuse

  • Editing the prompt before checking what retrieval returned — the answer is two systems compounded; reword the prompt over the wrong chunks and you fix nothing while the real bug waits. (Gotcha 5 · principle 3.)
  • Over-retrieval "to be safe" — top-fifty instead of top-five floods the window with distractors, and past a point it degrades the answer; context is a budget, not a bucket. (Gotcha 6 · principle 10.)
  • One embedding model for documents and another for queries — vectors in two different spaces, a distance that is noise; the same goes for changing the model without re-embedding the corpus. (Gotcha 2 · principle 2.)
  • An index with no refresh path — a snapshot quietly aging into fiction, correct against itself and wrong against reality, the hardest miss to catch. (Gotcha 3 · principle 2.)
  • Grounding on a fragmented Data 360 model — confidently wrong is worse than "I don't know," because someone acts on it; a clean model is the precondition, not a nicety. (Gotcha 8 · principle 2.)

Closing

None of these rules is hard to apply up front; all of them are expensive to discover live, because a grounding bug looks exactly like a correct answer until someone checks the source. The throughline is the one that runs through every page in this subcategory: the demo retrieves the one right document and the model reads it, while production has to surface the right chunk from everything you own and ground a clean model on it. The bar above is how we make sure the answer we ship stands on the chunk it claims to.

If you spot a rule missing — or one of these rules being violated in our public work — write to hello@wearecleon.com. We add it, or we fix it and we say so.

Related

Reference: