Grounding & Retrieval
Connecting models to your knowledge: RAG done right — chunking, embeddings, retrieval quality — across Agentforce retrievers over Data 360 and external vector stores. The foundation an agent’s answers stand on.
Foundation · 2
Production note
Grounding gotchas: how RAG fails in production
A RAG demo retrieves the one document you tested, on the one question you asked. Production retrieves from everything you have, on questions nobody scripted — and the wrong chunk is a confident wrong answer. Ten gotchas that kill grounding after the demo, each with the question to answer first and the cost of getting it wrong.
Decision framework
Grounding Style Guide: the bar retrieval clears before it ships
The opinionated rules Cleon applies to every grounded system: the first decision (do you even need retrieval), the retrieval-quality bar a pipeline clears before it ships, and how we compose Agentforce retrievers and external RAG by where the data lives rather than picking a camp. This is the discipline document that turns the grounding gotchas into a gate and the principles into practice, scoring retrieval before anyone touches the prompt.
Reference · 5
Reference
What is grounding? The retrieval pipeline an answer stands on
Grounding is feeding the model real retrieved facts instead of letting it answer from training — and RAG, retrieval-augmented generation, is how it's built. The pipeline end to end: chunk → embed → store → retrieve → augment → generate, each stage in a sentence. The vocabulary the rest of this subcategory uses — chunk, embedding, vector store, semantic search, hybrid search, top-k, re-ranking — and the honest test for when you need retrieval at all: only when the answer lives in data the model wasn't trained on or that changes. Principle 2: ground before you generate.
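As a rough illustration of the chunk → embed → store → retrieve → augment stages, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names are hypothetical, the "embedding" is a toy bag-of-words vector standing in for a real embedding model, and the "store" is an in-memory list rather than a vector database. The final generate stage is the model call itself and is left out.

```python
import math
from collections import Counter


def chunk(text, size=8):
    """Split text into fixed-size word windows (the simplest chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_store(docs, size=8):
    """Chunk every document and keep (chunk_text, vector) pairs in memory."""
    return [(c, embed(c)) for d in docs for c in chunk(d, size)]


def retrieve(store, query, k=2):
    """Return the k chunks most similar to the query (top-k retrieval)."""
    q = embed(query)  # the query goes through the same embed() as the documents
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def augment(query, chunks):
    """Build the grounded prompt that would be sent to the model."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Refunds are issued within 14 days of the return request.",
    "Shipping to Canada takes five business days.",
]
store = build_store(docs, size=20)
question = "How long do refunds take?"
prompt = augment(question, retrieve(store, question, k=1))
```

The point of the sketch is the shape, not the parts: each stage is a separate function with its own failure mode, which is why each can be tuned and tested separately.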
Reference
Chunking and embeddings: the inputs retrieval quality depends on
The two upstream levers retrieval quality stands on: chunking and embeddings. How you split a document — fixed-size, structural, semantic — and the chunk-size and overlap trade-offs that either preserve or destroy meaning, including Anthropic's Contextual Retrieval. What an embedding is, why the embedding model is a real choice (dimension, cost, latency, domain fit), and why query and document must share one model. Anthropic ships no first-party embedding model — you pair Claude with a provider. Get these wrong and no retrieval tuning saves you.
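To make the chunk-size and overlap trade-off concrete, the following is a minimal sketch of fixed-size chunking with overlap. The function name and parameters are hypothetical, and it splits on a pre-tokenized list of words for simplicity; a real pipeline would more likely count model tokens and respect structural boundaries.

```python
def chunk_with_overlap(words, size, overlap):
    """Split a list of words into windows of `size`, each sharing `overlap`
    words with the previous window so a sentence cut at a boundary still
    appears whole in at least one chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


letters = list("abcdefghij")
windows = chunk_with_overlap(letters, size=4, overlap=1)
# windows == [['a','b','c','d'], ['d','e','f','g'], ['g','h','i','j']]
```

Larger overlap preserves more context across boundaries but stores and embeds more duplicate text, which is the cost side of the trade-off.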
Reference
Retrieval quality: measuring and improving what the model gets
Retrieval quality is separate from answer quality, and you have to measure it on its own. This page splits the two: first how to score whether the right chunk came back at all and how high it ranked, using a small retrieval eval set built query-to-chunk, then the levers that improve it (hybrid search, re-ranking, metadata filtering, query rewriting, and chunk/k tuning), each with where it fits and what it costs. The throughline: a grounded system is only as good as its retrieval, and retrieval is only trustworthy once it's measured.
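One way to score "did the right chunk come back, and how high did it rank" is hit rate at k plus mean reciprocal rank (MRR) over a query-to-chunk eval set. The sketch below assumes one relevant chunk ID per query and a hypothetical `retrieve` callable that returns ranked chunk IDs. The chunk IDs and the fake retriever are made up for illustration and are not results from any real system.

```python
def hit_at_k(retrieved_ids, relevant_id, k):
    """True if the relevant chunk appears anywhere in the top k results."""
    return relevant_id in retrieved_ids[:k]


def reciprocal_rank(retrieved_ids, relevant_id):
    """1/rank of the relevant chunk, or 0.0 if it was not retrieved."""
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def score_retrieval(eval_set, retrieve, k=3):
    """Score a retriever against (query, relevant_chunk_id) pairs."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = 0
    rr_total = 0.0
    for query, relevant_id in eval_set:
        retrieved_ids = retrieve(query)
        hits += hit_at_k(retrieved_ids, relevant_id, k)
        rr_total += reciprocal_rank(retrieved_ids, relevant_id)
    n = len(eval_set)
    return {"hit_rate_at_k": hits / n, "mrr": rr_total / n}


# Hypothetical retriever output, hard-coded so the example is self-contained.
fake_results = {
    "q1": ["c1", "c2", "c3"],  # relevant chunk ranked first
    "q2": ["c9", "c4", "c5"],  # relevant chunk ranked second
    "q3": ["c7", "c8", "c6"],  # relevant chunk missing
}
eval_set = [("q1", "c1"), ("q2", "c4"), ("q3", "c0")]
report = score_retrieval(eval_set, fake_results.__getitem__, k=3)
```

Because the score depends only on chunk IDs, it can be rerun unchanged after each lever is pulled (a new chunk size, a re-ranker, a metadata filter) to see whether retrieval actually improved.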
Reference
Agentforce retrievers: grounding inside the Salesforce platform
The platform-native grounding path: a retriever wraps an Einstein Search operation and bridges a search index to your Prompt Templates and Flow, so an Agentforce agent answers over Data 360 instead of guessing. How it is assembled — Data 360's auto-created retriever per index versus a custom one in Einstein Studio, vector and hybrid search index types, no-code retrievers for admins versus custom for control, ensemble retrievers across sources — and the part that earns the platform its place: retrieval runs inside the Salesforce security model, honoring the running user's permissions by construction. The complementary line to an external RAG pipeline (principle 7), and the clean Data 360 model that has to come first.
Reference
External RAG: the grounding pipeline you own
The off-platform grounding path as a pipeline you build stage by stage — LangChain to orchestrate it, a vector store you run, an embeddings provider paired with Claude, and the Claude API for generation. The thing that defines it: when the corpus lives outside Salesforce, you own the chunking, the index, the retrieval tuning, the permission filter, the freshness contract, and the eval set that Agentforce retrievers hand you for free. The right call when the data and the work are off-platform — and complementary to the platform path, not a rival to it.
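Of the responsibilities listed above, the permission filter is the easiest to get subtly wrong, so a small sketch may help. It assumes each chunk carries the set of groups allowed to read it and that the caller knows the user's groups. The `Chunk` type, its field names, and `permitted_top_k` are hypothetical, not part of LangChain or any vector store API. Many vector stores can apply an equivalent metadata filter inside the query, which is usually preferable when available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # groups permitted to read this chunk
    score: float               # similarity score from the retriever


def permitted_top_k(candidates, user_groups, k):
    """Drop chunks the user may not read, then keep the k best of the rest.

    Filtering happens before the top-k cut, so a restricted chunk never
    takes a slot away from one the user is allowed to see.
    """
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]


candidates = [
    Chunk("salary bands", frozenset({"hr"}), 0.9),
    Chunk("holiday policy", frozenset({"all"}), 0.7),
    Chunk("office map", frozenset({"all"}), 0.4),
]
results = permitted_top_k(candidates, user_groups={"all"}, k=2)
```

In this example the highest-scoring chunk is excluded for a user outside the "hr" group, and the two remaining chunks fill the top-k slots.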