Agent Style Guide: the bar an agent clears before it ships

This is the page where Cleon stops describing what an agent is and says what we do before one ships. The reference pages lay out the parts — what an agent is, the orchestration patterns, the tools and actions. The gotchas lay out where each part bites. This Style Guide is the discipline that decides whether to build an agent at all, and the bar it has to clear before it touches a customer or a record.

The rules are short on purpose. When a rule needs an explanation, the explanation lives in the page it links to. This is the operational form of the AI Engineering principles — each rule below is one of those principles with its sleeves rolled up, and we cite the number so you can trace the rule back to the reasoning.

The first decision: agent, workflow, or single prompt

Before any of the rest, answer this: does the job actually need an agent? Most things scoped as "an agent" are a workflow wearing the word. The honest test is one question — can you enumerate the path ahead of time? If you can draw the flowchart, you don't need an agent; you need a workflow (see what is an agent). Climb this ladder only as far as the problem forces you, and not one rung higher (principle 12 — start from the problem).

The four shapes, ordered by who decides the next step:

Single prompt — nobody decides the flow; one call, one response. Reach for it when the model already knows how: summarize, classify, rewrite, extract.
Chain — you decide; a fixed sequence of prompts. Reach for it when the task has clear stages that always run in the same order.
Workflow — you decide; branches and logic, all of them yours. Reach for it when you can draw the flowchart, even a large messy one.
Agent — the model decides; the loop asks it what to do next. Reach for it only when the path genuinely can't be written down ahead of time.

Moving down the list shifts control of the flow from you to the model, buying capability on open-ended problems at the cost of predictability, money, and testability. Reach for an agent only at the last rung, when the next step depends on what earlier steps return in ways no fixed chart could cover. Every branch you can name is a branch you should hard-code rather than hope the model picks right each run.
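The ladder above can be written down as a small decision function. This is a minimal sketch, not a Cleon tool: the name choose_shape and its three yes/no questions are assumptions made for illustration, distilled from the four descriptions in the list.

```python
from enum import Enum


class Shape(Enum):
    SINGLE_PROMPT = "single prompt"
    CHAIN = "chain"
    WORKFLOW = "workflow"
    AGENT = "agent"


def choose_shape(
    path_is_enumerable: bool,
    needs_multiple_steps: bool,
    has_branches: bool,
) -> Shape:
    """Return the smallest shape that fits (hypothetical helper).

    The three questions are an illustrative reading of the list above,
    not an official checklist.
    """
    # If nobody can draw the flowchart, the model has to decide the path.
    if not path_is_enumerable:
        return Shape.AGENT
    # One call, one response: no flow to decide at all.
    if not needs_multiple_steps:
        return Shape.SINGLE_PROMPT
    # Multiple steps with branching logic that you own.
    if has_branches:
        return Shape.WORKFLOW
    # Multiple steps that always run in the same order.
    return Shape.CHAIN
```

Note that the agent branch is reached only when the first question fails; every other answer resolves to a shape whose flow you control.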

The production checklist: the bar an agent clears

An agent that ships is grounded, evaluated, bounded, governed, and owned — or it's a demo that hasn't failed yet (principle 1). Before this agent touches a customer or a record, confirm every box. Each one closes a gotcha that has killed a real project.

Grounded. It answers from retrieved facts, not from training, and "I don't know" is a path it's allowed to take when retrieval comes back empty. (Principle 2 · gotcha 8.)
Evaluated. An eval set of real cases with known-good outcomes runs on every change before it ships — built from the first ten real failures, scoring outcome and trajectory, not an exact string. (Principle 3 · gotcha 2.)
Bounded. A hard step cap, a token budget, and a timeout are enforced in code, not requested in the prompt. (Principle 6 · gotcha 3.)
Least-privilege tools. Every tool has the narrowest scope that does the job, is validated against a strict schema before it runs, and is idempotent where the action allows. (Principle 5 · gotchas 4–5.)
Gated. Every irreversible action (send, charge, delete) has a human approval gate before it fires, and an operator can halt a running agent immediately. (Principle 5 · gotcha 6.)
Fallback. A deterministic fallback fires when the agent fails or returns something invalid; a model failure never renders as a blank, an error string, or a hallucination in front of someone. (Principle 8 · gotcha 9.)
Curated context. Each step gets what it needs, not everything you have — the window is summarized, dropped, or retrieved on demand, not grown unbounded. (Principle 10 · gotcha 7.)
Traced. Every run logs its inputs, retrieved context, tool calls and results, and final output, so you can replay it when it goes wrong. (Principle 11.)
Owned. A named human is accountable for every consequential outcome — can explain it, defend it, and has the controls to act. (Principle 9 · gotcha 10.)
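The Evaluated box can be sketched as a tiny harness that scores each case on outcome and trajectory rather than on an exact output string. Everything here is hypothetical: EvalCase, RunRecord, score_case, and run_eval are illustrative names, and the assumption is that a run can be reduced to a normalized outcome label plus the ordered list of tools it called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    expected_outcome: str              # a normalized label, not the full reply text
    required_tools: tuple[str, ...]    # tools a good run must call
    forbidden_tools: tuple[str, ...] = ()  # tools a good run must not call


@dataclass(frozen=True)
class RunRecord:
    outcome: str
    tools_called: tuple[str, ...]


def score_case(case: EvalCase, run: RunRecord) -> dict[str, bool]:
    """Score one run on outcome and on trajectory, independently."""
    called = set(run.tools_called)
    return {
        "outcome": run.outcome == case.expected_outcome,
        "trajectory": set(case.required_tools) <= called
        and called.isdisjoint(case.forbidden_tools),
    }


def run_eval(cases: list[EvalCase], agent: Callable[[str], RunRecord]) -> list[str]:
    """Run every case and return the names of the cases that failed."""
    failures = []
    for case in cases:
        score = score_case(case, agent(case.prompt))
        if not all(score.values()):
            failures.append(case.name)
    return failures
```

Under these assumptions, a change ships only when run_eval returns an empty list. A run that reaches the right outcome through a forbidden tool still fails, which is the point of scoring trajectory separately.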

If any box is unchecked, the agent isn't ready — and "the agent decided" won't be an acceptable answer when it's wrong. See debugging agents for how the trace turns a failed run into a fix.
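The Bounded, Fallback, and Traced boxes can be sketched together as one loop wrapper. This is a minimal sketch under stated assumptions: run_bounded, StepResult, and RunTrace are hypothetical names, the step callable stands in for one model turn, and the timeout is checked between steps, so it does not interrupt a step already in flight.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StepResult:
    tokens_used: int
    final_answer: Optional[str] = None  # None means the agent wants another step


@dataclass
class RunTrace:
    steps: list = field(default_factory=list)
    stopped_because: str = ""


def run_bounded(
    step: Callable[[int], StepResult],
    *,
    max_steps: int,
    token_budget: int,
    timeout_s: float,
    fallback: str,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[str, RunTrace]:
    """Run a step function under limits enforced in code (hypothetical helper)."""
    trace = RunTrace()
    deadline = clock() + timeout_s
    tokens = 0

    def stop(reason: str, answer: str) -> tuple[str, RunTrace]:
        trace.stopped_because = reason
        return answer, trace

    for i in range(max_steps):  # hard step cap
        if clock() >= deadline:  # checked between steps only
            return stop("timeout", fallback)
        try:
            result = step(i)
        except Exception as exc:  # a model failure never reaches the user raw
            trace.steps.append({"step": i, "error": repr(exc)})
            return stop("step_error", fallback)
        tokens += result.tokens_used
        trace.steps.append(
            {"step": i, "tokens_used": result.tokens_used, "final_answer": result.final_answer}
        )
        if tokens > token_budget:
            return stop("token_budget", fallback)
        if result.final_answer is not None:
            if not result.final_answer.strip():  # blank output counts as invalid
                return stop("invalid_output", fallback)
            return stop("done", result.final_answer)
    return stop("step_cap", fallback)
```

Every exit path returns either a valid answer or the deterministic fallback, and the trace records why the run stopped. A real trace would also carry inputs, retrieved context, and tool calls, which this sketch leaves out.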

Composing the toolkit

The toolkit is one set of complementary instruments, not rival camps you choose between. A real system composes them; the skill is fitting each to the job, not pledging loyalty to one (principle 7).

Agentforce when the work lives in the Salesforce security model and needs governed, auditable actions on customer data. The action runs as a user under their permissions, sharing rules and field-level security still apply, and the platform records what ran — see Agentforce agents.
LangGraph and the Claude API when the work is off-platform, needs a custom control loop, or spans models Salesforce doesn't reach — a graph for deterministic control over a non-deterministic step, a Claude loop for the reasoning core. See external agents.
MCP when you want a tool, and the data behind it, exposed once and reused by any host that speaks the protocol — the connective tissue across systems, not a third camp.

A common shape is Agentforce owning the in-platform actions while an external graph owns a step Agentforce can't, with the two handing off to each other. The one decision this page does not own is the surface-level choice between Agentforce and external AI inside Marketing Cloud; that's a different framework, and it lives in full in the Marketing Cloud AI Style Guide.

Patterns to prefer

The smallest shape that fits the job — single prompt over chain, workflow over agent, single loop over a graph of agents. The strongest system that could work is usually the smallest one. (Principle 12 · gotcha 1.)
A fixed, narrow tool set over an open-ended one — each tool scoped at its boundary in code, not asked to behave in the prompt. (Principle 5 · gotcha 5.)
Grounding checked first when an answer is wrong but fluent — suspect retrieval before you blame the model. (Principle 2 · gotcha 8.)
A human gate on anything consequential an agent produces or any irreversible action it can take. (Principle 9 · gotcha 6.)
The Data 360 agent-readiness check passing before an agent reads customer data — a clean model is where a safe agent starts.
A deterministic fallback under every generation, so the floor is boring-and-correct, never an exciting wrong answer. (Principle 8 · gotcha 9.)
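Two of the preferences above, a narrow tool set scoped in code and a human gate on irreversible actions, can be sketched as a single dispatch function. The names Tool, ToolRejected, and call_tool are hypothetical, the schema is a plain mapping of argument name to exact type (an assumption made to keep the example dependency-free), and approve stands in for whatever approval step a real system uses.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolRejected(Exception):
    """Raised when a tool call is refused before it runs."""


@dataclass(frozen=True)
class Tool:
    name: str
    schema: dict[str, type]  # exact argument names and exact types
    irreversible: bool       # send, charge, delete and similar
    run: Callable[..., Any]


def validate_args(tool: Tool, args: dict[str, Any]) -> None:
    """Strict check: no missing keys, no extra keys, exact types."""
    if set(args) != set(tool.schema):
        raise ToolRejected(f"{tool.name}: arguments do not match schema")
    for key, expected in tool.schema.items():
        # type(...) is ... rejects subclasses, so True is not accepted as an int
        if type(args[key]) is not expected:
            raise ToolRejected(f"{tool.name}: {key} must be {expected.__name__}")


def call_tool(
    registry: dict[str, Tool],
    name: str,
    args: dict[str, Any],
    approve: Callable[[Tool, dict[str, Any]], bool],
) -> Any:
    """Dispatch a model-requested call through the fixed registry."""
    tool = registry.get(name)
    if tool is None:  # the model cannot invent tools
        raise ToolRejected(f"unknown tool: {name}")
    validate_args(tool, args)
    if tool.irreversible and not approve(tool, args):  # human gate
        raise ToolRejected(f"approval denied for {name}")
    return tool.run(**args)
```

Because the registry is fixed and validation runs before the tool does, a malformed or unapproved request is refused at the boundary rather than relying on the prompt to keep the model in line.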

Patterns to refuse

An agent that should be a workflow — if you can draw the flowchart, the agent costs more, runs slower, and resists the eval set you need. (Gotcha 1 · principle 12.)
A tool that can delete, send, or charge with no gate — least privilege isn't optional once the agent has hands. (Gotcha 5 · gotcha 6 · principle 5.)
Shipping with no eval set — "it looked good in three tries" is a vibe, not a test, and the regression ships silently. (Gotcha 2 · principle 3.)
An unbounded loop — no step cap or token budget means one stuck run can cost more than a thousand good ones. (Gotcha 3 · principle 6.)
An agent grounded on a fragmented model — confidently wrong is worse than "I don't know" because someone acts on it. (Gotcha 8 · principle 2.)
"The AI did it" as the answer to who's accountable — an incident with no owner is how a fixable mistake becomes an organizational one. (Gotcha 10 · principle 9.)

Closing

None of these rules is hard to apply up front; all of them are expensive to discover live. The throughline is the one that runs through every page in this subcategory: the demo makes the easy path look like the whole path, and production is everything the easy path left out. The bar above is how we make sure the agent we ship is the system, not the trick.

If you spot a rule missing — or one of these rules being violated in our public work — write to hello@wearecleon.com. We add it, or we fix it and we say so.

Agent gotchas — the failures this Style Guide is designed to prevent
What is an agent — the definition and the agent-or-workflow test behind the first decision
Orchestration patterns — the shapes a control loop takes, and what each costs
Tools and actions — least privilege, schemas, idempotency, and gates
Agentforce agents — the agent inside the Salesforce security model
External agents — the LangGraph and Claude API side, off-platform
Debugging agents — tracing a run when it goes wrong
AI Engineering principles — the meta-rules these specifics operationalize
Data 360 agent-readiness check — the grounding side of a safe agent
