Data 360 query gotchas: where the SQL instinct misleads

Data 360 (formerly Data Cloud) gives you three ways to ask the data a question: a SQL dialect you run in the Query Editor or the Query API, Calculated Insights that pre-compute a metric once and serve it everywhere, and the consumers that read both. To a Marketing Cloud practitioner all of it looks familiar — it's SQL, you already write SQL — and that familiarity is exactly where it bites.

Ten query gotchas that bit Cleon's Data 360 builds, synthesized with Salesforce's official guidance and the corrections the practitioner community learned the hard way. Each is paired with the question to answer before the query and the cost of getting it wrong. The throughline is the one decision this whole subcategory turns on: compute once and retrieve many (a Calculated Insight) versus query live every time (the Query API) — and the model underneath both.

The gotchas

1. Query the DMO, not the DLO — the lake is not your query surface

A Data Lake Object is the raw landing of a stream, in the source system's shape. A Data Model Object is the harmonized view everything downstream should read. Querying a DLO directly — or defining a Calculated Insight on one — is the query-time version of letting the lake leak into your model: you inherit the source's naming and mess, and the day that source renames a column, the query breaks with no warning.

The cost lands on every consumer of that query at once. The question to answer before you point a query at an object: does a DMO already express the meaning you need — and if it doesn't, is the gap in the model rather than something to paper over in the query?

2. The dialect looks like Marketing Cloud SQL — and the resemblance is the trap

Data Cloud SQL is ANSI-leaning. It is not the T-SQL subset you write against Data Extensions, and the instinct carries you most of the way before it drops you: function names differ, there is no INSERT INTO ... SELECT activity model, there are no _Subscribers or _Sent System Data Views, and the objects you query are DMOs over a unified profile rather than flat DEs.

-- Marketing Cloud habit: query a flat DE, lean on System Data Views
SELECT s.SubscriberKey, j.EmailName
FROM _Sent j
INNER JOIN _Subscribers s ON s.SubscriberID = j.SubscriberID;

-- Data 360: query DMOs over the unified profile; no System Data Views,
-- qualified object names, ANSI semantics
SELECT i.ssot__Id__c, i.ssot__FirstName__c
FROM UnifiedIndividual__dlm i
WHERE i.ssot__FirstName__c IS NOT NULL;

The cost is a query you'd swear is correct that won't parse, or returns the wrong shape. The question: which of your MC SQL habits actually transfer, and which are muscle memory that misfires here? (The crosswalk has its own page in this subcategory.)

3. A Calculated Insight is exactly as fresh as its last run — and the staleness is silent

A CI is pre-computed. It runs on a schedule, or on a stream, and between runs it serves the last result it computed — with nothing in the consumer to tell you the world has moved on. An insight computed on yesterday's data is a confident wrong answer, and the segment or agent that reads it has no idea the number is a day old.

4. Cost scales with what you process, not what you store — a CI that scans the whole profile every hour is a budget decision

Data 360 bills the work — the rows a query scans, the data a Calculated Insight recomputes — far more than the data sitting at rest. A CI that re-aggregates the entire profile every hour, or a segment that scans everything on each refresh, is a cost decision wearing a logic decision's clothes.

5. A Calculated Insight is dimensions and measures, not arbitrary SELECT — the grain is the decision

A CI is not a free-form query you alias into a table. It is an aggregation defined by its dimensions — what you group by — and its measures — what you compute. Get the grain wrong (group by the wrong key, or sum where you needed a distinct count) and the number is wrong everywhere it is retrieved, consistently and silently.

-- The grain IS the decision: one row per buyer, their distinct orders summed.
-- Group by the wrong dimension and every consumer inherits the wrong number.
SELECT
  o.ssot__BuyerId__c                AS buyer,          -- dimension: the grain
  COUNT(DISTINCT o.ssot__Id__c)     AS order_count,    -- measure
  SUM(o.ssot__GrandTotalAmount__c)  AS lifetime_value  -- measure
FROM ssot__SalesOrder__dlm o
GROUP BY o.ssot__BuyerId__c;

The cost is a metric that's wrong by design — retrieved identically by every segment, activation, and agent. The question: at what grain is this metric actually true, and do the dimensions express exactly that grain?

6. Batch and streaming Calculated Insights are different tools with different limits

A batch CI recomputes on a schedule: full expressive power, latency measured in the refresh interval. A streaming CI updates continuously: low latency, but with real constraints on the operations and the lookback window it supports. Reaching for streaming because "real-time is better" and then hitting its limits, or using batch where a decision genuinely needs sub-hour freshness, are the two symmetrical mistakes.

The cost is a rebuild when the CI type can't do the job you discovered it needed. The question: does this metric drive a real-time decision or a periodic one — and does the CI type match the answer, rather than the aspiration?

7. The Query API paginates and goes async for large results — assuming one synchronous response truncates your analysis

The Query API runs ad-hoc SQL over the data model, but large result sets paginate, and big queries run asynchronously: you submit, then retrieve. Code that reads the first response and stops is silently analyzing a fraction of the data, with no error to flag it.

The cost is an export or an integration that is quietly missing most of its rows — the worst kind of wrong, because the numbers look plausible. The question: does this query's result fit a single synchronous page, and if it doesn't, does the caller actually handle pagination and async retrieval, or just read page one?

8. A join you didn't model is a join the query can't make

Querying across DMOs — individuals to their orders, orders to their items — only works where the relationship is modeled. A query that needs an unmodeled traversal doesn't fail with a helpful message; the path simply isn't available, and the question can't be written. This is the data-architecture relationship decision seen from the query side: the query is where the gap the model left finally shows up.

The cost is discovering mid-analysis that the question you were asked can't be answered without a model change — with everything already depending on the model. The question: does every traversal this query needs correspond to a relationship someone actually modeled?

9. You're querying a resolved profile, not raw records — count the right object or the numbers won't reconcile

Identity resolution merges source rows into a unified individual. Query the unified DMO and you are counting people; query a source-aligned object and you are counting records — and the two will not match, by design. The mistake is assuming a count is a count.

The cost is two dashboards that disagree and an afternoon spent reconciling them, only to find both are "right" against different objects. The question: for this number, are you counting unified individuals or source rows — and is that the object you actually meant to count?

10. An agent inherits every flaw in your query layer — and answers confidently anyway

The most modern version of all of the above: an agent — Agentforce or an external LLM — that retrieves a Calculated Insight gets its staleness, its cost profile, and its grain exactly as you defined them. A wrong CI doesn't make the agent hesitate; it makes the agent confidently wrong, which is worse than an agent that can't answer at all.

The cost is the most expensive kind of wrong: authoritative, fluent, and trusted. "Agent-ready query" is not a feature you switch on — it's the state you're already in when the CIs are correct, fresh, and defined at the right grain. The honest question: would a human analyst trust this CI's number without a caveat? If not, neither should an agent.

The throughline across all ten: the query layer is only ever as good as the model beneath it and the Calculated Insights defined over it. Compute once and retrieve many is the discipline that keeps a metric consistent; the right grain and the right freshness are what make it true; and an agent grounded on the result inherits whatever you decided. Every gotcha here is, in the end, the same gotcha — the query exposes a decision someone made earlier, and the cheapest place to fix it is earlier still.

Closing

These ten are the query and insight choices Cleon has seen bite hardest in Data 360 builds. The shared theme echoes the Marketing Cloud SQL catalog one product over: the surface makes the easy query easy and the durable query deliberate. Querying the lake, trusting a stale insight, computing at the wrong grain, reading only page one — none is hard in the moment, and each is a number nobody can trust once segments, activations, and agents are reading it.

If a query or insight gotcha bit your team and isn't here, write to hello@wearecleon.com — we add it, with credit.

Reference: