Identity resolution gotchas: the silent failures

Match rules and reconciliation rules are the easy demo and the hard production system. In a sandbox with a few thousand clean records, identity resolution looks like a checkbox: turn it on, watch profiles unify, move on. The cost shows up later, at scale, on real data — and almost never as an error. Identity resolution fails quietly. The job runs green, the unified profiles populate, the counts look plausible, and the failure is two customers who can now see each other's orders, or one customer fragmented across three profiles whose metrics will never reconcile.

These are seven identity resolution choices that bit Cleon's Data 360 (formerly Data Cloud) builds, synthesized with Salesforce's official guidance and the corrections the practitioner community learned the hard way. Each is paired with the instinct that leads you in, what actually happens in production, and the fix. The throughline is the one this subcategory keeps returning to: identity resolution is a business decision, not a checkbox (principle 4), and its two worst failures, over-merge and over-split, are both silent. The two also fail asymmetrically: a false merge is a privacy incident; a false split is a metrics nuisance. Tune against the one you can least afford; never split the difference blindly.

The gotchas

1. Loosening the rules to "catch more matches" — over-merge fuses two customers, and they see each other's data

The instinct is reasonable and quietly dangerous: matches are being missed — the same person clearly exists twice — so you relax a criterion or add a looser rule to catch them. Each loosening does catch real duplicates, which is exactly what makes it feel safe. But a match ruleset is an OR of rules (match rules): every rule you add or widen is one more way for two records to be declared the same person, and the engine cannot tell a correct merge from an incorrect one. It applies the rule and moves on.

What actually happens in production is over-merge: two different people collapse into one UnifiedIndividual__dlm. Their contact points, attributes, and history pool into a single profile — and because everything downstream reads the unified individual, both customers now resolve to the same person. A segment targets them as one. An activation sends one of them content keyed on the other's behavior. In a service or commerce context, one customer can see the other's orders. Nothing errored; the rule did exactly what it was told.

2. Tightening the rules to "be safe" — over-split fragments one person, and the metrics double-count

The mirror instinct, and it feels like the responsible one: over-merge is scary, so you tighten — add a criterion to a rule, require an exact key everywhere, refuse anything fuzzy. Stricter rules can't merge two people who aren't the same, so the privacy risk drops. The trap is that strictness is not free; it just moves the failure to the other side.

What actually happens is over-split: one real person who should unify stays fragmented across several UnifiedIndividual__dlm rows, because no single rule's full AND-group fired across their records. The damage is the inverse of over-merge and just as silent. A count of unified individuals over-states how many people you have. A suppression set built on one fragment misses the person's other fragments, so they get the message you suppressed. Personalization runs on a partial history because the rest of the person lives on another profile. The "single view" the platform promised quietly isn't — and no error says so.

The fix is to stop treating "stricter is safer" as universally true. Stricter is safer against over-merge and worse against over-split; which one your rules should lean toward is the asymmetric-risk decision the Style Guide frames. Test the ruleset against real data, count both directions through the link object (gotcha 4), and have the business sign off on where the line sits.
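One way to look for over-split candidates is to ask which email addresses resolve to more than one unified individual. The query below is a sketch under assumptions, not a definitive check: ssot__ContactPointEmail__dlm, ssot__EmailAddress__c, and ssot__PartyId__c are assumed object and field names for the source email contact point, and the link fields are the same assumed names used in the bridge query in gotcha 4. Confirm all of them against your org's model before running it.

```sql
-- Over-split candidates: one normalized email that lands on several unified individuals.
-- Object and field names are assumptions; adjust them to your org's data model.
SELECT
  LOWER(TRIM(cpe.ssot__EmailAddress__c))  AS email_normalized,
  COUNT(DISTINCT link.UnifiedRecordId__c) AS unified_individuals
FROM ssot__ContactPointEmail__dlm cpe
JOIN IndividualIdentityLink__dlm link
  ON link.SourceRecordId__c = cpe.ssot__PartyId__c
WHERE cpe.ssot__EmailAddress__c IS NOT NULL
GROUP BY LOWER(TRIM(cpe.ssot__EmailAddress__c))
HAVING COUNT(DISTINCT link.UnifiedRecordId__c) > 1
ORDER BY unified_individuals DESC;
```

Treat the result as a review list rather than a verdict: a shared household or team address legitimately belongs to more than one person, so each row still needs a human or business rule to decide whether it is a real split.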

3. Counting the wrong individual object — the source-versus-unified mix-up that breaks every metric

The instinct is to write FROM ssot__Individual__dlm because it's the object you ingested and the one whose name you learned first. It returns a number, the number looks like customers, and the report ships. The problem is that ssot__Individual__dlm counts source records — one row per individual per source — while UnifiedIndividual__dlm counts people. They are not two views of one count; they are two different questions, and the arithmetic only runs one way: unified is always less than or equal to source (the unified individual).

What actually happens is a "customers" metric that silently double-counts everyone who exists in two systems, because each duplicate source row is counted as a person. Switch the same metric to the unified object and the number drops — not because customers left, but because you stopped counting the same person twice. The two objects mixed across reports never reconcile, and the gap looks like a data bug long before anyone suspects the grain. Worse: someone may "fix" the gap by loosening match rules to force the counts together — closing a discrepancy that was identity resolution working correctly, and trading it for over-merge (gotcha 1).
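The two grains can be put side by side with a query like the one below. It is a minimal sketch that uses only the two object names already discussed; the grain labels are illustrative.

```sql
-- Source rows versus unified people: two different questions, reported together.
SELECT 'source_records' AS grain, COUNT(*) AS row_count
FROM ssot__Individual__dlm
UNION ALL
SELECT 'unified_people' AS grain, COUNT(*) AS row_count
FROM UnifiedIndividual__dlm;
```

Publishing both numbers with their grain named makes the gap between them read as expected deduplication rather than as a discrepancy someone needs to close.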

4. Joining source straight to unified — the direct-join trap returns a wrong mapping, not an error

The instinct comes from every other data model you've used: two objects that describe the same thing must share a key, so you find a plausible shared field and write ssot__Individual__dlm = UnifiedIndividual__dlm on it. It feels like ordinary SQL. It is the one traversal the model does not support — and the failure mode is the dangerous kind, because depending on the field you pick, it may not error.

What actually happens is one of two things, both bad. Either the join finds no real relationship and resolves to nothing, or — far worse — a hand-written = on some shared-looking field returns a mapping that is plausible but wrong, silently pairing source rows to unified rows that identity resolution never linked. You then build counts, segments, and audits on a relationship the platform never asserted. The unified individual stores no foreign key naming its sources, and the source individual stores none naming its unified row; the relationship lives only in IndividualIdentityLink__dlm (the unified individual).

The fix is non-negotiable and it is the same on every page in this subcategory: source-to-unified always traverses IndividualIdentityLink__dlm, never a direct join. Two hops, alias and qualify every column:

-- Source individual -> its unified individual, through the link object.
-- Never join ssot__Individual__dlm directly to UnifiedIndividual__dlm.
-- link/source join-field names (e.g. SourceRecordId/UnifiedRecordId) follow your org's model
SELECT
  src.ssot__Id__c            AS source_individual_id,
  uni.ssot__Id__c            AS unified_individual_id
FROM ssot__Individual__dlm src
JOIN IndividualIdentityLink__dlm link
  ON link.SourceRecordId__c = src.ssot__Id__c
JOIN UnifiedIndividual__dlm uni
  ON uni.ssot__Id__c = link.UnifiedRecordId__c;

The same bridge run the other direction — counting source records per unified individual — is how you actually measure over-merge and over-split: a unified row linked to many source records may be over-merged; a person you know is one customer scattered across several unified rows is over-split. (For the full clause behavior and the IS NOT DISTINCT FROM nullable-key join, see Data Cloud SQL.)
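The reverse direction described above can be sketched as a grouped count on the link object. SourceRecordId__c and UnifiedRecordId__c are the same assumed field names as in the query above, and the threshold of 5 is an arbitrary illustration, not a recommended value.

```sql
-- Over-merge candidates: unified individuals linked to unusually many source records.
-- The threshold (5) is a placeholder; pick one that fits how many sources you ingest.
SELECT
  link.UnifiedRecordId__c       AS unified_individual_id,
  COUNT(link.SourceRecordId__c) AS source_record_count
FROM IndividualIdentityLink__dlm link
GROUP BY link.UnifiedRecordId__c
HAVING COUNT(link.SourceRecordId__c) > 5
ORDER BY source_record_count DESC
LIMIT 100;
```

A high count is a reason to inspect the profile, not proof of a false merge: a long-standing customer present in many systems can legitimately carry many source records.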

5. Trusting the match and ignoring reconciliation — the "wrong" value won because of the rule you didn't choose

The instinct is to treat identity resolution as finished once the right records merge: the people are correct, so the profile is correct. But matching only decides which source values are candidates; reconciliation decides which one wins per attribute (reconciliation rules). Skip that decision and you don't skip the outcome — you inherit the ruleset default on every attribute, and a default nobody chose is still a choice.

What actually happens is a profile that's right about the person and wrong about their data. Set Most Recent on the email attribute and a stale address from last night's batch import outranks the address the customer typed into a form last week — because the batch job is "most recent," not the human. The unified profile presents the wrong email, the segment filters on it, and the activation sends to it. No error fires; the first you hear of it is a bounce report or a customer who stopped getting messages they expect. The same trap sets a birthdate that flips every time a low-quality list re-imports, because recency let the newest write win regardless of source quality.

The fix is to choose reconciliation per attribute, against how each value actually behaves. Use Source Priority pointed at the system of record for attributes that have one — the CRM owns the verified email, billing owns account status — so authority holds regardless of which feed wrote last. Reserve Most Recent for attributes a customer actively changes and where the timestamp means a human edit, not a pipeline touch. Reconciliation is principle 4 too: it quietly assigns the customer their "true" email and address, and its failures are as silent as matching's (reconciliation rules).
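The difference between the two strategies can be illustrated with plain window functions over a handful of invented rows. This is only a sketch of the selection logic in generic PostgreSQL-style SQL, not how the platform implements reconciliation; the table, source names, ranks, emails, and timestamps are all made-up sample values.

```sql
-- Illustration only: the same two candidate emails, ranked two different ways.
WITH candidate_emails (source_name, source_rank, email, last_modified) AS (
  VALUES
    ('crm',          1, 'jane@example.com',     TIMESTAMP '2024-05-01 10:00:00'),
    ('batch_import', 2, 'jane.old@example.com', TIMESTAMP '2024-05-07 02:00:00')
),
ranked AS (
  SELECT
    c.email,
    ROW_NUMBER() OVER (ORDER BY c.last_modified DESC) AS most_recent_pos,
    ROW_NUMBER() OVER (ORDER BY c.source_rank ASC)    AS source_priority_pos
  FROM candidate_emails c
)
SELECT
  MAX(CASE WHEN r.most_recent_pos = 1     THEN r.email END) AS most_recent_winner,
  MAX(CASE WHEN r.source_priority_pos = 1 THEN r.email END) AS source_priority_winner
FROM ranked r;
```

With these sample rows, the recency ordering picks the batch import's value because its timestamp is later, while the priority ordering picks the CRM's value regardless of when it was written.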

6. Saving a rule change like it's a config toggle — it silently reprocesses the org and shifts every count

The instinct is that a match or reconciliation rule is settings: you change it, you save, you move on, the way you'd flip any other configuration. The rule editor reinforces it — there's a Save button, not a "reprocess the entire org" button. But saving a changed ruleset triggers identity resolution reprocessing across the org: the engine re-evaluates affected source records, recomputes the IndividualIdentityLink__dlm mappings, and rebuilds the unified profiles (match rules).

What actually happens has three edges, all easy to walk off. It costs — reprocessing is work, and Data 360 bills on work processed (principle 11), so iterating on rules is not free. It takes time — the profiles you query right after a save may still be settling, so a count read too early is mid-reprocess, not final. And counts shift: a looser rule lowers the unified count as more records collapse together; a stricter rule raises it. A downstream report keyed on UnifiedIndividual__dlm moves the moment matching reprocesses — by design, but it ambushes whoever reads the report if nobody warned them, and it can look exactly like a data incident.

The fix is to treat a rule change as a deployment, not a toggle. Budget the reprocessing cost and time, announce the count shift to whoever consumes unified-individual numbers before you save, and verify what the change did by inspecting the link object — how many source individuals now resolve to one unified individual — rather than by reading a dashboard mid-reprocess. Rules get validated against real data and signed off before they run on the whole org (principle 4), precisely because the save is the irreversible-feeling part.
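A before-and-after snapshot of the link object might look like the distribution below: how many unified individuals have one source record, how many have two, and so on. It is a sketch that assumes the same link field names as the bridge query in gotcha 4.

```sql
-- Distribution of source records per unified individual.
-- Run once before saving a rule change and again after reprocessing completes.
SELECT
  per_unified.source_record_count,
  COUNT(*) AS unified_individuals
FROM (
  SELECT
    link.UnifiedRecordId__c       AS unified_individual_id,
    COUNT(link.SourceRecordId__c) AS source_record_count
  FROM IndividualIdentityLink__dlm link
  GROUP BY link.UnifiedRecordId__c
) per_unified
GROUP BY per_unified.source_record_count
ORDER BY per_unified.source_record_count;
```

Comparing the two snapshots shows where the change moved people: a looser rule should shift mass toward higher source counts, a stricter rule toward lower ones, and a shift larger than expected is worth investigating before anyone trusts the new totals.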

7. Matching on a key nobody normalized — the same person stays split because the values never lined up

The instinct is to add the obvious key to a match rule — email, phone, a loyalty ID — and assume that because two records contain the same identifier, the engine will see them as equal. It often doesn't, because every criterion compares the normalized value, not the raw one (match keys and normalization). Jane@Example.com with a trailing space and jane@example.com are the same address to a human and two different strings to an exact-match rule that never normalized them.

What actually happens is over-split with an innocent-looking cause: one person stays fragmented not because the rule was too strict in intent, but because the key it matched on wasn't standardized first, so the values never compared equal. A phone stored as (555) 123-4567 in one source and +15551234567 in another won't match on an exact-phone rule until both are normalized toward a canonical form. The rule looks correct in the editor; the resolution is wrong in production, and it presents identically to a deliberately strict ruleset — one person, several profiles, no error.

The fix is to treat normalization as part of the match rule, not a detail beneath it: the quality of a match rule is capped by the quality of the normalization under its keys (match keys and normalization). Lowercase and trim email, reformat phone toward a canonical form, standardize the party identifier — before the rule compares anything. When two records you know are the same person won't merge, suspect the normalization on the key before you reach for a looser rule, because loosening to paper over a normalization gap is how you turn a quiet over-split into a quiet over-merge (gotcha 1).
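A quick diagnostic for a normalization gap on email is to count how many distinct raw spellings collapse to the same lowercased, trimmed value. The query is a sketch, not the platform's normalizer: ssot__ContactPointEmail__dlm, ssot__EmailAddress__c, and ssot__PartyId__c are assumed names to verify against your org's model.

```sql
-- Emails whose raw values differ but whose normalized form is identical.
-- LOWER/TRIM here stand in for whatever canonical form your match rule compares.
SELECT
  LOWER(TRIM(cpe.ssot__EmailAddress__c))        AS email_normalized,
  COUNT(DISTINCT cpe.ssot__EmailAddress__c)     AS raw_variants,
  COUNT(DISTINCT cpe.ssot__PartyId__c)          AS source_individuals
FROM ssot__ContactPointEmail__dlm cpe
WHERE cpe.ssot__EmailAddress__c IS NOT NULL
GROUP BY LOWER(TRIM(cpe.ssot__EmailAddress__c))
HAVING COUNT(DISTINCT cpe.ssot__EmailAddress__c) > 1
ORDER BY raw_variants DESC;
```

Rows returned here are keys an exact match on the raw value would treat as different people. Phone needs the same treatment but with more care, since stripping punctuation alone does not reconcile a number stored with a country code against one stored without it.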

The throughline across all seven: identity resolution does not tell you when it's wrong. Match too loose and two people are one; match too strict — or normalize too little — and one person is many; count the wrong object or join straight to unified and the relationship you report was never asserted; skip reconciliation and the profile is right about the person and wrong about their email; save a rule like a toggle and the whole org reprocesses under a report that just moved. Every one of these is silent, and the leverage is the same place every time: the rules, the keys under them, and the link object that is the only honest map of what resolution actually did. Test against real data, count through the link, and get the business to sign off — because the platform will never raise its hand.

Closing

These seven are the identity resolution failures Cleon has watched bite hardest in Data 360 builds. The shared theme echoes the rest of this catalog: the platform makes the easy merge easy and the correct resolution deliberate. Two customers fused into one, one customer split across three, a count that won't reconcile, a direct join that returned a wrong map, a stale email that won, a rule change that quietly reprocessed the org, a key nobody normalized: none is loud in the moment, and each is a profile that lies to everything downstream until someone notices a number that can't be right, or a customer sees an order that isn't theirs.

If an identity resolution gotcha bit your team and isn't here, write to hello@wearecleon.com — we add it, with credit.

The unified individual — what resolution produces, the IndividualIdentityLink__dlm bridge, and why source-to-unified is never a direct join
Match rules — the OR-of-AND-groups model where over-merge and over-split are decided
Reconciliation rules — why the "wrong value won" and how per-attribute selection fixes it
Match keys and normalization — the normalization gap that keeps the same person split
Debugging identity resolution — how to confirm over-merge vs over-split and verify a reprocess landed
Identity Resolution Style Guide — the asymmetric-risk decision: how strict should the match be?
Data 360 principles — identity as a business decision (4), agent-readiness (10), cost on what you process (11)

Reference: