Debugging identity resolution: is it over-merge or over-split?

A unified profile in Data 360 (formerly Data Cloud) is wrong and nothing told you. Maybe a customer logged in and saw an order that wasn't theirs. Maybe "active customers" dropped by a third overnight after someone changed a rule. Maybe a person you know exists is somehow three people in the data. The match ruleset looked reasonable, the reprocessing reported success, and identity resolution never throws for being wrong — only for failing to run. So the diagnostic is the same shape every time: name the failure first, because every other step depends on which one you have. There are only two, and they're opposites.

Over-merge — two (or more) different people fused into one UnifiedIndividual__dlm. The rules were too loose. This is the worse, asymmetric failure: the merged people can see each other's data, suppression leaks across them, and personalization addresses the wrong person. In a consent-bearing channel it's a privacy incident, not a metrics nuisance.
Over-split — one real person fragmented into several UnifiedIndividual__dlm rows. The rules were too strict, or a key was dirty. Metrics double-count, a suppression set misses some of the person's records, and the "single view" quietly isn't one.

Everything below routes through the bridge object. Source-to-unified is never a direct join — there is no modeled path from ssot__Individual__dlm straight to UnifiedIndividual__dlm; you always traverse through IndividualIdentityLink__dlm, the link identity resolution maintains. Every query on this page goes source → link → unified (or the reverse), and so must yours.

The steps

[ STEP 1 — Name it: over-merge or over-split? (inspect the link object) ]
        ↓
[ STEP 2 — Which match criterion fired to cause it? ]
        ↓
[ STEP 3 — Validate normalization on the suspect keys ]
        ↓
[ STEP 4 — Change a rule, then verify the reprocessing actually landed ]

Unlike a wrong query number, where you walk down from the model, an identity symptom points at a direction. "A customer saw someone else's data" is over-merge; "the count doubled" or "this person is several profiles" is over-split. Step 1 confirms which, in the data, before you touch a rule — because the fix for over-merge (tighten) is the exact opposite of the fix for over-split (loosen), and guessing wrong makes it worse. Steps 2 and 3 find why the wrong decision happened; step 4 is how you change the rule and confirm the change did what you intended without breaking the half that was fine.

Step 1 — Name it: over-merge or over-split?

Before changing anything, prove in the data which failure you have. Both are measured the same way: count how source records distribute across unified individuals through IndividualIdentityLink__dlm. Over-merge is too many distinct source records — clearly different people — collapsing into one unified individual. Over-split is one person's records spread across too many unified individuals. You read the first by fanning out from the unified row; you read the second by grouping the source records that should be one person and counting the distinct unified individuals they landed in.

The check (over-merge) — start from the suspect unified individual and fan out through the link to every source record that resolved into it. A unified individual linking to one source record was never matched to anything; linking to five means resolution folded five source rows into one person. A handful from genuinely different systems is normal; dozens, or a count that keeps climbing, is the smell.

-- OVER-MERGE probe: how many distinct source records resolved into one unified
-- individual? Traverse unified -> link -> source. Never a direct join.
-- A high count for what should be one person is the over-merge signature.
-- link/source join-field names (e.g. SourceRecordId/UnifiedRecordId) follow your org's model
SELECT
  uni.ssot__Id__c                        AS unified_individual_id,
  COUNT(DISTINCT link.SourceRecordId__c) AS source_record_count
FROM UnifiedIndividual__dlm uni
JOIN IndividualIdentityLink__dlm link
  ON link.UnifiedRecordId__c = uni.ssot__Id__c
GROUP BY uni.ssot__Id__c
ORDER BY source_record_count DESC;
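If you'd rather triage offline, you can compute the same fan-out from an export of the link object. This is a minimal Python sketch. It assumes you've exported (UnifiedRecordId__c, SourceRecordId__c) pairs from IndividualIdentityLink__dlm. The function names and the review threshold are hypothetical: this is not a Data 360 API, and the threshold is not a recommended cutoff.

```python
from collections import defaultdict

def source_fanout(link_rows):
    """Count distinct source records per unified individual.

    link_rows: iterable of (unified_id, source_id) pairs, e.g. exported from
    IndividualIdentityLink__dlm (UnifiedRecordId__c, SourceRecordId__c).
    Returns (unified_id, count) pairs sorted by count descending, like the SQL probe.
    """
    sources = defaultdict(set)
    for unified_id, source_id in link_rows:
        sources[unified_id].add(source_id)  # a set deduplicates repeated link rows
    return sorted(((u, len(s)) for u, s in sources.items()),
                  key=lambda pair: (-pair[1], pair[0]))

def over_merge_suspects(link_rows, threshold):
    """Unified individuals whose fan-out exceeds a hypothetical review threshold."""
    return [u for u, n in source_fanout(link_rows) if n > threshold]

links = [
    ("U1", "S1"), ("U1", "S2"), ("U1", "S3"), ("U1", "S4"), ("U1", "S4"),
    ("U2", "S5"),
    ("U3", "S6"), ("U3", "S7"),
]
print(source_fanout(links))           # [('U1', 4), ('U3', 2), ('U2', 1)]
print(over_merge_suspects(links, 3))  # ['U1']
```

A high count only nominates a profile for review. Reading its linked sources is what confirms it.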

A high source_record_count alone isn't proof — a genuinely well-connected person can legitimately span many sources. Confirm the merge is wrong by reading the linked source records' identifying attributes (name, email, party ID) and seeing whether they describe different humans. Walk one suspect unified individual's sources through the link and read them:

-- Confirm an over-merge: list the source records behind ONE suspect unified
-- individual and inspect their identifying attributes. Different people sharing
-- one unified_individual_id is over-merge, in the data.
SELECT
  src.ssot__Id__c          AS source_individual_id,
  src.ssot__FirstName__c   AS first_name,
  src.ssot__LastName__c    AS last_name,
  src.ssot__DataSourceId__c AS data_source
FROM UnifiedIndividual__dlm uni
JOIN IndividualIdentityLink__dlm link
  ON link.UnifiedRecordId__c = uni.ssot__Id__c
JOIN ssot__Individual__dlm src
  ON src.ssot__Id__c = link.SourceRecordId__c
WHERE uni.ssot__Id__c = 'PASTE_SUSPECT_UNIFIED_ID'
ORDER BY src.ssot__LastName__c;
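Reading the rows by eye works when there are only a few. For a longer list, a rough grouping by normalized name can make distinct identities stand out. This sketch assumes rows exported from the query above as (source_id, first_name, last_name, data_source) tuples, and the helper name is hypothetical. Treat name grouping as a reading aid only. "J. Smith" and "John Smith" land in different groups even if they are one person, so a human still makes the call.

```python
from collections import defaultdict

def name_groups(rows):
    """Group the source records behind one unified individual by normalized full name.

    rows: (source_id, first_name, last_name, data_source) tuples.
    Normalization is deliberately simple: collapse whitespace, lowercase.
    """
    groups = defaultdict(list)
    for source_id, first, last, data_source in rows:
        key = " ".join(f"{first or ''} {last or ''}".split()).lower()
        groups[key].append((source_id, data_source))
    return dict(groups)

rows = [
    ("S1", "Jane", "Doe", "CRM"),
    ("S2", "jane ", "Doe", "Commerce"),
    ("S3", "Raj", "Patel", "Loyalty"),
]
groups = name_groups(rows)
print(sorted(groups))  # ['jane doe', 'raj patel']
```

More than one group behind a single unified_individual_id is what you read closely. Two clearly unrelated names, as here, is the over-merge shape.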

The check (over-split) — go the other direction. Take a key that should identify one person — a verified email, a loyalty / party ID — group the source records by it, and count the distinct unified individuals they resolved into through the link. The expected answer is exactly one. Two or more means that single key's records fragmented across multiple unified profiles: over-split.

-- OVER-SPLIT probe: for a key that should be ONE person, how many distinct
-- unified individuals did its source records resolve into? Source -> link ->
-- unified. Anything over one for a single real person is over-split.
SELECT
  src.ssot__LastName__c                  AS family_key,
  COUNT(DISTINCT link.UnifiedRecordId__c) AS unified_individual_count
FROM ssot__Individual__dlm src
JOIN IndividualIdentityLink__dlm link
  ON link.SourceRecordId__c = src.ssot__Id__c
GROUP BY src.ssot__LastName__c
HAVING COUNT(DISTINCT link.UnifiedRecordId__c) > 1
ORDER BY unified_individual_count DESC;

In production you'd group on the strong key you expect to be unique per person — a party ID or a verified email joined through its contact-point DMO — not a surname; the surname above is illustrative because it needs no extra join. The shape is the point: group the source records that ought to be one person, count distinct unified individuals, and anything over one is fragmentation.
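Here is the same shape run offline with the strong key recommended above. This sketch assumes two exports: source-to-unified pairs from IndividualIdentityLink__dlm, and source-to-email pairs from the contact-point DMO. Lowercasing and trimming the email before grouping is an assumption about what your normalization should treat as equal. The function name is hypothetical.

```python
from collections import defaultdict

def over_split_keys(link_rows, key_rows):
    """Keys whose source records resolved into more than one unified individual.

    link_rows: (source_id, unified_id) pairs from the link object.
    key_rows:  (source_id, key_value) pairs, e.g. source individual -> email.
    Returns {normalized_key: sorted unified ids} for keys spanning >1 unified id.
    """
    unified_of = dict(link_rows)  # each source record resolves to one unified individual
    unified_by_key = defaultdict(set)
    for source_id, value in key_rows:
        if value is None or source_id not in unified_of:
            continue  # no key, or the source record isn't resolved (yet)
        unified_by_key[value.strip().lower()].add(unified_of[source_id])
    return {k: sorted(u) for k, u in unified_by_key.items() if len(u) > 1}

links = [("S1", "U1"), ("S2", "U2"), ("S3", "U3"), ("S4", "U3")]
emails = [("S1", "Jane@Example.com"), ("S2", "jane@example.com "),
          ("S3", "raj@example.com"), ("S4", "raj@example.com")]
print(over_split_keys(links, emails))  # {'jane@example.com': ['U1', 'U2']}
```

Jane's two records share one email once casing and whitespace are ignored, yet they sit under two unified individuals. That is fragmentation.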

The fix — name the failure, then commit to its direction. Over-merge means the rules matched two people who aren't one: the remedy is to tighten (drop a too-loose rule, or anchor a fuzzy criterion to an exact key). Over-split means the rules failed to match one person who is: the remedy is to loosen or to repair the key feeding the rule. They are opposite moves; Step 1 exists so you make the right one. (How strict the ruleset should be in the first place is the Identity Resolution Style Guide; this page is what you run once it's already wrong.)

Once you know which failure you have and on which records, you know which way to move. The next question is why the engine decided wrong. Go on.

Step 2 — Which match criterion fired?

You've confirmed an over-merge (or an over-split) on specific records. Now find the rule responsible. A ruleset is an OR of AND-groups (several rules, any one of which is enough to declare a match), and usually a single one of them caused this. Tightening the wrong rule fixes nothing and may break a match that was working.
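To make "an OR of AND-groups" concrete, here is a small model of one. It is a sketch under stated assumptions, not how Data 360 evaluates rules. The rule names and field names are hypothetical, and the fuzzy comparison (difflib's SequenceMatcher ratio with a 0.8 cutoff) is a stand-in chosen for illustration, not the platform's fuzzy algorithm. The model reports every rule whose full AND-group holds for a pair, which is the question Step 2 asks.

```python
from difflib import SequenceMatcher

def exact(field):
    """Criterion: both records have the field and the values are equal."""
    return lambda a, b: a.get(field) is not None and a.get(field) == b.get(field)

def fuzzy(field, cutoff=0.8):
    """Criterion: stand-in fuzzy match using SequenceMatcher's similarity ratio."""
    def check(a, b):
        x, y = a.get(field), b.get(field)
        if x is None or y is None:
            return False
        return SequenceMatcher(None, x.lower(), y.lower()).ratio() >= cutoff
    return check

# A ruleset is an OR of AND-groups: rule name -> criteria that must ALL hold.
RULESET = {
    "email_exact": [exact("email")],
    "fuzzy_name_only": [fuzzy("name")],                       # too loose on its own
    "fuzzy_name_and_phone": [fuzzy("name"), exact("phone")],  # anchored version
}

def fired_rules(a, b, ruleset=RULESET):
    """Names of the rules whose entire AND-group is satisfied by the pair."""
    return [name for name, criteria in ruleset.items()
            if all(criterion(a, b) for criterion in criteria)]

rec_a = {"name": "John Smith", "email": "john@example.com", "phone": "+15550100199"}
rec_b = {"name": "Jon Smith", "email": "jsmith@example.org", "phone": "+15550100777"}
print(fired_rules(rec_a, rec_b))  # ['fuzzy_name_only']
```

Dropping fuzzy_name_only leaves the anchored rule, which fires only when the phone agrees too. That is the "anchor a fuzzy criterion to an exact key" fix in miniature.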

The check — for an over-merge, read the ruleset and ask which single rule's full AND-group both suspect records could satisfy. Take the two source records that shouldn't have merged, line up their normalized identifying values side by side, and walk each rule: a rule fires only if every criterion in it holds for the pair. The rule whose whole AND-group both records satisfy is the one that merged them. Usually it's a rule with a fuzzy criterion that stands too much on its own — fuzzy-name with a weak or absent anchor — letting "J. Smith" and "John Smith" through. For an over-split, the logic inverts: read why no rule fired for two records that are the same person — typically every rule depended on a key one record was missing or had in a different form.
The symptom — over-merge: two records you can see are different people share a unified_individual_id, and exactly one rule's AND-group is satisfied across them. Over-split: two records that are the same person sit under different unified_individual_ids, and no rule's AND-group spans both — each rule is defeated by one missing or mismatched criterion.
The fix — change that rule, not the ruleset wholesale. Over-merge from a lone fuzzy criterion: anchor it to an exact key in the same AND-group (fuzzy-name AND phone-exact) so name similarity alone can't merge. Over-split because rules leaned on a key one record lacked: add a rule that can match on a key the fragmented records do share, or fix the key (Step 3). One rule changed deliberately beats a ruleset rewritten in a panic. (See match rules on how criteria combine and why a fuzzy criterion is rarely safe alone.)
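If no rule's AND-group spans the two suspect records, don't conclude that the engine misfired. Matches chain: identity resolution groups records transitively, so if A matches B and B matches C, all three end up in one unified individual even though nothing ever compared A with C. This union-find sketch models that grouping in Python. It assumes you already know which pairs matched, and the record IDs and function name are hypothetical.

```python
def cluster(records, matched_pairs):
    """Group records into unified clusters from pairwise matches (union-find).

    Grouping is transitive: a chain of pairwise matches puts every record
    on the chain into one cluster.
    """
    parent = {r: r for r in records}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving keeps the trees shallow
            r = parent[r]
        return r

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())

records = ["alice_crm", "alice_web", "bob_crm"]
pairs = [("alice_crm", "alice_web"),  # email_exact fired: same person
         ("alice_web", "bob_crm")]    # e.g. a shared household phone fired
print(cluster(records, pairs))  # [['alice_crm', 'alice_web', 'bob_crm']]
```

alice_crm and bob_crm never satisfy a rule together, yet they share a profile. The rule to tighten is the one that fired on the bridging pair (alice_web, bob_crm).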

If the responsible rule is clear and you understand why it fired (or didn't), the logic is diagnosed. But a rule can be written perfectly and still misfire if the values it compares were never normalized the same way. Go on.

Step 3 — Validate normalization on the suspect keys

Every match criterion compares normalized values, not raw ones — lowercased email, whitespace and punctuation stripped, phone pushed toward a canonical form. A correct rule on un-normalized keys produces both failures: an exact-email rule that sees Jane@Example.com and jane@example.com as different misses the match and over-splits; a fuzzy rule on garbage keys finds spurious similarity and over-merges. Before you blame the rule logic, prove the keys it reads are clean.

The check — pull the raw values of the suspect key for the records in question and look at them as strings, exactly as ingested. For an over-split you expected to match: are the two emails the same once you mentally lowercase and trim, but different raw — meaning normalization should have unified them but the values arrived in a form it didn't catch? For an over-merge: is the shared key actually shared, or do two values merely look alike to a fuzzy comparison while being different people? Read the keys straight off the source DMO, including the contact-point DMOs where email and phone live:

-- Inspect the raw key values for two source individuals you're comparing.
-- Email and phone live in their own DMOs, joined from the source individual.
-- Read them as strings: casing, trailing spaces, punctuation, format drift.
-- Field names follow the standard contact-point DMOs; confirm them in your org.
SELECT
  src.ssot__Id__c                AS source_individual_id,
  email.ssot__EmailAddress__c    AS raw_email,
  phone.ssot__TelephoneNumber__c AS raw_phone
FROM ssot__Individual__dlm src
LEFT JOIN ssot__ContactPointEmail__dlm email
  ON email.ssot__PartyId__c = src.ssot__Id__c
LEFT JOIN ssot__ContactPointPhone__dlm phone
  ON phone.ssot__PartyId__c = src.ssot__Id__c
WHERE src.ssot__Id__c IN ('SOURCE_ID_A', 'SOURCE_ID_B');
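Hidden characters are the hardest thing to catch by eye. This small Python helper names the kind of drift between two raw strings, so a trailing space or a casing difference can't slip past. It is hypothetical and meant to run over values copied out of the query above. Stripping punctuation is only a diagnostic: for emails, a dot can be significant.

```python
import string

def key_drift(raw_a, raw_b):
    """Describe how two raw key values differ, trying the cheapest explanation first."""
    if raw_a == raw_b:
        return "identical"
    if raw_a.strip() == raw_b.strip():
        return "whitespace only"
    if raw_a.strip().lower() == raw_b.strip().lower():
        return "case and/or whitespace"
    keep = set(string.ascii_letters + string.digits)
    core_a = "".join(ch for ch in raw_a.lower() if ch in keep)
    core_b = "".join(ch for ch in raw_b.lower() if ch in keep)
    if core_a == core_b:
        return "punctuation/formatting"
    return "different values"

print(repr("jane@example.com "))  # repr() exposes the trailing space
print(key_drift("Jane@Example.com", "jane@example.com "))  # case and/or whitespace
```

A missing country code ("+1 (555) 010-0199" against "5550100199") still reads as different values here. That is a normalization question, not a reading one.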

The symptom — over-split with keys that are "obviously" the same to a human: one email has a trailing space or mixed case, one phone is +1 (555) 010-0199 and the other 5550100199, and the exact-match criterion correctly treated them as unequal because normalization didn't reconcile the formats. Over-merge with a fuzzy criterion: two genuinely different people whose names normalize close enough that fuzzy similarity fired without a strong anchor to stop it.
The fix — fix normalization at the key, not by loosening the rule to paper over dirty data. If phone formats vary by source, normalize them to a canonical form (E.164) before matching; if an email arrives with casing or whitespace the rule should ignore, ensure normalization covers it. A rule is only as good as the keys it compares — repairing the key is the durable fix, and loosening a rule to swallow un-normalized values just converts an over-split into a future over-merge. (See match keys and normalization for what to normalize and which keys are stable enough to match on.)
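Here is a minimal sketch of that key repair in plain Python, under two loud assumptions. First, every phone number is North American (+1). Second, anything that doesn't come out as 1 plus 10 digits is returned as None for review rather than guessed into a match. Production pipelines usually rely on the platform's own normalization or on a maintained library such as phonenumbers (a Python port of Google's libphonenumber). The function names here are hypothetical.

```python
import re

def normalize_email(raw):
    """Trim and lowercase: the casing/whitespace repair described above."""
    return raw.strip().lower() if raw is not None else None

def to_e164_nanp(raw):
    """Best-effort E.164 for North American (+1) numbers; None when unsure.

    Assumption: every number is NANP. A 10-digit number is given country
    code 1; non-NANP or malformed input returns None so it goes to review.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)  # keep digits only: drops spaces, dashes, (), '+'
    if len(digits) == 10:
        digits = "1" + digits         # no country code present: assume +1
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None

print(to_e164_nanp("+1 (555) 010-0199"))  # +15550100199
print(to_e164_nanp("5550100199"))         # +15550100199
```

Both raw phones from the example above now compare equal, so an exact-phone criterion can match them without loosening anything.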

Once the keys are clean and the rule is right, the diagnosis is done — but the fix only counts once it's reprocessed across the org. The last step is changing the rule and proving the change landed. Go on.

Step 4 — Change a rule, then verify the reprocessing actually landed

A match-rule change doesn't take effect the moment you save it. Saving a changed ruleset triggers identity resolution reprocessing across the org: the engine re-evaluates the affected source records, recomputes the IndividualIdentityLink__dlm mappings, and rebuilds the unified profiles. It costs, it takes time, and it moves counts. So you don't verify a rule change by re-reading the rule. You verify it by confirming that the link object moved the way you predicted, and a green "reprocessing complete" status is not that confirmation.

The check — predict the direction before you change the rule, then measure after. Tightening to fix over-merge should raise the unified count (fewer source records collapse together) and lower the per-unified source_record_count on the profiles you were splitting apart. Loosening to fix over-split should lower the unified count (more records collapse) and drop the unified_individual_count for your suspect key to one. Capture the before number, wait for reprocessing to finish, and re-run the exact Step 1 probe. Did the count move in the predicted direction, by roughly the magnitude you expected?

-- BEFORE and AFTER a rule change: the unified individual count.
-- As written this counts org-wide; add a WHERE clause to scope it to the
-- population you touched. Capture the number before saving the rule, then
-- re-run after reprocessing completes. The direction of the move is your
-- verification, not the status badge.
SELECT COUNT(ssot__Id__c) AS unified_individuals
FROM UnifiedIndividual__dlm;
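Writing the prediction down before you save the rule keeps the check honest. This hypothetical helper takes the before and after values of the count above. The 50% tolerance on magnitude is an arbitrary placeholder, not a platform number.

```python
def moved_as_predicted(before, after, change, expected_delta=None, tolerance=0.5):
    """Check the unified-count move against the prediction made before the change.

    change: "tighten" (fixing over-merge: the count should rise) or
            "loosen"  (fixing over-split: the count should fall).
    expected_delta: optional predicted size of the move, as a positive number.
    """
    if change not in ("tighten", "loosen"):
        raise ValueError("change must be 'tighten' or 'loosen'")
    delta = after - before
    right_direction = delta > 0 if change == "tighten" else delta < 0
    if not right_direction:
        return False  # unmoved or reversed: not verified
    if expected_delta is None:
        return True
    # Right direction; now check the size of the move is roughly as predicted.
    return abs(abs(delta) - expected_delta) <= tolerance * expected_delta

print(moved_as_predicted(10_000, 10_040, "tighten", expected_delta=50))  # True
```

False means the count didn't move, moved backwards, or moved far more or less than predicted. None of those counts as verification.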

The symptom that you're reading too early — you re-run the probe and the counts haven't moved, or moved partway. Reprocessing isn't instantaneous; the unified profiles you query right after saving may still be settling, so a number that looks "unchanged" can simply be mid-flight. Equally, a status that reads complete while your targeted count didn't move at all means the rule you changed wasn't the one that fired (return to Step 2) — the engine ran, but on logic that never touched these records.
The fix — treat the link object as the source of truth for whether the change worked, and re-confirm on the specific profiles from Step 1, not just the global count. Re-run the over-merge fan-out: did the over-merged unified individual split back into separate people? Re-run the over-split group-by: does the suspect key now resolve to exactly one unified individual? The global count moving in the right direction is necessary; the specific profiles resolving correctly is sufficient. And budget the reprocessing — it's billable work (principle 11) and it shifts every downstream report keyed on UnifiedIndividual__dlm, so tell whoever reads those numbers before they move under them.
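You can run the profile-level half of that check against a fresh export of the link object. This sketch assumes (source_id, unified_id) pairs taken after reprocessing, with one representative source record per person in each "should split" group. The function name and the example IDs are hypothetical.

```python
def profiles_resolved(link_after, should_split, should_merge):
    """Re-confirm the specific Step 1 profiles after reprocessing.

    link_after:   (source_id, unified_id) pairs exported after reprocessing.
    should_split: groups of source ids, one per DIFFERENT person; each group
                  must now span as many unified ids as it has members.
    should_merge: groups of source ids that are ONE person; each group must
                  now resolve to exactly one unified id.
    """
    unified_of = dict(link_after)
    missing = [s for g in (*should_split, *should_merge) for s in g if s not in unified_of]
    if missing:
        raise KeyError(f"not in link export (still reprocessing?): {missing}")
    split_ok = all(len({unified_of[s] for s in g}) == len(g) for g in should_split)
    merge_ok = all(len({unified_of[s] for s in g}) == 1 for g in should_merge)
    return split_ok and merge_ok

link_after = [("S1", "U1"), ("S2", "U1"), ("S3", "U9"), ("S4", "U5"), ("S5", "U5")]
print(profiles_resolved(link_after,
                        should_split=[["S1", "S3"]],   # Jane and Raj, merged before the fix
                        should_merge=[["S4", "S5"]]))  # one person, split before the fix
```

A passing global count alongside a False here means the change moved something, just not the profiles you set out to fix.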

A diagnostic you can run

When the report is "this profile is wrong" and you don't yet know the direction, the fastest triage is two counts through the link object, read by eye — no rule change, no direct join. Both traverse IndividualIdentityLink__dlm; they just start from opposite ends.

Source records per unified individual (the over-merge probe). Fan out from a suspect unified individual through the link and count distinct source records. A count far above what one person could plausibly own — and whose linked sources, when you read them, describe different humans — is over-merge. The fix direction is tighten.
Unified individuals per should-be-one-person key (the over-split probe). Group the source records by a key that should be unique to one person, and count distinct unified individuals they resolved into through the link. Anything above one for a single real person is over-split. The fix direction is loosen or repair the key.

The direction you find dictates everything after it: tighten for over-merge, loosen for over-split, and never guess, because the two failures can coexist in one ruleset and a single global count hides both. A unified individual linking to fifteen source records that are clearly three different families is an over-merge you've localized to one profile — without changing a rule, and without ever joining source to unified directly.

Common symptoms mapped to steps

Symptom	Likely cause	Where to look
A customer saw another person's data	Over-merge	Source records per unified individual (Step 1)
"Active customers" jumped after a rule change	Over-merge or over-split	Unified count direction vs. what you changed (Step 4)
One known person appears as several profiles	Over-split	Unified individuals per should-be-one-person key (Step 1)
Two records that should match never do	Over-split — logic or key	Which rule failed to fire (Step 2); raw key values (Step 3)
Two clearly different people merged	Over-merge — a lone fuzzy criterion	Which rule's AND-group both satisfy (Step 2)
Exact-email rule misses obvious matches	Normalization gap	Raw email casing / whitespace on the source DMO (Step 3)
Rule changed, but the count didn't move	Reading too early, or wrong rule	Re-run the Step 1 probe after reprocessing (Step 4)

The unified individual — the two objects and the IndividualIdentityLink__dlm bridge every probe on this page traverses
Match rules — the OR-of-AND-groups logic you read in Step 2, and why a fuzzy criterion is rarely safe alone
Identity resolution gotchas — over-merge, over-split, and the silent failures this page diagnoses, in production form
Debugging query results — when the wrong number is a query bug, not an identity one (the layer above this)
