Skip to main content

Debugging silent UpsertData duplicates

A Script Activity ran cleanly but downstream counts are wrong. UpsertData inserted duplicates instead of updating because the destination DE's primary key is missing or misconfigured. Six queries that confirm the silent-insert pattern and walk through the recovery.

How-to·Last updated 2026-05-13·Drafted by Lira · Edited by German Medina

A Script Activity ran cleanly — the log shows every step completed, no errors caught, the Activity status reads Completed. The next day a downstream SQL Query Activity that does INNER JOIN ON SubscriberKey returns 60,000 rows instead of the expected 50,000. Or a report that aggregates by SubscriberKey returns inflated totals. Or the same SubscriberKey appears with different Status values across rows that should have been a single record. The fingerprint is Platform.Function.UpsertData against a destination DE whose primary key is missing or set on the wrong column — the function inserted duplicates instead of updating. See gotchas — #4.

This page is the diagnostic playbook for that exact shape — six queries that confirm the silent-insert pattern, locate which key the duplicates are on, and walk through the recovery (dedup the data, fix the DE schema, patch the script). The Activity didn't fail; the data is wrong. That distinction is what makes this debug session different from the other two SSJS snippets.

How the failure happens

[ Script calls Platform.Function.UpsertData ]
        ↓ MC backend checks destination DE's primary key config
[ PK configured correctly on the right column ]
        ↓ row exists by PK → UPDATE / row missing → INSERT
[ PK missing or on the wrong column ]
        ↓ MC backend treats every call as INSERT
[ Duplicates accumulate silently ]

The Activity status reports success because the writes themselves succeeded — they just weren't the writes you intended. The script can't detect the bug at runtime because the function returns "success" for both insert and update outcomes.

The queries below detect duplicates in the destination DE, prove the silent-insert pattern, and produce the receipts you need to fix the schema with confidence.

Step 1 — Count duplicates by the expected key

Start at the destination. If you intended SubscriberKey to be the unique key, that's what you group by. Any group with COUNT(*) > 1 is a duplicate.

-- Replace 'master_subscribers' with the destination DE name
-- Replace 'SubscriberKey' with the column you intended as the PK
SELECT
  SubscriberKey,
  COUNT(*) AS Copies
FROM master_subscribers
GROUP BY SubscriberKey
HAVING COUNT(*) > 1
ORDER BY Copies DESC;

Three failure shapes here:

  • The query returns rows: confirms duplicates exist on SubscriberKey. The destination DE's PK is missing or misconfigured. Continue to step 2.
  • The query returns no rows: SubscriberKey is unique. The bug isn't UpsertData duplicates on this column — either the duplicates are on a different intended key, or the downstream INNER JOIN issue is in a different DE. Re-aim the diagnostic.
  • The query times out: the destination DE is very large and unindexed. Add a TOP 1000 or filter to a recent date window to get a sample first.

Step 2 — Profile the duplicate rows themselves

For the keys with duplicates, look at what's different between the copies. The pattern tells you which column the PK is actually on (vs the one you intended).

SELECT
  SubscriberKey,
  Status,
  EmailAddress,
  CreatedDate,
  UpdatedDate
FROM master_subscribers
WHERE SubscriberKey IN (
  SELECT SubscriberKey
  FROM master_subscribers
  GROUP BY SubscriberKey
  HAVING COUNT(*) > 1
)
ORDER BY SubscriberKey, UpdatedDate DESC;

What the patterns mean:

  • Copies have different EmailAddress values for the same SubscriberKey: the DE's PK is likely on the EmailAddress column (not SubscriberKey). Every script run with a re-keyed subscriber inserted a new row.
  • Copies have identical EmailAddress but different UpdatedDate: there's no enforced PK at all. The same logical record was inserted on every run.
  • Copies have identical everything except CreatedDate: the DE was rebuilt without truncation, or two automations are writing to the same DE without coordination.

Step 3 — Cross-reference with the script's log

Confirm the duplicates came from your UpsertData calls and not from another source (a SQL Query Activity in Append mode, an Import Activity, etc.).

-- Replace the date range with the window you suspect
SELECT
  RunId,
  Step,
  Ts,
  Message
FROM de_log_ssjs_runs
WHERE Step LIKE '%upsert%'
  AND Ts BETWEEN '2026-05-01' AND '2026-05-13'
ORDER BY Ts;

The expected pattern when UpsertData is misbehaving: many rows logged across many RunIds for the same destination DE. If the destination DE row count grew on each script run in lockstep with the number of upsert log entries, the script is the source. If not, the duplicates came in from somewhere else — audit the Automation's other steps or any Import Activities that target the same DE.

Step 4 — Compute expected vs actual unique-key count

A single number that captures the inflation: how many distinct values of the intended-key column, vs how many total rows.

SELECT
  COUNT(*)                       AS TotalRows,
  COUNT(DISTINCT SubscriberKey)  AS UniqueSubscriberKeys,
  COUNT(*) - COUNT(DISTINCT SubscriberKey) AS Inflation
FROM master_subscribers;

If Inflation is in the thousands, the silent-insert pattern has been running for many script executions. Cross-reference the magnitude against step 3's log count to estimate when the misconfiguration started — if you have 8,000 inflation and the log shows 200 upserts per day for 40 days, the bug has been live since the 40-day mark.

Step 5 — Recovery: stage the dedup

Before fixing the schema, stage a deduplicated copy of the destination DE. The pattern is the same as the SQL Style Guide stage-validate-promote rule: write to staging, validate, then promote.

-- Stage the deduplicated rows. Use the MAX-per-group pattern from
-- SQL Aggregate Functions; keep the row with the most recent
-- UpdatedDate per SubscriberKey.
SELECT
  s.SubscriberKey,
  s.Status,
  s.EmailAddress,
  s.CreatedDate,
  s.UpdatedDate
INTO de_stg_master_subscribers_dedup
FROM master_subscribers s
INNER JOIN (
  SELECT
    SubscriberKey,
    MAX(UpdatedDate) AS MaxUpdated
  FROM master_subscribers
  GROUP BY SubscriberKey
) latest
  ON s.SubscriberKey = latest.SubscriberKey
  AND s.UpdatedDate = latest.MaxUpdated;

Before promoting, validate the staging DE: row count equals the unique-key count from step 4, no duplicates remain (re-run step 1 against the staging DE), and the most recent UpdatedDate for each key matches what the upstream source would have written.

Then in the Marketing Cloud UI:

  1. Open the destination DE's properties → mark SubscriberKey as the Primary Key. (You may need to recreate the DE if MC won't let you add a PK to a populated DE — coordinate with stakeholders for the cutover window.)
  2. Truncate the destination DE.
  3. Run an SQL Activity in Overwrite mode: SELECT * FROM de_stg_master_subscribers_dedup.
  4. Re-run step 1 against the destination DE to confirm zero duplicates.

The next script execution will now behave as a real upsert because the PK is correctly enforced.

Step 6 — Write the postmortem

Write the diagnostic into de_log_ssjs_postmortems (same DE used by the other two SSJS debugging snippets).

INSERT INTO de_log_ssjs_postmortems
SELECT
  GETDATE()                                AS DiagnosedAt,
  'SA_NightlyEnrichment'                   AS ActivityName,
  'multiple-runs'                          AS RunId,  -- the bug spans many runs
  NULL                                     AS StartedAt,
  NULL                                     AS LastWriteAt,
  'upsert-master-subscribers'              AS LastStep,
  (SELECT COUNT(*) FROM master_subscribers) - (SELECT COUNT(DISTINCT SubscriberKey) FROM master_subscribers) AS RowInflation,
  'Destination DE master_subscribers had no PK; UpsertData behaved as InsertData across all runs. Deduplicated via de_stg_*, recreated DE with SubscriberKey as PK, ran Overwrite from staging.' AS RootCause;

The RowInflation figure is the receipt — six months later when an audit asks "how bad was the duplicate-key issue", the answer is one query away.

Common causes ranked by frequency

| Cause | How to spot | Fix in | |---|---|---| | Destination DE has no PK | Step 1 finds duplicates; step 2 shows identical rows | Recreate DE with PK; dedup + Overwrite from staging | | PK is on the wrong column | Step 2 shows duplicates with different EmailAddress for same SubscriberKey (or similar) | Confirm intended PK column; recreate DE schema | | Script's column list doesn't include the PK | UpsertData call omits SubscriberKey from the keys array | Audit the script; add the PK column to the keys-array argument | | PK is set but the script writes to a different column name | DE has PK on SubKey, script writes to SubscriberKey | Audit DE schema vs script's column-name constants | | Multiple sources writing to the DE without coordination | Step 3 logs few upserts but step 4's Inflation is large | Audit Automation flow; check for Import Activities and SQL Activities in Append mode | | Update-mode SQL Activity also writing without PK | DE shows duplicates from both SQL Activities and SSJS | See SQL Style Guide — Update-mode without PK rule | | DE was rebuilt mid-history without truncation | Step 2 shows pairs of rows with identical content + different CreatedDate | Audit audience-rebuild Activity's target action (Overwrite vs Append) |

Related