Skip to main content

JOIN — Marketing Cloud SQL reference

How to combine Data Extensions in MC SQL — the four join types, the SubscriberKey type-coercion trap, anti-joins, and the staging rule that keeps performance honest.

Reference·Last updated 2026-05-07·Drafted by Lira · Edited by German Medina

JOIN is how you combine rows from two or more sources in a single SELECT. In Marketing Cloud SQL the four standard join types (INNER, LEFT, RIGHT, FULL OUTER) are supported and behave like T-SQL. What's not standard is the data — SubscriberKey looks like an integer but is stored as a string, System Data Views might or might not match the row set you expect, and the 30-minute timeout punishes joins that touch too many sources at once.

Official syntax

-- INNER JOIN — only rows that match in both sides
SELECT s.SubscriberKey, p.LastPurchase
FROM master_subscribers s
INNER JOIN purchases p
  ON s.SubscriberKey = p.SubscriberKey;

-- LEFT JOIN — every row from the left side; NULL on the right
-- when no match
SELECT s.SubscriberKey, p.LastPurchase
FROM master_subscribers s
LEFT JOIN purchases p
  ON s.SubscriberKey = p.SubscriberKey;

-- LEFT JOIN ... WHERE right IS NULL — anti-join, "in left but
-- not in right". The most common dedup / suppression pattern in MC
SELECT s.SubscriberKey
FROM master_subscribers s
LEFT JOIN suppression_list x
  ON s.SubscriberKey = x.SubscriberKey
WHERE x.SubscriberKey IS NULL;

-- FULL OUTER JOIN — every row from both sides, NULL where
-- the other side has no match
SELECT
  COALESCE(s.SubscriberKey, p.SubscriberKey) AS SubscriberKey,
  s.EmailAddress,
  p.LastPurchase
FROM master_subscribers s
FULL OUTER JOIN purchases p
  ON s.SubscriberKey = p.SubscriberKey;

CROSS JOIN (Cartesian product) is technically supported but almost never the right answer in MC — it multiplies the row count of one side by the other, which against production-sized DEs blows past the 30-min timeout instantly.

Reference:

What survives in production

SubscriberKey is a string — cast both sides before joining

The single most common silent-bug source in MC joins. Even when SubscriberKey looks like an integer in your CRM, MC stores it as a string. A join ON sub.SubscriberKey = ext.UserId where ext.UserId is INT looks correct, parses, runs — and silently misses every row where leading zeros differ, where one side trims trailing whitespace and the other doesn't, or where the two sides differ in length even by one character.

-- AT RISK — implicit type coercion, silent mismatches
SELECT s.SubscriberKey, e.LoyaltyTier
FROM _Subscribers s
INNER JOIN ext_loyalty e
  ON s.SubscriberKey = e.UserId;

-- SAFE — explicit cast on both sides + defensive trim
SELECT s.SubscriberKey, e.LoyaltyTier
FROM _Subscribers s
INNER JOIN ext_loyalty e
  ON LTRIM(RTRIM(CAST(s.SubscriberKey AS NVARCHAR(255))))
   = LTRIM(RTRIM(CAST(e.UserId AS NVARCHAR(255))));

The first version is shorter. The second matches the row count you expected. See gotchas — #9.

The anti-join is your suppression pattern

MC SQL doesn't have EXCEPT reliably across editions, and NOT IN against a million-row subquery is a performance trap. The durable pattern is LEFT JOIN ... WHERE right IS NULL:

-- Standard MC suppression: subscribers not in the suppression list
SELECT s.SubscriberKey
FROM master_subscribers s
LEFT JOIN suppression_list x
  ON s.SubscriberKey = x.SubscriberKey
WHERE x.SubscriberKey IS NULL;

-- Standard MC dedup-on-most-recent pattern (no window functions)
INSERT INTO de_stg_max_purchase_per_email
SELECT EmailAddress, MAX(LastPurchase) AS MaxPurchase
FROM master_subscribers
GROUP BY EmailAddress;

INSERT INTO de_stg_dedup_subs
SELECT m.SubscriberKey
FROM master_subscribers m
INNER JOIN de_stg_max_purchase_per_email s
  ON m.EmailAddress = s.EmailAddress
  AND m.LastPurchase = s.MaxPurchase;

Two staged Activities. Each independently auditable. Both survive the next platform update because neither depends on edition-specific syntax.

Three or more sources: stage, don't chain

A query with FROM A INNER JOIN B INNER JOIN C INNER JOIN D parses fine, runs fine in dev with sample data, and times out at 31 minutes in production. The optimizer choices in MC SQL are not yours to control.

The pattern: each Activity owns at most one join. Stage A JOIN B → de_stg_ab, then de_stg_ab JOIN C → de_stg_abc, then de_stg_abc JOIN D → final. Three Activities, three checkpoints, three row counts you can compare against expectations. Each stays well under the timeout.

Always specify the join type explicitly

FROM a, b WHERE a.k = b.k (implicit join) is technically valid SQL but reads as either an unfinished thought or a bug. Use INNER JOIN ... ON so the next dev reads intent in the syntax, not in the WHERE.

-- AVOID — implicit cross-join + filter, hard to read, easy to misread
SELECT s.SubscriberKey, p.LastPurchase
FROM master_subscribers s, purchases p
WHERE s.SubscriberKey = p.SubscriberKey
  AND s.Status = 'Active';

-- CLEAR — join type explicit, ON for the join condition,
-- WHERE for the post-join filter
SELECT s.SubscriberKey, p.LastPurchase
FROM master_subscribers s
INNER JOIN purchases p
  ON s.SubscriberKey = p.SubscriberKey
WHERE s.Status = 'Active';

Quick decision

Use INNER JOIN when:

  • You only want rows that match in both sides.
  • The default. If you don't know which join type you need, you usually need INNER.

Use LEFT JOIN when:

  • You want every row from the primary source, with optional data from the secondary.
  • You're building an anti-join with WHERE right IS NULL (suppression, "missing from").
  • The right-side row count is uncertain and you don't want to lose left-side rows.

Use RIGHT JOIN only as:

  • A LEFT JOIN with the operands flipped, when reordering FROM would hurt readability. Most teams pick a convention (always LEFT) and stick with it.

Use FULL OUTER JOIN when:

  • You genuinely need every row from both sides (reconciliation, audit) — and you COALESCE the join keys to surface the matched value.

Avoid CROSS JOIN unless:

  • You're generating a small calendar / numbers table for date ranges, and you LIMIT the result. Otherwise the row explosion blows the timeout.

Related

  • Basics — the supported T-SQL subset and target action mechanics
  • SELECT — the projection that comes before the join
  • FROM — sources, aliases, and the SDV snapshot pattern joins inherit
  • MC SQL gotchas — see #9 for the SubscriberKey type-coercion trap and #6 for SDV rotation

More reference pages incoming: WHERE · LIKE · CASE · INSERT INTO · String / Date / Numeric / Conversion / Aggregate / Null Functions · Style Guide.

Plus how-to snippets for common production debugging — email sends, value length, contact reach, etc.