Aggregate functions — Marketing Cloud SQL reference
The aggregate functions MC SQL supports — COUNT, SUM, AVG, MIN, MAX — plus the COUNT(*) vs COUNT(col) distinction, the SUM-returns-NULL-on-empty trap, and the MAX-per-group dedup pattern that replaces window functions in MC.
Aggregate functions reduce many rows to one value (or one value per group, with GROUP BY). In MC SQL the supported set is the standard one — COUNT, SUM, AVG, MIN, MAX — and the bugs are mostly about how each one treats NULL and empty result sets. COUNT(*) counts all rows; COUNT(col) skips NULLs. SUM over zero rows returns NULL, not 0. AVG silently ignores NULL rows from both numerator and denominator. The MC-specific addition: MAX-per-group is the pattern that replaces window functions when those aren't available.
Official syntax
-- COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col)
SELECT
COUNT(*) AS TotalRows, -- all rows, NULL or not
COUNT(LoyaltyTier) AS RowsWithTier, -- skips NULL LoyaltyTier
COUNT(DISTINCT LoyaltyTier) AS UniqueTiers -- unique non-NULL values
FROM master_subscribers;
-- SUM, AVG, MIN, MAX
SELECT
EmailDomain,
SUM(TotalSpend) AS DomainSpend,
AVG(TotalSpend) AS AvgSpend,
MIN(LastPurchase) AS FirstPurchase,
MAX(LastPurchase) AS MostRecentPurchase,
COUNT(*) AS Subs
FROM master_subscribers
GROUP BY EmailDomain
HAVING COUNT(*) > 100; -- HAVING filters aggregates; WHERE filters rows
-- Conditional aggregation with CASE
SELECT
EmailDomain,
COUNT(*) AS Total,
COUNT(CASE WHEN LoyaltyTier = 'gold' THEN 1 END) AS Gold,
COUNT(CASE WHEN LoyaltyTier = 'silver' THEN 1 END) AS Silver,
SUM(CASE WHEN Status = 'Active' THEN 1 ELSE 0 END) AS ActiveCount
FROM master_subscribers
GROUP BY EmailDomain;The supported set across most SFMC tenants:
| Function | What it does | NULL behavior |
|---|---|---|
| COUNT(*) | Count all rows in the group | Always counts (NULL or not) |
| COUNT(col) | Count rows where col is non-NULL | Skips NULLs |
| COUNT(DISTINCT col) | Count unique non-NULL values | Skips NULLs, deduplicates |
| SUM(col) | Sum of non-NULL values | Returns NULL if zero rows / all NULL |
| AVG(col) | Average of non-NULL values | Skips NULLs in both numerator AND denominator |
| MIN(col) | Minimum non-NULL value | Returns NULL if zero rows / all NULL |
| MAX(col) | Maximum non-NULL value | Returns NULL if zero rows / all NULL |
HAVING filters the result of GROUP BY (use it for "groups with more than N rows"). WHERE filters rows before grouping. Don't put WHERE COUNT(*) > 100 — the parser rejects it; use HAVING instead.
Reference:
What survives in production
COUNT(*) vs COUNT(col) — the difference is whether NULLs count
COUNT(*) counts every row in the group, regardless of NULLs. COUNT(LoyaltyTier) counts only rows where LoyaltyTier is non-NULL. They're often used interchangeably and shouldn't be.
-- "How many subscribers are in this domain?" — every row, NULL OK
SELECT EmailDomain, COUNT(*) AS Subscribers
FROM master_subscribers
GROUP BY EmailDomain;
-- "How many subscribers in this domain have a loyalty tier set?"
-- Excludes NULLs naturally
SELECT EmailDomain, COUNT(LoyaltyTier) AS WithTier
FROM master_subscribers
GROUP BY EmailDomain;
-- "How many distinct tiers exist in this domain?"
-- DISTINCT + COUNT
SELECT EmailDomain, COUNT(DISTINCT LoyaltyTier) AS UniqueTiers
FROM master_subscribers
GROUP BY EmailDomain;The wrong choice produces "correct-looking" numbers that aren't what the business asked. Always say what you're counting in a column alias so the next reader can audit the math.
SUM over zero rows returns NULL, not 0
If WHERE filters out everything in a group, SUM returns NULL rather than 0. Then downstream math (SUM(...) + 100) returns NULL because of three-valued logic. The defense: wrap aggregates in COALESCE when the result feeds further math or a destination column that doesn't allow NULL.
-- AT RISK — if no rows match the filter, TotalSpend is NULL,
-- which then propagates through any further math
SELECT
EmailDomain,
SUM(TotalSpend) AS TotalSpend,
SUM(TotalSpend) + 100 AS TotalSpendPlus100 -- NULL + 100 = NULL
FROM master_subscribers
WHERE Status = 'Inactive' -- might match zero rows
GROUP BY EmailDomain;
-- DURABLE — COALESCE guarantees a numeric default
SELECT
EmailDomain,
COALESCE(SUM(TotalSpend), 0) AS TotalSpend,
COALESCE(SUM(TotalSpend), 0) + 100 AS TotalSpendPlus100
FROM master_subscribers
WHERE Status = 'Inactive'
GROUP BY EmailDomain;MIN and MAX have the same behavior — return NULL when no non-NULL values exist. COUNT(*) is the exception: always returns an integer (0 if no rows).
AVG skips NULLs from both numerator AND denominator
If you AVG(LoyaltyDelta) over 100 rows but only 70 have a non-NULL LoyaltyDelta, the average is SUM(non-null) / 70, not SUM(non-null) / 100. This usually isn't what the business asked for — they wanted "average across all subscribers, treating missing as zero".
-- AT RISK — average ignores NULL rows in BOTH the sum and the count
SELECT EmailDomain, AVG(LoyaltyDelta) AS AvgDelta
FROM master_subscribers
GROUP BY EmailDomain;
-- 70 of 100 rows non-null → returns SUM(70) / 70
-- EXPLICIT — coalesce NULLs to 0 first if that's the business rule
SELECT EmailDomain, AVG(COALESCE(LoyaltyDelta, 0)) AS AvgDelta
FROM master_subscribers
GROUP BY EmailDomain;
-- → returns SUM(70 non-null + 0 for the 30 nulls) / 100
-- OR explicit numerator + denominator with conditional logic
SELECT EmailDomain, SUM(LoyaltyDelta) * 1.0 / COUNT(*) AS AvgDelta
FROM master_subscribers
GROUP BY EmailDomain;Decide which math the business actually wants and write it explicitly. The implicit AVG behavior is correct often enough to lull you into trusting it, then wrong on the report that goes to leadership.
MAX-per-group: the dedup pattern that replaces window functions
MC SQL doesn't reliably support ROW_NUMBER() OVER (...) across editions (see gotchas — #4). The standard SFMC dedup-on-most-recent uses MAX plus a self-join in two staged Activities:
-- Activity 1: find the max date per group
INSERT INTO de_stg_max_purchase_per_email
SELECT
EmailAddress,
MAX(LastPurchase) AS MaxPurchase
FROM master_subscribers
GROUP BY EmailAddress;
-- Activity 2: pick the row(s) that match the max
INSERT INTO de_stg_dedup_subs
SELECT m.SubscriberKey, m.EmailAddress, m.LastPurchase
FROM master_subscribers m
INNER JOIN de_stg_max_purchase_per_email s
ON m.EmailAddress = s.EmailAddress
AND m.LastPurchase = s.MaxPurchase;If multiple rows tie on MaxPurchase for the same email (same date), all of them survive — which may or may not be what you want. Add a tiebreaker (e.g., MIN(SubscriberKey)) in Activity 2 if you need a single row per group.
Quick decision
Use COUNT(*) when:
- Counting every row regardless of NULLs.
Use COUNT(col) when:
- The business question is "rows that have a non-NULL value in this column".
Use COUNT(DISTINCT col) when:
- You need unique values, and the source is small (< ~500k rows). For larger sources, stage
SELECT DISTINCTinto a DE first, thenCOUNT(*).
Wrap SUM / MIN / MAX in COALESCE when:
- The result feeds further math, or the destination column doesn't allow NULL.
Use explicit SUM(...) / COUNT(*) instead of AVG when:
- You need NULL rows to be treated as zero in the average.
Use the MAX-per-group two-Activity pattern when:
- You need "most recent row per group" semantics. Don't depend on
ROW_NUMBER().
Related
- Basics — supported T-SQL subset
- SELECT — aggregate functions in projection
- WHERE — pre-aggregate filtering (vs
HAVINGpost-aggregate) - JOIN — staging pattern for MAX-per-group
- CASE — conditional aggregation (
SUM(CASE WHEN x THEN 1 ELSE 0 END)) - Numeric functions — DECIMAL precision when summing money
- MC SQL gotchas — #4 (window functions / CTEs edition-dependent), #5 (NULL three-valued)
One more function reference page incoming: Null Functions.
Plus how-to snippets for common production debugging — email sends, value length, contact reach, etc.