Debugging WSProxy auth issues

You wrote a Script Activity that uses WSProxy to retrieve subscribers via the SOAP API. Tested it in preview against a small batch: worked. Promoted to production with a real audience: fine for the first thousand or so rows, then every call past minute 20 comes back with 'Token has expired' or 'Unauthorized', the script's catch block writes the errors to de_log_ssjs_errors, and downstream Activities run against half the data. The failure is the WSProxy auth token expiring at the ~20-minute mark in a script that didn't re-instantiate the proxy. See MC SSJS gotchas, #8.

This page is the diagnostic playbook for that exact shape — five queries that confirm the auth-timeout fingerprint and rule out the other shapes (bad credentials, missing setClientId for cross-BU access, stale Installed Package, network).

The auth lifecycle

[ new Script.Util.WSProxy() ]
        ↓ implicit OAuth call → access token cached on the prox
[ first .retrieve() / .create() / .update() — uses cached token ]
        ↓ ... up to ~20 minutes ...
[ token expires server-side ]
        ↓ next call → 'Token has expired' / 'Unauthorized'
[ either: re-instantiate the prox (durable) ]
[     or: fail the rest of the loop (what you're debugging) ]

The fingerprint of this exact failure: a long run where the first batch of calls succeeded, the failure starts at a clean ~20-minute boundary, and everything after that point fails with the same error message.
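Here is a small simulation that makes the fingerprint concrete. It is plain JavaScript in the ES3 style that SSJS accepts, but it is meant to run outside Marketing Cloud. `makeFakeProxy` and `simulateRun` are hypothetical. The fake proxy stamps its token when it is created and rejects calls once an assumed 20-minute lifetime has passed. The loop makes one call per simulated minute and never re-instantiates the proxy.

```javascript
// Hypothetical stand-in for Script.Util.WSProxy: the token is "issued" when
// the proxy is created and rejected once tokenLifetimeMin minutes have passed.
function makeFakeProxy(issuedAtMin, tokenLifetimeMin) {
  return {
    retrieve: function (nowMin) {
      if (nowMin - issuedAtMin >= tokenLifetimeMin) {
        throw new Error("Token has expired");
      }
      return { Status: "OK" };
    }
  };
}

// The bug: one proxy instantiated at the top, one call per simulated minute.
function simulateRun(totalMinutes, tokenLifetimeMin) {
  var prox = makeFakeProxy(0, tokenLifetimeMin);
  var timeline = [];
  for (var minute = 0; minute < totalMinutes; minute++) {
    try {
      prox.retrieve(minute);
      timeline.push({ minute: minute, outcome: "SUCCESS" });
    } catch (e) {
      // Same catch-and-log shape as the Script Activity: the loop keeps going.
      timeline.push({ minute: minute, outcome: "FAILURE", msg: e.message });
    }
  }
  return timeline;
}

var run = simulateRun(30, 20);
```

Every call before minute 20 succeeds. Every call from minute 20 on fails with the same message. That is the clean cutoff the queries below look for. Failures scattered among successes would point at something else.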

The queries below confirm the fingerprint and tell you what to fix.

Step 1 — Did the first call succeed?

If WSProxy auth failed on the first call, the bug isn't the 20-minute timeout — it's credentials, scopes, or network. Confirm that at least one SOAP call succeeded under this RunId before reading anything else.

-- Replace 'a1b2c3d4-e5f6-...' with the RunId of the failed run (the same placeholder appears in every query on this page)
SELECT
  COUNT(*)                AS SuccessSteps,
  MIN(Ts)                 AS FirstSuccessAt,
  MAX(Ts)                 AS LastSuccessAt
FROM de_log_ssjs_runs
WHERE RunId = 'a1b2c3d4-e5f6-...'
  AND Step LIKE '%wsproxy%'
  AND Step NOT LIKE '%fail%';

Three numbers, one question:

SuccessSteps = 0: WSProxy never worked on this run. Skip the rest of this page — your problem is in the Installed Package / API Integration setup. Verify the script reads clientId / clientSecret from the right config DE, the Installed Package's API scopes include Subscribers Read/Write (or whatever surface you're calling), and the package's Business Unit matches the one the script runs in.
SuccessSteps > 0 and LastSuccessAt - FirstSuccessAt < 20 minutes: the script didn't run long enough to test the timeout. The failure is something else — likely a payload error or a specific record's data. Use Debugging stuck Script Activities — step 3.
SuccessSteps > 0 and LastSuccessAt - FirstSuccessAt near or past 20 minutes: the 20-minute pattern is in play. Continue to step 2.
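The three branches can be written as a small decision function. This is a sketch, and it makes two assumptions. First, you have copied the Step 1 results into JavaScript `Date` values. Second, "near or past 20 minutes" means a span of at least `thresholdMin` minutes (20 by default). That threshold is a judgment call, not a platform constant.

```javascript
// Map the Step 1 query results to the branches described above.
// firstSuccessAt / lastSuccessAt are Date objects, or null when successSteps is 0.
function classifyStep1(successSteps, firstSuccessAt, lastSuccessAt, thresholdMin) {
  if (thresholdMin === undefined) {
    thresholdMin = 20; // assumed reading of "near or past 20 minutes"
  }
  if (successSteps === 0) {
    return "auth-never-worked";   // audit the Installed Package setup
  }
  var spanMinutes = (lastSuccessAt.getTime() - firstSuccessAt.getTime()) / 60000;
  if (spanMinutes < thresholdMin) {
    return "run-too-short";       // not the timeout; look at payloads and records
  }
  return "timeout-candidate";     // continue to step 2
}
```

Pass a lower `thresholdMin` if you want spans of 19-something minutes treated as "near".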

Step 2 — When did auth start failing?

Find the first auth-failure row for the RunId and compute the gap from the first successful call.

WITH
  first_success AS (
    SELECT MIN(Ts) AS Ts
    FROM de_log_ssjs_runs
    WHERE RunId = 'a1b2c3d4-e5f6-...'
      AND Step LIKE '%wsproxy%'
      AND Step NOT LIKE '%fail%'
  ),
  first_failure AS (
    SELECT MIN(Ts) AS Ts
    FROM de_log_ssjs_errors
    WHERE RunId = 'a1b2c3d4-e5f6-...'
      AND (Msg LIKE '%Token has expired%'
        OR Msg LIKE '%Unauthorized%'
        OR Msg LIKE '%401%')
  )
SELECT
  f.Ts                                 AS FirstFailureAt,
  s.Ts                                 AS FirstSuccessAt,
  DATEDIFF(minute, s.Ts, f.Ts)         AS GapMinutes
FROM first_success s, first_failure f;

If GapMinutes is 19, 20, or 21, you have the auth-timeout fingerprint. The fix is the re-instantiation pattern on the WSProxy page: create a fresh proxy every 15 minutes inside the long-running loop, so a new token is always in hand before the ~20-minute expiry.
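Below is a minimal sketch of that loop, in the ES3-style JavaScript that SSJS uses. It relies on three hypothetical injected helpers, so the logic can be exercised outside Marketing Cloud:

- `createProxy`: in a real Script Activity this would simply return `new Script.Util.WSProxy()`.
- `now`: returns milliseconds.
- `logStep`: writes a row to de_log_ssjs_runs.

The 15-minute interval is this page's rule of thumb, not a platform setting.

```javascript
var REFRESH_INTERVAL_MS = 15 * 60 * 1000; // refresh well before the ~20-minute expiry

// Process every batch, replacing the proxy once it is older than the refresh
// interval. createProxy, now, logStep and handleBatch are hypothetical helpers.
function processBatches(batches, createProxy, now, logStep, handleBatch) {
  var prox = createProxy();
  var proxCreatedAt = now();
  for (var i = 0; i < batches.length; i++) {
    if (now() - proxCreatedAt >= REFRESH_INTERVAL_MS) {
      prox = createProxy();          // new instance, new token
      proxCreatedAt = now();
      logStep("wsproxy-refresh");    // the rows Step 4 counts
    }
    handleBatch(prox, batches[i]);   // always pass the current prox
  }
}
```

The loop checks the clock before each batch rather than on a timer, because a Script Activity has no background scheduler. A single batch that runs much longer than five minutes can therefore still outlive its token.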

If GapMinutes is much smaller (under 5), it's not a timeout. Possible causes:

A specific record triggered a permissions error that returns a 401-shaped message (rare but real for cross-org references).
The Installed Package was rotated mid-run by an admin (very rare, but check the audit trail).
The script switched Business Units via setClientId and the new BU has different credentials — see step 5.
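Here is Step 2's reading as a function. The page defines only two bands: 19 to 21 minutes, and under 5 minutes. This hypothetical helper therefore reports anything else as `inconclusive` and leaves the decision to the timeline in step 3.

```javascript
// Classify Step 2's GapMinutes using the two rules on this page.
// Gaps outside both bands are not covered by the page, so they stay inconclusive.
function classifyGap(gapMinutes) {
  if (gapMinutes >= 19 && gapMinutes <= 21) {
    return "auth-timeout";
  }
  if (gapMinutes < 5) {
    return "not-timeout";   // per-record 401, rotated package, or setClientId
  }
  return "inconclusive";
}
```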

Step 3 — Is it the 20-minute pattern?

Confirm by looking at the full timeline of success-vs-failure rows for the RunId. The pattern: all success rows fall within the first 20 minutes; all failure rows fall after.

SELECT
  Ts,
  'SUCCESS' AS Outcome,
  Step
FROM de_log_ssjs_runs
WHERE RunId = 'a1b2c3d4-e5f6-...'
  AND Step LIKE '%wsproxy%'
  AND Step NOT LIKE '%fail%'

UNION ALL

SELECT
  Ts,
  'FAILURE' AS Outcome,
  Step
FROM de_log_ssjs_errors
WHERE RunId = 'a1b2c3d4-e5f6-...'
  AND (Msg LIKE '%Token has expired%'
    OR Msg LIKE '%Unauthorized%'
    OR Msg LIKE '%401%')

ORDER BY Ts;

The output is a chronological timeline. The auth-timeout signature looks like:

02:00:14  SUCCESS  wsproxy-retrieve-subs
02:05:22  SUCCESS  wsproxy-retrieve-subs
02:11:38  SUCCESS  wsproxy-retrieve-subs
02:18:47  SUCCESS  wsproxy-retrieve-subs
02:21:03  FAILURE  wsproxy-retrieve-subs   ← clean cutoff around minute 20
02:21:08  FAILURE  wsproxy-retrieve-subs
02:21:14  FAILURE  wsproxy-retrieve-subs
...

A different shape — failures interleaved with successes throughout — means it's not the auth timeout; it's something per-record (data shape, permissions, partial outage).
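Reading a long timeline by eye is error-prone, so here is a small check for the clean-cutoff shape. It assumes the query output has been exported into objects with `ts` and `outcome` fields, already in Ts order. The field names are illustrative.

```javascript
// True when the rows show a clean cutoff: at least one SUCCESS, at least one
// FAILURE, and no SUCCESS after the first FAILURE.
function hasCleanCutoff(rows) {
  var seenFailure = false;
  var seenSuccess = false;
  for (var i = 0; i < rows.length; i++) {
    if (rows[i].outcome === "FAILURE") {
      seenFailure = true;
    } else if (rows[i].outcome === "SUCCESS") {
      if (seenFailure) {
        return false;   // success after a failure: interleaved, not a timeout
      }
      seenSuccess = true;
    }
  }
  return seenSuccess && seenFailure;
}
```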

Step 4 — Is the script re-instantiating?

If the script is supposed to refresh the prox every 15 minutes per the Style Guide discipline, it should log a wsproxy-refresh step at each refresh. Verify the refresh entries are actually being written.

SELECT
  RunId,
  COUNT(*)                                   AS RefreshCount,
  MIN(Ts)                                    AS FirstRefreshAt,
  MAX(Ts)                                    AS LastRefreshAt,
  DATEDIFF(minute, MIN(Ts), MAX(Ts))         AS RefreshSpanMinutes
FROM de_log_ssjs_runs
WHERE RunId = 'a1b2c3d4-e5f6-...'
  AND Step = 'wsproxy-refresh'
GROUP BY RunId;

Three failure shapes here:

RefreshCount = 0: the script never refreshed. This confirms the bug: the prox was instantiated once at the top and lived through the entire run. Fix it with the re-instantiation pattern on the WSProxy page.
RefreshCount > 0 but RefreshSpanMinutes shorter than the script's total runtime: the script refreshed at the start but stopped after some point. Common cause: the refresh logic is inside a try whose catch doesn't continue the loop — one error in the middle stopped further refreshes.
RefreshCount > 0, evenly spread, but auth still failed: either the refresh interval is too long (>15 min between resets), or some code path calls a proxy reference that went stale. For example, a helper received the proxy as an argument, or copied it into its own variable, before the refresh and keeps using the old instance.
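The three shapes can be checked mechanically if you pull the individual wsproxy-refresh timestamps rather than the aggregate. This sketch is an interpretation, not a rule from the page, and makes these assumptions:

- Times are minutes since the run started.
- "Stopped after some point" means the run continued more than `limitMin` minutes past the last refresh.
- Any gap between proxy instances longer than `limitMin` is too long.

If your loop only checks the clock between batches, intervals will land slightly above 15 minutes, so consider passing a slightly larger limit.

```javascript
// refreshMinutes: minutes since run start of each wsproxy-refresh row, ascending.
// runtimeMinutes: total runtime of the run.
// limitMin: longest acceptable gap between proxy instances (15 on this page).
function diagnoseRefreshes(refreshMinutes, runtimeMinutes, limitMin) {
  if (refreshMinutes.length === 0) {
    return "never-refreshed";
  }
  var last = refreshMinutes[refreshMinutes.length - 1];
  if (runtimeMinutes - last > limitMin) {
    return "refreshes-stopped";        // the run kept going long after the last refresh
  }
  var previous = 0;                    // the first proxy is created at minute 0
  for (var i = 0; i < refreshMinutes.length; i++) {
    if (refreshMinutes[i] - previous > limitMin) {
      return "interval-too-long";
    }
    previous = refreshMinutes[i];
  }
  return "refreshes-look-healthy";     // next suspect: a stale proxy reference
}
```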

Step 5 — Cross-BU scenarios

If the script uses prox.setClientId({ "ID": mid }) to switch Business Units mid-run, the auth pattern is different. Each BU's credentials are evaluated against that BU's Installed Package configuration, and a BU that wasn't part of the original auth scope returns Unauthorized immediately.

SELECT
  Step,
  Message,
  Ts
FROM de_log_ssjs_runs
WHERE RunId = 'a1b2c3d4-e5f6-...'
  AND Step LIKE '%setClientId%'
ORDER BY Ts;

If you see setClientId calls followed quickly by Unauthorized errors:

Verify the Installed Package is enabled for that target BU (Setup → Installed Packages → Components tab → "Available in this Business Unit").
Verify the API scopes the script requires (Subscribers Read/Write, Data Extensions Read/Write, etc.) are enabled for the target BU's Installed Package, not just the parent BU's.
If the parent BU is MID 1000 and you're calling setClientId({ "ID": 2000 }), the API user behind the Installed Package needs access to BU 2000 itself, not just to the parent.

This is not the same as the 20-minute timeout. A failed setClientId fails on the first call into the new BU; the 20-minute timeout fails after 20 minutes of successful work.
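To separate this shape from the timeout, look at what happens immediately after each switch. The sketch below uses a hypothetical row shape (`step`, `mid`, `outcome`, `msg`). It presumes the script logs the target MID with each setClientId step, and that you have merged those rows with the matching error rows in Ts order. It returns the MIDs whose very first call after the switch was Unauthorized. A 20-minute timeout, whose first calls succeed, is not flagged.

```javascript
// rows: merged log rows in Ts order, e.g. { step: "setClientId", mid: 2000 } or
// { step: "wsproxy-retrieve-subs", outcome: "FAILURE", msg: "Unauthorized" }.
function midsFailingOnFirstCall(rows) {
  var failing = [];
  var currentMid = null;
  var awaitingFirstCall = false;
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    if (row.step === "setClientId") {
      currentMid = row.mid;
      awaitingFirstCall = true;
    } else if (awaitingFirstCall) {
      // Only the first call after a switch matters for this shape.
      if (row.outcome === "FAILURE" && (row.msg || "").indexOf("Unauthorized") !== -1) {
        failing.push(currentMid);
      }
      awaitingFirstCall = false;
    }
  }
  return failing;
}
```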

Step 6 — Write the postmortem

Once the bug is found, write the diagnostic into de_log_ssjs_postmortems (same DE used by Debugging stuck Script Activities — step 6).

INSERT INTO de_log_ssjs_postmortems (DiagnosedAt, ActivityName, RunId, StartedAt, LastWriteAt, LastStep, AuthFailureCount, RootCause)
SELECT
  GETDATE()                                AS DiagnosedAt,
  'SA_NightlyEnrichment'                   AS ActivityName,
  'a1b2c3d4-e5f6-...'                      AS RunId,
  (SELECT MIN(Ts) FROM de_log_ssjs_runs WHERE RunId = 'a1b2c3d4-e5f6-...')                 AS StartedAt,
  (SELECT MAX(Ts) FROM de_log_ssjs_runs WHERE RunId = 'a1b2c3d4-e5f6-...')                 AS LastWriteAt,
  (SELECT TOP 1 Step FROM de_log_ssjs_runs WHERE RunId = 'a1b2c3d4-e5f6-...' ORDER BY Ts DESC) AS LastStep,
  (SELECT COUNT(*) FROM de_log_ssjs_errors
     WHERE RunId = 'a1b2c3d4-e5f6-...'
       AND (Msg LIKE '%Token has expired%' OR Msg LIKE '%Unauthorized%' OR Msg LIKE '%401%')) AS AuthFailureCount,
  'WSProxy 20-min token expiry; script did not refresh prox in loop'                       AS RootCause;

Writing RootCause in the past tense forces clarity. On its own, "20-min token expiry" is only a hypothesis. "Script did not refresh prox in loop" states what went wrong and points straight at the fix.

Common causes ranked by frequency

Cause	How to spot	Fix in
20-minute token timeout, no refresh	Step 3 shows a clean cutoff around minute 20 + step 4 returns `RefreshCount = 0`	WSProxy; re-instantiate every 15 min
Refresh logic broken mid-run	Step 4's `RefreshCount > 0` but `RefreshSpanMinutes` shorter than total runtime	Style Guide; audit the refresh's try/catch
Auth failed on call one	Step 1's `SuccessSteps = 0`	Audit Installed Package: enabled BUs, API scopes, MID matches
`setClientId` to an unauthorized BU	Step 5 finds `setClientId` rows followed by `Unauthorized`	Marketing Cloud Setup → Installed Package → enable for target BU
Installed Package credentials rotated mid-run	Step 2's `GapMinutes` is small, audit trail shows recent edit	Coordinate with admin team; pin rotation to maintenance windows
Stale captured proxy reference in a closure	Step 4 has refreshes but auth still failed	Audit script for closures that capture `prox` before a refresh
Misconfigured API scope (e.g. `List Send` missing)	Step 1 successes for one operation type, failures for another	Installed Package Components tab; verify all needed scopes
Cross-org MID access not granted	`setClientId` to a MID outside the API user's Business Unit access	Grant the API user access to the target BU's MID
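The stale-reference row is the least obvious of these. JavaScript closures capture variables, not values, so a helper that reads the shared `prox` variable picks up a refresh automatically. The bug appears when a helper keeps its own copy, for example a factory that receives the proxy as an argument. This sketch uses hypothetical plain objects in place of WSProxy instances to show the difference.

```javascript
// Stand-ins for WSProxy instances; `generation` tells them apart.
var prox = { generation: 1 };

// Buggy: the returned function holds the proxy it was given at creation time.
function makeRetrieverFromArg(p) {
  return function () { return p.generation; };
}
var staleRetrieve = makeRetrieverFromArg(prox);

// Safe: reads the shared `prox` variable on every call.
function freshRetrieve() {
  return prox.generation;
}

prox = { generation: 2 };   // the 15-minute refresh replaces the proxy

var staleSees = staleRetrieve();  // 1: still the old, expiring instance
var freshSees = freshRetrieve();  // 2: the refreshed instance
```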

WSProxy — the reference page with the re-instantiation pattern this debug session points at
Platform.Function — the higher-level wrapper; sometimes the fix is to drop WSProxy and use Platform.Function.* instead
MC SSJS gotchas — #8 (the auth-timeout gotcha this page diagnoses)
Debugging stuck Script Activities — the sibling how-to that shares the postmortem DE
Style Guide — the instrumentation discipline (RunId + Step + log-in-catch + 15-minute refresh) that makes these queries possible