Ingestion
Getting data in: Data Streams, connectors (CRM, Marketing Cloud, S3, web/mobile SDK), the Ingestion API, and refresh modes — full vs incremental and when each one bites.
Foundation · 2
Production note
Ingestion gotchas: the silent failures at the front of the pipeline
The ingestion failures that don't throw an error. An upsert keeps deleted source records because nothing told it they left; a full refresh on a huge source runs up a bill that nobody consciously approved; a non-unique key overwrites a record; a daily cadence sits behind a real-time decision; event data lands as a Profile stream and the time series never exists; a DLO bloats because nobody reconciles ingest against use; a connector quietly loses auth and looks like a source with no data. Seven gotchas, each presented as the instinct, what actually happens in production, and the fix.
Decision framework
Data 360 Ingestion: Style Guide
The opinionated rules Cleon applies to every Data 360 ingestion decision — the streaming-vs-scheduled-batch call decided by the freshest downstream decision, the full-refresh-vs-upsert call decided by a named key and an explicit delete story, the cadence and cost discipline, the patterns to prefer and the ones to refuse, plus a pre-ship checklist for any new Data Stream. The discipline document that ties the Ingestion subcategory together.
Reference · 5
Reference
Data streams: the unit of ingestion
What a Data Stream is: the configured source-to-Data 360 connection that lands a DLO. The stream category — Profile, Engagement, or Other — and what it constrains downstream, the refresh schedule, and where ingestion ends and modeling begins.
Reference
Connectors: where Data 360 ingests from
The sources a Data Stream ingests from and what each is for — Salesforce CRM, Marketing Cloud, Amazon S3, Google Cloud Storage and Microsoft Azure Storage, the Web/Mobile SDK, the Ingestion API, MuleSoft, and zero-copy over Snowflake, BigQuery, and Databricks. Each one's batch-versus-streaming nature, and why the available set is something you verify in the org, not memorize from a page.
Reference
The Ingestion API: streaming vs bulk
Programmatic ingestion for when no packaged connector fits: the Ingestion API's two patterns — Streaming (small near-real-time asynchronous payloads) and Bulk (large file/CSV uploads) — when each fits, when to reach for the API instead of a connector, and the up-front requirement that a source and schema be defined before you send a single record.
Reference
Refresh modes: full refresh vs upsert
The refresh mode of a Data Stream decides how each run reconciles against what's already landed. Full refresh replaces the whole dataset every run and captures deletes by absence. Upsert inserts or updates records matched on a primary key; it is lighter and incremental, but a wrong or non-unique key silently duplicates or overwrites records, and deleted source records stay landed unless you explicitly send deletes.
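To make the difference concrete, here is a minimal sketch in plain Python that models the two refresh modes as operations on an in-memory dictionary. It is an illustration of the semantics only, not Data 360 code: the function names `full_refresh` and `upsert`, the `deletes` parameter, and the sample records are all hypothetical, and the sketch shows only the overwrite outcome of a non-unique key, not the duplicate outcome.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]
Landed = Dict[str, Record]  # landed rows, keyed by primary key value


def full_refresh(landed: Landed, batch: List[Record], key: str) -> Landed:
    """Replace everything: rows absent from the batch are gone afterwards."""
    return {row[key]: row for row in batch}


def upsert(landed: Landed, batch: List[Record], key: str,
           deletes: Iterable[str] = ()) -> Landed:
    """Insert or update by key; remove rows only when deletes are sent."""
    result = dict(landed)
    for row in batch:
        result[row[key]] = row      # same key: the later row wins
    for deleted_key in deletes:
        result.pop(deleted_key, None)
    return result


# Hypothetical landed state from a previous run.
landed = {
    "1": {"id": "1", "email": "a@example.com"},
    "2": {"id": "2", "email": "b@example.com"},
}

# Since then the source updated record 1, deleted record 2, added record 3.
source_now = [
    {"id": "1", "email": "a2@example.com"},
    {"id": "3", "email": "c@example.com"},
]

after_full = full_refresh(landed, source_now, "id")
after_upsert = upsert(landed, source_now, "id")
after_upsert_with_deletes = upsert(landed, source_now, "id", deletes=["2"])

# A non-unique key: two different source rows share the key value "9".
collided = upsert({}, [
    {"id": "9", "email": "x@example.com"},
    {"id": "9", "email": "y@example.com"},
], "id")

print(sorted(after_full))                 # keys after full refresh
print(sorted(after_upsert))               # keys after upsert, no deletes sent
print(sorted(after_upsert_with_deletes))  # keys after upsert with deletes
print(collided)                           # one row survives the collision
```

In this sketch the full refresh drops record 2 simply because it is absent from the batch, the plain upsert still holds it, and the upsert only matches the source once the delete is sent explicitly. The collision case raises no error: one of the two rows is silently overwritten.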
Reference
Ingestion and the lifecycle: what every downstream layer inherits
Ingestion is the front of the Data 360 lifecycle, and every later layer inherits whatever you landed. The DLO-to-DMO handoff, and how ingestion freshness and correctness feed identity resolution, query, segmentation, and agent-readiness — a stale or wrongly-keyed ingest surfaces as a downstream bug three layers away.