The Ingestion API: streaming vs bulk

The Ingestion API is how you push data into Data 360 (formerly Data Cloud) from a system that no packaged connector covers. A connector knows the shape of its source — Salesforce CRM, Marketing Cloud, an S3 bucket — and lands it for you on a schedule. The Ingestion API knows nothing until you tell it: you define the source and its schema first, then your own code sends records against that contract. It's the programmatic escape hatch for custom apps, internal services, and event streams that don't have a connector of their own.

This page is the reference for what the API is and how its two patterns differ. The two patterns are Streaming and Bulk, and choosing between them is the first decision you make: they share a setup but behave nothing alike at send time. Both land their data as a Data Lake Object (DLO; API names carry the __dll suffix), exactly like any other Data Stream. The modeling that turns a DLO into something the rest of the platform reads is Data Architecture's job, not this page's (see mapping the DLO to a DMO).

The prerequisite: a defined source and schema before you send

Unlike a connector, the API has no idea what your data looks like until you describe it. Before a single record flows you have to register an ingestion source and define the schema of each object you'll send — the field names and types the payload must conform to. That schema is the contract: a record that doesn't match it is rejected or silently dropped, not coerced into shape.

This is the same discipline principle 1 asks for everywhere — model the keys and the shape before data flows — applied at the API boundary. The schema you register here becomes a DLO, and every downstream decision inherits it. Get the field types wrong at registration and you're re-landing the whole source later, after segments are already reading it.
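A non-conforming record is rejected or dropped rather than coerced, so it can pay to check records against a local copy of the schema before they leave your code. The sketch below assumes your application keeps such a copy. The SCHEMA dict, its type labels and the schema_errors helper are all hypothetical, and the platform's own schema definition format is not shown.

```python
from datetime import datetime

# Hypothetical local mirror of the schema registered for the source.
# Field names follow the illustrative payload on this page; the type
# labels are assumptions, not the platform's schema format.
SCHEMA = {
    "external_id": str,
    "email": str,
    "event_type": str,
    "event_timestamp": "datetime",
}


def _is_iso8601_utc(value):
    """True if value is a string shaped like 2026-06-01T14:32:00Z."""
    if not isinstance(value, str) or not value.endswith("Z"):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


def schema_errors(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif expected == "datetime":
            if not _is_iso8601_utc(record[field]):
                errors.append(f"{field}: not an ISO 8601 UTC timestamp")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # Fields the schema never declared are a mismatch too.
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

Failing fast here turns a silent drop into an error your own code can log and fix.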

Streaming: small, near-real-time, asynchronous

The Streaming pattern is for sending records as they happen, in small payloads, close to real time. Your application makes a call carrying a handful of records — an order placed, a cart abandoned, a profile updated — and the API accepts them asynchronously: it acknowledges receipt and processes the data into the DLO shortly after, rather than making you wait for the landing to complete in the same call.

That asynchronous, incremental shape is the whole point. Streaming is how ingestion buys the freshness that principle 6 calls a feature: when a downstream decision needs to reflect what just happened (a real-time activation, an agent answering about a customer mid-session), the data has to arrive continuously, not in a nightly batch. Streaming meets that need at the source. (Freshness is a property you have to design for end to end; streaming the ingest is necessary, not sufficient, because the cadence of everything downstream has to match too. See refresh modes.)
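The difference between "accepted" and "landed" is easier to see in a toy in-memory model of the asynchronous contract. This is not the Ingestion API. The ToyStreamingIngest class, its send and process_pending methods and the {"accepted": True} acknowledgment are hypothetical. They exist only to show that a successful call means the records are queued, not yet queryable.

```python
from collections import deque


class ToyStreamingIngest:
    """Toy model of asynchronous ingestion; hypothetical, not the real API."""

    def __init__(self):
        self._pending = deque()  # records acknowledged but not yet landed
        self.dlo_rows = []       # stands in for the Data Lake Object

    def send(self, payload):
        # Acknowledge receipt right away; nothing has landed yet.
        self._pending.extend(payload["data"])
        return {"accepted": True}

    def process_pending(self):
        # Runs later, independently of the caller; records land in order.
        while self._pending:
            self.dlo_rows.append(self._pending.popleft())


ingest = ToyStreamingIngest()
ack = ingest.send({"data": [{"external_id": "cust-10293", "event_type": "cart_abandoned"}]})
print(ack["accepted"], len(ingest.dlo_rows))  # prints: True 0
ingest.process_pending()
print(len(ingest.dlo_rows))  # prints: 1
```

The practical consequence for real client code: a record missing right after a successful call is not by itself a failure, so any read-back check should allow for the processing delay.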

Streaming is the right choice when:

Records are generated continuously and individually, not assembled into files.
A downstream consumer genuinely needs near-real-time data — not "fresher is nicer," but a decision that's wrong on stale data.
Each payload is small. Streaming is built for many small calls, not for shipping a million rows in one request.
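Keeping each payload small usually means grouping outgoing records into batches under a size budget. The sketch below assumes the budget is a serialized-byte limit passed in as max_bytes. The real per-request limits are version-specific and deliberately not pinned here, so take the number from the current API reference. The helper name build_streaming_payloads is hypothetical, and the {"data": [...]} envelope follows this page's illustrative payload.

```python
import json


def _size(records):
    """Byte size of records once wrapped in the data envelope and serialized."""
    return len(json.dumps({"data": records}).encode("utf-8"))


def build_streaming_payloads(records, max_bytes):
    """Group records, in order, into envelopes no larger than max_bytes."""
    payloads, batch = [], []
    for record in records:
        if _size([record]) > max_bytes:
            # One record alone is over budget: it doesn't belong on this path.
            raise ValueError("a single record exceeds max_bytes")
        if batch and _size(batch + [record]) > max_bytes:
            payloads.append({"data": batch})
            batch = []
        batch.append(record)
    if batch:
        payloads.append({"data": batch})
    return payloads
```

A record that can't fit even on its own raises instead of being sent, which is a hint that the data may be better suited to a bulk file.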

Bulk: large files, fewer calls, scheduled loads

The Bulk pattern is for the opposite shape: large volumes of data delivered as files — CSV uploads — rather than a stream of individual records. You stage a file (or a set of files) and the API ingests it as a job. This is the pattern for an initial backfill, a periodic export from another system, or any load where the data already exists as a batch and there's no value in trickling it in one record at a time.

Bulk trades latency for throughput. It is not near-real-time (a bulk job lands on its own schedule), and that's exactly right when the source itself is batch. Loading yesterday's full export every morning has no business being split into millions of streaming calls. That would cost more and gain nothing, because the data is no fresher than the daily export it came from (principle 11: cost scales with what you process).
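A bulk load starts from a file. The sketch below produces one, assuming CSV with a header row in the registered schema's column order. BULK_FIELDS and to_bulk_csv are hypothetical, and staging the file and submitting it as a job are left to the current API reference.

```python
import csv
import io

# Hypothetical column order; the header must match the schema you registered.
BULK_FIELDS = ["external_id", "email", "event_type", "event_timestamp"]


def to_bulk_csv(records, fields=BULK_FIELDS):
    """Serialize records into one CSV document with a header row.

    Raises ValueError on a missing or extra field, so a malformed row fails
    here rather than inside the bulk job.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        missing = [f for f in fields if f not in record]
        extra = [f for f in record if f not in fields]
        if missing or extra:
            raise ValueError(f"missing={missing} extra={extra}")
        writer.writerow(record)  # csv quotes values containing commas
    return buffer.getvalue()
```

Counting calls makes the cost argument concrete. At a hypothetical 200 records per streaming call, a 2,000,000-row daily export would be 10,000 calls every morning, where bulk handles the same file as a single job.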

Bulk is the right choice when:

The data arrives as files, or is naturally produced in batches.
Volume is high — a backfill or a recurring large load, where one job beats millions of small calls.
Near-real-time isn't required: the freshest decision the data feeds is fine on a batch cadence.

The two patterns are not a ranking — neither is "better." They match two different source shapes. A custom source that emits both live events and a nightly reconciliation file legitimately uses both: streaming for the events, bulk for the file.
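The criteria in the two checklists can be written down as a small decision rule applied per feed rather than per source. This is one reading of those criteria, not a platform rule. The Feed fields, choose_pattern and plan_source are hypothetical names. The final branch sends individual records with no real-time need to bulk, following "fresher is nicer" not being a reason to stream.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """Hypothetical description of one feed coming out of a custom source."""
    name: str
    arrives_as_files: bool      # files or batches, not individual records
    needs_near_real_time: bool  # a downstream decision is wrong on stale data


def choose_pattern(feed):
    """Return "streaming" or "bulk" for a single feed."""
    if feed.arrives_as_files:
        # A batch source is no fresher for being trickled in record by record.
        return "bulk"
    if feed.needs_near_real_time:
        return "streaming"
    # Individual records with no real-time need: batching them is cheaper.
    return "bulk"


def plan_source(feeds):
    """Map each feed to a pattern; one source can legitimately use both."""
    return {feed.name: choose_pattern(feed) for feed in feeds}


plan = plan_source([
    Feed("live_events", arrives_as_files=False, needs_near_real_time=True),
    Feed("nightly_reconciliation", arrives_as_files=True, needs_near_real_time=False),
])
```

For the mixed source described above, the plan maps live_events to streaming and nightly_reconciliation to bulk.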

When to reach for the API vs a connector

The API is the escape hatch, not the default. If a packaged connector covers your source, use it: a connector is configured rather than coded, it handles auth and schema discovery for you, and it leaves less of your own code to maintain and debug. The decision is straightforward:

Use a connector when one exists for your source — CRM, Marketing Cloud, S3, Google Cloud Storage, Azure Storage, the Web/Mobile SDK, or a zero-copy warehouse. See connectors for the common set and what each is for.
Use the Ingestion API when no connector fits: a custom application, an internal service, a homegrown event pipeline, or any system whose data you control in code and want to push on your own terms.

Don't reach for the API to get "more control" over a source a connector already handles well — that's custom code you now own forever, in place of configuration the platform maintains. Reach for it when there's genuinely no connector-shaped path in.

A note on specifics

The two patterns — Streaming for small asynchronous near-real-time payloads, Bulk for large file uploads — are stable and worth committing to memory. The exact endpoint paths, request formats, payload size limits, and authentication flow are version-specific and change; this page deliberately doesn't pin them. Before you build, verify the current API reference for the concrete contract. Treat any payload shape below as illustrative of the idea, not as a literal spec:

// ILLUSTRATIVE ONLY — verify the current Ingestion API reference for the
// real endpoint, payload format, and limits. Shape shown to convey the idea:
// a small streaming payload carrying a few records that match the registered schema.
{
  "data": [
    {
      "external_id": "cust-10293",
      "email": "person@example.com",
      "event_type": "cart_abandoned",
      "event_timestamp": "2026-06-01T14:32:00Z"
    }
  ]
}

The record's fields match the schema you registered for the source; event_timestamp is the kind of event-time field an Engagement stream requires (see data streams for stream categories). Land it well here and the rest of the lifecycle — modeling, identity, query, activation — inherits clean data. Land it wrong, and the bug surfaces three layers downstream where it's hardest to trace (see debugging ingestion).
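Building that payload in code mostly comes down to two things. Each record gets stamped with when the event happened, in a consistent UTC format, and the records get wrapped in the data envelope. The sketch below assumes the illustrative field names used on this page. event_record and streaming_body are hypothetical helpers, and the envelope shape carries the same caveat as the example itself.

```python
import json
from datetime import datetime, timezone

# ILLUSTRATIVE ONLY: mirrors this page's example payload shape; verify the
# current Ingestion API reference for the real contract before relying on it.


def event_record(external_id, email, event_type, occurred_at):
    """Build one record; occurred_at must be a timezone-aware datetime."""
    if occurred_at.tzinfo is None:
        raise ValueError("occurred_at must be timezone-aware")
    utc = occurred_at.astimezone(timezone.utc)
    return {
        "external_id": external_id,
        "email": email,
        "event_type": event_type,
        # Event time (when it happened), not send time, rendered in UTC.
        "event_timestamp": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def streaming_body(records):
    """Wrap records in the {"data": [...]} envelope and serialize to JSON."""
    return json.dumps({"data": records})
```

Rejecting naive datetimes is a deliberate choice. A timestamp with no zone is ambiguous, and the resulting skew would only surface later, in event-time logic downstream.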

Connectors — the packaged sources to prefer before reaching for the API, and what each one is for
Data streams — the unit of ingestion the API feeds: source → DLO, stream categories, and schedule
Refresh modes — full refresh vs upsert, the primary key, and how a streamed or bulk-loaded source updates
Debugging ingestion — when a custom source lands wrong: missing records, schema mismatches, a stream that won't refresh
