
Data streams: the unit of ingestion

What a Data Stream is: the configured source-to-Data 360 connection that lands a DLO. Also covered: the stream category (Profile, Engagement, or Other) and what it constrains downstream, the refresh schedule, and where ingestion ends and modeling begins.

Reference · Last updated 2026-06-01 · Drafted by Lira · Edited by German Medina

A Data Stream is the unit of ingestion in Data 360 (formerly Data Cloud): the configured connection that pulls data from one source and lands it as a Data Lake Object (DLO) — API names end in __dll — in roughly the shape the source delivered. Every source you bring in is a stream, and every stream produces a DLO. Get this layer right and everything downstream has clean material to work with; get it wrong and the mistake propagates through identity, query, and activation before anyone notices.

This page is the spine of the Ingestion subcategory: what a stream is, the three things you configure on it (source, category, schedule), and where ingestion ends and modeling begins. It does not cover the individual sources — that is the connectors page — nor how DLOs become meaningful objects, which is Data Architecture's job and lives in mapping.

What a Data Stream is

A Data Stream is a definition, not a one-time import. It names a source, a way to connect to it, and a schedule, and on each run it lands the source's records into a DLO. The DLO is the raw floor: it holds the data in the source's structure — its fields, its types, its grain — without business meaning attached yet.

That separation is deliberate: it is principle 2 in practice. The stream's job is to land data faithfully; assigning what the data means is a separate step. A single source can feed several streams and so produce more than one DLO, and a stream's DLO is what every later step reads from, so the shape you land is the shape the rest of the org inherits.

Source            Data Stream              DLO (__dll)            DMO (__dlm)
(CRM, S3, SDK) ──[connect + schedule]──▶  raw landing  ──map──▶  harmonized meaning
                  ingestion                 this page            Data Architecture
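
To make the "definition, not a one-time import" idea concrete, here is a minimal Python sketch. It is a mental model only, not a Data 360 API: StreamDefinition, DataLakeObject, fetch_contacts, and the name Contact_Home__dll are all hypothetical, and the sample records are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Record = dict[str, object]


@dataclass
class DataLakeObject:
    """Hypothetical stand-in for a DLO: raw rows in the source's own shape."""
    api_name: str  # by convention ends in "__dll"
    rows: list[Record] = field(default_factory=list)


@dataclass(frozen=True)
class StreamDefinition:
    """Hypothetical model of a stream: a reusable definition, not a one-off import."""
    name: str
    source: str                            # label only, e.g. "crm" or "s3"
    schedule: str                          # free-text cadence, e.g. "hourly"
    fetch: Callable[[], Iterable[Record]]  # how this sketch "connects" to the source

    def run(self, dlo: DataLakeObject) -> int:
        """One scheduled run: land the source's records without reshaping them."""
        landed = [dict(record) for record in self.fetch()]  # copy; no renaming or retyping
        dlo.rows = landed  # simplification: how a refresh applies data is a separate topic
        return len(landed)


def fetch_contacts() -> list[Record]:
    # Invented sample data standing in for a source system.
    return [
        {"Id": "003A", "Email": "a@example.com"},
        {"Id": "003B", "Email": "b@example.com"},
    ]


stream = StreamDefinition("contacts", "crm", "hourly", fetch_contacts)
contact_dlo = DataLakeObject("Contact_Home__dll")
first_run = stream.run(contact_dlo)
second_run = stream.run(contact_dlo)  # the same definition runs again on its schedule
```

The point of the sketch is that run() can be called repeatedly against the same definition, and that the landed rows keep the source's field names untouched; meaning is assigned later, in mapping.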

The three things you configure

1. The source

Every stream connects to exactly one source through a connector or the Ingestion API — Salesforce CRM, Marketing Cloud, a cloud-storage bucket, the Web/Mobile SDK, and others. Which sources are available depends on your org and licensing, so verify the list rather than assume a specific connector exists. The full catalog of sources and what each is for lives in the connectors page.

2. The category

When you create a stream you assign it a category, and the category is not cosmetic — it tells Data 360 how the data behaves and constrains what you can do with it downstream. There are three:

  • Profile — data that describes a person or entity: a customer, an account, a contact point. One record per subject, updated over time. This is the material identity resolution unifies.
  • Engagement — time-series event data: an email open, a purchase, a page view, an app event. Each record is a thing that happened at a moment, so an Engagement stream requires an event-time field — the timestamp that places the event on a timeline. Without it the data is not a usable time series.
  • Other — reference or lookup data that is neither a profile nor an event: a product catalog, a store list, a category table. It supports the other two without being either.

Pick the wrong category and the constraint surfaces later, not now. The classic mistake is landing event data as a Profile stream: it ingests, it looks fine, and then time-windowed segmentation and engagement metrics have nothing to stand on because the data was never modeled as a time series. Choosing the category is the first modeling decision you make, even though it happens at ingestion — which is why it anchors to principle 1: the model is a contract you design before the first stream connects.
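
The event-time requirement on Engagement streams can be sketched as a pre-flight check you might run on a planned stream before creating it. This is an illustrative Python sketch under assumptions: StreamCategory and check_category are hypothetical names, not part of any Data 360 SDK, and the field names are invented.

```python
from enum import Enum
from typing import Optional


class StreamCategory(Enum):
    PROFILE = "Profile"
    ENGAGEMENT = "Engagement"
    OTHER = "Other"


def check_category(
    category: StreamCategory,
    field_names: set[str],
    event_time_field: Optional[str] = None,
) -> list[str]:
    """Return a list of problems with a planned stream; an empty list means none found."""
    problems: list[str] = []
    if category is StreamCategory.ENGAGEMENT:
        if event_time_field is None:
            problems.append("Engagement stream needs an event-time field")
        elif event_time_field not in field_names:
            problems.append(
                f"event-time field {event_time_field!r} is not among the source fields"
            )
    return problems


# Invented field names for a page-view source.
page_view_fields = {"visitor_id", "url", "viewed_at"}

ok = check_category(StreamCategory.ENGAGEMENT, page_view_fields, "viewed_at")
missing = check_category(StreamCategory.ENGAGEMENT, page_view_fields)
wrong_name = check_category(StreamCategory.ENGAGEMENT, page_view_fields, "timestamp")
```

Note what the check cannot catch: event data declared as Profile passes cleanly, which is exactly why that mistake surfaces later rather than at ingestion.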

3. The refresh schedule

A stream is not a single load; it refreshes on a cadence you set, and that cadence is a real decision, not a default to accept. The schedule determines how fresh the DLO, and therefore everything downstream, actually is. This is principle 6 made concrete: freshness is a feature. A stream that refreshes daily behind a decision that needs near-real-time data is a latency bug waiting to surprise someone; a stream that refreshes continuously to feed a once-a-day batch is wasted cost (principle 11).

How a refresh applies its data — replacing the whole set versus merging by a key — is a separate axis from how often it runs, and it carries the central correctness gotcha of ingestion. That belongs to refresh modes; here, the point is only that the cadence is something you choose deliberately and write down next to the stream.
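
One way to make the cadence decision explicit is to compare the refresh interval with the freshness the downstream decision actually needs. The sketch below is a hedged illustration in Python: assess_cadence is a hypothetical helper, and the 10x threshold for "over-provisioned" is an arbitrary assumption chosen for the example, not a product rule.

```python
from datetime import timedelta

# Assumption for illustration: refreshing more than 10x as often as needed counts as waste.
OVER_PROVISION_FACTOR = 10


def assess_cadence(refresh_every: timedelta, needed_within: timedelta) -> str:
    """Compare how often a stream refreshes with how fresh the consumer needs the data."""
    if refresh_every > needed_within:
        return "too slow"          # latency bug: data is staler than the decision tolerates
    if refresh_every * OVER_PROVISION_FACTOR < needed_within:
        return "over-provisioned"  # likely wasted cost
    return "ok"


daily_behind_realtime = assess_cadence(timedelta(days=1), timedelta(minutes=5))
continuous_for_batch = assess_cadence(timedelta(minutes=1), timedelta(days=1))
matched = assess_cadence(timedelta(hours=12), timedelta(days=1))
```

Recording the needed_within value next to the stream is the useful part; the comparison itself is trivial once that number is written down.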

Where ingestion ends and modeling begins

A Data Stream's responsibility stops at the DLO. The DLO is raw landing; turning it into a Data Model Object (DMO) — API names end in __dlm, standard ones in the ssot__ namespace — is the modeling step, and it belongs to Data Architecture, not to ingestion.

This boundary matters because the two are easy to conflate and expensive to confuse. A DLO is what you ingested; a DMO is what it means. Mapping a DLO one-to-one onto a DMO and calling it modeling just relocates the source system's mess into the layer every segment and agent reads from. The mapping decision — which DLO field carries which business meaning, what the key is, how objects relate — is documented in mapping and Data Lake Objects. This page does not redefine it. The handoff is the thing to hold in your head: ingestion lands the DLO, Data Architecture gives it meaning, and the quality of the stream sets the ceiling on everything that follows.
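
The naming conventions above give a quick way to tell which side of the handoff an object sits on. The following Python sketch classifies an API name by the suffixes and namespace this page describes; classify_api_name is a hypothetical helper and the example names are illustrative, so verify real object names in your own org.

```python
def classify_api_name(api_name: str) -> str:
    """Classify an object by the naming conventions described on this page."""
    if api_name.endswith("__dll"):
        return "DLO"  # raw landing, owned by ingestion
    if api_name.endswith("__dlm"):
        if api_name.startswith("ssot__"):
            return "standard DMO"  # modeled object in the ssot__ namespace
        return "DMO"  # modeled object outside the ssot__ namespace
    return "unknown"


# Illustrative names only.
landing = classify_api_name("Web_Events__dll")
standard = classify_api_name("ssot__Individual__dlm")
modeled = classify_api_name("Loyalty_Tier__dlm")
other = classify_api_name("Account")
```

If a segment or query reads a name that classifies as a DLO, that is a sign the modeling step was skipped rather than done.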

Related

  • Connectors — the catalog of sources a Data Stream can connect to, and what each is for
  • Refresh modes — full refresh vs upsert: how a stream applies its data, and the deletes gotcha
  • Data Lake Objects — the raw landing object a stream creates, and why you map it rather than build on it
  • Data 360 principles — why the model under the stream is the product (principles 1 and 6)
