Connectors: where Data 360 ingests from
The sources a Data Stream ingests from and what each is for — Salesforce CRM, Marketing Cloud, Amazon S3, Google Cloud Storage and Microsoft Azure Storage, the Web/Mobile SDK, the Ingestion API, MuleSoft, and zero-copy over Snowflake, BigQuery, and Databricks. Each one's batch-versus-streaming nature, and why the available set is something you verify in the org, not memorize from a page.
A connector is the thing on the far end of a Data Stream: the source system, plus the mechanics for getting its data into Data 360 (formerly Data Cloud). Pick a connector, point it at a source, and the stream lands that data as a Data Lake Object (DLO; its API name ends in __dll). This page is the map of what you can ingest from and what each source is for. The stream itself (its category, its schedule, and the DLO it produces) is covered on the previous page, and the programmatic path on the Ingestion API page.
The one rule that outranks everything else here: the available connector set evolves, so verify the org's current connector list before you design around any single one. Salesforce adds connectors and changes their capabilities release over release. Everything below describes the common, durable shape of each source category; treat a specific connector as present only after you've seen it in the org's setup, not because a page said so.
The two things every connector decides
Before the individual sources, the two properties that actually shape a design:
- Batch or streaming. Most connectors are batch: on a schedule they run, pull a batch, and land it (a file drop, a CRM sync). A few are streaming: events arrive continuously, in near real time. This is not a quality ranking; it's a freshness decision. Freshness is a feature (principle 6): match the source's cadence to the freshest decision the data feeds, never to "fresher is better." A daily batch behind a real-time activation is a latency bug; streaming a source that a nightly batch would serve is wasted cost (principle 11).
- Copy or zero-copy. Most connectors copy data into Data 360's lake. Zero-copy leaves the data in the source warehouse and queries it in place — no second copy to keep fresh. That difference is significant enough to get its own section below.
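To make the freshness rule concrete, here is a minimal Python sketch of the check a design review might run on each planned stream. Everything in it is hypothetical: `StreamPlan`, `review_cadence`, and the `slack` tolerance are illustrative names, not part of any Data 360 API, and the factor of 4 is an arbitrary assumption you would tune to your own cost tolerance.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class StreamPlan:
    # Hypothetical design-review record; not a Data 360 object.
    name: str
    source_cadence: timedelta      # how often the source lands new data
    decision_freshness: timedelta  # how fresh the consuming decision needs it


def review_cadence(plan: StreamPlan, slack: float = 4.0) -> str:
    """Classify a plan as 'latency bug', 'wasted cost', or 'ok'.

    slack is an assumed tolerance: a source more than `slack` times fresher
    than the decision needs is flagged as paying for unused freshness.
    """
    if plan.source_cadence > plan.decision_freshness:
        # The data arrives less often than the decision acts on it.
        return "latency bug"
    if plan.source_cadence * slack < plan.decision_freshness:
        # Far fresher than any consumer needs: cost without benefit.
        return "wasted cost"
    return "ok"
```

With these rules, a daily CRM sync feeding a five-minute activation comes back as a latency bug, and a 30-second event feed behind a daily report comes back as wasted cost.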
Salesforce-native sources
- Salesforce CRM. The connector for Sales Cloud, Service Cloud, and the rest of the core platform — objects like Account, Contact, and Lead, plus your custom objects. Scheduled batch ingestion; it is the most common first stream on almost any build, because the CRM is usually the system of record for who the customer is.
- Marketing Cloud. Engagement and subscriber data from Marketing Cloud Engagement: the sends, opens, and clicks that a Marketing Cloud practitioner knows from the System Data Views, now ingested as a stream rather than queried in place. Scheduled batch. This is the connector that makes the loop between Data 360 and Marketing Cloud (principle 8) run in both directions: MC engagement flows in, and a resolved segment activates back out.
Cloud file storage
These ingest files (typically CSV) that another system drops into a bucket (a container, in Azure's terms) on a cadence. They are scheduled batch by nature: Data 360 reads the file on its own schedule rather than receiving a push.
- Amazon S3 — read files from an S3 bucket.
- Google Cloud Storage and Microsoft Azure Storage — the equivalents for the other two major clouds.
File storage is the pragmatic catch-all: when a source system has no direct connector but can export a file, a scheduled drop into a bucket is the path. The cost is that you own the export — its schedule, its schema, and its correctness — and a file that silently stops landing looks downstream exactly like a source with no new data.
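The silent-stop failure is cheap to catch if something outside Data 360 watches the bucket. A minimal Python sketch, assuming you can list the timestamps of the files that have landed (for S3, for example, from the `LastModified` field that boto3's `list_objects_v2` returns for each object). `drop_is_overdue` and its `grace` window are hypothetical names for illustration.

```python
from datetime import datetime, timedelta
from typing import Sequence


def drop_is_overdue(
    landed_at: Sequence[datetime],
    expected_every: timedelta,
    now: datetime,
    grace: timedelta = timedelta(hours=1),
) -> bool:
    """Return True when the newest file is older than one cadence plus grace.

    An empty listing counts as overdue: "no files at all" should alert,
    not read as "nothing new yet".
    """
    if not landed_at:
        return True
    newest = max(landed_at)
    return now - newest > expected_every + grace
```

Run on a schedule of its own, a check like this turns "the export quietly stopped" into an alert instead of a downstream mystery.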
Engagement from your own properties: the Web/Mobile SDK
The Web/Mobile SDK captures behavioral events (page views, screen views, interactions) from your website and mobile apps and streams them into Data 360 as engagement data. This is a streaming, near-real-time source: events arrive as they happen, not on a nightly run. It feeds the Engagement stream category (time-series, event-time-stamped) that the data-streams page describes, and it's how you get first-party behavioral signal into the unified profile without waiting for a batch.
Programmatic ingestion: the Ingestion API
When no connector fits the source — a custom application, a system that can push rather than be pulled — the Ingestion API is the programmatic path. It has two patterns: a Streaming one for small near-real-time payloads and a Bulk one for large file uploads. The detail (which pattern, the up-front schema definition, the trade-offs) is its own page — see the Ingestion API. Here it's enough to know it exists as the escape hatch when the source isn't a packaged connector.
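Choosing between the two patterns is largely a payload-size question. The Python sketch below shows how a custom source pushing through the Streaming pattern might split its records into requests that stay under a size cap. Two details are assumptions to confirm on the Ingestion API page, and are flagged in the code: the 200 KB per-request cap and the `{"data": [...]}` body shape. `chunk_for_streaming` is a hypothetical helper; it does no authentication or HTTP.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]

# ASSUMPTION: a 200 KB per-request cap and a {"data": [...]} request body.
# Confirm both against the current Ingestion API documentation.
MAX_PAYLOAD_BYTES = 200 * 1024
ENVELOPE_BYTES = len(json.dumps({"data": []}))  # size of the empty wrapper
SEPARATOR_BYTES = len(", ")  # json.dumps joins list items with ", "


def chunk_for_streaming(
    records: Iterable[Record], max_bytes: int = MAX_PAYLOAD_BYTES
) -> Iterator[List[Record]]:
    """Greedily pack records into request bodies that stay within max_bytes.

    Sizes are exact for json.dumps with default settings, whose output is
    pure ASCII, so string length equals byte length.
    """
    batch: List[Record] = []
    size = ENVELOPE_BYTES
    for record in records:
        record_bytes = len(json.dumps(record))
        if ENVELOPE_BYTES + record_bytes > max_bytes:
            # One record alone overflows a request: a sign the source wants Bulk.
            raise ValueError("record exceeds the per-request cap on its own")
        added = record_bytes + (SEPARATOR_BYTES if batch else 0)
        if size + added > max_bytes:
            yield batch  # current request is full; start the next one
            batch, size = [], ENVELOPE_BYTES
            added = record_bytes
        batch.append(record)
        size += added
    if batch:
        yield batch
```

If a single record already overflows the cap, or a routine load splits into a very large number of requests, that is the signal the source belongs on the Bulk pattern instead.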
Integration at scale: MuleSoft
For sources that need real integration work before they're ingestible — legacy systems, APIs that require orchestration, transformation in flight — MuleSoft is the path Salesforce points to. It sits upstream of the stream: MuleSoft does the integration, and the result lands in Data 360. Reach for it when the gap between the source and a clean ingest is an integration problem, not just a connection.
Zero-copy: query the warehouse without copying it
Zero-copy is the one that breaks the "ingest means copy" assumption. With a zero-copy connection to a data warehouse — Snowflake, Google BigQuery, or Databricks — the data stays in the warehouse and Data 360 queries it where it lives. There is no second physical copy in the lake to keep in sync, which means no copy to fall stale and no duplicated storage.
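An idealized Python model of the difference, under the simplifying assumption that a zero-copy query always sees the warehouse's current state (it ignores any caching or acceleration a real connection might add). `SourceRead`, `staleness`, and `duplicated_gb` are hypothetical names for illustration, not Data 360 APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SourceRead:
    mode: str                             # "copy" or "zero-copy" (illustrative labels)
    source_gb: float                      # size of the data in the source warehouse
    last_sync: Optional[datetime] = None  # when a copy last refreshed; unused for zero-copy


def staleness(read: SourceRead, now: datetime) -> timedelta:
    """How far behind the source the data Data 360 sees can be."""
    if read.mode == "zero-copy":
        return timedelta(0)  # queried in place, so nothing lags behind the source
    if read.last_sync is None:
        raise ValueError("a copy that has never synced has no freshness to report")
    return now - read.last_sync


def duplicated_gb(read: SourceRead) -> float:
    """Storage held a second time in the lake."""
    return 0.0 if read.mode == "zero-copy" else read.source_gb
```

In this model a copy synced six hours ago is six hours stale and doubles its storage footprint; the zero-copy read is neither.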
Verify the org's list — don't design against a name from a page
Because the connector catalog moves with each release, the honest engineering posture is to confirm what the org actually has before you commit a design to it. If you're unsure whether a specific connector exists or supports the source you need, treat the source category generically — "we need scheduled file ingestion," "we need a streaming behavioral feed" — and resolve it to a concrete connector against the org's real setup, not against an assumption. A design that names a connector the org can't enable is a rebuild you discover late.
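The posture above can be sketched as a design-time check in Python: state each need as a set of generic capability tags, then resolve it against a connector list transcribed by hand from the org's actual setup. The tag vocabulary, the connector names in the usage below, and `resolve_connectors` itself are hypothetical; nothing here reads the org automatically.

```python
from typing import Dict, List, Set


def resolve_connectors(
    requirements: Dict[str, Set[str]],
    org_connectors: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each generic requirement to the org connectors that satisfy it.

    A connector satisfies a requirement when its capability tags include
    every tag the requirement asks for. Raises ValueError naming every
    requirement that nothing in the org covers, so the gap surfaces at
    design time rather than mid-build.
    """
    resolved = {
        need: sorted(name for name, caps in org_connectors.items() if tags <= caps)
        for need, tags in requirements.items()
    }
    missing = sorted(need for need, names in resolved.items() if not names)
    if missing:
        raise ValueError(f"no connector in this org covers: {', '.join(missing)}")
    return resolved
```

For example, a need tagged `{"batch", "file"}` resolves to whichever file-storage connector the org lists with those tags; drop the streaming connector from the transcribed list and the check fails loudly instead of letting the design assume it.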
Related
- Data streams — the stream a connector feeds: source to DLO, the category, and the schedule
- The Ingestion API — the programmatic source when no connector fits: Streaming versus Bulk
- Refresh modes — full refresh versus upsert, the correctness decision every connector's stream inherits
- Ingestion gotchas — connector auth and limit failures that look like "no data," and other silent ones
- Data 360 principles — freshness is a feature (6), cost scales with what you process (11)