Apache Airflow XComs should be reserved exclusively for small metadata pointers, such as S3 keys or row IDs, to prevent metadata database bottlenecks. For large data transfers, use a custom XCom backend backed by object storage such as S3 or GCS to keep DAGs performant. For more on best practices, see the Astronomer documentation and the XComs page of the Apache Airflow 3.2.0 documentation.
In Apache Airflow, XComs (short for "cross-communications") let tasks exchange small amounts of data. XComs are a standard feature, but "exclusive" or restricted data sharing requires more advanced configuration, such as custom backends and careful scoping with the TaskFlow API.
Core XCom Mechanics
Definition: XComs consist of a key, value, and timestamp, along with attributes for the specific Task Instance and DAG Run.
Pushing Data: Tasks "push" data by returning a value from an operator's execute() method or by explicitly calling task_instance.xcom_push().
Pulling Data: Tasks retrieve data using xcom_pull(), which can be filtered by task_ids, dag_id, or a specific key.
Advanced "Exclusive" Strategies
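Before moving on, the core push and pull mechanics can be modeled in a few lines of plain Python. This is only a toy sketch: ToyXComStore and its methods are hypothetical names, not Airflow API, and the model ignores details such as mapped tasks. The one detail taken from Airflow itself is that a task's return value is stored under the key "return_value".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

RETURN_KEY = "return_value"  # Airflow's default key for a task's return value


@dataclass
class ToyXComStore:
    """In-memory stand-in for the xcom table (hypothetical, not Airflow API)."""

    rows: dict[tuple[str, str, str, str], Any] = field(default_factory=dict)

    def push(self, dag_id: str, run_id: str, task_id: str, value: Any,
             key: str = RETURN_KEY) -> None:
        # An entry is identified by the DAG, the run, the task, and the key.
        self.rows[(dag_id, run_id, task_id, key)] = value

    def pull(self, dag_id: str, run_id: str, task_id: str,
             key: str = RETURN_KEY) -> Any:
        # A missing entry yields None instead of raising.
        return self.rows.get((dag_id, run_id, task_id, key))


store = ToyXComStore()
# Implicit push: what returning a value from a task amounts to.
store.push("etl", "run_1", "extract", "s3://bucket/raw/run_1.parquet")
# Explicit push under a custom key.
store.push("etl", "run_1", "extract", 1250, key="row_count")

path = store.pull("etl", "run_1", "extract")
count = store.pull("etl", "run_1", "extract", key="row_count")
missing = store.pull("etl", "run_2", "extract")  # different run, no entry
```

Both pushed values are short scalars, which is the kind of payload the rest of this article argues XComs should be limited to.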
To handle data more strictly or exclusively beyond the default local database, Airflow provides several advanced mechanisms:
1. Custom XCom Backends
By default, XComs are stored in the Airflow metadata database (e.g., PostgreSQL, MySQL), which has strict size limits (roughly 1GB for Postgres and 64KB for MySQL). You can create an exclusive storage layer by configuring a Custom XCom Backend:
Cloud Storage: Use the XComObjectStorageBackend to store larger data exclusively in S3 or GCS while only keeping a reference in the metadata DB.
Serialization: Subclass BaseXCom to override serialize_value and deserialize_value, allowing you to implement custom encryption or specialized compression for sensitive data.
2. TaskFlow API for Clean Scoping
In Apache Airflow, XCom (short for "cross-communication") is the mechanism used to exchange data between tasks. However, it comes with significant constraints that make it "exclusive" in terms of how and when it should be used.
Size Exclusive: XCom values should be kept small, ideally under 1 KB. This is a guideline rather than a limit that Airflow enforces; the database limits quoted earlier are far higher.
Here is an overview of XCom exclusivity, limitations, and best practices.
Problem: You adopt the exclusive convention but still store heavy objects in the default metadata database.
Solution: Use a custom XCom backend that serializes large objects to external storage (GCS, S3, Redis) and stores only a URI in the xcom table.
Example:
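from airflow.sdk.bases.xcom import BaseXCom  # Airflow 3 path; Airflow 2 uses airflow.models.xcom

class S3XCom(BaseXCom):
    @staticmethod
    def serialize_value(value, **kwargs):
        # size_of and upload_to_s3 are placeholder helpers that you must supply.
        if size_of(value) > 1_000_000:
            s3_key = upload_to_s3(value)
            # Store only a small reference in the xcom table.
            value = {"__s3_uri": s3_key}
        return BaseXCom.serialize_value(value, **kwargs)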
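The S3XCom example above shows only the write path. As a rough illustration of the full round trip, the sketch below pairs a serializer with the matching deserializer in plain Python. It makes several assumptions: FAKE_BUCKET is an in-memory dictionary standing in for S3, the function names mirror the BaseXCom hooks but do not call Airflow, and values are assumed to be JSON-serializable.

```python
import json
import uuid

THRESHOLD_BYTES = 1_000_000  # same cutoff as the example above
REF_KEY = "__s3_uri"

# Hypothetical stand-in for a bucket: maps object key -> stored bytes.
FAKE_BUCKET: dict[str, bytes] = {}


def serialize_value(value) -> str:
    """Return the JSON string that would be written to the xcom table."""
    payload = json.dumps(value).encode("utf-8")
    if len(payload) > THRESHOLD_BYTES:
        # Too large for the metadata database: park it externally
        # and keep only a reference.
        object_key = f"xcom/{uuid.uuid4()}.json"
        FAKE_BUCKET[object_key] = payload
        return json.dumps({REF_KEY: object_key})
    return payload.decode("utf-8")


def deserialize_value(stored: str):
    """Invert serialize_value, following the reference when there is one."""
    value = json.loads(stored)
    # Caveat: a user value that is exactly {"__s3_uri": ...} would be
    # mistaken for a reference; a real backend needs a safer marker.
    if isinstance(value, dict) and set(value) == {REF_KEY}:
        return json.loads(FAKE_BUCKET[value[REF_KEY]].decode("utf-8"))
    return value


small = {"rows": 3}
large = {"rows": list(range(300_000))}  # a bit over 2 MB as JSON

small_stored = serialize_value(small)
large_stored = serialize_value(large)
```

Downstream code calls deserialize_value in both cases and never needs to know whether the value was stored inline or by reference.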
An XCom Exclusive treats XCom as a read-only, immutable, single-purpose reference, never as a payload carrier. The word "exclusive" implies three constraints:
In practice, an XCom Exclusive looks like this: a task generates a unique resource identifier (e.g., an S3 key, a BigQuery table name, a Spark application ID). It pushes only that string. Downstream tasks then use that string to interact with the external system directly.
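A minimal sketch of that flow, using a temporary directory as a stand-in for the external system, might look like the following. The function names, file layout, and sample rows are illustrative assumptions; in a real DAG the two functions would be tasks, and the returned string would travel through XCom.

```python
import csv
import tempfile
import uuid
from pathlib import Path

# Hypothetical stand-in for an external system such as an S3 bucket.
STAGING_DIR = Path(tempfile.mkdtemp(prefix="xcom_pointer_demo_"))


def extract() -> str:
    """Write the payload to external storage and return only its identifier."""
    resource_id = f"orders_{uuid.uuid4().hex}.csv"
    with open(STAGING_DIR / resource_id, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["order_id", "amount"])
        writer.writerows([[1, 10], [2, 25], [3, 7]])
    return resource_id  # this short string is all that would go through XCom


def total_amount(resource_id: str) -> int:
    """Read the payload straight from external storage using the identifier."""
    with open(STAGING_DIR / resource_id, newline="") as handle:
        return sum(int(row["amount"]) for row in csv.DictReader(handle))


pointer = extract()             # upstream task: pushes the pointer
result = total_amount(pointer)  # downstream task: pulls it and reads the data
```

The pointer stays a few dozen characters long no matter how many rows the file holds, so the metadata database never sees the payload.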
# Task A and Task B run in parallel
task_a >> task_c
task_b >> task_c