Automate Amazon Redshift loads, exports, and transformations inside Apache Airflow DAGs using built-in Redshift operators.
Create an Airflow Connection of type Amazon Redshift (or Postgres) that stores the host, port, database, user, and either a password or IAM-based credentials. Reference the connection with redshift_conn_id in your tasks.
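A minimal sketch of registering such a connection programmatically (a UI- or environment-variable-defined connection works just as well); the conn_id, host, and the IAM-related extras keys are placeholders whose exact names depend on your apache-airflow-providers-amazon version:

```python
from airflow import settings
from airflow.models import Connection

redshift_conn = Connection(
    conn_id="redshift_default",   # referenced as redshift_conn_id in tasks
    conn_type="redshift",         # or "postgres" to use the Postgres hook
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    schema="dev",                 # database name
    login="airflow_user",
    port=5439,
    # Assumed extras for IAM-based authentication; key names vary by provider version.
    extra='{"iam": true, "cluster_identifier": "my-cluster"}',
)

session = settings.Session()
session.add(redshift_conn)
session.commit()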
Use RedshiftSQLOperator (or PostgresOperator) to embed SQL. Airflow will open a Redshift session, execute the statement, and commit or roll back automatically.
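A sketch of embedding SQL this way, assuming a recent Amazon provider where the operator lives in the redshift_sql module (the import path has moved between provider releases) and Airflow 2.4+ for the schedule argument; table and connection names are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

with DAG("redshift_ddl_example", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    create_staging = RedshiftSQLOperator(
        task_id="create_staging_orders",
        redshift_conn_id="redshift_default",
        sql="""
            CREATE TABLE IF NOT EXISTS staging.orders (
                order_id   BIGINT,
                amount     DECIMAL(12, 2),
                order_date DATE
            );
        """,
        autocommit=True,
    )
```

PostgresOperator follows the same pattern with postgres_conn_id instead of redshift_conn_id.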
Yes. Combine S3ToRedshiftTransfer for COPY jobs with PythonOperator for pre-processing files. Schedule incremental loads or full refreshes nightly.
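In current Amazon provider releases the transfer operator is named S3ToRedshiftOperator (S3ToRedshiftTransfer is the legacy Airflow 1.10 name); the sketch below assumes that class plus placeholder bucket, prefix, and table names:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

def preprocess_orders():
    """Placeholder: validate or de-duplicate the raw export before COPY."""
    ...

with DAG("orders_copy_example", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    preprocess = PythonOperator(task_id="preprocess_orders", python_callable=preprocess_orders)

    load_orders = S3ToRedshiftOperator(
        task_id="load_orders",
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
        s3_bucket="my-data-lake",
        s3_key="orders/{{ ds }}/",   # one S3 prefix per run date -> incremental load
        schema="staging",
        table="orders",
        method="APPEND",             # use "REPLACE" for a full refresh
    )

    preprocess >> load_orders
```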
Define tasks (e.g., extract_orders → load_orders → transform_orders). Use >> to chain tasks so Redshift steps fire only after upstream jobs succeed.
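A minimal sketch of that dependency chain; the three tasks are EmptyOperator stand-ins (Airflow 2.3+; DummyOperator in older releases) for the real extract, COPY, and transform steps:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG("orders_pipeline_example", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    extract_orders = EmptyOperator(task_id="extract_orders")
    load_orders = EmptyOperator(task_id="load_orders")
    transform_orders = EmptyOperator(task_id="transform_orders")

    # Redshift steps fire only after their upstream task succeeds.
    extract_orders >> load_orders >> transform_orders
```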
Store secrets in AWS Secrets Manager, enable auto vacuum and ANALYZE, set autocommit=True, and log SQL to S3 for auditing. Parametrize dates with Jinja ({{ ds }}) to keep DAGs reusable.
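A sketch of date parametrization: the sql field is templated, so {{ ds }} renders to the run's logical date (YYYY-MM-DD) and the same task works for any date; the table name is a placeholder:

```python
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

# Assumed to sit inside a `with DAG(...):` block like the earlier examples.
delete_partition = RedshiftSQLOperator(
    task_id="delete_partition",
    redshift_conn_id="redshift_default",
    sql="DELETE FROM staging.orders WHERE order_date = '{{ ds }}';",
    autocommit=True,
)
```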
Leverage Airflow SLA alarms, task-level retries, and Redshift system tables (stl_query, svl_qlog). Push query IDs to XCom for downstream diagnostics.
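One hedged way to capture a query ID for downstream diagnostics: run the statement through RedshiftSQLHook and select pg_last_query_id() from the same session, returning it so Airflow pushes it to XCom. The hook import path varies by provider version, and the statement, retry, and SLA values are placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook

def run_and_capture_query_id():
    hook = RedshiftSQLHook(redshift_conn_id="redshift_default")
    conn = hook.get_conn()
    cur = conn.cursor()
    cur.execute("ANALYZE staging.orders;")       # the statement to monitor
    cur.execute("SELECT pg_last_query_id();")    # query ID from the same session
    query_id = cur.fetchone()[0]
    cur.close()
    conn.commit()
    return query_id  # pushed to XCom; join against stl_query / svl_qlog downstream

with DAG("redshift_monitoring_example", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    analyze_orders = PythonOperator(
        task_id="analyze_orders",
        python_callable=run_and_capture_query_id,
        retries=2,
        retry_delay=timedelta(minutes=5),
        sla=timedelta(minutes=30),   # triggers an Airflow SLA miss if the task runs long
    )
```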
No. Redshift tasks run via a lightweight JDBC/psycopg2 session, so any standard worker can execute them.
Yes. Attach an IAM role to the Airflow instance or Kubernetes service account and reference it in the Redshift connection’s role_arn.
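A hedged sketch of pointing the AWS-side connection at a role ARN; whether the role is assumed from the connection extras or inherited from the instance profile depends on the Amazon provider version, and the ARN and region below are placeholders:

```python
from airflow import settings
from airflow.models import Connection

aws_conn = Connection(
    conn_id="aws_default",
    conn_type="aws",
    # Assumed extras: the AWS hook can assume the role named in `role_arn`;
    # with no static keys stored, base credentials come from the attached
    # instance profile or Kubernetes service-account role.
    extra='{"role_arn": "arn:aws:iam::123456789012:role/airflow-redshift-role", '
          '"region_name": "us-east-1"}',
)

session = settings.Session()
session.add(aws_conn)
session.commit()
```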
Split large files in S3 into equal-sized parts (ideally a multiple of the cluster's slice count), enable COMPUPDATE, and set the COPY option MAXERROR 0 to fail fast on bad rows.
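These options can be passed through copy_options on the transfer operator; the sketch assumes S3ToRedshiftOperator, placeholder bucket and table names, and a CSV export:

```python
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Assumed to sit inside a `with DAG(...):` block. COMPUPDATE ON recomputes column
# encodings during the load; MAXERROR 0 aborts the COPY on the first bad row.
load_orders_tuned = S3ToRedshiftOperator(
    task_id="load_orders_tuned",
    redshift_conn_id="redshift_default",
    aws_conn_id="aws_default",
    s3_bucket="my-data-lake",
    s3_key="orders/{{ ds }}/part-",   # equally sized split files under one prefix
    schema="staging",
    table="orders",
    copy_options=["FORMAT AS CSV", "COMPUPDATE ON", "MAXERROR 0"],
)
```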