Automate Amazon Redshift loads, exports, and transformations inside Apache Airflow DAGs using built-in Redshift operators.
Create an Airflow Connection of type Amazon Redshift (or Postgres) that stores the host, port, database, user, and either a password or IAM-based credentials. Reference the connection with redshift_conn_id in your tasks.
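A minimal sketch of registering such a connection programmatically (a UI- or environment-variable-defined connection works just as well); the conn_id, host, and the IAM-related extras keys are placeholders whose exact names depend on your apache-airflow-providers-amazon version:

```python
from airflow import settings
from airflow.models import Connection

redshift_conn = Connection(
    conn_id="redshift_default",   # referenced as redshift_conn_id in tasks
    conn_type="redshift",         # or "postgres" to use the Postgres hook
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    schema="dev",                 # database name
    login="airflow_user",
    port=5439,
    # Assumed extras for IAM-based authentication; key names vary by provider version.
    extra='{"iam": true, "cluster_identifier": "my-cluster"}',
)

session = settings.Session()
session.add(redshift_conn)
session.commit()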
Use RedshiftSQLOperator (or PostgresOperator) to embed SQL. Airflow will open a Redshift session, execute the statement, and commit or roll back automatically.
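A sketch of embedding SQL this way, assuming a recent Amazon provider where the operator lives in the redshift_sql module (the import path has moved between provider releases) and Airflow 2.4+ for the schedule argument; table and connection names are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

with DAG("redshift_ddl_example", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    create_staging = RedshiftSQLOperator(
        task_id="create_staging_orders",
        redshift_conn_id="redshift_default",
        sql="""
            CREATE TABLE IF NOT EXISTS staging.orders (
                order_id   BIGINT,
                amount     DECIMAL(12, 2),
                order_date DATE
            );
        """,
        autocommit=True,
    )
```

PostgresOperator follows the same pattern with postgres_conn_id instead of redshift_conn_id.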
Yes. Combine S3ToRedshiftTransfer for COPY jobs with PythonOperator for pre-processing files. Schedule incremental loads or full refreshes nightly.
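In current Amazon provider releases the transfer operator is named S3ToRedshiftOperator (S3ToRedshiftTransfer is the legacy Airflow 1.10 name); the sketch below assumes that class plus placeholder bucket, prefix, and table names:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

def preprocess_orders():
    """Placeholder: validate or de-duplicate the raw export before COPY."""
    ...

with DAG("orders_copy_example", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    preprocess = PythonOperator(task_id="preprocess_orders", python_callable=preprocess_orders)

    load_orders = S3ToRedshiftOperator(
        task_id="load_orders",
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
        s3_bucket="my-data-lake",
        s3_key="orders/{{ ds }}/",   # one S3 prefix per run date -> incremental load
        schema="staging",
        table="orders",
        method="APPEND",             # use "REPLACE" for a full refresh
    )

    preprocess >> load_orders
```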
Define tasks (e.g., extract_orders → load_orders → transform_orders). Use >> to chain tasks so Redshift steps fire only after upstream jobs succeed.
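A minimal sketch of that dependency chain; the three tasks are EmptyOperator stand-ins (Airflow 2.3+; DummyOperator in older releases) for the real extract, COPY, and transform steps:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG("orders_pipeline_example", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    extract_orders = EmptyOperator(task_id="extract_orders")
    load_orders = EmptyOperator(task_id="load_orders")
    transform_orders = EmptyOperator(task_id="transform_orders")

    # Redshift steps fire only after their upstream task succeeds.
    extract_orders >> load_orders >> transform_orders
```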
Store secrets in AWS Secrets Manager, enable auto vacuum and ANALYZE, set autocommit=True, and log SQL to S3 for auditing. Parametrize dates with Jinja ({{ ds }}) to keep DAGs reusable.
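A sketch of date parametrization: the sql field is templated, so {{ ds }} renders to the run's logical date (YYYY-MM-DD) and the same task works for any date; the table name is a placeholder:

```python
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

# Assumed to sit inside a `with DAG(...):` block like the earlier examples.
delete_partition = RedshiftSQLOperator(
    task_id="delete_partition",
    redshift_conn_id="redshift_default",
    sql="DELETE FROM staging.orders WHERE order_date = '{{ ds }}';",
    autocommit=True,
)
```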
Leverage Airflow SLA alarms, task-level retries, and Redshift system tables (stl_query, svl_qlog). Push query IDs to XCom for downstream diagnostics.
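One hedged way to capture a query ID for downstream diagnostics: run the statement through RedshiftSQLHook and select pg_last_query_id() from the same session, returning it so Airflow pushes it to XCom. The hook import path varies by provider version, and the statement, retry, and SLA values are placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook

def run_and_capture_query_id():
    hook = RedshiftSQLHook(redshift_conn_id="redshift_default")
    conn = hook.get_conn()
    cur = conn.cursor()
    cur.execute("ANALYZE staging.orders;")       # the statement to monitor
    cur.execute("SELECT pg_last_query_id();")    # query ID from the same session
    query_id = cur.fetchone()[0]
    cur.close()
    conn.commit()
    return query_id  # pushed to XCom; join against stl_query / svl_qlog downstream

with DAG("redshift_monitoring_example", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    analyze_orders = PythonOperator(
        task_id="analyze_orders",
        python_callable=run_and_capture_query_id,
        retries=2,
        retry_delay=timedelta(minutes=5),
        sla=timedelta(minutes=30),   # triggers an Airflow SLA miss if the task runs long
    )
```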
No. Redshift tasks run via a lightweight JDBC/psycopg2 session, so any standard worker can execute them.
Yes. Attach an IAM role to the Airflow instance or Kubernetes service account and reference it in the Redshift connection’s role_arn.
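A hedged sketch of pointing the AWS-side connection at a role ARN; whether the role is assumed from the connection extras or inherited from the instance profile depends on the Amazon provider version, and the ARN and region below are placeholders:

```python
from airflow import settings
from airflow.models import Connection

aws_conn = Connection(
    conn_id="aws_default",
    conn_type="aws",
    # Assumed extras: the AWS hook can assume the role named in `role_arn`;
    # with no static keys stored, base credentials come from the attached
    # instance profile or Kubernetes service-account role.
    extra='{"role_arn": "arn:aws:iam::123456789012:role/airflow-redshift-role", '
          '"region_name": "us-east-1"}',
)

session = settings.Session()
session.add(aws_conn)
session.commit()
```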
Split large files in S3 into equal-sized parts (ideally a multiple of the cluster's slice count), enable COMPUPDATE, and set the COPY option MAXERROR 0 to fail fast on bad rows.
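These options can be passed through copy_options on the transfer operator; the sketch assumes S3ToRedshiftOperator, placeholder bucket and table names, and a CSV export:

```python
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Assumed to sit inside a `with DAG(...):` block. COMPUPDATE ON recomputes column
# encodings during the load; MAXERROR 0 aborts the COPY on the first bad row.
load_orders_tuned = S3ToRedshiftOperator(
    task_id="load_orders_tuned",
    redshift_conn_id="redshift_default",
    aws_conn_id="aws_default",
    s3_bucket="my-data-lake",
    s3_key="orders/{{ ds }}/part-",   # equally sized split files under one prefix
    schema="staging",
    table="orders",
    copy_options=["FORMAT AS CSV", "COMPUPDATE ON", "MAXERROR 0"],
)
```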