Snowflake-Airflow integration lets you schedule and monitor Snowflake SQL jobs through Python DAGs.
Airflow uses the SnowflakeOperator and a snowflake_conn_id stored in the Airflow connections UI. The operator authenticates with username/password, key-pair, or OAuth and then sends SQL to Snowflake.
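For illustration, here is one way to define that connection outside the UI, using Airflow's JSON connection format in an environment variable. This is a minimal sketch, assuming Airflow 2.3 or later; the account, warehouse, database, and role values are placeholders, and the exact extra fields accepted can vary by provider version.

```python
# Hypothetical connection definition; equivalent to filling in the form under
# Admin -> Connections in the Airflow UI. All values below are placeholders.
import json
import os

os.environ["AIRFLOW_CONN_SNOWFLAKE_DEFAULT"] = json.dumps(
    {
        "conn_type": "snowflake",
        "login": "AIRFLOW_USER",             # Snowflake user
        "password": "********",              # or key-pair / OAuth settings instead
        "extra": {
            "account": "xy12345.eu-west-1",  # placeholder account identifier
            "warehouse": "COMPUTE_WH",
            "database": "ANALYTICS",
            "role": "AIRFLOW_ROLE",
        },
    }
)
```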
1) Create a Snowflake user/role. 2) Add a Snowflake connection in Airflow. 3) Install apache-airflow-providers-snowflake. 4) Define a DAG with SnowflakeOperator. 5) Trigger or schedule the DAG.
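A rough sketch of steps 3 through 5, assuming Airflow 2.4+ (older versions use schedule_interval) with the provider installed; the DAG id, schedule, and table are placeholders:

```python
# Minimal DAG sketch: one SnowflakeOperator task, scheduled daily.
# Requires: pip install apache-airflow-providers-snowflake
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_demo",            # placeholder DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    create_customers = SnowflakeOperator(
        task_id="create_customers",
        snowflake_conn_id="snowflake_default",
        sql="CREATE TABLE IF NOT EXISTS Customers (id INT, name STRING);",
    )
```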
Pass the SQL string or file path, choose warehouse, database, schema, and optional parameters. Each task becomes a node in your DAG, just like a task defined from a Python function.
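For example, a task that reads its SQL from a file and overrides the connection defaults could look like the following sketch, placed inside the DAG from the previous example; the file path, warehouse, and parameter names are placeholders:

```python
# Inside the DAG defined above: SQL loaded from a version-controlled file,
# with per-task warehouse/database/schema overrides and bound parameters.
load_orders = SnowflakeOperator(
    task_id="load_orders",
    snowflake_conn_id="snowflake_default",
    sql="sql/load_orders.sql",              # .sql files are rendered as Jinja templates
    warehouse="LOAD_WH",                    # overrides the warehouse set on the connection
    database="ANALYTICS",
    schema="RAW",
    parameters={"load_date": "2024-01-01"}, # available to the SQL as %(load_date)s
)
```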
Use the operator to execute DDL, DML, or COPY statements against tables such as Customers, Orders, Products, and OrderItems. You can chain tasks for staging, transformation, and quality checks.
Store SQL in version-controlled files, enable autocommit=True, set depends_on_past=True for incremental loads, and add on_failure_callback to alert on errors.
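Wired together, those practices might look like this sketch; the callback body, file path, and DAG settings are placeholders:

```python
# Incremental-load DAG with depends_on_past, a failure callback, and SQL in a file.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator


def notify_on_failure(context):
    # Hook up Slack, e-mail, PagerDuty, etc. here; context carries the failed
    # task instance and the exception.
    print(f"Task {context['task_instance'].task_id} failed")


with DAG(
    dag_id="incremental_orders",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "depends_on_past": True,             # each run waits for the previous incremental load
        "on_failure_callback": notify_on_failure,
    },
) as dag:
    load_incremental = SnowflakeOperator(
        task_id="load_incremental",
        sql="sql/load_orders_incremental.sql",  # version-controlled SQL file
        autocommit=True,
    )
```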
Incorrect connection IDs and missing warehouses cause most failures. Always test the connection in the Airflow UI and specify the warehouse in each task or connection.
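Alongside testing in the UI, a quick programmatic check with the provider's SnowflakeHook can confirm what the connection actually resolves to; a sketch using the default connection id:

```python
# Quick connectivity check: CURRENT_WAREHOUSE() returning NULL means no warehouse
# is set on the connection, so each task must specify one explicitly.
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
print(hook.get_first("SELECT CURRENT_WAREHOUSE(), CURRENT_ROLE();"))
```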
Airflow’s Grid (formerly Tree), Gantt, and Log views show real-time status. Inside Snowflake, query QUERY_HISTORY to cross-check execution time and cost.
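One way to pull recent Airflow-issued queries from inside Snowflake, as a sketch; the user name is a placeholder for whatever user the Airflow connection authenticates as:

```python
# Cross-check execution time for recent queries run by the Airflow service user.
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
rows = hook.get_records(
    """
    SELECT query_text, warehouse_name, total_elapsed_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(USER_NAME => 'AIRFLOW_USER'))
    ORDER BY start_time DESC
    LIMIT 20;
    """
)
for query_text, warehouse_name, elapsed_ms in rows:
    print(warehouse_name, elapsed_ms, query_text[:80])
```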
You are not limited to the SnowflakeOperator. You can also use S3ToSnowflakeOperator, SnowflakeSqlApiOperator, or a generic PythonOperator with the Snowflake Python connector for advanced logic.
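As one sketch of the advanced-logic case, a TaskFlow-decorated task can reuse the same Airflow connection through SnowflakeHook (which wraps the Snowflake Python connector) rather than embedding credentials; the table, column, and threshold below are placeholders:

```python
# Python logic around a Snowflake query: branch, loop, or call other services
# based on the result. Must be used inside a DAG like the ones above.
from airflow.decorators import task
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook


@task
def flag_large_orders(threshold: int = 10_000):
    hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
    count = hook.get_first(
        "SELECT COUNT(*) FROM ANALYTICS.Orders WHERE amount > %(threshold)s",
        parameters={"threshold": threshold},
    )[0]
    if count > 0:
        # Logic that is awkward to express in pure SQL tasks goes here.
        print(f"{count} orders above {threshold}")
    return count
```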
Store passwords or key-pair secrets in an Airflow secrets backend (environment variables, AWS Secrets Manager, HashiCorp Vault). Reference them from the Snowflake connection; never embed credentials in DAG code.
You can: define separate tasks with different warehouse values, or create multiple Airflow connections, each pointing to a distinct warehouse to isolate workloads.
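For instance, within one DAG; warehouse names and SQL are placeholders:

```python
# Two tasks on different warehouses to keep cheap checks off the heavy warehouse.
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

light_check = SnowflakeOperator(
    task_id="light_check",
    sql="SELECT COUNT(*) FROM ANALYTICS.Customers;",
    warehouse="XS_WH",            # small warehouse for cheap queries
)

heavy_transform = SnowflakeOperator(
    task_id="heavy_transform",
    sql="sql/rebuild_order_items.sql",
    warehouse="XL_TRANSFORM_WH",  # larger warehouse for the heavy job
)
```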