Airflow integrates with MySQL through connections, hooks, and operators to execute SQL and orchestrate data pipelines.
Airflow automates recurring MySQL queries, extracts, and transformations so you no longer rely on cron jobs or manual execution. DAG scheduling, retry logic, and alerting make pipelines reliable and observable.
Add a connection via the UI or an environment variable. Set the Conn Id to `mysql_ecom`, along with the host, port (3306), user, password, and schema (`ecommerce`). Airflow stores the credentials in its metadata database and injects them into hooks and operators.
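For the environment-variable route, a minimal sketch (the host, user, and password below are placeholders for your own MySQL instance):

```python
import os

# Airflow resolves connections from AIRFLOW_CONN_<CONN_ID> environment variables
# given as URIs. Placeholder credentials for the ecommerce schema.
os.environ["AIRFLOW_CONN_MYSQL_ECOM"] = (
    "mysql://etl_user:etl_password@mysql.example.internal:3306/ecommerce"
)
```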
`MySqlOperator` is the workhorse for running SQL. Provide the task id, connection id, and a SQL string or file path. Templates let you inject runtime variables, making DAGs dynamic.
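For example, a single task might look like this (the `.sql` file path and `params` value are illustrative; the task would live inside a `with DAG(...)` block):

```python
from airflow.providers.mysql.operators.mysql import MySqlOperator

# Runs a .sql file found on the DAG's template search path; the params dict is
# available to Jinja as {{ params.status }} inside that file.
daily_rollup = MySqlOperator(
    task_id="daily_rollup",
    mysql_conn_id="mysql_ecom",
    sql="sql/daily_rollup.sql",
    params={"status": "completed"},
)
```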
Define a DAG file, import `MySqlOperator`, and create a task that pulls yesterday's orders. Push the results to XCom for downstream tasks.
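A sketch of such a DAG, assuming an `orders` table in the `ecommerce` schema, the `mysql_ecom` connection from above, and a recent Airflow 2.x (which accepts the `schedule` argument):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator

# Hypothetical daily pipeline; table and column names are assumptions.
with DAG(
    dag_id="ecom_orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    pull_yesterdays_orders = MySqlOperator(
        task_id="pull_yesterdays_orders",
        mysql_conn_id="mysql_ecom",
        # {{ ds }} renders to the run's logical date, i.e. "yesterday" for a daily schedule.
        sql="SELECT id, customer_id, total FROM orders WHERE order_date = '{{ ds }}'",
        do_xcom_push=True,  # expose the result rows to downstream tasks
    )
```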
Set `do_xcom_push=True` in `MySqlOperator` to store query results. Subsequent tasks access them with `{{ ti.xcom_pull(task_ids='task_name') }}`. Limit large result sets to avoid bloating the metadata database.
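Downstream, the pull might look like this sketch of a TaskFlow task (the summarizing logic and task names are assumptions carried over from the DAG above):

```python
from airflow.decorators import task

@task
def summarize_orders(ti=None):
    # xcom_pull returns whatever the upstream task pushed; for a SQL operator
    # with do_xcom_push=True this is typically a list of result rows.
    rows = ti.xcom_pull(task_ids="pull_yesterdays_orders")
    return {"order_count": len(rows or [])}

# Inside the DAG body: pull_yesterdays_orders >> summarize_orders()
```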
Use parameterized SQL to avoid SQL injection, keep transactions short to minimize lock time, and store large extracts in object storage rather than in XComs. Run DDL with `autocommit` enabled or wrap statements in explicit `BEGIN ... COMMIT` blocks as needed.
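As an example of the first point, values can be bound through `parameters` instead of being formatted into the SQL string (a sketch; the table, column, and literal values are placeholders):

```python
from airflow.providers.mysql.operators.mysql import MySqlOperator

# %s placeholders are filled by the MySQL driver, so values are never
# concatenated into the SQL text.
purge_cancelled = MySqlOperator(
    task_id="purge_cancelled",
    mysql_conn_id="mysql_ecom",
    sql="DELETE FROM orders WHERE status = %s AND order_date < %s",
    parameters=("cancelled", "2024-01-01"),
    autocommit=True,
)
```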
Bad Conn Id: typos cause connection lookup failures (Airflow raises `AirflowNotFoundException` when a connection isn't defined). Verify that the Conn Id matches the `mysql_conn_id` (or `conn_id`) argument in your operator.
Missing client libs: Airflow needs `mysqlclient` or `pymysql`. Install them with `pip install 'apache-airflow[mysql]'`.
No, the metadata database can be Postgres or MySQL. The MySQL integration discussed here targets your source/target business database.
Yes. Use `BashOperator` to call `mysqlimport`, or a Python task that issues `LOAD DATA INFILE`.
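For the Python route, one hedged sketch uses `MySqlHook.bulk_load`, which issues `LOAD DATA LOCAL INFILE` under the hood (the staging table and file path are placeholders, and `LOCAL INFILE` must be enabled on both server and client):

```python
from airflow.decorators import task
from airflow.providers.mysql.hooks.mysql import MySqlHook

@task
def load_orders_csv():
    # The file must be readable by the worker executing this task.
    hook = MySqlHook(mysql_conn_id="mysql_ecom")
    hook.bulk_load(table="orders_staging", tmp_file="/tmp/orders.csv")
```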
Run `pip install "apache-airflow[mysql]"`. This installs the MySQL client driver and the provider package.
Yes. Pass a list of SQL strings to `sql`, or separate statements with semicolons, and set `autocommit=False` to run them in a single transaction.
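A sketch of the list form (table names are illustrative):

```python
from airflow.providers.mysql.operators.mysql import MySqlOperator

# Statements execute in order on the same connection; with autocommit=False
# the hook commits once after the last statement.
rebuild_summary = MySqlOperator(
    task_id="rebuild_summary",
    mysql_conn_id="mysql_ecom",
    sql=[
        "DELETE FROM order_summary WHERE summary_date = '{{ ds }}'",
        "INSERT INTO order_summary "
        "SELECT order_date, COUNT(*), SUM(total) FROM orders "
        "WHERE order_date = '{{ ds }}' GROUP BY order_date",
    ],
    autocommit=False,
)
```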
Absolutely. Surround variables with Jinja delimiters, such as `{{ ds }}`, to insert execution dates at runtime.
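For instance (a minimal sketch; the `inventory` and `inventory_snapshots` tables are assumptions):

```python
from airflow.providers.mysql.operators.mysql import MySqlOperator

# {{ ds }} is a built-in macro rendered to the run's logical date (YYYY-MM-DD).
snapshot_inventory = MySqlOperator(
    task_id="snapshot_inventory",
    mysql_conn_id="mysql_ecom",
    sql=(
        "INSERT INTO inventory_snapshots "
        "SELECT sku, quantity, '{{ ds }}' FROM inventory"
    ),
)
```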