How to Integrate Oracle with Airflow

Galaxy Glossary

How do I integrate Oracle with Airflow?

Oracle integration with Airflow lets you schedule, monitor, and retry Oracle SQL jobs through Apache Airflow’s OracleHook and OracleOperator.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Table of Contents

What does Oracle integration with Airflow do?

It connects Airflow DAGs to Oracle so you can run, retry, and log PL/SQL or plain SQL statements on a schedule.

How do I create an Oracle connection in Airflow?

Use the airflow connections add CLI or the UI. Supply host, port, SID or service_name, user, and password. Name it oracle_default or any string referenced by oracle_conn_id.

What Python code schedules an Oracle query?

Create a DAG, import OracleOperator, and pass the SQL plus optional parameters.Airflow handles retries, logging, XCom pushes, and templating.

How do I pass bind parameters safely?

Use the parameters argument: parameters={"min_total":1000}.This avoids SQL injection and lets Airflow render Jinja templates for dynamic values.

Can I move Oracle data to another system?

Chain an OracleOperator that SELECTs data with a PythonOperator or PostgresOperator to push rows into PostgreSQL, S3, or BigQuery.

Best practices for production DAGs

• Store credentials in a secret back-end.
• Use small, idempotent SQL chunks.
• Add retries and retry_delay in default_args.
• Set catchup=False for one-off loads.

What are common mistakes?

1. Forgetting the Oracle client libraries—Airflow won’t start.
2.Misnaming oracle_conn_id—tasks fail with “connection id not found.”

.

Why How to Integrate Oracle with Airflow is important

How to Integrate Oracle with Airflow Example Usage


high_value_new_customers = OracleOperator(
    task_id='high_value_new_customers',
    oracle_conn_id='oracle_default',
    sql="""
        SELECT C.id, C.name, C.email, O.total_amount
        FROM Customers C
        JOIN Orders O ON O.customer_id = C.id
        WHERE C.created_at >= SYSDATE - 1
          AND O.total_amount > 500
    """
)

How to Integrate Oracle with Airflow Syntax


# 1. Add an Oracle connection
$ airflow connections add 'oracle_default' \
  --conn-type oracle \
  --conn-login scott \
  --conn-password tiger \
  --conn-host db.example.com \
  --conn-port 1521 \
  --conn-schema ORCLPDB1 \
  --conn-extra '{"service_name":"ORCLPDB1"}'

# 2. Define a DAG that queries Orders
from airflow import DAG
from airflow.providers.oracle.operators.oracle import OracleOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'galaxy',
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

with DAG(
    dag_id='daily_high_value_orders',
    start_date=datetime(2023, 1, 1),
    schedule_interval='0 2 * * *',
    catchup=False,
    default_args=default_args
) as dag:
    high_value_orders = OracleOperator(
        task_id='high_value_orders',
        oracle_conn_id='oracle_default',
        sql="""
            SELECT *
            FROM Orders
            WHERE total_amount > :min_total
              AND order_date >= TRUNC(SYSDATE) - 1
        """,
        parameters={'min_total': 1000}
    )

Common Mistakes

Frequently Asked Questions (FAQs)

Do I need cx_Oracle installed on every Airflow worker?

Yes. Each worker executing Oracle tasks must have cx_Oracle and the Oracle client libraries available.

Can I templatize SQL files?

Absolutely. Point sql to a .sql file in the DAG folder and use Jinja templates like {{ ds }} for dynamic dates.

How large of a result set can OracleOperator pull?

Limit results with fetch_size or push to XCom in chunks; otherwise, large pulls can exceed worker memory.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.