ELT is a modern data integration pattern where raw data is first Extracted from source systems, Loaded into a destination datastore, and then Transformed in situ using that platform’s processing engine.
ELT—Extract, Load, Transform—reverses the final two steps of the classic ETL pattern. Instead of transforming data in an intermediary system before loading, ELT lands raw data directly into a scalable analytics store (cloud data warehouse, lakehouse, or object storage) and performs transformations there, leveraging elastic compute and native SQL engines.
Cloud data warehouses such as Snowflake, BigQuery, Redshift, and Databricks have fundamentally changed cost and performance dynamics. It is now often cheaper and faster to copy large volumes of raw data into a centralized platform and run set-based SQL transformations on demand than to operate a separate transformation layer. ELT embraces this shift across three stages: extract, load, and transform.
Extract. Data is pulled—or streamed—from source systems: operational databases, SaaS APIs, logs, IoT devices, and so on. Tools like Fivetran, Airbyte, and Debezium automate snapshot and change data capture (CDC), writing results to cloud storage or staging schemas.
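Under the hood, many of these tools issue incremental queries against a watermark column. The sketch below is illustrative only: the public.users table, its columns, and the saved watermark are assumptions, not output from any specific tool.

-- Hypothetical incremental extract against the source database:
-- pull only rows changed since the watermark saved by the previous run
SELECT id, full_name, email, created, updated, status
FROM public.users
WHERE updated > TIMESTAMP '2024-06-01 00:00:00'  -- last recorded watermark
ORDER BY updated;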
Load. The extracted files or CDC events are loaded almost verbatim into landing zones inside the target warehouse. Typical formats include Parquet, Avro, JSON, or compressed CSV. The goal is lossless ingest: preserve every column, row, and type—even if messy.
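In Snowflake, for example, the load step often amounts to a single COPY INTO from an external stage. The stage path, file format, and landing table below are assumptions for illustration.

-- Hypothetical load of Parquet extracts from a named stage into a raw landing table
COPY INTO raw.public_users
FROM @elt_stage/users/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;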
Transform. Once data is safely stored, SQL (or Spark/Python) jobs perform cleansing, normalization, denormalization, data quality checks, and dimensional modeling. Frameworks such as dbt orchestrate these transformations, compiling model SQL into views or materialized tables.
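In dbt, for instance, each transformation is a version-controlled SQL model. The model path, materialization choice, and source declaration below are assumptions for illustration; the source would need to be declared in the project's YAML.

-- models/staging/stg_users.sql (hypothetical dbt model)
-- dbt compiles this SELECT and materializes it as a view in the staging schema
{{ config(materialized='view') }}

SELECT
    id::NUMBER  AS user_id,
    full_name   AS name,
    email,
    status
FROM {{ source('raw', 'public_users') }}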
Write extracted data into _raw schemas that are append-only. Disable UPDATE/DELETE permissions so the raw history remains an authoritative source.
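In Snowflake, for example, append-only behavior can be approximated with grants. The loader role is hypothetical, and tables created later would need separate FUTURE grants.

-- Allow loading into raw tables but block in-place mutation (hypothetical role)
GRANT INSERT, SELECT ON ALL TABLES IN SCHEMA raw TO ROLE loader;
REVOKE UPDATE, DELETE, TRUNCATE ON ALL TABLES IN SCHEMA raw FROM ROLE loader;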
Organize transformations into clearly named layers—commonly staging, intermediate, and mart. Each layer should be idempotent and fully derived from the one below it. Tools like dbt or Dagster enforce these dependencies.
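A minimal sketch of an intermediate-layer model, reusing names from the worked example later in this article; because the view is fully derived from staging, rerunning it is idempotent.

-- Intermediate layer derived entirely from staging; reruns produce the same result
CREATE OR REPLACE VIEW intermediate.int_active_users AS
SELECT user_id, name, email, created_ts, updated_ts
FROM staging.users
WHERE status = 'active';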
Write transformations in SQL as views or CREATE OR REPLACE TABLE AS SELECT (CTAS). Declarative code is easier to diff, test, and review than imperative procedural scripts.
Automated tests (row counts, uniqueness, null checks, schema contracts) should run after each transformation step. Fail fast to prevent bad data from propagating.
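Two hand-rolled checks illustrate the idea (frameworks like dbt ship equivalents as declarative tests); any rows returned signal a failure. Table and column names follow the example later in this article.

-- Uniqueness check: any rows returned indicate duplicate user_ids
SELECT user_id, COUNT(*) AS occurrences
FROM staging.users
GROUP BY user_id
HAVING COUNT(*) > 1;

-- Not-null check: a non-zero count indicates missing emails
SELECT COUNT(*) AS null_email_count
FROM staging.users
WHERE email IS NULL;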
Track downstream dependencies and warehouse spend. ELT centralizes compute in the warehouse, so poorly written queries can cascade costs. Use query tags, warehouse credit limits, and optimization techniques (clustering, partitioning, materialized views).
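In Snowflake, for instance, query tags and clustering keys cover cost attribution and partition pruning; the tag value and clustering column below are illustrative choices.

-- Tag every query issued by the transformation job so spend is attributable
ALTER SESSION SET QUERY_TAG = 'elt:dim_users';

-- Cluster a large mart table so common filters prune micro-partitions
ALTER TABLE mart.dim_users CLUSTER BY (user_id);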
Assume a SaaS product’s PostgreSQL users table is replicated daily into the Snowflake table raw.public_users. We want a cleaned dimension table, dim_users.
-- Staging: cast types and standardize column names
CREATE OR REPLACE TABLE staging.users AS
SELECT
id::NUMBER AS user_id,
full_name AS name,
email AS email,
to_timestamp_tz(created) AS created_ts,
to_timestamp_tz(updated) AS updated_ts,
status AS status
FROM raw.public_users;
-- Mart: keep only active users and deduplicate to the latest record per user
CREATE OR REPLACE TABLE mart.dim_users AS
SELECT user_id, name, email, created_ts, updated_ts, status
FROM staging.users
WHERE status = 'active'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_ts DESC) = 1;
Both statements run inside Snowflake without moving data outside the platform—embodying the ELT philosophy.
Because ELT transformations are overwhelmingly SQL-centric, a developer-focused SQL editor like Galaxy accelerates the workflow.
A common mistake: copying raw data into a warehouse but never modeling it.
Fix: schedule incremental transformations and enforce SLAs so raw data matures into consumable datasets.
Another common mistake: porting imperative ETL scripts that update rows individually, negating warehouse parallelism.
Fix: rewrite the logic as set-based SQL (e.g., CTAS or MERGE) so the warehouse optimizer can parallelize the work.
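A set-based rewrite typically collapses a row-by-row update loop into a single MERGE. This sketch reuses the staging and mart tables from the example above.

-- Hypothetical set-based upsert: one MERGE instead of per-row UPDATEs
MERGE INTO mart.dim_users AS tgt
USING staging.users AS src
    ON tgt.user_id = src.user_id
WHEN MATCHED THEN UPDATE SET
    name = src.name,
    email = src.email,
    status = src.status,
    updated_ts = src.updated_ts
WHEN NOT MATCHED THEN INSERT (user_id, name, email, created_ts, updated_ts, status)
    VALUES (src.user_id, src.name, src.email, src.created_ts, src.updated_ts, src.status);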
A third mistake: giving ad-hoc users unrestricted warehouse credits, leading to runaway bills.
Fix: tag queries, allocate resource monitors, and educate teams on partition pruning.
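In Snowflake, for example, spend can be capped with a resource monitor; the quota, thresholds, monitor name, and warehouse name here are illustrative.

-- Cap credits for the transformation warehouse (monthly by default) and suspend it at the limit
CREATE RESOURCE MONITOR elt_monthly_cap WITH CREDIT_QUOTA = 100
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE transform_wh SET RESOURCE_MONITOR = elt_monthly_cap;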
In short: land extracted data in _raw schemas with RBAC locking, and name transformation layers with consistent prefixes (stg_, int_, dim_, fct_).

As data volumes soar and cloud warehouses provide near-infinite compute, ELT enables teams to centralize raw data quickly, preserve history, and perform flexible, cost-efficient SQL transformations in situ. This accelerates analytics, improves governance, and aligns with DevOps-style version control and CI/CD for data models.
ELT is not always the right choice. It shines when your destination platform can scale compute cheaply (e.g., Snowflake); for strict latency or on-prem constraints, traditional ETL may still fit.
Transformations implement SCD logic (Type 2 or 3) using SQL MERGE or window functions once data is loaded, preserving full history.
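As an illustration, a Type 2 dimension can be derived with window functions over a change log; the staging.users_changes table here is a hypothetical log of every row version.

-- Hypothetical SCD Type 2 build: derive validity windows from a log of row versions
CREATE OR REPLACE TABLE mart.dim_users_history AS
SELECT
    user_id,
    name,
    email,
    status,
    updated_ts AS valid_from,
    LEAD(updated_ts) OVER (PARTITION BY user_id ORDER BY updated_ts) AS valid_to,
    (LEAD(updated_ts) OVER (PARTITION BY user_id ORDER BY updated_ts) IS NULL) AS is_current
FROM staging.users_changes;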
Popular stacks combine Fivetran/Airbyte (extract), Snowflake/BigQuery/Databricks (load), and dbt or Dagster (transform).
Galaxy’s desktop SQL editor and AI Copilot let you write, test, share, and version the transformation queries that form the “T” in ELT, while Collections help endorse production-ready models.