A Slowly Changing Dimension (SCD) Type 2 is a data-warehousing technique that preserves the full history of attribute changes by inserting a new dimension record for every change instead of overwriting existing values.
A Slowly Changing Dimension Type 2 (SCD 2) is a modeling and ETL pattern used in star or snowflake schemas to track historical changes to dimension attributes—such as a customer’s address—by inserting a new row with a surrogate key and date range each time a change occurs.
The term "slowly changing" refers to dimensions whose attributes change infrequently—think about customer addresses, product categories, or employee job titles. Type 2 is one of several SCD strategies (others include Type 0, 1, 3, 4, and 6). Its distinguishing feature is that it never updates or deletes existing rows; instead, it adds a new row and marks it current with flags or date ranges.
Data products frequently require "as was" analysis: How many orders shipped when the customer lived in New York? How did a sales pipeline look before a re-org? Without Type 2, such questions are impossible once attributes are overwritten. Preserving historical context is vital for:
A system-generated, immutable identifier (e.g., customer_sk
) that uniquely identifies each version of the dimension.
The natural key in the source system (e.g., customer_id
).
Columns—typically is_current
, effective_date
, and end_date
—that define which record is active for a given business key at any point in time.
end_date
to yesterday
and is_current = 'N'
.is_current = 'Y'
, effective_date = today
, end_date = '9999-12-31'
.business_key
and event_date BETWEEN effective_date AND end_date
.-- Step 1: detect changed customers in staging vs dimension
WITH hashed_staging AS (
SELECT s.customer_id,
md5(concat_ws('|', s.first_name, s.last_name, s.address, s.city)) AS attr_hash,
s.*
FROM stage_customer s
), latest_dim AS (
SELECT d.*
FROM dim_customer d
WHERE d.is_current = 'Y'
)
, to_change AS (
SELECT hs.*
FROM hashed_staging hs
JOIN latest_dim ld
ON hs.customer_id = ld.customer_id
WHERE hs.attr_hash <> ld.attr_hash
)
-- Step 2: close out old versions
UPDATE dim_customer d
SET is_current = 'N',
end_date = CURRENT_DATE - INTERVAL '1 day'
FROM to_change c
WHERE d.customer_id = c.customer_id
AND d.is_current = 'Y';
-- Step 3: insert new versions
INSERT INTO dim_customer (
customer_sk,
customer_id,
first_name,
last_name,
address,
city,
attr_hash,
effective_date,
end_date,
is_current
)
SELECT nextval('customer_sk_seq'),
s.customer_id,
s.first_name,
s.last_name,
s.address,
s.city,
s.attr_hash,
CURRENT_DATE,
DATE '9999-12-31',
'Y'
FROM to_change s;
BETWEEN
joins.business_key
and is_current
; optionally partition by is_current
or effective_year
.Although SCD 2 is conceptually a modeling technique, its implementation is SQL-heavy. Galaxy’s modern SQL editor speeds up this work by:
Updating the existing row corrupts the history. Always set is_current = 'N'
and end_date
before inserting the new version.
If effective_date
or end_date
ignore time zones, facts stamped in UTC may mismatch. Normalize to UTC or store both date and timestamp with TZ.
Analysts often forget the date range condition. A correct join uses business_key
and a window predicate (BETWEEN
effective/end dates).
After each load, consider:
SELECT customer_id, COUNT(*) FROM dim_customer GROUP BY 1 HAVING COUNT(*) > 1 AND SUM(is_current='Y') <> 1;
– Ensures exactly one current row per business key.Without Type 2 dimensions, any update to an attribute overwrites history, making it impossible to answer time-variant business questions or comply with audit requirements. Implementing SCD 2 correctly preserves data lineage, supports historical analytics, and ensures regulatory compliance—all foundational needs in modern data engineering and analytics.
Type 1 simply overwrites the dimension record, losing history. Type 2 inserts a new record for every change and preserves all historical versions via date ranges or flags.
Filter on is_current = 'Y'
or end_date = '9999-12-31'
. Indexing that column speeds up lookups.
No. Facts remain the same; only the dimension stores multiple versions. However, ensure you join on both the business key and date range to avoid duplicates.
Yes. Galaxy’s AI copilot can auto-generate MERGE statements, while Collections allow teams to review and endorse the Type 2 ETL scripts—all within a fast, memory-light SQL IDE.