How to Denormalize Data in Redshift

Galaxy Glossary

How do I denormalize tables in Amazon Redshift?

Denormalizing data in Redshift flattens relational tables into wider, query-ready tables or materialized views to speed up analytics.



Why denormalize data in Redshift?

Denormalization reduces joins, cutting cross-node data movement and CPU usage. Fewer joins mean faster dashboards and cheaper clusters.

What is the quickest denormalization method?

CREATE TABLE AS (CTAS) writes the result of a SELECT into a new table, letting you set distribution style and sort keys in one step.

How does CTAS work step by step?

1. Draft a SELECT that joins normalized tables.
2. Choose a DISTKEY matching frequent filters.
3. Add SORTKEYs that support range scans.
4. Run CTAS to create the flattened table.
5. Schedule refresh via orchestration.

Should I use materialized views instead?

Materialized views can keep flattened data fresh with little effort: Redshift refreshes eligible views incrementally, either on demand with REFRESH MATERIALIZED VIEW or automatically when the view is created with AUTO REFRESH YES. Use them when your workload tolerates the refresh lag and the fact that distribution and sort keys cannot be changed after creation.
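A quick sketch of both refresh styles, reusing the view name from the syntax section (the AUTO REFRESH view is illustrative):

```sql
-- On-demand refresh; Redshift applies an incremental refresh
-- when the view's defining query is eligible
REFRESH MATERIALIZED VIEW mv_denormalized_orders;

-- Or let Redshift refresh automatically by creating the view
-- with AUTO REFRESH YES (hypothetical view for illustration)
CREATE MATERIALIZED VIEW mv_orders_auto
AUTO REFRESH YES
AS
SELECT order_id, customer_id, order_date
FROM   denormalized_orders;
```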

When do I refresh denormalized tables?

Refresh after every ETL batch, or on a cron schedule aligned with your SLA. For large tables, rebuild the result into a staging table with CTAS and swap it in with ALTER TABLE APPEND, which moves storage blocks instead of copying rows and avoids vacuum overhead.
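The rebuild-and-swap pattern above might look like this (the staging table name is illustrative):

```sql
-- 1. Rebuild the flattened result into a fresh staging table
CREATE TABLE denormalized_orders_staging
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY  (order_date)
AS
SELECT ...;  -- same join as in the syntax section

-- 2. Empty the live table, then move the staged blocks in;
--    ALTER TABLE APPEND relocates storage rather than copying rows
TRUNCATE denormalized_orders;
ALTER TABLE denormalized_orders APPEND FROM denormalized_orders_staging;

-- 3. The staging table is left empty and can be dropped
DROP TABLE denormalized_orders_staging;
```

Note that both tables must have matching column layouts for ALTER TABLE APPEND to succeed.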

Best practices for Redshift denormalization

• Align DISTKEY with join columns.
• Keep SORTKEYs narrow.
• Vacuum after big inserts.
• Use ENCODE AUTO for compression.
• Tag tables for automated housekeeping.
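As a sketch, the vacuum and compression practices above translate into statements like these (table name taken from the syntax section; ENCODE AUTO is assumed to be available, as in recent Redshift releases):

```sql
-- Reclaim deleted space and restore sort order after large loads
VACUUM FULL denormalized_orders;

-- Update planner statistics so the new table is costed correctly
ANALYZE denormalized_orders;

-- Hand column compression management over to Redshift
ALTER TABLE denormalized_orders ENCODE AUTO;
```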

Common mistakes and fixes

Skipping distribution keys causes random distribution and full scans—always set a DISTKEY. Relying on row-by-row INSERT bloats storage—prefer CTAS or COPY + ALTER TABLE APPEND.
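The COPY + ALTER TABLE APPEND pattern mentioned above might look like this for a bulk load (bucket, IAM role, and staging table names are placeholders):

```sql
-- Load new rows in bulk from S3 into a staging table
COPY orders_staging
FROM 's3://example-bucket/orders/'
IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-load'
FORMAT AS CSV;

-- Move the loaded blocks into the main table in one metadata
-- operation, instead of row-by-row inserts
ALTER TABLE orders APPEND FROM orders_staging;
```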



How to Denormalize Data in Redshift Example Usage


-- Query top customers after denormalization
SELECT customer_id,
       customer_name,
       SUM(line_total) AS lifetime_value
FROM   denormalized_orders
GROUP  BY customer_id, customer_name
ORDER  BY lifetime_value DESC
LIMIT  10;

How to Denormalize Data in Redshift Syntax


-- Denormalize Orders with CTAS
CREATE TABLE denormalized_orders
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY  (order_date)
AS
SELECT  o.id              AS order_id,
        c.id              AS customer_id,
        c.name            AS customer_name,
        c.email           AS customer_email,
        o.order_date,
        o.total_amount,
        p.id              AS product_id,
        p.name            AS product_name,
        p.price,
        oi.quantity,
        (oi.quantity * p.price) AS line_total
FROM    Orders      o
JOIN    Customers   c  ON c.id = o.customer_id
JOIN    OrderItems  oi ON oi.order_id = o.id
JOIN    Products    p  ON p.id = oi.product_id;

-- Materialized view alternative
CREATE MATERIALIZED VIEW mv_denormalized_orders
DISTKEY (customer_id)
SORTKEY  (order_date)
AS
SELECT /* same SELECT as above */;


Frequently Asked Questions (FAQs)

Is CTAS faster than INSERT SELECT?

Generally, yes. CTAS writes data in bulk and determines column compression automatically, so it usually outperforms populating a pre-created table with INSERT ... SELECT, and it dramatically outperforms row-by-row inserts.

Can I change DISTKEY or SORTKEY of a materialized view?

No. You must drop and recreate the view with new keys.
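Since keys are fixed at creation time, changing them means recreating the view, roughly as follows (the new sort key is illustrative):

```sql
DROP MATERIALIZED VIEW mv_denormalized_orders;

CREATE MATERIALIZED VIEW mv_denormalized_orders
DISTKEY (customer_id)
SORTKEY  (order_id)   -- the new sort key
AS
SELECT ...;  -- same defining query as before
```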

How large can a denormalized table be?

Redshift supports up to 16 PB compressed, but aim to keep flattened tables narrow and column-encoded to control storage costs.
