Choose, create, and query star and snowflake schemas efficiently in Amazon Redshift.
Schema choice controls query speed, storage cost, and ETL complexity. Redshift’s columnar engine favors wide tables and fewer joins, making star schemas popular, but snowflake schemas can save space and simplify dimension maintenance.
A star schema keeps one fact table and several denormalized dimension tables.Each dimension stores descriptive columns directly, letting BI tools join with a single key and Redshift scan fewer tables.
A snowflake schema further normalizes dimensions into sub-tables. Shared attributes move to child tables, reducing duplication but re-introducing joins.This can lower storage but may slow queries unless you use materialized views or result caching.
Pick star schemas for interactive dashboards, ad-hoc exploration, or when analysts need low-latency joins on large fact tables like Orders
. Denormalization minimizes joins and maximizes Redshift’s data skipping.
Choose snowflake schemas when dimensions change frequently or share data (e.g., Products
and Categories
).Normalization avoids update anomalies and shrinks storage, especially with many multilingual attributes.
Yes.If you query S3 via Spectrum, a snowflake on Parquet can stay normalized while Redshift’s fact table stays star-like, balancing storage and performance.
1) Sort and distribute the fact table on high-cardinality keys like id
or date ranges.
2) Keep dimension tables <1 GB for fast broadcast joins.
3) Add FOREIGN KEY
constraints as NOT VALID
to aid query planner without enforcement cost.
Create a new denormalized dimension with CREATE TABLE AS
, backfill data with INSERT ...SELECT
, then swap tables using ALTER TABLE ... RENAME
. Re-run analyze and vacuum to refresh statistics.
Use EXPLAIN
to spot unnecessary joins, add materialized views for frequent snowflake joins, and leverage RESULT_CACHE
for repeated queries.
.
No. Use FOREIGN KEY … NOT VALID
for planner hints without runtime checks.
Yes. Many teams keep a star-like core while snowflaking rarely used dimensions to balance performance and storage.