Explains when to model data with a star schema or a snowflake schema in SQL Server and how each affects query speed, storage, and maintenance.
Data-warehouse performance, storage cost, and developer productivity depend on the model. Picking the wrong shape slows reports and complicates ETL.
A star schema keeps dimensions denormalized. One fact table joins directly to wide dimension tables through single-column surrogate keys.
A snowflake schema normalizes dimensions into multiple related tables.The fact table still joins to primary dimension tables, but extra joins reach sub-dimensions.
Build one fact table (e.g., FactSales
) and several wide dimensions (DimDate
, DimCustomer
, DimProduct
).Denormalize attributes into each dimension to avoid runtime joins.
Create the same fact table plus narrow dimensions that reference additional lookup tables, such as DimCustomer
→ DimGeo
→ DimCountry
.
Choose star when query speed, BI tool friendliness, and simple ETL trump disk space.Denormalized dimensions cut join counts and accelerate aggregates.
Choose snowflake when storage is tight, dimension data changes slowly, or the warehouse must mirror a highly normalized source system.
Star typically wins. Less joins let SQL Server pick simpler plans and leverage columnstore segment elimination.Snowflake may edge ahead only when RAM is scarce and cache reuse is critical.
Create integer identity columns on dimensions. Populate them during ETL and reference them from the fact table. This isolates slowly changing dimension logic from natural keys.
The sample query below aggregates revenue by product category. The star version scans one fact and one dimension.The snowflake version scans one fact and two dimensions, adding latency.
• Always load dimensions before facts. • Use surrogate integer keys. • Add cluster and columnstore indexes to facts. • Keep description fields VARCHAR(MAX) out of hot paths. • Document lineage in data catalog.
1. Flatten lookup tables into the parent dimension with ETL. 2. Update surrogate keys. 3. Rebuild indexes and statistics. 4. Retest reports.
.
Almost always. Fewer joins reduce I/O and simplify execution plans. Columnstore indexes amplify the benefit.
Yes, when dimensions hold many textual attributes reused across rows. Normalization eliminates repeated strings, cutting bytes on disk and in cache.
Yes. Use star for high-traffic dimensions and snowflake for rarely queried or very large lookup hierarchies.