A star schema keeps dimensions denormalized while a snowflake schema normalizes them; the right choice affects query speed, storage cost, and maintenance in BigQuery.
Pick a star schema when analytics speed matters more than storage. Star keeps each dimension in a single, wide table, reducing joins. BigQuery’s columnar storage compresses repeated dimension values, so denormalization rarely explodes cost.
Choose a snowflake schema when dimensions change frequently, contain many low-cardinality attributes, or are reused by multiple fact tables. Normalizing dimensions cuts maintenance time and avoids duplicate updates.
Create one fact table (Orders) and one dimension table per business entity—Customers, Products, Date. The Orders fact stores foreign keys to each dimension plus measures such as total_amount.
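A minimal star layout can be sketched in BigQuery DDL. The dataset name `shop` and the column names are illustrative assumptions, not a prescribed model:

```sql
-- Dimension tables: one wide, denormalized table per business entity.
CREATE TABLE shop.dim_customers (
  customer_key INT64,
  customer_name STRING,
  region STRING,
  segment STRING
);

CREATE TABLE shop.dim_products (
  product_key INT64,
  product_name STRING,
  category STRING,       -- denormalized: category lives in the same row
  supplier_name STRING   -- denormalized: supplier details too
);

-- Fact table: foreign keys to each dimension plus measures.
CREATE TABLE shop.fact_orders (
  order_id STRING,
  order_date DATE,
  customer_key INT64,    -- -> dim_customers
  product_key INT64,     -- -> dim_products
  quantity INT64,
  total_amount NUMERIC
);
```

Any report that needs product or customer attributes is then at most one join away from the fact table.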
Break large dimensions into subdimensions. For example, split the Products dimension into Product, Category, and Supplier tables. The Orders fact references Product; Product references Category and Supplier, forming a chain of smaller tables.
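The snowflaked Products chain from the example above might look like this (again with assumed names, using the same hypothetical `shop` dataset):

```sql
-- Snowflake: the Products dimension split into a chain of smaller tables.
CREATE TABLE shop.dim_supplier (
  supplier_key INT64,
  supplier_name STRING
);

CREATE TABLE shop.dim_category (
  category_key INT64,
  category_name STRING
);

CREATE TABLE shop.dim_product (
  product_key INT64,
  product_name STRING,
  category_key INT64,   -- -> dim_category
  supplier_key INT64    -- -> dim_supplier
);
-- fact_orders stores only product_key; Category and Supplier
-- are reached through dim_product.
```

Updating a supplier's name now touches one row in `dim_supplier` instead of every product row that mentions it.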
Star schemas need fewer joins, so BigQuery reads fewer tables and avoids extra shuffle stages, which typically means lower latency. Snowflake schemas add a join (and often a shuffle step) for each normalized hop.
On storage, snowflake wins by not repeating dimension data; on query speed and cost, star wins by needing fewer joins. For most workloads, BigQuery's compressed columnar storage makes the storage gap small.
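The join difference is easy to see in the revenue-by-category query below, written against the hypothetical tables sketched earlier:

```sql
-- Star: one join reaches every product attribute.
SELECT p.category, SUM(o.total_amount) AS revenue
FROM shop.fact_orders AS o
JOIN shop.dim_products AS p ON o.product_key = p.product_key
GROUP BY p.category;

-- Snowflake: the same question needs an extra hop through dim_category.
SELECT c.category_name, SUM(o.total_amount) AS revenue
FROM shop.fact_orders AS o
JOIN shop.dim_product AS p ON o.product_key = p.product_key
JOIN shop.dim_category AS c ON p.category_key = c.category_key
GROUP BY c.category_name;
```

Each extra join is another table scan and potentially another shuffle stage in the query plan.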
1. Partition and cluster the fact table by date and common filters.
2. Use surrogate integer keys for joins; avoid strings.
3. Materialize high-use snowflake join paths with materialized views or scheduled queries that write to a table.
4. Revisit schema choice as data volume and query patterns evolve.
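Steps 1 and 3 can be sketched in BigQuery DDL. The names are assumptions carried over from the earlier examples, and note that BigQuery materialized views only support a restricted subset of queries (joins have limitations), so a scheduled query writing to a table is the fallback:

```sql
-- Step 1: partition the fact table by date, cluster by common filters.
CREATE TABLE shop.fact_orders (
  order_date DATE,
  customer_key INT64,
  product_key INT64,
  quantity INT64,
  total_amount NUMERIC
)
PARTITION BY order_date
CLUSTER BY customer_key, product_key;

-- Step 3: pre-join a high-use snowflake path into one flat dimension.
CREATE MATERIALIZED VIEW shop.product_denorm AS
SELECT
  p.product_key,
  p.product_name,
  c.category_name,
  s.supplier_name
FROM shop.dim_product AS p
JOIN shop.dim_category AS c ON p.category_key = c.category_key
JOIN shop.dim_supplier AS s ON p.supplier_key = s.supplier_key;
```

Queries then join the fact table to `product_denorm` as if it were a star dimension, while updates still flow through the normalized base tables.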
Denormalization does increase raw storage, but BigQuery’s columnar compression keeps repeated dimension values small. For most datasets, the extra cost is negligible compared to the query-performance gain.
You can also mix the two. Many teams keep high-use dimensions denormalized (star) and less-used or frequently updated dimensions normalized (snowflake), gaining speed without sacrificing maintainability.
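A hybrid dimension might look like this sketch (assumed names, same hypothetical `shop` dataset): hot attributes are flattened into the dimension row, while a colder, frequently updated Supplier entity stays normalized.

```sql
-- Hybrid Products dimension:
CREATE TABLE shop.dim_product_hybrid (
  product_key INT64,
  product_name STRING,
  category STRING,     -- denormalized: filtered in most queries
  supplier_key INT64   -- normalized: supplier details change often,
                       -- so they live in shop.dim_supplier
);
```

The split criterion is simple: denormalize what queries touch constantly, normalize what changes often.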