Choose between star and snowflake schemas in ClickHouse to balance query speed, storage, and model flexibility.
A star schema stores each business process in one fact table and keeps every descriptive field for a dimension in a single, wide table. Fewer tables mean fewer joins, which ClickHouse can scan extremely fast because of columnar storage.
A snowflake schema normalizes dimensions into multiple related tables. Repeating attributes such as country
or category
are broken out to save storage and simplify maintenance at the cost of extra joins.
Pick star when query latency is more critical than storage cost, dimensions are small, or your dashboards require sub-second responses. ClickHouse’s JOIN algorithm performs best with one or two joins, matching star layouts.
Choose snowflake when dimensions are large or highly hierarchical, frequent updates to reference data are expected, or multiple fact tables need to share conformed dimensions without duplication.
Create one fact table (e.g., Orders
) and denormalize customer and product attributes into it or into single-level dimension tables. Use MergeTree
engines for large facts and TinyLog
/ReplacingMergeTree
for small dimensions.
Split dimensions into 2+ levels. For example, Products
connects to ProductCategories
; Customers
links to Countries
. Create surrogate keys as UInt32
to keep joins lightweight.
Star queries often need one join: Orders ↔ Customers
. Snowflake queries may require two or more joins, increasing planning time. Mitigate by enabling join_algorithm = 'partial_merge'
and using PROJECTION
s.
Cluster tables by date and foreign key, compress surrogate keys with LowCardinality
, and preload dimension tables in memory using SET prefer_localhost_replica = 1
for faster joins.
No. You enforce referential integrity at load time. Use JOIN
s in queries as needed.
Yes. Keep frequently used dimensions denormalized while normalizing sparse hierarchies. Hybrid models balance speed and maintenance.
Create aggregate projections or materialized views to pre-compute heavy joins, reducing query time regardless of schema style.