How to Design Star vs Snowflake Schemas in ClickHouse

Galaxy Glossary

Should I use a star or snowflake schema in ClickHouse?

Choose between star and snowflake schemas in ClickHouse to balance query speed, storage, and model flexibility.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

What is a star schema in ClickHouse?

A star schema stores each business process in one fact table and keeps every descriptive field for a dimension in a single, wide table. Fewer tables mean fewer joins, which ClickHouse can scan extremely fast because of columnar storage.

What is a snowflake schema?

A snowflake schema normalizes dimensions into multiple related tables. Repeating attributes such as country or category are broken out to save storage and simplify maintenance at the cost of extra joins.

When should I pick a star schema?

Pick star when query latency is more critical than storage cost, dimensions are small, or your dashboards require sub-second responses. ClickHouse’s JOIN algorithm performs best with one or two joins, matching star layouts.

When is a snowflake schema better?

Choose snowflake when dimensions are large or highly hierarchical, frequent updates to reference data are expected, or multiple fact tables need to share conformed dimensions without duplication.

How do I create a star schema in ClickHouse?

Create one fact table (e.g., Orders) and denormalize customer and product attributes into it or into single-level dimension tables. Use MergeTree engines for large facts and TinyLog/ReplacingMergeTree for small dimensions.

How do I create a snowflake schema?

Split dimensions into 2+ levels. For example, Products connects to ProductCategories; Customers links to Countries. Create surrogate keys as UInt32 to keep joins lightweight.

How do queries differ between the two?

Star queries often need one join: Orders ↔ Customers. Snowflake queries may require two or more joins, increasing planning time. Mitigate by enabling join_algorithm = 'partial_merge' and using PROJECTIONs.

Best practices for both schemas

Cluster tables by date and foreign key, compress surrogate keys with LowCardinality, and preload dimension tables in memory using SET prefer_localhost_replica = 1 for faster joins.

Why How to Design Star vs Snowflake Schemas in ClickHouse is important

How to Design Star vs Snowflake Schemas in ClickHouse Example Usage


-- Star: single join
SELECT o.order_date,
       sum(o.total_amount) AS revenue,
       c.country
FROM Orders AS o
INNER JOIN Customers AS c ON o.customer_id = c.id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY c.country;

-- Snowflake: two joins
SELECT o.order_date,
       sum(o.total_amount) AS revenue,
       co.name AS country
FROM Orders AS o
INNER JOIN Customers AS c  ON o.customer_id = c.id
INNER JOIN Countries AS co ON c.country_id = co.id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY co.name;

How to Design Star vs Snowflake Schemas in ClickHouse Syntax


-- Star schema (denormalized)
CREATE TABLE Customers (
    id UInt32,
    name String,
    email String,
    country String,
    created_at DateTime
) ENGINE = TinyLog;

CREATE TABLE Orders (
    id UInt32,
    customer_id UInt32,
    order_date Date,
    total_amount Decimal(12,2),
    customer_name String,
    customer_country String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(order_date)
ORDER BY (order_date, customer_id);

-- Snowflake schema (normalized)
CREATE TABLE Countries (
    id UInt32,
    name String
) ENGINE = TinyLog;

CREATE TABLE Customers (
    id UInt32,
    name String,
    email String,
    country_id UInt32,
    created_at DateTime
) ENGINE = ReplacingMergeTree
ORDER BY id;

CREATE TABLE Orders (
    id UInt32,
    customer_id UInt32,
    order_date Date,
    total_amount Decimal(12,2)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(order_date)
ORDER BY (order_date, customer_id);

Common Mistakes

Normalizing small dimensions in a snowflake schema. ClickHouse gains little storage savings but adds join overhead. Fix by keeping low-cardinality attributes in a star layout.
Using String foreign keys. Strings inflate on-disk size and slow joins. Always cast to numeric surrogate keys like UInt32.