How to Design Star vs Snowflake Schemas in ClickHouse

Galaxy Glossary

Should I use a star or snowflake schema in ClickHouse?

Choose between star and snowflake schemas in ClickHouse to balance query speed, storage, and model flexibility.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Table of Contents

What is a star schema in ClickHouse?

A star schema stores each business process in one fact table and keeps every descriptive field for a dimension in a single, wide table. Fewer tables mean fewer joins, which ClickHouse can scan extremely fast because of columnar storage.

What is a snowflake schema?

A snowflake schema normalizes dimensions into multiple related tables. Repeating attributes such as country or category are broken out to save storage and simplify maintenance at the cost of extra joins.

When should I pick a star schema?

Pick star when query latency is more critical than storage cost, dimensions are small, or your dashboards require sub-second responses. ClickHouse’s JOIN algorithm performs best with one or two joins, matching star layouts.

When is a snowflake schema better?

Choose snowflake when dimensions are large or highly hierarchical, frequent updates to reference data are expected, or multiple fact tables need to share conformed dimensions without duplication.

How do I create a star schema in ClickHouse?

Create one fact table (e.g., Orders) and denormalize customer and product attributes into it or into single-level dimension tables. Use MergeTree engines for large facts and TinyLog/ReplacingMergeTree for small dimensions.

How do I create a snowflake schema?

Split dimensions into 2+ levels. For example, Products connects to ProductCategories; Customers links to Countries. Create surrogate keys as UInt32 to keep joins lightweight.

How do queries differ between the two?

Star queries often need one join: Orders ↔ Customers. Snowflake queries may require two or more joins, increasing planning time. Mitigate by enabling join_algorithm = 'partial_merge' and using PROJECTIONs.

Best practices for both schemas

Cluster tables by date and foreign key, compress surrogate keys with LowCardinality, and preload dimension tables in memory using SET prefer_localhost_replica = 1 for faster joins.

Why How to Design Star vs Snowflake Schemas in ClickHouse is important

How to Design Star vs Snowflake Schemas in ClickHouse Example Usage


-- Star: single join
SELECT o.order_date,
       sum(o.total_amount) AS revenue,
       c.country
FROM Orders AS o
INNER JOIN Customers AS c ON o.customer_id = c.id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY c.country;

-- Snowflake: two joins
SELECT o.order_date,
       sum(o.total_amount) AS revenue,
       co.name AS country
FROM Orders AS o
INNER JOIN Customers AS c  ON o.customer_id = c.id
INNER JOIN Countries AS co ON c.country_id = co.id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY co.name;

How to Design Star vs Snowflake Schemas in ClickHouse Syntax


-- Star schema (denormalized)
CREATE TABLE Customers (
    id UInt32,
    name String,
    email String,
    country String,
    created_at DateTime
) ENGINE = TinyLog;

CREATE TABLE Orders (
    id UInt32,
    customer_id UInt32,
    order_date Date,
    total_amount Decimal(12,2),
    customer_name String,
    customer_country String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(order_date)
ORDER BY (order_date, customer_id);

-- Snowflake schema (normalized)
CREATE TABLE Countries (
    id UInt32,
    name String
) ENGINE = TinyLog;

CREATE TABLE Customers (
    id UInt32,
    name String,
    email String,
    country_id UInt32,
    created_at DateTime
) ENGINE = ReplacingMergeTree
ORDER BY id;

CREATE TABLE Orders (
    id UInt32,
    customer_id UInt32,
    order_date Date,
    total_amount Decimal(12,2)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(order_date)
ORDER BY (order_date, customer_id);

Common Mistakes

Frequently Asked Questions (FAQs)

Does ClickHouse support foreign key constraints?

No. You enforce referential integrity at load time. Use JOINs in queries as needed.

Can I mix star and snowflake schemas?

Yes. Keep frequently used dimensions denormalized while normalizing sparse hierarchies. Hybrid models balance speed and maintenance.

How do materialized views help?

Create aggregate projections or materialized views to pre-compute heavy joins, reducing query time regardless of schema style.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.