Designing a ClickHouse schema means defining tables, columns, data types, and table engines so that data is stored compactly and queried efficiently.
A thoughtful schema maximizes compression, minimizes I/O, and speeds up aggregations. Choosing the right table engine and partitioning strategy up front also lowers hardware costs.
Start by clarifying query patterns, retention periods, and update frequency. Then decide on partition keys for bulk deletes, sorting (ORDER BY) keys for fast range scans, and table engines (MergeTree, ReplicatedMergeTree, CollapsingMergeTree, etc.).
Match each column to the smallest ClickHouse type that fits the data: LowCardinality(String) for short enumerations, Decimal(18, 2) for money, and DateTime64 for sub-second timestamps.
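A minimal sketch of those type choices (the events table and its columns are hypothetical, not from the original):

    CREATE TABLE events
    (
        event_id   UInt64,                  -- smallest integer type that fits the id range
        status     LowCardinality(String),  -- short enumeration such as 'new' or 'paid'
        amount     Decimal(18, 2),          -- exact money arithmetic, no float drift
        created_at DateTime64(3)            -- millisecond-precision timestamp
    )
    ENGINE = MergeTree
    ORDER BY (event_id, created_at);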
Denormalize whenever the cost of joins would outweigh the storage savings. ClickHouse favors wide tables with pre-aggregated metrics. Keep high-cardinality dimensions in separate dictionaries for lookups.
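A sketch of the dictionary approach, assuming a hypothetical products source table with product_id and name columns (all names here are illustrative):

    CREATE DICTIONARY product_names
    (
        product_id UInt64,
        name       String
    )
    PRIMARY KEY product_id
    SOURCE(CLICKHOUSE(TABLE 'products'))  -- refreshed from the source table
    LAYOUT(HASHED())                      -- in-memory hash table for point lookups
    LIFETIME(MIN 300 MAX 600);            -- reload every 5-10 minutes

    -- Resolve the dimension at query time instead of joining:
    SELECT dictGet('product_names', 'name', product_id) AS product_name
    FROM order_items;  -- any fact table carrying product_id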
Partition by a coarse time unit (e.g., month) to speed up deletes; order by (customer_id, order_date) to enable fast scans of a customer's recent orders. The ORDER BY key determines the on-disk sort and feeds the sparse primary index, so range queries skip data outside the key range.
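Putting those two choices together (table and column names are illustrative):

    CREATE TABLE orders
    (
        customer_id UInt64,
        order_id    UInt64,
        order_date  DateTime,
        total       Decimal(18, 2)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(order_date)    -- coarse monthly partitions for cheap bulk deletes
    ORDER BY (customer_id, order_date);  -- on-disk sort for fast per-customer range scans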
MergeTree is the default choice. Use ReplicatedMergeTree in clusters, CollapsingMergeTree to handle soft deletes, or SummingMergeTree to auto-sum numeric columns during background merges.
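For example, a SummingMergeTree rollup table might look like the following sketch (daily_revenue and its columns are hypothetical):

    CREATE TABLE daily_revenue
    (
        day     Date,
        region  LowCardinality(String),
        revenue Decimal(18, 2)
    )
    ENGINE = SummingMergeTree  -- sums numeric columns for rows sharing the same key
    ORDER BY (day, region);

    -- Merges are asynchronous, so still aggregate at query time for exact totals:
    SELECT day, region, sum(revenue) FROM daily_revenue GROUP BY day, region;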
Schemas can evolve after creation. ALTER TABLE ... ADD COLUMN is a cheap, non-blocking metadata change. To change a column's data type, create a new table with the target schema, backfill it with INSERT ... SELECT, then swap the names.
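A sketch of both operations, reusing the hypothetical orders table from above (EXCHANGE TABLES assumes the database uses the Atomic engine; plain RENAME TABLE works otherwise):

    -- Non-blocking column addition:
    ALTER TABLE orders ADD COLUMN discount Decimal(18, 2) DEFAULT 0;

    -- Type change: new table with the target schema, backfill, then swap.
    CREATE TABLE orders_v2
    (
        customer_id UInt64,
        order_id    UInt64,
        order_date  DateTime64(3),  -- upgraded from DateTime
        total       Decimal(18, 2),
        discount    Decimal(18, 2)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(order_date)
    ORDER BY (customer_id, order_date);

    INSERT INTO orders_v2 SELECT * FROM orders;
    EXCHANGE TABLES orders AND orders_v2;  -- atomic name swap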
For an e-commerce model, create fact tables (Orders, OrderItems) with MergeTree, partitioned by month and ordered by their primary keys. Create dimension tables (Customers, Products) as TinyLog or MergeTree with LowCardinality columns.
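One possible shape for a fact table and a dimension table (all column names are illustrative):

    CREATE TABLE order_items
    (
        order_id   UInt64,
        product_id UInt64,
        order_date DateTime,
        quantity   UInt32,
        unit_price Decimal(18, 2)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(order_date)
    ORDER BY (order_id, product_id);

    CREATE TABLE products
    (
        product_id UInt64,
        name       String,
        category   LowCardinality(String)  -- few distinct values, compresses well
    )
    ENGINE = MergeTree
    ORDER BY product_id;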
Full normalization is not required, and ClickHouse rewards denormalization for read-heavy workloads. Keep small, slowly changing dimensions normalized; denormalize high-traffic facts.
For deletes, either use CollapsingMergeTree with a sign column or drop whole partitions by month. Avoid row-by-row deletes.
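Both options in sketch form (the partition value and table names are illustrative):

    -- Dropping a monthly partition is a cheap metadata operation:
    ALTER TABLE orders DROP PARTITION 202401;

    -- CollapsingMergeTree: cancel a row by inserting its mirror with sign = -1.
    CREATE TABLE user_state
    (
        user_id UInt64,
        balance Decimal(18, 2),
        sign    Int8
    )
    ENGINE = CollapsingMergeTree(sign)
    ORDER BY user_id;

    INSERT INTO user_state VALUES (42, 100.00, 1);   -- state row
    INSERT INTO user_state VALUES (42, 100.00, -1);  -- cancels the row during merges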
ClickHouse does not enforce foreign keys; referential integrity is handled at the application layer. Populate dimension keys before inserting rows into fact tables.
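For instance, using the hypothetical tables sketched above, the application would write the dimension row first:

    -- No FK enforcement, so the application orders the writes itself:
    INSERT INTO products VALUES (7, 'Keyboard', 'electronics');
    INSERT INTO order_items VALUES (1001, 7, now(), 2, 49.90);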