Designing a ClickHouse schema means defining tables, columns, data types, and table engines so that data is stored compactly and queried efficiently.
A thoughtful schema maximizes compression, minimizes I/O, and speeds up aggregations. Choosing the right table engine and partitioning strategy up front also lowers hardware costs.
Start by clarifying query patterns, retention periods, and update frequency. Then decide on partition keys for bulk deletes, sorting (ORDER BY) keys for fast range scans, and table engines (MergeTree, ReplicatedMergeTree, CollapsingMergeTree, etc.).
Match each column to the smallest ClickHouse type that fits the data: LowCardinality(String) for short enumerations, Decimal(18, 2) for money, and DateTime64 for sub-second timestamps.
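A minimal sketch of those type choices (the events table and its columns are hypothetical, not from the original):

    CREATE TABLE events
    (
        event_id   UInt64,                  -- smallest integer type that fits the id range
        status     LowCardinality(String),  -- short enumeration such as 'new' or 'paid'
        amount     Decimal(18, 2),          -- exact money arithmetic, no float drift
        created_at DateTime64(3)            -- millisecond-precision timestamp
    )
    ENGINE = MergeTree
    ORDER BY (event_id, created_at);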
Denormalize whenever the cost of joins would outweigh the storage savings. ClickHouse favors wide tables with pre-aggregated metrics. Keep high-cardinality dimensions in separate dictionaries for lookups.
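A sketch of the dictionary approach, assuming a hypothetical products source table with product_id and name columns (all names here are illustrative):

    CREATE DICTIONARY product_names
    (
        product_id UInt64,
        name       String
    )
    PRIMARY KEY product_id
    SOURCE(CLICKHOUSE(TABLE 'products'))  -- refreshed from the source table
    LAYOUT(HASHED())                      -- in-memory hash table for point lookups
    LIFETIME(MIN 300 MAX 600);            -- reload every 5-10 minutes

    -- Resolve the dimension at query time instead of joining:
    SELECT dictGet('product_names', 'name', product_id) AS product_name
    FROM order_items;  -- any fact table carrying product_id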
Partition by a coarse time unit (e.g., month) to speed up deletes; order by (customer_id, order_date) to enable fast scans of a customer's recent orders. The ORDER BY key determines the on-disk sort and feeds the sparse primary index, so range queries skip data outside the key range.
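Putting those two choices together (table and column names are illustrative):

    CREATE TABLE orders
    (
        customer_id UInt64,
        order_id    UInt64,
        order_date  DateTime,
        total       Decimal(18, 2)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(order_date)    -- coarse monthly partitions for cheap bulk deletes
    ORDER BY (customer_id, order_date);  -- on-disk sort for fast per-customer range scans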
MergeTree is the default choice. Use ReplicatedMergeTree in clusters, CollapsingMergeTree to handle soft deletes, or SummingMergeTree to auto-sum numeric columns during background merges.
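For example, a SummingMergeTree rollup table might look like the following sketch (daily_revenue and its columns are hypothetical):

    CREATE TABLE daily_revenue
    (
        day     Date,
        region  LowCardinality(String),
        revenue Decimal(18, 2)
    )
    ENGINE = SummingMergeTree  -- sums numeric columns for rows sharing the same key
    ORDER BY (day, region);

    -- Merges are asynchronous, so still aggregate at query time for exact totals:
    SELECT day, region, sum(revenue) FROM daily_revenue GROUP BY day, region;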
Schemas can evolve after creation. ALTER TABLE ... ADD COLUMN is a cheap, non-blocking metadata change. To change a column's data type, create a new table with the target schema, backfill it with INSERT ... SELECT, then swap the names.
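A sketch of both operations, reusing the hypothetical orders table from above (EXCHANGE TABLES assumes the database uses the Atomic engine; plain RENAME TABLE works otherwise):

    -- Non-blocking column addition:
    ALTER TABLE orders ADD COLUMN discount Decimal(18, 2) DEFAULT 0;

    -- Type change: new table with the target schema, backfill, then swap.
    CREATE TABLE orders_v2
    (
        customer_id UInt64,
        order_id    UInt64,
        order_date  DateTime64(3),  -- upgraded from DateTime
        total       Decimal(18, 2),
        discount    Decimal(18, 2)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(order_date)
    ORDER BY (customer_id, order_date);

    INSERT INTO orders_v2 SELECT * FROM orders;
    EXCHANGE TABLES orders AND orders_v2;  -- atomic name swap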
For an e-commerce model, create fact tables (Orders, OrderItems) with MergeTree, partitioned by month and ordered by their primary keys. Create dimension tables (Customers, Products) as TinyLog or MergeTree with LowCardinality columns.
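One possible shape for a fact table and a dimension table (all column names are illustrative):

    CREATE TABLE order_items
    (
        order_id   UInt64,
        product_id UInt64,
        order_date DateTime,
        quantity   UInt32,
        unit_price Decimal(18, 2)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(order_date)
    ORDER BY (order_id, product_id);

    CREATE TABLE products
    (
        product_id UInt64,
        name       String,
        category   LowCardinality(String)  -- few distinct values, compresses well
    )
    ENGINE = MergeTree
    ORDER BY product_id;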
Full normalization is not required, and ClickHouse rewards denormalization for read-heavy workloads. Keep small, slowly changing dimensions normalized; denormalize high-traffic facts.
For deletes, either use CollapsingMergeTree with a sign column or drop whole partitions by month. Avoid row-by-row deletes.
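Both options in sketch form (the partition value and table names are illustrative):

    -- Dropping a monthly partition is a cheap metadata operation:
    ALTER TABLE orders DROP PARTITION 202401;

    -- CollapsingMergeTree: cancel a row by inserting its mirror with sign = -1.
    CREATE TABLE user_state
    (
        user_id UInt64,
        balance Decimal(18, 2),
        sign    Int8
    )
    ENGINE = CollapsingMergeTree(sign)
    ORDER BY user_id;

    INSERT INTO user_state VALUES (42, 100.00, 1);   -- state row
    INSERT INTO user_state VALUES (42, 100.00, -1);  -- cancels the row during merges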
ClickHouse does not enforce foreign keys; referential integrity is handled at the application layer. Populate dimension keys before inserting rows into fact tables.
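For instance, using the hypothetical tables sketched above, the application would write the dimension row first:

    -- No FK enforcement, so the application orders the writes itself:
    INSERT INTO products VALUES (7, 'Keyboard', 'electronics');
    INSERT INTO order_items VALUES (1001, 7, now(), 2, 49.90);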