Break large, denormalized tables into smaller, related tables to eliminate redundancy and speed up analytical queries.
Normalization converts a wide, duplicate-heavy table into multiple related tables, reducing storage, preventing update anomalies, and improving query efficiency.
Normalize when the original table repeats customer, product, or order details. Splitting these into Customers, Orders, Products, and OrderItems tables minimizes redundancy and speeds up targeted lookups.
Create one MergeTree table per entity with a stable primary key. Use LowCardinality for lookup columns and materialized views for fast aggregates.
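A minimal sketch of such entity tables, using assumed column names (customer_id, order_id, product_id, and so on) rather than the original schema; a Products table follows the same pattern and is sketched further below:

    -- Customers: one row per customer, keyed by a stable surrogate id
    CREATE TABLE Customers
    (
        customer_id UInt64,
        name        String,
        country     LowCardinality(String)  -- repeated categorical text compresses well
    )
    ENGINE = MergeTree
    ORDER BY customer_id;

    -- Orders: one row per order, linked to Customers by customer_id
    CREATE TABLE Orders
    (
        order_id    UInt64,
        customer_id UInt64,
        order_date  Date32,
        status      LowCardinality(String)
    )
    ENGINE = MergeTree
    ORDER BY order_id;

    -- OrderItems: one row per line item, linked to Orders and Products
    CREATE TABLE OrderItems
    (
        order_id   UInt64,
        product_id UInt64,
        quantity   UInt32,
        price      Decimal(18, 2)
    )
    ENGINE = MergeTree
    ORDER BY (order_id, product_id);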
Use CREATE TABLE to define each entity, then INSERT INTO normalized_table SELECT ... FROM raw_table to backfill. Optional TTL clauses keep history lean.
The example query below moves data from a raw denormalized OrdersRaw table into Orders and OrderItems, linking them by order_id.
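A sketch of that backfill, assuming OrdersRaw carries one row per line item with columns such as order_id, customer_id, order_date, status, product_id, quantity, and price (the column names are assumptions):

    -- Collapse the line-item grain to one row per order
    INSERT INTO Orders (order_id, customer_id, order_date, status)
    SELECT
        order_id,
        any(customer_id),
        any(order_date),
        any(status)
    FROM OrdersRaw
    GROUP BY order_id;

    -- Keep one row per line item, linked back to Orders by order_id
    INSERT INTO OrderItems (order_id, product_id, quantity, price)
    SELECT order_id, product_id, quantity, price
    FROM OrdersRaw;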
Pick UInt64 surrogate keys, store dates as Date32, and declare LowCardinality(String) for categorical columns. Use ReplacingMergeTree where duplicate rows need to be collapsed.
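As an illustration, a Products table combining those choices (the version column updated_at is an assumption):

    CREATE TABLE Products
    (
        product_id   UInt64,                  -- surrogate key
        name         String,
        category     LowCardinality(String),  -- categorical text
        first_listed Date32,
        updated_at   DateTime                 -- version column: latest row wins
    )
    ENGINE = ReplacingMergeTree(updated_at)   -- collapses duplicates per sorting key at merge time
    ORDER BY product_id;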
Duplicate primary keys: MergeTree's ORDER BY key is a sort key, not a uniqueness constraint, so repeated inserts leave multiple rows per id. Always set ORDER BY (id) and rely on ReplacingMergeTree or query-time FINAL when exactly one row per key is required.
Using String instead of LowCardinality: plain String inflates disk usage; switch to LowCardinality for repeated text values.
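Two checks and fixes along those lines, using the tables assumed above:

    -- Find keys that appear more than once
    SELECT order_id, count() AS copies
    FROM Orders
    GROUP BY order_id
    HAVING copies > 1;

    -- With ReplacingMergeTree, force deduplication at query time if merges lag behind
    SELECT * FROM Products FINAL WHERE product_id = 42;

    -- Convert an existing String column to LowCardinality in place
    ALTER TABLE Orders MODIFY COLUMN status LowCardinality(String);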
No—proper joins on primary keys are fast, and column pruning means fewer bytes scanned.
Yes, create materialized views that aggregate normalized tables into wide reporting tables without touching source data.
Use INSERT SELECT for batch loads or a Kafka engine + materialized view for streaming sync.
ClickHouse joins on primary keys are well optimized, read only the referenced columns, and are often faster than scanning a wide denormalized table.
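A sketch of such a join over the tables assumed above; only the referenced columns are read, so the scan stays narrow:

    -- Revenue per customer country across three normalized tables
    SELECT
        c.country,
        sum(oi.quantity * oi.price) AS revenue
    FROM OrderItems AS oi
    INNER JOIN Orders AS o ON o.order_id = oi.order_id
    INNER JOIN Customers AS c ON c.customer_id = o.customer_id
    WHERE o.order_date >= toDate32('2024-01-01')
    GROUP BY c.country
    ORDER BY revenue DESC;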
Set up a Kafka or RabbitMQ pipeline feeding a raw table and use materialized views to insert into normalized tables in real time.
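A minimal sketch of the Kafka path, assuming a JSONEachRow topic named orders_raw and a broker at kafka:9092 (both placeholders):

    -- Kafka engine table: a streaming source, not durable storage
    CREATE TABLE OrdersQueue
    (
        order_id    UInt64,
        customer_id UInt64,
        order_date  Date32,
        status      String,
        product_id  UInt64,
        quantity    UInt32,
        price       Decimal(18, 2)
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'orders_raw',
             kafka_group_name = 'normalize_orders',
             kafka_format = 'JSONEachRow';

    -- Materialized view: each consumed batch is written into the normalized table
    CREATE MATERIALIZED VIEW OrderItemsConsumer TO OrderItems AS
    SELECT order_id, product_id, quantity, price
    FROM OrdersQueue;

A second materialized view targeting Orders completes the split; since the stream repeats order_id per line item, that table would typically use ReplacingMergeTree or be deduplicated at query time.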
Create a materialized view or SELECT with joins; ClickHouse handles it without duplicating data.
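For example, a rollup kept current by a materialized view over OrderItems (table and column names continue the assumptions above):

    -- Pre-aggregated reporting table: running totals per product
    CREATE TABLE ProductTotals
    (
        product_id UInt64,
        units      UInt64,
        revenue    Decimal(38, 2)
    )
    ENGINE = SummingMergeTree
    ORDER BY product_id;

    -- Keeps ProductTotals in sync with every insert into OrderItems
    CREATE MATERIALIZED VIEW ProductTotalsMV TO ProductTotals AS
    SELECT
        product_id,
        sum(quantity)         AS units,
        sum(quantity * price) AS revenue
    FROM OrderItems
    GROUP BY product_id;

    -- Query with GROUP BY: SummingMergeTree collapses rows only at merge time
    SELECT product_id, sum(units) AS units, sum(revenue) AS revenue
    FROM ProductTotals
    GROUP BY product_id;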