The `OPTIMIZE TABLE` statement merges data parts, optionally removes duplicate rows, and speeds up ClickHouse queries.
## OPTIMIZE TABLE
**Why does merging parts improve performance?**

Merging many small parts into larger ones reduces the number of files ClickHouse has to scan. Fewer files mean faster reads, lower CPU usage, and more predictable latency during peak traffic.
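A minimal example of triggering a merge manually; the table name `events` is a placeholder, not from the original text:

```sql
-- Ask ClickHouse to merge the active parts of the table.
-- Without FINAL this is a hint: the server may merge only some parts,
-- or skip the merge if it sees nothing worth doing.
OPTIMIZE TABLE events;
```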
**Is `FINAL` required?**

Use the `FINAL` keyword when you need fresh aggregation results right after heavy inserts or updates. It forces a complete merge, so aggregating engines (e.g., SummingMergeTree) return up-to-date, fully collapsed values.
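A sketch, again with the placeholder `events` table:

```sql
-- Force a full merge even if the data already sits in a single part.
OPTIMIZE TABLE events FINAL;
```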
**How do I remove duplicate rows?**

Add `DEDUPLICATE`, optionally with a `BY` column list. ClickHouse removes duplicate rows in the optimized partition, eliminating overlap from distributed ingestion pipelines.
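Both forms, with placeholder table and column names:

```sql
-- Remove rows that are identical in every column.
OPTIMIZE TABLE events DEDUPLICATE;

-- Remove rows that match on a chosen subset of columns.
OPTIMIZE TABLE events DEDUPLICATE BY event_id, event_time;
```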
**Which settings tune merge behavior?**

Increase `max_bytes_to_merge_at_max_space_in_pool` to let background merges build bigger parts. `parts_to_throw_insert` is the number of active parts per partition at which inserts start failing with a "Too many parts" error; raise it only in emergencies, to keep accepting inserts while merges catch up with the backlog.
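Both are MergeTree-level settings and can be changed per table; the values below are illustrative, not recommendations:

```sql
-- Allow background merges to build parts up to ~300 GiB.
ALTER TABLE events MODIFY SETTING max_bytes_to_merge_at_max_space_in_pool = 322122547200;

-- Emergency relief: tolerate more active parts before inserts throw.
ALTER TABLE events MODIFY SETTING parts_to_throw_insert = 600;
```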
**Can I optimize a single partition?**

Yes. Provide the partition value in the `PARTITION` clause to limit merging and deduplication to that slice, saving resources and keeping the operation's footprint small.
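For a table partitioned by `toYYYYMM(...)`, the partition value is the month (202405 here is a placeholder):

```sql
-- Merge and deduplicate only the May 2024 partition.
OPTIMIZE TABLE events PARTITION 202405 FINAL DEDUPLICATE;
```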
**What does this look like in practice?**

Partition `Orders` by `toYYYYMM(order_date)` and order by `(customer_id, order_date)`. Schedule nightly `OPTIMIZE TABLE Orders PARTITION …` jobs so morning dashboards run against compact parts, as sketched below.
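A minimal version of that schema; the column set is illustrative, and the partition value in the nightly job would come from your scheduler:

```sql
CREATE TABLE Orders
(
    order_id    UInt64,
    customer_id UInt64,
    order_date  Date,
    amount      Decimal(18, 2)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(order_date)
ORDER BY (customer_id, order_date);

-- Nightly job: compact the current month's partition (202405 is a placeholder).
OPTIMIZE TABLE Orders PARTITION 202405;
```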
**How do I monitor part counts?**

Query `system.parts` and alert when the number of active parts in any partition exceeds 300. Trigger an immediate `OPTIMIZE` if the count spikes.
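One way to phrase that check:

```sql
-- Active part count per partition; alert when part_count exceeds 300.
SELECT database, table, partition, count() AS part_count
FROM system.parts
WHERE active
GROUP BY database, table, partition
HAVING part_count > 300
ORDER BY part_count DESC;
```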
**In what order should I optimize related tables?**

Run `OPTIMIZE` first on raw tables, then on the target tables of aggregated materialized views, to keep rollups fast.
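A sketch assuming a raw `events` table feeding a materialized view whose target table is `events_daily` (both names are placeholders):

```sql
-- 1. Compact the raw data first.
OPTIMIZE TABLE events;

-- 2. Then compact the rollup so its aggregating engine
--    collapses rows into final aggregates.
OPTIMIZE TABLE events_daily FINAL;
```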
**Does OPTIMIZE work on replicated tables?**

Yes. On ReplicatedMergeTree tables the merge is written to the replication log, so every replica converges on the same set of parts.
**Can I cancel a long-running OPTIMIZE?**

`OPTIMIZE` is not a mutation, so `KILL MUTATION` cannot stop it; cancel the statement with `KILL QUERY`, or pause the underlying merges with `SYSTEM STOP MERGES`. Verify that parts are consistent after cancellation.
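Both approaches, with `events` as a placeholder table:

```sql
-- Cancel the OPTIMIZE statement itself (visible in system.processes).
KILL QUERY WHERE query LIKE 'OPTIMIZE TABLE%';

-- Or pause background merges for one table while you investigate,
-- then resume them.
SYSTEM STOP MERGES events;
SYSTEM START MERGES events;
```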
**Is OPTIMIZE the same as VACUUM in other databases?**

No. VACUUM (e.g., in PostgreSQL) reclaims space held by dead rows, while OPTIMIZE merges storage parts and optionally deduplicates rows.