Query tuning in ClickHouse adjusts clauses and session settings like PREWHERE, SAMPLE, and max_threads to reduce scanned rows and latency.
Large table scans, missing PREWHERE clauses, or unoptimized settings cause most slowdowns. Identifying them early saves compute and cost.
Use system.query_log to find heavy queries: check read_rows and read_bytes, and focus on the worst offenders first.
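A query like the following surfaces the heaviest recent statements. This is a sketch against the standard query_log schema; adjust the time window to your retention, and note that the query_kind column only exists in newer ClickHouse versions.

```sql
-- Ten heaviest SELECTs of the last day, ranked by bytes read.
SELECT
    query,
    read_rows,
    formatReadableSize(read_bytes) AS bytes_read,
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
  AND query_kind = 'Select'
ORDER BY read_bytes DESC
LIMIT 10
```

Filtering on type = 'QueryFinish' avoids double-counting, since each query is also logged at start.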
Add EXPLAIN AST or EXPLAIN PIPELINE in front of your SELECT to see parsed filters, projected columns, and pipeline stages.
Look for steps that read unnecessary columns or ignore indexes. Target these areas for tuning.
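For example, prefixing a query with EXPLAIN PIPELINE prints the stages it will run; the table and columns here are illustrative.

```sql
-- Show the execution pipeline without running the query.
EXPLAIN PIPELINE
SELECT id, name
FROM Customers
WHERE created_at > now() - INTERVAL 7 DAY
```

If the output shows full-column reads where you expected an index or PREWHERE filter, that step is a tuning candidate.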
Raise max_threads to fully use CPU cores. Example: SET max_threads = 16;
Increase memory budget only when needed. Example: SET max_memory_usage = 16e9;
Allow larger in-memory GROUP BY before spilling. Example: SET max_bytes_before_external_group_by = 4e9;
Place the most selective condition in PREWHERE to cut rows early. ClickHouse reads the PREWHERE columns first, filters, and then reads the remaining columns only for the rows that match.
SELECT id, name
FROM Customers
PREWHERE created_at > now() - INTERVAL 7 DAY
WHERE email ILIKE '%@gmail.com'
SAMPLE reads a deterministic pseudo-random subset of the data. Use it to estimate result sizes or debug heavy joins without scanning everything.
SELECT count() FROM Orders SAMPLE 0.1
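Note that SAMPLE only works on MergeTree tables declared with a sampling key, and that key must appear in the primary key. A minimal sketch of such a definition (table and column names are illustrative):

```sql
CREATE TABLE Orders
(
    id UInt64,
    customer_id UInt64,
    order_date DateTime
)
ENGINE = MergeTree
ORDER BY (order_date, cityHash64(id))
SAMPLE BY cityHash64(id)
```

Hashing the id spreads rows evenly across the sample space, so a 10% sample behaves like a fair random subset.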
Always join small-to-large: ClickHouse loads the right-hand table of a hash join into memory, so put the smaller table on the right. Set join_algorithm to 'hash', or to 'partial_merge' when the right table does not fit in memory.
SELECT o.id, c.name
FROM Orders o
ANY LEFT JOIN Customers c ON c.id = o.customer_id
SETTINGS join_algorithm = 'hash'
SET max_threads = 16,
max_memory_usage = 8e9;
SELECT
c.name,
sum(oi.quantity * p.price) AS lifetime_value
FROM Orders o
ANY LEFT JOIN Customers c ON c.id = o.customer_id
ANY LEFT JOIN OrderItems oi ON oi.order_id = o.id
ANY LEFT JOIN Products p ON p.id = oi.product_id
PREWHERE o.order_date > now() - INTERVAL 90 DAY
GROUP BY c.name
ORDER BY lifetime_value DESC
LIMIT 100
SETTINGS join_algorithm = 'hash',
allow_experimental_parallel_reading_from_replicas = 1;
In short: find hot spots with system.query_log and EXPLAIN, cut rows early with PREWHERE, and use SAMPLE for quick tests.
Can I apply these settings per session? Yes. Issue session-level SET commands (e.g., SET max_threads = 16) inside your client. They expire when the connection closes.
Should I raise max_memory_usage by default? Only if your queries spill to disk. Profile first with EXPLAIN and system.query_log; raising max_memory_usage before analyzing wastes resources.
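One way to check whether a query actually needs more memory is to read its peak usage from system.query_log; this sketch uses the standard memory_usage column.

```sql
-- Peak memory of recent finished queries, highest first.
SELECT
    query,
    formatReadableSize(memory_usage) AS peak_memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY memory_usage DESC
LIMIT 10
```

Queries whose peak sits well below the current max_memory_usage gain nothing from a higher budget.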
Does SAMPLE give exact results? No. SAMPLE returns approximate results. Use it for debugging or estimation, then run the full query once satisfied with the plan.