How to Tune Queries in ClickHouse

Galaxy Glossary

How can I speed up slow ClickHouse queries without altering table schema?

Query tuning in ClickHouse adjusts clauses and session settings like PREWHERE, SAMPLE, and max_threads to reduce scanned rows and latency.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Why do ClickHouse queries slow down?

Large table scans, missing PREWHERE clauses, or unoptimized settings cause most slowdowns. Identifying them early saves compute and cost.

Use system.query_log to find heavy queries, check read_rows and read_bytes, and focus on the worst offenders first.

How do I find the slow parts of a query?

Add EXPLAIN AST or EXPLAIN PIPELINE in front of your SELECT to see parsed filters, projected columns, and pipeline stages.

Look for steps that read unnecessary columns or ignore indexes. Target these areas for tuning.

Which settings give the biggest speed gains?

max_threads

Raise max_threads to fully use CPU cores. Example: SET max_threads = 16;

max_memory_usage

Increase memory budget only when needed. Example: SET max_memory_usage = 16e9;

max_bytes_before_external_group_by

Allow larger in-memory GROUP BY before spilling. Example: SET max_bytes_before_external_group_by = 4e9;

When should I use PREWHERE?

Place the most selective condition in PREWHERE to cut early rows. ClickHouse reads only matching parts of each data block.

SELECT id, name
FROM Customers
PREWHERE created_at > now() - INTERVAL 7 DAY
WHERE email ILIKE '%@gmail.com'

Does SAMPLE really speed things up?

SAMPLE reads a pseudo-random subset of parts. Use it to estimate result size or debug heavy joins without scanning all data.

SELECT count() FROM Orders SAMPLE 10%

How do I optimize joins?

Always join small-to-large. Move small tables to the left. Enable JOIN algorithm hash or partial_merge.

SELECT o.id, c.name
FROM Customers c
ANY LEFT JOIN Orders o ON o.customer_id = c.id
SETTINGS join_algorithm = 'hash'

Complete tuning example

SET max_threads = 16,
max_memory_usage = 8e9;

SELECT
c.name,
sum(oi.quantity * p.price) AS lifetime_value
FROM Customers c
ANY LEFT JOIN Orders o ON o.customer_id = c.id
ANY LEFT JOIN OrderItems oi ON oi.order_id = o.id
ANY LEFT JOIN Products p ON p.id = oi.product_id
PREWHERE o.order_date > now() - INTERVAL 90 DAY
GROUP BY c.name
ORDER BY lifetime_value DESC
LIMIT 100
SETTINGS join_algorithm = 'hash',
allow_experimental_parallel_reading_from_replica = 1;

Best practices checklist

  • Filter early with PREWHERE
  • Select only needed columns
  • Use SAMPLE for quick tests
  • Tune session settings instead of cluster-wide defaults
  • Profile with system.query_log and EXPLAIN

Why How to Tune Queries in ClickHouse is important

How to Tune Queries in ClickHouse Example Usage


SET max_threads = 16;
SET max_memory_usage = 8e9;

SELECT
    p.name,
    sum(oi.quantity) AS units_sold
FROM OrderItems oi
INNER JOIN Products p ON p.id = oi.product_id
PREWHERE oi.order_id IN (
    SELECT id FROM Orders SAMPLE 50%
)
GROUP BY p.name
ORDER BY units_sold DESC
LIMIT 20;

How to Tune Queries in ClickHouse Syntax


SELECT [DISTINCT] column_list
FROM table_expression
    [SAMPLE fraction [OFFSET fraction]]
    [PREWHERE condition]
    [WHERE condition]
    [GROUP BY expr_list]
    [ORDER BY expr_list]
    [LIMIT n [OFFSET k]]
    [SETTINGS key=value, ...];

-- Ecommerce join example
SELECT c.name, o.total_amount
FROM Customers AS c
ANY LEFT JOIN Orders AS o ON o.customer_id = c.id
PREWHERE o.order_date > today() - 30
SETTINGS max_threads = 16, join_algorithm = 'hash';

Common Mistakes

Frequently Asked Questions (FAQs)

Can I tune queries without changing server config?

Yes. Issue session-level SET commands (e.g., SET max_threads = 16) inside your client. They expire when the connection closes.

Does adding more RAM always help?

Only if your queries spill to disk. First profile with EXPLAIN and system.query_log. Raising max_memory_usage before analyzing wastes resources.

Is SAMPLE safe for production dashboards?

No. SAMPLE returns approximate results. Use it for debugging or estimations, then run the full query once satisfied with the plan.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.