How to GROUP BY in ClickHouse

Galaxy Glossary

How do I use GROUP BY in ClickHouse to aggregate data?

GROUP BY groups rows sharing the same column values so aggregate functions return one row per group.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Table of Contents

What does GROUP BY do in ClickHouse?

GROUP BY clusters rows whose selected columns hold identical values, letting aggregate functions like count(), sum(), or avg() return one summarized row per cluster. This reduces raw event data into actionable metrics.

How do I write a basic GROUP BY?

Place GROUP BY after FROM and optional WHERE clauses. List the columns to group on, then add aggregate expressions in the SELECT list. ClickHouse requires every non-aggregated column in SELECT to appear in GROUP BY.

Can I GROUP BY multiple columns?

Yes. Separate columns with commas. ClickHouse forms a composite key, creating a distinct group for each unique value combination. This is handy for daily revenue per customer or product.

How do aggregate functions interact with GROUP BY?

Aggregate functions, such as sum(total_amount) or countDistinct(customer_id), run once per group. ClickHouse optimizes many aggregates with combinators like sumIf() to speed filtered calculations.

How to filter aggregated results with HAVING?

Use HAVING after GROUP BY to keep or discard groups based on aggregate output. Example: HAVING sum(total_amount) > 500 filters high-value customers without rerunning the aggregation.

Which performance tips matter for GROUP BY in ClickHouse?

Use ORDER BY on the same columns as GROUP BY when creating tables. Prefer AggregatingMergeTree engines for heavy aggregations. Add low-cardinality columns to optimize memory. Avoid unnecessary SELECT *.

What are best practices for readable queries?

Alias aggregates clearly (AS total_sales). Keep GROUP BY columns on separate lines. Document why each aggregate matters. Commit queries to version control or a tool like Galaxy Collections for team reuse.

Why How to GROUP BY in ClickHouse is important

How to GROUP BY in ClickHouse Example Usage


-- Total spent per customer in 2023
SELECT
    c.id AS customer_id,
    c.name,
    sum(o.total_amount) AS total_spent
FROM Customers AS c
INNER JOIN Orders AS o ON o.customer_id = c.id
WHERE toYear(o.order_date) = 2023
GROUP BY c.id, c.name
HAVING total_spent > 500
ORDER BY total_spent DESC;

How to GROUP BY in ClickHouse Syntax


SELECT
    column1,
    column2,
    aggregate_function(expr) AS alias
FROM table_name
[WHERE condition]
GROUP BY column1, column2
[WITH TOTALS] -- returns an extra row with grand totals
[HAVING aggregate_condition]
[ORDER BY column1]

Common Mistakes

Frequently Asked Questions (FAQs)

Do I need HAVING if I already used WHERE?

Yes. WHERE filters rows before aggregation, while HAVING filters groups after aggregation. Use both for precise control.

Can I GROUP BY expressions?

Absolutely. You can GROUP BY toDate(order_date) or even GROUP BY hash(customer_id) when suitable.

What does WITH TOTALS do?

WITH TOTALS appends an extra row containing grand totals across all groups, useful for quick sanity checks.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.