How to GROUP BY in ClickHouse

Galaxy Glossary

How do I use GROUP BY in ClickHouse to aggregate data?

GROUP BY groups rows sharing the same column values so aggregate functions return one row per group.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

What does GROUP BY do in ClickHouse?

GROUP BY clusters rows whose selected columns hold identical values, letting aggregate functions like count(), sum(), or avg() return one summarized row per cluster. This reduces raw event data into actionable metrics.

How do I write a basic GROUP BY?

Place GROUP BY after FROM and optional WHERE clauses. List the columns to group on, then add aggregate expressions in the SELECT list. ClickHouse requires every non-aggregated column in SELECT to appear in GROUP BY.

Can I GROUP BY multiple columns?

Yes. Separate columns with commas. ClickHouse forms a composite key, creating a distinct group for each unique value combination. This is handy for daily revenue per customer or product.

How do aggregate functions interact with GROUP BY?

Aggregate functions, such as sum(total_amount) or countDistinct(customer_id), run once per group. ClickHouse optimizes many aggregates with combinators like sumIf() to speed filtered calculations.

How to filter aggregated results with HAVING?

Use HAVING after GROUP BY to keep or discard groups based on aggregate output. Example: HAVING sum(total_amount) > 500 filters high-value customers without rerunning the aggregation.

Which performance tips matter for GROUP BY in ClickHouse?

Use ORDER BY on the same columns as GROUP BY when creating tables. Prefer AggregatingMergeTree engines for heavy aggregations. Add low-cardinality columns to optimize memory. Avoid unnecessary SELECT *.

What are best practices for readable queries?

Alias aggregates clearly (AS total_sales). Keep GROUP BY columns on separate lines. Document why each aggregate matters. Commit queries to version control or a tool like Galaxy Collections for team reuse.

Why How to GROUP BY in ClickHouse is important

How to GROUP BY in ClickHouse Example Usage


-- Total spent per customer in 2023
SELECT
    c.id AS customer_id,
    c.name,
    sum(o.total_amount) AS total_spent
FROM Customers AS c
INNER JOIN Orders AS o ON o.customer_id = c.id
WHERE toYear(o.order_date) = 2023
GROUP BY c.id, c.name
HAVING total_spent > 500
ORDER BY total_spent DESC;

How to GROUP BY in ClickHouse Syntax


SELECT
    column1,
    column2,
    aggregate_function(expr) AS alias
FROM table_name
[WHERE condition]
GROUP BY column1, column2
[WITH TOTALS] -- returns an extra row with grand totals
[HAVING aggregate_condition]
[ORDER BY column1]

Common Mistakes

Including columns in SELECT that are neither aggregated nor listed in GROUP BY. ClickHouse throws an error. Fix by adding the column to GROUP BY or wrapping it in an aggregate.
Grouping by high-cardinality columns like UUIDs, causing huge memory use. Fix by pre-aggregating on lower-cardinality keys or using approximate functions like uniqCombined().