GROUP BY groups rows sharing the same column values so aggregate functions return one row per group.
GROUP BY clusters rows whose selected columns hold identical values, letting aggregate functions like count(), sum(), or avg() return one summarized row per cluster. This reduces raw event data into actionable metrics.
Place GROUP BY after FROM and optional WHERE clauses. List the columns to group on, then add aggregate expressions in the SELECT list. ClickHouse requires every non-aggregated column in SELECT to appear in GROUP BY.
Yes. Separate columns with commas. ClickHouse forms a composite key, creating a distinct group for each unique value combination. This is handy for daily revenue per customer or product.
Aggregate functions, such as sum(total_amount) or countDistinct(customer_id), run once per group. ClickHouse optimizes many aggregates with combinators like sumIf() to speed filtered calculations.
Use HAVING after GROUP BY to keep or discard groups based on aggregate output. Example: HAVING sum(total_amount) > 500 filters high-value customers without rerunning the aggregation.
Use ORDER BY on the same columns as GROUP BY when creating tables. Prefer AggregatingMergeTree engines for heavy aggregations. Add low-cardinality columns to optimize memory. Avoid unnecessary SELECT *.
Alias aggregates clearly (AS total_sales). Keep GROUP BY columns on separate lines. Document why each aggregate matters. Commit queries to version control or a tool like Galaxy Collections for team reuse.
Yes. WHERE filters rows before aggregation, while HAVING filters groups after aggregation. Use both for precise control.
Absolutely. You can GROUP BY toDate(order_date) or even GROUP BY hash(customer_id) when suitable.
WITH TOTALS appends an extra row containing grand totals across all groups, useful for quick sanity checks.