How to GROUP BY in Redshift

How do I use GROUP BY in Amazon Redshift to aggregate data?

GROUP BY clusters rows sharing column values, enabling Redshift to compute aggregates per group.

Description

What does GROUP BY do in Redshift?

GROUP BY partitions result sets by one or more columns so aggregate functions—COUNT, SUM, AVG, MIN, MAX—run per partition instead of across the entire table.

When should I use GROUP BY?

Use it to calculate metrics per customer, product, day, or any categorical dimension. Common cases include daily revenue, customer order counts, and inventory valuation.

How do I write basic GROUP BY syntax?

Select non-aggregated columns in the GROUP BY clause, then apply aggregates in the SELECT list.Each non-aggregated column must appear in GROUP BY or be functionally dependent on it.

Example: revenue per customer

This query totals order amounts per customer from the Orders table.

SELECT customer_id, SUM(total_amount) AS customer_revenue FROM Orders GROUP BY customer_id;

Can I GROUP BY multiple columns?

Yes. Add comma-separated columns to create finer partitions, such as revenue per customer per month.

How does GROUP BY interact with ORDER BY?

ORDER BY runs after GROUP BY.To sort by aggregated columns, reference their alias or ordinal position.

Tip

ORDER BY 2 sorts by the second column in the SELECT list—handy when sorting by an aggregate alias.

How do HAVING and GROUP BY work together?

HAVING filters groups after aggregation. Use it for conditions on aggregated values like revenue > 1000.

Performance best practices

1. Prefix predicate filters in a WHERE clause to shrink scanned data.
2. Choose distribution and sort keys aligned with GROUP BY columns to minimize shuffling.
3.Avoid SELECT *; project only needed columns.

Common mistakes and fixes

Missing column in GROUP BY

Redshift errors when a non-aggregated column isn’t in GROUP BY. Aggregate it or add it to the clause.

Using HAVING instead of WHERE

HAVING runs post-aggregation, so predicates like order_date >= CURRENT_DATE - 30 belong in WHERE for speed.

Advanced: GROUPING SETS

Redshift supports GROUPING SETS for roll-ups. Example: total revenue per product and overall total in one scan.

Why How to GROUP BY in Redshift is important