GROUP BY clusters rows sharing column values, enabling Redshift to compute aggregates per group.
GROUP BY partitions result sets by one or more columns so aggregate functions—COUNT, SUM, AVG, MIN, MAX—run per partition instead of across the entire table.
Use it to calculate metrics per customer, product, day, or any categorical dimension. Common cases include daily revenue, customer order counts, and inventory valuation.
Select non-aggregated columns in the GROUP BY clause, then apply aggregates in the SELECT list.Each non-aggregated column must appear in GROUP BY or be functionally dependent on it.
This query totals order amounts per customer from the Orders table.
SELECT customer_id,
SUM(total_amount) AS customer_revenue
FROM Orders
GROUP BY customer_id;
Yes. Add comma-separated columns to create finer partitions, such as revenue per customer per month.
ORDER BY runs after GROUP BY.To sort by aggregated columns, reference their alias or ordinal position.
ORDER BY 2 sorts by the second column in the SELECT list—handy when sorting by an aggregate alias.
HAVING filters groups after aggregation. Use it for conditions on aggregated values like revenue > 1000.
1. Prefix predicate filters in a WHERE clause to shrink scanned data.
2. Choose distribution and sort keys aligned with GROUP BY columns to minimize shuffling.
3.Avoid SELECT *; project only needed columns.
Redshift errors when a non-aggregated column isn’t in GROUP BY. Aggregate it or add it to the clause.
HAVING runs post-aggregation, so predicates like order_date >= CURRENT_DATE - 30 belong in WHERE for speed.
Redshift supports GROUPING SETS for roll-ups. Example: total revenue per product and overall total in one scan.
.
No. Redshift requires original column names in GROUP BY. Use aliases only in SELECT and ORDER BY.
Yes. Use GROUPING SETS, which can implement ROLLUP and CUBE logic for multi-level aggregations.
Use COUNT(DISTINCT column) or SUM(DISTINCT column) inside the SELECT list alongside GROUP BY.