How to GROUP BY in Redshift

Galaxy Glossary

How do I use GROUP BY in Amazon Redshift to aggregate data?

GROUP BY collects rows that share the same values in one or more columns into groups, so Redshift can compute an aggregate for each group.


Description

What does GROUP BY do in Redshift?

GROUP BY divides a result set into groups by one or more columns, so aggregate functions (COUNT, SUM, AVG, MIN, MAX) return one value per group instead of one value for the entire table.
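A minimal sketch of the difference, assuming the Orders table used throughout this article. Without GROUP BY, the aggregate collapses the whole table into a single row. With it, you get one row per distinct customer_id.

```sql
-- Without GROUP BY: one row for the entire table
SELECT COUNT(*) AS total_orders
FROM   Orders;

-- With GROUP BY: one row per distinct customer_id
SELECT customer_id,
       COUNT(*) AS order_count
FROM   Orders
GROUP  BY customer_id;
```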

When should I use GROUP BY?

Use it to calculate metrics per customer, product, day, or any categorical dimension. Common cases include daily revenue, customer order counts, and inventory valuation.

How do I write basic GROUP BY syntax?

List the non-aggregated columns in the GROUP BY clause, then apply aggregate functions to the remaining columns in the SELECT list. Every non-aggregated column in the SELECT list must also appear in GROUP BY.

Example: revenue per customer

This query totals order amounts per customer from the Orders table.

SELECT customer_id,
SUM(total_amount) AS customer_revenue
FROM Orders
GROUP BY customer_id;

Can I GROUP BY multiple columns?

Yes. Add comma-separated columns to create finer partitions, such as revenue per customer per month.
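A sketch of revenue per customer per month, assuming order_date is a DATE or TIMESTAMP column in Orders. Each distinct (customer_id, month) pair becomes one output row.

```sql
-- One output row per (customer_id, order_month) pair
SELECT customer_id,
       DATE_TRUNC('month', order_date) AS order_month,
       COUNT(*)                        AS order_count,
       SUM(total_amount)               AS monthly_revenue
FROM   Orders
GROUP  BY customer_id, DATE_TRUNC('month', order_date);
```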

How does GROUP BY interact with ORDER BY?

ORDER BY is evaluated after GROUP BY, so it can sort by aggregated values. Reference the aggregate's alias, its ordinal position in the SELECT list, or repeat the aggregate expression.

Tip

ORDER BY 2 sorts by the second column in the SELECT list, a compact alternative to naming an aggregate or its alias.
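The two queries below, written against the same Orders table as the revenue example, should return the same rows in the same order: one sorts by the aggregate's alias, the other by its ordinal position.

```sql
-- Sort by the aggregate's alias...
SELECT customer_id,
       SUM(total_amount) AS customer_revenue
FROM   Orders
GROUP  BY customer_id
ORDER  BY customer_revenue DESC;

-- ...or, equivalently, by its position in the SELECT list
SELECT customer_id,
       SUM(total_amount) AS customer_revenue
FROM   Orders
GROUP  BY customer_id
ORDER  BY 2 DESC;
```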

How do HAVING and GROUP BY work together?

HAVING filters groups after aggregation. Use it for conditions on aggregated values, such as SUM(total_amount) > 1000.
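A minimal sketch, assuming the Orders table from the earlier example. HAVING repeats the aggregate expression rather than the alias, which is the most portable form.

```sql
-- Keep only customers whose total revenue exceeds 1000
SELECT customer_id,
       SUM(total_amount) AS customer_revenue
FROM   Orders
GROUP  BY customer_id
HAVING SUM(total_amount) > 1000;
```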

Performance best practices

1. Filter rows in the WHERE clause so less data is scanned and grouped.
2. Choose distribution and sort keys aligned with GROUP BY columns to minimize data redistribution between nodes.
3. Avoid SELECT *; because Redshift stores data by column, projecting only the needed columns reduces I/O.
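As an illustration of point 2, the DDL below is a hypothetical definition of the Orders table; the column types are assumptions, not a schema from this article. Distributing on customer_id keeps each customer's rows on one slice, so a GROUP BY customer_id can aggregate locally, and sorting on order_date lets date-range filters in WHERE skip blocks.

```sql
-- Hypothetical DDL: column types are illustrative assumptions
CREATE TABLE Orders (
    id            BIGINT,
    customer_id   BIGINT,
    order_date    DATE,
    total_amount  DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- rows with the same customer_id land on the same slice
SORTKEY (order_date);   -- range filters on order_date can skip blocks
```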

Common mistakes and fixes

Missing column in GROUP BY

Redshift errors when a non-aggregated column isn’t in GROUP BY. Aggregate it or add it to the clause.
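A sketch of the error and its fix, assuming the Customers and Orders tables from the example usage below. The commented-out query fails because c.name is neither grouped nor aggregated; adding it to GROUP BY resolves it (wrapping it in an aggregate such as MAX(c.name) would also work).

```sql
-- Fails: c.name is neither grouped nor aggregated
-- SELECT c.id, c.name, COUNT(o.id) AS order_count
-- FROM   Customers c JOIN Orders o ON o.customer_id = c.id
-- GROUP  BY c.id;

-- Fix: add the column to GROUP BY
SELECT c.id,
       c.name,
       COUNT(o.id) AS order_count
FROM   Customers c
JOIN   Orders o ON o.customer_id = c.id
GROUP  BY c.id, c.name;
```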

Using HAVING instead of WHERE

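HAVING is evaluated after aggregation, so row-level predicates such as order_date >= CURRENT_DATE - 30 belong in WHERE, where they reduce the rows before grouping. A column that is neither grouped nor aggregated cannot be referenced in HAVING anyway.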
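A sketch that splits the two kinds of condition, assuming the Orders table used above: the date predicate goes in WHERE, the aggregate predicate in HAVING.

```sql
-- Row-level date filter in WHERE (applied before grouping);
-- aggregate condition in HAVING (applied after grouping)
SELECT customer_id,
       SUM(total_amount) AS recent_revenue
FROM   Orders
WHERE  order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP  BY customer_id
HAVING SUM(total_amount) > 1000;
```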

Advanced: GROUPING SETS

Redshift supports GROUPING SETS, which compute several groupings in a single query, for example total revenue per product and the overall total together.
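A minimal sketch of the per-product plus grand-total case. It assumes hypothetical OrderItems columns product_id and unit_price, which do not appear elsewhere in this article. The empty set () produces the grand-total row, in which product_id is NULL.

```sql
-- Assumes hypothetical OrderItems columns product_id and unit_price
SELECT product_id,
       SUM(quantity * unit_price) AS revenue
FROM   OrderItems
GROUP  BY GROUPING SETS ((product_id), ());
-- (product_id) set: one row per product
-- () set: one row with product_id = NULL holding the overall total
```

If product_id itself can be NULL in the data, the total row and a NULL-product row will look alike, so filter or label them accordingly.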



How to GROUP BY in Redshift Example Usage


-- Average items per order per customer in the last 90 days
-- (total quantity divided by the number of distinct orders;
--  the FLOAT cast avoids integer division)
SELECT c.id AS customer_id,
       c.name,
       SUM(oi.quantity)::FLOAT / COUNT(DISTINCT o.id) AS avg_items
FROM   Customers c
JOIN   Orders o      ON o.customer_id = c.id
JOIN   OrderItems oi ON oi.order_id = o.id
WHERE  o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP  BY c.id, c.name
HAVING SUM(oi.quantity)::FLOAT / COUNT(DISTINCT o.id) > 2
ORDER  BY avg_items DESC;

How to GROUP BY in Redshift Syntax


SELECT column1, column2, AGG(column3)
FROM table_name
[WHERE predicate]
GROUP BY column1, column2
[HAVING aggregate_predicate]
[ORDER BY column1 | aggregate_alias];

-- Ecommerce example
SELECT customer_id, DATE_TRUNC('month', order_date) AS order_month,
       SUM(total_amount) AS monthly_revenue
FROM   Orders
WHERE  order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP  BY customer_id, DATE_TRUNC('month', order_date)
ORDER  BY monthly_revenue DESC;


Frequently Asked Questions (FAQs)

Can I use column aliases in GROUP BY?

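Yes. Redshift accepts a SELECT-list alias or an ordinal position (for example, GROUP BY 1, 2) in GROUP BY. Repeating the full expression, as in the ecommerce example above, also works and avoids ambiguity when an alias matches an existing column name.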

Does Redshift support ROLLUP or CUBE?

Yes. Redshift supports ROLLUP and CUBE directly as GROUP BY extensions, alongside GROUPING SETS, which can express the same multi-level aggregations explicitly.
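A sketch of a roll-up written out as GROUPING SETS, assuming the Orders table used above. The subquery computes order_month first so the grouping sets reference plain columns; the three sets are the same groups GROUP BY ROLLUP (customer_id, order_month) would produce.

```sql
-- Per customer per month, per customer, and grand total
SELECT customer_id,
       order_month,
       SUM(total_amount) AS revenue
FROM  (SELECT customer_id,
              DATE_TRUNC('month', order_date) AS order_month,
              total_amount
       FROM   Orders) AS monthly
GROUP  BY GROUPING SETS ((customer_id, order_month), (customer_id), ());
```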

How do I aggregate distinct values?

Use COUNT(DISTINCT column) or SUM(DISTINCT column) inside the SELECT list alongside GROUP BY.
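For example, assuming the Orders table used above, this counts how many different customers placed orders each day; a customer with several orders on the same day is counted once.

```sql
-- Distinct customers who ordered each day
SELECT DATE_TRUNC('day', order_date) AS order_day,
       COUNT(DISTINCT customer_id)   AS unique_customers
FROM   Orders
GROUP  BY DATE_TRUNC('day', order_date)
ORDER  BY order_day;
```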
