GROUP BY in BigQuery aggregates rows that share column values so you can apply aggregate functions like COUNT, SUM, or AVG.
Group related rows to calculate totals, averages, or counts in one scan, reducing data processed and query cost.
Place non-aggregated columns in the GROUP BY clause and use aggregate functions on the remaining columns.
Yes. List every dimension column after GROUP BY or use positional references like 1,2,3 that map to the SELECT list.
Use HAVING after GROUP BY to keep or discard groups based on aggregate conditions, e.g., HAVING SUM(total_amount) > 1000.
Project only needed columns, use table aliases, avoid SELECT *, and filter early with WHERE to minimize scanned data.
Legacy SQL only. In Standard SQL, BigQuery handles large cardinality automatically; simply use GROUP BY.
This query sums order totals by product and date, demonstrating multiple dimensions and HAVING.
SELECT p.name,
DATE(o.order_date) AS order_day,
SUM(oi.quantity * p.price) AS revenue
FROM `shop.Orders` o
JOIN `shop.OrderItems` oi ON oi.order_id = o.id
JOIN `shop.Products` p ON p.id = oi.product_id
GROUP BY p.name, order_day
HAVING revenue > 500
ORDER BY revenue DESC;
Yes. Every non-aggregated column in SELECT must appear in GROUP BY or the query errors.
Absolutely. Use AS to create readable names—e.g., SUM(total_amount) AS total_revenue.
Yes. GROUP BY 1,2 refers to the first and second columns in the SELECT list. This shortens queries but reduces readability.
It can lower cost because aggregation reduces result size, but scanning cost depends on selected columns and filters applied in WHERE.