GROUP BY is a SELECT-statement clause that collapses rows having identical values in one or more columns into single summary rows. Aggregate functions such as COUNT, SUM, AVG, MIN, and MAX are then applied to each group. GROUP BY happens after the FROM-JOIN phase and before HAVING, SELECT (projection), and ORDER BY in the logical query processing order. A query without aggregates but with GROUP BY is valid, though uncommon. GROUP BY supports advanced features like GROUPING SETS, ROLLUP, and CUBE (dialect dependent) to generate multiple grouping levels in one query. All nonaggregated columns in the SELECT list must be present in the GROUP BY clause or be functionally dependent on them (only SQL:2011 and some databases support functional-dependency relaxation). Attempting to include other columns will raise an error. Null values are treated as equal, forming a single null group. Performance hinges on indexes and the number of distinct group keys; large cardinalities may require hash or sort operations.
column_list
(identifier or expression) - One or more columns or expressions that define grouping boundariesHAVING, DISTINCT, ORDER BY, ROLLUP, CUBE, GROUPING SETS, aggregate functions
SQL-86 (first ANSI SQL standard)
DISTINCT removes duplicate rows across the entire SELECT list, while GROUP BY forms groups based on specified columns and often pairs with aggregates to compute summaries per group.
Use the HAVING clause, not WHERE, to filter after aggregation. Example: HAVING COUNT(*) > 5 keeps only groups with more than five rows.
All nonaggregated columns selected must also be listed in GROUP BY (or be functionally dependent in supported dialects). Remove the column, wrap it in an aggregate, or add it to GROUP BY.
Use ROLLUP to produce hierarchical subtotals (e.g., city, state, country, grand total) and CUBE for all possible subtotal combinations, saving multiple queries and simplifying reporting.