SQL Keywords

SQL GROUP

What does the SQL GROUP BY clause do?

GROUP BY combines rows sharing the same values into summary rows for aggregation.
Sign up to get up to date news on SQL keywords
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

Compatible dialects for SQL GROUP: Supported by PostgreSQL, MySQL, MariaDB, SQL Server, Oracle, SQLite, Snowflake, BigQuery, Redshift and all ANSI-compliant systems. Advanced features (ROLLUP, CUBE, GROUPING SETS) vary: PostgreSQL, Oracle, SQL Server, MySQL 8+ fully support; SQLite lacks them.

SQL GROUP Full Explanation

GROUP BY is a SELECT-statement clause that collapses rows having identical values in one or more columns into single summary rows. Aggregate functions such as COUNT, SUM, AVG, MIN, and MAX are then applied to each group. GROUP BY happens after the FROM-JOIN phase and before HAVING, SELECT (projection), and ORDER BY in the logical query processing order. A query without aggregates but with GROUP BY is valid, though uncommon. GROUP BY supports advanced features like GROUPING SETS, ROLLUP, and CUBE (dialect dependent) to generate multiple grouping levels in one query. All nonaggregated columns in the SELECT list must be present in the GROUP BY clause or be functionally dependent on them (only SQL:2011 and some databases support functional-dependency relaxation). Attempting to include other columns will raise an error. Null values are treated as equal, forming a single null group. Performance hinges on indexes and the number of distinct group keys; large cardinalities may require hash or sort operations.

SQL GROUP Syntax

SELECT column1, aggregate_function(column2) AS alias
FROM   table_name
WHERE  conditions
GROUP BY column1 [ , columnN ]
  [ WITH ROLLUP | WITH CUBE | GROUPING SETS ( ... ) ]
HAVING aggregate_function(column2) > value;

SQL GROUP Parameters

  • column_list (identifier or expression) - One or more columns or expressions that define grouping boundaries

Example Queries Using SQL GROUP

-- 1. Total sales per customer
SELECT customer_id, SUM(amount) AS total_spent
FROM   sales
GROUP BY customer_id;

-- 2. Count of employees per department with a filter
SELECT department_id, COUNT(*) AS headcount
FROM   employees
GROUP BY department_id
HAVING COUNT(*) > 10;

-- 3. Multi-level subtotals using ROLLUP (MySQL/PostgreSQL)
SELECT region, country, SUM(revenue) AS rev
FROM   orders
GROUP BY ROLLUP (region, country);

Expected Output Using SQL GROUP

  • Each query returns one result row per distinct group key plus any specified super-aggregate rows (ROLLUP/CUBE)
  • Aggregate columns hold computed values; non-grouped columns that are aggregated show single values per group

Use Cases with SQL GROUP

  • Calculate metrics such as revenue per customer or visits per day
  • Build summary reports and dashboards
  • Feed BI layers that require preaggregated datasets
  • Generate subtotals and grand totals with ROLLUP/CUBE
  • Enforce data privacy by returning only grouped statistics

Common Mistakes with SQL GROUP

  • Selecting columns not present in GROUP BY or inside an aggregate
  • Forgetting HAVING for post-aggregation filtering and misusing WHERE
  • Assuming GROUP BY preserves input order; use ORDER BY instead
  • Expecting NULLs to form separate groups for each NULL value (they do not)
  • Using DISTINCT redundantly with GROUP BY (GROUP BY already eliminates duplicates within groups)

Related Topics

HAVING, DISTINCT, ORDER BY, ROLLUP, CUBE, GROUPING SETS, aggregate functions

First Introduced In

SQL-86 (first ANSI SQL standard)

Frequently Asked Questions

What is the difference between GROUP BY and DISTINCT?

DISTINCT removes duplicate rows across the entire SELECT list, while GROUP BY forms groups based on specified columns and often pairs with aggregates to compute summaries per group.

How do I filter grouped results?

Use the HAVING clause, not WHERE, to filter after aggregation. Example: HAVING COUNT(*) > 5 keeps only groups with more than five rows.

Why do I get the error “column must appear in the GROUP BY clause”?

All nonaggregated columns selected must also be listed in GROUP BY (or be functionally dependent in supported dialects). Remove the column, wrap it in an aggregate, or add it to GROUP BY.

When should I use ROLLUP or CUBE?

Use ROLLUP to produce hierarchical subtotals (e.g., city, state, country, grand total) and CUBE for all possible subtotal combinations, saving multiple queries and simplifying reporting.

Sign up to get up to date news on SQL keywords
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.
Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo

Check out other commonly used SQL Keywords!