SQL DISTINCT returns only unique rows by eliminating duplicates from a SELECT result. Add DISTINCT immediately after SELECT to deduplicate full rows or specific column combinations. Use it for cleaner reports, counts, and subqueries, but watch performance on large tables.
The SQL DISTINCT keyword removes duplicate rows, returning only unique combinations of selected columns. Use it to deduplicate data for reports, aggregations, and subqueries.
SQL DISTINCT removes duplicate rows in the result set. When the database engine processes the SELECT list, it compares the values of the specified columns in each row. If two rows share identical values across those columns, only one row is kept. The operation produces a deduplicated view without altering the underlying table.
Use DISTINCT whenever your query returns redundant rows that would confuse reports or downstream queries. Typical cases include exporting customer lists, counting unique visitors, feeding subqueries to JOINs, or populating dropdown values. DISTINCT is a read-only tool, so you can test it safely without risking data changes.
Place the DISTINCT keyword immediately after SELECT. The simplest pattern is
SELECT DISTINCT column_name FROM table_name;
. The server scans the target column and returns each value once. You can list several columns to keep only unique combinations: SELECT DISTINCT country, state FROM addresses;
.
When DISTINCT covers multiple columns, the engine treats the entire set of listed columns as a composite key. A row is considered a duplicate only if all listed column values match another row. This technique deduplicates at the grouping level you choose, such as unique city-state pairs or unique product-date pairs.
Yes. Wrapping a column in COUNT(DISTINCT column) tells the engine to count unique values, not total rows. Example:
SELECT COUNT(DISTINCT user_id) AS active_users FROM events WHERE event_date = CURRENT_DATE;
. The outer COUNT returns a single integer representing distinct user IDs for the specific day.
Most SQL dialects treat all NULLs as equal when DISTINCT is applied. If multiple rows contain NULL in the deduplicated columns, only one NULL row appears. This behavior matches the SQL standard, but always test your database because some engines may treat NULLs differently.
DISTINCT and GROUP BY both collapse duplicate rows, yet they target different goals. DISTINCT is shorthand for grouping without aggregates; it merely removes repetition. GROUP BY groups rows so that aggregate functions like SUM or AVG can compute metrics per group. If you need aggregates, choose GROUP BY; if not, DISTINCT often reads more clearly.
Add ORDER BY after DISTINCT just like any SELECT. The ORDER BY list can include columns not in the SELECT clause on some databases, but safest practice is to sort by selected columns. Example:
SELECT DISTINCT city FROM addresses ORDER BY city ASC;
.
Yes. You can deduplicate a join output by placing DISTINCT after SELECT, or deduplicate one side first in a subquery. Example:
SELECT DISTINCT p.category FROM order_lines ol JOIN products p ON p.id = ol.product_id;
returns categories purchased.
DISTINCT forces the engine to sort or hash the result to detect duplicates, which adds CPU, memory, and possible disk usage. On indexed or small result sets the impact is minor, but on wide, unindexed tables performance can drop sharply. Always benchmark with EXPLAIN plans.
Limit the number of columns listed, filter early with WHERE, add appropriate indexes, and avoid SELECT * with DISTINCT. Consider GROUP BY instead if you also compute aggregates, or use window functions like ROW_NUMBER() to pick one row per group with finer control.
Always ask why duplicates exist before reaching for DISTINCT; removing them in source data might be cheaper. Avoid DISTINCT in highly concurrent transactional systems where every millisecond matters. Document queries so teammates know deduplication is intentional. Review execution plans regularly as data volume grows.
DISTINCT is a quick, declarative way to return unique rows for clearer analysis and application use. Syntax is simple, but performance depends on data size, indexes, and column count. Combine DISTINCT with COUNT, JOIN, and ORDER BY as needed, use GROUP BY for aggregates, and monitor execution plans.
Avoid DISTINCT when duplicates can be removed at source, when you need aggregates, or when the deduplication cost outweighs the benefit on very large datasets.
Yes, most databases support DISTINCT on any comparable data type. However, large text or JSON values can increase memory usage during deduplication.
Case sensitivity depends on the database collation. In case-insensitive collations 'USA' and 'usa' are treated as duplicates; in binary collations they differ.
Yes. Prepend DISTINCT in the SELECT part to ensure only unique rows are inserted into the target table.