DISTINCT is placed immediately after SELECT to filter out duplicate rows produced by the query’s FROM, JOIN, and WHERE clauses. The database engine evaluates the result set, compares the values of all selected columns, and returns only one instance of each unique combination. If multiple columns are listed, uniqueness is determined across the entire tuple, not per column. DISTINCT can be combined with aggregate functions such as COUNT to obtain counts of unique values. In PostgreSQL, the DISTINCT ON variation lets you specify leading columns for finer control, but it is non-standard. Because DISTINCT triggers a sort or hash aggregate internally, it can increase query execution time on large datasets. Adding selective WHERE clauses, proper indexing, or using GROUP BY can mitigate performance costs. Beware that DISTINCT applies after projections, so expressions or aliases affect uniqueness.
SELECT, GROUP BY, DISTINCT ON, COUNT, UNIQUE constraint
SQL-92
A UNIQUE constraint enforces uniqueness at the table storage level, preventing duplicate data from being inserted. DISTINCT operates at query time, filtering duplicates in the result set without altering stored data.
Check for hidden differences such as whitespace, letter casing, or additional selected columns. DISTINCT compares exact byte values of every selected column, so even minor variations create distinct rows.
Yes. For example, COUNT(DISTINCT column_name) counts the number of unique non-NULL values in that column.
Avoid using DISTINCT to fix bad joins or poor data modeling. Optimize your joins and filters first; use DISTINCT only when legitimate duplicates exist in the intended result.