Remove Duplicates in SQL

Galaxy Glossary

How do you remove duplicates in SQL?

Removing duplicates in SQL refers to identifying repeated rows and eliminating extras while preserving at least one accurate record.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Description

Remove Duplicates in SQL

Learn fast, reliable techniques—DISTINCT, GROUP BY, window functions, and CTEs—to find and delete duplicate rows across popular databases.

What Is Removing Duplicates in SQL?

Removing duplicates in SQL means writing queries that identify identical rows and eliminate the extras, keeping one authoritative record. Techniques vary by dialect but share a goal: data integrity.

Why Do Duplicate Rows Occur?

Duplicates stem from missing constraints, batch imports, merge errors, or lack of idempotent ETL logic. Cleaning them protects analytics accuracy and downstream applications.

How Do You Quickly Detect Duplicates?

Use GROUPBY with HAVINGCOUNT(>1) to list keys appearing multiple times, or leverage ROW_NUMBER() window functions to flag repeats in one pass.SELECT id, COUNT(*)
FROM sales
GROUP BY id
HAVING COUNT(*) > 1;

How Do You Delete Duplicates but Keep One Row?

Wrap ROW_NUMBER() in a CTE, then delete rows where row_num > 1. This keeps the first occurrence based on chosen sort criteria.WITH dup AS (
SELECT id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_at) AS rn
FROM sales)
DELETE FROM sales USING dup
WHERE sales.id = dup.id AND dup.rn > 1;

Can SELECT DISTINCT Remove Duplicates?

SELECT DISTINCT returns unique rows for reporting or inserts into a new table, but it doesn’t change the original data unless you replace or overwrite the table.

How Do You Remove Duplicates Without Permanent Deletion?

Create a temp table with DISTINCT rows, truncate the original table, then re-insert from the temp table. This offers a rollback plan.

What Are Best Practices to Prevent Duplicates?

Add primary keys or UNIQUE constraints, stage raw loads, validate keys during ETL, and schedule deduplication jobs as safety nets.

Which SQL Dialects Support These Techniques?

Postgres, SQLServer, Snowflake, Redshift, BigQuery, and MySQL8+ support window-function deletion. Older MySQL uses self-joins; Oracle needs DELETE WHERE ROWID NOT IN.

How Does Galaxy Help Remove Duplicates Faster?

Galaxy’s AI copilot autowrites deduplication CTEs, highlights duplicate keys in the result grid, and lets teams endorse canonical queries in shared Collections.

Why Remove Duplicates in SQL is important

Dirty data skews KPIs, inflates counts, and erodes trust. Deduplication ensures accurate analytics, consistent customer views, and reliable machine-learning training sets. Automated, repeatable SQL patterns let data engineers fix issues quickly without manual exports.

Remove Duplicates in SQL Example Usage


"How do I delete duplicate email addresses in a Postgres table using ROW_NUMBER?"

Remove Duplicates in SQL Syntax



Common Mistakes

Frequently Asked Questions (FAQs)

FAQ

Does DELETE DISTINCT exist in SQL?

No. Use CTEs or subqueries with ROW_NUMBER() to target duplicates, then DELETE.

How safe is removing duplicates with transactions?

Wrapping the DELETE in BEGIN/COMMIT lets you validate row counts, ensuring a quick rollback if needed.

Can Galaxy auto-generate deduplication queries?

Yes. Galaxy’s AI copilot reads table metadata and proposes optimized CTEs, saving manual coding time.

What if my database lacks window functions?

Use a self-join on MIN(id) to preserve the first row, or upgrade to a version that supports analytic functions.

Want to learn about other SQL terms?