“Normalizing data” restructures a denormalized table into smaller, related tables to remove redundancy and enforce data integrity.
Normalization eliminates duplicated customer, product, and order details, shrinking storage, speeding updates, and preventing inconsistent data.
Typical workflow: 1) create lookup tables for repeating attributes, 2) populate them with distinct values, 3) replace duplicated columns with foreign-key columns, 4) add constraints to maintain referential integrity.
Run COUNT(*) with GROUP BY on candidate columns; any group with a count greater than 1 means the values repeat, making the column a normalization candidate.
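As a quick illustration (the column names assume the Orders example used later in this section), a grouped count with a HAVING filter surfaces the repeated customer details:
SELECT customer_name, customer_email, COUNT(*) AS occurrences
FROM Orders
GROUP BY customer_name, customer_email
HAVING COUNT(*) > 1;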
CREATE TABLE defines lookup tables, INSERT INTO … SELECT DISTINCT fills them, UPDATE rewires the original table to reference the new keys, and ALTER TABLE adds foreign keys.
In a denormalized Orders table holding customer_name and customer_email, we first create Customers, load unique rows, then link Orders.customer_id to Customers.id.
CREATE TABLE Customers (id SERIAL PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE, created_at TIMESTAMP DEFAULT NOW());
INSERT INTO Customers (name, email)
SELECT DISTINCT customer_name, customer_email FROM Orders;
ALTER TABLE Orders ADD COLUMN customer_id INT;
UPDATE Orders o
SET customer_id = c.id
FROM Customers c
WHERE o.customer_name = c.name AND o.customer_email = c.email;
ALTER TABLE Orders ADD CONSTRAINT fk_orders_customers FOREIGN KEY (customer_id) REFERENCES Customers(id);
ALTER TABLE Orders DROP COLUMN customer_name, DROP COLUMN customer_email;
Work in a transaction, add indexes on foreign keys, validate counts before and after the move, and update application code in tandem.
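A minimal sketch of the index and validation steps, assuming the Orders/Customers example above and PostgreSQL; the index name idx_orders_customer_id is illustrative:
CREATE INDEX idx_orders_customer_id ON Orders (customer_id);
-- Run before dropping the old columns; the two counts should match.
SELECT COUNT(*) FROM (SELECT DISTINCT customer_name, customer_email FROM Orders) AS d;
SELECT COUNT(*) FROM Customers;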
Forgetting to backfill foreign keys leaves NULLs; skipping UNIQUE constraints can re-introduce duplicates.
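To catch both problems in the example schema, check for missing keys before the drop and optionally tighten the column afterward (the SET NOT NULL step assumes every order has a customer):
SELECT COUNT(*) FROM Orders WHERE customer_id IS NULL;  -- should be 0 once the backfill completes
ALTER TABLE Orders ALTER COLUMN customer_id SET NOT NULL;  -- reject future rows without a customer
-- The UNIQUE constraint on Customers.email (from the CREATE TABLE above) keeps duplicates out of the lookup table.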
Read queries may need extra joins, but updates become faster and storage shrinks. Benchmark typical workloads before and after.
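For instance, a read that previously touched only Orders now joins to Customers (a sketch using the example tables; o.* stands in for whatever order columns the query needs):
SELECT o.*, c.name, c.email
FROM Orders o
JOIN Customers c ON c.id = o.customer_id;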
For large or live tables, yes: wrap the process in a transaction, backfill incrementally, and use triggers to sync new writes until the switch-over.
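A minimal sketch of such a sync trigger, assuming PostgreSQL and the example schema; the function and trigger names (sync_order_customer, trg_orders_sync_customer) are hypothetical:
-- Keep customer_id populated on rows written while the backfill is still running.
CREATE OR REPLACE FUNCTION sync_order_customer() RETURNS trigger AS $$
BEGIN
  -- Ensure the customer exists; relies on the UNIQUE constraint on Customers.email.
  INSERT INTO Customers (name, email)
  VALUES (NEW.customer_name, NEW.customer_email)
  ON CONFLICT (email) DO NOTHING;
  -- Point the new order at the matching customer row.
  NEW.customer_id := (SELECT id FROM Customers WHERE email = NEW.customer_email);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_sync_customer
BEFORE INSERT ON Orders
FOR EACH ROW EXECUTE FUNCTION sync_order_customer();
Drop the trigger once the old columns are removed and the application writes customer_id directly.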