NORMALIZE returns a Unicode string converted to the specified normalization form (NFC, NFD, NFKC, or NFKD), making text comparisons reliable.
Strings containing accents or composed characters can compare or sort unpredictably. NORMALIZE converts every code point to a canonical form, so “é” and “e01” become identical.
Call it before storing or comparing user-entered names, product titles, or email addresses to avoid duplicate keys, case issues, and mismatched joins.
Use NORMALIZE(char_expr [, form])
. form
is optional and defaults to NFC
. Valid forms: NFC
, NFD
, NFKC
, NFKD
.
NFC keeps characters in their composed form (single code point) and is recommended for storage. NFD splits characters into base + combining marks, useful for advanced search functions.
Wrap the column in NORMALIZE during inserts or updates:INSERT INTO Customers (name, email) VALUES (NORMALIZE(:name,'NFC'), :email);
Yes. Use an UPDATE with SET column = NORMALIZE(column):UPDATE Customers SET name = NORMALIZE(name, 'NFC');
After bulk normalization, rebuild function-based indexes or unique keys that rely on the text column to ensure the index uses the new values.
Normalize at the point of entry, stick to one form (NFC), create deterministic function-based indexes, and document the policy so every service layer follows it.
Yes, NFD and NFKD often increase length because they decompose characters into base + combining marks. Plan VARCHAR sizes accordingly.
It is CPU-bound and scales linearly with text length. For bulk jobs, batch commits and monitor CPU usage.