NORMALIZE converts a Unicode string to a chosen normalization form (NFC, NFD, NFKC, or NFKD) so that visually identical characters are stored with identical byte sequences.
NORMALIZE standardizes Unicode strings so that characters with multiple binary representations (such as accented letters) become identical. This prevents hidden duplicates and makes JOINs, GROUP BYs, and text comparisons reliable.
Customer names, addresses, or product titles pasted from different sources often carry mixed Unicode forms. Applying NORMALIZE before inserts or during ETL guarantees deduplication and consistent search behaviour.
Use NORMALIZE(string_expr, form)
.Omitting form
defaults to NFC, the most common choice for storage and comparisons.
NFC is safest for most applications. Pick NFKC or NFKD only when you need compatibility folding (e.g., converting superscript ² to regular 2). NFD is useful for stripping accents when combined with TRANSLATE
.
Yes.Update the column in-place or create a computed column that stores the normalized version to avoid repetitive calls in queries.
UPDATE Customers
SET name = NORMALIZE(name, 'NFC');
1. Normalize during data ingestion inside stages or COPY INTO statements. 2. Index or cluster on normalized columns instead of raw text. 3. Always normalize both sides of a comparison to avoid mismatches.
• Mis-matching forms across tables causes silent join failures. • Forgetting to cast non-VARCHAR types to VARCHAR before normalizing raises an error.
.
The function is fast on small strings but costly in large scans. Normalize once during ETL or store the result to avoid repeated processing.
Yes. Normalize to NFD, then strip diacritics with TRANSLATE()
by deleting the combining marks.
No. COLLATE controls sort and comparison rules, while NORMALIZE changes the underlying byte representation of the string.