NFC(), NFD(), NFKC(), and NFKD() convert text to consistent Unicode normalization forms in MariaDB.
Normalization guarantees that canonically equivalent text (characters that look identical but are encoded differently) has the same byte sequence, preventing duplicate keys and failed joins.
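The effect is easy to see outside the database. The following is a minimal sketch in Python, where the standard `unicodedata` module stands in for the normalization functions; it is an illustration of the Unicode behavior, not MariaDB code, and the variable names are hypothetical.

```python
import unicodedata

# Two encodings of the same visible character "é".
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

def nfc(text: str) -> str:
    """Return the NFC (composed) form of text."""
    return unicodedata.normalize("NFC", text)

# Before normalization the UTF-8 byte sequences differ.
same_before = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization both strings have the same bytes.
same_after = nfc(composed).encode("utf-8") == nfc(decomposed).encode("utf-8")

print(same_before, same_after)
```

A unique key built on the raw bytes would accept both strings as distinct values; built on the normalized value, it would treat them as one.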
MariaDB offers four scalar functions: NFC(), NFD(), NFKC(), and NFKD(). Each converts a string to the corresponding Unicode Normalization Form.
Call the function with the target column or literal string. The return value is VARCHAR in the same character set.
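To see how the four forms differ, here is a small Python sketch that applies each one to the same input and prints the resulting code points. It uses Python's `unicodedata.normalize` as a stand-in for the functions described above; the sample string is a hypothetical value chosen because it contains both a ligature and an accented letter.

```python
import unicodedata

# Hypothetical sample: the "fi" ligature (U+FB01), then "anc", then precomposed "é" (U+00E9).
sample = "\ufb01anc\u00e9"

# Apply each normalization form to the same input.
forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, value in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in value)
    print(f"{form}: {code_points}")
```

The canonical forms (NFC, NFD) keep the ligature and only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also replace the ligature with the plain letters "f" and "i".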
Update text columns so future comparisons are reliable:
UPDATE Customers
SET name = NFC(name)
WHERE id > 0;
Create a generated column that stores the normalized value and index it for fast lookups.
To normalize data as it is written, wrap the value inside the normalization function in INSERT or LOAD DATA statements.
Run SELECT id, name FROM Customers WHERE name <> NFC(name); to preview rows that will change.
Compare the original text to its normalized version; a difference means it was not normalized:
SELECT name FROM Customers WHERE name != NFC(name);
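The same check can be sketched in Python for data that has not been loaded yet. This is an illustration under assumptions: the `rows` list is made-up sample data, and `unicodedata.normalize` stands in for the NFC() call in the query above.

```python
import unicodedata

# Hypothetical (id, name) rows; row 2 stores "é" as "e" + combining accent.
rows = [
    (1, "Ren\u00e9e"),
    (2, "Rene\u0301e"),
    (3, "Zoe"),
]

def is_nfc(text: str) -> bool:
    """True when text is already in NFC, i.e. normalizing changes nothing."""
    return text == unicodedata.normalize("NFC", text)

# Mirror of the WHERE name != NFC(name) filter: ids whose name would change.
needs_update = [row_id for row_id, name in rows if not is_nfc(name)]

print(needs_update)
```

The comparison here is code point by code point. In SQL, the collation in effect decides whether two differently encoded strings compare as different, so the query may need a binary comparison to catch every row.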
Use normalization on both sides of a join to avoid mismatches caused by accent composition differences.
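As an illustration of why both sides matter, the sketch below joins two small in-memory tables in Python, first on the raw key and then on the NFC-normalized key. The table contents and names are hypothetical, and `unicodedata.normalize` again stands in for the database function.

```python
import unicodedata

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

# Hypothetical tables: the customer name is composed, the order's copy is decomposed.
customers = [(1, "Jos\u00e9")]
orders = [("A-100", "Jose\u0301")]

# Join on the raw strings: the keys differ, so nothing matches.
raw_join = [
    (cust_id, order_id)
    for cust_id, cust_name in customers
    for order_id, order_name in orders
    if cust_name == order_name
]

# Join with normalization applied to both sides: the keys now agree.
normalized_join = [
    (cust_id, order_id)
    for cust_id, cust_name in customers
    for order_id, order_name in orders
    if nfc(cust_name) == nfc(order_name)
]

print(raw_join, normalized_join)
```

Normalizing only one side would still fail whenever the other side holds decomposed text, which is why the function is applied to both keys.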
Skipping collation. Always use a Unicode collation (e.g., utf8mb4_unicode_ci) so that comparison rules match normalized data.
Updating without a transaction. Wrap bulk normalization in a single transaction to avoid partial updates.
Normalization can change string length: decomposed forms (NFD/NFKD) can increase it, while composed forms (NFC/NFKC) may reduce it. Size columns accordingly.
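A quick way to gauge the effect on column sizing is to measure a value before and after normalization. The Python sketch below does this with `unicodedata.normalize`; the `lengths` helper is hypothetical and reports both character count and UTF-8 byte count.

```python
import unicodedata

def lengths(text: str, form: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) of text after normalization."""
    normalized = unicodedata.normalize(form, text)
    return len(normalized), len(normalized.encode("utf-8"))

# Precomposed "é" grows when decomposed into "e" + combining accent.
nfd_len = lengths("\u00e9", "NFD")

# The decomposed pair shrinks back to one code point when composed.
nfc_len = lengths("e\u0301", "NFC")

print(nfd_len, nfc_len)
```

Running the helper over a representative sample of real values gives a rough upper bound to size the column against.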
The functions are deterministic: they always return the same output for the same input, making them safe for generated columns and functional indexes.
The functions are CPU-bound but fast for short strings. Normalize once and store results to avoid per-query overhead.