How to Normalize Unicode Strings in MariaDB

Galaxy Glossary

How do I normalize Unicode text in MariaDB?

NFC(), NFD(), NFKC(), and NFKD convert text to consistent Unicode normalization forms in MariaDB.

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)

Oops! Something went wrong while submitting the form.

Description

Example H2

Example H3

Why normalize Unicode text in MariaDB?

Normalization guarantees that visually identical characters have the same byte sequence, preventing duplicate keys and failed joins.

Which functions perform normalization?

MariaDB offers four scalar functions: NFC(), NFD(), NFKC(), and NFKD(). Each converts a string to the corresponding Unicode Normalization Form.

What is the basic syntax?

Call the function with the target column or literal string. The return value is VARCHAR in the same character set.

How to normalize and store customer names?

Update text columns so future comparisons are reliable:

UPDATE Customers SET name = NFC(name) WHERE id > 0;

Query performance tip

Create a generated column that stores the normalized value and index it for fast lookups.

Can I normalize while inserting?

Yes—wrap the value inside the normalization function in INSERT or LOAD DATA statements.

Best practice: validate before updating

Run SELECT id, name FROM Customers WHERE name <> NFC(name); to preview rows that will change.

How to check if data is already normalized?

Compare the original text to its normalized version; a difference means it was not normalized:

SELECT name FROM Customers WHERE name != NFC(name);

Unicode normalization in joins

Use normalization on both sides of a join to avoid mismatches caused by accent composition differences.

Common mistakes

Skipping collation. Always use a Unicode collation (e.g., utf8mb4_unicode_ci) so that comparison rules match normalized data.

Updating without transaction. Wrap bulk normalization in a single transaction to avoid partial updates.

Why How to Normalize Unicode Strings in MariaDB is important

How to Normalize Unicode Strings in MariaDB Example Usage


-- Deduplicate customer names that differ only by accents
WITH normalized AS (
  SELECT id, NFC(name) AS norm_name
  FROM Customers
)
SELECT c1.id AS duplicate_of, c2.id AS original
FROM normalized c1
JOIN normalized c2 
  ON c1.norm_name = c2.norm_name AND c1.id > c2.id;

How to Normalize Unicode Strings in MariaDB Syntax


-- Normalize a single string
SELECT NFC('cafe01'); -- Returns 'café'

-- Update a column in place
UPDATE Customers
SET name = NFD(name)
WHERE id > 0;

-- Insert with normalization
INSERT INTO Products (name, price, stock)
VALUES (NFKC('ＥＸＴＲＡ  ＳＰＡＣＥＳ'), 19.99, 100);

-- Create indexed generated column
ALTER TABLE Customers 
ADD COLUMN name_nfc VARCHAR(255) AS (NFC(name)) STORED,
ADD INDEX idx_name_nfc (name_nfc);

Common Mistakes

Using ASCII or latin1 character sets. Without a Unicode character set (utf8mb4), normalization may truncate or misinterpret characters. Always set the column and connection character set to utf8mb4.
Running bulk UPDATE without WHERE filter. This can lock large tables and cause downtime. First identify affected rows, then normalize in batches or during low-traffic windows.