How to Normalize Unicode Strings in MariaDB

Galaxy Glossary

How do I normalize Unicode text in MariaDB?

NFC(), NFD(), NFKC(), and NFKD convert text to consistent Unicode normalization forms in MariaDB.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

Description

Table of Contents

Why normalize Unicode text in MariaDB?

Normalization guarantees that visually identical characters have the same byte sequence, preventing duplicate keys and failed joins.

Which functions perform normalization?

MariaDB offers four scalar functions: NFC(), NFD(), NFKC(), and NFKD(). Each converts a string to the corresponding Unicode Normalization Form.

What is the basic syntax?

Call the function with the target column or literal string. The return value is VARCHAR in the same character set.

How to normalize and store customer names?

Update text columns so future comparisons are reliable:

UPDATE Customers
SET name = NFC(name)
WHERE id > 0;

Query performance tip

Create a generated column that stores the normalized value and index it for fast lookups.

Can I normalize while inserting?

Yes—wrap the value inside the normalization function in INSERT or LOAD DATA statements.

Best practice: validate before updating

Run SELECT id, name FROM Customers WHERE name <> NFC(name); to preview rows that will change.

How to check if data is already normalized?

Compare the original text to its normalized version; a difference means it was not normalized:

SELECT name FROM Customers WHERE name != NFC(name);

Unicode normalization in joins

Use normalization on both sides of a join to avoid mismatches caused by accent composition differences.

Common mistakes

Skipping collation. Always use a Unicode collation (e.g., utf8mb4_unicode_ci) so that comparison rules match normalized data.

Updating without transaction. Wrap bulk normalization in a single transaction to avoid partial updates.

Why How to Normalize Unicode Strings in MariaDB is important

How to Normalize Unicode Strings in MariaDB Example Usage


-- Deduplicate customer names that differ only by accents
WITH normalized AS (
  SELECT id, NFC(name) AS norm_name
  FROM Customers
)
SELECT c1.id AS duplicate_of, c2.id AS original
FROM normalized c1
JOIN normalized c2 
  ON c1.norm_name = c2.norm_name AND c1.id > c2.id;

How to Normalize Unicode Strings in MariaDB Syntax


-- Normalize a single string
SELECT NFC('cafe01'); -- Returns 'café'

-- Update a column in place
UPDATE Customers
SET name = NFD(name)
WHERE id > 0;

-- Insert with normalization
INSERT INTO Products (name, price, stock)
VALUES (NFKC('EXTRA  SPACES'), 19.99, 100);

-- Create indexed generated column
ALTER TABLE Customers 
ADD COLUMN name_nfc VARCHAR(255) AS (NFC(name)) STORED,
ADD INDEX idx_name_nfc (name_nfc);

Common Mistakes

Frequently Asked Questions (FAQs)

Does normalization change string length?

Yes, decomposed forms (NFD/NFKD) can increase length, while composed forms (NFC/NFKC) may reduce it. Size columns accordingly.

Are these functions deterministic?

Yes, they always return the same output for the same input, making them safe for generated columns and functional indexes.

Is there a performance impact?

The functions are CPU-bound but fast for short strings. Normalize once and store results to avoid per-query overhead.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.