The same visible text can be stored with different underlying code points, which makes search, comparison, and de-duplication fail even though the words look identical.
Unicode Normalize converts text to one of the four standard Unicode normalization forms (NFC, NFD, NFKC, or NFKD), so text that is equivalent but encoded differently compares as equal.
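A short sketch of the problem and the fix, using Python's standard `unicodedata` module rather than the tool itself. Both strings display as "café", but one stores "é" as a single code point and the other as "e" plus a combining accent:

```python
import unicodedata

composed = "caf\u00e9"       # "café" with é as one code point (U+00E9)
decomposed = "cafe\u0301"    # "café" as e followed by a combining acute accent (U+0301)

# Plain comparison looks at code points, so the two spellings differ
print(composed == decomposed)  # False

# After normalizing both to the same form (NFC here), they compare as equal
print(unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed))  # True
```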
How to use Unicode Normalize
- Choose NFC, NFD, NFKC, or NFKD for the form you need.
- Paste the text you want to normalize into the box.
- Copy the normalized output, now in a consistent code-point form.
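If you want to repeat these steps in code, a minimal sketch in Python might look like this. `normalize_text` and `FORMS` are hypothetical names chosen for illustration, not part of the tool. `unicodedata.normalize` performs the conversion, and `unicodedata.is_normalized` (Python 3.8+) confirms the result:

```python
import unicodedata

# Hypothetical helper mirroring the steps above: choose a form, pass text in, get normalized text out
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalize_text(text: str, form: str = "NFC") -> str:
    """Return text converted to the chosen Unicode normalization form."""
    if form not in FORMS:
        raise ValueError(f"form must be one of {FORMS}, got {form!r}")
    return unicodedata.normalize(form, text)

pasted = "cafe\u0301"                   # text as pasted: e + combining accent
output = normalize_text(pasted, "NFC")  # the chosen form
print(len(pasted), len(output))         # 5 4
print(unicodedata.is_normalized("NFC", output))  # True
```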
Use cases
- Normalising user input before comparing or de-duplicating it.
- Converting compatibility characters to plain equivalents for indexing.
- Standardising text to one canonical form before storage.
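For the de-duplication and storage cases, one approach is to normalize each value before using it as a key and store the normalized form. The sketch below assumes NFC and uses a hypothetical `dedupe` helper. "Zoë" typed with a precomposed "ë" and with "e" plus a combining diaeresis collapse into one entry:

```python
import unicodedata

def dedupe(items: list[str], form: str = "NFC") -> list[str]:
    """Hypothetical helper: keep the first occurrence of each normalized value."""
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        key = unicodedata.normalize(form, item)
        if key not in seen:
            seen.add(key)
            result.append(key)  # store the normalized form, not the raw input
    return result

names = ["Zo\u00eb", "Zoe\u0308", "Zoe"]  # precomposed ë, decomposed ë, plain e
print(len(dedupe(names)))  # 2
```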
Good to know
Unicode Normalize offers the four standard forms: NFC composes characters, NFD decomposes them, and the NFK variants additionally fold compatibility characters, such as ligatures and circled digits, to plain forms. NFC is the most common choice for storage and the web. The NFK forms can change appearance, so use them when the underlying meaning of the text matters more than its exact glyphs.
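To see how the four forms differ, here is a sketch in Python that runs one sample string through each of them. The sample contains the "ﬁ" ligature, a precomposed "é", and a circled digit one. `ascii()` is used only to make the resulting code points visible:

```python
import unicodedata

# "ﬁ" ligature (U+FB01), precomposed é (U+00E9), circled digit one (U+2460)
sample = "\ufb01\u00e9\u2460"

results = {form: unicodedata.normalize(form, sample) for form in ("NFC", "NFD", "NFKC", "NFKD")}
for form, text in results.items():
    # ascii() escapes non-ASCII code points so the differences are visible
    print(form, len(text), ascii(text))
# NFC 3 '\ufb01\xe9\u2460'
# NFD 4 '\ufb01e\u0301\u2460'
# NFKC 4 'fi\xe91'
# NFKD 5 'fie\u03011'
```

NFC and NFD only change how "é" is encoded. NFKC and NFKD also replace the ligature and the circled digit with plain characters.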
Frequently asked questions
What is the difference between NFC and NFD?
NFC composes a letter and its accents into a single code point where one exists, while NFD splits them into a base letter followed by combining marks. The two forms look the same but are stored as different code-point sequences.
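A quick way to check this is to count code points in each form. The Python sketch below uses "Ångström", which has two accented letters:

```python
import unicodedata

word = "\u00c5ngstr\u00f6m"  # "Ångström" with precomposed Å and ö
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)

print(len(nfc), len(nfd))  # 8 10
# In NFD, Å becomes A followed by a combining ring above
print([f"U+{ord(c):04X}" for c in nfd[:2]])  # ['U+0041', 'U+030A']
```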
When should I use NFKC or NFKD?
Use the compatibility forms to fold characters like ligatures or circled numbers to plain equivalents, for example before search indexing.
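As a sketch of the search-indexing case, a hypothetical `index_key` function could apply NFKC to each term before indexing. The `casefold()` call is an extra assumption added for case-insensitive matching and is not part of normalization:

```python
import unicodedata

def index_key(text: str) -> str:
    """Hypothetical helper: build a search key from text."""
    # NFKC folds compatibility characters (ligatures, circled digits, full-width letters)
    # casefold() is an assumed extra step for case-insensitive search
    return unicodedata.normalize("NFKC", text).casefold()

print(index_key("\ufb01le \u2460"))           # 'ﬁle ①' -> file 1
print(index_key("\uff26\uff29\uff2c\uff25"))  # full-width 'ＦＩＬＥ' -> file
```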
Could normalization change how my text looks?
The NFK forms can, since they replace compatibility glyphs with plain ones; NFC and NFD preserve appearance.
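If you are not sure whether compatibility folding would alter a particular string, you can compare its NFKC and NFC results. The Python sketch below uses a hypothetical `changes_appearance` helper built on that comparison:

```python
import unicodedata

def changes_appearance(text: str) -> bool:
    """Hypothetical helper: True if NFKC output differs from NFC output,
    i.e. compatibility folding would replace at least one character."""
    return unicodedata.normalize("NFKC", text) != unicodedata.normalize("NFC", text)

print(changes_appearance("cafe\u0301"))  # False: only a canonical (composed vs decomposed) difference
print(changes_appearance("x\u00b2"))     # True: superscript two becomes a plain "2"
```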