Unicode Code Point Inspector

JJ Ben-Joseph

What this Unicode Code Point Inspector does

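The Unicode Code Point Inspector lets you move back and forth between characters (like A, é, or 😀) and their underlying numeric identifiers. For any valid Unicode character or code point, it shows the corresponding numeric representations, including the U+ notation, the decimal value, and the UTF‑16 code units.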

This is useful when you are debugging encoding problems, writing regular expressions with Unicode support, working with emoji, or just learning how Unicode works internally.

How to use this inspector

The form accepts either a single character or a numeric code point. You can use whichever is more convenient.

1. Enter a character

In the Character field, type or paste exactly one user‑perceived character, for example A, é, or 😀.

The inspector will derive the code point from that character and show all numeric representations.
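As a rough illustration of this step, the sketch below reads the first code point of a string with the standard `codePointAt` method and formats it in U+ notation. The helper names `codePointOf` and `formatCodePoint` are hypothetical and are not taken from the inspector's source.

```javascript
// Hypothetical helpers for illustration; not the inspector's own code.

// Returns the code point of the first character in `text`.
// A user-perceived character can span several code points
// (for example a letter plus a combining mark); only the first is read here.
function codePointOf(text) {
  const cp = text.codePointAt(0);
  if (cp === undefined) {
    throw new RangeError("Expected at least one character");
  }
  return cp;
}

// Formats a code point as U+HHHH, padding to at least four hex digits.
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

console.log(formatCodePoint(codePointOf("A")));         // U+0041
console.log(formatCodePoint(codePointOf("\u{1F600}"))); // U+1F600
```

Note that `codePointAt(0)` combines a surrogate pair into one code point, whereas `charCodeAt(0)` would return only the first UTF‑16 code unit.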

2. Enter a code point

In the Code Point field, you can enter the numeric value directly, in formats such as hexadecimal with a U+ prefix (U+1F600) or plain decimal (128512).

If you fill in both fields, the code point takes precedence. This is helpful if the character field contains multiple characters or whitespace by accident.
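The input handling described above could be sketched as follows. This is a minimal illustration that assumes only two accepted formats (U+ hexadecimal and plain decimal); `parseCodePoint` and `resolveInput` are hypothetical names, and the real form may accept more formats or validate differently.

```javascript
// Hypothetical helpers for illustration; not the inspector's own code.

// Parses "U+1F600" (hex) or "128512" (decimal).
// Returns null when the text is not a code point in U+0000..U+10FFFF.
function parseCodePoint(input) {
  const text = input.trim();
  const hexMatch = /^U\+([0-9A-F]{1,6})$/i.exec(text);
  let cp;
  if (hexMatch) {
    cp = parseInt(hexMatch[1], 16);
  } else if (/^[0-9]+$/.test(text)) {
    cp = parseInt(text, 10);
  } else {
    return null;
  }
  return cp <= 0x10FFFF ? cp : null;
}

// The code point field wins when it holds a valid value;
// otherwise fall back to the first code point of the character field.
function resolveInput(characterField, codePointField) {
  const fromCodePoint = parseCodePoint(codePointField);
  if (fromCodePoint !== null) {
    return fromCodePoint;
  }
  const cp = characterField.codePointAt(0);
  return cp === undefined ? null : cp;
}

console.log(resolveInput("A b", "U+1F600")); // 128512
console.log(resolveInput("A", ""));          // 65
```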

3. Run the inspection

After entering your value, activate the Inspect button. The page will show the normalized code point along with its hexadecimal, decimal, and UTF‑16 representations.

What is a Unicode code point?

Unicode assigns every character a unique number called a code point. Conceptually, the Unicode space is a numbered list from U+0000 to U+10FFFF. Each position may represent a letter, digit, punctuation mark, symbol, emoji, or a special non‑printing control code.

By convention, code points are written as U+HHHH, where HHHH is a hexadecimal number of four to six digits. For example, U+0041 is A, U+00E9 is é, and U+1F600 is 😀.

Internally, computers still store bytes, not abstract code points. Encoding schemes such as UTF‑8 and UTF‑16 map each code point to one or more underlying code units, which are then expressed as bytes in memory or on disk.
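To make the difference between code points, code units, and bytes concrete, the sketch below lists the UTF‑8 bytes and the UTF‑16 code units of a string. It assumes a runtime that provides the standard `TextEncoder` global (modern browsers and recent Node.js versions); the helper names `utf8Bytes` and `utf16Units` are hypothetical.

```javascript
// Hypothetical helpers for illustration; not the inspector's own code.

// UTF-8 bytes of a string as two-digit uppercase hex strings.
function utf8Bytes(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));
}

// UTF-16 code units of a string as four-digit uppercase hex strings.
function utf16Units(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units;
}

console.log(utf8Bytes("\u00E9").join(" "));     // C3 A9       (U+00E9, two bytes)
console.log(utf16Units("\u00E9").join(" "));    // 00E9        (one code unit)
console.log(utf8Bytes("\u{1F600}").join(" "));  // F0 9F 98 80 (U+1F600, four bytes)
console.log(utf16Units("\u{1F600}").join(" ")); // D83D DE00   (two code units)
```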

Code points vs. UTF‑16 code units

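This tool focuses on how a code point is represented in UTF‑16, the encoding used by JavaScript strings and many APIs. In UTF‑16, code points from U+0000 to U+FFFF (the Basic Multilingual Plane, or BMP, excluding the reserved surrogate range U+D800 to U+DFFF) are stored as a single 16‑bit code unit, while code points from U+10000 to U+10FFFF are stored as two code units called a surrogate pair.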

JavaScript’s string.length property counts UTF‑16 code units, not code points. That means characters above U+FFFF are counted as length 2. The inspector helps you see this clearly by displaying the UTF‑16 units next to the code point.
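The difference between the two counts is easy to check in a console. In this small sketch, `countCodeUnits` and `countCodePoints` are hypothetical helper names; the second relies on the fact that string iteration in JavaScript proceeds by code point rather than by code unit.

```javascript
// Hypothetical helpers for illustration.

// Number of UTF-16 code units, which is what string.length reports.
function countCodeUnits(text) {
  return text.length;
}

// Number of code points: Array.from iterates a string by code point,
// so a surrogate pair becomes a single array element.
function countCodePoints(text) {
  return Array.from(text).length;
}

const sample = "A\u{1F600}"; // "A" followed by U+1F600
console.log(countCodeUnits(sample));  // 3
console.log(countCodePoints(sample)); // 2
```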

Formula for surrogate pairs (UTF‑16)

For code points above U+FFFF, UTF‑16 uses surrogate pairs. If CP is a code point in the range U+10000 to U+10FFFF, the transformation from CP to the two UTF‑16 units can be expressed formally.

The core relationship can be written as:

$$v = CP - 65536$$

Then the high and low surrogates are:

$$\text{high} = \text{0xD800} + \left\lfloor \frac{v}{1024} \right\rfloor$$

$$\text{low} = \text{0xDC00} + (v \bmod 1024)$$

In words:

  1. Subtract 0x10000 (65536) from the code point.
  2. Divide the result by 1024. The quotient, added to 0xD800, gives the high surrogate.
  3. The remainder, added to 0xDC00, gives the low surrogate.

The inspector applies this logic under the hood when it displays UTF‑16 units for characters outside the Basic Multilingual Plane (BMP), that is, for code points above U+FFFF.
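The three steps above translate almost directly into code. The following is a minimal sketch rather than the inspector's actual implementation; the function name `toSurrogatePair` is hypothetical.

```javascript
// Hypothetical helper for illustration; not the inspector's own code.

// Splits a code point in U+10000..U+10FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(cp) {
  if (!Number.isInteger(cp) || cp < 0x10000 || cp > 0x10FFFF) {
    throw new RangeError("Code point must be in the range U+10000 to U+10FFFF");
  }
  const v = cp - 0x10000;                         // step 1: subtract 65536
  const high = 0xD800 + Math.floor(v / 1024);     // step 2: quotient + 0xD800
  const low = 0xDC00 + (v % 1024);                // step 3: remainder + 0xDC00
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16)); // d83d de00
```

Because 1024 is a power of two, the division and remainder could equally be written as `v >> 10` and `v & 0x3FF`; the arithmetic form is used here to mirror the steps in the text.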

Worked example: 😀 (U+1F600)

Suppose you paste 😀 into the Character field and click Inspect. The tool will find its code point and derive related data:

  1. The Unicode code point is U+1F600.
  2. In decimal, this is 128512.
  3. In binary, it is 0001 1111 0110 0000 0000 (17 significant bits, zero‑padded to 20 and grouped in fours for readability).
  4. Because it is greater than U+FFFF, UTF‑16 uses a surrogate pair.

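Following the surrogate pair formula:

  1. v = 0x1F600 - 0x10000 = 0xF600, which is 62976 in decimal.
  2. Dividing 62976 by 1024 gives a quotient of 61 (0x3D) and a remainder of 512 (0x200).
  3. The high surrogate is 0xD800 + 0x3D = 0xD83D.
  4. The low surrogate is 0xDC00 + 0x200 = 0xDE00.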

The inspector presents these UTF‑16 units so that you can see why JavaScript sees this character as length 2, and how it will look when encoded as bytes.
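For the byte view mentioned above, here is a small sketch that serializes a string's UTF‑16 code units as bytes. It assumes big‑endian order with no byte order mark; little‑endian storage would swap the two bytes of each unit. The name `utf16BytesBE` is hypothetical.

```javascript
// Hypothetical helper for illustration; assumes big-endian UTF-16 without a BOM.

// Returns the bytes of each UTF-16 code unit, high byte first.
function utf16BytesBE(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes.push(unit >> 8, unit & 0xFF);
  }
  return bytes;
}

const hex = utf16BytesBE("\u{1F600}")
  .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
  .join(" ");
console.log(hex); // D8 3D DE 00
```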

Interpreting the inspector output

Once you run the tool, you will usually see several distinct fields in the results. Here is how to read them:

Common representations compared

The same Unicode character can appear in different notations depending on context. The table below outlines some of the most common ways to express a code point and how they relate to the inspector’s outputs.

| Context | Example notation | Relationship to inspector output |
| --- | --- | --- |
| Unicode standard | U+1F600 | Matches the inspector’s normalized code point field. |
| Hex literal in code | 0x1F600 | Same numeric value as the hex output, using a language‑specific prefix. |
| Decimal code | 128512 | Matches the inspector’s decimal value. |
| JavaScript escape | "\u{1F600}" | Uses the same code point; older syntax may show surrogate units like "\uD83D\uDE00". |
| HTML entity | &#128512; or &#x1F600; | These numeric character references are based on the decimal and hex outputs. |
| UTF‑16 units | D83D DE00 | Corresponds to the inspector’s UTF‑16 code unit field. |
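The notations in the table can all be derived from a single number. The sketch below builds each one for a given code point; `describeCodePoint` and its property names are hypothetical and do not necessarily match the inspector's field names.

```javascript
// Hypothetical helper for illustration; not the inspector's own code.

// Builds the common textual notations for one code point.
function describeCodePoint(cp) {
  const hex = cp.toString(16).toUpperCase();
  const text = String.fromCodePoint(cp); // throws RangeError above U+10FFFF

  // Collect the UTF-16 code units as four-digit hex strings.
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }

  return {
    unicode: "U+" + hex.padStart(4, "0"),
    hexLiteral: "0x" + hex,
    decimal: String(cp),
    jsEscape: "\\u{" + hex + "}",
    jsLegacyEscape: units.map((u) => "\\u" + u).join(""),
    htmlDecimal: "&#" + cp + ";",
    htmlHex: "&#x" + hex + ";",
    utf16Units: units.join(" "),
  };
}

console.log(describeCodePoint(0x1F600));
```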

Practical uses and limitations

This inspector is designed as a lightweight, browser‑based helper. It is powerful enough for everyday development and learning tasks, but it does make some assumptions.

Practical uses

Limitations and assumptions

Keeping these limitations in mind will help you interpret the results accurately and avoid misdiagnosing encoding issues.

Next steps

Once you are comfortable with the numeric side of Unicode, you can apply what you learn here directly in code, markup, and debugging sessions. Use the inspector as a quick reference whenever you need to confirm a code point, generate an escape sequence, or understand how a particular character is stored in UTF‑16.
