The Unicode Code Point Inspector lets you move back and forth between characters (like A, é, or 😀) and their underlying numeric identifiers. For any valid Unicode character or code point, it shows the code point in several numeric notations along with its UTF‑16 code units.
This is useful when you are debugging encoding problems, writing regular expressions with Unicode support, working with emoji, or just learning how Unicode works internally.
The form accepts either a single character or a numeric code point. You can use whichever is more convenient.
In the Character field, type or paste exactly one user‑perceived character. Examples:
- A, é, ß
- م, Ж, अ, 汉
- €, ©, →

The inspector will derive the code point from that character and show all numeric representations.
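As a rough illustration of that derivation, the sketch below reads the first code point of a string with the standard `codePointAt` method and formats it in several bases. The function name `describeCharacter` and the shape of the returned object are hypothetical; nothing here is taken from the inspector's own source.

```javascript
// Hypothetical helper (not the inspector's actual code): describe the first
// code point of a string in several numeric notations.
function describeCharacter(text) {
  const codePoint = text.codePointAt(0); // undefined when the string is empty
  if (codePoint === undefined) {
    throw new RangeError("Expected at least one character");
  }
  const hex = codePoint.toString(16).toUpperCase();
  return {
    codePoint,                              // plain number
    notation: "U+" + hex.padStart(4, "0"),  // at least four hex digits
    hex,
    decimal: codePoint.toString(10),
    binary: codePoint.toString(2),
  };
}

console.log(describeCharacter("A").notation);         // U+0041
console.log(describeCharacter("\u{1F600}").notation); // U+1F600
```

Note that `codePointAt(0)` only looks at the first code point, so a sequence such as a base letter plus a combining mark would need extra handling.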
In the Code Point field, you can enter the numeric value directly in one of several formats:
- U+ notation: U+1F600, U+00E9
- 0x prefix: 0x1F600, 0x41
- Bare hexadecimal: 1F600, 0041
- Decimal: 128512, 65

If you fill in both fields, the code point takes precedence. This is helpful if the character field contains multiple characters or whitespace by accident.
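One possible way to parse these formats is sketched below. The text does not say how the inspector tells unprefixed hexadecimal (0041) apart from decimal (65), so this hypothetical `parseCodePoint` function asks the caller to supply the base for unprefixed input; that parameter is an assumption made for the example.

```javascript
// Hypothetical parser, not the inspector's implementation.
// Prefixed input (U+ or 0x) is always hexadecimal. For unprefixed input the
// caller must pass the base (16 or 10), because the disambiguation rule the
// inspector uses is not documented here.
function parseCodePoint(input, unprefixedRadix) {
  const text = input.trim();
  const prefixed = /^(?:U\+|0x)([0-9a-f]+)$/i.exec(text);
  let value;
  if (prefixed) {
    value = parseInt(prefixed[1], 16);
  } else if (unprefixedRadix === 16 && /^[0-9a-f]+$/i.test(text)) {
    value = parseInt(text, 16);
  } else if (unprefixedRadix === 10 && /^[0-9]+$/.test(text)) {
    value = parseInt(text, 10);
  } else {
    throw new SyntaxError(`Unrecognized code point: "${input}"`);
  }
  if (value > 0x10ffff) {
    throw new RangeError(`Code point out of range: "${input}"`);
  }
  return value;
}

console.log(parseCodePoint("U+1F600"));    // 128512
console.log(parseCodePoint("0x41"));       // 65
console.log(parseCodePoint("0041", 16));   // 65
console.log(parseCodePoint("128512", 10)); // 128512
```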
After entering your value, activate the Inspect button. The page will show the normalized code point (for example, U+1F600) together with its other numeric representations and its UTF‑16 code units.

Unicode assigns every character a unique number called a code point. Conceptually, the Unicode space is a numbered list from U+0000 to U+10FFFF. Each position may represent a letter, digit, punctuation mark, symbol, emoji, or a special non‑printing control code.
By convention, code points are written as U+HHHH where HHHH is a hexadecimal number. For example:
- A → U+0041
- é → U+00E9
- 😀 → U+1F600

Internally, computers still store bytes, not abstract code points. Encoding schemes such as UTF‑8 and UTF‑16 map each code point to one or more underlying code units, which are then expressed as bytes in memory or on disk.
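To make the difference between code points and code units concrete, the following sketch counts UTF‑8 bytes (via the standard `TextEncoder`, which always produces UTF‑8) and UTF‑16 code units for a string. The helper name `encodingSizes` is made up for this example.

```javascript
// Hypothetical helper: compare how many code units two encodings need.
function encodingSizes(text) {
  const utf8 = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return {
    utf8Bytes: utf8.length,  // UTF-8 code units are single bytes
    utf16Units: text.length, // JavaScript strings count UTF-16 code units
  };
}

console.log(encodingSizes("A"));         // { utf8Bytes: 1, utf16Units: 1 }
console.log(encodingSizes("\u00E9"));    // { utf8Bytes: 2, utf16Units: 1 }
console.log(encodingSizes("\u{1F600}")); // { utf8Bytes: 4, utf16Units: 2 }
```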
This tool focuses on how a code point is represented in UTF‑16, the encoding used by JavaScript strings and many APIs. In UTF‑16:
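- Code points are divided into the Basic Multilingual Plane (BMP), from U+0000 to U+FFFF, and supplementary planes, from U+10000 to U+10FFFF.
- A BMP character occupies a single 16‑bit code unit. For example, A (U+0041) is stored as a single unit 0041.
- A supplementary‑plane character occupies two code units.

JavaScript’s string.length property counts UTF‑16 code units, not code points. That means characters above U+FFFF are counted as length 2. The inspector helps you see this clearly by displaying the UTF‑16 units next to the code point.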
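The length behaviour can be checked directly in any JavaScript console. The sketch below uses only standard string methods; the helper names `utf16Units` and `countCodePoints` are illustrative.

```javascript
// List the UTF-16 code units of a string as four-digit hex strings.
function utf16Units(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units;
}

// Count code points: string iteration walks code points, not code units.
function countCodePoints(text) {
  return Array.from(text).length;
}

const smiley = "\u{1F600}";
console.log(smiley.length);           // 2 (code units)
console.log(countCodePoints(smiley)); // 1 (code point)
console.log(utf16Units(smiley));      // [ 'D83D', 'DE00' ]
console.log(utf16Units("A"));         // [ '0041' ]
```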
For code points above U+FFFF, UTF‑16 uses surrogate pairs. If CP is a code point in the range U+10000 to U+10FFFF, the transformation from CP to the two UTF‑16 units can be expressed formally.
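The core relationship can be written as:

$$ v = \mathrm{CP} - \mathtt{0x10000} $$

Then the high and low surrogates are:

$$ \mathrm{high} = \mathtt{0xD800} + \left\lfloor \frac{v}{\mathtt{0x400}} \right\rfloor, \qquad \mathrm{low} = \mathtt{0xDC00} + (v \bmod \mathtt{0x400}) $$

In words: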
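1. Subtract 0x10000 (65536) from the code point.
2. The upper 10 bits of the result (the result divided by 0x400, rounded down), added to 0xD800, give the high surrogate.
3. The lower 10 bits of the result (the remainder modulo 0x400), added to 0xDC00, give the low surrogate.

The inspector applies this logic under the hood when it displays UTF‑16 units for characters outside the BMP.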
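The three steps translate almost line for line into code. This is a minimal sketch of the standard UTF‑16 calculation, not the inspector's own source; the function name `toSurrogatePair` is hypothetical.

```javascript
// Split a supplementary code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Surrogate pairs only exist for U+10000 to U+10FFFF");
  }
  const v = codePoint - 0x10000;    // step 1: subtract 0x10000
  const high = 0xd800 + (v >> 10);  // step 2: upper 10 bits
  const low = 0xdc00 + (v & 0x3ff); // step 3: lower 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
```

The bit shift `v >> 10` is equivalent to integer division by 0x400, and `v & 0x3ff` is equivalent to the remainder modulo 0x400.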
Suppose you paste 😀 into the Character field and click Inspect. The tool will find its code point and derive related data:
- Code point: U+1F600.
- Decimal value: 128512.
- Binary: 0001 1111 0110 0000 0000 (grouped for readability).
- Because the code point is above U+FFFF, UTF‑16 uses a surrogate pair.

Following the surrogate pair formula:
1. CP = 0x1F600.
2. v = CP − 0x10000 = 0xF600.
3. high = 0xD800 + (v / 0x400) = 0xD800 + 0x3D = 0xD83D, using integer division.
4. low = 0xDC00 + (v mod 0x400) = 0xDC00 + 0x200 = 0xDE00.

The inspector shows the exact units used by the runtime, which match these values. It presents the UTF‑16 units so that you can see why JavaScript sees this character as length 2, and how it will look when encoded as bytes.
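The worked example can be checked in the other direction as well. The sketch below recombines a surrogate pair into a code point and lists the bytes of each UTF‑16 unit. Both function names are hypothetical, and big‑endian byte order is an assumption made for the example, since the text does not say which byte order the inspector displays.

```javascript
// Recombine a surrogate pair into the original code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

// Bytes of a string in UTF-16, most significant byte first (big-endian,
// chosen here only for illustration).
function utf16BigEndianBytes(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes.push(unit >> 8, unit & 0xff);
  }
  return bytes;
}

console.log(fromSurrogatePair(0xd83d, 0xde00).toString(16)); // 1f600
console.log(utf16BigEndianBytes("\u{1F600}"));               // [ 216, 61, 222, 0 ]
```

In hexadecimal those four bytes are D8 3D DE 00.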
Once you run the tool, you will usually see several distinct fields in the results. Here is how to read them:
- Code point: the normalized U+HHHH form. Use this for documentation or specifications.
- Hexadecimal: the bare hex value (without U+), often used in programming, terminals, and encoding tables.

The same Unicode character can appear in different notations depending on context. The table below outlines some of the most common ways to express a code point and how they relate to the inspector’s outputs.
| Context | Example notation | Relationship to inspector output |
|---|---|---|
| Unicode standard | U+1F600 | Matches the inspector’s normalized code point field. |
| Hex literal in code | 0x1F600 | Same numeric value as the hex output, using a language‑specific prefix. |
| Decimal code | 128512 | Matches the inspector’s decimal value. |
| JavaScript escape | "\u{1F600}" | Uses the same code point; older syntax may show surrogate units like "\uD83D\uDE00". |
| HTML entity | `&#128512;` or `&#x1F600;` | These numeric character references are based on the decimal and hex outputs. |
| UTF‑16 units | D83D DE00 | Corresponds to the inspector’s UTF‑16 code unit field. |
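Each row of the table can be generated from the numeric code point alone. The following sketch shows one way to do that; the function name `notationsFor` and the property names are invented for the example and do not reflect the inspector's internals.

```javascript
// Hypothetical helper: build the notations from the table for one code point.
function notationsFor(codePoint) {
  const hex = codePoint.toString(16).toUpperCase();
  const char = String.fromCodePoint(codePoint); // throws RangeError if invalid
  const units = [];
  for (let i = 0; i < char.length; i++) {
    units.push(char.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return {
    unicode: "U+" + hex.padStart(4, "0"),
    hexLiteral: "0x" + hex,
    decimal: String(codePoint),
    jsEscape: "\\u{" + hex + "}",                         // ES2015+ syntax
    jsUnitEscape: units.map((u) => "\\u" + u).join(""),   // older syntax
    htmlDecimal: "&#" + codePoint + ";",
    htmlHex: "&#x" + hex + ";",
    utf16Units: units.join(" "),
  };
}

console.log(notationsFor(0x1f600));
```

For U+1F600 the `utf16Units` property is `D83D DE00` and `jsUnitEscape` is `\uD83D\uDE00`, matching the table.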
This inspector is designed as a lightweight, browser‑based helper. It is powerful enough for everyday development and learning tasks, but it does make some assumptions.
- … \p{L} (letters) or \p{Nd} (decimal digits) in Unicode‑aware regex engines.
- Valid code points range from U+0000 to U+10FFFF. Values outside this range are considered invalid.
- Different encodings of the same visible character (for example, é versus e plus combining accent) are treated as distinct sequences of code points.

Keeping these limitations in mind will help you interpret the results accurately and avoid misdiagnosing encoding issues.
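The points about range, property escapes, and composed versus decomposed characters can be explored with standard JavaScript features. This is an illustrative sketch only; the helper names `codePointsOf` and `isValidCodePoint` are hypothetical.

```javascript
const precomposed = "\u00E9"; // é as a single code point
const decomposed = "e\u0301"; // e followed by COMBINING ACUTE ACCENT

// List every code point in a string in U+HHHH form.
function codePointsOf(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Range check matching the U+0000 to U+10FFFF limit.
function isValidCodePoint(value) {
  return Number.isInteger(value) && value >= 0 && value <= 0x10ffff;
}

console.log(codePointsOf(precomposed)); // [ 'U+00E9' ]
console.log(codePointsOf(decomposed));  // [ 'U+0065', 'U+0301' ]
console.log(precomposed === decomposed);                  // false
console.log(decomposed.normalize("NFC") === precomposed); // true

// Unicode property escapes require the u flag in JavaScript.
console.log(/^\p{L}$/u.test(precomposed)); // true
console.log(/^\p{Nd}$/u.test("7"));        // true
```

Normalizing input with `normalize("NFC")` before inspecting it is one way to compare strings that look identical but use different code point sequences.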
Once you are comfortable with the numeric side of Unicode, you can apply what you learn here directly in code, markup, and debugging sessions. Use the inspector as a quick reference whenever you need to confirm a code point, generate an escape sequence, or understand how a particular character is stored in UTF‑16.