What this calculator does
The Hamming distance between two sequences of the same length is the number of positions where the two sequences differ. It is a simple, fast way to measure how many substitutions (and only substitutions) are needed to transform one string into another when insertions and deletions are not allowed.
This calculator compares your two inputs character-by-character from left to right and returns the total mismatch count. Because the comparison is positional, the strings must be the same length for the Hamming distance to be defined.
Definition and formula
Let x and y be two strings (or sequences) of equal length n, with x_i and y_i denoting the characters at position i. Their Hamming distance is:
d(x,y) = Σ_{i=1}^{n} δ(x_i, y_i)
where the indicator function δ is:
δ(a,b) = 1 if a ≠ b
δ(a,b) = 0 if a = b
In plain language: scan both strings at the same positions and add 1 every time the characters differ.
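The scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own implementation (which is not shown here); the function name `hamming_distance` is an assumption, and Python compares Unicode code points rather than JavaScript code units.

```python
def hamming_distance(x: str, y: str) -> int:
    """Count the positions at which x and y differ."""
    if len(x) != len(y):
        # Hamming distance is undefined for inputs of unequal length.
        raise ValueError("inputs must have the same length")
    # zip pairs up characters at the same index; each mismatch adds 1.
    return sum(a != b for a, b in zip(x, y))


print(hamming_distance("karolin", "kathrin"))  # 3
```

Raising an error for unequal lengths, rather than silently truncating to the shorter input, keeps the result consistent with the definition.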
How to interpret the result
- Distance = 0: the strings are identical (same characters in every position).
- Small distance: the strings are very similar under substitution-only changes.
- Larger distance: more positions differ; for length n, the maximum distance is n (every position differs).
If you want a scale-free measure, you can also compute the normalized distance (sometimes used in analysis):
normalized = d(x,y) / n (a value between 0 and 1). This calculator reports the raw Hamming distance (an integer).
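As a rough sketch of the normalized form, the function below divides the mismatch count by the length. The name `normalized_hamming` is hypothetical, and returning 0.0 for two empty strings is an assumption made here to avoid dividing by zero.

```python
def normalized_hamming(x: str, y: str) -> float:
    """Return the fraction of positions at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    if not x:
        # Assumption: two empty strings are treated as identical (0.0).
        return 0.0
    mismatches = sum(a != b for a, b in zip(x, y))
    return mismatches / len(x)


# 2 of 7 positions differ, so the result is 2/7 (about 0.286).
print(normalized_hamming("1011101", "1001001"))
```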
Worked example (step-by-step)
Compare two binary strings of equal length:
x = 1011101
y = 1001001
Compare each position:
- 1 vs 1 (same) → +0
- 0 vs 0 (same) → +0
- 1 vs 0 (different) → +1
- 1 vs 1 (same) → +0
- 1 vs 0 (different) → +1
- 0 vs 0 (same) → +0
- 1 vs 1 (same) → +0
Total mismatches = 2, so d(x,y)=2.
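The same walk-through can be reproduced in code. The short sketch below (variable names are illustrative) records the 1-based positions where the two strings differ, which makes it easy to check a hand count.

```python
x = "1011101"
y = "1001001"

# Collect the 1-based positions where the characters differ.
mismatch_positions = [
    i for i, (a, b) in enumerate(zip(x, y), start=1) if a != b
]

print(mismatch_positions)       # [3, 5]
print(len(mismatch_positions))  # 2
```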
Common use cases
- Error-correcting codes: minimum Hamming distance between valid codewords determines how many bit errors can be detected/corrected.
- Networking and storage: parity checks and ECC RAM rely on differences between received and valid bit patterns.
- Biology (aligned sequences): for DNA/protein strings of equal length (already aligned), Hamming distance counts point mutations.
- Clustering/near-duplicate detection: fixed-length fingerprints or categorical encodings can be compared quickly.
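To make the error-correcting-code use case concrete: by the standard coding-theory result, a code whose minimum distance is d_min can detect up to d_min - 1 bit errors and correct up to floor((d_min - 1) / 2). The sketch below computes these for two small textbook codebooks; the helper names and the codebooks are illustrative choices, not taken from the calculator.

```python
from itertools import combinations


def hamming_distance(x: str, y: str) -> int:
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def minimum_distance(codewords):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


# Illustrative codebooks: 3-bit repetition code and 3-bit even-parity code.
for name, code in [("repetition", ["000", "111"]),
                   ("even parity", ["000", "011", "101", "110"])]:
    d_min = minimum_distance(code)
    detectable = d_min - 1          # errors guaranteed to be detected
    correctable = (d_min - 1) // 2  # errors guaranteed to be corrected
    print(name, d_min, detectable, correctable)
# repetition 3 2 1
# even parity 2 1 0
```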
Hamming vs. other "string distance" measures
| Metric | Requires equal length? | Allowed operations | Typical use |
| --- | --- | --- | --- |
| Hamming distance | Yes | Substitutions only (position-by-position mismatches) | Bitstrings, codes, aligned sequences |
| Levenshtein (edit) distance | No | Insertions, deletions, substitutions | Typos, approximate text matching |
| Jaccard distance | No | Set overlap of tokens/characters (not positional) | Similarity of sets, shingling |
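A single pair of strings can show how differently the three measures behave. The sketch below uses plain textbook implementations (the function names are assumptions, and the Jaccard version works on sets of characters); the inputs are a string and the same string rotated by one position.

```python
def hamming(x: str, y: str) -> int:
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def levenshtein(x: str, y: str) -> int:
    """Edit distance via the classic row-by-row dynamic program."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,                # delete a
                            curr[j - 1] + 1,            # insert b
                            prev[j - 1] + (a != b)))    # substitute or match
        prev = curr
    return prev[-1]


def jaccard_distance(x: str, y: str) -> float:
    """1 minus the overlap of the two character sets (order is ignored)."""
    sx, sy = set(x), set(y)
    union = sx | sy
    return 0.0 if not union else 1 - len(sx & sy) / len(union)


x, y = "abcde", "bcdea"
print(hamming(x, y))           # 5: every position differs
print(levenshtein(x, y))       # 2: delete the leading "a", append "a"
print(jaccard_distance(x, y))  # 0.0: same set of characters
```

A one-position shift is the worst case for Hamming distance, which is why an edit distance is usually the better fit when insertions or deletions are possible.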
Limitations and assumptions (important)
- Equal length is required: if the two inputs are different lengths, Hamming distance is not defined. You must pad/trim intentionally, or use an edit distance (e.g., Levenshtein) if you need insertions/deletions.
- Case sensitivity: uppercase and lowercase letters are treated as different characters (e.g., A ≠ a).
- All characters count: spaces, punctuation, and digits are compared like any other character. If your strings include leading/trailing spaces, they affect the result.
- Unicode/emoji: distance is computed at the level of JavaScript string code units; some composed Unicode characters can behave unexpectedly. For plain ASCII, DNA letters, and typical binary/text inputs, results match intuition.
- Alignment matters: for biological sequences, Hamming distance is meaningful only when sequences are already aligned and the same length.
- Empty strings: two empty strings have distance 0; empty vs. non-empty is invalid because lengths differ.
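Several of these caveats are easy to see in a short experiment. The sketch below is written in Python, not the calculator's JavaScript, so it approximates JavaScript's code-unit view by encoding to UTF-16; `utf16_units` and `hamming` are hypothetical helper names.

```python
import unicodedata


def utf16_units(s: str) -> list:
    """Split s into 2-byte UTF-16 code units (approximates JavaScript indexing)."""
    data = s.encode("utf-16-le")
    return [data[i:i + 2] for i in range(0, len(data), 2)]


def hamming(x, y) -> int:
    """Hamming distance over any two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


# Case and whitespace count like any other character.
print(hamming("A", "a"))      # 1
print(hamming("abc", "ab "))  # 1

# An emoji outside the Basic Multilingual Plane takes two code units,
# so a 2-symbol string can have the same length as a 3-letter string.
emoji_text = "a\U0001F600"
print(len(utf16_units(emoji_text)))                          # 3
print(hamming(utf16_units(emoji_text), utf16_units("abc")))  # 2

# The same visible letter can be stored as one code point or two.
composed = "\u00e9"     # precomposed e-acute
decomposed = "e\u0301"  # "e" followed by a combining acute accent
print(len(composed), len(decomposed))                        # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Normalizing both inputs to the same Unicode form before comparing is one way to reduce surprises with accented text, though it does not change how emoji are counted.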
Quick tips
- To compare binary sequences, use only 0 and 1 and ensure lengths match.
- To ignore case, convert both strings to the same case first (e.g., both lowercase) before calculating.
- If you want a percentage difference, compute 100 × d/n.
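The tips above amount to a little preprocessing before the comparison. A minimal sketch, assuming hypothetical helpers `is_binary` and `hamming_distance`:

```python
def hamming_distance(x: str, y: str) -> int:
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def is_binary(s: str) -> bool:
    """True if s contains only the characters 0 and 1."""
    return all(ch in "01" for ch in s)


# Binary inputs: check the alphabet first, then report a percentage.
x, y = "1011101", "1001001"
if is_binary(x) and is_binary(y):
    d = hamming_distance(x, y)
    percent = 100 * d / len(x)
    print(d, f"{percent:.1f}%")  # 2 28.6%

# Ignoring case: lowercase both inputs before comparing.
a, b = "GATTACA", "gattaca"
print(hamming_distance(a, b))                  # 7
print(hamming_distance(a.lower(), b.lower()))  # 0
```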