ASCII & Unicode
How text becomes numbers.
When you type the letter A on your keyboard, the computer doesn't store a letter — it stores a number. That number is the codepoint for A: 65. When you type B, it stores 66. This mapping from characters to numbers is called a character encoding.
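A quick way to see this mapping (a minimal sketch in Python; `ord` and `chr` are standard built-ins):

```python
# ord() gives the codepoint for a character; chr() reverses it.
print(ord("A"))   # 65
print(ord("B"))   # 66
print(chr(65))    # 'A'
```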
Analogy
Think of a library's card catalog. Every book on the shelves has a call number — 813.54 MOR — and the librarian never moves books by their title, only by that number. The title is for humans; the number is for the shelving system. A character encoding is the same idea applied to writing: every letter, digit, and symbol is assigned a unique number that the computer uses internally, and the "A" you see on screen is just a label on the shelf.
ASCII — the first 128
The foundational encoding is ASCII (American Standard Code for Information Interchange), which covers the basic Latin alphabet, digits, punctuation, and control characters. It fits in 7 bits, so every ASCII codepoint is a number from 0 to 127.
| Character | Codepoint | Hex |
|---|---|---|
| A | 65 | 0x41 |
| a | 97 | 0x61 |
| 0 | 48 | 0x30 |
| ! | 33 | 0x21 |
| space | 32 | 0x20 |
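The same rows can be checked directly (a Python sketch; the character list simply mirrors the table above):

```python
# Print each character with its decimal codepoint and hex form,
# matching the table above.
for ch in ["A", "a", "0", "!", " "]:
    print(repr(ch), ord(ch), hex(ord(ch)))
# 'A' 65 0x41
# 'a' 97 0x61
# '0' 48 0x30
# '!' 33 0x21
# ' ' 32 0x20
```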
Unicode — every character on Earth
ASCII only covers basic English text. Unicode extends it to every writing system, every emoji, every symbol — more than a million possible codepoints, of which roughly 150,000 are assigned so far.
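The same `ord` and `chr` calls work for any Unicode character, not just ASCII (a Python sketch; these particular characters are just illustrative):

```python
# Codepoints beyond ASCII: an accented letter, a CJK character, an emoji.
print(ord("é"), hex(ord("é")))     # 233 0xe9
print(ord("中"), hex(ord("中")))    # 20013 0x4e2d
print(ord("😀"), hex(ord("😀")))    # 128512 0x1f600
print(chr(0x1F600))                 # '😀'
```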
UTF-8 — how Unicode becomes bytes
Computers store bytes, not codepoints. UTF-8 is the dominant encoding that packs Unicode codepoints into bytes: ASCII characters fit in 1 byte, accented Latin letters and most European alphabets in 2, most other scripts in 3, and emoji in 4.
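You can see the 1/2/3/4-byte pattern by encoding a few characters (a Python sketch; `str.encode` uses UTF-8 by default):

```python
# UTF-8 byte lengths grow with the codepoint.
for ch in ["A", "é", "中", "😀"]:
    encoded = ch.encode("utf-8")
    print(repr(ch), len(encoded), encoded)
# 'A' 1 b'A'
# 'é' 2 b'\xc3\xa9'
# '中' 3 b'\xe4\xb8\xad'
# '😀' 4 b'\xf0\x9f\x98\x80'
```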
This is not encryption
Encoding a character as a number isn't a secret. Anyone with the table can reverse it. Keep this in mind as you go — encoding transforms data into a different representation; encryption transforms data so that only the key holder can read it.
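A small illustration of the difference (a Python sketch): decoding UTF-8 requires no key, only the public codepoint table.

```python
# Anyone can reverse an encoding -- no secret required.
data = "héllo".encode("utf-8")
print(data)                   # b'h\xc3\xa9llo'
print(data.decode("utf-8"))   # 'héllo' -- recovered without any key
```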