encoding · level 1

ASCII & Unicode

How text becomes numbers.

100–150 XP

When you type the letter A on your keyboard, the computer doesn't store a letter — it stores a number. That number is the codepoint for A: 65. When you type B, it stores 66. This mapping from characters to numbers is called a character encoding.
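
You can watch this mapping happen directly. Here is a quick sketch in Python, whose built-in ord and chr functions convert between characters and codepoints:

    # Character to codepoint, and back again
    print(ord('A'))   # 65
    print(ord('B'))   # 66
    print(chr(65))    # 'A' -- the reverse direction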

Analogy

Think of a library's card catalog. Every book on the shelves has a call number — 813.54 MOR — and the librarian never moves books by their title, only by that number. The title is for humans; the number is for the shelving system. A character encoding is the same idea applied to writing: every letter, digit, and symbol is assigned a unique number that the computer uses internally, and the "A" you see on screen is just a label on the shelf.

ASCII — the first 128

The foundational encoding is ASCII (American Standard Code for Information Interchange). It covers the basic Latin alphabet, digits, punctuation, and a handful of invisible control characters, and it fits in 7 bits, so every ASCII character is a number from 0 to 127.

Character   Codepoint   Hex
A           65          0x41
a           97          0x61
0           48          0x30
!           33          0x21
(space)     32          0x20
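
You can regenerate this table yourself. A short Python sketch, using the built-in ord and hex functions:

    for ch in ['A', 'a', '0', '!', ' ']:
        print(repr(ch), ord(ch), hex(ord(ch)))
    # 'A' 65 0x41
    # 'a' 97 0x61
    # '0' 48 0x30
    # '!' 33 0x21
    # ' ' 32 0x20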

Unicode — every character on Earth

ASCII covers only basic English text. Unicode extends it to every writing system, every emoji, every symbol: 1,114,112 possible codepoints (U+0000 through U+10FFFF). By convention a codepoint is written as U+ followed by its value in hex, so A is U+0041.
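
A quick look at a few codepoints beyond ASCII, again sketched in Python:

    # ord works for any Unicode character, not just ASCII
    for ch in ['é', 'λ', '€', '🙂']:
        print(ch, f"U+{ord(ch):04X}")
    # é U+00E9
    # λ U+03BB
    # € U+20AC
    # 🙂 U+1F642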

UTF-8 — how Unicode becomes bytes

Computers store bytes, not codepoints. UTF-8 is the dominant encoding that packs Unicode codepoints into bytes: ASCII characters fit in 1 byte, most European accented letters in 2, most other scripts in 3, and emoji in 4. Because ASCII characters keep their exact byte values, every valid ASCII file is already valid UTF-8.
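
You can see the variable widths by encoding one character from each class. A small Python sketch:

    # One character each at 1, 2, 3, and 4 bytes in UTF-8
    for ch in ['A', 'é', '€', '🙂']:
        encoded = ch.encode('utf-8')
        print(ch, len(encoded), encoded)
    # A 1 b'A'
    # é 2 b'\xc3\xa9'
    # € 3 b'\xe2\x82\xac'
    # 🙂 4 b'\xf0\x9f\x99\x82'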

This is not encryption

Encoding a character as a number isn't a secret. Anyone with the table can reverse it. Keep this in mind as you go — encoding transforms data into a different representation; encryption transforms data so that only the key holder can read it.
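
To make that concrete, here is a minimal Python sketch: the bytes round-trip back to text with no key involved, only the shared table (UTF-8):

    data = 'hello, world'.encode('utf-8')  # text -> bytes
    print(data)                            # b'hello, world'
    print(data.decode('utf-8'))            # hello, world -- anyone can reverse it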