You're going to write a JSON parser from scratch in TypeScript — a tokenizer that breaks a JSON string into typed tokens, then a recursive-descent parser that builds a tree from those tokens. By the end, you'll understand exactly what JSON.parse() does under the hood, and you'll have a working parser that handles strings, numbers, booleans, nulls, arrays, and objects.
We're not racing. Each step is one idea. If a hint helps, take it — there's no penalty.
Before you start
macOS doesn't ship with Node, but you may already have it from an earlier install or a version manager. Try node --version first. If you don't have it, install via Homebrew: brew install node@20.
The walkthrough
Set up a Node + TypeScript project
In your terminal, create a folder for this project and initialize it:
mkdir json-parser
cd json-parser
npm init -y
npm install --save-dev typescript vitest
npx tsc --init
This creates a package.json, installs TypeScript and Vitest (a fast test runner), and generates a tsconfig.json with compiler options.
Then create a src/ folder and a src/index.ts file:
mkdir -p src
touch src/index.ts
Define the Token type
A token is one atomic piece of JSON syntax. We'll use a TypeScript discriminated union — a type that can be one of several shapes, and we know which shape by looking at a type field.
At the top of src/index.ts, write:
type Token =
| { type: "String"; value: string }
| { type: "Number"; value: number }
| { type: "Bool"; value: boolean }
| { type: "Null" }
| { type: "OpenBracket" }
| { type: "CloseBracket" }
| { type: "OpenBrace" }
| { type: "CloseBrace" }
| { type: "Colon" }
| { type: "Comma" };
This says a Token is either a string token with a value, or a number token with a value, and so on. When you check the type field, TypeScript narrows the token to the matching shape, so it knows whether a value field exists and what type it holds.
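To see what the type field buys you, here is a small sketch. It uses a cut-down, hypothetical MiniToken union (three of the ten shapes) and an illustrative describeToken helper, neither of which is part of the tutorial's code. Inside each case TypeScript narrows the token to one shape, and the never assignment in default is one common way to get a compile error when a shape is left unhandled.

```typescript
// MiniToken is a hypothetical, shortened version of Token for illustration.
type MiniToken =
  | { type: "String"; value: string }
  | { type: "Number"; value: number }
  | { type: "Null" };

// describeToken is an illustrative helper, not part of the parser.
function describeToken(token: MiniToken): string {
  switch (token.type) {
    case "String":
      // Narrowed: token.value is a string here.
      return `string of length ${token.value.length}`;
    case "Number":
      // Narrowed: token.value is a number here.
      return `number ${token.value.toFixed(1)}`;
    case "Null":
      // Narrowed: this shape has no value field at all.
      return "null";
    default: {
      // If a shape is added to MiniToken but not handled above,
      // this assignment stops compiling.
      const unreachable: never = token;
      return unreachable;
    }
  }
}

console.log(describeToken({ type: "String", value: "hi" }));
console.log(describeToken({ type: "Number", value: 42 }));
```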
Handle quoted strings
Strings in JSON are wrapped in double quotes and can contain escape sequences like \n, \", \\. Write a tokenizeString() helper that reads characters until it finds the closing quote.
Then call it from a tokenize(input: string) function that scans the input and produces a Token[]:
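function tokenizeString(input: string, start: number): { token: Token; nextIndex: number } {
  // Single-character escapes defined by JSON, mapped to the character they stand for
  const escapes: Record<string, string> = {
    '"': '"', "\\": "\\", "/": "/", b: "\b", f: "\f", n: "\n", r: "\r", t: "\t",
  };
  let i = start + 1; // skip opening quote
  let value = "";
  while (i < input.length) {
    if (input[i] === '"') {
      return { token: { type: "String", value }, nextIndex: i + 1 };
    }
    if (input[i] === "\\") {
      i++;
      if (i >= input.length) throw new Error("Unterminated string");
      const char = input[i];
      if (char === "u") {
        // \uXXXX: exactly four hex digits naming one UTF-16 code unit
        const hex = input.slice(i + 1, i + 5);
        if (!/^[0-9a-fA-F]{4}$/.test(hex)) throw new Error(`Invalid unicode escape: \\u${hex}`);
        value += String.fromCharCode(parseInt(hex, 16));
        i += 4; // the i++ below steps past the last hex digit
      } else if (char in escapes) {
        value += escapes[char];
      } else {
        throw new Error(`Invalid escape sequence: \\${char}`);
      }
    } else {
      value += input[i];
    }
    i++;
  }
  throw new Error("Unterminated string");
}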
function tokenize(input: string): Token[] {
const tokens: Token[] = [];
let i = 0;
while (i < input.length) {
const char = input[i];
if (/\s/.test(char)) {
i++; // skip whitespace
continue;
}
if (char === '"') {
const { token, nextIndex } = tokenizeString(input, i);
tokens.push(token);
i = nextIndex;
continue;
}
// ... we'll add more token types next
throw new Error(`Unexpected character: ${char}`);
}
return tokens;
}
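Test it: add export in front of function tokenize in src/index.ts, then in a new file src/test.ts, write:
import { tokenize } from "./index.js";
console.log(tokenize('"hello" "world"'));
The input is two bare strings rather than an array because brackets and commas aren't tokenized yet. Run npx ts-node src/test.ts (npx will offer to download ts-node if it isn't installed) and you should see the String tokens.
[
{ type: 'String', value: 'hello' },
{ type: 'String', value: 'world' }
]
Handle numeric literals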
Add a tokenizeNumber() helper and wire it into tokenize(). Numbers in JSON are integers, decimals, and exponents: 42, 3.14, 1e10, -0.5.
function tokenizeNumber(input: string, start: number): { token: Token; nextIndex: number } {
let i = start;
if (input[i] === "-") i++; // optional minus sign
while (i < input.length && /\d/.test(input[i])) i++; // integer part
if (input[i] === ".") {
i++;
while (i < input.length && /\d/.test(input[i])) i++; // decimal part
}
if (input[i] === "e" || input[i] === "E") {
i++;
if (input[i] === "+" || input[i] === "-") i++; // exponent sign
while (i < input.length && /\d/.test(input[i])) i++; // exponent digits
}
const numStr = input.slice(start, i);
const value = parseFloat(numStr);
if (isNaN(value)) throw new Error(`Invalid number: ${numStr}`);
return { token: { type: "Number", value }, nextIndex: i };
}
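One caveat: tokenizeNumber() is more permissive than the JSON grammar, because parseFloat accepts inputs such as 01, 1. and 1e that the JSON specification rejects. If you want to tighten it, one option is to check the sliced text against a pattern before converting. The sketch below assumes a hypothetical isJsonNumber helper; the regular expression follows the number grammar in RFC 8259.

```typescript
// Hypothetical helper: checks text against the JSON number grammar
// (optional minus, integer part without leading zeros, optional
// fraction, optional exponent).
const JSON_NUMBER = /^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$/;

function isJsonNumber(text: string): boolean {
  return JSON_NUMBER.test(text);
}

for (const sample of ["42", "-0.5", "1e10", "01", "1.", "1e", "-"]) {
  console.log(sample, isJsonNumber(sample));
}
```

Inside tokenizeNumber(), you could call this on numStr and throw when it returns false.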
In the tokenize() function, add this branch inside the main loop (before the throw):
if (char === "-" || /\d/.test(char)) {
const { token, nextIndex } = tokenizeNumber(input, i);
tokens.push(token);
i = nextIndex;
continue;
}
Test with console.log(tokenize('42 3.14 -1e5')); (no brackets or commas yet, since those tokens come in a later step). You should see three Number tokens.
[
{ type: 'Number', value: 42 },
{ type: 'Number', value: 3.14 },
{ type: 'Number', value: -100000 }
]
Handle true, false, null
Add a helper for the three keyword tokens:
function tokenizeKeyword(input: string, start: number): { token: Token; nextIndex: number } {
if (input.slice(start, start + 4) === "true") {
return { token: { type: "Bool", value: true }, nextIndex: start + 4 };
}
if (input.slice(start, start + 5) === "false") {
return { token: { type: "Bool", value: false }, nextIndex: start + 5 };
}
if (input.slice(start, start + 4) === "null") {
return { token: { type: "Null" }, nextIndex: start + 4 };
}
throw new Error(`Unknown keyword at position ${start}`);
}
In tokenize(), before the throw, add:
if (char === "t" || char === "f" || char === "n") {
const { token, nextIndex } = tokenizeKeyword(input, i);
tokens.push(token);
i = nextIndex;
continue;
}
Test with console.log(tokenize('true false null')); (still without brackets, which arrive in the next step). You should see three tokens.
[
{ type: 'Bool', value: true },
{ type: 'Bool', value: false },
{ type: 'Null' }
]
Handle brackets, braces, colons, commas
These are single characters. Add this to tokenize() before the keyword branch:
if (char === "[") {
tokens.push({ type: "OpenBracket" });
i++;
continue;
}
if (char === "]") {
tokens.push({ type: "CloseBracket" });
i++;
continue;
}
if (char === "{") {
tokens.push({ type: "OpenBrace" });
i++;
continue;
}
if (char === "}") {
tokens.push({ type: "CloseBrace" });
i++;
continue;
}
if (char === ":") {
tokens.push({ type: "Colon" });
i++;
continue;
}
if (char === ",") {
tokens.push({ type: "Comma" });
i++;
continue;
}
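The six branches above differ only in the character and the token name, so they can also be driven by a lookup table. This is an optional alternative, sketched with a hypothetical PUNCTUATION map and punctuationFor helper; the token names match the Token type from earlier.

```typescript
// Hypothetical alternative to the six if-blocks: one table, one lookup.
type PunctuationType =
  | "OpenBracket" | "CloseBracket"
  | "OpenBrace" | "CloseBrace"
  | "Colon" | "Comma";

const PUNCTUATION = new Map<string, PunctuationType>([
  ["[", "OpenBracket"],
  ["]", "CloseBracket"],
  ["{", "OpenBrace"],
  ["}", "CloseBrace"],
  [":", "Colon"],
  [",", "Comma"],
]);

// Returns the punctuation token for a character, or undefined if the
// character is not JSON punctuation.
function punctuationFor(char: string): { type: PunctuationType } | undefined {
  const type = PUNCTUATION.get(char);
  return type === undefined ? undefined : { type };
}

console.log(punctuationFor("["));
console.log(punctuationFor("x"));
```

In tokenize(), a single branch could then push the result and advance i by one whenever punctuationFor(char) is defined.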
Test with console.log(tokenize('{"key": [1, 2], "nested": null}')); — you should see all token types together.
[
{ type: 'OpenBrace' },
{ type: 'String', value: 'key' },
{ type: 'Colon' },
{ type: 'OpenBracket' },
{ type: 'Number', value: 1 },
{ type: 'Comma' },
{ type: 'Number', value: 2 },
{ type: 'CloseBracket' },
{ type: 'Comma' },
{ type: 'String', value: 'nested' },
{ type: 'Colon' },
{ type: 'Null' },
{ type: 'CloseBrace' }
]
What does a discriminated union allow TypeScript to do?
Define the JsonValue type
Now that we have tokens, we need to turn them into actual values. Define:
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
This says: a JSON value is either a primitive (string, number, bool, null) or an array of JSON values or an object whose values are JSON values. The recursive definition handles nesting.
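Because the type refers to itself, functions that consume a JsonValue are usually recursive too. As a small illustration (the nestingDepth helper is hypothetical and not needed by the parser), this sketch walks a value and reports how deeply its arrays and objects are nested.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Hypothetical helper: primitives have depth 0, and each array or
// object adds one level on top of its deepest child.
function nestingDepth(value: JsonValue): number {
  if (value === null || typeof value !== "object") return 0;
  const children = Array.isArray(value) ? value : Object.values(value);
  let deepest = 0;
  for (const child of children) {
    deepest = Math.max(deepest, nestingDepth(child));
  }
  return 1 + deepest;
}

console.log(nestingDepth(42));
console.log(nestingDepth({ users: [{ name: "Alice" }] }));
```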
Also, create a parser class to hold state:
class JsonParser {
private tokens: Token[];
private position: number = 0;
constructor(tokens: Token[]) {
this.tokens = tokens;
}
private peek(): Token | undefined {
return this.tokens[this.position];
}
private next(): Token | undefined {
return this.tokens[this.position++];
}
private expect(type: string): Token {
const token = this.next();
if (!token || token.type !== type) {
throw new Error(`Expected ${type}, got ${token?.type ?? "EOF"}`);
}
return token;
}
parse(): JsonValue {
const value = this.parseValue();
if (this.peek()) throw new Error("Unexpected token after JSON");
return value;
}
private parseValue(): JsonValue {
const token = this.peek();
if (!token) throw new Error("Unexpected end of input");
// we'll implement each case next
throw new Error(`Unexpected token: ${token.type}`);
}
}
Parse primitive values
Fill in parseValue() to handle single-token cases:
private parseValue(): JsonValue {
  const token = this.peek();
  if (!token) throw new Error("Unexpected end of input");
  // Checking token.type narrows the union, so token.value has the right type in each branch
  if (token.type === "String") {
    this.next(); // consume the token we just peeked at
    return token.value; // string
  }
  if (token.type === "Number") {
    this.next();
    return token.value; // number
  }
  if (token.type === "Bool") {
    this.next();
    return token.value; // boolean
  }
  if (token.type === "Null") {
    this.next();
    return null;
  }
  if (token.type === "OpenBracket") return this.parseArray();
  if (token.type === "OpenBrace") return this.parseObject();
  throw new Error(`Unexpected token: ${token.type}`);
}
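parseValue() now calls parseArray() and parseObject(), which you'll write in the next two steps, so the file won't compile until both exist. Once they do, test with:
const input = '["hello", 42, true, null]';
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
console.log(parser.parse());
You should see:
[ 'hello', 42, true, null ]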
What does type narrowing do in TypeScript?
Parse arrays
Arrays are [ then a comma-separated list of values, then ]. Here's the function:
private parseArray(): JsonValue[] {
this.expect("OpenBracket");
const items: JsonValue[] = [];
if (this.peek()?.type === "CloseBracket") {
this.next();
return items;
}
while (true) {
items.push(this.parseValue());
if (this.peek()?.type === "CloseBracket") {
this.next();
return items;
}
this.expect("Comma");
}
}
Test with:
const input = '[1, "two", [3, 4], true]';
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
console.log(parser.parse());
You should see nested arrays work.
[ 1, 'two', [ 3, 4 ], true ]
Parse objects
Objects are { then key-value pairs separated by commas (key is always a string, then :, then a value), then }:
private parseObject(): { [key: string]: JsonValue } {
this.expect("OpenBrace");
const obj: { [key: string]: JsonValue } = {};
if (this.peek()?.type === "CloseBrace") {
this.next();
return obj;
}
while (true) {
const keyToken = this.expect("String") as { type: "String"; value: string };
const key = keyToken.value;
this.expect("Colon");
const value = this.parseValue();
obj[key] = value;
if (this.peek()?.type === "CloseBrace") {
this.next();
return obj;
}
this.expect("Comma");
}
}
Test with:
const input = '{"name": "Alice", "age": 30, "scores": [95, 87]}';
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
console.log(parser.parse());
You should see:
{ name: 'Alice', age: 30, scores: [ 95, 87 ] }
Add line + column numbers to errors
When parsing fails, "Unexpected token" isn't helpful. Add line and column tracking to make errors actionable. Modify the Token type to include position:
type Token =
| { type: "String"; value: string; line: number; col: number }
| { type: "Number"; value: number; line: number; col: number }
| { type: "Bool"; value: boolean; line: number; col: number }
| { type: "Null"; line: number; col: number }
| { type: "OpenBracket"; line: number; col: number }
| { type: "CloseBracket"; line: number; col: number }
| { type: "OpenBrace"; line: number; col: number }
| { type: "CloseBrace"; line: number; col: number }
| { type: "Colon"; line: number; col: number }
| { type: "Comma"; line: number; col: number };
Then update tokenize() to track position:
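function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  let line = 1;
  let col = 1;
  while (i < input.length) {
    // Remember where this token starts before consuming anything
    const startLine = line;
    const startCol = col;
    const startIndex = i;
    // ... run the same branches as before, advancing i past whitespace or one token.
    // Each branch should fall through to the update below instead of calling continue ...
    // ... when a branch produces a token, push it with line: startLine, col: startCol ...
    // Move line/col across everything just consumed (whitespace included)
    for (const char of input.slice(startIndex, i)) {
      if (char === "\n") {
        line++;
        col = 1;
      } else {
        col++;
      }
    }
  }
  return tokens;
}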
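The line and column update can live in a small helper so every branch of tokenize() shares it. Here is one way to sketch it, assuming a hypothetical Position type and advancePosition function (neither name comes from the tutorial). Lines and columns are 1-based, matching the initial values above.

```typescript
// Hypothetical names for illustration.
interface Position {
  line: number;
  col: number;
}

// Returns the position reached after consuming `text` starting at `from`.
function advancePosition(from: Position, text: string): Position {
  let { line, col } = from;
  for (const char of text) {
    if (char === "\n") {
      // A newline moves to the start of the next line.
      line++;
      col = 1;
    } else {
      col++;
    }
  }
  return { line, col };
}

const origin: Position = { line: 1, col: 1 };
console.log(advancePosition(origin, '{"a":'));
console.log(advancePosition(origin, "{\n  "));
```

In tokenize(), you would record the position before consuming a token, attach it to the token, and then advance over the consumed slice of the input. Whitespace needs the same treatment, since that is where most newlines appear.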
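Then in the parser's expect(), use this info in error messages. Handle the end-of-input case separately first, because at EOF there is no token to read a position from: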
throw new Error(
`Unexpected token at line ${token.line}, col ${token.col}: expected ${type}, got ${token.type}`
);
Test with invalid JSON to see better errors. For example, [12 {}] should produce something like:
Unexpected token at line 1, col 5: expected Comma, got OpenBrace
Write unit tests with Vitest
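Create src/index.test.ts and write tests covering happy paths and error cases. The tests import tokenize and JsonParser, so add export in front of both declarations in src/index.ts first:
import { describe, it, expect } from "vitest";
import { tokenize, JsonParser } from "./index.js";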
describe("JSON Parser", () => {
it("parses primitives", () => {
const run = (input: string) => {
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
return parser.parse();
};
expect(run('"hello"')).toBe("hello");
expect(run("42")).toBe(42);
expect(run("3.14")).toBe(3.14);
expect(run("true")).toBe(true);
expect(run("false")).toBe(false);
expect(run("null")).toBe(null);
});
it("parses arrays", () => {
const tokens = tokenize("[1, 2, 3]");
const parser = new JsonParser(tokens);
expect(parser.parse()).toEqual([1, 2, 3]);
});
it("parses empty arrays", () => {
const tokens = tokenize("[]");
const parser = new JsonParser(tokens);
expect(parser.parse()).toEqual([]);
});
it("parses objects", () => {
const tokens = tokenize('{"x": 1, "y": 2}');
const parser = new JsonParser(tokens);
expect(parser.parse()).toEqual({ x: 1, y: 2 });
});
it("parses nested structures", () => {
const input = '{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}';
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
const result = parser.parse() as { users: any[] };
expect(result.users).toHaveLength(2);
expect((result.users[0] as any).name).toBe("Alice");
});
it("rejects unterminated strings", () => {
expect(() => {
tokenize('"hello');
}).toThrow("Unterminated string");
});
it("rejects trailing commas", () => {
expect(() => {
const tokens = tokenize("[1, 2, 3,]");
const parser = new JsonParser(tokens);
parser.parse();
}).toThrow();
});
it("rejects non-string object keys", () => {
expect(() => {
const tokens = tokenize("{1: 2}");
const parser = new JsonParser(tokens);
parser.parse();
}).toThrow();
});
});
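Run npx vitest run src/index.test.ts and check they all pass (without run, Vitest stays open in watch mode). If you'd rather type npm test, set the test script in package.json to vitest run.
✓ parses primitives
✓ parses arrays
✓ parses empty arrays
✓ parses objects
✓ parses nested structures
✓ rejects unterminated strings
✓ rejects trailing commas
✓ rejects non-string object keys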
Export a public parse function
Create a public entry point that users call. At the end of src/index.ts, add:
export function parse(input: string): JsonValue {
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
return parser.parse();
}
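Then consumers can do the following. parse() returns the broad JsonValue type, so the result is cast before a field is read:
import { parse } from "./index.js";
const data = parse('{"x": 1}') as { x: number };
console.log(data.x); // 1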
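Because parse() returns the broad JsonValue type, callers need to narrow the result before reading fields from it. One option is a type guard. The sketch below uses a hypothetical isJsonObject guard and, to stay self-contained, a hand-written value in place of a real parse() call.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Hypothetical guard: true only for plain JSON objects (not arrays or null).
function isJsonObject(value: JsonValue): value is { [key: string]: JsonValue } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Stand-in for the result of parse('{"x": 1}').
const parsed: JsonValue = { x: 1 };

if (isJsonObject(parsed)) {
  // Inside this branch, property lookups are allowed by the index signature.
  console.log(parsed["x"]);
}
```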
Create a small example file, src/example.ts:
import { parse } from "./index.js";
const jsonString = `
{
"title": "JSON Parser",
"difficulty": "beginner",
"steps": 14,
"complete": true
}
`;
const data = parse(jsonString);
console.log("Parsed successfully:");
console.log(data);
Run it: npx ts-node src/example.ts
Parsed successfully:
{
title: 'JSON Parser',
difficulty: 'beginner',
steps: 14,
complete: true
}
README + benchmark
Create a README.md in the project root:
# JSON Parser from Scratch
A working JSON parser in TypeScript — tokenizer + recursive-descent parser, zero dependencies.
## Usage
```typescript
import { parse } from "./src/index.js";
const data = parse('{"name": "Alice", "age": 30}') as { name: string; age: number };
console.log(data.name); // "Alice"
```
## What's inside
- **Tokenizer**: breaks a JSON string into typed tokens (String, Number, Bool, Null, punctuation)
- **Parser**: a recursive-descent parser that builds a JSON value tree from tokens
- **Error messages**: line + column info on parse failures
## Run tests
```bash
npm test
```
## Compare to JSON.parse
How fast is our parser vs. the built-in?
```typescript
import { performance } from "perf_hooks";
import { parse as ourParse } from "./src/index.js";
const largeJson = JSON.stringify(Array(1000).fill({ x: 1, y: 2 }));
const t0 = performance.now();
for (let i = 0; i < 1000; i++) ourParse(largeJson);
const t1 = performance.now();
const t2 = performance.now();
for (let i = 0; i < 1000; i++) JSON.parse(largeJson);
const t3 = performance.now();
console.log(`Our parser: ${(t1 - t0).toFixed(2)}ms`);
console.log(`JSON.parse: ${(t3 - t2).toFixed(2)}ms`);
console.log(`Ratio: ${((t1 - t0) / (t3 - t2)).toFixed(1)}x slower`);
```
Run it: `npx ts-node benchmark.ts`
You'll likely see our parser come out roughly 10–50x slower than the built-in, which is native C++ code inside the JavaScript engine. That's expected, but now you *know why*.
Add a benchmark.ts file and run it. Save the output in the README.
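Single timed loops can be noisy, partly because the JavaScript engine optimizes code as it runs. If the numbers jump around, one option is to warm up before timing and report a per-call average. The averageMs helper below is a hypothetical sketch, shown here timing only JSON.parse so that it stays self-contained.

```typescript
import { performance } from "perf_hooks";

// Hypothetical helper: runs fn a few times to warm up, then returns the
// average time per call in milliseconds over `iterations` timed runs.
function averageMs(fn: () => void, iterations: number, warmup = 10): number {
  for (let i = 0; i < warmup; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

const sample = JSON.stringify(Array(1000).fill({ x: 1, y: 2 }));
const perCall = averageMs(() => { JSON.parse(sample); }, 100);
console.log(`JSON.parse: ${perCall.toFixed(4)}ms per call`);
```

The same helper could wrap a call to your own parse() to produce a comparable figure.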
You did it
Run npx vitest to confirm everything passes. Run npx ts-node src/example.ts to see it parse real JSON. Run the benchmark to see how much slower you are than the built-in, and reflect on why (hint: JSON.parse is native C++ code inside the engine, not TypeScript).
You wrote:
- A tokenizer that turns characters into typed tokens
- A recursive-descent parser that builds trees from tokens
- A discriminated union type so TypeScript knows which token you have
- Type narrowing inside if branches to safely access token fields
- Error messages with line + column numbers
- A test suite covering happy paths and error cases
- An export API that's simple to use
- A benchmark so you can measure performance
Every one of those is a tool you'll use again — in the next language, in the next domain, everywhere.