Beginner · TypeScript · ~120 min

📄 JSON Parser from Scratch

You're going to write a JSON parser from scratch in TypeScript — a tokenizer that breaks a JSON string into typed tokens, then a recursive-descent parser that builds a tree from those tokens. By the end, you'll understand exactly what JSON.parse() does under the hood, and you'll have a working parser that handles strings, numbers, booleans, nulls, arrays, and objects.

We're not racing. Each step is one idea. If a hint helps, take it — there's no penalty.

Before you start

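Node.js doesn't come preinstalled on macOS. Check whether you already have it with node --version. If you don't, install it with Homebrew: brew install node@20. That versioned formula is keg-only, so follow the instructions Homebrew prints to add it to your PATH (or run brew install node to get the current release instead).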

The walkthrough

Set up a Node + TypeScript project

In your terminal, create a folder for this project and initialize it:

mkdir json-parser
cd json-parser
npm init -y
npm install --save-dev typescript vitest tsx
npx tsc --init

This creates a package.json, installs TypeScript, Vitest (a fast test runner), and tsx (which runs .ts files directly; you'll use it for quick checks along the way), and generates a tsconfig.json with compiler options.

Then create a src/ folder and a src/index.ts file:

mkdir -p src
touch src/index.ts

Define the Token type

A token is one atomic piece of JSON syntax. We'll use a TypeScript discriminated union — a type that can be one of several shapes, and we know which shape by looking at a type field.

At the top of src/index.ts, write:

type Token =
  | { type: "String"; value: string }
  | { type: "Number"; value: number }
  | { type: "Bool"; value: boolean }
  | { type: "Null" }
  | { type: "OpenBracket" }
  | { type: "CloseBracket" }
  | { type: "OpenBrace" }
  | { type: "CloseBrace" }
  | { type: "Colon" }
  | { type: "Comma" };

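This says a Token is either a string token with a value, or a number token with a value, and so on. Whenever you check the type field, TypeScript narrows the token to the matching shape, so it knows token.value exists inside a "String" branch and doesn't exist inside a "Comma" branch.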
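To see what checking the type field buys you, here is a standalone sketch; describeToken is a hypothetical helper, not part of the parser. Inside each case TypeScript narrows token to one variant, so token.value is only reachable where it exists. TypeScript won't insist on exhaustive handling by itself, but assigning the leftover value to never in the default branch turns a forgotten variant into a compile error:

```typescript
type Token =
  | { type: "String"; value: string }
  | { type: "Number"; value: number }
  | { type: "Bool"; value: boolean }
  | { type: "Null" }
  | { type: "OpenBracket" }
  | { type: "CloseBracket" }
  | { type: "OpenBrace" }
  | { type: "CloseBrace" }
  | { type: "Colon" }
  | { type: "Comma" };

// Hypothetical helper: render a token the way it would appear in JSON text
function describeToken(token: Token): string {
  switch (token.type) {
    case "String":
      return JSON.stringify(token.value); // narrowed: value is a string here
    case "Number":
    case "Bool":
      return String(token.value); // narrowed to the Number | Bool variants
    case "Null":
      return "null";
    case "OpenBracket":
      return "[";
    case "CloseBracket":
      return "]";
    case "OpenBrace":
      return "{";
    case "CloseBrace":
      return "}";
    case "Colon":
      return ":";
    case "Comma":
      return ",";
    default: {
      // Every variant is handled above, so `token` is `never` here.
      // Add a variant to Token without a case and this line stops compiling.
      const unhandled: never = token;
      throw new Error(`Unhandled token: ${JSON.stringify(unhandled)}`);
    }
  }
}

const sample: Token[] = [
  { type: "OpenBracket" },
  { type: "String", value: "hi" },
  { type: "Comma" },
  { type: "Number", value: 2 },
  { type: "CloseBracket" },
];
console.log(sample.map(describeToken).join("")); // ["hi",2]
```

The never check costs nothing at runtime; it exists purely so the compiler reminds you to update every switch when the union grows.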


Handle quoted strings

Strings in JSON are wrapped in double quotes and can contain backslash escape sequences: \", \\, \/, \b, \f, \n, \r, \t, and \uXXXX (exactly four hex digits). Write a tokenizeString() helper that reads characters until it finds the closing quote, decoding escapes as it goes.

Then call it from a tokenize(input: string) function that scans the input and produces a Token[]:

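function tokenizeString(input: string, start: number): { token: Token; nextIndex: number } {
  // What each single-character escape (the character after a backslash) stands for
  const escapes: Record<string, string> = {
    '"': '"', "\\": "\\", "/": "/", b: "\b", f: "\f", n: "\n", r: "\r", t: "\t",
  };
  let i = start + 1; // skip opening quote
  let value = "";
  while (i < input.length) {
    const char = input[i];
    if (char === '"') {
      return { token: { type: "String", value }, nextIndex: i + 1 };
    }
    // JSON forbids raw control characters (tabs, newlines, ...) inside strings
    if (char < " ") throw new Error(`Unescaped control character in string at position ${i}`);
    if (char === "\\") {
      i++;
      if (i >= input.length) throw new Error("Unterminated string");
      const esc = input[i];
      if (esc === "u") {
        // \uXXXX: four hex digits name one UTF-16 code unit
        const hex = input.slice(i + 1, i + 5);
        if (!/^[0-9a-fA-F]{4}$/.test(hex)) throw new Error(`Invalid \\u escape at position ${i}`);
        value += String.fromCharCode(parseInt(hex, 16));
        i += 4; // the i++ below then steps past the last hex digit
      } else {
        const decoded = escapes[esc];
        if (decoded === undefined) throw new Error(`Invalid escape \\${esc} at position ${i}`);
        value += decoded;
      }
    } else {
      value += char;
    }
    i++;
  }
  throw new Error("Unterminated string");
}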

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < input.length) {
    const char = input[i];
    if (/\s/.test(char)) {
      i++; // skip whitespace
      continue;
    }
    if (char === &apos;"&apos;) {
      const { token, nextIndex } = tokenizeString(input, i);
      tokens.push(token);
      i = nextIndex;
      continue;
    }
    // ... we'll add more token types next
    throw new Error(`Unexpected character: ${char}`);
  }
  return tokens;
}

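Test it. Because the check lives in its own file, tokenize() has to be exported: put export in front of its declaration (export function tokenize(input: string): Token[]). Then create src/test.ts:

import { tokenize } from "./index.js"; // the .js path resolves to index.ts

// No brackets or commas yet: at this stage the tokenizer only knows strings
console.log(tokenize('"hello" "world"'));

Run npx tsx src/test.ts and you should see the two String tokens.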

After this step you should see…
[
  { type: 'String', value: 'hello' },
  { type: 'String', value: 'world' }
]
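JSON's \uXXXX escape leans on a property of JavaScript strings: they're sequences of UTF-16 code units, so four hex digits decode to exactly one unit with String.fromCharCode. Characters outside the Basic Multilingual Plane, such as emoji, are written in JSON as two escapes (a surrogate pair), and concatenating the two decoded units reassembles the character. A standalone sketch, with decodeHex4 as an illustrative helper name, checked against JSON.parse:

```typescript
// Illustrative helper: decode the four hex digits that follow "\u" in a JSON string
function decodeHex4(hex: string): string {
  if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
    throw new Error(`Invalid \\u escape: ${hex}`);
  }
  // Each \u escape names exactly one UTF-16 code unit
  return String.fromCharCode(parseInt(hex, 16));
}

const accented = decodeHex4("00e9"); // a single code unit
// U+1F600 lies outside the BMP, so JSON spells it as a surrogate pair
const emoji = decodeHex4("D83D") + decodeHex4("DE00");

console.log(accented === "é"); // true
console.log(emoji === "😀"); // true
console.log(emoji === JSON.parse('"\\uD83D\\uDE00"')); // true
console.log(emoji.length); // 2 (two code units, one visible character)
```

Because the tokenizer decodes each escape independently and appends the result, surrogate pairs need no special case.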

Handle numeric literals

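Add a tokenizeNumber() helper and wire it into tokenize(). A JSON number is an optional minus sign, an integer part (either a lone 0 or digits with no leading zero), an optional fraction, and an optional exponent: 42, 3.14, 1e10, -0.5. Forms such as 01, .5, +1, or a 1. with no digits after the point are not valid JSON.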

function tokenizeNumber(input: string, start: number): { token: Token; nextIndex: number } {
  let i = start;
  // Consume one or more digits; JSON never allows an empty digit run
  const readDigits = () => {
    const from = i;
    while (i < input.length && /\d/.test(input[i])) i++;
    if (i === from) throw new Error(`Invalid number at position ${start}`);
  };
  if (input[i] === "-") i++; // optional minus sign
  if (input[i] === "0") {
    i++; // a leading 0 can't be followed by more digits, so the integer part ends here
  } else {
    readDigits(); // integer part
  }
  if (input[i] === ".") {
    i++;
    readDigits(); // fraction part
  }
  if (input[i] === "e" || input[i] === "E") {
    i++;
    if (input[i] === "+" || input[i] === "-") i++; // exponent sign
    readDigits(); // exponent digits
  }
  const numStr = input.slice(start, i);
  return { token: { type: "Number", value: parseFloat(numStr) }, nextIndex: i };
}

In the tokenize() function, add this branch inside the main loop (before the throw):

if (char === "-" || /\d/.test(char)) {
  const { token, nextIndex } = tokenizeNumber(input, i);
  tokens.push(token);
  i = nextIndex;
  continue;
}

Test with console.log(tokenize('42 3.14 -1e5')); and you should see three Number tokens. (Leave out brackets and commas for now: the tokenizer doesn't know them until a later step.)

After this step you should see…
[
  { type: 'Number', value: 42 },
  { type: 'Number', value: 3.14 },
  { type: 'Number', value: -100000 }
]
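The JSON number grammar also fits in a single regular expression, which makes a handy cross-check for a hand-written scanner. The sketch below (JSON_NUMBER and jsonParseAccepts are illustrative names) compares the pattern with JSON.parse on a few valid and invalid samples. The pattern requires at least one digit after a decimal point or exponent marker and allows a leading 0 only on its own:

```typescript
// The JSON number grammar: optional minus, then a lone 0 or a nonzero digit
// followed by more digits, then an optional fraction and an optional exponent.
const JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

// JSON.parse is the reference: does it accept `text` as a complete document?
function jsonParseAccepts(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const samples = ["42", "3.14", "1e10", "-0.5", "0", "-1E+5", "01", "1.", ".5", "-", "1e", "+1"];
for (const s of samples) {
  // Prints the sample, then what the pattern says, then what JSON.parse says
  console.log(s.padEnd(6), JSON_NUMBER.test(s), jsonParseAccepts(s));
}
```

The two booleans agree on every sample. Any string the pattern rejects, such as 01 or 1e, should never come out of the tokenizer as a single Number token.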

Handle true, false, null

Add a helper for the three keyword tokens:

function tokenizeKeyword(input: string, start: number): { token: Token; nextIndex: number } {
  if (input.slice(start, start + 4) === "true") {
    return { token: { type: "Bool", value: true }, nextIndex: start + 4 };
  }
  if (input.slice(start, start + 5) === "false") {
    return { token: { type: "Bool", value: false }, nextIndex: start + 5 };
  }
  if (input.slice(start, start + 4) === "null") {
    return { token: { type: "Null" }, nextIndex: start + 4 };
  }
  throw new Error(`Unknown keyword at position ${start}`);
}

In tokenize(), before the throw, add:

if (char === "t" || char === "f" || char === "n") {
  const { token, nextIndex } = tokenizeKeyword(input, i);
  tokens.push(token);
  i = nextIndex;
  continue;
}

Test with console.log(tokenize('true false null')); and you should see three tokens.

After this step you should see…
[
  { type: 'Bool', value: true },
  { type: 'Bool', value: false },
  { type: 'Null' }
]
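One alternative shape for tokenizeKeyword() is a small lookup table. In this sketch (KEYWORDS and matchKeyword are hypothetical names), String.prototype.startsWith(search, position) checks for each keyword in place without slicing, and an extra check rejects a keyword that runs straight into more letters, so input like nullx fails with a keyword error rather than a later unexpected-character error:

```typescript
type KeywordToken = { type: "Bool"; value: boolean } | { type: "Null" };

// Each JSON keyword and the token it produces
const KEYWORDS: ReadonlyArray<readonly [string, KeywordToken]> = [
  ["true", { type: "Bool", value: true }],
  ["false", { type: "Bool", value: false }],
  ["null", { type: "Null" }],
];

function matchKeyword(input: string, start: number): { token: KeywordToken; nextIndex: number } {
  for (const [word, token] of KEYWORDS) {
    // startsWith(search, position) compares in place, without slicing
    if (input.startsWith(word, start)) {
      const end = start + word.length;
      // Reject a keyword that runs straight into more identifier characters, e.g. "nullx"
      if (/[A-Za-z0-9_]/.test(input[end] ?? "")) break;
      return { token: { ...token }, nextIndex: end }; // copy so callers never share one object
    }
  }
  throw new Error(`Unknown keyword at position ${start}`);
}

const { token, nextIndex } = matchKeyword("[true, null]", 1);
console.log(token.type, nextIndex); // Bool 5
```

Adding another keyword (if you ever extend the format) then means adding one table row instead of another if block.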

Handle brackets, braces, colons, commas

These are single characters. Add this to tokenize() before the keyword branch:

if (char === "[") {
  tokens.push({ type: "OpenBracket" });
  i++;
  continue;
}
if (char === "]") {
  tokens.push({ type: "CloseBracket" });
  i++;
  continue;
}
if (char === "{") {
  tokens.push({ type: "OpenBrace" });
  i++;
  continue;
}
if (char === "}") {
  tokens.push({ type: "CloseBrace" });
  i++;
  continue;
}
if (char === ":") {
  tokens.push({ type: "Colon" });
  i++;
  continue;
}
if (char === ",") {
  tokens.push({ type: "Comma" });
  i++;
  continue;
}

Test with console.log(tokenize(&apos;{"key": [1, 2], "nested": null}&apos;)); — you should see all token types together.

After this step you should see…
[
  { type: 'OpenBrace' },
  { type: 'String', value: 'key' },
  { type: 'Colon' },
  { type: 'OpenBracket' },
  { type: 'Number', value: 1 },
  { type: 'Comma' },
  { type: 'Number', value: 2 },
  { type: 'CloseBracket' },
  { type: 'Comma' },
  { type: 'String', value: 'nested' },
  { type: 'Colon' },
  { type: 'Null' },
  { type: 'CloseBrace' }
]

What does a discriminated union allow TypeScript to do?

Define the JsonValue type

Now that we have tokens, we need to turn them into actual values. Define:

type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

This says: a JSON value is either a primitive (string, number, bool, null) or an array of JSON values or an object whose values are JSON values. The recursive definition handles nesting.
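Because JsonValue refers to itself, code that consumes a JsonValue is usually recursive too, just like the parser you're about to write. As an illustration (depth is a hypothetical helper the parser doesn't need), this function measures how many array or object levels a value contains, narrowing the union with a null check, typeof, and Array.isArray:

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Hypothetical helper: how many array/object levels deep does this value go?
function depth(value: JsonValue): number {
  if (value === null || typeof value !== "object") return 0; // a primitive adds no level
  // Arrays and objects each add one level; recurse into their children
  const children = Array.isArray(value) ? value : Object.values(value);
  return 1 + Math.max(0, ...children.map(depth));
}

console.log(depth(42)); // 0
console.log(depth([1, [2, [3]]])); // 3
console.log(depth({ a: { b: [] } })); // 3
```

The shape of depth() mirrors the shape of the type: one branch for primitives, one for containers that recurses.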

Also, create a parser class to hold state:

class JsonParser {
  private tokens: Token[];
  private position: number = 0;

  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  private peek(): Token | undefined {
    return this.tokens[this.position];
  }

  private next(): Token | undefined {
    return this.tokens[this.position++];
  }

  private expect(type: string): Token {
    const token = this.next();
    if (!token || token.type !== type) {
      throw new Error(`Expected ${type}, got ${token?.type ?? "EOF"}`);
    }
    return token;
  }

  parse(): JsonValue {
    const value = this.parseValue();
    if (this.peek()) throw new Error("Unexpected token after JSON");
    return value;
  }

  private parseValue(): JsonValue {
    const token = this.peek();
    if (!token) throw new Error("Unexpected end of input");
    // we'll implement each case next
    throw new Error(`Unexpected token: ${token.type}`);
  }
}

Parse primitive values

Fill in parseValue() to handle single-token cases. Each type check narrows token to one variant of the union, so you can read token.value directly without casting:

private parseValue(): JsonValue {
  const token = this.peek();
  if (!token) throw new Error("Unexpected end of input");

  if (token.type === "String") {
    this.next(); // consume it; `token` is already narrowed, so .value is a string
    return token.value;
  }

  if (token.type === "Number") {
    this.next();
    return token.value; // a number
  }

  if (token.type === "Bool") {
    this.next();
    return token.value; // a boolean
  }

  if (token.type === "Null") {
    this.next();
    return null;
  }

  // parseArray() and parseObject() are written in the next two steps
  if (token.type === "OpenBracket") return this.parseArray();
  if (token.type === "OpenBrace") return this.parseObject();

  throw new Error(`Unexpected token: ${token.type}`);
}

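Test it in src/test.ts. The test now needs JsonParser too, so export the class (export class JsonParser) and import it. Arrays don't parse until the next step, so try one primitive at a time:

import { tokenize, JsonParser } from "./index.js";

for (const input of ['"hello"', "42", "true", "null"]) {
  const parser = new JsonParser(tokenize(input));
  console.log(parser.parse());
}

You should see each value printed on its own line.

After this step you should see…
hello
42
true
null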

What does type narrowing do in TypeScript?

Parse arrays

Arrays are [ then a comma-separated list of values, then ]. Add this method inside the JsonParser class:

private parseArray(): JsonValue[] {
  this.expect("OpenBracket");
  const items: JsonValue[] = [];

  if (this.peek()?.type === "CloseBracket") {
    this.next();
    return items;
  }

  while (true) {
    items.push(this.parseValue());
    if (this.peek()?.type === "CloseBracket") {
      this.next();
      return items;
    }
    this.expect("Comma");
  }
}

Test with:

const input = &apos;[1, "two", [3, 4], true]&apos;;
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
console.log(parser.parse());

You should see nested arrays work.

After this step you should see…
[ 1, 'two', [ 3, 4 ], true ]

Parse objects

Objects are { then comma-separated key-value pairs (the key is always a string, then :, then a value), then }. Add this method to JsonParser as well:

private parseObject(): { [key: string]: JsonValue } {
  this.expect("OpenBrace");
  const obj: { [key: string]: JsonValue } = {};

  if (this.peek()?.type === "CloseBrace") {
    this.next();
    return obj;
  }

  while (true) {
    const keyToken = this.expect("String") as { type: "String"; value: string };
    const key = keyToken.value;
    this.expect("Colon");
    const value = this.parseValue();
    obj[key] = value;

    if (this.peek()?.type === "CloseBrace") {
      this.next();
      return obj;
    }
    this.expect("Comma");
  }
}

Test with:

const input = &apos;{"name": "Alice", "age": 30, "scores": [95, 87]}&apos;;
const tokens = tokenize(input);
const parser = new JsonParser(tokens);
console.log(parser.parse());

After this step you should see…
{ name: 'Alice', age: 30, scores: [ 95, 87 ] }
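One edge case: "__proto__" is a legal JSON object key, but on a plain {} object the assignment obj[key] = value treats that key specially and replaces the object's prototype instead of creating a property. JSON.parse creates an ordinary own property. If you want parseObject() to match, one option is Object.defineProperty; the sketch below (setOwn is a hypothetical helper name) shows the difference:

```typescript
// Hypothetical helper: always create an own, enumerable data property,
// even for the special key "__proto__"
function setOwn(obj: Record<string, unknown>, key: string, value: unknown): void {
  Object.defineProperty(obj, key, { value, enumerable: true, writable: true, configurable: true });
}

const viaAssignment: Record<string, unknown> = {};
viaAssignment["__proto__"] = { injected: true }; // swaps the prototype, creates no key

const viaDefine: Record<string, unknown> = {};
setOwn(viaDefine, "__proto__", { injected: true }); // creates an own "__proto__" key

const reference = JSON.parse('{"__proto__": {"injected": true}}');

console.log(Object.keys(viaAssignment)); // []
console.log(Object.keys(viaDefine)); // [ '__proto__' ]
console.log(Object.keys(reference)); // [ '__proto__' ]
```

The swap only affects that one object, but it means the parsed result silently loses a key and gains inherited properties, which is worth avoiding in a parser meant to mirror JSON.parse.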

Add line + column numbers to errors

When parsing fails, "Unexpected token" isn't helpful. Add line and column tracking to make errors actionable. Modify the Token type to include position:

type Token =
  | { type: "String"; value: string; line: number; col: number }
  | { type: "Number"; value: number; line: number; col: number }
  | { type: "Bool"; value: boolean; line: number; col: number }
  | { type: "Null"; line: number; col: number }
  | { type: "OpenBracket"; line: number; col: number }
  | { type: "CloseBracket"; line: number; col: number }
  | { type: "OpenBrace"; line: number; col: number }
  | { type: "CloseBrace"; line: number; col: number }
  | { type: "Colon"; line: number; col: number }
  | { type: "Comma"; line: number; col: number };

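Then rewrite tokenize() to track position. Since tokenize() now attaches the position itself, the three helpers keep their bodies but should return a token without line and col: change their return type from { token: Token; nextIndex: number } to { token: RawToken; nextIndex: number }, using the RawToken type below.

// A Token minus its position. The conditional type applies Omit to each
// union member separately instead of collapsing the union.
type WithoutPos<T> = T extends unknown ? Omit<T, "line" | "col"> : never;
type RawToken = WithoutPos<Token>;

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  // Single-character punctuation, looked up by character
  const punctuation: Record<string, RawToken> = {
    "[": { type: "OpenBracket" }, "]": { type: "CloseBracket" },
    "{": { type: "OpenBrace" }, "}": { type: "CloseBrace" },
    ":": { type: "Colon" }, ",": { type: "Comma" },
  };
  let i = 0;
  let line = 1;
  let col = 1;
  while (i < input.length) {
    const char = input[i];
    let raw: RawToken | null = null; // stays null for whitespace
    let nextIndex = i + 1; // default: consume one character
    if (/\s/.test(char)) {
      // whitespace: advance without emitting a token
    } else if (char === '"') {
      ({ token: raw, nextIndex } = tokenizeString(input, i));
    } else if (char === "-" || /\d/.test(char)) {
      ({ token: raw, nextIndex } = tokenizeNumber(input, i));
    } else if (char === "t" || char === "f" || char === "n") {
      ({ token: raw, nextIndex } = tokenizeKeyword(input, i));
    } else if (char in punctuation) {
      raw = punctuation[char];
    } else {
      throw new Error(`Unexpected character ${char} at line ${line}, col ${col}`);
    }
    // line/col still point at the token's first character
    if (raw) tokens.push({ ...raw, line, col } as Token);
    // Advance line/col over everything just consumed (a token or whitespace)
    for (const c of input.slice(i, nextIndex)) {
      if (c === "\n") { line++; col = 1; } else { col++; }
    }
    i = nextIndex;
  }
  return tokens;
}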

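Then use this info in the parser's error messages. In expect(), for example, keep the existing EOF error for a missing token, and when the token has the wrong type, throw: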

throw new Error(
  `Unexpected token at line ${token.line}, col ${token.col}: expected ${type}, got ${token.type}`
);

Test with invalid JSON, such as [12 {}], to see the better errors.

After this step you should see…
Unexpected token at line 1, col 5: expected Comma, got OpenBrace
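Threading line and col through the scanner isn't the only option. A common alternative is to store each token's character offset and convert it only when an error is reported, since that conversion only has to run on failure. A minimal sketch, with lineColAt as a hypothetical helper and both numbers 1-based to match the messages above:

```typescript
// Hypothetical helper: convert a 0-based character offset into a 1-based line/column
function lineColAt(input: string, offset: number): { line: number; col: number } {
  let line = 1;
  let col = 1;
  // Walk every UTF-16 code unit before `offset`, counting newlines
  for (let i = 0; i < offset && i < input.length; i++) {
    if (input[i] === "\n") {
      line++;
      col = 1;
    } else {
      col++;
    }
  }
  return { line, col };
}

const text = '{\n  "a": 1\n  "b": 2\n}';
const offset = text.indexOf('"b"'); // where a missing comma would be noticed
console.log(lineColAt(text, offset)); // { line: 3, col: 3 }
```

Because it counts UTF-16 code units, a column after an emoji reads one higher than a count of visible characters would.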

Write unit tests with Vitest

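Create src/index.test.ts and write tests covering happy paths and error cases. The tests import tokenize and JsonParser, so make sure both are exported from src/index.ts (export function tokenize, export class JsonParser):

import { describe, it, expect } from "vitest";
import { tokenize, JsonParser } from "./index.js";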

describe("JSON Parser", () => {
  it("parses primitives", () => {
    const run = (input: string) => {
      const tokens = tokenize(input);
      const parser = new JsonParser(tokens);
      return parser.parse();
    };

    expect(run(&apos;"hello"&apos;)).toBe("hello");
    expect(run("42")).toBe(42);
    expect(run("3.14")).toBe(3.14);
    expect(run("true")).toBe(true);
    expect(run("false")).toBe(false);
    expect(run("null")).toBe(null);
  });

  it("parses arrays", () => {
    const tokens = tokenize("[1, 2, 3]");
    const parser = new JsonParser(tokens);
    expect(parser.parse()).toEqual([1, 2, 3]);
  });

  it("parses empty arrays", () => {
    const tokens = tokenize("[]");
    const parser = new JsonParser(tokens);
    expect(parser.parse()).toEqual([]);
  });

  it("parses objects", () => {
    const tokens = tokenize(&apos;{"x": 1, "y": 2}&apos;);
    const parser = new JsonParser(tokens);
    expect(parser.parse()).toEqual({ x: 1, y: 2 });
  });

  it("parses nested structures", () => {
    const input = &apos;{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}&apos;;
    const tokens = tokenize(input);
    const parser = new JsonParser(tokens);
    const result = parser.parse() as { users: any[] };
    expect(result.users).toHaveLength(2);
    expect((result.users[0] as any).name).toBe("Alice");
  });

  it("rejects unterminated strings", () => {
    expect(() => {
      tokenize(&apos;"hello&apos;);
    }).toThrow("Unterminated string");
  });

  it("rejects trailing commas", () => {
    expect(() => {
      const tokens = tokenize("[1, 2, 3,]");
      const parser = new JsonParser(tokens);
      parser.parse();
    }).toThrow();
  });

  it("rejects non-string object keys", () => {
    expect(() => {
      const tokens = tokenize("{1: 2}");
      const parser = new JsonParser(tokens);
      parser.parse();
    }).toThrow();
  });
});

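Run npx vitest run src/index.test.ts and check that they all pass (plain npx vitest starts in watch mode instead). The README below uses npm test, so also set "test": "vitest run" in the scripts section of package.json; npm init -y only gives you a placeholder test script that exits with an error.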

After this step you should see…
✓ parses primitives
✓ parses arrays
✓ parses empty arrays
✓ parses objects
✓ parses nested structures
✓ rejects unterminated strings
✓ rejects trailing commas
✓ rejects non-string object keys

Export a public parse function

Create a public entry point that users call. At the end of src/index.ts, add:

export function parse(input: string): JsonValue {
  const tokens = tokenize(input);
  const parser = new JsonParser(tokens);
  return parser.parse();
}

Then consumers can do:

import { parse } from "./index.js";

const data = parse('{"x": 1}') as { x: number }; // parse() returns a JsonValue; assert the shape you expect
console.log(data.x); // 1
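parse() returns the whole JsonValue union, so TypeScript won't let you read a property like data.x until it knows the value is an object. An as assertion simply asks the compiler to trust you; a type guard checks at runtime. A sketch, with isJsonObject as a hypothetical helper and JSON.parse standing in for the tutorial's parse() so the snippet runs on its own:

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Hypothetical type guard: true only for JSON objects (not arrays, not null)
function isJsonObject(value: JsonValue): value is { [key: string]: JsonValue } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// JSON.parse stands in for the tutorial's parse(); both produce a JsonValue
const data: JsonValue = JSON.parse('{"x": 1}');

if (isJsonObject(data)) {
  console.log(data.x); // 1 (data is narrowed to an object inside this branch)
} else {
  console.log("not an object");
}
```

The guard is a good fit when the input comes from outside your program and its shape isn't guaranteed.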

Create a small example file, src/example.ts:

import { parse } from "./index.js";

const jsonString = `
{
  "title": "JSON Parser",
  "difficulty": "beginner",
  "steps": 14,
  "complete": true
}
`;

const data = parse(jsonString);
console.log("Parsed successfully:");
console.log(data);

Run it: npx tsx src/example.ts

After this step you should see…
Parsed successfully:
{
  title: 'JSON Parser',
  difficulty: 'beginner',
  steps: 14,
  complete: true
}

README + benchmark

Create a README.md in the project root:

# JSON Parser from Scratch

A working JSON parser in TypeScript — tokenizer + recursive-descent parser, zero dependencies.

## Usage

```typescript
import { parse } from "./src/index.js";

const data = parse('{"name": "Alice", "age": 30}') as { name: string; age: number };
console.log(data.name); // Alice
```

## What's inside

- **Tokenizer**: breaks a JSON string into typed tokens (String, Number, Bool, Null, punctuation)
- **Parser**: a recursive-descent parser that builds a JSON value tree from tokens
- **Error messages**: line + column info on parse failures

## Run tests

```bash
npm test
```

## Compare to JSON.parse

How fast is our parser vs. the built-in?

```typescript
import { performance } from "perf_hooks";
import { parse as ourParse } from "./src/index.js";

const largeJson = JSON.stringify(Array(1000).fill({ x: 1, y: 2 }));

const t0 = performance.now();
for (let i = 0; i < 1000; i++) ourParse(largeJson);
const t1 = performance.now();

const t2 = performance.now();
for (let i = 0; i < 1000; i++) JSON.parse(largeJson);
const t3 = performance.now();

console.log(`Our parser: ${(t1 - t0).toFixed(2)}ms`);
console.log(`JSON.parse: ${(t3 - t2).toFixed(2)}ms`);
console.log(`Ratio: ${((t1 - t0) / (t3 - t2)).toFixed(1)}x slower`);
```

Run it: `npx tsx benchmark.ts`

You'll likely see our parser come out many times slower than the built-in, which V8 implements natively in C++. That's expected, but now you *know why*.

Add a benchmark.ts file and run it. Save the output in the README.
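Timing each loop once, as the README snippet does, can be skewed by JIT warm-up, since the first code to run also pays for the engine optimizing it. A slightly more careful sketch runs some untimed warm-up iterations and reports the mean time per call. The timeIt helper and its default counts are arbitrary choices for illustration, and it relies on the global performance.now() that Node has provided since v16:

```typescript
// Hypothetical helper: mean milliseconds per call, measured after an untimed warm-up
function timeIt(fn: () => void, iterations = 1000, warmup = 100): number {
  for (let i = 0; i < warmup; i++) fn(); // give the JIT a chance to optimize first
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

const largeJson = JSON.stringify(Array(1000).fill({ x: 1, y: 2 }));
const builtIn = timeIt(() => JSON.parse(largeJson));
console.log(`JSON.parse: ${builtIn.toFixed(4)} ms per call`);
// In benchmark.ts, time your own parser the same way, e.g.
// timeIt(() => ourParse(largeJson)), and compare the two means.
```

Reporting a per-call mean also makes results from runs with different iteration counts directly comparable.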

You did it

Run npx vitest run to confirm everything passes. Run npx tsx src/example.ts to see it parse real JSON. Run the benchmark to see how much slower you are than the built-in, and reflect on why (hint: V8's JSON.parse is native C++, tuned for exactly this job).

You wrote:

  • A tokenizer that turns characters into typed tokens
  • A recursive-descent parser that builds trees from tokens
  • A discriminated union type so TypeScript knows which token you have
  • Type narrowing inside if branches to safely access token fields
  • Error messages with line + column numbers
  • A test suite covering happy paths and error cases
  • An export API that's simple to use
  • A benchmark so you can measure performance

Every one of those is a tool you'll use again — in the next language, in the next domain, everywhere.

Stretch goals