You're going to build a small rate-limiting library from scratch. Pick the language you code in most — Python, TypeScript, Go, Rust, or another. We show TypeScript + Vitest as the worked path, but the three algorithms are language-agnostic.
A rate limiter is a guard that says "yes, you can make that request" or "no, wait, you've already made too many." They live in APIs, authentication systems, and real-time services. By the end you'll have implemented three fundamentally different strategies and understand when to reach for each.
We're not racing. Each step is one idea. If a hint helps, take it — there's no penalty.
Before you start
The walkthrough
Create the project file and first test
In your terminal, inside your project folder, create a file for the rate-limiter code.
For TypeScript: limiter.ts
For Python: limiter.py
For Go: limiter.go
For Rust: edit src/lib.rs
Then create a test file:
For TypeScript: limiter.test.ts
For Python: test_limiter.py
For Go: limiter_test.go in the same folder
For Rust: add a test module at the bottom of src/lib.rs
In your test file, write one placeholder test that passes — something like expect(1 + 1).toBe(2) in TypeScript or assert 1 + 1 == 2 in Python. Run it to confirm the toolchain is wired up: npx vitest run (TS), pytest (Python), go test (Go), or cargo test (Rust).
1 passed
Define the Limiter interface
A rate limiter answers one question: "can I do this action now?" It needs to remember some state per key (e.g., per IP address or user ID) and check whether allowing the action would exceed a limit.
Define a function or type signature for the limiter's core method:
allow(key: string, now: number) -> boolean
It returns true if the action is allowed, false if it's been rate-limited.
Why now as a parameter? Because we'll write tests that fake time — passing in now lets us test "what happens 30 seconds later" without actually waiting.
Store window boundaries and request count
In a fixed-window rate limiter, we divide time into buckets. If the limit is 10 requests per 60 seconds, then seconds 0–59 is one bucket, 60–119 is the next, and so on.
When a request comes in at time now:
- Calculate which window it belongs to:
windowNumber = floor(now / windowMs) - Check the store: have we seen this key + window before? If not, it's a fresh window.
- If it's a fresh window, reset the counter. If it's the same window, increment.
Implement the logic to store and retrieve windowStart and count for each key. Don't implement the allow/deny decision yet — just make sure you can compute which window a request falls into and remember the count.
0
Implement the allow decision logic
Now add the decision: return true if the count is still under the limit, false if it would exceed it.
The simplest version: increment first, then check if you've gone over. If you have, return false; otherwise true.
true
Write tests for fixed-window — under limit, at limit, after window
Write three test cases:
- Under limit: Call
allowthree times with the same key within the same window. All should returntrue. - At limit: With a limit of 3, the fourth call should return
false. - After window: Call
allowfor a key, then call it again at a time past the window boundary. The counter should reset and returntrueagain.
3 passed
With a 60-second window and limit 5, after making 5 requests at second 0, what happens if you call allow() again at second 30?
Implement sliding window — keep a queue of request timestamps
Fixed window has a problem: if the limit is 10 per 60 seconds and someone makes 10 requests at second 59, then 10 more at second 60, they've made 20 in 2 seconds — well over the intended rate.
Sliding window fixes this: instead of resetting at fixed boundaries, remember the exact time of each request for the past windowMs milliseconds. When a new request arrives, throw away the old ones and count what's left in the window.
Implement a sliding-window limiter:
- For each key, keep a list of request timestamps.
- When
allowis called withnow, remove all timestamps older thannow - windowMs. - Check if the remaining count is under the limit. If so, add
nowto the list and returntrue.
Test it with the same three cases: under limit, at limit, and after window. Sliding window should pass all three.
3 passed
Implement token bucket — refill tokens at a rate
Token bucket is different from both: imagine a bucket with tokens in it. Each request consumes one token. Tokens refill at a steady rate (e.g., 10 per second). If the bucket fills, it overflows — extra tokens are wasted.
Implement token bucket:
- For each key, store how many tokens are in the bucket and when it was last refilled.
- When
allowis called, calculate how many tokens have accrued since the last refill:time_since_refill * refill_rate. - Add those to the bucket (capped at the bucket capacity).
- If there's at least 1 token, consume it and return
true. Otherwise returnfalse.
Use a refill rate of limit_per_window / window_ms tokens per millisecond. For example, a limit of 10 per 1000ms = 0.01 tokens per millisecond.
Test with the same three cases. For the "after window" case, call allow at a time far enough in the future that the bucket has refilled.
3 passed
Add a comparison table showing when each algorithm wins
You now have three algorithms. Add a comment or docstring in your code (or in a README) that documents when to use each:
| Algorithm | Pros | Cons | Best for |
|---|---|---|---|
| Fixed Window | Simple, low memory, fast | Allows burst at window edges | Simple services, known predictable patterns |
| Sliding Window | Perfectly fair, no burst gaps | Higher memory (keeps all timestamps) | Critical APIs, strict rate enforcement |
| Token Bucket | Handles bursts, smooth rate | Slightly complex, memory per key | Bursty traffic, backpressure systems |
Add a comment or docstring explaining this in your limiter file. You don't need to write tests for this — it's documentation, not code.
0
Test edge cases: boundary times, key cardinality, clock skew
Rate limiters live in the real world, where clocks are weird. Write three more test cases:
- Exact boundary: With a 1000ms window, make requests at exactly times 0 and 1000. The counter should reset between them (they're in different windows).
- High key cardinality: Create 1000 different keys and call
allowon each once. All should returntrue(each has its own limit). - Clock backward (gotcha): Call
allow(key, 1000), then call it again withallow(key, 500)— a time in the past. Your limiter should handle this gracefully (either ignore, or reset, but don't crash or leak memory).
3 passed
Create a clean public API — factory function returning the interface
You have three limiter implementations. Create a unified public API so users can pick one without knowing the internals:
createLimiter(algorithm, limitPerWindow, windowMs) -> {allow(key, now): boolean}
The algorithm parameter is a string: "fixed-window", "sliding-window", or "token-bucket". The factory returns a single object with an allow method.
Test that all three algorithms are accessible via the factory.
3 passed
You did it
Run your full test suite one more time: npx vitest run, pytest, go test, or cargo test. You should see 12+ tests passing across fixed-window, sliding-window, token-bucket, and edge cases.
You built:
- A data structure to track per-key rate-limit state
- Three fundamentally different algorithms (fixed window, sliding window, token bucket)
- An understanding of the tradeoffs between them
- Tests that confirm behavior under normal and edge cases
- A unified public API that hides implementation details
Every one of those is a pattern you'll see again in distributed systems, API design, and production services.