Iterators & Generators
Lazy sequences, yield, and the itertools toolbox.
Iteration in Python is a protocol, not a syntax. Anything that implements it can be used in a for loop, comprehension, list(), sum(), any(), etc.
The protocol
An iterable is anything with an __iter__ method that returns an iterator. An iterator is anything with a __next__ method that returns successive values until it raises StopIteration.
class Counter:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i

for x in Counter(3):
    print(x)  # 1 2 3
That's verbose. Generators are the shortcut.
Generators
A function with yield in it is a generator function. Calling it doesn't run the body — it returns a generator object. The body runs lazily, one yield at a time:
def counter(n):
    i = 0
    while i < n:
        i += 1
        yield i

for x in counter(3):
    print(x)  # 1 2 3
Equivalent to the verbose Counter class, but five lines instead of eleven.
Why bother
Three reasons generators are worth knowing:
- Memory — they produce one value at a time. A generator over a billion items uses constant memory; a list does not.
- Composition — generators chain naturally. Build pipelines without intermediate lists.
- Infinite streams — you can iterate things that don't end (a counter, polling a queue, a random sequence).
def numbers():
    n = 0
    while True:
        yield n
        n += 1

# Iterate with care — this is infinite.
for x in numbers():
    if x > 10:
        break
    print(x)
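The composition point above can be sketched as a small pipeline. squares and take are illustrative helpers, not library functions, and numbers() is repeated so the sketch is self-contained:

```python
def numbers():
    # Infinite stream: 0, 1, 2, ... (same as above).
    n = 0
    while True:
        yield n
        n += 1

def squares(nums):
    # Transform stage: square each value lazily.
    for n in nums:
        yield n * n

def take(k, it):
    # Sink stage: pull only the first k values, then stop.
    return [x for _, x in zip(range(k), it)]

print(take(5, squares(numbers())))  # [0, 1, 4, 9, 16]
```

No stage materialises a list; each value flows through the whole pipeline before the next one is produced.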
Generator expressions
Like list comprehensions but with () instead of []. Lazy:
total = sum(x * x for x in big_iterable) # no intermediate list
nonzero = (x for x in stream if x != 0) # filter, lazily
sum, any, all, max, min all accept iterables. Pair them with generator expressions to avoid materialising lists.
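Laziness also means short-circuiting consumers stop the producer early. A small sketch, where noisy is a hypothetical helper that records what gets pulled:

```python
pulled = []

def noisy(xs):
    # Record each value as the consumer pulls it.
    for x in xs:
        pulled.append(x)
        yield x

# any() short-circuits: it stops pulling at the first truthy result,
# so only 1 and 2 ever leave the generator.
result = any(x > 1 for x in noisy([1, 2, 3, 4]))
print(result, pulled)  # True [1, 2]
```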
itertools
The standard-library itertools module is a goldmine of iterator combinators:
from itertools import (
    chain, islice, cycle, count, repeat,
    takewhile, dropwhile, groupby, accumulate,
    product, permutations, combinations,
)

# First 10 squares.
list(islice((x*x for x in count()), 10))

# Accumulate running totals.
list(accumulate([1, 2, 3, 4]))  # [1, 3, 6, 10]

# Group consecutive equal values.
[(k, list(g)) for k, g in groupby("aaabbc")]
# [('a', ['a','a','a']), ('b', ['b','b']), ('c', ['c'])]
If you find yourself writing a loop with manual state, check itertools first — there's often a one-liner.
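For instance, a loop that tracks a "seen the first failure yet" flag by hand collapses into takewhile / dropwhile:

```python
from itertools import takewhile, dropwhile

data = [1, 3, 5, 2, 4, 6]

# Take the leading run of odd values, then everything after it.
head = list(takewhile(lambda x: x % 2 == 1, data))  # [1, 3, 5]
tail = list(dropwhile(lambda x: x % 2 == 1, data))  # [2, 4, 6]
```

Note the split point is where the predicate first fails, not a filter over the whole sequence.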
Sending values into a generator
Generators can also receive values via .send(). Rare but powerful — coroutines were built on this before async/await landed:
def echo():
    while True:
        msg = yield
        print(f"got: {msg}")

g = echo()
next(g)          # prime
g.send("hello")  # got: hello
g.send("world")  # got: world
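A slightly more useful sketch of the same pattern: a running-average coroutine that yields a result back for each value sent in (a classic illustration, not part of any library):

```python
def averager():
    # Receives numbers via .send(), yields the running mean.
    total = 0.0
    count = 0
    avg = None
    while True:
        value = yield avg
        total += value
        count += 1
        avg = total / count

g = averager()
next(g)            # prime: run to the first yield (yields None)
print(g.send(10))  # 10.0
print(g.send(20))  # 15.0
```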
yield from
Delegate to another iterable:
def chained():
    yield from range(3)
    yield from "abc"

list(chained())  # [0, 1, 2, 'a', 'b', 'c']
Saves a for x in inner: yield x loop and, unlike that loop, forwards .send() and .throw() through to the inner generator.
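To see the forwarding in action, a minimal sketch: values sent into the outer generator arrive at the inner one:

```python
received = []

def inner():
    # Collect whatever is sent in.
    while True:
        received.append((yield))

def outer():
    # yield from transparently forwards .send() (and .throw()) to inner().
    yield from inner()

g = outer()
next(g)         # prime
g.send("ping")
g.send("pong")
print(received)  # ['ping', 'pong']
```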
When NOT to use a generator
If you're going to iterate the result more than once, use a list. Generators are single-use:
g = (x*x for x in range(5))
sum(g) # 30
sum(g) # 0 — generator exhausted!
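If you genuinely need two passes, materialise once with a list, or split the stream with itertools.tee, which buffers values as needed so each copy can be consumed independently:

```python
from itertools import tee

a, b = tee(x * x for x in range(5))  # two independent iterators
print(sum(a))  # 30
print(sum(b))  # 30 — not exhausted, unlike reusing one generator

# Or simply materialise once when the data fits in memory:
values = [x * x for x in range(5)]
print(sum(values), sum(values))  # 30 30
```

Caveat: if one tee branch runs far ahead of the other, tee buffers the gap, so fully consuming a first can cost as much memory as a list.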
Tip
A generator is the right answer when you have a stream. A list is the right answer when you have a collection. Streams produce values over time or on demand; collections are values you already have.