Let's discuss:
• The main reason generators are used
• The 2 ways to make a generator
• Lesser known generator features
• Why and when to use generators
Thread 🧵👇
Generators are typically used as lazy iterables.
By "lazy" I mean that they don't actually compute their values until they absolutely need to.
Essentially generators will "generate" their next value as soon as they're asked for it.
Here's a generator of every .py file in my home directory:
>>> from pathlib import Path
>>> py_files = Path.home().rglob("*.py")
>>> py_files
<generator object Path.rglob at 0x7f5a6721c900>
(the rglob method on pathlib.Path objects always returns a generator)
If I loop over that generator, it'll start giving me values immediately:
for path in py_files:
    print(path)
We get the initial value so quickly due to "laziness": instead of getting all the paths at once, between each loop iteration the generator locates the next .py file.
If I instead turned this generator into a list, it'd take a while to see any values.
py_files = list(Path.home().rglob("*.py"))
for path in py_files:
    print(path)
That takes over a minute to run on my machine because it finds ALL the files before processing any 1 of them.
So generators are lazy iterables that compute their next item as you loop over them. Some functions/methods in Python will return a generator to you.
How can you make your own generators?
Two ways:
1. Generator expressions
2. Generator functions
Generator expressions look like list comprehensions.
numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers) # <- a generator expression
List comprehensions use [ ... ] & return lists
Generator expressions use ( ... ) & return generator objects
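Here's a quick sketch of that difference. The variable names are just mine for illustration:

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

squares_list = [n**2 for n in numbers]  # list comprehension: computed right away
squares_gen = (n**2 for n in numbers)   # generator expression: nothing computed yet

print(type(squares_list).__name__)  # list
print(type(squares_gen).__name__)   # generator

# Looping over the generator (or passing it to list) is what makes it do its work
squares_from_gen = list(squares_gen)

# Generators are single-use: once consumed, there's nothing left to loop over
leftover = list(squares_gen)
```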
A dramatic demonstration of a generator expression 👇
This generates all primes below 50,000 and prints the first N of them.
We're generating more primes than we may end up needing, so using a generator expression instead of a list is WAY faster here.
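The code for that demo isn't reproduced here, so this is only a guess at its general shape. The is_prime helper and the choice of N = 5 are my assumptions, not the original code:

```python
from itertools import islice
from math import isqrt

def is_prime(n):
    """Return True if n is prime (hypothetical helper using trial division)."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))

# Generator expression: no primality checks run until items are requested
primes = (n for n in range(50_000) if is_prime(n))

# Take just the first 5 (assumed N); the rest of the range is never checked
first_primes = list(islice(primes, 5))
print(first_primes)  # [2, 3, 5, 7, 11]
```

Swap the parentheses for square brackets and every number below 50,000 gets checked before the first prime is printed.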
The other way to make a generator is with a generator function.
Here's a generator function that lazily trims newlines from an iterable of lines:
def without_newlines(lines):
    for line in lines:
        yield line.rstrip("\n")
Note that odd "yield" statement. ☝
"yield" is the magic word in generator function land.
The mere presence of a yield statement turns a regular function into a generator function.
Unlike regular functions, generator functions DO NOT RUN when you call them. Instead they return a new generator object when called.
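A small sketch of that behavior. The count_to function and the log list are made up for this illustration:

```python
log = []

def count_to(n):
    log.append("started")  # only happens once something asks for a value
    for i in range(1, n + 1):
        yield i

gen = count_to(3)
ran_on_call = bool(log)  # False: calling the function ran none of its body

values = list(gen)       # consuming the generator is what runs the body
ran_after_loop = bool(log)
```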
To *run* a generator function, you can loop over the returned generator object.
This lazily prints each line in a file:
with open("logs.txt") as f:
    for line in without_newlines(f):
        print(line)
Both file objects & generators are lazy, so we never store the whole file in memory here.
Generator objects that come from generator functions will pause themselves and provide a value whenever a "yield" statement is reached. When they're asked for another item, they'll unpause and keep running from where they left off until "yield" is hit again or until they return.
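You can watch that pausing happen by asking for items one at a time with the built-in next function. The steps function below is a made-up example:

```python
def steps():
    print("running up to the first yield")
    yield 1
    print("resumed, running up to the second yield")
    yield 2
    print("resumed again, now returning")

gen = steps()
a = next(gen)     # runs until the first yield, then pauses
b = next(gen)     # picks up right after the first yield
rest = list(gen)  # runs to the end of the function; no more values
```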
Generator functions are a bit more complex than generator expressions but they're also more flexible. You can write much more complex logic within them.
Not all list-creation logic can fit in a list comprehension & not all lazy looping logic can fit in a generator expression.
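For instance, logic that needs to remember things between items is awkward in a generator expression but easy in a generator function. A sketch, with a hypothetical unique function:

```python
def unique(iterable):
    """Lazily yield each item the first time it appears."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

deduplicated = list(unique([1, 2, 1, 3, 2]))
print(deduplicated)  # [1, 2, 3]
```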
So why use generators? As lazy iterables, they're good for:
• Starting to get results quickly
• Saving memory (items are generated instead of stored in a data structure)
• Saving time (if there's an early break from a loop)
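The early break case is easiest to see with a generator that would otherwise never finish. A sketch, where squares and the checked list are made up for illustration:

```python
from itertools import count

checked = []

def squares():
    for n in count(1):     # count(1) never ends on its own
        checked.append(n)  # record how much work was actually done
        yield n**2

for square in squares():
    if square > 50:
        break              # stop early; no further squares get computed

first_big = square
print(first_big, len(checked))  # 64 8
```

Building a list of all those squares first wouldn't just be slower, it would never finish.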
But generators can also be bidirectional (sending & receiving data).
Generator objects also have these methods: send, throw, & close.
These methods allow generators to be "coroutines" which can both send & receive data.
It's a bit uncommon to see generators used as coroutines. Using "async def" is a more common way to make coroutines in Python.
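If you're curious what the bidirectional style looks like, here's a minimal sketch using send and close. The running_average function is a made-up example, not something from the standard library:

```python
def running_average():
    """Receive numbers via send() and yield the average so far."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average  # pauses here; send(x) resumes with value = x
        total += value
        count += 1
        average = total / count

averager = running_average()
next(averager)          # "prime" the generator: run it to the first yield
a1 = averager.send(10)  # 10.0
a2 = averager.send(20)  # 15.0
averager.close()        # raises GeneratorExit inside the paused generator
```

After close is called the generator is finished, so asking it for another value raises StopIteration.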
Quick disclaimer: my terminology here is inconsistent with the Python docs.
I say "generator object" but the docs say "generator iterator".
I say "generator function" when the docs just say "generator".
Colloquial & official terminology is unfortunately inconsistent here. 😢
And methods like isnumeric, isdigit, and isdecimal unfortunately take a bit of studying to figure out, so I recommend either avoiding them or using them carefully.
Usually when we think of an "object" we think of a class instance. For example these are all objects:
>>> numbers = [2, 1, 3, 4, 7] # a list object
>>> colors = {"red", "green", "blue", "yellow"} # a set object
>>> name = "Trey" # a string object
>>> n = 3 # an int object
Need to remove all spaces from a string in #Python? 🌌🐍
Let's take a quick look at:
• removing just space characters
• removing all whitespace
• collapsing consecutive whitespace to 1 space
• removing from the beginning/end
• removing from the ends of every line
Thread🧵
If you just need to remove space characters you could use the string replace method to replace all spaces by an empty string:
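A minimal sketch of that replace call (the greeting string is just an example value):

```python
greeting = " Hello world! "

# Replace every space character with an empty string
no_spaces = greeting.replace(" ", "")
print(no_spaces)  # Helloworld!
```

This only touches the space character itself; tabs and newlines are left alone.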