What's a "generator" in Python?

#TerminologyTuesday time ⏰

Let's discuss:
• The main reason generators are used
• The 2 ways to make a generator
• Lesser known generator features
• Why and when to use generators

Thread 🧵👇

Generators are typically used as lazy iterables.

By "lazy" I mean that they don't actually compute their values until they absolutely need to.

Essentially, generators "generate" each value only when they're asked for it.

Here's a generator of every .py file in my home directory:

>>> from pathlib import Path
>>> py_files = Path.home().rglob("*.py")
>>> py_files
<generator object Path.rglob at 0x7f5a6721c900>

(the rglob method on pathlib.Path objects returns a lazy iterator: a generator, in the Python version shown here)

If I loop over that generator, it'll start giving me values immediately:

for path in py_files:
    print(path)

We get the first value so quickly because of "laziness": instead of finding all the paths up front, the generator locates the next .py file between loop iterations.

If I instead turned this generator into a list, it'd take a while to see any values.

py_files = list(Path.home().rglob("*.py"))
for path in py_files:
    print(path)

That takes over a minute to run on my machine because it finds ALL the files before processing any one of them.
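
Here's a small sketch of that difference that doesn't touch the filesystem. find_items is a hypothetical stand-in for rglob that logs each time it "finds" something, so you can see when the work happens in each version:

```python
def find_items(log):
    """Hypothetical stand-in for rglob: logs each time it "finds" an item."""
    for n in range(3):
        log.append(f"found {n}")
        yield n

# Lazy: finding and processing are interleaved
lazy_log = []
for item in find_items(lazy_log):
    lazy_log.append(f"processed {item}")

# Eager: list() finds everything before the loop body runs even once
eager_log = []
for item in list(find_items(eager_log)):
    eager_log.append(f"processed {item}")

print(lazy_log)   # ['found 0', 'processed 0', 'found 1', 'processed 1', 'found 2', 'processed 2']
print(eager_log)  # ['found 0', 'found 1', 'found 2', 'processed 0', 'processed 1', 'processed 2']
```
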
So generators are lazy iterables that compute their next item as you loop over them. Some functions/methods in Python will return a generator to you.

How can you make your own generators?

Two ways:

1. Generator expressions
2. Generator functions

Generator expressions look like list comprehensions.

numbers = [2, 1, 3, 4, 7, 11, 18]
squares = (n**2 for n in numbers) # <- a generator expression

List comprehensions use [ ... ] & return lists
Generator expressions use ( ... ) & return generator objects
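
Here's a quick sketch using the numbers list above. The list comprehension computes every square right away. The generator expression computes each square only when asked, here via the built-in next function. Generators are also single-use, so looping over an exhausted one gives you nothing:

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

squares_list = [n**2 for n in numbers]  # list comprehension: all computed now
squares_gen = (n**2 for n in numbers)   # generator expression: nothing computed yet

print(squares_list)       # [4, 1, 9, 16, 49, 121, 324]
print(next(squares_gen))  # 4 (computed only when asked for)
print(list(squares_gen))  # [1, 9, 16, 49, 121, 324] (the remaining items)
print(list(squares_gen))  # [] (already exhausted)
```
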
A dramatic demonstration of a generator expression 👇

This generates all primes below 50,000 and prints the first N of them.

We're generating more primes than we may end up needing, so using a generator expression instead of a list is WAY faster here.

pym.dev/p/2hafq/
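
The linked demo isn't reproduced here, but here's a rough sketch of the same idea. The is_prime helper is hypothetical and uses simple trial division with math.isqrt. Because itertools.islice pulls just the first N primes, only the numbers needed to find them ever get checked:

```python
import math
from itertools import islice

def is_prime(n):
    """Hypothetical helper: trial division (slow, but fine for a demo)."""
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True

N = 10

# Generator expression: no primality checks happen on this line
primes = (n for n in range(50_000) if is_prime(n))

# Only numbers up to the 10th prime are ever checked
first_primes = list(islice(primes, N))
print(first_primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

If you swapped the parentheses for square brackets, all 50,000 numbers would be checked before islice even got started.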
You can think of generator expressions as "generator comprehensions" because they use comprehension syntax; they just have a different name.

@nedbat used that term in a blog post years ago and I really wish it was the official term for this feature.

nedbatchelder.com/blog/201605/ge…

The other way to make a generator is with a generator function.

Here's a generator function that lazily trims newlines from an iterable of lines:

def without_newlines(lines):
    for line in lines:
        yield line.rstrip("\n")

Note that odd "yield" statement. ☝

"yield" is the magic word in generator function land.

The mere presence of a yield statement turns a regular function into a generator function.

Unlike regular functions, generator functions DO NOT RUN when you call them. Instead they return a new generator object when called.
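
To see this "doesn't run when called" behavior for yourself, here's a sketch with a hypothetical countdown generator function that records when its body starts running:

```python
events = []

def countdown(n):
    """Hypothetical example: record when the function body actually starts."""
    events.append("body started")
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)  # calling it just creates a generator object
print(events)       # [] (the body hasn't run yet)
print(next(gen))    # 3 (now the body runs up to the first yield)
print(events)       # ['body started']
```
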
To *run* a generator function, you can loop over the returned generator object.

This lazily prints each line in a file:

with open("logs.txt") as f:
    for line in without_newlines(f):
        print(line)

Both file objects & generators are lazy, so we never store the whole file in memory here.
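
There's no logs.txt here, so this runnable sketch uses io.StringIO as a stand-in. It's an in-memory, file-like object, and looping over it gives one line at a time (newline included), just like a real file:

```python
import io

def without_newlines(lines):
    """Lazily yield each line with its trailing newline removed."""
    for line in lines:
        yield line.rstrip("\n")

# io.StringIO stands in for open("logs.txt") in this sketch
fake_file = io.StringIO("first line\nsecond line\nthird line\n")
for line in without_newlines(fake_file):
    print(line)
```
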
Generator objects that come from generator functions will pause themselves and provide a value whenever a "yield" statement is reached. When they're asked for another item, they'll unpause and keep running from where they left off until "yield" is hit again or until they return.
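
Here's a sketch of that pause-and-resume behavior. It drives a hypothetical generator by hand with next, so you can see where each step picks up:

```python
def steps():
    """Hypothetical generator: each print shows where execution resumes."""
    print("running until the first yield")
    yield 1
    print("resumed after the first yield")
    yield 2
    print("resumed after the second yield, now returning")

gen = steps()
print(next(gen))  # prints "running until the first yield", then 1
print(next(gen))  # prints "resumed after the first yield", then 2
try:
    next(gen)     # prints "resumed after the second yield, now returning"
except StopIteration:
    # Returning from a generator raises StopIteration for whoever asked
    print("the generator is done")
```
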
Generator functions are a bit more complex than generator expressions but they're also more flexible. You can write much more complex logic within them.

Not all list-creation logic can fit in a list comprehension & not all lazy looping logic can fit in a generator expression.
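
For example, here's a sketch of a generator function that keeps state between yields, which a single generator expression can't easily do. The running_totals name is hypothetical. The standard library's itertools.accumulate already covers this particular case, so the point here is the loop-with-state pattern:

```python
def running_totals(numbers):
    """Yield the cumulative sum after each number."""
    total = 0  # state that persists between yields
    for n in numbers:
        total += n
        yield total

print(list(running_totals([2, 1, 3, 4, 7])))  # [2, 3, 6, 10, 17]
```
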
Generators are primarily used for:

• Starting to get results quickly
• Saving memory (items are generated instead of stored in a data structure)
• Saving time (if there's an early break from a loop)
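
The "saving time" point is easiest to see with an early exit. In this sketch, is_big is a hypothetical check that records every number it examines. Because any stops at the first True, the generator expression only produces 101 values out of nearly a million:

```python
checked = []

def is_big(n):
    """Hypothetical check that records every number it examines."""
    checked.append(n)
    return n > 100

numbers = range(1, 1_000_000)

# any() stops consuming the generator as soon as it sees a True value
found = any(is_big(n) for n in numbers)
print(found, len(checked))  # True 101
```

If you used a list comprehension inside any(...) instead, every number would be checked before any even started looking.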

But generators can also be bidirectional (sending & receiving data).

Generator objects also have these methods: send, throw, & close.

These methods allow generators to act as "coroutines", which can both send & receive data.

It's a bit uncommon to see generators used as coroutines. Using "async def" is a more common way to make coroutines in Python.
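
Here's a minimal sketch of a generator used as a coroutine via send. The averager name is hypothetical. The pattern itself is standard generator behavior: prime the generator with next, call send, then close it:

```python
def averager():
    """Receive numbers via send() and yield the running average."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average  # pause here; send() resumes with a value
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)            # "prime" it: run up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(60))  # 30.0
avg.close()          # raises GeneratorExit inside the generator, ending it
```
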
Quick disclaimer: my terminology here is inconsistent with the Python docs.

I say "generator object" but the docs say "generator iterator".

I say "generator function" when the docs just say "generator".

Colloquial & official terminology is unfortunately inconsistent here. 😢
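
If you want to check which is which in code, the standard library's inspect module follows the same split. It has separate checks for the function and for the object that calling it returns. Here's a quick sketch (gen_func is a hypothetical name):

```python
import inspect

def gen_func():
    yield 1

print(inspect.isgeneratorfunction(gen_func))  # True: the function itself
print(inspect.isgenerator(gen_func))          # False: a function isn't a generator object
print(inspect.isgenerator(gen_func()))        # True: calling it returns a generator object
print(type(gen_func()).__name__)              # generator
```
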
A duck typing related thought:

• Tuples are a common "sequence"
• Dictionaries are the most common "mapping"
• Generators are the most common "iterator"

Python's enumerate & zip functions return iterators (essentially generator-like objects).
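
Here's a quick sketch of what "iterator" means in practice (the colors list is just sample data). Iterators like the one enumerate returns and generators both return themselves from iter, produce items with next, and get used up as you go:

```python
colors = ["red", "green", "blue"]

pairs = enumerate(colors)    # enumerate returns an iterator
print(iter(pairs) is pairs)  # True: an iterator is its own iterator
print(next(pairs))           # (0, 'red')
print(list(pairs))           # [(1, 'green'), (2, 'blue')]: picks up where next left off
print(list(pairs))           # []: exhausted, just like a generator

squares = (n**2 for n in [1, 2, 3])
print(iter(squares) is squares)  # True: generators are iterators too
```
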

More on iterators another time.

Generators (& iterators) allow for pre-processing items while you're looping over an iterable.

Generators can act as an iterable processing pipeline.

You can turn a very complex loop into a smaller loop that abstracts logic away into generators.
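
Here's a rough sketch of such a pipeline. without_newlines is the generator function from earlier, while non_blank and numbered are hypothetical helpers. Each stage pulls one line at a time from the stage before it, so the final loop stays tiny:

```python
def without_newlines(lines):
    """Strip the trailing newline from each line."""
    for line in lines:
        yield line.rstrip("\n")

def non_blank(lines):
    """Hypothetical stage: skip lines that are empty or only whitespace."""
    for line in lines:
        if line.strip():
            yield line

def numbered(lines):
    """Hypothetical stage: prefix each line with a 1-based line number."""
    for n, line in enumerate(lines, start=1):
        yield f"{n}: {line}"

raw = ["alpha\n", "\n", "beta\n", "   \n", "gamma"]
for line in numbered(non_blank(without_newlines(raw))):
    print(line)
# 1: alpha
# 2: beta
# 3: gamma
```
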

More details in the resources👇

It's time for generator resources.

Screencasts/articles on generators:

pym.dev/how-write-gene…
pym.dev/what-is-a-gene…
pym.dev/how-to-create-…

Lazy looping terminology: pym.dev/terms/#lazy-lo…

My talk explaining generators & iterators 📺👇

trey.io/lazy-looping

If you enjoyed this thread, sign up for the @PythonMorsels newsletter. 💌

Every Wednesday, I share a new Python tip along with a sneak peek of Python exercises, screencasts, and learning tools. 📬

trey.io/b3L81e
