What Are Generators in Python?
At their core, generators are special functions that allow you to iterate through data one item at a time, without loading everything into memory. They’re built using the yield keyword or generator expressions.
Generators make Python code lazy — which means values are computed only when they are requested. This makes them perfect for handling large datasets, streams, or scenarios where you don’t know the dataset’s size upfront.
Example of a simple generator:
def simple_generator():
    yield 1
    yield 2
    yield 3

for num in simple_generator():
    print(num)
This outputs:
1
2
3
Unlike lists, generators don’t store all elements; they generate values on the fly.
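To make this concrete, you can compare a fully built list with a generator over the same range. A minimal sketch using sys.getsizeof (which reports only the container’s own overhead, not the elements; count_up is just an illustrative helper):

import sys

def count_up(limit):
    n = 0
    while n < limit:
        yield n
        n += 1

as_list = list(count_up(1_000_000))  # materializes one million ints
as_gen = count_up(1_000_000)         # nothing is computed yet

print(sys.getsizeof(as_list))  # several megabytes
print(sys.getsizeof(as_gen))   # ~200 bytes, no matter how large limit is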
What Are Transformation Generators?
A transformation generator is one that takes in an input sequence and modifies (or transforms) each element before yielding it.
For example:
def square_numbers(numbers):
    for n in numbers:
        yield n * n

nums = [1, 2, 3, 4, 5]
for result in square_numbers(nums):
    print(result)
Output:
1
4
9
16
25
Here, the transformation applied is squaring each number. Instead of creating a new list of squares, the generator yields one square at a time.
This reduces memory consumption and allows you to chain transformations together.
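For instance, because square_numbers accepts any iterable, its output can feed directly into a second stage. A small sketch, where double_numbers is a hypothetical extra transformation rather than one of the article’s examples:

def square_numbers(numbers):  # same generator as above
    for n in numbers:
        yield n * n

def double_numbers(numbers):  # hypothetical second stage
    for n in numbers:
        yield n * 2

for result in double_numbers(square_numbers([1, 2, 3])):
    print(result)  # 2, 8, 18: each value is squared, then doubled, one at a time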
What Are Filtering Generators?
Filtering generators are used when you only want to yield items that meet a certain condition.
For example:
def even_numbers(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n

nums = range(10)
for even in even_numbers(nums):
    print(even)
Output:
0
2
4
6
8
Instead of building a filtered list, we simply yield the values that match our condition.
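The condition can also be supplied by the caller, which keeps a filtering generator reusable. A minimal sketch, where filter_by is a hypothetical helper rather than one of the article’s examples:

def filter_by(predicate, items):
    # yield only the items for which predicate(item) is truthy
    for item in items:
        if predicate(item):
            yield item

for n in filter_by(lambda n: n % 3 == 0, range(10)):
    print(n)  # 0, 3, 6, 9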
Combining Transformation & Filtering Generators
One of the greatest strengths of generators is how easily they can be combined. You can build pipelines that filter, transform, and process data step by step.
Example:
def transform_and_filter(numbers):
    for n in numbers:
        if n % 2 == 0:  # Filtering
            yield n * n  # Transformation

nums = range(10)
for result in transform_and_filter(nums):
    print(result)
Output:
0
4
16
36
64
Here, odd numbers are filtered out, while even numbers are squared and yielded.
Why Use Transformation & Filtering Generators?
Here are some key benefits of using these generators in Python:
- Memory Efficiency – They don’t store entire sequences in memory.
- Lazy Evaluation – Values are computed only when needed, which even makes infinite sequences practical (see the sketch after this list).
- Readable Pipelines – You can chain transformations and filters neatly.
- Improved Performance – Skipping intermediate lists can speed up work on large datasets, though lists are often faster for small ones.
- Scalability – Ideal for streams, logs, and real-time data.
- Composability – Functions remain modular and easy to reuse.
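Lazy evaluation in particular enables something lists cannot do at all: infinite sequences. A minimal sketch, assuming the consumer decides when to stop (naturals is an illustrative name):

def naturals():
    # an infinite generator; a list version could never finish building
    n = 0
    while True:
        yield n
        n += 1

for n in naturals():
    if n > 4:
        break  # the consumer stops the stream; the generator never runs ahead
    print(n)   # 0, 1, 2, 3, 4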
Generator Expressions vs. List Comprehensions
Python also allows you to write generator expressions, which are similar to list comprehensions but use parentheses instead of square brackets.
Example with list comprehension:
squares = [x * x for x in range(10)]
Example with generator expression:
squares = (x * x for x in range(10))
- List comprehension → produces a list in memory immediately.
- Generator expression → produces a generator, yielding items lazily.
This difference is crucial when working with large data.
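As a rough illustration, summing ten million squares works with either form, but the list comprehension materializes every value first while the generator expression keeps memory flat. This is a sketch of the pattern, not a benchmark:

n = 10_000_000

# list comprehension: builds all ten million squares in memory before summing
total_list = sum([x * x for x in range(n)])

# generator expression: feeds sum() one square at a time, constant memory
total_gen = sum(x * x for x in range(n))

print(total_list == total_gen)  # True: same result, very different memory profiles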
Real-World Use Cases
Let’s look at where transformation & filtering generators shine:
- Log Processing: Filter only error messages and transform them into structured objects (see the sketch after this list).
- Data Pipelines: Clean and preprocess large CSV/JSON files without loading them fully.
- Web Scraping: Process streams of scraped HTML chunks.
- APIs and Streaming: Handle paginated or streamed responses efficiently.
- Mathematical Computations: Transform numbers while applying conditions without extra memory usage.
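As a sketch of the log-processing case, the generator below filters for lines containing "ERROR" and transforms each match into a dict; the log format and field names here are assumptions for illustration:

def error_records(lines):
    # filter: keep only error lines; transform: parse each into a dict
    for line in lines:
        if "ERROR" in line:
            timestamp, _, message = line.partition(" ERROR ")
            yield {"timestamp": timestamp, "message": message.strip()}

log_lines = [
    "2024-01-01T10:00:00 INFO service started",
    "2024-01-01T10:05:12 ERROR connection refused",
    "2024-01-01T10:07:45 ERROR timeout after 30s",
]

for record in error_records(log_lines):
    print(record)

Passing an open file object instead of the list works unchanged, since file objects already yield lines lazily.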
Advanced Example: Chaining Generators
You can chain multiple generators to form elegant data pipelines.
def numbers():
    for i in range(20):
        yield i

def filter_even(seq):
    for n in seq:
        if n % 2 == 0:
            yield n

def square(seq):
    for n in seq:
        yield n * n

pipeline = square(filter_even(numbers()))
for result in pipeline:
    print(result)
This produces the squares of the even numbers from 0 through 18, one value at a time, without creating intermediate lists.
Python’s Built-in Tools for Transformation & Filtering
Python provides built-in functions that often pair well with generators:
- map() – for transformations
- filter() – for filtering
- itertools module – for complex generator pipelines
Example:
nums = range(10)
pipeline = map(lambda x: x * x, filter(lambda n: n % 2 == 0, nums))
for result in pipeline:
    print(result)
This mirrors our earlier manual example, but with built-ins. In Python 3, map() and filter() return lazy iterators, so the pipeline stays just as memory-friendly as the handwritten generators.
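itertools adds ready-made building blocks for longer pipelines. A small sketch using itertools.islice to take just the first five results from an otherwise unbounded stream:

import itertools

def naturals():  # infinite stream, as sketched earlier
    n = 0
    while True:
        yield n
        n += 1

evens = filter(lambda n: n % 2 == 0, naturals())
squares = map(lambda n: n * n, evens)

# islice lazily takes the first five items, so the infinite stream is safe
for result in itertools.islice(squares, 5):
    print(result)  # 0, 4, 16, 36, 64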
Best Practices for Using Generators
- Use generators over lists when working with large or infinite sequences.
- Favor readability over micro-optimization; sometimes a list comprehension is fine.
- Chain generators instead of nesting loops for cleaner code.
- Remember generators can be consumed only once. If you need to reuse them, convert them to a list.
Common Pitfalls to Avoid
- Forgetting that generators are one-time use: Once exhausted, they cannot be reset (see the sketch after this list).
- Overcomplicating pipelines: Too many nested generators may reduce readability.
- Mixing generators with blocking operations: Be mindful of I/O-heavy tasks.
- Not handling exceptions inside generators: Errors in pipelines can be harder to debug.
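The one-time-use pitfall is easy to demonstrate, as this sketch shows:

squares = (x * x for x in range(3))

print(list(squares))  # [0, 1, 4]
print(list(squares))  # [] - already exhausted; recreate the generator to iterate again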
FAQ on Python Transformation & Filtering Generators
Q1: What’s the difference between a generator and an iterator?
A generator is a special kind of iterator created using the yield keyword or a generator expression. All generators are iterators, but not all iterators are generators.
Q2: Can generators be reused?
No, once a generator is exhausted, you need to recreate it if you want to iterate again.
Q3: Are generators faster than lists?
Not always. Generators save memory and shine on large datasets, but for small datasets a plain list is often faster because of its lower per-item overhead.