Wednesday, November 26, 2025

Generator Expression vs List Comprehension in Python

When handling large datasets in Python, both performance and memory usage are key concerns. Python offers two concise tools for producing a series of values: list comprehensions and generator expressions. They look almost identical in code, differing only in their brackets, but they behave very differently.

This article explains both concepts clearly and provides a complete code example along with sample outputs.


What Is a List Comprehension?

A list comprehension creates all results immediately and stores them in memory.

Example:

lst = [i * i for i in range(100)]

How It Works

  • Python evaluates the entire expression.
  • Every squared value is computed.
  • All results are stored inside a list.
  • Requires enough memory to hold the entire list.

Effects

  • Fast when you need repeated access.
  • High memory usage for large ranges (hundreds of millions of elements can take gigabytes).
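
To make the eager behavior concrete, here is a minimal sketch. The name squares and the small range are illustrative choices, not part of the original example. Because the whole list exists as soon as the comprehension finishes, it supports len(), indexing, and repeated passes.

```python
# Every square is computed right here, before the next line runs.
squares = [i * i for i in range(10)]

print(len(squares))   # the list knows its length
print(squares[3])     # random access by index
print(sum(squares))   # first pass over the data
print(max(squares))   # a second pass works because the data is still stored
```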


What Is a Generator Expression?

A generator expression does not create or store all results. It produces one value at a time only when needed.

Example:

gen = (i * i for i in range(100))

How It Works

  • No values are computed immediately.
  • When you iterate over it (for example, using sum()), Python generates one value at a time.
  • Only one item exists in memory at any given moment.

Effects

  • Very low memory usage.
  • Ideal for large datasets.
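  • Can be slightly slower per item if you eventually need all results, because each value is produced lazily on request.
  • Single use: once consumed, a generator cannot be restarted or indexed.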
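
The lazy behavior described above can be observed directly. This is a small sketch, assuming a hypothetical helper named square that prints whenever it is called, so you can see exactly when each value is computed.

```python
def square(i):
    # Print so we can see when each value is actually computed.
    print(f"computing {i} * {i}")
    return i * i

gen = (square(i) for i in range(3))  # nothing is printed yet
first = next(gen)                    # computes only the first value
rest = list(gen)                     # computes the remaining values
leftover = list(gen)                 # the generator is exhausted, so this is empty

print(first, rest, leftover)
```

Note that the second list(gen) call returns an empty list: a generator can only be consumed once.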


Complete Code Example

import sys
import time

# Generator expression: values are produced one at a time while sum() runs
start_time = time.perf_counter()
gen = (i * i for i in range(100_000_000))
gen_sum = sum(gen)
gen_time = time.perf_counter() - start_time
print(f"Generator sum: {gen_sum}")
print(f"Generator runtime: {gen_time:.4f} seconds")
# Size of the generator object itself; it does not grow with the range
print(f"Generator memory: {sys.getsizeof(gen)} bytes")

# List comprehension: all 100 million values are built before sum() runs
start_time = time.perf_counter()
lst = [i * i for i in range(100_000_000)]
lst_sum = sum(lst)
lst_time = time.perf_counter() - start_time
print(f"List sum: {lst_sum}")
print(f"List runtime: {lst_time:.4f} seconds")
# Size of the list's internal array only; the int objects are not included
print(f"List memory: {sys.getsizeof(lst)} bytes")

Example Output

Below is a typical output you would see (numbers will vary depending on your system):

Generator sum: 333333328333333350000000
Generator runtime: 4.8123 seconds
Generator memory: 112 bytes
List sum: 333333328333333350000000
List runtime: 12.5478 seconds
List memory: 800000112 bytes

What This Output Shows

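  • Both produce the same sum, so the logic is identical.
  • sys.getsizeof() reports only about 112 bytes for the generator object, regardless of how large the range is.
  • The list uses enormous memory because it stores all values. The reported figure covers only the list's internal array of references (about 8 bytes per element on a 64-bit build); the integer objects themselves take additional memory.
  • Time difference:
    • Generator: Starts producing values immediately and computes as it goes.
    • List: Must allocate and fill 100 million elements before sum() can start. In this run that made it much slower; on smaller inputs the gap usually shrinks, and a list comprehension can be as fast as or faster than a generator expression.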
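
sys.getsizeof() measures only the container object itself, so it understates what a list of integers really costs. One way to get a fuller picture is the standard library's tracemalloc module, which tracks peak allocations including the element objects. The sketch below is illustrative only: peak_bytes is a hypothetical helper, and N is deliberately much smaller than in the example above so that it runs quickly.

```python
import tracemalloc

def peak_bytes(build_and_sum):
    """Run the callable and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    result = build_and_sum()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()  # also clears the traces, so the next call starts fresh
    return result, peak

N = 100_000  # assumed small size; increase it to see the gap widen

gen_sum, gen_peak = peak_bytes(lambda: sum(i * i for i in range(N)))
lst_sum, lst_peak = peak_bytes(lambda: sum([i * i for i in range(N)]))

print(f"Generator peak: {gen_peak} bytes")
print(f"List peak:      {lst_peak} bytes")
```

The exact figures depend on the Python version and platform, but the list's peak should be far larger because every element is alive at the same time.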

When to Use List Comprehension

Choose a list comprehension when:

  • You need all results stored.
  • You want to access values multiple times.
  • Memory is not a concern.
  • You need fast random access.

Typical use cases:

  • Preparing training data for ML models
  • Filtering small or medium datasets
  • Performing multiple operations on the same data
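
As a rough illustration of "multiple operations on the same data", the sketch below centers a few values around their mean. The sample values and variable names are made up for this example. It needs len(), two passes, and index access, none of which a generator expression supports.

```python
# Hypothetical sample data, converted once and stored in a list.
readings = [float(x) for x in ["3.0", "5.0", "7.0"]]

mean = sum(readings) / len(readings)     # first pass, plus len()
centered = [r - mean for r in readings]  # second pass over the same list
largest = readings[-1]                   # index access

print(mean, centered, largest)
```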


When to Use Generator Expression

Choose a generator expression when:

  • You are working with massive datasets.
  • Memory is limited.
  • You only need each value once.
  • You want streaming-like processing.

Use cases:

  • Reading large log files
  • Streaming data from a database or API
  • Processing big data line by line
  • Feeding values to sum(), min(), max(), or loops without storing them
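
The log-file use case might look like the sketch below. To keep it self-contained, io.StringIO stands in for a real file opened with open(), and the log lines are invented for illustration. File-like objects yield one line at a time, so the generator expression counts matches without ever holding the whole file in memory.

```python
import io

# Stand-in for a large log file; the content here is made up.
log_file = io.StringIO(
    "INFO start\n"
    "ERROR disk full\n"
    "INFO retry\n"
    "ERROR timeout\n"
)

# Each line is read, tested, and discarded before the next one is fetched.
error_count = sum(1 for line in log_file if line.startswith("ERROR"))

print(f"Errors found: {error_count}")
```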


Summary Table

Feature | List Comprehension | Generator Expression
Memory Usage | High. Stores all elements in memory. | Very low. Generates one value at a time.
Evaluation Method | Eager. Computes all values immediately. | Lazy. Computes values only when needed.
Speed | Fast when reusing data multiple times. | Efficient for single-pass operations.
Storage | Stores a complete list in memory. | Stores no data; holds only iteration logic.
Use Cases | Small and medium datasets, repeated access, ML preprocessing. | Large datasets, streaming, memory-sensitive operations.
Syntax | [expr for item in iterable] | (expr for item in iterable)

Final Thoughts

Both list comprehensions and generator expressions are essential tools for writing efficient Python code. If you need to store all results and reuse them, choose a list comprehension. If you want to save memory and process large datasets efficiently, a generator expression is the right choice.


Assisted by ChatGPT
