Python Iterables: Uncovering the Power of Python's Iterator Protocol

Python has a novel framework whereby objects can define how to iterate over the data they encapsulate. This framework, known as the Iterator Protocol, empowers developers to abstract the process of how an object returns members individually. Here we take a dive into understanding Python’s Iterator Protocol, the “magic” methods used, and how to tap into the power of this clear, concise, and convenient framework when designing custom classes.

Highlights

Iterable Basics.

Python Iterator Protocol: __iter__, __next__, iter(), and next().
Basic iteration examples.
Custom Iterables – Basic Use-Cases

Custom Iterables – Using our Imagination

Introduction: Understanding Iterables

The concept of iteration is language-agnostic. It’s found in every major programming language but has roots that run even deeper. Iteration is a conceptual underpinning of approaching repetitive actions.

Iteration reduces overhead by providing algorithmic definitions of how to handle situations in which there are multiple things for which multiple actions are required — even if that means repeating the same action multiple times! In the context of this article the term “Iterable” will refer to something which provides native support for the process of iteration.

The Python language defines an Iterable as “an object capable of returning its members one at a time.” Examples of Python-native iterables include list, tuple, dict, set and even str type objects. Very quickly, let’s see an example of iteration using the list object:

# defines a list with members
my_list = [1, 2, 3, 4, 5]

# uses a for loop to "iterate" through the list,
# 1 member at a time
for item in my_list:
    print(item)

This code block outputs the following to stdout:

1
2
3
4
5

This executes the print call “for each” item within the my_list object. In plain English: print each item in my_list. Consider the deeper logic of this statement: how does Python know when to stop printing members? How does it detect when there are no more members to print? If we were to implement this manually it might look something like this:

# defines a list with 5 members
my_list = [1, 2, 3, 4, 5]

# loop until the number of members
# having been printed matches the
# number of members present.
count = 0
while count < len(my_list):
    print(my_list[count])
    count += 1

Just like before, this prints the numbers 1 through 5, one per line.

Alternatively, we could attempt to access the next item in the my_list collection until we encountered an IndexError exception as follows:

# defines a list with 5 members
my_list = [1, 2, 3, 4, 5]

# access the next member of the
# list until an exception is thrown.
i = 0
while True:
    try:
        print(my_list[i])
        i += 1
    except IndexError:
        break

Once again, the output is the numbers 1 through 5, one per line.

Clearly, there are several ways to approach the concept of implementing a for loop. The last case — the one that uses exception handling for control flow — is actually the most similar to how Python’s Iterator Protocol works under the hood.

Understanding Python Iterables

Iterators were introduced in Python 2.2 via PEP 234 (an iconic number selection). The PEP outlines several built-in functions, special methods, and a StopIteration exception that together make up the Iterator Protocol. In the current 3.x versions, the major components for implementing an Iterable object are the following (paraphrased from the documentation):

__next__(): Return the next item from the iterator. If there are no further items, raise the StopIteration exception.

__iter__(): Return an iterator object. The object is required to support the iterator protocol.
iter(): Built-in function that returns an iterator for an iterable object, normally by calling its __iter__ method.
next(): Built-in function that retrieves the next item from an iterator by calling its __next__ method.

StopIteration: Exception raised by an iterator’s __next__ method to signify that there are no more items.

Note: next() calls the __next__ method of the object passed in as an argument. Likewise, iter() calls the __iter__ method of the object passed to it. A TypeError is raised if the argument doesn’t support the operation: next() requires __next__, while iter() requires __iter__ or, as a fallback, a sequence-style __getitem__.
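To make the roles of these pieces concrete, here is a rough hand-written equivalent of a for loop, built only from iter(), next(), and StopIteration. It is a sketch of the idea rather than the interpreter’s actual implementation, and the variable names iterator and seen are chosen purely for illustration:

```python
my_list = [1, 2, 3, 4, 5]

# iter() asks the list for an iterator (it calls my_list.__iter__())
iterator = iter(my_list)

# collect the items as well as printing them, so the result can be inspected
seen = []
while True:
    try:
        # next() calls iterator.__next__()
        item = next(iterator)
    except StopIteration:
        # the iterator is exhausted, so the "loop" ends
        break
    seen.append(item)
    print(item)
```

This is the same shape as the earlier IndexError-based loop, except that the stop signal is StopIteration and no indexing is involved.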

Below is a manual implementation of a custom Iterable object using the Python Iterator Protocol syntax/framework:

from collections.abc import Sequence
from typing import Any


class MyIterable:
    """Custom implementation of an Iterable object via the Iterator Protocol"""
    def __init__(self, items: Sequence[Any]):
        """
        Initializes the object with the Sequence specified.
        Args:
            items: An object that defines the __getitem__ and __len__ methods.
        """
        # internal "protected" fields
        self._sequence = items
        self._counter = 0

    def __iter__(self):
        """Returns the iterator, which here is the object itself"""
        return self

    def __next__(self):
        """Returns the next item or raises StopIteration"""
        # if the indexing variable hasn't exceeded the length of
        # the sequence, get the next item -- otherwise raise StopIteration
        if self._counter < len(self._sequence):
            item = self._sequence[self._counter]
            self._counter += 1
            return item
        raise StopIteration

Note: Sequence here is only a type hint and is not enforced at runtime. It signals that the argument should implement __getitem__, which provides self[key] access, and __len__, which __next__ relies on through len().

This custom MyIterable class taps into the power of Python’s Iterator Protocol by implementing the __iter__ and __next__ methods. These methods allow instances of this custom object to work with Python-native language tools like for loops and list comprehensions. Let’s take our new object for a test drive:
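One consequence of this design is worth seeing before the test drive: because __iter__ returns self and the counter is never reset, an instance can only be walked once. The sketch below restates the class in compact form (without type hints or docstrings) so it runs on its own. The variable names numbers, first, rest, and again are illustrative:

```python
class MyIterable:
    """Compact restatement of the MyIterable class above."""

    def __init__(self, items):
        self._sequence = items
        self._counter = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._counter < len(self._sequence):
            item = self._sequence[self._counter]
            self._counter += 1
            return item
        raise StopIteration


numbers = MyIterable([1, 2, 3])

first = next(numbers)   # 1: next() calls numbers.__next__()
rest = list(numbers)    # [2, 3]: list() consumes what is left
again = list(numbers)   # []: the instance is already exhausted

print(first, rest, again)
```

A second pass over the same data therefore needs a fresh MyIterable instance.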

For Loops

Our custom object allows instances to be accessed via standard for loop syntax as follows:

# create new instance encapsulating a list object
my_list = MyIterable([1, 2, 3, 4, 5])

# use Python-native for looping to iterate
for item in my_list:
    print(item)

This code prints the numbers 1 through 5, one per line, just as if we had used a list object.

List Comprehensions

Python’s list comprehensions are a powerful abstraction for applying logic to sequences of values. Below, we see our custom object instance can make use of this powerful language-level tool as well:

# create new instance encapsulating a list object
my_list = MyIterable([1, 2, 3, 4, 5])

# copy 1-4 into a separate list using list comp.
no_five = [x for x in my_list if x < 5]

# print results
for j in no_five:
    print(j)

Similar to before, this code prints one number per line, minus the last element: 1 through 4.

The real benefit of using custom iterable objects defined via the Python Iterator Protocol is that they are accessible to the language’s tools that can interact with built-in iterables like list, dict and even str. Below, we see how the str.join method can be used with a custom Iterable object:

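# my_list was exhausted by the list comprehension above,
# so create a fresh instance before iterating again
my_list = MyIterable([1, 2, 3, 4, 5])

# chain the join tool and list comprehension together
stringed = "".join([str(x) for x in my_list])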

The output is a string: 12345. Sure, this isn’t a practical use case, but it shows how abiding by the Python Iterator Protocol allows easy folding into Python syntax.

Iterator vs. Iterable vs. Iteration

Goal: Develop an intuition for Python’s Iterator Protocol

There are some semantic concepts one needs to be aware of when using iterators in general — but especially when designing and implementing custom Python iterables. Let’s define some terms:

Iterator: an object with a __next__ method (or next in Python 2.x) and an __iter__ method that returns the object itself.
Iterable: an object with an __iter__ method that returns an iterator. The iterator, not the iterable, keeps track of the iteration state.

Iteration: A general term used to describe the process of iterating over a sequence of items (language-agnostic).

Those are some easy-access definitions but don’t get us far in terms of developing an intuition of their relevancy in Python. The language from Python’s Iterators documentation is helpful:

… the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate.
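The difference between an iterable and its iterators can be observed directly with a built-in list. The following is a small sketch; the variable names are illustrative only:

```python
numbers = [1, 2, 3]             # an iterable: it can hand out iterators

first_pass = iter(numbers)      # a fresh iterator over the list
second_pass = iter(numbers)     # another, independent iterator

# each call to iter() on the list returns a new iterator object
is_distinct = first_pass is not second_pass     # True

# calling iter() on an iterator returns the iterator itself
returns_self = iter(first_pass) is first_pass   # True

a = next(first_pass)            # 1
b = next(first_pass)            # 2
c = next(second_pass)           # 1: second_pass tracks its own position

print(is_distinct, returns_self, a, b, c)
```

The list itself holds no iteration state, which is why it can be looped over any number of times, while each iterator is used up as it goes.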

Now, a curveball. There is a special method named __getitem__ that was used to make an object iterable in early 2.x versions of Python, before PEP 234. When an object defines no __iter__ method, iter() still falls back to calling __getitem__ with successive integer indexes, starting at 0, until an IndexError is raised. This fallback remains supported in Python 3.x and, as far as I know, shows no sign of deprecation. The __getitem__ method is used primarily to implement the evaluation of self[key] syntax. The takeaway:

__getitem__ was how an Iterable was defined in older versions of Python, and it still works as a fallback.
__getitem__ is now used primarily to implement self[key] access for objects.

An Iterable implementing __next__ and __iter__ does not have self[key] access unless it also implements the __getitem__ method.

Remember that last point: you can iterate over an Iterator but not necessarily index into its values. More specifically: Iterables are not Sequences by default.
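Both halves of this takeaway can be checked with a short sketch. Squares is a hypothetical class invented for this illustration. It defines only __getitem__, so iteration relies on the legacy fallback. The second half shows that an ordinary iterator does not support indexing:

```python
class Squares:
    """Hypothetical class that defines only __getitem__ (no __iter__)."""

    def __init__(self, n):
        self._n = n

    def __getitem__(self, index):
        # IndexError tells the fallback iterator where the data ends
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


squares = Squares(4)

# iter() falls back to calling __getitem__ with 0, 1, 2, ...
via_getitem = list(squares)      # [0, 1, 4, 9]
third = squares[2]               # 4: self[key] access works as usual

# an iterator can be stepped through, but not indexed
list_iterator = iter([10, 20, 30])
try:
    list_iterator[0]
except TypeError:
    indexable = False
else:
    indexable = True

print(via_getitem, third, indexable)
```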

Custom Iterable Objects

In our example above, we saw a custom implementation of a basic Iterable object. Instances of this object can be looped over much like any native Sequence type object and support interoperability with many language features like list comprehensions. This object isn’t very exciting though. Let’s implement some more exciting custom Iterables in Python:

Example 1: Fibonacci Iterable

Most Computer Science programs will, at some point, ask students to implement a function that calculates the n-th member in the Fibonacci sequence. Commonly, this is introduced as the seminal example of recursion. Let’s implement it as a Python Iterable:

class Fibonacci:
    def __init__(self, n: int):
        """
        Args:
            n: How many Fibonacci numbers to produce before stopping.
        """
        self.n = n
        self._i = 0
        self._current = 0
        self._next = 1

    def __iter__(self):
        return self

    def __next__(self):
        """Calculates and returns the next value in the Fibonacci sequence."""

        # returns the next number until n numbers have been produced
        if self._i < self.n:

            # counts how many numbers have been returned so far
            self._i += 1

            # gets the current value
            fib = self._current

            # gets next values in sequence
            self._current, self._next = self._next, self._current + self._next
            return fib

        # raises the StopIteration error which Python uses to end e.g. for loops
        else:
            raise StopIteration

Here we see the two essential iterator methods, __iter__ and __next__, implemented. During each call to __next__ (made during a for loop iteration, for example) the next number in the Fibonacci sequence is calculated until n numbers have been produced, at which point the StopIteration exception is raised, signaling to the caller that the iterator has been exhausted. Let’s see how this can be used:

Approach 1: Looping

# loops
for digit in Fibonacci(10):
    print(digit)

# output
0
1
1
2
3
5
8
13
21
34

Approach 2: Explicit List Conversion

Our Fibonacci sequence can be converted to a list explicitly by using Python’s list builtin:

# explicitly converted to a list
digits = list(Fibonacci(10))

# output
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Approach 3: List Comprehension

Implementing Fibonacci using Python’s Iterator Protocol means we can also leverage Python’s list comprehension:

# via list comprehension
digits = [d for d in Fibonacci(10)]

# output
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Approach 4: Wrapped in a Function

For cases where only the n-th number is needed, the Fibonacci class can be wrapped into a custom function:

# via custom function
def get_nth_fibonacci(n: int) -> int:
    """Returns the nth number in the Fibonacci sequence (n must be at least 1)"""
    return list(Fibonacci(n))[-1]

# get a number
print(get_nth_fibonacci(n=15))

>>> 377

The headline here is: anything one can do with an Iterable in Python one can now do with our custom Fibonacci class!
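As a quick illustration of that claim, the sketch below feeds Fibonacci instances to a handful of standard tools: sum, max, zip, and itertools.islice. The class is restated in compact form so the example runs on its own, and the variable names are illustrative. A fresh instance is created for each use because each one is single-pass:

```python
from itertools import islice


class Fibonacci:
    """Compact restatement of the Fibonacci class above."""

    def __init__(self, n):
        self.n = n
        self._i = 0
        self._current, self._next = 0, 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= self.n:
            raise StopIteration
        self._i += 1
        fib = self._current
        self._current, self._next = self._next, self._current + self._next
        return fib


total = sum(Fibonacci(10))                      # 88
largest = max(Fibonacci(10))                    # 34
first_three = list(islice(Fibonacci(10), 3))    # [0, 1, 1]
pairs = list(zip("abc", Fibonacci(10)))         # [('a', 0), ('b', 1), ('c', 1)]

print(total, largest, first_three, pairs)
```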

Summary

Python’s Iterator Protocol is a powerful framework whereby developers can implement custom classes that are able to leverage the full power of the Python programming language. As described by the documentation: “This style of access is clear, concise, and convenient” — the three C’s. To quote again from the documentation: “The use of iterators pervades and unifies Python.” As such, making use of the Iterator Protocol helps ensure that one’s custom classes are Pythonic and more easily understood by other developers. For those readers yearning for more: check out the documentation for Python’s Data Model for an even deeper dive into iterators.