Python has a novel framework whereby objects can define how to iterate over the data they encapsulate. This framework, known as the Iterator Protocol, empowers developers to abstract the process of how an object returns members individually. Here we take a dive into understanding Python’s Iterator Protocol, the “magic” methods used, and how to tap into the power of this clear, concise, and convenient framework when designing custom classes.
Highlights
- Iterable Basics.
- Python Iterator Protocol: __iter__, __next__, iter(), and next().
- Basic iteration examples.
- Custom Iterables – Basic Use-Cases
- Custom Iterables – Using our Imagination
Introduction: Understanding Iterables
The concept of iteration is language-agnostic. It’s found in every major programming language but has roots that run even deeper. Iteration is a conceptual underpinning of approaching repetitive actions.
Iteration reduces overhead by providing algorithmic definitions of how to handle situations in which there are multiple things for which multiple actions are required — even if that means repeating the same action multiple times! In the context of this article the term “Iterable” will refer to something which provides native support for the process of iteration.
The Python language defines an Iterable as “an object capable of returning its members one at a time.” Examples of Python-native iterables include list, tuple, dict, set and even str type objects. Very quickly, let’s see an example of iteration using the list object:
    # defines a list with members
    my_list = [1, 2, 3, 4, 5]

    # uses a for loop to "iterate" through the list,
    # 1 member at a time
    for item in my_list:
        print(item)
This code block outputs the following to stdout:
    1
    2
    3
    4
    5
This executes the print function “for each” item within the my_list object. In plain English: print each item in my_list. Consider the deeper logic of this statement: how does Python know when to stop printing members? How does it detect when there are no more members to print? If we were to implement this manually it might look something like this:
    # defines a list with 5 members
    my_list = [1, 2, 3, 4, 5]

    # loop until the number of members
    # having been printed matches the
    # number of members present.
    count = 0
    while count < len(my_list):
        print(my_list[count])
        count += 1
Just like before, the output is as follows:
    1
    2
    3
    4
    5
Alternatively, we could attempt to access the next item in the my_list collection until we encountered an IndexError exception as follows:
    # defines a list with 5 members
    my_list = [1, 2, 3, 4, 5]

    # access the next member of the
    # list until an IndexError is raised.
    i = 0
    while True:
        try:
            print(my_list[i])
            i += 1
        except IndexError:
            # catch only IndexError so unrelated errors are not hidden
            break
Once again, our output is as follows:
    1
    2
    3
    4
    5
Clearly, there are several ways to approach the concept of implementing a for loop. The last case, the one that uses exception handling for control flow, is actually the most similar to how Python’s Iterator Protocol works under the hood.
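As a rough sketch of that idea, the loop below drives a list by hand with the built-in iter() and next() functions and stops when StopIteration is raised. The names `iterator` and `collected` are illustrative only, and this approximates what a for loop does rather than reproducing the interpreter's actual implementation.

```python
my_list = [1, 2, 3, 4, 5]

# ask the list for an iterator, much as a for loop would
iterator = iter(my_list)

collected = []
while True:
    try:
        # request the next member from the iterator
        item = next(iterator)
    except StopIteration:
        # the iterator signals that it has no more members
        break
    collected.append(item)
    print(item)
```

The exception is caught narrowly here: only StopIteration ends the loop, so any other error would still surface.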
Understanding Python Iterables
Iterators were introduced in Python 2.2 via PEP 234 (an iconic bit of number selection). It outlines the use of several methods, special methods, and a StopIteration exception to help handle the Iterator Protocol. The current 3.x versions of Python use the following major components for implementing an Iterable object (adapted from the documentation):
- __next__(): Return the next item from the iterator. If there are no further items, raise the StopIteration exception.
- __iter__(): Return an iterator object. The object is required to support the iterator protocol.
- iter(): Built-in function that calls an iterable object’s __iter__ method.
- next(): Built-in function that calls an iterator object’s __next__ method.
- StopIteration: Exception raised by an iterator’s __next__ method to signify that there are no more items.
Note: next() calls the __next__ method of the object passed in as an argument. Likewise, iter() calls the __iter__ method of the object passed to it as an argument. In either case, a TypeError is raised if the object passed as an argument doesn’t implement the required method: __next__ and __iter__, respectively. The one exception is that iter() also accepts objects that define only __getitem__.
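The behaviour described in the note can be checked with a few lines. This is a minimal sketch using only built-ins; the variable names are illustrative.

```python
numbers = [1, 2, 3]

# iter() hands back an iterator for the list
iterator = iter(numbers)
first = next(iterator)
second = next(iterator)

# a list is iterable, but it is not itself an iterator,
# so passing it directly to next() fails
try:
    next(numbers)
    list_error = None
except TypeError as exc:
    list_error = exc

# an int defines neither __iter__ nor __getitem__,
# so iter() rejects it
try:
    iter(42)
    int_error = None
except TypeError as exc:
    int_error = exc

print(first, second)
print(type(list_error).__name__, type(int_error).__name__)
```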
Below is a manual implementation of a custom Iterable object using the Python Iterator Protocol syntax/framework:
    from collections.abc import Sequence
    from typing import Any


    class MyIterable:
        """Custom implementation of an Iterable object via Iterable Protocol"""

        def __init__(self, items: Sequence[Any]):
            """
            Initializes the object with the Sequence specified.

            Args:
                items: An object that defines the __getitem__ method.
            """
            # internal "protected" fields
            self._sequence = items
            self._counter = 0

        def __iter__(self):
            """Returns a reference to the Iterable object"""
            return self

        def __next__(self):
            """Returns the next item or raises StopIteration"""
            # if the indexing variable hasn't exceeded the length of
            # the sequence, get the next item -- otherwise raise StopIteration
            if self._counter < len(self._sequence):
                item = self._sequence[self._counter]
                self._counter += 1
                return item
            raise StopIteration
Note: The use of Sequence here is a type hint that the argument should implement the __getitem__ method, which provides self[key] type access, and the __len__ method, which the class relies on when it calls len().
This custom MyIterable class taps into the power of Python’s Iterator Protocol by implementing custom __iter__ and __next__ methods. Together, these methods allow instances of this custom object to tap into Python-native language tools like for loops and even list comprehensions. Let’s take our new object for a test drive:
For Loops
Our custom object allows instances to be accessed via standard for loop syntax as follows:
    # create new instance encapsulating a list object
    my_list = MyIterable([1, 2, 3, 4, 5])

    # use Python-native for looping to iterate
    for item in my_list:
        print(item)
This code produces the following to stdout, just as if we had used a list object:
    1
    2
    3
    4
    5
List Comprehensions
Python’s list comprehensions are a powerful abstraction for applying logic to sequences of values. Below, we see our custom object instance can make use of this powerful language-level tool as well:
    # create new instance encapsulating a list object
    my_list = MyIterable([1, 2, 3, 4, 5])

    # copy 1-4 into a separate list using list comp.
    no_five = [x for x in my_list if x < 5]

    # print results
    for j in no_five:
        print(j)
Similar to before, this code produces the following to stdout minus the last element:
    1
    2
    3
    4
The real benefit of using custom iterable objects defined via the Python Iterator Protocol is that they are accessible to the language’s tools that can interact with built-in iterables like list, dict and even str. Below, we see how the str.join method can be used on a Custom Iterable object:
    # create a fresh instance; the one used above is already exhausted
    my_list = MyIterable([1, 2, 3, 4, 5])

    # chain the join tool and list comprehension together
    stringed = "".join([str(x) for x in my_list])
The output is a string: 12345. Sure, this isn’t a direct use case but it shows how abiding by the Python Iterator Protocol allows easy folding into Python syntax.
Iterator vs. Iterable vs. Iteration
Goal: Develop an intuition for Python’s Iterator Protocol
There are some semantic concepts one needs to be aware of when using iterators in general — but especially when designing and implementing custom Python iterables. Let’s define some terms:
- Iterator: an object with a __next__ method (or next in Python 2.x) that returns the next item and keeps track of its own position.
- Iterable: an object with an __iter__ method that returns an iterator.
- Iteration: A general term used to describe the process of iterating over a sequence of items (language agnostic).
Those are some easy-access definitions but don’t get us far in terms of developing an intuition of their relevancy in Python. The language from Python’s Iterators documentation is helpful:
… the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate.
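One way to make the container/iterator split in that quote concrete is to write the two roles as separate classes. The sketch below is illustrative: Countdown and CountdownIterator are hypothetical names, not anything from the standard library. The container's __iter__ returns a fresh iterator each time, so it can be looped over repeatedly, while a single iterator is used up after one pass.

```python
class CountdownIterator:
    """Hypothetical iterator: holds the position for a single pass."""

    def __init__(self, start: int):
        self._current = start

    def __iter__(self):
        # iterators conventionally return themselves
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


class Countdown:
    """Hypothetical iterable: creates a new iterator for every loop."""

    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


countdown = Countdown(3)
first_pass = list(countdown)    # [3, 2, 1]
second_pass = list(countdown)   # [3, 2, 1] again, via a new iterator

single = CountdownIterator(3)
once = list(single)             # [3, 2, 1]
again = list(single)            # [] because the iterator is exhausted
```

A class whose __iter__ returns self, as MyIterable does, behaves like `single` here: it supports one pass per instance.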
Now for a curveball. There is a special method named __getitem__ that was used to define an Iterable in early 2.x versions of Python, before PEP 234. This method is still supported in Python 3.x versions and AFAIK has no hints of deprecation in how it is used to qualify an object as an iterable. The __getitem__ method is used primarily to implement the evaluation of self[key] syntax. Read here for more. The takeaway:
- __getitem__ used to be used to define an Iterable in older versions of Python.
- __getitem__ is now used to implement the self[key] access for objects.
- An Iterable implementing __next__ and __iter__ does not have self[key] access unless it also implements the __getitem__ method.
Remember that last point: you can iterate over an Iterator but not necessarily index into its values. More specifically: Iterables are not Sequences by default.
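Both halves of this can be seen in a short experiment. In the sketch below, Squares is a hypothetical class that defines only __getitem__; iter() falls back to calling it with 0, 1, 2, and so on until IndexError is raised. The second part shows that a plain list iterator cannot be indexed.

```python
class Squares:
    """Hypothetical class that defines only __getitem__."""

    def __getitem__(self, index: int) -> int:
        if index >= 5:
            # IndexError tells the fallback iteration to stop
            raise IndexError(index)
        return index * index


# iteration works through the legacy __getitem__ fallback
squares = list(Squares())
# and self[key] access works as well
indexed = Squares()[2]

# an iterator obtained from a list supports next() but not indexing
iterator = iter([10, 20, 30])
try:
    iterator[0]
    subscript_error = None
except TypeError as exc:
    subscript_error = exc

print(squares, indexed, type(subscript_error).__name__)
```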
Custom Iterable Objects
In our example above, we saw a custom implementation of a basic Iterable object. Instances of this object are used similarly to how any native Sequence type object might be used and support interoperability with many language features like list comprehension. This object isn’t very exciting though. Let’s implement some more exciting custom Iterables in Python:
Example 1: Fibonacci Iterable
Most Computer Science programs will, at some point, ask students to implement a function that calculates the n-th member in the Fibonacci sequence. Commonly, this is introduced as the seminal example of recursion. Let’s implement it as a Python Iterable:
    class Fibonacci:
        def __init__(self, n: int):
            """
            Args:
                n: The number of values in the Fibonacci sequence to generate.
            """
            self.n = n
            self._i = 0
            self._current = 0
            self._next = 1

        def __iter__(self):
            return self

        def __next__(self):
            """Calculates and returns the next value in the Fibonacci sequence."""
            # returns the next number until n numbers have been produced
            if self._i < self.n:
                # advances the count of numbers produced so far
                self._i += 1
                # gets the current value
                fib = self._current
                # gets next values in sequence
                self._current, self._next = self._next, self._current + self._next
                return fib
            # raises the StopIteration error which Python uses to end e.g. for loops
            else:
                raise StopIteration
Here we see all the essential Iterable methods: __iter__ and __next__ implemented. During each call to __next__ (made during a for loop iteration for example) the next number in the Fibonacci sequence is calculated until n numbers have been produced, at which point the StopIteration exception is raised, signaling to Python’s Iterator Protocol that the iterator has been exhausted. Let’s see how this can be used:
Approach 1: Looping
    # loops
    for digit in Fibonacci(10):
        print(digit)

    # output
    # 0
    # 1
    # 1
    # 2
    # 3
    # 5
    # 8
    # 13
    # 21
    # 34
Approach 2: Explicit List Conversion
Our Fibonacci sequence can be converted to a list explicitly by using Python’s list builtin:
    # explicitly converted to a list
    digits = list(Fibonacci(10))

    # output
    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Approach 3: List Comprehension
Implementing Fibonacci using Python’s Iterator Protocol means we can also leverage Python’s list comprehension:
    # via list comprehension
    digits = [d for d in Fibonacci(10)]

    # output
    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Approach 4: Wrapped in a Function
For cases where only the n-th number is needed, the Fibonacci class can be wrapped into a custom function:
    # via custom function
    def get_nth_fibonacci(n: int) -> int:
        """Returns the nth number in the Fibonacci sequence"""
        return list(Fibonacci(n))[-1]

    # get a number
    print(get_nth_fibonacci(n=15))

    # output
    # 377
The headline here is: anything one can do with an Iterable in Python one can now do with our custom Fibonacci class!
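To illustrate that claim, the sketch below feeds the class to a handful of built-ins that accept any iterable. The class is repeated in condensed form only so the snippet runs on its own. Because each instance supports a single pass, a new Fibonacci object is created for every call.

```python
class Fibonacci:
    """Condensed copy of the class above so this snippet is self-contained."""

    def __init__(self, n: int):
        self.n = n
        self._i = 0
        self._current = 0
        self._next = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= self.n:
            raise StopIteration
        self._i += 1
        fib = self._current
        self._current, self._next = self._next, self._current + self._next
        return fib


total = sum(Fibonacci(10))        # 88
largest = max(Fibonacci(10))      # 34
a, b, c = Fibonacci(3)            # unpacking: 0, 1, 1
contains_five = 5 in Fibonacci(10)
evens = [f for f in Fibonacci(10) if f % 2 == 0]
pairs = list(zip("abc", Fibonacci(3)))

print(total, largest, (a, b, c), contains_five, evens, pairs)
```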
Summary
Python’s Iterator Protocol is a powerful framework whereby developers can implement custom classes that are able to leverage the full power of the Python programming language. As described by the documentation: “This style of access is clear, concise, and convenient” — the three C’s. To quote again from the documentation: “The use of iterators pervades and unifies Python.” As such, making use of the Iterator Protocol helps ensure that one’s custom classes are Pythonic and more easily understood by other developers. For those readers yearning for more: check out the documentation for Python’s Data Model for an even deeper dive into iterators.