Python has a novel framework whereby objects can define how to iterate over the data they encapsulate. This framework, known as the Iterator Protocol, empowers developers to abstract the process of how an object returns members individually. Here we take a dive into understanding Python’s Iterator Protocol, the “magic” methods used, and how to tap into the power of this clear, concise, and convenient framework when designing custom classes.
- Iterable Basics
- Python Iterator Protocol
- Basic iteration examples
- Custom Iterables – Basic Use-Cases
- Custom Iterables – Using our Imagination
Introduction: Understanding Iterables
The concept of iteration is language-agnostic. It’s found in every major programming language but has roots that run even deeper. Iteration is a conceptual underpinning of approaching repetitive actions.
Iteration reduces overhead by providing algorithmic definitions of how to handle situations in which there are multiple things for which multiple actions are required — even if that means repeating the same action multiple times! In the context of this article the term “Iterable” will refer to something which provides native support for the process of iteration.
The Python language defines an Iterable as “an object capable of returning its members one at a time.” Examples of Python-native iterables include list, set and even str type objects. Very quickly, let’s see an example of iteration using the for loop:

    # defines a list with members
    my_list = [1, 2, 3, 4, 5]

    # uses a for loop to "iterate" through the list,
    # 1 member at a time
    for item in my_list:
        print(item)
This code block outputs the following to stdout:

    1
    2
    3
    4
    5
This executes the print function once for each member of the my_list object. In plain English: print each item in my_list. Consider the deeper logic of this statement: how does Python know when to stop printing members? How does it detect when there are no more members to print? If we were to implement this manually it might look something like this:

    # defines a list with 5 members
    my_list = [1, 2, 3, 4, 5]

    # loop until the number of members
    # having been printed matches the
    # number of members present.
    count = 0
    while count < len(my_list):
        print(my_list[count])
        count += 1
Just like before, the output is as follows:

    1
    2
    3
    4
    5
Alternatively, we could attempt to access the next item in the my_list collection until we encounter an IndexError exception, as follows:

    # defines a list with 5 members
    my_list = [1, 2, 3, 4, 5]

    # access the next member of the
    # list until an IndexError is raised.
    i = 0
    while True:
        try:
            print(my_list[i])
            i += 1
        except IndexError:
            break
Once again, our output is as follows:

    1
    2
    3
    4
    5
Clearly, there are several ways to approach the concept of implementing a for loop. The last case, the one that uses exception handling for control flow, is actually the most similar to how Python’s Iterator Protocol works under the hood.
Understanding Python Iterables
Iterators were introduced in Python 2.2 via PEP 234 (an iconic choice of number). The PEP outlines the use of several methods, special methods, and a StopIteration exception to handle the Iterator Protocol. After some evolution over the course of Python’s history, the current 3.x versions use the following major components for implementing an Iterable object (taken directly from the documentation):
- __next__(): Return the next item from the iterator. If there are no further items, raise the StopIteration exception.
- __iter__(): Return an iterator object. The object is required to support the iterator protocol.
- iter(): Built-in function that calls an iterable object’s __iter__ method.
- next(): Built-in function that calls an iterator object’s __next__ method.
- StopIteration: Exception raised by an iterator’s __next__ method to signify that there are no more items.
next() calls the __next__ method of the object passed in as an argument. Likewise, iter() calls the __iter__ method of the object passed to it as an argument. In either case, a TypeError is raised if the object passed as an argument doesn’t implement the required method (with one exception: iter() also accepts objects that only define __getitem__, which is discussed further down).
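As a small illustration of these pieces working together, the sketch below drives a built-in list by hand with iter() and next(), and then passes an int (which implements none of the required methods) to iter(). The variable names are chosen only for this example.

```python
# Drive a built-in list manually with iter() and next().
numbers = [10, 20]
iterator = iter(numbers)      # calls numbers.__iter__()

first = next(iterator)        # calls iterator.__next__()
second = next(iterator)

# A third call finds no more items, so __next__ raises StopIteration.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# An int defines neither __iter__ nor __getitem__, so iter() raises TypeError.
try:
    iter(42)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(first, second, exhausted, error_name)  # 10 20 True TypeError
```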
Below is a manual implementation of a custom Iterable object using the Python Iterator Protocol syntax/framework:

    from collections.abc import Sequence
    from typing import Any


    class MyIterable:
        """Custom implementation of an Iterable object via Iterable Protocol"""

        def __init__(self, items: Sequence[Any]):
            """
            Initializes the object with the Sequence specified.

            Args:
                items: An object that defines the __getitem__ method.
            """
            # internal "protected" fields
            self._sequence = items
            self._counter = 0

        def __iter__(self):
            """Returns a reference to the Iterable object"""
            return self

        def __next__(self):
            """Returns the next item or raises StopIteration"""
            # if the indexing variable hasn't exceeded the length of
            # the sequence, get the next item -- otherwise raise StopIteration
            if self._counter < len(self._sequence):
                item = self._sequence[self._counter]
                self._counter += 1
                return item
            raise StopIteration
Note: The use of Sequence here is a type hint indicating that the argument should implement the __getitem__ method, which provides self[key] type access. (A Sequence also defines __len__, which the class relies on through its call to len().)
The MyIterable class taps into the power of Python’s Iterator Protocol by implementing a custom __next__ method alongside __iter__. These methods allow instances of this custom object to tap into Python-native language tools like for loops, while loops, and even list comprehensions. Let’s take our new object for a test drive:
Our custom object allows instances to be accessed via standard for loop syntax as follows:

    # create new instance encapsulating a list object
    my_list = MyIterable([1, 2, 3, 4, 5])

    # use Python-native for looping to iterate
    for item in my_list:
        print(item)
This code produces the following to stdout, just as if we had used a list:

    1
    2
    3
    4
    5
Python’s list comprehensions are a powerful abstraction for applying logic to sequences of values. Below, we see our custom object instance can make use of this powerful language-level tool as well:

    # create new instance encapsulating a list object
    my_list = MyIterable([1, 2, 3, 4, 5])

    # copy 1-4 into a separate list using list comp.
    no_five = [x for x in my_list if x < 5]

    # print results
    for j in no_five:
        print(j)
Similar to before, this code produces the following to stdout, minus the last element:

    1
    2
    3
    4
The real benefit of using custom iterable objects defined via the Python Iterator Protocol is that they are accessible to the language’s tools that can interact with built-in iterables like list, dict and even str. Below, we see how the .join method can be used on a custom Iterable object:

    # create a fresh instance, since the one used above has already been consumed
    my_list = MyIterable([1, 2, 3, 4, 5])

    # chain the join tool and list comprehension together
    stringed = "".join([str(x) for x in my_list])

The output is a string: 12345. Sure, this isn’t a direct use case but it shows how abiding by the Python Iterator Protocol allows easy folding into Python syntax.
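One caveat with this design is worth keeping in mind: because MyIterable returns itself from __iter__ and tracks its position in _counter, a given instance can only be traversed once. The sketch below repeats a condensed copy of the class (without type hints or docstrings) purely so that it runs on its own.

```python
class MyIterable:
    """Condensed copy of the class above, repeated so this snippet is self-contained."""

    def __init__(self, items):
        self._sequence = items
        self._counter = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._counter < len(self._sequence):
            item = self._sequence[self._counter]
            self._counter += 1
            return item
        raise StopIteration


my_list = MyIterable([1, 2, 3])
first_pass = [x for x in my_list]   # consumes every item
second_pass = [x for x in my_list]  # the counter is already at the end

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

If repeated traversal is needed, one option is to create a new instance for each pass.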
Iterator vs. Iterable vs. Iteration
Goal: Develop an intuition for Python’s Iterator Protocol
There are some semantic concepts one needs to be aware of when using iterators in general — but especially when designing and implementing custom Python iterables. Let’s define some terms:
- Iterator: an object with a __next__ method (next in Python 2.x).
- Iterable: an object with an __iter__ method that returns an iterator; the iterator is what maintains the state of the iteration.
- Iteration: A general term used to describe the process of iterating over a sequence of items (language agnostic).
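To make the distinction between the first two terms concrete, here is a minimal sketch that splits the roles across two classes. Countdown and CountdownIterator are hypothetical names invented for this illustration; the isinstance checks use the abstract base classes from collections.abc.

```python
from collections.abc import Iterable, Iterator


class CountdownIterator:
    """Iterator: holds the traversal state and implements __next__."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        # Iterators conventionally return themselves from __iter__.
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


class Countdown:
    """Iterable: holds the data and hands out a fresh iterator on request."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


countdown = Countdown(3)
print(list(countdown))  # [3, 2, 1]
print(list(countdown))  # [3, 2, 1] again, since each pass gets a new iterator

print(isinstance(countdown, Iterable))        # True
print(isinstance(countdown, Iterator))        # False, it has no __next__
print(isinstance(iter(countdown), Iterator))  # True
```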
Those are some easy-access definitions but don’t get us far in terms of developing an intuition of their relevancy in Python. The language from Python’s Iterators documentation is helpful:
… the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate.
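The behavior described in that passage can be approximated by hand. The following is a rough sketch of what a for loop does, not the interpreter's actual implementation; the helper name manual_for is made up for this example.

```python
def manual_for(iterable, action):
    """Apply action to each member of iterable, roughly as a for loop would."""
    iterator = iter(iterable)          # step 1: ask the container for an iterator
    while True:
        try:
            item = next(iterator)      # step 2: request the next member
        except StopIteration:
            break                      # step 3: no members left, end the loop
        action(item)


collected = []
manual_for("abc", collected.append)
print(collected)  # ['a', 'b', 'c']
```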
Now for a curveball. There is a special method named __getitem__ that was used to define an Iterable in early 2.x versions of Python, before PEP 234. This method is still compatible with Python 3.x versions and, as far as I know, there are no hints that its use to qualify an object as an iterable will be deprecated. The __getitem__ method is used primarily to implement the evaluation of self[key] syntax. The takeaway:
- __getitem__ used to be used to define an Iterable in older versions of Python.
- __getitem__ is now used to implement the self[key] access for objects.
- An Iterable implementing __iter__ does not have self[key] access unless it also implements the __getitem__ method.
Remember that last point: you can iterate over an Iterator but not necessarily index into its values. More specifically: Iterables are not Sequences by default.
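Both directions of this point can be checked with a short experiment. The Letters class below is a hypothetical example that defines only __getitem__; the second half shows that an ordinary list iterator cannot be indexed.

```python
class Letters:
    """Hypothetical class that defines __getitem__ but not __iter__."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        # The IndexError raised past the end is what stops the iteration.
        return self._text[index]


letters = Letters("abc")
print(list(letters))   # ['a', 'b', 'c'], because iter() falls back to __getitem__
print(letters[1])      # b

# Conversely, a plain iterator can be looped over but not indexed.
iterator = iter([1, 2, 3])
try:
    iterator[0]
    indexable = True
except TypeError:
    indexable = False
print(indexable)       # False
```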
Custom Iterable Objects
In our example above, we saw a custom implementation of a basic Iterable object. Instances of this object are used similarly to how any native Sequence type object might be used and support interoperability with many language features like list comprehension. This object isn’t very exciting though. Let’s implement some more exciting custom Iterables in Python:
Example 1: Fibonacci Iterable
Most Computer Science programs will, at some point, ask students to implement a function that calculates the n-th member in the Fibonacci sequence. Commonly, this is introduced as the seminal example of recursion. Let’s implement it as a Python Iterable:

    class Fibonacci:

        def __init__(self, n: int):
            """
            Args:
                n: The number in the Fibonacci sequence that is to be calculated.
            """
            self.n = n
            self._i = 0
            self._current = 0
            self._next = 1

        def __iter__(self):
            return self

        def __next__(self):
            """Calculates and returns the next value in the Fibonacci sequence."""
            # returns next digit until the n-th digit is reached
            if self._i < self.n:
                # increases reference to members
                self._i += 1
                # gets the current value
                fib = self._current
                # gets next values in sequence
                self._current, self._next = self._next, self._current + self._next
                return fib
            # raises the StopIteration error which Python uses to end e.g. for loops
            else:
                raise StopIteration
Here we see all the essential Iterable methods, __iter__ and __next__, implemented. During each call to __next__ (made during a for loop iteration, for example) the next number in the Fibonacci sequence is calculated until n values have been produced, at which point the StopIteration exception is raised, signaling to Python’s Iterator Protocol that the iterator has been exhausted. Let’s see how this can be used:
Approach 1: Looping

    # loops
    for digit in Fibonacci(10):
        print(digit)

    # output (one value per line): 0 1 1 2 3 5 8 13 21 34
Approach 2: Explicit List Conversion
The Fibonacci sequence can be converted to a list explicitly by using Python’s list() built-in:

    # explicitly converted to a list
    digits = list(Fibonacci(10))
    print(digits)

    # output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Approach 3: List Comprehension
Implementing Fibonacci using Python’s Iterator Protocol means we can also leverage Python’s list comprehension:

    # via list comprehension
    digits = [d for d in Fibonacci(10)]
    print(digits)

    # output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Approach 4: Wrapped in a Function
For cases where only the n-th number is needed, the Fibonacci class can be wrapped into a custom function:

    # via custom function
    def get_nth_fibonacci(n: int) -> int:
        """Returns the nth number in the Fibonacci sequence"""
        return list(Fibonacci(n))[-1]

    # get a number
    print(get_nth_fibonacci(n=15))

    # output: 377
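The wrapper above builds the whole list just to read its last element. As an alternative sketch, the variant below keeps only the most recent value by feeding the iterator into a collections.deque with maxlen=1. The function name get_nth_fibonacci_lazy is hypothetical, and a condensed copy of the Fibonacci class is repeated so the snippet runs on its own.

```python
from collections import deque


class Fibonacci:
    """Condensed copy of the class above so this snippet runs on its own."""

    def __init__(self, n):
        self.n = n
        self._i = 0
        self._current, self._next = 0, 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= self.n:
            raise StopIteration
        self._i += 1
        fib = self._current
        self._current, self._next = self._next, self._current + self._next
        return fib


def get_nth_fibonacci_lazy(n):
    """Hypothetical variant that keeps only the most recent value in memory."""
    # A deque with maxlen=1 discards older items as new ones arrive.
    last = deque(Fibonacci(n), maxlen=1)
    return last[0]


print(get_nth_fibonacci_lazy(15))  # 377
```

Like the list-based version, this raises an IndexError when n is 0, since no values are produced.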
The headline here is: anything one can do with an Iterable in Python one can now do with our custom Fibonacci class.
Python’s Iterator Protocol is a powerful framework whereby developers can implement custom classes that are able to leverage the full power of the Python programming language. As described by the documentation: “This style of access is clear, concise, and convenient” — the three C’s. To quote again from the documentation: “The use of iterators pervades and unifies Python.” As such, making use of the Iterator Protocol helps ensure that one’s custom classes are Pythonic and more easily understood by other developers. For those readers yearning for more: check out the documentation for Python’s Data Model for an even deeper dive into iterators.