ICS 33 Spring 2024, Notes and Examples: Iteration

Background

Earlier in the course, we introduced the term iterable, which is used to describe objects in Python that can be iterated. When an object is iterated, it's being asked to produce a sequence of values one at a time. Many different kinds of objects are iterable, and iteration can be used for many different purposes.

The first example of iteration you likely saw early in your Python journey was the for loop, which is flexible enough to be able to iterate many kinds of objects.


>>> for i in [1, 2, 3]:
...     print(i, end = ' ')
...
    1 2 3        # Lists can be iterated, one element at a time.
>>> for i in 'Boo':
...     print(i, end = ' ')
...
    B o o        # Strings can be iterated, a one-character string at a time.
>>> for i in range(3):
...     print(i, end = ' ')
...
    0 1 2        # Ranges can be iterated, one integer at a time.

This flexibility is made possible by the same mechanisms that make it possible to pass any of those same kinds of objects to the list constructor, the set constructor, a list comprehension, and so on.


>>> list([1, 2, 3])
    [1, 2, 3]             # The list constructor iterates its argument, building a list
                          # containing those elements.
>>> list('Boo')
    ['B', 'o', 'o']
>>> set(range(3))
    {0, 1, 2}             # The set constructor iterates its argument, building a set
                          # containing those elements (as long as they're hashable, and
                          # with duplicates removed).
>>> [x.upper() for x in 'Boo']
    ['B', 'O', 'O']       # A list comprehension iterates its argument, evaluating the
                          # initial expression for each, and building a list of the results.

We've already seen a handful of examples of how these kinds of common behaviors in Python are driven by protocols, which not only allows the flexibility for those behaviors to be supported by many of the types built into Python and its standard library, but importantly allows us to design our own types that support those same behaviors. What if we want to design a class whose objects are iterable? To do so, we'll need to understand the mechanics of iteration in Python, so let's start there.

Manual iteration in Python

The iteration mechanism in Python is centered around two similar-sounding but slightly different concepts.

An iterable is an object that we can iterate, which is to say that we can tell it that we want it to produce a sequence of elements, one at a time.
An iterator is an object whose job is to manage the process of producing that sequence of elements once. It might keep of track of how many we've seen, which one we'll see next, and whether there are any that we haven't seen yet.

At first blush, that might seem like a distinction without a difference. Why can't iterables manage the process of iterating themselves? Why do they need an intermediary — the iterator — to do it for them? The answer lies in one example.


>>> values = [1, 2, 3]
>>> [(x, y) for x in values for y in values]
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

Consider how the list comprehension would have had to build this result. By the time it's done, it will have iterated the same list a total of four times: once to produce all of the x values, and once to produce all of the y values for each of the x values. All of those iterations are out of step with each other; they're beginning and ending at different times, and they're proceeding at different rates.

So, how might lists allow themselves to be iterated more than once at different times and different rates, as we're doing here? There are two design choices.

Lists need to keep track of every iteration simultaneously, so that when we ask them to give us the next element, they'll know how to do it. But, critically, this will also complicate how we ask for the next element, because we'll need to tell the list for which iteration we need the next element.
An intermediary object is used to manage the iteration instead. Each time we ask for the next element, it's the intermediary that we ask; that way, it's obvious which iteration we're asking about and what the result should be, because it's the intermediary who keeps track of how far a particular iteration has gotten already.

Python's design opted for the second of these techniques, with iterators being those intermediaries, because it's the simpler and more flexible of the two. So, if we want to understand Python's iteration mechanism, we need to know how to do two things.

Start a new iteration, by asking an iterable object for an iterator.
Ask an iterator for the next element we haven't seen yet, along with a way for it to tell us that we've seen them all.

Python provides two built-in functions, iter and next, for these respective purposes.


>>> values = [1, 2, 3]
>>> i = iter(values)         # This is how we create an iterator for an iterable object.
>>> type(i)
    <class 'list_iterator'>  # A list_iterator is an iterator that can iterate a list.
>>> next(i)
    1                        # This is how we ask an iterator for the next element.
>>> next(i)
    2
>>> i2 = iter(values)        # Multiple iterators can exist simultaneously.
>>> next(i2)
    1
>>> next(i)
    3
>>> next(i)
    Traceback (most recent call last):
      ...
    StopIteration            # A StopIteration exception is raised when there are no more elements.
>>> next(i2)
    2                        # When one iteration has finished, others are unaffected.

Other types of objects that are iterable, such as sets, strings, or ranges, can be used with iter and next similarly.

The details and costs of list iteration

Lists have a characteristic that's worth exploring with respect to iteration: They are mutable. In other words, lists can have elements added or removed at any time, including while they're in the midst of being iterated. (This is true whether we're iterating manually using iter and next, or using automatic iteration such as a for loop.) So, what happens when we modify a list while it's being iterated?


>>> values = [1, 2, 3, 4]
>>> i = iter(values)
>>> next(i)
    1
>>> del values[1]         # This is the next element we would have seen.
>>> next(i)
    3                     # The removed element was simply skipped, and we got back
                          # the next element still in the list, which seems reasonable.

>>> values = [1, 2, 3, 4]
>>> i = iter(values)
>>> next(i)
    1
>>> values.insert(1, 17)  # Inserting an element after the one we just saw from the iterator.
>>> next(i)
    17                    # The inserted element emerged in the proper sequence.

>>> values = [1, 2, 3, 4]
>>> i = iter(values)
>>> next(i)
    1
>>> next(i)
    2
>>> del values[0]         # Removing an element we already saw.
>>> next(i)
    4                     # An element we hadn't seen yet, but is still in the list, was skipped.
                          # That seems a little stranger.

Mutating a data structure while you're in the midst of iterating it is generally not a good idea, if it can be avoided, but these three examples give us some insight about how list_iterator works internally, because there's a straightforward technique that would behave in exactly this way. A list_iterator could store two things: a reference to the list being iterated and the index of the next element to be returned. (Since modifications to the list affect iteration, we know the iterator isn't storing a copy of the list.)

When initially created, it would store a reference to the list, along with the index 0.
When asked for its next element, it would return the element at the index it's storing, then add 1 to that index.
When the index it's storing is greater than or equal to the list's length, asking it for its next element would cause it to raise StopIteration.

This also gives us some insight about the time and memory cost of iteration.

A list_iterator requires O(1) memory, no matter how large the list is, because it always stores exactly two things: a reference to the list and an integer index.
Asking a list_iterator for its next element takes O(1) time, because it always obtains an element from a known index. It's no more expensive to do that with the 500th element than it is with the fifth.

So, all in all, we can iterate an entire list of n elements in O(n) time, while using O(1) additional memory, which is as cheap asymptotically as you'd ever expect iteration of n elements to be.

The iterable protocol

We saw that the built-in Python functions iter and next can be used to perform iteration manually, but a question remains: By what mechanism can many different types be compatible with iter and next? When iter asks an object for an iterator, how does it ask for one? When next asks an iterator for its next element, how does that work?

When we want the objects of a class to be iterable, we need the class to support the iterable protocol. Like the other protocols we've seen, that means we need to write the dunder methods it requires, and that those methods need to work in a way that's compatible with the protocol. The iterable protocol looks deceptively simple, because it only requires one method.

__iter__(self), which returns an iterator that is capable of producing one element at a time.

We can see the iterable protocol in action in the Python shell, though we'll rarely need to call __iter__ directly in practice.


>>> type([1, 2, 3].__iter__())
    <class 'list_iterator'>
>>> type('Boo'.__iter__())
    <class 'str_iterator'>
>>> type(range(5).__iter__())
    <class 'range_iterator'>

So, that's the mechanism that makes the built-in iter function work. What about next? Once we have an iterator, how does next interact with it? Iterators, too, are built around a protocol: the iterator protocol, which consists of two methods.

__next__(self), which returns the next element from the iterator, or raises a StopIteration if there are no more elements.
__iter__(self), which returns the iterator itself. (This allows iterators to be used wherever iterables can be used, so you can, for example, pass an iterator to the list constructor or use one to drive a for loop. In other words, iterators are iterable, even if iterables are not iterators.)

Let's see if list iteration follows that protocol.


>>> values = [1, 2, 3]
>>> i = iter(values)
>>> i.__iter__() is i
    True             # Iterators' __iter__ methods return themselves.
>>> i.__next__()
    1                # Iterators' __next__ methods return each element in turn.
>>> i.__next__()
    2
>>> i.__next__()
    3
>>> i.__next__()
    Traceback (most recent call last):
      ...
    StopIteration    # Iterators' __next__ methods raise StopIteration when
                     # there are no more elements.

It does indeed. So, now we have a pretty good idea of what makes an object able to be iterated: Its class implements the iterable protocol, which provides an __iter__ method that returns an object whose class implements the iterator protocol.

But what can we do with that knowledge? Now that we know what it takes for a class to have objects that are iterable, we can write those classes ourselves.

Writing an iterable class

You've likely used Python's ranges in a fair amount of detail previously, but let's quickly review a few of their abilities by using list comprehensions to quickly show us the sequence of values produced by different ranges.


>>> [x for x in range(7)]
    [0, 1, 2, 3, 4, 5, 6]   # Passing one positional argument specifies a stop, with start
                            # defaulted to 0 and step defaulted to 1.
>>> [x for x in range(3, 7)]
    [3, 4, 5, 6]            # Passing two positional arguments specifies the start and stop,
                            # respectively, with step defaulted to 1.
>>> [x for x in range(0, 10, 2)]
    [0, 2, 4, 6, 8]         # Passing three positional arguments specifies the start, stop, and
                            # step, respectively.
>>> [x for x in range(start = 0, stop = 10, step = 2)]
    Traceback (most recent call last):
      ...
    TypeError: range() takes no keyword arguments
                            # The range function doesn't accept keyword arguments.

Now, let's suppose that we want to build our own rough equivalent of Python's range type, which we'll name MyRange. How would we do it?

(It may seem pointless to build things in a programming language when they already exist, but it's actually a useful technique for learning the details of a programming language and its library, because it gives you a complete problem you can work on without having to decide "What should I build and how should it work?", and also because it exposes areas of a language that you haven't seen yet, as we'll see a little later in this example. Particularly when you're engaging in self-study, this can be a great way to help keep you moving forward in a learning process, as long as you can avoid feeling discouraged when there are aspects that you won't know how to build yet. For now, we'll embrace the problems that we already understand how to solve, and defer the rest until later.)

Initializing a MyRange

Based on the examples above, we see that initializing revolves around an __init__ method that accepts up to three positional arguments, though their meaning is partly determined by how many there are.

When exactly one argument is passed, it is treated as the stop value, with start defaulted to 0.
When fewer than three arguments are passed, step is defaulted to 1.

We can detect missing arguments by using default arguments to take their place, though we'll need to be sure we use defaults that don't otherwise have a legitimate meaning. None is a good choice there. We can also use the / notation in the __init__ method's signature to enforce the requirement that all arguments be passed positionally.


class MyRange:
    def __init__(self, start, stop = None, step = None, /):
        ...

Having used default arguments to detect whether arguments are missing, we'll need to handle other defaulting (e.g., start being 0, step being 1) ourselves. We might also want to ensure that start, stop, and step all have type int, which makes our type more difficult to misuse.


class MyRange:
    def __init__(self, start, stop = None, step = None, /):
        if stop is None:
            stop = start
            start = 0

        if step is None:
            step = 1

        self._require_int(start, 'start')
        self._require_int(stop, 'stop')
        self._require_int(step, 'step')

        self._start = start
        self._stop = stop
        self._step = step


    @staticmethod
    def _require_int(value, name):
        if type(value) is not int:
            raise TypeError(f'{name} must be an integer, but was {value}')

We'll probably also want a way to find out the start, stop, and step values associated with a MyRange, but not to change them. While we don't yet have the tools to enforce that attributes can't have new values assigned to them, we can at least mark them as protected (which we have) by prefixing their names with a single underscore, then provide public methods that return their values. Since those values will be immutable, there's no danger of changing a MyRange by calling these methods. (Later this quarter, we'll learn some ways to prevent attributes from being modified, which would give us the ability to simplify this code a bit.)


class MyRange:
    ...

    def start(self):
        return self._start


    def stop(self):
        return self._stop


    def step(self):
        return self._step

We might now like to do some quick experimentation in the Python shell, to see whether we have the basics of initialization in place. (We might then want to follow up by writing a thorough set of unit tests, to solidify the correctness of our implementation going forward.)


>>> MyRange(7)
    <__main__.MyRange object at 0x000002207C524340>
                     # We can't see the details in the Python shell, so we'll need to dig for them.
>>> r = MyRange(7)
>>> r.start(), r.stop(), r.step()
    (0, 7, 1)        # start and step are defaulted when only one argument is passed.
>>> r = MyRange(3, 7)
>>> r.start(), r.stop(), r.step()
    (3, 7, 1)        # step is defaulted when two arguments are passed.
>>> r = MyRange(5, 15, 2)
>>> r.start(), r.stop(), r.step()
    (5, 15, 2)       # No defaults when we specify all three arguments.
>>> r = MyRange('A', 'B', 'C')
    Traceback (most recent call last):
      ...
    TypeError: start must be an integer, but was A
                     # Non-integer arguments aren't permitted.

That first example was a bit unfortunate, because ranges do a better job of displaying themselves in the Python shell than ours do.


>>> range(7)
    range(0, 7)
>>> range(3, 15, 2)
    range(3, 15, 2)

When an expression is evaluated in the Python shell, an object is returned. The result that's displayed is what's called a representation of that object. We can specify an object's representation by adding a __repr__ method to its class, so let's do that.


class MyRange:
    ...

    def __repr__(self):
        step_repr = f'' if self._step == 1 else f', {self._step}'
        return f'MyRange({self._start}, {self._stop}{step_repr})'

Let's try that out in the Python shell.


>>> MyRange(10)
    MyRange(0, 10)         # Much better!
>>> MyRange(3, 15, 2)
    MyRange(3, 15, 2)      # Yep!  Looking good.

Making MyRange iterable

We saw previously that we can make an object iterable by adding an __iter__ method to its class, which returns an iterator that is capable of iterating through the object once. It will generally be the case that the iterator needs to be aware of the object that it's iterating, so a good first step is this.


class MyRange:
    ...

    def __iter__(self):
        return MyRangeIterator(self)


class MyRangeIterator:
    def __init__(self, myrange):
        self._myrange = myrange

The iterator's job will be to give us each element in the range, one at a time, which means it'll need to do two things:

Keep track, at all times, of the next value to be returned. Initially, that will be the start value.
When asked for the next element, return that value, but update the "next value" by adding the step to it.

We'll also need to be sure that we're doing these things in the right places, by initializing the "next value" in the __init__ method and the rest in the __next__ method (which is part of the iterator protocol). (To follow the rest of the iterator protocol, we'll need an __iter__ method in the iterator, too.)


class MyRangeIterator:
    def __init__(self, myrange):
        self._myrange = myrange
        self._next = myrange.start()


    def __iter__(self):
        return self


    def __next__(self):
        result = self._next
        self._next += self._myrange.step()
        return result

How does that work so far?


>>> r = MyRange(3)
>>> i = iter(r)
>>> next(i)
    0
>>> next(i)
    1
>>> next(i)
    2        # Looking good so far!
>>> next(i)
    3        # Wait!  It needs to stop!
>>> next(i)
    4

All we're missing at this point is the ability to stop iterating when there are no more values in the range. We can detect this in the __next__ method by checking whether self._next has reached or exceeded the range's stop value. If so, we should raise a StopIteration instead of returning a result.


class MyRangeIterator:
    ...

    def __next__(self):
        if self._next >= self._myrange.stop():
            raise StopIteration
        else:
            result = self._next
            self._next += self._myrange.step()
            return result

Did that fix the problem?


>>> r = MyRange(3)
>>> i = iter(r)
>>> next(i)
    0
>>> next(i)
    1
>>> next(i)
    2
>>> next(i)
    Traceback (most recent call last):
      ...
    StopIteration             Looks good to me!

So, we can see that our ranges are compatible with hand-written iteration using iter and next. How about other constructs that can accept iterables? Can they now accept MyRanges instead?


>>> [x for x in MyRange(10)]
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(MyRange(1, 17, 6))
    [1, 7, 13]

Asymptotic analysis of MyRange

There's something else we should consider about MyRange. How performant is it? How much time does it spend? How much memory? Could we have done significantly better than we did on those measurements?

Let's start with memory, because there's not much to it.

What memory is used by a MyRange? Regardless of how large the range is, it's always going to be three attributes, each storing an integer: _start, _stop, and _step. For a MyRange with n elements, we would say the memory usage here is O(1).
What additional memory is used by a MyRangeIterator when an iteration is in progress? Regardless of how large the range is, it's always going to be two attributes: a reference back to the MyRange and the _next attribute that tracks what the next value to be returned is. So, again, the memory usage is not dependent on the size of the range; it, too, is O(1).

Both of these are encouraging results. From an asymptotic perspective, you can't do any better. For something that feels like a data structure, we've achieved what seems impossible: We don't need more memory to store a larger range than we do to store a smaller one. (To be fair, that's because MyRange is so limited in terms of what it can store, because its entire sequence of elements can be boiled down to a formula involved start, stop, and step. This wasn't magic; it was simply taking advantage of the simplifying properties of the problem we were solving.)

How about time? How long does it take to create a MyRange? How long does it take to iterate one?

Constructing a MyRange involves initializing the start, stop, and step values. There's a little bit of shuffling of parameters in the __init__ method, but all of that work is constant — it doesn't take longer to construct a larger range than it does a smaller one. So, we'd say this takes O(1) time.
Iterating a MyRange containing n elements requires first creating an iterator, and then calling __next__ on it n times.
- The creation step involves initializing exactly two attributes. That takes O(1) time.
- Each call to __next__ checks that we haven't passed the stop value, then adds the step to the "next" value. This takes O(1) time for each call to __next__. If there are n elements in the range, a complete iteration of the entire range will take O(n) time.

These results, too, are in the category of being as good as it gets. Not bad for our first attempt!

What else can ranges do that MyRanges can't?

At this point, we're fairly satisfied that we've applied the new knowledge we've gained about iterables and iterators, as well as one new bit about representations and the __repr__ method. But it's worth stopping for a minute and considering what kinds of things range can do that MyRange still can't. You can learn a lot about small-picture software design by comparing your work to existing work.


>>> range(3, 5, 0)
    Traceback (most recent call last):
      ...
    ValueError: range() arg 3 must not be zero
                    # If step is zero, ranges will continue forever.  This is best
                    # disallowed, but MyRange currently allows it.
>>> [x for x in range(-2, -8, -2)]
    [-2, -4, -6]    # range allows negative step sizes and handles them correctly.
>>> [x for x in MyRange(-2, -8, -2)]
    []              # MyRange allows them, but mishandles them.
>>> r = range(0, 10, 2)
>>> r[3]
    6               # ranges can be indexed, but MyRanges can't.
>>> r.start
    0               # range exposes start, stop, and step as
                    # attributes rather than methods.
>>> r.start = 3
    Traceback (most recent call last):
      ...
    AttributeError: readonly attribute
                    # But range doesn't allow those attributes to be modified.
>>> len(range(0, 10, 2))
    5               # ranges have a length, but MyRanges do not.
>>> range(10) == range(10)
    True            # ranges can be compared for equality, but MyRanges cannot.
>>> bool(range(0, 0))
    False           # Empty ranges are falsy, but empty MyRanges are not.

Still, despite the limitations of what we've done, this is a really good start. As we proceed through this course, we'll soon find out how we might add all of these abilities to out MyRange class, so that it can be as complete and full-featured as Python's built-in range class.

The completed code example

Below are links to a complete version of our MyRange example, along with a collection of unit tests for it.