Iterators

Iterators are one of the most useful (and used) features in Python. Besides being used explicitly in for loops (including for loops in comprehensions), constructors often include a parameter that is iterable, whose values are used to initialize the state of the objects they are building: e.g., write set(alist) to create a set with all the values from a list (removing any duplicates). Also, functions like sorted and reversed take an iterable as an argument and return something we can iterate over (sorted returns a list, reversed returns an iterator): e.g., for i in sorted(aset).

In the next three lectures we will explore iterators in more detail. The first focuses on the mechanics of iterators; the second focuses on examples of iterators for classes; the third introduces a new kind of function called a generator (closely related to what computer science calls a coroutine) that returns a value, but when called again remembers where it left off to return another value: this feature makes writing all sorts of iterators as functions (not classes) much easier. Throughout these lectures we will discuss iterator efficiency issues (both time and space): because iterators are used so often, their running times can dominate a program's running time.

In this lecture we will learn about the __iter__ and __next__ methods (which a class must implement to be iterated over), how a for loop is equivalent to a while loop (which needs a try/except block to catch the StopIteration exception that is raised by "exhausted" iterators), and how sharing and mutation affect iterators: sometimes they can cause problems, and we will learn how to mitigate such problems.

------------------------------------------------------------------------------

For a class to be iterable (i.e., used in a for loop) it must implement the iterator protocol, which means the __iter__ and typically the __next__ methods. A protocol is just a group of methods that work together to accomplish something useful.
We have already seen the __enter__ and __exit__ methods that work together to implement the context manager protocol. The iterator protocol also includes the StopIteration class, which is an exception raised by __next__ to signal there are no more values to iterate over (terminating the for loop). The builtins module defines StopIteration.

Before writing some classes that act as iterators, we will explore the semantics of Python's for loop, by showing an equivalent while loop. We can think of Python translating the former into the latter; the while loop is more primitive than the for loop, so we can see more details of how a for loop works by analyzing the while loop. In this process we will better understand for loops, and also see how to write loops that process iterators in a more intricate way than the straightforward (but very useful) way that for loops process them.

Before discussing this translation, we should first understand how the else keyword is used in for/while loops. Since I'm not sure how familiar you are with these statements, I'll start at the beginning (but move quickly).

1) The break statement in Python (just the keyword break as a statement) can be put in any for/while loop; semantically, when executed it terminates the loop. I often write loops like

  while True:
      statements
      if test:
          break
      statements

In fact, when the if test: break statement appears first in the loop, as shown below,

  while True:
      if test:
          break
      statements

we can write an equivalent loop instead as

  while not (test):   # not is high precedence, so I put () around test
      statements

Some programmers/educators banish breaks from loops, but I think that edict is too extreme. I could give a long lecture on how programmers (and educators who teach programming) feel about break statements. Instead I'll just say the following.

  a) When I teach indefinite looping (in ICS-31), I teach the while True/break form first. It decouples deciding to loop from how the loop terminates. When students need to write a loop, they write an infinite loop first, and later decide where to test the "termination" condition, and what this condition should be; this test can be stated in the "positive" form: terminate when the test is True.

  b) If that test is the first statement in the loop (as shown above; sometimes we need to re-arrange code to get equivalent code with that test first) then I will SOMETIMES convert it into a while loop whose test is a "continuation" condition; the test must be stated in the "negative" form: terminate when the test is False (equivalent to continue when the test is True).

  c) I think it is much easier to think about termination (something is unique about the state) than about continuation (lots of other states). Think about terminating when a value reaches 0 versus continuing when the value is any positive number.

  d) Students sometimes go crazy and write too many different conditional breaks inside loops: there is no limit. Programmers need to work hard to reduce the number of breaks to simplify their code, but sometimes the simplest to understand code has a few breaks. In fact, we can also have conditional breaks in for loops, because they can terminate that kind of loop: a sample use might be searching over a range of values with a for loop, but terminating when a special value is reached.

  e) Often difficulties with breaks get resolved if we take the loop code we are writing and put it in a function, replacing multiple breaks by multiple returns. Of course some programmers/educators don't like functions with multiple returns.

Here is a canonical example of a while loop that is more easily understood when written as a while True loop with a conditional break.

  sum = 0
  while True:
      value = prompt.for_int('Enter test score (-1 to terminate)')
      if value == -1:
          break
      sum += value
  print('Sum =',sum)   # we know here the loop exited so value == -1 is True

To use a while test loop (and no conditional break), we would need to write

  sum = 0
  value = prompt.for_int('Enter test score (-1 to terminate)')
  while value != -1:
      sum += value
      value = prompt.for_int('Enter test score (-1 to terminate)')
  print('Sum =',sum)   # we know here the loop exited so value != -1 is False

What I object to in the code above is the duplicate prompt; in other code there might be more statements duplicated.

2) The actual syntax of a for/while loop is

  for index(es) in iterable:        while boolean-expression:
      block-body                        block-body
  [else:                            [else:
      block-else]                       block-else]

Semantically, when each loop terminates (it may terminate "normally" or by executing a break inside the first block), if the else keyword is present and the loop terminated normally, then Python executes block-else. So else means: execute block-else if the loop didn't execute a break to terminate.

As a simple example that illustrates the use of else: we could write

  for i in range(100):
      if special_property(i):
          print(i,'is the first with the special property')
          break
  else:
      print('No value had the special property')

I don't find myself writing else in loops much, but that might be because I have never used a language that allows such a feature. As I continue to use Python, I'll come to a more concrete conclusion about the usefulness of this language feature.

OK, now back to the main event: iterators. Python translates for loops like

  for indexes in iterable:
      block-body
  [else:
      block-else]

into the following code.
Note that like len, Python's iter and next functions translate into calls of __iter__ and __next__ on their argument: e.g., def iter(i): return i.__iter__()

  _hidden = iter(iterable)            # iterable.__iter__()
  try:
      while True:
          indexes = next(_hidden)     # _hidden.__next__()
          block-body
  except StopIteration:
      pass                            # A place-holder in case [] discarded
      [block-else]                    # the except block cannot be empty

Study this equivalence carefully. Note that _hidden stands for an internal object that Python keeps while it runs the for loop; it is initialized by calling iter(iterable) and our code has no name by which to refer to it. Then Python executes a while True loop INSIDE a try/except statement. Each iteration of the loop binds the next value(s) to be iterated over to indexes and then executes block-body. This rebinding/block-body execution continues until either

  (1) next raises the StopIteration exception, which is handled by causing the loop to terminate (the except is outside the loop, so handling the exception leaves the loop) and executing block-else.

  (2) block-body executes a break statement, which causes the loop to terminate. Because there was no exception in the body of the try, the try/except terminates without having to handle an exception (and therefore doesn't execute block-else).

As a concrete example, the simple for loop

  for i in range(1,6):   # for the values 1-5
      print(i)
      #break             # uncomment to see what break does
  else:
      print('executing else')

is translated into

  _hidden = iter(range(1,6))
  try:
      while True:
          i = next(_hidden)
          print(i)
          #break         # uncomment to see what break does
  except StopIteration:
      pass
      print('executing else')

Try executing both scripts in Eclipse, including uncommenting each break statement to see the effect: the loop terminates early and doesn't print 'executing else'.

Now that we understand the use of the iter/next functions, we can write more interesting loops (while loops) that process iterables.
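
The translation can also be checked by running both forms on the same data. The sketch below is only an illustration: the helper names for_events and while_events and the stop_at parameter are made up for this check, and each function records what it would have printed so that the two results can be compared.

```python
def for_events(iterable, stop_at=None):
    """Run an ordinary for/else loop and record what happens."""
    events = []
    for i in iterable:
        events.append(i)
        if i == stop_at:          # stands in for a conditional break in block-body
            break
    else:
        events.append('else')     # reached only when the loop did not break
    return events


def while_events(iterable, stop_at=None):
    """The same loop written with iter/next, following the translation above."""
    events = []
    hidden = iter(iterable)       # hypothetical name; plays the role of _hidden
    try:
        while True:
            i = next(hidden)
            events.append(i)
            if i == stop_at:
                break             # leaves the while loop; the except clause is skipped
    except StopIteration:
        events.append('else')     # block-else runs only when next ran out of values
    return events


# Both forms should agree with and without a break, and on an empty iterable.
for data, stop_at in [(range(1, 6), None), (range(1, 6), 3), ([], None)]:
    assert for_events(data, stop_at) == while_events(data, stop_at)

print(while_events(range(1, 6)), while_events(range(1, 6), 3))
```

Calling iter and next directly, as while_events does, also lets a loop pull values in patterns that a for loop does not offer.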
For example, suppose that we wanted to write a function that returned a list of the absolute values of the differences between each adjacent pair of values in an iterable: e.g., for the list [5, 2, 8, 3, 5] it would return [3, 6, 5, 2]. We can write this function as follows

  def abs_dif_list(iterable):
      answer = []
      i = iter(iterable)
      try:
          v2 = next(i)       # inside the try, so an empty iterable returns []
          while True:
              v1, v2 = v2, next(i)
              answer.append(abs(v1-v2))
      except StopIteration:
          pass
      return answer

  print(abs_dif_list(range(1,10,2)), abs_dif_list( [5, 2, 8, 3, 5]))

Now, I wouldn't call this function simple/beautiful, but it is pretty easy to write and understand. It is awkward to write with a plain for loop or comprehension, because each step needs two adjacent values and an arbitrary iterable cannot be indexed, and I'm not sure what other Python features we could use to write it as simply and efficiently.

------------------------------------------------------------------------------

Classes implementing the iterator protocol: how range really works

In this section we will first write a very simple Countdown class and then a more complex prange class (pseudo-range) that acts like the real range class from Python's builtins module. Then we will generalize it by overloading some simple operators for it (as the real range class does). I'm not sure how range is really implemented, but this implementation seems straightforward.

I would like the following iterator behavior for Countdown objects. The loop

  for i in Countdown(10):
      print(str(i)+', ',end='')
  print('blastoff')

Should print: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, blastoff

Here is the class implementing these semantics.

  class Countdown:
      def __init__(self,last):
          self.last = last

      def __iter__(self):
          self.n = self.last
          return self   # must return an object on which __next__ can be called

      def __next__(self):
          if self.n < 0:
              raise StopIteration
          else:
              answer = self.n
              self.n -= 1
              return answer

In this class, when __iter__ is called it (re)sets self.n (the value __next__ will return first) to self.last (which never changes).
The __iter__ method has a requirement that it must return an object that defines a __next__ method. Here it returns self, which is an object constructed from Countdown, which defines __next__ (right below __iter__). When __next__ is called it checks whether self.n has shot past 0 and, if it has, raises StopIteration; otherwise it returns the current value of self.n but before doing so it decrements self.n by 1 (by saving it, decrementing it, then returning the saved value). Note that if we substituted Countdown(-1) in the loop above, its body would be executed 0 times, so the loop itself would print nothing (only blastoff would appear).

------------------------------

Quick interlude:

I originally said Python defines iter like len:

  def iter(i):
      return i.__iter__()

But the answer is closer to the following (not completely true yet, but correct until we discuss inheritance).

  def iter(i):
      x = i.__iter__()
      if '__next__' not in x.__dict__ and '__next__' not in type(x).__dict__:
          raise TypeError('iter() returned non-iterator of type '+str(type(x))[8:-2])
      return x

If neither the object returned by i.__iter__() nor the class it comes from defines __next__ then Python raises an exception right there, not even waiting for __next__ to get called on the iterator.

Ya know, even I am getting tired of writing str(type(x))[8:-2], so I am going to put the following function in my next release of the goody module

  def type_as_str(x):
      return str(type(x))[8:-2]

That is how libraries get built.

------------------------------

Now let's aim higher and write a prange class that operates like the range class. Recall there are 3 possible parameters for range: start, stop, and step. Start has a default value of 0 and step has a default value of 1, but we cannot write the following parameter structure for the __init__ in prange.

  def __init__(self,start=0,stop,step=1): ....

Python rejects this header outright (a parameter without a default value cannot follow one with a default value), and even if it were allowed, prange(5) would be an illegal call: it would bind start to 5 but have no value to bind for stop; but in this case start should be 0 and stop should be bound to 5. More generally prange(5) should bind the first/only argument to stop; prange(1,5) should bind the first argument to start and the second to stop; and prange(1,5,2) should bind the first argument to start, the second to stop, and the third to step.

Here is the start of the prange class, using *args in __init__ to solve this parameter problem. I also made the __repr__ method return a string just like range: it always shows start and stop but shows step only if it is not 1.

  class prange:
      def __init__(self,*args):
          try:
              self.start, self.step = 0,1
              if len(args) == 1:
                  self.stop = int(args[0])
              elif len(args) == 2:
                  self.start, self.stop = int(args[0]), int(args[1])
              elif len(args) == 3:
                  self.start, self.stop, self.step = int(args[0]), int(args[1]), int(args[2])
                  if self.step == 0:
                      raise ValueError('3rd argument must not be 0')
              else:
                  raise TypeError('range expected 1-3 arguments, got '+str(len(args)))
          except ValueError:
              raise TypeError('some argument cannot be interpreted as integer:'+str(args))

      def __repr__(self):
          return 'prange('+str(self.start)+','+str(self.stop)+('' if self.step==1 else ','+str(self.step))+')'

Note that int(...) is called on each argument, and will raise the ValueError exception if the conversion fails.

Now let's add the main functionality: the iterator protocol methods. They are similar to but generalize what we wrote in the Countdown class.

      def __iter__(self):
          self.n = self.start
          return self   # must return an object on which __next__ can be called

      def __next__(self):
          if self.step > 0 and self.n >= self.stop or \
             self.step < 0 and self.n <= self.stop:
              raise StopIteration
          answer = self.n
          self.n += self.step
          return answer

In this class, when __iter__ is called it (re)sets self.n (the value __next__ will return) to self.start (which never changes).
The __iter__ method has a requirement that it must return an object that defines a __next__ method. Here it returns self, which is an object constructed from prange, which defines __next__ (right below __iter__). When __next__ is called it checks whether self.n has reached or exceeded self.stop (different tests, depending on whether self.step is positive or negative) and if so raises StopIteration; otherwise it returns the current value of self.n but before doing so it increments self.n by self.step (by saving it, incrementing it, returning the saved value).

Try running various loops or comprehensions using prange objects to ensure that this code performs like Python's built-in ranges.

Now we move prange closer to a real range by writing methods that implement __len__, __getitem__, __contains__, and __reversed__. All of these methods use some fancy mathematics to compute their results, so I won't discuss here how these methods work in detail (but I encourage you to examine them). Note that __reversed__ just returns a new prange object, but with different start/stop values and a step that is the opposite. To run this code, I need to import math (to use the math.ceil function).

      def __len__(self):
          if self.step > 0 and self.start >= self.stop or \
             self.step < 0 and self.start <= self.stop:
              return 0
          else:
              return math.ceil((self.stop-self.start)/self.step)

      def __getitem__(self,n):
          if n >= len(self):   # yes, could be n >= self.__len__()
              raise IndexError('range('+str(self)+') index('+str(n)+') out of range')
          return self.start+n*self.step

      def __contains__(self,n):
          # n is in the range if it lies between start and stop and is a whole number of steps from start
          if self.step > 0:
              return self.start<=n<self.stop and (n-self.start)%self.step == 0
          else:
              return self.stop<n<=self.start and (n-self.start)%self.step == 0

      def __reversed__(self):
          if self.step > 0:
              return prange(self.start+(len(self)-1)*self.step,self.start-1,-self.step)
          else:
              return prange(self.start+(len(self)-1)*self.step,self.start+1,-self.step)

For my final topic in this section, I am going to write an alternative implementation of this class: one that stores the complete list of values that are in a range.
I will then discuss the complexity of the code and some time/space tradeoffs.

In this code, the __init__ method computes and remembers start, stop, and step (but only for use in __repr__). The code at the end of __init__ is a while loop that manually iterates through all the values in the range, storing each value in a list (try to reverse the test and replace True by the continuation condition for this loop). Once we have a list with all these values, all the other methods are much simpler to implement (they typically just delegate to the list to get their job done) and require no complicated mathematics to write correctly.

The __iter__ method just delegates to construct an iterator for the list (defined in the list class). The __next__ method is not defined in this class, because the object returned by __iter__ is a list iterator, which has its own __next__ method. len, [], in, and reversed all work by delegating to the list.

  import math
  class prange:
      def __init__(self,*args):
          try:
              self.start, self.step = 0,1
              if len(args) == 1:
                  self.stop = int(args[0])
              elif len(args) == 2:
                  self.start, self.stop = int(args[0]), int(args[1])
              elif len(args) == 3:
                  self.start, self.stop, self.step = int(args[0]), int(args[1]), int(args[2])
                  if self.step == 0:
                      raise ValueError('3rd argument must not be 0')
              else:
                  raise TypeError('range expected 1-3 arguments, got '+str(len(args)))
          except ValueError:
              raise TypeError('some argument cannot be interpreted as integer:'+str(args))
          except TypeError:
              raise

          self.listof = []
          self.n = self.start
          while True:
              if self.step > 0 and self.n >= self.stop or \
                 self.step < 0 and self.n <= self.stop:
                  break
              self.listof.append(self.n)
              self.n += self.step

      def __repr__(self):
          return 'prange('+str(self.start)+','+str(self.stop)+('' if self.step==1 else ','+str(self.step))+')'

      def __iter__(self):
          return iter(self.listof)

      # no need to define __next__: __iter__ returns iter(list), and the
      #   list iterator it returns defines __next__

      def __len__(self):
          return len(self.listof)

      def __getitem__(self,n):
          return self.listof[n]

      def __contains__(self,n):
          return n in self.listof

      def __reversed__(self):
          return reversed(self.listof)

So what are the tradeoffs between these two implementations? The list implementation of the prange class is much simpler (apart from the loop in __init__ that turns the range into a real list): it is 47 vs 59 lines, and it requires no complex mathematics (hard for us, trivial for the computer). This class can require a huge amount of memory for a large range, while the original prange stores only 4 instance variables. Finally, the __contains__ method here takes an amount of time proportional to the range size, to search for the value in the list; whereas the original implementation computes this value just by arithmetic, independent of the size of the range.

As the quarter goes on, we will study efficiency more formally, but I thought this was a good first time to bring up the issue, dealing with alternative implementations for the prange class (and its iterator).

------------------------------------------------------------------------------

Sharing iterators and Mutating objects that are being iterated over

Examine the following code

  l = [0,1,2,3,4,5,6,7,8,9]
  x = iter(l)
  y = iter(l)
  print(next(x))
  print(next(y))
  print(next(x))
  print(next(y))
  print(next(x))
  print(next(y))

It defines a list l, and constructs two iterator objects for the list: the first bound to x, the second bound to y. Each call of next(x) refers to one iterator object and each call of next(y) refers to another. Each call advances the state of one iterator object. So identical values come out of each iterator: the code prints two 0s, two 1s, two 2s.

Now change the code to

  l = [0,1,2,3,4,5,6,7,8,9]
  x = iter(l)
  y = x

Now x and y are bound to the same iterator object. Each call of next is on the same OBJECT (whether next(x) or next(y)), which advances the state of that one iterator object.
So the print statements above print 0, 1, 2, 3, 4, and 5.

Finally, what do you think the following code will produce? The big question is, does Python iterate over the values in l when the iterator is CREATED, or does it iterate over the values of l at the time the iterator is USED?

  l = [0,1,2,3,4,5,6,7,8,9]
  x = iter(l)
  l.append(10)
  try:
      while True:
          print(next(x),end=' ')
  except StopIteration:
      pass

The answer is that it iterates over the values of l at the time the iterator is USED. It prints the integers 0 - 10.

What if we change the loop to

  try:
      while True:
          v = next(x)
          print(v,end=' ')
          if v == 4:
              l[4:7] = ('a','b','c')
  except StopIteration:
      pass

It prints: 0 1 2 3 4 b c 7 8 9 10. Although 'a' is stored in position 4, the value in that position of the list has already been returned, so when __next__ is called again it prints the value in the next position, 'b'. So any mutation we make to a list while it is being iterated over can affect the results of the iteration.

Now look at the following code; it causes an infinite loop (printing higher and higher values) because for every iteration of the loop, a new value is added to the list.

  i = 0
  l = [i]
  for x in l:
      print(x)
      i += 1
      l.append(i)

So, the iterator object is keeping track of what index it is on, but the list it indexes to get these values is also growing (its len is increasing).

We can easily avoid this problem by iterating over a copy of the list. There are many ways to create copies of objects (see the copy module, for example) but the easiest way to make a copy of a list l is by writing list(l) or even l[:]. If we replace for x in l: by for x in l[:]: only the original list's value (just one 0) is printed.

It would be easy to write a variant of a list class whose __iter__ method makes a copy of the list, so any changes subsequently made to the list will not change the result of the iteration. When we study lots of iterators in the next lecture, we will see examples of these kinds of classes.
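
A minimal sketch of such a variant follows. It is an illustration only: the class name CopyIterList, its small set of methods, and the helper grow_while_iterating are assumptions made here (they are not the classes from the next lecture), and the class wraps a list by delegation rather than by inheritance.

```python
class CopyIterList:
    """A tiny list-like wrapper whose iteration is immune to later mutation."""

    def __init__(self, values=()):
        self._values = list(values)

    def append(self, value):
        self._values.append(value)

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        # Iterate over a copy made right now; appends made during the loop
        # change self._values but not the copy being iterated over.
        return iter(self._values[:])


def grow_while_iterating(container):
    """Append to the container on every iteration and return the values seen."""
    seen = []
    for x in container:
        seen.append(x)
        container.append(x + 1)
    return seen


c = CopyIterList([0])
print(grow_while_iterating(c), len(c))   # the loop ends, although c grew during it
```

With a plain list in place of CopyIterList, the same helper would never return, for the reason given above.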
It takes time and space to make a copy, but sometimes it is worth it (to avoid the problems shown above).

------------------------------------------------------------------------------

Problems:

1. Is the following translation of a for loop also equivalent to the translation of the for loop shown above? Argue they are equivalent or show an example where they differ.

  _hidden = iter(iterable)
  while True:
      try:
          indexes = next(_hidden)
          block-body
      except StopIteration:
          block-else
          break

2. Write the class Random_N which is constructed with an integer argument and produces that many random values.

------------------------------------------------------------------------------

The entire original prange (59 lines when # and line below it removed)

  import math
  class prange:
      def __init__(self,*args):
          try:
              self.start, self.step = 0,1
              if len(args) == 1:
                  self.stop = int(args[0])
              elif len(args) == 2:
                  self.start, self.stop = int(args[0]), int(args[1])
              elif len(args) == 3:
                  self.start, self.stop, self.step = int(args[0]), int(args[1]), int(args[2])
                  if self.step == 0:
                      raise ValueError('3rd argument must not be 0')
              else:
                  raise TypeError('range expected 1-3 arguments, got '+str(len(args)))
          except ValueError:
              raise TypeError('some argument cannot be interpreted as integer:'+str(args))
          except TypeError:
              raise

      def __repr__(self):
          return 'prange('+str(self.start)+','+str(self.stop)+('' if self.step==1 else ','+str(self.step))+')'

      def __iter__(self):
          self.n = self.start
          return self   # must return an object on which __next__ can be called

      def __next__(self):
          if self.step > 0 and self.n >= self.stop or \
             self.step < 0 and self.n <= self.stop:
              raise StopIteration
          save = self.n
          self.n += self.step
          return save

      def __len__(self):
          if self.step > 0 and self.start >= self.stop or \
             self.step < 0 and self.start <= self.stop:
              return 0
          else:
              return math.ceil((self.stop-self.start)/self.step)

      def __getitem__(self,n):
          if n >= len(self):   # yes, could be n >= self.__len__()
              raise IndexError('range('+str(self)+') index('+str(n)+') out of range')
          return self.start+n*self.step

      def __contains__(self,n):
          # n is in the range if it lies between start and stop and is a whole number of steps from start
          if self.step > 0:
              return self.start<=n<self.stop and (n-self.start)%self.step == 0
          else:
              return self.stop<n<=self.start and (n-self.start)%self.step == 0

      def __reversed__(self):
          if self.step > 0:
              return prange(self.start+(len(self)-1)*self.step,self.start-1,-self.step)
          else:
              return prange(self.start+(len(self)-1)*self.step,self.start+1,-self.step)
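
One property of this design is worth checking by hand. Because __iter__ returns self and resets self.n, every loop over the same object shares a single position, which ties back to the earlier discussion of shared iterators. The sketch below uses a hypothetical stand-in class, UpTo, written in the same style (it is not the prange class itself), plus a made-up helper named pairs, and compares nested loops over one UpTo object with nested loops over the built-in range.

```python
class UpTo:
    """Counts 0, 1, ..., stop-1; written in the __iter__-returns-self style."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        self.n = 0        # every call of iter() resets the one shared position
        return self

    def __next__(self):
        if self.n >= self.stop:
            raise StopIteration
        answer = self.n
        self.n += 1
        return answer


def pairs(iterable):
    """Nested loops over the SAME object: the inner loop calls iter() again."""
    return [(a, b) for a in iterable for b in iterable]


print(len(pairs(range(3))))   # range hands out a fresh iterator to each loop
print(pairs(UpTo(3)))         # the inner loop exhausts the position the outer loop shares
```

Returning a separate iterator object from __iter__, as the list-based version does with iter(self.listof), avoids this sharing.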