Iterators

Iterators are one of the most useful (and used) features in Python. Besides being used explicitly with for loops (for loop statements and comprehensions), constructors often include a parameter that is iterable, whose values are used to initialize the state of the list/tuple/set/dict objects they are building: e.g., we write set(alist) to create a set with all the values from alist (removing duplicates in the process). Also, calls to functions like sorted and reversed produce results that we can iterate over: e.g., in for i in sorted(aset): ... the sorted function takes an iterable as an argument and returns a list that Python iterates over (reversed takes a sequence and returns an iterator rather than a list).

In the next three lectures we will explore iterators in more detail. The first focuses on the underlying mechanics of iterators; the second focuses on examples of iterators written in/for classes; the third introduces a new kind of function-like object (called a generator function, which is more generally known in computer science as a coroutine) that returns a value, but when called again remembers where it "left off executing" to return another value: this feature makes writing all sorts of iterators as generators (not as classes) much easier.

Throughout these lectures we will also discuss iterator efficiency issues (both time and space): because iterators are used so often, their running times can dominate a program's running time; they can often use little space because (like tuple comprehensions, which Python calls generator expressions) they produce their values one at a time, rather than storing them all in a large data structure that is iterated over (although with constructors and comprehensions, we can easily create such a data structure from an iterator, if we need to examine it all at once).

In this lecture we will learn about the __iter__ and __next__ methods (which a class must implement to be iterated over), how a for loop doing iteration is translated into an equivalent while loop (which needs a try/except block to catch the StopIteration exception that is raised by "exhausted" iterators), and how sharing and mutation affect iterators: sometimes they can cause problems, and we will learn how to mitigate such problems, although often at the cost of using more space (by copying the object to iterate over).

------------------------------------------------------------------------------

For a class to be iterable (i.e., used in any kind of for loop) it must implement the iterator protocol, which involves both the __iter__ and __next__ methods: __iter__ must return an object on which __next__ can be called (often the same object, in which case one class defines both methods). A protocol is just a group of methods that work together to accomplish some task. We have already seen the __enter__ and __exit__ methods that work together to implement the context manager protocol. Included in the iterator protocol is the StopIteration class, which is an exception raised by __next__ to signal there are no more values to iterate over (terminating the for loop). The builtins module defines the StopIteration class.

Before writing any classes that act as iterators, we will explore the semantics of Python's for loop, by showing how to translate it into an equivalent -but more primitive- while loop and try/except statement. Python translates for loops into while loops using __iter__ and __next__ automatically; the while loop is more primitive than the for loop, so we can see more details of how for loops work by analyzing their equivalent while loops. In this process we will better understand for loops, and also see how to write loops that process iterators in a more intricate way than the straightforward (but simple and very useful) way that for loops process them.

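As a small preview of the protocol described above, the sketch below drives a list's iterator by hand with the built-in iter and next functions. The names values, it, first, second, third, and exhausted are chosen only for this illustration.

```python
values = [10, 20, 30]

it = iter(values)        # ask the list for an iterator: values.__iter__()
first = next(it)         # it.__next__() returns 10 and advances the iterator
second = next(it)        # returns 20
third = next(it)         # returns 30

try:
    next(it)             # no values remain: __next__ raises StopIteration
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

A for loop performs these same calls for us, and treats the StopIteration exception as the signal to stop looping.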
------------
for loop with else Interlude

Before discussing this translation, we should first understand how the else keyword is used in for/while loops. Since I'm not sure how familiar you are with these statements, I'll start at the beginning (but move quickly).

1) The break statement in Python (just the keyword break as a statement) can be put in any for/while loop; semantically, when executed it terminates the loop. I sometimes write loops like

  while True:
      statements
      if test:
          break
      statements

In fact, when the if test: break statement appears FIRST in the loop, as shown below,

  while True:
      if test:
          break
      statements

we can write an equivalent while loop that incorporates the test directly

  while not (test):   # not is HIGH precedence, so I put test in ()
      statements      #   not True and False   = (not True) and False = False
                      #   not (True and False) = True

-----Interlude
In Python "not" has a higher precedence than "and", which has a higher precedence than "or" (think of "not" like unary "-", "and" like "*", and "or" like "+"). Thus, in Python the expression

  not False or True

is equivalent to

  (not False) or True

which evaluates to True or True = True; whereas

  not (False or True)

evaluates to not (True) = False

All the logical operators have lower precedence than relational operators, which have a lower precedence than arithmetic operators. These orderings make it easy to write expressions that build relational operators on arithmetic operators and logical operators that build on relational operators with few parentheses:

  x**2 < 100*math.cos(x) and 3*x+1 >= math.sqrt(18)
-----End Interlude

Some programmers/educators banish using breaks in loops, but I think that edict is too extreme. I could give a long lecture on how programmers (and educators who teach programming) feel about break statements. Instead I'll just say the following.

a) When I teach indefinite looping (in ICS-31), I teach the while True/break form first. It decouples deciding to loop from how the loop terminates.
When students need to write a loop, they write an infinite loop first, and later decide where to test the "termination" condition, and what this condition should be; this test can be stated in the "positive" form: terminate when the test is True (unlike while: keep executing while the test is True).

b) If that test is the first statement in the loop (as shown above; sometimes we need to re-arrange code to get equivalent code with that test first) then I will SOMETIMES convert it into a while loop whose test is a "continuation" condition; the test must be stated in the "negative" form: terminate when the test is False (equivalent to continue when the test is True).

c) I think it is much easier to think about termination (stop when some specific state occurs) than about continuation (continue in lots of other states). Think about terminating when a value reaches 0 versus continuing when the value is any positive number.

d) Students sometimes go crazy and write too many different conditional breaks inside loops: there is no limit. Programmers need to work hard to reduce the number of breaks to simplify their code, but sometimes the "simplest to understand" code has a few breaks. In fact, we can also have conditional breaks in for loops, because they can terminate that kind of loop too (since for loops, as we'll see below, are translated into while loops): one simple use might be searching over a range of values with a for loop, but terminating when a special value is reached.

e) Often difficulties with breaks get resolved if we take the loop code we are writing and put it in a function, replacing multiple breaks by multiple returns. Of course some programmers/educators don't like to write functions with multiple returns either.

Here is a canonical example of a while loop that is more easily understood when written as a while True loop with a conditional break. This is a "sentinel" loop, which sums the values read until a sentinel (-1) is read.
It is a "middle exit" loop, because termination is computed in the middle of the body of the loop.

  sum = 0
  while True:
      value = prompt.for_int('Enter test score (-1 to terminate)')
      if value == -1:
          break
      sum += value
  print('Sum =',sum)   # we know here the loop exited, so value == -1 is True

To use a while test loop (and no conditional break), we would need to write it with the following test

  sum = 0
  value = prompt.for_int('Enter test score (-1 to terminate)')
  while value != -1:
      sum += value
      value = prompt.for_int('Enter test score (-1 to terminate)')
  print('Sum =',sum)   # we know here the loop exited, so value != -1 is False

What I object to in the code above is the duplicate prompt; in other code there might be even more statements duplicated.

2) The full syntax of the for/while loops is

  for index(es) in iterable:        while boolean-expression:
      block-body                        block-body
  [else:                            [else:
      block-else]                       block-else]

where [else: block-else] is optional (using [] from EBNF). Semantically, when each loop terminates (it may terminate "normally" or by executing a break inside the block-body), if the else keyword is present and the loop terminated normally, then Python executes block-else. So else means: execute block-else if the loop didn't execute a break to terminate. In the case of the for loop, it means the iterator stopped; in the case of the while loop it means the boolean-expression finally evaluated to False.

As a simple example that illustrates the use of else, we could write the following code (notice the else is indented at the level of the for, not the if)

  for i in range(100):
      if special_property(i):
          print(i,'is the first with the special property')
          break
  else:
      print('No value had the special property')

I don't find myself writing else in loops much, but that might be because I am new to a language that allows such a feature. As I continue to use Python, I'll come to a more concrete conclusion about the usefulness of this language feature.

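To make the for/else semantics concrete, here is a runnable sketch. The function first_multiple_of and its "special property" (being a multiple of n) are hypothetical stand-ins invented for this illustration; they are not part of the lecture's code.

```python
def first_multiple_of(n, limit):
    """Describe the first multiple of n among 1, 2, ..., limit-1 (if any)."""
    for i in range(1, limit):
        if i % n == 0:                  # the stand-in "special property"
            result = f'{i} is the first with the special property'
            break                       # leaving by break skips the else block
    else:                               # runs only when the range is exhausted
        result = 'No value had the special property'
    return result

print(first_multiple_of(7, 100))   # the loop breaks at i == 7
print(first_multiple_of(7, 5))     # 1..4 has no multiple of 7: else executes
```

The else block executes only in the second call, where the loop terminates normally because the range runs out of values.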
------------

OK, now back to the main event: iterators. Python translates for loops like

  for index(es) in iterable:   # indexes for unpacked assignment: e.g., key,value
      block-body
  [else:
      block-else]

into the following code.

  _hidden = iter(iterable)           # ultimately calls iterable.__iter__()
  try:
      while True:
          index(es) = next(_hidden)  # ultimately calls _hidden.__next__()
          block-body
  except StopIteration:
      pass                           # A place-holder, when [] is discarded;
      [block-else]                   #   the except block cannot be empty
  finally:
      del _hidden                    # Remove _hidden from name_space

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
For loop update interlude

The following translation is more complicated but more accurate for Python 3.6 and beyond:

  _hidden = iter(iterable)               # ultimately calls iterable.__iter__()
  try:
      while True:
          try:
              index(es) = next(_hidden)  # ultimately calls _hidden.__next__()
          except StopIteration:
              [block-else]               # omit if no block-else in for loop
              break
          block-body
  finally:
      del _hidden                        # Remove _hidden from name_space

Notice here that the body of the while loop still contains two statements: a try/except statement (with the call to next(...) in it) followed by the block-body (from the for loop). Now, the index(es) = next(_hidden) statement is in its own special try/except, which handles the StopIteration exception that might be raised during its execution: it executes the optional block-else if present (because the for loop terminates normally) and then breaks out of the while loop.

But now, if the block-body raises an exception of any kind, it also terminates the loop (through the outer try/finally). This outer try/finally does not handle the exception: it lets the exception propagate, but the finally: clause still ensures that del _hidden is executed.

What is the difference between these two translations?
It is subtle, but we can illustrate the difference with the following code:

  for i in range(10):
      raise StopIteration

In Python a StopIteration raised by the block-body ISN'T supposed to terminate the loop normally! The loop should be terminated normally only when there are no more range values: when it calls next(_hidden). The result of executing the code above in Python is that a StopIteration is raised but not handled.

If we used the first translation

  _hidden = iter(range(10))
  try:
      while True:
          i = next(_hidden)
          raise StopIteration
  except StopIteration:
      pass
  finally:
      del _hidden

Raising StopIteration explicitly would be handled in the only try/except there is, causing the loop to terminate normally. BUT THIS IS NOT WHAT HAPPENS IN PYTHON.

If we used the second translation

  _hidden = iter(range(10))
  try:
      while True:
          try:
              i = next(_hidden)
          except StopIteration:
              break
          raise StopIteration
  finally:
      del _hidden

Raising StopIteration explicitly would now reach the outer try/finally, again causing the loop to terminate, but the StopIteration exception still propagates because the outer try/finally has no except clause to handle that type of exception.

So, the two translations are the same when the block-body doesn't raise the StopIteration exception. But only the second accurately illustrates how Python executes this case: one we can easily illustrate with the for loop above.

Thanks to Kevin Liu (in Winter 2022) for bringing this issue to my attention.

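The difference between the two translations can be checked directly. In the sketch below each version is wrapped in an ordinary function; the function names and the helper outcome are made up for this illustration, and the block-body is the single statement raise StopIteration.

```python
def first_translation():
    """One try/except around the whole while loop."""
    _hidden = iter(range(10))
    try:
        while True:
            i = next(_hidden)
            raise StopIteration          # block-body
    except StopIteration:
        pass                             # also swallows the body's exception
    finally:
        del _hidden
    return 'terminated normally'

def second_translation():
    """A separate try/except around just the call to next."""
    _hidden = iter(range(10))
    try:
        while True:
            try:
                i = next(_hidden)
            except StopIteration:
                break
            raise StopIteration          # block-body: not caught here
    finally:
        del _hidden
    return 'terminated normally'

def real_for_loop():
    for i in range(10):
        raise StopIteration
    return 'terminated normally'

def outcome(f):
    """Report whether calling f returns or lets StopIteration escape."""
    try:
        return f()
    except StopIteration:
        return 'StopIteration propagated'

for f in (first_translation, second_translation, real_for_loop):
    print(f.__name__, '->', outcome(f))
```

Only first_translation terminates normally; second_translation behaves like the real for loop, letting the exception propagate to the caller.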
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

Note that like Python's len function, its iter and next functions translate into method calls of __iter__ and __next__ on their argument: e.g.,

  def iter(i):
      return i.__iter__()

  def next(i):
      return i.__next__()

The key to understanding how iter and next are used in iteration is: The iter function MUST return an object on which next can be called; and next is frequently called multiple times (in the while loop). Each call to the next function returns a value and advances the state of the iteration, until there are no more values to return; then calling next raises StopIteration (i.e., when there is no next value to iterate over).

Study the for/while equivalence carefully (the description below follows the first, simpler translation). Note that _hidden is a special name used by Python to translate the for loop (it is not really the name "_hidden"; I'm just trying to indicate there is a name, but it is hidden from use: it cannot be named/accessed by the programmer): it is defined by calling iter(iterable) and deleted when the loop terminates. We cannot use this name in our code even if we knew it; it can be used only by Python in the translation of the for loop.

Then Python executes a while True loop INSIDE a try/except statement. Each iteration of the loop binds the next value(s) to be iterated over to index(es) and then executes block-body. This rebinding/block-body execution continues until either

(1) next raises the StopIteration exception, which is handled by causing the loop to terminate (the except: is OUTSIDE the loop, so handling the exception causes Python to exit the loop, terminating it) and executing pass and block-else (if there is one: every except: clause must have at least one statement, which pass always satisfies), and finally removing the _hidden variable from the namespace (in the finally: clause).
(2) block-body itself executes a break statement, which causes the loop to terminate. Because there was no exception in the body of the try, the try/except terminates without having to handle an exception, and therefore does not execute the pass/block-else (if present), and finally removes the _hidden variable from the namespace (in the finally: clause).

In practice, most for loops do not contain break statements, so (1) happens much more frequently in our code than (2).

As a concrete example, the simple for loop

  for i in range(1,6):   # for the values 1-5
      print(i)
      #break             # uncomment this line to see what break does
  else:
      print('executing else')

is translated into

  _hidden = iter(range(1,6))
  try:
      while True:
          i = next(_hidden)
          print(i)
          #break         # uncomment this line to see what break does
  except StopIteration:
      pass
      print('executing else')
  finally:
      del _hidden

Try executing both scripts in Eclipse, including uncommenting each break statement to observe the effect of executing this statement: it unconditionally terminates the loop on the first iteration and doesn't print 'executing else'.

Now that we understand the use of the iter/next functions, we can write more interesting loops (while loops) that process iterables. For example, suppose that we wanted to write a function that returned the sum of the absolute values of the differences between each adjacent pair of values in an iterable: e.g., for the 5-list [5, 2, 8, 3, 5] it would return the value 16 = 3+6+5+2, where 3 = abs(5-2), 6 = abs(2-8), 5 = abs(8-3), and 2 = abs(3-5).

The following simple code works for arguments that can be sliced (like lists)

  def sum_dif(alist):
      return sum( [abs(a-b) for a,b in zip(alist,alist[1:])] )

But, it is not easy to write this function efficiently:

1) It is inefficient to slice a very large list: Python creates a new/sliced list whose length is just one shorter than alist, almost doubling the storage requirements.
2) An arbitrary iterable (e.g., a tuple comprehension, which Python calls a generator expression) doesn't even support slicing (not even indexing). Given the predicate is_prime, the call

  sum_dif( i for i in range(1000) if is_prime(i) )

would raise the exception

  TypeError: 'generator' object is not subscriptable

We could use the code below to turn the iterable into a real tuple (or list), and then slice that data structure. But that would also be space inefficient.

  def sum_dif(iterable):
      all_values = tuple(iterable)   # same if we used list(iterable)
      return sum( [abs(a-b) for a,b in zip(all_values,all_values[1:])] )

Here calling sum_dif(range(1_000_000_000)) would likely raise MemoryError (run out of storage) because the tuple would contain more values than can be stored in the computer's memory.

So, we need to focus on how we can write this code to store only two values at a time. We can write sum_dif without ever turning the iterable into a tuple or list as follows, using a variant of the standard for loop code (translated to a while loop) that we explored above.

  def sum_dif(iterable):
      answer = 0
      i = iter(iterable)
      v2 = next(i)                   # get first value (loop moves v2 to v1)
      try:
          while True:                # next gets one new value in each loop
              v1, v2 = v2, next(i)   # first time, v1 is 1st value, v2 is 2nd
              answer += abs(v1-v2)
      except StopIteration:
          pass
      return answer

Now, all the calls above (which failed) will work (although iterating over a billion values still takes a lot of computer time: about 10 minutes on my computer). This function has just five names (iterable, answer, i, v1, v2), none of which is bound to a big data structure: iterable and i are bound to iterables/iterators that produce their values one at a time; and answer, v1, and v2 are bound to integers.

Note that pass is required in this except: clause because we need no block-else; although, we could replace the pass statement by the return answer statement.
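The following sketch exercises the two-values-at-a-time version of sum_dif on a list, on a tuple comprehension, and on an empty list. It makes one assumption about the desired behavior: the first call to next is placed inside the try block, so that an empty iterable returns 0 instead of letting StopIteration escape from the function.

```python
def sum_dif(iterable):
    answer = 0
    i = iter(iterable)
    try:
        v2 = next(i)                 # inside the try: an empty iterable gives 0
        while True:
            v1, v2 = v2, next(i)     # right side is evaluated before binding
            answer += abs(v1 - v2)
    except StopIteration:
        pass
    return answer

print(sum_dif([5, 2, 8, 3, 5]))            # 3+6+5+2 = 16
print(sum_dif(x*x for x in range(4)))      # values 0,1,4,9: 1+3+5 = 9
print(sum_dif([]))                         # no values at all: 0
```

At any moment only v1, v2, and answer hold values from the iteration, no matter how many values the iterable produces.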
Also, we did not "del i" (i takes the place of _hidden) because i is a local variable in sum_dif: it will disappear when the function returns.

------------------------------------------------------------------------------

Classes implementing the iterator protocol: how range really works in Python

In this section we will first write a very simple Countdown class and then a more complex prange class (pseudo-range) that acts like the real range class from Python's builtins module. Then we will generalize prange by overloading some simple operators for it (as the real range class does). I'm not sure how range is really implemented, but this implementation seems straightforward and efficient.

I would like the following iterator behavior for Countdown objects. The loop

  for i in Countdown(10):
      print(str(i)+', ',end='')
  print('blastoff')

Should print: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, blastoff

Here is one class implementing these semantics.

  class Countdown:
      def __init__(self,start):
          self.start = start   # self.start never changes; see self.n in __iter__

      # __iter__ must return an object on which __next__ can be called; it returns
      # self, which is an object of the Countdown class, which defines __next__.
      # Later we will see a problem with returning self (when the same Countdown
      # object is iterated over in a nested structure), and how to solve that
      # problem.
      def __iter__(self):
          self.n = self.start  # n attribute is added to the namespace here
          return self          #   (not in __init__) and processed in __next__

      def __next__(self):
          if self.n < 0:
              raise StopIteration # can del self.n here, after exhausting iterator
          else:
              answer = self.n     # or, without the temporary, but more confusing
              self.n -= 1         #   self.n -= 1
              return answer       #   return self.n+1

In this class, when __iter__ is called it (re)sets self.n (the value __next__ will return first) to self.start (which is set in __init__ and never changes for a constructed object).
The __iter__ method has a requirement that it must return an object that defines a __next__ method. Here it returns self, which as an object constructed from Countdown, defines __next__ (right below __iter__). When __next__ is called it checks whether self.n has been decremented past 0, and if it has, raises StopIteration; otherwise it returns the current value of self.n, but before doing so, it decrements self.n by 1 (by saving it in a local variable, decrementing it, and then returning the saved local value).

As a variant in __next__, we could put del self.n directly before the raise statement, to remove this attribute from the namespace once the iterator is exhausted; if we did this, calling __next__ again would raise an AttributeError when accessing self.n; the code above, without del, would just raise StopIteration again, which is probably a better behavior to implement.

Note that if we substituted Countdown(-1) in the loop above, its body would be executed 0 times and nothing would be printed before "blastoff". Also, the following code counts down twice; in each for loop __iter__ is called (recall how it is translated into a while loop), which initializes self.n to 10 before calling __next__ multiple times.

  cd = Countdown(10)
  for i in cd:
      print(str(i)+', ',end='')
  print('blastoff')

  for i in cd:
      print(str(i)+', ',end='')
  print('blastoff')

It prints:

  10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, blastoff
  10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, blastoff

Again, at the start of each for loop (see the equivalent while loop), iter is called, which (re)initializes the attribute n to 10 from the attribute start.

------------------------------
Quick interlude:

I originally said Python defines iter like len:

  def iter(i):
      return i.__iter__()

But the truth is closer to the following (not completely true yet, but correct until we discuss inheritance).

  def iter(i):
      x = i.__iter__()
      if '__next__' not in x.__dict__ and '__next__' not in type(x).__dict__:
          raise TypeError('iter() returned non-iterator of type '+type_as_str(x))
      return x

If neither the object returned by i.__iter__() nor the class (type) it was constructed from defines a __next__ method, then Python raises an exception immediately: not waiting for a call of __next__ on the iterator to fail by raising an AttributeError exception.

------------------------------
prange: initialization (and the meaning of its arguments)

Now let's aim much higher and write a prange class that operates like the range class. Recall there are 3 possible parameters for range: start, stop, and step. Start has a default value of 0 and step has a default value of 1, but stop -written between start and step- has no default value.

Unfortunately, we CANNOT write the following parameter structure for the __init__ in prange.

  def __init__(self,start=0,stop,step=1): ....

Python rejects this header outright (a parameter without a default value cannot follow one with a default value); and even if it were allowed, prange(5) would be an illegal call: it would bind start to 5 but have no value to bind for stop; moreover, in this case start should be 0 and stop should be bound to 5. More generally prange(5) should bind the first/only argument to stop; prange(1,5) should bind the first argument to start and the second to stop; and prange(1,5,2) should bind the first argument to start, the second to stop, and the third to step. How can we write such parameters for such an __init__ method?

Here is the start of the prange class, using *args in __init__ to solve this parameter problem. The __init__ method raises exceptions like range: there must be 1-3 int arguments and the value of step cannot be 0; the __repr__ method returns a string just like range: it always shows start and stop values, but shows the step value only if it is not 1. It works by looking at *args and decoding the meaning of the arguments based on the number of arguments actually passed to prange.
To run this code, I need to import the type_as_str function from goody.

  from goody import type_as_str

  class prange:
      def __init__(self,*args):
          for a in args:
              if type(a) is not int:
                  raise TypeError('\''+type_as_str(a)+'\' object cannot be interpreted as an integer')

          self.start, self.step = 0, 1   # defaults for non-required parameters
          if len(args) == 1:
              self.stop = args[0]                         # store single argument
          elif len(args) == 2:
              self.start, self.stop = args                # unpack 2 arguments
          elif len(args) == 3:
              self.start, self.stop, self.step = args     # unpack 3 arguments
              if self.step == 0:
                  raise ValueError('3rd argument must not be 0')
          else:
              raise TypeError('range expected at most 3 arguments, got '+str(len(args)))

      def __repr__(self):
          return 'prange('+str(self.start)+','+str(self.stop)+('' if self.step==1 else ','+str(self.step))+')'

------------------------------
prange: implementing the iteration protocol

Now let's add the main functionality: the iterator protocol methods. They are similar to but generalize what we wrote in the Countdown class.

      def __iter__(self):
          self.n = self.start   # first value to return from __next__
          return self           # must return object on which __next__ is callable

      def __next__(self):
          if self.step > 0 and self.n >= self.stop or\
             self.step < 0 and self.n <= self.stop:
              raise StopIteration
          answer = self.n
          self.n += self.step
          return answer

In this class, when __iter__ is called it (re)sets self.n (the value __next__ will return) to self.start (which never changes). The __iter__ method has a requirement that it must return an object that defines a __next__ method. Here it returns self, which is an object constructed from prange, which defines __next__ (right below __iter__).
When __next__ is called it checks whether self.n has reached or exceeded self.stop (different tests, depending on whether self.step is positive or negative: self.step cannot be 0) and if so raises StopIteration; otherwise it returns the current value of self.n but before returning it increments self.n by self.step (by saving it, incrementing it, returning the saved value). We could avoid the temporary name answer by writing the following code, but it seems clumsy to me.

          if self.step > 0 and self.n >= self.stop or\
             self.step < 0 and self.n <= self.stop:
              raise StopIteration
          self.n += self.step
          return self.n - self.step

We discussed similar, but simpler, code in Countdown (there the equivalent of self.step was always -1).

Try running various loops or comprehensions using prange objects to ensure that this code performs like Python's built-in ranges.

------------------------------
prange: overloading operators

Now we move prange closer to Python's real range class by writing methods that implement __len__, __getitem__, __contains__, and __reversed__. All of these methods use some fancy mathematics to compute their results, so I won't discuss here how these methods work in detail (but I encourage you to examine them and calculate examples). Note that __reversed__ just returns a new prange object, but with different start/stop values and a step that is the opposite sign. So, none of these methods creates or uses any potentially large data structure.

To run this code, I need to import math (to use math's ceiling function, math.ceil: it returns the smallest integer >= its float/int argument; math.ceil(3.9) is 4).

      def __len__(self):
          if self.step > 0 and self.start >= self.stop or \
             self.step < 0 and self.start <= self.stop:
              return 0
          else:
              return math.ceil((self.stop-self.start)/self.step)

      def __getitem__(self,n):
          if n < 0:                        # Handle negative (index from end)
              n = len(self) + n
          if n < 0 or n >= len(self) :     # yes, could be n >= self.__len__()
              raise IndexError(str(self)+'['+str(n)+'] index out of range')
          return self.start+n*self.step

      def __contains__(self,n):
          if self.step > 0:
              return self.start<=n<self.stop and (n-self.start)%self.step == 0
          else:
              return self.stop<n<=self.start and (n-self.start)%self.step == 0

      def __reversed__(self):
          if self.step > 0:
              return prange(self.start+(len(self)-1)*self.step,self.start-1,-self.step)
          else:
              return prange(self.start+(len(self)-1)*self.step,self.start+1,-self.step)

------------------------------
prange: an alternative implementation (poor use of space and potentially time)

For my final topic in this section, I am going to write an alternative implementation of this class: one that uses __init__ to generate and store the complete list of values that are in a range. I will then discuss the complexity of the code and some time/space tradeoffs.

In this code, the __init__ method computes and remembers start, stop, and step (but only for use in __repr__). The code at the end of __init__ is a while loop that explicitly iterates through all the values in the range, storing each value in a list (try to reverse the test and replace True by the continuation condition for this loop).

Once we have a list with all these values, all the other methods are much simpler to implement (they typically just delegate to the list methods to get their jobs done) and require no complicated mathematics to write correctly.

The __iter__ method just delegates to construct an iterator for the list (defined in the list class); the __next__ method is not defined in this class, because the object returned by __iter__ is a list iterator, which has its own __next__ method (defined in the list iterator's class). len, [], in, and reversed all work by delegating to the list.

  from goody import type_as_str
  import math

  class prange:
      def __init__(self,*args):
          for a in args:
              if type(a) is not int:
                  raise TypeError('\''+type_as_str(a)+'\' object cannot be interpreted as an integer')

          self.start, self.step = 0,1   # defaults
          if len(args) == 1:
              self.stop = args[0]                         # store single argument
          elif len(args) == 2:
              self.start, self.stop = args                # unpack 2 arguments
          elif len(args) == 3:
              self.start, self.stop, self.step = args     # unpack 3 arguments
              if self.step == 0:
                  raise ValueError('3rd argument must not be 0')
          else:
              raise TypeError('range expected at most 3 arguments, got '+str(len(args)))

          # store exactly the range of values into a list
          self.listof = []
          self.n = self.start
          while True:
              if self.step > 0 and self.n >= self.stop or \
                 self.step < 0 and self.n <= self.stop:
                  break
              self.listof.append(self.n)
              self.n += self.step

      def __repr__(self):
          return 'prange('+str(self.start)+','+str(self.stop)+('' if self.step==1 else ','+str(self.step))+')'

      def __iter__(self):
          return iter(self.listof)

      # no need to define __next__: __iter__ returns iter(list), a list iterator,
      #   which defines its own __next__

      def __len__(self):
          return len(self.listof)

      def __getitem__(self,n):
          return self.listof[n]

      def __contains__(self,n):
          return n in self.listof

      def __reversed__(self):
          return reversed(self.listof)

So what are the tradeoffs between these two implementations? The list implementation of the prange class is much simpler (except that the loop in __init__, which turns the range into a real list, adds 8 extra lines: 23 vs 15); it requires no complex mathematics (but while such mathematics is hard for us, it is trivial for the computer). This class can require a huge amount of memory for storing a large range: e.g., for prange(1,1_000_000_000_000), this implementation must store a trillion element list, while the original prange always stores only 4 int attributes (self variables that we can use to generate all the values in the range -those in the list).
Also, __init__ takes a lot of time to construct such a list (and we might not even iterate over all the values in this prange). Finally, the __contains__ method here takes an amount of time proportional to the range size (we will study why soon), to search for the value in the list; whereas the original implementation computes this value just by some quick arithmetic, independent of the size of the range.

As the quarter goes on, we will study efficiency more formally, but I thought this was a good first time to bring up the issue, dealing with alternative implementations for the prange class (and its iterator). Again, the main point here is that the original prange COMPUTED the values in the range one at a time, as needed, from 4 attributes; while the list version EXPLICITLY stores all the values in a potentially huge list.

Generally, new Python students often create extra data structures that are not needed. For example, they read an entire open file into a list and then iterate over that list, when they could just iterate over the open file: if the file is huge, it might not fit in memory as a list; but iterating over an open file just needs enough space to store one line at a time from the file. Although I've promoted "write simple code" in this course, we should start thinking about some of the time/space issues of using Python. Often using Python "efficiently" doesn't make the code more complicated.

------------------------------------------------------------------------------

Sharing list iterators and Mutating list objects that are being iterated over

Examine the following code

  l = [0,1,2,3,4,5,6,7,8,9]
  x = iter(l)
  y = iter(l)
  print(next(x))
  print(next(y))
  print(next(x))
  print(next(y))
  print(next(x))
  print(next(y))

It defines one list l, and constructs two iterator objects for the list: the first bound to x, the second bound to y. Each call of next(x) refers to one iterator object and each call of next(y) refers to another.
Each call advances the state of one iterator object. So identical values come out of each iterator: the code prints 0, 0, 1, 1, 2, 2 (two 0s, 1s, and 2s).

Now change the code to

  l = [0,1,2,3,4,5,6,7,8,9]
  x = iter(l)
  y = x

Now x and y are bound to the same iterator object. Each call of next is on the same OBJECT (whether next(x) or next(y)), which advances the state of that one iterator object. So the print statements above print 0, 1, 2, 3, 4, and 5.

Finally, what do you think the following code will produce? The big question is, does Python iterate over the values in l when the iterator is CREATED, or does it iterate over the values of l at the time the iterator is USED? l might have been mutated since the iterator was created.

  l = [0,1,2,3,4,5,6,7,8,9]
  x = iter(l)
  l.append(10)
  try:
      while True:
          print(next(x),end=' ')
  except StopIteration:
      pass

The answer is that it iterates over the values of l at the time the iterator is USED. In this case it prints the integers 0 - 10 (including the 10 added to the list after the iterator was created). What if we change the loop to

  try:
      while True:
          v = next(x)
          print(v,end=' ')
          if v == 4:
              l[4:7] = ('a','b','c')
  except StopIteration:
      pass

It prints: 0 1 2 3 4 b c 7 8 9 10. The slice assignment replaces the values at indexes 4, 5, and 6 with 'a', 'b', and 'c'. Although 'a' is now stored at index 4, the value at that index of the list has already been iterated over (returned), so when __next__ is called again it returns the value at the next index, 'b'.

So any mutation we make to a list while it is being iterated over can affect the results of the iteration. Now look at the following code; it causes an infinite loop (printing higher and higher values) because for every iteration of the loop, a new value is added to the list, increasing the number of values to iterate over.

  i = 0
  l = [i]              # To start, l = [0], a list of one value
  for x in l:
      print(x)
      i += 1
      l.append(i)      # Append another value into list l

So, the iterator object for a list is keeping track of what index it is on, but the list it indexes (from which to get these values) is also growing (its len is increasing).

We can avoid all these problems by iterating over a COPY of the list. There are many ways to create copies of objects (see the copy module, for example) but the easiest way to make a copy of a list l is by using the idiom list(l) or even l[:]. If we replace

  for x in l:

by

  for x in l[:]:       # or for x in list(l):

in the example above, only the original list's value (just one 0) is printed. Of course doing so makes a copy of the list (using more space). Changing l does not change the copy of the original l that x is iterating over.

It would be easy to write a variant of a list class whose __iter__ method always makes a copy of the list to iterate over, so any changes subsequently made to the list will not change the result of the iteration. Of course, a drawback of that approach is that it requires extra space to "remember what the list contained when the iterator started". When we study lots of iterators in the next lecture, we will see examples of these kinds of classes. Of course, besides the extra space it takes extra time to make a copy, but sometimes it is worth it (to avoid the problems shown above).
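
As a minimal sketch of such a list variant (the class name SnapshotList is hypothetical, invented for this illustration; it is not part of Python or of the course library), we can subclass list and override only __iter__ so that it iterates over a copy. The sketch reruns the "infinite loop" example from above, collecting the values in a list named printed (also a name assumed here) instead of printing them in the loop.

```python
class SnapshotList(list):
    """Hypothetical list variant: each iteration walks a copy of the list
    made at the time the iterator is created."""
    def __iter__(self):
        # self[:] copies the values by slicing (slicing does not call
        # __iter__, so this method does not call itself); calling
        # list.__iter__ explicitly returns a standard list iterator
        # over that copy
        return list.__iter__(self[:])

i = 0
l = SnapshotList([i])    # To start, l holds one value: 0
printed = []
for x in l:              # iterates over a snapshot of l
    printed.append(x)
    i += 1
    l.append(i)          # mutates l, but not the snapshot being iterated over

print(printed)           # [0]
print(len(l))            # 2: l itself did grow, to [0, 1]
```

The loop terminates after one iteration because the snapshot holds just the one original value; the cost is the extra space and time for the copy made each time iteration starts.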
------------------------------------------------------------------------------

The entire original prange (59 lines)

from goody import type_as_str
import math

class prange:
    def __init__(self,*args):
        for a in args:
            if not type(a) is int:
                raise TypeError('\''+type_as_str(a)+'\' object cannot be interpreted as an integer')

        self.start, self.step = 0,1            # defaults
        if len(args) == 1:
            self.stop = args[0]                # store single argument
        elif len(args) == 2:
            self.start, self.stop = args       # unpack 2 arguments
        elif len(args) == 3:
            self.start, self.stop, self.step = args  # unpack 3 arguments
            if self.step == 0:
                raise ValueError('3rd argument must not be 0')
        else:
            raise TypeError('range expected at most 3 arguments, got '+str(len(args)))

    def __repr__(self):
        return 'prange('+str(self.start)+','+str(self.stop)+','+('' if self.step==1 else str(self.step))+')'

    def __iter__(self):
        self.n = self.start
        return self                            # must return an object on which __next__ can be called

    def __next__(self):
        if self.step > 0 and self.n >= self.stop or \
           self.step < 0 and self.n <= self.stop:
            raise StopIteration
        save = self.n
        self.n += self.step
        return save

    def __len__(self):
        if self.step > 0 and self.start >= self.stop or \
           self.step < 0 and self.start <= self.stop:
            return 0
        else:
            return math.ceil((self.stop-self.start)/self.step)

    def __getitem__(self,n):
        if n < 0 or n >= len(self) :           # yes, could be n >= self.__len__()
            raise IndexError('range('+str(self)+') index('+str(n)+') out of range')
        return self.start+n*self.step

    def __contains__(self,n):
        if self.step > 0:
            return self.start<=n<self.stop and (n-self.start)%self.step == 0
        else:
            return self.stop<n<=self.start and (n-self.start)%self.step == 0

    def __reversed__(self):
        if self.step > 0:
            return prange(self.start+(len(self)-1)*self.step,self.start-1,-self.step)
        else:
            return prange(self.start+(len(self)-1)*self.step,self.start+1,-self.step)

------------------------------------------------------------------------------

Problems:

1) Is the following translation of a for loop also equivalent to the translation of the for loop shown above?
Argue they are equivalent or show an example where their execution differs.

  _hidden = iter(iterable)
  while True:
      try:
          indexes = next(_hidden)
          block-body
      except StopIteration:
          [block-else]
          break
      finally:
          del _hidden

2) Write the class Random_N which is constructed with an integer argument and produces that many random values.

3) Define a function named first, which takes any iterable and a predicate function (takes one argument, returns a bool) as arguments, and uses a for loop to return the first value in the iterable for which the predicate is True; or, it raises the ValueError exception if the predicate is True for none of the iterator's values.

4) Translate the following iteration to use a while loop and explicit calls to __iter__ and __next__.

  for c in 'abcdefg':
      print(c)

5) Define a function named group_n, which takes any iterable and an int as arguments, and returns a list of a list of the first n values, a list of the second n values, etc. in the iterable; if the iterable raises StopIteration before the last group of n values can be collected, ignore those. For example, group_n('abcdefghijklmn',3) returns the list of lists: [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]

6) The fibonacci sequence is defined as 1, 1, 2, 3, 5, 8, 13, 21, ... where the 1st and 2nd fibonacci numbers are 1 and 1, and all subsequent numbers are the sum of the prior two. Define a class named fib_range that is initialized by two values (the index of the first and last fibonacci number inclusive), and whose iterator generates all the fibonacci numbers in the range. For example, the comprehension [i for i in fib_range(4,6)] creates the list [3, 5, 8]

7) Define a method named len_remaining which returns an int indicating how many more values a prange iterator object will iterate over. Initially len and len_remaining would return the same result, but for every time __next__ is called, len_remaining returns a value one less.

8) Re-examine the __reversed__ method in prange. Explain why it is incorrect to simplify this method to the following: for what kinds of values does it work and for what kinds does it fail?

  def __reversed__(self):
      if self.step > 0:
          return prange(self.stop-1 ,self.start-1, -self.step)
      else:
          return prange(self.stop+1 ,self.start+1, -self.step)

9) Examine the difference between the following code and what is produced.

                                       c = Countdown(2)
  for a in Countdown(2):               for a in c:
      for b in Countdown(2):               for b in c:
          print(a,b)                           print(a,b)

  prints:                              prints:
    2 2                                  2 2
    2 1                                  2 1
    2 0                                  2 0
    1 2
    1 1
    1 0
    0 2
    0 1
    0 0

Explain why these results print as they do and find a simple way to modify the Countdown class to make these two scripts perform identically. Hint: it relates to a problem with sharing objects.