Iterators via Classes

In this lecture we will first learn how to fix a problem (related to sharing
objects) with the prange class that we wrote in the last lecture. We will learn
how to return an object constructed from a NESTED CLASS inside __iter__ instead
of returning self. Next, we look at a class that stores and manipulates data
(histograms), and also allows iteration over its data. Finally, we will begin
to explore functions and classes that DECORATE iterables: they take an iterable
argument and produce another iterable, related to but not the same as their
argument, hence the term "decorations" (sorted and reversed are two examples
built into Python, but there are many other interesting and useful ones).

We will finish the week with a lecture about a special kind of function-like
object called a generator function, which provides a simple and excellent
mechanism for writing most iterators (and iterators that decorate iterators).

All this material will become even more important and useful when we spend a
week talking about inheritance among classes.

------------------------------------------------------------------------------

Fixing a sharing problem with prange

In the last lecture, we discussed various classes that implemented the iterator
protocol (by implementing the methods __iter__ and __next__). Typically in
these cases (for both the Countdown and prange classes) the main purpose of the
class was to create an object to iterate over once. That is, we processed
objects from these classes only (or primarily) by iterating over them. Contrast
these classes with the list, tuple, set, and dict classes: while we often
iterate over these objects, we also perform many other useful operations on
them: e.g., examining and/or updating data in them.

Often we construct an object from Countdown or prange only to use it in a for
loop: e.g.,

  for i in prange(...) :
      ....
We don't even bind this object to a name, so the object is used just once in
the for loop (which calls __iter__ on it to start the iteration), and when the
for loop finishes the object disappears, so it cannot be reused (no name refers
to it).

But at the end of the last lecture we started discussing sharing, and we will
start our discussion here by looking at another example of sharing, and how to
fix a defect in our first implementation of the prange class (so that it
behaves more like the real range class). Doing so will involve defining and
constructing a class nested in the prange class, whose sole purpose is to
construct an object for iteration: an object returned by __iter__ on which we
can call __next__. We will see that when we write iterators for other classes,
controlling more complicated data, this same technique works nicely.

To illustrate the defect, we first define the following function, which uses a
doubly-iterating comprehension to return all pairs of values produced by its
two arguments. Here i1 and i2 are two objects that are iterable.

  def all_pair(i1,i2):
      return [(x,y) for x in i1 for y in i2]

This code is equivalent to

  def all_pair(i1,i2):
      answer = []
      for x in i1:
          for y in i2:
              answer.append((x,y))
      return answer

So, in the outer for loop we iterate over i1, and while we are doing this
iteration, we iterate over i2 in the inner for loop. The while loop
translations of both for loops call __iter__ (on i1 and i2 respectively) to get
started.

Now let's run this function in various interesting ways, with range/prange (in
one case, illustrating a difference in how these classes perform).

  r = range (3)
  p = prange(3)
  print(all_pair(r,p)) # use range and prange
  print(all_pair(p,r)) # use prange and range
  print(all_pair(r,r)) # use only range,  but the same range  object twice
  print(all_pair(p,p)) # use only prange, but the same prange object twice

These four print statements produce the following results.
Notice the first three results are the same (producing tuples containing all
pairs of the values in the range), but differ from the fourth (producing only
the first three of the tuples). The difference occurs when the same prange
iterable argument is passed to both parameters.

  [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
  [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
  [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
  [(0, 0), (0, 1), (0, 2)]

The last use of p (using it twice in a call to all_pair) above produces the
wrong results. The problem is that when __iter__ is called on the prange
argument (the same object in the outer loop and inner loop), it returns the
object itself (self, on which __iter__ was called). So both loops are using the
SAME OBJECT to iterate over. And that object has just one attribute in which to
store the state of the iteration (self.n).

Both for loops in the all_pair function share the iterator/object: the outer
loop initializes it and calls next to get its first value (just 0); then the
inner loop re-initializes it and calls next to get the values 0, 1, 2; when the
inner for loop exhausts the iterator, it is also exhausted in the outer for
loop (because the loops SHARE the same object and its n attribute is 3), so the
outer loop immediately terminates too, after producing only 3 tuples from the
inner loop.

In summary, the problem is that we are doing nested/multiple iteration on a
single object: p. Python's standard range class doesn't suffer from this
problem, and we will fix our prange to operate likewise: when we call iter on
range (and soon on prange) it will return a new/special object to iterate over.
In nested iteration, iter is called more than once, so new/different objects
will participate in the iteration.

As a reminder, here is how prange defined __iter__ and __next__ in the previous
lecture.
  def __iter__(self):
      self.n = self.start
      return self  # return an object on which __next__ can be called

  def __next__(self):
      if self.step > 0 and self.n >= self.stop or \
         self.step < 0 and self.n <= self.stop:
          raise StopIteration
      save = self.n
      self.n += self.step
      return save

__iter__ sets a new attribute self.n (new because __init__ did not set it) and
returns self; __next__ (written in prange) uses this attribute and the
stop/step attributes initialized in __init__ to advance the iterator.

The problem is that when all_pair tries to doubly iterate over the same prange
object it will be using the same object in the first/outer iteration as in the
second/inner iteration. When the second/inner iteration starts and calls
__iter__ on the same object, it clobbers the state being maintained by the
first/outer iteration (because both are using the same prange object). When the
inner iteration finishes, Python returns to call __next__ for the outer
iteration, and thinks the outer one is finished too (the one prange object
being used has its self.n set to 3, beyond its last value, which is why the
second/inner iteration stopped). So, after generating (0,0), (0,1), and (0,2)
and finishing the inner iteration, Python thinks that the first/outer iteration
has finished too.

We will now fix this problem. Here is how we do it. We define the local class
prange_iter inside the __iter__ method. Every time that we call __iter__ (once
in the outer loop, 3 times in the inner loop) Python creates a new/different
object with its OWN STATE, so multiple iterations on the same prange object
don't interact badly with each other. Each loop calls __iter__ and each call
creates/uses its own prange_iter object with its own state.
The prange_iter class has just three methods: __init__ to initialize it with
the information needed to iterate over the prange object, __next__ to advance
the iteration of the prange object, and __iter__: when we call __iter__ on an
iterator it just returns that iterator (self).

The __init__ method defines the instance names n, stop, and step (note that
start is not needed: it is used only to initialize self.n); the __next__ method
has exactly the same code as the method above (but now applying to the
attributes of the prange_iter object; of course, this code never referred to
self.start). Calling __iter__ on a prange_iter object just returns itself.

The final statement in prange's __iter__ method (after defining the class) just
constructs and returns the necessary prange_iter object using the attributes of
prange: return prange_iter(self.start, self.stop, self.step). This object is
constructed from a class that implements __next__ (as required). The __next__
defined is the same as before, but is now defined in prange_iter instead of
prange itself, and operates on the prange_iter attributes initialized in
prange_iter's __init__.

  def __iter__(self):
      class prange_iter:
          def __init__(self,start,stop,step):
              self.n    = start
              self.stop = stop
              self.step = step

          def __next__(self):
              if self.step > 0 and self.n >= self.stop or \
                 self.step < 0 and self.n <= self.stop:
                  raise StopIteration
              save = self.n
              self.n += self.step
              return save

          def __iter__(self):
              return self

      return prange_iter(self.start, self.stop, self.step)

The remaining examples in this lecture will contain code like this, with the
__iter__ method defining a LOCAL or NESTED CLASS (with these classes defining
only __init__, __next__, and __iter__ methods) and returning a new object
constructed from this local/nested class every time __iter__ is called on a
prange object.
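The fix can be tried end to end with the sketch below. It is only a sketch: the constructor shown here is an assumption (the real prange __init__ comes from the previous lecture and is not repeated in these notes), written to accept prange(stop) and prange(start, stop[, step]) for int arguments.

```python
class prange:
    # Hypothetical constructor (an assumption, not the lecture's code):
    # accepts prange(stop) or prange(start, stop[, step]).
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step

    def __iter__(self):
        # Every call builds a fresh iterator object with its own state (n).
        class prange_iter:
            def __init__(self, start, stop, step):
                self.n = start
                self.stop = stop
                self.step = step

            def __next__(self):
                if self.step > 0 and self.n >= self.stop or \
                   self.step < 0 and self.n <= self.stop:
                    raise StopIteration
                save = self.n
                self.n += self.step
                return save

            def __iter__(self):
                return self

        return prange_iter(self.start, self.stop, self.step)


def all_pair(i1, i2):
    return [(x, y) for x in i1 for y in i2]


p = prange(3)
pairs = all_pair(p, p)           # the same prange object used in both loops
print(pairs)                     # all 9 pairs, (0, 0) through (2, 2)
print(iter(p) is iter(p))        # False: each call returns a different iterator
```

Because the iteration state now lives in the prange_iter object rather than in the prange object, the inner loop can no longer clobber the outer loop's position.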
The total amount of code isn't much bigger, but it is certainly more
complicated to define and use the nested class inside the __iter__ method (and
to think about it).

Now every call to __iter__ on a prange object returns a different object that
has a __next__ method, so two uses of the same prange object (as in
all_pair(p,p) above) will now work correctly. Note that __iter__ is called
automatically multiple times, once when each loop starts: 1 time for the outer
loop, 3 times for the inner loop for prange(3).

One more interesting observation: notice the difference between the following

  for v in prange(10):
      ....

and

  i = iter(prange(10))
  ....
  for v in i:
      ....

When the first is translated into a while loop, it automatically calls iter on
a prange object; this is as expected. The second automatically calls iter on a
prange_iter object (that is, after all, what iter(prange(10)) returns); this
call is handled by the __iter__ method defined INSIDE the prange_iter class,
which just returns the same iterator created by calling iter(prange(10)).
Another approach would be to create a new object with the current values of
prange_iter.

  def __iter__(self):
      return prange_iter(self.n, self.stop, self.step)

The iterator for the real Python range uses the first approach. We can verify
this by executing the following code (which executes the first print statement,
because i is j).

  i = iter(range(10))
  j = iter(i)
  if i is j:
      print('__iter__ for range just returns self')
  else:
      print('__iter__ for range returns a new copy of iterator')

What might this inner definition of __iter__ be useful for? Suppose that we
wanted to ignore the first 10 lines in a file, and then print all the remaining
lines.
We could write this code using complicated while loops, but we can also write
it using simpler for loops, as follows:

  i = iter(open('file.txt'))  # iterator for opened file
  for _ in range(10):         # 10 times
      next(i)                 #   call next to advance iterator
  for line in i:              # iter(i) called in while translation returns i;
      print(line.rstrip())    #   for remaining lines in i, print each line

Finally, we could also write the following code, which declares prange_iter not
inside __iter__ but in the prange class itself. The code for class prange_iter
is identical (except outdented) but the body of __iter__ is now just one line
calling the constructor for this class.

  class prange:
      ....def of __init__

      class prange_iter:
          def __init__(self,start,stop,step):
              self.n    = start
              self.stop = stop
              self.step = step

          def __next__(self):
              if self.step > 0 and self.n >= self.stop or \
                 self.step < 0 and self.n <= self.stop:
                  raise StopIteration
              save = self.n
              self.n += self.step
              return save

          def __iter__(self):
              return self

      def __iter__(self):
          return prange.prange_iter(self.start, self.stop, self.step)

Generally, names should be defined in the most restricted place they can be
(the innermost scope), to avoid accidental misuse. This rule means that the
original definition of prange_iter (defined INSIDE the __iter__ method) is
probably the BEST location to define it.

Finally, here is an alternative that has fewer attributes in the prange_iter
class, but whose code is a bit more complicated because of indirect accesses.
Here each prange_iter object refers to the prange object that iter was called
on: it accesses that prange's start, stop, and step attributes via self.prange
(see the code in __next__) instead of defining them directly; it also creates
its own n attribute for counting, initialized to the start value in the prange.
  def __iter__(self):
      class prange_iter:
          def __init__(self,prange):
              self.prange = prange
              self.n      = prange.start # or self.prange.start

          def __next__(self):
              if self.prange.step > 0 and self.n >= self.prange.stop or \
                 self.prange.step < 0 and self.n <= self.prange.stop:
                  raise StopIteration
              save = self.n
              self.n += self.prange.step
              return save

          def __iter__(self):
              return self

      return prange_iter(self)

------------------------------------------------------------------------------

Quick interlude: Iterators and unpacking assignment

Suppose o is an object constructed from class C. When we write something like

  x, y, z = o

Python conceptually translates it into

  x, y, z = [i for i in o]

That is, it (a) iterates over o, (b) places all its produced values into a
list, and (c) tries to unpack the list's values according to the structure on
the left of the = sign.

So, if we wrote

  x, y, *z = prange(1,10)

the result would be: x is bound to 1, y is bound to 2, and z is bound to the
list [3, 4, 5, 6, 7, 8, 9].

------------------------------------------------------------------------------

Classes that store interesting data and have iterators over the data

Examine the definition of the following class that stores and processes
histograms. For simplicity we will assume it processes percentages (ints from 0
to 100) and places them in 10 bins: 0-9, 10-19, 20-29, ... 80-89, 90-100; note
that the last bin really represents 11 values, while all the others represent
10 values.

Of course we will focus on how to accomplish iteration for objects of this
class (iterating over the counts in their bins) but there are other interesting
aspects of this class that we will discuss first (and we could always
generalize or add methods to make this class even more powerful). Note both the
tally and _tally methods.
  class Percent_Histogram:
      def __init__(self,init_percents=[]):
          self._histogram = 10*[0]  # [0,0,0,...,0,0] length 10, all 0s
          for p in init_percents:
              self.tally(p)

      # Called only when 0<=p<=100: 100//10 is 10 but 100 belongs in index 9
      def _tally(self,p):
          self._histogram[p//10 if p<100 else 9] += 1

      def clear(self):
          for i in range(10):       # could write: self._histogram = 10*[0]
              self._histogram[i] = 0

      # tally allows any number of arguments, collected into a tuple by *args
      def tally(self,*args):
          if len(args) == 0:
              raise IndexError('Percent_Histogram.tally: no value(s) to tally')
          for p in args:
              if 0 <= p <= 100:
                  self._tally(p)
              else:
                  raise IndexError('Percent_Histogram.tally: '+str(p)+' outside [0,100]')
                  # Another approach would be to store/remember all tally failures

      # allow indexing for bins [0-9]
      # but can mutate these values only through __init__, clear, and tally
      # no __setitem__ defined
      def __getitem__(self,bin_num):
          # bin_num = -1, 10-1 = 9 (last bin)
          bin = (bin_num if bin_num >= 0 else 10+bin_num)
          if 0 <= bin <= 9:
              return self._histogram[bin]
          else:
              raise IndexError('Percent_Histogram.__getitem__: '+str(bin_num)+' outside [0,9]')

      # standard __iter__: defines a class with __init__/__next__ and returns
      #   an object from that class
      def __iter__(self):
          class PH_iter:
              def __init__(self,histogram):
                  self._histogram = histogram          # sharing; sees mutation
                  # self._histogram = list(histogram)  # copying; doesn't see it
                  self._next = 0

              def __next__(self):
                  if self._next == 10:
                      raise StopIteration
                  answer = self._histogram[self._next]
                  self._next += 1
                  return answer

              def __iter__(self):
                  return self

          return PH_iter(self._histogram)

      # To reconstruct a call to __init__ that reproduces the correct counts in
      #   the histogram, we supply the correct number of values, but all at the
      #   start of the bin: e.g., if bin 5 has 3 items, the repr has three 50s
      def __repr__(self):
          param = []
          for i in range(10):
              param += self[i]*[i*10]
          return 'Percent_Histogram('+str(param)+')'

      # a 2-dimensional display; do you understand the use of .format here?
      def __str__(self):
          return '\n'.join(['[{l: >2}-{h: >3}] | {s}'.format(
                                l=10*i,
                                h=10*i+9 if i != 9 else 100,
                                s=self[i]*'*')
                            for i in range(10)])

Notes:

0) The __init__ method uses the idiom 10*[0] which you should know. If not,
experiment with it. An alternative to the loop calling tally is calling
tally(*init_percents), but unfortunately tally requires at least one argument,
and if init_percents is [] the call would appear as tally(), which would cause
tally to raise an exception; so it is better to use the explicit for loop
calling tally, which is executed 0 times if init_percents is []. Of course, we
could allow tally() and define it to do nothing.

1) The _tally method is supposed to be called only by methods defined in this
class. It does the actual work, putting a number from the range [0,100] into
the correct bin, treating 100 specially (it belongs in bin 9, but p//10 would
put it in bin 10, which doesn't exist). The last bin, 90-100, contains 11
values, while all the other bins (e.g., 30-39) contain 10. To work correctly,
this method assumes p is legal: 0 <= p <= 100.

2) The clear method sets each bin in the list to 0; we could have allocated a
new list as shown in the comment, but generally that takes more time and
occupies more space. It is better to zero-out the existing list.

3) By using *args, the tally method can have any number (0 or more) of
positional arguments. All arguments are collected into a tuple that is iterated
over to process the values individually. If there is not at least one value, or
any value is out of range, this method raises an exception. Instead, we could
just update a list of "bad tallies" and write a method that returns this list.
Then, we could call the new method to determine if there were any bad tallies
(in fact, how many, and what attempted tallies failed).
4) The __getitem__ method allows us to index all the bins, 0-9 inclusive, of a
Percent_Histogram object (negative values are acceptable, indexing from the end
of the list of bins). Note that we can set values into these bins (i.e., mutate
the list) only via __init__, clear, and tally. So we call this information
read-only: we can read it but not write/change it (this class defines no
__setitem__). Of course, Python actually allows us to write o._histogram but
the leading underscore indicates only methods in the class should refer to the
_histogram attribute.

5) We use the now standard way to implement __iter__, by defining a local class
that defines __next__ and returning an object from that class. We will discuss
how choosing self._histogram = histogram vs. self._histogram = list(histogram)
changes the iterator. This nested class is very similar to the one used for
iterating over lists: it is simpler because the list always contains 10 values.

6) The __repr__ method doesn't know what numbers went into the bins! But we can
use the lowest number in each bin, repeated by the count in that bin, to
specify a list needed to construct an equivalent object (with the equivalent
number of values in each bin) with the constructor.

7) The __str__ method returns a two-dimensional plot of the histogram. Do you
all know how to use the format method for strings? If not you should look it
up (it is described online using something like EBNF) and practice using it.
You should certainly be able to tell me why the string that .format is called
on produces the result you'll see.
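For readers who want to experiment with the format specification from note 7, here is a small stand-alone sketch. The names template, first, and last are illustrative only (they do not appear in the class); the format string itself is copied from __str__.

```python
# Each replacement field has the form {name:[fill][align][width]}.
# In '{l: >2}' the fill character is a space, '>' means right-align,
# and 2 is the minimum field width; '{h: >3}' is the same with width 3.
template = '[{l: >2}-{h: >3}] | {s}'

first = template.format(l=0,  h=9,   s=2*'*')   # a bin holding two values
last  = template.format(l=90, h=100, s=0*'*')   # an empty bin

print(repr(first))   # '[ 0-  9] | **'
print(repr(last))    # '[90-100] | '
```

Right-aligning both bounds in fixed-width fields is what makes the | characters line up in every row of the plot, even though the last bin's upper bound has three digits.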
When Python executes the following script:

  quiz1 = Percent_Histogram([50, 55, 70, 75, 85, 100])
  quiz1.tally(20,30,95)
  print(repr(quiz1))
  for count in quiz1:
      print(count,end=' ')
  print('\n',quiz1,sep='')

It prints the following information:

  Percent_Histogram([20, 30, 50, 50, 70, 70, 80, 90, 90])
  0 0 1 1 0 2 0 2 1 2
  [ 0-  9] |
  [10- 19] |
  [20- 29] | *
  [30- 39] | *
  [40- 49] |
  [50- 59] | **
  [60- 69] |
  [70- 79] | **
  [80- 89] | *
  [90-100] | **

Normally we would use this class in a program that reads a file of scores.

Now, what would happen if we executed the following code?

  for count in quiz1:
      print(count,end=' ')
      quiz1.tally(100)

It would print: 0 0 1 1 0 2 0 2 1 11 (and after the loop, quiz1[9] is 12)

Note that mutating the quiz1 object during each iteration results in the new,
accumulated value appearing in the results produced by the iterator (in the
last bin). That is because the PH_iter object refers to (shares) the same list
that the Percent_Histogram object created (and that the tally method
increments). So that sharing results in the iterator always returning the most
up-to-date value in the list.

What if we wanted to have the iterator produce the values in the histogram WHEN
THE ITERATION STARTED, and not show any updates after that? The change is
trivial: in PH_iter's __init__ method we change

  self._histogram = histogram   to    self._histogram = list(histogram)

Now instead of this iterator object sharing the list being used for the
histogram, it has its own copy: a new/different list, but storing all the
original list's values. So, changes to the original list will not change the
iterator's self._histogram list and therefore not change the result of the
iteration. The cost: extra space used for the list (not much, because the list
always contains just 10 values) and some extra time to construct the list.

So, we need to decide (and document) the semantics for our iterators.
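The two semantics can be compared side by side with a deliberately tiny stand-in class. Everything named here (Bins, bump, the snapshot parameter, iterate_while_bumping) is hypothetical and exists only for this sketch; it is not part of Percent_Histogram.

```python
class Bins:
    # Hypothetical miniature of Percent_Histogram: just a list of counts.
    def __init__(self, counts, snapshot=False):
        self._counts = list(counts)
        self._snapshot = snapshot      # True: iterators copy; False: they share

    def bump(self, i):
        self._counts[i] += 1

    def __iter__(self):
        class Bins_iter:
            def __init__(self, counts, snapshot):
                # Copying freezes the values seen; sharing sees later mutation.
                self._counts = list(counts) if snapshot else counts
                self._next = 0

            def __next__(self):
                if self._next == len(self._counts):
                    raise StopIteration
                answer = self._counts[self._next]
                self._next += 1
                return answer

            def __iter__(self):
                return self

        return Bins_iter(self._counts, self._snapshot)


def iterate_while_bumping(bins):
    seen = []
    for count in bins:
        seen.append(count)
        bins.bump(2)                   # mutate the last bin on every iteration
    return seen


shared   = iterate_while_bumping(Bins([0, 0, 0]))
snapshot = iterate_while_bumping(Bins([0, 0, 0], snapshot=True))
print(shared)     # [0, 0, 2]
print(snapshot)   # [0, 0, 0]
```

With sharing, the last bin has already been bumped twice by the time the iterator reaches it; with copying, the iterator reports the counts as they were when the loop started.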
Can you tell (and if so, with what code) what decision was made concerning this
issue for the list iterator, and discuss why you think the designers made that
decision?

In fact, because there are iterators for lists built into Python, we could
simplify __iter__ to delegate to the list class:

  def __iter__(self):
      return iter(self._histogram)

for sharing behavior; and for copying behavior

  def __iter__(self):
      return iter(list(self._histogram))

because the list class supports an __iter__ method, on whose result __next__
can be called. So defining and advancing our own index in the PH_iter class is
not strictly necessary. But this code illustrates how the actual list iterator
works (by remembering/using/incrementing the index it is currently at) and such
simplifications are not possible for other classes that store more complicated
data structures.

Iteration on other fundamental data structures in Python is different! If we
try to mutate (change the size of) a set or dict while iterating over it (of
course we can't mutate a string or tuple at all) then Python will raise a
RuntimeError exception. Try writing/testing some code exhibiting this behavior.

Basically, we know the order in which lists are iterated over, so if we mutate
a list we will know whether the mutation has an effect on the iteration: if we
mutate by appending, we will see that value eventually; if we mutate at the
front (before where we are currently iterating) then we won't. Because sets and
dicts are not iterated over in any fixed order, we cannot predict whether
adding something to a set/dict will eventually be seen in the iteration, so
Python disallows mutating a set/dict while iterating over it.

------------------------------------------------------------------------------

Iterable Decorators: Classes that are initialized by/implement iterable

Since iterators are so important, it is useful to have a grab-bag of classes
(this lecture) and generators (next lecture) that operate on iterables to
produce other, slightly different iterables.
When a class takes as an argument an object that has methods implementing a
certain protocol and returns an object that has methods that implement the same
protocol, the class is called a DECORATOR. We will write a bunch of classes
that decorate iterables below (and even more in the next lecture). These are
all pretty simple to think about, and while the code is complicated, it is
complicated in the same way each time.

In most of these examples, we will not see an explicit "raise StopIteration";
instead, this exception is raised when calling next on the iterator for the
iterable being decorated. That call to next raises StopIteration, which may or
may not be caught in this code: if caught, Python executes the code in the
"except" block; if not, the exception propagates outside the call's context,
stopping the iteration.

Here is a first example of a decorator for iterables and a refinement of it.
The Repeat class takes an iterable as an argument and implements an __iter__
method that repeats that iterable over and over: whenever it runs out of values
to produce, the entire sequence of values is produced again. We can test this
class with any iterable, and strings are the simplest, so we will demonstrate
this class using a string. If we run the script

  for i in Repeat("abcde"):
      print(i,end='')

Python would print: abcdeabcdeabcde ... and keep going forever; sometimes it is
useful to have an iterator go forever (typically there will be some if/break in
the loop using it, to eventually stop it).

Here is that class

  class Repeat:
      def __init__(self, iterable):
          self._iterable = iterable

      def __iter__(self):
          class Repeat_iter:
              def __init__(self, iterable):
                  self._iterable = iterable       # remember for restarting in next
                  self._iterator = iter(iterable) # remember for direct use in next

              def __next__(self):
                  try:
                      return next(self._iterator) # return next result in iterator
                  except StopIteration:           # but if there is none...
                      self._iterator = iter(self._iterable) # restart iterable
                      return next(self)           # call next to start

              def __iter__(self):
                  return self

          return Repeat_iter(self._iterable)

This uses the same define-a-class-in-the-__iter__-method technique used in the
prange and Percent_Histogram classes above.

We can generalize this class to Repeat an iterable either at most some fixed
number of times or forever, using the following class. If the second argument
to __init__ is an integer, it repeats the iterable at most that many times; if
there is no second argument, the iterator repeats forever (as above). So
max_times = None means there is no maximum: it repeats forever. Note that
short-circuit evaluation is necessary in the first if in __next__ (evaluating
None <= 0 would raise a TypeError).

  class Repeat:
      def __init__(self, iterable, max_times=None):
          self._iterable  = iterable
          self._max_times = max_times

      def __iter__(self):
          class Repeat_iter:
              def __init__(self, iterable, max_times):
                  self._iterable       = iterable
                  self._max_times_left = max_times
                  self._iterator       = iter(iterable)

              def __next__(self):
                  if self._max_times_left is not None and self._max_times_left <= 0:
                      raise StopIteration
                  else:
                      try:
                          return next(self._iterator) # StopIteration raised?
                      except StopIteration:
                          if self._max_times_left is not None:
                              self._max_times_left -= 1
                          self._iterator = iter(self._iterable)
                          return next(self)           # StopIteration raised?

              def __iter__(self):
                  return self

          return Repeat_iter(self._iterable, self._max_times)

If we run the script

  for i in Repeat("abcde",3):
      print(i,end='')

Python would print: abcdeabcdeabcde

What does Python produce for the following code in the original version of
Repeat (on the left) and the updated version (on the right)? Why? Is there a
more reasonable behavior in this case? Note the empty string iterable iterates
0 times.

  for i in Repeat(""):            for i in Repeat("",3):
      print(i)                        print(i)

Here is a second kind of decorator for iterables. It returns all the values in
an iterable, but never the same value twice. We call this class Unique.
It works by keeping a set in each Unique_iter object that remembers and
bypasses returning any value already returned from that iterator object.

  class Unique:
      def __init__(self, iterable):
          self._iterable = iterable

      def __iter__(self):
          class Unique_iter:
              def __init__(self, iterable):
                  self._iterated = set()
                  self._iterator = iter(iterable)

              def __next__(self):
                  answer = next(self._iterator)     # StopIteration raised?
                  while answer in self._iterated:
                      answer = next(self._iterator) # StopIteration raised?
                  self._iterated.add(answer)
                  return answer

              def __iter__(self):
                  return self

          return Unique_iter(self._iterable)

If we run the script

  for i in Unique('Mississippi'):
      print(i,end='')

Python prints: Misp

We can also generalize this class by specifying the maximum number of times a
value can be returned (with a default argument of 1, which brings us back to
the version of Unique specified above, since it allows values to be returned
only once). This is a trivial but interesting example of generalizing classes
with backward compatibility of use. Here we replace a set of "iterated-over"
values by a dictionary with these values as keys, each associated with the
number of times this value has been returned: a set doesn't contain the
information we need to implement this class.

  from collections import defaultdict

  class Unique:
      def __init__(self, iterable, max_times=1):
          self._iterable  = iterable
          self._max_times = max_times

      def __iter__(self):
          class Unique_iter:
              def __init__(self, iterable, max_times):
                  self._times     = defaultdict(int)
                  self._iterator  = iter(iterable)
                  self._max_times = max_times

              def __next__(self):
                  answer = next(self._iterator)     # StopIteration raised?
                  while self._times[answer] >= self._max_times:
                      answer = next(self._iterator) # StopIteration raised?
                  self._times[answer] += 1
                  return answer

              def __iter__(self):
                  return self

          return Unique_iter(self._iterable, self._max_times)

If we run the script:

  for i in Unique('Mississippi',2):
      print(i,end='')

Python prints: Missipp

As another example, we will write the Filter class, which is supplied with a
predicate function (of one argument, returning a bool) indicating whether a
value should be produced by the iterator or FILTERED OUT, causing next not to
return that value, but instead to keep looking at values until it finds one for
which the predicate returns True.

  class Filter:
      def __init__(self, iterable, predicate):
          self._iterable  = iterable
          self._predicate = predicate

      def __iter__(self):
          class Filter_iter:
              def __init__(self, iterable, predicate):
                  self._iterator  = iter(iterable)
                  self._predicate = predicate

              def __next__(self):
                  answer = next(self._iterator)     # StopIteration raised?
                  while not self._predicate(answer):
                      answer = next(self._iterator) # StopIteration raised?
                  return answer

              def __iter__(self):
                  return self

          return Filter_iter(self._iterable, self._predicate)

If we run the script:

  for i in Filter('abcdefghijklmnopqrstuvwxyz',lambda x : x not in 'richardpattis'):
      print(i,end='')

Python prints all the letters not in my name: befgjklmnoquvwxyz

Notice that the Repeat, Unique, and Filter classes all implement their
iterators similarly, with the same pattern of code. In the next lecture we will
rewrite these decorators -and even more decorators- much more simply using
generators, which allow us to capture the pattern above much more easily -once
we understand how generators work.

Here is a last decorator for iterables. Its calls to next return all the values
in an iterable but in sorted order. In this implementation we collect all the
values from the iterable into a list and then sort the list and return its
iterator (since the values are all in the correct order). We cannot know what
smallest value to return until we have seen all the values.
  class psorted: # pseudo-sorted: works just like sorted
      def __init__(self, iterable, key=None, reverse=False):
          self.result = list(iterable) # put all values from iterable into a list
          self.result.sort(key=key, reverse=reverse)

      def __iter__(self):
          return iter(self.result)

Actually, "sorted" in Python is simpler: it is a function that returns a list
(so it still returns something that is iterable). So, psorted might be more
clearly written as just

  def psorted(iterable, key=None, reverse=False):
      result = list(iterable)               # put all values in a list
      result.sort(key=key, reverse=reverse) # calling sort returns None
      return result                         # so return in another statement

Finally, notice how we can combine these decorator classes below. Suppose I
want to print out all the letters in my name in alphabetical order, with no
repetition of letters. I can do it with the following script. Note that the
string argument to psorted is iterable, and psorted itself returns an iterable
(so the argument to Unique is iterable) and finally Unique is iterable as well
(which is needed by the for loop).

  for i in Unique(psorted('richardpattis')):
      print(i,end='')

It prints: acdhiprst

What would the following script print? It reverses the order of the decorator
classes being constructed.

  for i in psorted(Unique('richardpattis')):
      print(i,end='')

------------------------------------------------------------------------------

Problems:

1) Explain what happens (and why) if we write the following loops

  for c in Repeat(''): ...?

  for c in Repeat('',4): ...?

each with an empty iterable (one that produces no values). What alternative
(more reasonable) behavior might you want in these cases? How can we implement
it?

2) Explain what happens in each of the following situations, which use a list
as the iterable being decorated: will the appended value be printed?

  l = [...]  # some list
  i = 0
  for u in Unique(l):
      print(u)
      i += 1
      if i == 1:
          l.append(...) # append a value to l that is not already there

  l = [...]  # some list
  ui = Unique(l)
  l.append(...)         # append a value to l that is not already there
  for u in ui:
      print(u)

Explain how we could modify the Unique class to get the opposite behavior.

3) Here is another way to write the __next__ method in the Filter_iter class.
Compare this loop with the one in the code above as to simplicity/clarity.
Which would you prefer in your code?

  while True:
      answer = next(self._iterator)
      if self._predicate(answer):
          return answer

4) Explain why the iterable passed as an argument to the psorted class must be
finite. What very big difference is there between calling

  for i in psorted(Repeat('abc')):
      print(i)

and

  for i in Unique(Repeat('abc')):
      print(i)

5) Define a preversed class similar to the psorted class above, which acts as a
decorator for iterables. You may not use reversed in your code. Hint: it uses a
combination of creating a list from the iterable and the iterator in the
Percent_Histogram class (although using a range iterator might simplify the
code).

6) Define a Random class, which acts as a decorator for iterables: it returns
each value with a probability (a float between 0-never- and 1-always) specified
when the object is constructed. Account for the fact that the iterator might
produce an infinite number of values. Write this in a straightforward way, then
write it in a simpler way using the Filter class.

7) Define a Skipping class, which acts as a decorator for iterables: it skips
the first n values (n is specified as an argument to __init__) and then
produces the same values as its iterable argument. So the following loop skips
the first three values when iterating over the string 'abcdefg'

  for c in Skipping('abcdefg',3):
      print(c,end='')

and prints: defg