Operator Overloading (continued)

In this lecture we examine how to overload more operators: many fewer than in the first lecture, but some that perform more interesting (and subtle) operations - some that get to the core of how Python operates, which we can leverage in the classes we write! Some symbols/operators we discuss here include [] (index/dict lookup), in (contains), with (context manager), . (attribute access), and del (attribute removal).

------------------------------------------------------------------------------
Container operators: in Lists/Tuples

We saw that when we call the standard len function with an object argument, it is translated into a call of the __len__ method on that object: for example, calling len(x) returns the value x.__len__(); Python does the same with the reversed function and the __reversed__ method. Since __reversed__ really produces an iterator, we will discuss it in depth next week (and see how it, and similar iterators, are coded).

The following methods all relate to indexing: using the [] delimiters, which Python interprets as symbols describing the indexing operator. What appears inside the brackets is typically either a simple integer (for lists and tuples) or a slice (ditto: more on slices below). The [] operator also works on dictionaries (dict and defaultdict), where the value inside [] is the dict's key, which can be any arbitrary value (well, it has to be immutable/hashable).

Let's look at the laundry list of methods first, and then use each in a new class that we define. Assume for the examples below that we have defined l to be a list. Note that the index parameter can be an int or a slice for lists/tuples (we will focus on lists here). We discuss three different forms of using an index, each calling a different double underscore method; a fourth related operator, "in", is translated into a call on the __contains__ method and is discussed below too.

  __getitem__(self,index)       : index : l[3] or l[1:-1]
  __setitem__(self,index,value) : store : l[3]='a' or l[2:4]=('a','b','c')
  __delitem__(self,index)       : delete: del l[3] or del l[1:-1]
  __contains__(self,item)       : called by the in operator: 'a' in l

To illustrate these methods, let's write a class that allows us to specify a list that is indexed starting at 1 (not 0). Really we should think in terms of defining a class for a new type of data (lists, tuples, sets, frozensets, dicts, and defaultdicts are all types of data; we can use inheritance, discussed later in the quarter, to define variants of these data types more easily), but for now let's look at adapting/using standard lists in this new way, to simplify what we are learning (we could do the same with strings, writing a Str1 class). This example also uses delegation, where an operation on a List1 is translated into a "similar" operation on the list it actually stores/delegates to. We start with

  class List1:
      def __init__(self,iterable):
          self._plist = list(iterable)

      def __str__(self):
          return str(self._plist)

So, we can write

  x = List1(['a', 'b', 'c', 'd', 'e'])
  print(x)

which prints as

  ['a', 'b', 'c', 'd', 'e']

So x looks just like a list when we create one and print one (although we must use the List1 constructor, not just [...], for these special lists). Its argument is anything Python can iterate over to create the values in the list, and we copy that information into a regular list. We now want to implement list-like behavior, but with indexes that start at 1 instead of 0.
First we look at the __getitem__ method, called by l[...]. We can use either integers or slices inside the brackets; for now let's look at just integers. Note that in Python lists, integer indexes are either non-negative (0, 1, ...), which specify an index from the beginning (e.g., 0 is the first, 1 is the second, ...), or negative, which specify an index from the end (e.g., -1 is the last index, -2 second from last). Note the asymmetry we are now fixing: in List1 we want 1 to be first and -1 to be last, 2 to be second and -2 to be 2nd from last, unlike Python lists, where 0 is the first index and -1 the last.

So we will start with the helper function (static method) _fix_index: the leading underscore means this method should be used only by other methods in the List1 class (although it is safe to use by anyone: it doesn't set/mutate any attributes). This function demotes positive indexes by 1, but leaves 0 and negative indexes as is. So _fix_index(1) returns 0, which when used to index self._plist, the delegated list, denotes the index of the first value. Likewise, _fix_index(-1) returns -1, which still denotes the index of the last value.

  @staticmethod
  def _fix_index(i):
      return (i-1 if i >= 1 else i)

Recall, because this is a static method, it has no self parameter. We will call this method (see below) like List1._fix_index(...). Static means it does not operate on instances of a class: so it uses no self parameter to refer to an instance of List1. If l is a List1 we could call l._fix_index(...); because the method is static, the FEOOP translates it to List1._fix_index(...), not putting l as a first argument (because there is no self in static methods). Alternatively, we could have defined _fix_index as a global function in this module, and called it just as _fix_index(...) in the class methods, but it is better to define this static method in the class itself (avoiding "polluting" the module's namespace by putting another name in it). Again, note that we use a prefix underscore to indicate that no function/method outside the class SHOULD call this function: it is just a helper for the methods in the class. But there is nothing in Python that disallows us from calling List1._fix_index anywhere.

With _fix_index defined, we can write __getitem__ as follows. Notice that it ensures index is an int; otherwise it raises an exception. If index is an int, it delegates to self._plist to access its information, but when accessing self._plist, we decrease the index by 1 for positive indexes, and leave zero and negative indexes alone. For illumination/debugging purposes we have put a print in __getitem__ which we will comment/uncomment as needed. For the examples below we will leave it in, so that we can see when __getitem__ is called by Python, since we don't explicitly call it.

  def __getitem__(self,index):
      print('List1.__getitem__('+str(index)+')')   # for illumination/debugging
      if type(index) is int:
          return self._plist[List1._fix_index(index)]
      else:
          raise TypeError('List1.__getitem__ index('+str(index)+') must be int')

Running the following script illustrates how __getitem__ is called.

  x = List1(['a','b','c','d','e'])
  print(x)
  print(x[1], x[2], x[-2], x[-1])

Python produces the following output, printing the entire list, then the first, second, second to last, and last values. Notice the calls that Python automatically makes to __getitem__ when we use the [] operator. Note that the indexing in print's arguments calls __getitem__ for each value, building the string to print, and then the entire string is printed after all the values have been gotten.
  ['a', 'b', 'c', 'd', 'e']
  List1.__getitem__(1)
  List1.__getitem__(2)
  List1.__getitem__(-2)
  List1.__getitem__(-1)
  a b d e

Two things about this method. First, we should probably raise an exception if the index is 0, because that value should not be a legal index in a List1 list. But for purposes of illustration later, when we discuss the "in" operator, we will not do this (so index 0 will act the same as index 1: both translate to 0). Second, we need to talk about slices and how List1 objects can process them.

----------
Interlude: slices

To finish writing __getitem__ we must take a short detour to talk about slices. Recall that for Python lists we can write indexes like x[1:4], x[2:-2], x[:-1] and even x[::2]. Each of these slices translates into an actual slice object (yes, there is a slice class defined in the builtins module, which is automatically imported into every Python module) that is passed to __getitem__. Each slice object has three attributes that we can access: start, stop, and step (with None for every value not specified in the slice).

  x[1:4]  translates to x[slice(1,4,None)]
  x[2:-2] translates to x[slice(2,-2,None)]
  x[:-1]  translates to x[slice(None,-1,None)]
  x[::2]  translates to x[slice(None,None,2)]

The __getitem__ method for lists knows how to process slices. We can delegate these slices to be used on self._plist, but we need to fix the start and stop indexes (as done above for pure int indexes, with the same function), and leave the step as is. Also, since slices can specify None, we need to update _fix_index to leave None unchanged. So, we update the _fix_index and __getitem__ methods as shown below. Now for slices, we construct a fixed slice from the one passed as an argument (fixed for start and stop in 1-origin lists) and use this slice when delegating to self._plist.

  @staticmethod
  def _fix_index(i):
      if i == None:
          return None
      else:
          return (i-1 if i >= 1 else i)  # for + indexes, 1 smaller: 1 -> 0
                                         # for - indexes, the same: -1 (still last), -2 (still 2nd to last)

  def __getitem__(self,index):
      print('List1.__getitem__('+str(index)+')')   # for illumination/debugging
      if type(index) is int:
          return self._plist[List1._fix_index(index)]
      elif type(index) is slice:
          s = slice(List1._fix_index(index.start), List1._fix_index(index.stop), index.step)
          return self._plist[s]
      else:
          raise TypeError('List1.__getitem__ index('+str(index)+') must be int/slice')

Running the following script illustrates how __getitem__ works with slices; we left the print statement in __getitem__, and the __str__ for slice objects prints as slice(start, stop, step).

  x = List1(['a','b','c','d','e'])
  print(x)
  print(x[1:4], x[2:-2], x[:-1], x[::2])

Python produces the following output, printing the entire list, and then the specified slices of that list (again, where the index of the first item is 1). Notice the calls that Python automatically makes to __getitem__ when we use the [] operator with slices. Remember that slices include indexes up to but not including the stop index (we could fix this too, if we didn't like it, by always incrementing the stop index by 1, therefore including one more index; this would be a problem, though, for incrementing -1 to 0, which would be wrong, and we'd need a special way to fix that: converting -1 to None, so the fixed slice behaves like x[1:], including all the remaining values in a List1).
  ['a', 'b', 'c', 'd', 'e']
  List1.__getitem__(slice(1, 4, None))
  List1.__getitem__(slice(2, -2, None))
  List1.__getitem__(slice(None, -1, None))
  List1.__getitem__(slice(None, None, 2))
  ['a', 'b', 'c'] ['b', 'c'] ['a', 'b', 'c', 'd'] ['a', 'c', 'e']

----------

Now that we know how to handle indexes that are integers or slices (by fixing them and delegating their use to self._plist), we can easily write the remaining methods. For example, the __setitem__(self,index,value) method is supposed to assign value to the object at the specified index(es). Its structure is identical to __getitem__, processing int indexes, slice indexes, or raising TypeError. Here, though, we are assigning to self._plist, not returning a value. Because there are no return statements in this method, Python will automatically return None when it finishes executing; we could also specify this explicitly as return None: both have the same effect in Python.

  def __setitem__(self,index,value):
      print('List1.__setitem__('+str(index)+','+str(value)+')')   # for illumination/debugging
      if type(index) is int:
          self._plist[List1._fix_index(index)] = value
      elif type(index) is slice:
          s = slice(List1._fix_index(index.start), List1._fix_index(index.stop), index.step)
          self._plist[s] = value
      else:
          raise TypeError('List1.__setitem__ index('+str(index)+') must be int/slice')

Running the following script illustrates how __setitem__ works with int and slice indexes; we again left the print statement in __setitem__.

  x = List1(['a','b','c','d','e'])
  print(x)
  x[1] = 1
  x[4:5] = (4,5)
  print(x)

Python produces the following output, printing the original list and then the updated list. Notice the calls that Python automatically makes to __setitem__ when we use the [] operator with an int and with a slice.

  ['a', 'b', 'c', 'd', 'e']
  List1.__setitem__(1,1)
  List1.__setitem__(slice(4, 5, None),(4, 5))
  [1, 'b', 'c', 4, 5, 'e']

Next, the __delitem__(self,index) method is supposed to delete/remove values at the specified index(es). Its structure is identical to __getitem__ and __setitem__, processing int indexes, slice indexes, or raising TypeError. As with __setitem__ we automatically return None.

  def __delitem__(self,index):
      print('List1.__delitem__('+str(index)+')')   # for illumination/debugging
      if type(index) is int:
          del self._plist[List1._fix_index(index)]
      elif type(index) is slice:
          s = slice(List1._fix_index(index.start), List1._fix_index(index.stop), index.step)
          del self._plist[s]
      else:
          raise TypeError('List1.__delitem__ index('+str(index)+') must be int/slice')

Running the following script illustrates how __delitem__ works with int and slice indexes; we again left the print statement in __delitem__.

  x = List1(['a','b','c','d','e'])
  print(x)
  del x[1]      # now ['b','c','d','e']: index 1 deleted
  print(x)
  del x[2:4]    # now ['b','e']: indexes 2-3 (not 4) deleted
  print(x)

Python produces the following output, printing the list after each deletion. Notice the calls that Python automatically makes to __delitem__ when we use the del operator with an int and with a slice.

  ['a', 'b', 'c', 'd', 'e']
  List1.__delitem__(1)
  ['b', 'c', 'd', 'e']
  List1.__delitem__(slice(2, 4, None))
  ['b', 'e']

----------
Implementing in via __getitem__ (Python's default if no __contains__ exists)

Before defining the __contains__ method, we will learn that if there is no defined __contains__ method, Python will check "x in l" by first checking whether x == l[0], then x == l[1], then x == l[2], ...
until it finds x, or indexing raises an exception (Python takes this approach, instead of computing len(l) as an upper bound for the index, because the __len__ method might not be defined for the class). So if we ran the following script (without defining the __contains__ method)

  x = List1(['a','b','c','d','e'])
  print('d' in x)
  print('z' in x)

Python produces the following output

  List1.__getitem__(0)
  List1.__getitem__(1)
  List1.__getitem__(2)
  List1.__getitem__(3)
  List1.__getitem__(4)
  True
  List1.__getitem__(0)
  List1.__getitem__(1)
  List1.__getitem__(2)
  List1.__getitem__(3)
  List1.__getitem__(4)
  List1.__getitem__(5)
  List1.__getitem__(6)
  False

Let's look at this carefully. For the first print Python executes 'd' in x; it first calls __getitem__(0), but in List1 objects there is nothing at index 0; actually, if you look at the code, a call of __getitem__(0) is translated to [0] -storing 'a'- which looks at the first value and doesn't find 'd'; then Python calls __getitem__(1), which is also translated to [0] -storing 'a'- which again looks at the first value and doesn't find 'd'; then Python calls __getitem__(2), which is translated to [1] -storing 'b'- which looks at the second value and doesn't find 'd'; ...; then Python calls __getitem__(4), which is translated to [3] -storing 'd'- which looks at the fourth value and does find 'd', so "in" returns True, which is printed.

For the second print Python executes 'z' in x; it calls __getitem__ multiple times; __getitem__(5) is translated to [4] -storing 'e'- which looks at the fifth/last value and doesn't find 'z'; then Python calls __getitem__(6), which raises an exception indicating that there are no more values to examine, so using "in" (translated to multiple indexing operations) returns False, which is printed. Although using "in" this way accesses index 0 twice, it does always produce the correct answer. If we wrote _fix_index to raise an exception for index 0 (which isn't really a legal index in List1), __contains__ would not automatically work correctly. So now we will explicitly write a __contains__ method, which stops Python from automatically calling the __getitem__ method.

----------

Now, we can define our own __contains__ to just loop over the stored list.

  def __contains__(self,item):
      for v in self._plist:
          if v == item:
              return True
      return False

But this is just checking whether item is in _plist: checking "in" doesn't really depend on whether our indexes start at 0 or 1: it depends only on whether the value is at some index in the stored list. So we can simplify it to delegate to the in operator for the standard list it stores.

  def __contains__(self,item):
      return item in self._plist

So if we ran the following script (defining the __contains__ method above)

  x = List1(['a','b','c','d','e'])
  print('d' in x)
  print('z' in x)

Python now produces just the following output, showing that no calls are automatically made to __getitem__.

  True
  False

Again, when determining whether a value is in a list, we don't really care what index it is stored at: we just want to know whether it is stored anywhere.

If we want to iterate over a class (for example, in a for loop) we should implement the __iter__ and __next__ methods. We will discuss these in detail in next week's lectures. But like in/__contains__, if those methods aren't defined, Python falls back on calling the __getitem__ method with successive indexes.
For example, if Python executes

  x = List1(['a','b','c','d','e'])
  for i in x:
      print(i)

for the List1 class as defined above, it prints the following

  List1.__getitem__(0)
  a
  List1.__getitem__(1)
  a
  List1.__getitem__(2)
  b
  List1.__getitem__(3)
  c
  List1.__getitem__(4)
  d
  List1.__getitem__(5)
  e
  List1.__getitem__(6)

Notice, as with the in operator above, it starts calling __getitem__ at index 0, so that value gets produced/printed twice by the iterator. For the in operator this wasn't a problem, but here it produces/prints the first value twice, which we will fix when we define the real __iter__ method. Also as with the in operator, it calls __getitem__ with successively bigger indexes, returning those values, until at __getitem__(6) an exception is raised, because there is no index 6 in this List1 object: just indexes 1 through 5; the iteration stops by handling (and not showing) the exception.

So, we really should write _fix_index to raise an exception when supplied the argument 0, which is not a legal index for objects constructed from List1. Then we must define explicit __contains__ and iteration methods (__iter__ and __next__, discussed next week), because using the automatic ones would generate index 0, for which _fix_index would raise an exception.

It is often the case that we define new classes using combinations of the builtin classes (like list, tuple, set, dict), so often we delegate operations for our new class to operations on the classes it is built from. In the Vector example, we represented a Vector as a tuple: we computed the __len__ of the Vector just by delegating __len__ to compute on the tuple.

-------
Container operators: in Dictionaries

We are now coming to the end of this example. Although this example used integer indexes for lists, if we wanted to produce a special kind of dictionary data type, we could use any type of index as a key in the __getitem__, __setitem__, and __delitem__ methods. Of course we cannot use slices with dictionaries, as we did above with List1.

For classes that will be used like dicts, there is another special method, __missing__(self,key), which should be called whenever a dict fails to find a key it is looking up. We can define the __missing__ method to tell Python what to do in these cases. In a normal dict class, a failed lookup raises an exception; in the defaultdict class, its __missing__ method associates a special object (specified by an argument in the construction of the defaultdict) with that key, and then returns that associated object for possible further processing. When we study inheritance later in the quarter, we will see how to define defaultdict completely and easily, including defining the __missing__ method as described above; a small delegation-based sketch of this hook appears at the end of this section.

One last point here. Many of the methods above return None, because they are meant to be the result of commands that mutate their arguments, not return values. But it might be useful for them to return a value. For example, in the Java library for maps (which are like Python dicts), mapping keys to values, the equivalent of __setitem__ returns the OLD value associated with the key being set; likewise, the equivalent of __delitem__ returns the value associated with the key being deleted. Using these returned values sometimes makes for more elegant code. Although Python doesn't do this, we can write our own classes that mimic a standard dict, but in addition behave in any special way we want them to (like returning such values).

Finally, although we changed the indexes to start at 1, we did not change the upper-bound meaning of slices.
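Before returning to that point about slices, here is the small sketch promised above of how a __missing__-style hook can work. It uses delegation, like List1 (not inheritance, which we see later in the quarter), so we must call __missing__ ourselves from __getitem__; in a real dict subclass, dict's own __getitem__ calls __missing__ automatically. The class name DefaultingDict and its default_factory parameter are invented for this illustration, not part of any library.

  class DefaultingDict:
      def __init__(self, default_factory, initial=()):
          self._factory = default_factory    # callable producing default values, e.g. list or int
          self._pdict   = dict(initial)      # delegate storage to a real dict

      def __missing__(self, key):            # called (by our __getitem__) on a failed lookup
          value = self._factory()            # build the default value
          self._pdict[key] = value           # associate it with the key, like defaultdict does
          return value

      def __getitem__(self, key):
          if key in self._pdict:
              return self._pdict[key]
          return self.__missing__(key)       # failed lookup: delegate to __missing__

      def __setitem__(self, key, value):
          self._pdict[key] = value

      def __str__(self):
          return str(self._pdict)

  d = DefaultingDict(list)
  d['a'].append(1)                           # 'a' is missing: __missing__ supplies [] first
  d['a'].append(2)
  print(d)                                   # prints {'a': [1, 2]}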
Recall that a standard slice 2:5 corresponds to indexes 2, 3, and 4 (not including 5). I often find it difficult to remember or use the fact that the stop index is not included in the slice, so it might be interesting to change the slices so that the stop index is included (as I did for irange in my goody module), which can be done simply when the slices are created/"fixed" and used in the __getitem__, __setitem__, and __delitem__ methods.

The main point here (and for the special methods below) is that if we want a data type to behave a certain way in Python, we can write classes in Python for that data type and make it exhibit exactly the behavior that we want, by writing methods for the class. We have crude ways to do that now, but will see more powerful ways when we study inheritance.

------------------------------------------------------------------------------
Function Call

We know how Python calls methods on instances of classes: by the Fundamental Equation of OOP (FEOOP), o.m(...) is translated into type(o).m(o,...). But we can also define how to use an instance as if it were a function itself, allowing us to "call it" as o(...). The way we do this is by defining a method named __call__ in the class. The call o(...) -here () is the "call operator"- is translated into o.__call__(...), and by the FEOOP into type(o).__call__(o,...).

----

In fact, when we use the name of a class like a function call, we are using this mechanism. For example, when calling set([1,2,3]) we are calling a function on the set class object, which is an instance of a special class named type; it is __call__ in type that creates an empty object and calls the __init__ defined in set on it. Below we will show how to overload __call__ for objects that have already been constructed from a class.

---

1) Here is a tiny example of overloading __call__, followed by a more interesting one. When we call the object, it will increment the self.x value: x is stored in the namespace/__dict__ of the object.

  class C:
      def __init__(self):
          self.x = 0

      def __call__(self,inc=1):
          self.x += inc

So if we construct objects from class C, we can call them specifying one argument (to match the inc parameter), and their x attribute is incremented by that amount. We can write

  o = C()
  print(o.x)
  o(2)
  print(o.x)
  o()
  print(o.x)

Note that we can call the object o refers to just by writing parentheses after o. Python processes o(2) by calling o.__call__(2), which is translated to type(o).__call__(o,2), which is C.__call__(o,2). It prints

  0
  2
  3

2) The following class is a bit ahead of where we are now, but it shows a more realistic use of overloading __call__, and we will discuss this class, and other classes like it, here and in detail later in the quarter. Track_Calls is a (decorator) class that we can use to remember how many times functions are called. We illustrate its use with a recursive fibonacci (fib) function. When Python constructs an object from the Track_Calls class, it remembers the function it is given (f) in its _f instance variable, and initializes its calls instance variable to 0. Here we assume f is a function of one parameter, but see the "to bundle" comment below. An object of class Track_Calls can be called directly: its __call__ method increments the calls counter and calls and returns the value computed by the remembered function: delegating to it to compute the actual values we want.
Thus, we can replace a function call with a call on an object in which the original function is remembered; and objects can do more than functions: for example, they can remember information/state, like how many times they are called.

  class Track_Calls:
      def __init__(self,f):
          self._f    = f
          self.calls = 0

      def __call__(self,x):       # or (self,*args,**kargs): to bundle arbitrary arguments
          self.calls += 1
          return self._f(x)       # or self._f(*args,**kargs): to unbundle arbitrary arguments

  def fib(n):
      assert n>=0, 'fib cannot have negative n('+str(n)+')'
      if   n == 0:
          return 1
      elif n == 1:
          return 1
      else:
          return fib(n-1) + fib(n-2)

  fib = Track_Calls(fib)

The script above defines the Track_Calls class, the fib function, and then rebinds the name fib to an object constructed from the Track_Calls class, passed a reference to the fib function object as an argument. When we call fib(...), Python finds the Track_Calls object that fib refers to and calls its __call__ method. So, it calls fib.__call__(...), which the FEOOP translates to Track_Calls.__call__(fib,...), which increments the calls instance variable and then calls the original function object (now bound to self._f, and recursive) and returns that value. Remember that the fib name is now bound to a Track_Calls object, so any recursive calls to fib are also handled by the Track_Calls.__call__ method (which increments the calls instance variable). Ultimately the recursive function call returns an answer, and the "calls" attribute/instance variable accumulates how many calls of fib, made through Track_Calls.__call__, occurred.

Here is code that creates a table of the returned values from fib and the number of calls each required. I directly reset fib.calls to 0, but I could have created a method in Track_Calls to reset this instance variable, or just remembered the number of calls before and after a new call and subtracted.

  for i in irange(0,31):
      print('fib(', i, ') =', fib(i), 'and was called', fib.calls, 'times')
      fib.calls = 0   # reset instance variable to 0 for next iteration

In this code I am using an instance variable directly (note calls is not preceded by an underscore). To avoid using this instance variable directly, I could have named the instance variable _calls and then defined one method to report its value and another to reset it:

  def report_calls(self):
      return self._calls

  def reset_calls(self):
      self._calls = 0

This follows the more standard convention of using a class without directly using its state attributes; the class supplies methods to access/update them. In this case I would write the code above as

  for i in irange(0,31):
      print('fib(', i, ') =', fib(i), 'and was called', fib.report_calls(), 'times')
      fib.reset_calls()   # reset for next call in loop

When we discuss "decorators" later in the course, we will see all sorts of what I would call fascinating uses of classes that remember a function and define function calls for the object that delegate to that function, but do something else too. Thus, they "decorate" the function.

------------------------------------------------------------------------------
Context managers:

We will first review using context managers, which have the following syntax.

  with A_Class(...) [as name]:    # as name is optional: in [] (EBNF)
      block                       # (possibly using name in its statements)
You should have used context managers in ICS-32, with the open class, to ensure that any opened file is closed automatically (even if an exception was raised while executing the block of code using it). We use the keyword "with" to invoke a context manager, whose object may or may not be named; the context manager manages execution of a block. Generally, the purpose of a context manager is to simplify the setup/tear-down of a context in which to execute the block. Oftentimes, the context includes information about how to do exception handling for the block, if any statement in the block raises an exception. To be used by a context manager, the class must define both __enter__ and __exit__ methods. Below, we will examine how such code can perform setup and tear-down, and handle exceptions. Once these methods are written in a class, we can easily use a context manager to manage objects from that class, avoiding duplicating try/except statements in other code.

For example, we can use an "open" class object in a context manager to ensure that the open file is closed after the block finishes executing, regardless of whether our block raises an exception. We write no code to handle any exceptions and no code to close the file: that code appears in the __enter__ and __exit__ methods defined in the open class itself. For an example, we can write just

  with open(...) as open_file:
      for line in open_file:
          process line, possibly raising an exception

which opens the file, processes all the information in it, a line at a time, and then closes the file. Even if the block raises an exception, the file will still be closed by the context manager before the exception propagates.

We can write our own context managers by implementing their protocol: writing two special methods in the class, __enter__ and __exit__. By defining these methods, we can use objects of such a class in the syntax Python provides for context managers. Thus, defining these methods is similar to overloading operators: Python calls these __ methods automatically based on our use of the "with" syntax, which is why we discuss them in this lecture.

Generally, the __enter__ method takes the standard self (an argument that is an instance of A_Class) and performs the SETUP; if the "as name" option is to be used, then this method MUST RETURN SELF to bind to name. That name can be used in the statements executed in the block. It is a good idea for __enter__ to always return self, whether or not it is expected to be bound to anything (with no "as name" option, this returned value is ignored).

The __exit__ method takes four arguments:
  (1) the standard self,
  (2) an object that is an exception class,
  (3) an object that is a raised exception: (2) is the type of (3); for example, for
      raise AttributeError('...'), (2) is AttributeError and (3) is the raised exception
      object, which includes the text '...',
  (4) a traceback (which we can print, and which typically is formatted to print nicely).
These parameters are bound to these special values automatically if an actual exception is raised in the block; if no exception is raised, then parameters 2-4 are each bound to the value None. So, the __exit__ of a context manager has a lot to do with exceptions.
The __exit__ method performs the TEAR-DOWN, looks at these parameters, and does what it wants with them: if it returns True, any exception is considered to be handled (some say the context manager "swallows" the exception) and it does not propagate; if __exit__ returns False, Python propagates the exception (re-raises it, signaling an error that must still be handled by other/outer code - if not, Python just terminates the program and prints the traceback).

We can show the meaning of the "with" statement

  with A_Class(...) [as name]:
      block

as being equivalent to the following Python code. Here underscored variables (e.g., _mgr) are temporaries/locals used by Python, hidden from the programmer.

  import sys
  _mgr   = A_Class(...)           # construct A_Class object
  _exit  = type(_mgr).__exit__    # just find it: if missing, raises AttributeError
  _value = _mgr.__enter__()       # call SETUP: if missing, raises AttributeError
  try:
      [name = _value]             # Only if "as name" is present
      block
  except:                                      # block raises ANY exception
      if not _exit(_mgr,*sys.exc_info()):      # do TEAR-DOWN and exception handling
          raise                                # re-raise if _exit returns False
  else:                                        # block raises NO exception
      _exit(_mgr, None, None, None)            # do TEAR-DOWN by special _exit call

Note that sys.exc_info() produces a 3-tuple consisting of the type of the exception raised, the actual exception object, and the "traceback" with information about how the exception started and propagated; the * unpacks this tuple into three separate arguments.

Here are a few simple but powerful context managers. The first executes a block (really designed to execute a single statement), handling specific exceptions by ignoring them, but propagating all other exceptions.

  class Ignore:
      def __init__(self,*exceptions_to_ignore):   # no arguments (beyond self) means ignore ALL exceptions
          self._ignore = exceptions_to_ignore

      def __enter__(self):
          return self

      def __exit__(self,exc,exc_value,traceback):
          return self._ignore == () or exc in self._ignore

Here __enter__ does nothing special (but does return self); __exit__ returns True if self._ignore is either the empty tuple (meaning ignore ALL exceptions), or the exception class raised is in the self._ignore tuple (one of the exceptions that the object was constructed to ignore). Remember that returning True means that the exception is considered handled/swallowed and will not propagate. So executing

  with Ignore(AssertionError):
      raise AssertionError('...')
  print('After with')

will execute the print statement (because __exit__ returns True, since the Ignore object is told to ignore AssertionError); but executing

  with Ignore(AssertionError):
      raise TypeError('...')
  print('After with')

will NOT execute the print statement; instead the context manager will propagate the TypeError exception (because __exit__ returns False, since the Ignore object is not told to ignore TypeError).

A more realistic use of Ignore in a context manager allows us to ensure there is no file with a certain name in our directory: we want to remove a file name from our directory, but if the file doesn't exist in the first place, we just keep executing. Calling os.remove for a file that does not exist raises FileNotFoundError.

  with Ignore(FileNotFoundError):
      os.remove(file)

Here, if the removal of the file fails, the FileNotFoundError is raised but ignored, so the code after the context manager keeps executing. Without context managers we would need to write

  try:
      os.remove(file)
  except FileNotFoundError:
      pass

So, we can use context managers to capture this pattern more simply, and use it much more easily.
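To tie the pieces of the protocol together before looking at open, here is a minimal sketch of another do-it-yourself context manager; the class name Timer and its label parameter are invented for this illustration, not part of any library. Its __enter__ does the SETUP (recording the starting time) and returns self; its __exit__ does the TEAR-DOWN (printing the elapsed time) and returns False, so it never swallows exceptions raised in the block.

  import time

  class Timer:
      def __init__(self, label='block'):
          self._label = label

      def __enter__(self):                            # SETUP: record the starting time
          self._start = time.perf_counter()
          return self                                 # bound by "as name", if present

      def __exit__(self, exc, exc_value, traceback):  # TEAR-DOWN: report elapsed time
          elapsed = time.perf_counter() - self._start
          print(self._label, 'took', elapsed, 'seconds')
          return False                                # don't swallow any exception from the block

  with Timer('summing'):
      total = sum(range(1_000_000))

Even if the block raised an exception, __exit__ would still run and print the elapsed time; because it returns False, that exception would then propagate as usual.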
How does the open class used in ICS-32 work with a context manager? It defines __enter__ and __exit__ methods. In the open class, its __init__ method would store the open file object, its __enter__ method would return it, and finally its __exit__ method would close that file before returning, regardless of whether an exception was raised; if an exception was raised, it will be propagated (__exit__ always returns False).

As a similar example, suppose that we wanted to write a class that allowed us to use it in a context manager to echo/duplicate all values printed via a print statement into a special log file. The print would still print on the console.

  class Echo:
      def __init__(self,open_file):
          self._log_file = open_file

      def __enter__(self):
          import builtins,sys
          self._real_print = builtins.print      # Save the print accessed via builtins
          def echo_print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False):
              self._real_print(*objects,sep=sep,end=end,file=file,flush=flush)
              self._real_print(*objects,sep=sep,end=end,file=self._log_file,flush=flush)
          builtins.print = echo_print             # Change the print accessed via builtins!
          return self

      def __exit__(self,exc,exc_value,traceback):
          import builtins
          self._log_file.close()
          builtins.print = self._real_print       # Always restore the original print accessed via builtins!
          return False                            # Don't "swallow" any exceptions

We would use this class in a context manager like

  with Echo(open('test_echo_output.txt','w')):
      print('abc')
      print(1,2,3)
      print('xyz')
  print('After with')

Here, abc, 1 2 3, xyz, and After with are printed on the console. The file named test_echo_output.txt contains abc, 1 2 3, and xyz (not After with). In fact, if we called any function in the Echo block, and that function contained a print that was accessed via builtins, then that print would be echoed too.

To start, __init__ creates an attribute to refer to the open file object into which information is written. Here __enter__ creates an attribute to remember the real print function (the one stored in builtins.print); it then defines echo_print to have the same header as print (but its body calls the real print twice: once normally, and once specifying self._log_file as the place to write); then it replaces builtins.print with this one. Finally __enter__ returns self (which is not really needed here, because the context manager doesn't use "as name"). __exit__ closes the file (like open's __exit__ does) and restores the original print in builtins. Then it returns False, propagating any exceptions raised in the block (none in the example above). But it does make sure, even if an exception is raised, that builtins.print is restored.

Finally, the "with" syntax can include a sequence of objects, which creates nested context managers. In Python's form of EBNF (a cross between EBNF and regular expressions) we would write

  with_stmt ::= "with" with_item ("," with_item)* ":" block
  with_item ::= expression ["as" target]

so

  with A_Class(...) as name1, B_Class(...) as name2:
      block

is equivalent to

  with A_Class(...) as name1:
      with B_Class(...) as name2:
          block

------------------------------------------------------------------------------
Attributes:

In this last section we will discuss some of the methods that are at the heart of how Python executes our code. All require careful use or they will cause big problems (often infinite recursion) that stop the execution of all Python code.
All concern getting and setting the values of attributes in the namespaces of objects (which is what classes are all about). Here is the complete laundry list.

  __getattr__(self,name)       : called when the name attribute cannot be found
  __setattr__(self,name,value) : set the name attribute to value
  __delattr__(self,name)       : delete the name attribute
  __getattribute__(self,name)  : access the name attribute (tough to overload, unless you know
                                 about inheritance; we will discuss it then)

Whenever we refer to an attribute in some object's namespace (recall __dict__ stores the namespaces for objects, containing the bindings of their instance variables) Python calls one of a few double-underscore methods: if we do NOT define these methods for a class, Python uses inheritance (a topic that we will cover later) to determine what to do.

We will look at __getattr__ first (it is the safest), which is called when an attribute CANNOT be found in a namespace (much like __missing__ for dictionaries, discussed briefly above). Here is a small class that defines this method to return a string that includes the name of the non-existent attribute, instead of raising the AttributeError exception. We could also easily return None for such non-existent attributes when they are "gotten".

  class C:
      def __getattr__(self,name):  # called when name (a str) isn't in the namespace
          print('Attribute for ' + name + ' not found')
          return name+'?'          # Just returns a string of name followed by ?

  o = C()
  print(o.a_name)
  print(o.a_name)
  o.a_name = 0
  print(o.a_name)

Running this code prints the results

  Attribute for a_name not found
  a_name?
  Attribute for a_name not found
  a_name?
  0

When the a_name attribute of o is referred to in the first print, this attribute doesn't exist, so Python calls __getattr__ for its result, which returns the attribute's name with a ? appended. This doesn't change whether that name is bound to anything, so the same thing happens in the second print. When the a_name attribute of o is referred to in the final print, it has already been bound to a value (by o.a_name = 0), so Python doesn't call the __getattr__ method.

IMPORTANT: __getattr__ is called only for attributes that don't exist; it is not called for all attributes (that is what __getattribute__ is for, but it is tough to use correctly until we learn about inheritance).

Now let's look at the standard meaning of __setattr__ for a class; if we do not define our own __setattr__, the class uses one (through inheritance, a topic we cover later in ICS-33) that is equivalent to defining the following one.

  def __setattr__(self,name,value):
      self.__dict__[name] = value

Study this simple code carefully. Python translates o.a = v into a method call on __setattr__: it executes o.a = v by executing o.__setattr__('a', v), which executes o.__dict__['a'] = v. Notice that the attribute name is CONVERTED INTO AN EQUIVALENT STRING when calling __setattr__, and that string is used as a key in __dict__.

One thing we should NEVER DO is write

  def __setattr__(self,name,value):   # NEVER DO THIS
      pass                            # NEVER DO THIS

This code executes o.a = v by doing nothing! It does not bind this attribute to anything. By NOT DEFINING OUR OWN __setattr__, the standard one will execute; if we DEFINE OUR OWN, it is likely to include self.__dict__[name] = value somewhere in the code: we set the attribute but maybe do something else too.

Now let's look at a sophisticated use of the __setattr__ method.
The class below defines the __setattr__ method so that it uses a dict to remember all the values ever bound to each attribute (stored in self._history).

  from collections import defaultdict   # for elegance; with __setattr__ we need it

  class C:
      def __init__(self):
          self._history = defaultdict(list)    # empty list when a key is initially accessed
          self.s = 0

      def bump(self):                          # rebinds attribute s: binds s to a value one bigger
          self.s += 1                          # equivalent to self.s = self.s + 1

      def __setattr__(self,name,value):
          # print('C.__setattr__',name,value)  # helps for debugging
          if '_history' in self.__dict__:      # False (skip append) the 1st time only:
                                               #   self._history = ... in __init__
              self._history[name].append(value)  # Do this every time but the 1st
          self.__dict__[name] = value          # always do this too, to update name

      def report(self):
          print('History Report:')
          for k,v in sorted(self._history.items()):
              print(' ',k,' had the values:', v)

First, whenever __setattr__ is called, it checks to see whether '_history' is already an attribute name; if not, it means that self._history is being set for the first time, in __init__, by executing self._history = defaultdict(list), so __setattr__ shouldn't update the history by appending a new value, because there is no history dictionary yet to append to! This binding MUST occur FIRST in __init__ for the interesting parts of __setattr__ to work correctly. Next, __setattr__ sets the attribute by using __dict__ directly, using the string name as the key, and thus bypasses another (recursive) call to __setattr__. So, when called in __init__, when there is no '_history' attribute, this statement creates one; if there already is a '_history' attribute, this statement updates whatever attribute is being set.

If we ran the following script

  x = C()
  x.s = 3     # rebinds attribute s, initialized to 0 in __init__
  x.bump()
  x.bump()
  x.y = 11    # creates a new attribute not created in __init__
  x.s = 8
  x.y += 1
  x.report()

calling x.report would show a history of all the values the two instance variables stored. Recall that the name parameter in __setattr__ is passed a string argument: x.s = 8 passes 's' as the name of the attribute to set, and 8 as its value.

  History Report:
    s  had the values: [0, 3, 4, 5, 8]
    y  had the values: [11, 12]

Note that x._history['s'][-1] is the current value bound to x.s (the last one in the list), and x._history['s'][-2] is the previous one. We might call this an Elephant class: it never forgets the binding for a name, even after the name is rebound to another value.

The method __delattr__ is not so useful (and __getattribute__ is so dangerous to use that we will bypass it for now). But here is a short example of a class that keeps a dict of the names that have been deleted from it and what their last bindings were (like remembering the old bindings of instance variables). Recall that name is a string: del o.y calls type(o).__delattr__ and passes 'y' as the name of the attribute to delete.
  class C:
      def __init__(self):
          self._deleted = {}

      def __delattr__(self,name):
          self._deleted[name] = self.__dict__[name]
          del self.__dict__[name]

      def report(self):
          print('Deleted attributes and their final values:',self._deleted)

  o = C()
  o.x = 1
  o.y = 2
  o.z = 3
  del o.x
  del o.y
  o.report()

Calling o.report prints:

  Deleted attributes and their final values: {'x': 1, 'y': 2}

------------------------------------------------------------------------------

The double-underscore methods __iter__ and __next__ are so useful (everything about iteration is useful) that we will spend next week examining this protocol and various ways to implement it.

------------------------------------------------------------------------------

FYI, here is the entire List1 class with all the methods described above.

  class List1:
      def __init__(self,iterable):
          self._plist = list(iterable)

      def __str__(self):
          return str(self._plist)

      @staticmethod
      def _fix_index(i):
          if i == None:
              return None
          else:
              # for positive indexes, 1 smaller: 1 -> 0
              # for - indexes, the same: -1 (still last), -2 (still 2nd to last)
              return (i-1 if i >= 1 else i)

      def __getitem__(self,index):
          print('List1.__getitem__('+str(index)+')')   # for illumination/debugging
          if type(index) is int:
              return self._plist[List1._fix_index(index)]
          elif type(index) is slice:
              s = slice(List1._fix_index(index.start), List1._fix_index(index.stop), index.step)
              return self._plist[s]
          else:
              raise TypeError('List1.__getitem__ index('+str(index)+') must be int/slice')

      def __setitem__(self,index,value):
          print('List1.__setitem__('+str(index)+','+str(value)+')')   # for illumination/debugging
          if type(index) is int:
              self._plist[List1._fix_index(index)] = value
          elif type(index) is slice:
              s = slice(List1._fix_index(index.start), List1._fix_index(index.stop), index.step)
              self._plist[s] = value
          else:
              raise TypeError('List1.__setitem__ index('+str(index)+') must be int/slice')

      def __delitem__(self,index):
          print('List1.__delitem__('+str(index)+')')   # for illumination/debugging
          if type(index) is int:
              del self._plist[List1._fix_index(index)]
          elif type(index) is slice:
              s = slice(List1._fix_index(index.start), List1._fix_index(index.stop), index.step)
              del self._plist[s]
          else:
              raise TypeError('List1.__delitem__ index('+str(index)+') must be int/slice')

      def __len__(self):
          return len(self._plist)

      def __contains__(self,item):
          return item in self._plist

Note that we could define the following __getattr__ method for List1.

  def __getattr__(self,attr):           # if attr is not here, try self._plist
      return getattr(self._plist,attr)

This says: if we try to access any attribute that is not defined for List1, get that attribute from self._plist. Generally this is called delegation: an "outer" object that does not have some attribute delegates the attribute reference to an inner object. Doing so implicitly allows the inner object to do all the processing not explicitly done by the outer object. So, for example, if we wrote

  x = List1([1,2,3])
  x.append(4)
  print(x[4])

it would print 4. There is no append attribute defined in List1, so instead we get that attribute from self._plist, in which case the attribute is a method that we can call (we call it here with the argument 4). Decorators often use exactly this form of delegation, so the decorator object can process its attributes and all the attributes of the decorated object.
In fact, with this __getattr__ method we could omit defining __len__ and __contains__, letting those attributes be found in self._plist. Such delegation would work for all methods that are index independent. We would still have to define, in List1, all the methods using list indexes, because the whole purpose of this class is to shift the origin from index 0 to index 1.

------------------------------------------------------------------------------

FYI, here is a list of operators and the double-underscore methods that we can define to overload them. We've covered most but not all.

  Relational operators: < > <= >= == !=
    __lt__ __gt__ __le__ __ge__ __eq__ __ne__

  Unary operators/functions: + - abs ~ round floor ceil trunc
    __pos__ __neg__ __abs__ __invert__ __round__ __floor__ __ceil__

  Arithmetic: + - * / // % divmod ** << >> & | ^
    __add__ __sub__ __mul__ __truediv__ __floordiv__ __mod__ __divmod__ __pow__
    __lshift__ __rshift__ __and__ __or__ __xor__

  Reflected (right) arithmetic:
    __radd__ __rsub__ __rmul__ __rtruediv__ __rfloordiv__ __rmod__ __rdivmod__ __rpow__
    __rlshift__ __rrshift__ __rand__ __ror__ __rxor__

  Incrementing Arithmetic: += -= *= /= //= %= **= <<= >>= &= |= ^=
    __iadd__ __isub__ __imul__ __itruediv__ __ifloordiv__ __imod__ __ipow__
    __ilshift__ __irshift__ __iand__ __ior__ __ixor__

  Type conversion: int float complex oct hex index trunc
    __int__ __float__ __complex__ __oct__ __hex__ __index__ __trunc__

  Class representation:
    __str__ __repr__ __unicode__ __format__ __hash__ __nonzero__ __dir__

  Attributes:
    __getattr__ __setattr__ __delattr__ __getattribute__

  Containers:
    __len__ __getitem__ __setitem__ __delitem__ __iter__ __reversed__ __contains__ __missing__

  Misc:
    __call__ __copy__ __deepcopy__
    getattr/setattr: special classes, wrapped, inheritance soon, and decorators

  Context managers:
    __enter__ __exit__

  Descriptors:
    __get__ __set__ __delete__

Do you want to build your own context manager? Overload the __enter__ and __exit__ methods. __iter__ and __next__ are for iterators/generators (next week).

------------------------------------------------------------------------------
Problems:

1) Write a class named Indexed_Set that defines an __init__ method taking anything that can be iterated over as an argument (e.g., list, tuple, set) and stores all these values in a set. Although sets don't have indexes, define a __getitem__ method such that for an int index i it returns the ith value produced when iterating over the set (raising IndexError for too small/too large indexes); for a slice index, __getitem__ produces a set containing all the indexed values; for any other index (non-int, non-slice) raise the TypeError exception. Of course Python can iterate over an Indexed_Set using __getitem__. Hint: write a _getitem helper method that implements the requirements for int indexes, which is called from __getitem__ when necessary (once for an int index, multiple times for a slice index; build the resulting set with a comprehension).

2) Write a class named Count_Argument_Use that defines an __init__ method taking a function of one parameter as an argument. Define a __call__ method that keeps a dictionary of the arguments the function was called with and how often it was called with each argument. Define a report method that prints each argument in this dictionary, followed by the number of times it was used, in decreasing order of the times used. Test this code on the fibonacci function defined in these notes.
For fib(10), calling report() should print:

  Argument | Times Called
  ---------+-------------
      1    |     55
      0    |     34
      2    |     34
      3    |     21
      4    |     13
      5    |      8
      6    |      5
      7    |      3
      8    |      2
      9    |      1
     10    |      1

3) Write a class named Copying that can be used as a context manager for writing simple scripts that copy information from an input file to an output file. It defines an __init__ method taking the names of the input and output files as arguments. It defines an __enter__ method that opens these file names for reading/writing and prints 'Starting Copy' on the console. Copying defines a read method that calls readline on the open input file and either (a) returns the next line of the input file or (b) raises the EOFError exception if that line comes back empty; it defines a write method (with a string parameter) that writes its parameter into the open output file. Finally, Copying defines an __exit__ method that prints 'Stopping Copy....successfully on EOF' if EOFError was the exception that terminated the block (and considers the exception handled) and prints 'Stopping Copy....on exception' if any other exception terminated the block (and considers the exception not handled). Here are three examples of code using the Copying class as a context manager.

  # pure copy
  with Copying('input.txt','output.txt') as copy:
      while True:
          copy.write(copy.read())

  # duplicate input lines for output
  with Copying('input.txt','output.txt') as copy:
      while True:
          to_write = copy.read()
          copy.write(to_write)
          copy.write(to_write)

  # only copy lines matching the string pattern
  with Copying('input.txt','output.txt') as copy:
      re_pattern = re.compile(pattern)
      while True:
          to_write = copy.read()
          if re_pattern.match(to_write):
              copy.write(to_write)

4) Write a class named Store_Once that defines an __init__ method taking anything that can be iterated over to produce strings as an argument (e.g., list of strings, tuple of strings, set of strings) and stores all these strings in a set. Define a __setattr__ method that allows currently unbound attributes to be set, but raises an Exception (with a reasonable message) if attempting to change the binding of a currently bound attribute whose name is in the stored set. Raise the exception only if the value is to be changed, not just rebound to the same value (check using the is or is not operator).

5) The following code is supposed to print a file twice. Why does it not work? What does it do?

  o = open('filename','r')
  with o:
      for line in o:
          print(line)
  with o:
      for line in o:
          print(line)