Python Classes

This lecture reviews basic material that you should know about defining and using classes, although it also presents some new (hopefully easy to understand) material that you may not have seen. Primarily this lecture discusses how to use the namespaces in class objects and the namespaces of their instance objects to store data as well as functions/methods. Most data is stored in the namespace of an object constructed from a class. Most methods are stored in the namespace of the class the object is constructed from (and accessed via the Fundamental Equation of Object-Oriented Programming: review that from the first week's lectures).

Fundamentally this lecture is about two things.

1) The namespaces for classes and instances (objects constructed from classes): how they are initialized, stored, and accessed/updated.

2) Accessing attributes (both for methods and data) in instances and classes: the 2-step algorithm (which we will extend/generalize when we cover inheritance among classes)

------------------------------------------------------------------------------

Defining Classes:

When we define a class in Python, we are binding the class name to an object representing the entire class. That class object contains mostly method definitions, but it can also contain data that is common to all the objects constructed from the class (common to examine/common to update).

We call the class object to create a new object that is an instance of the class: Python constructs an empty instance object and then calls the __init__ method defined in the class (passing the new/empty instance object to its self parameter) to initialize the state of the instance object.

Recall that all names in Python refer/are-bound to objects; so defining

  class C: ...

creates the name C and binds it to an object representing the class C. We must clearly be able to talk about both the class object and the (instance) objects constructed from class objects.
What names/attributes are defined in a class object's namespace? I'm not talking about the instance objects that will be constructed from the class C, but the names in the namespace of the class object itself. Mostly, a class defines names that are bound to its methods (e.g., __init__), but later in this lecture we will also discuss names representing other class attributes: data.

When we want to construct an object that is an instance of a class, we refer to a class object (typically by the name bound to it when it is defined) followed by () and possibly arguments in these parentheses. Writing o = C(arguments) defines the name o to refer/bind to an instance object created from class C.

Python does three things when constructing new instance objects:

(1) It calls a special function that creates an object that is a new instance of the class. Note that this object automatically has a dictionary associated with it, with the name __dict__, and that dictionary starts empty.

(2) It calls the __init__ method for the class, passing the empty object created in (1) to the first parameter of __init__ (by convention, called self), and following this with all the other argument values in the parentheses used in the call, to initialize the states of attributes in this instance. Typically __init__ assigns values to the self/instance variables, which stores each name/binding in the __dict__ of the self object. It often checks arguments for validity as well, raising an exception if the object cannot be correctly constructed.

(3) A reference to the object that was created in (1) and initialized in (2) is returned as the result of constructing the object: most likely this reference will be bound to some name: e.g., o = C(...) which means o refers to a newly constructed/initialized object, constructed from class C (again, the new object is called an instance of class C).
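The three steps above can be imitated by hand. The following is only a sketch, not Python's actual construction machinery (which lives in the class's metaclass and has more cases): it assumes a small hypothetical class C with two parameters, and it uses the special method __new__ as the "special function" of step (1).

```python
class C:
    def __init__(self, p1, p2):
        self.a = p1
        self.b = p2

# (1) Create a new instance of C; its namespace starts empty.
o = C.__new__(C)
print(o.__dict__)        # {}

# (2) Initialize it: the new object is passed explicitly as self.
C.__init__(o, 1, 2)
print(o.__dict__)        # {'a': 1, 'b': 2}

# (3) The reference is bound to a name (here o was bound in step 1).

# The ordinary way performs the same steps for us.
o2 = C(1, 2)
print(type(o) is type(o2), o.__dict__ == o2.__dict__)   # True True
```

Writing C(1, 2) is of course what we normally do; the point of the sketch is only that construction and initialization are separate steps.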
So if we call o = C(1,2) Python calls C.__init__(reference to a new object with empty __dict__, 1, 2) and binds o to refer to the newly constructed object (from class C) somehow initialized (see the body of __init__) by the arguments 1 and 2.

Note that we can define other names that can bind to/share the same class object. For example:

  class C:
      def __init__(self, p1):
          print('instance of C object initialized')
          self.a = 'starting'  # initialize an instance/attribute name
          self.b = p1          # initialize an instance/attribute name

  D = C     # Names C and D refer to (share) the same class object
  x = C(1)  # Use C to construct a first instance of a class C object (direct)
  y = D(2)  # Use D to construct another instance of a class C object (via sharing)
  print(C,D,x,y)
  print(type(x), type(y), type(C), type(type(x)))

Running this script produces the following: the first two lines from calling __init__ twice, the next two from calling the two print statements above

  instance of C object initialized
  instance of C object initialized
  <class '__main__.C'> <class '__main__.C'> <__main__.C object at 0x027B0E10> <__main__.C object at 0x02889C50>
  <class '__main__.C'> <class '__main__.C'> <class 'type'> <class 'type'>

The prefix '__main__' appears in front of '.C' because I executed the code above in a script (as the main module Python is executing). That is where the class C is defined.

Finally, if we print(x.a, x.b) it prints: starting 1. If we print(y.a, y.b) it prints: starting 2. The attribute a is always bound to 'starting'; the attribute b is bound to the value of the argument passed to __init__ whenever an object of type C is constructed.

We can use the type function to determine the type of any object: the class from which it was constructed. The objects that x and y refer to are instances of the class C, defined in the main script. C (and all classes that we define) are instances of a special class called 'type'. So in the example above, x is an instance of C, and C is an instance of 'type'.
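These relationships can be checked with the is operator and the built-in isinstance function. This is a small sketch using a cut-down version of the class above (the print in __init__ is omitted to keep the output short).

```python
class C:
    def __init__(self, p1):
        self.a = 'starting'
        self.b = p1

D = C            # a second name for the same class object
x = C(1)
y = D(2)

print(D is C)                   # True: one class object, two names
print(type(x) is C)             # True
print(type(y) is C)             # True: constructing via D still uses class C
print(type(C) is type)          # True: the class object is an instance of type
print(isinstance(x, C), isinstance(C, type))   # True True
```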
------------------------------------------------------------------------------

Manipulating an object's namespace (and __init__):

All objects have namespaces, which are dictionaries, in which the attribute names defined in that object are bound-to/associated-with values. To define an attribute in the namespace of the object that self is bound to, we write something like: self.name = value in the __init__, or any other method in the class, to define an attribute name in the dictionary of the object self refers to, and bind that name (or update an existing binding) to refer to value.

Now we illustrate a lower-level way to add attribute names to the namespace of an object. This way is not recommended for now, but it furthers our understanding of objects and their namespaces in Python, and we will find interesting/concrete uses for this understanding later in the course. Given class C defined above, we can write

  o = C(1)                # defines a/b attributes
  print(o.__dict__)       #1
  o.x = 1                 # defines x attribute
  o.y = 'two'             # defines y attribute
  o.__dict__['z'] = 3     # defines z attribute by __dict__ directly
  print(o.__dict__)       #2
  print(o.a, o.b, o.x, o.y, o.z) #3

Running this script produces

  instance of C object initialized                       # from __init__
  {'a': 'starting', 'b': 1}                              #1
  {'a': 'starting', 'b': 1, 'x': 1, 'y': 'two', 'z': 3}  #2
  starting 1 1 two 3                                     #3

So, we have used two different forms for adding three attribute names to the namespace/dictionary for the object that o refers to (which initially stores the name a initialized by self.a = 'starting' and b initialized by self.b = p1). Adding names updates the object's __dict__.

Writing object.name = value is equivalent to writing object.__dict__['name'] = value: note that each __dict__'s key is a string representing the attribute name. The object o has a __dict__ attribute that stores its namespace's bindings. Each identifier/name for which we define an attribute in the object appears as a string key in __dict__.
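As a small supplement to the discussion above, Python also provides built-in functions that work with attribute names given as strings: setattr, getattr, and vars. The sketch below (again using a cut-down class C without the print in __init__) shows that, for an ordinary object like this one, all of these forms update or read the same dictionary.

```python
class C:
    def __init__(self, p1):
        self.a = 'starting'
        self.b = p1

o = C(1)
o.x = 1                    # ordinary attribute assignment
o.__dict__['y'] = 'two'    # direct update of the namespace dictionary
setattr(o, 'z', 3)         # built-in function; the name is given as a string

print(vars(o) is o.__dict__)   # True: vars(o) returns o's namespace dictionary
print(o.__dict__)   # {'a': 'starting', 'b': 1, 'x': 1, 'y': 'two', 'z': 3}
print(getattr(o, 'y'), o.z)    # two 3
```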
Mostly, we don't initialize the namespace of an object this way (outside of __init__); instead we use the automatically-called __init__ method and its self parameter to do the initialization. But really, the same thing is happening in __init__ below as was shown above.

  class C:
      def __init__(self, init_x, init_y):
          print('instance of C object initialized')
          self.x = init_x  # value depends on the argument matching init_x
          self.y = init_y  # value depends on the argument matching init_y
          self.z = 3       # always the same object: 3

  c = C(1,'two')
  print(c.__dict__)

Running this script produces a similar result (Python 3.7 and later show the keys in the order they were added; older versions may show them in a different order).

  instance of C object initialized
  {'x': 1, 'y': 'two', 'z': 3}

The purpose of the __init__ method is to create all the attribute names needed by the object's methods and initialize them appropriately. Typically, once __init__ is called, no new attribute names are added to the object's namespace (although Python allows additions, as illustrated in the prior section, e.g., o.x = 1, and some useful examples are illustrated later in this lecture as well).

Every object constructed is likely to need exactly the same attribute names: all these attribute names are used in the methods defined in the class: methods that process instance objects of the class. The __init__ method, which is automatically called by Python when an object is constructed, is a convenient place to localize the creation and initialization of all these attribute names.

So, for every assignment statement self.name = value Python puts an entry into the self object's namespace (self.__dict__) with the key 'name' (keys are strings) associated with value. We can do this in the __init__ method (which is mostly where we do it) or after the object is constructed: both ways are shown above.

When self.name appears in an expression (e.g., a = self.name), Python substitutes the expression self.__dict__['name'] on the right hand side of the =, to retrieve the value of that name from the self object's namespace/dictionary.
Note how name in self.name becomes 'name' in self.__dict__['name'].

If we try to access a non-existing attribute name (o.mumble, for objects constructed from the class C above, which is translated into o.__dict__['mumble']), Python raises an exception. Attempting to access o.mumble prints the exception as follows

  AttributeError: 'C' object has no attribute 'mumble'

So, it DOES NOT raise a KeyError in the __dict__ but raises an AttributeError instead. So if we accidentally try to access a non-existing attribute, this is how Python detects and reports the error.

Note that some names defined in __init__ (z above) always receive the SAME INITIALIZATION, so we don't need to define a parameter in __init__ to initialize them. But often names need to be initialized to different values when different objects are constructed, so typically we add just enough parameters to the __init__ method to allow us to specify how those names with different values should be initialized.

----------Assert Interlude: assert in initialization

Sometimes __init__ will contain code that ensures that a parameter is matched to an argument that stores a legal/reasonable value for it; if not, Python will raise an exception to indicate that the object being constructed cannot be properly initialized. Sometimes it raises an exception explicitly, using an if statement that tests for an illegal value. Sometimes it uses an assert statement for this purpose.

Remember that the form of assert is:

  assert boolean-test, string

which is equivalent to the slightly more verbose

  if not boolean-test:
      raise AssertionError(string)

I suggest that the string argument to AssertionError should always contain 4 parts:

(1) The name of the class (if the problem occurs in the method of a class) or the name of the module (if the problem occurs in a function in a module)

(2) The name of the method/function that detects the problem (here __init__)

(3) A short description of the problem...
(4) ...including the value(s) of the argument(s) causing the problem

For example, if class C included an __init__ method that required x's argument to be a positive integer, we could write __init__ as follows. We typically check all the arguments FIRST in this method, before binding any self/instance names.

  def __init__(self,x):
      assert type(x) is int and x > 0, 'C.__init__: x('+str(x)+') must be an int and positive'
      ...

or

  def __init__(self,x):
      assert type(x) is int and x > 0, 'C.__init__: x({v}) must be an int and positive'.format(v=x)
      ...

or

  def __init__(self,x):
      assert type(x) is int and x > 0, f'C.__init__: x({x}) must be an int and positive'
      ...

We construct the information strings by concatenation, the format method, or by using f-strings. Note that for concatenation we must explicitly convert values to strings; but the format method (like the print function) and f-strings automatically convert their arguments to strings. Some languages define + to also automatically convert its arguments to strings, but Python does not.

With any of these definitions, calling C(-1) would result in the error message

  C.__init__: x(-1) must be an int and positive

and calling C('abc') would result in the error message

  C.__init__: x(abc) must be an int and positive

Such a message provides useful information to whomever is writing/debugging a program using this class. In a well-written program, someone just using the program (possibly not a programmer) should not have to read/interpret such a message.

Question: if we wrote the assert as just x > 0, what exception would be raised by calling C('abc') and why? Why is the given assertion "better", even though it takes more work to check?

We could be even more descriptive and write two different error messages, one for each part of the assertion.
  def __init__(self,x):
      assert type(x) is int, 'C.__init__: x({v}) must be an int'.format(v=x)
      assert x > 0, 'C.__init__: int x({v}) must be positive'.format(v=x)

Note that int refers to the int class (NOT an instance of the int class: e.g., not a number), so writing "type(x) is int" is checking whether the type of the object x refers to is the same as the object int refers to: meaning x is bound to an object that is constructed from the int class.

-----More truth

More truth/details about assert: In fact,

  assert boolean-test, string

is equivalent to

  if __debug__ and not boolean-test:
      raise AssertionError(string)

where __debug__ is a special built-in constant (a name in the builtins module that we cannot assign to). Whenever it is False, the if's entire boolean expression is False (it doesn't even take time to evaluate not boolean-test: we will discuss short-circuit operators soon), so no exception is raised.

Although we cannot directly store into __debug__, we can run Python code "optimized" (for example, with the -O command-line option), which sets __debug__ to False. In this way we can quickly bypass all the assertion checking, which will speed up the code. With this approach asserts are used mostly for internal consistency checking, when the code is being developed...

...but if the running code bypasses such assert statements, and bad things happen, then they can go undetected (although that often leads to other bad things happening, causing Python to raise exceptions that cannot be turned off). So turning off assertions is not recommended unless they were designed to be turned off, and turning them off also accrues a large speed advantage.

This leads to a big discussion of who checks preconditions (the caller of a function/method or the function/method itself) which is a bit beyond the scope of this class, but which I cover in ICS-46 in more detail.

-----End More truth
----------End assert

Once an object is constructed and initialized, typically we use it by calling its methods (or possibly passing it to another function/method that calls its methods).
We call an object's method using the syntax object.method(arguments); recall the Fundamental Equation of Object-Oriented Programming.

  o.m(...) is translated into type(o).m(o,...) if m is not an attribute in o

This means, call the method m in the class specifying the type of object o, and the object o used to call the method is passed to the method as the first argument (matching the self parameter). For example, the method call

  'a;b;c'.split(';')

is translated into the call

  type('a;b;c').split('a;b;c', ';')

which is equivalent to the call

  str.split('a;b;c', ';')

because type('a;b;c') is str: that is, they both refer to the same object: the str class object. To execute the original method call, ultimately Python calls the split method defined in the str class with the arguments 'a;b;c' (matching the self parameter) and ';' (matching the second parameter). It calls this method by its function name: str.split (split in the str class).

Likewise, Python translates accesses to data attributes

  o.a is translated into type(o).a if a is not an attribute in o

Before finishing the discussion of objects and their dictionaries, recall that C refers to a class object. As an object, it also has a __dict__ that stores its namespace. Here is some code that shows what names are defined in the C object.

  for (k,v) in C.__dict__.items():
      print(k,'->',v)

And here is what it prints (the exact names and their order can vary with the Python version; the address of the function object is elided).

  __weakref__ -> <attribute '__weakref__' of 'C' objects>
  __dict__ -> <attribute '__dict__' of 'C' objects>
  __doc__ -> None
  __init__ -> <function C.__init__ at 0x...>
  __qualname__ -> C
  __module__ -> __main__

Note that __init__ is the only function that we defined, and it is there (on the 4th line). The other attributes are defined automatically.

Because I ran this code as a script (the main module), its __module__ variable is bound to '__main__', which you should know something about, because you should have written (and understood) code like

  if __name__ == '__main__':
      ...

If the module defining C were imported instead, the __module__ key would be associated with the name of the file/module the class was written in.
Likewise, the __name__ variable in a module is bound to '__main__' in a started script, but it is bound to the module's name in any module that is imported into another module.

...For example, if module m1.py contained

  print('in m1',__name__)
  if __name__ == '__main__':
      print('m1 is running')

...then running it would produce

  in m1 __main__
  m1 is running

...If in addition module m2.py contained

  import m1
  print('in m2',__name__)
  if __name__ == '__main__':
      print('m2 is running')

...then running it would produce

  in m1 m1
  in m2 __main__
  m2 is running

Note here the code printing 'm1 is running' is not executed. In fact, if we wrote

  import m1
  print('in m2',__name__)
  if __name__ == '__main__':
      import m1
      print('m2 is running')

...then running it would still produce

  in m1 m1
  in m2 __main__
  m2 is running

because if Python has already imported a module, it will not re-execute the module's code when it is imported again: the module and its namespace, initialized the first time it is imported, will remain the same on re-importation (unless the programmer uses importlib.reload(m1), in which case it will re-execute all the code in m1).

So, we can test __name__ so our code will run the script only when its module is run, not when its module is only imported by another module. The if accomplishes this behavior. When I test/grade your code, I import its module, so any code remaining after "if __name__ == '__main__':" is not executed, which is why you can leave code there and it won't interfere with program grading.

If I had defined C with a docstring, that string would appear as the value associated with __doc__ (instead of None). Don't worry about __weakref__, or __qualname__ or __module__.

------------------------------------------------------------------------------

Different kinds of names/attributes: definition and use

Let's discuss four different kinds of names in relation to classes. We will call all these names attributes (and sometimes variables).
(1) local variables: defined and used inside functions/methods to help with the computation; parameter variables are typically considered local variables too. These are created during the execution of the function/method; they disappear when the function/method terminates (unless the function defines and returns a local function object, which can be called later and still refer to the local variables that existed when the inner function was created).

(2) instance attributes/variables of an object: typically defined inside __init__ and used inside class methods (we saw other ways to define them above too). These are referred to as self.name. These exist so long as the object exists.

(3) class attributes/variables: typically defined in classes (at the same level as methods, not inside methods) and typically used in methods; we use a class attribute to store information COMMON to all objects of a class (rather than putting the same instance attributes in each object constructed from the class). Methods are actually class attributes, bound to function objects.

All class attributes are defined in the class object and they are found by the Fundamental Equation of Object-Oriented Programming through instances of that class. That is, if class C defines an attribute a (method or data) and o refers to an object constructed from class C, then o.a will find the value of attribute a in class C, but only if it is not stored directly in o (in o's __dict__). For class attributes, that is typically what we want.

Whenever we write o.a = value, it binds/rebinds attribute a stored in object o to value (whether or not attribute a is stored in object o or the class of object o). We can write C.a = value to bind/rebind attribute a stored in class C.
(4) global variables: typically defined in modules (outside of functions and classes) and used inside functions and/or class methods; we typically avoid using them (don't use global variables), and when we do use them, we do so in a cautious and limited manner.

You should know how to use all these kinds of attributes/variables (and their semantics). Use local variables and instance attributes as needed (most functions/methods have the former, and most classes define the latter in __init__ and use them in their methods). Class attributes are sometimes useful to solve certain kinds of problems where common information/data is shared among all the instances, by storing it just once in their common class object.

For example, if we are defining a Person class,

  class Person:
      fingers = 10

      def __init__(...):
          ... NO setting of self.fingers

Person.fingers has the value 10. If we define bob = Person(...) then bob.fingers (by FEOOP) has the value 10. fingers is not an attribute in the instance object bob refers to, so bob.fingers is translated into type(bob).fingers and then Person.fingers, which is defined here as 10.

Let's assume bob had an accident and lost a finger :( We could write bob.fingers = 9 and then bob's object's __dict__ would now contain the fingers attribute, associated with the value 9. When we specified bob.fingers, Python would find 'fingers' in the __dict__ for bob's object and return the value 9, not ever needing to apply the FEOOP, and therefore NOT returning 10.

If a surgery reattached bob's finger :) so he was back to having 10, we could update the namespace of bob's object in two different ways to reflect this change.

1) We could write: bob.fingers = 10 and then bob's object's __dict__ changes its fingers attribute to now be associated with the value 10. When we specified bob.fingers, Python would still find 'fingers' in the __dict__ for bob's object and return the value 10.
OR

2) We could write: del bob.fingers and then bob's object's __dict__ removes the fingers attribute from its keys. Now, when we specified bob.fingers, Python would once again not find it in bob's object's __dict__, so it would again use the FEOOP to translate bob.fingers into type(bob).fingers and then Person.fingers, which is still defined as 10.

Generally, using class attributes for common information saves space, but it takes a bit longer to find the value associated with the attribute (needs the second FEOOP step). Using object attributes takes extra space, but saves some time. This is a classic time-for-space tradeoff in programming.

Global variables are fine to use in scripts, but are often frowned upon when declared in modules that are imported (although they too have their uses there, but in more advanced settings). Define/use global variables sparingly. Be able to justify their use.

The following script uses each kind of attribute/variable, appropriately named. Ensure that you understand how each use of these works. The use of the command named 'global' (see the two lines with #comments) is explained in more detail below.
  global_var = 0

  class C:
      class_attr = 0

      def __init__(self,init_instance_attr):
          self.instance_attr = init_instance_attr

      def bump(self,name):
          print(name,'bumped')
          #global_var = 100  # comment out this line or the next
          global global_var  # comment out this line or the previous
          global_var += 1
          C.class_attr += 1  # READING self.class_attr finds C.class_attr by FEOOP,
                             #   but we must write C.class_attr to rebind it
          self.instance_attr += 1

      def report(self, attr_name):
          print('instance referred to by', attr_name,
                ': global_var =', global_var,
                '/class_attr =', self.class_attr,  # could write as C.class_attr
                '/instance_attr=', self.instance_attr)

  o=C(10)
  o.report('o')
  o.bump('o')
  o.report('o')
  print()

prints

  instance referred to by o : global_var = 0 /class_attr = 0 /instance_attr= 10
  o bumped
  instance referred to by o : global_var = 1 /class_attr = 1 /instance_attr= 11

  print('x = o')
  x = o
  x.bump('x')
  o.report('o')
  x.report('x')
  print()

prints

  x = o
  x bumped
  instance referred to by o : global_var = 2 /class_attr = 2 /instance_attr= 12
  instance referred to by x : global_var = 2 /class_attr = 2 /instance_attr= 12

  print('x = C(20)')
  x = C(20)
  x.bump('x')
  x.report('x')
  print()

prints

  x = C(20)
  x bumped
  instance referred to by x : global_var = 3 /class_attr = 3 /instance_attr= 21

  C.report(o,'o')        # same as o.report('o') by the Fundamental Equation of OOP
  type(o).report(o,'o')  # ditto: the meaning of the Fundamental Equation of OOP
  print()

prints

  instance referred to by o : global_var = 3 /class_attr = 3 /instance_attr= 12
  instance referred to by o : global_var = 3 /class_attr = 3 /instance_attr= 12

  print(C.class_attr, o.class_attr)  # discussed below
  print(o.instance_attr)

prints

  3 3
  12

So, the global variable is changing every time, as is the class attribute, because there is just one of each. But each object that is an instance of C has its own instance attribute, which changes only when bump is called on that instance.
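To see where each binding actually lives, we can inspect the namespaces directly. The following is a condensed sketch of the script above (no global variable, no printing in bump; these simplifications are mine). It also illustrates one pitfall worth knowing: an augmented assignment through an instance READS the class attribute but STORES into the instance.

```python
class C:
    class_attr = 0                  # one binding, stored in the class object

    def __init__(self, init_instance_attr):
        self.instance_attr = init_instance_attr   # one binding per instance

    def bump(self):
        C.class_attr += 1           # rebinds the single class attribute
        self.instance_attr += 1     # rebinds this instance's own attribute

o = C(10)
x = C(20)
o.bump()
o.bump()
x.bump()

print(C.class_attr)                       # 3
print(o.instance_attr, x.instance_attr)   # 12 21
print('class_attr' in vars(o))            # False: not stored in the instance
print('class_attr' in vars(C))            # True:  stored in the class

# Pitfall: this reads C.class_attr (3) and then binds o.class_attr to 4,
# creating a new instance attribute; C.class_attr itself is unchanged.
o.class_attr += 1
print(C.class_attr, o.class_attr, x.class_attr)   # 3 4 3
```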
If we instead commented as follows

          global_var = 100   # comment out this line or the next
          #global global_var # comment out this line or the previous

running the script would have the result shown at the end of this discussion.

By removing the statement global global_var, the statement global_var = 100 actually defines a LOCAL variable in the bump method (despite its name), so its increment does not affect the true global_var, which stays at zero. Recall, if a variable defined in a function/method has NOT been declared global, then it is created as a local variable inside the function/method.

Note that one can REFER to the value of global_var inside methods of class C WITHOUT a global declaration (see the report method), but if a method wants to CHANGE global_var it must declare it global (then all references and changes are to the real global variable). With no global global_var, the assignment global_var = 100 creates a new name local to the bump method and always binds it to 100.

  instance referred to by o : global_var = 0 /class_attr = 0 /instance_attr= 10
  o bumped
  instance referred to by o : global_var = 0 /class_attr = 1 /instance_attr= 11

  x = o
  x bumped
  instance referred to by o : global_var = 0 /class_attr = 2 /instance_attr= 12
  instance referred to by x : global_var = 0 /class_attr = 2 /instance_attr= 12

  x = C(20)
  x bumped
  instance referred to by x : global_var = 0 /class_attr = 3 /instance_attr= 21

  instance referred to by o : global_var = 0 /class_attr = 3 /instance_attr= 12
  instance referred to by o : global_var = 0 /class_attr = 3 /instance_attr= 12

  3 3
  12

Finally, it is clear what C.class_attr and x.instance_attr refer to, but what about x.class_attr? It evaluates to 3, just as C.class_attr does (and just as o.class_attr did when printed above). This meaning is a result of the Fundamental Equation of Object-Oriented Programming (but applied to variable attributes, not method attributes).
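The lookup just described can be sketched as an ordinary function. Note that find_attribute is a hypothetical helper written only for illustration: Python's real attribute lookup also handles inheritance, method binding, and other special cases that this sketch ignores. The Person class follows the fingers example from earlier; its name parameter is my own addition so that the class can be constructed.

```python
def find_attribute(o, name):
    """Hypothetical helper: mimic the two-step lookup for data attributes."""
    if name in o.__dict__:                # step 1: the instance's namespace
        return o.__dict__[name]
    if name in type(o).__dict__:          # step 2: the class's namespace
        return type(o).__dict__[name]
    raise AttributeError(
        f'{type(o).__name__!r} object has no attribute {name!r}')

class Person:
    fingers = 10                          # class attribute

    def __init__(self, name):
        self.name = name                  # instance attribute

bob = Person('bob')
print(find_attribute(bob, 'fingers'))                  # 10 (found in the class)

bob.fingers = 9                                        # now stored in bob's __dict__
print(find_attribute(bob, 'fingers'), Person.fingers)  # 9 10

del bob.fingers                                        # removed from bob's __dict__
print(find_attribute(bob, 'fingers'))                  # 10 (found in the class again)
```

For data attributes like these, find_attribute(bob, 'fingers') and bob.fingers agree at every step.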
-----IMPORTANT

Technically, when specifying o.attr (any access to an attribute name in object o, whether a variable or method), Python first tries to find attr in the object o; if it is not defined in o's namespace/__dict__ Python uses the FEOOP to try to find it by checking type(o).attr, which attempts to find attr in type(o)'s namespace/__dict__.

So when trying to find x.class_attr it fails to find class_attr in x's namespace/__dict__, so it tries type(x).class_attr, which is C.class_attr, and finds the class_attr attribute in C's namespace/__dict__.

When we study inheritance, we will learn more about how Python searches for all attributes by extending/generalizing this rule: if it is not in an instance, then it tries in the class that instance was constructed from, and if not in that class, it tries in its base/super classes, and if not in its base/super classes....

But for now remember: to find an attribute for an object: o.a

1) First look in the namespace of the object (o.__dict__)

2) If it isn't there, then use FEOOP to look in the namespace of the class that the object was constructed from (type(o).__dict__)

3) Remember if we are calling a method attribute, FEOOP translates the call o.m(...) into the call type(o).m(o,...)

-----

------------------------------------------------------------------------------

Strange Python (but now understandable):

1) Defining/Redefining/Using a method for a class, AFTER the class has been declared:
2) Defining a method for an instance (but not the whole class) after the instance has been constructed:

1) Defining/Redefining/Using a method for a class, AFTER the class has been declared:

We will now discuss one more interesting capability that a dynamic language like Python has (but languages like Java and C++ do not). We can change the meaning of a class as a program is running. Let's go back to a very simple class C, that stores one instance attribute, but has no methods that change it.
The report method prints the value of this instance attribute.

  class C:
      def __init__(self, init_instance_attr):
          self.instance_attr = init_instance_attr

      def report(self, attr_name):
          print('instance referred to by', attr_name,
                '/instance_attr=', self.instance_attr)

Now look at the following code. It defines o to refer to an object constructed from the class C, which defines only a report method (and then it calls that method to report). Next it defines the bump function with a first parameter named self: its body increments the instance_attr in self's namespace dictionary. We call bump with o and it updates o's instance_attr (as seen by the report).

  o = C(10)
  o.report('o')  # By FEOOP, exactly the same as calling C.report(o,'o')
  print()

prints

  instance referred to by o /instance_attr= 10

  # Define function bump
  # Strange, because this function uses "self"
  # But we can use any names for parameters
  def bump(self,name):
      print(name,'bumped')
      self.instance_attr += 1

  bump(o,'o')    # Call function bump
  o.report('o')  # By FEOOP, exactly the same as calling C.report(o,'o')
  print()

prints

  o bumped
  instance referred to by o /instance_attr= 11

So, here we call the bump function, passing o as the argument matching self.

Then we do something strange. We add the bump function into the namespace of C's class object with the name cbump, by writing C.cbump = bump. Conceptually this stores bump under the key 'cbump' in C's namespace, putting the cbump method in the namespace of the class. (Unlike an instance's __dict__, a class object's __dict__ is a read-only view, so we cannot literally write C.__dict__['cbump'] = bump; we write C.cbump = bump or setattr(C, 'cbump', bump) instead.) We could have called it just bump, but instead we use a slightly different name.

Then, we can call o.cbump('o') which by the Fundamental Equation of OOP is the same as calling C.cbump(o,'o') and because we just made the cbump attribute of the object representing class C refer to the bump function, its meaning is to call the same function object that bump refers to.
  C.cbump = bump  # put bump in the namespace of C, with the name cbump
  o.cbump('o')    # By FEOOP, exactly the same as calling C.cbump(o,'o')
  o.report('o')
  print()

prints

  o bumped
  instance referred to by o /instance_attr= 12

  x = C(20)
  x.cbump('x')
  x.report('x')

prints

  x bumped
  instance referred to by x /instance_attr= 21

So, even AFTER THE CLASS C HAS BEEN DEFINED, we can still ADD A METHOD TO ITS NAMESPACE and then can CALL IT using any object that has already been (or will be) constructed from the class C. That is, we can CHANGE THE MEANING of a class C dynamically (AFTER THE PROGRAM IS RUNNING), and all the objects constructed from the class will respond to the change through the FEOOP. Likewise, we can change the meaning of a method in a class by rebinding the class attribute.

Recall that the del command gets rid of an association in a dict. If we wrote del C.cbump, then if we tried to call that method by o.cbump('o') the result would be that Python raises

  AttributeError: 'C' object has no attribute 'cbump'

So we can both ADD and REMOVE names from a class object's namespace!

Note that we could have defined/written bump as follows (note the use of p, not self, as the first parameter of bump: any name will do)

  def bump(p,name):
      print(name,'bumped')
      p.instance_attr += 1

and nothing changes, so long as every occurrence of self is changed to p inside the bump function (as is done). We could likewise write

  def report(p, var): ...

inside the C class itself. The parameter name "self" is just the standard name used in Python code; but there is nothing magical about this name, and we can substitute whatever name we want for the first parameter. This parameter will be matched with the object used to call the method.

Although any name is allowed, I strongly recommend always following Python's convention and using the name self. Eclipse always supplies the name self automatically as the first parameter to any methods we define.
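The whole add/call/remove cycle described above fits in one short, self-contained script. This is a sketch that repeats a minimal class C (without report) so it can run on its own; the variable message is just a name I introduce to hold the text of the exception.

```python
class C:
    def __init__(self, init_instance_attr):
        self.instance_attr = init_instance_attr

def bump(p, name):               # first parameter deliberately NOT named self
    print(name, 'bumped')
    p.instance_attr += 1

o = C(10)                        # constructed BEFORE the method is added

C.cbump = bump                   # add bump to C's namespace under the name cbump
o.cbump('o')                     # by FEOOP, the same as the next line
C.cbump(o, 'o')
print(o.instance_attr)           # 12
print('cbump' in C.__dict__)     # True

del C.cbump                      # remove the name from C's namespace
try:
    o.cbump('o')
except AttributeError as e:
    message = str(e)
    print(message)               # reports that a 'C' object has no attribute 'cbump'
```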
------------------------------

2) Defining a method for an instance (but not the whole class) after the
instance has been constructed:

In fact, Python also allows us to add a reference to the bump method to a
single instance of an object constructed from class C, not the class itself.
Therefore unlike the example above, bump is callable only on the one object
that it was added to, not on all the other instances of that class. We can also
use this technique to add a method to an object that is different than the one
defined in the object's class.

Start with the class C as defined above, defining just __init__ and report.
Then

def bump(self, name):
    print(name,'bumped')
    self.instance_attr += 1

o = C(0)
x = C(100)

o.bump = bump

o.bump(o,'o')  # o.bump directly calls bump method defined above
o.bump(o,'o')  #   (not using FEOOP) because that attribute is stored for o
o.report('o')

x.bump(x,'x')  # fails because there is no bump attribute in the object x
x.report('x')  #   refers to, either directly in x or in x's class C

Note that calling o.bump(..) finds the bump method in o's namespace, without
needing to translate it using the Fundamental Equation of Object-Oriented
Programming. Without the translation, we explicitly need to pass the o argument
which becomes bump's self parameter.

Generally when looking up the attribute o.attr (whether attr is a data or
method name), Python first looks in the namespace of the object o refers to,
and if it doesn't find attr, it next uses FEOOP and looks in the namespace of
the object type(o) refers to.
When run, this script produces:

o bumped
o bumped
instance referred to by o /instance_attr= 2
Traceback (most recent call last):
  File "C:\Users\pattis\Desktop\python32\test\script.py", line 21, in <module>
    x.bump(x,'x')  # fails because there is no bump attribute in the object x
AttributeError: 'C' object has no attribute 'bump'

The error message mentions C because after failing to find the bump attribute
in the object x refers to, Python looks in the object C refers to; when it
fails there too, the raised exception reports the error.

So, we can add methods to

(a) classes: methods that all the instances of that class can use, via FEOOP

(b) an instance of the class, such that only that instance can call the method
    (and its self parameter must also be passed as an explicit argument),
    since FEOOP is not used.

------------------------------

Combining both worlds using delegation

Suppose that we wanted to be able to call methods attached to objects, but do
so using the standard object.method(...) syntax, as we did in the first part of
this section. In the second part of this section, we had to duplicate the
object, e.g., calling o.bump(o,'o') instead of just o.bump('o'). We will see
how to return to this simpler behavior using the concept of delegation in
Python.

We start by defining C as follows, adding the bump method below.

class C:
    def __init__(self,init_instance_attr):
        self.instance_attr = init_instance_attr

    def report(self, attr):
        print('instance referred to by', attr, '/instance_attr=', self.instance_attr)

    def bump(self,name):
        try:
            self.object_bump(self,name)
        except AttributeError:
            print('could not bump',name)  # or just pass to handle the exception;
                                          # or omit try/except altogether

In this definition of C, there is a bump method defined in the class C, for all
instances constructed from this class to execute, findable by FEOOP.
When bump is called here, it tries to call a method named object_bump on the
instance it was supplied, passing the object itself to object_bump (doing
explicitly what FEOOP does implicitly/automatically). If that instance defines
an object_bump function, it is executed; if not, Python raises an
AttributeError exception, which at present is handled by printing a message,
but if the print were replaced by pass the call would just silently fail. Of
course, we could also remove the entire try/except so an attribute failure
would raise an exception and stop execution.

Note that in the call o.bump('o') Python uses the Fundamental Equation of OOP
to translate this call into C.bump(o,'o'), which calls the equivalent of
o.object_bump(o,'o'). In the world of programming, this is called delegation
(which we will see more of): the bump method delegates to the object_bump
method (if present) to get its work done.

Here is a script, using this class. It attaches different object_bump methods
to the instance o refers to, and the instance x refers to, but not to the
instance z refers to (nor to the class C). It calls this object_bump method not
directly, but through delegation by calling bump in the C class.
o = C(10)
x = C(20)
z = C(30)

def bump1(self, name):
    print('bump1',name)
    self.instance_attr += 1

def bump2(self, name):
    print('bump2',name)
    self.instance_attr += 2

o.object_bump = bump1
x.object_bump = bump2
# No binding of z.object_bump

o.report('o')
o.bump('o')
o.report('o')
print()

prints

instance referred to by o /instance_attr= 10
bump1 o
instance referred to by o /instance_attr= 11

x.report('x')
x.bump('x')
x.report('x')
print()

prints

instance referred to by x /instance_attr= 20
bump2 x
instance referred to by x /instance_attr= 22

z.report('z')
z.bump('z')
z.report('z')

prints

instance referred to by z /instance_attr= 30
could not bump z
instance referred to by z /instance_attr= 30

------------------------------------------------------------------------------

Redefinition of Function Names (or anything else, really)

Note that we can redefine a function or class. For example, we can write

def f(): return 0
def f(): return 1
print(f())

Calling f() would return 1. Eclipse gets upset about this, and marks the second
definition as an error (duplicate signature), but there is nothing technically
wrong with this code (although the first definition is useless, and there may
be a mistake in the spelling of one of these functions). Python will run the
script.

We can also write the following script, which Eclipse won't complain about, and
runs the same.

def f(): return 0
def g(): return 1
f = g
print(f())

Calling f() returns 1. Again, def just makes a name refer to a function object;
if, as in the case of the two definitions of the f name above, the name already
refers to an object, the binding of f is just changed to refer to the function
object g refers to. Conceptually, it is no different than writing x = 1 and
then x = 2 (changing what x refers to from the int object 1 to the int object
2). We can do the same thing for classes, as we saw with the names C and D in
the first example in these notes.
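The following sketch suggests one way to see that a second def only rebinds the
name and does not destroy the first function object. The name old_f is
hypothetical, introduced here just to keep a reference to the original
function.

```python
# A minimal sketch; old_f is a hypothetical extra name for illustration.
def f():
    return 0

old_f = f            # a second name bound to the same function object

def f():             # rebinds the name f to a new function object
    return 1

results = (f(), old_f())   # f now refers to the new function; old_f to the first
same_object = f is old_f   # the two names now refer to different objects
```

The first function object survives only because old_f still refers to it;
without that extra name it would become unreachable after the second def.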
In summary, def f and class C each define a name and bind it to a
function/class object. We can call the function or the class's constructor. We
can rebind that name later to any other object. We can even write

def f(): return 0
f = 0

Now f is bound to an int instance object, not a function object.

------------------------------------------------------------------------------

Accessor/Mutator Methods (or query/command methods) and Instance Attributes
...single/double underscore prefix

If calling o.method(...) returns information about o's state but does not
change o's state, it is called an ACCESSOR (or QUERY). If o.method(...) changes
o's state and returns None (all functions/methods must return some value), it
is called a MUTATOR (or COMMAND). Some method calls do both: they change o's
state but also return a non-None value.

A design question arises. Suppose that we know that an object o of a class C
has an instance attribute named ia: should we directly refer to o.ia? Should we
use its value by writing o.ia and change it by writing o.ia = ....?

The high road says, no: a class should hide the actual instance attributes from
the clients/users of the class; it should provide query and command methods to
manipulate the objects under control of the class. What instance attributes we
need to implement a class might change over time, but the methods that define
the behavior of objects created from that class should stay the same and always
work correctly with whatever instance attributes we are using.

Python is a bit at odds with this philosophy. Some languages (Java/C++) have a
mechanism whereby instance attributes can be tagged PRIVATE, so that they are
accessible only from methods defined in the class itself; these languages do
not allow new instance attributes to be created after the class is defined, nor
do they allow new methods to be added dynamically to objects (as we showed
above). Python allows both.
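To make the accessor/mutator distinction concrete, here is a minimal sketch.
The class Account and its method names are hypothetical, invented for this
illustration; they do not appear elsewhere in these notes.

```python
# A minimal sketch; Account and its methods are hypothetical names.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def get_balance(self):        # ACCESSOR/QUERY: reports state, changes nothing
        return self.balance

    def deposit(self, amount):    # MUTATOR/COMMAND: changes state, returns None
        self.balance += amount

    def withdraw_all(self):       # does both: changes state and returns a value
        amount = self.balance
        self.balance = 0
        return amount

a = Account(10)
before    = a.get_balance()       # state is unchanged by this call
result    = a.deposit(5)          # None, because deposit has no return statement
after     = a.get_balance()
withdrawn = a.withdraw_all()
empty     = a.get_balance()
```

A client that uses only these three methods keeps working even if the class
later stores its balance in a differently named instance attribute.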
Python's philosophy is a bit more open to accessing instance attributes outside
of the class methods. But there is danger in doing so, and beginners often use
this convenience and end up taking longer to get their code to work correctly,
and also make it harder to understand and change the code, because they are
accessing information that is not guaranteed to be there in future changes to
the code.

In fact, Python does have a weaker form of tagging PRIVATE names, which we can
use for names that should not be referred to outside the methods in the
object's class. Below we explain the meaning of instance attribute names that
begin with one or two underscores (but don't have two trailing underscores, so
are unlike __init__).

-----
Single Underscore Prefix:

When a programmer uses a single underscore to prefix a name in a class (for
data or a method), the programmer is indicating that the name should NOT BE
ACCESSED outside of the methods in the class. But, there is nothing in the
Python interpreter that stops anyone from accessing that name. We can use this
convention for private data and helper methods.

class C:
    def __init__(self):
        self._ia = 1

    def _f(self):
        return self._ia == 1

o = C()
print(o._ia, o._f())

When run, this script produces:

1 True

Still, if a class is written with names prefixed by a single underscore, it
indicates that code using objects constructed from that class should NOT access
those names outside of the methods defined inside the class. This is a message
from the programmer who wrote the class to a programmer who is using the class:
do not access these attributes directly. They might disappear, or their names
might be changed in a later version of that class, or they might store/do
something different.
-----
Double Underscore Prefix:

If a Python name in a class begins with two underscores, it can be referred to
by that name inside the class, but not easily outside the class. It can still
be referred to outside the class, but only with a "mangled" name that includes
the name of the class. If a class C defines a name __ia then outside the class
that name can be referred to as _C__ia.

So, if we changed the code in the class C above by writing _ia as __ia and _f
as __f, and tried to execute

o = C()
print(o.__ia, o.__f())

Python would complain by raising an AttributeError exception for the first
value in the print call

AttributeError: 'C' object has no attribute '__ia'

If o.__ia was removed, Python would complain about o.__f(), indicating the
object has no attribute '__f'.

But given class C defines __ia and __f, we could execute the following code

o = C()
print(o._C__ia, o._C__f())

by writing the mangled names explicitly. When we run this script it again
produces:

1 True

In fact, if we printed the dictionary for o, it would show '_C__ia' as a key,
which is the true name of this attribute outside of the class.

print(o.__dict__)

would print: {'_C__ia': 1}

So, Python does contain two conventions for hiding names: the first (one
underscore prefix) is purely suggestive; the second (two underscore prefix)
actually makes it harder to refer to such names outside of a class. But neither
truly prohibits accessing the information by referring to the name.

When we discuss operator overloading (later this week) and inheritance (later
in the quarter) we will learn more about controlling access to names defined in
objects and classes.

------------------------------------------------------------------------------

Defining classes in unconventional places:

We normally define classes in modules, and often a module does nothing but
define just one class (although some define multiple, related classes). Other
modules define lots of functions.
All such modules are called library modules (not scripts) because they
typically don't run code themselves, but we import them to gain access to the
names (of classes and functions) that they define. (If they do run code, it is
inside the code if __name__ == '__main__': and often the code there allows us
to test the class or module, not use it to solve any interesting problem).

We have seen that we can define local functions inside functions (and sometimes
even return references to these locally defined functions). We can also declare
a class inside a function and call the function to return an object constructed
from the class (and even use the returned object if we know its defined
instance attributes).

def f(x):
    class C:
        def __init__(self,x): self.val = x
        def double (self)   : return 2*self.val
    return C(x)

o = f(1)  # o refers to an object constructed from class C
print(o, o.val, o.double())

When run, this script prints the following.

<__main__.f.<locals>.C object at 0x02829170> 1 2

The first value indicates (reading the first word after the left angle-bracket,
from back to front) that class C is defined local to function f, which is in
the script we ran (named by Python to be __main__); o.val refers to the
instance attribute inside o's namespace; o.double() calls the double function
(found by FEOOP) which returns twice o's val attribute.

We can also declare a class inside a class, and call some method in the class
to return an object constructed from the inner class (and even use the object
if we know its defined instance attributes).
class C:
    def __init__(self,x,y):
        self.x = x
        self.y = y

    class Cinner:
        def __init__(self,x): self.val = x
        def double (self)   : return 2*self.val

    def identity(self)   : return (self.x, self.y)
    def x_construct(self): return C.Cinner(self.x)
    def y_construct(self): return C.Cinner(self.y)

o = C(1,2)
a = o.x_construct()
b = o.y_construct()
print( o, a, b,sep='\n')
print(o.identity(), a.double(), b.double())

When run this script prints:

<__main__.C object at 0x02A36310>
<__main__.C.Cinner object at 0x02A36290>
<__main__.C.Cinner object at 0x02A36E70>
(1, 2) 2 4

o.x_construct() returns a reference to an instance of class C.Cinner, which was
initialized with the value of o's x attribute (set in C's __init__ method).
When we call a.double(), the double method defined in Cinner returns twice the
value it was initialized with.

In fact, instead of the x_construct and y_construct functions, we could define
a more general construct function that takes either 'x' or 'y' as an argument
and constructs an object for self.x or self.y. The first way to do this uses
the parameter (matching 'x' or 'y') as a key to access __dict__

    def construct(self,which):
        return C.Cinner(self.__dict__[which])

So, o.construct('x') returns C.Cinner(self.__dict__['x']) = C.Cinner(1)

The second way to do this uses the eval function, whose argument is either
'self.x' or 'self.y' depending on the value of which.

    def construct(self,which):
        return C.Cinner(eval('self.'+which))

So, o.construct('x') returns C.Cinner(eval('self.'+'x')) =
C.Cinner(eval('self.x')) = C.Cinner(self.x) = C.Cinner(1)

------------------------------------------------------------------------------

Defining/Using Static Methods in Classes

A method defined in a class is considered "static" if it does not have a self
parameter. Sometimes it is useful to write methods in a class that have this
property. For example, suppose that we wanted to declare a Point2d class for
storing the x,y coordinates for Points in 2-d space.
The __init__ method would take two such arguments. But suppose that we also
wanted to create Point2d objects by specifying polar coordinates (i.e., using a
distance and angle in radians). We could write such a class (with this static
method) as follows.

import math

class Point2d:
    def __init__(self,x,y):
        self.x = x
        self.y = y

    @staticmethod
    def from_polar(dist,angle):
        return Point2d( dist*math.cos(angle), dist*math.sin(angle) )

    ...more code

This method is preceded by @staticmethod (a decorator: we will discuss
decorators generally later). This method is meant to be called from outside the
class, to create Point2d objects from polar coordinates. We can write calls
like

a = Point2d(0., 1.)
b = Point2d.from_polar(1.0, math.pi/4)

Notice that we call Point2d.from_polar outside of the class by using the class
name Point2d and the static method name from_polar defined in that class (which
has no self parameter). Of course, the static method name is just an attribute
of the Point2d class.

Likewise, suppose that we wanted to write a helper function for computing the
distance between two Point2d objects as a static method in this class. We could
write it as

    @staticmethod
    def _distance(x1,y1,x2,y2):
        return math.sqrt( (x1-x2)**2 + (y1-y2)**2 )

    def dist(self,another):
        return Point2d._distance(self.x, self.y, another.x, another.y)
        # or as return self._distance(self.x, self.y, another.x, another.y)
        # or as return another._distance(self.x, self.y, another.x, another.y)

Here this helper function is meant to be called only by the dist method in this
class, so we write its name with a leading underscore. Note that again we call
it using Point2d. Because of FEOOP, we could also call this helper as
self._distance(self.x, self.y, another.x, another.y) or
another._distance(self.x, self.y, another.x, another.y) because type(self) is
Point2d and type(another) should also be Point2d.
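As a rough check, the fragments above can be assembled into one runnable sketch
(leaving out the "...more code" portion, whose contents are not shown in these
notes). The variable names via_class, via_self, and via_another are
hypothetical, added only to compare the three ways of calling the helper.

```python
import math

# A sketch assembled from the fragments above; the elided code is omitted.
class Point2d:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @staticmethod
    def from_polar(dist, angle):
        return Point2d(dist*math.cos(angle), dist*math.sin(angle))

    @staticmethod
    def _distance(x1, y1, x2, y2):
        return math.sqrt((x1-x2)**2 + (y1-y2)**2)

    def dist(self, another):
        return Point2d._distance(self.x, self.y, another.x, another.y)

a = Point2d(0., 0.)
b = Point2d(3., 4.)

# Three ways to reach the same static method; none of them passes a self argument.
via_class   = Point2d._distance(a.x, a.y, b.x, b.y)
via_self    = a._distance(a.x, a.y, b.x, b.y)
via_another = b._distance(a.x, a.y, b.x, b.y)

p = Point2d.from_polar(1.0, math.pi/2)   # a point at distance 1 along the y axis
```

All three calls receive exactly the four arguments written in the parentheses,
whether the helper is reached through the class or through an instance.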
Because _distance is decorated with @staticmethod, Python translates the call
slightly DIFFERENTLY when using FEOOP: WITHOUT putting self as the first
argument: it just passes the 4 arguments given. So, we can call such a static
method either by using the name of its class or by using an object of that
class that already exists.

Finally, we could also write this helper function as a global function defined
outside Point2d, in the module that Point2d is defined in. But it is better to
minimize any kinds of global names; so, it is better to define this name inside
the class. In this way it won't conflict with any other name the importing
module has defined.

------------------------------------------------------------------------------

Look at my Dice class (in dice.py in the courselib). It has many interesting
simple class features; it doesn't use many of the features discussed here.

Later during this week, we will learn how to use methods to overload operators
that allow us to take advantage of Python's syntactic features. We will learn
about many other special functions named like __init__ with double underscores
front and back.

------------------------------------------------------------------------------

Problems:

0) What does the following script print?

class C:
    def __init__(self):
        print('C object created')

D = C

def C():
    print('C function called')

x = C()
y = D()

1) What does the following script print?

class Person:
    fingers = 10

    def __init__(self, name, fingers):
        self.name = name
        if fingers != self.fingers:
            self.fingers = fingers

bobby = Person('Bobby', 9)
carol = Person('Carol', 10)
print(bobby.__dict__, bobby.fingers)
print(carol.__dict__, carol.fingers)

Will it print differently if we write the if as fingers != Person.fingers: ?

2) What does the following script print? Draw a picture of the object x refers
to using the graphical form for representing objects we learned during Week #1.
class C:
    def __init__(self,a):
        self.a = a

x = C(1)
C.__init__(x,2)
print(x.a)

3) Write a class C whose __init__ method has a self parameter, followed by low
and high. __init__ should store these values in self's dictionary using the
same names, but do so only if low is strictly less than high (otherwise it
should raise the AssertionError exception with an appropriate string).

4) Explain why each of the following code fragments does what it does: two
execute (printing different results) and two raise an exception.

g = 0 g = 0 g = 0 g = 0 def f(): def f(): def f(): def f(): print(g) global g print(g) print(g) print(g) g += 1 global g g += 1 g += 1 f() f() f() g += 1 g += 1 g += 1 f() f() f()

5) Write a class C that uses a class attribute to keep track of how many
objects are created from C (remember that each object creation calls __init__).
That is, for a class C

a = C(...)
b = C(...)
print(C.instance_count)    prints 2
c = C(...)
print(C.instance_count)    prints 3

6) What would the following script print; explain why. Also, explain why the
call self.object_bump(name) in bump is not self.object_bump(self,name) as it
was in the notes.

class C:
    def __init__(self,init_instance_attr):
        self.instance_attr = init_instance_attr

    def report(self,attr):
        print('instance referred to by', attr, '/instance_attr=', self.instance_attr)

    def bump(self,name):
        try:
            self.object_bump(name)
        except AttributeError:
            print('could not bump',name)  # or just pass to handle the exception

x = C(10)
y = C(20)

def bump(self,name):
    print('bumped',name)
    self.instance_attr += 1

C.object_bump = bump

x.report('x')
x.bump('x')
x.report('x')

y.report('y')
y.bump('y')
y.report('y')

7) The function print is bound to a function object. What is printed by the
following script (and why)?

print = 1
print(print)