Advanced Classes and Inheritance Topics

In this lecture we will examine some advanced topics in classes and inheritance. We will discuss three main topics.

(1) Abstract Base Classes (ABCs) allow us to specify the protocol of a class (the methods that its objects can call), without fully defining all these methods. Using ABCs, we can ensure derived classes eventually define all these methods. ABCs are also useful for describing the protocols of standard Python classes.

(2) Properties extend our understanding of how to define/use data attributes. We will study (a) the property class, (b) how to use it to write read-only attributes and attributes that check some constraint for any value assigned to them, and (c) how to make properties easier to use with a property factory.

(3) Descriptors are a lower-level Python feature that is used to implement properties. We will explore how to write/use descriptor classes in an inheritance hierarchy (closing this chapter's loop, by using an ABC) to create general and powerful attribute properties, and learn how Python uses descriptors for method attributes (returning bound functions).

These are new topics to ICS-33. I'd say a general understanding of the concepts related to (1) and (2) is sufficient.

--------------------------------------------------------------------------------

Abstract Base Classes:

An Abstract Base Class (ABC) is a special kind of class. Being called a Base class signifies that we expect to derive other classes from it; in fact, its ONLY use is to be derived from: it cannot be instantiated. Being called an Abstract class means that it is a class that

(1) ...is directly derived from abc.ABC (the ABC class in the abc module) or inherits from abc.ABC via a chain of multiple derived classes.
(2) ...DEFINES (using the @abc.abstractmethod decorator) or INHERITS one or more abstract methods, each of which must eventually be overridden by a concrete method in a derived class; each abstract method is a "placeholder" for a concrete method that must be defined "for real" in a derived class: a concrete method typically accesses/binds some data attributes defined in a derived class.

(3) ...cannot be instantiated - a direct result of (2) because if we could construct such an object, and call any of the abstract methods, they would produce meaningless results.

After one or more derivations from an ABC, a derived class should define concrete methods that override all the abstract methods it inherits; then Python can construct objects from that derived class (which is called a concrete class) and get meaningful results from calling all its concrete methods.

Together, all the abstract methods (each to be made concrete in some derived class) specify a minimal "protocol" for objects in their base class: a subset of method names that objects (constructed from all derived classes) must be able to call.

Here is a simple but instructive example using an abstract shape class along with its derived concrete square and circle classes. We define Shape as an ABC: all shapes should be able to compute their perimeter and area (and whether or not they are one-dimensional: neither squares nor circles are). But we need to write concrete classes derived from Shape (specifying what kind of shape) to be able to compute this information concretely.

  import math  # use pi
  import abc   # use the class ABC and the function abstractmethod (a decorator)

  # Shape is
  class Shape(abc.ABC):
      @abc.abstractmethod
      def perimeter(self):
          pass                 # will be overridden in some derived class

      @abc.abstractmethod
      def area(self):
          pass                 # will be overridden in some derived class

      def one_dimensional(self):  # concrete: works by calling an abstract method
          return self.area() == 0

  # Concrete classes derived from Shape; besides an __init__ specifying their
  #   attributes, they specify concrete perimeter/area methods, overriding the
  #   abstract methods defined in (and inherited from) the Shape ABC
  # We will learn later in these notes how to easily ensure side and radius are >0
  class Square(Shape):
      def __init__(self,side)  : self.side = side
      def __repr__(self)       : return f'Square(side={self.side})'
      def perimeter(self)      : return 4*self.side
      def area(self)           : return self.side**2

  class Circle(Shape):
      def __init__(self,radius): self.radius = radius
      def __repr__(self)       : return f'Circle(radius={self.radius})'
      def perimeter(self)      : return math.pi * 2*self.radius
      def area(self)           : return math.pi * self.radius**2

Note that the Shape class is an ABC because it is derived from abc.ABC and defines the abstract methods perimeter and area. (I could have also defined an abstract __repr__ method in Shape, if I wanted to mandate that all concrete derived classes concretely define it.) Therefore, Shape cannot be instantiated: attempting to construct an object from it by s1 = Shape() produces the error

  TypeError: Can't instantiate abstract class Shape with abstract methods area, perimeter

Also note that an ABC can define concrete methods too. Shape defines the method one_dimensional, which is not decorated as being abstract. It is concrete, but to compute its result, it calls the area method (which is abstract in Shape but will be made concrete in Square and Circle).
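
As a quick check of the two claims above, here is a minimal sketch using a trimmed-down copy of Shape and Square. The names error_message and s1 are illustrative additions, not part of the notes. The exact wording of the TypeError message differs across Python versions, so the sketch stores the message instead of comparing it against a fixed string.

```python
import abc

class Shape(abc.ABC):
    @abc.abstractmethod
    def perimeter(self):
        pass

    @abc.abstractmethod
    def area(self):
        pass

    def one_dimensional(self):       # concrete: calls the (eventually concrete) area
        return self.area() == 0

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def perimeter(self):
        return 4*self.side

    def area(self):
        return self.side**2

try:
    Shape()                          # Shape is abstract: construction is refused
    error_message = None
except TypeError as e:
    error_message = str(e)           # the message names area and perimeter

s1 = Square(1)                       # Square is concrete: construction succeeds
print(error_message)
print(s1.perimeter(), s1.area(), s1.one_dimensional())
```

The last line should show that one_dimensional, although written in the abstract class, works once area has been made concrete in Square.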

Note that all one-dimensional objects have area = 0 (and the area method will eventually be defined for any concrete class). It makes no sense for Python to compute whether a Shape object (we cannot even construct one!) is one-dimensional, because to do so it would have to compute the area of a Shape: but area is an abstract method that just returns None. So, Shapes are too abstract for that computation. But if a concrete derived class overrides the abstract area method with a concrete one (as Square does), then calling one_dimensional makes perfect sense on Square objects: it returns False for all squares whose sides are > 0.

The concrete Square and Circle classes are each derived from Shape and each defines two concrete methods, perimeter and area, which override the two inherited abstract methods that are defined in Shape. So, we CAN construct objects from these two classes, because they are concrete. Each class inherits the one_dimensional method defined in Shape, which now can execute by calling the concrete area method defined in each class. If we wrote s1 = Square(1) then we could compute s1.perimeter(), s1.area() and s1.one_dimensional().

Note that these two classes can define any other methods, dunder or regularly-named. In the example above, I defined __init__ in each to initialize its attributes for its special kind of Shape (side vs radius), and __repr__ to return the string representation of each.

So, ABCs solve a certain problem for writing concrete classes. By deriving the concrete classes from ABCs, we guarantee that certain methods are callable on such objects: we cannot even construct objects unless the concrete classes define all the necessary concrete methods (of course we still might define them incorrectly, requiring debugging). So we are guaranteed that when the objects can be created, all the methods run on them will be concrete.
If we accidentally don't override all the abstract methods (maybe we misspell one) Python catches our error the first time we try to construct an object from the class. Note the TypeError message above specifies which methods are still abstract.

Let's extend this example to include other concrete shapes: triangles, lines, and arcs. But instead of just writing 5 classes derived from Shape (each defining concrete perimeter and area methods), we add a bit more structure to the inheritance hierarchy: we will define the Shape1d class derived from Shape; then we will derive the Line and Arc classes from Shape1d, as illustrated below, leveraging off additional information present in the Shape1d class.

                     object
                       |
                    abc.ABC
                       |
                     Shape
            /      /       \        \
       Square  Circle   Triangle   Shape1d
                                    /   \
                                 Line   Arc

The Shape1d class defines one concrete method, area, which overrides the abstract method that it inherits

  def area(self) : return 0

All one-dimensional shapes compute 0 for their area. By defining Shape1d this way, Shape1d is still abstract: it inherits one abstract method (perimeter) that it does not override. Because Shape1d is still abstract, we cannot construct objects from it. But, like all abstract classes, we can derive concrete classes from it. Note that we could also override the inherited one_dimensional method in Shape1d, defining it to return True without even calling area.

So, we derive the Line and Arc classes from it. Both inherit the concrete method area from Shape1d and define a concrete perimeter method that overrides the abstract one they inherit from Shape. Now both of these classes have no abstract methods, so they are concrete classes from which objects can be constructed.

Object-oriented design (OOD) deals with relationships among classes, including inheritance relationships among classes related by the IS-A relationship: a Line IS-A Shape1d IS-A Shape (IS-an ABC).
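
The following sketch illustrates the points above under stated assumptions: BadLine (a class with a deliberately misspelled override) and the helper can_construct are hypothetical names introduced only for this illustration. It checks that Shape1d is still abstract, that a misspelled perimeter leaves a class abstract, and that Line is concrete and IS-A Shape1d and IS-A Shape.

```python
import abc

class Shape(abc.ABC):
    @abc.abstractmethod
    def perimeter(self):
        pass

    @abc.abstractmethod
    def area(self):
        pass

    def one_dimensional(self):
        return self.area() == 0

class Shape1d(Shape):                # overrides area only: still abstract
    def area(self):
        return 0

class Line(Shape1d):                 # overrides perimeter: now concrete
    def __init__(self, length):
        self.length = length

    def perimeter(self):
        return self.length

class BadLine(Shape1d):              # hypothetical: misspelled override
    def perimiter(self):             # typo, so perimeter is NOT overridden
        return 0

def can_construct(cls, *args):
    """Return True if cls(*args) succeeds, False if it raises TypeError."""
    try:
        cls(*args)
        return True
    except TypeError:
        return False

print(can_construct(Shape1d))        # False: perimeter is still abstract
print(can_construct(BadLine))        # False: the typo left perimeter abstract
print(can_construct(Line, 3))        # True

line = Line(3)
print(line.area(), line.perimeter(), line.one_dimensional())
print(isinstance(line, Shape1d), isinstance(line, Shape))
```

Note that Python reports the misspelling only when we first try to construct a BadLine object, not when the class is defined.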

In this example, we added a new class, Shape1d, and captured the fact that all one-dimensional shapes have an area of 0. By writing this concrete method in Shape1d, it did not have to be rewritten in classes that represent one-dimensional shapes: they just inherit that method. OOD addresses issues such as whether it is a good idea to create a new class in a hierarchy (increasing its complexity) to gain the ability to avoid writing the area method in all its derived classes (decreasing the complexity of each).

Here are all the classes.

  class Shape(abc.ABC):
      @abc.abstractmethod
      def perimeter(self):
          pass                 # will be overridden in some derived class

      @abc.abstractmethod
      def area(self):
          pass                 # will be overridden in some derived class

      def one_dimensional(self):  # concrete: works by calling an abstract method
          return self.area() == 0


  class Shape2d(Shape):
      pass                     # To derive from, but area is still abstract

  class Square(Shape2d):
      def __init__(self,side)  : self.side = side
      def __repr__(self)       : return f'Square(side={self.side})'
      def perimeter(self)      : return 4*self.side
      def area(self)           : return self.side**2

  class Circle(Shape2d):
      def __init__(self,radius): self.radius = radius
      def __repr__(self)       : return f'Circle(radius={self.radius})'
      def perimeter(self)      : return math.pi * 2*self.radius
      def area(self)           : return math.pi * self.radius**2

  class Triangle(Shape2d):
      def __init__(self,a,b,c) : self.a, self.b, self.c = a, b, c
      def __repr__(self)       : return f'Triangle{self.a,self.b,self.c}'
      def perimeter(self)      : return self.a + self.b + self.c
      def area(self):
          sp = self.perimeter()/2   # semi-perimeter, used in Heron's formula
          return math.sqrt(sp*(sp-self.a)*(sp-self.b)*(sp-self.c))


  class Shape1d(Shape):
      def area(self)           : return 0

  class Line(Shape1d):
      def __init__(self,length): self.length = length
      def __repr__(self)       : return f'Line(length={self.length})'
      def perimeter(self)      : return self.length

  class Arc(Shape1d):
      def __init__(self,radius,angle): self.radius, self.angle = radius, angle
      def __repr__(self)       : return f'Arc(radius={self.radius},angle={self.angle})'
      def perimeter(self)      : return math.pi * 2*self.radius * self.angle/360

Notice that the code above also derives Shape2d from Shape, which can be useful: Shape2d defines no new methods. It is written just as pass: we do not know how to compute the area of a general two-dimensional shape; in fact, we don't even know the attributes used in such a computation. We then use Shape2d as the base class from which Square, Circle, and Triangle are derived.

If we construct s1 = Square(...), then isinstance(s1,Square), isinstance(s1,Shape2d), and isinstance(s1,Shape) would all evaluate to True: s1 IS-A Square, IS-A Shape2d, and IS-A Shape: any method defined in these three classes can be called on the object s1 refers to: it follows the protocol of a Shape, Shape2d, and Square.

----------
Interlude: How is property (3) of ABC implemented in Python?

I cannot go into details here, but abc.ABC is built using a "metaclass" (abc.ABCMeta). A metaclass is a class that controls the creation of other classes. When we try to construct an object from any class that is derived from abc.ABC, Python first looks for any abstract methods that are in that class: either inherited from a base class or defined there. If it finds any, it disallows construction of the object and raises the TypeError exception shown above, specifying all the abstract methods it still found in the class.

Here is a bit of insight into this process. Each class derived from abc.ABC has an __abstractmethods__ attribute that is a frozenset of method names: the names of all methods in that class that are abstract, either because they were inherited abstract and not overridden with concrete methods or they were defined in that class as abstract. __abstractmethods__ does not include any methods that were inherited as abstract but overridden to be concrete in the class.
If __abstractmethods__ is not empty, Python should not construct objects of that class, and each method from this attribute appears in the error message.
----------

The collections module (which also defines things like namedtuple and defaultdict) contains the collections.abc module, which defines the following ABCs, organized into an inheritance hierarchy. Here is a list of the most common ABCs defined there (see the ABC Inheritance Hierarchy link associated with this lecture on the Weekly Schedule).

ABC (Inherits From)                   : new abstract methods       Concrete Mix-In methods
------------------------------------------------------------------------------------------
Container                             : __contains__
Hashable                              : __hash__
Iterable                              : __iter__
Iterator (Iterable)                   : __next__                   __iter__ (returns self)
Reversible (Iterable)                 : __reversed__
Sized                                 : __len__
Callable                              : __call__
Collection (Sized,Iterable,Container) :
Sequence (Reversible,Collection)      : __getitem__ (used for iteration)
MutableSequence (Sequence)            : __setitem__, __delitem__,  append, reverse, extend, pop,
                                        insert                     remove, __iadd__
Set (Collection)                      :                            all relational operators,
                                                                   all logical operators, isdisjoint
MutableSet (Set)                      : add, discard               clear, pop, remove,
                                                                   all i-logicals (e.g., __ior__)
Mapping (Collection)                  : __getitem__                __contains__, keys, items,
                                                                   values, get, __eq__, __ne__
MutableMapping (Mapping)              : __setitem__, __delitem__   pop, popitem, clear, update,
                                                                   and setdefault
MappingView (Sized)                   :
KeysView (MappingView,Set)            :
ValuesView (MappingView,Collection)   :  - Note can have duplicate values, so Collection not Set
ItemsView (MappingView,Set)           :

See collections.abc in the Python documentation for more details.

A MappingView is a read-only Mapping: no keys can be added/removed nor have their associated value changed. KeysView, ValuesView, and ItemsView are each a structure (Set or Collection) that Python can iterate over (as we have seen).

A method is a "Concrete Mix-In method" if it can be defined by calling the concrete methods that override abstract methods.
In our Shape example, one_dimensional is such a method. As another example, the append method in the MutableSequence ABC can be written using the insert method. Sometimes we use the standard mix-in method; but sometimes we rewrite it for efficiency, in a concrete class that "knows more" about the implementation of a class than its ABC knows.

We can characterize the random.shuffle function (typically used for lists) as working on any class derived from MutableSequence (which list is): the protocol for such a class will include all the methods necessary for random.shuffle to execute correctly.

Finally, the collections module also defines the classes UserString, UserList, and UserDict. We should use these classes when creating derived classes for special Python strings, lists, and dictionaries. The reason is a bit subtle: if we derive from the str, list, or dict class, there might be some methods that don't call the methods we would expect: for example, __init__ in dict might not use the standard "update" method (which we might override when we derive from UserDict) to put initial values into the dictionary: it might do something more efficient that bypasses any "update" method we might define in our derived class, which would be confusing. The "User" classes perform operations in a more straightforward (sometimes slower) way, calling all the expected methods. So if we override the update method in a UserDict, the __init__ method will call it.

To see the real details in Python, in Eclipse disclose the SystemLibs and disclose Lib, and examine the _collections_abc.py module.

----------
Interlude: Numbers and inheritance

In the numbers module, names for the various numeric classes in Python are arranged in the following hierarchy. Note that the ACTUAL numeric classes (with the names int, float, etc.) do not inherit from each other.

  Number
    |
  Complex   (represents complex)
    |
  Real      (represents float)
    |
  Rational  (represents fractions.Fraction)
    |
  Integral  (represents int)

We can use this hierarchy to check for different types of numeric values:

  isinstance(x, numbers.Integral) = True: x is an int
  isinstance(x, numbers.Real)     = True: x is an int, fractions.Fraction, or float
  isinstance(x, numbers.Number)   = True: x is any kind of numeric value

-----
Future Notes: discuss abc.register and its relationship to isinstance

Use registration in ABC to say "implements protocol" even if not a subclass (in a virtual subclass: Fluent 328)...not checking parameter structure or semantic constraints (e.g., parameterless __len__ returning non-negative)
-----

------------------------------------------------------------------------------

Properties as class attributes: the property class and how to use/generalize it

In this section we will learn about the "property" class in Python, and how to use it to better control the data attributes of an object. Using properties, we can define general methods to control the 3 ways to use a data attribute: accessing, setting, and deleting it.

----------
BIG CHANGE:

Before discussing properties (and later how they are implemented by descriptors) in detail, we must change our understanding of a fundamental part of Python: how attributes are searched for and set. We have discussed the inheritance-enhanced Fundamental Equation of Object-Oriented programming:

When attributes are searched for, Python does each of the following, in order, until the attribute is found

  (1) examine the namespace of the object (its __dict__)
  (2) examine the namespace of the object's class (its __dict__),
  (3) examine the namespaces of the classes that the object's class was derived from: repeat 2 in the order specified by the __mro__ of the object's class

That is what the __getattribute__ method defined in and inherited from the object class does to search for attributes.
Although we can define a __getattribute__ method in any class to override the standard one, we typically do not. But, we have sometimes defined __getattr__, which Python runs only if __getattribute__ fails to find an attribute.

When attributes are set

  (1) bind the attribute in the object's namespace (its __dict__ via __setattr__)

That is what the __setattr__ method (defined in and inherited from the object class) does to set attributes: unlike attribute lookup, attribute setting does not ascend the inheritance hierarchy! Recall that we can define a __setattr__ method in any class to override the standard one to do something special, possibly also calling object.__setattr__ or super().__setattr__ (or the builtin function setattr) to mimic the standard __setattr__.
----------

Now let's learn about the "property" class (defined in the builtins module) and how using it changes the search/set/delete rules. The __init__ constructor for the property class appears as follows:

  def __init__(self, fget=None, fset=None, fdel=None, doc=None):

When attributes are specified as properties, fget is the code to execute when an attribute is searched for, fset is the code to execute when an attribute is set, and fdel is the code to execute when an attribute is deleted (by del). Note that if a parameter's value is None, then that operation is ILLEGAL, which means that searching for/setting/deleting an attribute raises an exception respectively if fget/fset/fdel is None. The specified code can be the name of a function or a lambda (each with the correct parameters: self + ...).

If we define a CLASS ATTRIBUTE as a property, it means that when any object constructed from that class searches for or sets or deletes that attribute, it will call the appropriate property's function to do the job.

When attributes are searched for, Python does each of the following, in order, until the attribute is found: THE FIRST STEP IS NEW.

  (1) if the attribute is defined in the object's class as a property, run the property's fget method
  (2) examine the namespace of the object (__dict__),
  (3) examine the namespace of the object's class (__dict__),
  (4) examine the namespaces of the classes that the object's class was derived from: repeat 1 and 3 in the order specified by the __mro__ of the object's class

Likewise, when attributes are set (o.a = v)

  (1) if the attribute is defined in the object's class as a property, run the property's fset method
  (2) bind the attribute in the object's namespace (its __dict__ via __setattr__)

Likewise, when attributes are deleted (by del)

  (1) if the attribute is defined in the object's class as a property, run the property's fdel method
  (2) delete the attribute in the object's namespace (its __dict__ via del)

So, property manipulation "goes to the head of the class", being processed specially, first.

Here is a simple example of an attribute named "count" that is defined as a property in the class C. It uses get_count, set_count, and del_count as the methods called for that property. IMPORTANT: each of these methods manipulates an attribute named the_count, which is a regular attribute, NOT A PROPERTY. So, Rule (1) applies to the attribute count, while Rules (2-4) apply to the attribute the_count.

  class C:
      def __init__(self,init):
          self.count = init

      def inc(self):
          self.count += 1  # same as self.count = self.count + 1

      #...def other methods for C...

      # Define methods used for the count property
      def get_count(self):
          return self.the_count

      def set_count(self, value):
          self.the_count = value

      def del_count(self):
          del self.the_count

      # Define count as a property in the class C
      count = property(get_count, set_count, del_count)

  o = C(5)           # C(5) calls set_count and initializes self.the_count to 5
  print(o.__dict__)  # prints {'the_count': 5}
  print(o.count)     # prints 5: o.count calls get_count; returns self.the_count
  o.inc()            # calls get_count then set_count; stores self.the_count as 6
  print(o.count)     # prints 6
  o.count = 1        # calls set_count; stores self.the_count as 1
  print(o.count)     # prints 1: o.count calls get_count; returns self.the_count
  del o.count        # calls del_count, which executes del self.the_count
  print(o.__dict__)  # prints {}

Let's trace what happens in the code above to better understand it and explain why it prints what it prints. I STRONGLY SUGGEST that you actually put this code into an Eclipse project and use the debugger to single step into/over the methods and their bodies in class C to better observe what is happening.

Line 1) o = C(5)
  __init__ is called with self bound to a new object (whose __dict__ is empty), and the init parameter bound to 5. The statement self.count = init finds that the count attribute IS a property of the C class, so by Rule 1 its set_count method is called with value = 5; inside that method, the statement self.the_count = value finds that the the_count attribute is NOT a property, so by Rule 2 it binds the_count to 5 in o's __dict__. Note the difference between self.count = value and self.the_count = value.

Line 2) print(o.__dict__)
  prints o's updated __dict__: {'the_count': 5}
  The attribute the_count appears in o's namespace.

Line 3) print(o.count)
  When print computes its argument, the expression o.count, it finds that the count attribute IS a property of the C class, so by Rule 1 the get_count method is called; inside that method, the statement return self.the_count finds that the the_count attribute is NOT a property, so by Rule 2 it finds the_count is bound to 5 in o's namespace/__dict__ so returns 5, which is printed.

Line 4) o.inc()
  The body of the inc method is the statement self.count += 1, which is equivalent to self.count = self.count + 1. It first finds that the count attribute IS a property of the C class, so by Rule 1 the get_count method is called and it returns 5 as described in Line 3). It next computes a value one bigger, 6. It finally executes the statement self.count = 6, finding that the count attribute IS a property of the C class, so by Rule 1 the set_count method is called, which includes the statement self.the_count = value. Python finds that the the_count attribute is NOT a property, so by Rule 2 it binds the_count to 6 in o's namespace/__dict__, just as in 1)

Line 5) print(o.count)
  prints 6, as described in 3)

Line 6) o.count = 1
  The statement o.count = 1 finds that the count attribute is a property of the C class, so by Rule 1 the set_count method is called with value = 1; inside that method, the statement self.the_count = value finds that the the_count attribute is NOT a property, so by Rule 2 it binds the_count to 1 in o's namespace/__dict__, just as in 1)

Line 7) print(o.count)
  prints 1, as described in 5) and 3)

Line 8) del o.count
  The statement del o.count finds that the count attribute is a property of the C class, so by Rule 1 the del_count method is called; inside that method, the statement del self.the_count finds that the the_count attribute is NOT a property, so by Rule 2 it deletes the the_count attribute from o's namespace/__dict__.

Line 9) print(o.__dict__)
  prints o's updated __dict__ (the_count removed): {}

CAUTION: What would happen if we wrote the statement return self.count INSIDE the get_count method (using the same name as the property)? When it tried to compute self.count it would find that count is a property and call get_count again, leading to infinite recursion. We can write .count everywhere except inside the get_count, set_count, and del_count methods.

NOTE: We could have written the following statement in __init__

  self.the_count = init

to accomplish the same result: binding the_count to 5 in o's __dict__. But it is better to use .count everywhere in C's methods, except in the methods get_count, set_count, and del_count.

So, count is the public attribute name external to C (and used in most of C's methods) while the_count is its "special attribute name" used inside the __dict__. Of course, we could still refer to o.the_count and find its value in o's namespace/__dict__, but that would be improper: if we changed the "special attribute name" to something else, our code would fail.

If we used inheritance, we would have to make sure that all the "special attribute names" were unique. One quick way to do this is to use attribute names with double underscores. If we replaced the_count with __count in all the code above and re-ran it, it would produce the same result except in Line 2, which would appear as {'_C__count': 5}. Recall that double underscore attributes are mangled to be an underscore, followed by the class name, followed by the double underscore attribute name. By using name mangling for "special attribute names", different attributes used in different classes would be unique.

Of course in the case of .the_count or .__count we could still access/update o's namespace/__dict__ with the "special/mangled" name, finding/changing its value. But good programmers just shouldn't do that.
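
Here is a minimal sketch of the name-mangled variant described above: it is the same class C with the_count replaced by __count everywhere. Nothing else is changed, and the printed values in the comments are what I expect from the mangling rule just recalled.

```python
class C:
    def __init__(self, init):
        self.count = init            # goes through the property's set_count

    def inc(self):
        self.count += 1

    # Inside class C, self.__count is mangled to self._C__count
    def get_count(self):
        return self.__count

    def set_count(self, value):
        self.__count = value

    def del_count(self):
        del self.__count

    count = property(get_count, set_count, del_count)

o = C(5)
print(o.__dict__)                    # {'_C__count': 5}
o.inc()
print(o.count)                       # 6
del o.count
print(o.__dict__)                    # {}
```

Outside the class, the stored value is still reachable as o._C__count, which is the "safe but not secure" point made in the next interlude.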

----------
Safety vs Security in Python

"Safety is the condition of being protected from harm or other non-desirable outcomes, caused by non-intentional failures (mistakes). Security is the condition of being protected from harm or other non-desirable outcomes caused by intentional human actions or human behavior."

Many Python features (like double underscore names) are safe but not secure.
----------

Note that originally we described __setattr__ as

  def __setattr__(self,attr,value):
      self.__dict__[attr] = value

With properties, it can be better approximated as follows: checking whether attr is a class attribute that is a property to decide how to set that attribute: by calling fset for that attribute/property or directly setting the object's namespace/__dict__.

  def __setattr__(self,attr,value):
      if isinstance(self.__class__.__dict__.get(attr,None),property):
          self.__class__.__dict__[attr].__set__(self,value)
      else:
          self.__dict__[attr] = value

We will soon learn below how the __set__ method in a property calls fset. A class that defines any of the methods __get__, __set__, or __delete__ is called a descriptor. The property class is a special way to use descriptors. We will describe the advantage of using descriptors directly in the last part of this lecture.

------------------------------------------------------------------------------

More useful applications of properties: read-only and constrained attributes

In the C class, if we removed set_count and specified

  count = property(get_count, None, del_count)

then count becomes a read-only property; we could never set it (at least not by writing o.count = value, because the fset parameter is None). In the C class, the __init__ and inc methods would now fail (and of course there would be no way to write o.count = 1). If we wanted to allow the methods in C to work correctly, we could replace .count with .the_count in __init__ and inc; of course with this change o.count = 1 still fails.
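
A small sketch of the read-only version just described, with .count replaced by .the_count in __init__ and inc. The flag assignment_failed is an illustrative name added here; the wording of the AttributeError message varies by Python version, so the sketch only checks the exception type.

```python
class C:
    def __init__(self, init):
        self.the_count = init        # bypasses the property: it has no setter

    def inc(self):
        self.the_count += 1

    def get_count(self):
        return self.the_count

    def del_count(self):
        del self.the_count

    count = property(get_count, None, del_count)   # fset is None: read-only

o = C(5)
o.inc()
print(o.count)                       # 6: reading still goes through get_count

try:
    o.count = 1                      # no fset, so this assignment is illegal
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)             # True
```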

But suppose that we want methods INSIDE C to have access to (re)binding this name, but nothing outside of the class to be able to (re)bind it. We could use some of the material that we learned in the Introspection lecture and write set_count as follows.

  def set_count(self, value):
      import inspect
      code_called_from  = inspect.stack()[1].frame.f_code   # code object of the caller
      fname_called_from = inspect.stack()[1].function       # name of the calling function
      assert code_called_from is (None if (code := self.__class__.__dict__.get(fname_called_from,None)) is None else code.__code__),\
             f'Attempt to set attribute "count" from outside class {self.__class__}'
      self.the_count = value

After computing some information about the call to set_count, it asserts that set_count was called from some method defined inside the class C. If so, it sets the attribute in the standard way; if not, it raises an AssertionError exception.

Again, think safety vs. security. Someone outside the C class could certainly write o.__dict__['the_count'] = new_value, bypassing count's property to set the "special attribute name".

----------

As a second example, suppose that we wanted to specify a constraint on the values stored in the count attribute: that its value is always non-negative (>= 0). This is called a DATA INVARIANT. We can ensure this invariant by writing the set_count method to first check it, and raise the ValueError exception whenever (re)binding the attribute would violate the invariant. In this way we will never store a negative number in this property.

  def set_count(self, value):
      if value < 0:
          raise ValueError(f'C.set_count: value({value}) not >= 0')
      self.the_count = value

Here, we check the constraint before making the assignment to the "special attribute name".

Note that we could dispense with properties altogether, if we required users of class C to always call o.get_count() and o.set_count(value). We could do the checking in set_count. But, it is much easier to write o.count and o.count = 1.
Without properties we would have to write o.count += 1 as

  o.set_count(o.get_count() + 1)

which is quite difficult to read and understand. Properties provide us "uniform access" to a data attribute: whether it is accessed/stored directly or through getter/setter functions.

Using properties, we can define getter/setter/deleter methods for every data attribute that has interesting behavior. This approach can lead to us writing a lot of code. In the next section, we will learn how to simplify this process. But first, we will examine how the property class provides us with another way to specify properties, ultimately examining how the property class is written.

----------
Specifying Properties with Decorators:

Here is an alternative way to define the same class C that was defined above. The differences are discussed below. One big difference is that all the method names are now count: the ultimate class attribute defined in C.

  class C:
      def __init__(self,init):
          self.count = init

      def inc(self):
          self.count += 1

      #...def other methods for C...

      @property
      def count(self):
          return self.the_count

      @count.setter
      def count(self, value):
          self.the_count = value

      @count.deleter
      def count(self):
          del self.the_count

The first version defined its get_/set_/del_ count methods, and then built a property object from them, and assigned it to the class attribute count:

  count = property(get_count, set_count, del_count)

The new version above defines the count method (a class attribute), but decorates it with @property, which rebinds count to be a property object with only its fget attribute defined. Then, using the @count.setter decorator (setter is a method of the property object now bound to count), it rebinds count to be a new property object, with only the fset attribute changed (keeping the others the same). Then, using the @count.deleter decorator, it rebinds count to be a new property object, with only the fdel attribute changed (keeping all the others the same).

Here is a good approximation to the property class.
Reading it can help us understand why the decorators described above work as they do.

class property:
    """
    Emulation of the property class for educational purposes
    I first saw this definition in Ramalho, "Fluent Python" an
      excellent book covering ICS-33 material and lots more.
    Later I found it in https://docs.python.org/3/howto/descriptor.html
    """

    # When initially called as decorator, @property, the function
    #   decorated is bound to the fget parameter in __init__ (the
    #   getter); the other __init__ parameters (fset, fdel, and doc)
    #   are all bound to None.
    #
    # To bind the other functions, use the setter/deleter methods of
    #   the resulting property object as decorators; each binds the
    #   name to a new property object, updated with that specific function.
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    # These functions are descriptors for attributes; in the property
    #   class, they check against None and call the fget/fset/fdel
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    # These functions are used as decorators; they return a new property
    #   with the old functions for all but the argument
    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

As a final point here, recall that (class) properties override (object) attributes.
Given either of the definitions of C above, we could write o = C(5) and then bypass the count property by writing o.__dict__['count'] = 10, which binds the name count in the object's namespace (__dict__) to 10. At this point, writing o.count would still invoke the getter function, because a property is checked before the namespace of the object (__dict__). But, if we next wrote del C.count (deleting C's attribute that is a property) then o.count would evaluate to 10, because the object's namespace/__dict__ still binds the count attribute to 10. ------------------------------------------------------------------------------ Property Factories: how to abstract and then reuse properties It would be nice to abstract a non-negative property, to be able to more easily use it in classes, maybe even multiple times. With such an abstracted property, we could define the class C we have been using as an example much more simply: class C: def __init__(self,init): self.count = init def inc(self): self.count += 1 #...def other methods for C - but not get_count/set_count/del_count... count = non_negative('the_count') # below: various non_negative definitions Here, non_negative is a function acting as a property factory: when called, it creates/returns a new property object with fget/fset/fdel appropriately defined. Notice that the first two methods remain the same, and the count attribute is bound to a property object whose "special attribute name" is specified by the argument to non_negative: the_count. If we had another attribute that was also to be non-negative, we could write it as follows: another = non_negative('the_another') Below, we define the needed non_negative function, which returns the appropriate property object. Now that we have abstracted the special attribute name (a str) we need to change the bodies of the get_/set_/del_ methods to use that name. 
We can either use the builtins getattr/setattr/delattr or use the object's namespace/__dict__ directly, with whatever abstracted special name we pass as an argument to non_negative. Note that writing

  count = non_negative('count') # same name will cause infinite recursion!

would cause problems with recursion (explained in CAUTION above). The special attribute name MUST BE DIFFERENT from the property name "count" and from any other attribute names that are properties.

def non_negative(special_attr_name):
    # use special_attr_name from enclosing scope in definitions of fget/fset/fdel
    # define the fget/fset/fdel used to construct/return a property object
    def get_non_negative(an_object):
        return getattr(an_object,special_attr_name)
        # same as return an_object.__dict__[special_attr_name]

    def set_non_negative(an_object, value):
        if value < 0:
            raise ValueError(f'non_negative.set_non_negative: value({value}) not >= 0')
        setattr(an_object,special_attr_name, value)
        # same as an_object.__dict__[special_attr_name] = value

    def del_non_negative(an_object):
        delattr(an_object,special_attr_name)
        # same as del an_object.__dict__[special_attr_name]

    return property(get_non_negative, set_non_negative, del_non_negative)

In fact, we can simplify the process even more by rewriting the non_negative function (below) to generate/remember a different name each time it is called. So, we could define the following properties in a class

  length = non_negative() # actual attribute name: 'non_negative#0'
  width  = non_negative() # actual attribute name: 'non_negative#1'
  height = non_negative() # actual attribute name: 'non_negative#2'

In the C class we would write

  count = non_negative()

This leaves less code for a programmer to write (and possibly mess up: accidentally having two different properties with the same special attribute name).
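Before moving on to the auto-naming version, here is a small self-contained sketch of the named factory in use. The Box class, its attribute names, and the demo function are hypothetical, introduced only for this illustration; the factory body follows the definition above with a shortened error message.

```python
def non_negative(special_attr_name):
    """Property factory: values are stored under special_attr_name."""
    def get_non_negative(an_object):
        return getattr(an_object, special_attr_name)

    def set_non_negative(an_object, value):
        if value < 0:
            raise ValueError(f'value({value}) not >= 0')
        setattr(an_object, special_attr_name, value)

    def del_non_negative(an_object):
        delattr(an_object, special_attr_name)

    return property(get_non_negative, set_non_negative, del_non_negative)


class Box:  # hypothetical class, used only for this illustration
    # Two independent properties, each with its own special attribute name
    width = non_negative('the_width')
    height = non_negative('the_height')

    def __init__(self, width, height):
        self.width = width      # goes through set_non_negative
        self.height = height


def demo():
    b = Box(2, 3)
    b.width += 5                # getter, then setter
    stored = dict(b.__dict__)   # what the object's namespace really holds
    try:
        b.height = -1           # violates the invariant
        rejected = False
    except ValueError:
        rejected = True
    return b.width, b.height, stored, rejected
```

Note that the object's namespace holds only the special names (the_width, the_height); the names width and height exist only as class attributes bound to property objects.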
In the non_negative function below, the first time it is called it creates an 'id' attribute initialized to 0; every other time it is called it increments that attribute. Then we use this unique id to create special attribute names: e.g., 'non_negative#0'. Note that this name is a perfectly good DICTIONARY KEY, but not a legal IDENTIFIER in Python. So, we can never write o.non_negative#0 in our code. Thus, the special attribute names below will never conflict with any attributes we create in Python's standard syntax; of course, we can still create such an attribute by writing o.__dict__['non_negative#0'] = 1. Maybe in our first version we should have written count = non_negative('count#').

def non_negative():
    # Each time non_negative is called, its id attribute is different.
    # Concatenate a special attribute name using 'non_negative' and the id.
    if hasattr(non_negative,'id'):
        non_negative.id += 1
    else:
        non_negative.id = 0
    special_attr_name = 'non_negative#'+str(non_negative.id)

    def get_non_negative(an_object):
        return getattr(an_object,special_attr_name)

    def set_non_negative(an_object, value):
        if value < 0:
            raise ValueError(f'non_negative.set_non_negative: value({value}) not >= 0')
        setattr(an_object,special_attr_name, value)

    def del_non_negative(an_object):
        delattr(an_object,special_attr_name)

    return property(get_non_negative, set_non_negative, del_non_negative)

So here, non_negative has no special_attr_name parameter; each time the function is called it creates a unique name automatically, which is used by the three functions it defines and uses to construct/return a property.

Here is a second way to handle creating/incrementing the id attribute at the beginning of non_negative.
    try:
        non_negative.id += 1
    except AttributeError:
        non_negative.id = 0

The second solution tries to increment the attribute; if unsuccessful (because this is the first time the function has been called and no attribute exists) the exception is handled by defining and initializing the attribute. I like the first solution better: checking and then acting accordingly. Avoid try/except when you can (sometimes you cannot: e.g., you can't check whether next(...) will return a value or raise an exception: you need to call it to find out).

Unfortunately, using this function now means that declaring the count property in C means its special attribute name is 'non_negative#0', which is unrelated to the count property name. Debugging would be easier if the count property had a special attribute name based on count, the NAME of the defined property. But, there is no easy fix to the problem of

  (a) automatically generating special attribute names
  (b) having each name related to the name of the property it represents

We could add an argument to non_negative and write

  length = non_negative('length')

which would generate an attribute name like 'length@non_negative#0', containing both names and a special counter for uniqueness. But now we are back to requiring the programmer to specify names.

We cannot fix the problem by defining a better non_negative function. But, we can apply the fix AFTER the C class has been fully defined. I'll show one solution and briefly illustrate it below. The basic idea is that AFTER the class is defined, we will call a decorator on it to change/fix all the special attribute names: the decorator will find all the class attributes that are bound to properties, then surgically go into the property object of each to change its special_attr_name to be based on the attribute name of the property.

So, when we define the C class, we will use the fix_property_names decorator.
The decorator will not change what class object C is bound to (it returns aclass), but instead will change information in its namespace/__dict__ related to the special attribute names for properties. So we write

@fix_property_names
class C:
    ...body is the same

And we must define the class decorator earlier as

def fix_property_names(aclass):
    for attr_name, value in aclass.__dict__.items():
        if isinstance(value,property)\
           and value.fset is not None\
           and 'special_attr_name' in value.fset.__code__.co_freevars:
            cell_to_update = value.fset.__closure__[value.fset.__code__.co_freevars.index('special_attr_name')]
            cell_to_update.cell_contents = f'{attr_name}@{cell_to_update.cell_contents}'
    return aclass

When fix_property_names is called on a class, the for loop looks at all bindings in the aclass dictionary. If it finds an attr_name bound to a value that is a property, such that its fset attribute (guaranteed to exist) is not None (is a real function), and the fset function refers to a free* variable named 'special_attr_name' in its closure, then it resets the value bound to that free variable to be the attr_name, followed by @, followed by the current binding of the free variable. Because the getter, setter, and deleter created by one call to non_negative all share the same closure cell, updating it through fset changes the name used by all three.

*A free variable in a method is one that is not a parameter or local to the method. Free variables are typically declared (as is done here) in an enclosing scope (the scope of non_negative) and used inside a locally declared function (get_non_negative, set_non_negative, and del_non_negative use special_attr_name as a free variable). Global variables are a different variant of free variables.
(recall I said that closures are not actually stored as dictionaries; they are stored as indexed sequences of cells, and the index of a variable in the closure can be computed by using __code__.co_freevars)

So for attribute count defined as a property in C, it changes the free variable special_attr_name from 'non_negative#0' (created by the non_negative function) to 'count@non_negative#0' indicating that "count" is the actual name of the property manipulating this attribute. By changing this free variable, the attribute will now be stored/accessed using this new name, which includes useful information about the property.

There is a lot of magic in the code above, but most of the information can be tracked down by looking in the inspect module; some of the material is covered in the Introspection lecture.

In fact, there is another way to solve this problem, by using a metaclass. If we define C using such a metaclass, the metaclass can be made to automatically fix the property names in C (pretty much the same code) after the class has been defined. Because it is easier to describe the decorator, and because metaclasses are beyond what I teach in ICS-33, I won't discuss this solution here; the "Fluent Python" book is a good source for information on metaclasses.

Finally, we will see that classes (even using a bit of ABCs) provide a better way to provide general purpose properties. So, in the next section we will discuss descriptors (how properties are implemented) in general and how to write classes using descriptors and inheritance to make it even easier to implement more interesting and complicated attributes.
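To see the decorator and the auto-naming factory working together, here is a compact, self-contained sketch. It assumes Python 3.7 or later (where cell_contents can be assigned); the inner function names fget/fset/fdel, the extra total property, and the shortened error message are choices made for this illustration, not part of the definitions above.

```python
def non_negative():
    # Auto-naming factory: the counter lives on the function object itself.
    non_negative.id = getattr(non_negative, 'id', -1) + 1
    special_attr_name = f'non_negative#{non_negative.id}'

    def fget(obj):
        return getattr(obj, special_attr_name)

    def fset(obj, value):
        if value < 0:
            raise ValueError(f'value({value}) not >= 0')
        setattr(obj, special_attr_name, value)

    def fdel(obj):
        delattr(obj, special_attr_name)

    return property(fget, fset, fdel)


def fix_property_names(aclass):
    # Prefix each property's stored name with the class attribute name.
    for attr_name, value in aclass.__dict__.items():
        if isinstance(value, property) and value.fset is not None:
            freevars = value.fset.__code__.co_freevars
            if 'special_attr_name' in freevars:
                cell = value.fset.__closure__[freevars.index('special_attr_name')]
                # fget/fset/fdel share this one cell, so all three see the change
                cell.cell_contents = f'{attr_name}@{cell.cell_contents}'
    return aclass


@fix_property_names
class C:
    def __init__(self, init):
        self.count = init

    count = non_negative()
    total = non_negative()   # hypothetical second property


o = C(5)
o.total = 9
stored_names = sorted(o.__dict__)   # the keys actually used for storage
```

Inspecting stored_names shows that both values are stored under names that begin with the property name, which is what makes debugging easier.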
------------------------------------------------------------------------------

YOU DO NOT NEED TO READ/UNDERSTAND THIS MATERIAL
IT EXPLAINS HOW DESCRIPTORS, A LOW-LEVEL PYTHON FEATURE, IMPLEMENT PROPERTIES

Descriptors in Python and Improved Attribute Control

After multiple sections on properties, we are now ready to go down one level to look in detail at "descriptors", from which properties are built. We can use descriptor classes and inheritance (even ABC) to improve our ability to process data attributes; we will also discuss method attributes and how descriptors process them to become bound functions.

A class is a descriptor if it defines (some of) the methods __get__, __set__, and __delete__. We have seen that the property class is a descriptor: in it, these three dunder methods work by calling the simpler fget/fset/fdel methods specified when a property is constructed (or, as we saw above, by decoration). In the examples below we will use descriptor classes directly, but also use some of the information that we learned when writing advanced properties and property factories.

To start, we define the class Unique_descriptor_base, from which we will derive descriptor classes that constrain the values stored in data attributes (similar to how we wrote the non_negative property factory). This class will define an __init__ and standard versions of all three methods in the descriptor protocol: __get__, __set__, and __delete__ (which will be inherited, and sometimes overridden, in derived classes). In addition, it defines __id as a class data attribute, which it uses to ensure the special attribute names generated are unique; it is used much like the function attribute id was used in the non_negative property factory function.
class Unique_descriptor_base:
    __id = 0   # initial unique id (see use in special_attr_name)

    def __init__(self):
        this_class = self.__class__
        self.special_attr_name = f'{this_class.__name__}#{this_class.__id}'
        this_class.__id += 1  # increment to new unique id

    # obj is the instance object this __get__ was called on; if it is None,
    #   then __get__ was called on this class not an instance of this class
    def __get__(self, obj, objtype = None):
        if obj is None:
            return self
        else:
            return getattr(obj, self.special_attr_name)

    def __set__(self, obj, value):
        setattr(obj, self.special_attr_name, value)

    def __delete__(self,obj):
        delattr(obj, self.special_attr_name)

Examine the __init__ method. Note most importantly that its local variable named this_class will typically NOT be bound to Unique_descriptor_base: it will be bound to some class derived from Unique_descriptor_base (see the Non_negative class, defined below). That means the attribute special_attr_name will have its name/__id based on that derived class: it might look like 'Non_negative#0'.

Now we will derive the Constrained class from Unique_descriptor_base, but also from abc.ABC. Thus, Constrained will be an ABC that overrides __set__ (to check the constraint and act appropriately depending on whether or not it is satisfied) and defines as abstract the method check_constraint that will implement the actual constraint to be checked in concrete classes derived from Constrained. Recall that if a class derived from this ABC is defined without defining a check_constraint method, we cannot construct objects from it. Note that self.check_constraint in __set__ either raises an exception or doesn't (in which case the value it returns is bound to the object).
import abc
class Constrained(abc.ABC, Unique_descriptor_base):
    def __set__(self, obj, value):
        value = self.check_constraint(obj, value)
        setattr(obj, self.special_attr_name, value)

    # Check constraint and do nothing if satisfied; raise ValueError if not
    #   satisfied; may also return a modified value that meets the constraint,
    #   allowing implicit conversion
    @abc.abstractmethod
    def check_constraint(self, obj, value):
        pass

Now, let's define the entire Non_negative class by easily deriving it from Constrained. It is easy, because we have set up a lot of machinery in the Unique_descriptor_base and Constrained classes.

class Non_negative(Constrained):
    # Define the abstract method specified in Constrained
    def check_constraint(self, obj, value):
        if value >= 0:
            return value
        else:
            raise ValueError(f'Non_negative.check_constraint: value({value}) not >= 0')

Using the concrete Non_negative class, we can now write the C class as follows, just as easily as we did with the property factory.

class C:
    def __init__(self,init):
        self.count = init

    def inc(self):
        self.count += 1

    #...def other methods...

    count = Non_negative()

Whenever we bind the count attribute, Python will execute the __set__ method in the Non_negative descriptor: the __set__ inherited from Constrained. Inside, it calls the check_constraint method defined in the Non_negative class, which has overridden the abstract check_constraint method.

Here is another way to define a different kind of constraint. In the class Correcting_non_negative, it raises an exception if the value is not a numeric value; if it is a numeric value, any negative number is converted to a positive one, which is returned: the statement value = self.check_constraint(obj, value) in the __set__ method binds value to an object that should be guaranteed to be 0 or positive.
import numbers
class Correcting_non_negative(Constrained):
    def check_constraint(self, obj, value):
        if not isinstance(value,numbers.Number):
            raise ValueError(f'Correcting_non_negative.check_constraint: value({repr(value)}) not numeric')
        return abs(value)

Finally, here is a constraint that specifies what type of value can be bound to an attribute. The actual type being checked is an argument to __init__, so this class is very general.

class Type_constrained(Constrained):
    def __init__(self, type_constraint):
        super().__init__()
        self.type_constraint = type_constraint

    def check_constraint(self, obj, value):
        if not isinstance(value, self.type_constraint):
            raise ValueError(f'Type_constrained.check_constraint: value({repr(value)}) not instance of {self.type_constraint}')
        return value

Then, we might write count = Type_constrained(int), which would check that any value bound to count must be an integer. Of course we could write a more specific class like type_int, omit the __init__ definition and write the if test in check_constraint as just not isinstance(value, int). But it is very easy to use this generalized class.

Therefore, by using a tiny inheritance hierarchy, with Constrained derived from Unique_descriptor_base, we can easily derive small classes from Constrained to check a variety of interesting attribute properties.

As with the property factory, the special_attr_name attribute for count defined in Unique_descriptor_base is bound to a string based on the class being used to do the checking: count.special_attr_name is 'Type_constrained#0'. As before, we want to change this attribute to be 'count@Type_constrained#0'. We do so with another decorator function, similar to (but simpler than) the one we used above.
def fix_descriptor_names(aclass):
    for attr_name, value in aclass.__dict__.items():
        if hasattr(value,'__dict__') and 'special_attr_name' in value.__dict__:
            value.__dict__["special_attr_name"] = f'{attr_name}@{value.__dict__["special_attr_name"]}'
    return aclass

When fix_descriptor_names is called on aclass, the for loop looks at all bindings in the aclass dictionary. If it finds an attr_name bound to a value that is an object that defines a special_attr_name attribute, then it resets the value bound to special_attr_name to be the attr_name, followed by @, followed by special_attr_name's current binding.

So for attribute count defined as an object of a class derived from Unique_descriptor_base (which defines special_attr_name), it changes special_attr_name from 'Type_constrained#0' to 'count@Type_constrained#0' indicating that count is the name of the attribute. By changing special_attr_name this attribute will now be stored/accessed using this new name.

-----------

SIMPLIFICATION in Python 3.6 and beyond.

Python now allows us to avoid using the fix_descriptor_names decorator function by starting the definition of Unique_descriptor_base as follows.

class Unique_descriptor_base:
    __master_id = 0

    def __init__(self):
        self.__id = Unique_descriptor_base.__master_id
        Unique_descriptor_base.__master_id += 1

    def __set_name__(self, owner, attr_name):
        self.special_attr_name = f'{attr_name}@{self.__class__.__name__}#{self.__id}'

    ....

Now, the code that processes a class (it is run right after the class itself has been fully defined; just like what the decorator does) examines every attribute that is specified by a descriptor and calls the __set_name__ method on it (with the appropriate arguments) if this method is defined in the descriptor. The __set_name__ method sets the special_attr_name to include the name of the class attribute being defined, the class of the descriptor (like Non_negative), and its unique ID.
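Putting the pieces together, here is a self-contained sketch of the __set_name__ version of the hierarchy. The __get__/__set__/__delete__ bodies are carried over from the earlier Unique_descriptor_base definition (one plausible way to fill in the elided part, not necessarily the original), and the size attribute and shortened error message are additions made only for this illustration.

```python
import abc


class Unique_descriptor_base:
    __master_id = 0   # one counter shared by all descriptor objects

    def __init__(self):
        self.__id = Unique_descriptor_base.__master_id
        Unique_descriptor_base.__master_id += 1

    def __set_name__(self, owner, attr_name):
        # Called automatically once the owning class body has been executed.
        self.special_attr_name = f'{attr_name}@{self.__class__.__name__}#{self.__id}'

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.special_attr_name)

    def __set__(self, obj, value):
        setattr(obj, self.special_attr_name, value)

    def __delete__(self, obj):
        delattr(obj, self.special_attr_name)


class Constrained(abc.ABC, Unique_descriptor_base):
    def __set__(self, obj, value):
        setattr(obj, self.special_attr_name, self.check_constraint(obj, value))

    @abc.abstractmethod
    def check_constraint(self, obj, value):
        """Return the value to store, or raise ValueError."""


class Non_negative(Constrained):
    def check_constraint(self, obj, value):
        if value >= 0:
            return value
        raise ValueError(f'value({value}) not >= 0')


class C:
    count = Non_negative()
    size = Non_negative()    # hypothetical second attribute

    def __init__(self, init):
        self.count = init


o = C(3)
o.size = 10
stored = dict(o.__dict__)    # keys are the generated special names
```

No class decorator is needed here: each descriptor learns the name of the class attribute it is bound to when class C is created.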
I have left the fix_descriptor_names decorator function in this handout because there are still other useful operations that such decorators can perform, once a class is defined. Note that even something as fundamental as how a class is defined has changed recently in Python.

----------

--------------------------------------------------------------------------------

Final Details

Mostly we define descriptors with at least __get__ and __set__ methods. But if we don't define one or the other, the behavior of descriptors can be a bit subtle. Here are the three possibilities. We call a descriptor that defines __set__ an Overriding Descriptor (its __set__ method overrides the normal binding of the attribute in the object's namespace).

Overriding Descriptor (has __set__) with __get__: the Normal Case for Data
  __get__ from object: calls __get__ method with instance that is object
  __get__ from class : calls __get__ method with None as object
  __set__ from object: calls __set__ method in descriptor

Overriding Descriptor (has __set__) without __get__
  __get__ from object: returns value in namespace (__dict__); if nothing
                       there, returns descriptor object
  __get__ from class : returns descriptor object
  __set__ from object: calls __set__ method in descriptor

NonOverriding Descriptor (has no __set__) with __get__: the Normal Case for Methods
  __get__ from object: calls __get__ method with instance that is object
  __get__ from class : calls __get__ method with None as object
  __set__ from object: updates __dict__ for instance

Methods are implemented with a non-overriding descriptor: they define __get__ but not __set__. When we call o.method(a1,a2,...,an) Python first calls __get__, which returns a function constructed from the method, whose self parameter is bound, and then calls that resulting function with the remaining arguments.
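The two steps of a method call described above can be observed directly with ordinary Python functions. This is a small sketch; the class C and its add method here are hypothetical names used only for the demonstration.

```python
class C:
    def __init__(self, base):
        self.base = base

    def add(self, a, b):
        return self.base + a + b


o = C(10)

# A plain function stored in a class is a non-overriding descriptor:
# it defines __get__ but not __set__.
func = C.__dict__['add']
has_get = hasattr(func, '__get__')
has_set = hasattr(func, '__set__')

# Step 1: attribute lookup calls __get__, producing a bound method.
bound = func.__get__(o, C)

# Step 2: the bound method is called with the remaining arguments.
result = bound(1, 2)

same_as_normal_call = (result == o.add(1, 2))
self_is_instance = bound.__self__ is o      # the bound object
func_is_original = bound.__func__ is func   # the underlying function

# Because there is no __set__, assignment just updates the instance
# __dict__, and that entry now shadows the method for this object.
o.add = 'shadowed'
shadowed = o.add
```

The last two lines illustrate the "updates __dict__ for instance" row: with a non-overriding descriptor, an instance attribute of the same name takes precedence on later lookups.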
Here is an approximation of what happens in the descriptor

    def __get__(self, obj, objtype = None):
        if obj is None:
            return self
        else:
            def bound_function(*args,**kargs):
                return self.__func__(obj,*args,**kargs)
            bound_function.__self__ = obj            # the object the method is bound to
            bound_function.__func__ = self.__func__
            return bound_function

For efficiency purposes, Python allows us to specify the attributes of objects constructed from a class using the __slots__ attribute (bound to a tuple) rather than __dict__. Objects of such classes have a fixed set of attributes (Python cannot add new ones). But, if we also include the name '__dict__' inside the __slots__ tuple, Python can add/remove attributes, which are stored in __dict__ (and thus take longer to access than the other attributes that are explicitly named in __slots__). Again, see the "Fluent Python" book for more details.

------------------------------------------------------------------------------

Problems:

1. What would happen if in the definition of

     class Non_negative(Constrained): ...

   we misspelled check_constraint (when we tried to define that method)? That is, there is an error in the code that will eventually be detected: when/where does Python detect the error and what happens when Python detects it?

2. Write two versions of the In_range class; each ensures objects have a value in the specified range. By defining the data attribute

     selection = In_range(1,5)

   we guarantee that selection always stores a value in the closed range [1,5]. Also, always verify that the lower bound is smaller than the upper bound. In the first version, raise an exception if the value that is to be bound to the attribute is outside the range. In the second version, raise an exception if Python cannot compare the value that is to be bound to the attribute against either the lower or upper bound; if it can be compared, but is outside the range, implicitly convert it to the value in the range that is closest to it.
   So the assignment o.selection = -1 would result in the attribute storing 1; the assignment o.selection = 8 would result in the attribute storing 5.

3. Write a Non_blank class that ensures objects have a string value that is not empty. In the process, return a string that has all leading and trailing blanks removed.

4. What happens if we define the __delete__ method as the statement pass? What happens if we do not define the __delete__ method in Unique_descriptor_base?

5. How can we update the inheritance hierarchy for descriptors to make it easy to define a new property that both
   (a) allows only constrained updates (like a value being non-negative)
   (b) allows updates to a property only if it is updated inside a method defined in that class