Python's match/case Statement (3.10)

Python 3.10 introduced a new control structure: the match/case statement. Once we understand how to use it, this control structure makes it easier to write simpler, cleaner, and more readable code for processing data based on its structure and its values. It generalizes Python's standard if/elif statement, often removing the need to use multiple == or "in" operators, as well as calls to isinstance, hasattr, and len.

In fact, any match/case statement can be translated into a more complicated if/elif statement, although some very complicated if/elif statements cannot be translated into simple match/case statements. So we will use match/case statements if we CAN (because they will likely be simpler) and if/elif when we MUST (because they allow more latitude when complexity is required). This is similar to the relationship between for loops (which are simpler) and while loops (which are more complicated but more versatile: we can translate any for loop into a while loop).

It is infrequent that a new control structure is added to a programming language. match/case was added very late in Python's life cycle, because it took careful consideration to design the syntax and semantics of this statement to fit in with the rest of Python. Other programming languages already have statements that are used similarly. We can think of match/case statements as greatly expanding the capabilities of tuple unpacking.

There are three PEP documents covering match/case statements in detail:

  Specification (and EBNF) : https://www.python.org/dev/peps/pep-0634/
  Motivation and Rationale : https://www.python.org/dev/peps/pep-0635/
  Tutorial                 : https://www.python.org/dev/peps/pep-0636/

In this lecture I will illustrate how to use match/case with a series of ever more complicated examples. I will often illustrate how to translate each into an if/elif statement, to help explain the meaning of the example.

Generally, a match/case statement uses the structure and contents of some data to decide which unique block (of many) to execute. The general form of the match/case statement is shown below (with each case indented inside match, and a block indented in each case).

  match expression:
      case pattern1 [if boolean-expression]:
          block1
      ....
      case patternN [if boolean-expression]:
          blockN

After each case is a pattern, and each pattern can be followed by an optional "if boolean-expression" called a "Guard". To execute a match/case statement, Python first evaluates the match expression; it then sequentially tries to match each case pattern with the match expression's value, until it finds the first successful match whose guard's boolean-expression evaluates to True (a case with no guard behaves as if its guard were True), in which case it executes the block associated with that case pattern. Then Python is finished executing the match/case statement. If no case succeeds, no block is executed.

We will soon learn the details about how Python attempts to match expressions to patterns (and how it uses the results of the match when evaluating guards). We will leverage our understanding of matching symbols to patterns in both EBNF and regular expressions.

In Python, "match" and "case" are called soft keywords. A hard keyword (see the list below) cannot be used as a name: e.g., we can never bind def = 1. When Python was created, it defined many hard keywords. You can print them all by importing the keyword module and printing keyword.kwlist (which is used in Programming Assignment #3):

  ['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break',
   'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally',
   'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal',
   'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']

If Python's maintainers made "match" and "case" hard keywords, then they would break all programs that bound values to these names.

Because it is so late in Python's life cycle, the maintainers chose instead to give match and case a special meaning only where they are used to write a match/case statement; everywhere else they can still be used as ordinary names (which is a bit like how the builtin names in Python, e.g., the max function, can be rebound by a program).

We will now discuss the forms of match expressions and case patterns allowed, and how they match. This includes understanding the elements of patterns: literals, wildcards, OR and grouping, sequences, binding/capturing (as) names, mappings, and objects from classes. Along with patterns we will discuss guards, and all the ways these elements can be combined (much like composing regular expressions from their elements).

I. Literal Patterns

Literals are symbols we write that stand for values of various built-in types in Python: e.g., False, True, None, 123, 45.76, 'abc'. We can use a literal as the simplest kind of pattern; Python checks for a match by using == to compare the literal's value to the value of the expression (except for the literals True, False, and None, which it compares using the "is" operator). Note that the pattern _ is called a wildcard and matches anything.

In the examples below, assume robot refers to an object with move, turn, and stop methods: move's arguments can be 'forward' and 'backward'; turn's arguments can be 'left' and 'right'. To start, imagine each command is one of these words.

  match command:
      case 'forward':
          robot.move('forward')
      case 'backward':
          robot.move('backward')
      case 'left':
          robot.turn('left')
      case 'right':
          robot.turn('right')
      case _:
          robot.stop()   # unknown command

We can also replace each method call's argument by command: e.g., robot.move(command).

Remember that Python examines the patterns in the case clauses sequentially until it finds a match. If we put the last/wildcard case first, it would always match and therefore always stop the robot without checking for matches with the other patterns (in fact, Python reports a SyntaxError when a wildcard case is followed by other cases).

Python can translate this statement into the following code.

  if command == 'forward':
      robot.move('forward')
  elif command == 'backward':
      robot.move('backward')
  elif command == 'left':
      robot.turn('left')
  elif command == 'right':
      robot.turn('right')
  else:   # or elif True:
      robot.stop()

There is another possible approach to translation:

  case_map = {'forward' : "robot.move('forward')",
              'backward': "robot.move('backward')",
              'left'    : "robot.turn('left')",
              'right'   : "robot.turn('right')"}
  exec(case_map.get(command,"robot.stop()"))

or

  case_map = {'forward' : (lambda direction : robot.move(direction)),
              'backward': (lambda direction : robot.move(direction)),
              'left'    : (lambda direction : robot.turn(direction)),
              'right'   : (lambda direction : robot.turn(direction))}
  case_map.get(command, lambda _ : robot.stop())(command)

Generally, we will show the if/elif translation, but the map translation could be faster to execute. Python can use whatever cleverness it has to do the translation.

II. OR Patterns (using choices: borrowing | from EBNF/Regular Expressions)

OR patterns use the | character to separate alternative patterns to match for a single case. Python matches the patterns left-to-right and the case pattern matches if any of its alternative patterns match. We can simplify the code above to

  match command:
      case 'forward' | 'backward':
          robot.move(command)   # command matches either 'forward' or 'backward'
      case 'left' | 'right':
          robot.turn(command)   # command matches either 'left' or 'right'
      case _:
          robot.stop()          # unknown command

Python can translate this statement into the following code.

  if command == 'forward' or command == 'backward':
      robot.move(command)
  elif command == 'left' or command == 'right':
      robot.turn(command)
  else:
      robot.stop()

III. Sequences (and name binding)

Now suppose that we generalize the form of the command to be a 2-word string: e.g., 'move forward' or 'turn left'. This form makes patterns a bit more complicated, but we are interested in non-trivial patterns.
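
Because the match expression will now be command.split(), it helps to recall what split produces for commands of different lengths. Here is a small sketch (the sample command strings are made up for illustration).

```python
# str.split() with no argument splits on runs of whitespace
# and returns a list of the words it finds.
samples = ['move forward', 'turn   left', 'stop', 'move forward 3']
for command in samples:
    words = command.split()
    print(repr(command), '->', words, 'length', len(words))
```

Only the 2-word commands produce a 2-element list; the sequence patterns that follow rely on that length.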

We can rewrite the code as follows

  match command.split():
      case ['move', direction]:
          if direction == 'forward' or direction == 'backward':
              robot.move(direction)
          else:
              robot.stop()   # unknown second argument to command
      case ['turn', direction]:
          if direction == 'left' or direction == 'right':
              robot.turn(direction)
          else:
              robot.stop()   # unknown second argument to command
      case _:
          robot.stop()       # unknown command

Here Python first matches the value command.split() against the pattern ['move', direction]. Because the first index in the pattern is a literal, it checks the values at the first indexes for equality; because the second index in the pattern is a name, it binds that name to the value at the second index in command.split(), whatever it may be. What happens if

  a) the first index is not 'move', or
  b) the length of the split list is only 1, or
  c) the length of the split list is more than 2?

In all these cases, the match fails. Sequence MATCHING is similar to tuple UNPACKING, but when tuple unpacking fails, Python raises an exception; when sequence matching fails, Python just tries to match the pattern in the next case.

Note too that the pattern _ matches anything, but does not bind the symbol _ to any value. For example, if we specified the pattern [_] it would match any list storing one value, but it would not bind _ to that value.

The inner if/elses are clumsy, but (so far) necessary for a command like 'move bletch', because 'bletch' is not a legal direction to move.

Python can translate this statement into the following code. Here _e represents a hidden name that Python defines and uses: binding it to command.split() and using it while executing the match/case statement. The name is not really _e, but I use that name here for purposes of illustration. I also show it deleted after the match statement finishes executing.

  _e = command.split()
  if len(_e) == 2 and _e[0] == 'move':
      direction = _e[1]
      if direction == 'forward' or direction == 'backward':
          robot.move(direction)
      else:
          robot.stop()   # unknown second argument to command
  elif len(_e) == 2 and _e[0] == 'turn':
      direction = _e[1]
      if direction == 'left' or direction == 'right':
          robot.turn(direction)
      else:
          robot.stop()   # unknown second argument to command
  else:
      robot.stop()
  del _e

Notice all the extra code that Python added to

  (a) determine whether command.split() produces a 2-list
  (b) check that the value at the first index equals a literal: 'move' in case 1, or 'turn' in case 2
  (c) bind direction to the value at the second index

There are many ways to fix this verbosity. The first uses an OR pattern for the second part of each case. In the code below, when command.split() is matched against the first sequence pattern, it requires the value at the second index to match either 'forward' or 'backward'. So we can simplify the code as follows

  match command.split():
      case ['move', ('forward' | 'backward')]:
          robot.move(command.split()[1])
      case ['turn', ('left' | 'right')]:
          robot.turn(command.split()[1])
      case _:
          robot.stop()   # unknown command

Now, both method calls use the second string in command.split() as the argument in the robot.move(...) or robot.turn(...) call. Rewriting command.split() seems clumsy, and we will now rectify it in two different ways, using two different extensions to case pattern matching.

IV. Capturing Subpatterns

When we match an OR pattern, we can "capture" which literal matched by following the parenthesized options with "as some-name".

  match command.split():
      case ['move', ('forward' | 'backward') as direction]:
          robot.move(direction)
      case ['turn', ('left' | 'right') as direction]:
          robot.turn(direction)
      case _:
          robot.stop()   # unknown command

The "as" here is also used elsewhere in Python's syntax for a similar purpose: e.g., renaming imported items in different kinds of import statements; in an except clause (to bind a name to the exception object raised); and (later) in the "with" construct.

Python can translate this statement into the following code.

  _e = command.split()
  if len(_e) == 2 and _e[0] == 'move' and (_e[1] == 'forward' or _e[1] == 'backward'):
      direction = _e[1]
      robot.move(direction)
  elif len(_e) == 2 and _e[0] == 'turn' and (_e[1] == 'left' or _e[1] == 'right'):
      direction = _e[1]
      robot.turn(direction)
  else:
      robot.stop()   # unknown command
  del _e

At this point we have a very simple match/case statement compared to the if/elif statement Python translates it into. This should be your first inkling of the power of this new Python statement.

V. Guards: when "if boolean-expression" appears after patterns

Another way to solve the problem above is more clumsy for this example, but is very powerful and useful in other situations. When a guard appears after any pattern, if the match expression matches the pattern, the guard is evaluated, and the block is executed only if the guard evaluates to True. The guard typically computes some boolean condition based on the names bound when its preceding pattern is matched.

  match command.split():
      case ['move', direction] if direction == 'forward' or direction == 'backward':
          robot.move(direction)
      case ['turn', direction] if direction == 'left' or direction == 'right':
          robot.turn(direction)
      case _:
          robot.stop()   # unknown command

Here, the pattern is simplified so that the name direction matches whatever the second list value is, and the guard ensures the value bound to direction is one of the two allowed.

Python can translate this guarded match/case statement into the following code.

  _e = command.split()
  _done = False
  if len(_e) == 2 and _e[0] == 'move':
      direction = _e[1]
      if direction == 'forward' or direction == 'backward':   # guard
          _done = True
          robot.move(direction)
  if not _done and len(_e) == 2 and _e[0] == 'turn':
      direction = _e[1]
      if direction == 'left' or direction == 'right':         # guard
          _done = True
          robot.turn(direction)
  if not _done:
      robot.stop()
  del _e
  del _done

Notice here that the match/case statement is translated into SEQUENTIAL if statements (not one big if/elif) that examine _done to see whether a previous case pattern matched and its guard was True (in which case its block was executed, so NO MORE cases/guards should be checked/executed).

Now assume that we allow more general commands of the form 'move forward 3', where the third value (in the split list) can be a string representing an unsigned integer (which we can check by calling isdigit on that string, which checks whether all the characters in the string are digits). We can then use the following match/case statement to solve this problem.

  match command.split():
      case ['move', ('forward' | 'backward') as direction]:
          robot.move(direction)
      case ['move', ('forward' | 'backward') as direction, count] if count.isdigit():
          for _ in range(int(count)):
              robot.move(direction)
      case ['turn', ('left' | 'right') as direction]:
          robot.turn(direction)
      case ['turn', ('left' | 'right') as direction, count] if count.isdigit():
          for _ in range(int(count)):
              robot.turn(direction)
      case _:
          robot.stop()   # unknown command

This version checks whether the list produced by split matches a pattern that is a 2-list or a 3-list. In fact, we can also reduce the number of cases by using a pattern that matches either a 2-list or a 3-list and a guard that checks which. Here (as in tuple unpacking) recall that *count matches all the remaining sequence values (whether there be zero, one, two, three, ...etc. more), binding count to a list of them.

  match command.split():
      case ['move', ('forward' | 'backward') as direction, *count] if (
              count == [] or len(count) == 1 and count[0].isdigit()):
          for _ in range(1 if count == [] else int(count[0])):
              robot.move(direction)
      case ['turn', ('left' | 'right') as direction, *count] if (
              count == [] or len(count) == 1 and count[0].isdigit()):
          for _ in range(1 if count == [] else int(count[0])):
              robot.turn(direction)
      case _:
          robot.stop()   # unknown command

VI. Matching Mappings

We can also write match expressions and patterns that are dictionaries. The patterns must have keys that are literal values, but the value associated with each key can be any pattern: often such a pattern is just a literal or a name. If the value pattern is a literal, matching requires that the key/value pair be present in the dictionary expression; if the value pattern is a name, matching requires that the key be present in the dictionary, in which case Python binds the name to the value associated with the key in the expression dictionary.

To match an expression dictionary to a case pattern, Python must be able to match every association in the case pattern: so, these associations can be any subset of the key/value associations in the match expression.

Imagine that we parsed a robot command into a dictionary: e.g.,

  command_dict = {'action' : 'move', 'direction' : 'forward', 'count' : 3}

but also allowing no 'count' key to be present. We can then use the following match/case statement to solve this problem.

  match command_dict:
      case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction, 'count' : count} if type(count) is int and count >= 0:
          for _ in range(count):
              robot.move(direction)
      case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction}:
          robot.move(direction)
      case {'action' : 'turn', 'direction' : ('left' | 'right') as direction, 'count' : count} if type(count) is int and count >= 0:
          for _ in range(count):
              robot.turn(direction)
      case {'action' : 'turn', 'direction' : ('left' | 'right') as direction}:
          robot.turn(direction)
      case _:
          robot.stop()   # unknown command

Notice that any dictionary matching

  case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction, 'count' : count}

also matches

  case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction}

so the case with 'count' must appear first (otherwise a dictionary with a 'count' key would match the case without it, and its count would be ignored). The same is true for the two 'turn' cases.

In fact, we can write a pattern (here and in simpler contexts) that specifies that a name is bound only to a value of a specific type. So we can simplify the first case above to

      case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction, 'count' : int() as count} if count >= 0:
          for _ in range(count):
              robot.move(direction)

See the next section (VII) for more information on matching objects generally. For many built-in types (such as int and str) we can simplify the pattern to be of the form

      case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction, 'count' : int(count)} if count >= 0:
          for _ in range(count):
              robot.move(direction)

Also, we can reduce the number of cases in this code by leaving off the 'count' key in the case pattern, but then we must look up its association in the dictionary ourselves, using 1 as the default if it is not present. I think the resulting code (shown next) is more complicated, even if there are fewer cases.

Note its use of the binding operator :=

  match command_dict:
      case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction} if type((count := command_dict.get('count',1))) is int and count >= 0:
          for _ in range(count):
              robot.move(direction)
      case {'action' : 'turn', 'direction' : ('left' | 'right') as direction} if type((count := command_dict.get('count',1))) is int and count >= 0:
          for _ in range(count):
              robot.turn(direction)
      case _:
          robot.stop()   # unknown command

VII. Matching Objects: by type, then attributes (named or positional)

We can also write match expressions and case patterns that are objects constructed from arbitrary classes (instances). In the patterns we can further specify patterns that match the attributes of the match expression object. For example, suppose that we define a class Point that specifies the two attributes x and y, then write

  p = Point(x=..,y=..)
  match p:

Here are examples of case patterns/guards and their meanings.

  case Point():
    matches any expression that is a Point object (without binding attributes).

  case Point(x=0):
    matches any expression that is a Point object whose x attribute matches the literal pattern 0. There is nothing relevant to the y attribute. The translation would be

      if isinstance(p,Point) and hasattr(p,'x') and getattr(p,'x') == 0:

  case Point(x=0,y=5):
    matches any expression that is a Point object whose x attribute matches the literal pattern 0 and whose y attribute matches the literal pattern 5.
    The translation would be

      if isinstance(p,Point) and hasattr(p,'x') and getattr(p,'x') == 0 and hasattr(p,'y') and getattr(p,'y') == 5:

  case Point(x=x1,y=y1) if x1 == y1:
    matches any Point object whose x and y coordinates are the same; here the x attribute is matched to the name x1, the y attribute is matched to the name y1, and the guard checks for equality between the values bound to these two names; also,

  case Point(x=x,y=y) if x == y:
    has the same meaning, matching the attributes x and y with the names x and y, and then checking equality. Python can translate this version of the case into the following code.

      _done = False
      if isinstance(p,Point) and hasattr(p,'x') and hasattr(p,'y'):
          x = getattr(p,'x')
          y = getattr(p,'y')
          if x == y:
              _done = True
              body of case statement

It is sometimes useful to specify an attribute ordering so that patterns can be positional instead of named. If in the Point class we specified the class attribute __match_args__ as

  __match_args__ = ('x', 'y')

then we could specify

  case Point(0)            same as   case Point(x=0)
  case Point(0,5)          same as   case Point(x=0,y=5)
  case Point(x,y) GUARD    same as   case Point(x=x,y=y) GUARD

This feature is especially useful if there are few attributes and/or they have a standard order (as above, where typically x coordinates precede y coordinates).

VIII. Miscellaneous

When a match/case statement is finished executing, any names bound by the matching process retain their bindings. This is similar to a for loop. So

  for i in range(1,3):
      print(i)
  print(i)

prints 1, then 2, then finishes the loop, then prints 2 again: the loop never rebinds i after binding it to 2, so i retains that binding after the loop.

Likewise,

  direction = None
  match command.split():
      case ['move', ('forward' | 'backward') as direction]:
          robot.move(direction)
      case ['turn', ('left' | 'right') as direction]:
          robot.turn(direction)
      case _:
          robot.stop()   # unknown command
  print(direction)

executes a robot command and then prints the binding of direction: if it is not bound in the first or second case, it is not bound in the match/case statement, and the code therefore prints None, its initial value.

Remember that the order of case patterns is important. Python stops executing the match/case statement after executing the block associated with the first matching pattern. If every value that matches a later pattern also matches an earlier pattern, the later case can never execute; if that can happen, it is likely you want to change the order of the case patterns (which is why wildcard patterns should always appear last). Typically case patterns appear from most to least specific. For example

  case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction, 'count' : count}: ...
  case {'action' : 'move', 'direction' : ('forward' | 'backward') as direction}: ...

Finally, we can use tuples in patterns instead of lists: both specify sequences of patterns.

------------------------------------------------------------------------------

Notes to Remove:

Example: check isinstance + len for unpacking vs just check whether unpacking works based on len.

"Unify"/match the match expression with case (with variable place holders?)

match expression produces the subject; finds first matching case pattern and executes its block

match/case are soft (not hard) keywords

Order cases from simplest to most complex (matches first/efficiency)

VARIABLE BINDINGS OUTLIVE THEIR RESPECTIVE MATCH/CASE STATEMENTS

Guards are arbitrary expressions attached to a pattern that must evaluate to a truthy value for the pattern to match; supported on the level of case clauses
  Syntax: pattern if test

Patterns: impose structural constraints, and say which data is extracted to bind variables
  avoid side effects: no binding of attributes/subscripts
  not expressions: think of as declaring elements similar to formal parameters
  2 categories: (structural) constraint | capture
    a capture pattern binds without regard to structure or values

  "as" pattern includes multiple options (or is |) and "as binding" for the chosen option; there are or patterns, but not and/not patterns
  literal patterns: match with == (e.g., 1 and 1.); f-strings aren't literal patterns
  capture patterns: name accepting value (must not be nonlocal or global), like a parameter in a function call (must be unique names)
  wildcard pattern: _ (non binding): [a, *_, b] matches a sequence of 2 or more elements, whereas [_] matches a 1-element sequence; _ alone matches anything
  value patterns: could bypass, use binding + guard using ==
  sequence patterns: can use tuple/list, only one *, accessed by [], not iteration
  mapping patterns: keys are literals or value patterns; values are values or names that can be bound
  class patterns: check for class and/or extract attributes
    Node(left=x,right=y) matches Node and binds object's left with x, right with y (note: binding is left to right!)
    class field __match_args__ specifies sequence of arguments allowing class patterns to rely on positional subpatterns

------------------------------------------------------------------------------

Examples:

Capture, literal, and Wildcard patterns e.g.: literals, variables to be bound, *, wildcard: implicit length match

  match command.split():
      case ['print']: ...
      case ['go', direction]: ...
      case ['drop', *objects]: iterate through objects
      case _:                  # wildcard; matches but does not bind

Or patterns with |: alternatives must bind the same variables!

      case ['north'] | ['go','north'] : ...

As pattern: capturing matched subpattern

      case ['go', ('north' | 'south' | 'east' | 'west')]:              ...which direction
      case ['go', ('north' | 'south' | 'east' | 'west') as direction]: use direction

Guards: adding conditions to patterns

      case ['go', direction] if direction in ('north', 'south', 'east', 'west'): ...which direction
      case ['go', direction] if direction in current_room.exits()

Class recognition and variable binding

      case Click(position=(x,y)):   Click object/tuple position attribute binding x,y
      case KeyPress():              KeyPress object
      case KeyPress(key_name='Q'):  KeyPress object/key_name attribute == to 'Q'
                                    note 'Q' is not bindable, which changes the meaning of =
      case Click((x,y)):            See use of __match_args__ above
      case Click((x,y), button=Button.LEFT):
                                    will check == for Button.LEFT
                                    if just a name, will CAPTURE value to name

Mapping: recognize subset in mapping, literal key : literal or capture; note _ means key present but value not captured

Built-in classes

      case {'text': message, 'color' : c} if type(message) is str and type(c) is str
      case {'text': str() as message, 'color' : str() as c}
      case {'text': str(message), 'color' : str(c)}
        for many builtin classes (which are in PEP 634, but I cannot find)

------------------------------------------------------------------------------