Complexity of Python Operations

In this lecture we will learn the complexity classes of various operations on
Python data types, and learn how to combine these complexity classes to
compute the complexity class of all the statements in a function, and
therefore the complexity class of the function.

------------------------------------------------------------------------------
Python Complexity Classes

In ICS-46 we will write low-level implementations of all of Python's data
types and see/understand WHY these complexity classes apply. For now we just
need to try to absorb (not memorize) this information, with some -but minimal-
justification.

In all these examples, N = len(data-type). The operations are organized by
increasing complexity.

Lists:
                               Complexity
Operation     | Example      | Class         | Notes
--------------+--------------+---------------+-------------------------------
Index         | l[i]         | O(1)          |
Store         | l[i] = 0     | O(1)          |
Length        | len(l)       | O(1)          |
Append        | l.append(5)  | O(1)          |
Clear         | l.clear()    | O(1)          | similar to l = []
Slice         | l[a:b]       | O(b-a)        | l[:] is O(len(l)-0) = O(N)
Extend        | l.extend(...)| O(len(...))   | depends on the len of extension
Construction  | list(...)    | O(len(...))   |
Insert        | l[a:b] = ... | O(N)          |
Delete        | del l[i]     | O(N)          |
Remove        | l.remove(...)| O(N)          |
Containment   | x in/not in l| O(N)          |
Copy          | l.copy()     | O(N)          | same as l[:]
Pop           | l.pop(...)   | O(N)          | l.pop() (from the rear) is O(1)
Extreme value | min(l)/max(l)| O(N)          |
Reverse       | l.reverse()  | O(N)          |
Iteration     | for v in l:  | O(N)          |
Sort          | l.sort()     | O(N Log N)    | key/reverse doesn't change this
Multiply      | k*l          | O(k N)        | 5*l is O(N); len(l)*l is O(N**2)

Tuples support all operations that do not mutate the data structure (and with
the same complexity classes).

Sets:
                               Complexity
Operation     | Example      | Class         | Notes
--------------+--------------+---------------+-------------------------------
Length        | len(s)       | O(1)          |
Add           | s.add(5)     | O(1)          |
Containment   | x in/not in s| O(1)          |
Remove        | s.remove(5)  | O(1)          |
Discard       | s.discard(5) | O(1)          |
Pop           | s.pop()      | O(1)          |
Clear         | s.clear()    | O(1)          | similar to s = set()
Construction  | set(...)     | O(len(...))   |
<=/<          | s <= t       | O(len(s))     | issubset
>=/>          | s >= t       | O(len(t))     | issuperset; s <= t == t >= s
Union         | s | t        | O(len(s)+len(t))
Intersection  | s & t        | O(min(len(s),len(t)))
Difference    | s - t        | O(len(s))     |
Symmetric Diff| s ^ t        | O(len(s))     |
Iteration     | for v in s:  | O(N)          |
Copy          | s.copy()     | O(N)          |

Frozen sets support all operations that do not mutate the data structure (and
with the same complexity classes).

Dictionaries: dict and defaultdict
                               Complexity
Operation     | Example      | Class         | Notes
--------------+--------------+---------------+-------------------------------
Index         | d[k]         | O(1)          |
Store         | d[k] = v     | O(1)          |
Length        | len(d)       | O(1)          |
Delete        | del d[k]     | O(1)          |
get/setdefault| d.method     | O(1)          |
Pop           | d.pop(k)     | O(1)          |
Pop item      | d.popitem()  | O(1)          |
Clear         | d.clear()    | O(1)          | similar to d = {} or = dict()
Views         | d.keys()     | O(1)          |
Construction  | dict(...)    | O(len(...))   |
Iteration     | for k in d:  | O(N)          | all forms: keys, values, items

Note that for i in range(...) is O(len(...)). Also, if len(alist) is N, then

  for i in range(len(alist)):

is O(N) because it loops N times. Of course even

  for i in range(len(alist)//2):

is O(N) because it loops N/2 times, and dropping the constant 1/2 makes it
O(N).
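To make these tables concrete, here is a small timing sketch (our own
illustration, not part of the tables above; the sizes and variable names are
arbitrary choices, and only the standard timeit module is assumed). It
contrasts O(N) list containment with O(1) set containment:

  import timeit

  for n in [1_000, 10_000, 100_000]:
      l = list(range(n))
      s = set(l)
      missing = -1                     # absent value: worst case for 'in'
      list_t = timeit.timeit(lambda: missing in l, number=100)  # O(N) scan
      set_t  = timeit.timeit(lambda: missing in s, number=100)  # O(1) lookup
      print(f'N = {n:>7}: list {list_t:.4f}s   set {set_t:.6f}s')

Each tenfold increase in N should multiply the list times by roughly 10,
while the set times stay essentially flat.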
------------------------------------------------------------------------------
Composing Complexity Classes: Sequential and Nested Statements

In this section we will learn how to combine complexity information about
simple operations into complexity information about complex operations
(composed from simple operations). The goal is to be able to determine the
complexity class of a function by analyzing its statements.

Using big-O notation

  O(f(n)) + O(g(n)) is O( f(n) + g(n) )

That is, when adding complexity classes we bring the two complexity classes
inside the O(...). Ultimately, O( f(n) + g(n) ) results in the bigger of the
two complexity classes (because we drop the lower added term). So,

  O(N) + O(Log N) = O(N + Log N) = O(N)

because N is the faster growing function.

This rule helps us understand how to compute the complexity of doing some
SEQUENCE of operations: executing some statements that are O(f(n)) followed
by executing some statements that are O(g(n)). Executing all the statements
SEQUENTIALLY is O(f(n)) + O(g(n)), which is O( f(n) + g(n) ).

For example, if some function call f() is O(N) and another function call g()
is O(N Log N), then doing the sequence

  f()
  g()

is O(N) + O(N Log N) = O(N + N Log N) = O(N Log N). Of course, doing the
sequence

  f()
  f()

is O(N) + O(N), which is O(N + N), which is O(2N), which is O(N).

Likewise, using big-O notation

  O(f(n)) * O(g(n)) is O( f(n) * g(n) )

If we repeat an O(f(N)) process O(N) times, the resulting complexity is
O(N)*O(f(N)) = O( Nf(N) ). For example, if some function call f() is
O(N**2), then executing that call N times (in the following loop)

  for i in range(N):
      f()

is O(N)*O(N**2) = O(N*N**2) = O(N**3).

Compound statements can be analyzed by composing the complexity classes of
their constituent statements: for sequential statements the complexity
classes are added; for nested statements the complexity classes are
multiplied.

Let's analyze three functions that compute whether or not a list contains
only unique values. We assume in all three examples that len(alist) is N.

1) A list is unique if each value in the list does not occur in any later
indexes: alist[i+1:] is a list containing all values after the one at
index i.

  def is_unique (alist : [object]) -> bool:
      for i in range(len(alist)):      O(N)
          if alist[i] in alist[i+1:]:  O(N) - copying/in both: O(N)+O(N) = O(N)
              return False             O(1) - not executed in worst case
      return True                      O(1)

The complexity is O(N) * O(N) + O(1) = O(N**2). So if we double the length
of the list, this function takes 4 times as long to execute.

2) A list is unique if when we sort its values, no adjacent values are
equal. If there were duplicate values, sorting the list would put duplicates
next to each other. Here we copy the list so as to not change the order of
the parameter's list.

  def is_unique2 (alist : [object]) -> bool:
      copy = list(alist)               O(N)
      copy.sort()                      O(N Log N) - for Python sorting
      for i in range(len(alist)-1):    O(N) - really N-1, but that is O(N)
          if copy[i] == copy[i+1]:     O(1)
              return False             O(1) - not executed in worst case
      return True                      O(1)

The complexity is O(N) + O(N Log N) + O(N)*O(1) + O(1) =
O(N + N Log N + N*1 + 1) = O(N Log N). So this complexity class is lower,
and sorting is the dominant operation. If we double the length of this list,
this function takes a bit more than twice the amount of time. In N Log N: N
doubles and Log N gets a tiny bit bigger (i.e., Log 2N = 1 + Log N; e.g.,
Log 2000 = 1 + Log 1000 = 11, so compared to 1000 Log 1000, 2000 Log 2000
got 2.2 times bigger, or 10% bigger than doubling).
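We can check both doubling predictions empirically. Here is a minimal test
harness (our own sketch; the list lengths and repetition count are arbitrary,
and it assumes is_unique and is_unique2 above have been entered without their
complexity annotations). It times each function on worst-case, all-unique
shuffled lists of length N and 2N; the O(N**2) version should slow down by
roughly 4x, the O(N Log N) version by a bit more than 2x:

  import random, timeit

  small = random.sample(range(2_000), 2_000)   # all unique, shuffled
  big   = random.sample(range(4_000), 4_000)   # twice as long

  for f in (is_unique, is_unique2):            # the two functions above
      t1 = timeit.timeit(lambda: f(small), number=10)
      t2 = timeit.timeit(lambda: f(big),   number=10)
      print(f'{f.__name__}: doubling N multiplied time by {t2/t1:.1f}')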
3) A list is unique if when we turn it into a set, its length is the same
(no duplicate values were added to the set).

  def is_unique3 (alist : [object]) -> bool:
      aset = set(alist)                O(N)
      return len(aset) == len(alist)   O(1)

The complexity is O(N) + O(1) = O(N + 1) = O(N). So this complexity class is
lower yet. If we double the length of this list, this function takes twice
the amount of time. We could write the body of this function more simply as:
return len(set(alist)) == len(alist).

------------------------------------------------------------------------------
A Collection Class Example:

We will now look at the solution of a few problems (combining operations on
a priority queue: pq) and how the complexity class of the result is affected
by three different classes/implementations of priority queues. For the
problems, all we need to know is the complexity class of the "add" and
"remove" operations.

                       add           remove
                 +-------------+-------------+
Implementation 1 |    O(1)     |    O(N)     |
                 +-------------+-------------+
Implementation 2 |    O(N)     |    O(1)     |
                 +-------------+-------------+
Implementation 3 |  O(Log N)   |  O(Log N)   |
                 +-------------+-------------+

So, Implementation 1 works by adding the new value into the pq by appending
at the rear of a list or the front of a linked list: O(1); it removes the
highest priority value by scanning through the list or linked list to find
the highest, which is O(N), and then removing that value, also O(N) in the
worst case (removing at the front of a list; at the rear of a linked list).

Implementation 2 adds the new value into the pq by scanning the list or
linked list for the right spot to put it and putting it there, which is
O(N). Lists store their highest priority at the rear, linked lists at the
front; it removes the highest priority value from the rear or front, which
is O(1).

Implementation 3, which is discussed in ICS-46, uses a binary heap tree (not
a binary search tree) to implement both operations with "middle" complexity
O(Log N): greater than O(1) but less than O(N).

Problem 1: Suppose we wanted to use the priority queue to sort N values: we
add N values and then remove N values (first the highest, next the second
highest, ...). Here is the complexity of these combined operations for each
implementation.

Implementation 1: O(N)*O(1)     + O(N)*O(N)     = O(N)       + O(N**2)    = O(N**2)
Implementation 2: O(N)*O(N)     + O(N)*O(1)     = O(N**2)    + O(N)       = O(N**2)
Implementation 3: O(N)*O(Log N) + O(N)*O(Log N) = O(N Log N) + O(N Log N) = O(N Log N)

Here, Implementation 3 has the lowest complexity for the combined
operations. Implementations 1 and 2 each do one operation quickly but the
other slowly; since the slowest operation determines the complexity class,
both are equally slow. The complexity class O(Log N) is between O(1) and
O(N); surprisingly, it is actually "closer" to O(1) than O(N), even though
it does grow -because it grows so slowly.
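Here is a minimal sketch of Implementations 1 and 3 (our own illustrative
classes, assuming numeric values so the min-heap negation trick works; only
the standard heapq module is assumed), along with Problem 1's pq-based sort
built from them:

  import heapq

  class PQ1:                           # Implementation 1
      def __init__(self):
          self._data = []
      def add(self, v):                # O(1): append at the rear
          self._data.append(v)
      def remove(self):                # O(N): scan for the max, then delete it
          i = self._data.index(max(self._data))
          return self._data.pop(i)

  class PQ3:                           # Implementation 3
      def __init__(self):
          self._heap = []
      def add(self, v):                # O(Log N): sift up in a binary heap
          heapq.heappush(self._heap, -v)    # negate: heapq is a min-heap
      def remove(self):                # O(Log N): sift down in a binary heap
          return -heapq.heappop(self._heap)

  def pq_sort(values, pq):             # Problem 1: N adds, then N removes
      for v in values:
          pq.add(v)
      return [pq.remove() for _ in values]   # highest first

  print(pq_sort([3, 1, 4, 1, 5], PQ1()))     # [5, 4, 3, 1, 1]: O(N**2)
  print(pq_sort([3, 1, 4, 1, 5], PQ3()))     # [5, 4, 3, 1, 1]: O(N Log N)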
Problem 2: Suppose we wanted to use the priority queue to find the 10
biggest (of N) values: we would enqueue N values and then dequeue 10 values.
Here is the complexity of these combined operations for each implementation.

Implementation 1: O(N)*O(1)     + O(10)*O(N)     = O(N)       + O(N)     = O(N)
Implementation 2: O(N)*O(N)     + O(10)*O(1)     = O(N**2)    + O(1)     = O(N**2)
Implementation 3: O(N)*O(Log N) + O(10)*O(Log N) = O(N Log N) + O(Log N) = O(N Log N)

Here, Implementation 1 has the lowest complexity for the combined
operations. That makes sense, as the operation done many times (add) is very
simple (add to the end of a list/the front of a linked list) and the
operation done a constant number of times (10, independent of N) is the
expensive operation (remove). It even beats the complexity of Implementation
3. So, as N gets bigger, Implementation 1 becomes faster than the other two.

So, sometimes there is NOT a "best all the time" implementation. We need to
know what problem we are solving (the complexity classes of all the
operations in various implementations and the number of times we must do
these operations) to choose the best implementation for solving the problem.
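To close, here is Problem 2 in code, reusing the illustrative PQ1 class from
the sketch above. The standard function heapq.nlargest, which runs in
roughly O(N Log 10) = O(N), cross-checks the answer:

  import heapq, random

  values = [random.random() for _ in range(100_000)]

  pq = PQ1()                           # O(1) add / O(N) remove, as above
  for v in values:                     # N adds: O(N) total
      pq.add(v)
  biggest10 = [pq.remove() for _ in range(10)]   # 10 removes: O(10*N) = O(N)

  assert biggest10 == heapq.nlargest(10, values)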