Analysis of Algorithms

Analysis of Algorithms is a mathematical area of Computer Science in which we
analyze the resources (mostly time, but sometimes space) used by algorithms to
solve problems. An algorithm is a precise procedure for solving a problem,
written in any notation that humans understand (and thus can carry out the
algorithm): if we write an algorithm as code in some programming language, then
a computer can execute it too.

The main tool that we use to analyze algorithms is big-O notation, which means
"grows on the order of". We use big-O notation to characterize the performance
of an algorithm by placing it in a complexity class (most often based on its
WORST-CASE behavior, but sometimes on its AVERAGE-CASE behavior) when solving a
problem of size N: we will learn how to characterize the size of a problem,
which is most often as simple as the number of values N in a list. Once we know
the complexity class of an algorithm, we have a good handle on understanding
its performance behavior (within certain limits). Thus, we don't necessarily
compute the exact resources needed, but typically an approximate bound on the
resources.

------------------------------------------------------------------------------
Getting to Big-O Notation: Throwing away Irrelevant Details

Here is a simple Python function for computing the maximum of a list.

def maximum(alist):
    answer = None if alist == [] else alist[0]
    for i in range(1, len(alist)):
        if alist[i] > answer:
            answer = alist[i]
    return answer

Often, the problem size is the number of values processed: e.g., the number of
values in a list or in a file. But we can use other measures as well: it can be
the number of digits in an integer, when looking at the complexity of
multiplication based on the size of the numbers. Thus, there is no single
measure of size that fits all problems: instead, we try to choose a measure
that makes sense for the problem.

Python translates functions like maximum into a sequence of instructions that
the computer executes. To solve a problem, the computer always executes an
integer number of instructions. For simplicity we will assume that all
instructions take the same amount of time to execute, so computing the amount
of time it takes to solve a problem is equivalent to knowing how many
instructions the computer must execute (which we can divide by the number of
instructions/second the machine executes).

Again, we typically look at the worst-case behavior of algorithms. For maximum
the worst case occurs if the list is in increasing order: each new value
examined in the list is bigger than the previous one, so the if statement must
update answer every time. If any value were lower, the code wouldn't have to
update answer and thus would take fewer instructions/less time to execute.
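To see this worst-case behavior concretely, here is a small counting sketch
(the count_updates helper and its counter are mine, not part of these notes):
it runs the same loop as maximum but counts how many times answer is updated,
for an increasing list versus a decreasing list of the same N values.

def count_updates(alist):
    # same logic as maximum, but counts how often the if statement's body runs
    answer = None if alist == [] else alist[0]
    updates = 0
    for i in range(1, len(alist)):
        if alist[i] > answer:
            answer = alist[i]
            updates += 1
    return updates

N = 1000
print(count_updates(list(range(N))))         # increasing list: N-1 updates (worst case)
print(count_updates(list(range(N, 0, -1))))  # decreasing list: 0 updates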
It turns out that for a list of N values, the computer executes 14N + 9
instructions in the worst case for this function. You need to know more CS than
you do at this time to determine this formula, but you will get there by
ICS 51. A simple way to think about this formula is that there are 14 computer
instructions that are executed for/in the loop and 9 instructions that deal
with starting and terminating the function.

We can say I(N) = 14N + 9 for the worst case of the maximum function, where
I(N) is the number of instructions the computer executes. I would like to argue
now that if we simplify this function to just I(N) = 14N, we have not lost much
information. Let's look at the result of this function vs. the original as N
gets bigger and bigger.

    N    |  14N + 9  |    14N    | error: (14N+9 - 14N)/(14N+9) as a %
---------+-----------+-----------+-------------------------------------
       1 |        23 |        14 | 39%
      10 |       149 |       140 | 6%
     100 |     1,409 |     1,400 | .6%
    1000 |    14,009 |    14,000 | .06%

So by the time we are processing a list of 1,000 values, using the formula 14N
instead of 14N+9 is 99.94% accurate. For computers solving real problems, a
list of 1,000 values is small.

Analysis of Algorithms really should be referred to as ASYMPTOTIC Analysis of
Algorithms, as it is mostly concerned with the performance of algorithms as the
problem size gets very big (N -> infinity). We see here that as N -> infinity,
14N is a better and better approximation to 14N+9: dropping the extra 9 becomes
less and less important.

A simple function for sorting is the following. It is much simpler than the
real sort method in Python (and takes much more time), but it is a good
starting point for understanding sorting now. If you are interested in how this
function accomplishes sorting, hand simulate it working on a list of 5-10
values: basically, each execution of the outer loop gets the next value in the
list into the correct position.

def sort(alist):
    for base in range(len(alist)):
        for check in range(base+1, len(alist)):
            if alist[base] > alist[check]:
                alist[base], alist[check] = alist[check], alist[base]
    return None  # list is mutated

It turns out that for a list of N values, the computer executes 8N**2 + 12N + 6
instructions in the worst case for this function. The outer loop executes N
times (N is len(alist)) and the inner loop on average executes N/2 times, so
the if statement in the inner loop is executed a quadratic number of times (see
the counting sketch below).

We can say I(N) = 8N**2 + 12N + 6 for the worst case of the sort function,
where I(N) is again the number of instructions the computer executes. I would
like to argue that if we simplify this function to just I(N) = 8N**2, we have
not lost much information. Let's look at the result of this function vs. the
original as N gets bigger and bigger.

    N    |8N**2+12N+6|   8N**2   | error: (12N+6)/(8N**2+12N+6) as a %
---------+-----------+-----------+-------------------------------------
       1 |        26 |         8 | 70%
      10 |       926 |       800 | 14%
     100 |    81,206 |    80,000 | 1.5%
    1000 | 8,012,006 | 8,000,000 | .15%

So by the time we are processing a list of 1,000 values, using the formula
8N**2 instead of 8N**2+12N+6 is 99.85% accurate.

CONCLUSION (though not proven): If the real formula I(N) is a sum of a bunch of
terms, we can drop any term that doesn't grow as quickly as the most quickly
growing term. First, for maximum the linear term 14N grows more quickly than
the next term, the constant 9, which doesn't grow at all (as N grows), so we
drop the 9 term. Second, for sort the quadratic term 8N**2 grows more quickly
than the next two terms, the linear term 12N and the constant 6, so we drop the
12N and 6 terms. In fact, note that the limit as N -> infinity of
12N/(8N**2) = 3/(2N) -> 0, which means we can discard the 12N term. The result
is a simple function that is still an accurate approximation of the number of
computer instructions executed for lists of various sizes.
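Here is the counting sketch mentioned above (the count_comparisons helper and
its counter are mine, not part of these notes): it runs the same loops as sort
but counts how many times the if statement executes. The count grows
quadratically; in fact it is exactly N(N-1)/2, a formula we will derive later
in these notes.

def count_comparisons(alist):
    # same loops as sort, but counts how many times the if comparison runs
    count = 0
    for base in range(len(alist)):
        for check in range(base+1, len(alist)):
            count += 1
            if alist[base] > alist[check]:
                alist[base], alist[check] = alist[check], alist[base]
    return count

for n in [10, 100, 1000]:
    print(n, count_comparisons(list(range(n))), n*(n-1)//2)
# each line shows n, the measured count, and N(N-1)/2; the last two always
# match: 45, 4950, and 499,500 comparisons respectively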
We now will explain a rationale for dropping the constant in front of the N and
the N**2, and for classifying these algorithms as O(N), growing at a linear
rate, and O(N**2), growing at a quadratic rate. Again, O means "grows on the
order of".

1) Assume that every instruction in the computer takes the same amount of time
to execute. Then the time taken for maximum is about (14N)/speed and the time
for sort is about (8N**2)/speed. We should really think about these formulas as
(14/speed)N and (8/speed)N**2. We know the 14 and 8 came from the number of
instructions inside the loops that Python needed to execute: but a different
Python interpreter (or a different language) might generate a different number
of instructions and therefore a different constant. Thus, this number is based
on technology, and we want our analysis to be independent of technology. And,
of course, "speed" changes based on technology too. Since we are trying to come
up with a "science" of algorithms, we don't want our results to depend on
technology, so we are also going to drop the constant in front of the biggest
term: for the reasons explained above, this number is based solely on
technology.

Here is another justification for not being concerned with the constant in
front of the biggest term.

2) A major question we want answered about any algorithm is, "How much more of
a resource does it need when solving a problem TWICE AS BIG?" In maximum, when
N is big (so we can drop the +9 without losing much accuracy) the ratio of the
time to solve a problem of size 2N to the time to solve a problem of size N is
easily computed:

  I(2N)     14(2N)
  -----  ~  ------  ~  2
  I(N)       14N

The ratio is a simple number (no matter how many instructions are in the loop,
since the constant 14 appears as a multiplicative factor in both the numerator
and the denominator). So, we know for this code that if we double the size of
the list, we double the number of instructions that are executed, and thus
double the amount of time (for whatever the speed of the computer is).

Likewise, for sorting we can write

  I(2N)     8(2N)**2
  -----  ~  --------  ~  4
  I(N)       8N**2

Again, the ratio is a simple number, with the constant (no matter what it is)
disappearing. So, we know for this code that if we double the size of the list,
we increase by a factor of 4 the number of instructions that are executed, and
thus increase by a factor of 4 the amount of time (for whatever the speed of
the computer is). Thus, the constant 8 is irrelevant when asking this
"doubling" question.

Note that if we didn't simplify, we'd have

  I(2N)     8(2N)**2 + 12(2N) + 6
  -----  =  ---------------------
  I(N)         8N**2 + 12N + 6

which doesn't simplify easily; although, as N -> infinity, this ratio gets
closer and closer to 4 (and is very close even for small-sized problems; see
the numeric check below).

As with air resistance and friction in physics, typically ignoring the
contribution of these negligible factors (for big, slow-moving objects) allows
us to quickly solve an approximately correct problem.

Using big-O notation, we say that the complexity class of the code to find the
maximum is O(N). The big-O means "grows on the order of" N, which means a
linear growth rate. For the sorting code, its complexity class is O(N**2),
which means it grows on the order of N**2, which means a quadratic growth rate.
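Here is that numeric check (the function name I is mine, echoing the I(N)
notation above): it evaluates the un-simplified ratio I(2N)/I(N) for the sort
formula at increasing values of N, and the ratio approaches 4.

def I(n):
    # the worst-case instruction-count formula claimed for the sort function
    return 8*n**2 + 12*n + 6

for n in [1, 10, 100, 1000, 10000]:
    print(n, I(2*n) / I(n))
# ratios are roughly 2.4, 3.7, 3.97, 3.997, 3.9997: close to 4 even for
# modest N, and getting closer as N grows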
----------
IMPORTANT: To analyze a Python function and compute its complexity class, we
approximate the number of times the most frequently executed statement is
executed, dropping all the lower (more slowly growing) terms and dropping the
constant in front of the most frequently executed statement (the fastest
growing term). The maximum code executes the if statement N times, so it is
O(N). The sorting code executes the if statement N(N-1)/2 times (we will
justify this number below), which is N**2/2 - N/2, so dropping the lower term
and the constant 1/2 yields a complexity class of O(N**2).
----------

------------------------------------------------------------------------------
Comparing Algorithms by their complexity classes

The primary result from this definition is that if two algorithms, a and b,
both solve some problem, and a is in a lower complexity class than b, then for
all big enough N, Ta(N) < Tb(N): here Ta(N) means the time it takes for
algorithm a to solve the problem. Note that nothing here is said about small N;
which algorithm uses fewer resources for small N depends on the actual
constants (and even on the terms that we dropped).

For example, if algorithm a is O(N) with a constant of 100, and algorithm b is
O(N**2) with a constant of 1, then for values of N in the range [1,100],

  Tb(N) = 1N**2 <= 100N = Ta(N)

but for all values bigger than 100,

  Ta(N) = 100N <= 1N**2 = Tb(N)

Again, we use the term "asymptotic" analysis of algorithms to indicate that we
are concerned with the time when N gets very large (going towards infinity), in
which case algorithm a will eventually be better.

What about the constants? It is often the case that the constants of different
algorithms are close (they are often just the number of instructions in the
main loop of the code). So the complexity classes are a good indication of
faster vs. slower algorithms for all but the smallest values of N.

Although all possible mathematical functions might represent complexity classes
(and many strange ones do), we will mostly restrict our attention to the
following complexity classes. Note that complexity classes can interchangeably
represent computing time, the number of machine operations executed, and such
more nebulous terms as "effort" or "work" or "resources".

As we saw before, a fundamental question about any algorithm is, "What is the
time needed to solve a problem twice as big?" We will call the answer the
SIGNATURE of the complexity class (knowing this value empirically often allows
us to determine the complexity class as well).

Class   | Algorithm Example                               | Signature
--------+-------------------------------------------------+--------------------
O(1)    | passing an argument to a parameter              | T(2N) = T(N)
O(LogN) | binary searching of a sorted list               | T(2N) = c + T(N)
O(N)    | linear searching a list (the in operator)       | T(2N) = 2T(N)
O(NLogN)| fast sorting                                    | T(2N) = cN + 2T(N)

  Fast algorithms come before here; N Log N grows only a bit more quickly than
  linearly (because logarithms grow so slowly) and nowhere near as fast as
  O(N**2).

O(N**2) | slow sorting; scanning N times a list of size N | T(2N) = 4T(N)
O(N**3) | slow matrix multiplication                      | T(2N) = 8T(N)
O(N**m) | for some fixed m: 4, 5, ...                     | T(2N) = (2**m)T(N)

  Tractable algorithms come before here; their work is polynomial in N.

O(2**N) | finding boolean values that satisfy a formula   | T(2N) = (2**N)T(N)

For example, for an O(N**2) algorithm, doubling the size of the problem
quadruples the time required: T(2N) ~ c(2N)**2 = c4N**2 = 4cN**2 = 4T(N).
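These signatures are easy to check numerically. The following sketch (the
growth dictionary and its representative formulas are my choices, not part of
these notes) evaluates T(2N)/T(N) for several growth rates at a large N; the
ratios behave as the signature column says: logarithms barely grow, linear
doubles, N Log N slightly more than doubles, N**2 quadruples, and N**3 goes up
eightfold.

from math import log2

N = 1_000_000
growth = {
    'O(LogN)':  lambda n: log2(n),
    'O(N)':     lambda n: n,
    'O(NLogN)': lambda n: n * log2(n),
    'O(N**2)':  lambda n: n**2,
    'O(N**3)':  lambda n: n**3,
}
for name, T in growth.items():
    # the ratio T(2N)/T(N); any constant c in front of T cancels out
    print(name, round(T(2*N) / T(N), 2))
# prints approximately 1.05, 2.0, 2.1, 4.0, and 8.0 respectively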
Note that in Computer Science, logarithms are mostly taken to base 2.
(Remember that algorithms and logarithms are very different terms.) All
logarithms here are implicitly to base 2 (e.g., Log N = Log2 N). You should
memorize and be able to use the following facts to compute some logarithms
without a calculator.

  Log 1000 ~ 10           Actually, 2**10 = 1,024, so 2**10 approximates 1,000
                          with less than a 3% error.

  Log a**b = b Log a      or, more usefully, Log 1000**N = N Log 1000; so ...

  Log 1,000,000     ~ 20  (because 1,000,000     = 1,000**2)
  Log 1,000,000,000 ~ 30  (because 1,000,000,000 = 1,000**3)

So note that Log is a very slowly growing function. When we increase from
Log 1,000 to Log 1,000,000,000 (the argument grows by a factor of 1 million),
the result only goes up from 10 to 30 (a factor of 3).

In fact, we can compute these logarithms on any calculator that computes Log in
some base, because Log (base b) X = Log (base a) X / Log (base a) b. So,
Log (base b) X is just a constant times Log (base a) X, and logarithms to any
base are really all in the same complexity class, because they differ only by a
multiplicative constant. For example,

  Log(base 10) X = Log(base 2) X / Log(base 2) 10 ~ .3 Log(base 2) X

----------
IMPORTANT: If we can demonstrate that doubling the size of the input
approximately quadruples the time of the algorithm, then the algorithm is
O(N**2). We can use the signatures shown above for other complexity classes as
well. Thus, even if we cannot mathematically analyze the complexity class of an
algorithm by inspecting its code, we can measure it running on various sized
problems (doubling the size over and over again) and use the signature
information to approximate its complexity class.
----------

We can easily use knowledge of the complexity class of an algorithm to predict
its actual running time on a computer as a function of N. For example, if we
know the complexity class of algorithm a is O(N**2), then we know that
Ta(N) ~ cN**2 for some constant c. The constant c represents the "technology"
used: the language, interpreter, machine speed, etc.; the N**2 (from O(N**2))
represents the "science/math" part.

Now, given this information, we can time the algorithm for some large value of
N. Let's say for N = 10,000 (which is actually a pretty small N these days) we
find that Ta(10,000) is 4 seconds.

First, if I asked you to estimate Ta(20,000) you'd immediately know it is about
16 seconds (doubling the input of an O(N**2) algorithm approximately increases
the running time by a factor of 4).

Second, we can solve for c: substituting 10,000 for N and 4 for Ta(N) in
Ta(N) ~ cN**2 gives 4 ~ c x 10,000**2, so c ~ 4x10**-8. By measuring the
run-time of this code, we can calculate the constant c, which incorporates all
the technology (language, compiler, computer speed, etc.). Roughly, we can
think of c as the amount of time it takes to do one loop (# of instructions per
loop / speed of executing instructions), where the algorithm requires N**2
loops to do all its work. Therefore, Ta(N) ~ 4x10**(-8) x N**2.

So, if asked to estimate the time to process 1,000,000 (10**6) values (100
times more than 10,000), we'd have

  Ta(10**6) ~ 4x10**(-8) x (10**6)**2
  Ta(10**6) ~ 4x10**(-8) x 10**12
  Ta(10**6) ~ 4x10**4, or about 40,000 seconds (about 1/2 a day)

Notice that solving a problem 100 times as big takes 10,000 (which is 100**2)
times as long, as we would expect for an O(N**2) algorithm.

In fact, while we often analyze code to determine its complexity class, if we
don't have the code (or find it too complicated to analyze) we can double the
input size a few times and see whether we can "fit the resulting times" to any
of the standard signatures, to estimate the complexity class of the algorithm.
We should do this for some N that is as large as reasonable (taking some number
of seconds to solve on the computer).
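Here is a minimal sketch of that measuring idea (the time_one helper is mine,
and it reuses the slow sort function defined earlier in these notes): time the
function on a problem of size N and of size 2N, and compare the ratio of the
times to the signatures above.

import random
import time

def time_one(func, n):
    # build a random list of n values and time one call of func on it
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start

n = 2000                  # large enough that the timings aren't pure noise
t1 = time_one(sort, n)    # assumes the sort function from earlier is defined
t2 = time_one(sort, 2*n)
print(t2 / t1)            # typically close to 4: the O(N**2) signature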
Note that for an O(2**N) algorithm, if we double the size of the problem from
100 to 200 values, the amount of time needed goes up by a factor of 2**100,
which is ~1.3x10**30. Notice that adding just one more value to process doubles
the time: this "exponential" time is the opposite of logarithmic time in terms
of its growth rate: it grows incredibly quickly.

Note too that it is important to be able to analyze the following code. Notice
that the upper bound of the inner loop (i) is changed by the outer loop.

for i in range(N):
    for j in range(i):
        body

How many times does the "body" of the loop get executed?

  When the outer loop index i is 0, "body" gets executed 0 times;
  when the outer loop index i is 1, "body" gets executed 1 time;
  when the outer loop index i is 2, "body" gets executed 2 times;
  ...
  when the outer loop index i is N-1 (as big as i gets), "body" gets executed
  N-1 times.

So, in total, "body" gets executed 0 + 1 + 2 + 3 + ... + N-1 times, or just
1 + 2 + 3 + ... + N-1 times.

There is a simple, general closed-form solution for adding up consecutive
integers. Here is the proof that 1 + 2 + 3 + ... + N = N*(N+1)/2.

Let S = 1 + 2 + 3 + ... + N-1 + N. Since the order of the numbers makes no
difference to the sum, we also have S = N + N-1 + ... + 3 + 2 + 1. If we add
the two right-hand sides (column by column) we have

   S =    1  +   2   + ... +  N-1  +   N
   S =    N  +  N-1  + ... +   2   +   1
  ----------------------------------------
  2S = (N+1) + (N+1) + ... + (N+1) + (N+1)

That is, each pair in a column sums to N+1, and there are N pairs to sum. Since
there are N pairs, each summing to N+1, the right-hand side simplifies to
N*(N+1), so 2S = N*(N+1), and therefore

  S = N(N+1)/2 = N**2/2 + N/2

Thus, S is O(N**2): with a constant of 1/2 and a term of N/2 that is dropped
(because its order is lower than that of N**2). Note that either N or N+1 is an
even number, so dividing their product by 2 always produces an integer, as it
must for a sum of integers: 6*7/2 = 21.

So, looking back at the example code above, the total number of times the body
gets executed is 0 + 1 + 2 + ... + N-1, which is the same as
1 + 2 + ... + N-1, so plugging N-1 in for N in the formula we have
(N-1)(N-1+1)/2 = N**2/2 - N/2, which is O(N**2).

We can apply this formula to putting N values at the end of a linked list that
is initially empty (and has no cached reference to the last node). To put in
the 1st value requires skipping 0 nodes; to put in the 2nd value requires
skipping 1 node; to put in the 3rd value requires skipping 2 nodes; ...; to put
in the Nth value requires skipping N-1 nodes. So the number of nodes skipped is
(N-1)N/2, and building a linked list in this way is in the O(N**2) complexity
class.

Fast Searching and Sorting:

There are obvious algorithms for searching a list in complexity O(N) and
sorting a list in complexity O(N**2). But there are surprisingly better
algorithms for these tasks: searching in O(Log N) if the list is sorted, and
sorting in O(N Log N).

In class, I will briefly discuss the binary searching algorithm for searching a
sorted list: its complexity class is O(Log N). That means that when searching a
list of 1,000,000 values, we must access the list at most 20 times to either
(a) find the index of the value in the list or (b) determine that the value is
not in the list. This is potentially 50,000 times faster than examining one
index after another (which we must do if the list is not sorted)! On large
problems, algorithms in a lower complexity class can execute much faster.
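For reference, here is a minimal sketch of binary search (not necessarily the
exact version presented in class): each probe of the middle element of the
remaining range either finds the target or discards half of that range, which
is why at most about Log N probes are needed.

def binary_search(alist, target):
    # alist must be sorted; returns an index of target, or None if absent
    low, high = 0, len(alist) - 1
    while low <= high:
        mid = (low + high) // 2       # probe the middle of the remaining range
        if alist[mid] == target:
            return mid
        elif alist[mid] < target:
            low = mid + 1             # target can only be in the upper half
        else:
            high = mid - 1            # target can only be in the lower half
    return None

values = list(range(0, 2_000_000, 2))  # 1,000,000 sorted even numbers
print(binary_search(values, 123456))   # 61728
print(binary_search(values, 7))        # None: 7 is not in the list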
You should know that Python's sorting method on lists is O(N Log N), but we
will not discuss the algorithm here. As with binary searching, we will discuss
the details in ICS-46.

Note that we CANNOT perform binary searching efficiently on linked lists,
because we cannot quickly find the middle of a linked list (for Python lists we
just compute the middle index and access the list there). In fact, another
self-referential data structure, the tree, can be used to perform efficient
binary searches.

Sorting is one of the most common tasks performed on computers. There are
hundreds of different sorting algorithms that have been invented and studied.
Many small and easy-to-write sorting algorithms are in the O(N**2) complexity
class (see the sorting code above, for example). Complicated but efficient
algorithms are often in the O(N Log N) complexity class. We will study sorting
in more detail in ICS-46, including a few of these efficient algorithms. For
now, memorize that fast sorting algorithms are in the O(N Log N) complexity
class. If you are ever asked to analyze the complexity class of a task that
requires sorting data as part of the task, assume you can use an O(N Log N)
sorting method.

Closing:

To close for now, finding an algorithm that solves a problem in a lower
complexity class is a big accomplishment; a more minor accomplishment is
decreasing the constant in the same complexity class (certainly useful, but
often based more on technology than science). By knowing the complexity class
of an algorithm we know a lot about the performance of the algorithm
(especially if we measure the time it takes to solve certain sized problems).
We can also reverse the process, and use a few measurements to approximate the
complexity class of an algorithm.

------------------------------------------------------------------------------
Coming up:

In the next two lectures we will learn the complexity classes of many of the
operations in Python, so we can better analyze Python code that uses these
operations. This includes a discussion of hashing, which will reveal why dicts
and sets behave the way they do in Python. We will also look at empirically
running algorithms and timing them.