Analysis of Algorithms (Complexity Classes and Big-O Notation)

Analysis of Algorithms (AA) is a mathematical area of Computer Science in
which we analyze the resources (mostly time, but sometimes space, and at the
chip level even electrical power) used by algorithms to solve problems. An
algorithm is a precise procedure for solving a problem, written in any
notation that humans understand (and thus can carry out the algorithm): if we
write an algorithm as code, in some programming language, then a computer can
execute it too. We are primarily interested in using computers to solve very
large problems, and that is where Analysis of Algorithms works best.

The main tool that we use to analyze algorithms is big-O notation: it means
"growth (in resources, based on the problem size) on the Order of". We use
big-O notation to characterize the performance of an algorithm by placing it
in a complexity class (most often based on its WORST-CASE behavior, but
sometimes on its AVERAGE-CASE or even BEST-CASE behavior) when solving a
problem of size N. We will learn how to characterize the SIZE of a problem,
which is most often as simple as N being the number of values in a
list/set/dictionary/file being processed. Thus, in AA we don't necessarily
compute the exact resources needed, but typically an approximate upper bound
on the resources, based on the problem size. Once we know the complexity
class of an algorithm, we can easily get a good handle on understanding its
actual performance (within certain limits): the time it will take on a
specific computer to solve a specific-sized problem.

------------------------------------------------------------------------------
Getting to Big-O Notation: Throwing away Irrelevant Details

Here is one simple Python function for computing the maximum of a list (or
returning None if there are no values, for an empty list).

  def maximum(alist):
      answer = None if alist == [] else alist[0]
      for i in range(1,len(alist)):
          if alist[i] > answer:
              answer = alist[i]
      return answer

Often, the problem size is the number of values processed: e.g., the number
of values in a list or lines in a file. But we can use other metrics as well:
it can be the number of digits in an integer value, when looking at the
complexity of multiplication based on the size of the numbers. Thus, there is
no single measure of size that fits all problems: instead, for each problem
we try to choose a measure that makes sense and is natural for that problem.

Python translates functions like maximum into a sequence of instructions that
the computer executes (the subject of ICS-51 and one of ICS-33's last week's
lectures). To solve a problem, the computer always executes an integral
number of instructions. For simplicity, we will assume that all instructions
take the same amount of time to execute. So computing the amount of time it
takes to solve a problem is equivalent to computing how many instructions the
computer must execute to solve it, which we can then divide by the number of
instructions per second a machine executes, to compute the actual amount of
time taken to solve the problem on that machine.

Again, we typically look at the worst-case behavior of algorithms. For
maximum, the worst case occurs if the list is in increasing order. In this
case, each new value examined in the list will be bigger than the previous
maximum, so the if statement's condition will always be True, which always
requires updating the answer local variable. If any value were lower, the
code wouldn't have to update answer and thus would execute fewer
instructions/take less time.
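To see the worst case concretely, here is a minimal sketch (a hypothetical
helper, not part of the function above) that also counts how many times
answer is updated; the update count, and thus the work done inside the loop,
is largest for an increasing list.

  def maximum_counting_updates(alist):
      """Like maximum, but also reports how many times answer is updated."""
      answer  = None if alist == [] else alist[0]
      updates = 0
      for i in range(1, len(alist)):
          if alist[i] > answer:
              answer   = alist[i]
              updates += 1                  # extra work done in the worst case
      return answer, updates

  print(maximum_counting_updates([1, 2, 3, 4, 5]))   # (5, 4): updated every time
  print(maximum_counting_updates([5, 4, 3, 2, 1]))   # (5, 0): never updated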
It turns out that for a list of N values, the computer executes 14N + 9
instructions in the worst case for this function. You need to know more CS
than you do at this time to determine this formula, but you will get there by
ICS-51 (and I expect to cover the basics during the last week of the quarter,
when I cover the Python Virtual Machine: it's equivalent to machine code). A
simple way to think about this formula is that there are 14 computer
instructions that are executed each time Python executes the body of the for
loop and 9 instructions that are executed only once: they deal with starting
and terminating the loop and the other parts of the function not in the loop.

We can write I(N) = 14N + 9 for the worst case of the maximum function, where
I(N) is the largest number of instructions the computer executes when solving
a problem on a list with N values. Or, to be more specific, we could include
the name of the function in this formula and write Imaximum(N) = 14N + 9.

I would like to argue now that if we simplify this function to just
I(N) = 14N, we have lost some information, but not much. Specifically, as N
gets bigger (i.e., when we are dealing with very big problems - the kinds
computers are used to solve), 14N and 14N+9 are relatively close. Let's look
at the result of this simplified formula vs. the original as N gets bigger,
for values of N increasing by a factor of 10.

      N     |   14N + 9  |     14N    | error: (14N+9 - 14N)/(14N+9) as a %
  ----------+------------+------------+------------------------------------
          1 |         23 |         14 | 39%       61% accurate
         10 |        149 |        140 | 6%        94% accurate
        100 |      1,409 |      1,400 | .6%       99.4% accurate
      1,000 |     14,009 |     14,000 | .06%      99.94% accurate
  ...
  1,000,000 | 14,000,009 | 14,000,000 | .00006%   99.99994% accurate
  ...

Each line shows the % error of computing 14N when the true answer is 14N + 9.
So by the time we are processing a list of 1,000 values, using the formula
14N instead of 14N+9 is 99.94% accurate. For computers solving real problems,
a list of 1,000 values is small: a list of millions is more normal. For
1,000,000 values, 14N is off by just 9 parts in 14 million. So the 9 doesn't
affect the value of the formula much.

Analysis of Algorithms really should be referred to as ASYMPTOTIC Analysis of
Algorithms, as it is mostly concerned with the performance of algorithms as
the problem size gets very big (N -> infinity). We see here that as
N -> infinity, 14N is a better and better approximation to 14N+9: dropping
the extra 9 becomes less and less important.
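Here is a short computation (just an illustration of the arithmetic in the
table above) of how close the simplified formula 14N is to the full formula
14N + 9 as N grows.

  for N in (1, 10, 100, 1_000, 1_000_000):
      exact      = 14*N + 9
      simplified = 14*N
      accuracy   = simplified / exact * 100
      print(f'N = {N:>9,}: 14N is {accuracy:.5f}% of 14N+9')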
Here is another simple Python function, for sorting a list of values. This is
much simpler than the actual sort method in Python (and the simplicity of the
code results in this function taking much more time, but it is a good
starting point for understanding sorting now; ICS-46 spends a week studying
sorting). If you are interested in how this function accomplishes sorting,
hand simulate it working on a list of 5-10 values (try increasing values,
decreasing values, and values in a random order): basically, each execution
of the outer loop mutates the list so that the next value in the list is in
the correct position.

  def sort(alist):
      for base in range(len(alist)):
          for check in range(base+1,len(alist)):
              if alist[base] > alist[check]:
                  alist[base], alist[check] = alist[check],alist[base]
      return None   # list is mutated

It turns out that for a list of N values, the computer executes
8N**2 + 12N + 6 instructions in the worst case for this function. The outer
loop executes N times (N is len(alist)) and the inner loop on average
executes N/2 times, so the if statement in the inner loop is executed a
quadratic number of times. We can say I(N) = 8N**2 + 12N + 6 for the worst
case of the sort function, where I(N) is again the number of instructions the
computer executes. Or, to be more specific, we would write the formula as
Isort(N) = 8N**2 + 12N + 6.

I would like to argue in the same way that if we simplify this function to
just the dominant term, I(N) = 8N**2, we have not lost much information.
Let's look at the result of this simplified formula vs. the original as N
gets bigger.

    N    | 8N**2+12N+6 |    8N**2    | error: (12N+6)/(8N**2+12N+6) as a %
  -------+-------------+-------------+------------------------------------
       1 |          26 |           8 | 70%      30% accurate
      10 |         926 |         800 | 14%      86% accurate
     100 |      81,206 |      80,000 | 1.5%     98.5% accurate
   1,000 |   8,012,006 |   8,000,000 | .15%     99.85% accurate

So by the time we are processing a list of 1,000 values, using the formula
8N**2 instead of 8N**2 + 12N + 6 is 99.85% accurate. For 1,000,000 values
(10**6), 8N**2 is 8x10**12 while 8N**2 + 12N + 6 is 8x10**12 + 12x10**6 + 6;
the simpler formula is 99.99985% accurate.

CONCLUSION (though not proven): If the real formula I(N) is a sum of a bunch
of terms, we can drop any term that doesn't grow as quickly as the most
quickly growing term. In the maximum function, the linear term 14N grows more
quickly than the next term, the constant 9, which doesn't grow at all (as N
grows), so we drop the 9 term. In the sorting function, the quadratic term
8N**2 grows more quickly than the next two terms, the linear term 12N and the
constant 6, so we drop both the 12N and 6 terms.

In fact, we can prove that the Limit as N->infinity of 12N/(8N**2) = 3/(2N)
-> 0, which means we can discard the 12N term because it grows more slowly
than the 8N**2 term, and will eventually be dominated by this faster growing
term. Of course the Limit as N->infinity of 6/(8N**2) = 3/(4N**2) -> 0 as
well.

The result is a simple function that is still an accurate approximation of
the number of computer instructions executed for lists of various LARGE sizes
for the maximum and sort functions. Each consists of just a constant
multiplied by some simple function of N (here N and N**2, but many other
functions are possible too; we will see others later in this lecture).

We now will explain the rationale for dropping the constant in front of the N
and N**2 terms, and classifying these algorithms as O(N) = growing at a
linear rate and O(N**2) = growing at a quadratic rate. Again, O means "grows
on the order of", so O(N) means execution time grows on the order of N and
O(N**2) means execution time grows on the order of N**2. Technically, it
means "execution time grows no more quickly than the order of".

1) If we assume that every instruction in the computer takes the same amount
of time to execute, then the time taken for maximum is about 14N/speed and
the time for sort is about 8N**2/speed. We should really think about these
formulas as (14/speed)N and (8/speed)N**2.
We know the 14 and 8 came from the number of instructions inside the loops
that Python executes: but a different Python interpreter (or a different
language) might generate a different number of instructions in these loops
and therefore would result in a different constant. Thus, this number is
based purely on TECHNOLOGY, and we want our analysis to be independent of
technology. And, of course, "speed" changes based on technology too, and that
is also part of that constant. Since we are trying to come up with a
"science" of algorithms, we don't want our results to depend on technology,
so we are also going to drop the constant in front of the biggest term. For
the reasons explained above (relating to the instructions generated by Python
and the speed of the machine), that constant is based solely on technology.

Here is another example from technology. Computers can be categorized at a
very high level as either CISC (Complex Instruction Set), RISC (Reduced
Instruction Set), or VLIW (Very Long Instruction Word). ICS-51 will cover the
differences. Let's look at just the first two. A program translated for a
RISC processor will have more instructions than one translated for a CISC
processor: each instruction does something simpler, so there are more of
them. But each RISC instruction (because it is simpler) executes more
quickly. Generally, a program on a RISC machine translates to 2 times as many
instructions, but each instruction runs 3 times as fast. So, in the constant
computing the time, the numerator (instruction count) doubles and the
denominator (speed) triples. Therefore the code will run faster by a factor
of about 1.5. But NONE of this has anything to do with the actual
program/algorithm; it relates only to the technology, so it should not be
part of a science of algorithms.

Here is another justification for not being concerned with the constant in
front of the biggest term.

2) A fundamental question that we want answered about any algorithm is, "How
much MORE resources does it need when solving a problem TWICE AS BIG?" In
maximum, when N is big (so we can drop the +9 without losing much accuracy)
the ratio of the time to solve a problem of size 2N to the time to solve a
problem of size N is easily computed by the following approximation

  Imaximum(2N)      14(2N)
  ------------  ~  --------  ~  2
  Imaximum(N)        14 N

The ratio is always a simple number (here 2), no matter how many instructions
are executed in the loop, since the constant 14 appears as a multiplicative
factor in both the numerator and denominator: so when divided it cancels
itself out. So, we know for this code that if we double the size of the list,
we approximately double the number of instructions that maximum executes to
solve the problem (approximately, because we did drop the 9 term), and thus
approximately double the amount of time (for whatever the speed of the
computer is). Thus, the constant 14 is irrelevant when asking this "doubling"
question.

Likewise, for sorting we can write

  Isort(2N)      8(2N)**2
  ---------  ~  ----------  ~  4
  Isort(N)        8 N**2

Again, the ratio is a simple number (4), with the constant (no matter what it
is) disappearing. So, we know for this code that if we double the size of the
list, we increase by approximately a factor of 4 the number of instructions
that are executed to sort it (approximately, because we did drop the 12N + 6
terms), and thus increase by approximately a factor of 4 the amount of time
(for whatever the speed of the computer is). Thus, the constant 8 is
irrelevant when asking this "doubling" question.

Note that if we didn't simplify, we'd have

  I(2N)     8(2N)**2 + 12(2N) + 6     32N**2 + 24N + 6
  -----  =  ---------------------  =  ----------------
  I(N)        8N**2 + 12N + 6          8N**2 + 12N + 6

which doesn't simplify easily; although, as N->infinity, this ratio gets
closer and closer to 4 (and is close even for pretty small-sized problems).
As with air-resistance and friction in physics, typically ignoring the
contribution of these negligible factors (for big, slow-moving objects)
allows us to quickly and simply solve an approximately correct problem, which
gives us useful information.
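Here is a minimal numeric check (an illustration, not a proof) that the exact
ratio I(2N)/I(N) for the sort formula really does approach 4 as N grows.

  def I(n):                  # exact worst-case instruction count for sort
      return 8*n**2 + 12*n + 6

  for N in (1, 10, 100, 1_000, 10_000):
      print(f'N = {N:>6,}:  I(2N)/I(N) = {I(2*N)/I(N):.4f}')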
Using big-O notation, we say that the complexity class of the code to find
the maximum is O(N). The big-O means "grows on the order of" N, which means a
linear growth rate (double the input size, double the time). For the sorting
code, its complexity class is O(N**2), which means it grows on the order of
N**2, which means a quadratic growth rate (double the input size, quadruple
the time).

----------
IMPORTANT: A Quick way to compute the complexity class of an algorithm

To analyze a Python function's code and compute its complexity class, we
approximate THE NUMBER OF TIMES THE MOST FREQUENTLY EXECUTED STATEMENT IS
EXECUTED, dropping all the lower (more slowly growing) terms and dropping the
constant in front of the most frequently executed statement (the fastest
growing term). We will show how to do this much more rigorously in the next
lecture.

The maximum code executes the if statement N times, so the code is O(N). The
sorting code executes the if statement N(N-1)/2 times (we will justify this
number later in this lecture), which is N**2/2 - N/2, so dropping the lower
term and the constant 1/2 yields a complexity class of O(N**2) for this code.
----------

------------------------------------------------------------------------------
Comparing Algorithms by their complexity classes: When we can/When we cannot

From this definition of complexity classes, we know that if two algorithms,
a and b, both solve some problem, and a is in a lower complexity class than
b, then for all BIG ENOUGH N, Ta(N) < Tb(N): here Ta(N) means the Time it
takes for algorithm a to solve the problem. Note that nothing here is said
about small N; which algorithm uses fewer resources on small problems depends
on the actual constants and lower-order terms that we drop. So COMPLEXITY
CLASSES HAVE MUCH TO SAY ABOUT BIG PROBLEM SIZES BUT LITTLE TO SAY FOR SMALL
PROBLEM SIZES.

For example, if algorithm a is O(N) with a constant of 100, and algorithm b
is O(N**2) with a constant of 1, then for values of N (problem sizes) in the
range [0,100],

  Tb(N) = 1N**2 <= 100N = Ta(N)

but for all values of N (problem sizes) >= 100,

  Ta(N) = 100N <= 1N**2 = Tb(N)

So, for all N >= 100, Ta(N) <= Tb(N).

Contrast this with a second example. If algorithm a is O(N) with a constant
of 1, and algorithm b is O(N**2) with a constant of 10, then for all values
of N (all problem sizes)

  Ta(N) = 1N <= 10N**2 = Tb(N)

In this case, for all N >= 0, Ta(N) <= Tb(N).

These two examples illustrate (but don't prove) that regardless of the
constants, there is always a problem size P such that for all N >= P,
Ta(N) <= Tb(N). This means that as problems get big enough, there will always
be a size at which an O(N) algorithm eventually executes in less time than an
O(N**2) algorithm, and it executes in less time for all problem sizes bigger
than that one. But, in some cases the algorithm in the lower complexity class
can execute in less time for small N (2nd example), and in some cases it can
execute in more time (1st example).
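Here is a quick numeric look (illustration only) at the first example above,
where algorithm a is O(N) with a constant of 100 and algorithm b is O(N**2)
with a constant of 1; the crossover happens at N = 100.

  def Ta(n): return 100 * n     # hypothetical running time of algorithm a
  def Tb(n): return n ** 2      # hypothetical running time of algorithm b

  for n in (10, 50, 100, 1_000, 10_000):
      faster = 'b' if Tb(n) < Ta(n) else 'a (or tie)'
      print(f'N = {n:>6}: Ta = {Ta(n):>9}  Tb = {Tb(n):>9}  faster: {faster}')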
We need information beyond their complexity classes to find out: constants
and lower-order terms. But it is guaranteed that FOR ALL SIZES N BEYOND SOME
PROBLEM SIZE P, the algorithm in the lower complexity class will run faster.
Again, we use the term "asymptotic" analysis of algorithms to indicate that
we are mostly concerned with the time taken by the code when N gets very
large (going towards infinity). In both cases, because of their complexity
classes, algorithm a will eventually take less time to execute than
algorithm b.

What about the constants? Are they likely to be very different in practice?
It is often the case that the constants of different algorithms are close
(say, within a factor of 10 of each other): they are often just the number of
instructions in the innermost loop of the code. So the complexity classes are
a good indication of faster vs. slower algorithms for all but the smallest
problem sizes.

Although all possible mathematical functions might represent complexity
classes (and many strange ones do), we will mostly restrict our attention to
the following complexity classes. Note that complexity classes can
interchangeably represent computing time, the # of machine operations
executed, and such more nebulous terms as "effort" or "work" or "resources".
As we saw before, a fundamental question about any algorithm is, "What is the
time needed to solve a problem twice as big?" We will call this the DOUBLING
SIGNATURE of the complexity class (measuring this value empirically, by
timing the code on problem sizes that increase by a factor of 2, allows us to
know - or at least approximate - the complexity class as well).

  Class   | Algorithm Example                               | Doubling Signature
  --------+-------------------------------------------------+-------------------
  O(1)    | pass argument->parameters: copying a reference   | T(2N) = T(N)
  O(LogN) | binary searching of a sorted list                | T(2N) = c + T(N)
  O(N)    | linear searching a list (the in operator)        | T(2N) = 2T(N)
  O(NLogN)| Fast sorting                                     | T(2N) = cN + 2T(N)

    Fast algorithms come before here; NLogN grows a bit more quickly than
    linearly (because logarithms grow so slowly compared to N) and nowhere
    near as quickly as O(N**2). 1,000 Log 1,000 (for logarithms base 2) is
    about 10,000; 1,000,000 Log 1,000,000 (ditto) is about 20,000,000. So
    increasing N by a factor of 1,000 increases the function by a factor of
    2,000: the linear term grows by a factor of 1,000, while the logarithmic
    term grows by just a factor of 2.

  O(N**2) | Slow sorting; scanning N times a list of size N  | T(2N) = 4T(N)
  O(N**3) | Slow matrix multiplication                       | T(2N) = 8T(N)
  O(N**m) | for some fixed m: 4, 5, ... graph algorithms     | T(2N) = 2**m T(N)

    Tractable algorithms come before here; their work is polynomial in N for
    some fixed power. In the complexity class below, N is an exponent.

  O(2**N) | Finding boolean values that satisfy a formula    | T(2N) = 2**N T(N)

For example, for an O(N**2) algorithm, doubling the size of the problem
quadruples the time required: if T(N) ~ cN**2 then
T(2N) ~ c(2N)**2 = 4cN**2 = 4T(N). So the ratio T(2N)/T(N) ~ 4cN**2/cN**2 = 4.
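Here is a rough numerical illustration of these doubling signatures, using
one simple model function per class (with all constants chosen to be 1) and
comparing T(2N) to T(N).

  import math

  N = 1_000_000
  model = [('O(1)',     lambda n: 1),
           ('O(LogN)',  lambda n: math.log2(n)),
           ('O(N)',     lambda n: n),
           ('O(NLogN)', lambda n: n * math.log2(n)),
           ('O(N**2)',  lambda n: n ** 2),
           ('O(N**3)',  lambda n: n ** 3)]
  for name, T in model:
      print(f'{name:<9} T(2N)/T(N) = {T(2*N) / T(N):.2f}')
  # O(1) stays the same; O(LogN) gains only a constant (ratio barely above 1);
  # O(N) doubles; O(NLogN) slightly more than doubles; O(N**2) quadruples;
  # O(N**3) grows by a factor of 8. O(2**N) is omitted: its T(2N) would be
  # astronomically larger than T(N).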
Note that in Computer Science, logarithms are mostly taken to base 2.
(Remember that algorithms and logarithms are very different terms.) All
logarithms here are implicitly to base 2 (e.g., Log N = Log2 N). You should
memorize and be able to use the following facts to approximate logarithms
without a calculator.

  Log 1,000         ~ 10 : actually 2**10 = 1,024, which approximates 1,000
                           with less than a 3% error
  Log a**b          = b Log a, or more usefully, Log 1000**N = N Log 1000; so...
  Log 1,000,000     = 20 : 1,000,000     = 1,000**2; Log 1000**2 = 2*Log 1000
  Log 1,000,000,000 = 30 : 1,000,000,000 = 1,000**3; Log 1000**3 = 3*Log 1000

So note that Log is a very slowly growing function. When we increase from
Log 1,000 to Log 1,000,000,000 (the argument grows by a factor of 1 million)
the Log grows only from 10 to 30 (by a factor of 3).

In fact, we can compute these logarithms on any calculator that computes Log
in any base. The following shows how to compute Log (base b) in terms of
Log (base a).

  Log (base b) X = Log (base a) X / Log (base a) b

So, Log (base b) X is just a constant (1/Log (base a) b) times Log (base a) X,
so logarithms to any base are really ALL THE SAME COMPLEXITY CLASS (regardless
of the base) because they differ only by a multiplicative constant (and we
ignore such multiplicative constants when writing complexity classes). For
example,

  Log(base 10) X = Log(base 2) X / Log(base 2) 10 ~ .3 Log(base 2) X

So O(Log(base 10) X) = O(.3 Log(base 2) X), which simplifies to
O(Log(base 2) X) by removing the constant.
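A quick calculator-style check (just an illustration) of these approximations
and of the change-of-base rule, using Python's math module:

  import math

  print(math.log2(1_000))              # ~9.97  (we approximate it as 10)
  print(math.log2(1_000_000))          # ~19.93 (approximately 20)
  print(math.log2(1_000_000_000))      # ~29.90 (approximately 30)

  # Change of base: Log(base 10) X = Log(base 2) X / Log(base 2) 10
  X = 12345
  print(math.log10(X), math.log2(X) / math.log2(10))   # the same value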
----------
A great example (from the physical world) involving O(1) complexity, which
students often struggle with.

When I came to interview at UCI (late in my career) I went to see Alex
Thornton teach one of his classes and he showed a great example that I have
been using ever since. Here is my version of what Alex taught...

Suppose that you want to get from point A to point B, and you have two ways
to travel: by walking and by teleporting. If you walk, you walk at a constant
rate from A to B; if you teleport, you must wait 1 minute for the teleporter
to warm up, and then you will be instantaneously transported from A to B.
Here N (the size of the problem to solve) is the distance between A and B.

Now, walking is represented by the linear complexity class, O(N): if you
double the distance you need to walk, you double the time you take to get
there. Next, teleportation is represented by the constant complexity class,
O(1): if you double the distance you need to teleport, the amount of time to
get there does not change: it is always 1 minute.

From the discussion above, because the complexity class O(1) is a lower
complexity class than O(N), we know that for a big enough distance,
teleportation will be faster than walking. What is that distance? It is the
distance we can walk in 1 minute (the amount of time it takes the teleporter
to warm up). We can draw these relationships as follows, showing where the
Walk and Teleport lines cross.

  TIME
        |        Walk
        |       /
 60 secs+------+------------------------- Teleport
        |     /
        |    /
        |   /
        |  /
        | /
        |/
        +-------------------------------- PROBLEM SIZE N = DISTANCE

The sloping line represents the function Twalk(N) = (1/speed)*N; when we
remove the constant (1/speed), this function is just O(N). The horizontal
line represents the function Tteleport(N) = 60; the time is always 60
seconds. We treat that as 60*1 (so we remove the 60 as a constant
multiplier), which leaves this function as O(1). The meaning of O(1) is not
1 second, but any constant amount of time.

The slope of the Walk line represents the speed: slower speeds have a bigger
slope; higher speeds have a smaller slope. But no matter the slope, the Walk
line will eventually intersect the Teleport line: where both modes of
transportation take the same amount of time. For all larger distances,
teleporting will take less time than walking.

What if we needed 30 seconds to put on our shoes? The graphs would change to
be

  TIME
        |     Walk
        |    /
 60 secs+---+----------------------------- Teleport
        |  /
        | /
        |/
 30 secs+  Putting on shoes in 30 seconds
        |
        |
        +--------------------------------- PROBLEM SIZE N = DISTANCE

Here Twalk(N) = (1/speed)*N + 30. The lines still cross in the same way (and
still at 60 seconds, although now at a smaller distance on the X axis): the
distance we can walk in 30 seconds, because after the 30 seconds spent
putting on our shoes, that is how long we can walk before the teleporter
finishes warming up.

For these specific lines we once again see that walking is better (takes less
time) than teleporting for small distances, but that is because we have
written out the full equations here. Generally, when we know only that one
process is O(N) and another one is O(1), we don't know which is better for
small N. For example, if it took us 90 seconds to put on our shoes, it would
always be faster to teleport.

  TIME
        |  Walk
        | /
        |/
 90 secs+  Putting on shoes in 90 seconds
        |
        |
 60 secs+--------------------------------- Teleport
        |
        |
 30 secs+
        |
        |
        +--------------------------------- PROBLEM SIZE N = DISTANCE

Whether Twalk(N) = (1/speed)*N, or (1/speed)*N + 30, or (1/speed)*N + 90, we
still say walking is O(N). So, asking whether O(1) or O(N) is faster for
small-sized problems is unknown: it depends on the speed and the dropped term
(either 0, 30, or 90 above). But regardless of constants and dropped terms,
we know there is some distance at which (and for all larger distances)
teleporting, which is O(1), is faster than walking, which is O(N).

Again: We cannot tell whether an O(1) algorithm is faster than an O(N) one
for small problem sizes, but for large ones, we know O(1) is faster.
----------

----------
IMPORTANT: Determining the Complexity Class Empirically - from a Doubling
Signature

If we can demonstrate that doubling the size of the input approximately
quadruples the time of an algorithm, then the algorithm is likely O(N**2). We
can use the doubling signatures shown above for other complexity classes as
well. Thus, even if we cannot mathematically analyze the complexity class of
an algorithm by inspecting its code (something we will highlight in the next
lecture), if we can measure it running on various sized problems (doubling
the size over and over again), we can use the signature information to
approximate its complexity class. In this approach we don't have to
understand (or even look at) the code.
----------
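Here is a minimal sketch of measuring a doubling signature empirically with
the time module. The function being timed (work, which runs a doubly nested
loop) is just a stand-in example invented for this sketch; any function that
takes a "size" argument could be timed the same way.

  import time

  def work(n):                     # an O(N**2) stand-in: N*N loop iterations
      total = 0
      for i in range(n):
          for j in range(n):
              total += i + j
      return total

  n        = 500
  previous = None
  for _ in range(4):
      start   = time.perf_counter()
      work(n)
      elapsed = time.perf_counter() - start
      if previous is None:
          print(f'N = {n:>5}: {elapsed:.3f} sec')
      else:
          print(f'N = {n:>5}: {elapsed:.3f} sec  T(2N)/T(N) ~ {elapsed/previous:.1f}')
      previous = elapsed
      n *= 2
  # The printed ratios should be roughly 4, matching the O(N**2) signature.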
------------------------------------------------------------------------------
Computing Running Times from Complexity Classes

We can use knowledge of the complexity class of an algorithm to easily
predict its actual running time on a computer as a function of N. For
example, if we know the complexity class of algorithm a is O(N**2), then we
know that Ta(N) ~ cN**2 for some constant c. The constant c represents the
"technology" used: the language, interpreter, machine speed, etc.; the N**2
(from O(N**2)) represents the "science/math" part: the complexity class.

Now, given this information, we can time the algorithm for some large value
N. Let's say for N = 10,000 (which is actually a pretty small N these days)
we find that Ta(10,000) is 4 seconds. We can solve for c using this
measurement: given Ta(N) ~ cN**2, substituting 10,000 for N and 4 for Ta(N)
(from timing algorithm a running on a computer, as described above), we have
Ta(10,000) = 4 ~ c x 10,000**2, so solving for the technology constant, we
have c ~ 4x10**-8.

If I asked you to estimate Ta(20,000) you could plug into this equation and
compute 4x10**-8 x (20,000)**2 = 16. But you could also shortcut this
process, because you know that doubling the input of an O(N**2) algorithm
approximately increases its running time by a factor of 4, so the answer
would be 16 seconds.

So, by measuring the run-time of this code, we can calculate the constant
"c", which incorporates all the technology (language, interpreter, computer
speed, etc.). Roughly, we can think of c as the amount of time it takes to do
one loop iteration (# of instructions per loop / speed of executing
instructions), where the algorithm requires N**2 iterations through the loops
to do all its work. That is why the constant is so small: modern computers
execute billions of operations per second.

Therefore, Ta(N) ~ 4x10**-8 x N**2. So, if asked to estimate the time to
process 1,000,000 (10**6) values (100 times more than 10,000), we'd have

  Ta(10**6) ~ 4x10**-8 x (10**6)**2
  Ta(10**6) ~ 4x10**-8 x 10**12
  Ta(10**6) ~ 4x10**4, or about 40,000 seconds (~1/2 a day: a day = 86,400 secs)

Notice that solving a problem 100 times as big takes 10,000 (which is 100**2)
times as long, which is based on the signature for an O(N**2) algorithm when
we increase the problem size by a factor of 100. If we go back to our sorting
example,

  I(100N)      8(100N)**2
  -------  ~  ------------  ~  10,000
    I(N)        8 N**2

In fact, while we often analyze code to determine its complexity class, if we
don't have the code (or find it too complicated to analyze) we can double the
input sizes a few times and see whether we can "fit the resulting times" to
any of the standard signatures, to estimate the complexity class of the
algorithm. We should do this for some N that is as large as is reasonable
(taking some number of seconds to solve on the computer).

Note that for an O(2**N) algorithm, if we double the size of the problem from
100 to 200 values, the amount of time needed goes up by a factor of 2**100,
which is ~ 1.3x10**30. The time for such algorithms "explodes" as the size of
the problem increases. Notice that even adding just one more value to process
doubles the time: this "exponential" time is the inverse function of
logarithmic time, in terms of its growth rate: it grows incredibly quickly,
while logarithms grow incredibly slowly. An exponential algorithm does little
good as problem sizes get bigger.

------------------------------------------------------------------------------
Odds and Ends:

Many algorithms (including sorting) exhibit the following behavior. Notice
that the upper bound of the inner loop (i) is changed by the outer loop. It
is important to be able to analyze this code to compute its complexity class.

  for i in range(N):
      for j in range(i):
          body

How many times does the "body" of the loop get executed?

  When the outer loop index i is 0,   "body" gets executed 0   times;
  when the outer loop index i is 1,   "body" gets executed 1   time;
  when the outer loop index i is 2,   "body" gets executed 2   times;
  ....
  when the outer loop index i is N-1
  (as big as i gets),                 "body" gets executed N-1 times.

So, in totality, "body" gets executed 0 + 1 + 2 + 3 + ... + N-1 times, or
dropping the 0, just 1 + 2 + 3 + ... + N-1 times.
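Before finding a closed form for this sum, here is a tiny empirical check
(illustration only): run the nested loops and count how many times the body
executes.

  def count_body_executions(N):
      count = 0
      for i in range(N):
          for j in range(i):
              count += 1        # this counting statement stands in for "body"
      return count

  for N in (5, 10, 100):
      print(N, count_body_executions(N))   # 5 -> 10, 10 -> 45, 100 -> 4950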
Is there a simpler way to write this sum? There is a simple, general
closed-form solution for adding up consecutive integers. Here is the proof
that 1 + 2 + 3 + ... + N = N*(N+1)/2. This is a direct proof, but this
relationship can also be proved by induction.

Let S = 1 + 2 + 3 + ... + N-1 + N. Since the order of the numbers makes no
difference in the sum, we also have S = N + N-1 + ... + 3 + 2 + 1. If we add
the left and right sides (column by column) we have

   S =    1   +    2   + ... +   N-1  +    N
   S =    N   +   N-1  + ... +    2   +    1
  --------------------------------------------
  2S =  (N+1) +  (N+1) + ... +  (N+1) +  (N+1)

That is, each pair in a column sums to N+1, and there are N pairs to sum.
Since there are N pairs, each summing to N+1, the right-hand side can be
simplified to just N*(N+1), so

  2S = N*(N+1), therefore S = N(N+1)/2 = N**2/2 + N/2

Thus, S is O(N**2): with a constant of 1/2 dropped and a lower-order term of
N/2 dropped (all terms whose "order" is lower than N**2 are dropped). Note
that either N or N+1 is an even number, so dividing their product by 2 always
produces an integer, as it must for a sum of integers: 6*7/2 = 21.

So, looking back at the example of the code above, the total number of times
the body gets executed is 0 + 1 + 2 + ... + N-1, which is the same as
1 + 2 + ... + N-1, so plugging N-1 in for N we have
(N-1)(N-1+1)/2 = N**2/2 - N/2, which is still O(N**2) for the same reason.

We can apply this formula to putting N values at the end of a linked list
that is initially empty (and has no cached reference to the last node). It
takes a certain amount of time to start at the front and skip all the nodes
to reach the back.

  To put in the 1st value requires skipping 0 nodes;
  to put in the 2nd value requires skipping 1 node;
  to put in the 3rd value requires skipping 2 nodes;
  ...
  to put in the Nth value requires skipping N-1 nodes.

So the number of nodes skipped is 0 + 1 + ... + N-1, which by the formula is
(N-1)N/2: so building a linked list in this way is in the O(N**2) complexity
class.

Fast Searching and Sorting:

There are obvious algorithms for searching a list in complexity O(N) and
sorting a list in complexity O(N**2). But there are surprisingly faster
algorithms for these tasks: searching a list can be O(Log N) IF THE LIST IS
SORTED (the algorithm is called binary search), and sorting the values in a
list can be O(N Log N).

In lecture, I will briefly discuss the binary searching algorithm, for
searching a sorted list: its complexity class is O(Log N). In fact, if we
ask, in the worst case, how many times we need to compare the "value being
searched for" against "a value in the list", the answer is essentially Log N.
That means that when searching a list of 1,000,000 values, we must access the
list at most about 20 times to either (a) find the index of the value in the
list or (b) determine that the value is not in the list. This is potentially
50,000 times faster than a loop checking one index after another (which we
must do if the list is not sorted)! On large problems, algorithms in a lower
complexity class typically execute much faster. Also note that when
increasing the size of the list by a factor of 1,000 (to 1 billion values),
the maximum number of accesses for binary search goes from about 20 to
about 30.
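Here is a minimal sketch of binary search on a sorted Python list (returning
the index where the value was found, or None, along with a count of how many
list accesses were made); the specific list and search values below are just
made-up examples. Python's standard-library bisect module provides this kind
of search on sorted lists as well.

  def binary_search(alist, x):
      low, high = 0, len(alist) - 1
      accesses  = 0
      while low <= high:
          mid       = (low + high) // 2
          accesses += 1
          if alist[mid] == x:
              return mid, accesses
          elif alist[mid] < x:
              low  = mid + 1            # discard the lower half
          else:
              high = mid - 1            # discard the upper half
      return None, accesses

  sorted_list = list(range(0, 2_000_000, 2))     # 1,000,000 sorted even values
  print(binary_search(sorted_list, 1_234_568))   # present: about 20 accesses
  print(binary_search(sorted_list, 1_234_567))   # absent:  about 20 accesses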
You should know that Python's sorting method on lists (called TimSort, named
after Tim Peters, who invented it; it is a variant of the simpler and more
well-known mergesort) is O(N Log N), but we will not discuss the algorithm.
As with binary searching, we will discuss the details in ICS-46. Note that we
CANNOT perform binary searching efficiently on linked lists, because we
cannot quickly find the middle of a linked list (for Python lists we just
compute the middle index and access the list there). In fact, we have seen
another self-referential data structure, the binary search tree, that we can
use to perform efficient searches (assuming the tree is bushy).

Sorting is one of the most common tasks performed on computers. There are
hundreds of different sorting algorithms that have been invented and studied.
Many small and easy-to-write sorting algorithms are in the O(N**2) complexity
class (see the sorting code above, for example). Complicated but efficient
algorithms are often in the O(N Log N) complexity class. We will study
sorting in more detail in ICS-46, including a few of these efficient
algorithms.

For now, memorize that fast (binary) searching is in the O(Log N) complexity
class (on sorted data) and fast sorting algorithms are in the O(N Log N)
complexity class. If you are ever asked to analyze the complexity class of a
task that requires sorting data as part of the task, assume the sorting
function/method is in the O(N Log N) complexity class.

Closing:

To close for now, finding a new algorithm to solve a problem in a lower
complexity class is a big accomplishment; a more minor (but still useful)
accomplishment is decreasing the constant for an algorithm in the same
complexity class (certainly useful, but often based more on technology than
science). By knowing the complexity class of an algorithm, we know a lot
about the performance of the algorithm when processing a large amount of data
(especially if we measure the time it takes to solve certain-sized problems:
using this information we can accurately predict the time to solve
other-sized problems). Finally, we can also reverse the process, and use a
few measurements, doubling the size of the problem each time, to approximate
the complexity class of an algorithm.

------------------------------------------------------------------------------
Coming up:

In the next two lectures we will first learn the complexity classes of many
of the operations in Python, so we can look at the code of a Python function
and use these operations to directly determine its complexity class. The
second lecture includes code for easily timing Python functions, including a
discussion of hashing that will reveal why dicts and sets behave the way they
do (very efficiently) in Python.

------------------------------------------------------------------------------
Problems:

1) A PC in the 1980s executed 10**6 operations per second; a PC today
executes 1,000 times faster: 10**9 operations per second. Assume a slow
sorting method takes exactly N**2 operations to sort a list of length N;
assume a fast sorting method takes exactly 5 N Log N operations to sort a
list of length N. When sorting a small amount of data, the fast machine
running the slow algorithm (call this configuration 1) is better; when
sorting a large amount of data, the slow machine running the fast algorithm
(call this configuration 2) is better. For approximately what size array does
the transition occur: for smaller arrays configuration 1 is faster, but for
larger arrays configuration 2 is better?

2) Suppose an algorithm is O(N Log N) and it takes .003 seconds to run on a
computer for a problem of size 1,000. Determine the function Ta(N) and
compute how long it would take to run on a problem of size 1 million. Hint:
use the simple approximation shown in this lecture for logarithms.
3) Suppose that we time a function f by doubling the size of its input and we
get the following data.

     N    | Time to compute f(N)
  --------+---------------------
    1,000 |   .32
    2,000 |  1.31
    4,000 |  5.12
    8,000 | 20.5

What complexity class has a signature that matches this data?