Analysis of Algorithms (Complexity Classes and Big-O Notation)

Analysis of Algorithms is a mathematical area of Computer Science in which we
analyze the resources (mostly time, but sometimes space) used by algorithms to
solve problems. An algorithm is a precise procedure for solving a problem,
written in any notation that humans understand (so that we can carry out the
algorithm): if we write an algorithm as code in some programming language, then
a computer can execute it too. The analysis we do should be independent of any
technology: e.g., language, compiler, or processor.

The main tool that we use to analyze algorithms is big-O notation: it means
that the resource is "bounded above by growth on the order of". We use big-O
notation to bound from above the performance of an algorithm by placing it in a
complexity class (most often based on its WORST-CASE behavior, but sometimes on
its BEST- or AVERAGE-CASE behavior) when solving a problem of size N. We will
learn how to characterize the size of a problem, which is most often as simple
as "N is the number of values in" an array, file, vector, linked list, etc.
Once we know the complexity class of an algorithm, we have a good handle on
understanding its actual performance on real machines (within certain limits).
Thus, in Analysis of Algorithms we don't necessarily compute the exact
resources needed, but typically an approximate upper bound on the resources,
based on just the problem size (not the details of the data: but if we know
those details, sometimes we can compute a more accurate bound).

------------------------------------------------------------------------------
Getting to Big-O Notation: Throwing away Irrelevant Details

Here is a simple C++ function for computing the maximum value in an array (or
throwing an exception if there are no values in the array).

  int maximum(int a[], int N) {        //N is the length of the array
    if (N == 0)
      throw "empty array";
    int answer = a[0];
    for (int i=1; i<N; ++i)
      if (a[i] > answer)
        answer = a[i];
    return answer;
  }

Often, the problem size is the number of values processed: e.g., the number of
values in an array, file, linked list, tree, etc. But we can use other metrics
as well: e.g., it can be the number of digits in an integer value, when looking
at the complexity of multiplication based on the size of the numbers. Thus,
there is no single measure of size that fits all problems: instead, for each
problem we try to choose a measure that makes sense and is natural for that
problem.

C++ compiles functions like maximum into a sequence of machine language
instructions that the computer executes. To solve a problem, the computer
always executes an integer number of instructions. For simplicity, we will
assume that all instructions take the same amount of time to execute. So
computing the amount of time it takes to solve a problem is equivalent to
computing how many instructions the computer must execute to solve it (which
depends on the compiler technology we are using), which we can divide by the
number of instructions/second the machine executes to compute the time taken
(which also depends on technology, and changes).

Again, we typically look at the worst-case behavior of algorithms. For maximum,
the worst case occurs if the array values are stored in increasing order. In
this case, each new value examined in the array will be bigger than the
previous answer, so the if statement's condition will always be true, which
always requires executing the code that updates the answer (the sketch below
illustrates this concretely).
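Here is a minimal sketch (my addition, not part of the lecture code; the name
maximum_counting_updates is invented for illustration) that instruments a copy
of maximum to count how often the update to answer runs. For an increasing
array the update runs on every loop iteration (the worst case); for a
decreasing array it never runs.

  #include <iostream>
  #include <vector>

  int maximum_counting_updates(const std::vector<int>& a, int& updates) {
      updates = 0;
      int answer = a[0];                         // assumes a is non-empty
      for (int i = 1; i < (int)a.size(); ++i)
          if (a[i] > answer) {
              answer = a[i];
              ++updates;                         // count each forced update
          }
      return answer;
  }

  int main() {
      const int N = 1000;
      std::vector<int> increasing, decreasing;
      for (int i = 0; i < N; ++i) {
          increasing.push_back(i);               // 0, 1, 2, ...    : worst case
          decreasing.push_back(N - i);           // N, N-1, N-2, ...: best case
      }
      int u1 = 0, u2 = 0;
      maximum_counting_updates(increasing, u1);
      maximum_counting_updates(decreasing, u2);
      std::cout << "updates, increasing order: " << u1 << "\n"   // prints 999
                << "updates, decreasing order: " << u2 << "\n";  // prints 0
  }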
If any value in the array were lower than the running answer, the function
wouldn't have to update the answer and thus would execute fewer instructions
and take less time to execute.

It turns out that for an array of N values, on a certain C++ compiler for a
certain processor, the computer executes 14N + 9 instructions in the worst case
for this function. You may need to know more CS than you do at this time to
determine this formula, but you will get there by ICS 51; and if you have taken
ICS 51, you should understand the machine instructions that such a program
might be translated into. A simple way to think about this formula is that
there are 14 computer instructions that are executed each time the body of the
for loop is executed, and 9 instructions that are executed only once: they deal
with starting and terminating the loop and the entire function.

We can say I(N) = 14N + 9 for the worst case of the maximum function, where
I(N) is the number of instructions the computer executes when solving a problem
on an array with N values. Or, to be more specific, we would write
Imaximum(N) = 14N + 9 for the maximum function.

I would like to argue now that if we simplify this function to just
Imaximum(N) = 14N, we get a simpler function; although we lose some
information, we don't lose much. Specifically, as N gets bigger (i.e., we are
dealing with very big problems - the kinds computers are used to solve), 14N
and 14N + 9 are relatively close: as N gets bigger, the loop accounts for most
of the instructions executed. Let's look at the result of this simpler formula
vs. the original for values of N increasing by a factor of 10.

      N     |  14N + 9   |    14N     | error: (14N+9 - 14N)/(14N+9) as a %
  ----------+------------+------------+------------------------------------
          1 |         23 |         14 | 39%       61% accurate
         10 |        149 |        140 | 6%        94% accurate
        100 |      1,409 |      1,400 | .6%       99.4% accurate
      1,000 |     14,009 |     14,000 | .06%      99.94% accurate
        ...
  1,000,000 | 14,000,009 | 14,000,000 | .00006%   99.99994% accurate
        ...

Each line shows the % error of computing 14N when the true answer is 14N + 9.
So by the time we are processing an array of 1,000 values, using the formula
14N instead of 14N + 9 is 99.94% accurate. For computers solving real problems,
an array of 1,000 values is tiny: an array of millions is more normal. For
1,000,000 values, 14N is off by just 9 parts in 14 million. So the 9 doesn't
affect the formula much: almost all the time is spent in the loop.

Analysis of Algorithms really should be referred to as ASYMPTOTIC Analysis of
Algorithms, as it is mostly concerned with the performance of algorithms as the
problem size gets very big (N -> infinity). We see here that as N -> infinity,
14N is a better and better approximation to 14N + 9: dropping the extra 9
becomes less and less important, when divided by the actual number of
instructions executed.
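As a quick check of the table above, here is a tiny sketch (my addition, not
from the lecture) that computes the same relative error, 9/(14N+9), for N
growing by factors of 10; it shows the error shrinking toward 0 as N grows.

  #include <iostream>

  int main() {
      for (long long N = 1; N <= 1000000; N *= 10) {
          double exact  = 14.0*N + 9.0;               // instructions executed
          double approx = 14.0*N;                     // simplified formula
          double error  = (exact - approx) / exact;   // always 9/(14N+9)
          std::cout << "N = " << N
                    << "  error = "    << 100.0*error << "%"
                    << "  accuracy = " << 100.0*(1.0 - error) << "%\n";
      }
      return 0;
  }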
A simple function for sorting is the following. It is much simpler than the
real sort function in C++ (and the simplicity results in the function taking
much more time for large N, but it is a good starting point for understanding
sorting now). If you are interested in how this function accomplishes sorting,
hand-simulate it working on an array of 5-10 values (try arrays with increasing
values, decreasing values, and values in random order): basically, each
execution of the outer loop mutates the array so that the next value in the
array is in the correct position. We will spend a few lectures studying sorting
more formally towards the end of the quarter.

  void sort(int a[], int N) {          //N is the length of the array
    for (int base=0; base<N; ++base)
      for (int check=base+1; check<N; ++check)
        if (a[base] > a[check])
          std::swap(a[base], a[check]);
  }

It turns out that for an array of N values, the computer executes
8N^2 + 12N + 6 instructions in the worst case for this function. The outer loop
executes N times and the inner loop on average executes about N/2 times (when
base is 0 it executes N-1 times; when base is N-1 it executes 0 times), so the
if statement in the inner loop is executed a quadratic number of times.

So Isort(N) = 8N^2 + 12N + 6 for the worst case of the sort function, where
Isort(N) is again the number of instructions that the computer executes.

I would like to argue in the same way that if we simplify this function to just
Isort(N) = 8N^2, we get a much simpler function (dropping 2 of the 3 additive
terms) but we have not lost much information. Let's look at the result of this
simpler formula vs. the original as N gets bigger and bigger.

      N     | 8N^2+12N+6 |    8N^2     | error: (12N+6)/(8N^2+12N+6) as a %
  ----------+------------+-------------+-----------------------------------
          1 |         26 |           8 | 70%       30% accurate
         10 |        926 |         800 | 14%       86% accurate
        100 |     81,206 |      80,000 | 1.5%      98.5% accurate
      1,000 |  8,012,006 |   8,000,000 | .15%      99.85% accurate

So by the time we are processing an array of 1,000 values, using the formula
8N^2 instead of 8N^2 + 12N + 6 is 99.85% accurate. For 1,000,000 values (10^6),
8N^2 is 8x10^12 and 8N^2 + 12N + 6 is 8x10^12 + 12x10^6 + 6; the simpler
formula is about 99.99985% accurate. Note the difference in the powers of 10
for the first and second terms in the formula: 10^12 and 10^6, a factor of a
million difference.

CONCLUSION (though not proven): If the real formula I(N) is a sum of a bunch of
terms, we can drop any term that doesn't grow as quickly as the most quickly
growing term. As N -> infinity, we can neglect the effects of all these
lower-order terms without losing much accuracy.

In computing the maximum, the linear term 14N grows more quickly than the next
term, the constant 9, which doesn't grow at all (as N grows), so we drop the 9
term. In sorting, the quadratic term 8N^2 grows more quickly than the next two
terms, the linear term 12N and the constant 6, so we drop the 12N and 6 terms.
In fact, note that we can prove that the limit as N -> infinity of
12N/8N^2 = 3/(2N) is 0, which means we can discard the 12N term as growing more
slowly than the 8N^2 term. Similarly, the limit as N -> infinity of 6/8N^2 is
0. This ratio tells us which terms we can discard. The result is a simple
function (a constant times some function of N) that is still an accurate
approximation of the number of computer instructions executed for arrays of
various large sizes.

But also note, if the formula were something like N^2 + 1,000,000N (with the
coefficient of N one million times greater than the coefficient of N^2), then
at N = 1,000,000 the N^2 term is the same size as the 1,000,000N term, so the
percent error of dropping the linear term would be 50%. Of course, if we
increase N to a billion or a trillion, N^2 becomes much bigger. In most
algorithms, the coefficients are of the same magnitude, so throwing away
lower-order terms is still accurate even for moderate-sized N. We will soon
discuss where these constants come from, so you can better understand them.
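Before moving on, here is a small sketch (my addition, not from the lecture;
the name sort_counting_comparisons is invented for illustration) that counts
how many times the if statement in sort executes. The count is exactly
N(N-1)/2 = N^2/2 - N/2, which is why the dominant term in Isort(N) is
quadratic.

  #include <iostream>
  #include <utility>   // std::swap

  long long sort_counting_comparisons(int a[], int N) {
      long long comparisons = 0;
      for (int base = 0; base < N; ++base)
          for (int check = base + 1; check < N; ++check) {
              ++comparisons;                    // one execution of the if test
              if (a[base] > a[check])
                  std::swap(a[base], a[check]);
          }
      return comparisons;
  }

  int main() {
      for (int N : {10, 100, 1000}) {
          int* a = new int[N];
          for (int i = 0; i < N; ++i)
              a[i] = N - i;                     // decreasing order: every
                                                //   comparison also swaps
          std::cout << "N = " << N
                    << "  comparisons = " << sort_counting_comparisons(a, N)
                    << "  N(N-1)/2 = "    << (long long)N * (N - 1) / 2 << "\n";
          delete[] a;
      }
      return 0;
  }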
We will now explain the rationale for dropping the constant in front of N (in
Imaximum) and N^2 (in Isort), and classifying these algorithms as just O(N),
bounded by linear growth, or O(N^2), bounded by quadratic growth. Again, O
means "bounded by growth on the order of", so O(N) means bounded by growth on
the order of N and O(N^2) means bounded by growth on the order of N^2.

1) Assume that every instruction in the computer takes the same amount of time
to execute. Then the time taken for maximum is about 14N/speed and the time for
sort is about 8N^2/speed. We should really think about these formulas as
(14/speed)N and (8/speed)N^2. We know the 14 and 8 came from the number of
instructions inside the loops that C++ needed to execute: but a different C++
compiler (or a different language) might generate a different number of
instructions and therefore a different constant. Thus, this number is based on
technology, and we want our analysis to be independent of technology. Of
course, "speed" changes based on technology too, so we lump it together with
the compiler constant. Since we are trying to come up with a "science" of
algorithms, we don't want our results to depend on technology, so we are going
to drop the constant in front of the biggest term as well: as explained above
(relating to the instructions generated by C++ and the speed of the machine),
this constant is based solely on technology.

Here is a second justification for not being concerned with the constant in
front of the biggest term.

2) The fundamental question that we want answered about any algorithm is, "How
much MORE resources does it need when solving a problem TWICE AS BIG?" In
maximum, when N is big (so we can drop the +9 without losing much accuracy) the
ratio of the time to solve a problem of size 2N to the time to solve a problem
of size N is easily computed:

   Imaximum(2N)      14(2N)
  --------------  ~ --------  ~  2
   Imaximum(N)        14 N

The ratio is a simple number (no matter how many instructions are in the loop,
since the constant 14 appears as a multiplicative factor in both the numerator
and denominator, and cancels itself out). So, we know for this code that if we
double the size of the array, we double the number of instructions that maximum
executes to solve the problem, and thus double the amount of time (for whatever
the speed of the computer is). Thus, the constant 14 is irrelevant when asking
this "doubling" question.

Likewise, for sorting we can write

   Isort(2N)      8(2N)^2
  -----------  ~ ----------  ~  4
   Isort(N)        8 N^2

Again, the ratio is a simple number, with the constant (no matter what it is)
disappearing. So, we know for this code that if we double the size of the
array, we increase by a factor of 4 the number of instructions that are
executed, and thus increase by a factor of 4 the amount of time (for whatever
the speed of the computer is). Thus, the constant 8 is irrelevant when asking
this "doubling" question. Note that if we didn't simplify, we'd have

   Isort(2N)     8(2N)^2 + 12(2N) + 6     32N^2 + 24N + 6
  -----------  = ---------------------  = -----------------
   Isort(N)         8N^2 + 12N + 6          8N^2 + 12N + 6

which doesn't simplify easily; although, as N -> infinity, this ratio gets
closer and closer to 4 (and is very close even for relatively small-sized
problems, because the coefficients are of the same magnitude); the sketch just
below computes this ratio numerically. As with air resistance and friction in
physics, ignoring the contribution of these negligible factors (for big,
slow-moving objects) typically allows us to quickly and easily solve an
approximately correct problem.
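Here is the small sketch mentioned just above (my addition, not from the
lecture; the helper name isort is invented for illustration): it evaluates the
un-simplified ratio Isort(2N)/Isort(N) and shows it converging to 4 even for
modest N.

  #include <iostream>

  // worst-case instruction count for the simple sort, as a formula
  double isort(double n) { return 8*n*n + 12*n + 6; }

  int main() {
      for (double N = 1; N <= 1e6; N *= 10)
          std::cout << "N = " << N
                    << "  Isort(2N)/Isort(N) = " << isort(2*N) / isort(N)
                    << "\n";                       // approaches 4
      return 0;
  }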
Using big-O notation, we say that the complexity class of the code to find the
maximum is O(N). The big-O means "bounded by growth on the order of" N, which
means at most a linear growth rate (double the input size, double the time).
For the sorting code, its complexity class is O(N^2), which means "bounded by
growth on the order of N^2", which means at most a quadratic growth rate
(double the input size, quadruple the time).

----------
IMPORTANT: A quick way to compute the complexity class of an algorithm

To analyze a C++ function's code and compute its complexity class, we
approximate the number of times the most frequently executed statement is
executed, dropping all the lower (more slowly growing) terms and dropping the
constant in front of the most frequently executed statement (the fastest
growing term). We will show how to do this much more rigorously in the next
lecture.

Here is how we apply this simple rule to the maximum code: it executes the if
statement N times, so it is O(N). The sorting code executes the if statement
N(N-1)/2 times (we will justify this number below), which is N^2/2 - N/2, so
dropping the lower-order term and the constant 1/2 yields a complexity class of
O(N^2). In the next lecture we will generalize this method for quickly
calculating the complexity class of C++ code by examining source code.
----------

------------------------------------------------------------------------------
Comparing Algorithms by their Complexity Classes

The primary thing we know from this definition is that if two algorithms, a and
b, both solve the same problem, and a is in a lower complexity class than b,
then for all BIG ENOUGH N, Ta(N) < Tb(N): here Ta(N) means the Time it takes
for algorithm a to solve the problem of size N. NOTE THAT NOTHING HERE IS SAID
ABOUT SMALL N: which algorithm uses fewer resources on small problems depends
on the actual constants that we dropped (and even on the additive terms that we
dropped), so COMPLEXITY CLASSES HAVE LITTLE TO SAY FOR SMALL PROBLEM SIZES.

For example, if algorithm a is O(N) with a constant of 100, and algorithm b is
O(N^2) with a constant of 1, then for values of N in the range [1,100],

  Tb(N) = 1N^2 <= 100N = Ta(N)

but for all values bigger than 100,

  Ta(N) = 100N <= 1N^2 = Tb(N)

So for big problems, the algorithm in the lower complexity class is faster (the
sketch after this discussion locates this crossover numerically). As a second
example, if algorithm a is O(N) with a constant of 1, and algorithm b is O(N^2)
with a constant of 10, then for all values of N

  Ta(N) = 1N <= 10N^2 = Tb(N)

So in some cases, a lower complexity class can be worse (but only for small N),
and in some cases it can be better for all N. We need to determine which is
faster for small N by going beyond our use of complexity classes. But it is
guaranteed that FOR ALL SIZES BEYOND SOME VALUE OF N, the algorithm in the
lower complexity class will run faster. Again, we use the term "asymptotic"
analysis of algorithms to indicate that we are mostly concerned with the time
taken by the code when N gets very large (going towards infinity). In both
examples above, because of their complexity classes, algorithm a is eventually
better.

What about the constants? Are they likely to be very different in practice? It
is often the case that the constants of different algorithms are close. (They
are often just the number of instructions in the main loop of the code, so code
with small loops will have similar constants.) So the complexity classes are a
good indication of faster vs. slower algorithms for all but the smallest
problem sizes.
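Here is the sketch referenced above (my addition, not from the lecture) for the
first example: algorithm a with Ta(N) = 100N and algorithm b with Tb(N) = 1N^2.
It finds the crossover point beyond which the lower complexity class wins.

  #include <iostream>

  int main() {
      long long crossover = -1;
      for (long long N = 1; N <= 1000; ++N) {
          long long Ta = 100 * N;       // O(N) algorithm, constant 100
          long long Tb = N * N;         // O(N^2) algorithm, constant 1
          if (Ta < Tb) { crossover = N; break; }
      }
      std::cout << "100N < N^2 first holds at N = " << crossover << "\n"; // 101
      return 0;
  }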
Although all possible mathematical functions might represent complexity classes
(and many strange ones do), we will mostly restrict our attention to the
following complexity classes. Note that complexity classes can interchangeably
represent computing time, the number of machine operations executed, and more
nebulous terms such as "effort" or "work" or "resources".

As we saw before, a fundamental question about any algorithm is, "What is the
time needed to solve a problem twice as big?" We will call the answer the
DOUBLING SIGNATURE of the complexity class (knowing this value empirically
-which we can determine by running the program on various problem sizes- often
allows us to deduce the complexity class as well).

 Class   | Algorithm Example                                | Doubling Signature
 --------+--------------------------------------------------+-------------------
 O(1)    | passing arguments->parameters/copying a pointer   | T(2N) = T(N)
 O(LogN) | binary searching of a sorted array                | T(2N) = c + T(N)
 O(N)    | linear searching of an array                      | T(2N) = 2T(N)
 O(NLogN)| fast sorting                                      | T(2N) = cN + 2T(N)

   Fast algorithms come before here; N Log N grows a bit more quickly than
   linearly (but only a bit, because logarithms grow so slowly compared to N)
   and nowhere near as fast as O(N^2).

 O(N^2)  | slow sorting; scanning an array of size N, N times| T(2N) = 4T(N)
 O(N^3)  | slow matrix multiplication                        | T(2N) = 8T(N)
 O(N^m)  | graph algorithms, for some fixed m: 4, 5, ...     | T(2N) = 2^m T(N)

   Tractable algorithms come before here; their work is polynomial in N (and
   typically a small power). For example, for an O(N^2) algorithm, doubling the
   size of the problem quadruples the time required:
   T(2N) ~ c(2N)^2 = c4N^2 = 4cN^2 = 4T(N). In the complexity class below, N is
   in an exponent.

 O(2^N)  | finding boolean values that satisfy a formula     | T(2N) = 2^N T(N)

   Note T(2N) = 2^(2N) = (2^N)^2 = 2^N 2^N = 2^N T(N).

Note that in Computer Science, logarithms are mostly taken to base 2. (Remember
that algorithms and logarithms are very different terms.) So when we write
logarithms, they are implicitly to base 2 (e.g., Log N = Log2 N). You should
memorize and be able to use the following facts to approximate logarithms
without a calculator.

  Log 1,000         ~ 10 : actually 2^10 = 1,024, so 2^10 approximates 1,000
                           with less than a 3% error
  Log a^b = b Log a, or more usefully, Log 1000^N = N Log 1000; so ...
  Log 1,000,000     ~ 20 : 1,000,000 = 1,000^2; Log 1000^2 = 2*Log 1000
  Log 1,000,000,000 ~ 30 : 1,000,000,000 = 1,000^3; Log 1000^3 = 3*Log 1000

So note that Log is a very slowly growing function. When we increase from
Log 1,000 to Log 1,000,000,000 (the argument grows by a factor of 1 million)
the result only grows from 10 to 30 (by a factor of 3). That represents a very
flat curve, with its slope decreasing asymptotically toward 0 (the constant
function).

We can compute logarithms base 2 on any calculator that computes Log in any
base, using the "base conversion" rule

  Log (base b) X = Log (base a) X / Log (base a) b

So, Log (base b) X is just a constant (1/Log (base a) b) times Log (base a) X,
which means logarithms to any base all specify the SAME COMPLEXITY CLASS
(regardless of the base) because they differ only by a multiplicative constant
(which we ignore when writing complexity classes). For example,

  Log(base 10) X = Log(base 2) X / Log(base 2) 10 ~ .3 Log(base 2) X
  Log(base 2) X  = Log(base 10) X / Log(base 10) 2 ~ 3.3 Log(base 10) X
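As a quick sanity check of these facts, here is a tiny sketch (my addition, not
from the lecture) that computes base-2 logarithms via the base-conversion rule
and confirms the approximations Log 1,000 ~ 10, Log 1,000,000 ~ 20, and
Log 1,000,000,000 ~ 30.

  #include <cmath>
  #include <iostream>

  int main() {
      for (double x : {1e3, 1e6, 1e9}) {
          double log2x = std::log10(x) / std::log10(2.0);  // Log(base 2) x
          std::cout << "Log2(" << x << ") = " << log2x << "\n";
          // prints approximately 9.97, 19.9, 29.9
      }
      return 0;
  }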
----------
IMPORTANT: Determining the Complexity Class Empirically from a Doubling
Signature

If we can demonstrate that doubling the size of the input approximately
quadruples the time of an algorithm, then the algorithm is O(N^2). We can use
the doubling signatures shown above for other complexity classes as well. Thus,
even if we cannot mathematically analyze the complexity class of an algorithm
by inspecting its code (something we will highlight in the next lecture), if we
can measure it running on various-sized problems (doubling the size over and
over again), we can use the doubling signature information to approximate its
complexity class. In this approach we don't have to understand (or even look
at) the code.
----------

------------------------------------------------------------------------------
Computing Running Times from Complexity Classes

We can easily use knowledge of the complexity class of an algorithm to predict
its actual running time on a computer as a function of N. For example, if we
know the complexity class of algorithm a is O(N^2), then we know that
Ta(N) ~ cN^2 when N is large, for some constant c. The constant c represents
the "technology" used: the language, compiler, machine speed, etc.; the N^2
(from O(N^2)) represents the "science/math" part: the complexity class.

Now, given this information, we can time the algorithm for some large value N.
Let's say for N = 10,000 (which is actually a pretty small N these days) we
find that Ta(10,000) is 4 seconds. First, if I asked you to estimate Ta(20,000)
you'd immediately know it is about 16 seconds (doubling the input of an O(N^2)
algorithm approximately increases the running time by a factor of 4). Second,
we can solve the equation Ta(N) ~ cN^2 for c: substituting 10,000 for N and 4
for Ta(N) (found by actually running the code and timing it) we have
4 ~ c x 10,000^2, so solving for the technology constant we have c ~ 4x10^-8.
So, by measuring the run-time of this code once, we can calculate the constant
c, which encompasses all the technology (language, compiler, computer speed,
etc.). Roughly, we can think of c as the amount of time it takes to do one loop
iteration (the number of instructions per iteration divided by the speed of
executing instructions), where the algorithm requires N^2 iterations through
the loop to do all its work. Therefore, Ta(N) ~ 4x10^-8 x N^2.

So, if asked to estimate the time to process 1,000,000 (10^6) values (100 times
more than 10,000), we'd have

  Ta(10^6) ~ 4x10^-8 x (10^6)^2
  Ta(10^6) ~ 4x10^-8 x 10^12
  Ta(10^6) ~ 4x10^4, or about 40,000 seconds (about half a day)

Notice that solving a problem 100 times as big takes 10,000 (which is 100^2)
times as long, which follows from the signature of an O(N^2) algorithm when we
increase the problem size by a factor of 100. If we go back to our sorting
example,

   Isort(100N)      8(100N)^2
  -------------  ~ -----------  ~  10,000
   Isort(N)           8 N^2

In fact, while we often analyze code to determine its complexity class, if we
don't have the code (or find it too complicated to analyze) we can run the
code, doubling the input sizes a few times, plot the results, and then check
whether we can "fit the resulting times" to any of the standard doubling
signatures to estimate the complexity class of the algorithm (yes, we just
discussed this method above; a timing-harness sketch follows below). We should
do this for some N that is as large as reasonable (taking some number of
seconds to solve on the computer).
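Here is the timing-harness sketch mentioned above (my addition, not from the
lecture; the helper name time_sort is invented for illustration): it times the
slow sort function on arrays of size N, 2N, 4N, ... and prints the ratio
T(2N)/T(N). A ratio near 4 matches the O(N^2) doubling signature.

  #include <chrono>
  #include <iostream>
  #include <utility>
  #include <vector>

  void sort(int a[], int N) {                 // the slow sort from above
      for (int base = 0; base < N; ++base)
          for (int check = base + 1; check < N; ++check)
              if (a[base] > a[check])
                  std::swap(a[base], a[check]);
  }

  double time_sort(int N) {                   // seconds to sort N values
      std::vector<int> a(N);
      for (int i = 0; i < N; ++i)
          a[i] = N - i;                       // decreasing order input
      auto start = std::chrono::steady_clock::now();
      sort(a.data(), N);
      auto stop  = std::chrono::steady_clock::now();
      return std::chrono::duration<double>(stop - start).count();
  }

  int main() {
      double previous = time_sort(1000);
      for (int N = 2000; N <= 16000; N *= 2) {
          double current = time_sort(N);
          std::cout << "N = " << N << "  T(N) = " << current << " s"
                    << "  T(2N)/T(N) = " << current / previous << "\n";
          previous = current;
      }
      return 0;
  }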
Note that for an O(2^N) algorithm, if we double the size of the problem from
100 to 200 values, the amount of time needed goes up by a factor of 2^100,
which is ~ 1.3x10^30. Notice that each time we add one more value to process,
the time it takes doubles: this "exponential" time is the inverse of
logarithmic time, in terms of its growth rate: exponentials grow incredibly
quickly while logarithms grow incredibly slowly. The function 2^N eventually is
bigger than any polynomial in N: e.g., 2^N is eventually bigger than N^1000.

------------------------------------------------------------------------------
Mathematical Definition of big-O Notation

Mathematically, we say that f(n) is O(g(n)) -meaning the function f has an
order of growth that is BOUNDED by function g- if we can find values n0 and c
such that for all n > n0, f(n) <= c g(n): read this as, "for all n greater than
n0, f(n) is less than or equal to a constant c times g(n)." We don't care about
small values of n, when n <= n0.

This definition allows us to do precisely what we did above more intuitively:
find an upper bound on the complexity class of an algorithm by computing how
often the most frequently executed statement is executed (the f(n)), and
throwing away lower-order terms and constant coefficients: the result is a
simpler function g(n) that characterizes the complexity class.

Let's use this definition to prove that certain functions are in certain
complexity classes: recall that a complexity class is a single term involving n
(no lower-order terms and no constant multiplier). Finally, recall that n is
typically an integer (often the size of some data structure: e.g., an array or
linked list), so talking about n = .5 or n = -10 makes no sense.

First, note that
  (1) 1 <= n and 1 <= n^2 whenever n > 0
  (2) log n <= n whenever n > 0 (log is undefined if n <= 0)
  (3) n <= n^2 whenever n >= 0
  (4) n log n <= n^2 whenever n > 0 (by multiplying each side in (2) by n)

So, let's formally prove that f(n) = 5n^2 + 3nlogn + 2n + 5 is O(n^2). That
means we must find an n0 and c such that

  5n^2 + 3nlogn + 2n + 5 <= c n^2 for all n > n0

Note from (4) 3n log n <= 3n^2 for n > 0, from (3) 2n <= 2n^2 for n > 0, from
(1) 5 <= 5n^2 for n > 0, and trivially 5n^2 <= 5n^2 for all n. So we can add
the terms on the left and add the terms on the right (with each term on the
left being <= its corresponding term on the right for n > 0), with the result

  5n^2 + 3nlogn + 2n + 5 <= 5n^2 + 3n^2 + 2n^2 + 5n^2 for all n > 0

We can simplify all the terms on the right (they are all constants times n^2)
to (5 + 3 + 2 + 5)n^2, or just 15n^2. So, we have proven that
f(n) = 5n^2 + 3nlogn + 2n + 5 <= 15n^2 for all n > 0; therefore
5n^2 + 3nlogn + 2n + 5 is O(n^2) (with c = 15 and n0 = 0).

Likewise, we can see that for f(n) = 2n + 100log n, f(n) is O(n) because
100log n <= 100n for n > 0 (multiply each side of (2) by 100), and
2n + 100log n <= 2n + 100n = 102n for all n > 0 (add 2n to each side).

Likewise, we can see that for f(n) = 2n^2 - 100n + 50, f(n) is O(n^2) because

  f(n) <= f(n) + 100n (for all n > 0) = 2n^2 + 50 <= 2n^2 + 50n^2 = 52n^2
                                                            (for all n > 0)

By a similar "addition trick" we can drop/ignore all negative terms in a
formula f when computing its complexity class.

Finally, we can see that for f(n) = 2n + 10, f(n) is O(n) because 10 <= 10n for
n > 0 (multiply each side of (1) by 10), so 2n + 10 <= 12n for n > 0. A small
sketch below numerically spot-checks two of these bounds.
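Here is the sketch just mentioned (my addition, not from the lecture). It
numerically spot-checks two of the bounds proven above for a range of n:
5n^2 + 3n log n + 2n + 5 <= 15n^2 (c = 15, n0 = 0) and 2n + 10 <= 12n
(c = 12, n0 = 0). Such a check is not a proof, but it can catch a bad choice of
c or n0.

  #include <cmath>
  #include <iostream>

  int main() {
      bool all_hold = true;
      for (int n = 1; n <= 1000000; ++n) {
          double f1 = 5.0*n*n + 3.0*n*std::log2(n) + 2.0*n + 5.0;
          double f2 = 2.0*n + 10.0;
          if (f1 > 15.0*n*n || f2 > 12.0*n)      // check both claimed bounds
              all_hold = false;
      }
      std::cout << (all_hold ? "bounds hold for n = 1..1,000,000"
                             : "some bound failed") << "\n";
      return 0;
  }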
-------
There is more than one way to compute the c and n0 (which are not independent:
they depend on each other in interesting ways). For example, there are many
other ways to show that for f(n) = 2n + 10, f(n) is O(n). Originally we chose
c = 12, and found f(n) <= 12n for n > 0. But we can also choose c = 4, and use
that value to find for what n0, f(n) = 2n + 10 <= 4n:

  2n + 10 <= 4n
       10 <= 2n      subtract 2n from each side
        5 <= n       divide each side by 2

and see that this inequality holds when n >= 5. So we have c = 4 and n0 = 5 as
another demonstration that f(n) is O(n).

If we choose c = 2.001, we can solve for when 2n + 10 <= 2.001n:

  2n + 10 <= 2.001n
       10 <= .001n   subtract 2n from each side
   10,000 <= n       divide each side by .001

and see that this inequality holds when n >= 10,000. So we have c = 2.001 and
n0 = 10,000 as another demonstration that f(n) is O(n).

So, there are MULTIPLE PROOFS (multiple cs and n0s) that meet the requirement
stated at the beginning. If we choose a large c, we can choose a small n0; if
we choose a smaller c, we will need to choose a larger n0. Note that we must
choose a c bigger than 2.0, even if only by a tad (by .001 as was shown above,
or even by .0000001). If we try to use c = 2, the inequality 2n + 10 <= 2n has
no solution that makes sense for non-negative problem sizes; and if we try an
even smaller c, say 1.5, the solution to the inequality

  2n + 10 <= 1.5n
       10 <= -.5n    subtract 2n from each side
      -20 >= n       divide each side by -.5 (the inequality flips <= to >=)

is meaningless: it isn't in the form n >= something. It concerns problem sizes
less than -20, and -20 is not a legal problem size!
-------

Finally, notice that if f(n) is O(n^2) then f(n) -by the definition- is also
O(n^3). Here is the proof. If f(n) is O(n^2) then we know for all n > n0,
f(n) <= c n^2; but we also know n^2 <= n^3 for n >= 0 (and n, a problem size,
is an integer that is always >= 0); so for all n > n0, f(n) <= c n^3 (the same
c as above), so f(n) is O(n^3).

IMPORTANT: Saying f(n) is O(g(n)) tells us just that g(n) is an UPPER BOUND on
how fast the function f(n) grows. In practice, we always try to get a "tight"
upper bound (the smallest-growing function g(n) for which f(n) is O(g(n))), but
big-O notation says NOTHING ABOUT "TIGHTNESS", just inequality. When we talk
about big-Theta and big-Omega notation in a later lecture, we will see how to
bound f(n) from below as well as from above. If the bounding complexity classes
(both big-O and big-Omega) are both g(n), then we will have a tight bound on
f(n) and pronounce it big-Theta(g(n)). More on this in the next lecture.

------------------------------------------------------------------------------
Odds and Ends:

It is important to be able to analyze the following code. Notice that the upper
bound of the inner loop (i, in j