ICS 46 Spring 2022
Notes and Examples: Asymptotic Analysis


Background

One of the things you quickly realize as you study computer science is that there is quite often more than one way to solve the same problem. When you're in a situation like that, you need a way to make a reasonable choice between them; which solution should you choose? Sometimes, there's a clear-cut winner or loser, with one approach plainly better than another no matter the situation. In most cases, though, it becomes necessary to assess the tradeoffs inherent in each solution, by understanding the situations in which each one excels or struggles, and then applying your understanding of the situation you're actually in, so you can make the appropriate choice. Ideally, you'll apply at least some quantitative reasoning, particularly when your goal is to choose a solution that runs within a certain length of time, or that uses no more than a certain amount of memory. When choosing between possible solutions, a "gut feeling" or a choice based on which solution you're most comfortable implementing may not be good enough.

Let's consider an example. If you had a vector containing a sequence of integers in no particular order and your goal was to sort it into ascending order (i.e., smallest integer first, next-smallest integer second, and so on), how would you do it? Sorting is a well-understood and well-researched problem, and there are actually lots of known ways to solve it; as of this writing, Wikipedia's page about sorting algorithms lists over 40 different algorithms. So how do you choose one?

One way is to implement various solutions and then test them out on representative sets of input data. The upside of this approach is that you'll get a realistic view of which one is truly the winner, because you'll have an objective measurement. On the other hand, there are some huge downsides here. First of all, you'll need to actually implement the various solutions and test each implementation to make sure it's actually correct, which may be no small feat. Secondly, you'll need to make sure you've chosen test data that's representative of reality; a bad choice of test data — one that seems reasonable but is flawed in some way — could easily give you the wrong answer, just as well as a bug in your implementation could. So maybe that "objective measurement" won't be so objective after all!

You've no doubt learned, many times over, that writing correct code is not an easy thing to do, so we should prefer not to do it if we can avoid it. Why write 40 or more sorting algorithms just to see which one is the best for your situation, when all you really need is one of them? Ideally, we'd be able to make a good choice — quantitatively, even! — without ever writing a line of code, so that once we started our implementation, we would be making useful progress toward our final goal.

So, what we need is a mechanism for doing this kind of analysis and expressing its result. One very commonly-used approach is called asymptotic analysis, which considers the way some measurement of an algorithm changes as the size of the problem changes (e.g., How does the amount of time spent sorting a collection of integers grow as the size of the collection grows?). We call this kind of analysis asymptotic, because it concerns itself mostly with the rate of growth as the problem sizes become arbitrarily large — technically, as they approach infinity (which is where the name "asymptotic" comes from), though we're obviously concerned, in practice, with problems of finite size, with our emphasis on analysis becoming greater as problems become larger.


A simple asymptotic analysis

To orient our minds correctly, if you'll indulge me, let's consider a couple of simple algorithms for getting from one side of a rectangular room to another. (They say an algorithm is a "step-by-step procedure"; what could be more "step-by-step" than walking across a room?)

One obvious algorithm for getting across a rectangular room works like this:

start with your back against one side wall, facing the opposite wall

while you haven't reached the opposite wall
{
    take one step forward
}

In the absence of any obstacles, this algorithm should be correct in any rectangular room of any size. Sooner or later, you'll reach the opposite wall.

Since we're interested in a quantitative kind of analysis, the important question is how long it takes for the algorithm to run. The answer actually depends on a number of factors: how long it takes you to take a single step, how much distance each step covers, and, of course, how wide the room is.

For the sake of keeping the analysis simple, let's assume that every step takes the same length of time, and that every step covers the same distance. Holding these things constant, the important remaining factor is the width of the room; the wider the room is, the more time it'll take to cross it. In fact, we could draw a graph that expresses the relationship nicely, with the width of the room on the x-axis and the time required to cross it on the y-axis. That graph would be a diagonal line (i.e., a straight line that grows as we move to the right).

[Figure: a graph depicting a linear time growth rate]

In general, we see that the time it takes to walk across a room is directly proportional to the width of that room. In other words, if we double the width of the room, we'll double the time it takes to cross it; if we multiply the width of the room by 10, we'll multiply by 10 the time required to cross it. This kind of algorithm is known as a linear-time algorithm (i.e., the time required grows linearly with respect to the size of the problem).
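To make that relationship concrete, here's a small C++ sketch of the walking algorithm, with made-up numbers for the room width and the length of a step (both values, and the function name, are purely illustrative assumptions). The only thing that matters for our analysis is that the number of loop iterations grows in direct proportion to the width of the room.

#include <iostream>

// Count the steps needed to cross a room of the given width, assuming every
// step covers the same fixed distance. (Both parameters are hypothetical and
// can be in any unit of distance, as long as it's the same one for both.)
unsigned int stepsToCrossRoom(double roomWidth, double strideLength)
{
    unsigned int steps = 0;
    double distanceCovered = 0.0;

    // "while you haven't reached the opposite wall, take one step forward"
    while (distanceCovered < roomWidth)
    {
        distanceCovered += strideLength;
        ++steps;
    }

    return steps;
}

int main()
{
    // Doubling the width doubles the number of steps: linear growth.
    std::cout << stepsToCrossRoom(10.0, 2.5) << std::endl;   // 4
    std::cout << stepsToCrossRoom(20.0, 2.5) << std::endl;   // 8
    std::cout << stepsToCrossRoom(40.0, 2.5) << std::endl;   // 16
    return 0;
}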

A second algorithm for getting across a rectangular room comes from science fiction. And, in fact, there's not much "science" here; it's mostly fiction.

start with your back against one side wall, facing the opposite wall
press a button on your teleportation device
you'll be disintegrated, and a copy of you will be created, standing just in front of the opposite wall

Let's imagine that the teleportation device works in such a way that it takes thirty seconds to build a copy of you on the opposite side wall, regardless of how wide the room is. If we were to graph the time required to cross the room using our new algorithm, the graph would look different: It would be a horizontal line (i.e., a straight line that doesn't grow at all as we move to the right).

[Figure: a graph depicting a constant time growth rate]

In general, this is what is often called a constant-time algorithm, so called because the time required to solve a problem is constant (i.e., it doesn't change as the size of the problem grows).

Now, the operative question is which of these algorithms is better, and let's say our notion of "better" has only to do with which one takes the least amount of time. (We'll leave aside the philosophical question of whether the teleported copy of me is actually "me".) And if you think about it, you'll realize that there isn't a clear answer that's right for every room. For smaller rooms that take less than thirty seconds to cross, walking across the room will be faster. For larger rooms that take longer, teleportation will be faster.

But there is one truth here worth considering. The growth rate of a constant-time algorithm is slower than the growth rate of a linear-time algorithm. What this tells us is that if you consider rooms that are progressively wider, you will sooner or later be considering rooms wide enough that the constant-time algorithm wins; for all rooms wider than that, the constant-time algorithm will still win. There's a point where the slower-growing algorithm eventually wins, even if it doesn't win for the relatively narrow rooms.

[Figure: a graph depicting a linear time growth rate overtaking a constant time one]
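If you want to see where that crossover happens, here's a quick C++ sketch that compares the two approaches using entirely made-up numbers (one step per second, one foot per step, and a thirty-second teleporter); the specific values don't matter, only the fact that a crossover point exists.

#include <iostream>

int main()
{
    const double secondsPerStep = 1.0;    // hypothetical walking speed
    const double feetPerStep = 1.0;       // hypothetical stride length
    const double teleportSeconds = 30.0;  // the teleporter always takes 30 seconds

    for (double width = 10.0; width <= 50.0; width += 10.0)
    {
        double walkSeconds = (width / feetPerStep) * secondsPerStep;
        const char* winner = (walkSeconds < teleportSeconds) ? "walking" : "teleporting";
        std::cout << "width " << width << " ft: walking takes " << walkSeconds
                  << " s, so " << winner << " wins" << std::endl;
    }

    // With these numbers, walking wins below 30 feet; teleporting wins beyond
    // it, and keeps winning no matter how much wider the room gets.
    return 0;
}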

What we've just done is an asymptotic analysis. In spirit, that's all there is to it.


What do we do about the small details?

You could imagine that a particular algorithm that solves, say, the sorting problem could be measured against inputs of various sizes, and that it might be possible to describe the algorithm's running time as a function of the size of the input. You might determine, for example, that the algorithm runs in 3n²+4n+5 milliseconds. And you might determine that, for a different algorithm, the total is n²+6n milliseconds instead. Clearly, given these functions, our two algorithms have different running times for inputs of various sizes. But is there a significant difference between them? Are they similar enough to be considered, more or less, equivalent to each other? Are they different enough that our analysis needs to be this precise?

Before we answer those questions, let's consider a third algorithm, one that we've determined to run in 1234n milliseconds to solve the same problem. How different is this one from the others? Which one is fastest? How does our notion of similarity (or lack thereof) change now that we've added this function to the mix?

First, let's think about it intuitively. What does our understanding of these functions tell us about them? The first two, 3n²+4n+5 and n²+6n, are both dominated by their n² terms, so as n grows they grow at fundamentally the same (quadratic) rate, differing only by a constant factor. The third, 1234n, has a much bigger coefficient, so it's actually the slowest of the three for small inputs; but because it grows only linearly, sooner or later it becomes (and remains) the fastest as n gets large.
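One quick way to sharpen that intuition is to evaluate the three functions at a few problem sizes and watch what happens. (This little program is just a sanity check added here for illustration, not part of a formal analysis.)

#include <initializer_list>
#include <iostream>

int main()
{
    // Evaluate the three hypothetical running-time functions (in milliseconds)
    // at a few problem sizes n.
    for (long long n : {10LL, 100LL, 1000LL, 10000LL})
    {
        long long f1 = 3*n*n + 4*n + 5;   // 3n² + 4n + 5
        long long f2 = n*n + 6*n;         // n² + 6n
        long long f3 = 1234*n;            // 1234n
        std::cout << "n = " << n << ": " << f1 << "  " << f2 << "  " << f3 << std::endl;
    }

    // The two quadratic functions stay within a small constant factor of each
    // other, while 1234n starts out the largest but ends up the smallest once
    // n reaches the thousands.
    return 0;
}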

Now let's consider where we might get this level of detail in the first place. In our explanation above, the assumption is that we implemented these algorithms and then measured them. But if our ideal goal is to be able to do this kind of analysis without first writing any code — so we know whether we're on the right track before we start — then we're not easily going to be able to achieve this level of detail. The distinctions between 1234n, 1000n, and 13n are going to be difficult to deduce without some implementation details. The distinction between 3n²+4n+5 and n²+6n will be similarly difficult.

So, the goal should be to leave some of the small details out. What we should be after are less-refined measurements that just give us a ballpark estimate of how each algorithm behaves, giving us enough information to compare them broadly, while leaving aside the fine-grained comparisons for cases where we need them. This process is driven primarily by describing the basic "shape" of a function, and by comparing these shapes to see if they're the same or different. But how do we determine, quantitatively, the basic shape of a function? What does shape mean? The answer to that lies in a kind of mathematically-simplified form called asymptotic notation, of which we'll learn about three types: O-notation, Ω-notation, and Θ-notation. These notations, and their corresponding definitions, specify an agreement amongst mathematicians and computer scientists about the general "shape" of a function, so we can all consider the same details to be relevant, while leaving out the others; this way, we're all on the same page.


O-notation

We can begin to learn about O-notation — sometimes also called Big O-notation — by starting with its definition.

def. We say that f(n) is O(g(n)) if and only if there are two positive constants, c and n₀, such that f(n) ≤ cg(n) ∀ n ≥ n₀

Like a lot of mathematical definitions, this one can seem daunting at first, but it helps if you understand a few details. Here, f(n) is the function we're analyzing (some measurement of our algorithm, such as its running time, expressed in terms of the problem size n), and g(n) is the simpler function we're comparing it against. The constants c and n₀ are ours to choose, and all we need is one pair of values that works. Finally, the inequality doesn't have to hold for every n; it only has to hold for every n that is at least n₀, which captures the idea that we only care about what happens once the problem gets large enough.

So, for example, we might want to know if it's true that 3n+4 is O(n) (i.e., is it true that the functions 3n+4 and n have, more or less, the same growth rate?). One way to do this is by proving it with the mathematical definition:

 f(n) ≤ cg(n)     ∀n ≥ n₀
 3n+4 ≤ cn        ∀n ≥ n₀

     let c = 4  (all we need is a c and n₀ that work)

 3n+4 ≤ 4n        ∀n ≥ n₀

     let n₀ = 4

 3n+4 ≤ 4n        ∀n ≥ 4
    4 ≤ n         ∀n ≥ 4   (subtracted 3n from both sides of the inequality)

And we're left with an obviously true statement, that n is at least 4 whenever n is at least 4. Thus, it's proven that 3n+4 is O(n).
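If you'd like a more concrete feel for what we just proved, here's a brute-force check that the constants c = 4 and n₀ = 4 really do work over a large range. (This is only a sanity check, not a substitute for the proof, since a program can try only finitely many values of n.)

#include <cassert>
#include <iostream>

int main()
{
    const long long c = 4;
    const long long n0 = 4;

    // Verify that 3n + 4 <= c * n for every n from n0 up to an arbitrary bound.
    for (long long n = n0; n <= 1000000; ++n)
    {
        assert(3 * n + 4 <= c * n);
    }

    std::cout << "3n + 4 <= 4n held for every n in [4, 1000000]" << std::endl;
    return 0;
}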

Understanding the limitations of O-notation

O-notation gives us a way to ignore lower-level details in favor of a quick, back-of-the-envelope analysis of the growth rate of a function. The quick analysis doesn't give us an exact answer about, say, how long it will take to run a search of 1,000 elements in a linked list on a particular machine in a particular situation. But it does tell us, in general, that, as the problem size continues to grow, one algorithm would eventually beat another one (and continue to beat it no matter how much larger the problem got). That's a powerful piece of knowledge to have.

The details that are ignored by O-notation, though, are sometimes precisely the ones you might need, so it's important to understand the ways in which O-notation is limited. Constant factors and lower-order terms are discarded, so an algorithm that is O(n) could still be slower in practice than one that is O(n²) for the problem sizes you actually encounter. The analysis only describes the trend as n becomes large, so it tells you little about behavior on small inputs. And it says nothing about real-world effects, like the machine you're running on or how memory is being used, which can dominate actual running times.

These aren't problems we can fix, necessarily, but they do tell us that our analyses won't always end with an asymptotic analysis; sometimes, we'll need to do more.

Another limitation of O-notation is that, technically, it's allowed to be a gross overestimate. Is it true that 3n is O(n²)? Let's see if we can prove it using the definition.

 f(n) ≤ cg(n)     ∀n ≥ n₀
   3n ≤ cn²       ∀n ≥ n₀

     let n₀ = 1

   3n ≤ cn²       ∀n ≥ 1

     let c = 3

   3n ≤ 3n²       ∀n ≥ 1

That last statement is certainly true for all n ≥ 1, so it is indeed proven. In fact, we could also prove that 3n is O(n³), O(n⁴), O(2ⁿ), and O(n!), among many other functions whose growth rates are greater than the growth rate of n.

That said, our goal here is to use analysis to convince ourselves and others about fundamental truths regarding algorithms and data structures. There will be no reason to make weaker statements when it's possible for us to make stronger ones. So we'll try to keep our bounds as tight as we can. And we'll also add some additional notations, to allow us to express things other than just upper bounds.


Ω-notation

Ω-notation is similar to O-notation, but with the distinguishing characteristic that it specifies a lower bound on f(n), rather than an upper bound. So if O-notation provides us a way to show that some measurement of performance will never grow worse than a certain rate, Ω-notation, instead, shows that some measurement will never grow better than a certain rate. This allows us to make a different kind of statement about performance, but one that can sometimes also be useful in practice.

From a definition standpoint, Ω-notation looks an awful lot like O-notation.

def. We say that f(n) is Ω(g(n)) if and only if there are two positive constants, c and n₀, such that f(n) ≥ cg(n) ∀ n ≥ n₀

So the key difference here is in the direction of the inequality. Otherwise, the definition is the same, meaning that it's made up of the same parts that each have the same basic purpose.

You might like a tool like this, in practice, because it allows you to say different kinds of things about algorithms. One useful thing it allows us to do is to draw conclusions about classes of algorithms, as opposed to just individual ones. For example, suppose you have a collection of integers and you want to sort them. Depending on which algorithm you choose, its O-notation will be quite different. (As we'll see, some algorithms can sort elements in O(n²) time, others in O(n log n) time, and some can even do it in O(n), at least in some circumstances.) But we can say, for sure, that no matter what algorithm we choose, it will take Ω(n) time to run; that's fundamental. Here's why:

Suppose, just for the sake of argument, that you had a "magic box" that could instantly tell you the correct sorted order of your n integers. Even with that kind of help, you would still have to examine each of the n elements at least once and, potentially, move each one into its proper position; doing anything at all with each of n elements requires at least n steps, so the total time can never grow more slowly than linearly.

Of course, there's no "magic box" sorting algorithm that works in the general case. But what this analysis shows is that this really is the best we can do. So we could say, in general, that sorting takes Ω(n) time; the lower bound on sorting is linear, because you at least have to verify that what you have is sorted and, potentially, move the things that aren't.
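Here's a minimal sketch of just the verification step, to make the point concrete (the function name is my own, not part of any particular library): deciding that a sequence really is in ascending order requires comparing every adjacent pair of elements, which already takes time proportional to n.

#include <vector>

// Returns true if the elements are in ascending order. Answering "yes"
// requires comparing every adjacent pair, so even this check alone takes
// time proportional to n -- one reason sorting can never beat Omega(n).
bool isSortedAscending(const std::vector<int>& v)
{
    for (std::vector<int>::size_type i = 1; i < v.size(); ++i)
    {
        if (v[i - 1] > v[i])
        {
            return false;
        }
    }

    return true;
}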

That's what Ω-notation is really about: establishing a lower bound on the growth rate of a function. We'll see more practical examples of this as we move forward through the course.


Θ-notation

In addition to allowing us to talk about lower bounds, Ω-notation allows us to do one other important thing. If O-notation lets us bound the growth rate of a function from above, while Ω-notation allows us to bound it from below, then that means we can sometimes establish a tight bound, by showing that the upper bound and lower bound are the same. That's the idea behind Θ-notation, whose definition follows.

def. We say that f(n) is Θ(g(n)) if and only if f(n) is O(g(n)) and f(n) is Ω(g(n))

In other words, Θ-notation is used to express the notion that there's a tight fit, that an algorithm's growth rate can be specified definitively instead of somewhat more weakly.
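For example, here's a small function (my own illustration) whose tight bound is easy to see: finding the largest element in a vector always examines every element, never more and never fewer, so its running time is both O(n) and Ω(n), which means we can say it's Θ(n).

#include <vector>

// Find the largest element. Every one of the n elements is examined exactly
// once, no matter what the contents are, so the running time is O(n) and
// Omega(n) -- and, therefore, Theta(n).
int largest(const std::vector<int>& v)
{
    int largestSoFar = v.front();   // assumes v is non-empty

    for (std::vector<int>::size_type i = 1; i < v.size(); ++i)
    {
        if (v[i] > largestSoFar)
        {
            largestSoFar = v[i];
        }
    }

    return largestSoFar;
}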


A few examples of asymptotic analysis

To continue getting our minds around asymptotic analysis, here are a few examples. Suppose you have an array of n three-digit integers, and that the integers are not necessarily stored in a meaningful order already. Using asymptotic notations, we can talk about the growth rate of the time required to run various algorithms on our array (i.e., how will the time it takes increase as the size of the array increases?).
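For example, here are a few operations we might perform on such an array, with the growth rate of each one's running time noted in a comment. (These particular operations, and their names, are illustrations chosen here; they aren't an exhaustive list.)

// Accessing a single element by index takes the same amount of time no matter
// how large the array is: Theta(1). (Assumes the array is non-empty.)
int firstElement(const int a[])
{
    return a[0];
}

// Searching for a value might have to look at every element before giving up,
// so in the worst case the time grows proportionally to n: O(n).
bool contains(const int a[], unsigned int n, int value)
{
    for (unsigned int i = 0; i < n; ++i)
    {
        if (a[i] == value)
        {
            return true;
        }
    }

    return false;
}

// Adding up the elements touches each one exactly once, regardless of what
// the contents are, so the time is always proportional to n: Theta(n).
long long sumOfElements(const int a[], unsigned int n)
{
    long long sum = 0;

    for (unsigned int i = 0; i < n; ++i)
    {
        sum += a[i];
    }

    return sum;
}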

Note, too, that we can use asymptotic notations to describe measurements other than time. For example, we would say that our array of n elements would require Θ(n) memory, because it always requires a constant amount of memory (sizeof(int)) for each of the n elements.