ICS 46 Spring 2022
Notes and Examples: Comparison-Based Sorting


The sorting problem

We'll be spending some time learning, analyzing, and comparing several different ways of solving the sorting problem. Before we do that, it's not a bad idea for us all to agree on what the problem actually is. What are we looking for? How do we know we've solved it? Does it matter what types of data we're sorting? Even if you may think that it sounds silly to start there — that you must know already what it means to sort things — by understanding the problem more thoroughly, we'll discover that we can solve it in a wider variety of cases (and ways) than you might have first thought.

What does it mean to be sorted?

Suppose you have a sequence of items of some type — they might be integers, they might be strings, they might be calendar dates. We say that this sequence can be sorted if there exists a total ordering for the items. Total ordering is a term from mathematics with a particular meaning, which we should be sure we understand. A total ordering exists if there is a relation, which you might refer to by the symbol ≤, that can be used to provide that ordering. Colloquially, we could say that ≤ is the way that we decide "which item comes first" in a sorted version of the sequence. But there are a few details we have to get right; for ≤ to form the basis of a total ordering, it has to have a few properties:

  1. It relates every pair of items: for any two items i and j, it must be the case that i ≤ j or j ≤ i (or both).
  2. It is antisymmetric: if i ≤ j and j ≤ i, then it must be that i = j.
  3. It is transitive: if i ≤ j and j ≤ k, then it must also be that i ≤ k.

These properties, taken together, mean that our ≤ relation can make meaningful decisions about which items come before which others, and that those decisions, taken together, will lead to a sequence that is sorted meaningfully. If our ≤ relation has these properties, we can use it to sort elements into a meaningful order.

There is another wrinkle worth considering here, as well. It's important not to take the use of ≤ too literally here. It's certainly true that the ≤ relation on integers, for example, has the properties described above and, thus, can be used to provide a total ordering of the elements. If you literally used ≤ to sort integers, you would be sorting them into ascending order, which is to say that they'd be arranged from smallest to largest.

However, the ≥ relation on integers has the properties described above, as well. For any two items i and j, either i ≥ j or j ≥ i. If i ≥ j and j ≥ i, that means i = j. The ≥ relation is transitive. So we can just as well use the ≥ relation to sort integers, in which case we'd have sorted them into descending order, arranged from largest to smallest.

You could also sort elements that aren't numbers. Strings, for example, can be sorted, as long as you have a total ordering relation. One way to do that is to use a lexicographical comparison, which compares the character codes of the first character of each string using ≤, then the second character if the first characters are equal, and so on. But you could also sort strings by comparing their lengths, or counting how many times the letter A appears in them. Any of these relations are total orderings, so any of them would be potential ways you could sort strings. By the same token, you could sort calendar dates by year, then month (if the years are equal), and day (if the years and months are equal); you could sort students by student ID#; and so on.

Regardless of which relation you choose, the sorting algorithms themselves will be the same, and we'll generally learn about these algorithms by sorting integers into ascending order. But be aware that the same algorithms can be used to sort other kinds of things, simply by using a different means of comparing them. (For example, if implemented in C++, you could pass a function or a function object as a parameter to your sort algorithm, which is what built-in sorting functions like std::sort allow.)
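
For example, here's a sketch of how we might sort a vector of strings by their lengths in C++, simply by handing std::sort a different comparison (written here as a lambda). The function name is just one chosen for this example; the point is that the algorithm doesn't change, only the comparison does.

#include <algorithm>
#include <string>
#include <vector>

// Sort strings into ascending order of length by giving std::sort a
// different "comes before" comparison.  (sortByLength is just a name
// chosen for this example.)
void sortByLength(std::vector<std::string>& strings)
{
    std::sort(strings.begin(), strings.end(),
        [](const std::string& a, const std::string& b)
        {
            return a.length() < b.length();
        });
}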

Comparison-based sorting

Many sorting algorithms do their work by comparing pairs of elements, then rearranging them based on the result of those comparisons. We call these algorithms comparison-based sorting algorithms. There are a variety of such algorithms, with wildly different performance characteristics. Some perform better than others in one circumstance, while performing worse than others in different circumstances. Some are generally solid, if not the best in every case. Some really are irredeemable.

Below, we'll discuss some of the most commonly-studied comparison-based sorting algorithms, focusing both on how each of these algorithms works and on how well it works.


Insertion sort

The first comparison-based sorting algorithm we'll look at is called insertion sort. Suppose we have the following list of elements that we'd like to sort. (We'll think of this as an array, though, as we'll see, this algorithm could be implemented to work just as well on a linked list, since it mainly accesses elements sequentially.)

6 3 7 1 2 8 5 4

Insertion sort begins by conceptually dividing these elements into two sections, a sorted section and an unsorted section. The elements in the sorted section are sorted with respect to one another; the elements in the unsorted section can be in any order. Initially, the first element of the array is considered sorted, while the rest are considered unsorted. Our initial state, then, looks like this — each element in the sorted section is marked with s and each element in the unsorted section is marked with u.

6 3 7 1 2 8 5 4
s u u u u u u u

We then proceed through a number of steps, with each step having the job of inserting the first element in the unsorted section into the appropriate place in the sorted section. If there are n elements total, after n − 1 steps, the sorted section will comprise all of the elements in the array, i.e., the array will be sorted.

At each step, the insertion is done by comparing the element we're inserting to the element directly to its left. If it's in the right place already, we'll leave it alone and we're done; if not, we'll swap it with the element directly to its left, then do the same thing again, continuing until we've found the right place to insert the element.

In this example, our first step will have us compare 3 to 6. 3 is smaller than 6, so it belongs to 6's left, which means we'll swap the elements. There's nowhere left to look; 3 is now at the beginning of the array, so we're done.

3 6 7 1 2 8 5 4
s s u u u u u u

Next, we compare 7 to the element to its left, which is 6. 7 is fine with respect to 6 already, so we're done. 7 is now part of the sorted section.

3 6 7 1 2 8 5 4
s s s u u u u u

In the next step, we need to insert 1 into the appropriate position in the sorted sequence. It's smaller than all the other elements in the sorted sequence, so we'll swap 1 with 7, then swap 1 with 6, then swap 1 with 3. 1 is now in the appropriate place in the sorted section.

1 3 6 7 2 8 5 4
s s s s u u u u

The algorithm proceeds this way until the unsorted section is empty, which leads to four more steps:

1 2 3 6 7 8 5 4
s s s s s u u u

1 2 3 6 7 8 5 4
s s s s s s u u

1 2 3 5 6 7 8 4
s s s s s s s u

1 2 3 4 5 6 7 8
s s s s s s s s

At this point, our entire array is sorted; our insertion sort is complete.

Analysis

One nice thing about insertion sort is that it's dead simple to implement. All other things being equal, simple is better than complex. A basic insertion sort algorithm that sorts a vector of integers in C++ might look something like this. (It wouldn't be a stretch to turn this into a function template capable of sorting an arbitrary collection of arbitrary things using an arbitrary comparison function, but I'll stick with this version to keep it simple.)

void insertionSort(std::vector<int>& v)
{
    for (int i = 1; i < v.size(); ++i)
        for (int j = i; j > 0 && v[j] < v[j - 1]; --j)
            std::swap(v[j], v[j - 1]);
}
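
As an aside, here's a sketch of what that generalization might look like, with the element type and the comparison both becoming parameters (and the comparison defaulting to the usual meaning of <). It isn't the only way to write it, but it shows that nothing about the algorithm itself has to change.

#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// The same algorithm, with the element type and the comparison as parameters.
// By default, comesBefore is std::less<T>, i.e., the usual meaning of <.
template <typename T, typename Compare = std::less<T>>
void insertionSort(std::vector<T>& v, Compare comesBefore = Compare{})
{
    for (std::size_t i = 1; i < v.size(); ++i)
        for (std::size_t j = i; j > 0 && comesBefore(v[j], v[j - 1]); --j)
            std::swap(v[j], v[j - 1]);
}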

Still, simple isn't good enough if we have performance requirements that aren't being met, so we'll need to understand how well this algorithm performs. And, as with many algorithms, the answer is "it depends".

Best case. At each step, the best outcome is that the element is already in the right place, in which case we compare it to the element to its left, determine that there's nothing more to do, and move on. If the array is already sorted, this will happen at every step, so that's the best possible situation we can be in. In that case, we'll run n − 1 steps, and each one will require Θ(1) time (to do a single comparison and no swaps), for a total of Θ(n) time. This is a very nice outcome indeed!

Worst case. At each step, the worst outcome is that the element has to be inserted at the beginning of the sorted sequence, which means it needs to be swapped all the way through the sorted sequence. If the array is reverse-sorted to begin with, this will happen at every step, so that's the worst possible situation we can be in. In that case, the first step will do one comparison and swap; the second step will do two of them; and so on, with the last step doing n − 1 of them. This leads us back to some math we've seen before:

   (n - 1) + (n - 2) + ... + 1
=  (n - 1)n / 2

Each comparison and swap takes Θ(1) time and we'd need to do (n - 1)n / 2 of them, for a total of Θ(n²) time, which is a whole lot worse than the best case. What would the average be?

Average case. The average case would see us insert the elements, on average, at the middle of the sorted sequence. While it's difficult to describe an array that would lead to this, that would be the overall average. In that case, we would do this many comparisons and swaps:

   (n - 1) / 2 + (n - 2) / 2 + ... + 1 / 2
=  ((n - 1) + (n - 2) + ... + 1) / 2
=  ((n - 1)n / 2) / 2
=  (n - 1)n / 4

Sadly, the average case is still Θ(n²).

The reality of the situation

What can we conclude from our analysis? There are two circumstances where insertion sort excels:

  1. When the elements are already sorted (or very nearly so), so that we're in or near the Θ(n) best case.
  2. When the number of elements is small, so that even a quadratic number of very simple steps is cheap.

Outside of those scenarios, insertion sort is not a great choice. O(n²) overall is a disappointing result, and we should want to do better if we can. Where insertion sort spends its time is in the insertion. What if we could avoid some of that work by being smarter about how we do the insertions?


Binary insertion sort

One way to improve insertion sort would be to make the observation that the insertion step isn't taking advantage of the fact that the sorted sequence is, in fact, sorted. At each step, we're essentially doing a linear search to find an insertion point, walking backward one element at a time until we find it. What if we did a binary search at each step instead? That's what an algorithm called binary insertion sort does. It's otherwise identical to insertion sort. The question is how much of an improvement this really is.
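
Before we analyze it, here's one way a binary insertion sort might be sketched in C++, using std::upper_bound to binary search the sorted section for an insertion point and std::rotate to slide elements over to make room.

#include <algorithm>
#include <cstddef>
#include <vector>

// Binary insertion sort: std::upper_bound binary searches the sorted section
// for the insertion point, and std::rotate slides the intervening elements
// over by one to make room.  Note that the rotate still moves elements the
// same way insertion sort's swaps did.
void binaryInsertionSort(std::vector<int>& v)
{
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        auto insertHere = std::upper_bound(v.begin(), v.begin() + i, v[i]);
        std::rotate(insertHere, v.begin() + i, v.begin() + i + 1);
    }
}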

Analysis

The first thing to realize is that the movements of the elements are still going to have to happen the same way they did in insertion sort. Suppose we're at this point in a binary insertion sort:

2 3 4 5 6 7 8 1
s s s s s s s u

Even if binary searching for an insertion point will reduce the number of comparisons it takes to discover that 1 should be inserted before all the others, we'll still have to do all the same movements we did before; if this is an array, we still have to slide elements out of the way to make room for 1. (And if we're sorting a linked list, we can't do the binary search in the first place, rendering this whole algorithm moot!)

This tells us that the worst case hasn't changed; part of what made it take Θ(n²) time is the need to do Θ(n²) movements of elements. That hasn't gone away, so the worst case here will still be Θ(n²).

What's worse is that we've actually hurt our best-case performance. Suppose we're in this situation:

1 2 3 4 5 6 7 8
s s s s s s s u

Insertion sort would quickly determine that 8 is in the right place relative to 7 and be done. Binary insertion sort, on the other hand, will binary search for an insertion point before coming to the same conclusion. When there are i elements in the sorted section, this will take Θ(log i) time. In total, how long would be spent on the binary searches? A little math gives us our answer.

   log₂1 + log₂2 + log₂3 + ... + log₂n
=  log₂(1 * 2 * 3 * ... * n)
=  log₂(n!)
=  Θ(n log n)        This last step is thanks to something called Stirling's approximation

So, even if the array is already sorted, we'll spend a total of Θ(n log n) time doing comparisons and deciding where to insert the elements. Reviewing the two scenarios where insertion sort is a good choice, we find that binary insertion sort is worse in both of them:

  1. If the elements are already sorted, insertion sort finishes in Θ(n) time, while binary insertion sort spends Θ(n log n) time on its binary searches.
  2. If the number of elements is small, the extra machinery of the binary search buys us nothing; the simple linear search is already cheap.

The moral of the story, in other words, is that a "smart" optimization isn't always all that smart; it takes some analysis to decide whether our ideas are actually any good.


Selection sort

Where insertion sort spends a lot of its time, ultimately, is in the actual inserting of elements into the sorted section. Whether we do a linear search or a binary search to decide where to insert them, the reality is that we still have to move elements around to make space for the inserted element, which is where a lot of the cost is in an insertion sort. But what if we just made better choices about which elements to insert? Couldn't we lessen the cost of doing the insertions if we inserted the "right" elements into the sorted section at each step? That's the idea behind an algorithm called selection sort.

Like insertion sort, selection sort begins by dividing the elements into two sections: a sorted section and an unsorted section. Unlike insertion sort, the sorted section starts out empty, and all of the elements are considered unsorted. For example, we might start out this way:

6 3 7 1 2 8 5 4
u u u u u u u u

At each step, selection sort involves finding the largest element in the unsorted section. That element is then swapped to the end of the unsorted section and becomes part of the sorted section instead. In the example above, we'd start by determining that 8 is the largest element, then swap it to the end of the array.

6 3 7 1 2 4 5 8
u u u u u u u s

Note that 8 is where it ultimately belongs; it'll never need to move again.

Next, we'd select the largest remaining element, which is 7, then swap it to the end of the unsorted section (i.e., swapping it with 5). The sorted section, again, grows by one element.

6 3 5 1 2 4 7 8
u u u u u u s s

The algorithm proceeds in this way; each step causes the sorted section to grow by one element.

4 3 5 1 2 6 7 8
u u u u u s s s

4 3 2 1 5 6 7 8
u u u u s s s s

1 3 2 4 5 6 7 8
u u u s s s s s

1 2 3 4 5 6 7 8
u u s s s s s s

1 2 3 4 5 6 7 8
u s s s s s s s

1 2 3 4 5 6 7 8
s s s s s s s s

When we've completed n steps — one for each element we're sorting — the entire array is sorted.
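
Selection sort is also simple to implement. Here's one possible sketch in C++, using std::max_element to find the largest element in the unsorted section at each step.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// At each step, the unsorted section is v[0 .. unsortedSize - 1]; find its
// largest element and swap it into the last position of that section.
void selectionSort(std::vector<int>& v)
{
    for (std::size_t unsortedSize = v.size(); unsortedSize > 1; --unsortedSize)
    {
        auto largest = std::max_element(v.begin(), v.begin() + unsortedSize);
        std::swap(*largest, v[unsortedSize - 1]);
    }
}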

Analysis

The most important part of this algorithm is finding the largest element in the unsorted section. This requires us to look through the entire unsorted section; only then can we be sure we have the largest one. (There is one narrow circumstance where this isn't true, which is when we find an element equal to the last element we swapped into the sorted section, but this is unlikely in most cases, so we'll ignore that detail going forward.)

So how many elements do we end up looking at? In the first step, we look at all n of them. In the second step, one of the elements is already in the correct position, but we'll have to look at the remaining n − 1. So, in total, we'll look at this many elements:

   n + (n - 1) + (n - 2) + ... + 1
=  n(n + 1) / 2

Generally, then, we'll be doing a total of Θ(n²) comparisons. We'll only do n swaps, but the total time we'll spend will still be Θ(n²). And note that this is always how long it'll take; even if the array is sorted already, we'll be finding maximums and swapping them into place at each step.

Again, we've attempted to be clever, but ended up making things worse; insertion sort still wins. Is insertion sort as good as it gets? Absolutely not, but we'll need to consider other approaches.


Properties of sorting algorithms

Thus far, we've seen three sorting algorithms: insertion sort, binary insertion sort, and selection sort. While they each do their job somewhat differently, they share some similar characteristics. When comparing sorting algorithms, it's a good idea to stop and observe the things that make them similar and the things that make them different.

In-place algorithms

We say that an in-place algorithm is one that is able to do its work using Θ(1) additional memory — above and beyond the memory used for its initial parameters. So, for example, a sorting algorithm would be in-place if it could be done using Θ(1) memory, not counting the elements that we're sorting. So, which of the algorithms we've seen so far are in-place? As it turns out, all three of them are: insertion sort, binary insertion sort, and selection sort each rearrange the elements within the array being sorted, using only a constant number of additional variables (indices, a temporary used while swapping, and so on).

As we'll see, some sorting algorithms do their work using ancillary data structures, and that offers us one way to potentially improve on the algorithms we've seen. (It's not at all uncommon in computer science to find algorithms that use more memory but, as a result, need less time.)

Stable sorting algorithms

A sorting algorithm is stable if it maintains the relative ordering of elements that are considered equivalent. If you were sorting strings by their length, for example, any two strings with the same length would be considered equivalent. Suppose we're sorting these strings:

Boo is happy to see me

The strings is, to, and me are the same length; the strings Boo and see are also the same length. A stable sorting algorithm is one that would give this result:

is to me Boo see happy

Since is was listed before to and to before me originally, they remain in that same relative order when we're finished. Similarly, Boo and see remain in the same relative order.

As we'll see, stability can be a useful property for a sorting algorithm to have, though not all sorting algorithms have it. Of the algorithms we've seen so far, which ones are stable? Insertion sort is stable, because an element is only ever swapped past elements that are strictly greater than it, so equivalent elements never pass one another; binary insertion sort is stable as long as the binary search places each new element after any equivalent elements already in the sorted section. Selection sort, on the other hand, is not stable, because its long-range swaps can move an element past an equivalent one.

Sorting algorithms don't have to be stable to be useful, but stable sorting algorithms can be used in ways that non-stable ones can't. In particular, you can run stable sorting algorithms more than once using different comparison functions and get a sensible result. For example, if you sorted people by first name using a stable sorting algorithm, then sorted them again by last name using a stable sorting algorithm, the people would now be sorted primarily by last name with first name breaking any ties. With a non-stable sorting algorithm, the result might not be this way; once you sorted by last name, the relative order of the people by first name might be changed.
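
For instance, here's a sketch of that idea in C++ using std::stable_sort, which the standard library guarantees to be stable. (The Person type and function name are just for illustration.)

#include <algorithm>
#include <string>
#include <vector>

struct Person
{
    std::string firstName;
    std::string lastName;
};

// Sort by first name, then stably re-sort by last name; afterward, people are
// ordered by last name, with equal last names still ordered by first name.
void sortByLastThenFirst(std::vector<Person>& people)
{
    std::stable_sort(people.begin(), people.end(),
        [](const Person& a, const Person& b) { return a.firstName < b.firstName; });

    std::stable_sort(people.begin(), people.end(),
        [](const Person& a, const Person& b) { return a.lastName < b.lastName; });
}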


Treesort

If we give up on the idea that a sorting algorithm needs to be in-place (i.e., if we allow ourselves to use an ancillary data structure whose size varies with the number of elements we're sorting), we might be able to achieve things we couldn't achieve otherwise.

Recall that binary search trees have one property that turns out to be interesting if your goal is to sort some elements: If you have a binary search tree containing a set of keys, it's possible to iterate through those keys in ascending order in Θ(n) time, by doing something called an inorder traversal. That leads to a potentially good idea for a sorting algorithm:

Treesort(items):
    let t be an empty binary search tree

    for each item in items:
        insert the item into t

    perform an inorder traversal of t

The order in which the items are traversed will be ascending order; if we use the same ≤ relation to organize the items in the binary search tree as the one we want to sort by, this will certainly sort the elements. (There is one wrinkle here, which is that binary search trees are generally required to have unique keys; there are a couple of ways we could work around that, such as storing multiple items in the same node if they're considered equivalent.)
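
In C++, one way to sketch this idea is with std::multiset, which is typically implemented as a balanced binary search tree and which permits equivalent keys, sidestepping the wrinkle mentioned above. Iterating over a std::multiset from begin() to end() is an inorder traversal, so it visits the keys in ascending order.

#include <algorithm>
#include <set>
#include <vector>

// Build a balanced binary search tree containing every element (std::multiset
// allows duplicates), then copy the keys back out in ascending order via an
// inorder traversal (i.e., by iterating from begin() to end()).
void treesort(std::vector<int>& v)
{
    std::multiset<int> t(v.begin(), v.end());
    std::copy(t.begin(), t.end(), v.begin());
}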

Analysis

Worst case. We saw previously that a binary search tree is susceptible to performance problems when it becomes degenerate. If the elements were already sorted or were already reverse-sorted, for example, we'd end up with a binary search tree that's degenerate, and we'd end up spending Θ(n²) time building it. Note, though, that we can mitigate this problem by using a balancing technique, such as the one used by AVL trees.

Best case. Binary search trees behave very well when their height remains logarithmic — and their balanced brethren, like AVL trees, guarantee this. How long does it take to build a balanced binary search tree? At each step, we have i keys in the tree already and its height is Θ(log i). Adding that up for n steps is quite similar to some math we saw earlier:

   log₂1 + log₂2 + log₂3 + ... + log₂n
=  Θ(n log n)

Conclusion. As long as the height of the tree remains logarithmic at each step, the algorithm will run in Θ(n log n) time. If we keep the tree balanced, using a technique like AVL trees, this will be true even in the worst case.

However, the algorithm is not in-place — it requires a separate data structure — and it may or may not be stable, depending on how we handle multiple equivalent keys.


Heapsort

Let's consider again the selection sort algorithm we saw earlier, which boils down to this: at each step, find the largest element in the unsorted section and swap it to the end of that section, so that the sorted section grows by one element each time.

The problem with this algorithm is that it needs to spend a linear amount of time choosing the largest key at each step. If we could avoid that somehow, we might have a much better algorithm.

Previously, we learned about priority queues and how to implement them efficiently using a data structure called a binary heap. Binary heaps have two properties that can really help us here.

  1. The largest element in a binary max heap is always at its root, so it can be found in Θ(1) time and removed, with the heap restructured afterward, in O(log n) time.
  2. A binary heap is a complete binary tree, so it can be stored compactly in an array; in fact, it can be stored in the very array whose elements we're sorting, with no ancillary data structure required.

There's one change we could make to selection sort that might have a positive effect on its performance. What if we organized the unsorted section as a binary max heap — one whose largest element is at the root — instead? We could then remove the largest element using the algorithm we already know, then move it to the end of the unsorted section. The unsorted section would then be one element smaller (i.e., one fewer element would now be part of the binary max heap). If we did this at every step, eventually the unsorted section would be empty, but the sorted section would be a sorted version of the elements we started with.

This algorithm has a name: It's called heapsort. Suppose we wanted to sort these elements:

6 3 7 1 2 8 5 4

The algorithm begins by rearranging the elements so they form a binary max heap, a step that is sometimes called heapifying. That can be done using the following algorithm, which I'm describing conceptually, based around the idea that the elements of the array form a complete binary tree. (See the Priority Queues notes for a refresher on how that works.)

Heapify(Array a):
    for each element with at least one child, in reverse level-order:
        swap that element with the larger of its two children until it is larger than both its children

We'd start at the element 1, because it's the last one that has children (a left child, whose value is 4, and no right child). Since its left child is larger than it is, we swap them.

6 3 7 4 2 8 5 1

Moving backward in level order, we now want to swap 7 into place. Its two children are 8 and 5. 8 is the larger child and is larger than 7, so we swap 7 with 8.

6 3 8 4 2 7 5 1

Next, we want to swap 3 into place. Its two children are 4 and 2. 4 is the larger child and is larger than 3, so we swap 3 and 4.

6 4 8 3 2 7 5 1

3 now has one child, the element 1. 1 is smaller than 3, so 3 is fine where it is. Continuing, we now want to swap 6 into place. Its two children are 4 and 8. 8 is the larger child and is larger than 6, so swap 6 and 8.

8 4 6 3 2 7 5 1

6's two children are now 7 and 5. 7 is the larger child and is larger than 6, so swap 6 and 7.

8 4 7 3 2 6 5 1

And now 6 has no children, so it's fine where it is. At this point, our array has been rearranged so that it's a binary max heap. (Draw the complete binary tree if you want to verify this.)

Of course, the array isn't sorted yet, but we've now got a collection of unsorted elements that are a binary max heap. We'll denote that as h.

8 4 7 3 2 6 5 1
h h h h h h h h

From here, the algorithm proceeds similarly to selection sort, removing the maximum element from the heap and placing it at the end, then rearranging the remaining elements so they're again a heap; this is identical to dequeuing from a priority queue.

In our example, the first step would be to relocate 8 to the end of the heap by swapping it with 1, then restructure the remaining elements by moving 1 down until the remaining cells are a binary max heap.

7 4 6 3 2 1 5 8
h h h h h h h s

Next, we do the same with 7.

6 4 5 3 2 1 7 8
h h h h h h s s

Execution continues in this manner, with each step moving one element from the heap into the sorted section.

5 4 1 3 2 6 7 8
h h h h h s s s

4 3 1 2 5 6 7 8
h h h h s s s s

3 2 1 4 5 6 7 8
h h h s s s s s

2 1 3 4 5 6 7 8
h h s s s s s s

1 2 3 4 5 6 7 8
h s s s s s s s

1 2 3 4 5 6 7 8
s s s s s s s s

Analysis

The algorithm is made up of two parts, which we can analyze separately: heapifying the array, and then repeatedly removing the largest remaining element from the heap.

Heapifying the array, surprisingly, can be done in Θ(n) time. Even though elements sometimes have to move further down the heap than just one level, it turns out that many of the nodes reside near the bottom of the tree and, thus, don't have far to go, even if they go all the way to the bottom. (Around half of the nodes are leaves already and are being skipped. Half of the remaining nodes would only need to move down, at most, one level. Half of the other half would only need to move down, at most, two levels. So, overall, most of the nodes don't have far to go.) The proof of this is beyond the scope of our work here, but suffice it to say that this is a linear-time operation.

From there, we run one step for each of the n elements, which will involve removing one element from the heap and restructuring it. As we saw in the priority queue implementation, removing from a heap-based priority queue with i elements takes O(log i) time. We'll need to do this on a heap with n elements, then on one with n − 1 elements, and so on, which leads us back to this sum again.

   log₂1 + log₂2 + log₂3 + ... + log₂n
=  O(n log n)

In fact, it can be shown that this is actually a Θ(n log n) bound — in other words, even though some of the heap removals might be faster than others, they'll nonetheless average Θ(log n) time.

So, in total, heapsort spends Θ(n) time heapifying, then Θ(n log n) time removing the elements. The total time is Θ(n log n).

Heapsort is an in-place sort. What appears to be an ancillary data structure, the binary max heap, is actually stored within the array we're sorting. Heapsort is not stable, though, because binary max heaps can move equivalent elements around in a way that may change their relative order.
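
The C++ standard library provides the pieces of this algorithm directly, so a heapsort can be sketched in just a few lines: std::make_heap heapifies the array in place, and each call to std::pop_heap moves the current maximum just past the end of the shrinking heap. (std::sort_heap wraps up the second phase in a single call, but the explicit loop below makes the correspondence to the steps above more obvious.)

#include <algorithm>
#include <vector>

void heapsort(std::vector<int>& v)
{
    std::make_heap(v.begin(), v.end());      // heapify the array in Θ(n) time

    // Repeatedly move the heap's maximum just past the end of the shrinking
    // heap; the sorted section grows leftward from the back of the vector.
    for (auto end = v.end(); end != v.begin(); --end)
        std::pop_heap(v.begin(), end);
}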


Divide and conquer algorithms

A divide and conquer algorithm is one that solves its problem, broadly, using the following technique:

  1. Divide the problem into two or more smaller subproblems, each of which is a smaller instance of the same problem.
  2. Separately solve the subproblems recursively.
  3. Combine the solutions to the subproblems to yield an overall solution to the whole problem.

In the case of a sorting algorithm, we divide the problem of sorting all of the elements into two or more subproblems, each of which has the goal of sorting a subset of them. Combining the solutions involves taking the two or more sorted subsets of the elements and putting them together so that they comprise a sorted version of the entire set.

As we'll see, there is often a tradeoff between the time we spend dividing the problem and the time we spend combining the solutions to the subproblems. In other words, if we're more careful about how we divide, combining becomes easier; if we're less careful about how we divide, combining involves more work.


Quicksort

One divide-and-conquer approach to sorting is called quicksort. Quicksort is based around the notion of partitioning the array, by choosing one of the elements as a pivot, then rearranging the elements so that any elements smaller than the pivot are moved to its left and others are moved to its right. The two partitions are then quicksorted recursively.

The partitioning algorithm doesn't have the goal of sorting the elements; it just has the goal of ensuring that the pivot is placed between elements on its left that are smaller and elements on its right that aren't. Here's one such algorithm:

Partition(Array a):
    pivot = choose the index of the pivot; see below

    last = a.length - 1

    i = 0
    j = last - 1

    swap(a[pivot], a[last])                    swap the pivot to the end temporarily

    loop:
        while i ≤ last and a[i] < a[last]:     move i forward until it reaches an element ≥ the pivot
            i++

        while j ≥ i and a[j] ≥ a[last]:        move j backward until it reaches an element < the pivot
            j--

        if i ≥ j:                              if i has reached j, we're done
            break

        swap(a[i], a[j])                       swap mismatched elements
        i++
        j--

    swap(a[last], a[i])                        put pivot in between the partitions

This algorithm moves two indexes/pointers toward each other, looking for pairs of elements that are mismatched (i.e., the one on the left is at least as large as the pivot and the one on the right is smaller). Whenever it finds such a pair, it swaps them, then continues looking for more pairs. When the two indexes reach one another, we're done, so we can place the pivot at that position — everything to the left of that position will now be smaller, and everything to the right will not be. One important thing to note at this point is that the pivot will be in its correct final position in the sorted order; even after we separately sort everything smaller than the pivot and everything larger, the pivot will still be in the right place.

By way of example, let's suppose that we have the following sequence of elements that we're interested in partitioning.

8 6 11 5 13 7 2 10 1 9 4 12 3

We'll talk later about how best to choose a pivot. Let's say for the sake of this example that our pivot is 8. The first thing the partitioning algorithm does is swap 8 to the far right, then set up the i and j indices at opposite ends of the remaining elements.

i
                         j
3 6 11 5 13 7 2 10 1 9 4 12 8

Now, we march i forward until it points to an element larger than the pivot, then march j backward until it points to an element smaller.

    i
                       j
3 6 11 5 13 7 2 10 1 9 4 12 8

These elements are mismatched — whether we put the pivot before, between, or after them, at least one will be in the wrong place with respect to the pivot — so we'll swap them and then inch i and j closer to each other.

      i
                    j
3 6 4 5 13 7 2 10 1 9 11 12 8

Repeating this process, we find that 13 and 1 are mismatched:

        i
                  j
3 6 4 5 13 7 2 10 1 9 11 12 8

So we swap them and move i and j yet closer together.

          i
              j
3 6 4 5 1 7 2 10 13 9 11 12 8

Finally, we move i forward, looking for an element larger than the pivot. By the time we find one, i and j are equal.

              ij
              ↓↓
3 6 4 5 1 7 2 10 13 9 11 12 8

At this point, all that's left is to swap the pivot into the position where i and j met.

3 6 4 5 1 7 2 8 13 9 11 12 10
    left            right

Notice that all of the elements to the left of the pivot are smaller and all of the elements to the right are larger; we got the outcome we were looking for. And remember the important outcome: 8 is where it belongs in the final sorted order.

From there, we simply quicksort the two partitions recursively, leading to this overall algorithm.

Quicksort(Array a):
    if a has more than one element:
        Partition a into a left and right partition
        Quicksort the left partition
        Quicksort the right partition
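
Here's a sketch of how this might look in C++, using the partitioning scheme described above. The functions take the boundaries of the range being sorted as parameters, since the recursive calls operate on pieces of the same array, and choosePivotIndex is a placeholder (here, it just picks the middle element) for whatever pivot-selection strategy we settle on; we'll return to that question below.

#include <utility>
#include <vector>

// A placeholder pivot-selection strategy: just use the middle element.
// (Better strategies are discussed under "Choosing pivots carefully" below.)
int choosePivotIndex(const std::vector<int>& a, int low, int high)
{
    return low + (high - low) / 2;
}

// Partition a[low..high] as described above, returning the pivot's final index.
int partition(std::vector<int>& a, int low, int high)
{
    std::swap(a[choosePivotIndex(a, low, high)], a[high]);   // park the pivot at the far right

    int i = low;
    int j = high - 1;

    while (true)
    {
        while (i <= high && a[i] < a[high])      // move i forward to an element >= the pivot
            ++i;

        while (j >= i && a[j] >= a[high])        // move j backward to an element < the pivot
            --j;

        if (i >= j)                              // the indexes have met
            break;

        std::swap(a[i], a[j]);                   // swap the mismatched pair
        ++i;
        --j;
    }

    std::swap(a[i], a[high]);                    // put the pivot between the partitions
    return i;
}

void quicksort(std::vector<int>& a, int low, int high)
{
    if (low >= high)                             // zero or one element: already sorted
        return;

    int p = partition(a, low, high);
    quicksort(a, low, p - 1);                    // quicksort the left partition
    quicksort(a, p + 1, high);                   // quicksort the right partition
}

// e.g., quicksort(v, 0, static_cast<int>(v.size()) - 1);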

Analysis

Most of the work in a quicksort is done by the partitioning algorithm, so the first question is how long it takes to partition an array of n elements. Note that i and j begin n − 1 steps away from each other. We then compare the ith or jth element to the pivot, each time moving either i or j one step closer to each other. So, in total, partitioning takes Θ(n) time by the time i and j have reached each other and the algorithm is finished.

The remaining question is the effect that the recursion has. As it turns out, it matters very much where the pivot ends up.

Worst case. Suppose each time we selected a pivot, we selected the largest or smallest element in the partition. In that case, we'd be left with an empty partition on one side of the pivot and a partition containing n − 1 elements on the other side. We'd then recursively sort one of the partitions, and could skip the other. How long would this take? The following recurrence captures the notion pretty well.

T(1) = 0
T(n) = n + T(n - 1)

We can use something much like the repeated substitution technique to solve this recurrence:

T(n) = n + T(n - 1)
     = n + (n - 1) + T(n - 2)
     = n + (n - 1) + (n - 2) + T(n - 3)
     ...
     = n + (n - 1) + (n - 2) + ... + 2 + T(1)
     = n + (n - 1) + (n - 2) + ... + 2 + 0         sum of 1..n, minus 1
     = (n(n + 1) / 2) - 1

And we see that, in this case, the running time of quicksort would be Θ(n²), a disappointing result.

Best case. Suppose instead that each time we selected a pivot, it was the exact middle of the partition, so that an equal number of elements would be in the left and right partitions. We'd then have to recursively quicksort the two partitions. How would this change the outcome? Here's a recurrence that describes this scenario.

T(1) = 0
T(n) = n + 2T(n / 2)

Once again, repeated substitution gives us a better idea of how long this will actually take.

T(n) = n + 2T(n / 2)
     = n + 2((n / 2) + 2T(n / 4))
     = 2n + 4T(n / 4)
     = 2n + 4((n / 4) + 2T(n / 8))
     = 3n + 8T(n / 8)
     ...
     = jn + 2ʲT(n / 2ʲ)

  let j = log₂n

     = n log₂n + nT(n / n)
     = n log₂n + nT(1)
     = n log₂n

This is a much better outcome. If we choose our pivots well, quicksort will run in Θ(n log n) time.

There are a couple of other things worth noting:

  1. The partitioning algorithm rearranges the elements entirely within the array being sorted, so quicksort needs no ancillary data structure (though the recursion does use some additional memory on the run-time stack).
  2. Quicksort is not a stable sort; partitioning can swap equivalent elements past one another, changing their relative order.

Choosing pivots carefully

The remaining question is how we choose the pivot each time. As we've seen, quicksort is susceptible to poor worst-case performance if we repeatedly choose the largest or smallest element in a partition as the pivot. So how should we choose the pivot to avoid that scenario?

The simplest mechanism for choosing a pivot is to choose the first or last element of the partition as the pivot. The problem is that this is susceptible to worst-case behavior when the array is already sorted or is already reverse-sorted. That's not such a rare scenario, so best not to choose the pivot this way.

There are a couple of options that work pretty well:

  1. Median-of-three: look at the first, middle, and last elements of the partition, and use whichever of them holds the median of those three values as the pivot.
  2. Random choice: choose an element of the partition uniformly at random. It's still possible to get unlucky, but choosing bad pivots over and over again becomes vanishingly unlikely.
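
To make the first of those concrete, here's a sketch of a median-of-three pivot chooser in C++, where low and high are assumed to be the boundaries of the partition being sorted; it has the same shape as the choosePivotIndex placeholder in the earlier quicksort sketch.

#include <vector>

// Returns the index of whichever of a[low], a[mid], and a[high] holds the
// median of those three values; it could be dropped in as the pivot-selection
// step of the earlier quicksort sketch.
int medianOfThreeIndex(const std::vector<int>& a, int low, int high)
{
    int mid = low + (high - low) / 2;

    if (a[low] < a[mid])
        return a[mid] < a[high] ? mid : (a[low] < a[high] ? high : low);
    else
        return a[low] < a[high] ? low : (a[mid] < a[high] ? high : mid);
}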


Mergesort

Quicksort is a divide and conquer algorithm that spends most of its time doing the dividing — by partitioning the elements around a pivot — so that it can do essentially no work in combining the solutions to the subproblems. Once the elements are partitioned, the pivot is in the right place; once we've sorted the two partitions, the entire array will be sorted.

Another algorithm, called mergesort, makes the opposite choice, spending little or no time dividing the problem, but having to spend more time combining the solutions to the subproblems instead. The mergesort algorithm, simply stated, is as follows.

Mergesort(Array a):
    divide a in half (i.e., calculate the index of the middle)
    Mergesort(left half of a)
    Mergesort(right half of a)
    merge the sorted halves of a together so they're one sorted array

Suppose that you wanted to mergesort the following array:

6 3 7 1 2 8 5 4

We would begin by dividing the array in half, with the first four elements being in the left half and the last four elements being in the right half.

6 3 7 1 2 8 5 4
  left   right

We would then recursively mergesort the two halves. Let's fast-forward to the point where that's done. Here's what we would have at that point:

1 3 6 7 2 4 5 8
  left   right

Just as the partitioning algorithm was the most important part of quicksort, the merging algorithm is the most important part of mergesort; the merging is where the work is actually done. So how do we merge these two halves into a sorted whole? The answer is to use two counters, one pointing to an element in the left half, and another pointing to an element in the right.

i
        j
1 3 6 7 2 4 5 8
  left   right

At each step, we'll want to decide which element is smaller, the ith or the jth. That will be the next element in our final sorted order. The problem is that we're going to need an extra array to move the answers into.

Initially, i points to the element 1 and j points to the element 2. 1 is smaller, so we'll move 1 into our merged array and then move i forward.

  i
        j
  3 6 7 2 4 5 8
  left   right
1

Next, we'll do the same with 2.

  i
          j
  3 6 7   4 5 8
  left   right
1 2

The merging operation continues in this fashion until all of the elements have been moved into the merged array. At this point, they can be moved back — or there are clever implementations that swap back and forth between multiple arrays to avoid some of this overhead.
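
Here's a sketch of that merging step in C++, along with a mergesort that drives it. This version merges into a temporary vector and copies the result back (rather than the cleverer multiple-array juggling just mentioned); low and high are taken to bound the range being sorted, with high being one past the last element.

#include <algorithm>
#include <cstddef>
#include <vector>

// Merge the sorted ranges v[low..mid-1] and v[mid..high-1] into a temporary
// vector, always taking the smaller front element (and preferring the left
// half on ties, which keeps the sort stable), then copy the result back.
void merge(std::vector<int>& v, std::size_t low, std::size_t mid, std::size_t high)
{
    std::vector<int> merged;
    merged.reserve(high - low);

    std::size_t i = low;
    std::size_t j = mid;

    while (i < mid && j < high)
        merged.push_back(v[j] < v[i] ? v[j++] : v[i++]);

    while (i < mid)                      // whatever remains in the left half
        merged.push_back(v[i++]);

    while (j < high)                     // whatever remains in the right half
        merged.push_back(v[j++]);

    std::copy(merged.begin(), merged.end(), v.begin() + low);
}

// Mergesort v[low..high-1]: divide in half, sort each half, then merge.
void mergesort(std::vector<int>& v, std::size_t low, std::size_t high)
{
    if (high - low < 2)                  // zero or one element: already sorted
        return;

    std::size_t mid = low + (high - low) / 2;
    mergesort(v, low, mid);
    mergesort(v, mid, high);
    merge(v, low, mid, high);
}

// e.g., mergesort(v, 0, v.size());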

Analysis

One thing to realize right away about mergesort is that there is no best or worst case. This algorithm does the same amount of work regardless; whether the array is sorted already, reverse-sorted already, or a random mish-mash of elements, the dividing and merging processes are always the same.

Merging n elements — two sorted halves of size n / 2 — together takes Θ(n) time. At each step, we choose an element from one half or the other, which can be done in Θ(1) time; there are a total of n steps required.

But, of course, we have to do more than one merge. So how much time do we spend? A recurrence can be used to capture the idea:

T(1) = 0
T(n) = n + 2T(n / 2)

Note that this is the same as the best-case time required to perform a quicksort. The total time required would be Θ(n log n).

Because of the need for the extra array used in merging, mergesort is not an in-place algorithm. However, it is a stable algorithm, as long as we always prefer an element in the left partition over an element in the right partition when we're merging and the two elements we're looking at are equivalent.

One way that mergesort excels is in situations where accessing elements sequentially is beneficial. Mergesort, for example, can be used quite nicely to sort very large collections of information stored on disk for this reason, because disks tend to perform much better when you access their data in sequence.