CompSci 260P Project #1 — Shellsort
Shellsort, the famous,
classic sorting algorithm, is really a family of different
sorting algorithms, depending on the gap sequence that it uses.
For this project, you need to implement five (5) different
versions of the Shellsort algorithm.
Specifically, you need to implement
Shellsort with the following five (5) different gap sequences:
- The original Shell sequence, n/2k, for k=1,2,...,log n.
- The sequence, 2k-1, for k=log n, ..., 7, 3, 1.
- The Pratt sequence, 2p3q, ordered from
the largest such number less than n
down to 1.
-
The A036562 sequence,
in reverse order, starting from the largest value less than n,
down to 1.
-
A sequence different than any of these, which you invented.
Please test out multiple
possibilities for your invented sequence, #5,
with the goal of making it the best of the five.
Following up on this, you need to do an experimental time
analysis by implementing each of the
five above versions of Shellsort,
as n grows, with multiple runs for random permutations for each problem size.
That is, you need to
perform an empirical comparative analysis by running these five
implementations on a variety of test cases to experimentally determine
the relative efficiencies of these
algorithms in the worst and average cases as a function of n.
For each implementation, you need to perform runtime experiments
on random permutations, with multiple
runs for each problem size, for increasing problem sizes.
Specifically, you need to do a set of experiments for each of the following
distributions:
- Uniformly distributed permutations,
where all permutations are equally likely.
-
Almost-sorted permutations. These are generated by starting with a
sorted array/vector of n numbers, say, the numbers 1,2,3,...,n, in this order.
Then, independently
choose 2log n pairs, (i,j), where i and j are uniformly-chosen random integers
in the range from 0 to n-1, and swap the numbers at positions i and j in
the array/vector.
You must plot
the results on a log-log scale (with uniformly distrubted permuations on one
plot and almost-sorted permutations on another)
to empirically determine the algorithm's
asymptotic running time for each type of distribution.
If the data is well-fit with a line, then the
slope of the line can be used to determine the exponent in the running time,
as explained in this
Wikipedia
article about log-log plots (focus especially on "Relation with monomials").
You then need to rank all your sorting implementations from asymptotically slowest
to fastest based on your experimental results.
Deliverables (to be submitted via EEE dropbox) include the following:
- A ZIP file that includes the following:
- Source code (C, C++, Java, or Python)
of your program that implements and experimentally
tests the running times of the five versions of the Shellsort algorithm
- A Report (in PDF) that includes the following:
- Comparative experimental analysis and conclusions drawn from
the experiments ranking the quality of the gap sequences you tested.
This portion of the report should include a pseudo-code summary
of your algorithm, design choices implemented in the source code (such
as data structures and representations), and visual plots showing
comparitive empirical performance of the different versions
of the Shellsort algorithm.
The plots should be on a log-log scale and you should use the slope of
a best line fit to determine the exponent of the polynomial for a
best estimate of the running time in terms of n for each version.