CS 261 Lab #11

April 18th

Goals:

We've discussed three sorting algorithms so far. While all three appear to be O(n^2) in the worst case, there is reason to believe that their best-case and average-case behavior differs. In this lab you'll measure their performance experimentally to see how they compare when sorting arrays of random values, arrays of values in increasing order (already sorted), and arrays of values in decreasing order (reverse sorted).

Partners

In each lab this semester you will work with a randomly assigned partner. (I'll have Zoom randomly set up breakout rooms.) Please be kind in your interactions with your partners! Keep in mind that students in this class have a range of previous programming experience, and that some have been college students for longer than others. We're all in this together, and you have something to learn from your partner, no matter who they are or what their previous experiences have been. I expect that group members will collaborate and work together on each step of the lab.

Take a moment to introduce yourself to your partner(s). After social pleasantries are complete, pick one member of the team to be the "typer". They'll share their screen while editing the lab code in BlueJ. (I think BlueJ works better for these interactions than Eclipse. It's easier to see when sharing screens, and it makes quickly testing individual methods easier than Eclipse does.) Group members should contribute equally while working through the problems below and discuss all code to be written, though only the "typer" will be able to edit code. Resist the temptation to have both members work simultaneously in BlueJ: you're much more likely to "drift apart" over the course of the lab if you do so. The goal here is to have a partner who's engaged on exactly the same step of the lab as you are.

Counting Comparisons and Swaps

Start by downloading the lab project, extracting its contents, and opening it in BlueJ. The project contains the sorting code from class: our Selection Sort and Bubble Sort implementations, and the book's Insertion Sort code. To approximate the number of computational steps required to sort an array, we will count the number of comparisons performed during a sort. We'll also keep track of the number of swaps each algorithm performs, since our analysis in class led us to expect differences between the algorithms. Variables for holding these counts are already defined at the top of the Sort class, along with methods for accessing and clearing the counts.
Add code to selectionSort so that comparisons is incremented each time a comparison is performed between two data values in the array (regardless of its outcome), and swaps is incremented each time a pair of items from the array exchange positions.
Add code to insertionSort and bubbleSort so that they also count the number of comparisons and swaps. In insertionSort, consider it a "swap" any time a data value moves from one place to another within the array.
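To make the counting rules above concrete, here is a minimal sketch of an instrumented Selection Sort. It is not the project's code: the class name CountingSelectionSort, the clearCounts method, and the choice to count a swap only when two different positions are exchanged are all assumptions made for illustration. The counter names comparisons and swaps follow the text above.

```java
// Hypothetical stand-in for the lab's sorting class; the real project's
// method names and its Selection Sort details may differ.
class CountingSelectionSort {
    static long comparisons = 0;
    static long swaps = 0;

    // Resets both counters before a new measurement.
    static void clearCounts() {
        comparisons = 0;
        swaps = 0;
    }

    static void selectionSort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            int minIndex = i;
            for (int j = i + 1; j < data.length; j++) {
                // Count before testing so the comparison is recorded
                // regardless of its outcome.
                comparisons++;
                if (data[j] < data[minIndex]) {
                    minIndex = j;
                }
            }
            // Assumption: exchanging an item with itself is not a swap.
            if (minIndex != i) {
                int temp = data[i];
                data[i] = data[minIndex];
                data[minIndex] = temp;
                swaps++;
            }
        }
    }
}
```

The important detail is that the increment sits outside the if statement, so it runs once per data comparison whether or not the test succeeds.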

Testing the Sorts

Now let's test the sorting algorithms and see how they perform. We want to know how many comparisons and swaps are occurring, but we also want to get a sense of how the algorithms scale — how the number of comparisons changes as the length of the input increases. The SortTester class contains some code that will help: Methods for creating arrays of random ints, arrays of ints in increasing order, and arrays of ints in decreasing order. This will let you see whether the order of the values in the array influences the performance of the algorithms.

Take a look at the testSelectionSort method in the SortTester class. I've written code to generate three different arrays, sort each of them, and report the number of comparisons and swaps. (The output is a little cryptic — just numbers separated by tab characters — but that's intentional. Using tab characters means that we can copy and paste output from the terminal window into a spreadsheet and values will be in different cells.) In SortTester's main method, write a for loop that calls testSelectionSort for sizes from 1000 to 20000, at increments of 1000.
Study the output from the calls to testSelectionSort and try to answer the questions below. It might be easier to figure out the scaling if you copy the data into a spreadsheet and have it draw a graph. (You could even have it fit a curve to the data if the pattern's not clear.)
- Does the algorithm perform best on random data, sorted data, or reverse-sorted data? Why?
- How does the number of comparisons scale with the array size?
- How does the number of swaps scale with the array size?
Write testing code that will generate similar tables of output for Insertion Sort and Bubble Sort. You can copy and paste my testSelectionSort and make some minor edits, then add some additional for loops in main to call these new routines with various problem sizes.
Answer the three questions above for Insertion Sort and Bubble Sort.
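The testing steps above can be sketched roughly as follows. This is a self-contained illustration, not the SortTester that ships with the project: the class name SortTesterSketch, the helper names, the fixed random seed, and the exact column layout are assumptions, and this testSelectionSort returns its row as a String so it is easy to check.

```java
import java.util.Random;

// Hypothetical stand-in for the lab's SortTester; every name and the
// output layout here are assumptions made for illustration.
class SortTesterSketch {
    static long comparisons;
    static long swaps;

    // Compact counting Selection Sort so the sketch runs on its own.
    static void selectionSort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < data.length; j++) {
                comparisons++;
                if (data[j] < data[min]) {
                    min = j;
                }
            }
            if (min != i) {
                int temp = data[i];
                data[i] = data[min];
                data[min] = temp;
                swaps++;
            }
        }
    }

    static int[] randomArray(int n) {
        Random rng = new Random(261); // fixed seed so runs are repeatable
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = rng.nextInt();
        return a;
    }

    static int[] increasingArray(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    static int[] decreasingArray(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = n - i;
        return a;
    }

    // Sorts one array and returns "comparisons<TAB>swaps".
    static String measure(int[] data) {
        comparisons = 0;
        swaps = 0;
        selectionSort(data);
        return comparisons + "\t" + swaps;
    }

    // One tab-separated row: size, then counts for random,
    // increasing, and decreasing input.
    static String testSelectionSort(int size) {
        return size + "\t" + measure(randomArray(size))
                + "\t" + measure(increasingArray(size))
                + "\t" + measure(decreasingArray(size));
    }

    // The for loop over problem sizes, one row per size.
    static String table(int first, int last, int step) {
        StringBuilder rows = new StringBuilder();
        for (int size = first; size <= last; size += step) {
            rows.append(testSelectionSort(size)).append('\n');
        }
        return rows.toString();
    }
}
```

Printing table(1000, 20000, 1000) from a main method would cover the sizes the lab asks for. Because the columns are separated by tabs, the output can be pasted straight into a spreadsheet.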

A Speedier Selection Sort

I've written a sorting method called splitAndSort in the Sorts class. It works by splitting an input array into two equally sized pieces, sorting each piece with Selection Sort, and then merging the two sorted halves back together so that the result is sorted. For the last exercise I want you to examine its performance. (This is not Merge Sort, but we'll discuss this approach as a motivation for a faster sorting algorithm in class.)

Take a look at splitAndSort and its helper merge. Make sure you understand what the methods are doing before you proceed. Then, add code as necessary so that the comparisons and swaps performed by splitAndSort are being counted accurately. (Don't forget the comparisons that occur in merge.)
Add some code to SortTester to measure the performance of splitAndSort. How does it compare to plain old Selection Sort? Why?
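As a reminder of where comparisons happen during a merge, here is a minimal sketch of a counting merge. It is not the merge helper from the project: the class name MergeCountSketch, the signature, and the choice to return a new array are assumptions for illustration only.

```java
// Hypothetical sketch; the project's merge helper may have a different
// signature and may merge in place rather than into a new array.
class MergeCountSketch {
    static long comparisons = 0;

    // Merges two already-sorted arrays into one sorted array,
    // counting every comparison between data values.
    static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            comparisons++; // one data comparison per pass through this loop
            if (left[i] <= right[j]) {
                result[k++] = left[i++];
            } else {
                result[k++] = right[j++];
            }
        }
        // Copying the leftovers compares indices only, not data values,
        // so nothing is counted here.
        while (i < left.length) {
            result[k++] = left[i++];
        }
        while (j < right.length) {
            result[k++] = right[j++];
        }
        return result;
    }
}
```

With this structure, merging n items in total performs at most n - 1 data comparisons, since each counted comparison places one item and the last item never needs one.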

Extras

If you've got extra time, consider trying the following:

Instead of counting swaps, count the number of assignment statements required by each sorting algorithm. Each swap in Selection Sort actually requires three assignments, but when a value changes location in Insertion Sort that's only a single assignment. Counting assignment statements will therefore give us a more accurate estimate of the number of writes to memory the algorithms require.
Write an improved version of splitAndSort that splits into four pieces before sorting. You could keep splitAndSort the way it is and use it as a helper for a new method that splits the input array, calls splitAndSort on each half, then merges the results.
Revise the original splitAndSort so that, instead of calling Selection Sort as a helper to sort each of the two smaller pieces, it calls splitAndSort recursively! The required changes are pretty minimal...
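For the first extra, the difference between counting swaps and counting assignments might look like the sketch below. The class name AssignmentCountSketch, the assignments counter, and this particular Insertion Sort are assumptions for illustration; the book's Insertion Sort may be written differently. Here every assignment that copies a data value is counted, including copies into a temporary variable.

```java
// Hypothetical sketch; names and the Insertion Sort variant are assumptions.
class AssignmentCountSketch {
    static long assignments = 0;

    // A swap, as used by Selection Sort, costs three assignments.
    static void swap(int[] data, int i, int j) {
        int temp = data[i];
        data[i] = data[j];
        data[j] = temp;
        assignments += 3;
    }

    static void insertionSort(int[] data) {
        for (int i = 1; i < data.length; i++) {
            int value = data[i]; // save the item being inserted
            assignments++;
            int j = i;
            while (j > 0 && data[j - 1] > value) {
                data[j] = data[j - 1]; // each shift is a single assignment
                assignments++;
                j--;
            }
            data[j] = value; // drop the saved item into place
            assignments++;
        }
    }
}
```

Whether to count the copies into temp and value is a judgment call. Counting them keeps the two algorithms on the same footing, which is the point of the comparison.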

Brad Richards, 2023