CS 261 Lab #12

April 25th

Goals:

In this lab, you'll work with Merge Sort, Quicksort, and Heap Sort to get better insight into how they work, and experimentally compare the number of comparisons performed when sorting both random and sorted input data. You'll also measure Java's built-in sort on arrays — perhaps we can guess which algorithm Java uses by looking at the results of your experiments.

Partners

In each lab this semester you will work with a randomly assigned partner. (I'll have Zoom randomly set up breakout rooms.) Please be kind in your interactions with your partners! Keep in mind that students in this class have a range of previous programming experience, and that some have been college students for longer than others. We're all in this together, and you have something to learn from your partner, no matter who they are or what their previous experiences have been. I expect that group members will collaborate and work together on each step of the lab.

Take a moment to introduce yourself to your partner(s). After social pleasantries are complete, pick one member of the team to be the "typer". They'll share their screen while editing the lab code in BlueJ. (I think BlueJ works better for these interactions than Eclipse. It's easier to see when sharing screens, and makes it easier to quickly test individual methods than Eclipse does.) Group members should contribute equally while working through the problems below and discuss all code to be written, though only the "typer" will be able to edit code. Resist the temptation to have both members work simultaneously in BlueJ — you're much more likely to "drift apart" over the course of the lab if you do so. The goal here is to have a partner who's engaged on exactly the same step of the lab as you are.

Measuring Merge Sort

  1. Start by downloading the lab project, extracting its contents, and opening it in BlueJ. The Sorts class contains the code we went over in class for Merge Sort, Quicksort, and Heap Sort. There's a Tester class that creates arrays full of random values and calls sort methods, and an IntegerComparator class that we'll use to help us count comparisons in Heap Sort and Java's built-in sort method.
  2. According to our analysis in class, Merge Sort was expected to take O(n log n) time in all cases — whether the list was already sorted or not. Let's verify that. We'll approximate the amount of computational effort by counting the number of comparisons (compareTo calls) performed during a sort. (That only gives us an estimate of the work done during the merge phase, but it should still help us establish the complexity.) I've already added code at the top of the Sorts class that defines a counter variable and methods to clear it and to return its value. Add code to increment the counter wherever comparisons are occurring in the Merge Sort code.
  3. In Tester, I've written code that creates an array of random values and sorts it. Add the necessary code to report how many comparisons were performed during a Merge Sort call.
  4. We could test Merge Sort's performance on already-sorted data if we just called mergeSort again on the array we just sorted! Add some code that clears the counter, re-sorts the array, and reports how many comparisons were performed.
  5. Finally, use Tester's reverseArray to reverse the now-sorted array. Sort that array and see how many comparisons it takes. You should leave all three of these tests in Tester's main method, so that we see all three results on each run. Run it a few times to see if things change from run to run.
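
If you're not sure where the counter should go, here's a minimal sketch of the idea on a stand-alone Merge Sort. It is not the lab's Sorts class: the class name MergeCountSketch, the method names clearCount and getCount, and the merge code itself are assumptions made up for illustration, so your version will look different. The point is just that the counter goes up once for every compareTo call in the merge loop.

```java
import java.util.Arrays;

// Hypothetical stand-in for the lab's Sorts class. All names here are
// assumptions; the real class already defines its own counter and methods.
class MergeCountSketch {
    private static long comparisons = 0;

    static void clearCount() { comparisons = 0; }
    static long getCount() { return comparisons; }

    // Sorts the array in place by splitting, sorting each half, and merging.
    static <T extends Comparable<T>> void mergeSort(T[] a) {
        if (a.length < 2) return;
        T[] left = Arrays.copyOfRange(a, 0, a.length / 2);
        T[] right = Arrays.copyOfRange(a, a.length / 2, a.length);
        mergeSort(left);
        mergeSort(right);
        merge(left, right, a);
    }

    private static <T extends Comparable<T>> void merge(T[] left, T[] right, T[] out) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            comparisons++;  // exactly one compareTo call per pass through this loop
            if (left[i].compareTo(right[j]) <= 0) out[k++] = left[i++];
            else out[k++] = right[j++];
        }
        // Copying leftovers needs no comparisons, so the counter is untouched here.
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
    }

    public static void main(String[] args) {
        Integer[] data = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
        clearCount();
        mergeSort(data);
        System.out.println("random input: " + getCount() + " comparisons");
        clearCount();  // clear before re-sorting so the counts don't accumulate
        mergeSort(data);
        System.out.println("sorted input: " + getCount() + " comparisons");
    }
}
```

Notice that the counter is cleared before each sort. If you forget that step, the second number you print will include the first sort's comparisons too.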

Measuring Quicksort

  1. Now let's investigate Quicksort. Add code to increment the counter on each comparison performed during a sort. (This one's a little trickier than Merge Sort, but not too bad.)
  2. Add code to Tester that reports the number of comparisons performed for the three test cases we did for Merge Sort: sorting a random array, sorting an already sorted array, and sorting a reverse-sorted array. Which case(s) trigger its worst case behavior? What is its worst case behavior? How much variability is there from run to run?
  3. Quicksort ends up doing many more comparisons if its partition phase splits the data unevenly, and that happens if we pick a bad pivot value (one that's too large or too small, rather than being close to the median value). Improve the partition code by having it select a value from the array at random and use that as the pivot. See how many comparisons this modified Quicksort requires. (One way to do that is to swap the randomly chosen pivot value with the first value in the array, and then let the rest of the code run normally.) Note that you should not just use a random value as the pivot — you should use a value that's already in the list, but select it from a random position.
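
The swap trick from step 3 can be sketched as follows. This is one possible shape, assuming a partition that uses the first element of the range as its pivot; the class RandomPivotSketch, its fields, and the partition loop are all hypothetical and almost certainly differ from the code in Sorts. The only lines that matter for step 3 are the two at the top of partition.

```java
import java.util.Random;

// Hypothetical sketch: a first-element-pivot Quicksort with a random pivot swap added.
// None of these names come from the lab code.
class RandomPivotSketch {
    private static final Random RNG = new Random();
    static long comparisons = 0;

    // Sorts a[lo..hi], both ends inclusive.
    static void quickSort(Integer[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    static int partition(Integer[] a, int lo, int hi) {
        // Pick a random POSITION in [lo, hi] and move that value to the front.
        int r = lo + RNG.nextInt(hi - lo + 1);
        swap(a, lo, r);

        // From here on, the code is an ordinary first-element-pivot partition.
        Integer pivot = a[lo];
        int boundary = lo;  // a[lo+1..boundary] holds the values smaller than the pivot
        for (int i = lo + 1; i <= hi; i++) {
            comparisons++;
            if (a[i].compareTo(pivot) < 0) {
                boundary++;
                swap(a, boundary, i);
            }
        }
        swap(a, lo, boundary);  // drop the pivot between the two groups
        return boundary;
    }

    private static void swap(Integer[] a, int i, int j) {
        Integer tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        Integer[] sorted = new Integer[1000];
        for (int i = 0; i < sorted.length; i++) sorted[i] = i;
        comparisons = 0;
        quickSort(sorted, 0, sorted.length - 1);
        System.out.println("already-sorted input: " + comparisons + " comparisons");
    }
}
```

Because the pivot is still a value from the array, the rest of the partition logic doesn't need to change at all.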

Measuring Heap Sort

  1. It'll be interesting to count the number of comparisons required by Heap Sort so we can compare it to the other two, but it's harder to instrument the Heap Sort code — all of the comparisons happen inside the PriorityQueue class as we insert and remove heap items, and we don't have access to that code. Luckily, PriorityQueue will let us use a comparator object to determine the orderings, as we discussed in class.

    I've provided an IntegerComparator class that compares two Integer objects in the "standard" way. (The compare method says that smaller Integers should come before larger ones.) It also maintains a counter so we can see how many times PriorityQueue calls the compare method. Add code to the Tester class that creates an IntegerComparator in addition to an array full of random values, and passes both to heapSort. When the sorting's done, you can call getCount on the comparator to see how many comparisons were done! Add code to try the other array orderings as well and print the results so we can compare it to the other two sorts.
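
    To see why the comparator trick works, here's a small self-contained sketch. CountingComparator and HeapSortSketch are hypothetical stand-ins for the provided IntegerComparator and the heapSort method in Sorts; the real ones may have different signatures. PriorityQueue calls compare every time it needs to order two items, so the comparator ends up counting all of the heap's comparisons for us.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for the lab's IntegerComparator.
class CountingComparator implements Comparator<Integer> {
    private long count = 0;

    @Override
    public int compare(Integer a, Integer b) {
        count++;                 // every call made by PriorityQueue lands here
        return a.compareTo(b);   // "standard" order: smaller values first
    }

    long getCount() { return count; }
}

// Hypothetical stand-in for the heapSort method in Sorts.
class HeapSortSketch {
    static void heapSort(Integer[] a, Comparator<Integer> cmp) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(cmp);
        for (Integer x : a) heap.add(x);                            // build the heap
        for (int i = 0; i < a.length; i++) a[i] = heap.remove();    // smallest comes out first
    }

    public static void main(String[] args) {
        Integer[] data = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
        CountingComparator counter = new CountingComparator();
        heapSort(data, counter);
        System.out.println(Arrays.toString(data));
        System.out.println(counter.getCount() + " comparisons");
    }
}
```

    Each comparator object keeps its own count, so creating a fresh one for each test is an easy way to start from zero.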

Investigating Java's Built-in Sort

As we saw in class, Java has built-in sorting routines for sorting arrays and lists. The Arrays class, for example, has a static sort method that sorts an array of Comparable objects. The interactions below show this method being used to sort an array of five Integers:

> import java.util.Arrays;
> Integer[] nums = {5, 4, 3, 2, 1};
> Arrays.sort(nums);
> Arrays.toString(nums)
  "[1, 2, 3, 4, 5]"   (String)

There's also a two-input version of sort that takes a comparator object and uses that to determine the desired ordering. We can therefore use the same trick you used for Heap Sort above to count the number of comparisons done in the built-in sort.

  1. You can use the same IntegerComparator class from your Heap Sort tests to investigate the number of comparisons done by the built-in sort method. Uncomment the code in Tester that creates an IntegerComparator and an array, and sorts it using the comparator object. Add code below the sort call that prints the number of comparisons performed. Run it and make sure the results are reasonable.
  2. Do some more tests to see how it behaves on already sorted and reverse-sorted arrays. Does the pattern look more like Merge Sort, Quicksort, or Heap Sort? Can you guess which algorithm Arrays.sort() is using?
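
For reference, the three built-in sort tests could be organized something like the sketch below. The class BuiltInSortSketch, its nested Counter class, and the countComparisons helper are assumptions for illustration; in the lab you'd use IntegerComparator and Tester's reverseArray instead. The calls to Arrays.sort, Arrays.asList, and Collections.reverse are standard library methods.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Random;

// Hypothetical sketch of the three built-in sort experiments.
class BuiltInSortSketch {
    // Hypothetical stand-in for the lab's IntegerComparator.
    static class Counter implements Comparator<Integer> {
        long count = 0;

        @Override
        public int compare(Integer a, Integer b) {
            count++;
            return a.compareTo(b);
        }
    }

    // Sorts the array with the two-input Arrays.sort and reports how many
    // times the comparator was called. A new Counter starts each test at zero.
    static long countComparisons(Integer[] a) {
        Counter c = new Counter();
        Arrays.sort(a, c);
        return c.count;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        Integer[] data = new Integer[1000];
        for (int i = 0; i < data.length; i++) data[i] = rng.nextInt(1_000_000);

        System.out.println("random:   " + countComparisons(data));
        System.out.println("sorted:   " + countComparisons(data));  // data is sorted now

        // Arrays.asList gives a list view backed by the array itself, so
        // reversing the list reverses the array in place.
        Collections.reverse(Arrays.asList(data));
        System.out.println("reversed: " + countComparisons(data));
    }
}
```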

Extras

If you've got extra time and are bored, consider trying the following:
  1. In addition to tracking the number of comparisons, you could measure the actual time required to do the sorting. You could use something like the code below, which we used in Lab #6:
    long start = System.nanoTime();
    // Do something that needs to be timed
    long end = System.nanoTime();
    long elapsed = (end-start)/1000000;
    System.out.println("Took "+elapsed+" milliseconds");
    
  2. Use a better approach for finding pivots: Pick three values at random and use the median of the three as the pivot. See how much of a difference that makes.
  3. For practice, modify IntegerComparator to get other sorting orders. See if you can get the built-in sorting routine to sort the integers in reverse order, for example, or to sort based on the absolute value of the numbers so that negative values don't necessarily come before positive ones.
  4. Do a better job of counting computational steps in Merge Sort. For example, increment the counter as appropriate during the split phase as well.
  5. Your investigations above show that Quicksort's partition doesn't always split arrays evenly, but we only get hints about its behavior by looking at the total number of comparisons it does. Find other ways to measure and report the "imbalance" — for example, by printing information about the size of each "half" after the splits, or by determining the maximum depth of the "tree".
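
If you want a starting point for extras 2 and 3, here's a rough sketch. Everything in ExtrasSketch is hypothetical: medianIndex is one way to find the position of the median of three array values (you'd still need to pick the three positions at random and swap the winner into place), and the two comparators show orderings you could imitate inside IntegerComparator. Comparator.reverseOrder() is a standard library method; the absolute-value comparator is hand-written.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helpers for the Extras section; none of these names come from the lab code.
class ExtrasSketch {
    // Returns whichever of the positions i, j, k holds the median of a[i], a[j], a[k].
    static int medianIndex(Integer[] a, int i, int j, int k) {
        Integer x = a[i], y = a[j], z = a[k];
        if (x.compareTo(y) <= 0) {
            if (y.compareTo(z) <= 0) return j;      // x <= y <= z
            return (x.compareTo(z) <= 0) ? k : i;   // z is below y: median is the larger of x and z
        } else {
            if (x.compareTo(z) <= 0) return i;      // y < x <= z
            return (y.compareTo(z) <= 0) ? k : j;   // z is below x: median is the larger of y and z
        }
    }

    // Largest values first.
    static final Comparator<Integer> REVERSED = Comparator.reverseOrder();

    // Orders by distance from zero, so the sign of a value is ignored.
    static final Comparator<Integer> BY_ABS =
        (a, b) -> Integer.compare(Math.abs(a), Math.abs(b));

    public static void main(String[] args) {
        Integer[] nums = {3, -1, -4, 2};
        Arrays.sort(nums, REVERSED);
        System.out.println(Arrays.toString(nums));
        Arrays.sort(nums, BY_ABS);
        System.out.println(Arrays.toString(nums));

        Integer[] sample = {5, 1, 3};
        System.out.println("median is at index " + medianIndex(sample, 0, 1, 2));
    }
}
```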


Brad Richards, 2023