CS 261 Lab #10

April 11th

Goals:

This week you'll start by modifying and completing some methods in two different map implementations to get more familiar with how they work. Then you'll run some experiments to put Knuth's formulas to the test.

Partners

In each lab this semester you will work with a randomly assigned partner. (I'll have Zoom randomly set up breakout rooms.) Please be kind in your interactions with your partners! Keep in mind that students in this class have a range of previous programming experience, and that some have been college students for longer than others. We're all in this together, and you have something to learn from your partner, no matter who they are or what their previous experiences have been. I expect that group members will collaborate and work together on each step of the lab.

Take a moment to introduce yourself to your partner(s). After social pleasantries are complete, pick one member of the team to be the "typer". They'll share their screen while editing the lab code in BlueJ. (I think BlueJ works better for these interactions than Eclipse: it's easier to see when sharing screens, and it makes it easier to quickly test individual methods.) Group members should contribute equally while working through the problems below and discuss all code to be written, though only the "typer" will be able to edit code. Resist the temptation to have both members work simultaneously in BlueJ; you're much more likely to "drift apart" over the course of the lab if you do so. The goal here is to have a partner who's engaged on exactly the same step of the lab as you are.

Open Addressing Implementation

Start by downloading the lab project, extracting its contents, and opening it in BlueJ. The project contains the open addressing map implementation we wrote in class and the start of a chaining-based implementation. You'll put your test code in the Experiment class, and that testing code can be polymorphic because there's an interface (Maplike) that both map implementations support.
Next, look through OpenMap. It's the same basic code we wrote in class, but with a new name and a handful of new methods that will help us with our experiments in a bit. There's a getLoadFactor() method that is intended to return a value between 0 and 1: the ratio of the number of (key, value) pairs in the map to the size of the array. Modify the appropriate methods in OpenMap so that the load factor is recomputed on each put call and returned correctly by getLoadFactor.
The OpenMap class also has methods that are supposed to track and report how many "probes" have been done. The getProbes and resetProbes methods are complete, but getProbes won't return the right value until you fix up the code a bit. We'll define a probe to be any call to .equals made while adding or looking up keys. Modify the get method so that it counts the number of probes properly. (You can do put too if you like, but in this lab we'll only be worried about counting the probes performed by get.)
Do some testing to verify that your load factor and probe-counting code works as expected. (You don't have to write test code — testing via the code pad or point-and-click is enough.)
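If it helps to see the bookkeeping in one place, here is a minimal sketch of a linear-probing map that tracks its size, recomputes the load factor on each put, and counts one probe per .equals call in get. It is hypothetical: the class name, the field names (keys, values, size, probes), the use of parallel String arrays, and the Math.floorMod hashing are all assumptions and may not match the OpenMap we wrote in class.

```java
/**
 * Hypothetical sketch (not the lab's OpenMap): a linear-probing map with
 * String keys and values that tracks its load factor and counts probes.
 */
class OpenMapSketch {
    private final String[] keys;
    private final String[] values;
    private int size = 0;            // number of (key, value) pairs stored
    private double loadFactor = 0.0; // recomputed on every put
    private int probes = 0;          // .equals calls made by get

    OpenMapSketch(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // Home slot for a key (assumed formula); floorMod keeps it non-negative.
    private int home(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    void put(String key, String value) {
        int i = home(key);
        int checked = 0;
        // Walk "downstream" until we find the key or an empty slot.
        while (keys[i] != null && !keys[i].equals(key)) {
            if (++checked == keys.length) {
                throw new IllegalStateException("table is full");
            }
            i = (i + 1) % keys.length;
        }
        if (keys[i] == null) {
            keys[i] = key;
            size++;                  // only a brand-new key changes the count
        }
        values[i] = value;
        loadFactor = (double) size / keys.length;
    }

    String get(String key) {
        int i = home(key);
        // Stop at an empty slot, or after examining every slot once.
        for (int checked = 0; checked < keys.length && keys[i] != null; checked++) {
            probes++;                // one probe per .equals call
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;                 // key is not in the map
    }

    double getLoadFactor() { return loadFactor; }
    int getProbes() { return probes; }
    void resetProbes() { probes = 0; }
}
```

Storing the load factor in a field that put updates follows the wording of the step above; computing size / keys.length inside getLoadFactor would work just as well.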

Chaining-Based Implementation

In class we talked briefly about how the chaining-based map implementation worked, but we didn't write any code. Now's your chance to fix that! Finish writing the guts of ChainMap's put method. I've left the comments that I wrote in my version, but yanked out the actual code.
Add in the plumbing so that getLoadFactor works correctly. Note that for a chaining implementation, this is still the ratio of the number of (key, value) pairs in the map to the size of the array, but now it can be greater than 1. (For example, if all slots in the array are filled and each of the lists has a couple of items in it.)
Modify the get method so that it counts the number of probes properly. (You can do put too if you like, but we'll only be worried about counting the probes performed by get in this lab.)
Test all of your new code to make sure it works.
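For comparison with your own ChainMap, here is a minimal sketch of chained put and get with the same load-factor and probe bookkeeping. It is an illustration under assumptions, not the lab's code: the class and field names, the nested Entry class, the LinkedList buckets created on demand, and the Math.floorMod index are all hypothetical.

```java
import java.util.LinkedList;

/**
 * Hypothetical sketch (not the lab's ChainMap): each array slot holds a
 * list of the entries whose keys hash to that slot.
 */
class ChainMapSketch {
    // A (key, value) pair stored in a bucket's list.
    private static class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;
    private int size = 0;            // total pairs across all buckets
    private double loadFactor = 0.0; // may exceed 1 for chaining
    private int probes = 0;          // .equals calls made by get

    @SuppressWarnings("unchecked")
    ChainMapSketch(int capacity) {
        buckets = (LinkedList<Entry>[]) new LinkedList[capacity];
    }

    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, String value) {
        int i = index(key);
        // Create the bucket's list the first time a key hashes here.
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        // If the key is already present, just replace its value.
        for (Entry e : buckets[i]) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        // Otherwise append a new entry and update the load factor.
        buckets[i].add(new Entry(key, value));
        size++;
        loadFactor = (double) size / buckets.length;
    }

    String get(String key) {
        LinkedList<Entry> bucket = buckets[index(key)];
        if (bucket == null) {
            return null;             // empty bucket: no .equals calls at all
        }
        for (Entry e : bucket) {
            probes++;                // one probe per .equals call
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    double getLoadFactor() { return loadFactor; }
    int getProbes() { return probes; }
    void resetProbes() { probes = 0; }
}
```

Note that with this probe definition a lookup that lands on an empty bucket costs zero probes, and a miss on a non-empty bucket costs one probe per entry in that list.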

Performance

Now it's time to see how well our implementations work! We're going to create instances of each kind of map, add a bunch of random entries, and then see how many probes it takes on average to look up data values in the maps. The table from the book, showing some predicted values from Knuth's formulas, is below for the sake of comparison.
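For comparison with the book's table, here is one common statement of Knuth's estimates, assuming the open-addressing map uses linear probing and that the hash function spreads keys uniformly. Here α is the load factor, C_α is the expected number of probes for a successful search (the key is in the map), and C'_α is the expected number for an unsuccessful one. For chaining, the count is the number of list entries compared, which matches this lab's definition of a probe as one .equals call.

```latex
\begin{aligned}
\text{Linear probing:}\quad & C_\alpha \approx \tfrac{1}{2}\left(1 + \frac{1}{1-\alpha}\right), & C'_\alpha &\approx \tfrac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right) \\
\text{Chaining:}\quad & C_\alpha \approx 1 + \frac{\alpha}{2}, & C'_\alpha &\approx \alpha \\[4pt]
\alpha = 0.75 \text{ (linear probing)}:\quad & C_{0.75} \approx \tfrac{1}{2}\left(1 + \frac{1}{0.25}\right) = 2.5 \\
\alpha = 3.0 \text{ (chaining)}:\quad & C_{3.0} \approx 1 + \frac{3.0}{2} = 2.5
\end{aligned}
```

The same formulas predict much costlier lookups for absent keys with linear probing (½(1 + 1/0.25²) = 8.5 at α = 0.75), so averages for random keys and for keys actually in the map may differ noticeably. As usually stated, Knuth's unsuccessful count also includes the final empty slot, which is not an .equals call, so measured open-addressing values for absent keys may come in somewhat lower.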

Finish the definition of fillAndTest in the Experiment class. It's supposed to take a map instance and a load factor, add random entries to the map until it reaches the desired load factor, then calculate and print the average number of probes required to do a bunch of get calls. We'll use maps that map strings to strings (both the key and value will be of type String) to keep things simple. You can therefore create random (key, value) pairs by calling my randomString a couple of times to get a key and value.
When you write the code to test the number of probes, start by resetting the probe counter, then doing at least 1000 get calls for random keys. After all of the gets, use getProbes() to see how many total probes were done, and use that to calculate and print an average value.
In Experiment's main method I've left code that creates an OpenMap and passes it to fillAndTest. Create a ChainMap of the same size and pass it to fillAndTest too. See how many probes are required for each map at a load factor of 0.75.
The experiment you performed in the previous step used random strings as keys when doing the lookups. The behavior might be different if we looked for keys that were actually in the map. Go back and add some additional code to fillAndTest: as you're adding random (key, value) pairs, keep track of the keys you use. (Store them in a list, for example, or a set.) Then, after you've calculated the average number of probes for the retrievals using random keys, reset the probe counter, do a retrieval for each of the real keys, and report that average as well.
Knuth's formulas predicted that an open-addressing implementation would require 2.5 probes per access, on average, if the load factor was 0.75. For a chaining implementation we could allow the load factor to be as high as 3.0 and still have an average of 2.5 probes per access. Make some calls to fillAndTest to test this experimentally.
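Here is one possible sketch of fillAndTest covering both the random-key and the real-key measurements. Everything in it is hypothetical: MaplikeSketch is an assumed stand-in for the lab's Maplike interface (the real one may be generic or differ in other ways), TinyChainMap exists only so the block runs on its own, and randomString stands in for the lab's version. Returning the two averages, rather than only printing them, is a choice made here so the results can later be collected into a table.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Assumed shape of the lab's Maplike interface; the real one may differ.
interface MaplikeSketch {
    void put(String key, String value);
    String get(String key);
    double getLoadFactor();
    int getProbes();
    void resetProbes();
}

// Tiny chaining map so this sketch is runnable (not the lab's ChainMap).
class TinyChainMap implements MaplikeSketch {
    private final List<List<String[]>> buckets = new ArrayList<>();
    private int size = 0, probes = 0;

    TinyChainMap(int capacity) {
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }
    private List<String[]> bucket(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }
    public void put(String key, String value) {
        for (String[] e : bucket(key)) {
            if (e[0].equals(key)) { e[1] = value; return; }
        }
        bucket(key).add(new String[] {key, value});
        size++;
    }
    public String get(String key) {
        for (String[] e : bucket(key)) {
            probes++;                       // one probe per .equals call
            if (e[0].equals(key)) return e[1];
        }
        return null;
    }
    public double getLoadFactor() { return (double) size / buckets.size(); }
    public int getProbes() { return probes; }
    public void resetProbes() { probes = 0; }
}

class ExperimentSketch {
    private static final Random RNG = new Random();

    // Hypothetical stand-in for the lab's randomString: 8 lowercase letters.
    static String randomString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) sb.append((char) ('a' + RNG.nextInt(26)));
        return sb.toString();
    }

    // Returns {average probes for random keys, average probes for real keys}.
    static double[] fillAndTest(MaplikeSketch map, double loadFactor) {
        Set<String> realKeys = new HashSet<>();   // a set ignores repeated keys
        // Add random pairs until the map reaches the requested load factor.
        while (map.getLoadFactor() < loadFactor) {
            String key = randomString();
            map.put(key, randomString());
            realKeys.add(key);
        }
        // Lookups for random keys, which are almost certainly absent.
        int lookups = 1000;
        map.resetProbes();
        for (int i = 0; i < lookups; i++) map.get(randomString());
        double randomAvg = (double) map.getProbes() / lookups;

        // One lookup for each key that was actually inserted.
        map.resetProbes();
        for (String key : realKeys) map.get(key);
        double realAvg = (double) map.getProbes() / realKeys.size();

        System.out.printf("load %.2f: random keys %.2f probes, real keys %.2f probes%n",
                map.getLoadFactor(), randomAvg, realAvg);
        return new double[] {randomAvg, realAvg};
    }
}
```

In main you might call ExperimentSketch.fillAndTest(new TinyChainMap(1000), 3.0), then swap in the lab's OpenMap at 0.75 and ChainMap at 3.0 (with whatever constructors those classes really have) to compare against the 2.5 probes Knuth predicts.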

Extras

If you've got extra time, consider trying the following:

Make an OpenMap of size 10 and add entries for the following keys, monitoring where they end up after each: ishmael, whale, surf, and harpoon. Can you figure out where each wanted to go? (You could add a print statement to put() so it tells you.) How many of them ended up "downstream"? Can get() still find all of them properly?
Get fancy with the code in main and have it build a table like the one shown above.
Repeat your experiments with maps that use objects other than strings as keys. That will exercise a different hashCode function and let you see the impact. Knuth's predictions are based on the assumption that the key's hash function distributes keys evenly across the array, and that might be harder to get right for things other than strings.
Build your own class and override hashCode. See how much luck you have coming up with good hashing functions. (If they don't do a good job of distributing keys across the table, the map performance will suffer.)
Write remove methods for one or both of the map classes.
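A small sketch for two of the extras: printing the home slot each of the four keys from the first extra would want in a size-10 table, and comparing how a weak and a better hashCode spread keys. The homeSlot formula (Math.floorMod of hashCode by capacity) is an assumption, so check how OpenMap actually computes its index. WeakPoint, BetterPoint, and HashSpread are hypothetical classes made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical key class with a deliberately weak hash: x + y collides
// for every point on the same anti-diagonal, e.g. (1, 2) and (2, 1).
class WeakPoint {
    final int x, y;
    WeakPoint(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        return o instanceof WeakPoint && ((WeakPoint) o).x == x && ((WeakPoint) o).y == y;
    }
    @Override public int hashCode() { return x + y; }
}

// Same data, but hashCode mixes the fields with Objects.hash.
class BetterPoint {
    final int x, y;
    BetterPoint(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        return o instanceof BetterPoint && ((BetterPoint) o).x == x && ((BetterPoint) o).y == y;
    }
    @Override public int hashCode() { return Objects.hash(x, y); }
}

class HashSpread {
    // Home slot, assuming the map uses floorMod(hashCode, capacity).
    static int homeSlot(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    // Number of distinct home slots a collection of keys lands in.
    static int slotsUsed(Iterable<?> keys, int capacity) {
        Set<Integer> used = new HashSet<>();
        for (Object k : keys) used.add(homeSlot(k, capacity));
        return used.size();
    }

    public static void main(String[] args) {
        // Where would each key from the first extra want to go in a size-10 table?
        for (String word : new String[] {"ishmael", "whale", "surf", "harpoon"}) {
            System.out.println(word + " -> slot " + homeSlot(word, 10));
        }
        // Compare how the two point classes spread a 10x10 grid of keys.
        List<WeakPoint> weak = new ArrayList<>();
        List<BetterPoint> better = new ArrayList<>();
        for (int x = 0; x < 10; x++) {
            for (int y = 0; y < 10; y++) {
                weak.add(new WeakPoint(x, y));
                better.add(new BetterPoint(x, y));
            }
        }
        System.out.println("weak hash: " + slotsUsed(weak, 1000) + " slots");     // 19
        System.out.println("better hash: " + slotsUsed(better, 1000) + " slots"); // 100
    }
}
```

With a capacity of 1000, the x + y hash can only produce the values 0 through 18, so the 100 points pile into 19 slots, while Objects.hash(x, y) (which works out to 961 + 31x + y for these small values) gives every point its own slot.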

Brad Richards, 2023