CS 161 Assignment #9

Due Wednesday, April 24th by 11:59pm
(Not accepted late!)

Partners

You are welcome to work in groups of two on this assignment if you prefer. If you choose to work with a partner, please send me a note so I know who's working with whom. Only one member of the team should submit the assignment, but make sure both names are in the comment at the top of each class. Both partners should contribute equally to the effort, and neither should start coding until you've met to agree on an implementation plan.

Introduction

For your final assignment, you'll build an application that can determine which subjects in a large group have tested positive for Covid-19, without having to test each subject individually. In addition to giving you more practice with your basic Java skills, this project will give you a chance to write a complete application (one with a main method), and one that takes input from both the keyboard and a data file.

There's a long history of using pooled testing to detect viruses and other pathogens. For example, when you donate blood, a portion of your donation is combined with other donors' blood, and the entire "pool" is tested for things like HIV. If any subject in the group has HIV, the pool will test positive. We wouldn't know which subject had HIV unless we then tested each subject individually, but in situations where infections are rare this can still be an efficient way to screen subjects.

As Covid started spreading in the spring of 2020, an Israeli research team reported an approach to pooled testing that uses techniques from computer science to improve the efficiency: Subjects' samples are added to multiple pools, but arranged carefully so that we can deduce which subjects tested positive without having to test each individually. In one of their examples, they were able to evaluate 384 subjects with only 48 tests!

Deducing Which Subjects are Positive

We'll start by assuming that all of the subjects in pools that tested positive are Covid positive, then eliminate any of those subjects that appear in negative pools. Let's work through a scaled-down example. The diagram below shows eight subjects (0–7) and six different pools, labelled A through F. You can see that subject 2, for example, is in pools A, D, and E, meaning that 2's sample was split across those three groups and combined with samples from the other subjects in each pool.

Assume that subject 5 is Covid positive. That would lead pools B, D, and E (shown in red below) to test positive. The algorithm can tell that only subject 5 is responsible for these positive pools: Not only is it in every pool that tested positive, but there aren't any negative pools containing 5. Subject 4 is in several of the positive pools as well, but is also in pool C, which tested negative. That's evidence that subject 4 isn't responsible for the positive results.
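
To make the two steps concrete, here is a minimal sketch of the elimination rule applied to the six example pools. It is only an illustration: the class name PoolDeductionSketch and its methods are made up for this example, and it uses java.util sets rather than the IntSet class, so it is not the structure your PoolProcessor is expected to have.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the starter code.
class PoolDeductionSketch {

    // Returns every subject that appears in at least one positive pool
    // and in no negative pool.
    static Set<Integer> deduce(Map<String, Set<Integer>> pools, List<String> positiveLabels) {
        Set<Integer> suspects = new TreeSet<>();
        // Step 1: everyone in a positive pool starts out as a suspect.
        for (String label : pools.keySet()) {
            if (positiveLabels.contains(label)) {
                suspects.addAll(pools.get(label));
            }
        }
        // Step 2: anyone who also appears in a negative pool is cleared.
        for (String label : pools.keySet()) {
            if (!positiveLabels.contains(label)) {
                suspects.removeAll(pools.get(label));
            }
        }
        return suspects;
    }

    // Builds the six example pools (A through F) from the diagram.
    static Map<String, Set<Integer>> examplePools() {
        Map<String, Set<Integer>> pools = new LinkedHashMap<>();
        pools.put("A", new TreeSet<>(Arrays.asList(0, 2, 3, 7)));
        pools.put("B", new TreeSet<>(Arrays.asList(1, 4, 5, 6)));
        pools.put("C", new TreeSet<>(Arrays.asList(0, 1, 3, 4)));
        pools.put("D", new TreeSet<>(Arrays.asList(2, 5, 6, 7)));
        pools.put("E", new TreeSet<>(Arrays.asList(0, 2, 4, 5)));
        pools.put("F", new TreeSet<>(Arrays.asList(1, 3, 6, 7)));
        return pools;
    }

    public static void main(String[] args) {
        Set<Integer> positives = deduce(examplePools(), Arrays.asList("B", "D", "E"));
        System.out.println("Positive subjects: " + positives);  // prints "Positive subjects: [5]"
    }
}
```

Step 1 leaves seven suspects (everyone except subject 3). Step 2 then removes everyone found in the negative pools A, C, and F, which leaves only subject 5.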

The savings in this example weren't very dramatic. We determined the status of 8 subjects with 6 tests. That's still better than having to do 8 tests, but the process gets more efficient as it scales up to larger numbers of subjects. What if several subjects are positive? Will this approach still work? Imagine that subjects 0 and 5 are both positive. The diagram below shows all of the pools containing 0 and/or 5 in red — those groups would all test positive.

The good news is that the algorithm would still be able to determine that 0 and 5 are positive — they're in pools that tested positive, and they're not in any negative pools. Unfortunately, that's also true for subjects 2 and 4, so we'd mistakenly conclude that they're Covid positive as well. That can happen if the number of positive subjects is large with respect to the pool size, so this approach works best for fairly low positivity rates.
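
If you'd like to see the false positives appear for yourself, the sketch below simulates the pool tests for a chosen set of infected subjects and then applies the elimination rule. The class FalsePositiveSketch and its method names are assumptions made for this illustration, and the pools are hard-coded from the example rather than read from a file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the starter code.
class FalsePositiveSketch {
    // The six example pools, in order A through F.
    static final List<List<Integer>> POOLS = Arrays.asList(
        Arrays.asList(0, 2, 3, 7),   // A
        Arrays.asList(1, 4, 5, 6),   // B
        Arrays.asList(0, 1, 3, 4),   // C
        Arrays.asList(2, 5, 6, 7),   // D
        Arrays.asList(0, 2, 4, 5),   // E
        Arrays.asList(1, 3, 6, 7));  // F

    // A pool tests positive if it contains at least one infected subject.
    static boolean testsPositive(List<Integer> pool, Set<Integer> infected) {
        for (int subject : pool) {
            if (infected.contains(subject)) {
                return true;
            }
        }
        return false;
    }

    // Simulates the pool tests, then returns the subjects the
    // elimination rule would flag as positive.
    static Set<Integer> flagged(Set<Integer> infected) {
        Set<Integer> suspects = new TreeSet<>();
        for (List<Integer> pool : POOLS) {
            if (testsPositive(pool, infected)) {
                suspects.addAll(pool);
            }
        }
        for (List<Integer> pool : POOLS) {
            if (!testsPositive(pool, infected)) {
                suspects.removeAll(pool);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        System.out.println(flagged(new TreeSet<>(Arrays.asList(5))));     // prints [5]
        System.out.println(flagged(new TreeSet<>(Arrays.asList(0, 5))));  // prints [0, 2, 4, 5]
    }
}
```

With subjects 0 and 5 infected, only pool F tests negative, so it is the only pool that can clear anyone. Subjects 2 and 4 aren't in F, and that's why they stay on the list.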

Project Overview

For your assignment you'll complete a program that reads a data file containing information on subjects and pools, prompts the user for information about which pools have tested positive, and uses the algorithm above to determine which subjects are Covid positive. The good news is that we can represent pools with the IntSet class from the last assignment, so you're partway done already! (You can copy your own IntSet class into the project folder, or use the posted solution once it's available.) The PooledTesting.zip project contains some starter code, and an outline of the methods you need to write. You'll add methods to the PoolProcessor class for reading a data file and determining which subjects are Covid positive. The PooledSampleTester class is already written. It has a main method that prompts the user for information on which pools test positive, then runs your code.

The input data is in text files, organized as shown below. The first line contains the column labels, separated by a tab. Each remaining line consists of a pool label (a String) and a subject ID (an int) to be added to that pool, also separated by a tab. The sample data file below describes the six pools in the introduction above. Here, each pool's entries are consecutive (all of A's subjects come before B's, etc.), but you should not assume they'll always be that way. Your code should work regardless of the order of the lines in the input file. The BlueJ project folder contains this file, pooling_example.txt, as well as a much larger file, pooling384_48_by_pool.txt, containing pool assignments for 384 subjects. (Recent versions of BlueJ will show these files as gray rectangles in the project window. Older versions won't display anything except Java classes, but the data files are still there.)

pool    subject
A   0
A   2
A   3
A   7
B   4
B   5
B   6
B   1
C   0
C   3
C   4
C   1
D   5
D   6
D   7
D   2
E   0
E   4
E   5
E   2
F   6
F   7
F   1
F   3
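
As a rough sketch of how a file in this format could be read, the example below skips the header line and then reads one label and one subject ID at a time with a Scanner, grouping subjects by label no matter what order the lines arrive in. The class PoolFileSketch and its parse method are assumptions made for this illustration. It reads from a String so it can run on its own, and it stores pools in a java.util map instead of the IntSet and ArrayList structure your PoolProcessor will use.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the starter code.
class PoolFileSketch {

    // Reads "label <tab> subject" pairs and groups the subjects by pool label.
    static Map<String, Set<Integer>> parse(Scanner in) {
        Map<String, Set<Integer>> pools = new TreeMap<>();
        if (in.hasNextLine()) {
            in.nextLine();              // skip the "pool  subject" header line
        }
        while (in.hasNext()) {
            String label = in.next();   // next whitespace-separated token
            int subject = in.nextInt();
            if (!pools.containsKey(label)) {
                pools.put(label, new TreeSet<>());  // first time we've seen this pool
            }
            pools.get(label).add(subject);
        }
        return pools;
    }

    public static void main(String[] args) {
        // Lines for pools A and B are deliberately interleaved.
        String data = "pool\tsubject\nB\t4\nA\t0\nB\t5\nA\t2\n";
        System.out.println(parse(new Scanner(data)));  // prints {A=[0, 2], B=[4, 5]}
    }
}
```

To read from an actual file you could construct the Scanner from a java.io.File instead, which requires handling FileNotFoundException.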

Specifics

The PoolProcessor class keeps an ArrayList of IntSets, and a corresponding ArrayList of their names (labels). (This is similar to our Election lab, where we kept an array of vote counts and a corresponding array of names.) Its constructor reads a data file, creates all of the pools, and fills them with subjects. The detect method takes a list of test results (the names of the pools that tested positive), and implements the algorithm above for deducing which subjects must be positive. I'm giving you a "to-do list" for each method in the class to get you started. Output from these methods should match that shown in the testing section below.
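
Here is a small sketch of the "parallel lists" idea, in case it helps to see it in isolation: the label at index i in one list names the set at index i in the other. The class ParallelListsSketch and its methods are invented for this illustration, and it uses TreeSet<Integer> as a stand-in for IntSet, so treat it as a picture of the bookkeeping rather than the code you need to write.

```java
import java.util.ArrayList;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the starter code.
class ParallelListsSketch {
    // labels.get(i) is the name of the pool stored at pools.get(i).
    private final ArrayList<String> labels = new ArrayList<>();
    private final ArrayList<TreeSet<Integer>> pools = new ArrayList<>();

    // Adds a subject to the named pool, creating the pool if it's new.
    void add(String label, int subject) {
        int index = labels.indexOf(label);   // -1 if we haven't seen this label
        if (index == -1) {
            labels.add(label);
            pools.add(new TreeSet<>());
            index = labels.size() - 1;       // the new pool is at the end of both lists
        }
        pools.get(index).add(subject);
    }

    // Returns one "label: contents" line per pool, in creation order.
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < labels.size(); i++) {
            sb.append(labels.get(i)).append(": ").append(pools.get(i)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ParallelListsSketch sketch = new ParallelListsSketch();
        sketch.add("B", 4);
        sketch.add("A", 0);
        sketch.add("B", 5);
        System.out.print(sketch.describe());
        // prints:
        // B: [4, 5]
        // A: [0]
    }
}
```

The key point is that the two lists only ever grow together, so an index found in one is always valid in the other.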

The PooledSampleTester class has a main method that starts the program running. It creates a PoolProcessor object, passing the constructor the name of the file to read. After prompting the user to enter the names of the pools that should be considered positive, it runs the detect method to determine which subjects have tested positive. It's currently set up to read from the small sample file. You'll want to edit the top of the file to use the larger test file when making sure your program works, but otherwise you won't need to change anything in this class.

Below I'm showing codepad interactions in which a PoolProcessor object is created and asked to read the file "pooling_example.txt". (Output that would appear in the terminal window is shown indented below.) The constructor printed out the details of the pools it created, as well as the total number of subjects read and pools created. I then made a list of strings containing "B", "D", and "E", the three pools I decided had tested positive. This list was passed to detect, which deduced from those pool results that only subject 5 was actually Covid positive. (The detect method printed the set of positive subjects when it was done.)

PoolProcessor pp = new PoolProcessor("pooling_example.txt");
    Read 24 subjects and 6 pools:
    A: [0, 2, 3, 7]
    B: [1, 4, 5, 6]
    C: [0, 1, 3, 4]
    D: [2, 5, 6, 7]
    E: [0, 2, 4, 5]
    F: [1, 3, 6, 7]
import java.util.ArrayList;
ArrayList<String> testResults = new ArrayList<String>();
testResults.add("B");
testResults.add("D");
testResults.add("E");
pp.detect(testResults);
    Positive subjects: [5]

You should also test your program by running the main method in PooledSampleTester. It will automate some of the steps shown above. Test it with a variety of scenarios, and make sure it properly picks the Covid-positive subjects.

Style Guide

Before you submit your assignment, go through the checklist below and make sure your code conforms to the style guide.

Once you've tested your code and verified that it meets style guidelines, submit from within BlueJ as usual.

Grading

This assignment will be graded out of a total of 90 points:

Submitting

Before submitting, make sure you've added your name(s) to the comment at the top of the class files, and written comments for each method. Test your methods thoroughly. When you're convinced it's ready to go, submit the project via BlueJ just like you have in the past. If you worked with a partner, make sure only one of you submits!


David Chiu & Brad Richards, 2024