There's a long history of using pooled testing to detect viruses and other pathogens. For example, when you donate blood, a portion of your donation is combined with other donors' blood, and the entire "pool" is tested for things like HIV. If any subject in the group has HIV, the pool will test positive. We wouldn't know which subject had HIV unless we then tested each subject individually, but in situations where infections are rare this can still be an efficient way to screen subjects.
As Covid started spreading in the spring of 2019, an Israeli research team reported an approach to pooled testing that uses techniques from computer science to improve the efficiency: Subjects' samples are added to multiple pools, but arranged carefully so that we can deduce which subjects tested positive without having to test each individually. In one of their examples, they were able to evaluate 384 subjects with only 48 tests!
Assume that subject 5 is Covid positive. That would lead pools B, D, and E (shown in red below) to test positive. The algorithm can tell that only subject 5 is responsible for these positive pools: Not only is it in every pool that tested positive, but there aren't any negative pools containing 5. Subject 4 is in several of the positive pools as well, but is also in pool C, which tested negative. That's evidence that subject 4 isn't responsible for the positive results.
The savings in this example weren't very dramatic. We determined the status of 8 subjects with 6 tests. That's still better than having to do 8 tests, but the process gets more efficient as it scales up to larger numbers of subjects. What if several subjects are positive? Will this approach still work? Imagine that subjects 0 and 5 are both positive. The diagram below shows all of the pools containing 0 and/or 5 in red — those groups would all test positive.
The good news is that the algorithm would still be able to determine that 0 and 5 are positive — they're in pools that tested positive, and they're not in any negative pools. Unfortunately, that's also true for subjects 2 and 4, so we'd mistakenly conclude that they're Covid positive as well. That can happen if the number of positive subjects is large with respect to the pool size, so this approach works best for fairly low positivity rates.
IntSet
class from the last assignment, so you're partway done already! (You can copy your own IntSet
class into the project folder, or the solutions once they're posted.) The PooledTesting.zip project contains some starter code, and an outline of the methods you need to write. You'll add methods to the PoolProcessor
class for reading a data file and determining which subjects are Covid positive. The PooledSampleTester
is already written. It has a main
method that prompts the user for information on which pools test positive, then runs your code.
The input data is in text files, organized as shown below. The first line contains the column labels, separated by a tab character. The remaining lines consist of a pool label (a String
) and a subject ID (an int
) to be added to the specified pool, separated by a tab. The sample data file below describes the six pools in the introduction above. Here, each pool's entries are consecutive (all of A's subjects come before B's, etc), but you should not assume they'll always be that way. Your code should still work regardless of the order of the lines in the input file. The BlueJ project folder contains this file, pooling_example.txt
, as well as a much larger file, pooling384_48_by_pool.txt
, containing pool assignments for 384 subjects. (Recent versions of BlueJ will show these files as gray rectangles in the project window. Older versions won't display anything except Java classes, but the data files are still there.)
pool subject A 0 A 2 A 3 A 7 B 4 B 5 B 6 B 1 C 0 C 3 C 4 C 1 D 5 D 6 D 7 D 2 E 0 E 4 E 5 E 2 F 6 F 7 F 1 F 3
PoolProcessor
class keeps an ArrayList of IntSet
s, and a corresponding ArrayList of their names (labels). (This is similar to our Election lab, where we kept an array of vote counts and a corresponding array of names.) Its constructor reads a data file, creates all of the pools, and fills them with subjects. The detect
method takes a list of test results (the names of the pools that tested positive), and implements the algorithm above for deducing which subjects must be positive. I'm giving you a "to-do list" for each method in the class to get you started. Output from these methods should match that shown in the testing section below.
The PooledSampleTester
class has a main method that starts the program running. It creates a PoolProcessor
object, passing the constructor the name of the file to read. After prompting the user to enter the names of the pools that should be considered positive, it runs the detect
method to determine which subjects have tested positive. It's currently set up to read from the small sample file. You'll want to edit the top of the file to use the larger test file when making sure your program works, but otherwise you won't need to change anything in this class.
Below I'm showing codepad interactions in which a PoolProcessor
object is created, and asked to read the file "pooling_example.txt". (Output that would appear in the terminal window is shown indented below.) The constructor printed out the details of the pools it created, as well as the total number of subjects read and pools created. I then made a list of strings containing "B", "D", and "E" — three pools I decided had tested positive. This list was passed to detect
, and it deduced that based on those test results for the pools, only subject 5 was actually Covid positive. (The detect
method printed the pool when it was done.)
You should also test your program by running thePoolProcessor pp = new PoolProcessor("pooling_example.txt"); Read 24 subjects and 6 pools: A: [0, 2, 3, 7] B: [1, 4, 5, 6] C: [0, 1, 3, 4] D: [2, 5, 6, 7] E: [0, 2, 4, 5] F: [1, 3, 6, 7] import java.util.ArrayList; ArrayListtestResults = new ArrayList (); testResults.add("B"); testResults.add("D"); testResults.add("E"); pp.detect(testResults); Positive subjects: [5]
main
method in PooledSampleTester
. It will automate some of the steps shown above. Test it with a variety of scenarios, and make sure it properly picks the Covid-positive subjects.
@param
tags for all inputs and @return
tags for returned values.