The diagram below shows how our revised program would process "the dog bit the cat". After reading the first three words, we would have scenario (a), where the list of words is in alphabetical order, and their corresponding counters are each 1. When we get to the fourth word, "the", we discover (via Binary Search) that it's already in the list of words, at position 2. The counter at position 2 would therefore be updated to reflect the fact that "the" has now been seen twice, giving us scenario (b). When we come across the word "cat", we discover it's not in our list of known words, so it gets added to the list at a position that keeps the list ordered. (Its counter is added at the same position in the list of counters.)
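The walk-through above can be traced in code. This is only a sketch: the class name SortedCountTrace and its process method are made up for illustration, and it leans on the library routine Collections.binarySearch rather than the recursive Binary Search from class, so treat it as a way to check the scenarios, not as the assignment's solution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the starter project.
class SortedCountTrace {
    // Processes each word, keeping `words` sorted and `counts` aligned with it.
    static void process(String sentence, List<String> words, List<Integer> counts) {
        for (String word : sentence.split(" ")) {
            // Library binary search: returns the index if found,
            // otherwise -(insertionPoint) - 1.
            int pos = Collections.binarySearch(words, word);
            if (pos >= 0) {
                counts.set(pos, counts.get(pos) + 1); // seen before: bump its counter
            } else {
                int insertAt = -pos - 1;              // decode the insertion point
                words.add(insertAt, word);            // keeps the word list ordered
                counts.add(insertAt, 1);              // counter goes at the same position
            }
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        process("the dog bit the cat", words, counts);
        System.out.println(words);  // prints [bit, cat, dog, the]
        System.out.println(counts); // prints [1, 1, 1, 2]
    }
}
```

After the fourth word the lists are [bit, dog, the] and [1, 1, 2], which matches scenario (b); "cat" then lands at position 1 in both lists.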
You will write a class, WordStorage, that keeps track of the unique words encountered and their frequencies, then modify the WordCounter class from lab to make use of your new, more efficient word-counting code. You will also write up an analysis of the Big-O complexity of some of the key code in WordStorage. The specifics are given below.
At the moment, WordStorage is a copy of our recursive Binary Search code from class, but the documentation for the finished code is online and describes the methods you're required to implement. The WordCounter class contains some code from our lab project.
The WordStorage class will contain the list of words and the list of counters used to determine the word frequencies. If the constructor is passed true as an input, it will create ArrayLists for both; otherwise it will create LinkedLists. This makes it easy to experiment with how the choice of list affects the program's performance, if you choose to.
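One possible shape for the fields and constructor is sketched below. The class name WordStorageFieldsSketch, the field names words and counts, and the parameter name useArrayLists are all assumptions made for illustration; the online documentation is the authority on the real names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class and field names, used only for this sketch.
class WordStorageFieldsSketch {
    final List<String> words;   // unique words, kept in alphabetical order
    final List<Integer> counts; // counts.get(i) is the frequency of words.get(i)

    // true selects ArrayLists for both lists, false selects LinkedLists.
    WordStorageFieldsSketch(boolean useArrayLists) {
        if (useArrayLists) {
            words = new ArrayList<>();
            counts = new ArrayList<>();
        } else {
            words = new LinkedList<>();
            counts = new LinkedList<>();
        }
    }
}
```

Declaring the fields with the List interface type means the rest of the class can stay the same whichever implementation is chosen.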
The processWord method in WordStorage should take a word and update the lists appropriately: if the word is already in the list, the corresponding counter should be incremented; otherwise the word should be added to the list and a new counter inserted at the corresponding position. For full credit, binary search should be used both to check whether a word is in the list and to find the insertion position if it turns out that the word is not in the list. This will require maintaining the word list in alphabetical order.
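Here is a rough sketch of how a single recursive binary search could serve both purposes. The class name ProcessWordSketch, the helper name search, and the convention of returning -(insertionPoint) - 1 for a missing word are assumptions (the convention is borrowed from java.util.Collections.binarySearch); the search code from class may report a miss differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; the names and return convention are assumptions.
class ProcessWordSketch {
    final List<String> words = new ArrayList<>();
    final List<Integer> counts = new ArrayList<>();

    // Recursive binary search over words[low..high].
    // Returns the index of target if present, otherwise -(insertionPoint) - 1.
    private int search(String target, int low, int high) {
        if (low > high) {
            return -low - 1; // not found: low is where target belongs
        }
        int mid = low + (high - low) / 2;
        int cmp = target.compareTo(words.get(mid));
        if (cmp == 0) {
            return mid;
        } else if (cmp < 0) {
            return search(target, low, mid - 1);
        } else {
            return search(target, mid + 1, high);
        }
    }

    void processWord(String word) {
        int pos = search(word, 0, words.size() - 1);
        if (pos >= 0) {
            counts.set(pos, counts.get(pos) + 1); // known word: increment its counter
        } else {
            int insertAt = -pos - 1;              // recover the insertion position
            words.add(insertAt, word);
            counts.add(insertAt, 1);              // new counter at the same position
        }
    }
}
```

Encoding the insertion position in a negative return value means a miss does not need a second search to work out where the new word goes.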
The maxFrequency method should return the largest value in the list of counters. For full credit, it should use an iterator to traverse the list of counters so that it's as efficient as possible.
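A minimal sketch of the iterator-based traversal follows. It is written as a static helper in a hypothetical class, MaxFrequencySketch, that takes the counter list as a parameter, and it assumes that 0 is an acceptable answer for an empty list; check the online documentation for what the real method should do in that case.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class; the real method would use WordStorage's own field.
class MaxFrequencySketch {
    // Returns the largest counter, or 0 if the list is empty (an assumption).
    static int maxFrequency(List<Integer> counts) {
        int max = 0;
        Iterator<Integer> it = counts.iterator();
        while (it.hasNext()) {
            int current = it.next();
            if (current > max) {
                max = current;
            }
        }
        return max;
    }
}
```

The iterator matters mostly for LinkedLists: each get(i) call walks the list from one end, so an index-based loop is O(n^2) there, while an iterator visits every element once in O(n).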
The mostFrequentWord method should return the word that occurred most frequently. For full credit, any list traversals performed when finding the word should be done via iterators as well. In the case of a tie, it doesn't matter which word you return.
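One way to do this in a single pass is to step two iterators through the word list and the counter list together, as sketched below. The class name MostFrequentWordSketch is hypothetical, the lists are passed in as parameters only to keep the example self-contained, and returning null when no words have been processed is an assumption.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class; the real method would use WordStorage's own fields.
class MostFrequentWordSketch {
    // Returns the word with the highest counter, or null for empty lists (an assumption).
    static String mostFrequentWord(List<String> words, List<Integer> counts) {
        String best = null;
        int bestCount = 0;
        Iterator<String> wordIt = words.iterator();
        Iterator<Integer> countIt = counts.iterator();
        // The two lists are parallel, so advance both iterators in lockstep.
        while (wordIt.hasNext() && countIt.hasNext()) {
            String word = wordIt.next();
            int count = countIt.next();
            if (count > bestCount) {
                bestCount = count;
                best = word;
            }
        }
        return best;
    }
}
```

With the strict > comparison, ties go to whichever word comes first alphabetically, which is fine since any tied word is acceptable.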
The size method should return the number of unique words in the collection, and toString should return a string reporting the most frequently occurring word and its frequency (or a message indicating that no words have been processed).
Modify the countWords method in the WordCounter class so that it makes use of the WordStorage class to process the words.
Include a comment in the WordStorage class that presents an analysis of the Big-O complexity of the processWord method. You don't need to give a T(n) function, just an overall Big-O estimate, but for full credit you should justify your answer. Doing it justice will take a paragraph or two rather than just a sentence. (You'll need to factor in the complexity of the methods called by processWord, and discuss the impact of the type of lists holding the words and counts, for example.)
WordStorage fields and constructor
processWord works correctly and uses Binary Search for both checks and insertions
maxFrequency and mostFrequentWord work correctly and use iterators
size and toString are correct
WordCounter changes are completed
Big-O analysis of processWord
Include Javadoc comments (with @param and @return directives) above each method. Don't forget to do the Big-O analysis for processWord! When you're convinced everything is ready to go, zip up your project folder and submit it via the Canvas submission tool for Assignment #5.