CS 261 Assignment #7

Due Wednesday, May 3rd by 11:59pm
(Not accepted late)

Introduction

For your final assignment you'll implement a simple spreadsheet application. I'm providing a graphical interface to display spreadsheet data. Your job is to design and implement the classes that represent the spreadsheet itself. The good news is that you've already got a pretty good start on this, since the Expression class from the previous assignment will be used to represent the formulas in the spreadsheet cells. (Formulas that refer to other spreadsheet cells will contain variables corresponding to those cells!) A given cell's value can be calculated via its formula's evaluate method, assuming the values of all referenced cells are already known. The tricky part has to do with recalculating values in the spreadsheet after updating or adding a formula, but more about that in a moment...

Groups

I want you to work in pairs again on this assignment. (Pairings are listed below.) Your group can use either member's Expression class, or my solution, as you work on this final assignment.
Ethan Spence and Lucas Takiff
Gabriel Guinn and Stephen Rice
Grey Roppolo and Bonacic
Ella Slattery and Noah Zimmer
Noah Sprenger and Jacob Endrina
Rohan Crossland and Brett Garon
Devan Meyer and Alex Lopez
Cian Monaghan and Asa Bonner
Noah McCullough and Jinwoo Choi
Brendan Bell and Lucas Calaff
Mackenzie Leibee and Spencer Racca-Gwozdzik

Overview

The GUI code I'm providing displays a spreadsheet's data in "typical" fashion: A regular grid of cells is displayed, with column names along the top, and row numbers on the left. Only the values are displayed. The formulas used to compute those values are kept hidden until the user clicks on a cell, at which point my code opens a window containing the formula and allows it to be edited. The picture below shows the editing window as it appeared after cell E2 was clicked:

spreadsheet photo

The GUI just displays information though — it doesn't store values or evaluate formulas. When the window is drawn, it interacts with a spreadsheet implementation via the set of methods in the Sheetable interface. (See the documentation for Sheetable and the GUI.) The GUI uses a getValue method (that you'll need to impleent) to request the value of a cell during the display process, setFormula to update spreadsheet formulas when the user enters or modifies formulas. Thus, the GUI can be combined with any object that implements the Sheetable interface.

To get you started, and to illustrate the use of the GUI, I'm providing a "fake" spreadsheet that implements the Sheetable interface. It doesn't do anything very interesting, but it does implement all of the required methods in a naive way. I'm also giving you the Demo class. It contains a main method that ties all of the pieces together — it creates a FakeSpreadsheet object and a GUI that interacts with it. If you download and compile all of this code (or the zipped version), you'll have a working demonstration. (It doesn't actually store any formulas — that part is up to you!) The class diagram below illustrates the relationships between the classes I'm providing:

class diagram

Cells and Formulas

Note that formulas may contain references to other spreadsheet cells. We will use the standard "letter + number" notation when referring to other cells, and treat cell references as variables. Thank goodness your Expression class allows variables in expressions or this would be a lot harder! (The column identifier always comes first, and we will assume that it will consist of a single upper-case character. The row number may be more than a single digit, however.)

We won't use an "=" to differentiate between simple values and formulas like Excel does — each of our cells will hold a formula (possibly a formula consisting of a single value), so we won't need the "=". Initially, though, all cells in the spreadsheet should be empty. They should only take on values once the user has clicked on a cell and entered a formula. People make mistakes though, so there's no guarantee that an input formula will be syntactically correct. If it's not, catch any resulting exceptions and leave the cell's contents unchanged.

Recalculating

In the first picture above, cell B2 holds the value 2.0, C2 holds the formula B2 * 2.0, and D2 holds C2 * 2.0. E2's formula is shown in the editing window. Since none of the cells contain formulas that reference E2, only cell E2 would need to be updated if the user modified its formula. If the user were to edit B2 and change the 2.0 to a 3.0, however, all four cells shown would have to be updated. Note also that the order in which they're evaluated is important: A good spreadsheet would evaluate the cells from left to right in this particular case, so that C2's new value is available when we update D2, and D2's new value is ready for E2 when it's recalculated. The most efficient order in which to visit and re-evaluate cells depends upon the relationship between their formulas, so you can't just build in a particular ordering if you want to be efficient.

For this assignment, you can select from one of the three possible approaches to the recalculation problem described below. Your grade will be influenced by the approach you take, since the difficulty of implementing them differs. Regardless of the approach you choose, you must detect cycles among formulas (e.g. B1 refers to B2, and B2 refers to B1). The three algorithms reveal cycles in different ways, but if you detect one you should stop the recalculation process and report an error. You need not describe the cycle to the user — just tell them that one exists and leave it up to them to find and fix it.

Brute Force

While an efficient recalculation algorithm needs to take the relationships between cells and formulas into account, it's possible to update cell values correctly (but inefficiently) without considering relationships at all: Make a pass over the entire spreadsheet and update every cell's value (in any order you choose). Keep making recalculation passes like this until the values stop changing. You can think of this as the "Bubble Sort" approach: Keep making things better, a bit at a time, until we're done. Let's think about how this would work on the example from above. The cells' contents are:
Cell Formula Current Value
B2 2 2
C2 B2 * 2 4
D2 C2 * 2 8
E2 ( D2 * 2 ) + B2 18
Let's say we change B2 to be 3 instead of 2, and that we evaluate the formulas from right to left. Right to left is the worst possible order to use, given the formulas, but it'll demonstrate that the brute-force approach will work no matter what. After changing B2, if we updated E2's value, it would refer to B2's new value but D2's old value since D2 hasn't been updated yet, giving us an incorrect 19. Then we'd move on to D2. It uses C2's value, but we haven't updated C2 yet, so D2 would still evaluate to 8. When we get to C2 we'd use B2's new value and compute C2's new value to be 6 instead of 4. After this one update pass we'd have correct values in B2 and C2, but D2 and E2 would still be wrong. On the next update pass we'd determine E2's value to be 16+3 (again using a mixture of old and new values), but D2 would use the new C2 value and correctly evaluate to 12. After one more pass, E2 would then use the new values of both D2 and B2 and correctly evaluate to 27. Of course, we don't know that we're done yet. We'd have to make one more pass over all of the cells, note that nothing changed, and then conclude that the values had all been recalculated correctly.

List-Based Optimal Order

If we're clever about the order in which we update the cells, we can do the recalculation in a single pass (assuming there aren't cycles in the formulas). Start by building a list of all cells that contain formulas, and sort the list based on the number of unique variables that each formula references. (Initially, that's just the size of the formula's variables set.) Cells that don't refer to any other cells will be at the front of the list, followed by the cells that refer to just one other cell, etc. Then use the following algorithm: Remove the first cell from the list and recalculate its value. Go through the list looking for cells whose formulas refer to this initial cell, and remove it from their variable sets. Re-sort the list, and repeat the process until the list is empty. (Note that you can actually get by without sorting if you watch for empty variable sets as you traverse the list doing the removals.) At each step, the size of the variable sets is telling us how many other un-updated cells a formula depends on. When that number goes to zero it's safe to evaluate a cell.

In terms of the example above, our list would contain [B2, D2, C2, E2], in that order. (C2 and D2 have the same number of variables and so could appear in either order, but I'm listing them this way because it's more interesting that way.) Thus we'd start by evaluating B2. Its new value of 3 wouldn't change, but we'd then go through the rest of the list and remove B2 from the variable sets of the remaining formulas. That would leave D2 with one variable, C2 with 0 variables, and E2 with one. If we re-sort the list based on the size of these variable sets we'd get [C2, D2, E2]. That tells us that C2 is the one to evaluate next. Removing C2 from the rest of the variable sets would leave D2 with none and E2 with one, so we'd have [D2, E2]. Repeat until they're all evaluated.

Graph-Based Optimal Order

The previous approach evaluates each cell exactly once, but requires that we re-sort the list (or at least traverse it) after each evaluation. A better approach would be to start by building a dependence graph that describes how the cells are related to one another. The graph corresponding to our spreadsheet example above is shown below. The arrows from B2 to C2 and E2 indicate that B2's value is used by those nodes. C2's value is only used by D2, etc. You could build this graph out of objects and references (like a more general version of a linked list), or find some other way to represent its interconnections.

dependence graph

The algorithm in this case is to start at the node that's been updated and visit the "downstream" nodes that use its value. (You'll want to be a little clever about the order in which you visit these nodes if you want the best efficiency, otherwise you might visit E2 and recompute its value before you visit C2. The algorithm will work even if you don't worry about the order though.) Keep going until you reach a node with no outgoing arrows — E2 in this case. That is, updating B2 would cause you to visit and evaluate C2 and (eventually) E2. Updating C2 causes you to visit and update D2, which causes you to visit and update E2. Eventually, when there are no more arrows to follow, you're done. This approach can potentially get by without recalculating all of the cells in the spreadsheet. If we changed the formula in D2, for example, we'd recompute E2 but wouldn't have to consider B2 or C2.

Assignment Specifics

Submitting

Please zip up your entire project — the spreadsheet code you've written, the spreadsheet code you copied from me, and the old arithmeticExpression package code — and submit the .zip archive via Canvas. I want to make sure I've got all of the code necessary to compile and run your projects. (I'd appreciate it if you'd make sure there's nothing but .java files in the folder.) Additionally, each group member should email me a sentence or two, individually, describing how the work was divided, what your role was in the design and implementation, and your experiences during the group design and programming sessions.

Extending the Assignment

Consider trying one of these extensions to the assignment if you finish early or just can't get enough.


Brad Richards, 2023