Problem 1
---------

Part a)

    Out = !Op!A!B + !Op!AB + Op!A!B + OpA!B
    
    The formula didn't need to be simplified for full credit, but it can be
    shortened by factoring and applying X + !X = 1:
    
        Out = !Op(!A!B + !AB) + Op(!A!B + A!B)
            = !Op!A(!B + B) + Op!B(!A + A)
            = !Op!A + Op!B
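
One way to double-check the simplification is to compare both forms on all eight input combinations. The following is a minimal Python sketch; the function names are made up for illustration and are not part of the assignment.

```python
from itertools import product

def out_sum_of_products(op, a, b):
    """Unsimplified form: !Op!A!B + !Op!AB + Op!A!B + OpA!B."""
    n_op, n_a, n_b = 1 - op, 1 - a, 1 - b
    return (n_op & n_a & n_b) | (n_op & n_a & b) | (op & n_a & n_b) | (op & a & n_b)

def out_simplified(op, a, b):
    """Simplified form: !Op!A + Op!B."""
    return ((1 - op) & (1 - a)) | (op & (1 - b))

# Collect every (Op, A, B) combination on which the two forms disagree.
mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if out_sum_of_products(*bits) != out_simplified(*bits)
]
print(mismatches)  # an empty list means the two forms agree everywhere
```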
            

Part b)

    There are multiple ways to show that the circuit is NOT equivalent to the
    truth table. You could wire up the circuit corresponding to the truth table
    from Problem 1 and show that they're not the same, or you could generate the
    truth table for the circuit shown in Problem 2 and note that it's not the
    same as the table in the previous problem:

        Op  A   B   Out
        0   0   0   0
        0   0   1   0
        0   1   0   1
        0   1   1   1
        1   0   0   1
        1   0   1   0
        1   1   0   1
        1   1   1   0
        
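    In fact, you'd only need to generate the first *entry* in the table to show
    that it's not the same: with Op = A = B = 0 the circuit outputs 0, while
    the formula from Part a contains the term !Op!A!B and so outputs 1.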
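
The comparison can also be done mechanically. The sketch below copies the circuit's truth table from the rows above and checks it against the Part a formula; the names `circuit_table` and `part_a_formula` are hypothetical helpers, not anything given in the assignment.

```python
from itertools import product

# Truth table of the circuit, copied from the table above: (Op, A, B) -> Out
circuit_table = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 0,
}

def part_a_formula(op, a, b):
    """Simplified formula from Part a: !Op!A + Op!B."""
    return ((1 - op) & (1 - a)) | (op & (1 - b))

# Rows where the circuit and the Part a formula disagree, in table order.
differing_rows = [
    bits for bits in product((0, 1), repeat=3)
    if circuit_table[bits] != part_a_formula(*bits)
]
print(differing_rows)
```

Any single differing row is enough to show that the two are not equivalent.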
    
    
Problem 2
---------

a)  It's a 1-bit ALU cell.
    
b)  a and b are the two 1-bit inputs to the ALU. They represent the data values 
    being manipulated by the ALU. Once we chain together multiple 1-bit ALU cells
    to build a 32-bit ALU, they will come from either the output of the register
    file or the immediate bits of an instruction.

c)  If Ainvert is set, we invert the a input before using its value in the ALU.
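    This is useful if we want to implement a logical NOR or NAND operation, for
    example: by De Morgan's laws, !a AND !b = a NOR b and !a OR !b = a NAND b
    (so b must be inverted as well). Ainvert is *not* used in subtraction -- for
    that we'd want to invert b.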
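
To make the role of the invert lines concrete, here is a rough Python model of a 1-bit ALU cell. It assumes the usual textbook layout (AND, OR, and a full adder feeding an output multiplexer); the operation encoding 0 = AND, 1 = OR, 2 = ADD and the function name `alu_cell` are assumptions for illustration and may not match the figure in the problem.

```python
def alu_cell(a, b, ainvert, binvert, carry_in, operation):
    """Hypothetical 1-bit ALU cell; operation: 0 = AND, 1 = OR, 2 = ADD."""
    a_in = a ^ ainvert          # XOR with 1 inverts the bit, XOR with 0 passes it
    b_in = b ^ binvert
    total = a_in + b_in + carry_in
    carry_out = total >> 1      # carry out of the full adder
    results = (a_in & b_in, a_in | b_in, total & 1)
    return results[operation], carry_out

for a in (0, 1):
    for b in (0, 1):
        nor_bit, _ = alu_cell(a, b, ainvert=1, binvert=1, carry_in=0, operation=0)
        nand_bit, _ = alu_cell(a, b, ainvert=1, binvert=1, carry_in=0, operation=1)
        # Subtraction inverts only b and sets the carry in (a + !b + 1).
        diff_bit, _ = alu_cell(a, b, ainvert=0, binvert=1, carry_in=1, operation=2)
        print(a, b, nor_bit, nand_bit, diff_bit)
```

In this model NOR and NAND need both invert lines set, while subtraction leaves Ainvert at 0.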
    

Problem 3
---------

a)  The current hardware cannot execute a BNE instruction, but it's close.
    The hardware is already in place for calculating the address of the
    instruction we want to branch to, but in the case of BNE we want to 
    use that address if the Branch control line is 1 and the Zero output
    from the ALU is 0 (rather than 1 as in the case of BEQ).  One way to
    implement BNE is to add a multiplexer that determines whether Zero 
    or !Zero is used as an input to the AND gate. We'd need an additional
    control line to control that multiplexer as well.
    
b)  The control settings would be exactly the same as for a BEQ, including
    the Branch control line, but we'd need to add the appropriate setting
    for the new multiplexer control line. (Details will vary depending on
    how you've inserted that into the diagram.)
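
The branch decision described in this problem can be sketched as a small Python function. The control name `bne_select` is made up here (1 selects !Zero, 0 selects Zero); your own diagram may name or encode the new line differently.

```python
def pc_src(branch, zero, bne_select):
    """Return 1 if the branch target should be loaded into the PC.

    bne_select is a hypothetical name for the new multiplexer control line.
    """
    condition = (1 - zero) if bne_select else zero   # the added multiplexer
    return branch & condition                        # the existing AND gate

# Print the decision for every combination of the three signals.
for branch in (0, 1):
    for zero in (0, 1):
        for bne_select in (0, 1):
            print(branch, zero, bne_select, pc_src(branch, zero, bne_select))
```

With `bne_select` at 0 the logic behaves exactly as it does for BEQ, so existing instructions are unaffected.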


Problem 4
---------

# $a0:  Contains address of string to print
# $a1:  Number of times to print
# $s0:  Loop counter

print_string:
        addi $sp, $sp, -4       # Make room on the stack for one word
        sw $s0, 0($sp)          # Save $s0 (callee-saved)
        li $s0, 0               # Initialize our counter "variable"
top:    beq $s0, $a1, done      # Count == max?
        li $v0, 4               # Specify the "print string" syscall
        syscall
        addi $s0, $s0, 1        # Increment counter
        j top
done:   lw $s0, 0($sp)          # Restore $s0
        addi $sp, $sp, 4        # Pop the stack frame
        jr $ra


Problem 5
---------

100x10^9 instructions, 2.5GHz clock, CPI of 5.0.


a)  Runtime = 100x10^9 x 5.0 / 2.5x10^9 = 200 seconds
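
The calculation in part a is the standard CPU time relation, instructions x CPI / clock rate. A short Python sketch (the function name is an illustrative choice):

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz

base_runtime = cpu_time_seconds(100e9, 5.0, 2.5e9)
print(base_runtime)  # 200.0
```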
    
    
b)  3% miss rate on instructions and 6% on data, with 35% of instructions
    being loads or stores and 200-cycle miss penalty:
    
        Inst miss:  6.00E+11 cycles   (100E9 x 3% x 200)
        Data miss:  4.20E+11 cycles   (100E9 x 35% x 6% x 200)
        Inst cyc:   5.00E+11 cycles   (100E9 x 5.0, as above)
        
        Sum (1.52E+12 cycles) divided by 2.5E9
        
            --> 608 seconds 
                  
c)  2% miss rate for instructions and 5% for data, with 2.3GHz clock:
    
        Inst miss:  4.00E+11    (100E9 x 2% x 200)
        Data miss:  3.50E+11    (100E9 x 35% x 5% x 200)
        Inst cyc:   5.00E+11    (100E9 x 5.0)
        
        Sum (1.25E+12 cycles) divided by 2.3E9 this time
        
            --> about 543 seconds 

d)  2% miss rate on instructions, 6% on data, 2.5GHz clock:
    
        Inst miss:  4.00E+11    (100E9 x 2% x 200)
        Data miss:  4.20E+11    (100E9 x 35% x 6% x 200)
        Inst cyc:   5.00E+11    (100E9 x 5.0)
        
        Sum (1.32E+12 cycles) divided by 2.5E9
        
            --> 528 seconds
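
Parts b through d all use the same model: base cycles plus instruction-miss and data-miss stall cycles, divided by the clock rate. The Python sketch below recomputes the three answers; the function and parameter names are illustrative, and the defaults are the values given in this problem.

```python
def runtime_with_misses(inst_miss_rate, data_miss_rate, clock_hz,
                        instruction_count=100e9, cpi=5.0,
                        mem_fraction=0.35, miss_penalty=200):
    """Runtime in seconds, including cache-miss stall cycles."""
    inst_miss_cycles = instruction_count * inst_miss_rate * miss_penalty
    data_miss_cycles = (instruction_count * mem_fraction
                        * data_miss_rate * miss_penalty)
    base_cycles = instruction_count * cpi
    return (inst_miss_cycles + data_miss_cycles + base_cycles) / clock_hz

parts = {
    "b": runtime_with_misses(0.03, 0.06, 2.5e9),
    "c": runtime_with_misses(0.02, 0.05, 2.3e9),
    "d": runtime_with_misses(0.02, 0.06, 2.5e9),
}
for label, seconds in parts.items():
    print(label, round(seconds))
```

Rounded to the nearest second, this gives 608, 543, and 528 for parts b, c, and d, so option d is faster than option c despite its higher data miss rate.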