Problem 1
---------

Part a)

    Out = !Op!A!B + !Op!AB + Op!A!B + OpA!B

The formula didn't need to be simplified for full credit, but it could be
simplified to produce a shorter formula:

    Out = !Op(!A!B + !AB) + Op(!A!B + A!B)
        = !Op!A + Op!B

Part b)

There are multiple ways to show that the circuit is NOT equivalent to the
truth table. You could wire up the circuit corresponding to the truth table
from Problem 1 and show that they're not the same, or you could generate the
truth table for the circuit shown in Problem 2 and note that it's not the
same as the table in the previous problem:

    Op  A  B  Out
     0  0  0   0
     0  0  1   0
     0  1  0   1
     0  1  1   1
     1  0  0   1
     1  0  1   0
     1  1  0   1
     1  1  1   0

In fact, you'd only need to generate the first *entry* in the table to show
that it's not the same: the formula in Part a gives Out = 1 when Op, A, and B
are all 0, while the circuit gives Out = 0.

Problem 2
---------

a) It's a 1-bit ALU cell.

b) a and b are the two 1-bit inputs to the ALU. They represent the data
   values being manipulated by the ALU. Once we chain together multiple
   1-bit ALU cells to build a 32-bit ALU, they will come either from the
   output of the register file or from the immediate bits of an instruction.

c) If Ainvert is set, we invert the a input before using its value in the
   ALU. This is useful if we want to implement a logical NOR or NAND
   operation, for example. It is *not* used in subtraction -- for that we'd
   want to invert b.

Problem 3
---------

a) The current hardware cannot execute a BNE instruction, but it's close.
   The hardware is already in place for calculating the address of the
   instruction we want to branch to, but in the case of BNE we want to use
   that address if the Branch control line is 1 and the Zero output from the
   ALU is 0 (rather than 1 as in the case of BEQ). One way to implement BNE
   is to add a multiplexer that determines whether Zero or !Zero is used as
   an input to the AND gate. We'd need an additional control line to control
   that multiplexer as well.
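
To make the multiplexer idea in part a) concrete, here is a minimal Python sketch of the branch decision. The name bne_select for the new control line is an assumption made for illustration (the solution doesn't name it), as is the choice that 0 passes Zero through (BEQ) and 1 passes !Zero (BNE). Your own diagram may use a different encoding.

```python
def take_branch(branch, zero, bne_select):
    """Model the AND gate that decides whether the branch target is used.

    branch     -- the existing Branch control line (0 or 1)
    zero       -- the Zero output of the ALU (0 or 1)
    bne_select -- hypothetical new mux control line:
                  0 selects Zero (BEQ), 1 selects !Zero (BNE)
    """
    # The new multiplexer picks Zero or its complement.
    mux_out = (1 - zero) if bne_select else zero
    # The existing AND gate combines Branch with the mux output.
    return branch & mux_out


# BEQ takes the branch only when the ALU result is zero;
# BNE takes it only when the result is nonzero.
for bne_select in (0, 1):
    for zero in (0, 1):
        print(bne_select, zero, take_branch(1, zero, bne_select))
```

With branch = 0 the result is 0 regardless of the other two inputs, so non-branch instructions are unaffected by the new control line.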

b) The control settings would be exactly the same as for a BEQ, including
   the Branch control line, but we'd need to add the appropriate setting for
   the new multiplexer control line. (Details will vary depending on how
   you've inserted that into the diagram.)

Problem 4
---------

    # $a0: Contains address of string to print
    # $a1: Number of times to print
    # $s0: Loop counter
    print_string:
            addi $sp, $sp, -4
            sw   $s0, 0($sp)        # Save $s0 on the stack
            li   $s0, 0             # Initialize our counter "variable"
    top:    beq  $s0, $a1, done     # Count == max?
            li   $v0, 4             # Specify the "print string" syscall
            syscall
            addi $s0, $s0, 1        # Increment counter
            j    top
    done:   lw   $s0, 0($sp)        # Restore $s0
            addi $sp, $sp, 4
            jr   $ra

Problem 5
---------

100x10^9 instructions, 2.5GHz clock, CPI of 5.0.

a) Runtime = 100x10^9 x 5.0 / 2.5x10^9 = 200 seconds

b) 3% miss rate on instructions and 6% on data, with 35% of instructions
   being loads or stores and 200-cycle miss penalty:

     Inst miss:  6.00E+11 cycles (100E9 x 3% x 200)
     Data miss:  4.20E+11 cycles (100E9 x 35% x 6% x 200)
     Inst cyc:   5.00E+11 cycles (as above)
     Sum divided by 2.5E9 --> 608 seconds

c) 2% miss rate for instructions and 5% for data, with 2.3GHz clock:

     Inst miss:  4.00E+11 cycles (100E9 x 2% x 200)
     Data miss:  3.50E+11 cycles (100E9 x 35% x 5% x 200)
     Inst cyc:   5.00E+11 cycles (100E9 x 5.0)
     Sum divided by 2.3E9 this time --> 543 seconds

d) 2% miss rate on instructions, 6% on data, 2.5GHz clock:

     Inst miss:  4.00E+11 cycles (100E9 x 2% x 200)
     Data miss:  4.20E+11 cycles (100E9 x 35% x 6% x 200)
     Inst cyc:   5.00E+11 cycles (100E9 x 5.0)
     Sum divided by 2.5E9 --> 528 seconds
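
If you'd like to check the Problem 5 arithmetic yourself, the following Python sketch recomputes each part. The function and parameter names are made up for illustration. The model is the same one used in the figures above: base cycles plus instruction-miss and data-miss stall cycles, all divided by the clock rate.

```python
def runtime_seconds(instructions, base_cpi, clock_hz,
                    inst_miss_rate=0.0, data_miss_rate=0.0,
                    mem_fraction=0.0, miss_penalty=0):
    """Return the runtime in seconds, including cache-miss stall cycles."""
    base_cycles = instructions * base_cpi
    # Every instruction is fetched, so every instruction can miss.
    inst_miss_cycles = instructions * inst_miss_rate * miss_penalty
    # Only loads and stores (mem_fraction of instructions) access data.
    data_miss_cycles = (instructions * mem_fraction
                        * data_miss_rate * miss_penalty)
    total_cycles = base_cycles + inst_miss_cycles + data_miss_cycles
    return total_cycles / clock_hz


N = 100e9  # instruction count from the problem statement

part_a = runtime_seconds(N, 5.0, 2.5e9)
part_b = runtime_seconds(N, 5.0, 2.5e9, 0.03, 0.06, 0.35, 200)
part_c = runtime_seconds(N, 5.0, 2.3e9, 0.02, 0.05, 0.35, 200)
part_d = runtime_seconds(N, 5.0, 2.5e9, 0.02, 0.06, 0.35, 200)

for label, seconds in (("a", part_a), ("b", part_b),
                       ("c", part_c), ("d", part_d)):
    print(f"part {label}: {seconds:.0f} seconds")
```

Rounded to the nearest second, the four results should match the 200, 608, 543, and 528 seconds given above.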