HW2: RISC instruction set, pipelining

Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written materials that you may consult while writing your solutions are the textbook, lecture notes, and lecture slides.

Problem 1: RISC Instruction set

Take a look at the RISC instruction set in the book (Section A.9). Write a simple RISC assembly program that computes the sum of the numbers from 1 to N.
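Before writing assembly, it can help to pin down the control flow in a high-level language. The following is only a sketch in Python, not the requested RISC assembly. The function name `sum_to_n`, the choice of a counter that runs upward from 1, and the assumption that N is a non-negative integer are illustrative and not part of the assignment.

```python
def sum_to_n(n: int) -> int:
    """Sum the integers 1..n with an explicit counter loop.

    Each local variable stands in for one register of a hypothetical
    assembly version; the loop test corresponds to a compare-and-branch.
    """
    total = 0       # accumulator
    i = 1           # loop counter
    while i <= n:   # exit the loop once the counter passes n
        total += i
        i += 1
    return total


if __name__ == "__main__":
    n = 10  # illustrative value, not taken from the assignment
    # The closed form n*(n+1)/2 is a convenient check on the loop.
    print(sum_to_n(n), n * (n + 1) // 2)
```

Comparing the loop result against the closed form N(N+1)/2 is one way to test an assembly version on a few small values of N.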

Problem 2: Basic pipelining

An unpipelined processor takes 3 ns to work on one instruction. It then takes 0.2 ns to latch its results. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline.

  • What are the cycle times in the two processors?
  • What are the clock speeds?
  • What are the IPCs?
  • How long does it take to finish one instruction?
  • What is the speedup from pipelining?
  • If I were able to build a magical 1000-stage pipeline, where each stage took an equal amount of time, what speedup would I get?
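The quantities asked for above follow from a few relations, which the sketch below writes out so that hand calculations can be checked. It rests on assumptions that should be stated in a solution: the logic delay divides evenly across the stages, every stage (and the unpipelined design) pays the full latch overhead once per cycle, and with no stalls both designs complete one instruction per cycle. The function names and the numbers in the usage lines are illustrative and are deliberately not the values from this problem.

```python
def cycle_time_ns(logic_ns: float, latch_ns: float, stages: int) -> float:
    """Cycle time: an equal share of the logic delay plus one latch delay."""
    return logic_ns / stages + latch_ns


def summarize(logic_ns: float, latch_ns: float, stages: int) -> dict:
    """Cycle time, clock speed, IPC, and single-instruction latency."""
    cycle = cycle_time_ns(logic_ns, latch_ns, stages)
    return {
        "cycle_ns": cycle,
        "clock_ghz": 1.0 / cycle,       # 1 / ns = GHz
        "ipc": 1.0,                     # assumed: no stalls
        "latency_ns": cycle * stages,   # one instruction visits every stage
    }


def speedup(logic_ns: float, latch_ns: float, stages: int) -> float:
    """Throughput ratio of the pipelined design over the unpipelined one."""
    unpipelined = cycle_time_ns(logic_ns, latch_ns, 1)
    pipelined = cycle_time_ns(logic_ns, latch_ns, stages)
    return unpipelined / pipelined


if __name__ == "__main__":
    # Hypothetical design: 10 ns of logic, 0.5 ns latch, 4 stages.
    print(summarize(10.0, 0.5, 4))
    print(speedup(10.0, 0.5, 4))
```

Under these assumptions the speedup can never exceed (logic delay + latch delay) / latch delay, because the latch overhead is paid in every cycle no matter how many stages there are.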

Problem 3: Data Dependences

Consider a 32-bit in-order pipeline that has the following stages. Note the many differences from the examples in class: a stage that converts CISC instructions to micro-ops, one stage to do register reads, one stage to do register writes, three stages to access the data memory, and four stages for the FP-ALU. For the questions below, assume that each CISC instruction is simple and is converted to a single micro-op.

Common front end:  Fetch -> Convert to micro-ops -> Decode -> Regread
Int-add path:      IntALU -> Regwrite
Load/store path:   IntALU -> Datamem1 -> Datamem2 -> Datamem3 -> Regwrite
FP-add path:       FPALU1 -> FPALU2 -> FPALU3 -> FPALU4 -> Regwrite

After instruction fetch, the instruction goes through the micro-op conversion stage, a Decode stage where dependences are analyzed, and a Regread stage where input operands are read from the register file. After this, an instruction takes one of three possible paths. Int-adds go through the stages labeled "IntALU" and "Regwrite". Loads/stores go through the stages labeled "IntALU", "Datamem1", "Datamem2", "Datamem3", and "Regwrite". FP-adds go through the stages labeled "FPALU1", "FPALU2", "FPALU3", "FPALU4", and "Regwrite". Assume that the register file has an infinite number of write ports, so stalls are never introduced because of structural hazards. Also assume that register reads and register writes each take half a cycle, as in the simple 5-stage pipeline that we discussed in class. How many stall cycles are introduced between the following pairs of successive instructions (i) for a processor with no register bypassing and (ii) for a processor with full bypassing?

  1. Int-add, followed by a dependent Int-add
  2. FP-add, followed by a dependent FP-add
  3. Load, providing the address for a store
  4. Load, providing the data for a store
  5. FP-add, providing the data for a store
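One way to organize the reasoning for these pairs is to line up the stage in which the producer makes its value available against the stage in which the consumer needs it. The sketch below is a hedged illustration of that bookkeeping, not a reference solution. It assumes the consumer is fetched one cycle after the producer, that a register written in the first half of a cycle can be read in the second half of the same cycle, and that with full bypassing a value produced at the end of one stage can be consumed at the start of a stage in the next cycle. The function names are hypothetical, and the usage lines apply them to the standard 5-stage pipeline (assumed here to be IF, ID, EX, MEM, WB) rather than to the pipeline in this problem. Which stage produces or needs a value is an input you must justify yourself.

```python
def stalls_no_bypass(producer, consumer, write_stage, read_stage):
    """Stalls when the value can only travel through the register file.

    producer and consumer are lists of stage names in pipeline order.
    The producer occupies stage k (0-based) in cycle k; the consumer,
    fetched one cycle later, would occupy stage k in cycle k + 1.
    """
    write_cycle = producer.index(write_stage)
    read_cycle = consumer.index(read_stage) + 1
    # Write in the first half-cycle, read in the second half of that cycle.
    return max(0, write_cycle - read_cycle)


def stalls_full_bypass(producer, consumer, produced_in, needed_in):
    """Stalls when results are forwarded straight to the stage needing them."""
    ready_cycle = producer.index(produced_in) + 1  # cycle after it is produced
    need_cycle = consumer.index(needed_in) + 1     # consumer is one cycle behind
    return max(0, ready_cycle - need_cycle)


if __name__ == "__main__":
    five_stage = ["IF", "ID", "EX", "MEM", "WB"]  # assumed class pipeline
    # ALU op followed by a dependent ALU op, no bypassing.
    print(stalls_no_bypass(five_stage, five_stage, "WB", "ID"))     # 2
    # Same pair with full bypassing.
    print(stalls_full_bypass(five_stage, five_stage, "EX", "EX"))   # 0
    # Load followed by a dependent ALU op, full bypassing.
    print(stalls_full_bypass(five_stage, five_stage, "MEM", "EX"))  # 1
```

For the pipeline in this problem, the producer and consumer stage lists differ by instruction type, and for stores you should state explicitly in which stage the address and the data are needed.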

Submit your solution as a PDF file through Gradescope (HW2). Please mark which parts of the PDF are used for each question; this can be done through Gradescope.

Updated: February, 2019