HW3: Out-of-order Execution

Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written material that you may consult while writing your solutions is the textbook, lecture notes, and lecture slides.

Q1: Out-of-order processing

Consider an out-of-order processor similar to the one described in class. The architecture has 32 logical registers (also known as architected registers or program-defined registers and indicated as LR*) and 38 physical registers (indicated as PR*). On power up, the following program starts executing. To simplify the problem, some of the initialization code is not shown and you can ignore that code. The loop in the program is executed for at least three iterations.

Line1: L.D LR1, 0(LR2)
DADD LR1, LR1, LR3
ST.D LR1, 0(LR2)
DADD LR2, LR2, 8
BNE LR2, LR4, Line1

The processor has a width of 3, i.e., every pipeline stage can move up to 3 instructions through in every cycle. Show the renamed code for the first 15 instructions of this program. In what cycle will the 15th instruction get committed?
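The renaming mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference solution: it assumes registers are numbered LR1..LR32 and PR1..PR38, that LRi initially maps to PRi (as the assumptions state), that the free list is handed out in ascending order, and that whenever the free list runs dry the oldest register-writing instruction has already committed and released the previous mapping of its destination. Commit timing is not modelled, and the names `LOOP_BODY` and `rename` are hypothetical.

```python
from collections import deque

NUM_LOGICAL = 32   # LR1..LR32 (numbering is an assumption)
NUM_PHYSICAL = 38  # PR1..PR38 (numbering is an assumption)

# One loop iteration: (opcode, destination logical register or None, source logical registers).
# Immediates and address offsets are omitted because they are never renamed.
LOOP_BODY = [
    ("L.D",  "LR1", ["LR2"]),
    ("DADD", "LR1", ["LR1", "LR3"]),
    ("ST.D", None,  ["LR1", "LR2"]),
    ("DADD", "LR2", ["LR2"]),
    ("BNE",  None,  ["LR2", "LR4"]),
]

def rename(num_instructions):
    """Rename the first num_instructions dynamic instructions of the loop."""
    # Initial mapping: LRi -> PRi.
    map_table = {f"LR{i}": f"PR{i}" for i in range(1, NUM_LOGICAL + 1)}
    # Physical registers that hold no architected state at start-up.
    free_list = deque(f"PR{i}" for i in range(NUM_LOGICAL + 1, NUM_PHYSICAL + 1))
    # Previous mappings, released in program order when their overwriter commits.
    pending_free = deque()
    renamed = []
    for n in range(num_instructions):
        op, dest, srcs = LOOP_BODY[n % len(LOOP_BODY)]
        # Sources are read from the map table before the destination is remapped.
        phys_srcs = [map_table[s] for s in srcs]
        phys_dest = None
        if dest is not None:
            if not free_list:
                # Simplification: assume the oldest writer has committed by now.
                free_list.append(pending_free.popleft())
            phys_dest = free_list.popleft()
            pending_free.append(map_table[dest])
            map_table[dest] = phys_dest
        renamed.append((op, phys_dest, phys_srcs))
    return renamed

if __name__ == "__main__":
    for op, dest, srcs in rename(15):
        print(op, dest, srcs)
```

In a real pipeline the rename stage would stall until a commit actually frees a register, so the point at which the free list empties is exactly where the cycle count needs care.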

Assumptions:

Assume that branch prediction is perfect for a simple program like this. With the help of a trace cache, even fetch is perfect. Assume that caches are perfect as well. Assume that the dependent of a DADD instruction can leave the issue queue in the cycle right after the DADD. Assume that the dependent of an L.D cannot leave in the next cycle, but only in the cycle after that. Assume a ROB, an issue queue, and an LSQ with 20 entries each. When the thread starts executing, its logical register LR1 is mapped to physical register PR1, LR2 is mapped to PR2, and so on. An instruction goes through 5 pipeline stages before it gets placed in the issue queue and an additional 5 pipeline stages (6 for an LD/ST) after it leaves the issue queue (in other words, an instruction will take a minimum of 11 cycles to go through the pipeline). When determining if an L.D can issue, you need not check whether previous store addresses have been resolved (just to make the problem simpler). As a further simplification, assume that stores leave the issue queue when their register dependences have been fulfilled (recall that a real processor will issue a store only when the store is the oldest instruction in the ROB).
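The latency rules above can be written down as two small helpers. This is a sketch under assumptions: it reads the stated 11-cycle minimum as 5 front-end stages, one cycle spent in the issue queue, and 5 back-end stages (6 for a load or store), and the names `min_pipeline_cycles` and `earliest_dependent_issue` are hypothetical.

```python
FRONT_END_STAGES = 5                       # stages before the issue queue
ISSUE_QUEUE_CYCLES = 1                     # assumed minimum residency in the issue queue
DEFAULT_BACK_END_STAGES = 5                # stages after leaving the issue queue
MEM_BACK_END_STAGES = {"L.D": 6, "ST.D": 6}

# Cycles between a producer leaving the issue queue and its dependent leaving.
WAKEUP_DELAY = {"DADD": 1, "L.D": 2}

def min_pipeline_cycles(op):
    """Minimum cycles for an instruction that never stalls."""
    back_end = MEM_BACK_END_STAGES.get(op, DEFAULT_BACK_END_STAGES)
    return FRONT_END_STAGES + ISSUE_QUEUE_CYCLES + back_end

def earliest_dependent_issue(producer_op, producer_issue_cycle):
    """Earliest cycle in which a dependent of the producer can leave the issue queue."""
    return producer_issue_cycle + WAKEUP_DELAY[producer_op]
```

For example, `min_pipeline_cycles("DADD")` reproduces the 11 cycles quoted above, while a dependent of an L.D that issues in cycle 7 can issue no earlier than cycle 9.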

Q2: Load-Store Queue

The table below lists a sequence of loads and stores in the LSQ, the cycles in which their input operands (one for loads, two for stores) are made available, and their computed effective addresses. Estimate when the address calculation happens for each ld/st and when each ld/st accesses the data memory. Assume that the processor does no memory dependence prediction to speculatively issue loads.

LD/ST  Address register available (cycle)  Store data register available (cycle)  Effective address
LD     5                                   -                                      abcd
ST     3                                   8                                      abde
LD     1                                   -                                      abce
LD     4                                   -                                      abde
ST     7                                   9                                      abde
LD     2                                   -                                      abde
LD     6                                   -                                      abbe
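One way to start reasoning about the table is to compute, for each entry, when its address is known and, for each load, when every earlier store address is also known. The Python sketch below does only that. It assumes the entries are listed oldest first, that the address calculation takes place in the cycle after the address register becomes available (the `ADDR_CALC_LATENCY` constant is an assumption), and that a load without dependence prediction waits for all earlier store addresses. It does not model store-to-load forwarding, waiting for store data, or stores writing memory at commit, so it is not a full solution.

```python
# (kind, cycle address register is available, cycle store data is available, effective address)
LSQ = [
    ("LD", 5, None, "abcd"),
    ("ST", 3, 8,    "abde"),
    ("LD", 1, None, "abce"),
    ("LD", 4, None, "abde"),
    ("ST", 7, 9,    "abde"),
    ("LD", 2, None, "abde"),
    ("LD", 6, None, "abbe"),
]

ADDR_CALC_LATENCY = 1  # assumption: address computed the cycle after its register is ready

def disambiguation_cycles(lsq):
    """Return (kind, address, addr_known_cycle, earlier_stores_resolved_cycle) per entry.

    The last field is None for stores. For loads it is the first cycle by which the
    load's own address and the addresses of all earlier stores are known.
    """
    results = []
    latest_store_addr_known = 0  # latest address-known cycle among earlier stores
    for kind, addr_reg_cycle, _data_cycle, addr in lsq:
        addr_known = addr_reg_cycle + ADDR_CALC_LATENCY
        if kind == "ST":
            latest_store_addr_known = max(latest_store_addr_known, addr_known)
            results.append((kind, addr, addr_known, None))
        else:
            resolved = max(addr_known, latest_store_addr_known)
            results.append((kind, addr, addr_known, resolved))
    return results

if __name__ == "__main__":
    for row in disambiguation_cycles(LSQ):
        print(row)
```

A load whose address matches an earlier store must additionally wait for that store's data, which is where the second operand column of the table comes into play.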

Submit

Submit your solution as a PDF file through Gradescope HW3. Please mark which parts of the PDF are used for each question (this can be done through Gradescope).

Updated: February, 2019