HW4: Caches and Cache-Coherence

Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written material that you may consult while writing your solutions are the textbook, lecture notes, and lecture slides.

Memory access times

Consider a processor and a program that would have an IPC of 1 with a perfect 1-cycle L1 cache. Assume that each additional cycle of cache/memory access latency increases program execution time by one cycle. Assume the following MPKI values and access latencies for each cache:

  • L1: 16 KB: 1-cycle: 100 MPKI
  • L2: 64 KB: 10-cycles: 50 MPKI
  • L3: 2 MB: 30-cycles: 10 MPKI
  • L4: 16 MB: 100-cycles: 5 MPKI
  • Memory: 200-cycles
Estimate the program execution times for the following cache configurations:
  1. L1-L2-L3-L4
  2. L1-L2-L3
  3. L2-L3-L4
  4. L1-L2-L4
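One way to organize such an estimate is sketched below. The cost model is an assumption, not the required solution method: a base of 1000 cycles per kilo-instruction (from the IPC of 1), with each miss at a level paying the full access latency of the level that services it.

```python
# A sketch of one possible cost model (an assumption, not the required method):
# base cycles per kilo-instruction is 1000 (IPC of 1), and each miss at a
# level pays the full access latency of the level that services it.
def cycles_per_kiloinstruction(mpki_chain):
    """mpki_chain: list of (MPKI at this level, latency of the next level)."""
    cycles = 1000  # perfect 1-cycle L1 gives IPC = 1
    for mpki, next_latency in mpki_chain:
        cycles += mpki * next_latency
    return cycles

# Configuration 1 (L1-L2-L3-L4, backed by memory) under this model:
config1 = [(100, 10), (50, 30), (10, 100), (5, 200)]
print(cycles_per_kiloinstruction(config1))  # 5500
```

The other configurations follow the same pattern once you decide, and state, how a missing level changes the chain (e.g., where L2 becomes the first-level cache, you must also assume how its 10-cycle latency affects every access).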

Cache Organization

A 32 MB L3 cache has a 64 byte line (block) size and is 16-way set-associative. How many sets does the cache have? How many bits are used for the offset, index, and tag, assuming that the CPU provides 48-bit addresses? How large is the tag array? (If you do not explain your steps, you will not receive partial credit for an incorrect answer.)
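As a sanity check for this kind of geometry calculation, the steps can be sketched in a few lines. The cache shown is a deliberately different, hypothetical one (32 KB, 4-way, 64 B lines, 32-bit addresses), not the cache in the question:

```python
import math

# Sketch: derive set count and address-bit split for a hypothetical cache
# (32 KB, 4-way, 64 B lines, 32-bit addresses -- NOT the cache asked about).
def cache_geometry(size_bytes, line_bytes, ways, addr_bits):
    lines = size_bytes // line_bytes          # total cache lines
    sets = lines // ways                      # lines grouped into sets
    offset_bits = int(math.log2(line_bytes))  # byte within a line
    index_bits = int(math.log2(sets))         # which set
    tag_bits = addr_bits - offset_bits - index_bits
    tag_array_bits = tag_bits * lines         # one tag per line
    return sets, offset_bits, index_bits, tag_bits, tag_array_bits

print(cache_geometry(32 * 1024, 64, 4, 32))  # (128, 6, 7, 19, 9728)
```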

Cache Miss Rates

For the following access pattern, (i) indicate whether each access is a hit or a miss, and (ii) compute the hit rate. Assume that the cache has 2 sets and is 2-way set-associative. Assume that block A maps to set 0, B to set 1, C to set 0, D to set 1, E to set 0, and F to set 1. Assume an LRU replacement policy.

Does the hit rate improve if you assume a fully-associative cache of the same size (again, indicate if each access is a hit or a miss)?

Access pattern: AAECBAECAABDAFFBACA
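If you want to check your hand-traced answer, a small LRU simulator is easy to write. The sketch below uses a made-up block-to-set mapping and a short demo pattern, not the pattern above:

```python
# Minimal LRU set-associative cache simulator (a checking aid, not the answer).
# set_of maps each block to its set index; each set list keeps MRU at the end.
def simulate(pattern, set_of, num_sets=2, ways=2):
    sets = {s: [] for s in range(num_sets)}
    results = []
    for block in pattern:
        lines = sets[set_of[block]]
        if block in lines:
            lines.remove(block)     # refresh recency on a hit
            results.append("hit")
        else:
            if len(lines) == ways:
                lines.pop(0)        # evict LRU (front of list)
            results.append("miss")
        lines.append(block)         # block is now most recently used
    return results

# Demo on a short, made-up pattern with A and C sharing set 0:
print(simulate("ABABCA", {"A": 0, "B": 1, "C": 0}))
# ['miss', 'miss', 'hit', 'hit', 'miss', 'hit']
```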

Virtually Indexed Cache

Assume that the OS uses a page size of 4 KB. Assume that your L1 cache must be 4-way set-associative, and the cache line size is 64 bytes. If you're trying to correctly implement a virtually indexed physically tagged cache (with no additional support from the OS), what is the largest L1 cache that you can design?
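The key constraint, sketched below with illustrative numbers that are not the question's parameters, is that a virtually indexed, physically tagged cache must draw its index and offset bits entirely from the page offset, so that indexing can begin before translation completes:

```python
# Sketch of the VIPT sizing constraint (illustrative numbers, not the HW's):
# index + offset bits must fit within the page offset, so
# sets * line_size <= page_size, giving max size = ways * page_size.
def max_vipt_size(page_size, ways, line_size):
    max_sets = page_size // line_size  # largest set count the page offset allows
    return ways * max_sets * line_size  # equals ways * page_size

# Example: 8 KB pages, 2-way, 32 B lines (assumed values for illustration)
print(max_vipt_size(8 * 1024, 2, 32))  # 16384
```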

Snooping-Based Cache Coherence

Consider a symmetric shared-memory multiprocessor (3 processors sharing a bus) implementing a snooping cache coherence protocol such as the one discussed in class. For each of the events below, explain the coherence protocol steps (does the cache flag a hit or a miss, what request is placed on the bus, who responds, is a writeback required, etc.) and give the eventual state of the data block in each of the 3 processors' caches. Assume that the caches are direct-mapped and that each cache line stores only one word. Words X and Y map to the same cache line in each cache, so X and Y cannot both be resident at the same time. At the start, X and Y are not in any of the three caches.

P1: Read X
P2: Write X
P3: Write X
P2: Read X
P3: Write X
P3: Write Y
P2: Read Y
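A tiny MSI-style simulator can help check a hand trace. The sketch below makes simplifying assumptions (no Exclusive state, writebacks are not modeled, and S-to-M upgrades are logged as misses since they require a bus transaction); the demo trace is shorter than the one above:

```python
# Minimal MSI snooping sketch (an illustration under simplifying assumptions:
# no E state, writebacks not modeled, S->M upgrades logged as misses).
# Each single-line cache evicts its resident block on a conflicting fill.
def run(trace, procs=("P1", "P2", "P3")):
    state = {p: [None, "I"] for p in procs}  # [resident block, MSI state]
    log = []
    for proc, op, block in trace:
        blk, st = state[proc]
        hit = blk == block and st != "I" and (op == "R" or st == "M")
        if not hit:
            # bus request: snoopers holding this block downgrade or invalidate
            for other in procs:
                oblk, ost = state[other]
                if other != proc and oblk == block and ost != "I":
                    state[other][1] = "S" if op == "R" else "I"
            state[proc] = [block, "S" if op == "R" else "M"]
        log.append((proc, op, block, "hit" if hit else "miss"))
    return state, log

# Demo on a short, made-up trace:
state, log = run([("P1", "R", "X"), ("P2", "W", "X"), ("P2", "R", "X")])
print(log)    # last event is a hit; P2's write invalidated P1's copy
print(state)
```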

Submit

Submit your solution as a PDF file through Gradescope HW4. Please mark which parts of the PDF are used for each question (this can be done through Gradescope).

Updated: December 2019