Multi-Level Cache Performance Problem

This is a homework problem that I was given:

Suppose you have the following statistics for a Processor with several different choices for memory hierarchy.

Base CPI = 1.5 
Processor Speed = 2 GHz 
Main Memory Access Time = 100 ns 
L1 miss rate per instruction = 7% 
L2 direct mapped access = 12 cycles 
Global miss rate with L2 direct mapped = 3.5% 
L2 8-way set associative access = 28 cycles 
Global miss rate with L2 8-way set associative access = 1.5%

Note: The global miss rate is the percentage of references that miss in every level of cache (and therefore must access main memory).

  1. Calculate the Total CPI if L2 cache is available and is direct mapped.
  2. Calculate the Total CPI if L2 is available and is 8-way set associative.



Total CPI = Base CPI + Memory-Stall Cycles per Instruction

Memory-Stall Cycles per Instruction = Miss Penalty (in cycles) x Miss Rate

The first order of business is to figure out the miss penalty, in cycles, for going all the way to main memory. This is easily determined by the following calculation:

Main Memory Access Time / (1 / Processor Speed) = (100 ns) / (0.5 ns per cycle) = 200 cycles

Note: Main Memory Access Time is in ns, and the inverse of Processor Speed (the clock cycle time) is in ns per cycle, so dividing the two gives a number of cycles. We do this calculation because a trip to main memory takes a fixed amount of time (100 ns), while the clock rate (2 GHz) determines how long each cycle lasts. Inverting the clock rate gives a cycle time of 0.5 ns, and dividing the access time by that cycle time gives the number of cycles needed to reach main memory (the miss penalty).
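The unit conversion above can be checked with a few lines of Python. This is only a minimal sketch; the function name `ns_to_cycles` and its parameter names are my own, not part of the problem statement.

```python
def ns_to_cycles(access_time_ns, clock_ghz):
    """Convert an access time in nanoseconds to processor cycles."""
    # A clock rate in GHz is cycles per nanosecond, so its inverse
    # is the cycle time in nanoseconds per cycle.
    cycle_time_ns = 1.0 / clock_ghz
    return access_time_ns / cycle_time_ns


# Values from the problem: 100 ns main memory access, 2 GHz processor.
miss_penalty_cycles = ns_to_cycles(100, 2.0)
print(miss_penalty_cycles)  # 200.0
```

Because GHz is the same as cycles per nanosecond, the conversion is equivalent to multiplying 100 ns by 2 cycles per ns.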

Because the problem involves two caches, a miss in L1 leads to an attempt to retrieve the information from L2. If the information is still not found there, main memory is accessed. The flow looks something like this:

Access L1 --> Access L2 --> Access Main Memory

(it’s implied that if there is a “hit” we will not need to continue the flow)

The problem tells us that an L2 direct mapped access takes 12 cycles.

So the calculation will look as follows:

Total CPI = 1.5 + (0.07 x 12) + (0.035 x 200) = 9.34 CPI

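Because L1 misses on 7% of instructions, those instructions must access L2, which takes 12 cycles, so you multiply the two (0.07 x 12 = 0.84 cycles per instruction). If the information is still not found, we must access main memory, which takes 200 cycles. That happens at the global miss rate of 3.5% (0.035 x 200 = 7 cycles per instruction).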
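The direct mapped calculation can be written as a short Python sketch. The function `total_cpi` and its parameter names are made up for illustration, and it follows the same model used in this post: every L1 miss pays the L2 access time, and every global miss additionally pays the main memory penalty.

```python
def total_cpi(base_cpi, l1_miss_rate, l2_access_cycles,
              global_miss_rate, memory_cycles):
    """Total CPI = base CPI + memory-stall cycles per instruction."""
    # Stall cycles from L1 misses that go on to access L2.
    l2_stall = l1_miss_rate * l2_access_cycles
    # Stall cycles from references that miss in every cache level.
    memory_stall = global_miss_rate * memory_cycles
    return base_cpi + l2_stall + memory_stall


# Direct mapped L2: 12-cycle access, 3.5% global miss rate.
cpi_direct = total_cpi(1.5, 0.07, 12, 0.035, 200)
print(round(cpi_direct, 2))  # 9.34
```

The result is rounded only for display, since floating-point arithmetic leaves a tiny error in the last digits.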

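The second calculation, for the 8-way set associative L2, is done in a similar fashion, using the 28-cycle L2 access time and the 1.5% global miss rate:

Total CPI = 1.5 + (0.07 x 28) + (0.015 x 200) = 6.46 CPI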
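To see both answers side by side, here is a small Python sketch that runs the same formula over the two L2 options. The constant names and the `l2_options` dictionary are my own, chosen just for this example.

```python
BASE_CPI = 1.5
L1_MISS_RATE = 0.07      # L1 misses per instruction
MEMORY_CYCLES = 200      # 100 ns at 2 GHz

# Each L2 option: (access time in cycles, global miss rate).
l2_options = {
    "direct mapped": (12, 0.035),
    "8-way set associative": (28, 0.015),
}


def total_cpi(l2_access_cycles, global_miss_rate):
    """Base CPI plus L2 and main memory stall cycles per instruction."""
    return (BASE_CPI
            + L1_MISS_RATE * l2_access_cycles
            + global_miss_rate * MEMORY_CYCLES)


results = {name: total_cpi(*params) for name, params in l2_options.items()}
for name, cpi in results.items():
    print(f"{name}: {cpi:.2f} CPI")

# The option with the lowest total CPI.
best = min(results, key=results.get)
print("Lowest CPI:", best)
```

With these numbers the 8-way set associative L2 comes out ahead: its slower access time (28 cycles instead of 12) costs less than what it saves by cutting the global miss rate from 3.5% to 1.5%.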