Cache memory
Objectives
- Learn how memory access patterns determine cache hit rates.
- Determine which memory access patterns produce GOOD hit rates.
“Data Cache Simulator and Visualisation” in MARS
This lab uses the cache/memory visualisation tools in MARS to illustrate cache behaviour and performance measurement.
First, download the starter file (i.e. cache.s) to a directory of your choice. Then, load the cache.s file in MARS and examine its contents to get a rough idea of what the program does before proceeding further.
- The most important part of the program to understand is the section called “PSEUDOCODE” at the top of the file. Basically, we’re just going to reset some elements of an array (option 0) or increment them (option 1).
- The elements accessed in the array are determined by the step size ($a1), and the number of times we do this is controlled by the repcount parameter ($a2). These two parameters will directly affect the cache hit and miss rates. The option parameter ($a3) will also have some impact on execution performance, as will the cache’s own parameters.
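The “PSEUDOCODE” section can be sketched as follows (a Python approximation, not the actual MIPS code; the parameter names mirror the $a registers described above):

```python
def run(arr, step_size, repcount, option):
    """Python sketch of the PSEUDOCODE section of cache.s."""
    for _ in range(repcount):                    # $a2: number of passes over the array
        for i in range(0, len(arr), step_size):  # $a1: stride, in array elements (words)
            if option == 0:                      # $a3 = 0: reset the element (one store)
                arr[i] = 0
            else:                                # $a3 = 1: increment it (one load + one store)
                arr[i] += 1
    return arr
```

Note that option 1 touches memory twice per visited element (a load followed by a store), which matters when counting cache accesses.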
For each of the scenarios below, repeat the following steps:
- Set the appropriate program parameters in cache.s as shown at the beginning of each scenario (by changing the immediate values of the li instructions on lines 23 – 26 in the main function).
- Run « Tools | Data Cache Simulator ».
- Set the appropriate cache parameters as shown at the beginning of each scenario.
- Enable the Runtime Log and click “Connect to MIPS”.
- Run « Tools | Memory Reference Visualization » and click “Connect to MIPS”.
- As you step through the code in MARS, any access to the data segment (load or store) will be visualised (accesses to the code segment are not displayed).
The « Data Cache Simulator » will display the state of your data cache, and the « Memory Reference Visualization » will show which parts of memory are accessed and how many times. Remember that these tools work independently of your code, so if you « reassemble/reset » the cache.s code, you must also reset the contents of the cache and memory visualisation tools!
NOTE: If you run the code in one go (instead of stepping through it), you will get the final state of the cache and the hit rate. In order to better see which « steps/iterations » affect the hit and miss rates in the program, insert a « breakpoint » in the wordLoop loop just before or after each memory access.
Tasks:
Simulate the following scenarios for the cache.s program and take note of the final cache hit rates. Try to infer what the hit rate will be BEFORE you run the code. After running each simulation, make sure you understand WHY you got these results (this might end up as a question on the exam)!
To help you better understand the problems given in the scenarios below, here are some questions you should be able to answer:
- How big is the memory block/cache line?
- How many consecutive accesses (taking into account the step size) access the same block?
- How much data fits in the ENTIRE cache?
- How far apart in memory are the blocks that map to the same cache line (and thus could create conflicts)?
- What is the cache type (i.e. its associativity)?
- Where in the cache will a particular block in memory be mapped?
- To decide whether a specific cache access is a miss or a hit: has that data been accessed before? If so, is it still in the cache or not?
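Several of the questions above come down to the (Tag/Index/Offset) decomposition of a byte address. As a sketch (assuming, as in MARS, that the block size and the number of sets are powers of two), the three fields can be computed like this:

```python
def decompose(addr, block_size_bytes, num_sets):
    """Split a byte address into (tag, index, offset).

    offset selects a byte within the block, index selects the cache set,
    and tag identifies which memory block currently occupies that set.
    """
    offset = addr % block_size_bytes
    index = (addr // block_size_bytes) % num_sets
    tag = addr // (block_size_bytes * num_sets)
    return tag, index, offset
```

Two addresses conflict in a direct-mapped cache exactly when they have the same index but different tags.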
Exercise 1 (Cache Scenario 1):
Cache settings: (to be set in the « Data Cache Simulator » window)
- Placement Policy: Direct Mapping.
- Block Replacement Policy: LRU.
- Set size (i.e. number of blocks per set): 1. (MARS will not allow you to change this value. Why?)
- Number of blocks: 4.
- Cache block size: 2 (in words).
Program parameters: (to be set by initialising the $a? registers in cache.s, lines 23 – 26)
- Array Size: 128 (bytes).
- Step Size: 8.
- repcount: 4.
- Option: 0.
- What combination of parameters produces the hit rate you observe? (Hint: Your answer should be of the form: « Because [parameter A] in bytes is exactly equal to [parameter B] in bytes. ») Note: Recall that the cache size is implicitly defined by the « block size » and the « number of blocks » in the cache.
- What is the cache hit rate if we arbitrarily increase the repcount parameter? Why?
- How could we change a parameter in the program to increase our hit rate? (Hint: Your answer should be of the form: “Replace [parameter] with [value]”.) Note: It does not matter whether we access the same array elements. Just give a change to the program parameters that would increase the hit rate. Make sure, however, that the value you give is valid.
Hint: If you have trouble visualising what is being cached on each memory access by simply looking at the code, try drawing on a piece of paper a sketch of the cache memory and write down what the (Tag/Index/Offset) decomposition of the 32-bit addresses would be. This will help you determine which memory addresses map to which line in the cache using the ‘Index’ bits.
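Once you have made your prediction, you can cross-check it with a minimal direct-mapped cache model (a Python sketch, not part of cache.s; the trailing `accesses` list is an assumed reconstruction of the Scenario 1 access stream, where option 0 produces one store per visited element):

```python
def direct_mapped_hit_rate(accesses, block_size_bytes, num_blocks):
    """Replay a list of byte addresses through a direct-mapped cache."""
    lines = [None] * num_blocks          # tag currently held by each cache line
    hits = 0
    for addr in accesses:
        block = addr // block_size_bytes
        index = block % num_blocks       # which line this block maps to
        tag = block // num_blocks
        if lines[index] == tag:
            hits += 1
        else:
            lines[index] = tag           # direct-mapped: evict whatever was there
    return hits / len(accesses)

# Scenario 1 (assumed stream): 128-byte array, step 8 words (32 bytes),
# repcount 4, option 0 => 4 passes, one store per visited element.
accesses = [i * 4 for _ in range(4) for i in range(0, 32, 8)]
```

Feed `accesses` and the Scenario 1 cache parameters (block size in bytes, number of blocks) to the function and compare with the hit rate MARS reports.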
Exercise 2 (Cache Scenario 2):
Cache Parameters:
- Placement Policy: N-way Set Associative.
- Block Replacement Policy: LRU.
- Set size: 4.
- Number of blocks: 16.
- Cache block size: 4 (in words).
Program Parameters:
- Array Size: 256 (bytes).
- Step Size: 2.
- repcount: 1.
- Option: 1.
- How many memory accesses are there per iteration of the inner loop (not the one involving repcount)? It’s not 1!
- What is the repeating pattern of hits/misses? WHY? (Hint: they repeat every 4 accesses.) Give the hit rate in terms of this repeating pattern.
- What happens to our hit rate if the number of repetitions repcount goes to infinity (i.e. our loop becomes infinite)? Why?
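As for Scenario 1, you can cross-check your answers with a small model, this time of an N-way set-associative cache with LRU replacement (a Python sketch; the trailing `accesses` list is an assumed reconstruction of the Scenario 2 stream, where option 1 loads then stores each visited element):

```python
from collections import OrderedDict

def set_assoc_hit_rate(accesses, block_size_bytes, num_blocks, set_size):
    """Replay byte addresses through an N-way set-associative LRU cache."""
    num_sets = num_blocks // set_size
    sets = [OrderedDict() for _ in range(num_sets)]   # tags in LRU order per set
    hits = 0
    for addr in accesses:
        block = addr // block_size_bytes
        index = block % num_sets
        tag = block // num_sets
        s = sets[index]
        if tag in s:
            hits += 1
            s.move_to_end(tag)            # mark as most recently used
        else:
            if len(s) == set_size:
                s.popitem(last=False)     # set is full: evict least recently used
            s[tag] = None
    return hits / len(accesses)

# Scenario 2 (assumed stream): 256-byte array, step 2 words, repcount 1,
# option 1 => each visited element is loaded, then stored.
accesses = []
for i in range(0, 64, 2):                 # word indices 0, 2, ..., 62
    accesses += [i * 4, i * 4]            # load then store of the same byte address
```

Replaying `accesses` with the Scenario 2 cache parameters should reproduce both the repeating hit/miss pattern and the final hit rate MARS shows.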
You should have noticed that the hit rate was quite high for scenario 2, and your answer to the previous question should help you understand why this is the case. If you don’t know why, consider the size of the array and compare it to the size of the cache.
Now, consider the following: suppose we have a program that iterates repcount times over a very large array (i.e., much larger than the cache size). During each iteration, we map a different function to the elements of our array (e.g., if repcount = 1024, we map 1024 different functions to each of the elements in the array). For reference, we only had one increment function and one iteration up to now.
- Given the above description, how can we restructure accesses to the array to achieve a hit rate like the one we got before (assuming that each element of the array is modified independently of the others, i.e. it doesn’t matter if iteration k is applied to element arr[i+1] before it is applied to element arr[i], etc.)?
Hint: It would not make sense to scan the entire array on each iteration because it is much larger than your cache. This would reduce the amount of temporal locality exposed by your program and thus negatively impact the cache hit rate.
The idea is to expose more locality so that our cache can take advantage of our predictable behavior. So, we should try to access ________ of the array at once and apply all the ________ to those ________ before continuing with a new ________. This will allow us to keep _______ of the array nice and warm in the cache during the whole procedure and not have to go around later! (The 1st, 3rd, and 5th blanks must be the same. This is not a specific vocabulary term that you should use to fill them. It is more of an idea that you should have).