## Self-Spatial Reuse

**Introduction:** The analysis of spatial reuse depends on the data layout of the matrix. C matrices are laid out in row-major order and Fortran matrices are laid out in column-major order. In other words, array elements X[iJ] and X[i,j 1] are contiguous in C and X[i, j] and X[i 1, j] are contiguous in Fortran. Without loss of generality, in the rest of the chapter, we shall adopt the C (row-major) array layout.

As a first approximation, we consider two array elements to share the same cache line if and only if they share the same row in a two-dimensional array. More generally, in an array of d dimensions, we take array elements to share a cache line if they differ only in the last dimension. Since for a typical array and cache, many array elements can fit in one cache line, there is significant speedup to be had by accessing an entire row in order, even though, strictly speaking, we occasionally have to wait to load a new cache line. The trick to discovering and taking advantage of self-spatial reuse is to drop the last row from the coefficient matrix F. If the resulting truncated matrix has rank that is less than the depth of the loop nest, then we can assure spatial locality by making sure that the innermost loop varies only the last coordinate of the array.

**Example:** Consider the last access, Z[l, i,2* i j], in Fig. 1 1 . 1 9 . If we delete the last row, we are left with the truncated matrix

" 0 0 "

1 0

The rank of this matrix is evidently 1, and since the loop nest has depth 2, there is the opportunity for spatial reuse. In this case, since j is the inner-loop index, the inner loop visits contiguous elements of the array Z stored in row-major order. Making i the inner-loop index will not yield spatial locality, since as I changes, both the second and third dimensions change.

The general rule for determining whether there is self-spatial reuse is as follows. As always, we assume that the loop indexes correspond to columns of the coefficient matrix in order, with the outermost loop first, and the innermost loop last. Then in order for there to be spatial reuse, the vector [ 0 , 0 , . . . , 0,1] must be in the null space of the truncated matrix. The reason is that if this vector is in the null space, then when we fix all loop indexes but the innermost one, we know that all dynamic accesses during one run through the inner loop vary in only the last array index. If the array is stored in row-major order, then these elements are all near one another, perhaps in the same cache line.

**Group Reuse: **We compute group reuse only among accesses in a loop sharing the same coefficient matrix. Given two dynamic accesses Fii fi and Fi_{2} f_{2}, reuse of the same data requires that

**Fii fi = Fi _{2} f_{2}**

or

**F ( i i - i _{2} ) = ( f _{2} - f i ) .**

Suppose v is one solution to this equation. Then if w is any vector in the null space of F i , w v is also a solution, and in fact those are all the solutions to the equation.