§ Proving block matmul using program analysis
It's a somewhat well-known fact that given matrix multiplication: O=AB
where O∈R2n×2m ( O for output),
A∈R2n×r,B∈Rr×2m are matrices.
We can also write this as follows:
[o11o21o12o22]=[a11a21a12a22][b11b21b12b22]=[a11b11+a12b21a21b11+a22b21a11b12+a12b22a21b12+a22b22]
When written as code, the original matrix multiplication is:
int matmul(int N, int Z, int M, int a[N][Z], int b[Z][M], int out[N][M]) {
for(int i = 0; i < 2*N; ++i) {
for(int j = 0; j < 2*M; ++j) {
for(int k = 0; k < 2Z; ++k) out[i][j] += a[i][k] * b[k][j]
}
}
}
and the block-based matrix multiplication is:
int matmulBlock(int N, int Z, int M, int a[N][Z], int b[Z][M], int out[N][M]) {
for (int BI = 0; BI < 2; ++BI) {
for (int BJ = 0; BJ < 2; ++BJ) {
for(int i = BI*N; i < BI*N+N; ++i) {
for(int j = BJ*M; j < BJ*M+M; ++j) {
for(int k = 0; k < 2Z; ++k) { out[i][j] += a[i][k] * b[k][j] }
}
}
}
}
}
we wish to show that both of these programs have the same semantics .
We will do this by appealing to ideas from program analysis.
§ The key idea
We will consider the statement:
out[i][j] += a[i][k] * b[k][j]
as occuring at an abstract "point in time" (i,j,k) in the matmul
function.
I also occurs at an abstract "point in time" (BI,BJ,i′,j′,k′) in
the matmulBlock
function.
We will then show that the loops for(i...) for(j...) for(k...)
are fully
parallel, and hence we can reorder the loops any way we want.
Then, we will show that the ordering imposed by (BI,BJ,i′,j′,k′)
is a reordering of the original (i,j,k) ordering. We do this by
showing that there is a bijection:
(i=i0,j=j0,k=k0)→(BI=i0/N,BJ=j0/N,i=i0%N,j=j0%N,k=k0)
Thus, this bijection executes all loops, and does so without affecting the
program semantics.
§ Schedules
We'll zoom out a little, to consider some simple programs and understan
how to represent parallelism.
void eg1(int N, int M, int out[N][M]) {
for(int i = 0; i < N; ++i) {
for(int j = 1; j < M; ++j) {
out[i][j] = out[i][j-1];
}
}
Notice that this program is equivalent to the program with the i loop
reversed:
void eg1rev(int N, int M, int out[N][M]) {
for(int i = N-1; i >=0; --i) {
for(int j = 1; j < M; ++j) {
out[i][j] = out[i][j-1];
}
}
What's actually stopping us from reversing the loop for(j...)
? it's
the fact that the value of, say, out[i=0][j=1]
depends on
out[i=0][j=0]
. We can see that in general, out[i=i_0][j=j_0]
depends
on out[i=i_0][j=j_0-1]
. We can represent this by considering a
dependence set :
{write:(i0,j0−1)→write:(i0,j0)}
in general, we can reorder statements as long as we do not change
the directions of the arrows in the dependence set.
We can imagine the scenario as follows:
^
| (1, 0)->(1, 1)->(1, 2)->(1, 3) ...
| (0, 0)->(0, 1)->(0, 2)->(0, 3) ....
(i, j) -->
§ Dependence structure of matmul
.
§ Fully parallel, reordering
[TODO ]
§ References