§ Cayley Hamilton

I find the theorem spectacular: while naively the vector space $M_n(F)$ has dimension $n^2$, Cayley-Hamilton tells us that the powers $M^0, M^1, \dots, M^n$ are already enough to get a linear dependence. However, I've never known a proof of Cayley-Hamilton that sticks with me; I think I've found one now. For any matrix $A$, the adjugate has the property:
$$A \, adj(A) = det(A) I$$
Using this, consider the matrix $P_A \equiv xI - A$, which lives in $End(V)[x]$, and its corresponding determinant $p_A(x) \equiv det(P_A) = det(xI - A)$. We have that
$$
\begin{aligned}
P_A \, adj(P_A) &= det(P_A) I \\
(xI - A) \, adj(xI - A) &= det(xI - A) I = p_A(x) I
\end{aligned}
$$
If we view this as an equation in $End(V)[x]$, it says that $p_A(x) I$ has $xI - A$ as a factor. This means that $x = A$ is a zero of $p_A(x)$ (substituting $x = A$ into this matrix identity is legitimate because the coefficients of $adj(xI - A)$ are themselves polynomials in $A$, and hence commute with $A$). Thus $A$ satisfies $p_A(x)$, hence $A$ satisfies its own characteristic polynomial! The key is of course the adjugate matrix equation that relates the adjugate of $A$ to the determinant of $A$.
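As a sanity check, here is a minimal sketch (assuming `sympy` is available; the matrix `A` below is an arbitrary example, nothing special) that builds $p_A(x) = det(xI - A)$ and evaluates it at $A$ by Horner's rule, recovering the zero matrix:

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 2, 3], [0, 1, 4], [5, 6, 0]])    # arbitrary example matrix
p = (x * sp.eye(3) - A).det()                       # p_A(x) = det(xI - A)

# Evaluate p at the matrix A by Horner's rule: (((c_n A + c_{n-1} I) A + ...) A + c_0 I).
result = sp.zeros(3, 3)
for coeff in sp.Poly(p, x).all_coeffs():            # coefficients, leading one first
    result = result * A + coeff * sp.eye(3)

assert result == sp.zeros(3, 3)                     # Cayley-Hamilton: p_A(A) = 0
```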

§ Adjugate matrix equation

  • Let $A'(i, j)$ be the matrix $A$ with the $i$th row and $j$th column removed.
  • Let $C[i, j] \equiv (-1)^{i+j} det(A'(i, j))$ be the $(i, j)$ cofactor: the determinant of $A'(i, j)$ with the sign $(-1)^{i+j}$ attached.
  • Define $adj(A) \equiv C^T$ to be the transpose of the cofactor matrix. That is, $adj(A)[i, j] = C[j, i] = (-1)^{i+j} det(A'(j, i))$.
  • Call $D \equiv A \, adj(A)$. We will now compute the entries of $D$ when $i = j$ and when $i \neq j$. We call it $D$ for "diagonal", since we will show that $D$ is a diagonal matrix with $det(A)$ on the diagonal.
  • First, compute $D[i, i]$ (I use the Einstein summation convention, where repeated indices are implicitly summed over):
$$
\begin{aligned}
D[i, i] &= (A \, adj(A))[i, i] \\
&= A[i, k] \, adj(A)[k, i] \\
&= A[i, k] (-1)^{i+k} det(A'[i, k]) \\
&= det(A)
\end{aligned}
$$
The expression $A[i, k] (-1)^{i+k} det(A'[i, k])$ is the determinant of $A$ expanded along row $i$ using the Laplace expansion. Next, let's compute $D[i, j]$ when $i \neq j$:
$$
\begin{aligned}
D[i, j] &= (A \, adj(A))[i, j] \\
&= A[i, k] \, adj(A)[k, j] \\
&= A[i, k] (-1)^{j+k} det(A'[j, k]) \\
&= det(Z)
\end{aligned}
$$
This is the determinant of a new matrix $Z$ (for "zero"), whose $j$th row is the $i$th row of $A$, and which agrees with $A$ everywhere else. More explicitly:
$$
Z[l, :] \equiv \begin{cases} A[l, :] & l \neq j \\ A[i, :] & l = j \end{cases}
$$
Since $Z$ differs from $A$ only in the $j$th row, we must have $Z'[j, k] = A'[j, k]$, since $Z'[j, k]$ depends only on the rows other than $j$. If we compute $det(Z)$ by expanding along the $j$th row, we get:
$$
\begin{aligned}
det(Z) &= (-1)^{j+k} Z[j, k] \, det(Z'[j, k]) \\
&= (-1)^{j+k} A[i, k] \, det(Z'[j, k]) \\
&= (-1)^{j+k} A[i, k] \, det(A'[j, k]) \\
&= D[i, j]
\end{aligned}
$$
But $Z$ has a repeated row: $Z[j, :] = Z[i, :] = A[i, :]$ with $i \neq j$. Hence $det(Z) = 0$, so $D[i, j] = 0$ when $i \neq j$. This means that $A \, adj(A) = det(A) I$.
  • We can rapidly derive other properties from this equation. For example, $det(A \, adj(A)) = det(det(A) I) = det(A)^n$, and hence $det(A) \, det(adj(A)) = det(A)^n$, or $det(adj(A)) = det(A)^{n-1}$ (cancelling $det(A)$, which is valid when $det(A) \neq 0$).
  • Also, by rearranging, if $det(A) \neq 0$, then $A \, adj(A) = det(A) I$ gives $A (adj(A)/det(A)) = I$, i.e. $adj(A)/det(A) = A^{-1}$. A small sympy sketch checking these identities appears after this list.
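Here is a small sketch of the cofactor construction above (assuming `sympy`; the matrix `A` below is an arbitrary invertible example), checking $A \, adj(A) = det(A) I$, $det(adj(A)) = det(A)^{n-1}$, and $adj(A)/det(A) = A^{-1}$:

```python
import sympy as sp

def submatrix(A, i, j):
    # A'(i, j): the matrix A with row i and column j removed
    return sp.Matrix([[A[r, c] for c in range(A.cols) if c != j]
                      for r in range(A.rows) if r != i])

def adjugate(A):
    n = A.rows
    # cofactor matrix: C[i, j] = (-1)^(i+j) det(A'(i, j))
    C = sp.Matrix(n, n, lambda i, j: (-1) ** (i + j) * submatrix(A, i, j).det())
    return C.T                                       # adj(A) is the transpose of C

A = sp.Matrix([[2, 0, 1], [1, 3, 4], [0, 1, 1]])     # arbitrary example with det(A) != 0
n, adjA = A.rows, adjugate(A)

assert A * adjA == A.det() * sp.eye(n)               # A adj(A) = det(A) I
assert adjA.det() == A.det() ** (n - 1)              # det(adj(A)) = det(A)^(n-1)
assert adjA / A.det() == A.inv()                     # adj(A) / det(A) = A^(-1)
```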

§ Determinant in terms of exterior algebra

For a vector space $V$ of dimension $n$ and a linear map $T: V \rightarrow V$, define a map $\Lambda T: \Lambda^n V \rightarrow \Lambda^n V$ by $\Lambda T(v_1 \wedge v_2 \wedge \dots \wedge v_n) \equiv T v_1 \wedge T v_2 \wedge \dots \wedge T v_n$. Since the space $\Lambda^n V$ is one-dimensional, a single scalar $k$ suffices to describe $\Lambda T$: $\Lambda T(v_1 \wedge \dots \wedge v_n) = k \, v_1 \wedge \dots \wedge v_n$. It is either a theorem or a definition (depending on how one starts this process) that $k = det(T)$. If we choose this as the definition, then let's try to compute the value. Pick a basis $v[i]$. Let $w[i] \equiv T v[i]$ (to write $T$ as a matrix). Define the matrix of $T$ by the equation $w[i] = T[i][j] v[j]$. If we now evaluate $\Lambda T$, we get:
$$
\begin{aligned}
\Lambda T(v_1 \wedge \dots \wedge v_n)
&= T(v_1) \wedge T(v_2) \wedge \dots \wedge T(v_n) \\
&= w_1 \wedge w_2 \wedge \dots \wedge w_n \\
&= (T[1][j_1] v[j_1]) \wedge (T[2][j_2] v[j_2]) \wedge \dots \wedge (T[n][j_n] v[j_n]) \\
&= \left(\sum_{\sigma \in S_n} sgn(\sigma) \prod_i T[i][\sigma(i)]\right) v[1] \wedge v[2] \wedge \dots \wedge v[n]
\end{aligned}
$$
Where the last equality is because:
  • (1) Wedges with repeated vectors vanish, so the only terms that survive are those where $v[j_1], v[j_2], \dots, v[j_n]$ are all distinct. So all the $j_k$ must be distinct in a given term.
  • A wedge of the form $T[1][j_1] v[j_1] \wedge T[2][j_2] v[j_2] \wedge \dots \wedge T[n][j_n] v[j_n]$, where all the $j_i$ are distinct (see (1)), can be rearranged by a permutation that sends $v_{j_i} \mapsto v_i$. Formally, apply the permutation $\tau(j_i) \equiv i$. This reorganizes the wedge into $sgn(\tau) \, T[1][j_1] T[2][j_2] \dots T[n][j_n] \, v[1] \wedge v[2] \wedge \dots \wedge v[n]$, where the sign is picked up by the rearrangement.
  • Now, write the indexing into $T[i][j_i]$ in terms of a permutation $\sigma(i) \equiv j_i$. The term becomes $sgn(\tau) \prod_i T[i][\sigma(i)] \, v[1] \wedge v[2] \wedge \dots \wedge v[n]$.
  • We have two permutations $\sigma, \tau$ in the formula. But we notice that $\sigma = \tau^{-1}$, and hence $sgn(\sigma) = sgn(\tau)$, so we can write the above as $sgn(\sigma) \prod_i T[i][\sigma(i)] \, v[1] \wedge v[2] \wedge \dots \wedge v[n]$.
  • Thus, summing over the surviving terms (one for each $\sigma \in S_n$), we have recovered the "classical determinant formula" $det(T) = \sum_{\sigma \in S_n} sgn(\sigma) \prod_i T[i][\sigma(i)]$. A small check of this formula appears right after this list.
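As a small check of the classical formula (a sketch assuming `sympy`, used only as an independent determinant; `T` is an arbitrary example), here is the permutation sum written out directly, with the sign computed by counting inversions:

```python
from itertools import permutations
import sympy as sp

def sign(sigma):
    # sgn(sigma) via the parity of the number of inversions
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(T):
    n = T.rows
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term = term * T[i, sigma[i]]            # prod_i T[i][sigma(i)]
        total = total + term
    return total

T = sp.Matrix([[1, 2, 0], [3, 1, 4], [0, 2, 2]])    # arbitrary example matrix
assert det_by_permutations(T) == T.det()
```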

§ Laplace expansion of determinant

From the above algebraic encoding of the determinant of $T[i][j]$ as $\sum_{\sigma \in S_n} sgn(\sigma) \prod_i T[i][\sigma(i)]$, we can recover the "Laplace expansion" rule, which asks us to pick a row $r$ and then compute the expression:
$$L_r(T) \equiv \sum_c T[r, c] (-1)^{r+c} det(T'(r, c))$$
Where $T'(r, c)$ is the matrix $T$ with row $r$ and column $c$ deleted. I'll derive this concretely using the determinant definition for the 3 by 3 case; the general case follows immediately. I prefer being explicit in a small case as it lets me see what's going on. Let's pick a basis for $V$, called $b[1], b[2], b[3]$. We have the relationship $v[i] \equiv T b[i]$. We want to express $v[1] \wedge v[2] \wedge v[3]$ as a multiple of $b[1] \wedge b[2] \wedge b[3]$. First grab a basis expansion of $v[i]$ as $v[i] = c[i][j] b[j]$; this uniquely defines the coefficients $c[i][j]$. Next, expand the wedge:
$$
\begin{aligned}
& v[1] \wedge v[2] \wedge v[3] \\
&= (c[1][1]b[1] + c[1][2]b[2] + c[1][3]b[3]) \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3])
\end{aligned}
$$
I now expand out only the first factor, leaving terms of the form $c[1][1]b[1] \wedge (\cdot) + c[1][2]b[2] \wedge (\cdot) + c[1][3]b[3] \wedge (\cdot)$. (This corresponds to "expanding along and deleting the row" in a Laplace expansion when finding the determinant.) Let's identify the $(\cdot)$ and see what remains:
$$
\begin{aligned}
& v[1] \wedge v[2] \wedge v[3] \\
&= c[1][1]b[1] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \\
&+ c[1][2]b[2] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \\
&+ c[1][3]b[3] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3])
\end{aligned}
$$
Now, for example, in the first term $c[1][1]b[1] \wedge (\cdot)$, we lose anything inside that contains a $b[1]$, as the wedge will give us $b[1] \wedge b[1] = 0$ (this corresponds to "deleting the column" when considering the submatrix). Similar considerations have us remove all terms that contain $b[2]$ in the brackets of $c[1][2]b[2]$, and terms that contain $b[3]$ in the brackets of $c[1][3]b[3]$. This leaves us with:
$$
\begin{aligned}
& v[1] \wedge v[2] \wedge v[3] \\
&= c[1][1]b[1] \wedge (c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][2]b[2] + c[3][3]b[3]) \\
&+ c[1][2]b[2] \wedge (c[2][1]b[1] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][3]b[3]) \\
&+ c[1][3]b[3] \wedge (c[2][1]b[1] + c[2][2]b[2]) \wedge (c[3][1]b[1] + c[3][2]b[2])
\end{aligned}
$$
We are now left with calculating terms like $(c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][2]b[2] + c[3][3]b[3])$, which we can solve by recursion (that is the determinant of the 2x2 submatrix). So if we now write the "by recursion" terms down, we will get something like:
$$
\begin{aligned}
& v[1] \wedge v[2] \wedge v[3] \\
&= c[1][1]b[1] \wedge (k[1] \, b[2] \wedge b[3]) + c[1][2]b[2] \wedge (k[2] \, b[1] \wedge b[3]) + c[1][3]b[3] \wedge (k[3] \, b[1] \wedge b[2])
\end{aligned}
$$
Where the $k[i]$ are the values produced by the recursion, and we assume that the recursion gives us the coefficients of the wedges "in order": so we always have $b[2] \wedge b[3]$, for example, and never $b[3] \wedge b[2]$. So, we need to ensure that the final answer we spit out corresponds to $b[1] \wedge b[2] \wedge b[3]$. If we simplify the current step, we get:
$$
\begin{aligned}
& v[1] \wedge v[2] \wedge v[3] \\
&= k[1] c[1][1] \, b[1] \wedge b[2] \wedge b[3] + k[2] c[1][2] \, b[2] \wedge b[1] \wedge b[3] + k[3] c[1][3] \, b[3] \wedge b[1] \wedge b[2]
\end{aligned}
$$
We need to rearrange our terms to get $b[1] \wedge b[2] \wedge b[3]$ times some constant. On rearranging each term into the standard form $b[1] \wedge b[2] \wedge b[3]$, we are forced to pick up the correct sign factors:
$$
\begin{aligned}
& v[1] \wedge v[2] \wedge v[3] \\
&= k[1] c[1][1] \, b[1] \wedge b[2] \wedge b[3] - k[2] c[1][2] \, b[1] \wedge b[2] \wedge b[3] + k[3] c[1][3] \, b[1] \wedge b[2] \wedge b[3] \\
&= (c[1][1]k[1] - c[1][2]k[2] + c[1][3]k[3]) \, (b[1] \wedge b[2] \wedge b[3])
\end{aligned}
$$
We clearly see that for each $c[1][i]$, the factor is $(-1)^{1+i} k[i]$, where $k[i]$ is the determinant of the sub-expression obtained by deleting the vector $b[i]$ (ignoring the column) and ignoring the entire first "row" (the other coefficients $c[1][j]$, $j \neq i$, never enter $k[i]$). So this proves the Laplace expansion by exterior algebra.
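The same expansion can be written as a direct recursion along the first row. This is only a sketch (the matrix is an arbitrary example, and `sympy` is used purely as an independent check):

```python
import sympy as sp

def det_laplace(M):
    # Laplace expansion along the first row of a square matrix given as a list of lists.
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]   # delete row 0 and column c
        total += (-1) ** c * M[0][c] * det_laplace(minor)  # sign (-1)^(0 + c)
    return total

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]                     # arbitrary example matrix
assert det_laplace(M) == sp.Matrix(M).det()
```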

§ Deriving Cayley Hamilton for rings from $\mathbb Z$

I'll show the idea of how to prove Cayley-Hamilton for an arbitrary commutative ring $R$, given that we know Cayley-Hamilton for $\mathbb Z$. I describe it for 2x2 matrices; the general version is immediate from this. Pick a matrix of variables and write down the expression for the characteristic polynomial. So if:
M = [a b]
    [c d]
then the characteristic polynomial is:
ch = |M - xI|
   = |a-x  b  |
     |c    d-x|
that's $ch(a, b, c, d, x) \equiv (a-x)(d-x) - bc = x^2 - x(a + d) + ad - bc$. This equation has $a, b, c, d, x \in R$ for some commutative ring $R$. Now, we know that if we set $x = M$, this equation will be satisfied. But what does it mean to set $x = M$? Well, we need to let $x$ be an arbitrary matrix:
x = [p q]
    [r s]
And thus we compute x^2 to be:
x^2
= [p q][p q]
  [r s][r s]
= [p^2 + qr; pq + qs]
  [rp + sr; rq + s^2]
So now, expanding out $ch(a, b, c, d, x)$ in terms of $x$ and substituting the matrix for $x$, we get the system:
[p^2 + qr; pq + qs] - (a + d)[p q] + (ad - bc)[1 0] = [0 0]
[rp + sr; rq + s^2]          [r s]            [0 1]   [0 0]
We know that these equations hold when $x = M$, i.e. with p = a, q = b, r = c, s = d, because the Cayley-Hamilton theorem over $\mathbb Z$ tells us that $ch(M) = 0$! So we get a system of four equations in the four indeterminates a, b, c, d, and we know these equations are true for all integer values of a, b, c, d. But a polynomial with integer coefficients that vanishes at every integer point must be identically zero. Thus each entry of $ch(M)$ is the zero polynomial, i.e. $ch(M) = 0$ over any $R$. This seems to depend on the fact that the ring is infinite, because otherwise imagine we send $\mathbb Z$ to $\mathbb Z/10\mathbb Z$. Since we don't have an infinite number of elements there, why should the polynomial be zero? I imagine that this needs Zariski-like arguments to be handled.
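The 2x2 computation above can be checked symbolically. Here is a sketch (assuming `sympy`): with a, b, c, d as free symbols, each entry of $M^2 - (a+d)M + (ad - bc)I$ expands to the zero polynomial, which is exactly the statement that the four equations hold identically:

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
M = sp.Matrix([[a, b], [c, d]])

# ch(x) = x^2 - (a + d) x + (ad - bc), evaluated at x = M
ch_of_M = M**2 - (a + d) * M + (a*d - b*c) * sp.eye(2)

# Each of the four entries is identically zero as a polynomial in a, b, c, d.
assert ch_of_M.expand() == sp.zeros(2, 2)
```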

§ Cramer's rule

We can get Cramer's rule using some handwavy manipulation, or by rigorizing the manipulation using geometric algebra. Say we have a system of equations:
$$
\begin{aligned}
a[1] x + b[1] y &= c[1] \\
a[2] x + b[2] y &= c[2]
\end{aligned}
$$
We can write this as:
$$\vec a \, x + \vec b \, y = \vec c$$
where $\vec a \equiv (a_1, a_2)$ and so on. To solve the system, we wedge with $\vec a$ and $\vec b$:
$$
\begin{aligned}
\vec a \wedge (\vec a x + \vec b y) &= \vec a \wedge \vec c \\
\vec a \wedge \vec b \, y &= \vec a \wedge \vec c \\
y &= \frac{\vec a \wedge \vec c}{\vec a \wedge \vec b} \\
y &= \frac{\begin{vmatrix} a[1] & a[2] \\ c[1] & c[2] \end{vmatrix}}{\begin{vmatrix} a[1] & b[1] \\ a[2] & b[2] \end{vmatrix}}
\end{aligned}
$$
Which is exactly Cramer's rule. (Wedging with $\vec b$ instead gives the analogous formula for $x$.)
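A quick sketch of this computation (assuming `sympy` for exact arithmetic; the function and numbers below are just an illustration): in the plane, the wedge $\vec u \wedge \vec v$ is the 2x2 determinant, and each unknown is a ratio of two such determinants.

```python
import sympy as sp

def wedge(u, v):
    # u ∧ v in 2D: the 2x2 determinant | u[0] u[1] ; v[0] v[1] |
    return u[0] * v[1] - u[1] * v[0]

def cramer_2x2(a, b, c):
    # Solve x * a + y * b = c for the scalars (x, y), where a, b, c are 2-vectors.
    denom = wedge(a, b)
    x = wedge(c, b) / denom                  # wedging the equation with b kills the b-term
    y = wedge(a, c) / denom                  # wedging the equation with a kills the a-term
    return x, y

a, b, c = (sp.Integer(2), 1), (1, sp.Integer(3)), (5, 10)   # arbitrary example system
x, y = cramer_2x2(a, b, c)
assert (x * a[0] + y * b[0], x * a[1] + y * b[1]) == c
```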

§ The formula for the adjugate matrix from Cramer's rule (TODO)

§ References