I find the theorem spectacular: naively, the vector space $M_n(F)$ has dimension $n^2$, so one only expects a linear dependence among the powers $M^0, M^1, \dots, M^{n^2}$; Cayley-Hamilton tells us that the first $n+1$ powers $M^0, M^1, \dots, M^n$ already suffice to get a linear dependence. However, I've never known a
proof of Cayley-Hamilton that sticks with me; I think I've found one now.
For any matrix $A$, the adjugate has the property:
$$A \operatorname{adj}(A) = \det(A)\, I$$
Using this, consider the matrix $P_A \equiv xI - A$, which lives in $\operatorname{End}(V)[x]$,
and its corresponding determinant $p_A(x) \equiv \det(P_A) = \det(xI - A)$.
We have that
$$P_A \operatorname{adj}(P_A) = \det(P_A)\, I, \quad \text{i.e.} \quad (xI - A)\operatorname{adj}(xI - A) = p_A(x)\, I.$$
If we view this as an equation in $\operatorname{End}(V)[x]$, it says that $p_A(x)\, I$ has $xI - A$ as a factor. This means that
$x = A$ is a zero of $p_A(x)$. Thus, we know that $A$ satisfies $p_A(x)$, hence $A$ satisfies
its own characteristic polynomial!
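As a quick numerical sanity check (not part of the proof), here is a minimal numpy sketch that evaluates a matrix in its own characteristic polynomial and confirms the result is the zero matrix; the random matrix and its size are arbitrary choices of mine.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(n, n)).astype(float)

# np.poly(A) returns the coefficients of det(xI - A), highest degree first.
coeffs = np.poly(A)

# Evaluate p_A(A) = A^n + c_{n-1} A^{n-1} + ... + c_0 I via Horner's rule.
p_of_A = np.zeros((n, n))
for c in coeffs:
    p_of_A = p_of_A @ A + c * np.eye(n)

print(np.allclose(p_of_A, np.zeros((n, n))))  # True, up to floating point error
```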
The key is of course the adjugate equation, which relates the adjugate of $A$
to the determinant of $A$.
Let $A'(i,j)$ be the matrix $A$ with the $i$th row and $j$th column removed.
Let $C[i,j] \equiv (-1)^{i+j} \det(A'(i,j))$ be the $(i,j)$ cofactor of $A$.
Define $\operatorname{adj}(A) \equiv C^T$ to be the transpose of the cofactor matrix. That is, $\operatorname{adj}(A)[i,j] = C[j,i] = (-1)^{i+j} \det(A'(j,i))$.
Call $D \equiv A \operatorname{adj}(A)$. We will now compute the entries $D[i,j]$ when $i = j$ and when $i \neq j$. We call it $D$ for diagonal, since we will show that $D$ is a diagonal matrix with the entry $\det(A)$ on the diagonal.
First, compute $D[i,i]$ (I use Einstein summation convention, where repeated indices are implicitly summed over):
$$D[i,i] = A[i,k] \operatorname{adj}(A)[k,i] = A[i,k]\, (-1)^{i+k} \det(A'(i,k)).$$
The expression $A[i,k]\, (-1)^{i+k} \det(A'(i,k))$ (summed over $k$) is the determinant of $A$ expanded along row $i$ using
the Laplace expansion. So $D[i,i] = \det(A)$.
Next, let's compute $D[i,j]$ when $i \neq j$:
$$D[i,j] = A[i,k] \operatorname{adj}(A)[k,j] = A[i,k]\, (-1)^{j+k} \det(A'(j,k)).$$
This is the determinant of a new matrix $Z$ (for zero), such that the $j$th row of $Z$ is the $i$th
row of $A$. More explicitly:
$$Z[l,:] \equiv \begin{cases} A[l,:] & l \neq j \\ A[i,:] & l = j \end{cases}$$
Since $Z$ differs from $A$ only in the $j$th row, we must have $Z'(j,k) = A'(j,k)$ for every $k$, since $Z'(j,k)$ only depends on what happens in the
rows and columns outside of row $j$ and column $k$.
If we compute $\det(Z)$ by expanding along row $j$, we get:
$$\det(Z) = (-1)^{j+k} Z[j,k] \det(Z'(j,k)) = (-1)^{j+k} A[i,k] \det(A'(j,k)) = D[i,j].$$
But $Z$ has a repeated row: $Z[j,:] = Z[i,:] = A[i,:]$ with $i \neq j$. Hence $\det(Z) = 0$.
So $D[i,j] = 0$ when $i \neq j$.
Hence, this means that $A \operatorname{adj}(A) = \det(A)\, I$.
We can rapidly derive other properties from this equation. For example, $\det(A \operatorname{adj}(A)) = \det(\det(A)\, I) = \det(A)^n$, and hence $\det(A)\det(\operatorname{adj}(A)) = \det(A)^n$, or $\det(\operatorname{adj}(A)) = \det(A)^{n-1}$.
Also, by rearranging, if $\det(A) \neq 0$, we get $A \operatorname{adj}(A) = \det(A)\, I$, hence $A (\operatorname{adj}(A)/\det(A)) = I$, or $\operatorname{adj}(A)/\det(A) = A^{-1}$.
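Here's a small numpy sketch of the cofactor construction above (the helper names `minor` and `adjugate` and the example matrix are my own choices): it builds $\operatorname{adj}(A)$ entrywise as $\operatorname{adj}(A)[i,j] = (-1)^{i+j}\det(A'(j,i))$ and checks the three identities.

```python
import numpy as np

def minor(A, i, j):
    """A'(i, j): the matrix A with row i and column j removed."""
    return np.delete(np.delete(A, i, axis=0), j, axis=1)

def adjugate(A):
    n = A.shape[0]
    adj = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            # adj(A)[i, j] = C[j, i] = (-1)^(i+j) det(A'(j, i))
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor(A, j, i))
    return adj

A = np.array([[2., 1., 0.], [1., 3., 4.], [0., 5., 6.]])  # arbitrary invertible example
n = A.shape[0]
adjA, detA = adjugate(A), np.linalg.det(A)

print(np.allclose(A @ adjA, detA * np.eye(n)))           # A adj(A) = det(A) I
print(np.isclose(np.linalg.det(adjA), detA ** (n - 1)))  # det(adj(A)) = det(A)^(n-1)
print(np.allclose(adjA / detA, np.linalg.inv(A)))        # adj(A)/det(A) = A^(-1)
```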
For a vector space $V$ of dimension $n$, given a linear map $T: V \to V$, define
a map $\Lambda T: \Lambda^n V \to \Lambda^n V$ such that
$$\Lambda T(v_1 \wedge v_2 \wedge \dots \wedge v_n) \equiv Tv_1 \wedge Tv_2 \wedge \dots \wedge Tv_n.$$
Since the space $\Lambda^n V$ is one dimensional, we only need one scalar $k$ to define
$\Lambda T$: $\Lambda T(v_1 \wedge \dots \wedge v_n) = k\, v_1 \wedge \dots \wedge v_n$. It is either a theorem
or a definition (depending on how one starts this process) that $k = \det(T)$.
If we choose this as a definition, then let's try to compute the value. Pick a
basis $v[i]$. Let $w[i] \equiv Tv[i]$ (to write $T$ as a matrix). Define the matrix of $T$
by the equation $w[i] = T[i][j]\, v[j]$. If we now evaluate $\Lambda T$, we get:
$$\Lambda T(v[1] \wedge \dots \wedge v[n]) = w[1] \wedge \dots \wedge w[n] = (T[1][j_1]\, v[j_1]) \wedge (T[2][j_2]\, v[j_2]) \wedge \dots \wedge (T[n][j_n]\, v[j_n]),$$
with each $j_k$ implicitly summed over.
(1) Repeated vectors get cancelled (since $v \wedge v = 0$), so we must have distinct $v[j_1], v[j_2], \dots, v[j_n]$ in the terms we collect. So all the $j_k$ must be distinct in a given term.
A wedge of the form $T[1][j_1]\, v[j_1] \wedge T[2][j_2]\, v[j_2] \wedge \dots \wedge T[n][j_n]\, v[j_n]$, where all the $j_i$ are distinct (see (1)), can be rearranged by a permutation that sends $v[j_i] \mapsto v[i]$. Formally, apply the permutation $\tau(j_i) \equiv i$. This reorganizes the wedge into $T[1][j_1]\, T[2][j_2] \cdots T[n][j_n]\; \operatorname{sgn}(\tau)\; v[1] \wedge v[2] \wedge v[3] \wedge \dots \wedge v[n]$, where the sign term is picked up by the rearrangement.
Now, write the indexing in $T[i][j_i]$ in terms of a permutation $\sigma(i) \equiv j_i$. The term becomes $\prod_i T[i][\sigma(i)]\; \operatorname{sgn}(\tau)\; v[1] \wedge v[2] \wedge \dots \wedge v[n]$.
We have two permutations $\sigma, \tau$ in the formula. But we notice that $\sigma = \tau^{-1}$, and hence $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\tau)$, so we can write the above as $\prod_i T[i][\sigma(i)]\; \operatorname{sgn}(\sigma)\; v[1] \wedge v[2] \wedge \dots \wedge v[n]$.
Thus, we have recovered the "classical determinant formula".
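If it helps to see the formula computationally, here is a short Python sketch of the classical formula $\sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_i T[i][\sigma(i)]$, checked against `numpy.linalg.det`; the helper names and the example matrix are my own.

```python
from itertools import permutations
import math
import numpy as np

def sign(sigma):
    """Sign of a permutation, computed by counting inversions."""
    inv = sum(1 for i in range(len(sigma))
                for j in range(i + 1, len(sigma))
                if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_by_permutations(T):
    """det(T) = sum over sigma in S_n of sgn(sigma) * prod_i T[i][sigma(i)]."""
    n = len(T)
    return sum(sign(sigma) * math.prod(T[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

T = [[1., 2., 3.], [0., 4., 5.], [1., 0., 6.]]
print(det_by_permutations(T), np.linalg.det(np.array(T)))  # both ~22.0
```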
From the above algebraic encoding of the determinant of $T[i][j]$ as $\sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_i T[i][\sigma(i)]$,
we can recover the "Laplace expansion" rule, which asks us to pick a row $r$ and then compute the determinant as
$$L_r(T) \equiv \sum_c T[r,c]\, (-1)^{r+c} \det(T'(r,c)),$$
where $T'(r,c)$ is the matrix $T$ with row $r$ and column $c$ deleted.
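As a computational counterpart (again just a sketch, with an arbitrary example matrix), here is the Laplace expansion written as a recursion that always expands along the first row:

```python
import numpy as np

def det_laplace(T):
    """Recursive Laplace expansion along row 0: sum_c T[0,c] (-1)^c det(T'(0,c))."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    if n == 1:
        return T[0, 0]
    r = 0  # expand along the first row
    total = 0.0
    for c in range(n):
        # T'(r, c): T with row r and column c deleted
        sub = np.delete(np.delete(T, r, axis=0), c, axis=1)
        total += T[r, c] * (-1) ** (r + c) * det_laplace(sub)
    return total

T = np.array([[1., 2., 3.], [0., 4., 5.], [1., 0., 6.]])
print(det_laplace(T), np.linalg.det(T))  # both ~22.0
```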
I'll derive this concretely using the determinant definition for the 3 by 3 case. The general
case follows immediately. I prefer being explicit in a small case as it lets me see what's going on.
Let's pick a basis for $V$, called $b[1], b[2], b[3]$. We have the relationship
$v[i] \equiv Tb[i]$. We want to evaluate $v[1] \wedge v[2] \wedge v[3]$ as a multiple of $b[1] \wedge b[2] \wedge b[3]$.
First grab a basis expansion of $v[i]$ as $v[i] = c[i][j]\, b[j]$. This uniquely defines
the coefficients $c[i][j]$. Next, expand the wedge:
$$v[1] \wedge v[2] \wedge v[3] = \Big(\sum_{j_1} c[1][j_1]\, b[j_1]\Big) \wedge \Big(\sum_{j_2} c[2][j_2]\, b[j_2]\Big) \wedge \Big(\sum_{j_3} c[3][j_3]\, b[j_3]\Big).$$
I now expand out only the first wedge, leaving terms of the form $c[1][1]\, b[1] \wedge (\cdot) + c[1][2]\, b[2] \wedge (\cdot) + c[1][3]\, b[3] \wedge (\cdot)$.
(This corresponds to "expanding along and deleting the row" in a Laplace expansion when finding the determinant.)
Let's identify the $(\cdot)$ and see what remains: each $(\cdot)$ is the product of the second and third wedge factors, $\big(c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]\big) \wedge \big(c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]\big)$.
Now, for example, in the first term $c[1][1]\, b[1] \wedge (\cdot)$, we lose anything inside that contains a $b[1]$,
as the wedge will give us $b[1] \wedge b[1] = 0$ (this corresponds to "deleting the column" when considering the submatrix).
Similar considerations have us remove all terms that contain $b[2]$ in the brackets of $c[1][2]\, b[2]$, and terms that
contain $b[3]$ in the brackets of $c[1][3]\, b[3]$. This leaves us with:
$$c[1][1]\, b[1] \wedge \big(c[2][2]b[2] + c[2][3]b[3]\big) \wedge \big(c[3][2]b[2] + c[3][3]b[3]\big) \;+\; c[1][2]\, b[2] \wedge \big(c[2][1]b[1] + c[2][3]b[3]\big) \wedge \big(c[3][1]b[1] + c[3][3]b[3]\big) \;+\; c[1][3]\, b[3] \wedge \big(c[2][1]b[1] + c[2][2]b[2]\big) \wedge \big(c[3][1]b[1] + c[3][2]b[2]\big).$$
We are now left with calculating terms like $\big(c[2][2]b[2] + c[2][3]b[3]\big) \wedge \big(c[3][2]b[2] + c[3][3]b[3]\big)$, which
we can solve by recursion (this is the determinant of the 2×2 submatrix). So if we now write the "by recursion" terms
down, we will get something like:
$$c[1][1]\, b[1] \wedge k[1]\, \big(b[2] \wedge b[3]\big) \;+\; c[1][2]\, b[2] \wedge k[2]\, \big(b[1] \wedge b[3]\big) \;+\; c[1][3]\, b[3] \wedge k[3]\, \big(b[1] \wedge b[2]\big),$$
where the $k[i]$ are the values produced by the recursion, and we assume that the recursion gives
us the coefficients of the wedges "in order": so we always have $b[2] \wedge b[3]$, for example, not $b[3] \wedge b[2]$.
So, we need to ensure that the final answer we spit out corresponds to $b[1] \wedge b[2] \wedge b[3]$. If we simplify
the current step we are at, we will get:
$$c[1][1]\, k[1]\; b[1] \wedge b[2] \wedge b[3] \;+\; c[1][2]\, k[2]\; b[2] \wedge b[1] \wedge b[3] \;+\; c[1][3]\, k[3]\; b[3] \wedge b[1] \wedge b[2].$$
We need to rearrange our terms to get $b[1] \wedge b[2] \wedge b[3]$ times some constant.
On rearranging each term into the standard form $b[1] \wedge b[2] \wedge b[3]$, we are forced to pick up the correct sign factors:
$$c[1][1]\, k[1]\; b[1] \wedge b[2] \wedge b[3] \;-\; c[1][2]\, k[2]\; b[1] \wedge b[2] \wedge b[3] \;+\; c[1][3]\, k[3]\; b[1] \wedge b[2] \wedge b[3].$$
We clearly see that for each $c[1][i]$, the factor is $(-1)^{1+i}\, k[i]$ (matching the Laplace sign $(-1)^{r+c}$ with $r = 1$), where $k[i]$ is the answer
obtained by computing the determinant of the sub-expression where we delete the vector $b[i]$ (ignore the column)
and also ignore the entire "row", since none of the coefficients $c[1][j]$ appear inside $k[i]$.
So, this proves the Laplace expansion by exterior algebra.
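For completeness, here is a sympy cross-check of this 3×3 case (a sketch with symbolic entries whose names are my own): expanding along the first row with the sign pattern $+, -, +$ reproduces the full determinant.

```python
import sympy as sp

# Symbolic 3x3 matrix with entries c[i][j] (named c11, c12, ..., c33 here).
C = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f"c{i + 1}{j + 1}"))

# Laplace expansion along the first row, with the sign pattern +, -, + derived above.
laplace = sum((-1) ** col * C[0, col] * C.minor_submatrix(0, col).det()
              for col in range(3))

print(sp.simplify(laplace - C.det()) == 0)  # True
```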
I'll show the idea of how to prove Cayley-Hamilton for an arbitrary commutative ring $R$, given that we know Cayley-Hamilton over $\mathbb{Z}$. I describe it for 2×2 matrices.
The general version is immediate from this. Pick a matrix of variables
and write down the expression for the characteristic polynomial. So if:
M = [a b]
[c d]
then the characteristic polynomial is:
ch = |M - xI| = |a-x   b |
                | c   d-x|
that's $\mathrm{ch}(a,b,c,d,x) \equiv (a-x)(d-x) - bc = x^2 - x(a+d) + ad - bc$. This equation has $a, b, c, d, x \in R$ for some commutative ring $R$. Now, we know that if we set $x = M$, this equation will be satisfied. But what does
it mean to set $x = M$? Well, we need to let $x$ be an arbitrary matrix:
$$\mathrm{ch}\left(a,b,c,d,\begin{pmatrix} p & q \\ r & s \end{pmatrix}\right) = \begin{pmatrix} p & q \\ r & s \end{pmatrix}^2 - (a+d)\begin{pmatrix} p & q \\ r & s \end{pmatrix} + (ad - bc)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
which is a 2×2 matrix whose four entries are polynomials in $a, b, c, d, p, q, r, s$: four equations.
We know that these equations hold when $x = M$, because the Cayley-Hamilton theorem
tells us that $\mathrm{ch}(M) = 0$! So we get a different system with $p = a$, $q = b$, $r = c$, $s = d$,
still with four equations, that we know is equal to zero! This means we have four
indeterminates $a, b, c, d$ and four equations, and we know that these equations are true
for all integer values of $a, b, c, d$. But a polynomial over $\mathbb{Z}$ that vanishes at every integer point must
be identically zero. Thus, this means that $\mathrm{ch}(M)$ is the zero polynomial, or $\mathrm{ch}(M) = 0$ for all $R$. This seems to depend on the fact that the ring is infinite, because otherwise,
imagine we send $\mathbb{Z}$ to $\mathbb{Z}/10\mathbb{Z}$. Since we don't have an infinite number
of elements, why should the polynomial be zero? I imagine that this
needs Zariski-like arguments to be handled.
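A quick symbolic check of the 2×2 identity itself (a sympy sketch; the symbol names are my own): with generic entries $a, b, c, d$, the matrix $\mathrm{ch}(M) = M^2 - (a+d)M + (ad - bc)I$ is identically zero, which is exactly the polynomial statement the argument above reduces to.

```python
import sympy as sp

a, b, c, d = sp.symbols("a b c d")
M = sp.Matrix([[a, b], [c, d]])

# ch(M) = M^2 - (a + d) M + (ad - bc) I, a matrix of polynomials in a, b, c, d.
ch_of_M = M * M - (a + d) * M + (a * d - b * c) * sp.eye(2)

print(ch_of_M.expand() == sp.zeros(2, 2))  # True: all four entries vanish identically
```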