§ Cayley Hamilton


I find the theorem spectacular: while naively the vector space $M_n(F)$ has dimension $n^2$, Cayley-Hamilton tells us that the powers $M^0, M^1, \dots, M^n$ alone are already enough to get a linear dependence. However, I've never known a proof of Cayley-Hamilton that sticks with me; I think I've found one now.
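As a quick sanity check of this dimension count, here is a sketch (assuming sympy is available; the random test matrix is arbitrary sample data) that stacks the flattened powers $M^0, \dots, M^n$ and checks their rank:

```python
from sympy import Matrix, randMatrix

n = 3
M = randMatrix(n, n, min=-5, max=5)   # random integer test matrix (sample data)

# Flatten each power M^0, ..., M^n into a row and stack the rows.
powers = Matrix([list(M**k) for k in range(n + 1)])   # (n+1) x n^2 matrix

# The rank comes out at most n < n+1, so these powers are linearly dependent,
# far earlier than the n^2 + 1 powers that dimension counting guarantees.
print(powers.rank())
```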
For any matrix $A$, the adjugate has the property:
$$A \,\mathrm{adj}(A) = \det(A)\, I$$

Using this, consider the matrix $P_A \equiv xI - A$, which lives in $\mathrm{End}(V)[x]$, and its corresponding determinant $p_A(x) \equiv \det(P_A) = \det(xI - A)$.
We have that
$$\begin{aligned} P_A \,\mathrm{adj}(P_A) &= \det(P_A)\, I \\ (xI - A)\,\mathrm{adj}(xI - A) &= \det(xI - A)\, I = p_A(x)\, I \end{aligned}$$

If we view this as an equation in $\mathrm{End}(V)[x]$, it says that $p_A(x)\, I$ has $xI - A$ as a factor. This means that $x = A$ is a zero of $p_A(x)$, hence $A$ satisfies its own characteristic polynomial! (Substituting the matrix $A$ for $x$ needs a little care, since evaluation at a matrix is not a ring homomorphism on $\mathrm{End}(V)[x]$; one honest way is to write $\mathrm{adj}(xI - A) = \sum_k B_k x^k$, compare coefficients of $x^k$ on both sides, multiply the coefficient identity for $x^k$ by $A^k$ on the left, and sum over $k$: everything telescopes to $p_A(A) = 0$.)
The key is of course the adjugate matrix equation that relates the adjugate matrix to the determinant of $A$.
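Here is a concrete check of the theorem itself, a sketch assuming sympy (the matrix $A$ below is arbitrary sample data):

```python
from sympy import Matrix, symbols, eye, zeros

x = symbols('x')
A = Matrix([[1, 2, 0],
            [3, -1, 4],
            [0, 5, 2]])
n = A.rows

# p_A(x) = det(xI - A), the characteristic polynomial.
p = (x * eye(n) - A).det().as_poly(x)

# Evaluate p_A at the matrix A by Horner's rule (scalar terms times I).
pA = zeros(n, n)
for coeff in p.all_coeffs():        # coefficients, highest degree first
    pA = pA * A + coeff * eye(n)

print(pA)   # the zero matrix, as Cayley-Hamilton predicts
```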

§ Adjugate matrix equation



Write $D \equiv A\,\mathrm{adj}(A)$. Here $A'[k, i]$ denotes $A$ with column $k$ and row $i$ deleted, $\mathrm{adj}(A)[k, i] \equiv (-1)^{i+k} \det(A'[k, i])$, and a repeated index like $k$ is implicitly summed over. First, the diagonal entries:
$$\begin{aligned} D[i, i] &= (A\,\mathrm{adj}(A))[i, i] \\ &= A[i, k]\,\mathrm{adj}(A)[k, i] \\ &= A[i, k]\, (-1)^{i+k} \det(A'[k, i]) \\ &= \det(A) \end{aligned}$$

The expression $A[i, k]\,(-1)^{i+k}\det(A'[k, i])$, summed over $k$, is exactly the determinant of $A$ expanded along row $i$ using the Laplace expansion.
Next, let's compute $D[i, j]$ when $i \neq j$:
$$\begin{aligned} D[i, j] &= (A\,\mathrm{adj}(A))[i, j] \\ &= A[i, k]\,\mathrm{adj}(A)[k, j] \\ &= A[i, k]\,(-1)^{j+k} \det(A'[k, j]) \\ &= \det(Z) \end{aligned}$$

This is the determinant of a new matrix $Z$ (for zero), whose $j$th row is the $i$th row of $A$ and whose other rows agree with $A$. More explicitly:
$$Z[l, :] \equiv \begin{cases} A[l, :] & l \neq j \\ A[i, :] & l = j \end{cases}$$

Since $Z$ differs from $A$ only in the $j$th row, we must have $Z'[k, j] = A'[k, j]$: the minor $Z'[k, j]$ is built by deleting row $j$ (and column $k$), so it only sees rows where $Z$ and $A$ agree.
If we compute $\det(Z)$ by expanding along row $j$, we get:
$$\begin{aligned} \det(Z) &= (-1)^{j+k} Z[j, k] \det(Z'[k, j]) \\ &= (-1)^{j+k} A[i, k] \det(Z'[k, j]) \\ &= (-1)^{j+k} A[i, k] \det(A'[k, j]) \\ &= D[i, j] \end{aligned}$$

But $Z$ has a repeated row: $Z[i, :] = Z[j, :] = A[i, :]$ with $i \neq j$. Hence $\det(Z) = 0$, so $D[i, j] = 0$ when $i \neq j$.
Hence $A\,\mathrm{adj}(A) = \det(A)\, I$.
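A quick check of this identity, a sketch assuming sympy (whose `adjugate` method computes the same transpose-of-cofactors matrix used above; the matrix is sample data):

```python
from sympy import Matrix, eye

A = Matrix([[2, 1, 3],
            [0, -1, 4],
            [5, 2, 2]])

# adjugate() is the transpose of the cofactor matrix, so this checks A adj(A) = det(A) I.
print(A * A.adjugate() == A.det() * eye(3))   # True
```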

§ Determinant in terms of exterior algebra


For a vector space $V$ of dimension $n$ and a linear map $T: V \rightarrow V$, define a map $\Lambda T: \Lambda^n V \rightarrow \Lambda^n V$ by $\Lambda T(v_1 \wedge v_2 \wedge \dots \wedge v_n) \equiv T v_1 \wedge T v_2 \wedge \dots \wedge T v_n$. Since the space $\Lambda^n V$ is one dimensional, a single scalar $k$ suffices to describe $\Lambda T$: $\Lambda T(v_1 \wedge \dots \wedge v_n) = k\, v_1 \wedge \dots \wedge v_n$. It is either a theorem or a definition (depending on how one starts this process) that $k = \det(T)$.
If we choose this as a definition, let's try to compute the value. Pick a basis $v[i]$, and let $w[i] \equiv T v[i]$. Writing $T$ as a matrix in this basis means $w[i] = T[i][j]\, v[j]$ (summation over the repeated index $j$ implied). If we now evaluate $\Lambda T$, we get:
$$\begin{aligned} \Lambda T(v_1 \wedge \dots \wedge v_n) &= T(v_1) \wedge T(v_2) \wedge \dots \wedge T(v_n) \\ &= w_1 \wedge w_2 \wedge \dots \wedge w_n \\ &= (T[1][j_1]\, v[j_1]) \wedge (T[2][j_2]\, v[j_2]) \wedge \dots \wedge (T[n][j_n]\, v[j_n]) \\ &= \left(\sum_{\sigma \in S_n} sgn(\sigma) \prod_i T[i][\sigma(i)]\right) v[1] \wedge v[2] \wedge \dots \wedge v[n] \end{aligned}$$

The last equality holds because the wedge is alternating: terms where two of the $j_i$ coincide vanish, and for $j_i = \sigma(i)$ with $\sigma$ a permutation, reordering $v[\sigma(1)] \wedge \dots \wedge v[\sigma(n)]$ into $v[1] \wedge \dots \wedge v[n]$ picks up exactly the sign $sgn(\sigma)$.
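This permutation-sum (Leibniz) form of the determinant is easy to check directly; here is a sketch in plain Python, with sympy used only for comparison (the test matrix is sample data):

```python
from itertools import permutations
from sympy import Matrix

def sgn(perm):
    # Sign of a permutation via its inversion count.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(T):
    # det(T) = sum over sigma in S_n of sgn(sigma) * prod_i T[i][sigma(i)]
    n = T.rows
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for i in range(n):
            term *= T[i, sigma[i]]
        total += term
    return total

T = Matrix([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 10]])
print(leibniz_det(T), T.det())   # both print -3
```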

§ Laplace expansion of determinant


From the above algebraic encoding of the determinant of $T[i][j]$ as $\sum_{\sigma \in S_n} sgn(\sigma) \prod_i T[i][\sigma(i)]$, we can recover the "Laplace expansion" rule, which asks us to pick a row $r$ and then compute:
$$L_r(T) \equiv \sum_c T[r, c]\, (-1)^{r+c} \det(T'(r, c))$$

where $T'(r, c)$ is the matrix $T$ with row $r$ and column $c$ deleted. I'll derive this concretely using the determinant definition for the 3 by 3 case; the general case follows immediately. I prefer being explicit in a small case, as it lets me see what's going on.
Let's pick a basis for $V$, called $b[1], b[2], b[3]$. We have the relationship $v[i] \equiv T b[i]$. We want to express $v[1] \wedge v[2] \wedge v[3]$ as a multiple of $b[1] \wedge b[2] \wedge b[3]$. First grab a basis expansion of $v[i]$ as $v[i] = c[i][j]\, b[j]$; this uniquely defines the coefficients $c[i][j]$. Next, expand the wedge:
$$\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= (c[1][1]b[1] + c[1][2]b[2] + c[1][3]b[3]) \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \end{aligned}$$

I now expand out only the first factor, leaving terms of the form $c[1][1]b[1] \wedge (\cdot) + c[1][2]b[2] \wedge (\cdot) + c[1][3]b[3] \wedge (\cdot)$. (This corresponds to "expanding along and deleting the row" in a Laplace expansion when finding the determinant.) Let's identify the $(\cdot)$ and see what remains:
$$\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= c[1][1]b[1] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \\ &+ c[1][2]b[2] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \\ &+ c[1][3]b[3] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \end{aligned}$$

Now, for example, in the first term $c[1][1]b[1] \wedge (\cdot)$, we lose anything inside that contains a $b[1]$, as the wedge will give us $b[1] \wedge b[1] = 0$ (this corresponds to "deleting the column" when considering the submatrix). Similar considerations have us remove all terms that contain $b[2]$ in the brackets of $c[1][2]b[2]$, and terms that contain $b[3]$ in the brackets of $c[1][3]b[3]$. This leaves us with:
$$\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= c[1][1]b[1] \wedge (c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][2]b[2] + c[3][3]b[3]) \\ &+ c[1][2]b[2] \wedge (c[2][1]b[1] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][3]b[3]) \\ &+ c[1][3]b[3] \wedge (c[2][1]b[1] + c[2][2]b[2]) \wedge (c[3][1]b[1] + c[3][2]b[2]) \end{aligned}$$

We are now left with calculating terms like $(c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][2]b[2] + c[3][3]b[3])$, which we can handle by recursion (it is the determinant of the 2x2 submatrix). So if we now write the "by recursion" terms down, we will get something like:
$$\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= c[1][1]b[1] \wedge (k[1]\, b[2] \wedge b[3]) + c[1][2]b[2] \wedge (k[2]\, b[1] \wedge b[3]) + c[1][3]b[3] \wedge (k[3]\, b[1] \wedge b[2]) \end{aligned}$$

Where the $k[i]$ are the values produced by the recursion, and we assume that the recursion gives us the coefficients of the wedges "in order": so we always have $b[2] \wedge b[3]$, for example, not $b[3] \wedge b[2]$. So, we need to ensure that the final answer we spit out corresponds to $b[1] \wedge b[2] \wedge b[3]$. If we simplify the current step we are at, we will get:
$$\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= k[1]\, c[1][1]\, b[1] \wedge b[2] \wedge b[3] + k[2]\, c[1][2]\, b[2] \wedge b[1] \wedge b[3] + k[3]\, c[1][3]\, b[3] \wedge b[1] \wedge b[2] \end{aligned}$$

We need to rearrange our terms to get $b[1] \wedge b[2] \wedge b[3]$ times some constant. On rearranging each term into the standard form $b[1] \wedge b[2] \wedge b[3]$, we are forced to pick up the correct sign factors:
$$\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= k[1]\, c[1][1]\, b[1] \wedge b[2] \wedge b[3] - k[2]\, c[1][2]\, b[1] \wedge b[2] \wedge b[3] + k[3]\, c[1][3]\, b[1] \wedge b[2] \wedge b[3] \\ &= (c[1][1]k[1] - c[1][2]k[2] + c[1][3]k[3])\, (b[1] \wedge b[2] \wedge b[3]) \end{aligned}$$

We clearly see that for each $c[1][i]$, the factor is $(-1)^{1+i} k[i]$, where $k[i]$ is the determinant of the submatrix obtained by deleting the vector $b[i]$ (ignoring the column) and ignoring the entire first row, whose coefficients $c[1][\cdot]$ we have already pulled out. So, this proves the Laplace expansion by exterior algebra.
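The same expansion is easy to write as a recursive function; here is a sketch in plain Python that expands along the first row (the test matrix is sample data):

```python
def laplace_det(T):
    # Determinant of a list-of-lists matrix by Laplace expansion along row 0.
    n = len(T)
    if n == 1:
        return T[0][0]
    total = 0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in T[1:]]    # delete row 0 and column c
        total += (-1) ** c * T[0][c] * laplace_det(minor)   # sign (-1)^(0 + c)
    return total

print(laplace_det([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 10]]))   # -3
```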

§ Deriving Cayley Hamilton for rings from $\mathbb{Z}$


I'll show the idea of how to prove Cayley-Hamilton for an arbitrary commutative ring $R$, given that we know Cayley-Hamilton for $\mathbb{Z}$. I describe it for 2x2 matrices; the general version is immediate from this. Pick a matrix of indeterminates and write down the expression for the characteristic polynomial. So if:
M = [a b]
    [c d]

then the characteristic polynomial is:
ch = |M - xI|
   = |a-x   b  |
     |c     d-x|

that's $ch(a, b, c, d, x) \equiv (a-x)(d-x) - bc = x^2 - x(a + d) + ad - bc$. Here $a, b, c, d, x \in R$ for some commutative ring $R$. Now, we know that if we set $x = M$, this equation will be satisfied. But what does it mean to set $x = M$? Well, we need to let $x$ be an arbitrary matrix:
x = [p q]
    [r s]

And thus we compute x^2 to be:
x^2
= [p q][p q]
  [r s][r s]
= [p^2 + qr; pq + qs]
  [rp + sr; rq + s^2]

So now, expanding out $ch(a, b, c, d, x)$ after substituting the matrix for $x$, we get the system:
[p^2 + qr; pq + qs] - (a + d) [p q] + (ad - bc) [1 0] = [0 0]
[rp + sr; rq + s^2]           [r s]             [0 1]   [0 0]

We know that these equations hold when $x = M$, that is, when we set $p = a, q = b, r = c, s = d$, because the Cayley-Hamilton theorem over $\mathbb{Z}$ tells us that $ch(M) = 0$! So we get four polynomial equations in the four indeterminates $a, b, c, d$ that hold for every choice of integers. But a polynomial over $\mathbb{Z}$ that vanishes at infinitely many points (here, on all of $\mathbb{Z}^4$) must be identically zero. Thus each entry of $ch(M)$ is the zero polynomial in $\mathbb{Z}[a, b, c, d]$. At first this seems to depend on the ring being infinite: if we pass to, say, $\mathbb{Z}/10\mathbb{Z}$, why should the polynomial vanish? The point is that we never need the polynomial to vanish on lots of points of $R$: it is already the zero polynomial, and any choice of entries from $R$ factors through the unique ring map $\mathbb{Z}[a, b, c, d] \rightarrow R$ sending the indeterminates to those entries, so $ch(M) = 0$ holds in every commutative ring $R$.
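The 2x2 computation can be done symbolically; here is a sketch assuming sympy, showing that every entry of $ch(M)$ is literally the zero polynomial in the indeterminates $a, b, c, d$:

```python
from sympy import symbols, Matrix, eye, expand

a, b, c, d = symbols('a b c d')
M = Matrix([[a, b],
            [c, d]])

# ch(x) = x^2 - (a + d) x + (ad - bc), evaluated at x = M.
chM = M**2 - (a + d) * M + (a*d - b*c) * eye(2)
print(chM.applyfunc(expand))   # Matrix([[0, 0], [0, 0]])
```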

§ Cramer's rule


We can get Cramer's rule using some handwavy manipulation, or by rigorizing the manipulation using geometric algebra.
Say we have a system of equations:
$$\begin{aligned} a[1] x + b[1] y &= c[1] \\ a[2] x + b[2] y &= c[2] \end{aligned}$$

We can write this as:
$$\vec{a}\, x + \vec{b}\, y = \vec{c}$$

where $\vec{a} \equiv (a[1], a[2])$ and so on. To solve the system, we wedge with $\vec{a}$ (which kills the $x$ term) or with $\vec{b}$ (which kills the $y$ term). Wedging with $\vec{a}$:
$$\begin{aligned} \vec{a} \wedge (\vec{a}\, x + \vec{b}\, y) &= \vec{a} \wedge \vec{c} \\ (\vec{a} \wedge \vec{b})\, y &= \vec{a} \wedge \vec{c} \\ y &= \frac{\vec{a} \wedge \vec{c}}{\vec{a} \wedge \vec{b}} = \frac{\begin{vmatrix} a[1] & c[1] \\ a[2] & c[2] \end{vmatrix}}{\begin{vmatrix} a[1] & b[1] \\ a[2] & b[2] \end{vmatrix}} \end{aligned}$$

Which is exactly Cramer's rule.
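Here is a quick numerical check of the wedge formula, a sketch assuming sympy (the vectors below are arbitrary sample data):

```python
from sympy import Matrix

a = Matrix([2, 1])   # coefficients of x
b = Matrix([1, 3])   # coefficients of y
c = Matrix([4, 7])   # right hand side

det_ab = Matrix.hstack(a, b).det()            # a wedge b
x = Matrix.hstack(c, b).det() / det_ab        # wedging with b kills the y term
y = Matrix.hstack(a, c).det() / det_ab        # wedging with a kills the x term

print(x, y)                    # 1 2
print(a * x + b * y == c)      # True
```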

§ The formula for the adjugate matrix from Cramer's rule (TODO)


§ References