## § Cayley Hamilton

I find the theorem spectacular, because while naively the vector space $M_n(F)$ has dimension $n^2$, Cayley-Hamilton tells us that there's only $n$ of $M^0, M^1, \dots$ are enough to get linear dependence. However, I've never known a proof of Cayley Hamilton that sticks with me; I think I've found one now. For any matrix $A$, the adjugate has the property:
$A adj(A) = det(A) I$
Using this, Consider the matrix $P_A \equiv xI - A$ which lives in $End(V)[x]$, and its corresponding determinant $p_A(x) \equiv det(P_A) = det(xI - A)$. We have that
$P_A adj(P_A) = det(P_A) I \\ (xI - A) adj(xI - A) = det(xI - A) I = p_A(x) I \\$
If we view this as an equation in $End(V)[x]$, it says that $p_A$ has a factor $xI - A$. This means that $x = A$ is a zero of $p_A(X)$. Thus, we know that $A$ satisfies $p_A(x)$, hence $A$ satisfies its own characteristic polynomial! The key is of course the adjugate matrix equation that relates the adjugate matrix to the determinant of $A$.

• Let $A'(i, j)$ be the matrix $A$ with the $i$ th row and $j$ column removed.
• Let $C[i, j] \equiv (-1)^{i+j} det(A'(i, j))$ be the determinant of the $A'(i, j)$ matrix.
• Let define $adj(A) \equiv C^T$ to be the transpose of the cofactor matrix. That is, $adj(A)[i, j] = C[j, i] = det(A'(j, i))$.
• Call $D \equiv A adj(A)$. We will now compute the entries of $Z$ when $i = j$ and when $i \neq j$. We call it $D$ for diagonal since we will show that $D$ is a diagonal matrix with entry $det(A)$on the diagonal.
• First, compute $D[i, i]$ (I use einstein summation convention where repeated indices are implicitly summed over):
\begin{aligned} = D[i, i] = (A adj(A))[i, i] \\ &= A[i, k] adj(A) [k, i] \\ &= A[i, k] (-1)^{i+k} det(A'[k, i]) \\ &= det(A) \end{aligned}
The expression $A[i, k] det(A'[k, i])$ is the determinant of $A$ when expanded along the row $i$ using the Laplace expansion . Next, let's compute $D[i, j]$ when $i \neq j$:
\begin{aligned} D[i, j] = (A adj(A))[i, j] \\ &= A[i, k] adj (A)[k, j] \\ & = A[i, k] (-1)^{k+j} det(A'[k, j]) \\ & = A[i, k] (-1)^{j+k} det(A'[k, j]) \\ & = det(Z) \end{aligned}
This is the determinant of a new matrix $Z$ (for zero), such that the $j$th row of $Z$ is the $i$th row of $A$. More explicitly:
\begin{aligned} Z[l, :] \equiv \begin{cases} A[l, :] & l \neq j \\ A[i, :] & l = j \\ \end{cases} \end{aligned}
Since $Z$ differs from $A$ only in the $j$th row, we must have that $Z'[k, j] = A'[k, j]$, since $Z'[k, j]$ depends on what happens on all rows and columns outside of $j$. If we compute $det(Z)$ by expanding along the $j$ row, we get:
\begin{aligned} &det(Z) = (-1)^{j+k} Z[j, k] det(Z'[k, j]) \\ &det(Z) = (-1)^{j+k} A[j, k] det(Z'[k, j]) \\ &det(Z) = (-1)^{j+k} A[j, k] det(A'[k, j]) \\ &= D[i, j] \end{aligned}
But $Z$ has a repeated row: $A[j, :] = A[i, :]$ and $i \neq j$. Hence, $det(Z) = 0$. So, $D[i, j] = 0$ when $i \neq j$. Hence, this means that $A adj(A) = det(A) I$.
• We can rapidly derive other properties from this reuation. For example, $det(A adj(A)) = det(det(A) I) = det(A)^n$, and hence $det(A) det(adj(A)) = det(A)^n$, or $det(adj(A)) = det(A)^{n-1}$.
• Also, by rearranging, if $det(A) \neq 0$, we get $A adj(A) = det(A) I$, hence $A (adj(A)/det(A)) = I$, or $adj(A)/det(A) = A^{-1}$.

#### § Determinant in terms of exterior algebra

For a vector space $V$ of dimension $n$, Given a linear map $T: V \rightarrow V$, define a map $\Lambda T: \Lambda^n V \rightarrow \Lambda^n V$ such that $\Lambda T(v_1 \wedge v_2 \dots \wedge v_n) \equiv T v_1 \wedge T v_2 \dots \wedge T v_n$. Since the space $\Lambda^n V$ is one dimension, we will need one scalar $k$ to define $T$: $T(v_1 \wedge \dots \wedge v_n) = k v_1 \wedge \dots \wedge v_n$. It is either a theorem or a definition (depending on how one starts this process) that $k = det(T)$. If we choose this as a definition, then let's try to compute the value. Pick orthonormal basis $v[i]$. Let $w[i] \equiv T v[i]$ (to write $T$ as a matrix). Define the $T$ matrix to be defined by the equation $w[i] = T[i][j] v[j]$. If we now evaluate $\Lambda_T$, we get:
\begin{aligned} \Lambda T(v_1 \wedge \dots v_n) \\ &= T(v_1) \wedge T(v_2) \dots \wedge T(v_n) \\ &= w_1 \wedge w_2 \dots w_n \\ &= (T[1][j_1] v[j_1]) \wedge (T[1][j_2] v[j_2]) \wedge (T[1][j_n] v[j_n]) \\ &= (\sum_{\sigma \in S_n} (\prod_i T[i][\sigma(i)] sgn(\sigma)) v[1] \wedge v[2] \dots \wedge v_n \end{aligned}
Where the last equality is because:
• (1) Repeated vectors get cancelled, so we must have unique $v[1], v[2], \dots v[n]$ in the terms we collect. So all the $j_k$ must be distinct in a given term.
• A wedge of the form $T[1][j_1] v[j_1] \wedge T[2][j_2] v[j_2] \dots T[n][j_n] v[j_n]$, where all the $j_i$ are distinct (see (1)) can be rearranged by a permutation that sends $v_{j_i} \mapsto v_i$. Formally, apply the permutation $\tau(j_i) \equiv i$. This will reorganize the wedge into $T[1][j_1] T[2][j_2] \dots T[n][j_n] v[1] \wedge v[2] \wedge v[3] \dots v[n] (-1)^{sgn(\tau)}$, where the sign term is picked up by the rearrangement.
• Now, write the indexing into $T[i][j_i]$ in terms of a permutation $\sigma(i) \equiv j_i$. This becomes $\prod_i T[i][\sigma(i)] (-1)^{sgn(\tau)} v[1] \wedge v[2] \dots \wedge v[n]$.
• We have two permutations $\sigma, \tau$ in the formula. But we notice that $\sigma = \tau^{-1}$, and hence $sgn(\sigma) = sgn(\tau)$, so we can write the above as $\prod_i T[i][\sigma(i)] (-1)^{sgn(\sigma)} v[1] \wedge v[2] \dots \wedge v[n]$.
• Thus, we have recovered the "classical determinant formula".

#### § Laplace expansion of determinant

From the above algebraic encoding of the determinant of $T[i][j]$ as $\sum_{\sigma \in S_n}sgn(\sigma)\prod_i T[i][\sigma(i)]$, we can recover the "laplace expansion" rule, that asks to pick a row $r$, and then compute the expression: as
$L_r(T) \equiv \sum_c T[r, c] (-1)^{r+c} det(T'(r, c))$
Where $T'(r, c)$ is the matrix $T$ with row $r$ and column $c$ deleted. I'll derive this concretely using the determinant definition for the 3 by 3 case. The general case follows immediately. I prefer being explicit in a small case as it lets me see what's going on. Let's pick a basis for $V$, called $b[1], b[2], b[3]$. We have the relationship $v[i] \equiv Tb[i]$. We want to evaluate the coefficient of $v[1] \wedge v[2] \wedge v[3]$. First grab a basis expansion of $v[i]$ as $v[i] = c[i][j] b[j]$. These uniquely define the coefficients $c[i][j]$. Next, expand the wedge:
\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= (c[1][1]b[1] + c[1][2]b[2] + c[1][3]b[3]) \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \end{aligned}
I now expand out only the first wedge, leaving terms of the form $c[1][1]b[1] (\cdot) + c[1][2]b[2](\cdot) c[1][3]b[3] (\cdot)$. (This corresponds to "expanding along and deleting the row" in a laplace expansion when finding the determinant) Let's identify the $(\cdot)$ and see what remains:
\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= c[1][1]b[1] \wedge (c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) + c[1][2]b[2] \wedge(c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge(c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) + c[1][3]b[2] \wedge(c[2][1]b[1] + c[2][2]b[2] + c[2][3]b[3]) \wedge(c[3][1]b[1] + c[3][2]b[2] + c[3][3]b[3]) \end{aligned}
Now, for example, in the first term $c[1][1]b[1] \wedge (\cdot)$, we lose anything inside that contains a $b[1]$, as the wedge will give us $b[1] \wedge b[1] = 0$ (this corresponds to "deleting the column" when considering the submatrix). Similar considerations have us remove all terms that contain $b[2]$ in the brackets of $c[1][2]b[2]$, and terms that contain $b[3]$ in the brackets of $c[1][3]b[3]$. This leaves us with:
\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= c[1][1]b[1] \wedge (c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][2]b[2] + c[3][3]b[3]) + c[1][2]b[2] \wedge(c[2][1]b[1] + c[2][3]b[3]) \wedge(c[3][1]b[1] + c[3][3]b[3]) + c[1][3]b[2] \wedge(c[2][1]b[1] + c[2][2]b[2] ) \wedge(c[3][1]b[1] + c[3][2]b[2]) \end{aligned}
We are now left with calculating terms like $(c[2][2]b[2] + c[2][3]b[3]) \wedge (c[3][2]b[2] + c[3][3]b[3])$ which we can solve by recursion (that is the determinant of the 2x2 submatrix). So if we now write the "by recursion" terms down, we will get something like:
\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= c[1][1]b[1] \wedge (k[1] b[2] \wedge b[3]) + c[1][2]b[2] \wedge(k[2] b[1] \wedge b[3]) + c[1][3]b[2] \wedge(k[3] b[1] \wedge b[2]) \end{aligned}
Where the $k[i]$ are the values produced by the recursion, and we assume that the recursion will give us the coefficients of the wedges "in order": so we always have $b[2] \wedge b[3]$ for example, not $b[3] \wedge b[2]$. So, we need to ensure that the final answer we spit out corresponds to $b[1] \wedge b[2] \wedge b[3]$. If we simplify the current step we are at, we will get:
\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= k[1] c[1][1]b[1] \wedge b[2] \wedge b[3] + k[2] c[1][2]b[2] \wedge b[1] \wedge b[3] + k[3] c[1][3]b[2] \wedge b[1] \wedge b[2] \end{aligned}
We need to rearrange our terms to get $b[1] \wedge b[2] \wedge b[3]$ times some constant. On rearranging each term into the standard form $b[1] \wedge b[2] \wedge b[3]$, we are forced to pick up the correct sign factors:
\begin{aligned} & v[1] \wedge v[2] \wedge v[3] \\ &= k[1] c[1][1]b[1] \wedge b[2] \wedge b[3] -k[2] c[1][2]b[1] \wedge b[2] \wedge b[3] + k[3] c[1][3]b[1] \wedge b[2] \wedge b[3] \\ &= (c[1][1]k[1] - c[1][2]k[2] + c[1][3]k[3])(b[1] \wedge b[2] \wedge b[3]) \end{aligned}
We clearly see that for each $c[i]$, the factor is $(-1)^i k[i]$ where $k[i]$ is the answer gotten by computing the determinant of the sub-expression where we delete the vector $b[i]$ (ignore the column) and also ignore the entire "row", by not thinking about $c[j](\cdot)$ where $j \neq i$. So, this proves the laplace expansion by exterior algebra.

#### § Deriving Cayley Hamilton for rings from $\mathbb Z$

I'll show the idea of how to prove Cayley Hamilton for an arbitrary commutative ring $R$ given we know Cayley Hamilton for $\mathbb Z$. I describe it for 2x2 matrices. The general version is immediate from this. Pick a variable matrix and write down the expression for the characteristic polynomial So if:
M = [a b]
[c d]

then the characteristic polynomial is:
ch
= |M - xI|
=
|a-x b|
|c   d-x|

that's $ch(a, b, c, d, x) \equiv (a-x)(d-x) - bc = x^2 +x (a + d) + ad - bc$. This equation has $a, b, c, d, x \in R$ for some commutative ring $R$. Now, we know that if we set $x = M$, this equation will be satisfied. But what does it mean to set $x = A$? Well, we need to let $x$ be an arbitrary matrix:
x = [p q]
[r s]

And thus we compute x^2 to be:
x^2
= [p q][p q]
[r s][r s]
= [p^2 + qr; pq + qs]
[rp + sr; rq + s^2]

So now expanding out $ch(a, b, c, d, x)$ in terms of $x$ on substituting for $x$ the matrix we get the system:
[p^2 + qr; pq + qs] + (a + d) [p q] + (ad - bc)[1 0] = [0 0]
[rp + sr; rq + s^2]           [r s]            [0 1]   [0 0]

We know that these equations hold when $x = M$, because the Cayley-Hamilton theorem tells us that $ch(M) = 0$! So we get a different system with p = a, q = b, r = c, s = d, still with four equations, that we know is equal to zero! This means we have four intederminates a, b, c, d and four equations, and we know that these equations are true for all $\mathbb Z$. But if a polynomial vanishes on infinitely many points, it must identically be zero. Thus, this means that ch(A) is the zero polynomial, or ch(A) = 0 for all R. This seems to depend on the fact that the ring is infinite, because otherwise imagine we send $\mathbb Z$ to $Z/10Z$. Since we don't have an infinite number of $\mathbb Z$ elements, why should the polynomial be zero? I imagine that this needs zariski like arguments to be handled.

#### § Cramer's rules

We can get cramer's rule using some handwavy manipulation or rigorizing the manipulation using geometric algebra . Say we have a system of equations:
\begin{aligned} a[1] x + b[1] y = c[1] \\ a[2] x + b[2] y = c[2] \end{aligned}
We can write this as:
$\vec a x + \vec b y = \vec c$
where $\vec a \equiv (a_1, a_2)$ and so on. To solve the system, we wedge with $\vec a$ and $\vec b$:
\begin{aligned} \vec a \wedge (\vec a x + \vec b y) = \vec a \wedge \vec c \\ \vec a \wedge \vec b y = \vec a \wedge \vec c \\ y = \frac{\vec a \wedge \vec c}{\vec a \wedge \vec b} \\ y = \frac{\begin{vmatrix} a[1] & a[2] \\ c[1] & c[2] \end{vmatrix}}{ \begin{vmatrix} a[1] & b[1] \\ a[2] & b[2] \end{vmatrix} } \end{aligned}
Which is exactly cramer's rule.