Posted by: holdenlee | June 1, 2010

## Cayley-Hamilton Theorem

We prove the following:

(1) Let $M$ be a matrix and $P(x)=\det(x I -M)$ be its characteristic polynomial. Then $P(M)=\mathcal{O}$.

An equivalent way of stating this (the determinant of a linear transformation is just the determinant of any matrix representing it; it does not depend on the choice of basis) is:

(2) Let $T$ be a linear transformation and $P(x)$ be its characteristic polynomial. Then $P(T)=\mathcal{O}$.

Suppose $M$ is an $n\times n$ matrix, and let $F$ be the underlying field.

To prove (1), we might try simply plugging $M$ into the characteristic polynomial, like this:

$(3)\;\displaystyle \det(M I -M)=0?$

Wait, but this doesn't make sense… What do we want here? Well, when we expand the characteristic polynomial, we take the matrix $xI-M$, which has $x$'s along the diagonal, and then take the determinant. So we want to take a matrix which has $M$'s along its diagonal, and then take the determinant. In other words, we want to work with matrices with matrix entries!

But wait, matrices don’t form a field (since some of them don’t have inverses). They do, however, form a ring $R$, and we know that many linear algebra facts (where matrices have entries in fields) carry over to rings. We consider matrices over the ring of $n\times n$ matrices with entries in $F$. Equivalently, since matrices correspond to linear transformations, we could talk about matrices over the ring of linear transformations instead. This is frequently called the endomorphism ring, $\text{End}(V)$ where $V$ is our vector space $F^n$. To prevent confusion, we will denote matrices with matrix entries in bold.

Let $\mathbf{I}$ be the identity matrix in the ring of matrices with entries in $R$: it has $n\times n$ identity matrices (or transformations) $I_n$ along its diagonal, and zero matrices $\mathcal{O}$ elsewhere. Let $\mathbf{M}$ be the matrix whose $(i,j)$ entry is $m_{ji}I_n$ (note the transpose! Thinking of entries as linear transformations, these correspond to multiplication by the scalar $m_{ji}$). The transpose is harmless, since $\det(xI-M^T)=\det(xI-M)$, and it is exactly what makes the computation below work. So the correct way to write (3), and what we actually want to prove, is:

$\displaystyle (4)\; \det(M \mathbf{I} -\mathbf{M})=\mathcal{O}.$

Note that $M\in R$ acts as a scalar here, multiplying each entry of $\mathbf{I}$. Let $\mathbf{\Delta}$ be the matrix $M \mathbf{I}-\mathbf{M}$; its $(i,j)$ entry is $\delta_{ij}M-m_{ji}I_n$.
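To make this concrete, here is a small sketch in plain Python (the list-of-lists representation and the helper names `scale` and `sub` are my own, not from the post): a matrix over $R$ is a nested list whose entries are ordinary $n\times n$ matrices, and $\mathbf{\Delta}$ has $(i,j)$ entry $\delta_{ij}M - m_{ji}I_n$.

```python
# Represent a matrix over R = (n x n matrices over F) as a nested list:
# Delta[i][j] is itself an n x n matrix, namely delta_{ij}*M - m_{ji}*I_n.
M = [[1, 2],
     [3, 4]]
n = len(M)

def scale(c, A):
    # multiply every entry of A by the scalar c
    return [[c * x for x in row] for row in A]

def sub(A, B):
    # entrywise difference of two n x n matrices
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

I_n = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
zero = [[0] * n for _ in range(n)]

# Delta = M*bold(I) - bold(M), built entry by entry
Delta = [[sub(M if i == j else zero, scale(M[j][i], I_n))
          for j in range(n)] for i in range(n)]
```

For this $M$, the diagonal entries of $\mathbf{\Delta}$ come out to $M-I$ and $M-4I$, and the off-diagonal entries are $-3I$ and $-2I$.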

Let $\mathbf{X}$ be the column vector whose $i$th entry is the $i$th standard basis vector $e_i$ of $F^n$, so it is a column vector with entries that are column vectors. We can multiply it on the left by $\mathbf{\Delta}$. Since the $(i,j)$ entry of $\mathbf{\Delta}$ is $\delta_{ij}M-m_{ji}I_n$, the $i$th entry of $\mathbf{\Delta X}$ is

$\displaystyle \sum_{j=1}^n\left(\delta_{ij}M-m_{ji}I_n\right)e_j=Me_i-\sum_{j=1}^n m_{ji}e_j=0,$

since $\sum_{j=1}^n m_{ji}e_j$ is exactly the $i$th column of $M$, which is $Me_i$.

Hence

$(5)\;\displaystyle \mathbf{\Delta X}=0.$

(The entries here are column vectors, or equivalently elements of the vector space $V$.)
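A quick sanity check of (5) in plain Python (a sketch; the helper `mat_vec` is my own): for a concrete non-symmetric $M$, the $i$th entry of $\mathbf{\Delta X}$, namely $Me_i-\sum_j m_{ji}e_j$, is the zero vector for every $i$.

```python
M = [[1, 2, 0],
     [3, 4, 5],
     [6, 7, 8]]
n = len(M)

def mat_vec(A, v):
    # ordinary matrix-vector product over the base field
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

basis = [[1 if k == i else 0 for k in range(n)] for i in range(n)]  # e_1, ..., e_n

for i in range(n):
    # i-th entry of Delta X: M e_i minus the sum over j of m_{ji} e_j
    entry = mat_vec(M, basis[i])
    for j in range(n):
        entry = [entry[k] - M[j][i] * basis[j][k] for k in range(n)]
    assert entry == [0] * n
```

(With $m_{ij}$ in place of $m_{ji}$, the entries would not vanish for this $M$; this is why the transpose in the definition of $\mathbf{M}$ matters.)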

Now we recall this fact from linear algebra:

(6) Let $A$ be an $n\times n$ matrix over a commutative ring, and $C$ its cofactor matrix; that is, the $(i,j)$ entry of $C$ is $(-1)^{i+j}$ times the determinant of $A$ with the $i$th row and $j$th column removed. Then

$\displaystyle C^T A=\det(A)I.$

Proof (the same computation works over any commutative ring as over a field; this suffices here, because the entries of $\mathbf{\Delta}$ all lie in the commutative subring $F[M]\subseteq R$ of polynomials in $M$): The $(i,j)$ entry of $C^T A$ is the dot product of the $i$th column of $C$ with the $j$th column of $A$. When $i=j$, this is the cofactor expansion of $\det(A)$ along the $i$th column. When $i\neq j$, it is the determinant of the matrix obtained from $A$ by replacing its $i$th column with its $j$th column, which is 0 since two columns are equal. $\square$
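Here is a small sanity check of (6) in plain Python (a sketch with my own helper `minor`, for a concrete $3\times 3$ integer matrix): build the cofactor matrix from $2\times 2$ minors and confirm $C^TA=\det(A)I$.

```python
A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
n = 3

def minor(A, i, j):
    # determinant of A with row i and column j removed (a 2x2 determinant here)
    m = [[x for l, x in enumerate(r) if l != j]
         for k, r in enumerate(A) if k != i]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

C = [[(-1) ** (i + j) * minor(A, i, j) for j in range(n)] for i in range(n)]
detA = sum((-1) ** j * A[0][j] * minor(A, 0, j) for j in range(n))  # expand along first row

# (i,j) entry of C^T A: dot product of column i of C with column j of A
CTA = [[sum(C[k][i] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
assert CTA == [[detA if i == j else 0 for j in range(n)] for i in range(n)]
```

Here $\det A = 18$, and all off-diagonal entries of $C^TA$ vanish as claimed.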

Let $\mathbf{C}$ be the cofactor matrix of $\mathbf{\Delta}$. Applying (6) to $\mathbf{\Delta}$ gives:

$(7)\; \mathbf{C}^T\mathbf{\Delta}=\det(\mathbf{\Delta})\mathbf{I}.$

Multiplying both sides of (5) on the left by $\mathbf{C}^T$ and using (7),

$(8)\; \det(\mathbf{\Delta})\mathbf{I}\mathbf{X}=0.$

$\det(\mathbf{\Delta})$ is an $n\times n$ matrix, and $\mathbf{X}$ contains all the standard basis vectors. Hence $\det(\mathbf{\Delta})e_i=0$ for every $i$, so $\det(\mathbf{\Delta})=\mathcal{O}$, as desired. $\blacksquare$
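Finally, the theorem itself is easy to spot-check in plain Python for a $2\times 2$ matrix (a sketch; `mul` is my own helper): $P(x)=x^2-(\operatorname{tr}M)\,x+\det M$, and $P(M)=M^2-(\operatorname{tr}M)M+(\det M)I$ should be the zero matrix.

```python
M = [[1, 2],
     [3, 4]]
a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
tr, det = a + d, a * d - b * c  # trace and determinant of M

def mul(A, B):
    # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M2 = mul(M, M)
# P(M) = M^2 - tr(M)*M + det(M)*I
P_of_M = [[M2[i][j] - tr * M[i][j] + (det if i == j else 0) for j in range(2)]
          for i in range(2)]
assert P_of_M == [[0, 0], [0, 0]]
```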

The theorem can actually be made more general: in (2) we can replace $T$ with an endomorphism $\varphi$ of a finitely generated module $M$ over a commutative ring. Take the matrix of $\varphi$ with respect to any generating set $m_1,\ldots,m_n$ (if this is not a basis, the matrix is not unique, but that's okay!), and use it to define $P$. We get $P(\varphi)=\mathcal{O}$. The proof is the same.