Posted by: holdenlee | June 1, 2010

Cayley-Hamilton Theorem

We prove the following:

(1) Let M be a matrix and P(x)=\det(x I -M) be its characteristic polynomial. Then P(M)=\mathcal{O}.
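Before proving (1), it can be checked numerically. The following is a minimal sanity check (not a proof), assuming numpy is available; the matrix M is just an example. `np.poly` returns the coefficients of \det(xI-M), which we then evaluate at the matrix M itself via Horner's scheme:

```python
import numpy as np

# Numerical sanity check of Cayley-Hamilton: P(M) should be the zero matrix.
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])
n = M.shape[0]

coeffs = np.poly(M)  # coefficients of det(xI - M), highest degree first

# Evaluate P at the matrix M via Horner's scheme:
# P(M) = (...((I*M + c_1 I)*M + c_2 I)*M + ...) + c_n I
P_of_M = np.zeros((n, n))
for c in coeffs:
    P_of_M = P_of_M @ M + c * np.eye(n)
# P_of_M is now (numerically) the zero matrix
```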

An equivalent way of stating this (the determinant of a linear transformation is the determinant of any matrix representing it; it does not depend on the choice of basis) is:

(2) Let T be a linear transformation and P(x) be its characteristic polynomial. Then P(T)=\mathcal{O}.

Suppose M is an n\times n matrix. Let F be the underlying field.

To prove (1), we could try simply plugging M into the characteristic polynomial, like this:

(3)\;\displaystyle \det(M I -M)=0?

Wait, but this doesn't make sense… What do we want here? Well, when we expand the characteristic polynomial, we take the matrix xI-M, which has x's along the diagonal, and then take the determinant. So we want to take a matrix which has M's along its diagonal, and then take the determinant. In other words, we want to work with matrices with matrix entries!

But wait, matrices don't form a field (some of them have no inverses). They do, however, form a ring R, and many linear algebra facts about matrices with entries in a field carry over to matrices with entries in a (commutative) ring. We consider matrices over the ring of n\times n matrices with entries in F. Equivalently, since matrices correspond to linear transformations, we could talk about matrices over the ring of linear transformations instead; this ring is frequently called the endomorphism ring \text{End}(V), where V is our vector space F^n. To prevent confusion, we will denote matrices with matrix entries in bold.

Let \mathbf{I} be the identity matrix in the ring of matrices with entries in R: it has n\times n identity matrices (or transformations) I_n along its diagonal, and zero matrices \mathcal{O} elsewhere. Let \mathbf{M} be the matrix whose (i,j) entry is m_{ji}I_n (thinking of entries as linear transformations, this is multiplication by the scalar m_{ji}). Note the transposed index pattern: we use m_{ji}, not m_{ij}. This is harmless, since M and M^T have the same characteristic polynomial, and it is exactly what makes the computation below come out right. So the correct way to write (3), what we actually want to prove, is:

\displaystyle (4)\; \det(M \mathbf{I} -\mathbf{M})=\mathcal{O}.

Note that M\in R functions as a scalar here, multiplying every entry of \mathbf{I}. Let \mathbf{\Delta} be the matrix M \mathbf{I}-\mathbf{M}.

Let \mathbf{X} be the column vector whose ith entry is the ith standard basis vector e_i of F^n, so it is a column vector whose entries are themselves column vectors. We can multiply it on the left by \mathbf{\Delta}. The ith entry of \mathbf{\Delta X} is

\displaystyle Me_i-\sum_{j=1}^n m_{ji}e_j=0,

since Me_i is precisely the ith column of M, namely \sum_{j=1}^n m_{ji}e_j. Hence

(5)\;\displaystyle \mathbf{\Delta X}=0.

(The entries here are column vectors, or equivalently elements of the vector space V.)
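The computation leading to (5) can be sketched concretely. Below is a numpy sketch, with M an arbitrary example matrix: \mathbf{\Delta} is represented as an n\times n nested list of n\times n blocks, and applying it to \mathbf{X} gives zero in every entry. Note the index `M[j, i]` in the (i,j) block, matching the fact that Me_i is the ith column of M, i.e. \sum_j m_{ji}e_j:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [4.0, 3.0]])
n = M.shape[0]
I_n = np.eye(n)

# Block (i, j) of Delta = M*I_bold - M_bold is  delta_ij * M - m_ji * I_n.
Delta = [[(M if i == j else np.zeros((n, n))) - M[j, i] * I_n
          for j in range(n)] for i in range(n)]

e = np.eye(n)  # e[i] is the ith standard basis vector of F^n

# Entry i of Delta @ X is sum_j Delta[i][j] @ e_j; each should be zero.
DX = [sum(Delta[i][j] @ e[j] for j in range(n)) for i in range(n)]
```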

Now we recall this fact from linear algebra:

(6) Let A be an n\times n matrix over a commutative ring, and C its cofactor matrix, that is, the (i,j) entry of C is (-1)^{i+j} times the determinant of A with the ith row and jth column removed. Then

\displaystyle C^T A=\det(A)I.

Proof (the same argument works for matrices over a commutative ring as over a field): The (i,j) entry of C^T A is the dot product of the ith column of C with the jth column of A. When i=j, this is exactly the cofactor expansion of \det(A) along the ith column. When i\neq j, it is the cofactor expansion of the determinant of the matrix obtained from A by replacing its ith column with its jth column; that matrix has two equal columns, so the determinant is 0. \square
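Fact (6) is easy to check numerically. A minimal sketch, assuming numpy; `cofactor_matrix` is a helper written here for illustration, and A is just an example:

```python
import numpy as np

def cofactor_matrix(A):
    """Cofactor matrix C: C[i, j] = (-1)^(i+j) * det(A with row i, col j removed)."""
    n = A.shape[0]
    C = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 5.0],
              [0.0, 1.0, 2.0]])
C = cofactor_matrix(A)

# Check the identity (6): C^T A = det(A) I
lhs = C.T @ A
rhs = np.linalg.det(A) * np.eye(3)
```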

Let \mathbf{C} be the cofactor matrix of \mathbf{\Delta}. The entries of \mathbf{\Delta} all lie in the commutative subring F[M] of R (every entry is a polynomial in M), so we may apply (6) to \mathbf{\Delta} to get:

(7)\; \mathbf{C}^T\mathbf{\Delta}=\det(\mathbf{\Delta})\mathbf{I}.

Multiplying both sides of (5) on the left by \mathbf{C}^T and using (7),

(8)\; \det(\mathbf{\Delta})\mathbf{I}\mathbf{X}=0.

\det(\mathbf{\Delta}) is an n\times n matrix (an element of R), and \mathbf{X} contains all the standard basis vectors, so (8) says \det(\mathbf{\Delta})e_i=0 for every i; hence \det(\mathbf{\Delta})=\mathcal{O}. Finally, expanding \det(M\mathbf{I}-\mathbf{M}) is term by term the same computation as expanding \det(xI-M^T)=\det(xI-M)=P(x) with x replaced by M (all the entries involved commute), so \det(\mathbf{\Delta})=P(M), and P(M)=\mathcal{O}, as desired. \blacksquare

The theorem can actually be made more general: in (2) we can replace T with an endomorphism \varphi of a finitely generated module M over a commutative ring. Take the matrix of \varphi with respect to any generating set m_1,\ldots,m_n (if this is not a basis, the matrix is not unique, but that's okay!), and use it to define P. Then P(\varphi)=\mathcal{O}, and the proof is the same.
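For instance, an endomorphism of the \mathbb{Z}-module \mathbb{Z}^2 is just an integer matrix, and the theorem holds with exact integer arithmetic. A sketch for the 2\times 2 case, where P(x)=x^2-\operatorname{tr}(M)x+\det(M) and M is an arbitrary example:

```python
# Exact check of Cayley-Hamilton over Z for a 2x2 integer matrix,
# using only integer arithmetic (no floating point).

def matmul(A, B):
    """2x2 integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = [[2, 7], [1, 4]]  # an example endomorphism of the Z-module Z^2
tr = M[0][0] + M[1][1]                        # trace
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]   # determinant

M2 = matmul(M, M)
# P(M) = M^2 - tr(M)*M + det(M)*I, computed entrywise
P_of_M = [[M2[i][j] - tr * M[i][j] + det * (1 if i == j else 0)
           for j in range(2)] for i in range(2)]
```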


  1. Thanks for sharing this proof in such a detailed form! That was the way in which I liked to think of Cayley-Hamilton… until I found this other proof which, to me, is clearer: take a look and see if you like it 🙂 It uses only the fact (which can be proved as preparation for canonical forms) that a matrix with entries in \mathbb{C} can be put in upper triangular form:

    I'm quite sure you understand the math symbols: if not, please tell me and I can translate them for you.

  2. Thanks, that’s a nice proof!
