Let $A$ be an $(n\times n)$-matrix. Earlier, we saw that all matrices of this size, with the usual matrix addition and scalar multiplication, form an $n^2$-dimensional vector space. Thus, there is a non-negative integer $m \leq n^2$ such that the system $I_n, A, A^2, \ldots, A^m$ is linearly dependent. In that case there is an equality of the form
\[
c_0\, I_n + c_1\, A + c_2\, A^2 + \cdots + c_m\, A^m = 0 ,
\]
in which the coefficients $c_0, c_1, \ldots, c_m$ are not all equal to zero. According to the Cayley-Hamilton theorem below, the smallest such $m$ is at most $n$. To understand this result, we specify what is meant by evaluating a polynomial at a square matrix.
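By way of illustration (this is not needed for the theory), the following SymPy sketch takes an arbitrarily chosen $2\times 2$ matrix and checks that the powers $I_2, A, A^2, \ldots$ quickly become linearly dependent; the sample matrix and the code are ours.

```python
# Small experiment with an arbitrarily chosen 2x2 matrix: stack the flattened
# powers I, A, A^2, A^3, A^4 as rows and compute the rank of the stack.
import sympy as sp

A = sp.Matrix([[1, 2], [3, 4]])                 # sample matrix (any 2x2 matrix will do)
powers = [A**k for k in range(5)]               # A**0 is the identity matrix

stack = sp.Matrix([list(P) for P in powers])    # each row is a flattened power of A
print(stack.rank())                             # 2: already I, A, A^2 are dependent
```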
Let $n$ be a natural number and $A$ an $(n\times n)$-matrix. Under substitution of $A$ (for $x$) into a polynomial
\[
p(x) = a_m\, x^m + a_{m-1}\, x^{m-1} + \cdots + a_1\, x + a_0
\]
in $x$, or evaluating this polynomial at $A$, we mean determining the $(n\times n)$-matrix $p(A)$ defined by
\[
p(A) = a_m\, A^m + a_{m-1}\, A^{m-1} + \cdots + a_1\, A + a_0\, I_n .
\]
The assignment of $p(A)$ to $p(x)$ is a mapping from the vector space of all polynomials in $x$ to the vector space of all $(n\times n)$-matrices.
This map respects multiplication in the following sense: if $p(x)$, $q(x)$, and $r(x)$ are polynomials such that $p(x) = q(x)\cdot r(x)$, then $p(A) = q(A)\cdot r(A)$.
Under substitution of a linear map $L : V \to V$ into the polynomial $p(x)$ as given above, we mean the linear map $p(L) : V \to V$ defined by
\[
p(L) = a_m\, L^m + a_{m-1}\, L^{m-1} + \cdots + a_1\, L + a_0\, I_V ,
\]
where $L^i$ denotes the composition of $i$ copies of $L$ and $I_V$ is the identity map on $V$.
If, for example, $p(x) = x^2 - 1$ and $A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$, then
\[
p(A) = A^2 - I_2 = \begin{pmatrix} 1 & 4 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 4 \\ 0 & 0 \end{pmatrix}.
\]
If $n = 1$, then the substitution of the matrix $A = \begin{pmatrix} a \end{pmatrix}$ into a polynomial $p(x)$ is no different than substituting the value $a$ for $x$ into $p(x)$, that is to say: $p(A)$ is the $(1\times 1)$-matrix with entry $p(a)$.
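As a quick check of the example above, the SymPy sketch below evaluates a polynomial at a matrix exactly as in the definition; the helper name is ours.

```python
# Evaluate a0 + a1*x + ... + am*x^m at a square matrix A; the constant term a0
# contributes a0 * I, as in the definition above.
import sympy as sp

def eval_poly_at_matrix(coeffs, A):
    """coeffs = [a0, a1, ..., am]; returns a0*I + a1*A + ... + am*A**m."""
    n = A.shape[0]
    return sum((a * A**k for k, a in enumerate(coeffs)), sp.zeros(n, n))

A = sp.Matrix([[1, 2], [0, 1]])
print(eval_poly_at_matrix([-1, 0, 1], A))   # p(x) = x^2 - 1 gives Matrix([[0, 4], [0, 0]])
```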
A remarkable fact about linear maps from a finite-dimensional vector space to itself is that substitution in their characteristic polynomial gives the null map:
Each linear map $L : V \to V$ from a finite-dimensional vector space $V$ to itself satisfies
\[
p_L(L) = 0 ,
\]
where $p_L(\lambda)$ is the characteristic polynomial of $L$.
If $\dim(V) = n$ and we write the characteristic polynomial as
\[
p_L(\lambda) = c_n\,\lambda^n + c_{n-1}\,\lambda^{n-1} + \cdots + c_1\,\lambda + c_0 ,
\]
then for any basis $\alpha$ of $V$, the $(n\times n)$-matrix $L_\alpha$ of $L$ with respect to $\alpha$ is a solution of the corresponding matrix equation in the unknown $X$:
\[
c_n\, X^n + c_{n-1}\, X^{n-1} + \cdots + c_1\, X + c_0\, I_n = 0 .
\]
Let $n$ be a natural number and $A$ an $(n\times n)$-matrix. Then $A$ satisfies $p_A(A) = 0$, where $p_A(\lambda) = \det(A - \lambda\, I_n)$ is the characteristic polynomial of $A$. This is the special case of the theorem for the linear map $x \mapsto A\, x$ determined by $A$.
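The statement is easy to try out on examples; the following SymPy sketch does so for one arbitrarily chosen $3\times 3$ matrix (both the matrix and the code are only an illustration).

```python
# Check p_A(A) = 0 for a sample matrix: expand det(A - lambda*I), read off the
# coefficients c0, ..., cn, and evaluate c0*I + c1*A + ... + cn*A^n.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[2, 1, 0], [0, 1, -1], [3, 0, 1]])   # sample matrix

p = sp.expand((A - lam * sp.eye(3)).det())          # characteristic polynomial
coeffs = sp.Poly(p, lam).all_coeffs()[::-1]         # [c0, c1, ..., cn]

pA = sum((c * A**k for k, c in enumerate(coeffs)), sp.zeros(3, 3))
print(pA)                                           # the 3x3 null matrix
```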
Let $n$ be a natural number. First we prove the statement: every $(n\times n)$-matrix $A$ satisfies $p_A(A) = 0$.
We use the matrix equation underlying Cramer's rule. We recall that the $(i,j)$-entry of the adjoint matrix $\operatorname{adj}(A)$ is $(-1)^{i+j}\,\det\!\left(A_{j i}\right)$, where $A_{j i}$ is obtained from $A$ by deleting row $j$ and column $i$. Said equation is
\[
\det(A)\cdot I_n = A\cdot \operatorname{adj}(A).
\]
If we replace $A$ by $A - \lambda\cdot I_n$, we find
\[
p_A(\lambda)\cdot I_n = \det\!\left(A - \lambda\, I_n\right)\cdot I_n = \left(A - \lambda\, I_n\right)\cdot \operatorname{adj}\!\left(A - \lambda\, I_n\right).
\]
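This identity between polynomials in $\lambda$ can also be verified with a computer algebra system; the SymPy sketch below does so for an arbitrarily chosen $3\times 3$ matrix (SymPy calls the adjoint matrix the adjugate).

```python
# Verify (A - lambda*I) * adj(A - lambda*I) = det(A - lambda*I) * I for a sample matrix.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2, 0], [0, 3, 1], [4, 0, 1]])    # sample matrix
M = A - lam * sp.eye(3)

diff = (M * M.adjugate() - M.det() * sp.eye(3)).expand()
print(diff)                                          # the 3x3 null matrix
```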
We carry out the proof by applying substitution of an $(n\times n)$-matrix into a more general expression than a polynomial, namely a polynomial whose coefficients are also $(n\times n)$-matrices. We denote by $M_n[\lambda]$ the collection of all such polynomials. An element of $M_n[\lambda]$ is of the form
\[
C_m\,\lambda^m + C_{m-1}\,\lambda^{m-1} + \cdots + C_1\,\lambda + C_0 ,
\]
where $C_0, C_1, \ldots, C_m$ belong to the set of $(n\times n)$-matrices. The factor $A - \lambda\cdot I_n$ in the above equality is an example of an element of $M_n[\lambda]$. The left-hand side of the equality is also an example; here all of the coefficients of powers of $\lambda$ are multiples of the identity matrix $I_n$. Finally, the factor $\operatorname{adj}\!\left(A - \lambda\, I_n\right)$ on the right-hand side is also a polynomial in $\lambda$, of degree at most $n-1$ in $\lambda$ (after all, the determinants in the adjoint matrix are of matrices with $n-1$ rows and $n-1$ columns).
For $n = 1$, the matrices are no different than ordinary numbers and we have that $M_1[\lambda]$ coincides with the set of ordinary polynomials in $\lambda$.
Again, we denote by $q(A) = C_m\, A^m + \cdots + C_1\, A + C_0$ the matrix obtained from $q(\lambda) = C_m\,\lambda^m + \cdots + C_1\,\lambda + C_0$ by substitution of $A$ for $\lambda$. It is easy to see that $M_n[\lambda]$ is again a vector space and that the map assigning to $q(\lambda)$ the matrix $q(A)$ is a linear map from $M_n[\lambda]$ to the vector space of all $(n\times n)$-matrices.
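Substitution in $M_n[\lambda]$ is just as easy to carry out by machine; in the SymPy sketch below a polynomial with matrix coefficients is represented by its list of coefficient matrices (the helper name is ours), and substituting $A$ in the factor $A - \lambda\cdot I_2$ indeed gives the null matrix.

```python
# Represent an element of M_n[lambda] by its coefficient matrices [C0, C1, ..., Cm]
# and substitute a matrix A for lambda: the result is C0 + C1*A + ... + Cm*A^m.
import sympy as sp

def subst(coeffs, A):
    n = A.shape[0]
    return sum((C * A**k for k, C in enumerate(coeffs)), sp.zeros(n, n))

A = sp.Matrix([[1, 2], [3, 4]])                 # sample matrix
# The factor A - lambda*I has coefficients C0 = A and C1 = -I:
print(subst([A, -sp.eye(2)], A))                # the 2x2 null matrix
```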
Elements of $M_n[\lambda]$ can be multiplied by one another with the usual rules, where $\lambda\cdot C$ is rewritten to $C\cdot\lambda$ for each $(n\times n)$-matrix $C$. If $n \geq 2$, then matrix multiplication is no longer commutative, so substitution of $A$ for $\lambda$ no longer respects multiplication. This means that it may happen that $p(\lambda) = q(\lambda)\cdot r(\lambda)$ while $p(A) \neq q(A)\cdot r(A)$. For example, if $A$ and $B$ are invertible matrices that do not commute (that is, $A\cdot B \neq B\cdot A$), then $q(\lambda) = \lambda$ and $r(\lambda) = B$ satisfy
\[
\left(q\cdot r\right)(A) = B\cdot A \neq A\cdot B = q(A)\cdot r(A).
\]
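A concrete instance (the matrices below are chosen only so that they do not commute):

```python
# With q(lambda) = lambda and r(lambda) = B, the product q*r is rewritten to B*lambda,
# so substituting A gives B*A, whereas q(A)*r(A) = A*B; the two differ here.
import sympy as sp

A = sp.Matrix([[1, 1], [0, 1]])
B = sp.Matrix([[1, 0], [1, 1]])                 # A*B != B*A

print(B * A)        # substitution of A in (q*r)(lambda) = B*lambda
print(A * B)        # q(A) * r(A); a different matrix
```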
However, if $A$ commutes with $q(\lambda)$ and $r(\lambda)$, and so commutes with each coefficient of $q(\lambda)$ and $r(\lambda)$, then we do have
\[
p(A) = q(A)\cdot r(A),
\]
because then the product in the right-hand side can be expanded in the same manner as for a polynomial in $\lambda$ (always when $\lambda\cdot C$ is rewritten to $C\cdot\lambda$ for a coefficient $C$ of $q(\lambda)$ or $r(\lambda)$, the same commutation also applies after substitution: $A\cdot C = C\cdot A$).
We apply this statement to the equality derived above. Because $A - \lambda\cdot I_n$ commutes with $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$, and because $\lambda\cdot I_n$ commutes with each matrix, it follows that $A$ commutes with $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$ and, as a consequence, with each coefficient of $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$. Moreover, $A$ obviously commutes with the coefficients $A$ and $-I_n$ of the factor $A - \lambda\cdot I_n$.
If we substitute $A$ for $\lambda$, then the right-hand side becomes equal to the null matrix, because the substitution of $A$ in the factor $A - \lambda\cdot I_n$ yields the null matrix. We have found that substitution of $A$ respects multiplication, thus substitution of $A$ in the entire right-hand side gives the null matrix. We conclude that substitution of $A$ in the left-hand side produces the null matrix as well. This means that substitution of $A$ in $p_A(\lambda)\cdot I_n$ gives the null matrix. Since substituting $A$ for $\lambda$ in $p_A(\lambda)\cdot I_n$ equals $p_A(A)$, this means that $p_A(A) = 0$, which proves the statement about $A$.
The theorem is a direct consequence because, for each basis $\alpha$ of $V$, we have
\[
\left(p_L(L)\right)_\alpha = p_L\!\left(L_\alpha\right) = p_A(A) = 0 ,
\]
where $A = L_\alpha$ is the matrix of $L$ with respect to $\alpha$, whose characteristic polynomial coincides with $p_L$. A linear map whose matrix is the null matrix is the null map, so $p_L(L) = 0$.
Substitution of $A$ for $\lambda$ in $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$ does not always give the null matrix. If, for example,
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
then
\[
\operatorname{adj}\!\left(A - \lambda\cdot I_2\right) = \begin{pmatrix} d - \lambda & -b \\ -c & a - \lambda \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} - \lambda\cdot I_2 ,
\]
so substitution of $A$ for $\lambda$ gives
\[
\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} - A = \begin{pmatrix} d - a & -2b \\ -2c & a - d \end{pmatrix}.
\]
This is distinct from the null matrix if for example $a \neq d$.
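The same remark can be checked mechanically; the sketch below does this for one sample matrix with $a \neq d$, writing $\operatorname{adj}\!\left(A - \lambda\cdot I_2\right)$ as $C_0 + C_1\,\lambda$ and substituting $A$ for $\lambda$.

```python
# Substituting A for lambda in adj(A - lambda*I) = C0 + C1*lambda gives C0 + C1*A,
# which is not the null matrix for this sample matrix.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2], [3, 4]])                 # a sample matrix with a != d

adj = (A - lam * sp.eye(2)).adjugate()          # Matrix([[4 - lam, -2], [-3, 1 - lam]])
C0 = adj.subs(lam, 0)                           # constant coefficient
C1 = adj.diff(lam)                              # coefficient of lambda (adj is linear in lambda)

print(C0 + C1 * A)                              # Matrix([[3, -4], [-6, -3]])
```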
Consider the matrix
\[
A = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}
\]
and the polynomial
\[
p(x) = x^2 - 3\, x + 2 .
\]
Calculate $p(A)$.

In order to calculate the matrix $p(A)$, we substitute $A$ for $x$ in the polynomial $p(x)$ and we simplify the result to a single $(2\times 2)$-matrix:
\[
p(A) = A^2 - 3\, A + 2\, I_2 = \begin{pmatrix} -1 & 2 \\ -1 & -2 \end{pmatrix} - \begin{pmatrix} 3 & 6 \\ -3 & 0 \end{pmatrix} + \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} -2 & -4 \\ 2 & 0 \end{pmatrix}.
\]
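The outcome can be confirmed with SymPy:

```python
# Check the computation of p(A) for the matrix and polynomial of the exercise.
import sympy as sp

A = sp.Matrix([[1, 2], [-1, 0]])
print(A**2 - 3 * A + 2 * sp.eye(2))             # Matrix([[-2, -4], [2, 0]])
```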