Let $A$ be an $(n\times n)$-matrix. Earlier, we saw that all matrices of this size, with the usual matrix addition and scalar multiplication, form an $n^2$-dimensional vector space. Thus, there is a non-negative integer $m \le n^2$, such that the system $I_n, A, A^2, \ldots, A^m$ is linearly dependent. In that case there is an equality of the form
\[ c_0\,I_n + c_1\,A + c_2\,A^2 + \cdots + c_m\,A^m = 0 , \]
in which not all of the coefficients $c_0,\ldots,c_m$ are equal to zero.
According to the Cayley-Hamilton theorem below, $m$ is at most $n$. To understand this result, we specify what is meant by evaluating a polynomial at a square matrix.
Let $n$ be a natural number and $A$ an $(n\times n)$-matrix. Under substitution of $A$ (for $x$) into a polynomial
\[ p(x) = a_k x^k + a_{k-1}x^{k-1} + \cdots + a_1 x + a_0 \]
in $x$, or evaluating this polynomial at $A$, we mean determining the $(n\times n)$-matrix $p(A)$ defined by
\[ p(A) = a_k A^k + a_{k-1}A^{k-1} + \cdots + a_1 A + a_0\,I_n . \]
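For concreteness, this definition can be tried out numerically. The following sketch evaluates $p(x) = x^2 - 3x + 2$ at a $2\times2$ matrix; the polynomial and the matrix are chosen here purely for illustration and do not come from the text.

```python
import numpy as np

# Illustrative choices (not taken from the text): p(x) = x^2 - 3x + 2 and a 2x2 matrix A.
coeffs = [1, -3, 2]            # coefficients of p, highest power first
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# p(A) = A^2 - 3A + 2I: every power x^k is replaced by the matrix power A^k,
# and the constant term is multiplied by the identity matrix.
p_of_A = sum(c * np.linalg.matrix_power(A, k)
             for k, c in enumerate(reversed(coeffs)))

print(p_of_A)   # [[0. 2.]
                #  [0. 2.]]
```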
The assignment of $p(A)$ to $p(x)$ is a mapping from the vector space of all polynomials in $x$ to the vector space of all $(n\times n)$-matrices.
This map respects multiplication in the following sense: if $p(x)$, $q(x)$, and $r(x)$ are polynomials such that $p(x) = q(x)\cdot r(x)$, then
\[ p(A) = q(A)\cdot r(A) . \]
Under substitution of a linear map $L : V \to V$ into the polynomial $p(x)$ as given above, we mean the linear map $p(L) : V \to V$ defined by
\[ p(L) = a_k L^k + a_{k-1}L^{k-1} + \cdots + a_1 L + a_0\,I_V , \]
where $L^i$ stands for the composition of $i$ copies of $L$ and $I_V$ for the identity map on $V$.
If, for example, $q(x) = x - 1$ and $r(x) = x + 1$, then $p(x) = q(x)\cdot r(x) = x^2 - 1$ and
\[ p(A) = A^2 - I_n = (A - I_n)\cdot(A + I_n) = q(A)\cdot r(A) . \]
If $n = 1$, then the substitution of the matrix $A = \left(a\right)$ into a polynomial $p(x)$ is no different than substituting the value $a$ for $x$ into $p(x)$; that is to say, $p(A)$ is the $(1\times1)$-matrix with entry $p(a)$.
A remarkable fact about linear maps from a finite-dimensional vector space to itself is that substituting such a map into its own characteristic polynomial gives the null map:
Each linear map $L : V \to V$ from a finite-dimensional vector space $V$ to itself satisfies
\[ p_L(L) = 0 , \]
where $p_L(x)$ is the characteristic polynomial of $L$.
If $\dim(V) = n$ and we write the characteristic polynomial as
\[ p_L(x) = c_n x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 , \]
then for any basis $\alpha$ of $V$, the $(n\times n)$-matrix $A = L_\alpha$ is a solution of the corresponding matrix equation in $X$:
\[ c_n X^n + c_{n-1}X^{n-1} + \cdots + c_1 X + c_0\,I_n = 0 . \]
Let $n$ be a natural number and $A$ an $(n\times n)$-matrix. Then $A$ satisfies $p_A(A) = 0$, where $p_A(x)$ is the characteristic polynomial of $A$. This is the special case of the theorem for the linear map $L_A : \mathbb{R}^n \to \mathbb{R}^n$ determined by $A$.
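The statement for matrices lends itself to a quick numerical check. The sketch below uses an arbitrarily chosen $2\times2$ matrix; the use of `numpy.poly` to obtain the coefficients of the characteristic polynomial is our own choice of tool.

```python
import numpy as np

# Arbitrary sample matrix, chosen only for this numerical check.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# np.poly gives the coefficients of det(x*I - A), highest power first;
# for this A that is x^2 - 5x - 2.
coeffs = np.poly(A)

# Evaluate the characteristic polynomial at the matrix A itself.
p_A_of_A = sum(c * np.linalg.matrix_power(A, k)
               for k, c in enumerate(coeffs[::-1]))

print(np.allclose(p_A_of_A, np.zeros((2, 2))))   # True, as the theorem predicts
```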
Let $n$ be a natural number. First we prove the statement: Every $(n\times n)$-matrix $A$ satisfies $p_A(A) = 0$.
We use the matrix equation in Cramer's rule. We recall that the $(i,j)$-entry of the adjoint matrix $\mathrm{adj}(A)$ is $(-1)^{i+j}\det(A_{ji})$, where $A_{ji}$ is the matrix obtained from $A$ by deleting row $j$ and column $i$. Said equation is
\[ \det(A)\cdot I_n = A\cdot\mathrm{adj}(A) . \]
If we replace $A$ by $A - x\,I_n$, we find
\[ p_A(x)\cdot I_n = (A - x\,I_n)\cdot\mathrm{adj}(A - x\,I_n) , \]
where $p_A(x) = \det(A - x\,I_n)$ is the characteristic polynomial of $A$.
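This identity can be verified with a computer algebra system. The following sketch uses a generic symbolic $2\times2$ matrix, which is our own choice of illustration.

```python
import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')

# A generic 2x2 matrix (chosen here only to illustrate the identity).
A = sp.Matrix([[a, b], [c, d]])
M = A - x * sp.eye(2)

# Cramer's-rule identity det(M) * I = M * adj(M), applied to M = A - x*I.
lhs = M.det() * sp.eye(2)
rhs = M * M.adjugate()

print((lhs - rhs).expand())   # the zero matrix
```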
We carry out the proof by applying substitution of an $(n\times n)$-matrix into a more general expression than a polynomial, namely a polynomial whose coefficients are themselves $(n\times n)$-matrices. We denote by $M_n[x]$ the collection of all such polynomials. An element of $M_n[x]$ is of the form
\[ B_k x^k + B_{k-1}x^{k-1} + \cdots + B_1 x + B_0 , \]
where $B_0, B_1, \ldots, B_k$ belong to the set of $(n\times n)$-matrices. The factor $A - x\,I_n$ in the above equality is an example of an element of $M_n[x]$. The left-hand side $p_A(x)\cdot I_n$ of the equality is also an example; here all of the coefficients of powers of $x$ are multiples of the identity matrix $I_n$. Finally, the factor $\mathrm{adj}(A - x\,I_n)$ on the right-hand side is also a polynomial in $M_n[x]$, of degree $n-1$ in $x$ (after all, the determinants in the adjoint matrix are of matrices with $n-1$ rows and columns).
For $n = 1$, the $(1\times1)$-matrices are no different than ordinary numbers, and $M_1[x]$ is no different from the vector space of ordinary polynomials in $x$.
Again, we denote by $q(A)$ the matrix obtained from a polynomial $q(x) = B_k x^k + \cdots + B_1 x + B_0$ in $M_n[x]$ by substitution of $A$ for $x$, that is, $q(A) = B_k A^k + \cdots + B_1 A + B_0$. It is easy to see that $M_n[x]$ is again a vector space and that the map assigning to $q(x)$ the matrix $q(A)$ is a linear map from $M_n[x]$ to the vector space of $(n\times n)$-matrices.
Elements of $M_n[x]$ can be multiplied by one another with the usual rules, where $x\,B$ is rewritten to $B\,x$ for each $(n\times n)$-matrix $B$. If $n \ge 2$, then matrix multiplication is no longer commutative, so substitution of $A$ for $x$ no longer respects multiplication. This means that it may happen that $(q\cdot r)(A) \ne q(A)\cdot r(A)$. For example, if $A$ and $B$ are invertible matrices that do not commute (that is, $A\,B \ne B\,A$), then $q(x) = B\,x$ and $r(x) = B^{-1}x$ satisfy
\[ (q\cdot r)(A) = A^2 \ne B\,A\,B^{-1}A = q(A)\cdot r(A) . \]
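A concrete instance of this failure can be produced numerically; in the sketch below the matrices $A$ and $B$ are chosen by us purely as an example of a non-commuting invertible pair.

```python
import numpy as np

# Illustrative non-commuting invertible matrices (not from the text).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])
B_inv = np.linalg.inv(B)

# q(x) = B*x and r(x) = B^{-1}*x multiply in M_n[x] to (q*r)(x) = x^2,
# because x*B^{-1} is rewritten to B^{-1}*x and B*B^{-1} = I.
qr_of_A = A @ A                                # (q*r)(A) = A^2
q_of_A_times_r_of_A = (B @ A) @ (B_inv @ A)    # q(A) * r(A) = B*A*B^{-1}*A

print(np.allclose(qr_of_A, q_of_A_times_r_of_A))   # False: substitution need not respect products
```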
However, if $A$ commutes with $q(x)$ and $r(x)$, and so commutes with each coefficient of $q(x)$ and $r(x)$, then we do have
\[ (q\cdot r)(A) = q(A)\cdot r(A) , \]
because then the product on the right-hand side can be expanded in the same manner as the product $q(x)\cdot r(x)$ in $M_n[x]$ (whenever $x\,B$ is rewritten to $B\,x$ for a coefficient $B$ of $q(x)$ or $r(x)$, the same commutation also applies after substitution: $A\,B = B\,A$).
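The following sketch illustrates this; the choice $B = 2A + 3I$ (so that $B$ commutes with $A$) and the polynomials $q$ and $r$ are ours, picked only to show that both ways of evaluating the product now agree.

```python
import numpy as np

# Illustration (choices are ours, not from the text): take B to be a polynomial in A,
# so that B commutes with A, and compare the two ways of evaluating q(x)*r(x) at A.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = 2 * A + 3 * np.eye(2)      # commutes with A

# q(x) = B*x and r(x) = x + B; in M_n[x] the product is (q*r)(x) = B*x^2 + B^2*x.
qr_of_A = B @ (A @ A) + (B @ B) @ A
q_of_A_times_r_of_A = (B @ A) @ (A + B)

print(np.allclose(qr_of_A, q_of_A_times_r_of_A))   # True, because A*B = B*A
```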
We apply this statement to the equality derived above. Because $A - x\,I_n$ commutes with $\mathrm{adj}(A - x\,I_n)$ (every square matrix commutes with its own adjoint matrix), and because $x\,I_n$ commutes with each matrix, it follows that $A$ commutes with $\mathrm{adj}(A - x\,I_n)$ and, as a consequence, with each coefficient of $\mathrm{adj}(A - x\,I_n)$.
If we substitute $A$ for $x$, then the right-hand side becomes equal to the null matrix, because the substitution of $A$ in the factor $A - x\,I_n$ yields the null matrix. We have found that substitution of $A$ respects multiplication here, thus substitution of $A$ in the entire right-hand side indeed gives the null matrix. We conclude that substitution of $A$ in the left-hand side produces the null matrix as well. This means that substitution of $A$ in $p_A(x)\cdot I_n$ gives the null matrix. Since substituting $A$ for $x$ in $p_A(x)\cdot I_n$ equals $p_A(A)$, this means that $p_A(A) = 0$, which proves the statement about $A$.
The theorem is a direct consequence because, for each basis $\alpha$ of $V$, we have
\[ \left(p_L(L)\right)_\alpha = p_A(A) = 0 , \]
where $A = L_\alpha$ is the matrix of $L$ with respect to $\alpha$ (recall that the characteristic polynomial of $L$ coincides with the characteristic polynomial of $A$).
Substitution of $A$ for $x$ in $\mathrm{adj}(A - x\,I_n)$ does not always give the null matrix. If, for example,
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} , \]
then
\[ \mathrm{adj}(A - x\,I_2) = \begin{pmatrix} d - x & -b \\ -c & a - x \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} - x\,I_2 , \]
so substitution of $A$ gives
\[ \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} - A = \begin{pmatrix} d - a & -2b \\ -2c & a - d \end{pmatrix} . \]
This is distinct from the null matrix if, for example, $a \ne d$ or $b \ne 0$.
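The computation above can be reproduced symbolically; the sketch below works with the same generic $2\times2$ matrix.

```python
import sympy as sp

# The same generic 2x2 matrix as in the example (a, b, c, d are symbols).
a, b, c, d, x = sp.symbols('a b c d x')
A = sp.Matrix([[a, b], [c, d]])

adj = (A - x * sp.eye(2)).adjugate()   # a polynomial in x of degree 1 with matrix coefficients

# Coefficient-wise substitution of A for x: adj = B1*x + B0, so the result is B1*A + B0.
B1 = adj.diff(x)        # coefficient of x (here: -I)
B0 = adj.subs(x, 0)     # constant coefficient (here: adj(A))
print(B1 * A + B0)      # equals adj(A) - A, not the null matrix unless a = d, b = 0 and c = 0
```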
Consider a matrix $A$ and a polynomial $p(x)$. Calculate the matrix $p(A)$.
In order to calculate the matrix $p(A)$, we substitute $A$ for $x$ in the polynomial $p(x)$ and we simplify the result to a single matrix: each power $x^i$ is replaced by the matrix power $A^i$ and the constant term $a_0$ by $a_0\,I_n$, after which the resulting linear combination
\[ p(A) = a_k A^k + \cdots + a_1 A + a_0\,I_n \]
is worked out to a single $(n\times n)$-matrix.
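Since the exercise's specific matrix and polynomial are not reproduced here, the following sketch works with stand-in values of our own choosing; it shows how the substitution can be carried out, here via Horner's scheme adapted to matrices.

```python
import numpy as np

# Stand-in data (the exercise's actual matrix and polynomial are not shown here):
# p(x) = x^3 - 3x^2 - 2x + 4 and a 2x2 matrix A.
A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])
coeffs = [1, -3, -2, 4]        # coefficients of p, highest power first

# Horner's scheme for matrices: p(A) = ((A - 3I)A - 2I)A + 4I.
n = A.shape[0]
p_of_A = np.zeros((n, n))
for c in coeffs:
    p_of_A = p_of_A @ A + c * np.eye(n)

print(p_of_A)
```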