Let $A$ be an $(n\times n)$-matrix. Earlier, we saw that all matrices of this size, with the usual matrix addition and scalar multiplication, form an $n^2$-dimensional vector space. Thus, there is a non-negative integer $m \leq n^2$ such that the system $I_n, A, A^2, \ldots, A^m$ is linearly dependent. In that case there is an equality of the form
\[
c_0\, I_n + c_1\, A + c_2\, A^2 + \cdots + c_m\, A^m = 0 ,
\]
in which the coefficients $c_0, c_1, \ldots, c_m$ are not all equal to zero. According to the Cayley-Hamilton theorem below, the smallest such $m$ is at most $n$. To understand this result, we specify what is meant by evaluating a polynomial at a square matrix.
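By way of illustration (this is not needed for the theory), the following SymPy sketch takes an arbitrarily chosen $2\times 2$ matrix and checks that the powers $I_2, A, A^2, \ldots$ quickly become linearly dependent; the sample matrix and the code are ours.

```python
# Small experiment with an arbitrarily chosen 2x2 matrix: stack the flattened
# powers I, A, A^2, A^3, A^4 as rows and compute the rank of the stack.
import sympy as sp

A = sp.Matrix([[1, 2], [3, 4]])                 # sample matrix (any 2x2 matrix will do)
powers = [A**k for k in range(5)]               # A**0 is the identity matrix

stack = sp.Matrix([list(P) for P in powers])    # each row is a flattened power of A
print(stack.rank())                             # 2: already I, A, A^2 are dependent
```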
Let $n$ be a natural number and $A$ an $(n\times n)$-matrix. Under substitution of $A$ (for $x$) into a polynomial
\[
p(x) = a_m\, x^m + a_{m-1}\, x^{m-1} + \cdots + a_1\, x + a_0
\]
in $x$, or evaluating this polynomial at $A$, we mean determining the $(n\times n)$-matrix $p(A)$ defined by
\[
p(A) = a_m\, A^m + a_{m-1}\, A^{m-1} + \cdots + a_1\, A + a_0\, I_n .
\]
The assignment of $p(A)$ to $p(x)$ is a mapping from the vector space of all polynomials in $x$ to the vector space of all $(n\times n)$-matrices.
This map respects multiplication in the following sense: if $p(x)$, $q(x)$, and $r(x)$ are polynomials such that $p(x) = q(x)\cdot r(x)$, then $p(A) = q(A)\cdot r(A)$.
Under substitution of a linear map $L : V \to V$ into the polynomial $p(x)$ as given above, we mean the linear map $p(L) : V \to V$ defined by
\[
p(L) = a_m\, L^m + a_{m-1}\, L^{m-1} + \cdots + a_1\, L + a_0\, I_V ,
\]
where $L^i$ denotes the composition of $i$ copies of $L$ and $I_V$ is the identity map on $V$.
If, for example, $p(x) = x^2 - 1$ and $A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$, then
\[
p(A) = A^2 - I_2 = \begin{pmatrix} 1 & 4 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 4 \\ 0 & 0 \end{pmatrix}.
\]
If $n = 1$, then the substitution of the matrix $A = \begin{pmatrix} a \end{pmatrix}$ into a polynomial $p(x)$ is no different than substituting the value $a$ for $x$ into $p(x)$, that is to say: $p(A)$ is the $(1\times 1)$-matrix with entry $p(a)$.
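As a quick check of the example above, the SymPy sketch below evaluates a polynomial at a matrix exactly as in the definition; the helper name is ours.

```python
# Evaluate a0 + a1*x + ... + am*x^m at a square matrix A; the constant term a0
# contributes a0 * I, as in the definition above.
import sympy as sp

def eval_poly_at_matrix(coeffs, A):
    """coeffs = [a0, a1, ..., am]; returns a0*I + a1*A + ... + am*A**m."""
    n = A.shape[0]
    return sum((a * A**k for k, a in enumerate(coeffs)), sp.zeros(n, n))

A = sp.Matrix([[1, 2], [0, 1]])
print(eval_poly_at_matrix([-1, 0, 1], A))   # p(x) = x^2 - 1 gives Matrix([[0, 4], [0, 0]])
```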
A remarkable fact about linear maps from a finite-dimensional vector space to itself is that substitution in their characteristic polynomial gives the null map:
Each linear map $L : V \to V$ from a finite-dimensional vector space $V$ to itself satisfies
\[
p_L(L) = 0 ,
\]
where $p_L(\lambda)$ is the characteristic polynomial of $L$.
If $\dim(V) = n$ and we write the characteristic polynomial as
\[
p_L(\lambda) = c_n\,\lambda^n + c_{n-1}\,\lambda^{n-1} + \cdots + c_1\,\lambda + c_0 ,
\]
then for any basis $\alpha$ of $V$, the $(n\times n)$-matrix $L_\alpha$ of $L$ with respect to $\alpha$ is a solution of the corresponding matrix equation in the unknown $X$:
\[
c_n\, X^n + c_{n-1}\, X^{n-1} + \cdots + c_1\, X + c_0\, I_n = 0 .
\]
Let $n$ be a natural number and $A$ an $(n\times n)$-matrix. Then $A$ satisfies $p_A(A) = 0$, where $p_A(\lambda) = \det(A - \lambda\, I_n)$ is the characteristic polynomial of $A$. This is the special case of the theorem for the linear map $x \mapsto A\, x$ determined by $A$.
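The statement is easy to try out on examples; the following SymPy sketch does so for one arbitrarily chosen $3\times 3$ matrix (both the matrix and the code are only an illustration).

```python
# Check p_A(A) = 0 for a sample matrix: expand det(A - lambda*I), read off the
# coefficients c0, ..., cn, and evaluate c0*I + c1*A + ... + cn*A^n.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[2, 1, 0], [0, 1, -1], [3, 0, 1]])   # sample matrix

p = sp.expand((A - lam * sp.eye(3)).det())          # characteristic polynomial
coeffs = sp.Poly(p, lam).all_coeffs()[::-1]         # [c0, c1, ..., cn]

pA = sum((c * A**k for k, c in enumerate(coeffs)), sp.zeros(3, 3))
print(pA)                                           # the 3x3 null matrix
```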
Let $n$ be a natural number. First we prove the statement: every $(n\times n)$-matrix $A$ satisfies $p_A(A) = 0$.
We use the matrix equation underlying Cramer's rule. We recall that the $(i,j)$-entry of the adjoint matrix $\operatorname{adj}(A)$ is $(-1)^{i+j}\,\det\!\left(A_{j i}\right)$, where $A_{j i}$ is obtained from $A$ by deleting row $j$ and column $i$. Said equation is
\[
\det(A)\cdot I_n = A\cdot \operatorname{adj}(A).
\]
If we replace $A$ by $A - \lambda\cdot I_n$, we find
\[
p_A(\lambda)\cdot I_n = \det\!\left(A - \lambda\, I_n\right)\cdot I_n = \left(A - \lambda\, I_n\right)\cdot \operatorname{adj}\!\left(A - \lambda\, I_n\right).
\]
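This identity between polynomials in $\lambda$ can also be verified with a computer algebra system; the SymPy sketch below does so for an arbitrarily chosen $3\times 3$ matrix (SymPy calls the adjoint matrix the adjugate).

```python
# Verify (A - lambda*I) * adj(A - lambda*I) = det(A - lambda*I) * I for a sample matrix.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2, 0], [0, 3, 1], [4, 0, 1]])    # sample matrix
M = A - lam * sp.eye(3)

diff = (M * M.adjugate() - M.det() * sp.eye(3)).expand()
print(diff)                                          # the 3x3 null matrix
```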
We carry out the proof by applying substitution of an $(n\times n)$-matrix into a more general expression than a polynomial, namely a polynomial whose coefficients are also $(n\times n)$-matrices. We denote by $M_n[\lambda]$ the collection of all such polynomials. An element of $M_n[\lambda]$ is of the form
\[
C_m\,\lambda^m + C_{m-1}\,\lambda^{m-1} + \cdots + C_1\,\lambda + C_0 ,
\]
where $C_0, C_1, \ldots, C_m$ belong to the set of $(n\times n)$-matrices. The factor $A - \lambda\cdot I_n$ in the above equality is an example of an element of $M_n[\lambda]$. The left-hand side of the equality is also an example; here all of the coefficients of powers of $\lambda$ are multiples of the identity matrix $I_n$. Finally, the factor $\operatorname{adj}\!\left(A - \lambda\, I_n\right)$ on the right-hand side is also a polynomial in $\lambda$, of degree at most $n-1$ in $\lambda$ (after all, the determinants in the adjoint matrix are of matrices with $n-1$ rows and $n-1$ columns).
For $n = 1$, the matrices are no different than ordinary numbers and we have that $M_1[\lambda]$ coincides with the set of ordinary polynomials in $\lambda$.
Again, we denote by $q(A) = C_m\, A^m + \cdots + C_1\, A + C_0$ the matrix obtained from $q(\lambda) = C_m\,\lambda^m + \cdots + C_1\,\lambda + C_0$ by substitution of $A$ for $\lambda$. It is easy to see that $M_n[\lambda]$ is again a vector space and that the map assigning to $q(\lambda)$ the matrix $q(A)$ is a linear map from $M_n[\lambda]$ to the vector space of all $(n\times n)$-matrices.
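Substitution in $M_n[\lambda]$ is just as easy to carry out by machine; in the SymPy sketch below a polynomial with matrix coefficients is represented by its list of coefficient matrices (the helper name is ours), and substituting $A$ in the factor $A - \lambda\cdot I_2$ indeed gives the null matrix.

```python
# Represent an element of M_n[lambda] by its coefficient matrices [C0, C1, ..., Cm]
# and substitute a matrix A for lambda: the result is C0 + C1*A + ... + Cm*A^m.
import sympy as sp

def subst(coeffs, A):
    n = A.shape[0]
    return sum((C * A**k for k, C in enumerate(coeffs)), sp.zeros(n, n))

A = sp.Matrix([[1, 2], [3, 4]])                 # sample matrix
# The factor A - lambda*I has coefficients C0 = A and C1 = -I:
print(subst([A, -sp.eye(2)], A))                # the 2x2 null matrix
```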
Elements of $M_n[\lambda]$ can be multiplied by one another with the usual rules, where $\lambda\cdot C$ is rewritten to $C\cdot\lambda$ for each $(n\times n)$-matrix $C$. If $n \geq 2$, then matrix multiplication is no longer commutative, so substitution of $A$ for $\lambda$ no longer respects multiplication. This means that it may happen that $p(\lambda) = q(\lambda)\cdot r(\lambda)$ while $p(A) \neq q(A)\cdot r(A)$. For example, if $A$ and $B$ are invertible matrices that do not commute (that is, $A\cdot B \neq B\cdot A$), then $q(\lambda) = \lambda$ and $r(\lambda) = B$ satisfy
\[
\left(q\cdot r\right)(A) = B\cdot A \neq A\cdot B = q(A)\cdot r(A).
\]
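A concrete instance (the matrices below are chosen only so that they do not commute):

```python
# With q(lambda) = lambda and r(lambda) = B, the product q*r is rewritten to B*lambda,
# so substituting A gives B*A, whereas q(A)*r(A) = A*B; the two differ here.
import sympy as sp

A = sp.Matrix([[1, 1], [0, 1]])
B = sp.Matrix([[1, 0], [1, 1]])                 # A*B != B*A

print(B * A)        # substitution of A in (q*r)(lambda) = B*lambda
print(A * B)        # q(A) * r(A); a different matrix
```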
However, if $A$ commutes with $q(\lambda)$ and $r(\lambda)$, and so commutes with each coefficient of $q(\lambda)$ and $r(\lambda)$, then we do have
\[
p(A) = q(A)\cdot r(A),
\]
because then the product in the right-hand side can be expanded in the same manner as for a polynomial in $\lambda$ (always when $\lambda\cdot C$ is rewritten to $C\cdot\lambda$ for a coefficient $C$ of $q(\lambda)$ or $r(\lambda)$, the same commutation also applies after substitution: $A\cdot C = C\cdot A$).
We apply this statement to the equality derived above. Because $A - \lambda\cdot I_n$ commutes with $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$, and because $\lambda\cdot I_n$ commutes with each matrix, it follows that $A$ commutes with $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$ and, as a consequence, with each coefficient of $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$. Moreover, $A$ obviously commutes with the coefficients $A$ and $-I_n$ of the factor $A - \lambda\cdot I_n$.
If we substitute $A$ for $\lambda$, then the right-hand side becomes equal to the null matrix, because the substitution of $A$ in the factor $A - \lambda\cdot I_n$ yields the null matrix. We have found that substitution of $A$ respects multiplication, thus substitution of $A$ in the entire right-hand side gives the null matrix. We conclude that substitution of $A$ in the left-hand side produces the null matrix as well. This means that substitution of $A$ in $p_A(\lambda)\cdot I_n$ gives the null matrix. Since substituting $A$ for $\lambda$ in $p_A(\lambda)\cdot I_n$ equals $p_A(A)$, this means that $p_A(A) = 0$, which proves the statement about $A$.
The theorem is a direct consequence because, for each basis $\alpha$ of $V$, we have
\[
\left(p_L(L)\right)_\alpha = p_L\!\left(L_\alpha\right) = p_A(A) = 0 ,
\]
where $A = L_\alpha$ is the matrix of $L$ with respect to $\alpha$, whose characteristic polynomial coincides with $p_L$. A linear map whose matrix is the null matrix is the null map, so $p_L(L) = 0$.
Substitution of $A$ for $\lambda$ in $\operatorname{adj}\!\left(A - \lambda\cdot I_n\right)$ does not always give the null matrix. If, for example,
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
then
\[
\operatorname{adj}\!\left(A - \lambda\cdot I_2\right) = \begin{pmatrix} d - \lambda & -b \\ -c & a - \lambda \end{pmatrix} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} - \lambda\cdot I_2 ,
\]
so substitution of $A$ for $\lambda$ gives
\[
\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} - A = \begin{pmatrix} d - a & -2b \\ -2c & a - d \end{pmatrix}.
\]
This is distinct from the null matrix if for example $a \neq d$.
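The same remark can be checked mechanically; the sketch below does this for one sample matrix with $a \neq d$, writing $\operatorname{adj}\!\left(A - \lambda\cdot I_2\right)$ as $C_0 + C_1\,\lambda$ and substituting $A$ for $\lambda$.

```python
# Substituting A for lambda in adj(A - lambda*I) = C0 + C1*lambda gives C0 + C1*A,
# which is not the null matrix for this sample matrix.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2], [3, 4]])                 # a sample matrix with a != d

adj = (A - lam * sp.eye(2)).adjugate()          # Matrix([[4 - lam, -2], [-3, 1 - lam]])
C0 = adj.subs(lam, 0)                           # constant coefficient
C1 = adj.diff(lam)                              # coefficient of lambda (adj is linear in lambda)

print(C0 + C1 * A)                              # Matrix([[3, -4], [-6, -3]])
```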
Consider the matrix
\[
A = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}
\]
and the polynomial
\[
p(x) = x^2 - 3\, x + 2 .
\]
Calculate $p(A)$.

In order to calculate the matrix $p(A)$, we substitute $A$ for $x$ in the polynomial $p(x)$ and we simplify the result to a single $(2\times 2)$-matrix:
\[
p(A) = A^2 - 3\, A + 2\, I_2 = \begin{pmatrix} -1 & 2 \\ -1 & -2 \end{pmatrix} - \begin{pmatrix} 3 & 6 \\ -3 & 0 \end{pmatrix} + \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} -2 & -4 \\ 2 & 0 \end{pmatrix}.
\]
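The outcome can be confirmed with SymPy:

```python
# Check the computation of p(A) for the matrix and polynomial of the exercise.
import sympy as sp

A = sp.Matrix([[1, 2], [-1, 0]])
print(A**2 - 3 * A + 2 * sp.eye(2))             # Matrix([[-2, -4], [2, 0]])
```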