To a linear map #L# of an #n#-dimensional vector space #V# to itself belongs a matrix #A=L_\alpha# as soon as we have chosen a basis #\alpha# for #V#. If #L_\beta# is the matrix with respect to a second basis #\beta# for #V#, then #L_\alpha# and #L_\beta# are conjugate, meaning: there exists an invertible #(n\times n)#-matrix #T# with the property that #L_\beta=T^{-1}\,L_\alpha\,T#. For example, the transition matrix #T = {}_\alpha I_\beta# has this property.
We keep in mind that a matrix #A# is diagonalizable if it is conjugate to a diagonal matrix, and that not all matrices are diagonalizable. Here we will discuss a normal form that can be found for all matrices. We will start with matrices whose characteristic polynomial has only one root.
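As a quick sanity check (a sketch assuming Python with SymPy; the variable names are ours), the shear matrix below has characteristic polynomial #(1-x)^2# but is not diagonalizable:

```python
from sympy import Matrix

# The shear matrix has 1 as its only eigenvalue (characteristic
# polynomial (1 - x)^2), but its eigenspace is only one-dimensional,
# so it cannot be conjugate to a diagonal matrix.
A = Matrix([[1, 1],
            [0, 1]])

print(A.is_diagonalizable())        # False
print((A - Matrix.eye(2)).rank())   # 1, so dim E_1 = 2 - 1 = 1
```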
Each #(n\times n)#-matrix #A# with eigenvalue #\lambda# and characteristic polynomial #(\lambda-x)^n# can be transformed to Jordan normal form (in short: Jordan form); meaning: there is an invertible matrix #S# such that
\[
S^{-1}AS=J=\left(\begin{array}{ccc} J_{i_1} & & \\ & \ddots & \\ & & J_{i_k}
\end{array}\right)
\] where the #(i\times i)#-matrix #J_{i}# is equal to
\[
J_{i}=\left(\begin{array}{cccc}\lambda & 1 & & \\ & \lambda & \ddots & \\
& & \ddots & 1\\ & & & \lambda
\end{array}\right)
\] and is called the Jordan block of size #i# corresponding to eigenvalue #\lambda#.
The number #j_i# of Jordan blocks of size #i# satisfies
\[j_i =2\dim{\ker{(A-\lambda\,I_n)^{i}}}-\dim{\ker{(A-\lambda\,I_n)^{i+1}}}-\dim{\ker{(A-\lambda\,I_n)^{i-1}}}\] Assume that #B# is also an #(n\times n)#-matrix with eigenvalue #\lambda# whose characteristic polynomial is #(\lambda-x)^n#. Then #A# and #B# are conjugate if and only if the number of Jordan blocks of size #i# match for #i = 1,\ldots,n#.
For a diagonalizable matrix all Jordan blocks have size #1#.
Multiple Jordan blocks of the same size corresponding to eigenvalue #\lambda# are possible. The total number of Jordan blocks is equal to the dimension of the corresponding eigenspace #E_\lambda#, the null space #\ker{(A-\lambda\,I_n)}#.
From the Jordan form #J# of #A# we can tell right away what the characteristic polynomial of the matrix #A# is: #\det(A-x\cdot I_n)=\det(J-x\cdot I_n)=(\lambda-x)^{n}#. A simple example is given by the matrices #\matrix{0&0\\ 0&0}# and #\matrix{0&1\\ 0&0}#, both having characteristic polynomial #x^2#.
For determining the Jordan form in case of #r# eigenvalues, we only have to determine the dimensions #e_{\lambda_s,i}# of the null spaces #\ker{(A-\lambda_s\cdot I_n)^i}#, where #s=1,\ldots,r# and #i# runs from #1# to #\ell_s#, the multiplicity of #\lambda_s# in the minimal polynomial, equal to the lowest index #i# for which #e_{\lambda_s,i}# is maximal.
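In practice the dimensions #e_{\lambda_s,i}# can be computed as rank deficiencies. A small sketch (assuming SymPy; `kernel_dims` is our own name) that lists them until the sequence stabilizes:

```python
from sympy import Matrix, eye

def kernel_dims(A, lam):
    """e_i = dim ker (A - lam I)^i for i = 1, 2, ..., up to the first index
    at which the sequence becomes constant; the length of the returned list
    is the multiplicity ell of lam in the minimal polynomial."""
    n = A.rows
    dims = []
    i = 1
    while True:
        dims.append(n - ((A - lam * eye(n)) ** i).rank())
        if len(dims) >= 2 and dims[-1] == dims[-2]:
            return dims[:-1]   # drop the repeated value
        i += 1

A = Matrix([[2, 1], [0, 2]])
print(kernel_dims(A, 2))   # [1, 2]: ell = 2 and dim E_2^* = 2
```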
First we determine the Jordan normal form for #A#. It is sufficient to prove the statement for #\lambda = 0#; once this is proven, we apply the results to #A-\lambda\,I_n#, which has eigenvalue #0#. In order to find the Jordan form for #A#, we then only need to add #\lambda# times the identity to the Jordan blocks found for #A-\lambda\,I_n#.
Let #\ell# be the degree of the minimal polynomial of #A#. Then #\ell# is the smallest number such that #A^\ell = 0_n#. For every positive integer #r \le \ell# we choose a complement #U_{r}# of #\ker{A^{r-1}}+ \left(\im{A}\cap\ker{A^r}\right)# in #\ker{A^r}#. This means we have the following direct sum decomposition:
\[
\ker{A^r}=
\left (\ker{A^{r-1}} + \im{A}\cap \ker{A^{r}}\right) \oplus{U_r}
\] We claim that #V# is the direct sum of all subspaces #A^{j}U_{r}# for #0 \le j \lt r \le \ell#.
To see this, we first prove that #V# is equal to the sum #U# of the vector spaces #A^{j}U_{r}#. To this end, we prove successively, for #i=\ell,\ell-1,\ldots,1#, the following claim:
\[
\im{A} \cap \ker{A^{i}}
\subseteq \ker{A^{i-1}}+ U
\] The case #i=\ell# boils down to the inclusion #\im{A}\cap\ker{A^\ell}\subseteq\ker{A^{\ell-1}}# which follows directly from #A\, A^{\ell-1} = A^\ell = 0_n#. Suppose, therefore, that #i \lt \ell# and assume that the claim is proven for #i+1,\ldots,\ell#.
Suppose #\vec{v} \in \im{A}\cap \ker{A^i}#. Then there exists a vector #\vec{w}\in \ker{A^{i+1}}# such that #\vec{v} = A\vec{w}#. From the assumption it follows that
\[\begin{array}{rcl}
\ker{A^{i+1}}&=&
\left (\ker{A^{i}} + \im{A}\cap \ker{A^{i+1}}\right)
\oplus{U_{i+1} }\\ &&\phantom{xx}\color{blue}{\text{definition of }U_{r}}\\ &\subseteq&
\ker{A^{i}}
+
\im{A}\cap\ker{A^{i+1}} +U\\&&\phantom{xx}\color{blue}{U_{i+1}\subseteq U}\\
&\subseteq& \ker{A^{i}}+U\\ &&\phantom{xx}\color{blue}{\text{assumption}}\end{array}
\] from which we conclude that #\vec{v} = A\vec{w} \in A(\ker{A^i})+A\,U\subseteq \ker{A^{i-1}} + U#. Thus we have proven the claim for #i=\ell,\ldots,1#. Consequently, for each #r\le\ell#, \[\ker{A^r} = \left (\ker{A^{r-1}} + \im{A}\cap \ker{A^r}\right) \oplus {U_r}\subseteq \ker{A^{r-1}} + U\]
so
\[\begin{array}{rcl} V & = & \ker{A^\ell}\\ &&\phantom{xx}\color{blue}{A^\ell = 0_n}\\ & \subseteq & \ker{A^{\ell -1}}+U\\ &&\phantom{xx}\color{blue}{\text{claim}}\\&\vdots&\\ & \subseteq & \ker{A^{0}}+U\\ &&\phantom{xx}\color{blue}{\text{claim}}\\ & =& U\\ &&\phantom{xx}\color{blue}{\text{claim}}\\\end{array}\] This way we have determined that #V# is the sum of all subspaces #A^{j}U_{r}# for #0 \le j \lt r \le \ell#.
To show that this sum is a direct sum, we have to verify that the intersection of the summand #A^{j}U_{r}# with the sum of all other summands only consists of the zero vector. This follows directly from the following statement:
If #\sum_{j,r} A^{j}\vec{x}_{j,r} =\vec{0}#, where #0\le j \lt r\le \ell# and #\vec{x}_{j,r} \in U_{r}#, we have #\vec{x}_{j,r} = \vec{0}# for all #j, r#.
Indeed, if the image #A^{j}\vec{x}_{j,r}# of some nonzero vector #\vec{x}_{j,r}# belonged to the sum of the summands of #U# other than #A^{j}U_{r}#, then a non-trivial linear combination as in this statement could be found, which leads to a contradiction with the assumption that #\vec{x}_{j,r}# is distinct from the zero vector.
Assume that this statement is not true. Then there are indices #j# and #r# such that #\vec{x}_{j,r}\ne\vec{0}#. Since #U_{r}# intersects #\ker{A^{r-1}}# only in #\{\vec{0}\}#, we have #A^{r-1}\vec{x}_{j,r}\ne\vec{0}#, and, since #j\lt r#, also #A^{j} \vec{x}_{j,r}\ne\vec{0}#. There exists a maximal index #i\le \ell# such that, for a certain #r#, we have #A^{r-i}\vec{x}_{r-i,r}\neq \vec{0}#.
Now #A^{r-m}\vec{x}_{r-m,r}= \vec{0}# for each #r# and #m# with #m\gt i#. Therefore, the equality in the assumption of the statement can be rewritten as
\[\sum_{r=i}^\ell A^{r-i}\vec{x}_{r-i,r}=-\sum_{m=1}^{i-1}\sum_{r=m}^\ell A^{r-m}\vec{x}_{r-m,r}\] Because #U_r\subseteq \ker{A^r}#, we have #A^{r-m}\vec{x}_{r-m,r} \in\ker{A^m}#. Hence, the right-hand side of the equality above lies in #\ker{A^{i-1}}#. For the left-hand side we have the same:
\[
\sum_{r=i}^{\ell} A^{r-i}\vec{x}_{r-i,r} \in\ker{A^{i-1}}
\] Write #\vec{y} = \sum_{r=i+1}^{\ell} A^{r-i-1}\vec{x}_{r-i,r}# and #\vec{x} = \sum_{r=i}^{\ell} A^{r-i}\vec{x}_{r-i,r}#, so
\[
\vec{x} - A\vec{y} =
\vec{x}_{0,i} \in
(\ker{A^{i-1}}+ \im{A} \cap
\ker{A^i}) \cap U_{i} =\{ \vec{0}\}
\] We find that #\vec{x}_{0,i} = \vec{0}# and #\vec{x} = A\vec{y}#. In particular we have #\vec{y} \in \ker{A^{i}}#. By reasoning for #\vec{y}# in the same manner as we did above for #\vec{x}# (with #i-1# instead of #i#), etc., we find #\vec{x}_{1,i+1}= \cdots =\vec{x}_{\ell-i,\ell} =\vec{0}#. But this contradicts our choice of #i#.
We have shown that #V# is the direct sum of subspaces #A^{j}U_{r}#. For every #r =\ell,\ell-1,\ldots,1# we choose a basis #\basis{\vec{v}_{r,1},\ldots,\vec{v}_{r,j_r}}# of #U_{r}#. We then get #j_r# Jordan blocks #J_r# with respect to the basis consisting of #A^{r-1}\vec{v}_{r,k}, A^{r-2}\vec{v}_{r,k}, \ldots, \vec{v}_{r,k}# for #k=1,\ldots,j_r#. By joining these vectors for #r=\ell, \ell-1, \ldots,1# we get a basis of the whole vector space with respect to which the matrix of #A# takes on the correct form.
For a proof of the formula, we write #e_i = \dim{\ker{(A-\lambda\,I_n)^i}}#. Choose a basis of #\ker{(A-\lambda\,I_n)^i}# adapted to the Jordan blocks. Each Jordan block of size at least #i# contributes exactly one basis vector that lies outside #\ker{(A-\lambda\,I_n)^{i-1}}#; the blocks of size bigger than #i# account for #j_{i+1}+\cdots+j_k# of these vectors. Moreover, exactly #e_{i-1}# basis vectors are contained in #\ker{(A-\lambda\,I_n)^{i-1}}#. Exactly #e_i-e_{i-1}-(j_{i+1}+\cdots+j_k)# basis vectors remain, each belonging to a unique Jordan block of size #i#. We conclude
\[j_i = e_i-e_{i-1}-(j_{i+1}+\cdots+j_k)\] This statement is true for all natural numbers #i#. In particular we have
\[j_{i+1} = e_{i+1}-e_{i}-(j_{i+2}+\cdots+j_k)\] Application of these formulas gives
\[\begin{array}{rcl}j_{i} &=&e_i-e_{i-1}-(j_{i+1}+\cdots+j_k)\\ &&\phantom{xxx}\color{blue}{\text{formula for }j_i}\\ &=&e_i-e_{i-1}-j_{i+1}-(j_{i+2}+\cdots+j_k)\\ &&\phantom{xxx}\color{blue}{j_{i+1} \text{ taken separately}}\\ &=&e_i-e_{i-1}-(e_{i+1}-e_{i}-(j_{i+2}+\cdots+j_k))-(j_{i+2}+\cdots+j_k)\\ &&\phantom{xxx}\color{blue}{\text{formula for }j_{i+1}}\\ &=&2e_i-e_{i-1}-e_{i+1}\\&&\phantom{xxx}\color{blue}{\text{simplified}}\\\end{array}\]
Now we verify that two #(n\times n)#-matrices #A# and #B# whose characteristic polynomial is #(\lambda-x)^n# for a certain scalar #\lambda# are conjugate if and only if the numbers of Jordan blocks of size #i# match for all #i = 1,\ldots,n#. Because of the above, we know that each of the two matrices is conjugate to a Jordan form and that the lists of sizes of the corresponding Jordan blocks are equal, apart from the order. Two Jordan blocks in a matrix can be interchanged by conjugation with a permutation matrix; the corresponding permutation swaps the basis vectors of one Jordan block with those of the other. By rearranging Jordan blocks in this way, we can transform a Jordan form for #A# by conjugation with a permutation matrix into a Jordan form for #B#. Thus, we have proven that #A# and #B# are conjugate if they have the same number of Jordan blocks of size #i# for all #i = 1,\ldots,n#.
Finally we note that #A# and #B# are not conjugate if the number of Jordan blocks of size #i# differs for some #i#. This follows from the fact that the numbers #j_i# in the formulas above are expressed in terms of the dimensions of the kernels of #\left(A-\lambda \cdot I_n\right)^i#, numbers that do not change if #A# is replaced by a conjugate.
Since the blocks belong to a direct sum decomposition, #n=\dim{V}# is the sum of the dimensions of the Jordan blocks. This means that \[n = \sum_{i=1}^\ell i\cdot j_i\]where #\ell# is the multiplicity of #\lambda# in the minimal polynomial. On the one hand, this formula can be used as a verification of all found values of #j_i#. On the other hand, the formula is a key to the interpretation of the numbers #j_i# as a partition of #n#, that is, a way to express #n# as a sum of natural numbers. Here, #j_i# indicates the number of terms in the sum equal to #i#. The order of these terms does not matter.
The partition \[n = \underbrace{1+1+\cdots+1}_{j_1\text{ terms}}+\underbrace{2+2+\cdots+2}_{j_2\text{ terms}}+\cdots\]determines the Jordan normal form #J# at #\lambda# with, along the diagonal, first #j_1# Jordan blocks of size #1#, next #j_2# Jordan blocks of size #2#, and so on. Because of this, the Jordan normal form corresponding to #\lambda# is determined uniquely (apart from the order of the Jordan blocks, hence up to conjugacy by a permutation matrix). Because two #(n\times n)#-matrices #A# and #B# with the same characteristic polynomial #(\lambda-x)^n# are conjugate if and only if the numbers #j_i# of Jordan blocks coincide for all #i=1,\ldots,\ell#, the number of conjugacy classes is equal to the number of possible partitions of #n#.
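The partition count can be computed with a standard dynamic program; the sketch below (the helper name is ours) tabulates the number of partitions for small #n#:

```python
def partition_count(n):
    """Number of ways to write n as a sum of positive integers, order ignored."""
    p = [1] + [0] * n
    for part in range(1, n + 1):        # allow parts of size `part`
        for total in range(part, n + 1):
            p[total] += p[total - part]
    return p[n]

# p(4) = 5: the partitions 4, 3+1, 2+2, 2+1+1, 1+1+1+1 correspond to the
# five conjugacy classes of 4x4 matrices with a single eigenvalue.
print([partition_count(n) for n in range(1, 8)])   # [1, 2, 3, 5, 7, 11, 15]
```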
We apply these results to the matrix of a linear map #L:V\to V# restricted to each of its generalized eigenspaces. The information on the Jordan blocks uniquely determines the conjugacy class of #L#.
Assume that #V# is a vector space of finite dimension #n#, that #L:V\to V# is a linear map, and that the characteristic polynomial #p_L(x)# of #L# is a product of linear factors:
\[p_L(x) = (\lambda_1-x)^{k_1}\cdot(\lambda_2-x)^{k_2}\cdots (\lambda_r-x)^{k_r}\]where #\lambda_1,\ldots,\lambda_r# are different from each other.
Then there are unique numbers \[ j_{\lambda_s,i}\phantom{xxx} \text{ with }\phantom{xxx} s=1,\ldots,r\phantom{xx}\text{ and }\phantom{xx} i = 1,\ldots,k_s\] such that the matrix of the restriction of #L# to the generalized eigenspace corresponding to #\lambda_s# with respect to a suitably chosen basis has a Jordan normal form with #j_{\lambda_s,i}# blocks of size #i#.
A linear map #M:V\to V# has the same matrix as #L# relative to a suitably chosen basis for #V# if and only if it has the same eigenvalues as #L# and the same numbers #j_{\lambda_s,i}# (for #i=1,\ldots,k_s#) of Jordan blocks corresponding to each eigenvalue #\lambda_s# (for #s=1,\ldots,r#).
This solves the conjugation problem for complex vector spaces, since then each polynomial is a product of linear factors. In the real case the characteristic polynomial may have quadratic factors with non-real complex roots. This case will be discussed later.
According to the direct sum decomposition we can write #V# as the direct sum of the generalized eigenspaces #E_\lambda^*#, where #\lambda# runs over the roots #\lambda_1,\ldots,\lambda_r# of #p_L(x)#. Since the #E_\lambda^*# are invariant subspaces of #V# under #L# and since their order can be changed to any other order by conjugation with a permutation matrix, we can limit ourselves to finding and studying the Jordan normal form for each of #\lambda_1,\ldots,\lambda_r#. The result for #L# restricted to each of the generalized eigenspaces follows directly from the above theorem The Jordan form with one eigenvalue.
The information on the sizes of the Jordan blocks suffices to determine the minimal polynomial: if #\ell_s# is the maximal size of a Jordan block corresponding to eigenvalue #\lambda_s# (that is, the largest #i# with #j_{\lambda_s,i}\gt 0#, for #s=1,\ldots,r#), then
\[m_A(x) = \prod_{s=1}^r (x-\lambda_s)^{\ell_s}\]
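This can be checked on a concrete block-diagonal matrix: the stated product annihilates it, while lowering an exponent fails. A sketch assuming SymPy, with a hypothetical block pattern of our choosing:

```python
from sympy import diag, eye, zeros

def jordan_block(lam, size):
    """lam on the diagonal, 1 on the superdiagonal."""
    J = lam * eye(size)
    for k in range(size - 1):
        J[k, k + 1] = 1
    return J

# Blocks J_2(5), J_1(5), J_3(7): the maximal block sizes are ell = 2 for
# eigenvalue 5 and ell = 3 for eigenvalue 7, so the minimal polynomial
# should be (x - 5)^2 (x - 7)^3.
A = diag(jordan_block(5, 2), jordan_block(5, 1), jordan_block(7, 3))
I = eye(6)

print((A - 5*I)**2 * (A - 7*I)**3 == zeros(6))   # True: m_A annihilates A
print((A - 5*I)**1 * (A - 7*I)**3 == zeros(6))   # False: a lower exponent fails
```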
To determine the conjugacy class of a complex square matrix we can use a table in which the Jordan blocks are described for each eigenvalue. To guarantee the uniqueness of the Jordan normal form we should order the occurring Jordan blocks along the diagonal in the Jordan normal form. Yet we do not worry about this, since we know that each order can be changed into every other order by a permutation matrix and because it is easy to check if tables with eigenvalues and sizes of corresponding Jordan blocks are equal to one another.
For the conjugacy class of a complex #(n\times n)#-matrix #A# we have now found a good characterization: the eigenvalues #\lambda# together with the numbers #j_{\lambda,i}# of Jordan blocks of size #i# corresponding to each eigenvalue #\lambda# uniquely determine the conjugacy class.
This data is equivalent to the dimensions #e_{\lambda,i}# of the kernels of #(A-\lambda\cdot I_n)^i#. We have already seen that the numbers #j_{\lambda,i}# follow from the dimensions #e_{\lambda,i}# thanks to the formula \[j_{\lambda,i} = 2e_{\lambda,i}-e_{\lambda,i-1}-e_{\lambda,i+1}\]Conversely it is easy to see that
\[e_{\lambda,i} = e_{\lambda,i-1}+j_{\lambda,i}+j_{\lambda,i+1}+\cdots +j_{\lambda,\ell}\]where #\ell# is the multiplicity of #\lambda# in the minimal polynomial #m_A(x)# of #A#.
For the interpretation in terms of Jordan normal form the numbers #j_{\lambda,i}# are useful. For calculations the numbers #e_{\lambda,i}# are preferable: on the one hand they directly follow from determining the dimensions of the kernels (or images) of linear maps, on the other hand they form a strictly increasing sequence ending at index #\ell#:\[e_{\lambda,1}\lt e_{\lambda,2}\lt\cdots \lt e_{\lambda,\ell} = e_{\lambda,\ell+1}=\cdots=\dim{E_\lambda^*}\]
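The two conversion formulas are inverse to each other, as the following round-trip sketch (the function names are our own) illustrates for a single eigenvalue:

```python
def e_from_j(j):
    """Given j[i-1] = number of Jordan blocks of size i (for i = 1..ell),
    return [e_0, e_1, ..., e_ell] via e_i = e_{i-1} + j_i + ... + j_ell."""
    e = [0]
    for i in range(1, len(j) + 1):
        e.append(e[-1] + sum(j[i - 1:]))
    return e

def j_from_e(e):
    """Recover the block counts via j_i = 2 e_i - e_{i-1} - e_{i+1}."""
    ext = e + [e[-1]]                     # e_{ell+1} = e_ell
    return [2 * ext[i] - ext[i - 1] - ext[i + 1] for i in range(1, len(e))]

j = [1, 2, 0, 1]         # one block of size 1, two of size 2, one of size 4
e = e_from_j(j)
print(e)                 # [0, 4, 7, 8, 9]: strictly increasing up to e_4 = n = 9
print(j_from_e(e) == j)  # True
```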
Consider the matrix \[ A = \matrix{1 & -1 & -2 & -1 \\ 0 & 1 & -1 & 1 \\ 1 & 1 & 4 & 0 \\ 0 & 0 & 0 & 3 \\ } \] The characteristic polynomial of the matrix is equal to #{\left(x-3\right) \left(x-2\right)^3}#. Therefore, the eigenvalues of #A# are #2# and #3#.
Which of the matrices below is a Jordan normal form of #A#?
#\text{Jordan normal form for }A=# #\matrix{2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ }#
Since the multiplicity of #{2}# in the characteristic polynomial is equal to #3#, the dimension of generalized eigenspace #E_{2}^*# equals #{3}#. Since the multiplicity of #{3}# in the characteristic polynomial is equal to #1#, the dimension of generalized eigenspace #E_{3}^*# equals #1#.
To determine the sizes of the Jordan blocks, we first calculate the dimensions of the kernels #\ker{(A-\lambda\,I_4)^{i}}# for the eigenvalues #\lambda = 2#, #3# and #i=1,2,\ldots#. If #i\ge 3#, this dimension is equal to #3# for #\lambda=2#, and if #i\ge 1#, this dimension is equal to #1# for #\lambda = 3#.
\[\begin{array}{l|r}\text{subspace}\phantom{i}&\text{dimension}\\
\hline
\ker{(A-2\,I_4)^{3}}& 3\\
\ker{(A-2\,I_4)^{2}}&2\\
\ker{A-2\,I_4}&1\\
\ker{A-3\,I_4}&1\\
\hline
\end{array}\]According to the theorem
The Jordan form with one eigenvalue, the numbers #j_1#, #j_2#, #j_3# of Jordan blocks of #A# corresponding to eigenvalue #2# of size #1#, #2#, #3#, respectively, can be determined as follows (not all steps are necessary because the dimension #3# has already reached the dimension of the generalized eigenspace after the first step):
\[\begin{array}{rcrcl}
j_3 &=&
2\dim{\ker{(A-2 I_4)^{3}}}-\dim{\ker{(A-2 I_4)^{4}}}-\dim{\ker{(A-2 I_4)^{2}}}&=&1\\
j_2 &=&
2\dim{\ker{(A-2 I_4)^{2}}}-\dim{\ker{(A-2 I_4)^{3}}}-\dim{\ker{(A-2I_4)}}&=&0\\
j_1 &=&
2\dim{\ker{(A-2 I_4)}}-\dim{\ker{(A-2 I_4)^{2}}}-\dim{\ker{I_4}}&=&0
\end{array}\]Similarly, we find that only the number of Jordan blocks of size #1# is nonzero for eigenvalue #3#. We conclude that #A# has
- exactly one Jordan block of size #3# on the generalized eigenspace corresponding to #2#
- exactly one Jordan block of size #1# on the generalized eigenspace corresponding to #3#.
Placing these Jordan blocks along the diagonal, we conclude that the Jordan form of #A# is equal to\[\matrix{2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ }\]
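The whole computation can be reproduced mechanically (a sketch assuming SymPy): the kernel dimensions match the table above, and SymPy's built-in `jordan_form` confirms the normal form up to the order of the blocks.

```python
from sympy import Matrix, eye

A = Matrix([[1, -1, -2, -1],
            [0,  1, -1,  1],
            [1,  1,  4,  0],
            [0,  0,  0,  3]])

# Kernel dimensions e_i = 4 - rank((A - 2 I)^i), as in the table above.
dims = [4 - ((A - 2 * eye(4)) ** i).rank() for i in (1, 2, 3)]
print(dims)   # [1, 2, 3]

# Independent check: P^-1 A P = J for the Jordan form J found by SymPy.
P, J = A.jordan_form()
print(P.inv() * A * P == J)   # True
```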