The Gram-Schmidt procedure we dealt with, and the corresponding relationship between a system of vectors and a system of orthonormal vectors, can also be captured in a matrix decomposition. We will first formulate the decomposition in a theorem, and then show how the associated matrices can be found.
If #A# is an #(m \times n)#-matrix with linearly independent column vectors, then this matrix can be written as a product \[A=Q\,R\] where #R# is an #(m\times n)#-upper triangular matrix, and the columns of the #(m \times m)#-matrix #Q# form an orthonormal system. Such a product is called a QR decomposition of #A#.
The matrix #A = \matrix{3\\4}# has a QR decomposition \[A =\left( \frac{1}{5}\matrix{3&4\\ 4&-3}\right)\, \matrix{5\\ 0}\] Here, the columns of #Q = \frac{1}{5}\matrix{3&4\\ 4&-3}# form an orthonormal system, and the matrix #R = \matrix{5\\ 0}# is upper triangular.
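This example is easy to verify numerically. Below is a minimal sketch using NumPy's np.linalg.qr routine (one standard way to compute a QR decomposition; note that it may return the two factors with opposite signs compared to the ones above):

```python
import numpy as np

A = np.array([[3.0], [4.0]])

# mode="complete" returns a square (m x m) matrix Q and an (m x n) matrix R,
# matching the shape convention of the theorem above.
Q, R = np.linalg.qr(A, mode="complete")

print(Q)                       # orthonormal columns (signs may differ)
print(R)                       # upper triangular (2 x 1) matrix
print(np.allclose(Q @ R, A))   # True: A = Q R
```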
The assumption that the columns of #A# are independent implies that #m\ge n#. After all, the columns are vectors of the #m#-dimensional vector space #\mathbb{R}^m#, and this space contains no more than #m# linearly independent vectors.
Let #A# be an #(m\times n)#-matrix with #n# independent column vectors. We denote the column vectors of #A# by #\vec{a}_1,\ldots, \vec{a}_n#. These are vectors from #\mathbb{R}^m#. By means of the Gram-Schmidt procedure we can construct an orthonormal basis #\vec{q}_1,\ldots,\vec{q}_n# for the span of #\vec{a}_1,\ldots, \vec{a}_n#.
We will now construct this orthonormal basis and record the steps taken in elementary matrices. Here we use the following notation:
- #D_{i,\lambda}# is the #(n\times n)#-diagonal matrix with #\lambda# at entry #(i,i)# and ones on the remainder of the diagonal
- #E_{j,k,\lambda}# (where #j\ne k#) is the #(n\times n)#-matrix having #(j,k)#-entry equal to #\lambda#, ones on the diagonal, and zeros everywhere else
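Both kinds of elementary matrices are easy to build explicitly. Here is a minimal NumPy sketch; the 1-based indices follow the notation above:

```python
import numpy as np

def D(i, lam, n):
    """(n x n) diagonal matrix with lam at entry (i, i) and ones
    on the remainder of the diagonal (1-based index i)."""
    M = np.eye(n)
    M[i - 1, i - 1] = lam
    return M

def E(j, k, lam, n):
    """(n x n) matrix with (j, k)-entry lam (j != k), ones on the
    diagonal, and zeros everywhere else (1-based indices)."""
    M = np.eye(n)
    M[j - 1, k - 1] = lam
    return M
```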
Start with the matrices #P_0 = A# and #S_0=I_n# (the identity #(n\times n)#-matrix). We have #A = P_0\, S_0#. We will now iteratively construct matrices #P_i# and #S_i# from #P_{i-1}# and #S_{i-1}# of the same dimensions in such a way that
- the columns of #P_i# always form an intermediate result of the Gram-Schmidt procedure,
- #S_i# is always an upper triangular matrix which records what happened to the columns of #P_0# in order to become #P_i#, and
- #A = P_i\, S_i#.
The first step of the procedure is normalizing the column vector #\vec{a}_1# to get #\vec{q}_1#. Here, #\vec{a}_1# is not the zero vector since it is part of a system of linearly independent vectors. By multiplying the vector #\vec{q}_1# by the norm of #\vec{a}_1#, we recover the vector #\vec{a}_1#. In terms of matrices, this gives\[ \begin{array}{rcl} A&=&\matrix{&&&\\\vec{a}_1&\vec{a}_2& \cdots &\vec{a}_n\\&&&}\\&&\phantom{xx}\color{blue}{\vec{a}_1,\ldots,\vec{a}_n\text{ are the columns of }A}\\&=&\matrix{&&&\\\vec{q}_1&\vec{a}_2&\cdots&\vec{a}_n\\&&&}\,\matrix{\norm{\vec{a}_1}&0&\ldots&\ldots\\0&1&&\\\vdots&&\ddots&\\\vdots&&&1}\\&&\phantom{xx}\color{blue}{\vec{a}_1=\norm{\vec{a}_1}\cdot \vec{q}_1}\\&=&P_1\,S_1\\&&\phantom{xx}\color{blue}{P_1\text{ is the first factor, }S_1\text{ is the diagonal matrix }D_{1,\norm{\vec{a}_1}}}\end{array}\] In the next step of the Gram-Schmidt procedure we calculate the vector #\vec{q}^*_2# by subtracting the vector #(\dotprod{\vec{a}_2}{\vec{q}_1})\,\vec{q}_1# from the vector #\vec{a}_2#. This gives \(\vec{a}_2 =\vec{q}^*_2+(\dotprod{\vec{a}_2}{\vec{q}_1})\vec{q}_1\). For the matrix whose columns are \( \vec{q}_1,\vec{q}^*_2,\vec{a}_3, \ldots ,\vec{a}_n\) this gives the matrix decomposition\[\begin{array}{rcl}P_1&=&\matrix{&&&&\\\vec{q}_1&\vec{q}^*_2&\vec{a}_3& \cdots &\vec{a}_n\\&&&&}\, \matrix{1&(\dotprod{\vec{a}_2}{\vec{q}_1})&0&\cdots\\0&1&&\\\vdots&&\ddots&\\\vdots&&&1}\end{array}\] We see that the first factor #P_2# of this decomposition looks a bit more like a matrix with orthonormal columns, as desired. The second factor of the decomposition is the elementary matrix #E_{1,2,(\dotprod{\vec{a}_2}{\vec{q}_1})}#. This matrix is upper triangular, and so (see LUP decomposition) the product #S_2 = E_{1,2,(\dotprod{\vec{a}_2}{\vec{q}_1})}\, S_1# is again upper triangular. We now have the decomposition \[A = P_1\,S_1 = P_2\, E_{1,2,(\dotprod{\vec{a}_2}{\vec{q}_1})}\,S_1 = P_2\,S_2\]
We continue this way. We normalize the vector #\vec{q}^*_2# as we did for the first column, giving #\vec{q}_2=\frac{1}{\norm{\vec{q}^*_2}}\vec{q}^*_2#. Then we get \[ P_2=\matrix{&&&&\\\vec{q}_1&\vec{q}_2&\vec{a}_3& \cdots &\vec{a}_n\\&&&&}\,D_{2,\norm{\vec{q}^*_2}}\]
Let #P_3# be the first factor of this decomposition and write #S_3 = D_{2,\norm{\vec{q}^*_2}}\, S_2#. Then the first two columns of #P_3# form an orthonormal system, #S_3# is upper triangular, and
\[ A = P_3\,S_3\]
Continuing this way, we obtain, in general, \[P_{i-1}=P_{i}\,E_{i},\phantom{xxx}S_i = E_i\, S_{i-1},\phantom{xxx}\text{ and }\phantom{xxx}A = P_i\, S_i \] where, for each #i#, the matrix #E_i# is either the diagonal matrix \( D_{k,\norm{\vec{q}^*_k}}\) for an index #k\le n# or an elementary matrix of the form #E_{j,k,(\dotprod{\vec{a}_k}{\vec{q}_j})}# for a pair of indices #j,k# with #1\le j\lt k\le n#. Only pairs with #j\lt k# occur because each vector #\vec{a}_k# can be expressed as a linear combination of #\vec{q}_1,\ldots,\vec{q}_k#. In each case, #E_i# is an upper triangular matrix, so the product #S_i = E_i\, S_{i-1}# is an upper triangular matrix again.
Because the columns #\vec{a}_1,\ldots, \vec{a}_n# of #A# are linearly independent, the process ends after the #n#-th column has been normalized, with
- an #(m\times n)#-matrix #P# whose columns form an orthonormal system of #n# vectors in #\mathbb{R}^m#,
- an #(n\times n)#-upper triangular matrix #S#,
- \(A = P\, S\).
We now set #Q# equal to the #(m\times n)#-matrix #P# augmented by #m-n# columns forming an orthonormal basis for the orthogonal complement of the span of the #n# columns of #P#. In other words, if we let #B# be the #(m\times (m-n))#-matrix whose columns form an orthonormal basis of this orthogonal complement, then #Q=\matrix{P&B}#.
Furthermore, we take #R# to be equal to the matrix #S# augmented by #m-n# zero rows to an #(m\times n)#-matrix, so #R = \matrix{S\\ 0}#, where #0# is the #((m-n)\times n)#-zero matrix. Then #Q# is an #(m\times m)#-matrix whose columns form an orthonormal basis of #\mathbb{R}^m#. Moreover, #R# is an #(m\times n)#-upper triangular matrix with the property that
\[A = P\, S = \matrix{P&B}\,\matrix{S\\ 0} = Q\, R\]
This proves the theorem.
Later, a square matrix whose columns form an orthonormal system will be called orthogonal. The rows of such a matrix then also form an orthonormal system.
Often the columns of #Q# are exactly the vectors created by the Gram-Schmidt algorithm. Other orthonormal systems can be formed by changing the signs of these vectors. For instance, the matrix #A = \matrix{3\\4}# has two QR decompositions:
\[A =\left( \frac{1}{5}\matrix{3&4\\ 4&-3}\right)\, \matrix{5\\ 0}= \left(\frac{1}{5}\matrix{-3&4\\ -4&-3}\right)\, \matrix{-5\\ 0}\]
By making such sign changes we can arrange that the diagonal elements of the matrix #R# all become positive. If #m=n#, then, under this additional condition on the diagonal elements of #R#, the QR decomposition is unique.
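Such a sign change is mechanical: whenever a diagonal entry of #R# is negative, flip the sign of the corresponding row of #R# and of the corresponding column of #Q#; the product #Q\,R# is unchanged. A sketch for the square case, built on top of np.linalg.qr:

```python
import numpy as np

def positive_diagonal_qr(A):
    """QR decomposition of a square matrix A in which all diagonal
    entries of R are made positive by simultaneous sign flips."""
    Q, R = np.linalg.qr(A)
    # The diagonal of R is nonzero when A has independent columns.
    signs = np.sign(np.diag(R))
    # Flipping column i of Q and row i of R leaves the product Q R intact.
    return Q * signs, signs[:, None] * R

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])
Q, R = positive_diagonal_qr(A)
print(np.diag(R) > 0)          # [ True  True ]
print(np.allclose(Q @ R, A))   # True
```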
For a proof of this fact, we need some facts that we will prove later when we know more about orthogonal matrices.
Suppose #m=n#, so #A# is a square matrix. Let #P# be the permutation matrix pertaining to the permutation #[n,n-1,n-2,\ldots,1]#. Then #P# satisfies #P^2 = I_n# and #P^\top = P=P^{-1}#. Now find the QR decomposition of #PA^\top P#. This gives the decomposition #P\,A^\top P = Q_1\, R_1#, where the columns of #Q_1# form an orthonormal system, and #R_1# is an upper triangular matrix. This implies #A = (P\,R_1^\top P)\, ( P\, Q_1^\top P )#. The matrix #R = P\,R_1^\top P# is upper triangular, and the columns of the matrix #Q =P \, Q_1^\top P# form an orthonormal system. Therefore, we can call #A = R\, Q# an RQ decomposition.
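This recipe translates directly into code. A minimal sketch, with the inner QR step delegated to np.linalg.qr:

```python
import numpy as np

def rq(A):
    """RQ decomposition A = R Q of a square matrix, obtained from a QR
    decomposition of P A^T P as described above."""
    n = A.shape[0]
    P = np.eye(n)[::-1]                    # permutation matrix of [n, n-1, ..., 1]
    Q1, R1 = np.linalg.qr(P @ A.T @ P)
    R = P @ R1.T @ P                       # upper triangular
    Q = P @ Q1.T @ P                       # orthonormal columns
    return R, Q

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
R, Q = rq(A)
print(np.allclose(R @ Q, A))               # True
print(np.allclose(np.tril(R, -1), 0))      # R is upper triangular
print(np.allclose(Q.T @ Q, np.eye(2)))     # Q has orthonormal columns
```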
As we will see later, the maximum number of independent column vectors of #A# is equal to the rank of #A#. Therefore, the independence condition of the theorem can also be formulated as: the rank of #A# is equal to #n#.
All of the diagonal entries of #R# are nonzero. This follows from the construction of #R# as given in the proof: the diagonal entries of #R# are lengths of nonzero vectors. (It is also a consequence of properties of the rank of a matrix which we will deal with later.)
There are several ways of finding the QR decomposition of a matrix. Here we discuss the method arising from the proof of the above theorem.
For a given #(m\times n)#-matrix #A# whose columns are linearly independent, a QR decomposition can be found by the following extension of the Gram-Schmidt procedure:
Start with the matrices #P=A# and #S=I_n# (the identity matrix). For #k=1,\ldots,n# execute the following changes:
- Calculate #\vec{q}^*_k= \vec{a}_k - \sum_{j=1}^{k-1}(\dotprod{\vec{a}_k}{\vec{q}_j})\,\vec{q}_j#.
- Calculate the length #\norm{\vec{q}^*_k}#.
- Calculate #\vec{q}_k=\frac{1}{\norm{\vec{q}^*_k}}\vec{q}^*_k#.
- Replace the #k#-th column of #P# by #\vec{q}_k#.
- Replace #S# by #D_{k,\norm{\vec{q}_k^*}}\,E_{1,k,(\dotprod{\vec{a}_k}{\vec{q}_1})}\, E_{2,k,(\dotprod{\vec{a}_k}{\vec{q}_2})}\,\cdots\,E_{k-1,k,(\dotprod{\vec{a}_k}{\vec{q}_{k-1}})}\,S#.
Determine an #(m\times (m-n))#-matrix #B# whose #m-n# columns form an orthonormal basis for the orthogonal complement of the span of the #n# columns of #P#.
Write #Q = \matrix{P&B}# and #R = \matrix{S\\ 0}#, where #0# stands for the zero matrix with dimensions #{(m-n)\times n}#.
Then #A = Q\, R# is a QR decomposition of #A#.
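The entire procedure can be sketched as follows in NumPy. For the completion step, scipy.linalg.null_space is one convenient way to obtain the matrix #B#, since the orthogonal complement of the column span of #P# is the null space of #P^\top#:

```python
import numpy as np
from scipy.linalg import null_space

def qr_gram_schmidt(A):
    """QR decomposition of an (m x n) matrix A with linearly independent
    columns, following the extended Gram-Schmidt procedure above."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    P = A.copy()
    S = np.eye(n)
    for k in range(n):
        q_star = A[:, k].copy()
        for j in range(k):
            coeff = A[:, k] @ P[:, j]       # (a_k . q_j)
            q_star -= coeff * P[:, j]       # subtract the projection on q_j
            S[j, k] = coeff                 # effect of E_{j,k,coeff} on S
        norm = np.linalg.norm(q_star)       # nonzero by independence
        P[:, k] = q_star / norm             # the new column q_k
        S[k, k] = norm                      # effect of D_{k,norm} on S
    if m > n:
        B = null_space(P.T)                 # orthonormal basis of the complement
        Q = np.hstack([P, B])
        R = np.vstack([S, np.zeros((m - n, n))])
    else:
        Q, R = P, S
    return Q, R

A = np.array([[3.0], [4.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A))                # True
```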
There is an alternative way to calculate the QR decomposition which avoids building up the matrix #R# during the Gram-Schmidt procedure. Because the columns of #Q#, constructed as in the above procedure, form an orthonormal basis of #\mathbb{R}^m#, we have #Q^{\top}Q=I_m#, where #Q^\top# is the transpose of #Q#. The corresponding matrix #R# can then be found from the equation #A=Q\, R#: \[R=I_m\, R=Q^{\top}\,Q\,R=Q^{\top}\,A\]
Thus we can compute the matrix #R# as #Q^{\top}\, A#.
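In code this shortcut is a single matrix product. A brief sketch, assuming the matrix #Q# has already been obtained (here via np.linalg.qr with mode="complete"):

```python
import numpy as np

A = np.array([[3.0], [4.0]])
Q, _ = np.linalg.qr(A, mode="complete")

R = Q.T @ A                     # R = Q^T A, since Q^T Q = I_m
print(np.allclose(Q @ R, A))    # True
```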
The described steps follow the proof of the theorem QR decomposition above; the steps for #k=1,\ldots,n# carry out the Gram-Schmidt procedure.
For fixed #k#, the elementary matrices #E_{j,k,(\dotprod{\vec{a}_k}{\vec{q}_j})}# for #{j=1,\ldots,k-1}# commute with each other. In the Gram-Schmidt procedure this means that the scalar multiples of #\vec{q}_j# may be subtracted from #\vec{a}_k# in any order.
If #m=n#, then the matrices #Q# and #R# are equal to #P# and #S#, respectively. In this case, the algorithm stops after passing through the steps for #k=1,\ldots,n#, since #A = P\, S# is then already a QR decomposition.
Determine a QR decomposition of the #(3\times 3)#-matrix \[A= {{{1}\over{3}}\,\matrix{-4 & 1 & 2 \\ -2 & 5 & -2 \\ -4 & -8 & -10 \\ }} \]
#A=# #\left({{1}\over{3}}\,\matrix{-2 & 1 & 2 \\ -1 & 2 & -2 \\ -2 & -2 & -1 \\ } \right) \, \matrix{2 & 1 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & 2 \\ }#
We describe the steps of the extended Gram-Schmidt procedure in the table below. The left-hand column contains the changes to the columns of the matrix #A# prescribed by the Gram-Schmidt procedure to arrive at a matrix #Q# whose columns form an orthonormal system. The middle column contains the elementary matrix by which the matrix in the left-hand column must be multiplied from the right in order to recover the matrix of the previous line.
The first line shows how the first column of #A# is normalized. The second line shows how a scalar multiple of the first column is subtracted from the second column so as to make it perpendicular to the first. The corresponding contribution to #R# is a matrix whose #(1,2)#-entry is the only entry differing from the corresponding entry of the identity matrix #I_3#. And so on.
\[ \begin{array}{rclcl}\text{from } A\text{ toward }Q&\phantom{xx}&\text{contribution to }R&\phantom{xx}&\text{build-up of }R\\
\hline
\matrix{-{{2}\over{3}} & {{1}\over{3}} & {{2}\over{3}} \\ -{{1}\over{3}} & {{5}\over{3}} & -{{2}\over{3}} \\ -{{2}\over{3}} & -{{8}\over{3}} & -{{10}\over{3}} \\ } && \matrix{2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ } && \matrix{2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ } \\ \matrix{-{{2}\over{3}} & 1 & {{2}\over{3}} \\ -{{1}\over{3}} & 2 & -{{2}\over{3}} \\ -{{2}\over{3}} & -2 & -{{10}\over{3}} \\ } && \matrix{1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ } && \matrix{2 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ } \\ \matrix{-{{2}\over{3}} & {{1}\over{3}} & {{2}\over{3}} \\ -{{1}\over{3}} & {{2}\over{3}} & -{{2}\over{3}} \\ -{{2}\over{3}} & -{{2}\over{3}} & -{{10}\over{3}} \\ } && \matrix{1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \\ } && \matrix{2 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \\ } \\ \matrix{-{{2}\over{3}} & {{1}\over{3}} & 2 \\ -{{1}\over{3}} & {{2}\over{3}} & 0 \\ -{{2}\over{3}} & -{{2}\over{3}} & -2 \\ } && \matrix{1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ } && \matrix{2 & 1 & 2 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \\ } \\ \matrix{-{{2}\over{3}} & {{1}\over{3}} & {{4}\over{3}} \\ -{{1}\over{3}} & {{2}\over{3}} & -{{4}\over{3}} \\ -{{2}\over{3}} & -{{2}\over{3}} & -{{2}\over{3}} \\ } && \matrix{1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ } && \matrix{2 & 1 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & 1 \\ } \\ \matrix{-{{2}\over{3}} & {{1}\over{3}} & {{2}\over{3}} \\ -{{1}\over{3}} & {{2}\over{3}} & -{{2}\over{3}} \\ -{{2}\over{3}} & -{{2}\over{3}} & -{{1}\over{3}} \\ } && \matrix{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \\ } && \matrix{2 & 1 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & 2 \\ } \end{array}\]
The matrix #Q# appears at the bottom of the left-hand column: #Q = {{1}\over{3}}\,\matrix{-2 & 1 & 2 \\ -1 & 2 & -2 \\ -2 & -2 & -1 \\ }#. The matrix #R# is the product of the matrices in the middle column in reverse order; the first line contains the right-most factor, and the last line contains the left-most factor. The right-hand column displays the product of the contributions to #R# up to the current line. The result is found at the bottom right: #R = \matrix{2 & 1 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & 2 \\ }#. Thus, we find
\[ A = {{1}\over{3}}\,\matrix{-2 & 1 & 2 \\ -1 & 2 & -2 \\ -2 & -2 & -1 \\ }\, \matrix{2 & 1 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & 2 \\ } \]
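A quick NumPy check confirms that the columns of #Q# form an orthonormal system and that the product #Q\,R# recovers #A#:

```python
import numpy as np

A = (1 / 3) * np.array([[-4.0,  1.0,   2.0],
                        [-2.0,  5.0,  -2.0],
                        [-4.0, -8.0, -10.0]])
Q = (1 / 3) * np.array([[-2.0,  1.0,   2.0],
                        [-1.0,  2.0,  -2.0],
                        [-2.0, -2.0,  -1.0]])
R = np.array([[2.0, 1.0, 2.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 2.0]])

print(np.allclose(Q.T @ Q, np.eye(3)))   # Q has orthonormal columns
print(np.allclose(Q @ R, A))             # A = Q R
```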