In an inner product space, every vector #\vec{p}# has a unique nearest point on each finite-dimensional linear subspace #W#; this nearest point is the orthogonal projection of #\vec{p}# on #W#. These results remain valid in the more general case where #W# is an affine subspace.
Let #W# be a finite-dimensional affine subspace of an inner product space #V# and #\vec{x}# a vector of #V#. Then there exists a unique vector #\vec{y}# in #W# such that #\vec{x}-\vec{y}# is perpendicular to #W#.
This vector #\vec{y}# is called the orthogonal projection of #\vec{x}# on #W#, and is denoted by #P_W(\vec{x})#.
The statement that #\vec{x}-\vec{y}# is perpendicular to the subspace #W# means that #\dotprod{(\vec{x}-\vec{y})}{\vec{w}}=0# for all #\vec{w}# in #U#, where #U# is the direction space of #W#.
The concept of orthogonal projection is visualized in the picture below. In this particular case, #W# is a #2#-dimensional linear subspace of #V=\mathbb{R}^3#, represented by the shaded area. The vectors #\vec{x}# and #\vec{y}# are as in the definition.
[3D picture: the vectors #\vec{x}# and #\vec{y}# and the shaded plane #W#]
Here #\vec{x}-\vec{y}# is the dotted vector, which is perpendicular to #W#.
We first prove the theorem for the case where #W# is a linear subspace of #V#. According to the Gram-Schmidt Theorem each finite-dimensional subspace has an orthonormal basis. Let #\basis{\vec{a}_1,\ldots,\vec{a}_k}# be such a basis for #W#. Because an orthogonal projection is a vector in #W#, we can write such a projection #\vec{y}# as a linear combination #\lambda_1 \vec{a}_1+\cdots + \lambda_k\vec{a}_k# for some scalars #\lambda_1 ,\ldots ,\lambda_k#. The requirement that #\vec{x}-\vec{y}# be perpendicular to #W# means that, for each #j# with #1\le j\le k#,
\[
\dotprod{
(\vec{x}-(\lambda_1 \vec{a}_1+\cdots + \lambda_k\vec{a}_k))}{\vec{a}_j}=0\]
Using the orthonormality of the basis and the linearity of the inner product, this equation reduces to #\lambda_j =\dotprod{\vec{x}}{\vec{a}_j}#. As a consequence, the vector \[\vec{y}=(\dotprod{\vec{x}}{\vec{a}_1} )\vec{a}_1+\cdots +(\dotprod{\vec{x}}{\vec{a}_k})\vec{a}_k\] is uniquely determined by the requirement that #\vec{x}-\vec{y}# be perpendicular to #W#. In addition, this vector belongs to #W# and #\vec{x}-\vec{y}# lies in #W^\perp#. Therefore, there is exactly one orthogonal projection of #\vec{x}# on #W#. This proves the theorem in the case where #W# is a linear subspace.
Now suppose that #W# is an affine subspace of #V#. Then there are a support vector #\vec{a}# and a direction space #U# such that #W = \vec{a}+U#. The vector #\vec{y}=\vec{a}+P_U(\vec{x}-\vec{a})# is an orthogonal projection of #\vec{x}# on #W#, because
- #P_U(\vec{x}-\vec{a})# belongs to #U#, so the vector #\vec{y}# belongs to #\vec{a}+U=W#;
- the vector #\vec{x}-\vec{y}# is equal to #(\vec{x}-\vec{a}) -P_U(\vec{x}-\vec{a})# and is thus perpendicular to #U#, by the result for the linear subspace #U# applied to the vector #\vec{x}-\vec{a}#.
It remains to show that #\vec{y}# is unique. Suppose that #\vec{z}# is an orthogonal projection of #\vec{x}# on #W#. Then #\vec{z}-\vec{a}# is the orthogonal projection of #\vec{x}-\vec{a}# on #U# and (by the theorem for a linear subspace #U#) equal to #P_U(\vec{x}-\vec{a})#. The latter vector is equal to #\vec{y}-\vec{a}#, and so #\vec{z}-\vec{a}=\vec{y}-\vec{a}#. We conclude that #\vec{z} = \vec{y}#, which proves that #\vec{y}# is the unique orthogonal projection of #\vec{x}# on #W#.
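The construction in this proof can be carried out numerically. Below is a minimal sketch in Python with NumPy, assuming the standard dot product on #\mathbb{R}^n#; the function names `orthonormal_basis` and `project_affine` are our own, chosen for illustration. It orthonormalizes a spanning set of the direction space #U# and then computes #\vec{a}+P_U(\vec{x}-\vec{a})# as in the proof.

```python
import numpy as np

def orthonormal_basis(vectors):
    """Gram-Schmidt: turn a spanning set of U into an orthonormal basis of U."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for e in basis:
            w -= np.dot(w, e) * e            # remove the component along e
        norm = np.linalg.norm(w)
        if norm > 1e-12:                     # drop (numerically) dependent vectors
            basis.append(w / norm)
    return basis

def project_affine(x, a, spanning_vectors):
    """Orthogonal projection of x onto W = a + U, with U spanned by spanning_vectors,
    computed as a + sum_i <x - a, a_i> a_i for an orthonormal basis a_1, ..., a_k of U."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    y = a.copy()
    for e in orthonormal_basis(spanning_vectors):
        y = y + np.dot(x - a, e) * e
    return y

# Example: W is the plane through (0, 0, 2) with direction space spanned by
# (1, 0, 0) and (0, 1, 0); the projection of (1, 1, 1) onto W is (1, 1, 2).
print(project_affine([1, 1, 1], [0, 0, 2], [[1, 0, 0], [0, 1, 0]]))
```

The same function handles a linear subspace by taking #\vec{a}=\vec{0}#.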
In many optimization problems, calculating the minimum distance from a vector #\vec{x}# to an affine subspace #W# of #V# plays an important role.
If the affine subspace #W# is infinite-dimensional, an orthogonal projection need not exist. In order to see this, we let #V# be the inner product space of all polynomials in #t# with the inner product having #\basis{1,t,t^2,\ldots}# as an orthonormal basis (so #\dotprod{t^i}{t^j}=\delta_{ij}#, with #\delta_{ij} =1# if #i=j# and #0# otherwise). Take #\vec{x}# to be the constant polynomial #1# and #W# to be the linear subspace of #V# consisting of all polynomials having value #0# at #t=1#. The subspace #W# has basis \[\basis{t-1,t^2-1,\ldots,t^j-1,\ldots}\] Thus, if #\vec{y}# is an orthogonal projection of #\vec{x}# on #W#, then #\vec{y} =\sum_{i=1}^na_i(t^i-1)# for some natural number #n# and real numbers #a_i#, and #\vec{y}# must satisfy
\[\dotprod{(\vec{x}-\vec{y})}{\vec{w}} = 0\text{ for all }\vec{w}\in W\]
Since #\dotprod{1}{(t-1)} = -1\ne0#, the vector #\vec{x}=1# is not perpendicular to #W#. In particular, #\vec{y}\ne\vec{0}#. We can therefore assume that #a_n\ne0#.
We work out the left-hand side of the above equation for #\vec{w} = t^j-1# where #j\ge 1#:
\[\begin{array}{rcl}\dotprod{(\vec{x}-\vec{y})}{\vec{w}} &=&\displaystyle \dotprod{\left(1-\sum_{i=1}^na_i(t^i-1)\right)}{ (t^j-1)}\\ &&\phantom{xx}\color{blue}{\text{expressions for }\vec{x},\, \vec{y},\, \vec{w}\text{ used}}\\&=&\displaystyle \dotprod {1}{(t^j-1)}-\sum_{i=1}^na_i\dotprod{(t^i-1)}{(t^j-1)}\\ &&\phantom{xx}\color{blue}{\text{linearity of inner product}}\\&=&\displaystyle -1-\sum_{i=1}^na_i(\delta_{ij}+1)\\&&\phantom{xx}\color{blue}{\text{orthonormality of basis } \basis{1,t,t^2,\ldots}}\\&=&\displaystyle-\left(1+\sum_{i=1}^na_i\right)-a_j\\ &&\phantom{xx}\color{blue}{\text{with convention }a_j =0 \text{ if }j\gt n}\end{array}\] First consider values of #j# with #j\gt n#. Since then #a_j = 0# and the above inner product must be equal to #0#, we find #1+\sum_{i=1}^na_i= 0#. This implies that the equation for #j =n# reduces to #-a_n = 0#, which contradicts the assumption #a_n\ne0#. This shows that there is no vector #\vec{y}# for which #\vec{x}-\vec{y}# is perpendicular to #W#. We conclude that there is no orthogonal projection of #\vec{x}# on #W#.
Here are some useful properties of the orthogonal projection on an affine subspace.
Let #V# be an inner product space with affine subspace #W=\vec{a}+U# for a vector #\vec{a}# and a linear subspace #U# of #V#. Suppose that #\basis{\vec{a}_1, \ldots ,\vec{a}_k}# is an orthonormal basis of #U# for a natural number #k#. The orthogonal projection #P_W(\vec{x})# of a vector #\vec{x}# of #V# on #W# satisfies the following properties:
- #\vec{x}-P_W(\vec{x})# is perpendicular to #W#, that is, orthogonal to each vector of the direction space #U#.
- The orthogonal projection #P_W(\vec{x})# is given by \[\vec{a} + (\dotprod{(\vec{x}-\vec{a})}{\vec{a}_1})\,\vec{a}_1 + \cdots +(\dotprod{(\vec{x}-\vec{a})}{\vec{a}_k})\,\vec{a}_k\]
- The distance from #\vec{x}# to a vector from #W# is minimal for the orthogonal projection on #W#: \[\norm{\vec{x}-P_W(\vec{x})}=\min_{\vec{w}\in W}\norm{\vec{x}-\vec{w}}\]
- The orthogonal projection is the unique vector for which this minimum occurs.
- If #W# is a linear subspace, then #\norm{P_W(\vec{x})}\leq\norm{\vec{x}}#, with equality if and only if #\vec{x}=P_W(\vec{x})#.
- The equality #P_W(\vec{x})=\vec{x}# holds if and only if #\vec{x}# lies in #W#.
The distance between #\vec{x}# and #W# is defined by #\norm{\vec{x}-P_W(\vec{x})}# as in statement 3.
If #W# is a linear subspace, then we can take #\vec{a} = \vec{0}# and #U = W#. For a first reading of the statement it is useful to keep this special case in mind. The general case follows simply after subtraction of #\vec{a}# from the affine subspace #W# and the vector #\vec{x}#.
1. The first statement is merely a repetition of the definition.
2. The second statement follows from the proof of the previous theorem.
3. The following figure gives a good intuition for the proof of the third statement.
[3D picture: the vector #\vec{x}#, its projection #P_W(\vec{x})#, and a vector #\vec{w}# in #W#]
Let #\vec{w}# be a vector in #W#. We compare #\norm{\vec{x}-P_W(\vec{x})}# to #\norm{\vec{x}-\vec{w}}#. To this end we write #\vec{x}-\vec{w}=(\vec{x}-P_W(\vec{x}))+(P_W(\vec{x})-\vec{w})#. Because #P_W(\vec{x})# and #\vec{w}# belong to the affine subspace #W#, the difference #P_W(\vec{x})-\vec{w}# lies in #U#. This vector is perpendicular to #\vec{x}-P_W(\vec{x})#, by definition of the orthogonal projection. We can therefore apply the Pythagorean theorem:
\[
\norm{\vec{x}-\vec{w}}^2 =\norm{\vec{x}-P_W(\vec{x})}^2 +
\norm{P_W(\vec{x})-\vec{w}}^2
\] Since #\norm{P_W(\vec{x})-\vec{w}}^2\geq 0#, it follows that
\[
\norm{\vec{x}-P_W(\vec{x})}^2 \leq \norm{\vec{x}-\vec{w}}^2
\] Since lengths are non-negative, we conclude that
\[\norm{\vec{x}-P_W(\vec{x})}\leq\norm{\vec{x}-\vec{w}}\] Equality occurs if and only if #\norm{P_W(\vec{x})-\vec{w}}=0#, that is, if and only if #\vec{w}# equals #P_W(\vec{x})#.
4. The last sentence of the proof of the third statement immediately proves the fourth statement.
5. The fifth statement follows, just like the third, from the Pythagorean theorem. Since #W# is a linear subspace, it coincides with its direction space, so #P_W(\vec{x})#, which belongs to #W#, is perpendicular to #\vec{x}-P_W(\vec{x})#. Therefore we have
\[
\norm{\vec{x}}^2=\norm{\vec{x}-P_W(\vec{x}) +P_W(\vec{x})}^2 =\norm{\vec{x}-P_W(\vec{x})}^2 +
\norm{P_W(\vec{x})}^2
\] from which the statement follows immediately.
6. The implication from left to right in the sixth statement follows from the fact that the projection #P_W(\vec{x})# lies in #W# by definition, and so #\vec{x}# is a vector in #W# if #P_W(\vec{x})=\vec{x}#.
The implication from right to left follows from the third and fourth statements: if #\vec{x}# lies in #W#, then the distance from #\vec{x}# to the vector #\vec{x}# of #W# equals #0#, so the minimum of the distance is attained at #\vec{x}#; since by the fourth statement this minimizer is unique, #P_W(\vec{x})=\vec{x}#.
According to statement 3, the distance between a vector #\vec{x}# and an affine subspace #W# is equal to the minimum distance between #\vec{x}# and a point in #W#.
In #\mathbb{R}^3# we calculate the orthogonal projection of the vector #\rv{1,1,1}# on the subspace spanned by the vector #\rv{1,2,2}#. First, we normalize the vector #\rv{1,2,2}# to get an orthonormal basis. Because #\norm{\rv{1,2,2}} = \sqrt{1^2+2^2+2^2}=3#, this yields the vector \[\vec{a}_1=\dfrac{1}{3}\cdot \rv{1,2,2} =\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}\]
The orthogonal projection is now given by
\[\begin{array}{rcl} P_W(\vec{x})&=&(\dotprod{\vec{x}}{\vec{a}_1})\vec{a}_1\\ &=&\displaystyle\left(\dotprod{\rv{1,1,1}}{\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}}\right)\cdot\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}\\&=&\displaystyle\frac{5}{3}\cdot\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}\\&=&\displaystyle\frac{5}{9}\cdot\rv{1,2,2}\end{array}
\]
Therefore, the distance from #\rv{1,1,1}# to the subspace spanned by #{\rv{1,2,2}}# is equal to the length of the difference vector:
\[{\norm{\rv{1,1,1}-\frac{5}{9}\cdot\rv{1,2,2}}} ={ \norm{\rv{\frac{4}{9},\frac{-1}{9},\frac{-1}{9}}}} = \frac{1}{9}\sqrt{4^2+1^2+1^2} = \frac{\sqrt{18}}{9} = \frac{1}{3}\sqrt{2}\]
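For readers who wish to verify such a computation numerically, here is a small sketch in Python with NumPy (assuming the standard dot product on #\mathbb{R}^3#) that reproduces this example.

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0])
v = np.array([1.0, 2.0, 2.0])      # spanning vector of the subspace

a1 = v / np.linalg.norm(v)         # normalized basis vector a_1
proj = np.dot(x, a1) * a1          # P_W(x) = <x, a_1> a_1

print(proj)                        # [0.5555... 1.1111... 1.1111...], i.e. (5/9)*(1, 2, 2)
print(np.linalg.norm(x - proj))    # 0.4714..., i.e. sqrt(2)/3
```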
When performing the Gram-Schmidt procedure, we actually make use of the orthogonal projection. In this procedure, vectors are sometimes normalized, but the essential step is \[ \vec{e}_{i+1}^*:=\vec{a}_{i+1}-\sum_{j=1}^i(\dotprod{\vec{a}_{i+1}}{\vec{e}_j})\cdot \vec{e}_j\]
where #\basis{\vec{e}_1,\ldots,\vec{e}_i}# is the orthonormal system constructed in the previous steps. The vector #\vec{e}_{i+1}^*# is the difference #\vec{a}_{i+1}-P_W(\vec{a}_{i+1})# of #\vec{a}_{i+1}# and its projection on the linear subspace #W = \linspan{\vec{e}_1,\ldots,\vec{e}_i}#. That is why this vector lies in the orthogonal complement of #W#.
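The sketch below, again in Python with NumPy and purely for illustration (the function name `gram_schmidt_step` is ours), carries out exactly this step and checks that the result is orthogonal to the vectors found so far.

```python
import numpy as np

def gram_schmidt_step(a_next, orthonormal):
    """Compute e*_{i+1} = a_{i+1} - sum_j <a_{i+1}, e_j> e_j,
    i.e. subtract from a_next its projection onto span(e_1, ..., e_i)."""
    a_next = np.asarray(a_next, dtype=float)
    projection = sum(np.dot(a_next, e) * e for e in orthonormal)
    return a_next - projection

# e1, e2 form an orthonormal basis of W = span{e1, e2} in R^3
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
e_star = gram_schmidt_step([1.0, 2.0, 3.0], [e1, e2])

print(e_star)                                   # [0. 0. 3.], lies in the orthogonal complement of W
print(np.dot(e_star, e1), np.dot(e_star, e2))   # both 0.0
```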
Let #V# be a finite-dimensional inner product space. An affine subspace #W# of dimension #\dim{V}-1# of #V# is called a hyperplane. For such a hyperplane #W#, there exists, up to scaling by a nonzero factor, a unique vector #\vec{n}\ne\vec{0}# perpendicular to #W#. If #V# is equal to #\mathbb{R}^2# or #\mathbb{R}^3#, then #\vec{n}# is known as a normal vector of #W#. The coordinates of the vector #\vec{n}# are the coefficients of the coordinate variables in a linear equation for #W#. The projection of a vector #\vec{x}# can then be determined by calculating the point of intersection of #W# and the line with parametric representation #\vec{x}+r\cdot \vec{n}#.
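As an illustration of this last remark, the following sketch (Python with NumPy; our own example, assuming the standard dot product) projects a point onto a hyperplane #\dotprod{\vec{n}}{\vec{y}}=c# in #\mathbb{R}^3# by intersecting the line #\vec{x}+r\cdot\vec{n}# with the hyperplane.

```python
import numpy as np

def project_onto_hyperplane(x, n, c):
    """Project x onto the hyperplane {y : <n, y> = c} by solving
    <n, x + r*n> = c for r and returning the intersection point x + r*n."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    r = (c - np.dot(n, x)) / np.dot(n, n)
    return x + r * n

# Example: the plane x_1 + 2*x_2 + 2*x_3 = 3 with normal vector n = (1, 2, 2)
print(project_onto_hyperplane([1, 1, 1], [1, 2, 2], 3))   # [0.7777... 0.5555... 0.5555...] = (7/9, 5/9, 5/9)
```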
In #\mathbb{R}^3# determine the orthogonal projection of the vector #{\left[ 1 , 3 , 2 \right] }# onto the subspace #W# spanned by the vector #{\left[ 4 , 0 , 3 \right] }#.
#P_W(\left[ 1 , 3 , 2 \right] ) = # #{{{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] }#
First, we normalize the vector #{\left[ 4 , 0 , 3 \right] }# to get an orthonormal basis for #W#. Because \[{\norm{\left[ 4 , 0 , 3 \right] }} =\sqrt{(4)^2+(0)^2+(3)^2}=5 \] we find the normalized basis vector \[\vec{a}_1=\dfrac{1}{5}\cdot {\left[ 4 , 0 , 3 \right] } =\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] \]
Now, the orthogonal projection is given by
\[\begin{array}{rcl} P_W(\vec{x})&=&(\dotprod{\vec{x}}{\vec{a}_1})\,\vec{a}_1\\ &=&\displaystyle\left(\dotprod{\left[ 1 , 3 , 2 \right] }{\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] }\right)\cdot{\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] }\\&=&\displaystyle 2 \cdot {\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] }\\&=&\displaystyle {{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] \end{array}
\]
The distance from # \left[ 1 , 3 , 2 \right] # to the subspace #W =\linspan{\left[ 4 , 0 , 3 \right] }# is equal to the length of the difference vector of #\left[ 1 , 3 , 2 \right] # and the projection #{{{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] }#:
\[\norm{\left[ 1 , 3 , 2 \right] - {{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] }=\norm{\left[ -{{3}\over{5}} , 3 , {{4}\over{5}} \right] } = \sqrt{10}\]