In an inner product space, every vector #\vec{p}# has a unique nearest point on each finite-dimensional linear subspace #W#; this nearest point is the orthogonal projection of #\vec{p}# on #W#. These results remain valid in the more general case where #W# is an affine subspace.
Let #W# be a finite-dimensional affine subspace of an inner product space #V# and #\vec{x}# a vector of #V#. Then there exists a unique vector #\vec{y}# in #W# such that #\vec{x}-\vec{y}# is perpendicular to #W#.
This vector #\vec{y}# is called the orthogonal projection of #\vec{x}# on #W#, and is denoted by #P_W(\vec{x})#.
The statement that #\vec{x}-\vec{y}# is perpendicular to the subspace #W# means that #\dotprod{(\vec{x}-\vec{y})}{\vec{w}}=0# for all #\vec{w}# in #U#, where #U# is the direction space of #W#.
The concept of orthogonal projection is visualized in the picture below. In this particular case, #W# is a #2#-dimensional linear subspace of #V=\mathbb{R}^3#, represented by the shaded area. The vectors #\vec{x}# and #\vec{y}# are as in the definition.
[3D picture: the vectors #\vec{x}# and #\vec{y}# and the shaded plane #W#]
Here #\vec{x}-\vec{y}# is the dotted vector, which is perpendicular to #W#.
We first prove the theorem for the case where #W# is a linear subspace of #V#. According to the Gram-Schmidt Theorem each finite-dimensional subspace has an orthonormal basis. Let #\basis{\vec{a}_1,\ldots,\vec{a}_k}# be such a basis for #W#. Because an orthogonal projection is a vector in #W#, we can write such a projection #\vec{y}# as a linear combination #\lambda_1 \vec{a}_1+\cdots + \lambda_k\vec{a}_k# for some scalars #\lambda_1 ,\ldots ,\lambda_k#. The requirement that #\vec{x}-\vec{y}# be perpendicular to #W# means that, for each #j# with #1\le j\le k#,
\[
\dotprod{
(\vec{x}-(\lambda_1 \vec{a}_1+\cdots + \lambda_k\vec{a}_k))}{\vec{a}_j}=0\]
Using the orthonormality of the basis and the linearity of the inner product, this equation reduces to #\lambda_j =\dotprod{\vec{x}}{\vec{a}_j}#. As a consequence, the vector \[\vec{y}=(\dotprod{\vec{x}}{\vec{a}_1} )\vec{a}_1+\cdots +(\dotprod{\vec{x}}{\vec{a}_k})\vec{a}_k\] is uniquely determined by the requirement that #\vec{x}-\vec{y}# be perpendicular to #W#. In addition, this vector belongs to #W# and #\vec{x}-\vec{y}# lies in #W^\perp#. Therefore, there is exactly one orthogonal projection of #\vec{x}# on #W#. This proves the theorem in the case where #W# is a linear subspace.
Now suppose that #W# is an affine subspace of #V#. Then there are a support vector #\vec{a}# and a direction space #U# such that #W = \vec{a}+U#. The vector #\vec{y}=\vec{a}+P_U(\vec{x}-\vec{a})# is an orthogonal projection of #\vec{x}# on #W#, because
- #P_U(\vec{x}-\vec{a})# belongs to #U#, so the vector #\vec{y}# belongs to #\vec{a}+U=W#;
- the vector #\vec{x}-\vec{y}# is equal to #(\vec{x}-\vec{a}) -P_U(\vec{x}-\vec{a})# and is thus perpendicular to #U#, by the result for the linear subspace #U# applied to the vector #\vec{x}-\vec{a}#.
It remains to show that #\vec{y}# is unique. Suppose that #\vec{z}# is an orthogonal projection of #\vec{x}# on #W#. Then #\vec{z}-\vec{a}# is the orthogonal projection of #\vec{x}-\vec{a}# on #U# and (by the theorem for a linear subspace #U#) equal to #P_U(\vec{x}-\vec{a})#. The latter vector is equal to #\vec{y}-\vec{a}#, and so #\vec{z}-\vec{a}=\vec{y}-\vec{a}#. We conclude that #\vec{z} = \vec{y}#, which proves that #\vec{y}# is the unique orthogonal projection of #\vec{x}# on #W#.
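The construction in this proof can be carried out numerically. Below is a minimal sketch in Python with NumPy, assuming the standard dot product on #\mathbb{R}^n#; the function names `orthonormal_basis` and `project_affine` are our own, chosen for illustration. It orthonormalizes a spanning set of the direction space #U# and then computes #\vec{a}+P_U(\vec{x}-\vec{a})# as in the proof.

```python
import numpy as np

def orthonormal_basis(vectors):
    """Gram-Schmidt: turn a spanning set of U into an orthonormal basis of U."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for e in basis:
            w -= np.dot(w, e) * e            # remove the component along e
        norm = np.linalg.norm(w)
        if norm > 1e-12:                     # drop (numerically) dependent vectors
            basis.append(w / norm)
    return basis

def project_affine(x, a, spanning_vectors):
    """Orthogonal projection of x onto W = a + U, with U spanned by spanning_vectors,
    computed as a + sum_i <x - a, a_i> a_i for an orthonormal basis a_1, ..., a_k of U."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    y = a.copy()
    for e in orthonormal_basis(spanning_vectors):
        y = y + np.dot(x - a, e) * e
    return y

# Example: W is the plane through (0, 0, 2) with direction space spanned by
# (1, 0, 0) and (0, 1, 0); the projection of (1, 1, 1) onto W is (1, 1, 2).
print(project_affine([1, 1, 1], [0, 0, 2], [[1, 0, 0], [0, 1, 0]]))
```

The same function handles a linear subspace by taking #\vec{a}=\vec{0}#.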
In many optimization problems, calculating the minimum distance from a vector #\vec{x}# to an affine subspace #W# of #V# plays an important role.
If the affine subspace #W# is infinite-dimensional, an orthogonal projection need not exist. In order to see this, we let #V# be the inner product space of all polynomials in #t# with the inner product having #\basis{1,t,t^2,\ldots}# as an orthonormal basis (so #\dotprod{t^i}{t^j}=\delta_{ij}#, with #\delta_{ij} =1# if #i=j# and #0# otherwise). Take #\vec{x}# to be the constant polynomial #1# and #W# to be the linear subspace of #V# consisting of all polynomials having value #0# at #t=1#. The subspace #W# has basis \[\basis{t-1,t^2-1,\ldots,t^j-1,\ldots}\] Thus, if #\vec{y}# is an orthogonal projection of #\vec{x}# on #W#, then #\vec{y} =\sum_{i=1}^na_i(t^i-1)# for some natural number #n# and real numbers #a_i#, and #\vec{y}# must satisfy
\[\dotprod{(\vec{x}-\vec{y})}{\vec{w}} = 0\text{ for all }\vec{w}\in W\]
Since #\dotprod{1}{(t-1)} = -1\ne0#, the vector #\vec{x}=1# is not perpendicular to #W#. In particular, #\vec{y}\ne\vec{0}#. We can therefore assume that #a_n\ne0#.
We work out the left-hand side of the above equation for #\vec{w} = t^j-1# where #j\ge 1#:
\[\begin{array}{rcl}\dotprod{(\vec{x}-\vec{y})}{\vec{w}} &=&\displaystyle \dotprod{\left(1-\sum_{i=1}^na_i(t^i-1)\right)}{ (t^j-1)}\\ &&\phantom{xx}\color{blue}{\text{expressions for }\vec{x},\, \vec{y},\, \vec{w}\text{ used}}\\&=&\displaystyle \dotprod {1}{(t^j-1)}-\sum_{i=1}^na_i\dotprod{(t^i-1)}{(t^j-1)}\\ &&\phantom{xx}\color{blue}{\text{linearity of inner product}}\\&=&\displaystyle -1-\sum_{i=1}^na_i(\delta_{ij}+1)\\&&\phantom{xx}\color{blue}{\text{orthonormality of basis } \basis{1,t,t^2,\ldots}}\\&=&\displaystyle-\left(1+\sum_{i=1}^na_i\right)-a_j\\ &&\phantom{xx}\color{blue}{\text{with convention }a_j =0 \text{ if }j\gt n}\end{array}\] First consider values of #j# with #j\gt n#. Since then #a_j = 0# and the above inner product must be equal to #0#, we find #1+\sum_{i=1}^na_i= 0#. This implies that the equation for #j =n# reduces to #-a_n = 0#, which contradicts the assumption #a_n\ne0#. This shows that there is no vector #\vec{y}# for which #\vec{x}-\vec{y}# is perpendicular to #W#. We conclude that there is no orthogonal projection of #\vec{x}# on #W#.
Here are some useful properties of the orthogonal projection on an affine subspace.
Let #V# be an inner product space with affine subspace #W=\vec{a}+U# for a vector #\vec{a}# and a linear subspace #U# of #V#. Suppose that #\basis{\vec{a}_1, \ldots ,\vec{a}_k}# is an orthonormal basis of #U# for a natural number #k#. The orthogonal projection #P_W(\vec{x})# of a vector #\vec{x}# of #V# on #W# satisfies the following properties:
- #\vec{x}-P_W(\vec{x})# is perpendicular to #W#, that is, orthogonal to each vector of the direction space #U#.
- The orthogonal projection #P_W(\vec{x})# is given by \[\vec{a} + (\dotprod{(\vec{x}-\vec{a})}{\vec{a}_1})\,\vec{a}_1 + \cdots +(\dotprod{(\vec{x}-\vec{a})}{\vec{a}_k})\,\vec{a}_k\]
- The distance from #\vec{x}# to a vector from #W# is minimal for the orthogonal projection on #W#: \[\norm{\vec{x}-P_W(\vec{x})}=\min_{\vec{w}\in W}\norm{\vec{x}-\vec{w}}\]
- The orthogonal projection is the unique vector for which this minimum occurs.
- If #W# is a linear subspace, then #\norm{P_W(\vec{x})}\leq\norm{\vec{x}}#, with equality if and only if #\vec{x}=P_W(\vec{x})#.
- The equality #P_W(\vec{x})=\vec{x}# holds if and only if #\vec{x}# lies in #W#.
The distance between #\vec{x}# and #W# is defined by #\norm{\vec{x}-P_W(\vec{x})}# as in statement 3.
If #W# is a linear subspace, then we can take #\vec{a} = \vec{0}# and #U = W#. For a first reading of the statement it is useful to keep this special case in mind. The general case follows simply after subtraction of #\vec{a}# from the affine subspace #W# and the vector #\vec{x}#.
1. The first statement is merely a repetition of the definition.
2. The second statement follows from the proof of the previous theorem.
3. The following figure gives a good intuition for the proof of the third statement.
[3D picture: the vector #\vec{x}#, its projection #P_W(\vec{x})#, and a vector #\vec{w}# in #W#]
Let #\vec{w}# be a vector in #W#. We compare #\norm{\vec{x}-P_W(\vec{x})}# to #\norm{\vec{x}-\vec{w}}#. To this end we write #\vec{x}-\vec{w}=(\vec{x}-P_W(\vec{x}))+(P_W(\vec{x})-\vec{w})#. Because #P_W(\vec{x})# and #\vec{w}# belong to the affine subspace #W#, the difference #P_W(\vec{x})-\vec{w}# lies in #U#. This vector is perpendicular to #\vec{x}-P_W(\vec{x})#, by definition of the orthogonal projection. We can therefore apply the Pythagorean theorem:
\[
\norm{\vec{x}-\vec{w}}^2 =\norm{\vec{x}-P_W(\vec{x})}^2 +
\norm{P_W(\vec{x})-\vec{w}}^2
\] Since #\norm{P_W(\vec{x})-\vec{w}}^2\geq 0#, it follows that
\[
\norm{\vec{x}-P_W(\vec{x})}^2 \leq \norm{\vec{x}-\vec{w}}^2
\] Since lengths are non-negative, we conclude that
\[\norm{\vec{x}-P_W(\vec{x})}\leq\norm{\vec{x}-\vec{w}}\] Equality occurs if and only if #\norm{P_W(\vec{x})-\vec{w}}=0#, that is, if and only if #\vec{w}# equals #P_W(\vec{x})#.
4. The last sentence of the proof of the third statement immediately proves the fourth statement.
5. The fifth statement follows, just like the third, from the Pythagorean theorem. Since #W# is a linear subspace, it coincides with its direction space, so #P_W(\vec{x})#, which belongs to #W#, is perpendicular to #\vec{x}-P_W(\vec{x})#. Therefore we have
\[
\norm{\vec{x}}^2=\norm{\vec{x}-P_W(\vec{x}) +P_W(\vec{x})}^2 =\norm{\vec{x}-P_W(\vec{x})}^2 +
\norm{P_W(\vec{x})}^2
\] from which the statement follows immediately.
6. The implication from left to right in the sixth statement follows from the fact that the projection #P_W(\vec{x})# lies in #W# by definition, and so #\vec{x}# is a vector in #W# if #P_W(\vec{x})=\vec{x}#.
The implication from right to left follows from the third and fourth statements: if #\vec{x}# lies in #W#, then the distance from #\vec{x}# to the vector #\vec{x}# of #W# equals #0#, so the minimum of the distance is attained at #\vec{x}#; since by the fourth statement this minimizer is unique, #P_W(\vec{x})=\vec{x}#.
According to statement 3, the distance between a vector #\vec{x}# and an affine subspace #W# is equal to the minimum distance between #\vec{x}# and a point in #W#.
In #\mathbb{R}^3# we calculate the orthogonal projection of the vector #\rv{1,1,1}# on the subspace spanned by the vector #\rv{1,2,2}#. First, we normalize the vector #\rv{1,2,2}# to get an orthonormal basis. Because #\norm{\rv{1,2,2}} = \sqrt{1^2+2^2+2^2}=3#, this yields the vector \[\vec{a}_1=\dfrac{1}{3}\cdot \rv{1,2,2} =\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}\]
The orthogonal projection is now given by
\[\begin{array}{rcl} P_W(\vec{x})&=&(\dotprod{\vec{x}}{\vec{a}_1})\vec{a}_1\\ &=&\displaystyle\left(\dotprod{\rv{1,1,1}}{\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}}\right)\cdot\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}\\&=&\displaystyle\frac{5}{3}\cdot\rv{\frac{1}{3},\frac{2}{3},\frac{2}{3}}\\&=&\displaystyle\frac{5}{9}\cdot\rv{1,2,2}\end{array}
\]
Therefore, the distance from #\rv{1,1,1}# to the subspace spanned by #{\rv{1,2,2}}# is equal to the length of the difference vector:
\[{\norm{\rv{1,1,1}-\frac{5}{9}\cdot\rv{1,2,2}}} ={ \norm{\rv{\frac{4}{9},\frac{-1}{9},\frac{-1}{9}}}} = \frac{1}{9}\sqrt{4^2+1^2+1^2} = \frac{\sqrt{18}}{9} = \frac{1}{3}\sqrt{2}\]
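For readers who wish to verify such a computation numerically, here is a small sketch in Python with NumPy (assuming the standard dot product on #\mathbb{R}^3#) that reproduces this example.

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0])
v = np.array([1.0, 2.0, 2.0])      # spanning vector of the subspace

a1 = v / np.linalg.norm(v)         # normalized basis vector a_1
proj = np.dot(x, a1) * a1          # P_W(x) = <x, a_1> a_1

print(proj)                        # [0.5555... 1.1111... 1.1111...], i.e. (5/9)*(1, 2, 2)
print(np.linalg.norm(x - proj))    # 0.4714..., i.e. sqrt(2)/3
```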
When performing the Gram-Schmidt procedure, we actually make use of the orthogonal projection. In this procedure, vectors are sometimes normalized, but the essential step is \[ \vec{e}_{i+1}^*:=\vec{a}_{i+1}-\sum_{j=1}^i(\dotprod{\vec{a}_{i+1}}{\vec{e}_j})\cdot \vec{e}_j\]
where #\basis{\vec{e}_1,\ldots,\vec{e}_i}# is the orthonormal system constructed in the previous steps. The vector #\vec{e}_{i+1}^*# is the difference #\vec{a}_{i+1}-P_W(\vec{a}_{i+1})# of #\vec{a}_{i+1}# and its projection on the linear subspace #W = \linspan{\vec{e}_1,\ldots,\vec{e}_i}#. That is why this vector lies in the orthogonal complement of #W#.
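The sketch below, again in Python with NumPy and purely for illustration (the function name `gram_schmidt_step` is ours), carries out exactly this step and checks that the result is orthogonal to the vectors found so far.

```python
import numpy as np

def gram_schmidt_step(a_next, orthonormal):
    """Compute e*_{i+1} = a_{i+1} - sum_j <a_{i+1}, e_j> e_j,
    i.e. subtract from a_next its projection onto span(e_1, ..., e_i)."""
    a_next = np.asarray(a_next, dtype=float)
    projection = sum(np.dot(a_next, e) * e for e in orthonormal)
    return a_next - projection

# e1, e2 form an orthonormal basis of W = span{e1, e2} in R^3
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
e_star = gram_schmidt_step([1.0, 2.0, 3.0], [e1, e2])

print(e_star)                                   # [0. 0. 3.], lies in the orthogonal complement of W
print(np.dot(e_star, e1), np.dot(e_star, e2))   # both 0.0
```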
Let #V# be a finite-dimensional inner product space. An affine subspace #W# of dimension #\dim{V}-1# of #V# is called a hyperplane. For such a hyperplane #W#, there exists, up to scaling by a nonzero factor, a unique vector #\vec{n}\ne\vec{0}# perpendicular to #W#. If #V# is equal to #\mathbb{R}^2# or #\mathbb{R}^3#, then #\vec{n}# is known as a normal vector of #W#. The coordinates of the vector #\vec{n}# are the coefficients of the coordinate variables in a linear equation for #W#. The projection of a vector #\vec{x}# can then be determined by calculating the point of intersection of #W# and the line with parametric representation #\vec{x}+r\cdot \vec{n}#.
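As an illustration of this last remark, the following sketch (Python with NumPy; our own example, assuming the standard dot product) projects a point onto a hyperplane #\dotprod{\vec{n}}{\vec{y}}=c# in #\mathbb{R}^3# by intersecting the line #\vec{x}+r\cdot\vec{n}# with the hyperplane.

```python
import numpy as np

def project_onto_hyperplane(x, n, c):
    """Project x onto the hyperplane {y : <n, y> = c} by solving
    <n, x + r*n> = c for r and returning the intersection point x + r*n."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    r = (c - np.dot(n, x)) / np.dot(n, n)
    return x + r * n

# Example: the plane x_1 + 2*x_2 + 2*x_3 = 3 with normal vector n = (1, 2, 2)
print(project_onto_hyperplane([1, 1, 1], [1, 2, 2], 3))   # [0.7777... 0.5555... 0.5555...] = (7/9, 5/9, 5/9)
```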
In #\mathbb{R}^3# determine the orthogonal projection of the vector #{\left[ 1 , 3 , 2 \right] }# onto the subspace #W# spanned by the vector #{\left[ 4 , 0 , 3 \right] }#.
#P_W(\left[ 1 , 3 , 2 \right] ) = # #{{{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] }#
First, we normalize the vector #{\left[ 4 , 0 , 3 \right] }# to get an orthonormal basis for #W#. Because \[{\norm{\left[ 4 , 0 , 3 \right] }} =\sqrt{(4)^2+(0)^2+(3)^2}=5 \] we find the normalized basis vector \[\vec{a}_1=\dfrac{1}{5}\cdot {\left[ 4 , 0 , 3 \right] } =\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] \]
Now, the orthogonal projection is given by
\[\begin{array}{rcl} P_W(\vec{x})&=&(\dotprod{\vec{x}}{\vec{a}_1})\,\vec{a}_1\\ &=&\displaystyle\left(\dotprod{\left[ 1 , 3 , 2 \right] }{\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] }\right)\cdot{\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] }\\&=&\displaystyle 2 \cdot {\left[ {{4}\over{5}} , 0 , {{3}\over{5}} \right] }\\&=&\displaystyle {{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] \end{array}
\]
The distance from # \left[ 1 , 3 , 2 \right] # to the subspace #W =\linspan{\left[ 4 , 0 , 3 \right] }# is equal to the length of the difference vector of #\left[ 1 , 3 , 2 \right] # and the projection #{{{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] }#:
\[\norm{\left[ 1 , 3 , 2 \right] - {{2}\over{5}}\cdot \left[ 4 , 0 , 3 \right] }=\norm{\left[ -{{3}\over{5}} , 3 , {{4}\over{5}} \right] } = \sqrt{10}\]