Chapter 5 Inner product spaces

The theory of inner product spaces is developed only for real and complex spaces, so $\mathbb{F}$ in this chapter is always $\mathbb{R}$ or $\mathbb{C}$; the results usually do not generalize to spaces over arbitrary fields.

Most of the results and calculations in this chapter hold (and the results have the same statements) in both the real and complex cases. In the rare situations when there is a difference between the real and complex cases, we state explicitly which case is considered; otherwise everything holds for both.

Finally, when the results and calculations hold for both the complex and real cases, we use the formulas for the complex case; in the real case they give correct, though sometimes slightly more complicated, formulas.

5.1. Inner product in $\mathbb{R}^n$ and $\mathbb{C}^n$. Inner product spaces.

5.1.1. Inner product and norm in $\mathbb{R}^n$.

In dimensions 2 and 3, we defined the length of a vector $\mathbf{x}$ (i.e. the distance from its endpoint to the origin) by the Pythagorean rule; for example, in $\mathbb{R}^3$ the length of a vector is defined as

\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2}. \]

It is natural to generalize this formula to all $n$, defining the norm of a vector $\mathbf{x}\in\mathbb{R}^n$ as

\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}. \]

The word norm is used as a fancy replacement for the word length.

The dot product in $\mathbb{R}^3$ was defined as $\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3$, where $\mathbf{x} = (x_1,x_2,x_3)^T$ and $\mathbf{y} = (y_1,y_2,y_3)^T$.

Remark.

While the notation $\mathbf{x}\cdot\mathbf{y}$ and the term "dot product" are often used in the literature, we prefer to call it the "inner product", and for reasons which will become clear later, we will use the notation $(\mathbf{x},\mathbf{y})$ instead of $\mathbf{x}\cdot\mathbf{y}$.

Thus, in $\mathbb{R}^n$ one can define the inner product $(\mathbf{x},\mathbf{y})$ of two vectors $\mathbf{x} = (x_1,x_2,\ldots,x_n)^T$, $\mathbf{y} = (y_1,y_2,\ldots,y_n)^T$ by

\[ (\mathbf{x},\mathbf{y}) := x_1y_1 + x_2y_2 + \ldots + x_ny_n = \mathbf{y}^T\mathbf{x}, \]

so $\|\mathbf{x}\| = \sqrt{(\mathbf{x},\mathbf{x})}$.

Note that $\mathbf{y}^T\mathbf{x} = \mathbf{x}^T\mathbf{y}$; we use the notation $\mathbf{y}^T\mathbf{x}$ only for consistency with the complex case below.

5.1.2. Inner product and norm in $\mathbb{C}^n$.

Let us now define the norm and inner product in $\mathbb{C}^n$. As we have seen before, the complex space $\mathbb{C}^n$ is the most natural space from the point of view of spectral theory: even if one starts from a matrix with real coefficients (or an operator on a real vector space), the eigenvalues can be complex, and one needs to work in a complex space.

For a complex number $z = x + iy$, we have $|z|^2 = x^2 + y^2 = z\overline{z}$. If $\mathbf{z}\in\mathbb{C}^n$ is given by

\[ \mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} = \begin{pmatrix} x_1+iy_1 \\ x_2+iy_2 \\ \vdots \\ x_n+iy_n \end{pmatrix}, \]

it is natural to define its norm $\|\mathbf{z}\|$ by

\[ \|\mathbf{z}\|^2 = \sum_{k=1}^n (x_k^2 + y_k^2) = \sum_{k=1}^n |z_k|^2. \]

Let us try to define an inner product on $\mathbb{C}^n$ such that $\|\mathbf{z}\|^2 = (\mathbf{z},\mathbf{z})$. One of the choices is to define $(\mathbf{z},\mathbf{w})$ by

\[ (\mathbf{z},\mathbf{w}) = z_1\overline{w}_1 + z_2\overline{w}_2 + \ldots + z_n\overline{w}_n = \sum_{k=1}^n z_k\overline{w}_k, \]

and that will be our definition of the standard inner product in $\mathbb{C}^n$.

To simplify the notation, let us introduce a new notion. For a matrix $A$ let us define its Hermitian adjoint, or simply adjoint, $A^*$ by $A^* = \overline{A}^T$, meaning that we take the transpose of the matrix and then take the complex conjugate of each entry. Note that for a real matrix $A$, $A^* = A^T$.

Using the notion of $A^*$, one can write the standard inner product in $\mathbb{C}^n$ as

\[ (\mathbf{z},\mathbf{w}) = \mathbf{w}^*\mathbf{z}. \]
Remark.

It is easy to see that one can define a different inner product on $\mathbb{C}^n$ such that $\|\mathbf{z}\|^2 = (\mathbf{z},\mathbf{z})$, namely the inner product given by

\[ (\mathbf{z},\mathbf{w})_1 = \overline{z}_1 w_1 + \overline{z}_2 w_2 + \ldots + \overline{z}_n w_n = \mathbf{z}^*\mathbf{w}. \]

We did not specify what properties we want the inner product to satisfy, but $\mathbf{z}^*\mathbf{w}$ and $\mathbf{w}^*\mathbf{z}$ are the only reasonable choices giving $\|\mathbf{z}\|^2 = (\mathbf{z},\mathbf{z})$.

Note that the above two choices of the inner product are essentially equivalent: the only difference between them is notational, because $(\mathbf{z},\mathbf{w})_1 = (\mathbf{w},\mathbf{z})$.

While the second choice of the inner product may look more natural, the first one, $(\mathbf{z},\mathbf{w}) = \mathbf{w}^*\mathbf{z}$, is more widely used, so we will use it as well.
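As a quick illustration, here is a minimal numpy sketch (the vectors are made up for the example) of the standard inner product and norm in $\mathbb{C}^n$. Note that numpy's `vdot` conjugates its first argument, i.e. it computes $(\mathbf{z},\mathbf{w})_1 = \mathbf{z}^*\mathbf{w}$, so our convention $(\mathbf{z},\mathbf{w}) = \mathbf{w}^*\mathbf{z}$ corresponds to `np.vdot(w, z)`:

```python
import numpy as np

z = np.array([1 + 2j, 3 - 1j])
w = np.array([2j, 1 + 1j])

# (z, w) = w* z = sum_k z_k * conj(w_k)
ip = np.vdot(w, z)

# ||z||^2 = (z, z) is real and non-negative
norm_z = np.sqrt(np.vdot(z, z).real)
print(ip, norm_z)
```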

5.1.3. Inner product spaces.

The inner product we defined for $\mathbb{R}^n$ and $\mathbb{C}^n$ satisfies the following properties:

1. Conjugate symmetry: $(\mathbf{x},\mathbf{y}) = \overline{(\mathbf{y},\mathbf{x})}$; note that for a real space this property is just symmetry, $(\mathbf{x},\mathbf{y}) = (\mathbf{y},\mathbf{x})$;

2. Linearity: $(\alpha\mathbf{x}+\beta\mathbf{y},\mathbf{z}) = \alpha(\mathbf{x},\mathbf{z}) + \beta(\mathbf{y},\mathbf{z})$ for all vectors $\mathbf{x},\mathbf{y},\mathbf{z}$ and all scalars $\alpha,\beta$;

3. Non-negativity: $(\mathbf{x},\mathbf{x}) \ge 0$ for all $\mathbf{x}$;

4. Non-degeneracy: $(\mathbf{x},\mathbf{x}) = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

Let $V$ be a (complex or real) vector space. An inner product on $V$ is a function that assigns to each pair of vectors $\mathbf{x}$, $\mathbf{y}$ a scalar, denoted by $(\mathbf{x},\mathbf{y})$, such that the above properties 1–4 are satisfied.

Note that for a real space $V$ we assume that $(\mathbf{x},\mathbf{y})$ is always real, while for a complex space the inner product $(\mathbf{x},\mathbf{y})$ can be complex.

A space $V$ together with an inner product on it is called an inner product space. Given an inner product space, one defines the norm on it by

\[ \|\mathbf{x}\| = \sqrt{(\mathbf{x},\mathbf{x})}. \]

Examples

Example 5.1.1.

Let $V$ be $\mathbb{R}^n$ or $\mathbb{C}^n$. We already have the inner product $(\mathbf{x},\mathbf{y}) = \mathbf{y}^*\mathbf{x} = \sum_{k=1}^n x_k\overline{y}_k$ defined above.

This inner product is called the standard inner product in $\mathbb{R}^n$ or $\mathbb{C}^n$.

We will use the symbol $\mathbb{F}$ to denote both $\mathbb{C}$ and $\mathbb{R}$. When we have some statement about the space $\mathbb{F}^n$, it means the statement is true for both $\mathbb{R}^n$ and $\mathbb{C}^n$.

Example 5.1.2.

Let $V$ be the space $\mathbb{P}_n$ of polynomials of degree at most $n$. Define the inner product by

\[ (f,g) = \int_{-1}^1 f(t)\overline{g(t)}\,dt. \]

It is easy to check that the above properties 1–4 are satisfied.

This definition works in both the complex and real cases. In the real case we only allow polynomials with real coefficients, and we do not need the complex conjugate here.

Let us recall that for a square matrix $A$, its trace is defined as the sum of the diagonal entries,

\[ \operatorname{trace} A := \sum_{k=1}^n a_{k,k}. \]

Example 5.1.3.

For the space $M_{m\times n}$ of $m\times n$ matrices let us define the so-called Frobenius inner product by

\[ (A,B) = \operatorname{trace}(B^*A). \]

Again, it is easy to check that properties 1–4 are satisfied, i.e. that we have indeed defined an inner product.

Note that

\[ \operatorname{trace}(B^*A) = \sum_{j,k} A_{j,k}\overline{B}_{j,k}, \]

so this inner product coincides with the standard inner product in $\mathbb{C}^{mn}$.
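As a sanity check, here is a minimal numpy sketch (with randomly generated matrices) verifying that $\operatorname{trace}(B^*A)$ equals the entrywise sum $\sum_{j,k} A_{j,k}\overline{B}_{j,k}$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# Frobenius inner product (A, B) = trace(B* A) ...
ip_trace = np.trace(B.conj().T @ A)
# ... equals the standard inner product of the mn entries
ip_entries = np.sum(A * B.conj())

assert np.allclose(ip_trace, ip_entries)
```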

5.1.4. Properties of inner product

The statements we prove in this section are true for any abstract inner product space, not only for $\mathbb{F}^n$. To prove them we use only properties 1–4 of the inner product.

First of all, let us notice that properties 1 and 2 imply that

2′. $(\mathbf{x},\alpha\mathbf{y}+\beta\mathbf{z}) = \overline{\alpha}(\mathbf{x},\mathbf{y}) + \overline{\beta}(\mathbf{x},\mathbf{z})$.

Indeed,

\[ (\mathbf{x},\alpha\mathbf{y}+\beta\mathbf{z}) = \overline{(\alpha\mathbf{y}+\beta\mathbf{z},\mathbf{x})} = \overline{\alpha(\mathbf{y},\mathbf{x}) + \beta(\mathbf{z},\mathbf{x})} = \overline{\alpha}\,\overline{(\mathbf{y},\mathbf{x})} + \overline{\beta}\,\overline{(\mathbf{z},\mathbf{x})} = \overline{\alpha}(\mathbf{x},\mathbf{y}) + \overline{\beta}(\mathbf{x},\mathbf{z}). \]

Note also that property 2 implies that for all vectors $\mathbf{x}$

\[ (\mathbf{0},\mathbf{x}) = (\mathbf{x},\mathbf{0}) = 0. \]

Lemma 5.1.4.

Let $\mathbf{x}$ be a vector in an inner product space $V$. Then $\mathbf{x} = \mathbf{0}$ if and only if

\[ (\mathbf{x},\mathbf{y}) = 0 \qquad \forall\mathbf{y}\in V. \tag{5.1.1} \]
Proof.

Since $(\mathbf{0},\mathbf{y}) = 0$, we only need to show that (5.1.1) implies $\mathbf{x} = \mathbf{0}$. Putting $\mathbf{y} = \mathbf{x}$ in (5.1.1) we get $(\mathbf{x},\mathbf{x}) = 0$, so $\mathbf{x} = \mathbf{0}$. ∎

Applying the above lemma to the difference $\mathbf{x}-\mathbf{y}$ we get the following corollary.

Corollary 5.1.5.

Let $\mathbf{x},\mathbf{y}$ be vectors in an inner product space $V$. The equality $\mathbf{x} = \mathbf{y}$ holds if and only if

\[ (\mathbf{x},\mathbf{z}) = (\mathbf{y},\mathbf{z}) \qquad \forall\mathbf{z}\in V. \]

The following corollary is very simple, but it will be used a lot.

Corollary 5.1.6.

Suppose two operators $A, B: X\to Y$ satisfy

\[ (A\mathbf{x},\mathbf{y}) = (B\mathbf{x},\mathbf{y}) \qquad \forall\mathbf{x}\in X,\ \forall\mathbf{y}\in Y. \]

Then $A = B$.

Proof.

By the previous corollary (fix $\mathbf{x}$ and take all possible $\mathbf{y}$'s) we get $A\mathbf{x} = B\mathbf{x}$. Since this is true for all $\mathbf{x}\in X$, the transformations $A$ and $B$ coincide. ∎

The following property relates the norm and the inner product.

Theorem 5.1.7 (Cauchy–Schwarz inequality).
\[ |(\mathbf{x},\mathbf{y})| \le \|\mathbf{x}\|\cdot\|\mathbf{y}\|. \]
Proof.

The proof we are going to present is not the shortest one, but it shows where the main ideas come from.

Let us consider the real case first. If $\mathbf{y} = \mathbf{0}$, the statement is trivial, so we can assume that $\mathbf{y}\ne\mathbf{0}$. By the properties of an inner product, for all scalars $t$

\[ 0 \le \|\mathbf{x}-t\mathbf{y}\|^2 = (\mathbf{x}-t\mathbf{y},\mathbf{x}-t\mathbf{y}) = \|\mathbf{x}\|^2 - 2t(\mathbf{x},\mathbf{y}) + t^2\|\mathbf{y}\|^2. \]

In particular, this inequality holds for $t = \frac{(\mathbf{x},\mathbf{y})}{\|\mathbf{y}\|^2}$ (that is the point where the above quadratic polynomial attains its minimum; it can be found, for example, by taking the derivative in $t$ and equating it to $0$), and for this point the inequality becomes

\[ 0 \le \|\mathbf{x}\|^2 - 2\frac{(\mathbf{x},\mathbf{y})^2}{\|\mathbf{y}\|^2} + \frac{(\mathbf{x},\mathbf{y})^2}{\|\mathbf{y}\|^2} = \|\mathbf{x}\|^2 - \frac{(\mathbf{x},\mathbf{y})^2}{\|\mathbf{y}\|^2}, \]

which is exactly the inequality we need to prove.

There are several possible ways to treat the complex case. One is to replace $\mathbf{x}$ by $\alpha\mathbf{x}$, where $\alpha$ is a complex constant with $|\alpha| = 1$ such that $(\alpha\mathbf{x},\mathbf{y})$ is real, and then repeat the proof for the real case.

The other possibility is again to consider

\begin{align*}
0 \le \|\mathbf{x}-t\mathbf{y}\|^2 &= (\mathbf{x}-t\mathbf{y},\mathbf{x}-t\mathbf{y}) = (\mathbf{x},\mathbf{x}-t\mathbf{y}) - t(\mathbf{y},\mathbf{x}-t\mathbf{y}) \\
&= \|\mathbf{x}\|^2 - t(\mathbf{y},\mathbf{x}) - \overline{t}(\mathbf{x},\mathbf{y}) + |t|^2\|\mathbf{y}\|^2.
\end{align*}

Substituting $t = \frac{(\mathbf{x},\mathbf{y})}{\|\mathbf{y}\|^2} = \frac{\overline{(\mathbf{y},\mathbf{x})}}{\|\mathbf{y}\|^2}$ into this inequality, we get

\[ 0 \le \|\mathbf{x}\|^2 - \frac{|(\mathbf{x},\mathbf{y})|^2}{\|\mathbf{y}\|^2}, \]

which is the inequality we need.

Note that the above paragraph is in fact a complete formal proof of the theorem. The reasoning before it was only to explain why we need to pick this particular value of $t$. ∎

An immediate Corollary of the Cauchy–Schwarz Inequality is the following lemma.

Lemma 5.1.8 (Triangle inequality).

For any vectors $\mathbf{x}$, $\mathbf{y}$ in an inner product space,

\[ \|\mathbf{x}+\mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|. \]

Proof.

\begin{align*}
\|\mathbf{x}+\mathbf{y}\|^2 = (\mathbf{x}+\mathbf{y},\mathbf{x}+\mathbf{y}) &= \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + (\mathbf{x},\mathbf{y}) + (\mathbf{y},\mathbf{x}) \\
&\le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2|(\mathbf{x},\mathbf{y})| \\
&\le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\|\mathbf{x}\|\cdot\|\mathbf{y}\| = (\|\mathbf{x}\|+\|\mathbf{y}\|)^2. \qquad\text{∎}
\end{align*}

The following polarization identities allow one to reconstruct the inner product from the norm:

Lemma 5.1.9 (Polarization identities).

For $\mathbf{x},\mathbf{y}\in V$,

\[ (\mathbf{x},\mathbf{y}) = \frac{1}{4}\left(\|\mathbf{x}+\mathbf{y}\|^2 - \|\mathbf{x}-\mathbf{y}\|^2\right) \]

if $V$ is a real inner product space, and

\[ (\mathbf{x},\mathbf{y}) = \frac{1}{4}\sum_{\alpha=\pm 1,\pm i} \alpha\|\mathbf{x}+\alpha\mathbf{y}\|^2 \]

if $V$ is a complex inner product space.

The lemma is proved by direct computation. We leave the proof as an exercise for the reader.
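For a quick numerical check, here is a minimal numpy sketch (with randomly generated vectors) of the complex polarization identity against the standard inner product in $\mathbb{C}^n$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

ip = np.vdot(y, x)  # standard inner product (x, y) = y* x

# (x, y) = (1/4) * sum over alpha in {1, -1, i, -i} of alpha ||x + alpha y||^2
polarized = sum(a * np.linalg.norm(x + a * y) ** 2
                for a in (1, -1, 1j, -1j)) / 4

assert np.allclose(ip, polarized)
```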

Another important property of the norm in an inner product space can also be checked by direct calculation.

Lemma 5.1.10 (Parallelogram Identity).

For any vectors $\mathbf{u},\mathbf{v}$,

\[ \|\mathbf{u}+\mathbf{v}\|^2 + \|\mathbf{u}-\mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2). \]

In 2-dimensional space this lemma relates the sides of a parallelogram with its diagonals, which explains the name. It is a well-known fact from planar geometry.

5.1.5. Norm. Normed spaces

We have proved before that the norm $\|\mathbf{v}\|$ satisfies the following properties:

1. Homogeneity: $\|\alpha\mathbf{v}\| = |\alpha|\cdot\|\mathbf{v}\|$ for all vectors $\mathbf{v}$ and all scalars $\alpha$.

2. Triangle inequality: $\|\mathbf{u}+\mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.

3. Non-negativity: $\|\mathbf{v}\| \ge 0$ for all vectors $\mathbf{v}$.

4. Non-degeneracy: $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.

Suppose that in a vector space $V$ we assign to each vector $\mathbf{v}$ a number $\|\mathbf{v}\|$ such that the above properties 1–4 are satisfied. Then we say that the function $\mathbf{v}\mapsto\|\mathbf{v}\|$ is a norm. A vector space $V$ equipped with a norm is called a normed space.

Any inner product space is a normed space, because the norm $\|\mathbf{v}\| = \sqrt{(\mathbf{v},\mathbf{v})}$ satisfies the above properties 1–4. However, there are many other normed spaces. For example, given $p$, $1\le p<\infty$, one can define the norm $\|\cdot\|_p$ on $\mathbb{R}^n$ or $\mathbb{C}^n$ by

\[ \|\mathbf{x}\|_p = \left(|x_1|^p + |x_2|^p + \ldots + |x_n|^p\right)^{1/p} = \left(\sum_{k=1}^n |x_k|^p\right)^{1/p}. \]

One can also define the norm $\|\cdot\|_\infty$ (the case $p=\infty$) by

\[ \|\mathbf{x}\|_\infty = \max\{|x_k| : k = 1,2,\ldots,n\}. \]

The norm $\|\cdot\|_p$ for $p = 2$ coincides with the regular norm obtained from the inner product.

To check that $\|\cdot\|_p$ is indeed a norm, one has to check that it satisfies all the above properties 1–4. Properties 1, 3 and 4 are very easy to check; we leave this as an exercise for the reader. The triangle inequality (property 2) is easy to check for $p = 1$ and $p = \infty$ (and we proved it for $p = 2$).

For all other $p$ the triangle inequality is true, but the proof is not so simple, and we will not present it here. The triangle inequality for $\|\cdot\|_p$ even has a special name: it is called the Minkowski inequality, after the German mathematician H. Minkowski.

Note that the norm $\|\cdot\|_p$ for $p\ne 2$ cannot be obtained from an inner product. It is easy to see that this norm is not obtained from the standard inner product in $\mathbb{R}^n$ ($\mathbb{C}^n$). But we claim more! We claim that it is impossible to introduce any inner product which gives rise to the norm $\|\cdot\|_p$, $p\ne 2$.

This statement is actually quite easy to prove. By Lemma 5.1.10, any norm obtained from an inner product must satisfy the Parallelogram Identity. It is easy to see that the Parallelogram Identity fails for the norm $\|\cdot\|_p$, $p\ne 2$, and one can easily find a counterexample in $\mathbb{R}^2$, which then gives rise to a counterexample in all other spaces.
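For instance, taking the standard basis vectors of $\mathbb{R}^2$ as the counterexample, a minimal numpy sketch shows the Parallelogram Identity failing for $p\ne 2$:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

def p_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Parallelogram identity: ||u+v||^2 + ||u-v||^2 vs 2(||u||^2 + ||v||^2)
for p in (1, 2, 4):
    lhs = p_norm(u + v, p) ** 2 + p_norm(u - v, p) ** 2
    rhs = 2 * (p_norm(u, p) ** 2 + p_norm(v, p) ** 2)
    print(p, lhs, rhs)  # equal only for p = 2
```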

In fact, the Parallelogram Identity, as the theorem below asserts, completely characterizes norms obtained from an inner product.

Theorem 5.1.11.

A norm in a normed space is obtained from some inner product if and only if it satisfies the Parallelogram Identity

\[ \|\mathbf{u}+\mathbf{v}\|^2 + \|\mathbf{u}-\mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2) \qquad \forall\mathbf{u},\mathbf{v}\in V. \]

Lemma 5.1.10 asserts that a norm obtained from an inner product satisfies the Parallelogram Identity.

The converse implication is more complicated. If we are given a norm, and this norm came from an inner product, then we do not have any choice: this inner product must be given by the polarization identities, see Lemma 5.1.9. But we need to show that the $(\mathbf{x},\mathbf{y})$ we get from the polarization identities is indeed an inner product, i.e. that it satisfies all the properties.

It is indeed possible to verify that if the norm satisfies the Parallelogram Identity, then $(\mathbf{x},\mathbf{y})$ obtained from the polarization identities satisfies all the properties of an inner product. However, the proof is a bit too involved, so we do not present it here.

Exercises.

5.1.1.

Compute

\[ (3+2i)(5-3i), \qquad \frac{2-3i}{1-2i}, \qquad \operatorname{Re}\left(\frac{2-3i}{1-2i}\right), \qquad (1+2i)^3, \qquad \operatorname{Im}\left((1+2i)^3\right). \]
5.1.2.

For the vectors $\mathbf{x} = (1,2i,1+i)^T$ and $\mathbf{y} = (i,2-i,3)^T$ compute

a) $(\mathbf{x},\mathbf{y})$, $\|\mathbf{x}\|^2$, $\|\mathbf{y}\|^2$, $\|\mathbf{y}\|$;

b) $(3\mathbf{x},2i\mathbf{y})$, $(2\mathbf{x},i\mathbf{x}+2\mathbf{y})$;

c) $\|\mathbf{x}+2\mathbf{y}\|$.

Remark: After you have done part a), you can do parts b) and c) without actually computing all the vectors involved, just by using the properties of the inner product.

5.1.3.

Let $\|\mathbf{u}\| = 2$, $\|\mathbf{v}\| = 3$, $(\mathbf{u},\mathbf{v}) = 2+i$. Compute

\[ \|\mathbf{u}+\mathbf{v}\|^2, \qquad \|\mathbf{u}-\mathbf{v}\|^2, \qquad (\mathbf{u}+\mathbf{v},\mathbf{u}-i\mathbf{v}), \qquad (\mathbf{u}+3i\mathbf{v},4i\mathbf{u}). \]

5.1.4.

Prove that for vectors in an inner product space

\[ \|\mathbf{x}\pm\mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 \pm 2\operatorname{Re}(\mathbf{x},\mathbf{y}). \]

Recall that $\operatorname{Re} z = \frac{1}{2}(z+\overline{z})$.

5.1.5.

Explain why each of the following is not an inner product on the given vector space:

a) $(\mathbf{x},\mathbf{y}) = x_1y_1 - x_2y_2$ on $\mathbb{R}^2$;

b) $(A,B) = \operatorname{trace}(A+B)$ on the space of real $2\times 2$ matrices;

c) $(f,g) = \int_0^1 f'(t)\overline{g(t)}\,dt$ on the space of polynomials; $f'(t)$ denotes the derivative.

5.1.6. Equality in the Cauchy–Schwarz inequality.

Prove that

\[ |(\mathbf{x},\mathbf{y})| = \|\mathbf{x}\|\cdot\|\mathbf{y}\| \]

if and only if one of the vectors is a multiple of the other. Hint: analyze the proof of the Cauchy–Schwarz inequality.

5.1.7.

Prove the parallelogram identity for an inner product space $V$,

\[ \|\mathbf{x}+\mathbf{y}\|^2 + \|\mathbf{x}-\mathbf{y}\|^2 = 2(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2). \]
5.1.8.

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be a spanning set (in particular, a basis) in an inner product space $V$. Prove that:

a) If $(\mathbf{x},\mathbf{v}) = 0$ for all $\mathbf{v}\in V$, then $\mathbf{x} = \mathbf{0}$;

b) If $(\mathbf{x},\mathbf{v}_k) = 0$ $\forall k$, then $\mathbf{x} = \mathbf{0}$;

c) If $(\mathbf{x},\mathbf{v}_k) = (\mathbf{y},\mathbf{v}_k)$ $\forall k$, then $\mathbf{x} = \mathbf{y}$.

5.1.9.

Consider the space $\mathbb{R}^2$ with the norm $\|\cdot\|_p$ introduced in Section 5.1.5. For $p = 1, 2, \infty$, draw the "unit ball" $B_p$ in the norm $\|\cdot\|_p$,

\[ B_p := \{\mathbf{x}\in\mathbb{R}^2 : \|\mathbf{x}\|_p \le 1\}. \]

Can you guess what the balls $B_p$ for other $p$ look like?

5.2. Orthogonality. Orthogonal and orthonormal bases.

Definition 5.2.1.

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are called orthogonal (also perpendicular) if $(\mathbf{u},\mathbf{v}) = 0$. We will write $\mathbf{u}\perp\mathbf{v}$ to say that the vectors are orthogonal.

Note that for orthogonal vectors $\mathbf{u}$ and $\mathbf{v}$ we have the following, so-called Pythagorean identity:

\[ \|\mathbf{u}+\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \qquad \text{if } \mathbf{u}\perp\mathbf{v}. \]

The proof is a straightforward computation:

\[ \|\mathbf{u}+\mathbf{v}\|^2 = (\mathbf{u}+\mathbf{v},\mathbf{u}+\mathbf{v}) = (\mathbf{u},\mathbf{u}) + (\mathbf{v},\mathbf{v}) + (\mathbf{u},\mathbf{v}) + (\mathbf{v},\mathbf{u}) = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \]

($(\mathbf{u},\mathbf{v}) = (\mathbf{v},\mathbf{u}) = 0$ because of orthogonality).

Definition 5.2.2.

We say that a vector $\mathbf{v}$ is orthogonal to a subspace $E$ if $\mathbf{v}$ is orthogonal to all vectors $\mathbf{w}$ in $E$.

We say that subspaces $E$ and $F$ are orthogonal if all vectors in $E$ are orthogonal to $F$, i.e. all vectors in $E$ are orthogonal to all vectors in $F$.

The following lemma shows how to check that a vector is orthogonal to a subspace.

Lemma 5.2.3.

Let $E$ be spanned by the vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$. Then $\mathbf{v}\perp E$ if and only if

\[ \mathbf{v}\perp\mathbf{v}_k \qquad \forall k = 1,2,\ldots,r. \]
Proof.

By the definition, if $\mathbf{v}\perp E$ then $\mathbf{v}$ is orthogonal to all vectors in $E$. In particular, $\mathbf{v}\perp\mathbf{v}_k$, $k = 1,2,\ldots,r$.

On the other hand, let $\mathbf{v}\perp\mathbf{v}_k$, $k = 1,2,\ldots,r$. Since the vectors $\mathbf{v}_k$ span $E$, any vector $\mathbf{w}\in E$ can be represented as a linear combination $\sum_{k=1}^r \alpha_k\mathbf{v}_k$. Then

\[ (\mathbf{v},\mathbf{w}) = \sum_{k=1}^r \overline{\alpha}_k(\mathbf{v},\mathbf{v}_k) = 0, \]

so $\mathbf{v}\perp\mathbf{w}$. ∎

Definition 5.2.4.

A system of vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is called orthogonal if any two vectors in it are orthogonal to each other (i.e. if $(\mathbf{v}_j,\mathbf{v}_k) = 0$ for $j\ne k$).

If, in addition, $\|\mathbf{v}_k\| = 1$ for all $k$, we call the system orthonormal.

Lemma 5.2.5 (Generalized Pythagorean identity).

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be an orthogonal system. Then

\[ \left\|\sum_{k=1}^n \alpha_k\mathbf{v}_k\right\|^2 = \sum_{k=1}^n |\alpha_k|^2\|\mathbf{v}_k\|^2. \]

This formula looks particularly simple for orthonormal systems, where $\|\mathbf{v}_k\| = 1$.

Proof of the Lemma.
\[ \left\|\sum_{k=1}^n \alpha_k\mathbf{v}_k\right\|^2 = \Bigl(\sum_{k=1}^n \alpha_k\mathbf{v}_k, \sum_{j=1}^n \alpha_j\mathbf{v}_j\Bigr) = \sum_{k=1}^n\sum_{j=1}^n \alpha_k\overline{\alpha}_j(\mathbf{v}_k,\mathbf{v}_j). \]

Because of orthogonality, $(\mathbf{v}_k,\mathbf{v}_j) = 0$ if $j\ne k$. Therefore we only need to sum the terms with $j = k$, which gives exactly

\[ \sum_{k=1}^n |\alpha_k|^2(\mathbf{v}_k,\mathbf{v}_k) = \sum_{k=1}^n |\alpha_k|^2\|\mathbf{v}_k\|^2. \qquad\text{∎} \]

Corollary 5.2.6.

Any orthogonal system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ of non-zero vectors is linearly independent.

Proof.

Suppose that for some $\alpha_1,\alpha_2,\ldots,\alpha_n$ we have $\sum_{k=1}^n \alpha_k\mathbf{v}_k = \mathbf{0}$. Then by the Generalized Pythagorean identity (Lemma 5.2.5)

\[ 0 = \|\mathbf{0}\|^2 = \sum_{k=1}^n |\alpha_k|^2\|\mathbf{v}_k\|^2. \]

Since $\|\mathbf{v}_k\|\ne 0$ (as $\mathbf{v}_k\ne\mathbf{0}$) we conclude that

\[ \alpha_k = 0 \qquad \forall k, \]

so only the trivial linear combination gives $\mathbf{0}$. ∎

Remark.

In what follows, by an orthogonal system we will usually mean an orthogonal system of non-zero vectors. Since the zero vector $\mathbf{0}$ is orthogonal to everything, it can always be added to any orthogonal system, but it is really not interesting to consider orthogonal systems containing the zero vector.

5.2.1. Orthogonal and orthonormal bases.

Definition 5.2.7.

An orthogonal (orthonormal) system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ which is also a basis is called an orthogonal (orthonormal) basis.

It is clear that if $\dim V = n$, then any orthogonal system of $n$ non-zero vectors is an orthogonal basis.

As we studied before, to find the coordinates of a vector in a basis one generally needs to solve a linear system. However, for an orthogonal basis finding the coordinates of a vector is much easier. Namely, suppose $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is an orthogonal basis, and let

\[ \mathbf{x} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \ldots + \alpha_n\mathbf{v}_n = \sum_{j=1}^n \alpha_j\mathbf{v}_j. \]

Taking the inner product of both sides of the equation with $\mathbf{v}_1$ we get

\[ (\mathbf{x},\mathbf{v}_1) = \sum_{j=1}^n \alpha_j(\mathbf{v}_j,\mathbf{v}_1) = \alpha_1(\mathbf{v}_1,\mathbf{v}_1) = \alpha_1\|\mathbf{v}_1\|^2 \]

(all the inner products $(\mathbf{v}_j,\mathbf{v}_1) = 0$ for $j\ne 1$), so

\[ \alpha_1 = \frac{(\mathbf{x},\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}. \]

Similarly, taking the inner product of both sides with $\mathbf{v}_k$ we get

\[ (\mathbf{x},\mathbf{v}_k) = \sum_{j=1}^n \alpha_j(\mathbf{v}_j,\mathbf{v}_k) = \alpha_k(\mathbf{v}_k,\mathbf{v}_k) = \alpha_k\|\mathbf{v}_k\|^2, \]

so

\[ \alpha_k = \frac{(\mathbf{x},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \tag{5.2.1} \]

Therefore,

to find the coordinates of a vector in an orthogonal basis one does not need to solve a linear system; the coordinates are determined by formula (5.2.1).

This formula is especially simple for orthonormal bases, when $\|\mathbf{v}_k\| = 1$.

Namely, if $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is an orthonormal basis, any vector $\mathbf{v}$ can be represented as

\[ \mathbf{v} = \sum_{k=1}^n (\mathbf{v},\mathbf{v}_k)\mathbf{v}_k. \tag{5.2.2} \]

This formula is sometimes called (a baby version of) the abstract orthogonal Fourier decomposition. The classical (non-abstract) Fourier decomposition deals with a concrete orthonormal system (sines and cosines or complex exponentials). We call this formula a baby version because the real Fourier decomposition deals with infinite orthonormal systems.
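To make this concrete, here is a minimal numpy sketch (the basis and the vector are made up for the example) of formula (5.2.2) in $\mathbb{R}^2$:

```python
import numpy as np

# An orthonormal basis of R^2: the standard basis rotated by 45 degrees
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)

x = np.array([3.0, 5.0])

# No linear system to solve: the coordinates are alpha_k = (x, v_k)
alphas = [np.dot(x, v) for v in (v1, v2)]

# Abstract Fourier decomposition: x = sum_k (x, v_k) v_k
assert np.allclose(x, alphas[0] * v1 + alphas[1] * v2)
```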

Remark 5.2.8.

The importance of orthonormal bases is that if we fix an orthonormal basis in an inner product space $V$, we can work with coordinates in this basis the same way we work with vectors in $\mathbb{F}^n$. Namely, as was discussed in the very beginning of the book (see Remark 1.2.4 in Chapter 1), if we have a vector space $V$ (over a field $\mathbb{F}$) with a basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$, then we can perform the standard vector operations (vector addition and multiplication by a scalar) by working with the columns of coordinates in the basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ in absolutely the same way we work with vectors in the standard coordinate space $\mathbb{F}^n$.

Exercise 5.2.3 below shows that if we have an orthonormal basis in an inner product space $V$, we can compute the inner product of two vectors in $V$ by taking the columns of their coordinates in this orthonormal basis and computing the standard inner product (in $\mathbb{C}^n$ or $\mathbb{R}^n$) of these columns.

As will be shown below in Section 5.3, any finite-dimensional inner product space has an orthonormal basis. Thus, the standard inner product spaces $\mathbb{C}^n$ (or $\mathbb{R}^n$ in the case of real spaces) are essentially the only examples of finite-dimensional inner product spaces.

This is a very important remark, allowing one to translate any statement about the standard inner product space $\mathbb{F}^n$ to an inner product space with an orthonormal basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$.

Exercises.

5.2.1.

Find all vectors in $\mathbb{R}^4$ orthogonal to the vectors $(1,1,1,1)^T$ and $(1,2,3,4)^T$.

5.2.2.

Let $A$ be a real $m\times n$ matrix. Describe $(\operatorname{Ran} A^T)^\perp$ and $(\operatorname{Ran} A)^\perp$.

5.2.3.

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be an orthonormal basis in $V$.

a) Prove that for any $\mathbf{x} = \sum_{k=1}^n \alpha_k\mathbf{v}_k$, $\mathbf{y} = \sum_{k=1}^n \beta_k\mathbf{v}_k$,

\[ (\mathbf{x},\mathbf{y}) = \sum_{k=1}^n \alpha_k\overline{\beta}_k. \]

b) Deduce from this Parseval's identity:

\[ (\mathbf{x},\mathbf{y}) = \sum_{k=1}^n (\mathbf{x},\mathbf{v}_k)\overline{(\mathbf{y},\mathbf{v}_k)}. \]

c) Assume now that $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is only an orthogonal basis, not an orthonormal one. Can you write down Parseval's identity in this case?

This problem shows that if we have an orthonormal basis, we can use the coordinates in this basis absolutely the same way we use the standard coordinates in $\mathbb{C}^n$ (or $\mathbb{R}^n$).

The problem below shows that we can define an inner product by declaring a basis to be an orthonormal one.

5.2.4.

Let $V$ be a vector space and let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be a basis in $V$. For $\mathbf{x} = \sum_{k=1}^n \alpha_k\mathbf{v}_k$, $\mathbf{y} = \sum_{k=1}^n \beta_k\mathbf{v}_k$ define $\langle\mathbf{x},\mathbf{y}\rangle := \sum_{k=1}^n \alpha_k\overline{\beta}_k$.

Prove that $\langle\mathbf{x},\mathbf{y}\rangle$ defines an inner product on $V$.

5.2.5.

Let $A$ be a real $m\times n$ matrix. Describe the set of all vectors in $\mathbb{F}^m$ orthogonal to $\operatorname{Ran} A$.

5.3. Orthogonal projection and Gram-Schmidt orthogonalization

Recalling the definition of orthogonal projection from classical planar (2-dimensional) geometry, one can introduce the following definition. Let $E$ be a subspace of an inner product space $V$.

Definition 5.3.1.

For a vector $\mathbf{v}$ its orthogonal projection $P_E\mathbf{v}$ onto the subspace $E$ is a vector $\mathbf{w}$ such that

1. $\mathbf{w}\in E$;

2. $\mathbf{v}-\mathbf{w}\perp E$.

We will use the notation $\mathbf{w} = P_E\mathbf{v}$ for the orthogonal projection.

After introducing an object, it is natural to ask:

1. Does the object exist?

2. Is the object unique?

3. How does one find it?

We will show first that the projection is unique. Then we present a method of finding the projection, proving its existence.

The following theorem shows why the orthogonal projection is important and also proves that it is unique.

Theorem 5.3.2.

The orthogonal projection $\mathbf{w} = P_E\mathbf{v}$ minimizes the distance from $\mathbf{v}$ to $E$, i.e. for all $\mathbf{x}\in E$

\[ \|\mathbf{v}-\mathbf{w}\| \le \|\mathbf{v}-\mathbf{x}\|. \]

Moreover, if for some $\mathbf{x}\in E$

\[ \|\mathbf{v}-\mathbf{w}\| = \|\mathbf{v}-\mathbf{x}\|, \]

then $\mathbf{x} = \mathbf{w}$.

Proof.

Let $\mathbf{y} = \mathbf{w}-\mathbf{x}$. Then

\[ \mathbf{v}-\mathbf{x} = \mathbf{v}-\mathbf{w}+\mathbf{w}-\mathbf{x} = \mathbf{v}-\mathbf{w}+\mathbf{y}. \]

Since $\mathbf{v}-\mathbf{w}\perp E$ and $\mathbf{y}\in E$, we have $\mathbf{y}\perp\mathbf{v}-\mathbf{w}$, and so by the Pythagorean identity

\[ \|\mathbf{v}-\mathbf{x}\|^2 = \|\mathbf{v}-\mathbf{w}\|^2 + \|\mathbf{y}\|^2 \ge \|\mathbf{v}-\mathbf{w}\|^2. \]

Note that equality happens only if $\mathbf{y} = \mathbf{0}$, i.e. only if $\mathbf{x} = \mathbf{w}$. ∎

The following proposition shows how to find the orthogonal projection if we know an orthogonal basis in $E$.

Proposition 5.3.3.

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$ be an orthogonal basis in $E$. Then the orthogonal projection $P_E\mathbf{v}$ of a vector $\mathbf{v}$ is given by the formula

\[ P_E\mathbf{v} = \sum_{k=1}^r \alpha_k\mathbf{v}_k, \qquad\text{where}\qquad \alpha_k = \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \]

In other words,

\[ P_E\mathbf{v} = \sum_{k=1}^r \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\,\mathbf{v}_k. \tag{5.3.1} \]

Note that the formula for $\alpha_k$ coincides with (5.2.1); i.e. this formula, applied to an orthogonal system (not necessarily a basis), gives us the projection onto its span.

Remark 5.3.4.

It is easy to see now from formula (5.3.1) that the orthogonal projection $P_E$ is a linear transformation.

One can also see the linearity of $P_E$ directly, from the definition and the uniqueness of the orthogonal projection. Indeed, it is easy to check that for any $\mathbf{x}$ and $\mathbf{y}$ the vector $\alpha\mathbf{x}+\beta\mathbf{y} - (\alpha P_E\mathbf{x} + \beta P_E\mathbf{y})$ is orthogonal to any vector in $E$, so by the definition $P_E(\alpha\mathbf{x}+\beta\mathbf{y}) = \alpha P_E\mathbf{x} + \beta P_E\mathbf{y}$.

Remark 5.3.5.

Recalling the definition of the inner product in $\mathbb{C}^n$ and $\mathbb{R}^n$, one can see from the above formula (5.3.1) that the matrix of the orthogonal projection $P_E$ onto $E$ in $\mathbb{C}^n$ ($\mathbb{R}^n$) is given by

\[ P_E = \sum_{k=1}^r \frac{1}{\|\mathbf{v}_k\|^2}\,\mathbf{v}_k\mathbf{v}_k^*, \tag{5.3.2} \]

where the columns $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$ form an orthogonal basis in $E$.
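Here is a minimal numpy sketch of formula (5.3.2), using the orthogonal pair $\mathbf{v}_1 = (1,1,1)^T$, $\mathbf{v}_2 = (-1,0,1)^T$ (the same vectors that appear in the Gram–Schmidt example below):

```python
import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([-1.0, 0.0, 1.0])

# Formula (5.3.2): P_E = sum_k v_k v_k^* / ||v_k||^2 (real case, so * is just T)
P = sum(np.outer(v, v) / np.dot(v, v) for v in (v1, v2))

# P fixes vectors in E and annihilates vectors orthogonal to E
assert np.allclose(P @ v1, v1)
assert np.allclose(P @ np.cross(v1, v2), 0)
```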

Proof of Proposition 5.3.3.

Let

\[ \mathbf{w} := \sum_{k=1}^r \alpha_k\mathbf{v}_k, \qquad\text{where}\qquad \alpha_k = \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \]

We want to show that $\mathbf{v}-\mathbf{w}\perp E$. By Lemma 5.2.3 it is sufficient to show that $\mathbf{v}-\mathbf{w}\perp\mathbf{v}_k$, $k = 1,2,\ldots,r$. Computing the inner product we get, for $k = 1,2,\ldots,r$,

\begin{align*}
(\mathbf{v}-\mathbf{w},\mathbf{v}_k) &= (\mathbf{v},\mathbf{v}_k) - (\mathbf{w},\mathbf{v}_k) = (\mathbf{v},\mathbf{v}_k) - \sum_{j=1}^r \alpha_j(\mathbf{v}_j,\mathbf{v}_k) \\
&= (\mathbf{v},\mathbf{v}_k) - \alpha_k(\mathbf{v}_k,\mathbf{v}_k) = (\mathbf{v},\mathbf{v}_k) - \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\|\mathbf{v}_k\|^2 = 0. \qquad\text{∎}
\end{align*}

So, if we know an orthogonal basis in $E$, we can find the orthogonal projection onto $E$. In particular, since any system consisting of one vector is an orthogonal system, we know how to perform orthogonal projection onto one-dimensional subspaces.

But how do we find an orthogonal projection if we are only given a basis in $E$? Fortunately, there exists a simple algorithm allowing one to get an orthogonal basis from a basis.

5.3.1. Gram-Schmidt orthogonalization algorithm

Suppose we have a linearly independent system $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$. The Gram–Schmidt method constructs from this system an orthogonal system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ such that

\[ \operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n\} = \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n\}. \]

Moreover, for all $r\le n$ we get

\[ \operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_r\} = \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r\}. \]

Now let us describe the algorithm.

Step 1. Put $\mathbf{v}_1 := \mathbf{x}_1$ and denote $E_1 := \operatorname{span}\{\mathbf{x}_1\} = \operatorname{span}\{\mathbf{v}_1\}$.

Step 2. Define $\mathbf{v}_2$ by

\[ \mathbf{v}_2 = \mathbf{x}_2 - P_{E_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{(\mathbf{x}_2,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1. \]

Define $E_2 := \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2\}$. Note that $\operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2\} = E_2$.

Step 3. Define $\mathbf{v}_3$ by

\[ \mathbf{v}_3 := \mathbf{x}_3 - P_{E_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{(\mathbf{x}_3,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{(\mathbf{x}_3,\mathbf{v}_2)}{\|\mathbf{v}_2\|^2}\mathbf{v}_2. \]

Put $E_3 := \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}$. Note that $\operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\} = E_3$. Note also that $\mathbf{x}_3\notin E_2$ (by linear independence), so $\mathbf{v}_3\ne\mathbf{0}$.

Step $r+1$. Suppose that we have already made $r$ steps of the process, constructing an orthogonal system (consisting of non-zero vectors) $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$ such that $E_r := \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r\} = \operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_r\}$. Define

\[ \mathbf{v}_{r+1} := \mathbf{x}_{r+1} - P_{E_r}\mathbf{x}_{r+1} = \mathbf{x}_{r+1} - \sum_{k=1}^r \frac{(\mathbf{x}_{r+1},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\mathbf{v}_k. \]

Note that $\mathbf{x}_{r+1}\notin E_r$, so $\mathbf{v}_{r+1}\ne\mathbf{0}$.

Continuing this algorithm we get an orthogonal system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$.
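The algorithm translates directly into code. Here is a minimal Python sketch for the real case (classical Gram–Schmidt; it omits the re-orthogonalization used in numerically robust variants):

```python
import numpy as np

def gram_schmidt(xs):
    """Orthogonalize a linearly independent list of vectors in R^n."""
    vs = []
    for x in xs:
        v = x.astype(float)
        # v_{r+1} = x_{r+1} - sum_k (x_{r+1}, v_k) / ||v_k||^2 * v_k
        for w in vs:
            v = v - (np.dot(x, w) / np.dot(w, w)) * w
        vs.append(v)
    return vs

# The worked example from the next subsection:
vs = gram_schmidt([np.array([1, 1, 1]),
                   np.array([0, 1, 2]),
                   np.array([1, 0, 2])])
print(vs)  # [1. 1. 1.], [-1. 0. 1.], [0.5 -1. 0.5]
```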

5.3.2. An example.

Suppose we are given the vectors

\[ \mathbf{x}_1 = (1,1,1)^T, \qquad \mathbf{x}_2 = (0,1,2)^T, \qquad \mathbf{x}_3 = (1,0,2)^T, \]

and we want to orthogonalize this system by Gram–Schmidt. On the first step define

\[ \mathbf{v}_1 = \mathbf{x}_1 = (1,1,1)^T. \]

On the second step we get

\[ \mathbf{v}_2 = \mathbf{x}_2 - P_{E_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{(\mathbf{x}_2,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1. \]

Computing

\[ (\mathbf{x}_2,\mathbf{v}_1) = \Bigl(\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\Bigr) = 3, \qquad \|\mathbf{v}_1\|^2 = 3, \]

we get

\[ \mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} - \frac{3}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}. \]

Finally, define

\[ \mathbf{v}_3 = \mathbf{x}_3 - P_{E_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{(\mathbf{x}_3,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{(\mathbf{x}_3,\mathbf{v}_2)}{\|\mathbf{v}_2\|^2}\mathbf{v}_2. \]

Computing

\[ \Bigl(\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\Bigr) = 3, \qquad \Bigl(\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\Bigr) = 1, \qquad \|\mathbf{v}_1\|^2 = 3, \quad \|\mathbf{v}_2\|^2 = 2 \]

($\|\mathbf{v}_1\|^2$ was already computed before) we get

\[ \mathbf{v}_3 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{3}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1 \\ 1/2 \end{pmatrix}. \]

Remark.

Since multiplication by a scalar does not change orthogonality, one can multiply the vectors $\mathbf{v}_k$ obtained by Gram–Schmidt by any non-zero numbers.

In particular, in many theoretical constructions one normalizes the vectors $\mathbf{v}_k$ by dividing them by their respective norms $\|\mathbf{v}_k\|$. Then the resulting system is orthonormal, and the formulas look simpler.

On the other hand, when performing computations by hand one may want to avoid fractional entries by multiplying a vector by the least common denominator of its entries. Thus one may want to replace the vector $\mathbf{v}_3$ from the above example by $(1,-2,1)^T$.

5.3.3. Orthogonal complement. Decomposition $V = E\oplus E^\perp$.

Definition.

For a subspace $E$, its orthogonal complement $E^\perp$ is the set of all vectors orthogonal to $E$,

\[ E^\perp := \{\mathbf{x} : \mathbf{x}\perp E\}. \]

If $\mathbf{x},\mathbf{y}\perp E$, then any linear combination satisfies $\alpha\mathbf{x}+\beta\mathbf{y}\perp E$ (can you see why?). Therefore $E^\perp$ is a subspace.

By the definition of orthogonal projection, any vector $\mathbf{v}$ in an inner product space $V$ admits a unique representation

\[ \mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2, \qquad \mathbf{v}_1\in E,\ \mathbf{v}_2\perp E \text{ (equivalently, } \mathbf{v}_2\in E^\perp\text{)} \]

(where clearly $\mathbf{v}_1 = P_E\mathbf{v}$).

This statement is often written symbolically as $V = E\oplus E^\perp$, which means exactly that any vector admits the unique decomposition above.

The following proposition gives an important property of the orthogonal complement.

Proposition 5.3.6.

For a subspace $E$,

\[ (E^\perp)^\perp = E. \]

The proof is left as an exercise, see Problem 5.3.12 below.

Exercises.

5.3.1.

Apply Gram–Schmidt orthogonalization to the system of vectors $(1,2,-2)^T$, $(1,-1,4)^T$, $(2,1,1)^T$.

5.3.2.

Apply Gram–Schmidt orthogonalization to the system of vectors $(1,2,3)^T$, $(1,3,1)^T$. Write the matrix of the orthogonal projection onto the 2-dimensional subspace spanned by these vectors.

5.3.3.

Complete the orthogonal system obtained in the previous problem to an orthogonal basis in $\mathbb{R}^3$, i.e. add some vectors to the system (how many?) to get an orthogonal basis.

Can you describe how to complete an orthogonal system to an orthogonal basis in the general situation of $\mathbb{R}^n$ or $\mathbb{C}^n$?

5.3.4.

Find the distance from the vector $(2,3,1)^T$ to the subspace spanned by the vectors $(1,2,3)^T$, $(1,3,1)^T$. Note that I am only asking to find the distance to the subspace, not the orthogonal projection.

5.3.5.

Find the orthogonal projection of the vector $(1,1,1,1)^T$ onto the subspace spanned by the vectors $\mathbf{v}_1 = (1,3,1,1)^T$ and $\mathbf{v}_2 = (2,-1,1,0)^T$ (note that $\mathbf{v}_1\perp\mathbf{v}_2$).

5.3.6.

Find the distance from the vector $(1,2,3,4)^T$ to the subspace spanned by the vectors $\mathbf{v}_1 = (1,-1,1,0)^T$ and $\mathbf{v}_2 = (1,2,1,1)^T$ (note that $\mathbf{v}_1\perp\mathbf{v}_2$). Can you find the distance without actually computing the projection? That would simplify the calculations.

5.3.7.

True or false: if $E$ is a subspace of $V$, then $\dim E + \dim(E^\perp) = \dim V$? Justify.

5.3.8.

Let $P$ be the orthogonal projection onto a subspace $E$ of an inner product space $V$, $\dim V = n$, $\dim E = r$. Find the eigenvalues and the eigenvectors (eigenspaces) of $P$. Find the algebraic and geometric multiplicities of each eigenvalue.

5.3.9.

(Using eigenvalues to compute determinants).

a) Find the matrix of the orthogonal projection onto the one-dimensional subspace of $\mathbb{R}^n$ spanned by the vector $(1,1,\ldots,1)^T$;

b) Let $A$ be the $n\times n$ matrix with all entries equal to $1$. Compute its eigenvalues and their multiplicities (use the previous part);

c) Compute the eigenvalues (and multiplicities) of the matrix $A - I$, i.e. of the matrix with zeroes on the main diagonal and ones everywhere else;

d) Compute $\det(A - I)$.

5.3.10. Legendre's polynomials.

Let an inner product on the space of polynomials be defined by $(f,g) = \int_{-1}^1 f(t)\overline{g(t)}\,dt$. Apply Gram–Schmidt orthogonalization to the system $1, t, t^2, t^3$.

Legendre's polynomials are a particular case of the so-called orthogonal polynomials, which play an important role in many branches of mathematics.

5.3.11.

Let $P = P_E$ be the matrix of the orthogonal projection onto a subspace $E$. Show that

a) The matrix $P$ is self-adjoint, meaning that $P^* = P$;

b) $P^2 = P$.

Remark: The above two properties completely characterize orthogonal projections, i.e. any matrix $P$ satisfying these properties is the matrix of some orthogonal projection. We will discuss this later.

5.3.12.

Show that for a subspace $E$ we have $(E^\perp)^\perp = E$. Hint: It is easy to see that $E$ is orthogonal to $E^\perp$ (why?). To show that any vector $\mathbf{x}$ orthogonal to $E^\perp$ belongs to $E$, use the decomposition $V = E\oplus E^\perp$ from Section 5.3.3 above.

5.3.13.

Suppose $P$ is the orthogonal projection onto a subspace $E$, and $Q$ is the orthogonal projection onto the orthogonal complement $E^\perp$.

a) What are $P+Q$ and $PQ$?

b) Show that $P-Q$ is its own inverse.

5.4. Least square solution. Formula for the orthogonal projection

As was discussed before in Chapter 2, the equation

\[ A\mathbf{x} = \mathbf{b} \]

has a solution if and only if $\mathbf{b}\in\operatorname{Ran} A$. But what do we do to solve an equation that does not have a solution?

This seems to be a silly question, because if there is no solution, then there is no solution. But situations when we want to solve an equation that does not have a solution can appear naturally, for example, if we obtained the equation from an experiment. If there are no errors, the right side $\mathbf{b}$ belongs to the column space $\operatorname{Ran} A$, and the equation is consistent. But in real life it is impossible to avoid errors in measurements, so it is possible that an equation that in theory should be consistent does not have a solution. So, what can one do in this situation?

5.4.1. Least square solution

The simplest idea is to write down the error

\[ \|A\mathbf{x}-\mathbf{b}\| \]

and try to find $\mathbf{x}$ minimizing it. If we can find $\mathbf{x}$ such that the error is $0$, the system is consistent and we have an exact solution. Otherwise, we get a so-called least square solution.

The term least square arises from the fact that minimizing $\|A\mathbf{x}-\mathbf{b}\|$ is equivalent to minimizing

\[ \|A\mathbf{x}-\mathbf{b}\|^2 = \sum_{k=1}^m |(A\mathbf{x})_k - b_k|^2 = \sum_{k=1}^m \Bigl|\sum_{j=1}^n A_{k,j}x_j - b_k\Bigr|^2, \]

i.e. to minimizing the sum of squares of linear functions.

There are several ways to find the least square solution. If we are in $\mathbb{R}^n$ and everything is real, we can forget about the absolute values. Then we can just take the partial derivatives with respect to each $x_j$ and find the points where all of them are $0$, which gives us the minimum.

Geometric approach.

However, there is a simpler way of finding the minimum. Namely, if we take all possible vectors $\mathbf{x}$, then $A\mathbf{x}$ runs through all possible vectors in $\operatorname{Ran} A$, so the minimum of $\|A\mathbf{x}-\mathbf{b}\|$ is exactly the distance from $\mathbf{b}$ to $\operatorname{Ran} A$. Therefore the value of $\|A\mathbf{x}-\mathbf{b}\|$ is minimal if and only if $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$, where $P_{\operatorname{Ran} A}$ stands for the orthogonal projection onto the column space $\operatorname{Ran} A$.

So, to find the least square solution we simply need to solve the equation

\[ A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}. \]

If we know an orthogonal basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ in $\operatorname{Ran} A$, we can find the vector $P_{\operatorname{Ran} A}\mathbf{b}$ by the formula

\[ P_{\operatorname{Ran} A}\mathbf{b} = \sum_{k=1}^n \frac{(\mathbf{b},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\mathbf{v}_k. \]

If we only know a basis in $\operatorname{Ran} A$, we need to use Gram–Schmidt orthogonalization to obtain an orthogonal basis from it.

So, theoretically, the problem is solved, but the solution is not very simple: it involves Gram–Schmidt orthogonalization, which can be computationally intensive. Fortunately, there exists a simpler solution.

Normal equation.

Namely, $A\mathbf{x}$ is the orthogonal projection $P_{\operatorname{Ran} A}\mathbf{b}$ if and only if $\mathbf{b}-A\mathbf{x}\perp\operatorname{Ran} A$ (note that $A\mathbf{x}\in\operatorname{Ran} A$ for all $\mathbf{x}$).

If $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n$ are the columns of $A$, then the condition $\mathbf{b}-A\mathbf{x}\perp\operatorname{Ran} A$ can be rewritten as

\[ \mathbf{b}-A\mathbf{x}\perp\mathbf{a}_k \qquad \forall k = 1,2,\ldots,n. \]

That means

\[ 0 = (\mathbf{b}-A\mathbf{x},\mathbf{a}_k) = \mathbf{a}_k^*(\mathbf{b}-A\mathbf{x}) \qquad \forall k = 1,2,\ldots,n. \]

Joining the rows $\mathbf{a}_k^*$ together, we see that these equations are equivalent to

\[ A^*(\mathbf{b}-A\mathbf{x}) = \mathbf{0}, \]

which in turn is equivalent to the so-called normal equation

\[ A^*A\mathbf{x} = A^*\mathbf{b}. \]

A solution of this equation gives us the least square solution of $A\mathbf{x} = \mathbf{b}$.

Note that the least square solution is unique if and only if $A^*A$ is invertible.
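Here is a minimal numpy sketch (real case, with made-up data) of solving the normal equation, checked against numpy's built-in least squares solver:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([1.0, 1.0, 1.0])

# Normal equation A* A x = A* b; for a real matrix, A* = A^T
x = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq minimizes ||Ax - b|| directly; the answers agree
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```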

5.4.2. Formula for the orthogonal projection.

As we already discussed above, if $\mathbf{x}$ is a solution of the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ (i.e. a least square solution of $A\mathbf{x} = \mathbf{b}$), then $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$. So, to find the orthogonal projection of $\mathbf{b}$ onto the column space $\operatorname{Ran} A$, we need to solve the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ and then multiply the solution by $A$.

If the operator $A^*A$ is invertible, the solution of the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ is given by $\mathbf{x} = (A^*A)^{-1}A^*\mathbf{b}$, so the orthogonal projection $P_{\operatorname{Ran} A}\mathbf{b}$ can be computed as

\[ P_{\operatorname{Ran} A}\mathbf{b} = A(A^*A)^{-1}A^*\mathbf{b}. \]

Since this is true for all $\mathbf{b}$,

\[ P_{\operatorname{Ran} A} = A(A^*A)^{-1}A^* \]

is the formula for the matrix of the orthogonal projection onto $\operatorname{Ran} A$.
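A quick numpy sketch of this formula (with an arbitrary full-rank matrix), checking the two properties from Exercise 5.3.11 that characterize orthogonal projections:

```python
import numpy as np

# Columns of A form a basis (not necessarily orthogonal) of Ran A
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# P = A (A* A)^{-1} A*; for a real matrix, A* = A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P, P.T)    # self-adjoint: P* = P
assert np.allclose(P @ P, P)  # idempotent: P^2 = P
```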

The following theorem implies that for an $m\times n$ matrix $A$, the matrix $A^*A$ is invertible if and only if $\operatorname{rank} A = n$.

Theorem 5.4.1.

For an $m\times n$ matrix $A$,

\[ \operatorname{Ker} A = \operatorname{Ker}(A^*A). \]

Indeed, according to the rank theorem, $\operatorname{Ker} A = \{\mathbf{0}\}$ if and only if $\operatorname{rank} A = n$. Therefore, $\operatorname{Ker}(A^*A) = \{\mathbf{0}\}$ if and only if $\operatorname{rank} A = n$. Since the matrix $A^*A$ is square, it is invertible if and only if $\operatorname{rank} A = n$.

We leave the proof of the theorem as an exercise. To prove the equality $\operatorname{Ker} A = \operatorname{Ker}(A^*A)$ one needs to prove the two inclusions $\operatorname{Ker}(A^*A)\subset\operatorname{Ker} A$ and $\operatorname{Ker} A\subset\operatorname{Ker}(A^*A)$. One of the inclusions is trivial; for the other one use the fact that

\[ \|A\mathbf{x}\|^2 = (A\mathbf{x},A\mathbf{x}) = (A^*A\mathbf{x},\mathbf{x}). \]

5.4.3. An example: line fitting

Let us introduce a few examples where the least square solution appears naturally. Suppose that we know that two quantities $x$ and $y$ are related by the law $y = a + bx$. The coefficients $a$ and $b$ are unknown, and we would like to find them from experimental data.

Suppose we run the experiment $n$ times, and we get $n$ pairs $(x_k,y_k)$, $k = 1,2,\ldots,n$. Ideally, all the points $(x_k,y_k)$ should be on a straight line, but because of errors in measurements this usually does not happen: the points are usually close to some line, but not exactly on it. That is where the least square solution helps!

Ideally, the coefficients $a$ and $b$ should satisfy the equations

\[ a + bx_k = y_k, \qquad k = 1,2,\ldots,n \]

(note that here $x_k$ and $y_k$ are some fixed numbers, and the unknowns are $a$ and $b$). If it is possible to find such $a$ and $b$, we are lucky. If not, the standard thing to do is to minimize the total quadratic error

\[ \sum_{k=1}^n |a + bx_k - y_k|^2. \]

But minimizing this error is exactly finding the least square solution of the system

\[ \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \]

(recall that $x_k$, $y_k$ are given numbers, and the unknowns are $a$ and $b$).

An example.

Suppose our data (xk,yk)(x_{k},y_{k}) consist of pairs

(2,4),(1,2),(0,1),(2,1),(3,1).(-2,4),\ (-1,2),\ (0,1),\ (2,1),\ (3,1).

Then we need to find the least square solution of

(1211101213)[ab]=(42111)\left(\begin{array}[]{cc}1&-2\\ 1&-1\\ 1&0\\ 1&2\\ 1&3\end{array}\right)\left[\begin{array}[]{c}a\\ b\end{array}\right]=\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\end{array}\right)

Then

AA=(1111121023)(1211101213)=(52218)A^{*}A=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\end{array}\right)\left(\begin{array}[]{cc}1&-2\\ 1&-1\\ 1&0\\ 1&2\\ 1&3\end{array}\right)=\left(\begin{array}[]{cc}5&2\\ 2&18\end{array}\right)

and

A𝐛=(1111121023)(42111)=(95)A^{*}\mathbf{b}=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\end{array}\right)\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\\ \end{array}\right)=\left(\begin{array}[]{c}9\\ -5\end{array}\right)

so the normal equation AA𝐱=A𝐛A^{*}A\mathbf{x}=A^{*}\mathbf{b} is rewritten as

(52218)(ab)=(95).\left(\begin{array}[]{cc}5&2\\ 2&18\end{array}\right)\left(\begin{array}[]{c}a\\ b\end{array}\right)=\left(\begin{array}[]{c}9\\ -5\end{array}\right).

The solution of this equation is

a=2,b=1/2,a=2,b=-1/2,

so the best fitting straight line is

y=2-\tfrac{1}{2}x.
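For readers who want to reproduce this computation, here is a short NumPy sketch (an added illustration, not part of the text): it assembles the same 5×2 system, solves the normal equation, and cross-checks the answer against NumPy's built-in least square solver.

    import numpy as np

    x = np.array([-2., -1., 0., 2., 3.])
    y = np.array([ 4.,  2., 1., 1., 1.])

    A = np.column_stack([np.ones_like(x), x])      # columns: 1 and x_k
    # normal equation  (A^T A) [a, b]^T = A^T y
    coef = np.linalg.solve(A.T @ A, A.T @ y)
    print(coef)                                    # [ 2.  -0.5]

    # cross-check against the built-in least square solver
    coef2 = np.linalg.lstsq(A, y, rcond=None)[0]
    print(np.allclose(coef, coef2))                # True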

5.4.4. Other examples: curves and planes.

The least square method is not limited to line fitting. It can also be applied to more general curves, as well as to surfaces in higher dimensions.

The only constraint here is that the parameters we want to find enter the equations linearly. The general algorithm is as follows:

  1. Find the equations that your data should satisfy if there is an exact fit;

  2. Write these equations as a linear system whose unknowns are the parameters you want to find. Note, that the system need not be consistent (and usually is not);

  3. Find the least square solution of the system.

An example: curve fitting

For example, suppose we know that the relation between xx and yy is given by the quadratic law y=a+bx+cx2y=a+bx+cx^{2}, so we want to fit a parabola y=a+bx+cx2y=a+bx+cx^{2} to the data. Then our unknowns aa, bb, cc should satisfy the equations

a+bxk+cxk2=yk,k=1,2,,na+bx_{k}+cx_{k}^{2}=y_{k},\qquad k=1,2,\ldots,n

or, in matrix form

(1x1x121x2x221xnxn2)(abc)=(y1y2yn)\left(\begin{array}[]{ccc}1&x_{1}&x^{2}_{1}\\ 1&x_{2}&x^{2}_{2}\\ \vdots&\vdots&\vdots\\ 1&x_{n}&x^{2}_{n}\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}y_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n}\end{array}\right)

For example, for the data from the previous example we need to find the least square solution of

(124111100124139)(abc)=(42111).\left(\begin{array}[]{ccc}1&-2&4\\ 1&-1&1\\ 1&0&0\\ 1&2&4\\ 1&3&9\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\end{array}\right).

Then

A^{*}A=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\\ 4&1&0&4&9\end{array}\right)\left(\begin{array}[]{ccc}1&-2&4\\ 1&-1&1\\ 1&0&0\\ 1&2&4\\ 1&3&9\end{array}\right)=\left(\begin{array}[]{ccc}5&2&18\\ 2&18&26\\ 18&26&114\end{array}\right)

and

A^{*}\mathbf{b}=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\\ 4&1&0&4&9\end{array}\right)\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\end{array}\right)=\left(\begin{array}[]{c}9\\ -5\\ 31\end{array}\right).

Therefore the normal equation AA𝐱=A𝐛A^{*}A\mathbf{x}=A^{*}\mathbf{b} is

\left(\begin{array}[]{ccc}5&2&18\\ 2&18&26\\ 18&26&114\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}9\\ -5\\ 31\end{array}\right)

which has the unique solution

a=86/77,\qquad b=-62/77,\qquad c=43/154.

Therefore,

y=86/77-62x/77+43x^{2}/154

is the best fitting parabola.
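The same computation in NumPy (again an added sketch, not part of the text): np.vander builds the design matrix with columns 1, x, x², and the normal equation reproduces the fractions above.

    import numpy as np

    x = np.array([-2., -1., 0., 2., 3.])
    y = np.array([ 4.,  2., 1., 1., 1.])

    A = np.vander(x, 3, increasing=True)        # columns: 1, x, x^2
    coef = np.linalg.solve(A.T @ A, A.T @ y)    # normal equation

    print(coef)                                        # approx [ 1.1169 -0.8052  0.2792]
    print(np.allclose(coef, [86/77, -62/77, 43/154]))  # True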

Plane fitting

As another example, let us fit a plane z=a+bx+cyz=a+bx+cy to the data

(x_{k},y_{k},z_{k})\in\mathbb{R}^{3},\qquad k=1,2,\ldots,n.

The equations we should have in the case of exact fit are

a+bxk+cyk=zk,k=1,2,,n,a+bx_{k}+cy_{k}=z_{k},\qquad k=1,2,\ldots,n,

or, in the matrix form

(1x1y11x2y21xnyn)(abc)=(z1z2zn).\left(\begin{array}[]{ccc}1&x_{1}&y_{1}\\ 1&x_{2}&y_{2}\\ \vdots&\vdots&\vdots\\ 1&x_{n}&y_{n}\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}z_{1}\\ {z}_{2}\\ \vdots\\ {z}_{n}\end{array}\right).

So, to find the best fitting plane, we need to find the least square solution of this system (the unknowns are aa, bb, cc).
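A sketch of the plane fit in NumPy (the sample points below are made up purely for illustration):

    import numpy as np

    # made-up data points (x_k, y_k, z_k)
    pts = np.array([[0., 0., 1.1],
                    [1., 0., 2.9],
                    [0., 1., 0.2],
                    [1., 1., 2.1],
                    [2., 1., 4.0]])
    x, y, z = pts.T

    A = np.column_stack([np.ones_like(x), x, y])     # columns: 1, x, y
    a, b, c = np.linalg.lstsq(A, z, rcond=None)[0]   # least square solution
    print(f"z = {a:.3f} + {b:.3f} x + {c:.3f} y")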

Exercises.

5.4.1.

Find the least square solution of the system

(100111)𝐱=(110)\left(\begin{array}[]{cc}1&0\\ 0&1\\ 1&1\end{array}\right)\mathbf{x}=\left(\begin{array}[]{c}1\\ 1\\ 0\end{array}\right)
5.4.2.

Find the matrix of the orthogonal projection PP onto the column space of

(112124).\left(\begin{array}[]{rr}1&1\\ 2&-1\\ -2&4\end{array}\right).

Use two methods: Gram–Schmidt orthogonalization and formula for the projection.

Compare the results.

5.4.3.

Find the best straight line fit (least square solution) to the points (2,4)(-2,4), (1,3)(-1,3), (0,1)(0,1), (2,0)(2,0).

5.4.4.

Fit a plane z=a+bx+cyz=a+bx+cy to four points (1,1,3)(1,1,3), (0,3,6)(0,3,6), (2,1,5)(2,1,5), (0,0,0)(0,0,0).

To do that

  a) Find 4 equations with 3 unknowns a,b,c expressing that the plane passes through all 4 points (this system does not have to have a solution);

  b) Find the least square solution of the system.

5.4.5.

Minimal norm solution. Let the equation Ax = b have a solution, and let A have a non-trivial kernel (so the solution is not unique). Prove that

  a) There exists a unique solution x_0 of Ax = b minimizing the norm ‖x‖, i.e. there exists a unique x_0 such that Ax_0 = b and ‖x_0‖ ≤ ‖x‖ for any x satisfying Ax = b.

  b) x_0 = P_{(Ker A)^⊥} x for any x satisfying Ax = b.

5.4.6.

Minimal norm least square solution. Applying the previous problem to the equation Ax = P_{Ran A} b, show that

  a) There exists a unique least square solution x_0 of Ax = b minimizing the norm ‖x‖.

  b) x_0 = P_{(Ker A)^⊥} x for any least square solution x of Ax = b.

5.5. Adjoint of a linear transformation. Fundamental subspaces revisited.

5.5.1. Adjoint matrices and adjoint operators.

Let us recall that for an m×n matrix A its Hermitian adjoint (or simply adjoint) A* is defined as the complex conjugate of the transpose: the matrix A* is obtained from the transposed matrix Aᵀ by taking the complex conjugate of each entry.

The following identity is the main property of the adjoint matrix:

(A𝐱,𝐲)=(𝐱,A𝐲)𝐱n,𝐲m.\framebox{$(A\mathbf{x},\mathbf{y})=(\mathbf{x},A^{*}\mathbf{y})\qquad\forall% \mathbf{x}\in\mathbb{C}^{n},\ \forall\mathbf{y}\in\mathbb{C}^{m}.$}

Before proving this identity, let us introduce some useful formulas. Let us recall that for transposed matrices we have the identity (AB)T=BTAT(AB)^{T}=B^{T}A^{T}. Since for complex numbers zz and ww we have zw¯=z¯w¯\overline{zw}=\overline{z}\,\overline{w}, the identity

(AB)=BA(AB)^{*}=B^{*}A^{*}

holds for the adjoint.

Also, since (AT)T=A(A^{T})^{T}=A and z¯¯=z\overline{\overline{z}}=z,

(A)=A.(A^{*})^{*}=A.

Now, we are ready to prove the main identity:

(A𝐱,𝐲)=𝐲A𝐱=(A𝐲)𝐱=(𝐱,A𝐲);(A\mathbf{x},\mathbf{y})=\mathbf{y}^{*}A\mathbf{x}=(A^{*}\mathbf{y})^{*}% \mathbf{x}=(\mathbf{x},A^{*}\mathbf{y});

the first and the last equalities here follow from the definition of inner product in 𝔽n\mathbb{F}^{n}, and the middle one follows from the fact that

(A𝐱)=𝐱(A)=𝐱A.(A^{*}\mathbf{x})^{*}=\mathbf{x}^{*}(A^{*})^{*}=\mathbf{x}^{*}A.
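A quick numerical check of the main identity (an added NumPy sketch; recall that in the convention used here (u, v) = v*u, which is exactly what np.vdot(v, u) computes):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 3, 5
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

    Astar = A.conj().T                  # Hermitian adjoint A*

    lhs = np.vdot(y, A @ x)             # (Ax, y) = y*(Ax)
    rhs = np.vdot(Astar @ y, x)         # (x, A*y) = (A*y)* x
    print(np.allclose(lhs, rhs))        # True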

Uniqueness of the adjoint.

The above main identity (Ax, y) = (x, A*y) is often used as the definition of the adjoint operator. Let us first notice that the adjoint operator is unique: if a matrix B satisfies

(A𝐱,𝐲)=(𝐱,B𝐲)𝐱,𝐲,(A\mathbf{x},\mathbf{y})=(\mathbf{x},B\mathbf{y})\qquad\forall\mathbf{x},% \mathbf{y},

then B=AB=A^{*}. Indeed, by the definition of AA^{*} for a given 𝐲\mathbf{y} we have

(𝐱,A𝐲)=(𝐱,B𝐲)𝐱,(\mathbf{x},A^{*}\mathbf{y})=(\mathbf{x},B\mathbf{y})\qquad\forall\mathbf{x},

and therefore by Corollary 5.1.5 A*y = By. Since this is true for all y, the linear transformations, and therefore the matrices, A* and B coincide.

Adjoint transformation in abstract setting.

The above main identity (Ax, y) = (x, A*y) can be used to define the adjoint operator in the abstract setting, where A: V → W is an operator acting from one inner product space to another. Namely, we define A*: W → V to be the operator satisfying

(A𝐱,𝐲)=(𝐱,A𝐲)𝐱V,𝐲W.(A\mathbf{x},\mathbf{y})=(\mathbf{x},A^{*}\mathbf{y})\qquad\forall\mathbf{x}% \in V,\ \forall\mathbf{y}\in W.

Why does such an operator exist? We can simply construct it: consider orthonormal bases 𝒜 = v_1, v_2, …, v_n in V and ℬ = w_1, w_2, …, w_m in W. If [A]_{ℬ𝒜} is the matrix of A with respect to these bases, we define the operator A* by defining its matrix [A*]_{𝒜ℬ} as

[A]𝒜=([A]𝒜).[A^{*}]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}=([A]_{{}_{\scriptstyle% \mathcal{B}\mathcal{A}}})^{*}.

We leave the proof that this indeed gives the adjoint operator as an exercise for the reader.

Note, that the reasoning in the above Sect. 5.5.1 implies that the adjoint operator is unique.

Useful formulas.

Below we present the properties of the adjoint operators (matrices) we will use a lot. We leave the proofs as an exercise for the reader.

  1. (A+B)* = A* + B*;

  2. (αA)* = ᾱA*;

  3. (AB)* = B*A*;

  4. (A*)* = A;

  5. (y, Ax) = (A*y, x).

5.5.2. Relation between fundamental subspaces.

Theorem 5.5.1.

Let A:VWA:V\to W be an operator acting from one inner product space to another. Then

  1. Ker A* = (Ran A)^⊥;

  2. Ker A = (Ran A*)^⊥;

  3. Ran A = (Ker A*)^⊥;

  4. Ran A* = (Ker A)^⊥.

Remark.

Earlier, in Section 2.7 of Chapter 2, the fundamental subspaces were defined (as is often done in the literature) using Aᵀ instead of A*. Of course, there is no difference for real matrices, so in the real case the above theorem gives the geometric description of the fundamental subspaces defined there.

Geometric interpretation of the fundamental subspaces defined using ATA^{T} is presented in Chapter 8 below, see Section 8.3 there (Theorem 8.3.7). The formulas in this theorem are essentially the same as in Theorem 5.5.1 here, only the interpretation is a bit different.

Proof of Theorem 5.5.1.

First of all, let us notice, that since for a subspace EE we have (E)=E(E^{\perp})^{\perp}=E, the statements 1 and 3 are equivalent. Similarly, for the same reason, the statements 2 and 4 are equivalent as well. Finally, statement 2 is exactly statement 1 applied to the operator AA^{*} (here we use the fact that (A)=A(A^{*})^{*}=A).

So, to prove the theorem we only need to prove statement 1.

We will present 2 proofs of this statement: a “matrix” proof, and an “invariant”, or “coordinate-free” one.

In the “matrix” proof, we assume that A is an m×n matrix, i.e. that A: 𝔽ⁿ → 𝔽ᵐ. The general case can always be reduced to this one by picking orthonormal bases in V and W and considering the matrix of A in these bases.

Let 𝐚1,𝐚2,,𝐚n\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n} be the columns of AA. Note, that 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} if and only if 𝐱𝐚k\mathbf{x}\perp\mathbf{a}_{k} (i.e. (𝐱,𝐚k)=0(\mathbf{x},\mathbf{a}_{k})=0) k=1,2,,n\forall k=1,2,\ldots,n.

By the definition of the inner product in 𝔽n\mathbb{F}^{n}, that means

0=(𝐱,𝐚k)=𝐚k𝐱k=1,2,,n.0=(\mathbf{x},\mathbf{a}_{k})=\mathbf{a}_{k}^{*}\,\mathbf{x}\qquad\forall k=1,% 2,\ldots,n.

Since 𝐚k\mathbf{a}_{k}^{*} is the row number kk of AA^{*}, the above nn equalities are equivalent to the equation

A𝐱=𝟎.A^{*}\mathbf{x}=\mathbf{0}.

So, we proved that 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} if and only if A𝐱=𝟎A^{*}\mathbf{x}=\mathbf{0}, and that is exactly the statement 1.

Now, let us present the “coordinate-free” proof. The inclusion 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} means that 𝐱\mathbf{x} is orthogonal to all vectors of the form A𝐲A\mathbf{y}, i.e. that

(𝐱,A𝐲)=0𝐲.(\mathbf{x},A\mathbf{y})=0\qquad\forall\mathbf{y}.

Since (𝐱,A𝐲)=(A𝐱,𝐲)(\mathbf{x},A\mathbf{y})=(A^{*}\mathbf{x},\mathbf{y}), this identity is equivalent to

(A𝐱,𝐲)=0𝐲,(A^{*}\mathbf{x},\mathbf{y})=0\qquad\forall\mathbf{y},

and by Lemma 5.1.4 this happens if and only if A𝐱=𝟎A^{*}\mathbf{x}=\mathbf{0}. So we proved that 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} if and only if A𝐱=𝟎A^{*}\mathbf{x}=\mathbf{0}, which is exactly the statement 1 of the theorem. ∎
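Statement 1 is also easy to test numerically. The sketch below (an added illustration) uses the singular value decomposition, available in NumPy as np.linalg.svd: for an m×n matrix of rank r, the last m−r left singular vectors form an orthonormal basis of (Ran A)^⊥, and A* indeed annihilates them.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, r = 6, 4, 3
    # a random complex m x n matrix of rank r (product of m x r and r x n factors)
    A = (rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))) @ \
        (rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n)))

    U, s, Vh = np.linalg.svd(A)
    W = U[:, r:]                           # orthonormal basis of (Ran A)^perp

    print(np.allclose(A.conj().T @ W, 0))  # True: (Ran A)^perp lies in Ker A*
    # dim Ker A* = m - rank A* = m - r = dim (Ran A)^perp, so it is an equality
    print(np.linalg.matrix_rank(A.conj().T) == r)   # True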

5.5.3. The “essential” part of a linear transformation

The above theorem makes the structure of the operator AA and the geometry of fundamental subspaces much more transparent. It follows from this theorem that the operator AA can be represented as a composition of orthogonal projection onto RanA\operatorname{Ran}A^{*} and an isomorphism from RanA\operatorname{Ran}A^{*} to RanA\operatorname{Ran}A.

Indeed, let A~:RanARanA\widetilde{A}:\operatorname{Ran}A^{*}\to\operatorname{Ran}A be the restriction of AA to the domain RanA\operatorname{Ran}A^{*} and the target space RanA\operatorname{Ran}A,

A~𝐱=A𝐱,𝐱RanA.\widetilde{A}\mathbf{x}=A\mathbf{x},\qquad\forall\mathbf{x}\in\operatorname{% Ran}A^{*}.

Since KerA=(RanA)\operatorname{Ker}A=(\operatorname{Ran}A^{*})^{\perp}, we have

A𝐱=APRanA𝐱=A~PRanA𝐱𝐱X;A\mathbf{x}=AP_{{}_{\scriptstyle\operatorname{Ran}{A^{*}}}}\mathbf{x}=% \widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}\mathbf{x}\qquad% \forall\mathbf{x}\in X;

the fact that 𝐱PRanA𝐱(RanA)=KerA\mathbf{x}-P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}\mathbf{x}\in(% \operatorname{Ran}A^{*})^{\perp}=\operatorname{Ker}A is used here. Therefore we can write

(5.5.1) A𝐱=A~PRanA𝐱𝐱X,\displaystyle A\mathbf{x}=\widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^% {*}}}\mathbf{x}\qquad\forall\mathbf{x}\in X,

or, equivalently, A=A~PRanAA=\widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}.

Note also that A~:RanARanA\widetilde{A}:\operatorname{Ran}A^{*}\to\operatorname{Ran}A is an invertible transformation. First we notice that KerA~={𝟎}\operatorname{Ker}\widetilde{A}=\{\mathbf{0}\}: if 𝐱RanA\mathbf{x}\in\operatorname{Ran}A^{*} is such that A~𝐱=A𝐱=𝟎\widetilde{A}\mathbf{x}=A\mathbf{x}=\mathbf{0}, then 𝐱KerA=(RanA)\mathbf{x}\in\operatorname{Ker}A=(\operatorname{Ran}A^{*})^{\perp}, so 𝐱RanA(RanA)\mathbf{x}\in\operatorname{Ran}A^{*}\cap(\operatorname{Ran}A^{*})^{\perp}, thus 𝐱=𝟎\mathbf{x}=\mathbf{0}. Then to see that A~\widetilde{A} is invertible, it is sufficient to see that A~\widetilde{A} is onto (surjective). But this immediately follows from (5.5.1):

RanA~=A~RanA=A~PRanAX=AX=RanA.\operatorname{Ran}\widetilde{A}=\widetilde{A}\operatorname{Ran}A^{*}=% \widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}X=AX=\operatorname{% Ran}A.

The isomorphism A~\widetilde{A} is sometimes called the “essential part” of the operator AA (a non-standard terminology).

The fact that the “essential part” Ã: Ran A* → Ran A of A is an isomorphism implies the following “complex” rank theorem: rank A = rank A*. But, of course, this theorem also follows from the elementary observation that complex conjugation does not change the rank of a matrix, rank A = rank Ā.
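The “complex” rank theorem is a one-liner to verify numerically (an added sketch, with a made-up random matrix):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
    # rank A = rank A* (the rank of the conjugate transpose)
    print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.conj().T))  # True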

Exercises.

5.5.1.

Show that for a square matrix AA the equality det(A)=det(A)¯\det(A^{*})=\overline{\det(A)} holds.

5.5.2.

Find matrices of orthogonal projections onto all 4 fundamental subspaces of the matrix

A=(111132243).A=\left(\begin{array}[]{ccc}1&1&1\\ 1&3&2\\ 2&4&3\\ \end{array}\right)\ .

Note, that you really need to compute only 2 of the projections. If you pick an appropriate 2, the other 2 are easy to obtain from them (recall how the projections onto E and E^⊥ are related).

5.5.3.

Let AA be an m×nm\times n matrix. Show that KerA=Ker(AA)\operatorname{Ker}A=\operatorname{Ker}(A^{*}A).

To do that you need to prove 2 inclusions, Ker(AA)KerA\operatorname{Ker}(A^{*}A)\subset\operatorname{Ker}A and KerAKer(AA)\operatorname{Ker}A\subset\operatorname{Ker}(A^{*}A). One of the inclusions is trivial, for the other one use the fact that

A𝐱2=(A𝐱,A𝐱)=(AA𝐱,𝐱).\|A\mathbf{x}\|^{2}=(A\mathbf{x},A\mathbf{x})=(A^{*}A\mathbf{x},\mathbf{x}).
5.5.4.

Use the equality KerA=Ker(AA)\operatorname{Ker}A=\operatorname{Ker}(A^{*}A) to prove that

  a) rank A = rank(A*A);

  b) If Ax = 0 has only the trivial solution, then A is left invertible. (You can just write a formula for a left inverse.)

5.5.5.

Suppose, that for a matrix AA the matrix AAA^{*}A is invertible, so the orthogonal projection onto RanA\operatorname{Ran}A is given by the formula A(AA)1AA(A^{*}A)^{-1}A^{*}. Can you write formulas for the orthogonal projections onto the other 3 fundamental subspaces (KerA\operatorname{Ker}A, KerA\operatorname{Ker}A^{*}, RanA\operatorname{Ran}A^{*})?

5.5.6.

Let a matrix PP be self-adjoint (P=PP^{*}=P) and let P2=PP^{2}=P. Show that PP is the matrix of an orthogonal projection. Hint: consider the decomposition 𝐱=𝐱1+𝐱2\mathbf{x}=\mathbf{x}_{1}+\mathbf{x}_{2}, 𝐱1RanP\mathbf{x}_{1}\in\operatorname{Ran}P, 𝐱2RanP\mathbf{x}_{2}\perp\operatorname{Ran}P and show that P𝐱1=𝐱1P\mathbf{x}_{1}=\mathbf{x}_{1}, P𝐱2=𝟎P\mathbf{x}_{2}=\mathbf{0}. For one of the equalities you will need self-adjointness, for the other one the property P2=PP^{2}=P.

5.6. Isometries and unitary operators. Unitary and orthogonal matrices.

5.6.1. Main definitions

Definition.

An operator U:XYU:X\to Y is called an isometry, if it preserves the norm,

U𝐱=𝐱𝐱X.\|U\mathbf{x}\|=\|\mathbf{x}\|\qquad\forall\mathbf{x}\in X.

The following theorem shows that an isometry preserves the inner product.

Theorem 5.6.1.

An operator U: X → Y is an isometry if and only if it preserves the inner product, i.e. if and only if

(𝐱,𝐲)=(U𝐱,U𝐲)𝐱,𝐲X.(\mathbf{x},\mathbf{y})=(U\mathbf{x},U\mathbf{y})\qquad\forall\mathbf{x},% \mathbf{y}\in X.
Proof.

The proof uses the polarization identities (Lemma 5.1.9). For example, if XX is a complex space

(U𝐱,U𝐲)\displaystyle(U\mathbf{x},U\mathbf{y}) =14α=±1,±iαU𝐱+αU𝐲2\displaystyle=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|U\mathbf{x}+\alpha U% \mathbf{y}\|^{2}
=14α=±1,±iαU(𝐱+α𝐲)2\displaystyle=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|U(\mathbf{x}+\alpha% \mathbf{y})\|^{2}
=14α=±1,±iα𝐱+α𝐲2=(𝐱,𝐲).\displaystyle=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|\mathbf{x}+\alpha% \mathbf{y}\|^{2}=(\mathbf{x},\mathbf{y}).

Similarly, for a real space XX

(U𝐱,U𝐲)\displaystyle(U\mathbf{x},U\mathbf{y}) =14(U𝐱+U𝐲2U𝐱U𝐲2)\displaystyle=\frac{1}{4}\left(\|U\mathbf{x}+U\mathbf{y}\|^{2}-\|U\mathbf{x}-U% \mathbf{y}\|^{2}\right)
=14(U(𝐱+𝐲)2U(𝐱𝐲)2)\displaystyle=\frac{1}{4}\left(\|U(\mathbf{x}+\mathbf{y})\|^{2}-\|U(\mathbf{x}% -\mathbf{y})\|^{2}\right)
=14(𝐱+𝐲2𝐱𝐲2)=(𝐱,𝐲).\displaystyle=\frac{1}{4}\left(\|\mathbf{x}+\mathbf{y}\|^{2}-\|\mathbf{x}-% \mathbf{y}\|^{2}\right)=(\mathbf{x},\mathbf{y}).

Lemma 5.6.2.

An operator U:XYU:X\to Y is an isometry if and only if UU=IU^{*}U=I.

Proof.

If UU=IU^{*}U=I, then by the definition of adjoint operator

(𝐱,𝐱)=(UU𝐱,𝐱)=(U𝐱,U𝐱)𝐱X.(\mathbf{x},\mathbf{x})=(U^{*}U\mathbf{x},\mathbf{x})=(U\mathbf{x},U\mathbf{x}% )\qquad\forall\mathbf{x}\in X.

Therefore 𝐱=U𝐱\|\mathbf{x}\|=\|U\mathbf{x}\|, and so UU is an isometry.

On the other hand, if UU is an isometry, then by the definition of adjoint operator and by Theorem 5.6.1 we have for all 𝐱X\mathbf{x}\in X

(UU𝐱,𝐲)=(U𝐱,U𝐲)=(𝐱,𝐲)𝐲X,(U^{*}U\mathbf{x},\mathbf{y})=(U\mathbf{x},U\mathbf{y})=(\mathbf{x},\mathbf{y}% )\qquad\forall\mathbf{y}\in X,

and therefore by Corollary 5.1.5 UU𝐱=𝐱U^{*}U\mathbf{x}=\mathbf{x}. Since it is true for all 𝐱X\mathbf{x}\in X, we have UU=IU^{*}U=I. ∎

The above lemma implies that an isometry is always left invertible (UU^{*} being a left inverse).
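Here is an added NumPy sketch illustrating the lemma: a (non-square) matrix with orthonormal columns, produced by the QR factorization, is an isometry, and its adjoint is a left inverse.

    import numpy as np

    rng = np.random.default_rng(4)
    # Q: a 5 x 3 matrix with orthonormal columns -- an isometry C^3 -> C^5
    Q, _ = np.linalg.qr(rng.standard_normal((5, 3)) +
                        1j * rng.standard_normal((5, 3)))

    print(np.allclose(Q.conj().T @ Q, np.eye(3)))                # U*U = I
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # norms preserved
    print(np.allclose(Q.conj().T @ (Q @ x), x))                  # U* is a left inverse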

Definition.

An isometry U:XYU:X\to Y is called a unitary operator if it is invertible.

Proposition 5.6.3.

An isometry U:XYU:X\to Y is a unitary operator if and only if dimX=dimY\dim X=\dim Y.

Proof.

Since UU is an isometry, it is left invertible, and since dimX=dimY\dim X=\dim Y, it is invertible (a left invertible square matrix is invertible).

On the other hand, if U:XYU:X\to Y is invertible, dimX=dimY\dim X=\dim Y (only square matrices are invertible, isomorphic spaces have equal dimensions). ∎

A square matrix UU is called unitary if UU=IU^{*}U=I, i.e. a unitary matrix is a matrix of a unitary operator acting in 𝔽n\mathbb{F}^{n}.

A unitary matrix with real entries is called an orthogonal matrix. An orthogonal matrix can be interpreted as the matrix of a unitary operator acting in the real space ℝⁿ.

A few properties of unitary operators:

  1. For a unitary transformation U, U⁻¹ = U*;

  2. If U is unitary, then U* = U⁻¹ is also unitary;

  3. If U is an isometry and v_1, v_2, …, v_n is an orthonormal basis, then Uv_1, Uv_2, …, Uv_n is an orthonormal system. Moreover, if U is unitary, Uv_1, Uv_2, …, Uv_n is an orthonormal basis.

  4. A product of unitary operators is a unitary operator as well.

5.6.2. Examples

First of all, let us notice, that

a matrix UU is an isometry if and only if its columns form an orthonormal system.

This statement can be checked directly by computing the product UUU^{*}U.

It is easy to check that the columns of the rotation matrix

(cosαsinαsinαcosα)\left(\begin{array}[]{cc}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\end{array}\right)

are orthogonal to each other, and that each column has norm 1. Therefore, the rotation matrix is an isometry, and since it is square, it is unitary. Since all entries of the rotation matrix are real, it is an orthogonal matrix.
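An added one-screen check of this computation:

    import numpy as np

    a = 0.7  # an arbitrary angle
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])

    print(np.allclose(R.T @ R, np.eye(2)))     # columns are orthonormal: isometry
    print(np.isclose(np.linalg.det(R), 1.0))   # real unitary (orthogonal), det = 1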

The next example is more abstract. Let XX and YY be inner product spaces, dimX=dimY=n\dim X=\dim Y=n, and let 𝐱1,𝐱2,,𝐱n\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{n} and 𝐲1,𝐲2,,𝐲n\mathbf{y}_{1},\mathbf{y}_{2},\ldots,\mathbf{y}_{n} be orthonormal bases in XX and YY respectively. Define an operator U:XYU:X\to Y by

U𝐱k=𝐲k,k=1,2,,n.U\mathbf{x}_{k}=\mathbf{y}_{k},\qquad k=1,2,\ldots,n.

Since for a vector 𝐱=c1𝐱1+c2𝐱2++cn𝐱n\mathbf{x}=c_{1}\mathbf{x}_{1}+c_{2}\mathbf{x}_{2}+\ldots+c_{n}\mathbf{x}_{n}

𝐱2=|c1|2+|c2|2++|cn|2\|\mathbf{x}\|^{2}=|c_{1}|^{2}+|c_{2}|^{2}+\ldots+|c_{n}|^{2}

and

U𝐱2=U(k=1nck𝐱k)2=k=1nck𝐲k2=k=1n|ck|2,\|U\mathbf{x}\|^{2}=\|U(\sum_{k=1}^{n}c_{k}\mathbf{x}_{k})\|^{2}=\|\sum_{k=1}^% {n}c_{k}\mathbf{y}_{k}\|^{2}=\sum_{k=1}^{n}|c_{k}|^{2},

one can conclude that U𝐱=𝐱\|U\mathbf{x}\|=\|\mathbf{x}\| for all 𝐱X\mathbf{x}\in X, so UU is a unitary operator.

5.6.3. Properties of unitary operators

Proposition 5.6.4.

Let UU be a unitary matrix. Then

  1. |det U| = 1. In particular, for an orthogonal matrix det U = ±1;

  2. If λ is an eigenvalue of U, then |λ| = 1.

Remark.

Note, that for an orthogonal matrix, an eigenvalue (unlike the determinant) does not have to be real. Our old friend, the rotation matrix, gives an example.

Proof of Proposition 5.6.4.

Let detU=z\det U=z. Since det(U)=det(U)¯\det(U^{*})=\overline{\det(U)}, see Exercise 5.5.1, we have

|z|2=z¯z=det(UU)=detI=1,|z|^{2}=\overline{z}z=\det(U^{*}U)=\det I=1,

so |detU|=|z|=1|\det U|=|z|=1. Statement 1 is proved.

To prove statement 2 let us notice that if U𝐱=λ𝐱U\mathbf{x}=\lambda\mathbf{x} then

U𝐱=λ𝐱=|λ|𝐱,\|U\mathbf{x}\|=\|\lambda\mathbf{x}\|=|\lambda|\cdot\|\mathbf{x}\|,

so |λ|=1|\lambda|=1. ∎

5.6.4. Unitary equivalent operators

Definition.

Operators (matrices) AA and BB are called unitarily equivalent if there exists a unitary operator UU such that A=UBUA=UBU^{*}.

Since for a unitary UU we have U1=UU^{-1}=U^{*}, any two unitary equivalent matrices are similar as well.

The converse is not true: it is easy to construct a pair of similar matrices which are not unitarily equivalent.

The following proposition gives a way to construct a counterexample.

Proposition 5.6.5.

A matrix AA is unitarily equivalent to a diagonal one if and only if it has an orthogonal (orthonormal) basis of eigenvectors.

Proof.

Let A=UBUA=UBU^{*} and let B𝐱=λ𝐱B\mathbf{x}=\lambda\mathbf{x}. Then AU𝐱=UBUU𝐱=UB𝐱=U(λ𝐱)=λU𝐱AU\mathbf{x}=UBU^{*}U\mathbf{x}=UB\mathbf{x}=U(\lambda\mathbf{x})=\lambda U% \mathbf{x}, i.e. U𝐱U\mathbf{x} is an eigenvector of AA.

So, let AA be unitarily equivalent to a diagonal matrix DD, i.e. let A=UDUA=UDU^{*}. The vectors 𝐞k\mathbf{e}_{k} of the standard basis are eigenvectors of DD, so the vectors U𝐞kU\mathbf{e}_{k} are eigenvectors of AA. Since UU is unitary, the system U𝐞1,U𝐞2,,U𝐞nU\mathbf{e}_{1},U\mathbf{e}_{2},\ldots,U\mathbf{e}_{n} is an orthonormal basis.

Now let A have an orthogonal basis u_1, u_2, …, u_n of eigenvectors. Dividing each vector u_k by its norm if necessary, we can always assume that the system u_1, u_2, …, u_n is an orthonormal basis. Let D be the matrix of A in the basis ℬ = u_1, u_2, …, u_n. Clearly, D is a diagonal matrix.

Denote by UU the matrix with columns 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n}. Since the columns form an orthonormal basis, UU is unitary. The standard change of coordinate formula implies

A=[A]𝒮𝒮=[I]𝒮[A][I]𝒮=UDU1A=[A]_{{}_{\scriptstyle\mathcal{S}\mathcal{S}}}=[I]_{{}_{\scriptstyle\mathcal{% S}\mathcal{B}}}[A]_{{}_{\scriptstyle\mathcal{B}\mathcal{B}}}[I]_{{}_{% \scriptstyle\mathcal{B}\mathcal{S}}}=UDU^{-1}

and since UU is unitary, A=UDUA=UDU^{*}. ∎
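For real symmetric (more generally, Hermitian) matrices, NumPy's np.linalg.eigh returns exactly the data in Proposition 5.6.5: an orthonormal basis of eigenvectors assembled into a unitary U and the eigenvalues of the diagonal D. An added sketch with a made-up matrix (the matrices of the exercises below can be treated the same way):

    import numpy as np

    A = np.array([[4., 1.],
                  [1., 4.]])       # symmetric, hence orthogonally diagonalizable

    w, U = np.linalg.eigh(A)       # eigenvalues w, orthonormal eigenvectors in U
    D = np.diag(w)

    print(np.allclose(U.T @ U, np.eye(2)))   # U is orthogonal (real unitary)
    print(np.allclose(A, U @ D @ U.T))       # A = U D U*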

Exercises.

5.6.1.

Orthogonally diagonalize the following matrices,

(1221),(0110),(022202220)\left(\begin{array}[]{cc}1&2\\ 2&1\\ \end{array}\right),\qquad\left(\begin{array}[]{rr}0&-1\\ 1&0\\ \end{array}\right),\qquad\left(\begin{array}[]{lll}0&2&2\\ 2&0&2\\ 2&2&0\\ \end{array}\right)

i.e. for each matrix A find a unitary matrix U and a diagonal matrix D such that A = UDU*.

5.6.2.

True or false: a matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal basis of eigenvectors.

5.6.3.

Prove the polarization identities

(A𝐱,𝐲)=14[(A(𝐱+𝐲),𝐱+𝐲)(A(𝐱𝐲),𝐱𝐲)](real case, A=A),(A\mathbf{x},\mathbf{y})=\frac{1}{4}\bigl{[}(A(\mathbf{x}+\mathbf{y}),\mathbf{% x}+\mathbf{y})-(A(\mathbf{x}-\mathbf{y}),\mathbf{x}-\mathbf{y})\bigr{]}\qquad% \text{(real case, $A=A^{*}$)},

and

(A𝐱,𝐲)=14α=±1,±iα(A(𝐱+α𝐲),𝐱+α𝐲)(complex case, A is arbitrary).(A\mathbf{x},\mathbf{y})=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha(A(\mathbf{% x}+\alpha\mathbf{y}),\mathbf{x}+\alpha\mathbf{y})\qquad\text{(complex case, $A% $ is arbitrary)}.
5.6.4.

Show that a product of unitary (orthogonal) matrices is unitary (orthogonal) as well.

5.6.5.

Let U:XXU:X\to X be a linear transformation on a finite-dimensional inner product space. True or false:

  a) If ‖Ux‖ = ‖x‖ for all x ∈ X, then U is unitary.

  b) If ‖Ue_k‖ = ‖e_k‖, k = 1, 2, …, n, for some orthonormal basis e_1, e_2, …, e_n, then U is unitary.

Justify your answers with a proof or a counterexample.

5.6.6.

Let AA and BB be unitarily equivalent n×nn\times n matrices.

  a) Prove that trace(A*A) = trace(B*B).

  b) Use a) to prove that

\sum_{j,k=1}^{n}|A_{j,k}|^{2}=\sum_{j,k=1}^{n}|B_{j,k}|^{2}.

  c) Use b) to prove that the matrices

\left(\begin{array}[]{ll}1&2\\ 2&i\end{array}\right)\qquad\text{and}\qquad\left(\begin{array}[]{ll}i&4\\ 1&1\end{array}\right)

  are not unitarily equivalent.

5.6.7.

Which of the following pairs of matrices are unitarily equivalent:

  a) \left(\begin{array}[]{ll}1&0\\ 0&1\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{ll}0&1\\ 1&0\end{array}\right).

  b) \left(\begin{array}[]{ll}0&1\\ 1&0\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{ll}0&1/2\\ 1/2&0\end{array}\right).

  c) \left(\begin{array}[]{rrr}0&1&0\\ -1&0&0\\ 0&0&1\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{rrr}2&0&0\\ 0&-1&0\\ 0&0&0\end{array}\right).

  d) \left(\begin{array}[]{rrr}0&1&0\\ -1&0&0\\ 0&0&1\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{rrr}1&0&0\\ 0&-i&0\\ 0&0&i\end{array}\right).

  e) \left(\begin{array}[]{rrr}1&1&0\\ 0&2&2\\ 0&0&3\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{rrr}1&0&0\\ 0&2&0\\ 0&0&3\end{array}\right).

Hint: It is easy to eliminate matrices that are not unitarily equivalent: remember, that unitarily equivalent matrices are similar, and the trace, determinant and eigenvalues of similar matrices coincide.

Also, the previous problem helps in eliminating matrices that are not unitarily equivalent.

Finally, a matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal basis of eigenvectors.

5.6.8.

Let UU be a 2×22\times 2 orthogonal matrix with detU=1\det U=1. Prove that UU is a rotation matrix.

5.6.9.

Let UU be a 3×33\times 3 orthogonal matrix with detU=1\det U=1. Prove that

  a) 1 is an eigenvalue of U.

  b) If v_1, v_2, v_3 is an orthonormal basis such that Uv_1 = v_1 (remember, 1 is an eigenvalue), then in this basis the matrix of U is

\left(\begin{array}[]{ccc}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{array}\right),

  where α is some angle.

  Hint: Show that since v_1 is an eigenvector of U, all entries below the 1 must be zero, and since v_1 is also an eigenvector of U* (why?), all entries to the right of the 1 must be zero as well. Then show that the lower right 2×2 block is an orthogonal matrix with determinant 1, and use the previous problem.

5.7. Rigid motions in n\mathbb{R}^{n}

A rigid motion in an inner product space V is a transformation f: V → V preserving the distance between points, i.e. such that

f(𝐱)f(𝐲)=𝐱𝐲𝐱,𝐲V.\|f(\mathbf{x})-f(\mathbf{y})\|=\|\mathbf{x}-\mathbf{y}\|\qquad\forall\mathbf{% x},\mathbf{y}\in V.

Note, that in the definition we do not assume that the transformation ff is linear.

Clearly, any unitary transformation is a rigid motion. Another example of a rigid motion is a translation (shift) by 𝐚V\mathbf{a}\in V, f(𝐱)=𝐱+𝐚f(\mathbf{x})=\mathbf{x}+\mathbf{a}.

The main result of this section is the following theorem, stating that any rigid motion in a real inner product space is a composition of an orthogonal transformation and a translation.

Theorem 5.7.1.

Let ff be a rigid motion in a real inner product space XX, and let T(𝐱):=f(𝐱)f(𝟎)T(\mathbf{x}):=f(\mathbf{x})-f(\mathbf{0}). Then TT is an orthogonal transformation.

To prove this theorem we need the following simple lemma.

Lemma 5.7.2.

Let TT be as defined in Theorem 5.7.1. Then for all 𝐱,𝐲X\mathbf{x},\mathbf{y}\in X

  1. ‖T(x)‖ = ‖x‖;

  2. ‖T(x) − T(y)‖ = ‖x − y‖;

  3. (T(x), T(y)) = (x, y).

Proof.

To prove statement 1 notice that

T(𝐱)=f(𝐱)f(𝟎)=𝐱𝟎=𝐱.\|T(\mathbf{x})\|=\|f(\mathbf{x})-f(\mathbf{0})\|=\|\mathbf{x}-\mathbf{0}\|=\|% \mathbf{x}\|.

Statement 2 follows from the following chain of identities:

T(𝐱)T(𝐲)\displaystyle\|T(\mathbf{x})-T(\mathbf{y})\| =(f(𝐱)f(𝟎))(f(𝐲)f(𝟎))\displaystyle=\|(f(\mathbf{x})-f(\mathbf{0}))-(f(\mathbf{y})-f(\mathbf{0}))\|
=f(𝐱)f(𝐲)=𝐱𝐲.\displaystyle=\|f(\mathbf{x})-f(\mathbf{y})\|=\|\mathbf{x}-\mathbf{y}\|.

An alternative explanation would be that TT is a composition of 2 rigid motions (ff followed by the translation by 𝐚=f(𝟎)\mathbf{a}=-f(\mathbf{0})), and one can easily see that a composition of rigid motions is a rigid motion. Since T(𝟎)=𝟎T(\mathbf{0})=\mathbf{0}, and so T(𝐱)=T(𝐱)T(𝟎)\|T(\mathbf{x})\|=\|T(\mathbf{x})-T(\mathbf{0})\|, statement 1 can be treated as a particular case of statement 2.

To prove statement 3, let us notice that in a real inner product space

T(𝐱)T(𝐲)2=T(𝐱)2+T(𝐲)22(T(𝐱),T(𝐲)),\|T(\mathbf{x})-T(\mathbf{y})\|^{2}=\|T(\mathbf{x})\|^{2}+\|T(\mathbf{y})\|^{2% }-2(T(\mathbf{x}),T(\mathbf{y})),

and

𝐱𝐲2=𝐱2+𝐲22(𝐱,𝐲).\|\mathbf{x}-\mathbf{y}\|^{2}=\|\mathbf{x}\|^{2}+\|\mathbf{y}\|^{2}-2(\mathbf{% x},\mathbf{y}).

Recalling that T(𝐱)T(𝐲)=𝐱𝐲\|T(\mathbf{x})-T(\mathbf{y})\|=\|\mathbf{x}-\mathbf{y}\| and T(𝐱)=𝐱\|T(\mathbf{x})\|=\|\mathbf{x}\|, T(𝐲)=𝐲\|T(\mathbf{y})\|=\|\mathbf{y}\|, we immediately get the desired conclusion. ∎

Proof of Theorem 5.7.1.

First of all notice that for all 𝐱X\mathbf{x}\in X

T(𝐱)=f(𝐱)f(𝟎)=𝐱𝟎=𝐱,\|T(\mathbf{x})\|=\|f(\mathbf{x})-f(\mathbf{0})\|=\|\mathbf{x}-\mathbf{0}\|=\|% \mathbf{x}\|,

so TT preserves the norm, T𝐱=𝐱\|T\mathbf{x}\|=\|\mathbf{x}\|.

We would like to say that the identity T𝐱=𝐱\|T\mathbf{x}\|=\|\mathbf{x}\| means TT is an isometry, but to be able to say that we need to prove that TT is a linear transformation.

To do that, let us fix an orthonormal basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} in XX, and let 𝐛k:=T(𝐞k)\mathbf{b}_{k}:=T(\mathbf{e}_{k}), k=1,2,,nk=1,2,\ldots,n. Since TT preserves the inner product (statement 3 of Lemma 5.7.2), we can conclude that 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} is an orthonormal system. In fact, since dimX=n\dim X=n (because basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} consists of nn vectors), we can conclude that 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} is an orthonormal basis.

Let 𝐱=k=1nαk𝐞k\mathbf{x}=\sum_{k=1}^{n}\alpha_{k}\mathbf{e}_{k}. Recall that by the abstract orthogonal Fourier decomposition (5.2.2) we have that αk=(𝐱,𝐞k)\alpha_{k}=(\mathbf{x},\mathbf{e}_{k}). Applying the abstract orthogonal Fourier decomposition (5.2.2) to T(𝐱)T(\mathbf{x}) and the orthonormal basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} we get

T(𝐱)=k=1n(T(𝐱),𝐛k)𝐛k.T(\mathbf{x})=\sum_{k=1}^{n}(T(\mathbf{x}),\mathbf{b}_{k})\mathbf{b}_{k}.

Since

(T(𝐱),𝐛k)=(T(𝐱),T(𝐞k))=(𝐱,𝐞k)=αk,(T(\mathbf{x}),\mathbf{b}_{k})=(T(\mathbf{x}),T(\mathbf{e}_{k}))=(\mathbf{x},% \mathbf{e}_{k})=\alpha_{k},

we get that

T(k=1nαk𝐞k)=k=1nαk𝐛k.T\Bigl{(}\sum_{k=1}^{n}\alpha_{k}\mathbf{e}_{k}\Bigr{)}=\sum_{k=1}^{n}\alpha_{% k}\mathbf{b}_{k}.

This means that T is a linear transformation whose matrix with respect to the bases 𝒮 := e_1, e_2, …, e_n and ℬ := b_1, b_2, …, b_n is the identity matrix, [T]_{ℬ,𝒮} = I.

An alternative way to show that TT is a linear transformation is the following direct calculation

T(𝐱+α𝐲)(T(𝐱)+αT(𝐲))2=(T(𝐱+α𝐲)T(𝐱))αT(𝐲)2\displaystyle\|T(\mathbf{x}+\alpha\mathbf{y})-(T(\mathbf{x})+\alpha T(\mathbf{% y}))\|^{2}=\|(T(\mathbf{x}+\alpha\mathbf{y})-T(\mathbf{x}))-\alpha T(\mathbf{y% })\|^{2}
=T(𝐱+α𝐲)T(𝐱)2+α2T(𝐲)22α(T(𝐱+α𝐲)T(𝐱),T(𝐲))\displaystyle=\|T(\mathbf{x}+\alpha\mathbf{y})-T(\mathbf{x})\|^{2}+\alpha^{2}% \|T(\mathbf{y})\|^{2}-2\alpha(T(\mathbf{x}+\alpha\mathbf{y})-T(\mathbf{x}),T(% \mathbf{y}))
=𝐱+α𝐲𝐱2+α2𝐲22α(T(𝐱+α𝐲),T(𝐲))+2α(T(𝐱),T(𝐲))\displaystyle=\|\mathbf{x}+\alpha\mathbf{y}-\mathbf{x}\|^{2}+\alpha^{2}\|% \mathbf{y}\|^{2}-2\alpha(T(\mathbf{x}+\alpha\mathbf{y}),T(\mathbf{y}))+2\alpha% (T(\mathbf{x}),T(\mathbf{y}))
=α2𝐲2+α2𝐲22α(𝐱+α𝐲,𝐲)+2α(𝐱,𝐲)\displaystyle=\alpha^{2}\|\mathbf{y}\|^{2}+\alpha^{2}\|\mathbf{y}\|^{2}-2% \alpha(\mathbf{x}+\alpha\mathbf{y},\mathbf{y})+2\alpha(\mathbf{x},\mathbf{y})
=2α2𝐲22α(𝐱,𝐲)2α2(𝐲,𝐲)+2α(𝐱,𝐲)=0\displaystyle=2\alpha^{2}\|\mathbf{y}\|^{2}-2\alpha(\mathbf{x},\mathbf{y})-2% \alpha^{2}(\mathbf{y},\mathbf{y})+2\alpha(\mathbf{x},\mathbf{y})=0

Therefore

T(𝐱+α𝐲)=T(𝐱)+αT(𝐲),T(\mathbf{x}+\alpha\mathbf{y})=T(\mathbf{x})+\alpha T(\mathbf{y}),

which implies that TT is linear (taking 𝐱=𝟎\mathbf{x}=\mathbf{0} or α=1\alpha=1 we get two properties from the definition of a linear transformation).

So, TT is a linear transformation satisfying T𝐱=𝐱\|T\mathbf{x}\|=\|\mathbf{x}\|, i.e. TT is an isometry. Since T:XXT:X\to X, TT is unitary transformation (see Proposition 5.6.3). That completes the proof, since an orthogonal transformation is simply a unitary transformation in a real inner product space. ∎
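The theorem can be illustrated numerically (an added sketch with a made-up rigid motion, a rotation followed by a translation): subtracting f(0) recovers the orthogonal part.

    import numpy as np

    a = 1.2
    Q = np.array([[np.cos(a), -np.sin(a)],      # an orthogonal matrix
                  [np.sin(a),  np.cos(a)]])
    shift = np.array([3., -1.])

    f = lambda x: Q @ x + shift                 # a rigid motion in R^2
    T = lambda x: f(x) - f(np.zeros(2))         # T(x) := f(x) - f(0)

    x, y = np.array([1., 2.]), np.array([-0.5, 4.])
    print(np.isclose(np.linalg.norm(f(x) - f(y)),
                     np.linalg.norm(x - y)))    # f preserves distances
    print(np.allclose(T(x), Q @ x))             # T is the orthogonal part Q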

Exercises.

5.7.1.

Give an example of a rigid motion TT in n\mathbb{C}^{n}, T(𝟎)=𝟎T(\mathbf{0})=\mathbf{0}, which is not a linear transformation.

5.8. Complexification and decomplexification

This section is probably a bit more abstract than the rest of the chapter, and can be skipped at the first reading.

5.8.1. Decomplexification

Decomplexification of a vector space

Any complex vector space can be interpreted as a real vector space: we just need to forget that we can multiply vectors by complex numbers and act as if only multiplication by real numbers were allowed.

For example, the space n\mathbb{C}^{n} is canonically identified with the real space 2n\mathbb{R}^{2n}: each complex coordinate zk=xk+iykz_{k}=x_{k}+iy_{k} gives us 2 real ones xkx_{k} and yky_{k}.

“Canonically” here means that this is a standard, most natural way of identifying n\mathbb{C}^{n} and 2n\mathbb{R}^{2n}. Note, that while the above definition gives us a canonical way to get real coordinates from complex ones, it does not say anything about ordering real coordinates.

In fact, there are two standard ways to order the coordinates xkx_{k}, yky_{k}. One way is to take first the real parts and then the imaginary parts, so the ordering is x1,x2,,xn,y1,y2,,ynx_{1},x_{2},\ldots,x_{n},y_{1},y_{2},\ldots,y_{n}. The other standard alternative is the ordering x1,y1,x2,y2,,xn,ynx_{1},y_{1},x_{2},y_{2},\ldots,x_{n},y_{n}. The material of this section does not depend on the choice of ordering of coordinates, so the reader does not have to worry about picking an ordering.

Decomplexification of an inner product

It turns out that if we are given a complex inner product (in a complex space), we can in a canonical way get a real inner product from it. To see how, let us first consider the above example of ℂⁿ canonically identified with ℝ²ⁿ. Let (x,y)_ℂ denote the standard inner product in ℂⁿ, and let (x,y)_ℝ be the standard inner product in ℝ²ⁿ (note that the standard inner product in ℝⁿ does not depend on the ordering of coordinates). Then (see Exercise 5.8.1 below)

(5.8.1) (𝐱,𝐲)=Re(𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}=\operatorname{Re}(% \mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}

This formula can be used to canonically define a real inner product from the complex one in the general situation. Namely, it is an easy exercise to show that if (x,y)_ℂ is an inner product on a complex inner product space, then (x,y)_ℝ defined by (5.8.1) is a real inner product (on the corresponding real space).
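Formula (5.8.1) can be checked in a couple of lines (an added sketch; here the real parts are listed first and then the imaginary parts, but any ordering gives the same real inner product):

    import numpy as np

    rng = np.random.default_rng(5)
    z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    w = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    ip_c = np.vdot(w, z)                      # (z, w)_C = w* z

    zr = np.concatenate([z.real, z.imag])     # identify C^4 with R^8
    wr = np.concatenate([w.real, w.imag])
    ip_r = zr @ wr                            # standard real inner product

    print(np.isclose(ip_r, ip_c.real))        # True: (x, y)_R = Re (x, y)_C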

Summarizing we can say that

To decomplexify a complex inner product space we simply “forget” that we can multiply by complex numbers, i.e. we only allow multiplication by reals. The canonical real inner product in the decomplexified space is given by formula (5.8.1).

Remark.

Any (complex) linear transformation on ℂⁿ (or, more generally, on a complex vector space) also gives us a real linear transformation: this is simply the fact that if T(αx+βy) = αTx + βTy holds for α, β ∈ ℂ, then it of course holds for α, β ∈ ℝ.

The converse is not true, i.e. a (real) linear transformation on the decomplexification ℝ²ⁿ of ℂⁿ does not always give a (complex) linear transformation of ℂⁿ (the same is true in the abstract setting).

For example, if one considers the case n = 1, then multiplication by a complex number z (the general form of a linear transformation in ℂ¹), treated as a linear transformation in ℝ², has a very specific structure (can you describe it?).

5.8.2. Complexification

We can also do a converse, namely get a complex inner product space from a real one: in fact, you probably already did it before, without paying much attention to it.

Namely, given a real inner product space ℝⁿ we can obtain a complex space ℂⁿ out of it by allowing complex coordinates (with the standard inner product in both cases). The space ℝⁿ in this case will be a real subspace of ℂⁿ consisting of vectors with real coordinates (a real subspace means a set that is closed with respect to sums and multiplication by real numbers).

Abstractly, this construction can be described as follows: given a real vector space X we define its complexification X_ℂ as the collection of all pairs [x_1, x_2], x_1, x_2 ∈ X, with addition and multiplication by a real number α defined coordinate-wise,

[𝐱1,𝐱2]+[𝐲1,𝐲2]=[𝐱1+𝐲1,𝐱2+𝐲2],α[𝐱1,𝐱2]=[α𝐱1,α𝐱2].[\mathbf{x}_{1},\mathbf{x}_{2}]+[\mathbf{y}_{1},\mathbf{y}_{2}]=[\mathbf{x}_{1% }+\mathbf{y}_{1},\mathbf{x}_{2}+\mathbf{y}_{2}],\qquad\alpha[\mathbf{x}_{1},% \mathbf{x}_{2}]=[\alpha\mathbf{x}_{1},\alpha\mathbf{x}_{2}].

If X=nX=\mathbb{R}^{n} then the vector 𝐱1\mathbf{x}_{1} consists of real parts of complex coordinates of n\mathbb{C}^{n} and the vector 𝐱2\mathbf{x}_{2} of the imaginary parts. Thus informally one can write the pair [𝐱1,𝐱2][\mathbf{x}_{1},\mathbf{x}_{2}] as 𝐱1+i𝐱2\mathbf{x}_{1}+i\mathbf{x}_{2}.

To define multiplication by complex numbers we define multiplication by ii as

i[𝐱1,𝐱2]=[𝐱2,𝐱1]i[\mathbf{x}_{1},\mathbf{x}_{2}]=[-\mathbf{x}_{2},\mathbf{x}_{1}]

(writing [x_1, x_2] as x_1 + ix_2 we can see that it must be defined this way) and define multiplication by arbitrary complex numbers using the second distributive property (α + β)v = αv + βv.

If, in addition, XX is an inner product space we can extend the inner product to XX_{{}_{\scriptstyle\mathbb{C}}} by

([𝐱1,𝐱2],[𝐲1,𝐲2])X=(𝐱1,𝐲1)X+(𝐱2,𝐲2)Xi(𝐱1,𝐲2)X+i(𝐱2,𝐲1)X.\bigl{(}[\mathbf{x}_{1},\mathbf{x}_{2}],[\mathbf{y}_{1},\mathbf{y}_{2}]\bigr{)% }_{{}_{\scriptstyle X_{\mathbb{C}}}}=(\mathbf{x}_{1},\mathbf{y}_{1})_{{}_{% \scriptstyle X}}+(\mathbf{x}_{2},\mathbf{y}_{2})_{{}_{\scriptstyle X}}-i(% \mathbf{x}_{1},\mathbf{y}_{2})_{{}_{\scriptstyle X}}+i(\mathbf{x}_{2},\mathbf{% y}_{1})_{{}_{\scriptstyle X}}\,.

The easiest way to see that everything is well defined is to fix a basis (an orthonormal basis in the case of a real inner product space) and see what this construction gives us in coordinates. Then we can see that if we treat the vector x_1 as the vector of real parts of the complex coordinates, and the vector x_2 as the vector of imaginary parts, then this construction is exactly the standard complexification of ℝⁿ (by allowing complex coordinates) described above.

The fact that we can express this construction in a coordinate-free way, without picking a basis and working with coordinates, means that the result does not depend on the choice of a basis.

So, the easiest way to think about complexification is probably as follows:

To construct a complexification of a real vector space XX we can pick a basis (an orthonormal basis if XX is a real inner product space) and then work with coordinates, allowing the complex ones. The resulting space does not depend on the choice of a basis; we can get from one coordinates to the others by the standard change of coordinate formula.

Note, that any linear transformation TT in the real space XX gives rise to a linear transformation TT_{{}_{\scriptstyle\mathbb{C}}} in the complexification XX_{{}_{\scriptstyle\mathbb{C}}}.

The easiest way to see that is to fix a basis in XX (an orthonormal basis if XX is a real inner product space) and to work in a coordinate representation: in this case TT_{{}_{\scriptstyle\mathbb{C}}} has the same matrix as TT. In the abstract representation we can write

T[𝐱1,𝐱2]=[T𝐱1,T𝐱2].T_{{}_{\scriptstyle\mathbb{C}}}[\mathbf{x}_{1},\mathbf{x}_{2}]=[T\mathbf{x}_{1% },T\mathbf{x}_{2}].

On the other hand, not all linear transformations in X_ℂ can be obtained from transformations in X: if we do the complexification in coordinates, only the transformations with real matrices arise this way.

Note, that this is completely opposite to the situation in the case of decomplexification, described in Section 5.8.1.

An attentive reader has probably already noticed that the operations of complexification and decomplexification are not inverses of each other. First, a space and its complexification have the same dimension, while the decomplexification of an n-dimensional space has dimension 2n. Moreover, as we just discussed, the relation between real and complex linear transformations is completely opposite in the two cases.

In the next section we discuss an operation that is, in some sense, inverse to decomplexification.

5.8.3. Introducing complex structure to a real space

The construction described in this section works only for real spaces of even dimension.

An elementary way to introduce a complex structure

Let X be a real inner product space of dimension 2n. We want to invert the decomplexification procedure, i.e. to introduce a complex structure on X: to identify this space with a complex space whose decomplexification (see Section 5.8.1) gives us the original space X. The simplest idea is to fix an orthonormal basis in X and then split the coordinates in this basis into two equal parts.

We then treat one half of the coordinates (say x_1, x_2, …, x_n) as real parts of complex coordinates, and treat the rest as the imaginary parts. Then we have to join the real and imaginary parts together: for example, if we treat x_1, x_2, …, x_n as real parts and x_{n+1}, x_{n+2}, …, x_{2n} as imaginary parts, we can define complex coordinates z_k = x_k + ix_{n+k}.

Of course, the result will generally depend on the choice of the orthonormal basis, on the way we split the coordinates into real and imaginary parts, and on how we join them.

One can also see from the decomplexification construction described in Section 5.8.1 that all complex structures on a real inner product space XX can be obtained in this way.

From elementary to abstract construction of complex structure

The above construction can be described in an abstract, coordinate-free way. Namely, let us split the space XX as X=EEX=E\oplus E^{\perp}, where EE is a subspace, dimE=n\dim E=n (so dimE=n\dim E^{\perp}=n as well), and let U0:EEU_{0}:E\to E^{\perp} be a unitary (more precisely, orthogonal, since our spaces are real) transformation.

Note, that if 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in EE, then the system U0𝐯1,U0𝐯2,,U0𝐯nU_{0}\mathbf{v}_{1},U_{0}\mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n} is an orthonormal basis in EE^{\perp}, so

𝐯1,𝐯2,,𝐯n,U0𝐯1,U0𝐯2,,U0𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n},U_{0}\mathbf{v}_{1},U_{0}% \mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n}

is an orthonormal basis in the whole space XX.

If x1,x2,,x2nx_{1},x_{2},\ldots,x_{2n} are coordinates of a vector 𝐱\mathbf{x} in this basis, and we treat xk+ixn+kx_{k}+ix_{n+k}, k=1,2,,nk=1,2,\ldots,n as complex coordinates of 𝐱\mathbf{x}, then the multiplication by ii is represented by the orthogonal transformation UU which is given in the orthogonal basis of subspaces E,EE,E^{\perp} by the block matrix

U=(𝟎U0U0𝟎).U=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*}\\ U_{0}&\mathbf{0}\end{array}\right).

This means that

i(𝐱1𝐱2)=U(𝐱1𝐱2)=(𝟎U0U0𝟎)(𝐱1𝐱2)i\left(\begin{array}[]{c}\mathbf{x}_{1}\\ \mathbf{x}_{2}\end{array}\right)=U\left(\begin{array}[]{c}\mathbf{x}_{1}\\ \mathbf{x}_{2}\end{array}\right)=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*% }\\ U_{0}&\mathbf{0}\end{array}\right)\left(\begin{array}[]{c}\mathbf{x}_{1}\\ \mathbf{x}_{2}\end{array}\right)

where x_1 ∈ E, x_2 ∈ E^⊥.

Clearly, UU is an orthogonal transformation such that U2=IU^{2}=-I. Therefore, any complex structure on XX is given by an orthogonal transformation UU, satisfying U2=IU^{2}=-I; the transformation UU gives us the multiplication by the imaginary unit ii.

The converse is also true, namely any orthogonal transformation UU satisfying U2=IU^{2}=-I defines a complex structure on a real inner product space XX. Let us explain how.

An abstract construction of complex structure

Let us first consider an abstract explanation. To define a complex structure, we need to define multiplication of vectors by complex numbers (initially we can only multiply by real numbers). In fact, we only need to define multiplication by i; the rest follows by linearity in the original real space. And the multiplication by i is given by the orthogonal transformation U satisfying U² = −I.

Namely, if the multiplication by ii is given by UU, i𝐱=U𝐱i\mathbf{x}=U\mathbf{x}, then the complex multiplication must be defined by

(5.8.2) (α+βi)𝐱:=α𝐱+βU𝐱=(αI+βU)𝐱,α,β,𝐱X.(\alpha+\beta i)\mathbf{x}:=\alpha\mathbf{x}+\beta U\mathbf{x}=(\alpha I+\beta U% )\mathbf{x},\qquad\alpha,\beta\in\mathbb{R},\quad\mathbf{x}\in X.

We will use this formula now as the definition of complex multiplication.

It is not hard to check that for the complex multiplication defined above by (5.8.2) all axioms of a complex vector space are satisfied. One can see that, for example, by using linearity in the real space X and noticing that with respect to the algebraic operations (addition and multiplication) the linear transformations of the form

αI+βU,α,β,\alpha I+\beta U,\qquad\alpha,\beta\in\mathbb{R},

behave exactly the same way as the complex numbers α + βi, i.e. such transformations give us a representation of the field of complex numbers ℂ.

This means, first, that a sum and a product of transformations of the form αI + βU is again a transformation of the same form, and that the coefficients α, β of the result can be obtained by performing the corresponding operation on the complex numbers and taking the real and imaginary parts of the result. Note, that here we need the identity U² = −I, but we do not need the fact that U is an orthogonal transformation.

Thus, we get the structure of a complex vector space. To get a complex inner product space we need to introduce a complex inner product (x,y)_ℂ such that the original real inner product (x,y)_ℝ is its real part.

We really do not have any choice here. Indeed, for any complex inner product

Im(𝐱,𝐲)=Re[i(𝐱,𝐲)]=Re(𝐱,i𝐲)=Re(𝐱,U𝐲);\operatorname{Im}(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}=% \operatorname{Re}\left[-i(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}% \right]=\operatorname{Re}(\mathbf{x},i\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}% }=\operatorname{Re}(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}};

for the last equality we used the definition (5.8.2) of complex multiplication. Therefore, the only way to define the complex inner product (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} such that (𝐱,𝐲)=Re(𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}=\operatorname{Re}(% \mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} is

(5.8.3) (𝐱,𝐲):=(𝐱,𝐲)+i(𝐱,U𝐲).(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}:=(\mathbf{x},\mathbf{y})% _{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle% \mathbb{R}}}.

Let us show that this is indeed a complex inner product. We will need the fact that U=UU^{*}=-U, see Exercise 5.8.4 below (by UU^{*} here we mean the adjoint with respect to the original real inner product).

To show that (𝐲,𝐱)=(𝐱,𝐲)¯(\mathbf{y},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}}=\overline{(\mathbf{x},% \mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}} we use the identity U=UU^{*}=-U and symmetry of the real inner product:

(𝐲,𝐱)\displaystyle(\mathbf{y},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}} =(𝐲,𝐱)+i(𝐲,U𝐱)\displaystyle=(\mathbf{y},\mathbf{x})_{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{% y},U\mathbf{x})_{{}_{\scriptstyle\mathbb{R}}}
=(𝐱,𝐲)+i(U𝐱,𝐲)\displaystyle=(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(U\mathbf% {x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}
=(𝐱,𝐲)i(𝐱,U𝐲)\displaystyle=(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}-i(\mathbf{% x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}
=(𝐱,𝐲)+i(𝐱,U𝐲)¯\displaystyle=\overline{(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+% i(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}}
=(𝐱,𝐲)¯.\displaystyle=\overline{(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}}.

To prove the linearity of the complex inner product, let us first notice that (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} is real linear in the first (in fact in each) argument, i.e. that (α𝐱+β𝐲,𝐳)=α(𝐱,𝐳)+β(𝐲,𝐳)(\alpha\mathbf{x}+\beta\mathbf{y},\mathbf{z})_{{}_{\scriptstyle\mathbb{C}}}=% \alpha(\mathbf{x},\mathbf{z})_{{}_{\scriptstyle\mathbb{C}}}+\beta(\mathbf{y},% \mathbf{z})_{{}_{\scriptstyle\mathbb{C}}} for α,β\alpha,\beta\in\mathbb{R}; this is true because each summand in the right side of (5.8.3) is real linear in the first argument.

Using real linearity of (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} and the identity U=UU^{*}=-U (which implies that (U𝐱,𝐲)=(𝐱,U𝐲)(U\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}=-(\mathbf{x},U\mathbf{y% })_{{}_{\scriptstyle\mathbb{R}}}) together with the orthogonality of UU, we get the following chain of equalities

((αI+βU)𝐱,𝐲)\displaystyle((\alpha I+\beta U)\mathbf{x},\mathbf{y})_{{}_{\scriptstyle% \mathbb{C}}} =α(𝐱,𝐲)+β(U𝐱,𝐲)\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta(U\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}
=α(𝐱,𝐲)+β[(U𝐱,𝐲)+i(U𝐱,U𝐲)]\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta\left[(U\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(U\mathbf{x% },U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}\right]
=α(𝐱,𝐲)+β[(𝐱,U𝐲)+i(𝐱,𝐲)]\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta\left[-(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{x% },\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}\right]
=α(𝐱,𝐲)+βi[(𝐱,𝐲)+i(𝐱,U𝐲)]\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta i\left[(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{x% },U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}\right]
=α(𝐱,𝐲)+βi(𝐱,𝐲)=(α+βi)(𝐱,𝐲),\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta i(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}=(\alpha+\beta i)(% \mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}},

which proves complex linearity.

Finally, to prove non-negativity of (𝐱,𝐱)(\mathbf{x},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}} let us notice (see Exercise 5.8.3 below) that (𝐱,U𝐱)=0(\mathbf{x},U\mathbf{x})_{{}_{\scriptstyle\mathbb{R}}}=0, so

(𝐱,𝐱)=(𝐱,𝐱)=𝐱20.(\mathbf{x},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}}=(\mathbf{x},\mathbf{x})_% {{}_{\scriptstyle\mathbb{R}}}=\|\mathbf{x}\|^{2}\geq 0.
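A minimal concrete instance of this construction (an added sketch): take X = ℝ² and let U be the rotation by π/2, so that U² = −I and U* = −U; with z = x_1 + ix_2 the inner product (5.8.3) reproduces the standard inner product of ℂ¹.

    import numpy as np

    U = np.array([[0., -1.],
                  [1.,  0.]])                  # rotation by pi/2: U^2 = -I, U* = -U
    print(np.allclose(U @ U, -np.eye(2)))      # True
    print(np.allclose(U.T, -U))                # True

    def ip_c(x, y):
        # the complex inner product (5.8.3): (x, y)_R + i (x, Uy)_R
        return x @ y + 1j * (x @ (U @ y))

    x = np.array([1., 2.])                     # plays the role of z = 1 + 2i
    y = np.array([3., -1.])                    # plays the role of w = 3 - i
    print(np.isclose(ip_c(x, x).imag, 0.0))    # (x, x)_C is real, since x is orthogonal to Ux
    print(ip_c(x, y), (1 + 2j) * np.conj(3 - 1j))  # both equal (1+7j)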

The abstract construction via the elementary one

For a reader who is not comfortable with such a “high brow” abstract proof, there is another, more hands-on explanation.

Namely, it can be shown, see Exercise 5.8.5 below, that there exists a subspace EE, dimE=n\dim E=n (recall that dimX=2n\dim X=2n), such that the matrix of UU with respect to the decomposition X=EEX=E\oplus E^{\perp} is given by

U=(𝟎U0U0𝟎),U=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*}\\ U_{0}&\mathbf{0}\end{array}\right),

where U0:EEU_{0}:E\to E^{\perp} is some orthogonal transformation.

Let 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal basis in EE. Then the system U0𝐯1,U0𝐯2,,U0𝐯nU_{0}\mathbf{v}_{1},U_{0}\mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n} is an orthonormal basis in EE^{\perp}, so

𝐯1,𝐯2,,𝐯n,U0𝐯1,U0𝐯2,,U0𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n},U_{0}\mathbf{v}_{1},U_{0}% \mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n}

is an orthonormal basis in the whole space X. Considering the coordinates x_1, x_2, …, x_{2n} in this basis and treating x_k + ix_{n+k} as complex coordinates, we get the elementary, “coordinate” way of defining a complex structure already described above. But if we look carefully, we see that multiplication by i is given by the transformation U: this is trivially true for x ∈ E and for y ∈ E^⊥, and so it is true for all real linear combinations αx + βy, i.e. for all vectors in X.

But that means that the abstract introduction of a complex structure and the corresponding elementary approach give us the same result! And since the elementary approach clearly gives us a complex structure, the abstract approach gives us the same complex structure.

Exercises.

5.8.1.

Prove formula (5.8.1). Namely, show that if

𝐱=(z1,z2,,zn)T,𝐲=(w1,w2,,wn)T,\mathbf{x}=(z_{1},z_{2},\ldots,z_{n})^{T},\qquad\mathbf{y}=(w_{1},w_{2},\ldots% ,w_{n})^{T},

zk=xk+iykz_{k}=x_{k}+iy_{k}, wk=uk+ivkw_{k}=u_{k}+iv_{k}, xk,yk,uk,vkx_{k},y_{k},u_{k},v_{k}\in\mathbb{R}, then

Re(k=1nzkw¯k)=k=1nxkuk+k=1nykvk.\operatorname{Re}\Bigl{(}\sum_{k=1}^{n}z_{k}\overline{w}_{k}\Bigr{)}=\sum_{k=1% }^{n}x_{k}u_{k}+\sum_{k=1}^{n}y_{k}v_{k}.
5.8.2.

Show that if (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} is an inner product in a complex inner product space, then (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}} defined by (5.8.1) is a real inner product.

5.8.3.

Let UU be an orthogonal transformation (in a real inner product space XX), satisfying U2=IU^{2}=-I. Prove that for all 𝐱X\mathbf{x}\in X

U𝐱𝐱.U\mathbf{x}\perp\mathbf{x}.
5.8.4.

Show, that if UU is an orthogonal transformation satisfying U2=IU^{2}=-I, then U=UU^{*}=-U.

5.8.5.

Let UU be an orthogonal transformation in a real inner product space, satisfying U2=IU^{2}=-I. Show that in this case dimX=2n\dim X=2n, and that there exists a subspace EXE\subset X, dimE=n\dim E=n, and an orthogonal transformation U0:EEU_{0}:E\to E^{\perp} such that UU in the decomposition X=EEX=E\oplus E^{\perp} is given by the block matrix

U=(𝟎U0U0𝟎).U=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*}\\ U_{0}&\mathbf{0}\end{array}\right).

This statement can easily be obtained from Theorem 6.5.1 of Chapter 6, if one notes that the only rotations R_α in ℝ² satisfying R_α² = −I are the rotations through α = ±π/2.

However, one can find an elementary proof not using this theorem. For example, the statement is trivial if dim X = 2: in this case we can take for E any one-dimensional subspace, see Exercise 5.8.3.

Then it is not hard to show that such an operator U does not exist in ℝ³, and one can use induction on dim X to complete the proof.