Chapter 5 Inner product spaces

The theory of inner product spaces is developed only for real and complex spaces, so $\mathbb{F}$ in this chapter is always $\mathbb{R}$ or $\mathbb{C}$; the results usually do not generalize to spaces over arbitrary fields.

Most of the results and calculations in this chapter hold (and the results have the same statements) in both the real and complex cases. In the rare situations when there is a difference between the real and complex cases, we state explicitly which case is considered; otherwise everything holds for both.

Finally, when the results and calculations hold for both the complex and real cases, we use the formulas for the complex case; in the real case they give correct, though sometimes slightly more complicated, formulas.

5.1. Inner product in $\mathbb{R}^n$ and $\mathbb{C}^n$. Inner product spaces.

5.1.1. Inner product and norm in $\mathbb{R}^n$.

In dimensions 2 and 3, we defined the length of a vector $\mathbf{x}$ (i.e. the distance from its endpoint to the origin) by the Pythagorean rule; for example, in $\mathbb{R}^3$ the length of a vector is defined as

\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2}. \]

It is natural to generalize this formula to all $n$, defining the norm of a vector $\mathbf{x}\in\mathbb{R}^n$ as

\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}. \]

The word norm is used as a fancy replacement for the word length.

The dot product in $\mathbb{R}^3$ was defined as $\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3$, where $\mathbf{x} = (x_1,x_2,x_3)^T$ and $\mathbf{y} = (y_1,y_2,y_3)^T$.

Remark.

While the notation $\mathbf{x}\cdot\mathbf{y}$ and the term "dot product" are often used in the literature, we prefer to call it the "inner product", and for reasons which will become clear later, we will use the notation $(\mathbf{x},\mathbf{y})$ instead of $\mathbf{x}\cdot\mathbf{y}$.

Thus, in $\mathbb{R}^n$ one can define the inner product $(\mathbf{x},\mathbf{y})$ of two vectors $\mathbf{x} = (x_1,x_2,\ldots,x_n)^T$, $\mathbf{y} = (y_1,y_2,\ldots,y_n)^T$ by

\[ (\mathbf{x},\mathbf{y}) := x_1y_1 + x_2y_2 + \ldots + x_ny_n = \mathbf{y}^T\mathbf{x}, \]

so $\|\mathbf{x}\| = \sqrt{(\mathbf{x},\mathbf{x})}$.

Note that $\mathbf{y}^T\mathbf{x} = \mathbf{x}^T\mathbf{y}$; we use the notation $\mathbf{y}^T\mathbf{x}$ only for consistency with the complex case below.

5.1.2. Inner product and norm in $\mathbb{C}^n$.

Let us now define the norm and inner product in $\mathbb{C}^n$. As we have seen before, the complex space $\mathbb{C}^n$ is the most natural space from the point of view of spectral theory: even if one starts from a matrix with real coefficients (or an operator on a real vector space), the eigenvalues can be complex, and one needs to work in a complex space.

For a complex number $z = x + iy$, we have $|z|^2 = x^2 + y^2 = z\overline{z}$. If $\mathbf{z}\in\mathbb{C}^n$ is given by

\[ \mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} = \begin{pmatrix} x_1+iy_1 \\ x_2+iy_2 \\ \vdots \\ x_n+iy_n \end{pmatrix}, \]

it is natural to define its norm $\|\mathbf{z}\|$ by

\[ \|\mathbf{z}\|^2 = \sum_{k=1}^n (x_k^2 + y_k^2) = \sum_{k=1}^n |z_k|^2. \]

Let us try to define an inner product on $\mathbb{C}^n$ such that $\|\mathbf{z}\|^2 = (\mathbf{z},\mathbf{z})$. One of the choices is to define $(\mathbf{z},\mathbf{w})$ by

\[ (\mathbf{z},\mathbf{w}) = z_1\overline{w}_1 + z_2\overline{w}_2 + \ldots + z_n\overline{w}_n = \sum_{k=1}^n z_k\overline{w}_k, \]

and that will be our definition of the standard inner product in $\mathbb{C}^n$.

To simplify the notation, let us introduce a new notion. For a matrix $A$ let us define its Hermitian adjoint, or simply adjoint, $A^*$ by $A^* = \overline{A}^T$, meaning that we take the transpose of the matrix and then take the complex conjugate of each entry. Note that for a real matrix $A$, $A^* = A^T$.

Using the notion of $A^*$, one can write the standard inner product in $\mathbb{C}^n$ as

\[ (\mathbf{z},\mathbf{w}) = \mathbf{w}^*\mathbf{z}. \]
Remark.

It is easy to see that one can define a different inner product on $\mathbb{C}^n$ such that $\|\mathbf{z}\|^2 = (\mathbf{z},\mathbf{z})$, namely the inner product given by

\[ (\mathbf{z},\mathbf{w})_1 = \overline{z}_1 w_1 + \overline{z}_2 w_2 + \ldots + \overline{z}_n w_n = \mathbf{z}^*\mathbf{w}. \]

We did not specify what properties we want the inner product to satisfy, but $\mathbf{z}^*\mathbf{w}$ and $\mathbf{w}^*\mathbf{z}$ are the only reasonable choices giving $\|\mathbf{z}\|^2 = (\mathbf{z},\mathbf{z})$.

Note that the above two choices of the inner product are essentially equivalent: the only difference between them is notational, because $(\mathbf{z},\mathbf{w})_1 = (\mathbf{w},\mathbf{z})$.

While the second choice of the inner product may look more natural, the first one, $(\mathbf{z},\mathbf{w}) = \mathbf{w}^*\mathbf{z}$, is more widely used, so we will use it as well.
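As a quick illustration, here is a minimal numpy sketch (the vectors are made up for the example) of the standard inner product and norm in $\mathbb{C}^n$. Note that numpy's `vdot` conjugates its first argument, i.e. it computes $(\mathbf{z},\mathbf{w})_1 = \mathbf{z}^*\mathbf{w}$, so our convention $(\mathbf{z},\mathbf{w}) = \mathbf{w}^*\mathbf{z}$ corresponds to `np.vdot(w, z)`:

```python
import numpy as np

z = np.array([1 + 2j, 3 - 1j])
w = np.array([2j, 1 + 1j])

# (z, w) = w* z = sum_k z_k * conj(w_k)
ip = np.vdot(w, z)

# ||z||^2 = (z, z) is real and non-negative
norm_z = np.sqrt(np.vdot(z, z).real)
print(ip, norm_z)
```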

5.1.3. Inner product spaces.

The inner product we defined for $\mathbb{R}^n$ and $\mathbb{C}^n$ satisfies the following properties:

1. Conjugate symmetry: $(\mathbf{x},\mathbf{y}) = \overline{(\mathbf{y},\mathbf{x})}$; note that for a real space this property is just symmetry, $(\mathbf{x},\mathbf{y}) = (\mathbf{y},\mathbf{x})$;

2. Linearity: $(\alpha\mathbf{x}+\beta\mathbf{y},\mathbf{z}) = \alpha(\mathbf{x},\mathbf{z}) + \beta(\mathbf{y},\mathbf{z})$ for all vectors $\mathbf{x},\mathbf{y},\mathbf{z}$ and all scalars $\alpha,\beta$;

3. Non-negativity: $(\mathbf{x},\mathbf{x}) \ge 0$ for all $\mathbf{x}$;

4. Non-degeneracy: $(\mathbf{x},\mathbf{x}) = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

Let $V$ be a (complex or real) vector space. An inner product on $V$ is a function that assigns to each pair of vectors $\mathbf{x}$, $\mathbf{y}$ a scalar, denoted by $(\mathbf{x},\mathbf{y})$, such that the above properties 1–4 are satisfied.

Note that for a real space $V$ we assume that $(\mathbf{x},\mathbf{y})$ is always real, while for a complex space the inner product $(\mathbf{x},\mathbf{y})$ can be complex.

A space $V$ together with an inner product on it is called an inner product space. Given an inner product space, one defines the norm on it by

\[ \|\mathbf{x}\| = \sqrt{(\mathbf{x},\mathbf{x})}. \]

Examples

Example 5.1.1.

Let $V$ be $\mathbb{R}^n$ or $\mathbb{C}^n$. We already have the inner product $(\mathbf{x},\mathbf{y}) = \mathbf{y}^*\mathbf{x} = \sum_{k=1}^n x_k\overline{y}_k$ defined above.

This inner product is called the standard inner product in $\mathbb{R}^n$ or $\mathbb{C}^n$.

We will use the symbol $\mathbb{F}$ to denote both $\mathbb{C}$ and $\mathbb{R}$. When we have some statement about the space $\mathbb{F}^n$, it means the statement is true for both $\mathbb{R}^n$ and $\mathbb{C}^n$.

Example 5.1.2.

Let $V$ be the space $\mathbb{P}_n$ of polynomials of degree at most $n$. Define the inner product by

\[ (f,g) = \int_{-1}^1 f(t)\overline{g(t)}\,dt. \]

It is easy to check that the above properties 1–4 are satisfied.

This definition works in both the complex and real cases. In the real case we only allow polynomials with real coefficients, and we do not need the complex conjugate here.

Let us recall that for a square matrix $A$, its trace is defined as the sum of the diagonal entries,

\[ \operatorname{trace} A := \sum_{k=1}^n a_{k,k}. \]

Example 5.1.3.

For the space $M_{m\times n}$ of $m\times n$ matrices let us define the so-called Frobenius inner product by

\[ (A,B) = \operatorname{trace}(B^*A). \]

Again, it is easy to check that properties 1–4 are satisfied, i.e. that we have indeed defined an inner product.

Note that

\[ \operatorname{trace}(B^*A) = \sum_{j,k} A_{j,k}\overline{B}_{j,k}, \]

so this inner product coincides with the standard inner product in $\mathbb{C}^{mn}$.
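As a sanity check, here is a minimal numpy sketch (with randomly generated matrices) verifying that $\operatorname{trace}(B^*A)$ equals the entrywise sum $\sum_{j,k} A_{j,k}\overline{B}_{j,k}$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))

# Frobenius inner product (A, B) = trace(B* A) ...
ip_trace = np.trace(B.conj().T @ A)
# ... equals the standard inner product of the mn entries
ip_entries = np.sum(A * B.conj())

assert np.allclose(ip_trace, ip_entries)
```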

5.1.4. Properties of inner product

The statements we prove in this section are true for any abstract inner product space, not only for $\mathbb{F}^n$. To prove them we use only properties 1–4 of the inner product.

First of all, let us notice that properties 1 and 2 imply that

2′. $(\mathbf{x},\alpha\mathbf{y}+\beta\mathbf{z}) = \overline{\alpha}(\mathbf{x},\mathbf{y}) + \overline{\beta}(\mathbf{x},\mathbf{z})$.

Indeed,

\[ (\mathbf{x},\alpha\mathbf{y}+\beta\mathbf{z}) = \overline{(\alpha\mathbf{y}+\beta\mathbf{z},\mathbf{x})} = \overline{\alpha(\mathbf{y},\mathbf{x}) + \beta(\mathbf{z},\mathbf{x})} = \overline{\alpha}\,\overline{(\mathbf{y},\mathbf{x})} + \overline{\beta}\,\overline{(\mathbf{z},\mathbf{x})} = \overline{\alpha}(\mathbf{x},\mathbf{y}) + \overline{\beta}(\mathbf{x},\mathbf{z}). \]

Note also that property 2 implies that for all vectors $\mathbf{x}$

\[ (\mathbf{0},\mathbf{x}) = (\mathbf{x},\mathbf{0}) = 0. \]

Lemma 5.1.4.

Let $\mathbf{x}$ be a vector in an inner product space $V$. Then $\mathbf{x} = \mathbf{0}$ if and only if

\[ (\mathbf{x},\mathbf{y}) = 0 \qquad \forall\mathbf{y}\in V. \tag{5.1.1} \]
Proof.

Since $(\mathbf{0},\mathbf{y}) = 0$, we only need to show that (5.1.1) implies $\mathbf{x} = \mathbf{0}$. Putting $\mathbf{y} = \mathbf{x}$ in (5.1.1) we get $(\mathbf{x},\mathbf{x}) = 0$, so $\mathbf{x} = \mathbf{0}$. ∎

Applying the above lemma to the difference $\mathbf{x}-\mathbf{y}$ we get the following corollary.

Corollary 5.1.5.

Let $\mathbf{x},\mathbf{y}$ be vectors in an inner product space $V$. The equality $\mathbf{x} = \mathbf{y}$ holds if and only if

\[ (\mathbf{x},\mathbf{z}) = (\mathbf{y},\mathbf{z}) \qquad \forall\mathbf{z}\in V. \]

The following corollary is very simple, but it will be used a lot.

Corollary 5.1.6.

Suppose two operators $A, B: X\to Y$ satisfy

\[ (A\mathbf{x},\mathbf{y}) = (B\mathbf{x},\mathbf{y}) \qquad \forall\mathbf{x}\in X,\ \forall\mathbf{y}\in Y. \]

Then $A = B$.

Proof.

By the previous corollary (fix $\mathbf{x}$ and take all possible $\mathbf{y}$'s) we get $A\mathbf{x} = B\mathbf{x}$. Since this is true for all $\mathbf{x}\in X$, the transformations $A$ and $B$ coincide. ∎

The following property relates the norm and the inner product.

Theorem 5.1.7 (Cauchy–Schwarz inequality).
\[ |(\mathbf{x},\mathbf{y})| \le \|\mathbf{x}\|\cdot\|\mathbf{y}\|. \]
Proof.

The proof we are going to present is not the shortest one, but it shows where the main ideas come from.

Let us consider the real case first. If $\mathbf{y} = \mathbf{0}$, the statement is trivial, so we can assume that $\mathbf{y}\ne\mathbf{0}$. By the properties of an inner product, for all scalars $t$

\[ 0 \le \|\mathbf{x}-t\mathbf{y}\|^2 = (\mathbf{x}-t\mathbf{y},\mathbf{x}-t\mathbf{y}) = \|\mathbf{x}\|^2 - 2t(\mathbf{x},\mathbf{y}) + t^2\|\mathbf{y}\|^2. \]

In particular, this inequality holds for $t = \frac{(\mathbf{x},\mathbf{y})}{\|\mathbf{y}\|^2}$ (that is the point where the above quadratic polynomial attains its minimum; it can be found, for example, by taking the derivative in $t$ and equating it to $0$), and for this point the inequality becomes

\[ 0 \le \|\mathbf{x}\|^2 - 2\frac{(\mathbf{x},\mathbf{y})^2}{\|\mathbf{y}\|^2} + \frac{(\mathbf{x},\mathbf{y})^2}{\|\mathbf{y}\|^2} = \|\mathbf{x}\|^2 - \frac{(\mathbf{x},\mathbf{y})^2}{\|\mathbf{y}\|^2}, \]

which is exactly the inequality we need to prove.

There are several possible ways to treat the complex case. One is to replace $\mathbf{x}$ by $\alpha\mathbf{x}$, where $\alpha$ is a complex constant with $|\alpha| = 1$ such that $(\alpha\mathbf{x},\mathbf{y})$ is real, and then repeat the proof for the real case.

The other possibility is again to consider

\begin{align*}
0 \le \|\mathbf{x}-t\mathbf{y}\|^2 &= (\mathbf{x}-t\mathbf{y},\mathbf{x}-t\mathbf{y}) = (\mathbf{x},\mathbf{x}-t\mathbf{y}) - t(\mathbf{y},\mathbf{x}-t\mathbf{y}) \\
&= \|\mathbf{x}\|^2 - t(\mathbf{y},\mathbf{x}) - \overline{t}(\mathbf{x},\mathbf{y}) + |t|^2\|\mathbf{y}\|^2.
\end{align*}

Substituting $t = \frac{(\mathbf{x},\mathbf{y})}{\|\mathbf{y}\|^2} = \frac{\overline{(\mathbf{y},\mathbf{x})}}{\|\mathbf{y}\|^2}$ into this inequality, we get

\[ 0 \le \|\mathbf{x}\|^2 - \frac{|(\mathbf{x},\mathbf{y})|^2}{\|\mathbf{y}\|^2}, \]

which is the inequality we need.

Note that the above paragraph is in fact a complete formal proof of the theorem. The reasoning before it was only to explain why we need to pick this particular value of $t$. ∎

An immediate Corollary of the Cauchy–Schwarz Inequality is the following lemma.

Lemma 5.1.8 (Triangle inequality).

For any vectors $\mathbf{x}$, $\mathbf{y}$ in an inner product space,

\[ \|\mathbf{x}+\mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|. \]

Proof.

\begin{align*}
\|\mathbf{x}+\mathbf{y}\|^2 = (\mathbf{x}+\mathbf{y},\mathbf{x}+\mathbf{y}) &= \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + (\mathbf{x},\mathbf{y}) + (\mathbf{y},\mathbf{x}) \\
&\le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2|(\mathbf{x},\mathbf{y})| \\
&\le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\|\mathbf{x}\|\cdot\|\mathbf{y}\| = (\|\mathbf{x}\|+\|\mathbf{y}\|)^2. \qquad\text{∎}
\end{align*}

The following polarization identities allow one to reconstruct the inner product from the norm:

Lemma 5.1.9 (Polarization identities).

For $\mathbf{x},\mathbf{y}\in V$,

\[ (\mathbf{x},\mathbf{y}) = \frac{1}{4}\left(\|\mathbf{x}+\mathbf{y}\|^2 - \|\mathbf{x}-\mathbf{y}\|^2\right) \]

if $V$ is a real inner product space, and

\[ (\mathbf{x},\mathbf{y}) = \frac{1}{4}\sum_{\alpha=\pm 1,\pm i} \alpha\|\mathbf{x}+\alpha\mathbf{y}\|^2 \]

if $V$ is a complex inner product space.

The lemma is proved by direct computation. We leave the proof as an exercise for the reader.
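For a quick numerical check, here is a minimal numpy sketch (with randomly generated vectors) of the complex polarization identity against the standard inner product in $\mathbb{C}^n$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

ip = np.vdot(y, x)  # standard inner product (x, y) = y* x

# (x, y) = (1/4) * sum over alpha in {1, -1, i, -i} of alpha ||x + alpha y||^2
polarized = sum(a * np.linalg.norm(x + a * y) ** 2
                for a in (1, -1, 1j, -1j)) / 4

assert np.allclose(ip, polarized)
```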

Another important property of the norm in an inner product space can also be checked by direct calculation.

Lemma 5.1.10 (Parallelogram Identity).

For any vectors $\mathbf{u},\mathbf{v}$,

\[ \|\mathbf{u}+\mathbf{v}\|^2 + \|\mathbf{u}-\mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2). \]

In 2-dimensional space this lemma relates the sides of a parallelogram with its diagonals, which explains the name. It is a well-known fact from planar geometry.

5.1.5. Norm. Normed spaces

We have proved before that the norm $\|\mathbf{v}\|$ satisfies the following properties:

1. Homogeneity: $\|\alpha\mathbf{v}\| = |\alpha|\cdot\|\mathbf{v}\|$ for all vectors $\mathbf{v}$ and all scalars $\alpha$.

2. Triangle inequality: $\|\mathbf{u}+\mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.

3. Non-negativity: $\|\mathbf{v}\| \ge 0$ for all vectors $\mathbf{v}$.

4. Non-degeneracy: $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.

Suppose that in a vector space $V$ we assign to each vector $\mathbf{v}$ a number $\|\mathbf{v}\|$ such that the above properties 1–4 are satisfied. Then we say that the function $\mathbf{v}\mapsto\|\mathbf{v}\|$ is a norm. A vector space $V$ equipped with a norm is called a normed space.

Any inner product space is a normed space, because the norm $\|\mathbf{v}\| = \sqrt{(\mathbf{v},\mathbf{v})}$ satisfies the above properties 1–4. However, there are many other normed spaces. For example, given $p$, $1\le p<\infty$, one can define the norm $\|\cdot\|_p$ on $\mathbb{R}^n$ or $\mathbb{C}^n$ by

\[ \|\mathbf{x}\|_p = \left(|x_1|^p + |x_2|^p + \ldots + |x_n|^p\right)^{1/p} = \left(\sum_{k=1}^n |x_k|^p\right)^{1/p}. \]

One can also define the norm $\|\cdot\|_\infty$ (the case $p=\infty$) by

\[ \|\mathbf{x}\|_\infty = \max\{|x_k| : k = 1,2,\ldots,n\}. \]

The norm $\|\cdot\|_p$ for $p = 2$ coincides with the regular norm obtained from the inner product.

To check that $\|\cdot\|_p$ is indeed a norm, one has to check that it satisfies all the above properties 1–4. Properties 1, 3 and 4 are very easy to check; we leave this as an exercise for the reader. The triangle inequality (property 2) is easy to check for $p = 1$ and $p = \infty$ (and we proved it for $p = 2$).

For all other $p$ the triangle inequality is true, but the proof is not so simple, and we will not present it here. The triangle inequality for $\|\cdot\|_p$ even has a special name: it is called the Minkowski inequality, after the German mathematician H. Minkowski.

Note that the norm $\|\cdot\|_p$ for $p\ne 2$ cannot be obtained from an inner product. It is easy to see that this norm is not obtained from the standard inner product in $\mathbb{R}^n$ ($\mathbb{C}^n$). But we claim more! We claim that it is impossible to introduce any inner product which gives rise to the norm $\|\cdot\|_p$, $p\ne 2$.

This statement is actually quite easy to prove. By Lemma 5.1.10, any norm obtained from an inner product must satisfy the Parallelogram Identity. It is easy to see that the Parallelogram Identity fails for the norm $\|\cdot\|_p$, $p\ne 2$, and one can easily find a counterexample in $\mathbb{R}^2$, which then gives rise to a counterexample in all other spaces.
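For instance, taking the standard basis vectors of $\mathbb{R}^2$ as the counterexample, a minimal numpy sketch shows the Parallelogram Identity failing for $p\ne 2$:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

def p_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Parallelogram identity: ||u+v||^2 + ||u-v||^2 vs 2(||u||^2 + ||v||^2)
for p in (1, 2, 4):
    lhs = p_norm(u + v, p) ** 2 + p_norm(u - v, p) ** 2
    rhs = 2 * (p_norm(u, p) ** 2 + p_norm(v, p) ** 2)
    print(p, lhs, rhs)  # equal only for p = 2
```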

In fact, the Parallelogram Identity, as the theorem below asserts, completely characterizes norms obtained from an inner product.

Theorem 5.1.11.

A norm in a normed space is obtained from some inner product if and only if it satisfies the Parallelogram Identity

\[ \|\mathbf{u}+\mathbf{v}\|^2 + \|\mathbf{u}-\mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2) \qquad \forall\mathbf{u},\mathbf{v}\in V. \]

Lemma 5.1.10 asserts that a norm obtained from an inner product satisfies the Parallelogram Identity.

The converse implication is more complicated. If we are given a norm, and this norm came from an inner product, then we do not have any choice: this inner product must be given by the polarization identities, see Lemma 5.1.9. But we need to show that the $(\mathbf{x},\mathbf{y})$ we get from the polarization identities is indeed an inner product, i.e. that it satisfies all the properties.

It is indeed possible to verify that if the norm satisfies the Parallelogram Identity, then $(\mathbf{x},\mathbf{y})$ obtained from the polarization identities satisfies all the properties of an inner product. However, the proof is a bit too involved, so we do not present it here.

Exercises.

5.1.1.

Compute

\[ (3+2i)(5-3i), \qquad \frac{2-3i}{1-2i}, \qquad \operatorname{Re}\left(\frac{2-3i}{1-2i}\right), \qquad (1+2i)^3, \qquad \operatorname{Im}\left((1+2i)^3\right). \]
5.1.2.

For the vectors $\mathbf{x} = (1,2i,1+i)^T$ and $\mathbf{y} = (i,2-i,3)^T$ compute

a) $(\mathbf{x},\mathbf{y})$, $\|\mathbf{x}\|^2$, $\|\mathbf{y}\|^2$, $\|\mathbf{y}\|$;

b) $(3\mathbf{x},2i\mathbf{y})$, $(2\mathbf{x},i\mathbf{x}+2\mathbf{y})$;

c) $\|\mathbf{x}+2\mathbf{y}\|$.

Remark: After you have done part a), you can do parts b) and c) without actually computing all the vectors involved, just by using the properties of the inner product.

5.1.3.

Let $\|\mathbf{u}\| = 2$, $\|\mathbf{v}\| = 3$, $(\mathbf{u},\mathbf{v}) = 2+i$. Compute

\[ \|\mathbf{u}+\mathbf{v}\|^2, \qquad \|\mathbf{u}-\mathbf{v}\|^2, \qquad (\mathbf{u}+\mathbf{v},\mathbf{u}-i\mathbf{v}), \qquad (\mathbf{u}+3i\mathbf{v},4i\mathbf{u}). \]

5.1.4.

Prove that for vectors in an inner product space

\[ \|\mathbf{x}\pm\mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 \pm 2\operatorname{Re}(\mathbf{x},\mathbf{y}). \]

Recall that $\operatorname{Re} z = \frac{1}{2}(z+\overline{z})$.

5.1.5.

Explain why each of the following is not an inner product on the given vector space:

a) $(\mathbf{x},\mathbf{y}) = x_1y_1 - x_2y_2$ on $\mathbb{R}^2$;

b) $(A,B) = \operatorname{trace}(A+B)$ on the space of real $2\times 2$ matrices;

c) $(f,g) = \int_0^1 f'(t)\overline{g(t)}\,dt$ on the space of polynomials; $f'(t)$ denotes the derivative.

5.1.6. Equality in the Cauchy–Schwarz inequality.

Prove that

\[ |(\mathbf{x},\mathbf{y})| = \|\mathbf{x}\|\cdot\|\mathbf{y}\| \]

if and only if one of the vectors is a multiple of the other. Hint: analyze the proof of the Cauchy–Schwarz inequality.

5.1.7.

Prove the parallelogram identity for an inner product space $V$,

\[ \|\mathbf{x}+\mathbf{y}\|^2 + \|\mathbf{x}-\mathbf{y}\|^2 = 2(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2). \]
5.1.8.

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be a spanning set (in particular, a basis) in an inner product space $V$. Prove that:

a) If $(\mathbf{x},\mathbf{v}) = 0$ for all $\mathbf{v}\in V$, then $\mathbf{x} = \mathbf{0}$;

b) If $(\mathbf{x},\mathbf{v}_k) = 0$ $\forall k$, then $\mathbf{x} = \mathbf{0}$;

c) If $(\mathbf{x},\mathbf{v}_k) = (\mathbf{y},\mathbf{v}_k)$ $\forall k$, then $\mathbf{x} = \mathbf{y}$.

5.1.9.

Consider the space $\mathbb{R}^2$ with the norm $\|\cdot\|_p$ introduced in Section 5.1.5. For $p = 1, 2, \infty$, draw the "unit ball" $B_p$ in the norm $\|\cdot\|_p$,

\[ B_p := \{\mathbf{x}\in\mathbb{R}^2 : \|\mathbf{x}\|_p \le 1\}. \]

Can you guess what the balls $B_p$ for other $p$ look like?

5.2. Orthogonality. Orthogonal and orthonormal bases.

Definition 5.2.1.

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are called orthogonal (also perpendicular) if $(\mathbf{u},\mathbf{v}) = 0$. We will write $\mathbf{u}\perp\mathbf{v}$ to say that the vectors are orthogonal.

Note that for orthogonal vectors $\mathbf{u}$ and $\mathbf{v}$ we have the following, so-called Pythagorean identity:

\[ \|\mathbf{u}+\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \qquad \text{if } \mathbf{u}\perp\mathbf{v}. \]

The proof is a straightforward computation:

\[ \|\mathbf{u}+\mathbf{v}\|^2 = (\mathbf{u}+\mathbf{v},\mathbf{u}+\mathbf{v}) = (\mathbf{u},\mathbf{u}) + (\mathbf{v},\mathbf{v}) + (\mathbf{u},\mathbf{v}) + (\mathbf{v},\mathbf{u}) = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \]

($(\mathbf{u},\mathbf{v}) = (\mathbf{v},\mathbf{u}) = 0$ because of orthogonality).

Definition 5.2.2.

We say that a vector $\mathbf{v}$ is orthogonal to a subspace $E$ if $\mathbf{v}$ is orthogonal to all vectors $\mathbf{w}$ in $E$.

We say that subspaces $E$ and $F$ are orthogonal if all vectors in $E$ are orthogonal to $F$, i.e. all vectors in $E$ are orthogonal to all vectors in $F$.

The following lemma shows how to check that a vector is orthogonal to a subspace.

Lemma 5.2.3.

Let $E$ be spanned by the vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$. Then $\mathbf{v}\perp E$ if and only if

\[ \mathbf{v}\perp\mathbf{v}_k \qquad \forall k = 1,2,\ldots,r. \]
Proof.

By the definition, if $\mathbf{v}\perp E$ then $\mathbf{v}$ is orthogonal to all vectors in $E$. In particular, $\mathbf{v}\perp\mathbf{v}_k$, $k = 1,2,\ldots,r$.

On the other hand, let $\mathbf{v}\perp\mathbf{v}_k$, $k = 1,2,\ldots,r$. Since the vectors $\mathbf{v}_k$ span $E$, any vector $\mathbf{w}\in E$ can be represented as a linear combination $\sum_{k=1}^r \alpha_k\mathbf{v}_k$. Then

\[ (\mathbf{v},\mathbf{w}) = \sum_{k=1}^r \overline{\alpha}_k(\mathbf{v},\mathbf{v}_k) = 0, \]

so $\mathbf{v}\perp\mathbf{w}$. ∎

Definition 5.2.4.

A system of vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is called orthogonal if any two vectors in it are orthogonal to each other (i.e. if $(\mathbf{v}_j,\mathbf{v}_k) = 0$ for $j\ne k$).

If, in addition, $\|\mathbf{v}_k\| = 1$ for all $k$, we call the system orthonormal.

Lemma 5.2.5 (Generalized Pythagorean identity).

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be an orthogonal system. Then

\[ \left\|\sum_{k=1}^n \alpha_k\mathbf{v}_k\right\|^2 = \sum_{k=1}^n |\alpha_k|^2\|\mathbf{v}_k\|^2. \]

This formula looks particularly simple for orthonormal systems, where $\|\mathbf{v}_k\| = 1$.

Proof of the Lemma.
\[ \left\|\sum_{k=1}^n \alpha_k\mathbf{v}_k\right\|^2 = \Bigl(\sum_{k=1}^n \alpha_k\mathbf{v}_k, \sum_{j=1}^n \alpha_j\mathbf{v}_j\Bigr) = \sum_{k=1}^n\sum_{j=1}^n \alpha_k\overline{\alpha}_j(\mathbf{v}_k,\mathbf{v}_j). \]

Because of orthogonality, $(\mathbf{v}_k,\mathbf{v}_j) = 0$ if $j\ne k$. Therefore we only need to sum the terms with $j = k$, which gives exactly

\[ \sum_{k=1}^n |\alpha_k|^2(\mathbf{v}_k,\mathbf{v}_k) = \sum_{k=1}^n |\alpha_k|^2\|\mathbf{v}_k\|^2. \qquad\text{∎} \]

Corollary 5.2.6.

Any orthogonal system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ of non-zero vectors is linearly independent.

Proof.

Suppose that for some $\alpha_1,\alpha_2,\ldots,\alpha_n$ we have $\sum_{k=1}^n \alpha_k\mathbf{v}_k = \mathbf{0}$. Then by the Generalized Pythagorean identity (Lemma 5.2.5)

\[ 0 = \|\mathbf{0}\|^2 = \sum_{k=1}^n |\alpha_k|^2\|\mathbf{v}_k\|^2. \]

Since $\|\mathbf{v}_k\|\ne 0$ (as $\mathbf{v}_k\ne\mathbf{0}$) we conclude that

\[ \alpha_k = 0 \qquad \forall k, \]

so only the trivial linear combination gives $\mathbf{0}$. ∎

Remark.

In what follows, by an orthogonal system we will usually mean an orthogonal system of non-zero vectors. Since the zero vector $\mathbf{0}$ is orthogonal to everything, it can always be added to any orthogonal system, but it is really not interesting to consider orthogonal systems containing the zero vector.

5.2.1. Orthogonal and orthonormal bases.

Definition 5.2.7.

An orthogonal (orthonormal) system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ which is also a basis is called an orthogonal (orthonormal) basis.

It is clear that if $\dim V = n$, then any orthogonal system of $n$ non-zero vectors is an orthogonal basis.

As we studied before, to find the coordinates of a vector in a basis one generally needs to solve a linear system. However, for an orthogonal basis finding the coordinates of a vector is much easier. Namely, suppose $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is an orthogonal basis, and let

\[ \mathbf{x} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \ldots + \alpha_n\mathbf{v}_n = \sum_{j=1}^n \alpha_j\mathbf{v}_j. \]

Taking the inner product of both sides of the equation with $\mathbf{v}_1$ we get

\[ (\mathbf{x},\mathbf{v}_1) = \sum_{j=1}^n \alpha_j(\mathbf{v}_j,\mathbf{v}_1) = \alpha_1(\mathbf{v}_1,\mathbf{v}_1) = \alpha_1\|\mathbf{v}_1\|^2 \]

(all the inner products $(\mathbf{v}_j,\mathbf{v}_1) = 0$ for $j\ne 1$), so

\[ \alpha_1 = \frac{(\mathbf{x},\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}. \]

Similarly, taking the inner product of both sides with $\mathbf{v}_k$ we get

\[ (\mathbf{x},\mathbf{v}_k) = \sum_{j=1}^n \alpha_j(\mathbf{v}_j,\mathbf{v}_k) = \alpha_k(\mathbf{v}_k,\mathbf{v}_k) = \alpha_k\|\mathbf{v}_k\|^2, \]

so

\[ \alpha_k = \frac{(\mathbf{x},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \tag{5.2.1} \]

Therefore,

to find the coordinates of a vector in an orthogonal basis one does not need to solve a linear system; the coordinates are determined by formula (5.2.1).

This formula is especially simple for orthonormal bases, when $\|\mathbf{v}_k\| = 1$.

Namely, if $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is an orthonormal basis, any vector $\mathbf{v}$ can be represented as

\[ \mathbf{v} = \sum_{k=1}^n (\mathbf{v},\mathbf{v}_k)\mathbf{v}_k. \tag{5.2.2} \]

This formula is sometimes called (a baby version of) the abstract orthogonal Fourier decomposition. The classical (non-abstract) Fourier decomposition deals with a concrete orthonormal system (sines and cosines or complex exponentials). We call this formula a baby version because the real Fourier decomposition deals with infinite orthonormal systems.
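To make this concrete, here is a minimal numpy sketch (the basis and the vector are made up for the example) of formula (5.2.2) in $\mathbb{R}^2$:

```python
import numpy as np

# An orthonormal basis of R^2: the standard basis rotated by 45 degrees
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([-1.0, 1.0]) / np.sqrt(2)

x = np.array([3.0, 5.0])

# No linear system to solve: the coordinates are alpha_k = (x, v_k)
alphas = [np.dot(x, v) for v in (v1, v2)]

# Abstract Fourier decomposition: x = sum_k (x, v_k) v_k
assert np.allclose(x, alphas[0] * v1 + alphas[1] * v2)
```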

Remark 5.2.8.

The importance of orthonormal bases is that if we fix an orthonormal basis in an inner product space $V$, we can work with coordinates in this basis the same way we work with vectors in $\mathbb{F}^n$. Namely, as was discussed in the very beginning of the book (see Remark 1.2.4 in Chapter 1), if we have a vector space $V$ (over a field $\mathbb{F}$) with a basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$, then we can perform the standard vector operations (vector addition and multiplication by a scalar) by working with the columns of coordinates in the basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ in absolutely the same way we work with vectors in the standard coordinate space $\mathbb{F}^n$.

Exercise 5.2.3 below shows that if we have an orthonormal basis in an inner product space $V$, we can compute the inner product of two vectors in $V$ by taking the columns of their coordinates in this orthonormal basis and computing the standard inner product (in $\mathbb{C}^n$ or $\mathbb{R}^n$) of these columns.

As will be shown below in Section 5.3, any finite-dimensional inner product space has an orthonormal basis. Thus, the standard inner product spaces $\mathbb{C}^n$ (or $\mathbb{R}^n$ in the case of real spaces) are essentially the only examples of finite-dimensional inner product spaces.

This is a very important remark, allowing one to translate any statement about the standard inner product space $\mathbb{F}^n$ to an inner product space with an orthonormal basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$.

Exercises.

5.2.1.

Find all vectors in $\mathbb{R}^4$ orthogonal to the vectors $(1,1,1,1)^T$ and $(1,2,3,4)^T$.

5.2.2.

Let $A$ be a real $m\times n$ matrix. Describe $(\operatorname{Ran} A^T)^\perp$ and $(\operatorname{Ran} A)^\perp$.

5.2.3.

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be an orthonormal basis in $V$.

a) Prove that for any $\mathbf{x} = \sum_{k=1}^n \alpha_k\mathbf{v}_k$, $\mathbf{y} = \sum_{k=1}^n \beta_k\mathbf{v}_k$,

\[ (\mathbf{x},\mathbf{y}) = \sum_{k=1}^n \alpha_k\overline{\beta}_k. \]

b) Deduce from this Parseval's identity:

\[ (\mathbf{x},\mathbf{y}) = \sum_{k=1}^n (\mathbf{x},\mathbf{v}_k)\overline{(\mathbf{y},\mathbf{v}_k)}. \]

c) Assume now that $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is only an orthogonal basis, not an orthonormal one. Can you write down Parseval's identity in this case?

This problem shows that if we have an orthonormal basis, we can use the coordinates in this basis absolutely the same way we use the standard coordinates in $\mathbb{C}^n$ (or $\mathbb{R}^n$).

The problem below shows that we can define an inner product by declaring a basis to be an orthonormal one.

5.2.4.

Let $V$ be a vector space and let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be a basis in $V$. For $\mathbf{x} = \sum_{k=1}^n \alpha_k\mathbf{v}_k$, $\mathbf{y} = \sum_{k=1}^n \beta_k\mathbf{v}_k$ define $\langle\mathbf{x},\mathbf{y}\rangle := \sum_{k=1}^n \alpha_k\overline{\beta}_k$.

Prove that $\langle\mathbf{x},\mathbf{y}\rangle$ defines an inner product on $V$.

5.2.5.

Let $A$ be a real $m\times n$ matrix. Describe the set of all vectors in $\mathbb{F}^m$ orthogonal to $\operatorname{Ran} A$.

5.3. Orthogonal projection and Gram-Schmidt orthogonalization

Recalling the definition of orthogonal projection from classical planar (2-dimensional) geometry, one can introduce the following definition. Let $E$ be a subspace of an inner product space $V$.

Definition 5.3.1.

For a vector $\mathbf{v}$ its orthogonal projection $P_E\mathbf{v}$ onto the subspace $E$ is a vector $\mathbf{w}$ such that

1. $\mathbf{w}\in E$;

2. $\mathbf{v}-\mathbf{w}\perp E$.

We will use the notation $\mathbf{w} = P_E\mathbf{v}$ for the orthogonal projection.

After introducing an object, it is natural to ask:

1. Does the object exist?

2. Is the object unique?

3. How does one find it?

We will show first that the projection is unique. Then we present a method of finding the projection, proving its existence.

The following theorem shows why the orthogonal projection is important and also proves that it is unique.

Theorem 5.3.2.

The orthogonal projection $\mathbf{w} = P_E\mathbf{v}$ minimizes the distance from $\mathbf{v}$ to $E$, i.e. for all $\mathbf{x}\in E$

\[ \|\mathbf{v}-\mathbf{w}\| \le \|\mathbf{v}-\mathbf{x}\|. \]

Moreover, if for some $\mathbf{x}\in E$

\[ \|\mathbf{v}-\mathbf{w}\| = \|\mathbf{v}-\mathbf{x}\|, \]

then $\mathbf{x} = \mathbf{w}$.

Proof.

Let $\mathbf{y} = \mathbf{w}-\mathbf{x}$. Then

\[ \mathbf{v}-\mathbf{x} = \mathbf{v}-\mathbf{w}+\mathbf{w}-\mathbf{x} = \mathbf{v}-\mathbf{w}+\mathbf{y}. \]

Since $\mathbf{v}-\mathbf{w}\perp E$ and $\mathbf{y}\in E$, we have $\mathbf{y}\perp\mathbf{v}-\mathbf{w}$, and so by the Pythagorean identity

\[ \|\mathbf{v}-\mathbf{x}\|^2 = \|\mathbf{v}-\mathbf{w}\|^2 + \|\mathbf{y}\|^2 \ge \|\mathbf{v}-\mathbf{w}\|^2. \]

Note that equality happens only if $\mathbf{y} = \mathbf{0}$, i.e. only if $\mathbf{x} = \mathbf{w}$. ∎

The following proposition shows how to find the orthogonal projection if we know an orthogonal basis in $E$.

Proposition 5.3.3.

Let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$ be an orthogonal basis in $E$. Then the orthogonal projection $P_E\mathbf{v}$ of a vector $\mathbf{v}$ is given by the formula

\[ P_E\mathbf{v} = \sum_{k=1}^r \alpha_k\mathbf{v}_k, \qquad\text{where}\qquad \alpha_k = \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \]

In other words,

\[ P_E\mathbf{v} = \sum_{k=1}^r \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\,\mathbf{v}_k. \tag{5.3.1} \]

Note that the formula for $\alpha_k$ coincides with (5.2.1); i.e. this formula, applied to an orthogonal system (not necessarily a basis), gives us the projection onto its span.

Remark 5.3.4.

It is easy to see now from formula (5.3.1) that the orthogonal projection $P_E$ is a linear transformation.

One can also see the linearity of $P_E$ directly, from the definition and the uniqueness of the orthogonal projection. Indeed, it is easy to check that for any $\mathbf{x}$ and $\mathbf{y}$ the vector $\alpha\mathbf{x}+\beta\mathbf{y} - (\alpha P_E\mathbf{x} + \beta P_E\mathbf{y})$ is orthogonal to any vector in $E$, so by the definition $P_E(\alpha\mathbf{x}+\beta\mathbf{y}) = \alpha P_E\mathbf{x} + \beta P_E\mathbf{y}$.

Remark 5.3.5.

Recalling the definition of the inner product in $\mathbb{C}^n$ and $\mathbb{R}^n$, one can see from the above formula (5.3.1) that the matrix of the orthogonal projection $P_E$ onto $E$ in $\mathbb{C}^n$ ($\mathbb{R}^n$) is given by

\[ P_E = \sum_{k=1}^r \frac{1}{\|\mathbf{v}_k\|^2}\,\mathbf{v}_k\mathbf{v}_k^*, \tag{5.3.2} \]

where the columns $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$ form an orthogonal basis in $E$.
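Here is a minimal numpy sketch of formula (5.3.2), using the orthogonal pair $\mathbf{v}_1 = (1,1,1)^T$, $\mathbf{v}_2 = (-1,0,1)^T$ (the same vectors that appear in the Gram–Schmidt example below):

```python
import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([-1.0, 0.0, 1.0])

# Formula (5.3.2): P_E = sum_k v_k v_k^* / ||v_k||^2 (real case, so * is just T)
P = sum(np.outer(v, v) / np.dot(v, v) for v in (v1, v2))

# P fixes vectors in E and annihilates vectors orthogonal to E
assert np.allclose(P @ v1, v1)
assert np.allclose(P @ np.cross(v1, v2), 0)
```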

Proof of Proposition 5.3.3.

Let

\[ \mathbf{w} := \sum_{k=1}^r \alpha_k\mathbf{v}_k, \qquad\text{where}\qquad \alpha_k = \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \]

We want to show that $\mathbf{v}-\mathbf{w}\perp E$. By Lemma 5.2.3 it is sufficient to show that $\mathbf{v}-\mathbf{w}\perp\mathbf{v}_k$, $k = 1,2,\ldots,r$. Computing the inner product we get, for $k = 1,2,\ldots,r$,

\begin{align*}
(\mathbf{v}-\mathbf{w},\mathbf{v}_k) &= (\mathbf{v},\mathbf{v}_k) - (\mathbf{w},\mathbf{v}_k) = (\mathbf{v},\mathbf{v}_k) - \sum_{j=1}^r \alpha_j(\mathbf{v}_j,\mathbf{v}_k) \\
&= (\mathbf{v},\mathbf{v}_k) - \alpha_k(\mathbf{v}_k,\mathbf{v}_k) = (\mathbf{v},\mathbf{v}_k) - \frac{(\mathbf{v},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\|\mathbf{v}_k\|^2 = 0. \qquad\text{∎}
\end{align*}

So, if we know an orthogonal basis in $E$, we can find the orthogonal projection onto $E$. In particular, since any system consisting of one vector is an orthogonal system, we know how to perform orthogonal projection onto one-dimensional subspaces.

But how do we find an orthogonal projection if we are only given a basis in $E$? Fortunately, there exists a simple algorithm allowing one to get an orthogonal basis from a basis.

5.3.1. Gram-Schmidt orthogonalization algorithm

Suppose we have a linearly independent system $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$. The Gram–Schmidt method constructs from this system an orthogonal system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ such that

\[ \operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n\} = \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n\}. \]

Moreover, for all $r\le n$ we get

\[ \operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_r\} = \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r\}. \]

Now let us describe the algorithm.

Step 1. Put $\mathbf{v}_1 := \mathbf{x}_1$ and denote $E_1 := \operatorname{span}\{\mathbf{x}_1\} = \operatorname{span}\{\mathbf{v}_1\}$.

Step 2. Define $\mathbf{v}_2$ by

\[ \mathbf{v}_2 = \mathbf{x}_2 - P_{E_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{(\mathbf{x}_2,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1. \]

Define $E_2 := \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2\}$. Note that $\operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2\} = E_2$.

Step 3. Define $\mathbf{v}_3$ by

\[ \mathbf{v}_3 := \mathbf{x}_3 - P_{E_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{(\mathbf{x}_3,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{(\mathbf{x}_3,\mathbf{v}_2)}{\|\mathbf{v}_2\|^2}\mathbf{v}_2. \]

Put $E_3 := \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}$. Note that $\operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3\} = E_3$. Note also that $\mathbf{x}_3\notin E_2$ (by linear independence), so $\mathbf{v}_3\ne\mathbf{0}$.

Step $r+1$. Suppose that we have already made $r$ steps of the process, constructing an orthogonal system (consisting of non-zero vectors) $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r$ such that $E_r := \operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_r\} = \operatorname{span}\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_r\}$. Define

\[ \mathbf{v}_{r+1} := \mathbf{x}_{r+1} - P_{E_r}\mathbf{x}_{r+1} = \mathbf{x}_{r+1} - \sum_{k=1}^r \frac{(\mathbf{x}_{r+1},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\mathbf{v}_k. \]

Note that $\mathbf{x}_{r+1}\notin E_r$, so $\mathbf{v}_{r+1}\ne\mathbf{0}$.

Continuing this algorithm we get an orthogonal system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$.
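The algorithm translates directly into code. Here is a minimal Python sketch for the real case (classical Gram–Schmidt; it omits the re-orthogonalization used in numerically robust variants):

```python
import numpy as np

def gram_schmidt(xs):
    """Orthogonalize a linearly independent list of vectors in R^n."""
    vs = []
    for x in xs:
        v = x.astype(float)
        # v_{r+1} = x_{r+1} - sum_k (x_{r+1}, v_k) / ||v_k||^2 * v_k
        for w in vs:
            v = v - (np.dot(x, w) / np.dot(w, w)) * w
        vs.append(v)
    return vs

# The worked example from the next subsection:
vs = gram_schmidt([np.array([1, 1, 1]),
                   np.array([0, 1, 2]),
                   np.array([1, 0, 2])])
print(vs)  # [1. 1. 1.], [-1. 0. 1.], [0.5 -1. 0.5]
```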

5.3.2. An example.

Suppose we are given the vectors

\[ \mathbf{x}_1 = (1,1,1)^T, \qquad \mathbf{x}_2 = (0,1,2)^T, \qquad \mathbf{x}_3 = (1,0,2)^T, \]

and we want to orthogonalize this system by Gram–Schmidt. On the first step define

\[ \mathbf{v}_1 = \mathbf{x}_1 = (1,1,1)^T. \]

On the second step we get

\[ \mathbf{v}_2 = \mathbf{x}_2 - P_{E_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{(\mathbf{x}_2,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1. \]

Computing

\[ (\mathbf{x}_2,\mathbf{v}_1) = \Bigl(\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\Bigr) = 3, \qquad \|\mathbf{v}_1\|^2 = 3, \]

we get

\[ \mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} - \frac{3}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}. \]

Finally, define

\[ \mathbf{v}_3 = \mathbf{x}_3 - P_{E_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{(\mathbf{x}_3,\mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{(\mathbf{x}_3,\mathbf{v}_2)}{\|\mathbf{v}_2\|^2}\mathbf{v}_2. \]

Computing

\[ \Bigl(\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\Bigr) = 3, \qquad \Bigl(\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\Bigr) = 1, \qquad \|\mathbf{v}_1\|^2 = 3, \quad \|\mathbf{v}_2\|^2 = 2 \]

($\|\mathbf{v}_1\|^2$ was already computed before) we get

\[ \mathbf{v}_3 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{3}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1 \\ 1/2 \end{pmatrix}. \]

Remark.

Since multiplication by a scalar does not change orthogonality, one can multiply the vectors $\mathbf{v}_k$ obtained by Gram–Schmidt by any non-zero numbers.

In particular, in many theoretical constructions one normalizes the vectors $\mathbf{v}_k$ by dividing them by their respective norms $\|\mathbf{v}_k\|$. Then the resulting system is orthonormal, and the formulas look simpler.

On the other hand, when performing computations by hand one may want to avoid fractional entries by multiplying a vector by the least common denominator of its entries. Thus one may want to replace the vector $\mathbf{v}_3$ from the above example by $(1,-2,1)^T$.

5.3.3. Orthogonal complement. Decomposition $V = E\oplus E^\perp$.

Definition.

For a subspace $E$, its orthogonal complement $E^\perp$ is the set of all vectors orthogonal to $E$,

\[ E^\perp := \{\mathbf{x} : \mathbf{x}\perp E\}. \]

If $\mathbf{x},\mathbf{y}\perp E$, then any linear combination satisfies $\alpha\mathbf{x}+\beta\mathbf{y}\perp E$ (can you see why?). Therefore $E^\perp$ is a subspace.

By the definition of orthogonal projection, any vector $\mathbf{v}$ in an inner product space $V$ admits a unique representation

\[ \mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2, \qquad \mathbf{v}_1\in E,\ \mathbf{v}_2\perp E \text{ (equivalently, } \mathbf{v}_2\in E^\perp\text{)} \]

(where clearly $\mathbf{v}_1 = P_E\mathbf{v}$).

This statement is often written symbolically as $V = E\oplus E^\perp$, which means exactly that any vector admits the unique decomposition above.

The following proposition gives an important property of the orthogonal complement.

Proposition 5.3.6.

For a subspace $E$,

\[ (E^\perp)^\perp = E. \]

The proof is left as an exercise, see Problem 5.3.12 below.

Exercises.

5.3.1.

Apply Gram–Schmidt orthogonalization to the system of vectors $(1,2,-2)^T$, $(1,-1,4)^T$, $(2,1,1)^T$.

5.3.2.

Apply Gram–Schmidt orthogonalization to the system of vectors $(1,2,3)^T$, $(1,3,1)^T$. Write the matrix of the orthogonal projection onto the 2-dimensional subspace spanned by these vectors.

5.3.3.

Complete the orthogonal system obtained in the previous problem to an orthogonal basis in $\mathbb{R}^3$, i.e. add some vectors to the system (how many?) to get an orthogonal basis.

Can you describe how to complete an orthogonal system to an orthogonal basis in the general situation of $\mathbb{R}^n$ or $\mathbb{C}^n$?

5.3.4.

Find the distance from the vector $(2,3,1)^T$ to the subspace spanned by the vectors $(1,2,3)^T$, $(1,3,1)^T$. Note that I am only asking to find the distance to the subspace, not the orthogonal projection.

5.3.5.

Find the orthogonal projection of the vector $(1,1,1,1)^T$ onto the subspace spanned by the vectors $\mathbf{v}_1 = (1,3,1,1)^T$ and $\mathbf{v}_2 = (2,-1,1,0)^T$ (note that $\mathbf{v}_1\perp\mathbf{v}_2$).

5.3.6.

Find the distance from the vector $(1,2,3,4)^T$ to the subspace spanned by the vectors $\mathbf{v}_1 = (1,-1,1,0)^T$ and $\mathbf{v}_2 = (1,2,1,1)^T$ (note that $\mathbf{v}_1\perp\mathbf{v}_2$). Can you find the distance without actually computing the projection? That would simplify the calculations.

5.3.7.

True or false: if $E$ is a subspace of $V$, then $\dim E + \dim(E^\perp) = \dim V$? Justify.

5.3.8.

Let $P$ be the orthogonal projection onto a subspace $E$ of an inner product space $V$, $\dim V = n$, $\dim E = r$. Find the eigenvalues and the eigenvectors (eigenspaces) of $P$. Find the algebraic and geometric multiplicities of each eigenvalue.

5.3.9.

(Using eigenvalues to compute determinants).

a) Find the matrix of the orthogonal projection onto the one-dimensional subspace of $\mathbb{R}^n$ spanned by the vector $(1,1,\ldots,1)^T$;

b) Let $A$ be the $n\times n$ matrix with all entries equal to $1$. Compute its eigenvalues and their multiplicities (use the previous part);

c) Compute the eigenvalues (and multiplicities) of the matrix $A - I$, i.e. of the matrix with zeroes on the main diagonal and ones everywhere else;

d) Compute $\det(A - I)$.

5.3.10. Legendre's polynomials.

Let an inner product on the space of polynomials be defined by $(f,g) = \int_{-1}^1 f(t)\overline{g(t)}\,dt$. Apply Gram–Schmidt orthogonalization to the system $1, t, t^2, t^3$.

Legendre's polynomials are a particular case of the so-called orthogonal polynomials, which play an important role in many branches of mathematics.

5.3.11.

Let $P = P_E$ be the matrix of the orthogonal projection onto a subspace $E$. Show that

a) The matrix $P$ is self-adjoint, meaning that $P^* = P$;

b) $P^2 = P$.

Remark: The above two properties completely characterize orthogonal projections, i.e. any matrix $P$ satisfying these properties is the matrix of some orthogonal projection. We will discuss this later.

5.3.12.

Show that for a subspace $E$ we have $(E^\perp)^\perp = E$. Hint: It is easy to see that $E$ is orthogonal to $E^\perp$ (why?). To show that any vector $\mathbf{x}$ orthogonal to $E^\perp$ belongs to $E$, use the decomposition $V = E\oplus E^\perp$ from Section 5.3.3 above.

5.3.13.

Suppose $P$ is the orthogonal projection onto a subspace $E$, and $Q$ is the orthogonal projection onto the orthogonal complement $E^\perp$.

a) What are $P+Q$ and $PQ$?

b) Show that $P-Q$ is its own inverse.

5.4. Least square solution. Formula for the orthogonal projection

As was discussed before in Chapter 2, the equation

\[ A\mathbf{x} = \mathbf{b} \]

has a solution if and only if $\mathbf{b}\in\operatorname{Ran} A$. But what do we do to solve an equation that does not have a solution?

This seems to be a silly question, because if there is no solution, then there is no solution. But situations when we want to solve an equation that does not have a solution can appear naturally, for example, if we obtained the equation from an experiment. If there are no errors, the right side $\mathbf{b}$ belongs to the column space $\operatorname{Ran} A$, and the equation is consistent. But in real life it is impossible to avoid errors in measurements, so it is possible that an equation that in theory should be consistent does not have a solution. So, what can one do in this situation?

5.4.1. Least square solution

The simplest idea is to write down the error

\[ \|A\mathbf{x}-\mathbf{b}\| \]

and try to find $\mathbf{x}$ minimizing it. If we can find $\mathbf{x}$ such that the error is $0$, the system is consistent and we have an exact solution. Otherwise, we get a so-called least square solution.

The term least square arises from the fact that minimizing $\|A\mathbf{x}-\mathbf{b}\|$ is equivalent to minimizing

\[ \|A\mathbf{x}-\mathbf{b}\|^2 = \sum_{k=1}^m |(A\mathbf{x})_k - b_k|^2 = \sum_{k=1}^m \Bigl|\sum_{j=1}^n A_{k,j}x_j - b_k\Bigr|^2, \]

i.e. to minimizing the sum of squares of linear functions.

There are several ways to find the least square solution. If we are in $\mathbb{R}^n$ and everything is real, we can forget about the absolute values. Then we can just take the partial derivatives with respect to each $x_j$ and find the points where all of them are $0$, which gives us the minimum.

Geometric approach.

However, there is a simpler way of finding the minimum. Namely, if we take all possible vectors $\mathbf{x}$, then $A\mathbf{x}$ runs through all possible vectors in $\operatorname{Ran} A$, so the minimum of $\|A\mathbf{x}-\mathbf{b}\|$ is exactly the distance from $\mathbf{b}$ to $\operatorname{Ran} A$. Therefore the value of $\|A\mathbf{x}-\mathbf{b}\|$ is minimal if and only if $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$, where $P_{\operatorname{Ran} A}$ stands for the orthogonal projection onto the column space $\operatorname{Ran} A$.

So, to find the least square solution we simply need to solve the equation

\[ A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}. \]

If we know an orthogonal basis $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ in $\operatorname{Ran} A$, we can find the vector $P_{\operatorname{Ran} A}\mathbf{b}$ by the formula

\[ P_{\operatorname{Ran} A}\mathbf{b} = \sum_{k=1}^n \frac{(\mathbf{b},\mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\mathbf{v}_k. \]

If we only know a basis in $\operatorname{Ran} A$, we need to use Gram–Schmidt orthogonalization to obtain an orthogonal basis from it.

So, theoretically, the problem is solved, but the solution is not very simple: it involves Gram–Schmidt orthogonalization, which can be computationally intensive. Fortunately, there exists a simpler solution.

Normal equation.

Namely, $A\mathbf{x}$ is the orthogonal projection $P_{\operatorname{Ran} A}\mathbf{b}$ if and only if $\mathbf{b}-A\mathbf{x}\perp\operatorname{Ran} A$ (note that $A\mathbf{x}\in\operatorname{Ran} A$ for all $\mathbf{x}$).

If $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n$ are the columns of $A$, then the condition $\mathbf{b}-A\mathbf{x}\perp\operatorname{Ran} A$ can be rewritten as

\[ \mathbf{b}-A\mathbf{x}\perp\mathbf{a}_k \qquad \forall k = 1,2,\ldots,n. \]

That means

\[ 0 = (\mathbf{b}-A\mathbf{x},\mathbf{a}_k) = \mathbf{a}_k^*(\mathbf{b}-A\mathbf{x}) \qquad \forall k = 1,2,\ldots,n. \]

Joining the rows $\mathbf{a}_k^*$ together, we see that these equations are equivalent to

\[ A^*(\mathbf{b}-A\mathbf{x}) = \mathbf{0}, \]

which in turn is equivalent to the so-called normal equation

\[ A^*A\mathbf{x} = A^*\mathbf{b}. \]

A solution of this equation gives us the least square solution of $A\mathbf{x} = \mathbf{b}$.

Note that the least square solution is unique if and only if $A^*A$ is invertible.
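Here is a minimal numpy sketch (real case, with made-up data) of solving the normal equation, checked against numpy's built-in least squares solver:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([1.0, 1.0, 1.0])

# Normal equation A* A x = A* b; for a real matrix, A* = A^T
x = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq minimizes ||Ax - b|| directly; the answers agree
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```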

5.4.2. Formula for the orthogonal projection.

As we already discussed above, if $\mathbf{x}$ is a solution of the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ (i.e. a least square solution of $A\mathbf{x} = \mathbf{b}$), then $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$. So, to find the orthogonal projection of $\mathbf{b}$ onto the column space $\operatorname{Ran} A$, we need to solve the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ and then multiply the solution by $A$.

If the operator $A^*A$ is invertible, the solution of the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ is given by $\mathbf{x} = (A^*A)^{-1}A^*\mathbf{b}$, so the orthogonal projection $P_{\operatorname{Ran} A}\mathbf{b}$ can be computed as

\[ P_{\operatorname{Ran} A}\mathbf{b} = A(A^*A)^{-1}A^*\mathbf{b}. \]

Since this is true for all $\mathbf{b}$,

\[ P_{\operatorname{Ran} A} = A(A^*A)^{-1}A^* \]

is the formula for the matrix of the orthogonal projection onto $\operatorname{Ran} A$.
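A quick numpy sketch of this formula (with an arbitrary full-rank matrix), checking the two properties from Exercise 5.3.11 that characterize orthogonal projections:

```python
import numpy as np

# Columns of A form a basis (not necessarily orthogonal) of Ran A
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# P = A (A* A)^{-1} A*; for a real matrix, A* = A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P, P.T)    # self-adjoint: P* = P
assert np.allclose(P @ P, P)  # idempotent: P^2 = P
```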

The following theorem implies that for an $m\times n$ matrix $A$, the matrix $A^*A$ is invertible if and only if $\operatorname{rank} A = n$.

Theorem 5.4.1.

For an $m\times n$ matrix $A$,

\[ \operatorname{Ker} A = \operatorname{Ker}(A^*A). \]

Indeed, according to the rank theorem, $\operatorname{Ker} A = \{\mathbf{0}\}$ if and only if $\operatorname{rank} A = n$. Therefore, $\operatorname{Ker}(A^*A) = \{\mathbf{0}\}$ if and only if $\operatorname{rank} A = n$. Since the matrix $A^*A$ is square, it is invertible if and only if $\operatorname{rank} A = n$.

We leave the proof of the theorem as an exercise. To prove the equality $\operatorname{Ker} A = \operatorname{Ker}(A^*A)$ one needs to prove the two inclusions $\operatorname{Ker}(A^*A)\subset\operatorname{Ker} A$ and $\operatorname{Ker} A\subset\operatorname{Ker}(A^*A)$. One of the inclusions is trivial; for the other one use the fact that

\[ \|A\mathbf{x}\|^2 = (A\mathbf{x},A\mathbf{x}) = (A^*A\mathbf{x},\mathbf{x}). \]

5.4.3. An example: line fitting

Let us introduce a few examples where the least square solution appears naturally. Suppose that we know that two quantities $x$ and $y$ are related by the law $y = a + bx$. The coefficients $a$ and $b$ are unknown, and we would like to find them from experimental data.

Suppose we run the experiment $n$ times, and we get $n$ pairs $(x_k,y_k)$, $k = 1,2,\ldots,n$. Ideally, all the points $(x_k,y_k)$ should be on a straight line, but because of errors in measurements this usually does not happen: the points are usually close to some line, but not exactly on it. That is where the least square solution helps!

Ideally, the coefficients $a$ and $b$ should satisfy the equations

\[ a + bx_k = y_k, \qquad k = 1,2,\ldots,n \]

(note that here $x_k$ and $y_k$ are some fixed numbers, and the unknowns are $a$ and $b$). If it is possible to find such $a$ and $b$, we are lucky. If not, the standard thing to do is to minimize the total quadratic error

\[ \sum_{k=1}^n |a + bx_k - y_k|^2. \]

But minimizing this error is exactly finding the least square solution of the system

\[ \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \]

(recall that $x_k$, $y_k$ are given numbers, and the unknowns are $a$ and $b$).

An example.

Suppose our data (xk,yk)(x_{k},y_{k}) consist of pairs

(2,4),(1,2),(0,1),(2,1),(3,1).(-2,4),\ (-1,2),\ (0,1),\ (2,1),\ (3,1).

Then we need to find the least square solution of

(1211101213)[ab]=(42111)\left(\begin{array}[]{cc}1&-2\\ 1&-1\\ 1&0\\ 1&2\\ 1&3\end{array}\right)\left[\begin{array}[]{c}a\\ b\end{array}\right]=\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\end{array}\right)

Then

AA=(1111121023)(1211101213)=(52218)A^{*}A=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\end{array}\right)\left(\begin{array}[]{cc}1&-2\\ 1&-1\\ 1&0\\ 1&2\\ 1&3\end{array}\right)=\left(\begin{array}[]{cc}5&2\\ 2&18\end{array}\right)

and

A𝐛=(1111121023)(42111)=(95)A^{*}\mathbf{b}=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\end{array}\right)\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\\ \end{array}\right)=\left(\begin{array}[]{c}9\\ -5\end{array}\right)

so the normal equation AA𝐱=A𝐛A^{*}A\mathbf{x}=A^{*}\mathbf{b} is rewritten as

(52218)(ab)=(95).\left(\begin{array}[]{cc}5&2\\ 2&18\end{array}\right)\left(\begin{array}[]{c}a\\ b\end{array}\right)=\left(\begin{array}[]{c}9\\ -5\end{array}\right).

The solution of this equation is

a=2,b=1/2,a=2,b=-1/2,

so the best fitting straight line is

y=2-\tfrac{1}{2}x.
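For readers who want to reproduce this computation, here is a short NumPy sketch (an added illustration, not part of the text): it assembles the same 5×2 system, solves the normal equation, and cross-checks the answer against NumPy's built-in least square solver.

    import numpy as np

    x = np.array([-2., -1., 0., 2., 3.])
    y = np.array([ 4.,  2., 1., 1., 1.])

    A = np.column_stack([np.ones_like(x), x])      # columns: 1 and x_k
    # normal equation  (A^T A) [a, b]^T = A^T y
    coef = np.linalg.solve(A.T @ A, A.T @ y)
    print(coef)                                    # [ 2.  -0.5]

    # cross-check against the built-in least square solver
    coef2 = np.linalg.lstsq(A, y, rcond=None)[0]
    print(np.allclose(coef, coef2))                # True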

5.4.4. Other examples: curves and planes.

The least square method is not limited to line fitting. It can also be applied to more general curves, as well as to surfaces in higher dimensions.

The only constraint here is that the parameters we want to find enter the equations linearly. The general algorithm is as follows:

  1. Find the equations that your data should satisfy if there is an exact fit;

  2. Write these equations as a linear system whose unknowns are the parameters you want to find. Note, that the system need not be consistent (and usually is not);

  3. Find the least square solution of the system.

An example: curve fitting

For example, suppose we know that the relation between xx and yy is given by the quadratic law y=a+bx+cx2y=a+bx+cx^{2}, so we want to fit a parabola y=a+bx+cx2y=a+bx+cx^{2} to the data. Then our unknowns aa, bb, cc should satisfy the equations

a+bxk+cxk2=yk,k=1,2,,na+bx_{k}+cx_{k}^{2}=y_{k},\qquad k=1,2,\ldots,n

or, in matrix form

(1x1x121x2x221xnxn2)(abc)=(y1y2yn)\left(\begin{array}[]{ccc}1&x_{1}&x^{2}_{1}\\ 1&x_{2}&x^{2}_{2}\\ \vdots&\vdots&\vdots\\ 1&x_{n}&x^{2}_{n}\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}y_{1}\\ {y}_{2}\\ \vdots\\ {y}_{n}\end{array}\right)

For example, for the data from the previous example we need to find the least square solution of

(124111100124139)(abc)=(42111).\left(\begin{array}[]{ccc}1&-2&4\\ 1&-1&1\\ 1&0&0\\ 1&2&4\\ 1&3&9\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\end{array}\right).

Then

A^{*}A=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\\ 4&1&0&4&9\end{array}\right)\left(\begin{array}[]{ccc}1&-2&4\\ 1&-1&1\\ 1&0&0\\ 1&2&4\\ 1&3&9\end{array}\right)=\left(\begin{array}[]{ccc}5&2&18\\ 2&18&26\\ 18&26&114\end{array}\right)

and

A^{*}\mathbf{b}=\left(\begin{array}[]{ccccc}1&1&1&1&1\\ -2&-1&0&2&3\\ 4&1&0&4&9\end{array}\right)\left(\begin{array}[]{c}4\\ 2\\ 1\\ 1\\ 1\end{array}\right)=\left(\begin{array}[]{c}9\\ -5\\ 31\end{array}\right).

Therefore the normal equation AA𝐱=A𝐛A^{*}A\mathbf{x}=A^{*}\mathbf{b} is

\left(\begin{array}[]{ccc}5&2&18\\ 2&18&26\\ 18&26&114\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}9\\ -5\\ 31\end{array}\right)

which has the unique solution

a=86/77,\qquad b=-62/77,\qquad c=43/154.

Therefore,

y=86/77-62x/77+43x^{2}/154

is the best fitting parabola.
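The same computation in NumPy (again an added sketch, not part of the text): np.vander builds the design matrix with columns 1, x, x², and the normal equation reproduces the fractions above.

    import numpy as np

    x = np.array([-2., -1., 0., 2., 3.])
    y = np.array([ 4.,  2., 1., 1., 1.])

    A = np.vander(x, 3, increasing=True)        # columns: 1, x, x^2
    coef = np.linalg.solve(A.T @ A, A.T @ y)    # normal equation

    print(coef)                                        # approx [ 1.1169 -0.8052  0.2792]
    print(np.allclose(coef, [86/77, -62/77, 43/154]))  # True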

Plane fitting

As another example, let us fit a plane z=a+bx+cyz=a+bx+cy to the data

(x_{k},y_{k},z_{k})\in\mathbb{R}^{3},\qquad k=1,2,\ldots,n.

The equations we should have in the case of exact fit are

a+bxk+cyk=zk,k=1,2,,n,a+bx_{k}+cy_{k}=z_{k},\qquad k=1,2,\ldots,n,

or, in the matrix form

(1x1y11x2y21xnyn)(abc)=(z1z2zn).\left(\begin{array}[]{ccc}1&x_{1}&y_{1}\\ 1&x_{2}&y_{2}\\ \vdots&\vdots&\vdots\\ 1&x_{n}&y_{n}\end{array}\right)\left(\begin{array}[]{c}a\\ b\\ c\end{array}\right)=\left(\begin{array}[]{c}z_{1}\\ {z}_{2}\\ \vdots\\ {z}_{n}\end{array}\right).

So, to find the best fitting plane, we need to find the least square solution of this system (the unknowns are aa, bb, cc).
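A sketch of the plane fit in NumPy (the sample points below are made up purely for illustration):

    import numpy as np

    # made-up data points (x_k, y_k, z_k)
    pts = np.array([[0., 0., 1.1],
                    [1., 0., 2.9],
                    [0., 1., 0.2],
                    [1., 1., 2.1],
                    [2., 1., 4.0]])
    x, y, z = pts.T

    A = np.column_stack([np.ones_like(x), x, y])     # columns: 1, x, y
    a, b, c = np.linalg.lstsq(A, z, rcond=None)[0]   # least square solution
    print(f"z = {a:.3f} + {b:.3f} x + {c:.3f} y")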

Exercises.

5.4.1.

Find the least square solution of the system

(100111)𝐱=(110)\left(\begin{array}[]{cc}1&0\\ 0&1\\ 1&1\end{array}\right)\mathbf{x}=\left(\begin{array}[]{c}1\\ 1\\ 0\end{array}\right)
5.4.2.

Find the matrix of the orthogonal projection PP onto the column space of

(112124).\left(\begin{array}[]{rr}1&1\\ 2&-1\\ -2&4\end{array}\right).

Use two methods: Gram–Schmidt orthogonalization and formula for the projection.

Compare the results.

5.4.3.

Find the best straight line fit (least square solution) to the points (2,4)(-2,4), (1,3)(-1,3), (0,1)(0,1), (2,0)(2,0).

5.4.4.

Fit a plane z=a+bx+cyz=a+bx+cy to four points (1,1,3)(1,1,3), (0,3,6)(0,3,6), (2,1,5)(2,1,5), (0,0,0)(0,0,0).

To do that

  a) Find 4 equations with 3 unknowns a,b,c expressing that the plane passes through all 4 points (this system does not have to have a solution);

  b) Find the least square solution of the system.

5.4.5.

Minimal norm solution. Let the equation Ax = b have a solution, and let A have a non-trivial kernel (so the solution is not unique). Prove that

  a) There exists a unique solution x_0 of Ax = b minimizing the norm ‖x‖, i.e. there exists a unique x_0 such that Ax_0 = b and ‖x_0‖ ≤ ‖x‖ for any x satisfying Ax = b.

  b) x_0 = P_{(Ker A)^⊥} x for any x satisfying Ax = b.

5.4.6.

Minimal norm least square solution. Applying the previous problem to the equation Ax = P_{Ran A} b, show that

  a) There exists a unique least square solution x_0 of Ax = b minimizing the norm ‖x‖.

  b) x_0 = P_{(Ker A)^⊥} x for any least square solution x of Ax = b.

5.5. Adjoint of a linear transformation. Fundamental subspaces revisited.

5.5.1. Adjoint matrices and adjoint operators.

Let us recall that for an m×n matrix A its Hermitian adjoint (or simply adjoint) A* is defined as the complex conjugate of the transpose: the matrix A* is obtained from the transposed matrix Aᵀ by taking the complex conjugate of each entry.

The following identity is the main property of the adjoint matrix:

(A𝐱,𝐲)=(𝐱,A𝐲)𝐱n,𝐲m.\framebox{$(A\mathbf{x},\mathbf{y})=(\mathbf{x},A^{*}\mathbf{y})\qquad\forall% \mathbf{x}\in\mathbb{C}^{n},\ \forall\mathbf{y}\in\mathbb{C}^{m}.$}

Before proving this identity, let us introduce some useful formulas. Let us recall that for transposed matrices we have the identity (AB)T=BTAT(AB)^{T}=B^{T}A^{T}. Since for complex numbers zz and ww we have zw¯=z¯w¯\overline{zw}=\overline{z}\,\overline{w}, the identity

(AB)=BA(AB)^{*}=B^{*}A^{*}

holds for the adjoint.

Also, since (AT)T=A(A^{T})^{T}=A and z¯¯=z\overline{\overline{z}}=z,

(A)=A.(A^{*})^{*}=A.

Now, we are ready to prove the main identity:

(A𝐱,𝐲)=𝐲A𝐱=(A𝐲)𝐱=(𝐱,A𝐲);(A\mathbf{x},\mathbf{y})=\mathbf{y}^{*}A\mathbf{x}=(A^{*}\mathbf{y})^{*}% \mathbf{x}=(\mathbf{x},A^{*}\mathbf{y});

the first and the last equalities here follow from the definition of inner product in 𝔽n\mathbb{F}^{n}, and the middle one follows from the fact that

(A𝐱)=𝐱(A)=𝐱A.(A^{*}\mathbf{x})^{*}=\mathbf{x}^{*}(A^{*})^{*}=\mathbf{x}^{*}A.
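A quick numerical check of the main identity (an added NumPy sketch; recall that in the convention used here (u, v) = v*u, which is exactly what np.vdot(v, u) computes):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 3, 5
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(m) + 1j * rng.standard_normal(m)

    Astar = A.conj().T                  # Hermitian adjoint A*

    lhs = np.vdot(y, A @ x)             # (Ax, y) = y*(Ax)
    rhs = np.vdot(Astar @ y, x)         # (x, A*y) = (A*y)* x
    print(np.allclose(lhs, rhs))        # True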

Uniqueness of the adjoint.

The above main identity (Ax, y) = (x, A*y) is often used as the definition of the adjoint operator. Let us first notice that the adjoint operator is unique: if a matrix B satisfies

(A𝐱,𝐲)=(𝐱,B𝐲)𝐱,𝐲,(A\mathbf{x},\mathbf{y})=(\mathbf{x},B\mathbf{y})\qquad\forall\mathbf{x},% \mathbf{y},

then B=AB=A^{*}. Indeed, by the definition of AA^{*} for a given 𝐲\mathbf{y} we have

(𝐱,A𝐲)=(𝐱,B𝐲)𝐱,(\mathbf{x},A^{*}\mathbf{y})=(\mathbf{x},B\mathbf{y})\qquad\forall\mathbf{x},

and therefore by Corollary 5.1.5 A*y = By. Since this is true for all y, the linear transformations, and therefore the matrices, A* and B coincide.

Adjoint transformation in abstract setting.

The above main identity (Ax, y) = (x, A*y) can be used to define the adjoint operator in the abstract setting, where A: V → W is an operator acting from one inner product space to another. Namely, we define A*: W → V to be the operator satisfying

(A𝐱,𝐲)=(𝐱,A𝐲)𝐱V,𝐲W.(A\mathbf{x},\mathbf{y})=(\mathbf{x},A^{*}\mathbf{y})\qquad\forall\mathbf{x}% \in V,\ \forall\mathbf{y}\in W.

Why does such an operator exist? We can simply construct it: consider orthonormal bases 𝒜 = v_1, v_2, …, v_n in V and ℬ = w_1, w_2, …, w_m in W. If [A]_{ℬ𝒜} is the matrix of A with respect to these bases, we define the operator A* by defining its matrix [A*]_{𝒜ℬ} as

[A]𝒜=([A]𝒜).[A^{*}]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}=([A]_{{}_{\scriptstyle% \mathcal{B}\mathcal{A}}})^{*}.

We leave the proof that this indeed gives the adjoint operator as an exercise for the reader.

Note, that the reasoning in the above Sect. 5.5.1 implies that the adjoint operator is unique.

Useful formulas.

Below we present the properties of the adjoint operators (matrices) we will use a lot. We leave the proofs as an exercise for the reader.

  1. (A+B)* = A* + B*;

  2. (αA)* = ᾱA*;

  3. (AB)* = B*A*;

  4. (A*)* = A;

  5. (y, Ax) = (A*y, x).

5.5.2. Relation between fundamental subspaces.

Theorem 5.5.1.

Let A:VWA:V\to W be an operator acting from one inner product space to another. Then

  1. Ker A* = (Ran A)^⊥;

  2. Ker A = (Ran A*)^⊥;

  3. Ran A = (Ker A*)^⊥;

  4. Ran A* = (Ker A)^⊥.

Remark.

Earlier, in Section 2.7 of Chapter 2, the fundamental subspaces were defined (as is often done in the literature) using Aᵀ instead of A*. Of course, there is no difference for real matrices, so in the real case the above theorem gives the geometric description of the fundamental subspaces defined there.

Geometric interpretation of the fundamental subspaces defined using ATA^{T} is presented in Chapter 8 below, see Section 8.3 there (Theorem 8.3.7). The formulas in this theorem are essentially the same as in Theorem 5.5.1 here, only the interpretation is a bit different.

Proof of Theorem 5.5.1.

First of all, let us notice, that since for a subspace EE we have (E)=E(E^{\perp})^{\perp}=E, the statements 1 and 3 are equivalent. Similarly, for the same reason, the statements 2 and 4 are equivalent as well. Finally, statement 2 is exactly statement 1 applied to the operator AA^{*} (here we use the fact that (A)=A(A^{*})^{*}=A).

So, to prove the theorem we only need to prove statement 1.

We will present 2 proofs of this statement: a “matrix” proof, and an “invariant”, or “coordinate-free” one.

In the “matrix” proof, we assume that A is an m×n matrix, i.e. that A: 𝔽ⁿ → 𝔽ᵐ. The general case can always be reduced to this one by picking orthonormal bases in V and W and considering the matrix of A in these bases.

Let 𝐚1,𝐚2,,𝐚n\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n} be the columns of AA. Note, that 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} if and only if 𝐱𝐚k\mathbf{x}\perp\mathbf{a}_{k} (i.e. (𝐱,𝐚k)=0(\mathbf{x},\mathbf{a}_{k})=0) k=1,2,,n\forall k=1,2,\ldots,n.

By the definition of the inner product in 𝔽n\mathbb{F}^{n}, that means

0=(𝐱,𝐚k)=𝐚k𝐱k=1,2,,n.0=(\mathbf{x},\mathbf{a}_{k})=\mathbf{a}_{k}^{*}\,\mathbf{x}\qquad\forall k=1,% 2,\ldots,n.

Since 𝐚k\mathbf{a}_{k}^{*} is the row number kk of AA^{*}, the above nn equalities are equivalent to the equation

A𝐱=𝟎.A^{*}\mathbf{x}=\mathbf{0}.

So, we proved that 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} if and only if A𝐱=𝟎A^{*}\mathbf{x}=\mathbf{0}, and that is exactly the statement 1.

Now, let us present the “coordinate-free” proof. The inclusion 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} means that 𝐱\mathbf{x} is orthogonal to all vectors of the form A𝐲A\mathbf{y}, i.e. that

(𝐱,A𝐲)=0𝐲.(\mathbf{x},A\mathbf{y})=0\qquad\forall\mathbf{y}.

Since (𝐱,A𝐲)=(A𝐱,𝐲)(\mathbf{x},A\mathbf{y})=(A^{*}\mathbf{x},\mathbf{y}), this identity is equivalent to

(A𝐱,𝐲)=0𝐲,(A^{*}\mathbf{x},\mathbf{y})=0\qquad\forall\mathbf{y},

and by Lemma 5.1.4 this happens if and only if A𝐱=𝟎A^{*}\mathbf{x}=\mathbf{0}. So we proved that 𝐱(RanA)\mathbf{x}\in(\operatorname{Ran}A)^{\perp} if and only if A𝐱=𝟎A^{*}\mathbf{x}=\mathbf{0}, which is exactly the statement 1 of the theorem. ∎
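Statement 1 is also easy to test numerically. The sketch below (an added illustration) uses the singular value decomposition, available in NumPy as np.linalg.svd: for an m×n matrix of rank r, the last m−r left singular vectors form an orthonormal basis of (Ran A)^⊥, and A* indeed annihilates them.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, r = 6, 4, 3
    # a random complex m x n matrix of rank r (product of m x r and r x n factors)
    A = (rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))) @ \
        (rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n)))

    U, s, Vh = np.linalg.svd(A)
    W = U[:, r:]                           # orthonormal basis of (Ran A)^perp

    print(np.allclose(A.conj().T @ W, 0))  # True: (Ran A)^perp lies in Ker A*
    # dim Ker A* = m - rank A* = m - r = dim (Ran A)^perp, so it is an equality
    print(np.linalg.matrix_rank(A.conj().T) == r)   # True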

5.5.3. The “essential” part of a linear transformation

The above theorem makes the structure of the operator AA and the geometry of fundamental subspaces much more transparent. It follows from this theorem that the operator AA can be represented as a composition of orthogonal projection onto RanA\operatorname{Ran}A^{*} and an isomorphism from RanA\operatorname{Ran}A^{*} to RanA\operatorname{Ran}A.

Indeed, let A~:RanARanA\widetilde{A}:\operatorname{Ran}A^{*}\to\operatorname{Ran}A be the restriction of AA to the domain RanA\operatorname{Ran}A^{*} and the target space RanA\operatorname{Ran}A,

A~𝐱=A𝐱,𝐱RanA.\widetilde{A}\mathbf{x}=A\mathbf{x},\qquad\forall\mathbf{x}\in\operatorname{% Ran}A^{*}.

Since KerA=(RanA)\operatorname{Ker}A=(\operatorname{Ran}A^{*})^{\perp}, we have

A𝐱=APRanA𝐱=A~PRanA𝐱𝐱X;A\mathbf{x}=AP_{{}_{\scriptstyle\operatorname{Ran}{A^{*}}}}\mathbf{x}=% \widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}\mathbf{x}\qquad% \forall\mathbf{x}\in X;

the fact that 𝐱PRanA𝐱(RanA)=KerA\mathbf{x}-P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}\mathbf{x}\in(% \operatorname{Ran}A^{*})^{\perp}=\operatorname{Ker}A is used here. Therefore we can write

(5.5.1) A𝐱=A~PRanA𝐱𝐱X,\displaystyle A\mathbf{x}=\widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^% {*}}}\mathbf{x}\qquad\forall\mathbf{x}\in X,

or, equivalently, A=A~PRanAA=\widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}.

Note also that A~:RanARanA\widetilde{A}:\operatorname{Ran}A^{*}\to\operatorname{Ran}A is an invertible transformation. First we notice that KerA~={𝟎}\operatorname{Ker}\widetilde{A}=\{\mathbf{0}\}: if 𝐱RanA\mathbf{x}\in\operatorname{Ran}A^{*} is such that A~𝐱=A𝐱=𝟎\widetilde{A}\mathbf{x}=A\mathbf{x}=\mathbf{0}, then 𝐱KerA=(RanA)\mathbf{x}\in\operatorname{Ker}A=(\operatorname{Ran}A^{*})^{\perp}, so 𝐱RanA(RanA)\mathbf{x}\in\operatorname{Ran}A^{*}\cap(\operatorname{Ran}A^{*})^{\perp}, thus 𝐱=𝟎\mathbf{x}=\mathbf{0}. Then to see that A~\widetilde{A} is invertible, it is sufficient to see that A~\widetilde{A} is onto (surjective). But this immediately follows from (5.5.1):

RanA~=A~RanA=A~PRanAX=AX=RanA.\operatorname{Ran}\widetilde{A}=\widetilde{A}\operatorname{Ran}A^{*}=% \widetilde{A}P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}X=AX=\operatorname{% Ran}A.

The isomorphism A~\widetilde{A} is sometimes called the “essential part” of the operator AA (a non-standard terminology).

The fact that the “essential part” Ã: Ran A* → Ran A of A is an isomorphism implies the following “complex” rank theorem: rank A = rank A*. But, of course, this theorem also follows from the elementary observation that complex conjugation does not change the rank of a matrix, rank A = rank Ā.
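The “complex” rank theorem is a one-liner to verify numerically (an added sketch, with a made-up random matrix):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
    # rank A = rank A* (the rank of the conjugate transpose)
    print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.conj().T))  # True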

Exercises.

5.5.1.

Show that for a square matrix AA the equality det(A)=det(A)¯\det(A^{*})=\overline{\det(A)} holds.

5.5.2.

Find matrices of orthogonal projections onto all 4 fundamental subspaces of the matrix

A=(111132243).A=\left(\begin{array}[]{ccc}1&1&1\\ 1&3&2\\ 2&4&3\\ \end{array}\right)\ .

Note, that you really need to compute only 2 of the projections. If you pick an appropriate 2, the other 2 are easy to obtain from them (recall how the projections onto E and E^⊥ are related).

5.5.3.

Let AA be an m×nm\times n matrix. Show that KerA=Ker(AA)\operatorname{Ker}A=\operatorname{Ker}(A^{*}A).

To do that you need to prove 2 inclusions, Ker(AA)KerA\operatorname{Ker}(A^{*}A)\subset\operatorname{Ker}A and KerAKer(AA)\operatorname{Ker}A\subset\operatorname{Ker}(A^{*}A). One of the inclusions is trivial, for the other one use the fact that

A𝐱2=(A𝐱,A𝐱)=(AA𝐱,𝐱).\|A\mathbf{x}\|^{2}=(A\mathbf{x},A\mathbf{x})=(A^{*}A\mathbf{x},\mathbf{x}).
5.5.4.

Use the equality KerA=Ker(AA)\operatorname{Ker}A=\operatorname{Ker}(A^{*}A) to prove that

  a) rank A = rank(A*A);

  b) If Ax = 0 has only the trivial solution, then A is left invertible. (You can just write a formula for a left inverse.)

5.5.5.

Suppose, that for a matrix AA the matrix AAA^{*}A is invertible, so the orthogonal projection onto RanA\operatorname{Ran}A is given by the formula A(AA)1AA(A^{*}A)^{-1}A^{*}. Can you write formulas for the orthogonal projections onto the other 3 fundamental subspaces (KerA\operatorname{Ker}A, KerA\operatorname{Ker}A^{*}, RanA\operatorname{Ran}A^{*})?

5.5.6.

Let a matrix PP be self-adjoint (P=PP^{*}=P) and let P2=PP^{2}=P. Show that PP is the matrix of an orthogonal projection. Hint: consider the decomposition 𝐱=𝐱1+𝐱2\mathbf{x}=\mathbf{x}_{1}+\mathbf{x}_{2}, 𝐱1RanP\mathbf{x}_{1}\in\operatorname{Ran}P, 𝐱2RanP\mathbf{x}_{2}\perp\operatorname{Ran}P and show that P𝐱1=𝐱1P\mathbf{x}_{1}=\mathbf{x}_{1}, P𝐱2=𝟎P\mathbf{x}_{2}=\mathbf{0}. For one of the equalities you will need self-adjointness, for the other one the property P2=PP^{2}=P.

5.6. Isometries and unitary operators. Unitary and orthogonal matrices.

5.6.1. Main definitions

Definition.

An operator U:XYU:X\to Y is called an isometry, if it preserves the norm,

U𝐱=𝐱𝐱X.\|U\mathbf{x}\|=\|\mathbf{x}\|\qquad\forall\mathbf{x}\in X.

The following theorem shows that an isometry preserves the inner product.

Theorem 5.6.1.

An operator U: X → Y is an isometry if and only if it preserves the inner product, i.e. if and only if

(𝐱,𝐲)=(U𝐱,U𝐲)𝐱,𝐲X.(\mathbf{x},\mathbf{y})=(U\mathbf{x},U\mathbf{y})\qquad\forall\mathbf{x},% \mathbf{y}\in X.
Proof.

The proof uses the polarization identities (Lemma 5.1.9). For example, if XX is a complex space

(U𝐱,U𝐲)\displaystyle(U\mathbf{x},U\mathbf{y}) =14α=±1,±iαU𝐱+αU𝐲2\displaystyle=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|U\mathbf{x}+\alpha U% \mathbf{y}\|^{2}
=14α=±1,±iαU(𝐱+α𝐲)2\displaystyle=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|U(\mathbf{x}+\alpha% \mathbf{y})\|^{2}
=14α=±1,±iα𝐱+α𝐲2=(𝐱,𝐲).\displaystyle=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|\mathbf{x}+\alpha% \mathbf{y}\|^{2}=(\mathbf{x},\mathbf{y}).

Similarly, for a real space XX

(U𝐱,U𝐲)\displaystyle(U\mathbf{x},U\mathbf{y}) =14(U𝐱+U𝐲2U𝐱U𝐲2)\displaystyle=\frac{1}{4}\left(\|U\mathbf{x}+U\mathbf{y}\|^{2}-\|U\mathbf{x}-U% \mathbf{y}\|^{2}\right)
=14(U(𝐱+𝐲)2U(𝐱𝐲)2)\displaystyle=\frac{1}{4}\left(\|U(\mathbf{x}+\mathbf{y})\|^{2}-\|U(\mathbf{x}% -\mathbf{y})\|^{2}\right)
=14(𝐱+𝐲2𝐱𝐲2)=(𝐱,𝐲).\displaystyle=\frac{1}{4}\left(\|\mathbf{x}+\mathbf{y}\|^{2}-\|\mathbf{x}-% \mathbf{y}\|^{2}\right)=(\mathbf{x},\mathbf{y}).

Lemma 5.6.2.

An operator U:XYU:X\to Y is an isometry if and only if UU=IU^{*}U=I.

Proof.

If UU=IU^{*}U=I, then by the definition of adjoint operator

(𝐱,𝐱)=(UU𝐱,𝐱)=(U𝐱,U𝐱)𝐱X.(\mathbf{x},\mathbf{x})=(U^{*}U\mathbf{x},\mathbf{x})=(U\mathbf{x},U\mathbf{x}% )\qquad\forall\mathbf{x}\in X.

Therefore 𝐱=U𝐱\|\mathbf{x}\|=\|U\mathbf{x}\|, and so UU is an isometry.

On the other hand, if UU is an isometry, then by the definition of adjoint operator and by Theorem 5.6.1 we have for all 𝐱X\mathbf{x}\in X

(UU𝐱,𝐲)=(U𝐱,U𝐲)=(𝐱,𝐲)𝐲X,(U^{*}U\mathbf{x},\mathbf{y})=(U\mathbf{x},U\mathbf{y})=(\mathbf{x},\mathbf{y}% )\qquad\forall\mathbf{y}\in X,

and therefore by Corollary 5.1.5 UU𝐱=𝐱U^{*}U\mathbf{x}=\mathbf{x}. Since it is true for all 𝐱X\mathbf{x}\in X, we have UU=IU^{*}U=I. ∎

The above lemma implies that an isometry is always left invertible (UU^{*} being a left inverse).
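Here is an added NumPy sketch illustrating the lemma: a (non-square) matrix with orthonormal columns, produced by the QR factorization, is an isometry, and its adjoint is a left inverse.

    import numpy as np

    rng = np.random.default_rng(4)
    # Q: a 5 x 3 matrix with orthonormal columns -- an isometry C^3 -> C^5
    Q, _ = np.linalg.qr(rng.standard_normal((5, 3)) +
                        1j * rng.standard_normal((5, 3)))

    print(np.allclose(Q.conj().T @ Q, np.eye(3)))                # U*U = I
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # norms preserved
    print(np.allclose(Q.conj().T @ (Q @ x), x))                  # U* is a left inverse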

Definition.

An isometry U:XYU:X\to Y is called a unitary operator if it is invertible.

Proposition 5.6.3.

An isometry U:XYU:X\to Y is a unitary operator if and only if dimX=dimY\dim X=\dim Y.

Proof.

Since UU is an isometry, it is left invertible, and since dimX=dimY\dim X=\dim Y, it is invertible (a left invertible square matrix is invertible).

On the other hand, if U:XYU:X\to Y is invertible, dimX=dimY\dim X=\dim Y (only square matrices are invertible, isomorphic spaces have equal dimensions). ∎

A square matrix UU is called unitary if UU=IU^{*}U=I, i.e. a unitary matrix is a matrix of a unitary operator acting in 𝔽n\mathbb{F}^{n}.

A unitary matrix with real entries is called an orthogonal matrix. An orthogonal matrix can be interpreted as the matrix of a unitary operator acting in the real space ℝⁿ.

A few properties of unitary operators:

  1. For a unitary transformation U, U⁻¹ = U*;

  2. If U is unitary, then U* = U⁻¹ is also unitary;

  3. If U is an isometry and v_1, v_2, …, v_n is an orthonormal basis, then Uv_1, Uv_2, …, Uv_n is an orthonormal system. Moreover, if U is unitary, Uv_1, Uv_2, …, Uv_n is an orthonormal basis.

  4. A product of unitary operators is a unitary operator as well.

5.6.2. Examples

First of all, let us notice, that

a matrix UU is an isometry if and only if its columns form an orthonormal system.

This statement can be checked directly by computing the product UUU^{*}U.

It is easy to check that the columns of the rotation matrix

(cosαsinαsinαcosα)\left(\begin{array}[]{cc}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\end{array}\right)

are orthogonal to each other, and that each column has norm 1. Therefore, the rotation matrix is an isometry, and since it is square, it is unitary. Since all entries of the rotation matrix are real, it is an orthogonal matrix.
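An added one-screen check of this computation:

    import numpy as np

    a = 0.7  # an arbitrary angle
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])

    print(np.allclose(R.T @ R, np.eye(2)))     # columns are orthonormal: isometry
    print(np.isclose(np.linalg.det(R), 1.0))   # real unitary (orthogonal), det = 1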

The next example is more abstract. Let XX and YY be inner product spaces, dimX=dimY=n\dim X=\dim Y=n, and let 𝐱1,𝐱2,,𝐱n\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{n} and 𝐲1,𝐲2,,𝐲n\mathbf{y}_{1},\mathbf{y}_{2},\ldots,\mathbf{y}_{n} be orthonormal bases in XX and YY respectively. Define an operator U:XYU:X\to Y by

U𝐱k=𝐲k,k=1,2,,n.U\mathbf{x}_{k}=\mathbf{y}_{k},\qquad k=1,2,\ldots,n.

Since for a vector 𝐱=c1𝐱1+c2𝐱2++cn𝐱n\mathbf{x}=c_{1}\mathbf{x}_{1}+c_{2}\mathbf{x}_{2}+\ldots+c_{n}\mathbf{x}_{n}

𝐱2=|c1|2+|c2|2++|cn|2\|\mathbf{x}\|^{2}=|c_{1}|^{2}+|c_{2}|^{2}+\ldots+|c_{n}|^{2}

and

U𝐱2=U(k=1nck𝐱k)2=k=1nck𝐲k2=k=1n|ck|2,\|U\mathbf{x}\|^{2}=\|U(\sum_{k=1}^{n}c_{k}\mathbf{x}_{k})\|^{2}=\|\sum_{k=1}^% {n}c_{k}\mathbf{y}_{k}\|^{2}=\sum_{k=1}^{n}|c_{k}|^{2},

one can conclude that U𝐱=𝐱\|U\mathbf{x}\|=\|\mathbf{x}\| for all 𝐱X\mathbf{x}\in X, so UU is a unitary operator.

5.6.3. Properties of unitary operators

Proposition 5.6.4.

Let UU be a unitary matrix. Then

  1. |det U| = 1. In particular, for an orthogonal matrix det U = ±1;

  2. If λ is an eigenvalue of U, then |λ| = 1.

Remark.

Note, that for an orthogonal matrix, an eigenvalue (unlike the determinant) does not have to be real. Our old friend, the rotation matrix, gives an example.

Proof of Proposition 5.6.4.

Let detU=z\det U=z. Since det(U)=det(U)¯\det(U^{*})=\overline{\det(U)}, see Exercise 5.5.1, we have

|z|2=z¯z=det(UU)=detI=1,|z|^{2}=\overline{z}z=\det(U^{*}U)=\det I=1,

so |detU|=|z|=1|\det U|=|z|=1. Statement 1 is proved.

To prove statement 2 let us notice that if U𝐱=λ𝐱U\mathbf{x}=\lambda\mathbf{x} then

U𝐱=λ𝐱=|λ|𝐱,\|U\mathbf{x}\|=\|\lambda\mathbf{x}\|=|\lambda|\cdot\|\mathbf{x}\|,

so |λ|=1|\lambda|=1. ∎

5.6.4. Unitary equivalent operators

Definition.

Operators (matrices) AA and BB are called unitarily equivalent if there exists a unitary operator UU such that A=UBUA=UBU^{*}.

Since for a unitary UU we have U1=UU^{-1}=U^{*}, any two unitary equivalent matrices are similar as well.

The converse is not true: it is easy to construct a pair of similar matrices which are not unitarily equivalent.

The following proposition gives a way to construct a counterexample.

Proposition 5.6.5.

A matrix AA is unitarily equivalent to a diagonal one if and only if it has an orthogonal (orthonormal) basis of eigenvectors.

Proof.

Let A=UBUA=UBU^{*} and let B𝐱=λ𝐱B\mathbf{x}=\lambda\mathbf{x}. Then AU𝐱=UBUU𝐱=UB𝐱=U(λ𝐱)=λU𝐱AU\mathbf{x}=UBU^{*}U\mathbf{x}=UB\mathbf{x}=U(\lambda\mathbf{x})=\lambda U% \mathbf{x}, i.e. U𝐱U\mathbf{x} is an eigenvector of AA.

So, let AA be unitarily equivalent to a diagonal matrix DD, i.e. let A=UDUA=UDU^{*}. The vectors 𝐞k\mathbf{e}_{k} of the standard basis are eigenvectors of DD, so the vectors U𝐞kU\mathbf{e}_{k} are eigenvectors of AA. Since UU is unitary, the system U𝐞1,U𝐞2,,U𝐞nU\mathbf{e}_{1},U\mathbf{e}_{2},\ldots,U\mathbf{e}_{n} is an orthonormal basis.

Now let A have an orthogonal basis u_1, u_2, …, u_n of eigenvectors. Dividing each vector u_k by its norm if necessary, we can always assume that the system u_1, u_2, …, u_n is an orthonormal basis. Let D be the matrix of A in the basis ℬ = u_1, u_2, …, u_n. Clearly, D is a diagonal matrix.

Denote by UU the matrix with columns 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n}. Since the columns form an orthonormal basis, UU is unitary. The standard change of coordinate formula implies

A=[A]𝒮𝒮=[I]𝒮[A][I]𝒮=UDU1A=[A]_{{}_{\scriptstyle\mathcal{S}\mathcal{S}}}=[I]_{{}_{\scriptstyle\mathcal{% S}\mathcal{B}}}[A]_{{}_{\scriptstyle\mathcal{B}\mathcal{B}}}[I]_{{}_{% \scriptstyle\mathcal{B}\mathcal{S}}}=UDU^{-1}

and since UU is unitary, A=UDUA=UDU^{*}. ∎
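For real symmetric (more generally, Hermitian) matrices, NumPy's np.linalg.eigh returns exactly the data in Proposition 5.6.5: an orthonormal basis of eigenvectors assembled into a unitary U and the eigenvalues of the diagonal D. An added sketch with a made-up matrix (the matrices of the exercises below can be treated the same way):

    import numpy as np

    A = np.array([[4., 1.],
                  [1., 4.]])       # symmetric, hence orthogonally diagonalizable

    w, U = np.linalg.eigh(A)       # eigenvalues w, orthonormal eigenvectors in U
    D = np.diag(w)

    print(np.allclose(U.T @ U, np.eye(2)))   # U is orthogonal (real unitary)
    print(np.allclose(A, U @ D @ U.T))       # A = U D U*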

Exercises.

5.6.1.

Orthogonally diagonalize the following matrices,

(1221),(0110),(022202220)\left(\begin{array}[]{cc}1&2\\ 2&1\\ \end{array}\right),\qquad\left(\begin{array}[]{rr}0&-1\\ 1&0\\ \end{array}\right),\qquad\left(\begin{array}[]{lll}0&2&2\\ 2&0&2\\ 2&2&0\\ \end{array}\right)

i.e. for each matrix A find a unitary matrix U and a diagonal matrix D such that A = UDU*.

5.6.2.

True or false: a matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal basis of eigenvectors.

5.6.3.

Prove the polarization identities

(A𝐱,𝐲)=14[(A(𝐱+𝐲),𝐱+𝐲)(A(𝐱𝐲),𝐱𝐲)](real case, A=A),(A\mathbf{x},\mathbf{y})=\frac{1}{4}\bigl{[}(A(\mathbf{x}+\mathbf{y}),\mathbf{% x}+\mathbf{y})-(A(\mathbf{x}-\mathbf{y}),\mathbf{x}-\mathbf{y})\bigr{]}\qquad% \text{(real case, $A=A^{*}$)},

and

(A𝐱,𝐲)=14α=±1,±iα(A(𝐱+α𝐲),𝐱+α𝐲)(complex case, A is arbitrary).(A\mathbf{x},\mathbf{y})=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha(A(\mathbf{% x}+\alpha\mathbf{y}),\mathbf{x}+\alpha\mathbf{y})\qquad\text{(complex case, $A% $ is arbitrary)}.
5.6.4.

Show that a product of unitary (orthogonal) matrices is unitary (orthogonal) as well.

5.6.5.

Let U:XXU:X\to X be a linear transformation on a finite-dimensional inner product space. True or false:

  a) If ‖Ux‖ = ‖x‖ for all x ∈ X, then U is unitary.

  b) If ‖Ue_k‖ = ‖e_k‖, k = 1, 2, …, n, for some orthonormal basis e_1, e_2, …, e_n, then U is unitary.

Justify your answers with a proof or a counterexample.

5.6.6.

Let AA and BB be unitarily equivalent n×nn\times n matrices.

  a) Prove that trace(A*A) = trace(B*B).

  b) Use a) to prove that

\sum_{j,k=1}^{n}|A_{j,k}|^{2}=\sum_{j,k=1}^{n}|B_{j,k}|^{2}.

  c) Use b) to prove that the matrices

\left(\begin{array}[]{ll}1&2\\ 2&i\end{array}\right)\qquad\text{and}\qquad\left(\begin{array}[]{ll}i&4\\ 1&1\end{array}\right)

  are not unitarily equivalent.

5.6.7.

Which of the following pairs of matrices are unitarily equivalent:

  a) \left(\begin{array}[]{ll}1&0\\ 0&1\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{ll}0&1\\ 1&0\end{array}\right).

  b) \left(\begin{array}[]{ll}0&1\\ 1&0\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{ll}0&1/2\\ 1/2&0\end{array}\right).

  c) \left(\begin{array}[]{rrr}0&1&0\\ -1&0&0\\ 0&0&1\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{rrr}2&0&0\\ 0&-1&0\\ 0&0&0\end{array}\right).

  d) \left(\begin{array}[]{rrr}0&1&0\\ -1&0&0\\ 0&0&1\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{rrr}1&0&0\\ 0&-i&0\\ 0&0&i\end{array}\right).

  e) \left(\begin{array}[]{rrr}1&1&0\\ 0&2&2\\ 0&0&3\end{array}\right)\quad\text{and}\quad\left(\begin{array}[]{rrr}1&0&0\\ 0&2&0\\ 0&0&3\end{array}\right).

Hint: It is easy to eliminate matrices that are not unitarily equivalent: remember, that unitarily equivalent matrices are similar, and the trace, determinant and eigenvalues of similar matrices coincide.

Also, the previous problem helps in eliminating matrices that are not unitarily equivalent.

Finally, a matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal basis of eigenvectors.

5.6.8.

Let UU be a 2×22\times 2 orthogonal matrix with detU=1\det U=1. Prove that UU is a rotation matrix.

5.6.9.

Let UU be a 3×33\times 3 orthogonal matrix with detU=1\det U=1. Prove that

  a) 1 is an eigenvalue of U.

  b) If v_1, v_2, v_3 is an orthonormal basis such that Uv_1 = v_1 (remember, 1 is an eigenvalue), then in this basis the matrix of U is

\left(\begin{array}[]{ccc}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{array}\right),

  where α is some angle.

  Hint: Show that since v_1 is an eigenvector of U, all entries below the 1 must be zero, and since v_1 is also an eigenvector of U* (why?), all entries to the right of the 1 must be zero as well. Then show that the lower right 2×2 block is an orthogonal matrix with determinant 1, and use the previous problem.

5.7. Rigid motions in n\mathbb{R}^{n}

A rigid motion in an inner product space V is a transformation f: V → V preserving the distance between points, i.e. such that

f(𝐱)f(𝐲)=𝐱𝐲𝐱,𝐲V.\|f(\mathbf{x})-f(\mathbf{y})\|=\|\mathbf{x}-\mathbf{y}\|\qquad\forall\mathbf{% x},\mathbf{y}\in V.

Note, that in the definition we do not assume that the transformation ff is linear.

Clearly, any unitary transformation is a rigid motion. Another example of a rigid motion is a translation (shift) by 𝐚V\mathbf{a}\in V, f(𝐱)=𝐱+𝐚f(\mathbf{x})=\mathbf{x}+\mathbf{a}.

The main result of this section is the following theorem, stating that any rigid motion in a real inner product space is a composition of an orthogonal transformation and a translation.

Theorem 5.7.1.

Let ff be a rigid motion in a real inner product space XX, and let T(𝐱):=f(𝐱)f(𝟎)T(\mathbf{x}):=f(\mathbf{x})-f(\mathbf{0}). Then TT is an orthogonal transformation.

To prove this theorem we need the following simple lemma.

Lemma 5.7.2.

Let TT be as defined in Theorem 5.7.1. Then for all 𝐱,𝐲X\mathbf{x},\mathbf{y}\in X

  1. ‖T(x)‖ = ‖x‖;

  2. ‖T(x) − T(y)‖ = ‖x − y‖;

  3. (T(x), T(y)) = (x, y).

Proof.

To prove statement 1 notice that

T(𝐱)=f(𝐱)f(𝟎)=𝐱𝟎=𝐱.\|T(\mathbf{x})\|=\|f(\mathbf{x})-f(\mathbf{0})\|=\|\mathbf{x}-\mathbf{0}\|=\|% \mathbf{x}\|.

Statement 2 follows from the following chain of identities:

T(𝐱)T(𝐲)\displaystyle\|T(\mathbf{x})-T(\mathbf{y})\| =(f(𝐱)f(𝟎))(f(𝐲)f(𝟎))\displaystyle=\|(f(\mathbf{x})-f(\mathbf{0}))-(f(\mathbf{y})-f(\mathbf{0}))\|
=f(𝐱)f(𝐲)=𝐱𝐲.\displaystyle=\|f(\mathbf{x})-f(\mathbf{y})\|=\|\mathbf{x}-\mathbf{y}\|.

An alternative explanation would be that TT is a composition of 2 rigid motions (ff followed by the translation by 𝐚=f(𝟎)\mathbf{a}=-f(\mathbf{0})), and one can easily see that a composition of rigid motions is a rigid motion. Since T(𝟎)=𝟎T(\mathbf{0})=\mathbf{0}, and so T(𝐱)=T(𝐱)T(𝟎)\|T(\mathbf{x})\|=\|T(\mathbf{x})-T(\mathbf{0})\|, statement 1 can be treated as a particular case of statement 2.

To prove statement 3, let us notice that in a real inner product space

T(𝐱)T(𝐲)2=T(𝐱)2+T(𝐲)22(T(𝐱),T(𝐲)),\|T(\mathbf{x})-T(\mathbf{y})\|^{2}=\|T(\mathbf{x})\|^{2}+\|T(\mathbf{y})\|^{2% }-2(T(\mathbf{x}),T(\mathbf{y})),

and

𝐱𝐲2=𝐱2+𝐲22(𝐱,𝐲).\|\mathbf{x}-\mathbf{y}\|^{2}=\|\mathbf{x}\|^{2}+\|\mathbf{y}\|^{2}-2(\mathbf{% x},\mathbf{y}).

Recalling that T(𝐱)T(𝐲)=𝐱𝐲\|T(\mathbf{x})-T(\mathbf{y})\|=\|\mathbf{x}-\mathbf{y}\| and T(𝐱)=𝐱\|T(\mathbf{x})\|=\|\mathbf{x}\|, T(𝐲)=𝐲\|T(\mathbf{y})\|=\|\mathbf{y}\|, we immediately get the desired conclusion. ∎

Proof of Theorem 5.7.1.

First of all notice that for all 𝐱X\mathbf{x}\in X

T(𝐱)=f(𝐱)f(𝟎)=𝐱𝟎=𝐱,\|T(\mathbf{x})\|=\|f(\mathbf{x})-f(\mathbf{0})\|=\|\mathbf{x}-\mathbf{0}\|=\|% \mathbf{x}\|,

so TT preserves the norm, T𝐱=𝐱\|T\mathbf{x}\|=\|\mathbf{x}\|.

We would like to say that the identity T𝐱=𝐱\|T\mathbf{x}\|=\|\mathbf{x}\| means TT is an isometry, but to be able to say that we need to prove that TT is a linear transformation.

To do that, let us fix an orthonormal basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} in XX, and let 𝐛k:=T(𝐞k)\mathbf{b}_{k}:=T(\mathbf{e}_{k}), k=1,2,,nk=1,2,\ldots,n. Since TT preserves the inner product (statement 3 of Lemma 5.7.2), we can conclude that 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} is an orthonormal system. In fact, since dimX=n\dim X=n (because basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} consists of nn vectors), we can conclude that 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} is an orthonormal basis.

Let 𝐱=k=1nαk𝐞k\mathbf{x}=\sum_{k=1}^{n}\alpha_{k}\mathbf{e}_{k}. Recall that by the abstract orthogonal Fourier decomposition (5.2.2) we have that αk=(𝐱,𝐞k)\alpha_{k}=(\mathbf{x},\mathbf{e}_{k}). Applying the abstract orthogonal Fourier decomposition (5.2.2) to T(𝐱)T(\mathbf{x}) and the orthonormal basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} we get

T(𝐱)=k=1n(T(𝐱),𝐛k)𝐛k.T(\mathbf{x})=\sum_{k=1}^{n}(T(\mathbf{x}),\mathbf{b}_{k})\mathbf{b}_{k}.

Since

(T(𝐱),𝐛k)=(T(𝐱),T(𝐞k))=(𝐱,𝐞k)=αk,(T(\mathbf{x}),\mathbf{b}_{k})=(T(\mathbf{x}),T(\mathbf{e}_{k}))=(\mathbf{x},% \mathbf{e}_{k})=\alpha_{k},

we get that

T(k=1nαk𝐞k)=k=1nαk𝐛k.T\Bigl{(}\sum_{k=1}^{n}\alpha_{k}\mathbf{e}_{k}\Bigr{)}=\sum_{k=1}^{n}\alpha_{% k}\mathbf{b}_{k}.

This means that T is a linear transformation whose matrix with respect to the bases 𝒮 := e_1, e_2, …, e_n and ℬ := b_1, b_2, …, b_n is the identity matrix, [T]_{ℬ,𝒮} = I.

An alternative way to show that TT is a linear transformation is the following direct calculation

T(𝐱+α𝐲)(T(𝐱)+αT(𝐲))2=(T(𝐱+α𝐲)T(𝐱))αT(𝐲)2\displaystyle\|T(\mathbf{x}+\alpha\mathbf{y})-(T(\mathbf{x})+\alpha T(\mathbf{% y}))\|^{2}=\|(T(\mathbf{x}+\alpha\mathbf{y})-T(\mathbf{x}))-\alpha T(\mathbf{y% })\|^{2}
=T(𝐱+α𝐲)T(𝐱)2+α2T(𝐲)22α(T(𝐱+α𝐲)T(𝐱),T(𝐲))\displaystyle=\|T(\mathbf{x}+\alpha\mathbf{y})-T(\mathbf{x})\|^{2}+\alpha^{2}% \|T(\mathbf{y})\|^{2}-2\alpha(T(\mathbf{x}+\alpha\mathbf{y})-T(\mathbf{x}),T(% \mathbf{y}))
=𝐱+α𝐲𝐱2+α2𝐲22α(T(𝐱+α𝐲),T(𝐲))+2α(T(𝐱),T(𝐲))\displaystyle=\|\mathbf{x}+\alpha\mathbf{y}-\mathbf{x}\|^{2}+\alpha^{2}\|% \mathbf{y}\|^{2}-2\alpha(T(\mathbf{x}+\alpha\mathbf{y}),T(\mathbf{y}))+2\alpha% (T(\mathbf{x}),T(\mathbf{y}))
=α2𝐲2+α2𝐲22α(𝐱+α𝐲,𝐲)+2α(𝐱,𝐲)\displaystyle=\alpha^{2}\|\mathbf{y}\|^{2}+\alpha^{2}\|\mathbf{y}\|^{2}-2% \alpha(\mathbf{x}+\alpha\mathbf{y},\mathbf{y})+2\alpha(\mathbf{x},\mathbf{y})
=2α2𝐲22α(𝐱,𝐲)2α2(𝐲,𝐲)+2α(𝐱,𝐲)=0\displaystyle=2\alpha^{2}\|\mathbf{y}\|^{2}-2\alpha(\mathbf{x},\mathbf{y})-2% \alpha^{2}(\mathbf{y},\mathbf{y})+2\alpha(\mathbf{x},\mathbf{y})=0

Therefore

T(𝐱+α𝐲)=T(𝐱)+αT(𝐲),T(\mathbf{x}+\alpha\mathbf{y})=T(\mathbf{x})+\alpha T(\mathbf{y}),

which implies that TT is linear (taking 𝐱=𝟎\mathbf{x}=\mathbf{0} or α=1\alpha=1 we get two properties from the definition of a linear transformation).

So, TT is a linear transformation satisfying T𝐱=𝐱\|T\mathbf{x}\|=\|\mathbf{x}\|, i.e. TT is an isometry. Since T:XXT:X\to X, TT is unitary transformation (see Proposition 5.6.3). That completes the proof, since an orthogonal transformation is simply a unitary transformation in a real inner product space. ∎
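The theorem can be illustrated numerically (an added sketch with a made-up rigid motion, a rotation followed by a translation): subtracting f(0) recovers the orthogonal part.

    import numpy as np

    a = 1.2
    Q = np.array([[np.cos(a), -np.sin(a)],      # an orthogonal matrix
                  [np.sin(a),  np.cos(a)]])
    shift = np.array([3., -1.])

    f = lambda x: Q @ x + shift                 # a rigid motion in R^2
    T = lambda x: f(x) - f(np.zeros(2))         # T(x) := f(x) - f(0)

    x, y = np.array([1., 2.]), np.array([-0.5, 4.])
    print(np.isclose(np.linalg.norm(f(x) - f(y)),
                     np.linalg.norm(x - y)))    # f preserves distances
    print(np.allclose(T(x), Q @ x))             # T is the orthogonal part Q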

Exercises.

5.7.1.

Give an example of a rigid motion TT in n\mathbb{C}^{n}, T(𝟎)=𝟎T(\mathbf{0})=\mathbf{0}, which is not a linear transformation.

5.8. Complexification and decomplexification

This section is probably a bit more abstract than the rest of the chapter, and can be skipped at the first reading.

5.8.1. Decomplexification

Decomplexification of a vector space

Any complex vector space can be interpreted as a real vector space: we just need to forget that we can multiply vectors by complex numbers and act as if only multiplication by real numbers were allowed.

For example, the space n\mathbb{C}^{n} is canonically identified with the real space 2n\mathbb{R}^{2n}: each complex coordinate zk=xk+iykz_{k}=x_{k}+iy_{k} gives us 2 real ones xkx_{k} and yky_{k}.

“Canonically” here means that this is a standard, most natural way of identifying n\mathbb{C}^{n} and 2n\mathbb{R}^{2n}. Note, that while the above definition gives us a canonical way to get real coordinates from complex ones, it does not say anything about ordering real coordinates.

In fact, there are two standard ways to order the coordinates xkx_{k}, yky_{k}. One way is to take first the real parts and then the imaginary parts, so the ordering is x1,x2,,xn,y1,y2,,ynx_{1},x_{2},\ldots,x_{n},y_{1},y_{2},\ldots,y_{n}. The other standard alternative is the ordering x1,y1,x2,y2,,xn,ynx_{1},y_{1},x_{2},y_{2},\ldots,x_{n},y_{n}. The material of this section does not depend on the choice of ordering of coordinates, so the reader does not have to worry about picking an ordering.

Decomplexification of an inner product

It turns out that if we are given a complex inner product (in a complex space), we can in a canonical way get a real inner product from it. To see how, let us first consider the above example of ℂⁿ canonically identified with ℝ²ⁿ. Let (x,y)_ℂ denote the standard inner product in ℂⁿ, and let (x,y)_ℝ be the standard inner product in ℝ²ⁿ (note that the standard inner product in ℝⁿ does not depend on the ordering of coordinates). Then (see Exercise 5.8.1 below)

(5.8.1) (𝐱,𝐲)=Re(𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}=\operatorname{Re}(% \mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}

This formula can be used to canonically define a real inner product from the complex one in the general situation. Namely, it is an easy exercise to show that if (x,y)_ℂ is an inner product on a complex inner product space, then (x,y)_ℝ defined by (5.8.1) is a real inner product (on the corresponding real space).
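Formula (5.8.1) can be checked in a couple of lines (an added sketch; here the real parts are listed first and then the imaginary parts, but any ordering gives the same real inner product):

    import numpy as np

    rng = np.random.default_rng(5)
    z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    w = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    ip_c = np.vdot(w, z)                      # (z, w)_C = w* z

    zr = np.concatenate([z.real, z.imag])     # identify C^4 with R^8
    wr = np.concatenate([w.real, w.imag])
    ip_r = zr @ wr                            # standard real inner product

    print(np.isclose(ip_r, ip_c.real))        # True: (x, y)_R = Re (x, y)_C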

Summarizing we can say that

To decomplexify a complex inner product space we simply “forget” that we can multiply by complex numbers, i.e. we only allow multiplication by reals. The canonical real inner product in the decomplexified space is given by formula (5.8.1).

Remark.

Any (complex) linear transformation on ℂⁿ (or, more generally, on a complex vector space) also gives us a real linear transformation: this is simply the fact that if T(αx+βy) = αTx + βTy holds for α, β ∈ ℂ, then it of course holds for α, β ∈ ℝ.

The converse is not true, i.e. a (real) linear transformation on the decomplexification ℝ²ⁿ of ℂⁿ does not always give a (complex) linear transformation of ℂⁿ (the same is true in the abstract setting).

For example, if one considers the case n = 1, then multiplication by a complex number z (the general form of a linear transformation in ℂ¹), treated as a linear transformation in ℝ², has a very specific structure (can you describe it?).

5.8.2. Complexification

We can also do a converse, namely get a complex inner product space from a real one: in fact, you probably already did it before, without paying much attention to it.

Namely, given a real inner product space ℝⁿ we can obtain a complex space ℂⁿ out of it by allowing complex coordinates (with the standard inner product in both cases). The space ℝⁿ in this case will be a real subspace of ℂⁿ consisting of vectors with real coordinates (a real subspace means a set that is closed with respect to sums and multiplication by real numbers).

Abstractly, this construction can be described as follows: given a real vector space X we define its complexification X_ℂ as the collection of all pairs [x_1, x_2], x_1, x_2 ∈ X, with addition and multiplication by a real number α defined coordinate-wise,

[𝐱1,𝐱2]+[𝐲1,𝐲2]=[𝐱1+𝐲1,𝐱2+𝐲2],α[𝐱1,𝐱2]=[α𝐱1,α𝐱2].[\mathbf{x}_{1},\mathbf{x}_{2}]+[\mathbf{y}_{1},\mathbf{y}_{2}]=[\mathbf{x}_{1% }+\mathbf{y}_{1},\mathbf{x}_{2}+\mathbf{y}_{2}],\qquad\alpha[\mathbf{x}_{1},% \mathbf{x}_{2}]=[\alpha\mathbf{x}_{1},\alpha\mathbf{x}_{2}].

If X=nX=\mathbb{R}^{n} then the vector 𝐱1\mathbf{x}_{1} consists of real parts of complex coordinates of n\mathbb{C}^{n} and the vector 𝐱2\mathbf{x}_{2} of the imaginary parts. Thus informally one can write the pair [𝐱1,𝐱2][\mathbf{x}_{1},\mathbf{x}_{2}] as 𝐱1+i𝐱2\mathbf{x}_{1}+i\mathbf{x}_{2}.

To define multiplication by complex numbers we define multiplication by ii as

i[𝐱1,𝐱2]=[𝐱2,𝐱1]i[\mathbf{x}_{1},\mathbf{x}_{2}]=[-\mathbf{x}_{2},\mathbf{x}_{1}]

(writing [x_1, x_2] as x_1 + ix_2 we can see that it must be defined this way) and define multiplication by arbitrary complex numbers using the second distributive property (α + β)v = αv + βv.

If, in addition, XX is an inner product space we can extend the inner product to XX_{{}_{\scriptstyle\mathbb{C}}} by

([𝐱1,𝐱2],[𝐲1,𝐲2])X=(𝐱1,𝐲1)X+(𝐱2,𝐲2)Xi(𝐱1,𝐲2)X+i(𝐱2,𝐲1)X.\bigl{(}[\mathbf{x}_{1},\mathbf{x}_{2}],[\mathbf{y}_{1},\mathbf{y}_{2}]\bigr{)% }_{{}_{\scriptstyle X_{\mathbb{C}}}}=(\mathbf{x}_{1},\mathbf{y}_{1})_{{}_{% \scriptstyle X}}+(\mathbf{x}_{2},\mathbf{y}_{2})_{{}_{\scriptstyle X}}-i(% \mathbf{x}_{1},\mathbf{y}_{2})_{{}_{\scriptstyle X}}+i(\mathbf{x}_{2},\mathbf{% y}_{1})_{{}_{\scriptstyle X}}\,.

The easiest way to see that everything is well defined is to fix a basis (an orthonormal basis in the case of a real inner product space) and see what this construction gives us in coordinates. Then we can see that if we treat the vector x_1 as the vector of real parts of the complex coordinates, and the vector x_2 as the vector of imaginary parts, then this construction is exactly the standard complexification of ℝⁿ (by allowing complex coordinates) described above.

The fact that we can express this construction in a coordinate-free way, without picking a basis and working with coordinates, means that the result does not depend on the choice of a basis.

So, the easiest way to think about complexification is probably as follows:

To construct a complexification of a real vector space XX we can pick a basis (an orthonormal basis if XX is a real inner product space) and then work with coordinates, allowing the complex ones. The resulting space does not depend on the choice of a basis; we can get from one coordinates to the others by the standard change of coordinate formula.

Note, that any linear transformation TT in the real space XX gives rise to a linear transformation TT_{{}_{\scriptstyle\mathbb{C}}} in the complexification XX_{{}_{\scriptstyle\mathbb{C}}}.

The easiest way to see that is to fix a basis in XX (an orthonormal basis if XX is a real inner product space) and to work in a coordinate representation: in this case TT_{{}_{\scriptstyle\mathbb{C}}} has the same matrix as TT. In the abstract representation we can write

T[𝐱1,𝐱2]=[T𝐱1,T𝐱2].T_{{}_{\scriptstyle\mathbb{C}}}[\mathbf{x}_{1},\mathbf{x}_{2}]=[T\mathbf{x}_{1% },T\mathbf{x}_{2}].

On the other hand, not all linear transformations in X_ℂ can be obtained from transformations in X: if we do the complexification in coordinates, only the transformations with real matrices arise this way.

Note, that this is completely opposite to the situation in the case of decomplexification, described in Section 5.8.1.

An attentive reader has probably already noticed that the operations of complexification and decomplexification are not inverses of each other. First, a space and its complexification have the same dimension, while the decomplexification of an n-dimensional space has dimension 2n. Moreover, as we just discussed, the relation between real and complex linear transformations is completely opposite in the two cases.

In the next section we discuss an operation that is, in some sense, inverse to decomplexification.

5.8.3. Introducing complex structure to a real space

The construction described in this section works only for real spaces of even dimension.

An elementary way to introduce a complex structure

Let X be a real inner product space of dimension 2n. We want to invert the decomplexification procedure, i.e. to introduce a complex structure on X: to identify this space with a complex space whose decomplexification (see Section 5.8.1) gives us the original space X. The simplest idea is to fix an orthonormal basis in X and then split the coordinates in this basis into two equal parts.

We then treat one half of the coordinates (say x_1, x_2, …, x_n) as real parts of complex coordinates, and treat the rest as the imaginary parts. Then we have to join the real and imaginary parts together: for example, if we treat x_1, x_2, …, x_n as real parts and x_{n+1}, x_{n+2}, …, x_{2n} as imaginary parts, we can define complex coordinates z_k = x_k + ix_{n+k}.

Of course, the result will generally depend on the choice of the orthonormal basis, on the way we split the coordinates into real and imaginary parts, and on how we join them.

One can also see from the decomplexification construction described in Section 5.8.1 that all complex structures on a real inner product space XX can be obtained in this way.

From elementary to abstract construction of complex structure

The above construction can be described in an abstract, coordinate-free way. Namely, let us split the space XX as X=EEX=E\oplus E^{\perp}, where EE is a subspace, dimE=n\dim E=n (so dimE=n\dim E^{\perp}=n as well), and let U0:EEU_{0}:E\to E^{\perp} be a unitary (more precisely, orthogonal, since our spaces are real) transformation.

Note, that if 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in EE, then the system U0𝐯1,U0𝐯2,,U0𝐯nU_{0}\mathbf{v}_{1},U_{0}\mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n} is an orthonormal basis in EE^{\perp}, so

𝐯1,𝐯2,,𝐯n,U0𝐯1,U0𝐯2,,U0𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n},U_{0}\mathbf{v}_{1},U_{0}% \mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n}

is an orthonormal basis in the whole space XX.

If x1,x2,,x2nx_{1},x_{2},\ldots,x_{2n} are coordinates of a vector 𝐱\mathbf{x} in this basis, and we treat xk+ixn+kx_{k}+ix_{n+k}, k=1,2,,nk=1,2,\ldots,n as complex coordinates of 𝐱\mathbf{x}, then the multiplication by ii is represented by the orthogonal transformation UU which is given in the orthogonal basis of subspaces E,EE,E^{\perp} by the block matrix

U=(𝟎U0U0𝟎).U=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*}\\ U_{0}&\mathbf{0}\end{array}\right).

This means that

i(𝐱1𝐱2)=U(𝐱1𝐱2)=(𝟎U0U0𝟎)(𝐱1𝐱2)i\left(\begin{array}[]{c}\mathbf{x}_{1}\\ \mathbf{x}_{2}\end{array}\right)=U\left(\begin{array}[]{c}\mathbf{x}_{1}\\ \mathbf{x}_{2}\end{array}\right)=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*% }\\ U_{0}&\mathbf{0}\end{array}\right)\left(\begin{array}[]{c}\mathbf{x}_{1}\\ \mathbf{x}_{2}\end{array}\right)

where x_1 ∈ E, x_2 ∈ E^⊥.

Clearly, UU is an orthogonal transformation such that U2=IU^{2}=-I. Therefore, any complex structure on XX is given by an orthogonal transformation UU, satisfying U2=IU^{2}=-I; the transformation UU gives us the multiplication by the imaginary unit ii.

The converse is also true, namely any orthogonal transformation UU satisfying U2=IU^{2}=-I defines a complex structure on a real inner product space XX. Let us explain how.

An abstract construction of complex structure

Let us first consider an abstract explanation. To define a complex structure, we need to define multiplication of vectors by complex numbers (initially we can only multiply by real numbers). In fact, we only need to define multiplication by i; the rest follows by linearity in the original real space. And the multiplication by i is given by the orthogonal transformation U satisfying U² = −I.

Namely, if the multiplication by ii is given by UU, i𝐱=U𝐱i\mathbf{x}=U\mathbf{x}, then the complex multiplication must be defined by

(5.8.2) (α+βi)𝐱:=α𝐱+βU𝐱=(αI+βU)𝐱,α,β,𝐱X.(\alpha+\beta i)\mathbf{x}:=\alpha\mathbf{x}+\beta U\mathbf{x}=(\alpha I+\beta U% )\mathbf{x},\qquad\alpha,\beta\in\mathbb{R},\quad\mathbf{x}\in X.

We will use this formula now as the definition of complex multiplication.

It is not hard to check that for the complex multiplication defined above by (5.8.2) all axioms of a complex vector space are satisfied. One can see that, for example, by using linearity in the real space X and noticing that with respect to the algebraic operations (addition and multiplication) the linear transformations of the form

αI+βU,α,β,\alpha I+\beta U,\qquad\alpha,\beta\in\mathbb{R},

behave exactly the same way as the complex numbers α + βi, i.e. such transformations give us a representation of the field of complex numbers ℂ.

This means, first, that a sum and a product of transformations of the form αI + βU is again a transformation of the same form, and that the coefficients α, β of the result can be obtained by performing the corresponding operation on the complex numbers and taking the real and imaginary parts of the result. Note, that here we need the identity U² = −I, but we do not need the fact that U is an orthogonal transformation.

Thus, we get the structure of a complex vector space. To get a complex inner product space we need to introduce a complex inner product (x,y)_ℂ such that the original real inner product (x,y)_ℝ is its real part.

We really do not have any choice here. Indeed, for any complex inner product

Im(𝐱,𝐲)=Re[i(𝐱,𝐲)]=Re(𝐱,i𝐲)=Re(𝐱,U𝐲);\operatorname{Im}(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}=% \operatorname{Re}\left[-i(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}% \right]=\operatorname{Re}(\mathbf{x},i\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}% }=\operatorname{Re}(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}};

for the last equality we used the definition (5.8.2) of complex multiplication. Therefore, the only way to define the complex inner product (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} such that (𝐱,𝐲)=Re(𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}=\operatorname{Re}(% \mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} is

(5.8.3) (𝐱,𝐲):=(𝐱,𝐲)+i(𝐱,U𝐲).(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}:=(\mathbf{x},\mathbf{y})% _{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle% \mathbb{R}}}.

Let us show that this is indeed a complex inner product. We will need the fact that U=UU^{*}=-U, see Exercise 5.8.4 below (by UU^{*} here we mean the adjoint with respect to the original real inner product).

To show that (𝐲,𝐱)=(𝐱,𝐲)¯(\mathbf{y},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}}=\overline{(\mathbf{x},% \mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}} we use the identity U=UU^{*}=-U and symmetry of the real inner product:

(𝐲,𝐱)\displaystyle(\mathbf{y},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}} =(𝐲,𝐱)+i(𝐲,U𝐱)\displaystyle=(\mathbf{y},\mathbf{x})_{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{% y},U\mathbf{x})_{{}_{\scriptstyle\mathbb{R}}}
=(𝐱,𝐲)+i(U𝐱,𝐲)\displaystyle=(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(U\mathbf% {x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}
=(𝐱,𝐲)i(𝐱,U𝐲)\displaystyle=(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}-i(\mathbf{% x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}
=(𝐱,𝐲)+i(𝐱,U𝐲)¯\displaystyle=\overline{(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+% i(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}}
=(𝐱,𝐲)¯.\displaystyle=\overline{(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}}.

To prove the linearity of the complex inner product, let us first notice that (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} is real linear in the first (in fact in each) argument, i.e. that (α𝐱+β𝐲,𝐳)=α(𝐱,𝐳)+β(𝐲,𝐳)(\alpha\mathbf{x}+\beta\mathbf{y},\mathbf{z})_{{}_{\scriptstyle\mathbb{C}}}=% \alpha(\mathbf{x},\mathbf{z})_{{}_{\scriptstyle\mathbb{C}}}+\beta(\mathbf{y},% \mathbf{z})_{{}_{\scriptstyle\mathbb{C}}} for α,β\alpha,\beta\in\mathbb{R}; this is true because each summand in the right side of (5.8.3) is real linear in the first argument.

Using real linearity of (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} and the identity U=UU^{*}=-U (which implies that (U𝐱,𝐲)=(𝐱,U𝐲)(U\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}=-(\mathbf{x},U\mathbf{y% })_{{}_{\scriptstyle\mathbb{R}}}) together with the orthogonality of UU, we get the following chain of equalities

((αI+βU)𝐱,𝐲)\displaystyle((\alpha I+\beta U)\mathbf{x},\mathbf{y})_{{}_{\scriptstyle% \mathbb{C}}} =α(𝐱,𝐲)+β(U𝐱,𝐲)\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta(U\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}
=α(𝐱,𝐲)+β[(U𝐱,𝐲)+i(U𝐱,U𝐲)]\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta\left[(U\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(U\mathbf{x% },U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}\right]
=α(𝐱,𝐲)+β[(𝐱,U𝐲)+i(𝐱,𝐲)]\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta\left[-(\mathbf{x},U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{x% },\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}\right]
=α(𝐱,𝐲)+βi[(𝐱,𝐲)+i(𝐱,U𝐲)]\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta i\left[(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}+i(\mathbf{x% },U\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}}\right]
=α(𝐱,𝐲)+βi(𝐱,𝐲)=(α+βi)(𝐱,𝐲),\displaystyle=\alpha(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}+% \beta i(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}}=(\alpha+\beta i)(% \mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}},

which proves complex linearity.

Finally, to prove non-negativity of (𝐱,𝐱)(\mathbf{x},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}} let us notice (see Exercise 5.8.3 below) that (𝐱,U𝐱)=0(\mathbf{x},U\mathbf{x})_{{}_{\scriptstyle\mathbb{R}}}=0, so

(𝐱,𝐱)=(𝐱,𝐱)=𝐱20.(\mathbf{x},\mathbf{x})_{{}_{\scriptstyle\mathbb{C}}}=(\mathbf{x},\mathbf{x})_% {{}_{\scriptstyle\mathbb{R}}}=\|\mathbf{x}\|^{2}\geq 0.
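A minimal concrete instance of this construction (an added sketch): take X = ℝ² and let U be the rotation by π/2, so that U² = −I and U* = −U; with z = x_1 + ix_2 the inner product (5.8.3) reproduces the standard inner product of ℂ¹.

    import numpy as np

    U = np.array([[0., -1.],
                  [1.,  0.]])                  # rotation by pi/2: U^2 = -I, U* = -U
    print(np.allclose(U @ U, -np.eye(2)))      # True
    print(np.allclose(U.T, -U))                # True

    def ip_c(x, y):
        # the complex inner product (5.8.3): (x, y)_R + i (x, Uy)_R
        return x @ y + 1j * (x @ (U @ y))

    x = np.array([1., 2.])                     # plays the role of z = 1 + 2i
    y = np.array([3., -1.])                    # plays the role of w = 3 - i
    print(np.isclose(ip_c(x, x).imag, 0.0))    # (x, x)_C is real, since x is orthogonal to Ux
    print(ip_c(x, y), (1 + 2j) * np.conj(3 - 1j))  # both equal (1+7j)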

The abstract construction via the elementary one

For a reader who is not comfortable with such a “high brow” abstract proof, there is another, more hands-on explanation.

Namely, it can be shown, see Exercise 5.8.5 below, that there exists a subspace EE, dimE=n\dim E=n (recall that dimX=2n\dim X=2n), such that the matrix of UU with respect to the decomposition X=EEX=E\oplus E^{\perp} is given by

U=(𝟎U0U0𝟎),U=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*}\\ U_{0}&\mathbf{0}\end{array}\right),

where U0:EEU_{0}:E\to E^{\perp} is some orthogonal transformation.

Let 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal basis in EE. Then the system U0𝐯1,U0𝐯2,,U0𝐯nU_{0}\mathbf{v}_{1},U_{0}\mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n} is an orthonormal basis in EE^{\perp}, so

𝐯1,𝐯2,,𝐯n,U0𝐯1,U0𝐯2,,U0𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n},U_{0}\mathbf{v}_{1},U_{0}% \mathbf{v}_{2},\ldots,U_{0}\mathbf{v}_{n}

is an orthonormal basis in the whole space X. Considering the coordinates x_1, x_2, …, x_{2n} in this basis and treating x_k + ix_{n+k} as complex coordinates, we get the elementary, “coordinate” way of defining a complex structure already described above. But if we look carefully, we see that multiplication by i is given by the transformation U: this is trivially true for x ∈ E and for y ∈ E^⊥, and so it is true for all real linear combinations αx + βy, i.e. for all vectors in X.

But that means that the abstract introduction of a complex structure and the corresponding elementary approach give us the same result! And since the elementary approach clearly gives us a complex structure, the abstract approach gives us the same complex structure.

Exercises.

5.8.1.

Prove formula (5.8.1). Namely, show that if

𝐱=(z1,z2,,zn)T,𝐲=(w1,w2,,wn)T,\mathbf{x}=(z_{1},z_{2},\ldots,z_{n})^{T},\qquad\mathbf{y}=(w_{1},w_{2},\ldots% ,w_{n})^{T},

zk=xk+iykz_{k}=x_{k}+iy_{k}, wk=uk+ivkw_{k}=u_{k}+iv_{k}, xk,yk,uk,vkx_{k},y_{k},u_{k},v_{k}\in\mathbb{R}, then

Re(k=1nzkw¯k)=k=1nxkuk+k=1nykvk.\operatorname{Re}\Bigl{(}\sum_{k=1}^{n}z_{k}\overline{w}_{k}\Bigr{)}=\sum_{k=1% }^{n}x_{k}u_{k}+\sum_{k=1}^{n}y_{k}v_{k}.
5.8.2.

Show that if (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{C}}} is an inner product in a complex inner product space, then (𝐱,𝐲)(\mathbf{x},\mathbf{y})_{{}_{\scriptstyle\mathbb{R}}} defined by (5.8.1) is a real inner product.

5.8.3.

Let UU be an orthogonal transformation (in a real inner product space XX), satisfying U2=IU^{2}=-I. Prove that for all 𝐱X\mathbf{x}\in X

U𝐱𝐱.U\mathbf{x}\perp\mathbf{x}.
5.8.4.

Show, that if UU is an orthogonal transformation satisfying U2=IU^{2}=-I, then U=UU^{*}=-U.

5.8.5.

Let UU be an orthogonal transformation in a real inner product space, satisfying U2=IU^{2}=-I. Show that in this case dimX=2n\dim X=2n, and that there exists a subspace EXE\subset X, dimE=n\dim E=n, and an orthogonal transformation U0:EEU_{0}:E\to E^{\perp} such that UU in the decomposition X=EEX=E\oplus E^{\perp} is given by the block matrix

U=(𝟎U0U0𝟎).U=\left(\begin{array}[]{cc}\mathbf{0}&-U_{0}^{*}\\ U_{0}&\mathbf{0}\end{array}\right).

This statement can easily be obtained from Theorem 6.5.1 of Chapter 6, if one notes that the only rotations R_α in ℝ² satisfying R_α² = −I are the rotations through α = ±π/2.

However, one can find an elementary proof not using this theorem. For example, the statement is trivial if dim X = 2: in this case we can take for E any one-dimensional subspace, see Exercise 5.8.3.

Then it is not hard to show that such an operator U does not exist in ℝ³, and one can use induction on dim X to complete the proof.