Chapter 6 Structure of operators in inner product spaces.

In this chapter we again assume that all spaces are finite-dimensional. As before, we deal only with complex or real spaces; the theory of inner product spaces does not apply to spaces over general fields. When we do not mention which space we are in, everything works for both complex and real spaces.

To avoid writing essentially the same formulas twice we will use the notation for the complex case: in the real case it gives correct, although sometimes slightly more complicated, formulas.

6.1. Upper triangular (Schur) representation of an operator.

Theorem 6.1.1.

Let A:XXA:X\to X be an operator acting in a complex inner product space. There exists an orthonormal basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} in XX such that the matrix of AA in this basis is upper triangular.

In other words, any n×nn\times n matrix AA can be represented as A=UTUA=UTU^{*}, where UU is a unitary, and TT is an upper triangular matrix.

Proof.

We prove the theorem using the induction in dimX\dim X. If dimX=1\dim X=1 the theorem is trivial, since any 1×11\times 1 matrix is upper triangular.

Suppose we proved that the theorem is true if dimX=n1\dim X=n-1, and we want to prove it for dimX=n\dim X=n.

Let λ1\lambda_{1} be an eigenvalue of AA, and let 𝐮1\mathbf{u}_{1}, 𝐮1=1\|\mathbf{u}_{1}\|=1 be a corresponding eigenvector, A𝐮1=λ1𝐮1A\mathbf{u}_{1}=\lambda_{1}\mathbf{u}_{1}. Denote E=𝐮1E=\mathbf{u}_{1}^{\perp}, and let 𝐯2,,𝐯n\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be some orthonormal basis in EE (clearly, dimE=dimX1=n1\dim E=\dim X-1=n-1), so 𝐮1,𝐯2,,𝐯n\mathbf{u}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in XX. In this basis the matrix of AA has the form

(6.1.1) \left(\begin{array}{c|cccc}\lambda_{1}&&*&\\ \hline 0&&&\\ \vdots&&A_{1}&\\ 0&&&\end{array}\right);

here all entries below λ1\lambda_{1} are zeroes, and * means that we do not care what entries are in the first row right of λ1\lambda_{1}.

We do care enough about the lower right (n-1)\times(n-1) block to give it a name: we denote it by A_{1}.

Note, that A1A_{1} defines a linear transformation in EE, and since dimE=n1\dim E=n-1, the induction hypothesis implies that there exists an orthonormal basis (let us denote it as 𝐮2,,𝐮n\mathbf{u}_{2},\ldots,\mathbf{u}_{n}) in which the matrix of A1A_{1} is upper triangular.

So, the matrix of A in the orthonormal basis \mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} has the form (6.1.1), where the matrix A_{1} is upper triangular. Therefore, the matrix of A in this basis is upper triangular as well. ∎
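
For readers who like to experiment, here is a minimal numerical sketch of Theorem 6.1.1 (written in Python with NumPy and SciPy, which are of course not part of this text): a library routine produces the factorization A=UTU^{*}, and we simply check that T is upper triangular and U is unitary.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Complex Schur form: A = U T U^*, with T upper triangular and U unitary.
T, U = schur(A, output='complex')

assert np.allclose(A, U @ T @ U.conj().T)      # A = U T U^*
assert np.allclose(np.tril(T, -1), 0)          # T is upper triangular
assert np.allclose(U.conj().T @ U, np.eye(4))  # U is unitary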

Remark.

Note, that the subspace E=𝐮1E=\mathbf{u}_{1}^{\perp} introduced in the proof is not invariant under AA, i.e. the inclusion AEEAE\subset E does not necessarily hold. That means that A1A_{1} is not a part of AA, it is some operator constructed from AA.

Note also, that AEEAE\subset E if and only if all entries denoted by * (i.e. all entries in the first row, except λ1\lambda_{1}) are zero.

Remark.

Note, that even if we start from a real matrix AA, the matrices UU and TT can have complex entries. The rotation matrix

(cosαsinαsinαcosα),αkπ,k\left(\begin{array}[]{cc}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\end{array}\right),\qquad\alpha\neq k\pi,k\in\mathbb{Z}

is not unitarily equivalent (not even similar) to a real upper triangular matrix. Indeed, eigenvalues of this matrix are complex, and the eigenvalues of an upper triangular matrix are its diagonal entries.

Remark.

An analogue of Theorem 6.1.1 can be stated and proved for an arbitrary vector space, without requiring it to have an inner product. In this case the theorem claims that any operator has an upper triangular form in some basis. A proof can be modeled after the proof of Theorem 6.1.1. An alternative way is to equip the space V with an inner product by fixing a basis in V and declaring it to be an orthonormal one, see Exercise 5.2.4 in Chapter 5.

Note, that the version for inner product spaces (Theorem 6.1.1) is stronger than the one for the vector spaces, because it says that we always can find an orthonormal basis, not just a basis.

The following theorem is a real-valued version of Theorem 6.1.1.

Theorem 6.1.2.

Let A:XXA:X\to X be an operator acting in a real inner product space. Suppose that all eigenvalues of AA are real (meaning that AA has exactly n=dimXn=\dim X real eigenvalues, counting multiplicities). Then there exists an orthonormal basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} in XX such that the matrix of AA in this basis is upper triangular.

In other words, any real n\times n matrix A with all real eigenvalues can be represented as A=UTU^{*}=UTU^{T}, where U is an orthogonal matrix and T is a real upper triangular matrix.

Remark 6.1.3.

Recall that, as we already discussed in Section 4.1.4 of Chapter 4, by the eigenvalues of an operator A in a real vector space X we always mean the eigenvalues of its complexification A_{{}_{\scriptstyle\mathbb{C}}} acting in the complexification X_{{}_{\scriptstyle\mathbb{C}}} of X.

If X=nX=\mathbb{R}^{n} then its complexification n\mathbb{C}^{n} is obtained from n\mathbb{R}^{n} by allowing complex coordinates. The complexification of an operator AA (the multiplication by a real matrix AA) on n\mathbb{R}^{n} is the multiplication by the same (real) matrix, but now in the space n\mathbb{C}^{n}.

For more details about complexifications see Section 4.1.4 of Chapter 4 and Section 5.8.2 of Chapter 5.

Proof of Theorem 6.1.2.

To prove the theorem we just need to analyze the proof of Theorem 6.1.1. Let us assume (we can always do this without loss of generality) that the operator (matrix) AA acts in n\mathbb{R}^{n}.

Suppose the theorem is true for (n-1)\times(n-1) matrices. As in the proof of Theorem 6.1.1 let \lambda_{1} be a real eigenvalue of A, let \mathbf{u}_{1}\in\mathbb{R}^{n}, \|\mathbf{u}_{1}\|=1, be a corresponding eigenvector, and let \mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal system (in \mathbb{R}^{n}) such that \mathbf{u}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in \mathbb{R}^{n}.

The matrix of AA in this basis has form (6.1.1), where A1A_{1} is some real matrix.

If we can prove that matrix A1A_{1} has only real eigenvalues, then we are done. Indeed, then by the induction hypothesis there exists an orthonormal basis 𝐮2,,𝐮n\mathbf{u}_{2},\ldots,\mathbf{u}_{n} in E=𝐮1E=\mathbf{u}_{1}^{\perp} such that the matrix of A1A_{1} in this basis is upper triangular, so the matrix of AA in the basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} is also upper triangular.

To show that A1A_{1} has only real eigenvalues, let us notice that

det(AλI)=(λ1λ)det(A1λI)\det(A-\lambda I)=(\lambda_{1}-\lambda)\det(A_{1}-\lambda I)

(take the cofactor expansion in the first row, for example), and so any eigenvalue of A1A_{1} is also an eigenvalue of AA. But AA has only real eigenvalues! ∎

Exercises.

6.1.1.

Use the upper triangular representation of an operator to give an alternative proof of the fact that determinant is the product and the trace is the sum of eigenvalues counting multiplicities.

6.2. Spectral theorem for self-adjoint and normal operators.

In this section we deal with matrices (operators) which are unitarily equivalent to diagonal matrices.

Let us recall that an operator is called self-adjoint if A=AA=A^{*}. A matrix of a self-adjoint operator (in some orthonormal basis), i.e. a matrix satisfying A=AA^{*}=A is called a Hermitian matrix.

The terms self-adjoint and Hermitian essentially mean the same. Usually people say self-adjoint when speaking about operators (transformations), and Hermitian when speaking about matrices. We will try to follow this convention, but since we often do not distinguish between operators and their matrices, we will sometimes mix both terms.

Theorem 6.2.1.

Let A=AA=A^{*} be a self-adjoint operator in an inner product space XX (the space can be complex or real). Then all eigenvalues of AA are real, and there exists an orthonormal basis of eigenvectors of AA in XX.

This theorem can be restated in matrix form as follows

Theorem 6.2.2.

Let A=AA=A^{*} be a self-adjoint (and therefore square) matrix. Then AA can be represented as

A=UDU,A=UDU^{*},

where UU is a unitary matrix and DD is a diagonal matrix with real entries.

Moreover, if the matrix AA is real, matrix UU can be chosen to be real (i.e. orthogonal).

Proof.

To prove Theorems 6.2.1 and 6.2.2 for a complex inner product space X, let us first apply Theorem 6.1.1 to find an orthonormal basis in X such that the matrix of A in this basis is upper triangular. Now let us ask ourselves a question: which upper triangular matrices are self-adjoint?

The answer is immediate: an upper triangular matrix is self-adjoint if and only if it is a diagonal matrix with real entries. Theorem 6.2.1 (and so Theorem 6.2.2) is proved.

To treat the case of a real vector space, we first notice that by the already proved complex case all eigenvalues of A are real (recall, see Remark 6.1.3 above, that by the eigenvalues of an operator in a real vector space X we always mean the eigenvalues of the extension of this operator to the complexification X_{{}_{\scriptstyle\mathbb{C}}} of X). Then we just apply Theorem 6.1.2 and notice again that the only self-adjoint upper triangular matrices are the diagonal ones. ∎
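
Numerically, the representation A=UDU^{*} of a Hermitian matrix is exactly what a Hermitian eigensolver returns. A small sketch (Python with NumPy assumed; an illustration, not part of the proof):

import numpy as np

B = np.array([[1., 2.],
              [0., 3.]])
A = B + B.T                    # a real symmetric (hence Hermitian) matrix

# eigh is the eigensolver for Hermitian/symmetric matrices: it returns
# real eigenvalues and an orthonormal basis of eigenvectors.
d, U = np.linalg.eigh(A)

assert np.allclose(A, U @ np.diag(d) @ U.T)  # A = U D U^* (here U is real, so U^* = U^T)
assert np.allclose(U.T @ U, np.eye(2))       # the columns of U are orthonormal
assert np.all(np.isreal(d))                  # the eigenvalues are real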

Remark.

In many textbooks only real matrices are considered, and Theorem 6.2.2 is often called the “Spectral Theorem for symmetric matrices”. However, we should emphasize that the conclusion of Theorem 6.2.2 fails for complex symmetric matrices: the theorem holds for Hermitian matrices, and in particular for real symmetric matrices.

Let us give an independent proof of the fact that the eigenvalues of a self-adjoint operator are real. Let A=A^{*} and A\mathbf{x}=\lambda\mathbf{x}, \mathbf{x}\neq\mathbf{0}. Then

(A\mathbf{x},\mathbf{x})=(\lambda\mathbf{x},\mathbf{x})=\lambda(\mathbf{x},\mathbf{x})=\lambda\|\mathbf{x}\|^{2}.

On the other hand,

(A\mathbf{x},\mathbf{x})=(\mathbf{x},A^{*}\mathbf{x})=(\mathbf{x},A\mathbf{x})=(\mathbf{x},\lambda\mathbf{x})=\overline{\lambda}(\mathbf{x},\mathbf{x})=\overline{\lambda}\|\mathbf{x}\|^{2},

so λ𝐱2=λ¯𝐱2\lambda\|\mathbf{x}\|^{2}=\overline{\lambda}\|\mathbf{x}\|^{2}. Since 𝐱0\|\mathbf{x}\|\neq 0 (𝐱𝟎\mathbf{x}\neq\mathbf{0}), we can conclude λ=λ¯\lambda=\overline{\lambda}, so λ\lambda is real.

It also follows from Theorem 6.2.1 that eigenspaces of a self-adjoint operator are orthogonal. Let us give an alternative proof of this result.

Proposition 6.2.3.

Let A=AA=A^{*} be a self-adjoint operator, and let 𝐮,𝐯\mathbf{u},\mathbf{v} be its eigenvectors, A𝐮=λ𝐮A\mathbf{u}=\lambda\mathbf{u}, A𝐯=μ𝐯A\mathbf{v}=\mu\mathbf{v}. Then, if λμ\lambda\neq\mu, the eigenvectors 𝐮\mathbf{u} and 𝐯\mathbf{v} are orthogonal.

Proof.

This proposition follows from the spectral theorem (Theorem 6.2.1), but here we are giving a direct proof. Namely,

(A\mathbf{u},\mathbf{v})=(\lambda\mathbf{u},\mathbf{v})=\lambda(\mathbf{u},\mathbf{v}).

On the other hand

(A\mathbf{u},\mathbf{v})=(\mathbf{u},A^{*}\mathbf{v})=(\mathbf{u},A\mathbf{v})=(\mathbf{u},\mu\mathbf{v})=\overline{\mu}(\mathbf{u},\mathbf{v})=\mu(\mathbf{u},\mathbf{v})

(the last equality holds because eigenvalues of a self-adjoint operator are real), so λ(𝐮,𝐯)=μ(𝐮,𝐯)\lambda(\mathbf{u},\mathbf{v})=\mu(\mathbf{u},\mathbf{v}). If λμ\lambda\neq\mu it is possible only if (𝐮,𝐯)=0(\mathbf{u},\mathbf{v})=0. ∎

Now let us try to find what matrices are unitarily equivalent to a diagonal one. It is easy to check that for a diagonal matrix DD

DD=DD.D^{*}D=DD^{*}.

Therefore AA=AAA^{*}A=AA^{*} if the matrix of AA in some orthonormal basis is diagonal.

Definition.

An operator (matrix) NN is called normal if NN=NNN^{*}N=NN^{*}.

Clearly, any self-adjoint operator (A=AA=A^{*}) is normal. Also, any unitary operator U:XXU:X\to X is normal since UU=UU=IU^{*}U=UU^{*}=I.

Note, that a normal operator is an operator acting in one space, not from one space to another. So, if UU is a unitary operator acting from one space to another, we cannot say that UU is normal.

Theorem 6.2.4.

Any normal operator N in a complex inner product space has an orthonormal basis of eigenvectors.

In other words, any matrix NN satisfying NN=NNN^{*}N=NN^{*} can be represented as

N=UDU,N=UDU^{*},

where UU is a unitary matrix, and DD is a diagonal one.

Remark.

Note that in the above theorem, even if N is a real matrix, we did not claim that the matrices U and D are real. Moreover, it can easily be shown that if D is real, N must be self-adjoint.
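
The rotation matrix discussed after Theorem 6.1.1 illustrates this remark: it is real and normal (it is unitary), but its unitary diagonalization is genuinely complex. A small numerical sketch (Python with NumPy assumed; not part of the text):

import numpy as np

a = 0.7                                   # any angle that is not a multiple of pi
N = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])   # real, normal, not symmetric

assert np.allclose(N.T @ N, N @ N.T)      # N is normal

# The eigenvalues e^{ia}, e^{-ia} are distinct, so the unit eigenvectors returned
# by eig are automatically orthogonal, i.e. U is unitary (but complex).
d, U = np.linalg.eig(N)

assert np.allclose(N, U @ np.diag(d) @ U.conj().T)  # N = U D U^*
assert np.allclose(U.conj().T @ U, np.eye(2))       # U is unitary
assert not np.allclose(d.imag, 0)                   # D is not real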

Proof of Theorem 6.2.4.

To prove Theorem 6.2.4 we apply Theorem 6.1.1 to get an orthonormal basis, such that the matrix of NN in this basis is upper triangular. To complete the proof of the theorem we only need to show that an upper triangular normal matrix must be diagonal.

We will prove this using induction on the size of the matrix. The case of a 1\times 1 matrix is trivial, since any 1\times 1 matrix is diagonal.

Suppose we have proved that any (n-1)\times(n-1) upper triangular normal matrix is diagonal, and we want to prove it for n\times n matrices. Let N be an n\times n upper triangular normal matrix. We can write it as

N=\left(\begin{array}{c|ccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ \hline 0&&&\\ \vdots&&N_{1}&\\ 0&&&\end{array}\right)

where N1N_{1} is an upper triangular (n1)×(n1)(n-1)\times(n-1) matrix.

Let us compare the upper left entries (first row, first column) of N^{*}N and NN^{*}. A direct computation shows that

(NN)1,1=a¯1,1a1,1=|a1,1|2(N^{*}N)_{1,1}=\overline{a}_{1,1}a_{1,1}=|a_{1,1}|^{2}

and

(NN)1,1=|a1,1|2+|a1,2|2++|a1,n|2.(NN^{*})_{1,1}=|a_{1,1}|^{2}+|a_{1,2}|^{2}+\ldots+|a_{1,n}|^{2}.

So, (NN)1,1=(NN)1,1(N^{*}N)_{1,1}=(NN^{*})_{1,1} if and only if a1,2==a1,n=0a_{1,2}=\ldots=a_{1,n}=0. Therefore, the matrix NN has the form

N=\left(\begin{array}{c|ccc}a_{1,1}&0&\ldots&0\\ \hline 0&&&\\ \vdots&&N_{1}&\\ 0&&&\end{array}\right)

It follows from the above representation that

N^{*}N=\left(\begin{array}{c|ccc}|a_{1,1}|^{2}&0&\ldots&0\\ \hline 0&&&\\ \vdots&&N_{1}^{*}N_{1}&\\ 0&&&\end{array}\right),\qquad NN^{*}=\left(\begin{array}{c|ccc}|a_{1,1}|^{2}&0&\ldots&0\\ \hline 0&&&\\ \vdots&&N_{1}N_{1}^{*}&\\ 0&&&\end{array}\right)

so N1N1=N1N1N_{1}^{*}N_{1}=N_{1}N_{1}^{*}. That means the matrix N1N_{1} is also normal, and by the induction hypothesis it is diagonal. So the matrix NN is also diagonal. ∎

The following proposition gives a very useful characterization of normal operators.

Proposition 6.2.5.

An operator N:XXN:X\to X is normal if and only if

N𝐱=N𝐱𝐱X.\|N\mathbf{x}\|=\|N^{*}\mathbf{x}\|\qquad\forall\mathbf{x}\in X.
Proof.

Let NN be normal, NN=NNN^{*}N=NN^{*}. Then

\|N\mathbf{x}\|^{2}=(N\mathbf{x},N\mathbf{x})=(N^{*}N\mathbf{x},\mathbf{x})=(NN^{*}\mathbf{x},\mathbf{x})=(N^{*}\mathbf{x},N^{*}\mathbf{x})=\|N^{*}\mathbf{x}\|^{2}

so N𝐱=N𝐱\|N\mathbf{x}\|=\|N^{*}\mathbf{x}\|.

Now let

N𝐱=N𝐱𝐱X.\|N\mathbf{x}\|=\|N^{*}\mathbf{x}\|\qquad\forall\mathbf{x}\in X.

The Polarization Identities (Lemma 5.1.9 in Chapter 5) imply that for all 𝐱,𝐲X\mathbf{x},\mathbf{y}\in X

\begin{aligned}
(N^{*}N\mathbf{x},\mathbf{y})=(N\mathbf{x},N\mathbf{y})&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N\mathbf{x}+\alpha N\mathbf{y}\|^{2}\\
&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N(\mathbf{x}+\alpha\mathbf{y})\|^{2}\\
&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N^{*}(\mathbf{x}+\alpha\mathbf{y})\|^{2}\\
&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N^{*}\mathbf{x}+\alpha N^{*}\mathbf{y}\|^{2}\\
&=(N^{*}\mathbf{x},N^{*}\mathbf{y})=(NN^{*}\mathbf{x},\mathbf{y})
\end{aligned}

and therefore (see Corollary 5.1.6 in Chapter 5) NN=NNN^{*}N=NN^{*}. ∎

Exercises.

6.2.1.

True or false:

  1. a)

    Every unitary operator U:XXU:X\to X is normal.

  2. b)

    A matrix is unitary if and only if it is invertible.

  3. c)

    If two matrices are unitarily equivalent, then they are also similar.

  4. d)

    The sum of self-adjoint operators is self-adjoint.

  5. e)

    The adjoint of a unitary operator is unitary.

  6. f)

    The adjoint of a normal operator is normal.

  7. g)

    If all eigenvalues of a linear operator are 11, then the operator must be unitary or orthogonal.

  8. h)

    If all eigenvalues of a normal operator are 11, then the operator is identity.

  9. i)

    A linear operator may preserve norm but not the inner product.

6.2.2.

True or false: The sum of normal operators is normal? Justify your conclusion.

6.2.3.

Show that an operator unitarily equivalent to a diagonal one is normal.

6.2.4.

Orthogonally diagonalize the matrix,

A=(3223).A=\left(\begin{array}[]{cc}3&2\\ 2&3\\ \end{array}\right).

Find all square roots of AA, i.e. find all matrices BB such that B2=AB^{2}=A.

Note, that all square roots of AA are self-adjoint.

6.2.5.

True or false: any self-adjoint matrix has a self-adjoint square root. Justify.

6.2.6.

Orthogonally diagonalize the matrix,

A=(7224),A=\left(\begin{array}[]{cc}7&2\\ 2&4\\ \end{array}\right),

i.e. represent it as A=UDUA=UDU^{*}, where DD is diagonal and UU is unitary.

Among all square roots of AA, i.e. among all matrices BB such that B2=AB^{2}=A, find one that has positive eigenvalues. You can leave BB as a product.

6.2.7.

True or false:

  1. a)

    A product of two self-adjoint matrices is self-adjoint.

  2. b)

    If AA is self-adjoint, then AkA^{k} is self-adjoint.

Justify your conclusions

6.2.8.

Let AA be m×nm\times n matrix. Prove that

  1. a)

    AAA^{*}A is self-adjoint.

  2. b)

    All eigenvalues of AAA^{*}A are non-negative.

  3. c)

    AA+IA^{*}A+I is invertible.

6.2.9.

Give a proof if the statement is true, or give a counterexample if it is false:

  1. a)

    If A=AA=A^{*} then A+iIA+iI is invertible.

  2. b)

    If UU is unitary, U+34IU+\frac{3}{4}I is invertible.

  3. c)

    If a matrix AA is real, AiIA-iI is invertible.

6.2.10.

Orthogonally diagonalize the rotation matrix

Rα=(cosαsinαsinαcosα),R_{\alpha}=\left(\begin{array}[]{cc}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\\ \end{array}\right),

where α\alpha is not an integer multiple of π\pi. Note, that you will get complex eigenvalues in this case.

6.2.11.

Orthogonally diagonalize the matrix

A=(cosαsinαsinαcosα).A=\left(\begin{array}[]{cc}\cos\alpha&\sin\alpha\\ \sin\alpha&-\cos\alpha\\ \end{array}\right).

Hints: You will get real eigenvalues in this case. Also, the trigonometric identities sin2x=2sinxcosx\sin 2x=2\sin x\cos x, sin2x=(1cos2x)/2\sin^{2}x=(1-\cos 2x)/2, cos2x=(1+cos2x)/2\cos^{2}x=(1+\cos 2x)/2 (applied to x=α/2x=\alpha/2) will help to simplify expressions for eigenvectors.

6.2.12.

Can you describe the linear transformation with matrix AA from the previous problem geometrically? It has a very simple geometric interpretation.

6.2.13.

Prove that a normal operator with unimodular eigenvalues (i.e. with all eigenvalues satisfying |λk|=1|\lambda_{k}|=1) is unitary. Hint: Consider diagonalization

6.2.14.

Prove that a normal operator with real eigenvalues is self-adjoint.

6.2.15.

Show by example that conclusion of Theorem 6.2.2 fails for complex symmetric matrices. Namely

  1. a)

    construct a (diagonalizable) 2×22\times 2 complex symmetric matrix not admitting an orthogonal basis of eigenvectors;

  2. b)

    construct a 2×22\times 2 complex symmetric matrix which cannot be diagonalized.

6.3. Polar and singular value decompositions.

6.3.1. Positive definite operators. Square roots

Definition.

A self-adjoint operator A:X\to X is called positive definite if

(A𝐱,𝐱)>0𝐱𝟎,(A\mathbf{x},\mathbf{x})>0\qquad\forall\mathbf{x}\neq\mathbf{0},

and it is called positive semidefinite if

(A𝐱,𝐱)0𝐱X.(A\mathbf{x},\mathbf{x})\geq 0\qquad\forall\mathbf{x}\in X.

We will use the notation A>0A>0 for positive definite operators, and A0A\geq 0 for positive semi-definite.

The following theorem describes positive definite and semidefinite operators.

Theorem 6.3.1.

Let A=AA=A^{*}. Then

  1. 1.

    A>0A>0 if and only if all eigenvalues of AA are positive.

  2. 2.

    A0A\geq 0 if and only if all eigenvalues of AA are non-negative.

Proof.

Pick an orthonormal basis such that matrix of AA in this basis is diagonal (see Theorem 6.2.1). To finish the proof it remains to notice that a diagonal matrix is positive definite (positive semidefinite) if and only if all its diagonal entries are positive (non-negative). ∎

Corollary 6.3.2.

Let A=A0A=A^{*}\geq 0 be a positive semidefinite operator. There exists a unique positive semidefinite operator BB such that B2=AB^{2}=A

Such BB is called (positive) square root of AA and is denoted as A\sqrt{A} or A1/2A^{1/2}.

Proof.

Let us prove that A\sqrt{A} exists. Let 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal basis of eigenvectors of AA, and let λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} be the corresponding eigenvalues. Note, that since A0A\geq 0, all λk0\lambda_{k}\geq 0.

In the basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} the matrix of A is a diagonal matrix \operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\} with the entries \lambda_{1},\lambda_{2},\ldots,\lambda_{n} on the diagonal. Define the matrix of B in the same basis as \operatorname{diag}\{\sqrt{\lambda_{1}},\sqrt{\lambda_{2}},\ldots,\sqrt{\lambda_{n}}\}.

Clearly, B=B0B=B^{*}\geq 0 and B2=AB^{2}=A.

To prove that such BB is unique, let us suppose that there exists an operator C=C0C=C^{*}\geq 0 such that C2=AC^{2}=A. Let 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} be an orthonormal basis of eigenvectors of CC, and let μ1,μ2,,μn\mu_{1},\mu_{2},\ldots,\mu_{n} be the corresponding eigenvalues (note that μk0\mu_{k}\geq 0 k\forall k). The matrix of CC in the basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} is a diagonal one diag{μ1,μ2,,μn}\operatorname{diag}\{\mu_{1},\mu_{2},\ldots,\mu_{n}\}, and therefore the matrix of A=C2A=C^{2} in the same basis is diag{μ12,μ22,,μn2}\operatorname{diag}\{\mu^{2}_{1},\mu^{2}_{2},\ldots,\mu^{2}_{n}\}. This implies that any eigenvalue λ\lambda of AA is of form μk2\mu_{k}^{2}, and, moreover, if A𝐱=λ𝐱A\mathbf{x}=\lambda\mathbf{x}, then C𝐱=λ𝐱C\mathbf{x}=\sqrt{\lambda}\mathbf{x}.

Therefore, in the basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} above, the matrix of C has the diagonal form \operatorname{diag}\{\sqrt{\lambda_{1}},\sqrt{\lambda_{2}},\ldots,\sqrt{\lambda_{n}}\}, i.e. B=C. ∎
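
The construction in the proof is easy to carry out numerically: diagonalize A and take the square roots of the eigenvalues. A minimal sketch (Python with NumPy assumed; not part of the text):

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                   # symmetric, eigenvalues 1 and 3, so A >= 0

lam, V = np.linalg.eigh(A)                 # A = V diag(lam) V^*
B = V @ np.diag(np.sqrt(lam)) @ V.T        # the positive square root sqrt(A)

assert np.allclose(B, B.T)                 # B is self-adjoint
assert np.all(np.linalg.eigvalsh(B) >= 0)  # B >= 0
assert np.allclose(B @ B, A)               # B^2 = A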

6.3.2. Modulus of an operator. Polar decomposition.

Consider an operator A:XYA:X\to Y. Its Hermitian square AAA^{*}A is a positive semidefinite operator acting in XX. Indeed,

(AA)=AA=AA(A^{*}A)^{*}=A^{*}A^{**}=A^{*}A

and

(A^{*}A\mathbf{x},\mathbf{x})=(A\mathbf{x},A\mathbf{x})=\|A\mathbf{x}\|^{2}\geq 0\qquad\forall\mathbf{x}\in X.

Therefore, there exists a (unique) positive-semidefinite square root R=AAR=\sqrt{A^{*}A}. This operator RR is called the modulus of the operator AA, and is often denoted as |A||A|.

The modulus of AA shows how “big” the operator AA is:

Proposition 6.3.3.

For a linear operator A:XYA:X\to Y

|A|𝐱=A𝐱𝐱X.\||A|\mathbf{x}\|=\|A\mathbf{x}\|\qquad\forall\mathbf{x}\in X.
Proof.

For any 𝐱X\mathbf{x}\in X

\begin{aligned}
\||A|\mathbf{x}\|^{2}&=(|A|\mathbf{x},|A|\mathbf{x})=(|A|^{*}|A|\mathbf{x},\mathbf{x})=(|A|^{2}\mathbf{x},\mathbf{x})\\
&=(A^{*}A\mathbf{x},\mathbf{x})=(A\mathbf{x},A\mathbf{x})=\|A\mathbf{x}\|^{2}
\end{aligned}

Corollary 6.3.4.
KerA=Ker|A|=(Ran|A|).\operatorname{Ker}A=\operatorname{Ker}|A|=(\operatorname{Ran}|A|)^{\perp}.
Proof.

The first equality follows immediately from Proposition 6.3.3, the second one follows from the identity KerT=(RanT)\operatorname{Ker}T=(\operatorname{Ran}T^{*})^{\perp} (|A||A| is self-adjoint). ∎

Theorem 6.3.5 (Polar decomposition of an operator).

Let A:XXA:X\to X be an operator (square matrix). Then AA can be represented as

A=U|A|,A=U|A|,

where UU is a unitary operator.

Remark.

The unitary operator UU is generally not unique. As one will see from the proof of the theorem, UU is unique if and only if AA is invertible.

Remark.

The polar decomposition A=U|A|A=U|A| also holds for operators A:XYA:X\to Y acting from one space to another. But in this case we can only guarantee that UU is an isometry from Ran|A|=(KerA)\operatorname{Ran}|A|=(\operatorname{Ker}A)^{\perp} to YY.

If dimXdimY\dim X\leq\dim Y this isometry can be extended to an isometry from the whole XX to YY (if dimX=dimY\dim X=\dim Y this will be a unitary operator).

Proof of Theorem 6.3.5.

Consider a vector 𝐱Ran|A|\mathbf{x}\in\operatorname{Ran}|A|. Then vector 𝐱\mathbf{x} can be represented as 𝐱=|A|𝐯\mathbf{x}=|A|\mathbf{v} for some vector 𝐯X\mathbf{v}\in X.

Define U0𝐱:=A𝐯U_{0}\mathbf{x}:=A\mathbf{v}. By Proposition 6.3.3

U0𝐱=A𝐯=|A|𝐯=𝐱\|U_{0}\mathbf{x}\|=\|A\mathbf{v}\|=\||A|\mathbf{v}\|=\|\mathbf{x}\|

so it looks like U0U_{0} is an isometry from Ran|A|\operatorname{Ran}|A| to XX.

But first we need to prove that U0U_{0} is well defined. Let 𝐯1\mathbf{v}_{1} be another vector such that 𝐱=|A|𝐯1\mathbf{x}=|A|\mathbf{v}_{1}. But 𝐱=|A|𝐯=|A|𝐯1\mathbf{x}=|A|\mathbf{v}=|A|\mathbf{v}_{1} means that 𝐯𝐯1Ker|A|=KerA\mathbf{v}-\mathbf{v}_{1}\in\operatorname{Ker}|A|=\operatorname{Ker}A (cf Corollary 6.3.4), so A𝐯=A𝐯1A\mathbf{v}=A\mathbf{v}_{1}, meaning that U0𝐱U_{0}\mathbf{x} is well defined.

By the construction A=U0|A|A=U_{0}|A|. We leave as an exercise for the reader to check that U0U_{0} is a linear transformation.

To extend U0U_{0} to a unitary operator UU, let us find some unitary transformation U1:KerA(RanA)=KerAU_{1}:\operatorname{Ker}A\to(\operatorname{Ran}A)^{\perp}=\operatorname{Ker}A^% {*}. It is always possible to do this, since for square matrices dimKerA=dimKerA\dim\operatorname{Ker}A=\dim\operatorname{Ker}A^{*} (the Rank Theorem).

It is easy to check that U=U0+U1U=U_{0}+U_{1} is a unitary operator, and that A=U|A|A=U|A|. ∎

6.3.3. Singular values. Schmidt decomposition.

Definition.

Eigenvalues of |A||A| are called the singular values of AA. In other words, if λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} are eigenvalues of AAA^{*}A then λ1,λ2,,λn\sqrt{\lambda_{1}},\sqrt{\lambda_{2}},\ldots,\sqrt{\lambda_{n}} are singular values of AA.

Remark.

Very often in the literature the singular values are defined as the non-negative square roots of the eigenvalues of AAA^{*}A, without any reference to the modulus |A||A|.

I consider the notion of the modulus of an operator to be an important one, so it was introduced above. However, the notion of the modulus of an operator is not required for what follows (defining the Schmidt and singular value decompositions). Moreover, as it will be shown below, the modulus of AA can be easily constructed from the singular value decomposition.

Consider an operator A:XYA:X\to Y, and let σ1,σ2,,σn\sigma_{1},\sigma_{2},\ldots,\sigma_{n} be the singular values of AA counting multiplicities. Assume also that σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r} are the non-zero singular values of AA, counting multiplicities. This means, in particular, that σk=0\sigma_{k}=0 for k>rk>r.

By the definition of singular values the numbers \sigma^{2}_{1},\sigma^{2}_{2},\ldots,\sigma^{2}_{n} are eigenvalues of A^{*}A. Let \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal basis of eigenvectors of A^{*}A (we know that for a self-adjoint operator, A^{*}A in our case, there exists an orthonormal basis of eigenvectors), A^{*}A\mathbf{v}_{k}=\sigma^{2}_{k}\mathbf{v}_{k}.

Proposition 6.3.6.

The system

𝐰k:=1σkA𝐯k,k=1,2,,r\mathbf{w}_{k}:=\frac{1}{\sigma_{k}}A\mathbf{v}_{k},\qquad k=1,2,\ldots,r

is an orthonormal system.

Proof.
(A\mathbf{v}_{j},A\mathbf{v}_{k})=(A^{*}A\mathbf{v}_{j},\mathbf{v}_{k})=(\sigma_{j}^{2}\mathbf{v}_{j},\mathbf{v}_{k})=\sigma_{j}^{2}(\mathbf{v}_{j},\mathbf{v}_{k})=\left\{\begin{array}{ll}0,&j\neq k\\ \sigma_{j}^{2},&j=k\end{array}\right.

since 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} is an orthonormal system. ∎

In the notation of the above proposition, the operator AA can be represented as

(6.3.1) A=k=1rσk𝐰k𝐯k,A=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*},

or, equivalently

(6.3.2) A𝐱=k=1rσk(𝐱,𝐯k)𝐰k.A\mathbf{x}=\sum_{k=1}^{r}\sigma_{k}(\mathbf{x},\mathbf{v}_{k})\mathbf{w}_{k}.

Indeed, we know that 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in XX. Then substituting 𝐱=𝐯j\mathbf{x}=\mathbf{v}_{j} into the right side of (6.3.2) we get

\sum_{k=1}^{r}\sigma_{k}(\mathbf{v}_{j},\mathbf{v}_{k})\mathbf{w}_{k}=\sigma_{j}(\mathbf{v}_{j},\mathbf{v}_{j})\mathbf{w}_{j}=\sigma_{j}\mathbf{w}_{j}=A\mathbf{v}_{j}\qquad\text{if }j=1,2,\ldots,r,

and

\sum_{k=1}^{r}\sigma_{k}(\mathbf{v}_{k}^{*}\mathbf{v}_{j})\mathbf{w}_{k}=\mathbf{0}=A\mathbf{v}_{j}\qquad\text{for }j>r.

So the operators in the left and right sides of (6.3.1) coincide on the basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}, so they are equal.

Definition.

The above decomposition (6.3.1) (or (6.3.2)) is called the Schmidt decomposition of the operator AA.

Remark.

Schmidt decomposition of an operator is not unique. Why?

Lemma 6.3.7.

Suppose A can be represented as

A=k=1rσk𝐰k𝐯kA=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*}

where σk>0\sigma_{k}>0 and 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}, 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} are some orthonormal systems.

Then this representation gives a Schmidt decomposition of AA.

Proof.

We only need to show that 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} are eigenvectors of AAA^{*}A, AA𝐯k=σk2𝐯kA^{*}A\mathbf{v}_{k}=\sigma_{k}^{2}\mathbf{v}_{k}. Since 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} is an orthonormal system,

\mathbf{w}_{k}^{*}\mathbf{w}_{j}=(\mathbf{w}_{j},\mathbf{w}_{k})=\delta_{k,j}:=\left\{\begin{array}{ll}0,&j\neq k\\ 1,&j=k,\end{array}\right.

and therefore

AA=k=1rσk2𝐯k𝐯k.A^{*}A=\sum_{k=1}^{r}\sigma_{k}^{2}\mathbf{v}_{k}\mathbf{v}_{k}^{*}.

Since 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} is an orthonormal system

A^{*}A\mathbf{v}_{j}=\sum_{k=1}^{r}\sigma_{k}^{2}\mathbf{v}_{k}\mathbf{v}_{k}^{*}\mathbf{v}_{j}=\sigma_{j}^{2}\mathbf{v}_{j}

thus 𝐯k\mathbf{v}_{k} are eigenvectors of AAA^{*}A. ∎

Corollary 6.3.8.

Let

A=k=1rσk𝐰k𝐯kA=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*}

be a Schmidt decomposition of AA. Then

A=k=1rσk𝐯k𝐰kA^{*}=\sum_{k=1}^{r}\sigma_{k}\mathbf{v}_{k}\mathbf{w}_{k}^{*}

is a Schmidt decomposition of AA^{*}

6.3.4. Matrix representation of the Schmidt decomposition. Singular value decomposition.

The Schmidt decomposition can be written in a nice matrix form. Namely, let us assume that A:𝔽n𝔽mA:\mathbb{F}^{n}\to\mathbb{F}^{m}, where 𝔽\mathbb{F} is either \mathbb{C} or \mathbb{R} (we can always do that by fixing orthonormal bases in XX and YY and working with coordinates in these bases). Let σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r} be non-zero singular values of AA, and let

A=k=1rσk𝐰k𝐯kA=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*}

be a Schmidt decomposition of AA.

As one can easily see, this equality can be rewritten as

(6.3.3) A=W~Σ~V~,A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*},

where Σ~=diag{σ1,σ2,,σr}\widetilde{\Sigma}=\operatorname{diag}\{\sigma_{1},\sigma_{2},\ldots,\sigma_{r}\} and V~\widetilde{V} and W~\widetilde{W} are matrices with columns 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} and 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} respectively. (Can you tell what is the size of each matrix?)

Note, that since 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} and 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} are orthonormal systems, the matrices V~\widetilde{V} and W~\widetilde{W} are isometries. Note also that r=rankAr=\operatorname{rank}A, see Exercise 6.3.1 below.

If the matrix AA is invertible, then m=n=rm=n=r, the matrices V~\widetilde{V}, W~\widetilde{W} are unitary and Σ~\widetilde{\Sigma} is an invertible diagonal matrix.

It turns out that it is always possible to write a representation similar to (6.3.3) with unitary VV and WW instead of V~\widetilde{V} and W~\widetilde{W}, and in many situations it is more convenient to work with such a representation. To write this representation one needs first to complete the systems 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} and 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} to orthogonal bases in 𝔽n\mathbb{F}^{n} and 𝔽m\mathbb{F}^{m} respectively.

Recall that to complete, say, \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} to an orthonormal basis in \mathbb{F}^{n} one just needs to find an orthonormal basis \mathbf{v}_{r+1},\ldots,\mathbf{v}_{n} in \operatorname{Ker}\widetilde{V}^{*}; then the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} will be an orthonormal basis in \mathbb{F}^{n}. And one can always get an orthonormal basis from an arbitrary one using Gram–Schmidt orthogonalization.

Then AA can be represented as

(6.3.4) A=WΣV,A=W\Sigma V^{*},

where VMn×n𝔽V\in M_{n\times n}^{\mathbb{F}} and WMm×m𝔽W\in M_{m\times m}^{\mathbb{F}} are unitary matrices with columns 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and 𝐰1,𝐰2,,𝐰m\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{m} respectively, and Σ\Sigma is a “diagonal” m×nm\times n matrix

(6.3.5) \Sigma_{j,k}=\left\{\begin{array}{ll}\sigma_{k}&j=k\leq r,\\ 0&\text{otherwise.}\end{array}\right.

In other words, to get the matrix \Sigma one takes the diagonal matrix \operatorname{diag}\{\sigma_{1},\sigma_{2},\ldots,\sigma_{r}\} and makes it into an m\times n matrix by adding extra zeroes “south and east”.

Definition 6.3.9.

For a matrix AMm×n𝔽A\in M_{m\times n}^{\mathbb{F}} (recall that here 𝔽\mathbb{F} is always \mathbb{C} or \mathbb{R}) its singular value decomposition (SVD) is a decomposition of form (6.3.4), i.e. a decomposition A=WΣVA=W\Sigma V^{*}, where WMm×m𝔽W\in M_{m\times m}^{\mathbb{F}}, VMn×n𝔽V\in M_{n\times n}^{\mathbb{F}} are unitary matrices and ΣMm×n+\Sigma\in M_{m\times n}^{\mathbb{R}_{+}} is a “diagonal” one (meaning that σk,k0\sigma_{k,k}\geq 0 for all k=1,2,,min{m,n}k=1,2,\ldots,\min\{m,n\}, and σj,k=0\sigma_{j,k}=0 for all jkj\neq k).

The representation (6.3.3) is often called the reduced or compact SVD. More precisely the reduced SVD is a representation A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*}, where Σ~Mr×r+\widetilde{\Sigma}\in M_{r\times r}^{\mathbb{R}_{+}}, rmin{m,n}r\leq\min\{m,n\} is a diagonal matrix with strictly positive diagonal entries, and W~Mm×r𝔽\widetilde{W}\in M_{m\times r}^{\mathbb{F}}, V~Mn×r𝔽\widetilde{V}\in M_{n\times r}^{\mathbb{F}} are isometries; moreover, we require that at least one of the matrices W~\widetilde{W} and V~\widetilde{V} is not square.
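
For the reader who wants to compute, here is a minimal sketch (Python with NumPy assumed; the tolerance used to detect zero singular values is our arbitrary choice, not from the text) producing both the decomposition (6.3.4) and the reduced decomposition (6.3.3) for a rank-deficient matrix.

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])                 # rank one: m = 2, n = 3, r = 1

# Full SVD (6.3.4): W is m x m, V is n x n, Sigma is an m x n "diagonal" matrix.
W, s, Vh = np.linalg.svd(A, full_matrices=True)   # Vh plays the role of V^*
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, W @ Sigma @ Vh)

# Reduced SVD (6.3.3): keep only the r non-zero singular values.
r = int(np.sum(s > 1e-12))
W_red, Sigma_red, Vh_red = W[:, :r], np.diag(s[:r]), Vh[:r, :]
assert np.allclose(A, W_red @ Sigma_red @ Vh_red)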

Remark 6.3.10.

It is easy to see that if A=WΣVA=W\Sigma V^{*} is a singular value decomposition of AA, then σk:=σk,k\sigma_{k}:=\sigma_{k,k} are singular values of AA, i.e. σk2\sigma_{k}^{2} are eigenvalues of AAA^{*}A. Moreover, the columns 𝐯k\mathbf{v}_{k} of VV are the corresponding eigenvectors of AAA^{*}A, AA𝐯k=σk2𝐯kA^{*}A\mathbf{v}_{k}=\sigma_{k}^{2}\mathbf{v}_{k}. Note also that if σk0\sigma_{k}\neq 0 then 𝐰k=1σkA𝐯k\mathbf{w}_{k}=\frac{1}{\sigma_{k}}A\mathbf{v}_{k}.

All that means that any singular value decomposition A=WΣVA=W\Sigma V^{*} can be obtained from a Schmidt decomposition (6.3.2) by the construction described above in this section.

The reduced singular value decomposition can be interpreted as a matrix form of the Schmidt decomposition (6.3.2) for a non-invertible matrix AA. For an invertible matrix AA the matrix form of the Schmidt decomposition gives the singular value decomposition.

Remark 6.3.11.

An alternative way to interpret the singular value decomposition A=WΣVA=W\Sigma V^{*} is to say that Σ\Sigma is the matrix of AA in the (orthonormal) bases 𝒜=𝐯1,𝐯2,,𝐯n\mathcal{A}=\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and :=𝐰1,𝐰2,,𝐰m\mathcal{B}:=\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{m}, i.e that Σ=[A],𝒜\Sigma=[A]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}}.

We will use this interpretation later.

From singular value decomposition to the polar decomposition

Note, that if we know the singular value decomposition A=WΣVA=W\Sigma V^{*} of a square matrix AA, we can write a polar decomposition of AA:

(6.3.6) A=WΣV=(WV)(VΣV)=U|A|A=W\Sigma V^{*}=(WV^{*})(V\Sigma V^{*})=U|A|

where |A|=VΣV|A|=V\Sigma V^{*} and U=WVU=WV^{*}.

To see that this indeed gives us a polar decomposition, let us notice that V\Sigma V^{*} is a self-adjoint, positive semidefinite operator and that

A^{*}A=V\Sigma W^{*}W\Sigma V^{*}=V\Sigma\Sigma V^{*}=V\Sigma V^{*}V\Sigma V^{*}=(V\Sigma V^{*})^{2}.

So by the definition of |A||A| as the unique positive semidefinite square root of AAA^{*}A, we can see that |A|=VΣV|A|=V\Sigma V^{*}. The transformation WVWV^{*} is clearly unitary, as a product of two unitary transformations, so (6.3.6) indeed gives us a polar decomposition of AA.

Note, that this reasoning only works for square matrices, because if AA is not square, then the product VΣV\Sigma is not defined (dimensions do not match, can you see how?).
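
A minimal numerical sketch of this recipe for a square matrix (Python with NumPy assumed; the variable names are ours, not from the text):

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])

W, s, Vh = np.linalg.svd(A)          # A = W Sigma V^*; all factors are 2 x 2 here
V = Vh.T                             # A is real, so V^* = V^T
U = W @ Vh                           # U = W V^*
mod_A = V @ np.diag(s) @ Vh          # |A| = V Sigma V^*

assert np.allclose(A, U @ mod_A)                 # A = U |A|, as in (6.3.6)
assert np.allclose(U.T @ U, np.eye(2))           # U is unitary (orthogonal)
assert np.allclose(mod_A @ mod_A, A.T @ A)       # |A|^2 = A^*A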

Exercises.

6.3.1.

Show that the number of non-zero singular values of a matrix AA coincides with its rank.

6.3.2.

Find Schmidt decompositions A=k=1rsk𝐰k𝐯k\displaystyle A=\sum_{k=1}^{r}s_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*} for the following matrices AA:

(2302),(710055),(110111).\left(\begin{array}[]{rr}2&3\\ 0&2\\ \end{array}\right),\qquad\left(\begin{array}[]{rr}7&1\\ 0&0\\ 5&5\end{array}\right),\qquad\left(\begin{array}[]{rr}1&1\\ 0&1\\ -1&1\end{array}\right).
6.3.3.

Let AA be an invertible matrix, and let A=WΣVA=W\Sigma V^{*} be its singular value decomposition. Find a singular value decomposition for AA^{*} and A1A^{-1}.

6.3.4.

Find singular value decomposition A=WΣV\displaystyle A=W\Sigma V^{*} where VV and WW are unitary matrices for the following matrices:

  1. a)

    A=(316262)\displaystyle A=\left(\begin{array}[]{rr}-3&1\\ 6&-2\\ 6&-2\end{array}\right);

  2. b)

    A=(322232)\displaystyle A=\left(\begin{array}[]{rrr}3&2&2\\ 2&3&-2\end{array}\right).

6.3.5.

Find singular value decomposition of the matrix

A=(2302)A=\left(\begin{array}[]{rr}2&3\\ 0&2\\ \end{array}\right)

Use it to find

  1. a)

    max𝐱1A𝐱\max_{\|\mathbf{x}\|\leq 1}\|A\mathbf{x}\| and the vectors where the maximum is attained;

  2. b)

    min𝐱=1A𝐱\min_{\|\mathbf{x}\|=1}\|A\mathbf{x}\| and the vectors where the minimum is attained;

  3. c)

    the image A(B)A(B) of the closed unit ball in 2\mathbb{R}^{2}, B={𝐱2:𝐱1}B=\{\mathbf{x}\in\mathbb{R}^{2}:\|\mathbf{x}\|\leq 1\}. Describe A(B)A(B) geometrically.

6.3.6.

Show that for a square matrix AA, |detA|=det|A||\det A|=\det|A|.

6.3.7.

True or false

  1. a)

    Singular values of a matrix are also eigenvalues of the matrix.

  2. b)

    Singular values of a matrix AA are eigenvalues of AAA^{*}A.

  3. c)

    If s is a singular value of a matrix A and c is a scalar, then |c|s is a singular value of cA.

  4. d)

    The singular values of any linear operator are non-negative.

  5. e)

    Singular values of a self-adjoint matrix coincide with its eigenvalues.

6.3.8.

Let AA be an m×nm\times n matrix. Prove that non-zero eigenvalues of the matrices AAA^{*}A and AAAA^{*} (counting multiplicities) coincide.

Can you say when zero eigenvalue of AAA^{*}A and zero eigenvalue of AAAA^{*} have the same multiplicity?

6.3.9.

Let ss be the largest singular value of an operator AA, and let λ\lambda be the eigenvalue of AA with largest absolute value. Show that |λ|s|\lambda|\leq s.

6.3.10.

Show that the rank of a matrix is the number of its non-zero singular values (counting multiplicities).

6.3.11.

Show that the operator norm of a matrix AA coincides with its Frobenius norm if and only if the matrix has rank one. Hint: The previous problem might help.

6.3.12.

For the matrix AA

A=(2302),A=\left(\begin{array}[]{rr}2&-3\\ 0&2\\ \end{array}\right),

describe the inverse image of the unit ball, i.e. the set of all 𝐱2\mathbf{x}\in\mathbb{R}^{2} such that A𝐱1\|A\mathbf{x}\|\leq 1. Use singular value decomposition.

6.4. Applications of the singular value decomposition.

As we discussed above, the singular value decomposition is simply diagonalization with respect to two different orthonormal bases. Since we have two different bases here, we cannot say much about spectral properties of an operator from its singular value decomposition. For example, the diagonal entries of Σ\Sigma in the singular value decomposition (6.3.4) are not the eigenvalues of AA. Note, that for A=WΣVA=W\Sigma V^{*} as in (6.3.4) we generally have AnWΣnVA^{n}\neq W\Sigma^{n}V^{*}, so this diagonalization does not help us in computing functions of a matrix.

However, as the examples below show, singular values tell us a lot about so-called metric properties of a linear transformation.

Final remark: performing singular value decomposition requires finding eigenvalues and eigenvectors of the Hermitian (self-adjoint) matrix A^{*}A. To find eigenvalues we usually compute the characteristic polynomial, find its roots, and so on… This looks like quite a complicated process, especially if one takes into account that there is no general formula for the roots of polynomials of degree 5 and higher.

However, there are very effective numerical methods for finding the eigenvalues and eigenvectors of a Hermitian matrix up to any given precision. These methods do not involve computing the characteristic polynomial and finding its roots. They compute approximate eigenvalues and eigenvectors directly by an iterative procedure. Because a Hermitian matrix has an orthogonal basis of eigenvectors, these methods work extremely well.

We will not discuss these methods here; that goes beyond the scope of this book. However, you should believe me that there are very effective numerical methods for computing eigenvalues and eigenvectors of a Hermitian matrix and for finding the singular value decomposition. These methods are extremely effective, and just a little more computationally intensive than solving a linear system.

6.4.1. Image of the unit ball

Consider for example the following problem: let A:nmA:\mathbb{R}^{n}\to\mathbb{R}^{m} be a linear transformation, and let B={𝐱n:𝐱1}B=\{\mathbf{x}\in\mathbb{R}^{n}:\|\mathbf{x}\|\leq 1\} be the closed unit ball in n\mathbb{R}^{n}. We want to describe A(B)A(B), i.e. we want to find out how the unit ball is transformed under the linear transformation.

Let us first consider the simplest case when AA is a diagonal matrix A=diag{σ1,σ2,,σn}A=\operatorname{diag}\{\sigma_{1},\sigma_{2},\ldots,\sigma_{n}\}, σk>0\sigma_{k}>0, k=1,2,,nk=1,2,\ldots,n. Then for 𝐱=(x1,x2,,xn)T\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T} and (y1,y2,,yn)T=𝐲=A𝐱(y_{1},y_{2},\ldots,y_{n})^{T}=\mathbf{y}=A\mathbf{x} we have yk=σkxky_{k}=\sigma_{k}x_{k} (equivalently, xk=yk/σkx_{k}=y_{k}/\sigma_{k}) for k=1,2,,nk=1,2,\ldots,n, so

𝐲=(y1,y2,,yn)T=A𝐱for 𝐱1,\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}=A\mathbf{x}\qquad\text{for }\|% \mathbf{x}\|\leq 1,

if and only if the coordinates y1,y2,,yny_{1},y_{2},\ldots,y_{n} satisfy the inequality

\frac{y_{1}^{2}}{\sigma_{1}^{2}}+\frac{y_{2}^{2}}{\sigma_{2}^{2}}+\cdots+\frac{y_{n}^{2}}{\sigma_{n}^{2}}=\sum_{k=1}^{n}\frac{y_{k}^{2}}{\sigma_{k}^{2}}\leq 1

(this is simply the inequality 𝐱2=k|xk|21\|\mathbf{x}\|^{2}=\sum_{k}|x_{k}|^{2}\leq 1).

The set of points in n\mathbb{R}^{n} satisfying the above inequalities is called an ellipsoid. If n=2n=2 it is an ellipse with half-axes σ1\sigma_{1} and σ2\sigma_{2}, for n=3n=3 it is an ellipsoid with half-axes σ1,σ2\sigma_{1},\sigma_{2} and σ3\sigma_{3}. In n\mathbb{R}^{n} the geometry of this set is also easy to visualize, and we call that set an ellipsoid with half axes σ1,σ2,,σn\sigma_{1},\sigma_{2},\ldots,\sigma_{n}. The vectors 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} or, more precisely the corresponding lines are called the principal axes of the ellipsoid.

The singular value decomposition essentially says that any operator in an inner product space is diagonal with respect to a pair of orthonormal bases, see Remark 6.3.11. Namely, consider the orthonormal bases \mathcal{A}=\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and \mathcal{B}=\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{n} from the singular value decomposition (6.3.4). Then the matrix of A in these bases is diagonal

[A]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}}=\operatorname{diag}\{\sigma_{k}:k=1,2,\ldots,n\}.

Assuming that all σk>0\sigma_{k}>0 and essentially repeating the above reasoning, it is easy to show that any point 𝐲=A𝐱A(B)\mathbf{y}=A\mathbf{x}\in A(B) if and only if it satisfies the inequality

\frac{y_{1}^{2}}{\sigma_{1}^{2}}+\frac{y_{2}^{2}}{\sigma_{2}^{2}}+\cdots+\frac{y_{n}^{2}}{\sigma_{n}^{2}}=\sum_{k=1}^{n}\frac{y_{k}^{2}}{\sigma_{k}^{2}}\leq 1.

where y1,y2,,yny_{1},y_{2},\ldots,y_{n} are coordinates of 𝐲\mathbf{y} in the orthonormal basis =𝐰1,𝐰2,,𝐰n\mathcal{B}=\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{n}, not in the standard one. Similarly, (x1,x2,,xn)T=[𝐱]𝒜(x_{1},x_{2},\ldots,x_{n})^{T}=[\mathbf{x}]_{{}_{\scriptstyle\mathcal{A}}}.

But that is essentially the same ellipsoid as before, only “rotated” (with different but still orthogonal principal axes)!

There is also an alternative explanation which is presented below.

Consider the general case, when the matrix AA is not necessarily square, and (or) not all singular values are non-zero. Consider first the case of a “diagonal” matrix Σ\Sigma of form (6.3.5). It is easy to see that the image ΣB\Sigma B of the unit ball BB is the ellipsoid (not in the whole space but in the RanΣ\operatorname{Ran}\Sigma) with half axes σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r}.

Consider now the general case, A=WΣVA=W\Sigma V^{*}, where VV, WW are unitary operators. Unitary transformations do not change the unit ball (because they preserve norm), so V(B)=BV^{*}(B)=B. We know that Σ(B)\Sigma(B) is an ellipsoid in RanΣ\operatorname{Ran}\Sigma with half-axes σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r}. Unitary transformations do not change geometry of objects, so W(Σ(B))W(\Sigma(B)) is also an ellipsoid with the same half-axes. It is not hard to see from the decomposition A=WΣVA=W\Sigma V^{*} (using the fact that both WW and VV^{*} are invertible) that WW transforms RanΣ\operatorname{Ran}\Sigma to RanA\operatorname{Ran}A, so we can conclude:

the image A(B)A(B) of the closed unit ball BB is an ellipsoid in RanA\operatorname{Ran}A with half axes σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r}. Here rr is the number of non-zero singular values, i.e. the rank of AA.
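
One can check this numerically: map points of the unit circle by A, write the images in coordinates with respect to \mathbf{w}_{1},\mathbf{w}_{2}, and verify that they lie on the ellipse with half-axes \sigma_{1},\sigma_{2}. A rough sketch (Python with NumPy assumed; not part of the text):

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])
W, s, Vh = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
image = A @ circle                                   # their images under A

y = W.T @ image    # coordinates of the images in the orthonormal basis w_1, w_2
# Every image of a unit vector lies on the boundary ellipse (y_1/s_1)^2 + (y_2/s_2)^2 = 1.
assert np.allclose((y[0] / s[0]) ** 2 + (y[1] / s[1]) ** 2, 1.0)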

6.4.2. Operator norm of a linear transformation

Given a linear transformation A:XYA:X\to Y let us consider the following optimization problem: find the maximum of A𝐱\|A\mathbf{x}\| on the closed unit ball B={𝐱X:𝐱1}B=\{\mathbf{x}\in X:\|\mathbf{x}\|\leq 1\}.

Again, singular value decomposition allows us to solve the problem. For a “diagonal” matrix A (like the matrix \Sigma in the definition of the singular value decomposition) with non-negative entries the maximum is exactly the maximal diagonal entry. Indeed, let s_{1},s_{2},\ldots,s_{r} be the non-zero diagonal entries of A and let s_{1} be the maximal one. Since for \mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}

(6.4.1) A𝐱=k=1rskxk𝐞k,A\mathbf{x}=\sum_{k=1}^{r}\ s_{k}x_{k}\mathbf{e}_{k},

we can conclude that

\|A\mathbf{x}\|^{2}=\sum_{k=1}^{r}s_{k}^{2}|x_{k}|^{2}\leq s_{1}^{2}\sum_{k=1}^{r}|x_{k}|^{2}=s_{1}^{2}\cdot\|\mathbf{x}\|^{2},

so A𝐱s1𝐱\|A\mathbf{x}\|\leq s_{1}\|\mathbf{x}\|. On the other hand, A𝐞1=s1𝐞1=s1𝐞1\|A\mathbf{e}_{1}\|=\|s_{1}\mathbf{e}_{1}\|=s_{1}\|\mathbf{e}_{1}\|, so indeed s1s_{1} is the maximum of A𝐱\|A\mathbf{x}\| on the closed unit ball BB. Note, that in the above reasoning we did not assume that the matrix AA is square; we only assumed that all entries outside the “main diagonal” are 0, so formula (6.4.1) holds.

To treat the general case let us consider the singular value decomposition (6.3.4), A=WΣVA=W\Sigma V^{*}, where WW, VV are unitary operators, and Σ\Sigma is the “diagonal” matrix with non-negative entries. Since unitary transformations do not change the norm, one can conclude that the maximum of A𝐱\|A\mathbf{x}\| on the unit ball BB is the maximal diagonal entry of Σ\Sigma i.e. that

the maximum of A𝐱\|A\mathbf{x}\| on the unit ball BB is the maximal singular value of AA.

Definition.

The quantity max{A𝐱:𝐱X,𝐱1}\max\{\|A\mathbf{x}\|:\mathbf{x}\in X,\|\mathbf{x}\|\leq 1\} is called the operator norm of AA and denoted A\|A\|.

It is an easy exercise to see that A\|A\| satisfies all properties of the norm:

  1. 1.

    αA=|α|A\|\alpha A\|=|\alpha|\cdot\|A\|;

  2. 2.

    A+BA+B\|A+B\|\leq\|A\|+\|B\|;

  3. 3.

    A0\|A\|\geq 0 for all AA;

  4. 4.

    A=0\|A\|=0 if and only if A=𝟎A=\mathbf{0},

so it is indeed a norm on a space of linear transformations from XX to YY.

One of the main properties of the operator norm is the inequality

A𝐱A𝐱,\|A\mathbf{x}\|\leq\|A\|\cdot\|\mathbf{x}\|,

which follows easily from the homogeneity of the norm 𝐱\|\mathbf{x}\|.

In fact, it can be shown that the operator norm A\|A\| is the best (smallest) number C0C\geq 0 such that

A𝐱C𝐱𝐱X.\|A\mathbf{x}\|\leq C\|\mathbf{x}\|\qquad\forall\mathbf{x}\in X.

This is often used as a definition of the operator norm.

On the space of linear transformations we already have one norm, the Frobenius, or Hilbert-Schmidt norm A2\|A\|_{2},

A22=trace(AA).\|A\|_{2}^{2}=\operatorname{trace}(A^{*}A).

So, let us investigate how these two norms compare.

Let s1,s2,,srs_{1},s_{2},\ldots,s_{r} be non-zero singular values of AA (counting multiplicities), and let s1s_{1} be the largest singular value. Then s12,s22,,sr2s^{2}_{1},s^{2}_{2},\ldots,s^{2}_{r} are non-zero eigenvalues of AAA^{*}A (again counting multiplicities). Recalling that the trace equals the sum of the eigenvalues we conclude that

A22=trace(AA)=k=1rsk2.\|A\|_{2}^{2}=\operatorname{trace}(A^{*}A)=\sum_{k=1}^{r}s_{k}^{2}.

On the other hand we know that the operator norm of AA equals its largest singular value, i.e. A=s1\|A\|=s_{1}. So we can conclude that AA2\|A\|\leq\|A\|_{2}, i.e. that

the operator norm of a matrix cannot be more than its Frobenius norm.

This statement also admits a direct proof using the Cauchy–Schwarz inequality, and such a proof is presented in some textbooks. The beauty of the proof we presented here is that it does not require any computations and illuminates the reasons behind the inequality.
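
A quick numerical illustration of both facts (Python with NumPy assumed; not part of the text): the operator norm is the largest singular value, and it never exceeds the Frobenius norm.

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])

s = np.linalg.svd(A, compute_uv=False)    # singular values, largest first
op_norm = np.linalg.norm(A, 2)            # operator (spectral) norm
fro_norm = np.linalg.norm(A, 'fro')       # Frobenius norm, denoted ||A||_2 in the text

assert np.isclose(op_norm, s[0])                       # ||A|| = s_1
assert np.isclose(fro_norm, np.sqrt(np.sum(s ** 2)))   # ||A||_2^2 = sum of s_k^2
assert op_norm <= fro_norm + 1e-12                     # ||A|| <= ||A||_2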

6.4.3. Condition number of a matrix

Suppose we have an invertible matrix AA and we want to solve the equation A𝐱=𝐛A\mathbf{x}=\mathbf{b}. The solution, of course, is given by 𝐱=A1𝐛\mathbf{x}=A^{-1}\mathbf{b}, but we want to investigate what happens if we know the data only approximately.

That happens in real life, when the data is obtained, for example, from experiments. But even if we have exact data, round-off errors during computations by a computer may have the same effect of distorting the data.

Let us consider the simplest model: suppose there is a small error in the right side of the equation. That means that, instead of the equation A\mathbf{x}=\mathbf{b}, we are solving

A𝐱=𝐛+Δ𝐛,A\mathbf{x}=\mathbf{b}+{\scriptstyle\Delta}\mathbf{b},

where Δ𝐛{\scriptstyle\Delta}\mathbf{b} is a small perturbation of the right side 𝐛\mathbf{b}.

So, instead of the exact solution 𝐱\mathbf{x} of A𝐱=𝐛A\mathbf{x}=\mathbf{b} we get the approximate solution 𝐱+Δ𝐱\mathbf{x}+{\scriptstyle\Delta}\mathbf{x} of A(𝐱+Δ𝐱)=𝐛+Δ𝐛A(\mathbf{x}+{\scriptstyle\Delta}\mathbf{x})=\mathbf{b}+{\scriptstyle\Delta}% \mathbf{b}. We are assuming that AA is invertible, so Δ𝐱=A1Δ𝐛{\scriptstyle\Delta}\mathbf{x}=A^{-1}{\scriptstyle\Delta}\mathbf{b}.

We want to know how big is the relative error in the solution Δ𝐱/𝐱\|{\scriptstyle\Delta}\mathbf{x}\|/\|\mathbf{x}\| in comparison with the relative error in the right side Δ𝐛/𝐛\|{\scriptstyle\Delta}\mathbf{b}\|/\|\mathbf{b}\|. It is easy to see that

\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}=\frac{\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{x}\|}=\frac{\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}\frac{\|\mathbf{b}\|}{\|\mathbf{x}\|}=\frac{\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}.

Since A1Δ𝐛A1Δ𝐛\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|\leq\|A^{-1}\|\cdot\|{\scriptstyle% \Delta}\mathbf{b}\| and A𝐱A𝐱\|A\mathbf{x}\|\leq\|A\|\cdot\|\mathbf{x}\| we can conclude that

\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}\leq\|A^{-1}\|\cdot\|A\|\cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}.

The quantity AA1\|A\|\cdot\|A^{-1}\| is called the condition number of the matrix AA. It estimates how the relative error in the solution 𝐱\mathbf{x} depends on the relative error in the right side 𝐛\mathbf{b}.

Let us see how this quantity is related to singular values. Let the numbers s1,s2,,sns_{1},s_{2},\ldots,s_{n} be the singular values of AA, and let us assume that s1s_{1} is the largest singular value and sns_{n} is the smallest. We know that the (operator) norm of an operator equals its largest singular value, so

A=s1,A1=1sn,\|A\|=s_{1},\qquad\|A^{-1}\|=\frac{1}{s_{n}},

so

AA1=s1sn.\|A\|\cdot\|A^{-1}\|=\frac{s_{1}}{s_{n}}.

In other words

The condition number of a matrix equals the ratio of its largest and smallest singular values.

We deduced above that \frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}\leq\|A^{-1}\|\cdot\|A\|\cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}. It is not hard to see that this estimate is sharp, i.e. that it is possible to pick the right side \mathbf{b} and the error {\scriptstyle\Delta}\mathbf{b} such that we have equality

\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}=\|A^{-1}\|\cdot\|A\|\cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}.

We just put \mathbf{b}=\mathbf{w}_{1} and {\scriptstyle\Delta}\mathbf{b}=\alpha\mathbf{w}_{n}, where \mathbf{w}_{1} and \mathbf{w}_{n} are respectively the first and the last columns of the matrix W in the singular value decomposition A=W\Sigma V^{*}, and \alpha\neq 0 is an arbitrary scalar. Here, as usual, the singular values are assumed to be in non-increasing order s_{1}\geq s_{2}\geq\ldots\geq s_{n}, so s_{1} is the largest and s_{n} is the smallest singular value.

We leave the details as an exercise for the reader.
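
Here is a numerical sketch of the computation hinted at above (Python with NumPy assumed; not part of the text): with \mathbf{b}=\mathbf{w}_{1} and {\scriptstyle\Delta}\mathbf{b}=\alpha\mathbf{w}_{n} the relative error is amplified exactly by the factor s_{1}/s_{n}.

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])
W, s, Vh = np.linalg.svd(A)                  # s[0] = s_1 >= s[-1] = s_n > 0

cond = s[0] / s[-1]
assert np.isclose(cond, np.linalg.cond(A))   # NumPy's condition number (2-norm)

b, db = W[:, 0], 1e-3 * W[:, -1]             # b = w_1, Delta b = alpha w_n
x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, db)

# The relative error in x divided by the relative error in b equals the condition number.
ratio = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))
assert np.isclose(ratio, cond)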

A matrix is called well conditioned if its condition number is not too big. If the condition number is big, the matrix is called ill conditioned. What is “big” here depends on the problem: with what precision you can find your right side, what precision is required for the solution, etc.

6.4.4. Effective rank of a matrix

Theoretically, the rank of a matrix is easy to compute: one just needs to row reduce the matrix and count the pivots. However, in practical applications not everything is so easy. The main reason is that very often we do not know the exact matrix, we only know its approximation up to some precision.

Moreover, even if we know the exact matrix, most computer programs introduce round-off errors in the computations, so effectively we cannot distinguish between a zero pivot and a very small pivot.

A simple naïve idea for dealing with round-off errors is as follows. When computing the rank (and other objects related to it, like the column space, kernel, etc.) one simply sets up a tolerance (some small number), and if a pivot is smaller than the tolerance, one counts it as zero. The advantage of this approach is its simplicity, since it is very easy to program. However, the main disadvantage is that it is hard to see what exactly the choice of tolerance controls. For example, what do we lose if we set the tolerance equal to 10610^{-6}? How much better will 10810^{-8} be?

While the above approach works well for well conditioned matrices, it is not very reliable in the general case.

A better approach is to use singular values. It requires more computations, but gives much better results, which are easier to interpret. In this approach we also set up some small number as a tolerance, and then perform the singular value decomposition. Then we simply treat singular values smaller than the tolerance as zero. The advantage of this approach is that we can see what we are doing. The singular values are the half-axes of the ellipsoid A(B)A(B) (BB is the closed unit ball), so by setting up the tolerance we are just deciding how “thin” the ellipsoid should be to be considered “flat”.
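A minimal sketch of this computation of the effective (numerical) rank; the tolerance 10^{-8} and the test matrix are illustrative choices only:

```python
import numpy as np

def effective_rank(A, tol=1e-8):
    """Count the singular values of A that exceed the tolerance."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

# A matrix that has rank 2 exactly, but rank 1 "up to precision 1e-12"
A = np.array([[1.0, 1.0],
              [2.0, 2.0 + 1e-12]])
print(effective_rank(A))               # 1: the second singular value is ~1e-13
print(effective_rank(A, tol=1e-15))    # 2: with a tiny tolerance we keep both
```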

6.4.5. Moore–Penrose (pseudo)inverse.

As we discussed in Section 5.4 of Chapter 5 above, the least square solution gives us, in the case when an equation A𝐱=𝐛A\mathbf{x}=\mathbf{b} does not have a solution, the “next best thing” (and gives us the solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b} when it exists).

Note, that the question of uniqueness is not addressed by the least square solution: a solution of the normal equation AA𝐱=A𝐛A^{*}A\mathbf{x}=A^{*}\mathbf{b} does not have to be unique. A natural distinguished solution would be a solution of minimal norm; such a solution is indeed unique, and can be obtained by taking an arbitrary solution and then taking its projection onto (KerAA)=(KerA)(\operatorname{Ker}A^{*}A)^{\perp}=(\operatorname{Ker}A)^{\perp}, see Exercises 5.4.5 and 5.4.6 in Chapter 5.

It is not hard to see that if A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*} is a reduced singular value decomposition of AA, then the minimal norm least square solution 𝐱0\mathbf{x}_{0} is given by

(6.4.2) 𝐱0=V~Σ~1W~𝐛.\displaystyle\mathbf{x}_{0}=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^% {*}\mathbf{b}.

Indeed, 𝐱0\mathbf{x}_{0} is a least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b} (i.e. a solution of A𝐱=PRanA𝐛A\mathbf{x}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b}):

A𝐱0=W~Σ~V~V~Σ~1W~𝐛=W~Σ~Σ~1W~𝐛=W~W~𝐛=PRanA𝐛;A\mathbf{x}_{0}=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*}\widetilde{V}% \widetilde{\Sigma}^{-1}\widetilde{W}^{*}\mathbf{b}=\widetilde{W}\widetilde{% \Sigma}\widetilde{\Sigma}^{-1}\widetilde{W}^{*}\mathbf{b}=\widetilde{W}% \widetilde{W}^{*}\mathbf{b}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b};

in the last equality in the chain we used the fact that W~W~=PRanW~\widetilde{W}\widetilde{W}^{*}=P_{{}_{\scriptstyle\operatorname{Ran}\widetilde% {W}}} (PRanW~=W~(W~W~)1W~=W~W~P_{{}_{\scriptstyle\operatorname{Ran}\widetilde{W}}}=\widetilde{W}(\widetilde{% W}^{*}\widetilde{W})^{-1}\widetilde{W}^{*}=\widetilde{W}\widetilde{W}^{*}) and that RanW~=RanA\operatorname{Ran}\widetilde{W}=\operatorname{Ran}A (see Exercise 6.4.4 below).

The general solution of A𝐱=PRanA𝐛A\mathbf{x}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b} is given by

𝐱=𝐱0+𝐲,𝐲KerA.\mathbf{x}=\mathbf{x}_{0}+\mathbf{y},\qquad\mathbf{y}\in\operatorname{Ker}A.

Note that 𝐱0RanW~\mathbf{x}_{0}\in\operatorname{Ran}\widetilde{W} and RanW~=RanA\operatorname{Ran}\widetilde{W}=\operatorname{Ran}A^{*}, see again Exercise 6.4.4 below. Therefore 𝐱0KerA\mathbf{x}_{0}\perp\operatorname{Ker}A (because KerA=(RanA)\operatorname{Ker}A=(\operatorname{Ran}A^{*})^{\perp}, see Theorem 5.5.1 in Chapter 5), so 𝐱0\mathbf{x}_{0} is indeed the unique minimal norm solution of A𝐱=PRanA𝐛A\mathbf{x}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b}, or equivalently, the minimal norm least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b}.
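Here is a short numerical sketch of formula (6.4.2) (the rank-deficient matrix and the right side are arbitrary illustrations): the reduced SVD reproduces the minimal norm least square solution that NumPy's least squares routine returns.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])           # rank 1, so A x = b has no solution for this b
b = np.array([1.0, 0.0, 3.0])

# Reduced SVD: keep only the r = rank(A) nonzero singular values
W, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))
W_t, S_t, V_t = W[:, :r], np.diag(s[:r]), Vt[:r, :].T

x0 = V_t @ np.linalg.inv(S_t) @ W_t.T @ b        # formula (6.4.2)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # minimal norm least square solution
print(x0, x_lstsq)                               # the two answers coincide
```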

Definition 6.4.1.

The operator A+:=V~Σ~1W~A^{+}:=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^{*}, where A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*} is a reduced singular value decomposition of AA, is called the Moore–Penrose inverse (or Moore–Penrose pseudoinverse) of the operator AA. In other words, the Moore–Penrose inverse is the operator giving the unique minimal norm least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b}.

Remark 6.4.2.

In the literature the Moore–Penrose inverse is usually defined as a matrix A+A^{+} such that

  1. 1.

    AA+A=AAA^{+}A=A;

  2. 2.

    A+AA+=A+A^{+}AA^{+}=A^{+};

  3. 3.

    (AA+)=AA+(AA^{+})^{*}=AA^{+};

  4. 4.

    (A+A)=A+A(A^{+}A)^{*}=A^{+}A.

It is very easy to check that the operator A+:=V~Σ~1W~A^{+}:=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^{*} satisfies properties 1–4 above.
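The check is also easy to reproduce numerically; a sketch (the rank-deficient matrix is an arbitrary example, and in the real case the adjoint is just the transpose):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # a rank 1 matrix
Ap = np.linalg.pinv(A)               # Moore-Penrose inverse computed via the SVD

print(np.allclose(A @ Ap @ A, A))        # property 1
print(np.allclose(Ap @ A @ Ap, Ap))      # property 2
print(np.allclose((A @ Ap).T, A @ Ap))   # property 3
print(np.allclose((Ap @ A).T, Ap @ A))   # property 4
```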

It is also possible (although a bit harder) to show that an operator A+A^{+} satisfying properties 1–4 is unique. Indeed, multiplying identity 1 by A+A^{+} on the left and on the right, we get that (A+A)2=A+A(A^{+}A)^{2}=A^{+}A and (AA+)2=AA+(AA^{+})^{2}=AA^{+} respectively; together with properties 3 and 4 this means that A+AA^{+}A and AA+AA^{+} are orthogonal projections (see Exercise 5.5.6 in Chapter 5).

Trivially, KerAKerA+A\operatorname{Ker}A\subset\operatorname{Ker}A^{+}A. On the other hand, identity 1 implies that KerA+AKerA\operatorname{Ker}A^{+}A\subset\operatorname{Ker}A (why?), so KerA+A=KerA\operatorname{Ker}A^{+}A=\operatorname{Ker}A. But this means that A+AA^{+}A is the orthogonal projection onto (KerA)=RanA(\operatorname{Ker}A)^{\perp}=\operatorname{Ran}A^{*},

A+A=PRanA.A^{+}A=P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}.

Property 1 also implies that AA+𝐲=𝐲AA^{+}\mathbf{y}=\mathbf{y} for all 𝐲RanA\mathbf{y}\in\operatorname{Ran}A. Since AA+AA^{+} is an orthogonal projection, we conclude that RanARanAA+\operatorname{Ran}A\subset\operatorname{Ran}AA^{+}. The opposite inclusion RanAA+RanA\operatorname{Ran}AA^{+}\subset\operatorname{Ran}A is trivial, so AA+AA^{+} is the orthogonal projection onto RanA\operatorname{Ran}A,

AA+=PRanA.AA^{+}=P_{{}_{\scriptstyle\operatorname{Ran}A}}.

Knowing A+AA^{+}A and AA+AA^{+} we can rewrite property 2 as

PRanAA+=A+orA+PRanA=A+.P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}A^{+}=A^{+}\qquad\text{or}\qquad A% ^{+}P_{{}_{\scriptstyle\operatorname{Ran}A}}=A^{+}\,.

Combining the above identities we get

PRanAA+PRanA=A+.P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}A^{+}P_{{}_{\scriptstyle% \operatorname{Ran}A}}=A^{+}.

Finally, for any 𝐛\mathbf{b} in the target space of AA

𝐱0:=A+𝐛=PRanAA+𝐛RanA\mathbf{x}_{0}:=A^{+}\mathbf{b}=P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}A^% {+}\mathbf{b}\in\operatorname{Ran}A^{*}

and

A𝐱0=AA+𝐛=PRanA𝐛,A\mathbf{x}_{0}=AA^{+}\mathbf{b}=P_{{}_{\scriptstyle\operatorname{Ran}A}}% \mathbf{b},

i.e. 𝐱0\mathbf{x}_{0} is a least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b}. Since 𝐱0RanA=(KerA)\mathbf{x}_{0}\in\operatorname{Ran}A^{*}=(\operatorname{Ker}A)^{\perp}, 𝐱0\mathbf{x}_{0} is, as we discussed above, the least square solution of minimal norm. But, as we have shown before, such a minimal norm solution is given by (6.4.2), so A+=V~Σ~1W~A^{+}=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^{*}. ∎

Exercises.

6.4.1.

Find norms and condition numbers for the following matrices:

  1. a)

    A=(4013)\displaystyle A=\left(\begin{array}[]{rr}4&0\\ 1&3\end{array}\right).

  2. b)

    A=(5333)\displaystyle A=\left(\begin{array}[]{rr}5&3\\ -3&3\end{array}\right).

For the matrix AA from part a) present an example of the right side 𝐛\mathbf{b} and the error Δ𝐛{\scriptstyle\Delta}\mathbf{b} such that

Δ𝐱𝐱=AA1Δ𝐛𝐛;\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}=\|A\|\cdot\|A^{-1}\|% \cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|};

here A𝐱=𝐛A\mathbf{x}=\mathbf{b} and AΔ𝐱=Δ𝐛A{\scriptstyle\Delta}\mathbf{x}={\scriptstyle\Delta}\mathbf{b}.

6.4.2.

Let AA be a normal operator, and let λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} be its eigenvalues (counting multiplicities). Show that singular values of AA are |λ1|,|λ2|,,|λn||\lambda_{1}|,|\lambda_{2}|,\ldots,|\lambda_{n}|.

6.4.3.

Find singular values, norm and condition number of the matrix

A=(211121112)A=\left(\begin{array}[]{rrr}2&1&1\\ 1&2&1\\ 1&1&2\end{array}\right)

You can do this problem practically without any computations, if you use the previous problem and can answer the following questions:

  1. a)

    What are singular values (eigenvalues) of an orthogonal projection PEP_{E} onto some subspace EE?

  2. b)

    What is the matrix of the orthogonal projection onto the subspace spanned by the vector (1,1,1)T(1,1,1)^{T}?

  3. c)

    How are the eigenvalues of the operators TT and aT+bIaT+bI, where aa and bb are scalars, related?

Of course, you can also just honestly do the computations.

6.4.4.

Let A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*} be a reduced singular value decomposition of AA. Show that RanA=RanW~\operatorname{Ran}A=\operatorname{Ran}\widetilde{W}, and then by taking adjoint that RanA=RanV~\operatorname{Ran}A^{*}=\operatorname{Ran}\widetilde{V}.

6.4.5.

Write a formula for the Moore–Penrose inverse A+A^{+} in terms of the singular value decomposition A=WΣVA=W\Sigma V^{*}.

6.4.6.

Tychonov’s regularization: Prove that the Moore–Penrose inverse A+A^{+} can be computed as the limits

A+=limε0+(AA+εI)1A=limε0+A(AA+εI)1.A^{+}=\lim_{\varepsilon\to 0+}(A^{*}A+\varepsilon I)^{-1}A^{*}=\lim_{% \varepsilon\to 0+}A^{*}(AA^{*}+\varepsilon I)^{-1}.
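This is not a proof, but the limits are easy to observe numerically; a sketch (the matrix is an arbitrary example, and ε = 10^{-10} stands in for the limit ε → 0+):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [1.0, 1.0]])
Ap = np.linalg.pinv(A)

eps = 1e-10
reg1 = np.linalg.inv(A.T @ A + eps * np.eye(2)) @ A.T   # (A*A + eps I)^{-1} A*
reg2 = A.T @ np.linalg.inv(A @ A.T + eps * np.eye(3))   # A* (AA* + eps I)^{-1}
print(np.allclose(reg1, Ap, atol=1e-6))
print(np.allclose(reg2, Ap, atol=1e-6))
```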

6.5. Structure of orthogonal matrices

An orthogonal matrix UU with detU=1\det U=1 is often called a rotation. The theorem below explains this name.

Theorem 6.5.1.

Let UU be an orthogonal operator in n\mathbb{R}^{n} and let detU=1\det U=1 (recall that for an orthogonal matrix UU we have detU=±1\det U=\pm 1). Then there exists an orthonormal basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} such that the matrix of UU in this basis has the block diagonal form

(Rφ1Rφ20Rφk0In2k),\left(\begin{array}[]{ccccc}R_{\varphi_{1}}&&&&\\[-10.76385pt] &R_{\varphi_{2}}&&&{\raisebox{6.45831pt}{\Huge$0$}}\\ &&\ddots&&\\ &&&R_{\varphi_{k}}&\\[-10.76385pt] {\raisebox{6.45831pt}{\Huge$0$}}&&&&I_{n-2k}\\ \end{array}\right),

where RφkR_{\varphi_{k}} are 22-dimensional rotations,

Rφk=(cosφksinφksinφkcosφk)R_{\varphi_{k}}=\left(\begin{array}[]{cc}\cos\varphi_{k}&-\sin\varphi_{k}\\ \sin\varphi_{k}&\cos\varphi_{k}\\ \end{array}\right)

and In2kI_{n-2k} stands for the identity matrix of size (n2k)×(n2k)(n-2k)\times(n-2k).

Proof.

We know that if pp is a polynomial with real coefficients and λ\lambda is its complex root, p(λ)=0p(\lambda)=0, then λ¯\overline{\lambda} is a root of pp as well, p(λ¯)=0p(\overline{\lambda})=0 (this can easily be checked by plugging λ¯\overline{\lambda} into p(z)=k=0nakzkp(z)=\sum_{k=0}^{n}a_{k}z^{k}).

Therefore, all non-real eigenvalues of a real matrix can be split into complex conjugate pairs λk\lambda_{k}, λ¯k\overline{\lambda}_{k}.

We know, that eigenvalues of a unitary matrix have absolute value 11, so all complex eigenvalues of UU (which is both real and unitary) can be written as pairs λk=cosαk+isinαk\lambda_{k}=\cos\alpha_{k}+i\sin\alpha_{k}, λ¯k=cosαkisinαk\overline{\lambda}_{k}=\cos\alpha_{k}-i\sin\alpha_{k}.

Fix a pair of complex eigenvalues λ\lambda and λ¯\overline{\lambda}, and let 𝐮n\mathbf{u}\in\mathbb{C}^{n} be the eigenvector of UU, U𝐮=λ𝐮U\mathbf{u}=\lambda\mathbf{u}. Then U𝐮¯=λ¯𝐮¯U\overline{\mathbf{u}}=\overline{\lambda}\overline{\mathbf{u}}. Now, split 𝐮\mathbf{u} into real and imaginary parts, i.e. define

𝐱:=Re𝐮=(𝐮+𝐮¯)/2,𝐲=Im𝐮=(𝐮𝐮¯)/(2i),\mathbf{x}:=\operatorname{Re}\mathbf{u}=(\mathbf{u}+\overline{\mathbf{u}})/2,% \qquad\mathbf{y}=\operatorname{Im}\mathbf{u}=(\mathbf{u}-\overline{\mathbf{u}}% )/(2i),

so 𝐮=𝐱+i𝐲\mathbf{u}=\mathbf{x}+i\mathbf{y} (note, that 𝐱,𝐲\mathbf{x},\mathbf{y} are real vectors, i.e. vectors with real entries). Then

U𝐱=U12(𝐮+𝐮¯)=12(λ𝐮+λ¯𝐮¯)=Re(λ𝐮).U\mathbf{x}=U\frac{1}{2}(\mathbf{u}+\overline{\mathbf{u}})=\frac{1}{2}(\lambda% \mathbf{u}+\overline{\lambda}\overline{\mathbf{u}})=\operatorname{Re}(\lambda% \mathbf{u}).

Similarly,

U𝐲=12iU(𝐮𝐮¯)=12i(λ𝐮λ¯𝐮¯)=Im(λ𝐮).U\mathbf{y}=\frac{1}{2i}U(\mathbf{u}-\overline{\mathbf{u}})=\frac{1}{2i}(% \lambda\mathbf{u}-\overline{\lambda}\overline{\mathbf{u}})=\operatorname{Im}(% \lambda\mathbf{u}).

Since λ=cosα+isinα\lambda=\cos\alpha+i\sin\alpha, we have

λ𝐮=(cosα+isinα)(𝐱+i𝐲)=((cosα)𝐱(sinα)𝐲)+i((cosα)𝐲+(sinα)𝐱).\lambda\mathbf{u}=(\cos\alpha+i\sin\alpha)(\mathbf{x}+i\mathbf{y})=((\cos% \alpha)\mathbf{x}-(\sin\alpha)\mathbf{y})+i((\cos\alpha)\mathbf{y}+(\sin\alpha% )\mathbf{x}).

so

U𝐱=Re(λ𝐮)=(cosα)𝐱(sinα)𝐲,U𝐲=Im(λ𝐮)=(cosα)𝐲+(sinα)𝐱.U\mathbf{x}=\operatorname{Re}(\lambda\mathbf{u})=(\cos\alpha)\mathbf{x}-(\sin% \alpha)\mathbf{y},\qquad U\mathbf{y}=\operatorname{Im}(\lambda\mathbf{u})=(% \cos\alpha)\mathbf{y}+(\sin\alpha)\mathbf{x}.

In other words, UU leaves the 2-dimensional subspace EλE_{\lambda} spanned by the vectors 𝐱,𝐲\mathbf{x},\mathbf{y} invariant and the matrix of the restriction of UU onto this subspace is the rotation matrix

Rα=(cosαsinαsinαcosα).R_{-\alpha}=\left(\begin{array}[]{cc}\cos\alpha&\sin\alpha\\ -\sin\alpha&\cos\alpha\end{array}\right).

Note, that the vectors 𝐮\mathbf{u} and 𝐮¯\overline{\mathbf{u}} (eigenvectors of a unitary matrix, corresponding to different eigenvalues) are orthogonal, so by the Pythagorean Theorem

𝐱=𝐲=22𝐮.\|\mathbf{x}\|=\|\mathbf{y}\|=\frac{\sqrt{2}}{2}\|\mathbf{u}\|.

It is easy to check that 𝐱𝐲\mathbf{x}\perp\mathbf{y}, so 𝐱,𝐲\mathbf{x},\mathbf{y} is an orthogonal basis in EλE_{\lambda}. If we multiply each vector in the basis 𝐱,𝐲\mathbf{x},\mathbf{y} by the same non-zero number, we do not change matrices of linear transformations, so without loss of generality we can assume that 𝐱=𝐲=1\|\mathbf{x}\|=\|\mathbf{y}\|=1 i.e. that 𝐱,𝐲\mathbf{x},\mathbf{y} is an orthonormal basis in EλE_{\lambda}.

Let us complete the orthonormal system 𝐯1=𝐱,𝐯2=𝐲\mathbf{v}_{1}=\mathbf{x},\mathbf{v}_{2}=\mathbf{y} to an orthonormal basis in n\mathbb{R}^{n}. Since UEλEλUE_{\lambda}\subset E_{\lambda}, i.e. EλE_{\lambda} is an invariant subspace of UU, the matrix of UU in this basis has the block triangular form

(Rα𝟎U1)\left(\begin{array}[]{c|ccc}R_{-\alpha}&&*&\\ \hline\cr&&&\\ \mathbf{0}&&U_{1}&\\ &&&\\ \end{array}\right)

where 𝟎\mathbf{0} stands for the (n2)×2(n-2)\times 2 block of zeroes.

Since the rotation matrix RαR_{-\alpha} is invertible, we have UEλ=EλUE_{\lambda}=E_{\lambda}. Therefore

UEλ=U1Eλ=Eλ,U^{*}E_{\lambda}=U^{-1}E_{\lambda}=E_{\lambda},

so the matrix of UU in the basis we constructed is in fact block diagonal,

(Rα𝟎𝟎U1).\left(\begin{array}[]{c|ccc}R_{-\alpha}&&\mathbf{0}&\\ \hline\cr&&&\\ \mathbf{0}&&U_{1}&\\ &&&\\ \end{array}\right).

Since UU is unitary

I=UU=(I𝟎𝟎U1U1),I=U^{*}U=\left(\begin{array}[]{c|ccc}I&&\mathbf{0}&\\ \hline\cr&&&\\ \mathbf{0}&&U_{1}^{*}U_{1}&\\ &&&\\ \end{array}\right),

so, since U1U_{1} is square, it is also unitary.

If U1U_{1} has complex eigenvalues we can apply the same procedure to decrease its size by 2 until we are left with a block that has only real eigenvalues. Real eigenvalues can be only +1+1 or 1-1, so in some orthonormal basis the matrix of UU has the form

(Rα1Rα20RαdIr0Il);\left(\begin{array}[]{cccccc}R_{-\alpha_{1}}&&&&&\\[-10.76385pt] &R_{-\alpha_{2}}&&&{\raisebox{6.45831pt}{\Huge$0$}}&\\ &&\ddots&&&\\ &&&R_{-\alpha_{d}}&&\\ &&&&-I_{r}&\\[-10.76385pt] {\raisebox{6.45831pt}{\Huge$0$}}&&&&&I_{l}\\ \end{array}\right);

here IrI_{r} and IlI_{l} are identity matrices of size r×rr\times r and l×ll\times l respectively. Since detU=1\det U=1, the multiplicity of the eigenvalue 1-1 (i.e. rr) must be even.

Note, that the 2×22\times 2 matrix I2-I_{2} can be interpreted as the rotation through the angle π\pi. Therefore, the above matrix has the form given in the conclusion of the theorem with φk=αk\varphi_{k}=-\alpha_{k} or φk=π\varphi_{k}=\pi. ∎
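As a numerical illustration of the theorem (a sketch assuming SciPy is available; for an orthogonal, hence normal, matrix the real Schur form is block diagonal up to round-off), one can generate a random rotation and look at its real Schur factor:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                                  # flip one column so that det Q = +1

T, Z = schur(Q, output='real')                     # Q = Z T Z^T with Z orthogonal
np.set_printoptions(precision=3, suppress=True)
print(T)    # 2x2 rotation blocks and 1x1 blocks (+1 or -1) on the diagonal
```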

Let us give a different interpretation of Theorem 6.5.1. Define TjT_{j} to be a rotation through φj\varphi_{j} in the plane spanned by the vectors 𝐯2j1\mathbf{v}_{2j-1}, 𝐯2j\mathbf{v}_{2j}. Then Theorem 6.5.1 simply says that UU is the composition of the rotations TjT_{j}, j=1,2,,kj=1,2,\ldots,k. Note, that because the rotations TjT_{j} act in mutually orthogonal planes, they commute, i.e. it does not matter in what order we take the composition. So, the theorem can be interpreted as follows:

Any rotation in n\mathbb{R}^{n} can be represented as a composition of at most n/2n/2 commuting planar rotations.

If an orthogonal matrix has determinant 1-1, its structure is described by the following theorem.

Theorem 6.5.2.

Let UU be an orthogonal operator in n\mathbb{R}^{n}, and let detU=1\det U=-1. Then there exists an orthonormal basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} such that the matrix of UU in this basis has the block diagonal form

(Rφ1Rφ20RφkIr01),\left(\begin{array}[]{ccccccc}R_{\varphi_{1}}&&&&&\\[-10.76385pt] &R_{\varphi_{2}}&&&&{\raisebox{6.45831pt}{\Huge$0$}}\\ &&\ddots&&&\\ &&&R_{\varphi_{k}}&&\\ &&&&I_{r}&\\[-6.45831pt] {\raisebox{0.0pt}{\Huge$0$}}&&&&&-1\end{array}\right),

where r=n2k1r=n-2k-1 and RφkR_{\varphi_{k}} are 22-dimensional rotations,

Rφk=(cosφksinφksinφkcosφk)R_{\varphi_{k}}=\left(\begin{array}[]{cc}\cos\varphi_{k}&-\sin\varphi_{k}\\ \sin\varphi_{k}&\cos\varphi_{k}\\ \end{array}\right)

and IrI_{r} stands for the identity matrix of size r×rr\times r.

We leave the proof as an exercise for the reader. The modifications that one should make to the proof of Theorem 6.5.1 are pretty obvious.

Note, that it follows from the above theorem that an orthogonal 2×22\times 2 matrix UU with determinant 1-1 is always a reflection.

Let us now fix an orthonormal basis, say the standard basis in n\mathbb{R}^{n}. We call an elementary rotation (the term is not widely accepted) a rotation in the xjx_{j}-xkx_{k} plane, i.e. a linear transformation which changes only the coordinates xjx_{j} and xkx_{k} and acts on these two coordinates as a plane rotation.

Theorem 6.5.3.

Any rotation UU (i.e. an orthogonal transformation UU with detU=1\det U=1) can be represented as a product of at most n(n1)/2n(n-1)/2 elementary rotations.

To prove the theorem we will need the following simple lemmas.

Lemma 6.5.4.

Let 𝐱=(x1,x2)T2\mathbf{x}=(x_{1},x_{2})^{T}\in\mathbb{R}^{2}. There exists a rotation RαR_{\alpha} of 2\mathbb{R}^{2} which moves the vector 𝐱\mathbf{x} to the vector (a,0)T(a,0)^{T}, where a=x12+x22a=\sqrt{x_{1}^{2}+x_{2}^{2}}.

The proof is elementary, and we leave it as an exercise for the reader. One can just draw a picture and/or write a formula for RαR_{\alpha}.

Lemma 6.5.5.

Let 𝐱=(x1,x2,,xn)Tn\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}\in\mathbb{R}^{n}. There exist n1n-1 elementary rotations R1,R2,,Rn1R_{1},R_{2},\ldots,R_{n-1} such that Rn1R2R1𝐱=(a,0,0,,0)TR_{n-1}\ldots R_{2}R_{1}\mathbf{x}=(a,0,0,\ldots,0)^{T}, where a=x12+x22++xn2a=\sqrt{x_{1}^{2}+x_{2}^{2}+\ldots+x_{n}^{2}}.

Proof.

The idea of the proof of the lemma is very simple. We use an elementary rotation R1R_{1} in the xn1x_{n-1}-xnx_{n} plane to “kill” the last coordinate of 𝐱\mathbf{x} (Lemma 6.5.4 guarantees that such a rotation exists). Then we use an elementary rotation R2R_{2} in the xn2x_{n-2}-xn1x_{n-1} plane to “kill” the coordinate number n1n-1 of R1𝐱R_{1}\mathbf{x} (the rotation R2R_{2} does not change the last coordinate, so the last coordinate of R2R1𝐱R_{2}R_{1}\mathbf{x} remains zero), and so on…

For a formal proof we will use induction in nn. The case n=1n=1 is trivial, since any vector in 1\mathbb{R}^{1} has the desired form. The case n=2n=2 is treated by Lemma 6.5.4.

Assuming now that the lemma is true for n1n-1, let us prove it for nn. By Lemma 6.5.4 there exists a 2×22\times 2 rotation matrix RαR_{\alpha} such that

Rα(xn1xn)=(an10),R_{\alpha}\left(\begin{array}[]{cc}x_{n-1}\\ x_{n}\end{array}\right)=\left(\begin{array}[]{cc}a_{n-1}\\ 0\end{array}\right),

where an1=xn12+xn2a_{n-1}=\sqrt{x_{n-1}^{2}+x_{n}^{2}}. So if we define the n×nn\times n elementary rotation R1R_{1} by

R1=(In2𝟎𝟎Rα)R_{1}=\left(\begin{array}[]{cc}I_{n-2}&\mathbf{0}\\ \mathbf{0}&R_{\alpha}\\ \end{array}\right)

(In2I_{n-2} is (n2)×(n2)(n-2)\times(n-2) identity matrix), then

R1𝐱=(x1,x2,,xn2,an1,0)T.R_{1}\mathbf{x}=(x_{1},x_{2},\ldots,x_{n-2},a_{n-1},0)^{T}.

We assumed that the conclusion of the lemma holds for n1n-1, so there exist n2n-2 elementary rotations (let us call them R2,R3,,Rn1R_{2},R_{3},\ldots,R_{n-1}) in n1\mathbb{R}^{n-1} which transform the vector (x1,x2,,xn2,an1)Tn1(x_{1},x_{2},\ldots,x_{n-2},a_{n-1})^{T}\in\mathbb{R}^{n-1} to the vector (a,0,,0)Tn1(a,0,\ldots,0)^{T}\in\mathbb{R}^{n-1}. In other words

Rn1R3R2(x1,x2,,xn2,an1)T=(a,0,,0)T.R_{n-1}\ldots R_{3}R_{2}(x_{1},x_{2},\ldots,x_{n-2},a_{n-1})^{T}=(a,0,\ldots,0% )^{T}.

We can always assume that the elementary rotations R2,R3,,Rn1R_{2},R_{3},\ldots,R_{n-1} act in n\mathbb{R}^{n}, simply by assuming that they do not change the last coordinate. Then

Rn1R3R2R1𝐱=(a,0,,0)Tn.R_{n-1}\ldots R_{3}R_{2}R_{1}\mathbf{x}=(a,0,\ldots,0)^{T}\in\mathbb{R}^{n}.

Let us now show that a=x12+x22++xn2a=\sqrt{x_{1}^{2}+x_{2}^{2}+\ldots+x_{n}^{2}}. It can be easily checked directly, but we apply the following indirect reasoning. We know that orthogonal transformations preserve the norm, and we know that a0a\geq 0. But, then we do not have any choice, the only possibility for aa is a=x12+x22++xn2a=\sqrt{x_{1}^{2}+x_{2}^{2}+\ldots+x_{n}^{2}}. ∎
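A minimal sketch of this procedure (what the lemma calls elementary rotations are, in numerical linear algebra, Givens rotations; the implementation below is an illustration, not the text's notation):

```python
import numpy as np

def move_to_first_axis(x):
    """Apply n-1 elementary (Givens) rotations, in the planes (n-1,n), (n-2,n-1), ...,
    to move x to (||x||, 0, ..., 0)^T; returns the result and the list of rotations."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    rotations = []
    for j in range(n - 2, -1, -1):        # kill coordinates n, n-1, ..., 2, from the bottom up
        a = np.hypot(x[j], x[j + 1])
        c, s = (1.0, 0.0) if a == 0 else (x[j] / a, x[j + 1] / a)
        R = np.eye(n)
        R[j, j], R[j, j + 1] = c, s
        R[j + 1, j], R[j + 1, j + 1] = -s, c
        x = R @ x
        rotations.append(R)
    return x, rotations

y, Rs = move_to_first_axis([1.0, 2.0, 2.0])
print(y)      # approximately [3, 0, 0], and 3 = sqrt(1 + 4 + 4)
```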

Lemma 6.5.6.

Let AA be an n×nn\times n matrix with real entries. There exist elementary rotations R1,R2,,RNR_{1},R_{2},\ldots,R_{N}, Nn(n1)/2N\leq n(n-1)/2 such that the matrix B=RNR2R1AB=R_{N}\ldots R_{2}R_{1}A is upper triangular, and, moreover, all its diagonal entries except the last one Bn,nB_{n,n} are non-negative.

Proof.

We will use induction in nn. The case n=1n=1 is trivial, since we can say that any 1×11\times 1 matrix is of the desired form.

Let us consider the case n=2n=2. Let 𝐚1\mathbf{a}_{1} be the first column of AA. By Lemma 6.5.4 there exists a rotation RR which “kills” the second coordinate of 𝐚1\mathbf{a}_{1}, making the first coordinate non-negative. Then the matrix B=RAB=RA is of the desired form.

Let us now assume that the lemma holds for (n1)×(n1)(n-1)\times(n-1) matrices, and we want to prove it for n×nn\times n matrices. For the n×nn\times n matrix AA let 𝐚1\mathbf{a}_{1} be its first column. By Lemma 6.5.5 we can find n1n-1 elementary rotations (say R1,R2,,Rn1R_{1},R_{2},\ldots,R_{n-1}) which transform 𝐚1\mathbf{a}_{1} into (a,0,,0)T(a,0,\ldots,0)^{T}. So, the matrix Rn1R2R1AR_{n-1}\ldots R_{2}R_{1}A has the following block triangular form

Rn1R2R1A=(a𝟎A1),R_{n-1}\ldots R_{2}R_{1}A=\left(\begin{array}[]{ll}a&*\\ \mathbf{0}&A_{1}\\ \end{array}\right),

where A1A_{1} is an (n1)×(n1)(n-1)\times(n-1) block.

We assumed that the lemma holds for n1n-1, so A1A_{1} can be transformed by at most (n1)(n2)/2(n-1)(n-2)/2 rotations into the desired upper triangular form. Note, that these rotations act in n1\mathbb{R}^{n-1} (only on the coordinates x2,x3,,xnx_{2},x_{3},\ldots,x_{n}), but we can always assume that they act on the whole n\mathbb{R}^{n} simply by assuming that they do not change the first coordinate. Then, these rotations do not change the vector (a,0,,0)T(a,0,\ldots,0)^{T} (the first column of Rn1R2R1AR_{n-1}\ldots R_{2}R_{1}A), so the matrix AA can be transformed into the desired upper triangular form by at most n1+(n1)(n2)/2=n(n1)/2n-1+(n-1)(n-2)/2=n(n-1)/2 elementary rotations. ∎
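The procedure in this proof is essentially QR-type triangularization by Givens rotations; a minimal sketch (reusing the same kind of elementary rotation as above, with an arbitrary test matrix):

```python
import numpy as np

def rotate_to_upper_triangular(A):
    """Return B = R_N ... R_2 R_1 A upper triangular, using at most n(n-1)/2
    elementary rotations; all pivots except possibly the last become non-negative."""
    B = np.array(A, dtype=float)
    n = B.shape[0]
    for col in range(n - 1):                  # process columns left to right
        for row in range(n - 1, col, -1):     # kill the entries below the diagonal, bottom up
            a = np.hypot(B[row - 1, col], B[row, col])
            if a == 0:
                continue
            c, s = B[row - 1, col] / a, B[row, col] / a
            R = np.eye(n)
            R[row - 1, row - 1], R[row - 1, row] = c, s
            R[row, row - 1], R[row, row] = -s, c
            B = R @ B
    return B

U = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])     # a rotation in the x1-x2 plane
print(np.round(rotate_to_upper_triangular(U), 10))  # the identity, as Theorem 6.5.3 predicts
```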

Proof of Theorem 6.5.3.

By Lemma 6.5.6 there exist elementary rotations R1,R2,,RNR_{1},R_{2},\ldots,R_{N} such that the matrix U1=RNR2R1UU_{1}=R_{N}\ldots R_{2}R_{1}U is upper triangular, and all diagonal entries, except maybe the last one, are non-negative.

Note, that the matrix U1U_{1} is orthogonal. Any orthogonal matrix is normal, and we know that an upper triangular matrix can be normal only if it is diagonal. Therefore, U1U_{1} is a diagonal matrix.

We know that a real eigenvalue of an orthogonal matrix can only be 11 or 1-1, so we can have only 11 or 1-1 on the diagonal of U1U_{1}. But, we know that all diagonal entries of U1U_{1}, except maybe the last one, are non-negative, so all the diagonal entries of U1U_{1}, except maybe the last one, are 11. The last diagonal entry can be ±1\pm 1.

Since elementary rotations have determinant 1, we can conclude that detU1=detU=1\det U_{1}=\det U=1, so the last diagonal entry also must be 11. So U1=IU_{1}=I, and therefore UU can be represented as a product of elementary rotations U=R11R21RN1U=R_{1}^{-1}R_{2}^{-1}\ldots R^{-1}_{N}. Here we use the fact that the inverse of an elementary rotation is an elementary rotation as well. ∎

6.6. Orientation

6.6.1. Motivation

In Figures 6.1, 6.2 below we see 3 orthonormal bases in 2\mathbb{R}^{2} and 3\mathbb{R}^{3} respectively. In each figure, the basis b) can be obtained from the standard basis a) by a rotation, while it is impossible to rotate the standard basis to get the basis c) (so that 𝐞k\mathbf{e}_{k} goes to 𝐯k\mathbf{v}_{k} k\forall k).

Figure 6.1. Orientation in 2\mathbb{R}^{2}
Figure 6.2. Orientation in 3\mathbb{R}^{3}

You have probably heard the word “orientation” before, and you probably know that the bases a) and b) have positive orientation, and the bases c) have negative orientation. You also probably know some rules for determining the orientation, like the right hand rule from physics. So, if you can see a basis, say in 3\mathbb{R}^{3}, you can probably say what orientation it has.

But what if you are only given the coordinates of the vectors 𝐯1,𝐯2,𝐯3\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}? Of course, you can try to draw a picture to visualize the vectors, and then see what the orientation is. But this is not always easy. Moreover, how do you “explain” this to a computer?

It turns out that there is an easier way. Let us explain it. We need to check whether it is possible to get a basis 𝐯1,𝐯2,𝐯3\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3} in 3\mathbb{R}^{3} by rotating the standard basis 𝐞1,𝐞2,𝐞3\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3}. There is a unique linear transformation UU such that

U𝐞k=𝐯k,k=1,2,3;U\mathbf{e}_{k}=\mathbf{v}_{k},\qquad k=1,2,3;

its matrix (in the standard basis) is the matrix with columns 𝐯1,𝐯2,𝐯3\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}. It is an orthogonal matrix (because it transforms an orthonormal basis to an orthonormal basis), so we need to see when it is a rotation. Theorems 6.5.1 and 6.5.2 give us the answer: the matrix UU is a rotation if and only if detU=1\det U=1. Note, that (for 3×33\times 3 matrices) if detU=1\det U=-1, then UU is the composition of a rotation about some axis and a reflection in the plane of rotation, i.e. in the plane orthogonal to this axis.
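Computationally, then, the orientation test is just a sign check of a determinant; a minimal sketch (the helper name and the example vectors are hypothetical illustrations):

```python
import numpy as np

def same_orientation_as_standard(*vectors):
    """True if the matrix with columns v1, ..., vn has positive determinant,
    i.e. if the basis has the same orientation as the standard basis."""
    return float(np.linalg.det(np.column_stack(vectors))) > 0

v1, v2, v3 = [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
print(same_orientation_as_standard(v1, v2, v3))   # False: swapping e1 and e2 reverses orientation
```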

This gives us a motivation for the formal definition below.

6.6.2. Formal definition

Let 𝒜\mathcal{A} and \mathcal{B} be two bases in a real vector space XX. We say that the bases 𝒜\mathcal{A} and \mathcal{B} have the same orientation, if the change of coordinates matrix [I],𝒜[I]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}} has positive determinant, and say that they have different orientations if the determinant of [I],𝒜[I]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}} is negative.

Note, that since [I]𝒜,=[I],𝒜1[I]_{\mathcal{A},\mathcal{B}}=[I]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}}^{% -1}, one can use the matrix [I]𝒜,[I]_{\mathcal{A},\mathcal{B}} in the definition.

We usually assume that the standard basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} in n\mathbb{R}^{n} has positive orientation. In an abstract space one just needs to fix a basis and declare that its orientation is positive.

If an orthonormal basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} in n\mathbb{R}^{n} has positive orientation (i.e. the same orientation as the standard basis) Theorems 6.5.1 and 6.5.2 say that the basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is obtained from the standard basis by a rotation.

6.6.3. Continuous transformations of bases and orientation

Definition.

We say that a basis 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} can be continuously transformed to a basis ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} if there exists a continuous family of bases 𝒱(t)={𝐯1(t),𝐯2(t),,𝐯n(t)}\mathcal{V}(t)=\{\mathbf{v}_{1}(t),\mathbf{v}_{2}(t),\ldots,\mathbf{v}_{n}(t)\}, t[a,b]t\in[a,b] such that

𝐯k(a)=𝐚k,𝐯k(b)=𝐛k,k=1,2,n.\mathbf{v}_{k}(a)=\mathbf{a}_{k},\quad\mathbf{v}_{k}(b)=\mathbf{b}_{k},\qquad k% =1,2\ldots,n.

“Continuous family of bases” means that the vector-functions 𝐯k(t)\mathbf{v}_{k}(t) are continuous (their coordinates in some basis are continuous functions) and, which is essential, that the system 𝐯1(t),𝐯2(t),,𝐯n(t)\mathbf{v}_{1}(t),\mathbf{v}_{2}(t),\ldots,\mathbf{v}_{n}(t) is a basis for all t[a,b]t\in[a,b].

Note, that performing a change of variables, we can always assume, if necessary that [a,b]=[0,1][a,b]=[0,1].

Theorem 6.6.1.

Two bases 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} and ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} have the same orientation, if and only if one of the bases can be continuously transformed to the other.

Proof.

Suppose the basis 𝒜\mathcal{A} can be continuously transformed to the basis \mathcal{B}, and let 𝒱(t)\mathcal{V}(t), t[a,b]t\in[a,b] be a continuous family of bases, performing this transformation. Consider a matrix-function V(t)V(t) whose columns are the coordinate vectors [𝐯k(t)]𝒜[\mathbf{v}_{k}(t)]_{\mathcal{A}} of 𝐯k(t)\mathbf{v}_{k}(t) in the basis 𝒜\mathcal{A}.

Clearly, the entries of V(t)V(t) are continuous functions and V(a)=IV(a)=I, V(b)=[I]𝒜,V(b)=[I]_{\mathcal{A},\mathcal{B}}. Note, that because 𝒱(t)\mathcal{V}(t) is always a basis, detV(t)\det V(t) is never zero. Since detV(t)\det V(t) is continuous and never vanishes on [a,b][a,b], the Intermediate Value Theorem implies that detV(a)\det V(a) and detV(b)\det V(b) have the same sign. Since detV(a)=detI=1\det V(a)=\det I=1, we can conclude that

det[I]𝒜,=detV(b)>0,\det[I]_{\mathcal{A},\mathcal{B}}=\det V(b)>0,

so the bases 𝒜\mathcal{A} and \mathcal{B} have the same orientation.

To prove the opposite implication, i.e. the “only if” part of the theorem, one needs to show that the identity matrix II can be continuously transformed through invertible matrices to any matrix BB satisfying detB>0\det B>0. In other words, that there exists a continuous matrix-function V(t)V(t) on an interval [a,b][a,b] such that for all t[a,b]t\in[a,b] the matrix V(t)V(t) is invertible and such that

V(a)=I,V(b)=B.V(a)=I,\qquad V(b)=B.

We leave the proof of this fact as an exercise for the reader. There are several ways to prove it, one of which is outlined in Exercises 6.6.2–6.6.5 below. ∎

Exercises.

6.6.1.

Let RαR_{\alpha} be the rotation through α\alpha, so its matrix in the standard basis is

(cosαsinαsinαcosα).\left(\begin{array}[]{rr}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\\ \end{array}\right).

Find the matrix of RαR_{\alpha} in the basis 𝐯1,𝐯2\mathbf{v}_{1},\mathbf{v}_{2}, where 𝐯1=𝐞2\mathbf{v}_{1}=\mathbf{e}_{2}, 𝐯2=𝐞1\mathbf{v}_{2}=\mathbf{e}_{1}.

6.6.2.

Let RαR_{\alpha} be the rotation matrix

Rα=(cosαsinαsinαcosα).R_{\alpha}=\left(\begin{array}[]{rr}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\\ \end{array}\right).

Show that the 2×22\times 2 identity matrix I2I_{2} can be continuously transformed through invertible matrices into RαR_{\alpha}.

6.6.3.

Let UU be an n×nn\times n orthogonal matrix, and let detU>0\det U>0. Show that the n×nn\times n identity matrix InI_{n} can be continuously transformed through invertible matrices into UU. Hint: Use the previous problem and representation of a rotation in n\mathbb{R}^{n} as a product of planar rotations, see Section 6.5.

6.6.4.

Let AA be an n×nn\times n positive definite Hermitian matrix, A=A>0A=A^{*}>0. Show that the n×nn\times n identity matrix InI_{n} can be continuously transformed through invertible matrices into AA. Hint: What about diagonal matrices?

6.6.5.

Using polar decomposition and Problems 6.6.3, 6.6.4 above, complete the proof of the “only if” part of Theorem 6.6.1.