Chapter 6 Structure of operators in inner product spaces.

In this chapter we again assume that all spaces are finite-dimensional. As before, we deal only with complex or real spaces; the theory of inner product spaces does not apply to spaces over general fields. When we do not mention which space we are in, everything works for both complex and real spaces.

To avoid writing essentially the same formulas twice we will use the notation for the complex case: in the real case it gives correct, although sometimes slightly more complicated, formulas.

6.1. Upper triangular (Schur) representation of an operator.

Theorem 6.1.1.

Let A:XXA:X\to X be an operator acting in a complex inner product space. There exists an orthonormal basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} in XX such that the matrix of AA in this basis is upper triangular.

In other words, any n×nn\times n matrix AA can be represented as A=UTUA=UTU^{*}, where UU is a unitary, and TT is an upper triangular matrix.

Proof.

We prove the theorem using the induction in dimX\dim X. If dimX=1\dim X=1 the theorem is trivial, since any 1×11\times 1 matrix is upper triangular.

Suppose we proved that the theorem is true if dimX=n1\dim X=n-1, and we want to prove it for dimX=n\dim X=n.

Let λ1\lambda_{1} be an eigenvalue of AA, and let 𝐮1\mathbf{u}_{1}, 𝐮1=1\|\mathbf{u}_{1}\|=1 be a corresponding eigenvector, A𝐮1=λ1𝐮1A\mathbf{u}_{1}=\lambda_{1}\mathbf{u}_{1}. Denote E=𝐮1E=\mathbf{u}_{1}^{\perp}, and let 𝐯2,,𝐯n\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be some orthonormal basis in EE (clearly, dimE=dimX1=n1\dim E=\dim X-1=n-1), so 𝐮1,𝐯2,,𝐯n\mathbf{u}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in XX. In this basis the matrix of AA has the form

(6.1.1) \left(\begin{array}{c|cccc}\lambda_{1}&&*&\\ \hline 0&&&\\ \vdots&&A_{1}&\\ 0&&&\end{array}\right);

here all entries below λ1\lambda_{1} are zeroes, and * means that we do not care what entries are in the first row right of λ1\lambda_{1}.

We do care enough about the lower right (n-1)\times(n-1) block to give it a name: we denote it by A_{1}.

Note, that A1A_{1} defines a linear transformation in EE, and since dimE=n1\dim E=n-1, the induction hypothesis implies that there exists an orthonormal basis (let us denote it as 𝐮2,,𝐮n\mathbf{u}_{2},\ldots,\mathbf{u}_{n}) in which the matrix of A1A_{1} is upper triangular.

So, the matrix of A in the orthonormal basis \mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} has the form (6.1.1), where the matrix A_{1} is upper triangular. Therefore, the matrix of A in this basis is upper triangular as well. ∎
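
For readers who like to experiment, here is a minimal numerical sketch of Theorem 6.1.1 (written in Python with NumPy and SciPy, which are of course not part of this text): a library routine produces the factorization A=UTU^{*}, and we simply check that T is upper triangular and U is unitary.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Complex Schur form: A = U T U^*, with T upper triangular and U unitary.
T, U = schur(A, output='complex')

assert np.allclose(A, U @ T @ U.conj().T)      # A = U T U^*
assert np.allclose(np.tril(T, -1), 0)          # T is upper triangular
assert np.allclose(U.conj().T @ U, np.eye(4))  # U is unitary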

Remark.

Note, that the subspace E=𝐮1E=\mathbf{u}_{1}^{\perp} introduced in the proof is not invariant under AA, i.e. the inclusion AEEAE\subset E does not necessarily hold. That means that A1A_{1} is not a part of AA, it is some operator constructed from AA.

Note also, that AEEAE\subset E if and only if all entries denoted by * (i.e. all entries in the first row, except λ1\lambda_{1}) are zero.

Remark.

Note, that even if we start from a real matrix AA, the matrices UU and TT can have complex entries. The rotation matrix

(cosαsinαsinαcosα),αkπ,k\left(\begin{array}[]{cc}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\end{array}\right),\qquad\alpha\neq k\pi,k\in\mathbb{Z}

is not unitarily equivalent (not even similar) to a real upper triangular matrix. Indeed, eigenvalues of this matrix are complex, and the eigenvalues of an upper triangular matrix are its diagonal entries.

Remark.

An analogue of Theorem 6.1.1 can be stated and proved for an arbitrary vector space, without requiring it to have an inner product. In this case the theorem claims that any operator has an upper triangular form in some basis. A proof can be modeled after the proof of Theorem 6.1.1. An alternative way is to equip the space V with an inner product by fixing a basis in V and declaring it to be an orthonormal one, see Exercise 5.2.4 in Chapter 5.

Note, that the version for inner product spaces (Theorem 6.1.1) is stronger than the one for the vector spaces, because it says that we always can find an orthonormal basis, not just a basis.

The following theorem is a real-valued version of Theorem 6.1.1.

Theorem 6.1.2.

Let A:XXA:X\to X be an operator acting in a real inner product space. Suppose that all eigenvalues of AA are real (meaning that AA has exactly n=dimXn=\dim X real eigenvalues, counting multiplicities). Then there exists an orthonormal basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} in XX such that the matrix of AA in this basis is upper triangular.

In other words, any real n\times n matrix A with all real eigenvalues can be represented as A=UTU^{*}=UTU^{T}, where U is an orthogonal matrix and T is a real upper triangular matrix.

Remark 6.1.3.

Recall that, as we already discussed in Section 4.1.4 of Chapter 4, by the eigenvalues of an operator A in a real vector space X we always mean the eigenvalues of its complexification A_{{}_{\scriptstyle\mathbb{C}}} acting in the complexification X_{{}_{\scriptstyle\mathbb{C}}} of X.

If X=nX=\mathbb{R}^{n} then its complexification n\mathbb{C}^{n} is obtained from n\mathbb{R}^{n} by allowing complex coordinates. The complexification of an operator AA (the multiplication by a real matrix AA) on n\mathbb{R}^{n} is the multiplication by the same (real) matrix, but now in the space n\mathbb{C}^{n}.

For more details about complexifications see Section 4.1.4 of Chapter 4 and Section 5.8.2 of Chapter 5.

Proof of Theorem 6.1.2.

To prove the theorem we just need to analyze the proof of Theorem 6.1.1. Let us assume (we can always do this without loss of generality) that the operator (matrix) AA acts in n\mathbb{R}^{n}.

Suppose the theorem is true for (n-1)\times(n-1) matrices. As in the proof of Theorem 6.1.1 let \lambda_{1} be a real eigenvalue of A, let \mathbf{u}_{1}\in\mathbb{R}^{n}, \|\mathbf{u}_{1}\|=1, be a corresponding eigenvector, and let \mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal system (in \mathbb{R}^{n}) such that \mathbf{u}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in \mathbb{R}^{n}.

The matrix of AA in this basis has form (6.1.1), where A1A_{1} is some real matrix.

If we can prove that matrix A1A_{1} has only real eigenvalues, then we are done. Indeed, then by the induction hypothesis there exists an orthonormal basis 𝐮2,,𝐮n\mathbf{u}_{2},\ldots,\mathbf{u}_{n} in E=𝐮1E=\mathbf{u}_{1}^{\perp} such that the matrix of A1A_{1} in this basis is upper triangular, so the matrix of AA in the basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} is also upper triangular.

To show that A1A_{1} has only real eigenvalues, let us notice that

det(AλI)=(λ1λ)det(A1λI)\det(A-\lambda I)=(\lambda_{1}-\lambda)\det(A_{1}-\lambda I)

(take the cofactor expansion in the first row, for example), and so any eigenvalue of A1A_{1} is also an eigenvalue of AA. But AA has only real eigenvalues! ∎

Exercises.

6.1.1.

Use the upper triangular representation of an operator to give an alternative proof of the fact that determinant is the product and the trace is the sum of eigenvalues counting multiplicities.

6.2. Spectral theorem for self-adjoint and normal operators.

In this section we deal with matrices (operators) which are unitarily equivalent to diagonal matrices.

Let us recall that an operator is called self-adjoint if A=AA=A^{*}. A matrix of a self-adjoint operator (in some orthonormal basis), i.e. a matrix satisfying A=AA^{*}=A is called a Hermitian matrix.

The terms self-adjoint and Hermitian essentially mean the same. Usually people say self-adjoint when speaking about operators (transformations), and Hermitian when speaking about matrices. We will try to follow this convention, but since we often do not distinguish between operators and their matrices, we will sometimes mix both terms.

Theorem 6.2.1.

Let A=AA=A^{*} be a self-adjoint operator in an inner product space XX (the space can be complex or real). Then all eigenvalues of AA are real, and there exists an orthonormal basis of eigenvectors of AA in XX.

This theorem can be restated in matrix form as follows

Theorem 6.2.2.

Let A=AA=A^{*} be a self-adjoint (and therefore square) matrix. Then AA can be represented as

A=UDU,A=UDU^{*},

where UU is a unitary matrix and DD is a diagonal matrix with real entries.

Moreover, if the matrix AA is real, matrix UU can be chosen to be real (i.e. orthogonal).

Proof.

To prove Theorems 6.2.1 and 6.2.2 for a complex inner product space X, let us first apply Theorem 6.1.1 to find an orthonormal basis in X such that the matrix of A in this basis is upper triangular. Now let us ask ourselves a question: which upper triangular matrices are self-adjoint?

The answer is immediate: an upper triangular matrix is self-adjoint if and only if it is a diagonal matrix with real entries. Theorem 6.2.1 (and so Theorem 6.2.2) is proved.

To treat the case of a real vector space, we first notice that by the already proved complex case all eigenvalues of A are real (recall, see Remark 6.1.3 above, that by the eigenvalues of an operator in a real vector space X we always mean the eigenvalues of the extension of this operator to the complexification X_{{}_{\scriptstyle\mathbb{C}}} of X). Then we just apply Theorem 6.1.2 and notice again that the only self-adjoint upper triangular matrices are the diagonal ones. ∎
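
Numerically, the representation A=UDU^{*} of a Hermitian matrix is exactly what a Hermitian eigensolver returns. A small sketch (Python with NumPy assumed; an illustration, not part of the proof):

import numpy as np

B = np.array([[1., 2.],
              [0., 3.]])
A = B + B.T                    # a real symmetric (hence Hermitian) matrix

# eigh is the eigensolver for Hermitian/symmetric matrices: it returns
# real eigenvalues and an orthonormal basis of eigenvectors.
d, U = np.linalg.eigh(A)

assert np.allclose(A, U @ np.diag(d) @ U.T)  # A = U D U^* (here U is real, so U^* = U^T)
assert np.allclose(U.T @ U, np.eye(2))       # the columns of U are orthonormal
assert np.all(np.isreal(d))                  # the eigenvalues are real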

Remark.

In many textbooks only real matrices are considered, and Theorem 6.2.2 is often called the “Spectral Theorem for symmetric matrices”. However, we should emphasize that the conclusion of Theorem 6.2.2 fails for complex symmetric matrices: the theorem holds for Hermitian matrices, and in particular for real symmetric matrices.

Let us give an independent proof of the fact that the eigenvalues of a self-adjoint operator are real. Let A=A^{*} and A\mathbf{x}=\lambda\mathbf{x}, \mathbf{x}\neq\mathbf{0}. Then

(A\mathbf{x},\mathbf{x})=(\lambda\mathbf{x},\mathbf{x})=\lambda(\mathbf{x},\mathbf{x})=\lambda\|\mathbf{x}\|^{2}.

On the other hand,

(A\mathbf{x},\mathbf{x})=(\mathbf{x},A^{*}\mathbf{x})=(\mathbf{x},A\mathbf{x})=(\mathbf{x},\lambda\mathbf{x})=\overline{\lambda}(\mathbf{x},\mathbf{x})=\overline{\lambda}\|\mathbf{x}\|^{2},

so λ𝐱2=λ¯𝐱2\lambda\|\mathbf{x}\|^{2}=\overline{\lambda}\|\mathbf{x}\|^{2}. Since 𝐱0\|\mathbf{x}\|\neq 0 (𝐱𝟎\mathbf{x}\neq\mathbf{0}), we can conclude λ=λ¯\lambda=\overline{\lambda}, so λ\lambda is real.

It also follows from Theorem 6.2.1 that eigenspaces of a self-adjoint operator are orthogonal. Let us give an alternative proof of this result.

Proposition 6.2.3.

Let A=AA=A^{*} be a self-adjoint operator, and let 𝐮,𝐯\mathbf{u},\mathbf{v} be its eigenvectors, A𝐮=λ𝐮A\mathbf{u}=\lambda\mathbf{u}, A𝐯=μ𝐯A\mathbf{v}=\mu\mathbf{v}. Then, if λμ\lambda\neq\mu, the eigenvectors 𝐮\mathbf{u} and 𝐯\mathbf{v} are orthogonal.

Proof.

This proposition follows from the spectral theorem (Theorem 6.2.1), but here we are giving a direct proof. Namely,

(A\mathbf{u},\mathbf{v})=(\lambda\mathbf{u},\mathbf{v})=\lambda(\mathbf{u},\mathbf{v}).

On the other hand

(A\mathbf{u},\mathbf{v})=(\mathbf{u},A^{*}\mathbf{v})=(\mathbf{u},A\mathbf{v})=(\mathbf{u},\mu\mathbf{v})=\overline{\mu}(\mathbf{u},\mathbf{v})=\mu(\mathbf{u},\mathbf{v})

(the last equality holds because eigenvalues of a self-adjoint operator are real), so λ(𝐮,𝐯)=μ(𝐮,𝐯)\lambda(\mathbf{u},\mathbf{v})=\mu(\mathbf{u},\mathbf{v}). If λμ\lambda\neq\mu it is possible only if (𝐮,𝐯)=0(\mathbf{u},\mathbf{v})=0. ∎

Now let us try to find what matrices are unitarily equivalent to a diagonal one. It is easy to check that for a diagonal matrix DD

DD=DD.D^{*}D=DD^{*}.

Therefore AA=AAA^{*}A=AA^{*} if the matrix of AA in some orthonormal basis is diagonal.

Definition.

An operator (matrix) NN is called normal if NN=NNN^{*}N=NN^{*}.

Clearly, any self-adjoint operator (A=AA=A^{*}) is normal. Also, any unitary operator U:XXU:X\to X is normal since UU=UU=IU^{*}U=UU^{*}=I.

Note, that a normal operator is an operator acting in one space, not from one space to another. So, if UU is a unitary operator acting from one space to another, we cannot say that UU is normal.

Theorem 6.2.4.

Any normal operator N in a complex inner product space has an orthonormal basis of eigenvectors.

In other words, any matrix NN satisfying NN=NNN^{*}N=NN^{*} can be represented as

N=UDU,N=UDU^{*},

where UU is a unitary matrix, and DD is a diagonal one.

Remark.

Note that in the above theorem, even if N is a real matrix, we did not claim that the matrices U and D are real. Moreover, it can easily be shown that if D is real, N must be self-adjoint.
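
The rotation matrix discussed after Theorem 6.1.1 illustrates this remark: it is real and normal (it is unitary), but its unitary diagonalization is genuinely complex. A small numerical sketch (Python with NumPy assumed; not part of the text):

import numpy as np

a = 0.7                                   # any angle that is not a multiple of pi
N = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])   # real, normal, not symmetric

assert np.allclose(N.T @ N, N @ N.T)      # N is normal

# The eigenvalues e^{ia}, e^{-ia} are distinct, so the unit eigenvectors returned
# by eig are automatically orthogonal, i.e. U is unitary (but complex).
d, U = np.linalg.eig(N)

assert np.allclose(N, U @ np.diag(d) @ U.conj().T)  # N = U D U^*
assert np.allclose(U.conj().T @ U, np.eye(2))       # U is unitary
assert not np.allclose(d.imag, 0)                   # D is not real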

Proof of Theorem 6.2.4.

To prove Theorem 6.2.4 we apply Theorem 6.1.1 to get an orthonormal basis, such that the matrix of NN in this basis is upper triangular. To complete the proof of the theorem we only need to show that an upper triangular normal matrix must be diagonal.

We will prove this using induction on the size of the matrix. The case of a 1\times 1 matrix is trivial, since any 1\times 1 matrix is diagonal.

Suppose we have proved that any (n-1)\times(n-1) upper triangular normal matrix is diagonal, and we want to prove it for n\times n matrices. Let N be an n\times n upper triangular normal matrix. We can write it as

N=\left(\begin{array}{c|ccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ \hline 0&&&\\ \vdots&&N_{1}&\\ 0&&&\end{array}\right)

where N1N_{1} is an upper triangular (n1)×(n1)(n-1)\times(n-1) matrix.

Let us compare the upper left entries (first row, first column) of N^{*}N and NN^{*}. A direct computation shows that

(NN)1,1=a¯1,1a1,1=|a1,1|2(N^{*}N)_{1,1}=\overline{a}_{1,1}a_{1,1}=|a_{1,1}|^{2}

and

(NN)1,1=|a1,1|2+|a1,2|2++|a1,n|2.(NN^{*})_{1,1}=|a_{1,1}|^{2}+|a_{1,2}|^{2}+\ldots+|a_{1,n}|^{2}.

So, (NN)1,1=(NN)1,1(N^{*}N)_{1,1}=(NN^{*})_{1,1} if and only if a1,2==a1,n=0a_{1,2}=\ldots=a_{1,n}=0. Therefore, the matrix NN has the form

N=\left(\begin{array}{c|ccc}a_{1,1}&0&\ldots&0\\ \hline 0&&&\\ \vdots&&N_{1}&\\ 0&&&\end{array}\right)

It follows from the above representation that

N^{*}N=\left(\begin{array}{c|ccc}|a_{1,1}|^{2}&0&\ldots&0\\ \hline 0&&&\\ \vdots&&N_{1}^{*}N_{1}&\\ 0&&&\end{array}\right),\qquad NN^{*}=\left(\begin{array}{c|ccc}|a_{1,1}|^{2}&0&\ldots&0\\ \hline 0&&&\\ \vdots&&N_{1}N_{1}^{*}&\\ 0&&&\end{array}\right)

so N1N1=N1N1N_{1}^{*}N_{1}=N_{1}N_{1}^{*}. That means the matrix N1N_{1} is also normal, and by the induction hypothesis it is diagonal. So the matrix NN is also diagonal. ∎

The following proposition gives a very useful characterization of normal operators.

Proposition 6.2.5.

An operator N:XXN:X\to X is normal if and only if

N𝐱=N𝐱𝐱X.\|N\mathbf{x}\|=\|N^{*}\mathbf{x}\|\qquad\forall\mathbf{x}\in X.
Proof.

Let NN be normal, NN=NNN^{*}N=NN^{*}. Then

\|N\mathbf{x}\|^{2}=(N\mathbf{x},N\mathbf{x})=(N^{*}N\mathbf{x},\mathbf{x})=(NN^{*}\mathbf{x},\mathbf{x})=(N^{*}\mathbf{x},N^{*}\mathbf{x})=\|N^{*}\mathbf{x}\|^{2}

so N𝐱=N𝐱\|N\mathbf{x}\|=\|N^{*}\mathbf{x}\|.

Now let

N𝐱=N𝐱𝐱X.\|N\mathbf{x}\|=\|N^{*}\mathbf{x}\|\qquad\forall\mathbf{x}\in X.

The Polarization Identities (Lemma 5.1.9 in Chapter 5) imply that for all 𝐱,𝐲X\mathbf{x},\mathbf{y}\in X

\begin{aligned}
(N^{*}N\mathbf{x},\mathbf{y})=(N\mathbf{x},N\mathbf{y})&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N\mathbf{x}+\alpha N\mathbf{y}\|^{2}\\
&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N(\mathbf{x}+\alpha\mathbf{y})\|^{2}\\
&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N^{*}(\mathbf{x}+\alpha\mathbf{y})\|^{2}\\
&=\frac{1}{4}\sum_{\alpha=\pm 1,\pm i}\alpha\|N^{*}\mathbf{x}+\alpha N^{*}\mathbf{y}\|^{2}\\
&=(N^{*}\mathbf{x},N^{*}\mathbf{y})=(NN^{*}\mathbf{x},\mathbf{y})
\end{aligned}

and therefore (see Corollary 5.1.6 in Chapter 5) NN=NNN^{*}N=NN^{*}. ∎

Exercises.

6.2.1.

True or false:

  1. a)

    Every unitary operator U:XXU:X\to X is normal.

  2. b)

    A matrix is unitary if and only if it is invertible.

  3. c)

    If two matrices are unitarily equivalent, then they are also similar.

  4. d)

    The sum of self-adjoint operators is self-adjoint.

  5. e)

    The adjoint of a unitary operator is unitary.

  6. f)

    The adjoint of a normal operator is normal.

  7. g)

    If all eigenvalues of a linear operator are 11, then the operator must be unitary or orthogonal.

  8. h)

    If all eigenvalues of a normal operator are 11, then the operator is identity.

  9. i)

    A linear operator may preserve norm but not the inner product.

6.2.2.

True or false: The sum of normal operators is normal? Justify your conclusion.

6.2.3.

Show that an operator unitarily equivalent to a diagonal one is normal.

6.2.4.

Orthogonally diagonalize the matrix,

A=(3223).A=\left(\begin{array}[]{cc}3&2\\ 2&3\\ \end{array}\right).

Find all square roots of AA, i.e. find all matrices BB such that B2=AB^{2}=A.

Note, that all square roots of AA are self-adjoint.

6.2.5.

True or false: any self-adjoint matrix has a self-adjoint square root. Justify.

6.2.6.

Orthogonally diagonalize the matrix,

A=(7224),A=\left(\begin{array}[]{cc}7&2\\ 2&4\\ \end{array}\right),

i.e. represent it as A=UDUA=UDU^{*}, where DD is diagonal and UU is unitary.

Among all square roots of AA, i.e. among all matrices BB such that B2=AB^{2}=A, find one that has positive eigenvalues. You can leave BB as a product.

6.2.7.

True or false:

  1. a)

    A product of two self-adjoint matrices is self-adjoint.

  2. b)

    If AA is self-adjoint, then AkA^{k} is self-adjoint.

Justify your conclusions

6.2.8.

Let AA be m×nm\times n matrix. Prove that

  1. a)

    AAA^{*}A is self-adjoint.

  2. b)

    All eigenvalues of AAA^{*}A are non-negative.

  3. c)

    AA+IA^{*}A+I is invertible.

6.2.9.

Give a proof if the statement is true, or give a counterexample if it is false:

  1. a)

    If A=AA=A^{*} then A+iIA+iI is invertible.

  2. b)

    If UU is unitary, U+34IU+\frac{3}{4}I is invertible.

  3. c)

    If a matrix AA is real, AiIA-iI is invertible.

6.2.10.

Orthogonally diagonalize the rotation matrix

Rα=(cosαsinαsinαcosα),R_{\alpha}=\left(\begin{array}[]{cc}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\\ \end{array}\right),

where α\alpha is not an integer multiple of π\pi. Note, that you will get complex eigenvalues in this case.

6.2.11.

Orthogonally diagonalize the matrix

A=(cosαsinαsinαcosα).A=\left(\begin{array}[]{cc}\cos\alpha&\sin\alpha\\ \sin\alpha&-\cos\alpha\\ \end{array}\right).

Hints: You will get real eigenvalues in this case. Also, the trigonometric identities sin2x=2sinxcosx\sin 2x=2\sin x\cos x, sin2x=(1cos2x)/2\sin^{2}x=(1-\cos 2x)/2, cos2x=(1+cos2x)/2\cos^{2}x=(1+\cos 2x)/2 (applied to x=α/2x=\alpha/2) will help to simplify expressions for eigenvectors.

6.2.12.

Can you describe the linear transformation with matrix AA from the previous problem geometrically? It has a very simple geometric interpretation.

6.2.13.

Prove that a normal operator with unimodular eigenvalues (i.e. with all eigenvalues satisfying |λk|=1|\lambda_{k}|=1) is unitary. Hint: Consider diagonalization

6.2.14.

Prove that a normal operator with real eigenvalues is self-adjoint.

6.2.15.

Show by example that conclusion of Theorem 6.2.2 fails for complex symmetric matrices. Namely

  1. a)

    construct a (diagonalizable) 2×22\times 2 complex symmetric matrix not admitting an orthogonal basis of eigenvectors;

  2. b)

    construct a 2×22\times 2 complex symmetric matrix which cannot be diagonalized.

6.3. Polar and singular value decompositions.

6.3.1. Positive definite operators. Square roots

Definition.

A self-adjoint operator A:X\to X is called positive definite if

(A𝐱,𝐱)>0𝐱𝟎,(A\mathbf{x},\mathbf{x})>0\qquad\forall\mathbf{x}\neq\mathbf{0},

and it is called positive semidefinite if

(A𝐱,𝐱)0𝐱X.(A\mathbf{x},\mathbf{x})\geq 0\qquad\forall\mathbf{x}\in X.

We will use the notation A>0A>0 for positive definite operators, and A0A\geq 0 for positive semi-definite.

The following theorem describes positive definite and semidefinite operators.

Theorem 6.3.1.

Let A=AA=A^{*}. Then

  1. 1.

    A>0A>0 if and only if all eigenvalues of AA are positive.

  2. 2.

    A0A\geq 0 if and only if all eigenvalues of AA are non-negative.

Proof.

Pick an orthonormal basis such that matrix of AA in this basis is diagonal (see Theorem 6.2.1). To finish the proof it remains to notice that a diagonal matrix is positive definite (positive semidefinite) if and only if all its diagonal entries are positive (non-negative). ∎

Corollary 6.3.2.

Let A=A0A=A^{*}\geq 0 be a positive semidefinite operator. There exists a unique positive semidefinite operator BB such that B2=AB^{2}=A

Such BB is called (positive) square root of AA and is denoted as A\sqrt{A} or A1/2A^{1/2}.

Proof.

Let us prove that A\sqrt{A} exists. Let 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal basis of eigenvectors of AA, and let λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} be the corresponding eigenvalues. Note, that since A0A\geq 0, all λk0\lambda_{k}\geq 0.

In the basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} the matrix of A is a diagonal matrix \operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\} with the entries \lambda_{1},\lambda_{2},\ldots,\lambda_{n} on the diagonal. Define the matrix of B in the same basis as \operatorname{diag}\{\sqrt{\lambda_{1}},\sqrt{\lambda_{2}},\ldots,\sqrt{\lambda_{n}}\}.

Clearly, B=B0B=B^{*}\geq 0 and B2=AB^{2}=A.

To prove that such BB is unique, let us suppose that there exists an operator C=C0C=C^{*}\geq 0 such that C2=AC^{2}=A. Let 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} be an orthonormal basis of eigenvectors of CC, and let μ1,μ2,,μn\mu_{1},\mu_{2},\ldots,\mu_{n} be the corresponding eigenvalues (note that μk0\mu_{k}\geq 0 k\forall k). The matrix of CC in the basis 𝐮1,𝐮2,,𝐮n\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n} is a diagonal one diag{μ1,μ2,,μn}\operatorname{diag}\{\mu_{1},\mu_{2},\ldots,\mu_{n}\}, and therefore the matrix of A=C2A=C^{2} in the same basis is diag{μ12,μ22,,μn2}\operatorname{diag}\{\mu^{2}_{1},\mu^{2}_{2},\ldots,\mu^{2}_{n}\}. This implies that any eigenvalue λ\lambda of AA is of form μk2\mu_{k}^{2}, and, moreover, if A𝐱=λ𝐱A\mathbf{x}=\lambda\mathbf{x}, then C𝐱=λ𝐱C\mathbf{x}=\sqrt{\lambda}\mathbf{x}.

Therefore, in the basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} above, the matrix of C has the diagonal form \operatorname{diag}\{\sqrt{\lambda_{1}},\sqrt{\lambda_{2}},\ldots,\sqrt{\lambda_{n}}\}, i.e. B=C. ∎
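
The construction in the proof is easy to carry out numerically: diagonalize A and take the square roots of the eigenvalues. A minimal sketch (Python with NumPy assumed; not part of the text):

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                   # symmetric, eigenvalues 1 and 3, so A >= 0

lam, V = np.linalg.eigh(A)                 # A = V diag(lam) V^*
B = V @ np.diag(np.sqrt(lam)) @ V.T        # the positive square root sqrt(A)

assert np.allclose(B, B.T)                 # B is self-adjoint
assert np.all(np.linalg.eigvalsh(B) >= 0)  # B >= 0
assert np.allclose(B @ B, A)               # B^2 = A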

6.3.2. Modulus of an operator. Polar decomposition.

Consider an operator A:XYA:X\to Y. Its Hermitian square AAA^{*}A is a positive semidefinite operator acting in XX. Indeed,

(AA)=AA=AA(A^{*}A)^{*}=A^{*}A^{**}=A^{*}A

and

(A^{*}A\mathbf{x},\mathbf{x})=(A\mathbf{x},A\mathbf{x})=\|A\mathbf{x}\|^{2}\geq 0\qquad\forall\mathbf{x}\in X.

Therefore, there exists a (unique) positive-semidefinite square root R=AAR=\sqrt{A^{*}A}. This operator RR is called the modulus of the operator AA, and is often denoted as |A||A|.

The modulus of AA shows how “big” the operator AA is:

Proposition 6.3.3.

For a linear operator A:XYA:X\to Y

|A|𝐱=A𝐱𝐱X.\||A|\mathbf{x}\|=\|A\mathbf{x}\|\qquad\forall\mathbf{x}\in X.
Proof.

For any 𝐱X\mathbf{x}\in X

\begin{aligned}
\||A|\mathbf{x}\|^{2}&=(|A|\mathbf{x},|A|\mathbf{x})=(|A|^{*}|A|\mathbf{x},\mathbf{x})=(|A|^{2}\mathbf{x},\mathbf{x})\\
&=(A^{*}A\mathbf{x},\mathbf{x})=(A\mathbf{x},A\mathbf{x})=\|A\mathbf{x}\|^{2}
\end{aligned}

Corollary 6.3.4.
KerA=Ker|A|=(Ran|A|).\operatorname{Ker}A=\operatorname{Ker}|A|=(\operatorname{Ran}|A|)^{\perp}.
Proof.

The first equality follows immediately from Proposition 6.3.3, the second one follows from the identity KerT=(RanT)\operatorname{Ker}T=(\operatorname{Ran}T^{*})^{\perp} (|A||A| is self-adjoint). ∎

Theorem 6.3.5 (Polar decomposition of an operator).

Let A:XXA:X\to X be an operator (square matrix). Then AA can be represented as

A=U|A|,A=U|A|,

where UU is a unitary operator.

Remark.

The unitary operator UU is generally not unique. As one will see from the proof of the theorem, UU is unique if and only if AA is invertible.

Remark.

The polar decomposition A=U|A|A=U|A| also holds for operators A:XYA:X\to Y acting from one space to another. But in this case we can only guarantee that UU is an isometry from Ran|A|=(KerA)\operatorname{Ran}|A|=(\operatorname{Ker}A)^{\perp} to YY.

If dimXdimY\dim X\leq\dim Y this isometry can be extended to an isometry from the whole XX to YY (if dimX=dimY\dim X=\dim Y this will be a unitary operator).

Proof of Theorem 6.3.5.

Consider a vector 𝐱Ran|A|\mathbf{x}\in\operatorname{Ran}|A|. Then vector 𝐱\mathbf{x} can be represented as 𝐱=|A|𝐯\mathbf{x}=|A|\mathbf{v} for some vector 𝐯X\mathbf{v}\in X.

Define U0𝐱:=A𝐯U_{0}\mathbf{x}:=A\mathbf{v}. By Proposition 6.3.3

U0𝐱=A𝐯=|A|𝐯=𝐱\|U_{0}\mathbf{x}\|=\|A\mathbf{v}\|=\||A|\mathbf{v}\|=\|\mathbf{x}\|

so it looks like U0U_{0} is an isometry from Ran|A|\operatorname{Ran}|A| to XX.

But first we need to prove that U0U_{0} is well defined. Let 𝐯1\mathbf{v}_{1} be another vector such that 𝐱=|A|𝐯1\mathbf{x}=|A|\mathbf{v}_{1}. But 𝐱=|A|𝐯=|A|𝐯1\mathbf{x}=|A|\mathbf{v}=|A|\mathbf{v}_{1} means that 𝐯𝐯1Ker|A|=KerA\mathbf{v}-\mathbf{v}_{1}\in\operatorname{Ker}|A|=\operatorname{Ker}A (cf Corollary 6.3.4), so A𝐯=A𝐯1A\mathbf{v}=A\mathbf{v}_{1}, meaning that U0𝐱U_{0}\mathbf{x} is well defined.

By the construction A=U0|A|A=U_{0}|A|. We leave as an exercise for the reader to check that U0U_{0} is a linear transformation.

To extend U0U_{0} to a unitary operator UU, let us find some unitary transformation U1:KerA(RanA)=KerAU_{1}:\operatorname{Ker}A\to(\operatorname{Ran}A)^{\perp}=\operatorname{Ker}A^% {*}. It is always possible to do this, since for square matrices dimKerA=dimKerA\dim\operatorname{Ker}A=\dim\operatorname{Ker}A^{*} (the Rank Theorem).

It is easy to check that U=U0+U1U=U_{0}+U_{1} is a unitary operator, and that A=U|A|A=U|A|. ∎

6.3.3. Singular values. Schmidt decomposition.

Definition.

Eigenvalues of |A||A| are called the singular values of AA. In other words, if λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} are eigenvalues of AAA^{*}A then λ1,λ2,,λn\sqrt{\lambda_{1}},\sqrt{\lambda_{2}},\ldots,\sqrt{\lambda_{n}} are singular values of AA.

Remark.

Very often in the literature the singular values are defined as the non-negative square roots of the eigenvalues of AAA^{*}A, without any reference to the modulus |A||A|.

I consider the notion of the modulus of an operator to be an important one, so it was introduced above. However, the notion of the modulus of an operator is not required for what follows (defining the Schmidt and singular value decompositions). Moreover, as it will be shown below, the modulus of AA can be easily constructed from the singular value decomposition.

Consider an operator A:XYA:X\to Y, and let σ1,σ2,,σn\sigma_{1},\sigma_{2},\ldots,\sigma_{n} be the singular values of AA counting multiplicities. Assume also that σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r} are the non-zero singular values of AA, counting multiplicities. This means, in particular, that σk=0\sigma_{k}=0 for k>rk>r.

By the definition of singular values the numbers \sigma^{2}_{1},\sigma^{2}_{2},\ldots,\sigma^{2}_{n} are eigenvalues of A^{*}A. Let \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be an orthonormal basis of eigenvectors of A^{*}A (we know that for a self-adjoint operator, A^{*}A in our case, there exists an orthonormal basis of eigenvectors), A^{*}A\mathbf{v}_{k}=\sigma^{2}_{k}\mathbf{v}_{k}.

Proposition 6.3.6.

The system

𝐰k:=1σkA𝐯k,k=1,2,,r\mathbf{w}_{k}:=\frac{1}{\sigma_{k}}A\mathbf{v}_{k},\qquad k=1,2,\ldots,r

is an orthonormal system.

Proof.
(A\mathbf{v}_{j},A\mathbf{v}_{k})=(A^{*}A\mathbf{v}_{j},\mathbf{v}_{k})=(\sigma_{j}^{2}\mathbf{v}_{j},\mathbf{v}_{k})=\sigma_{j}^{2}(\mathbf{v}_{j},\mathbf{v}_{k})=\left\{\begin{array}{ll}0,&j\neq k\\ \sigma_{j}^{2},&j=k\end{array}\right.

since 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} is an orthonormal system. ∎

In the notation of the above proposition, the operator AA can be represented as

(6.3.1) A=k=1rσk𝐰k𝐯k,A=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*},

or, equivalently

(6.3.2) A𝐱=k=1rσk(𝐱,𝐯k)𝐰k.A\mathbf{x}=\sum_{k=1}^{r}\sigma_{k}(\mathbf{x},\mathbf{v}_{k})\mathbf{w}_{k}.

Indeed, we know that 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is an orthonormal basis in XX. Then substituting 𝐱=𝐯j\mathbf{x}=\mathbf{v}_{j} into the right side of (6.3.2) we get

\sum_{k=1}^{r}\sigma_{k}(\mathbf{v}_{j},\mathbf{v}_{k})\mathbf{w}_{k}=\sigma_{j}(\mathbf{v}_{j},\mathbf{v}_{j})\mathbf{w}_{j}=\sigma_{j}\mathbf{w}_{j}=A\mathbf{v}_{j}\qquad\text{if }j=1,2,\ldots,r,

and

\sum_{k=1}^{r}\sigma_{k}(\mathbf{v}_{k}^{*}\mathbf{v}_{j})\mathbf{w}_{k}=\mathbf{0}=A\mathbf{v}_{j}\qquad\text{for }j>r.

So the operators in the left and right sides of (6.3.1) coincide on the basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}, so they are equal.

Definition.

The above decomposition (6.3.1) (or (6.3.2)) is called the Schmidt decomposition of the operator AA.

Remark.

Schmidt decomposition of an operator is not unique. Why?

Lemma 6.3.7.

Suppose A can be represented as

A=k=1rσk𝐰k𝐯kA=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*}

where σk>0\sigma_{k}>0 and 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}, 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} are some orthonormal systems.

Then this representation gives a Schmidt decomposition of AA.

Proof.

We only need to show that 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} are eigenvectors of AAA^{*}A, AA𝐯k=σk2𝐯kA^{*}A\mathbf{v}_{k}=\sigma_{k}^{2}\mathbf{v}_{k}. Since 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} is an orthonormal system,

\mathbf{w}_{k}^{*}\mathbf{w}_{j}=(\mathbf{w}_{j},\mathbf{w}_{k})=\delta_{k,j}:=\left\{\begin{array}{ll}0,&j\neq k\\ 1,&j=k,\end{array}\right.

and therefore

AA=k=1rσk2𝐯k𝐯k.A^{*}A=\sum_{k=1}^{r}\sigma_{k}^{2}\mathbf{v}_{k}\mathbf{v}_{k}^{*}.

Since 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} is an orthonormal system

A^{*}A\mathbf{v}_{j}=\sum_{k=1}^{r}\sigma_{k}^{2}\mathbf{v}_{k}\mathbf{v}_{k}^{*}\mathbf{v}_{j}=\sigma_{j}^{2}\mathbf{v}_{j}

thus 𝐯k\mathbf{v}_{k} are eigenvectors of AAA^{*}A. ∎

Corollary 6.3.8.

Let

A=k=1rσk𝐰k𝐯kA=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*}

be a Schmidt decomposition of AA. Then

A=k=1rσk𝐯k𝐰kA^{*}=\sum_{k=1}^{r}\sigma_{k}\mathbf{v}_{k}\mathbf{w}_{k}^{*}

is a Schmidt decomposition of AA^{*}

6.3.4. Matrix representation of the Schmidt decomposition. Singular value decomposition.

The Schmidt decomposition can be written in a nice matrix form. Namely, let us assume that A:𝔽n𝔽mA:\mathbb{F}^{n}\to\mathbb{F}^{m}, where 𝔽\mathbb{F} is either \mathbb{C} or \mathbb{R} (we can always do that by fixing orthonormal bases in XX and YY and working with coordinates in these bases). Let σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r} be non-zero singular values of AA, and let

A=k=1rσk𝐰k𝐯kA=\sum_{k=1}^{r}\sigma_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*}

be a Schmidt decomposition of AA.

As one can easily see, this equality can be rewritten as

(6.3.3) A=W~Σ~V~,A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*},

where Σ~=diag{σ1,σ2,,σr}\widetilde{\Sigma}=\operatorname{diag}\{\sigma_{1},\sigma_{2},\ldots,\sigma_{r}\} and V~\widetilde{V} and W~\widetilde{W} are matrices with columns 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} and 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} respectively. (Can you tell what is the size of each matrix?)

Note, that since 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} and 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} are orthonormal systems, the matrices V~\widetilde{V} and W~\widetilde{W} are isometries. Note also that r=rankAr=\operatorname{rank}A, see Exercise 6.3.1 below.

If the matrix AA is invertible, then m=n=rm=n=r, the matrices V~\widetilde{V}, W~\widetilde{W} are unitary and Σ~\widetilde{\Sigma} is an invertible diagonal matrix.

It turns out that it is always possible to write a representation similar to (6.3.3) with unitary VV and WW instead of V~\widetilde{V} and W~\widetilde{W}, and in many situations it is more convenient to work with such a representation. To write this representation one needs first to complete the systems 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} and 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} to orthogonal bases in 𝔽n\mathbb{F}^{n} and 𝔽m\mathbb{F}^{m} respectively.

Recall that to complete, say, \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} to an orthonormal basis in \mathbb{F}^{n} one just needs to find an orthonormal basis \mathbf{v}_{r+1},\ldots,\mathbf{v}_{n} in \operatorname{Ker}\widetilde{V}^{*}; then the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} will be an orthonormal basis in \mathbb{F}^{n}. And one can always get an orthonormal basis from an arbitrary one using Gram–Schmidt orthogonalization.

Then AA can be represented as

(6.3.4) A=WΣV,A=W\Sigma V^{*},

where VMn×n𝔽V\in M_{n\times n}^{\mathbb{F}} and WMm×m𝔽W\in M_{m\times m}^{\mathbb{F}} are unitary matrices with columns 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and 𝐰1,𝐰2,,𝐰m\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{m} respectively, and Σ\Sigma is a “diagonal” m×nm\times n matrix

(6.3.5) \Sigma_{j,k}=\left\{\begin{array}{ll}\sigma_{k}&j=k\leq r,\\ 0&\text{otherwise.}\end{array}\right.

In other words, to get the matrix \Sigma one takes the diagonal matrix \operatorname{diag}\{\sigma_{1},\sigma_{2},\ldots,\sigma_{r}\} and makes it into an m\times n matrix by adding extra zeroes “south and east”.

Definition 6.3.9.

For a matrix AMm×n𝔽A\in M_{m\times n}^{\mathbb{F}} (recall that here 𝔽\mathbb{F} is always \mathbb{C} or \mathbb{R}) its singular value decomposition (SVD) is a decomposition of form (6.3.4), i.e. a decomposition A=WΣVA=W\Sigma V^{*}, where WMm×m𝔽W\in M_{m\times m}^{\mathbb{F}}, VMn×n𝔽V\in M_{n\times n}^{\mathbb{F}} are unitary matrices and ΣMm×n+\Sigma\in M_{m\times n}^{\mathbb{R}_{+}} is a “diagonal” one (meaning that σk,k0\sigma_{k,k}\geq 0 for all k=1,2,,min{m,n}k=1,2,\ldots,\min\{m,n\}, and σj,k=0\sigma_{j,k}=0 for all jkj\neq k).

The representation (6.3.3) is often called the reduced or compact SVD. More precisely the reduced SVD is a representation A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*}, where Σ~Mr×r+\widetilde{\Sigma}\in M_{r\times r}^{\mathbb{R}_{+}}, rmin{m,n}r\leq\min\{m,n\} is a diagonal matrix with strictly positive diagonal entries, and W~Mm×r𝔽\widetilde{W}\in M_{m\times r}^{\mathbb{F}}, V~Mn×r𝔽\widetilde{V}\in M_{n\times r}^{\mathbb{F}} are isometries; moreover, we require that at least one of the matrices W~\widetilde{W} and V~\widetilde{V} is not square.
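
For the reader who wants to compute, here is a minimal sketch (Python with NumPy assumed; the tolerance used to detect zero singular values is our arbitrary choice, not from the text) producing both the decomposition (6.3.4) and the reduced decomposition (6.3.3) for a rank-deficient matrix.

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])                 # rank one: m = 2, n = 3, r = 1

# Full SVD (6.3.4): W is m x m, V is n x n, Sigma is an m x n "diagonal" matrix.
W, s, Vh = np.linalg.svd(A, full_matrices=True)   # Vh plays the role of V^*
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, W @ Sigma @ Vh)

# Reduced SVD (6.3.3): keep only the r non-zero singular values.
r = int(np.sum(s > 1e-12))
W_red, Sigma_red, Vh_red = W[:, :r], np.diag(s[:r]), Vh[:r, :]
assert np.allclose(A, W_red @ Sigma_red @ Vh_red)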

Remark 6.3.10.

It is easy to see that if A=WΣVA=W\Sigma V^{*} is a singular value decomposition of AA, then σk:=σk,k\sigma_{k}:=\sigma_{k,k} are singular values of AA, i.e. σk2\sigma_{k}^{2} are eigenvalues of AAA^{*}A. Moreover, the columns 𝐯k\mathbf{v}_{k} of VV are the corresponding eigenvectors of AAA^{*}A, AA𝐯k=σk2𝐯kA^{*}A\mathbf{v}_{k}=\sigma_{k}^{2}\mathbf{v}_{k}. Note also that if σk0\sigma_{k}\neq 0 then 𝐰k=1σkA𝐯k\mathbf{w}_{k}=\frac{1}{\sigma_{k}}A\mathbf{v}_{k}.

All that means that any singular value decomposition A=WΣVA=W\Sigma V^{*} can be obtained from a Schmidt decomposition (6.3.2) by the construction described above in this section.

The reduced singular value decomposition can be interpreted as a matrix form of the Schmidt decomposition (6.3.2) for a non-invertible matrix AA. For an invertible matrix AA the matrix form of the Schmidt decomposition gives the singular value decomposition.

Remark 6.3.11.

An alternative way to interpret the singular value decomposition A=WΣVA=W\Sigma V^{*} is to say that Σ\Sigma is the matrix of AA in the (orthonormal) bases 𝒜=𝐯1,𝐯2,,𝐯n\mathcal{A}=\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and :=𝐰1,𝐰2,,𝐰m\mathcal{B}:=\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{m}, i.e that Σ=[A],𝒜\Sigma=[A]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}}.

We will use this interpretation later.

From singular value decomposition to the polar decomposition

Note, that if we know the singular value decomposition A=WΣVA=W\Sigma V^{*} of a square matrix AA, we can write a polar decomposition of AA:

(6.3.6) A=WΣV=(WV)(VΣV)=U|A|A=W\Sigma V^{*}=(WV^{*})(V\Sigma V^{*})=U|A|

where |A|=VΣV|A|=V\Sigma V^{*} and U=WVU=WV^{*}.

To see that this indeed gives us a polar decomposition, let us notice that V\Sigma V^{*} is a self-adjoint, positive semidefinite operator and that

A^{*}A=V\Sigma W^{*}W\Sigma V^{*}=V\Sigma\Sigma V^{*}=V\Sigma V^{*}V\Sigma V^{*}=(V\Sigma V^{*})^{2}.

So by the definition of |A||A| as the unique positive semidefinite square root of AAA^{*}A, we can see that |A|=VΣV|A|=V\Sigma V^{*}. The transformation WVWV^{*} is clearly unitary, as a product of two unitary transformations, so (6.3.6) indeed gives us a polar decomposition of AA.

Note, that this reasoning only works for square matrices, because if AA is not square, then the product VΣV\Sigma is not defined (dimensions do not match, can you see how?).
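
A minimal numerical sketch of this recipe for a square matrix (Python with NumPy assumed; the variable names are ours, not from the text):

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])

W, s, Vh = np.linalg.svd(A)          # A = W Sigma V^*; all factors are 2 x 2 here
V = Vh.T                             # A is real, so V^* = V^T
U = W @ Vh                           # U = W V^*
mod_A = V @ np.diag(s) @ Vh          # |A| = V Sigma V^*

assert np.allclose(A, U @ mod_A)                 # A = U |A|, as in (6.3.6)
assert np.allclose(U.T @ U, np.eye(2))           # U is unitary (orthogonal)
assert np.allclose(mod_A @ mod_A, A.T @ A)       # |A|^2 = A^*A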

Exercises.

6.3.1.

Show that the number of non-zero singular values of a matrix AA coincides with its rank.

6.3.2.

Find Schmidt decompositions A=k=1rsk𝐰k𝐯k\displaystyle A=\sum_{k=1}^{r}s_{k}\mathbf{w}_{k}\mathbf{v}_{k}^{*} for the following matrices AA:

(2302),(710055),(110111).\left(\begin{array}[]{rr}2&3\\ 0&2\\ \end{array}\right),\qquad\left(\begin{array}[]{rr}7&1\\ 0&0\\ 5&5\end{array}\right),\qquad\left(\begin{array}[]{rr}1&1\\ 0&1\\ -1&1\end{array}\right).
6.3.3.

Let AA be an invertible matrix, and let A=WΣVA=W\Sigma V^{*} be its singular value decomposition. Find a singular value decomposition for AA^{*} and A1A^{-1}.

6.3.4.

Find singular value decomposition A=WΣV\displaystyle A=W\Sigma V^{*} where VV and WW are unitary matrices for the following matrices:

  1. a)

    A=(316262)\displaystyle A=\left(\begin{array}[]{rr}-3&1\\ 6&-2\\ 6&-2\end{array}\right);

  2. b)

    A=(322232)\displaystyle A=\left(\begin{array}[]{rrr}3&2&2\\ 2&3&-2\end{array}\right).

6.3.5.

Find singular value decomposition of the matrix

A=(2302)A=\left(\begin{array}[]{rr}2&3\\ 0&2\\ \end{array}\right)

Use it to find

  1. a)

    max𝐱1A𝐱\max_{\|\mathbf{x}\|\leq 1}\|A\mathbf{x}\| and the vectors where the maximum is attained;

  2. b)

    min𝐱=1A𝐱\min_{\|\mathbf{x}\|=1}\|A\mathbf{x}\| and the vectors where the minimum is attained;

  3. c)

    the image A(B)A(B) of the closed unit ball in 2\mathbb{R}^{2}, B={𝐱2:𝐱1}B=\{\mathbf{x}\in\mathbb{R}^{2}:\|\mathbf{x}\|\leq 1\}. Describe A(B)A(B) geometrically.

6.3.6.

Show that for a square matrix AA, |detA|=det|A||\det A|=\det|A|.

6.3.7.

True or false

  1. a)

    Singular values of a matrix are also eigenvalues of the matrix.

  2. b)

    Singular values of a matrix AA are eigenvalues of AAA^{*}A.

  3. c)

    If s is a singular value of a matrix A and c is a scalar, then |c|s is a singular value of cA.

  4. d)

    The singular values of any linear operator are non-negative.

  5. e)

    Singular values of a self-adjoint matrix coincide with its eigenvalues.

6.3.8.

Let AA be an m×nm\times n matrix. Prove that non-zero eigenvalues of the matrices AAA^{*}A and AAAA^{*} (counting multiplicities) coincide.

Can you say when zero eigenvalue of AAA^{*}A and zero eigenvalue of AAAA^{*} have the same multiplicity?

6.3.9.

Let ss be the largest singular value of an operator AA, and let λ\lambda be the eigenvalue of AA with largest absolute value. Show that |λ|s|\lambda|\leq s.

6.3.10.

Show that the rank of a matrix is the number of its non-zero singular values (counting multiplicities).

6.3.11.

Show that the operator norm of a matrix AA coincides with its Frobenius norm if and only if the matrix has rank one. Hint: The previous problem might help.

6.3.12.

For the matrix AA

A=(2302),A=\left(\begin{array}[]{rr}2&-3\\ 0&2\\ \end{array}\right),

describe the inverse image of the unit ball, i.e. the set of all 𝐱2\mathbf{x}\in\mathbb{R}^{2} such that A𝐱1\|A\mathbf{x}\|\leq 1. Use singular value decomposition.

6.4. Applications of the singular value decomposition.

As we discussed above, the singular value decomposition is simply diagonalization with respect to two different orthonormal bases. Since we have two different bases here, we cannot say much about spectral properties of an operator from its singular value decomposition. For example, the diagonal entries of Σ\Sigma in the singular value decomposition (6.3.4) are not the eigenvalues of AA. Note, that for A=WΣVA=W\Sigma V^{*} as in (6.3.4) we generally have AnWΣnVA^{n}\neq W\Sigma^{n}V^{*}, so this diagonalization does not help us in computing functions of a matrix.

However, as the examples below show, singular values tell us a lot about so-called metric properties of a linear transformation.

Final remark: performing singular value decomposition requires finding eigenvalues and eigenvectors of the Hermitian (self-adjoint) matrix A^{*}A. To find eigenvalues we usually compute the characteristic polynomial, find its roots, and so on… This looks like quite a complicated process, especially if one takes into account that there is no general formula for the roots of polynomials of degree 5 and higher.

However, there are very effective numerical methods for finding the eigenvalues and eigenvectors of a Hermitian matrix up to any given precision. These methods do not involve computing the characteristic polynomial and finding its roots. They compute approximate eigenvalues and eigenvectors directly by an iterative procedure. Because a Hermitian matrix has an orthogonal basis of eigenvectors, these methods work extremely well.

We will not discuss these methods here; that goes beyond the scope of this book. However, you should believe me that there are very effective numerical methods for computing eigenvalues and eigenvectors of a Hermitian matrix and for finding the singular value decomposition. These methods are extremely effective, and just a little more computationally intensive than solving a linear system.

6.4.1. Image of the unit ball

Consider for example the following problem: let A:nmA:\mathbb{R}^{n}\to\mathbb{R}^{m} be a linear transformation, and let B={𝐱n:𝐱1}B=\{\mathbf{x}\in\mathbb{R}^{n}:\|\mathbf{x}\|\leq 1\} be the closed unit ball in n\mathbb{R}^{n}. We want to describe A(B)A(B), i.e. we want to find out how the unit ball is transformed under the linear transformation.

Let us first consider the simplest case when AA is a diagonal matrix A=diag{σ1,σ2,,σn}A=\operatorname{diag}\{\sigma_{1},\sigma_{2},\ldots,\sigma_{n}\}, σk>0\sigma_{k}>0, k=1,2,,nk=1,2,\ldots,n. Then for 𝐱=(x1,x2,,xn)T\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T} and (y1,y2,,yn)T=𝐲=A𝐱(y_{1},y_{2},\ldots,y_{n})^{T}=\mathbf{y}=A\mathbf{x} we have yk=σkxky_{k}=\sigma_{k}x_{k} (equivalently, xk=yk/σkx_{k}=y_{k}/\sigma_{k}) for k=1,2,,nk=1,2,\ldots,n, so

𝐲=(y1,y2,,yn)T=A𝐱for 𝐱1,\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}=A\mathbf{x}\qquad\text{for }\|% \mathbf{x}\|\leq 1,

if and only if the coordinates y1,y2,,yny_{1},y_{2},\ldots,y_{n} satisfy the inequality

\frac{y_{1}^{2}}{\sigma_{1}^{2}}+\frac{y_{2}^{2}}{\sigma_{2}^{2}}+\cdots+\frac{y_{n}^{2}}{\sigma_{n}^{2}}=\sum_{k=1}^{n}\frac{y_{k}^{2}}{\sigma_{k}^{2}}\leq 1

(this is simply the inequality 𝐱2=k|xk|21\|\mathbf{x}\|^{2}=\sum_{k}|x_{k}|^{2}\leq 1).

The set of points in n\mathbb{R}^{n} satisfying the above inequalities is called an ellipsoid. If n=2n=2 it is an ellipse with half-axes σ1\sigma_{1} and σ2\sigma_{2}, for n=3n=3 it is an ellipsoid with half-axes σ1,σ2\sigma_{1},\sigma_{2} and σ3\sigma_{3}. In n\mathbb{R}^{n} the geometry of this set is also easy to visualize, and we call that set an ellipsoid with half axes σ1,σ2,,σn\sigma_{1},\sigma_{2},\ldots,\sigma_{n}. The vectors 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} or, more precisely the corresponding lines are called the principal axes of the ellipsoid.

The singular value decomposition essentially says that any operator in an inner product space is diagonal with respect to a pair of orthonormal bases, see Remark 6.3.11. Namely, consider the orthonormal bases \mathcal{A}=\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and \mathcal{B}=\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{n} from the singular value decomposition (6.3.4). Then the matrix of A in these bases is diagonal

[A]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}}=\operatorname{diag}\{\sigma_{k}:k=1,2,\ldots,n\}.

Assuming that all σk>0\sigma_{k}>0 and essentially repeating the above reasoning, it is easy to show that any point 𝐲=A𝐱A(B)\mathbf{y}=A\mathbf{x}\in A(B) if and only if it satisfies the inequality

\frac{y_{1}^{2}}{\sigma_{1}^{2}}+\frac{y_{2}^{2}}{\sigma_{2}^{2}}+\cdots+\frac{y_{n}^{2}}{\sigma_{n}^{2}}=\sum_{k=1}^{n}\frac{y_{k}^{2}}{\sigma_{k}^{2}}\leq 1.

where y1,y2,,yny_{1},y_{2},\ldots,y_{n} are coordinates of 𝐲\mathbf{y} in the orthonormal basis =𝐰1,𝐰2,,𝐰n\mathcal{B}=\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{n}, not in the standard one. Similarly, (x1,x2,,xn)T=[𝐱]𝒜(x_{1},x_{2},\ldots,x_{n})^{T}=[\mathbf{x}]_{{}_{\scriptstyle\mathcal{A}}}.

But that is essentially the same ellipsoid as before, only “rotated” (with different but still orthogonal principal axes)!

There is also an alternative explanation which is presented below.

Consider the general case, when the matrix AA is not necessarily square, and (or) not all singular values are non-zero. Consider first the case of a “diagonal” matrix Σ\Sigma of form (6.3.5). It is easy to see that the image ΣB\Sigma B of the unit ball BB is the ellipsoid (not in the whole space but in the RanΣ\operatorname{Ran}\Sigma) with half axes σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r}.

Consider now the general case, A=WΣVA=W\Sigma V^{*}, where VV, WW are unitary operators. Unitary transformations do not change the unit ball (because they preserve norm), so V(B)=BV^{*}(B)=B. We know that Σ(B)\Sigma(B) is an ellipsoid in RanΣ\operatorname{Ran}\Sigma with half-axes σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r}. Unitary transformations do not change geometry of objects, so W(Σ(B))W(\Sigma(B)) is also an ellipsoid with the same half-axes. It is not hard to see from the decomposition A=WΣVA=W\Sigma V^{*} (using the fact that both WW and VV^{*} are invertible) that WW transforms RanΣ\operatorname{Ran}\Sigma to RanA\operatorname{Ran}A, so we can conclude:

the image A(B)A(B) of the closed unit ball BB is an ellipsoid in RanA\operatorname{Ran}A with half axes σ1,σ2,,σr\sigma_{1},\sigma_{2},\ldots,\sigma_{r}. Here rr is the number of non-zero singular values, i.e. the rank of AA.
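
One can check this numerically: map points of the unit circle by A, write the images in coordinates with respect to \mathbf{w}_{1},\mathbf{w}_{2}, and verify that they lie on the ellipse with half-axes \sigma_{1},\sigma_{2}. A rough sketch (Python with NumPy assumed; not part of the text):

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])
W, s, Vh = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
image = A @ circle                                   # their images under A

y = W.T @ image    # coordinates of the images in the orthonormal basis w_1, w_2
# Every image of a unit vector lies on the boundary ellipse (y_1/s_1)^2 + (y_2/s_2)^2 = 1.
assert np.allclose((y[0] / s[0]) ** 2 + (y[1] / s[1]) ** 2, 1.0)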

6.4.2. Operator norm of a linear transformation

Given a linear transformation A:XYA:X\to Y let us consider the following optimization problem: find the maximum of A𝐱\|A\mathbf{x}\| on the closed unit ball B={𝐱X:𝐱1}B=\{\mathbf{x}\in X:\|\mathbf{x}\|\leq 1\}.

Again, singular value decomposition allows us to solve the problem. For a “diagonal” matrix A (like the matrix \Sigma in the definition of the singular value decomposition) with non-negative entries the maximum is exactly the maximal diagonal entry. Indeed, let s_{1},s_{2},\ldots,s_{r} be the non-zero diagonal entries of A and let s_{1} be the maximal one. Since for \mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}

(6.4.1) A𝐱=k=1rskxk𝐞k,A\mathbf{x}=\sum_{k=1}^{r}\ s_{k}x_{k}\mathbf{e}_{k},

we can conclude that

\|A\mathbf{x}\|^{2}=\sum_{k=1}^{r}s_{k}^{2}|x_{k}|^{2}\leq s_{1}^{2}\sum_{k=1}^{r}|x_{k}|^{2}=s_{1}^{2}\cdot\|\mathbf{x}\|^{2},

so A𝐱s1𝐱\|A\mathbf{x}\|\leq s_{1}\|\mathbf{x}\|. On the other hand, A𝐞1=s1𝐞1=s1𝐞1\|A\mathbf{e}_{1}\|=\|s_{1}\mathbf{e}_{1}\|=s_{1}\|\mathbf{e}_{1}\|, so indeed s1s_{1} is the maximum of A𝐱\|A\mathbf{x}\| on the closed unit ball BB. Note, that in the above reasoning we did not assume that the matrix AA is square; we only assumed that all entries outside the “main diagonal” are 0, so formula (6.4.1) holds.

To treat the general case let us consider the singular value decomposition (6.3.4), A=WΣVA=W\Sigma V^{*}, where WW, VV are unitary operators, and Σ\Sigma is the “diagonal” matrix with non-negative entries. Since unitary transformations do not change the norm, one can conclude that the maximum of A𝐱\|A\mathbf{x}\| on the unit ball BB is the maximal diagonal entry of Σ\Sigma i.e. that

the maximum of A𝐱\|A\mathbf{x}\| on the unit ball BB is the maximal singular value of AA.

Definition.

The quantity max{A𝐱:𝐱X,𝐱1}\max\{\|A\mathbf{x}\|:\mathbf{x}\in X,\|\mathbf{x}\|\leq 1\} is called the operator norm of AA and denoted A\|A\|.

It is an easy exercise to see that A\|A\| satisfies all properties of the norm:

  1. 1.

    αA=|α|A\|\alpha A\|=|\alpha|\cdot\|A\|;

  2. 2.

    A+BA+B\|A+B\|\leq\|A\|+\|B\|;

  3. 3.

    A0\|A\|\geq 0 for all AA;

  4. 4.

    A=0\|A\|=0 if and only if A=𝟎A=\mathbf{0},

so it is indeed a norm on a space of linear transformations from XX to YY.

One of the main properties of the operator norm is the inequality

A𝐱A𝐱,\|A\mathbf{x}\|\leq\|A\|\cdot\|\mathbf{x}\|,

which follows easily from the homogeneity of the norm 𝐱\|\mathbf{x}\|.

In fact, it can be shown that the operator norm A\|A\| is the best (smallest) number C0C\geq 0 such that

A𝐱C𝐱𝐱X.\|A\mathbf{x}\|\leq C\|\mathbf{x}\|\qquad\forall\mathbf{x}\in X.

This is often used as a definition of the operator norm.

On the space of linear transformations we already have one norm, the Frobenius, or Hilbert-Schmidt norm A2\|A\|_{2},

A22=trace(AA).\|A\|_{2}^{2}=\operatorname{trace}(A^{*}A).

So, let us investigate how these two norms compare.

Let s1,s2,,srs_{1},s_{2},\ldots,s_{r} be non-zero singular values of AA (counting multiplicities), and let s1s_{1} be the largest singular value. Then s12,s22,,sr2s^{2}_{1},s^{2}_{2},\ldots,s^{2}_{r} are non-zero eigenvalues of AAA^{*}A (again counting multiplicities). Recalling that the trace equals the sum of the eigenvalues we conclude that

A22=trace(AA)=k=1rsk2.\|A\|_{2}^{2}=\operatorname{trace}(A^{*}A)=\sum_{k=1}^{r}s_{k}^{2}.

On the other hand we know that the operator norm of AA equals its largest singular value, i.e. A=s1\|A\|=s_{1}. So we can conclude that AA2\|A\|\leq\|A\|_{2}, i.e. that

the operator norm of a matrix cannot be more than its Frobenius norm.

This statement also admits a direct proof using the Cauchy–Schwarz inequality, and such a proof is presented in some textbooks. The beauty of the proof we presented here is that it does not require any computations and illuminates the reasons behind the inequality.
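
A quick numerical illustration of both facts (Python with NumPy assumed; not part of the text): the operator norm is the largest singular value, and it never exceeds the Frobenius norm.

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])

s = np.linalg.svd(A, compute_uv=False)    # singular values, largest first
op_norm = np.linalg.norm(A, 2)            # operator (spectral) norm
fro_norm = np.linalg.norm(A, 'fro')       # Frobenius norm, denoted ||A||_2 in the text

assert np.isclose(op_norm, s[0])                       # ||A|| = s_1
assert np.isclose(fro_norm, np.sqrt(np.sum(s ** 2)))   # ||A||_2^2 = sum of s_k^2
assert op_norm <= fro_norm + 1e-12                     # ||A|| <= ||A||_2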

6.4.3. Condition number of a matrix

Suppose we have an invertible matrix AA and we want to solve the equation A𝐱=𝐛A\mathbf{x}=\mathbf{b}. The solution, of course, is given by 𝐱=A1𝐛\mathbf{x}=A^{-1}\mathbf{b}, but we want to investigate what happens if we know the data only approximately.

That happens in real life, when the data is obtained, for example, from experiments. But even if we have exact data, round-off errors during computations by a computer may have the same effect of distorting the data.

Let us consider the simplest model: suppose there is a small error in the right side of the equation. That means that, instead of the equation A\mathbf{x}=\mathbf{b}, we are solving

A𝐱=𝐛+Δ𝐛,A\mathbf{x}=\mathbf{b}+{\scriptstyle\Delta}\mathbf{b},

where Δ𝐛{\scriptstyle\Delta}\mathbf{b} is a small perturbation of the right side 𝐛\mathbf{b}.

So, instead of the exact solution 𝐱\mathbf{x} of A𝐱=𝐛A\mathbf{x}=\mathbf{b} we get the approximate solution 𝐱+Δ𝐱\mathbf{x}+{\scriptstyle\Delta}\mathbf{x} of A(𝐱+Δ𝐱)=𝐛+Δ𝐛A(\mathbf{x}+{\scriptstyle\Delta}\mathbf{x})=\mathbf{b}+{\scriptstyle\Delta}% \mathbf{b}. We are assuming that AA is invertible, so Δ𝐱=A1Δ𝐛{\scriptstyle\Delta}\mathbf{x}=A^{-1}{\scriptstyle\Delta}\mathbf{b}.

We want to know how big is the relative error in the solution Δ𝐱/𝐱\|{\scriptstyle\Delta}\mathbf{x}\|/\|\mathbf{x}\| in comparison with the relative error in the right side Δ𝐛/𝐛\|{\scriptstyle\Delta}\mathbf{b}\|/\|\mathbf{b}\|. It is easy to see that

\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}=\frac{\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{x}\|}=\frac{\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}\frac{\|\mathbf{b}\|}{\|\mathbf{x}\|}=\frac{\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}.

Since A1Δ𝐛A1Δ𝐛\|A^{-1}{\scriptstyle\Delta}\mathbf{b}\|\leq\|A^{-1}\|\cdot\|{\scriptstyle% \Delta}\mathbf{b}\| and A𝐱A𝐱\|A\mathbf{x}\|\leq\|A\|\cdot\|\mathbf{x}\| we can conclude that

\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}\leq\|A^{-1}\|\cdot\|A\|\cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}.

The quantity AA1\|A\|\cdot\|A^{-1}\| is called the condition number of the matrix AA. It estimates how the relative error in the solution 𝐱\mathbf{x} depends on the relative error in the right side 𝐛\mathbf{b}.

Let us see how this quantity is related to singular values. Let the numbers s1,s2,,sns_{1},s_{2},\ldots,s_{n} be the singular values of AA, and let us assume that s1s_{1} is the largest singular value and sns_{n} is the smallest. We know that the (operator) norm of an operator equals its largest singular value, so

A=s1,A1=1sn,\|A\|=s_{1},\qquad\|A^{-1}\|=\frac{1}{s_{n}},

so

AA1=s1sn.\|A\|\cdot\|A^{-1}\|=\frac{s_{1}}{s_{n}}.

In other words

The condition number of a matrix equals the ratio of its largest and smallest singular values.

We deduced above that \frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}\leq\|A^{-1}\|\cdot\|A\|\cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}. It is not hard to see that this estimate is sharp, i.e. that it is possible to pick the right side \mathbf{b} and the error {\scriptstyle\Delta}\mathbf{b} such that we have equality

\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}=\|A^{-1}\|\cdot\|A\|\cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|}.

We just put \mathbf{b}=\mathbf{w}_{1} and {\scriptstyle\Delta}\mathbf{b}=\alpha\mathbf{w}_{n}, where \mathbf{w}_{1} and \mathbf{w}_{n} are respectively the first and the last columns of the matrix W in the singular value decomposition A=W\Sigma V^{*}, and \alpha\neq 0 is an arbitrary scalar. Here, as usual, the singular values are assumed to be in non-increasing order s_{1}\geq s_{2}\geq\ldots\geq s_{n}, so s_{1} is the largest and s_{n} is the smallest singular value.

We leave the details as an exercise for the reader.
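
Here is a numerical sketch of the computation hinted at above (Python with NumPy assumed; not part of the text): with \mathbf{b}=\mathbf{w}_{1} and {\scriptstyle\Delta}\mathbf{b}=\alpha\mathbf{w}_{n} the relative error is amplified exactly by the factor s_{1}/s_{n}.

import numpy as np

A = np.array([[2., 3.],
              [0., 2.]])
W, s, Vh = np.linalg.svd(A)                  # s[0] = s_1 >= s[-1] = s_n > 0

cond = s[0] / s[-1]
assert np.isclose(cond, np.linalg.cond(A))   # NumPy's condition number (2-norm)

b, db = W[:, 0], 1e-3 * W[:, -1]             # b = w_1, Delta b = alpha w_n
x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, db)

# The relative error in x divided by the relative error in b equals the condition number.
ratio = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))
assert np.isclose(ratio, cond)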

A matrix is called well conditioned if its condition number is not too big. If the condition number is big, the matrix is called ill conditioned. What is “big” here depends on the problem: with what precision you can find your right side, what precision is required for the solution, etc.

6.4.4. Effective rank of a matrix

Theoretically, the rank of a matrix is easy to compute: one just needs to row reduce the matrix and count the pivots. However, in practical applications not everything is so easy. The main reason is that very often we do not know the exact matrix, we only know its approximation up to some precision.

Moreover, even if we know the exact matrix, most computer programs introduce round-off errors in the computations, so effectively we cannot distinguish between a zero pivot and a very small pivot.

A simple naïve idea for dealing with round-off errors is as follows. When computing the rank (and other objects related to it, like the column space, kernel, etc.) one simply sets up a tolerance (some small number), and if a pivot is smaller than the tolerance, one counts it as zero. The advantage of this approach is its simplicity, since it is very easy to program. However, the main disadvantage is that it is hard to see what exactly the choice of tolerance controls. For example, what do we lose if we set the tolerance equal to 10610^{-6}? How much better will 10810^{-8} be?

While the above approach works well for well conditioned matrices, it is not very reliable in the general case.

A better approach is to use singular values. It requires more computations, but gives much better results, which are easier to interpret. In this approach we also set up some small number as a tolerance, and then perform the singular value decomposition. Then we simply treat singular values smaller than the tolerance as zero. The advantage of this approach is that we can see what we are doing. The singular values are the half-axes of the ellipsoid A(B)A(B) (BB is the closed unit ball), so by setting up the tolerance we are just deciding how “thin” the ellipsoid should be to be considered “flat”.
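A minimal sketch of this computation of the effective (numerical) rank; the tolerance 10^{-8} and the test matrix are illustrative choices only:

```python
import numpy as np

def effective_rank(A, tol=1e-8):
    """Count the singular values of A that exceed the tolerance."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

# A matrix that has rank 2 exactly, but rank 1 "up to precision 1e-12"
A = np.array([[1.0, 1.0],
              [2.0, 2.0 + 1e-12]])
print(effective_rank(A))               # 1: the second singular value is ~1e-13
print(effective_rank(A, tol=1e-15))    # 2: with a tiny tolerance we keep both
```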

6.4.5. Moore–Penrose (pseudo)inverse.

As we discussed in Section 5.4 of Chapter 5 above, the least square solution gives us, in the case when an equation A𝐱=𝐛A\mathbf{x}=\mathbf{b} does not have a solution, the “next best thing” (and gives us the solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b} when it exists).

Note, that the question of uniqueness is not addressed by the least square solution: a solution of the normal equation AA𝐱=A𝐛A^{*}A\mathbf{x}=A^{*}\mathbf{b} does not have to be unique. A natural distinguished solution would be a solution of minimal norm; such a solution is indeed unique, and can be obtained by taking an arbitrary solution and then taking its projection onto (KerAA)=(KerA)(\operatorname{Ker}A^{*}A)^{\perp}=(\operatorname{Ker}A)^{\perp}, see Exercises 5.4.5 and 5.4.6 in Chapter 5.

It is not hard to see that if A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*} is a reduced singular value decomposition of AA, then the minimal norm least square solution 𝐱0\mathbf{x}_{0} is given by

(6.4.2) 𝐱0=V~Σ~1W~𝐛.\displaystyle\mathbf{x}_{0}=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^% {*}\mathbf{b}.

Indeed, 𝐱0\mathbf{x}_{0} is a least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b} (i.e. a solution of A𝐱=PRanA𝐛A\mathbf{x}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b}):

A𝐱0=W~Σ~V~V~Σ~1W~𝐛=W~Σ~Σ~1W~𝐛=W~W~𝐛=PRanA𝐛;A\mathbf{x}_{0}=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*}\widetilde{V}% \widetilde{\Sigma}^{-1}\widetilde{W}^{*}\mathbf{b}=\widetilde{W}\widetilde{% \Sigma}\widetilde{\Sigma}^{-1}\widetilde{W}^{*}\mathbf{b}=\widetilde{W}% \widetilde{W}^{*}\mathbf{b}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b};

in the last equality in the chain we used the fact that W~W~=PRanW~\widetilde{W}\widetilde{W}^{*}=P_{{}_{\scriptstyle\operatorname{Ran}\widetilde% {W}}} (PRanW~=W~(W~W~)1W~=W~W~P_{{}_{\scriptstyle\operatorname{Ran}\widetilde{W}}}=\widetilde{W}(\widetilde{% W}^{*}\widetilde{W})^{-1}\widetilde{W}^{*}=\widetilde{W}\widetilde{W}^{*}) and that RanW~=RanA\operatorname{Ran}\widetilde{W}=\operatorname{Ran}A (see Exercise 6.4.4 below).

The general solution of A𝐱=PRanA𝐛A\mathbf{x}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b} is given by

𝐱=𝐱0+𝐲,𝐲KerA.\mathbf{x}=\mathbf{x}_{0}+\mathbf{y},\qquad\mathbf{y}\in\operatorname{Ker}A.

Note that 𝐱0RanW~\mathbf{x}_{0}\in\operatorname{Ran}\widetilde{W} and RanW~=RanA\operatorname{Ran}\widetilde{W}=\operatorname{Ran}A^{*}, see again Exercise 6.4.4 below. Therefore 𝐱0KerA\mathbf{x}_{0}\perp\operatorname{Ker}A (because KerA=(RanA)\operatorname{Ker}A=(\operatorname{Ran}A^{*})^{\perp}, see Theorem 5.5.1 in Chapter 5), so 𝐱0\mathbf{x}_{0} is indeed the unique minimal norm solution of A𝐱=PRanA𝐛A\mathbf{x}=P_{{}_{\scriptstyle\operatorname{Ran}A}}\mathbf{b}, or equivalently, the minimal norm least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b}.
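Here is a short numerical sketch of formula (6.4.2) (the rank-deficient matrix and the right side are arbitrary illustrations): the reduced SVD reproduces the minimal norm least square solution that NumPy's least squares routine returns.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])           # rank 1, so A x = b has no solution for this b
b = np.array([1.0, 0.0, 3.0])

# Reduced SVD: keep only the r = rank(A) nonzero singular values
W, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))
W_t, S_t, V_t = W[:, :r], np.diag(s[:r]), Vt[:r, :].T

x0 = V_t @ np.linalg.inv(S_t) @ W_t.T @ b        # formula (6.4.2)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # minimal norm least square solution
print(x0, x_lstsq)                               # the two answers coincide
```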

Definition 6.4.1.

The operator A+:=V~Σ~1W~A^{+}:=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^{*}, where A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*} is a reduced singular value decomposition of AA, is called the Moore–Penrose inverse (or Moore–Penrose pseudoinverse) of the operator AA. In other words, the Moore–Penrose inverse is the operator giving the unique minimal norm least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b}.

Remark 6.4.2.

In the literature the Moore–Penrose inverse is usually defined as a matrix A+A^{+} such that

  1. 1.

    AA+A=AAA^{+}A=A;

  2. 2.

    A+AA+=A+A^{+}AA^{+}=A^{+};

  3. 3.

    (AA+)=AA+(AA^{+})^{*}=AA^{+};

  4. 4.

    (A+A)=A+A(A^{+}A)^{*}=A^{+}A.

It is very easy to check that the operator A+:=V~Σ~1W~A^{+}:=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^{*} satisfies properties 1–4 above.
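The check is also easy to reproduce numerically; a sketch (the rank-deficient matrix is an arbitrary example, and in the real case the adjoint is just the transpose):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # a rank 1 matrix
Ap = np.linalg.pinv(A)               # Moore-Penrose inverse computed via the SVD

print(np.allclose(A @ Ap @ A, A))        # property 1
print(np.allclose(Ap @ A @ Ap, Ap))      # property 2
print(np.allclose((A @ Ap).T, A @ Ap))   # property 3
print(np.allclose((Ap @ A).T, Ap @ A))   # property 4
```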

It is also possible (although a bit harder) to show that an operator A+A^{+} satisfying properties 1–4 is unique. Indeed, multiplying identity 1 by A+A^{+} on the left and on the right, we get that (A+A)2=A+A(A^{+}A)^{2}=A^{+}A and (AA+)2=AA+(AA^{+})^{2}=AA^{+} respectively; together with properties 3 and 4 this means that A+AA^{+}A and AA+AA^{+} are orthogonal projections (see Exercise 5.5.6 in Chapter 5).

Trivially, KerAKerA+A\operatorname{Ker}A\subset\operatorname{Ker}A^{+}A. On the other hand, identity 1 implies that KerA+AKerA\operatorname{Ker}A^{+}A\subset\operatorname{Ker}A (why?), so KerA+A=KerA\operatorname{Ker}A^{+}A=\operatorname{Ker}A. But this means that A+AA^{+}A is the orthogonal projection onto (KerA)=RanA(\operatorname{Ker}A)^{\perp}=\operatorname{Ran}A^{*},

A+A=PRanA.A^{+}A=P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}.

Property 1 also implies that AA+𝐲=𝐲AA^{+}\mathbf{y}=\mathbf{y} for all 𝐲RanA\mathbf{y}\in\operatorname{Ran}A. Since AA+AA^{+} is an orthogonal projection, we conclude that RanARanAA+\operatorname{Ran}A\subset\operatorname{Ran}AA^{+}. The opposite inclusion RanAA+RanA\operatorname{Ran}AA^{+}\subset\operatorname{Ran}A is trivial, so AA+AA^{+} is the orthogonal projection onto RanA\operatorname{Ran}A,

AA+=PRanA.AA^{+}=P_{{}_{\scriptstyle\operatorname{Ran}A}}.

Knowing A+AA^{+}A and AA+AA^{+} we can rewrite property 2 as

PRanAA+=A+orA+PRanA=A+.P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}A^{+}=A^{+}\qquad\text{or}\qquad A% ^{+}P_{{}_{\scriptstyle\operatorname{Ran}A}}=A^{+}\,.

Combining the above identities we get

PRanAA+PRanA=A+.P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}A^{+}P_{{}_{\scriptstyle% \operatorname{Ran}A}}=A^{+}.

Finally, for any 𝐛\mathbf{b} in the target space of AA

𝐱0:=A+𝐛=PRanAA+𝐛RanA\mathbf{x}_{0}:=A^{+}\mathbf{b}=P_{{}_{\scriptstyle\operatorname{Ran}A^{*}}}A^% {+}\mathbf{b}\in\operatorname{Ran}A^{*}

and

A𝐱0=AA+𝐛=PRanA𝐛,A\mathbf{x}_{0}=AA^{+}\mathbf{b}=P_{{}_{\scriptstyle\operatorname{Ran}A}}% \mathbf{b},

i.e. 𝐱0\mathbf{x}_{0} is a least square solution of A𝐱=𝐛A\mathbf{x}=\mathbf{b}. Since 𝐱0RanA=(KerA)\mathbf{x}_{0}\in\operatorname{Ran}A^{*}=(\operatorname{Ker}A)^{\perp}, 𝐱0\mathbf{x}_{0} is, as we discussed above, the least square solution of minimal norm. But, as we have shown before, such a minimal norm solution is given by (6.4.2), so A+=V~Σ~1W~A^{+}=\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^{*}. ∎

Exercises.

6.4.1.

Find norms and condition numbers for the following matrices:

  1. a)

    A=(4013)\displaystyle A=\left(\begin{array}[]{rr}4&0\\ 1&3\end{array}\right).

  2. b)

    A=(5333)\displaystyle A=\left(\begin{array}[]{rr}5&3\\ -3&3\end{array}\right).

For the matrix AA from part a) present an example of the right side 𝐛\mathbf{b} and the error Δ𝐛{\scriptstyle\Delta}\mathbf{b} such that

Δ𝐱𝐱=AA1Δ𝐛𝐛;\frac{\|{\scriptstyle\Delta}\mathbf{x}\|}{\|\mathbf{x}\|}=\|A\|\cdot\|A^{-1}\|% \cdot\frac{\|{\scriptstyle\Delta}\mathbf{b}\|}{\|\mathbf{b}\|};

here A𝐱=𝐛A\mathbf{x}=\mathbf{b} and AΔ𝐱=Δ𝐛A{\scriptstyle\Delta}\mathbf{x}={\scriptstyle\Delta}\mathbf{b}.

6.4.2.

Let AA be a normal operator, and let λ1,λ2,,λn\lambda_{1},\lambda_{2},\ldots,\lambda_{n} be its eigenvalues (counting multiplicities). Show that singular values of AA are |λ1|,|λ2|,,|λn||\lambda_{1}|,|\lambda_{2}|,\ldots,|\lambda_{n}|.

6.4.3.

Find singular values, norm and condition number of the matrix

A=(211121112)A=\left(\begin{array}[]{rrr}2&1&1\\ 1&2&1\\ 1&1&2\end{array}\right)

You can do this problem practically without any computations, if you use the previous problem and can answer the following questions:

  1. a)

    What are singular values (eigenvalues) of an orthogonal projection PEP_{E} onto some subspace EE?

  2. b)

    What is the matrix of the orthogonal projection onto the subspace spanned by the vector (1,1,1)T(1,1,1)^{T}?

  3. c)

    How are the eigenvalues of the operators TT and aT+bIaT+bI, where aa and bb are scalars, related?

Of course, you can also just honestly do the computations.

6.4.4.

Let A=W~Σ~V~A=\widetilde{W}\widetilde{\Sigma}\widetilde{V}^{*} be a reduced singular value decomposition of AA. Show that RanA=RanW~\operatorname{Ran}A=\operatorname{Ran}\widetilde{W}, and then by taking adjoint that RanA=RanV~\operatorname{Ran}A^{*}=\operatorname{Ran}\widetilde{V}.

6.4.5.

Write a formula for the Moore–Penrose inverse A+A^{+} in terms of the singular value decomposition A=WΣVA=W\Sigma V^{*}.

6.4.6.

Tychonov’s regularization: Prove that the Moore–Penrose inverse A+A^{+} can be computed as the limits

A+=limε0+(AA+εI)1A=limε0+A(AA+εI)1.A^{+}=\lim_{\varepsilon\to 0+}(A^{*}A+\varepsilon I)^{-1}A^{*}=\lim_{% \varepsilon\to 0+}A^{*}(AA^{*}+\varepsilon I)^{-1}.
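This is not a proof, but the limits are easy to observe numerically; a sketch (the matrix is an arbitrary example, and ε = 10^{-10} stands in for the limit ε → 0+):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [1.0, 1.0]])
Ap = np.linalg.pinv(A)

eps = 1e-10
reg1 = np.linalg.inv(A.T @ A + eps * np.eye(2)) @ A.T   # (A*A + eps I)^{-1} A*
reg2 = A.T @ np.linalg.inv(A @ A.T + eps * np.eye(3))   # A* (AA* + eps I)^{-1}
print(np.allclose(reg1, Ap, atol=1e-6))
print(np.allclose(reg2, Ap, atol=1e-6))
```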

6.5. Structure of orthogonal matrices

An orthogonal matrix UU with detU=1\det U=1 is often called a rotation. The theorem below explains this name.

Theorem 6.5.1.

Let UU be an orthogonal operator in n\mathbb{R}^{n} and let detU=1\det U=1 (recall that for an orthogonal matrix UU we have detU=±1\det U=\pm 1). Then there exists an orthonormal basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} such that the matrix of UU in this basis has the block diagonal form

(Rφ1Rφ20Rφk0In2k),\left(\begin{array}[]{ccccc}R_{\varphi_{1}}&&&&\\[-10.76385pt] &R_{\varphi_{2}}&&&{\raisebox{6.45831pt}{\Huge$0$}}\\ &&\ddots&&\\ &&&R_{\varphi_{k}}&\\[-10.76385pt] {\raisebox{6.45831pt}{\Huge$0$}}&&&&I_{n-2k}\\ \end{array}\right),

where RφkR_{\varphi_{k}} are 22-dimensional rotations,

Rφk=(cosφksinφksinφkcosφk)R_{\varphi_{k}}=\left(\begin{array}[]{cc}\cos\varphi_{k}&-\sin\varphi_{k}\\ \sin\varphi_{k}&\cos\varphi_{k}\\ \end{array}\right)

and In2kI_{n-2k} stands for the identity matrix of size (n2k)×(n2k)(n-2k)\times(n-2k).

Proof.

We know that if pp is a polynomial with real coefficients and λ\lambda is its complex root, p(λ)=0p(\lambda)=0, then λ¯\overline{\lambda} is a root of pp as well, p(λ¯)=0p(\overline{\lambda})=0 (this can easily be checked by plugging λ¯\overline{\lambda} into p(z)=k=0nakzkp(z)=\sum_{k=0}^{n}a_{k}z^{k}).

Therefore, all non-real eigenvalues of a real matrix can be split into complex conjugate pairs λk\lambda_{k}, λ¯k\overline{\lambda}_{k}.

We know, that eigenvalues of a unitary matrix have absolute value 11, so all complex eigenvalues of UU (which is both real and unitary) can be written as pairs λk=cosαk+isinαk\lambda_{k}=\cos\alpha_{k}+i\sin\alpha_{k}, λ¯k=cosαkisinαk\overline{\lambda}_{k}=\cos\alpha_{k}-i\sin\alpha_{k}.

Fix a pair of complex eigenvalues λ\lambda and λ¯\overline{\lambda}, and let 𝐮n\mathbf{u}\in\mathbb{C}^{n} be the eigenvector of UU, U𝐮=λ𝐮U\mathbf{u}=\lambda\mathbf{u}. Then U𝐮¯=λ¯𝐮¯U\overline{\mathbf{u}}=\overline{\lambda}\overline{\mathbf{u}}. Now, split 𝐮\mathbf{u} into real and imaginary parts, i.e. define

𝐱:=Re𝐮=(𝐮+𝐮¯)/2,𝐲=Im𝐮=(𝐮𝐮¯)/(2i),\mathbf{x}:=\operatorname{Re}\mathbf{u}=(\mathbf{u}+\overline{\mathbf{u}})/2,% \qquad\mathbf{y}=\operatorname{Im}\mathbf{u}=(\mathbf{u}-\overline{\mathbf{u}}% )/(2i),

so 𝐮=𝐱+i𝐲\mathbf{u}=\mathbf{x}+i\mathbf{y} (note, that 𝐱,𝐲\mathbf{x},\mathbf{y} are real vectors, i.e. vectors with real entries). Then

U𝐱=U12(𝐮+𝐮¯)=12(λ𝐮+λ¯𝐮¯)=Re(λ𝐮).U\mathbf{x}=U\frac{1}{2}(\mathbf{u}+\overline{\mathbf{u}})=\frac{1}{2}(\lambda% \mathbf{u}+\overline{\lambda}\overline{\mathbf{u}})=\operatorname{Re}(\lambda% \mathbf{u}).

Similarly,

U𝐲=12iU(𝐮𝐮¯)=12i(λ𝐮λ¯𝐮¯)=Im(λ𝐮).U\mathbf{y}=\frac{1}{2i}U(\mathbf{u}-\overline{\mathbf{u}})=\frac{1}{2i}(% \lambda\mathbf{u}-\overline{\lambda}\overline{\mathbf{u}})=\operatorname{Im}(% \lambda\mathbf{u}).

Since λ=cosα+isinα\lambda=\cos\alpha+i\sin\alpha, we have

λ𝐮=(cosα+isinα)(𝐱+i𝐲)=((cosα)𝐱(sinα)𝐲)+i((cosα)𝐲+(sinα)𝐱).\lambda\mathbf{u}=(\cos\alpha+i\sin\alpha)(\mathbf{x}+i\mathbf{y})=((\cos% \alpha)\mathbf{x}-(\sin\alpha)\mathbf{y})+i((\cos\alpha)\mathbf{y}+(\sin\alpha% )\mathbf{x}).

so

U𝐱=Re(λ𝐮)=(cosα)𝐱(sinα)𝐲,U𝐲=Im(λ𝐮)=(cosα)𝐲+(sinα)𝐱.U\mathbf{x}=\operatorname{Re}(\lambda\mathbf{u})=(\cos\alpha)\mathbf{x}-(\sin% \alpha)\mathbf{y},\qquad U\mathbf{y}=\operatorname{Im}(\lambda\mathbf{u})=(% \cos\alpha)\mathbf{y}+(\sin\alpha)\mathbf{x}.

In other words, UU leaves the 2-dimensional subspace EλE_{\lambda} spanned by the vectors 𝐱,𝐲\mathbf{x},\mathbf{y} invariant and the matrix of the restriction of UU onto this subspace is the rotation matrix

Rα=(cosαsinαsinαcosα).R_{-\alpha}=\left(\begin{array}[]{cc}\cos\alpha&\sin\alpha\\ -\sin\alpha&\cos\alpha\end{array}\right).

Note, that the vectors 𝐮\mathbf{u} and 𝐮¯\overline{\mathbf{u}} (eigenvectors of a unitary matrix, corresponding to different eigenvalues) are orthogonal, so by the Pythagorean Theorem

𝐱=𝐲=22𝐮.\|\mathbf{x}\|=\|\mathbf{y}\|=\frac{\sqrt{2}}{2}\|\mathbf{u}\|.

It is easy to check that 𝐱𝐲\mathbf{x}\perp\mathbf{y}, so 𝐱,𝐲\mathbf{x},\mathbf{y} is an orthogonal basis in EλE_{\lambda}. If we multiply each vector in the basis 𝐱,𝐲\mathbf{x},\mathbf{y} by the same non-zero number, we do not change matrices of linear transformations, so without loss of generality we can assume that 𝐱=𝐲=1\|\mathbf{x}\|=\|\mathbf{y}\|=1 i.e. that 𝐱,𝐲\mathbf{x},\mathbf{y} is an orthonormal basis in EλE_{\lambda}.

Let us complete the orthonormal system 𝐯1=𝐱,𝐯2=𝐲\mathbf{v}_{1}=\mathbf{x},\mathbf{v}_{2}=\mathbf{y} to an orthonormal basis in n\mathbb{R}^{n}. Since UEλEλUE_{\lambda}\subset E_{\lambda}, i.e. EλE_{\lambda} is an invariant subspace of UU, the matrix of UU in this basis has the block triangular form

(Rα𝟎U1)\left(\begin{array}[]{c|ccc}R_{-\alpha}&&*&\\ \hline\cr&&&\\ \mathbf{0}&&U_{1}&\\ &&&\\ \end{array}\right)

where 𝟎\mathbf{0} stands for the (n2)×2(n-2)\times 2 block of zeroes.

Since the rotation matrix RαR_{-\alpha} is invertible, we have UEλ=EλUE_{\lambda}=E_{\lambda}. Therefore

UEλ=U1Eλ=Eλ,U^{*}E_{\lambda}=U^{-1}E_{\lambda}=E_{\lambda},

so the matrix of UU in the basis we constructed is in fact block diagonal,

(Rα𝟎𝟎U1).\left(\begin{array}[]{c|ccc}R_{-\alpha}&&\mathbf{0}&\\ \hline\cr&&&\\ \mathbf{0}&&U_{1}&\\ &&&\\ \end{array}\right).

Since UU is unitary

I=UU=(I𝟎𝟎U1U1),I=U^{*}U=\left(\begin{array}[]{c|ccc}I&&\mathbf{0}&\\ \hline\cr&&&\\ \mathbf{0}&&U_{1}^{*}U_{1}&\\ &&&\\ \end{array}\right),

so, since U1U_{1} is square, it is also unitary.

If U1U_{1} has complex eigenvalues we can apply the same procedure to decrease its size by 2 until we are left with a block that has only real eigenvalues. Real eigenvalues can be only +1+1 or 1-1, so in some orthonormal basis the matrix of UU has the form

(Rα1Rα20RαdIr0Il);\left(\begin{array}[]{cccccc}R_{-\alpha_{1}}&&&&&\\[-10.76385pt] &R_{-\alpha_{2}}&&&{\raisebox{6.45831pt}{\Huge$0$}}&\\ &&\ddots&&&\\ &&&R_{-\alpha_{d}}&&\\ &&&&-I_{r}&\\[-10.76385pt] {\raisebox{6.45831pt}{\Huge$0$}}&&&&&I_{l}\\ \end{array}\right);

here IrI_{r} and IlI_{l} are identity matrices of size r×rr\times r and l×ll\times l respectively. Since detU=1\det U=1, the multiplicity of the eigenvalue 1-1 (i.e. rr) must be even.

Note, that the 2×22\times 2 matrix I2-I_{2} can be interpreted as the rotation through the angle π\pi. Therefore, the above matrix has the form given in the conclusion of the theorem with φk=αk\varphi_{k}=-\alpha_{k} or φk=π\varphi_{k}=\pi. ∎
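As a numerical illustration of the theorem (a sketch assuming SciPy is available; for an orthogonal, hence normal, matrix the real Schur form is block diagonal up to round-off), one can generate a random rotation and look at its real Schur factor:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                                  # flip one column so that det Q = +1

T, Z = schur(Q, output='real')                     # Q = Z T Z^T with Z orthogonal
np.set_printoptions(precision=3, suppress=True)
print(T)    # 2x2 rotation blocks and 1x1 blocks (+1 or -1) on the diagonal
```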

Let us give a different interpretation of Theorem 6.5.1. Define TjT_{j} to be a rotation through φj\varphi_{j} in the plane spanned by the vectors 𝐯2j1\mathbf{v}_{2j-1}, 𝐯2j\mathbf{v}_{2j}. Then Theorem 6.5.1 simply says that UU is the composition of the rotations TjT_{j}, j=1,2,,kj=1,2,\ldots,k. Note, that because the rotations TjT_{j} act in mutually orthogonal planes, they commute, i.e. it does not matter in what order we take the composition. So, the theorem can be interpreted as follows:

Any rotation in n\mathbb{R}^{n} can be represented as a composition of at most n/2n/2 commuting planar rotations.

If an orthogonal matrix has determinant 1-1, its structure is described by the following theorem.

Theorem 6.5.2.

Let UU be an orthogonal operator in n\mathbb{R}^{n}, and let detU=1\det U=-1. Then there exists an orthonormal basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} such that the matrix of UU in this basis has the block diagonal form

(Rφ1Rφ20RφkIr01),\left(\begin{array}[]{ccccccc}R_{\varphi_{1}}&&&&&\\[-10.76385pt] &R_{\varphi_{2}}&&&&{\raisebox{6.45831pt}{\Huge$0$}}\\ &&\ddots&&&\\ &&&R_{\varphi_{k}}&&\\ &&&&I_{r}&\\[-6.45831pt] {\raisebox{0.0pt}{\Huge$0$}}&&&&&-1\end{array}\right),

where r=n2k1r=n-2k-1 and RφkR_{\varphi_{k}} are 22-dimensional rotations,

Rφk=(cosφksinφksinφkcosφk)R_{\varphi_{k}}=\left(\begin{array}[]{cc}\cos\varphi_{k}&-\sin\varphi_{k}\\ \sin\varphi_{k}&\cos\varphi_{k}\\ \end{array}\right)

and IrI_{r} stands for the identity matrix of size r×rr\times r.

We leave the proof as an exercise for the reader. The modifications that one should make to the proof of Theorem 6.5.1 are pretty obvious.

Note, that it follows from the above theorem that an orthogonal 2×22\times 2 matrix UU with determinant 1-1 is always a reflection.

Let us now fix an orthonormal basis, say the standard basis in n\mathbb{R}^{n}. We call an elementary rotation (the term is not widely accepted) a rotation in the xjx_{j}-xkx_{k} plane, i.e. a linear transformation which changes only the coordinates xjx_{j} and xkx_{k} and acts on these two coordinates as a plane rotation.

Theorem 6.5.3.

Any rotation UU (i.e. an orthogonal transformation UU with detU=1\det U=1) can be represented as a product of at most n(n1)/2n(n-1)/2 elementary rotations.

To prove the theorem we will need the following simple lemmas.

Lemma 6.5.4.

Let 𝐱=(x1,x2)T2\mathbf{x}=(x_{1},x_{2})^{T}\in\mathbb{R}^{2}. There exists a rotation RαR_{\alpha} of 2\mathbb{R}^{2} which moves the vector 𝐱\mathbf{x} to the vector (a,0)T(a,0)^{T}, where a=x12+x22a=\sqrt{x_{1}^{2}+x_{2}^{2}}.

The proof is elementary, and we leave it as an exercise for the reader. One can just draw a picture and/or write a formula for RαR_{\alpha}.

Lemma 6.5.5.

Let 𝐱=(x1,x2,,xn)Tn\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}\in\mathbb{R}^{n}. There exist n1n-1 elementary rotations R1,R2,,Rn1R_{1},R_{2},\ldots,R_{n-1} such that Rn1R2R1𝐱=(a,0,0,,0)TR_{n-1}\ldots R_{2}R_{1}\mathbf{x}=(a,0,0,\ldots,0)^{T}, where a=x12+x22++xn2a=\sqrt{x_{1}^{2}+x_{2}^{2}+\ldots+x_{n}^{2}}.

Proof.

The idea of the proof of the lemma is very simple. We use an elementary rotation R1R_{1} in the xn1x_{n-1}-xnx_{n} plane to “kill” the last coordinate of 𝐱\mathbf{x} (Lemma 6.5.4 guarantees that such a rotation exists). Then we use an elementary rotation R2R_{2} in the xn2x_{n-2}-xn1x_{n-1} plane to “kill” the coordinate number n1n-1 of R1𝐱R_{1}\mathbf{x} (the rotation R2R_{2} does not change the last coordinate, so the last coordinate of R2R1𝐱R_{2}R_{1}\mathbf{x} remains zero), and so on…

For a formal proof we will use induction in nn. The case n=1n=1 is trivial, since any vector in 1\mathbb{R}^{1} has the desired form. The case n=2n=2 is treated by Lemma 6.5.4.

Assuming now that the lemma is true for n1n-1, let us prove it for nn. By Lemma 6.5.4 there exists a 2×22\times 2 rotation matrix RαR_{\alpha} such that

Rα(xn1xn)=(an10),R_{\alpha}\left(\begin{array}[]{cc}x_{n-1}\\ x_{n}\end{array}\right)=\left(\begin{array}[]{cc}a_{n-1}\\ 0\end{array}\right),

where an1=xn12+xn2a_{n-1}=\sqrt{x_{n-1}^{2}+x_{n}^{2}}. So if we define the n×nn\times n elementary rotation R1R_{1} by

R1=(In2𝟎𝟎Rα)R_{1}=\left(\begin{array}[]{cc}I_{n-2}&\mathbf{0}\\ \mathbf{0}&R_{\alpha}\\ \end{array}\right)

(In2I_{n-2} is (n2)×(n2)(n-2)\times(n-2) identity matrix), then

R1𝐱=(x1,x2,,xn2,an1,0)T.R_{1}\mathbf{x}=(x_{1},x_{2},\ldots,x_{n-2},a_{n-1},0)^{T}.

We assumed that the conclusion of the lemma holds for n1n-1, so there exist n2n-2 elementary rotations (let us call them R2,R3,,Rn1R_{2},R_{3},\ldots,R_{n-1}) in n1\mathbb{R}^{n-1} which transform the vector (x1,x2,,xn2,an1)Tn1(x_{1},x_{2},\ldots,x_{n-2},a_{n-1})^{T}\in\mathbb{R}^{n-1} to the vector (a,0,,0)Tn1(a,0,\ldots,0)^{T}\in\mathbb{R}^{n-1}. In other words

Rn1R3R2(x1,x2,,xn2,an1)T=(a,0,,0)T.R_{n-1}\ldots R_{3}R_{2}(x_{1},x_{2},\ldots,x_{n-2},a_{n-1})^{T}=(a,0,\ldots,0% )^{T}.

We can always assume that the elementary rotations R2,R3,,Rn1R_{2},R_{3},\ldots,R_{n-1} act in n\mathbb{R}^{n}, simply by assuming that they do not change the last coordinate. Then

Rn1R3R2R1𝐱=(a,0,,0)Tn.R_{n-1}\ldots R_{3}R_{2}R_{1}\mathbf{x}=(a,0,\ldots,0)^{T}\in\mathbb{R}^{n}.

Let us now show that a=x12+x22++xn2a=\sqrt{x_{1}^{2}+x_{2}^{2}+\ldots+x_{n}^{2}}. It can be easily checked directly, but we apply the following indirect reasoning. We know that orthogonal transformations preserve the norm, and we know that a0a\geq 0. But, then we do not have any choice, the only possibility for aa is a=x12+x22++xn2a=\sqrt{x_{1}^{2}+x_{2}^{2}+\ldots+x_{n}^{2}}. ∎
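A minimal sketch of this procedure (what the lemma calls elementary rotations are, in numerical linear algebra, Givens rotations; the implementation below is an illustration, not the text's notation):

```python
import numpy as np

def move_to_first_axis(x):
    """Apply n-1 elementary (Givens) rotations, in the planes (n-1,n), (n-2,n-1), ...,
    to move x to (||x||, 0, ..., 0)^T; returns the result and the list of rotations."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    rotations = []
    for j in range(n - 2, -1, -1):        # kill coordinates n, n-1, ..., 2, from the bottom up
        a = np.hypot(x[j], x[j + 1])
        c, s = (1.0, 0.0) if a == 0 else (x[j] / a, x[j + 1] / a)
        R = np.eye(n)
        R[j, j], R[j, j + 1] = c, s
        R[j + 1, j], R[j + 1, j + 1] = -s, c
        x = R @ x
        rotations.append(R)
    return x, rotations

y, Rs = move_to_first_axis([1.0, 2.0, 2.0])
print(y)      # approximately [3, 0, 0], and 3 = sqrt(1 + 4 + 4)
```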

Lemma 6.5.6.

Let AA be an n×nn\times n matrix with real entries. There exist elementary rotations R1,R2,,RNR_{1},R_{2},\ldots,R_{N}, Nn(n1)/2N\leq n(n-1)/2 such that the matrix B=RNR2R1AB=R_{N}\ldots R_{2}R_{1}A is upper triangular, and, moreover, all its diagonal entries except the last one Bn,nB_{n,n} are non-negative.

Proof.

We will use induction in nn. The case n=1n=1 is trivial, since we can say that any 1×11\times 1 matrix is of the desired form.

Let us consider the case n=2n=2. Let 𝐚1\mathbf{a}_{1} be the first column of AA. By Lemma 6.5.4 there exists a rotation RR which “kills” the second coordinate of 𝐚1\mathbf{a}_{1}, making the first coordinate non-negative. Then the matrix B=RAB=RA is of the desired form.

Let us now assume that the lemma holds for (n1)×(n1)(n-1)\times(n-1) matrices, and we want to prove it for n×nn\times n matrices. For the n×nn\times n matrix AA let 𝐚1\mathbf{a}_{1} be its first column. By Lemma 6.5.5 we can find n1n-1 elementary rotations (say R1,R2,,Rn1R_{1},R_{2},\ldots,R_{n-1}) which transform 𝐚1\mathbf{a}_{1} into (a,0,,0)T(a,0,\ldots,0)^{T}. So, the matrix Rn1R2R1AR_{n-1}\ldots R_{2}R_{1}A has the following block triangular form

Rn1R2R1A=(a𝟎A1),R_{n-1}\ldots R_{2}R_{1}A=\left(\begin{array}[]{ll}a&*\\ \mathbf{0}&A_{1}\\ \end{array}\right),

where A1A_{1} is an (n1)×(n1)(n-1)\times(n-1) block.

We assumed that the lemma holds for n1n-1, so A1A_{1} can be transformed by at most (n1)(n2)/2(n-1)(n-2)/2 rotations into the desired upper triangular form. Note, that these rotations act in n1\mathbb{R}^{n-1} (only on the coordinates x2,x3,,xnx_{2},x_{3},\ldots,x_{n}), but we can always assume that they act on the whole n\mathbb{R}^{n} simply by assuming that they do not change the first coordinate. Then, these rotations do not change the vector (a,0,,0)T(a,0,\ldots,0)^{T} (the first column of Rn1R2R1AR_{n-1}\ldots R_{2}R_{1}A), so the matrix AA can be transformed into the desired upper triangular form by at most n1+(n1)(n2)/2=n(n1)/2n-1+(n-1)(n-2)/2=n(n-1)/2 elementary rotations. ∎
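The procedure in this proof is essentially QR-type triangularization by Givens rotations; a minimal sketch (reusing the same kind of elementary rotation as above, with an arbitrary test matrix):

```python
import numpy as np

def rotate_to_upper_triangular(A):
    """Return B = R_N ... R_2 R_1 A upper triangular, using at most n(n-1)/2
    elementary rotations; all pivots except possibly the last become non-negative."""
    B = np.array(A, dtype=float)
    n = B.shape[0]
    for col in range(n - 1):                  # process columns left to right
        for row in range(n - 1, col, -1):     # kill the entries below the diagonal, bottom up
            a = np.hypot(B[row - 1, col], B[row, col])
            if a == 0:
                continue
            c, s = B[row - 1, col] / a, B[row, col] / a
            R = np.eye(n)
            R[row - 1, row - 1], R[row - 1, row] = c, s
            R[row, row - 1], R[row, row] = -s, c
            B = R @ B
    return B

U = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])     # a rotation in the x1-x2 plane
print(np.round(rotate_to_upper_triangular(U), 10))  # the identity, as Theorem 6.5.3 predicts
```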

Proof of Theorem 6.5.3.

By Lemma 6.5.6 there exist elementary rotations R1,R2,,RNR_{1},R_{2},\ldots,R_{N} such that the matrix U1=RNR2R1UU_{1}=R_{N}\ldots R_{2}R_{1}U is upper triangular, and all diagonal entries, except maybe the last one, are non-negative.

Note, that the matrix U1U_{1} is orthogonal. Any orthogonal matrix is normal, and we know that an upper triangular matrix can be normal only if it is diagonal. Therefore, U1U_{1} is a diagonal matrix.

We know that a real eigenvalue of an orthogonal matrix can only be 11 or 1-1, so we can have only 11 or 1-1 on the diagonal of U1U_{1}. But, we know that all diagonal entries of U1U_{1}, except maybe the last one, are non-negative, so all the diagonal entries of U1U_{1}, except maybe the last one, are 11. The last diagonal entry can be ±1\pm 1.

Since elementary rotations have determinant 1, we can conclude that detU1=detU=1\det U_{1}=\det U=1, so the last diagonal entry also must be 11. So U1=IU_{1}=I, and therefore UU can be represented as a product of elementary rotations U=R11R21RN1U=R_{1}^{-1}R_{2}^{-1}\ldots R^{-1}_{N}. Here we use the fact that the inverse of an elementary rotation is an elementary rotation as well. ∎

6.6. Orientation

6.6.1. Motivation

In Figures 6.1, 6.2 below we see 3 orthonormal bases in 2\mathbb{R}^{2} and 3\mathbb{R}^{3} respectively. In each figure, the basis b) can be obtained from the standard basis a) by a rotation, while it is impossible to rotate the standard basis to get the basis c) (so that 𝐞k\mathbf{e}_{k} goes to 𝐯k\mathbf{v}_{k} k\forall k).

Figure 6.1. Orientation in 2\mathbb{R}^{2}
Figure 6.2. Orientation in 3\mathbb{R}^{3}

You have probably heard the word “orientation” before, and you probably know that the bases a) and b) have positive orientation, and the bases c) have negative orientation. You also probably know some rules for determining the orientation, like the right hand rule from physics. So, if you can see a basis, say in 3\mathbb{R}^{3}, you can probably say what orientation it has.

But what if you are only given the coordinates of the vectors 𝐯1,𝐯2,𝐯3\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}? Of course, you can try to draw a picture to visualize the vectors, and then see what the orientation is. But this is not always easy. Moreover, how do you “explain” this to a computer?

It turns out that there is an easier way. Let us explain it. We need to check whether it is possible to get a basis 𝐯1,𝐯2,𝐯3\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3} in 3\mathbb{R}^{3} by rotating the standard basis 𝐞1,𝐞2,𝐞3\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3}. There is a unique linear transformation UU such that

U𝐞k=𝐯k,k=1,2,3;U\mathbf{e}_{k}=\mathbf{v}_{k},\qquad k=1,2,3;

its matrix (in the standard basis) is the matrix with columns 𝐯1,𝐯2,𝐯3\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}. It is an orthogonal matrix (because it transforms an orthonormal basis to an orthonormal basis), so we need to see when it is a rotation. Theorems 6.5.1 and 6.5.2 give us the answer: the matrix UU is a rotation if and only if detU=1\det U=1. Note, that (for 3×33\times 3 matrices) if detU=1\det U=-1, then UU is the composition of a rotation about some axis and a reflection in the plane of rotation, i.e. in the plane orthogonal to this axis.
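Computationally, then, the orientation test is just a sign check of a determinant; a minimal sketch (the helper name and the example vectors are hypothetical illustrations):

```python
import numpy as np

def same_orientation_as_standard(*vectors):
    """True if the matrix with columns v1, ..., vn has positive determinant,
    i.e. if the basis has the same orientation as the standard basis."""
    return float(np.linalg.det(np.column_stack(vectors))) > 0

v1, v2, v3 = [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
print(same_orientation_as_standard(v1, v2, v3))   # False: swapping e1 and e2 reverses orientation
```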

This gives us a motivation for the formal definition below.

6.6.2. Formal definition

Let 𝒜\mathcal{A} and \mathcal{B} be two bases in a real vector space XX. We say that the bases 𝒜\mathcal{A} and \mathcal{B} have the same orientation, if the change of coordinates matrix [I],𝒜[I]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}} has positive determinant, and say that they have different orientations if the determinant of [I],𝒜[I]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}} is negative.

Note, that since [I]𝒜,=[I],𝒜1[I]_{\mathcal{A},\mathcal{B}}=[I]_{{}_{\scriptstyle\mathcal{B},\mathcal{A}}}^{% -1}, one can use the matrix [I]𝒜,[I]_{\mathcal{A},\mathcal{B}} in the definition.

We usually assume that the standard basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} in n\mathbb{R}^{n} has positive orientation. In an abstract space one just needs to fix a basis and declare that its orientation is positive.

If an orthonormal basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} in n\mathbb{R}^{n} has positive orientation (i.e. the same orientation as the standard basis) Theorems 6.5.1 and 6.5.2 say that the basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is obtained from the standard basis by a rotation.

6.6.3. Continuous transformations of bases and orientation

Definition.

We say that a basis 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} can be continuously transformed to a basis ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} if there exists a continuous family of bases 𝒱(t)={𝐯1(t),𝐯2(t),,𝐯n(t)}\mathcal{V}(t)=\{\mathbf{v}_{1}(t),\mathbf{v}_{2}(t),\ldots,\mathbf{v}_{n}(t)\}, t[a,b]t\in[a,b] such that

𝐯k(a)=𝐚k,𝐯k(b)=𝐛k,k=1,2,n.\mathbf{v}_{k}(a)=\mathbf{a}_{k},\quad\mathbf{v}_{k}(b)=\mathbf{b}_{k},\qquad k% =1,2\ldots,n.

“Continuous family of bases” means that the vector-functions 𝐯k(t)\mathbf{v}_{k}(t) are continuous (their coordinates in some basis are continuous functions) and, which is essential, that the system 𝐯1(t),𝐯2(t),,𝐯n(t)\mathbf{v}_{1}(t),\mathbf{v}_{2}(t),\ldots,\mathbf{v}_{n}(t) is a basis for all t[a,b]t\in[a,b].

Note, that performing a change of variables, we can always assume, if necessary that [a,b]=[0,1][a,b]=[0,1].

Theorem 6.6.1.

Two bases 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} and ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} have the same orientation, if and only if one of the bases can be continuously transformed to the other.

Proof.

Suppose the basis 𝒜\mathcal{A} can be continuously transformed to the basis \mathcal{B}, and let 𝒱(t)\mathcal{V}(t), t[a,b]t\in[a,b] be a continuous family of bases, performing this transformation. Consider a matrix-function V(t)V(t) whose columns are the coordinate vectors [𝐯k(t)]𝒜[\mathbf{v}_{k}(t)]_{\mathcal{A}} of 𝐯k(t)\mathbf{v}_{k}(t) in the basis 𝒜\mathcal{A}.

Clearly, the entries of V(t)V(t) are continuous functions and V(a)=IV(a)=I, V(b)=[I]𝒜,V(b)=[I]_{\mathcal{A},\mathcal{B}}. Note, that because 𝒱(t)\mathcal{V}(t) is always a basis, detV(t)\det V(t) is never zero. Since detV(t)\det V(t) is continuous and never vanishes on [a,b][a,b], the Intermediate Value Theorem implies that detV(a)\det V(a) and detV(b)\det V(b) have the same sign. Since detV(a)=detI=1\det V(a)=\det I=1, we can conclude that

det[I]𝒜,=detV(b)>0,\det[I]_{\mathcal{A},\mathcal{B}}=\det V(b)>0,

so the bases 𝒜\mathcal{A} and \mathcal{B} have the same orientation.

To prove the opposite implication, i.e. the “only if” part of the theorem, one needs to show that the identity matrix II can be continuously transformed through invertible matrices to any matrix BB satisfying detB>0\det B>0. In other words, that there exists a continuous matrix-function V(t)V(t) on an interval [a,b][a,b] such that for all t[a,b]t\in[a,b] the matrix V(t)V(t) is invertible and such that

V(a)=I,V(b)=B.V(a)=I,\qquad V(b)=B.

We leave the proof of this fact as an exercise for the reader. There are several ways to prove it, one of which is outlined in Exercises 6.6.2–6.6.5 below. ∎

Exercises.

6.6.1.

Let RαR_{\alpha} be the rotation through α\alpha, so its matrix in the standard basis is

(cosαsinαsinαcosα).\left(\begin{array}[]{rr}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\\ \end{array}\right).

Find the matrix of RαR_{\alpha} in the basis 𝐯1,𝐯2\mathbf{v}_{1},\mathbf{v}_{2}, where 𝐯1=𝐞2\mathbf{v}_{1}=\mathbf{e}_{2}, 𝐯2=𝐞1\mathbf{v}_{2}=\mathbf{e}_{1}.

6.6.2.

Let RαR_{\alpha} be the rotation matrix

Rα=(cosαsinαsinαcosα).R_{\alpha}=\left(\begin{array}[]{rr}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\\ \end{array}\right).

Show that the 2×22\times 2 identity matrix I2I_{2} can be continuously transformed through invertible matrices into RαR_{\alpha}.

6.6.3.

Let UU be an n×nn\times n orthogonal matrix, and let detU>0\det U>0. Show that the n×nn\times n identity matrix InI_{n} can be continuously transformed through invertible matrices into UU. Hint: Use the previous problem and representation of a rotation in n\mathbb{R}^{n} as a product of planar rotations, see Section 6.5.

6.6.4.

Let AA be an n×nn\times n positive definite Hermitian matrix, A=A>0A=A^{*}>0. Show that the n×nn\times n identity matrix InI_{n} can be continuously transformed through invertible matrices into AA. Hint: What about diagonal matrices?

6.6.5.

Using polar decomposition and Problems 6.6.3, 6.6.4 above, complete the proof of the “only if” part of Theorem 6.6.1.