In this chapter we again assume that all spaces are finite-dimensional, and we deal only with complex or real spaces: the theory of inner product spaces does not apply to spaces over general fields. When we do not specify which space we are in, everything works for both complex and real spaces.
To avoid writing essentially the same formulas twice we will use the notation for the complex case; in the real case it gives correct, although sometimes a bit more complicated, formulas.
Theorem 6.1.1. Let $A$ be an operator acting in a complex inner product space $V$. There exists an orthonormal basis in $V$ such that the matrix of $A$ in this basis is upper triangular.
In other words, any $n\times n$ matrix $A$ can be represented as $A = UTU^{*} = UTU^{-1}$, where $U$ is a unitary matrix and $T$ is an upper triangular one.
We prove the theorem by induction on $\dim V$. If $\dim V = 1$ the theorem is trivial, since any $1\times 1$ matrix is upper triangular.
Suppose now that the theorem is true for operators acting in spaces of dimension $n-1$, and let us prove it for $\dim V = n$.
Let $\lambda_1$ be an eigenvalue of $A$, and let $u_1$, $\|u_1\| = 1$, be a corresponding eigenvector, $Au_1 = \lambda_1 u_1$. Denote $E := u_1^{\perp}$, and let $v_2, v_3, \dots, v_n$ be some orthonormal basis in $E$ (clearly $\dim E = n-1$), so $u_1, v_2, \dots, v_n$ is an orthonormal basis in $V$. In this basis the matrix of $A$ has the form
\[
\begin{pmatrix}
\lambda_1 & * & \cdots & * \\
0 & & & \\
\vdots & & A_1 & \\
0 & & &
\end{pmatrix}
\tag{6.1.1}
\]
here all entries below $\lambda_1$ are zeroes, and $*$ means that we do not care what entries are in the first row to the right of $\lambda_1$.
We do care enough about the lower right $(n-1)\times(n-1)$ block to give it a name: we denote it $A_1$.
Note that $A_1$ defines a linear transformation in $E$, and since $\dim E = n-1$, the induction hypothesis implies that there exists an orthonormal basis (let us denote it $u_2, \dots, u_n$) in which the matrix of $A_1$ is upper triangular.
So, the matrix of $A$ in the orthonormal basis $u_1, u_2, \dots, u_n$ has the form (6.1.1), where the matrix $A_1$ is upper triangular. Therefore, the matrix of $A$ in this basis is upper triangular as well. ∎
Note that the subspace $E$ introduced in the proof is generally not invariant under $A$, i.e. the inclusion $AE \subset E$ does not necessarily hold. That means that $A_1$ is not a part (a restriction) of $A$; it is some operator constructed from $A$.
Note also that $AE \subset E$ if and only if all entries denoted by $*$ (i.e. all entries in the first row, except $\lambda_1$) are zero.
Note that even if we start from a real matrix $A$, the matrices $U$ and $T$ can have complex entries. The rotation matrix
\[
\begin{pmatrix}
\cos\alpha & -\sin\alpha \\
\sin\alpha & \cos\alpha
\end{pmatrix},
\qquad \alpha \ne k\pi,
\]
is not unitarily equivalent (not even similar) to a real upper triangular matrix. Indeed, the eigenvalues of this matrix, $\cos\alpha \pm i\sin\alpha$, are complex, and the eigenvalues of an upper triangular matrix are its diagonal entries.
An analogue of Theorem 6.1.1 can be stated and proved for an arbitrary vector space $V$, without requiring it to have an inner product. In this case the theorem claims that any operator has an upper triangular form in some basis. A proof can be modeled after the proof of Theorem 6.1.1. An alternative way is to equip $V$ with an inner product by fixing a basis in $V$ and declaring it to be an orthonormal one, see Exercise 5.2.4 in Chapter 5.
Note that the version for inner product spaces (Theorem 6.1.1) is stronger than the one for vector spaces, because it says that we can always find an orthonormal basis, not just a basis.
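For readers who like to experiment, the factorization of Theorem 6.1.1 can be computed numerically. Here is a minimal sketch using SciPy, whose `scipy.linalg.schur` with `output='complex'` returns exactly a unitary $U$ and an upper triangular $T$ with $A = UTU^{*}$; the sample matrix is the rotation discussed above:

```python
import numpy as np
from scipy.linalg import schur

# The rotation matrix discussed above: real, but with complex eigenvalues,
# so a complex unitary U and upper triangular T are unavoidable.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

T, U = schur(A, output='complex')              # A = U @ T @ U*
print(np.allclose(U @ T @ U.conj().T, A))      # True
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U is unitary
print(np.diag(T))                              # eigenvalues 1j and -1j on the diagonal
```

Even though $A$ is real, both $T$ and $U$ come out complex, in agreement with the remark above.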
The following theorem is a real-valued version of Theorem 6.1.1.
Theorem 6.1.2. Let $A$ be an operator acting in a real inner product space $X$. Suppose that all eigenvalues of $A$ are real (meaning that $A$ has exactly $n = \dim X$ real eigenvalues, counting multiplicities). Then there exists an orthonormal basis in $X$ such that the matrix of $A$ in this basis is upper triangular.
In other words, any real $n\times n$ matrix $A$ with all real eigenvalues can be represented as $A = QTQ^{T} = QTQ^{-1}$, where $Q$ is an orthogonal matrix and $T$ is a real upper triangular one.
Remark 6.1.3. Recall that, as we already discussed in Section 4.1.4 of Chapter 4, by the eigenvalues of an operator $A$ in a real vector space $X$ we always mean the eigenvalues of its complexification acting in the complexification of $X$.
If $X = \mathbb R^{n}$ then its complexification $\mathbb C^{n}$ is obtained from $X$ by allowing complex coordinates. The complexification of an operator $A$ (the multiplication by a real matrix $A$) on $\mathbb R^{n}$ is the multiplication by the same (real) matrix, but now acting in the space $\mathbb C^{n}$.
To prove the theorem we just need to analyze the proof of Theorem 6.1.1. Let us assume (we can always do this without loss of generality) that the operator (matrix) $A$ acts in $\mathbb R^{n}$.
Suppose the theorem is true for $(n-1)\times(n-1)$ matrices. As in the proof of Theorem 6.1.1, let $\lambda_1$ be a real eigenvalue of $A$, let $u_1 \in \mathbb R^{n}$, $\|u_1\| = 1$, be a corresponding eigenvector, and let $v_2, \dots, v_n$ be an orthonormal system (in $\mathbb R^{n}$) such that $u_1, v_2, \dots, v_n$ is an orthonormal basis in $\mathbb R^{n}$.
The matrix of $A$ in this basis has the form (6.1.1), where $A_1$ is some real $(n-1)\times(n-1)$ matrix.
If we can prove that the matrix $A_1$ has only real eigenvalues, then we are done. Indeed, then by the induction hypothesis there exists an orthonormal basis in which the matrix of $A_1$ is upper triangular, so the matrix of $A$ in the resulting basis is also upper triangular.
To show that $A_1$ has only real eigenvalues, let us notice that
\[
\det(A - \lambda I) = (\lambda_1 - \lambda)\det(A_1 - \lambda I)
\]
(take the cofactor expansion in the first column, for example), and so any eigenvalue of $A_1$ is also an eigenvalue of $A$. But $A$ has only real eigenvalues! ∎
Use the upper triangular representation of an operator to give an alternative proof of the fact that the determinant is the product, and the trace is the sum, of the eigenvalues, counting multiplicities.
In this section we deal with matrices (operators) which are unitarily equivalent to diagonal matrices.
Let us recall that an operator $A$ is called self-adjoint if $A = A^{*}$. A matrix of a self-adjoint operator (in some orthonormal basis), i.e. a matrix satisfying $A = A^{*}$, is called a Hermitian matrix.
The terms self-adjoint and Hermitian mean essentially the same thing. Usually people say self-adjoint when speaking about operators (transformations), and Hermitian when speaking about matrices. We will try to follow this convention, but since we often do not distinguish between operators and their matrices, we will sometimes mix the two terms.
Theorem 6.2.1. Let $A = A^{*}$ be a self-adjoint operator in an inner product space $X$ (the space can be complex or real). Then all eigenvalues of $A$ are real, and there exists an orthonormal basis of eigenvectors of $A$ in $X$.
This theorem can be restated in matrix form as follows.
Theorem 6.2.2. Let $A = A^{*}$ be a self-adjoint (and therefore square) matrix. Then $A$ can be represented as
\[
A = UDU^{*},
\]
where $U$ is a unitary matrix and $D$ is a diagonal matrix with real entries.
Moreover, if the matrix $A$ is real, the matrix $U$ can be chosen to be real (i.e. orthogonal).
To prove Theorems 6.2.1 and 6.2.2 for a complex inner product space, let us first apply Theorem 6.1.1 to find an orthonormal basis in which the matrix of $A$ is upper triangular. Now let us ask ourselves a question: what upper triangular matrices are self-adjoint?
The answer is immediate: an upper triangular matrix is self-adjoint if and only if it is a diagonal matrix with real entries. Theorem 6.2.1 (and so Theorem 6.2.2) is proved in the complex case.
To treat the case of a real inner product space, we first notice that by the already proved complex case all eigenvalues of $A$ are real.¹ Then we just apply Theorem 6.1.2 and notice again that the only self-adjoint upper triangular matrices are the diagonal ones with real entries. ∎
¹Recall, see Remark 6.1.3 above, that by the eigenvalues of an operator in a real vector space $X$ we always mean the eigenvalues of the extension of this operator to the complexification of $X$.
In many textbooks only real matrices are considered, and Theorem 6.2.2 is often called the "Spectral theorem for symmetric matrices". However, we should emphasize that the conclusion of Theorem 6.2.2 fails for complex symmetric matrices: the theorem holds for Hermitian matrices ($A = A^{*}$), and in particular for real symmetric ones.
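As a quick numerical illustration of Theorem 6.2.2 (a minimal NumPy sketch; the symmetric matrix below is just a sample), `numpy.linalg.eigh`, which is designed for Hermitian/symmetric matrices, returns real eigenvalues and an orthonormal basis of eigenvectors, i.e. exactly the decomposition $A = UDU^{*}$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # real symmetric, hence self-adjoint

eigenvalues, U = np.linalg.eigh(A)     # real eigenvalues, orthonormal columns of U
D = np.diag(eigenvalues)

print(eigenvalues)                     # [1. 3.] -- all real
print(np.allclose(U @ D @ U.T, A))     # True: A = U D U^T
print(np.allclose(U.T @ U, np.eye(2))) # True: U is orthogonal
```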
Let us give an independent proof of the fact that the eigenvalues of a self-adjoint operator are real. Let $A = A^{*}$ and $Ax = \lambda x$, $x \ne 0$. Then
\[
(Ax, x) = (\lambda x, x) = \lambda\|x\|^{2}.
\]
On the other hand,
\[
(Ax, x) = (x, A^{*}x) = (x, Ax) = (x, \lambda x) = \overline{\lambda}\|x\|^{2},
\]
so $\lambda\|x\|^{2} = \overline{\lambda}\|x\|^{2}$. Since $\|x\| \ne 0$ ($x \ne 0$), we can conclude that $\lambda = \overline{\lambda}$, so $\lambda$ is real.
It also follows from Theorem 6.2.1 that eigenspaces of a self-adjoint operator corresponding to different eigenvalues are orthogonal. Let us give an alternative proof of this result.
Proposition. Let $A = A^{*}$ be a self-adjoint operator, and let $x$, $y$ be its eigenvectors, $Ax = \lambda x$, $Ay = \mu y$. Then, if $\lambda \ne \mu$, the eigenvectors $x$ and $y$ are orthogonal.
This proposition follows from the spectral theorem (Theorem 6.2.1), but here we are giving a direct proof. Namely,
\[
(Ax, y) = (\lambda x, y) = \lambda(x, y).
\]
On the other hand,
\[
(Ax, y) = (x, A^{*}y) = (x, Ay) = (x, \mu y) = \overline{\mu}(x, y) = \mu(x, y)
\]
(the last equality holds because the eigenvalues of a self-adjoint operator are real), so $\lambda(x, y) = \mu(x, y)$. If $\lambda \ne \mu$, this is possible only if $(x, y) = 0$. ∎
Now let us try to find what matrices are unitarily equivalent to a diagonal one. It is easy to check that for a diagonal matrix $D$
\[
D^{*}D = DD^{*}.
\]
Therefore $N^{*}N = NN^{*}$ if the matrix of $N$ in some orthonormal basis is diagonal.
Definition. An operator (matrix) $N$ is called normal if $N^{*}N = NN^{*}$.
Clearly, any self-adjoint operator ($A^{*} = A$) is normal. Also, any unitary operator $U$ is normal, since $U^{*}U = UU^{*} = I$.
Note that a normal operator is an operator acting in one space, not from one space to another. So, if $U$ is a unitary operator acting from one space to another, we cannot say that $U$ is normal.
Theorem 6.2.4. Any normal operator $N$ in a complex vector space has an orthonormal basis of eigenvectors.
In other words, any matrix $N$ satisfying $N^{*}N = NN^{*}$ can be represented as
\[
N = UDU^{*},
\]
where $U$ is a unitary matrix, and $D$ is a diagonal one.
Note that in the above theorem, even if $N$ is a real matrix, we did not claim that the matrices $U$ and $D$ are real. Moreover, it can easily be shown that if $D$ is real, $N$ must be self-adjoint.
To prove Theorem 6.2.4 we apply Theorem 6.1.1 to get an orthonormal basis such that the matrix of $N$ in this basis is upper triangular. To complete the proof of the theorem we only need to show that an upper triangular normal matrix must be diagonal.
We will prove this by induction on the size of the matrix. The $1\times 1$ case is trivial, since any $1\times 1$ matrix is diagonal.
Suppose we have proved that any $(n-1)\times(n-1)$ upper triangular normal matrix is diagonal, and we want to prove it for $n\times n$ matrices. Let $N$ be an $n\times n$ upper triangular normal matrix. We can write it as
\[
N = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
0 & & & \\
\vdots & & N_1 & \\
0 & & &
\end{pmatrix},
\]
where $N_1$ is an upper triangular $(n-1)\times(n-1)$ matrix.
Let us compare the upper left entries (first row, first column) of $N^{*}N$ and $NN^{*}$. Direct computation shows that
\[
(N^{*}N)_{1,1} = |a_{1,1}|^{2}
\]
and
\[
(NN^{*})_{1,1} = |a_{1,1}|^{2} + |a_{1,2}|^{2} + \dots + |a_{1,n}|^{2}.
\]
So, $(N^{*}N)_{1,1} = (NN^{*})_{1,1}$ if and only if $a_{1,2} = \dots = a_{1,n} = 0$. Therefore, the matrix $N$ has the form
\[
N = \begin{pmatrix}
a_{1,1} & 0 & \cdots & 0 \\
0 & & & \\
\vdots & & N_1 & \\
0 & & &
\end{pmatrix}.
\]
It follows from the above representation that
\[
N^{*}N = \begin{pmatrix} |a_{1,1}|^{2} & 0 \\ 0 & N_1^{*}N_1 \end{pmatrix},
\qquad
NN^{*} = \begin{pmatrix} |a_{1,1}|^{2} & 0 \\ 0 & N_1N_1^{*} \end{pmatrix},
\]
so $N_1^{*}N_1 = N_1N_1^{*}$. That means the matrix $N_1$ is also normal, and by the induction hypothesis it is diagonal. So the matrix $N$ is also diagonal. ∎
The following proposition gives a very useful characterization of normal operators.
Proposition. An operator $N$ is normal if and only if
\[
\|Nx\| = \|N^{*}x\| \qquad \text{for all } x.
\]
Let $N$ be normal, $N^{*}N = NN^{*}$. Then
\[
\|Nx\|^{2} = (Nx, Nx) = (N^{*}Nx, x) = (NN^{*}x, x) = (N^{*}x, N^{*}x) = \|N^{*}x\|^{2},
\]
so $\|Nx\| = \|N^{*}x\|$. Conversely, if $\|Nx\| = \|N^{*}x\|$ for all $x$, then $((N^{*}N - NN^{*})x, x) = 0$ for all $x$; since the operator $N^{*}N - NN^{*}$ is self-adjoint, all its eigenvalues are then zero, so $N^{*}N - NN^{*} = 0$. ∎
True or false:
Every unitary operator is normal.
A matrix is unitary if and only if it is invertible.
If two matrices are unitarily equivalent, then they are also similar.
The sum of self-adjoint operators is self-adjoint.
The adjoint of a unitary operator is unitary.
The adjoint of a normal operator is normal.
If all eigenvalues of a linear operator are $1$, then the operator must be unitary or orthogonal.
If all eigenvalues of a normal operator are $1$, then the operator is the identity.
A linear operator may preserve the norm but not the inner product.
True or false: the sum of normal operators is normal. Justify your conclusion.
Show that an operator unitarily equivalent to a diagonal one is normal.
Orthogonally diagonalize the matrix $A$.
Find all square roots of $A$, i.e. find all matrices $B$ such that $B^{2} = A$.
Note that all square roots of $A$ here are self-adjoint.
True or false: any self-adjoint matrix has a self-adjoint square root. Justify.
Orthogonally diagonalize the matrix $A$,
i.e. represent it as $A = UDU^{*}$, where $D$ is a diagonal matrix and $U$ is unitary.
Among all square roots of $A$, i.e. among all matrices $B$ such that $B^{2} = A$, find one that has positive eigenvalues. You can leave $B$ as a product.
True or false:
A product of two self-adjoint matrices is self-adjoint.
If $A$ is self-adjoint, then $A^{k}$ is self-adjoint for every positive integer $k$.
Justify your conclusions.
Let $A$ be an $m\times n$ matrix. Prove that:
$A^{*}A$ is self-adjoint;
all eigenvalues of $A^{*}A$ are non-negative;
$A^{*}A + I$ is invertible.
Give a proof if the statement is true, or give a counterexample if it is false:
If $A = A^{*}$, then $A + iI$ is invertible.
If $U$ is unitary, then $U + \tfrac{3}{4}I$ is invertible.
If a matrix $A$ is real, then $A - iI$ is invertible.
Orthogonally diagonalize the rotation matrix
\[
R_\alpha = \begin{pmatrix}
\cos\alpha & -\sin\alpha \\
\sin\alpha & \cos\alpha
\end{pmatrix},
\]
where $\alpha$ is not an integer multiple of $\pi$. Note that you will get complex eigenvalues in this case.
Orthogonally diagonalize the matrix
\[
A = \begin{pmatrix}
\cos\alpha & \sin\alpha \\
\sin\alpha & -\cos\alpha
\end{pmatrix}.
\]
Hints: you will get real eigenvalues in this case. Also, the trigonometric identities $\sin\alpha = 2\sin\frac{\alpha}{2}\cos\frac{\alpha}{2}$, $1 + \cos\alpha = 2\cos^{2}\frac{\alpha}{2}$, $1 - \cos\alpha = 2\sin^{2}\frac{\alpha}{2}$ will help to simplify the expressions for the eigenvectors.
Can you describe the linear transformation with the matrix $A$ from the previous problem geometrically? It has a very simple geometric interpretation.
Prove that a normal operator with unimodular eigenvalues (i.e. with all eigenvalues $\lambda$ satisfying $|\lambda| = 1$) is unitary. Hint: consider the diagonalization $N = UDU^{*}$.
Prove that a normal operator with real eigenvalues is self-adjoint.
Show by example that the conclusion of Theorem 6.2.2 fails for complex symmetric matrices. Namely:
construct a (diagonalizable) complex symmetric matrix not admitting an orthogonal basis of eigenvectors;
construct a complex symmetric matrix which cannot be diagonalized.
A self-adjoint operator $A$ is called positive definite if
\[
(Ax, x) > 0 \qquad \text{for all } x \ne 0,
\]
and it is called positive semidefinite if
\[
(Ax, x) \ge 0 \qquad \text{for all } x.
\]
We will use the notation $A > 0$ for positive definite operators, and $A \ge 0$ for positive semidefinite ones.
The following theorem describes positive definite and semidefinite operators.
Theorem. Let $A = A^{*}$. Then:
$A > 0$ if and only if all eigenvalues of $A$ are positive;
$A \ge 0$ if and only if all eigenvalues of $A$ are non-negative.
To prove the theorem, pick an orthonormal basis such that the matrix of $A$ in this basis is diagonal (see Theorem 6.2.1). To finish the proof it remains to notice that a diagonal matrix is positive definite (respectively, positive semidefinite) if and only if all its diagonal entries are positive (respectively, non-negative). ∎
Theorem. Let $A \ge 0$ be a positive semidefinite operator. Then there exists a unique positive semidefinite operator $B$ such that
\[
B^{2} = A.
\]
Such $B$ is called the (positive) square root of $A$ and is denoted $\sqrt{A}$ or $A^{1/2}$.
Let us prove that $\sqrt{A}$ exists. Let $v_1, v_2, \dots, v_n$ be an orthonormal basis of eigenvectors of $A$, and let $\lambda_1, \lambda_2, \dots, \lambda_n$ be the corresponding eigenvalues. Note that since $A \ge 0$, all $\lambda_k \ge 0$.
In the basis $v_1, v_2, \dots, v_n$ the matrix of $A$ is the diagonal matrix $\operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}$. Define the matrix of $B$ in the same basis as $\operatorname{diag}\{\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}\}$.
Clearly, $B \ge 0$ and $B^{2} = A$.
To prove that such $B$ is unique, let us suppose that there exists an operator $C \ge 0$ such that $C^{2} = A$. Let $u_1, u_2, \dots, u_n$ be an orthonormal basis of eigenvectors of $C$, and let $\mu_1, \mu_2, \dots, \mu_n$ be the corresponding eigenvalues (note that all $\mu_k \ge 0$). The matrix of $C$ in the basis $u_1, u_2, \dots, u_n$ is the diagonal matrix $\operatorname{diag}\{\mu_1, \mu_2, \dots, \mu_n\}$, and therefore the matrix of $A = C^{2}$ in the same basis is $\operatorname{diag}\{\mu_1^{2}, \mu_2^{2}, \dots, \mu_n^{2}\}$. This implies that any eigenvalue of $A$ is of the form $\mu_k^{2}$ and, moreover, that if $Cu = \mu u$, then $Au = \mu^{2}u$.
Therefore, in the basis $u_1, u_2, \dots, u_n$ the matrix of $B = \sqrt{A}$ also has the diagonal form $\operatorname{diag}\{\mu_1, \mu_2, \dots, \mu_n\}$, i.e. $B = C$. ∎
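The existence part of the proof is constructive and can be followed literally in NumPy (a minimal sketch; `eigh` supplies the orthonormal eigenbasis, and the clipping of tiny negative round-off eigenvalues is a practical safeguard, not part of the argument):

```python
import numpy as np

def sqrtm_psd(A):
    """Positive semidefinite square root of a positive semidefinite matrix,
    built exactly as in the existence proof: diagonalize, take square roots
    of the (non-negative) eigenvalues, and undo the change of basis."""
    eigenvalues, U = np.linalg.eigh(A)
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # kill tiny negative round-off
    return U @ np.diag(np.sqrt(eigenvalues)) @ U.conj().T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = sqrtm_psd(A)
print(np.allclose(B @ B, A))   # True: B^2 = A
```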
Consider an operator $A: X \to Y$. Its Hermitian square $A^{*}A$ is a positive semidefinite operator acting in $X$. Indeed,
\[
(A^{*}A)^{*} = A^{*}A^{**} = A^{*}A,
\]
and
\[
(A^{*}Ax, x) = (Ax, Ax) = \|Ax\|^{2} \ge 0 \qquad \text{for all } x.
\]
Therefore, there exists a (unique) positive semidefinite square root $R = \sqrt{A^{*}A}$. This operator $R$ is called the modulus of the operator $A$, and is often denoted $|A|$.
The modulus of $A$ shows how "big" the operator $A$ is:
Proposition 6.3.3. For a linear operator $A: X \to Y$,
\[
\|\,|A|x\,\| = \|Ax\| \qquad \text{for any } x \in X.
\]
Indeed,
\[
\|\,|A|x\,\|^{2} = (|A|x, |A|x) = (|A|^{2}x, x) = (A^{*}Ax, x) = (Ax, Ax) = \|Ax\|^{2}.
\]
∎
Corollary 6.3.4. For a linear operator $A$,
\[
\operatorname{Ker}A = \operatorname{Ker}|A| = (\operatorname{Ran}|A|)^{\perp}.
\]
The first equality follows immediately from Proposition 6.3.3, and the second one follows from the identity $\operatorname{Ker}B = (\operatorname{Ran}B^{*})^{\perp}$ applied to $B = |A|$ ($|A|$ is self-adjoint). ∎
Theorem (Polar decomposition). Let $A: X \to X$ be an operator (a square matrix). Then $A$ can be represented as
\[
A = U|A|,
\]
where $U$ is a unitary operator.
The unitary operator $U$ is generally not unique. As one can see from the proof of the theorem, $U$ is unique if and only if $A$ is invertible.
The polar decomposition $A = U|A|$ also holds for operators $A: X \to Y$ acting from one space to another. But in this case we can only guarantee that $U$ is an isometry from $\operatorname{Ran}|A|$ to $Y$.
If $\dim X \le \dim Y$, this isometry can be extended to an isometry from the whole $X$ to $Y$ (if $\dim X = \dim Y$, this will be a unitary operator).
Consider a vector $x \in \operatorname{Ran}|A|$. Then the vector $x$ can be represented as $x = |A|v$ for some vector $v \in X$, and we would like to define $U_0x := Av$.
But first we need to prove that $U_0$ is well defined. Let $v_1$ be another vector such that $x = |A|v_1$. But $|A|v = |A|v_1$ means that $v - v_1 \in \operatorname{Ker}|A| = \operatorname{Ker}A$ (cf. Corollary 6.3.4), so $Av = Av_1$, meaning that $U_0x$ is well defined.
By construction, $\|U_0x\| = \|Av\| = \|\,|A|v\,\| = \|x\|$, so $U_0$ is an isometry on $\operatorname{Ran}|A|$. We leave it as an exercise for the reader to check that $U_0$ is a linear transformation.
To extend $U_0$ to a unitary operator $U$, let us find some unitary transformation $U_1: (\operatorname{Ran}|A|)^{\perp} \to (\operatorname{Ran}A)^{\perp}$. It is always possible to do this, since $\dim(\operatorname{Ran}|A|)^{\perp} = \dim(\operatorname{Ran}A)^{\perp}$ for square matrices (the Rank Theorem).
It is easy to check that $U$, defined by $Ux = U_0x_1 + U_1x_2$ for $x = x_1 + x_2$ with $x_1 \in \operatorname{Ran}|A|$, $x_2 \in (\operatorname{Ran}|A|)^{\perp}$, is a unitary operator, and that $A = U|A|$. ∎
The eigenvalues of $|A|$ are called the singular values of $A$. In other words, if $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues of $A^{*}A$, then $\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}$ are the singular values of $A$.
Very often in the literature the singular values are defined as the non-negative square roots of the eigenvalues of $A^{*}A$, without any reference to the modulus $|A|$.
I consider the notion of the modulus of an operator to be an important one, so it was introduced above. However, the notion of the modulus is not required for what follows (defining the Schmidt and singular value decompositions). Moreover, as will be shown below, the modulus of $A$ can easily be constructed from the singular value decomposition.
Consider an operator $A: X \to Y$, and let $\sigma_1, \sigma_2, \dots, \sigma_n$ be the singular values of $A$ counting multiplicities. Assume that $\sigma_1, \sigma_2, \dots, \sigma_r$ are the non-zero singular values of $A$, counting multiplicities. This means, in particular, that $\sigma_k = 0$ for $k > r$.
By the definition of singular values, the numbers $\sigma_k^{2}$ are the eigenvalues of $A^{*}A$. Let $v_1, v_2, \dots, v_n$ be an orthonormal basis² of eigenvectors of $A^{*}A$, $A^{*}Av_k = \sigma_k^{2}v_k$.
Proposition. The system
\[
w_k := \frac{1}{\sigma_k}Av_k, \qquad k = 1, 2, \dots, r,
\]
is an orthonormal system.
Indeed,
\[
(Av_j, Av_k) = (A^{*}Av_j, v_k) = \sigma_j^{2}(v_j, v_k) =
\begin{cases}
\sigma_j^{2}, & j = k, \\
0, & j \ne k,
\end{cases}
\]
since $v_1, v_2, \dots, v_n$ is an orthonormal system. ∎
²We know that for a self-adjoint operator ($A^{*}A$ in our case) there exists an orthonormal basis of eigenvectors.
Theorem (Schmidt decomposition). In the notation of the above proposition, the operator $A$ can be represented as
\[
A = \sum_{k=1}^{r}\sigma_k w_k v_k^{*}, \tag{6.3.1}
\]
or, equivalently,
\[
Ax = \sum_{k=1}^{r}\sigma_k (x, v_k)\, w_k. \tag{6.3.2}
\]
Indeed, we know that $v_1, v_2, \dots, v_n$ is an orthonormal basis in $X$. Then, substituting $x = v_j$ into the right side of (6.3.2), we get
\[
\sum_{k=1}^{r}\sigma_k(v_j, v_k)w_k =
\begin{cases}
\sigma_j w_j, & j \le r, \\
0, & j > r,
\end{cases}
\]
and
\[
Av_j = \sigma_j w_j \ \text{ for } j \le r, \qquad Av_j = 0 \ \text{ for } j > r
\]
(the latter because $\|Av_j\| = \|\,|A|v_j\,\| = \sigma_j\|v_j\| = 0$ for $j > r$).
So the operators on the left and right sides of (6.3.1) coincide on the basis $v_1, v_2, \dots, v_n$, so they are equal.
Schmidt decomposition of an operator is not unique. Why?
Lemma. Suppose $A$ can be represented as
\[
A = \sum_{k=1}^{r}\sigma_k w_k v_k^{*}, \qquad \sigma_k > 0,
\]
where $v_1, \dots, v_r$ and $w_1, \dots, w_r$ are some orthonormal systems.
Then this representation gives a Schmidt decomposition of $A$.
We only need to show that the $v_k$ are eigenvectors of $A^{*}A$, $A^{*}Av_k = \sigma_k^{2}v_k$. Since $w_1, \dots, w_r$ is an orthonormal system,
\[
A^{*}A = \Bigl(\sum_{j=1}^{r}\sigma_j v_j w_j^{*}\Bigr)\Bigl(\sum_{k=1}^{r}\sigma_k w_k v_k^{*}\Bigr) = \sum_{k=1}^{r}\sigma_k^{2}v_kv_k^{*},
\]
and therefore
\[
A^{*}Av_j = \sum_{k=1}^{r}\sigma_k^{2}v_kv_k^{*}v_j.
\]
Since $v_1, \dots, v_r$ is an orthonormal system, $v_k^{*}v_j = \delta_{j,k}$, so
\[
A^{*}Av_j = \sigma_j^{2}v_j,
\]
thus the $v_j$ are eigenvectors of $A^{*}A$. ∎
Corollary. Let
\[
A = \sum_{k=1}^{r}\sigma_k w_k v_k^{*}
\]
be a Schmidt decomposition of $A$. Then
\[
A^{*} = \sum_{k=1}^{r}\sigma_k v_k w_k^{*}
\]
is a Schmidt decomposition of $A^{*}$.
The Schmidt decomposition can be written in a nice matrix form. Namely, let us assume that $A: \mathbb F^{n} \to \mathbb F^{m}$, where $\mathbb F$ is either $\mathbb C$ or $\mathbb R$ (we can always do that by fixing orthonormal bases in $X$ and $Y$ and working with coordinates in these bases). Let $\sigma_1, \sigma_2, \dots, \sigma_r$ be the non-zero singular values of $A$, and let
\[
A = \sum_{k=1}^{r}\sigma_k w_k v_k^{*}
\]
be a Schmidt decomposition of $A$.
As one can easily see, this equality can be rewritten as
\[
A = \widetilde W\widetilde\Sigma\widetilde V^{*}, \tag{6.3.3}
\]
where $\widetilde\Sigma := \operatorname{diag}\{\sigma_1, \sigma_2, \dots, \sigma_r\}$, and $\widetilde V$ and $\widetilde W$ are the matrices with columns $v_1, v_2, \dots, v_r$ and $w_1, w_2, \dots, w_r$ respectively. (Can you tell what the size of each matrix is?)
Note that since $v_1, \dots, v_r$ and $w_1, \dots, w_r$ are orthonormal systems, the matrices $\widetilde V$ and $\widetilde W$ are isometries. Note also that $|A| = \widetilde V\widetilde\Sigma\widetilde V^{*}$, see Exercise 6.3.1 below.
If the matrix $A$ is invertible, then $r = n = m$, the matrices $\widetilde V$ and $\widetilde W$ are unitary, and $\widetilde\Sigma$ is an invertible diagonal matrix.
It turns out that it is always possible to write a representation similar to (6.3.3) with unitary matrices $W$ and $V$ instead of $\widetilde W$ and $\widetilde V$, and in many situations it is more convenient to work with such a representation. To write it, one first needs to complete the systems $v_1, \dots, v_r$ and $w_1, \dots, w_r$ to orthonormal bases $v_1, \dots, v_n$ and $w_1, \dots, w_m$ in $X$ and $Y$ respectively.
Recall that to complete, say, $v_1, \dots, v_r$ to an orthonormal basis in $X$, one just needs to find an orthonormal basis in $(\operatorname{span}\{v_1, \dots, v_r\})^{\perp}$; then the union of the two systems will be an orthonormal basis in $X$. And one can always get an orthonormal basis from an arbitrary one using Gram–Schmidt orthogonalization.
Then $A$ can be represented as
\[
A = W\Sigma V^{*}, \tag{6.3.4}
\]
where $V$ and $W$ are the unitary matrices with columns $v_1, v_2, \dots, v_n$ and $w_1, w_2, \dots, w_m$ respectively, and $\Sigma$ is the "diagonal" $m\times n$ matrix
\[
\Sigma = \begin{pmatrix} \widetilde\Sigma & 0 \\ 0 & 0 \end{pmatrix}. \tag{6.3.5}
\]
In other words, to get the matrix $\Sigma$ one takes the $r\times r$ diagonal matrix $\widetilde\Sigma$ and makes it an $m\times n$ matrix by adding extra zeroes "south and east".
Definition. For a matrix $A: \mathbb F^{n} \to \mathbb F^{m}$ (recall that here $\mathbb F$ is always $\mathbb C$ or $\mathbb R$), its singular value decomposition (SVD) is a decomposition of the form (6.3.4), i.e. a decomposition $A = W\Sigma V^{*}$, where $W$, $V$ are unitary matrices and $\Sigma$ is a "diagonal" one (meaning that $\Sigma_{j,k} = 0$ for all $j \ne k$, and $\Sigma_{j,j} \ge 0$ for all $j$).
The representation (6.3.3) is often called the reduced or compact SVD. More precisely, the reduced SVD is a representation $A = \widetilde W\widetilde\Sigma\widetilde V^{*}$, where $\widetilde\Sigma$ is a diagonal matrix with strictly positive diagonal entries, and $\widetilde V$, $\widetilde W$ are isometries; moreover, we require that at least one of the matrices $\widetilde V$, $\widetilde W$ is not square.
It is easy to see that if $A = W\Sigma V^{*}$ is a singular value decomposition of $A$, then the diagonal entries $\sigma_k$ of $\Sigma$ are the singular values of $A$, i.e. $\sigma_k^{2}$ are the eigenvalues of $A^{*}A$. Moreover, the columns $v_k$ of $V$ are the corresponding eigenvectors of $A^{*}A$, $A^{*}Av_k = \sigma_k^{2}v_k$. Note also that if $\sigma_k \ne 0$ then $w_k = \frac{1}{\sigma_k}Av_k$.
All that means that any singular value decomposition can be obtained from a Schmidt decomposition (6.3.2) by the construction described above in this section.
The reduced singular value decomposition (6.3.3) can be interpreted as a matrix form of the Schmidt decomposition (6.3.2) for a non-invertible matrix $A$. For an invertible matrix $A$, the matrix form of the Schmidt decomposition gives the singular value decomposition.
Remark 6.3.11. An alternative way to interpret the singular value decomposition $A = W\Sigma V^{*}$ is to say that $\Sigma$ is the matrix of the operator $A$ in the (orthonormal) bases $\mathcal A := \{v_1, v_2, \dots, v_n\}$ and $\mathcal B := \{w_1, w_2, \dots, w_m\}$, i.e. that $\Sigma = [A]_{\mathcal B\mathcal A}$.
We will use this interpretation later.
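Numerically, both forms of the decomposition are one call away (a sketch using NumPy's `svd`, whose `full_matrices` flag switches between the full form (6.3.4) and the reduced form (6.3.3); the matrix below is a sample):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])          # a 2x3 matrix of rank 2

# Full SVD: W is 2x2, V is 3x3, Sigma is the 2x3 "diagonal" matrix.
W, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(W @ Sigma @ Vh, A))    # True: A = W Sigma V*

# Reduced SVD: only the columns corresponding to nonzero singular values.
Wr, sr, Vrh = np.linalg.svd(A, full_matrices=False)
print(np.allclose(Wr @ np.diag(sr) @ Vrh, A))   # True as well
```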
Note that if we know a singular value decomposition $A = W\Sigma V^{*}$ of a square matrix $A$, we can write a polar decomposition of $A$:
\[
A = W\Sigma V^{*} = (WV^{*})(V\Sigma V^{*}) = U|A|, \tag{6.3.6}
\]
where $U = WV^{*}$ and $|A| = V\Sigma V^{*}$.
To see that this indeed gives us a polar decomposition, let us notice that $P := V\Sigma V^{*}$ is a self-adjoint, positive semidefinite operator and that
\[
P^{2} = V\Sigma V^{*}V\Sigma V^{*} = V\Sigma^{2}V^{*} = A^{*}A.
\]
So, by the definition of $|A|$ as the unique positive semidefinite square root of $A^{*}A$, we can see that $P = |A|$. The transformation $U = WV^{*}$ is clearly unitary, as a product of two unitary transformations, so (6.3.6) indeed gives us a polar decomposition of $A$.
Note that this reasoning only works for square matrices, because if $A$ is not square, then the product $WV^{*}$ is not defined (the dimensions do not match; can you see how?).
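A minimal NumPy sketch of this construction (for a square matrix; we assemble $U = WV^{*}$ and $|A| = V\Sigma V^{*}$ from the SVD exactly as in (6.3.6); the sample matrix is arbitrary):

```python
import numpy as np

def polar(A):
    """Polar decomposition A = U @ P of a square matrix via the SVD,
    following (6.3.6): U = W V*, P = V Sigma V* = |A|."""
    W, s, Vh = np.linalg.svd(A)
    U = W @ Vh                           # unitary factor
    P = Vh.conj().T @ np.diag(s) @ Vh    # positive semidefinite factor |A|
    return U, P

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
U, P = polar(A)
print(np.allclose(U @ P, A))            # True: A = U|A|
print(np.allclose(U.T @ U, np.eye(2)))  # True: U is orthogonal here
```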
Show that the number of non-zero singular values of a matrix coincides with its rank.
Find Schmidt decompositions of the following matrices:
Let $A$ be an invertible matrix, and let $A = W\Sigma V^{*}$ be its singular value decomposition. Find singular value decompositions of $A^{*}$ and $A^{-1}$.
Find a singular value decomposition $A = W\Sigma V^{*}$, where $W$ and $V$ are unitary matrices, for each of the following matrices:
Find a singular value decomposition of the matrix $A$.
Use it to find:
$\max\{\|Ax\| : \|x\| \le 1\}$ and the vectors $x$ where the maximum is attained;
$\min\{\|Ax\| : \|x\| = 1\}$ and the vectors $x$ where the minimum is attained;
the image $A(B)$ of the closed unit ball in $\mathbb R^{2}$, $B = \{x : \|x\| \le 1\}$. Describe it geometrically.
Show that for a square matrix $A$, $|\det A|$ equals the product of the singular values of $A$.
True or false:
The singular values of a matrix are also eigenvalues of the matrix.
The singular values of a matrix $A$ are the eigenvalues of the matrix $|A| = (A^{*}A)^{1/2}$.
If $\sigma$ is a singular value of a matrix $A$ and $c$ is a scalar, then $|c|\sigma$ is a singular value of the matrix $cA$.
The singular values of any linear operator are non-negative.
The singular values of a self-adjoint matrix coincide with its eigenvalues.
Let $A$ be an $m\times n$ matrix. Prove that the non-zero eigenvalues of the matrices $A^{*}A$ and $AA^{*}$ (counting multiplicities) coincide.
Can you say when the zero eigenvalue of $A^{*}A$ and the zero eigenvalue of $AA^{*}$ have the same multiplicity?
Let $\sigma_1$ be the largest singular value of an operator $A$, and let $\lambda$ be the eigenvalue of $A$ with the largest absolute value. Show that $|\lambda| \le \sigma_1$.
Show that the rank of a matrix is the number of its non-zero singular values (counting multiplicities).
Show that the operator norm of a matrix coincides with its Frobenius norm if and only if the matrix has rank one. Hint: The previous problem might help.
For the matrix $A$, describe the inverse image of the unit ball, i.e. the set of all vectors $x$ such that $\|Ax\| \le 1$. Use the singular value decomposition.
As we discussed above, the singular value decomposition is simply diagonalization with respect to two different orthonormal bases. Since we have two different bases here, we cannot say much about the spectral properties of an operator from its singular value decomposition. For example, the diagonal entries of $\Sigma$ in the singular value decomposition (6.3.4) are not the eigenvalues of $A$. Note also that for $A = W\Sigma V^{*}$ as in (6.3.4) we generally have $A^{n} \ne W\Sigma^{n}V^{*}$, so this diagonalization does not help us in computing functions of a matrix.
However, as the examples below show, singular values tell us a lot about the so-called metric properties of a linear transformation.
Final remark: performing singular value decomposition requires finding the eigenvalues and eigenvectors of the Hermitian (self-adjoint) matrix $A^{*}A$. To find eigenvalues we usually computed the characteristic polynomial, found its roots, and so on… This looks like quite a complicated process, especially if one takes into account that there is no formula for finding the roots of polynomials of degree 5 and higher.
However, there are very effective numerical methods of finding the eigenvalues and eigenvectors of a Hermitian matrix up to any given precision. These methods do not involve computing the characteristic polynomial and finding its roots; they compute approximate eigenvalues and eigenvectors directly by an iterative procedure. Because a Hermitian matrix has an orthogonal basis of eigenvectors, these methods work extremely well.
We will not discuss these methods here, for that goes beyond the scope of this book. However, you should believe me that there are very effective numerical methods for computing the eigenvalues and eigenvectors of a Hermitian matrix and for finding the singular value decomposition, and that these methods are just a little more computationally intensive than solving a linear system.
Consider, for example, the following problem: let $A: \mathbb R^{n} \to \mathbb R^{n}$ be a linear transformation, and let $B = \{x \in \mathbb R^{n} : \|x\| \le 1\}$ be the closed unit ball in $\mathbb R^{n}$. We want to describe $A(B)$, i.e. we want to find out how the unit ball is transformed under the linear transformation.
Let us first consider the simplest case, when $A$ is a diagonal matrix $A = \operatorname{diag}\{\sigma_1, \sigma_2, \dots, \sigma_n\}$ with $\sigma_k > 0$, $k = 1, 2, \dots, n$. Then for $x = (x_1, x_2, \dots, x_n)^{T}$ and $y = Ax$ we have $y_k = \sigma_kx_k$ (equivalently, $x_k = y_k/\sigma_k$) for $k = 1, 2, \dots, n$, so
$y = Ax \in A(B)$ if and only if the coordinates $y_1, y_2, \dots, y_n$ satisfy the inequality
\[
\sum_{k=1}^{n}\frac{y_k^{2}}{\sigma_k^{2}} \le 1
\]
(this is simply the inequality $\|x\|^{2} \le 1$).
The set of points in $\mathbb R^{n}$ satisfying the above inequality is called an ellipsoid. If $n = 2$ it is an ellipse with half-axes $\sigma_1$ and $\sigma_2$; for $n = 3$ it is an ellipsoid with half-axes $\sigma_1$, $\sigma_2$ and $\sigma_3$. In higher dimensions the geometry of this set is also easy to visualize, and we call that set an ellipsoid with half-axes $\sigma_1, \sigma_2, \dots, \sigma_n$. The vectors $e_1, e_2, \dots, e_n$ or, more precisely, the corresponding lines, are called the principal axes of the ellipsoid.
The singular value decomposition essentially says that any operator in an inner product space is diagonal with respect to a pair of orthonormal bases, see Remark 6.3.11. Namely, consider the orthonormal bases $v_1, \dots, v_n$ and $w_1, \dots, w_n$ from the singular value decomposition (6.3.4). Then the matrix of $A$ in these bases is the diagonal matrix $\operatorname{diag}\{\sigma_1, \sigma_2, \dots, \sigma_n\}$.
Assuming that all $\sigma_k > 0$ and essentially repeating the above reasoning, it is easy to show that a point $y$ belongs to $A(B)$ if and only if it satisfies the inequality
\[
\sum_{k=1}^{n}\frac{(y_k')^{2}}{\sigma_k^{2}} \le 1,
\]
where $y_1', y_2', \dots, y_n'$ are the coordinates of $y$ in the orthonormal basis $w_1, w_2, \dots, w_n$, not in the standard one. Similarly, $x \in B$ if and only if its coordinates $x_1', \dots, x_n'$ in the basis $v_1, \dots, v_n$ satisfy $\sum_k (x_k')^{2} \le 1$.
But that is essentially the same ellipsoid as before, only "rotated" (with different, but still orthogonal, principal axes)!
There is also an alternative explanation, which is presented below.
Consider the general case, when the matrix $A$ is not necessarily square, and (or) not all singular values are non-zero. Consider first the case of a "diagonal" matrix $\Sigma$ of the form (6.3.5). It is easy to see that the image $\Sigma(B)$ of the unit ball is the ellipsoid (not in the whole space but in $\operatorname{Ran}\Sigma$) with half-axes $\sigma_1, \sigma_2, \dots, \sigma_r$.
Consider now the general case, $A = W\Sigma V^{*}$, where $V$, $W$ are unitary operators. Unitary transformations do not change the unit ball (because they preserve the norm), so $V^{*}(B) = B$. We know that $\Sigma(B)$ is an ellipsoid in $\operatorname{Ran}\Sigma$ with half-axes $\sigma_1, \sigma_2, \dots, \sigma_r$. Unitary transformations do not change the geometry of objects, so $W(\Sigma(B))$ is also an ellipsoid with the same half-axes. It is not hard to see from the decomposition $A = W\Sigma V^{*}$ (using the fact that both $W$ and $V$ are invertible) that $W$ transforms $\operatorname{Ran}\Sigma$ to $\operatorname{Ran}A$, so we can conclude:
the image $A(B)$ of the closed unit ball $B$ is an ellipsoid in $\operatorname{Ran}A$ with half-axes $\sigma_1, \sigma_2, \dots, \sigma_r$. Here $r$ is the number of non-zero singular values, i.e. the rank of $A$.
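One can verify this description numerically (a small sketch: sample points on the unit circle, map them by a sample $2\times 2$ matrix $A$, and check that the images satisfy the ellipse equation in the coordinates of the basis $w_1, w_2$):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
W, s, Vh = np.linalg.svd(A)

t = np.linspace(0.0, 2*np.pi, 400)
circle = np.vstack([np.cos(t), np.sin(t)])   # points on the unit sphere
image = A @ circle                           # their images under A

yprime = W.T @ image                         # coordinates in the basis w1, w2
ellipse_eq = (yprime[0]/s[0])**2 + (yprime[1]/s[1])**2
print(np.allclose(ellipse_eq, 1.0))          # True: the image is the ellipse
```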
Given a linear transformation $A: X \to Y$, let us consider the following optimization problem: find the maximum of $\|Ax\|$ on the closed unit ball $B = \{x \in X : \|x\| \le 1\}$.
Again, the singular value decomposition allows us to solve the problem. For a "diagonal" matrix $\Sigma$ (like the matrix in the definition of the singular value decomposition) with non-negative entries, the maximum is exactly the maximal diagonal entry. Indeed, let $\sigma_1, \sigma_2, \dots, \sigma_r$ be the non-zero diagonal entries of $\Sigma$, and let $\sigma_1$ be the maximal one. Since for $y = \Sigma x$
\[
\|\Sigma x\|^{2} = \sum_{k=1}^{r}\sigma_k^{2}|x_k|^{2}, \tag{6.4.1}
\]
we can conclude that
\[
\|\Sigma x\|^{2} \le \sigma_1^{2}\sum_{k=1}^{r}|x_k|^{2} \le \sigma_1^{2}\|x\|^{2} \le \sigma_1^{2} \qquad \text{for } \|x\| \le 1,
\]
so $\|\Sigma x\| \le \sigma_1$ on $B$. On the other hand, $\|\Sigma e_1\| = \sigma_1$, so indeed $\sigma_1$ is the maximum of $\|\Sigma x\|$ on the closed unit ball $B$. Note that in the above reasoning we did not assume that the matrix $\Sigma$ is square; we only assumed that all entries outside the "main diagonal" are $0$, so formula (6.4.1) holds.
To treat the general case, let us consider the singular value decomposition (6.3.4), $A = W\Sigma V^{*}$, where $V$, $W$ are unitary operators and $\Sigma$ is a "diagonal" matrix with non-negative entries. Since unitary transformations do not change the norm, one can conclude that the maximum of $\|Ax\|$ on the unit ball is the maximal diagonal entry of $\Sigma$, i.e. that
the maximum of $\|Ax\|$ on the closed unit ball $B$ is the maximal singular value $\sigma_1$ of $A$.
The quantity $\|A\| := \max\{\|Ax\| : \|x\| \le 1\}$ is called the operator norm of $A$.
It is an easy exercise to see that $\|A\|$ satisfies all the properties of a norm:
$\|\alpha A\| = |\alpha|\,\|A\|$;
$\|A + B\| \le \|A\| + \|B\|$;
$\|A\| \ge 0$ for all $A$;
$\|A\| = 0$ if and only if $A = 0$;
so it is indeed a norm on the space of linear transformations from $X$ to $Y$.
One of the main properties of the operator norm is the inequality
\[
\|Ax\| \le \|A\|\,\|x\|,
\]
which follows easily from the homogeneity of the norm ($\|\alpha x\| = |\alpha|\,\|x\|$).
In fact, it can be shown that the operator norm is the best (smallest) number $C \ge 0$ such that
\[
\|Ax\| \le C\|x\| \qquad \text{for all } x.
\]
This is often used as a definition of the operator norm.
On the space of linear transformations we already have one norm, the Frobenius, or Hilbert-Schmidt, norm $\|A\|_{2}$,
\[
\|A\|_{2}^{2} = \operatorname{trace}(A^{*}A) = \sum_{j,k}|a_{j,k}|^{2}.
\]
So, let us investigate how these two norms compare.
Let $\sigma_1, \sigma_2, \dots, \sigma_r$ be the non-zero singular values of $A$ (counting multiplicities), and let $\sigma_1$ be the largest singular value. Then $\sigma_1^{2}, \sigma_2^{2}, \dots, \sigma_r^{2}$ are the non-zero eigenvalues of $A^{*}A$ (again counting multiplicities). Recalling that the trace equals the sum of the eigenvalues, we conclude that
\[
\|A\|_{2}^{2} = \operatorname{trace}(A^{*}A) = \sum_{k=1}^{r}\sigma_k^{2} \ge \sigma_1^{2}.
\]
On the other hand, we know that the operator norm of $A$ equals its largest singular value, i.e. $\|A\| = \sigma_1$. So we can conclude that $\|A\| \le \|A\|_{2}$, i.e. that
the operator norm of a matrix cannot be more than its Frobenius norm.
This statement also admits a direct proof using the Cauchy–Schwarz inequality, and such a proof is presented in some textbooks. The beauty of the proof presented here is that it does not require any computations and illuminates the reasons behind the inequality.
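A quick numerical check of this inequality (in NumPy, `ord=2` gives the operator norm and `ord='fro'` the Frobenius norm; the random matrix is a sample):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

operator_norm = np.linalg.norm(A, ord=2)       # largest singular value
frobenius_norm = np.linalg.norm(A, ord='fro')  # sqrt of sum of squares of entries

print(operator_norm <= frobenius_norm + 1e-12)                            # True
print(np.isclose(operator_norm, np.linalg.svd(A, compute_uv=False)[0]))   # True
```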
Suppose we have an invertible matrix $A$ and we want to solve the equation $Ax = b$. The solution, of course, is given by $x = A^{-1}b$, but we want to investigate what happens if we know the data only approximately.
That happens in real life, when the data is obtained, for example, by some experiments. But even if we have exact data, round-off errors during computations by a computer may have the same effect of distorting the data.
Let us consider the simplest model: suppose there is a small error in the right side of the equation. That means, instead of the equation $Ax = b$ we are solving
\[
Ax = b + \Delta b,
\]
where $\Delta b$ is a small perturbation of the right side $b$.
So, instead of the exact solution $x$ of $Ax = b$ we get the approximate solution $x + \Delta x$ of $Ax = b + \Delta b$. We are assuming that $A$ is invertible, so $\Delta x = A^{-1}\Delta b$.
We want to know how big the relative error in the solution, $\|\Delta x\|/\|x\|$, is in comparison with the relative error in the right side, $\|\Delta b\|/\|b\|$. It is easy to see that
\[
\frac{\|\Delta x\|}{\|x\|} = \frac{\|A^{-1}\Delta b\|}{\|x\|}
\le \|A^{-1}\|\frac{\|\Delta b\|}{\|x\|}
= \|A^{-1}\|\,\frac{\|b\|}{\|x\|}\,\frac{\|\Delta b\|}{\|b\|}.
\]
Since $b = Ax$ and therefore $\|b\| \le \|A\|\,\|x\|$, we can conclude that
\[
\frac{\|\Delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|}.
\]
The quantity $\|A\|\,\|A^{-1}\|$ is called the condition number of the matrix $A$. It estimates how the relative error in the solution $x$ depends on the relative error in the right side $b$.
Let us see how this quantity is related to singular values. Let $\sigma_1, \sigma_2, \dots, \sigma_n$ be the singular values of $A$, and let us assume that $\sigma_1$ is the largest singular value and $\sigma_n$ is the smallest. We know that the (operator) norm of an operator equals its largest singular value, and the singular values of $A^{-1}$ are the reciprocals $1/\sigma_k$, so
\[
\|A\| = \sigma_1, \qquad \|A^{-1}\| = \frac{1}{\sigma_n},
\]
so
\[
\|A\|\,\|A^{-1}\| = \frac{\sigma_1}{\sigma_n}.
\]
In other words:
the condition number of a matrix equals the ratio of its largest and smallest singular values.
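Numerically, `np.linalg.cond` with `p=2` computes exactly this ratio (a sketch with a sample nearly singular matrix):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-6]])

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])           # ratio of largest to smallest singular value: 1e6
print(np.linalg.cond(A, 2))   # the same condition number
```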
We deduced above that $\|\Delta x\|/\|x\| \le \|A\|\,\|A^{-1}\|\cdot\|\Delta b\|/\|b\|$. It is not hard to see that this estimate is sharp, i.e. that it is possible to pick a right side $b$ and an error $\Delta b$ such that we have equality,
\[
\frac{\|\Delta x\|}{\|x\|} = \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|}.
\]
We just put $b = w_1$ and $\Delta b = \varepsilon w_n$, where $w_1$ and $w_n$ are respectively the first and the last columns of the matrix $W$ in the singular value decomposition $A = W\Sigma V^{*}$, and $\varepsilon$ is an arbitrary scalar. Here, as usual, the singular values are assumed to be in non-increasing order, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$, so $\sigma_1$ is the largest and $\sigma_n$ is the smallest singular value.
We leave the details as an exercise for the reader.
A matrix is called well conditioned if its condition number is not too big. If the condition number is big, the matrix is called ill conditioned. What is "big" here depends on the problem: with what precision you can find your right side, what precision is required for the solution, etc.
Theoretically, the rank of a matrix is easy to compute: one just needs to row reduce the matrix and count the pivots. However, in practical applications not everything is so easy. The main reason is that very often we do not know the exact matrix; we only know its approximation up to some precision.
Moreover, even if we know the exact matrix, most computer programs introduce round-off errors in the computations, so effectively we cannot distinguish between a zero pivot and a very small pivot.
A simple, naïve way of working with round-off errors is as follows. When computing the rank (and other objects related to it, like the column space, the kernel, etc.) one simply sets up a tolerance (some small number) and, if a pivot is smaller than the tolerance, counts it as zero. The advantage of this approach is its simplicity, since it is very easy to program. However, the main disadvantage is that it is impossible to see what the tolerance is responsible for: for example, what do we lose if we set the tolerance to a given value, and how much better would a smaller one be?
While the above approach works well for well conditioned matrices, it is not very reliable in the general case.
A better approach is to use singular values. It requires more computations, but gives much better results, which are easier to interpret. In this approach we also set up some small number as a tolerance, and then perform singular value decomposition. Then we simply treat singular values smaller than the tolerance as zero. The advantage of this approach is that we can see what we are doing: the singular values are the half-axes of the ellipsoid $A(B)$ ($B$ being the closed unit ball), so by setting up the tolerance we are just deciding how "thin" the ellipsoid should be to be considered "flat".
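A sketch of this SVD-based numerical rank in NumPy (the tolerance `1e-8` is an arbitrary sample value, to be chosen from the precision of the data):

```python
import numpy as np

def numerical_rank(A, tol=1e-8):
    """Rank of A computed as the number of singular values above tol."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

# A matrix that is "numerically" of rank 1: its second singular value
# is at round-off level even though it is formally nonzero.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-15]])
print(numerical_rank(A))          # 1 with the chosen tolerance
print(np.linalg.matrix_rank(A))   # NumPy's own SVD-based rank; likely 1 here too
```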
As we discussed in Section 5.4 of Chapter 5 above, a least square solution gives us, in the case when the equation $Ax = b$ does not have a solution, the "next best thing" (and it gives us a solution of $Ax = b$ when one exists).
Note that the question of uniqueness is not addressed by the least square solution: a solution of the normal equation $A^{*}Ax = A^{*}b$ does not have to be unique. A natural distinguished solution would be a solution of minimal norm; such a solution is indeed unique, and can be obtained by taking an arbitrary solution and then taking its projection onto $(\operatorname{Ker}A)^{\perp} = \operatorname{Ran}A^{*}$, see Exercises 5.4.5 and 5.4.6 in Chapter 5.
It is not hard to see that if $A = \widetilde W\widetilde\Sigma\widetilde V^{*}$ is a reduced singular value decomposition of $A$, then the minimal norm least square solution is given by
\[
x_0 = \widetilde V\widetilde\Sigma^{-1}\widetilde W^{*}b. \tag{6.4.2}
\]
Indeed, $x_0$ is a least square solution of $Ax = b$ (i.e. a solution of $A^{*}Ax = A^{*}b$):
\[
A^{*}Ax_0 = \widetilde V\widetilde\Sigma\widetilde W^{*}\,\widetilde W\widetilde\Sigma\widetilde V^{*}\,\widetilde V\widetilde\Sigma^{-1}\widetilde W^{*}b = \widetilde V\widetilde\Sigma\widetilde W^{*}b = A^{*}b;
\]
in the last equalities in the chain we used the fact that $\widetilde W^{*}\widetilde W = I$ and that $\widetilde V^{*}\widetilde V = I$ (see Exercise 6.4.4 below).
The general solution of $A^{*}Ax = A^{*}b$ is given by $x = x_0 + x_1$, where $x_1 \in \operatorname{Ker}(A^{*}A) = \operatorname{Ker}A$.
Note that $x_0 \in \operatorname{Ran}\widetilde V$ and $\operatorname{Ran}\widetilde V = \operatorname{Ran}A^{*}$, see again Exercise 6.4.4 below. Therefore $x_0 \perp x_1$ (because $\operatorname{Ker}A = (\operatorname{Ran}A^{*})^{\perp}$, see Theorem 5.5.1 in Chapter 5), so $\|x_0 + x_1\|^{2} = \|x_0\|^{2} + \|x_1\|^{2} \ge \|x_0\|^{2}$, and $x_0$ is indeed the unique minimal norm solution of $A^{*}Ax = A^{*}b$, or, equivalently, the minimal norm least square solution of $Ax = b$.
The operator $A^{+} := \widetilde V\widetilde\Sigma^{-1}\widetilde W^{*}$, where $A = \widetilde W\widetilde\Sigma\widetilde V^{*}$ is a reduced singular value decomposition of $A$, is called the Moore–Penrose inverse (or Moore–Penrose pseudoinverse) of the operator $A$. In other words, the Moore–Penrose inverse is the operator giving the unique minimal norm least square solution of the equation $Ax = b$.
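In NumPy this operator is available as `np.linalg.pinv`; one can also assemble it directly from a reduced SVD as in (6.4.2) (a sketch; the cutoff deciding which singular values count as nonzero is a sample choice):

```python
import numpy as np

def moore_penrose(A, tol=1e-12):
    """A^+ = V~ Sigma~^{-1} W~* built from the reduced SVD of A."""
    W, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol))          # number of nonzero singular values
    return Vh[:r].conj().T @ np.diag(1.0/s[:r]) @ W[:, :r].conj().T

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])            # rank 1, not invertible
b = np.array([1.0, 2.0, 3.0])

x0 = moore_penrose(A) @ b
print(x0)                                       # [1. 0.]: minimal norm least squares
print(np.allclose(x0, np.linalg.pinv(A) @ b))   # True
```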
In the literature the Moore–Penrose inverse is usually defined as a matrix $A^{+}$ such that
1. $AA^{+}A = A$;
2. $A^{+}AA^{+} = A^{+}$;
3. $(AA^{+})^{*} = AA^{+}$;
4. $(A^{+}A)^{*} = A^{+}A$.
It is very easy to check that the operator $A^{+} = \widetilde V\widetilde\Sigma^{-1}\widetilde W^{*}$ satisfies properties 1–4 above.
It is also possible (although a bit harder) to show that an operator satisfying properties 1–4 is unique. Indeed, right and left multiplying identity 1 by $A^{+}$, we get that $(A^{+}A)^{2} = A^{+}A$ and $(AA^{+})^{2} = AA^{+}$; together with properties 3 and 4 this means that $A^{+}A$ and $AA^{+}$ are orthogonal projections (see Exercise 5.5.6 in Chapter 5).
Trivially, $\operatorname{Ran}AA^{+} \subset \operatorname{Ran}A$. On the other hand, identity 1 implies that $\operatorname{Ran}A \subset \operatorname{Ran}AA^{+}$ (why?). But this means that $AA^{+}$ is the orthogonal projection onto $\operatorname{Ran}A$,
\[
AA^{+} = P_{\operatorname{Ran}A}.
\]
Property 2 also implies that $A^{+}Ax = x$ for all $x \in \operatorname{Ran}A^{+}$. Since $A^{+}A$ is an orthogonal projection, we conclude that $\operatorname{Ran}A^{+} \subset \operatorname{Ran}A^{+}A$. The opposite inclusion is trivial, so $A^{+}A$ is the orthogonal projection onto $\operatorname{Ran}A^{+}$,
\[
A^{+}A = P_{\operatorname{Ran}A^{+}}.
\]
Knowing $AA^{+}$ and $A^{+}A$, we can rewrite property 2 as
\[
A^{+}P_{\operatorname{Ran}A} = A^{+},
\]
i.e. $A^{+}$ vanishes on $(\operatorname{Ran}A)^{\perp}$. Combining the above identities with property 4 we also get
\[
\operatorname{Ran}A^{+} = \operatorname{Ran}A^{+}A = \operatorname{Ran}(A^{+}A)^{*} \subset \operatorname{Ran}A^{*} = (\operatorname{Ker}A)^{\perp}.
\]
Finally, for any $b$ in the target space of $A$,
\[
A^{*}A(A^{+}b) = A^{*}(AA^{+})b = A^{*}P_{\operatorname{Ran}A}b
\]
and
\[
A^{*}P_{\operatorname{Ran}A}b = A^{*}b
\]
(because $A^{*}$ annihilates $(\operatorname{Ran}A)^{\perp} = \operatorname{Ker}A^{*}$), i.e. $A^{+}b$ is a least square solution of $Ax = b$. Since $A^{+}b \in \operatorname{Ran}A^{+} \subset (\operatorname{Ker}A)^{\perp}$, the vector $A^{+}b$ is, as we discussed above, the least square solution of minimal norm. But, as we have shown before, such a minimal norm solution is given by (6.4.2), so any operator satisfying properties 1–4 coincides with $\widetilde V\widetilde\Sigma^{-1}\widetilde W^{*}$. ∎
Find the norms and condition numbers of the following matrices:
For the matrix $A$ from part a), present an example of a right side $b$ and an error $\Delta b$ such that
\[
\frac{\|\Delta x\|}{\|x\|} = \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|};
\]
here $x = A^{-1}b$ and $\Delta x = A^{-1}\Delta b$.
Let $A$ be a normal operator, and let $\lambda_1, \lambda_2, \dots, \lambda_n$ be its eigenvalues (counting multiplicities). Show that the singular values of $A$ are $|\lambda_1|, |\lambda_2|, \dots, |\lambda_n|$.
Find the singular values, norm and condition number of the matrix $A$.
You can do this problem practically without any computations if you use the previous problem and can answer the following questions:
What are the singular values (eigenvalues) of an orthogonal projection $P_E$ onto some subspace $E$?
What is the matrix of the orthogonal projection onto the subspace spanned by a given vector $v$?
How are the eigenvalues of the operators $T$ and $aT + bI$, where $a$ and $b$ are scalars, related?
Of course, you can also just honestly do the computations.
Let $A = \widetilde W\widetilde\Sigma\widetilde V^{*}$ be a reduced singular value decomposition of $A$. Show that $\operatorname{Ran}\widetilde W = \operatorname{Ran}A$, and then, by taking adjoints, that $\operatorname{Ran}\widetilde V = \operatorname{Ran}A^{*}$.
Write a formula for the Moore–Penrose inverse $A^{+}$ in terms of a singular value decomposition $A = W\Sigma V^{*}$ with unitary $W$ and $V$.
Tychonov's regularization: prove that the Moore–Penrose inverse $A^{+}$ can be computed as the limits
\[
A^{+} = \lim_{\varepsilon\to 0^{+}}(A^{*}A + \varepsilon I)^{-1}A^{*} = \lim_{\varepsilon\to 0^{+}}A^{*}(AA^{*} + \varepsilon I)^{-1}.
\]
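This limit can be checked numerically (a sketch; $\varepsilon = 10^{-9}$ is a sample value):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
eps = 1e-9

tychonov = np.linalg.inv(A.T @ A + eps*np.eye(2)) @ A.T
print(np.allclose(tychonov, np.linalg.pinv(A), atol=1e-6))   # True (up to eps)
```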
An orthogonal matrix $U$ with $\det U = 1$ is often called a rotation. The theorem below explains this name.
Theorem 6.5.1. Let $U$ be an orthogonal operator in $\mathbb R^{n}$, and let $\det U = 1$.³ Then there exists an orthonormal basis $v_1, v_2, \dots, v_n$ such that the matrix of $U$ in this basis has the block diagonal form
\[
\begin{pmatrix}
R_{\varphi_1} & & & \\
& \ddots & & \\
& & R_{\varphi_k} & \\
& & & I_{n-2k}
\end{pmatrix},
\]
where the $R_{\varphi_j}$ are $2$-dimensional rotations,
\[
R_{\varphi} = \begin{pmatrix}
\cos\varphi & -\sin\varphi \\
\sin\varphi & \cos\varphi
\end{pmatrix},
\]
and $I_{n-2k}$ stands for the identity matrix of size $(n-2k)\times(n-2k)$.
³For an orthogonal matrix $U$, $\det U = \pm 1$.
We know that if $p$ is a polynomial with real coefficients and $\lambda$ is its complex root, $p(\lambda) = 0$, then $\overline{\lambda}$ is a root of $p$ as well, $p(\overline{\lambda}) = 0$ (this can easily be checked by plugging $\overline{\lambda}$ into the polynomial).
Therefore, all complex (non-real) eigenvalues of a real matrix can be split into pairs $\lambda_j$, $\overline{\lambda_j}$.
We know that the eigenvalues of a unitary matrix have absolute value $1$, so all complex eigenvalues of $U$ (which is both real and unitary) can be written as pairs $\lambda_j = \cos\alpha_j + i\sin\alpha_j$, $\overline{\lambda_j} = \cos\alpha_j - i\sin\alpha_j$.
Fix a pair of complex eigenvalues $\lambda = \cos\alpha + i\sin\alpha$ and $\overline{\lambda}$, and let $u \in \mathbb C^{n}$ be an eigenvector of $U$, $Uu = \lambda u$. Then $U\overline{u} = \overline{\lambda}\,\overline{u}$. Now split $u$ into its real and imaginary parts: define
\[
x := \operatorname{Re}u = \tfrac{1}{2}(u + \overline{u}), \qquad y := \operatorname{Im}u = \tfrac{1}{2i}(u - \overline{u}),
\]
so $u = x + iy$ (note that $x, y$ are real vectors, i.e. vectors with real entries). Then
\[
Ux = \tfrac{1}{2}(\lambda u + \overline{\lambda}\,\overline{u}) = \operatorname{Re}(\lambda u) = x\cos\alpha - y\sin\alpha.
\]
Similarly,
\[
Uy = \tfrac{1}{2i}(\lambda u - \overline{\lambda}\,\overline{u}) = \operatorname{Im}(\lambda u) = x\sin\alpha + y\cos\alpha.
\]
In other words, $U$ leaves the $2$-dimensional subspace spanned by the vectors $x, y$ invariant, and the matrix of the restriction of $U$ onto this subspace in the basis $x, y$ is
\[
R = \begin{pmatrix}
\cos\alpha & \sin\alpha \\
-\sin\alpha & \cos\alpha
\end{pmatrix},
\]
a rotation (through the angle $-\alpha$).
Note that the vectors $u$ and $\overline{u}$ (eigenvectors of a unitary matrix corresponding to different eigenvalues $\lambda \ne \overline{\lambda}$) are orthogonal, so
\[
0 = (u, \overline{u}) = \|x\|^{2} - \|y\|^{2} + 2i(x, y).
\]
It follows that $\|x\| = \|y\|$ and $(x, y) = 0$, so $x, y$ is an orthogonal basis in $\operatorname{span}\{x, y\}$. If we multiply each vector in a basis by the same non-zero number, we do not change matrices of linear transformations, so without loss of generality we can assume that $\|x\| = \|y\| = 1$, i.e. that $x, y$ is an orthonormal basis in $\operatorname{span}\{x, y\}$.
Let us complete the orthonormal system $x, y$ to an orthonormal basis in $\mathbb R^{n}$. Since $U(E) \subset E$ for $E := \operatorname{span}\{x, y\}$, i.e. $E$ is an invariant subspace of $U$, the matrix of $U$ in this basis has the block triangular form
\[
\begin{pmatrix} R & * \\ \mathbf 0 & U_1 \end{pmatrix},
\]
where $\mathbf 0$ stands for the $(n-2)\times 2$ block of zeroes.
Since the rotation matrix $R$ is invertible, we have $U(E) = E$. Therefore
\[
U^{*}(E) = U^{-1}(E) = E,
\]
so $E^{\perp}$ is also invariant under $U$, and the matrix of $U$ in the basis we constructed is in fact block diagonal,
\[
\begin{pmatrix} R & \mathbf 0 \\ \mathbf 0 & U_1 \end{pmatrix}.
\]
Since $U$ is unitary,
\[
I = U^{*}U = \begin{pmatrix} R^{*}R & \mathbf 0 \\ \mathbf 0 & U_1^{*}U_1 \end{pmatrix},
\]
so, since $U_1$ is square, it is also unitary (and real, i.e. orthogonal).
If $U_1$ has complex eigenvalues, we can apply the same procedure to decrease its size by 2, and so on, until we are left with a block that has only real eigenvalues. The real eigenvalues of an orthogonal matrix can only be $1$ or $-1$, so in some orthonormal basis the matrix of $U$ has the form
\[
\begin{pmatrix}
R_{\alpha_1} & & & & \\
& \ddots & & & \\
& & R_{\alpha_d} & & \\
& & & I_r & \\
& & & & -I_{\rho}
\end{pmatrix};
\]
here $I_r$ and $I_{\rho}$ are identity matrices of sizes $r$ and $\rho$ respectively. Since $\det U = 1$, the multiplicity $\rho$ of the eigenvalue $-1$ must be even.
Note that the $2\times 2$ matrix $-I_2$ can be interpreted as the rotation through the angle $\pi$. Therefore, pairing up the $-1$'s on the diagonal, we see that the above matrix has the form given in the conclusion of the theorem, with $\varphi_j = \alpha_j$ or $\varphi_j = \pi$. ∎
Let us give a different interpretation of Theorem 6.5.1. Define $T_j$ to be the rotation through $\varphi_j$ in the plane spanned by the vectors $v_{2j-1}, v_{2j}$ (acting as the identity on the orthogonal complement of this plane), $j = 1, 2, \dots, k$. Then Theorem 6.5.1 simply says that $U$ is the composition of the rotations $T_1, T_2, \dots, T_k$. Note that because these rotations act in mutually orthogonal planes, they commute, i.e. it does not matter in what order we take the composition. So, the theorem can be interpreted as follows:
Any rotation in $\mathbb R^{n}$ can be represented as a composition of at most $n/2$ commuting planar rotations.
If an orthogonal matrix has determinant $-1$, its structure is described by the following theorem.
Theorem 6.5.2. Let $U$ be an orthogonal operator in $\mathbb R^{n}$, and let $\det U = -1$. Then there exists an orthonormal basis $v_1, v_2, \dots, v_n$ such that the matrix of $U$ in this basis has the block diagonal form
\[
\begin{pmatrix}
R_{\varphi_1} & & & & \\
& \ddots & & & \\
& & R_{\varphi_k} & & \\
& & & I_{n-2k-1} & \\
& & & & -1
\end{pmatrix},
\]
where $R_{\varphi_1}, \dots, R_{\varphi_k}$ are $2$-dimensional rotations,
and $I_{n-2k-1}$ stands for the identity matrix of size $(n-2k-1)\times(n-2k-1)$.
We leave the proof as an exercise for the reader. The modifications that one should make to the proof of Theorem 6.5.1 are pretty obvious.
Note that it follows from the above theorem that an orthogonal $2\times 2$ matrix with determinant $-1$ is always a reflection.
Let us now fix an orthonormal basis, say the standard basis $e_1, e_2, \dots, e_n$ in $\mathbb R^{n}$. We call an elementary rotation⁴ a rotation in the $x_j$-$x_k$ plane, i.e. a linear transformation which changes only the coordinates $x_j$ and $x_k$, and acts on these two coordinates as a plane rotation.
⁴This term is not widely accepted.
Theorem 6.5.3. Any rotation $U$ in $\mathbb R^{n}$ (i.e. any orthogonal transformation $U$ with $\det U = 1$) can be represented as a product of at most $n(n-1)/2$ elementary rotations.
To prove the theorem we will need the following simple lemmas.
Lemma 6.5.4. Let $x = (x_1, x_2)^{T} \in \mathbb R^{2}$. There exists a rotation $R$ of $\mathbb R^{2}$ which moves the vector $x$ to the vector $(a, 0)^{T}$, where $a = \|x\| \ge 0$.
The proof is elementary, and we leave it as an exercise for the reader. One can just draw a picture and/or write a formula for $R$.
Lemma 6.5.5. Let $x \in \mathbb R^{n}$. There exist $n-1$ elementary rotations $R_1, R_2, \dots, R_{n-1}$ such that
\[
R_{n-1}\cdots R_2R_1x = (a, 0, \dots, 0)^{T},
\]
where $a = \|x\| \ge 0$.
The idea of the proof of the lemma is very simple. We use an elementary rotation in the $x_{n-1}$-$x_n$ plane to "kill" the last coordinate of $x$ (Lemma 6.5.4 guarantees that such a rotation exists). Then we use an elementary rotation in the $x_{n-2}$-$x_{n-1}$ plane to "kill" coordinate number $n-1$ of the resulting vector (this rotation does not change the last coordinate, so the last coordinate remains zero), and so on…
For a formal proof we will use induction in $n$. The case $n = 1$ is trivial, since any vector in $\mathbb R^{1}$ already has the desired form. The case $n = 2$ is treated by Lemma 6.5.4.
Assuming now that the lemma is true for $n-1$, let us prove it for $n$. By Lemma 6.5.4 there exists a $2\times 2$ rotation matrix $\widetilde R$ such that
\[
\widetilde R\begin{pmatrix} x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} \widetilde a \\ 0 \end{pmatrix},
\]
where $\widetilde a = (x_{n-1}^{2} + x_n^{2})^{1/2}$. So if we define the elementary rotation $R_1$ by
\[
R_1 = \begin{pmatrix} I_{n-2} & 0 \\ 0 & \widetilde R \end{pmatrix}
\]
($I_{n-2}$ is the $(n-2)\times(n-2)$ identity matrix), then
\[
R_1x = (x_1, x_2, \dots, x_{n-2}, \widetilde a, 0)^{T}.
\]
We assumed that the conclusion of the lemma holds for $n-1$, so there exist $n-2$ elementary rotations (let us call them $R_2, \dots, R_{n-1}$) in $\mathbb R^{n-1}$ which transform the vector $(x_1, x_2, \dots, x_{n-2}, \widetilde a)^{T}$ to the vector $(b, 0, \dots, 0)^{T}$. In other words,
\[
R_{n-1}\cdots R_2(x_1, \dots, x_{n-2}, \widetilde a)^{T} = (b, 0, \dots, 0)^{T}.
\]
We can always assume that these elementary rotations act in $\mathbb R^{n}$, simply by assuming that they do not change the last coordinate. Then
\[
R_{n-1}\cdots R_2R_1x = (b, 0, \dots, 0)^{T}.
\]
Let us now show that $b = \|x\|$. This can easily be checked directly, but we apply the following indirect reasoning: we know that orthogonal transformations preserve the norm, and we know that $b \ge 0$. But then we do not have any choice; the only possibility is $b = \|x\|$. ∎
Lemma 6.5.6. Let $A$ be an $n\times n$ matrix with real entries. There exist elementary rotations $R_1, R_2, \dots, R_N$, $N \le n(n-1)/2$, such that the matrix $R_N\cdots R_2R_1A$ is upper triangular and, moreover, all its diagonal entries, except possibly the last one, are non-negative.
We will use induction in $n$. The case $n = 1$ is trivial, since we can say that any $1\times 1$ matrix is of the desired form.
Let us consider the case $n = 2$. Let $a_1$ be the first column of $A$. By Lemma 6.5.4 there exists a rotation $R$ which "kills" the second coordinate of $a_1$, making the first coordinate non-negative. Then the matrix $RA$ is of the desired form.
Let us now assume that the lemma holds for $(n-1)\times(n-1)$ matrices, and prove it for $n\times n$ matrices. For an $n\times n$ matrix $A$, let $a_1$ be its first column. By Lemma 6.5.5 we can find $n-1$ elementary rotations (say $R_1, \dots, R_{n-1}$) which transform $a_1$ into $(a, 0, \dots, 0)^{T}$, $a = \|a_1\| \ge 0$. So the matrix $R_{n-1}\cdots R_1A$ has the block triangular form
\[
R_{n-1}\cdots R_1A = \begin{pmatrix} a & * \\ 0 & A_1 \end{pmatrix},
\]
where $A_1$ is an $(n-1)\times(n-1)$ block.
We assumed that the lemma holds for $(n-1)\times(n-1)$ matrices, so $A_1$ can be transformed by at most $(n-1)(n-2)/2$ elementary rotations into the desired upper triangular form. Note that these rotations act in $\mathbb R^{n-1}$ (only on the coordinates $x_2, \dots, x_n$), but we can always assume that they act on the whole $\mathbb R^{n}$, simply by assuming that they do not change the first coordinate. Then these rotations do not change the vector $(a, 0, \dots, 0)^{T}$ (the first column of $R_{n-1}\cdots R_1A$), so the matrix $A$ can be transformed into the desired upper triangular form by at most $(n-1) + (n-1)(n-2)/2 = n(n-1)/2$ elementary rotations. ∎
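In numerical linear algebra these elementary rotations are known as Givens rotations, and the reduction of Lemma 6.5.6 is essentially QR factorization by Givens rotations. A compact NumPy sketch (the function names are ours, for illustration):

```python
import numpy as np

def givens(n, i, j, a, b):
    """Elementary rotation in the x_i - x_j plane sending (a, b) to (r, 0)."""
    G = np.eye(n)
    r = np.hypot(a, b)
    if r > 0:
        c, s = a / r, b / r
        G[[i, i, j, j], [i, j, i, j]] = [c, s, -s, c]
    return G

def triangularize(A):
    """Return (R, T): a product R of elementary rotations with T = R A upper
    triangular, as in Lemma 6.5.6."""
    A = A.astype(float)
    n = A.shape[0]
    R = np.eye(n)
    for col in range(n - 1):
        for row in range(n - 1, col, -1):   # kill entries from the bottom up
            G = givens(n, row - 1, row, A[row - 1, col], A[row, col])
            A, R = G @ A, G @ R
    return R, A

A = np.array([[0.0, 1.0], [1.0, 0.0]])
R, T = triangularize(A)
print(np.allclose(np.tril(T, -1), 0))   # True: T is upper triangular
print(np.allclose(R.T @ T, A))          # True: A = R^{-1} T
```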
By Lemma 6.5.6 there exist elementary rotations $R_1, R_2, \dots, R_N$ such that the matrix $T := R_N\cdots R_2R_1U$ is upper triangular, and all its diagonal entries, except maybe the last one, are non-negative.
Note that the matrix $T$ is orthogonal (as a product of orthogonal matrices). Any orthogonal matrix is normal, and we know that an upper triangular matrix can be normal only if it is diagonal. Therefore, $T$ is a diagonal matrix.
We know that an eigenvalue of an orthogonal matrix can only be $1$ or $-1$, so we can have only $1$ or $-1$ on the diagonal of $T$. But we know that all diagonal entries of $T$, except maybe the last one, are non-negative, so all the diagonal entries of $T$, except maybe the last one, are $1$. The last diagonal entry can be $\pm 1$.
Since elementary rotations have determinant $1$, we can conclude that $\det T = \det U = 1$, so the last diagonal entry must also be $1$. So $T = I$, and therefore $U$ can be represented as the product of elementary rotations
\[
U = R_1^{-1}R_2^{-1}\cdots R_N^{-1}.
\]
Here we used the fact that the inverse of an elementary rotation is an elementary rotation as well. ∎
In Figures 6.1 and 6.2 below we see 3 orthonormal bases in $\mathbb R^{2}$ and $\mathbb R^{3}$ respectively. In each figure, the basis b) can be obtained from the standard basis a) by a rotation, while it is impossible to rotate the standard basis to get the basis c) (so that $e_k$ goes to $v_k$).
You have probably heard the word "orientation" before, and you probably know that the bases a) and b) have positive orientation, while the orientation of the bases c) is negative. You also probably know some rules for determining the orientation, like the right hand rule from physics. So, if you can see a basis, say in $\mathbb R^{3}$, you probably can say what orientation it has.
But what if you are only given the coordinates of the vectors $v_1, v_2, v_3$? Of course, you can try to draw a picture to visualize the vectors, and then see what the orientation is. But this is not always easy. Moreover, how do you "explain" this to a computer?
It turns out that there is an easier way. Let us explain it. We need to check whether it is possible to get a basis $v_1, v_2, \dots, v_n$ in $\mathbb R^{n}$ by rotating the standard basis $e_1, e_2, \dots, e_n$. There is a unique linear transformation $T$ such that
\[
Te_k = v_k, \qquad k = 1, 2, \dots, n;
\]
its matrix (in the standard basis) is the matrix with columns $v_1, v_2, \dots, v_n$. It is an orthogonal matrix (because it transforms an orthonormal basis to an orthonormal basis), so we only need to see when it is a rotation. Theorems 6.5.1 and 6.5.2 give us the answer: the matrix is a rotation if and only if its determinant is $1$. Note that (for $3\times 3$ matrices) if $\det T = -1$, then $T$ is the composition of a rotation about some axis and a reflection in the plane of rotation, i.e. in the plane orthogonal to this axis.
This gives us a motivation for the formal definition below.
Definition. Let $\mathcal A = \{a_1, a_2, \dots, a_n\}$ and $\mathcal B = \{b_1, b_2, \dots, b_n\}$ be two bases in a real vector space $X$. We say that the bases $\mathcal A$ and $\mathcal B$ have the same orientation if the change of coordinates matrix $[I]_{\mathcal B\mathcal A}$ has positive determinant, and say that they have different orientations if the determinant of $[I]_{\mathcal B\mathcal A}$ is negative.
Note that since $[I]_{\mathcal A\mathcal B} = ([I]_{\mathcal B\mathcal A})^{-1}$, one can use either of the two change of coordinates matrices in the definition.
We usually assume that the standard basis $e_1, e_2, \dots, e_n$ in $\mathbb R^{n}$ has positive orientation. In an abstract space $X$ one just needs to fix a basis and declare that its orientation is positive.
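In coordinates the orientation test is a single determinant; a minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def same_orientation_as_standard(basis_vectors):
    """basis_vectors: a list of n vectors forming a basis of R^n.
    Positive orientation <=> the matrix with these columns has det > 0."""
    M = np.column_stack(basis_vectors)
    return np.linalg.det(M) > 0

print(same_orientation_as_standard([(0, 1), (1, 0)]))    # False: a "flip"
print(same_orientation_as_standard([(0, -1), (1, 0)]))   # True: a rotation by 90 degrees
```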
We say that a basis $\mathcal A = \{a_1, \dots, a_n\}$ can be continuously transformed to a basis $\mathcal B = \{b_1, \dots, b_n\}$ if there exists a continuous family of bases $\mathcal V(t) = \{v_1(t), \dots, v_n(t)\}$, $t \in [0, 1]$, such that
\[
v_k(0) = a_k, \qquad v_k(1) = b_k, \qquad k = 1, 2, \dots, n.
\]
"Continuous family of bases" means that the vector-functions $v_k(t)$ are continuous (their coordinates in some basis are continuous functions of $t$) and, which is essential, the system $v_1(t), \dots, v_n(t)$ is a basis for all $t \in [0, 1]$.
Note that by performing a change of variables we can always assume, if necessary, that the parameter interval is $[0, 1]$.
Theorem 6.6.1. Two bases $\mathcal A$ and $\mathcal B$ have the same orientation if and only if one of the bases can be continuously transformed to the other.
Suppose the basis $\mathcal A$ can be continuously transformed to the basis $\mathcal B$, and let $\mathcal V(t)$, $t \in [0, 1]$, be a continuous family of bases performing this transformation. Consider the matrix-function $B(t)$ whose columns are the coordinate vectors of $v_1(t), \dots, v_n(t)$ in the basis $\mathcal A$.
Clearly, the entries of $B(t)$ are continuous functions, $B(0) = I$ and $B(1) = [I]_{\mathcal A\mathcal B}$. Note that because $\mathcal V(t)$ is always a basis, $\det B(t)$ is never zero. Then the Intermediate Value Theorem asserts that $\det B(0)$ and $\det B(1)$ have the same sign. Since $\det B(0) = \det I = 1 > 0$, we can conclude that
\[
\det[I]_{\mathcal A\mathcal B} > 0,
\]
so the bases $\mathcal A$ and $\mathcal B$ have the same orientation.
To prove the opposite implication, i.e. the "only if" part of the theorem, one needs to show that the identity matrix $I$ can be continuously transformed through invertible matrices to any matrix $B$ satisfying $\det B > 0$. In other words, one needs to show that there exists a continuous matrix-function $V(t)$ on the interval $[0, 1]$ such that for all $t$ the matrix $V(t)$ is invertible and such that
\[
V(0) = I, \qquad V(1) = B.
\]
We leave the proof of this fact as an exercise for the reader. There are several ways to prove it, one of which is outlined in Exercises 6.6.2–6.6.5 below. ∎
Let $T$ be the rotation through the angle $\gamma$, so its matrix in the standard basis is
\[
\begin{pmatrix}
\cos\gamma & -\sin\gamma \\
\sin\gamma & \cos\gamma
\end{pmatrix}.
\]
Find the matrix of $T$ in the basis $v_1, v_2$, where $v_1 = e_2$, $v_2 = e_1$.
Let $R_\gamma$ be the rotation matrix
\[
R_\gamma = \begin{pmatrix}
\cos\gamma & -\sin\gamma \\
\sin\gamma & \cos\gamma
\end{pmatrix}.
\]
Show that the identity matrix $I$ can be continuously transformed through invertible matrices into $R_\gamma$.
Let $U$ be an orthogonal matrix, and let $\det U = 1$. Show that the identity matrix $I$ can be continuously transformed through invertible matrices into $U$. Hint: use the previous problem and the representation of a rotation in $\mathbb R^{n}$ as a product of planar rotations, see Section 6.5.
Let $A$ be an $n\times n$ positive definite Hermitian matrix, $A = A^{*} > 0$. Show that the identity matrix $I$ can be continuously transformed through invertible matrices into $A$. Hint: what about diagonal matrices?