The theory of inner product spaces is developed only for real and complex spaces, so in this chapter the field of scalars is always $\mathbb{R}$ or $\mathbb{C}$; the results usually do not generalize to spaces over arbitrary fields.
Most of the results and calculations in this chapter hold (and have the same statements) in both the real and complex cases. In the rare situations when there is a difference between the real and complex cases, we state explicitly which case is considered; otherwise everything holds for both cases.
Finally, when the results and calculations hold for both complex and real cases, we use formulas for the complex case; in the real case they give correct, although sometimes a bit more complicated, formulas.
In dimensions 2 and 3, we defined the length of a vector (i.e. the distance from its endpoint to the origin) by the Pythagorean rule; for example, in $\mathbb{R}^3$ the length of the vector $x = (x_1, x_2, x_3)^T$ is defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + x_3^2}.$$
It is natural to generalize this formula to all $n$, and to define the norm of a vector $x = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$ as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}.$$
The word norm is used as a fancy replacement for the word length.
The dot product in $\mathbb{R}^n$ was defined as $x \cdot y = x_1y_1 + x_2y_2 + \dots + x_ny_n$, where $x = (x_1, \dots, x_n)^T$ and $y = (y_1, \dots, y_n)^T$.
While the notation and term “dot product” are often used in the literature, we prefer to call it the “inner product”, and for reasons which will be clear later, we will use the notation $\langle x, y\rangle$ instead of $x \cdot y$.
Thus, in $\mathbb{R}^n$ one can define the inner product of two vectors $x = (x_1, \dots, x_n)^T$, $y = (y_1, \dots, y_n)^T$ by
$$\langle x, y\rangle := x_1y_1 + x_2y_2 + \dots + x_ny_n,$$
so $\langle x, y\rangle = y^T x$.
Note that $y^T x = x^T y$, and we use the notation $y^T x$ only to be consistent with the complex case below.
Let us now define the norm and inner product for $\mathbb{C}^n$. As we have seen before, the complex space $\mathbb{C}^n$ is the most natural space from the point of view of spectral theory: even if one starts from a matrix with real coefficients (or an operator on a real vector space), the eigenvalues can be complex, and one needs to work in a complex space.
For a complex number $z = x + iy$, we have $|z|^2 = x^2 + y^2 = z\bar z$. If $z \in \mathbb{C}^n$ is given by
$$z = (z_1, z_2, \dots, z_n)^T,$$
it is natural to define its norm $\|z\|$ by
$$\|z\|^2 = \sum_{k=1}^n |z_k|^2 = \sum_{k=1}^n z_k\bar z_k.$$
Let us try to define an inner product on $\mathbb{C}^n$ such that $\|z\|^2 = \langle z, z\rangle$. One of the choices is to define $\langle z, w\rangle$ by
$$\langle z, w\rangle := \sum_{k=1}^n z_k\bar w_k,$$
and that will be our definition of the standard inner product in $\mathbb{C}^n$.
To simplify the notation, let us introduce a new notion. For a matrix $A$, let us define its Hermitian adjoint, or simply adjoint, $A^*$ by $A^* := \overline{A^T}$, meaning that we take the transpose of the matrix and then take the complex conjugate of each entry. Note that for a real matrix $A$, $A^* = A^T$.
Using the notion of $A^*$, one can write the standard inner product in $\mathbb{C}^n$ as
$$\langle z, w\rangle = w^* z.$$
It is easy to see that one can define a different inner product $\langle\cdot,\cdot\rangle_1$ in $\mathbb{C}^n$ such that $\|z\|^2 = \langle z, z\rangle_1$, namely the inner product given by
$$\langle z, w\rangle_1 := \sum_{k=1}^n \bar z_k w_k = z^* w.$$
We did not specify what properties we want the inner product to satisfy, but these two are the only reasonable choices giving $\|z\|^2 = \langle z, z\rangle$.
Note that the above two choices of the inner product are essentially equivalent: the only difference between them is notational, because $\langle z, w\rangle_1 = \langle w, z\rangle$.
While the second choice of the inner product looks more natural, the first one is more widely used, so we will use it as well.
The inner product we defined for $\mathbb{R}^n$ and $\mathbb{C}^n$ satisfies the following properties:
(Conjugate) symmetry: $\langle x, y\rangle = \overline{\langle y, x\rangle}$; note that for a real space this property is just symmetry, $\langle x, y\rangle = \langle y, x\rangle$;
Linearity: $\langle \alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$ for all vectors $x, y, z$ and all scalars $\alpha, \beta$;
Non-negativity: $\langle x, x\rangle \ge 0$ for all vectors $x$;
Non-degeneracy: $\langle x, x\rangle = 0$ if and only if $x = 0$.
Let $V$ be a (complex or real) vector space. An inner product on $V$ is a function that assigns to each pair of vectors $x$, $y$ a scalar, denoted by $\langle x, y\rangle$, such that the above properties 1–4 are satisfied.
Note that for a real space we assume that is always real, and for a complex space the inner product can be complex.
A space $V$ together with an inner product on it is called an inner product space. Given an inner product space, one defines the norm on it by
$$\|x\| := \sqrt{\langle x, x\rangle}.$$
Let $V$ be $\mathbb{R}^n$ or $\mathbb{C}^n$. We already have the inner product $\langle x, y\rangle = y^*x$ defined above.
This inner product is called the standard inner product in $\mathbb{R}^n$ or $\mathbb{C}^n$.
We will use the symbol $\mathbb{F}$ to denote both $\mathbb{R}$ and $\mathbb{C}$. When we have some statement about the space $\mathbb{F}^n$, it means the statement is true for both $\mathbb{R}^n$ and $\mathbb{C}^n$.
Let be the space of polynomials of degree at most . Define the inner product by
It is easy to check that the above properties 1–4 are satisfied.
This definition works both for complex and real cases. In the real case we only allow polynomials with real coefficients, and we do not need the complex conjugate here.
Let us recall that for a square matrix $A$ its trace is defined as the sum of its diagonal entries,
$$\operatorname{trace} A = \sum_{k=1}^n a_{k,k}.$$
For the space of $m\times n$ matrices let us define the so-called Frobenius inner product by
$$\langle A, B\rangle := \operatorname{trace}(B^*A).$$
Again, it is easy to check that the properties 1–4 are satisfied, i.e. that we indeed defined an inner product.
Note that
$$\operatorname{trace}(B^*A) = \sum_{j,k} A_{j,k}\overline{B_{j,k}},$$
so this inner product coincides with the standard inner product of the matrices regarded as vectors with $mn$ coordinates.
The statements we get in this section are true for any abstract inner product space, not only for . To prove them we use only properties 1–4 of the inner product.
First of all let us notice that properties 1 and 2 imply that
$$\langle x, \alpha y + \beta z\rangle = \bar\alpha\langle x, y\rangle + \bar\beta\langle x, z\rangle.$$
Indeed,
$$\langle x, \alpha y + \beta z\rangle = \overline{\langle \alpha y + \beta z, x\rangle} = \bar\alpha\overline{\langle y, x\rangle} + \bar\beta\overline{\langle z, x\rangle} = \bar\alpha\langle x, y\rangle + \bar\beta\langle x, z\rangle.$$
Note also that property 2 implies that for all vectors $x$
$$\langle 0, x\rangle = \langle x, 0\rangle = 0.$$
Let $v$ be a vector in an inner product space $V$. Then $v = 0$ if and only if
$$\langle v, w\rangle = 0 \quad \text{for all } w \in V. \tag{5.1.1}$$
Applying the above lemma to the difference $u - v$ we get the following corollary.
Let $u$, $v$ be vectors in an inner product space $V$. The equality $u = v$ holds if and only if
$$\langle u, w\rangle = \langle v, w\rangle \quad \text{for all } w \in V.$$
The following corollary is very simple, but will be used a lot.
Suppose two operators $A, B: X \to Y$ satisfy
$$\langle Ax, y\rangle = \langle Bx, y\rangle \quad \text{for all } x \in X,\ y \in Y.$$
Then $A = B$.
By the previous corollary (fix $x$ and take all possible $y$'s) we get $Ax = Bx$. Since this is true for all $x$, the transformations $A$ and $B$ coincide. ∎
The following property, the Cauchy–Schwarz inequality, relates the norm and the inner product: for all vectors $x$, $y$,
$$|\langle x, y\rangle| \le \|x\|\,\|y\|.$$
The proof we are going to present is not the shortest one, but it shows where the main ideas come from.
Let us consider the real case first. If $y = 0$, the statement is trivial, so we can assume that $y \ne 0$. By the properties of an inner product, for all scalars $t$
$$0 \le \|x - ty\|^2 = \langle x - ty, x - ty\rangle = \|x\|^2 - 2t\langle x, y\rangle + t^2\|y\|^2.$$
In particular, this inequality should hold for $t = \langle x, y\rangle/\|y\|^2$ (that is the point where the above quadratic polynomial in $t$ attains its minimum; it can be computed, for example, by taking the derivative in $t$ and equating it to $0$), and for this point the inequality becomes
$$0 \le \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2},$$
which is exactly the inequality we need to prove.
There are several possible ways to treat the complex case. One is to replace $x$ by $\alpha x$, where $\alpha$ is a complex constant of modulus one such that $\langle \alpha x, y\rangle$ is real, and then repeat the proof for the real case.
The other possibility is again to consider
$$0 \le \|x - ty\|^2 = \|x\|^2 - \bar t\langle x, y\rangle - t\overline{\langle x, y\rangle} + |t|^2\|y\|^2.$$
Substituting $t = \langle x, y\rangle/\|y\|^2$ into this inequality, we get
$$0 \le \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2},$$
which is the inequality we need.
Note that the above paragraph is in fact a complete formal proof of the theorem. The reasoning before that was only to explain why we need to pick this particular value of $t$. ∎
An immediate Corollary of the Cauchy–Schwarz Inequality is the following lemma.
For any vectors $u$, $v$ in an inner product space
$$\|u + v\| \le \|u\| + \|v\|.$$
∎
The following polarization identities allow one to reconstruct the inner product from the norm:
For all $x$, $y$,
$$\langle x, y\rangle = \frac{1}{4}\bigl(\|x + y\|^2 - \|x - y\|^2\bigr)$$
if $V$ is a real inner product space, and
$$\langle x, y\rangle = \frac{1}{4}\sum_{\alpha \in \{1, i, -1, -i\}} \alpha\,\|x + \alpha y\|^2$$
if $V$ is a complex inner product space.
The lemma is proved by direct computation. We leave the proof as an exercise for the reader.
Another important property of the norm in an inner product space can be also checked by direct calculation.
For any vectors $u$, $v$,
$$\|u + v\|^2 + \|u - v\|^2 = 2\bigl(\|u\|^2 + \|v\|^2\bigr).$$
In 2-dimensional space this lemma relates the sides of a parallelogram with its diagonals, which explains the name. It is a well-known fact from planar geometry.
We have proved before that the norm satisfies the following properties:
Homogeneity: $\|\alpha v\| = |\alpha|\,\|v\|$ for all vectors $v$ and all scalars $\alpha$.
Triangle inequality: $\|u + v\| \le \|u\| + \|v\|$.
Non-negativity: $\|v\| \ge 0$ for all vectors $v$.
Non-degeneracy: $\|v\| = 0$ if and only if $v = 0$.
Suppose in a vector space $V$ we assigned to each vector $v$ a number $\|v\|$ such that the above properties 1–4 are satisfied. Then we say that the function $v \mapsto \|v\|$ is a norm. A vector space equipped with a norm is called a normed space.
Any inner product space is a normed space, because the norm $\|x\| = \sqrt{\langle x, x\rangle}$ satisfies the above properties 1–4. However, there are many other normed spaces. For example, given $p$, $1 \le p < \infty$, one can define the norm $\|\cdot\|_p$ on $\mathbb{R}^n$ or $\mathbb{C}^n$ by
$$\|x\|_p = \Bigl(\sum_{k=1}^n |x_k|^p\Bigr)^{1/p}.$$
One can also define the norm $\|\cdot\|_\infty$ ($p = \infty$) by
$$\|x\|_\infty = \max\{|x_k| : k = 1, 2, \dots, n\}.$$
The norm $\|\cdot\|_p$ for $p = 2$ coincides with the regular norm obtained from the inner product.
To check that $\|\cdot\|_p$ is indeed a norm one has to check that it satisfies all the above properties 1–4. Properties 1, 3 and 4 are very easy to check; we leave this as an exercise for the reader. The triangle inequality (property 2) is easy to check for $p = 1$ and $p = \infty$ (and we have proved it for $p = 2$).
For all other $p$ the triangle inequality is true, but the proof is not so simple, and we will not present it here. The triangle inequality for $\|\cdot\|_p$ even has a special name: it is called the Minkowski inequality, after the German mathematician H. Minkowski.
Note that the norm $\|\cdot\|_p$ for $p \ne 2$ cannot be obtained from an inner product. It is easy to see that this norm is not obtained from the standard inner product in $\mathbb{R}^n$ ($\mathbb{C}^n$). But we claim more! We claim that it is impossible to introduce any inner product which gives rise to the norm $\|\cdot\|_p$, $p \ne 2$.
This statement is actually quite easy to prove. By Lemma 5.1.10 any norm obtained from an inner product must satisfy the Parallelogram Identity. It is easy to see that the Parallelogram Identity fails for the norm $\|\cdot\|_p$, $p \ne 2$, and one can easily find a counterexample in $\mathbb{R}^2$, which then gives rise to a counterexample in all other spaces.
In fact, the Parallelogram Identity, as the theorem below asserts, completely characterizes norms obtained from an inner product.
A norm in a normed space is obtained from some inner product if and only if it satisfies the Parallelogram Identity
$$\|u + v\|^2 + \|u - v\|^2 = 2\bigl(\|u\|^2 + \|v\|^2\bigr).$$
Lemma 5.1.10 asserts that a norm obtained from an inner product satisfies the Parallelogram Identity.
The converse implication is more complicated. If we are given a norm, and this norm came from an inner product, then we do not have any choice: the inner product must be given by the polarization identities, see Lemma 5.1.9. But we need to show that the $\langle x, y\rangle$ we get from the polarization identities is indeed an inner product, i.e. that it satisfies all the properties of the definition.
It is indeed possible to verify that if the norm satisfies the parallelogram identity then the inner product obtained from the polarization identities is indeed an inner product (i.e. satisfies all the properties of an inner product). However, the proof is a bit too involved, so we do not present it here.
Compute
For vectors and compute
, , , ;
, ;
.
Remark: After you have done part a), you can do parts b) and c) without actually computing all vectors involved, just by using the properties of inner product.
Let , , . Compute
Prove that for vectors in an inner product space
Recall that
Explain why each of the following is not an inner product on a given vector space:
on ;
on the space of real matrices;
on the space of polynomials; denotes derivative.
Prove that
if and only if one of the vectors is a multiple of the other. Hint: Analyze the proof of the Cauchy–Schwarz inequality.
Prove the parallelogram identity for an inner product space $V$,
$$\|u + v\|^2 + \|u - v\|^2 = 2\bigl(\|u\|^2 + \|v\|^2\bigr).$$
Let be a spanning set (in particular, a basis) in an inner product space . Prove that
If for all , then ;
If , then ;
If , then .
Consider the space with the norm , introduced in Section 5.1.5. For , draw the “unit ball” in the norm
Can you guess what the balls for other look like?
Two vectors $u$ and $v$ are called orthogonal (also perpendicular) if $\langle u, v\rangle = 0$. We will write $u \perp v$ to say that the vectors are orthogonal.
Note that for orthogonal vectors $u$ and $v$ we have the following, so-called Pythagorean identity:
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$
The proof is a straightforward computation,
$$\|u + v\|^2 = \langle u + v, u + v\rangle = \langle u, u\rangle + \langle u, v\rangle + \langle v, u\rangle + \langle v, v\rangle = \|u\|^2 + \|v\|^2$$
($\langle u, v\rangle = \langle v, u\rangle = 0$ because of orthogonality).
We say that a vector is orthogonal to a subspace if is orthogonal to all vectors in .
We say that subspaces and are orthogonal if all vectors in are orthogonal to , i.e. all vectors in are orthogonal to all vectors in
The following lemma shows how to check that a vector is orthogonal to a subspace.
Let $E$ be spanned by vectors $v_1, v_2, \dots, v_r$. Then $v \perp E$ if and only if $v \perp v_k$ for all $k = 1, 2, \dots, r$.
By the definition, if $v \perp E$ then $v$ is orthogonal to all vectors in $E$. In particular, $v \perp v_k$, $k = 1, 2, \dots, r$.
On the other hand, let $v \perp v_k$, $k = 1, 2, \dots, r$. Since the vectors $v_k$ span $E$, any vector $w \in E$ can be represented as a linear combination $w = \sum_{k=1}^r \alpha_k v_k$. Then
$$\langle v, w\rangle = \sum_{k=1}^r \bar\alpha_k\langle v, v_k\rangle = 0,$$
so $v \perp w$. ∎
A system of vectors $v_1, v_2, \dots, v_n$ is called orthogonal if any two vectors in the system are orthogonal to each other (i.e. if $\langle v_j, v_k\rangle = 0$ for $j \ne k$).
If, in addition, $\|v_k\| = 1$ for all $k$, we call the system orthonormal.
Let $v_1, v_2, \dots, v_n$ be an orthogonal system. Then
$$\Bigl\|\sum_{k=1}^n \alpha_k v_k\Bigr\|^2 = \sum_{k=1}^n |\alpha_k|^2\|v_k\|^2.$$
This formula looks particularly simple for orthonormal systems, where $\|v_k\| = 1$.
Because of orthogonality $\langle v_j, v_k\rangle = 0$ if $j \ne k$. Therefore, expanding the left side as $\sum_{j,k}\alpha_j\bar\alpha_k\langle v_j, v_k\rangle$, we only need to sum the terms with $j = k$, which gives exactly $\sum_k |\alpha_k|^2\|v_k\|^2$.
∎
Any orthogonal system of non-zero vectors is linearly independent.
Suppose for some $\alpha_1, \alpha_2, \dots, \alpha_n$ we have $\sum_{k=1}^n \alpha_k v_k = 0$. Then by the Generalized Pythagorean identity (Lemma 5.2.5)
$$0 = \|0\|^2 = \Bigl\|\sum_{k=1}^n \alpha_k v_k\Bigr\|^2 = \sum_{k=1}^n |\alpha_k|^2\|v_k\|^2.$$
Since $\|v_k\| \ne 0$ ($v_k \ne 0$) we conclude that $\alpha_k = 0$ for all $k$,
so only the trivial linear combination gives $0$. ∎
In what follows we will usually mean by an orthogonal system an orthogonal system of non-zero vectors. Since the zero vector is orthogonal to everything, it always can be added to any orthogonal system, but it is really not interesting to consider orthogonal systems with zero vectors.
An orthogonal (orthonormal) system which is also a basis is called an orthogonal (orthonormal) basis.
It is clear that if $\dim V = n$, then any orthogonal system of $n$ non-zero vectors in $V$ is an orthogonal basis.
As we studied before, to find the coordinates of a vector in a basis one generally needs to solve a linear system. However, for an orthogonal basis finding the coordinates of a vector is much easier. Namely, suppose $v_1, v_2, \dots, v_n$ is an orthogonal basis, and let
$$x = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n.$$
Taking the inner product of both sides of the equation with $v_1$ we get
$$\langle x, v_1\rangle = \alpha_1\langle v_1, v_1\rangle = \alpha_1\|v_1\|^2$$
(all the inner products $\langle v_k, v_1\rangle$ with $k \ne 1$ are $0$), so
$$\alpha_1 = \frac{\langle x, v_1\rangle}{\|v_1\|^2}.$$
Similarly, taking the inner product of both sides with $v_k$ we get
$$\langle x, v_k\rangle = \alpha_k\|v_k\|^2,$$
so
$$\alpha_k = \frac{\langle x, v_k\rangle}{\|v_k\|^2}. \tag{5.2.1}$$
Therefore,
to find the coordinates of a vector in an orthogonal basis one does not need to solve a linear system; the coordinates are determined by formula (5.2.1).
This formula is especially simple for orthonormal bases, when $\|v_k\| = 1$.
Namely, if $v_1, v_2, \dots, v_n$ is an orthonormal basis, any vector $x$ can be represented as
$$x = \sum_{k=1}^n \langle x, v_k\rangle v_k. \tag{5.2.2}$$
This formula is sometimes called (a baby version of) the abstract orthogonal Fourier decomposition. The classical (non-abstract) Fourier decomposition deals with a concrete orthonormal system (sines and cosines or complex exponentials). We call this formula a baby version because the real Fourier decomposition deals with infinite orthonormal systems.
The importance of orthonormal bases is that if we fix an orthonormal basis in an inner product space , we can work with coordinates in this basis the same way we work with vectors in . Namely, as it was discussed in the very beginning of the book, see Remark 1.2.4 in Chapter 1, if we have a vector space (over a field ) with a basis , then we can perform the standard vector operations (vector addition and multiplication by a scalar) by working with the columns of coordinates in the basis in absolutely the same way we work with vectors in the standard coordinate space .
Exercise 5.2.3 below shows that if we have an orthonormal basis in an inner product space , we can compute the inner product of 2 vectors in by taking columns of their coordinates in this orthonormal basis and computing the standard inner product (in or ) of these columns.
As it will be shown below in Section 5.3 any finite-dimensional inner product space has an orthonormal basis. Thus, the standard inner product spaces (or in the case of real spaces) are essentially the only examples of a finite-dimensional inner product spaces.
This is a very important remark allowing one to translate any statement about the standard inner product space to an inner product space with an orthonormal basis .
Find all vectors in orthogonal to vectors and .
Let be a real matrix. Describe ,
Let be an orthonormal basis in .
Prove that for any ,
Deduce from this the Parseval’s identity
Assume now that is only an orthogonal basis, not an orthonormal one. Can you write down Parseval’s identity in this case?
This problem shows that if we have an orthonormal basis, we can use the coordinates in this basis absolutely the same way we use the standard coordinates in (or ).
The problem below shows that we can define an inner product by declaring a basis to be an orthonormal one.
Let be a vector space and let be a basis in . For , define .
Prove that defines an inner product in .
Let be a real matrix. Describe the set of all vectors in orthogonal to .
Recalling the definition of orthogonal projection from the classical planar (2-dimensional) geometry, one can introduce the following definition. Let be a subspace of an inner product space .
For a vector its orthogonal projection onto the subspace is a vector such that
;
.
We will use the notation $P_E v$ for the orthogonal projection of $v$ onto $E$.
After introducing an object, it is natural to ask:
Does the object exist?
Is the object unique?
How does one find it?
We will show first that the projection is unique. Then we present a method of finding the projection, proving its existence.
The following theorem shows why the orthogonal projection is important and also proves that it is unique.
The orthogonal projection $w = P_E v$ minimizes the distance from $v$ to $E$, i.e. for all $x \in E$
$$\|v - w\| \le \|v - x\|.$$
Moreover, if for some $x \in E$
$$\|v - w\| = \|v - x\|,$$
then $x = w$.
Let $x \in E$. Then
$$v - x = (v - w) + (w - x).$$
Since $w - x \in E$ we have $v - w \perp w - x$, and so by the Pythagorean Theorem
$$\|v - x\|^2 = \|v - w\|^2 + \|w - x\|^2 \ge \|v - w\|^2.$$
Note that equality happens only if $\|w - x\| = 0$, i.e. if $x = w$. ∎
The following proposition shows how to find an orthogonal projection if we know an orthogonal basis in .
Let $v_1, v_2, \dots, v_r$ be an orthogonal basis in $E$. Then the orthogonal projection $P_E v$ of a vector $v$ is given by the formula
$$P_E v = \sum_{k=1}^r \alpha_k v_k, \qquad \alpha_k = \frac{\langle v, v_k\rangle}{\|v_k\|^2}.$$
In other words,
$$P_E v = \sum_{k=1}^r \frac{\langle v, v_k\rangle}{\|v_k\|^2}\,v_k. \tag{5.3.1}$$
Note that the formula for coincides with (5.2.1), i.e. this formula applied to an orthogonal system (not a basis) gives us a projection onto its span.
It is easy to see now from formula (5.3.1) that the orthogonal projection is a linear transformation.
One can also see linearity of directly, from the definition and uniqueness of the orthogonal projection. Indeed, it is easy to check that for any and the vector is orthogonal to any vector in , so by the definition .
Recalling the definition of the inner product in $\mathbb{R}^n$ and $\mathbb{C}^n$, one can get from the above formula (5.3.1) that the matrix of the orthogonal projection onto $E$ in $\mathbb{C}^n$ ($\mathbb{R}^n$) is given by
$$P_E = \sum_{k=1}^r \frac{1}{\|v_k\|^2}\,v_k v_k^*, \tag{5.3.2}$$
where the columns $v_1, v_2, \dots, v_r$ form an orthogonal basis in $E$.
Let
$$w := \sum_{k=1}^r \frac{\langle v, v_k\rangle}{\|v_k\|^2}\,v_k.$$
We want to show that $w = P_E v$. By Lemma 5.2.3 it is sufficient to show that $v - w \perp v_k$, $k = 1, 2, \dots, r$. Computing the inner product we get for each $k$
$$\langle v - w, v_k\rangle = \langle v, v_k\rangle - \langle w, v_k\rangle = \langle v, v_k\rangle - \frac{\langle v, v_k\rangle}{\|v_k\|^2}\,\langle v_k, v_k\rangle = 0.$$
∎
So, if we know an orthogonal basis in we can find the orthogonal projection onto . In particular, since any system consisting of one vector is an orthogonal system, we know how to perform orthogonal projection onto one-dimensional spaces.
But how do we find an orthogonal projection if we are only given a basis in ? Fortunately, there exists a simple algorithm allowing one to get an orthogonal basis from a basis.
Suppose we have a linearly independent system $x_1, x_2, \dots, x_n$. The Gram–Schmidt method constructs from this system an orthogonal system $v_1, v_2, \dots, v_n$ such that
$$\operatorname{span}\{x_1, x_2, \dots, x_n\} = \operatorname{span}\{v_1, v_2, \dots, v_n\}.$$
Moreover, for all $r \le n$ we get
$$\operatorname{span}\{x_1, x_2, \dots, x_r\} = \operatorname{span}\{v_1, v_2, \dots, v_r\}.$$
Now let us describe the algorithm.
Step 1. Put $v_1 := x_1$. Denote $E_1 := \operatorname{span}\{v_1\} = \operatorname{span}\{x_1\}$.
Step 2. Define $v_2$ by
$$v_2 := x_2 - P_{E_1}x_2 = x_2 - \frac{\langle x_2, v_1\rangle}{\|v_1\|^2}\,v_1.$$
Define $E_2 := \operatorname{span}\{v_1, v_2\}$. Note that $E_2 = \operatorname{span}\{x_1, x_2\}$.
Step 3. Define $v_3$ by
$$v_3 := x_3 - P_{E_2}x_3 = x_3 - \frac{\langle x_3, v_1\rangle}{\|v_1\|^2}\,v_1 - \frac{\langle x_3, v_2\rangle}{\|v_2\|^2}\,v_2.$$
Put $E_3 := \operatorname{span}\{v_1, v_2, v_3\}$. Note that $E_3 = \operatorname{span}\{x_1, x_2, x_3\}$. Note also that $x_3 \notin E_2$, so $v_3 \ne 0$.
…
Step $r + 1$. Suppose that we have already made $r$ steps of the process, constructing an orthogonal system (consisting of non-zero vectors) $v_1, v_2, \dots, v_r$ such that $E_r := \operatorname{span}\{v_1, \dots, v_r\} = \operatorname{span}\{x_1, \dots, x_r\}$. Define
$$v_{r+1} := x_{r+1} - P_{E_r}x_{r+1} = x_{r+1} - \sum_{k=1}^r \frac{\langle x_{r+1}, v_k\rangle}{\|v_k\|^2}\,v_k.$$
Note that $x_{r+1} \notin E_r$, so $v_{r+1} \ne 0$.
…
Continuing this algorithm we get an orthogonal system $v_1, v_2, \dots, v_n$.
Suppose we are given vectors
and we want to orthogonalize it by Gram-Schmidt. On the first step define
On the second step we get
Computing
we get
Finally, define
Computing
( was already computed before) we get
Since the multiplication by a scalar does not change the orthogonality, one can multiply vectors obtained by Gram-Schmidt by any non-zero numbers.
In particular, in many theoretical constructions one normalizes vectors by dividing them by their respective norms . Then the resulting system will be orthonormal, and the formulas will look simpler.
On the other hand, when performing the computations one may want to avoid fractional entries by multiplying a vector by the least common denominator of its entries. Thus one may want to replace the vector from the above example by .
For a subspace $E$ its orthogonal complement $E^\perp$ is the set of all vectors orthogonal to $E$,
$$E^\perp := \{x : x \perp E\}.$$
If $x, y \perp E$, then for any linear combination $\alpha x + \beta y \perp E$ (can you see why?). Therefore $E^\perp$ is a subspace.
By the definition of orthogonal projection, any vector $v$ in an inner product space $V$ admits a unique representation
$$v = v_1 + v_2, \qquad v_1 \in E,\ v_2 \perp E$$
(where clearly $v_1 = P_E v$ and $v_2 = v - P_E v \in E^\perp$).
This statement is often symbolically written as $V = E \oplus E^\perp$, which means exactly that any vector admits the unique decomposition above.
The following proposition gives an important property of the orthogonal complement.
For a subspace $E$,
$$(E^\perp)^\perp = E.$$
The proof is left as an exercise, see Problem 5.3.12 below.
Apply Gram–Schmidt orthogonalization to the system of vectors , , .
Apply Gram–Schmidt orthogonalization to the system of vectors , . Write the matrix of the orthogonal projection onto 2-dimensional subspace spanned by these vectors.
Complete an orthogonal system obtained in the previous problem to an orthogonal basis in , i.e. add to the system some vectors (how many?) to get an orthogonal basis.
Can you describe how to complete an orthogonal system to an orthogonal basis in general situation of or ?
Find the distance from a vector to the subspace spanned by the vectors , . Note, that I am only asking to find the distance to the subspace, not the orthogonal projection.
Find the orthogonal projection of a vector onto the subspace spanned by the vectors and (note that ).
Find the distance from a vector to the subspace spanned by the vectors and (note that ). Can you find the distance without actually computing the projection? That would simplify the calculations.
True or false: if is a subspace of , then ? Justify.
Let be the orthogonal projection onto a subspace of an inner product space , , . Find the eigenvalues and the eigenvectors (eigenspaces). Find the algebraic and geometric multiplicities of each eigenvalue.
(Using eigenvalues to compute determinants).
Find the matrix of the orthogonal projection onto the one-dimensional subspace in spanned by the vector ;
Let be the matrix with all entries equal . Compute its eigenvalues and their multiplicities (use the previous problem);
Compute eigenvalues (and multiplicities) of the matrix , i.e. of the matrix with zeroes on the main diagonal and ones everywhere else;
Compute .
Let an inner product on the space of polynomials be defined by . Apply Gram-Schmidt orthogonalization to the system .
Legendre’s polynomials are a particular case of the so-called orthogonal polynomials, which play an important role in many branches of mathematics.
Let be the matrix of an orthogonal projection onto a subspace . Show that
The matrix is self-adjoint, meaning that .
.
Remark: The above 2 properties completely characterize orthogonal projection, i.e. any matrix satisfying these properties is the matrix of some orthogonal projection. We will discuss this some time later.
Show that for a subspace $E$ we have $(E^\perp)^\perp = E$. Hint: It is easy to see that $E$ is orthogonal to $E^\perp$ (why?). To show that any vector orthogonal to $E^\perp$ belongs to $E$, use the decomposition from Section 5.3.3 above.
Suppose is the orthogonal projection onto a subspace , and is the orthogonal projection onto the orthogonal complement .
What are and ?
Show that is its own inverse.
As was discussed before in Chapter 2, the equation
$$Ax = b$$
has a solution if and only if $b \in \operatorname{Ran} A$. But what do we do if we want to solve an equation that does not have a solution?
This seems to be a silly question, because if there is no solution, then there is no solution. But situations when we want to solve an equation that does not have a solution can appear naturally, for example, if the equation is obtained from an experiment. If there are no errors, the right side $b$ belongs to the column space $\operatorname{Ran} A$, and the equation is consistent. But in real life it is impossible to avoid errors in measurements, so it can happen that an equation that in theory should be consistent does not have a solution. So, what can one do in this situation?
The simplest idea is to write down the error
$$\|Ax - b\|$$
and try to find $x$ minimizing it. If we can find $x$ such that the error is $0$, the system is consistent and we have an exact solution. Otherwise, we get the so-called least square solution.
The term least square arises from the fact that minimizing $\|Ax - b\|$ is equivalent to minimizing
$$\|Ax - b\|^2 = \sum_{k}\Bigl|\,b_k - \sum_{j} a_{k,j}x_j\Bigr|^2,$$
i.e. to minimizing the sum of squares of linear functions.
There are several ways to find the least square solution. If we are in $\mathbb{R}^n$, and everything is real, we can forget about the absolute values. Then we can just take the partial derivatives with respect to the $x_j$ and find where all of them are $0$, which gives us the minimum.
However, there is a simpler way of finding the minimum. Namely, if we take all possible vectors $x$, then $Ax$ gives us all possible vectors in $\operatorname{Ran}A$, so the minimum of $\|Ax - b\|$ is exactly the distance from $b$ to $\operatorname{Ran}A$. Therefore the value of $\|Ax - b\|$ is minimal if and only if $Ax = P_{\operatorname{Ran}A}b$, where $P_{\operatorname{Ran}A}$ stands for the orthogonal projection onto the column space $\operatorname{Ran}A$.
So, to find the least square solution we simply need to solve the equation
$$Ax = P_{\operatorname{Ran}A}\,b.$$
If we know an orthogonal basis $v_1, v_2, \dots, v_r$ in $\operatorname{Ran}A$, we can find the vector $P_{\operatorname{Ran}A}b$ by the formula
$$P_{\operatorname{Ran}A}b = \sum_{k=1}^r \frac{\langle b, v_k\rangle}{\|v_k\|^2}\,v_k.$$
If we only know a basis in $\operatorname{Ran}A$, we need to use Gram–Schmidt orthogonalization to obtain an orthogonal basis from it.
So, theoretically, the problem is solved, but the solution is not very simple: it involves Gram–Schmidt orthogonalization, which can be computationally intensive. Fortunately, there exists a simpler solution.
Namely, $Ax$ equals the orthogonal projection $P_{\operatorname{Ran}A}b$ if and only if $b - Ax \perp \operatorname{Ran}A$ (i.e. $b - Ax \perp a$ for all $a \in \operatorname{Ran}A$).
If $a_1, a_2, \dots, a_n$ are the columns of $A$, then the condition $b - Ax \perp \operatorname{Ran}A$ can be rewritten as
$$b - Ax \perp a_k, \qquad k = 1, 2, \dots, n.$$
That means
$$a_k^*(b - Ax) = 0, \qquad k = 1, 2, \dots, n.$$
Joining the rows $a_k^*$ together we get that these equations are equivalent to
$$A^*(b - Ax) = 0,$$
which in turn is equivalent to the so-called normal equation
$$A^*Ax = A^*b.$$
A solution of this equation gives us the least square solution of $Ax = b$.
Note that the least square solution is unique if and only if $A^*A$ is invertible.
As we already discussed above, if $\widehat x$ is a solution of the normal equation (i.e. a least square solution of $Ax = b$), then $A\widehat x = P_{\operatorname{Ran}A}b$. So, to find the orthogonal projection of $b$ onto the column space $\operatorname{Ran}A$ we need to solve the normal equation $A^*Ax = A^*b$, and then multiply the solution by $A$.
If the operator $A^*A$ is invertible, the solution of the normal equation is given by $\widehat x = (A^*A)^{-1}A^*b$, so the orthogonal projection $P_{\operatorname{Ran}A}b$ can be computed as
$$P_{\operatorname{Ran}A}b = A(A^*A)^{-1}A^*b.$$
Since this is true for all $b$,
$$P_{\operatorname{Ran}A} = A(A^*A)^{-1}A^*$$
is the formula for the matrix of the orthogonal projection onto $\operatorname{Ran}A$.
The following theorem implies that for an $m \times n$ matrix $A$ the matrix $A^*A$ is invertible if and only if $\operatorname{rank}A = n$.
For an $m \times n$ matrix $A$,
$$\operatorname{Ker}A = \operatorname{Ker}(A^*A).$$
Indeed, according to the rank theorem, $\operatorname{Ker}A = \{0\}$ if and only if $\operatorname{rank}A = n$. Therefore $\operatorname{Ker}(A^*A) = \{0\}$ if and only if $\operatorname{rank}A = n$. Since the matrix $A^*A$ is square, it is invertible if and only if $\operatorname{Ker}(A^*A) = \{0\}$.
We leave the proof of the theorem as an exercise. To prove the equality one needs to prove two inclusions, $\operatorname{Ker}A \subset \operatorname{Ker}(A^*A)$ and $\operatorname{Ker}(A^*A) \subset \operatorname{Ker}A$. One of the inclusions is trivial; for the other one use the fact that
$$\|Ax\|^2 = \langle Ax, Ax\rangle = \langle A^*Ax, x\rangle.$$
Let us introduce a few examples where the least square solution appears naturally. Suppose that we know that two quantities $x$ and $y$ are related by the law $y = ax + b$. The coefficients $a$ and $b$ are unknown, and we would like to find them from experimental data.
Suppose we run the experiment $n$ times, and we get pairs $(x_k, y_k)$, $k = 1, 2, \dots, n$. Ideally, all the points should be on a straight line, but because of errors in measurements this usually does not happen: the points are usually close to some line, but not exactly on it. That is where the least square solution helps!
Ideally, the coefficients $a$ and $b$ should satisfy the equations
$$a x_k + b = y_k, \qquad k = 1, 2, \dots, n$$
(note that here the $x_k$ and $y_k$ are some fixed numbers, and the unknowns are $a$ and $b$). If it is possible to find such $a$ and $b$, we are lucky. If not, the standard thing to do is to minimize the total quadratic error
$$\sum_{k=1}^n |a x_k + b - y_k|^2.$$
But minimizing this error is exactly finding the least square solution of the system
$$\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1\end{pmatrix}\begin{pmatrix} a \\ b\end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n\end{pmatrix}$$
(recall that the $x_k$, $y_k$ are given numbers, and the unknowns are $a$ and $b$).
Suppose our data consist of pairs
Then we need to find the least square solution of
Then
and
so the normal equation is rewritten as
The solution of this equation is
so the best fitting straight line is
The least square method is not limited to line fitting. It can also be applied to more general curves, as well as to surfaces in higher dimensions.
The only constraint here is that the parameters we want to find must enter the equations linearly. The general algorithm is as follows:
Find the equations that your data should satisfy if there is an exact fit;
Write these equations as a linear system, where the unknowns are the parameters you want to find. Note that the system need not be consistent (and usually is not);
Find the least square solution of the system.
For example, suppose we know that the relation between $x$ and $y$ is given by the quadratic law $y = ax^2 + bx + c$, so we want to fit a parabola to the data. Then the unknowns $a$, $b$, $c$ should satisfy the equations
$$a x_k^2 + b x_k + c = y_k, \qquad k = 1, 2, \dots, n,$$
or, in matrix form,
$$\begin{pmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1\end{pmatrix}\begin{pmatrix} a \\ b \\ c\end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n\end{pmatrix}.$$
For example, for the data from the previous example we need to find the least square solution of
Then
and
Therefore the normal equation is
which has the unique solution
Therefore,
is the best fitting parabola.
As another example, let us fit a plane to the data
The equations we should have in the case of exact fit are
or, in the matrix form
So, to find the best fitting plane, we need to find the least square solution of this system (the unknowns are the coefficients of the plane).
Find the least square solution of the system
Find the matrix of the orthogonal projection onto the column space of
Use two methods: Gram–Schmidt orthogonalization and formula for the projection.
Compare the results.
Find the best straight line fit (least square solution) to the points , , , .
Fit a plane to four points , , , .
To do that
Find 4 equations with 3 unknowns such that the plane passes through all 4 points (this system does not have to have a solution);
Find the least square solution of the system.
Minimal norm solution. Let an equation $Ax = b$ have a solution, and let $A$ have a non-trivial kernel (so the solution is not unique). Prove that
There exists a unique solution $x_0$ of $Ax = b$ minimizing the norm, i.e. there exists a unique $x_0$ such that $Ax_0 = b$ and $\|x_0\| \le \|x\|$ for any $x$ satisfying $Ax = b$.
for any satisfying .
Minimal norm least square solution. Applying previous problem to the equation show that
There exists a unique least square solution of minimizing the norm .
for any least square solution of .
Let us recall that for an $m \times n$ matrix $A$ its Hermitian adjoint (or simply adjoint) $A^*$ is defined by $A^* := \overline{A^T}$. In other words, the matrix $A^*$ is obtained from the transposed matrix $A^T$ by taking the complex conjugate of each entry.
The following identity is the main property of the adjoint matrix:
$$\langle Ax, y\rangle = \langle x, A^*y\rangle \qquad \text{for all } x \in \mathbb{F}^n,\ y \in \mathbb{F}^m.$$
Before proving this identity, let us introduce some useful formulas. Let us recall that for transposed matrices we have the identity $(AB)^T = B^TA^T$. Since for complex numbers $z$ and $w$ we have $\overline{zw} = \bar z\,\bar w$, the identity
$$(AB)^* = B^*A^*$$
holds for the adjoint.
Also, since $(A^T)^T = A$ and $\bar{\bar z} = z$,
$$(A^*)^* = A.$$
Now we are ready to prove the main identity:
$$\langle Ax, y\rangle = y^*(Ax) = (A^*y)^*x = \langle x, A^*y\rangle;$$
the first and the last equalities here follow from the definition of the inner product in $\mathbb{F}^n$, and the middle one follows from the fact that $y^*A = (A^*y)^*$.
The above main identity is often used as the definition of the adjoint operator. Let us first notice that the adjoint operator is unique: if a matrix $B$ satisfies
$$\langle Ax, y\rangle = \langle x, By\rangle \qquad \text{for all } x, y,$$
then $B = A^*$. Indeed, by the definition of $A^*$, for a given $y$ we have
$$\langle x, A^*y\rangle = \langle Ax, y\rangle = \langle x, By\rangle \qquad \text{for all } x,$$
and therefore by Corollary 5.1.5 $A^*y = By$. Since this is true for all $y$, the linear transformations, and therefore the matrices, $A^*$ and $B$ coincide.
The above main identity can be used to define the adjoint operator in abstract setting, where is an operator acting from one inner product space to another. Namely, we define to be the operator satisfying
Why does such an operator exists? We can simply construct it: consider orthonormal bases in and in . If is the matrix of with respect to these bases, we define the operator by defining its matrix as
We leave the proof that this indeed gives the adjoint operator as an exercise for the reader.
Note, that the reasoning in the above Sect. 5.5.1 implies that the adjoint operator is unique.
Below we present the properties of the adjoint operators (matrices) we will use a lot. We leave the proofs as an exercise for the reader.
;
;
;
;
.
Let $A: V \to W$ be an operator acting from one inner product space to another. Then
$\operatorname{Ker}A^* = (\operatorname{Ran}A)^\perp$;
$\operatorname{Ker}A = (\operatorname{Ran}A^*)^\perp$;
$\operatorname{Ran}A = (\operatorname{Ker}A^*)^\perp$;
$\operatorname{Ran}A^* = (\operatorname{Ker}A)^\perp$.
Earlier, in Section 2.7 of Chapter 2, the fundamental subspaces were defined (as is often done in the literature) using $A^T$ instead of $A^*$. Of course, there is no difference for real matrices, so in the real case the above theorem gives the geometric description of the fundamental subspaces defined there.
First of all, let us notice that since for a subspace $E$ we have $(E^\perp)^\perp = E$, statements 1 and 3 are equivalent. Similarly, for the same reason, statements 2 and 4 are equivalent as well. Finally, statement 2 is exactly statement 1 applied to the operator $A^*$ (here we use the fact that $(A^*)^* = A$).
So, to prove the theorem we only need to prove statement 1.
We will present 2 proofs of this statement: a “matrix” proof, and an “invariant”, or “coordinate-free” one.
In the “matrix” proof, we assume that is an matrix, i.e. that . The general case can be always reduced to this one by picking orthonormal bases in and , and considering the matrix of in this bases.
Let $a_1, a_2, \dots, a_n$ be the columns of $A$. Note that $x \perp \operatorname{Ran}A$ if and only if $x$ is orthogonal to all the columns of $A$ (i.e. $x \perp a_k$, $k = 1, 2, \dots, n$).
By the definition of the inner product, that means
$$a_k^*x = 0, \qquad k = 1, 2, \dots, n.$$
Since $a_k^*$ is row number $k$ of the matrix $A^*$, the above equalities are equivalent to the equation
$$A^*x = 0.$$
So, we proved that $x \perp \operatorname{Ran}A$ if and only if $A^*x = 0$, and that is exactly statement 1.
Now, let us present the “coordinate-free” proof. The inclusion $x \in (\operatorname{Ran}A)^\perp$ means that $x$ is orthogonal to all vectors of the form $Ay$, i.e. that
$$\langle Ay, x\rangle = 0 \qquad \text{for all } y.$$
Since $\langle Ay, x\rangle = \langle y, A^*x\rangle$, this identity is equivalent to
$$\langle y, A^*x\rangle = 0 \qquad \text{for all } y,$$
and by Lemma 5.1.4 this happens if and only if $A^*x = 0$. So we proved that $x \in (\operatorname{Ran}A)^\perp$ if and only if $x \in \operatorname{Ker}A^*$, which is exactly statement 1 of the theorem. ∎
The above theorem makes the structure of the operator $A$ and the geometry of the fundamental subspaces much more transparent. It follows from this theorem that the operator $A$ can be represented as a composition of the orthogonal projection onto $\operatorname{Ran}A^*$ and an isomorphism from $\operatorname{Ran}A^*$ to $\operatorname{Ran}A$.
Indeed, let be the restriction of to the domain and the target space ,
Since , we have
the fact that is used here. Therefore we can write
| (5.5.1) |
or, equivalently, .
Note also that is an invertible transformation. First we notice that : if is such that , then , so , thus . Then to see that is invertible, it is sufficient to see that is onto (surjective). But this immediately follows from (5.5.1):
The isomorphism from $\operatorname{Ran}A^*$ to $\operatorname{Ran}A$ is sometimes called the “essential part” of the operator $A$ (a non-standard terminology).
The fact that the “essential part” of $A$ is an isomorphism implies the following “complex” rank theorem: $\operatorname{rank}A = \operatorname{rank}A^*$. But, of course, this theorem also follows from the elementary observation that complex conjugation does not change the rank of a matrix.
Show that for a square matrix the equality holds.
Find matrices of orthogonal projections onto all 4 fundamental subspaces of the matrix
Note that you really need to compute only 2 of the projections. If you pick an appropriate 2, the other 2 are easy to obtain from them (recall how the projections onto $E$ and $E^\perp$ are related).
Let $A$ be an $m \times n$ matrix. Show that $\operatorname{Ker}A = \operatorname{Ker}(A^*A)$.
To do that you need to prove 2 inclusions, $\operatorname{Ker}A \subset \operatorname{Ker}(A^*A)$ and $\operatorname{Ker}(A^*A) \subset \operatorname{Ker}A$. One of the inclusions is trivial; for the other one use the fact that
$$\|Ax\|^2 = \langle Ax, Ax\rangle = \langle A^*Ax, x\rangle.$$
Use the equality $\operatorname{Ker}A = \operatorname{Ker}(A^*A)$ to prove that
;
If has only the trivial solution, is left invertible. (You can just write a formula for a left inverse).
Suppose, that for a matrix the matrix is invertible, so the orthogonal projection onto is given by the formula . Can you write formulas for the orthogonal projections onto the other 3 fundamental subspaces (, , )?
Let a matrix be self-adjoint () and let . Show that is the matrix of an orthogonal projection. Hint: consider the decomposition , , and show that , . For one of the equalities you will need self-adjointness, for the other one the property .
An operator $V: X \to Y$ is called an isometry if it preserves the norm,
$$\|Vx\| = \|x\| \qquad \text{for all } x \in X.$$
The following theorem shows that an isometry also preserves the inner product.
An operator $V: X \to Y$ is an isometry if and only if it preserves the inner product, i.e. if and only if
$$\langle Vx, Vy\rangle = \langle x, y\rangle \qquad \text{for all } x, y \in X.$$
The proof uses the polarization identities (Lemma 5.1.9). For example, if is a complex space
Similarly, for a real space
∎
An operator $V: X \to Y$ is an isometry if and only if $V^*V = I$.
If $V^*V = I$, then by the definition of the adjoint operator
$$\langle Vx, Vy\rangle = \langle V^*Vx, y\rangle = \langle x, y\rangle \qquad \text{for all } x, y.$$
Therefore $\|Vx\|^2 = \langle Vx, Vx\rangle = \langle x, x\rangle = \|x\|^2$, and so $V$ is an isometry.
The above lemma implies that an isometry is always left invertible ($V^*$ being a left inverse).
An isometry is called a unitary operator if it is invertible.
An isometry $V: X \to Y$ is a unitary operator if and only if $\dim X = \dim Y$.
Since $V$ is an isometry, it is left invertible; and since $\dim X = \dim Y$, it is invertible (a left invertible square matrix is invertible).
On the other hand, if $V$ is invertible, then $\dim X = \dim Y$ (only square matrices are invertible; isomorphic spaces have equal dimensions). ∎
A square matrix $U$ is called unitary if $U^*U = I$; i.e. a unitary matrix is the matrix of a unitary operator acting in $\mathbb{C}^n$.
A unitary matrix with real entries is called an orthogonal matrix. An orthogonal matrix can be interpreted as the matrix of a unitary operator acting in the real space $\mathbb{R}^n$.
Few properties of unitary operators:
For a unitary transformation , ;
If is unitary, is also unitary;
If is a isometry, and is an orthonormal basis, then is an orthonormal system. Moreover, if is unitary, is an orthonormal basis.
A product of unitary operators is a unitary operator as well.
First of all, let us notice, that
a matrix is an isometry if and only if its columns form an orthonormal system.
This statement can be checked directly by computing the product .
It is easy to check that the columns of the rotation matrix
$$R_\gamma = \begin{pmatrix}\cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma\end{pmatrix}$$
are orthogonal to each other, and that each column has norm 1. Therefore, the rotation matrix is an isometry, and since it is square, it is unitary. Since all entries of the rotation matrix are real, it is an orthogonal matrix.
The next example is more abstract. Let and be inner product spaces, , and let and be orthonormal bases in and respectively. Define an operator by
Since for a vector
and
one can conclude that for all , so is a unitary operator.
Let $U$ be a unitary matrix. Then
$|\det U| = 1$. In particular, for an orthogonal matrix $\det U = \pm 1$;
If $\lambda$ is an eigenvalue of $U$, then $|\lambda| = 1$.
Note that for an orthogonal matrix an eigenvalue (unlike the determinant) does not have to be real. Our old friend, the rotation matrix, gives an example.
Operators (matrices) $A$ and $B$ are called unitarily equivalent if there exists a unitary operator $U$ such that $A = UBU^*$.
Since for a unitary $U$ we have $U^* = U^{-1}$, any two unitarily equivalent matrices are similar as well.
The converse is not true: it is easy to construct a pair of similar matrices which are not unitarily equivalent.
The following proposition gives a way to construct a counterexample.
A matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal (orthonormal) basis of eigenvectors.
Let $A = UBU^*$ and let $Bx = \lambda x$. Then $A(Ux) = UBU^*Ux = UBx = \lambda(Ux)$, i.e. $Ux$ is an eigenvector of $A$.
So, let $A$ be unitarily equivalent to a diagonal matrix $D$, i.e. let $A = UDU^*$. The vectors $e_k$ of the standard basis are eigenvectors of $D$, so the vectors $Ue_k$ are eigenvectors of $A$. Since $U$ is unitary, the system $Ue_1, Ue_2, \dots, Ue_n$ is an orthonormal basis.
Now let $A$ have an orthogonal basis $u_1, u_2, \dots, u_n$ of eigenvectors. Dividing each vector by its norm if necessary, we can always assume that the system $u_1, u_2, \dots, u_n$ is an orthonormal basis. Let $D$ be the matrix of $A$ in this basis. Clearly, $D$ is a diagonal matrix.
Denote by $U$ the matrix with columns $u_1, u_2, \dots, u_n$. Since the columns form an orthonormal basis, $U$ is unitary. The standard change of coordinates formula implies
$$A = UDU^{-1},$$
and since $U$ is unitary, $A = UDU^*$. ∎
Orthogonally diagonalize the following matrices,
i.e. for each matrix find a unitary matrix and a diagonal matrix such that
True or false: a matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal basis of eigenvectors.
Prove the polarization identities
and
Show that a product of unitary (orthogonal) matrices is unitary (orthogonal) as well.
Let be a linear transformation on a finite-dimensional inner product space. True or false:
If for all , then is unitary.
If , for some orthonormal basis , then is unitary.
Justify your answers with a proof or a counterexample.
Let and be unitarily equivalent matrices.
Prove that .
Use a) to prove that
Use b) to prove that the matrices
are not unitarily equivalent.
Which of the following pairs of matrices are unitarily equivalent:
.
.
.
.
.
Hint: It is easy to eliminate matrices that are not unitarily equivalent: remember that unitarily equivalent matrices are similar, and the trace, determinant and eigenvalues of similar matrices coincide.
Also, the previous problem helps in eliminating non unitarily equivalent matrices.
Finally, a matrix is unitarily equivalent to a diagonal one if and only if it has an orthogonal basis of eigenvectors.
Let $U$ be a $2\times 2$ orthogonal matrix with $\det U = 1$. Prove that $U$ is a rotation matrix.
Let be a orthogonal matrix with . Prove that
is an eigenvalue of .
If is an orthonormal basis, such that (remember, that 1 is an eigenvalue), then in this basis the matrix of is
where is some angle.
Hint: Show, that since is an eigenvector of , all entries below 1 must be zero, and since is also an eigenvector of (why?), all entries right of 1 also must be zero. Then show that the lower right matrix is an orthogonal one with determinant 1, and use the previous problem.
A rigid motion in an inner product space $V$ is a transformation $f: V \to V$ preserving the distance between points, i.e. such that
$$\|f(x) - f(y)\| = \|x - y\| \qquad \text{for all } x, y \in V.$$
Note that in the definition we do not assume that the transformation $f$ is linear.
Clearly, any unitary transformation is a rigid motion. Another example of a rigid motion is a translation (shift) by a fixed vector $a$: $f(x) = x + a$.
The main result of this section is the following theorem, stating that any rigid motion in a real inner product space is a composition of an orthogonal transformation and a translation.
Let $f$ be a rigid motion in a real inner product space $X$, and let $T(x) := f(x) - f(0)$. Then $T$ is an orthogonal transformation.
To prove this theorem we need the following simple lemma.
Let $T$ be as defined in Theorem 5.7.1. Then for all $x, y \in X$
$T(0) = 0$;
$\|T(x) - T(y)\| = \|x - y\|$;
$\langle T(x), T(y)\rangle = \langle x, y\rangle$.
To prove statement 1 notice that
$$T(0) = f(0) - f(0) = 0.$$
Statement 2 follows from the following chain of identities:
$$\|T(x) - T(y)\| = \|\,f(x) - f(0) - (f(y) - f(0))\,\| = \|f(x) - f(y)\| = \|x - y\|.$$
An alternative explanation would be that is a composition of 2 rigid motions ( followed by the translation by ), and one can easily see that a composition of rigid motions is a rigid motion. Since , and so , statement 1 can be treated as a particular case of statement 2.
To prove statement 3, let us notice that in a real inner product space
$$\langle x, y\rangle = \tfrac12\bigl(\|x\|^2 + \|y\|^2 - \|x - y\|^2\bigr)$$
and
$$\langle T(x), T(y)\rangle = \tfrac12\bigl(\|T(x)\|^2 + \|T(y)\|^2 - \|T(x) - T(y)\|^2\bigr).$$
Recalling that $\|T(x)\| = \|x\|$, $\|T(y)\| = \|y\|$ and $\|T(x) - T(y)\| = \|x - y\|$, we immediately get the desired conclusion. ∎
First of all notice that for all $x$
$$\|T(x)\| = \|T(x) - T(0)\| = \|x - 0\| = \|x\|,$$
so $T$ preserves the norm.
We would like to say that the identity $\langle T(x), T(y)\rangle = \langle x, y\rangle$ means that $T$ is an isometry, but to be able to say that we need to prove that $T$ is a linear transformation.
To do that, let us fix an orthonormal basis $e_1, e_2, \dots, e_n$ in $X$, and let $b_k := T(e_k)$, $k = 1, 2, \dots, n$. Since $T$ preserves the inner product (statement 3 of Lemma 5.7.2), we can conclude that $b_1, b_2, \dots, b_n$ is an orthonormal system. In fact, since $\dim X = n$ (because the basis consists of $n$ vectors), we can conclude that $b_1, b_2, \dots, b_n$ is an orthonormal basis.
Let . Recall that by the abstract orthogonal Fourier decomposition (5.2.2) we have that . Applying the abstract orthogonal Fourier decomposition (5.2.2) to and the orthonormal basis we get
Since
we get that
This means that is a linear transformation whose matrix with respect to the bases and is identity matrix, .
An alternative way to show that is a linear transformation is the following direct calculation
Therefore
which implies that is linear (taking or we get two properties from the definition of a linear transformation).
So, is a linear transformation satisfying , i.e. is an isometry. Since , is unitary transformation (see Proposition 5.6.3). That completes the proof, since an orthogonal transformation is simply a unitary transformation in a real inner product space. ∎
Give an example of a rigid motion in , , which is not a linear transformation.
This section is probably a bit more abstract than the rest of the chapter, and can be skipped at the first reading.
Any complex vector space can be interpreted as a real vector space: we just need to forget that we can multiply vectors by complex numbers and act as if only multiplication by real numbers is allowed.
For example, the space $\mathbb{C}^n$ is canonically identified with the real space $\mathbb{R}^{2n}$: each complex coordinate $z_k = x_k + iy_k$ gives us two real ones, $x_k$ and $y_k$.
“Canonically” here means that this is a standard, most natural way of identifying and . Note, that while the above definition gives us a canonical way to get real coordinates from complex ones, it does not say anything about ordering real coordinates.
In fact, there are two standard ways to order the coordinates , . One way is to take first the real parts and then the imaginary parts, so the ordering is . The other standard alternative is the ordering . The material of this section does not depend on the choice of ordering of coordinates, so the reader does not have to worry about picking an ordering.
It turns out that if we are given a complex inner product (in a complex space), we can in a canonical way get a real inner product from it. To see how, let us first consider the above example of $\mathbb{C}^n$ canonically identified with $\mathbb{R}^{2n}$. Let $\langle\cdot,\cdot\rangle_{\mathbb{C}}$ denote the standard inner product in $\mathbb{C}^n$, and $\langle\cdot,\cdot\rangle_{\mathbb{R}}$ the standard inner product in $\mathbb{R}^{2n}$ (note that the standard inner product in $\mathbb{R}^{2n}$ does not depend on the ordering of coordinates). Then (see Exercise 5.8.1 below)
$$\langle z, w\rangle_{\mathbb{R}} = \operatorname{Re}\langle z, w\rangle_{\mathbb{C}}. \tag{5.8.1}$$
This formula can be used to canonically define a real inner product from the complex one in general situation. Namely, it is an easy exercise to show that if is an inner product in a complex inner product space, then defined by (5.8.1) is a real inner product (on the corresponding real space).
Summarizing we can say that
To decomplexify a complex inner product space we simply “forget” that we can multiply by complex numbers, i.e. we only allow multiplication by reals. The canonical real inner product in the decomplexified space is given by formula (5.8.1)
Any (complex) linear transformation on $\mathbb{C}^n$ (or, more generally, on a complex vector space) gives us a real linear transformation: this is simply the fact that if linearity holds for complex scalars, then it of course holds for real scalars.
The converse is not true, i.e. a (real) linear transformation on the decomplexification of does not always give a (complex) linear transformation of (the same in the abstract settings).
For example, if one considers the case , then the multiplication by a complex number (general form of a linear transformation in ) treated as a linear transformation in has a very specific structure (can you describe it?).
We can also do the converse, namely get a complex inner product space from a real one: in fact, you have probably already done it before, without paying much attention to it.
Namely, given a real inner product space $\mathbb{R}^n$ we can obtain a complex space $\mathbb{C}^n$ out of it by allowing complex coordinates (with the standard inner product in both cases). The space $\mathbb{R}^n$ in this case will be a real subspace (i.e. a set closed with respect to sums and multiplication by real numbers) of $\mathbb{C}^n$ consisting of the vectors with real coordinates.
Abstractly, this construction can be described as follows: given a real vector space we can define its complexification as the collection of all pairs , with the addition and multiplication by a real number are defined coordinate-wise,
If then the vector consists of real parts of complex coordinates of and the vector of the imaginary parts. Thus informally one can write the pair as .
To define multiplication by complex numbers we define multiplication by as
(writing as we can see that it must be defined this way) and define multiplication by arbitrary complex numbers using second distributive property .
If, in addition, is an inner product space we can extend the inner product to by
The easiest way to see that everything is well defined, is to fix a basis (an orthonormal basis in the case of a real inner product space) and see what this construction gives us in coordinates. Then we can see that if we treat vector as the vector consisting of the real parts of complex coordinates, and vector as the vector consisting of imaginary parts of coordinates, then this construction is exactly the standard complexification of (by allowing complex coordinates) described above.
The fact that we can express this construction in coordinate-free way, without picking a basis and working with coordinates, means that the result does not depend on the choice of a basis.
So, the easiest way to think about complexification is probably as follows:
To construct a complexification of a real vector space we can pick a basis (an orthonormal basis if is a real inner product space) and then work with coordinates, allowing the complex ones. The resulting space does not depend on the choice of a basis; we can get from one coordinates to the others by the standard change of coordinate formula.
Note, that any linear transformation in the real space gives rise to a linear transformation in the complexification .
The easiest way to see that is to fix a basis in (an orthonormal basis if is a real inner product space) and to work in a coordinate representation: in this case has the same matrix as . In the abstract representation we can write
On the other hand, not all linear transformations in can be obtained from the transformations in ; if we do complexification in coordinates, only the transformations with real matrices work.
Note, that this is completely opposite to the situation in the case of decomplexification, described in Section 5.8.1.
An attentive reader has probably already noticed that the operations of complexification and decomplexification are not inverses of each other. First, a real space and its complexification have the same dimension, while the decomplexification of an $n$-dimensional complex space has (real) dimension $2n$. Moreover, as we just discussed, the relation between real and complex linear transformations is completely opposite in these two cases.
In the next section we discuss the operation, inverse in some sense to decomplexification.
The construction described in this section works only for real spaces of even dimension.
Let $X$ be a real inner product space of dimension $2n$. We want to invert the decomplexification procedure, i.e. to introduce a complex structure on $X$: to identify this space with a complex space such that its decomplexification (see Section 5.8.1) gives us the original space $X$. The simplest idea is to fix an orthonormal basis in $X$ and then split the coordinates in this basis into two equal parts.
We then treat one half of the coordinates as the real parts of complex coordinates, and treat the rest as the imaginary parts. Then we have to join real and imaginary parts together: for example, if we treat $x_1, \dots, x_n$ as real parts and $x_{n+1}, \dots, x_{2n}$ as imaginary parts, we can define complex coordinates $z_k := x_k + ix_{n+k}$.
Of course, the result will generally depend on the choice of the orthonormal basis and on the way we split the real coordinates into real and imaginary parts and on how we join them.
One can also see from the decomplexification construction described in Section 5.8.1 that all complex structures on a real inner product space can be obtained in this way.
The above construction can be described in an abstract, coordinate-free way. Namely, let us split the space as , where is a subspace, (so as well), and let be a unitary (more precisely, orthogonal, since our spaces are real) transformation.
Note, that if is an orthonormal basis in , then the system is an orthonormal basis in , so
is an orthonormal basis in the whole space .
If are coordinates of a vector in this basis, and we treat , as complex coordinates of , then the multiplication by is represented by the orthogonal transformation which is given in the orthogonal basis of subspaces by the block matrix
This means that
, .
Clearly, is an orthogonal transformation such that . Therefore, any complex structure on is given by an orthogonal transformation , satisfying ; the transformation gives us the multiplication by the imaginary unit .
The converse is also true, namely any orthogonal transformation satisfying defines a complex structure on a real inner product space . Let us explain how.
Let us first consider an abstract explanation. To define a complex structure, we need to define the multiplication of vectors by complex numbers (initially we only can multiply by real numbers). In fact we need only to define the multiplication by , the rest will follow from linearity in the original real space. And the multiplication by is given by the orthogonal transformation satisfying .
Namely, if the multiplication by $i$ is given by $U$, i.e. $ix := Ux$, then the complex multiplication must be defined by
$$(\alpha + i\beta)x := \alpha x + \beta Ux, \qquad \alpha, \beta \in \mathbb{R}. \tag{5.8.2}$$
We will use this formula now as the definition of complex multiplication.
It is not hard to check that for the complex multiplication defined above by (5.8.2) all axioms of complex vector space are satisfied. One can see that, for example by using linearity in the real space and noticing that with respect to algebraic operations (addition and multiplication) the linear transformations of form
behave absolutely the same way as complex numbers , i.e such transformations give us a representation of the field of complex numbers .
This means that first, a sum and a product of transformations of the form is a transformation of the same form, and to get the coefficients , of the result we can perform the operation on the corresponding complex numbers and take the real and imaginary parts of the result. Note, that here we need the identity , but we do not need the fact that is an orthogonal transformation.
Thus, we got the structure of a complex vector space. To get a complex inner product space we need to introduce complex inner product , such that the original real inner product is the real part of it.
We really do not have any choice here. Indeed, for any complex inner product
$$\operatorname{Im}\langle x, y\rangle_{\mathbb{C}} = \operatorname{Re}\bigl(-i\langle x, y\rangle_{\mathbb{C}}\bigr) = \operatorname{Re}\langle -ix, y\rangle_{\mathbb{C}} = \operatorname{Re}\langle -Ux, y\rangle_{\mathbb{C}};$$
for the last equality we used the definition (5.8.2) of complex multiplication. Therefore, the only way to define the complex inner product such that $\operatorname{Re}\langle x, y\rangle_{\mathbb{C}} = \langle x, y\rangle$ is
$$\langle x, y\rangle_{\mathbb{C}} := \langle x, y\rangle - i\langle Ux, y\rangle. \tag{5.8.3}$$
Let us show that this is indeed a complex inner product. We will need the fact that , see Exercise 5.8.4 below (by here we mean the adjoint with respect to the original real inner product).
To show that we use the identity and symmetry of the real inner product:
To prove the linearity of the complex inner product, let us first notice that is real linear in the first (in fact in each) argument, i.e. that for ; this is true because each summand in the right side of (5.8.3) is real linear in the first argument.
Using real linearity of and the identity (which implies that ) together with the orthogonality of , we get the following chain of equalities
which proves complex linearity.
Finally, to prove non-negativity of let us notice (see Exercise 5.8.3 below) that , so
For a reader who is not comfortable with such “high brow” and abstract proof, there is another, more hands on, explanation.
Namely, it can be shown, see Exercise 5.8.5 below, that there exists a subspace , (recall that ), such that the matrix of with respect to the decomposition is given by
where is some orthogonal transformation.
Let be an orthonormal basis in . Then the system is an orthonormal basis in , so
is an orthonormal basis in the whole space . Considering the coordinates in this basis and treating as complex coordinates, we get an elementary, “coordinate” way of defining complex structure, which was already described above. But if we look carefully, we see that multiplication by is given by the transformation : it is trivial for and for , and so it is true for all real linear combinations of , i.e. for all vectors in .
But that means that the abstract introduction of a complex structure and the corresponding elementary approach give us the same result! And since the elementary approach clearly gives us a complex structure, the abstract approach gives us the same complex structure.
Show that if is an inner product in a complex inner product space, then defined by (5.8.1) is a real inner product.
Let be an orthogonal transformation (in a real inner product space ), satisfying . Prove that for all
Show, that if is an orthogonal transformation satisfying , then .
Let be an orthogonal transformation in a real inner product space, satisfying . Show that in this case , and that there exists a subspace , , and an orthogonal transformation such that in the decomposition is given by the block matrix
This statement can be easily obtained from Theorem 6.5.1 of Chapter 6, if one notes that the only rotation in satisfying are rotations through .
However, one can find an elementary proof not using this theorem. For example, the statement is trivial if $\dim X = 2$: in this case we can take for $E$ any one-dimensional subspace, see Exercise 5.8.3.
Then it is not hard to show that such an operator does not exist in odd dimensions, and one can use induction on the dimension to complete the proof.