Spectral theory is the main tool that helps us to understand the structure of a linear operator. In this chapter we consider only operators acting from a vector space to itself (or, equivalently, square matrices). If we have such a linear transformation $A: V \to V$, we can multiply it by itself, take any power of it, or apply any polynomial to it.
The main idea of spectral theory is to split the operator into simple blocks and analyze each block separately.
To explain the main idea, let us consider difference equations. Many processes can be described by equations of the following type:
$$x_{n+1} = A x_n, \qquad n = 0, 1, 2, \dots,$$
where $A: V \to V$ is a linear transformation and $x_n \in V$ is the state of the system at time $n$. Given the initial state $x_0$ we would like to know the state $x_N$ at time $N$, analyze the long time behavior of $x_n$, etc. (Difference equations are discrete time analogues of the differential equation $x'(t) = A x(t)$. To solve the differential equation one needs to compute the operator exponent $e^{tA}$, and spectral theory also helps in doing this.)
At first glance the problem looks trivial: the solution is given by the formula $x_N = A^N x_0$. But what if $N$ is huge: thousands, millions? Or what if we want to analyze the behavior of $x_n$ as $n \to \infty$?
Here the idea of eigenvalues and eigenvectors comes in. Suppose that $A x_0 = \lambda x_0$, where $\lambda$ is some scalar. Then $x_N = A^N x_0 = \lambda^N x_0$, so the behavior of the solution is very well understood.
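For readers who want to experiment, here is a small numerical sketch of this shortcut (the matrix, the eigenvector, and the use of NumPy are illustrative choices, not part of the text): when $x_0$ is an eigenvector, $A^N x_0$ can be computed without any matrix multiplication at all.

```python
import numpy as np

# Hypothetical 2x2 example: x0 is an eigenvector of A with eigenvalue 3,
# so A^N x0 = 3^N x0 and no repeated matrix multiplication is needed.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
x0 = np.array([1.0, 0.0])

N = 10
x_direct = np.linalg.matrix_power(A, N) @ x0   # brute force: A^N x0
x_spectral = 3.0**N * x0                       # eigenvalue shortcut

print(x_direct)    # [59049.     0.]
print(x_spectral)  # [59049.     0.]
```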
In this section we will consider only operators acting in finite-dimensional spaces. Spectral theory in infinite dimensions is significantly more complicated, and most of the results presented here fail in the infinite-dimensional setting.
A scalar $\lambda$ is called an eigenvalue of an operator $A: V \to V$ if there exists a non-zero vector $v \in V$ such that
$$A v = \lambda v.$$
The vector $v$ is called an eigenvector of $A$ (corresponding to the eigenvalue $\lambda$).
If we know that $\lambda$ is an eigenvalue, the eigenvectors are easy to find: one just has to solve the equation $A x = \lambda x$, or, equivalently,
$$(A - \lambda I) x = 0.$$
So, finding all eigenvectors corresponding to an eigenvalue $\lambda$ is simply finding the nullspace of $A - \lambda I$. The nullspace $\operatorname{Ker}(A - \lambda I)$, i.e. the set of all eigenvectors together with the zero vector $\mathbf{0}$, is called the eigenspace.
The set of all eigenvalues of an operator $A$ is called the spectrum of $A$, and is usually denoted $\sigma(A)$.
A scalar $\lambda$ is an eigenvalue if and only if the nullspace $\operatorname{Ker}(A - \lambda I)$ is non-trivial (so the equation $(A - \lambda I)x = 0$ has a non-trivial solution).
Let $A$ act on $\mathbb{F}^n$ (i.e. $A: \mathbb{F}^n \to \mathbb{F}^n$). Since the matrix of $A$ is square, $A - \lambda I$ has a non-trivial nullspace if and only if it is not invertible. We know that a square matrix is not invertible if and only if its determinant is $0$. Therefore
$$\lambda \in \sigma(A) \quad \text{if and only if} \quad \det(A - \lambda I) = 0.$$
If $A$ is an $n \times n$ matrix, the determinant $\det(A - \lambda I)$ is a polynomial of degree $n$ in the variable $\lambda$. This polynomial is called the characteristic polynomial of $A$. So, to find all eigenvalues of $A$ one just needs to compute the characteristic polynomial and find all its roots.
This method of finding the spectrum of an operator is not very practical in higher dimensions. Finding roots of a polynomial of high degree can be a very difficult problem, and a general equation of degree higher than 4 cannot be solved in radicals. So, in higher dimensions different numerical methods are used to find eigenvalues and eigenvectors.
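As an illustration of this point, the following sketch (with a made-up symmetric $2\times2$ matrix) compares the roots of the characteristic polynomial with the eigenvalues returned by a standard numerical routine; in higher dimensions one relies on the latter.

```python
import numpy as np

# Hypothetical matrix: eigenvalues as roots of the characteristic polynomial
# versus the iterative numerical method used by np.linalg.eigvals.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)           # coefficients of the characteristic polynomial
roots = np.roots(coeffs)      # its roots
eigs = np.linalg.eigvals(A)   # eigenvalues computed numerically

print(np.sort(roots))   # [1. 3.]
print(np.sort(eigs))    # [1. 3.]
```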
So we know how to find the spectrum of a matrix. But how do we find eigenvalues of an operator acting in an abstract vector space? The recipe is simple:
Take an arbitrary basis, and compute eigenvalues of the matrix of the operator in this basis.
But how do we know that the result does not depend on a choice of the basis?
There can be several possible explanations. One is based on the notion of similar matrices. Let us recall that square matrices $A$ and $B$ are called similar if there exists an invertible matrix $S$ such that
$$A = S B S^{-1}.$$
Note that determinants of similar matrices coincide. Indeed,
$$\det A = \det(S B S^{-1}) = \det S \cdot \det B \cdot \det(S^{-1}) = \det B,$$
because $\det(S^{-1}) = 1/\det S$. Note that if $A = S B S^{-1}$ then
$$A - \lambda I = S B S^{-1} - \lambda S S^{-1} = S (B - \lambda I) S^{-1},$$
so the matrices $A - \lambda I$ and $B - \lambda I$ are similar. Therefore
$$\det(A - \lambda I) = \det(B - \lambda I),$$
i.e.
| characteristic polynomials of similar matrices coincide. |
If $T: V \to V$ is a linear transformation, and $\mathcal{A}$ and $\mathcal{B}$ are two bases in $V$, then
$$[T]_{\mathcal{A}\mathcal{A}} = [I]_{\mathcal{A}\mathcal{B}} \, [T]_{\mathcal{B}\mathcal{B}} \, [I]_{\mathcal{B}\mathcal{A}},$$
and since $[I]_{\mathcal{B}\mathcal{A}} = \bigl([I]_{\mathcal{A}\mathcal{B}}\bigr)^{-1}$, the matrices $[T]_{\mathcal{A}\mathcal{A}}$ and $[T]_{\mathcal{B}\mathcal{B}}$ are similar.
In other words, matrices of a linear transformation in different bases are similar.
Therefore, we can define the characteristic polynomial of an operator as the characteristic polynomial of its matrix in some basis. As we have discussed above, the result does not depend on the choice of the basis, so the characteristic polynomial of an operator is well defined.
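The fact that similar matrices share a characteristic polynomial is also easy to check numerically; the sketch below (with randomly generated matrices, purely for illustration) forms $A = S B S^{-1}$ and compares the two polynomials.

```python
import numpy as np

# Hypothetical matrices: B and A = S B S^{-1} are similar, so their
# characteristic polynomials (and hence eigenvalues) coincide.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
S = rng.standard_normal((3, 3))      # invertible with probability 1
A = S @ B @ np.linalg.inv(S)

print(np.round(np.poly(A), 6))       # same coefficients, up to rounding
print(np.round(np.poly(B), 6))
```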
The fundamental theorem of algebra asserts that any polynomial of degree at least $1$ has a complex root. This implies that an operator in a finite-dimensional complex vector space has at least one eigenvalue, so its spectrum is non-empty.
On the other hand, it is easy to construct a linear transformation in a real vector space without real eigenvalues: the rotation of $\mathbb{R}^2$ by an angle that is not a multiple of $\pi$ is one of the examples. Since it is usually assumed that eigenvalues should belong to the field of scalars (if an operator acts in a vector space over a field $\mathbb{F}$, the eigenvalues should be in $\mathbb{F}$), such operators have empty spectrum.
Thus, the complex case (i.e. operators acting in complex vector spaces) seems to be the most natural setting for spectral theory. Since $\mathbb{R}^n \subset \mathbb{C}^n$, we can always treat a real matrix as an operator in $\mathbb{C}^n$ to allow complex eigenvalues. Treating real matrices as operators in $\mathbb{C}^n$ is typical in spectral theory, and we will follow this agreement. Finding eigenvalues of a matrix (unless otherwise specified) will always mean finding all complex eigenvalues, not restricting oneself only to real ones.
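To see this convention in action, one can take the rotation of $\mathbb{R}^2$ mentioned above and ask a numerical routine for its (complex) eigenvalues; the angle below is an arbitrary illustrative choice.

```python
import numpy as np

# The rotation by alpha in R^2 has no real eigenvalues (for alpha not a
# multiple of pi); treated as an operator in C^2 it has the complex
# eigenvalues cos(alpha) +/- i sin(alpha).
alpha = np.pi / 3
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])

print(np.linalg.eigvals(R))   # approximately [0.5+0.866j, 0.5-0.866j]
```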
Note that an operator in an abstract real vector space can also be interpreted as an operator in a complex space. A naïve approach would be to fix a basis (recall that all spaces in this chapter are finite-dimensional) and then work with coordinates in this basis, allowing complex coordinates: that is essentially the move from operators in $\mathbb{R}^n$ to operators in $\mathbb{C}^n$ described above.
This construction describes what is known as the complexification of a real vector space, and the result does not depend on the choice of a basis. A “high brow” abstract construction of the complexification, explaining why the result does not depend on the choice of a basis is described below in Section 5.8.2 of Chapter 5.
Let us remind the reader that if $p$ is a polynomial and $\lambda_0$ is its root (i.e. $p(\lambda_0) = 0$), then $z - \lambda_0$ divides $p(z)$, i.e. $p$ can be represented as $p(z) = (z - \lambda_0) q(z)$, where $q$ is some polynomial. If $q(\lambda_0) = 0$, then $q$ can also be divided by $z - \lambda_0$, so $(z - \lambda_0)^2$ divides $p$, and so on.
The largest positive integer $k$ such that $(z - \lambda_0)^k$ divides $p(z)$ is called the multiplicity of the root $\lambda_0$.
If $\lambda$ is an eigenvalue of an operator (matrix) $A$, then it is a root of the characteristic polynomial $p(z) = \det(A - z I)$. The multiplicity of this root is called the (algebraic) multiplicity of the eigenvalue $\lambda$.
Any polynomial $p$ of degree $n$ has exactly $n$ complex roots, counting multiplicities. The words counting multiplicities mean that if a root has multiplicity $d$ we have to list (count) it $d$ times. In other words, $p$ can be represented as
$$p(z) = a (z - \lambda_1)(z - \lambda_2) \cdots (z - \lambda_n),$$
where $\lambda_1, \lambda_2, \dots, \lambda_n$ are its complex roots, counting multiplicities.
There is another notion of multiplicity of an eigenvalue: the dimension of the eigenspace $\operatorname{Ker}(A - \lambda I)$ is called the geometric multiplicity of the eigenvalue $\lambda$.
Geometric multiplicity is not as widely used as algebraic multiplicity. So, when people say simply “multiplicity” they usually mean algebraic multiplicity.
Let us mention that the algebraic and geometric multiplicities of an eigenvalue can differ.
Geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity.
See Exercise 4.1.9 below. ∎
Let $A$ be an $n \times n$ matrix, and let $\lambda_1, \lambda_2, \dots, \lambda_n$ be its (complex) eigenvalues (counting multiplicities). Then
$\operatorname{trace} A = \lambda_1 + \lambda_2 + \cdots + \lambda_n$.
$\det A = \lambda_1 \lambda_2 \cdots \lambda_n$.
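Both identities are easy to check numerically on a concrete example; the matrix below is a hypothetical one chosen only for illustration.

```python
import numpy as np

# trace A = sum of eigenvalues, det A = product of eigenvalues
# (counting multiplicities); the eigenvalues of this matrix are 5, 2, 2.
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 0.0],
              [1.0, 1.0, 2.0]])
eigs = np.linalg.eigvals(A)

print(np.trace(A), np.sum(eigs).real)         # both equal 9
print(np.linalg.det(A), np.prod(eigs).real)   # both equal 20 (up to rounding)
```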
Computing eigenvalues is equivalent to finding the roots of the characteristic polynomial of a matrix (or using some numerical method), which can be quite time consuming. However, there is one particular case when we can just read the eigenvalues off the matrix. Namely,
eigenvalues of a triangular matrix (counting multiplicities) are exactly the diagonal entries
By triangular here we mean either an upper or a lower triangular matrix. Since a diagonal matrix is a particular case of a triangular matrix (it is both upper and lower triangular),
the eigenvalues of a diagonal matrix are its diagonal entries
The proof of the statement about triangular matrices is trivial: we need to subtract $\lambda$ from the diagonal entries of $A$ and use the fact that the determinant of a triangular matrix is the product of its diagonal entries. We get the characteristic polynomial
$$\det(A - \lambda I) = (a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda),$$
and its roots are exactly the diagonal entries $a_{11}, a_{22}, \dots, a_{nn}$.
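A quick numerical check of this statement, on a hypothetical upper triangular matrix:

```python
import numpy as np

# For a triangular matrix the eigenvalues (with multiplicities) are exactly
# the diagonal entries.
A = np.array([[1.0, 5.0, -2.0],
              [0.0, 3.0,  7.0],
              [0.0, 0.0,  3.0]])

print(np.sort(np.linalg.eigvals(A)))   # [1. 3. 3.]
print(np.sort(np.diag(A)))             # [1. 3. 3.]
```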
True or false:
Every linear operator in an $n$-dimensional vector space has $n$ distinct eigenvalues;
If a matrix has one eigenvector, it has infinitely many eigenvectors;
There exists a square real matrix with no real eigenvalues;
There exists a square matrix with no (complex) eigenvectors;
Similar matrices always have the same eigenvalues;
Similar matrices always have the same eigenvectors;
A non-zero sum of two eigenvectors of a matrix is always an eigenvector;
A non-zero sum of two eigenvectors of a matrix corresponding to the same eigenvalue is always an eigenvector.
Find characteristic polynomials, eigenvalues and eigenvectors of the following matrices:
Compute eigenvalues and eigenvectors of the rotation matrix
$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}.$$
Note that the eigenvalues (and eigenvectors) do not need to be real.
Compute characteristic polynomials and eigenvalues of the following matrices:
Do not expand the characteristic polynomials, leave them as products.
Prove that eigenvalues (counting multiplicities) of a triangular matrix coincide with its diagonal entries
An operator $A$ is called nilpotent if $A^k = 0$ for some $k$. Prove that if $A$ is nilpotent, then $\sigma(A) = \{0\}$ (i.e. that $0$ is the only eigenvalue of $A$).
Let $v_1, v_2, \dots, v_n$ be a basis in a vector space $V$. Assume also that the first $k$ vectors of the basis are eigenvectors of an operator $A$, corresponding to an eigenvalue $\lambda$ (i.e. that $A v_j = \lambda v_j$, $j = 1, 2, \dots, k$). Show that in this basis the matrix of the operator $A$ has the block triangular form
$$\begin{pmatrix} \lambda I_k & * \\ 0 & B \end{pmatrix},$$
where $I_k$ is the $k \times k$ identity matrix and $B$ is some $(n-k) \times (n-k)$ matrix.
Use the two previous exercises to prove that geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity.
Prove that the determinant of a matrix $A$ is the product of its eigenvalues (counting multiplicities).
Hint: first show that $\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)$, where $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues (counting multiplicities). Then compare the free terms (terms without $\lambda$), or plug in $\lambda = 0$, to get the conclusion.
Prove that the trace of a matrix $A$ equals the sum of its eigenvalues (counting multiplicities) in three steps. First, compute the coefficient of $\lambda^{n-1}$ on the right side of the equality
$$\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda).$$
Then show that $\det(A - \lambda I)$ can be represented as
$$\det(A - \lambda I) = (a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda) + q(\lambda),$$
where $q(\lambda)$ is a polynomial of degree at most $n - 2$. And finally, comparing the coefficients of $\lambda^{n-1}$, get the conclusion.
One of the applications of spectral theory is the diagonalization of operators: given an operator, find a basis in which the matrix of the operator is diagonal. Such a basis does not always exist, i.e. not all operators can be diagonalized (are diagonalizable). The importance of diagonalizable operators comes from the fact that powers, and more generally functions, of diagonal matrices are easy to compute, so if we diagonalize an operator we can easily compute functions of it.
We will explain how to compute functions of diagonalizable operators in this section. We also give a necessary and sufficient condition for an operator to be diagonalizable, as well as some simple sufficient conditions.
Note also that for operators in $\mathbb{F}^n$ (matrices) the diagonalizability of $A$ means that it can be represented as $A = S D S^{-1}$, where $D$ is a diagonal matrix (and $S$, of course, is invertible); we will explain this shortly.
Unless otherwise specified, all results in this section hold for both complex and real vector spaces (and even for spaces over arbitrary fields).
Suppose an operator $A$ in a vector space $V$ is such that $V$ has a basis $\mathcal{B} = \{v_1, v_2, \dots, v_n\}$ of eigenvectors of $A$, with $\lambda_1, \lambda_2, \dots, \lambda_n$ being the corresponding eigenvalues. Then the matrix of $A$ in this basis is the diagonal matrix with $\lambda_1, \lambda_2, \dots, \lambda_n$ on the diagonal,
$$[A]_{\mathcal{B}\mathcal{B}} = D := \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\} = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}. \qquad (4.2.5)$$
On the other hand, if the matrix of an operator $A$ in a basis $\mathcal{B} = \{v_1, v_2, \dots, v_n\}$ is given by (4.2.5), then trivially $A v_k = \lambda_k v_k$, i.e. $\lambda_1, \dots, \lambda_n$ are eigenvalues and $v_1, \dots, v_n$ are corresponding eigenvectors.
Note that the above reasoning holds for both complex and real vector spaces (and even for vector spaces over arbitrary fields)
Applying the above reasoning to operators in $\mathbb{F}^n$ (matrices) we immediately get the following theorem. Note that while in this book $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$, this theorem holds for an arbitrary field $\mathbb{F}$.
A matrix $A$ (with entries in $\mathbb{F}$) admits a representation $A = S D S^{-1}$, where $D$ is a diagonal matrix and $S$ is an invertible one (both with entries in $\mathbb{F}$), if and only if there exists a basis in $\mathbb{F}^n$ of eigenvectors of $A$.
Moreover, in this case the diagonal entries of $D$ are the eigenvalues of $A$ and the columns of $S$ are the corresponding eigenvectors (column number $k$ corresponds to the $k$th diagonal entry of $D$).
Let $D = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}$, and let $v_1, v_2, \dots, v_n$ be the columns of $S$ (note that since $S$ is invertible, its columns form a basis $\mathcal{B}$ in $\mathbb{F}^n$). Then the identity $A = S D S^{-1}$ means that $D$ is the matrix of $A$ in the basis $\mathcal{B}$.
Indeed, $S$ is the change of coordinates matrix from the basis $\mathcal{B}$ to the standard basis, so we get from $A = S D S^{-1}$ that $D = S^{-1} A S = [A]_{\mathcal{B}\mathcal{B}}$, which means exactly that the matrix of $A$ in the basis $\mathcal{B}$ is (4.2.5).
And as we just discussed above, this happens if and only if $\lambda_1, \dots, \lambda_n$ are the eigenvalues and $v_1, \dots, v_n$ are the corresponding eigenvectors of $A$. ∎
Note that if a matrix $A$ admits the representation $A = S D S^{-1}$ with a diagonal matrix $D$, then a simple direct calculation shows that the columns of $S$ are eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues. This gives an alternative proof of the corresponding statement in Theorem 4.2.1.
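In computational practice this is exactly what an eigendecomposition routine returns; the sketch below (with a hypothetical $2\times2$ matrix) builds $S$ from the computed eigenvectors and verifies the factorization $A = S D S^{-1}$.

```python
import numpy as np

# np.linalg.eig returns the eigenvalues and a matrix whose columns are
# eigenvectors; when these columns form a basis, A = S D S^{-1}.
A = np.array([[1.0, 2.0],
              [8.0, 1.0]])
lams, S = np.linalg.eig(A)    # columns of S are eigenvectors
D = np.diag(lams)

print(lams)                                       # eigenvalues: 5 and -3
print(np.allclose(A, S @ D @ np.linalg.inv(S)))   # True
```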
As we discussed above, a diagonalizable operator on an $n$-dimensional space has exactly $n$ eigenvalues (counting multiplicities). Any operator in an $n$-dimensional complex vector space has exactly $n$ eigenvalues (counting multiplicities); an operator in a real space, on the other hand, could have no real eigenvalues.
We will, as is customary in spectral theory, treat real matrices as operators in the complex space $\mathbb{C}^n$, thus allowing complex eigenvalues and eigenvectors. Unless otherwise specified, by the diagonalization of a matrix we will mean its complex diagonalization, i.e. a representation $A = S D S^{-1}$ where the matrices $S$ and $D$ can have complex entries.
The question of when a real matrix admits a real diagonalization ($A = S D S^{-1}$ with real matrices $S$ and $D$) is in fact a very simple one; see Theorem 4.2.9 below.
Let the matrix of an operator $A$ in a basis $\mathcal{B} = \{v_1, v_2, \dots, v_n\}$ be the diagonal matrix $D = \operatorname{diag}\{\lambda_1, \dots, \lambda_n\}$ given by (4.2.5). Then it is easy to find an $N$th power of the operator $A$. Namely, the matrix of $A^N$ in the basis $\mathcal{B}$ is
$$D^N = \operatorname{diag}\{\lambda_1^N, \lambda_2^N, \dots, \lambda_n^N\}.$$
Moreover, functions of the operator are also very easy to compute: for example, the operator (matrix) exponent $e^A$ is defined as
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{A^k}{k!},$$
and its matrix in the basis $\mathcal{B}$ is $\operatorname{diag}\{e^{\lambda_1}, e^{\lambda_2}, \dots, e^{\lambda_n}\}$.
Let now $A$ be an operator in $\mathbb{F}^n$. To find the matrices of the operators $A^N$ and $e^A$ in the standard basis, we need to recall that the change of coordinates matrix from the basis $\mathcal{B}$ of eigenvectors to the standard basis is the matrix with columns $v_1, v_2, \dots, v_n$. Let us call this matrix $S$; then, according to the change of coordinates formula, we have
$$A = S D S^{-1},$$
where we use $D$ for the diagonal matrix (4.2.5) in the middle.
Similarly,
$$A^N = S D^N S^{-1},$$
and similarly for $e^A$.
Another way of thinking about powers (or other functions) of diagonalizable operators is to see that if an operator $A$ can be represented as $A = S D S^{-1}$, then
$$A^N = (S D S^{-1})^N = \underbrace{(S D S^{-1})(S D S^{-1}) \cdots (S D S^{-1})}_{N \text{ times}} = S D^N S^{-1},$$
and it is easy to compute the $N$th power of a diagonal matrix.
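The same computation in code, on a hypothetical diagonalizable matrix: once $A = S D S^{-1}$ is known, powers and the exponent reduce to scalar operations on the eigenvalues.

```python
import numpy as np

# A^N = S D^N S^{-1} and e^A = S diag(e^{lambda_k}) S^{-1}:
# functions of A become functions of the (scalar) eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lams, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)

N = 5
A_pow = S @ np.diag(lams**N) @ S_inv
A_exp = S @ np.diag(np.exp(lams)) @ S_inv

print(np.allclose(A_pow, np.linalg.matrix_power(A, N)))   # True
# scipy.linalg.expm(A), where available, agrees with A_exp.
```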
We now present a very simple sufficient condition for an operator to be diagonalizable; see Corollary 4.2.3 below.
Let $\lambda_1, \lambda_2, \dots, \lambda_r$ be distinct eigenvalues of $A$, and let $v_1, v_2, \dots, v_r$ be the corresponding eigenvectors. Then the vectors $v_1, v_2, \dots, v_r$ are linearly independent.
We will use induction on $r$. The case $r = 1$ is trivial, because by definition an eigenvector is non-zero, and a system consisting of one non-zero vector is linearly independent.
Suppose that the statement of the theorem is true for $r - 1$ eigenvectors. Suppose, seeking a contradiction, that there exists a non-trivial linear combination
$$\sum_{k=1}^{r} c_k v_k = 0. \qquad (4.2.6)$$
Applying $A - \lambda_r I$ to (4.2.6) and using $A v_k = \lambda_k v_k$, we get $\sum_{k=1}^{r-1} c_k (\lambda_k - \lambda_r) v_k = 0$. By the induction hypothesis $c_k (\lambda_k - \lambda_r) = 0$, and since the eigenvalues are distinct, $c_k = 0$ for $k = 1, 2, \dots, r - 1$. Then (4.2.6) reduces to $c_r v_r = 0$, so $c_r = 0$ as well, and the combination was trivial after all. ∎
If an operator $A: V \to V$ has exactly $n = \dim V$ distinct eigenvalues, then it is diagonalizable. Indeed, by Theorem 4.2.2 the corresponding $n$ eigenvectors are linearly independent, and $n$ linearly independent vectors in an $n$-dimensional space form a basis.
While very simple, this is a very important statement, and it will be used a lot! Please remember it.
To describe diagonalizable operators we need to introduce some new definitions.
Let $V_1, V_2, \dots, V_p$ be subspaces of a vector space $V$. We say that the system of subspaces $V_1, V_2, \dots, V_p$ is a basis in $V$ if any vector $v \in V$ admits a unique representation as a sum
$$v = v_1 + v_2 + \cdots + v_p = \sum_{k=1}^{p} v_k, \qquad v_k \in V_k. \qquad (4.2.7)$$
We also say that a system of subspaces $V_1, V_2, \dots, V_p$ is linearly independent if the equation
$$v_1 + v_2 + \cdots + v_p = 0, \qquad v_k \in V_k,$$
has only the trivial solution ($v_k = 0$ for all $k = 1, 2, \dots, p$).
Another way to phrase this is to say that a system of subspaces $V_1, V_2, \dots, V_p$ is linearly independent if and only if any system of non-zero vectors $v_k$, where $v_k \in V_k$, is linearly independent.
We say that the system of subspaces $V_1, V_2, \dots, V_p$ is generating (or complete, or spanning) if any vector $v \in V$ admits a representation as in (4.2.7) (not necessarily unique).
From the above definitions one can immediately see that Theorem 4.2.2 in fact states that the system of eigenspaces
$$E_k := \operatorname{Ker}(A - \lambda_k I), \qquad k = 1, 2, \dots, r,$$
of an operator $A$ (corresponding to its distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_r$) is linearly independent.
It is easy to see that similarly to the bases of vectors, a system of subspaces is a basis if and only if it is generating and linearly independent. We leave the proof of this fact as an exercise for the reader.
There is a simple example of a basis of subspaces. Let $V$ be a vector space with a basis $v_1, v_2, \dots, v_n$. Split the set of indices $\{1, 2, \dots, n\}$ into $p$ subsets $\Lambda_1, \Lambda_2, \dots, \Lambda_p$, and define the subspaces $V_k := \operatorname{span}\{v_j : j \in \Lambda_k\}$. Clearly the subspaces $V_k$, $k = 1, 2, \dots, p$, form a basis of $V$.
The following theorem shows that in the finite-dimensional case it is essentially the only possible example of a basis of subspaces.
Let $V_1, V_2, \dots, V_p$ be a basis of subspaces, and for each $k$ let $B_k$ be a basis (a collection of vectors) in $V_k$. (We do not list the vectors in each $B_k$; one just should keep in mind that each $B_k$ consists of finitely many vectors in $V_k$.) Then the union $B := \bigcup_{k=1}^{p} B_k$ of these bases is a basis in $V$.
To prove the theorem we need the following lemma
Let $V_1, V_2, \dots, V_p$ be a linearly independent family of subspaces, and let us have in each subspace $V_k$ a linearly independent system $B_k$ of vectors. (Again, here we do not name each vector in $B_k$ individually; we just keep in mind that each set $B_k$ consists of finitely many vectors.) Then the union $B := \bigcup_{k=1}^{p} B_k$ is a linearly independent system.
The proof of the lemma is almost trivial, if one thinks a bit about it. The main difficulty in writing the proof is the choice of an appropriate notation. Instead of using two indices (one for the number of the subspace and the other for the number of a vector in it), let us use "flat" notation.
Namely, let $r_k$ be the number of vectors in $B_k$. Let us order the set $B = \bigcup_k B_k$, for example as follows: first list all vectors from $B_1$, then all vectors from $B_2$, etc., listing all vectors from $B_p$ last.
This way, we index all vectors in $B$ by the integers $1, 2, \dots, N$, where $N = r_1 + r_2 + \cdots + r_p$, and the set of indices $\{1, 2, \dots, N\}$ splits into the sets $\Lambda_1, \Lambda_2, \dots, \Lambda_p$ such that the set $\Lambda_k$ consists of the indices of the vectors from $B_k$.
Suppose we have a non-trivial linear combination of the vectors $u_1, u_2, \dots, u_N$ of $B$ (in the above ordering) equal to zero,
$$\sum_{j=1}^{N} c_j u_j = 0. \qquad (4.2.8)$$
Denote
$$v_k := \sum_{j \in \Lambda_k} c_j u_j \in V_k.$$
Then (4.2.8) can be rewritten as
$$v_1 + v_2 + \cdots + v_p = 0.$$
Since $v_k \in V_k$ and the system of subspaces $V_1, V_2, \dots, V_p$ is linearly independent, $v_k = 0$ for all $k$. That means that for every $k$
$$\sum_{j \in \Lambda_k} c_j u_j = 0,$$
and since the system of vectors $u_j$, $j \in \Lambda_k$ (i.e. the system $B_k$), is linearly independent, we have $c_j = 0$ for all $j \in \Lambda_k$. Since this is true for all $k$, we conclude that $c_j = 0$ for all $j$, i.e. the combination must have been trivial. ∎
To prove the theorem we will use the same notation as in the proof of Lemma 4.2.7, i.e. the system $B_k$ consists of the vectors $u_j$, $j \in \Lambda_k$.
Lemma 4.2.7 asserts that the system of vectors $u_j$, $j = 1, 2, \dots, N$, is linearly independent, so it only remains to show that the system is complete (generating).
Since the system of subspaces $V_1, V_2, \dots, V_p$ is a basis, any vector $v \in V$ can be represented as
$$v = v_1 + v_2 + \cdots + v_p, \qquad v_k \in V_k.$$
Since the vectors $u_j$, $j \in \Lambda_k$, form a basis in $V_k$, each $v_k$ can be represented as
$$v_k = \sum_{j \in \Lambda_k} c_j u_j,$$
and therefore $v = \sum_{j=1}^{N} c_j u_j$. ∎
First of all, let us recall a simple necessary condition. Since the eigenvalues (counting multiplicities) of a diagonal matrix $D = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}$ are exactly its diagonal entries $\lambda_1, \lambda_2, \dots, \lambda_n$, we see that if an operator on an $n$-dimensional space is diagonalizable, it has exactly $n$ eigenvalues (counting multiplicities).
Theorem 4.2.8 below holds for both real and complex vector spaces (and even for spaces over general fields).
Let an operator $A: V \to V$ have exactly $n = \dim V$ eigenvalues (counting multiplicities). (Since any operator in a complex vector space has exactly $n$ eigenvalues, counting multiplicities, this assumption is moot in the complex case.) Then $A$ is diagonalizable if and only if for each eigenvalue $\lambda$ the dimension of the eigenspace $\operatorname{Ker}(A - \lambda I)$ (i.e. the geometric multiplicity of $\lambda$) coincides with the algebraic multiplicity of $\lambda$.
First of all, let us note that for a diagonal matrix the algebraic and geometric multiplicities of its eigenvalues coincide, and therefore the same holds for diagonalizable operators.
Let us now prove the other implication. Let $\lambda_1, \lambda_2, \dots, \lambda_r$ be the distinct eigenvalues of $A$, and let $E_k := \operatorname{Ker}(A - \lambda_k I)$ be the corresponding eigenspaces. According to Remark 4.2.4, the subspaces $E_k$, $k = 1, 2, \dots, r$, are linearly independent.
Let $B_k$ be a basis in $E_k$. By Lemma 4.2.7 the system $B := \bigcup_{k=1}^{r} B_k$ is a linearly independent system of vectors.
We know that each $B_k$ consists of $\dim E_k$ vectors, which by assumption is the (algebraic) multiplicity of the eigenvalue $\lambda_k$. So the number of vectors in $B$ equals the sum of the multiplicities of the eigenvalues $\lambda_k$. But the sum of the multiplicities of the eigenvalues is the number of eigenvalues counting multiplicities, which is exactly $n = \dim V$. So, we have a linearly independent system of $n$ eigenvectors, which means it is a basis. ∎
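The criterion is easy to test numerically. The helper below is a rough sketch (the sample matrices and the tolerance are arbitrary choices): for each eigenvalue it compares the algebraic multiplicity with the geometric multiplicity $\dim\operatorname{Ker}(A - \lambda I) = n - \operatorname{rank}(A - \lambda I)$.

```python
import numpy as np

def multiplicities(A, tol=1e-6):
    # For each (numerically clustered) eigenvalue, return the pair
    # (algebraic multiplicity, geometric multiplicity).
    n = A.shape[0]
    eigs = np.linalg.eigvals(A)
    result = {}
    for lam in eigs:
        key = complex(np.round(lam, 6))
        if key in result:
            continue
        alg = int(np.sum(np.abs(eigs - lam) < tol))
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        result[key] = (alg, geo)
    return result

print(multiplicities(np.array([[3.0, 0.0], [0.0, 3.0]])))  # eigenvalue 3: (2, 2) -> diagonalizable
print(multiplicities(np.array([[3.0, 1.0], [0.0, 3.0]])))  # eigenvalue 3: (2, 1) -> not diagonalizable
```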
The theorem below is, in fact, already proven (it is essentially Theorem 4.2.8 for real spaces). We state it here to summarize the situation with real diagonalization of real matrices.
A real matrix $A$ admits a real factorization (i.e. a representation $A = S D S^{-1}$, where $S$ and $D$ are real matrices, $D$ is diagonal and $S$ is invertible) if and only if it admits a complex factorization $A = S D S^{-1}$ and all eigenvalues of $A$ are real.
Consider the matrix
Its characteristic polynomial is equal to
and its roots (eigenvalues) are and . For the eigenvalue
A basis in its nullspace consists of one vector , so this is the corresponding eigenvector.
Similarly, for
and the eigenspace is spanned by the vector . The matrix can be diagonalized as
Consider the matrix
Its characteristic polynomial is
and the eigenvalues (roots of the characteristic polynomial) are . For
This matrix has rank 1, so the eigenspace is spanned by one vector, for example by .
Since the matrix is real, we do not need to compute an eigenvector for : we can get it for free by taking the complex conjugate of the above eigenvector, see Exercise 4.2.2 below. So, for a corresponding eigenvector is , and so the matrix can be diagonalized as
Consider the matrix
Its characteristic polynomial is
so $A$ has an eigenvalue of (algebraic) multiplicity $2$. But it is easy to see that $A - \lambda I$ has only 1 pivot, so there is 1 free variable and the eigenspace $\operatorname{Ker}(A - \lambda I)$ is one-dimensional. Therefore, the geometric multiplicity of the eigenvalue is different from its algebraic multiplicity, so $A$ is not diagonalizable.
There is also an explanation which does not use Theorem 4.2.8. Namely, we got that the eigenspace is one-dimensional (spanned by one vector). If $A$ were diagonalizable, it would have a diagonal form in some basis, and so the dimension of the eigenspace would be 2. Therefore $A$ cannot be diagonalized. (Note that the only linear transformation whose matrix in some basis is the identity matrix $I$ is the identity transformation $I$. Since $A$ is definitely not the identity, we can immediately conclude that $A$ cannot be diagonalized, so counting the dimension of the eigenspace is not even necessary.)
Let $A$ be a square matrix. True or false:
$A^T$ has the same eigenvalues as $A$.
$A^T$ has the same eigenvectors as $A$.
If $A$ is diagonalizable, then so is $A^T$.
Justify your conclusions.
Let $A$ be a square matrix with real entries, and let $\lambda$ be its complex eigenvalue. Suppose $v = (v_1, v_2, \dots, v_n)^T$ is a corresponding eigenvector, $A v = \lambda v$. Prove that $\bar{\lambda}$ is an eigenvalue of $A$ and that $A \bar{v} = \bar{\lambda} \bar{v}$. Here $\bar{v}$ is the complex conjugate of the vector $v$, $\bar{v} := (\bar{v}_1, \bar{v}_2, \dots, \bar{v}_n)^T$.
Let
Find by diagonalizing .
Construct a matrix with eigenvalues 1 and 3 and corresponding eigenvectors and . Is such a matrix unique?
Diagonalize the following matrices, if possible:
.
.
( is one of the eigenvalues)
Consider the matrix
Find its eigenvalues. Is it possible to find the eigenvalues without computing?
Is this matrix diagonalizable? Find out without computing anything.
If the matrix is diagonalizable, diagonalize it.
Diagonalize the matrix
Find all square roots of the matrix
i.e. find all matrices $B$ such that $B^2 = A$. Hint: finding a square root of a diagonal matrix is easy. You can leave your answer as a product.
Let us recall that the famous Fibonacci sequence
$$0, 1, 1, 2, 3, 5, 8, 13, 21, \dots$$
is defined as follows: we put $\varphi_0 = 0$, $\varphi_1 = 1$ and define
$$\varphi_{n+2} = \varphi_{n+1} + \varphi_n.$$
We want to find a formula for $\varphi_n$. To do this:
Find a $2 \times 2$ matrix $A$ such that
$$\begin{pmatrix} \varphi_{n+2} \\ \varphi_{n+1} \end{pmatrix} = A \begin{pmatrix} \varphi_{n+1} \\ \varphi_n \end{pmatrix}.$$
Hint: combine the trivial equation $\varphi_{n+1} = \varphi_{n+1}$ with the Fibonacci relation $\varphi_{n+2} = \varphi_{n+1} + \varphi_n$.
Diagonalize $A$ and find a formula for $A^n$.
Noticing that
$$\begin{pmatrix} \varphi_{n+1} \\ \varphi_n \end{pmatrix} = A^n \begin{pmatrix} \varphi_1 \\ \varphi_0 \end{pmatrix},$$
find a formula for $\varphi_n$. (You will need to compute an inverse and perform multiplication here.)
Show that the vector converges to an eigenvector of .
What do you think, is it a coincidence?
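For this exercise, a small numerical sketch may help check the intermediate steps (it uses one possible choice of the matrix from part (a); treat it as something to verify against, not as the worked solution).

```python
import numpy as np

# A sends (phi_{n+1}, phi_n) to (phi_{n+2}, phi_{n+1}); iterating it from
# (phi_1, phi_0) = (1, 0) reproduces the Fibonacci numbers, and its
# eigenvalues are the golden ratio and its conjugate.
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

v = np.array([1.0, 0.0])
for _ in range(10):
    v = A @ v
print(v[1])                    # phi_10 = 55

lams, _ = np.linalg.eig(A)
print(lams)                    # approximately [1.618, -0.618]
```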
Let $A$ be a $5 \times 5$ matrix with 3 eigenvalues (not counting multiplicities). Suppose we know that one eigenspace is three-dimensional. Can you say if $A$ is diagonalizable?
Give an example of a matrix which cannot be diagonalized. After you constructed the matrix, can you make it “generic”, so no special structure of the matrix could be seen?
Let a non-zero matrix $A$ satisfy $A^2 = 0$. Prove that $A$ cannot be diagonalized. More generally, any non-zero nilpotent matrix, i.e. a non-zero matrix satisfying $A^k = 0$ for some $k$, cannot be diagonalized.
Eigenvalues of a transposition:
Consider the transformation $T$, $T(A) = A^T$, in the space of $2 \times 2$ matrices. Find all its eigenvalues and eigenvectors. Is it possible to diagonalize this transformation? Hint: While it is possible to write a matrix of this linear transformation in some basis, compute the characteristic polynomial, and so on, it is easier to find eigenvalues and eigenvectors directly from the definition.
Can you do the same problem in the space of $n \times n$ matrices?
Prove that two subspaces $V_1$ and $V_2$ are linearly independent if and only if $V_1 \cap V_2 = \{\mathbf{0}\}$.