Let $A$ be a square matrix, and let $p(\lambda) = \det(A - \lambda I)$ be its characteristic polynomial. Then $p(A) = 0$.
The proof looks ridiculously simple: plugging $A$ instead of $\lambda$ in the definition of the characteristic polynomial we get
$$p(A) = \det(A - A\,I) = \det 0 = 0.$$
∎
But this is a wrong proof! To see why, let us analyze what the theorem states. It states that if we compute the characteristic polynomial
$$p(\lambda) = \det(A - \lambda I) = c_n \lambda^{n} + c_{n-1}\lambda^{n-1} + \dots + c_1 \lambda + c_0$$
and then plug the matrix $A$ instead of $\lambda$ to get
$$p(A) = c_n A^{n} + c_{n-1} A^{n-1} + \dots + c_1 A + c_0 I,$$
then the result will be the zero matrix.
It is not clear why we get the same result if we just plug $A$ instead of $\lambda$ in the determinant $\det(A - \lambda I)$. Moreover, it is easy to see that, with the exception of the trivial case of $1 \times 1$ matrices, we will get a different object. Namely, $A - A\,I$ is the zero matrix, and its determinant is just the number $0$. But $p(A)$ is an $n \times n$ matrix, and the theorem claims that this matrix is the zero matrix. Thus we are comparing apples and oranges. Even though in both cases we got zero, these are different zeroes: the number zero and the zero matrix!
Let us present a correct (albeit different from the standard one used in most textbooks) proof, which is based on some ideas from analysis.
The proof is based on several observations. First of all, the theorem is trivial for diagonal matrices, and so for matrices similar to diagonal ones (i.e. for diagonalizable matrices), see Problem 9.1.1 below.
The second observation is that any matrix can be approximated (as closely as we want) by diagonalizable matrices. Since any operator has an upper triangular matrix in some orthonormal basis (see Theorem 6.1.1 in Chapter 6), we can assume without loss of generality that $A$ is an upper triangular matrix.
We can perturb the diagonal entries of $A$ (as little as we want) to make them all different, so that the perturbed matrix $\widetilde A$ is diagonalizable (the eigenvalues of a triangular matrix are its diagonal entries, see Section 4.1.7 in Chapter 4, and by Corollary 4.2.3 in Chapter 4 an $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable).
As I just mentioned, we can perturb the diagonal entries of $A$ as little as we want, so the Frobenius norm $\|A - \widetilde A\|$ is as small as we want. Therefore one can find a sequence of diagonalizable matrices $A_k$ such that $A_k \to A$ as $k \to \infty$ (for example, such that $\|A - A_k\| \le 1/k$). It can be shown that the characteristic polynomials $p_k(\lambda) = \det(A_k - \lambda I)$ converge (coefficient-wise) to the characteristic polynomial $p(\lambda)$ of $A$. Therefore
$$p_k(A_k) \to p(A) \qquad \text{as } k \to \infty.$$
But as we just discussed above, the Cayley–Hamilton Theorem is trivial for diagonalizable matrices, so $p_k(A_k) = 0$ for all $k$. Therefore $p(A) = \lim_{k\to\infty} p_k(A_k) = 0$. ∎
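For readers who like to experiment, the theorem is also easy to check numerically. The following sketch (using NumPy; the random test matrix is just an illustration, not part of the proof) evaluates the characteristic polynomial at the matrix itself and verifies that the result is the zero matrix up to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))        # a "generic" 5x5 matrix
n = A.shape[0]

# Coefficients of the characteristic polynomial det(lambda*I - A), highest degree
# first.  (This differs from det(A - lambda*I) only by a sign, which does not
# matter for the Cayley-Hamilton identity.)
coeffs = np.poly(A)

# p(A) = A^n + c_{n-1} A^{n-1} + ... + c_1 A + c_0 I  (note c_0*I, not the scalar c_0)
p_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))

print(np.linalg.norm(p_of_A))          # ~1e-12: the zero matrix up to round-off
```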
This proof illustrates an important idea: often it is sufficient to consider only a typical, generic situation. It goes beyond the scope of this book, but let us mention, without going into details, that a generic (i.e. typical) matrix is diagonalizable.
This proof is intended for a reader who is comfortable with such ideas from analysis as continuity and convergence (here I mean analysis, i.e. a rigorous treatment of continuity, convergence, etc., and not calculus, which, as it is taught now, is simply a collection of recipes). Such a reader should be able to fill in all the details, and for him/her this proof should look extremely easy and natural.
However, for others, who are not yet comfortable with these ideas, the proof may definitely look strange. It may even look like some kind of cheating, although, let me repeat, it is an absolutely correct and rigorous proof (modulo some standard facts from analysis). So, let us present another proof of the theorem, which is one of the "standard" proofs from linear algebra textbooks.
We know, see Theorem 6.1.1 from Chapter 6, that any square matrix is unitarily equivalent to an upper triangular one. Since for any polynomial $p$ we have $p(U A U^{-1}) = U\, p(A)\, U^{-1}$, and the characteristic polynomials of unitarily equivalent matrices coincide, it is sufficient to prove the theorem only for upper triangular matrices.
So, let $A$ be an upper triangular matrix. We know that the diagonal entries of a triangular matrix coincide with its eigenvalues, so let $\lambda_1, \lambda_2, \dots, \lambda_n$ be the eigenvalues of $A$ ordered as they appear on the diagonal, i.e. $a_{k,k} = \lambda_k$.
The characteristic polynomial of $A$ can be represented as $p(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda)$, so
$$p(A) = (\lambda_1 I - A)(\lambda_2 I - A)\cdots(\lambda_n I - A).$$
Define subspaces $E_k := \operatorname{span}\{e_1, e_2, \dots, e_k\}$, $k = 1, 2, \dots, n$, where $e_1, e_2, \dots, e_n$ is the standard basis in $\mathbb{F}^n$, and set $E_0 := \{0\}$. Since the matrix of $A$ is upper triangular, the subspaces $E_k$ are so-called invariant subspaces of the operator $A$, i.e. $A E_k \subset E_k$ (meaning that $A v \in E_k$ for all $v \in E_k$). Moreover, for any $v \in E_k$ and any $\lambda$
$$(A - \lambda I) v = A v - \lambda v \in E_k,$$
because both $A v$ and $\lambda v$ are in $E_k$. Thus $(A - \lambda I) E_k \subset E_k$, i.e. $E_k$ is an invariant subspace of $A - \lambda I$.
We can say even more about the subspace $E_k$. Namely, $(A - \lambda_k I) e_k \in E_{k-1}$, because only the first $k-1$ entries of the $k$th column of the matrix of $A - \lambda_k I$ can be non-zero. On the other hand, for $j < k$ we have $(A - \lambda_k I) e_j \in E_{k-1}$ (because $E_{k-1}$ is an invariant subspace of $A - \lambda_k I$).
Take any vector $v \in E_k$. By the definition of $E_k$ it can be represented as a linear combination of the vectors $e_1, e_2, \dots, e_k$. Since all these vectors are transformed by $A - \lambda_k I$ to vectors in $E_{k-1}$, we can conclude that
$$(A - \lambda_k I) E_k \subset E_{k-1}. \qquad (9.1.1)$$
Take an arbitrary vector $x \in E_n = \mathbb{F}^n$. Applying (9.1.1) inductively with $k = n, n-1, \dots, 1$ we get
$$p(A) x = (\lambda_1 I - A)(\lambda_2 I - A)\cdots(\lambda_n I - A) x \in E_0.$$
The last inclusion means that $p(A) x \in E_0$. But $E_0 = \{0\}$, so $p(A) x = 0$.
Therefore $p(A) x = 0$ for all $x \in \mathbb{F}^n$, which means exactly that $p(A) = 0$. ∎
As discussed in the above section, the Cayley–Hamilton theorem states that if $A$ is a square matrix and
$$p(\lambda) = \det(A - \lambda I) = c_n \lambda^{n} + c_{n-1}\lambda^{n-1} + \dots + c_1 \lambda + c_0$$
is its characteristic polynomial, then
$$p(A) = c_n A^{n} + c_{n-1} A^{n-1} + \dots + c_1 A + c_0 I = 0$$
(we assume that, by definition, $A^{0} = I$).
Prove this theorem for the special case when $A$ is similar to a diagonal matrix, $A = S D S^{-1}$ with $D$ diagonal.
Hint: If $A = S D S^{-1}$ and $p$ is any polynomial, can you compute $p(A)$ in terms of $p(D)$? What is $p(D)$ for a diagonal $D$?
Let us also recall that for a square matrix (an operator) $A$ and for a polynomial $p(z) = \sum_{k=0}^{N} a_k z^k$ the operator $p(A)$ is defined by substituting $A$ instead of the independent variable,
$$p(A) := \sum_{k=0}^{N} a_k A^{k} = a_0 I + a_1 A + \dots + a_N A^{N};$$
here we agree that $A^{0} := I$.
We know that generally matrix multiplication is not commutative, i.e. generally $AB \neq BA$, so the order is essential. However
$$A^{j} A^{k} = A^{k} A^{j} = A^{j+k},$$
and from here it is easy to show that for arbitrary polynomials $p$ and $q$
$$p(A)\, q(A) = q(A)\, p(A) = R(A),$$
where $R(z) = p(z) q(z)$.
That means that when dealing only with polynomials of an operator $A$, one does not need to worry about non-commutativity, and can act as if $A$ were simply an independent (scalar) variable. In particular, if a polynomial $p$ can be represented as a product of monomials
$$p(z) = a(z - z_1)(z - z_2)\cdots(z - z_N),$$
where $z_1, z_2, \dots, z_N$ are the roots of $p$, then $p(A)$ can be represented as
$$p(A) = a(A - z_1 I)(A - z_2 I)\cdots(A - z_N I).$$
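A quick numerical illustration of the last two formulas (a sketch; the particular polynomial and matrix are arbitrary): evaluating $p(A)$ from the expanded coefficients and from the factored form $a(A - z_1 I)\cdots(A - z_N I)$ gives the same matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
coeffs = np.array([2.0, -3.0, 1.0, -5.0])       # p(z) = 2z^3 - 3z^2 + z - 5
deg = len(coeffs) - 1

# Expanded form: p(A) = 2A^3 - 3A^2 + A - 5I
p_expanded = sum(c * np.linalg.matrix_power(A, deg - k) for k, c in enumerate(coeffs))

# Factored form: p(A) = a (A - z1*I)(A - z2*I)(A - z3*I), z_j being the roots of p
roots = np.roots(coeffs)
I = np.eye(4, dtype=complex)
p_factored = coeffs[0] * np.linalg.multi_dot([A - z * I for z in roots])

print(np.linalg.norm(p_expanded - p_factored))  # ~1e-13
```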
Let us recall that the spectrum $\sigma(A)$ of a square matrix (an operator) $A$ is the set of all eigenvalues of $A$ (not counting multiplicities).
For a square matrix $A$ and an arbitrary polynomial $p$
$$\sigma\bigl(p(A)\bigr) = p\bigl(\sigma(A)\bigr) := \{p(\lambda) : \lambda \in \sigma(A)\}.$$
In other words, $\mu$ is an eigenvalue of $p(A)$ if and only if $\mu = p(\lambda)$ for some eigenvalue $\lambda$ of $A$.
Note that, as stated, this theorem does not say anything about multiplicities of the eigenvalues.
Note that one inclusion is trivial. Namely, if $\lambda$ is an eigenvalue of $A$, i.e. $A v = \lambda v$ for some $v \neq 0$, then $A^{k} v = \lambda^{k} v$, and $p(A) v = p(\lambda) v$, so $p(\lambda)$ is an eigenvalue of $p(A)$. That means that the inclusion $p(\sigma(A)) \subset \sigma(p(A))$ is trivial.
If we consider a particular case of the above theorem, we get the following corollary.
Let $A$ be a square matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, and let $p$ be a polynomial. Then $p(A)$ is invertible if and only if
$$p(\lambda_k) \neq 0 \qquad \text{for all } k = 1, 2, \dots, n.$$
As it was discussed above, the inclusion
$$p(\sigma(A)) \subset \sigma(p(A))$$
is trivial.
To prove the opposite inclusion take a point $\mu \in \sigma(p(A))$. Denote $q(\lambda) := p(\lambda) - \mu$, so $q(A) = p(A) - \mu I$. Since $\mu \in \sigma(p(A))$, the operator $q(A) = p(A) - \mu I$ is not invertible.
Let us represent the polynomial $q$ as a product of monomials,
$$q(\lambda) = a(\lambda - z_1)(\lambda - z_2)\cdots(\lambda - z_N).$$
Then, as it was discussed above in Section 9.2.1, we can represent
$$q(A) = a(A - z_1 I)(A - z_2 I)\cdots(A - z_N I).$$
The operator $q(A)$ is not invertible, so one of the factors $A - z_k I$ must be non-invertible (because a product of invertible transformations is always invertible). That means $z_k \in \sigma(A)$.
On the other hand $z_k$ is a root of $q$, so
$$0 = q(z_k) = p(z_k) - \mu,$$
and therefore $\mu = p(z_k) \in p(\sigma(A))$. So we have proved the inclusion $\sigma(p(A)) \subset p(\sigma(A))$. ∎
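The theorem is also easy to test numerically. A small sketch (the polynomial and the matrix are arbitrary; the naive sorted comparison could need more care if some of the computed values nearly coincide):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

def p(M):
    """p(z) = z^3 - 2z + 1, applied to a square matrix or entrywise to a 1-D array."""
    if M.ndim == 2:
        return np.linalg.matrix_power(M, 3) - 2 * M + np.eye(M.shape[0])
    return M**3 - 2 * M + 1

eig_A = np.linalg.eigvals(A)
eig_pA = np.linalg.eigvals(p(A))

# sigma(p(A)) and p(sigma(A)) coincide (here even with multiplicities)
print(np.allclose(np.sort_complex(p(eig_A)), np.sort_complex(eig_pA)))   # True
```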
An operator $A$ is called nilpotent if $A^{k} = 0$ for some $k$. Prove that if $A$ is nilpotent, then $\sigma(A) = \{0\}$ (i.e. that $0$ is the only eigenvalue of $A$).
Can you do it without using the spectral mapping theorem?
Let $A : V \to V$ be an operator (linear transformation) in a vector space $V$. A subspace $E$ of the vector space $V$ is called an invariant subspace of the operator $A$ (or, shortly, $A$-invariant) if $A E \subset E$, i.e. if $A v \in E$ for all $v \in E$.
If $E$ is $A$-invariant, then
$$A^{2} E = A(A E) \subset A E \subset E,$$
i.e. $E$ is $A^{2}$-invariant.
Similarly one can show (using induction, for example) that if $E$ is $A$-invariant, then
$$A^{k} E \subset E \qquad \text{for all } k \ge 1.$$
This implies that $p(A) E \subset E$ for any polynomial $p$, i.e. that:
any $A$-invariant subspace $E$ is an invariant subspace of $p(A)$.
If $E$ is an $A$-invariant subspace, then for all $v \in E$ the result $A v$ also belongs to $E$. Therefore we can treat $A$ as an operator acting on $E$, not on the whole space $V$.
Formally, for an $A$-invariant subspace $E$ we define the so-called restriction $A|_E : E \to E$ of $A$ onto $E$ by
$$\bigl(A|_E\bigr) v = A v \qquad \text{for all } v \in E.$$
Here we changed the domain and the target space of the operator, but the rule assigning the value to the argument remains the same.
We will need the following simple lemma.
Let $p$ be a polynomial, and let $E$ be an $A$-invariant subspace. Then
$$p(A)|_E = p\bigl(A|_E\bigr).$$
The proof is trivial. ∎
If $V_1, V_2, \dots, V_r$ is a basis of $A$-invariant subspaces, and $A_k := A|_{V_k}$ are the corresponding restrictions, then, since $A V_k \subset V_k$, the operators $A_k$ act independently of each other (do not interact), and to analyze the action of $A$ we can analyze the operators $A_k$ separately.
In particular, if we pick a basis in each subspace $V_k$ and join them to get a basis in $V$ (see Theorem 4.2.6 from Chapter 4), then the operator $A$ will have in this basis the following block-diagonal form
$$A = \operatorname{diag}\{A_1, A_2, \dots, A_r\} = \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{pmatrix}$$
(of course, here we need the correct ordering of the basis in $V$: first we take a basis in $V_1$, then in $V_2$, and so on).
Our goal now is to pick a basis of invariant subspaces such that the restrictions $A_k$ have a simple structure. In this case we will get a basis in which the matrix of $A$ has a simple structure.
The eigenspaces $\operatorname{Ker}(A - \lambda_k I)$ would be good candidates, because the restriction of $A$ to the eigenspace $\operatorname{Ker}(A - \lambda_k I)$ is simply $\lambda_k I$. Unfortunately, as we know, eigenspaces do not always form a basis of subspaces (they form one if and only if $A$ can be diagonalized, cf. Theorem 4.2.1 in Chapter 4).
However, the so-called generalized eigenspaces will work.
A vector $v$ is called a generalized eigenvector of $A$ (corresponding to an eigenvalue $\lambda$) if $(A - \lambda I)^{k} v = 0$ for some $k \ge 1$.
The collection $E_\lambda$ of all generalized eigenvectors corresponding to $\lambda$, together with the zero vector $0$, is called the generalized eigenspace (corresponding to the eigenvalue $\lambda$).
In other words, one can represent the generalized eigenspace as
$$E_\lambda = \bigcup_{k \ge 1} \operatorname{Ker}(A - \lambda I)^{k}. \qquad (9.3.1)$$
The sequence $\operatorname{Ker}(A - \lambda I)^{k}$, $k = 1, 2, 3, \dots$, is an increasing sequence of subspaces, i.e.
$$\operatorname{Ker}(A - \lambda I) \subset \operatorname{Ker}(A - \lambda I)^{2} \subset \operatorname{Ker}(A - \lambda I)^{3} \subset \cdots$$
The representation (9.3.1) does not look very simple, for it involves an infinite union. However, the sequence of the subspaces $\operatorname{Ker}(A - \lambda I)^{k}$ stabilizes, i.e.
$$\operatorname{Ker}(A - \lambda I)^{k} = \operatorname{Ker}(A - \lambda I)^{k+1} = \operatorname{Ker}(A - \lambda I)^{k+2} = \cdots \qquad \text{for all sufficiently large } k,$$
so, in fact, one can take a finite union.
To show that the sequence of kernels stabilizes, let us notice that if for finite-dimensional subspaces $E$ and $F$ we have $E \subsetneq F$ (the symbol $\subsetneq$ means that $E \subset F$ but $E \neq F$), then $\dim E < \dim F$. Since $\dim \operatorname{Ker}(A - \lambda I)^{k} \le \dim V < \infty$, the sequence of dimensions cannot grow to infinity, so at some point
$$\operatorname{Ker}(A - \lambda I)^{k} = \operatorname{Ker}(A - \lambda I)^{k+1}.$$
The rest follows from the lemma below.
Let $\operatorname{Ker}(A - \lambda I)^{k} = \operatorname{Ker}(A - \lambda I)^{k+1}$ for some $k$.
Then
$$\operatorname{Ker}(A - \lambda I)^{k+r} = \operatorname{Ker}(A - \lambda I)^{k} \qquad \text{for all } r \ge 0.$$
Let $v \in \operatorname{Ker}(A - \lambda I)^{k+2}$, i.e. $(A - \lambda I)^{k+2} v = 0$. Then
$$(A - \lambda I)^{k+1}\bigl((A - \lambda I) v\bigr) = 0.$$
But we know that $\operatorname{Ker}(A - \lambda I)^{k+1} = \operatorname{Ker}(A - \lambda I)^{k}$, so $(A - \lambda I)^{k}\bigl((A - \lambda I) v\bigr) = 0$, which means $(A - \lambda I)^{k+1} v = 0$. Recalling the definition of the kernel we get that
$$v \in \operatorname{Ker}(A - \lambda I)^{k+1},$$
so $\operatorname{Ker}(A - \lambda I)^{k+2} \subset \operatorname{Ker}(A - \lambda I)^{k+1}$. We proved that $\operatorname{Ker}(A - \lambda I)^{k+2} = \operatorname{Ker}(A - \lambda I)^{k+1}$; the opposite inclusion is trivial. Repeating this reasoning we get the statement for all $r$. ∎
The number $d_\lambda$ at which the sequence $\operatorname{Ker}(A - \lambda I)^{k}$ stabilizes, i.e. the number $d_\lambda$ such that
$$\operatorname{Ker}(A - \lambda I)^{d_\lambda - 1} \subsetneq \operatorname{Ker}(A - \lambda I)^{d_\lambda} = \operatorname{Ker}(A - \lambda I)^{d_\lambda + 1},$$
is called the depth of the eigenvalue $\lambda$.
It follows from the definition of the depth that for the generalized eigenspace $E_\lambda$
$$E_\lambda = \operatorname{Ker}(A - \lambda I)^{d_\lambda}, \qquad \text{so} \qquad (A - \lambda I)^{d_\lambda} v = 0 \quad \text{for all } v \in E_\lambda. \qquad (9.3.2)$$
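In floating-point arithmetic the depth and the generalized eigenspace can be found by watching when the kernel dimensions stop growing. The sketch below (our own helper; the rank decision needs a tolerance, so treat it as an illustration only) returns the depth $d_\lambda$ and $\dim E_\lambda$.

```python
import numpy as np

def depth_and_dim(A, lam, tol=1e-9):
    """Depth d of the eigenvalue lam and dim Ker (A - lam*I)^d,
    found by increasing the power until the kernel dimension stabilizes."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    P, prev_dim, k = np.eye(n), 0, 0
    while True:
        k += 1
        P = P @ B                                      # P = B^k
        dim = n - np.linalg.matrix_rank(P, tol=tol)    # dim Ker B^k
        if dim == prev_dim:
            return k - 1, dim                          # stabilized one step earlier
        prev_dim = dim

# A single 4x4 Jordan block with eigenvalue 2: depth 4, E_2 is the whole space
J = 2 * np.eye(4) + np.diag(np.ones(3), 1)
print(depth_and_dim(J, 2.0))                           # (4, 4)
```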
Now let us summarize what we know about the generalized eigenspaces $E_\lambda$.
a) $E_\lambda$ is an invariant subspace of $A$: $A E_\lambda \subset E_\lambda$.
b) The restriction $(A - \lambda I)|_{E_\lambda}$ is nilpotent; namely, $\bigl((A - \lambda I)|_{E_\lambda}\bigr)^{d_\lambda} = 0$, see (9.3.2).
c) $\sigma(A|_{E_\lambda}) = \{\lambda\}$, because the operator $(A - \lambda I)|_{E_\lambda}$ is nilpotent, see b), and the spectrum of a nilpotent operator consists of the single point $0$, see Problem 9.2.1.
Now we are ready to state the main result of this section. Let $A : V \to V$ be an operator acting in a finite-dimensional vector space $V$.
Let $\sigma(A)$ consist of $r$ points $\lambda_1, \lambda_2, \dots, \lambda_r$, and let $E_k := E_{\lambda_k}$ be the corresponding generalized eigenspaces. Then the system of subspaces $E_1, E_2, \dots, E_r$ is a basis of subspaces in $V$.
Let $m_k$ be the multiplicity of the eigenvalue $\lambda_k$, so $p(\lambda) = \prod_{k=1}^{r} (\lambda_k - \lambda)^{m_k}$ is the characteristic polynomial of $A$. Define
$$p_k(\lambda) := p(\lambda)\big/(\lambda_k - \lambda)^{m_k} = \prod_{j \neq k} (\lambda_j - \lambda)^{m_j}. \qquad (9.3.3)$$
Define $A_k := A|_{E_k}$. By property c) of the generalized eigenspaces $\sigma(A_k) = \{\lambda_k\}$. Since $p_k(\lambda_k) \neq 0$, the Spectral Mapping Theorem, see Corollary 9.2.2, implies that the operator $p_k(A_k)$ is invertible.
Continuing with the proof of Theorem 9.3.4, define the polynomial $q := p_1 + p_2 + \dots + p_r$. Since $p_k(\lambda_j) = 0$ for $j \neq k$ and $p_j(\lambda_j) \neq 0$, we can conclude that $q(\lambda_j) = p_j(\lambda_j) \neq 0$ for all $j$. Therefore, by the Spectral Mapping Theorem, see Corollary 9.2.2, the operator
$$q(A) = p_1(A) + p_2(A) + \dots + p_r(A)$$
is invertible.
Define the operators $P_k$, $k = 1, 2, \dots, r$, by
$$P_k := p_k(A)\, q(A)^{-1} = q(A)^{-1} p_k(A)$$
(the two expressions coincide because $q(A)^{-1}$ commutes with every polynomial of $A$).
For the operators $P_k$ defined above:
a) $P_1 + P_2 + \dots + P_r = I$;
b) $P_k v = 0$ for $v \in E_j$, $j \neq k$;
c) $P_k v \in E_k$ for all $v \in V$;
d) moreover, $P_k v = v$ for all $v \in E_k$, so, in fact, $P_k V = E_k$.
Property a) is trivial:
$$\sum_{k=1}^{r} P_k = \Bigl(\sum_{k=1}^{r} p_k(A)\Bigr) q(A)^{-1} = q(A)\, q(A)^{-1} = I.$$
Property b) follows from (9.3.3). Indeed, $p_k$ contains the factor $(\lambda_j - \lambda)^{m_j}$, the restriction of which to $E_j$ is zero (the multiplicity $m_j$ is at least the depth $d_{\lambda_j}$, so $(\lambda_j I - A)^{m_j}$ annihilates $E_j$ by (9.3.2)). Therefore $p_k(A)|_{E_j} = 0$ and thus $P_k|_{E_j} = q(A)^{-1} p_k(A)|_{E_j} = 0$.
To prove property c), recall that according to the Cayley–Hamilton Theorem $p(A) = 0$. Since $p(\lambda) = (\lambda_k - \lambda)^{m_k} p_k(\lambda)$, we have for all $v \in V$
$$(\lambda_k I - A)^{m_k} p_k(A) v = p(A) v = 0.$$
That means that any vector in $\operatorname{Ran} p_k(A)$ is annihilated by some power of $A - \lambda_k I$, which by definition means that $\operatorname{Ran} p_k(A) \subset E_k$. Since $P_k = p_k(A)\, q(A)^{-1}$, we conclude that $P_k v \in E_k$ for all $v \in V$.
Now we are ready to complete the proof of the theorem. Take $v \in V$ and define $v_k := P_k v$. Then according to Statement c) of Lemma 9.3.7, $v_k \in E_k$, and by Statement a),
$$v = \sum_{k=1}^{r} P_k v = \sum_{k=1}^{r} v_k,$$
so $v$ admits a representation as a linear combination of vectors $v_k \in E_k$.
To show that this representation is unique, we can just note that if $v$ is represented as $v = \sum_{k=1}^{r} v_k'$, $v_k' \in E_k$, then it follows from Statements b) and d) of Lemma 9.3.7 that
$$P_j v = \sum_{k=1}^{r} P_j v_k' = P_j v_j' = v_j',$$
i.e. $v_j' = P_j v = v_j$ for all $j$.
∎
The algebraic multiplicity of an eigenvalue equals the dimension of the corresponding generalized eigenspace.
According to Remark 9.3.5, if we join bases in the generalized eigenspaces to get a basis in the whole space, the matrix of $A$ in any such basis has a block-diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = A|_{E_{\lambda_k}} = \lambda_k I_k + N_k$. The operators $N_k$ are nilpotent, so $\sigma(N_k) = \{0\}$. Therefore, the spectrum of the operator $A_k$ (recall that $A_k = \lambda_k I_k + N_k$) consists of the single eigenvalue $\lambda_k$ of (algebraic) multiplicity $\dim E_{\lambda_k}$. The multiplicity equals $\dim E_{\lambda_k}$ because an operator in a finite-dimensional space has exactly as many eigenvalues, counting multiplicities, as the dimension of the space, and $A_k$ has only one eigenvalue.
Note that we are free to pick the bases in the generalized eigenspaces $E_{\lambda_k}$, so let us pick them in such a way that the corresponding blocks $A_k$ are upper triangular. Then
$$\det(A - \lambda I) = \prod_{k=1}^{r} \det(A_k - \lambda I_k) = \prod_{k=1}^{r} (\lambda_k - \lambda)^{\dim E_{\lambda_k}}.$$
But this means that the algebraic multiplicity of the eigenvalue $\lambda_k$ is exactly $\dim E_{\lambda_k}$. ∎
The following corollary is very important for differential equations.
Any operator $A$ in $V$ can be represented as $A = D + N$, where $D$ is diagonalizable (i.e. diagonal in some basis) and $N$ is nilpotent ($N^{m} = 0$ for some $m$), and $D N = N D$.
As we discussed above, see Remark 9.3.5, if we join the bases in the generalized eigenspaces $E_{\lambda_k}$ to get a basis in $V$, then in this basis $A$ has the block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = \lambda_k I_k + N_k$, $k = 1, 2, \dots, r$. The operators $N_k$ are nilpotent, and the operator $D := \operatorname{diag}\{\lambda_1 I_1, \lambda_2 I_2, \dots, \lambda_r I_r\}$ is diagonal (in this basis). Notice also that $\lambda_k I_k N_k = N_k \lambda_k I_k$ (the identity operator commutes with any operator), so the block diagonal operator $D$ commutes with the block diagonal operator $N := \operatorname{diag}\{N_1, N_2, \dots, N_r\}$. Therefore, defining $N$ in this way, we get the desired decomposition. ∎
This corollary allows us to compute functions of operators. Let us recall that if $f$ is a polynomial of degree $d$, then $f(a + x)$ can be computed with the help of Taylor's formula
$$f(a + x) = \sum_{k=0}^{d} \frac{f^{(k)}(a)}{k!}\, x^{k}.$$
This formula is an algebraic identity, meaning that for each polynomial $f$ we can check that the formula is true using formal algebraic manipulations with $a$ and $x$, not caring about their nature.
Since the operators $D$ and $N$ commute, $D N = N D$, the same rules as for the usual (scalar) variables apply to them, and we can write (plugging $D$ instead of $a$ and $N$ instead of $x$)
$$f(A) = f(D + N) = \sum_{k=0}^{d} \frac{f^{(k)}(D)}{k!}\, N^{k}.$$
Here, to compute $f^{(k)}(D)$ we first compute the $k$th derivative of the polynomial $f$ (using the usual rules from calculus), and then plug $D$ instead of the independent variable.
But since $N$ is nilpotent, $N^{m} = 0$ for some $m$, only the first $m$ terms can be non-zero, so
$$f(D + N) = \sum_{k=0}^{m-1} \frac{f^{(k)}(D)}{k!}\, N^{k}.$$
If $m$ is much smaller than $d$, this formula makes the computation of $f(A)$ much easier.
The same approach works if $f$ is not a polynomial but an infinite power series. For general power series we have to be careful about convergence of all the series involved, so we cannot say that the formula is true for an arbitrary power series $f$. However, if the radius of convergence of the power series is $\infty$, then everything works fine. In particular, if $f(x) = e^{x}$, then, using the fact that $(e^{x})' = e^{x}$, we get
$$e^{A} = e^{D + N} = \sum_{k=0}^{m-1} \frac{e^{D}}{k!}\, N^{k} = e^{D} \sum_{k=0}^{m-1} \frac{N^{k}}{k!}.$$
This formula has important applications in differential equations.
Note that the fact that $D N = N D$ is essential here!
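Here is a minimal numerical illustration of the exponential formula (a sketch: $A$ is a single $3 \times 3$ Jordan block, so $D = \lambda I$ and $N$ is its nilpotent part, and the same decomposition applies to $tA = tD + tN$; SciPy's general-purpose expm is used only as an independent check).

```python
import numpy as np
from scipy.linalg import expm

t, lam = 0.7, 2.0
N = np.diag([1.0, 1.0], k=1)           # nilpotent part: N^3 = 0
A = lam * np.eye(3) + N                # A = D + N with D = lam*I, and DN = ND

# e^{tA} = e^{tD} (I + tN + (tN)^2/2): the series terminates because N is nilpotent
etA = np.exp(lam * t) * (np.eye(3) + t * N + (t * N) @ (t * N) / 2)

print(np.linalg.norm(etA - expm(t * A)))   # ~1e-15
```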
Recall that an operator $A$ in a vector space $V$ is called nilpotent if $A^{k} = 0$ for some exponent $k$.
In the previous section we proved, see Remark 9.3.5, that if we join the bases in all generalized eigenspaces to get a basis in the whole space, then the operator $A$ has in this basis a block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, and the operators $A_k$ can be represented as $A_k = \lambda_k I_k + N_k$, where the $N_k$ are nilpotent operators.
In each generalized eigenspace we want to pick a basis such that the matrix of $A_k$ in this basis has the simplest possible form. Since the matrix (in any basis) of the identity operator is the identity matrix, we need to find a basis in which the nilpotent operator $N_k$ has a simple form.
Since we can deal with each $N_k$ separately, we will need to consider the following problem:
For a nilpotent operator $A$ find a basis such that the matrix of $A$ in this basis is simple.
Let us see what it means for a matrix to have a simple form. It is easy to see that the $p \times p$ matrix
$$\begin{pmatrix}
0 & 1 & & & \\
 & 0 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & 1 \\
 & & & & 0
\end{pmatrix} \qquad (9.4.1)$$
(the entries on the diagonal right above the main one are $1$, and all other entries are $0$) is nilpotent.
These matrices (together with zero matrices) will be our "building blocks". Namely, we will show that for any nilpotent operator one can find a basis such that the matrix of the operator in this basis has the block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where each $A_k$ is either a block of the form (9.4.1) or a zero block.
Let us see what we should be looking for. Suppose the matrix of an operator $A$ has in a basis $v_1, v_2, \dots, v_p$ the form (9.4.1). Then
$$A v_1 = 0 \qquad (9.4.2)$$
and
$$A v_k = v_{k-1}, \qquad k = 2, 3, \dots, p. \qquad (9.4.3)$$
Thus we have to be looking for chains of vectors $v_1, v_2, \dots, v_p$ satisfying the above relations (9.4.2), (9.4.3). Such a chain is called a cycle of generalized eigenvectors of the nilpotent operator $A$; the vector $v_1$ is called the initial vector, $v_p$ the end vector, and $p$ the length of the cycle.
A similar definition can be made for an arbitrary operator $A$ and an eigenvalue $\lambda$. Then all vectors of the cycle must belong to the same generalized eigenspace $E_\lambda$, and they must satisfy the identities
$$(A - \lambda I) v_1 = 0, \qquad (A - \lambda I) v_k = v_{k-1}, \quad k = 2, 3, \dots, p.$$
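For the block (9.4.1) itself the standard basis plays the role of such a cycle, which is easy to confirm numerically (a two-line NumPy check of (9.4.2) and (9.4.3); the size $p = 4$ is arbitrary):

```python
import numpy as np

p = 4
A = np.diag(np.ones(p - 1), k=1)   # the p x p block (9.4.1)
e = np.eye(p)                      # standard basis e_1, ..., e_p as columns

print(np.allclose(A @ e[:, 0], 0))                        # A e_1 = 0        (9.4.2)
print(all(np.allclose(A @ e[:, k], e[:, k - 1])           # A e_k = e_{k-1}  (9.4.3)
          for k in range(1, p)))
```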
Let $A$ be a nilpotent operator, and let $C_1, C_2, \dots, C_r$ be cycles of its generalized eigenvectors, $C_k = \{v^k_1, v^k_2, \dots, v^k_{p_k}\}$, $p_k$ being the length of the cycle $C_k$. Assume that the initial vectors $v^1_1, v^2_1, \dots, v^r_1$ are linearly independent. Then no vector belongs to two cycles, and the union of all the vectors from all the cycles is a linearly independent system.
Let $n$ be the total number of vectors in all the cycles (here we just count the vectors in each cycle and add all the numbers; we do not care if some cycles have a common vector, we count this vector in each cycle it belongs to; of course, according to the theorem, this is impossible, but initially we cannot assume that). We will use induction on $n$. If $n = 1$ the theorem is trivial.
Let us now assume that the theorem is true for all operators and for all collections of cycles, as long as the total number of vectors in all the cycles is strictly less than $n$.
Without loss of generality we can assume that the vectors from the cycles span the whole space $V$, because otherwise we can consider, instead of the operator $A$, its restriction onto the invariant subspace spanned by all the vectors from the cycles.
Consider the subspace $\operatorname{Ran} A$. It follows from the relations (9.4.2), (9.4.3) that the vectors $v^k_j$, $1 \le j \le p_k - 1$, span $\operatorname{Ran} A$. Note that if $p_k > 1$ then the system $v^k_1, v^k_2, \dots, v^k_{p_k - 1}$ is again a cycle, and that $A$ annihilates any cycle of length 1.
Therefore, we have finitely many cycles whose initial vectors are linearly independent, so the induction hypothesis applies, and the vectors $v^k_j$, $1 \le j \le p_k - 1$, $k = 1, 2, \dots, r$, are linearly independent. Since these vectors also span $\operatorname{Ran} A$, they form a basis there. Therefore,
$$\operatorname{rank} A = \dim \operatorname{Ran} A = n - r$$
(we had $n$ vectors, and we removed one vector from each cycle $C_k$, $k = 1, 2, \dots, r$, so we have $n - r$ vectors in the basis of $\operatorname{Ran} A$). On the other hand $A v^k_1 = 0$ for $k = 1, 2, \dots, r$, and since these initial vectors are linearly independent, $\dim \operatorname{Ker} A \ge r$. By the Rank Theorem (Theorem 2.7.2 from Chapter 2)
$$\dim V = \operatorname{rank} A + \dim \operatorname{Ker} A \ge (n - r) + r = n,$$
so $\dim V \ge n$.
On the other hand $V$ is spanned by the $n$ vectors from the cycles, so $\dim V \le n$. Thus $\dim V = n$, and therefore the $n$ vectors $v^k_j$, $1 \le j \le p_k$, $1 \le k \le r$, form a basis, so they are linearly independent. ∎
Let $A : V \to V$ be a nilpotent operator. Then $V$ has a basis consisting of a union of cycles of generalized eigenvectors of the operator $A$.
We will use induction on $n$, where $n = \dim V$. For $n = 1$ the theorem is trivial.
Assume that the theorem is true for any operator acting in a space of dimension strictly less than $n$.
Consider the subspace $X := \operatorname{Ran} A$. $X$ is an invariant subspace of the operator $A$, so we can consider the restriction $A|_X$.
Since $A$ is not invertible, $\dim X = \operatorname{rank} A < n$, so by the induction hypothesis there exist cycles $C_1, C_2, \dots, C_r$ of generalized eigenvectors of $A|_X$ such that their union is a basis in $X$. Let $C_k = \{v^k_1, v^k_2, \dots, v^k_{p_k}\}$, where $v^k_1$ is the initial vector of the cycle.
Since the end vector $v^k_{p_k}$ belongs to $X = \operatorname{Ran} A$, one can find a vector $w_k$ such that $A w_k = v^k_{p_k}$. So we can extend each cycle $C_k$ to a bigger cycle $\widetilde C_k := \{v^k_1, \dots, v^k_{p_k}, w_k\}$. Since the initial vectors of the cycles $\widetilde C_k$, $k = 1, 2, \dots, r$, are linearly independent, the above Theorem 9.4.1 implies that the union of these cycles is a linearly independent system.
By the definition of a cycle we have $v^k_1 \in \operatorname{Ker} A$, and we assumed that the initial vectors $v^k_1$, $k = 1, 2, \dots, r$, are linearly independent. Let us complete this system to a basis in $\operatorname{Ker} A$, i.e. let us find vectors $u_1, u_2, \dots, u_s$ such that the system $v^1_1, \dots, v^r_1, u_1, \dots, u_s$ is a basis in $\operatorname{Ker} A$ (it may happen that the system $v^k_1$, $k = 1, 2, \dots, r$, is already a basis in $\operatorname{Ker} A$, in which case we put $s = 0$ and add nothing).
Each vector $u_j$ can be treated as a cycle of length 1, so we get a collection of cycles $\widetilde C_1, \dots, \widetilde C_r, \{u_1\}, \dots, \{u_s\}$ whose initial vectors are linearly independent. So, we can apply Theorem 9.4.1 to conclude that the union of all these cycles is a linearly independent system.
To show that it is a basis, let us count the dimensions. We know that the cycles $C_1, C_2, \dots, C_r$ have $\dim X = \operatorname{rank} A$ vectors total. Each cycle $\widetilde C_k$ was obtained from $C_k$ by adding 1 vector to it, so the total number of vectors in the cycles $\widetilde C_k$ is $\operatorname{rank} A + r$.
We know that $r + s = \dim \operatorname{Ker} A$ (because $v^1_1, \dots, v^r_1, u_1, \dots, u_s$ is a basis there). Adding to the cycles $\widetilde C_k$ the $s$ vectors $u_1, \dots, u_s$, we get
$$\operatorname{rank} A + r + s = \operatorname{rank} A + \dim \operatorname{Ker} A = \dim V = n$$
linearly independent vectors. But $n$ linearly independent vectors in an $n$-dimensional space form a basis. ∎
A basis consisting of a union of cycles of generalized eigenvectors of a nilpotent operator $A$ (the existence of which is guaranteed by Theorem 9.4.2) is called a Jordan canonical basis for $A$.
Note that such a basis is not unique.
Let $A : V \to V$ be a nilpotent operator. There exists a basis (a Jordan canonical basis) such that the matrix of $A$ in this basis is a block diagonal matrix $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where all $A_k$ (except maybe one) are blocks of the form (9.4.1), and one of the blocks can be zero.
The matrix of $A$ in a Jordan canonical basis is called the Jordan canonical form of the operator $A$. We will see later that the Jordan canonical form is unique, if we agree on how to order the blocks (i.e. on how to order the vectors in the basis).
According to Theorem 9.4.2 one can find a basis consisting of a union of cycles of generalized eigenvectors. A cycle of length $p > 1$ gives rise to a $p \times p$ diagonal block of the form (9.4.1), and a cycle of length 1 corresponds to a $1 \times 1$ zero block. We can join these zero blocks into one large zero block (because the corresponding off-diagonal entries are $0$). ∎
There is a good way of visualizing Theorem 9.4.2 and Corollary 9.4.3, the so-called dot diagrams. This method also allows us to answer many natural questions, for example: "is the block diagonal representation given by Corollary 9.4.3 unique?"
Of course, if we treat this question literally, the answer is "no", for we can always change the order of the blocks. But, if we exclude such trivial possibilities, for example by agreeing on some order of the blocks (say, we put all non-zero blocks in decreasing order of size, and then put the zero block), is the representation unique, or not?
To better understand the structure of nilpotent operators described in Section 9.4.1, let us draw the so-called dot diagram. Namely, suppose we have a basis which is a union of cycles of generalized eigenvectors. Let us represent the basis by an array of dots, so that each column represents a cycle. The first row (i.e. the top one in Figure 9.1 below) corresponds to the initial vectors of the cycles, and we arrange the columns (cycles) by their length, putting the longest one to the left. The dots in the $k$th column, going from top to bottom, correspond to the vectors of the $k$th cycle.
Usually one just presents a dot diagram without specifying the corresponding vectors. But sometimes it is beneficial to consider a marked dot diagram, where one marks the dots with the corresponding vectors of the cycles.
In Figure 9.1 we show the dot diagram of a nilpotent operator, as well as its Jordan canonical form. This dot diagram shows that the basis has one cycle of length 5, one cycle of length 3, two cycles of length 2, and two cycles of length 1. The cycle of length 5 corresponds to the $5 \times 5$ block of the matrix, the cycle of length 3 corresponds to the $3 \times 3$ block, and the two cycles of length 2 correspond to two $2 \times 2$ blocks. The two cycles of length 1 correspond to two zero entries on the diagonal, which we join into a $2 \times 2$ zero block. Here in each block only the main diagonal and the diagonal above it are shown; all other entries of the matrix are zero.
If we agree on the ordering of the blocks, there is a one-to-one correspondence between dot diagrams and Jordan canonical forms (for nilpotent operators). So, the question about uniqueness of the Jordan canonical form is equivalent to the question about uniqueness of the dot diagram.
To answer the question about uniqueness of the dot diagram, let us analyze how the operator $A$ transforms it. Assume first that we are given a marked dot diagram, meaning that we mark each dot with the corresponding vector of the cycle. In this case we can get both the Jordan canonical form and a Jordan canonical basis from such a dot diagram.
Since the operator $A$ annihilates the initial vectors of the cycles, and moves the vector $v_k$ of a cycle to the vector $v_{k-1}$, we can see that the operator $A$ acts on its dot diagram by removing the last (bottom) dot in every column. An equivalent description would be to remove the first (top) row, but then we require that the top dot in each column again corresponds to the initial vector of the (shortened) cycle.
The new dot diagram will be the (marked) dot diagram of the operator $A$ restricted to the $A$-invariant subspace $\operatorname{Ran} A$, and it gives us the Jordan canonical representation for the restriction $A|_{\operatorname{Ran} A}$ (both the Jordan canonical basis and the matrix, i.e. the Jordan canonical form).
Similarly, it is not hard to see that the operator $A^{k}$ removes the first $k$ rows of the dot diagram. Therefore, if for all $k$ we know the dimensions $\dim \operatorname{Ker}(A^{k})$, we know the dot diagram of the operator $A$. Namely, the number of dots in the first row is $\dim \operatorname{Ker} A$, the number of dots in the second row is
$$\dim \operatorname{Ker}(A^{2}) - \dim \operatorname{Ker}(A),$$
and the number of dots in the $k$th row is
$$\dim \operatorname{Ker}(A^{k}) - \dim \operatorname{Ker}(A^{k-1}).$$
But this means that the dot diagram, which was initially defined using a Jordan canonical basis, does not depend on the particular choice of such a basis. Therefore, the dot diagram is unique! This implies that if we agree on the order of the blocks, then the Jordan canonical form is unique.
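The row counts above translate directly into a procedure for recovering the dot diagram, and hence the block sizes, from the matrix alone. A sketch (our own helper functions; it assumes the input is nilpotent and uses a floating-point rank with a tolerance):

```python
import numpy as np

def dot_diagram_rows(N, tol=1e-9):
    """Number of dots in each row: dim Ker N^k - dim Ker N^(k-1), for a nilpotent N."""
    n = N.shape[0]
    ker_dims, P = [0], np.eye(n)
    while ker_dims[-1] < n:
        P = P @ N
        ker_dims.append(n - np.linalg.matrix_rank(P, tol=tol))
    return [ker_dims[k] - ker_dims[k - 1] for k in range(1, len(ker_dims))]

def block_sizes(rows):
    """Column heights of the dot diagram = sizes of the Jordan blocks."""
    return [sum(1 for r in rows if r >= j + 1) for j in range(rows[0])]

def nilpotent_with_blocks(sizes):
    """Block diagonal nilpotent matrix with blocks of the form (9.4.1) of the given sizes."""
    N, start = np.zeros((sum(sizes), sum(sizes))), 0
    for s in sizes:
        N[start:start + s - 1, start + 1:start + s] += np.eye(s - 1)
        start += s
    return N

N = nilpotent_with_blocks([3, 2, 2, 1])
rows = dot_diagram_rows(N)
print(rows, block_sizes(rows))   # [4, 3, 1] [3, 2, 2, 1]
```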
Let us say a few words about computing a Jordan canonical basis for a nilpotent operator $A$. We have already proved the existence of such a basis in Section 9.4.2, so we can use the Jordan canonical form and the dot diagram to guide us through the construction.
Let $p_1$ be the largest positive integer such that $A^{p_1 - 1} \neq 0$ (so $A^{p_1} = 0$); equivalently, $p_1$ is the smallest positive integer such that $A^{p_1} = 0$. One can see from the above analysis of dot diagrams, see Section 9.4.3, that $p_1$ is the length of the longest cycle (so this also can be used as the definition of $p_1$).
Computing the operators $A^{k}$, $k = 1, 2, \dots, p_1$, and counting the dimensions $\dim \operatorname{Ker}(A^{k})$, we can construct the dot diagram of $A$. Namely, $\dim \operatorname{Ker} A$ gives us the number of dots in the first (top) row, $\dim \operatorname{Ker}(A^{2}) - \dim \operatorname{Ker}(A)$ the number of dots in the second row, and so on.
Now we want to put vectors instead of dots and find a basis which is a union of cycles.
We start by finding the longest cycles (because we know the dot diagram, we know how many such cycles there should be, and what the length of each cycle is). Consider the column space $\operatorname{Ran}(A^{p_1 - 1})$. Since $A^{p_1} = 0$, there holds $A\bigl(\operatorname{Ran}(A^{p_1 - 1})\bigr) = \{0\}$, so $\operatorname{Ran}(A^{p_1 - 1}) \subset \operatorname{Ker} A$. To be in agreement with the notation in the next steps we will use the notation $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_1 - 1})$ instead of $\operatorname{Ran}(A^{p_1 - 1})$ (for this exponent the two subspaces coincide).
Let $v^1_1, v^2_1, \dots, v^{r_1}_1$ be a basis in the subspace $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_1 - 1})$; the vectors $v^k_1$ will be the initial vectors of the cycles. Then we find the end vectors $v^k_{p_1}$ of the cycles by solving the equations
$$A^{p_1 - 1} v^k_{p_1} = v^k_1, \qquad k = 1, 2, \dots, r_1.$$
Applying consecutively the operator $A$ to the end vector $v^k_{p_1}$, we get all the vectors $v^k_j$ in the cycle. Thus, we have constructed all cycles of maximal length.
Alternatively, for each $k$ we can find the vectors $v^k_2, v^k_3, \dots, v^k_{p_1}$ by consecutively solving the equations
$$A v^k_{j+1} = v^k_j, \qquad j = 1, 2, \dots, p_1 - 1,$$
thus constructing all cycles of maximal length.
Let $p_2$ be the length of a maximal cycle among those that are left to find. Consider the subspace $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_2 - 1})$, and let $r$ be its dimension. Since $\operatorname{Ran}(A^{p_1 - 1}) \subset \operatorname{Ran}(A^{p_2 - 1})$, we conclude that the initial vectors $v^1_1, v^2_1, \dots, v^{r_1}_1$ of the already constructed cycles belong to this subspace, so $r \ge r_1$. Moreover (one can also see from the dot diagram that for exponents $p$ with $p_2 < p < p_1$ no new cycles appear, so $p_2$ is the largest exponent for which we need to complete the basis), it can be seen from the dot diagram that $r$ equals the number of cycles of length at least $p_2$. Therefore we can complete the system $v^1_1, v^2_1, \dots, v^{r_1}_1$ to a basis in $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_2 - 1})$; the added vectors $u^1, u^2, \dots, u^{r_2}$ will serve as the initial vectors of the cycles of length $p_2$. Then we find the end vectors of these cycles by solving (for $w^k$) the equations
$$A^{p_2 - 1} w^k = u^k, \qquad k = 1, 2, \dots, r_2,$$
thus constructing the cycles of length $p_2$.
Let $p_3$ denote the length of a maximal cycle among the ones left. Then, completing the system of initial vectors constructed so far to a basis in $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_3 - 1})$, we construct the cycles of length $p_3$, and so on…
One final remark: as we discussed above, if we know the dot diagram, we know the Jordan canonical form, so after we have found a Jordan canonical basis, we do not need to compute the matrix of $A$ in this basis: we already know it!
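In exact arithmetic this whole construction can also be delegated to a computer algebra system. A minimal sketch using SymPy (its Matrix.jordan_form method returns a transition matrix $P$ and the Jordan form $J$ with $A = P J P^{-1}$; the particular matrices below are just an illustration):

```python
from sympy import Matrix

# A nilpotent matrix hidden in a non-obvious basis: A = S * N * S^{-1},
# where N consists of blocks of sizes 3 and 1 of the form (9.4.1)
N = Matrix([[0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]])
S = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 0],
            [1, 0, 1, 1],
            [0, 0, 0, 1]])
A = S * N * S.inv()

P, J = A.jordan_form()       # columns of P form a Jordan canonical basis
print(J)                     # blocks of sizes 3 and 1 (all eigenvalues 0)
print(A == P * J * P.inv())  # True
```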
Given an operator $A$ there exists a basis (a Jordan canonical basis) such that the matrix of $A$ in this basis has a block diagonal form with blocks of the form
$$\begin{pmatrix}
\lambda & 1 & & & \\
 & \lambda & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \lambda & 1 \\
 & & & & \lambda
\end{pmatrix} \qquad (9.5.1)$$
where $\lambda$ is an eigenvalue of $A$. Here we assume that a block of size $1 \times 1$ is just $(\lambda)$.
The block diagonal form from Theorem 9.5.1 is called the Jordan canonical form of the operator $A$. The corresponding basis is called a Jordan canonical basis for the operator $A$.
According to Theorem 9.3.4 and Remark 9.3.5, if we join bases in the generalized eigenspaces $E_{\lambda_k}$ to get a basis in the whole space, the matrix of $A$ in this basis has a block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = \lambda_k I_k + N_k$. The operators $N_k$ are nilpotent, so by Theorem 9.4.2 (more precisely, by Corollary 9.4.3) one can find a basis in each $E_{\lambda_k}$ such that the matrix of $N_k$ in this basis is its Jordan canonical form. To get the matrix of $A_k = \lambda_k I_k + N_k$ in this basis one just puts $\lambda_k$ instead of $0$ on the main diagonal. ∎
First of all let us recall that computing the eigenvalues is the hardest part; we do not discuss it here, and assume that the eigenvalues are already computed.
For each eigenvalue $\lambda$ we compute the subspaces $\operatorname{Ker}(A - \lambda I)^{k}$, $k = 1, 2, \dots$, until the sequence of subspaces stabilizes. In fact, since we have an increasing sequence of subspaces ($\operatorname{Ker}(A - \lambda I)^{k} \subset \operatorname{Ker}(A - \lambda I)^{k+1}$), it is sufficient to keep track only of their dimensions (or of the ranks of the operators $(A - \lambda I)^{k}$). For an eigenvalue $\lambda$ let $d = d_\lambda$ be the number where the sequence stabilizes, i.e. $d$ satisfies
$$\operatorname{Ker}(A - \lambda I)^{d-1} \subsetneq \operatorname{Ker}(A - \lambda I)^{d} = \operatorname{Ker}(A - \lambda I)^{d+1}.$$
Then $E_\lambda = \operatorname{Ker}(A - \lambda I)^{d}$ is the generalized eigenspace corresponding to the eigenvalue $\lambda$.
After we have computed all the generalized eigenspaces there are two possible ways to proceed. The first way is to find a basis in each generalized eigenspace, so that the matrix of the operator $A$ in the resulting basis of the whole space has the block-diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = \lambda_k I_k + N_k$. Then we can deal with each block $A_k$ separately. The operators $N_k = A_k - \lambda_k I_k$ are nilpotent, so applying the algorithm described in Section 9.4.4 we get the Jordan canonical representation for $N_k$, and putting $\lambda_k$ instead of $0$ on the main diagonal, we get the Jordan canonical representation for the block $A_k$. The advantage of this approach is that we are working with smaller blocks. But we need to find the matrix of the operator in the new basis, which involves inverting a matrix and matrix multiplication.
Another way is to find a Jordan canonical basis in each of the generalized eigenspaces by working directly with the operator $A$, without splitting it first into blocks. Again, the algorithm we outlined above in Section 9.4.4 works with a slight modification: namely, when computing a Jordan canonical basis for a generalized eigenspace $E_\lambda$, instead of considering the subspaces $\operatorname{Ker}(N^{k})$ and $\operatorname{Ran}(N^{k})$ of the nilpotent block $N = N_\lambda$, which we would need to consider when working with the block separately, we consider the subspaces $\operatorname{Ker}(A - \lambda I)^{k}$ and $\operatorname{Ran}(A - \lambda I)^{k}$.
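To close, here is a sketch of the second approach in floating point: for each eigenvalue the Jordan block sizes are read off from the kernel dimensions of the powers of $A - \lambda I$, exactly as in the dot-diagram discussion (our own helper; the rank tolerance makes this fragile, so for anything serious exact arithmetic is preferable).

```python
import numpy as np

def jordan_block_sizes(A, lam, tol=1e-8):
    """Sizes of the Jordan blocks of A for the eigenvalue lam, recovered from
    the dimensions of Ker (A - lam*I)^k (the rows of the dot diagram)."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    ker_dims, P, prev = [0], np.eye(n), -1
    while ker_dims[-1] != prev:                 # iterate until the sequence stabilizes
        prev = ker_dims[-1]
        P = P @ B
        ker_dims.append(n - np.linalg.matrix_rank(P, tol=tol))
    rows = [ker_dims[k] - ker_dims[k - 1] for k in range(1, len(ker_dims) - 1)]
    return [sum(1 for r in rows if r > j) for j in range(rows[0])] if rows else []

# A is similar to a Jordan matrix with blocks of sizes 2, 1 for the eigenvalue 3
# and one block of size 2 for the eigenvalue 5.
J = np.diag([3.0, 3.0, 3.0, 5.0, 5.0])
J[0, 1] = 1.0
J[3, 4] = 1.0
S = np.random.default_rng(3).standard_normal((5, 5))
A = S @ J @ np.linalg.inv(S)

print(jordan_block_sizes(A, 3.0))   # [2, 1]
print(jordan_block_sizes(A, 5.0))   # [2]
```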