Chapter 9. Advanced spectral theory

9.1. Cayley–Hamilton Theorem

Theorem 9.1.1 (Cayley–Hamilton).

Let A be a square matrix, and let p(\lambda)=\det(A-\lambda I) be its characteristic polynomial. Then p(A)=\mathbf{0}.

A wrong proof.

The proof looks ridiculously simple: plugging A instead of \lambda into the definition of the characteristic polynomial we get

p(A)=\det(A-AI)=\det\mathbf{0}=0.

But this is a wrong proof! To see why, let us analyze what the theorem states. It states that if we compute the characteristic polynomial

\det(A-\lambda I)=p(\lambda)=\sum_{k=0}^{n}c_{k}\lambda^{k}

and then plug the matrix A instead of \lambda to get

p(A):=\sum_{k=0}^{n}c_{k}A^{k}=c_{0}I+c_{1}A+\ldots+c_{n}A^{n},

then the result will be the zero matrix.

It is not clear why we get the same result if we just plug A instead of \lambda into the determinant \det(A-\lambda I). Moreover, it is easy to see that, with the exception of the trivial case of 1\times 1 matrices, we get a different object. Namely, A-AI is the zero matrix, and its determinant is just the number 0. But p(A) is a matrix, and the theorem claims that this matrix is the zero matrix. Thus we are comparing apples and oranges. Even though in both cases we got zero, these are different zeroes: the number zero and the zero matrix!
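Before turning to a correct proof, the (correct) statement is easy to check on a concrete example. Here is a minimal computational sketch, in Python with the sympy library (the 2\times 2 matrix is just an example of ours): it extracts the coefficients c_{k} of the characteristic polynomial and verifies that \sum_{k}c_{k}A^{k} is the zero matrix.

import sympy as sp

A = sp.Matrix([[2, 1], [0, 3]])             # any square matrix will do
lam = sp.symbols('lambda')
p = ((A - lam * sp.eye(2)).det()).expand()  # p(lambda) = det(A - lambda I)
c = sp.Poly(p, lam).all_coeffs()[::-1]      # coefficients c_0, c_1, ..., c_n
pA = sum((c[k] * A**k for k in range(len(c))), sp.zeros(2, 2))
print(pA)                                   # the 2x2 zero matrix, not the number 0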

Let us present a correct (albeit different from the standard one used in most textbooks) proof, which is based on some ideas from analysis.

A “continuous” proof.

The proof is based on several observations. First of all, the theorem is trivial for diagonal matrices, and hence for matrices similar to diagonal ones (i.e. for diagonalizable matrices); see Problem 9.1.1 below.

The second observation is that any matrix can be approximated (as closely as we want) by diagonalizable matrices. Since any operator has an upper triangular matrix in some orthonormal basis (see Theorem 6.1.1 in Chapter 6), we can assume without loss of generality that A is an upper triangular matrix.

We can perturb the diagonal entries of A (as little as we want) to make them all different, so the perturbed matrix \widetilde{A} is diagonalizable (the eigenvalues of a triangular matrix are its diagonal entries, see Section 4.1.7 in Chapter 4, and by Corollary 4.2.3 in Chapter 4 an n\times n matrix with n distinct eigenvalues is diagonalizable).

As I just mentioned, we can perturb the diagonal entries of A as little as we want, so the Frobenius norm \|A-\widetilde{A}\|_{2} is as small as we want. Therefore one can find a sequence of diagonalizable matrices A_{k} such that A_{k}\to A as k\to\infty (for example, such that \|A_{k}-A\|_{2}\to 0 as k\to\infty). It can be shown that the characteristic polynomials p_{k}(\lambda)=\det(A_{k}-\lambda I) converge to the characteristic polynomial p(\lambda)=\det(A-\lambda I) of A. Therefore

p(A)=\lim_{k\to\infty}p_{k}(A_{k}).

But as we just discussed, the Cayley–Hamilton Theorem is trivial for diagonalizable matrices, so p_{k}(A_{k})=\mathbf{0}. Therefore p(A)=\lim_{k\to\infty}\mathbf{0}=\mathbf{0}. ∎

This proof illustrates an important idea: often it is sufficient to consider only a typical, generic situation. It goes beyond the scope of this book, but let us mention, without going into details, that a generic (i.e. typical) matrix is diagonalizable.
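To make the approximation argument concrete, here is a small numerical sketch in Python with numpy (the helper char_poly_at is our own construction, not a library routine). It takes a non-diagonalizable matrix A, perturbs its diagonal to obtain diagonalizable matrices, and shows that evaluating their characteristic polynomials at A gives matrices tending to p(A)=\mathbf{0}.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                  # a Jordan block: not diagonalizable

def char_poly_at(M, X):
    # evaluate the characteristic polynomial of M at the matrix X (Horner scheme);
    # np.poly(M) gives the coefficients of det(zI - M), which differs from
    # det(M - zI) only by the sign (-1)^n, irrelevant when checking for zero
    P = np.zeros_like(X)
    for coef in np.poly(M):
        P = P @ X + coef * np.eye(X.shape[0])
    return P

for eps in [1e-1, 1e-4, 1e-8]:
    A_eps = A + np.diag([0.0, eps])         # distinct diagonal entries: diagonalizable
    print(eps, np.abs(char_poly_at(A_eps, A)).max())   # tends to 0 with eps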

This proof is intended for a reader who is comfortable with such ideas from analysis as continuity and convergence. (Here I mean analysis, i.e. a rigorous treatment of continuity, convergence, etc., and not calculus, which, as it is taught now, is simply a collection of recipes.) Such a reader should be able to fill in all the details, and for him/her this proof should look extremely easy and natural.

However, for others, who are not yet comfortable with these ideas, the proof may well look strange. It may even look like some kind of cheating, although, let me repeat, it is an absolutely correct and rigorous proof (modulo some standard facts from analysis). So, let us present another proof of the theorem, one of the “standard” proofs from linear algebra textbooks.

A “standard” proof.

We know, see Theorem 6.1.1 from Chapter 6, that any square matrix is unitarily equivalent to an upper triangular one. Since for any polynomial p we have p(UAU^{-1})=Up(A)U^{-1}, and the characteristic polynomials of unitarily equivalent matrices coincide, it is sufficient to prove the theorem only for upper triangular matrices.

So, let A be an upper triangular matrix. We know that the diagonal entries of a triangular matrix coincide with its eigenvalues, so let \lambda_{1},\lambda_{2},\ldots,\lambda_{n} be the eigenvalues of A ordered as they appear on the diagonal, so

A=\left(\begin{array}{cccc}\lambda_{1}&&&*\\ &\lambda_{2}&&\\ &&\ddots&\\ \mathbf{0}&&&\lambda_{n}\end{array}\right).

The characteristic polynomial p(z)=\det(A-zI) of A can be represented as p(z)=(\lambda_{1}-z)(\lambda_{2}-z)\ldots(\lambda_{n}-z)=(-1)^{n}(z-\lambda_{1})(z-\lambda_{2})\ldots(z-\lambda_{n}), so

p(A)=(-1)^{n}(A-\lambda_{1}I)(A-\lambda_{2}I)\ldots(A-\lambda_{n}I).

Define subspaces E_{k}:=\operatorname{span}\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{k}\}, where \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} is the standard basis in \mathbb{C}^{n}. Since the matrix of A is upper triangular, the subspaces E_{k} are so-called invariant subspaces of the operator A, i.e. AE_{k}\subset E_{k} (meaning that A\mathbf{v}\in E_{k} for all \mathbf{v}\in E_{k}). Moreover, for any \mathbf{v}\in E_{k} and any \lambda we have

(A-\lambda I)\mathbf{v}=A\mathbf{v}-\lambda\mathbf{v}\in E_{k},

because both A\mathbf{v} and \lambda\mathbf{v} are in E_{k}. Thus (A-\lambda I)E_{k}\subset E_{k}, i.e. E_{k} is an invariant subspace of A-\lambda I.

We can say even more about the subspace (A-\lambda_{k}I)E_{k}. Namely, (A-\lambda_{k}I)\mathbf{e}_{k}\in\operatorname{span}\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{k-1}\}, because only the first k-1 entries of the kth column of the matrix of A-\lambda_{k}I can be non-zero. On the other hand, for j<k we have (A-\lambda_{k}I)\mathbf{e}_{j}\in E_{j}\subset E_{k-1} (because E_{j} is an invariant subspace of A-\lambda_{k}I and j\leq k-1).

Take any vector \mathbf{v}\in E_{k}. By the definition of E_{k} it can be represented as a linear combination of the vectors \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{k}. Since all the vectors \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{k} are transformed by A-\lambda_{k}I to vectors in E_{k-1}, we conclude that

(9.1.1) (A-\lambda_{k}I)\mathbf{v}\in E_{k-1}\qquad\forall\mathbf{v}\in E_{k}.

Take an arbitrary vector \mathbf{x}\in\mathbb{C}^{n}=E_{n}. Applying (9.1.1) inductively with k=n,n-1,\ldots,2 we get

\mathbf{x}_{1}:=(A-\lambda_{n}I)\mathbf{x}\in E_{n-1},
\mathbf{x}_{2}:=(A-\lambda_{n-1}I)\mathbf{x}_{1}=(A-\lambda_{n-1}I)(A-\lambda_{n}I)\mathbf{x}\in E_{n-2},
\ldots
\mathbf{x}_{n-1}:=(A-\lambda_{2}I)\mathbf{x}_{n-2}=(A-\lambda_{2}I)\ldots(A-\lambda_{n-1}I)(A-\lambda_{n}I)\mathbf{x}\in E_{1}.

The last inclusion means that \mathbf{x}_{n-1}=\alpha\mathbf{e}_{1} for some scalar \alpha. But (A-\lambda_{1}I)\mathbf{e}_{1}=\mathbf{0}, so

\mathbf{0}=(A-\lambda_{1}I)\mathbf{x}_{n-1}=(A-\lambda_{1}I)(A-\lambda_{2}I)\ldots(A-\lambda_{n}I)\mathbf{x}.

Therefore p(A)\mathbf{x}=\mathbf{0} for all \mathbf{x}\in\mathbb{C}^{n}, which means exactly that p(A)=\mathbf{0}. ∎

Exercises.

9.1.1. Cayley–Hamilton Theorem for diagonalizable matrices.

As discussed in the section above, the Cayley–Hamilton theorem states that if A is a square matrix, and

p(\lambda)=\det(A-\lambda I)=\sum_{k=0}^{n}c_{k}\lambda^{k}

is its characteristic polynomial, then p(A):=\sum_{k=0}^{n}c_{k}A^{k}=\mathbf{0} (we assume that, by definition, A^{0}=I).

Prove this theorem for the special case when A is similar to a diagonal matrix, A=SDS^{-1}.

Hint: If D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\} and p is any polynomial, can you compute p(D)? What about p(A)?

9.2. Spectral Mapping Theorem

9.2.1. Polynomials of operators

Let us recall that for a square matrix (an operator) A and for a polynomial p(z)=\sum_{k=0}^{N}a_{k}z^{k} the operator p(A) is defined by substituting A for the independent variable,

p(A):=\sum_{k=0}^{N}a_{k}A^{k}=a_{0}I+a_{1}A+a_{2}A^{2}+\ldots+a_{N}A^{N};

here we agree that A^{0}=I.

We know that generally matrix multiplication is not commutative, i.e. generally AB\neq BA, so the order is essential. However,

A^{k}A^{j}=A^{j}A^{k}=A^{k+j},

and from here it is easy to show that for arbitrary polynomials p and q

p(A)q(A)=q(A)p(A)=R(A),

where R(z)=p(z)q(z).

That means that when dealing only with polynomials of an operator A, one does not need to worry about non-commutativity, and can act as if A were simply an independent (scalar) variable. In particular, if a polynomial p(z) can be represented as a product of monomials,

p(z)=a(z-z_{1})(z-z_{2})\ldots(z-z_{N}),

where z_{1},z_{2},\ldots,z_{N} are the roots of p, then p(A) can be represented as

p(A)=a(A-z_{1}I)(A-z_{2}I)\ldots(A-z_{N}I).
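As a quick sanity check of this factorization, here is a sketch in Python with sympy (the matrix and the polynomial p(z)=z^{2}+1=(z-i)(z+i) are our example): computing p(A) from the coefficients and from the monomial factors gives the same matrix.

import sympy as sp

A = sp.Matrix([[0, -1], [1, 0]])               # rotation by 90 degrees
via_coeffs  = A**2 + sp.eye(2)                 # p(A) from the coefficients
via_factors = (A - sp.I*sp.eye(2)) * (A + sp.I*sp.eye(2))   # from the monomials
print(sp.simplify(via_coeffs - via_factors))   # the zero matrix: the two agree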

9.2.2. Spectral Mapping Theorem

Let us recall that the spectrum \sigma(A) of a square matrix (an operator) A is the set of all eigenvalues of A (not counting multiplicities).

Theorem 9.2.1 (Spectral Mapping Theorem).

For a square matrix A and an arbitrary polynomial p,

\sigma(p(A))=p(\sigma(A)).

In other words, \mu is an eigenvalue of p(A) if and only if \mu=p(\lambda) for some eigenvalue \lambda of A.

Note that, as stated, this theorem does not say anything about multiplicities of the eigenvalues.

Remark.

Note that one inclusion is trivial. Namely, if \lambda is an eigenvalue of A, i.e. A\mathbf{x}=\lambda\mathbf{x} for some \mathbf{x}\neq\mathbf{0}, then A^{k}\mathbf{x}=\lambda^{k}\mathbf{x}, and p(A)\mathbf{x}=p(\lambda)\mathbf{x}, so p(\lambda) is an eigenvalue of p(A). That means the inclusion p(\sigma(A))\subset\sigma(p(A)) is trivial.
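The full statement is also easy to test numerically. A minimal sketch in Python with numpy (the polynomial p(z)=z^{3}-2z+3 and the random matrix are arbitrary choices of ours): for a generic matrix, the eigenvalues of p(A) coincide with the values of p at the eigenvalues of A.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

def p_mat(X):                        # p(A) = A^3 - 2A + 3I
    return X @ X @ X - 2 * X + 3 * np.eye(X.shape[0])

def p_scal(z):                       # the same polynomial on scalars
    return z**3 - 2*z + 3

left  = np.sort_complex(np.linalg.eigvals(p_mat(A)))
right = np.sort_complex(p_scal(np.linalg.eigvals(A)))
print(np.allclose(left, right))      # True: sigma(p(A)) = p(sigma(A))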

If we consider the particular case \mu=0 of the above theorem, we get the following corollary.

Corollary 9.2.2.

Let A be a square matrix with eigenvalues \lambda_{1},\lambda_{2},\ldots,\lambda_{n} and let p be a polynomial. Then p(A) is invertible if and only if

p(\lambda_{k})\neq 0\qquad\forall k=1,2,\ldots,n.
Proof of Theorem 9.2.1.

As discussed above, the inclusion

p(\sigma(A))\subset\sigma(p(A))

is trivial.

To prove the opposite inclusion \sigma(p(A))\subset p(\sigma(A)), take a point \mu\in\sigma(p(A)). Denote q(z)=p(z)-\mu, so q(A)=p(A)-\mu I. Since \mu\in\sigma(p(A)), the operator q(A)=p(A)-\mu I is not invertible.

Let us represent the polynomial q(z) as a product of monomials,

q(z)=a(z-z_{1})(z-z_{2})\ldots(z-z_{N}).

Then, as discussed above in Section 9.2.1, we can represent

q(A)=a(A-z_{1}I)(A-z_{2}I)\ldots(A-z_{N}I).

The operator q(A) is not invertible, so one of the factors A-z_{k}I must not be invertible (because a product of invertible transformations is always invertible). That means z_{k}\in\sigma(A).

On the other hand, z_{k} is a root of q, so

0=q(z_{k})=p(z_{k})-\mu,

and therefore \mu=p(z_{k}). So we have proved the inclusion \sigma(p(A))\subset p(\sigma(A)). ∎

Exercises.

9.2.1.

An operator A is called nilpotent if A^{k}=\mathbf{0} for some k. Prove that if A is nilpotent, then \sigma(A)=\{0\} (i.e. that 0 is the only eigenvalue of A).

Can you do it without using the spectral mapping theorem?

9.3. Generalized eigenspaces. Geometric meaning of algebraic multiplicity

9.3.1. Invariant subspaces

Definition.

Let A:V\to V be an operator (linear transformation) in a vector space V. A subspace E of the vector space V is called an invariant subspace of the operator A (or, shortly, A-invariant) if AE\subset E, i.e. if A\mathbf{v}\in E for all \mathbf{v}\in E.

If E is A-invariant, then

A^{2}E=A(AE)\subset AE\subset E,

i.e. E is A^{2}-invariant.

Similarly one can show (using induction, for example) that if AE\subset E then

A^{k}E\subset E\qquad\forall k\geq 1.

This implies that p(A)E\subset E for any polynomial p, i.e. that:

any A-invariant subspace E is an invariant subspace of p(A).

If E is an A-invariant subspace, then for all \mathbf{v}\in E the result A\mathbf{v} also belongs to E. Therefore we can treat A as an operator acting on E, not on the whole space V.

Formally, for an A-invariant subspace E we define the so-called restriction A|_{E}:E\to E of A onto E by

(A|_{E})\mathbf{v}=A\mathbf{v}\qquad\forall\mathbf{v}\in E.

Here we changed the domain and target space of the operator, but the rule assigning the value to the argument remains the same.

We will need the following simple lemma.

Lemma 9.3.1.

Let p be a polynomial, and let E be an A-invariant subspace. Then

p(A|_{E})=p(A)|_{E}.
Proof.

The proof is trivial: by induction one sees that (A|_{E})^{k}=A^{k}|_{E}, and the general case follows by taking linear combinations. ∎

If E_{1},E_{2},\ldots,E_{r} is a basis of A-invariant subspaces, and A_{k}:=A|_{E_{k}} are the corresponding restrictions, then, since AE_{k}=A_{k}E_{k}\subset E_{k}, the operators A_{k} act independently of each other (do not interact), and to analyze the action of A we can analyze the operators A_{k} separately.

In particular, if we pick a basis in each subspace E_{k} and join them to get a basis in V (see Theorem 4.2.6 from Chapter 4), then the operator A will have in this basis the following block-diagonal form:

A=\left(\begin{array}{cccc}A_{1}&&&\mathbf{0}\\ &A_{2}&&\\ &&\ddots&\\ \mathbf{0}&&&A_{r}\end{array}\right)

(of course, here we assume the correct ordering of the basis in V: first we take a basis in E_{1}, then in E_{2}, and so on).

Our goal now is to pick a basis of invariant subspaces E_{1},E_{2},\ldots,E_{r} such that the restrictions A_{k} have a simple structure. In this case we will get a basis in which the matrix of A has a simple structure.

The eigenspaces \operatorname{Ker}(A-\lambda_{k}I) would be good candidates, because the restriction of A to the eigenspace \operatorname{Ker}(A-\lambda_{k}I) is simply \lambda_{k}I. Unfortunately, as we know, eigenspaces do not always form a basis (they form a basis if and only if A can be diagonalized, cf. Theorem 4.2.1 in Chapter 4).

However, the so-called generalized eigenspaces will work.

9.3.2. Generalized eigenspaces.

Definition 9.3.2.

A vector \mathbf{v} is called a generalized eigenvector (corresponding to an eigenvalue \lambda) if (A-\lambda I)^{k}\mathbf{v}=\mathbf{0} for some k\geq 1.

The collection E_{\lambda} of all generalized eigenvectors, together with \mathbf{0}, is called the generalized eigenspace (corresponding to the eigenvalue \lambda).

In other words, one can represent the generalized eigenspace E_{\lambda} as

(9.3.1) E_{\lambda}=\bigcup_{k\geq 1}\operatorname{Ker}(A-\lambda I)^{k}.

The sequence \operatorname{Ker}(A-\lambda I)^{k}, k=1,2,3,\ldots, is an increasing sequence of subspaces, i.e.

\operatorname{Ker}(A-\lambda I)^{k}\subset\operatorname{Ker}(A-\lambda I)^{k+1}\qquad\forall k\geq 1.

The representation (9.3.1) does not look very simple, for it involves an infinite union. However, the sequence of subspaces \operatorname{Ker}(A-\lambda I)^{k} stabilizes, i.e.

\operatorname{Ker}(A-\lambda I)^{k}=\operatorname{Ker}(A-\lambda I)^{k+1}\qquad\forall k\geq k_{\lambda},

so, in fact, one can take a finite union.

To show that the sequence of kernels stabilizes, let us notice that if for finite-dimensional subspaces E and F we have E\subsetneqq F (the symbol E\subsetneqq F means that E\subset F but E\neq F), then \dim E<\dim F.

Since \dim\operatorname{Ker}(A-\lambda I)^{k}\leq\dim V<\infty, the dimension cannot grow to infinity, so at some point

\operatorname{Ker}(A-\lambda I)^{k}=\operatorname{Ker}(A-\lambda I)^{k+1}.

The rest follows from the lemma below.

Lemma 9.3.3.

Let for some k

\operatorname{Ker}(A-\lambda I)^{k}=\operatorname{Ker}(A-\lambda I)^{k+1}.

Then

\operatorname{Ker}(A-\lambda I)^{k+r}=\operatorname{Ker}(A-\lambda I)^{k+r+1}\qquad\forall r\geq 0.
Proof.

Let \mathbf{v}\in\operatorname{Ker}(A-\lambda I)^{k+r+1}, i.e. (A-\lambda I)^{k+r+1}\mathbf{v}=\mathbf{0}. Then

\mathbf{w}:=(A-\lambda I)^{r}\mathbf{v}\in\operatorname{Ker}(A-\lambda I)^{k+1}.

But we know that \operatorname{Ker}(A-\lambda I)^{k}=\operatorname{Ker}(A-\lambda I)^{k+1}, so \mathbf{w}\in\operatorname{Ker}(A-\lambda I)^{k}, which means (A-\lambda I)^{k}\mathbf{w}=\mathbf{0}. Recalling the definition of \mathbf{w} we get that

(A-\lambda I)^{k+r}\mathbf{v}=(A-\lambda I)^{k}\mathbf{w}=\mathbf{0},

so \mathbf{v}\in\operatorname{Ker}(A-\lambda I)^{k+r}. We proved that \operatorname{Ker}(A-\lambda I)^{k+r+1}\subset\operatorname{Ker}(A-\lambda I)^{k+r}. The opposite inclusion is trivial. ∎

Definition.

The number d=d(\lambda) at which the sequence \operatorname{Ker}(A-\lambda I)^{k} stabilizes, i.e. the number d such that

\operatorname{Ker}(A-\lambda I)^{d-1}\subsetneqq\operatorname{Ker}(A-\lambda I)^{d}=\operatorname{Ker}(A-\lambda I)^{d+1},

is called the depth of the eigenvalue \lambda.

It follows from the definition of the depth that for the generalized eigenspace E_{\lambda}

(9.3.2) (A-\lambda I)^{d(\lambda)}\mathbf{v}=\mathbf{0}\qquad\forall\mathbf{v}\in E_{\lambda}.
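The stabilization of the kernels, and hence the depth, is easy to observe on an example. A sketch in Python with numpy (the helper J building Jordan blocks is ours), for an operator with the single eigenvalue 2 and blocks of sizes 3, 2 and 1, so the depth should be 3:

import numpy as np

def J(lam, p):                        # a p x p Jordan block with eigenvalue lam
    return lam * np.eye(p) + np.diag(np.ones(p - 1), 1)

A = np.zeros((6, 6))
A[:3, :3], A[3:5, 3:5], A[5:, 5:] = J(2, 3), J(2, 2), J(2, 1)

M = A - 2 * np.eye(6)
for k in range(1, 5):
    dim_ker = 6 - np.linalg.matrix_rank(np.linalg.matrix_power(M, k))
    print(k, dim_ker)                 # prints dimensions 3, 5, 6, 6: stabilizes at k = 3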

Now let us summarize what we know about generalized eigenspaces.

a) E_{\lambda} is an invariant subspace of A: AE_{\lambda}\subset E_{\lambda}.

b) If d(\lambda) is the depth of the eigenvalue \lambda, then

((A-\lambda I)|_{E_{\lambda}})^{d(\lambda)}=(A|_{E_{\lambda}}-\lambda I_{E_{\lambda}})^{d(\lambda)}=\mathbf{0}

(this is just another way of writing (9.3.2)).

c) \sigma(A|_{E_{\lambda}})=\{\lambda\}, because the operator A|_{E_{\lambda}}-\lambda I_{E_{\lambda}} is nilpotent, see b), and the spectrum of a nilpotent operator consists of the single point 0, see Problem 9.2.1.

Now we are ready to state the main result of this section. Let A:V\to V.

Theorem 9.3.4.

Let \sigma(A) consist of r points \lambda_{1},\lambda_{2},\ldots,\lambda_{r}, and let E_{k}:=E_{\lambda_{k}} be the corresponding generalized eigenspaces. Then the system of subspaces E_{1},E_{2},\ldots,E_{r} is a basis of subspaces in V.

Remark 9.3.5.

If we join the bases of all the generalized eigenspaces E_{k}, then by Theorem 4.2.6 from Chapter 4 we get a basis in the whole space. In this basis the matrix of the operator A has the block diagonal form A=\operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where A_{k}:=A|_{E_{k}}, E_{k}=E_{\lambda_{k}}. It is also easy to see, cf. (9.3.2), that the operators N_{k}:=A_{k}-\lambda_{k}I_{E_{k}} are nilpotent, N_{k}^{d_{k}}=\mathbf{0}.

Proof of Theorem 9.3.4.

Let m_{k} be the multiplicity of the eigenvalue \lambda_{k}, so p(z)=\prod_{k=1}^{r}(z-\lambda_{k})^{m_{k}} is the characteristic polynomial of A. Define

p_{k}(z)=p(z)/(z-\lambda_{k})^{m_{k}}=\prod_{j\neq k}(z-\lambda_{j})^{m_{j}}.
Lemma 9.3.6.
(9.3.3) (A-\lambda_{k}I)^{m_{k}}|_{E_{k}}=\mathbf{0}.
Proof.

Define A_{k}:=A|_{E_{k}}. By property c) of the generalized eigenspaces, \sigma(A_{k})=\{\lambda_{k}\}. Since p_{k}(\lambda_{k})\neq 0, the Spectral Mapping Theorem, see Corollary 9.2.2, implies that the operator p_{k}(A_{k}) is invertible.

By the Cayley–Hamilton Theorem (Theorem 9.1.1),

\mathbf{0}=p(A)=(A-\lambda_{k}I)^{m_{k}}p_{k}(A).

Restricting this identity to the invariant subspace E_{k}, we get (see Lemma 9.3.1) that

\mathbf{0}=p(A_{k})=(A_{k}-\lambda_{k}I_{E_{k}})^{m_{k}}p_{k}(A_{k})

(\mathbf{0} here is the zero operator in E_{k}). Since p_{k}(A_{k}) is invertible, we conclude that

(A_{k}-\lambda_{k}I_{E_{k}})^{m_{k}}=p(A_{k})p_{k}(A_{k})^{-1}=\mathbf{0}\,p_{k}(A_{k})^{-1}=\mathbf{0}.

Lemma 9.3.1 implies that (A-\lambda_{k}I)^{m_{k}}|_{E_{k}}=(A_{k}-\lambda_{k}I_{E_{k}})^{m_{k}}, which concludes the proof. ∎

Continuing with the proof of Theorem 9.3.4, define

q(z)=\sum_{k=1}^{r}p_{k}(z).

Since p_{k}(\lambda_{j})=0 for j\neq k and p_{k}(\lambda_{k})\neq 0, we conclude that q(\lambda_{k})\neq 0 for all k. Therefore, by the Spectral Mapping Theorem, see Corollary 9.2.2, the operator

B=q(A)

is invertible.

Define the operators \mathcal{P}_{k} by

\mathcal{P}_{k}=B^{-1}p_{k}(A).
Lemma 9.3.7.

For the operators \mathcal{P}_{k} defined above:

a) \mathcal{P}_{1}+\mathcal{P}_{2}+\ldots+\mathcal{P}_{r}=I;

b) \mathcal{P}_{k}|_{E_{j}}=\mathbf{0} for j\neq k;

c) \operatorname{Ran}\mathcal{P}_{k}\subset E_{k};

d) moreover, \mathcal{P}_{k}\mathbf{v}=\mathbf{v} \forall\mathbf{v}\in E_{k}, so, in fact, \operatorname{Ran}\mathcal{P}_{k}=E_{k}.

Proof.

Property a) is trivial:

\sum_{k=1}^{r}\mathcal{P}_{k}=B^{-1}\sum_{k=1}^{r}p_{k}(A)=B^{-1}B=I.

Property b) follows from (9.3.3). Indeed, for j\neq k the operator p_{k}(A) contains the factor (A-\lambda_{j}I)^{m_{j}}, whose restriction to E_{j} is zero. Therefore p_{k}(A)|_{E_{j}}=\mathbf{0} and thus \mathcal{P}_{k}|_{E_{j}}=B^{-1}p_{k}(A)|_{E_{j}}=\mathbf{0}.

To prove property c), recall that according to the Cayley–Hamilton Theorem p(A)=\mathbf{0}. Since p(z)=(z-\lambda_{k})^{m_{k}}p_{k}(z), we have for \mathbf{w}=p_{k}(A)\mathbf{v}

(A-\lambda_{k}I)^{m_{k}}\mathbf{w}=(A-\lambda_{k}I)^{m_{k}}p_{k}(A)\mathbf{v}=p(A)\mathbf{v}=\mathbf{0}.

That means any vector \mathbf{w} in \operatorname{Ran}p_{k}(A) is annihilated by some power of A-\lambda_{k}I, which by definition means that \operatorname{Ran}p_{k}(A)\subset E_{k}; since B^{-1} and p_{k}(A) commute, \operatorname{Ran}\mathcal{P}_{k}\subset\operatorname{Ran}p_{k}(A)\subset E_{k}.

To prove the last property, let us notice that it follows from (9.3.3) that for \mathbf{v}\in E_{k}

p_{k}(A)\mathbf{v}=\sum_{j=1}^{r}p_{j}(A)\mathbf{v}=B\mathbf{v},

which implies \mathcal{P}_{k}\mathbf{v}=B^{-1}B\mathbf{v}=\mathbf{v}. ∎

Now we are ready to complete the proof of the theorem. Take \mathbf{v}\in V and define \mathbf{v}_{k}=\mathcal{P}_{k}\mathbf{v}. Then according to Statement c) of Lemma 9.3.7, \mathbf{v}_{k}\in E_{k}, and by Statement a),

\mathbf{v}=\sum_{k=1}^{r}\mathbf{v}_{k},

so \mathbf{v} admits a representation as a linear combination of vectors \mathbf{v}_{k}\in E_{k}.

To show that this representation is unique, we just note that if \mathbf{v} is represented as \mathbf{v}=\sum_{k=1}^{r}\mathbf{v}_{k}, \mathbf{v}_{k}\in E_{k}, then it follows from Statements b) and d) of Lemma 9.3.7 that

\mathcal{P}_{k}\mathbf{v}=\mathcal{P}_{k}(\mathbf{v}_{1}+\mathbf{v}_{2}+\ldots+\mathbf{v}_{r})=\mathcal{P}_{k}\mathbf{v}_{k}=\mathbf{v}_{k}. ∎
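The operators \mathcal{P}_{k} from this proof can be computed explicitly on small examples. Here is a sketch in Python with sympy (the 3\times 3 matrix, with eigenvalue 3 of multiplicity 2 and eigenvalue 5, is our choice):

import sympy as sp

A = sp.Matrix([[3, 1, 0],
               [0, 3, 0],
               [0, 0, 5]])             # characteristic polynomial (z-3)^2 (z-5)
p1A = A - 5*sp.eye(3)                  # p_1(z) = z - 5      (for lambda_1 = 3)
p2A = (A - 3*sp.eye(3))**2             # p_2(z) = (z - 3)^2  (for lambda_2 = 5)
B = p1A + p2A                          # B = q(A), q = p_1 + p_2, invertible
P1, P2 = B.inv()*p1A, B.inv()*p2A
print(P1 + P2)                         # the identity, as in property a)
print(sp.simplify(P1*P1 - P1))         # zero: P1 is a projection onto E_3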

9.3.3. Geometric meaning of algebraic multiplicity

Proposition 9.3.8.

The algebraic multiplicity of an eigenvalue equals the dimension of the corresponding generalized eigenspace.

Proof.

According to Remark 9.3.5, if we join bases of the generalized eigenspaces E_{k}=E_{\lambda_{k}} to get a basis in the whole space, the matrix of A in any such basis has the block-diagonal form \operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where A_{k}:=A|_{E_{k}}. The operators N_{k}=A_{k}-\lambda_{k}I_{E_{k}} are nilpotent, so \sigma(N_{k})=\{0\}. Therefore the spectrum of the operator A_{k} (recall that A_{k}=N_{k}+\lambda_{k}I_{E_{k}}) consists of the single eigenvalue \lambda_{k} of (algebraic) multiplicity n_{k}=\dim E_{k}. The multiplicity equals n_{k} because an operator in a finite-dimensional space V has exactly \dim V eigenvalues counting multiplicities, and A_{k} has only one eigenvalue.

Note that we are free to pick the bases in the E_{k}, so let us pick them in such a way that the corresponding blocks A_{k} are upper triangular. Then

\det(A-\lambda I)=\prod_{k=1}^{r}\det(A_{k}-\lambda I_{E_{k}})=\prod_{k=1}^{r}(\lambda_{k}-\lambda)^{n_{k}}.

But this means that the algebraic multiplicity of the eigenvalue \lambda_{k} is n_{k}=\dim E_{\lambda_{k}}. ∎
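A quick computational illustration in Python with sympy (the matrix is our example): for the eigenvalue 2 below, the eigenspace is one-dimensional, while the generalized eigenspace has dimension 2, matching the algebraic multiplicity.

import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 7]])
lam = sp.symbols('lambda')
print(sp.factor((A - lam*sp.eye(3)).det()))   # (2 - lambda)^2 (7 - lambda), up to ordering
M = A - 2*sp.eye(3)
print(3 - M.rank())                           # 1: geometric multiplicity of 2
print(3 - (M**2).rank())                      # 2: dim E_2 = algebraic multiplicity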

9.3.4. An important application

The following corollary is very important for differential equations.

Corollary 9.3.9.

Any operator A in V can be represented as A=D+N, where D is diagonalizable (i.e. diagonal in some basis), N is nilpotent (N^{m}=\mathbf{0} for some m), and DN=ND.

Proof.

As we discussed above, see Remark 9.3.5, if we join the bases in the subspaces E_{k} to get a basis in V, then in this basis A has the block diagonal form A=\operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where A_{k}:=A|_{E_{k}}, E_{k}=E_{\lambda_{k}}. The operators N_{k}:=A_{k}-\lambda_{k}I_{E_{k}} are nilpotent, and the operator D=\operatorname{diag}\{\lambda_{1}I_{E_{1}},\lambda_{2}I_{E_{2}},\ldots,\lambda_{r}I_{E_{r}}\} is diagonal (in this basis). Notice also that \lambda_{k}I_{E_{k}}N_{k}=N_{k}\lambda_{k}I_{E_{k}} (the identity operator commutes with any operator), so the block diagonal operator N=\operatorname{diag}\{N_{1},N_{2},\ldots,N_{r}\} commutes with D, DN=ND. Therefore, defining N as this block diagonal operator, we get the desired decomposition A=D+N. ∎

This corollary allows us to compute functions of operators. Let us recall that if p is a polynomial of degree d, then p(a+x) can be computed with the help of Taylor's formula

p(a+x)=\sum_{k=0}^{d}\frac{p^{(k)}(a)}{k!}x^{k}.

This formula is an algebraic identity, meaning that for each polynomial p we can check that the formula is true using formal algebraic manipulations with a and x, not caring about their nature.

Since the operators D and N commute, DN=ND, the same rules as for usual (scalar) variables apply to them, and we can write (plugging in D for a and N for x)

p(A)=p(D+N)=\sum_{k=0}^{d}\frac{p^{(k)}(D)}{k!}N^{k}.

Here, to compute the derivative p^{(k)}(D) we first compute the kth derivative of the polynomial p(x) (using the usual rules from calculus), and then plug in D for x.

But since N is nilpotent, N^{m}=\mathbf{0} for some m, only the first m terms can be non-zero, so

p(A)=p(D+N)=\sum_{k=0}^{m-1}\frac{p^{(k)}(D)}{k!}N^{k}.

If m is much smaller than d, this formula makes the computation of p(A) much easier.

The same approach works if p is not a polynomial but an infinite power series. For general power series we have to be careful about convergence of all the series involved, so we cannot say that the formula holds for an arbitrary power series p(x). However, if the radius of convergence of the power series is \infty, then everything works fine. In particular, if p(x)=\exp(x)=e^{x}, then, using the fact that (e^{x})'=e^{x}, we get

e^{A}=\sum_{k=0}^{m-1}\frac{e^{D}}{k!}N^{k}=e^{D}\sum_{k=0}^{m-1}\frac{1}{k!}N^{k}.

This formula has important applications in differential equations.

Note that the fact that ND=DN is essential here!
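Here is a minimal check of the exponential formula in Python with sympy (sympy's Matrix.exp computes the matrix exponential independently, so we can compare against it). For A=D+N with D=2I and N^{2}=\mathbf{0}, the series truncates after two terms:

import sympy as sp

A = sp.Matrix([[2, 1],
               [0, 2]])                     # A = D + N, D = 2I, N^2 = 0
N = A - 2 * sp.eye(2)
via_formula = sp.exp(2) * (sp.eye(2) + N)   # e^A = e^D (I + N), since DN = ND
print(sp.simplify(A.exp() - via_formula))   # the zero matrix: the formula checks out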

9.4. Structure of nilpotent operators

Recall that an operator A in a vector space V is called nilpotent if A^{k}=\mathbf{0} for some exponent k.

In the previous section we proved, see Remark 9.3.5, that if we join the bases in all the generalized eigenspaces E_{k}=E_{\lambda_{k}} to get a basis in the whole space, then the operator A has in this basis a block diagonal form \operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, and the operators A_{k} can be represented as A_{k}=\lambda_{k}I+N_{k}, where the N_{k} are nilpotent operators.

In each generalized eigenspace E_{k} we want to pick a basis such that the matrix of A_{k} in this basis has the simplest possible form. Since the matrix (in any basis) of the identity operator is the identity matrix, we need to find a basis in which the nilpotent operator N_{k} has a simple form.

Since we can deal with each N_{k} separately, we will need to consider the following problem:

For a nilpotent operator A find a basis such that the matrix of A in this basis is simple.

Let us see what it means for a matrix to have a simple form. It is easy to see that the matrix

(9.4.1) \left(\begin{array}{ccccc}0&1&&&0\\ &0&1&&\\ &&0&\ddots&\\ &&&\ddots&1\\ 0&&&&0\end{array}\right)

(the entries on the diagonal right above the main one are 1, and the rest of the entries are 0) is nilpotent.

These matrices (together with 1\times 1 zero matrices) will be our “building blocks”. Namely, we will show that for any nilpotent operator one can find a basis such that the matrix of the operator in this basis has the block diagonal form \operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where each A_{k} is either a block of form (9.4.1) or a 1\times 1 zero block.

Let us see what we should be looking for. Suppose the matrix of an operator A has in a basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} the form (9.4.1). Then

(9.4.2) A\mathbf{v}_{1}=\mathbf{0}

and

(9.4.3) A\mathbf{v}_{k+1}=\mathbf{v}_{k},\qquad k=1,2,\ldots,p-1.

Thus we have to look for chains of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} satisfying the above relations (9.4.2), (9.4.3).

9.4.1. Cycles of generalized eigenvectors

Definition.

Let A be a nilpotent operator. A chain of non-zero vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} satisfying relations (9.4.2), (9.4.3) is called a cycle of generalized eigenvectors of A. The vector \mathbf{v}_{1} is called the initial vector of the cycle, the vector \mathbf{v}_{p} is called the end vector of the cycle, and the number p is called the length of the cycle.

Remark.

A similar definition can be made for an arbitrary operator. Then all the vectors \mathbf{v}_{k} must belong to the same generalized eigenspace E_{\lambda}, and they must satisfy the identities

(A-\lambda I)\mathbf{v}_{1}=\mathbf{0},\qquad(A-\lambda I)\mathbf{v}_{k+1}=\mathbf{v}_{k},\quad k=1,2,\ldots,p-1.
Theorem 9.4.1.

Let A be a nilpotent operator, and let \mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{r} be cycles of its generalized eigenvectors, \mathcal{C}_{k}=\mathbf{v}^{k}_{1},\mathbf{v}^{k}_{2},\ldots,\mathbf{v}^{k}_{p_{k}}, with p_{k} the length of the cycle \mathcal{C}_{k}. Assume that the initial vectors \mathbf{v}_{1}^{1},\mathbf{v}_{1}^{2},\ldots,\mathbf{v}_{1}^{r} are linearly independent. Then no vector belongs to two cycles, and the union of all the vectors from all the cycles is a linearly independent system.

Proof.

Let n=p_{1}+p_{2}+\ldots+p_{r} be the total number of vectors in all the cycles. (Here we just count the vectors in each cycle and add the numbers; we do not care if some cycles have a common vector, we count such a vector once for each cycle it belongs to. Of course, according to the theorem this is impossible, but initially we cannot assume that.) We will use induction on n. If n=1 the theorem is trivial.

Let us now assume that the theorem is true for all operators and for all collections of cycles, as long as the total number of vectors in all the cycles is strictly less than n.

Without loss of generality we can assume that the vectors \mathbf{v}_{j}^{k} span the whole space V, because otherwise we can consider, instead of the operator A, its restriction onto the invariant subspace \operatorname{span}\{\mathbf{v}_{j}^{k}:k=1,2,\ldots,r,\ 1\leq j\leq p_{k}\}.

Consider the subspace \operatorname{Ran}A. It follows from the relations (9.4.2), (9.4.3) that the vectors \mathbf{v}_{j}^{k}, k=1,2,\ldots,r, 1\leq j\leq p_{k}-1, span \operatorname{Ran}A. Note that if p_{k}>1 then the system \mathbf{v}^{k}_{1},\mathbf{v}^{k}_{2},\ldots,\mathbf{v}^{k}_{p_{k}-1} is again a cycle, and that A annihilates any cycle of length 1.

Therefore, we have finitely many cycles, and the initial vectors of these cycles are linearly independent, so the induction hypothesis applies, and the vectors \mathbf{v}_{j}^{k}, k=1,2,\ldots,r, 1\leq j\leq p_{k}-1, are linearly independent. Since these vectors also span \operatorname{Ran}A, they form a basis there. Therefore,

\operatorname{rank}A=\dim\operatorname{Ran}A=n-r

(we had n vectors, and we removed one vector \mathbf{v}^{k}_{p_{k}} from each cycle \mathcal{C}_{k}, k=1,2,\ldots,r, so we have n-r vectors in this basis). On the other hand, A\mathbf{v}_{1}^{k}=\mathbf{0} for k=1,2,\ldots,r, and since these vectors are linearly independent, \dim\operatorname{Ker}A\geq r. By the Rank Theorem (Theorem 2.7.2 from Chapter 2)

\dim V=\operatorname{rank}A+\dim\operatorname{Ker}A=(n-r)+\dim\operatorname{Ker}A\geq(n-r)+r=n,

so \dim V\geq n.

On the other hand, V is spanned by the n vectors \mathbf{v}_{j}^{k}, k=1,2,\ldots,r, 1\leq j\leq p_{k}; therefore these vectors form a basis, so they are linearly independent. ∎

9.4.2. Jordan canonical form of a nilpotent operator

Theorem 9.4.2.

Let A:V\to V be a nilpotent operator. Then V has a basis consisting of a union of cycles of generalized eigenvectors of the operator A.

Proof.

We will use induction on n, where n=\dim V. For n=1 the theorem is trivial.

Assume that the theorem is true for any operator acting in a space of dimension strictly less than n.

Consider the subspace X=\operatorname{Ran}A. X is an invariant subspace of the operator A, so we can consider the restriction A|_{X}.

Since A is not invertible, \dim\operatorname{Ran}A<\dim V, so by the induction hypothesis there exist cycles \mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{r} of generalized eigenvectors such that their union is a basis in X. Let \mathcal{C}_{k}=\mathbf{v}^{k}_{1},\mathbf{v}^{k}_{2},\ldots,\mathbf{v}^{k}_{p_{k}}, where \mathbf{v}^{k}_{1} is the initial vector of the cycle.

Since the end vector \mathbf{v}^{k}_{p_{k}} belongs to \operatorname{Ran}A, one can find a vector \mathbf{v}^{k}_{p_{k}+1} such that A\mathbf{v}^{k}_{p_{k}+1}=\mathbf{v}^{k}_{p_{k}}. So we can extend each cycle \mathcal{C}_{k} to a bigger cycle \widetilde{\mathcal{C}}_{k}=\mathbf{v}^{k}_{1},\mathbf{v}^{k}_{2},\ldots,\mathbf{v}^{k}_{p_{k}},\mathbf{v}^{k}_{p_{k}+1}. Since the initial vectors \mathbf{v}^{k}_{1} of the cycles \widetilde{\mathcal{C}}_{k}, k=1,2,\ldots,r, are linearly independent, Theorem 9.4.1 above implies that the union of these cycles is a linearly independent system.

By the definition of a cycle we have \mathbf{v}_{1}^{k}\in\operatorname{Ker}A, and we assumed that the initial vectors \mathbf{v}_{1}^{k}, k=1,2,\ldots,r, are linearly independent. Let us complete this system to a basis in \operatorname{Ker}A, i.e. let us find vectors \mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{q} such that the system \mathbf{v}_{1}^{1},\mathbf{v}_{1}^{2},\ldots,\mathbf{v}_{1}^{r},\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{q} is a basis in \operatorname{Ker}A (it may happen that the system \mathbf{v}_{1}^{k}, k=1,2,\ldots,r, is already a basis in \operatorname{Ker}A, in which case we put q=0 and add nothing).

Each vector \mathbf{u}_{j} can be treated as a cycle of length 1, so we have a collection of cycles \widetilde{\mathcal{C}}_{1},\widetilde{\mathcal{C}}_{2},\ldots,\widetilde{\mathcal{C}}_{r},\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{q} whose initial vectors are linearly independent. So, we can apply Theorem 9.4.1 to conclude that the union of all these cycles is a linearly independent system.

To show that it is a basis, let us count dimensions. We know that the cycles \mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{r} have \dim\operatorname{Ran}A=\operatorname{rank}A vectors in total. Each cycle \widetilde{\mathcal{C}}_{k} was obtained from \mathcal{C}_{k} by adding one vector, so the total number of vectors in all the cycles \widetilde{\mathcal{C}}_{k} is \operatorname{rank}A+r.

We know that \dim\operatorname{Ker}A=r+q (because \mathbf{v}_{1}^{1},\mathbf{v}_{1}^{2},\ldots,\mathbf{v}_{1}^{r},\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{q} is a basis there). We added to the cycles \widetilde{\mathcal{C}}_{1},\widetilde{\mathcal{C}}_{2},\ldots,\widetilde{\mathcal{C}}_{r} an additional q vectors, so altogether we have

\operatorname{rank}A+r+q=\operatorname{rank}A+\dim\operatorname{Ker}A=\dim V

linearly independent vectors. But \dim V linearly independent vectors form a basis. ∎

Definition.

A basis consisting of a union of cycles of generalized eigenvectors of a nilpotent operator A (the existence of which is guaranteed by Theorem 9.4.2) is called a Jordan canonical basis for A.

Note that such a basis is not unique.

Corollary 9.4.3.

Let A be a nilpotent operator. There exists a basis (a Jordan canonical basis) such that the matrix of A in this basis is block diagonal, \operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where all A_{k} (except maybe one) are blocks of form (9.4.1), and one of the blocks A_{k} can be zero.

The matrix of A in a Jordan canonical basis is called the Jordan canonical form of the operator A. We will see later that the Jordan canonical form is unique, if we agree on how to order the blocks (i.e. on how to order the vectors in the basis).

Proof of Corollary 9.4.3.

According to Theorem 9.4.2 one can find a basis consisting of a union of cycles of generalized eigenvectors. A cycle of length p gives rise to a p\times p diagonal block of form (9.4.1), and a cycle of length 1 corresponds to a 1\times 1 zero block. We can join these 1\times 1 zero blocks into one large zero block (because the off-diagonal entries are 0). ∎

9.4.3. Dot diagrams. Uniqueness of the Jordan canonical form

There is a good way of visualizing Theorem 9.4.2 and Corollary 9.4.3, the so-called dot diagrams. This method also allows us to answer many natural questions, like “is the block diagonal representation given by Corollary 9.4.3 unique?”

Of course, if we treat this question literally, the answer is “no”, for we can always change the order of the blocks. But if we exclude such trivial possibilities, for example by agreeing on some order of the blocks (say, putting the non-zero blocks in decreasing order of size, followed by the zero block), is the representation unique or not?

To better understand the structure of nilpotent operators described in Section 9.4.1, let us draw the so-called dot diagram. Namely, suppose we have a basis which is a union of cycles of generalized eigenvectors. Let us represent the basis by an array of dots, so that each column represents a cycle. The first row (i.e. the top one in Figure 9.1 below) corresponds to the initial vectors \mathbf{v}_{1}^{k} of the cycles, and we arrange the columns (cycles) by their length, putting the longest one to the left. The dots in the kth column, going from top to bottom, correspond to the vectors \mathbf{v}_{1}^{k},\mathbf{v}_{2}^{k},\ldots,\mathbf{v}_{p_{k}}^{k} of the kth cycle.

Usually one just presents a dot diagram without specifying the corresponding vectors. But sometimes it is beneficial to consider a marked dot diagram, where the dots are marked with the corresponding vectors of the cycles.

In Figure 9.1 we have the dot diagram of a nilpotent operator, as well as its Jordan canonical form. This dot diagram shows that the basis has one cycle of length 5, one cycle of length 3, two cycles of length 2, and two cycles of length 1. The cycle of length 5 corresponds to the 5\times 5 block of the matrix, the cycle of length 3 corresponds to the 3\times 3 block, and the two cycles of length 2 correspond to the two 2\times 2 blocks. The two cycles of length 1 correspond to two zero entries on the diagonal, which we join into a 2\times 2 zero block. In each block only the main diagonal and the diagonal above it are shown; all other entries of the matrix are zero.

[Figure: a dot diagram with columns of lengths 5, 3, 2, 2, 1, 1, and the corresponding 14\times 14 block diagonal matrix with blocks of form (9.4.1) of sizes 5, 3, 2 and 2, followed by a 2\times 2 zero block.]
Figure 9.1. Dot diagram and corresponding Jordan canonical form of a nilpotent operator

If we agree on the ordering of the blocks, there is a one-to-one correspondence between dot diagrams and Jordan canonical forms (for nilpotent operators). So, the question about uniqueness of the Jordan canonical form is equivalent to the question about uniqueness of the dot diagram.

Action of the operator on a dot diagram

To answer the question about uniqueness of the dot diagram, let us analyze how the operator A transforms it. Assume first that we are given a marked dot diagram, i.e. each dot is marked with the corresponding vector of the cycle. In this case we can recover both the Jordan canonical form and the Jordan canonical basis from such a dot diagram.

Since the operator A annihilates the initial vectors of the cycles, and moves the vector \mathbf{v}_{k+1} of a cycle to the vector \mathbf{v}_{k}, we can see that the operator A acts on its dot diagram by removing the last (bottom) dot in every column. An equivalent description would be to remove the first (top) row, but require that the top dot in the kth column still corresponds to the initial vector \mathbf{v}_{1}^{k} of the cycle.

The new dot diagram is the (marked) dot diagram of the operator A restricted to the A-invariant subspace \operatorname{Ran}A, and it gives us the Jordan canonical representation for the restriction A|_{\operatorname{Ran}A} (both the Jordan canonical basis and the matrix, i.e. the Jordan canonical form).

Similarly, it is not hard to see that the operator A^{k} removes the first k rows of the dot diagram. Therefore, if for all k we know the dimensions \dim\operatorname{Ker}(A^{k}), we know the dot diagram of the operator A. Namely, the number of dots in the first row is \dim\operatorname{Ker}A, the number of dots in the second row is

\dim\operatorname{Ker}(A^{2})-\dim\operatorname{Ker}A,

and the number of dots in the kth row is

\dim\operatorname{Ker}(A^{k})-\dim\operatorname{Ker}(A^{k-1}).

But this means that the dot diagram, which was initially defined using a Jordan canonical basis, does not depend on the particular choice of such a basis. Therefore the dot diagram is unique! This implies that if we agree on the order of the blocks, then the Jordan canonical form is unique.
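The recipe “row k of the dot diagram has \dim\operatorname{Ker}(A^{k})-\dim\operatorname{Ker}(A^{k-1}) dots” is easy to implement. A sketch in Python with numpy (the function is our own; we assume the input matrix is nilpotent, otherwise the loop would not terminate):

import numpy as np

def dot_diagram_rows(A):
    # row lengths of the dot diagram of a nilpotent matrix A:
    # the k-th row has dim Ker A^k - dim Ker A^(k-1) dots
    n = A.shape[0]
    rows, k, prev = [], 1, 0
    while prev < n:
        cur = n - np.linalg.matrix_rank(np.linalg.matrix_power(A, k))
        rows.append(cur - prev)
        prev, k = cur, k + 1
    return rows

A = np.zeros((4, 4))                 # one cycle of length 3 and one of length 1
A[0, 1] = A[1, 2] = 1.0
print(dot_diagram_rows(A))           # [2, 1, 1]: the cycle lengths (columns) are 3 and 1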

9.4.4. Computing a Jordan canonical basis

Let us say a few words about computing a Jordan canonical basis for a nilpotent operator. We have already proved the existence of such a basis in Section 9.4.2, so we can use the Jordan canonical form and the dot diagram to guide us through the construction.

Let p_{1} be the largest positive integer such that A^{p_{1}-1}\neq\mathbf{0} (so A^{p_{1}}=\mathbf{0}). Equivalently, p_{1} is the smallest positive integer such that A^{p_{1}}=\mathbf{0} (and so A^{p_{1}-1}\neq\mathbf{0}). One can see from the above analysis of dot diagrams, see Section 9.4.3, that p_{1} is the length of the longest cycle (so this can also be used as the definition of p_{1}).

Computing the operators A^{k}, k=1,2,\ldots,p_{1}-1, and counting \dim\operatorname{Ker}(A^{k}), we can construct the dot diagram of A. Namely, \dim\operatorname{Ker}A gives us the number of dots in the top row, \dim\operatorname{Ker}(A^{2})-\dim\operatorname{Ker}A the number of dots in the next row, etc.

Now we want to put vectors in place of the dots and find a basis which is a union of cycles.

We start by finding the longest cycles (since we know the dot diagram, we know how many cycles there should be, and what the length of each cycle is). Consider the column space \operatorname{Ran}(A^{p_{1}-1}). Since A^{p_{1}}=\mathbf{0}, we have \operatorname{Ran}(A^{p_{1}-1})\subset\operatorname{Ker}A, so \operatorname{Ran}(A^{p_{1}-1})=\operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A. To be in agreement with the notation of the next steps, we will use the notation \operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A instead of \operatorname{Ran}(A^{p_{1}-1}).

Let \mathbf{v}_{1}^{1},\mathbf{v}^{2}_{1},\ldots,\mathbf{v}^{r_{1}}_{1} be a basis in the subspace \operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A; these vectors will be the initial vectors of the cycles. Then we find the end vectors \mathbf{v}_{p_{1}}^{1},\mathbf{v}_{p_{1}}^{2},\ldots,\mathbf{v}_{p_{1}}^{r_{1}} of the cycles by solving the equations

A^{p_{1}-1}\mathbf{v}_{p_{1}}^{k}=\mathbf{v}_{1}^{k},\qquad k=1,2,\ldots,r_{1}.

Applying the operator A consecutively to the end vector \mathbf{v}_{p_{1}}^{k}, we get all the vectors \mathbf{v}_{j}^{k} in the cycle.

Alternatively, for each k we can find the vectors \mathbf{v}_{j}^{k}, 2\leq j\leq p_{1}, by consecutively solving the equations

A\mathbf{v}_{j}^{k}=\mathbf{v}_{j-1}^{k},\qquad j=2,\ldots,p_{1}.

Either way, we have constructed all cycles of maximal length.

Let p_{2} be the length of a maximal cycle among those that are left to find. Consider the subspace \operatorname{Ran}(A^{p_{2}-1})\cap\operatorname{Ker}A, and let its dimension be r_{2}. Since \operatorname{Ran}(A^{p_{1}-1})\subset\operatorname{Ran}(A^{p_{2}-1}), we conclude that

\operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A\subset\operatorname{Ran}(A^{p_{2}-1})\cap\operatorname{Ker}A,

so r_{2}\geq r_{1}=\dim\operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A. Moreover, it can be seen from the dot diagram that r_{2}>r_{1}. (One can also see from the dot diagram that for p_{2}<p\leq p_{1} we have \operatorname{Ran}(A^{p-1})\cap\operatorname{Ker}A=\operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A, so p_{2} is the largest exponent for which we need to complete the basis.) Therefore we can complete the basis \mathbf{v}_{1}^{1},\mathbf{v}_{1}^{2},\ldots,\mathbf{v}_{1}^{r_{1}} of \operatorname{Ran}(A^{p_{1}-1})\cap\operatorname{Ker}A to a basis \mathbf{v}_{1}^{1},\mathbf{v}_{1}^{2},\ldots,\mathbf{v}_{1}^{r_{1}},\mathbf{v}_{1}^{r_{1}+1},\ldots,\mathbf{v}_{1}^{r_{2}} of \operatorname{Ran}(A^{p_{2}-1})\cap\operatorname{Ker}A. Then we find the end vectors of the cycles \mathcal{C}_{r_{1}+1},\ldots,\mathcal{C}_{r_{2}} by solving (for \mathbf{v}_{p_{2}}^{k}) the equations

A^{p_{2}-1}\mathbf{v}_{p_{2}}^{k}=\mathbf{v}_{1}^{k},\qquad k=r_{1}+1,r_{1}+2,\ldots,r_{2},

thus constructing the cycles of length p_{2}.

Let p_{3} denote the length of a maximal cycle among the ones left. Then, completing the basis \mathbf{v}_{1}^{1},\mathbf{v}_{1}^{2},\ldots,\mathbf{v}_{1}^{r_{2}} of \operatorname{Ran}(A^{p_{2}-1})\cap\operatorname{Ker}A to a basis in \operatorname{Ran}(A^{p_{3}-1})\cap\operatorname{Ker}A, we construct the cycles of length p_{3}, and so on.

One final remark: as we discussed above, if we know the dot diagram, we know the canonical form, so after we have found a Jordan canonical basis, we do not need to compute the matrix of A in this basis: we already know it!
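In practice one can also let a computer algebra system carry out the whole construction, and use it to check hand computations. For instance, sympy's Matrix.jordan_form returns both a transformation matrix (whose columns form a Jordan canonical basis) and the Jordan canonical form; here is a sketch on a small nilpotent matrix of our choosing:

import sympy as sp

A = sp.Matrix([[0, 1, 0],
               [0, 0, 0],
               [0, 0, 0]])               # one cycle of length 2, one of length 1
P, J = A.jordan_form()                   # columns of P form a Jordan canonical basis
print(J)                                 # a 2x2 block of form (9.4.1) plus a 1x1 zero block
print(sp.simplify(P*J*P.inv() - A))      # the zero matrix: A = P J P^{-1}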

9.5. Jordan decomposition theorem

Theorem 9.5.1.

Given an operator A, there exists a basis (a Jordan canonical basis) such that the matrix of A in this basis has a block diagonal form with blocks of the form

(9.5.1) \left(\begin{array}{ccccc}\lambda&1&&&0\\ &\lambda&1&&\\ &&\lambda&\ddots&\\ &&&\ddots&1\\ 0&&&&\lambda\end{array}\right)

where \lambda is an eigenvalue of A. Here we assume that a block of size 1 is just \lambda.

The block diagonal form from Theorem 9.5.1 is called the Jordan canonical form of the operator A. The corresponding basis is called a Jordan canonical basis for the operator A.

Proof of Theorem 9.5.1.

According to Theorem 9.3.4 and Remark 9.3.5, if we join bases of the generalized eigenspaces E_{k}=E_{\lambda_{k}} to get a basis in the whole space, the matrix of A in this basis has the block diagonal form \operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where A_{k}=A|_{E_{k}}. The operators N_{k}=A_{k}-\lambda_{k}I_{E_{k}} are nilpotent, so by Theorem 9.4.2 (more precisely, by Corollary 9.4.3) one can find a basis in each E_{k} such that the matrix of N_{k} in this basis is its Jordan canonical form. To get the matrix of A_{k} in this basis, one just puts \lambda_{k} instead of 0 on the main diagonal. ∎

9.5.1. Remarks about computing a Jordan canonical basis

First of all, let us recall that computing the eigenvalues is the hardest part; we do not discuss it here, and assume that the eigenvalues are already computed.

For each eigenvalue \lambda we compute the subspaces \operatorname{Ker}(A-\lambda I)^{k}, k=1,2,\ldots, until the sequence of subspaces stabilizes. In fact, since we have an increasing sequence of subspaces (\operatorname{Ker}(A-\lambda I)^{k}\subset\operatorname{Ker}(A-\lambda I)^{k+1}), it is sufficient to keep track only of their dimensions (or of the ranks of the operators (A-\lambda I)^{k}). For an eigenvalue \lambda, let m=m_{\lambda} be the number at which the sequence \operatorname{Ker}(A-\lambda I)^{k} stabilizes, i.e. m satisfies

\dim\operatorname{Ker}(A-\lambda I)^{m-1}<\dim\operatorname{Ker}(A-\lambda I)^{m}=\dim\operatorname{Ker}(A-\lambda I)^{m+1}.

Then E_{\lambda}=\operatorname{Ker}(A-\lambda I)^{m} is the generalized eigenspace corresponding to the eigenvalue \lambda.

After we have computed all the generalized eigenspaces, there are two possible ways to proceed. The first way is to find a basis in each generalized eigenspace, so that the matrix of the operator A in the resulting basis has the block-diagonal form \operatorname{diag}\{A_{1},A_{2},\ldots,A_{r}\}, where A_{k}=A|_{E_{\lambda_{k}}}. Then we can deal with each matrix A_{k} separately. The operators N_{k}=A_{k}-\lambda_{k}I are nilpotent, so applying the algorithm described in Section 9.4.4 we get the Jordan canonical representation for N_{k}, and putting \lambda_{k} instead of 0 on the main diagonal, we get the Jordan canonical representation for the block A_{k}. The advantage of this approach is that we work with smaller blocks. But we need to find the matrix of the operator in a new basis, which involves inverting a matrix and matrix multiplication.

Another way is to find a Jordan canonical basis in each of the generalized eigenspaces E_{\lambda_{k}} by working directly with the operator A, without first splitting it into blocks. Again, the algorithm outlined above in Section 9.4.4 works with a slight modification. Namely, when computing a Jordan canonical basis for a generalized eigenspace E_{\lambda_{k}}, instead of considering the subspaces \operatorname{Ran}(A_{k}-\lambda_{k}I)^{j}, which we would need when working with the block A_{k} separately, we consider the subspaces (A-\lambda_{k}I)^{j}E_{\lambda_{k}}.
