Let $A$ be a square matrix, and let $p(\lambda) = \det(A - \lambda I)$ be its characteristic polynomial. Then $p(A) = 0$.
The proof looks ridiculously simple: plugging $A$ instead of $\lambda$ in the definition of the characteristic polynomial we get
$$p(A) = \det(A - A\,I) = \det 0 = 0.$$
∎
But this is a wrong proof! To see why, let us analyze what the theorem states. It states that if we compute the characteristic polynomial
$$p(\lambda) = \det(A - \lambda I) = c_n \lambda^{n} + c_{n-1}\lambda^{n-1} + \dots + c_1 \lambda + c_0$$
and then plug the matrix $A$ instead of $\lambda$ to get
$$p(A) = c_n A^{n} + c_{n-1} A^{n-1} + \dots + c_1 A + c_0 I,$$
then the result will be the zero matrix.
It is not clear why we get the same result if we just plug $A$ instead of $\lambda$ in the determinant $\det(A - \lambda I)$. Moreover, it is easy to see that, with the exception of the trivial case of $1 \times 1$ matrices, we will get a different object. Namely, $A - A\,I$ is the zero matrix, and its determinant is just the number $0$. But $p(A)$ is an $n \times n$ matrix, and the theorem claims that this matrix is the zero matrix. Thus we are comparing apples and oranges. Even though in both cases we got zero, these are different zeroes: the number zero and the zero matrix!
Let us present a correct (albeit different from the standard one used in most textbooks) proof, which is based on some ideas from analysis.
The proof is based on several observations. First of all, the theorem is trivial for diagonal matrices, and so for matrices similar to diagonal ones (i.e. for diagonalizable matrices), see Problem 9.1.1 below.
The second observation is that any matrix can be approximated (as closely as we want) by diagonalizable matrices. Since any operator has an upper triangular matrix in some orthonormal basis (see Theorem 6.1.1 in Chapter 6), we can assume without loss of generality that $A$ is an upper triangular matrix.
We can perturb the diagonal entries of $A$ (as little as we want) to make them all different, so that the perturbed matrix $\widetilde A$ is diagonalizable (the eigenvalues of a triangular matrix are its diagonal entries, see Section 4.1.7 in Chapter 4, and by Corollary 4.2.3 in Chapter 4 an $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable).
As I just mentioned, we can perturb the diagonal entries of $A$ as little as we want, so the Frobenius norm $\|A - \widetilde A\|$ is as small as we want. Therefore one can find a sequence of diagonalizable matrices $A_k$ such that $A_k \to A$ as $k \to \infty$ (for example, such that $\|A - A_k\| \le 1/k$). It can be shown that the characteristic polynomials $p_k(\lambda) = \det(A_k - \lambda I)$ converge (coefficient-wise) to the characteristic polynomial $p(\lambda)$ of $A$. Therefore
$$p_k(A_k) \to p(A) \qquad \text{as } k \to \infty.$$
But as we just discussed above, the Cayley–Hamilton Theorem is trivial for diagonalizable matrices, so $p_k(A_k) = 0$ for all $k$. Therefore $p(A) = \lim_{k\to\infty} p_k(A_k) = 0$. ∎
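For readers who like to experiment, the theorem is also easy to check numerically. The following sketch (using NumPy; the random test matrix is just an illustration, not part of the proof) evaluates the characteristic polynomial at the matrix itself and verifies that the result is the zero matrix up to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))        # a "generic" 5x5 matrix
n = A.shape[0]

# Coefficients of the characteristic polynomial det(lambda*I - A), highest degree
# first.  (This differs from det(A - lambda*I) only by a sign, which does not
# matter for the Cayley-Hamilton identity.)
coeffs = np.poly(A)

# p(A) = A^n + c_{n-1} A^{n-1} + ... + c_1 A + c_0 I  (note c_0*I, not the scalar c_0)
p_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))

print(np.linalg.norm(p_of_A))          # ~1e-12: the zero matrix up to round-off
```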
This proof illustrates an important idea: often it is sufficient to consider only a typical, generic situation. It goes beyond the scope of this book, but let us mention, without going into details, that a generic (i.e. typical) matrix is diagonalizable.
This proof is intended for a reader who is comfortable with such ideas from analysis as continuity and convergence (here I mean analysis, i.e. a rigorous treatment of continuity, convergence, etc., and not calculus, which, as it is taught now, is simply a collection of recipes). Such a reader should be able to fill in all the details, and for him/her this proof should look extremely easy and natural.
However, for others, who are not yet comfortable with these ideas, the proof may definitely look strange. It may even look like some kind of cheating, although, let me repeat, it is an absolutely correct and rigorous proof (modulo some standard facts from analysis). So, let us present another proof of the theorem, which is one of the "standard" proofs from linear algebra textbooks.
We know, see Theorem 6.1.1 from Chapter 6, that any square matrix is unitarily equivalent to an upper triangular one. Since for any polynomial $p$ we have $p(U A U^{-1}) = U\, p(A)\, U^{-1}$, and the characteristic polynomials of unitarily equivalent matrices coincide, it is sufficient to prove the theorem only for upper triangular matrices.
So, let $A$ be an upper triangular matrix. We know that the diagonal entries of a triangular matrix coincide with its eigenvalues, so let $\lambda_1, \lambda_2, \dots, \lambda_n$ be the eigenvalues of $A$ ordered as they appear on the diagonal, i.e. $a_{k,k} = \lambda_k$.
The characteristic polynomial of $A$ can be represented as $p(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda)$, so
$$p(A) = (\lambda_1 I - A)(\lambda_2 I - A)\cdots(\lambda_n I - A).$$
Define subspaces $E_k := \operatorname{span}\{e_1, e_2, \dots, e_k\}$, $k = 1, 2, \dots, n$, where $e_1, e_2, \dots, e_n$ is the standard basis in $\mathbb{F}^n$, and set $E_0 := \{0\}$. Since the matrix of $A$ is upper triangular, the subspaces $E_k$ are so-called invariant subspaces of the operator $A$, i.e. $A E_k \subset E_k$ (meaning that $A v \in E_k$ for all $v \in E_k$). Moreover, for any $v \in E_k$ and any $\lambda$
$$(A - \lambda I) v = A v - \lambda v \in E_k,$$
because both $A v$ and $\lambda v$ are in $E_k$. Thus $(A - \lambda I) E_k \subset E_k$, i.e. $E_k$ is an invariant subspace of $A - \lambda I$.
We can say even more about the subspace $E_k$. Namely, $(A - \lambda_k I) e_k \in E_{k-1}$, because only the first $k-1$ entries of the $k$th column of the matrix of $A - \lambda_k I$ can be non-zero. On the other hand, for $j < k$ we have $(A - \lambda_k I) e_j \in E_{k-1}$ (because $E_{k-1}$ is an invariant subspace of $A - \lambda_k I$).
Take any vector $v \in E_k$. By the definition of $E_k$ it can be represented as a linear combination of the vectors $e_1, e_2, \dots, e_k$. Since all these vectors are transformed by $A - \lambda_k I$ to vectors in $E_{k-1}$, we can conclude that
$$(A - \lambda_k I) E_k \subset E_{k-1}. \qquad (9.1.1)$$
Take an arbitrary vector $x \in E_n = \mathbb{F}^n$. Applying (9.1.1) inductively with $k = n, n-1, \dots, 1$ we get
$$p(A) x = (\lambda_1 I - A)(\lambda_2 I - A)\cdots(\lambda_n I - A) x \in E_0.$$
The last inclusion means that $p(A) x \in E_0$. But $E_0 = \{0\}$, so $p(A) x = 0$.
Therefore $p(A) x = 0$ for all $x \in \mathbb{F}^n$, which means exactly that $p(A) = 0$. ∎
As discussed in the above section, the Cayley–Hamilton theorem states that if $A$ is a square matrix and
$$p(\lambda) = \det(A - \lambda I) = c_n \lambda^{n} + c_{n-1}\lambda^{n-1} + \dots + c_1 \lambda + c_0$$
is its characteristic polynomial, then
$$p(A) = c_n A^{n} + c_{n-1} A^{n-1} + \dots + c_1 A + c_0 I = 0$$
(we assume that, by definition, $A^{0} = I$).
Prove this theorem for the special case when $A$ is similar to a diagonal matrix, $A = S D S^{-1}$ with $D$ diagonal.
Hint: If $A = S D S^{-1}$ and $p$ is any polynomial, can you compute $p(A)$ in terms of $p(D)$? What is $p(D)$ for a diagonal $D$?
Let us also recall that for a square matrix (an operator) $A$ and for a polynomial $p(z) = \sum_{k=0}^{N} a_k z^k$ the operator $p(A)$ is defined by substituting $A$ instead of the independent variable,
$$p(A) := \sum_{k=0}^{N} a_k A^{k} = a_0 I + a_1 A + \dots + a_N A^{N};$$
here we agree that $A^{0} := I$.
We know that generally matrix multiplication is not commutative, i.e. generally $AB \neq BA$, so the order is essential. However
$$A^{j} A^{k} = A^{k} A^{j} = A^{j+k},$$
and from here it is easy to show that for arbitrary polynomials $p$ and $q$
$$p(A)\, q(A) = q(A)\, p(A) = R(A),$$
where $R(z) = p(z) q(z)$.
That means that when dealing only with polynomials of an operator $A$, one does not need to worry about non-commutativity, and can act as if $A$ were simply an independent (scalar) variable. In particular, if a polynomial $p$ can be represented as a product of monomials
$$p(z) = a(z - z_1)(z - z_2)\cdots(z - z_N),$$
where $z_1, z_2, \dots, z_N$ are the roots of $p$, then $p(A)$ can be represented as
$$p(A) = a(A - z_1 I)(A - z_2 I)\cdots(A - z_N I).$$
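A quick numerical illustration of the last two formulas (a sketch; the particular polynomial and matrix are arbitrary): evaluating $p(A)$ from the expanded coefficients and from the factored form $a(A - z_1 I)\cdots(A - z_N I)$ gives the same matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
coeffs = np.array([2.0, -3.0, 1.0, -5.0])       # p(z) = 2z^3 - 3z^2 + z - 5
deg = len(coeffs) - 1

# Expanded form: p(A) = 2A^3 - 3A^2 + A - 5I
p_expanded = sum(c * np.linalg.matrix_power(A, deg - k) for k, c in enumerate(coeffs))

# Factored form: p(A) = a (A - z1*I)(A - z2*I)(A - z3*I), z_j being the roots of p
roots = np.roots(coeffs)
I = np.eye(4, dtype=complex)
p_factored = coeffs[0] * np.linalg.multi_dot([A - z * I for z in roots])

print(np.linalg.norm(p_expanded - p_factored))  # ~1e-13
```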
Let us recall that the spectrum $\sigma(A)$ of a square matrix (an operator) $A$ is the set of all eigenvalues of $A$ (not counting multiplicities).
For a square matrix $A$ and an arbitrary polynomial $p$
$$\sigma\bigl(p(A)\bigr) = p\bigl(\sigma(A)\bigr) := \{p(\lambda) : \lambda \in \sigma(A)\}.$$
In other words, $\mu$ is an eigenvalue of $p(A)$ if and only if $\mu = p(\lambda)$ for some eigenvalue $\lambda$ of $A$.
Note that, as stated, this theorem does not say anything about multiplicities of the eigenvalues.
Note that one inclusion is trivial. Namely, if $\lambda$ is an eigenvalue of $A$, i.e. $A v = \lambda v$ for some $v \neq 0$, then $A^{k} v = \lambda^{k} v$, and $p(A) v = p(\lambda) v$, so $p(\lambda)$ is an eigenvalue of $p(A)$. That means that the inclusion $p(\sigma(A)) \subset \sigma(p(A))$ is trivial.
If we consider a particular case of the above theorem, we get the following corollary.
Let $A$ be a square matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, and let $p$ be a polynomial. Then $p(A)$ is invertible if and only if
$$p(\lambda_k) \neq 0 \qquad \text{for all } k = 1, 2, \dots, n.$$
As it was discussed above, the inclusion
$$p(\sigma(A)) \subset \sigma(p(A))$$
is trivial.
To prove the opposite inclusion take a point $\mu \in \sigma(p(A))$. Denote $q(\lambda) := p(\lambda) - \mu$, so $q(A) = p(A) - \mu I$. Since $\mu \in \sigma(p(A))$, the operator $q(A) = p(A) - \mu I$ is not invertible.
Let us represent the polynomial $q$ as a product of monomials,
$$q(\lambda) = a(\lambda - z_1)(\lambda - z_2)\cdots(\lambda - z_N).$$
Then, as it was discussed above in Section 9.2.1, we can represent
$$q(A) = a(A - z_1 I)(A - z_2 I)\cdots(A - z_N I).$$
The operator $q(A)$ is not invertible, so one of the factors $A - z_k I$ must be non-invertible (because a product of invertible transformations is always invertible). That means $z_k \in \sigma(A)$.
On the other hand $z_k$ is a root of $q$, so
$$0 = q(z_k) = p(z_k) - \mu,$$
and therefore $\mu = p(z_k) \in p(\sigma(A))$. So we have proved the inclusion $\sigma(p(A)) \subset p(\sigma(A))$. ∎
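The theorem is also easy to test numerically. A small sketch (the polynomial and the matrix are arbitrary; the naive sorted comparison could need more care if some of the computed values nearly coincide):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

def p(M):
    """p(z) = z^3 - 2z + 1, applied to a square matrix or entrywise to a 1-D array."""
    if M.ndim == 2:
        return np.linalg.matrix_power(M, 3) - 2 * M + np.eye(M.shape[0])
    return M**3 - 2 * M + 1

eig_A = np.linalg.eigvals(A)
eig_pA = np.linalg.eigvals(p(A))

# sigma(p(A)) and p(sigma(A)) coincide (here even with multiplicities)
print(np.allclose(np.sort_complex(p(eig_A)), np.sort_complex(eig_pA)))   # True
```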
An operator $A$ is called nilpotent if $A^{k} = 0$ for some $k$. Prove that if $A$ is nilpotent, then $\sigma(A) = \{0\}$ (i.e. that $0$ is the only eigenvalue of $A$).
Can you do it without using the spectral mapping theorem?
Let $A : V \to V$ be an operator (linear transformation) in a vector space $V$. A subspace $E$ of the vector space $V$ is called an invariant subspace of the operator $A$ (or, shortly, $A$-invariant) if $A E \subset E$, i.e. if $A v \in E$ for all $v \in E$.
If $E$ is $A$-invariant, then
$$A^{2} E = A(A E) \subset A E \subset E,$$
i.e. $E$ is $A^{2}$-invariant.
Similarly one can show (using induction, for example) that if $E$ is $A$-invariant, then
$$A^{k} E \subset E \qquad \text{for all } k \ge 1.$$
This implies that $p(A) E \subset E$ for any polynomial $p$, i.e. that:
any $A$-invariant subspace $E$ is an invariant subspace of $p(A)$.
If $E$ is an $A$-invariant subspace, then for all $v \in E$ the result $A v$ also belongs to $E$. Therefore we can treat $A$ as an operator acting on $E$, not on the whole space $V$.
Formally, for an $A$-invariant subspace $E$ we define the so-called restriction $A|_E : E \to E$ of $A$ onto $E$ by
$$\bigl(A|_E\bigr) v = A v \qquad \text{for all } v \in E.$$
Here we changed the domain and the target space of the operator, but the rule assigning the value to the argument remains the same.
We will need the following simple lemma.
Let $p$ be a polynomial, and let $E$ be an $A$-invariant subspace. Then
$$p(A)|_E = p\bigl(A|_E\bigr).$$
The proof is trivial. ∎
If $V_1, V_2, \dots, V_r$ is a basis of $A$-invariant subspaces, and $A_k := A|_{V_k}$ are the corresponding restrictions, then, since $A V_k \subset V_k$, the operators $A_k$ act independently of each other (do not interact), and to analyze the action of $A$ we can analyze the operators $A_k$ separately.
In particular, if we pick a basis in each subspace $V_k$ and join them to get a basis in $V$ (see Theorem 4.2.6 from Chapter 4), then the operator $A$ will have in this basis the following block-diagonal form
$$A = \operatorname{diag}\{A_1, A_2, \dots, A_r\} = \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{pmatrix}$$
(of course, here we need the correct ordering of the basis in $V$: first we take a basis in $V_1$, then in $V_2$, and so on).
Our goal now is to pick a basis of invariant subspaces such that the restrictions $A_k$ have a simple structure. In this case we will get a basis in which the matrix of $A$ has a simple structure.
The eigenspaces $\operatorname{Ker}(A - \lambda_k I)$ would be good candidates, because the restriction of $A$ to the eigenspace $\operatorname{Ker}(A - \lambda_k I)$ is simply $\lambda_k I$. Unfortunately, as we know, eigenspaces do not always form a basis of subspaces (they form one if and only if $A$ can be diagonalized, cf. Theorem 4.2.1 in Chapter 4).
However, the so-called generalized eigenspaces will work.
A vector $v$ is called a generalized eigenvector of $A$ (corresponding to an eigenvalue $\lambda$) if $(A - \lambda I)^{k} v = 0$ for some $k \ge 1$.
The collection $E_\lambda$ of all generalized eigenvectors corresponding to $\lambda$, together with the zero vector $0$, is called the generalized eigenspace (corresponding to the eigenvalue $\lambda$).
In other words, one can represent the generalized eigenspace as
$$E_\lambda = \bigcup_{k \ge 1} \operatorname{Ker}(A - \lambda I)^{k}. \qquad (9.3.1)$$
The sequence $\operatorname{Ker}(A - \lambda I)^{k}$, $k = 1, 2, 3, \dots$, is an increasing sequence of subspaces, i.e.
$$\operatorname{Ker}(A - \lambda I) \subset \operatorname{Ker}(A - \lambda I)^{2} \subset \operatorname{Ker}(A - \lambda I)^{3} \subset \cdots$$
The representation (9.3.1) does not look very simple, for it involves an infinite union. However, the sequence of the subspaces $\operatorname{Ker}(A - \lambda I)^{k}$ stabilizes, i.e.
$$\operatorname{Ker}(A - \lambda I)^{k} = \operatorname{Ker}(A - \lambda I)^{k+1} = \operatorname{Ker}(A - \lambda I)^{k+2} = \cdots \qquad \text{for all sufficiently large } k,$$
so, in fact, one can take a finite union.
To show that the sequence of kernels stabilizes, let us notice that if for finite-dimensional subspaces $E$ and $F$ we have $E \subsetneq F$ (the symbol $\subsetneq$ means that $E \subset F$ but $E \neq F$), then $\dim E < \dim F$. Since $\dim \operatorname{Ker}(A - \lambda I)^{k} \le \dim V < \infty$, the sequence of dimensions cannot grow to infinity, so at some point
$$\operatorname{Ker}(A - \lambda I)^{k} = \operatorname{Ker}(A - \lambda I)^{k+1}.$$
The rest follows from the lemma below.
Let $\operatorname{Ker}(A - \lambda I)^{k} = \operatorname{Ker}(A - \lambda I)^{k+1}$ for some $k$.
Then
$$\operatorname{Ker}(A - \lambda I)^{k+r} = \operatorname{Ker}(A - \lambda I)^{k} \qquad \text{for all } r \ge 0.$$
Let $v \in \operatorname{Ker}(A - \lambda I)^{k+2}$, i.e. $(A - \lambda I)^{k+2} v = 0$. Then
$$(A - \lambda I)^{k+1}\bigl((A - \lambda I) v\bigr) = 0.$$
But we know that $\operatorname{Ker}(A - \lambda I)^{k+1} = \operatorname{Ker}(A - \lambda I)^{k}$, so $(A - \lambda I)^{k}\bigl((A - \lambda I) v\bigr) = 0$, which means $(A - \lambda I)^{k+1} v = 0$. Recalling the definition of the kernel we get that
$$v \in \operatorname{Ker}(A - \lambda I)^{k+1},$$
so $\operatorname{Ker}(A - \lambda I)^{k+2} \subset \operatorname{Ker}(A - \lambda I)^{k+1}$. We proved that $\operatorname{Ker}(A - \lambda I)^{k+2} = \operatorname{Ker}(A - \lambda I)^{k+1}$; the opposite inclusion is trivial. Repeating this reasoning we get the statement for all $r$. ∎
The number $d_\lambda$ at which the sequence $\operatorname{Ker}(A - \lambda I)^{k}$ stabilizes, i.e. the number $d_\lambda$ such that
$$\operatorname{Ker}(A - \lambda I)^{d_\lambda - 1} \subsetneq \operatorname{Ker}(A - \lambda I)^{d_\lambda} = \operatorname{Ker}(A - \lambda I)^{d_\lambda + 1},$$
is called the depth of the eigenvalue $\lambda$.
It follows from the definition of the depth that for the generalized eigenspace $E_\lambda$
$$E_\lambda = \operatorname{Ker}(A - \lambda I)^{d_\lambda}, \qquad \text{so} \qquad (A - \lambda I)^{d_\lambda} v = 0 \quad \text{for all } v \in E_\lambda. \qquad (9.3.2)$$
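In floating-point arithmetic the depth and the generalized eigenspace can be found by watching when the kernel dimensions stop growing. The sketch below (our own helper; the rank decision needs a tolerance, so treat it as an illustration only) returns the depth $d_\lambda$ and $\dim E_\lambda$.

```python
import numpy as np

def depth_and_dim(A, lam, tol=1e-9):
    """Depth d of the eigenvalue lam and dim Ker (A - lam*I)^d,
    found by increasing the power until the kernel dimension stabilizes."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    P, prev_dim, k = np.eye(n), 0, 0
    while True:
        k += 1
        P = P @ B                                      # P = B^k
        dim = n - np.linalg.matrix_rank(P, tol=tol)    # dim Ker B^k
        if dim == prev_dim:
            return k - 1, dim                          # stabilized one step earlier
        prev_dim = dim

# A single 4x4 Jordan block with eigenvalue 2: depth 4, E_2 is the whole space
J = 2 * np.eye(4) + np.diag(np.ones(3), 1)
print(depth_and_dim(J, 2.0))                           # (4, 4)
```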
Now let us summarize what we know about the generalized eigenspaces $E_\lambda$.
a) $E_\lambda$ is an invariant subspace of $A$: $A E_\lambda \subset E_\lambda$.
b) The restriction $(A - \lambda I)|_{E_\lambda}$ is nilpotent; namely, $\bigl((A - \lambda I)|_{E_\lambda}\bigr)^{d_\lambda} = 0$, see (9.3.2).
c) $\sigma(A|_{E_\lambda}) = \{\lambda\}$, because the operator $(A - \lambda I)|_{E_\lambda}$ is nilpotent, see b), and the spectrum of a nilpotent operator consists of the single point $0$, see Problem 9.2.1.
Now we are ready to state the main result of this section. Let $A : V \to V$ be an operator acting in a finite-dimensional vector space $V$.
Let $\sigma(A)$ consist of $r$ points $\lambda_1, \lambda_2, \dots, \lambda_r$, and let $E_k := E_{\lambda_k}$ be the corresponding generalized eigenspaces. Then the system of subspaces $E_1, E_2, \dots, E_r$ is a basis of subspaces in $V$.
Let $m_k$ be the multiplicity of the eigenvalue $\lambda_k$, so $p(\lambda) = \prod_{k=1}^{r} (\lambda_k - \lambda)^{m_k}$ is the characteristic polynomial of $A$. Define
$$p_k(\lambda) := p(\lambda)\big/(\lambda_k - \lambda)^{m_k} = \prod_{j \neq k} (\lambda_j - \lambda)^{m_j}. \qquad (9.3.3)$$
Define $A_k := A|_{E_k}$. By property c) of the generalized eigenspaces $\sigma(A_k) = \{\lambda_k\}$. Since $p_k(\lambda_k) \neq 0$, the Spectral Mapping Theorem, see Corollary 9.2.2, implies that the operator $p_k(A_k)$ is invertible.
Continuing with the proof of Theorem 9.3.4, define the polynomial $q := p_1 + p_2 + \dots + p_r$. Since $p_k(\lambda_j) = 0$ for $j \neq k$ and $p_j(\lambda_j) \neq 0$, we can conclude that $q(\lambda_j) = p_j(\lambda_j) \neq 0$ for all $j$. Therefore, by the Spectral Mapping Theorem, see Corollary 9.2.2, the operator
$$q(A) = p_1(A) + p_2(A) + \dots + p_r(A)$$
is invertible.
Define the operators $P_k$, $k = 1, 2, \dots, r$, by
$$P_k := p_k(A)\, q(A)^{-1} = q(A)^{-1} p_k(A)$$
(the two expressions coincide because $q(A)^{-1}$ commutes with every polynomial of $A$).
For the operators $P_k$ defined above:
a) $P_1 + P_2 + \dots + P_r = I$;
b) $P_k v = 0$ for $v \in E_j$, $j \neq k$;
c) $P_k v \in E_k$ for all $v \in V$;
d) moreover, $P_k v = v$ for all $v \in E_k$, so, in fact, $P_k V = E_k$.
Property a) is trivial:
$$\sum_{k=1}^{r} P_k = \Bigl(\sum_{k=1}^{r} p_k(A)\Bigr) q(A)^{-1} = q(A)\, q(A)^{-1} = I.$$
Property b) follows from (9.3.3). Indeed, $p_k$ contains the factor $(\lambda_j - \lambda)^{m_j}$, the restriction of which to $E_j$ is zero (the multiplicity $m_j$ is at least the depth $d_{\lambda_j}$, so $(\lambda_j I - A)^{m_j}$ annihilates $E_j$ by (9.3.2)). Therefore $p_k(A)|_{E_j} = 0$ and thus $P_k|_{E_j} = q(A)^{-1} p_k(A)|_{E_j} = 0$.
To prove property c), recall that according to the Cayley–Hamilton Theorem $p(A) = 0$. Since $p(\lambda) = (\lambda_k - \lambda)^{m_k} p_k(\lambda)$, we have for all $v \in V$
$$(\lambda_k I - A)^{m_k} p_k(A) v = p(A) v = 0.$$
That means that any vector in $\operatorname{Ran} p_k(A)$ is annihilated by some power of $A - \lambda_k I$, which by definition means that $\operatorname{Ran} p_k(A) \subset E_k$. Since $P_k = p_k(A)\, q(A)^{-1}$, we conclude that $P_k v \in E_k$ for all $v \in V$.
Now we are ready to complete the proof of the theorem. Take $v \in V$ and define $v_k := P_k v$. Then according to Statement c) of Lemma 9.3.7, $v_k \in E_k$, and by Statement a),
$$v = \sum_{k=1}^{r} P_k v = \sum_{k=1}^{r} v_k,$$
so $v$ admits a representation as a linear combination of vectors $v_k \in E_k$.
To show that this representation is unique, we can just note that if $v$ is represented as $v = \sum_{k=1}^{r} v_k'$, $v_k' \in E_k$, then it follows from Statements b) and d) of Lemma 9.3.7 that
$$P_j v = \sum_{k=1}^{r} P_j v_k' = P_j v_j' = v_j',$$
i.e. $v_j' = P_j v = v_j$ for all $j$.
∎
The algebraic multiplicity of an eigenvalue equals the dimension of the corresponding generalized eigenspace.
According to Remark 9.3.5, if we join bases in the generalized eigenspaces to get a basis in the whole space, the matrix of $A$ in any such basis has a block-diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = A|_{E_{\lambda_k}} = \lambda_k I_k + N_k$. The operators $N_k$ are nilpotent, so $\sigma(N_k) = \{0\}$. Therefore, the spectrum of the operator $A_k$ (recall that $A_k = \lambda_k I_k + N_k$) consists of the single eigenvalue $\lambda_k$ of (algebraic) multiplicity $\dim E_{\lambda_k}$. The multiplicity equals $\dim E_{\lambda_k}$ because an operator in a finite-dimensional space has exactly as many eigenvalues, counting multiplicities, as the dimension of the space, and $A_k$ has only one eigenvalue.
Note that we are free to pick the bases in the generalized eigenspaces $E_{\lambda_k}$, so let us pick them in such a way that the corresponding blocks $A_k$ are upper triangular. Then
$$\det(A - \lambda I) = \prod_{k=1}^{r} \det(A_k - \lambda I_k) = \prod_{k=1}^{r} (\lambda_k - \lambda)^{\dim E_{\lambda_k}}.$$
But this means that the algebraic multiplicity of the eigenvalue $\lambda_k$ is exactly $\dim E_{\lambda_k}$. ∎
The following corollary is very important for differential equations.
Any operator $A$ in $V$ can be represented as $A = D + N$, where $D$ is diagonalizable (i.e. diagonal in some basis) and $N$ is nilpotent ($N^{m} = 0$ for some $m$), and $D N = N D$.
As we discussed above, see Remark 9.3.5, if we join the bases in the generalized eigenspaces $E_{\lambda_k}$ to get a basis in $V$, then in this basis $A$ has the block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = \lambda_k I_k + N_k$, $k = 1, 2, \dots, r$. The operators $N_k$ are nilpotent, and the operator $D := \operatorname{diag}\{\lambda_1 I_1, \lambda_2 I_2, \dots, \lambda_r I_r\}$ is diagonal (in this basis). Notice also that $\lambda_k I_k N_k = N_k \lambda_k I_k$ (the identity operator commutes with any operator), so the block diagonal operator $D$ commutes with the block diagonal operator $N := \operatorname{diag}\{N_1, N_2, \dots, N_r\}$. Therefore, defining $N$ in this way, we get the desired decomposition. ∎
This corollary allows us to compute functions of operators. Let us recall that if $f$ is a polynomial of degree $d$, then $f(a + x)$ can be computed with the help of Taylor's formula
$$f(a + x) = \sum_{k=0}^{d} \frac{f^{(k)}(a)}{k!}\, x^{k}.$$
This formula is an algebraic identity, meaning that for each polynomial $f$ we can check that the formula is true using formal algebraic manipulations with $a$ and $x$, not caring about their nature.
Since the operators $D$ and $N$ commute, $D N = N D$, the same rules as for the usual (scalar) variables apply to them, and we can write (plugging $D$ instead of $a$ and $N$ instead of $x$)
$$f(A) = f(D + N) = \sum_{k=0}^{d} \frac{f^{(k)}(D)}{k!}\, N^{k}.$$
Here, to compute $f^{(k)}(D)$ we first compute the $k$th derivative of the polynomial $f$ (using the usual rules from calculus), and then plug $D$ instead of the independent variable.
But since $N$ is nilpotent, $N^{m} = 0$ for some $m$, only the first $m$ terms can be non-zero, so
$$f(D + N) = \sum_{k=0}^{m-1} \frac{f^{(k)}(D)}{k!}\, N^{k}.$$
If $m$ is much smaller than $d$, this formula makes the computation of $f(A)$ much easier.
The same approach works if $f$ is not a polynomial but an infinite power series. For general power series we have to be careful about convergence of all the series involved, so we cannot say that the formula is true for an arbitrary power series $f$. However, if the radius of convergence of the power series is $\infty$, then everything works fine. In particular, if $f(x) = e^{x}$, then, using the fact that $(e^{x})' = e^{x}$, we get
$$e^{A} = e^{D + N} = \sum_{k=0}^{m-1} \frac{e^{D}}{k!}\, N^{k} = e^{D} \sum_{k=0}^{m-1} \frac{N^{k}}{k!}.$$
This formula has important applications in differential equations.
Note that the fact that $D N = N D$ is essential here!
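Here is a minimal numerical illustration of the exponential formula (a sketch: $A$ is a single $3 \times 3$ Jordan block, so $D = \lambda I$ and $N$ is its nilpotent part, and the same decomposition applies to $tA = tD + tN$; SciPy's general-purpose expm is used only as an independent check).

```python
import numpy as np
from scipy.linalg import expm

t, lam = 0.7, 2.0
N = np.diag([1.0, 1.0], k=1)           # nilpotent part: N^3 = 0
A = lam * np.eye(3) + N                # A = D + N with D = lam*I, and DN = ND

# e^{tA} = e^{tD} (I + tN + (tN)^2/2): the series terminates because N is nilpotent
etA = np.exp(lam * t) * (np.eye(3) + t * N + (t * N) @ (t * N) / 2)

print(np.linalg.norm(etA - expm(t * A)))   # ~1e-15
```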
Recall that an operator $A$ in a vector space $V$ is called nilpotent if $A^{k} = 0$ for some exponent $k$.
In the previous section we proved, see Remark 9.3.5, that if we join the bases in all generalized eigenspaces to get a basis in the whole space, then the operator $A$ has in this basis a block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, and the operators $A_k$ can be represented as $A_k = \lambda_k I_k + N_k$, where the $N_k$ are nilpotent operators.
In each generalized eigenspace we want to pick a basis such that the matrix of $A_k$ in this basis has the simplest possible form. Since the matrix (in any basis) of the identity operator is the identity matrix, we need to find a basis in which the nilpotent operator $N_k$ has a simple form.
Since we can deal with each $N_k$ separately, we will need to consider the following problem:
For a nilpotent operator $A$ find a basis such that the matrix of $A$ in this basis is simple.
Let us see what it means for a matrix to have a simple form. It is easy to see that the $p \times p$ matrix
$$\begin{pmatrix}
0 & 1 & & & \\
 & 0 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & 0 & 1 \\
 & & & & 0
\end{pmatrix} \qquad (9.4.1)$$
(the entries on the diagonal right above the main one are $1$, and all other entries are $0$) is nilpotent.
These matrices (together with zero matrices) will be our "building blocks". Namely, we will show that for any nilpotent operator one can find a basis such that the matrix of the operator in this basis has the block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where each $A_k$ is either a block of the form (9.4.1) or a zero block.
Let us see what we should be looking for. Suppose the matrix of an operator $A$ has in a basis $v_1, v_2, \dots, v_p$ the form (9.4.1). Then
$$A v_1 = 0 \qquad (9.4.2)$$
and
$$A v_k = v_{k-1}, \qquad k = 2, 3, \dots, p. \qquad (9.4.3)$$
Thus we have to be looking for chains of vectors $v_1, v_2, \dots, v_p$ satisfying the above relations (9.4.2), (9.4.3). Such a chain is called a cycle of generalized eigenvectors of the nilpotent operator $A$; the vector $v_1$ is called the initial vector, $v_p$ the end vector, and $p$ the length of the cycle.
A similar definition can be made for an arbitrary operator $A$ and an eigenvalue $\lambda$. Then all vectors of the cycle must belong to the same generalized eigenspace $E_\lambda$, and they must satisfy the identities
$$(A - \lambda I) v_1 = 0, \qquad (A - \lambda I) v_k = v_{k-1}, \quad k = 2, 3, \dots, p.$$
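For the block (9.4.1) itself the standard basis plays the role of such a cycle, which is easy to confirm numerically (a two-line NumPy check of (9.4.2) and (9.4.3); the size $p = 4$ is arbitrary):

```python
import numpy as np

p = 4
A = np.diag(np.ones(p - 1), k=1)   # the p x p block (9.4.1)
e = np.eye(p)                      # standard basis e_1, ..., e_p as columns

print(np.allclose(A @ e[:, 0], 0))                        # A e_1 = 0        (9.4.2)
print(all(np.allclose(A @ e[:, k], e[:, k - 1])           # A e_k = e_{k-1}  (9.4.3)
          for k in range(1, p)))
```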
Let $A$ be a nilpotent operator, and let $C_1, C_2, \dots, C_r$ be cycles of its generalized eigenvectors, $C_k = \{v^k_1, v^k_2, \dots, v^k_{p_k}\}$, $p_k$ being the length of the cycle $C_k$. Assume that the initial vectors $v^1_1, v^2_1, \dots, v^r_1$ are linearly independent. Then no vector belongs to two cycles, and the union of all the vectors from all the cycles is a linearly independent system.
Let $n$ be the total number of vectors in all the cycles (here we just count the vectors in each cycle and add all the numbers; we do not care if some cycles have a common vector, we count this vector in each cycle it belongs to; of course, according to the theorem, this is impossible, but initially we cannot assume that). We will use induction on $n$. If $n = 1$ the theorem is trivial.
Let us now assume that the theorem is true for all operators and for all collections of cycles, as long as the total number of vectors in all the cycles is strictly less than $n$.
Without loss of generality we can assume that the vectors from the cycles span the whole space $V$, because otherwise we can consider, instead of the operator $A$, its restriction onto the invariant subspace spanned by all the vectors from the cycles.
Consider the subspace $\operatorname{Ran} A$. It follows from the relations (9.4.2), (9.4.3) that the vectors $v^k_j$, $1 \le j \le p_k - 1$, span $\operatorname{Ran} A$. Note that if $p_k > 1$ then the system $v^k_1, v^k_2, \dots, v^k_{p_k - 1}$ is again a cycle, and that $A$ annihilates any cycle of length 1.
Therefore, we have finitely many cycles whose initial vectors are linearly independent, so the induction hypothesis applies, and the vectors $v^k_j$, $1 \le j \le p_k - 1$, $k = 1, 2, \dots, r$, are linearly independent. Since these vectors also span $\operatorname{Ran} A$, they form a basis there. Therefore,
$$\operatorname{rank} A = \dim \operatorname{Ran} A = n - r$$
(we had $n$ vectors, and we removed one vector from each cycle $C_k$, $k = 1, 2, \dots, r$, so we have $n - r$ vectors in the basis of $\operatorname{Ran} A$). On the other hand $A v^k_1 = 0$ for $k = 1, 2, \dots, r$, and since these initial vectors are linearly independent, $\dim \operatorname{Ker} A \ge r$. By the Rank Theorem (Theorem 2.7.2 from Chapter 2)
$$\dim V = \operatorname{rank} A + \dim \operatorname{Ker} A \ge (n - r) + r = n,$$
so $\dim V \ge n$.
On the other hand $V$ is spanned by the $n$ vectors from the cycles, so $\dim V \le n$. Thus $\dim V = n$, and therefore the $n$ vectors $v^k_j$, $1 \le j \le p_k$, $1 \le k \le r$, form a basis, so they are linearly independent. ∎
Let $A : V \to V$ be a nilpotent operator. Then $V$ has a basis consisting of a union of cycles of generalized eigenvectors of the operator $A$.
We will use induction on $n$, where $n = \dim V$. For $n = 1$ the theorem is trivial.
Assume that the theorem is true for any operator acting in a space of dimension strictly less than $n$.
Consider the subspace $X := \operatorname{Ran} A$. $X$ is an invariant subspace of the operator $A$, so we can consider the restriction $A|_X$.
Since $A$ is not invertible, $\dim X = \operatorname{rank} A < n$, so by the induction hypothesis there exist cycles $C_1, C_2, \dots, C_r$ of generalized eigenvectors of $A|_X$ such that their union is a basis in $X$. Let $C_k = \{v^k_1, v^k_2, \dots, v^k_{p_k}\}$, where $v^k_1$ is the initial vector of the cycle.
Since the end vector $v^k_{p_k}$ belongs to $X = \operatorname{Ran} A$, one can find a vector $w_k$ such that $A w_k = v^k_{p_k}$. So we can extend each cycle $C_k$ to a bigger cycle $\widetilde C_k := \{v^k_1, \dots, v^k_{p_k}, w_k\}$. Since the initial vectors of the cycles $\widetilde C_k$, $k = 1, 2, \dots, r$, are linearly independent, the above Theorem 9.4.1 implies that the union of these cycles is a linearly independent system.
By the definition of a cycle we have $v^k_1 \in \operatorname{Ker} A$, and we assumed that the initial vectors $v^k_1$, $k = 1, 2, \dots, r$, are linearly independent. Let us complete this system to a basis in $\operatorname{Ker} A$, i.e. let us find vectors $u_1, u_2, \dots, u_s$ such that the system $v^1_1, \dots, v^r_1, u_1, \dots, u_s$ is a basis in $\operatorname{Ker} A$ (it may happen that the system $v^k_1$, $k = 1, 2, \dots, r$, is already a basis in $\operatorname{Ker} A$, in which case we put $s = 0$ and add nothing).
Each vector $u_j$ can be treated as a cycle of length 1, so we get a collection of cycles $\widetilde C_1, \dots, \widetilde C_r, \{u_1\}, \dots, \{u_s\}$ whose initial vectors are linearly independent. So, we can apply Theorem 9.4.1 to conclude that the union of all these cycles is a linearly independent system.
To show that it is a basis, let us count the dimensions. We know that the cycles $C_1, C_2, \dots, C_r$ have $\dim X = \operatorname{rank} A$ vectors total. Each cycle $\widetilde C_k$ was obtained from $C_k$ by adding 1 vector to it, so the total number of vectors in the cycles $\widetilde C_k$ is $\operatorname{rank} A + r$.
We know that $r + s = \dim \operatorname{Ker} A$ (because $v^1_1, \dots, v^r_1, u_1, \dots, u_s$ is a basis there). Adding to the cycles $\widetilde C_k$ the $s$ vectors $u_1, \dots, u_s$, we get
$$\operatorname{rank} A + r + s = \operatorname{rank} A + \dim \operatorname{Ker} A = \dim V = n$$
linearly independent vectors. But $n$ linearly independent vectors in an $n$-dimensional space form a basis. ∎
A basis consisting of a union of cycles of generalized eigenvectors of a nilpotent operator $A$ (the existence of which is guaranteed by Theorem 9.4.2) is called a Jordan canonical basis for $A$.
Note that such a basis is not unique.
Let $A : V \to V$ be a nilpotent operator. There exists a basis (a Jordan canonical basis) such that the matrix of $A$ in this basis is a block diagonal matrix $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where all $A_k$ (except maybe one) are blocks of the form (9.4.1), and one of the blocks can be zero.
The matrix of $A$ in a Jordan canonical basis is called the Jordan canonical form of the operator $A$. We will see later that the Jordan canonical form is unique, if we agree on how to order the blocks (i.e. on how to order the vectors in the basis).
According to Theorem 9.4.2 one can find a basis consisting of a union of cycles of generalized eigenvectors. A cycle of length $p > 1$ gives rise to a $p \times p$ diagonal block of the form (9.4.1), and a cycle of length 1 corresponds to a $1 \times 1$ zero block. We can join these zero blocks into one large zero block (because the corresponding off-diagonal entries are $0$). ∎
There is a good way of visualizing Theorem 9.4.2 and Corollary 9.4.3, the so-called dot diagrams. This method also allows us to answer many natural questions, for example: "is the block diagonal representation given by Corollary 9.4.3 unique?"
Of course, if we treat this question literally, the answer is "no", for we can always change the order of the blocks. But, if we exclude such trivial possibilities, for example by agreeing on some order of the blocks (say, we put all non-zero blocks in decreasing order of size, and then put the zero block), is the representation unique, or not?
To better understand the structure of nilpotent operators described in Section 9.4.1, let us draw the so-called dot diagram. Namely, suppose we have a basis which is a union of cycles of generalized eigenvectors. Let us represent the basis by an array of dots, so that each column represents a cycle. The first row (i.e. the top one in Figure 9.1 below) corresponds to the initial vectors of the cycles, and we arrange the columns (cycles) by their length, putting the longest one to the left. The dots in the $k$th column, going from top to bottom, correspond to the vectors of the $k$th cycle.
Usually one just presents a dot diagram without specifying the corresponding vectors. But sometimes it is beneficial to consider a marked dot diagram, where one marks the dots with the corresponding vectors of the cycles.
In Figure 9.1 we show the dot diagram of a nilpotent operator, as well as its Jordan canonical form. This dot diagram shows that the basis has one cycle of length 5, one cycle of length 3, two cycles of length 2, and two cycles of length 1. The cycle of length 5 corresponds to the $5 \times 5$ block of the matrix, the cycle of length 3 corresponds to the $3 \times 3$ block, and the two cycles of length 2 correspond to two $2 \times 2$ blocks. The two cycles of length 1 correspond to two zero entries on the diagonal, which we join into a $2 \times 2$ zero block. Here in each block only the main diagonal and the diagonal above it are shown; all other entries of the matrix are zero.
If we agree on the ordering of the blocks, there is a one-to-one correspondence between dot diagrams and Jordan canonical forms (for nilpotent operators). So, the question about uniqueness of the Jordan canonical form is equivalent to the question about uniqueness of the dot diagram.
To answer the question about uniqueness of the dot diagram, let us analyze how the operator $A$ transforms it. Assume first that we are given a marked dot diagram, meaning that we mark each dot with the corresponding vector of the cycle. In this case we can get both the Jordan canonical form and a Jordan canonical basis from such a dot diagram.
Since the operator $A$ annihilates the initial vectors of the cycles, and moves the vector $v_k$ of a cycle to the vector $v_{k-1}$, we can see that the operator $A$ acts on its dot diagram by removing the last (bottom) dot in every column. An equivalent description would be to remove the first (top) row, but then we require that the top dot in each column again corresponds to the initial vector of the (shortened) cycle.
The new dot diagram will be the (marked) dot diagram of the operator $A$ restricted to the $A$-invariant subspace $\operatorname{Ran} A$, and it gives us the Jordan canonical representation for the restriction $A|_{\operatorname{Ran} A}$ (both the Jordan canonical basis and the matrix, i.e. the Jordan canonical form).
Similarly, it is not hard to see that the operator $A^{k}$ removes the first $k$ rows of the dot diagram. Therefore, if for all $k$ we know the dimensions $\dim \operatorname{Ker}(A^{k})$, we know the dot diagram of the operator $A$. Namely, the number of dots in the first row is $\dim \operatorname{Ker} A$, the number of dots in the second row is
$$\dim \operatorname{Ker}(A^{2}) - \dim \operatorname{Ker}(A),$$
and the number of dots in the $k$th row is
$$\dim \operatorname{Ker}(A^{k}) - \dim \operatorname{Ker}(A^{k-1}).$$
But this means that the dot diagram, which was initially defined using a Jordan canonical basis, does not depend on the particular choice of such a basis. Therefore, the dot diagram is unique! This implies that if we agree on the order of the blocks, then the Jordan canonical form is unique.
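The row counts above translate directly into a procedure for recovering the dot diagram, and hence the block sizes, from the matrix alone. A sketch (our own helper functions; it assumes the input is nilpotent and uses a floating-point rank with a tolerance):

```python
import numpy as np

def dot_diagram_rows(N, tol=1e-9):
    """Number of dots in each row: dim Ker N^k - dim Ker N^(k-1), for a nilpotent N."""
    n = N.shape[0]
    ker_dims, P = [0], np.eye(n)
    while ker_dims[-1] < n:
        P = P @ N
        ker_dims.append(n - np.linalg.matrix_rank(P, tol=tol))
    return [ker_dims[k] - ker_dims[k - 1] for k in range(1, len(ker_dims))]

def block_sizes(rows):
    """Column heights of the dot diagram = sizes of the Jordan blocks."""
    return [sum(1 for r in rows if r >= j + 1) for j in range(rows[0])]

def nilpotent_with_blocks(sizes):
    """Block diagonal nilpotent matrix with blocks of the form (9.4.1) of the given sizes."""
    N, start = np.zeros((sum(sizes), sum(sizes))), 0
    for s in sizes:
        N[start:start + s - 1, start + 1:start + s] += np.eye(s - 1)
        start += s
    return N

N = nilpotent_with_blocks([3, 2, 2, 1])
rows = dot_diagram_rows(N)
print(rows, block_sizes(rows))   # [4, 3, 1] [3, 2, 2, 1]
```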
Let us say a few words about computing a Jordan canonical basis for a nilpotent operator $A$. We have already proved the existence of such a basis in Section 9.4.2, so we can use the Jordan canonical form and the dot diagram to guide us through the construction.
Let $p_1$ be the largest positive integer such that $A^{p_1 - 1} \neq 0$ (so $A^{p_1} = 0$); equivalently, $p_1$ is the smallest positive integer such that $A^{p_1} = 0$. One can see from the above analysis of dot diagrams, see Section 9.4.3, that $p_1$ is the length of the longest cycle (so this also can be used as the definition of $p_1$).
Computing the operators $A^{k}$, $k = 1, 2, \dots, p_1$, and counting the dimensions $\dim \operatorname{Ker}(A^{k})$, we can construct the dot diagram of $A$. Namely, $\dim \operatorname{Ker} A$ gives us the number of dots in the first (top) row, $\dim \operatorname{Ker}(A^{2}) - \dim \operatorname{Ker}(A)$ the number of dots in the second row, and so on.
Now we want to put vectors instead of dots and find a basis which is a union of cycles.
We start by finding the longest cycles (because we know the dot diagram, we know how many such cycles there should be, and what the length of each cycle is). Consider the column space $\operatorname{Ran}(A^{p_1 - 1})$. Since $A^{p_1} = 0$, there holds $A\bigl(\operatorname{Ran}(A^{p_1 - 1})\bigr) = \{0\}$, so $\operatorname{Ran}(A^{p_1 - 1}) \subset \operatorname{Ker} A$. To be in agreement with the notation in the next steps we will use the notation $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_1 - 1})$ instead of $\operatorname{Ran}(A^{p_1 - 1})$ (for this exponent the two subspaces coincide).
Let $v^1_1, v^2_1, \dots, v^{r_1}_1$ be a basis in the subspace $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_1 - 1})$; the vectors $v^k_1$ will be the initial vectors of the cycles. Then we find the end vectors $v^k_{p_1}$ of the cycles by solving the equations
$$A^{p_1 - 1} v^k_{p_1} = v^k_1, \qquad k = 1, 2, \dots, r_1.$$
Applying consecutively the operator $A$ to the end vector $v^k_{p_1}$, we get all the vectors $v^k_j$ in the cycle. Thus, we have constructed all cycles of maximal length.
Alternatively, for each $k$ we can find the vectors $v^k_2, v^k_3, \dots, v^k_{p_1}$ by consecutively solving the equations
$$A v^k_{j+1} = v^k_j, \qquad j = 1, 2, \dots, p_1 - 1,$$
thus constructing all cycles of maximal length.
Let $p_2$ be the length of a maximal cycle among those that are left to find. Consider the subspace $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_2 - 1})$, and let $r$ be its dimension. Since $\operatorname{Ran}(A^{p_1 - 1}) \subset \operatorname{Ran}(A^{p_2 - 1})$, we conclude that the initial vectors $v^1_1, v^2_1, \dots, v^{r_1}_1$ of the already constructed cycles belong to this subspace, so $r \ge r_1$. Moreover (one can also see from the dot diagram that for exponents $p$ with $p_2 < p < p_1$ no new cycles appear, so $p_2$ is the largest exponent for which we need to complete the basis), it can be seen from the dot diagram that $r$ equals the number of cycles of length at least $p_2$. Therefore we can complete the system $v^1_1, v^2_1, \dots, v^{r_1}_1$ to a basis in $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_2 - 1})$; the added vectors $u^1, u^2, \dots, u^{r_2}$ will serve as the initial vectors of the cycles of length $p_2$. Then we find the end vectors of these cycles by solving (for $w^k$) the equations
$$A^{p_2 - 1} w^k = u^k, \qquad k = 1, 2, \dots, r_2,$$
thus constructing the cycles of length $p_2$.
Let $p_3$ denote the length of a maximal cycle among the ones left. Then, completing the system of initial vectors constructed so far to a basis in $\operatorname{Ker} A \cap \operatorname{Ran}(A^{p_3 - 1})$, we construct the cycles of length $p_3$, and so on…
One final remark: as we discussed above, if we know the dot diagram, we know the Jordan canonical form, so after we have found a Jordan canonical basis, we do not need to compute the matrix of $A$ in this basis: we already know it!
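In exact arithmetic this whole construction can also be delegated to a computer algebra system. A minimal sketch using SymPy (its Matrix.jordan_form method returns a transition matrix $P$ and the Jordan form $J$ with $A = P J P^{-1}$; the particular matrices below are just an illustration):

```python
from sympy import Matrix

# A nilpotent matrix hidden in a non-obvious basis: A = S * N * S^{-1},
# where N consists of blocks of sizes 3 and 1 of the form (9.4.1)
N = Matrix([[0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]])
S = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 0],
            [1, 0, 1, 1],
            [0, 0, 0, 1]])
A = S * N * S.inv()

P, J = A.jordan_form()       # columns of P form a Jordan canonical basis
print(J)                     # blocks of sizes 3 and 1 (all eigenvalues 0)
print(A == P * J * P.inv())  # True
```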
Given an operator $A$ there exists a basis (a Jordan canonical basis) such that the matrix of $A$ in this basis has a block diagonal form with blocks of the form
$$\begin{pmatrix}
\lambda & 1 & & & \\
 & \lambda & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \lambda & 1 \\
 & & & & \lambda
\end{pmatrix} \qquad (9.5.1)$$
where $\lambda$ is an eigenvalue of $A$. Here we assume that a block of size $1 \times 1$ is just $(\lambda)$.
The block diagonal form from Theorem 9.5.1 is called the Jordan canonical form of the operator $A$. The corresponding basis is called a Jordan canonical basis for the operator $A$.
According to Theorem 9.3.4 and Remark 9.3.5, if we join bases in the generalized eigenspaces $E_{\lambda_k}$ to get a basis in the whole space, the matrix of $A$ in this basis has a block diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = \lambda_k I_k + N_k$. The operators $N_k$ are nilpotent, so by Theorem 9.4.2 (more precisely, by Corollary 9.4.3) one can find a basis in each $E_{\lambda_k}$ such that the matrix of $N_k$ in this basis is its Jordan canonical form. To get the matrix of $A_k = \lambda_k I_k + N_k$ in this basis one just puts $\lambda_k$ instead of $0$ on the main diagonal. ∎
First of all let us recall that computing the eigenvalues is the hardest part; we do not discuss it here, and assume that the eigenvalues are already computed.
For each eigenvalue $\lambda$ we compute the subspaces $\operatorname{Ker}(A - \lambda I)^{k}$, $k = 1, 2, \dots$, until the sequence of subspaces stabilizes. In fact, since we have an increasing sequence of subspaces ($\operatorname{Ker}(A - \lambda I)^{k} \subset \operatorname{Ker}(A - \lambda I)^{k+1}$), it is sufficient to keep track only of their dimensions (or of the ranks of the operators $(A - \lambda I)^{k}$). For an eigenvalue $\lambda$ let $d = d_\lambda$ be the number where the sequence stabilizes, i.e. $d$ satisfies
$$\operatorname{Ker}(A - \lambda I)^{d-1} \subsetneq \operatorname{Ker}(A - \lambda I)^{d} = \operatorname{Ker}(A - \lambda I)^{d+1}.$$
Then $E_\lambda = \operatorname{Ker}(A - \lambda I)^{d}$ is the generalized eigenspace corresponding to the eigenvalue $\lambda$.
After we have computed all the generalized eigenspaces there are two possible ways to proceed. The first way is to find a basis in each generalized eigenspace, so that the matrix of the operator $A$ in the resulting basis of the whole space has the block-diagonal form $\operatorname{diag}\{A_1, A_2, \dots, A_r\}$, where $A_k = \lambda_k I_k + N_k$. Then we can deal with each block $A_k$ separately. The operators $N_k = A_k - \lambda_k I_k$ are nilpotent, so applying the algorithm described in Section 9.4.4 we get the Jordan canonical representation for $N_k$, and putting $\lambda_k$ instead of $0$ on the main diagonal, we get the Jordan canonical representation for the block $A_k$. The advantage of this approach is that we are working with smaller blocks. But we need to find the matrix of the operator in the new basis, which involves inverting a matrix and matrix multiplication.
Another way is to find a Jordan canonical basis in each of the generalized eigenspaces by working directly with the operator $A$, without splitting it first into blocks. Again, the algorithm we outlined above in Section 9.4.4 works with a slight modification: namely, when computing a Jordan canonical basis for a generalized eigenspace $E_\lambda$, instead of considering the subspaces $\operatorname{Ker}(N^{k})$ and $\operatorname{Ran}(N^{k})$ of the nilpotent block $N = N_\lambda$, which we would need to consider when working with the block separately, we consider the subspaces $\operatorname{Ker}(A - \lambda I)^{k}$ and $\operatorname{Ran}(A - \lambda I)^{k}$.
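To close, here is a sketch of the second approach in floating point: for each eigenvalue the Jordan block sizes are read off from the kernel dimensions of the powers of $A - \lambda I$, exactly as in the dot-diagram discussion (our own helper; the rank tolerance makes this fragile, so for anything serious exact arithmetic is preferable).

```python
import numpy as np

def jordan_block_sizes(A, lam, tol=1e-8):
    """Sizes of the Jordan blocks of A for the eigenvalue lam, recovered from
    the dimensions of Ker (A - lam*I)^k (the rows of the dot diagram)."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    ker_dims, P, prev = [0], np.eye(n), -1
    while ker_dims[-1] != prev:                 # iterate until the sequence stabilizes
        prev = ker_dims[-1]
        P = P @ B
        ker_dims.append(n - np.linalg.matrix_rank(P, tol=tol))
    rows = [ker_dims[k] - ker_dims[k - 1] for k in range(1, len(ker_dims) - 1)]
    return [sum(1 for r in rows if r > j) for j in range(rows[0])] if rows else []

# A is similar to a Jordan matrix with blocks of sizes 2, 1 for the eigenvalue 3
# and one block of size 2 for the eigenvalue 5.
J = np.diag([3.0, 3.0, 3.0, 5.0, 5.0])
J[0, 1] = 1.0
J[3, 4] = 1.0
S = np.random.default_rng(3).standard_normal((5, 5))
A = S @ J @ np.linalg.inv(S)

print(jordan_block_sizes(A, 3.0))   # [2, 1]
print(jordan_block_sizes(A, 5.0))   # [2]
```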