Chapter 4 Introduction to spectral theory (eigenvalues and eigenvectors)

Spectral theory is the main tool that helps us to understand the structure of a linear operator. In this chapter we consider only operators acting from a vector space to itself (or, equivalently, $n\times n$ matrices). If we have such a linear transformation $A:V\to V$, we can multiply it by itself, take any power of it, or evaluate any polynomial in it.

The main idea of spectral theory is to split the operator into simple blocks and analyze each block separately.

To explain the main idea, let us consider difference equations. Many processes can be described by equations of the following type:

\mathbf{x}_{n+1}=A\mathbf{x}_{n},\qquad n=0,1,2,\ldots,

where $A:V\to V$ is a linear transformation, and $\mathbf{x}_{n}$ is the state of the system at time $n$. Given the initial state $\mathbf{x}_{0}$ we would like to know the state $\mathbf{x}_{n}$ at time $n$, analyze the long time behavior of $\mathbf{x}_{n}$, etc.\footnote{The difference equations are discrete time analogues of the differential equation $\mathbf{x}'(t)=A\mathbf{x}(t)$. To solve the differential equation, one needs to compute $e^{tA}:=\sum_{k=0}^{\infty}t^{k}A^{k}/k!$, and spectral theory also helps in doing this.}

At first glance the problem looks trivial: the solution $\mathbf{x}_{n}$ is given by the formula $\mathbf{x}_{n}=A^{n}\mathbf{x}_{0}$. But what if $n$ is huge: thousands, millions? Or what if we want to analyze the behavior of $\mathbf{x}_{n}$ as $n\to\infty$?

Here the idea of eigenvalues and eigenvectors comes in. Suppose that $A\mathbf{x}_{0}=\lambda\mathbf{x}_{0}$, where $\lambda$ is some scalar. Then $A^{2}\mathbf{x}_{0}=\lambda^{2}\mathbf{x}_{0}$, $A^{3}\mathbf{x}_{0}=\lambda^{3}\mathbf{x}_{0}$, \ldots, $A^{n}\mathbf{x}_{0}=\lambda^{n}\mathbf{x}_{0}$, so the behavior of the solution is very well understood.
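For readers who like to experiment, here is a minimal numerical sketch of this observation (the matrix and the number of steps are arbitrary illustration choices, not taken from the text): when the initial state is an eigenvector, iterating $\mathbf{x}_{n+1}=A\mathbf{x}_{n}$ simply rescales it by $\lambda$ at every step.

```python
import numpy as np

# A hypothetical 2x2 example; any matrix with a known eigenvector would do.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

lam = 3.0                      # an eigenvalue of A
x0 = np.array([1.0, 0.0])      # a corresponding eigenvector: A @ x0 = 3 * x0

# Iterate the difference equation x_{n+1} = A x_n for n steps...
n = 10
x = x0.copy()
for _ in range(n):
    x = A @ x

# ...and compare with the closed form lambda^n * x0.
print(np.allclose(x, lam**n * x0))   # True
```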

In this section we will consider only operators in finite-dimensional spaces. Spectral theory in infinite-dimensional spaces is significantly more complicated, and most of the results presented here fail in the infinite-dimensional setting.

4.1. Main definitions

4.1.1. Eigenvalues, eigenvectors, spectrum

A scalar $\lambda$ is called an eigenvalue of an operator $A:V\to V$ if there exists a non-zero vector $\mathbf{v}\in V$ such that

A\mathbf{v}=\lambda\mathbf{v}.

The vector $\mathbf{v}$ is called an eigenvector of $A$ (corresponding to the eigenvalue $\lambda$).

If we know that $\lambda$ is an eigenvalue, the eigenvectors are easy to find: one just has to solve the equation $A\mathbf{x}=\lambda\mathbf{x}$, or, equivalently,

(A-\lambda I)\mathbf{x}=\mathbf{0}.

So, finding all eigenvectors corresponding to an eigenvalue $\lambda$ is simply finding the nullspace of $A-\lambda I$. The nullspace $\operatorname{Ker}(A-\lambda I)$, i.e. the set of all eigenvectors together with the $\mathbf{0}$ vector, is called the eigenspace.

The set of all eigenvalues of an operator $A$ is called the spectrum of $A$, and is usually denoted $\sigma(A)$.

4.1.2. Finding eigenvalues: characteristic polynomials

A scalar $\lambda$ is an eigenvalue if and only if the nullspace $\operatorname{Ker}(A-\lambda I)$ is non-trivial (so the equation $(A-\lambda I)\mathbf{x}=\mathbf{0}$ has a non-trivial solution).

Let $A$ act on $\mathbb{F}^{n}$ (i.e. $A:\mathbb{F}^{n}\to\mathbb{F}^{n}$). Since the matrix of $A$ is square, $A-\lambda I$ has a non-trivial nullspace if and only if it is not invertible. We know that a square matrix is not invertible if and only if its determinant is 0. Therefore

\lambda\in\sigma(A),\ \text{i.e. $\lambda$ is an eigenvalue of $A$}\quad\Longleftrightarrow\quad\det(A-\lambda I)=0

If $A$ is an $n\times n$ matrix, the determinant $\det(A-\lambda I)$ is a polynomial of degree $n$ in the variable $\lambda$. This polynomial is called the characteristic polynomial of $A$. So, to find all eigenvalues of $A$ one just needs to compute the characteristic polynomial and find all its roots.

This method of finding the spectrum of an operator is not very practical in higher dimensions. Finding the roots of a polynomial of high degree can be a very difficult problem, and it is impossible to solve a general equation of degree higher than 4 in radicals. So, in higher dimensions different numerical methods of finding eigenvalues and eigenvectors are used.
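As a small illustration (the $2\times 2$ matrix below is an arbitrary choice), one can compare the roots of the characteristic polynomial with the output of a standard numerical eigenvalue routine; in practice one always uses the latter.

```python
import numpy as np

A = np.array([[4.0, -5.0],
              [2.0, -3.0]])

# For a 2x2 matrix, det(A - lambda I) = lambda^2 - trace(A)*lambda + det(A).
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]

print(np.roots(coeffs))          # roots of the characteristic polynomial
print(np.linalg.eigvals(A))      # eigenvalues from a numerical method
```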

4.1.3. Finding characteristic polynomial and eigenvalues of an abstract operator

So we know how to find the spectrum of a matrix. But how do we find eigenvalues of an operator acting in an abstract vector space? The recipe is simple:

Take an arbitrary basis, and compute eigenvalues of the matrix of the operator in this basis.

But how do we know that the result does not depend on a choice of the basis?

There can be several possible explanations. One is based on the notion of similar matrices. Let us recall that square matrices $A$ and $B$ are called similar if there exists an invertible matrix $S$ such that

A=SBS^{-1}.

Note that the determinants of similar matrices coincide. Indeed,

\det A=\det(SBS^{-1})=\det S\det B\det S^{-1}=\det B

because $\det S^{-1}=1/\det S$. Note that if $A=SBS^{-1}$ then

A-\lambda I=SBS^{-1}-\lambda SIS^{-1}=S(BS^{-1}-\lambda IS^{-1})=S(B-\lambda I)S^{-1},

so the matrices $A-\lambda I$ and $B-\lambda I$ are similar. Therefore

\det(A-\lambda I)=\det(B-\lambda I),

i.e.

characteristic polynomials of similar matrices coincide.

If $T:V\to V$ is a linear transformation, and $\mathcal{A}$ and $\mathcal{B}$ are two bases in $V$, then

[T]_{\mathcal{A}\mathcal{A}}=[I]_{\mathcal{A}\mathcal{B}}[T]_{\mathcal{B}\mathcal{B}}[I]_{\mathcal{B}\mathcal{A}}

and since $[I]_{\mathcal{B}\mathcal{A}}=([I]_{\mathcal{A}\mathcal{B}})^{-1}$, the matrices $[T]_{\mathcal{A}\mathcal{A}}$ and $[T]_{\mathcal{B}\mathcal{B}}$ are similar.

In other words, matrices of a linear transformation in different bases are similar.

Therefore, we can define the characteristic polynomial of an operator as the characteristic polynomial of its matrix in some basis. As we have discussed above, the result does not depend on the choice of the basis, so the characteristic polynomial of an operator is well defined.
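A quick numerical sanity check of this fact (with an arbitrary random matrix $B$ and change-of-basis matrix $S$, chosen here only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))      # generically invertible

A = S @ B @ np.linalg.inv(S)         # A is similar to B

# Similar matrices have the same characteristic polynomial,
# hence the same eigenvalues (counting multiplicities).
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))   # True (up to round-off)
```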

4.1.4. Complex vs real spaces

The fundamental theorem of algebra asserts that any polynomial (of degree at least 1) has a complex root. That implies that an operator in a finite-dimensional complex vector space has at least one eigenvalue, so its spectrum is non-empty.

On the other hand, it is easy to construct a linear transformation in a real vector space without real eigenvalues, the rotation $R_{\alpha}$, $\alpha\neq\pi n$, in $\mathbb{R}^{2}$ being one of the examples. Since it is usually assumed that eigenvalues should belong to the field of scalars (if an operator acts in a vector space over a field $\mathbb{F}$, the eigenvalues should be in $\mathbb{F}$), such operators have empty spectrum.

Thus, the complex case (i.e. operators acting in complex vector spaces) seems to be the most natural setting for spectral theory. Since $\mathbb{R}\subset\mathbb{C}$, we can always treat a real $n\times n$ matrix as an operator in $\mathbb{C}^{n}$ to allow complex eigenvalues. Treating real matrices as operators in $\mathbb{C}^{n}$ is typical in spectral theory, and we will follow this convention. Finding eigenvalues of a matrix (unless otherwise specified) will always mean finding all complex eigenvalues and not restricting oneself only to real ones.

Note that an operator in an abstract real vector space can also be interpreted as an operator in a complex space. A naïve approach would be to fix a basis (recall that all spaces in this chapter are finite-dimensional), and then work with coordinates in this basis allowing complex coordinates: that is essentially the move from operators in $\mathbb{R}^{n}$ to operators in $\mathbb{C}^{n}$ described above.

This construction describes what is known as the complexification of a real vector space, and the result does not depend on the choice of a basis. A “high brow” abstract construction of the complexification, explaining why the result does not depend on the choice of a basis, is described below in Section 5.8.2 of Chapter 5.

4.1.5. Multiplicities of eigenvalues

Let us remind the reader that if $p$ is a polynomial, and $\lambda$ is its root (i.e. $p(\lambda)=0$), then $z-\lambda$ divides $p(z)$, i.e. $p$ can be represented as $p(z)=(z-\lambda)q(z)$, where $q$ is some polynomial. If $q(\lambda)=0$, then $q$ can also be divided by $z-\lambda$, so $(z-\lambda)^{2}$ divides $p$, and so on.

The largest positive integer $k$ such that $(z-\lambda)^{k}$ divides $p(z)$ is called the multiplicity of the root $\lambda$.

If $\lambda$ is an eigenvalue of an operator (matrix) $A$, then it is a root of the characteristic polynomial $p(z)=\det(A-zI)$. The multiplicity of this root is called the (algebraic) multiplicity of the eigenvalue $\lambda$.

Any polynomial $p(z)=\sum_{k=0}^{n}a_{k}z^{k}$ of degree $n$ has exactly $n$ complex roots, counting multiplicities. The words counting multiplicities mean that if a root has multiplicity $d$ we have to list (count) it $d$ times. In other words, $p$ can be represented as

p(z)=a_{n}(z-\lambda_{1})(z-\lambda_{2})\ldots(z-\lambda_{n}),

where $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ are its complex roots, counting multiplicities.

There is another notion of multiplicity of an eigenvalue: the dimension of the eigenspace $\operatorname{Ker}(A-\lambda I)$ is called the geometric multiplicity of the eigenvalue $\lambda$.

Geometric multiplicity is not as widely used as algebraic multiplicity. So, when people say simply “multiplicity” they usually mean algebraic multiplicity.

Let us mention that the algebraic and geometric multiplicities of an eigenvalue can differ.

Proposition 4.1.1.

Geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity.

Proof.

See Exercise 4.1.9 below. ∎
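To make the distinction concrete, here is a minimal numerical sketch (the matrix is a standard illustrative choice, and the algebraic multiplicity is estimated by counting numerically coinciding eigenvalues, which is only a heuristic):

```python
import numpy as np

# A 2x2 example where the two multiplicities differ:
# the eigenvalue 1 has algebraic multiplicity 2 but geometric multiplicity 1.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam = 1.0

# Algebraic multiplicity: multiplicity of lam as a root of det(A - z I),
# estimated here by counting eigenvalues numerically equal to lam.
alg = np.sum(np.isclose(np.linalg.eigvals(A), lam))

# Geometric multiplicity: dim Ker(A - lam I) = n - rank(A - lam I).
geo = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(A.shape[0]))

print(alg, geo)   # 2 1
```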

4.1.6. Trace and determinant.

Theorem 4.1.2.

Let $A$ be an $n\times n$ matrix, and let $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ be its (complex) eigenvalues (counting multiplicities). Then

  1. $\operatorname{trace}A=\lambda_{1}+\lambda_{2}+\ldots+\lambda_{n}$.

  2. $\det A=\lambda_{1}\lambda_{2}\ldots\lambda_{n}$.

Proof.

See Exercises 4.1.10, 4.1.11 below. ∎
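Theorem 4.1.2 is also easy to check numerically; the sketch below uses a randomly generated matrix purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
eigs = np.linalg.eigvals(A)     # complex eigenvalues, counting multiplicities

print(np.isclose(np.trace(A), eigs.sum()))        # True: trace = sum of eigenvalues
print(np.isclose(np.linalg.det(A), eigs.prod()))  # True: det = product of eigenvalues
```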

4.1.7. Eigenvalues of a triangular matrix

Computing eigenvalues is equivalent to finding the roots of the characteristic polynomial of a matrix (or using some numerical method), which can be quite time consuming. However, there is one particular case when we can just read the eigenvalues off the matrix. Namely,

the eigenvalues of a triangular matrix (counting multiplicities) are exactly the diagonal entries $a_{1,1},a_{2,2},\ldots,a_{n,n}$.

By triangular here we mean either an upper or a lower triangular matrix. Since a diagonal matrix is a particular case of a triangular matrix (it is both upper and lower triangular),

the eigenvalues of a diagonal matrix are its diagonal entries.

The proof of the statement about triangular matrices is trivial: we need to subtract $\lambda$ from the diagonal entries of $A$, and use the fact that the determinant of a triangular matrix is the product of its diagonal entries. We get the characteristic polynomial

\det(A-\lambda I)=(a_{1,1}-\lambda)(a_{2,2}-\lambda)\ldots(a_{n,n}-\lambda)

and its roots are exactly $a_{1,1},a_{2,2},\ldots,a_{n,n}$.
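A one-line numerical confirmation (the upper triangular matrix below is an arbitrary example):

```python
import numpy as np

# An arbitrary upper triangular matrix: its eigenvalues should be the diagonal.
A = np.array([[1.0, 2.0, 5.0],
              [0.0, -2.0, 3.0],
              [0.0, 0.0, 4.0]])

print(np.sort(np.linalg.eigvals(A).real))   # [-2.  1.  4.]
print(np.sort(np.diag(A)))                  # [-2.  1.  4.]
```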

Exercises.

4.1.1.

True or false:

  a) Every linear operator in an $n$-dimensional vector space has $n$ distinct eigenvalues;

  b) If a matrix has one eigenvector, it has infinitely many eigenvectors;

  c) There exists a square real matrix with no real eigenvalues;

  d) There exists a square matrix with no (complex) eigenvectors;

  e) Similar matrices always have the same eigenvalues;

  f) Similar matrices always have the same eigenvectors;

  g) A non-zero sum of two eigenvectors of a matrix $A$ is always an eigenvector;

  h) A non-zero sum of two eigenvectors of a matrix $A$ corresponding to the same eigenvalue $\lambda$ is always an eigenvector.

4.1.2.

Find characteristic polynomials, eigenvalues and eigenvectors of the following matrices:

\left(\begin{array}{rr}4&-5\\ 2&-3\end{array}\right),\qquad\left(\begin{array}{rr}2&1\\ -1&4\end{array}\right),\qquad\left(\begin{array}{rrr}1&3&3\\ -3&-5&-3\\ 3&3&1\end{array}\right).
4.1.3.

Compute eigenvalues and eigenvectors of the rotation matrix

\left(\begin{array}{rr}\cos\alpha&-\sin\alpha\\ \sin\alpha&\cos\alpha\end{array}\right).

Note that the eigenvalues (and eigenvectors) do not need to be real.

4.1.4.

Compute characteristic polynomials and eigenvalues of the following matrices:

\left(\begin{array}{rrrr}1&2&5&67\\ 0&2&3&6\\ 0&0&-2&5\\ 0&0&0&3\end{array}\right),\qquad\left(\begin{array}{rrrr}2&1&0&2\\ 0&\pi&43&2\\ 0&0&16&1\\ 0&0&0&54\end{array}\right),\qquad\left(\begin{array}{rrrr}4&0&0&0\\ 1&3&0&0\\ 2&4&e&0\\ 3&3&1&1\end{array}\right),
\left(\begin{array}{rrrr}4&0&0&0\\ 1&0&0&0\\ 2&4&0&0\\ 3&3&1&1\end{array}\right).

Do not expand the characteristic polynomials, leave them as products.

4.1.5.

Prove that the eigenvalues (counting multiplicities) of a triangular matrix coincide with its diagonal entries.

4.1.6.

An operator $A$ is called nilpotent if $A^{k}=\mathbf{0}$ for some $k$. Prove that if $A$ is nilpotent, then $\sigma(A)=\{0\}$ (i.e. that 0 is the only eigenvalue of $A$).

4.1.7.

Show that the characteristic polynomial of a block triangular matrix

\left(\begin{array}{cc}A&*\\ \mathbf{0}&B\end{array}\right),

where $A$ and $B$ are square matrices, coincides with $\det(A-\lambda I)\det(B-\lambda I)$. (Use Exercise 3.3.11 from Chapter 3).

4.1.8.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ be a basis in a vector space $V$. Assume also that the first $k$ vectors $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{k}$ of the basis are eigenvectors of an operator $A$, corresponding to an eigenvalue $\lambda$ (i.e. that $A\mathbf{v}_{j}=\lambda\mathbf{v}_{j}$, $j=1,2,\ldots,k$). Show that in this basis the matrix of the operator $A$ has the block triangular form

\left(\begin{array}{cc}\lambda I_{k}&*\\ \mathbf{0}&B\end{array}\right),

where $I_{k}$ is the $k\times k$ identity matrix and $B$ is some $(n-k)\times(n-k)$ matrix.

4.1.9.

Use the two previous exercises to prove that geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity.

4.1.10.

Prove that the determinant of a matrix $A$ is the product of its eigenvalues (counting multiplicities).

Hint: first show that $\det(A-\lambda I)=(\lambda_{1}-\lambda)(\lambda_{2}-\lambda)\ldots(\lambda_{n}-\lambda)$, where $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ are the eigenvalues (counting multiplicities). Then compare the free terms (the terms without $\lambda$), or plug in $\lambda=0$, to get the conclusion.

4.1.11.

Prove that the trace of a matrix equals the sum of its eigenvalues in three steps. First, compute the coefficient of $\lambda^{n-1}$ on the right side of the equality

\det(A-\lambda I)=(\lambda_{1}-\lambda)(\lambda_{2}-\lambda)\ldots(\lambda_{n}-\lambda).

Then show that $\det(A-\lambda I)$ can be represented as

\det(A-\lambda I)=(a_{1,1}-\lambda)(a_{2,2}-\lambda)\ldots(a_{n,n}-\lambda)+q(\lambda)

where $q(\lambda)$ is a polynomial of degree at most $n-2$. Finally, comparing the coefficients of $\lambda^{n-1}$, get the conclusion.

4.2. Diagonalization.

One of the applications of spectral theory is the diagonalization of operators: given an operator, find a basis in which its matrix is diagonal. Such a basis does not always exist, i.e. not all operators can be diagonalized (are diagonalizable). The importance of diagonalizable operators comes from the fact that powers, and more generally functions, of diagonal matrices are easy to compute, so if we diagonalize an operator we can easily compute functions of it.

We will explain how to compute functions of diagonalizable operators in this section. We also give a necessary and sufficient condition for an operator to be diagonalizable, as well as some simple sufficient conditions.

Note also that for operators in $\mathbb{F}^{n}$ (matrices) the diagonalizability of $A$ means that it can be represented as $A=SDS^{-1}$, where $D$ is a diagonal matrix (and $S$, of course, is invertible); we will explain this shortly.

Unless otherwise specified, all results in this section hold for both complex and real vector spaces (and even for spaces over arbitrary fields).

4.2.1. Preliminaries

Suppose an operator $A$ in a vector space $V$ is such that $V$ has a basis $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ of eigenvectors of $A$, with $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ being the corresponding eigenvalues. Then the matrix of $A$ in this basis is the diagonal matrix with $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ on the diagonal

(4.2.5)\qquad[A]_{\mathcal{B}\mathcal{B}}=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}=\left(\begin{array}{cccc}\lambda_{1}&&&\\ &\lambda_{2}&&\\ &&\ddots&\\ &&&\lambda_{n}\end{array}\right).

On the other hand, if the matrix of an operator $A$ in a basis $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ is given by (4.2.5), then trivially $A\mathbf{b}_{k}=\lambda_{k}\mathbf{b}_{k}$, i.e. the $\lambda_{k}$ are eigenvalues and the $\mathbf{b}_{k}$ are the corresponding eigenvectors.

Note that the above reasoning holds for both complex and real vector spaces (and even for vector spaces over arbitrary fields).

Applying the above reasoning to operators in $\mathbb{F}^{n}$ (matrices) we immediately get the following theorem. Note that while in this book $\mathbb{F}$ is either $\mathbb{C}$ or $\mathbb{R}$, this theorem holds for an arbitrary field $\mathbb{F}$.

Theorem 4.2.1.

A matrix $A$ (with values in $\mathbb{F}$) admits a representation $A=SDS^{-1}$, where $D$ is a diagonal matrix and $S$ is an invertible one (both with entries in $\mathbb{F}$), if and only if there exists a basis in $\mathbb{F}^{n}$ of eigenvectors of $A$.

Moreover, in this case the diagonal entries of $D$ are the eigenvalues and the columns of $S$ are the corresponding eigenvectors (column number $k$ corresponds to the $k$th diagonal entry of $D$).

Proof.

Let $D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$, and let $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ be the columns of $S$ (note that since $S$ is invertible, its columns form a basis $\mathcal{B}$ in $\mathbb{F}^{n}$). Then the identity $A=SDS^{-1}$ means that $D=[A]_{\mathcal{B}\mathcal{B}}$.

Indeed, $S=[I]_{\mathcal{S}\mathcal{B}}$ is the change of coordinates matrix from $\mathcal{B}$ to the standard basis $\mathcal{S}$, so we get from $A=SDS^{-1}$ that $D=S^{-1}AS=[I]_{\mathcal{B}\mathcal{S}}A[I]_{\mathcal{S}\mathcal{B}}$, which means exactly that $D=[A]_{\mathcal{B}\mathcal{B}}$.

And as we just discussed above, $[A]_{\mathcal{B}\mathcal{B}}=D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$ if and only if the $\lambda_{k}$ are the eigenvalues and the $\mathbf{b}_{k}$ are the corresponding eigenvectors of $A$. ∎

Remark.

Note that if a matrix admits the representation $A=SDS^{-1}$ with a diagonal matrix $D$, then a simple direct calculation shows that the columns of $S$ are eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues. This gives an alternative proof of the corresponding statement in Theorem 4.2.1.
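A short numerical sketch of Theorem 4.2.1 (the matrix below is the one from the "Real eigenvalues" example later in this section; `np.linalg.eig` conveniently returns the eigenvalues and a matrix whose columns are corresponding eigenvectors, i.e. a possible $D$ and $S$):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [8.0, 1.0]])

eigvals, S = np.linalg.eig(A)   # columns of S are eigenvectors of A
D = np.diag(eigvals)

print(np.allclose(A, S @ D @ np.linalg.inv(S)))   # True: A = S D S^{-1}
```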

As we discussed above, a diagonalizable operator $A:V\to V$ has exactly $n=\dim V$ eigenvalues (counting multiplicities). Any operator in a complex vector space $V$ has $n$ eigenvalues (counting multiplicities); an operator in a real space, on the other hand, could have no real eigenvalues.

We will, as is customary in spectral theory, treat real matrices as operators in the complex space $\mathbb{C}^{n}$, thus allowing complex eigenvalues and eigenvectors. Unless otherwise specified, we will mean by the diagonalization of a matrix its complex diagonalization, i.e. a representation $A=SDS^{-1}$ where the matrices $S$ and $D$ can have complex entries.

The question of when a real matrix admits a real diagonalization ($A=SDS^{-1}$ with real matrices $S$ and $D$) is in fact a very simple one, see Theorem 4.2.9 below.

4.2.2. Some motivations: functions of operators.

Let the matrix of an operator $A$ in a basis $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ be the diagonal one given by (4.2.5). Then it is easy to find the $N$th power of the operator $A$. Namely, the matrix of $A^{N}$ in the basis $\mathcal{B}$ is

[A^{N}]_{\mathcal{B}\mathcal{B}}=\operatorname{diag}\{\lambda^{N}_{1},\lambda^{N}_{2},\ldots,\lambda^{N}_{n}\}=\left(\begin{array}{cccc}\lambda_{1}^{N}&&&\\ &\lambda_{2}^{N}&&\\ &&\ddots&\\ &&&\lambda_{n}^{N}\end{array}\right).

Moreover, functions of the operator are also very easy to compute: for example, the operator (matrix) exponential $e^{tA}$ is defined as $e^{tA}=I+tA+\frac{t^{2}A^{2}}{2!}+\frac{t^{3}A^{3}}{3!}+\ldots=\sum_{k=0}^{\infty}\frac{t^{k}A^{k}}{k!}$, and its matrix in the basis $\mathcal{B}$ is

[e^{tA}]_{\mathcal{B}\mathcal{B}}=\operatorname{diag}\{e^{\lambda_{1}t},e^{\lambda_{2}t},\ldots,e^{\lambda_{n}t}\}=\left(\begin{array}{cccc}e^{\lambda_{1}t}&&&\\ &e^{\lambda_{2}t}&&\\ &&\ddots&\\ &&&e^{\lambda_{n}t}\end{array}\right).

Let now $A$ be an operator in $\mathbb{F}^{n}$. To find the matrices of the operators $A^{N}$ and $e^{tA}$ in the standard basis $\mathcal{S}$, we need to recall that the change of coordinates matrix $[I]_{\mathcal{S}\mathcal{B}}$ is the matrix with columns $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$. Let us call this matrix $S$; then according to the change of coordinates formula we have

A=[A]_{\mathcal{S}\mathcal{S}}=S\left(\begin{array}{cccc}\lambda_{1}&&&\\ &\lambda_{2}&&\\ &&\ddots&\\ &&&\lambda_{n}\end{array}\right)S^{-1}=SDS^{-1},

where we use $D$ for the diagonal matrix in the middle.

Similarly

A^{N}=SD^{N}S^{-1}=S\left(\begin{array}{cccc}\lambda_{1}^{N}&&&\\ &\lambda_{2}^{N}&&\\ &&\ddots&\\ &&&\lambda_{n}^{N}\end{array}\right)S^{-1},

and similarly for $e^{tA}$.

Another way of thinking about powers (or other functions) of diagonalizable operators is to notice that if an operator $A$ can be represented as $A=SDS^{-1}$, then

A^{N}=\underbrace{(SDS^{-1})(SDS^{-1})\ldots(SDS^{-1})}_{N\text{ times}}=SD^{N}S^{-1}

and it is easy to compute the $N$th power of a diagonal matrix.
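A small numerical sketch of these formulas (the matrix, the power $N$, and the time $t$ below are arbitrary illustration choices; the power is checked against repeated multiplication and the exponential against a truncated Taylor series):

```python
import math
import numpy as np

A = np.array([[4.0, 3.0],
              [1.0, 2.0]])
eigvals, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)

N, t = 20, 0.5

# N-th power and matrix exponential via the diagonalization A = S D S^{-1}.
A_pow = S @ np.diag(eigvals**N) @ S_inv
A_exp = S @ np.diag(np.exp(t * eigvals)) @ S_inv

# Independent checks.
print(np.allclose(A_pow, np.linalg.matrix_power(A, N)))
print(np.allclose(A_exp, sum(np.linalg.matrix_power(t * A, k) / math.factorial(k)
                             for k in range(30))))
```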

4.2.3. The case of nn distinct eigenvalues

We now present a very simple sufficient condition for an operator to be diagonalizable, see Corollary 4.2.3 below.

Theorem 4.2.2.

Let $\lambda_{1},\lambda_{2},\ldots,\lambda_{r}$ be distinct eigenvalues of $A$, and let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ be the corresponding eigenvectors. Then the vectors $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ are linearly independent.

Proof.

We will use induction on $r$. The case $r=1$ is trivial, because by definition an eigenvector is non-zero, and a system consisting of one non-zero vector is linearly independent.

Suppose that the statement of the theorem is true for $r-1$. Suppose there exists a non-trivial linear combination

(4.2.6)\qquad c_{1}\mathbf{v}_{1}+c_{2}\mathbf{v}_{2}+\ldots+c_{r}\mathbf{v}_{r}=\sum_{k=1}^{r}c_{k}\mathbf{v}_{k}=\mathbf{0}.

Applying $A-\lambda_{r}I$ to (4.2.6) and using the fact that $(A-\lambda_{r}I)\mathbf{v}_{r}=\mathbf{0}$ we get

\sum_{k=1}^{r-1}c_{k}(\lambda_{k}-\lambda_{r})\mathbf{v}_{k}=\mathbf{0}.

By the induction hypothesis the vectors $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r-1}$ are linearly independent, so $c_{k}(\lambda_{k}-\lambda_{r})=0$ for $k=1,2,\ldots,r-1$. Since $\lambda_{k}\neq\lambda_{r}$ we can conclude that $c_{k}=0$ for $k<r$. Then it follows from (4.2.6) that $c_{r}=0$ as well, i.e. the linear combination is in fact trivial, contradicting our assumption. ∎

Corollary 4.2.3.

If an operator $A:V\to V$ has exactly $n=\dim V$ distinct eigenvalues, then it is diagonalizable.

While very simple, this is a very important statement, and it will be used a lot! Please remember it.

Proof of Corollary 4.2.3.

For each eigenvalue $\lambda_{k}$ let $\mathbf{v}_{k}$ be a corresponding eigenvector (just pick one eigenvector for each eigenvalue). By Theorem 4.2.2 the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ is linearly independent, and since it consists of exactly $n=\dim V$ vectors it is a basis. ∎

4.2.4. Bases of subspaces (AKA direct sums of subspaces).

To describe diagonalizable operators we need to introduce some new definitions.

Let $V_{1},V_{2},\ldots,V_{p}$ be subspaces of a vector space $V$. We say that the system of subspaces is a basis in $V$ if any vector $\mathbf{v}\in V$ admits a unique representation as a sum

(4.2.7)\qquad\mathbf{v}=\mathbf{v}_{1}+\mathbf{v}_{2}+\ldots+\mathbf{v}_{p}=\sum_{k=1}^{p}\mathbf{v}_{k},\qquad\mathbf{v}_{k}\in V_{k}.

We also say that a system of subspaces $V_{1},V_{2},\ldots,V_{p}$ is linearly independent if the equation

\mathbf{v}_{1}+\mathbf{v}_{2}+\ldots+\mathbf{v}_{p}=\mathbf{0},\qquad\mathbf{v}_{k}\in V_{k}

has only the trivial solution ($\mathbf{v}_{k}=\mathbf{0}$ $\forall k=1,2,\ldots,p$).

Another way to phrase this is to say that a system of subspaces $V_{1},V_{2},\ldots,V_{p}$ is linearly independent if and only if any system of non-zero vectors $\mathbf{v}_{k}$, where $\mathbf{v}_{k}\in V_{k}$, is linearly independent.

We say that the system of subspaces $V_{1},V_{2},\ldots,V_{p}$ is generating (or complete, or spanning) if any vector $\mathbf{v}\in V$ admits a representation as in (4.2.7) (not necessarily unique).

Remark 4.2.4.

From the above definition one can immediately see that Theorem 4.2.2 in fact states that the system of eigenspaces $E_{k}$ of an operator $A$,

E_{k}:=\operatorname{Ker}(A-\lambda_{k}I),\qquad\lambda_{k}\in\sigma(A),

is linearly independent.

Remark 4.2.5.

It is easy to see that, similarly to bases of vectors, a system of subspaces $V_{1},V_{2},\ldots,V_{p}$ is a basis if and only if it is generating and linearly independent. We leave the proof of this fact as an exercise for the reader.

There is a simple example of a basis of subspaces. Let $V$ be a vector space with a basis $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$. Split the set of indices $1,2,\ldots,n$ into $p$ subsets $\Lambda_{1},\Lambda_{2},\ldots,\Lambda_{p}$, and define subspaces $V_{k}:=\operatorname{span}\{\mathbf{v}_{j}:j\in\Lambda_{k}\}$. Clearly the subspaces $V_{k}$ form a basis of $V$.

The following theorem shows that in the finite-dimensional case it is essentially the only possible example of a basis of subspaces.

Theorem 4.2.6.

Let $V_{1},V_{2},\ldots,V_{p}$ be a basis of subspaces, and let for each $k$ a collection $\mathcal{B}_{k}$ of vectors be a basis in $V_{k}$.\footnote{We do not list the vectors in $\mathcal{B}_{k}$; one should just keep in mind that each $\mathcal{B}_{k}$ consists of finitely many vectors in $V_{k}$.} Then the union $\cup_{k}\mathcal{B}_{k}$ of these bases is a basis in $V$.

To prove the theorem we need the following lemma

Lemma 4.2.7.

Let $V_{1},V_{2},\ldots,V_{p}$ be a linearly independent family of subspaces, and let us have in each subspace $V_{k}$ a linearly independent system $\mathcal{B}_{k}$ of vectors.\footnote{Again, here we do not name each vector in $\mathcal{B}_{k}$ individually; we just keep in mind that each set $\mathcal{B}_{k}$ consists of finitely many vectors.} Then the union $\mathcal{B}:=\cup_{k}\mathcal{B}_{k}$ is a linearly independent system.

Proof.

The proof of the lemma is almost trivial, if one thinks a bit about it. The main difficulty in writing the proof is the choice of an appropriate notation. Instead of using two indices (one for the number $k$ and the other for the number of a vector in $\mathcal{B}_{k}$), let us use “flat” notation.

Namely, let $n$ be the number of vectors in $\mathcal{B}:=\cup_{k}\mathcal{B}_{k}$. Let us order the set $\mathcal{B}$, for example as follows: first list all vectors from $\mathcal{B}_{1}$, then all vectors from $\mathcal{B}_{2}$, etc., listing all vectors from $\mathcal{B}_{p}$ last.

This way, we index all vectors in $\mathcal{B}$ by the integers $1,2,\ldots,n$, and the set of indices $\{1,2,\ldots,n\}$ splits into sets $\Lambda_{1},\Lambda_{2},\ldots,\Lambda_{p}$ such that the set $\mathcal{B}_{k}$ consists of the vectors $\mathbf{b}_{j}$, $j\in\Lambda_{k}$.

Suppose we have a non-trivial linear combination

(4.2.8)\qquad c_{1}\mathbf{b}_{1}+c_{2}\mathbf{b}_{2}+\ldots+c_{n}\mathbf{b}_{n}=\sum_{j=1}^{n}c_{j}\mathbf{b}_{j}=\mathbf{0}.

Denote

\mathbf{v}_{k}=\sum_{j\in\Lambda_{k}}c_{j}\mathbf{b}_{j}.

Then (4.2.8) can be rewritten as

\mathbf{v}_{1}+\mathbf{v}_{2}+\ldots+\mathbf{v}_{p}=\mathbf{0}.

Since $\mathbf{v}_{k}\in V_{k}$ and the system of subspaces $V_{k}$ is linearly independent, $\mathbf{v}_{k}=\mathbf{0}$ $\forall k$. That means that for every $k$

\sum_{j\in\Lambda_{k}}c_{j}\mathbf{b}_{j}=\mathbf{0},

and since the system of vectors $\mathbf{b}_{j}$, $j\in\Lambda_{k}$ (i.e. the system $\mathcal{B}_{k}$) is linearly independent, we have $c_{j}=0$ for all $j\in\Lambda_{k}$. Since this is true for all $\Lambda_{k}$, we can conclude that $c_{j}=0$ for all $j$. ∎

Proof of Theorem 4.2.6.

To prove the theorem we will use the same notation as in the proof of Lemma 4.2.7, i.e. the system $\mathcal{B}_{k}$ consists of the vectors $\mathbf{b}_{j}$, $j\in\Lambda_{k}$.

Lemma 4.2.7 asserts that the system of vectors $\mathbf{b}_{j}$, $j=1,2,\ldots,n$ is linearly independent, so it only remains to show that the system is complete.

Since the system of subspaces $V_{1},V_{2},\ldots,V_{p}$ is a basis, any vector $\mathbf{v}\in V$ can be represented as

\mathbf{v}=\mathbf{v}_{1}+\mathbf{v}_{2}+\ldots+\mathbf{v}_{p}=\sum_{k=1}^{p}\mathbf{v}_{k},\qquad\mathbf{v}_{k}\in V_{k}.

Since the vectors $\mathbf{b}_{j}$, $j\in\Lambda_{k}$ form a basis in $V_{k}$, the vectors $\mathbf{v}_{k}$ can be represented as

\mathbf{v}_{k}=\sum_{j\in\Lambda_{k}}c_{j}\mathbf{b}_{j},

and therefore $\mathbf{v}=\sum_{j=1}^{n}c_{j}\mathbf{b}_{j}$. ∎

4.2.5. Criterion of diagonalizability

First of all let us recall a simple necessary condition. Since the eigenvalues (counting multiplicities) of a diagonal matrix $D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$ are exactly $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$, we see that if an operator $A:V\to V$ is diagonalizable, it has exactly $n=\dim V$ eigenvalues (counting multiplicities).

The theorem below holds for both real and complex vector spaces (and even for spaces over general fields).

Theorem 4.2.8.

Let an operator $A:V\to V$ have exactly $n=\dim V$ eigenvalues (counting multiplicities).\footnote{Since any operator in a complex vector space has exactly $n$ eigenvalues (counting multiplicities), this assumption is automatically satisfied in the complex case.} Then $A$ is diagonalizable if and only if for each eigenvalue $\lambda$ the dimension of the eigenspace $\operatorname{Ker}(A-\lambda I)$ (i.e. the geometric multiplicity of $\lambda$) coincides with the algebraic multiplicity of $\lambda$.

Proof.

First of all let us note that for a diagonal matrix the algebraic and geometric multiplicities of eigenvalues coincide, and therefore the same holds for diagonalizable operators.

Let us now prove the other implication. Let $\lambda_{1},\lambda_{2},\ldots,\lambda_{p}$ be the eigenvalues of $A$, and let $E_{k}:=\operatorname{Ker}(A-\lambda_{k}I)$ be the corresponding eigenspaces. According to Remark 4.2.4, the subspaces $E_{k}$, $k=1,2,\ldots,p$ are linearly independent.

Let $\mathcal{B}_{k}$ be a basis in $E_{k}$. By Lemma 4.2.7 the system $\mathcal{B}=\cup_{k}\mathcal{B}_{k}$ is a linearly independent system of vectors.

We know that each $\mathcal{B}_{k}$ consists of $\dim E_{k}$ ($=$ the multiplicity of $\lambda_{k}$, by the assumption) vectors. So the number of vectors in $\mathcal{B}$ equals the sum of the multiplicities of the eigenvalues $\lambda_{k}$. But the sum of the multiplicities of the eigenvalues is the number of eigenvalues counting multiplicities, which is exactly $n=\dim V$. So, we have a linearly independent system of $n=\dim V$ eigenvectors, which means it is a basis. ∎
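The criterion can also be checked numerically. The sketch below is only a rough illustration of Theorem 4.2.8, since deciding which floating-point eigenvalues are "equal" requires a tolerance; the function name and tolerance are hypothetical choices.

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Rough numerical check: geometric multiplicity == algebraic multiplicity
    for every (numerically grouped) eigenvalue."""
    n = A.shape[0]
    remaining = list(np.linalg.eigvals(A))
    while remaining:
        lam = remaining[0]
        alg = sum(1 for z in remaining if abs(z - lam) < tol)       # algebraic mult.
        remaining = [z for z in remaining if abs(z - lam) >= tol]
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)  # dim Ker(A - lam I)
        if geo != alg:
            return False
    return True

print(is_diagonalizable(np.array([[1.0, 0.0], [0.0, 2.0]])))  # True
print(is_diagonalizable(np.array([[1.0, 1.0], [0.0, 1.0]])))  # False
```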

4.2.6. Real factorization

The theorem below is, in fact, already proven (it is essentially Theorem 4.2.8 for real spaces). We state it here to summarize the situation with real diagonalization of real matrices.

Theorem 4.2.9.

A real $n\times n$ matrix $A$ admits a real factorization (i.e. a representation $A=SDS^{-1}$ where $S$ and $D$ are real matrices, $D$ is diagonal and $S$ is invertible) if and only if it admits a complex factorization and all eigenvalues of $A$ are real.

4.2.7. Some examples

Real eigenvalues

Consider the matrix

A=\left(\begin{array}{cc}1&2\\ 8&1\end{array}\right).

Its characteristic polynomial is equal to

\left|\begin{array}{cc}1-\lambda&2\\ 8&1-\lambda\end{array}\right|=(1-\lambda)^{2}-16

and its roots (eigenvalues) are $\lambda=5$ and $\lambda=-3$. For the eigenvalue $\lambda=5$

A-5I=\left(\begin{array}{cc}1-5&2\\ 8&1-5\end{array}\right)=\left(\begin{array}{cc}-4&2\\ 8&-4\end{array}\right)

A basis in its nullspace consists of one vector $(1,2)^{T}$, so this is the corresponding eigenvector.

Similarly, for $\lambda=-3$

A-\lambda I=A+3I=\left(\begin{array}{cc}4&2\\ 8&4\end{array}\right)

and the eigenspace $\operatorname{Ker}(A+3I)$ is spanned by the vector $(1,-2)^{T}$. The matrix $A$ can be diagonalized as

A=\left(\begin{array}{cc}1&2\\ 8&1\end{array}\right)=\left(\begin{array}{cc}1&1\\ 2&-2\end{array}\right)\left(\begin{array}{cc}5&0\\ 0&-3\end{array}\right)\left(\begin{array}{cc}1&1\\ 2&-2\end{array}\right)^{-1}
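One can verify this factorization numerically with a minimal check of the example above:

```python
import numpy as np

A = np.array([[1, 2], [8, 1]], dtype=float)
S = np.array([[1, 1], [2, -2]], dtype=float)   # eigenvectors as columns
D = np.diag([5.0, -3.0])                       # corresponding eigenvalues

print(np.allclose(A, S @ D @ np.linalg.inv(S)))   # True
```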

Complex eigenvalues

Consider the matrix

A=\left(\begin{array}{cc}1&2\\ -2&1\end{array}\right).

Its characteristic polynomial is

\left|\begin{array}{cc}1-\lambda&2\\ -2&1-\lambda\end{array}\right|=(1-\lambda)^{2}+2^{2}

and the eigenvalues (roots of the characteristic polynomial) are $\lambda=1\pm 2i$. For $\lambda=1+2i$

A-\lambda I=\left(\begin{array}{cc}-2i&2\\ -2&-2i\end{array}\right)

This matrix has rank 1, so the eigenspace $\operatorname{Ker}(A-\lambda I)$ is spanned by one vector, for example by $(1,i)^{T}$.

Since the matrix $A$ is real, we do not need to compute an eigenvector for $\lambda=1-2i$: we can get it for free by taking the complex conjugate of the above eigenvector, see Exercise 4.2.2 below. So, for $\lambda=1-2i$ a corresponding eigenvector is $(1,-i)^{T}$, and so the matrix $A$ can be diagonalized as

A=\left(\begin{array}{cc}1&1\\ i&-i\end{array}\right)\left(\begin{array}{cc}1+2i&0\\ 0&1-2i\end{array}\right)\left(\begin{array}{cc}1&1\\ i&-i\end{array}\right)^{-1}.

A non-diagonalizable matrix

Consider the matrix

A=\left(\begin{array}{cc}1&1\\ 0&1\end{array}\right).

Its characteristic polynomial is

\left|\begin{array}{cc}1-\lambda&1\\ 0&1-\lambda\end{array}\right|=(1-\lambda)^{2},

so $A$ has an eigenvalue $1$ of multiplicity $2$. But it is easy to see that $\dim\operatorname{Ker}(A-I)=1$ (one pivot, so $2-1=1$ free variable). Therefore, the geometric multiplicity of the eigenvalue $1$ is different from its algebraic multiplicity, so $A$ is not diagonalizable.

There is also an explanation which does not use Theorem 4.2.8. Namely, we got that the eigenspace $\operatorname{Ker}(A-1\cdot I)$ is one-dimensional (spanned by the vector $(1,0)^{T}$). If $A$ were diagonalizable, it would have the diagonal form $\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)$ in some basis,\footnote{Note that the only linear transformation having the matrix $\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)$ in some basis is the identity transformation $I$. Since $A$ is definitely not the identity, we can immediately conclude that $A$ cannot be diagonalized, so counting the dimension of the eigenspace is not necessary.} and so the dimension of the eigenspace would be 2. Therefore $A$ cannot be diagonalized.
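For comparison, here is what a numerical routine reports for this matrix (a sketch only: `np.linalg.eig` still returns two eigenvector columns, but they are numerically parallel, so they do not form a basis of eigenvectors):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
eigvals, V = np.linalg.eig(A)

print(eigvals)                     # [1. 1.]  -- eigenvalue 1, algebraic multiplicity 2
print(np.linalg.matrix_rank(V))    # 1        -- only one independent eigenvector
```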

Exercises.

4.2.1.

Let $A$ be an $n\times n$ matrix. True or false:

  a) $A^{T}$ has the same eigenvalues as $A$.

  b) $A^{T}$ has the same eigenvectors as $A$.

  c) If $A$ is diagonalizable, then so is $A^{T}$.

Justify your conclusions.

4.2.2.

Let $A$ be a square matrix with real entries, and let $\lambda$ be its complex eigenvalue. Suppose $\mathbf{v}=(v_{1},v_{2},\ldots,v_{n})^{T}$ is a corresponding eigenvector, $A\mathbf{v}=\lambda\mathbf{v}$. Prove that $\overline{\lambda}$ is an eigenvalue of $A$ and $A\overline{\mathbf{v}}=\overline{\lambda}\overline{\mathbf{v}}$. Here $\overline{\mathbf{v}}$ is the complex conjugate of the vector $\mathbf{v}$, $\overline{\mathbf{v}}:=(\overline{v}_{1},\overline{v}_{2},\ldots,\overline{v}_{n})^{T}$.

4.2.3.

Let

A=\left(\begin{array}{rr}4&3\\ 1&2\end{array}\right).

Find $A^{2004}$ by diagonalizing $A$.

4.2.4.

Construct a matrix $A$ with eigenvalues 1 and 3 and corresponding eigenvectors $(1,2)^{T}$ and $(1,1)^{T}$. Is such a matrix unique?

4.2.5.

Diagonalize the following matrices, if possible:

  a) $\displaystyle\left(\begin{array}{rr}4&-2\\ 1&1\end{array}\right)$.

  b) $\displaystyle\left(\begin{array}{rr}-1&-1\\ 6&4\end{array}\right)$.

  c) $\displaystyle\left(\begin{array}{rrr}-2&2&6\\ 5&1&-6\\ -5&2&9\end{array}\right)$  ($\lambda=2$ is one of the eigenvalues)

4.2.6.

Consider the matrix

A=\left(\begin{array}{rrr}2&6&-6\\ 0&5&-2\\ 0&0&4\end{array}\right).

  a) Find its eigenvalues. Is it possible to find the eigenvalues without computing?

  b) Is this matrix diagonalizable? Find out without computing anything.

  c) If the matrix is diagonalizable, diagonalize it.

4.2.7.

Diagonalize the matrix

\left(\begin{array}{rrr}2&0&6\\ 0&2&4\\ 0&0&4\end{array}\right).
4.2.8.

Find all square roots of the matrix

A=\left(\begin{array}{rr}5&2\\ -3&0\end{array}\right)

i.e. find all matrices $B$ such that $B^{2}=A$. Hint: Finding a square root of a diagonal matrix is easy. You can leave your answer as a product.

4.2.9.

Let us recall that the famous Fibonacci sequence

0,1,1,2,3,5,8,13,21,\ldots

is defined as follows: we put $\varphi_{0}=0$, $\varphi_{1}=1$ and define

\varphi_{n+2}=\varphi_{n+1}+\varphi_{n}.

We want to find a formula for $\varphi_{n}$. To do this:

  a) Find a $2\times 2$ matrix $A$ such that

     \left(\begin{array}{c}\varphi_{n+2}\\ \varphi_{n+1}\end{array}\right)=A\left(\begin{array}{c}\varphi_{n+1}\\ \varphi_{n}\end{array}\right)

     Hint: Combine the trivial equation $\varphi_{n+1}=\varphi_{n+1}$ with the Fibonacci relation $\varphi_{n+2}=\varphi_{n+1}+\varphi_{n}$.

  b) Diagonalize $A$ and find a formula for $A^{n}$.

  c) Noticing that

     \left(\begin{array}{c}\varphi_{n+1}\\ \varphi_{n}\end{array}\right)=A^{n}\left(\begin{array}{c}\varphi_{1}\\ \varphi_{0}\end{array}\right)=A^{n}\left(\begin{array}{c}1\\ 0\end{array}\right)

     find a formula for $\varphi_{n}$. (You will need to compute an inverse and perform multiplication here.)

  d) Show that the vector $(\varphi_{n+1}/\varphi_{n},1)^{T}$ converges to an eigenvector of $A$.

     What do you think, is it a coincidence?

4.2.10.

Let $A$ be a $5\times 5$ matrix with 3 eigenvalues (not counting multiplicities). Suppose we know that one eigenspace is three-dimensional. Can you say if $A$ is diagonalizable?

4.2.11.

Give an example of a $3\times 3$ matrix which cannot be diagonalized. After you have constructed the matrix, can you make it “generic”, so that no special structure of the matrix can be seen?

4.2.12.

Let a non-zero matrix $A$ satisfy $A^{5}=\mathbf{0}$. Prove that $A$ cannot be diagonalized. More generally, prove that any non-zero nilpotent matrix, i.e. a non-zero matrix satisfying $A^{N}=\mathbf{0}$ for some $N$, cannot be diagonalized.

4.2.13.

Eigenvalues of a transposition:

  a) Consider the transformation $T$ in the space $M_{2\times 2}$ of $2\times 2$ matrices, $T(A)=A^{T}$. Find all its eigenvalues and eigenvectors. Is it possible to diagonalize this transformation? Hint: While it is possible to write a matrix of this linear transformation in some basis, compute the characteristic polynomial, and so on, it is easier to find the eigenvalues and eigenvectors directly from the definition.

  b) Can you do the same problem but in the space of $n\times n$ matrices?

4.2.14.

Prove that two subspaces $V_{1}$ and $V_{2}$ are linearly independent if and only if $V_{1}\cap V_{2}=\{\mathbf{0}\}$.