Chapter 7 Bilinear and quadratic forms

While the study of real quadratic forms (i.e. real homogeneous polynomials of degree 2) was probably the initial motivation for the subject of this chapter, complex quadratic forms $(A\mathbf{x},\mathbf{x})$, $\mathbf{x}\in\mathbb{C}^{n}$, $A=A^{*}$, are also of significant interest. So, unless otherwise specified, the results and calculations hold in both the real and the complex case.

To avoid writing essentially the same formulas twice, we use notation adapted to the complex case: in particular, in the real case the notation $A^{*}$ is used instead of $A^{T}$.

7.1. Main definition

7.1.1. Bilinear forms on $\mathbb{R}^{n}$

A bilinear form on $\mathbb{R}^{n}$ is a function $L=L(\mathbf{x},\mathbf{y})$ of two arguments $\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}$ which is linear in each argument, i.e. such that

  1.

    L(\alpha\mathbf{x}_{1}+\beta\mathbf{x}_{2},\mathbf{y})=\alpha L(\mathbf{x}_{1},\mathbf{y})+\beta L(\mathbf{x}_{2},\mathbf{y});

  2.

    L(\mathbf{x},\alpha\mathbf{y}_{1}+\beta\mathbf{y}_{2})=\alpha L(\mathbf{x},\mathbf{y}_{1})+\beta L(\mathbf{x},\mathbf{y}_{2}).

One can consider bilinear forms whose values belong to an arbitrary vector space, but in this book we only consider forms that take real values.

If $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}$ and $\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}$, a bilinear form can be written as

L(\mathbf{x},\mathbf{y})=\sum_{j,k=1}^{n}a_{j,k}x_{k}y_{j},

or in matrix form

L(\mathbf{x},\mathbf{y})=(A\mathbf{x},\mathbf{y})

where

A=\left(\begin{array}{cccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right).

The matrix $A$ is determined uniquely by the bilinear form $L$.

7.1.2. Quadratic forms on $\mathbb{R}^{n}$

There are several equivalent definitions of a quadratic form.

One can say that a quadratic form on $\mathbb{R}^{n}$ is the “diagonal” of a bilinear form $L$, i.e. that any quadratic form $Q$ is defined by $Q[\mathbf{x}]=L(\mathbf{x},\mathbf{x})=(A\mathbf{x},\mathbf{x})$.

Another, more algebraic, way is to say that a quadratic form is a homogeneous polynomial of degree 2, i.e. that $Q[\mathbf{x}]$ is a polynomial of $n$ variables $x_{1},x_{2},\ldots,x_{n}$ having only terms of degree 2. That means that only terms $ax_{k}^{2}$ and $cx_{j}x_{k}$ are allowed.

There are many ways (in fact, infinitely many) to write a quadratic form $Q[\mathbf{x}]$ as $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$. For example, the quadratic form $Q[\mathbf{x}]=x_{1}^{2}+x_{2}^{2}-4x_{1}x_{2}$ on $\mathbb{R}^{2}$ can be represented as $(A\mathbf{x},\mathbf{x})$ where $A$ can be any of the matrices

\left(\begin{array}{rr}1&-4\\ 0&1\end{array}\right),\qquad\left(\begin{array}{rr}1&0\\ -4&1\end{array}\right),\qquad\left(\begin{array}{rr}1&-2\\ -2&1\end{array}\right).

In fact, any matrix $A$ of the form

\left(\begin{array}{cc}1&a-4\\ -a&1\end{array}\right)

will work.

But if we require the matrix $A$ to be symmetric, then such a matrix is unique:

Any quadratic form $Q[\mathbf{x}]$ on $\mathbb{R}^{n}$ admits a unique representation $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ where $A$ is a (real) symmetric matrix.

For example, for the quadratic form

Q[\mathbf{x}]=x_{1}^{2}+3x_{2}^{2}+5x_{3}^{2}+4x_{1}x_{2}-16x_{1}x_{3}+7x_{2}x_{3}

on $\mathbb{R}^{3}$, the corresponding symmetric matrix $A$ is

\left(\begin{array}{ccc}1&2&-8\\ 2&3&3.5\\ -8&3.5&5\end{array}\right).
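
As a quick numerical sanity check (a sketch only; it assumes Python with NumPy, which are not part of the text), one can build the symmetric matrix above and verify that $\mathbf{x}^{T}A\mathbf{x}$ reproduces $Q[\mathbf{x}]$ at a random point:

    import numpy as np

    # Symmetric matrix of Q[x] = x1^2 + 3x2^2 + 5x3^2 + 4x1x2 - 16x1x3 + 7x2x3:
    # diagonal entries are the coefficients of x_k^2, off-diagonal entries are
    # half of the coefficients of the mixed terms x_j x_k.
    A = np.array([[ 1.0, 2.0, -8.0],
                  [ 2.0, 3.0,  3.5],
                  [-8.0, 3.5,  5.0]])

    x = np.random.randn(3)
    q_from_matrix = x @ A @ x
    q_direct = (x[0]**2 + 3*x[1]**2 + 5*x[2]**2
                + 4*x[0]*x[1] - 16*x[0]*x[2] + 7*x[1]*x[2])
    assert np.isclose(q_from_matrix, q_direct)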

7.1.3. Quadratic forms on $\mathbb{C}^{n}$

One can also define a quadratic form on $\mathbb{C}^{n}$ (or any complex inner product space) by taking a self-adjoint transformation $A=A^{*}$ and defining $Q$ by $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$. While our main examples will be in $\mathbb{R}^{n}$, all the theorems are true in the setting of $\mathbb{C}^{n}$ as well. Bearing this in mind, we will always use $A^{*}$ instead of $A^{T}$.

The only essential difference from the real case is that in the complex case we do not have any freedom of choice: if the quadratic form $(A\mathbf{x},\mathbf{x})$ is real-valued, the corresponding matrix $A$ has to be Hermitian (self-adjoint).

Note that if $A=A^{*}$ then

(A\mathbf{x},\mathbf{x})=(\mathbf{x},A^{*}\mathbf{x})=(\mathbf{x},A\mathbf{x})=\overline{(A\mathbf{x},\mathbf{x})},

so $(A\mathbf{x},\mathbf{x})\in\mathbb{R}$.

The converse is also true.

Lemma 7.1.1.

Let $(A\mathbf{x},\mathbf{x})$ be real for all $\mathbf{x}\in\mathbb{C}^{n}$. Then $A=A^{*}$.

We leave the proof as an exercise for the reader, see Problem 7.1.4 below.

One of the possible ways to prove Lemma 7.1.1 is to use the following version of polarization identities.

Lemma 7.1.2.

Let $A$ be an operator in an inner product space $X$.

  1.

    If $X$ is a complex space, then for any $\mathbf{x},\mathbf{y}\in X$

    (A\mathbf{x},\mathbf{y})=\frac{1}{4}\sum_{\alpha\in\mathbb{C}:\,\alpha^{4}=1}\alpha(A(\mathbf{x}+\alpha\mathbf{y}),\mathbf{x}+\alpha\mathbf{y}).

  2.

    If $X$ is a real space and $A=A^{*}$, then for any $\mathbf{x},\mathbf{y}\in X$

    (A\mathbf{x},\mathbf{y})=\frac{1}{4}\Bigl[(A(\mathbf{x}+\mathbf{y}),\mathbf{x}+\mathbf{y})-(A(\mathbf{x}-\mathbf{y}),\mathbf{x}-\mathbf{y})\Bigr].

For the proof of Lemma 7.1.2 see Exercise 5.6.3 in Chapter 5 above.
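
Readers who like to experiment can also test the polarization identities numerically. The following sketch (assuming Python with NumPy; the helper name ip is an ad hoc choice for the inner product, which is conjugate-linear in the second argument) checks both parts of Lemma 7.1.2 on random data:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4

    def ip(u, v):              # (u, v) = sum_k u_k * conj(v_k)
        return np.vdot(v, u)   # np.vdot conjugates its first argument

    # Complex case: (Ax, y) = 1/4 * sum over alpha^4 = 1 of alpha*(A(x+alpha*y), x+alpha*y)
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    lhs = ip(A @ x, y)
    rhs = 0.25 * sum(a * ip(A @ (x + a * y), x + a * y) for a in (1, 1j, -1, -1j))
    assert np.isclose(lhs, rhs)

    # Real case (A = A^T): (Ax, y) = 1/4 * [(A(x+y), x+y) - (A(x-y), x-y)]
    B = rng.standard_normal((n, n)); B = B + B.T
    u = rng.standard_normal(n); v = rng.standard_normal(n)
    assert np.isclose((B @ u) @ v, 0.25 * ((B @ (u+v)) @ (u+v) - (B @ (u-v)) @ (u-v)))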

Exercises.

7.1.1.

Find the matrix of the bilinear form $L$ on $\mathbb{R}^{3}$,

L(\mathbf{x},\mathbf{y})=x_{1}y_{1}+2x_{1}y_{2}+14x_{1}y_{3}-5x_{2}y_{1}+2x_{2}y_{2}-3x_{2}y_{3}+8x_{3}y_{1}+19x_{3}y_{2}-2x_{3}y_{3}.
7.1.2.

Define the bilinear form $L$ on $\mathbb{R}^{2}$ by

L(\mathbf{x},\mathbf{y})=\det[\mathbf{x},\mathbf{y}],

i.e. to compute $L(\mathbf{x},\mathbf{y})$ we form a $2\times 2$ matrix with columns $\mathbf{x},\mathbf{y}$ and compute its determinant.

Find the matrix of $L$.

7.1.3.

Find the matrix of the quadratic form $Q$ on $\mathbb{R}^{3}$

Q[\mathbf{x}]=x_{1}^{2}+2x_{1}x_{2}-3x_{1}x_{3}-9x_{2}^{2}+6x_{2}x_{3}+13x_{3}^{2}.
7.1.4.

Prove Lemma 7.1.1 above.

Hint: Use the polarization identity, see Lemma 7.1.2. Alternatively, you can consider the expression $(A(\mathbf{x}+z\mathbf{y}),\mathbf{x}+z\mathbf{y})$ and show that if it is real for all $z\in\mathbb{C}$ then $(A\mathbf{x},\mathbf{y})=\overline{(\mathbf{y},A^{*}\mathbf{x})}$.

7.2. Diagonalization of quadratic forms

You have probably met quadratic forms before, when you studied second order curves in the plane. Maybe you even studied second order surfaces in $\mathbb{R}^{3}$.

We want to present a unified approach to the classification of such objects. Suppose we are given a set in $\mathbb{R}^{n}$ defined by the equation $Q[\mathbf{x}]=1$, where $Q$ is some quadratic form. If $Q$ has some simple form, for example if the corresponding matrix is diagonal, i.e. if $Q[\mathbf{x}]=a_{1}x^{2}_{1}+a_{2}x^{2}_{2}+\ldots+a_{n}x^{2}_{n}$, we can easily visualize this set, especially if $n=2,3$. In higher dimensions it is also possible, if not to visualize, then to understand the structure of the set very well.

So, if we are given a general, complicated quadratic form, we want to simplify it as much as possible, for example to make it diagonal. The standard way of doing that is the change of variables.

7.2.1. Orthogonal diagonalization

Let us have a quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ in $\mathbb{F}^{n}$ ($\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$). Introduce new variables $\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}\in\mathbb{F}^{n}$, with $\mathbf{y}=S^{-1}\mathbf{x}$, where $S$ is some invertible $n\times n$ matrix, so $\mathbf{x}=S\mathbf{y}$.

Then,

Q[\mathbf{x}]=Q[S\mathbf{y}]=(AS\mathbf{y},S\mathbf{y})=(S^{*}AS\mathbf{y},\mathbf{y}),

so in the new variables $\mathbf{y}$ the quadratic form has matrix $S^{*}AS$.

So, we want to find an invertible matrix $S$ such that the matrix $S^{*}AS$ is diagonal. Note that this is different from the diagonalization of matrices we had before: there we tried to represent a matrix $A$ as $A=SDS^{-1}$, so that the matrix $D=S^{-1}AS$ was diagonal. However, for unitary matrices $U$ we have $U^{*}=U^{-1}$, and we can orthogonally diagonalize symmetric (self-adjoint) matrices. So we can apply the orthogonal diagonalization we studied before to quadratic forms.

Namely, we can represent the matrix $A$ as $A=UDU^{*}=UDU^{-1}$. Recall that $D$ is a diagonal matrix with the eigenvalues of $A$ on the diagonal, and $U$ is the matrix of eigenvectors (we need to pick an orthonormal basis of eigenvectors). Then $D=U^{*}AU$, so in the variables $\mathbf{y}=U^{-1}\mathbf{x}$ the quadratic form has a diagonal matrix.

Let us analyze the geometric meaning of the orthogonal diagonalization. The columns $\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n}$ of the unitary matrix $U$ form an orthonormal basis in $\mathbb{F}^{n}$; let us call this basis $\mathcal{B}$. The change of coordinates matrix $[I]_{\mathcal{S},\mathcal{B}}$ from this basis to the standard one is exactly $U$. We know that $\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}=U^{-1}\mathbf{x}$, so the coordinates $y_{1},y_{2},\ldots,y_{n}$ can be interpreted as the coordinates of the vector $\mathbf{x}$ in the new basis $\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n}$.

So, orthogonal diagonalization allows us to visualize very well the set $Q[\mathbf{x}]=1$, or a similar one, as long as we can visualize it for diagonal matrices.

Example.

Consider the quadratic form of two variables (i.e. a quadratic form on $\mathbb{R}^{2}$), $Q(x,y)=2x^{2}+2y^{2}+2xy$. Let us describe the set of points $(x,y)^{T}\in\mathbb{R}^{2}$ satisfying $Q(x,y)=1$.

The matrix $A$ of $Q$ is

A=\left(\begin{array}{cc}2&1\\ 1&2\end{array}\right).

Orthogonally diagonalizing this matrix we can represent it as

A=U\left(\begin{array}{cr}3&0\\ 0&1\end{array}\right)U^{*},\qquad\text{where}\quad U=\frac{1}{\sqrt{2}}\left(\begin{array}{cr}1&-1\\ 1&1\end{array}\right),

or, equivalently,

U^{*}AU=\left(\begin{array}{cr}3&0\\ 0&1\end{array}\right)=:D.

The set $\{\mathbf{y}:(D\mathbf{y},\mathbf{y})=1\}$ is the ellipse with half-axes $1/\sqrt{3}$ and $1$. Therefore the set $\{\mathbf{x}\in\mathbb{R}^{2}:(A\mathbf{x},\mathbf{x})=1\}$ is the same ellipse, only in the basis $(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})^{T}$, $(-\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})^{T}$, or, equivalently, the same ellipse rotated by $\pi/4$.
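
The orthogonal diagonalization in this example is easy to check numerically. The sketch below (assuming Python with NumPy, whose eigh routine returns an orthonormal basis of eigenvectors) verifies $U^{*}AU=D$ and $A=UDU^{*}$:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # np.linalg.eigh returns eigenvalues in increasing order and orthonormal eigenvectors
    lam, U = np.linalg.eigh(A)
    print(lam)                              # [1. 3.], the diagonal of D (up to ordering)

    D = U.T @ A @ U                         # U* A U should be diagonal
    assert np.allclose(D, np.diag(lam))
    assert np.allclose(U @ np.diag(lam) @ U.T, A)   # A = U D U*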

7.2.2. Non-orthogonal diagonalization

Orthogonal diagonalization involves computing eigenvalues and eigenvectors, so it may be difficult to do without a computer for large $n$. On the other hand, non-orthogonal diagonalization, i.e. finding an invertible $S$ (without requiring $S^{-1}=S^{*}$) such that $D=S^{*}AS$ is diagonal, is much easier computationally and can be done using only algebraic operations (addition, subtraction, multiplication, division).

Below we present the two most commonly used methods of non-orthogonal diagonalization.

Diagonalization by completion of squares

The first method is based on completion of squares. We will illustrate this method on real quadratic forms (forms on $\mathbb{R}^{n}$). After simple modifications this method can also be used in the complex case, but we will not discuss it here. If necessary, an interested reader should be able to make the appropriate modifications.

Let us again consider the quadratic form of two variables, $Q[\mathbf{x}]=2x_{1}^{2}+2x_{1}x_{2}+2x_{2}^{2}$ (it is the same quadratic form as in the above example, only here we call the variables not $x,y$ but $x_{1},x_{2}$). Since

2\left(x_{1}+\frac{1}{2}x_{2}\right)^{2}=2\left(x_{1}^{2}+2x_{1}\cdot\frac{1}{2}x_{2}+\frac{1}{4}x_{2}^{2}\right)=2x_{1}^{2}+2x_{1}x_{2}+\frac{1}{2}x_{2}^{2}

(note that the first two terms coincide with the first two terms of $Q$), we get

2x_{1}^{2}+2x_{1}x_{2}+2x_{2}^{2}=2\left(x_{1}+\frac{1}{2}x_{2}\right)^{2}+\frac{3}{2}x_{2}^{2}=2y_{1}^{2}+\frac{3}{2}y_{2}^{2},

where $y_{1}=x_{1}+\frac{1}{2}x_{2}$ and $y_{2}=x_{2}$.

The same method can be applied to quadratic forms of more than 2 variables. Let us consider, for example, a form $Q[\mathbf{x}]$ in $\mathbb{R}^{3}$,

Q[\mathbf{x}]=x_{1}^{2}-6x_{1}x_{2}+4x_{1}x_{3}-6x_{2}x_{3}+8x_{2}^{2}-3x_{3}^{2}.

Considering all terms involving the first variable $x_{1}$ (the first 3 terms in this case), let us pick a full square or a multiple of a full square which has exactly these terms (plus some other terms).

Since

(x_{1}-3x_{2}+2x_{3})^{2}=x_{1}^{2}-6x_{1}x_{2}+4x_{1}x_{3}-12x_{2}x_{3}+9x_{2}^{2}+4x_{3}^{2}

we can rewrite the quadratic form as

(x_{1}-3x_{2}+2x_{3})^{2}-x_{2}^{2}+6x_{2}x_{3}-7x_{3}^{2}.

Note that the expression $-x_{2}^{2}+6x_{2}x_{3}-7x_{3}^{2}$ involves only the variables $x_{2}$ and $x_{3}$. Since

-(x_{2}-3x_{3})^{2}=-(x_{2}^{2}-6x_{2}x_{3}+9x_{3}^{2})=-x_{2}^{2}+6x_{2}x_{3}-9x_{3}^{2}

we have

-x_{2}^{2}+6x_{2}x_{3}-7x_{3}^{2}=-(x_{2}-3x_{3})^{2}+2x_{3}^{2}.

Thus we can write the quadratic form $Q$ as

Q[\mathbf{x}]=(x_{1}-3x_{2}+2x_{3})^{2}-(x_{2}-3x_{3})^{2}+2x_{3}^{2}=y_{1}^{2}-y_{2}^{2}+2y_{3}^{2}

where

y_{1}=x_{1}-3x_{2}+2x_{3},\qquad y_{2}=x_{2}-3x_{3},\qquad y_{3}=x_{3}.
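
One way to check this computation is to note that the substitution above reads $\mathbf{y}=S\mathbf{x}$ with an upper triangular $S$, so the symmetric matrix of $Q$ must equal $S^{T}DS$ with $D=\operatorname{diag}\{1,-1,2\}$. A small sketch confirming this (assuming Python with NumPy):

    import numpy as np

    # Symmetric matrix of Q[x] = x1^2 - 6x1x2 + 4x1x3 - 6x2x3 + 8x2^2 - 3x3^2
    A = np.array([[ 1.0, -3.0,  2.0],
                  [-3.0,  8.0, -3.0],
                  [ 2.0, -3.0, -3.0]])

    # Change of variables y = S x read off from y1, y2, y3 above
    S = np.array([[1.0, -3.0,  2.0],
                  [0.0,  1.0, -3.0],
                  [0.0,  0.0,  1.0]])
    D = np.diag([1.0, -1.0, 2.0])

    # Q[x] = y^T D y = x^T (S^T D S) x, so S^T D S must equal A
    assert np.allclose(S.T @ D @ S, A)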

Finally, let us address the question that an attentive reader is probably already asking: what to do if at some point we do have a product of two variables, but no corresponding squares? For example, how to diagonalize the form $x_{1}x_{2}$? The answer follows immediately from the identity

(7.2.1)\qquad 4x_{1}x_{2}=(x_{1}+x_{2})^{2}-(x_{1}-x_{2})^{2},

which gives us the representation

Q[\mathbf{x}]=y_{1}^{2}-y_{2}^{2},\qquad y_{1}=(x_{1}+x_{2})/2,\ y_{2}=(x_{1}-x_{2})/2.

Diagonalization using row/column operations

There is another way of performing non-orthogonal diagonalization of a quadratic form. The idea is to perform row operations on the matrix $A$ of the quadratic form. The difference with row reduction (Gauss–Jordan elimination) is that after each row operation we need to perform the same column operation, the reason being that we want to make the matrix $S^{*}AS$ diagonal.

Let us explain how everything works on an example. Suppose we want to diagonalize a quadratic form with matrix

A=\left(\begin{array}{rrr}1&-1&3\\ -1&2&1\\ 3&1&1\end{array}\right).

We augment the matrix $A$ by the identity matrix, and perform row/column operations on the augmented matrix $(A|I)$. After each row operation we have to perform on the matrix $A$ the same column operation (in the case of complex Hermitian matrices we perform, for each row operation, the conjugate of the corresponding column operation, see Remark 7.2.1 below). We get

\left(\begin{array}{rrr|rrr}1&-1&3&1&0&0\\ -1&2&1&0&1&0\\ 3&1&1&0&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ +R_{1}\\ \vphantom{1}\end{array}\sim\left(\begin{array}{rrr|rrr}1&-1&3&1&0&0\\ 0&1&4&1&1&0\\ 3&1&1&0&0&1\end{array}\right)\sim

\left(\begin{array}{rrr|rrr}1&0&3&1&0&0\\ 0&1&4&1&1&0\\ 3&4&1&0&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ \vphantom{1}\\ -3R_{1}\end{array}\sim\left(\begin{array}{rrr|rrr}1&0&3&1&0&0\\ 0&1&4&1&1&0\\ 0&4&-8&-3&0&1\end{array}\right)\sim

\left(\begin{array}{rrr|rrr}1&0&0&1&0&0\\ 0&1&4&1&1&0\\ 0&4&-8&-3&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ \vphantom{1}\\ -4R_{2}\end{array}\sim\left(\begin{array}{rrr|rrr}1&0&0&1&0&0\\ 0&1&4&1&1&0\\ 0&0&-24&-7&-4&1\end{array}\right)\sim

\left(\begin{array}{rrr|rrr}1&0&0&1&0&0\\ 0&1&0&1&1&0\\ 0&0&-24&-7&-4&1\end{array}\right).

Note that we perform column operations only on the left side of the augmented matrix.

We get the diagonal matrix $D$ on the left and the matrix $S^{*}$ on the right, so $D=S^{*}AS$,

\left(\begin{array}{rrr}1&0&0\\ 0&1&0\\ 0&0&-24\end{array}\right)=\left(\begin{array}{rrr}1&0&0\\ 1&1&0\\ -7&-4&1\end{array}\right)\left(\begin{array}{rrr}1&-1&3\\ -1&2&1\\ 3&1&1\end{array}\right)\left(\begin{array}{rrr}1&1&-7\\ 0&1&-4\\ 0&0&1\end{array}\right).
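
The identity above is easy to verify numerically; in the following sketch (assuming Python with NumPy) the matrix $E=S^{*}$ is the right half of the transformed augmented matrix:

    import numpy as np

    A = np.array([[ 1.0, -1.0, 3.0],
                  [-1.0,  2.0, 1.0],
                  [ 3.0,  1.0, 1.0]])

    E = np.array([[ 1.0,  0.0, 0.0],      # right half of the augmented matrix, E = S*
                  [ 1.0,  1.0, 0.0],
                  [-7.0, -4.0, 1.0]])
    S = E.T                               # in the real case S = E^T

    D = E @ A @ E.T                       # same as S* A S
    assert np.allclose(D, np.diag([1.0, 1.0, -24.0]))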

Let us explain why the method works. A row operation is a left multiplication by an elementary matrix. The corresponding column operation is the right multiplication by the transposed elementary matrix. Therefore, performing row operations $E_{1},E_{2},\ldots,E_{N}$ and the same column operations we transform the matrix $A$ to

(7.2.2)\qquad E_{N}\ldots E_{2}E_{1}AE_{1}^{*}E_{2}^{*}\ldots E_{N}^{*}=EAE^{*},\qquad\text{where }E:=E_{N}\ldots E_{2}E_{1}.

As for the identity matrix on the right side, we performed only row operations on it, so the identity matrix is transformed to

E_{N}\ldots E_{2}E_{1}I=EI=E.

If we now denote $E^{*}=S$, we get that $S^{*}AS$ is a diagonal matrix, and the matrix $E=S^{*}$ is the right half of the transformed augmented matrix.

In the above example we got lucky, because we did not need to interchange two rows. This operation is a bit trickier to perform. It is quite simple if you know what to do, but it may be hard to guess the correct row operations. Let us consider, for example, a quadratic form with the matrix

A=\left(\begin{array}{cc}0&1\\ 1&0\end{array}\right).

If we want to diagonalize it by row and column operations, the simplest idea would be to interchange rows 1 and 2. But we must also perform the same column operation, i.e. interchange columns 1 and 2, so we will end up with the same matrix.

So, we need something more non-trivial. The identity (7.2.1), for example, can be used to diagonalize this quadratic form. However, a simpler idea also works: use row operations to get a non-zero entry on the diagonal! For example, if we start with making $a_{1,1}$ non-zero, the following series of row (and the corresponding column) operations is one of the possible choices:

\left(\begin{array}{rr|rr}0&1&1&0\\ 1&0&0&1\end{array}\right)\begin{array}{c}+\frac{1}{2}R_{2}\\ \vphantom{1}\end{array}\sim\left(\begin{array}{cc|cc}1/2&1&1&1/2\\ 1&0&0&1\end{array}\right)\sim

\left(\begin{array}{cr|cc}1&1&1&1/2\\ 1&0&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ -R_{1}\end{array}\sim\left(\begin{array}{cr|rc}1&1&1&1/2\\ 0&-1&-1&1/2\end{array}\right)\sim

\left(\begin{array}{cr|rc}1&0&1&1/2\\ 0&-1&-1&1/2\end{array}\right).
Remark.

Non-orthogonal diagonalization gives us a simple description of the set $Q[\mathbf{x}]=1$ in a non-orthogonal basis. It is harder to visualize than the representation given by the orthogonal diagonalization. However, if we are not interested in the details, for example if it is sufficient for us just to know that the set is an ellipsoid (or a hyperboloid, etc.), then non-orthogonal diagonalization is an easier way to get the answer.

Remark 7.2.1.

For quadratic forms with complex entries (i.e. for forms $(A\mathbf{x},\mathbf{x})$, $A=A^{*}$), the non-orthogonal diagonalization works the same way as in the real case, with the only difference that the corresponding “column operations” have complex conjugate coefficients.

The reason for that is that if a row operation is given by left multiplication by an elementary matrix $E_{k}$, then the corresponding column operation is given by right multiplication by $E_{k}^{*}$, see (7.2.2).

Note that formula (7.2.2) works in both the complex and the real case: in the real case we could write $E_{k}^{T}$ instead of $E_{k}^{*}$, but using the Hermitian adjoint allows us to have the same formula in both cases.

Exercises.

7.2.1.

Diagonalize the quadratic form with the matrix

A=\left(\begin{array}{ccc}1&2&1\\ 2&3&2\\ 1&2&1\end{array}\right).

Use two methods: completion of squares and row operations. Which one do you like better?

Can you say if the matrix $A$ is positive definite or not?

7.2.2.

For the matrix $A$

A=\left(\begin{array}{rrr}2&1&1\\ 1&2&1\\ 1&1&2\end{array}\right)

orthogonally diagonalize the corresponding quadratic form, i.e. find a diagonal matrix $D$ and a unitary matrix $U$ such that $D=U^{*}AU$.

7.3. Sylvester’s Law of Inertia

As we discussed above, there are many ways to diagonalize a quadratic form. Note that the resulting diagonal matrix is not unique. For example, if we got a diagonal matrix

D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\},

we can take a diagonal matrix

S=\operatorname{diag}\{s_{1},s_{2},\ldots,s_{n}\},\qquad s_{k}\in\mathbb{R},\quad s_{k}\neq 0,

and transform $D$ to

S^{*}DS=\operatorname{diag}\{s_{1}^{2}\lambda_{1},s_{2}^{2}\lambda_{2},\ldots,s_{n}^{2}\lambda_{n}\}.

This transformation changes the diagonal entries of $D$. However, it does not change the signs of the diagonal entries. And this is always the case!

Namely, the famous Sylvester’s Law of Inertia states that:

For a Hermitian matrix $A$ (i.e. for a quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$) and any of its diagonalizations $D=S^{*}AS$, the number of positive (negative, zero) diagonal entries of $D$ depends only on $A$, and not on the particular choice of diagonalization.

Here we of course assume that $S$ is an invertible matrix and $D$ is a diagonal one.

The idea of the proof of Sylvester’s Law of Inertia is to express the number of positive (negative, zero) diagonal entries of a diagonalization $D=S^{*}AS$ in terms of $A$, without involving $S$ or $D$.
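
For instance, the matrix from Section 7.2.2 was diagonalized there to $\operatorname{diag}\{1,1,-24\}$; its eigenvalues give another diagonalization (the orthogonal one), and the signs must agree. A sketch checking this (assuming Python with NumPy):

    import numpy as np

    A = np.array([[ 1.0, -1.0, 3.0],
                  [-1.0,  2.0, 1.0],
                  [ 3.0,  1.0, 1.0]])

    # Diagonal entries obtained by the row/column operations in Section 7.2.2
    d_congruence = np.array([1.0, 1.0, -24.0])

    # Eigenvalues give another diagonalization (the orthogonal one)
    eigs = np.linalg.eigvalsh(A)

    # The numbers of positive and negative entries must agree (Sylvester's Law of Inertia)
    assert np.sum(d_congruence > 0) == np.sum(eigs > 0)
    assert np.sum(d_congruence < 0) == np.sum(eigs < 0)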

We will need the following definition.

Definition.

Given an $n\times n$ Hermitian matrix $A=A^{*}$ (a quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ on $\mathbb{F}^{n}$) we call a subspace $E\subset\mathbb{F}^{n}$ positive (resp. negative, resp. neutral) if

(A\mathbf{x},\mathbf{x})>0\qquad(\text{resp. }(A\mathbf{x},\mathbf{x})<0,\quad\text{resp. }(A\mathbf{x},\mathbf{x})=0)

for all $\mathbf{x}\in E$, $\mathbf{x}\neq\mathbf{0}$.

Sometimes, to emphasize the role of $A$, we will say $A$-positive ($A$-negative, $A$-neutral).

Theorem 7.3.1.

Let $A$ be an $n\times n$ Hermitian matrix, and let $D=S^{*}AS$ be its diagonalization by an invertible matrix $S$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of an $A$-positive (resp. $A$-negative) subspace.

The above theorem says that if $r_{+}$ is the number of positive diagonal entries of $D$, then there exists an $A$-positive subspace $E$ of dimension $r_{+}$, but it is impossible to find a positive subspace $E$ with $\dim E>r_{+}$.

We will need the following lemma, which can be considered a particular case of the above theorem.

Lemma 7.3.2.

Let $D$ be a diagonal matrix $D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of a $D$-positive (resp. $D$-negative) subspace.

Proof.

By rearranging the standard basis in $\mathbb{F}^{n}$ (changing the numeration) we can always assume without loss of generality that the positive diagonal entries of $D$ are the first $r_{+}$ diagonal entries.

Consider the subspace $E_{+}$ spanned by the first $r_{+}$ coordinate vectors $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{r_{+}}$. Clearly $E_{+}$ is a $D$-positive subspace, and $\dim E_{+}=r_{+}$.

Let us now show that for any other $D$-positive subspace $E$ we have $\dim E\leq r_{+}$. Consider the orthogonal projection $P=P_{E_{+}}$,

P\mathbf{x}=(x_{1},x_{2},\ldots,x_{r_{+}},0,\ldots,0)^{T},\qquad\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}.

For a $D$-positive subspace $E$ define an operator $T:E\to E_{+}$ by

T\mathbf{x}=P\mathbf{x},\qquad\forall\mathbf{x}\in E.

In other words, $T$ is the restriction of the projection $P$: $P$ is defined on the whole space, but we restrict its domain to $E$ and its target space to $E_{+}$. We get an operator acting from $E$ to $E_{+}$, and we use a different letter to distinguish it from $P$.

Note that $\operatorname{Ker}T=\{\mathbf{0}\}$. Indeed, suppose that for some $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}\in E$ we have $T\mathbf{x}=P\mathbf{x}=\mathbf{0}$. Then, by the definition of $P$,

x_{1}=x_{2}=\ldots=x_{r_{+}}=0,

and therefore

(D\mathbf{x},\mathbf{x})=\sum_{k=r_{+}+1}^{n}\lambda_{k}|x_{k}|^{2}\leq 0\qquad(\lambda_{k}\leq 0\ \text{for }k>r_{+}).

But $\mathbf{x}$ belongs to a $D$-positive subspace $E$, so the inequality $(D\mathbf{x},\mathbf{x})\leq 0$ holds only for $\mathbf{x}=\mathbf{0}$.

Let us now apply the Rank Theorem (Theorem 2.7.1 from Chapter 2). First of all, $\operatorname{rank}T=\dim\operatorname{Ran}T\leq\dim E_{+}=r_{+}$ because $\operatorname{Ran}T\subset E_{+}$. By the Rank Theorem, $\dim\operatorname{Ker}T+\operatorname{rank}T=\dim E$. But we just proved that $\operatorname{Ker}T=\{\mathbf{0}\}$, i.e. that $\dim\operatorname{Ker}T=0$, so

\dim E=\operatorname{rank}T\leq\dim E_{+}=r_{+}.

To prove the statement about negative entries, we just apply the above reasoning to the matrix $-D$. ∎

Proof of Theorem 7.3.1.

Let $D=S^{*}AS$ be a diagonalization of $A$. Since

(D\mathbf{x},\mathbf{x})=(S^{*}AS\mathbf{x},\mathbf{x})=(AS\mathbf{x},S\mathbf{x})

it follows that for any $D$-positive subspace $E$, the subspace $SE$ is an $A$-positive subspace. The same identity implies that for any $A$-positive subspace $F$ the subspace $S^{-1}F$ is $D$-positive.

Since $S$ and $S^{-1}$ are invertible transformations, $\dim E=\dim SE$ and $\dim F=\dim S^{-1}F$. Therefore, for any $D$-positive subspace $E$ we can find an $A$-positive subspace (namely $SE$) of the same dimension, and vice versa: for any $A$-positive subspace $F$ we can find a $D$-positive subspace (namely $S^{-1}F$) of the same dimension. Therefore the maximal possible dimensions of an $A$-positive and a $D$-positive subspace coincide, and the theorem is proved.

The case of negative diagonal entries is treated similarly; we leave the details as an exercise for the reader. ∎

7.4. Positive definite forms. Minimax characterization of eigenvalues and Sylvester’s criterion of positivity

Definition.

A quadratic form $Q$ is called

  • Positive definite if $Q[\mathbf{x}]>0$ for all $\mathbf{x}\neq\mathbf{0}$.

  • Positive semidefinite if $Q[\mathbf{x}]\geq 0$ for all $\mathbf{x}$.

  • Negative definite if $Q[\mathbf{x}]<0$ for all $\mathbf{x}\neq\mathbf{0}$.

  • Negative semidefinite if $Q[\mathbf{x}]\leq 0$ for all $\mathbf{x}$.

  • Indefinite if it takes both positive and negative values, i.e. if there exist vectors $\mathbf{x}_{1}$ and $\mathbf{x}_{2}$ such that $Q[\mathbf{x}_{1}]>0$ and $Q[\mathbf{x}_{2}]<0$.

Definition.

A Hermitian matrix $A=A^{*}$ is called positive definite (negative definite, etc.) if the corresponding quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ is positive definite (negative definite, etc.).

Theorem 7.4.1.

Let $A=A^{*}$. Then

  1.

    $A$ is positive definite iff all eigenvalues of $A$ are positive.

  2.

    $A$ is positive semidefinite iff all eigenvalues of $A$ are non-negative.

  3.

    $A$ is negative definite iff all eigenvalues of $A$ are negative.

  4.

    $A$ is negative semidefinite iff all eigenvalues of $A$ are non-positive.

  5.

    $A$ is indefinite iff it has both positive and negative eigenvalues.

Proof.

The proof follows trivially from the orthogonal diagonalization. Indeed, there is an orthonormal basis in which the matrix of $A$ is diagonal, and for diagonal matrices the theorem is trivial. ∎

Remark.

Note that to find out whether a matrix (a quadratic form) is positive definite (negative definite, etc.) one does not have to compute eigenvalues. By Sylvester’s Law of Inertia it is sufficient to perform an arbitrary, not necessarily orthogonal, diagonalization $D=S^{*}AS$ and look at the diagonal entries of $D$.

7.4.1. Sylvester’s criterion of positivity

It is an easy exercise to see that a $2\times 2$ matrix

A=\left(\begin{array}{rr}a&b\\ \overline{b}&c\end{array}\right)

is positive definite if and only if

(7.4.1)\qquad a>0\qquad\text{and}\qquad\det A=ac-|b|^{2}>0.

Indeed, if $a>0$ and $\det A=ac-|b|^{2}>0$, then $c>0$, so $\operatorname{trace}A=a+c>0$. So we know that if $\lambda_{1},\lambda_{2}$ are the eigenvalues of $A$ then $\lambda_{1}\lambda_{2}>0$ (since $\det A>0$) and $\lambda_{1}+\lambda_{2}=\operatorname{trace}A>0$. But that is only possible if both eigenvalues are positive. So we have proved that conditions (7.4.1) imply that $A$ is positive definite. The opposite implication is quite simple; we leave it as an exercise for the reader.

This result can be generalized to the case of $n\times n$ matrices. Namely, for a matrix $A$

A=\left(\begin{array}{cccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right)

let us consider all its upper left submatrices

A_{1}=(a_{1,1}),\ A_{2}=\left(\begin{array}{cc}a_{1,1}&a_{1,2}\\ a_{2,1}&a_{2,2}\end{array}\right),\ A_{3}=\left(\begin{array}{ccc}a_{1,1}&a_{1,2}&a_{1,3}\\ a_{2,1}&a_{2,2}&a_{2,3}\\ a_{3,1}&a_{3,2}&a_{3,3}\end{array}\right),\ \ldots,\ A_{n}=A.
Theorem 7.4.2 (Sylvester’s Criterion of Positivity).

A matrix $A=A^{*}$ is positive definite if and only if

\det A_{k}>0\qquad\text{for all }k=1,2,\ldots,n.

First of all let us notice that if $A>0$ then $A_{k}>0$ as well (can you explain why?). Therefore, since all eigenvalues of a positive definite matrix are positive, see Theorem 7.4.1, $\det A_{k}>0$ for all $k$.

One can show that if $\det A_{k}>0$ for all $k$ then all eigenvalues of $A$ are positive by analyzing the diagonalization of a quadratic form using row and column operations, which was described in Section 7.2.2. The key here is the observation that if we perform the row/column operations in the natural order (i.e. first subtracting the first row/column from all other rows/columns, then subtracting the second row/column from the rows/columns $3,4,\ldots,n$, and so on), and if we are not doing any row interchanges, then we automatically diagonalize the quadratic forms $A_{k}$ as well. Namely, after we subtract the first and second rows and columns, we get a diagonalization of $A_{2}$; after we subtract the third row/column we get a diagonalization of $A_{3}$, and so on.

Since we are performing only row replacements we do not change the determinant. Moreover, since we are not performing row exchanges and perform the operations in the correct order, we preserve the determinants of all the $A_{k}$. Therefore, since $\det A_{k}$ equals the product of the first $k$ diagonal entries of the resulting diagonal matrix, the conditions $\det A_{k}>0$ guarantee that each new entry on the diagonal is positive.

Of course, one has to be sure that we can use only row replacements, and perform the operations in the correct order, i.e. that we do not encounter any pathological situation. If one analyzes the algorithm, one can see that the only bad situation that can happen is the one where at some step we have a zero in the pivot place. In other words, if after we subtracted the first $k$ rows and columns and obtained a diagonalization of $A_{k}$, the entry in the $(k+1)$st row and $(k+1)$st column is $0$. We leave it as an exercise for the reader to show that this is impossible. ∎
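
A sketch of the criterion in code (assuming Python with NumPy; the function name sylvester_positive_definite is just an illustrative choice), compared with the eigenvalue test of Theorem 7.4.1:

    import numpy as np

    def sylvester_positive_definite(A):
        """Check det A_k > 0 for all upper left (leading) submatrices A_k."""
        n = A.shape[0]
        return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])

    # Compare with the eigenvalue criterion of Theorem 7.4.1
    assert sylvester_positive_definite(A) == bool(np.all(np.linalg.eigvalsh(A) > 0))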

The proof we outlined above is quite simple. However, let us present, in more detail, another one, which can be found in more advanced textbooks. I personally prefer this second proof, for it demonstrates some important connections.

We will need the following characterization of the eigenvalues of a Hermitian matrix.

7.4.2. Minimax characterization of eigenvalues

For a subspace $E$ of an inner product space $X$ the codimension $\operatorname{codim}E$ of $E$ is defined as the dimension of its orthogonal complement, $\operatorname{codim}E=\dim(E^{\perp})$. Since for a subspace $E\subset X$, $\dim X=n$, we have $\dim E+\dim E^{\perp}=n$, we can see that $\operatorname{codim}E=\dim X-\dim E$.

Recall that the trivial subspace $\{\mathbf{0}\}$ has dimension zero, so the whole space $X$ has codimension 0.

Remark.

For an arbitrary vector space $X$ the codimension of a subspace $E$ can just be defined as $\operatorname{codim}E=\dim X-\dim E$.

A more “highbrow” definition would be the dimension of the quotient space $X/E$; however, in this book we do not discuss quotient spaces, and either definition ($\dim E^{\perp}$ or $\dim X-\dim E$) works for our purposes.

Theorem 7.4.3 (Minimax characterization of eigenvalues).

Let $A=A^{*}$ be an $n\times n$ matrix, and let $\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{n}$ be its eigenvalues taken in decreasing order. Then

\lambda_{k}=\max_{\substack{E:\\ \dim E=k}}\ \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\min_{\substack{F:\\ \operatorname{codim}F=k-1}}\ \max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x}).

Let us explain in more detail what expressions like $\max\min$ and $\min\max$ mean. To compute the first one, we need to consider all subspaces $E$ of dimension $k$. For each such subspace $E$ we consider the set of all $\mathbf{x}\in E$ of norm 1, and find the minimum of $(A\mathbf{x},\mathbf{x})$ over all such $\mathbf{x}$. Thus for each subspace we obtain a number, and we need to pick a subspace $E$ such that this number is maximal. That is the $\max\min$.

The $\min\max$ is defined similarly.

Remark.

A sophisticated reader may notice a problem here: why do the maxima and minima exist? It is well known that maxima and minima have a nasty habit of not existing: for example, the function $f(x)=x$ has neither a maximum nor a minimum on the open interval $(0,1)$.

However, in this case the maximum and minimum do exist. There are two possible explanations of the fact that $(A\mathbf{x},\mathbf{x})$ attains its maximum and minimum. The first one requires some familiarity with basic notions of analysis: one should just say that the unit sphere in $E$, i.e. the set $\{\mathbf{x}\in E:\|\mathbf{x}\|=1\}$, is compact, and that a continuous function ($Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ in our case) on a compact set attains its maximum and minimum.

Another explanation is to notice that the function $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$, $\mathbf{x}\in E$, is a quadratic form on $E$. It is not difficult to compute the matrix of this form in some orthonormal basis in $E$, but let us only note that this matrix is not $A$: it has to be a $k\times k$ matrix, where $k=\dim E$.

It is easy to see that for a quadratic form the maximum and minimum over the unit sphere are the maximal and minimal eigenvalues of its matrix.

As for optimizing over all subspaces, we will prove below that the maximum and minimum do exist.

Proof of Theorem 7.4.3.

First of all, by picking an appropriate orthonormal basis, we can assume without loss of generality that the matrix $A$ is diagonal, $A=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$.

Pick subspaces $E$ and $F$, $\dim E=k$, $\operatorname{codim}F=k-1$, i.e. $\dim F=n-k+1$. Since $\dim E+\dim F>n$, there exists a non-zero vector $\mathbf{x}_{0}\in E\cap F$. By normalizing it we can assume without loss of generality that $\|\mathbf{x}_{0}\|=1$. We can always arrange the eigenvalues in decreasing order, so let us assume that $\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{n}$.

Since $\mathbf{x}_{0}$ belongs to both subspaces $E$ and $F$,

\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\leq(A\mathbf{x}_{0},\mathbf{x}_{0})\leq\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x}).

We did not assume anything about the subspaces $E$ and $F$ except their dimensions, so the above inequality

(7.4.2)\qquad \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\leq\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})

holds for all pairs of subspaces $E$ and $F$ of appropriate dimensions.

Define

E_{0}:=\operatorname{span}\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{k}\},\qquad F_{0}:=\operatorname{span}\{\mathbf{e}_{k},\mathbf{e}_{k+1},\mathbf{e}_{k+2},\ldots,\mathbf{e}_{n}\}.

Since for a self-adjoint matrix $B$ the maximum and minimum of $(B\mathbf{x},\mathbf{x})$ over the unit sphere $\{\mathbf{x}:\|\mathbf{x}\|=1\}$ are the maximal and the minimal eigenvalue respectively (easy to check on diagonal matrices), we get that

\min_{\substack{\mathbf{x}\in E_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\max_{\substack{\mathbf{x}\in F_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k}.

It follows from (7.4.2) that for any subspace $E$, $\dim E=k$,

\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\leq\max_{\substack{\mathbf{x}\in F_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k}

and similarly, for any subspace $F$ of codimension $k-1$,

\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\geq\min_{\substack{\mathbf{x}\in E_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k}.

But by the equality above, the minimum over $E_{0}$ and the maximum over $F_{0}$ both equal $\lambda_{k}$, so $\min\max=\max\min=\lambda_{k}$. ∎

Corollary 7.4.4 (Intertwining of eigenvalues).

Let $A=A^{*}=\{a_{j,k}\}_{j,k=1}^{n}$ be a self-adjoint matrix, and let $\widetilde{A}=\{a_{j,k}\}_{j,k=1}^{n-1}$ be its submatrix of size $(n-1)\times(n-1)$. Let $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ and $\mu_{1},\mu_{2},\ldots,\mu_{n-1}$ be the eigenvalues of $A$ and $\widetilde{A}$ respectively, taken in decreasing order. Then

\lambda_{1}\geq\mu_{1}\geq\lambda_{2}\geq\mu_{2}\geq\ldots\geq\lambda_{n-1}\geq\mu_{n-1}\geq\lambda_{n},

i.e.

\lambda_{k}\geq\mu_{k}\geq\lambda_{k+1},\qquad k=1,2,\ldots,n-1.
Proof.

Let $\widetilde{X}\subset\mathbb{F}^{n}$ be the subspace spanned by the first $n-1$ basis vectors, $\widetilde{X}=\operatorname{span}\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n-1}\}$. Since $(\widetilde{A}\mathbf{x},\mathbf{x})=(A\mathbf{x},\mathbf{x})$ for all $\mathbf{x}\in\widetilde{X}$, Theorem 7.4.3 implies that

\mu_{k}=\max_{\substack{E\subset\widetilde{X}\\ \dim E=k}}\ \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x}).

To get $\lambda_{k}$ we need to take the maximum over the set of all subspaces $E$ of $\mathbb{F}^{n}$, $\dim E=k$, i.e. take the maximum over a bigger set (any subspace of $\widetilde{X}$ is a subspace of $\mathbb{F}^{n}$). Therefore

\mu_{k}\leq\lambda_{k}

(the maximum can only increase if we increase the set).

On the other hand, any subspace $E\subset\widetilde{X}$ of codimension $k-1$ (here we mean the codimension in $\widetilde{X}$) has dimension $n-1-(k-1)=n-k$, so its codimension in $\mathbb{F}^{n}$ is $k$. Therefore

\mu_{k}=\min_{\substack{E\subset\widetilde{X}\\ \dim E=n-k}}\ \max_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\geq\min_{\substack{E\subset\mathbb{F}^{n}\\ \dim E=n-k}}\ \max_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k+1}

(minimum over a bigger set can only be smaller). ∎
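
The intertwining inequalities are easy to test numerically; the following sketch (assuming Python with NumPy) checks them for a random real symmetric matrix and its $(n-1)\times(n-1)$ upper left corner:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    A = rng.standard_normal((n, n))
    A = (A + A.T) / 2                      # random real symmetric (self-adjoint) matrix

    lam = np.sort(np.linalg.eigvalsh(A))[::-1]           # eigenvalues of A, decreasing
    mu = np.sort(np.linalg.eigvalsh(A[:-1, :-1]))[::-1]  # eigenvalues of the (n-1) x (n-1) corner

    # lambda_k >= mu_k >= lambda_{k+1} for k = 1, ..., n-1 (up to rounding)
    assert np.all(lam[:-1] >= mu - 1e-12) and np.all(mu >= lam[1:] - 1e-12)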

Proof of Theorem 7.4.2.

If $A>0$, then $A_{k}>0$ for $k=1,2,\ldots,n$ as well (can you explain why?). Since all eigenvalues of a positive definite matrix are positive (see Theorem 7.4.1), $\det A_{k}>0$ for all $k=1,2,\ldots,n$.

Let us now prove the other implication. Let $\det A_{k}>0$ for all $k$. We will show, using induction on $k$, that all the $A_{k}$ (and so $A=A_{n}$) are positive definite.

Clearly $A_{1}$ is positive definite (it is a $1\times 1$ matrix, so $A_{1}=\det A_{1}$). Assuming that $A_{k-1}>0$ (and $\det A_{k}>0$), let us show that $A_{k}$ is positive definite. Let $\lambda_{1},\lambda_{2},\ldots,\lambda_{k}$ and $\mu_{1},\mu_{2},\ldots,\mu_{k-1}$ be the eigenvalues of $A_{k}$ and $A_{k-1}$ respectively, taken in decreasing order. By Corollary 7.4.4

\lambda_{j}\geq\mu_{j}>0\qquad\text{for }j=1,2,\ldots,k-1.

Since $\det A_{k}=\lambda_{1}\lambda_{2}\ldots\lambda_{k-1}\lambda_{k}>0$, the last eigenvalue $\lambda_{k}$ must also be positive. Therefore, since all its eigenvalues are positive, the matrix $A_{k}$ is positive definite. ∎

7.4.3. Some remarks

First of all notice that Sylvester’s Criterion of Positivity does not generalize to positive semidefinite matrices if $n\geq 3$, meaning that for $n\times n$ matrices, $n\geq 3$, the conditions $\det A_{k}\geq 0$ do not imply that $A$ is positive semidefinite, see Problem 7.4.6 below.

For $2\times 2$ matrices, however, the conditions $a_{1,1}>0$ and $\det A\geq 0$ do imply that $A$ is positive semidefinite, see Problem 7.4.3 below. This sometimes leads to the wrong conclusion about $n\times n$ matrices.

Finally, we should say a couple of words about negative definite matrices. It is a typical student mistake to say that the condition $\det A_{k}<0$ implies that $A$ is negative definite. But that is wrong!

To check whether the matrix $A$ is negative definite, one just has to check that the matrix $-A$ is positive definite. Applying Sylvester’s Criterion of Positivity to $-A$, one can see that $A$ is negative definite if and only if $(-1)^{k}\det A_{k}>0$ for all $k=1,2,\ldots,n$.
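
A small sketch of this alternating-sign test (assuming Python with NumPy; the helper name is purely illustrative), checked against the eigenvalue criterion on a simple negative definite matrix:

    import numpy as np

    def sylvester_negative_definite(A):
        """A = A* is negative definite iff (-1)^k det A_k > 0 for all k."""
        n = A.shape[0]
        return all((-1) ** k * np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

    A = np.array([[-2.0,  1.0],
                  [ 1.0, -2.0]])          # eigenvalues -1 and -3, so negative definite

    assert sylvester_negative_definite(A)
    assert np.all(np.linalg.eigvalsh(A) < 0)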

Exercises.

7.4.1.

Using Sylvester’s Criterion of Positivity check if the matrices

A=\left(\begin{array}{rrr}4&2&1\\ 2&3&-1\\ 1&-1&2\end{array}\right),\qquad B=\left(\begin{array}{rrr}3&-1&2\\ -1&4&-2\\ 2&-2&2\end{array}\right)

are positive definite or not.

Are the matrices $-A$, $A^{3}$ and $A^{-1}$, $A+B^{-1}$, $A+B$, $A-B$ positive definite?

7.4.2.

True or false:

  a)

    If $A$ is positive definite, then $A^{5}$ is positive definite.

  b)

    If $A$ is negative definite, then $A^{8}$ is negative definite.

  c)

    If $A$ is negative definite, then $A^{12}$ is positive definite.

  d)

    If $A$ is positive definite and $B$ is negative semidefinite, then $A-B$ is positive definite.

  e)

    If $A$ is indefinite, and $B$ is positive definite, then $A+B$ is indefinite.

7.4.3.

Let $A$ be a $2\times 2$ Hermitian matrix such that $a_{1,1}>0$, $\det A\geq 0$. Prove that $A$ is positive semidefinite.

7.4.4.

Find a real symmetric $n\times n$ matrix $A$ such that $\det A_{k}\geq 0$ for all $k=1,2,\ldots,n$, but the matrix $A$ is not positive semidefinite. Try to find an example for the minimal possible $n$.

7.4.5.

Let $A$ be an $n\times n$ Hermitian matrix such that $\det A_{k}>0$ for all $k=1,2,\ldots,n-1$ and $\det A\geq 0$. Prove that $A$ is positive semidefinite.

7.4.6.

Find a real symmetric $3\times 3$ matrix $A$ such that $a_{1,1}>0$, $\det A_{k}\geq 0$ for $k=2,3$, but the matrix $A$ is not positive semidefinite.

7.5. Positive definite forms and inner products

Let $V$ be an inner product space and let $\mathcal{B}=\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ be a basis (not necessarily orthogonal) in $V$. Let $G=\{g_{j,k}\}_{j,k=1}^{n}$ be the matrix defined by

g_{j,k}=(\mathbf{v}_{k},\mathbf{v}_{j}).

If $\mathbf{x}=\sum_{k}x_{k}\mathbf{v}_{k}$ and $\mathbf{y}=\sum_{k}y_{k}\mathbf{v}_{k}$, then

(\mathbf{x},\mathbf{y})=\left(\sum_{k}x_{k}\mathbf{v}_{k},\sum_{j}y_{j}\mathbf{v}_{j}\right)=\sum_{k,j=1}^{n}x_{k}\overline{y}_{j}(\mathbf{v}_{k},\mathbf{v}_{j})=\sum_{j=1}^{n}\sum_{k=1}^{n}g_{j,k}x_{k}\overline{y}_{j}=\left(G[\mathbf{x}]_{\mathcal{B}},[\mathbf{y}]_{\mathcal{B}}\right)_{\mathbb{C}^{n}},

where $(\,\cdot\,,\,\cdot\,)_{\mathbb{C}^{n}}$ stands for the standard inner product in $\mathbb{C}^{n}$. One can immediately see that $G$ is a positive definite matrix (why?).

So, when one works with coordinates in an arbitrary (not necessarily orthogonal) basis in an inner product space, the inner product (in terms of coordinates) is not computed as the standard inner product in $\mathbb{C}^{n}$, but with the help of a positive definite matrix $G$ as described above.

Note that this $G$-inner product coincides with the standard inner product in $\mathbb{C}^{n}$ if and only if $G=I$, which happens if and only if the basis $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ is orthonormal.

Conversely, given a positive definite matrix $G$ one can define a non-standard inner product (the $G$-inner product) in $\mathbb{C}^{n}$ by

(\mathbf{x},\mathbf{y})_{G}:=(G\mathbf{x},\mathbf{y})_{\mathbb{C}^{n}},\qquad\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}.

One can easily check that $(\mathbf{x},\mathbf{y})_{G}$ is indeed an inner product, i.e. that properties 1–4 from Section 5.1.3 of Chapter 5 are satisfied.
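
As an illustration (a sketch assuming Python with NumPy, with a randomly chosen basis), one can form the matrix $G$ for a non-orthogonal basis of $\mathbb{R}^{n}$ and check both that $G$ is positive definite and that the inner product is computed from coordinates via $G$:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    V = rng.standard_normal((n, n))   # columns v_1, ..., v_n: a (generically) non-orthogonal basis of R^n

    G = V.T @ V                       # g_{j,k} = (v_k, v_j); for a real basis G = V^T V
    assert np.all(np.linalg.eigvalsh(G) > 0)   # G is positive definite

    # Coordinates of x and y in the basis B = (v_1, ..., v_n)
    xB = rng.standard_normal(n)
    yB = rng.standard_normal(n)
    x = V @ xB
    y = V @ yB

    # (x, y) computed directly equals (G [x]_B, [y]_B)
    assert np.isclose(x @ y, (G @ xB) @ yB)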