Chapter 7 Bilinear and quadratic forms

While the study of real quadratic forms (i.e. real homogeneous polynomials of degree 2) was probably the initial motivation for the subject of this chapter, complex quadratic forms $(A\mathbf{x},\mathbf{x})$, $\mathbf{x}\in\mathbb{C}^{n}$, $A=A^{*}$, are also of significant interest. So, unless otherwise specified, the results and calculations hold in both the real and the complex case.

To avoid writing essentially the same formulas twice, we use notation adapted to the complex case: in particular, in the real case the notation $A^{*}$ is used instead of $A^{T}$.

7.1. Main definition

7.1.1. Bilinear forms on $\mathbb{R}^{n}$

A bilinear form on $\mathbb{R}^{n}$ is a function $L=L(\mathbf{x},\mathbf{y})$ of two arguments $\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}$ which is linear in each argument, i.e. such that

  1.

    L(\alpha\mathbf{x}_{1}+\beta\mathbf{x}_{2},\mathbf{y})=\alpha L(\mathbf{x}_{1},\mathbf{y})+\beta L(\mathbf{x}_{2},\mathbf{y});

  2.

    L(\mathbf{x},\alpha\mathbf{y}_{1}+\beta\mathbf{y}_{2})=\alpha L(\mathbf{x},\mathbf{y}_{1})+\beta L(\mathbf{x},\mathbf{y}_{2}).

One can consider bilinear forms whose values belong to an arbitrary vector space, but in this book we only consider forms that take real values.

If $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}$ and $\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}$, a bilinear form can be written as

L(\mathbf{x},\mathbf{y})=\sum_{j,k=1}^{n}a_{j,k}x_{k}y_{j},

or in matrix form

L(\mathbf{x},\mathbf{y})=(A\mathbf{x},\mathbf{y})

where

A=\left(\begin{array}{cccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right).

The matrix $A$ is determined uniquely by the bilinear form $L$.

7.1.2. Quadratic forms on $\mathbb{R}^{n}$

There are several equivalent definitions of a quadratic form.

One can say that a quadratic form on $\mathbb{R}^{n}$ is the “diagonal” of a bilinear form $L$, i.e. that any quadratic form $Q$ is defined by $Q[\mathbf{x}]=L(\mathbf{x},\mathbf{x})=(A\mathbf{x},\mathbf{x})$.

Another, more algebraic, way is to say that a quadratic form is a homogeneous polynomial of degree 2, i.e. that $Q[\mathbf{x}]$ is a polynomial of $n$ variables $x_{1},x_{2},\ldots,x_{n}$ having only terms of degree 2. That means that only terms $ax_{k}^{2}$ and $cx_{j}x_{k}$ are allowed.

There are many ways (in fact, infinitely many) to write a quadratic form $Q[\mathbf{x}]$ as $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$. For example, the quadratic form $Q[\mathbf{x}]=x_{1}^{2}+x_{2}^{2}-4x_{1}x_{2}$ on $\mathbb{R}^{2}$ can be represented as $(A\mathbf{x},\mathbf{x})$ where $A$ can be any of the matrices

\left(\begin{array}{rr}1&-4\\ 0&1\end{array}\right),\qquad\left(\begin{array}{rr}1&0\\ -4&1\end{array}\right),\qquad\left(\begin{array}{rr}1&-2\\ -2&1\end{array}\right).

In fact, any matrix $A$ of the form

\left(\begin{array}{cc}1&a-4\\ -a&1\end{array}\right)

will work.

But if we require the matrix $A$ to be symmetric, then such a matrix is unique:

Any quadratic form $Q[\mathbf{x}]$ on $\mathbb{R}^{n}$ admits a unique representation $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ where $A$ is a (real) symmetric matrix.

For example, for the quadratic form

Q[\mathbf{x}]=x_{1}^{2}+3x_{2}^{2}+5x_{3}^{2}+4x_{1}x_{2}-16x_{1}x_{3}+7x_{2}x_{3}

on $\mathbb{R}^{3}$, the corresponding symmetric matrix $A$ is

\left(\begin{array}{ccc}1&2&-8\\ 2&3&3.5\\ -8&3.5&5\end{array}\right).
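
As a quick numerical sanity check (a sketch only; it assumes Python with NumPy, which are not part of the text), one can build the symmetric matrix above and verify that $\mathbf{x}^{T}A\mathbf{x}$ reproduces $Q[\mathbf{x}]$ at a random point:

    import numpy as np

    # Symmetric matrix of Q[x] = x1^2 + 3x2^2 + 5x3^2 + 4x1x2 - 16x1x3 + 7x2x3:
    # diagonal entries are the coefficients of x_k^2, off-diagonal entries are
    # half of the coefficients of the mixed terms x_j x_k.
    A = np.array([[ 1.0, 2.0, -8.0],
                  [ 2.0, 3.0,  3.5],
                  [-8.0, 3.5,  5.0]])

    x = np.random.randn(3)
    q_from_matrix = x @ A @ x
    q_direct = (x[0]**2 + 3*x[1]**2 + 5*x[2]**2
                + 4*x[0]*x[1] - 16*x[0]*x[2] + 7*x[1]*x[2])
    assert np.isclose(q_from_matrix, q_direct)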

7.1.3. Quadratic forms on $\mathbb{C}^{n}$

One can also define a quadratic form on $\mathbb{C}^{n}$ (or any complex inner product space) by taking a self-adjoint transformation $A=A^{*}$ and defining $Q$ by $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$. While our main examples will be in $\mathbb{R}^{n}$, all the theorems are true in the setting of $\mathbb{C}^{n}$ as well. Bearing this in mind, we will always use $A^{*}$ instead of $A^{T}$.

The only essential difference from the real case is that in the complex case we do not have any freedom of choice: if the quadratic form $(A\mathbf{x},\mathbf{x})$ is real-valued, the corresponding matrix $A$ has to be Hermitian (self-adjoint).

Note that if $A=A^{*}$ then

(A\mathbf{x},\mathbf{x})=(\mathbf{x},A^{*}\mathbf{x})=(\mathbf{x},A\mathbf{x})=\overline{(A\mathbf{x},\mathbf{x})},

so $(A\mathbf{x},\mathbf{x})\in\mathbb{R}$.

The converse is also true.

Lemma 7.1.1.

Let $(A\mathbf{x},\mathbf{x})$ be real for all $\mathbf{x}\in\mathbb{C}^{n}$. Then $A=A^{*}$.

We leave the proof as an exercise for the reader, see Problem 7.1.4 below.

One of the possible ways to prove Lemma 7.1.1 is to use the following version of polarization identities.

Lemma 7.1.2.

Let $A$ be an operator in an inner product space $X$.

  1.

    If $X$ is a complex space, then for any $\mathbf{x},\mathbf{y}\in X$

    (A\mathbf{x},\mathbf{y})=\frac{1}{4}\sum_{\alpha\in\mathbb{C}:\,\alpha^{4}=1}\alpha(A(\mathbf{x}+\alpha\mathbf{y}),\mathbf{x}+\alpha\mathbf{y}).

  2.

    If $X$ is a real space and $A=A^{*}$, then for any $\mathbf{x},\mathbf{y}\in X$

    (A\mathbf{x},\mathbf{y})=\frac{1}{4}\Bigl[(A(\mathbf{x}+\mathbf{y}),\mathbf{x}+\mathbf{y})-(A(\mathbf{x}-\mathbf{y}),\mathbf{x}-\mathbf{y})\Bigr].

For the proof of Lemma 7.1.2 see Exercise 5.6.3 in Chapter 5 above.
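
Readers who like to experiment can also test the polarization identities numerically. The following sketch (assuming Python with NumPy; the helper name ip is an ad hoc choice for the inner product, which is conjugate-linear in the second argument) checks both parts of Lemma 7.1.2 on random data:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4

    def ip(u, v):              # (u, v) = sum_k u_k * conj(v_k)
        return np.vdot(v, u)   # np.vdot conjugates its first argument

    # Complex case: (Ax, y) = 1/4 * sum over alpha^4 = 1 of alpha*(A(x+alpha*y), x+alpha*y)
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    lhs = ip(A @ x, y)
    rhs = 0.25 * sum(a * ip(A @ (x + a * y), x + a * y) for a in (1, 1j, -1, -1j))
    assert np.isclose(lhs, rhs)

    # Real case (A = A^T): (Ax, y) = 1/4 * [(A(x+y), x+y) - (A(x-y), x-y)]
    B = rng.standard_normal((n, n)); B = B + B.T
    u = rng.standard_normal(n); v = rng.standard_normal(n)
    assert np.isclose((B @ u) @ v, 0.25 * ((B @ (u+v)) @ (u+v) - (B @ (u-v)) @ (u-v)))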

Exercises.

7.1.1.

Find the matrix of the bilinear form $L$ on $\mathbb{R}^{3}$,

L(\mathbf{x},\mathbf{y})=x_{1}y_{1}+2x_{1}y_{2}+14x_{1}y_{3}-5x_{2}y_{1}+2x_{2}y_{2}-3x_{2}y_{3}+8x_{3}y_{1}+19x_{3}y_{2}-2x_{3}y_{3}.
7.1.2.

Define the bilinear form $L$ on $\mathbb{R}^{2}$ by

L(\mathbf{x},\mathbf{y})=\det[\mathbf{x},\mathbf{y}],

i.e. to compute $L(\mathbf{x},\mathbf{y})$ we form a $2\times 2$ matrix with columns $\mathbf{x},\mathbf{y}$ and compute its determinant.

Find the matrix of $L$.

7.1.3.

Find the matrix of the quadratic form $Q$ on $\mathbb{R}^{3}$

Q[\mathbf{x}]=x_{1}^{2}+2x_{1}x_{2}-3x_{1}x_{3}-9x_{2}^{2}+6x_{2}x_{3}+13x_{3}^{2}.
7.1.4.

Prove Lemma 7.1.1 above.

Hint: Use the polarization identity, see Lemma 7.1.2. Alternatively, you can consider the expression $(A(\mathbf{x}+z\mathbf{y}),\mathbf{x}+z\mathbf{y})$ and show that if it is real for all $z\in\mathbb{C}$ then $(A\mathbf{x},\mathbf{y})=\overline{(\mathbf{y},A^{*}\mathbf{x})}$.

7.2. Diagonalization of quadratic forms

You have probably met quadratic forms before, when you studied second order curves in the plane. Maybe you even studied second order surfaces in $\mathbb{R}^{3}$.

We want to present a unified approach to the classification of such objects. Suppose we are given a set in $\mathbb{R}^{n}$ defined by the equation $Q[\mathbf{x}]=1$, where $Q$ is some quadratic form. If $Q$ has some simple form, for example if the corresponding matrix is diagonal, i.e. if $Q[\mathbf{x}]=a_{1}x^{2}_{1}+a_{2}x^{2}_{2}+\ldots+a_{n}x^{2}_{n}$, we can easily visualize this set, especially if $n=2,3$. In higher dimensions it is also possible, if not to visualize, then to understand the structure of the set very well.

So, if we are given a general, complicated quadratic form, we want to simplify it as much as possible, for example to make it diagonal. The standard way of doing that is the change of variables.

7.2.1. Orthogonal diagonalization

Let us have a quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ in $\mathbb{F}^{n}$ ($\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$). Introduce new variables $\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}\in\mathbb{F}^{n}$, with $\mathbf{y}=S^{-1}\mathbf{x}$, where $S$ is some invertible $n\times n$ matrix, so $\mathbf{x}=S\mathbf{y}$.

Then,

Q[\mathbf{x}]=Q[S\mathbf{y}]=(AS\mathbf{y},S\mathbf{y})=(S^{*}AS\mathbf{y},\mathbf{y}),

so in the new variables $\mathbf{y}$ the quadratic form has matrix $S^{*}AS$.

So, we want to find an invertible matrix $S$ such that the matrix $S^{*}AS$ is diagonal. Note that this is different from the diagonalization of matrices we had before: there we tried to represent a matrix $A$ as $A=SDS^{-1}$, so that the matrix $D=S^{-1}AS$ was diagonal. However, for unitary matrices $U$ we have $U^{*}=U^{-1}$, and we can orthogonally diagonalize symmetric (self-adjoint) matrices. So we can apply the orthogonal diagonalization we studied before to quadratic forms.

Namely, we can represent the matrix $A$ as $A=UDU^{*}=UDU^{-1}$. Recall that $D$ is a diagonal matrix with the eigenvalues of $A$ on the diagonal, and $U$ is the matrix of eigenvectors (we need to pick an orthonormal basis of eigenvectors). Then $D=U^{*}AU$, so in the variables $\mathbf{y}=U^{-1}\mathbf{x}$ the quadratic form has a diagonal matrix.

Let us analyze the geometric meaning of the orthogonal diagonalization. The columns $\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n}$ of the unitary matrix $U$ form an orthonormal basis in $\mathbb{F}^{n}$; let us call this basis $\mathcal{B}$. The change of coordinates matrix $[I]_{\mathcal{S},\mathcal{B}}$ from this basis to the standard one is exactly $U$. We know that $\mathbf{y}=(y_{1},y_{2},\ldots,y_{n})^{T}=U^{-1}\mathbf{x}$, so the coordinates $y_{1},y_{2},\ldots,y_{n}$ can be interpreted as the coordinates of the vector $\mathbf{x}$ in the new basis $\mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{n}$.

So, orthogonal diagonalization allows us to visualize very well the set $Q[\mathbf{x}]=1$, or a similar one, as long as we can visualize it for diagonal matrices.

Example.

Consider the quadratic form of two variables (i.e. a quadratic form on $\mathbb{R}^{2}$), $Q(x,y)=2x^{2}+2y^{2}+2xy$. Let us describe the set of points $(x,y)^{T}\in\mathbb{R}^{2}$ satisfying $Q(x,y)=1$.

The matrix $A$ of $Q$ is

A=\left(\begin{array}{cc}2&1\\ 1&2\end{array}\right).

Orthogonally diagonalizing this matrix we can represent it as

A=U\left(\begin{array}{cr}3&0\\ 0&1\end{array}\right)U^{*},\qquad\text{where}\quad U=\frac{1}{\sqrt{2}}\left(\begin{array}{cr}1&-1\\ 1&1\end{array}\right),

or, equivalently,

U^{*}AU=\left(\begin{array}{cr}3&0\\ 0&1\end{array}\right)=:D.

The set $\{\mathbf{y}:(D\mathbf{y},\mathbf{y})=1\}$ is the ellipse with half-axes $1/\sqrt{3}$ and $1$. Therefore the set $\{\mathbf{x}\in\mathbb{R}^{2}:(A\mathbf{x},\mathbf{x})=1\}$ is the same ellipse, only in the basis $(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})^{T}$, $(-\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})^{T}$, or, equivalently, the same ellipse rotated by $\pi/4$.
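
The orthogonal diagonalization in this example is easy to check numerically. The sketch below (assuming Python with NumPy, whose eigh routine returns an orthonormal basis of eigenvectors) verifies $U^{*}AU=D$ and $A=UDU^{*}$:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # np.linalg.eigh returns eigenvalues in increasing order and orthonormal eigenvectors
    lam, U = np.linalg.eigh(A)
    print(lam)                              # [1. 3.], the diagonal of D (up to ordering)

    D = U.T @ A @ U                         # U* A U should be diagonal
    assert np.allclose(D, np.diag(lam))
    assert np.allclose(U @ np.diag(lam) @ U.T, A)   # A = U D U*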

7.2.2. Non-orthogonal diagonalization

Orthogonal diagonalization involves computing eigenvalues and eigenvectors, so it may be difficult to do without a computer for large $n$. On the other hand, non-orthogonal diagonalization, i.e. finding an invertible $S$ (without requiring $S^{-1}=S^{*}$) such that $D=S^{*}AS$ is diagonal, is much easier computationally and can be done using only algebraic operations (addition, subtraction, multiplication, division).

Below we present the two most commonly used methods of non-orthogonal diagonalization.

Diagonalization by completion of squares

The first method is based on completion of squares. We will illustrate this method on real quadratic forms (forms on $\mathbb{R}^{n}$). After simple modifications this method can also be used in the complex case, but we will not discuss it here. If necessary, an interested reader should be able to make the appropriate modifications.

Let us again consider the quadratic form of two variables, $Q[\mathbf{x}]=2x_{1}^{2}+2x_{1}x_{2}+2x_{2}^{2}$ (it is the same quadratic form as in the above example, only here we call the variables not $x,y$ but $x_{1},x_{2}$). Since

2\left(x_{1}+\frac{1}{2}x_{2}\right)^{2}=2\left(x_{1}^{2}+2x_{1}\cdot\frac{1}{2}x_{2}+\frac{1}{4}x_{2}^{2}\right)=2x_{1}^{2}+2x_{1}x_{2}+\frac{1}{2}x_{2}^{2}

(note that the first two terms coincide with the first two terms of $Q$), we get

2x_{1}^{2}+2x_{1}x_{2}+2x_{2}^{2}=2\left(x_{1}+\frac{1}{2}x_{2}\right)^{2}+\frac{3}{2}x_{2}^{2}=2y_{1}^{2}+\frac{3}{2}y_{2}^{2},

where $y_{1}=x_{1}+\frac{1}{2}x_{2}$ and $y_{2}=x_{2}$.

The same method can be applied to quadratic forms of more than 2 variables. Let us consider, for example, a form $Q[\mathbf{x}]$ in $\mathbb{R}^{3}$,

Q[\mathbf{x}]=x_{1}^{2}-6x_{1}x_{2}+4x_{1}x_{3}-6x_{2}x_{3}+8x_{2}^{2}-3x_{3}^{2}.

Considering all terms involving the first variable $x_{1}$ (the first 3 terms in this case), let us pick a full square or a multiple of a full square which has exactly these terms (plus some other terms).

Since

(x_{1}-3x_{2}+2x_{3})^{2}=x_{1}^{2}-6x_{1}x_{2}+4x_{1}x_{3}-12x_{2}x_{3}+9x_{2}^{2}+4x_{3}^{2}

we can rewrite the quadratic form as

(x_{1}-3x_{2}+2x_{3})^{2}-x_{2}^{2}+6x_{2}x_{3}-7x_{3}^{2}.

Note that the expression $-x_{2}^{2}+6x_{2}x_{3}-7x_{3}^{2}$ involves only the variables $x_{2}$ and $x_{3}$. Since

-(x_{2}-3x_{3})^{2}=-(x_{2}^{2}-6x_{2}x_{3}+9x_{3}^{2})=-x_{2}^{2}+6x_{2}x_{3}-9x_{3}^{2}

we have

-x_{2}^{2}+6x_{2}x_{3}-7x_{3}^{2}=-(x_{2}-3x_{3})^{2}+2x_{3}^{2}.

Thus we can write the quadratic form $Q$ as

Q[\mathbf{x}]=(x_{1}-3x_{2}+2x_{3})^{2}-(x_{2}-3x_{3})^{2}+2x_{3}^{2}=y_{1}^{2}-y_{2}^{2}+2y_{3}^{2}

where

y_{1}=x_{1}-3x_{2}+2x_{3},\qquad y_{2}=x_{2}-3x_{3},\qquad y_{3}=x_{3}.
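
One way to check this computation is to note that the substitution above reads $\mathbf{y}=S\mathbf{x}$ with an upper triangular $S$, so the symmetric matrix of $Q$ must equal $S^{T}DS$ with $D=\operatorname{diag}\{1,-1,2\}$. A small sketch confirming this (assuming Python with NumPy):

    import numpy as np

    # Symmetric matrix of Q[x] = x1^2 - 6x1x2 + 4x1x3 - 6x2x3 + 8x2^2 - 3x3^2
    A = np.array([[ 1.0, -3.0,  2.0],
                  [-3.0,  8.0, -3.0],
                  [ 2.0, -3.0, -3.0]])

    # Change of variables y = S x read off from y1, y2, y3 above
    S = np.array([[1.0, -3.0,  2.0],
                  [0.0,  1.0, -3.0],
                  [0.0,  0.0,  1.0]])
    D = np.diag([1.0, -1.0, 2.0])

    # Q[x] = y^T D y = x^T (S^T D S) x, so S^T D S must equal A
    assert np.allclose(S.T @ D @ S, A)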

Finally, let us address the question that an attentive reader is probably already asking: what to do if at some point we do have a product of two variables, but no corresponding squares? For example, how to diagonalize the form $x_{1}x_{2}$? The answer follows immediately from the identity

(7.2.1)\qquad 4x_{1}x_{2}=(x_{1}+x_{2})^{2}-(x_{1}-x_{2})^{2},

which gives us the representation

Q[\mathbf{x}]=y_{1}^{2}-y_{2}^{2},\qquad y_{1}=(x_{1}+x_{2})/2,\ y_{2}=(x_{1}-x_{2})/2.

Diagonalization using row/column operations

There is another way of performing non-orthogonal diagonalization of a quadratic form. The idea is to perform row operations on the matrix $A$ of the quadratic form. The difference with row reduction (Gauss–Jordan elimination) is that after each row operation we need to perform the same column operation, the reason being that we want to make the matrix $S^{*}AS$ diagonal.

Let us explain how everything works on an example. Suppose we want to diagonalize a quadratic form with matrix

A=\left(\begin{array}{rrr}1&-1&3\\ -1&2&1\\ 3&1&1\end{array}\right).

We augment the matrix $A$ by the identity matrix, and perform row/column operations on the augmented matrix $(A|I)$. After each row operation we have to perform on the matrix $A$ the same column operation (in the case of complex Hermitian matrices we perform, for each row operation, the conjugate of the corresponding column operation, see Remark 7.2.1 below). We get

\left(\begin{array}{rrr|rrr}1&-1&3&1&0&0\\ -1&2&1&0&1&0\\ 3&1&1&0&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ +R_{1}\\ \vphantom{1}\end{array}\sim\left(\begin{array}{rrr|rrr}1&-1&3&1&0&0\\ 0&1&4&1&1&0\\ 3&1&1&0&0&1\end{array}\right)\sim

\left(\begin{array}{rrr|rrr}1&0&3&1&0&0\\ 0&1&4&1&1&0\\ 3&4&1&0&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ \vphantom{1}\\ -3R_{1}\end{array}\sim\left(\begin{array}{rrr|rrr}1&0&3&1&0&0\\ 0&1&4&1&1&0\\ 0&4&-8&-3&0&1\end{array}\right)\sim

\left(\begin{array}{rrr|rrr}1&0&0&1&0&0\\ 0&1&4&1&1&0\\ 0&4&-8&-3&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ \vphantom{1}\\ -4R_{2}\end{array}\sim\left(\begin{array}{rrr|rrr}1&0&0&1&0&0\\ 0&1&4&1&1&0\\ 0&0&-24&-7&-4&1\end{array}\right)\sim

\left(\begin{array}{rrr|rrr}1&0&0&1&0&0\\ 0&1&0&1&1&0\\ 0&0&-24&-7&-4&1\end{array}\right).

Note that we perform column operations only on the left side of the augmented matrix.

We get the diagonal matrix $D$ on the left and the matrix $S^{*}$ on the right, so $D=S^{*}AS$,

\left(\begin{array}{rrr}1&0&0\\ 0&1&0\\ 0&0&-24\end{array}\right)=\left(\begin{array}{rrr}1&0&0\\ 1&1&0\\ -7&-4&1\end{array}\right)\left(\begin{array}{rrr}1&-1&3\\ -1&2&1\\ 3&1&1\end{array}\right)\left(\begin{array}{rrr}1&1&-7\\ 0&1&-4\\ 0&0&1\end{array}\right).
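
The identity above is easy to verify numerically; in the following sketch (assuming Python with NumPy) the matrix $E=S^{*}$ is the right half of the transformed augmented matrix:

    import numpy as np

    A = np.array([[ 1.0, -1.0, 3.0],
                  [-1.0,  2.0, 1.0],
                  [ 3.0,  1.0, 1.0]])

    E = np.array([[ 1.0,  0.0, 0.0],      # right half of the augmented matrix, E = S*
                  [ 1.0,  1.0, 0.0],
                  [-7.0, -4.0, 1.0]])
    S = E.T                               # in the real case S = E^T

    D = E @ A @ E.T                       # same as S* A S
    assert np.allclose(D, np.diag([1.0, 1.0, -24.0]))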

Let us explain why the method works. A row operation is a left multiplication by an elementary matrix. The corresponding column operation is the right multiplication by the transposed elementary matrix. Therefore, performing row operations $E_{1},E_{2},\ldots,E_{N}$ and the same column operations we transform the matrix $A$ to

(7.2.2)\qquad E_{N}\ldots E_{2}E_{1}AE_{1}^{*}E_{2}^{*}\ldots E_{N}^{*}=EAE^{*},\qquad\text{where }E:=E_{N}\ldots E_{2}E_{1}.

As for the identity matrix on the right side, we performed only row operations on it, so the identity matrix is transformed to

E_{N}\ldots E_{2}E_{1}I=EI=E.

If we now denote $E^{*}=S$, we get that $S^{*}AS$ is a diagonal matrix, and the matrix $E=S^{*}$ is the right half of the transformed augmented matrix.

In the above example we got lucky, because we did not need to interchange two rows. This operation is a bit trickier to perform. It is quite simple if you know what to do, but it may be hard to guess the correct row operations. Let us consider, for example, a quadratic form with the matrix

A=\left(\begin{array}{cc}0&1\\ 1&0\end{array}\right).

If we want to diagonalize it by row and column operations, the simplest idea would be to interchange rows 1 and 2. But we must also perform the same column operation, i.e. interchange columns 1 and 2, so we will end up with the same matrix.

So, we need something more non-trivial. The identity (7.2.1), for example, can be used to diagonalize this quadratic form. However, a simpler idea also works: use row operations to get a non-zero entry on the diagonal! For example, if we start with making $a_{1,1}$ non-zero, the following series of row (and the corresponding column) operations is one of the possible choices:

\left(\begin{array}{rr|rr}0&1&1&0\\ 1&0&0&1\end{array}\right)\begin{array}{c}+\frac{1}{2}R_{2}\\ \vphantom{1}\end{array}\sim\left(\begin{array}{cc|cc}1/2&1&1&1/2\\ 1&0&0&1\end{array}\right)\sim

\left(\begin{array}{cr|cc}1&1&1&1/2\\ 1&0&0&1\end{array}\right)\begin{array}{c}\vphantom{1}\\ -R_{1}\end{array}\sim\left(\begin{array}{cr|rc}1&1&1&1/2\\ 0&-1&-1&1/2\end{array}\right)\sim

\left(\begin{array}{cr|rc}1&0&1&1/2\\ 0&-1&-1&1/2\end{array}\right).
Remark.

Non-orthogonal diagonalization gives us a simple description of the set $Q[\mathbf{x}]=1$ in a non-orthogonal basis. It is harder to visualize than the representation given by the orthogonal diagonalization. However, if we are not interested in the details, for example if it is sufficient for us just to know that the set is an ellipsoid (or a hyperboloid, etc.), then non-orthogonal diagonalization is an easier way to get the answer.

Remark 7.2.1.

For quadratic forms with complex entries (i.e. for forms $(A\mathbf{x},\mathbf{x})$, $A=A^{*}$), the non-orthogonal diagonalization works the same way as in the real case, with the only difference that the corresponding “column operations” have complex conjugate coefficients.

The reason for that is that if a row operation is given by left multiplication by an elementary matrix $E_{k}$, then the corresponding column operation is given by right multiplication by $E_{k}^{*}$, see (7.2.2).

Note that formula (7.2.2) works in both the complex and the real case: in the real case we could write $E_{k}^{T}$ instead of $E_{k}^{*}$, but using the Hermitian adjoint allows us to have the same formula in both cases.

Exercises.

7.2.1.

Diagonalize the quadratic form with the matrix

A=\left(\begin{array}{ccc}1&2&1\\ 2&3&2\\ 1&2&1\end{array}\right).

Use two methods: completion of squares and row operations. Which one do you like better?

Can you say if the matrix $A$ is positive definite or not?

7.2.2.

For the matrix $A$

A=\left(\begin{array}{rrr}2&1&1\\ 1&2&1\\ 1&1&2\end{array}\right)

orthogonally diagonalize the corresponding quadratic form, i.e. find a diagonal matrix $D$ and a unitary matrix $U$ such that $D=U^{*}AU$.

7.3. Sylvester’s Law of Inertia

As we discussed above, there are many ways to diagonalize a quadratic form. Note that the resulting diagonal matrix is not unique. For example, if we got a diagonal matrix

D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\},

we can take a diagonal matrix

S=\operatorname{diag}\{s_{1},s_{2},\ldots,s_{n}\},\qquad s_{k}\in\mathbb{R},\quad s_{k}\neq 0,

and transform $D$ to

S^{*}DS=\operatorname{diag}\{s_{1}^{2}\lambda_{1},s_{2}^{2}\lambda_{2},\ldots,s_{n}^{2}\lambda_{n}\}.

This transformation changes the diagonal entries of $D$. However, it does not change the signs of the diagonal entries. And this is always the case!

Namely, the famous Sylvester’s Law of Inertia states that:

For a Hermitian matrix $A$ (i.e. for a quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$) and any of its diagonalizations $D=S^{*}AS$, the number of positive (negative, zero) diagonal entries of $D$ depends only on $A$, and not on the particular choice of diagonalization.

Here we of course assume that $S$ is an invertible matrix and $D$ is a diagonal one.

The idea of the proof of Sylvester’s Law of Inertia is to express the number of positive (negative, zero) diagonal entries of a diagonalization $D=S^{*}AS$ in terms of $A$, without involving $S$ or $D$.
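
For instance, the matrix from Section 7.2.2 was diagonalized there to $\operatorname{diag}\{1,1,-24\}$; its eigenvalues give another diagonalization (the orthogonal one), and the signs must agree. A sketch checking this (assuming Python with NumPy):

    import numpy as np

    A = np.array([[ 1.0, -1.0, 3.0],
                  [-1.0,  2.0, 1.0],
                  [ 3.0,  1.0, 1.0]])

    # Diagonal entries obtained by the row/column operations in Section 7.2.2
    d_congruence = np.array([1.0, 1.0, -24.0])

    # Eigenvalues give another diagonalization (the orthogonal one)
    eigs = np.linalg.eigvalsh(A)

    # The numbers of positive and negative entries must agree (Sylvester's Law of Inertia)
    assert np.sum(d_congruence > 0) == np.sum(eigs > 0)
    assert np.sum(d_congruence < 0) == np.sum(eigs < 0)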

We will need the following definition.

Definition.

Given an $n\times n$ Hermitian matrix $A=A^{*}$ (a quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ on $\mathbb{F}^{n}$) we call a subspace $E\subset\mathbb{F}^{n}$ positive (resp. negative, resp. neutral) if

(A\mathbf{x},\mathbf{x})>0\qquad(\text{resp. }(A\mathbf{x},\mathbf{x})<0,\quad\text{resp. }(A\mathbf{x},\mathbf{x})=0)

for all $\mathbf{x}\in E$, $\mathbf{x}\neq\mathbf{0}$.

Sometimes, to emphasize the role of $A$, we will say $A$-positive ($A$-negative, $A$-neutral).

Theorem 7.3.1.

Let $A$ be an $n\times n$ Hermitian matrix, and let $D=S^{*}AS$ be its diagonalization by an invertible matrix $S$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of an $A$-positive (resp. $A$-negative) subspace.

The above theorem says that if $r_{+}$ is the number of positive diagonal entries of $D$, then there exists an $A$-positive subspace $E$ of dimension $r_{+}$, but it is impossible to find a positive subspace $E$ with $\dim E>r_{+}$.

We will need the following lemma, which can be considered a particular case of the above theorem.

Lemma 7.3.2.

Let $D$ be a diagonal matrix $D=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of a $D$-positive (resp. $D$-negative) subspace.

Proof.

By rearranging the standard basis in $\mathbb{F}^{n}$ (changing the numeration) we can always assume without loss of generality that the positive diagonal entries of $D$ are the first $r_{+}$ diagonal entries.

Consider the subspace $E_{+}$ spanned by the first $r_{+}$ coordinate vectors $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{r_{+}}$. Clearly $E_{+}$ is a $D$-positive subspace, and $\dim E_{+}=r_{+}$.

Let us now show that for any other $D$-positive subspace $E$ we have $\dim E\leq r_{+}$. Consider the orthogonal projection $P=P_{E_{+}}$,

P\mathbf{x}=(x_{1},x_{2},\ldots,x_{r_{+}},0,\ldots,0)^{T},\qquad\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}.

For a $D$-positive subspace $E$ define an operator $T:E\to E_{+}$ by

T\mathbf{x}=P\mathbf{x},\qquad\forall\mathbf{x}\in E.

In other words, $T$ is the restriction of the projection $P$: $P$ is defined on the whole space, but we restrict its domain to $E$ and its target space to $E_{+}$. We get an operator acting from $E$ to $E_{+}$, and we use a different letter to distinguish it from $P$.

Note that $\operatorname{Ker}T=\{\mathbf{0}\}$. Indeed, suppose that for some $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}\in E$ we have $T\mathbf{x}=P\mathbf{x}=\mathbf{0}$. Then, by the definition of $P$,

x_{1}=x_{2}=\ldots=x_{r_{+}}=0,

and therefore

(D\mathbf{x},\mathbf{x})=\sum_{k=r_{+}+1}^{n}\lambda_{k}|x_{k}|^{2}\leq 0\qquad(\lambda_{k}\leq 0\ \text{for }k>r_{+}).

But $\mathbf{x}$ belongs to a $D$-positive subspace $E$, so the inequality $(D\mathbf{x},\mathbf{x})\leq 0$ holds only for $\mathbf{x}=\mathbf{0}$.

Let us now apply the Rank Theorem (Theorem 2.7.1 from Chapter 2). First of all, $\operatorname{rank}T=\dim\operatorname{Ran}T\leq\dim E_{+}=r_{+}$ because $\operatorname{Ran}T\subset E_{+}$. By the Rank Theorem, $\dim\operatorname{Ker}T+\operatorname{rank}T=\dim E$. But we just proved that $\operatorname{Ker}T=\{\mathbf{0}\}$, i.e. that $\dim\operatorname{Ker}T=0$, so

\dim E=\operatorname{rank}T\leq\dim E_{+}=r_{+}.

To prove the statement about negative entries, we just apply the above reasoning to the matrix $-D$. ∎

Proof of Theorem 7.3.1.

Let $D=S^{*}AS$ be a diagonalization of $A$. Since

(D\mathbf{x},\mathbf{x})=(S^{*}AS\mathbf{x},\mathbf{x})=(AS\mathbf{x},S\mathbf{x})

it follows that for any $D$-positive subspace $E$, the subspace $SE$ is an $A$-positive subspace. The same identity implies that for any $A$-positive subspace $F$ the subspace $S^{-1}F$ is $D$-positive.

Since $S$ and $S^{-1}$ are invertible transformations, $\dim E=\dim SE$ and $\dim F=\dim S^{-1}F$. Therefore, for any $D$-positive subspace $E$ we can find an $A$-positive subspace (namely $SE$) of the same dimension, and vice versa: for any $A$-positive subspace $F$ we can find a $D$-positive subspace (namely $S^{-1}F$) of the same dimension. Therefore the maximal possible dimensions of an $A$-positive and a $D$-positive subspace coincide, and the theorem is proved.

The case of negative diagonal entries is treated similarly; we leave the details as an exercise for the reader. ∎

7.4. Positive definite forms. Minimax characterization of eigenvalues and Sylvester’s criterion of positivity

Definition.

A quadratic form $Q$ is called

  • Positive definite if $Q[\mathbf{x}]>0$ for all $\mathbf{x}\neq\mathbf{0}$.

  • Positive semidefinite if $Q[\mathbf{x}]\geq 0$ for all $\mathbf{x}$.

  • Negative definite if $Q[\mathbf{x}]<0$ for all $\mathbf{x}\neq\mathbf{0}$.

  • Negative semidefinite if $Q[\mathbf{x}]\leq 0$ for all $\mathbf{x}$.

  • Indefinite if it takes both positive and negative values, i.e. if there exist vectors $\mathbf{x}_{1}$ and $\mathbf{x}_{2}$ such that $Q[\mathbf{x}_{1}]>0$ and $Q[\mathbf{x}_{2}]<0$.

Definition.

A Hermitian matrix $A=A^{*}$ is called positive definite (negative definite, etc.) if the corresponding quadratic form $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ is positive definite (negative definite, etc.).

Theorem 7.4.1.

Let $A=A^{*}$. Then

  1.

    $A$ is positive definite iff all eigenvalues of $A$ are positive.

  2.

    $A$ is positive semidefinite iff all eigenvalues of $A$ are non-negative.

  3.

    $A$ is negative definite iff all eigenvalues of $A$ are negative.

  4.

    $A$ is negative semidefinite iff all eigenvalues of $A$ are non-positive.

  5.

    $A$ is indefinite iff it has both positive and negative eigenvalues.

Proof.

The proof follows trivially from the orthogonal diagonalization. Indeed, there is an orthonormal basis in which the matrix of $A$ is diagonal, and for diagonal matrices the theorem is trivial. ∎

Remark.

Note that to find out whether a matrix (a quadratic form) is positive definite (negative definite, etc.) one does not have to compute eigenvalues. By Sylvester’s Law of Inertia it is sufficient to perform an arbitrary, not necessarily orthogonal, diagonalization $D=S^{*}AS$ and look at the diagonal entries of $D$.

7.4.1. Sylvester’s criterion of positivity

It is an easy exercise to see that a $2\times 2$ matrix

A=\left(\begin{array}{rr}a&b\\ \overline{b}&c\end{array}\right)

is positive definite if and only if

(7.4.1)\qquad a>0\qquad\text{and}\qquad\det A=ac-|b|^{2}>0.

Indeed, if $a>0$ and $\det A=ac-|b|^{2}>0$, then $c>0$, so $\operatorname{trace}A=a+c>0$. So we know that if $\lambda_{1},\lambda_{2}$ are the eigenvalues of $A$ then $\lambda_{1}\lambda_{2}>0$ (since $\det A>0$) and $\lambda_{1}+\lambda_{2}=\operatorname{trace}A>0$. But that is only possible if both eigenvalues are positive. So we have proved that conditions (7.4.1) imply that $A$ is positive definite. The opposite implication is quite simple; we leave it as an exercise for the reader.

This result can be generalized to the case of $n\times n$ matrices. Namely, for a matrix $A$

A=\left(\begin{array}{cccc}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right)

let us consider all its upper left submatrices

A_{1}=(a_{1,1}),\ A_{2}=\left(\begin{array}{cc}a_{1,1}&a_{1,2}\\ a_{2,1}&a_{2,2}\end{array}\right),\ A_{3}=\left(\begin{array}{ccc}a_{1,1}&a_{1,2}&a_{1,3}\\ a_{2,1}&a_{2,2}&a_{2,3}\\ a_{3,1}&a_{3,2}&a_{3,3}\end{array}\right),\ \ldots,\ A_{n}=A.
Theorem 7.4.2 (Sylvester’s Criterion of Positivity).

A matrix $A=A^{*}$ is positive definite if and only if

\det A_{k}>0\qquad\text{for all }k=1,2,\ldots,n.

First of all let us notice that if $A>0$ then $A_{k}>0$ as well (can you explain why?). Therefore, since all eigenvalues of a positive definite matrix are positive, see Theorem 7.4.1, $\det A_{k}>0$ for all $k$.

One can show that if $\det A_{k}>0$ for all $k$ then all eigenvalues of $A$ are positive by analyzing the diagonalization of a quadratic form using row and column operations, which was described in Section 7.2.2. The key here is the observation that if we perform the row/column operations in the natural order (i.e. first subtracting the first row/column from all other rows/columns, then subtracting the second row/column from the rows/columns $3,4,\ldots,n$, and so on), and if we are not doing any row interchanges, then we automatically diagonalize the quadratic forms $A_{k}$ as well. Namely, after we subtract the first and second rows and columns, we get a diagonalization of $A_{2}$; after we subtract the third row/column we get a diagonalization of $A_{3}$, and so on.

Since we are performing only row replacements we do not change the determinant. Moreover, since we are not performing row exchanges and perform the operations in the correct order, we preserve the determinants of all the $A_{k}$. Therefore, since $\det A_{k}$ equals the product of the first $k$ diagonal entries of the resulting diagonal matrix, the conditions $\det A_{k}>0$ guarantee that each new entry on the diagonal is positive.

Of course, one has to be sure that we can use only row replacements, and perform the operations in the correct order, i.e. that we do not encounter any pathological situation. If one analyzes the algorithm, one can see that the only bad situation that can happen is the one where at some step we have a zero in the pivot place. In other words, if after we subtracted the first $k$ rows and columns and obtained a diagonalization of $A_{k}$, the entry in the $(k+1)$st row and $(k+1)$st column is $0$. We leave it as an exercise for the reader to show that this is impossible. ∎
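
A sketch of the criterion in code (assuming Python with NumPy; the function name sylvester_positive_definite is just an illustrative choice), compared with the eigenvalue test of Theorem 7.4.1:

    import numpy as np

    def sylvester_positive_definite(A):
        """Check det A_k > 0 for all upper left (leading) submatrices A_k."""
        n = A.shape[0]
        return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])

    # Compare with the eigenvalue criterion of Theorem 7.4.1
    assert sylvester_positive_definite(A) == bool(np.all(np.linalg.eigvalsh(A) > 0))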

The proof we outlined above is quite simple. However, let us present, in more detail, another one, which can be found in more advanced textbooks. I personally prefer this second proof, for it demonstrates some important connections.

We will need the following characterization of the eigenvalues of a Hermitian matrix.

7.4.2. Minimax characterization of eigenvalues

For a subspace $E$ of an inner product space $X$ the codimension $\operatorname{codim}E$ of $E$ is defined as the dimension of its orthogonal complement, $\operatorname{codim}E=\dim(E^{\perp})$. Since for a subspace $E\subset X$, $\dim X=n$, we have $\dim E+\dim E^{\perp}=n$, we can see that $\operatorname{codim}E=\dim X-\dim E$.

Recall that the trivial subspace $\{\mathbf{0}\}$ has dimension zero, so the whole space $X$ has codimension 0.

Remark.

For an arbitrary vector space $X$ the codimension of a subspace $E$ can just be defined as $\operatorname{codim}E=\dim X-\dim E$.

A more “highbrow” definition would be the dimension of the quotient space $X/E$; however, in this book we do not discuss quotient spaces, and either definition ($\dim E^{\perp}$ or $\dim X-\dim E$) works for our purposes.

Theorem 7.4.3 (Minimax characterization of eigenvalues).

Let $A=A^{*}$ be an $n\times n$ matrix, and let $\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{n}$ be its eigenvalues taken in decreasing order. Then

\lambda_{k}=\max_{\substack{E:\\ \dim E=k}}\ \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\min_{\substack{F:\\ \operatorname{codim}F=k-1}}\ \max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x}).

Let us explain in more detail what expressions like $\max\min$ and $\min\max$ mean. To compute the first one, we need to consider all subspaces $E$ of dimension $k$. For each such subspace $E$ we consider the set of all $\mathbf{x}\in E$ of norm 1, and find the minimum of $(A\mathbf{x},\mathbf{x})$ over all such $\mathbf{x}$. Thus for each subspace we obtain a number, and we need to pick a subspace $E$ such that this number is maximal. That is the $\max\min$.

The $\min\max$ is defined similarly.

Remark.

A sophisticated reader may notice a problem here: why do the maxima and minima exist? It is well known that maxima and minima have a nasty habit of not existing: for example, the function $f(x)=x$ has neither a maximum nor a minimum on the open interval $(0,1)$.

However, in this case the maximum and minimum do exist. There are two possible explanations of the fact that $(A\mathbf{x},\mathbf{x})$ attains its maximum and minimum. The first one requires some familiarity with basic notions of analysis: one should just say that the unit sphere in $E$, i.e. the set $\{\mathbf{x}\in E:\|\mathbf{x}\|=1\}$, is compact, and that a continuous function ($Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$ in our case) on a compact set attains its maximum and minimum.

Another explanation is to notice that the function $Q[\mathbf{x}]=(A\mathbf{x},\mathbf{x})$, $\mathbf{x}\in E$, is a quadratic form on $E$. It is not difficult to compute the matrix of this form in some orthonormal basis in $E$, but let us only note that this matrix is not $A$: it has to be a $k\times k$ matrix, where $k=\dim E$.

It is easy to see that for a quadratic form the maximum and minimum over the unit sphere are the maximal and minimal eigenvalues of its matrix.

As for optimizing over all subspaces, we will prove below that the maximum and minimum do exist.

Proof of Theorem 7.4.3.

First of all, by picking an appropriate orthonormal basis, we can assume without loss of generality that the matrix $A$ is diagonal, $A=\operatorname{diag}\{\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\}$.

Pick subspaces $E$ and $F$, $\dim E=k$, $\operatorname{codim}F=k-1$, i.e. $\dim F=n-k+1$. Since $\dim E+\dim F>n$, there exists a non-zero vector $\mathbf{x}_{0}\in E\cap F$. By normalizing it we can assume without loss of generality that $\|\mathbf{x}_{0}\|=1$. We can always arrange the eigenvalues in decreasing order, so let us assume that $\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{n}$.

Since $\mathbf{x}_{0}$ belongs to both subspaces $E$ and $F$,

\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\leq(A\mathbf{x}_{0},\mathbf{x}_{0})\leq\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x}).

We did not assume anything about the subspaces $E$ and $F$ except their dimensions, so the above inequality

(7.4.2)\qquad \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\leq\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})

holds for all pairs of subspaces $E$ and $F$ of appropriate dimensions.

Define

E_{0}:=\operatorname{span}\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{k}\},\qquad F_{0}:=\operatorname{span}\{\mathbf{e}_{k},\mathbf{e}_{k+1},\mathbf{e}_{k+2},\ldots,\mathbf{e}_{n}\}.

Since for a self-adjoint matrix $B$ the maximum and minimum of $(B\mathbf{x},\mathbf{x})$ over the unit sphere $\{\mathbf{x}:\|\mathbf{x}\|=1\}$ are the maximal and the minimal eigenvalue respectively (easy to check on diagonal matrices), we get that

\min_{\substack{\mathbf{x}\in E_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\max_{\substack{\mathbf{x}\in F_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k}.

It follows from (7.4.2) that for any subspace $E$, $\dim E=k$,

\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\leq\max_{\substack{\mathbf{x}\in F_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k}

and similarly, for any subspace $F$ of codimension $k-1$,

\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\geq\min_{\substack{\mathbf{x}\in E_{0}\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k}.

But by the equality above, the minimum over $E_{0}$ and the maximum over $F_{0}$ both equal $\lambda_{k}$, so $\min\max=\max\min=\lambda_{k}$. ∎

Corollary 7.4.4 (Intertwining of eigenvalues).

Let $A=A^{*}=\{a_{j,k}\}_{j,k=1}^{n}$ be a self-adjoint matrix, and let $\widetilde{A}=\{a_{j,k}\}_{j,k=1}^{n-1}$ be its submatrix of size $(n-1)\times(n-1)$. Let $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ and $\mu_{1},\mu_{2},\ldots,\mu_{n-1}$ be the eigenvalues of $A$ and $\widetilde{A}$ respectively, taken in decreasing order. Then

\lambda_{1}\geq\mu_{1}\geq\lambda_{2}\geq\mu_{2}\geq\ldots\geq\lambda_{n-1}\geq\mu_{n-1}\geq\lambda_{n},

i.e.

\lambda_{k}\geq\mu_{k}\geq\lambda_{k+1},\qquad k=1,2,\ldots,n-1.
Proof.

Let $\widetilde{X}\subset\mathbb{F}^{n}$ be the subspace spanned by the first $n-1$ basis vectors, $\widetilde{X}=\operatorname{span}\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n-1}\}$. Since $(\widetilde{A}\mathbf{x},\mathbf{x})=(A\mathbf{x},\mathbf{x})$ for all $\mathbf{x}\in\widetilde{X}$, Theorem 7.4.3 implies that

\mu_{k}=\max_{\substack{E\subset\widetilde{X}\\ \dim E=k}}\ \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x}).

To get $\lambda_{k}$ we need to take the maximum over the set of all subspaces $E$ of $\mathbb{F}^{n}$, $\dim E=k$, i.e. take the maximum over a bigger set (any subspace of $\widetilde{X}$ is a subspace of $\mathbb{F}^{n}$). Therefore

\mu_{k}\leq\lambda_{k}

(the maximum can only increase if we increase the set).

On the other hand, any subspace $E\subset\widetilde{X}$ of codimension $k-1$ (here we mean the codimension in $\widetilde{X}$) has dimension $n-1-(k-1)=n-k$, so its codimension in $\mathbb{F}^{n}$ is $k$. Therefore

\mu_{k}=\min_{\substack{E\subset\widetilde{X}\\ \dim E=n-k}}\ \max_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})\geq\min_{\substack{E\subset\mathbb{F}^{n}\\ \dim E=n-k}}\ \max_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}}(A\mathbf{x},\mathbf{x})=\lambda_{k+1}

(minimum over a bigger set can only be smaller). ∎
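
The intertwining inequalities are easy to test numerically; the following sketch (assuming Python with NumPy) checks them for a random real symmetric matrix and its $(n-1)\times(n-1)$ upper left corner:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    A = rng.standard_normal((n, n))
    A = (A + A.T) / 2                      # random real symmetric (self-adjoint) matrix

    lam = np.sort(np.linalg.eigvalsh(A))[::-1]           # eigenvalues of A, decreasing
    mu = np.sort(np.linalg.eigvalsh(A[:-1, :-1]))[::-1]  # eigenvalues of the (n-1) x (n-1) corner

    # lambda_k >= mu_k >= lambda_{k+1} for k = 1, ..., n-1 (up to rounding)
    assert np.all(lam[:-1] >= mu - 1e-12) and np.all(mu >= lam[1:] - 1e-12)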

Proof of Theorem 7.4.2.

If $A>0$, then $A_{k}>0$ for $k=1,2,\ldots,n$ as well (can you explain why?). Since all eigenvalues of a positive definite matrix are positive (see Theorem 7.4.1), $\det A_{k}>0$ for all $k=1,2,\ldots,n$.

Let us now prove the other implication. Let $\det A_{k}>0$ for all $k$. We will show, using induction on $k$, that all the $A_{k}$ (and so $A=A_{n}$) are positive definite.

Clearly $A_{1}$ is positive definite (it is a $1\times 1$ matrix, so $A_{1}=\det A_{1}$). Assuming that $A_{k-1}>0$ (and $\det A_{k}>0$), let us show that $A_{k}$ is positive definite. Let $\lambda_{1},\lambda_{2},\ldots,\lambda_{k}$ and $\mu_{1},\mu_{2},\ldots,\mu_{k-1}$ be the eigenvalues of $A_{k}$ and $A_{k-1}$ respectively, taken in decreasing order. By Corollary 7.4.4

\lambda_{j}\geq\mu_{j}>0\qquad\text{for }j=1,2,\ldots,k-1.

Since $\det A_{k}=\lambda_{1}\lambda_{2}\ldots\lambda_{k-1}\lambda_{k}>0$, the last eigenvalue $\lambda_{k}$ must also be positive. Therefore, since all its eigenvalues are positive, the matrix $A_{k}$ is positive definite. ∎

7.4.3. Some remarks

First of all notice that Sylvester’s Criterion of Positivity does not generalize to positive semidefinite matrices if $n\geq 3$, meaning that for $n\times n$ matrices, $n\geq 3$, the conditions $\det A_{k}\geq 0$ do not imply that $A$ is positive semidefinite, see Problem 7.4.6 below.

For $2\times 2$ matrices, however, the conditions $a_{1,1}>0$ and $\det A\geq 0$ do imply that $A$ is positive semidefinite, see Problem 7.4.3 below. This sometimes leads to the wrong conclusion about $n\times n$ matrices.

Finally, we should say a couple of words about negative definite matrices. It is a typical student mistake to say that the condition $\det A_{k}<0$ implies that $A$ is negative definite. But that is wrong!

To check whether the matrix $A$ is negative definite, one just has to check that the matrix $-A$ is positive definite. Applying Sylvester’s Criterion of Positivity to $-A$, one can see that $A$ is negative definite if and only if $(-1)^{k}\det A_{k}>0$ for all $k=1,2,\ldots,n$.
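
A small sketch of this alternating-sign test (assuming Python with NumPy; the helper name is purely illustrative), checked against the eigenvalue criterion on a simple negative definite matrix:

    import numpy as np

    def sylvester_negative_definite(A):
        """A = A* is negative definite iff (-1)^k det A_k > 0 for all k."""
        n = A.shape[0]
        return all((-1) ** k * np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

    A = np.array([[-2.0,  1.0],
                  [ 1.0, -2.0]])          # eigenvalues -1 and -3, so negative definite

    assert sylvester_negative_definite(A)
    assert np.all(np.linalg.eigvalsh(A) < 0)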

Exercises.

7.4.1.

Using Sylvester’s Criterion of Positivity check if the matrices

A=\left(\begin{array}{rrr}4&2&1\\ 2&3&-1\\ 1&-1&2\end{array}\right),\qquad B=\left(\begin{array}{rrr}3&-1&2\\ -1&4&-2\\ 2&-2&2\end{array}\right)

are positive definite or not.

Are the matrices $-A$, $A^{3}$ and $A^{-1}$, $A+B^{-1}$, $A+B$, $A-B$ positive definite?

7.4.2.

True or false:

  a)

    If $A$ is positive definite, then $A^{5}$ is positive definite.

  b)

    If $A$ is negative definite, then $A^{8}$ is negative definite.

  c)

    If $A$ is negative definite, then $A^{12}$ is positive definite.

  d)

    If $A$ is positive definite and $B$ is negative semidefinite, then $A-B$ is positive definite.

  e)

    If $A$ is indefinite, and $B$ is positive definite, then $A+B$ is indefinite.

7.4.3.

Let $A$ be a $2\times 2$ Hermitian matrix such that $a_{1,1}>0$, $\det A\geq 0$. Prove that $A$ is positive semidefinite.

7.4.4.

Find a real symmetric $n\times n$ matrix $A$ such that $\det A_{k}\geq 0$ for all $k=1,2,\ldots,n$, but the matrix $A$ is not positive semidefinite. Try to find an example for the minimal possible $n$.

7.4.5.

Let $A$ be an $n\times n$ Hermitian matrix such that $\det A_{k}>0$ for all $k=1,2,\ldots,n-1$ and $\det A\geq 0$. Prove that $A$ is positive semidefinite.

7.4.6.

Find a real symmetric $3\times 3$ matrix $A$ such that $a_{1,1}>0$, $\det A_{k}\geq 0$ for $k=2,3$, but the matrix $A$ is not positive semidefinite.

7.5. Positive definite forms and inner products

Let $V$ be an inner product space and let $\mathcal{B}=\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ be a basis (not necessarily orthogonal) in $V$. Let $G=\{g_{j,k}\}_{j,k=1}^{n}$ be the matrix defined by

g_{j,k}=(\mathbf{v}_{k},\mathbf{v}_{j}).

If $\mathbf{x}=\sum_{k}x_{k}\mathbf{v}_{k}$ and $\mathbf{y}=\sum_{k}y_{k}\mathbf{v}_{k}$, then

(\mathbf{x},\mathbf{y})=\left(\sum_{k}x_{k}\mathbf{v}_{k},\sum_{j}y_{j}\mathbf{v}_{j}\right)=\sum_{k,j=1}^{n}x_{k}\overline{y}_{j}(\mathbf{v}_{k},\mathbf{v}_{j})=\sum_{j=1}^{n}\sum_{k=1}^{n}g_{j,k}x_{k}\overline{y}_{j}=\left(G[\mathbf{x}]_{\mathcal{B}},[\mathbf{y}]_{\mathcal{B}}\right)_{\mathbb{C}^{n}},

where $(\,\cdot\,,\,\cdot\,)_{\mathbb{C}^{n}}$ stands for the standard inner product in $\mathbb{C}^{n}$. One can immediately see that $G$ is a positive definite matrix (why?).

So, when one works with coordinates in an arbitrary (not necessarily orthogonal) basis in an inner product space, the inner product (in terms of coordinates) is not computed as the standard inner product in $\mathbb{C}^{n}$, but with the help of a positive definite matrix $G$ as described above.

Note that this $G$-inner product coincides with the standard inner product in $\mathbb{C}^{n}$ if and only if $G=I$, which happens if and only if the basis $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ is orthonormal.

Conversely, given a positive definite matrix $G$ one can define a non-standard inner product (the $G$-inner product) in $\mathbb{C}^{n}$ by

(\mathbf{x},\mathbf{y})_{G}:=(G\mathbf{x},\mathbf{y})_{\mathbb{C}^{n}},\qquad\mathbf{x},\mathbf{y}\in\mathbb{C}^{n}.

One can easily check that $(\mathbf{x},\mathbf{y})_{G}$ is indeed an inner product, i.e. that properties 1–4 from Section 5.1.3 of Chapter 5 are satisfied.
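
As an illustration (a sketch assuming Python with NumPy, with a randomly chosen basis), one can form the matrix $G$ for a non-orthogonal basis of $\mathbb{R}^{n}$ and check both that $G$ is positive definite and that the inner product is computed from coordinates via $G$:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    V = rng.standard_normal((n, n))   # columns v_1, ..., v_n: a (generically) non-orthogonal basis of R^n

    G = V.T @ V                       # g_{j,k} = (v_k, v_j); for a real basis G = V^T V
    assert np.all(np.linalg.eigvalsh(G) > 0)   # G is positive definite

    # Coordinates of x and y in the basis B = (v_1, ..., v_n)
    xB = rng.standard_normal(n)
    yB = rng.standard_normal(n)
    x = V @ xB
    y = V @ yB

    # (x, y) computed directly equals (G [x]_B, [y]_B)
    assert np.isclose(x @ y, (G @ xB) @ yB)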