While the study of real quadratic forms (i.e. real homogeneous polynomials of degree 2) was probably the initial motivation for the subject of this chapter, complex quadratic forms $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$, $\mathbf{x} \in \mathbb{C}^n$, are also of significant interest. So, unless otherwise specified, the results and calculations below hold in both the real and the complex case.
To avoid writing essentially the same formulas twice, we use the notation adapted to the complex case: in particular, even in the real case the notation $A^*$ is used instead of $A^T$.
A bilinear form on $\mathbb{R}^n$ is a function $L = L(\mathbf{x}, \mathbf{y})$ of two arguments $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ which is linear in each argument, i.e. such that
$$L(\alpha\mathbf{x}_1 + \beta\mathbf{x}_2, \mathbf{y}) = \alpha L(\mathbf{x}_1, \mathbf{y}) + \beta L(\mathbf{x}_2, \mathbf{y}), \qquad L(\mathbf{x}, \alpha\mathbf{y}_1 + \beta\mathbf{y}_2) = \alpha L(\mathbf{x}, \mathbf{y}_1) + \beta L(\mathbf{x}, \mathbf{y}_2).$$
One can consider bilinear forms whose values belong to an arbitrary vector space, but in this book we only consider forms that take real values.
If $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)^T$, a bilinear form can be written as
$$L(\mathbf{x}, \mathbf{y}) = \sum_{j,k=1}^{n} a_{j,k}\, x_k y_j,$$
or in matrix form
$$L(\mathbf{x}, \mathbf{y}) = (A\mathbf{x}, \mathbf{y}),$$
where $A = \{a_{j,k}\}_{j,k=1}^{n}$.
The matrix $A$ is determined uniquely by the bilinear form $L$.
There are several equivalent definitions of a quadratic form.
One can say that a quadratic form $Q$ on $\mathbb{R}^n$ is the “diagonal” of a bilinear form $L$, i.e. that any quadratic form $Q$ is defined by $Q[\mathbf{x}] = L(\mathbf{x}, \mathbf{x})$.
Another, more algebraic way is to say that a quadratic form is a homogeneous polynomial of degree 2, i.e. that $Q[\mathbf{x}]$ is a polynomial of the $n$ variables $x_1, x_2, \ldots, x_n$ having only terms of degree 2. That means that only terms of the form $a x_k^2$ and $c x_j x_k$ are allowed.
There are many ways (in fact, infinitely many) to write a given quadratic form $Q$ as $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$. Indeed, the coefficient of a cross term $x_j x_k$ ($j \ne k$) in $(A\mathbf{x}, \mathbf{x})$ is $a_{j,k} + a_{k,j}$, so this coefficient can be split between the entries $a_{j,k}$ and $a_{k,j}$ arbitrarily; any matrix obtained from such a splitting will work.
But if we require the matrix $A$ to be symmetric, then such a matrix is unique:
Any quadratic form $Q$ on $\mathbb{R}^n$ admits a unique representation $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$ with a (real) symmetric matrix $A$.
For example, to find the symmetric matrix of a given quadratic form, one puts the coefficient of $x_k^2$ in the diagonal entry $a_{k,k}$, and half of the coefficient of the cross term $x_j x_k$ in each of the entries $a_{j,k}$ and $a_{k,j}$.
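Here is a small numerical sketch in Python (the matrix and vector below are made up for illustration, not taken from the text): any square matrix and its symmetric part $(A + A^T)/2$ define the same quadratic form, and the symmetric part is exactly the matrix produced by the recipe above.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [1.0, 2.0]])             # an arbitrary (non-symmetric) matrix
A_sym = (A + A.T) / 2                   # its symmetric part

x = np.array([0.7, -1.3])
print(np.isclose(x @ A @ x, x @ A_sym @ x))   # True: both matrices represent the same form
```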
One can also define a quadratic form on $\mathbb{C}^n$ (or any complex inner product space) by taking a self-adjoint transformation $A = A^*$ and defining $Q$ by $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$. While our main examples will be in $\mathbb{R}^n$, all the theorems are true in the setting of $\mathbb{C}^n$ as well. Bearing this in mind, we will always use $A^*$ instead of $A^T$.
The only essential difference from the real case is that in the complex case we do not have any freedom of choice: if the quadratic form $(A\mathbf{x}, \mathbf{x})$ is real-valued, the corresponding matrix $A$ has to be Hermitian (self-adjoint).
Note that if $A = A^*$ then
$$(A\mathbf{x}, \mathbf{x}) = (\mathbf{x}, A^*\mathbf{x}) = (\mathbf{x}, A\mathbf{x}) = \overline{(A\mathbf{x}, \mathbf{x})},$$
so $(A\mathbf{x}, \mathbf{x}) \in \mathbb{R}$.
The converse is also true.
Lemma 7.1.1. Let $(A\mathbf{x}, \mathbf{x})$ be real for all $\mathbf{x} \in \mathbb{C}^n$. Then $A = A^*$.
We leave the proof as an exercise for the reader, see Problem 7.1.4 below.
One of the possible ways to prove Lemma 7.1.1 is to use the following version of polarization identities.
Let $A$ be an operator in an inner product space $V$.
If $V$ is a complex space, then for any $\mathbf{x}, \mathbf{y} \in V$
$$(A\mathbf{x}, \mathbf{y}) = \frac{1}{4}\sum_{\alpha \in \{1,\, i,\, -1,\, -i\}} \alpha\, \bigl(A(\mathbf{x} + \alpha\mathbf{y}),\, \mathbf{x} + \alpha\mathbf{y}\bigr).$$
If $V$ is a real space and $A = A^T$, then for any $\mathbf{x}, \mathbf{y} \in V$
$$(A\mathbf{x}, \mathbf{y}) = \frac{1}{4}\Bigl[\bigl(A(\mathbf{x} + \mathbf{y}),\, \mathbf{x} + \mathbf{y}\bigr) - \bigl(A(\mathbf{x} - \mathbf{y}),\, \mathbf{x} - \mathbf{y}\bigr)\Bigr].$$
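As a quick sanity check, one can verify the complex polarization identity numerically; the following sketch uses a randomly generated operator and vectors (none of them from the text), with the inner product taken linear in the first argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # any operator on C^n
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

inner = lambda u, v: np.vdot(v, u)      # (u, v): linear in u, conjugate-linear in v

lhs = inner(A @ x, y)
rhs = sum(a * inner(A @ (x + a * y), x + a * y) for a in [1, 1j, -1, -1j]) / 4
print(np.isclose(lhs, rhs))             # True
```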
Find the matrix of the bilinear form on ,
Define the bilinear form $L$ on $\mathbb{R}^2$ by $L(\mathbf{x}, \mathbf{y}) = \det[\mathbf{x}, \mathbf{y}]$,
i.e. to compute $L(\mathbf{x}, \mathbf{y})$ we form a $2\times 2$ matrix with columns $\mathbf{x}$ and $\mathbf{y}$ and compute its determinant.
Find the matrix of $L$.
Find the matrix of the quadratic form on
You have probably met quadratic forms before, when you studied second order curves in the plane. Maybe you even studied second order surfaces in $\mathbb{R}^3$.
We want to present a unified approach to the classification of such objects. Suppose we are given a set in $\mathbb{R}^n$ defined by the equation $Q[\mathbf{x}] = 1$, where $Q$ is some quadratic form. If $Q$ has some simple form, for example if the corresponding matrix is diagonal, i.e. if $Q[\mathbf{x}] = a_1 x_1^2 + a_2 x_2^2 + \cdots + a_n x_n^2$, we can easily visualize this set, especially if $n = 2$ or $n = 3$. In higher dimensions it is also possible, if not to visualize, then to understand the structure of the set very well.
So, if we are given a general, complicated quadratic form, we want to simplify it as much as possible, for example to make it diagonal. The standard way of doing that is the change of variables.
Suppose we have a quadratic form $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$ in $\mathbb{F}^n$ ($\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$). Introduce new variables $\mathbf{y} = (y_1, y_2, \ldots, y_n)^T$ with $\mathbf{x} = S\mathbf{y}$, where $S$ is some invertible matrix, so $\mathbf{y} = S^{-1}\mathbf{x}$.
Then
$$Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x}) = (AS\mathbf{y}, S\mathbf{y}) = (S^*AS\mathbf{y}, \mathbf{y}),$$
so in the new variables $\mathbf{y}$ the quadratic form has matrix $S^*AS$.
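Here is a small numerical illustration of the change of variables (with a made-up symmetric matrix $A$ and invertible matrix $S$): the value of the form does not change, and in the new variables its matrix is $S^*AS$ (which is $S^TAS$ in the real case).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])              # matrix of the quadratic form (made-up example)
S = np.array([[1.0, 2.0],
              [0.0, 1.0]])              # an invertible change-of-variables matrix, x = S y

y = np.array([0.4, -1.1])
x = S @ y

Q_old = x @ A @ x                        # (Ax, x)
Q_new = y @ (S.T @ A @ S) @ y            # the form with matrix S^T A S, evaluated at y
print(np.isclose(Q_old, Q_new))          # True
```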
So, we want to find an invertible matrix $S$ such that the matrix $S^*AS$ is diagonal. Note that this is different from the diagonalization of matrices we had before: there we tried to represent a matrix $A$ as $A = SDS^{-1}$, so that the matrix $S^{-1}AS$ was diagonal. However, for unitary matrices $U$ we have $U^{-1} = U^*$, and we know that Hermitian (in the real case, symmetric) matrices can be diagonalized by a unitary (orthogonal) matrix. So we can apply the orthogonal diagonalization we studied before to quadratic forms.
Namely, we can represent the matrix $A$ as $A = UDU^* = UDU^{-1}$. Recall that $D$ is a diagonal matrix with the eigenvalues of $A$ on the diagonal, and $U$ is the matrix of eigenvectors (we need to pick an orthonormal basis of eigenvectors). Then $D = U^*AU$, so in the variables $\mathbf{y} = U^*\mathbf{x}$ the quadratic form has a diagonal matrix.
Let us analyze the geometric meaning of the orthogonal diagonalization. The columns of the unitary matrix $U$ form an orthonormal basis in $\mathbb{F}^n$; let us call this basis $\mathcal{B}$. The change of coordinates matrix from this basis to the standard one is exactly $U$. We know that $\mathbf{x} = U\mathbf{y}$, so the coordinates $y_1, y_2, \ldots, y_n$ can be interpreted as the coordinates of the vector $\mathbf{x}$ in the new basis $\mathcal{B}$.
So, orthogonal diagonalization allows us to visualize very well the set $\{\mathbf{x} : Q[\mathbf{x}] = 1\}$, or a similar one, as long as we can visualize it for diagonal matrices.
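Numerically, the orthogonal diagonalization described above can be carried out with any symmetric eigensolver; the following sketch (with a made-up matrix) uses numpy.linalg.eigh.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric matrix of a quadratic form (made-up example)

eigvals, U = np.linalg.eigh(A)            # columns of U: orthonormal eigenvectors of A
D = U.T @ A @ U                           # the form in the variables y = U^T x
print(np.round(D, 10))                    # diagonal, with the eigenvalues on the diagonal

x = np.array([1.0, -2.0])
y = U.T @ x
print(np.isclose(x @ A @ x, y @ D @ y))   # True: same value of the form in both coordinate systems
```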
Consider a quadratic form $Q$ of two variables (i.e. a quadratic form on $\mathbb{R}^2$). Let us describe the set of points $\mathbf{x} \in \mathbb{R}^2$ satisfying $Q[\mathbf{x}] = 1$.
The matrix of is
Orthogonally diagonalizing this matrix we can represent it as
or, equivalently
In the new variables the set $Q[\mathbf{x}] = 1$ is an ellipse whose half-axes are determined by the eigenvalues (namely $1/\sqrt{\lambda_1}$ and $1/\sqrt{\lambda_2}$ when both eigenvalues are positive). Therefore the original set $Q[\mathbf{x}] = 1$ is the same ellipse, only in the basis of eigenvectors, or, equivalently, the same ellipse rotated by the corresponding angle.
Orthogonal diagonalization involves computing eigenvalues and eigenvectors, so it may be difficult to do without computers for large $n$. On the other hand, non-orthogonal diagonalization, i.e. finding an invertible $S$ (without requiring $S$ to be unitary) such that $S^*AS$ is diagonal, is much easier computationally and can be done using only algebraic operations (addition, subtraction, multiplication, division).
Below we present the two most used methods of non-orthogonal diagonalization.
The first method is based on completion of squares. We will illustrate this method on real quadratic forms (forms on $\mathbb{R}^n$). After simple modifications this method can be used in the complex case as well, but we will not discuss that here; if necessary, an interested reader should be able to make the appropriate modifications.
Let us again consider the quadratic form of two variables, (it is the same quadratic form as in the above example, only here we call variables not but ). Since
(note, that the first two terms coincide with the first two terms of ), we get
where and .
The same method can be applied to quadratic forms of more than 2 variables. Let us consider, for example, a form in $\mathbb{R}^3$.
Considering all the terms involving the first variable $x_1$ (the first 3 terms in this case), let us pick a full square, or a multiple of a full square, whose terms involving $x_1$ are exactly these terms (it may contain some additional terms not involving $x_1$).
Since
we can rewrite the quadratic form as
Note, that the expression involves only variables and . Since
we have
Thus we can write the quadratic form as
where
Finally, let us address the question that an attentive reader is probably already asking: what to do if at some point we do have a product of two variables, but no corresponding squares? For example, how does one diagonalize the form $Q[\mathbf{x}] = x_1 x_2$? The answer follows immediately from the identity
$$x_1 x_2 = \tfrac14 (x_1 + x_2)^2 - \tfrac14 (x_1 - x_2)^2, \tag{7.2.1}$$
which gives us the representation $Q[\mathbf{x}] = y_1^2 - y_2^2$ with the new variables $y_1 = (x_1 + x_2)/2$, $y_2 = (x_1 - x_2)/2$.
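Completion of squares can also be carried out with a computer algebra system; here is a small sketch on a made-up two-variable form (not the one from the examples above), showing one step of the procedure.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
Q = 2*x1**2 + 6*x1*x2 + 5*x2**2           # a made-up form, used only for illustration

# Collect all the terms containing x1 into a multiple of a full square:
#   2*x1**2 + 6*x1*x2 = 2*(x1 + (3/2)*x2)**2 - (9/2)*x2**2
y1 = x1 + sp.Rational(3, 2)*x2            # new variable
remainder = sp.expand(Q - 2*y1**2)        # what is left involves only x2
print(remainder)                           # x2**2/2, so Q = 2*y1**2 + (1/2)*x2**2
```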
There is another way of performing a non-orthogonal diagonalization of a quadratic form. The idea is to perform row operations on the matrix $A$ of the quadratic form. The difference from row reduction (Gauss–Jordan elimination) is that after each row operation we need to perform the same column operation; the reason is that we want to transform $A$ into a diagonal matrix that represents the same quadratic form in new variables.
Let us explain how everything works on an example. Suppose we want to diagonalize a quadratic form with matrix
We augment the matrix $A$ by the identity matrix, and perform row/column operations on the augmented matrix $(A\,|\,I)$. After each row operation we have to perform on the matrix the same column operation (in the case of complex Hermitian matrices we perform, for each row operation, the conjugate of the corresponding column operation; see Remark 7.2.1 below). We get
Note, that we perform column operations only on the left side of the augmented matrix
We get the diagonal matrix $D$ on the left and the matrix $S^*$ on the right, so $D = S^*AS$.
Let us explain why the method works. A row operation is a left multiplication by an elementary matrix $E_k$. The corresponding column operation is the right multiplication by its adjoint $E_k^*$ (in the real case, its transpose). Therefore, performing row operations with elementary matrices $E_1, E_2, \ldots, E_N$ together with the corresponding column operations, we transform the matrix $A$ to
$$E_N \cdots E_2 E_1\, A\, E_1^* E_2^* \cdots E_N^* = E A E^*, \qquad \text{where } E = E_N \cdots E_2 E_1. \tag{7.2.2}$$
As for the identity matrix on the right side of the augmented matrix, we performed only row operations on it, so the identity matrix is transformed to $E = E_N \cdots E_2 E_1$.
If we now denote $S := E^*$, we get that $S^*AS = EAE^*$ is a diagonal matrix, and the matrix $S^* = E$ is the right half of the transformed augmented matrix.
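The row/column procedure is easy to automate. Below is a sketch (assuming no zero pivots are encountered, so no row interchanges are needed) that performs the simultaneous row and column operations on a Hermitian matrix and returns $D$ and $S$ with $S^*AS = D$; the test matrix is made up for illustration.

```python
import numpy as np

def congruence_diagonalize(A):
    """Diagonalize a quadratic form by simultaneous row/column operations.

    Returns (D, S) with S invertible and S* A S = D diagonal, assuming that
    no zero pivot is encountered (i.e. no row interchanges are needed).
    """
    D = np.array(A, dtype=complex)
    n = D.shape[0]
    E = np.eye(n, dtype=complex)              # accumulates the row operations
    for k in range(n):
        if D[k, k] == 0:
            raise ValueError("zero pivot: a preliminary row operation is needed")
        for j in range(k + 1, n):
            c = D[j, k] / D[k, k]
            D[j, :] -= c * D[k, :]            # row operation (left mult. by an elementary matrix)
            D[:, j] -= np.conj(c) * D[:, k]   # the conjugate column operation (right mult. by its adjoint)
            E[j, :] -= c * E[k, :]            # the same row operation applied to the identity block
    S = E.conj().T                            # S = E^*, so that S^* A S = E A E^* = D
    return D, S

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 0.0, 3.0]])               # a made-up symmetric matrix
D, S = congruence_diagonalize(A)
print(np.round(D.real, 6))                     # diag(1, -3, 10/3)
print(bool(np.allclose(S.conj().T @ A @ S, D)))  # True
```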
In the above example we got lucky, because we did not need to interchange two rows. This operation is a bit trickier to perform. It is quite simple if you know what to do, but it may be hard to guess the correct row operations. Let us consider, for example, a quadratic form with the matrix
If we want to diagonalize it by row and column operations, the simplest idea would be to interchange rows 1 and 2. But we also must perform the same column operation, i.e. interchange columns 1 and 2, so we will end up with the same matrix.
So, we need something more non-trivial. The identity (7.2.1), for example, can be used to diagonalize this quadratic form. However, a simpler idea also works: use row operations to get a non-zero entry on the diagonal! For example, if we start by making the entry in the upper left corner non-zero, the following series of row (and corresponding column) operations is one of the possible choices:
Non-orthogonal diagonalization gives us a simple description of a set of the form $Q[\mathbf{x}] = 1$, but in a non-orthogonal basis, so it is harder to visualize than the representation given by the orthogonal diagonalization. However, if we are not interested in the details, for example if it is sufficient for us just to know that the set is an ellipsoid (or a hyperboloid, etc.), then the non-orthogonal diagonalization is an easier way to get the answer.
For quadratic forms with complex entries (i.e. for forms $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$, $\mathbf{x} \in \mathbb{C}^n$), the non-orthogonal diagonalization works the same way as in the real case, with the only difference that the corresponding “column operations” have complex conjugate coefficients.
The reason for that is that if a row operation is given by left multiplication by an elementary matrix $E_k$, then the corresponding column operation is given by the right multiplication by $E_k^*$; see (7.2.2).
Note that formula (7.2.2) works in both the complex and the real case: in the real case we could write $E_k^T$ instead of $E_k^*$, but using the Hermitian adjoint allows us to have the same formula in both cases.
Diagonalize the quadratic form with the matrix
Use two methods: completion of squares and row operations. Which one do you like better?
Can you say if the matrix is positive definite or not?
For the matrix
orthogonally diagonalize the corresponding quadratic form, i.e. find a diagonal matrix $D$ and a unitary matrix $U$ such that $D = U^*AU$.
As we discussed above, there are many ways to diagonalize a quadratic form. Note that the resulting diagonal matrix is not unique. For example, if we got a diagonal matrix
$$D = S^*AS,$$
we can take any invertible diagonal matrix $D_1$
and transform $D$ to
$$D_1^* D D_1 = (SD_1)^*\, A\, (SD_1),$$
which is again diagonal.
This transformation changes the diagonal entries of $D$. However, it does not change the signs of the diagonal entries. And this is always the case!
Namely, the famous Sylvester’s Law of Inertia states that:
For a Hermitian matrix $A$ (i.e. for a quadratic form $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$) and any of its diagonalizations $D = S^*AS$, the number of positive (negative, zero) diagonal entries of $D$ depends only on $A$, but not on the particular choice of the diagonalization.
Here we of course assume that $S$ is an invertible matrix and $D$ is a diagonal one.
The idea of the proof of Sylvester's Law of Inertia is to express the number of positive (negative, zero) diagonal entries of a diagonalization $D = S^*AS$ in terms of the matrix $A$ alone, not involving $S$ or $D$.
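A brief numerical illustration of the Law of Inertia (with a random symmetric matrix): the sign counts of the diagonal entries are the same whether we diagonalize orthogonally (eigenvalues) or by some other congruence $D = S^TAS$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                            # a random real symmetric matrix

# Signs of the eigenvalues (orthogonal diagonalization).
eigs = np.linalg.eigvalsh(A)
inertia_eig = (int(np.sum(eigs > 0)), int(np.sum(eigs < 0)))

# Signs of the diagonal of another diagonalization D = S^T A S:
# take the orthogonal diagonalizer and rescale its columns.
U = np.linalg.eigh(A)[1]
S = U @ np.diag(rng.uniform(0.5, 2.0, 4))    # still diagonalizes the form, but not orthogonal
d = np.diag(S.T @ A @ S)
inertia_cong = (int(np.sum(d > 0)), int(np.sum(d < 0)))

print(inertia_eig, inertia_cong)             # the two sign counts agree
```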
We will need the following definition.
Given an $n\times n$ Hermitian matrix $A$ (a quadratic form $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$ on $\mathbb{C}^n$), we call a subspace $E \subset \mathbb{C}^n$ positive (resp. negative, resp. neutral) if
$$(A\mathbf{x}, \mathbf{x}) > 0 \quad (\text{resp. } (A\mathbf{x}, \mathbf{x}) < 0, \ \text{resp. } (A\mathbf{x}, \mathbf{x}) = 0)$$
for all $\mathbf{x} \in E$, $\mathbf{x} \ne 0$.
Sometimes, to emphasize the role of $A$, we will say $A$-positive ($A$-negative, $A$-neutral).
Let $A$ be an $n\times n$ Hermitian matrix, and let $D = S^*AS$ be its diagonalization by an invertible matrix $S$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of an $A$-positive (resp. $A$-negative) subspace.
The above theorem says that if $r_+$ is the number of positive diagonal entries of $D$, then there exists an $A$-positive subspace $E$ with $\dim E = r_+$, but it is impossible to find an $A$-positive subspace $E$ with $\dim E > r_+$.
We will need the following lemma, which can be considered a particular case of the above theorem.
Let $D = \operatorname{diag}\{d_1, d_2, \ldots, d_n\}$ be a diagonal matrix. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of a $D$-positive (resp. $D$-negative) subspace.
By rearranging the standard basis in $\mathbb{C}^n$ (changing the numeration) we can always assume without loss of generality that the positive diagonal entries of $D$ are the first $r_+$ entries $d_1, d_2, \ldots, d_{r_+}$.
Consider the subspace $E_+$ spanned by the first $r_+$ coordinate vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{r_+}$. Clearly $E_+$ is a $D$-positive subspace, and $\dim E_+ = r_+$.
Let us now show that for any other $D$-positive subspace $E$ we have $\dim E \le r_+$. Consider the orthogonal projection $P = P_{E_+}$ onto $E_+$,
$$P\mathbf{x} = (x_1, x_2, \ldots, x_{r_+}, 0, \ldots, 0)^T \qquad \text{for } \mathbf{x} = (x_1, x_2, \ldots, x_n)^T.$$
For a $D$-positive subspace $E$ define an operator $T : E \to E_+$ by
$$T\mathbf{x} = P\mathbf{x}, \qquad \mathbf{x} \in E.$$
In other words, $T$ is the restriction of the projection $P$: the operator $P$ is defined on the whole space, but we restrict its domain to $E$ and its target space to $E_+$. We thus get an operator acting from $E$ to $E_+$, and we use a different letter to distinguish it from $P$.
Note that $\operatorname{Ker} T = \{0\}$. Indeed, let $\mathbf{x} \in E$ be such that $T\mathbf{x} = P\mathbf{x} = 0$. Then, by the definition of $P$, we have $x_1 = x_2 = \cdots = x_{r_+} = 0$,
and therefore
$$(D\mathbf{x}, \mathbf{x}) = \sum_{k = r_+ + 1}^{n} d_k |x_k|^2 \le 0$$
(recall that $d_k \le 0$ for $k > r_+$). But $\mathbf{x}$ belongs to a $D$-positive subspace $E$, so this inequality is only possible for $\mathbf{x} = 0$.
Let us now apply the Rank Theorem (Theorem 2.7.1 from Chapter 2). First of all, $\operatorname{Rank} T \le \dim E_+ = r_+$ because $\operatorname{Ran} T \subset E_+$. By the Rank Theorem, $\operatorname{Rank} T + \dim\operatorname{Ker} T = \dim E$. But we just proved that $\operatorname{Ker} T = \{0\}$, i.e. that $\dim\operatorname{Ker} T = 0$, so
$$\dim E = \operatorname{Rank} T \le r_+.$$
To prove the statement about the negative diagonal entries, we just apply the above reasoning to the matrix $-D$. ∎
Let $D = S^*AS$ be a diagonalization of $A$. Since
$$(D\mathbf{x}, \mathbf{x}) = (S^*AS\mathbf{x}, \mathbf{x}) = (AS\mathbf{x}, S\mathbf{x}),$$
it follows that for any $D$-positive subspace $E$, the subspace $SE$ is an $A$-positive subspace. The same identity implies that for any $A$-positive subspace $F$, the subspace $S^{-1}F$ is $D$-positive.
Since $S$ and $S^{-1}$ are invertible transformations, $\dim SE = \dim E$ and $\dim S^{-1}F = \dim F$. Therefore, for any $D$-positive subspace $E$ we can find an $A$-positive subspace (namely $SE$) of the same dimension, and vice versa: for any $A$-positive subspace $F$ we can find a $D$-positive subspace (namely $S^{-1}F$) of the same dimension. Therefore the maximal possible dimensions of a $D$-positive subspace and of an $A$-positive subspace coincide, and the theorem is proved.
The case of negative diagonal entries is treated similarly; we leave the details as an exercise for the reader. ∎
A quadratic form $Q$ is called
Positive definite if $Q[\mathbf{x}] > 0$ for all $\mathbf{x} \ne 0$.
Positive semidefinite if $Q[\mathbf{x}] \ge 0$ for all $\mathbf{x}$.
Negative definite if $Q[\mathbf{x}] < 0$ for all $\mathbf{x} \ne 0$.
Negative semidefinite if $Q[\mathbf{x}] \le 0$ for all $\mathbf{x}$.
Indefinite if it takes both positive and negative values, i.e. if there exist vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ such that $Q[\mathbf{x}_1] > 0$ and $Q[\mathbf{x}_2] < 0$.
A Hermitian matrix $A$ is called positive definite (negative definite, etc.) if the corresponding quadratic form $Q[\mathbf{x}] = (A\mathbf{x}, \mathbf{x})$ is positive definite (negative definite, etc.).
Theorem 7.4.1. Let $A = A^*$. Then
$A$ is positive definite iff all eigenvalues of $A$ are positive.
$A$ is positive semidefinite iff all eigenvalues of $A$ are non-negative.
$A$ is negative definite iff all eigenvalues of $A$ are negative.
$A$ is negative semidefinite iff all eigenvalues of $A$ are non-positive.
$A$ is indefinite iff it has both positive and negative eigenvalues.
The proof follows trivially from the orthogonal diagonalization. Indeed, there is an orthonormal basis in which the matrix of the form is diagonal, with the eigenvalues of $A$ on the diagonal, and for diagonal matrices the theorem is trivial. ∎
Note that to find out whether a matrix (a quadratic form) is positive definite (negative definite, etc.), one does not have to compute its eigenvalues. By Sylvester's Law of Inertia it is sufficient to perform an arbitrary, not necessarily orthogonal, diagonalization $D = S^*AS$ and look at the diagonal entries of $D$.
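In practice one can also test positive definiteness without computing eigenvalues by attempting a Cholesky factorization $A = LL^*$, which exists exactly for positive definite Hermitian matrices and is computed by algebraic operations only (it is closely related to the row/column diagonalization described above). A minimal sketch, with made-up matrices:

```python
import numpy as np

def is_positive_definite(A):
    """Return True iff the Hermitian matrix A is positive definite."""
    try:
        np.linalg.cholesky(A)                 # succeeds exactly for positive definite A
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True  (eigenvalues 1, 3)
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))   # False (eigenvalues 3, -1)
```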
It is an easy exercise to see that a $2\times 2$ Hermitian matrix
$$A = \begin{pmatrix} a & b \\ \overline{b} & c \end{pmatrix}$$
is positive definite if and only if
$$a > 0 \quad\text{and}\quad \det A = ac - |b|^2 > 0. \tag{7.4.1}$$
Indeed, if $a > 0$ and $\det A > 0$, then $ac > |b|^2 \ge 0$, so $c > 0$. So we know that if $\lambda_1, \lambda_2$ are the eigenvalues of $A$, then $\lambda_1\lambda_2 = \det A > 0$ and $\lambda_1 + \lambda_2 = \operatorname{trace} A = a + c > 0$. But that is only possible if both eigenvalues are positive. So we have proved that the conditions (7.4.1) imply that $A$ is positive definite. The opposite implication is quite simple; we leave it as an exercise for the reader.
This result can be generalized to the case of $n\times n$ matrices. Namely, for an $n\times n$ Hermitian matrix $A = \{a_{j,k}\}_{j,k=1}^{n}$,
let us consider all its upper left submatrices
$$A_1 = (a_{1,1}), \quad A_2 = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix}, \quad \ldots, \quad A_n = A.$$
(Sylvester's Criterion of Positivity.) A matrix $A = A^*$ is positive definite if and only if
$$\det A_k > 0 \qquad \text{for all } k = 1, 2, \ldots, n.$$
First of all, let us notice that if $A$ is positive definite then each $A_k$ is positive definite as well (can you explain why?). Therefore, since all eigenvalues of a positive definite matrix are positive (see Theorem 7.4.1), and the determinant is the product of the eigenvalues, $\det A_k > 0$ for all $k$.
One can show that if $\det A_k > 0$ for all $k$ then all eigenvalues of $A$ are positive by analyzing the diagonalization of a quadratic form by row and column operations, which was described in Section 7.2.2. The key here is the observation that if we perform the row/column operations in the natural order (i.e. first subtracting the first row/column from all the other rows/columns, then subtracting the second row/column from the rows/columns $3, 4, \ldots, n$, and so on), and if we are not doing any row interchanges, then we automatically diagonalize the quadratic forms of the submatrices $A_k$ as well. Namely, after we subtract the first row and column, the upper left $2\times 2$ block gives a diagonalization of $A_2$; after we also subtract the second row/column, we get a diagonalization of $A_3$, and so on.
Since we are performing only row replacements, we do not change the determinant. Moreover, since we are not performing row exchanges and are performing the operations in the correct order, we preserve the determinants of all the submatrices $A_k$: after the diagonalization, $\det A_k$ is the product of the first $k$ diagonal entries. Therefore, the condition $\det A_k > 0$ for all $k$ guarantees that each new entry appearing on the diagonal is positive.
Of course, one has to be sure that we can use only row replacements and perform the operations in the correct order, i.e. that we do not encounter any pathological situation. If one analyzes the algorithm, one can see that the only bad situation that can happen is that at some step we have a zero in the pivot place. In other words, after we have subtracted the first $k$ rows and columns and obtained a diagonalization of $A_k$, the entry in the $(k+1)$st row and $(k+1)$st column is $0$. We leave it as an exercise for the reader to show that this is impossible. ∎
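Sylvester's Criterion is straightforward to check numerically: compute the determinants of the upper left submatrices $A_k$ and verify that they are all positive. A small sketch with a made-up matrix, compared against the eigenvalue test of Theorem 7.4.1:

```python
import numpy as np

def sylvester_positive_definite(A):
    """Check positive definiteness via Sylvester's Criterion:
    all determinants det(A_k) of the upper left submatrices must be positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])                 # a made-up symmetric matrix
print(sylvester_positive_definite(A))             # True
print(bool(np.all(np.linalg.eigvalsh(A) > 0)))    # True: the eigenvalue test agrees
```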
The proof we outlined above is quite simple. However, let us present, in more detail, another one, which can be found in more advanced textbooks. I personally prefer this second proof, for it demonstrates some important connections.
We will need the following characterization of the eigenvalues of a Hermitian matrix.
For a subspace $E$ of an inner product space $X$, the codimension of $E$ is defined as the dimension of its orthogonal complement, $\operatorname{codim} E := \dim E^\perp$. Since for a subspace $E \subset X$ we have $\dim E + \dim E^\perp = \dim X$, we can see that $\operatorname{codim} E = \dim X - \dim E$.
Recall that the trivial subspace $\{0\}$ has dimension zero, so the whole space $X$ has codimension $0$.
For an arbitrary vector space $X$, the codimension of a subspace $E$ can just be defined as $\dim X - \dim E$.
A more “highbrow” definition would be the dimension of the quotient space $X/E$; however, in this book we do not discuss quotient spaces, and either definition ($\dim E^\perp$ or $\dim X - \dim E$) works for our purposes.
Theorem 7.4.3. Let $A = A^*$ be an $n\times n$ matrix, and let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be its eigenvalues taken in decreasing order. Then
$$\lambda_k = \max_{\dim E = k}\ \min_{\substack{\mathbf{x}\in E \\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) = \min_{\operatorname{codim} F = k-1}\ \max_{\substack{\mathbf{x}\in F \\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}).$$
Let us explain in more detail what expressions like $\max_{\dim E = k}\min_{\mathbf{x}\in E,\,\|\mathbf{x}\|=1}(A\mathbf{x},\mathbf{x})$ and $\min_{\operatorname{codim} F = k-1}\max_{\mathbf{x}\in F,\,\|\mathbf{x}\|=1}(A\mathbf{x},\mathbf{x})$ mean. To compute the first one, we need to consider all subspaces $E$ of dimension $k$. For each such subspace we consider the set of all $\mathbf{x} \in E$ of norm 1, and find the minimum of $(A\mathbf{x},\mathbf{x})$ over all such $\mathbf{x}$. Thus for each subspace $E$ we obtain a number, and we need to pick a subspace such that this number is maximal. That is the $\max\min$.
The $\min\max$ is defined similarly.
A sophisticated reader may notice a problem here: why do the maxima and minima exist? It is well known that a maximum and a minimum have a nasty habit of not existing: for example, the function $f(x) = x$ has neither maximum nor minimum on the open interval $(0, 1)$.
However, in this case the maximum and minimum do exist. There are two possible explanations of the fact that $(A\mathbf{x}, \mathbf{x})$ attains a maximum and a minimum on the unit sphere of a subspace $E$. The first one requires some familiarity with basic notions of analysis: one should just say that the unit sphere in $E$, i.e. the set $\{\mathbf{x} \in E : \|\mathbf{x}\| = 1\}$, is compact, and that a continuous function ($\mathbf{x} \mapsto (A\mathbf{x}, \mathbf{x})$ in our case) on a compact set attains its maximum and minimum.
Another explanation is to notice that the function $\mathbf{x} \mapsto (A\mathbf{x}, \mathbf{x})$, $\mathbf{x} \in E$, is a quadratic form on $E$. It is not difficult to compute the matrix of this form in some orthonormal basis in $E$, but let us only note that this matrix is not $A$: it has to be an $r \times r$ matrix, where $r = \dim E$.
It is easy to see that for a quadratic form the maximum and minimum over the unit sphere are the maximal and minimal eigenvalues of its matrix.
As for optimizing over all subspaces, we will prove below that the maximum and minimum do exist.
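The claim about the maximum and minimum over the unit sphere is easy to observe numerically: sampling many unit vectors, the values of $(A\mathbf{x}, \mathbf{x})$ stay between the smallest and the largest eigenvalue (and typically get close to both ends). A sketch with a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                            # a random symmetric matrix
lam = np.linalg.eigvalsh(A)                  # eigenvalues, in increasing order

X = rng.standard_normal((5, 100000))         # many random vectors ...
X /= np.linalg.norm(X, axis=0)               # ... normalized to the unit sphere
values = np.einsum("ij,ij->j", X, A @ X)     # (Ax, x) for every column x

print(lam[0], values.min(), values.max(), lam[-1])   # sampled values lie in [lambda_min, lambda_max]
```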
First of all, by picking an appropriate orthonormal basis, we can assume without loss of generality that the matrix $A$ is diagonal, $A = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$.
Pick subspaces $E$ and $F$, $\dim E = k$, $\operatorname{codim} F = k-1$, i.e. $\dim F = n - k + 1$. Since $\dim E + \dim F = n + 1 > n$, there exists a non-zero vector $\mathbf{x}_0 \in E \cap F$. By normalizing it we can assume without loss of generality that $\|\mathbf{x}_0\| = 1$. We can always arrange the eigenvalues in decreasing order, so let us assume that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.
Since $\mathbf{x}_0$ belongs to both subspaces $E$ and $F$,
$$\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) \le (A\mathbf{x}_0, \mathbf{x}_0) \le \max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}).$$
We did not assume anything about the subspaces $E$ and $F$ except their dimensions, so the above inequality
$$\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) \le \max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) \tag{7.4.2}$$
holds for all pairs $E$ and $F$ of the appropriate dimensions.
Define
$$E_0 := \operatorname{span}\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_k\}, \qquad F_0 := \operatorname{span}\{\mathbf{e}_k, \mathbf{e}_{k+1}, \ldots, \mathbf{e}_n\},$$
so that $\dim E_0 = k$ and $\operatorname{codim} F_0 = k - 1$.
Since for a self-adjoint matrix the maximum and minimum of $(A\mathbf{x}, \mathbf{x})$ over the unit sphere are the maximal and the minimal eigenvalue respectively (easy to check for diagonal matrices), we get that
$$\min_{\substack{\mathbf{x}\in E_0\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) = \lambda_k = \max_{\substack{\mathbf{x}\in F_0\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}).$$
It follows from (7.4.2) that for any subspace $E$ with $\dim E = k$,
$$\min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) \le \max_{\substack{\mathbf{x}\in F_0\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) = \lambda_k,$$
and similarly, for any subspace $F$ of codimension $k-1$,
$$\max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) \ge \min_{\substack{\mathbf{x}\in E_0\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) = \lambda_k.$$
But for the subspaces $E_0$ and $F_0$ the corresponding minimum and maximum are both equal to $\lambda_k$, so the $\max\min$ and the $\min\max$ in the theorem both equal $\lambda_k$. ∎
Corollary 7.4.4. Let $A = A^*$ be an $n\times n$ self-adjoint matrix, and let $\widetilde A$ be its upper left submatrix of size $(n-1)\times(n-1)$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_{n-1}$ be the eigenvalues of $A$ and $\widetilde A$ respectively, taken in decreasing order. Then the eigenvalues of $\widetilde A$ interlace the eigenvalues of $A$,
i.e.
$$\lambda_k \ge \mu_k \ge \lambda_{k+1}, \qquad k = 1, 2, \ldots, n-1.$$
Let $X$ be the subspace of $\mathbb{C}^n$ spanned by the first $n-1$ coordinate vectors, $X := \operatorname{span}\{\mathbf{e}_1, \ldots, \mathbf{e}_{n-1}\}$. Since $(\widetilde A\mathbf{x}, \mathbf{x}) = (A\mathbf{x}, \mathbf{x})$ for all $\mathbf{x} \in X$, Theorem 7.4.3 (applied to $\widetilde A$) implies that
$$\mu_k = \max_{\substack{E \subset X\\ \dim E = k}}\ \min_{\substack{\mathbf{x}\in E\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) = \min_{\substack{F \subset X\\ \operatorname{codim}_X F = k-1}}\ \max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}).$$
To get $\lambda_k$ we need to take the maximum over the set of all subspaces $E \subset \mathbb{C}^n$ with $\dim E = k$, i.e. to take the maximum over a bigger set (any subspace of $X$ is a subspace of $\mathbb{C}^n$). Therefore
$$\lambda_k \ge \mu_k$$
(the maximum can only increase if we increase the set).
On the other hand, any subspace $F \subset X$ of codimension $k-1$ (here we mean codimension in $X$) has dimension $(n-1)-(k-1) = n-k$, so its codimension in $\mathbb{C}^n$ is $k$. Therefore
$$\mu_k \ge \min_{\substack{F \subset \mathbb{C}^n\\ \operatorname{codim} F = k}}\ \max_{\substack{\mathbf{x}\in F\\ \|\mathbf{x}\|=1}} (A\mathbf{x}, \mathbf{x}) = \lambda_{k+1}$$
(the minimum over a bigger set can only be smaller). ∎
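The interlacing inequalities are easy to observe numerically; the following sketch (with a random symmetric matrix, and the submatrix obtained by deleting the last row and column) checks them directly.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                                  # random symmetric matrix
A_sub = A[:-1, :-1]                                # upper left (n-1) x (n-1) submatrix

lam = np.sort(np.linalg.eigvalsh(A))[::-1]         # eigenvalues of A, decreasing
mu = np.sort(np.linalg.eigvalsh(A_sub))[::-1]      # eigenvalues of the submatrix, decreasing

print(all(lam[k] >= mu[k] >= lam[k + 1] for k in range(len(mu))))   # True
```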
If $A$ is positive definite, then each $A_k$ is positive definite as well (can you explain why?). Since all eigenvalues of a positive definite matrix are positive (see Theorem 7.4.1), $\det A_k > 0$ for all $k$.
Let us now prove the other implication. Let $\det A_k > 0$ for all $k$. We will show, using induction in $k$, that all the matrices $A_k$ (and so $A = A_n$) are positive definite.
Clearly $A_1$ is positive definite (it is a $1\times 1$ matrix, so $A_1 = \det A_1 > 0$). Assuming that $A_k$ is positive definite, let us show that $A_{k+1}$ is positive definite as well. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{k+1}$ and $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_k$ be the eigenvalues of $A_{k+1}$ and $A_k$ respectively. By Corollary 7.4.4,
$$\lambda_j \ge \mu_j > 0 \qquad \text{for } j = 1, 2, \ldots, k.$$
Since $\det A_{k+1} = \lambda_1 \lambda_2 \cdots \lambda_{k+1} > 0$ and $\lambda_1, \ldots, \lambda_k > 0$, the last eigenvalue $\lambda_{k+1}$ must also be positive. Therefore, since all its eigenvalues are positive, the matrix $A_{k+1}$ is positive definite. ∎
First of all, notice that the Sylvester Criterion of Positivity does not generalize directly to positive semidefinite matrices: the corresponding non-strict conditions on the determinants $\det A_k$ do not, in general, imply that $A$ is positive semidefinite; see Problem 7.4.6 below.
In some low-dimensional cases, however, suitable non-strict conditions do imply positive semidefiniteness (see Problem 7.4.3 below), and this sometimes leads to the wrong conclusion about larger matrices.
Finally, we should say a couple of words about negative definite matrices. It is a typical students' mistake to say that the condition $\det A_k < 0$ for all $k$ implies that $A$ is negative definite. But that is wrong!
To check whether a matrix $A$ is negative definite, one just has to check that the matrix $-A$ is positive definite. Applying Sylvester's Criterion of Positivity to $-A$, one can see that $A$ is negative definite if and only if $(-1)^k \det A_k > 0$ for all $k$, i.e. if and only if the signs of $\det A_k$ alternate, starting with $\det A_1 < 0$.
Using Sylvester’s Criterion of Positivity check if the matrices
are positive definite or not.
Are the matrices , and , , , positive definite?
True or false:
If is positive definite, then is positive definite.
If is negative definite, then is negative definite.
If is negative definite, then is positive definite.
If is positive definite and is negative semidefinite, then is positive definite.
If is indefinite, and is positive definite, then is indefinite.
Let be a Hermitian matrix, such that , . Prove that is positive semidefinite.
Find a real symmetric matrix such that for all , but the matrix is not positive semidefinite. Try to find an example for the minimal possible .
Let be an Hermitian matrix such that for all and . Prove that is positive semidefinite.
Find a real symmetric matrix such that , for , but the matrix is not positive semidefinite.
Let $V$ be an inner product space and let $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ be a basis (not necessarily orthogonal) in $V$. Let $B = \{b_{j,k}\}_{j,k=1}^{n}$ be the matrix defined by
$$b_{j,k} = (\mathbf{b}_k, \mathbf{b}_j).$$
If $\mathbf{x} = \sum_{k=1}^{n} x_k \mathbf{b}_k$ and $\mathbf{y} = \sum_{j=1}^{n} y_j \mathbf{b}_j$, then
$$(\mathbf{x}, \mathbf{y}) = \sum_{j,k=1}^{n} b_{j,k}\, x_k \overline{y_j} = \bigl(B\hat{\mathbf{x}}, \hat{\mathbf{y}}\bigr)_{\mathbb{C}^n},$$
where $\hat{\mathbf{x}} = (x_1, \ldots, x_n)^T$ and $\hat{\mathbf{y}} = (y_1, \ldots, y_n)^T$ are the coordinate (column) vectors, and $(\,\cdot\,,\cdot\,)_{\mathbb{C}^n}$ stands for the standard inner product in $\mathbb{C}^n$. One can immediately see that $B$ is a positive definite matrix (why?).
So, when one works with coordinates in an arbitrary (not necessarily orthogonal) basis in an inner product space, the inner product (in terms of the coordinates) is not computed as the standard inner product in $\mathbb{C}^n$, but with the help of a positive definite matrix $B$ as described above.
Note that this $B$-inner product coincides with the standard inner product in $\mathbb{C}^n$ if and only if $B = I$, which happens if and only if the basis $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ is orthonormal.
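A short numerical sketch of this construction (with a made-up, non-orthogonal basis of $\mathbb{R}^3$): the Gram matrix $B$ lets us compute inner products directly from coordinates.

```python
import numpy as np

# Columns of V: the basis vectors b_1, b_2, b_3 written in the standard coordinates.
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

B = V.T @ V                               # Gram matrix: B[j, k] = (b_k, b_j) (real case)

x_hat = np.array([1.0, 2.0, -1.0])        # coordinates of x in the basis b_1, b_2, b_3
y_hat = np.array([0.5, 0.0, 3.0])         # coordinates of y

lhs = np.dot(V @ x_hat, V @ y_hat)        # the actual inner product (x, y)
rhs = np.dot(B @ x_hat, y_hat)            # computed from the coordinates via B
print(np.isclose(lhs, rhs))               # True

print(bool(np.all(np.linalg.eigvalsh(B) > 0)))   # True: B is positive definite
```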