All vector spaces in this chapter are finite-dimensional.
A linear functional on a vector space $V$ (over a field $\mathbb{F}$) is a linear transformation $L: V \to \mathbb{F}$.
This special class of linear transformations is sufficiently important to deserve a separate name.
If one thinks of vectors as some physical objects, like forces or velocities, then one can think of a linear functional as a (linear) measurement that gives a scalar quantity as the result: think of the component of a force or velocity in a given direction.
The collection of all linear functionals on a finite-dimensional vector space $V$ is called the dual of $V$ and is usually denoted $V'$ or $V^*$. (We consider here only finite-dimensional spaces because for infinite-dimensional spaces the dual space consists not of all, but only of the so-called bounded linear functionals. Without giving the precise definition, let us only mention that in the finite-dimensional case, i.e. when both the domain and the target space are finite-dimensional, all linear transformations are bounded, so we do not need to mention the word bounded.)
As was discussed earlier in Section 1.4 of Chapter 1, the set of all linear transformations acting from $V$ to $W$ is a vector space (with the naturally defined operations of addition and multiplication by a scalar). So the dual space $V'$ is a vector space.
Let us consider an example. Let the space $V$ be $\mathbb{R}^n$; what is its dual? As we know, a linear transformation $\mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m\times n$ matrix, so a linear functional on $\mathbb{R}^n$ (i.e. a linear transformation $\mathbb{R}^n \to \mathbb{R}$) is given by a $1\times n$ matrix (a row), which we denote by $[L]$. The collection of all such rows is isomorphic to $\mathbb{R}^n$ (the isomorphism is given by taking the transpose $[L] \mapsto [L]^T$).
So, the dual of $\mathbb{R}^n$ is $\mathbb{R}^n$ itself. The same holds true for $\mathbb{C}^n$, of course, as well as for $\mathbb{F}^n$, where $\mathbb{F}$ is an arbitrary field. Since a space $V$ over a field $\mathbb{F}$ (here we are mostly interested in the cases $\mathbb{F} = \mathbb{R}$ and $\mathbb{F} = \mathbb{C}$) of dimension $n$ is isomorphic to $\mathbb{F}^n$, and the dual of $\mathbb{F}^n$ is isomorphic to $\mathbb{F}^n$, we can conclude that the dual $V'$ is isomorphic to $V$.
Thus, the definition of the dual space is starting to look a bit silly, since it does not appear to give us anything new.
However, that is not the case! If we look carefully, we can see that the dual space is indeed a new object. To see that, let us analyze how the entries of the matrix $[L]$ of a linear functional $L$ (which we can call the coordinates of $L$) change when we change the basis in $V$.
Let
$$\mathcal{A} = \{a_1, a_2, \dots, a_n\}, \qquad \mathcal{B} = \{b_1, b_2, \dots, b_n\}$$
be two bases in $V$, and let $[L]_{\mathcal{A}}$ and $[L]_{\mathcal{B}}$ be the matrices of $L$ in the bases $\mathcal{A}$ and $\mathcal{B}$ respectively (we suppose that the basis in the target space of scalars is always the standard one, so we can skip the subscript for it in the notation). Then, recalling the change of coordinates rule from Section 2.8.4 in Chapter 2, we get that
$$[L]_{\mathcal{B}} = [L]_{\mathcal{A}} [I]_{\mathcal{A},\mathcal{B}}.$$
Recall that for a vector $v \in V$ its coordinates in different bases are related by the formula
$$[v]_{\mathcal{B}} = [I]_{\mathcal{B},\mathcal{A}} [v]_{\mathcal{A}},$$
and that
$$[I]_{\mathcal{A},\mathcal{B}} = \bigl([I]_{\mathcal{B},\mathcal{A}}\bigr)^{-1}.$$
If we denote $S := [I]_{\mathcal{B},\mathcal{A}}$, so $[v]_{\mathcal{B}} = S [v]_{\mathcal{A}}$, the entries of the columns $[L]_{\mathcal{B}}^T$ and $[L]_{\mathcal{A}}^T$ are related by the formula
$$[L]_{\mathcal{B}}^T = (S^T)^{-1} [L]_{\mathcal{A}}^T \qquad (8.1.1)$$
(since we usually represent a vector as a column of its coordinates, we use the columns $[L]_{\mathcal{B}}^T$ and $[L]_{\mathcal{A}}^T$ instead of the rows $[L]_{\mathcal{B}}$ and $[L]_{\mathcal{A}}$).
Saying it in words:
If $S$ is the change of coordinates matrix (from the old coordinates to the new ones) in $V$, then the change of coordinates matrix in the dual space $V'$ is $(S^T)^{-1} = (S^{-1})^T$.
So, the dual space $V'$, while isomorphic to $V$, is indeed a different object: the difference is in how the coordinates in $V$ and in $V'$ change when one changes the basis in $V$.
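To make the boxed rule concrete, here is a minimal numerical sketch (it is not part of the original exposition; it assumes NumPy and uses made-up sample bases). It checks that vector coordinates transform by $S$ while functional coordinates transform by $(S^T)^{-1}$, the value $L(v)$ being the same in both bases.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))     # columns: basis A of R^3 (standard coords)
B = rng.normal(size=(n, n))     # columns: basis B

v = rng.normal(size=n)          # a vector, in standard coordinates
L = rng.normal(size=n)          # a functional, as a row in standard coords

vA = np.linalg.solve(A, v)      # coordinates of v in basis A
vB = np.linalg.solve(B, v)      # coordinates of v in basis B
S = np.linalg.solve(B, A)       # change of coordinates matrix: vB = S @ vA
assert np.allclose(vB, S @ vA)

LA = L @ A                      # row of L in basis A:  L(v) = LA @ vA
LB = L @ B                      # row of L in basis B
# functional coordinates (as columns) change with (S^T)^{-1}, as in (8.1.1):
assert np.allclose(LB, np.linalg.solve(S.T, LA))
# the value L(v) does not depend on the basis:
assert np.isclose(L @ v, LA @ vA) and np.isclose(L @ v, LB @ vB)
\end{verbatim}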
One can ask: why can’t we pick a basis in $V$ and some completely unrelated basis in the dual $V'$? Of course, we can do that, but imagine what it would take to compute $L(v)$ knowing the coordinates of $v$ in some basis and the coordinates of $L$ in some completely unrelated basis.
So, if we want (knowing the coordinates of a vector $v$ in some basis) to compute the action of a linear functional $L$ using the standard rules of matrix algebra, i.e. to multiply a row (the functional) by a column (the vector), we have no choice: the “coordinates” of the linear functional should be the entries of its matrix $[L]$ in the same basis.
As we will see later in Section 8.1.3 below, the entries (“coordinates”) of a linear functional are indeed the coordinates in some basis (the so-called dual basis).
Let $v \in V$. If $L(v) = 0$ for all $L \in V'$, then $v = 0$. As a corollary, if $L(v) = L(w)$ for all $L \in V'$, then $v = w$.
Fix a basis $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ in $V$. Then
$$L(v) = [L]_{\mathcal{B}} [v]_{\mathcal{B}}.$$
Picking different matrices $[L]_{\mathcal{B}}$ (i.e. different $L$) we can easily see that $[v]_{\mathcal{B}} = 0$. Indeed, if
$$[L]_{\mathcal{B}} = (0, \dots, 0, 1, 0, \dots, 0)$$
(with the $1$ in the $k$th place), then the equality
$$0 = L(v) = [L]_{\mathcal{B}} [v]_{\mathcal{B}}$$
implies that the $k$th coordinate of $[v]_{\mathcal{B}}$ is $0$.
Using this equality for all $k = 1, 2, \dots, n$ we conclude that $[v]_{\mathcal{B}} = 0$, so $v = 0$. ∎
As we discussed above, the dual space $V'$ is a vector space, so one can consider its dual $V'' := (V')'$. It looks like one can then consider the dual of $V''$ and so on… However, the fun stops with $V''$ because
The second dual $V''$ is canonically (i.e. in a natural way) isomorphic to $V$.
Let us decipher this statement. Any vector $v \in V$ canonically defines a linear functional on $V'$ (i.e. an element of the second dual $V''$) by the rule
$$L \mapsto L(v), \qquad L \in V'.$$
It is easy to check that the mapping $T: V \to V''$ assigning to each $v$ this functional is a linear transformation.
Since $\dim V'' = \dim V' = \dim V$, the condition $\operatorname{Ker} T = \{0\}$ (which follows from Lemma 8.1.3) implies that $T$ is an invertible transformation (an isomorphism).
The isomorphism $T$ is very natural (at least for a mathematician). In particular, it was defined without using a basis, so it does not depend on the choice of basis. So, informally we say that $V''$ is canonically isomorphic to $V$; the rigorous statement is that the map $T$ described above (which we consider to be natural and canonical) is an isomorphism from $V$ to $V''$.
In the previous sections we several times referred to the entries of the matrix of a linear functional as coordinates. But “coordinates” in this book usually means the coordinates in some basis. Are the “coordinates” of a linear functional really coordinates in some basis? It turns out the answer is “yes”, so the terminology remains consistent.
Let us find the basis corresponding to the coordinates of $L$. Let $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ be a basis in $V$. For $L \in V'$, let $[L] = (L_1, L_2, \dots, L_n)$ be its matrix (row) in the basis $\mathcal{B}$. Consider the linear functionals $b'_1, b'_2, \dots, b'_n$ defined by
$$b'_k(b_j) = \delta_{k,j}, \qquad (8.1.2)$$
where $\delta_{k,j}$ is the Kronecker delta,
$$\delta_{k,j} = \begin{cases} 1, & k = j,\\ 0, & k \ne j.\end{cases}$$
Recall that a linear transformation is defined by its action on a basis, so the functionals $b'_k$ are well defined.
As one can easily see, the functional $L$ can be represented as
$$L = L_1 b'_1 + L_2 b'_2 + \dots + L_n b'_n.$$
Indeed, take an arbitrary $v \in V$, $v = \sum_{j=1}^n v_j b_j$, so $[v]_{\mathcal{B}} = (v_1, v_2, \dots, v_n)^T$. By linearity and the definition (8.1.2) of $b'_k$
$$\Bigl(\sum_{k=1}^n L_k b'_k\Bigr)(v) = \sum_{k=1}^n L_k v_k.$$
Therefore
$$L(v) = [L][v]_{\mathcal{B}} = \sum_{k=1}^n L_k v_k = \Bigl(\sum_{k=1}^n L_k b'_k\Bigr)(v).$$
Since this identity holds for all $v \in V$, we conclude that $L = \sum_{k=1}^n L_k b'_k$.
Since we did not assume anything about $L$, we have just shown that any linear functional $L \in V'$ can be represented as a linear combination of $b'_1, \dots, b'_n$, so the system $b'_1, \dots, b'_n$ is generating.
Let us show that this system is linearly independent (and so it is a basis). Let $\sum_{k=1}^n c_k b'_k = 0$. Then for an arbitrary $j$
$$0 = \Bigl(\sum_{k=1}^n c_k b'_k\Bigr)(b_j) = c_j,$$
so $c_j = 0$. Therefore all $c_k$ are $0$ and the system is linearly independent.
So, the system $b'_1, \dots, b'_n$ is indeed a basis in the dual space $V'$, and the entries $L_k$ of the matrix $[L]$ are the coordinates of $L$ with respect to this basis.
Let $b_1, b_2, \dots, b_n$ be a basis in $V$. The system of vectors $b'_1, b'_2, \dots, b'_n$ in $V'$
uniquely defined by equation (8.1.2) is called the dual (or biorthogonal) basis to $b_1, b_2, \dots, b_n$.
Note that we have shown that the dual system to a basis is itself a basis. Note also that if $b'_1, \dots, b'_n$ in $V'$ is the dual system to a basis $b_1, \dots, b_n$, then $b_1, \dots, b_n$ (viewed as vectors in $V'' = V$) is the dual to the basis $b'_1, \dots, b'_n$.
The dual system can be used for computing the coordinates of a vector in the basis $b_1, \dots, b_n$. Let $b'_1, \dots, b'_n$ be the biorthogonal system to $b_1, \dots, b_n$, and let $v = \sum_{k=1}^n c_k b_k$. Then, as was shown before,
$$b'_j(v) = \sum_{k=1}^n c_k b'_j(b_k) = c_j,$$
so $c_j = b'_j(v)$. Then we can write
$$v = \sum_{k=1}^n b'_k(v)\, b_k. \qquad (8.1.3)$$
In other words,
the $k$th coordinate of a vector $v$ in a basis $b_1, \dots, b_n$ is $b'_k(v)$, where $b'_1, \dots, b'_n$ is the dual basis.
This formula is called (a baby version of) the abstract non-orthogonal Fourier decomposition of $v$ (in the basis $b_1, \dots, b_n$). The reason for this name will become clear later in Section 8.2.3.
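As a concrete illustration of (8.1.3) (a sketch added here, not in the original text, assuming NumPy): if the basis vectors $b_k$ are the columns of an invertible matrix $B$, then the dual functionals $b'_k$ are exactly the rows of $B^{-1}$, since the $k$th row of $B^{-1}$ applied to $b_j$ gives $\delta_{k,j}$.
\begin{verbatim}
import numpy as np

B = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])        # columns b_1, b_2, b_3: a basis of R^3
Bdual = np.linalg.inv(B)            # rows of B^{-1} are the dual functionals
assert np.allclose(Bdual @ B, np.eye(3))   # b'_k(b_j) = delta_{kj}

v = np.array([2., -1., 3.])
coords = Bdual @ v                  # c_k = b'_k(v), formula (8.1.3)
assert np.allclose(B @ coords, v)   # v = sum_k c_k b_k
\end{verbatim}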
Let $\mathcal{A} = \{a_1, \dots, a_n\}$ and $\mathcal{B} = \{b_1, \dots, b_m\}$ be bases in $V$ and $W$ respectively, and let $b'_1, \dots, b'_m$ be the dual basis to $\mathcal{B}$. Then the matrix $[T]_{\mathcal{B},\mathcal{A}} = \{T_{j,k}\}$ of a transformation $T: V \to W$ in the bases $\mathcal{A}$, $\mathcal{B}$ is given by
$$T_{j,k} = b'_j(T a_k).$$
The first example we consider is a trivial one. Let $V$ be $\mathbb{R}^n$ (or $\mathbb{C}^n$) and let $e_1, \dots, e_n$ be the standard basis there. The dual space is the space of $n$-dimensional row vectors, which is isomorphic to $\mathbb{R}^n$ (or $\mathbb{C}^n$ in the complex case), and the standard basis there is the dual to $e_1, \dots, e_n$. The standard basis in the dual space is obtained from $e_1, \dots, e_n$ by transposition, $e'_k = e_k^T$.
The next example is more interesting. Let us consider the space $\mathbb{P}_n$ of polynomials of degree at most $n$. As we know, the powers $1, t, t^2, \dots, t^n$ form the standard basis in this space. What is the dual to this basis?
The answer might be tricky to guess, but it is very easy to check when you know it. Namely, consider the linear functionals $L_0, L_1, \dots, L_n$ acting on polynomials as follows:
$$L_k(p) = \frac{p^{(k)}(0)}{k!};$$
here we use the usual agreement that $0! = 1$ and $p^{(0)} = p$.
Since
$$\frac{d^k}{dt^k}\, t^j \Big|_{t=0} = \begin{cases} k!, & j = k,\\ 0, & j \ne k,\end{cases}$$
we can easily see that the system $L_0, L_1, \dots, L_n$ is the dual to the system of powers $1, t, t^2, \dots, t^n$.
Applying (8.1.3) to the above system and its dual, we get that any polynomial $p$ of degree at most $n$ can be represented as
$$p(t) = p(0) + \frac{p'(0)}{1!}\, t + \frac{p''(0)}{2!}\, t^2 + \dots + \frac{p^{(n)}(0)}{n!}\, t^n. \qquad (8.1.4)$$
This formula is well known in Calculus as the Taylor formula for polynomials. More precisely, this is a particular case of the Taylor formula, the so-called Maclaurin formula. The general Taylor formula
$$p(t) = p(a) + \frac{p'(a)}{1!}\,(t-a) + \frac{p''(a)}{2!}\,(t-a)^2 + \dots + \frac{p^{(n)}(a)}{n!}\,(t-a)^n$$
can be obtained from (8.1.4) by applying it to the polynomial $q(t) = p(t+a)$ and then renaming the variable. It also can be obtained by considering the powers $(t-a)^k$, $k = 0, 1, \dots, n$, and finding the dual system the same way we did for the powers $t^k$. (Note that the general Taylor formula says more than the formula for polynomials obtained here: it says that any $n$ times differentiable function can be approximated near the point $a$ by its Taylor polynomial; moreover, under slightly stronger differentiability assumptions, it allows us to estimate the error. The above formula for polynomials serves as a motivation and a starting point for the general case.)
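A quick numerical sanity check of (8.1.4) (an added sketch, not part of the original text; it assumes NumPy's polynomial class): the functionals $L_k(p) = p^{(k)}(0)/k!$ recover exactly the coefficients of $p$ in the basis $1, t, \dots, t^n$.
\begin{verbatim}
import numpy as np
from math import factorial

# p(t) = 3 - 2t + 0t^2 + 5t^3, coefficients listed by increasing degree
p = np.polynomial.Polynomial([3., -2., 0., 5.])
coeffs = [p.deriv(k)(0.0) / factorial(k) for k in range(4)]
# L_k(p) = p^(k)(0)/k! gives the k-th coordinate, as (8.1.4) asserts:
assert np.allclose(coeffs, p.coef)
\end{verbatim}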
Our next example deals with the so-called Lagrange interpolation formula. Let $x_0, x_1, \dots, x_n$ be $n+1$ distinct points (in $\mathbb{R}$ or $\mathbb{C}$), and let $\mathbb{P}_n$ be the space of polynomials of degree at most $n$. Define the functionals $L_0, L_1, \dots, L_n$ by
$$L_k(p) = p(x_k), \qquad p \in \mathbb{P}_n.$$
What is the dual of this system of functionals? Note that while it is not hard to show that the functionals $L_k$ are linearly independent, and so (since $\dim \mathbb{P}_n' = \dim \mathbb{P}_n = n+1$) form a basis in $\mathbb{P}_n'$, we do not need that. We will construct the dual system directly, and then we will be able to see that the system is indeed a basis.
Namely, let us define the polynomials $p_k$, $k = 0, 1, \dots, n$, as
$$p_k(t) = \prod_{j \ne k} \frac{t - x_j}{x_k - x_j},$$
where in the product $j$ runs from $0$ to $n$. Clearly $p_k(x_k) = 1$ and $p_k(x_j) = 0$ for $j \ne k$, so indeed the system $p_0, p_1, \dots, p_n$ is dual to the system $L_0, L_1, \dots, L_n$.
There is a little detail here, since the notion of a dual system was defined only for a basis, and we did not prove that either of the systems is one. But one can immediately see that the system $p_0, p_1, \dots, p_n$ is linearly independent (can you explain why?), and since it contains $n+1$ vectors, it is a basis in $\mathbb{P}_n$. Therefore, the system of functionals $L_0, L_1, \dots, L_n$ is also a basis in the dual space $\mathbb{P}_n'$.
Note that we did not just get lucky here; this is a general phenomenon. Namely, as Problem 8.1.1 below asserts, any system of vectors having a “dual” one must be linearly independent. So, constructing a dual system is a way of proving linear independence (and an easy one, if you can do it as easily as in the above example).
Applying formula (8.1.3) to the above example, one can see that the unique polynomial $p \in \mathbb{P}_n$ satisfying
$$p(x_k) = y_k, \qquad k = 0, 1, \dots, n, \qquad (8.1.5)$$
can be reconstructed by the formula
$$p(t) = \sum_{k=0}^n y_k\, p_k(t) = \sum_{k=0}^n y_k \prod_{j \ne k} \frac{t - x_j}{x_k - x_j}. \qquad (8.1.6)$$
This formula is well-known in mathematics as the “Lagrange interpolation formula”.
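Formula (8.1.6) is short enough to implement directly. The following is a minimal sketch (added here for illustration; the helper name lagrange is hypothetical, and NumPy is assumed):
\begin{verbatim}
import numpy as np

def lagrange(xs, ys, t):
    """Evaluate at t the unique p of degree <= n with p(xs[k]) = ys[k],
    using formula (8.1.6)."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis = np.prod([(t - xj) / (xk - xj)
                         for j, xj in enumerate(xs) if j != k])
        total += yk * basis                 # y_k * p_k(t)
    return total

xs, ys = [0., 1., 2.], [1., 3., 7.]         # interpolate through 3 points
assert all(np.isclose(lagrange(xs, ys, x), y) for x, y in zip(xs, ys))
print(lagrange(xs, ys, 1.5))                # value between the nodes
\end{verbatim}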
Let $v_1, v_2, \dots, v_r$ be a system of vectors in $V$ such that there exists a system of linear functionals $L_1, L_2, \dots, L_r$ such that
$$L_k(v_j) = \begin{cases} 1, & k = j,\\ 0, & k \ne j.\end{cases}$$
Show that the system $v_1, v_2, \dots, v_r$ is linearly independent.
Prove that, given $n+1$ distinct points $x_0, x_1, \dots, x_n$ and values $y_0, y_1, \dots, y_n$ (not necessarily distinct), the polynomial $p \in \mathbb{P}_n$ satisfying (8.1.5) is unique. Try to prove it using ideas from linear algebra, and not what you know about polynomials.
Let us recall that there is no inner product on a space over an arbitrary field, so all our inner product spaces are either real or complex.
Let $V$ be an inner product space. Given a linear functional $L$ on $V$ there exists a unique vector $w \in V$ such that
$$L(v) = (v, w) \qquad \forall v \in V. \qquad (8.2.1)$$
Fix an orthonormal basis $e_1, e_2, \dots, e_n$ in $V$, and let
$$[L] = (L_1, L_2, \dots, L_n)$$
be the matrix of $L$ in this basis. Define the vector $w$ by
$$w = \overline{L_1}\, e_1 + \overline{L_2}\, e_2 + \dots + \overline{L_n}\, e_n, \qquad (8.2.2)$$
where $\overline{L_k}$ denotes the complex conjugate of $L_k$. In the case of a real space the conjugation does nothing and can simply be ignored.
We claim that $w$ satisfies (8.2.1).
Indeed, take an arbitrary vector $v \in V$, $v = \sum_{k=1}^n v_k e_k$. Then $[v] = (v_1, v_2, \dots, v_n)^T$ and
$$L(v) = [L][v] = \sum_{k=1}^n L_k v_k.$$
On the other hand, recalling that if we know the coordinates of two vectors in an orthonormal basis, we can compute their inner product by taking these coordinates and computing the standard inner product in $\mathbb{C}^n$ (or $\mathbb{R}^n$) (see Exercise 5.2.3 in Chapter 5), we see that
$$(v, w) = \sum_{k=1}^n v_k \overline{\overline{L_k}} = \sum_{k=1}^n v_k L_k = L(v),$$
so (8.2.1) holds. The uniqueness of $w$ is easy: if $(v, w_1) = (v, w_2)$ for all $v \in V$, then $(v, w_1 - w_2) = 0$ for all $v$, in particular for $v = w_1 - w_2$, so $w_1 = w_2$. ∎
While the statement of the theorem does not mention a basis, the proof presented above uses an orthonormal basis in $V$, although the resulting vector $w$ does not depend on the choice of the basis. (An alternative proof that does not need a basis is also possible; this alternative proof, which works in the infinite-dimensional case, uses the strong convexity of the unit ball in an inner product space together with the idea of completeness from analysis.) An advantage of the proof presented here is that it gives a formula for computing the representing vector $w$.
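Formula (8.2.2) is easy to test numerically. A minimal sketch (added for illustration, not part of the original text; NumPy assumed, with made-up complex data) showing that the conjugation in (8.2.2) is exactly what makes $L(v) = (v, w)$ work:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 4
# L acts on C^4 as L(v) = row @ v; the standard basis is orthonormal.
row = rng.normal(size=n) + 1j * rng.normal(size=n)
w = row.conj()                          # representing vector, formula (8.2.2)

v = rng.normal(size=n) + 1j * rng.normal(size=n)
inner = lambda x, y: np.vdot(y, x)      # (x, y) = sum_k x_k conj(y_k)
assert np.isclose(row @ v, inner(v, w)) # L(v) = (v, w), i.e. (8.2.1)
\end{verbatim}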
For a vector $w$ in an inner product space $V$ one can define a linear functional $L_w$,
$$L_w(v) = (v, w), \qquad v \in V.$$
It is easy to see that the mapping $w \mapsto L_w$ is an injective mapping from $V$ to its dual $V'$. The above Theorem 8.2.1 asserts that this mapping is a surjection (onto), so one is tempted to say that the dual of an inner product space is (canonically isomorphic to) the space itself, with the canonical isomorphism given by $w \mapsto L_w$.
This is indeed the case if $V$ is a real inner product space, and in this case it is easy to show that the map $w \mapsto L_w$ is a linear transformation. We already discussed that this map is injective and surjective, so it is an invertible linear transformation, i.e. an isomorphism.
However, if $V$ is a complex space, one needs to be a bit more careful. Namely, the mapping that maps a vector $w$ to the linear functional $L_w$ as in Theorem 8.2.1 ($L_w(v) = (v, w)$) is not a linear one.
More precisely, while it is easy to show that
$$L_{w_1 + w_2} = L_{w_1} + L_{w_2}, \qquad (8.2.3)$$
it follows from the definition of $L_w$ and the properties of the inner product that
$$L_{\alpha w} = \overline{\alpha}\, L_w, \qquad (8.2.4)$$
so $L_{\alpha w} \ne \alpha L_w$ (unless $\alpha$ is real).
In other words, one can say that the dual of a complex inner product space is the space itself but with a different linear structure: adding two vectors is equivalent to adding the corresponding linear functionals, but multiplying a vector by $\alpha$ is equivalent to multiplying the corresponding functional by $\overline{\alpha}$.
A transformation $T$ satisfying
$$T(x + y) = T(x) + T(y), \qquad T(\alpha x) = \overline{\alpha}\, T(x)$$
is sometimes called a conjugate linear transformation.
So, for a complex inner product space $V$ its dual $V'$ can be canonically identified with $V$ by a conjugate linear isomorphism (i.e. an invertible conjugate linear transformation) $w \mapsto L_w$.
Of course, for a real inner product space the complex conjugation can simply be ignored (because $\alpha$ is real), so the map $w \mapsto L_w$ is a linear one. In this case we can indeed say that the dual of an inner product space is the space itself.
In both the real and complex cases, we nevertheless can say that the dual of an inner product space can be canonically identified with the space itself.
Let $b_1, b_2, \dots, b_n$ be a basis in an inner product space $V$. The unique system $b'_1, b'_2, \dots, b'_n$ in $V$ defined by
$$(b_j, b'_k) = \delta_{j,k},$$
where $\delta_{j,k}$ is the Kronecker delta, is called the biorthogonal or dual system to the basis $b_1, b_2, \dots, b_n$.
This definition clearly agrees with Definition 8.1.4, if one identifies the dual $V'$ with $V$ as was discussed above. Then it follows immediately from the discussion in Section 8.1.3 that the dual system to a basis is uniquely defined and forms a basis, and that the dual to $b'_1, \dots, b'_n$ is $b_1, \dots, b_n$.
The abstract non-orthogonal Fourier decomposition formula (8.1.3) can be rewritten as
$$v = \sum_{k=1}^n (v, b'_k)\, b_k.$$
By analogy with the case of an inner product space, see Theorem 8.2.1, it is customary to write $L(v)$, where $L$ is a linear functional (i.e. $L \in V'$, $v \in V$), in a form resembling the inner product:
$$L(v) = \langle v, L \rangle.$$
Note that the expression $\langle v, L\rangle$ is linear in both arguments, unlike the inner product, which in the case of a complex space is linear in the first argument and conjugate linear in the second. So, to distinguish it from the inner product, we use the angular brackets. (This notation, while widely used, is far from standard. Sometimes $\langle L, v\rangle$ is used, and sometimes the angular brackets are used for the inner product. So, encountering such an expression in a text, one has to be very careful to distinguish the inner product from the action of a linear functional.)
Note also that while in the inner product both vectors belong to the same space, $v$ and $L$ above belong to different spaces: in particular, we cannot add them.
Let $A: V \to W$ be a linear transformation. The transformation $A': W' \to V'$ ($V'$ and $W'$ are the dual spaces of $V$ and $W$ respectively) such that
$$\langle A v, f\rangle = \langle v, A' f\rangle \qquad \forall v \in V,\ \forall f \in W'$$
is called the adjoint (dual) of $A$.
Of course, it is not a priori clear why the transformation $A'$ exists. Below we will show that such a transformation indeed exists and, moreover, that it is unique.
Let us first consider the case when $V = \mathbb{F}^n$, $W = \mathbb{F}^m$ ($\mathbb{F}$ here is, as usual, either $\mathbb{R}$ or $\mathbb{C}$, but everything works in the case of an arbitrary field).
As usual, we identify a vector in $\mathbb{F}^n$ with the column of its coordinates, and a linear transformation with its matrix (in the standard basis).
The dual of $\mathbb{F}^n$ is, as was discussed above, the space of rows of size $n$, so we can identify it with $\mathbb{F}^n$. Again, we will treat an element of the dual as the column vector of its coordinates.
Under these agreements we have for $v \in \mathbb{F}^n$ and $f \in (\mathbb{F}^n)'$
$$\langle v, f\rangle = f^T v,$$
where the right side is the product of matrices (of a row and a column). Then, for arbitrary $v \in \mathbb{F}^n$ and $f \in (\mathbb{F}^m)'$
$$\langle A v, f\rangle = f^T A v = (A^T f)^T v = \langle v, A^T f\rangle$$
(the expressions in the middle are products of matrices).
So we have proved that the adjoint transformation exists (and is given by the transposed matrix $A^T$). Let us show that it is unique. Assume that for some transformation $B$
$$\langle A v, f\rangle = \langle v, B f\rangle \qquad \forall v,\ \forall f.$$
That means that for arbitrary $v$ and $f$
$$f^T A v = (B f)^T v.$$
By taking for $v$ and $f$ the vectors of the standard bases in $\mathbb{F}^n$ and $\mathbb{F}^m$ respectively, we get that the matrices $A^T$ and $B$ coincide. ∎
So, for $A: \mathbb{F}^n \to \mathbb{F}^m$:
The dual transformation $A'$ exists and is unique. Moreover, its matrix (in the standard bases) equals $A^T$ (the transpose of the matrix of $A$).
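The defining identity is a one-line numerical check. A minimal sketch (added here; NumPy assumed, random data):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5))       # A : F^5 -> F^3
v = rng.normal(size=5)            # a vector in F^5
f = rng.normal(size=3)            # a functional on F^3, as a coordinate column

# <Av, f> = f^T (A v) must equal <v, A' f> = (A^T f)^T v
assert np.isclose(f @ (A @ v), (A.T @ f) @ v)
\end{verbatim}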
Now, let us consider the general case. In fact, we do not need to do much, since everything can be reduced to the case of the spaces $\mathbb{F}^n$.
Namely, let us fix bases $\mathcal{A} = \{a_1, \dots, a_n\}$ in $V$ and $\mathcal{B} = \{b_1, \dots, b_m\}$ in $W$, and let $\mathcal{A}'$ and $\mathcal{B}'$ be their dual bases (in $V'$ and $W'$ respectively). For a vector $v$ (from a space or its dual) we, as usual, denote by $[v]$ the column of its coordinates in the corresponding basis. Then for $v \in V$ and $g \in V'$
$$\langle v, g\rangle = [g]^T [v],$$
i.e. instead of working with $v$ and $g$ we can work with the columns of their coordinates (in the dual bases $\mathcal{A}$ and $\mathcal{A}'$ respectively), absolutely the same way we do in the case of $\mathbb{F}^n$. Of course, the same works for $W$, so, working with the columns of coordinates and then translating everything back to the abstract setting, we get that the dual transformation exists and is unique in this case as well.
Moreover, using the fact (which we just proved) that for $A: \mathbb{F}^n \to \mathbb{F}^m$ the matrix of $A'$ is $A^T$, we get
$$[A']_{\mathcal{A}',\mathcal{B}'} = \bigl([A]_{\mathcal{B},\mathcal{A}}\bigr)^T, \qquad (8.3.1)$$
or in plain English:
The matrix of the dual transformation in the dual bases is the transpose of the matrix of the transformation in the original bases.
Note that while we used bases to construct the dual transformation, the resulting transformation does not depend on the choice of bases.
Let us now present another, more “high brow” way of defining the dual of a linear transformation. Namely, for $f \in W'$, let us fix $f$ for a moment and treat the expression $\langle A v, f\rangle$ as a function of $v$. It is easy to see that this is a composition of two linear transformations (which ones?), and so it is a linear function of $v$, i.e. a linear functional on $V$, i.e. an element of $V'$.
Let us call this linear functional $A' f$ to emphasize the fact that it depends on $f$. Since we can do this for every $f \in W'$, we have defined a transformation $A': W' \to V'$ such that
$$\langle A v, f\rangle = \langle v, A' f\rangle \qquad \forall v \in V,\ \forall f \in W'.$$
Our next step is to show that $A'$ is a linear transformation. Note that since the transformation $A'$ was defined in a rather indirect way, we cannot see immediately from the definition that it is linear. To show the linearity of $A'$ let us take $f_1, f_2 \in W'$ and scalars $\alpha_1, \alpha_2$. For $v \in V$
$$\langle v, A'(\alpha_1 f_1 + \alpha_2 f_2)\rangle = \langle A v, \alpha_1 f_1 + \alpha_2 f_2\rangle = \alpha_1 \langle A v, f_1\rangle + \alpha_2 \langle A v, f_2\rangle = \langle v, \alpha_1 A' f_1 + \alpha_2 A' f_2\rangle.$$
Since this identity is true for all $v$, we conclude that $A'(\alpha_1 f_1 + \alpha_2 f_2) = \alpha_1 A' f_1 + \alpha_2 A' f_2$, i.e. that $A'$ is linear.
The main advantage of this approach is that it does not require a basis, so it can be (and is) used in the infinite-dimensional situation. However, the proof presented above in the previous subsections gives a constructive way to compute the dual transformation, so we used that proof instead of the more general coordinate-free one.
Note that the above coordinate-free approach can be used to define the Hermitian adjoint of an operator in an inner product space. The only addition to the reasoning presented above will be the use of the Riesz Representation Theorem (Theorem 8.2.1). We leave the details as an exercise for the reader, see Problem 8.3.2 below.
Let $V$ be a vector space and let $E \subset V$. The annihilator of $E$, denoted by $E^\perp$, is the set of all linear functionals $L \in V'$ such that $L(v) = 0$ for all $v \in E$.
Using the fact that $V''$ is canonically isomorphic to $V$ (see Section 8.1.2), we say that for $E \subset V'$ its annihilator $E^\perp$ consists of all vectors $v \in V$ such that $L(v) = 0$ for all $L \in E$.
Formally speaking, for $E \subset V'$ the set $E^\perp$ should be defined as the set of all $\Phi \in V''$ such that $\Phi(L) = 0$ for all $L \in E$; a separate symbol, for example $E_\perp$, is often used for the annihilator from the second part of Definition 8.3.4. However, because of the natural isomorphism of $V$ and $V''$ there is no real difference between these two cases, so we will always use $E^\perp$.
Distinguishing the cases $E \subset V$ and $E \subset V'$ makes a lot of sense in the infinite-dimensional situation, where $V$ is not always canonically isomorphic to $V''$.
The spaces $V$ that are canonically isomorphic to $V''$ have a special name: they are called reflexive spaces.
Let $E$ be a subspace of $V$. Then
$$(E^\perp)^\perp = E.$$
This proposition looks exactly like Proposition 5.3.6 from Chapter 5. However, its proof is a bit more complicated, since the suggested proof of Proposition 5.3.6 from Chapter 5 heavily used the inner product space structure: it used the decomposition $V = E \oplus E^\perp$, which is not true in our situation because, for example, $E$ and $E^\perp$ are in different spaces.
Let $b_1, b_2, \dots, b_r$ be a basis in $E$ (recall that all spaces in this chapter are assumed to be finite-dimensional), so $\dim E = r$. (A way to complete the proof is outlined in the problems at the end of this section.)
Let $A: V \to W$ be an operator acting from one vector space to another. Then
1. $\operatorname{Ker} A' = (\operatorname{Ran} A)^\perp$;
2. $\operatorname{Ker} A = (\operatorname{Ran} A')^\perp$;
3. $\operatorname{Ran} A' = (\operatorname{Ker} A)^\perp$;
4. $\operatorname{Ran} A = (\operatorname{Ker} A')^\perp$.
First of all, let us notice that since for a subspace $E$ we have $(E^\perp)^\perp = E$, statements 1 and 3 are equivalent. Similarly, for the same reason, statements 2 and 4 are equivalent as well. Finally, statement 2 is exactly statement 1 applied to the operator $A'$ (here we use the trivial fact that $A'' = A$, which is true, for example, because of the corresponding fact for the transpose: $(A^T)^T = A$).
So, to prove the theorem we only need to prove statement 1.
Recall that $\langle A v, f\rangle = \langle v, A' f\rangle$. The inclusion $f \in (\operatorname{Ran} A)^\perp$ means that $f$ annihilates all vectors of the form $A v$, i.e. that
$$\langle A v, f\rangle = 0 \qquad \forall v \in V.$$
Since $\langle A v, f\rangle = \langle v, A' f\rangle$, the last identity is equivalent to
$$\langle v, A' f\rangle = 0 \qquad \forall v \in V.$$
But that means that $A' f = 0$ ($A' f$ is the zero functional).
So we have proved that $f \in (\operatorname{Ran} A)^\perp$ iff $A' f = 0$, or equivalently iff $f \in \operatorname{Ker} A'$. ∎
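Statement 1 is easy to probe numerically: in standard coordinates $A' = A^T$, and the annihilator of $\operatorname{Ran} A$ consists of the (coordinate columns of) functionals $f$ with $f^T A v = 0$ for all $v$. A hedged sketch, added here for illustration (NumPy assumed; the null space is extracted via the SVD):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 7))   # A: F^7 -> F^5, rank 3

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # rank of A
ker_At = U[:, r:]                     # columns span Ker A^T = Ker A'
# each such functional annihilates Ran A, i.e. every column of A:
assert np.allclose(ker_At.T @ A, 0)
# and the dimensions agree: dim Ker A' = dim W - rank A
assert ker_At.shape[1] == A.shape[0] - r
\end{verbatim}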
Prove that if for linear transformations $A, B: V \to W$
$$\langle A v, f\rangle = \langle B v, f\rangle$$
for all $v \in V$ and for all $f \in W'$, then $A = B$.
Probably one of the easiest ways of proving this is to use Lemma 8.1.3.
The next problem gives a way to prove Proposition 8.3.6.
Let $b_1, b_2, \dots, b_n$ be a basis in $V$ and let $b'_1, b'_2, \dots, b'_n$ be its dual basis. Let $E = \operatorname{span}\{b_1, b_2, \dots, b_r\}$, $r < n$. Prove that $E^\perp = \operatorname{span}\{b'_{r+1}, \dots, b'_n\}$.
Use the previous problem to prove that for a subspace $E \subset V$
$$(E^\perp)^\perp = E.$$
We know that the dual space $V'$ has the same dimension as $V$, so the space and its dual are isomorphic. So one might think that there is really no difference between the space and its dual. However, as we discussed above in Section 8.1.1, when we change a basis in the space $V$, the coordinates in $V$ and in $V'$ change according to different rules, see formula (8.1.1) above.
On the other hand, using the natural isomorphism between $V$ and $V''$ we can say that $V$ is the dual of $V'$. From this point of view, there is no difference between $V$ and $V'$: we can start from $V$ and say that $V'$ is its dual, or we can do it the other way around and start from $V'$.
We already used this point of view above, for example in the proof of Theorem 8.3.7.
Note also that the change of coordinates formula (8.1.1) (see also the boxed statement below it) agrees with this point of view: if $T = (S^T)^{-1}$, then $S = (T^T)^{-1}$, so we get the change of coordinates formula in $V = V''$ from the one in $V'$ by the same rule!
There are infinitely many possibilities to define an isomorphism between $V$ and $V'$.
If $V = \mathbb{F}^n$ then the most natural way to identify $V$ and $V'$ is to identify the standard basis in $V$ with the dual one in $V'$. In this case the action of a linear functional will be given by the “inner product type” expression
$$\langle v, f\rangle = \sum_{k=1}^n v_k f_k.$$
To generalize this to the general case, one has to fix a basis $b_1, \dots, b_n$ in $V$, consider the dual basis $b'_1, \dots, b'_n$ in $V'$, and define an isomorphism by $b_k \mapsto b'_k$, $k = 1, 2, \dots, n$.
This isomorphism is natural in some sense, but it depends on the choice of the basis, so in general there is no natural way to identify $V$ and $V'$.
The exception is the case when $V$ is a real inner product space: the Riesz representation theorem (Theorem 8.2.1) gives a natural way to identify a linear functional with a vector in $V$. Note that this approach works only for real inner product spaces: in the complex case the Riesz representation theorem gives a natural identification of $V'$ and $V$, but this identification is not linear, only conjugate linear.
To illustrate the relations between vectors and linear functionals, let us consider an example from multivariable calculus, which gives rise to important ideas like tangent and cotangent bundles in differential geometry.
Let us recall the notion of the path integral (of the second kind) from calculus. Recall that a path in $\mathbb{R}^n$ is defined by its parameterization, i.e. by a function
$$x(t) = (x_1(t), x_2(t), \dots, x_n(t))^T$$
acting from an interval $[a, b]$ to $\mathbb{R}^n$. If $F$ is the so-called differential form (differential $1$-form),
$$F = F_1\, dx_1 + F_2\, dx_2 + \dots + F_n\, dx_n,$$
the path integral
$$\int_\gamma F_1\, dx_1 + F_2\, dx_2 + \dots + F_n\, dx_n$$
is computed by substituting $x = x(t)$ into the expression, i.e. it is computed as
$$\int_a^b \bigl( F_1(x(t))\, x_1'(t) + F_2(x(t))\, x_2'(t) + \dots + F_n(x(t))\, x_n'(t) \bigr)\, dt.$$
In other words, at each moment $t$ we have to evaluate the velocity
$$x'(t) = (x_1'(t), x_2'(t), \dots, x_n'(t))^T,$$
apply to it the linear functional $F_1\, dx_1 + \dots + F_n\, dx_n$ (here $F_k = F_k(x(t))$, but for a fixed $t$ each $F_k$ is just a number, so we simply write $F_k$), and then integrate the result (which depends on $t$) with respect to $t$.
Let us fix $t$ and analyze the objects $x'(t)$ and $F$. We will show that, according to the rules of Calculus, the coordinates of $x'(t)$ change as the coordinates of a vector, and the coordinates of $F$ as the coordinates of a linear functional (covector). Let us assume, as is customary in Calculus, that $x_1, x_2, \dots, x_n$ are the coordinates in the standard basis in $\mathbb{R}^n$, and let $b_1, \dots, b_n$ be a different basis in $\mathbb{R}^n$. We will use the notation $\tilde x_1, \tilde x_2, \dots, \tilde x_n$ to denote the coordinates of a vector $x$ in the basis $b_1, \dots, b_n$, i.e. $x = \sum_{k=1}^n \tilde x_k b_k$.
Let $S$ be the change of coordinates matrix (from the old coordinates to the new ones), so the new coordinates are expressed in terms of the old ones as
$$\tilde x = S x.$$
So the new coordinates of the velocity $x'(t)$ are obtained from its old coordinates as
$$\widetilde{x'}(t) = S x'(t).$$
Let us now calculate the differential form
$$F = F_1\, dx_1 + F_2\, dx_2 + \dots + F_n\, dx_n \qquad (8.4.1)$$
in terms of the new coordinates $\tilde x_1, \dots, \tilde x_n$. The change of coordinates matrix from the new coordinates to the old ones is $S^{-1}$. Let $\{s_{j,k}\}_{j,k=1}^n$ be the entries of $S^{-1}$, so
$$x_j = \sum_{k=1}^n s_{j,k}\, \tilde x_k \qquad \text{and} \qquad dx_j = \sum_{k=1}^n s_{j,k}\, d\tilde x_k.$$
Substituting this into (8.4.1) we get
$$F = \sum_{j=1}^n F_j\, dx_j = \sum_{j=1}^n F_j \sum_{k=1}^n s_{j,k}\, d\tilde x_k = \sum_{k=1}^n \widetilde F_k\, d\tilde x_k,$$
where
$$\widetilde F_k = \sum_{j=1}^n s_{j,k}\, F_j.$$
But that is exactly the change of coordinates rule for the dual space: in matrix form, $\widetilde F = (S^{-1})^T F$, cf. (8.1.1). So,
according to the rules of Calculus, the coefficients of a differential $1$-form change by the same rule as the coordinates in the dual space.
So, according to the accepted rules of Calculus, the coordinates of the velocity change as the coordinates of a vector, and the coefficients (coordinates) of a differential $1$-form change as the entries of a linear functional. In differential geometry the set of all velocities is called the tangent space, and the set of all differential $1$-forms is its dual and is called the cotangent space.
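The two transformation rules fit together so that the integrand of the path integral is coordinate-independent. A hedged numerical sketch (added for illustration; NumPy assumed, made-up data):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
n = 3
S = rng.normal(size=(n, n))         # change of coordinates: x_new = S @ x_old

xdot = rng.normal(size=n)           # a velocity (vector) at a fixed t
F = rng.normal(size=n)              # coefficients of a 1-form (covector)

xdot_new = S @ xdot                 # vectors transform with S
F_new = np.linalg.solve(S.T, F)     # covectors transform with (S^T)^{-1}

# the pairing F(x') -- the integrand of the path integral -- is invariant:
assert np.isclose(F @ xdot, F_new @ xdot_new)
\end{verbatim}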
As we discussed above, in differential geometry vectors are represented by velocities, i.e. by the derivatives $x'(t)$. This is a simple and intuitively clear point of view, but sometimes it is viewed as a bit naïve.
A more “highbrow” point of view, also used in differential geometry (although in more advanced texts), is that vectors are represented by differential operators
$$D = \sum_{k=1}^n c_k \frac{\partial}{\partial x_k}. \qquad (8.4.2)$$
The informal reason for that is the following. Suppose we want to compute the derivative of a function $f$ of $n$ variables along the path given by the function $x(t)$, i.e. the derivative
$$\frac{d}{dt} f(x(t)).$$
By the Chain Rule, at a given time $t = t_0$
$$\frac{d}{dt} f(x(t)) \Big|_{t=t_0} = \sum_{k=1}^n \frac{\partial f}{\partial x_k}\, x_k'(t_0) = D f,$$
where the differential operator $D$ is given by (8.4.2) with $c_k = x_k'(t_0)$.
Of course, we need to show that the coefficients of such a differential operator change according to the change of coordinates rule for vectors. This is intuitively clear, and it can easily be shown using the multivariable Chain Rule. We leave this as an exercise for the reader, see Problem 8.4.1 below.
As we already discussed above, it follows from the Riesz Representation Theorem (Theorem 8.2.1) that a real inner product space and its dual are canonically isomorphic. Thus we can say that vectors and functionals live in the same space, which makes things both simpler and more confusing.
First of all, let us note that if the change of coordinates matrix $S$ is orthogonal ($S^{-1} = S^T$), then $(S^T)^{-1} = S$. Therefore, for an orthogonal change of coordinates matrix the coordinates of a vector and of a linear functional change according to the same rule, so one cannot really see a difference between a vector and a functional.
The change of coordinates matrix is orthogonal, for example, if we change from one orthonormal basis to another.
Let $b_1, b_2, \dots, b_n$ be a basis in a real inner product space $V$, and let $b'_1, b'_2, \dots, b'_n$ be the dual basis (we identify the dual space $V'$ with $V$ via the Riesz Representation Theorem, so the $b'_k$ can be assumed to be in $V$).
Here we present the notation standard in differential geometry (the so-called Einstein notation) for working with coordinates in these bases. Since we will only be working with coordinates, we can assume that we are working in the space $\mathbb{R}^n$ with a non-standard inner product defined by the positive definite matrix $G = \{g_{j,k}\}_{j,k=1}^n$, $g_{j,k} = (b_k, b_j)$, which is often called the metric tensor:
$$(x, y) = \sum_{j=1}^n \sum_{k=1}^n g_{j,k}\, x^k y^j. \qquad (8.4.3)$$
To distinguish between vectors and linear functionals (covectors), it is agreed to write the coordinates of a vector with the indices as superscripts, and the coordinates of a linear functional with the indices as subscripts: thus $x^k$, $k = 1, 2, \dots, n$, denote the coordinates of a vector $x$, and $f_k$, $k = 1, 2, \dots, n$, denote the coordinates of a linear functional $f$.
Putting indices as superscripts can be confusing, since one needs to distinguish them from powers. However, this is a standard and widely used notation, so we need to get acquainted with it. While I personally, like a lot of mathematicians, prefer coordinate-free notation, all final computations are done in coordinates, so coordinate notation has to be used. And as far as coordinate notations go, you will see that this one is quite convenient to work with.
Another convention of the Einstein notation is that whenever in a product the same index appears both as a subscript and as a superscript, one sums over this index. Thus $f_k x^k$ means $\sum_{k=1}^n f_k x^k$, so we can write $f(x) = f_k x^k$. The same convention holds when we have more than one index of summation, so (8.4.3) can be rewritten in this notation as
$$(x, y) = g_{j,k}\, x^k y^j \qquad (8.4.4)$$
(mathematicians are lazy and always try to avoid writing extra symbols whenever they can).
Finally, the last convention of the Einstein notation is the preservation of the position of the indices: if we do not sum over an index, it remains in the same position (subscript or superscript) as it was before. Thus we can write $y^j = a^j_k x^k$, but not $y_j = a^j_k x^k$, because the free index $j$ must remain as a superscript.
Note that to compute the inner product of vectors, knowing their coordinates is not sufficient: one also needs to know the matrix $G$ (which is often called the metric tensor). This agrees with the Einstein notation: if we try to write $(x, y)$ as the standard inner product, the expression $x^k y^k$ means just the product of coordinates, since for the summation we need the same index both as a subscript and as a superscript. The expression (8.4.4), on the other hand, fits this convention perfectly.
Let us recall that we have a basis $b_1, b_2, \dots, b_n$ in a real inner product space $V$, and that $b'_1, b'_2, \dots, b'_n$ is its dual basis (we identify $V$ with its dual via the Riesz Representation Theorem, so the $b'_k$ are in $V$).
Given a vector $v$, it can be represented as
$$v = \sum_{k=1}^n v^k b_k, \qquad v^k = (v, b'_k), \qquad (8.4.5)$$
$$v = \sum_{k=1}^n v_k b'_k, \qquad v_k = (v, b_k). \qquad (8.4.6)$$
The coordinates $v_k$ are called the covariant coordinates of the vector $v$, and the coordinates $v^k$ are called the contravariant coordinates.
Now let us ask ourselves a question: how can one get covariant coordinates of a vector from the contravariant ones?
According to the Einstein notation, we use the contravariant coordinates when working with vectors, and the covariant ones when we interpret a vector as a linear functional. We know (see (8.4.6)) that $v_k = (v, b_k)$, so
$$v_k = (v, b_k) = \Bigl(\sum_{j=1}^n v^j b_j,\ b_k\Bigr) = \sum_{j=1}^n g_{k,j}\, v^j,$$
or in the Einstein notation
$$v_k = g_{k,j}\, v^j.$$
In other words,
the metric tensor $G$ is the change of coordinates matrix from the contravariant coordinates $v^k$ to the covariant ones $v_k$.
The operation of getting the covariant coordinates from the contravariant ones is called lowering of the indices.
Note the following interpretation of formula (8.4.4) for the inner product: as we know, for the vector $y$ we get its covariant coordinates as $y_k = g_{k,j}\, y^j$, so $(x, y) = g_{j,k}\, x^k y^j = x^k y_k$. Similarly, because $G$ is symmetric, we can say that $x_j = g_{j,k}\, x^k$ and that $(x, y) = x_j y^j$. In other words,
to compute the inner product of two vectors, one first needs to use the metric tensor to lower the indices of one vector, and then, treating this vector as a functional, compute its value on the other vector.
Of course, we can also change from the covariant coordinates to the contravariant ones (raise the indices). Since
$$v_k = g_{k,j}\, v^j,$$
we get that
$$v^k = g^{k,j}\, v_j,$$
where $\{g^{j,k}\}_{j,k=1}^n$ are the entries of $G^{-1}$; so the change of coordinates matrix in this case is $G^{-1}$.
Since, as we know, the change of coordinates matrix is the metric tensor, we can immediately conclude that $G^{-1}$ is the metric tensor in the covariant coordinates, i.e. that
$$(x, y) = g^{j,k}\, x_k y_j.$$
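Lowering and raising indices, and formula (8.4.4), can all be checked in a few lines. A minimal sketch (added for illustration, not part of the original text; NumPy assumed, with a made-up basis of $\mathbb{R}^2$):
\begin{verbatim}
import numpy as np

B = np.array([[2., 1.],
              [0., 1.]])               # columns b_1, b_2: a basis of R^2
G = B.T @ B                            # metric tensor g_{jk} = (b_k, b_j)

x_contra = np.array([1., 2.])          # contravariant coordinates of x
y_contra = np.array([-1., 3.])

x_cov = G @ x_contra                   # lowering: x_k = g_{kj} x^j
assert np.allclose(np.linalg.solve(G, x_cov), x_contra)  # raising with G^{-1}

# (x, y) = g_{jk} x^k y^j = x^k y_k, and it agrees with the standard
# inner product of the vectors written in standard coordinates:
x, y = B @ x_contra, B @ y_contra
assert np.isclose(x_contra @ (G @ y_contra), x @ y)
assert np.isclose((G @ x_contra) @ y_contra, x @ y)
\end{verbatim}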
Note that if one looks at the big picture, the covariant and contravariant coordinates are completely interchangeable: it is just a matter of which one of the bases in the dual pair $b_1, \dots, b_n$ and $b'_1, \dots, b'_n$ we assign to be the “primary” one and which one to be the dual.
What to choose as the “primary” object, and what as the “dual” one, depends mostly on accepted conventions.
The Einstein notation is usually used in differential, and especially Riemannian, geometry, where vectors are identified with velocities and covectors (linear functionals) with differential $1$-forms, see Section 8.4.2 above. Vectors and covectors here are clearly different objects and form what are called the tangent and cotangent spaces respectively.
In Riemannian geometry one then introduces an inner product (i.e. the metric tensor, if one thinks in terms of coordinates) on the tangent space, which allows us to identify vectors and covectors (linear functionals). In the coordinate representation this identification is done by lowering/raising the indices, as described above.
Let us summarize the above discussion on whether or not a space is different from its dual.
In short, the answer is “Yes”, they are different objects. Although in the finite-dimensional case, which is treated in this book, they are isomorphic, nothing is usually gained from the identification of a space and its dual.
Even in the simplest case of $\mathbb{R}^n$ it is useful to think that the elements of $\mathbb{R}^n$ are columns and the elements of its dual are rows (even though, when doing manipulations with the elements of the dual space, we often put the rows vertically). More striking are the examples considered in Section 8.1.4 dealing with the Taylor formula and the Lagrange interpolation. One can clearly see there that the linear functionals are indeed completely different objects than the polynomials, and that hardly anything can be gained by identifying the functionals with polynomials.
For inner product spaces the situation is different, because such spaces can be canonically identified with their duals. This identification is linear for real inner product spaces, so a real inner product space is canonically isomorphic to its dual. In the case of complex spaces, this identification is only conjugate linear, but it is nevertheless very helpful to identify a linear functional with a vector and use the inner product space structure and ideas like orthogonality, self-adjointness, orthogonal projections, etc.
However, sometimes even in the case of real inner product spaces it is more natural to consider the space and its dual as different objects. For example, in Riemannian geometry, see Remark 8.4.1 above, vectors and covectors come from different objects, velocities and differential $1$-forms respectively. Even though the introduction of the metric tensor allows us to identify vectors and covectors, it is sometimes more convenient to remember their origins and think of them as different objects.
Let $D$ be a differential operator
$$D = \sum_{k=1}^n c_k \frac{\partial}{\partial x_k}.$$
Show, using the Chain Rule, that if we change a basis and write $D$ in the new coordinates, its coefficients $c_k$ change according to the change of coordinates rule for vectors.
Let $V_1, V_2, \dots, V_p$ and $W$ be vector spaces (over the same field $\mathbb{F}$). A multilinear ($p$-linear) map with values in $W$ is a function $F(v_1, v_2, \dots, v_p)$ of $p$ vector variables $v_k \in V_k$, $k = 1, 2, \dots, p$, with the target space $W$, which is linear in each variable $v_k$. In other words, it means that if we fix all variables except $v_k$, we get a linear map, and this should be true for all $k$. We will use the symbol $\mathcal{L}(V_1, V_2, \dots, V_p; W)$ for the set of all such multilinear maps.
If the target space $W$ is the field of scalars $\mathbb{F}$, we call the multilinear map a multilinear functional, or tensor. The number $p$ of variables is called the valency of the multilinear functional (tensor). Thus, a tensor of valency $1$ is a linear functional, and a tensor of valency $2$ is called a bilinear form.
Let $f_k \in V_k'$, $k = 1, 2, \dots, p$. Define a polylinear functional $f_1 \otimes f_2 \otimes \dots \otimes f_p$ by multiplying the functionals $f_k$:
$$(f_1 \otimes f_2 \otimes \dots \otimes f_p)(v_1, v_2, \dots, v_p) = f_1(v_1)\, f_2(v_2) \cdots f_p(v_p) \qquad (8.5.1)$$
for $v_k \in V_k$, $k = 1, 2, \dots, p$. The polylinear functional $f_1 \otimes f_2 \otimes \dots \otimes f_p$ is called the tensor product of the functionals $f_1, f_2, \dots, f_p$.
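Definition (8.5.1) translates directly into code. A minimal sketch (added for illustration; the helper name tensor_product is hypothetical, NumPy assumed; functionals are represented by their coordinate rows):
\begin{verbatim}
import numpy as np

def tensor_product(*functionals):
    """(f1 (x) ... (x) fp)(v1, ..., vp) = f1(v1) * ... * fp(vp),
    as in (8.5.1); each f_k is a coordinate row, each v_k a column."""
    def F(*vectors):
        return np.prod([f @ v for f, v in zip(functionals, vectors)])
    return F

f1, f2 = np.array([1., 2.]), np.array([0., 1., -1.])
F = tensor_product(f1, f2)
v1, v2 = np.array([3., 1.]), np.array([2., 2., 1.])
assert np.isclose(F(v1, v2), (f1 @ v1) * (f2 @ v2))
\end{verbatim}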
Notice that in the space $\mathcal{L}(V_1, V_2, \dots, V_p; W)$ one can introduce the natural operations of addition and multiplication by a scalar,
$$(F_1 + F_2)(v_1, \dots, v_p) = F_1(v_1, \dots, v_p) + F_2(v_1, \dots, v_p),$$
$$(\alpha F)(v_1, \dots, v_p) = \alpha\, F(v_1, \dots, v_p),$$
where $v_k \in V_k$, $k = 1, 2, \dots, p$.
Equipped with these operations, the space $\mathcal{L}(V_1, V_2, \dots, V_p; W)$ is a vector space.
To see that, we first need to show that $F_1 + F_2$ and $\alpha F$ are multilinear maps. Since “multilinear” means linear in each argument separately (with all the other variables fixed), this follows from the corresponding fact about linear transformations, namely from the fact that the sum of linear transformations and a scalar multiple of a linear transformation are linear transformations, cf. Section 1.4 of Chapter 1.
Then it is easy to show that $\mathcal{L}(V_1, \dots, V_p; W)$ satisfies all the axioms of a vector space; one just needs to use the fact that $W$ satisfies these axioms. We leave the details as an exercise for the reader, who can look at Section 1.4 of Chapter 1, where it was shown that the set of linear transformations satisfies axiom 7. Literally the same proof works for multilinear maps; the proof that all the other axioms are also satisfied is very similar.
Let $\mathcal{B}_k$ be a basis in the space $V_k$, $\dim V_k = n_k$, $k = 1, 2, \dots, p$. Since a linear transformation is defined by its action on a basis, a multilinear map $F$ is defined by its values on all tuples
$$(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p}), \qquad b^k_{j_k} \in \mathcal{B}_k,\ 1 \le j_k \le n_k.$$
Since there are exactly
$$n_1 n_2 \cdots n_p$$
such tuples, and each value is determined by $\dim W$ coordinates (in some basis in $W$), we can conclude that $F$ is determined by $n_1 n_2 \cdots n_p \dim W$ entries. In other words,
$$\dim \mathcal{L}(V_1, V_2, \dots, V_p; W) = n_1 n_2 \cdots n_p \dim W;$$
in particular, if the target space is the field of scalars $\mathbb{F}$ (i.e. if we are dealing with multilinear functionals),
$$\dim \mathcal{L}(V_1, V_2, \dots, V_p; \mathbb{F}) = n_1 n_2 \cdots n_p.$$
It is easy to find a basis in the space of multilinear functionals. Namely, for $k = 1, 2, \dots, p$ let the system $b^k_1, b^k_2, \dots, b^k_{n_k}$ be a basis in $V_k$ and let ${b'}^k_1, {b'}^k_2, \dots, {b'}^k_{n_k}$ be its dual system in $V_k'$.
The system
$${b'}^1_{j_1} \otimes {b'}^2_{j_2} \otimes \dots \otimes {b'}^p_{j_p}, \qquad 1 \le j_k \le n_k,\ k = 1, 2, \dots, p, \qquad (8.5.5)$$
is a basis in the space $\mathcal{L}(V_1, V_2, \dots, V_p; \mathbb{F})$ of multilinear functionals.
Here $\otimes$ is the tensor product of functionals, as defined in (8.5.1).
Let $V_1, V_2, \dots, V_p$ be vector spaces. The tensor product
$$V_1 \otimes V_2 \otimes \dots \otimes V_p$$
of the spaces $V_1, V_2, \dots, V_p$ is simply the set $\mathcal{L}(V_1', V_2', \dots, V_p'; \mathbb{F})$ of multilinear functionals; here $V_k'$ is the dual of $V_k$.
Here we treat a vector $v_k \in V_k$ as a linear functional on $V_k'$; the tensor product $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is then defined according to (8.5.1).
The tensor product $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is clearly linear in each argument $v_k$. In other words, the map
$$(v_1, v_2, \dots, v_p) \mapsto v_1 \otimes v_2 \otimes \dots \otimes v_p$$
is a multilinear map with values in $V_1 \otimes V_2 \otimes \dots \otimes V_p$. We leave the proof as an exercise for the reader, see Problem 8.5.1 below.
Note that the set of tensor products $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is a proper subset of $V_1 \otimes V_2 \otimes \dots \otimes V_p$, see Problem 8.5.2 below.
For any multilinear map $F \in \mathcal{L}(V_1, V_2, \dots, V_p; W)$ there exists a unique linear transformation $T: V_1 \otimes V_2 \otimes \dots \otimes V_p \to W$ extending $F$, i.e. such that
$$T(v_1 \otimes v_2 \otimes \dots \otimes v_p) = F(v_1, v_2, \dots, v_p) \qquad (8.5.6)$$
for all choices of the vectors $v_k \in V_k$, $k = 1, 2, \dots, p$.
If $T: V_1 \otimes V_2 \otimes \dots \otimes V_p \to W$ is a linear transformation, then trivially the function
$$F(v_1, v_2, \dots, v_p) := T(v_1 \otimes v_2 \otimes \dots \otimes v_p)$$
is a multilinear map of the variables $v_1, \dots, v_p$. This follows immediately from the fact that the expression $v_1 \otimes v_2 \otimes \dots \otimes v_p$ is linear in each variable $v_k$.
Define $T$ on the basis of $V_1 \otimes V_2 \otimes \dots \otimes V_p$ consisting of the tensor products $b^1_{j_1} \otimes b^2_{j_2} \otimes \dots \otimes b^p_{j_p}$ of the basis vectors (cf. (8.5.5)) by
$$T(b^1_{j_1} \otimes b^2_{j_2} \otimes \dots \otimes b^p_{j_p}) = F(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p}),$$
and then extend it by linearity to the whole space $V_1 \otimes V_2 \otimes \dots \otimes V_p$. To complete the proof we need to show that (8.5.6) holds for all choices of the vectors $v_k \in V_k$ (we now know it only when each $v_k$ is one of the basis vectors $b^k_j$).
To prove that, let us decompose the vectors $v_k$ as
$$v_k = \sum_{j=1}^{n_k} \alpha^k_j\, b^k_j.$$
Using linearity in each variable we get
$$F(v_1, \dots, v_p) = \sum_{j_1, \dots, j_p} \alpha^1_{j_1} \cdots \alpha^p_{j_p}\, F(b^1_{j_1}, \dots, b^p_{j_p}) = \sum_{j_1, \dots, j_p} \alpha^1_{j_1} \cdots \alpha^p_{j_p}\, T(b^1_{j_1} \otimes \dots \otimes b^p_{j_p}) = T(v_1 \otimes \dots \otimes v_p),$$
so by the definition of $T$ identity (8.5.6) holds. ∎
As one can easily see, the dual of the tensor product $V_1 \otimes V_2 \otimes \dots \otimes V_p$ is the tensor product of the dual spaces, $V_1' \otimes V_2' \otimes \dots \otimes V_p'$.
Indeed, by Proposition 8.5.4 and the remark after it, there is a natural one-to-one correspondence between the multilinear functionals in $\mathcal{L}(V_1, \dots, V_p; \mathbb{F})$ (i.e. the elements of $V_1' \otimes \dots \otimes V_p'$) and the linear transformations $V_1 \otimes \dots \otimes V_p \to \mathbb{F}$ (i.e. the elements of the dual of $V_1 \otimes \dots \otimes V_p$).
Let $V_1, V_2, \dots, V_p$ be vector spaces, and let $X_k$ be either $V_k$ or $V_k'$, $k = 1, 2, \dots, p$. For a multilinear functional $F$ on $X_1 \times X_2 \times \dots \times X_p$ we say that it is covariant in the variable $v_k$ if $X_k = V_k$, and contravariant in this variable if $X_k = V_k'$.
If a multilinear functional is covariant (contravariant) in all variables, we say that it is covariant (contravariant). In general, if a functional is covariant in $r$ variables and contravariant in $s$ variables, we say that it is $r$-covariant $s$-contravariant (or simply an $(r, s)$ multilinear functional, or that its valency is $(r, s)$).
Thus, a linear functional can be interpreted as a $1$-covariant tensor (recall that we use the word tensor in the case of functionals, i.e. when the target space is the field of scalars $\mathbb{F}$). By duality, a vector $v \in V$ can be interpreted as a $1$-contravariant tensor.
At first the terminology might look a bit confusing: if a variable is a vector (not a functional), it goes into a covariant slot, yet it is a contravariant object (tensor). But notice that we did not say here “covariant variable”: we said that if $X_k = V_k$, then the multilinear functional is covariant in the variable $v_k$. So, by the covariant variable we mean not the vector $v_k$, but the “slot” in the tensor where we put it!
So there is no contradiction: we put the contravariant objects (tensors) into covariant slots and vice versa.
Sometimes, slightly abusing the language, people talk about covariant (contravariant) variables or arguments. But it is usually meant that the corresponding “slots” in the tensor are covariant (contravariant), and not the variables as objects.
A linear transformation $T: V \to W$ can be interpreted as a $1$-covariant $1$-contravariant tensor. Namely, the bilinear functional $F$ on $V \times W'$,
$$F(v, f) = \langle T v, f\rangle, \qquad v \in V,\ f \in W',$$
is covariant in the first variable $v$ and contravariant in the second one, $f$.
Conversely,
Given a $1$-covariant $1$-contravariant tensor $F$ (i.e. a bilinear functional on $V \times W'$), there exists a unique linear transformation $T: V \to W$ such that
$$F(v, f) = \langle T v, f\rangle \qquad (8.5.7)$$
for all $v \in V$, $f \in W'$.
First of all, note that the uniqueness is a trivial corollary of Lemma 8.1.3, cf. Problem 8.3.1 above. So we only need to prove the existence of $T$.
Let $a_1, a_2, \dots, a_n$ be a basis in $V$, and let $b'_1, b'_2, \dots, b'_m$ be the dual basis in $W'$ to a basis $b_1, b_2, \dots, b_m$ in $W$. Then define the matrix $\{T_{j,k}\}$ by
$$T_{j,k} = F(a_k, b'_j).$$
Define $T$ to be the operator with this matrix. Clearly (see Remark 8.1.5)
$$\langle T a_k, b'_j\rangle = T_{j,k} = F(a_k, b'_j), \qquad (8.5.8)$$
which implies equality (8.5.7). This can easily be seen by decomposing $v = \sum_k \alpha_k a_k$, $f = \sum_j \beta_j b'_j$ and using the linearity in each argument.
Another, more “high brow” explanation is that the tensors on the left and right sides of (8.5.7) coincide on a basis in $V \otimes W'$ (see Remark 8.5.3 about the basis), so they coincide. To be more precise, one should lift the bilinear functionals to linear transformations (functionals) on $V \otimes W'$ (see Proposition 8.5.4), and since these transformations coincide on a basis, they are equal.
One can also give an alternative, coordinate-free proof of the existence of $T$, along the lines of the coordinate-free definition of the dual transformation (see Section 8.3.1). Namely, if we fix $v \in V$, the function $f \mapsto F(v, f)$ is linear in $f$, so it is a linear functional on $W'$, i.e. a vector in $W'' = W$.
Note that we also can say that the functional $F$ from Proposition 8.5.5 defines not the transformation $T$, but its adjoint. A priori, without assuming anything (like the order of variables and its interpretation), we cannot distinguish between a transformation and its adjoint.
Note that if we would like to follow the Einstein notation, the entries of the matrix of the transformation $T$ should be written as $t^j_k$. Then if $x^k$, $k = 1, 2, \dots, n$, are the coordinates of the vector $x$, the $j$th coordinate of $T x$ is given by
$$(T x)^j = t^j_k\, x^k.$$
Recall that here we skip the sign of summation, but we mean the sum over $k$. Note also that we preserve the positions of the indices, so the index $j$ stays upstairs. The index $k$ does not appear on the left side of the equation, because we sum over this index on the right side, and it gets “killed”.
Similarly, if $f_j$, $j = 1, 2, \dots, m$, are the coordinates of a linear functional $f$, then the $k$th coordinate of $T' f$ is given by
$$(T' f)_k = t^j_k\, f_j$$
(again, skipping the sign of summation over $j$). Again, we preserve the positions of the indices, so the index $k$ in $(T' f)_k$ is a subscript.
Note that since $x$ and $T x$ are vectors, according to the conventions of the Einstein notation the indices of their coordinates indeed should be written as superscripts.
Similarly, $f$ and $T' f$ are covectors, so the indices of their coordinates should be written as subscripts.
The Einstein notation emphasizes the fact, mentioned in the previous remark, that a $1$-covariant $1$-contravariant tensor gives us both a linear transformation and its adjoint: the expression $t^j_k x^k$ gives the action of $T$, and $t^j_k f_j$ gives the action of its adjoint $T'$.
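The Einstein summation convention maps directly onto NumPy's einsum. A minimal sketch (added here for illustration; NumPy assumed, made-up data) showing both contractions of the same array of entries $t^j_k$:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
t = rng.normal(size=(3, 4))         # t[j, k] = t^j_k : V = F^4 -> W = F^3
x = rng.normal(size=4)              # x^k, a vector in V
f = rng.normal(size=3)              # f_j, a covector on W

y = np.einsum('jk,k->j', t, x)      # (Tx)^j  = t^j_k x^k   (action of T)
g = np.einsum('jk,j->k', t, f)      # (T'f)_k = t^j_k f_j   (action of T')
assert np.allclose(y, t @ x) and np.allclose(g, t.T @ f)
\end{verbatim}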
More generally, any polylinear transformation can be interpreted as a tensor. Namely, given a polylinear transformation $T \in \mathcal{L}(X_1, X_2, \dots, X_p; W)$ one can define the tensor $F$ (of valency $p + 1$) by
$$F(v_1, v_2, \dots, v_p, f) = \langle T(v_1, v_2, \dots, v_p), f\rangle, \qquad v_k \in X_k,\ f \in W'. \qquad (8.5.9)$$
Conversely,
Given a tensor $F$ on $X_1 \times X_2 \times \dots \times X_p \times W'$, there exists a unique polylinear transformation $T \in \mathcal{L}(X_1, \dots, X_p; W)$ such that (8.5.9) is satisfied.
By Proposition 8.5.4 the tensor $F$ can be extended to a linear transformation (functional) $F_1: X_1 \otimes X_2 \otimes \dots \otimes X_p \otimes W' \to \mathbb{F}$ such that
$$F_1(v_1 \otimes v_2 \otimes \dots \otimes v_p \otimes f) = F(v_1, v_2, \dots, v_p, f)$$
for all $v_k \in X_k$, $f \in W'$.
This section shows that
tensors are universal objects in polylinear algebra: any polylinear transformation can be interpreted as a tensor, and vice versa.
Show that the tensor product $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is linear in each argument $v_k$.
Show that the set of tensor products $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is a proper subset of $V_1 \otimes V_2 \otimes \dots \otimes V_p$.
Prove that the transformation $T$ from Proposition 8.5.6 is unique.
The main reason for distinguishing between covariant and contravariant variables is that under a change of bases their coordinates change according to different rules; thus, the entries of covariant and contravariant tensors change according to different rules as well.
In this section we are going to investigate this in detail. Note that coordinate representations are extremely important, since, for example, all numerical computations (unlike theoretical investigations) are performed using some coordinate system.
Let $F$ be an $r$-covariant $s$-contravariant tensor, $r + s = p$. Let $v_1, v_2, \dots, v_r$ be the covariant variables ($v_k \in V_k$), and let $f_1, f_2, \dots, f_s$ be the contravariant ones ($f_k \in V_{r+k}'$). Let us write the covariant variables first, so the tensor will be written as $F(v_1, \dots, v_r, f_1, \dots, f_s)$. For each of the spaces $V_k$, $k = 1, 2, \dots, p$, fix a basis $\mathcal{B}_k = \{b^k_j\}_{j=1}^{n_k}$, and let $\mathcal{B}'_k = \{{b'}^k_j\}_{j=1}^{n_k}$ be the dual basis in $V_k'$.
For a vector $v \in V_k$ let $v^j$, $j = 1, \dots, n_k$, be its coordinates in the basis $\mathcal{B}_k$, and similarly, for $f \in V_k'$ let $f_j$ be its coordinates in the dual basis $\mathcal{B}'_k$ (note that, in agreement with the Einstein notation, the coordinates of a vector are indexed by a superscript, and the coordinates of a covector by a subscript).
Denote
$$F^{\,j(1), \dots, j(s)}_{\,k(1), \dots, k(r)} := F\bigl(b^1_{k(1)}, \dots, b^r_{k(r)},\ {b'}^{r+1}_{j(1)}, \dots, {b'}^{r+s}_{j(s)}\bigr). \qquad (8.6.1)$$
Then, in the Einstein notation,
$$F(v_1, \dots, v_r, f_1, \dots, f_s) = F^{\,j(1), \dots, j(s)}_{\,k(1), \dots, k(r)}\; v_1^{k(1)} \cdots v_r^{k(r)}\; (f_1)_{j(1)} \cdots (f_s)_{j(s)} \qquad (8.6.2)$$
(the summation here is over the indices $k(1), \dots, k(r)$ and $j(1), \dots, j(s)$).
Note that we use the notation $k(1), \dots, k(r)$ and $j(1), \dots, j(s)$ to emphasize that $1, \dots, r$ and $1, \dots, s$ here are not indices: the numbers in parentheses just show the order of the arguments. Thus, the right side of (8.6.2) does not have any free indices left (all the indices were used in the summation), so it is just a number (for fixed $v$'s and $f$'s).
To show that (8.6.1) implies (8.6.2), we first notice that (8.6.1) means that (8.6.2) holds when the $v$'s and $f$'s are the elements of the corresponding bases. Decomposing each argument $v_k$ and $f_k$ in the corresponding basis and using the linearity in each argument, we can easily get (8.6.2). The computation is rather simple, but because there are a lot of indices, the formulas can get quite big and look quite frightening.
To avoid writing too many huge formulas, we leave this computation to the reader as an exercise.
We do not want the reader to feel cheated, so we present a different, more “high brow” (abstract) explanation, which does not require any computations! Namely, let us notice that the expressions on the left and the right sides of (8.6.2) define tensors. By Proposition 8.5.4 they can be lifted to linear functionals on the tensor product $V_1 \otimes \dots \otimes V_r \otimes V_{r+1}' \otimes \dots \otimes V_{r+s}'$.
Rephrasing what we discussed at the beginning of the proof, we can say that (8.6.1) means that these functionals coincide on all vectors
$$b^1_{k(1)} \otimes \dots \otimes b^r_{k(r)} \otimes {b'}^{r+1}_{j(1)} \otimes \dots \otimes {b'}^{r+s}_{j(s)}$$
of a basis in the tensor product, so the functionals (and therefore the tensors) are equal. ∎
The entries $F^{\,j(1), \dots, j(s)}_{\,k(1), \dots, k(r)}$ are called the entries of the tensor $F$ in the bases $\mathcal{B}_k$, $k = 1, 2, \dots, p$.
Now let, for each $k = 1, 2, \dots, p$, $\widetilde{\mathcal{B}}_k$ be another basis in $V_k$ (and let $\widetilde{\mathcal{B}}'_k$ be the dual basis in $V_k'$). We want to investigate how the entries of the tensor $F$ change when we change the bases from $\mathcal{B}_k$ to $\widetilde{\mathcal{B}}_k$.
Let us first consider the familiar cases of vectors and linear functionals, considered above in Section 8.1.1, but write everything down using the Einstein notation. Suppose we have two bases, $\mathcal{B} = \{b_1, \dots, b_n\}$ and $\widetilde{\mathcal{B}} = \{\tilde b_1, \dots, \tilde b_n\}$, in a space $V$, and let
$$S = [I]_{\widetilde{\mathcal{B}}, \mathcal{B}}$$
be the change of coordinates matrix from $\mathcal{B}$ to $\widetilde{\mathcal{B}}$. For a vector $v$ let $v^k$ be its coordinates in the basis $\mathcal{B}$ and $\tilde v^k$ the coordinates in the basis $\widetilde{\mathcal{B}}$. Similarly, for $f \in V'$ let $f_k$ denote the coordinates in the basis $\mathcal{B}'$ and $\tilde f_k$ the coordinates in the basis $\widetilde{\mathcal{B}}'$ ($\mathcal{B}'$ and $\widetilde{\mathcal{B}}'$ are the dual bases to $\mathcal{B}$ and $\widetilde{\mathcal{B}}$ respectively).
Denote by $s^j_k$ the entries of the matrix $S$: to be consistent with the Einstein notation, the superscript $j$ denotes the number of the row. Then we can write the change of coordinates formula as
$$\tilde v^j = s^j_k\, v^k. \qquad (8.6.3)$$
Similarly, let $\sigma^j_k$ be the entries of $S^{-1}$: again, the superscript is used to denote the number of the row. Then we can write the change of coordinates formula for the dual space as
$$\tilde f_k = \sigma^j_k\, f_j; \qquad (8.6.4)$$
the summation here is over the index $j$ (i.e. along the columns of $S^{-1}$), so the change of coordinates matrix in this case is indeed $(S^{-1})^T$.
Let us emphasize that we did not prove anything here: we only rewrote formula (8.1.1) from Section 8.1.1 using the Einstein notation.
While it is not needed in what follows, let us play a bit more with the Einstein notation. Namely, the equations
$$v = \sum_{k=1}^n v^k b_k, \qquad f = \sum_{k=1}^n f_k b'_k$$
can be rewritten in the Einstein notation as
$$v = v^k b_k, \qquad f = f_k\, {b'}^k$$
respectively (to comply with the summation convention, the vectors of the dual basis are here indexed by superscripts, ${b'}^k := b'_k$).
Now we are ready to give the change of coordinate formula for general tensors.
For $k = 1, 2, \dots, p$ let $S_k$ be the change of coordinates matrices (from the bases $\mathcal{B}_k$ to the bases $\widetilde{\mathcal{B}}_k$), and let $S_k^{-1}$ be their inverses.
As in Section 8.6.2, we denote by $(s_k)^i_j$ the entries of the matrix $S_k$ and by $(\sigma_k)^i_j$ the entries of $S_k^{-1}$, with the agreement that the superscript gives the number of the row.
Given an $r$-covariant $s$-contravariant tensor $F$, let
$$F^{\,j(1), \dots, j(s)}_{\,k(1), \dots, k(r)} \qquad \text{and} \qquad \widetilde F^{\,j(1), \dots, j(s)}_{\,k(1), \dots, k(r)}$$
be its entries in the bases $\mathcal{B}_k$ (the old ones) and $\widetilde{\mathcal{B}}_k$ (the new ones) respectively. In the above notation
$$\widetilde F^{\,j(1), \dots, j(s)}_{\,k(1), \dots, k(r)} = (\sigma_1)^{i(1)}_{k(1)} \cdots (\sigma_r)^{i(r)}_{k(r)}\; (s_{r+1})^{j(1)}_{l(1)} \cdots (s_{r+s})^{j(s)}_{l(s)}\; F^{\,l(1), \dots, l(s)}_{\,i(1), \dots, i(r)}$$
(the summation here is over the indices $i(1), \dots, i(r)$ and $l(1), \dots, l(s)$).
Because of the many indices, the formula in this proposition looks very complicated. However, if one understands the main idea, the formula turns out to be quite simple and easy to memorize.
To explain the main idea let us, slightly abusing the language, express this formula “in plain English”. Namely, we can say that
to express the “new” tensor entries in terms of the “old” ones, one needs, for each covariant index (subscript), to apply the covariant rule (8.6.4), and for each contravariant index (superscript), to apply the contravariant rule (8.6.3).
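For the simplest mixed case, a $1$-covariant $1$-contravariant tensor (i.e. a linear transformation), the boxed rule reduces to the familiar similarity formula $\widetilde T = S T S^{-1}$. A hedged sketch illustrating only this special case (added here; NumPy assumed, made-up data):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
n = 3
t = rng.normal(size=(n, n))         # entries t^j_k of a (1,1)-tensor
S = rng.normal(size=(n, n))         # change of coordinates matrix
Sinv = np.linalg.inv(S)             # entries sigma^i_k

# contravariant rule (8.6.3) on the superscript j,
# covariant rule (8.6.4) on the subscript k:
t_new = np.einsum('jl,li,ik->jk', S, t, Sinv)
assert np.allclose(t_new, S @ t @ Sinv)
\end{verbatim}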
Informally, the idea of the proof is very simple: we just change the bases one at a time, applying each time the change of coordinates formula (8.6.3) or (8.6.4), depending on whether the tensor is contravariant or covariant in the corresponding variable.
To write a rigorous formal proof we use induction in $r$ and $s$ (the numbers of covariant and contravariant arguments of the tensor). The proposition is true for $r = 1$, $s = 0$ and for $r = 0$, $s = 1$, see (8.6.4) and (8.6.3) respectively.
Assuming now that the proposition is proved for some $r$ and $s$, let us prove it for $r + 1$, $s$ and for $r$, $s + 1$.
Let us do the latter case; the other one is done similarly. The main idea is that we first change the bases in the first $r + s$ variables and use the induction hypothesis; then we change the basis in the last variable and use (8.6.3).
Namely, let $F^{\,j(1), \dots, j(s+1)}_{\,k(1), \dots, k(r)}$ and $\widetilde F^{\,j(1), \dots, j(s+1)}_{\,k(1), \dots, k(r)}$ be the entries of an $r$-covariant $(s+1)$-contravariant tensor $F$ in the bases $\mathcal{B}_k$ and $\widetilde{\mathcal{B}}_k$ respectively.
Let us fix the index $j(s+1)$ and consider the $r$-covariant $s$-contravariant tensor $F(v_1, \dots, v_r, f_1, \dots, f_s, {b'}^p_{j(s+1)})$, where $v_1, \dots, v_r, f_1, \dots, f_s$ are the variables and ${b'}^p_{j(s+1)}$ is a fixed vector of the old dual basis in the last argument ($p = r + s + 1$). Clearly
$$F^{\,j(1), \dots, j(s), j(s+1)}_{\,k(1), \dots, k(r)} \qquad \text{and} \qquad \widehat F^{\,j(1), \dots, j(s), j(s+1)}_{\,k(1), \dots, k(r)}$$
are its entries in the old and the new bases respectively (can you see why?); here $\widehat F$ denotes the entries of $F$ computed with the new bases in the first $r + s$ arguments and the old basis in the last one. Recall that the index $j(s+1)$ here is fixed.
By the induction hypothesis
$$\widehat F^{\,j(1), \dots, j(s), j(s+1)}_{\,k(1), \dots, k(r)} = (\sigma_1)^{i(1)}_{k(1)} \cdots (\sigma_r)^{i(r)}_{k(r)}\; (s_{r+1})^{j(1)}_{l(1)} \cdots (s_{r+s})^{j(s)}_{l(s)}\; F^{\,l(1), \dots, l(s), j(s+1)}_{\,i(1), \dots, i(r)}. \qquad (8.6.5)$$
Note that we did not assume anything about the index $j(s+1)$, so (8.6.5) holds for all $j(s+1)$.
Now let us fix the indices $k(1), \dots, k(r)$, $j(1), \dots, j(s)$ and consider the $1$-contravariant tensor
$$f \mapsto F\bigl(\tilde b^1_{k(1)}, \dots, \tilde b^r_{k(r)},\ \tilde b'^{r+1}_{j(1)}, \dots, \tilde b'^{r+s}_{j(s)},\ f\bigr)$$
of the variable $f$. Here the $\tilde b^k_{k(m)}$ are vectors of the new bases and the $\tilde b'^k_{j(m)}$ are vectors of the new dual bases.
It is again easy to see that
$$\widehat F^{\,j(1), \dots, j(s), l}_{\,k(1), \dots, k(r)} \qquad \text{and} \qquad \widetilde F^{\,j(1), \dots, j(s), j(s+1)}_{\,k(1), \dots, k(r)}$$
($l$ and $j(s+1)$ being the running indices) are the entries of this tensor in the old and the new bases respectively. According to (8.6.3)
$$\widetilde F^{\,j(1), \dots, j(s), j(s+1)}_{\,k(1), \dots, k(r)} = (s_p)^{j(s+1)}_{l}\; \widehat F^{\,j(1), \dots, j(s), l}_{\,k(1), \dots, k(r)},$$
and since we did not assume anything about the indices $k(1), \dots, k(r)$, $j(1), \dots, j(s)$, the above identity holds for all their combinations. Combining this with (8.6.5), we get that the proposition holds for tensors of valency $(r, s+1)$. ∎