Chapter 8 Dual spaces and tensors

All vector spaces in this chapter are finite-dimensional.

8.1. Dual spaces

8.1.1. Linear functionals and the dual space. Change of coordinates in the dual space

Definition 8.1.1.

A linear functional on a vector space $V$ (over a field $\mathbb{F}$) is a linear transformation $L:V\to\mathbb{F}$.

This special class of linear transformations is sufficiently important to deserve a separate name.

If one thinks of vectors as physical objects, like forces or velocities, then one can think of a linear functional as a (linear) measurement that gives a scalar quantity as the result: think of the component of a force or velocity in a given direction.

Definition 8.1.2.

The collection of all linear functionals on a finite-dimensional vector space $V$ is called the dual of $V$ and is usually denoted $V^{\prime}$ or $V^{*}$. (We consider here only finite-dimensional spaces because for infinite-dimensional spaces the dual space consists not of all linear functionals but only of the so-called bounded ones. Without giving the precise definition, let us only mention that in the finite-dimensional case, when both the domain and the target space are finite-dimensional, all linear transformations are bounded, so we do not need to mention the word bounded.)

As was discussed earlier in Section 1.4 of Chapter 1, the set $\mathcal{L}(V,W)$ of all linear transformations acting from $V$ to $W$ is a vector space (with naturally defined operations of addition and multiplication by a scalar). So the dual space $V^{\prime}=\mathcal{L}(V,\mathbb{F})$ is a vector space.

Let us consider an example. Let the space $V$ be $\mathbb{R}^{n}$; what is its dual? As we know, a linear transformation $T:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is represented by an $m\times n$ matrix, so a linear functional on $\mathbb{R}^{n}$ (i.e. a linear transformation $L:\mathbb{R}^{n}\to\mathbb{R}$) is given by a $1\times n$ matrix (a row), which we denote by $[L]$. The collection of all such rows is isomorphic to $\mathbb{R}^{n}$ (the isomorphism is given by taking the transpose, $[L]\mapsto[L]^{T}$).

So the dual of $\mathbb{R}^{n}$ is $\mathbb{R}^{n}$ itself. The same holds true for $\mathbb{C}^{n}$, of course, as well as for $\mathbb{F}^{n}$, where $\mathbb{F}$ is an arbitrary field. Since a space $V$ over a field $\mathbb{F}$ (here we are mostly interested in the cases $\mathbb{F}=\mathbb{R}$ and $\mathbb{F}=\mathbb{C}$) of dimension $n$ is isomorphic to $\mathbb{F}^{n}$, and the dual of $\mathbb{F}^{n}$ is isomorphic to $\mathbb{F}^{n}$, we can conclude that the dual $V^{\prime}$ is isomorphic to $V$.

Thus, the definition of the dual space is starting to look a bit silly, since it does not appear to give us anything new.

However, that is not the case! If we look carefully, we can see that the dual space is indeed a new object. To see that, let us analyze how the entries of the matrix $[L]$ (which we can call the coordinates of $L$) change when we change the basis in $V$.

Change of coordinates formula

Let

\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\},\qquad \mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}

be two bases in $V$, and let $[L]_{\mathcal{A}}=[L]_{\mathcal{S},\mathcal{A}}$ and $[L]_{\mathcal{B}}=[L]_{\mathcal{S},\mathcal{B}}$ be the matrices of $L$ in the bases $\mathcal{A}$ and $\mathcal{B}$ respectively (we suppose that the basis in the target space of scalars is always the standard one, so we can skip the subscript $\mathcal{S}$ in the notation). Then, recalling the change of coordinates rule from Section 2.8.4 in Chapter 2, we get that

[L]_{\mathcal{B}}=[L]_{\mathcal{A}}[I]_{\mathcal{A},\mathcal{B}}.

Recall that for a vector $\mathbf{v}\in V$ its coordinates in different bases are related by the formula

[\mathbf{v}]_{\mathcal{B}}=[I]_{\mathcal{B},\mathcal{A}}[\mathbf{v}]_{\mathcal{A}},

and that

[I]_{\mathcal{A},\mathcal{B}}=[I]_{\mathcal{B},\mathcal{A}}^{-1}.

If we denote $S:=[I]_{\mathcal{B},\mathcal{A}}$, so that $[\mathbf{v}]_{\mathcal{B}}=S[\mathbf{v}]_{\mathcal{A}}$, then the entries of the vectors $[L]_{\mathcal{B}}^{T}$ and $[L]_{\mathcal{A}}^{T}$ are related by the formula

(8.1.1) [L]_{\mathcal{B}}^{T}=(S^{-1})^{T}[L]_{\mathcal{A}}^{T}

(since we usually represent a vector as a column of its coordinates, we use $[L]_{\mathcal{A}}^{T}$ and $[L]_{\mathcal{B}}^{T}$ instead of $[L]_{\mathcal{A}}$ and $[L]_{\mathcal{B}}$).

Saying it in words

If $S$ is the change of coordinates matrix (from the old coordinates to the new ones) in $X$, then the change of coordinates matrix in the dual space $X^{\prime}$ is $(S^{-1})^{T}$.

So, the dual space $V^{\prime}$ of $V$, while isomorphic to $V$, is indeed a different object: the difference is in how the coordinates in $V$ and $V^{\prime}$ change when one changes the basis in $V$.
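For readers who like to verify such identities numerically, here is a minimal sketch (assuming Python with numpy; the matrix $S$ and the data are randomly generated for illustration) checking that if vector coordinates transform by $S$, then functional coordinates must transform by $(S^{-1})^{T}$ for the value $L(\mathbf{v})$ to stay the same:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    S = rng.normal(size=(n, n))      # change of coordinates matrix, assumed invertible
    v_A = rng.normal(size=(n, 1))    # coordinates of v in the old basis A
    L_A = rng.normal(size=(1, n))    # matrix (row) of a functional L in the basis A

    v_B = S @ v_A                    # vector coordinates transform by S
    L_B = (np.linalg.inv(S).T @ L_A.T).T   # formula (8.1.1): [L]_B^T = (S^{-1})^T [L]_A^T

    # the value L(v) does not depend on the basis
    assert np.allclose(L_A @ v_A, L_B @ v_B)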

Remark.

One can ask: why can’t we pick a basis in $X$ and some completely unrelated basis in the dual $X^{\prime}$? Of course, we can do that, but imagine what it would take to compute $L(\mathbf{x})$, knowing the coordinates of $\mathbf{x}$ in some basis and the coordinates of $L$ in some completely unrelated basis.

So, if we want (knowing the coordinates of a vector $\mathbf{x}$ in some basis) to compute the action of a linear functional $L$ using the standard rules of matrix algebra, i.e. to multiply a row (the functional) by a column (the vector), we have no choice: the “coordinates” of the linear functional $L$ should be the entries of its matrix (in the same basis).

As we will see in Section 8.1.3 below, the entries (“coordinates”) of a linear functional are indeed its coordinates in some basis (the so-called dual basis).

A uniqueness theorem

Lemma 8.1.3.

Let $\mathbf{v}\in V$. If $L(\mathbf{v})=0$ for all $L\in V^{\prime}$, then $\mathbf{v}=\mathbf{0}$. As a corollary, if $L(\mathbf{v}_{1})=L(\mathbf{v}_{2})$ for all $L\in V^{\prime}$, then $\mathbf{v}_{1}=\mathbf{v}_{2}$.

Proof.

Fix a basis $\mathcal{B}$ in $V$. Then

L(\mathbf{v})=[L]_{\mathcal{B}}[\mathbf{v}]_{\mathcal{B}}.

Picking different matrices (i.e. different $L$) we can easily see that $[\mathbf{v}]_{\mathcal{B}}=\mathbf{0}$. Indeed, if

L_{k}=[0,\ldots,0,\underset{k}{1},0,\ldots,0]

then the equality

L_{k}[\mathbf{v}]_{\mathcal{B}}=0

implies that the $k$th coordinate of $[\mathbf{v}]_{\mathcal{B}}$ is $0$.

Using this equality for all $k$ we conclude that $[\mathbf{v}]_{\mathcal{B}}=\mathbf{0}$, so $\mathbf{v}=\mathbf{0}$. ∎

8.1.2. Second dual

As we discussed above, the dual space $V^{\prime}$ is a vector space, so one can consider its dual $V^{\prime\prime}=(V^{\prime})^{\prime}$. It looks like one can then consider the dual $V^{\prime\prime\prime}$ of $V^{\prime\prime}$, and so on… However, the fun stops with $V^{\prime\prime}$, because

The second dual $V^{\prime\prime}$ is canonically (i.e. in a natural way) isomorphic to $V$.

Let us decipher this statement. Any vector $\mathbf{v}\in V$ canonically defines a linear functional $L_{\mathbf{v}}$ on $V^{\prime}$ (i.e. an element of the second dual $V^{\prime\prime}$) by the rule

L_{\mathbf{v}}(f)=f(\mathbf{v})\qquad\forall f\in V^{\prime}

It is easy to check that the mapping $T:V\to V^{\prime\prime}$, $T\mathbf{v}=L_{\mathbf{v}}$, is a linear transformation.

Note that $\operatorname{Ker}T=\{\mathbf{0}\}$. Indeed, if $T\mathbf{v}=\mathbf{0}$, then

f(\mathbf{v})=0\qquad\forall f\in V^{\prime},

and by Lemma 8.1.3 above we have $\mathbf{v}=\mathbf{0}$.

Since $\dim V^{\prime\prime}=\dim V^{\prime}=\dim V$, the condition $\operatorname{Ker}T=\{\mathbf{0}\}$ implies that $T$ is an invertible transformation (an isomorphism).

The isomorphism $T$ is very natural (at least to a mathematician). In particular, it was defined without using a basis, so it does not depend on the choice of basis. So, informally, we say that $V^{\prime\prime}$ is canonically isomorphic to $V$; the rigorous statement is that the map $T$ described above (which we consider natural and canonical) is an isomorphism from $V$ to $V^{\prime\prime}$.

8.1.3. Dual, a.k.a. biorthogonal bases

In the previous sections we several times referred to the entries of the matrix of a linear functional $L$ as coordinates. But in this book “coordinates” usually means coordinates in some basis. Are the “coordinates” of a linear functional really coordinates in some basis? It turns out the answer is “yes”, so the terminology remains consistent.

Let us find the basis corresponding to the coordinates of $L\in V^{\prime}$. Let $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ be a basis in $V$. For $L\in V^{\prime}$, let $[L]_{\mathcal{B}}=[L_{1},L_{2},\ldots,L_{n}]$ be its matrix (row) in the basis $\mathcal{B}$. Consider the linear functionals $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\in V^{\prime}$ defined by

(8.1.2) \mathbf{b}^{\prime}_{k}(\mathbf{b}_{j})=\delta_{k,j}

where $\delta_{k,j}$ is the Kronecker delta,

\delta_{k,j}=\begin{cases}1,& j=k\\ 0,& j\neq k\end{cases}

Recall that a linear transformation is defined by its action on a basis, so the functionals $\mathbf{b}^{\prime}_{k}$ are well defined.

As one can easily see, the functional $L$ can be represented as

L=\sum_{k} L_{k}\mathbf{b}^{\prime}_{k}.

Indeed, take an arbitrary $\mathbf{v}=\sum_{k}\alpha_{k}\mathbf{b}_{k}\in V$, so $[\mathbf{v}]_{\mathcal{B}}=[\alpha_{1},\alpha_{2},\ldots,\alpha_{n}]^{T}$. By linearity and the definition of $\mathbf{b}^{\prime}_{k}$,

\mathbf{b}_{k}^{\prime}(\mathbf{v})=\mathbf{b}_{k}^{\prime}\Bigl(\sum_{j}\alpha_{j}\mathbf{b}_{j}\Bigr)=\sum_{j}\alpha_{j}\mathbf{b}^{\prime}_{k}(\mathbf{b}_{j})=\alpha_{k}.

Therefore

L\mathbf{v}=[L]_{\mathcal{B}}[\mathbf{v}]_{\mathcal{B}}=\sum_{k}L_{k}\alpha_{k}=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}(\mathbf{v}).

Since this identity holds for all $\mathbf{v}\in V$, we conclude that $L=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}$.

Since we did not assume anything about $L\in V^{\prime}$, we have just shown that any linear functional $L$ can be represented as a linear combination of $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$, so the system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ is generating.

Let us show that this system is linearly independent (and so it is a basis). Let $\mathbf{0}=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}$. Then for arbitrary $j=1,2,\ldots,n$

0=\mathbf{0}(\mathbf{b}_{j})=\Bigl(\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}\Bigr)(\mathbf{b}_{j})=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}(\mathbf{b}_{j})=L_{j},

so $L_{j}=0$. Therefore all $L_{k}$ are $0$ and the system is linearly independent.

So, the system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ is indeed a basis in the dual space $V^{\prime}$, and the entries of $[L]_{\mathcal{B}}$ are the coordinates of $L$ with respect to this (dual) basis.

Definition 8.1.4.

Let $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ be a basis in $V$. The system of vectors

\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\in V^{\prime},

uniquely defined by equation (8.1.2), is called the dual (or biorthogonal) basis to $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$.

Note that we have shown that the dual system to a basis is a basis. Note also that in 𝐛1,𝐛2,,𝐛n\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n} is the dual system to a basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}, then 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} is the dual to the basis 𝐛1,𝐛2,,𝐛n\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}
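For $\mathbb{F}^{n}$ the dual basis is easy to compute explicitly. A minimal numerical sketch (assuming Python with numpy; the particular basis below is chosen arbitrarily for illustration): if the columns of a matrix $B$ form a basis of $\mathbb{R}^{n}$, condition (8.1.2) says exactly that the rows of $B^{-1}$ are the dual functionals.

    import numpy as np

    # Columns of B form a (non-orthogonal) basis b_1, b_2, b_3 of R^3.
    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    # Functionals on R^3 are rows; b'_k(b_j) = delta_{k,j} says that the matrix
    # whose k-th row is b'_k, multiplied by B, gives the identity. So the dual
    # basis consists of the rows of the inverse of B.
    B_dual = np.linalg.inv(B)

    assert np.allclose(B_dual @ B, np.eye(3))   # checks (8.1.2)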

Abstract non-orthogonal Fourier decomposition

The dual system can be used for computing the coordinates in the basis $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$. Let $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ be the biorthogonal system to $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$, and let $\mathbf{v}=\sum_{k}\alpha_{k}\mathbf{b}_{k}$. Then, as was shown before,

\mathbf{b}^{\prime}_{j}(\mathbf{v})=\mathbf{b}^{\prime}_{j}\Bigl(\sum_{k}\alpha_{k}\mathbf{b}_{k}\Bigr)=\sum_{k}\alpha_{k}\mathbf{b}^{\prime}_{j}(\mathbf{b}_{k})=\alpha_{j}\mathbf{b}^{\prime}_{j}(\mathbf{b}_{j})=\alpha_{j},

so $\alpha_{k}=\mathbf{b}_{k}^{\prime}(\mathbf{v})$. Then we can write

(8.1.3) \mathbf{v}=\sum_{k}\mathbf{b}_{k}^{\prime}(\mathbf{v})\mathbf{b}_{k}.

In other words,

The $k$th coordinate of a vector $\mathbf{v}$ in a basis $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ is $\mathbf{b}_{k}^{\prime}(\mathbf{v})$, where $\mathcal{B}^{\prime}=\{\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\}$ is the dual basis.

This formula is called (a baby version of) the abstract non-orthogonal Fourier decomposition of $\mathbf{v}$ (in the basis $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$). The reason for this name will become clear later, in Section 8.2.3.
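In coordinates this is a one-liner. A self-contained sketch (again assuming numpy and an arbitrarily chosen basis): applying the dual functionals to $\mathbf{v}$ recovers its coordinates, as (8.1.3) asserts.

    import numpy as np

    B = np.array([[2.0, 1.0],
                  [0.0, 1.0]])          # columns: a basis b_1, b_2 of R^2
    B_dual = np.linalg.inv(B)           # rows: the dual functionals b'_1, b'_2

    v = np.array([3.0, 4.0])
    alpha = B_dual @ v                  # alpha_k = b'_k(v)
    assert np.allclose(B @ alpha, v)    # v = sum_k b'_k(v) b_k, formula (8.1.3)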

Remark 8.1.5.

Let $\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\}$ and $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{m}\}$ be bases in $X$ and $Y$ respectively, let $T:X\to Y$ be a linear transformation, and let $\mathcal{B}^{\prime}=\{\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{m}\}$ be the dual basis to $\mathcal{B}$. Then the matrix $[T]_{\mathcal{B},\mathcal{A}}=:A=\{a_{k,j}\}$, $k=1,\ldots,m$, $j=1,\ldots,n$, of the transformation $T$ in the bases $\mathcal{A}$, $\mathcal{B}$ is given by

a_{k,j}=\mathbf{b}^{\prime}_{k}(T\mathbf{a}_{j}),\qquad j=1,2,\ldots,n,\quad k=1,2,\ldots,m.
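A quick numerical illustration of this formula (a sketch assuming numpy; the bases and the transformation are randomly generated): computing $[T]_{\mathcal{B},\mathcal{A}}$ entrywise as $a_{k,j}=\mathbf{b}^{\prime}_{k}(T\mathbf{a}_{j})$ agrees with the usual change of basis computation.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 2
    A_basis = rng.normal(size=(n, n))   # columns: a basis A of X = R^n
    B_basis = rng.normal(size=(m, m))   # columns: a basis B of Y = R^m
    B_dual  = np.linalg.inv(B_basis)    # rows: the dual basis B' of Y'
    T = rng.normal(size=(m, n))         # T in the standard bases

    # entrywise: a_{k,j} = b'_k(T a_j)
    M = np.array([[B_dual[k] @ (T @ A_basis[:, j]) for j in range(n)]
                  for k in range(m)])

    # the same matrix via [T]_{B,A} = [I]_{B,S} [T] [I]_{S,A}
    assert np.allclose(M, B_dual @ T @ A_basis)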

8.1.4. Examples of dual systems

The first example we consider is a trivial one. Let $V$ be $\mathbb{R}^{n}$ (or $\mathbb{C}^{n}$) and let $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ be the standard basis there. The dual space is the space of $n$-dimensional row vectors, which is isomorphic to $\mathbb{R}^{n}$ (or $\mathbb{C}^{n}$ in the complex case), and the standard basis there is the dual to $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$. Namely, the standard basis in $(\mathbb{R}^{n})^{\prime}$ (or in $(\mathbb{C}^{n})^{\prime}$) is $\mathbf{e}^{T}_{1},\mathbf{e}^{T}_{2},\ldots,\mathbf{e}^{T}_{n}$, obtained from $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ by transposition.

Taylor formula

The next example is more interesting. Let us consider the space $\mathbb{P}_{n}$ of polynomials of degree at most $n$. As we know, the powers $\{\mathbf{e}_{k}\}_{k=0}^{n}$, $\mathbf{e}_{k}(t)=t^{k}$, form the standard basis in this space. What is the dual of this basis?

The answer might be tricky to guess, but it is very easy to check once you know it. Namely, consider the linear functionals $\mathbf{e}^{\prime}_{k}\in(\mathbb{P}_{n})^{\prime}$, $k=0,1,\ldots,n$, acting on polynomials as follows:

\mathbf{e}^{\prime}_{k}(p):=\frac{1}{k!}\frac{d^{k}}{dt^{k}}p(t)\Big|_{t=0}=\frac{1}{k!}p^{(k)}(0);

here we use the usual conventions that $0!=1$ and $d^{0}f/dt^{0}=f$.

Since

\frac{d^{k}}{dt^{k}}t^{j}=\begin{cases}j(j-1)\cdots(j-k+1)\,t^{j-k},& k\leq j\\ 0,& k>j\end{cases}

we can easily see that the system $\{\mathbf{e}^{\prime}_{k}\}_{k=0}^{n}$ is the dual of the system of powers $\{\mathbf{e}_{k}\}_{k=0}^{n}$.

Applying (8.1.3) to the system $\{\mathbf{e}_{k}\}_{k=0}^{n}$ and its dual, we get that any polynomial $p$ of degree at most $n$ can be represented as

(8.1.4) p(t)=\sum_{k=0}^{n}\frac{p^{(k)}(0)}{k!}t^{k}

This formula is well-known in Calculus as the Taylor formula for polynomials. More precisely, this is a particular case of the Taylor formula, the so-called Maclaurin formula. The general Taylor formula

p(t)=\sum_{k=0}^{n}\frac{p^{(k)}(a)}{k!}(t-a)^{k}

can be obtained from (8.1.4) by applying it to the polynomial $q(\tau):=p(\tau+a)$ and then substituting $\tau=t-a$. It can also be obtained by considering the powers $(t-a)^{k}$, $k=0,1,\ldots,n$, and finding the dual system the same way we did for the powers $t^{k}$. (Note that the general Taylor formula says more than the formula for polynomials obtained here: it says that any $n$ times differentiable function can be approximated near the point $a$ by its Taylor polynomial. Moreover, if the function is $n+1$ times differentiable, it allows us to estimate the error. The formula for polynomials above serves as a motivation and a starting point for the general case.)
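A small numerical check of (8.1.4) (a sketch assuming Python with numpy; the polynomial is arbitrary): the functionals $\mathbf{e}^{\prime}_{k}(p)=p^{(k)}(0)/k!$ recover exactly the coefficients of $p$ in the basis of powers.

    import math
    import numpy as np
    from numpy.polynomial import polynomial as P

    p = np.array([1.0, 2.0, -1.0, 3.0])     # p(t) = 1 + 2t - t^2 + 3t^3

    # e'_k(p) = p^{(k)}(0) / k!, computed by differentiating k times
    coeffs = [P.polyval(0.0, P.polyder(p, k)) / math.factorial(k)
              for k in range(len(p))]

    assert np.allclose(coeffs, p)            # formula (8.1.4)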

Lagrange interpolation

Our next example deals with the so-called Lagrange interpolation formula. Let $a_{1},a_{2},\ldots,a_{n+1}$ be distinct points (in $\mathbb{R}$ or $\mathbb{C}$), and let $\mathbb{P}_{n}$ be the space of polynomials of degree at most $n$. Define functionals $\mathbf{f}_{k}\in\mathbb{P}_{n}^{\prime}$ by

\mathbf{f}_{k}(p)=p(a_{k})\qquad\forall p\in\mathbb{P}_{n}.

What is the dual of this system of functionals? Note that while it is not hard to show that the functionals $\mathbf{f}_{k}$ are linearly independent, and so (since $\dim(\mathbb{P}_{n})^{\prime}=\dim\mathbb{P}_{n}=n+1$) form a basis in $(\mathbb{P}_{n})^{\prime}$, we do not need that. We will construct the dual system directly, and then will be able to see that the system $\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{n+1}$ is indeed a basis.

Namely, let us define the polynomials $p_{k}$, $k=1,2,\ldots,n+1$, by

p_{k}(t)=\prod_{j:j\neq k}(t-a_{j})\Big/\prod_{j:j\neq k}(a_{k}-a_{j}),

where $j$ in the products runs from $1$ to $n+1$. Clearly $p_{k}(a_{k})=1$ and $p_{k}(a_{j})=0$ if $j\neq k$, so indeed the system $p_{1},p_{2},\ldots,p_{n+1}$ is dual to the system $\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{n+1}$.

There is a little detail here, since the notion of a dual system was defined only for a basis, and we did not prove that either of the systems is one. But one can immediately see that the system $p_{1},p_{2},\ldots,p_{n+1}$ is linearly independent (can you explain why?), and since it contains $n+1=\dim\mathbb{P}_{n}$ vectors, it is a basis. Therefore the system of functionals $\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{n+1}$ is also a basis in the dual space $(\mathbb{P}_{n})^{\prime}$.

Remark.

Note that we did not just get lucky here; this is a general phenomenon. Namely, as Problem 8.1.1 below asserts, any system of vectors having a “dual” one must be linearly independent. So constructing a dual system is a way of proving linear independence (and an easy one, if you can construct the dual system easily, as in the above example).

Applying formula (8.1.3) to the above example, one can see that the unique polynomial $p$, $\deg p\leq n$, satisfying

(8.1.5) p(a_{k})=y_{k},\qquad k=1,2,\ldots,n+1

can be reconstructed by the formula

(8.1.6) p(t)=\sum_{k=1}^{n+1}y_{k}p_{k}(t).

This formula is well-known in mathematics as the “Lagrange interpolation formula”.
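A direct numerical rendering of (8.1.6) (a minimal sketch assuming Python with numpy; the nodes and values are chosen arbitrarily for illustration):

    import numpy as np

    a = np.array([0.0, 1.0, 2.0])          # distinct interpolation nodes a_k
    y = np.array([1.0, 3.0, 2.0])          # prescribed values y_k = p(a_k)

    def p_k(k, t):
        # p_k(t) = prod_{j != k} (t - a_j) / (a_k - a_j); satisfies p_k(a_j) = delta_{k,j}
        num = np.prod([t - a[j] for j in range(len(a)) if j != k])
        den = np.prod([a[k] - a[j] for j in range(len(a)) if j != k])
        return num / den

    def p(t):
        # formula (8.1.6): p(t) = sum_k y_k p_k(t)
        return sum(y[k] * p_k(k, t) for k in range(len(a)))

    assert all(np.isclose(p(a[k]), y[k]) for k in range(len(a)))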

Exercises.

8.1.1.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ be a system of vectors in $X$ such that there exists a system $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{r}$ of linear functionals such that

\mathbf{v}^{\prime}_{k}(\mathbf{v}_{j})=\begin{cases}1,& j=k\\ 0,& j\neq k\end{cases}

a) Show that the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ is linearly independent.

b) Show that if the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ is not generating, then the “biorthogonal” system $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{r}$ is not unique. Hint: probably the easiest way to prove this is to complete the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ to a basis, see Proposition 2.5.4 from Chapter 2.

8.1.2.

Prove that given distinct points $a_{1},a_{2},\ldots,a_{n+1}$ and values $y_{1},y_{2},\ldots,y_{n+1}$ (not necessarily distinct), the polynomial $p$, $\deg p\leq n$, satisfying (8.1.5) is unique. Try to prove it using ideas from linear algebra, and not what you know about polynomials.

8.2. Dual of an inner product space

Let us recall that there is no notion of an inner product space over an arbitrary field: all our inner product spaces are either real or complex.

8.2.1. Riesz representation theorem

Theorem 8.2.1 (Riesz representation theorem).

Let $H$ be an inner product space. Given a linear functional $L$ on $H$, there exists a unique vector $\mathbf{y}\in H$ such that

(8.2.1) L(\mathbf{v})=(\mathbf{v},\mathbf{y})\qquad\forall\mathbf{v}\in H.
Proof.

Fix an orthonormal basis $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ in $H$, and let

[L]=[L_{1},L_{2},\ldots,L_{n}]

be the matrix of $L$ in this basis. Define the vector $\mathbf{y}$ by

(8.2.2) \mathbf{y}:=\sum_{k}\overline{L}_{k}\mathbf{e}_{k},

where $\overline{L}_{k}$ denotes the complex conjugate of $L_{k}$. In the case of a real space the conjugation does nothing and can simply be ignored.

We claim that $\mathbf{y}$ satisfies (8.2.1).

Indeed, take an arbitrary vector $\mathbf{v}=\sum_{k}\alpha_{k}\mathbf{e}_{k}$. Then

[\mathbf{v}]=[\alpha_{1},\alpha_{2},\ldots,\alpha_{n}]^{T}

and

L(\mathbf{v})=[L][\mathbf{v}]=\sum_{k}L_{k}\alpha_{k}.

On the other hand, recall that if we know the coordinates of two vectors in an orthonormal basis, we can compute their inner product by taking these coordinates and computing the standard inner product in $\mathbb{F}^{n}$ (see Exercise 5.2.3 in Chapter 5). Hence

(\mathbf{v},\mathbf{y})=\sum_{k}\alpha_{k}\overline{\overline{L}}_{k}=\sum_{k}\alpha_{k}L_{k},

so (8.2.1) holds.

To show that the vector $\mathbf{y}$ is unique, let us assume that $\mathbf{y}$ satisfies (8.2.1). Then for $k=1,2,\ldots,n$

(\mathbf{e}_{k},\mathbf{y})=L(\mathbf{e}_{k})=L_{k},

so $(\mathbf{y},\mathbf{e}_{k})=\overline{L}_{k}$. Then, using the formula for the decomposition in an orthonormal basis (see Section 5.2.1 of Chapter 5), we get

\mathbf{y}=\sum_{k}(\mathbf{y},\mathbf{e}_{k})\mathbf{e}_{k}=\sum_{k}\overline{L}_{k}\mathbf{e}_{k},

which means that any vector satisfying (8.2.1) must be represented by (8.2.2). ∎

Remark.

While the statement of the theorem does not mention a basis, the proof presented above utilizes an orthonormal basis in $H$, although the resulting vector $\mathbf{y}$ does not depend on the choice of the basis. (An alternative proof that does not need a basis is also possible. This alternative proof, which works in the infinite-dimensional case, uses strong convexity of the unit ball in the inner product space together with the idea of completeness from analysis.) An advantage of the proof above is that it gives a formula for computing the representing vector $\mathbf{y}$.
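A small numerical illustration of the proof (a sketch assuming Python with numpy; in $\mathbb{C}^{3}$ with the standard inner product the standard basis is orthonormal, and the functional is chosen arbitrarily):

    import numpy as np

    L = np.array([1 + 2j, -1j, 3.0])     # entries L_k of a functional on C^3
    y = np.conj(L)                       # representing vector (8.2.2): y_k = conj(L_k)

    v = np.array([2.0, 1j, 1 - 1j])      # an arbitrary test vector
    # L(v) = (v, y), where (v, y) = sum_k v_k conj(y_k)
    assert np.isclose(L @ v, np.vdot(y, v))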

8.2.2. Is an inner product space a dual to itself?

For a vector $\mathbf{y}$ in an inner product space $H$ one can define a linear functional $L_{\mathbf{y}}$,

L_{\mathbf{y}}(\mathbf{v}):=(\mathbf{v},\mathbf{y}).

It is easy to see that the mapping $\mathbf{y}\mapsto L_{\mathbf{y}}$ is an injective mapping from $H$ to its dual $H^{*}$. Theorem 8.2.1 above asserts that this mapping is a surjection (onto), so one is tempted to say that the dual of an inner product space $H$ is (canonically isomorphic to) the space $H$ itself, with the canonical isomorphism given by $\mathbf{y}\mapsto L_{\mathbf{y}}$.

This is indeed the case if $H$ is a real inner product space: in this case it is easy to show that the map $\mathbf{y}\mapsto L_{\mathbf{y}}$ is a linear transformation. We already discussed that the map is injective and surjective, so it is an invertible linear transformation, i.e. an isomorphism.

However, if $H$ is a complex space, one needs to be a bit more careful. Namely, the mapping $\mathbf{y}\mapsto L_{\mathbf{y}}$ that maps a vector $\mathbf{y}\in H$ to the linear functional $L_{\mathbf{y}}$ as in Theorem 8.2.1 ($L_{\mathbf{y}}(\mathbf{v})=(\mathbf{v},\mathbf{y})$) is not linear.

More precisely, while it is easy to show that

(8.2.3) L_{\mathbf{y}_{1}+\mathbf{y}_{2}}=L_{\mathbf{y}_{1}}+L_{\mathbf{y}_{2}},

it follows from the definition of $L_{\mathbf{y}}$ and the properties of the inner product that

(8.2.4) L_{\alpha\mathbf{y}}(\mathbf{v})=(\mathbf{v},\alpha\mathbf{y})=\overline{\alpha}(\mathbf{v},\mathbf{y})=\overline{\alpha}L_{\mathbf{y}}(\mathbf{v}),

so $L_{\alpha\mathbf{y}}=\overline{\alpha}L_{\mathbf{y}}$.

In other words, one can say that the dual of a complex inner product space is the space itself but with a different linear structure: adding two vectors is equivalent to adding the corresponding linear functionals, but multiplying a vector by $\alpha$ is equivalent to multiplying the corresponding functional by $\overline{\alpha}$.

A transformation $T$ satisfying $T(\alpha\mathbf{x}+\beta\mathbf{y})=\overline{\alpha}T\mathbf{x}+\overline{\beta}T\mathbf{y}$ is sometimes called a conjugate linear transformation.

So, for a complex inner product space $H$, its dual can be canonically identified with $H$ by a conjugate linear isomorphism (i.e. an invertible conjugate linear transformation).

Of course, for a real inner product space the complex conjugation can simply be ignored (because $\alpha$ is real), so the map $\mathbf{y}\mapsto L_{\mathbf{y}}$ is linear. In this case we can indeed say that the dual of an inner product space $H$ is the space itself.

In both the real and complex cases, we can nevertheless say that the dual of an inner product space can be canonically identified with the space itself.

8.2.3. Biorthogonal systems and orthonormal bases

Definition 8.2.2.

Let $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ be a basis in an inner product space $H$. The unique system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ in $H$ defined by

(\mathbf{b}_{j},\mathbf{b}^{\prime}_{k})=\delta_{j,k},

where $\delta_{j,k}$ is the Kronecker delta, is called the biorthogonal (or dual) basis to $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$.

This definition clearly agrees with Definition 8.1.4 if one identifies the dual $H^{\prime}$ with $H$ as discussed above. It then follows immediately from the discussion in Section 8.1.3 that the dual system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ to a basis $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ is uniquely defined and forms a basis, and that the dual to $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ is $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$.

The abstract non-orthogonal Fourier decomposition formula (8.1.3) can be rewritten as

\mathbf{v}=\sum_{k=1}^{n}(\mathbf{v},\mathbf{b}^{\prime}_{k})\mathbf{b}_{k}.

Note that an orthonormal basis is dual to itself. So, if $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ is an orthonormal basis, the above formula becomes

\mathbf{v}=\sum_{k=1}^{n}(\mathbf{v},\mathbf{e}_{k})\mathbf{e}_{k},

which is the classical (orthogonal) abstract Fourier decomposition, see formula (5.2.2) in Section 5.2.1 of Chapter 5.
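A one-line numerical check (a sketch assuming numpy; the orthonormal basis is an arbitrary rotation of the standard one):

    import numpy as np

    th = 0.3
    Q = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])   # columns: an orthonormal basis of R^2

    v = np.array([1.0, 2.0])
    coeffs = Q.T @ v                      # (v, e_k) for each k
    assert np.allclose(Q @ coeffs, v)     # v = sum_k (v, e_k) e_k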

8.3. Adjoint (dual) transformations and the transpose. Fundamental subspaces revisited (once more)

By analogy with the case of an inner product space (see Theorem 8.2.1), it is customary to write $L(\mathbf{v})$, where $L$ is a linear functional (i.e. $L\in V^{\prime}$, $\mathbf{v}\in V$), in a form resembling the inner product:

L(\mathbf{v})=\langle\mathbf{v},L\rangle.

Note that the expression $\langle\mathbf{v},L\rangle$ is linear in both arguments, unlike the inner product, which in the case of a complex space is linear in the first argument and conjugate linear in the second. So, to distinguish it from the inner product, we use the angular brackets. (This notation, while widely used, is far from standard. Sometimes $(\mathbf{v},L)$ is used, and sometimes the angular brackets are used for the inner product. So, encountering an expression like this in a text, one has to be very careful to distinguish the inner product from the action of a linear functional.)

Note also that while in the inner product both vectors belong to the same space, $\mathbf{v}$ and $L$ above belong to different spaces: in particular, we cannot add them.

8.3.1. Dual (adjoint) transformation

Definition 8.3.1.

Let $A:X\to Y$ be a linear transformation. The transformation $A^{\prime}:Y^{\prime}\to X^{\prime}$ ($X^{\prime}$ and $Y^{\prime}$ are the dual spaces of $X$ and $Y$ respectively) such that

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},A^{\prime}\mathbf{y}^{\prime}\rangle\qquad\forall\mathbf{x}\in X,\ \forall\mathbf{y}^{\prime}\in Y^{\prime}

is called the adjoint (dual) of $A$.

Of course, it is not a priori clear why the transformation $A^{\prime}$ exists. Below we will show that such a transformation indeed exists and, moreover, is unique.

Dual transformation for the case $A:\mathbb{F}^{n}\to\mathbb{F}^{m}$

Let us first consider the case when $X=\mathbb{F}^{n}$, $Y=\mathbb{F}^{m}$ ($\mathbb{F}$ here is, as usual, either $\mathbb{R}$ or $\mathbb{C}$, but everything works for arbitrary fields).

As usual, we identify a vector $\mathbf{v}\in\mathbb{F}^{n}$ with the column of its coordinates, and a linear transformation with its matrix (in the standard bases).

The dual of $\mathbb{F}^{n}$ is, as discussed above, the space of rows of size $n$, so we can identify it with $\mathbb{F}^{n}$. Again, we will treat an element of $(\mathbb{F}^{n})^{\prime}$ as a column vector of its coordinates.

Under these agreements, we have for $\mathbf{x}\in\mathbb{F}^{n}$ and $\mathbf{x}^{\prime}\in(\mathbb{F}^{n})^{\prime}$

\mathbf{x}^{\prime}(\mathbf{x})=\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=(\mathbf{x}^{\prime})^{T}\mathbf{x},

where the right side is a product of matrices (a row times a column). Then, for arbitrary $\mathbf{x}\in X=\mathbb{F}^{n}$ and $\mathbf{y}^{\prime}\in Y^{\prime}=(\mathbb{F}^{m})^{\prime}$,

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=(\mathbf{y}^{\prime})^{T}A\mathbf{x}=(A^{T}\mathbf{y}^{\prime})^{T}\mathbf{x}=\langle\mathbf{x},A^{T}\mathbf{y}^{\prime}\rangle

(the expressions in the middle are products of matrices).

So we have proved that the adjoint transformation exists. Let us show that it is unique. Assume that for some transformation $B$

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},B\mathbf{y}^{\prime}\rangle\qquad\forall\mathbf{x}\in\mathbb{F}^{n},\ \forall\mathbf{y}^{\prime}\in(\mathbb{F}^{m})^{\prime}.

This means that

\langle\mathbf{x},(A^{T}-B)\mathbf{y}^{\prime}\rangle=0\qquad\forall\mathbf{x}\in\mathbb{F}^{n},\ \forall\mathbf{y}^{\prime}\in(\mathbb{F}^{m})^{\prime}.

Taking for $\mathbf{x}$ and $\mathbf{y}^{\prime}$ the vectors from the standard bases in $\mathbb{F}^{n}$ and $(\mathbb{F}^{m})^{\prime}\cong\mathbb{F}^{m}$ respectively, we get that the matrices $B$ and $A^{T}$ coincide. ∎

So, for $X=\mathbb{F}^{n}$, $Y=\mathbb{F}^{m}$:

The dual transformation $A^{\prime}$ exists and is unique. Moreover, its matrix (in the standard bases) equals $A^{T}$ (the transpose of the matrix of $A$).
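A quick numerical check of the defining identity (a sketch assuming numpy; the matrix and the vectors are randomly generated):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(2, 3))          # A : R^3 -> R^2
    x = rng.normal(size=3)               # x in X = R^3
    y_prime = rng.normal(size=2)         # y' in Y', stored as a column of coordinates

    # <Ax, y'> = <x, A^T y'>
    assert np.isclose(y_prime @ (A @ x), (A.T @ y_prime) @ x)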

Dual transformation in the abstract setting

Now let us consider the general case. In fact, we do not need to do much, since everything can be reduced to the case of the spaces $\mathbb{F}^{n}$.

Namely, let us fix bases $\mathcal{A}=\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}$ in $X$ and $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{m}$ in $Y$, and let $\mathcal{A}^{\prime}=\mathbf{a}^{\prime}_{1},\mathbf{a}^{\prime}_{2},\ldots,\mathbf{a}^{\prime}_{n}$ and $\mathcal{B}^{\prime}=\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{m}$ be their dual bases (in $X^{\prime}$ and $Y^{\prime}$ respectively). For a vector $\mathbf{v}$ (from a space or its dual) we denote, as usual, by $[\mathbf{v}]_{\mathcal{B}}$ the column of its coordinates in the basis $\mathcal{B}$. Then

\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=([\mathbf{x}^{\prime}]_{\mathcal{A}^{\prime}})^{T}[\mathbf{x}]_{\mathcal{A}}\qquad\forall\mathbf{x}\in X,\ \forall\mathbf{x}^{\prime}\in X^{\prime},

i.e. instead of working with $\mathbf{x}\in X$ and $\mathbf{x}^{\prime}\in X^{\prime}$ we can work with the columns of their coordinates (in the dual bases $\mathcal{A}$ and $\mathcal{A}^{\prime}$ respectively) in absolutely the same way as we do in the case of $\mathbb{F}^{n}$. Of course, the same works for $Y$, so working with columns of coordinates and then translating everything back to the abstract setting, we get that the dual transformation exists and is unique in this case as well.

Moreover, using the fact (which we just proved) that for $A:\mathbb{F}^{n}\to\mathbb{F}^{m}$ the matrix of $A^{\prime}$ is $A^{T}$, we get

(8.3.1) [A^{\prime}]_{\mathcal{A}^{\prime},\mathcal{B}^{\prime}}=([A]_{\mathcal{B},\mathcal{A}})^{T},

or in plain English

The matrix of the dual transformation in the dual bases is the transpose of the matrix of the transformation in the original bases.

Remark 8.3.2.

Note that while we used bases to construct the dual transformation, the resulting transformation does not depend on the choice of bases.

A coordinate-free way to define the dual transformation

Let us now present another, more “highbrow” way of defining the dual of a linear transformation. Namely, for $\mathbf{x}\in X$, $\mathbf{y}^{\prime}\in Y^{\prime}$, let us fix for a moment $\mathbf{y}^{\prime}$ and treat the expression $\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\mathbf{y}^{\prime}(A\mathbf{x})$ as a function of $\mathbf{x}$. It is easy to see that this is a composition of two linear transformations (which ones?), and so it is a linear function of $\mathbf{x}$, i.e. a linear functional on $X$, i.e. an element of $X^{\prime}$.

Let us call this linear functional $B(\mathbf{y}^{\prime})$, to emphasize the fact that it depends on $\mathbf{y}^{\prime}$. Since we can do this for every $\mathbf{y}^{\prime}\in Y^{\prime}$, we can define the transformation $B:Y^{\prime}\to X^{\prime}$ such that

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},B(\mathbf{y}^{\prime})\rangle.

Our next step is to show that $B$ is a linear transformation. Note that since the transformation $B$ was defined in a rather indirect way, we cannot see immediately from the definition that it is linear. To show the linearity of $B$, take $\mathbf{y}^{\prime}_{1},\mathbf{y}^{\prime}_{2}\in Y^{\prime}$ and scalars $\alpha,\beta$. For $\mathbf{x}\in X$,

\begin{aligned}
\langle\mathbf{x},B(\alpha\mathbf{y}^{\prime}_{1}+\beta\mathbf{y}^{\prime}_{2})\rangle
&=\langle A\mathbf{x},\alpha\mathbf{y}^{\prime}_{1}+\beta\mathbf{y}^{\prime}_{2}\rangle &&\text{by the definition of }B\\
&=\alpha\langle A\mathbf{x},\mathbf{y}^{\prime}_{1}\rangle+\beta\langle A\mathbf{x},\mathbf{y}^{\prime}_{2}\rangle &&\text{by linearity}\\
&=\alpha\langle\mathbf{x},B(\mathbf{y}^{\prime}_{1})\rangle+\beta\langle\mathbf{x},B(\mathbf{y}^{\prime}_{2})\rangle &&\text{by the definition of }B\\
&=\langle\mathbf{x},\alpha B(\mathbf{y}^{\prime}_{1})+\beta B(\mathbf{y}^{\prime}_{2})\rangle &&\text{by linearity.}
\end{aligned}

Since this identity is true for all $\mathbf{x}$, we conclude that $B(\alpha\mathbf{y}^{\prime}_{1}+\beta\mathbf{y}^{\prime}_{2})=\alpha B(\mathbf{y}^{\prime}_{1})+\beta B(\mathbf{y}^{\prime}_{2})$, i.e. that $B$ is linear.

The main advantage of this approach is that it does not require a basis, so it can be (and is) used in the infinite-dimensional situation. However, the proof presented above in Section 8.3.1 gives a constructive way to compute the dual transformation, so we used that proof instead of the more general coordinate-free one.

Remark 8.3.3.

Note that the above coordinate-free approach can be used to define the Hermitian adjoint of an operator in an inner product space. The only addition to the reasoning presented above is the use of the Riesz representation theorem (Theorem 8.2.1). We leave the details as an exercise to the reader; see Problem 8.3.2 below.

8.3.2. Annihilators and relations between fundamental subspaces

Definition 8.3.4.

Let $X$ be a vector space and let $E\subset X$. The annihilator of $E$, denoted by $E^{\perp}$, is the set of all $\mathbf{x}^{\prime}\in X^{\prime}$ such that $\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=0$ for all $\mathbf{x}\in E$.

Using the fact that $X^{\prime\prime}$ is canonically isomorphic to $X$ (see Section 8.1.2), we say that for $E\subset X^{\prime}$ its annihilator $E^{\perp}$ consists of all vectors $\mathbf{x}\in X$ such that $\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=0$ for all $\mathbf{x}^{\prime}\in E$.

Remark 8.3.5.

Formally speaking, for $E\subset X^{\prime}$ the set $E^{\perp}$ should be defined as the set of all $\mathbf{x}^{\prime\prime}\in X^{\prime\prime}$ such that $\langle\mathbf{x}^{\prime},\mathbf{x}^{\prime\prime}\rangle=0$ for all $\mathbf{x}^{\prime}\in E$; the symbol $E_{\perp}$ is often used for the annihilator from the second part of Definition 8.3.4. However, because of the natural isomorphism of $X^{\prime\prime}$ and $X$, there is no real difference between these two cases, so we will always use $E^{\perp}$.

Distinguishing the cases $E\subset X$ and $E\subset X^{\prime}$ makes a lot of sense in the infinite-dimensional situation, where $X^{\prime\prime}$ is not always canonically isomorphic to $X$.

The spaces for which $X^{\prime\prime}$ is canonically isomorphic to $X$ have a special name: they are called reflexive spaces.

Proposition 8.3.6.

Let $E$ be a subspace of $X$. Then $(E^{\perp})^{\perp}=E$.

This proposition looks exactly like Proposition 5.3.6 from Chapter 5. However, its proof is a bit more complicated, since the suggested proof of Proposition 5.3.6 from Chapter 5 heavily used the inner product space structure: it used the decomposition $X=E\oplus E^{\perp}$, which does not hold in our situation because, for example, $E$ and $E^{\perp}$ now live in different spaces.

Proof.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ be a basis in $E$ (recall that all spaces in this chapter are assumed to be finite-dimensional), so $E=\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}$.

By Proposition 2.5.4 from Chapter 2, the system can be extended to a basis of all of $X$, i.e. one can find vectors $\mathbf{v}_{r+1},\ldots,\mathbf{v}_{n}$ ($n=\dim X$) such that $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ is a basis in $X$.

Let $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{n}$ be the dual basis to $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$. By Exercise 8.3.3 below, $E^{\perp}=\operatorname{span}\{\mathbf{v}^{\prime}_{r+1},\ldots,\mathbf{v}^{\prime}_{n}\}$. Applying this exercise again, now to $E^{\perp}$, we get that

(E^{\perp})^{\perp}=\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}=E. ∎

The following theorem is analogous to Theorem 5.5.1 from Chapter 5.

Theorem 8.3.7.

Let $A:X\to Y$ be an operator acting from one vector space to another. Then

a) $\operatorname{Ker}A^{\prime}=(\operatorname{Ran}A)^{\perp}$;

b) $\operatorname{Ker}A=(\operatorname{Ran}A^{\prime})^{\perp}$;

c) $\operatorname{Ran}A=(\operatorname{Ker}A^{\prime})^{\perp}$;

d) $\operatorname{Ran}A^{\prime}=(\operatorname{Ker}A)^{\perp}$.

Proof.

First of all, let us notice that since for a subspace $E$ we have $(E^{\perp})^{\perp}=E$, statements a) and c) are equivalent. Similarly, for the same reason, statements b) and d) are equivalent as well. Finally, statement b) is exactly statement a) applied to the operator $A^{\prime}$ (here we use the trivial fact that $(A^{\prime})^{\prime}=A$, which is true, for example, because of the corresponding fact for the transpose).

So, to prove the theorem we only need to prove statement a).

Recall that $A^{\prime}:Y^{\prime}\to X^{\prime}$. The inclusion $\mathbf{y}^{\prime}\in(\operatorname{Ran}A)^{\perp}$ means that $\mathbf{y}^{\prime}$ annihilates all vectors of the form $A\mathbf{x}$, i.e. that

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=0\qquad\forall\mathbf{x}\in X.

Since $\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},A^{\prime}\mathbf{y}^{\prime}\rangle$, the last identity is equivalent to

\langle\mathbf{x},A^{\prime}\mathbf{y}^{\prime}\rangle=0\qquad\forall\mathbf{x}\in X.

But that means that $A^{\prime}\mathbf{y}^{\prime}=\mathbf{0}$ ($A^{\prime}\mathbf{y}^{\prime}$ is the zero functional).

So we have proved that $\mathbf{y}^{\prime}\in(\operatorname{Ran}A)^{\perp}$ iff $A^{\prime}\mathbf{y}^{\prime}=\mathbf{0}$, or equivalently iff $\mathbf{y}^{\prime}\in\operatorname{Ker}A^{\prime}$. ∎
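Statement a) is easy to check numerically (a sketch assuming numpy; the null space is computed via the SVD, and the matrix is randomly generated, so it generically has full column rank):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 3))          # A : R^3 -> R^4

    def null_space(M, tol=1e-10):
        # columns form a basis of Ker M; the last rows of Vt span the null space
        U, s, Vt = np.linalg.svd(M)
        rank = int(np.sum(s > tol))
        return Vt[rank:].T

    N = null_space(A.T)                  # columns span Ker A' = Ker(A^T)
    # every functional in Ker A' annihilates Ran A, i.e. kills every vector A x:
    assert np.allclose(N.T @ A, 0)
    # dimensions are consistent with Ker A' = (Ran A)^perp:
    assert N.shape[1] + np.linalg.matrix_rank(A) == A.shape[0]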

Exercises.

8.3.1.

Prove that if for linear transformations $T,T_{1}:X\to Y$

\langle T\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle T_{1}\mathbf{x},\mathbf{y}^{\prime}\rangle

for all $\mathbf{x}\in X$ and for all $\mathbf{y}^{\prime}\in Y^{\prime}$, then $T=T_{1}$.

Probably one of the easiest ways of proving this is to use Lemma 8.1.3.

8.3.2.

Combine the Riesz Representation Theorem (Theorem 8.2.1) with the reasoning in Section 8.3.1 above to present a coordinate-free definition of the Hermitian adjoint of an operator in an inner product space.

The next problem gives a way to prove Proposition 8.3.6.

8.3.3.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ be a basis in $X$ and let $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{n}$ be its dual basis. Let $E:=\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}$, $r<n$. Prove that $E^{\perp}=\operatorname{span}\{\mathbf{v}^{\prime}_{r+1},\ldots,\mathbf{v}^{\prime}_{n}\}$.

8.3.4.

Use the previous problem to prove that for a subspace $E\subset X$

\dim E+\dim E^{\perp}=\dim X.

8.4. What is the difference between a space and its dual?

We know that the dual space $X^{\prime}$ has the same dimension as $X$, so the space and its dual are isomorphic. One might therefore think that there is really no difference between the space and its dual. However, as we discussed above in Section 8.1.1, when we change the basis in the space $X$, the coordinates in $X$ and in $X^{\prime}$ change according to different rules; see formula (8.1.1) above.

On the other hand, using the natural isomorphism of $X$ and $X^{\prime\prime}$, we can say that $X$ is the dual of $X^{\prime}$. From this point of view, there is no difference between $X$ and $X^{\prime}$: we can start from $X$ and say that $X^{\prime}$ is its dual, or we can do it the other way around and start from $X^{\prime}$.

We already used this point of view above, for example in the proof of Theorem 8.3.7.

Note also that the change of coordinates formula (8.1.1) (see also the boxed statement below it) agrees with this point of view: if $\widetilde{S}:=(S^{-1})^{T}$, then $(\widetilde{S}^{-1})^{T}=S$, so we get the change of coordinates formula in $X$ from the one in $X^{\prime}$ by the same rule!

8.4.1. Isomorphisms between $X$ and $X^{\prime}$

There are infinitely many possibilities to define an isomorphism between $X$ and $X^{\prime}$.

If $X=\mathbb{F}^{n}$, then the most natural way to identify $X$ and $X^{\prime}$ is to identify the standard basis in $\mathbb{F}^{n}$ with the one in $(\mathbb{F}^{n})^{\prime}$. In this case the action of a linear functional is given by the “inner product type” expression

\langle\mathbf{v},\mathbf{v}^{\prime}\rangle=(\mathbf{v}^{\prime})^{T}\mathbf{v}.

To generalize this to the general case, one has to fix a basis $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ in $X$, consider the dual basis $\mathcal{B}^{\prime}=\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$, and define an isomorphism $T:X\to X^{\prime}$ by $T\mathbf{b}_{k}=\mathbf{b}^{\prime}_{k}$, $k=1,2,\ldots,n$.

This isomorphism is natural in some sense, but it depends on the choice of the basis, so in general there is no natural way to identify $X$ and $X^{\prime}$.

The exception is the case when $X$ is a real inner product space: the Riesz representation theorem (Theorem 8.2.1) gives a natural way to identify a linear functional with a vector in $X$. Note that this approach works only for real inner product spaces: in the complex case, the Riesz representation theorem still gives a natural identification of $X$ and $X^{\prime}$, but this identification is not linear, only conjugate linear.

8.4.2. An example: velocities (differential operators) and differential forms as vectors and linear functionals

To illustrate the relations between vectors and linear functionals, let us consider an example from multivariable calculus, which gives rise to important ideas like tangent and cotangent bundles in differential geometry.

Let us recall the notion of the path integral (of the second kind) from calculus. Recall that a path $\gamma$ in $\mathbb{R}^{n}$ is defined by its parameterization, i.e. by a function

t\mapsto\mathbf{x}(t)=(x_{1}(t),x_{2}(t),\ldots,x_{n}(t))^{T}

acting from an interval $[a,b]$ to $\mathbb{R}^{n}$. If $\omega$ is a so-called differential form (differential $1$-form),

\omega=f_{1}(\mathbf{x})\,dx_{1}+f_{2}(\mathbf{x})\,dx_{2}+\ldots+f_{n}(\mathbf{x})\,dx_{n},

the path integral

\int_{\gamma}\omega=\int_{\gamma}f_{1}\,dx_{1}+f_{2}\,dx_{2}+\ldots+f_{n}\,dx_{n}

is computed by substituting $\mathbf{x}(t)=(x_{1}(t),x_{2}(t),\ldots,x_{n}(t))^{T}$ into the expression, i.e. $\int_{\gamma}\omega$ is computed as

\int_{a}^{b}\left(f_{1}(\mathbf{x}(t))\frac{dx_{1}(t)}{dt}+f_{2}(\mathbf{x}(t))\frac{dx_{2}(t)}{dt}+\ldots+f_{n}(\mathbf{x}(t))\frac{dx_{n}(t)}{dt}\right)dt.

In other words, at each moment $t$ we have to evaluate the velocity

\mathbf{v}=\frac{d\mathbf{x}(t)}{dt}=\left(\frac{dx_{1}(t)}{dt},\frac{dx_{2}(t)}{dt},\ldots,\frac{dx_{n}(t)}{dt}\right)^{T},

apply to it the linear functional $\mathbf{f}=(f_{1},f_{2},\ldots,f_{n})$, $\mathbf{f}(\mathbf{v})=\sum_{k=1}^{n}f_{k}v_{k}$ (here $f_{k}=f_{k}(\mathbf{x}(t))$, but for a fixed $t$ each $f_{k}$ is just a number, so we simply write $f_{k}$), and then integrate the result (which depends on $t$) with respect to $t$.

Velocities as vectors

Let us fix $t$ and analyze $\mathbf{f}(\mathbf{v})$. We will show that, according to the rules of calculus, the coordinates of $\mathbf{v}$ change as coordinates of a vector, and the coordinates of $\mathbf{f}$ as the coordinates of a linear functional (covector). Let us assume, as is customary in calculus, that the $x_{k}$ are the coordinates in the standard basis in $\mathbb{R}^{n}$, and let $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ be a different basis in $\mathbb{R}^{n}$. We will use the notation $\widetilde{x}_{k}$ for the coordinates of a vector $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}$ in the basis $\mathcal{B}$, i.e. $[\mathbf{x}]_{\mathcal{B}}=(\widetilde{x}_{1},\widetilde{x}_{2},\ldots,\widetilde{x}_{n})^{T}$.

Let $A=\{a_{k,j}\}_{k,j=1}^{n}$ be the change of coordinates matrix, $A=[I]_{\mathcal{B},\mathcal{S}}$, so the new coordinates $\widetilde{x}_{k}$ are expressed in terms of the old ones as

\widetilde{x}_{k}=\sum_{j=1}^{n}a_{k,j}x_{j},\qquad k=1,2,\ldots,n.

So the new coordinates $\widetilde{v}_{k}$ of the vector $\mathbf{v}$ are obtained from its old coordinates $v_{k}$ as

\widetilde{v}_{k}=\sum_{j=1}^{n}a_{k,j}v_{j},\qquad k=1,2,\ldots,n.

Differential forms as linear functionals (covectors)

Let us now calculate the differential form

(8.4.1) \omega=\sum_{k=1}^{n}f_{k}\,dx_{k}

in terms of the new coordinates $\widetilde{x}_{k}$. The change of coordinates matrix from the new coordinates to the old ones is $A^{-1}$. Let $A^{-1}=\{\widetilde{a}_{k,j}\}_{k,j=1}^{n}$, so

x_{k}=\sum_{j=1}^{n}\widetilde{a}_{k,j}\widetilde{x}_{j},\quad\text{and}\quad dx_{k}=\sum_{j=1}^{n}\widetilde{a}_{k,j}\,d\widetilde{x}_{j},\qquad k=1,2,\ldots,n.

Substituting this into (8.4.1) we get

\omega=\sum_{k=1}^{n}f_{k}\sum_{j=1}^{n}\widetilde{a}_{k,j}\,d\widetilde{x}_{j}=\sum_{j=1}^{n}\Bigl(\sum_{k=1}^{n}\widetilde{a}_{k,j}f_{k}\Bigr)d\widetilde{x}_{j}=\sum_{j=1}^{n}\widetilde{f}_{j}\,d\widetilde{x}_{j},

where

\widetilde{f}_{j}=\sum_{k=1}^{n}\widetilde{a}_{k,j}f_{k}.

But that is exactly the change of coordinate rule for the dual space! So

According to the rules of calculus, the coefficients of a differential $1$-form change by the same rule as coordinates in the dual space.

So, according to the accepted rules of calculus, the coordinates of a velocity $\mathbf{v}$ change as coordinates of a vector, and the coefficients (coordinates) of a differential $1$-form change as the entries of a linear functional. In differential geometry, the set of all velocities (at a given point) is called the tangent space, and the set of all differential $1$-forms is its dual and is called the cotangent space.

Differential operators as vectors

As we discussed above, in differential geometry vectors are represented by velocities, i.e. by the derivatives $d\mathbf{x}(t)/dt$. This is a simple and intuitively clear point of view, but sometimes it is viewed as a bit naïve.

A more “highbrow” point of view, also used in differential geometry (although in more advanced texts), is that vectors are represented by differential operators

(8.4.2) D=\sum_{k}v_{k}\frac{\partial}{\partial x_{k}}.

The informal reason for this is the following. Suppose we want to compute the derivative of a function $\Phi$ along the path given by the function $t\mapsto\mathbf{x}(t)$, i.e. the derivative

\frac{d\Phi(\mathbf{x}(t))}{dt}.

By the chain rule, at a given time $t$

\frac{d\Phi(\mathbf{x}(t))}{dt}=\sum_{k=1}^{n}\left(\frac{\partial\Phi}{\partial x_{k}}\Big|_{\mathbf{x}=\mathbf{x}(t)}\right)x_{k}^{\prime}(t)=D\Phi\Big|_{\mathbf{x}=\mathbf{x}(t)},

where the differential operator $D$ is given by (8.4.2) with $v_{k}=x_{k}^{\prime}(t)$.
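This identity is easy to verify symbolically (a sketch assuming Python with sympy; the path and the function $\Phi$ below are chosen arbitrarily for illustration):

    import sympy as sp

    t = sp.symbols('t')
    X1, X2 = sp.symbols('x1 x2')
    x1, x2 = sp.cos(t), sp.sin(t)            # a path x(t) in R^2
    Phi = X1**2 + X1*X2                      # a function Phi(x1, x2)

    # left side: d/dt of Phi(x(t))
    lhs = sp.diff(Phi.subs({X1: x1, X2: x2}), t)

    # right side: (D Phi)|_{x=x(t)} with D = sum_k v_k d/dx_k, v_k = x_k'(t)
    rhs = (sp.diff(Phi, X1) * sp.diff(x1, t) +
           sp.diff(Phi, X2) * sp.diff(x2, t)).subs({X1: x1, X2: x2})

    assert sp.simplify(lhs - rhs) == 0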

Of course, we need to show that the coefficients $v_{k}$ of such a differential operator change according to the change of coordinates rule for vectors. This is intuitively clear, and can easily be shown using the multivariable chain rule. We leave this as an exercise for the reader; see Problem 8.4.1 below.

8.4.3. The case of a real inner product space

As we already discussed above, it follows from the Riesz representation theorem (Theorem 8.2.1) that a real inner product space $X$ and its dual $X^{\prime}$ are canonically isomorphic. Thus we can say that vectors and functionals live in the same space, which makes things both simpler and more confusing.

Remark.

First of all, let us note that if the change of coordinates matrix $S$ is orthogonal ($S^{-1}=S^{T}$), then $(S^{-1})^{T}=S$. Therefore, for an orthogonal change of coordinates matrix, the coordinates of a vector and of a linear functional change according to the same rule, so one cannot really see a difference between a vector and a functional.

The change of coordinates matrix is orthogonal, for example, if we change from one orthonormal basis to another.

Einstein notation, metric tensor

Let $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ be a basis in a real inner product space $X$ and let $\mathcal{B}^{\prime}=\{\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\}$ be the dual basis (we identify the dual space $X^{\prime}$ with $X$ via the Riesz representation theorem, so the $\mathbf{b}^{\prime}_{k}$ can be assumed to be in $X$).

Here we present the notation standard in differential geometry (the so-called Einstein notation) for working with coordinates in these bases. Since we will only be working with coordinates, we can assume that we are working in the space $\mathbb{R}^{n}$ with the non-standard inner product $(\,\cdot\,,\,\cdot\,)_{G}$ defined by the positive definite matrix $G=\{g_{j,k}\}_{j,k=1}^{n}$, $g_{j,k}=(\mathbf{b}_{k},\mathbf{b}_{j})_{X}$, which is often called the metric tensor:

(8.4.3) (\mathbf{x},\mathbf{y})=(\mathbf{x},\mathbf{y})_{G}=\sum_{j=1}^{n}\sum_{k=1}^{n}g_{j,k}x_{j}y_{k},\qquad\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}

(see Section 7.5 in Chapter 7).

To distinguish between vectors and linear functionals (covectors), it is agreed to write the coordinates of a vector with indices as superscripts and the coordinates of a linear functional with indices as subscripts: thus $x^{j}$, $j=1,2,\ldots,n$, denotes the coordinates of a vector $\mathbf{x}$, and $f_{k}$, $k=1,2,\ldots,n$, denotes the coordinates of a linear functional $\mathbf{f}$.

Remark.

Putting indices as superscripts can be confusing, since one needs to distinguish them from powers. However, this is a standard and widely used notation, so we need to get acquainted with it. While I personally, like a lot of mathematicians, prefer coordinate-free notation, all final computations are done in coordinates, so coordinate notation has to be used. And as far as coordinate notations go, you will see that this one is quite convenient to work with.

Another convention of the Einstein notation is that whenever the same index appears in a product both as a subscript and as a superscript, one sums over this index. Thus $x^{j}f_{j}$ means $\sum_{j}x^{j}f_{j}$, so we can write $\mathbf{f}(\mathbf{x})=x^{j}f_{j}$. The same convention holds when we have more than one index of summation, so (8.4.3) can be rewritten in this notation as

(8.4.4) (\mathbf{x},\mathbf{y})=g_{j,k}x^{k}y^{j},\qquad\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}

(mathematicians are lazy and are always trying to avoid writing extra symbols, whenever they can).

Finally, the last convention of the Einstein notation is the preservation of the position of the indices: if we do not sum over an index, it remains in the same position (subscript or superscript) as before. Thus we can write $y^{j}=a^{j}_{k}x^{k}$, but not $f_{j}=a^{j}_{k}x^{k}$, because the index $j$ must remain a superscript.

Note that to compute the inner product of two vectors, knowing their coordinates is not sufficient: one also needs to know the metric tensor $G$. This agrees with the Einstein notation: if we try to write $(\mathbf{x},\mathbf{y})$ as the standard inner product, the expression $x_{k}y_{k}$ means just the product of coordinates, since the summation convention requires the same index to appear both as a subscript and as a superscript. The expression (8.4.4), on the other hand, fits this convention perfectly.

Covariant and contravariant coordinates. Lowering and raising the indices

Let us recall that we have a basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} in a real inner product space, and that 𝐛1,𝐛2,,𝐛n\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}, 𝐛kX\mathbf{b}^{\prime}_{k}\in X is its dual basis (we identify XX with its dual XX^{\prime} via Riesz Representation Theorem, so 𝐛k\mathbf{b}_{k}^{\prime} are in XX).

Given a vector 𝐱X\mathbf{x}\in X it can be represented as

(8.4.5) 𝐱\displaystyle\mathbf{x} =k=1n(𝐱,𝐛k)𝐛k=:k=1nxk𝐛k,and as\displaystyle=\sum_{k=1}^{n}(\mathbf{x},\mathbf{b}^{\prime}_{k})\mathbf{b}_{k}% =:\sum_{k=1}^{n}x^{k}\mathbf{b}_{k},\qquad\text{and as}
(8.4.6) 𝐱\displaystyle\mathbf{x} =k=1n(𝐱,𝐛k)𝐛k=:k=1nxk𝐛k.\displaystyle=\sum_{k=1}^{n}(\mathbf{x},\mathbf{b}_{k})\mathbf{b}^{\prime}_{k}% =:\sum_{k=1}^{n}x_{k}\mathbf{b}^{\prime}_{k}.

The coordinates xkx_{k} are called the covariant coordinates of the vector 𝐱\mathbf{x} and the coordinates xkx^{k} are called the contravariant coordinates.

Now let us ask ourselves a question: how can one get covariant coordinates of a vector from the contravariant ones?

According to the Einstein notation, we use the contravariant coordinates when working with vectors, and the covariant ones for linear functionals (i.e. when we interpret a vector \mathbf{x}\in X as a linear functional). We know (see (8.4.6)) that x_k=(\mathbf{x},\mathbf{b}_k), so

xk=(𝐱,𝐛k)=(jxj𝐛j,𝐛k)=jxj(𝐛j,𝐛k)=jgk,jxj,x_{k}=(\mathbf{x},\mathbf{b}_{k})=\Bigl{(}\sum_{j}x^{j}\mathbf{b}_{j},\mathbf{% b}_{k}\Bigr{)}=\sum_{j}x^{j}(\mathbf{b}_{j},\mathbf{b}_{k})=\sum_{j}g_{k,j}x^{% j},

or in the Einstein notation

xk=gk,jxj.x_{k}=g_{k,j}x^{j}.

In other words,

the metric tensor GG is the change of coordinates matrix from contravariant coordinates xkx^{k} to the covariant ones xkx_{k}.

The operation of passing from contravariant coordinates to covariant ones is called lowering the indices.

Note the following interpretation of the formula (8.4.4) for the inner product: as we know, for the vector \mathbf{x} we get its covariant coordinates as x_j=g_{j,k}x^k, so (\mathbf{x},\mathbf{y})=x_j y^j. Similarly, because G is symmetric, we can say that y_k=g_{j,k}y^j and that (\mathbf{x},\mathbf{y})=x^k y_k. In other words,

To compute the inner product of two vectors, one first needs to use the metric tensor GG to lower indices of one vector, and then, treating this vector as a functional compute its value on the other vector.

Of course, we can also change from covariant coordinates xjx_{j} to contravariant ones xjx^{j} (raise the indices). Since

(x1,x2,,xn)T=G(x1,x2,,xn)T,(x_{1},x_{2},\ldots,x_{n})^{T}=G(x^{1},x^{2},\ldots,x^{n})^{T},

we get that

(x1,x2,,xn)T=G1(x1,x2,,xn)T(x^{1},x^{2},\ldots,x^{n})^{T}=G^{-1}(x_{1},x_{2},\ldots,x_{n})^{T}

so the change of coordinate matrix in this case is G1G^{-1}.

Since, as we just saw, the change of coordinates matrix from contravariant to covariant coordinates is exactly the metric tensor, we can immediately conclude that G^{-1} is the metric tensor in covariant coordinates, i.e. that if G^{-1}=\{g^{k,j}\}_{k,j=1}^n then

(𝐱,𝐲)=gk,jxjyk.(\mathbf{x},\mathbf{y})=g^{k,j}x_{j}y_{k}.
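Here is a short numerical sketch of lowering and raising the indices (again an added illustration with an assumed metric G, using numpy): lowering with G and raising with G^{-1} are inverse operations, and the inner product can be computed as x_k y^k.

import numpy as np

G = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # metric tensor g_{k,j} (hypothetical example)
G_inv = np.linalg.inv(G)          # inverse metric g^{k,j}

x_contra = np.array([1.0, 2.0])   # contravariant coordinates x^j

x_cov = np.einsum('kj,j->k', G, x_contra)      # lowering: x_k = g_{k,j} x^j
x_back = np.einsum('kj,j->k', G_inv, x_cov)    # raising:  x^k = g^{k,j} x_j

assert np.allclose(x_back, x_contra)           # the two operations are inverse

# the inner product as x_k y^k, cf. (8.4.4) and the discussion above
y = np.array([3.0, -1.0])
assert np.isclose(np.einsum('k,k->', x_cov, y), y @ G @ x_contra)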
Remark.

Note that if one looks at the big picture, the covariant and contravariant coordinates are completely interchangeable: it is just a matter of which of the two bases in the dual pair \mathcal{B}, \mathcal{B}' we assign to be the “primary” one and which to be the dual.

What to choose as the “primary” object, and what as the “dual” one, depends mostly on accepted conventions.

Remark 8.4.1.

Einstein notation is usually used in differential, and especially Riemannian, geometry, where vectors are identified with velocities and covectors (linear functionals) with differential 1-forms, see Section 8.4.2 above. Vectors and covectors there are clearly different objects and form what are called the tangent and cotangent spaces respectively.

In Riemannian geometry one then introduces an inner product (i.e. a metric tensor, if one thinks in terms of coordinates) on the tangent space, which allows us to identify vectors and covectors (linear functionals). In the coordinate representation this identification is done by lowering/raising the indices, as described above.

8.4.4. Conclusions

Let us summarize the above discussion on whether or not a space is different from its dual.

In short, the answer is “Yes”, they are different objects. Although in the finite-dimensional case, which is treated in this book, they are isomorphic, nothing is usually gained from the identification of a space and its dual.

Even in the simplest case of \mathbb{F}^n it is useful to think that the elements of \mathbb{F}^n are columns and the elements of its dual are rows (even though, when doing manipulations with the elements of the dual space, we often put the rows vertically). More striking are the examples considered in Section 8.1.4, dealing with the Taylor formula and with Lagrange interpolation. One can clearly see there that linear functionals are indeed completely different objects than polynomials, and that hardly anything can be gained by identifying functionals with polynomials.

For inner product spaces the situation is different, because such spaces can be canonically identified with their duals. This identification is linear for real inner product spaces, so a real inner product space is canonically isomorphic to its dual. In the case of complex spaces, this identification is only conjugate linear, but it is nevertheless very helpful to identify a linear functional with a vector and use the inner product space structure and ideas like orthogonality, self-adjointness, orthogonal projections, etc.

However, sometimes even in the case of real inner product spaces it is more natural to consider the space and its dual as different objects. For example, in Riemannian geometry (see Remark 8.4.1 above) vectors and covectors come from different objects, velocities and differential 1-forms respectively. Even though the introduction of the metric tensor allows us to identify vectors and covectors, it is sometimes more convenient to remember their origins and think of them as different objects.

Exercises.

8.4.1.

Let DD be a differential operator

D=k=1nvkxk.D=\sum_{k=1}^{n}v_{k}\frac{\partial}{\partial x_{k}}.

Show, using the chain rule, that if we change a basis and write DD in new coordinates, its coefficients vkv_{k} change according to the change of coordinates rule for vectors.

8.5. Multilinear functions. Tensors

8.5.1. Multilinear functions

Definition 8.5.1.

Let V1,V2,,Vp,VV_{1},V_{2},\ldots,V_{p},V be vector spaces (over the same field 𝔽\mathbb{F}). A multilinear (pp-linear) map with values in VV is a function FF of pp vector variables 𝐯1,𝐯2,,𝐯p\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}, 𝐯kVk\mathbf{v}_{k}\in V_{k}, with the target space VV, which is linear in each variable 𝐯k\mathbf{v}_{k}. In other words, it means that if we fix all variables except 𝐯k\mathbf{v}_{k} we get a linear map, and this should be true for all k=1,2,,pk=1,2,\ldots,p. We will use the symbol L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V) for the set of all such multilinear functions.

If the target space V is the field of scalars \mathbb{F}, we call F a multilinear functional, or tensor. The number p is called the valency of the multilinear functional (tensor). Thus, a tensor of valency 1 is a linear functional, and a tensor of valency 2 is called a bilinear form.

Example.

Let 𝐟k(Vk)\mathbf{f}_{k}\in(V_{k})^{\prime}. Define a polylinear functional F=𝐟1𝐟2𝐟pF=\mathbf{f}_{1}\otimes\mathbf{f}_{2}\otimes\ldots\otimes\mathbf{f}_{p} by multiplying the functionals 𝐟k\mathbf{f}_{k},

(8.5.1) 𝐟1𝐟2𝐟p(𝐯1,𝐯2,,𝐯p)=𝐟1(𝐯1)𝐟2(𝐯2)𝐟p(𝐯p),\mathbf{f}_{1}\otimes\mathbf{f}_{2}\otimes\ldots\otimes\mathbf{f}_{p}(\mathbf{% v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})=\mathbf{f}_{1}(\mathbf{v}_{1})% \mathbf{f}_{2}(\mathbf{v}_{2})\ldots\mathbf{f}_{p}(\mathbf{v}_{p}),

for 𝐯kVk\mathbf{v}_{k}\in V_{k}, k=1,2,,pk=1,2,\ldots,p. The polylinear functional 𝐟1𝐟2𝐟p\mathbf{f}_{1}\otimes\mathbf{f}_{2}\otimes\ldots\otimes\mathbf{f}_{p} is called the tensor product of functionals 𝐟k\mathbf{f}_{k}.
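A small numerical sketch of the tensor product of functionals (an added illustration, assuming numpy; here p = 2 and the functionals are represented by their coefficient rows, so f(v) is the usual row-by-column product):

import numpy as np

f1 = np.array([1.0, 2.0])         # a functional on V_1 = R^2
f2 = np.array([0.0, 1.0, -1.0])   # a functional on V_2 = R^3

v1 = np.array([3.0, 1.0])
v2 = np.array([1.0, 1.0, 2.0])

# (f1 ⊗ f2)(v1, v2) = f1(v1) f2(v2), as in (8.5.1)
lhs = (f1 @ v1) * (f2 @ v2)

# equivalently, contract the array F[j,k] = f1[j] f2[k] with both vectors
F = np.outer(f1, f2)
rhs = np.einsum('jk,j,k->', F, v1, v2)

assert np.isclose(lhs, rhs)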

Multilinear functions form a vector space

Notice, that in the space L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V) one can introduce the natural operations of addition and multiplication by a scalar,

(F1+F2)(𝐯1,𝐯2,,𝐯p)\displaystyle(F_{1}+F_{2})(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}) :=F1(𝐯1,𝐯2,,𝐯p)+F2(𝐯1,𝐯2,,𝐯p),\displaystyle:=F_{1}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})+F_{2% }(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}),
(αF1)(𝐯1,𝐯2,,𝐯p)\displaystyle(\alpha F_{1})(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}) :=αF1(𝐯1,𝐯2,,𝐯p),\displaystyle:=\alpha F_{1}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p% }),

where F1,F2L(V1,V2,,Vp;V)F_{1},F_{2}\in L(V_{1},V_{2},\ldots,V_{p};V), α𝔽\alpha\in\mathbb{F}.

Equipped with these operations, the space L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V) is a vector space.

To see that, we first need to show that F_1+F_2 and \alpha F_1 are multilinear functions. Since “multilinear” means linear in each argument separately (with all the other variables fixed), this follows from the corresponding fact about linear transformations, namely from the fact that the sum of linear transformations and a scalar multiple of a linear transformation are linear transformations, cf. Section 1.4 of Chapter 1.

Then it is easy to show that L(V_1,V_2,\ldots,V_p;V) satisfies all the axioms of a vector space; one just needs to use the fact that V satisfies these axioms. We leave the details as an exercise for the reader, who can look at Section 1.4 of Chapter 1, where it was shown that the set of linear transformations satisfies axiom 7. Literally the same proof works for multilinear functions; the proof that all the other axioms are also satisfied is very similar.

Dimension of L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V)

Let 1,2,,p\mathcal{B}_{1},\mathcal{B}_{2},\ldots,\mathcal{B}_{p} be bases in the spaces V1,V2,,VpV_{1},V_{2},\ldots,V_{p} respectively. Since a linear transformation is defined by its action on a basis, a multilinear function FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) is defined by its values on all tuples

𝐛j11,𝐛j22,,𝐛jpp,𝐛jkkk.\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}},% \qquad\mathbf{b}^{k}_{j_{k}}\in\mathcal{B}_{k}.

Since there are exactly

(dimV1)(dimV2)(dimVp)(\dim V_{1})(\dim V_{2})\ldots(\dim V_{p})

such tuples, and each F(\mathbf{b}^1_{j_1},\mathbf{b}^2_{j_2},\ldots,\mathbf{b}^p_{j_p}) is determined by \dim V coordinates (in some basis in V), we can conclude that F\in L(V_1,V_2,\ldots,V_p;V) is determined by (\dim V_1)(\dim V_2)\ldots(\dim V_p)(\dim V) entries. In other words,

dimL(V1,V2,,Vp;V)=(dimV1)(dimV2)(dimVp)(dimV).\dim L(V_{1},V_{2},\ldots,V_{p};V)=(\dim V_{1})(\dim V_{2})\ldots(\dim V_{p})(% \dim V).

In particular, if the target space is the field of scalars \mathbb{F} (i.e. if we are dealing with multilinear functionals),

dimL(V1,V2,,Vp;𝔽)=(dimV1)(dimV2)(dimVp).\dim L(V_{1},V_{2},\ldots,V_{p};\mathbb{F})=(\dim V_{1})(\dim V_{2})\ldots(% \dim V_{p}).

It is easy to find a basis in L(V_1,V_2,\ldots,V_p;\mathbb{F}). Namely, for k=1,2,\ldots,p let the system \mathcal{B}_k=\{\mathbf{b}^k_j\}_{j=1}^{\dim V_k} be a basis in V_k and let \mathcal{B}_k'=\{\widetilde{\mathbf{b}}^k_j\}_{j=1}^{\dim V_k} be its dual system, \widetilde{\mathbf{b}}^k_j\in V_k'.

Proposition 8.5.2.

The system

𝐛~j11𝐛~j22𝐛~jpp,1jkdimVk,k=1,2,,p,\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{2}_{j_{2}}% \otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}},\qquad 1\leq j_{k}\leq% \dim V_{k},\quad k=1,2,\ldots,p,

is a basis in the space L(V1,V2,,Vp;𝔽)L(V_{1},V_{2},\ldots,V_{p};\mathbb{F}).

Here 𝐛~j11𝐛~j22𝐛~jpp\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{2}_{j_{2}}% \otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}} is the tensor product of functionals, as defined in (8.5.1).

Proof.

We want to represent FF as

(8.5.2) F=j1,j2,,jpαj1,j2,,jp𝐛~j11𝐛~j22𝐛~jppF=\sum_{j_{1},j_{2},\ldots,j_{p}}\alpha_{j_{1},j_{2},\ldots,j_{p}}\widetilde{% \mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{2}_{j_{2}}\otimes\ldots% \otimes\widetilde{\mathbf{b}}^{p}_{j_{p}}

Since 𝐛~j(𝐛l)=δj,l\widetilde{\mathbf{b}}_{j}(\mathbf{b}_{l})=\delta_{j,l}, we have

(8.5.3) 𝐛~j11𝐛~j22𝐛~jpp(𝐛j11,𝐛j22,,𝐛jpp)\displaystyle\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{% 2}_{j_{2}}\otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}}(\mathbf{b}^{1% }_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}}) =1and\displaystyle=1\qquad\text{and}
(8.5.4) 𝐛~j11𝐛~j22𝐛~jpp(𝐛j11,𝐛j22,,𝐛jpp)\displaystyle\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{% 2}_{j_{2}}\otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}}(\mathbf{b}^{1% }_{j^{\prime}_{1}},\mathbf{b}^{2}_{j^{\prime}_{2}},\ldots,\mathbf{b}^{p}_{j^{% \prime}_{p}}) =0\displaystyle=0

for any collection of indices j1,j2,,jpj^{\prime}_{1},j^{\prime}_{2},\ldots,j^{\prime}_{p} different from j1,j2,,jpj_{1},j_{2},\ldots,j_{p}.

Therefore, applying (8.5.2) to the tuple 𝐛j11,𝐛j22,,𝐛jpp\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}} we get

αj1,j2,,jp=F(𝐛j11,𝐛j22,,𝐛jpp),\alpha_{j_{1},j_{2},\ldots,j_{p}}=F(\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{% 2}},\ldots,\mathbf{b}^{p}_{j_{p}}),

so the representation (8.5.2) is unique (if it exists).

On the other hand, defining αj1,j2,,jp:=F(𝐛j11,𝐛j22,,𝐛jpp)\alpha_{j_{1},j_{2},\ldots,j_{p}}:=F(\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_% {2}},\ldots,\mathbf{b}^{p}_{j_{p}}) and using (8.5.3) and (8.5.4), we can see that the equality (8.5.2) holds on all tuples of form 𝐛j11,𝐛j22,,𝐛jpp\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}}. So decomposition (8.5.2) holds, so we indeed have a basis. ∎
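The proof can be checked numerically; here is a small sketch for p = 2 (an added illustration, assuming numpy and the standard bases in R^2 and R^3, so that the dual basis functionals are the coordinate functionals):

import numpy as np

rng = np.random.default_rng(1)
alpha = rng.standard_normal((2, 3))   # coefficients alpha_{j1,j2} in (8.5.2)

def F(v1, v2):
    # the bilinear functional built from the coefficients alpha
    return np.einsum('jk,j,k->', alpha, v1, v2)

# evaluating F on pairs of basis vectors recovers the coefficients,
# exactly as in the proof above
e = lambda n, j: np.eye(n)[j]
recovered = np.array([[F(e(2, j), e(3, k)) for k in range(3)]
                      for j in range(2)])
assert np.allclose(recovered, alpha)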

8.5.2. Tensor Products

Definition.

Let V1,V2,,VpV_{1},V_{2},\ldots,V_{p} be vector spaces. The tensor product

V1V2VpV_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}

of spaces VkV_{k} is simply the set L(V1,V2,,Vp;𝔽)L(V^{\prime}_{1},V^{\prime}_{2},\ldots,V^{\prime}_{p};\mathbb{F}) of multilinear functionals; here VkV_{k}^{\prime} is the dual of VkV_{k}.

Remark 8.5.3.

By Proposition 8.5.2 we get that if k={𝐛jk}j=1dimVk\mathcal{B}_{k}=\{\mathbf{b}^{k}_{j}\}_{j=1}^{\dim V_{k}} is a basis in VkV_{k} for k=1,2,,pk=1,2,\ldots,p, then the system

(8.5.5) 𝐛j11𝐛j22𝐛jpp,1jkdimVk,k=1,2,,p,\mathbf{b}^{1}_{j_{1}}\otimes\mathbf{b}^{2}_{j_{2}}\otimes\ldots\otimes\mathbf% {b}^{p}_{j_{p}},\qquad 1\leq j_{k}\leq\dim V_{k},\quad k=1,2,\ldots,p,

is a basis in V1V2VpV_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}.

Here we treat a vector \mathbf{v}_k\in V_k as a linear functional on V_k'; the tensor product of vectors \mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p is defined according to (8.5.1).

Remark.

The tensor product \mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p of vectors is clearly linear in each argument \mathbf{v}_k. In other words, the map (\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p)\mapsto\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p is a multilinear function with values in V_1\otimes V_2\otimes\ldots\otimes V_p. We leave the proof as an exercise for the reader, see Problem 8.5.1 below.

Remark.

Note that the set \{\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p : \mathbf{v}_k\in V_k\} of tensor products of vectors is a proper subset of V_1\otimes V_2\otimes\ldots\otimes V_p, see Problem 8.5.2 below.

Lifting a multilinear function to a linear transformation on the tensor product

Proposition 8.5.4.

For any multilinear function FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) there exists a unique linear transformation T:V1V2VpVT:V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}\to V extending FF, i.e. such that

(8.5.6) F(𝐯1,𝐯2,,𝐯p)=T𝐯1𝐯2𝐯p,F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})=T\,\mathbf{v}_{1}% \otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p},

for all choices of vectors 𝐯kVk\mathbf{v}_{k}\in V_{k}, 1kp1\leq k\leq p.

Remark.

If T:V1V2VpVT:V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}\to V is a linear transformation, then trivially the function FF,

F(𝐯1,𝐯2,,𝐯p):=T𝐯1𝐯2𝐯p,F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}):=T\,\mathbf{v}_{1}% \otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p},

is a multilinear function in L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V). This follows immediately from the fact that the expression 𝐯1𝐯2𝐯p\mathbf{v}_{1}\otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p} is linear in each variable 𝐯k\mathbf{v}_{k}.

Proof of Proposition 8.5.4.

Define TT on the basis (8.5.5) by

T𝐛j11𝐛j22𝐛jpp=F(𝐛j11,𝐛j22,,𝐛jpp)T\mathbf{b}^{1}_{j_{1}}\otimes\mathbf{b}^{2}_{j_{2}}\otimes\ldots\otimes% \mathbf{b}^{p}_{j_{p}}=F(\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,% \mathbf{b}^{p}_{j_{p}})

and then extend it by linearity to the whole space V_1\otimes V_2\otimes\ldots\otimes V_p. To complete the proof we need to show that (8.5.6) holds for all choices of vectors \mathbf{v}_k\in V_k, 1\le k\le p (so far we only know it when each \mathbf{v}_k is one of the basis vectors \mathbf{b}^k_{j_k}).

To prove that, let us decompose 𝐯k\mathbf{v}_{k} as

𝐯k=jkαjkk𝐛jkk,k=1,2,,p.\mathbf{v}_{k}=\sum_{j_{k}}\alpha^{k}_{j_{k}}\mathbf{b}^{k}_{j_{k}},\qquad k=1% ,2,\ldots,p.

Using linearity in each variable 𝐯k\mathbf{v}_{k} we get

\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p = \sum_{j_1,j_2,\ldots,j_p}\alpha^1_{j_1}\alpha^2_{j_2}\ldots\alpha^p_{j_p}\,\mathbf{b}^1_{j_1}\otimes\mathbf{b}^2_{j_2}\otimes\ldots\otimes\mathbf{b}^p_{j_p},
F(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p) = \sum_{j_1,j_2,\ldots,j_p}\alpha^1_{j_1}\alpha^2_{j_2}\ldots\alpha^p_{j_p}\,F(\mathbf{b}^1_{j_1},\mathbf{b}^2_{j_2},\ldots,\mathbf{b}^p_{j_p}),

so by the definition of T the identity (8.5.6) holds. ∎
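For p = 2 and scalar-valued F the proposition is easy to see in coordinates; the following sketch (an added illustration, assuming numpy) represents u ⊗ v by the Kronecker product np.kron(u, v), so that the bilinear functional F(u, v) = u^T M v lifts to the linear functional T whose coefficient vector is M flattened:

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((2, 3))   # F(u, v) = u^T M v, a bilinear functional

u, v = rng.standard_normal(2), rng.standard_normal(3)

F_uv = u @ M @ v                  # F(u, v)
T = M.reshape(-1)                 # T as a functional on R^2 ⊗ R^3 ≅ R^6
T_uv = T @ np.kron(u, v)          # T(u ⊗ v)

assert np.isclose(F_uv, T_uv)     # (8.5.6) for this example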

Dual of a tensor product

As one can easily see, the dual of the tensor product V1V2VpV_{1}\otimes V_{2}\otimes\ldots\otimes V_{p} is the tensor product of dual spaces V1V2VpV_{1}^{\prime}\otimes V_{2}^{\prime}\otimes\ldots\otimes V_{p}^{\prime}.

Indeed, by Proposition 8.5.4 and the remark after it, there is a natural one-to-one correspondence between the multilinear functionals in L(V_1,V_2,\ldots,V_p;\mathbb{F}) (i.e. the elements of V_1'\otimes V_2'\otimes\ldots\otimes V_p') and the linear transformations T: V_1\otimes V_2\otimes\ldots\otimes V_p\to\mathbb{F} (i.e. the elements of the dual of V_1\otimes V_2\otimes\ldots\otimes V_p).

Note that the bases from Remark 8.5.3 and Proposition 8.5.2 are dual bases (in V_1\otimes V_2\otimes\ldots\otimes V_p and V_1'\otimes V_2'\otimes\ldots\otimes V_p' respectively). Knowing the dual bases allows us to easily calculate the duality between the spaces V_1\otimes V_2\otimes\ldots\otimes V_p and V_1'\otimes V_2'\otimes\ldots\otimes V_p', i.e. the expression \langle\mathbf{x},\mathbf{x}'\rangle for \mathbf{x}\in V_1\otimes V_2\otimes\ldots\otimes V_p, \mathbf{x}'\in V_1'\otimes V_2'\otimes\ldots\otimes V_p'.

8.5.3. Covariant and contravariant tensors

Let X_1,X_2,\ldots,X_p be vector spaces, and let V_k be either X_k or X_k', k=1,2,\ldots,p. For a multilinear function F\in L(V_1,V_2,\ldots,V_p;V) we say that it is covariant in the variable \mathbf{v}_k\in V_k if V_k=X_k, and contravariant in this variable if V_k=X_k'.

If a multilinear function is covariant (contravariant) in all variables, we say that the multilinear function is covariant (contravariant). In general, if a function is covariant in rr variables and contravariant in ss variables, we say that the multilinear function is rr-covariant ss-contravariant (or simply (r,s)(r,s) multilinear function, or that its valency is (r,s)(r,s)).

Thus, a linear functional can be interpreted as a 1-covariant tensor (recall that we use the word tensor in the case of functionals, i.e. when the target space is the field of scalars \mathbb{F}). By duality, a vector can be interpreted as a 1-contravariant tensor.

Remark.

At first the terminology might look a bit confusing: if a variable is a vector (not a functional), it is a covariant variable but a contravariant object (tensor). But notice that we did not actually say “covariant variable” here: we said that if \mathbf{v}_k\in X_k then the multilinear function is covariant in the variable \mathbf{v}_k. So by a covariant variable we mean not the vector \mathbf{v}_k itself, but the “slot” in the tensor where we put it!

So there is no contradiction: we put the contravariant objects (tensors) into covariant slots and vice versa.

Sometimes, slightly abusing the language, people talk about covariant (contravariant) variables or arguments. But it is usually meant that the corresponding “slots” in the tensor are covariant (contravariant), and not the variables as objects.

Linear transformations as tensors

A linear transformation T: X_1\to X_2 can be interpreted as a 1-covariant 1-contravariant tensor. Namely, the bilinear functional F,

F(𝐱1,𝐱2):=T𝐱1,𝐱2,𝐱1X1,𝐱2X2F(\mathbf{x}_{1},\mathbf{x}_{2}^{\prime}):=\langle T\mathbf{x}_{1},\mathbf{x}_% {2}^{\prime}\rangle,\qquad\mathbf{x}_{1}\in X_{1},\mathbf{x}_{2}^{\prime}\in X% _{2}^{\prime}

is covariant in the first variable 𝐱1\mathbf{x}_{1} and contravariant in the second one 𝐱2\mathbf{x}_{2}^{\prime}.

Conversely,

Proposition 8.5.5.

Given a 1-covariant 1-contravariant tensor F\in L(X_1,X_2';\mathbb{F}), there exists a unique linear transformation T: X_1\to X_2 such that

(8.5.7) F(\mathbf{x}_1,\mathbf{x}_2') = \langle T\mathbf{x}_1,\mathbf{x}_2'\rangle,

for all \mathbf{x}_1\in X_1, \mathbf{x}_2'\in X_2'.

Proof.

First of all note, that the uniqueness is a trivial corollary of Lemma 8.1.3, cf. Problem 8.3.1 above. So we only need to prove existence of TT.

Let \mathcal{B}_k=\{\mathbf{b}^k_j\}_{j=1}^{\dim X_k} be a basis in X_k, and let \mathcal{B}_k'=\{\widetilde{\mathbf{b}}^k_j\}_{j=1}^{\dim X_k} be the dual basis in X_k', k=1,2. Then define the \dim X_2\times\dim X_1 matrix A=\{a_{k,j}\}, k=1,2,\ldots,\dim X_2, j=1,2,\ldots,\dim X_1, by

ak,j=F(𝐛j1,𝐛~k2).a_{k,j}=F(\mathbf{b}^{1}_{j},\widetilde{\mathbf{b}}^{2}_{k}).

Define TT to be the operator with matrix [T]2,1=A[T]_{\mathcal{B}_{2},\mathcal{B}_{1}}=A. Clearly (see Remark 8.1.5)

(8.5.8) T𝐛j1,𝐛~k2=ak,j=F(𝐛j1,𝐛~k2)\langle T\mathbf{b}^{1}_{j},\widetilde{\mathbf{b}}^{2}_{k}\rangle=a_{k,j}=F(% \mathbf{b}^{1}_{j},\widetilde{\mathbf{b}}^{2}_{k})

which implies the equality (8.5.7). This can be easily seen by decomposing \mathbf{x}_1=\sum_j\alpha_j\mathbf{b}^1_j and \mathbf{x}_2'=\sum_k\beta_k\widetilde{\mathbf{b}}^2_k and using linearity in each argument.

Another, more “high brow” explanation is that the tensors on the left and right sides of (8.5.7) coincide on a basis in X_1\otimes X_2' (see Remark 8.5.3 about the basis), so they coincide everywhere. To be more precise, one should lift the bilinear forms to linear transformations (functionals) X_1\otimes X_2'\to\mathbb{F} (see Proposition 8.5.4), and since these transformations coincide on a basis, they are equal.

One can also give an alternative, coordinate-free proof of the existence of T, along the lines of the coordinate-free definition of the dual space (see Section 8.3.1). Namely, if we fix \mathbf{x}_1, the function F(\mathbf{x}_1,\mathbf{x}_2') is linear in \mathbf{x}_2', so it is a linear functional on X_2', i.e. a vector in X_2.

Let us call this vector T(\mathbf{x}_1). So we have defined a transformation T: X_1\to X_2. One can easily show that T is a linear transformation by essentially repeating the reasoning from Section 8.3.1. The equality (8.5.7) follows automatically from the definition of T. ∎

Remark.

Note that we can also say that the function F from Proposition 8.5.5 defines not the transformation T, but its adjoint. A priori, without assuming anything (like the order of the variables and their interpretation), we cannot distinguish between a transformation and its adjoint.

Remark.

Note that if we want to follow the Einstein notation, the entries a_{j,k} of the matrix A=[T]_{\mathcal{B}_2,\mathcal{B}_1} of the transformation T should be written as a^j_k. Then if x^k, k=1,2,\ldots,\dim X_1, are the coordinates of the vector \mathbf{x}\in X_1, the jth coordinate of \mathbf{y}=T\mathbf{x} is given by

yj=akjxk.y^{j}=a^{j}_{k}x^{k}.

Recall that here we skip the summation sign, but we mean the sum over k. Note also that we preserve the positions of the indices, so the index j stays upstairs. The index k does not appear on the left side of the equation because we sum over this index on the right side, so it gets “killed”.

Similarly, if x_j, j=1,2,\ldots,\dim X_2, are the coordinates of the vector \mathbf{x}'\in X_2', then the kth coordinate of \mathbf{y}':=T'\mathbf{x}' is given by

yk=akjxjy_{k}=a^{j}_{k}x_{j}

(again, skipping the summation sign over j). Again, we preserve the positions of the indices, so the index k in y_k is a subscript.

Note, that since 𝐱X1\mathbf{x}\in X_{1} and 𝐲=T𝐱X2\mathbf{y}=T\mathbf{x}\in X_{2} are vectors, according to the conventions of the Einstein notation, the indices in their coordinates indeed should be written as superscripts.

Similarly, 𝐱X2\mathbf{x}^{\prime}\in X_{2}^{\prime} and 𝐲=T𝐱X1\mathbf{y}^{\prime}=T^{\prime}\mathbf{x}^{\prime}\in X_{1}^{\prime} are covectors, so indices in their coordinates should be written as subscripts.

The Einstein notation emphasizes the fact mentioned in the previous remark, that a 11-covariant 11-contravariant tensor gives us both a linear transformation and its adjoint: the expression akjxka^{j}_{k}x^{k} gives the action of TT, and akjxja^{j}_{k}x_{j} gives the action of its adjoint TT^{\prime}.
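The following sketch (an added illustration, assuming numpy) shows both contractions from this remark: the same coefficient array a^j_k, stored with the superscript j as the row index, gives the action of T on vectors and the action of T' on covectors, and the duality f(Tx) = (T'f)(x) holds automatically.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2))   # a^j_k: superscript j = row, subscript k = column

x = rng.standard_normal(2)        # contravariant coordinates x^k of a vector in X_1
f = rng.standard_normal(3)        # covariant coordinates f_j of a covector on X_2

y = np.einsum('jk,k->j', A, x)    # y^j = a^j_k x^k   (action of T)
g = np.einsum('jk,j->k', A, f)    # g_k = a^j_k f_j   (action of T')

assert np.isclose(f @ y, g @ x)   # f(Tx) = (T'f)(x)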

Polylinear transformations as tensors

More generally, any polylinear transformation can be interpreted as a tensor. Namely, given a polylinear transformation FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) one can define the tensor F~L(V1,V2,,Vp,V;𝔽)\widetilde{F}\in L(V_{1},V_{2},\ldots,V_{p},V^{\prime};\mathbb{F}) by

(8.5.9) F~(𝐯1,𝐯2,,𝐯p,𝐯)=F(𝐯1,𝐯2,,𝐯p),𝐯,𝐯kVk,𝐯V.\widetilde{F}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p},\mathbf{v}^{% \prime})=\langle F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}),% \mathbf{v}^{\prime}\rangle,\qquad\mathbf{v}_{k}\in V_{k},\mathbf{v}^{\prime}% \in V^{\prime}.

Conversely,

Proposition 8.5.6.

Given a tensor F~L(V1,V2,,Vp,V;𝔽)\widetilde{F}\in L(V_{1},V_{2},\ldots,V_{p},V^{\prime};\mathbb{F}) there exists a unique polylinear transformation FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) such that (8.5.9) is satisfied.

Proof.

By Proposition 8.5.4 the tensor F~\widetilde{F} can be extended to a linear transformation (functional) T~:V1V2VpV𝔽\widetilde{T}:V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}\otimes V^{\prime}% \to\mathbb{F} such that

F~(𝐯1,𝐯2,,𝐯p,𝐯)=T~(𝐯1𝐯2𝐯p𝐯)\widetilde{F}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p},\mathbf{v}^{% \prime})=\widetilde{T}(\mathbf{v}_{1}\otimes\mathbf{v}_{2}\otimes\ldots\otimes% \mathbf{v}_{p}\otimes\mathbf{v}^{\prime})

for all 𝐯kVk\mathbf{v}_{k}\in V_{k}, 𝐯V\mathbf{v}^{\prime}\in V^{\prime}.

If 𝐰W:=V1V2Vp\mathbf{w}\in W:=V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p} and 𝐯V\mathbf{v}^{\prime}\in V^{\prime}, then

𝐰𝐯V1V2VpV.\mathbf{w}\otimes\mathbf{v}^{\prime}\in V_{1}\otimes V_{2}\otimes\ldots\otimes V% _{p}\otimes V^{\prime}.

So, we can define a bilinear functional (tensor) GL(W,V;𝔽)G\in L(W,V^{\prime};\mathbb{F}) by

G(\mathbf{w},\mathbf{v}') := \widetilde{T}(\mathbf{w}\otimes\mathbf{v}').

By Proposition 8.5.5, GG gives rise to a linear transformation, i.e. there exists a unique linear transformation T:WVT:W\to V such that

G(𝐰,𝐯)=T𝐰,𝐯𝐰W,𝐯V.G(\mathbf{w},\mathbf{v}^{\prime})=\langle T\mathbf{w},\mathbf{v}^{\prime}% \rangle\qquad\forall\mathbf{w}\in W,\quad\forall\mathbf{v}^{\prime}\in V^{% \prime}.

And the linear transformation TT gives us the polylinear map

FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V)

by

F(𝐯1,𝐯2,,𝐯p)=T(𝐯1𝐯2𝐯p),F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})=T(\mathbf{v}_{1}\otimes% \mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p}),

see Remark after Proposition 8.5.4.

The uniqueness of the transformation F is, as in Proposition 8.5.5, a trivial corollary of Lemma 8.1.3. We leave the details as an exercise for the reader. ∎

This section shows that

tensors are universal objects in polylinear algebra, since any polylinear transformation can be interpreted as a tensor and vice versa.

Exercises.

8.5.1.

Show that the tensor product 𝐯1𝐯2𝐯p\mathbf{v}_{1}\otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p} of vectors is linear in each argument 𝐯k\mathbf{v}_{k}.

8.5.2.

Show that the set \{\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p : \mathbf{v}_k\in V_k\} of tensor products of vectors is a proper subset of V_1\otimes V_2\otimes\ldots\otimes V_p.

8.5.3.

Prove that the transformation FF from Proposition 8.5.6 is unique.

8.6. Change of coordinates formula for tensors.

The main reason for distinguishing covariant and contravariant variables is that under a change of bases their coordinates change according to different rules; consequently, the entries of covariant and contravariant tensors change according to different rules as well.

In this section we are going to investigate this in detail. Note that coordinate representations are extremely important, since, for example, all numerical computations (unlike theoretical investigations) are performed using some coordinate system.

8.6.1. Coordinate representation of a tensor.

Let F be an r-covariant s-contravariant tensor, r+s=p. Let \mathbf{x}_1,\ldots,\mathbf{x}_r be the covariant variables (\mathbf{x}_k\in X_k), and \mathbf{f}_1,\ldots,\mathbf{f}_s be the contravariant ones (\mathbf{f}_k\in X_k'). Let us write the covariant variables first, so the tensor will be written as F(\mathbf{x}_1,\ldots,\mathbf{x}_r,\mathbf{f}_1,\ldots,\mathbf{f}_s). For k=1,2,\ldots,p fix a basis \mathcal{B}_k=\{\mathbf{b}^{(k)}_j\}_{j=1}^{\dim X_k} in X_k, and let \mathcal{B}_k'=\{\widetilde{\mathbf{b}}^{(k)}_j\}_{j=1}^{\dim X_k} be the dual basis in X_k'.

For a vector 𝐱kXk\mathbf{x}_{k}\in X_{k} let x(k)jx_{(k)}^{j}, j=1,2,,dimXkj=1,2,\ldots,\dim X_{k} be its coordinates in the basis k\mathcal{B}_{k}, and similarly, if 𝐟kXk\mathbf{f}_{k}\in X_{k}^{\prime} let fj(k)f^{(k)}_{j}, j=1,2,,dimXkj=1,2,\ldots,\dim X_{k} be its coordinates in the dual basis k\mathcal{B}^{\prime}_{k} (note that in agreement with the Einstein notation, the coordinates of the vector are indexed by a superscript, and the coordinates of a covector are indexed by a subscript).

Proposition 8.6.1.

Denote

(8.6.1) φj1,,jrk1,,ks:=F(𝐛j1(1),,𝐛jr(r),𝐛~k1(r+1),,𝐛~ks(r+s)).\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}:=F(\mathbf{b}^{(1)}_{j_{1}},% \ldots,\mathbf{b}^{(r)}_{j_{r}},\widetilde{\mathbf{b}}^{(r+1)}_{k_{1}},\ldots,% \widetilde{\mathbf{b}}^{(r+s)}_{k_{s}}).

Then, in the Einstein notation

(8.6.2) F(𝐱1,,𝐱r,𝐟1,,𝐟s)=φj1,,jrk1,,ksx(1)j1x(r)jrfk1(1)fks(s)F(\mathbf{x}_{1},\ldots,\mathbf{x}_{r},\mathbf{f}_{1},\ldots,\mathbf{f}_{s})=% \varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}x_{(1)}^{j_{1}}\ldots x_{(r)}% ^{j_{r}}f^{(1)}_{k_{1}}\ldots f^{(s)}_{k_{s}}

(the summation here is over the indices j1,,jrj_{1},\ldots,j_{r} and k1,,ksk_{1},\ldots,k_{s}).

Note that we use the notation (1),\ldots,(r) and (1),\ldots,(s) to emphasize that these are not indices: the numbers in parentheses just show the order of the arguments. Thus, the right side of (8.6.2) does not have any free indices left (all indices were used in the summation), so it is just a number (for fixed \mathbf{x}_k's and \mathbf{f}_k's).

Proof of Proposition 8.6.1.

To show that (8.6.1) implies (8.6.2) we first notice that (8.6.1) means that (8.6.2) holds when 𝐱j\mathbf{x}_{j}s and 𝐟k\mathbf{f}_{k}s are the elements of the corresponding bases. Decomposing each argument 𝐱j\mathbf{x}_{j} and 𝐟k\mathbf{f}_{k} in the corresponding basis and using linearity in each argument we can easily get (8.6.2). The computation is rather simple, but because there are a lot of indices, the formulas could be quite big and could look quite frightening.

To avoid writing too many huge formulas, we leave this computation to the reader as an exercise.

We do not want the reader to feel cheated, so we present a different, more “high brow” (abstract) explanation, which does not require any computations! Namely, let us notice that the expressions on the left and right sides of (8.6.2) define tensors. By Proposition 8.5.4 they can be lifted to linear functionals on the tensor product X_1\otimes\ldots\otimes X_r\otimes X_{r+1}'\otimes\ldots\otimes X_{r+s}'.

Rephrasing what we discussed in the beginning of the proof, we can say that (8.6.1) means that these functionals coincide on all vectors

𝐛j1(1)𝐛jr(r)𝐛~k1(r+1)𝐛~ks(r+s)\mathbf{b}^{(1)}_{j_{1}}\otimes\ldots\otimes\mathbf{b}^{(r)}_{j_{r}}\otimes% \widetilde{\mathbf{b}}^{(r+1)}_{k_{1}}\otimes\ldots\otimes\widetilde{\mathbf{b% }}^{(r+s)}_{k_{s}}

of a basis in the tensor product, so the functionals (and therefore the tensors) are equal. ∎

The entries φj1,,jrk1,,ks\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}} are called the entries of the tensor FF in the bases k\mathcal{B}_{k}, k=1,2,,pk=1,2,\ldots,p.
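Formula (8.6.2) is again a single einsum call; here is a sketch for r = 2, s = 1 (an added illustration with randomly chosen entries, assuming numpy):

import numpy as np

rng = np.random.default_rng(4)
phi = rng.standard_normal((2, 3, 4))   # entries phi_{j1,j2}^{k1} of a (2,1) tensor

x1 = rng.standard_normal(2)            # a vector (contravariant coordinates)
x2 = rng.standard_normal(3)            # a vector
f1 = rng.standard_normal(4)            # a covector (covariant coordinates)

# F(x1, x2, f1) = phi_{j,l}^{k} x1^j x2^l f1_k: all indices are summed over,
# so the result is a single number
value = np.einsum('jlk,j,l,k->', phi, x1, x2, f1)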

Now let, for k=1,2,\ldots,p, \mathcal{A}_k be a basis in X_k (and \mathcal{A}_k' be the dual basis in X_k'). We want to investigate how the entries of the tensor F change when we change the bases from \mathcal{B}_k to \mathcal{A}_k.

8.6.2. Change of coordinate formulas in Einstein notation

Let us first consider the familiar cases of vectors and linear functionals, treated above in Section 8.1.1, but write everything down using the Einstein notation. Suppose we have two bases \mathcal{B} and \mathcal{A} in X, and let

A=[I]_{\mathcal{A},\mathcal{B}}

be the change of coordinates matrix from \mathcal{B} to \mathcal{A}. For a vector \mathbf{x}\in X let x^k be its coordinates in the basis \mathcal{B} and \widetilde{x}^k its coordinates in the basis \mathcal{A}. Similarly, for \mathbf{f}\in X' let f_k denote its coordinates in the basis \mathcal{B}' and \widetilde{f}_k its coordinates in the basis \mathcal{A}' (\mathcal{B}' and \mathcal{A}' being the dual bases of \mathcal{B} and \mathcal{A} respectively).

Denote by (A)kj(A)^{j}_{k} the entries of the matrix AA: to be consistent with the Einstein notation the superscript jj denotes the number of the row. Then we can write the change of coordinate formula as

(8.6.3) x~j=(A)kjxk.\widetilde{x}^{j}=(A)^{j}_{k}x^{k}.

Similarly, let (A1)jk(A^{-1})_{j}^{k} be the entries of A1A^{-1}: again superscript is used to denote the number of the row. Then we can write the change of coordinate formula for the dual space as

(8.6.4) f~j=(A1)jkfk;\widetilde{f}_{j}=(A^{-1})_{j}^{k}f_{k};

the summation here is over the index kk (i.e. along the columns of A1A^{-1}), so the change of coordinate matrix in this case is indeed (A1)T(A^{-1})^{T}.

Let us emphasize that we did not prove anything here: we only rewrote formula (8.1.1) from Section 8.1.1 using the Einstein notation.
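A quick numerical sketch of (8.6.3) and (8.6.4) (an added illustration, assuming numpy; the matrix A below is a random, presumably invertible, change of coordinates matrix): vectors transform with A, covectors with (A^{-1})^T, and the pairing f(x) is independent of the coordinate system.

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # change of coordinates matrix
A_inv = np.linalg.inv(A)

x = rng.standard_normal(3)   # coordinates x^k in the basis B
f = rng.standard_normal(3)   # coordinates f_k in the dual basis B'

x_new = np.einsum('jk,k->j', A, x)       # (8.6.3): x~^j = (A)^j_k x^k
f_new = np.einsum('kj,k->j', A_inv, f)   # (8.6.4): f~_j = (A^{-1})^k_j f_k

assert np.isclose(f @ x, f_new @ x_new)  # the value f(x) does not change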

Remark.

While it is not needed in what follows, let us play a bit more with the Einstein notation. Namely, the equations

A1A=IandAA1=IA^{-1}A=I\qquad\text{and}\qquad AA^{-1}=I

can be rewritten in the Einstein notation as

(A)kj(A1)lk=δj,land(A1)jk(A)lj=δk,l(A)^{j}_{k}(A^{-1})^{k}_{l}=\delta_{j,l}\qquad\text{and}\qquad(A^{-1})^{k}_{j}% (A)^{j}_{l}=\delta_{k,l}

respectively.

8.6.3. Change of coordinates formula for tensors

Now we are ready to give the change of coordinate formula for general tensors.

For k=1,2,\ldots,p:=r+s let A_k:=[I]_{\mathcal{A}_k,\mathcal{B}_k} be the change of coordinates matrices, and let A_k^{-1} be their inverses.

As in Section 8.6.2 we denote by (A)^j_k the entries of a matrix A, with the agreement that the superscript gives the number of the row.

Proposition 8.6.2.

Given an rr-covariant ss-contravariant tensor FF let

φj1,,jrk1,,ksandφ~j1,,jrk1,,ks\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}\qquad\text{and}\qquad% \widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}

be its entries in the bases k\mathcal{B}_{k} (the old ones) and 𝒜k\mathcal{A}_{k} (the new ones) respectively. In the above notation

φ~j1,,jrk1,,ks=φj1,,jrk1,,ks(A11)j1j1(Ar1)jrjr(Ar+1)k1k1(Ar+s)ksks\widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}=\varphi_{j_{1}^{% \prime},\ldots,j_{r}^{\prime}}^{k_{1}^{\prime},\ldots,k_{s}^{\prime}}(A_{1}^{-% 1})^{j_{1}^{\prime}}_{j_{1}}\ldots(A_{r}^{-1})^{j_{r}^{\prime}}_{j_{r}}(A_{r+1% })^{k_{1}}_{k_{1}^{\prime}}\ldots(A_{r+s})^{k_{s}}_{k_{s}^{\prime}}

(the summation here is over the indices j_1',\ldots,j_r' and k_1',\ldots,k_s').

Because of the many indices, the formula in this proposition looks very complicated. However, if one understands the main idea, the formula turns out to be quite simple and easy to memorize.

To explain the main idea let us, slightly abusing the language, express this formula “in plain English”. Namely, we can say that

to express the “new” tensor entries \widetilde{\varphi}_{j_1,\ldots,j_r}^{k_1,\ldots,k_s} in terms of the “old” ones \varphi_{j_1,\ldots,j_r}^{k_1,\ldots,k_s}, one needs, for each covariant index (subscript), to apply the covariant rule (8.6.4), and for each contravariant index (superscript) to apply the contravariant rule (8.6.3).
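Before the proof, here is the rule in action for a 1-covariant 1-contravariant tensor on a single space X (an added numerical sketch, assuming numpy; A is a random, presumably invertible, change of coordinates matrix): one covariant index gets the rule (8.6.4), one contravariant index gets the rule (8.6.3).

import numpy as np

rng = np.random.default_rng(6)
n = 3
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # change of coordinates matrix
A_inv = np.linalg.inv(A)

phi = rng.standard_normal((n, n))   # old entries phi[j, k] = phi_j^k

# phi~_j^k = phi_{j'}^{k'} (A^{-1})^{j'}_j (A)^k_{k'}; superscript = row index
phi_new = np.einsum('JK,Jj,kK->jk', phi, A_inv, A)

# in matrix form this is A^{-T} phi A^T; transposed, the array B = phi^T
# (the matrix of the corresponding operator, cf. Section 8.5.3) transforms
# by the familiar similarity rule A B A^{-1}
assert np.allclose(phi_new, A_inv.T @ phi @ A.T)
assert np.allclose(phi_new.T, A @ phi.T @ A_inv)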

Proof of Proposition 8.6.2.

Informally, the idea of the proof is very simple: we just change the bases one at a time, applying each time the change of coordinate formulas (8.6.3) or (8.6.4), depending on whether the tensor is covariant or contravariant in the corresponding variable.

To write a rigorous formal proof we use induction on r and s (the numbers of covariant and contravariant arguments of the tensor). The proposition is true for r=1, s=0 and for r=0, s=1, see (8.6.4) and (8.6.3) respectively.

Assuming now that the proposition is proved for some rr and ss, let us prove it for r+1r+1, ss and for rr, s+1s+1.

Let us do the latter case; the other one is done similarly. The main idea is that we first change the first p=r+s bases and use the induction hypothesis; then we change the last one and use (8.6.3).

Namely, let φ^j1,,jrk1,,ks+1\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s+1}} be the entries of an (r,s+1)(r,s+1) tensor FF in the bases 𝒜1,,𝒜p,p+1\mathcal{A}_{1},\ldots,\mathcal{A}_{p},\mathcal{B}_{p+1}, p=r+sp=r+s.

Let us fix the index k_{s+1} and consider the r-covariant s-contravariant tensor F(\mathbf{x}_1,\ldots,\mathbf{x}_r,\mathbf{f}_1,\ldots,\mathbf{f}_s,\widetilde{\mathbf{b}}^{(r+s+1)}_{k_{s+1}}), where \mathbf{x}_1,\ldots,\mathbf{x}_r,\mathbf{f}_1,\ldots,\mathbf{f}_s are the variables. Clearly

φj1,,jrk1,,ks,ks+1andφ^j1,,jrk1,,ks,ks+1\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}\qquad\text{and}% \qquad\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}

are its entries in the bases \mathcal{B}_1,\ldots,\mathcal{B}_p and \mathcal{A}_1,\ldots,\mathcal{A}_p respectively (can you see why?). Recall that the index k_{s+1} here is fixed.

By the induction hypothesis

(8.6.5) φ^j1,,jrk1,,ks,ks+1=φj1,,jrk1,,ks,ks+1(A11)j1j1(Ar1)jrjr(Ar+1)k1k1(Ar+s)ksks.\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}=\varphi_{j% _{1}^{\prime},\ldots,j_{r}^{\prime}}^{k_{1}^{\prime},\ldots,k_{s}^{\prime},k_{% s+1}}(A_{1}^{-1})^{j_{1}^{\prime}}_{j_{1}}\ldots(A_{r}^{-1})^{j_{r}^{\prime}}_% {j_{r}}(A_{r+1})^{k_{1}}_{k_{1}^{\prime}}\ldots(A_{r+s})^{k_{s}}_{k_{s}^{% \prime}}.

Note, that we did not assume anything about the index ks+1k_{s+1}, so (8.6.5) holds for all ks+1k_{s+1}.

Now let us fix the indices j_1,\ldots,j_r,k_1,\ldots,k_s and consider the 1-contravariant tensor

F(𝐚j1(1),,𝐚jr(r),𝐚~k1(r+1),,𝐚~ks(r+s),𝐟s+1)F(\mathbf{a}^{(1)}_{j_{1}},\ldots,\mathbf{a}^{(r)}_{j_{r}},\widetilde{\mathbf{% a}}^{(r+1)}_{k_{1}},\ldots,\widetilde{\mathbf{a}}^{(r+s)}_{k_{s}},\mathbf{f}_{% s+1})

of the variable 𝐟s+1\mathbf{f}_{s+1}. Here 𝐚j(k)\mathbf{a}^{(k)}_{j} are the vectors in the basis 𝒜k\mathcal{A}_{k} and 𝐚~j(k)\widetilde{\mathbf{a}}^{(k)}_{j} are the vectors in the dual basis 𝒜k\mathcal{A}_{k}^{\prime}.

It is again easy to see that

φ^j1,,jrk1,,ks,ks+1andφ~j1,,jrk1,,ks,ks+1,\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}\qquad\text% {and}\qquad\widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1% }},

k_{s+1}=1,2,\ldots,\dim X_{p+1}, are the entries of this functional in the bases \mathcal{B}_{p+1} and \mathcal{A}_{p+1} respectively. According to (8.6.3),

φ~j1,,jrk1,,ks,ks+1=φ^j1,,jrk1,,ks,ks+1(Ap+1)ks+1ks+1,\widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}=\widehat% {\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k^{\prime}_{s+1}}(A_{p+1})^% {k_{s+1}}_{k^{\prime}_{s+1}},

and since we did not assume anything about the indices j1,,jr,k1,,ksj_{1},\ldots,j_{r},k_{1},\ldots,k_{s}, the above identity holds for all their combinations. Combining this with (8.6.5) we get that the proposition holds for tensors of valency (r,s+1)(r,s+1).

The case of valency (r+1,s)(r+1,s) is treated absolutely the same way: the only difference is that in the end we get a 11-covariant tensor and use (8.6.4) instead of (8.6.3). ∎