Chapter 8 Dual spaces and tensors

All vector spaces in this chapter are finite-dimensional.

8.1. Dual spaces

8.1.1. Linear functionals and the dual space. Change of coordinates in the dual space

Definition 8.1.1.

A linear functional on a vector space $V$ (over a field $\mathbb{F}$) is a linear transformation $L:V\to\mathbb{F}$.

This special class of linear transformations is sufficiently important to deserve a separate name.

If one thinks of vectors as physical objects, like forces or velocities, then one can think of a linear functional as a (linear) measurement that gives a scalar quantity as the result: think of the component of a force or velocity in a given direction.

Definition 8.1.2.

The collection of all linear functionals on a finite-dimensional vector space $V$ is called the dual of $V$ and is usually denoted $V^{\prime}$ or $V^{*}$. (We consider here only finite-dimensional spaces because for infinite-dimensional spaces the dual space consists not of all linear functionals but only of the so-called bounded ones. Without giving the precise definition, let us only mention that in the finite-dimensional case, when both the domain and the target space are finite-dimensional, all linear transformations are bounded, so we do not need to mention the word bounded.)

As was discussed earlier in Section 1.4 of Chapter 1, the set $\mathcal{L}(V,W)$ of all linear transformations acting from $V$ to $W$ is a vector space (with naturally defined operations of addition and multiplication by a scalar). So the dual space $V^{\prime}=\mathcal{L}(V,\mathbb{F})$ is a vector space.

Let us consider an example. Let the space $V$ be $\mathbb{R}^{n}$; what is its dual? As we know, a linear transformation $T:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is represented by an $m\times n$ matrix, so a linear functional on $\mathbb{R}^{n}$ (i.e. a linear transformation $L:\mathbb{R}^{n}\to\mathbb{R}$) is given by a $1\times n$ matrix (a row), which we denote by $[L]$. The collection of all such rows is isomorphic to $\mathbb{R}^{n}$ (the isomorphism is given by taking the transpose, $[L]\mapsto[L]^{T}$).

So the dual of $\mathbb{R}^{n}$ is $\mathbb{R}^{n}$ itself. The same holds true for $\mathbb{C}^{n}$, of course, as well as for $\mathbb{F}^{n}$, where $\mathbb{F}$ is an arbitrary field. Since a space $V$ over a field $\mathbb{F}$ (here we are mostly interested in the cases $\mathbb{F}=\mathbb{R}$ and $\mathbb{F}=\mathbb{C}$) of dimension $n$ is isomorphic to $\mathbb{F}^{n}$, and the dual of $\mathbb{F}^{n}$ is isomorphic to $\mathbb{F}^{n}$, we can conclude that the dual $V^{\prime}$ is isomorphic to $V$.

Thus, the definition of the dual space is starting to look a bit silly, since it does not appear to give us anything new.

However, that is not the case! If we look carefully, we can see that the dual space is indeed a new object. To see that, let us analyze how the entries of the matrix $[L]$ (which we can call the coordinates of $L$) change when we change the basis in $V$.

Change of coordinates formula

Let

\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\},\qquad \mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}

be two bases in $V$, and let $[L]_{\mathcal{A}}=[L]_{\mathcal{S},\mathcal{A}}$ and $[L]_{\mathcal{B}}=[L]_{\mathcal{S},\mathcal{B}}$ be the matrices of $L$ in the bases $\mathcal{A}$ and $\mathcal{B}$ respectively (we suppose that the basis in the target space of scalars is always the standard one, so we can skip the subscript $\mathcal{S}$ in the notation). Then, recalling the change of coordinates rule from Section 2.8.4 in Chapter 2, we get that

[L]_{\mathcal{B}}=[L]_{\mathcal{A}}[I]_{\mathcal{A},\mathcal{B}}.

Recall that for a vector $\mathbf{v}\in V$ its coordinates in different bases are related by the formula

[\mathbf{v}]_{\mathcal{B}}=[I]_{\mathcal{B},\mathcal{A}}[\mathbf{v}]_{\mathcal{A}},

and that

[I]_{\mathcal{A},\mathcal{B}}=[I]_{\mathcal{B},\mathcal{A}}^{-1}.

If we denote $S:=[I]_{\mathcal{B},\mathcal{A}}$, so that $[\mathbf{v}]_{\mathcal{B}}=S[\mathbf{v}]_{\mathcal{A}}$, then the entries of the vectors $[L]_{\mathcal{B}}^{T}$ and $[L]_{\mathcal{A}}^{T}$ are related by the formula

(8.1.1) [L]_{\mathcal{B}}^{T}=(S^{-1})^{T}[L]_{\mathcal{A}}^{T}

(since we usually represent a vector as a column of its coordinates, we use $[L]_{\mathcal{A}}^{T}$ and $[L]_{\mathcal{B}}^{T}$ instead of $[L]_{\mathcal{A}}$ and $[L]_{\mathcal{B}}$).

Saying it in words

If $S$ is the change of coordinates matrix (from the old coordinates to the new ones) in $X$, then the change of coordinates matrix in the dual space $X^{\prime}$ is $(S^{-1})^{T}$.

So, the dual space $V^{\prime}$ of $V$, while isomorphic to $V$, is indeed a different object: the difference is in how the coordinates in $V$ and $V^{\prime}$ change when one changes the basis in $V$.
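For readers who like to verify such identities numerically, here is a minimal sketch (assuming Python with numpy; the matrix $S$ and the data are randomly generated for illustration) checking that if vector coordinates transform by $S$, then functional coordinates must transform by $(S^{-1})^{T}$ for the value $L(\mathbf{v})$ to stay the same:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    S = rng.normal(size=(n, n))      # change of coordinates matrix, assumed invertible
    v_A = rng.normal(size=(n, 1))    # coordinates of v in the old basis A
    L_A = rng.normal(size=(1, n))    # matrix (row) of a functional L in the basis A

    v_B = S @ v_A                    # vector coordinates transform by S
    L_B = (np.linalg.inv(S).T @ L_A.T).T   # formula (8.1.1): [L]_B^T = (S^{-1})^T [L]_A^T

    # the value L(v) does not depend on the basis
    assert np.allclose(L_A @ v_A, L_B @ v_B)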

Remark.

One can ask: why can’t we pick a basis in $X$ and some completely unrelated basis in the dual $X^{\prime}$? Of course, we can do that, but imagine what it would take to compute $L(\mathbf{x})$, knowing the coordinates of $\mathbf{x}$ in some basis and the coordinates of $L$ in some completely unrelated basis.

So, if we want (knowing the coordinates of a vector $\mathbf{x}$ in some basis) to compute the action of a linear functional $L$ using the standard rules of matrix algebra, i.e. to multiply a row (the functional) by a column (the vector), we have no choice: the “coordinates” of the linear functional $L$ should be the entries of its matrix (in the same basis).

As we will see in Section 8.1.3 below, the entries (“coordinates”) of a linear functional are indeed its coordinates in some basis (the so-called dual basis).

A uniqueness theorem

Lemma 8.1.3.

Let $\mathbf{v}\in V$. If $L(\mathbf{v})=0$ for all $L\in V^{\prime}$, then $\mathbf{v}=\mathbf{0}$. As a corollary, if $L(\mathbf{v}_{1})=L(\mathbf{v}_{2})$ for all $L\in V^{\prime}$, then $\mathbf{v}_{1}=\mathbf{v}_{2}$.

Proof.

Fix a basis $\mathcal{B}$ in $V$. Then

L(\mathbf{v})=[L]_{\mathcal{B}}[\mathbf{v}]_{\mathcal{B}}.

Picking different matrices (i.e. different $L$) we can easily see that $[\mathbf{v}]_{\mathcal{B}}=\mathbf{0}$. Indeed, if

L_{k}=[0,\ldots,0,\underset{k}{1},0,\ldots,0]

then the equality

L_{k}[\mathbf{v}]_{\mathcal{B}}=0

implies that the $k$th coordinate of $[\mathbf{v}]_{\mathcal{B}}$ is $0$.

Using this equality for all $k$ we conclude that $[\mathbf{v}]_{\mathcal{B}}=\mathbf{0}$, so $\mathbf{v}=\mathbf{0}$. ∎

8.1.2. Second dual

As we discussed above, the dual space $V^{\prime}$ is a vector space, so one can consider its dual $V^{\prime\prime}=(V^{\prime})^{\prime}$. It looks like one can then consider the dual $V^{\prime\prime\prime}$ of $V^{\prime\prime}$, and so on… However, the fun stops with $V^{\prime\prime}$, because

The second dual $V^{\prime\prime}$ is canonically (i.e. in a natural way) isomorphic to $V$.

Let us decipher this statement. Any vector $\mathbf{v}\in V$ canonically defines a linear functional $L_{\mathbf{v}}$ on $V^{\prime}$ (i.e. an element of the second dual $V^{\prime\prime}$) by the rule

L_{\mathbf{v}}(f)=f(\mathbf{v})\qquad\forall f\in V^{\prime}

It is easy to check that the mapping $T:V\to V^{\prime\prime}$, $T\mathbf{v}=L_{\mathbf{v}}$, is a linear transformation.

Note that $\operatorname{Ker}T=\{\mathbf{0}\}$. Indeed, if $T\mathbf{v}=\mathbf{0}$, then

f(\mathbf{v})=0\qquad\forall f\in V^{\prime},

and by Lemma 8.1.3 above we have $\mathbf{v}=\mathbf{0}$.

Since $\dim V^{\prime\prime}=\dim V^{\prime}=\dim V$, the condition $\operatorname{Ker}T=\{\mathbf{0}\}$ implies that $T$ is an invertible transformation (an isomorphism).

The isomorphism $T$ is very natural (at least to a mathematician). In particular, it was defined without using a basis, so it does not depend on the choice of basis. So, informally, we say that $V^{\prime\prime}$ is canonically isomorphic to $V$; the rigorous statement is that the map $T$ described above (which we consider natural and canonical) is an isomorphism from $V$ to $V^{\prime\prime}$.

8.1.3. Dual, a.k.a. biorthogonal bases

In the previous sections we several times referred to the entries of the matrix of a linear functional $L$ as coordinates. But in this book “coordinates” usually means coordinates in some basis. Are the “coordinates” of a linear functional really coordinates in some basis? It turns out the answer is “yes”, so the terminology remains consistent.

Let us find the basis corresponding to the coordinates of $L\in V^{\prime}$. Let $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ be a basis in $V$. For $L\in V^{\prime}$, let $[L]_{\mathcal{B}}=[L_{1},L_{2},\ldots,L_{n}]$ be its matrix (row) in the basis $\mathcal{B}$. Consider the linear functionals $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\in V^{\prime}$ defined by

(8.1.2) \mathbf{b}^{\prime}_{k}(\mathbf{b}_{j})=\delta_{k,j}

where $\delta_{k,j}$ is the Kronecker delta,

\delta_{k,j}=\begin{cases}1,& j=k\\ 0,& j\neq k\end{cases}

Recall that a linear transformation is defined by its action on a basis, so the functionals $\mathbf{b}^{\prime}_{k}$ are well defined.

As one can easily see, the functional $L$ can be represented as

L=\sum_{k} L_{k}\mathbf{b}^{\prime}_{k}.

Indeed, take an arbitrary $\mathbf{v}=\sum_{k}\alpha_{k}\mathbf{b}_{k}\in V$, so $[\mathbf{v}]_{\mathcal{B}}=[\alpha_{1},\alpha_{2},\ldots,\alpha_{n}]^{T}$. By linearity and the definition of $\mathbf{b}^{\prime}_{k}$,

\mathbf{b}_{k}^{\prime}(\mathbf{v})=\mathbf{b}_{k}^{\prime}\Bigl(\sum_{j}\alpha_{j}\mathbf{b}_{j}\Bigr)=\sum_{j}\alpha_{j}\mathbf{b}^{\prime}_{k}(\mathbf{b}_{j})=\alpha_{k}.

Therefore

L\mathbf{v}=[L]_{\mathcal{B}}[\mathbf{v}]_{\mathcal{B}}=\sum_{k}L_{k}\alpha_{k}=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}(\mathbf{v}).

Since this identity holds for all $\mathbf{v}\in V$, we conclude that $L=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}$.

Since we did not assume anything about $L\in V^{\prime}$, we have just shown that any linear functional $L$ can be represented as a linear combination of $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$, so the system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ is generating.

Let us show that this system is linearly independent (and so it is a basis). Let $\mathbf{0}=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}$. Then for arbitrary $j=1,2,\ldots,n$

0=\mathbf{0}(\mathbf{b}_{j})=\Bigl(\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}\Bigr)(\mathbf{b}_{j})=\sum_{k}L_{k}\mathbf{b}^{\prime}_{k}(\mathbf{b}_{j})=L_{j},

so $L_{j}=0$. Therefore all $L_{k}$ are $0$ and the system is linearly independent.

So, the system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ is indeed a basis in the dual space $V^{\prime}$, and the entries of $[L]_{\mathcal{B}}$ are the coordinates of $L$ with respect to this (dual) basis.

Definition 8.1.4.

Let $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ be a basis in $V$. The system of vectors

\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\in V^{\prime},

uniquely defined by equation (8.1.2), is called the dual (or biorthogonal) basis to $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$.

Note that we have shown that the dual system to a basis is a basis. Note also that in 𝐛1,𝐛2,,𝐛n\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n} is the dual system to a basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}, then 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} is the dual to the basis 𝐛1,𝐛2,,𝐛n\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}
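For $\mathbb{F}^{n}$ the dual basis is easy to compute explicitly. A minimal numerical sketch (assuming Python with numpy; the particular basis below is chosen arbitrarily for illustration): if the columns of a matrix $B$ form a basis of $\mathbb{R}^{n}$, condition (8.1.2) says exactly that the rows of $B^{-1}$ are the dual functionals.

    import numpy as np

    # Columns of B form a (non-orthogonal) basis b_1, b_2, b_3 of R^3.
    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    # Functionals on R^3 are rows; b'_k(b_j) = delta_{k,j} says that the matrix
    # whose k-th row is b'_k, multiplied by B, gives the identity. So the dual
    # basis consists of the rows of the inverse of B.
    B_dual = np.linalg.inv(B)

    assert np.allclose(B_dual @ B, np.eye(3))   # checks (8.1.2)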

Abstract non-orthogonal Fourier decomposition

The dual system can be used for computing the coordinates in the basis $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$. Let $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ be the biorthogonal system to $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$, and let $\mathbf{v}=\sum_{k}\alpha_{k}\mathbf{b}_{k}$. Then, as was shown before,

\mathbf{b}^{\prime}_{j}(\mathbf{v})=\mathbf{b}^{\prime}_{j}\Bigl(\sum_{k}\alpha_{k}\mathbf{b}_{k}\Bigr)=\sum_{k}\alpha_{k}\mathbf{b}^{\prime}_{j}(\mathbf{b}_{k})=\alpha_{j}\mathbf{b}^{\prime}_{j}(\mathbf{b}_{j})=\alpha_{j},

so $\alpha_{k}=\mathbf{b}_{k}^{\prime}(\mathbf{v})$. Then we can write

(8.1.3) \mathbf{v}=\sum_{k}\mathbf{b}_{k}^{\prime}(\mathbf{v})\mathbf{b}_{k}.

In other words,

The $k$th coordinate of a vector $\mathbf{v}$ in a basis $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ is $\mathbf{b}_{k}^{\prime}(\mathbf{v})$, where $\mathcal{B}^{\prime}=\{\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\}$ is the dual basis.

This formula is called (a baby version of) the abstract non-orthogonal Fourier decomposition of $\mathbf{v}$ (in the basis $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$). The reason for this name will become clear later, in Section 8.2.3.
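In coordinates this is a one-liner. A self-contained sketch (again assuming numpy and an arbitrarily chosen basis): applying the dual functionals to $\mathbf{v}$ recovers its coordinates, as (8.1.3) asserts.

    import numpy as np

    B = np.array([[2.0, 1.0],
                  [0.0, 1.0]])          # columns: a basis b_1, b_2 of R^2
    B_dual = np.linalg.inv(B)           # rows: the dual functionals b'_1, b'_2

    v = np.array([3.0, 4.0])
    alpha = B_dual @ v                  # alpha_k = b'_k(v)
    assert np.allclose(B @ alpha, v)    # v = sum_k b'_k(v) b_k, formula (8.1.3)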

Remark 8.1.5.

Let $\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\}$ and $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{m}\}$ be bases in $X$ and $Y$ respectively, let $T:X\to Y$ be a linear transformation, and let $\mathcal{B}^{\prime}=\{\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{m}\}$ be the dual basis to $\mathcal{B}$. Then the matrix $[T]_{\mathcal{B},\mathcal{A}}=:A=\{a_{k,j}\}$, $k=1,\ldots,m$, $j=1,\ldots,n$, of the transformation $T$ in the bases $\mathcal{A}$, $\mathcal{B}$ is given by

a_{k,j}=\mathbf{b}^{\prime}_{k}(T\mathbf{a}_{j}),\qquad j=1,2,\ldots,n,\quad k=1,2,\ldots,m.
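A quick numerical illustration of this formula (a sketch assuming numpy; the bases and the transformation are randomly generated): computing $[T]_{\mathcal{B},\mathcal{A}}$ entrywise as $a_{k,j}=\mathbf{b}^{\prime}_{k}(T\mathbf{a}_{j})$ agrees with the usual change of basis computation.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 2
    A_basis = rng.normal(size=(n, n))   # columns: a basis A of X = R^n
    B_basis = rng.normal(size=(m, m))   # columns: a basis B of Y = R^m
    B_dual  = np.linalg.inv(B_basis)    # rows: the dual basis B' of Y'
    T = rng.normal(size=(m, n))         # T in the standard bases

    # entrywise: a_{k,j} = b'_k(T a_j)
    M = np.array([[B_dual[k] @ (T @ A_basis[:, j]) for j in range(n)]
                  for k in range(m)])

    # the same matrix via [T]_{B,A} = [I]_{B,S} [T] [I]_{S,A}
    assert np.allclose(M, B_dual @ T @ A_basis)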

8.1.4. Examples of dual systems

The first example we consider is a trivial one. Let $V$ be $\mathbb{R}^{n}$ (or $\mathbb{C}^{n}$) and let $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ be the standard basis there. The dual space is the space of $n$-dimensional row vectors, which is isomorphic to $\mathbb{R}^{n}$ (or $\mathbb{C}^{n}$ in the complex case), and the standard basis there is the dual to $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$. Namely, the standard basis in $(\mathbb{R}^{n})^{\prime}$ (or in $(\mathbb{C}^{n})^{\prime}$) is $\mathbf{e}^{T}_{1},\mathbf{e}^{T}_{2},\ldots,\mathbf{e}^{T}_{n}$, obtained from $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ by transposition.

Taylor formula

The next example is more interesting. Let us consider the space $\mathbb{P}_{n}$ of polynomials of degree at most $n$. As we know, the powers $\{\mathbf{e}_{k}\}_{k=0}^{n}$, $\mathbf{e}_{k}(t)=t^{k}$, form the standard basis in this space. What is the dual of this basis?

The answer might be tricky to guess, but it is very easy to check once you know it. Namely, consider the linear functionals $\mathbf{e}^{\prime}_{k}\in(\mathbb{P}_{n})^{\prime}$, $k=0,1,\ldots,n$, acting on polynomials as follows:

\mathbf{e}^{\prime}_{k}(p):=\frac{1}{k!}\frac{d^{k}}{dt^{k}}p(t)\Big|_{t=0}=\frac{1}{k!}p^{(k)}(0);

here we use the usual conventions that $0!=1$ and $d^{0}f/dt^{0}=f$.

Since

\frac{d^{k}}{dt^{k}}t^{j}=\begin{cases}j(j-1)\cdots(j-k+1)\,t^{j-k},& k\leq j\\ 0,& k>j\end{cases}

we can easily see that the system $\{\mathbf{e}^{\prime}_{k}\}_{k=0}^{n}$ is the dual of the system of powers $\{\mathbf{e}_{k}\}_{k=0}^{n}$.

Applying (8.1.3) to the system $\{\mathbf{e}_{k}\}_{k=0}^{n}$ and its dual, we get that any polynomial $p$ of degree at most $n$ can be represented as

(8.1.4) p(t)=\sum_{k=0}^{n}\frac{p^{(k)}(0)}{k!}t^{k}

This formula is well-known in Calculus as the Taylor formula for polynomials. More precisely, this is a particular case of the Taylor formula, the so-called Maclaurin formula. The general Taylor formula

p(t)=\sum_{k=0}^{n}\frac{p^{(k)}(a)}{k!}(t-a)^{k}

can be obtained from (8.1.4) by applying it to the polynomial $q(\tau):=p(\tau+a)$ and then substituting $\tau=t-a$. It can also be obtained by considering the powers $(t-a)^{k}$, $k=0,1,\ldots,n$, and finding the dual system the same way we did for the powers $t^{k}$. (Note that the general Taylor formula says more than the formula for polynomials obtained here: it says that any $n$ times differentiable function can be approximated near the point $a$ by its Taylor polynomial. Moreover, if the function is $n+1$ times differentiable, it allows us to estimate the error. The formula for polynomials above serves as a motivation and a starting point for the general case.)
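A small numerical check of (8.1.4) (a sketch assuming Python with numpy; the polynomial is arbitrary): the functionals $\mathbf{e}^{\prime}_{k}(p)=p^{(k)}(0)/k!$ recover exactly the coefficients of $p$ in the basis of powers.

    import math
    import numpy as np
    from numpy.polynomial import polynomial as P

    p = np.array([1.0, 2.0, -1.0, 3.0])     # p(t) = 1 + 2t - t^2 + 3t^3

    # e'_k(p) = p^{(k)}(0) / k!, computed by differentiating k times
    coeffs = [P.polyval(0.0, P.polyder(p, k)) / math.factorial(k)
              for k in range(len(p))]

    assert np.allclose(coeffs, p)            # formula (8.1.4)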

Lagrange interpolation

Our next example deals with the so-called Lagrange interpolation formula. Let $a_{1},a_{2},\ldots,a_{n+1}$ be distinct points (in $\mathbb{R}$ or $\mathbb{C}$), and let $\mathbb{P}_{n}$ be the space of polynomials of degree at most $n$. Define functionals $\mathbf{f}_{k}\in\mathbb{P}_{n}^{\prime}$ by

\mathbf{f}_{k}(p)=p(a_{k})\qquad\forall p\in\mathbb{P}_{n}.

What is the dual of this system of functionals? Note that while it is not hard to show that the functionals $\mathbf{f}_{k}$ are linearly independent, and so (since $\dim(\mathbb{P}_{n})^{\prime}=\dim\mathbb{P}_{n}=n+1$) form a basis in $(\mathbb{P}_{n})^{\prime}$, we do not need that. We will construct the dual system directly, and then will be able to see that the system $\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{n+1}$ is indeed a basis.

Namely, let us define the polynomials $p_{k}$, $k=1,2,\ldots,n+1$, by

p_{k}(t)=\prod_{j:j\neq k}(t-a_{j})\Big/\prod_{j:j\neq k}(a_{k}-a_{j}),

where $j$ in the products runs from $1$ to $n+1$. Clearly $p_{k}(a_{k})=1$ and $p_{k}(a_{j})=0$ if $j\neq k$, so indeed the system $p_{1},p_{2},\ldots,p_{n+1}$ is dual to the system $\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{n+1}$.

There is a little detail here, since the notion of a dual system was defined only for a basis, and we did not prove that either of the systems is one. But one can immediately see that the system $p_{1},p_{2},\ldots,p_{n+1}$ is linearly independent (can you explain why?), and since it contains $n+1=\dim\mathbb{P}_{n}$ vectors, it is a basis. Therefore the system of functionals $\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{n+1}$ is also a basis in the dual space $(\mathbb{P}_{n})^{\prime}$.

Remark.

Note that we did not just get lucky here; this is a general phenomenon. Namely, as Problem 8.1.1 below asserts, any system of vectors having a “dual” one must be linearly independent. So constructing a dual system is a way of proving linear independence (and an easy one, if you can construct the dual system easily, as in the above example).

Applying formula (8.1.3) to the above example, one can see that the unique polynomial $p$, $\deg p\leq n$, satisfying

(8.1.5) p(a_{k})=y_{k},\qquad k=1,2,\ldots,n+1

can be reconstructed by the formula

(8.1.6) p(t)=\sum_{k=1}^{n+1}y_{k}p_{k}(t).

This formula is well-known in mathematics as the “Lagrange interpolation formula”.
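A direct numerical rendering of (8.1.6) (a minimal sketch assuming Python with numpy; the nodes and values are chosen arbitrarily for illustration):

    import numpy as np

    a = np.array([0.0, 1.0, 2.0])          # distinct interpolation nodes a_k
    y = np.array([1.0, 3.0, 2.0])          # prescribed values y_k = p(a_k)

    def p_k(k, t):
        # p_k(t) = prod_{j != k} (t - a_j) / (a_k - a_j); satisfies p_k(a_j) = delta_{k,j}
        num = np.prod([t - a[j] for j in range(len(a)) if j != k])
        den = np.prod([a[k] - a[j] for j in range(len(a)) if j != k])
        return num / den

    def p(t):
        # formula (8.1.6): p(t) = sum_k y_k p_k(t)
        return sum(y[k] * p_k(k, t) for k in range(len(a)))

    assert all(np.isclose(p(a[k]), y[k]) for k in range(len(a)))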

Exercises.

8.1.1.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ be a system of vectors in $X$ such that there exists a system $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{r}$ of linear functionals such that

\mathbf{v}^{\prime}_{k}(\mathbf{v}_{j})=\begin{cases}1,& j=k\\ 0,& j\neq k\end{cases}

a) Show that the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ is linearly independent.

b) Show that if the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ is not generating, then the “biorthogonal” system $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{r}$ is not unique. Hint: probably the easiest way to prove this is to complete the system $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ to a basis, see Proposition 2.5.4 from Chapter 2.

8.1.2.

Prove that given distinct points $a_{1},a_{2},\ldots,a_{n+1}$ and values $y_{1},y_{2},\ldots,y_{n+1}$ (not necessarily distinct), the polynomial $p$, $\deg p\leq n$, satisfying (8.1.5) is unique. Try to prove it using ideas from linear algebra, and not what you know about polynomials.

8.2. Dual of an inner product space

Let us recall that there is no notion of an inner product space over an arbitrary field: all our inner product spaces are either real or complex.

8.2.1. Riesz representation theorem

Theorem 8.2.1 (Riesz representation theorem).

Let $H$ be an inner product space. Given a linear functional $L$ on $H$, there exists a unique vector $\mathbf{y}\in H$ such that

(8.2.1) L(\mathbf{v})=(\mathbf{v},\mathbf{y})\qquad\forall\mathbf{v}\in H.
Proof.

Fix an orthonormal basis $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ in $H$, and let

[L]=[L_{1},L_{2},\ldots,L_{n}]

be the matrix of $L$ in this basis. Define the vector $\mathbf{y}$ by

(8.2.2) \mathbf{y}:=\sum_{k}\overline{L}_{k}\mathbf{e}_{k},

where $\overline{L}_{k}$ denotes the complex conjugate of $L_{k}$. In the case of a real space the conjugation does nothing and can simply be ignored.

We claim that $\mathbf{y}$ satisfies (8.2.1).

Indeed, take an arbitrary vector $\mathbf{v}=\sum_{k}\alpha_{k}\mathbf{e}_{k}$. Then

[\mathbf{v}]=[\alpha_{1},\alpha_{2},\ldots,\alpha_{n}]^{T}

and

L(\mathbf{v})=[L][\mathbf{v}]=\sum_{k}L_{k}\alpha_{k}.

On the other hand, recall that if we know the coordinates of two vectors in an orthonormal basis, we can compute their inner product by taking these coordinates and computing the standard inner product in $\mathbb{F}^{n}$ (see Exercise 5.2.3 in Chapter 5). Hence

(\mathbf{v},\mathbf{y})=\sum_{k}\alpha_{k}\overline{\overline{L}}_{k}=\sum_{k}\alpha_{k}L_{k},

so (8.2.1) holds.

To show that the vector $\mathbf{y}$ is unique, let us assume that $\mathbf{y}$ satisfies (8.2.1). Then for $k=1,2,\ldots,n$

(\mathbf{e}_{k},\mathbf{y})=L(\mathbf{e}_{k})=L_{k},

so $(\mathbf{y},\mathbf{e}_{k})=\overline{L}_{k}$. Then, using the formula for the decomposition in an orthonormal basis (see Section 5.2.1 of Chapter 5), we get

\mathbf{y}=\sum_{k}(\mathbf{y},\mathbf{e}_{k})\mathbf{e}_{k}=\sum_{k}\overline{L}_{k}\mathbf{e}_{k},

which means that any vector satisfying (8.2.1) must be represented by (8.2.2). ∎

Remark.

While the statement of the theorem does not mention a basis, the proof presented above utilizes an orthonormal basis in $H$, although the resulting vector $\mathbf{y}$ does not depend on the choice of the basis. (An alternative proof that does not need a basis is also possible. This alternative proof, which works in the infinite-dimensional case, uses strong convexity of the unit ball in the inner product space together with the idea of completeness from analysis.) An advantage of the proof above is that it gives a formula for computing the representing vector $\mathbf{y}$.
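A small numerical illustration of the proof (a sketch assuming Python with numpy; in $\mathbb{C}^{3}$ with the standard inner product the standard basis is orthonormal, and the functional is chosen arbitrarily):

    import numpy as np

    L = np.array([1 + 2j, -1j, 3.0])     # entries L_k of a functional on C^3
    y = np.conj(L)                       # representing vector (8.2.2): y_k = conj(L_k)

    v = np.array([2.0, 1j, 1 - 1j])      # an arbitrary test vector
    # L(v) = (v, y), where (v, y) = sum_k v_k conj(y_k)
    assert np.isclose(L @ v, np.vdot(y, v))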

8.2.2. Is an inner product space a dual to itself?

For a vector $\mathbf{y}$ in an inner product space $H$ one can define a linear functional $L_{\mathbf{y}}$,

L_{\mathbf{y}}(\mathbf{v}):=(\mathbf{v},\mathbf{y}).

It is easy to see that the mapping $\mathbf{y}\mapsto L_{\mathbf{y}}$ is an injective mapping from $H$ to its dual $H^{*}$. Theorem 8.2.1 above asserts that this mapping is a surjection (onto), so one is tempted to say that the dual of an inner product space $H$ is (canonically isomorphic to) the space $H$ itself, with the canonical isomorphism given by $\mathbf{y}\mapsto L_{\mathbf{y}}$.

This is indeed the case if $H$ is a real inner product space: in this case it is easy to show that the map $\mathbf{y}\mapsto L_{\mathbf{y}}$ is a linear transformation. We already discussed that the map is injective and surjective, so it is an invertible linear transformation, i.e. an isomorphism.

However, if $H$ is a complex space, one needs to be a bit more careful. Namely, the mapping $\mathbf{y}\mapsto L_{\mathbf{y}}$ that maps a vector $\mathbf{y}\in H$ to the linear functional $L_{\mathbf{y}}$ as in Theorem 8.2.1 ($L_{\mathbf{y}}(\mathbf{v})=(\mathbf{v},\mathbf{y})$) is not linear.

More precisely, while it is easy to show that

(8.2.3) L_{\mathbf{y}_{1}+\mathbf{y}_{2}}=L_{\mathbf{y}_{1}}+L_{\mathbf{y}_{2}},

it follows from the definition of $L_{\mathbf{y}}$ and the properties of the inner product that

(8.2.4) L_{\alpha\mathbf{y}}(\mathbf{v})=(\mathbf{v},\alpha\mathbf{y})=\overline{\alpha}(\mathbf{v},\mathbf{y})=\overline{\alpha}L_{\mathbf{y}}(\mathbf{v}),

so $L_{\alpha\mathbf{y}}=\overline{\alpha}L_{\mathbf{y}}$.

In other words, one can say that the dual of a complex inner product space is the space itself but with a different linear structure: adding two vectors is equivalent to adding the corresponding linear functionals, but multiplying a vector by $\alpha$ is equivalent to multiplying the corresponding functional by $\overline{\alpha}$.

A transformation $T$ satisfying $T(\alpha\mathbf{x}+\beta\mathbf{y})=\overline{\alpha}T\mathbf{x}+\overline{\beta}T\mathbf{y}$ is sometimes called a conjugate linear transformation.

So, for a complex inner product space $H$, its dual can be canonically identified with $H$ by a conjugate linear isomorphism (i.e. an invertible conjugate linear transformation).

Of course, for a real inner product space the complex conjugation can simply be ignored (because $\alpha$ is real), so the map $\mathbf{y}\mapsto L_{\mathbf{y}}$ is linear. In this case we can indeed say that the dual of an inner product space $H$ is the space itself.

In both the real and complex cases, we can nevertheless say that the dual of an inner product space can be canonically identified with the space itself.

8.2.3. Biorthogonal systems and orthonormal bases

Definition 8.2.2.

Let $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ be a basis in an inner product space $H$. The unique system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ in $H$ defined by

(\mathbf{b}_{j},\mathbf{b}^{\prime}_{k})=\delta_{j,k},

where $\delta_{j,k}$ is the Kronecker delta, is called the biorthogonal (or dual) basis to $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$.

This definition clearly agrees with Definition 8.1.4 if one identifies the dual $H^{\prime}$ with $H$ as discussed above. It then follows immediately from the discussion in Section 8.1.3 that the dual system $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ to a basis $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ is uniquely defined and forms a basis, and that the dual to $\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$ is $\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$.

The abstract non-orthogonal Fourier decomposition formula (8.1.3) can be rewritten as

\mathbf{v}=\sum_{k=1}^{n}(\mathbf{v},\mathbf{b}^{\prime}_{k})\mathbf{b}_{k}.

Note that an orthonormal basis is dual to itself. So, if $\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}$ is an orthonormal basis, the above formula becomes

\mathbf{v}=\sum_{k=1}^{n}(\mathbf{v},\mathbf{e}_{k})\mathbf{e}_{k},

which is the classical (orthogonal) abstract Fourier decomposition, see formula (5.2.2) in Section 5.2.1 of Chapter 5.
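A one-line numerical check (a sketch assuming numpy; the orthonormal basis is an arbitrary rotation of the standard one):

    import numpy as np

    th = 0.3
    Q = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])   # columns: an orthonormal basis of R^2

    v = np.array([1.0, 2.0])
    coeffs = Q.T @ v                      # (v, e_k) for each k
    assert np.allclose(Q @ coeffs, v)     # v = sum_k (v, e_k) e_k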

8.3. Adjoint (dual) transformations and the transpose. Fundamental subspaces revisited (once more)

By analogy with the case of an inner product space (see Theorem 8.2.1), it is customary to write $L(\mathbf{v})$, where $L$ is a linear functional (i.e. $L\in V^{\prime}$, $\mathbf{v}\in V$), in a form resembling the inner product:

L(\mathbf{v})=\langle\mathbf{v},L\rangle.

Note that the expression $\langle\mathbf{v},L\rangle$ is linear in both arguments, unlike the inner product, which in the case of a complex space is linear in the first argument and conjugate linear in the second. So, to distinguish it from the inner product, we use the angular brackets. (This notation, while widely used, is far from standard. Sometimes $(\mathbf{v},L)$ is used, and sometimes the angular brackets are used for the inner product. So, encountering an expression like this in a text, one has to be very careful to distinguish the inner product from the action of a linear functional.)

Note also that while in the inner product both vectors belong to the same space, $\mathbf{v}$ and $L$ above belong to different spaces: in particular, we cannot add them.

8.3.1. Dual (adjoint) transformation

Definition 8.3.1.

Let $A:X\to Y$ be a linear transformation. The transformation $A^{\prime}:Y^{\prime}\to X^{\prime}$ ($X^{\prime}$ and $Y^{\prime}$ are the dual spaces of $X$ and $Y$ respectively) such that

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},A^{\prime}\mathbf{y}^{\prime}\rangle\qquad\forall\mathbf{x}\in X,\ \forall\mathbf{y}^{\prime}\in Y^{\prime}

is called the adjoint (dual) of $A$.

Of course, it is not a priori clear why the transformation $A^{\prime}$ exists. Below we will show that such a transformation indeed exists and, moreover, is unique.

Dual transformation for the case $A:\mathbb{F}^{n}\to\mathbb{F}^{m}$

Let us first consider the case when $X=\mathbb{F}^{n}$, $Y=\mathbb{F}^{m}$ ($\mathbb{F}$ here is, as usual, either $\mathbb{R}$ or $\mathbb{C}$, but everything works for arbitrary fields).

As usual, we identify a vector $\mathbf{v}\in\mathbb{F}^{n}$ with the column of its coordinates, and a linear transformation with its matrix (in the standard bases).

The dual of $\mathbb{F}^{n}$ is, as discussed above, the space of rows of size $n$, so we can identify it with $\mathbb{F}^{n}$. Again, we will treat an element of $(\mathbb{F}^{n})^{\prime}$ as a column vector of its coordinates.

Under these agreements, we have for $\mathbf{x}\in\mathbb{F}^{n}$ and $\mathbf{x}^{\prime}\in(\mathbb{F}^{n})^{\prime}$

\mathbf{x}^{\prime}(\mathbf{x})=\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=(\mathbf{x}^{\prime})^{T}\mathbf{x},

where the right side is a product of matrices (a row times a column). Then, for arbitrary $\mathbf{x}\in X=\mathbb{F}^{n}$ and $\mathbf{y}^{\prime}\in Y^{\prime}=(\mathbb{F}^{m})^{\prime}$,

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=(\mathbf{y}^{\prime})^{T}A\mathbf{x}=(A^{T}\mathbf{y}^{\prime})^{T}\mathbf{x}=\langle\mathbf{x},A^{T}\mathbf{y}^{\prime}\rangle

(the expressions in the middle are products of matrices).

So we have proved that the adjoint transformation exists. Let us show that it is unique. Assume that for some transformation $B$

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},B\mathbf{y}^{\prime}\rangle\qquad\forall\mathbf{x}\in\mathbb{F}^{n},\ \forall\mathbf{y}^{\prime}\in(\mathbb{F}^{m})^{\prime}.

This means that

\langle\mathbf{x},(A^{T}-B)\mathbf{y}^{\prime}\rangle=0\qquad\forall\mathbf{x}\in\mathbb{F}^{n},\ \forall\mathbf{y}^{\prime}\in(\mathbb{F}^{m})^{\prime}.

Taking for $\mathbf{x}$ and $\mathbf{y}^{\prime}$ the vectors from the standard bases in $\mathbb{F}^{n}$ and $(\mathbb{F}^{m})^{\prime}\cong\mathbb{F}^{m}$ respectively, we get that the matrices $B$ and $A^{T}$ coincide. ∎

So, for $X=\mathbb{F}^{n}$, $Y=\mathbb{F}^{m}$:

The dual transformation $A^{\prime}$ exists and is unique. Moreover, its matrix (in the standard bases) equals $A^{T}$ (the transpose of the matrix of $A$).
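A quick numerical check of the defining identity (a sketch assuming numpy; the matrix and the vectors are randomly generated):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(2, 3))          # A : R^3 -> R^2
    x = rng.normal(size=3)               # x in X = R^3
    y_prime = rng.normal(size=2)         # y' in Y', stored as a column of coordinates

    # <Ax, y'> = <x, A^T y'>
    assert np.isclose(y_prime @ (A @ x), (A.T @ y_prime) @ x)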

Dual transformation in the abstract setting

Now let us consider the general case. In fact, we do not need to do much, since everything can be reduced to the case of the spaces $\mathbb{F}^{n}$.

Namely, let us fix bases $\mathcal{A}=\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}$ in $X$ and $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{m}$ in $Y$, and let $\mathcal{A}^{\prime}=\mathbf{a}^{\prime}_{1},\mathbf{a}^{\prime}_{2},\ldots,\mathbf{a}^{\prime}_{n}$ and $\mathcal{B}^{\prime}=\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{m}$ be their dual bases (in $X^{\prime}$ and $Y^{\prime}$ respectively). For a vector $\mathbf{v}$ (from a space or its dual) we denote, as usual, by $[\mathbf{v}]_{\mathcal{B}}$ the column of its coordinates in the basis $\mathcal{B}$. Then

\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=([\mathbf{x}^{\prime}]_{\mathcal{A}^{\prime}})^{T}[\mathbf{x}]_{\mathcal{A}}\qquad\forall\mathbf{x}\in X,\ \forall\mathbf{x}^{\prime}\in X^{\prime},

i.e. instead of working with $\mathbf{x}\in X$ and $\mathbf{x}^{\prime}\in X^{\prime}$ we can work with the columns of their coordinates (in the dual bases $\mathcal{A}$ and $\mathcal{A}^{\prime}$ respectively) in absolutely the same way as we do in the case of $\mathbb{F}^{n}$. Of course, the same works for $Y$, so working with columns of coordinates and then translating everything back to the abstract setting, we get that the dual transformation exists and is unique in this case as well.

Moreover, using the fact (which we just proved) that for $A:\mathbb{F}^{n}\to\mathbb{F}^{m}$ the matrix of $A^{\prime}$ is $A^{T}$, we get

(8.3.1) [A^{\prime}]_{\mathcal{A}^{\prime},\mathcal{B}^{\prime}}=([A]_{\mathcal{B},\mathcal{A}})^{T},

or in plain English

The matrix of the dual transformation in the dual bases is the transpose of the matrix of the transformation in the original bases.

Remark 8.3.2.

Note that while we used bases to construct the dual transformation, the resulting transformation does not depend on the choice of bases.

A coordinate-free way to define the dual transformation

Let us now present another, more “highbrow” way of defining the dual of a linear transformation. Namely, for $\mathbf{x}\in X$, $\mathbf{y}^{\prime}\in Y^{\prime}$, let us fix for a moment $\mathbf{y}^{\prime}$ and treat the expression $\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\mathbf{y}^{\prime}(A\mathbf{x})$ as a function of $\mathbf{x}$. It is easy to see that this is a composition of two linear transformations (which ones?), and so it is a linear function of $\mathbf{x}$, i.e. a linear functional on $X$, i.e. an element of $X^{\prime}$.

Let us call this linear functional $B(\mathbf{y}^{\prime})$, to emphasize the fact that it depends on $\mathbf{y}^{\prime}$. Since we can do this for every $\mathbf{y}^{\prime}\in Y^{\prime}$, we can define the transformation $B:Y^{\prime}\to X^{\prime}$ such that

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},B(\mathbf{y}^{\prime})\rangle.

Our next step is to show that $B$ is a linear transformation. Note that since the transformation $B$ was defined in a rather indirect way, we cannot see immediately from the definition that it is linear. To show the linearity of $B$, take $\mathbf{y}^{\prime}_{1},\mathbf{y}^{\prime}_{2}\in Y^{\prime}$ and scalars $\alpha,\beta$. For $\mathbf{x}\in X$,

\begin{aligned}
\langle\mathbf{x},B(\alpha\mathbf{y}^{\prime}_{1}+\beta\mathbf{y}^{\prime}_{2})\rangle
&=\langle A\mathbf{x},\alpha\mathbf{y}^{\prime}_{1}+\beta\mathbf{y}^{\prime}_{2}\rangle &&\text{by the definition of }B\\
&=\alpha\langle A\mathbf{x},\mathbf{y}^{\prime}_{1}\rangle+\beta\langle A\mathbf{x},\mathbf{y}^{\prime}_{2}\rangle &&\text{by linearity}\\
&=\alpha\langle\mathbf{x},B(\mathbf{y}^{\prime}_{1})\rangle+\beta\langle\mathbf{x},B(\mathbf{y}^{\prime}_{2})\rangle &&\text{by the definition of }B\\
&=\langle\mathbf{x},\alpha B(\mathbf{y}^{\prime}_{1})+\beta B(\mathbf{y}^{\prime}_{2})\rangle &&\text{by linearity.}
\end{aligned}

Since this identity is true for all $\mathbf{x}$, we conclude that $B(\alpha\mathbf{y}^{\prime}_{1}+\beta\mathbf{y}^{\prime}_{2})=\alpha B(\mathbf{y}^{\prime}_{1})+\beta B(\mathbf{y}^{\prime}_{2})$, i.e. that $B$ is linear.

The main advantage of this approach is that it does not require a basis, so it can be (and is) used in the infinite-dimensional situation. However, the proof presented above in Section 8.3.1 gives a constructive way to compute the dual transformation, so we used that proof instead of the more general coordinate-free one.

Remark 8.3.3.

Note that the above coordinate-free approach can be used to define the Hermitian adjoint of an operator in an inner product space. The only addition to the reasoning presented above is the use of the Riesz representation theorem (Theorem 8.2.1). We leave the details as an exercise to the reader; see Problem 8.3.2 below.

8.3.2. Annihilators and relations between fundamental subspaces

Definition 8.3.4.

Let $X$ be a vector space and let $E\subset X$. The annihilator of $E$, denoted by $E^{\perp}$, is the set of all $\mathbf{x}^{\prime}\in X^{\prime}$ such that $\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=0$ for all $\mathbf{x}\in E$.

Using the fact that $X^{\prime\prime}$ is canonically isomorphic to $X$ (see Section 8.1.2), we say that for $E\subset X^{\prime}$ its annihilator $E^{\perp}$ consists of all vectors $\mathbf{x}\in X$ such that $\langle\mathbf{x},\mathbf{x}^{\prime}\rangle=0$ for all $\mathbf{x}^{\prime}\in E$.

Remark 8.3.5.

Formally speaking, for $E\subset X^{\prime}$ the set $E^{\perp}$ should be defined as the set of all $\mathbf{x}^{\prime\prime}\in X^{\prime\prime}$ such that $\langle\mathbf{x}^{\prime},\mathbf{x}^{\prime\prime}\rangle=0$ for all $\mathbf{x}^{\prime}\in E$; the symbol $E_{\perp}$ is often used for the annihilator from the second part of Definition 8.3.4. However, because of the natural isomorphism of $X^{\prime\prime}$ and $X$, there is no real difference between these two cases, so we will always use $E^{\perp}$.

Distinguishing the cases $E\subset X$ and $E\subset X^{\prime}$ makes a lot of sense in the infinite-dimensional situation, where $X^{\prime\prime}$ is not always canonically isomorphic to $X$.

The spaces for which $X^{\prime\prime}$ is canonically isomorphic to $X$ have a special name: they are called reflexive spaces.

Proposition 8.3.6.

Let $E$ be a subspace of $X$. Then $(E^{\perp})^{\perp}=E$.

This proposition looks exactly like Proposition 5.3.6 from Chapter 5. However, its proof is a bit more complicated, since the suggested proof of Proposition 5.3.6 from Chapter 5 heavily used the inner product space structure: it used the decomposition $X=E\oplus E^{\perp}$, which does not hold in our situation because, for example, $E$ and $E^{\perp}$ now live in different spaces.

Proof.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}$ be a basis in $E$ (recall that all spaces in this chapter are assumed to be finite-dimensional), so $E=\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}$.

By Proposition 2.5.4 from Chapter 2, the system can be extended to a basis of all of $X$, i.e. one can find vectors $\mathbf{v}_{r+1},\ldots,\mathbf{v}_{n}$ ($n=\dim X$) such that $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ is a basis in $X$.

Let $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{n}$ be the dual basis to $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$. By Exercise 8.3.3 below, $E^{\perp}=\operatorname{span}\{\mathbf{v}^{\prime}_{r+1},\ldots,\mathbf{v}^{\prime}_{n}\}$. Applying this exercise again, now to $E^{\perp}$, we get that

(E^{\perp})^{\perp}=\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}=E. ∎

The following theorem is analogous to Theorem 5.5.1 from Chapter 5.

Theorem 8.3.7.

Let $A:X\to Y$ be an operator acting from one vector space to another. Then

a) $\operatorname{Ker}A^{\prime}=(\operatorname{Ran}A)^{\perp}$;

b) $\operatorname{Ker}A=(\operatorname{Ran}A^{\prime})^{\perp}$;

c) $\operatorname{Ran}A=(\operatorname{Ker}A^{\prime})^{\perp}$;

d) $\operatorname{Ran}A^{\prime}=(\operatorname{Ker}A)^{\perp}$.

Proof.

First of all, let us notice that since for a subspace $E$ we have $(E^{\perp})^{\perp}=E$, statements a) and c) are equivalent. Similarly, for the same reason, statements b) and d) are equivalent as well. Finally, statement b) is exactly statement a) applied to the operator $A^{\prime}$ (here we use the trivial fact that $(A^{\prime})^{\prime}=A$, which is true, for example, because of the corresponding fact for the transpose).

So, to prove the theorem we only need to prove statement a).

Recall that $A^{\prime}:Y^{\prime}\to X^{\prime}$. The inclusion $\mathbf{y}^{\prime}\in(\operatorname{Ran}A)^{\perp}$ means that $\mathbf{y}^{\prime}$ annihilates all vectors of the form $A\mathbf{x}$, i.e. that

\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=0\qquad\forall\mathbf{x}\in X.

Since $\langle A\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle\mathbf{x},A^{\prime}\mathbf{y}^{\prime}\rangle$, the last identity is equivalent to

\langle\mathbf{x},A^{\prime}\mathbf{y}^{\prime}\rangle=0\qquad\forall\mathbf{x}\in X.

But that means that $A^{\prime}\mathbf{y}^{\prime}=\mathbf{0}$ ($A^{\prime}\mathbf{y}^{\prime}$ is the zero functional).

So we have proved that $\mathbf{y}^{\prime}\in(\operatorname{Ran}A)^{\perp}$ iff $A^{\prime}\mathbf{y}^{\prime}=\mathbf{0}$, or equivalently iff $\mathbf{y}^{\prime}\in\operatorname{Ker}A^{\prime}$. ∎
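Statement a) is easy to check numerically (a sketch assuming numpy; the null space is computed via the SVD, and the matrix is randomly generated, so it generically has full column rank):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 3))          # A : R^3 -> R^4

    def null_space(M, tol=1e-10):
        # columns form a basis of Ker M; the last rows of Vt span the null space
        U, s, Vt = np.linalg.svd(M)
        rank = int(np.sum(s > tol))
        return Vt[rank:].T

    N = null_space(A.T)                  # columns span Ker A' = Ker(A^T)
    # every functional in Ker A' annihilates Ran A, i.e. kills every vector A x:
    assert np.allclose(N.T @ A, 0)
    # dimensions are consistent with Ker A' = (Ran A)^perp:
    assert N.shape[1] + np.linalg.matrix_rank(A) == A.shape[0]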

Exercises.

8.3.1.

Prove that if for linear transformations $T,T_{1}:X\to Y$

\langle T\mathbf{x},\mathbf{y}^{\prime}\rangle=\langle T_{1}\mathbf{x},\mathbf{y}^{\prime}\rangle

for all $\mathbf{x}\in X$ and for all $\mathbf{y}^{\prime}\in Y^{\prime}$, then $T=T_{1}$.

Probably one of the easiest ways of proving this is to use Lemma 8.1.3.

8.3.2.

Combine the Riesz Representation Theorem (Theorem 8.2.1) with the reasoning in Section 8.3.1 above to present a coordinate-free definition of the Hermitian adjoint of an operator in an inner product space.

The next problem gives a way to prove Proposition 8.3.6.

8.3.3.

Let $\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$ be a basis in $X$ and let $\mathbf{v}^{\prime}_{1},\mathbf{v}^{\prime}_{2},\ldots,\mathbf{v}^{\prime}_{n}$ be its dual basis. Let $E:=\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}$, $r<n$. Prove that $E^{\perp}=\operatorname{span}\{\mathbf{v}^{\prime}_{r+1},\ldots,\mathbf{v}^{\prime}_{n}\}$.

8.3.4.

Use the previous problem to prove that for a subspace $E\subset X$

\dim E+\dim E^{\perp}=\dim X.

8.4. What is the difference between a space and its dual?

We know that the dual space $X^{\prime}$ has the same dimension as $X$, so the space and its dual are isomorphic. One might therefore think that there is really no difference between the space and its dual. However, as we discussed above in Section 8.1.1, when we change the basis in the space $X$, the coordinates in $X$ and in $X^{\prime}$ change according to different rules; see formula (8.1.1) above.

On the other hand, using the natural isomorphism of $X$ and $X^{\prime\prime}$, we can say that $X$ is the dual of $X^{\prime}$. From this point of view, there is no difference between $X$ and $X^{\prime}$: we can start from $X$ and say that $X^{\prime}$ is its dual, or we can do it the other way around and start from $X^{\prime}$.

We already used this point of view above, for example in the proof of Theorem 8.3.7.

Note also that the change of coordinates formula (8.1.1) (see also the boxed statement below it) agrees with this point of view: if $\widetilde{S}:=(S^{-1})^{T}$, then $(\widetilde{S}^{-1})^{T}=S$, so we get the change of coordinates formula in $X$ from the one in $X^{\prime}$ by the same rule!

8.4.1. Isomorphisms between $X$ and $X^{\prime}$

There are infinitely many possibilities to define an isomorphism between $X$ and $X^{\prime}$.

If $X=\mathbb{F}^{n}$, then the most natural way to identify $X$ and $X^{\prime}$ is to identify the standard basis in $\mathbb{F}^{n}$ with the one in $(\mathbb{F}^{n})^{\prime}$. In this case the action of a linear functional is given by the “inner product type” expression

\langle\mathbf{v},\mathbf{v}^{\prime}\rangle=(\mathbf{v}^{\prime})^{T}\mathbf{v}.

To generalize this to the general case, one has to fix a basis $\mathcal{B}=\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}$ in $X$, consider the dual basis $\mathcal{B}^{\prime}=\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}$, and define an isomorphism $T:X\to X^{\prime}$ by $T\mathbf{b}_{k}=\mathbf{b}^{\prime}_{k}$, $k=1,2,\ldots,n$.

This isomorphism is natural in some sense, but it depends on the choice of the basis, so in general there is no natural way to identify $X$ and $X^{\prime}$.

The exception is the case when $X$ is a real inner product space: the Riesz representation theorem (Theorem 8.2.1) gives a natural way to identify a linear functional with a vector in $X$. Note that this approach works only for real inner product spaces: in the complex case, the Riesz representation theorem still gives a natural identification of $X$ and $X^{\prime}$, but this identification is not linear, only conjugate linear.

8.4.2. An example: velocities (differential operators) and differential forms as vectors and linear functionals

To illustrate the relations between vectors and linear functionals, let us consider an example from multivariable calculus, which gives rise to important ideas like tangent and cotangent bundles in differential geometry.

Let us recall the notion of the path integral (of the second kind) from calculus. Recall that a path $\gamma$ in $\mathbb{R}^{n}$ is defined by its parameterization, i.e. by a function

t\mapsto\mathbf{x}(t)=(x_{1}(t),x_{2}(t),\ldots,x_{n}(t))^{T}

acting from an interval $[a,b]$ to $\mathbb{R}^{n}$. If $\omega$ is a so-called differential form (differential $1$-form),

\omega=f_{1}(\mathbf{x})\,dx_{1}+f_{2}(\mathbf{x})\,dx_{2}+\ldots+f_{n}(\mathbf{x})\,dx_{n},

the path integral

\int_{\gamma}\omega=\int_{\gamma}f_{1}\,dx_{1}+f_{2}\,dx_{2}+\ldots+f_{n}\,dx_{n}

is computed by substituting $\mathbf{x}(t)=(x_{1}(t),x_{2}(t),\ldots,x_{n}(t))^{T}$ into the expression, i.e. $\int_{\gamma}\omega$ is computed as

\int_{a}^{b}\left(f_{1}(\mathbf{x}(t))\frac{dx_{1}(t)}{dt}+f_{2}(\mathbf{x}(t))\frac{dx_{2}(t)}{dt}+\ldots+f_{n}(\mathbf{x}(t))\frac{dx_{n}(t)}{dt}\right)dt.

In other words, at each moment $t$ we have to evaluate the velocity

\mathbf{v}=\frac{d\mathbf{x}(t)}{dt}=\left(\frac{dx_{1}(t)}{dt},\frac{dx_{2}(t)}{dt},\ldots,\frac{dx_{n}(t)}{dt}\right)^{T},

apply to it the linear functional $\mathbf{f}=(f_{1},f_{2},\ldots,f_{n})$, $\mathbf{f}(\mathbf{v})=\sum_{k=1}^{n}f_{k}v_{k}$ (here $f_{k}=f_{k}(\mathbf{x}(t))$, but for a fixed $t$ each $f_{k}$ is just a number, so we simply write $f_{k}$), and then integrate the result (which depends on $t$) with respect to $t$.

Velocities as vectors

Let us fix $t$ and analyze $\mathbf{f}(\mathbf{v})$. We will show that, according to the rules of calculus, the coordinates of $\mathbf{v}$ change as coordinates of a vector, and the coordinates of $\mathbf{f}$ as the coordinates of a linear functional (covector). Let us assume, as is customary in calculus, that the $x_{k}$ are the coordinates in the standard basis in $\mathbb{R}^{n}$, and let $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ be a different basis in $\mathbb{R}^{n}$. We will use the notation $\widetilde{x}_{k}$ for the coordinates of a vector $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}$ in the basis $\mathcal{B}$, i.e. $[\mathbf{x}]_{\mathcal{B}}=(\widetilde{x}_{1},\widetilde{x}_{2},\ldots,\widetilde{x}_{n})^{T}$.

Let $A=\{a_{k,j}\}_{k,j=1}^{n}$ be the change of coordinates matrix, $A=[I]_{\mathcal{B},\mathcal{S}}$, so the new coordinates $\widetilde{x}_{k}$ are expressed in terms of the old ones as

\widetilde{x}_{k}=\sum_{j=1}^{n}a_{k,j}x_{j},\qquad k=1,2,\ldots,n.

So the new coordinates $\widetilde{v}_{k}$ of the vector $\mathbf{v}$ are obtained from its old coordinates $v_{k}$ as

\widetilde{v}_{k}=\sum_{j=1}^{n}a_{k,j}v_{j},\qquad k=1,2,\ldots,n.

Differential forms as linear functionals (covectors)

Let us now calculate the differential form

(8.4.1) \omega=\sum_{k=1}^{n}f_{k}\,dx_{k}

in terms of the new coordinates $\widetilde{x}_{k}$. The change of coordinates matrix from the new coordinates to the old ones is $A^{-1}$. Let $A^{-1}=\{\widetilde{a}_{k,j}\}_{k,j=1}^{n}$, so

x_{k}=\sum_{j=1}^{n}\widetilde{a}_{k,j}\widetilde{x}_{j},\quad\text{and}\quad dx_{k}=\sum_{j=1}^{n}\widetilde{a}_{k,j}\,d\widetilde{x}_{j},\qquad k=1,2,\ldots,n.

Substituting this into (8.4.1) we get

\omega=\sum_{k=1}^{n}f_{k}\sum_{j=1}^{n}\widetilde{a}_{k,j}\,d\widetilde{x}_{j}=\sum_{j=1}^{n}\Bigl(\sum_{k=1}^{n}\widetilde{a}_{k,j}f_{k}\Bigr)d\widetilde{x}_{j}=\sum_{j=1}^{n}\widetilde{f}_{j}\,d\widetilde{x}_{j},

where

\widetilde{f}_{j}=\sum_{k=1}^{n}\widetilde{a}_{k,j}f_{k}.

But that is exactly the change of coordinate rule for the dual space! So

According to the rules of calculus, the coefficients of a differential $1$-form change by the same rule as coordinates in the dual space.

So, according to the accepted rules of calculus, the coordinates of a velocity $\mathbf{v}$ change as coordinates of a vector, and the coefficients (coordinates) of a differential $1$-form change as the entries of a linear functional. In differential geometry, the set of all velocities (at a given point) is called the tangent space, and the set of all differential $1$-forms is its dual and is called the cotangent space.

Differential operators as vectors

As we discussed above, in differential geometry vectors are represented by velocities, i.e. by the derivatives $d\mathbf{x}(t)/dt$. This is a simple and intuitively clear point of view, but sometimes it is viewed as a bit naïve.

A more “highbrow” point of view, also used in differential geometry (although in more advanced texts), is that vectors are represented by differential operators

(8.4.2) D=\sum_{k}v_{k}\frac{\partial}{\partial x_{k}}.

The informal reason for this is the following. Suppose we want to compute the derivative of a function $\Phi$ along the path given by the function $t\mapsto\mathbf{x}(t)$, i.e. the derivative

\frac{d\Phi(\mathbf{x}(t))}{dt}.

By the chain rule, at a given time $t$

\frac{d\Phi(\mathbf{x}(t))}{dt}=\sum_{k=1}^{n}\left(\frac{\partial\Phi}{\partial x_{k}}\Big|_{\mathbf{x}=\mathbf{x}(t)}\right)x_{k}^{\prime}(t)=D\Phi\Big|_{\mathbf{x}=\mathbf{x}(t)},

where the differential operator $D$ is given by (8.4.2) with $v_{k}=x_{k}^{\prime}(t)$.
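This identity is easy to verify symbolically (a sketch assuming Python with sympy; the path and the function $\Phi$ below are chosen arbitrarily for illustration):

    import sympy as sp

    t = sp.symbols('t')
    X1, X2 = sp.symbols('x1 x2')
    x1, x2 = sp.cos(t), sp.sin(t)            # a path x(t) in R^2
    Phi = X1**2 + X1*X2                      # a function Phi(x1, x2)

    # left side: d/dt of Phi(x(t))
    lhs = sp.diff(Phi.subs({X1: x1, X2: x2}), t)

    # right side: (D Phi)|_{x=x(t)} with D = sum_k v_k d/dx_k, v_k = x_k'(t)
    rhs = (sp.diff(Phi, X1) * sp.diff(x1, t) +
           sp.diff(Phi, X2) * sp.diff(x2, t)).subs({X1: x1, X2: x2})

    assert sp.simplify(lhs - rhs) == 0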

Of course, we need to show that the coefficients $v_{k}$ of such a differential operator change according to the change of coordinates rule for vectors. This is intuitively clear, and can easily be shown using the multivariable chain rule. We leave this as an exercise for the reader; see Problem 8.4.1 below.

8.4.3. The case of a real inner product space

As we already discussed above, it follows from the Riesz representation theorem (Theorem 8.2.1) that a real inner product space $X$ and its dual $X^{\prime}$ are canonically isomorphic. Thus we can say that vectors and functionals live in the same space, which makes things both simpler and more confusing.

Remark.

First of all, let us note that if the change of coordinates matrix $S$ is orthogonal ($S^{-1}=S^{T}$), then $(S^{-1})^{T}=S$. Therefore, for an orthogonal change of coordinates matrix, the coordinates of a vector and of a linear functional change according to the same rule, so one cannot really see a difference between a vector and a functional.

The change of coordinates matrix is orthogonal, for example, if we change from one orthonormal basis to another.

Einstein notation, metric tensor

Let $\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}$ be a basis in a real inner product space $X$ and let $\mathcal{B}^{\prime}=\{\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}\}$ be the dual basis (we identify the dual space $X^{\prime}$ with $X$ via the Riesz representation theorem, so the $\mathbf{b}^{\prime}_{k}$ can be assumed to be in $X$).

Here we present the notation standard in differential geometry (the so-called Einstein notation) for working with coordinates in these bases. Since we will only be working with coordinates, we can assume that we are working in the space $\mathbb{R}^{n}$ with the non-standard inner product $(\,\cdot\,,\,\cdot\,)_{G}$ defined by the positive definite matrix $G=\{g_{j,k}\}_{j,k=1}^{n}$, $g_{j,k}=(\mathbf{b}_{k},\mathbf{b}_{j})_{X}$, which is often called the metric tensor:

(8.4.3) (\mathbf{x},\mathbf{y})=(\mathbf{x},\mathbf{y})_{G}=\sum_{j=1}^{n}\sum_{k=1}^{n}g_{j,k}x_{j}y_{k},\qquad\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}

(see Section 7.5 in Chapter 7).

To distinguish between vectors and linear functionals (covectors), it is agreed to write the coordinates of a vector with indices as superscripts and the coordinates of a linear functional with indices as subscripts: thus $x^{j}$, $j=1,2,\ldots,n$, denotes the coordinates of a vector $\mathbf{x}$, and $f_{k}$, $k=1,2,\ldots,n$, denotes the coordinates of a linear functional $\mathbf{f}$.

Remark.

Putting indices as superscripts can be confusing, since one needs to distinguish them from powers. However, this is a standard and widely used notation, so we need to get acquainted with it. While I personally, like a lot of mathematicians, prefer coordinate-free notation, all final computations are done in coordinates, so coordinate notation has to be used. And as far as coordinate notations go, you will see that this one is quite convenient to work with.

Another convention of the Einstein notation is that whenever the same index appears in a product both as a subscript and as a superscript, one sums over this index. Thus $x^{j}f_{j}$ means $\sum_{j}x^{j}f_{j}$, so we can write $\mathbf{f}(\mathbf{x})=x^{j}f_{j}$. The same convention holds when we have more than one index of summation, so (8.4.3) can be rewritten in this notation as

(8.4.4) (\mathbf{x},\mathbf{y})=g_{j,k}x^{k}y^{j},\qquad\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}

(mathematicians are lazy and are always trying to avoid writing extra symbols, whenever they can).

Finally, the last convention of the Einstein notation is the preservation of the position of the indices: if we do not sum over an index, it remains in the same position (subscript or superscript) as before. Thus we can write $y^{j}=a^{j}_{k}x^{k}$, but not $f_{j}=a^{j}_{k}x^{k}$, because the index $j$ must remain a superscript.

Note that to compute the inner product of two vectors, knowing their coordinates is not sufficient: one also needs to know the metric tensor $G$. This agrees with the Einstein notation: if we try to write $(\mathbf{x},\mathbf{y})$ as the standard inner product, the expression $x_{k}y_{k}$ means just the product of coordinates, since the summation convention requires the same index to appear both as a subscript and as a superscript. The expression (8.4.4), on the other hand, fits this convention perfectly.

Covariant and contravariant coordinates. Lowering and raising the indices

Let us recall that we have a basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} in a real inner product space, and that 𝐛1,𝐛2,,𝐛n\mathbf{b}^{\prime}_{1},\mathbf{b}^{\prime}_{2},\ldots,\mathbf{b}^{\prime}_{n}, 𝐛kX\mathbf{b}^{\prime}_{k}\in X is its dual basis (we identify XX with its dual XX^{\prime} via Riesz Representation Theorem, so 𝐛k\mathbf{b}_{k}^{\prime} are in XX).

Given a vector 𝐱X\mathbf{x}\in X it can be represented as

(8.4.5) 𝐱\displaystyle\mathbf{x} =k=1n(𝐱,𝐛k)𝐛k=:k=1nxk𝐛k,and as\displaystyle=\sum_{k=1}^{n}(\mathbf{x},\mathbf{b}^{\prime}_{k})\mathbf{b}_{k}% =:\sum_{k=1}^{n}x^{k}\mathbf{b}_{k},\qquad\text{and as}
(8.4.6) 𝐱\displaystyle\mathbf{x} =k=1n(𝐱,𝐛k)𝐛k=:k=1nxk𝐛k.\displaystyle=\sum_{k=1}^{n}(\mathbf{x},\mathbf{b}_{k})\mathbf{b}^{\prime}_{k}% =:\sum_{k=1}^{n}x_{k}\mathbf{b}^{\prime}_{k}.

The coordinates xkx_{k} are called the covariant coordinates of the vector 𝐱\mathbf{x} and the coordinates xkx^{k} are called the contravariant coordinates.

Now let us ask ourselves a question: how can one get covariant coordinates of a vector from the contravariant ones?

According to the Einstein notation, we use the contravariant coordinates when working with vectors, and the covariant ones for linear functionals (i.e. when we interpret a vector \mathbf{x}\in X as a linear functional). We know (see (8.4.6)) that x_k=(\mathbf{x},\mathbf{b}_k), so

xk=(𝐱,𝐛k)=(jxj𝐛j,𝐛k)=jxj(𝐛j,𝐛k)=jgk,jxj,x_{k}=(\mathbf{x},\mathbf{b}_{k})=\Bigl{(}\sum_{j}x^{j}\mathbf{b}_{j},\mathbf{% b}_{k}\Bigr{)}=\sum_{j}x^{j}(\mathbf{b}_{j},\mathbf{b}_{k})=\sum_{j}g_{k,j}x^{% j},

or in the Einstein notation

xk=gk,jxj.x_{k}=g_{k,j}x^{j}.

In other words,

the metric tensor GG is the change of coordinates matrix from contravariant coordinates xkx^{k} to the covariant ones xkx_{k}.

The operation of passing from contravariant coordinates to covariant ones is called lowering the indices.

Note the following interpretation of the formula (8.4.4) for the inner product: as we know, for the vector \mathbf{x} we get its covariant coordinates as x_j=g_{j,k}x^k, so (\mathbf{x},\mathbf{y})=x_j y^j. Similarly, because G is symmetric, we can say that y_k=g_{j,k}y^j and that (\mathbf{x},\mathbf{y})=x^k y_k. In other words,

To compute the inner product of two vectors, one first needs to use the metric tensor GG to lower indices of one vector, and then, treating this vector as a functional compute its value on the other vector.

Of course, we can also change from covariant coordinates xjx_{j} to contravariant ones xjx^{j} (raise the indices). Since

(x1,x2,,xn)T=G(x1,x2,,xn)T,(x_{1},x_{2},\ldots,x_{n})^{T}=G(x^{1},x^{2},\ldots,x^{n})^{T},

we get that

(x1,x2,,xn)T=G1(x1,x2,,xn)T(x^{1},x^{2},\ldots,x^{n})^{T}=G^{-1}(x_{1},x_{2},\ldots,x_{n})^{T}

so the change of coordinate matrix in this case is G1G^{-1}.

Since, as we just saw, the change of coordinates matrix from contravariant to covariant coordinates is exactly the metric tensor, we can immediately conclude that G^{-1} is the metric tensor in covariant coordinates, i.e. that if G^{-1}=\{g^{k,j}\}_{k,j=1}^n then

(𝐱,𝐲)=gk,jxjyk.(\mathbf{x},\mathbf{y})=g^{k,j}x_{j}y_{k}.
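Here is a short numerical sketch of lowering and raising the indices (again an added illustration with an assumed metric G, using numpy): lowering with G and raising with G^{-1} are inverse operations, and the inner product can be computed as x_k y^k.

import numpy as np

G = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # metric tensor g_{k,j} (hypothetical example)
G_inv = np.linalg.inv(G)          # inverse metric g^{k,j}

x_contra = np.array([1.0, 2.0])   # contravariant coordinates x^j

x_cov = np.einsum('kj,j->k', G, x_contra)      # lowering: x_k = g_{k,j} x^j
x_back = np.einsum('kj,j->k', G_inv, x_cov)    # raising:  x^k = g^{k,j} x_j

assert np.allclose(x_back, x_contra)           # the two operations are inverse

# the inner product as x_k y^k, cf. (8.4.4) and the discussion above
y = np.array([3.0, -1.0])
assert np.isclose(np.einsum('k,k->', x_cov, y), y @ G @ x_contra)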
Remark.

Note that if one looks at the big picture, the covariant and contravariant coordinates are completely interchangeable: it is just a matter of which of the two bases in the dual pair \mathcal{B}, \mathcal{B}' we assign to be the “primary” one and which to be the dual.

What to choose as the “primary” object, and what as the “dual” one, depends mostly on accepted conventions.

Remark 8.4.1.

Einstein notation is usually used in differential, and especially Riemannian, geometry, where vectors are identified with velocities and covectors (linear functionals) with differential 1-forms, see Section 8.4.2 above. Vectors and covectors there are clearly different objects and form what are called the tangent and cotangent spaces respectively.

In Riemannian geometry one then introduces an inner product (i.e. a metric tensor, if one thinks in terms of coordinates) on the tangent space, which allows us to identify vectors and covectors (linear functionals). In the coordinate representation this identification is done by lowering/raising the indices, as described above.

8.4.4. Conclusions

Let us summarize the above discussion on whether or not a space is different from its dual.

In short, the answer is “Yes”, they are different objects. Although in the finite-dimensional case, which is treated in this book, they are isomorphic, nothing is usually gained from the identification of a space and its dual.

Even in the simplest case of \mathbb{F}^n it is useful to think that the elements of \mathbb{F}^n are columns and the elements of its dual are rows (even though, when doing manipulations with the elements of the dual space, we often put the rows vertically). More striking are the examples considered in Section 8.1.4, dealing with the Taylor formula and with Lagrange interpolation. One can clearly see there that linear functionals are indeed completely different objects than polynomials, and that hardly anything can be gained by identifying functionals with polynomials.

For inner product spaces the situation is different, because such spaces can be canonically identified with their duals. This identification is linear for real inner product spaces, so a real inner product space is canonically isomorphic to its dual. In the case of complex spaces, this identification is only conjugate linear, but it is nevertheless very helpful to identify a linear functional with a vector and use the inner product space structure and ideas like orthogonality, self-adjointness, orthogonal projections, etc.

However, sometimes even in the case of real inner product spaces it is more natural to consider the space and its dual as different objects. For example, in Riemannian geometry (see Remark 8.4.1 above) vectors and covectors come from different objects, velocities and differential 1-forms respectively. Even though the introduction of the metric tensor allows us to identify vectors and covectors, it is sometimes more convenient to remember their origins and think of them as different objects.

Exercises.

8.4.1.

Let DD be a differential operator

D=k=1nvkxk.D=\sum_{k=1}^{n}v_{k}\frac{\partial}{\partial x_{k}}.

Show, using the chain rule, that if we change a basis and write DD in new coordinates, its coefficients vkv_{k} change according to the change of coordinates rule for vectors.

8.5. Multilinear functions. Tensors

8.5.1. Multilinear functions

Definition 8.5.1.

Let V1,V2,,Vp,VV_{1},V_{2},\ldots,V_{p},V be vector spaces (over the same field 𝔽\mathbb{F}). A multilinear (pp-linear) map with values in VV is a function FF of pp vector variables 𝐯1,𝐯2,,𝐯p\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}, 𝐯kVk\mathbf{v}_{k}\in V_{k}, with the target space VV, which is linear in each variable 𝐯k\mathbf{v}_{k}. In other words, it means that if we fix all variables except 𝐯k\mathbf{v}_{k} we get a linear map, and this should be true for all k=1,2,,pk=1,2,\ldots,p. We will use the symbol L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V) for the set of all such multilinear functions.

If the target space V is the field of scalars \mathbb{F}, we call F a multilinear functional, or tensor. The number p is called the valency of the multilinear functional (tensor). Thus, a tensor of valency 1 is a linear functional, and a tensor of valency 2 is called a bilinear form.

Example.

Let 𝐟k(Vk)\mathbf{f}_{k}\in(V_{k})^{\prime}. Define a polylinear functional F=𝐟1𝐟2𝐟pF=\mathbf{f}_{1}\otimes\mathbf{f}_{2}\otimes\ldots\otimes\mathbf{f}_{p} by multiplying the functionals 𝐟k\mathbf{f}_{k},

(8.5.1) 𝐟1𝐟2𝐟p(𝐯1,𝐯2,,𝐯p)=𝐟1(𝐯1)𝐟2(𝐯2)𝐟p(𝐯p),\mathbf{f}_{1}\otimes\mathbf{f}_{2}\otimes\ldots\otimes\mathbf{f}_{p}(\mathbf{% v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})=\mathbf{f}_{1}(\mathbf{v}_{1})% \mathbf{f}_{2}(\mathbf{v}_{2})\ldots\mathbf{f}_{p}(\mathbf{v}_{p}),

for 𝐯kVk\mathbf{v}_{k}\in V_{k}, k=1,2,,pk=1,2,\ldots,p. The polylinear functional 𝐟1𝐟2𝐟p\mathbf{f}_{1}\otimes\mathbf{f}_{2}\otimes\ldots\otimes\mathbf{f}_{p} is called the tensor product of functionals 𝐟k\mathbf{f}_{k}.
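A small numerical sketch of the tensor product of functionals (an added illustration, assuming numpy; here p = 2 and the functionals are represented by their coefficient rows, so f(v) is the usual row-by-column product):

import numpy as np

f1 = np.array([1.0, 2.0])         # a functional on V_1 = R^2
f2 = np.array([0.0, 1.0, -1.0])   # a functional on V_2 = R^3

v1 = np.array([3.0, 1.0])
v2 = np.array([1.0, 1.0, 2.0])

# (f1 ⊗ f2)(v1, v2) = f1(v1) f2(v2), as in (8.5.1)
lhs = (f1 @ v1) * (f2 @ v2)

# equivalently, contract the array F[j,k] = f1[j] f2[k] with both vectors
F = np.outer(f1, f2)
rhs = np.einsum('jk,j,k->', F, v1, v2)

assert np.isclose(lhs, rhs)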

Multilinear functions form a vector space

Notice, that in the space L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V) one can introduce the natural operations of addition and multiplication by a scalar,

(F1+F2)(𝐯1,𝐯2,,𝐯p)\displaystyle(F_{1}+F_{2})(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}) :=F1(𝐯1,𝐯2,,𝐯p)+F2(𝐯1,𝐯2,,𝐯p),\displaystyle:=F_{1}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})+F_{2% }(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}),
(αF1)(𝐯1,𝐯2,,𝐯p)\displaystyle(\alpha F_{1})(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}) :=αF1(𝐯1,𝐯2,,𝐯p),\displaystyle:=\alpha F_{1}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p% }),

where F1,F2L(V1,V2,,Vp;V)F_{1},F_{2}\in L(V_{1},V_{2},\ldots,V_{p};V), α𝔽\alpha\in\mathbb{F}.

Equipped with these operations, the space L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V) is a vector space.

To see that, we first need to show that F_1+F_2 and \alpha F_1 are multilinear functions. Since “multilinear” means linear in each argument separately (with all the other variables fixed), this follows from the corresponding fact about linear transformations, namely from the fact that the sum of linear transformations and a scalar multiple of a linear transformation are linear transformations, cf. Section 1.4 of Chapter 1.

Then it is easy to show that L(V_1,V_2,\ldots,V_p;V) satisfies all the axioms of a vector space; one just needs to use the fact that V satisfies these axioms. We leave the details as an exercise for the reader, who can look at Section 1.4 of Chapter 1, where it was shown that the set of linear transformations satisfies axiom 7. Literally the same proof works for multilinear functions; the proof that all the other axioms are also satisfied is very similar.

Dimension of L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V)

Let 1,2,,p\mathcal{B}_{1},\mathcal{B}_{2},\ldots,\mathcal{B}_{p} be bases in the spaces V1,V2,,VpV_{1},V_{2},\ldots,V_{p} respectively. Since a linear transformation is defined by its action on a basis, a multilinear function FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) is defined by its values on all tuples

𝐛j11,𝐛j22,,𝐛jpp,𝐛jkkk.\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}},% \qquad\mathbf{b}^{k}_{j_{k}}\in\mathcal{B}_{k}.

Since there are exactly

(dimV1)(dimV2)(dimVp)(\dim V_{1})(\dim V_{2})\ldots(\dim V_{p})

such tuples, and each F(\mathbf{b}^1_{j_1},\mathbf{b}^2_{j_2},\ldots,\mathbf{b}^p_{j_p}) is determined by \dim V coordinates (in some basis in V), we can conclude that F\in L(V_1,V_2,\ldots,V_p;V) is determined by (\dim V_1)(\dim V_2)\ldots(\dim V_p)(\dim V) entries. In other words,

dimL(V1,V2,,Vp;V)=(dimV1)(dimV2)(dimVp)(dimV).\dim L(V_{1},V_{2},\ldots,V_{p};V)=(\dim V_{1})(\dim V_{2})\ldots(\dim V_{p})(% \dim V).

In particular, if the target space is the field of scalars \mathbb{F} (i.e. if we are dealing with multilinear functionals),

dimL(V1,V2,,Vp;𝔽)=(dimV1)(dimV2)(dimVp).\dim L(V_{1},V_{2},\ldots,V_{p};\mathbb{F})=(\dim V_{1})(\dim V_{2})\ldots(% \dim V_{p}).

It is easy to find a basis in L(V_1,V_2,\ldots,V_p;\mathbb{F}). Namely, for k=1,2,\ldots,p let the system \mathcal{B}_k=\{\mathbf{b}^k_j\}_{j=1}^{\dim V_k} be a basis in V_k and let \mathcal{B}_k'=\{\widetilde{\mathbf{b}}^k_j\}_{j=1}^{\dim V_k} be its dual system, \widetilde{\mathbf{b}}^k_j\in V_k'.

Proposition 8.5.2.

The system

𝐛~j11𝐛~j22𝐛~jpp,1jkdimVk,k=1,2,,p,\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{2}_{j_{2}}% \otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}},\qquad 1\leq j_{k}\leq% \dim V_{k},\quad k=1,2,\ldots,p,

is a basis in the space L(V1,V2,,Vp;𝔽)L(V_{1},V_{2},\ldots,V_{p};\mathbb{F}).

Here 𝐛~j11𝐛~j22𝐛~jpp\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{2}_{j_{2}}% \otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}} is the tensor product of functionals, as defined in (8.5.1).

Proof.

We want to represent FF as

(8.5.2) F=j1,j2,,jpαj1,j2,,jp𝐛~j11𝐛~j22𝐛~jppF=\sum_{j_{1},j_{2},\ldots,j_{p}}\alpha_{j_{1},j_{2},\ldots,j_{p}}\widetilde{% \mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{2}_{j_{2}}\otimes\ldots% \otimes\widetilde{\mathbf{b}}^{p}_{j_{p}}

Since 𝐛~j(𝐛l)=δj,l\widetilde{\mathbf{b}}_{j}(\mathbf{b}_{l})=\delta_{j,l}, we have

(8.5.3) 𝐛~j11𝐛~j22𝐛~jpp(𝐛j11,𝐛j22,,𝐛jpp)\displaystyle\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{% 2}_{j_{2}}\otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}}(\mathbf{b}^{1% }_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}}) =1and\displaystyle=1\qquad\text{and}
(8.5.4) 𝐛~j11𝐛~j22𝐛~jpp(𝐛j11,𝐛j22,,𝐛jpp)\displaystyle\widetilde{\mathbf{b}}^{1}_{j_{1}}\otimes\widetilde{\mathbf{b}}^{% 2}_{j_{2}}\otimes\ldots\otimes\widetilde{\mathbf{b}}^{p}_{j_{p}}(\mathbf{b}^{1% }_{j^{\prime}_{1}},\mathbf{b}^{2}_{j^{\prime}_{2}},\ldots,\mathbf{b}^{p}_{j^{% \prime}_{p}}) =0\displaystyle=0

for any collection of indices j1,j2,,jpj^{\prime}_{1},j^{\prime}_{2},\ldots,j^{\prime}_{p} different from j1,j2,,jpj_{1},j_{2},\ldots,j_{p}.

Therefore, applying (8.5.2) to the tuple 𝐛j11,𝐛j22,,𝐛jpp\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}} we get

αj1,j2,,jp=F(𝐛j11,𝐛j22,,𝐛jpp),\alpha_{j_{1},j_{2},\ldots,j_{p}}=F(\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{% 2}},\ldots,\mathbf{b}^{p}_{j_{p}}),

so the representation (8.5.2) is unique (if it exists).

On the other hand, defining αj1,j2,,jp:=F(𝐛j11,𝐛j22,,𝐛jpp)\alpha_{j_{1},j_{2},\ldots,j_{p}}:=F(\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_% {2}},\ldots,\mathbf{b}^{p}_{j_{p}}) and using (8.5.3) and (8.5.4), we can see that the equality (8.5.2) holds on all tuples of form 𝐛j11,𝐛j22,,𝐛jpp\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,\mathbf{b}^{p}_{j_{p}}. So decomposition (8.5.2) holds, so we indeed have a basis. ∎
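The proof can be checked numerically; here is a small sketch for p = 2 (an added illustration, assuming numpy and the standard bases in R^2 and R^3, so that the dual basis functionals are the coordinate functionals):

import numpy as np

rng = np.random.default_rng(1)
alpha = rng.standard_normal((2, 3))   # coefficients alpha_{j1,j2} in (8.5.2)

def F(v1, v2):
    # the bilinear functional built from the coefficients alpha
    return np.einsum('jk,j,k->', alpha, v1, v2)

# evaluating F on pairs of basis vectors recovers the coefficients,
# exactly as in the proof above
e = lambda n, j: np.eye(n)[j]
recovered = np.array([[F(e(2, j), e(3, k)) for k in range(3)]
                      for j in range(2)])
assert np.allclose(recovered, alpha)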

8.5.2. Tensor Products

Definition.

Let V1,V2,,VpV_{1},V_{2},\ldots,V_{p} be vector spaces. The tensor product

V1V2VpV_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}

of spaces VkV_{k} is simply the set L(V1,V2,,Vp;𝔽)L(V^{\prime}_{1},V^{\prime}_{2},\ldots,V^{\prime}_{p};\mathbb{F}) of multilinear functionals; here VkV_{k}^{\prime} is the dual of VkV_{k}.

Remark 8.5.3.

By Proposition 8.5.2 we get that if k={𝐛jk}j=1dimVk\mathcal{B}_{k}=\{\mathbf{b}^{k}_{j}\}_{j=1}^{\dim V_{k}} is a basis in VkV_{k} for k=1,2,,pk=1,2,\ldots,p, then the system

(8.5.5) 𝐛j11𝐛j22𝐛jpp,1jkdimVk,k=1,2,,p,\mathbf{b}^{1}_{j_{1}}\otimes\mathbf{b}^{2}_{j_{2}}\otimes\ldots\otimes\mathbf% {b}^{p}_{j_{p}},\qquad 1\leq j_{k}\leq\dim V_{k},\quad k=1,2,\ldots,p,

is a basis in V1V2VpV_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}.

Here we treat a vector \mathbf{v}_k\in V_k as a linear functional on V_k'; the tensor product of vectors \mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p is defined according to (8.5.1).

Remark.

The tensor product \mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p of vectors is clearly linear in each argument \mathbf{v}_k. In other words, the map (\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p)\mapsto\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p is a multilinear function with values in V_1\otimes V_2\otimes\ldots\otimes V_p. We leave the proof as an exercise for the reader, see Problem 8.5.1 below.

Remark.

Note that the set \{\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p : \mathbf{v}_k\in V_k\} of tensor products of vectors is a proper subset of V_1\otimes V_2\otimes\ldots\otimes V_p, see Problem 8.5.2 below.

Lifting a multilinear function to a linear transformation on the tensor product

Proposition 8.5.4.

For any multilinear function FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) there exists a unique linear transformation T:V1V2VpVT:V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}\to V extending FF, i.e. such that

(8.5.6) F(𝐯1,𝐯2,,𝐯p)=T𝐯1𝐯2𝐯p,F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})=T\,\mathbf{v}_{1}% \otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p},

for all choices of vectors 𝐯kVk\mathbf{v}_{k}\in V_{k}, 1kp1\leq k\leq p.

Remark.

If T:V1V2VpVT:V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}\to V is a linear transformation, then trivially the function FF,

F(𝐯1,𝐯2,,𝐯p):=T𝐯1𝐯2𝐯p,F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}):=T\,\mathbf{v}_{1}% \otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p},

is a multilinear function in L(V1,V2,,Vp;V)L(V_{1},V_{2},\ldots,V_{p};V). This follows immediately from the fact that the expression 𝐯1𝐯2𝐯p\mathbf{v}_{1}\otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p} is linear in each variable 𝐯k\mathbf{v}_{k}.

Proof of Proposition 8.5.4.

Define TT on the basis (8.5.5) by

T𝐛j11𝐛j22𝐛jpp=F(𝐛j11,𝐛j22,,𝐛jpp)T\mathbf{b}^{1}_{j_{1}}\otimes\mathbf{b}^{2}_{j_{2}}\otimes\ldots\otimes% \mathbf{b}^{p}_{j_{p}}=F(\mathbf{b}^{1}_{j_{1}},\mathbf{b}^{2}_{j_{2}},\ldots,% \mathbf{b}^{p}_{j_{p}})

and then extend it by linearity to the whole space V_1\otimes V_2\otimes\ldots\otimes V_p. To complete the proof we need to show that (8.5.6) holds for all choices of vectors \mathbf{v}_k\in V_k, 1\le k\le p (so far we only know it when each \mathbf{v}_k is one of the basis vectors \mathbf{b}^k_{j_k}).

To prove that, let us decompose 𝐯k\mathbf{v}_{k} as

𝐯k=jkαjkk𝐛jkk,k=1,2,,p.\mathbf{v}_{k}=\sum_{j_{k}}\alpha^{k}_{j_{k}}\mathbf{b}^{k}_{j_{k}},\qquad k=1% ,2,\ldots,p.

Using linearity in each variable 𝐯k\mathbf{v}_{k} we get

\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p = \sum_{j_1,j_2,\ldots,j_p}\alpha^1_{j_1}\alpha^2_{j_2}\ldots\alpha^p_{j_p}\,\mathbf{b}^1_{j_1}\otimes\mathbf{b}^2_{j_2}\otimes\ldots\otimes\mathbf{b}^p_{j_p},
F(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_p) = \sum_{j_1,j_2,\ldots,j_p}\alpha^1_{j_1}\alpha^2_{j_2}\ldots\alpha^p_{j_p}\,F(\mathbf{b}^1_{j_1},\mathbf{b}^2_{j_2},\ldots,\mathbf{b}^p_{j_p}),

so by the definition of T the identity (8.5.6) holds. ∎
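For p = 2 and scalar-valued F the proposition is easy to see in coordinates; the following sketch (an added illustration, assuming numpy) represents u ⊗ v by the Kronecker product np.kron(u, v), so that the bilinear functional F(u, v) = u^T M v lifts to the linear functional T whose coefficient vector is M flattened:

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((2, 3))   # F(u, v) = u^T M v, a bilinear functional

u, v = rng.standard_normal(2), rng.standard_normal(3)

F_uv = u @ M @ v                  # F(u, v)
T = M.reshape(-1)                 # T as a functional on R^2 ⊗ R^3 ≅ R^6
T_uv = T @ np.kron(u, v)          # T(u ⊗ v)

assert np.isclose(F_uv, T_uv)     # (8.5.6) for this example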

Dual of a tensor product

As one can easily see, the dual of the tensor product V1V2VpV_{1}\otimes V_{2}\otimes\ldots\otimes V_{p} is the tensor product of dual spaces V1V2VpV_{1}^{\prime}\otimes V_{2}^{\prime}\otimes\ldots\otimes V_{p}^{\prime}.

Indeed, by Proposition 8.5.4 and the remark after it, there is a natural one-to-one correspondence between the multilinear functionals in L(V_1,V_2,\ldots,V_p;\mathbb{F}) (i.e. the elements of V_1'\otimes V_2'\otimes\ldots\otimes V_p') and the linear transformations T: V_1\otimes V_2\otimes\ldots\otimes V_p\to\mathbb{F} (i.e. the elements of the dual of V_1\otimes V_2\otimes\ldots\otimes V_p).

Note that the bases from Remark 8.5.3 and Proposition 8.5.2 are dual bases (in V_1\otimes V_2\otimes\ldots\otimes V_p and V_1'\otimes V_2'\otimes\ldots\otimes V_p' respectively). Knowing the dual bases allows us to easily calculate the duality between the spaces V_1\otimes V_2\otimes\ldots\otimes V_p and V_1'\otimes V_2'\otimes\ldots\otimes V_p', i.e. the expression \langle\mathbf{x},\mathbf{x}'\rangle for \mathbf{x}\in V_1\otimes V_2\otimes\ldots\otimes V_p, \mathbf{x}'\in V_1'\otimes V_2'\otimes\ldots\otimes V_p'.

8.5.3. Covariant and contravariant tensors

Let X_1,X_2,\ldots,X_p be vector spaces, and let V_k be either X_k or X_k', k=1,2,\ldots,p. For a multilinear function F\in L(V_1,V_2,\ldots,V_p;V) we say that it is covariant in the variable \mathbf{v}_k\in V_k if V_k=X_k, and contravariant in this variable if V_k=X_k'.

If a multilinear function is covariant (contravariant) in all variables, we say that the multilinear function is covariant (contravariant). In general, if a function is covariant in rr variables and contravariant in ss variables, we say that the multilinear function is rr-covariant ss-contravariant (or simply (r,s)(r,s) multilinear function, or that its valency is (r,s)(r,s)).

Thus, a linear functional can be interpreted as a 1-covariant tensor (recall that we use the word tensor in the case of functionals, i.e. when the target space is the field of scalars \mathbb{F}). By duality, a vector can be interpreted as a 1-contravariant tensor.

Remark.

At first the terminology might look a bit confusing: if a variable is a vector (not a functional), it is a covariant variable but a contravariant object (tensor). But notice that we did not actually say “covariant variable” here: we said that if \mathbf{v}_k\in X_k then the multilinear function is covariant in the variable \mathbf{v}_k. So by a covariant variable we mean not the vector \mathbf{v}_k itself, but the “slot” in the tensor where we put it!

So there is no contradiction: we put the contravariant objects (tensors) into covariant slots and vice versa.

Sometimes, slightly abusing the language, people talk about covariant (contravariant) variables or arguments. But it is usually meant that the corresponding “slots” in the tensor are covariant (contravariant), and not the variables as objects.

Linear transformations as tensors

A linear transformation T: X_1\to X_2 can be interpreted as a 1-covariant 1-contravariant tensor. Namely, the bilinear functional F,

F(𝐱1,𝐱2):=T𝐱1,𝐱2,𝐱1X1,𝐱2X2F(\mathbf{x}_{1},\mathbf{x}_{2}^{\prime}):=\langle T\mathbf{x}_{1},\mathbf{x}_% {2}^{\prime}\rangle,\qquad\mathbf{x}_{1}\in X_{1},\mathbf{x}_{2}^{\prime}\in X% _{2}^{\prime}

is covariant in the first variable 𝐱1\mathbf{x}_{1} and contravariant in the second one 𝐱2\mathbf{x}_{2}^{\prime}.

Conversely,

Proposition 8.5.5.

Given a 1-covariant 1-contravariant tensor F\in L(X_1,X_2';\mathbb{F}), there exists a unique linear transformation T: X_1\to X_2 such that

(8.5.7) F(\mathbf{x}_1,\mathbf{x}_2') = \langle T\mathbf{x}_1,\mathbf{x}_2'\rangle,

for all \mathbf{x}_1\in X_1, \mathbf{x}_2'\in X_2'.

Proof.

First of all note, that the uniqueness is a trivial corollary of Lemma 8.1.3, cf. Problem 8.3.1 above. So we only need to prove existence of TT.

Let \mathcal{B}_k=\{\mathbf{b}^k_j\}_{j=1}^{\dim X_k} be a basis in X_k, and let \mathcal{B}_k'=\{\widetilde{\mathbf{b}}^k_j\}_{j=1}^{\dim X_k} be the dual basis in X_k', k=1,2. Then define the \dim X_2\times\dim X_1 matrix A=\{a_{k,j}\}, k=1,2,\ldots,\dim X_2, j=1,2,\ldots,\dim X_1, by

ak,j=F(𝐛j1,𝐛~k2).a_{k,j}=F(\mathbf{b}^{1}_{j},\widetilde{\mathbf{b}}^{2}_{k}).

Define TT to be the operator with matrix [T]2,1=A[T]_{\mathcal{B}_{2},\mathcal{B}_{1}}=A. Clearly (see Remark 8.1.5)

(8.5.8) T𝐛j1,𝐛~k2=ak,j=F(𝐛j1,𝐛~k2)\langle T\mathbf{b}^{1}_{j},\widetilde{\mathbf{b}}^{2}_{k}\rangle=a_{k,j}=F(% \mathbf{b}^{1}_{j},\widetilde{\mathbf{b}}^{2}_{k})

which implies the equality (8.5.7). This can be easily seen by decomposing \mathbf{x}_1=\sum_j\alpha_j\mathbf{b}^1_j and \mathbf{x}_2'=\sum_k\beta_k\widetilde{\mathbf{b}}^2_k and using linearity in each argument.

Another, more “high brow” explanation is that the tensors on the left and right sides of (8.5.7) coincide on a basis in X_1\otimes X_2' (see Remark 8.5.3 about the basis), so they coincide everywhere. To be more precise, one should lift the bilinear forms to linear transformations (functionals) X_1\otimes X_2'\to\mathbb{F} (see Proposition 8.5.4), and since these transformations coincide on a basis, they are equal.

One can also give an alternative, coordinate-free proof of the existence of T, along the lines of the coordinate-free definition of the dual space (see Section 8.3.1). Namely, if we fix \mathbf{x}_1, the function F(\mathbf{x}_1,\mathbf{x}_2') is linear in \mathbf{x}_2', so it is a linear functional on X_2', i.e. a vector in X_2.

Let us call this vector T(\mathbf{x}_1). So we have defined a transformation T: X_1\to X_2. One can easily show that T is a linear transformation by essentially repeating the reasoning from Section 8.3.1. The equality (8.5.7) follows automatically from the definition of T. ∎

Remark.

Note that we can also say that the function F from Proposition 8.5.5 defines not the transformation T, but its adjoint. A priori, without assuming anything (like the order of the variables and their interpretation), we cannot distinguish between a transformation and its adjoint.

Remark.

Note that if we want to follow the Einstein notation, the entries a_{j,k} of the matrix A=[T]_{\mathcal{B}_2,\mathcal{B}_1} of the transformation T should be written as a^j_k. Then if x^k, k=1,2,\ldots,\dim X_1, are the coordinates of the vector \mathbf{x}\in X_1, the jth coordinate of \mathbf{y}=T\mathbf{x} is given by

yj=akjxk.y^{j}=a^{j}_{k}x^{k}.

Recall that here we skip the summation sign, but we mean the sum over k. Note also that we preserve the positions of the indices, so the index j stays upstairs. The index k does not appear on the left side of the equation because we sum over this index on the right side, so it gets “killed”.

Similarly, if x_j, j=1,2,\ldots,\dim X_2, are the coordinates of the vector \mathbf{x}'\in X_2', then the kth coordinate of \mathbf{y}':=T'\mathbf{x}' is given by

yk=akjxjy_{k}=a^{j}_{k}x_{j}

(again, skipping the summation sign over j). Again, we preserve the positions of the indices, so the index k in y_k is a subscript.

Note, that since 𝐱X1\mathbf{x}\in X_{1} and 𝐲=T𝐱X2\mathbf{y}=T\mathbf{x}\in X_{2} are vectors, according to the conventions of the Einstein notation, the indices in their coordinates indeed should be written as superscripts.

Similarly, 𝐱X2\mathbf{x}^{\prime}\in X_{2}^{\prime} and 𝐲=T𝐱X1\mathbf{y}^{\prime}=T^{\prime}\mathbf{x}^{\prime}\in X_{1}^{\prime} are covectors, so indices in their coordinates should be written as subscripts.

The Einstein notation emphasizes the fact mentioned in the previous remark, that a 11-covariant 11-contravariant tensor gives us both a linear transformation and its adjoint: the expression akjxka^{j}_{k}x^{k} gives the action of TT, and akjxja^{j}_{k}x_{j} gives the action of its adjoint TT^{\prime}.
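The following sketch (an added illustration, assuming numpy) shows both contractions from this remark: the same coefficient array a^j_k, stored with the superscript j as the row index, gives the action of T on vectors and the action of T' on covectors, and the duality f(Tx) = (T'f)(x) holds automatically.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2))   # a^j_k: superscript j = row, subscript k = column

x = rng.standard_normal(2)        # contravariant coordinates x^k of a vector in X_1
f = rng.standard_normal(3)        # covariant coordinates f_j of a covector on X_2

y = np.einsum('jk,k->j', A, x)    # y^j = a^j_k x^k   (action of T)
g = np.einsum('jk,j->k', A, f)    # g_k = a^j_k f_j   (action of T')

assert np.isclose(f @ y, g @ x)   # f(Tx) = (T'f)(x)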

Polylinear transformations as tensors

More generally, any polylinear transformation can be interpreted as a tensor. Namely, given a polylinear transformation FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) one can define the tensor F~L(V1,V2,,Vp,V;𝔽)\widetilde{F}\in L(V_{1},V_{2},\ldots,V_{p},V^{\prime};\mathbb{F}) by

(8.5.9) F~(𝐯1,𝐯2,,𝐯p,𝐯)=F(𝐯1,𝐯2,,𝐯p),𝐯,𝐯kVk,𝐯V.\widetilde{F}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p},\mathbf{v}^{% \prime})=\langle F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}),% \mathbf{v}^{\prime}\rangle,\qquad\mathbf{v}_{k}\in V_{k},\mathbf{v}^{\prime}% \in V^{\prime}.

Conversely,

Proposition 8.5.6.

Given a tensor F~L(V1,V2,,Vp,V;𝔽)\widetilde{F}\in L(V_{1},V_{2},\ldots,V_{p},V^{\prime};\mathbb{F}) there exists a unique polylinear transformation FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V) such that (8.5.9) is satisfied.

Proof.

By Proposition 8.5.4 the tensor F~\widetilde{F} can be extended to a linear transformation (functional) T~:V1V2VpV𝔽\widetilde{T}:V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p}\otimes V^{\prime}% \to\mathbb{F} such that

F~(𝐯1,𝐯2,,𝐯p,𝐯)=T~(𝐯1𝐯2𝐯p𝐯)\widetilde{F}(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p},\mathbf{v}^{% \prime})=\widetilde{T}(\mathbf{v}_{1}\otimes\mathbf{v}_{2}\otimes\ldots\otimes% \mathbf{v}_{p}\otimes\mathbf{v}^{\prime})

for all 𝐯kVk\mathbf{v}_{k}\in V_{k}, 𝐯V\mathbf{v}^{\prime}\in V^{\prime}.

If 𝐰W:=V1V2Vp\mathbf{w}\in W:=V_{1}\otimes V_{2}\otimes\ldots\otimes V_{p} and 𝐯V\mathbf{v}^{\prime}\in V^{\prime}, then

𝐰𝐯V1V2VpV.\mathbf{w}\otimes\mathbf{v}^{\prime}\in V_{1}\otimes V_{2}\otimes\ldots\otimes V% _{p}\otimes V^{\prime}.

So, we can define a bilinear functional (tensor) GL(W,V;𝔽)G\in L(W,V^{\prime};\mathbb{F}) by

G(\mathbf{w},\mathbf{v}') := \widetilde{T}(\mathbf{w}\otimes\mathbf{v}').

By Proposition 8.5.5, GG gives rise to a linear transformation, i.e. there exists a unique linear transformation T:WVT:W\to V such that

G(𝐰,𝐯)=T𝐰,𝐯𝐰W,𝐯V.G(\mathbf{w},\mathbf{v}^{\prime})=\langle T\mathbf{w},\mathbf{v}^{\prime}% \rangle\qquad\forall\mathbf{w}\in W,\quad\forall\mathbf{v}^{\prime}\in V^{% \prime}.

And the linear transformation TT gives us the polylinear map

FL(V1,V2,,Vp;V)F\in L(V_{1},V_{2},\ldots,V_{p};V)

by

F(𝐯1,𝐯2,,𝐯p)=T(𝐯1𝐯2𝐯p),F(\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p})=T(\mathbf{v}_{1}\otimes% \mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p}),

see Remark after Proposition 8.5.4.

The uniqueness of the transformation F is, as in Proposition 8.5.5, a trivial corollary of Lemma 8.1.3. We leave the details as an exercise for the reader. ∎

This section shows that

tensors are universal objects in polylinear algebra, since any polylinear transformation can be interpreted as a tensor and vice versa.

Exercises.

8.5.1.

Show that the tensor product 𝐯1𝐯2𝐯p\mathbf{v}_{1}\otimes\mathbf{v}_{2}\otimes\ldots\otimes\mathbf{v}_{p} of vectors is linear in each argument 𝐯k\mathbf{v}_{k}.

8.5.2.

Show that the set \{\mathbf{v}_1\otimes\mathbf{v}_2\otimes\ldots\otimes\mathbf{v}_p : \mathbf{v}_k\in V_k\} of tensor products of vectors is a proper subset of V_1\otimes V_2\otimes\ldots\otimes V_p.

8.5.3.

Prove that the transformation FF from Proposition 8.5.6 is unique.

8.6. Change of coordinates formula for tensors.

The main reason for distinguishing covariant and contravariant variables is that under a change of bases their coordinates change according to different rules; consequently, the entries of covariant and contravariant tensors change according to different rules as well.

In this section we are going to investigate this in detail. Note that coordinate representations are extremely important, since, for example, all numerical computations (unlike theoretical investigations) are performed using some coordinate system.

8.6.1. Coordinate representation of a tensor.

Let F be an r-covariant s-contravariant tensor, r+s=p. Let \mathbf{x}_1,\ldots,\mathbf{x}_r be the covariant variables (\mathbf{x}_k\in X_k), and \mathbf{f}_1,\ldots,\mathbf{f}_s be the contravariant ones (\mathbf{f}_k\in X_k'). Let us write the covariant variables first, so the tensor will be written as F(\mathbf{x}_1,\ldots,\mathbf{x}_r,\mathbf{f}_1,\ldots,\mathbf{f}_s). For k=1,2,\ldots,p fix a basis \mathcal{B}_k=\{\mathbf{b}^{(k)}_j\}_{j=1}^{\dim X_k} in X_k, and let \mathcal{B}_k'=\{\widetilde{\mathbf{b}}^{(k)}_j\}_{j=1}^{\dim X_k} be the dual basis in X_k'.

For a vector 𝐱kXk\mathbf{x}_{k}\in X_{k} let x(k)jx_{(k)}^{j}, j=1,2,,dimXkj=1,2,\ldots,\dim X_{k} be its coordinates in the basis k\mathcal{B}_{k}, and similarly, if 𝐟kXk\mathbf{f}_{k}\in X_{k}^{\prime} let fj(k)f^{(k)}_{j}, j=1,2,,dimXkj=1,2,\ldots,\dim X_{k} be its coordinates in the dual basis k\mathcal{B}^{\prime}_{k} (note that in agreement with the Einstein notation, the coordinates of the vector are indexed by a superscript, and the coordinates of a covector are indexed by a subscript).

Proposition 8.6.1.

Denote

(8.6.1) φj1,,jrk1,,ks:=F(𝐛j1(1),,𝐛jr(r),𝐛~k1(r+1),,𝐛~ks(r+s)).\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}:=F(\mathbf{b}^{(1)}_{j_{1}},% \ldots,\mathbf{b}^{(r)}_{j_{r}},\widetilde{\mathbf{b}}^{(r+1)}_{k_{1}},\ldots,% \widetilde{\mathbf{b}}^{(r+s)}_{k_{s}}).

Then, in the Einstein notation

(8.6.2) F(𝐱1,,𝐱r,𝐟1,,𝐟s)=φj1,,jrk1,,ksx(1)j1x(r)jrfk1(1)fks(s)F(\mathbf{x}_{1},\ldots,\mathbf{x}_{r},\mathbf{f}_{1},\ldots,\mathbf{f}_{s})=% \varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}x_{(1)}^{j_{1}}\ldots x_{(r)}% ^{j_{r}}f^{(1)}_{k_{1}}\ldots f^{(s)}_{k_{s}}

(the summation here is over the indices j1,,jrj_{1},\ldots,j_{r} and k1,,ksk_{1},\ldots,k_{s}).

Note that we use the notation (1),\ldots,(r) and (1),\ldots,(s) to emphasize that these are not indices: the numbers in parentheses just show the order of the arguments. Thus, the right side of (8.6.2) does not have any free indices left (all indices were used in the summation), so it is just a number (for fixed \mathbf{x}_k's and \mathbf{f}_k's).

Proof of Proposition 8.6.1.

To show that (8.6.1) implies (8.6.2) we first notice that (8.6.1) means that (8.6.2) holds when 𝐱j\mathbf{x}_{j}s and 𝐟k\mathbf{f}_{k}s are the elements of the corresponding bases. Decomposing each argument 𝐱j\mathbf{x}_{j} and 𝐟k\mathbf{f}_{k} in the corresponding basis and using linearity in each argument we can easily get (8.6.2). The computation is rather simple, but because there are a lot of indices, the formulas could be quite big and could look quite frightening.

To avoid writing too many huge formulas, we leave this computation to the reader as an exercise.

We do not want the reader to feel cheated, so we present a different, more “high brow” (abstract) explanation, which does not require any computations! Namely, let us notice that the expressions on the left and right sides of (8.6.2) define tensors. By Proposition 8.5.4 they can be lifted to linear functionals on the tensor product X_1\otimes\ldots\otimes X_r\otimes X_{r+1}'\otimes\ldots\otimes X_{r+s}'.

Rephrasing what we discussed in the beginning of the proof, we can say that (8.6.1) means that these functionals coincide on all vectors

𝐛j1(1)𝐛jr(r)𝐛~k1(r+1)𝐛~ks(r+s)\mathbf{b}^{(1)}_{j_{1}}\otimes\ldots\otimes\mathbf{b}^{(r)}_{j_{r}}\otimes% \widetilde{\mathbf{b}}^{(r+1)}_{k_{1}}\otimes\ldots\otimes\widetilde{\mathbf{b% }}^{(r+s)}_{k_{s}}

of a basis in the tensor product, so the functionals (and therefore the tensors) are equal. ∎

The entries φj1,,jrk1,,ks\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}} are called the entries of the tensor FF in the bases k\mathcal{B}_{k}, k=1,2,,pk=1,2,\ldots,p.
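Formula (8.6.2) is again a single einsum call; here is a sketch for r = 2, s = 1 (an added illustration with randomly chosen entries, assuming numpy):

import numpy as np

rng = np.random.default_rng(4)
phi = rng.standard_normal((2, 3, 4))   # entries phi_{j1,j2}^{k1} of a (2,1) tensor

x1 = rng.standard_normal(2)            # a vector (contravariant coordinates)
x2 = rng.standard_normal(3)            # a vector
f1 = rng.standard_normal(4)            # a covector (covariant coordinates)

# F(x1, x2, f1) = phi_{j,l}^{k} x1^j x2^l f1_k: all indices are summed over,
# so the result is a single number
value = np.einsum('jlk,j,l,k->', phi, x1, x2, f1)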

Now let, for k=1,2,\ldots,p, \mathcal{A}_k be a basis in X_k (and \mathcal{A}_k' be the dual basis in X_k'). We want to investigate how the entries of the tensor F change when we change the bases from \mathcal{B}_k to \mathcal{A}_k.

8.6.2. Change of coordinate formulas in Einstein notation

Let us first consider the familiar cases of vectors and linear functionals, treated above in Section 8.1.1, but write everything down using the Einstein notation. Suppose we have two bases \mathcal{B} and \mathcal{A} in X, and let

A=[I]_{\mathcal{A},\mathcal{B}}

be the change of coordinates matrix from \mathcal{B} to \mathcal{A}. For a vector \mathbf{x}\in X let x^k be its coordinates in the basis \mathcal{B} and \widetilde{x}^k its coordinates in the basis \mathcal{A}. Similarly, for \mathbf{f}\in X' let f_k denote its coordinates in the basis \mathcal{B}' and \widetilde{f}_k its coordinates in the basis \mathcal{A}' (\mathcal{B}' and \mathcal{A}' being the dual bases of \mathcal{B} and \mathcal{A} respectively).

Denote by (A)kj(A)^{j}_{k} the entries of the matrix AA: to be consistent with the Einstein notation the superscript jj denotes the number of the row. Then we can write the change of coordinate formula as

(8.6.3) x~j=(A)kjxk.\widetilde{x}^{j}=(A)^{j}_{k}x^{k}.

Similarly, let (A1)jk(A^{-1})_{j}^{k} be the entries of A1A^{-1}: again superscript is used to denote the number of the row. Then we can write the change of coordinate formula for the dual space as

(8.6.4) f~j=(A1)jkfk;\widetilde{f}_{j}=(A^{-1})_{j}^{k}f_{k};

the summation here is over the index kk (i.e. along the columns of A1A^{-1}), so the change of coordinate matrix in this case is indeed (A1)T(A^{-1})^{T}.

Let us emphasize that we did not prove anything here: we only rewrote formula (8.1.1) from Section 8.1.1 using the Einstein notation.
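A quick numerical sketch of (8.6.3) and (8.6.4) (an added illustration, assuming numpy; the matrix A below is a random, presumably invertible, change of coordinates matrix): vectors transform with A, covectors with (A^{-1})^T, and the pairing f(x) is independent of the coordinate system.

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # change of coordinates matrix
A_inv = np.linalg.inv(A)

x = rng.standard_normal(3)   # coordinates x^k in the basis B
f = rng.standard_normal(3)   # coordinates f_k in the dual basis B'

x_new = np.einsum('jk,k->j', A, x)       # (8.6.3): x~^j = (A)^j_k x^k
f_new = np.einsum('kj,k->j', A_inv, f)   # (8.6.4): f~_j = (A^{-1})^k_j f_k

assert np.isclose(f @ x, f_new @ x_new)  # the value f(x) does not change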

Remark.

While it is not needed in what follows, let us play a bit more with the Einstein notation. Namely, the equations

A1A=IandAA1=IA^{-1}A=I\qquad\text{and}\qquad AA^{-1}=I

can be rewritten in the Einstein notation as

(A)kj(A1)lk=δj,land(A1)jk(A)lj=δk,l(A)^{j}_{k}(A^{-1})^{k}_{l}=\delta_{j,l}\qquad\text{and}\qquad(A^{-1})^{k}_{j}% (A)^{j}_{l}=\delta_{k,l}

respectively.

8.6.3. Change of coordinates formula for tensors

Now we are ready to give the change of coordinate formula for general tensors.

For k=1,2,\ldots,p:=r+s let A_k:=[I]_{\mathcal{A}_k,\mathcal{B}_k} be the change of coordinates matrices, and let A_k^{-1} be their inverses.

As in Section 8.6.2 we denote by (A)^j_k the entries of a matrix A, with the agreement that the superscript gives the number of the row.

Proposition 8.6.2.

Given an rr-covariant ss-contravariant tensor FF let

φj1,,jrk1,,ksandφ~j1,,jrk1,,ks\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}\qquad\text{and}\qquad% \widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}

be its entries in the bases k\mathcal{B}_{k} (the old ones) and 𝒜k\mathcal{A}_{k} (the new ones) respectively. In the above notation

φ~j1,,jrk1,,ks=φj1,,jrk1,,ks(A11)j1j1(Ar1)jrjr(Ar+1)k1k1(Ar+s)ksks\widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s}}=\varphi_{j_{1}^{% \prime},\ldots,j_{r}^{\prime}}^{k_{1}^{\prime},\ldots,k_{s}^{\prime}}(A_{1}^{-% 1})^{j_{1}^{\prime}}_{j_{1}}\ldots(A_{r}^{-1})^{j_{r}^{\prime}}_{j_{r}}(A_{r+1% })^{k_{1}}_{k_{1}^{\prime}}\ldots(A_{r+s})^{k_{s}}_{k_{s}^{\prime}}

(the summation here is over the indices j_1',\ldots,j_r' and k_1',\ldots,k_s').

Because of the many indices, the formula in this proposition looks very complicated. However, if one understands the main idea, the formula turns out to be quite simple and easy to memorize.

To explain the main idea let us, slightly abusing the language, express this formula “in plain English”. Namely, we can say that

to express the “new” tensor entries \widetilde{\varphi}_{j_1,\ldots,j_r}^{k_1,\ldots,k_s} in terms of the “old” ones \varphi_{j_1,\ldots,j_r}^{k_1,\ldots,k_s}, one needs, for each covariant index (subscript), to apply the covariant rule (8.6.4), and for each contravariant index (superscript) to apply the contravariant rule (8.6.3).
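Before the proof, here is the rule in action for a 1-covariant 1-contravariant tensor on a single space X (an added numerical sketch, assuming numpy; A is a random, presumably invertible, change of coordinates matrix): one covariant index gets the rule (8.6.4), one contravariant index gets the rule (8.6.3).

import numpy as np

rng = np.random.default_rng(6)
n = 3
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # change of coordinates matrix
A_inv = np.linalg.inv(A)

phi = rng.standard_normal((n, n))   # old entries phi[j, k] = phi_j^k

# phi~_j^k = phi_{j'}^{k'} (A^{-1})^{j'}_j (A)^k_{k'}; superscript = row index
phi_new = np.einsum('JK,Jj,kK->jk', phi, A_inv, A)

# in matrix form this is A^{-T} phi A^T; transposed, the array B = phi^T
# (the matrix of the corresponding operator, cf. Section 8.5.3) transforms
# by the familiar similarity rule A B A^{-1}
assert np.allclose(phi_new, A_inv.T @ phi @ A.T)
assert np.allclose(phi_new.T, A @ phi.T @ A_inv)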

Proof of Proposition 8.6.2.

Informally, the idea of the proof is very simple: we just change the bases one at a time, applying each time the change of coordinate formulas (8.6.3) or (8.6.4), depending on whether the tensor is covariant or contravariant in the corresponding variable.

To write a rigorous formal proof we use induction on r and s (the numbers of covariant and contravariant arguments of the tensor). The proposition is true for r=1, s=0 and for r=0, s=1, see (8.6.4) and (8.6.3) respectively.

Assuming now that the proposition is proved for some rr and ss, let us prove it for r+1r+1, ss and for rr, s+1s+1.

Let us do the latter case; the other one is done similarly. The main idea is that we first change the first p=r+s bases and use the induction hypothesis; then we change the last one and use (8.6.3).

Namely, let φ^j1,,jrk1,,ks+1\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s+1}} be the entries of an (r,s+1)(r,s+1) tensor FF in the bases 𝒜1,,𝒜p,p+1\mathcal{A}_{1},\ldots,\mathcal{A}_{p},\mathcal{B}_{p+1}, p=r+sp=r+s.

Let us fix the index k_{s+1} and consider the r-covariant s-contravariant tensor F(\mathbf{x}_1,\ldots,\mathbf{x}_r,\mathbf{f}_1,\ldots,\mathbf{f}_s,\widetilde{\mathbf{b}}^{(r+s+1)}_{k_{s+1}}), where \mathbf{x}_1,\ldots,\mathbf{x}_r,\mathbf{f}_1,\ldots,\mathbf{f}_s are the variables. Clearly

φj1,,jrk1,,ks,ks+1andφ^j1,,jrk1,,ks,ks+1\varphi_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}\qquad\text{and}% \qquad\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}

are its entries in the bases \mathcal{B}_1,\ldots,\mathcal{B}_p and \mathcal{A}_1,\ldots,\mathcal{A}_p respectively (can you see why?). Recall that the index k_{s+1} here is fixed.

By the induction hypothesis

(8.6.5) φ^j1,,jrk1,,ks,ks+1=φj1,,jrk1,,ks,ks+1(A11)j1j1(Ar1)jrjr(Ar+1)k1k1(Ar+s)ksks.\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}=\varphi_{j% _{1}^{\prime},\ldots,j_{r}^{\prime}}^{k_{1}^{\prime},\ldots,k_{s}^{\prime},k_{% s+1}}(A_{1}^{-1})^{j_{1}^{\prime}}_{j_{1}}\ldots(A_{r}^{-1})^{j_{r}^{\prime}}_% {j_{r}}(A_{r+1})^{k_{1}}_{k_{1}^{\prime}}\ldots(A_{r+s})^{k_{s}}_{k_{s}^{% \prime}}.

Note, that we did not assume anything about the index ks+1k_{s+1}, so (8.6.5) holds for all ks+1k_{s+1}.

Now let us fix the indices j_1,\ldots,j_r,k_1,\ldots,k_s and consider the 1-contravariant tensor

F(𝐚j1(1),,𝐚jr(r),𝐚~k1(r+1),,𝐚~ks(r+s),𝐟s+1)F(\mathbf{a}^{(1)}_{j_{1}},\ldots,\mathbf{a}^{(r)}_{j_{r}},\widetilde{\mathbf{% a}}^{(r+1)}_{k_{1}},\ldots,\widetilde{\mathbf{a}}^{(r+s)}_{k_{s}},\mathbf{f}_{% s+1})

of the variable 𝐟s+1\mathbf{f}_{s+1}. Here 𝐚j(k)\mathbf{a}^{(k)}_{j} are the vectors in the basis 𝒜k\mathcal{A}_{k} and 𝐚~j(k)\widetilde{\mathbf{a}}^{(k)}_{j} are the vectors in the dual basis 𝒜k\mathcal{A}_{k}^{\prime}.

It is again easy to see that

φ^j1,,jrk1,,ks,ks+1andφ~j1,,jrk1,,ks,ks+1,\widehat{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}\qquad\text% {and}\qquad\widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1% }},

k_{s+1}=1,2,\ldots,\dim X_{p+1}, are the entries of this functional in the bases \mathcal{B}_{p+1} and \mathcal{A}_{p+1} respectively. According to (8.6.3),

φ~j1,,jrk1,,ks,ks+1=φ^j1,,jrk1,,ks,ks+1(Ap+1)ks+1ks+1,\widetilde{\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k_{s+1}}=\widehat% {\varphi}_{j_{1},\ldots,j_{r}}^{k_{1},\ldots,k_{s},k^{\prime}_{s+1}}(A_{p+1})^% {k_{s+1}}_{k^{\prime}_{s+1}},

and since we did not assume anything about the indices j1,,jr,k1,,ksj_{1},\ldots,j_{r},k_{1},\ldots,k_{s}, the above identity holds for all their combinations. Combining this with (8.6.5) we get that the proposition holds for tensors of valency (r,s+1)(r,s+1).

The case of valency (r+1,s)(r+1,s) is treated absolutely the same way: the only difference is that in the end we get a 11-covariant tensor and use (8.6.4) instead of (8.6.3). ∎