Chapter 1 Basic Notions

1.1. Vector spaces

A vector space V is a (non-empty) collection of objects, called vectors (denoted in this book by lowercase bold letters, like \mathbf{v}), together with two operations, addition of vectors and multiplication by a number (scalar)\footnote{We need some visual distinction between vectors and other objects, so in this book we use bold lowercase letters for vectors and regular lowercase letters for numbers (scalars). In some (more advanced) books Latin letters are reserved for vectors, while Greek letters are used for scalars; in even more advanced texts any letter can be used for anything and the reader must understand from the context what each symbol means. I think it is helpful, especially for a beginner, to have some visual distinction between different objects, so a bold lowercase letter will always denote a vector. On a blackboard an arrow (as in \vec{v}) is used to identify a vector.}, such that the results of both operations are always in V, and the following 8 properties (the so-called axioms of a vector space) hold:

The first 4 properties deal with the addition:

  1. Commutativity: \mathbf{v}+\mathbf{w}=\mathbf{w}+\mathbf{v} for all \mathbf{v},\mathbf{w}\in V;

  2. Associativity: (\mathbf{u}+\mathbf{v})+\mathbf{w}=\mathbf{u}+(\mathbf{v}+\mathbf{w}) for all \mathbf{u},\mathbf{v},\mathbf{w}\in V;

  3. Zero vector: there exists a special vector, denoted by \mathbf{0}, such that \mathbf{v}+\mathbf{0}=\mathbf{v} for all \mathbf{v}\in V;

  4. Additive inverse: for every vector \mathbf{v}\in V there exists a vector \mathbf{w}\in V such that \mathbf{v}+\mathbf{w}=\mathbf{0}. Such an additive inverse is usually denoted by -\mathbf{v};

    The next two properties concern multiplication:

  5. Multiplicative identity: 1\mathbf{v}=\mathbf{v} for all \mathbf{v}\in V;

  6. Multiplicative associativity: (\alpha\beta)\mathbf{v}=\alpha(\beta\mathbf{v}) for all \mathbf{v}\in V and all scalars \alpha, \beta;

    And finally, two distributive properties, which connect multiplication and addition:

  7. \alpha(\mathbf{u}+\mathbf{v})=\alpha\mathbf{u}+\alpha\mathbf{v} for all \mathbf{u},\mathbf{v}\in V and all scalars \alpha;

  8. (\alpha+\beta)\mathbf{v}=\alpha\mathbf{v}+\beta\mathbf{v} for all \mathbf{v}\in V and all scalars \alpha, \beta.

Remark.

The above properties may seem hard to memorize, but there is no need to: they are simply the familiar rules of algebraic manipulation with numbers that you know from high school. The only new twist here is that you have to understand what operations you can apply to what objects. You can add vectors, and you can multiply a vector by a number (scalar). Of course, you can still do with numbers all the manipulations you have learned before. But you cannot multiply two vectors, or add a number to a vector.

Remark.

It is easy to prove that the zero vector \mathbf{0} is unique, and that given \mathbf{v}\in V its additive inverse -\mathbf{v} is also unique.

It is also not hard to show, using properties 5, 6 and 8, that \mathbf{0}=0\mathbf{v} for any \mathbf{v}\in V, and that -\mathbf{v}=(-1)\mathbf{v}. Note that to prove these facts one still needs to use other properties of a vector space, in particular properties 3 and 4.

If the scalars are the usual real numbers, we call the space V a real vector space. If the scalars are the complex numbers, i.e. if we can multiply vectors by complex numbers, we call the space V a complex vector space.

Note that any complex vector space is a real vector space as well (if we can multiply by complex numbers, we can certainly multiply by real numbers), but not the other way around.

It is also possible to consider the situation where the scalars are elements of an arbitrary field\footnote{If you do not know what a field is, do not worry, since in this book we consider only the case of real and complex spaces.} \mathbb{F}. In this case we say that V is a vector space over the field \mathbb{F}. Although many of the constructions in the book (in particular, everything in Chapters 1–3) work for general fields, in this text we consider only real and complex vector spaces.

If we do not specify the set of scalars, or use the letter \mathbb{F} for it, then the results are true for both real and complex spaces. If we need to distinguish the real and complex cases, we will explicitly say which case we are considering.

Note that in the definition of a vector space over an arbitrary field we require the set of scalars to be a field, so that we can always divide (without remainder) by a non-zero scalar. Thus, it is possible to consider a vector space over the rationals, but not over the integers.

1.1.1. Examples.

Example.

The space \mathbb{R}^{n} consists of all columns of size n,

\mathbf{v}=\begin{pmatrix}v_{1}\\ v_{2}\\ \vdots\\ v_{n}\end{pmatrix}

whose entries are real numbers. Addition and multiplication are defined entrywise, i.e.

\alpha\begin{pmatrix}v_{1}\\ v_{2}\\ \vdots\\ v_{n}\end{pmatrix}=\begin{pmatrix}\alpha v_{1}\\ \alpha v_{2}\\ \vdots\\ \alpha v_{n}\end{pmatrix},\qquad \begin{pmatrix}v_{1}\\ v_{2}\\ \vdots\\ v_{n}\end{pmatrix}+\begin{pmatrix}w_{1}\\ w_{2}\\ \vdots\\ w_{n}\end{pmatrix}=\begin{pmatrix}v_{1}+w_{1}\\ v_{2}+w_{2}\\ \vdots\\ v_{n}+w_{n}\end{pmatrix}
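For readers who like to experiment, here is a tiny numerical illustration (a Python/NumPy sketch of mine, not part of the text) of the entrywise operations in \mathbb{R}^{n}:

```python
import numpy as np

# Entrywise operations on vectors in R^3.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
alpha = 2.0

print(alpha * v)   # [2. 4. 6.]  -- every entry multiplied by alpha
print(v + w)       # [5. 7. 9.]  -- entries added coordinate-wise
```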
Example.

The space \mathbb{C}^{n} also consists of columns of size n, only now the entries are complex numbers. Addition and multiplication are defined exactly as in the case of \mathbb{R}^{n}; the only difference is that we can now multiply vectors by complex numbers, i.e. \mathbb{C}^{n} is a complex vector space.

Many results in this text are true for both \mathbb{R}^{n} and \mathbb{C}^{n}. In such cases we will use the notation \mathbb{F}^{n}.

Example.

The space M_{m\times n} (also denoted M_{m,n}) of m\times n matrices: the addition and multiplication by scalars are defined entrywise. If we allow only real entries (and so multiplication only by reals), then we have a real vector space; if we allow complex entries and multiplication by complex numbers, we have a complex vector space.

Formally, we have to distinguish between the real and complex cases, i.e. to write something like M_{m,n}^{\mathbb{R}} or M_{m,n}^{\mathbb{C}}. However, in most situations there is no difference between the real and complex cases, and there is no need to specify which one we are considering. If there is a difference, we say explicitly which case we are considering.

Remark.

As we mentioned above, the axioms of a vector space are just the familiar rules of algebraic manipulation with (real or complex) numbers, so if we take scalars (numbers) as our vectors, all the axioms are satisfied. Thus, the set \mathbb{R} of real numbers is a real vector space, and the set \mathbb{C} of complex numbers is a complex vector space.

More importantly, since in the above examples all vector operations (addition and multiplication by a scalar) are performed entrywise, for these examples the axioms of a vector space are automatically satisfied because they are satisfied for scalars (can you see why?). So, we do not have to check the axioms, we get the fact that the above examples are indeed vector spaces for free!

The same can be applied to the next example, the coefficients of the polynomials play the role of entries there.

Example.

The space \mathbb{P}_{n} of polynomials of degree at most n consists of all polynomials p of the form

p(t)=a_{0}+a_{1}t+a_{2}t^{2}+\ldots+a_{n}t^{n},

where t is the independent variable. Note that some, or even all, of the coefficients a_{k} can be 0.

In the case of real coefficients a_{k} we have a real vector space; complex coefficients give us a complex vector space. Again, we will specify whether we are treating the real or the complex case only when it is essential; otherwise everything applies to both cases.

Question: What are zero vectors in each of the above examples?

1.1.2. Matrix notation

An m\times n matrix is a rectangular array with m rows and n columns. Elements of the array are called entries of the matrix.

It is often convenient to denote matrix entries by indexed letters: the first index denotes the number of the row where the entry is, and the second one is the number of the column. For example

(1.1.1) A=(a_{j,k})_{j=1,\,k=1}^{m,\,n}=\begin{pmatrix}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&&\vdots\\ a_{m,1}&a_{m,2}&\ldots&a_{m,n}\end{pmatrix}

is a general way to write an m\times n matrix.

Very often for a matrix A the entry in row number j and column number k is denoted by A_{j,k} or (A)_{j,k}, and sometimes, as in example (1.1.1) above, the same letter in lowercase is used for the matrix entries.

Given a matrix A, its transpose (or transposed matrix) A^{T} is defined by transforming the rows of A into the columns. For example

\begin{pmatrix}1&2&3\\ 4&5&6\end{pmatrix}^{T}=\begin{pmatrix}1&4\\ 2&5\\ 3&6\end{pmatrix}.

So, the columns of A^{T} are the rows of A and vice versa, the rows of A^{T} are the columns of A.

The formal definition is as follows: (A^{T})_{j,k}=(A)_{k,j}, meaning that the entry of A^{T} in row number j and column number k equals the entry of A in row number k and column number j.

The transpose of a matrix has a very nice interpretation in terms of linear transformations, namely it gives the so-called adjoint transformation. We will study this in detail later, but for now transposition will be just a useful formal operation.

One of the first uses of the transpose is that we can write a column vector \mathbf{x}\in\mathbb{F}^{n} (recall that \mathbb{F} is \mathbb{R} or \mathbb{C}) as \mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}. If we put the column vertically, it will use significantly more space.
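A short numerical check of the formal definition of the transpose (again a NumPy sketch of mine, not part of the text):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # a 2 x 3 matrix
print(A.T)                         # its 3 x 2 transpose

# Check (A^T)_{j,k} = A_{k,j} entry by entry:
assert all(A.T[j, k] == A[k, j] for j in range(3) for k in range(2))

# A column vector written as a transposed row:
x = np.array([[1, 2, 3]]).T        # a 3 x 1 column
```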

Exercises.

1.1.1.

Let \mathbf{x}=(1,2,3)^{T}, \mathbf{y}=(y_{1},y_{2},y_{3})^{T}, \mathbf{z}=(4,2,1)^{T}. Compute 2\mathbf{x}, 3\mathbf{y}, \mathbf{x}+2\mathbf{y}-3\mathbf{z}.

1.1.2.

Which of the following sets (with natural addition and multiplication by a scalar) are vector spaces? Justify your answer.

  a) The set of all continuous functions on the interval [0,1];

  b) The set of all non-negative functions on the interval [0,1];

  c) The set of all polynomials of degree exactly n;

  d) The set of all symmetric n\times n matrices, i.e. the set of matrices A=\{a_{j,k}\}_{j,k=1}^{n} such that A^{T}=A.

1.1.3.

True or false:

  a) Every vector space contains a zero vector;

  b) A vector space can have more than one zero vector;

  c) An m\times n matrix has m rows and n columns;

  d) If f and g are polynomials of degree n, then f+g is also a polynomial of degree n;

  e) If f and g are polynomials of degree at most n, then f+g is also a polynomial of degree at most n.

1.1.4.

Prove that the zero vector \mathbf{0} of a vector space V is unique.

1.1.5.

What matrix is the zero vector of the space M_{2\times 3}?

1.1.6.

Prove that the additive inverse, defined in Axiom 4 of a vector space, is unique.

1.1.7.

Prove that 0\mathbf{v}=\mathbf{0} for any vector \mathbf{v}\in V.

1.1.8.

Prove that for any vector \mathbf{v} its additive inverse -\mathbf{v} is given by (-1)\mathbf{v}.

1.2. Linear combinations, bases.

Let V be a vector space, and let \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}\in V be a collection of vectors. A linear combination of the vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} is a sum of the form

\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{p}\mathbf{v}_{p}=\sum_{k=1}^{p}\alpha_{k}\mathbf{v}_{k}.

Definition 1.2.1.

A system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}\in V is called a basis (for the vector space V) if any vector \mathbf{v}\in V admits a unique representation as a linear combination

\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{n}\mathbf{v}_{n}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k}.

The coefficients \alpha_{1},\alpha_{2},\ldots,\alpha_{n} are called the coordinates of the vector \mathbf{v} (in the basis, or with respect to the basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}).

Another way to say that \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a basis is to say that for any possible choice of the right-hand side \mathbf{v}, the equation x_{1}\mathbf{v}_{1}+x_{2}\mathbf{v}_{2}+\ldots+x_{n}\mathbf{v}_{n}=\mathbf{v} (with unknowns x_{k}) has a unique solution.

Before discussing any properties of bases\footnote{The plural of "basis" is "bases", the same as the plural of "base".}, let us give a few examples, showing that such objects exist, and that it makes sense to study them.

Example 1.2.2.

In the first example the space V is \mathbb{F}^{n}, where \mathbb{F} is either \mathbb{R} or \mathbb{C}. Consider vectors

\mathbf{e}_{1}=\begin{pmatrix}1\\ 0\\ 0\\ \vdots\\ 0\end{pmatrix},\quad \mathbf{e}_{2}=\begin{pmatrix}0\\ 1\\ 0\\ \vdots\\ 0\end{pmatrix},\quad \mathbf{e}_{3}=\begin{pmatrix}0\\ 0\\ 1\\ \vdots\\ 0\end{pmatrix},\quad\ldots,\quad \mathbf{e}_{n}=\begin{pmatrix}0\\ 0\\ 0\\ \vdots\\ 1\end{pmatrix}

(the vector \mathbf{e}_{k} has all entries 0 except the entry number k, which is 1). The system of vectors \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} is a basis in \mathbb{F}^{n}. Indeed, any vector

\mathbf{v}=\begin{pmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{n}\end{pmatrix}\in\mathbb{F}^{n}

can be represented as the linear combination

\mathbf{v}=x_{1}\mathbf{e}_{1}+x_{2}\mathbf{e}_{2}+\ldots+x_{n}\mathbf{e}_{n}=\sum_{k=1}^{n}x_{k}\mathbf{e}_{k}

and this representation is unique. The system \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}\in\mathbb{F}^{n} is called the standard basis in \mathbb{F}^{n}.

Example 1.2.3.

In this example the space is the space \mathbb{P}_{n} of polynomials of degree at most n. Consider vectors (polynomials) \mathbf{e}_{0},\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}\in\mathbb{P}_{n} defined by

\mathbf{e}_{0}:=1,\quad\mathbf{e}_{1}:=t,\quad\mathbf{e}_{2}:=t^{2},\quad\mathbf{e}_{3}:=t^{3},\quad\ldots,\quad\mathbf{e}_{n}:=t^{n}.

Clearly, any polynomial p, p(t)=a_{0}+a_{1}t+a_{2}t^{2}+\ldots+a_{n}t^{n}, admits a unique representation

p=a_{0}\mathbf{e}_{0}+a_{1}\mathbf{e}_{1}+\ldots+a_{n}\mathbf{e}_{n}.

So the system \mathbf{e}_{0},\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}\in\mathbb{P}_{n} is a basis in \mathbb{P}_{n}. We will call it the standard basis in \mathbb{P}_{n}.

Remark 1.2.4.

If a vector space V has a basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}, then any vector \mathbf{v} is uniquely determined by its coefficients in the decomposition \mathbf{v}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k}. So, if we stack the coefficients \alpha_{k} in a column, we can operate with them as if they were column vectors, i.e. as with elements of \mathbb{F}^{n} (again, here \mathbb{F} is either \mathbb{R} or \mathbb{C}, but everything also works for an abstract field \mathbb{F}).

Namely, if \mathbf{v}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k} and \mathbf{w}=\sum_{k=1}^{n}\beta_{k}\mathbf{v}_{k}, then

\mathbf{v}+\mathbf{w}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k}+\sum_{k=1}^{n}\beta_{k}\mathbf{v}_{k}=\sum_{k=1}^{n}(\alpha_{k}+\beta_{k})\mathbf{v}_{k},

i.e. to get the column of coordinates of the sum one just needs to add the columns of coordinates of the summands. Similarly, to get the coordinates of \alpha\mathbf{v} we simply multiply the column of coordinates of \mathbf{v} by \alpha.

This is a very important remark that will be used throughout the book. It allows us to translate any statement about the standard column space \mathbb{F}^{n} to a vector space V with a basis \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}.
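To illustrate this remark numerically (a NumPy sketch of mine, not part of the text): polynomials in \mathbb{P}_{2} can be stored as coordinate columns with respect to the standard basis 1, t, t^{2}, and the vector-space operations then act on the columns.

```python
import numpy as np

# p(t) = a0 + a1 t + a2 t^2, stored as the coordinate column (a0, a1, a2)^T.
p = np.array([1.0, 0.0, 2.0])    # 1 + 2t^2
q = np.array([0.0, 3.0, -1.0])   # 3t - t^2

print(p + q)      # [1. 3. 1.]  ->  1 + 3t + t^2
print(2.0 * p)    # [2. 0. 4.]  ->  2 + 4t^2
```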

1.2.1. Generating and linearly independent systems

The definition of a basis says that any vector admits a unique representation as a linear combination. This statement is in fact two statements, namely that the representation exists and that it is unique. Let us analyze these two statements separately.

If we only consider the existence we get the following notion

Definition 1.2.5.

A system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}\in V is called a generating system (also a spanning system, or a complete system) in V if any vector \mathbf{v}\in V admits a representation as a linear combination

\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{p}\mathbf{v}_{p}=\sum_{k=1}^{p}\alpha_{k}\mathbf{v}_{k}.

The only difference from the definition of a basis is that we do not assume that the representation above is unique.

The words generating, spanning and complete here are synonyms. I personally prefer the term complete, because of my operator theory background. Generating and spanning are more often used in linear algebra textbooks.

Clearly, any basis is a generating (complete) system. Also, if we have a basis, say \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}, and we add to it several vectors, say \mathbf{v}_{n+1},\ldots,\mathbf{v}_{p}, then the new system will be a generating (complete) system. Indeed, we can represent any vector as a linear combination of the vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}, and just ignore the new ones (by putting the corresponding coefficients \alpha_{k}=0).

Now, let us turn our attention to the uniqueness. We do not want to worry about existence, so let us consider the zero vector 𝟎\mathbf{0}, which always admits a representation as a linear combination.

Definition.

A linear combination \alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{p}\mathbf{v}_{p} is called trivial if \alpha_{k}=0\ \forall k.

A trivial linear combination is always (for all choices of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}) equal to \mathbf{0}, and that is probably the reason for the name.

Definition.

A system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}\in V is called linearly independent if only the trivial linear combination (\sum_{k=1}^{p}\alpha_{k}\mathbf{v}_{k} with \alpha_{k}=0\ \forall k) of the vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} equals \mathbf{0}.

In other words, the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} is linearly independent iff the equation x_{1}\mathbf{v}_{1}+x_{2}\mathbf{v}_{2}+\ldots+x_{p}\mathbf{v}_{p}=\mathbf{0} (with unknowns x_{k}) has only the trivial solution x_{1}=x_{2}=\ldots=x_{p}=0.

If a system is not linearly independent, it is called linearly dependent. By negating the definition of linear independence, we get the following

Definition.

A system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} is called linearly dependent if \mathbf{0} can be represented as a nontrivial linear combination, \mathbf{0}=\sum_{k=1}^{p}\alpha_{k}\mathbf{v}_{k}. Non-trivial here means that at least one of the coefficients \alpha_{k} is non-zero. This can be (and usually is) written as \sum_{k=1}^{p}|\alpha_{k}|\neq 0.

So, restating the definition, we can say that a system is linearly dependent if and only if there exist scalars \alpha_{1},\alpha_{2},\ldots,\alpha_{p}, \sum_{k=1}^{p}|\alpha_{k}|\neq 0, such that

\sum_{k=1}^{p}\alpha_{k}\mathbf{v}_{k}=\mathbf{0}.

An alternative definition (in terms of equations) is that a system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} is linearly dependent iff the equation

x_{1}\mathbf{v}_{1}+x_{2}\mathbf{v}_{2}+\ldots+x_{p}\mathbf{v}_{p}=\mathbf{0}

(with unknowns x_{k}) has a non-trivial solution. Non-trivial, once again, means that at least one of the x_{k} is different from 0, and this can be written as \sum_{k=1}^{p}|x_{k}|\neq 0.
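For vectors in \mathbb{F}^{n} this condition can be checked numerically: the homogeneous equation has only the trivial solution exactly when the matrix whose columns are \mathbf{v}_{1},\ldots,\mathbf{v}_{p} has rank p. A small NumPy sketch (mine, not from the text):

```python
import numpy as np

# Columns of V are the vectors v_1, v_2, v_3 in R^3; here v_3 = v_1 + v_2.
V = np.column_stack([[1, 0, 1],
                     [0, 1, 1],
                     [1, 1, 2]])

p = V.shape[1]
# Linearly independent iff rank(V) = p.
print(np.linalg.matrix_rank(V) == p)   # False: the system is linearly dependent
```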

The following proposition gives an alternative description of linearly dependent systems.

Proposition 1.2.6.

A system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}\in V is linearly dependent if and only if one of the vectors \mathbf{v}_{k} can be represented as a linear combination of the other vectors,

(1.2.1) \mathbf{v}_{k}=\sum_{j=1,\,j\neq k}^{p}\beta_{j}\mathbf{v}_{j}.
Proof.

Suppose the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} is linearly dependent. Then there exist scalars \alpha_{k}, \sum_{k=1}^{p}|\alpha_{k}|\neq 0, such that

\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{p}\mathbf{v}_{p}=\mathbf{0}.

Let k be an index such that \alpha_{k}\neq 0. Then, moving all terms except \alpha_{k}\mathbf{v}_{k} to the right side we get

\alpha_{k}\mathbf{v}_{k}=-\sum_{j=1,\,j\neq k}^{p}\alpha_{j}\mathbf{v}_{j}.

Dividing both sides by \alpha_{k} we get (1.2.1) with \beta_{j}=-\alpha_{j}/\alpha_{k}.

On the other hand, if (1.2.1) holds, then \mathbf{0} can be represented as a non-trivial linear combination

\mathbf{v}_{k}-\sum_{j=1,\,j\neq k}^{p}\beta_{j}\mathbf{v}_{j}=\mathbf{0}. ∎

Obviously, any basis is a linearly independent system. Indeed, if a system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a basis, \mathbf{0} admits a unique representation

\mathbf{0}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{n}\mathbf{v}_{n}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k}.

Since the trivial linear combination always gives \mathbf{0}, the trivial linear combination must be the only one giving \mathbf{0}.

So, as we already discussed, if a system is a basis it is a complete (generating) and linearly independent system. The following proposition shows that the converse implication is also true.

Proposition 1.2.7.

A system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}\in V is a basis if and only if it is linearly independent and complete (generating).

Proof.

We already know that a basis is always linearly independent and complete, so in one direction the proposition is already proved.

Let us prove the other direction. Suppose a system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is linearly independent and complete. Take an arbitrary vector \mathbf{v}\in V. Since the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is complete (generating), \mathbf{v} can be represented as

\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{n}\mathbf{v}_{n}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k}.

We only need to show that this representation is unique.

Suppose \mathbf{v} admits another representation

\mathbf{v}=\sum_{k=1}^{n}\widetilde{\alpha}_{k}\mathbf{v}_{k}.

Then

\sum_{k=1}^{n}(\alpha_{k}-\widetilde{\alpha}_{k})\mathbf{v}_{k}=\sum_{k=1}^{n}\alpha_{k}\mathbf{v}_{k}-\sum_{k=1}^{n}\widetilde{\alpha}_{k}\mathbf{v}_{k}=\mathbf{v}-\mathbf{v}=\mathbf{0}.

Since the system is linearly independent, \alpha_{k}-\widetilde{\alpha}_{k}=0\ \forall k, and thus the representation \mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{n}\mathbf{v}_{n} is unique. ∎

Remark.

In many textbooks a basis is defined as a complete and linearly independent system (by Proposition 1.2.7 this definition is equivalent to ours). Although this definition is more common than the one presented in this text, I prefer the latter. It emphasizes the main property of a basis, namely that any vector admits a unique representation as a linear combination.

Proposition 1.2.8.

Any (finite) generating system contains a basis.

Proof.

Suppose \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p}\in V is a generating (complete) set. If it is linearly independent, it is a basis, and we are done.

Suppose it is not linearly independent, i.e. it is linearly dependent. Then there exists a vector \mathbf{v}_{k} which can be represented as a linear combination of the vectors \mathbf{v}_{j}, j\neq k.

Since \mathbf{v}_{k} can be represented as a linear combination of the vectors \mathbf{v}_{j}, j\neq k, any linear combination of the vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{p} can be represented as a linear combination of the same vectors without \mathbf{v}_{k} (i.e. the vectors \mathbf{v}_{j}, 1\leq j\leq p, j\neq k). So, if we delete the vector \mathbf{v}_{k}, the new system will still be a complete one.

If the new system is linearly independent, we are done. If not, we repeat the procedure.

Repeating this procedure finitely many times we arrive at a linearly independent and complete system, because otherwise we would delete all the vectors and end up with the empty set.

So, any finite complete (generating) set contains a complete linearly independent subset, i.e. a basis. ∎
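For column vectors this extraction of a basis from a generating set can be carried out numerically. The sketch below (mine, not part of the text, and a slight variant of the proof: it keeps a vector only when it is independent of the ones already kept, rather than deleting dependent ones) uses the rank test from the previous subsection.

```python
import numpy as np

def basis_from_generating_set(vectors):
    """Keep a vector only if it is not a linear combination of the vectors
    kept so far (checked via the rank); the kept vectors form a basis of
    the span of the input system."""
    kept = []
    for v in vectors:
        candidate = kept + [v]
        M = np.column_stack(candidate)
        if np.linalg.matrix_rank(M) == len(candidate):   # still independent
            kept.append(v)
    return kept

gens = [np.array([1., 0.]), np.array([2., 0.]), np.array([0., 1.])]
print(basis_from_generating_set(gens))   # [array([1., 0.]), array([0., 1.])]
```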

Exercises.

1.2.1.

Find a basis in the space M_{3\times 2} of 3\times 2 matrices.

1.2.2.

True or false:

  a) Any set containing a zero vector is linearly dependent;

  b) A basis must contain \mathbf{0};

  c) Subsets of linearly dependent sets are linearly dependent;

  d) Subsets of linearly independent sets are linearly independent;

  e) If \alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{n}\mathbf{v}_{n}=\mathbf{0} then all scalars \alpha_{k} are zero.

1.2.3.

Recall that a matrix is called symmetric if A^{T}=A. Write down a basis in the space of symmetric 2\times 2 matrices (there are many possible answers). How many elements are in the basis?

1.2.4.

Write down a basis for the space of

  a) 3\times 3 symmetric matrices;

  b) n\times n symmetric matrices;

  c) n\times n antisymmetric (A^{T}=-A) matrices.

1.2.5.

Let a system of vectors \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} be linearly independent but not generating. Show that it is possible to find a vector \mathbf{v}_{r+1} such that the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r},\mathbf{v}_{r+1} is linearly independent. Hint: Take for \mathbf{v}_{r+1} any vector that cannot be represented as a linear combination \sum_{k=1}^{r}\alpha_{k}\mathbf{v}_{k} and show that the system \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r},\mathbf{v}_{r+1} is linearly independent.

1.2.6.

Is it possible that the vectors \mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3} are linearly dependent, but the vectors \mathbf{w}_{1}=\mathbf{v}_{1}+\mathbf{v}_{2}, \mathbf{w}_{2}=\mathbf{v}_{2}+\mathbf{v}_{3} and \mathbf{w}_{3}=\mathbf{v}_{3}+\mathbf{v}_{1} are linearly independent?

1.3. Linear Transformations. Matrix–vector multiplication

A transformation T from a set X to a set Y is a rule that to each argument (input) x\in X assigns a value (output) y=T(x)\in Y.

The set X is called the domain of T, and the set Y is called the target space or codomain of T.

We write T:X\to Y to say that T is a transformation with the domain X and the target space Y.

The words “transformation”, “transform”, “mapping”, “map”, “operator”, “function” all denote the same object.

Definition.

Let V, W be vector spaces (over the same field \mathbb{F}). A transformation T:V\to W is called linear if

  1. T(\mathbf{u}+\mathbf{v})=T(\mathbf{u})+T(\mathbf{v})\ \forall\mathbf{u},\mathbf{v}\in V;

  2. T(\alpha\mathbf{v})=\alpha T(\mathbf{v}) for all \mathbf{v}\in V and for all scalars \alpha\in\mathbb{F}.

Properties 1 and 2 together are equivalent to the following one:

T(\alpha\mathbf{u}+\beta\mathbf{v})=\alpha T(\mathbf{u})+\beta T(\mathbf{v})

for all \mathbf{u},\mathbf{v}\in V and for all scalars \alpha,\beta.

1.3.1. Examples.

You have dealt with linear transformations before, maybe without even suspecting it, as the examples below show.

Example.

Differentiation: Let V=\mathbb{P}_{n} (the set of polynomials of degree at most n), W=\mathbb{P}_{n-1}, and let T:\mathbb{P}_{n}\to\mathbb{P}_{n-1} be the differentiation operator,

T(p):=p^{\prime}\qquad\forall p\in\mathbb{P}_{n}.

Since (f+g)^{\prime}=f^{\prime}+g^{\prime} and (\alpha f)^{\prime}=\alpha f^{\prime}, this is a linear transformation.

Figure 1.1. Rotation
Example.

Rotation: in this example V=W=\mathbb{R}^{2} (the usual coordinate plane), and the transformation T_{\gamma}:\mathbb{R}^{2}\to\mathbb{R}^{2} takes a vector in \mathbb{R}^{2} and rotates it counterclockwise by \gamma radians, see Fig. 1.1. Since T_{\gamma} rotates the plane as a whole, it rotates as a whole the parallelogram used to define the sum of two vectors (parallelogram law). Therefore property 1 of a linear transformation holds. It is also easy to see that property 2 is true as well.

Example.

Reflection: in this example again V=W=\mathbb{R}^{2}, and the transformation T:\mathbb{R}^{2}\to\mathbb{R}^{2} is the reflection in the first coordinate axis. It can also be shown geometrically that this transformation is linear, but we will use another way to show that.

Namely, it is easy to write a formula for T,

T\Bigl(\begin{pmatrix}x_{1}\\ x_{2}\end{pmatrix}\Bigr)=\begin{pmatrix}x_{1}\\ -x_{2}\end{pmatrix}

and from this formula it is easy to check that the transformation is linear.

Example.

Let us investigate linear transformations T:\mathbb{R}\to\mathbb{R}. Any such transformation is given by the formula

T(x)=ax\qquad\text{where }a=T(1).

Indeed,

T(x)=T(x\times 1)=xT(1)=xa=ax.

So, any linear transformation of \mathbb{R} is just a multiplication by a constant.

1.3.2. Linear transformations \mathbb{F}^{n}\to\mathbb{F}^{m}. Matrix–column multiplication.

It turns out that a linear transformation T:\mathbb{F}^{n}\to\mathbb{F}^{m} can also be represented as a multiplication, not by a scalar, but by a matrix.

Let us see how. Let T:\mathbb{F}^{n}\to\mathbb{F}^{m} be a linear transformation. What information do we need to compute T(\mathbf{x}) for all vectors \mathbf{x}\in\mathbb{F}^{n}? My claim is that it is sufficient to know how T acts on the standard basis \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} of \mathbb{F}^{n}. Namely, it is sufficient to know n vectors in \mathbb{F}^{m} (i.e. vectors of size m),

\mathbf{a}_{1}:=T(\mathbf{e}_{1}),\quad\mathbf{a}_{2}:=T(\mathbf{e}_{2}),\quad\ldots,\quad\mathbf{a}_{n}:=T(\mathbf{e}_{n}).

Indeed, let

\mathbf{x}=\begin{pmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{n}\end{pmatrix}.

Then \mathbf{x}=x_{1}\mathbf{e}_{1}+x_{2}\mathbf{e}_{2}+\ldots+x_{n}\mathbf{e}_{n}=\sum_{k=1}^{n}x_{k}\mathbf{e}_{k} and

T(\mathbf{x})=T\Bigl(\sum_{k=1}^{n}x_{k}\mathbf{e}_{k}\Bigr)=\sum_{k=1}^{n}T(x_{k}\mathbf{e}_{k})=\sum_{k=1}^{n}x_{k}T(\mathbf{e}_{k})=\sum_{k=1}^{n}x_{k}\mathbf{a}_{k}.

So, if we join the vectors (columns) \mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n} together in a matrix A=[\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}] (\mathbf{a}_{k} being the kth column of A, k=1,2,\ldots,n), this matrix contains all the information about T.
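As a small illustration (a NumPy sketch of mine, not part of the text; the function name matrix_of is hypothetical), the matrix of a linear map \mathbb{R}^{n}\to\mathbb{R}^{m} can be assembled column by column from the images of the standard basis vectors:

```python
import numpy as np

def matrix_of(T, n):
    """Build the matrix of a linear map T: R^n -> R^m from the columns T(e_k)."""
    columns = [T(np.eye(n)[:, k]) for k in range(n)]
    return np.column_stack(columns)

# Example: T(x1, x2) = (x1 + 2*x2, 3*x1).
T = lambda x: np.array([x[0] + 2 * x[1], 3 * x[0]])
A = matrix_of(T, 2)
print(A)                          # [[1. 2.] [3. 0.]]
x = np.array([1.0, 1.0])
print(np.allclose(A @ x, T(x)))   # True: T(x) = A x
```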

Let us show how one should define the product of a matrix and a vector (column) to represent the transformation T as a product, T(\mathbf{x})=A\mathbf{x}. Let

A=\begin{pmatrix}a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&&\vdots\\ a_{m,1}&a_{m,2}&\ldots&a_{m,n}\end{pmatrix}.

Recall that the column number k of A is the vector \mathbf{a}_{k}, i.e.

\mathbf{a}_{k}=\begin{pmatrix}a_{1,k}\\ a_{2,k}\\ \vdots\\ a_{m,k}\end{pmatrix}.

Then if we want A\mathbf{x}=T(\mathbf{x}) we get

A\mathbf{x}=\sum_{k=1}^{n}x_{k}\mathbf{a}_{k}=x_{1}\begin{pmatrix}a_{1,1}\\ a_{2,1}\\ \vdots\\ a_{m,1}\end{pmatrix}+x_{2}\begin{pmatrix}a_{1,2}\\ a_{2,2}\\ \vdots\\ a_{m,2}\end{pmatrix}+\ldots+x_{n}\begin{pmatrix}a_{1,n}\\ a_{2,n}\\ \vdots\\ a_{m,n}\end{pmatrix}.

So, the matrix–vector multiplication should be performed by the following column by coordinate rule:

multiply each column of the matrix by the corresponding coordinate of the vector.

Example.
\begin{pmatrix}1&2&3\\ 3&2&1\end{pmatrix}\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}=1\begin{pmatrix}1\\ 3\end{pmatrix}+2\begin{pmatrix}2\\ 2\end{pmatrix}+3\begin{pmatrix}3\\ 1\end{pmatrix}=\begin{pmatrix}14\\ 10\end{pmatrix}.

The “column by coordinate” rule is very well adapted for parallel computing. It will be also very important in different theoretical constructions later.

However, when doing computations manually, it is more convenient to compute the result one entry at a time. This can be expressed as the following row by column rule:

To get entry number k of the result, one needs to multiply row number k of the matrix by the vector; that is, if A\mathbf{x}=\mathbf{y}, then y_{k}=\sum_{j=1}^{n}a_{k,j}x_{j}, k=1,2,\ldots,m;

here x_{j} and y_{k} are the coordinates of the vectors \mathbf{x} and \mathbf{y} respectively, and a_{j,k} are the entries of the matrix A.

Example.
\begin{pmatrix}1&2&3\\ 4&5&6\end{pmatrix}\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}=\begin{pmatrix}1\cdot 1+2\cdot 2+3\cdot 3\\ 4\cdot 1+5\cdot 2+6\cdot 3\end{pmatrix}=\begin{pmatrix}14\\ 32\end{pmatrix}
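Both rules can be checked against each other numerically (a NumPy sketch of mine, not part of the text), using the first example above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 2, 1]])
x = np.array([1, 2, 3])

# "Column by coordinate" rule: weight each column of A by the coordinates of x.
by_columns = sum(x[k] * A[:, k] for k in range(A.shape[1]))

# "Row by column" rule: entry k of the result is (row k of A) . x
by_rows = np.array([A[k, :] @ x for k in range(A.shape[0])])

print(by_columns, by_rows, A @ x)   # all three give [14 10]
```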

1.3.3. Linear transformations and generating sets.

As we discussed above, a linear transformation T (acting from \mathbb{F}^{n} to \mathbb{F}^{m}) is completely defined by its values on the standard basis in \mathbb{F}^{n}.

The fact that we consider the standard basis is not essential; one can consider any basis, even any generating (spanning) set. Namely,

A linear transformation T:V\to W is completely defined by its values on a generating set (in particular by its values on a basis).

So, if \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a generating set (in particular, if it is a basis) in V, and T and T_{1} are linear transformations T,T_{1}:V\to W such that

T\mathbf{v}_{k}=T_{1}\mathbf{v}_{k},\qquad k=1,2,\ldots,n,

then T=T_{1}.

The proof of this statement is trivial and left as an exercise.

1.3.4. Conclusions.

  • To get the matrix of a linear transformation T:\mathbb{F}^{n}\to\mathbb{F}^{m} one needs to join the vectors \mathbf{a}_{k}=T\mathbf{e}_{k} (where \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} is the standard basis in \mathbb{F}^{n}) into a matrix: the kth column of the matrix is \mathbf{a}_{k}, k=1,2,\ldots,n.

  • If the matrix A of the linear transformation T is known, then T(\mathbf{x}) can be found by the matrix–vector multiplication, T(\mathbf{x})=A\mathbf{x}. To perform matrix–vector multiplication one can use either the "column by coordinate" or the "row by column" rule.

    The latter seems more appropriate for manual computations. The former is well adapted for parallel computers, and will be used in different theoretical constructions.

For a linear transformation T:\mathbb{F}^{n}\to\mathbb{F}^{m}, its matrix is usually denoted by [T]. However, very often people do not distinguish between a linear transformation and its matrix, and use the same symbol for both. When it does not lead to confusion, we will also use the same symbol for a transformation and its matrix.

Since a linear transformation is essentially a multiplication, the notation T\mathbf{v} is often used instead of T(\mathbf{v}). We will also use this notation. Note that the usual order of algebraic operations applies, i.e. T\mathbf{v}+\mathbf{u} means T(\mathbf{v})+\mathbf{u}, not T(\mathbf{v}+\mathbf{u}).

Remark.

In the matrix–vector multiplication A\mathbf{x} the number of columns of the matrix A must coincide with the size of the vector \mathbf{x}, i.e. a vector in \mathbb{F}^{n} can only be multiplied by an m\times n matrix.

This makes sense, since an m\times n matrix defines a linear transformation \mathbb{F}^{n}\to\mathbb{F}^{m}, so the vector \mathbf{x} must belong to \mathbb{F}^{n}.

The easiest way to remember this is that if, while performing the multiplication, you run out of some elements faster, then the multiplication is not defined. For example, if using the "row by column" rule you run out of row entries but still have some unused entries in the vector, the multiplication is not defined. It is also not defined if you run out of the vector's entries but still have unused entries in the row.

Remark.

One does not have to restrict oneself to the case of \mathbb{F}^{n} with the standard basis: everything described in this section works for transformations between arbitrary vector spaces, as long as there is a basis in the domain and in the target space. Of course, if one changes a basis, the matrix of the linear transformation will be different. This will be discussed later in Section 2.8 of Chapter 2.

Exercises.

1.3.1.

Multiply:

  a) \begin{pmatrix}1&2&3\\ 4&5&6\end{pmatrix}\begin{pmatrix}1\\ 3\\ 2\end{pmatrix};

  b) \begin{pmatrix}1&2\\ 0&1\\ 2&0\end{pmatrix}\begin{pmatrix}1\\ 3\end{pmatrix};

  c) \begin{pmatrix}1&2&0&0\\ 0&1&2&0\\ 0&0&1&2\\ 0&0&0&1\end{pmatrix}\begin{pmatrix}1\\ 2\\ 3\\ 4\end{pmatrix};

  d) \begin{pmatrix}1&2&0\\ 0&1&2\\ 0&0&1\\ 0&0&0\end{pmatrix}\begin{pmatrix}1\\ 2\\ 3\\ 4\end{pmatrix}.

1.3.2.

Let a linear transformation in \mathbb{R}^{2} be the reflection in the line x_{1}=x_{2}. Find its matrix.

1.3.3.

For each linear transformation below, find its matrix:

  a) T:\mathbb{R}^{2}\to\mathbb{R}^{3} defined by T(x,y)^{T}=(x+2y,\,2x-5y,\,7y)^{T};

  b) T:\mathbb{R}^{4}\to\mathbb{R}^{3} defined by T(x_{1},x_{2},x_{3},x_{4})^{T}=(x_{1}+x_{2}+x_{3}+x_{4},\,x_{2}-x_{4},\,x_{1}+3x_{2}+6x_{4})^{T};

  c) T:\mathbb{P}_{n}\to\mathbb{P}_{n}, Tf(t)=f^{\prime}(t) (find the matrix with respect to the standard basis 1,t,t^{2},\ldots,t^{n});

  d) T:\mathbb{P}_{n}\to\mathbb{P}_{n}, Tf(t)=2f(t)+3f^{\prime}(t)-4f^{\prime\prime}(t) (again with respect to the standard basis 1,t,t^{2},\ldots,t^{n}).

1.3.4.

Find 3\times 3 matrices representing the transformations of \mathbb{R}^{3} which:

  a) project every vector onto the x–y plane;

  b) reflect every vector through the x–y plane;

  c) rotate the x–y plane through 30^{\circ}, leaving the z-axis alone.

1.3.5.

Let A be a linear transformation. If \mathbf{z} is the center of the straight interval [\mathbf{x},\mathbf{y}], show that A\mathbf{z} is the center of the interval [A\mathbf{x},A\mathbf{y}]. Hint: What does it mean that \mathbf{z} is the center of the interval [\mathbf{x},\mathbf{y}]?

1.3.6.

The set \mathbb{C} of complex numbers can be canonically identified with the space \mathbb{R}^{2} by treating each z=x+iy\in\mathbb{C} as a column (x,y)^{T}\in\mathbb{R}^{2}.

  a) Treating \mathbb{C} as a complex vector space, show that the multiplication by \alpha=a+ib\in\mathbb{C} is a linear transformation in \mathbb{C}. What is its matrix?

  b) Treating \mathbb{C} as the real vector space \mathbb{R}^{2}, show that the multiplication by \alpha=a+ib defines a linear transformation there. What is its matrix?

  c) Define T(x+iy)=2x-y+i(x-3y). Show that this transformation is not a linear transformation in the complex vector space \mathbb{C}, but if we treat \mathbb{C} as the real vector space \mathbb{R}^{2} then it is a linear transformation there (i.e. T is a real linear but not a complex linear transformation).

     Find the matrix of the real linear transformation T.

1.3.7.

Show that any linear transformation in \mathbb{C} (treated as a complex vector space) is a multiplication by \alpha\in\mathbb{C}.

1.4. Linear transformations as a vector space

What operations can we perform with linear transformations? We can always multiply a linear transformation by a scalar, i.e. if we have a linear transformation T:V\to W and a scalar \alpha we can define a new transformation \alpha T by

(\alpha T)\mathbf{v}=\alpha(T\mathbf{v})\qquad\forall\mathbf{v}\in V.

It is easy to check that \alpha T is also a linear transformation:

(\alpha T)(\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2})
=\alpha(T(\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}))  by the definition of \alpha T
=\alpha(\alpha_{1}T\mathbf{v}_{1}+\alpha_{2}T\mathbf{v}_{2})  by the linearity of T
=\alpha_{1}\alpha T\mathbf{v}_{1}+\alpha_{2}\alpha T\mathbf{v}_{2}
=\alpha_{1}(\alpha T)\mathbf{v}_{1}+\alpha_{2}(\alpha T)\mathbf{v}_{2}

If T_{1} and T_{2} are linear transformations with the same domain and target space (T_{1}:V\to W and T_{2}:V\to W, or in short T_{1},T_{2}:V\to W), then we can add these transformations, i.e. define a new transformation T=(T_{1}+T_{2}):V\to W by

(T_{1}+T_{2})\mathbf{v}=T_{1}\mathbf{v}+T_{2}\mathbf{v}\qquad\forall\mathbf{v}\in V.

It is easy to check that the transformation T_{1}+T_{2} is a linear one; one just needs to repeat the above reasoning for the linearity of \alpha T.

So, if we fix vector spaces V and W and consider the collection of all linear transformations from V to W (let us denote it by \mathcal{L}(V,W)), we can define two operations on \mathcal{L}(V,W): multiplication by a scalar and addition. It can be easily shown that these operations satisfy the axioms of a vector space, defined in Section 1.1.

This should come as no surprise to the reader, since the axioms of a vector space essentially mean that operations on vectors follow the standard rules of algebra. And the operations on linear transformations are defined exactly so as to satisfy these rules!

As an illustration, let us write down a formal proof of the first distributive law (Axiom 7) of a vector space. We want to show that \alpha(T_{1}+T_{2})=\alpha T_{1}+\alpha T_{2}. For any \mathbf{v}\in V

\alpha(T_{1}+T_{2})\mathbf{v}
=\alpha((T_{1}+T_{2})\mathbf{v})  by the definition of multiplication
=\alpha(T_{1}\mathbf{v}+T_{2}\mathbf{v})  by the definition of the sum
=\alpha T_{1}\mathbf{v}+\alpha T_{2}\mathbf{v}  by Axiom 7 for W
=(\alpha T_{1}+\alpha T_{2})\mathbf{v}  by the definition of the sum

So indeed \alpha(T_{1}+T_{2})=\alpha T_{1}+\alpha T_{2}.

Remark.

Linear operations (addition and multiplication by a scalar) on linear transformations T:\mathbb{F}^{n}\to\mathbb{F}^{m} correspond to the respective operations on their matrices. Since we know that the set of m\times n matrices is a vector space, this immediately implies that \mathcal{L}(\mathbb{F}^{n},\mathbb{F}^{m}) is a vector space.

We presented the abstract proof above, first of all because it works for general spaces, for example for spaces without a basis, where we cannot work with coordinates. Secondly, reasoning similar to the abstract one presented here is used in many places, so the reader will benefit from understanding it.

And as the reader gains some mathematical sophistication, he/she will see that this abstract reasoning is indeed a very simple one, that can be performed almost automatically.

1.5. Composition of linear transformations and matrix multiplication.

1.5.1. Definition of the matrix multiplication.

Knowing matrix–vector multiplication, one can easily guess what the natural way to define the product AB of two matrices is: let us multiply by A each column of B (matrix–vector multiplication) and join the resulting column vectors into a matrix. Formally,

if \mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{r} are the columns of B, then A\mathbf{b}_{1},A\mathbf{b}_{2},\ldots,A\mathbf{b}_{r} are the columns of the matrix AB.

Recalling the row by column rule for matrix–vector multiplication, we get the following row by column rule for matrices:

the entry (AB)_{j,k} (the entry in row j and column k) of the product AB is given by (AB)_{j,k}=(\text{row }\#j\text{ of }A)\cdot(\text{column }\#k\text{ of }B).

Formally it can be rewritten as

(AB)_{j,k}=\sum_{l}a_{j,l}b_{l,k},

where a_{j,k} and b_{j,k} are the entries of the matrices A and B respectively.

I intentionally did not speak about the sizes of the matrices A and B, but if we recall the row by column rule for matrix–vector multiplication, we can see that in order for the multiplication to be defined, the size of a row of A should be equal to the size of a column of B.

In other words, the product AB is defined if and only if A is an m\times n matrix and B is an n\times r matrix.
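The two descriptions of the product (columns A\mathbf{b}_{k}, and the entry formula) can be compared numerically; a NumPy sketch of mine, not part of the text:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3 x 2
B = np.array([[1, 0, 2],
              [3, 1, -2]])    # 2 x 3

# Columns of AB are A times the columns of B:
AB_by_columns = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])

# Row-by-column formula (AB)_{j,k} = sum_l a_{j,l} b_{l,k}:
AB_by_entries = np.array([[sum(A[j, l] * B[l, k] for l in range(A.shape[1]))
                           for k in range(B.shape[1])]
                          for j in range(A.shape[0])])

print(np.array_equal(AB_by_columns, A @ B))   # True
print(np.array_equal(AB_by_entries, A @ B))   # True
```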

1.5.2. Motivation: composition of linear transformations.

One can ask here: Why are we using such a complicated rule of multiplication? Why don't we just multiply matrices entrywise?

And the answer is, that the multiplication, as it is defined above, arises naturally from the composition of linear transformations.

Suppose we have two linear transformations, T_{1}:\mathbb{F}^{n}\to\mathbb{F}^{m} and T_{2}:\mathbb{F}^{r}\to\mathbb{F}^{n}. Define the composition T=T_{1}\circ T_{2} of the transformations T_{1}, T_{2} by

T(\mathbf{x})=T_{1}(T_{2}(\mathbf{x}))\qquad\forall\mathbf{x}\in\mathbb{F}^{r}.

Note that T_{2}(\mathbf{x})\in\mathbb{F}^{n}. Since T_{1}:\mathbb{F}^{n}\to\mathbb{F}^{m}, the expression T_{1}(T_{2}(\mathbf{x})) is well defined and the result belongs to \mathbb{F}^{m}. So, T:\mathbb{F}^{r}\to\mathbb{F}^{m}.

It is easy to show that T is a linear transformation (exercise), so it is defined by an m\times r matrix. How can one find this matrix, knowing the matrices of T_{1} and T_{2}?

Remark.

We will usually identify a linear transformation and its matrix, but in the next few paragraphs we will distinguish between them.

Let A be the matrix of T_{1} and B be the matrix of T_{2}. As we discussed in the previous section, the columns of the matrix of T are the vectors T(\mathbf{e}_{1}),T(\mathbf{e}_{2}),\ldots,T(\mathbf{e}_{r}), where \mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{r} is the standard basis in \mathbb{F}^{r}. For k=1,2,\ldots,r we have

T(\mathbf{e}_{k})=T_{1}(T_{2}(\mathbf{e}_{k}))=T_{1}(B\mathbf{e}_{k})=T_{1}(\mathbf{b}_{k})=A\mathbf{b}_{k}

(the operators T_{2} and T_{1} are simply multiplication by B and A respectively).

So, the columns of the matrix of T are A\mathbf{b}_{1},A\mathbf{b}_{2},\ldots,A\mathbf{b}_{r}, and that is exactly how the matrix AB was defined!

Let us return to identifying a linear transformation with its matrix again. Since matrix multiplication agrees with composition, we can (and will) write T_{1}T_{2} instead of T_{1}\circ T_{2} and T_{1}T_{2}\mathbf{x} instead of T_{1}(T_{2}(\mathbf{x})).

Remark.

Note that in the composition T_{1}T_{2} the transformation T_{2} is applied first! The way to remember this is to see that in T_{1}T_{2}\mathbf{x} the transformation T_{2} meets \mathbf{x} first.

Remark.

There is another way of checking the dimensions of matrices in a product, different from the row by column rule: for a composition T_{1}T_{2} to be defined it is necessary that T_{2}\mathbf{x} belongs to the domain of T_{1}. If T_{2} acts from some space, say \mathbb{F}^{r}, to \mathbb{F}^{n}, then T_{1} must act from \mathbb{F}^{n} to some space, say \mathbb{F}^{m}. So, in order for T_{1}T_{2} to be defined the matrices of T_{1} and T_{2} should be of sizes m\times n and n\times r respectively—the same condition as obtained from the row by column rule.

Example.

Let T:\mathbb{R}^{2}\to\mathbb{R}^{2} be the reflection in the line x_{1}=3x_{2}. It is a linear transformation, so let us find its matrix. To find the matrix, we need to compute T\mathbf{e}_{1} and T\mathbf{e}_{2}. However, the direct computation of T\mathbf{e}_{1} and T\mathbf{e}_{2} involves significantly more trigonometry than a sane person is willing to remember.

An easier way to find the matrix of T is to represent it as a composition of simple linear transformations. Namely, let \gamma be the angle between the x_{1}-axis and the line x_{1}=3x_{2}, and let T_{0} be the reflection in the x_{1}-axis. Then to get the reflection T we can first rotate the plane by the angle -\gamma, moving the line x_{1}=3x_{2} to the x_{1}-axis, then reflect everything in the x_{1}-axis, and then rotate the plane by \gamma, taking everything back. Formally it can be written as

T=R_{\gamma}T_{0}R_{-\gamma}

(note the order of terms!), where R_{\gamma} is the rotation by \gamma. The matrix of T_{0} is easy to compute,

T_{0}=\begin{pmatrix}1&0\\ 0&-1\end{pmatrix},

and the rotation matrices are known:

R_{\gamma}=\begin{pmatrix}\cos\gamma&-\sin\gamma\\ \sin\gamma&\cos\gamma\end{pmatrix},\qquad R_{-\gamma}=\begin{pmatrix}\cos(-\gamma)&-\sin(-\gamma)\\ \sin(-\gamma)&\cos(-\gamma)\end{pmatrix}=\begin{pmatrix}\cos\gamma&\sin\gamma\\ -\sin\gamma&\cos\gamma\end{pmatrix}

To compute \sin\gamma and \cos\gamma take a vector on the line x_{1}=3x_{2}, say the vector (3,1)^{T}. Then

\cos\gamma=\frac{\text{first coordinate}}{\text{length}}=\frac{3}{\sqrt{3^{2}+1^{2}}}=\frac{3}{\sqrt{10}}

and similarly

\sin\gamma=\frac{\text{second coordinate}}{\text{length}}=\frac{1}{\sqrt{3^{2}+1^{2}}}=\frac{1}{\sqrt{10}}

Gathering everything together we get

T=R_{\gamma}T_{0}R_{-\gamma}=\frac{1}{\sqrt{10}}\begin{pmatrix}3&-1\\ 1&3\end{pmatrix}\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}\frac{1}{\sqrt{10}}\begin{pmatrix}3&1\\ -1&3\end{pmatrix}=\frac{1}{10}\begin{pmatrix}3&-1\\ 1&3\end{pmatrix}\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}\begin{pmatrix}3&1\\ -1&3\end{pmatrix}

It remains only to perform matrix multiplication here to get the final result. ∎
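If you want to carry out the remaining multiplications numerically, here is a NumPy sketch of mine (not part of the text) that computes T and checks that vectors on the line are indeed fixed:

```python
import numpy as np

c, s = 3 / np.sqrt(10), 1 / np.sqrt(10)      # cos(gamma), sin(gamma)
R_pos = np.array([[c, -s], [s,  c]])         # rotation by  gamma
R_neg = np.array([[c,  s], [-s, c]])         # rotation by -gamma
T0 = np.array([[1, 0], [0, -1]])             # reflection in the x1-axis

T = R_pos @ T0 @ R_neg
print(T)                      # [[ 0.8  0.6] [ 0.6 -0.8]]
print(T @ np.array([3, 1]))   # [3. 1.]: a vector on the line x1 = 3 x2 is fixed
```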

1.5.3. Properties of matrix multiplication.

Matrix multiplication enjoys a lot of properties, familiar to us from high school algebra:

  1. Associativity: A(BC)=(AB)C, provided that either the left or the right side is well defined; we can therefore (and will) simply write ABC in this case.

  2. Distributivity: A(B+C)=AB+AC, (A+B)C=AC+BC, provided that either the left or the right side of each equation is well defined.

  3. One can take scalar multiples out: A(\alpha B)=(\alpha A)B=\alpha(AB)=\alpha AB.

These properties are easy to prove. One should prove the corresponding properties for linear transformations, and they almost trivially follow from the definitions. The properties of linear transformations then imply the properties for the matrix multiplication.

The new twist here is that the commutativity fails:

matrix multiplication is non-commutative, i.e. generally for matrices AB\neq BA.

One can easily see that it would be unreasonable to expect commutativity of matrix multiplication. Indeed, let A and B be matrices of sizes m\times n and n\times r respectively. Then the product AB is well defined, but if m\neq r, BA is not defined.

Even when both products are well defined, for example when A and B are n\times n (square) matrices, the multiplication is still non-commutative. If we just pick the matrices A and B at random, the chances are that AB\neq BA: we have to be very lucky to get AB=BA.
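A concrete 2\times 2 example of non-commutativity (a NumPy sketch of mine, not part of the text):

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[1, 0],
              [1, 1]])
print(A @ B)                           # [[2 1] [1 1]]
print(B @ A)                           # [[1 1] [1 2]]
print(np.array_equal(A @ B, B @ A))    # False: AB != BA
```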

1.5.4. Transposed matrices and multiplication.

Given a matrix A, its transpose (or transposed matrix) A^{T} is defined by transforming the rows of A into the columns. For example

\begin{pmatrix}1&2&3\\ 4&5&6\end{pmatrix}^{T}=\begin{pmatrix}1&4\\ 2&5\\ 3&6\end{pmatrix}.

So, the columns of A^{T} are the rows of A and vice versa, the rows of A^{T} are the columns of A.

The formal definition is as follows: (A^{T})_{j,k}=(A)_{k,j}, meaning that the entry of A^{T} in row number j and column number k equals the entry of A in row number k and column number j.

The transpose of a matrix has a very nice interpretation in terms of linear transformations, namely it gives the so-called adjoint transformation. We will study this in detail later, but for now transposition will be just a useful formal operation.

One of the first uses of the transpose is that we can write a column vector \mathbf{x}\in\mathbb{F}^{n} as \mathbf{x}=(x_{1},x_{2},\ldots,x_{n})^{T}. If we put the column vertically, it will use significantly more space.

A simple analysis of the row by column rule shows that

(AB)^{T}=B^{T}A^{T},

i.e. when you take the transpose of the product, you change the order of the terms.
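This reversal of order is easy to check numerically (a NumPy sketch of mine, not part of the text):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])             # 3 x 2

print(np.array_equal((A @ B).T, B.T @ A.T))   # True
# Note that A.T @ B.T is not even defined here (3 x 2 times 2 x 3 would be,
# but the sizes 3 x 2 and 3 x 2 do not match in that order).
```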

1.5.5. Trace and matrix multiplication

For a square (n\times n) matrix A=(a_{j,k}) its trace (denoted by \operatorname{trace}A) is the sum of the diagonal entries

\operatorname{trace}A=\sum_{k=1}^{n}a_{k,k}.
Theorem 1.5.1.

Let AA and BB be matrices of size m×nm\times n and n×mn\times m respectively (so the both products ABAB and BABA are well defined). Then

trace(AB)=trace(BA)\operatorname{trace}(AB)=\operatorname{trace}(BA)

We leave the proof of this theorem as an exercise, see Problem 1.5.6 below. There are essentially two ways of proving this theorem. One is to compute the diagonal entries of ABAB and of BABA and compare their sums. This method requires some proficiency in manipulating sums in \sum notation.

If you are not comfortable with algebraic manipulations, there is another way. We can consider two linear transformations, TT and T1T_{1}, acting from Mn×mM_{n\times m} to 𝔽=𝔽1\mathbb{F}=\mathbb{F}^{1} defined by

T(X)=trace(AX),T1(X)=trace(XA)T(X)=\operatorname{trace}(AX),\qquad T_{1}(X)=\operatorname{trace}(XA)

To prove the theorem it is sufficient to show that T=T1T=T_{1}; the equality for X=BX=B gives the theorem.

Since a linear transformation is completely defined by its values on a generating system, we just need to check the equality on some simple matrices, for example on the matrices X_{j,k} which have all entries 0 except the entry 1 at the intersection of the jth column and the kth row.
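Before attempting the proof, the identity can at least be tested numerically. A possible sketch (NumPy, with arbitrarily chosen matrices of sizes 2x3 and 3x2; a numerical check is of course not a proof):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # m x n with m = 2, n = 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # n x m

# AB is 2x2 and BA is 3x3, yet their traces agree.
print(np.trace(A @ B), np.trace(B @ A))   # 15.0 15.0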

Exercises.

1.5.1.

Let

A=(1231),B=(102312),C=(123211),D=(221)A=\left(\begin{array}[]{cc}1&2\\ 3&1\end{array}\right),\ B=\left(\begin{array}[]{ccc}1&0&2\\ 3&1&-2\end{array}\right),\ C=\left(\begin{array}[]{ccc}1&-2&3\\ -2&1&-1\end{array}\right),\ D=\left(\begin{array}[]{c}-2\\ 2\\ 1\end{array}\right)
  1. a)

    Mark all the products that are defined, and give the dimensions of the result: ABAB, BABA, ABCABC, ABDABD, BCBC, BCTBC^{T}, BTCB^{T}C, DCDC, DTCTD^{T}C^{T}.

  2. b)

    Compute ABAB, A(3B+C)A(3B+C), BTAB^{T}A, A(BD)A(BD), (AB)D(AB)D.

1.5.2.

Let TγT_{\gamma} be the matrix of rotation by γ\gamma in 2\mathbb{R}^{2}. Check by matrix multiplication that TγTγ=TγTγ=IT_{\gamma}T_{-\gamma}=T_{-\gamma}T_{\gamma}=I

1.5.3.

Multiply two rotation matrices TαT_{\alpha} and TβT_{\beta} (it is a rare case when the multiplication is commutative, i.e. TαTβ=TβTαT_{\alpha}T_{\beta}=T_{\beta}T_{\alpha}, so the order is not essential). Deduce formulas for sin(α+β)\sin(\alpha+\beta) and cos(α+β)\cos(\alpha+\beta) from here.

1.5.4.

Find the matrix of the orthogonal projection in 2\mathbb{R}^{2} onto the line x1=2x2x_{1}=-2x_{2}. Hint: What is the matrix of the projection onto the coordinate axis x1x_{1}?

1.5.5.

Find linear transformations A,B:22A,B:\mathbb{R}^{2}\to\mathbb{R}^{2} such that AB=𝟎AB=\mathbf{0} but BA𝟎BA\neq\mathbf{0}.

1.5.6.

Prove Theorem 1.5.1, i.e. prove that trace(AB)=trace(BA)\operatorname{trace}(AB)=\operatorname{trace}(BA).

1.5.7.

Construct a non-zero matrix AA such that A2=𝟎A^{2}=\mathbf{0}.

1.5.8.

Find the matrix of the reflection through the line y=2x/3y=-2x/3. Perform all the multiplications.

1.6. Invertible transformations and matrices. Isomorphisms

1.6.1. Identity transformation and identity matrix.

Among all linear transformations, there is a special one, the identity transformation (operator) II, I𝐱=𝐱I\mathbf{x}=\mathbf{x}, 𝐱\forall\mathbf{x}.

To be precise, there are infinitely many identity transformations: for any vector space V, there is the identity transformation I=I_{V}:V\to V, I_{V}\mathbf{x}=\mathbf{x}, \forall\mathbf{x}\in V. However, when it does not lead to confusion we will use the same symbol I for all identity operators (transformations). We will use the notation I_{V} only when we want to emphasize in what space the transformation is acting.

Clearly, if I:𝔽n𝔽nI:\mathbb{F}^{n}\to\mathbb{F}^{n} is the identity transformation in 𝔽n\mathbb{F}^{n}, its matrix is the n×nn\times n matrix

I=In=(100010001)I=I_{n}=\left(\begin{array}[]{cccc}1&0&\ldots&0\\ 0&1&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\ldots&1\end{array}\right)

(11 on the main diagonal and 0 everywhere else). When we want to emphasize the size of the matrix, we use the notation InI_{n}; otherwise we just use II.

Remark.

Often, the symbol EE is used in Linear Algebra textbooks for the identity matrix. I prefer II, since it is used in operator theory.

Clearly, for an arbitrary linear transformation AA, the equalities

AI=A,IA=AAI=A,\qquad IA=A

hold (whenever the product is defined).

1.6.2. Invertible transformations.

Definition.

Let A:V\to W be a linear transformation. We say that the transformation A is left invertible if there exists a linear transformation B:W\to V such that

BA=I(I=IV here).BA=I\qquad(I=I_{{}_{\scriptstyle V}}\text{ here}).

The transformation AA is called right invertible if there exists a linear transformation C:WVC:W\to V such that

AC=I(here I=IW).AC=I\qquad(\text{here }I=I_{{}_{\scriptstyle W}}).

The transformations BB and CC are called left and right inverses of AA. Note, that we did not assume the uniqueness of BB or CC here, and generally left and right inverses are not unique.

Definition.

A linear transformation A:VWA:V\to W is called invertible if it is both right and left invertible.

Theorem 1.6.1.

If a linear transformation A:VWA:V\to W is invertible, then its left and right inverses BB and CC are unique and coincide.

Corollary.

A transformation A:VWA:V\to W is invertible if and only if there exists a unique linear transformation (denoted A1A^{-1}), A1:WVA^{-1}:W\to V such that

(1.6.1) A1A=IV,AA1=IW.\displaystyle A^{-1}A=I_{{}_{\scriptstyle V}},\qquad AA^{-1}=I_{{}_{% \scriptstyle W}}.

The transformation A1A^{-1} is called the inverse of AA.

Remark.

Very often the existence of the transformation A^{-1} satisfying (1.6.1) is used as the definition of an invertible transformation.

Proof of Theorem 1.6.1.

Let BA=IBA=I and AC=IAC=I. Then

BAC=B(AC)=BI=B.BAC=B(AC)=BI=B.

On the other hand

BAC=(BA)C=IC=C,BAC=(BA)C=IC=C,

and therefore B=CB=C.

Suppose for some transformation B1B_{1} we have B1A=IB_{1}A=I. Repeating the above reasoning with B1B_{1} instead of BB we get B1=CB_{1}=C. Therefore the left inverse BB is unique. The uniqueness of CC is proved similarly. ∎

Definition.

A matrix is called invertible (resp. left invertible, right invertible) if the corresponding linear transformation is invertible (resp. left invertible, right invertible).

Theorem 1.6.1 asserts that a matrix AA is invertible if there exists a unique matrix A1A^{-1} such that A1A=IA^{-1}A=I, AA1=IAA^{-1}=I. The matrix A1A^{-1} is called (surprise) the inverse of AA.

Examples.

  1. 1.

    The identity transformation (matrix) is invertible, I1=II^{-1}=I;

  2. 2.

    The rotation RγR_{\gamma}

    Rγ=(cosγsinγsinγcosγ)R_{\gamma}=\left(\begin{array}[]{cc}\cos\gamma&-\sin\gamma\\ \sin\gamma&\cos\gamma\end{array}\right)

    is invertible, and the inverse is given by (Rγ)1=Rγ(R_{\gamma})^{-1}=R_{-\gamma}. This equality is clear from the geometric description of RγR_{\gamma}, and it also can be checked by the matrix multiplication;

  3. 3.

    The column (1,1)^{T} is left invertible but not right invertible. One of the possible left inverses is the row (1/2,1/2).

    To show that this matrix is not right invertible, we just notice that there is more than one left inverse (if the matrix were also right invertible, its left inverse would be unique by Theorem 1.6.1). Exercise: describe all left inverses of this matrix.

  4. 4.

    The row (1,1)(1,1) is right invertible, but not left invertible. The column (1/2,1/2)T(1/2,1/2)^{T} is a possible right inverse.
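The last two examples are easy to check numerically; the sketch below (NumPy, used only as an illustration) verifies that (1/2,1/2) is a left inverse of the column (1,1)^{T} and that the product taken in the other order does not give the identity:

import numpy as np

v = np.array([[1.0],
              [1.0]])            # the column (1,1)^T, a 2x1 matrix
B = np.array([[0.5, 0.5]])       # one possible left inverse, a 1x2 matrix

print(B @ v)                     # [[1.]]  : the 1x1 identity
print(v @ B)                     # a 2x2 matrix which is not the identity

C = np.array([[1.0, 0.0]])       # another left inverse: left inverses are not unique here
print(C @ v)                     # [[1.]]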

Remark 1.6.2.

An invertible matrix must be square (n\times n). Moreover, if a square matrix A has either a left or a right inverse, it is invertible. So, it is sufficient to check only one of the identities AA^{-1}=I, A^{-1}A=I.

This fact will be proved later. Until we prove this fact, we will not use it. I presented it here only to prevent the reader from trying wrong directions.

Properties of the inverse transformation.

Theorem 1.6.3 (Inverse of the product).

If linear transformations AA and BB are invertible (and such that the product ABAB is defined), then the product ABAB is invertible and

(AB)1=B1A1;(AB)^{-1}=B^{-1}A^{-1};

note the change of the order!

Proof.

Direct computation shows:

(AB)(B1A1)=A(BB1)A1=AIA1=AA1=I(AB)(B^{-1}A^{-1})=A(BB^{-1})A^{-1}=AIA^{-1}=AA^{-1}=I

and similarly

(B^{-1}A^{-1})(AB)=B^{-1}(A^{-1}A)B=B^{-1}IB=B^{-1}B=I. ∎

Remark 1.6.4.

The invertibility of the product ABAB does not imply the invertibility of the factors AA and BB (can you think of an example?). However, if one of the factors (either AA or BB) and the product ABAB are invertible, then the second factor is also invertible.

We leave the proof of this fact as an exercise.

Theorem 1.6.5 (Inverse of ATA^{T}).

If a matrix AA is invertible, then ATA^{T} is also invertible and

(AT)1=(A1)T(A^{T})^{-1}=(A^{-1})^{T}
Proof.

Using (AB)T=BTAT(AB)^{T}=B^{T}A^{T} we get

(A1)TAT=(AA1)T=IT=I,(A^{-1})^{T}A^{T}=(AA^{-1})^{T}=I^{T}=I,

and similarly

A^{T}(A^{-1})^{T}=(A^{-1}A)^{T}=I^{T}=I. ∎

And finally, if AA is invertible, then A1A^{-1} is also invertible, (A1)1=A(A^{-1})^{-1}=A.

So, let us summarize the main properties of the inverse:

  1. 1.

    If AA is invertible, then A1A^{-1} is also invertible, (A1)1=A(A^{-1})^{-1}=A;

  2. 2.

    If AA and BB are invertible and the product ABAB is defined, then ABAB is invertible and (AB)1=B1A1(AB)^{-1}=B^{-1}A^{-1}.

  3. 3.

    If AA is invertible, then ATA^{T} is also invertible and (AT)1=(A1)T(A^{T})^{-1}=(A^{-1})^{T}.
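These three properties are easy to observe numerically as well. A minimal sketch (NumPy; the two invertible 2\times 2 matrices are arbitrary example choices, so this is an illustration, not a proof):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])

print(np.allclose(np.linalg.inv(np.linalg.inv(A)), A))                         # (A^-1)^-1 = A
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))  # (AB)^-1 = B^-1 A^-1
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))                     # (A^T)^-1 = (A^-1)^T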

1.6.3. Isomorphism. Isomorphic spaces.

An invertible linear transformation A:V\to W is called an isomorphism. We did not introduce anything new here; it is just another name for an object we have already studied.

Two vector spaces VV and WW are called isomorphic (denoted VWV\cong W) if there is an isomorphism A:VWA:V\to W.

Isomorphic spaces can be considered as different representations of the same space, meaning that all properties and constructions involving vector space operations are preserved under isomorphism.

The theorem below illustrates this statement.

Theorem 1.6.6.

Let A:VWA:V\to W be an isomorphism, and let 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} be a basis in VV. Then the system A𝐯1,A𝐯2,,A𝐯nA\mathbf{v}_{1},A\mathbf{v}_{2},\ldots,A\mathbf{v}_{n} is a basis in WW.

We leave the proof of the theorem as an exercise.

Remark.

In the above theorem one can replace “basis” by “linearly independent”, or “generating”, or “linearly dependent”—all these properties are preserved under isomorphisms.

Remark.

If A is an isomorphism, then so is A^{-1}. Therefore in the above theorem we can state that \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a basis if and only if A\mathbf{v}_{1},A\mathbf{v}_{2},\ldots,A\mathbf{v}_{n} is a basis.

The converse to Theorem 1.6.6 is also true.

Theorem 1.6.7.

Let A:V\to W be a linear map, and let \mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} and \mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{n} be bases in V and W respectively. If A\mathbf{v}_{k}=\mathbf{w}_{k}, k=1,2,\ldots,n, then A is an isomorphism.

Proof.

Define the inverse transformation A^{-1} by A^{-1}\mathbf{w}_{k}=\mathbf{v}_{k}, k=1,2,\ldots,n (as we know, a linear transformation is defined by its values on a basis). ∎

Examples

  1. 1.

    Let A:\mathbb{F}^{n+1}\to\mathbb{P}_{n}^{\mathbb{F}} (\mathbb{P}_{n}^{\mathbb{F}} is the set of polynomials \sum_{k=0}^{n}a_{k}t^{k}, a_{k}\in\mathbb{F}, of degree at most n) be defined by

    A𝐞1=1,A𝐞2=t,,A𝐞n=tn1,A𝐞n+1=tnA\mathbf{e}_{1}=1,\ A\mathbf{e}_{2}=t,\ldots,A\mathbf{e}_{n}=t^{n-1},\ A% \mathbf{e}_{n+1}=t^{n}

    By Theorem 1.6.7 AA is an isomorphism, so n𝔽𝔽n+1\mathbb{P}_{n}^{\mathbb{F}}\cong\mathbb{F}^{n+1}.

  2. 2.

    Let VV be a vector space (over 𝔽\mathbb{F}) with a basis 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n}. Define transformation A:𝔽nVA:\mathbb{F}^{n}\to V by

    A𝐞k=𝐯k,k=1,2,,n,A\mathbf{e}_{k}=\mathbf{v}_{k},\qquad k=1,2,\ldots,n,

    where 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} is the standard basis in 𝔽n\mathbb{F}^{n}. Again by Theorem 1.6.7 AA is an isomorphism, so V𝔽nV\cong\mathbb{F}^{n}.

  3. 3.

    The space M2×3𝔽M_{2\times 3}^{\mathbb{F}} of 2×32\times 3 matrices with entries in 𝔽\mathbb{F} is isomorphic to 𝔽6\mathbb{F}^{6};

  4. 4.

    More generally, Mm×n𝔽𝔽mnM_{m\times n}^{\mathbb{F}}\cong\mathbb{F}^{m\cdot n}

Remark.

Any real vector space with a basis is isomorphic to n\mathbb{R}^{n} (for some nn). Similarly, any complex vector space with a basis is isomorphic to n\mathbb{C}^{n}.
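The isomorphism \mathbb{P}_{n}^{\mathbb{F}}\cong\mathbb{F}^{n+1} of Example 1 is exactly the correspondence between a polynomial and its vector of coefficients. A small sketch, using NumPy's Polynomial class purely as an illustration (the particular coefficients are arbitrary), showing that the vector space operations are preserved:

import numpy as np
from numpy.polynomial import Polynomial

a = np.array([1.0, -2.0, 0.0, 3.0])   # coefficient vector of 1 - 2t + 3t^3
b = np.array([0.0,  1.0, 1.0, 0.0])   # coefficient vector of t + t^2

p, q = Polynomial(a), Polynomial(b)

# A(2a + b) = 2 A(a) + A(b): adding and scaling coefficient vectors
# corresponds to adding and scaling the polynomials.
print(np.array_equal(Polynomial(2*a + b).coef, (2*p + q).coef))   # True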

1.6.4. Invertibility and equations.

Theorem 1.6.8.

Let A:XYA:X\to Y be a linear transformation. Then AA is invertible if and only if for any right side 𝐛Y\mathbf{b}\in Y the equation

A𝐱=𝐛A\mathbf{x}=\mathbf{b}

has a unique solution 𝐱X\mathbf{x}\in X.

Proof.

Suppose AA is invertible. Then 𝐱=A1𝐛\mathbf{x}=A^{-1}\mathbf{b} solves the equation A𝐱=𝐛A\mathbf{x}=\mathbf{b}. To show that the solution is unique, suppose that for some other vector 𝐱1X\mathbf{x}_{1}\in X

A𝐱1=𝐛A\mathbf{x}_{1}=\mathbf{b}

Multiplying this identity by A1A^{-1} from the left we get

A1A𝐱1=A1𝐛,A^{-1}A\mathbf{x}_{1}=A^{-1}\mathbf{b},

and therefore 𝐱1=A1𝐛=𝐱\mathbf{x}_{1}=A^{-1}\mathbf{b}=\mathbf{x}. Note that both identities, AA1=IAA^{-1}=I and A1A=IA^{-1}A=I were used here.

Let us now suppose that the equation A𝐱=𝐛A\mathbf{x}=\mathbf{b} has a unique solution 𝐱\mathbf{x} for any 𝐛Y\mathbf{b}\in Y. Let us use symbol 𝐲\mathbf{y} instead of 𝐛\mathbf{b}. We know that given 𝐲Y\mathbf{y}\in Y the equation

A𝐱=𝐲A\mathbf{x}=\mathbf{y}

has a unique solution 𝐱X\mathbf{x}\in X. Let us call this solution B(𝐲)B(\mathbf{y}).

Note that B(𝐲)B(\mathbf{y}) is defined for all 𝐲Y\mathbf{y}\in Y, so we defined a transformation B:YXB:Y\to X.

Let us check that BB is a linear transformation. We need to show that B(α𝐲1+β𝐲2)=αB(𝐲1)+βB(𝐲2)B(\alpha\mathbf{y}_{1}+\beta\mathbf{y}_{2})=\alpha B(\mathbf{y}_{1})+\beta B(% \mathbf{y}_{2}). Let 𝐱k:=B(𝐲k)\mathbf{x}_{k}:=B(\mathbf{y}_{k}), k=1,2k=1,2, i.e. A𝐱k=𝐲kA\mathbf{x}_{k}=\mathbf{y}_{k}, k=1,2k=1,2. Then

A(α𝐱1+β𝐱2)=αA𝐱1+βA𝐱2=α𝐲1+β𝐲2,A(\alpha\mathbf{x}_{1}+\beta\mathbf{x}_{2})=\alpha A\mathbf{x}_{1}+\beta A% \mathbf{x}_{2}=\alpha\mathbf{y}_{1}+\beta\mathbf{y}_{2},

which means

B(α𝐲1+β𝐲2)=αB(𝐲1)+βB(𝐲2).B(\alpha\mathbf{y}_{1}+\beta\mathbf{y}_{2})=\alpha B(\mathbf{y}_{1})+\beta B(% \mathbf{y}_{2}).

And finally, let us show that BB is indeed the inverse of AA. Take 𝐱X\mathbf{x}\in X and let 𝐲=A𝐱\mathbf{y}=A\mathbf{x}, so by the definition of BB we have 𝐱=B𝐲\mathbf{x}=B\mathbf{y}. Then for all 𝐱X\mathbf{x}\in X

BA𝐱=B𝐲=𝐱,BA\mathbf{x}=B\mathbf{y}=\mathbf{x},

so BA=IBA=I. Similarly, for arbitrary 𝐲Y\mathbf{y}\in Y let 𝐱=B𝐲\mathbf{x}=B\mathbf{y}, so 𝐲=A𝐱\mathbf{y}=A\mathbf{x}. Then for all 𝐲Y\mathbf{y}\in Y

AB𝐲=A𝐱=𝐲AB\mathbf{y}=A\mathbf{x}=\mathbf{y}

so AB=IAB=I. ∎

Recalling the definition of a basis we get the following corollary of Theorems 1.6.6 and 1.6.7.

Corollary 1.6.9.

An m×nm\times n matrix is invertible if and only if its columns form a basis in 𝔽m\mathbb{F}^{m}.
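For matrices, Theorem 1.6.8 is what one uses (implicitly) every time a linear system with an invertible coefficient matrix is solved on a computer. A minimal sketch (NumPy, with an arbitrarily chosen invertible 2\times 2 matrix):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])       # invertible: its columns form a basis in R^2
b = np.array([3.0, 2.0])

x = np.linalg.solve(A, b)         # the unique solution of A x = b
print(x)                          # [1. 1.]
print(np.allclose(A @ x, b))      # True
print(np.allclose(np.linalg.inv(A) @ b, x))   # x = A^{-1} b gives the same answer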

Exercises.

1.6.1.

Prove, that if A:VWA:V\to W is an isomorphism (i.e. an invertible linear transformation) and 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a basis in VV, then A𝐯1,A𝐯2,,A𝐯nA\mathbf{v}_{1},A\mathbf{v}_{2},\ldots,A\mathbf{v}_{n} is a basis in WW.

1.6.2.

Find all right inverses to the 1×21\times 2 matrix (row) A=(1,1)A=(1,1). Conclude from here that the row AA is not left invertible.

1.6.3.

Find all left inverses of the column (1,2,3)T(1,2,3)^{T}

1.6.4.

Is the column (1,2,3)T(1,2,3)^{T} right invertible? Justify

1.6.5.

Find two matrices A and B such that AB is invertible, but A and B are not. Hint: square matrices A and B would not work. Remark: It is easy to construct such A and B in the case when AB is a 1\times 1 matrix (a scalar). But can you get a 2\times 2 matrix AB? 3\times 3? n\times n?

1.6.6.

Suppose the product ABAB is invertible. Show that AA is right invertible and BB is left invertible. Hint: you can just write formulas for right and left inverses.

1.6.7.

Let AA and ABAB be invertible (assuming that the product ABAB is well defined). Prove that BB is invertible.

1.6.8.

Let AA be n×nn\times n matrix. Prove that if A2=𝟎A^{2}=\mathbf{0} then AA is not invertible

1.6.9.

Suppose AB=𝟎AB=\mathbf{0} for some non-zero matrix BB. Can AA be invertible? Justify.

1.6.10.

Write matrices of the linear transformations T1T_{1} and T2T_{2} in 𝔽5\mathbb{F}^{5}, defined as follows: T1T_{1} interchanges the coordinates x2x_{2} and x4x_{4} of the vector 𝐱\mathbf{x}, and T2T_{2} just adds to the coordinate x2x_{2} aa times the coordinate x4x_{4}, and does not change other coordinates, i.e.

T1(x1x2x3x4x5)=(x1x4x3x2x5),T2(x1x2x3x4x5)=(x1x2+ax4x3x4x5);T_{1}\left(\begin{array}[]{c}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{array}\right)=\left(\begin{array}[]{c}x_{1}\\ x_{4}\\ x_{3}\\ x_{2}\\ x_{5}\end{array}\right),\qquad T_{2}\left(\begin{array}[]{c}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{array}\right)=\left(\begin{array}[]{c}x_{1}\\ x_{2}+ax_{4}\\ x_{3}\\ x_{4}\\ x_{5}\end{array}\right);

here aa is some fixed number.

Show that T1T_{1} and T2T_{2} are invertible transformations, and write the matrices of the inverses. Hint: it may be simpler, if you first describe the inverse transformation, and then find its matrix, rather than trying to guess (or compute) the inverses of the matrices T1T_{1}, T2T_{2}.

1.6.11.

Find the matrix of the rotation in \mathbb{R}^{3} through the angle \alpha around the vector (1,2,3)^{T}. We assume that the rotation is counterclockwise if we sit at the tip of the vector and look at the origin.

You can present the answer as a product of several matrices: you don’t have to perform the multiplication.

1.6.12.

Give examples of matrices (say 2×22\times 2) such that:

  1. a)

    A+BA+B is not invertible although both AA and BB are invertible;

  2. b)

    A+BA+B is invertible although both AA and BB are not invertible;

  3. c)

    All of AA, BB and A+BA+B are invertible

1.6.13.

Let AA be an invertible symmetric (AT=AA^{T}=A) matrix. Is the inverse of AA symmetric? Justify.

1.7. Subspaces.

A subspace of a vector space VV is a non-empty subset V0VV_{0}\subset V of VV which is closed under the vector addition and multiplication by scalars, i.e.

  1. 1.

    If 𝐯V0\mathbf{v}\in V_{0} then α𝐯V0\alpha\mathbf{v}\in V_{0} for all scalars α\alpha;

  2. 2.

    For any 𝐮,𝐯V0\mathbf{u},\mathbf{v}\in V_{0} the sum 𝐮+𝐯V0\mathbf{u}+\mathbf{v}\in V_{0};

Again, the conditions 1 and 2 can be replaced by the following one:

α𝐮+β𝐯V0for all 𝐮,𝐯V0,and for all scalars α,β.\alpha\mathbf{u}+\beta\mathbf{v}\in V_{0}\qquad\text{for all }\mathbf{u},% \mathbf{v}\in V_{0},\ \text{and for all scalars }\alpha,\beta.

Note, that a subspace V_{0}\subset V, with the operations (vector addition and multiplication by scalars) inherited from V, is itself a vector space. Indeed, since V_{0} is non-empty, it contains at least one vector \mathbf{v}, and since \mathbf{0}=0\mathbf{v}, the above condition 1 implies that the zero vector \mathbf{0} is in V_{0}. Also, for any \mathbf{v}\in V_{0} its additive inverse -\mathbf{v} is given by -\mathbf{v}=(-1)\mathbf{v}, so again by condition 1, -\mathbf{v}\in V_{0}. The rest of the axioms of a vector space are satisfied because all operations are inherited from the vector space V. The only thing that could possibly go wrong is that the result of some operation does not belong to V_{0}; but the definition of a subspace prohibits this!

Now let us consider some examples:

  1. 1.

    Trivial subspaces of a space VV, namely VV itself and {𝟎}\{\mathbf{0}\} (the subspace consisting only of zero vector). Note, that the empty set \varnothing is not a vector space, since it does not contain a zero vector, so it is not a subspace.

With each linear transformation A:VWA:V\to W we can associate the following two subspaces:

  1. 2.

    The null space, or kernel of AA, which is denoted as NullA\operatorname{Null}A or KerA\operatorname{Ker}A and consists of all vectors 𝐯V\mathbf{v}\in V such that A𝐯=𝟎A\mathbf{v}=\mathbf{0}

  2. 3.

    The range RanA\operatorname{Ran}A is defined as the set of all vectors 𝐰W\mathbf{w}\in W which can be represented as 𝐰=A𝐯\mathbf{w}=A\mathbf{v} for some 𝐯V\mathbf{v}\in V.

If A is a matrix, i.e. A:\mathbb{F}^{m}\to\mathbb{F}^{n}, then recalling the column by coordinate rule of matrix–vector multiplication, we can see that any vector \mathbf{w}\in\operatorname{Ran}A can be represented as a linear combination of the columns of the matrix A. That explains why the term column space (and the notation \operatorname{Col}A) is often used instead of \operatorname{Ran}A for the range of a matrix.

And now the last example.

  1. 4.

    Given a system of vectors 𝐯1,𝐯2,,𝐯rV\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\in V its linear span (sometimes called simply span) {𝐯1,𝐯2,,𝐯r}\mathcal{L}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\} is the collection of all vectors 𝐯V\mathbf{v}\in V that can be represented as a linear combination 𝐯=α1𝐯1+α2𝐯2++αr𝐯r\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{r}% \mathbf{v}_{r} of vectors 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}. The notation span{𝐯1,𝐯2,,𝐯r}\operatorname{span}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\} is also used instead of {𝐯1,𝐯2,,𝐯r}\mathcal{L}\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\}

It is easy to check that in all of these examples we indeed have subspaces. We leave this as an exercise for the reader. Some of the statements will be proved later in the text.
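Here is a small numerical illustration of the subspaces \operatorname{Null}A and \operatorname{Ran}A for a concrete matrix (NumPy; the matrix is chosen just as an example):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # a 2x2 matrix with linearly dependent columns

u = np.array([2.0, -1.0])
print(A @ u)                      # [0. 0.], so u lies in Null A
print(A @ (5 * u))                # [0. 0.], scalar multiples stay in Null A

# Every vector in Ran A (= Col A) is a linear combination of the columns of A;
# here Ran A is the line spanned by (1, 2)^T.
print(A @ np.array([3.0, 1.0]))   # [ 5. 10.], i.e. 5 times (1, 2)^T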

Exercises.

1.7.1.

Let XX and YY be subspaces of a vector space VV. Prove that XYX\cap Y is a subspace of VV.

1.7.2.

Let VV be a vector space. For X,YVX,Y\subset V the sum X+YX+Y is the collection of all vectors 𝐯\mathbf{v} which can be represented as 𝐯=𝐱+𝐲\mathbf{v}=\mathbf{x}+\mathbf{y}, 𝐱X\mathbf{x}\in X, 𝐲Y\mathbf{y}\in Y. Show that if XX and YY are subspaces of VV, then X+YX+Y is also a subspace.

1.7.3.

Let XX be a subspace of a vector space VV, and let 𝐯V\mathbf{v}\in V, 𝐯X\mathbf{v}\notin X. Prove that if 𝐱X\mathbf{x}\in X then 𝐱+𝐯X\mathbf{x}+\mathbf{v}\notin X.

1.7.4.

Let XX and YY be subspaces of a vector space VV. Using the previous exercise, show that XYX\cup Y is a subspace if and only if XYX\subset Y or YXY\subset X.

1.7.5.

What is the smallest subspace of the space of 4×44\times 4 matrices which contains all upper triangular matrices (aj,k=0a_{j,k}=0 for all j>kj>k), and all symmetric matrices (A=ATA=A^{T})? What is the largest subspace contained in both of those subspaces?

1.8. Application to computer graphics.

In this section we give some ideas of how linear algebra is used in computer graphics. We will not go into the details, but just explain some ideas. In particular we explain why manipulations with 3-dimensional images reduce to multiplication of 4\times 4 matrices.

1.8.1. 2-dimensional manipulation.

The x-y plane (more precisely, a rectangle there) is a good model of a computer monitor. Any object on a monitor is represented as a collection of pixels, and each pixel is assigned a specific color. The position of each pixel is determined by its column and row, which play the role of the x and y coordinates on the plane. So a rectangle on a plane with x-y coordinates is a good model for a computer screen, and a graphical object is just a collection of points.

Remark.

There are two types of graphical objects: bitmap objects, where every pixel of an object is described, and vector objects, where we describe only critical points and the graphics engine connects them to reconstruct the object. A (digital) photo is a good example of a bitmap object: every pixel of it is described. Bitmap objects can contain a lot of points, so manipulations with bitmaps require a lot of computing power. Anybody who has edited digital photos in a bitmap manipulation program, like Adobe Photoshop, knows that one needs quite a powerful computer, and even with modern and powerful computers manipulations can take some time.

That is the reason that most of the objects appearing on a computer screen are vector ones: the computer only needs to memorize the critical points. For example, to describe a polygon, one needs only to give the coordinates of its vertices, and which vertex is connected with which. Of course, not all objects on a computer screen can be represented as polygons; some, like letters, have smooth curved boundaries. But there are standard methods allowing one to draw smooth curves through a collection of points, for example Bézier splines, used in PostScript and Adobe PDF (and in many other formats).

Anyhow, this is the subject of a completely different book, and we will not discuss it here. For us a graphical object will be a collection of points (either a wireframe model or a bitmap), and we would like to show how one can perform some manipulations with such objects.

The simplest transformation is a translation (shift), where each point (vector) 𝐯\mathbf{v} is translated by 𝐚\mathbf{a}, i.e. the vector 𝐯\mathbf{v} is replaced by 𝐯+𝐚\mathbf{v}+\mathbf{a} (notation 𝐯𝐯+𝐚\mathbf{v}\mapsto\mathbf{v}+\mathbf{a} is used for this). Vector addition is very well adapted to computers, so the translation is easy to implement.

Note, that the translation is not a linear transformation (if 𝐚𝟎\mathbf{a}\neq\mathbf{0}): while it preserves the straight lines, it does not preserve 𝟎\mathbf{0}.

All other transformations used in computer graphics are linear. The first one that comes to mind is rotation. The rotation by \gamma around the origin \mathbf{0} is given by multiplication by the rotation matrix R_{\gamma} we discussed above,

Rγ=(cosγsinγsinγcosγ).R_{\gamma}=\left(\begin{array}[]{cc}\cos\gamma&-\sin\gamma\\ \sin\gamma&\cos\gamma\end{array}\right).

If we want to rotate around a point 𝐚\mathbf{a}, we first need to translate the picture by 𝐚-\mathbf{a}, moving the point 𝐚\mathbf{a} to 𝟎\mathbf{0}, then rotate around 𝟎\mathbf{0} (multiply by RγR_{\gamma}) and then translate everything back by 𝐚\mathbf{a}.

Another very useful transformation is scaling, given by a matrix

(a00b),\left(\begin{array}[]{cc}a&0\\ 0&b\end{array}\right),

a,b0a,b\geq 0. If a=ba=b it is uniform scaling which enlarges (reduces) an object, preserving its shape. If aba\neq b then xx and yy coordinates scale differently; the object becomes “taller” or “wider”.

Another often used transformation is reflection: for example the matrix

(1001),\left(\begin{array}[]{cc}1&0\\ 0&-1\end{array}\right),

defines the reflection through the x-axis.

We will show later in the book that any linear transformation in \mathbb{R}^{2} can be represented as a composition of scalings, rotations and reflections. However, it is sometimes convenient to consider some other transformations, like the shear transformation, given by the matrix

(1tanφ01).\left(\begin{array}[]{cc}1&\tan\varphi\\ 0&1\end{array}\right).

This transformation makes all objects slanted: horizontal lines remain horizontal, but vertical lines go to slanted lines at the angle \varphi to the vertical ones.
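In practice all of these 2-dimensional transformations are applied to many points at once: if the coordinates of the points are stored as the columns of a matrix, one matrix multiplication transforms the whole object. A possible sketch (NumPy; the triangle, the angle and the scaling factors are arbitrary choices made only for the illustration):

import numpy as np

# Vertices of a triangle stored as the columns of a 2x3 matrix.
points = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

gamma = np.pi / 6
R = np.array([[np.cos(gamma), -np.sin(gamma)],
              [np.sin(gamma),  np.cos(gamma)]])   # rotation by gamma about the origin
S = np.array([[2.0, 0.0],
              [0.0, 0.5]])                        # non-uniform scaling

a = np.array([[1.0],
              [3.0]])                             # translation vector (applied by addition, not by a matrix)

print(S @ (R @ points) + a)                       # rotate, then scale, then translate all vertices at once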

1.8.2. 3-dimensional graphics.

Three-dimensional graphics is more complicated. First we need to be able to manipulate 3-dimensional objects, and then we need to represent them on a 2-dimensional plane (the monitor).

The manipulations with 3-dimensional objects are pretty straightforward: we have the same basic transformations (translation, reflection through a plane, scaling, rotation). Matrices of these transformations are very similar to the matrices of their 2\times 2 counterparts. For example the matrices

(100010001),(a000b000c),(cosγsinγ0sinγcosγ0001)\left(\begin{array}[]{ccc}1&0&0\\ 0&1&0\\ 0&0&-1\end{array}\right),\qquad\left(\begin{array}[]{ccc}a&0&0\\ 0&b&0\\ 0&0&c\end{array}\right),\qquad\left(\begin{array}[]{ccc}\cos\gamma&-\sin\gamma% &0\\ \sin\gamma&\cos\gamma&0\\ 0&0&1\end{array}\right)

represent respectively reflection through xx-yy plane, scaling, and rotation around zz-axis.

Note, that the above rotation is essentially a 2-dimensional transformation: it does not change the z coordinate. Similarly, one can write matrices for the other 2 elementary rotations, around the x and around the y axes. It will be shown later that a rotation around an arbitrary axis can be represented as a composition of elementary rotations.

So, we know how to manipulate 3-dimensional objects. Let us now discuss how to represent such objects on a 2-dimensional plane. The simplest way is to project them to a plane, say to the x-y plane. To perform such a projection one just needs to replace the z coordinate by 0; the matrix of this projection is

(100010000).\left(\begin{array}[]{ccc}1&0&0\\ 0&1&0\\ 0&0&0\end{array}\right).

This method is often used in technical illustrations. Rotating an object and projecting it is equivalent to looking at it from different points. However, this method does not give a very realistic picture, because it does not take into account perspective, the fact that objects that are further away look smaller.

To get a more realistic picture one needs to use the so-called perspective projection. To define a perspective projection one needs to pick a point (the center of projection or the focal point) and a plane to project onto. Then each point in 3\mathbb{R}^{3} is projected into a point on the plane such that the point, its image and the center of the projection lie on the same line, see Fig. 1.2.

This is exactly how a camera works, and it is a reasonable first approximation of how our eyes work.

Figure 1.2. Perspective projection onto xx-yy plane: FF is the center (focal point) of the projection

Let us get a formula for the projection. Assume that the focal point is (0,0,d)^{T} and that we are projecting onto the x-y plane, see Fig. 1.3 a). Consider a point \mathbf{v}=(x,y,z)^{T}, and let \mathbf{v}^{*}=(x^{*},y^{*},0)^{T} be its projection. Analyzing similar triangles (see Fig. 1.3 b)), we get that

xd=xdz,\frac{x^{*}}{d}=\frac{x}{d-z},

so

x=xddz=x1z/d,x^{*}=\frac{xd}{d-z}=\frac{x}{1-z/d},

and similarly

y=y1z/d.y^{*}=\frac{y}{1-z/d}.

Note, that this formula also works if z>dz>d and if z<0z<0: you can draw the corresponding similar triangles to check it.

Thus the perspective projection maps a point (x,y,z)T(x,y,z)^{T} to the point (x1z/d,y1z/d,0)T\left(\frac{x}{1-z/d},\frac{y}{1-z/d},0\right)^{T}.

Figure 1.3. Finding coordinates x,yx^{*},\ y^{*} of the perspective projection of the point (x,y,z)T(x,y,z)^{T}.

This transformation is definitely not linear (because of zz in the denominator). However it is still possible to represent it as a linear transformation. To do this let us introduce the so-called homogeneous coordinates.

In homogeneous coordinates, every point in \mathbb{R}^{3} is represented by 4 coordinates, the last, 4th coordinate playing the role of the scaling coefficient. Thus, to get the usual 3-dimensional coordinates of the vector \mathbf{v}=(x,y,z)^{T} from its homogeneous coordinates (x_{1},x_{2},x_{3},x_{4})^{T} one needs to divide all entries by the last coordinate x_{4} and take the first 3 coordinates444If we multiply the homogeneous coordinates of a point in \mathbb{R}^{3} by a non-zero scalar, we do not change the point. In other words, in homogeneous coordinates a point in \mathbb{R}^{3} is represented by a line through \mathbf{0} in \mathbb{R}^{4}. (if x_{4}=0 this recipe does not work, so we assume that the case x_{4}=0 corresponds to the point at infinity).

Thus in homogeneous coordinates the vector 𝐯\mathbf{v}^{*} can be represented as (x,y,0,1z/d)T(x,y,0,1-z/d)^{T}, so in homogeneous coordinates the perspective projection is a linear transformation:

(xy01z/d)=(100001000000001/d1)(xyz1).\left(\begin{array}[]{c}x\\ y\\ 0\\ 1-z/d\end{array}\right)=\left(\begin{array}[]{cccccccccc}1&0&0&0\\ 0&1&0&0\\ 0&0&0&0\\ 0&0&-1/d&1\end{array}\right)\left(\begin{array}[]{c}x\\ y\\ z\\ 1\end{array}\right).

Note that in the homogeneous coordinates the translation is also a linear transformation:

(x+a1y+a2z+a31)=(100a1010a2001a30001)(xyz1).\left(\begin{array}[]{c}x+a_{1}\\ y+a_{2}\\ z+a_{3}\\ 1\end{array}\right)=\left(\begin{array}[]{cccccccccc}1&0&0&a_{1}\\ 0&1&0&a_{2}\\ 0&0&1&a_{3}\\ 0&0&0&1\end{array}\right)\left(\begin{array}[]{c}x\\ y\\ z\\ 1\end{array}\right).
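Both matrices are easy to try out. A minimal sketch (NumPy; the focal distance d=5, the translation vector and the point are arbitrary example values chosen only for the illustration):

import numpy as np

d = 5.0
P = np.array([[1, 0,    0, 0],
              [0, 1,    0, 0],
              [0, 0,    0, 0],
              [0, 0, -1/d, 1]])              # perspective projection onto the x-y plane

T = np.array([[1, 0, 0, 2],
              [0, 1, 0, -1],
              [0, 0, 1, 4],
              [0, 0, 0, 1]])                 # translation by (2, -1, 4)^T

v = np.array([1.0, 2.0, 3.0, 1.0])            # homogeneous coordinates of the point (1, 2, 3)^T

w = P @ v                                     # (x, y, 0, 1 - z/d) = (1, 2, 0, 0.4)
print(w[:3] / w[3])                           # ordinary coordinates of the projection: (2.5, 5, 0)
print((T @ v)[:3])                            # translated point: (3, 1, 7)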

But what happens if the center of projection is not the point (0,0,d)^{T} but some arbitrary point (d_{1},d_{2},d_{3})^{T}? Then we first need to apply the translation by -(d_{1},d_{2},0)^{T} to move the center to (0,0,d_{3})^{T} while preserving the x-y plane, apply the projection, and then move everything back, translating it by (d_{1},d_{2},0)^{T}. Similarly, if the plane we project to is not the x-y plane, we move it to the x-y plane by using rotations and translations, and so on.

All these operations are just multiplications by 4\times 4 matrices. That explains why modern graphics cards have 4\times 4 matrix operations embedded in the processor.

Of course, here we only touched on the mathematics behind 3-dimensional graphics; there is much more. For example, how to determine which parts of the object are visible and which are hidden, how to make realistic lighting, shades, etc.

Exercises.

1.8.1.

What vector in 3\mathbb{R}^{3} has homogeneous coordinates (10,20,30,5)(10,20,30,5)?

1.8.2.

Show that a rotation through γ\gamma can be represented as a composition of two shear-and-scale transformations

T1=(10sinγcosγ),T2=(secγtanγ01).T_{1}=\left(\begin{array}[]{cc}1&0\\ \sin\gamma&\cos\gamma\end{array}\right),\qquad T_{2}=\left(\begin{array}[]{cc}% \sec\gamma&-\tan\gamma\\ 0&1\end{array}\right).

In what order should the transformations be taken?

1.8.3.

Multiplication of a 2-vector by an arbitrary 2×22\times 2 matrix usually requires 44 multiplications.

Suppose a 2\times 1000 matrix D contains the coordinates of 1000 points in \mathbb{R}^{2}. How many multiplications are required to transform these points using 2 arbitrary 2\times 2 matrices A and B? Compare the 2 possibilities, A(BD) and (AB)D.

1.8.4.

Write a 4\times 4 matrix performing the perspective projection onto the x-y plane with center (d_{1},d_{2},d_{3})^{T}.

1.8.5.

A transformation TT in 3\mathbb{R}^{3} is a rotation about the line y=x+3y=x+3 in the xx-yy plane through an angle γ\gamma. Write a 4×44\times 4 matrix corresponding to this transformation.

You can leave the result as a product of matrices.