Chapter 2. Systems of linear equations

2.1. Different faces of linear systems.

There exist several points of view on what a system of linear equations, or, in short, a linear system, is. The first, naïve one is that it is simply a collection of $m$ linear equations with $n$ unknowns $x_1, x_2, \ldots, x_n$,

\[
\left\{\begin{array}{ccccccccc}
a_{1,1}x_1 & + & a_{1,2}x_2 & + & \ldots & + & a_{1,n}x_n & = & b_1 \\
a_{2,1}x_1 & + & a_{2,2}x_2 & + & \ldots & + & a_{2,n}x_n & = & b_2 \\
 & & & & \vdots & & & & \\
a_{m,1}x_1 & + & a_{m,2}x_2 & + & \ldots & + & a_{m,n}x_n & = & b_m.
\end{array}\right.
\]

To solve the system is to find all $n$-tuples of numbers $x_1, x_2, \ldots, x_n$ which satisfy all $m$ equations simultaneously.

If we denote $\mathbf{x} := (x_1, x_2, \ldots, x_n)^T \in \mathbb{F}^n$, $\mathbf{b} = (b_1, b_2, \ldots, b_m)^T \in \mathbb{F}^m$, and

\[
A = \left(\begin{array}{cccc}
a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\
a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\
\vdots & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & \ldots & a_{m,n}
\end{array}\right),
\]

then the above linear system can be written in matrix form (as a matrix-vector equation)

\[
A\mathbf{x} = \mathbf{b}.
\]

To solve the above equation is to find all vectors $\mathbf{x} \in \mathbb{F}^n$ satisfying $A\mathbf{x} = \mathbf{b}$.

And finally, recalling the “column by coordinate” rule of matrix-vector multiplication, we can write the system as a vector equation

\[
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \ldots + x_n\mathbf{a}_n = \mathbf{b},
\]

where $\mathbf{a}_k$ is the $k$th column of the matrix $A$, $\mathbf{a}_k = (a_{1,k}, a_{2,k}, \ldots, a_{m,k})^T$, $k = 1, 2, \ldots, n$.

Note that these three forms are essentially just different representations of the same mathematical object.
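As a quick illustration, here is a minimal numeric sketch (in Python with numpy; the matrix and the vector of unknowns are made up for illustration) showing that the matrix form and the vector-equation form are the same computation:

import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [2., 1., 2.]])
x = np.array([1., 3., -2.])                # some vector of "unknowns"

lhs_matrix = A @ x                         # matrix form: A x
lhs_columns = sum(x[k] * A[:, k] for k in range(3))   # x_1 a_1 + x_2 a_2 + x_3 a_3

print(np.allclose(lhs_matrix, lhs_columns))   # True: two faces of one object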

Before explaining how to solve a linear system, let us note that it does not matter what we call the unknowns: $x_k$, $y_k$, or something else. So, all the information necessary to solve the system is contained in the matrix $A$, which is called the coefficient matrix of the system, and in the vector $\mathbf{b}$ (the right side). Hence, all the information we need is contained in the following matrix

\[
\left(\begin{array}{cccc|c}
a_{1,1} & a_{1,2} & \ldots & a_{1,n} & b_1 \\
a_{2,1} & a_{2,2} & \ldots & a_{2,n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m,1} & a_{m,2} & \ldots & a_{m,n} & b_m
\end{array}\right)
\]

which is obtained by attaching the column $\mathbf{b}$ to the matrix $A$. This matrix is called the augmented matrix of the system. We will usually put a vertical line separating $A$ and $\mathbf{b}$ to distinguish between the augmented matrix and the coefficient matrix.

2.2. Solution of a linear system. Echelon and reduced echelon forms

Linear systems are solved by Gauss–Jordan elimination (sometimes called row reduction). By performing operations on the rows of the augmented matrix of the system (i.e. on the equations), we reduce it to a simple form, the so-called echelon form. When the system is in echelon form, one can easily write down the solution.

2.2.1. Row operations.

There are three types of row operations we use:

  1. Row exchange: interchange two rows of the matrix;

  2. Scaling: multiply a row by a non-zero scalar $a$;

  3. Row replacement: replace a row # $k$ by its sum with a constant multiple of a row # $j$; all other rows remain intact.

It is clear that the operations 1 and 2 do not change the set of solutions of the system; they essentially do not change the system.

As for the operation 3, one can easily see that it does not lose solutions. Namely, let a “new” system be obtained from an “old” one by a row operation of type 3. Then any solution of the “old” system is a solution of the “new” one.

To see that we do not gain anything extra, i.e. that any solution of the “new” system is also a solution of the “old” one, we just notice that row operations of type 3 are reversible, i.e. the “old” system can also be obtained from the “new” one by applying a row operation of type 3 (can you say which one?).
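Below is a minimal sketch of the three row operations in Python (numpy); the function names are ours, introduced only for illustration. Note that row_replace(A, k, j, -a) undoes row_replace(A, k, j, a), which is exactly the reversibility used above.

import numpy as np

def row_exchange(A, j, k):
    """Type 1: interchange rows j and k (0-based indices)."""
    A = A.copy()
    A[[j, k]] = A[[k, j]]
    return A

def row_scale(A, k, a):
    """Type 2: multiply row k by a non-zero scalar a."""
    assert a != 0
    A = A.copy()
    A[k] = a * A[k]
    return A

def row_replace(A, k, j, a):
    """Type 3: add a times row j to row k; all other rows stay intact."""
    A = A.copy()
    A[k] = A[k] + a * A[j]
    return A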

Row operations and multiplication by elementary matrices

There is another, more “advanced” explanation why the above row operations are legal. Namely, every row operation is equivalent to the multiplication of the matrix from the left by one of the special elementary matrices.

Namely, the multiplication by the matrix

\[
\begin{array}{c} \\ \\ j \\ \\ k \\ \\ \end{array}
\left(\begin{array}{ccccccc}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & \ldots & 1 & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & 1 & \ldots & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{array}\right)
\]

(rows $j$ and $k$ are marked; all non-marked off-diagonal entries are $0$) just interchanges the rows number $j$ and number $k$. Multiplication by the matrix

\[
\begin{array}{c} \\ \\ k \\ \\ \\ \end{array}
\left(\begin{array}{ccccc}
1 & & & & \\
 & \ddots & & & \\
 & & a & & \\
 & & & \ddots & \\
 & & & & 1
\end{array}\right)
\]

(the entry $a$ stands in row $k$ on the main diagonal; again all non-marked off-diagonal entries are $0$) multiplies the row number $k$ by $a$. Finally, multiplication by the matrix

\[
\begin{array}{c} \\ j \\ \\ k \\ \\ \end{array}
\left(\begin{array}{cccccc}
1 & & & & & \\
 & 1 & & & & \\
 & \vdots & \ddots & & & \\
 & a & \ldots & 1 & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{array}\right)
\]

(the entry $a$ stands in row $k$, column $j$) adds to the row # $k$ the row # $j$ multiplied by $a$, and leaves all other rows intact.

To see that the multiplication by these matrices works as advertised, one can just see how the multiplications act on vectors (columns).

A way to describe (or to remember) these elementary matrices: each of them is obtained by applying the corresponding row operation to the identity matrix $I$.

Note that all these matrices are invertible (compare with the reversibility of row operations). The inverse of the first matrix is the matrix itself. To get the inverse of the second one, one just replaces $a$ by $1/a$. And finally, the inverse of the third matrix is obtained by replacing $a$ by $-a$. To see that the inverses are indeed obtained this way, one again can simply check how they act on columns.
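A minimal numeric check of these claims (Python/numpy; the size, the rows and the scalar are arbitrary illustrations):

import numpy as np

n, j, k, a = 4, 1, 2, 5.0

# each elementary matrix is the row operation applied to the identity
E_exch = np.eye(n); E_exch[[j, k]] = E_exch[[k, j]]     # type 1
E_scale = np.eye(n); E_scale[k, k] = a                  # type 2
E_repl = np.eye(n); E_repl[k, j] = a                    # type 3

A = np.arange(16.).reshape(n, n)                        # any test matrix
B = A.copy(); B[k] = B[k] + a * B[j]
print(np.allclose(E_repl @ A, B))                       # type 3 acts as advertised

print(np.allclose(E_exch @ E_exch, np.eye(n)))          # its own inverse
E_scale_inv = np.eye(n); E_scale_inv[k, k] = 1 / a
print(np.allclose(E_scale @ E_scale_inv, np.eye(n)))    # replace a by 1/a
E_repl_inv = np.eye(n); E_repl_inv[k, j] = -a
print(np.allclose(E_repl @ E_repl_inv, np.eye(n)))      # replace a by -a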

So, performing a row operation on the augmented matrix of the system $A\mathbf{x}=\mathbf{b}$ is equivalent to the multiplication of the system (from the left) by a special invertible matrix $E$. Left multiplying the equality $A\mathbf{x}=\mathbf{b}$ by $E$ we get that any solution of the equation

\[
A\mathbf{x} = \mathbf{b}
\]

is also a solution of

\[
EA\mathbf{x} = E\mathbf{b}.
\]

Multiplying this equation (from the left) by $E^{-1}$ we get that any of its solutions is a solution of the equation

\[
E^{-1}EA\mathbf{x} = E^{-1}E\mathbf{b},
\]

which is the original equation $A\mathbf{x}=\mathbf{b}$. So, a row operation does not change the solution set of a system.

2.2.2. Row reduction.

The main step of row reduction consists of three sub-steps:

  1. Find the leftmost non-zero column of the matrix;

  2. Make sure, by applying row operations of type 1 (row exchange) if necessary, that the first (the upper) entry of this column is non-zero. This entry will be called the pivot entry or simply the pivot;

  3. “Kill” (i.e. make them $0$) all non-zero entries below the pivot by adding (subtracting) an appropriate multiple of the first row from the rows number $2, 3, \ldots, m$.

We apply the main step to a matrix, then we leave the first row alone and apply the main step to rows $2, \ldots, m$, then to rows $3, \ldots, m$, etc.

The point to remember is that after we subtract a multiple of a row from all rows below it (step 3), we leave it alone and do not change it in any way, not even interchange it with another row.

After applying the main step finitely many times (at most $m$), we get what is called the echelon form of the matrix.
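The main step iterated over the rows is easy to express in code; here is a minimal sketch in Python (numpy), using floating-point arithmetic and a simple non-zero test (the function name and the tolerance are our illustration). Running it on the augmented matrix of the example below reproduces the computation that follows.

import numpy as np

def echelon_form(M, tol=1e-12):
    """Forward phase of row reduction: return an echelon form of M."""
    A = M.astype(float).copy()
    m, n = A.shape
    row = 0
    for col in range(n):                       # leftmost columns first
        # find a row at or below `row` with a non-zero entry in column `col`
        piv = next((r for r in range(row, m) if abs(A[r, col]) > tol), None)
        if piv is None:
            continue                           # zero column: move right
        A[[row, piv]] = A[[piv, row]]          # type 1: put the pivot on top
        for r in range(row + 1, m):            # type 3: kill entries below the pivot
            A[r] -= (A[r, col] / A[row, col]) * A[row]
        row += 1                               # leave the pivot row alone from now on
        if row == m:
            break
    return A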

An example of row reduction.

Let us consider the following linear system:

\[
\left\{\begin{array}{l}
x_1 + 2x_2 + 3x_3 = 1 \\
3x_1 + 2x_2 + x_3 = 7 \\
2x_1 + x_2 + 2x_3 = 1
\end{array}\right.
\]

The augmented matrix of the system is

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
3 & 2 & 1 & 7 \\
2 & 1 & 2 & 1
\end{array}\right)
\]

Subtracting $3\cdot\text{Row 1}$ from the second row, and subtracting $2\cdot\text{Row 1}$ from the third one, we get:

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
3 & 2 & 1 & 7 \\
2 & 1 & 2 & 1
\end{array}\right)
\begin{array}{c} \\ -3R_1 \\ -2R_1 \end{array}
\qquad\sim\qquad
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & -4 & -8 & 4 \\
0 & -3 & -4 & -1
\end{array}\right)
\]

Multiplying the second row by $-1/4$ we get

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & -3 & -4 & -1
\end{array}\right)
\]

Adding $3\cdot\text{Row 2}$ to the third row we obtain

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & -3 & -4 & -1
\end{array}\right)
\begin{array}{c} \\ \\ +3R_2 \end{array}
\qquad\sim\qquad
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & 0 & 2 & -4
\end{array}\right)
\]

Now we can use the so-called back substitution to solve the system. Namely, from the last row (equation) we get $x_3 = -2$. Then from the second equation we get

\[
x_2 = -1 - 2x_3 = -1 - 2(-2) = 3,
\]

and finally, from the first row (equation)

\[
x_1 = 1 - 2x_2 - 3x_3 = 1 - 6 + 6 = 1.
\]

So, the solution is

\[
\left\{\begin{array}{l}
x_1 = 1, \\
x_2 = 3, \\
x_3 = -2,
\end{array}\right.
\]

or in vector form

\[
\mathbf{x} = \left(\begin{array}{r} 1 \\ 3 \\ -2 \end{array}\right)
\]

or $\mathbf{x} = (1, 3, -2)^T$. We can check the solution by computing $A\mathbf{x}$, where $A$ is the coefficient matrix.

Instead of using back substitution, we can do row reduction from bottom to top, killing all the entries above the main diagonal of the coefficient matrix: we start by multiplying the last row by $1/2$, and the rest is pretty self-explanatory:

\[
\left(\begin{array}{ccc|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & 0 & 1 & -2
\end{array}\right)
\begin{array}{c} -3R_3 \\ -2R_3 \\ \\ \end{array}
\sim
\left(\begin{array}{ccc|r}
1 & 2 & 0 & 7 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & -2
\end{array}\right)
\begin{array}{c} -2R_2 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{ccc|r}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & -2
\end{array}\right)
\]

and we just read the solution $\mathbf{x} = (1, 3, -2)^T$ off the augmented matrix.
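This bottom-to-top reduction is what a computer algebra system's rref (reduced row echelon form) routine produces; a quick check of this example in Python (sympy):

from sympy import Matrix

M = Matrix([[1, 2, 3, 1],
            [3, 2, 1, 7],
            [2, 1, 2, 1]])        # the augmented matrix of the system

R, pivot_cols = M.rref()          # reduced echelon form and pivot column indices
print(R)                          # Matrix([[1, 0, 0, 1], [0, 1, 0, 3], [0, 0, 1, -2]])
print(pivot_cols)                 # (0, 1, 2)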

We leave it as an exercise to the reader to formulate the algorithm for the backward phase of the row reduction.

2.2.3. Echelon form

A matrix is in echelon form if it satisfies the following two conditions:

  1. All zero rows (i.e. the rows with all entries equal to $0$), if any, are below all non-zero rows.

For a non-zero row, let us call the leftmost non-zero entry the leading entry. Then the second property of the echelon form can be formulated as follows:

  2. For any non-zero row, its leading entry is strictly to the right of the leading entry in the previous row.

The leading entry in each row of a matrix in echelon form is also called a pivot entry, or simply a pivot, because these entries are exactly the pivots we used in the row reduction.

A particular case of the echelon form is the so-called triangular form. We got this form in our example above. In this form the coefficient matrix is square ($n\times n$), all its entries on the main diagonal are non-zero, and all the entries below the main diagonal are zero. The right side, i.e. the rightmost column of the augmented matrix, can be arbitrary.

After the backward phase of the row reduction, we get the so-called reduced echelon form of the matrix: a coefficient matrix equal to $I$, as in the above example, is a particular case of the reduced echelon form.

The general definition is as follows: we say that a matrix is in the reduced echelon form if it is in the echelon form and

  3. All pivot entries are equal to $1$;

  4. All entries above the pivots are $0$. Note that all entries below the pivots are also $0$ because of the echelon form.

To get reduced echelon form from echelon form, we work from the bottom to the top and from the right to the left, using row replacement to kill all entries above the pivots.

An example of the reduced echelon form is a system with the coefficient matrix equal to $I$. In this case, one just reads the solution off the reduced echelon form. In the general case, one can also easily read the solution off the reduced echelon form. For example, let the reduced echelon form of the system (augmented matrix) be

\[
\left(\begin{array}{ccccc|c}
\framebox{$1$} & 2 & 0 & 0 & 0 & 1 \\
0 & 0 & \framebox{$1$} & 5 & 0 & 2 \\
0 & 0 & 0 & 0 & \framebox{$1$} & 3
\end{array}\right);
\]

here we boxed the pivots. The idea is to move the variables corresponding to the columns without a pivot (the so-called free variables) to the right side. Then we can just write the solution:

\[
\left\{\begin{array}{l}
x_1 = 1 - 2x_2, \\
x_2 \text{ is free}, \\
x_3 = 2 - 5x_4, \\
x_4 \text{ is free}, \\
x_5 = 3,
\end{array}\right.
\]

or in the vector form

\[
\mathbf{x} = \left(\begin{array}{c} 1 - 2x_2 \\ x_2 \\ 2 - 5x_4 \\ x_4 \\ 3 \end{array}\right)
= \left(\begin{array}{c} 1 \\ 0 \\ 2 \\ 0 \\ 3 \end{array}\right)
+ x_2 \left(\begin{array}{c} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{array}\right)
+ x_4 \left(\begin{array}{c} 0 \\ 0 \\ -5 \\ 1 \\ 0 \end{array}\right),
\qquad x_2, x_4 \in \mathbb{F}.
\]

One can also find the solution from the echelon form by using back substitution: the idea is to work from bottom to top, moving all free variables to the right side.
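Reading the general solution off the reduced echelon form is also what sympy's linsolve does; a small sketch on the example above (the variable names are ours):

from sympy import Matrix, linsolve, symbols

M = Matrix([[1, 2, 0, 0, 0, 1],
            [0, 0, 1, 5, 0, 2],
            [0, 0, 0, 0, 1, 3]])      # the reduced echelon form displayed above
A, b = M[:, :-1], M[:, -1]

x1, x2, x3, x4, x5 = symbols('x1:6')
print(linsolve((A, b), [x1, x2, x3, x4, x5]))
# {(1 - 2*x2, x2, 2 - 5*x4, x4, 3)} -- x2 and x4 remain as free parameters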

Exercises.

2.2.1.

Write the systems of equations below in matrix form and as vector equations:

  a)
\[
\left\{\begin{array}{l}
x_1 + 2x_2 - x_3 = -1 \\
2x_1 + 2x_2 + x_3 = 1 \\
3x_1 + 5x_2 - 2x_3 = -1
\end{array}\right.
\]

  b)
\[
\left\{\begin{array}{l}
x_1 - 2x_2 - x_3 = 1 \\
2x_1 - 3x_2 + x_3 = 6 \\
3x_1 - 5x_2 = 7 \\
x_1 + 5x_3 = 9
\end{array}\right.
\]

  c)
\[
\left\{\begin{array}{l}
x_1 + 2x_2 + 2x_4 = 6 \\
3x_1 + 5x_2 - x_3 + 6x_4 = 17 \\
2x_1 + 4x_2 + x_3 + 2x_4 = 12 \\
2x_1 - 7x_3 + 11x_4 = 7
\end{array}\right.
\]

  d)
\[
\left\{\begin{array}{l}
x_1 - 4x_2 - x_3 + x_4 = 3 \\
2x_1 - 8x_2 + x_3 - 4x_4 = 9 \\
-x_1 + 4x_2 - 2x_3 + 5x_4 = -6
\end{array}\right.
\]

  e)
\[
\left\{\begin{array}{l}
x_1 + 2x_2 - x_3 + 3x_4 = 2 \\
2x_1 + 4x_2 - x_3 + 6x_4 = 5 \\
x_2 + 2x_4 = 3
\end{array}\right.
\]

  f)
\[
\left\{\begin{array}{l}
2x_1 - 2x_2 - x_3 + 6x_4 - 2x_5 = 1 \\
x_1 - x_2 + x_3 + 2x_4 - x_5 = 2 \\
4x_1 - 4x_2 + 5x_3 + 7x_4 - x_5 = 6
\end{array}\right.
\]

  g)
\[
\left\{\begin{array}{l}
3x_1 - x_2 + x_3 - x_4 + 2x_5 = 5 \\
x_1 - x_2 - x_3 - 2x_4 - x_5 = 2 \\
5x_1 - 2x_2 + x_3 - 3x_4 + 3x_5 = 10 \\
2x_1 - x_2 - 2x_4 + x_5 = 5
\end{array}\right.
\]

Solve the systems. Write the answers in the vector form.

2.2.2.

Find all solutions of the vector equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 = \mathbf{0},
\]

where $\mathbf{v}_1 = (1, 1, 0)^T$, $\mathbf{v}_2 = (0, 1, 1)^T$ and $\mathbf{v}_3 = (1, 0, 1)^T$. What conclusion can you make about the linear independence (dependence) of the system of vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$?

2.3. Analyzing the pivots.

All questions about the existence of a solution and its uniqueness can be answered by analyzing pivots in the echelon (reduced echelon) form of the augmented matrix of the system. First of all, let us investigate when the equation $A\mathbf{x}=\mathbf{b}$ is inconsistent, i.e. when it does not have a solution. The answer follows immediately, if one just thinks about it:

a system is inconsistent (does not have a solution) if and only if there is a pivot in the last column of an echelon form of the augmented matrix, i.e. iff an echelon form of the augmented matrix has a row $\left(\begin{array}{cccc|c} 0 & 0 & \ldots & 0 & b \end{array}\right)$, $b \neq 0$, in it.

Indeed, such a row corresponds to the equation $0x_1 + 0x_2 + \ldots + 0x_n = b \neq 0$, which does not have a solution. If we don't have such a row, we just make the reduced echelon form and then read the solution off it.
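In terms of computations, a pivot appears in the last column of the augmented matrix exactly when attaching $\mathbf{b}$ increases the rank; a small sketch of this test in Python (sympy; the test matrices are made up for illustration):

from sympy import Matrix

def is_consistent(A, b):
    """No pivot in the last augmented column <=> the rank does not jump."""
    return A.rank() == Matrix.hstack(A, b).rank()

A = Matrix([[1, 2],
            [2, 4]])
print(is_consistent(A, Matrix([3, 6])))   # True: b lies in the column space
print(is_consistent(A, Matrix([3, 7])))   # False: echelon form has a row (0 0 | b), b != 0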

Now, three more statements. Note that they all deal with the coefficient matrix, and not with the augmented matrix of the system.

  1. A solution (if it exists) is unique iff there are no free variables, that is, if and only if the echelon form of the coefficient matrix has a pivot in every column;

  2. The equation $A\mathbf{x}=\mathbf{b}$ is consistent for all right sides $\mathbf{b}$ if and only if the echelon form of the coefficient matrix has a pivot in every row;

  3. The equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any right side $\mathbf{b}$ if and only if the echelon form of the coefficient matrix $A$ has a pivot in every column and every row.

The first statement is trivial, because free variables are responsible for all non-uniqueness. I should only emphasize that this statement does not say anything about existence.

The second statement is a tiny bit more complicated. If we have a pivot in every row of the coefficient matrix, we cannot have a pivot in the last column of the augmented matrix, so the system is always consistent, no matter what the right side $\mathbf{b}$ is.

Let us show that if we have a zero row in the echelon form of the coefficient matrix $A$, then we can pick a right side $\mathbf{b}$ such that the system $A\mathbf{x}=\mathbf{b}$ is not consistent. Let $A_e$ be the echelon form of the coefficient matrix $A$. Then

\[
A_e = EA,
\]

where $E$ is the product of elementary matrices corresponding to the row operations, $E = E_N \ldots E_2 E_1$. If $A_e$ has a zero row, then the last row is also zero. Therefore, if we put $\mathbf{b}_e = (0, \ldots, 0, 1)^T$ (all entries are $0$, except the last one), then the equation

\[
A_e\mathbf{x} = \mathbf{b}_e
\]

does not have a solution. Multiplying this equation by $E^{-1}$ from the left, and recalling that $E^{-1}A_e = A$, we get that the equation

\[
A\mathbf{x} = E^{-1}\mathbf{b}_e
\]

does not have a solution.

Finally, statement 3 immediately follows from statements 1 and 2. ∎

From the above analysis of pivots we get several very important corollaries. The main observation we use is

In echelon form, any row and any column contains at most one pivot (it can contain none).

2.3.1. Corollaries about linear independence and bases. Dimension

Questions as to when a system of vectors in $\mathbb{F}^n$ is a basis, a linearly independent system, or a spanning system can be easily answered by row reduction.

Proposition 2.3.1.

Let us have a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$, and let $A = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Then

  1. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is linearly independent iff the echelon form of $A$ has a pivot in every column;

  2. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is complete in $\mathbb{F}^n$ (spanning, generating) iff the echelon form of $A$ has a pivot in every row;

  3. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is a basis in $\mathbb{F}^n$ iff the echelon form of $A$ has a pivot in every column and in every row.

Proof.

The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is linearly independent if and only if the equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_m\mathbf{v}_m = \mathbf{0}
\]

has the unique (trivial) solution $x_1 = x_2 = \ldots = x_m = 0$, or, equivalently, the equation $A\mathbf{x}=\mathbf{0}$ has the unique solution $\mathbf{x}=\mathbf{0}$. By statement 1 above, this happens if and only if there is a pivot in every column of the echelon form of the matrix.

Similarly, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is complete in $\mathbb{F}^n$ if and only if the equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_m\mathbf{v}_m = \mathbf{b}
\]

has a solution for any right side $\mathbf{b} \in \mathbb{F}^n$. By statement 2 above, this happens if and only if there is a pivot in every row of the echelon form of the matrix.

And finally, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is a basis in $\mathbb{F}^n$ if and only if the equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_m\mathbf{v}_m = \mathbf{b}
\]

has a unique solution for any right side $\mathbf{b} \in \mathbb{F}^n$. By statement 3, this happens if and only if there is a pivot in every column and in every row of the echelon form of $A$. ∎
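A compact sketch of these pivot tests in Python (sympy), run on the vectors from Exercise 2.2.2 above:

from sympy import Matrix

v1, v2, v3 = Matrix([1, 1, 0]), Matrix([0, 1, 1]), Matrix([1, 0, 1])
A = Matrix.hstack(v1, v2, v3)      # the n x m matrix with the vectors as columns
n, m = A.shape

R, pivot_cols = A.rref()
print(len(pivot_cols) == m)        # pivot in every column: linearly independent
print(len(pivot_cols) == n)        # pivot in every row: spanning
# both print True here, so v1, v2, v3 form a basis in F^3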

Proposition 2.3.2.

Any linearly independent system of vectors in $\mathbb{F}^n$ cannot have more than $n$ vectors in it.

Proof.

Let a system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ be linearly independent, and let $A = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. By Proposition 2.3.1 the echelon form of $A$ must have a pivot in every column, which is impossible if $m > n$ (the number of pivots cannot be more than the number of rows). ∎

Proposition 2.3.3.

Any two bases in a vector space $V$ have the same number of vectors in them.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$ be two different bases in $V$. Without loss of generality we can assume that $n \leq m$. Consider an isomorphism $A: \mathbb{F}^n \to V$ defined by

\[
A\mathbf{e}_k = \mathbf{v}_k, \qquad k = 1, 2, \ldots, n,
\]

where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard basis in $\mathbb{F}^n$.

Since $A^{-1}$ is also an isomorphism, the system

\[
A^{-1}\mathbf{w}_1, A^{-1}\mathbf{w}_2, \ldots, A^{-1}\mathbf{w}_m
\]

is a basis (see Theorem 1.6.6 in Chapter 1). So it is linearly independent, and by Proposition 2.3.2, $m \leq n$. Together with the assumption $n \leq m$ this implies that $m = n$. ∎

The statement below is a particular case of the above proposition.

Proposition 2.3.4.

Any basis in $\mathbb{F}^n$ must have exactly $n$ vectors in it.

Proof.

This fact follows immediately from the previous proposition, but there is also a direct proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be a basis in $\mathbb{F}^n$ and let $A$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. The fact that the system is a basis means that the equation

\[
A\mathbf{x} = \mathbf{b}
\]

has a unique solution for any (all possible) right side $\mathbf{b}$. The existence means that there is a pivot in every row (of a reduced echelon form of the matrix), hence the number of pivots is exactly $n$. The uniqueness means that there is a pivot in every column of the coefficient matrix (of its echelon form), so

\[
m = \text{number of columns} = \text{number of pivots} = n. \qquad ∎
\]

Proposition 2.3.5.

Any spanning (generating) set in $\mathbb{F}^n$ must have at least $n$ vectors.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be a complete system in $\mathbb{F}^n$, and let $A$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Statement 2 of Proposition 2.3.1 implies that the echelon form of $A$ has a pivot in every row. Since the number of pivots cannot exceed the number of columns, $n \leq m$. ∎

2.3.2. Corollaries about invertible matrices

Proposition 2.3.6.

A matrix $A$ is invertible if and only if its echelon form has a pivot in every column and every row.

Proof.

As was discussed in the beginning of the section, the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any right side $\mathbf{b}$ if and only if the echelon form of $A$ has a pivot in every row and every column. But we know (see Theorem 1.6.8 in Chapter 1) that the matrix (linear transformation) $A$ is invertible if and only if the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any possible right side $\mathbf{b}$.

There is also an alternative proof. We know that a matrix is invertible if and only if its columns form a basis in $\mathbb{F}^n$ (see Corollary 1.6.9 in Section 1.6.4, Chapter 1). Proposition 2.3.1 above states that this happens if and only if there is a pivot in every row and every column. ∎

The above proposition immediately implies the following

Corollary 2.3.7.

An invertible matrix must be square ($n\times n$).

Proposition 2.3.8.

If a square ($n\times n$) matrix is left invertible, or if it is right invertible, then it is invertible. In other words, to check the invertibility of a square matrix $A$ it is sufficient to check only one of the conditions $AA^{-1}=I$, $A^{-1}A=I$.

Note that this proposition applies only to square matrices!

Proof.

We know that a matrix $A$ is invertible if and only if the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any right side $\mathbf{b}$. This happens if and only if the echelon form of the matrix $A$ has pivots in every row and in every column.

If a matrix $A$ is left invertible, the equation $A\mathbf{x}=\mathbf{0}$ has the unique solution $\mathbf{x}=\mathbf{0}$. Indeed, if $B$ is a left inverse of $A$ (i.e. $BA=I$), and $\mathbf{x}$ satisfies

\[
A\mathbf{x} = \mathbf{0},
\]

then multiplying this identity by $B$ from the left we get $\mathbf{x}=\mathbf{0}$, so the solution is unique. Therefore, the echelon form of $A$ has pivots in every column (no free variables). If the matrix $A$ is square ($n\times n$), the echelon form also has pivots in every row ($n$ pivots, and a row can have no more than one pivot), so the matrix is invertible.

If a matrix $A$ is right invertible, and $C$ is its right inverse ($AC=I$), then for $\mathbf{x}=C\mathbf{b}$, $\mathbf{b}\in\mathbb{F}^n$,

\[
A\mathbf{x} = AC\mathbf{b} = I\mathbf{b} = \mathbf{b}.
\]

Therefore, for any right side $\mathbf{b}$ the equation $A\mathbf{x}=\mathbf{b}$ has a solution $\mathbf{x}=C\mathbf{b}$. Thus, the echelon form of $A$ has a pivot in every row. If $A$ is square, it also has a pivot in every column, so $A$ is invertible. ∎

Exercises.

2.3.1.

For what value of $b$ does the system

\[
\left(\begin{array}{ccc}
1 & 2 & 2 \\
2 & 4 & 6 \\
1 & 2 & 3
\end{array}\right)\mathbf{x}
= \left(\begin{array}{c} 1 \\ 4 \\ b \end{array}\right)
\]

have a solution? Find the general solution of the system for this value of $b$.

2.3.2.

Determine whether the vectors

\[
\left(\begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right),\qquad
\left(\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right),\qquad
\left(\begin{array}{c} 0 \\ 0 \\ 1 \\ 1 \end{array}\right),\qquad
\left(\begin{array}{c} 0 \\ 1 \\ 0 \\ 1 \end{array}\right)
\]

are linearly independent or not.

Do these four vectors span $\mathbb{R}^4$? (In other words, do they form a generating system?) What about $\mathbb{C}^4$?

2.3.3.

Determine which of the following systems of vectors are bases in $\mathbb{R}^3$:

  a) $(1, 2, -1)^T$, $(1, 0, 2)^T$, $(2, 1, 1)^T$;

  b) $(-1, 3, 2)^T$, $(-3, 1, 3)^T$, $(2, 10, 2)^T$;

  c) $(67, 13, -47)^T$, $(\pi, -7.84, 0)^T$, $(3, 0, 0)^T$.

Which of these systems are bases in $\mathbb{C}^3$?

2.3.4.

Do the polynomials $x^3 + 2x$, $x^2 + x + 1$, $x^3 + 5$ generate (span) $\mathbb{P}_3$? Justify your answer.

2.3.5.

Can 5 vectors in $\mathbb{F}^4$ be linearly independent? Justify your answer.

2.3.6.

Prove or disprove: If the columns of a square ($n\times n$) matrix $A$ are linearly independent, so are the columns of $A^2 = AA$.

2.3.7.

Prove or disprove: If the columns of a square ($n\times n$) matrix $A$ are linearly independent, so are the rows of $A^3 = AAA$.

2.3.8.

Show that if the equation $A\mathbf{x}=\mathbf{0}$ has a unique solution (i.e. if the echelon form of $A$ has a pivot in every column), then $A$ is left invertible. Hint: elementary matrices may help.
Note: It was shown in the text that if $A$ is left invertible, then the equation $A\mathbf{x}=\mathbf{0}$ has a unique solution. But here you are asked to prove the converse of this statement, which was not proved.
Remark: This can be a very hard problem, for it requires deep understanding of the subject. However, when you understand what to do, the problem becomes almost trivial.

2.3.9.

Is the reduced echelon form of a matrix unique? Justify your conclusion.

Namely, suppose that by performing some row operations (not necessarily following any algorithm) we end up with a reduced echelon matrix. Do we always end up with the same matrix, or can we get different ones? Note that we are only allowed to perform row operations; “column operations” are forbidden.

Hint: What happens if you start with an invertible matrix? Also, are the pivots always in the same columns, or can they be in different columns depending on what row operations you perform? If you can tell what the pivot columns are without resorting to row operations, then the positions of the pivot columns do not depend on them.

2.4. Finding $A^{-1}$ by row reduction.

As was discussed above, an invertible matrix must be square, and its echelon form must have pivots in every row and every column. Therefore the reduced echelon form of an invertible matrix is the identity matrix $I$. Thus,

Any invertible matrix is row equivalent (i.e. can be reduced by row operations) to the identity matrix.

Now let us state a simple algorithm for finding the inverse of an $n\times n$ matrix:

  1. Form an augmented $n\times 2n$ matrix $(A\,|\,I)$ by writing the $n\times n$ identity matrix to the right of $A$;

  2. By performing row operations on the augmented matrix, transform $A$ to the identity matrix $I$;

  3. The matrix $I$ that we added will be automatically transformed to $A^{-1}$;

  4. If it is impossible to transform $A$ to the identity by row operations, $A$ is not invertible.

There are several possible explanations of the above algorithm. The first, naïve one is as follows: we know that (for an invertible $A$) the vector $A^{-1}\mathbf{b}$ is the solution of the equation $A\mathbf{x}=\mathbf{b}$. So to find the column number $k$ of $A^{-1}$ we need to find the solution of $A\mathbf{x}=\mathbf{e}_k$, where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard basis in $\mathbb{F}^n$. The above algorithm just solves the equations

\[
A\mathbf{x} = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n
\]

simultaneously!

Let us also present another, more “advanced” explanation. As we discussed above, every row operation can be realized as a left multiplication by an elementary matrix. Let $E_1, E_2, \ldots, E_N$ be the elementary matrices corresponding to the row operations we performed, and let $E = E_N \cdots E_2 E_1$ be their product. (Although it does not matter here, please notice that since the row operation $E_1$ was performed first, $E_1$ must be the rightmost term in the product.) We know that the row operations transform $A$ to the identity, i.e. $EA = I$, so $E = A^{-1}$. But the same row operations transform the augmented matrix $(A\,|\,I)$ to $(EA\,|\,E) = (I\,|\,A^{-1})$. ∎
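The whole algorithm fits in a few lines of Python (sympy); a sketch, run here on the matrix of the worked example below:

from sympy import Matrix, eye

A = Matrix([[1, 4, -2],
            [-2, -7, 7],
            [3, 11, -6]])

M = Matrix.hstack(A, eye(3))     # the augmented matrix (A | I)
R, pivots = M.rref()             # row reduce until the left block becomes I

A_inv = R[:, 3:]                 # the right block is then A^{-1}
print(A_inv == A.inv())          # True
print(A * A_inv == eye(3))       # True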

This “advanced” explanation using elementary matrices implies an important proposition that will often be used later.

Theorem 2.4.1.

Any invertible matrix can be represented as a product of elementary matrices.

Proof.

As we discussed in the previous paragraph, $A^{-1} = E_N \cdots E_2 E_1$, so

\[
A = (A^{-1})^{-1} = E_1^{-1} E_2^{-1} \cdots E_N^{-1}. \qquad ∎
\]
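This factorization can be computed by recording the elementary matrix of each row operation during Gauss–Jordan elimination; a sketch in Python (numpy, floating point; the function name is ours):

import numpy as np

def elementary_factors(A, tol=1e-12):
    """Return E_1, ..., E_N with E_N ... E_1 A = I (A assumed invertible)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    factors = []

    def apply(E):
        nonlocal A
        factors.append(E)
        A = E @ A

    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r, col]))
        if abs(A[piv, col]) < tol:
            raise ValueError("matrix is not invertible")
        if piv != col:                                   # type 1: row exchange
            E = np.eye(n); E[[col, piv]] = E[[piv, col]]; apply(E)
        E = np.eye(n); E[col, col] = 1 / A[col, col]; apply(E)   # type 2: pivot -> 1
        for r in range(n):                               # type 3: clear the column
            if r != col and abs(A[r, col]) > tol:
                E = np.eye(n); E[r, col] = -A[r, col]; apply(E)
    return factors

A = np.array([[1., 4., -2.], [-2., -7., 7.], [3., 11., -6.]])
product = np.eye(3)
for E in elementary_factors(A):          # A = E_1^{-1} E_2^{-1} ... E_N^{-1}
    product = product @ np.linalg.inv(E)
print(np.allclose(product, A))           # True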

An Example.

Suppose we want to find the inverse of the matrix

\[
\left(\begin{array}{rrr}
1 & 4 & -2 \\
-2 & -7 & 7 \\
3 & 11 & -6
\end{array}\right).
\]

Augmenting the identity matrix to it and performing row reduction we get

\[
\left(\begin{array}{rrr|rrr}
1 & 4 & -2 & 1 & 0 & 0 \\
-2 & -7 & 7 & 0 & 1 & 0 \\
3 & 11 & -6 & 0 & 0 & 1
\end{array}\right)
\begin{array}{c} \\ +2R_1 \\ -3R_1 \end{array}
\sim
\left(\begin{array}{rrr|rrr}
1 & 4 & -2 & 1 & 0 & 0 \\
0 & 1 & 3 & 2 & 1 & 0 \\
0 & -1 & 0 & -3 & 0 & 1
\end{array}\right)
\begin{array}{c} \\ \\ +R_2 \end{array}
\sim
\]
\[
\left(\begin{array}{rrr|rrr}
1 & 4 & -2 & 1 & 0 & 0 \\
0 & 1 & 3 & 2 & 1 & 0 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\begin{array}{c} \times 3 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{rrr|rrr}
3 & 12 & -6 & 3 & 0 & 0 \\
0 & 1 & 3 & 2 & 1 & 0 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\begin{array}{c} +2R_3 \\ -R_3 \\ \\ \end{array}
\]

Here in the last row operation we multiplied the first row by 33 to avoid fractions in the backward phase of row reduction. Continuing with the row reduction we get

\[
\left(\begin{array}{rrr|rrr}
3 & 12 & 0 & 1 & 2 & 2 \\
0 & 1 & 0 & 3 & 0 & -1 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\begin{array}{c} -12R_2 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{rrr|rrr}
3 & 0 & 0 & -35 & 2 & 14 \\
0 & 1 & 0 & 3 & 0 & -1 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\]

Dividing the first and the last row by 3 we get the inverse matrix

\[
\left(\begin{array}{ccc}
-35/3 & 2/3 & 14/3 \\
3 & 0 & -1 \\
-1/3 & 1/3 & 1/3
\end{array}\right)
\]

Exercises.

2.4.1.

Find the inverse of the matrices

\[
\left(\begin{array}{ccc}
1 & 2 & 1 \\
3 & 7 & 3 \\
2 & 3 & 4
\end{array}\right),\qquad
\left(\begin{array}{rrr}
1 & -1 & 2 \\
1 & 1 & -2 \\
1 & 1 & 4
\end{array}\right).
\]

Show all steps.

2.5. Dimension. Finite-dimensional spaces.

Definition.

The dimension $\dim V$ of a vector space $V$ is the number of vectors in a basis.

For a vector space consisting only of the zero vector $\mathbf{0}$ we put $\dim V = 0$. If $V$ does not have a (finite) basis, we put $\dim V = \infty$.

If $\dim V$ is finite, we call the space $V$ finite-dimensional; otherwise we call it infinite-dimensional.

Proposition 2.3.3 asserts that the dimension is well defined, i.e. that it does not depend on the choice of a basis.

Proposition 1.2.8 from Chapter 1 states that any finite spanning system in a vector space $V$ contains a basis. This immediately implies the following

Proposition 2.5.1.

A vector space $V$ is finite-dimensional if and only if it has a finite spanning system.

Suppose that we have a system of vectors in a finite-dimensional vector space, and we want to check whether it is a basis (or whether it is linearly independent, or whether it is complete). Probably the simplest way is to use an isomorphism $A: V \to \mathbb{F}^n$, $n = \dim V$, to move the problem to $\mathbb{F}^n$, where all such questions can be answered by row reduction (studying pivots).

Note that if $\dim V = n$, then there always exists an isomorphism $A: V \to \mathbb{F}^n$. Indeed, if $\dim V = n$ then there exists a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \in V$, and one can define an isomorphism $A: V \to \mathbb{F}^n$ by

\[
A\mathbf{v}_k = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n.
\]
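A small illustration of this idea (Python/sympy) for a space of polynomials, revisiting Exercise 2.3.4 above: the coordinate isomorphism sends a polynomial to its vector of coefficients in the basis $1, x, x^2, x^3$, and the question of spanning becomes a pivot count.

from sympy import Matrix

# coordinates of x^3 + 2x, x^2 + x + 1, x^3 + 5 in the basis 1, x, x^2, x^3
coords = Matrix([[0, 1, 5],     # constant terms
                 [2, 1, 0],     # coefficients of x
                 [0, 1, 0],     # coefficients of x^2
                 [1, 0, 1]])    # coefficients of x^3

R, pivot_cols = coords.rref()
print(len(pivot_cols))          # 3 pivots, but 4 rows: no pivot in every row,
                                # so these three polynomials do not span P_3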

As an example, let us give the following two corollaries of the above Propositions 2.3.2 and 2.3.5:

Proposition 2.5.2.

Any linearly independent system in a finite-dimensional vector space $V$ cannot have more than $\dim V$ vectors in it.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in V$ be a linearly independent system, and let $A: V \to \mathbb{F}^n$ be an isomorphism. Then $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_m$ is a linearly independent system in $\mathbb{F}^n$, and by Proposition 2.3.2, $m \leq n$. ∎

Proposition 2.5.3.

Any generating system in a finite-dimensional vector space $V$ must have at least $\dim V$ vectors in it.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in V$ be a complete system in $V$, and let $A: V \to \mathbb{F}^n$ be an isomorphism. Then $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_m$ is a complete system in $\mathbb{F}^n$, and by Proposition 2.3.5, $m \geq n$. ∎

2.5.1. Completing a linearly independent system to a basis

The following statement will play an important role later.

Proposition 2.5.4 (Completion to a basis).

A linearly independent system of vectors in a finite-dimensional space can be completed to a basis, i.e. if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are linearly independent vectors in a finite-dimensional vector space $V$, then one can find vectors $\mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \ldots, \mathbf{v}_n$ such that the system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a basis in $V$.

Proof.

Let $n = \dim V$. Assume that the system is not generating, since otherwise it is already a basis. Take any vector (let us call it $\mathbf{v}_{r+1}$) not belonging to $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ (one can always do that because the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is not generating). By Exercise 1.2.5 from Chapter 1 the system $\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent (notice that in this case $r < n$ by Proposition 2.5.2). Repeat the procedure with the new system to get a vector $\mathbf{v}_{r+2}$, and so on.

We stop the process when we get a generating system. Note that the process cannot continue indefinitely, because a linearly independent system of vectors in $V$ cannot have more than $n = \dim V$ vectors. ∎
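For $V = \mathbb{F}^n$ the completion can be carried out mechanically: attach the standard basis vectors as extra candidates and keep the pivot columns. A sketch in Python (sympy; the helper name is ours), run on the vectors of Exercise 2.5.6 below:

from sympy import Matrix, eye

def complete_to_basis(vectors):
    """Complete a linearly independent list of vectors in F^n to a basis."""
    n = vectors[0].rows
    M = Matrix.hstack(*vectors, eye(n))   # candidates: the given vectors, then e_1, ..., e_n
    _, pivot_cols = M.rref()              # pivot columns are linearly independent and span F^n
    return [M[:, j] for j in pivot_cols]

v1 = Matrix([2, -1, 1, 5, -3])
v2 = Matrix([3, -2, 0, 0, 0])
v3 = Matrix([1, 1, 50, -921, 0])
basis = complete_to_basis([v1, v2, v3])
print(len(basis))                         # 5: a basis of F^5 containing v1, v2, v3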

2.5.2. Subspaces of finite dimensional spaces

Theorem 2.5.5.

Let $V$ be a subspace of a vector space $W$, $\dim W < \infty$. Then $V$ is finite-dimensional and $\dim V \leq \dim W$.

Moreover, if $\dim V = \dim W$, then $V = W$ (we are still assuming that $V$ is a subspace of $W$ here).

Remark.

This theorem looks like a complete banality, like an easy corollary of Proposition 2.5.2. But we can apply Proposition 2.5.2 only if we already have a basis in $V$. And we only have a basis in $W$, and we cannot say how many vectors of this basis belong to $V$; in fact, it is easy to construct an example where none of the vectors in the basis of $W$ belongs to $V$.

Proof of Theorem 2.5.5.

If $V = \{\mathbf{0}\}$ then the theorem is trivial, so let us assume otherwise.

We want to find a basis in $V$. Take a non-zero vector $\mathbf{v}_1 \in V$. If $V = \operatorname{span}\{\mathbf{v}_1\}$, we have got our basis (consisting of the one vector $\mathbf{v}_1$).

If not, we continue by induction. Suppose we have constructed $r$ linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r \in V$. If $V = \operatorname{span}\{\mathbf{v}_k : 1 \leq k \leq r\}$, then we have found a basis in $V$. If not, there exists a vector $\mathbf{v}_{r+1} \in V$, $\mathbf{v}_{r+1} \notin \operatorname{span}\{\mathbf{v}_k : 1 \leq k \leq r\}$. By Exercise 1.2.5 from Chapter 1 the system $\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent. Since the vectors we construct are linearly independent in $W$, Proposition 2.5.2 shows that the process must stop after at most $\dim W$ steps; it produces a basis of $V$, so $V$ is finite-dimensional and $\dim V \leq \dim W$. Finally, if $\dim V = \dim W = n$, then a basis of $V$ is a linearly independent system of $n$ vectors in $W$; by Proposition 2.5.4 it can be completed to a basis of $W$, and since every basis of $W$ has exactly $n$ vectors, nothing needs to be added. Thus a basis of $V$ is already a basis of $W$, and $V = W$. ∎

Exercises.

2.5.1.

True or false:

  a) Every vector space that is generated by a finite set has a basis;

  b) Every vector space has a (finite) basis;

  c) A vector space cannot have more than one basis;

  d) If a vector space has a finite basis, then the number of vectors in every basis is the same;

  e) The dimension of $\mathbb{P}_n$ is $n$;

  f) The dimension of $M_{m\times n}$ is $m + n$;

  g) If the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ generate (span) the vector space $V$, then every vector in $V$ can be written as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in only one way;

  h) Every subspace of a finite-dimensional space is finite-dimensional;

  i) If $V$ is a vector space having dimension $n$, then $V$ has exactly one subspace of dimension $0$ and exactly one subspace of dimension $n$.

2.5.2.

Prove that if $V$ is a vector space having dimension $n$, then a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in $V$ is linearly independent if and only if it spans $V$.

2.5.3.

Prove that a linearly independent system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in a vector space $V$ is a basis if and only if $n = \dim V$.

2.5.4.

(An old exercise revisited: now this problem should be easy.) Is it possible that the vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are linearly dependent, but the vectors $\mathbf{w}_1 = \mathbf{v}_1 + \mathbf{v}_2$, $\mathbf{w}_2 = \mathbf{v}_2 + \mathbf{v}_3$ and $\mathbf{w}_3 = \mathbf{v}_3 + \mathbf{v}_1$ are linearly independent? Hint: What dimension can the subspace $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$ have?

2.5.5.

Let the vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ be a basis in $V$. Show that $\mathbf{u}+\mathbf{v}+\mathbf{w}$, $\mathbf{v}+\mathbf{w}$, $\mathbf{w}$ is also a basis in $V$.

2.5.6.

Consider in the space $\mathbb{R}^5$ the vectors $\mathbf{v}_1 = (2, -1, 1, 5, -3)^T$, $\mathbf{v}_2 = (3, -2, 0, 0, 0)^T$, $\mathbf{v}_3 = (1, 1, 50, -921, 0)^T$.

  a) Prove that these vectors are linearly independent.

  b) Complete this system of vectors to a basis.

If you do part b) first, you can do everything without any computations.

2.6. General solution of a linear system.

In this short section we discuss the structure of the general solution (i.e. of the solution set) of a linear system.

We call a system $A\mathbf{x}=\mathbf{b}$ homogeneous if the right side $\mathbf{b}=\mathbf{0}$, i.e. a homogeneous system is a system of the form $A\mathbf{x}=\mathbf{0}$.

With each system

\[
A\mathbf{x} = \mathbf{b}
\]

we can associate a homogeneous system just by putting $\mathbf{b}=\mathbf{0}$.

Theorem 2.6.1 (General solution of a linear equation).

Let a vector $\mathbf{x}_1$ satisfy the equation $A\mathbf{x}=\mathbf{b}$, and let $H$ be the set of all solutions of the associated homogeneous system

\[
A\mathbf{x} = \mathbf{0}.
\]

Then the set

\[
\{\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h : \mathbf{x}_h \in H\}
\]

is the set of all solutions of the equation $A\mathbf{x}=\mathbf{b}$.

In other words, this theorem can be stated as

\[
\framebox{General solution of $A\mathbf{x}=\mathbf{b}$}
= \framebox{A particular solution of $A\mathbf{x}=\mathbf{b}$}
+ \framebox{General solution of $A\mathbf{x}=\mathbf{0}$}\,.
\]
Proof.

Fix a vector $\mathbf{x}_1$ satisfying $A\mathbf{x}_1=\mathbf{b}$. Let a vector $\mathbf{x}_h$ satisfy $A\mathbf{x}_h=\mathbf{0}$. Then for $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h$ we have

\[
A\mathbf{x} = A(\mathbf{x}_1 + \mathbf{x}_h) = A\mathbf{x}_1 + A\mathbf{x}_h = \mathbf{b} + \mathbf{0} = \mathbf{b},
\]

so any $\mathbf{x}$ of the form

\[
\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h, \qquad \mathbf{x}_h \in H
\]

is a solution of $A\mathbf{x}=\mathbf{b}$.

Now let $\mathbf{x}$ satisfy $A\mathbf{x}=\mathbf{b}$. Then for $\mathbf{x}_h := \mathbf{x} - \mathbf{x}_1$ we get

\[
A\mathbf{x}_h = A(\mathbf{x} - \mathbf{x}_1) = A\mathbf{x} - A\mathbf{x}_1 = \mathbf{b} - \mathbf{b} = \mathbf{0},
\]

so $\mathbf{x}_h \in H$. Therefore any solution $\mathbf{x}$ of $A\mathbf{x}=\mathbf{b}$ can be represented as $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h$ with some $\mathbf{x}_h \in H$. ∎

The power of this theorem is in its generality. It applies to all linear equations; we do not have to assume here that the vector spaces are finite-dimensional. You will meet this theorem in differential equations, integral equations, partial differential equations, etc. Besides showing the structure of the solution set, this theorem allows one to separate the investigation of uniqueness from the study of existence. Namely, to study uniqueness, we only need to analyze the uniqueness for the homogeneous equation $A\mathbf{x}=\mathbf{0}$, which always has a solution.

There is an immediate application in this course: this theorem allows us to check a solution of a system $A\mathbf{x}=\mathbf{b}$. For example, consider the system

\[
\left(\begin{array}{ccccc}
2 & 3 & 1 & 4 & -9 \\
1 & 1 & 1 & 1 & -3 \\
1 & 1 & 1 & 2 & -5 \\
2 & 2 & 2 & 3 & -8
\end{array}\right)\mathbf{x}
= \left(\begin{array}{c} 17 \\ 6 \\ 8 \\ 14 \end{array}\right).
\]

Performing row reduction one can find the solution of this system,

(2.6.1)
\[
\mathbf{x} = \left(\begin{array}{c} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{array}\right)
+ x_3 \left(\begin{array}{c} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{array}\right)
+ x_5 \left(\begin{array}{c} 2 \\ -1 \\ 0 \\ 2 \\ 1 \end{array}\right),
\qquad x_3, x_5 \in \mathbb{F}.
\]

The parameters $x_3$, $x_5$ can be denoted here by any other letters, $t$ and $s$, for example; we are keeping the notation $x_3$ and $x_5$ here only to remind us that the parameters came from the corresponding free variables.

Now, let us suppose that we are just given this solution, and we want to check whether or not it is correct. Of course, we can repeat the row operations, but this is too time consuming. Moreover, if the solution was obtained by some non-standard method, it can look different from what we get from the row reduction. For example, the formula

(2.6.2)
\[
\mathbf{x} = \left(\begin{array}{c} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{array}\right)
+ s \left(\begin{array}{c} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{array}\right)
+ t \left(\begin{array}{c} 0 \\ 0 \\ 1 \\ 2 \\ 1 \end{array}\right),
\qquad s, t \in \mathbb{F}
\]

gives the same set as (2.6.1) (can you say why?); here we just replaced the last vector in (2.6.1) by its sum with the second one. So, this formula is different from the solution we got from the row reduction, but it is nevertheless correct.

The simplest way to check that (2.6.1) and (2.6.2) give us correct solutions is to check that the first vector $(3, 1, 0, 2, 0)^T$ satisfies the equation $A\mathbf{x}=\mathbf{b}$, and that the other two (the ones with the parameters $x_3$ and $x_5$ or $s$ and $t$ in front of them) satisfy the associated homogeneous equation $A\mathbf{x}=\mathbf{0}$.

If this checks out, we will be assured that any vector $\mathbf{x}$ defined by (2.6.1) or (2.6.2) is indeed a solution.

Note that this method of checking the solution does not guarantee that (2.6.1) (or (2.6.2)) gives us all the solutions. For example, if we just somehow miss the term with $x_3$, the above method of checking will still work fine.

So, how can we guarantee that we did not miss any free variable, and that there should not be an extra term in (2.6.1)?

What comes to mind is to count the pivots again. In this example, if one does the row operations, the number of pivots is 3. So indeed, there should be 2 free variables, and it looks like we did not miss anything in (2.6.1).

To be able to prove this, we will need the new notions of fundamental subspaces and of the rank of a matrix. I should also mention that in this particular example, one does not have to perform all the row operations to check that there are only 2 free variables, and that formulas (2.6.1) and (2.6.2) both give correct general solutions.
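The checking recipe itself is a one-liner per vector; a quick numeric verification of (2.6.1) in Python (numpy):

import numpy as np

A = np.array([[2., 3., 1., 4., -9.],
              [1., 1., 1., 1., -3.],
              [1., 1., 1., 2., -5.],
              [2., 2., 2., 3., -8.]])
b = np.array([17., 6., 8., 14.])

x_part = np.array([3., 1., 0., 2., 0.])   # the particular solution in (2.6.1)
h1 = np.array([-2., 1., 1., 0., 0.])      # the vectors at the free parameters
h2 = np.array([2., -1., 0., 2., 1.])

print(np.allclose(A @ x_part, b))         # True: satisfies A x = b
print(np.allclose(A @ h1, 0))             # True: satisfies A x = 0
print(np.allclose(A @ h2, 0))             # True: satisfies A x = 0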

Exercises.

2.6.1.

True or false:

  a) Any system of linear equations has at least one solution;

  b) Any system of linear equations has at most one solution;

  c) Any homogeneous system of linear equations has at least one solution;

  d) Any system of $n$ linear equations in $n$ unknowns has at least one solution;

  e) Any system of $n$ linear equations in $n$ unknowns has at most one solution;

  f) If the homogeneous system corresponding to a given system of linear equations has a solution, then the given system has a solution;

  g) If the coefficient matrix of a homogeneous system of $n$ linear equations in $n$ unknowns is invertible, then the system has no non-zero solution;

  h) The solution set of any system of $m$ equations in $n$ unknowns is a subspace in $\mathbb{R}^n$;

  i) The solution set of any homogeneous system of $m$ equations in $n$ unknowns is a subspace in $\mathbb{R}^n$.

2.6.2.

Find a $2\times 3$ system ($2$ equations with $3$ unknowns) such that its general solution has the form

\[
\left(\begin{array}{c} 1 \\ 1 \\ 0 \end{array}\right)
+ s \left(\begin{array}{c} 1 \\ 2 \\ 1 \end{array}\right),
\qquad s \in \mathbb{R}.
\]

2.7. Fundamental subspaces of a matrix. Rank.

As we discussed above in Section 1.7 of Chapter 1, with any linear transformation $A: V \to W$ we can associate two subspaces, namely its kernel, or null space,

\[
\operatorname{Ker} A = \operatorname{Null} A := \{\mathbf{v} \in V : A\mathbf{v} = \mathbf{0}\} \subset V,
\]

and its range

\[
\operatorname{Ran} A = \{\mathbf{w} \in W : \mathbf{w} = A\mathbf{v} \text{ for some } \mathbf{v} \in V\} \subset W.
\]

In other words, the kernel $\operatorname{Ker} A$ is the solution set of the homogeneous equation $A\mathbf{x}=\mathbf{0}$, and the range $\operatorname{Ran} A$ is exactly the set of all right sides $\mathbf{b} \in W$ for which the equation $A\mathbf{x}=\mathbf{b}$ has a solution.

If $A$ is an $m\times n$ matrix, i.e. a mapping from $\mathbb{F}^n$ to $\mathbb{F}^m$, then it follows from the “column by coordinate” rule of matrix multiplication that any vector $\mathbf{w} \in \operatorname{Ran} A$ can be represented as a linear combination of the columns of $A$. This explains the name column space (notation $\operatorname{Col} A$), which is often used instead of $\operatorname{Ran} A$.

If $A$ is a matrix, then in addition to $\operatorname{Ran} A$ and $\operatorname{Ker} A$ one can also consider the range and kernel of the transposed matrix $A^T$. Often the term row space is used for $\operatorname{Ran} A^T$, and the term left null space is used for $\operatorname{Ker} A^T$ (but usually no special notation is introduced).

The four subspaces $\operatorname{Ran} A$, $\operatorname{Ker} A$, $\operatorname{Ran} A^T$, $\operatorname{Ker} A^T$ are called the fundamental subspaces of the matrix $A$. In this section we will study important relations between the dimensions of the four fundamental subspaces.

We will need the following definition, which is one of the fundamental notions of Linear Algebra.

Definition.

Given a linear transformation (matrix) $A$, its rank, $\operatorname{rank} A$, is the dimension of the range of $A$:

\[
\operatorname{rank} A := \dim \operatorname{Ran} A.
\]

2.7.1. Computing fundamental subspaces and rank.

To compute the fundamental subspaces and the rank of a matrix, one needs to do echelon reduction. Namely, let $A$ be the matrix, and let $A_e$ be its echelon form. Then:

  1. The pivot columns of the original matrix $A$ (i.e. the columns where, after row operations, we will have pivots in the echelon form) give us a basis (one of many possible) in $\operatorname{Ran} A$.

  2. The pivot rows of the echelon form $A_e$ give us a basis in the row space. Of course, it is possible just to transpose the matrix and then do row operations. But if we already have the echelon form of $A$, say from computing $\operatorname{Ran} A$, then we get $\operatorname{Ran} A^T$ for free.

  3. To find a basis in the null space $\operatorname{Ker} A$ one needs to solve the homogeneous equation $A\mathbf{x}=\mathbf{0}$; the details will be seen in the example below.

Example.

Consider a matrix

\[
\left(\begin{array}{ccccc}
1 & 1 & 2 & 2 & 1 \\
2 & 2 & 1 & 1 & 1 \\
3 & 3 & 3 & 3 & 2 \\
1 & 1 & -1 & -1 & 0
\end{array}\right).
\]

Performing row operations we get the echelon form

\[
\left(\begin{array}{ccccc}
\framebox{$1$} & 1 & 2 & 2 & 1 \\
0 & 0 & \framebox{$-3$} & -3 & -1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right)
\]

(the pivots are boxed here). So, the columns 1 and 3 of the original matrix, i.e. the columns

\[
\left(\begin{array}{c} 1 \\ 2 \\ 3 \\ 1 \end{array}\right),\quad
\left(\begin{array}{r} 2 \\ 1 \\ 3 \\ -1 \end{array}\right)
\]

give us a basis in $\operatorname{Ran} A$. We also get a basis for the row space $\operatorname{Ran} A^T$ for free: the first and second rows of the echelon form of $A$, i.e. the vectors

\[
\left(\begin{array}{c} 1 \\ 1 \\ 2 \\ 2 \\ 1 \end{array}\right),\quad
\left(\begin{array}{r} 0 \\ 0 \\ -3 \\ -3 \\ -1 \end{array}\right)
\]

(we put the vectors vertically here; the question of whether to write vectors here vertically as columns or horizontally as rows is really a matter of convention. Our reason for putting them vertically is that although we call $\operatorname{Ran} A^T$ the row space, we define it as the column space of $A^T$).

To compute a basis in the null space $\operatorname{Ker} A$ we need to solve the equation $A\mathbf{x}=\mathbf{0}$. Compute the reduced echelon form of $A$, which in this example is

\[
\left(\begin{array}{ccccc}
\framebox{$1$} & 1 & 0 & 0 & 1/3 \\
0 & 0 & \framebox{$1$} & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right).
\]

Note, that when solving the homogeneous equation A𝐱=𝟎A\mathbf{x}=\mathbf{0}, it is not necessary to write the whole augmented matrix, it is sufficient to work with the coefficient matrix. Indeed, in this case the last column of the augmented matrix is the column of zeroes, which does not change under row operations. So, we can just keep this column in mind, without actually writing it. Keeping this last zero column in mind, we can read the solution off the reduced echelon form above:

{x1=x213x5,x2 is free,x3=x413x5x4 is free,x5 is free,\left\{\begin{array}[]{l}x_{1}=-x_{2}-\frac{1}{3}x_{5},\\ x_{2}\text{ is free},\\ x_{3}=-x_{4}-\frac{1}{3}x_{5}\\ x_{4}\text{ is free},\\ x_{5}\text{ is free},\end{array}\right.

or, in the vector form

(2.7.1) 𝐱=(x213x5x2x413x5x4x5)=x2(11000)+x4(00110)+x5(1/301/301)\mathbf{x}=\left(\begin{array}[]{c}-x_{2}-\frac{1}{3}x_{5}\\ x_{2}\\ -x_{4}-\frac{1}{3}x_{5}\\ x_{4}\\ x_{5}\end{array}\right)=x_{2}\left(\begin{array}[]{c}-1\\ 1\\ 0\\ 0\\ 0\end{array}\right)+x_{4}\left(\begin{array}[]{c}0\\ 0\\ -1\\ 1\\ 0\end{array}\right)+x_{5}\left(\begin{array}[]{c}-1/3\\ 0\\ -1/3\\ 0\\ 1\end{array}\right)

The vectors at each free variable, i.e. in our case the vectors

(11000),(00110),(1/301/301)\left(\begin{array}[]{c}-1\\ 1\\ 0\\ 0\\ 0\end{array}\right),\quad\left(\begin{array}[]{c}0\\ 0\\ -1\\ 1\\ 0\end{array}\right),\quad\left(\begin{array}[]{c}-1/3\\ 0\\ -1/3\\ 0\\ 1\end{array}\right)

form a basis in KerA\operatorname{Ker}A.

Unfortunately, there is no shortcut for finding a basis in KerAT\operatorname{Ker}A^{T}: one must solve the equation AT𝐱=𝟎A^{T}\mathbf{x}=\mathbf{0}. The knowledge of the echelon form of AA does not help here.
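For readers who want to experiment, the whole computation is easy to reproduce on a computer. Here is a small sketch (our addition, not part of the text) using the SymPy library in Python; the matrix is the one from the example above, and all arithmetic is exact.

```python
# A sketch checking the example above with SymPy's exact arithmetic.
from sympy import Matrix

A = Matrix([[1, 1,  2,  2, 1],
            [2, 2,  1,  1, 1],
            [3, 3,  3,  3, 2],
            [1, 1, -1, -1, 0]])

rref, pivots = A.rref()   # reduced echelon form and pivot column indices
print(pivots)             # (0, 2): columns 1 and 3 in 1-based numbering

print(A.columnspace())    # basis of Ran A: the pivot columns of A
print(A.T.columnspace())  # a basis of the row space Ran A^T
print(A.nullspace())      # basis of Ker A: the three vectors found above
print(A.T.nullspace())    # basis of Ker A^T: as noted, one has to reduce A^T
```

Note that `A.T.columnspace()` finds a basis of the row space by reducing ATA^{T}, so it may return a basis different from the pivot rows of AeA_{\scriptstyle\text{\rm e}}; both are legitimate bases of the same subspace.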

2.7.2. Explanation of the computation of bases in the fundamental subspaces.

So, why do the above methods indeed give us bases in the fundamental subspaces?

The null space KerA\operatorname{Ker}A.

The case of the null space KerA\operatorname{Ker}A is probably the simplest one: since we solved the equation A𝐱=𝟎A\mathbf{x}=\mathbf{0}, i.e. found all the solutions, any vector in KerA\operatorname{Ker}A is a linear combination of the vectors we obtained. Thus, the vectors we obtained form a spanning system in KerA\operatorname{Ker}A. To see that the system is linearly independent, let us multiply each vector by the corresponding free variable and add everything up, see (2.7.1). Then for each free variable xkx_{k}, the entry number kk of the resulting vector is exactly xkx_{k}, see (2.7.1) again, so the only way this vector (the linear combination) can be 𝟎\mathbf{0} is when all the free variables are 0.

The column space RanA\operatorname{Ran}A.

Let us now explain why the method for finding a basis in the column space RanA\operatorname{Ran}A works. First of all, notice that the pivot columns of the reduced echelon form AreA_{\scriptstyle\text{\rm re}} of AA form a basis in RanAre\operatorname{Ran}A_{\scriptstyle\text{\rm re}} (not in the column space of the original matrix, but of its reduced echelon form!): indeed, in the reduced echelon form the pivot columns are the standard basis vectors 𝐞1,𝐞2,,𝐞r\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{r}, and every column of AreA_{\scriptstyle\text{\rm re}} has zero entries below row rr, i.e. is a linear combination of 𝐞1,,𝐞r\mathbf{e}_{1},\ldots,\mathbf{e}_{r}. Since row operations are just left multiplications by invertible matrices, they do not change linear independence of columns. Therefore, the pivot columns of the original matrix AA are linearly independent as well.

Let us now show that the pivot columns of AA span the column space of AA. Let 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} be the pivot columns of AA, and let 𝐯\mathbf{v} be an arbitrary column of AA. We want to show that 𝐯\mathbf{v} can be represented as a linear combination of the pivot columns 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r},

𝐯=α1𝐯1+α2𝐯2++αr𝐯r.\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{r}% \mathbf{v}_{r}.

The reduced echelon form AreA_{\scriptstyle\text{\rm re}} is obtained from AA by the left multiplication

Are=EA,A_{\scriptstyle\text{\rm re}}=EA,

where EE is a product of elementary matrices, so EE is an invertible matrix. The vectors E𝐯1,E𝐯2,,E𝐯rE\mathbf{v}_{1},E\mathbf{v}_{2},\ldots,E\mathbf{v}_{r} are the pivot columns of AreA_{\scriptstyle\text{\rm re}}, and the column 𝐯\mathbf{v} of AA is transformed to the column E𝐯E\mathbf{v} of AreA_{\scriptstyle\text{\rm re}}. Since the pivot columns of AreA_{\scriptstyle\text{\rm re}} form a basis in RanAre\operatorname{Ran}A_{\scriptstyle\text{\rm re}}, the vector E𝐯E\mathbf{v} can be represented as a linear combination

E𝐯=α1E𝐯1+α2E𝐯2++αrE𝐯r.E\mathbf{v}=\alpha_{1}E\mathbf{v}_{1}+\alpha_{2}E\mathbf{v}_{2}+\ldots+\alpha_% {r}E\mathbf{v}_{r}.

Multiplying this equality by E1E^{-1} from the left we get the representation

𝐯=α1𝐯1+α2𝐯2++αr𝐯r,\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{r}% \mathbf{v}_{r},

so indeed the pivot columns of AA span RanA\operatorname{Ran}A.

The row space RanAT\operatorname{Ran}A^{T}.

It is easy to see that the pivot rows of the echelon form AeA_{\scriptstyle\text{\rm e}} of AA are linearly independent. Indeed, let 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} be the transposed pivot rows of AeA_{\scriptstyle\text{\rm e}} (transposed because we agreed always to write vectors vertically). Suppose

α1𝐰1+α2𝐰2++αr𝐰r=𝟎.\alpha_{1}\mathbf{w}_{1}+\alpha_{2}\mathbf{w}_{2}+\ldots+\alpha_{r}\mathbf{w}_% {r}=\mathbf{0}.

Consider the first non-zero entry of 𝐰1\mathbf{w}_{1}. Since for all other vectors 𝐰2,𝐰3,,𝐰r\mathbf{w}_{2},\mathbf{w}_{3},\ldots,\mathbf{w}_{r} the corresponding entries equal 0 (by the definition of echelon form), we can conclude that α1=0\alpha_{1}=0. So we can just ignore the first term in the sum.

Consider now the first non-zero entry of 𝐰2\mathbf{w}_{2}. The corresponding entries of the vectors 𝐰3,,𝐰r\mathbf{w}_{3},\ldots,\mathbf{w}_{r} are 0, so α2=0\alpha_{2}=0. Repeating this procedure, we get that αk=0\alpha_{k}=0 k=1,2,,r\forall k=1,2,\ldots,r.

To see that the vectors 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} span the row space, one can notice that row operations do not change the row space. This can be obtained directly by analyzing the row operations, but we present here a more formal way to demonstrate this fact.

For a transformation AA and a set XX let us denote by A(X)A(X) the set of all elements yy which can be represented as y=A(x)y=A(x), xXx\in X,

A(X):={y=A(x):xX}.A(X):=\left\{y=A(x):x\in X\right\}.

If AA is an m×nm\times n matrix, and AeA_{\scriptstyle\text{\rm e}} is its echelon form, then AeA_{\scriptstyle\text{\rm e}} is obtained from AA by left multiplication

Ae=EA,A_{\scriptstyle\text{\rm e}}=EA,

where EE is an m×mm\times m invertible matrix (the product of the corresponding elementary matrices). Then

RanAeT=Ran(ATET)=AT(RanET)=AT(m)=RanAT,\operatorname{Ran}A^{T}_{\scriptstyle\text{\rm e}}=\operatorname{Ran}(A^{T}E^{% T})=A^{T}(\operatorname{Ran}E^{T})=A^{T}(\mathbb{R}^{m})=\operatorname{Ran}A^{% T},

so indeed RanAT=RanAeT\operatorname{Ran}A^{T}=\operatorname{Ran}A^{T}_{\scriptstyle\text{\rm e}}.
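This invariance of the row space under row operations is also easy to check numerically. A small sketch (our addition, using SymPy): stacking the rows of AA and of its echelon form together should not increase the rank, which means that the two row spaces coincide.

```python
from sympy import Matrix

A = Matrix([[1, 1,  2,  2, 1],
            [2, 2,  1,  1, 1],
            [3, 3,  3,  3, 2],
            [1, 1, -1, -1, 0]])
Ae = A.echelon_form()

# If the row spaces of A and Ae coincide, stacking the rows of both
# matrices adds no new directions, so the rank stays the same.
assert Matrix.vstack(A, Ae).rank() == A.rank() == Ae.rank()
```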

2.7.3. The Rank Theorem. Dimensions of fundamental subspaces.

There are many applications in which one needs to find a basis in the column space or in the null space of a matrix. For example, as was shown above, solving a homogeneous equation A𝐱=𝟎A\mathbf{x}=\mathbf{0} amounts to finding a basis in the null space KerA\operatorname{Ker}A. Finding a basis in the column space means simply extracting a basis from a spanning set by removing unnecessary vectors (columns).

However, the most important application of the above methods of computing bases in the fundamental subspaces is the relation between their dimensions.

Theorem 2.7.1 (The Rank Theorem).

For a matrix AA

rankA=rankAT.\operatorname{rank}A=\operatorname{rank}A^{T}.

This theorem is often stated as follows:

The column rank of a matrix coincides with its row rank.

The proof of this theorem is trivial, since the dimensions of both RanA\operatorname{Ran}A and RanAT\operatorname{Ran}A^{T} are equal to the number of pivots in the echelon form of AA.

The following theorem gives us important relations between the dimensions of the fundamental subspaces. It is often also called the Rank Theorem.

Theorem 2.7.2.

Let AA be an m×nm\times n matrix, i.e. a linear transformation from 𝔽n\mathbb{F}^{n} to 𝔽m\mathbb{F}^{m}. Then

  1. 1.

    dimKerA+rankA=n\dim\operatorname{Ker}A+\operatorname{rank}A=n (dimension of the domain of AA);

  2. 2.

    dimKerAT+rankA=m\dim\operatorname{Ker}A^{T}+\operatorname{rank}A=m (dimension of the target space of AA).

Proof.

The proof, modulo the above algorithms of finding bases in the fundamental subspaces, is almost trivial. The first statement is simply the fact that the number of free variables (dimKerA\dim\operatorname{Ker}A) plus the number of basic variables (i.e. the number of pivots, i.e. rankA\operatorname{rank}A) adds up to the number of columns (i.e. to nn).

The second statement, if one takes into account that rankA=rankAT\operatorname{rank}A=\operatorname{rank}A^{T}, is simply the first statement applied to ATA^{T}. ∎
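Both rank theorems are easy to check on any concrete matrix. Here is a small sketch (our addition, using SymPy) verifying them for the matrix from the example in Section 2.7.1.

```python
from sympy import Matrix

A = Matrix([[1, 1,  2,  2, 1],
            [2, 2,  1,  1, 1],
            [3, 3,  3,  3, 2],
            [1, 1, -1, -1, 0]])
m, n = A.shape
r = A.rank()

assert r == A.T.rank()                # Theorem 2.7.1: rank A = rank A^T
assert len(A.nullspace()) + r == n    # dim Ker A   + rank A = n
assert len(A.T.nullspace()) + r == m  # dim Ker A^T + rank A = m
```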

As an application of the above theorem, let us recall the example from Section 2.6. There we considered a system

(23149111131112522238)𝐱=(176814),\left(\begin{array}[]{ccccc}2&3&1&4&-9\\ 1&1&1&1&-3\\ 1&1&1&2&-5\\ 2&2&2&3&-8\end{array}\right)\mathbf{x}=\left(\begin{array}[]{c}17\\ 6\\ 8\\ 14\end{array}\right),

and we claimed that its general solution is given by

𝐱=(31020)+x3(21100)+x5(21021),x3,x5𝔽,\mathbf{x}=\left(\begin{array}[]{c}3\\ 1\\ 0\\ 2\\ 0\end{array}\right)+x_{3}\left(\begin{array}[]{c}-2\\ 1\\ 1\\ 0\\ 0\end{array}\right)+x_{5}\left(\begin{array}[]{c}2\\ -1\\ 0\\ 2\\ 1\end{array}\right)\ ,\qquad x_{3},x_{5}\in\mathbb{F},

or by

𝐱=(31020)+s(21100)+t(00121),s,t𝔽.\mathbf{x}=\left(\begin{array}[]{c}3\\ 1\\ 0\\ 2\\ 0\end{array}\right)+s\left(\begin{array}[]{c}-2\\ 1\\ 1\\ 0\\ 0\end{array}\right)+t\left(\begin{array}[]{c}0\\ 0\\ 1\\ 2\\ 1\end{array}\right)\ ,\qquad s,t\in\mathbb{F}.

We checked in Section 2.6 that a vector 𝐱\mathbf{x} given by either formula is indeed a solution of the equation. But how can we guarantee that either formula describes all the solutions?

First of all, we know that in either formula the last 2 vectors (the ones multiplied by the parameters) belong to KerA\operatorname{Ker}A. It is easy to see that in either case both vectors are linearly independent (two vectors are linearly dependent if and only if one is a multiple of the other).

Now, let us count dimensions: interchanging the first and the second rows and performing the first round of row operations

2R1R12R1(11113231491112522238)(11113011230001200012)\begin{array}[]{c}\phantom{1}\\ -2R_{1}\\ -R_{1}\\ -2R_{1}\end{array}\left(\begin{array}[]{ccccc}1&1&1&1&-3\\ 2&3&1&4&-9\\ 1&1&1&2&-5\\ 2&2&2&3&-8\end{array}\right)\quad\sim\quad\left(\begin{array}[]{ccccc}1&1&1&1&% -3\\ 0&1&-1&2&-3\\ 0&0&0&1&-2\\ 0&0&0&1&-2\end{array}\right)

we see that there are three pivots already, so rankA3\operatorname{rank}A\geq 3. (Actually, we can already see that the rank is 3, but the estimate is enough here.) By Theorem 2.7.2, rankA+dimKerA=5\operatorname{rank}A+\dim\operatorname{Ker}A=5, hence dimKerA2\dim\operatorname{Ker}A\leq 2, and therefore there cannot be more than 2 linearly independent vectors in KerA\operatorname{Ker}A. Therefore, the last 2 vectors in either formula form a basis in KerA\operatorname{Ker}A, so either formula gives all the solutions of the equation.
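The dimension count above is easy to reproduce. The following sketch (our addition, using SymPy) confirms that the rank is exactly 3, that dimKerA=2\dim\operatorname{Ker}A=2, and that the two vectors from the first formula indeed lie in the kernel.

```python
from sympy import Matrix

A = Matrix([[2, 3, 1, 4, -9],
            [1, 1, 1, 1, -3],
            [1, 1, 1, 2, -5],
            [2, 2, 2, 3, -8]])

assert A.rank() == 3
assert len(A.nullspace()) == 2  # dim Ker A = 5 - rank A = 2

# the two kernel vectors from the first formula are indeed in Ker A
for v in ([-2, 1, 1, 0, 0], [2, -1, 0, 2, 1]):
    assert A * Matrix(v) == Matrix([0, 0, 0, 0])
```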

An important corollary of the rank theorem is the following theorem connecting existence and uniqueness for linear equations.

Theorem 2.7.3.

Let AA be an m×nm\times n matrix. Then the equation

A𝐱=𝐛A\mathbf{x}=\mathbf{b}

has a solution for every 𝐛m\mathbf{b}\in\mathbb{R}^{m} if and only if the dual equation

AT𝐱=𝟎A^{T}\mathbf{x}=\mathbf{0}

has a unique (i.e. only the trivial) solution. (Note that in the second equation we have ATA^{T}, not AA.)

Proof.

The proof follows immediately from Theorem 2.7.2 by counting the dimensions. We leave the details as an exercise to the reader. ∎
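As a small illustration (our addition; the matrix is made up for this purpose), one can check both conditions of the theorem on a matrix of full row rank.

```python
from sympy import Matrix, symbols

A = Matrix([[1, 0, 2],
            [0, 1, 3]])       # rank A = 2 = m, so Ran A is all of F^2

assert A.T.nullspace() == []  # A^T x = 0 has only the trivial solution
# ... and A x = b is solvable for every b: here x = (b1, b2, 0)^T works
b1, b2 = symbols('b1 b2')
assert A * Matrix([b1, b2, 0]) == Matrix([b1, b2])
```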

There is a very nice geometric interpretation of the second rank theorem (Theorem 2.7.2). Namely, statement 1 of the theorem says that if a transformation A:𝔽n𝔽mA:\mathbb{F}^{n}\to\mathbb{F}^{m} has trivial kernel (KerA={𝟎}\operatorname{Ker}A=\{\mathbf{0}\}), then the dimensions of the domain 𝔽n\mathbb{F}^{n} and of the range RanA\operatorname{Ran}A coincide. If the kernel is non-trivial, then the transformation “kills” dimKerA\dim\operatorname{Ker}A dimensions, so dimRanA=ndimKerA\dim\operatorname{Ran}A=n-\dim\operatorname{Ker}A.

2.7.4. Completion of a linearly independent system to a basis

As Proposition 2.5.4 from Section 2.5 above asserts, any linearly independent system can be completed to a basis, i.e. given linearly independent vectors 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} in a finite-dimensional vector space VV, one can find vectors 𝐯r+1,𝐯r+2,𝐯n\mathbf{v}_{r+1},\mathbf{v}_{r+2}\ldots,\mathbf{v}_{n} such that the system of vectors 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a basis in VV.

Theoretically, the proof of this proposition gives us an algorithm for finding the vectors 𝐯r+1,𝐯r+2,,𝐯n\mathbf{v}_{r+1},\mathbf{v}_{r+2},\ldots,\mathbf{v}_{n}, but this algorithm does not look too practical.

The ideas of this section give us a more practical way to perform the completion to a basis.

First of all, notice that if an m×nm\times n matrix is in echelon form, then its non-zero rows (which are clearly linearly independent) can easily be completed to a basis in the whole space 𝔽n\mathbb{F}^{n}: one just needs to add rows in appropriate places, so that the resulting matrix is still in echelon form and has a pivot in every column.

Then the non-zero rows of the new matrix form a basis, which we can order any way we want, because the property of being a basis does not depend on the ordering.

Suppose now that we have linearly independent vectors 𝐯1,𝐯2,,𝐯r𝔽n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\in\mathbb{F}^{n}. Consider the matrix AA with rows 𝐯1T,𝐯2T,,𝐯rT\mathbf{v}^{T}_{1},\mathbf{v}^{T}_{2},\ldots,\mathbf{v}^{T}_{r} and perform row operations to get the echelon form AeA_{\scriptstyle\text{\rm e}}. As we discussed above, the rows of AeA_{\scriptstyle\text{\rm e}} can easily be completed to a basis in 𝔽n\mathbb{F}^{n}. And it turns out that the same vectors that complete the rows of AeA_{\scriptstyle\text{\rm e}} to a basis also complete the original vectors 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} to a basis.

To see that, let the vectors 𝐯r+1,,𝐯n\mathbf{v}_{r+1},\ldots,\mathbf{v}_{n} complete the rows of AeA_{\scriptstyle\text{\rm e}} to a basis in 𝔽n\mathbb{F}^{n}. Then, if we add to the matrix AeA_{\scriptstyle\text{\rm e}} the rows 𝐯r+1T,,𝐯nT\mathbf{v}^{T}_{r+1},\ldots,\mathbf{v}^{T}_{n}, we get an invertible matrix. Let us call this matrix A~e\widetilde{A}_{\scriptstyle\text{\rm e}}, and let A~\widetilde{A} be the matrix obtained from AA by adding the rows 𝐯r+1T,,𝐯nT\mathbf{v}^{T}_{r+1},\ldots,\mathbf{v}^{T}_{n}. The matrix A~e\widetilde{A}_{\scriptstyle\text{\rm e}} can be obtained from A~\widetilde{A} by row operations, so

A~e=EA~,\widetilde{A}_{\scriptstyle\text{\rm e}}=E\widetilde{A},

where EE is the product of the corresponding elementary matrices. Then A~=E1A~e\widetilde{A}=E^{-1}\widetilde{A}_{\scriptstyle\text{\rm e}}, and A~\widetilde{A} is invertible as a product of invertible matrices.

But that means that the rows of A~\widetilde{A} form a basis in 𝔽n\mathbb{F}^{n}, which is exactly what we need.

Remark.

The method of completion to a basis described above may not be the simplest one, but one of its principal advantages is that it works for vector spaces over an arbitrary field.
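The completion procedure is also easy to automate. Below is a sketch (our addition, using SymPy; the helper `complete_to_basis` is ours, not a library function): it adds a standard basis vector for every column of the echelon form without a pivot, exactly as described above.

```python
from sympy import Matrix, eye

def complete_to_basis(A):
    """The rows of A are assumed linearly independent. Return the standard
    basis rows e_j for the non-pivot columns; these rows complete the rows
    of A to a basis of F^n."""
    n = A.cols
    _, pivots = A.rref()  # pivot columns of the echelon form
    return [eye(n)[j, :] for j in range(n) if j not in pivots]

A = Matrix([[1, 1, 0, 0, 0],
            [0, 1, 1, 0, 0]])
B = Matrix.vstack(A, *complete_to_basis(A))  # stack the completing rows below
assert B.rank() == B.cols                    # the rows of B form a basis of F^5
```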

Exercises.

2.7.1.

True or false:

  1. a)

    The rank of a matrix is equal to the number of its non-zero columns;

  2. b)

    The m×nm\times n zero matrix is the only m×nm\times n matrix having rank 0;

  3. c)

    Elementary row operations preserve rank;

  4. d)

    Elementary column operations do not necessarily preserve rank;

  5. e)

    The rank of a matrix is equal to the maximum number of linearly independent columns in the matrix;

  6. f)

    The rank of a matrix is equal to the maximum number of linearly independent rows in the matrix;

  7. g)

    The rank of an n×nn\times n matrix is at most nn;

  8. h)

    An n×nn\times n matrix having rank nn is invertible.

2.7.2.

A 54×3754\times 37 matrix has rank 3131. What are the dimensions of all 4 fundamental subspaces?

2.7.3.

Compute rank and find bases of all four fundamental subspaces for the matrices

(110011110),(12311140120230110000).\left(\begin{array}[]{ccccccc}1&1&0\\ 0&1&1\\ 1&1&0\end{array}\right),\qquad\left(\begin{array}[]{ccccccc}1&2&3&1&1\\ 1&4&0&1&2\\ 0&2&-3&0&1\\ 1&0&0&0&0\end{array}\right).
2.7.4.

Prove that if A:XYA:X\to Y and VV is a subspace of XX then dimAVrankA\operatorname{dim}AV\leq\operatorname{rank}A. (AVAV here means the subspace VV transformed by the transformation AA, i.e. any vector in AVAV can be represented as A𝐯A\mathbf{v}, 𝐯V\mathbf{v}\in V). Deduce from here that rank(AB)rankA\operatorname{rank}(AB)\leq\operatorname{rank}A.

Remark: Here one can use the fact that if VWV\subset W then dimVdimW\operatorname{dim}V\leq\operatorname{dim}W. Do you understand why it is true?

2.7.5.

Prove that if A:XYA:X\to Y and VV is a subspace of XX then dimAVdimV\operatorname{dim}AV\leq\operatorname{dim}V. Deduce from here that rank(AB)rankB\operatorname{rank}(AB)\leq\operatorname{rank}B.

2.7.6.

Prove that if the product ABAB of two n×nn\times n matrices is invertible, then both AA and BB are invertible. Even if you know about determinants, do not use them; we have not covered them yet. Hint: use the previous 2 problems.

2.7.7.

Prove that if A𝐱=𝟎A\mathbf{x}=\mathbf{0} has a unique solution, then the equation AT𝐱=𝐛A^{T}\mathbf{x}=\mathbf{b} has a solution for every right side 𝐛\mathbf{b}.
Hint: count pivots.

2.7.8.

Write a matrix with the required property, or explain why no such matrix exists:

  1. a)

    Column space contains (1,0,0)T(1,0,0)^{T}, (0,0,1)T(0,0,1)^{T}, row space contains (1,1)T(1,1)^{T}, (1,2)T(1,2)^{T};

  2. b)

    Column space is spanned by (1,1,1)T(1,1,1)^{T}, nullspace is spanned by (1,2,3)T(1,2,3)^{T};

  3. c)

    Column space is 4\mathbb{R}^{4}, row space is 3\mathbb{R}^{3}.

Hint: Check first if the dimensions add up.

2.7.9.

If AA has the same four fundamental subspaces as BB, does A=BA=B?

2.7.10.

Complete the rows of a matrix

(e3340π620021πe1100003320000001)\left(\begin{array}[]{rrrrrrr}e^{3}&3&4&0&-\pi&6&-2\\ 0&0&2&-1&\pi^{e}&1&1\\ 0&0&0&0&3&-3&2\\ 0&0&0&0&0&0&1\\ \end{array}\right)

to a basis in 7\mathbb{R}^{7}.

2.7.11.

For a matrix

(1212322155363024144711)\left(\begin{array}[]{rrrrrrr}1&2&-1&2&3\\ 2&2&1&5&5\\ 3&6&-3&0&24\\ -1&-4&4&-7&11\end{array}\right)

find bases in its column and row spaces.

2.7.12.

For the matrix in the previous problem, complete the basis in the row space to a basis in 5\mathbb{R}^{5}.

2.7.13.

For the matrix

A=(1ii1)A=\left(\begin{array}[]{rr}1&i\\ i&-1\\ \end{array}\right)

compute RanA\operatorname{Ran}A and KerA\operatorname{Ker}A. What can you say about the relation between these subspaces?

2.7.14.

Is it possible for a real matrix AA that RanA=KerAT\operatorname{Ran}A=\operatorname{Ker}A^{T}? Is it possible for a complex AA?

2.7.15.

Complete the vectors (1,2,1,2,3)T(1,2,-1,2,3)^{T}, (2,2,1,5,5)T(2,2,1,5,5)^{T}, (1,4,4,7,11)T(-1,-4,4,7,-11)^{T} to a basis in 5\mathbb{R}^{5}.

2.8. Representation of a linear transformation in arbitrary bases. Change of coordinates formula.

The material we have learned about linear transformations and their matrices can be easily extended to transformations in abstract vector spaces with finite bases. In this section we will distinguish between a linear transformation TT and its matrix, the reason being that we consider different bases, so a linear transformation can have different matrix representations with respect to different bases.

2.8.1. Coordinate vector.

Let VV be a vector space with a basis :={𝐛1,𝐛2,,𝐛n}\mathcal{B}:=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}. Any vector 𝐯V\mathbf{v}\in V admits a unique representation as a linear combination

𝐯=x1𝐛1+x2𝐛2++xn𝐛n=k=1nxk𝐛k.\mathbf{v}=x_{1}\mathbf{b}_{1}+x_{2}\mathbf{b}_{2}+\ldots+x_{n}\mathbf{b}_{n}=% \sum_{k=1}^{n}x_{k}\mathbf{b}_{k}.

The numbers x1,x2,,xnx_{1},x_{2},\ldots,x_{n} are called the coordinates of the vector 𝐯\mathbf{v} in the basis \mathcal{B}. It is convenient to join these coordinates into the so-called coordinate vector of 𝐯\mathbf{v} relative to the basis \mathcal{B}, which is the column vector

[𝐯]:=(x1x2xn)𝔽n.[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}:=\left(\begin{array}[]{c}x_{1}\\ {x}_{2}\\ \vdots\\ {x}_{n}\end{array}\right)\in\mathbb{F}^{n}.

Note that the mapping

𝐯[𝐯]\mathbf{v}\mapsto[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}

is an isomorphism between VV and 𝔽n\mathbb{F}^{n}. It transforms the basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} to the standard basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} in 𝔽n\mathbb{F}^{n}.

2.8.2. Matrix of a linear transformation.

Let T:VWT:V\to W be a linear transformation, and let 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\}, :={𝐛1,𝐛2,,𝐛m}\mathcal{B}:=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{m}\} be bases in VV and WW respectively.

A matrix of the transformation TT in (or with respect to) the bases 𝒜\mathcal{A} and \mathcal{B} is an m×nm\times n matrix, denoted by [T]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}, which relates the coordinate vectors [T𝐯][T\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}} and [𝐯]𝒜[\mathbf{v}]_{{}_{\scriptstyle\mathcal{A}}},

[T𝐯]=[T]𝒜[𝐯]𝒜;[T\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}=[T]_{{}_{\scriptstyle\mathcal{B}% \mathcal{A}}}[\mathbf{v}]_{{}_{\scriptstyle\mathcal{A}}};

notice the balance of symbols 𝒜\mathcal{A} and \mathcal{B} here: this is the reason we put the first basis 𝒜\mathcal{A} into the second position.

The matrix [T]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} is easy to find: its kkth column is just the coordinate vector [T𝐚k][T\mathbf{a}_{k}]_{{}_{\scriptstyle\mathcal{B}}} (compare this with finding the matrix of a linear transformation from 𝔽n\mathbb{F}^{n} to 𝔽m\mathbb{F}^{m}).
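As a concrete illustration (our addition; the operator and the code are not from the text), the sketch below assembles, column by column, the matrix of the differentiation operator T=d/dxT=d/dx on the space of polynomials of degree at most 22, using the basis {1,x,x2}\{1,x,x^{2}\} for both the domain and the target.

```python
from sympy import Matrix, S, symbols, diff

x = symbols('x')
basis = [S.One, x, x**2]  # the basis used for both inputs and outputs

def coords(p):
    """Coordinate vector of a polynomial of degree <= 2 in the basis 1, x, x^2."""
    p = p.expand()
    return Matrix([p.coeff(x, k) for k in range(3)])

# the k-th column of the matrix is the coordinate vector of T(b_k) = b_k'
T = Matrix.hstack(*[coords(diff(b, x)) for b in basis])
print(T)  # Matrix([[0, 1, 0], [0, 0, 2], [0, 0, 0]])
```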

As in the case of standard bases, composition of linear transformations is equivalent to multiplication of their matrices: one only has to be a bit more careful about bases. Namely, let T1:XYT_{1}:X\to Y and T2:YZT_{2}:Y\to Z be linear transformations, and let 𝒜,\mathcal{A},\mathcal{B} and 𝒞\mathcal{C} be bases in XX, YY and ZZ respectively. Then for the composition T=T2T1T=T_{2}T_{1},

T:XZ,T𝐱:=T2(T1(𝐱))T:X\to Z,\qquad T\mathbf{x}:=T_{2}(T_{1}(\mathbf{x}))

we have

(2.8.1) [T]𝒞𝒜=[T2T1]𝒞𝒜=[T2]𝒞[T1]𝒜[T]_{{}_{\scriptstyle\mathcal{C}\mathcal{A}}}=[T_{2}T_{1}]_{{}_{\scriptstyle% \mathcal{C}\mathcal{A}}}=[T_{2}]_{{}_{\scriptstyle\mathcal{C}\mathcal{B}}}[T_{% 1}]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}

(notice again the balance of indices here).

The proof here goes exactly as in the case of 𝔽n\mathbb{F}^{n} spaces with standard bases, so we do not repeat it here. Another possibility is to transfer everything to the spaces 𝔽n\mathbb{F}^{n} via the coordinate isomorphisms 𝐯[𝐯]\mathbf{v}\mapsto[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}. Then one does not need any proof: everything follows from the results about matrix multiplication.

2.8.3. Change of coordinate matrix.

Let us have two bases 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\linebreak\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} and ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} in a vector space VV. Consider the identity transformation I=IVI=I_{V} and its matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} in these bases. By the definition

[𝐯]=[I]𝒜[𝐯]𝒜,𝐯V,[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}=[I]_{{}_{\scriptstyle\mathcal{B}% \mathcal{A}}}[\mathbf{v}]_{{}_{\scriptstyle\mathcal{A}}},\qquad\forall\mathbf{% v}\in V,

i.e. for any vector 𝐯V\mathbf{v}\in V the matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} transforms its coordinates in the basis 𝒜\mathcal{A} into coordinates in the basis \mathcal{B}. The matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} is often called the change of coordinates (from the basis 𝒜\mathcal{A} to the basis \mathcal{B}) matrix.

The matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} is easy to compute: according to the general rule of finding the matrix of a linear transformation, its kkth column is the coordinate representation [𝐚k][\mathbf{a}_{k}]_{{}_{\scriptstyle\mathcal{B}}} of the kkth element of the basis 𝒜\mathcal{A}.

Note that

[I]𝒜=([I]𝒜)1,[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}=([I]_{{}_{\scriptstyle\mathcal{B% }\mathcal{A}}})^{-1},

(this follows immediately from the rule (2.8.1) for the multiplication of matrices), so any change of coordinates matrix is invertible.

An example: change of coordinates from the standard basis

Let our space VV be 𝔽n\mathbb{F}^{n}, and let us have a basis ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} there. We also have the standard basis 𝒮={𝐞1,𝐞2,,𝐞n}\mathcal{S}=\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}\} there. The change of coordinates matrix [I]𝒮[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}} is easy to compute:

[I]𝒮=[𝐛1,𝐛2,,𝐛n]=:B,[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=[\mathbf{b}_{1},\mathbf{b}_{2},% \ldots,\mathbf{b}_{n}]=:B,

i.e. it is just the matrix BB whose kkth column is the vector (column) 𝐛k\mathbf{b}_{k}. And in the other direction

[I]𝒮=([I]𝒮)1=B1.[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{S}}}=([I]_{{}_{\scriptstyle\mathcal{S% }\mathcal{B}}})^{-1}=B^{-1}.

For example, consider a basis

={(12),(21)}\mathcal{B}=\left\{\left(\begin{array}[]{c}1\\ 2\end{array}\right),\quad\left(\begin{array}[]{c}2\\ 1\end{array}\right)\right\}

in 𝔽2\mathbb{F}^{2}, and let 𝒮\mathcal{S} denote the standard basis there. Then

[I]𝒮=(1221)=:B[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=\left(\begin{array}[]{cc}1&2\\ 2&1\end{array}\right)=:B

and

[I]𝒮=[I]𝒮1=B1=13(1221)[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{S}}}=[I]_{{}_{\scriptstyle\mathcal{S}% \mathcal{B}}}^{-1}=B^{-1}=\frac{1}{3}\left(\begin{array}[]{rr}-1&2\\ 2&-1\end{array}\right)

(we know how to compute inverses, and it is also easy to check that the above matrix is indeed the inverse of BB).
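In coordinates, passing between the two bases is now a one-line computation. A small sketch (our addition, using SymPy), with a vector chosen just for illustration:

```python
from sympy import Matrix

B = Matrix([[1, 2],
            [2, 1]])      # [I]_{SB}: the columns are the basis vectors b_1, b_2
v = Matrix([3, 3])        # a vector written in standard coordinates

coords_B = B.inv() * v    # [v]_B = [I]_{BS} [v]_S
print(coords_B)           # Matrix([[1], [1]]): indeed v = 1*b_1 + 1*b_2
assert B * coords_B == v  # and back to the standard coordinates
```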

An example: going through the standard basis

In the space of polynomials of degree at most 11 we have bases

𝒜={1,1+x},and={1+2x,12x},\mathcal{A}=\{1,1+x\},\quad\text{and}\quad\mathcal{B}=\{1+2x,1-2x\},

and we want to find the change of coordinate matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}.

Of course, we can always take vectors from the basis 𝒜\mathcal{A} and try to decompose them in the basis \mathcal{B}; it involves solving linear systems, and we know how to do that.

However, I think the following way is simpler. In 1\mathbb{P}_{1} we also have the standard basis 𝒮={1,x}\mathcal{S}=\{1,x\}, and for this basis

[I]𝒮𝒜=(1101)=:A,[I]𝒮=(1122)=:B,[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{A}}}=\left(\begin{array}[]{rr}1&1\\ 0&1\end{array}\right)=:A,\qquad[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=% \left(\begin{array}[]{rr}1&1\\ 2&-2\end{array}\right)=:B,

and taking the inverses

[I]𝒜𝒮=A1=(1101),[I]𝒮=B1=14(2121).[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{S}}}=A^{-1}=\left(\begin{array}[]{rr}% 1&-1\\ 0&1\end{array}\right),\qquad[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{S}}}=B^{-% 1}=\frac{1}{4}\left(\begin{array}[]{rr}2&1\\ 2&-1\end{array}\right).

Then

[I]𝒜=[I]𝒮[I]𝒮𝒜=B1A=14(2121)(1101)[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}=[I]_{{}_{\scriptstyle\mathcal{B}% \mathcal{S}}}[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{A}}}=B^{-1}A=\frac{1}{4}% \left(\begin{array}[]{rr}2&1\\ 2&-1\end{array}\right)\left(\begin{array}[]{rr}1&1\\ 0&1\end{array}\right)

and

[I]𝒜=[I]𝒜𝒮[I]𝒮=A1B=(1101)(1122);[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}=[I]_{{}_{\scriptstyle\mathcal{A}% \mathcal{S}}}[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=A^{-1}B=\left(% \begin{array}[]{rr}1&-1\\ 0&1\end{array}\right)\left(\begin{array}[]{rr}1&1\\ 2&-2\end{array}\right);

the reader should notice the balance of indices in the above formulas.
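The two products above are easy to evaluate; a short sketch (our addition, using SymPy):

```python
from sympy import Matrix

A = Matrix([[1, 1], [0, 1]])   # [I]_{SA}
B = Matrix([[1, 1], [2, -2]])  # [I]_{SB}

print(B.inv() * A)  # [I]_{BA} = Matrix([[1/2, 3/4], [1/2, 1/4]])
print(A.inv() * B)  # [I]_{AB} = Matrix([[-1, 3], [2, -2]])

# the two change of coordinates matrices are mutually inverse
assert (B.inv() * A) * (A.inv() * B) == Matrix.eye(2)
```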

2.8.4. Matrix of a transformation and change of coordinates.

Let T:VWT:V\to W be a linear transformation, and let 𝒜\mathcal{A}, 𝒜~\widetilde{\mathcal{A}} be two bases in VV and let \mathcal{B}, ~\widetilde{\mathcal{B}} be two bases in WW. Suppose we know the matrix [T]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}, and we would like to find the matrix representation with respect to new bases 𝒜~\widetilde{\mathcal{A}}, ~\widetilde{\mathcal{B}}, i.e. the matrix [T]~𝒜~[T]_{{}_{\scriptstyle\widetilde{\mathcal{B}}\widetilde{\mathcal{A}}}}. The rule is very simple:

to get the matrix in the “new” bases one has to left and right multiply the matrix in the “old” bases by change of coordinates matrices.

I did not mention here which change of coordinate matrix should go where, because we don’t have any choice if we follow the balance of indices rule. Namely, matrix representation of a linear transformation changes according to the formula

[T]~𝒜~=[I]~[T]𝒜[I]𝒜𝒜~;\vphantom{T^{\mathcal{B}}}[T]_{{}_{\scriptstyle\widetilde{\mathcal{B}}% \widetilde{\mathcal{A}}}}=[I]_{{}_{\scriptstyle\widetilde{\mathcal{B}}\mathcal% {B}}}[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}[I]_{{}_{\scriptstyle% \mathcal{A}\widetilde{\mathcal{A}}}}\ ;

notice the balance of indices here. The proof can be done just by analyzing what each of the matrices does.

2.8.5. Case of one basis: similar matrices

Let VV be a vector space and let 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} be a basis in VV. Consider a linear transformation T:VVT:V\to V and let [T]𝒜𝒜[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}} be its matrix in this basis (we use the same basis for “inputs” and “outputs”).

The case when we use the same basis for “inputs” and “outputs” is very important (because in this case we can multiply a matrix by itself), so let us study it a bit more carefully. Notice that very often in this case the shorter notation [T]𝒜[T]_{{}_{\scriptstyle\mathcal{A}}} is used instead of [T]𝒜𝒜[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}}. However, the two index notation [T]𝒜𝒜[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}} is better adapted to the balance of indices rule, so I recommend using it (or at least always keeping it in mind) when doing a change of coordinates.

Let ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} be another basis in VV. By the change of coordinate rule above

[T]=[I]𝒜[T]𝒜𝒜[I]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{B}}}=[I]_{{}_{\scriptstyle\mathcal{B}% \mathcal{A}}}[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}}[I]_{{}_{% \scriptstyle\mathcal{A}\mathcal{B}}}

Recalling that

[I]𝒜=[I]𝒜1[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}=[I]_{{}_{\scriptstyle\mathcal{A}% \mathcal{B}}}^{-1}

and denoting Q:=[I]𝒜Q:=[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}, we can rewrite the above formula as

[T]=Q1[T]𝒜𝒜Q.[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{B}}}=Q^{-1}[T]_{{}_{\scriptstyle% \mathcal{A}\mathcal{A}}}Q.

This gives a motivation for the following definition.

Definition 2.8.1.

We say that a matrix AA is similar to a matrix BB if there exists an invertible matrix QQ such that A=Q1BQA=Q^{-1}BQ.

Since an invertible matrix must be square, it follows by counting dimensions that similar matrices AA and BB have to be square and of the same size. If AA is similar to BB, i.e. if A=Q1BQA=Q^{-1}BQ, then

B=QAQ1=(Q1)1A(Q1)B=QAQ^{-1}=(Q^{-1})^{-1}A(Q^{-1})

(since Q1Q^{-1} is invertible), therefore BB is similar to AA. So, we can just say that AA and BB are similar.

The above reasoning shows that it does not matter where to put QQ and where Q1Q^{-1}: one can use the formula A=QBQ1A=QBQ^{-1} in the definition of similarity.

The above discussion shows that one can treat similar matrices as different matrix representations of the same linear operator (transformation).
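A tiny sketch of this (our addition; the operator and the bases are invented for illustration): conjugating the matrix of TT by the change of coordinates matrix QQ produces the matrix of the same operator in another basis, which here happens to be diagonal.

```python
from sympy import Matrix

M = Matrix([[2, 1],
            [0, 3]])    # [T]_{AA}, the matrix of T in the basis A
Q = Matrix([[1, 1],
            [0, 1]])    # Q = [I]_{AB}: columns are the B-vectors in A-coordinates

print(Q.inv() * M * Q)  # [T]_{BB} = Matrix([[2, 0], [0, 3]])
# M and diag(2, 3) are similar: the same operator in two different bases
```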

Exercises.

2.8.1.

True or false

  1. a)

    Every change of coordinate matrix is square;

  2. b)

    Every change of coordinate matrix is invertible;

  3. c)

    The matrices AA and BB are called similar if B=QTAQB=Q^{T}AQ for some matrix QQ;

  4. d)

    The matrices AA and BB are called similar if B=Q1AQB=Q^{-1}AQ for some matrix QQ;

  5. e)

    Similar matrices do not need to be square.

2.8.2.

Consider the system of vectors

(1,2,1,1)T,(0,1,3,1)T,(0,3,2,0)T,(0,1,0,0)T.(1,2,1,1)^{T},\quad(0,1,3,1)^{T},\quad(0,3,2,0)^{T},\quad(0,1,0,0)^{T}.
  1. a)

Prove that it is a basis in 𝔽4\mathbb{F}^{4}. Try to do a minimal amount of computation.

  2. b)

    Find the change of coordinate matrix that changes the coordinates in this basis to the standard coordinates in 𝔽4\mathbb{F}^{4} (i.e. to the coordinates in the standard basis 𝐞1,,𝐞4\mathbf{e}_{1},\ldots,\mathbf{e}_{4}).

2.8.3.

Find the change of coordinates matrix that changes the coordinates in the basis 1,1+t1,1+t in 1\mathbb{P}_{1} to the coordinates in the basis 1t,2t1-t,2t.

2.8.4.

Let TT be the linear operator in 𝔽2\mathbb{F}^{2} defined (in the standard coordinates) by

T(xy)=(3x+yx2y)T\left(\begin{array}[]{c}x\\ y\end{array}\right)=\left(\begin{array}[]{c}3x+y\\ x-2y\end{array}\right)

Find the matrix of TT in the standard basis and in the basis

(1,1)T,(1,2)T.(1,1)^{T},\qquad(1,2)^{T}.
2.8.5.

Prove that if AA and BB are similar matrices, then traceA=traceB\operatorname{trace}A=\operatorname{trace}B. Hint: recall how trace(XY)\operatorname{trace}(XY) and trace(YX)\operatorname{trace}(YX) are related.

2.8.6.

Are the matrices

(1322)and(0242)\left(\begin{array}[]{cc}1&3\\ 2&2\end{array}\right)\qquad\text{and}\qquad\left(\begin{array}[]{cc}0&2\\ 4&2\end{array}\right)

similar? Justify.