Chapter 2. Systems of linear equations

2.1. Different faces of linear systems.

There exist several points of view on what a system of linear equations, or, in short, a linear system, is. The first, naïve one is that it is simply a collection of $m$ linear equations with $n$ unknowns $x_1, x_2, \ldots, x_n$,

\[
\left\{\begin{array}{ccccccccc}
a_{1,1}x_1 & + & a_{1,2}x_2 & + & \ldots & + & a_{1,n}x_n & = & b_1 \\
a_{2,1}x_1 & + & a_{2,2}x_2 & + & \ldots & + & a_{2,n}x_n & = & b_2 \\
 & & & & \vdots & & & & \\
a_{m,1}x_1 & + & a_{m,2}x_2 & + & \ldots & + & a_{m,n}x_n & = & b_m.
\end{array}\right.
\]

To solve the system is to find all $n$-tuples of numbers $x_1, x_2, \ldots, x_n$ which satisfy all $m$ equations simultaneously.

If we denote $\mathbf{x} := (x_1, x_2, \ldots, x_n)^T \in \mathbb{F}^n$, $\mathbf{b} = (b_1, b_2, \ldots, b_m)^T \in \mathbb{F}^m$, and

\[
A = \left(\begin{array}{cccc}
a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\
a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\
\vdots & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & \ldots & a_{m,n}
\end{array}\right),
\]

then the above linear system can be written in matrix form (as a matrix-vector equation)

\[
A\mathbf{x} = \mathbf{b}.
\]

To solve the above equation is to find all vectors $\mathbf{x} \in \mathbb{F}^n$ satisfying $A\mathbf{x} = \mathbf{b}$.

And finally, recalling the “column by coordinate” rule of matrix-vector multiplication, we can write the system as a vector equation

\[
x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \ldots + x_n\mathbf{a}_n = \mathbf{b},
\]

where $\mathbf{a}_k$ is the $k$th column of the matrix $A$, $\mathbf{a}_k = (a_{1,k}, a_{2,k}, \ldots, a_{m,k})^T$, $k = 1, 2, \ldots, n$.

Note that these three forms are essentially just different representations of the same mathematical object.
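As a quick illustration, here is a minimal numeric sketch (in Python with numpy; the matrix and the vector of unknowns are made up for illustration) showing that the matrix form and the vector-equation form are the same computation:

import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [2., 1., 2.]])
x = np.array([1., 3., -2.])                # some vector of "unknowns"

lhs_matrix = A @ x                         # matrix form: A x
lhs_columns = sum(x[k] * A[:, k] for k in range(3))   # x_1 a_1 + x_2 a_2 + x_3 a_3

print(np.allclose(lhs_matrix, lhs_columns))   # True: two faces of one object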

Before explaining how to solve a linear system, let us note that it does not matter what we call the unknowns: $x_k$, $y_k$, or something else. So, all the information necessary to solve the system is contained in the matrix $A$, which is called the coefficient matrix of the system, and in the vector $\mathbf{b}$ (the right side). Hence, all the information we need is contained in the following matrix

\[
\left(\begin{array}{cccc|c}
a_{1,1} & a_{1,2} & \ldots & a_{1,n} & b_1 \\
a_{2,1} & a_{2,2} & \ldots & a_{2,n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m,1} & a_{m,2} & \ldots & a_{m,n} & b_m
\end{array}\right)
\]

which is obtained by attaching the column $\mathbf{b}$ to the matrix $A$. This matrix is called the augmented matrix of the system. We will usually put a vertical line separating $A$ and $\mathbf{b}$ to distinguish between the augmented matrix and the coefficient matrix.

2.2. Solution of a linear system. Echelon and reduced echelon forms

Linear systems are solved by Gauss–Jordan elimination (sometimes called row reduction). By performing operations on the rows of the augmented matrix of the system (i.e. on the equations), we reduce it to a simple form, the so-called echelon form. When the system is in echelon form, one can easily write down the solution.

2.2.1. Row operations.

There are three types of row operations we use:

  1. Row exchange: interchange two rows of the matrix;

  2. Scaling: multiply a row by a non-zero scalar $a$;

  3. Row replacement: replace a row # $k$ by its sum with a constant multiple of a row # $j$; all other rows remain intact.

It is clear that the operations 1 and 2 do not change the set of solutions of the system; they essentially do not change the system.

As for the operation 3, one can easily see that it does not lose solutions. Namely, let a “new” system be obtained from an “old” one by a row operation of type 3. Then any solution of the “old” system is a solution of the “new” one.

To see that we do not gain anything extra, i.e. that any solution of the “new” system is also a solution of the “old” one, we just notice that row operations of type 3 are reversible, i.e. the “old” system can also be obtained from the “new” one by applying a row operation of type 3 (can you say which one?).
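Below is a minimal sketch of the three row operations in Python (numpy); the function names are ours, introduced only for illustration. Note that row_replace(A, k, j, -a) undoes row_replace(A, k, j, a), which is exactly the reversibility used above.

import numpy as np

def row_exchange(A, j, k):
    """Type 1: interchange rows j and k (0-based indices)."""
    A = A.copy()
    A[[j, k]] = A[[k, j]]
    return A

def row_scale(A, k, a):
    """Type 2: multiply row k by a non-zero scalar a."""
    assert a != 0
    A = A.copy()
    A[k] = a * A[k]
    return A

def row_replace(A, k, j, a):
    """Type 3: add a times row j to row k; all other rows stay intact."""
    A = A.copy()
    A[k] = A[k] + a * A[j]
    return A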

Row operations and multiplication by elementary matrices

There is another, more “advanced” explanation why the above row operations are legal. Namely, every row operation is equivalent to the multiplication of the matrix from the left by one of the special elementary matrices.

Namely, the multiplication by the matrix

\[
\begin{array}{c} \\ \\ j \\ \\ k \\ \\ \end{array}
\left(\begin{array}{ccccccc}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 0 & \ldots & 1 & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & 1 & \ldots & 0 & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{array}\right)
\]

(rows $j$ and $k$ are marked; all non-marked off-diagonal entries are $0$) just interchanges the rows number $j$ and number $k$. Multiplication by the matrix

\[
\begin{array}{c} \\ \\ k \\ \\ \\ \end{array}
\left(\begin{array}{ccccc}
1 & & & & \\
 & \ddots & & & \\
 & & a & & \\
 & & & \ddots & \\
 & & & & 1
\end{array}\right)
\]

(the entry $a$ stands in row $k$ on the main diagonal; again all non-marked off-diagonal entries are $0$) multiplies the row number $k$ by $a$. Finally, multiplication by the matrix

\[
\begin{array}{c} \\ j \\ \\ k \\ \\ \end{array}
\left(\begin{array}{cccccc}
1 & & & & & \\
 & 1 & & & & \\
 & \vdots & \ddots & & & \\
 & a & \ldots & 1 & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{array}\right)
\]

(the entry $a$ stands in row $k$, column $j$) adds to the row # $k$ the row # $j$ multiplied by $a$, and leaves all other rows intact.

To see that the multiplication by these matrices works as advertised, one can just see how the multiplications act on vectors (columns).

A way to describe (or to remember) these elementary matrices: each of them is obtained by applying the corresponding row operation to the identity matrix $I$.

Note that all these matrices are invertible (compare with the reversibility of row operations). The inverse of the first matrix is the matrix itself. To get the inverse of the second one, one just replaces $a$ by $1/a$. And finally, the inverse of the third matrix is obtained by replacing $a$ by $-a$. To see that the inverses are indeed obtained this way, one again can simply check how they act on columns.
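A minimal numeric check of these claims (Python/numpy; the size, the rows and the scalar are arbitrary illustrations):

import numpy as np

n, j, k, a = 4, 1, 2, 5.0

# each elementary matrix is the row operation applied to the identity
E_exch = np.eye(n); E_exch[[j, k]] = E_exch[[k, j]]     # type 1
E_scale = np.eye(n); E_scale[k, k] = a                  # type 2
E_repl = np.eye(n); E_repl[k, j] = a                    # type 3

A = np.arange(16.).reshape(n, n)                        # any test matrix
B = A.copy(); B[k] = B[k] + a * B[j]
print(np.allclose(E_repl @ A, B))                       # type 3 acts as advertised

print(np.allclose(E_exch @ E_exch, np.eye(n)))          # its own inverse
E_scale_inv = np.eye(n); E_scale_inv[k, k] = 1 / a
print(np.allclose(E_scale @ E_scale_inv, np.eye(n)))    # replace a by 1/a
E_repl_inv = np.eye(n); E_repl_inv[k, j] = -a
print(np.allclose(E_repl @ E_repl_inv, np.eye(n)))      # replace a by -a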

So, performing a row operation on the augmented matrix of the system $A\mathbf{x}=\mathbf{b}$ is equivalent to the multiplication of the system (from the left) by a special invertible matrix $E$. Left multiplying the equality $A\mathbf{x}=\mathbf{b}$ by $E$ we get that any solution of the equation

\[
A\mathbf{x} = \mathbf{b}
\]

is also a solution of

\[
EA\mathbf{x} = E\mathbf{b}.
\]

Multiplying this equation (from the left) by $E^{-1}$ we get that any of its solutions is a solution of the equation

\[
E^{-1}EA\mathbf{x} = E^{-1}E\mathbf{b},
\]

which is the original equation $A\mathbf{x}=\mathbf{b}$. So, a row operation does not change the solution set of a system.

2.2.2. Row reduction.

The main step of row reduction consists of three sub-steps:

  1. Find the leftmost non-zero column of the matrix;

  2. Make sure, by applying row operations of type 1 (row exchange) if necessary, that the first (the upper) entry of this column is non-zero. This entry will be called the pivot entry or simply the pivot;

  3. “Kill” (i.e. make them $0$) all non-zero entries below the pivot by adding (subtracting) an appropriate multiple of the first row from the rows number $2, 3, \ldots, m$.

We apply the main step to a matrix, then we leave the first row alone and apply the main step to rows $2, \ldots, m$, then to rows $3, \ldots, m$, etc.

The point to remember is that after we subtract a multiple of a row from all rows below it (step 3), we leave it alone and do not change it in any way, not even interchange it with another row.

After applying the main step finitely many times (at most $m$), we get what is called the echelon form of the matrix.
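The main step iterated over the rows is easy to express in code; here is a minimal sketch in Python (numpy), using floating-point arithmetic and a simple non-zero test (the function name and the tolerance are our illustration). Running it on the augmented matrix of the example below reproduces the computation that follows.

import numpy as np

def echelon_form(M, tol=1e-12):
    """Forward phase of row reduction: return an echelon form of M."""
    A = M.astype(float).copy()
    m, n = A.shape
    row = 0
    for col in range(n):                       # leftmost columns first
        # find a row at or below `row` with a non-zero entry in column `col`
        piv = next((r for r in range(row, m) if abs(A[r, col]) > tol), None)
        if piv is None:
            continue                           # zero column: move right
        A[[row, piv]] = A[[piv, row]]          # type 1: put the pivot on top
        for r in range(row + 1, m):            # type 3: kill entries below the pivot
            A[r] -= (A[r, col] / A[row, col]) * A[row]
        row += 1                               # leave the pivot row alone from now on
        if row == m:
            break
    return A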

An example of row reduction.

Let us consider the following linear system:

\[
\left\{\begin{array}{l}
x_1 + 2x_2 + 3x_3 = 1 \\
3x_1 + 2x_2 + x_3 = 7 \\
2x_1 + x_2 + 2x_3 = 1
\end{array}\right.
\]

The augmented matrix of the system is

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
3 & 2 & 1 & 7 \\
2 & 1 & 2 & 1
\end{array}\right)
\]

Subtracting $3\cdot\text{Row 1}$ from the second row, and subtracting $2\cdot\text{Row 1}$ from the third one, we get:

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
3 & 2 & 1 & 7 \\
2 & 1 & 2 & 1
\end{array}\right)
\begin{array}{c} \\ -3R_1 \\ -2R_1 \end{array}
\qquad\sim\qquad
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & -4 & -8 & 4 \\
0 & -3 & -4 & -1
\end{array}\right)
\]

Multiplying the second row by $-1/4$ we get

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & -3 & -4 & -1
\end{array}\right)
\]

Adding $3\cdot\text{Row 2}$ to the third row we obtain

\[
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & -3 & -4 & -1
\end{array}\right)
\begin{array}{c} \\ \\ +3R_2 \end{array}
\qquad\sim\qquad
\left(\begin{array}{rrr|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & 0 & 2 & -4
\end{array}\right)
\]

Now we can use the so-called back substitution to solve the system. Namely, from the last row (equation) we get $x_3 = -2$. Then from the second equation we get

\[
x_2 = -1 - 2x_3 = -1 - 2(-2) = 3,
\]

and finally, from the first row (equation)

\[
x_1 = 1 - 2x_2 - 3x_3 = 1 - 6 + 6 = 1.
\]

So, the solution is

\[
\left\{\begin{array}{l}
x_1 = 1, \\
x_2 = 3, \\
x_3 = -2,
\end{array}\right.
\]

or in vector form

\[
\mathbf{x} = \left(\begin{array}{r} 1 \\ 3 \\ -2 \end{array}\right)
\]

or $\mathbf{x} = (1, 3, -2)^T$. We can check the solution by computing $A\mathbf{x}$, where $A$ is the coefficient matrix.

Instead of using back substitution, we can do row reduction from bottom to top, killing all the entries above the main diagonal of the coefficient matrix: we start by multiplying the last row by $1/2$, and the rest is pretty self-explanatory:

\[
\left(\begin{array}{ccc|r}
1 & 2 & 3 & 1 \\
0 & 1 & 2 & -1 \\
0 & 0 & 1 & -2
\end{array}\right)
\begin{array}{c} -3R_3 \\ -2R_3 \\ \\ \end{array}
\sim
\left(\begin{array}{ccc|r}
1 & 2 & 0 & 7 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & -2
\end{array}\right)
\begin{array}{c} -2R_2 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{ccc|r}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & -2
\end{array}\right)
\]

and we just read the solution $\mathbf{x} = (1, 3, -2)^T$ off the augmented matrix.
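This bottom-to-top reduction is what a computer algebra system's rref (reduced row echelon form) routine produces; a quick check of this example in Python (sympy):

from sympy import Matrix

M = Matrix([[1, 2, 3, 1],
            [3, 2, 1, 7],
            [2, 1, 2, 1]])        # the augmented matrix of the system

R, pivot_cols = M.rref()          # reduced echelon form and pivot column indices
print(R)                          # Matrix([[1, 0, 0, 1], [0, 1, 0, 3], [0, 0, 1, -2]])
print(pivot_cols)                 # (0, 1, 2)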

We leave it as an exercise to the reader to formulate the algorithm for the backward phase of the row reduction.

2.2.3. Echelon form

A matrix is in echelon form if it satisfies the following two conditions:

  1. All zero rows (i.e. the rows with all entries equal to $0$), if any, are below all non-zero rows.

For a non-zero row, let us call the leftmost non-zero entry the leading entry. Then the second property of the echelon form can be formulated as follows:

  2. For any non-zero row, its leading entry is strictly to the right of the leading entry in the previous row.

The leading entry in each row of a matrix in echelon form is also called a pivot entry, or simply a pivot, because these entries are exactly the pivots we used in the row reduction.

A particular case of the echelon form is the so-called triangular form. We got this form in our example above. In this form the coefficient matrix is square ($n\times n$), all its entries on the main diagonal are non-zero, and all the entries below the main diagonal are zero. The right side, i.e. the rightmost column of the augmented matrix, can be arbitrary.

After the backward phase of the row reduction, we get the so-called reduced echelon form of the matrix: a coefficient matrix equal to $I$, as in the above example, is a particular case of the reduced echelon form.

The general definition is as follows: we say that a matrix is in the reduced echelon form if it is in the echelon form and

  3. All pivot entries are equal to $1$;

  4. All entries above the pivots are $0$. Note that all entries below the pivots are also $0$ because of the echelon form.

To get reduced echelon form from echelon form, we work from the bottom to the top and from the right to the left, using row replacement to kill all entries above the pivots.

An example of the reduced echelon form is a system with the coefficient matrix equal to $I$. In this case, one just reads the solution off the reduced echelon form. In the general case, one can also easily read the solution off the reduced echelon form. For example, let the reduced echelon form of the system (augmented matrix) be

\[
\left(\begin{array}{ccccc|c}
\framebox{$1$} & 2 & 0 & 0 & 0 & 1 \\
0 & 0 & \framebox{$1$} & 5 & 0 & 2 \\
0 & 0 & 0 & 0 & \framebox{$1$} & 3
\end{array}\right);
\]

here we boxed the pivots. The idea is to move the variables corresponding to the columns without a pivot (the so-called free variables) to the right side. Then we can just write the solution:

\[
\left\{\begin{array}{l}
x_1 = 1 - 2x_2, \\
x_2 \text{ is free}, \\
x_3 = 2 - 5x_4, \\
x_4 \text{ is free}, \\
x_5 = 3,
\end{array}\right.
\]

or in the vector form

\[
\mathbf{x} = \left(\begin{array}{c} 1 - 2x_2 \\ x_2 \\ 2 - 5x_4 \\ x_4 \\ 3 \end{array}\right)
= \left(\begin{array}{c} 1 \\ 0 \\ 2 \\ 0 \\ 3 \end{array}\right)
+ x_2 \left(\begin{array}{c} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{array}\right)
+ x_4 \left(\begin{array}{c} 0 \\ 0 \\ -5 \\ 1 \\ 0 \end{array}\right),
\qquad x_2, x_4 \in \mathbb{F}.
\]

One can also find the solution from the echelon form by using back substitution: the idea is to work from bottom to top, moving all free variables to the right side.
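Reading the general solution off the reduced echelon form is also what sympy's linsolve does; a small sketch on the example above (the variable names are ours):

from sympy import Matrix, linsolve, symbols

M = Matrix([[1, 2, 0, 0, 0, 1],
            [0, 0, 1, 5, 0, 2],
            [0, 0, 0, 0, 1, 3]])      # the reduced echelon form displayed above
A, b = M[:, :-1], M[:, -1]

x1, x2, x3, x4, x5 = symbols('x1:6')
print(linsolve((A, b), [x1, x2, x3, x4, x5]))
# {(1 - 2*x2, x2, 2 - 5*x4, x4, 3)} -- x2 and x4 remain as free parameters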

Exercises.

2.2.1.

Write the systems of equations below in matrix form and as vector equations:

  a)
\[
\left\{\begin{array}{l}
x_1 + 2x_2 - x_3 = -1 \\
2x_1 + 2x_2 + x_3 = 1 \\
3x_1 + 5x_2 - 2x_3 = -1
\end{array}\right.
\]

  b)
\[
\left\{\begin{array}{l}
x_1 - 2x_2 - x_3 = 1 \\
2x_1 - 3x_2 + x_3 = 6 \\
3x_1 - 5x_2 = 7 \\
x_1 + 5x_3 = 9
\end{array}\right.
\]

  c)
\[
\left\{\begin{array}{l}
x_1 + 2x_2 + 2x_4 = 6 \\
3x_1 + 5x_2 - x_3 + 6x_4 = 17 \\
2x_1 + 4x_2 + x_3 + 2x_4 = 12 \\
2x_1 - 7x_3 + 11x_4 = 7
\end{array}\right.
\]

  d)
\[
\left\{\begin{array}{l}
x_1 - 4x_2 - x_3 + x_4 = 3 \\
2x_1 - 8x_2 + x_3 - 4x_4 = 9 \\
-x_1 + 4x_2 - 2x_3 + 5x_4 = -6
\end{array}\right.
\]

  e)
\[
\left\{\begin{array}{l}
x_1 + 2x_2 - x_3 + 3x_4 = 2 \\
2x_1 + 4x_2 - x_3 + 6x_4 = 5 \\
x_2 + 2x_4 = 3
\end{array}\right.
\]

  f)
\[
\left\{\begin{array}{l}
2x_1 - 2x_2 - x_3 + 6x_4 - 2x_5 = 1 \\
x_1 - x_2 + x_3 + 2x_4 - x_5 = 2 \\
4x_1 - 4x_2 + 5x_3 + 7x_4 - x_5 = 6
\end{array}\right.
\]

  g)
\[
\left\{\begin{array}{l}
3x_1 - x_2 + x_3 - x_4 + 2x_5 = 5 \\
x_1 - x_2 - x_3 - 2x_4 - x_5 = 2 \\
5x_1 - 2x_2 + x_3 - 3x_4 + 3x_5 = 10 \\
2x_1 - x_2 - 2x_4 + x_5 = 5
\end{array}\right.
\]

Solve the systems. Write the answers in the vector form.

2.2.2.

Find all solutions of the vector equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 = \mathbf{0},
\]

where $\mathbf{v}_1 = (1, 1, 0)^T$, $\mathbf{v}_2 = (0, 1, 1)^T$ and $\mathbf{v}_3 = (1, 0, 1)^T$. What conclusion can you make about the linear independence (dependence) of the system of vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$?

2.3. Analyzing the pivots.

All questions about the existence of a solution and its uniqueness can be answered by analyzing pivots in the echelon (reduced echelon) form of the augmented matrix of the system. First of all, let us investigate when the equation $A\mathbf{x}=\mathbf{b}$ is inconsistent, i.e. when it does not have a solution. The answer follows immediately, if one just thinks about it:

a system is inconsistent (does not have a solution) if and only if there is a pivot in the last column of an echelon form of the augmented matrix, i.e. iff an echelon form of the augmented matrix has a row $\left(\begin{array}{cccc|c} 0 & 0 & \ldots & 0 & b \end{array}\right)$, $b \neq 0$, in it.

Indeed, such a row corresponds to the equation $0x_1 + 0x_2 + \ldots + 0x_n = b \neq 0$, which does not have a solution. If we don't have such a row, we just make the reduced echelon form and then read the solution off it.
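In terms of computations, a pivot appears in the last column of the augmented matrix exactly when attaching $\mathbf{b}$ increases the rank; a small sketch of this test in Python (sympy; the test matrices are made up for illustration):

from sympy import Matrix

def is_consistent(A, b):
    """No pivot in the last augmented column <=> the rank does not jump."""
    return A.rank() == Matrix.hstack(A, b).rank()

A = Matrix([[1, 2],
            [2, 4]])
print(is_consistent(A, Matrix([3, 6])))   # True: b lies in the column space
print(is_consistent(A, Matrix([3, 7])))   # False: echelon form has a row (0 0 | b), b != 0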

Now, three more statements. Note that they all deal with the coefficient matrix, and not with the augmented matrix of the system.

  1. A solution (if it exists) is unique iff there are no free variables, that is, if and only if the echelon form of the coefficient matrix has a pivot in every column;

  2. The equation $A\mathbf{x}=\mathbf{b}$ is consistent for all right sides $\mathbf{b}$ if and only if the echelon form of the coefficient matrix has a pivot in every row;

  3. The equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any right side $\mathbf{b}$ if and only if the echelon form of the coefficient matrix $A$ has a pivot in every column and every row.

The first statement is trivial, because free variables are responsible for all non-uniqueness. I should only emphasize that this statement does not say anything about existence.

The second statement is a tiny bit more complicated. If we have a pivot in every row of the coefficient matrix, we cannot have a pivot in the last column of the augmented matrix, so the system is always consistent, no matter what the right side $\mathbf{b}$ is.

Let us show that if we have a zero row in the echelon form of the coefficient matrix $A$, then we can pick a right side $\mathbf{b}$ such that the system $A\mathbf{x}=\mathbf{b}$ is not consistent. Let $A_e$ be the echelon form of the coefficient matrix $A$. Then

\[
A_e = EA,
\]

where $E$ is the product of elementary matrices corresponding to the row operations, $E = E_N \ldots E_2 E_1$. If $A_e$ has a zero row, then the last row is also zero. Therefore, if we put $\mathbf{b}_e = (0, \ldots, 0, 1)^T$ (all entries are $0$, except the last one), then the equation

\[
A_e\mathbf{x} = \mathbf{b}_e
\]

does not have a solution. Multiplying this equation by $E^{-1}$ from the left, and recalling that $E^{-1}A_e = A$, we get that the equation

\[
A\mathbf{x} = E^{-1}\mathbf{b}_e
\]

does not have a solution.

Finally, statement 3 immediately follows from statements 1 and 2. ∎

From the above analysis of pivots we get several very important corollaries. The main observation we use is

In echelon form, any row and any column contains at most one pivot (it can contain none).

2.3.1. Corollaries about linear independence and bases. Dimension

Questions as to when a system of vectors in $\mathbb{F}^n$ is a basis, a linearly independent system, or a spanning system can be easily answered by row reduction.

Proposition 2.3.1.

Let us have a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$, and let $A = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Then

  1. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is linearly independent iff the echelon form of $A$ has a pivot in every column;

  2. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is complete in $\mathbb{F}^n$ (spanning, generating) iff the echelon form of $A$ has a pivot in every row;

  3. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is a basis in $\mathbb{F}^n$ iff the echelon form of $A$ has a pivot in every column and in every row.

Proof.

The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is linearly independent if and only if the equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_m\mathbf{v}_m = \mathbf{0}
\]

has the unique (trivial) solution $x_1 = x_2 = \ldots = x_m = 0$, or, equivalently, the equation $A\mathbf{x}=\mathbf{0}$ has the unique solution $\mathbf{x}=\mathbf{0}$. By statement 1 above, this happens if and only if there is a pivot in every column of the echelon form of the matrix.

Similarly, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is complete in $\mathbb{F}^n$ if and only if the equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_m\mathbf{v}_m = \mathbf{b}
\]

has a solution for any right side $\mathbf{b} \in \mathbb{F}^n$. By statement 2 above, this happens if and only if there is a pivot in every row of the echelon form of the matrix.

And finally, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is a basis in $\mathbb{F}^n$ if and only if the equation

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \ldots + x_m\mathbf{v}_m = \mathbf{b}
\]

has a unique solution for any right side $\mathbf{b} \in \mathbb{F}^n$. By statement 3, this happens if and only if there is a pivot in every column and in every row of the echelon form of $A$. ∎
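A compact sketch of these pivot tests in Python (sympy), run on the vectors from Exercise 2.2.2 above:

from sympy import Matrix

v1, v2, v3 = Matrix([1, 1, 0]), Matrix([0, 1, 1]), Matrix([1, 0, 1])
A = Matrix.hstack(v1, v2, v3)      # the n x m matrix with the vectors as columns
n, m = A.shape

R, pivot_cols = A.rref()
print(len(pivot_cols) == m)        # pivot in every column: linearly independent
print(len(pivot_cols) == n)        # pivot in every row: spanning
# both print True here, so v1, v2, v3 form a basis in F^3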

Proposition 2.3.2.

Any linearly independent system of vectors in $\mathbb{F}^n$ cannot have more than $n$ vectors in it.

Proof.

Let a system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ be linearly independent, and let $A = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. By Proposition 2.3.1 the echelon form of $A$ must have a pivot in every column, which is impossible if $m > n$ (the number of pivots cannot be more than the number of rows). ∎

Proposition 2.3.3.

Any two bases in a vector space $V$ have the same number of vectors in them.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$ be two different bases in $V$. Without loss of generality we can assume that $n \leq m$. Consider an isomorphism $A: \mathbb{F}^n \to V$ defined by

\[
A\mathbf{e}_k = \mathbf{v}_k, \qquad k = 1, 2, \ldots, n,
\]

where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard basis in $\mathbb{F}^n$.

Since $A^{-1}$ is also an isomorphism, the system

\[
A^{-1}\mathbf{w}_1, A^{-1}\mathbf{w}_2, \ldots, A^{-1}\mathbf{w}_m
\]

is a basis (see Theorem 1.6.6 in Chapter 1). So it is linearly independent, and by Proposition 2.3.2, $m \leq n$. Together with the assumption $n \leq m$ this implies that $m = n$. ∎

The statement below is a particular case of the above proposition.

Proposition 2.3.4.

Any basis in $\mathbb{F}^n$ must have exactly $n$ vectors in it.

Proof.

This fact follows immediately from the previous proposition, but there is also a direct proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be a basis in $\mathbb{F}^n$ and let $A$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. The fact that the system is a basis means that the equation

\[
A\mathbf{x} = \mathbf{b}
\]

has a unique solution for any (all possible) right side $\mathbf{b}$. The existence means that there is a pivot in every row (of a reduced echelon form of the matrix), hence the number of pivots is exactly $n$. The uniqueness means that there is a pivot in every column of the coefficient matrix (of its echelon form), so

\[
m = \text{number of columns} = \text{number of pivots} = n. \qquad ∎
\]

Proposition 2.3.5.

Any spanning (generating) set in $\mathbb{F}^n$ must have at least $n$ vectors.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be a complete system in $\mathbb{F}^n$, and let $A$ be the $n\times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Statement 2 of Proposition 2.3.1 implies that the echelon form of $A$ has a pivot in every row. Since the number of pivots cannot exceed the number of columns, $n \leq m$. ∎

2.3.2. Corollaries about invertible matrices

Proposition 2.3.6.

A matrix $A$ is invertible if and only if its echelon form has a pivot in every column and every row.

Proof.

As was discussed in the beginning of the section, the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any right side $\mathbf{b}$ if and only if the echelon form of $A$ has a pivot in every row and every column. But we know (see Theorem 1.6.8 in Chapter 1) that the matrix (linear transformation) $A$ is invertible if and only if the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any possible right side $\mathbf{b}$.

There is also an alternative proof. We know that a matrix is invertible if and only if its columns form a basis in $\mathbb{F}^n$ (see Corollary 1.6.9 in Section 1.6.4, Chapter 1). Proposition 2.3.1 above states that this happens if and only if there is a pivot in every row and every column. ∎

The above proposition immediately implies the following

Corollary 2.3.7.

An invertible matrix must be square ($n\times n$).

Proposition 2.3.8.

If a square ($n\times n$) matrix is left invertible, or if it is right invertible, then it is invertible. In other words, to check the invertibility of a square matrix $A$ it is sufficient to check only one of the conditions $AA^{-1}=I$, $A^{-1}A=I$.

Note that this proposition applies only to square matrices!

Proof.

We know that a matrix $A$ is invertible if and only if the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution for any right side $\mathbf{b}$. This happens if and only if the echelon form of the matrix $A$ has pivots in every row and in every column.

If a matrix $A$ is left invertible, the equation $A\mathbf{x}=\mathbf{0}$ has the unique solution $\mathbf{x}=\mathbf{0}$. Indeed, if $B$ is a left inverse of $A$ (i.e. $BA=I$), and $\mathbf{x}$ satisfies

\[
A\mathbf{x} = \mathbf{0},
\]

then multiplying this identity by $B$ from the left we get $\mathbf{x}=\mathbf{0}$, so the solution is unique. Therefore, the echelon form of $A$ has pivots in every column (no free variables). If the matrix $A$ is square ($n\times n$), the echelon form also has pivots in every row ($n$ pivots, and a row can have no more than one pivot), so the matrix is invertible.

If a matrix $A$ is right invertible, and $C$ is its right inverse ($AC=I$), then for $\mathbf{x}=C\mathbf{b}$, $\mathbf{b}\in\mathbb{F}^n$,

\[
A\mathbf{x} = AC\mathbf{b} = I\mathbf{b} = \mathbf{b}.
\]

Therefore, for any right side $\mathbf{b}$ the equation $A\mathbf{x}=\mathbf{b}$ has a solution $\mathbf{x}=C\mathbf{b}$. Thus, the echelon form of $A$ has a pivot in every row. If $A$ is square, it also has a pivot in every column, so $A$ is invertible. ∎

Exercises.

2.3.1.

For what value of $b$ does the system

\[
\left(\begin{array}{ccc}
1 & 2 & 2 \\
2 & 4 & 6 \\
1 & 2 & 3
\end{array}\right)\mathbf{x}
= \left(\begin{array}{c} 1 \\ 4 \\ b \end{array}\right)
\]

have a solution? Find the general solution of the system for this value of $b$.

2.3.2.

Determine whether the vectors

\[
\left(\begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right),\qquad
\left(\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}\right),\qquad
\left(\begin{array}{c} 0 \\ 0 \\ 1 \\ 1 \end{array}\right),\qquad
\left(\begin{array}{c} 0 \\ 1 \\ 0 \\ 1 \end{array}\right)
\]

are linearly independent or not.

Do these four vectors span $\mathbb{R}^4$? (In other words, do they form a generating system?) What about $\mathbb{C}^4$?

2.3.3.

Determine which of the following systems of vectors are bases in $\mathbb{R}^3$:

  a) $(1, 2, -1)^T$, $(1, 0, 2)^T$, $(2, 1, 1)^T$;

  b) $(-1, 3, 2)^T$, $(-3, 1, 3)^T$, $(2, 10, 2)^T$;

  c) $(67, 13, -47)^T$, $(\pi, -7.84, 0)^T$, $(3, 0, 0)^T$.

Which of these systems are bases in $\mathbb{C}^3$?

2.3.4.

Do the polynomials $x^3 + 2x$, $x^2 + x + 1$, $x^3 + 5$ generate (span) $\mathbb{P}_3$? Justify your answer.

2.3.5.

Can 5 vectors in $\mathbb{F}^4$ be linearly independent? Justify your answer.

2.3.6.

Prove or disprove: If the columns of a square ($n\times n$) matrix $A$ are linearly independent, so are the columns of $A^2 = AA$.

2.3.7.

Prove or disprove: If the columns of a square ($n\times n$) matrix $A$ are linearly independent, so are the rows of $A^3 = AAA$.

2.3.8.

Show that if the equation $A\mathbf{x}=\mathbf{0}$ has a unique solution (i.e. if the echelon form of $A$ has a pivot in every column), then $A$ is left invertible. Hint: elementary matrices may help.
Note: It was shown in the text that if $A$ is left invertible, then the equation $A\mathbf{x}=\mathbf{0}$ has a unique solution. But here you are asked to prove the converse of this statement, which was not proved.
Remark: This can be a very hard problem, for it requires deep understanding of the subject. However, when you understand what to do, the problem becomes almost trivial.

2.3.9.

Is the reduced echelon form of a matrix unique? Justify your conclusion.

Namely, suppose that by performing some row operations (not necessarily following any algorithm) we end up with a reduced echelon matrix. Do we always end up with the same matrix, or can we get different ones? Note that we are only allowed to perform row operations; “column operations” are forbidden.

Hint: What happens if you start with an invertible matrix? Also, are the pivots always in the same columns, or can they be in different columns depending on what row operations you perform? If you can tell what the pivot columns are without resorting to row operations, then the positions of the pivot columns do not depend on them.

2.4. Finding $A^{-1}$ by row reduction.

As was discussed above, an invertible matrix must be square, and its echelon form must have pivots in every row and every column. Therefore the reduced echelon form of an invertible matrix is the identity matrix $I$. Thus,

Any invertible matrix is row equivalent (i.e. can be reduced by row operations) to the identity matrix.

Now let us state a simple algorithm for finding the inverse of an $n\times n$ matrix:

  1. Form an augmented $n\times 2n$ matrix $(A\,|\,I)$ by writing the $n\times n$ identity matrix to the right of $A$;

  2. By performing row operations on the augmented matrix, transform $A$ to the identity matrix $I$;

  3. The matrix $I$ that we added will be automatically transformed to $A^{-1}$;

  4. If it is impossible to transform $A$ to the identity by row operations, $A$ is not invertible.

There are several possible explanations of the above algorithm. The first, naïve one is as follows: we know that (for an invertible $A$) the vector $A^{-1}\mathbf{b}$ is the solution of the equation $A\mathbf{x}=\mathbf{b}$. So to find the column number $k$ of $A^{-1}$ we need to find the solution of $A\mathbf{x}=\mathbf{e}_k$, where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard basis in $\mathbb{F}^n$. The above algorithm just solves the equations

\[
A\mathbf{x} = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n
\]

simultaneously!

Let us also present another, more “advanced” explanation. As we discussed above, every row operation can be realized as a left multiplication by an elementary matrix. Let $E_1, E_2, \ldots, E_N$ be the elementary matrices corresponding to the row operations we performed, and let $E = E_N \cdots E_2 E_1$ be their product. (Although it does not matter here, please notice that since the row operation $E_1$ was performed first, $E_1$ must be the rightmost term in the product.) We know that the row operations transform $A$ to the identity, i.e. $EA = I$, so $E = A^{-1}$. But the same row operations transform the augmented matrix $(A\,|\,I)$ to $(EA\,|\,E) = (I\,|\,A^{-1})$. ∎
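The whole algorithm fits in a few lines of Python (sympy); a sketch, run here on the matrix of the worked example below:

from sympy import Matrix, eye

A = Matrix([[1, 4, -2],
            [-2, -7, 7],
            [3, 11, -6]])

M = Matrix.hstack(A, eye(3))     # the augmented matrix (A | I)
R, pivots = M.rref()             # row reduce until the left block becomes I

A_inv = R[:, 3:]                 # the right block is then A^{-1}
print(A_inv == A.inv())          # True
print(A * A_inv == eye(3))       # True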

This “advanced” explanation using elementary matrices implies an important proposition that will often be used later.

Theorem 2.4.1.

Any invertible matrix can be represented as a product of elementary matrices.

Proof.

As we discussed in the previous paragraph, $A^{-1} = E_N \cdots E_2 E_1$, so

\[
A = (A^{-1})^{-1} = E_1^{-1} E_2^{-1} \cdots E_N^{-1}. \qquad ∎
\]
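This factorization can be computed by recording the elementary matrix of each row operation during Gauss–Jordan elimination; a sketch in Python (numpy, floating point; the function name is ours):

import numpy as np

def elementary_factors(A, tol=1e-12):
    """Return E_1, ..., E_N with E_N ... E_1 A = I (A assumed invertible)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    factors = []

    def apply(E):
        nonlocal A
        factors.append(E)
        A = E @ A

    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r, col]))
        if abs(A[piv, col]) < tol:
            raise ValueError("matrix is not invertible")
        if piv != col:                                   # type 1: row exchange
            E = np.eye(n); E[[col, piv]] = E[[piv, col]]; apply(E)
        E = np.eye(n); E[col, col] = 1 / A[col, col]; apply(E)   # type 2: pivot -> 1
        for r in range(n):                               # type 3: clear the column
            if r != col and abs(A[r, col]) > tol:
                E = np.eye(n); E[r, col] = -A[r, col]; apply(E)
    return factors

A = np.array([[1., 4., -2.], [-2., -7., 7.], [3., 11., -6.]])
product = np.eye(3)
for E in elementary_factors(A):          # A = E_1^{-1} E_2^{-1} ... E_N^{-1}
    product = product @ np.linalg.inv(E)
print(np.allclose(product, A))           # True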

An Example.

Suppose we want to find the inverse of the matrix

\[
\left(\begin{array}{rrr}
1 & 4 & -2 \\
-2 & -7 & 7 \\
3 & 11 & -6
\end{array}\right).
\]

Augmenting the identity matrix to it and performing row reduction we get

\[
\left(\begin{array}{rrr|rrr}
1 & 4 & -2 & 1 & 0 & 0 \\
-2 & -7 & 7 & 0 & 1 & 0 \\
3 & 11 & -6 & 0 & 0 & 1
\end{array}\right)
\begin{array}{c} \\ +2R_1 \\ -3R_1 \end{array}
\sim
\left(\begin{array}{rrr|rrr}
1 & 4 & -2 & 1 & 0 & 0 \\
0 & 1 & 3 & 2 & 1 & 0 \\
0 & -1 & 0 & -3 & 0 & 1
\end{array}\right)
\begin{array}{c} \\ \\ +R_2 \end{array}
\sim
\]
\[
\left(\begin{array}{rrr|rrr}
1 & 4 & -2 & 1 & 0 & 0 \\
0 & 1 & 3 & 2 & 1 & 0 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\begin{array}{c} \times 3 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{rrr|rrr}
3 & 12 & -6 & 3 & 0 & 0 \\
0 & 1 & 3 & 2 & 1 & 0 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\begin{array}{c} +2R_3 \\ -R_3 \\ \\ \end{array}
\]

Here in the last row operation we multiplied the first row by 33 to avoid fractions in the backward phase of row reduction. Continuing with the row reduction we get

\[
\left(\begin{array}{rrr|rrr}
3 & 12 & 0 & 1 & 2 & 2 \\
0 & 1 & 0 & 3 & 0 & -1 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\begin{array}{c} -12R_2 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{rrr|rrr}
3 & 0 & 0 & -35 & 2 & 14 \\
0 & 1 & 0 & 3 & 0 & -1 \\
0 & 0 & 3 & -1 & 1 & 1
\end{array}\right)
\]

Dividing the first and the last row by 3 we get the inverse matrix

\[
\left(\begin{array}{ccc}
-35/3 & 2/3 & 14/3 \\
3 & 0 & -1 \\
-1/3 & 1/3 & 1/3
\end{array}\right)
\]

Exercises.

2.4.1.

Find the inverse of the matrices

\[
\left(\begin{array}{ccc}
1 & 2 & 1 \\
3 & 7 & 3 \\
2 & 3 & 4
\end{array}\right),\qquad
\left(\begin{array}{rrr}
1 & -1 & 2 \\
1 & 1 & -2 \\
1 & 1 & 4
\end{array}\right).
\]

Show all steps.

2.5. Dimension. Finite-dimensional spaces.

Definition.

The dimension $\dim V$ of a vector space $V$ is the number of vectors in a basis.

For a vector space consisting only of the zero vector $\mathbf{0}$ we put $\dim V = 0$. If $V$ does not have a (finite) basis, we put $\dim V = \infty$.

If $\dim V$ is finite, we call the space $V$ finite-dimensional; otherwise we call it infinite-dimensional.

Proposition 2.3.3 asserts that the dimension is well defined, i.e. that it does not depend on the choice of a basis.

Proposition 1.2.8 from Chapter 1 states that any finite spanning system in a vector space $V$ contains a basis. This immediately implies the following

Proposition 2.5.1.

A vector space $V$ is finite-dimensional if and only if it has a finite spanning system.

Suppose that we have a system of vectors in a finite-dimensional vector space, and we want to check whether it is a basis (or whether it is linearly independent, or whether it is complete). Probably the simplest way is to use an isomorphism $A: V \to \mathbb{F}^n$, $n = \dim V$, to move the problem to $\mathbb{F}^n$, where all such questions can be answered by row reduction (studying pivots).

Note that if $\dim V = n$, then there always exists an isomorphism $A: V \to \mathbb{F}^n$. Indeed, if $\dim V = n$ then there exists a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \in V$, and one can define an isomorphism $A: V \to \mathbb{F}^n$ by

\[
A\mathbf{v}_k = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n.
\]
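A small illustration of this idea (Python/sympy) for a space of polynomials, revisiting Exercise 2.3.4 above: the coordinate isomorphism sends a polynomial to its vector of coefficients in the basis $1, x, x^2, x^3$, and the question of spanning becomes a pivot count.

from sympy import Matrix

# coordinates of x^3 + 2x, x^2 + x + 1, x^3 + 5 in the basis 1, x, x^2, x^3
coords = Matrix([[0, 1, 5],     # constant terms
                 [2, 1, 0],     # coefficients of x
                 [0, 1, 0],     # coefficients of x^2
                 [1, 0, 1]])    # coefficients of x^3

R, pivot_cols = coords.rref()
print(len(pivot_cols))          # 3 pivots, but 4 rows: no pivot in every row,
                                # so these three polynomials do not span P_3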

As an example, let us give the following two corollaries of the above Propositions 2.3.2 and 2.3.5:

Proposition 2.5.2.

Any linearly independent system in a finite-dimensional vector space $V$ cannot have more than $\dim V$ vectors in it.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in V$ be a linearly independent system, and let $A: V \to \mathbb{F}^n$ be an isomorphism. Then $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_m$ is a linearly independent system in $\mathbb{F}^n$, and by Proposition 2.3.2, $m \leq n$. ∎

Proposition 2.5.3.

Any generating system in a finite-dimensional vector space $V$ must have at least $\dim V$ vectors in it.

Proof.

Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in V$ be a complete system in $V$, and let $A: V \to \mathbb{F}^n$ be an isomorphism. Then $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_m$ is a complete system in $\mathbb{F}^n$, and by Proposition 2.3.5, $m \geq n$. ∎

2.5.1. Completing a linearly independent system to a basis

The following statement will play an important role later.

Proposition 2.5.4 (Completion to a basis).

A linearly independent system of vectors in a finite-dimensional space can be completed to a basis, i.e. if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are linearly independent vectors in a finite-dimensional vector space $V$, then one can find vectors $\mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \ldots, \mathbf{v}_n$ such that the system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a basis in $V$.

Proof.

Let $n = \dim V$. Assume that the system is not generating, since otherwise it is already a basis. Take any vector (let us call it $\mathbf{v}_{r+1}$) not belonging to $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ (one can always do that because the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is not generating). By Exercise 1.2.5 from Chapter 1 the system $\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent (notice that in this case $r < n$ by Proposition 2.5.2). Repeat the procedure with the new system to get a vector $\mathbf{v}_{r+2}$, and so on.

We stop the process when we get a generating system. Note that the process cannot continue indefinitely, because a linearly independent system of vectors in $V$ cannot have more than $n = \dim V$ vectors. ∎
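For $V = \mathbb{F}^n$ the completion can be carried out mechanically: attach the standard basis vectors as extra candidates and keep the pivot columns. A sketch in Python (sympy; the helper name is ours), run on the vectors of Exercise 2.5.6 below:

from sympy import Matrix, eye

def complete_to_basis(vectors):
    """Complete a linearly independent list of vectors in F^n to a basis."""
    n = vectors[0].rows
    M = Matrix.hstack(*vectors, eye(n))   # candidates: the given vectors, then e_1, ..., e_n
    _, pivot_cols = M.rref()              # pivot columns are linearly independent and span F^n
    return [M[:, j] for j in pivot_cols]

v1 = Matrix([2, -1, 1, 5, -3])
v2 = Matrix([3, -2, 0, 0, 0])
v3 = Matrix([1, 1, 50, -921, 0])
basis = complete_to_basis([v1, v2, v3])
print(len(basis))                         # 5: a basis of F^5 containing v1, v2, v3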

2.5.2. Subspaces of finite dimensional spaces

Theorem 2.5.5.

Let $V$ be a subspace of a vector space $W$, $\dim W < \infty$. Then $V$ is finite-dimensional and $\dim V \leq \dim W$.

Moreover, if $\dim V = \dim W$, then $V = W$ (we are still assuming that $V$ is a subspace of $W$ here).

Remark.

This theorem looks like a complete banality, like an easy corollary of Proposition 2.5.2. But we can apply Proposition 2.5.2 only if we already have a basis in $V$. And we only have a basis in $W$, and we cannot say how many vectors of this basis belong to $V$; in fact, it is easy to construct an example where none of the vectors in the basis of $W$ belongs to $V$.

Proof of Theorem 2.5.5.

If $V = \{\mathbf{0}\}$ then the theorem is trivial, so let us assume otherwise.

We want to find a basis in $V$. Take a non-zero vector $\mathbf{v}_1 \in V$. If $V = \operatorname{span}\{\mathbf{v}_1\}$, we have got our basis (consisting of the one vector $\mathbf{v}_1$).

If not, we continue by induction. Suppose we have constructed $r$ linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r \in V$. If $V = \operatorname{span}\{\mathbf{v}_k : 1 \leq k \leq r\}$, then we have found a basis in $V$. If not, there exists a vector $\mathbf{v}_{r+1} \in V$, $\mathbf{v}_{r+1} \notin \operatorname{span}\{\mathbf{v}_k : 1 \leq k \leq r\}$. By Exercise 1.2.5 from Chapter 1 the system $\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent. Since the vectors we construct are linearly independent in $W$, Proposition 2.5.2 shows that the process must stop after at most $\dim W$ steps; it produces a basis of $V$, so $V$ is finite-dimensional and $\dim V \leq \dim W$. Finally, if $\dim V = \dim W = n$, then a basis of $V$ is a linearly independent system of $n$ vectors in $W$; by Proposition 2.5.4 it can be completed to a basis of $W$, and since every basis of $W$ has exactly $n$ vectors, nothing needs to be added. Thus a basis of $V$ is already a basis of $W$, and $V = W$. ∎

Exercises.

2.5.1.

True or false:

  a) Every vector space that is generated by a finite set has a basis;

  b) Every vector space has a (finite) basis;

  c) A vector space cannot have more than one basis;

  d) If a vector space has a finite basis, then the number of vectors in every basis is the same;

  e) The dimension of $\mathbb{P}_n$ is $n$;

  f) The dimension of $M_{m\times n}$ is $m + n$;

  g) If the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ generate (span) the vector space $V$, then every vector in $V$ can be written as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in only one way;

  h) Every subspace of a finite-dimensional space is finite-dimensional;

  i) If $V$ is a vector space having dimension $n$, then $V$ has exactly one subspace of dimension $0$ and exactly one subspace of dimension $n$.

2.5.2.

Prove that if $V$ is a vector space having dimension $n$, then a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in $V$ is linearly independent if and only if it spans $V$.

2.5.3.

Prove that a linearly independent system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in a vector space $V$ is a basis if and only if $n = \dim V$.

2.5.4.

(An old exercise revisited: now this problem should be easy.) Is it possible that the vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are linearly dependent, but the vectors $\mathbf{w}_1 = \mathbf{v}_1 + \mathbf{v}_2$, $\mathbf{w}_2 = \mathbf{v}_2 + \mathbf{v}_3$ and $\mathbf{w}_3 = \mathbf{v}_3 + \mathbf{v}_1$ are linearly independent? Hint: What dimension can the subspace $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$ have?

2.5.5.

Let the vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ be a basis in $V$. Show that $\mathbf{u}+\mathbf{v}+\mathbf{w}$, $\mathbf{v}+\mathbf{w}$, $\mathbf{w}$ is also a basis in $V$.

2.5.6.

Consider in the space $\mathbb{R}^5$ the vectors $\mathbf{v}_1 = (2, -1, 1, 5, -3)^T$, $\mathbf{v}_2 = (3, -2, 0, 0, 0)^T$, $\mathbf{v}_3 = (1, 1, 50, -921, 0)^T$.

  a) Prove that these vectors are linearly independent.

  b) Complete this system of vectors to a basis.

If you do part b) first, you can do everything without any computations.

2.6. General solution of a linear system.

In this short section we discuss the structure of the general solution (i.e. of the solution set) of a linear system.

We call a system $A\mathbf{x}=\mathbf{b}$ homogeneous if the right side $\mathbf{b}=\mathbf{0}$, i.e. a homogeneous system is a system of the form $A\mathbf{x}=\mathbf{0}$.

With each system

\[
A\mathbf{x} = \mathbf{b}
\]

we can associate a homogeneous system just by putting $\mathbf{b}=\mathbf{0}$.

Theorem 2.6.1 (General solution of a linear equation).

Let a vector $\mathbf{x}_1$ satisfy the equation $A\mathbf{x}=\mathbf{b}$, and let $H$ be the set of all solutions of the associated homogeneous system

\[
A\mathbf{x} = \mathbf{0}.
\]

Then the set

\[
\{\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h : \mathbf{x}_h \in H\}
\]

is the set of all solutions of the equation $A\mathbf{x}=\mathbf{b}$.

In other words, this theorem can be stated as

\[
\framebox{General solution of $A\mathbf{x}=\mathbf{b}$}
= \framebox{A particular solution of $A\mathbf{x}=\mathbf{b}$}
+ \framebox{General solution of $A\mathbf{x}=\mathbf{0}$}\,.
\]
Proof.

Fix a vector $\mathbf{x}_1$ satisfying $A\mathbf{x}_1=\mathbf{b}$. Let a vector $\mathbf{x}_h$ satisfy $A\mathbf{x}_h=\mathbf{0}$. Then for $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h$ we have

\[
A\mathbf{x} = A(\mathbf{x}_1 + \mathbf{x}_h) = A\mathbf{x}_1 + A\mathbf{x}_h = \mathbf{b} + \mathbf{0} = \mathbf{b},
\]

so any $\mathbf{x}$ of the form

\[
\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h, \qquad \mathbf{x}_h \in H
\]

is a solution of $A\mathbf{x}=\mathbf{b}$.

Now let $\mathbf{x}$ satisfy $A\mathbf{x}=\mathbf{b}$. Then for $\mathbf{x}_h := \mathbf{x} - \mathbf{x}_1$ we get

\[
A\mathbf{x}_h = A(\mathbf{x} - \mathbf{x}_1) = A\mathbf{x} - A\mathbf{x}_1 = \mathbf{b} - \mathbf{b} = \mathbf{0},
\]

so $\mathbf{x}_h \in H$. Therefore any solution $\mathbf{x}$ of $A\mathbf{x}=\mathbf{b}$ can be represented as $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h$ with some $\mathbf{x}_h \in H$. ∎

The power of this theorem is in its generality. It applies to all linear equations; we do not have to assume here that the vector spaces are finite-dimensional. You will meet this theorem in differential equations, integral equations, partial differential equations, etc. Besides showing the structure of the solution set, this theorem allows one to separate the investigation of uniqueness from the study of existence. Namely, to study uniqueness, we only need to analyze the uniqueness for the homogeneous equation $A\mathbf{x}=\mathbf{0}$, which always has a solution.

There is an immediate application in this course: this theorem allows us to check a solution of a system $A\mathbf{x}=\mathbf{b}$. For example, consider the system

\[
\left(\begin{array}{ccccc}
2 & 3 & 1 & 4 & -9 \\
1 & 1 & 1 & 1 & -3 \\
1 & 1 & 1 & 2 & -5 \\
2 & 2 & 2 & 3 & -8
\end{array}\right)\mathbf{x}
= \left(\begin{array}{c} 17 \\ 6 \\ 8 \\ 14 \end{array}\right).
\]

Performing row reduction one can find the solution of this system,

(2.6.1)
\[
\mathbf{x} = \left(\begin{array}{c} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{array}\right)
+ x_3 \left(\begin{array}{c} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{array}\right)
+ x_5 \left(\begin{array}{c} 2 \\ -1 \\ 0 \\ 2 \\ 1 \end{array}\right),
\qquad x_3, x_5 \in \mathbb{F}.
\]

The parameters $x_3$, $x_5$ can be denoted here by any other letters, $t$ and $s$, for example; we are keeping the notation $x_3$ and $x_5$ here only to remind us that the parameters came from the corresponding free variables.

Now, let us suppose that we are just given this solution, and we want to check whether or not it is correct. Of course, we can repeat the row operations, but this is too time consuming. Moreover, if the solution was obtained by some non-standard method, it can look different from what we get from the row reduction. For example, the formula

(2.6.2)
\[
\mathbf{x} = \left(\begin{array}{c} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{array}\right)
+ s \left(\begin{array}{c} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{array}\right)
+ t \left(\begin{array}{c} 0 \\ 0 \\ 1 \\ 2 \\ 1 \end{array}\right),
\qquad s, t \in \mathbb{F}
\]

gives the same set as (2.6.1) (can you say why?); here we just replaced the last vector in (2.6.1) by its sum with the second one. So, this formula is different from the solution we got from the row reduction, but it is nevertheless correct.

The simplest way to check that (2.6.1) and (2.6.2) give us correct solutions is to check that the first vector $(3, 1, 0, 2, 0)^T$ satisfies the equation $A\mathbf{x}=\mathbf{b}$, and that the other two (the ones with the parameters $x_3$ and $x_5$ or $s$ and $t$ in front of them) satisfy the associated homogeneous equation $A\mathbf{x}=\mathbf{0}$.

If this checks out, we will be assured that any vector $\mathbf{x}$ defined by (2.6.1) or (2.6.2) is indeed a solution.

Note that this method of checking the solution does not guarantee that (2.6.1) (or (2.6.2)) gives us all the solutions. For example, if we just somehow miss the term with $x_3$, the above method of checking will still work fine.

So, how can we guarantee that we did not miss any free variable, and that there should not be an extra term in (2.6.1)?

What comes to mind is to count the pivots again. In this example, if one does the row operations, the number of pivots is 3. So indeed, there should be 2 free variables, and it looks like we did not miss anything in (2.6.1).

To be able to prove this, we will need the new notions of fundamental subspaces and of the rank of a matrix. I should also mention that in this particular example, one does not have to perform all the row operations to check that there are only 2 free variables, and that formulas (2.6.1) and (2.6.2) both give correct general solutions.
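The checking recipe itself is a one-liner per vector; a quick numeric verification of (2.6.1) in Python (numpy):

import numpy as np

A = np.array([[2., 3., 1., 4., -9.],
              [1., 1., 1., 1., -3.],
              [1., 1., 1., 2., -5.],
              [2., 2., 2., 3., -8.]])
b = np.array([17., 6., 8., 14.])

x_part = np.array([3., 1., 0., 2., 0.])   # the particular solution in (2.6.1)
h1 = np.array([-2., 1., 1., 0., 0.])      # the vectors at the free parameters
h2 = np.array([2., -1., 0., 2., 1.])

print(np.allclose(A @ x_part, b))         # True: satisfies A x = b
print(np.allclose(A @ h1, 0))             # True: satisfies A x = 0
print(np.allclose(A @ h2, 0))             # True: satisfies A x = 0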

Exercises.

2.6.1.

True or false:

  a) Any system of linear equations has at least one solution;

  b) Any system of linear equations has at most one solution;

  c) Any homogeneous system of linear equations has at least one solution;

  d) Any system of $n$ linear equations in $n$ unknowns has at least one solution;

  e) Any system of $n$ linear equations in $n$ unknowns has at most one solution;

  f) If the homogeneous system corresponding to a given system of linear equations has a solution, then the given system has a solution;

  g) If the coefficient matrix of a homogeneous system of $n$ linear equations in $n$ unknowns is invertible, then the system has no non-zero solution;

  h) The solution set of any system of $m$ equations in $n$ unknowns is a subspace in $\mathbb{R}^n$;

  i) The solution set of any homogeneous system of $m$ equations in $n$ unknowns is a subspace in $\mathbb{R}^n$.

2.6.2.

Find a $2\times 3$ system ($2$ equations with $3$ unknowns) such that its general solution has the form

\[
\left(\begin{array}{c} 1 \\ 1 \\ 0 \end{array}\right)
+ s \left(\begin{array}{c} 1 \\ 2 \\ 1 \end{array}\right),
\qquad s \in \mathbb{R}.
\]

2.7. Fundamental subspaces of a matrix. Rank.

As we discussed above in Section 1.7 of Chapter 1, with any linear transformation $A: V \to W$ we can associate two subspaces, namely its kernel, or null space,

\[
\operatorname{Ker} A = \operatorname{Null} A := \{\mathbf{v} \in V : A\mathbf{v} = \mathbf{0}\} \subset V,
\]

and its range

\[
\operatorname{Ran} A = \{\mathbf{w} \in W : \mathbf{w} = A\mathbf{v} \text{ for some } \mathbf{v} \in V\} \subset W.
\]

In other words, the kernel $\operatorname{Ker} A$ is the solution set of the homogeneous equation $A\mathbf{x}=\mathbf{0}$, and the range $\operatorname{Ran} A$ is exactly the set of all right sides $\mathbf{b} \in W$ for which the equation $A\mathbf{x}=\mathbf{b}$ has a solution.

If $A$ is an $m\times n$ matrix, i.e. a mapping from $\mathbb{F}^n$ to $\mathbb{F}^m$, then it follows from the “column by coordinate” rule of matrix multiplication that any vector $\mathbf{w} \in \operatorname{Ran} A$ can be represented as a linear combination of the columns of $A$. This explains the name column space (notation $\operatorname{Col} A$), which is often used instead of $\operatorname{Ran} A$.

If $A$ is a matrix, then in addition to $\operatorname{Ran} A$ and $\operatorname{Ker} A$ one can also consider the range and kernel of the transposed matrix $A^T$. Often the term row space is used for $\operatorname{Ran} A^T$, and the term left null space is used for $\operatorname{Ker} A^T$ (but usually no special notation is introduced).

The four subspaces $\operatorname{Ran} A$, $\operatorname{Ker} A$, $\operatorname{Ran} A^T$, $\operatorname{Ker} A^T$ are called the fundamental subspaces of the matrix $A$. In this section we will study important relations between the dimensions of the four fundamental subspaces.

We will need the following definition, which is one of the fundamental notions of Linear Algebra.

Definition.

Given a linear transformation (matrix) $A$, its rank, $\operatorname{rank} A$, is the dimension of the range of $A$:

\[
\operatorname{rank} A := \dim \operatorname{Ran} A.
\]

2.7.1. Computing fundamental subspaces and rank.

To compute the fundamental subspaces and the rank of a matrix, one needs to do echelon reduction. Namely, let $A$ be the matrix, and let $A_e$ be its echelon form. Then:

  1. The pivot columns of the original matrix $A$ (i.e. the columns where, after row operations, we will have pivots in the echelon form) give us a basis (one of many possible) in $\operatorname{Ran} A$.

  2. The pivot rows of the echelon form $A_e$ give us a basis in the row space. Of course, it is possible just to transpose the matrix and then do row operations. But if we already have the echelon form of $A$, say from computing $\operatorname{Ran} A$, then we get $\operatorname{Ran} A^T$ for free.

  3. To find a basis in the null space $\operatorname{Ker} A$ one needs to solve the homogeneous equation $A\mathbf{x}=\mathbf{0}$; the details will be seen in the example below.

Example.

Consider a matrix

\[
\left(\begin{array}{ccccc}
1 & 1 & 2 & 2 & 1 \\
2 & 2 & 1 & 1 & 1 \\
3 & 3 & 3 & 3 & 2 \\
1 & 1 & -1 & -1 & 0
\end{array}\right).
\]

Performing row operations we get the echelon form

\[
\left(\begin{array}{ccccc}
\framebox{$1$} & 1 & 2 & 2 & 1 \\
0 & 0 & \framebox{$-3$} & -3 & -1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right)
\]

(the pivots are boxed here). So, the columns 1 and 3 of the original matrix, i.e. the columns

\[
\left(\begin{array}{c} 1 \\ 2 \\ 3 \\ 1 \end{array}\right),\quad
\left(\begin{array}{r} 2 \\ 1 \\ 3 \\ -1 \end{array}\right)
\]

give us a basis in $\operatorname{Ran} A$. We also get a basis for the row space $\operatorname{Ran} A^T$ for free: the first and second rows of the echelon form of $A$, i.e. the vectors

\[
\left(\begin{array}{c} 1 \\ 1 \\ 2 \\ 2 \\ 1 \end{array}\right),\quad
\left(\begin{array}{r} 0 \\ 0 \\ -3 \\ -3 \\ -1 \end{array}\right)
\]

(we put the vectors vertically here; the question of whether to write vectors here vertically as columns or horizontally as rows is really a matter of convention. Our reason for putting them vertically is that although we call $\operatorname{Ran} A^T$ the row space, we define it as the column space of $A^T$).

To compute a basis in the null space $\operatorname{Ker} A$ we need to solve the equation $A\mathbf{x}=\mathbf{0}$. Compute the reduced echelon form of $A$, which in this example is

\[
\left(\begin{array}{ccccc}
\framebox{$1$} & 1 & 0 & 0 & 1/3 \\
0 & 0 & \framebox{$1$} & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right).
\]

Note, that when solving the homogeneous equation A𝐱=𝟎A\mathbf{x}=\mathbf{0}, it is not necessary to write the whole augmented matrix, it is sufficient to work with the coefficient matrix. Indeed, in this case the last column of the augmented matrix is the column of zeroes, which does not change under row operations. So, we can just keep this column in mind, without actually writing it. Keeping this last zero column in mind, we can read the solution off the reduced echelon form above:

{x1=x213x5,x2 is free,x3=x413x5x4 is free,x5 is free,\left\{\begin{array}[]{l}x_{1}=-x_{2}-\frac{1}{3}x_{5},\\ x_{2}\text{ is free},\\ x_{3}=-x_{4}-\frac{1}{3}x_{5}\\ x_{4}\text{ is free},\\ x_{5}\text{ is free},\end{array}\right.

or, in the vector form

(2.7.1) 𝐱=(x213x5x2x413x5x4x5)=x2(11000)+x4(00110)+x5(1/301/301)\mathbf{x}=\left(\begin{array}[]{c}-x_{2}-\frac{1}{3}x_{5}\\ x_{2}\\ -x_{4}-\frac{1}{3}x_{5}\\ x_{4}\\ x_{5}\end{array}\right)=x_{2}\left(\begin{array}[]{c}-1\\ 1\\ 0\\ 0\\ 0\end{array}\right)+x_{4}\left(\begin{array}[]{c}0\\ 0\\ -1\\ 1\\ 0\end{array}\right)+x_{5}\left(\begin{array}[]{c}-1/3\\ 0\\ -1/3\\ 0\\ 1\end{array}\right)

The vectors at each free variable, i.e. in our case the vectors

(11000),(00110),(1/301/301)\left(\begin{array}[]{c}-1\\ 1\\ 0\\ 0\\ 0\end{array}\right),\quad\left(\begin{array}[]{c}0\\ 0\\ -1\\ 1\\ 0\end{array}\right),\quad\left(\begin{array}[]{c}-1/3\\ 0\\ -1/3\\ 0\\ 1\end{array}\right)

form a basis in KerA\operatorname{Ker}A.

Unfortunately, there is no shortcut for finding a basis in KerAT\operatorname{Ker}A^{T}: one must solve the equation AT𝐱=𝟎A^{T}\mathbf{x}=\mathbf{0}. The knowledge of the echelon form of AA does not help here.
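For readers who want to experiment, the whole computation is easy to reproduce on a computer. Here is a small sketch (our addition, not part of the text) using the SymPy library in Python; the matrix is the one from the example above, and all arithmetic is exact.

```python
# A sketch checking the example above with SymPy's exact arithmetic.
from sympy import Matrix

A = Matrix([[1, 1,  2,  2, 1],
            [2, 2,  1,  1, 1],
            [3, 3,  3,  3, 2],
            [1, 1, -1, -1, 0]])

rref, pivots = A.rref()   # reduced echelon form and pivot column indices
print(pivots)             # (0, 2): columns 1 and 3 in 1-based numbering

print(A.columnspace())    # basis of Ran A: the pivot columns of A
print(A.T.columnspace())  # a basis of the row space Ran A^T
print(A.nullspace())      # basis of Ker A: the three vectors found above
print(A.T.nullspace())    # basis of Ker A^T: as noted, one has to reduce A^T
```

Note that `A.T.columnspace()` finds a basis of the row space by reducing ATA^{T}, so it may return a basis different from the pivot rows of AeA_{\scriptstyle\text{\rm e}}; both are legitimate bases of the same subspace.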

2.7.2. Explanation of the computation of bases in the fundamental subspaces.

So, why do the above methods indeed give us bases in the fundamental subspaces?

The null space KerA\operatorname{Ker}A.

The case of the null space KerA\operatorname{Ker}A is probably the simplest one: since we solved the equation A𝐱=𝟎A\mathbf{x}=\mathbf{0}, i.e. found all the solutions, any vector in KerA\operatorname{Ker}A is a linear combination of the vectors we obtained. Thus, the vectors we obtained form a spanning system in KerA\operatorname{Ker}A. To see that the system is linearly independent, let us multiply each vector by the corresponding free variable and add everything up, see (2.7.1). Then for each free variable xkx_{k}, the entry number kk of the resulting vector is exactly xkx_{k}, see (2.7.1) again, so the only way this vector (the linear combination) can be 𝟎\mathbf{0} is when all the free variables are 0.

The column space RanA\operatorname{Ran}A.

Let us now explain why the method for finding a basis in the column space RanA\operatorname{Ran}A works. First of all, notice that the pivot columns of the reduced echelon form AreA_{\scriptstyle\text{\rm re}} of AA form a basis in RanAre\operatorname{Ran}A_{\scriptstyle\text{\rm re}} (not in the column space of the original matrix, but of its reduced echelon form!): indeed, in the reduced echelon form the pivot columns are the standard basis vectors 𝐞1,𝐞2,,𝐞r\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{r}, and every column of AreA_{\scriptstyle\text{\rm re}} has zero entries below row rr, i.e. is a linear combination of 𝐞1,,𝐞r\mathbf{e}_{1},\ldots,\mathbf{e}_{r}. Since row operations are just left multiplications by invertible matrices, they do not change linear independence of columns. Therefore, the pivot columns of the original matrix AA are linearly independent as well.

Let us now show that the pivot columns of AA span the column space of AA. Let 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} be the pivot columns of AA, and let 𝐯\mathbf{v} be an arbitrary column of AA. We want to show that 𝐯\mathbf{v} can be represented as a linear combination of the pivot columns 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r},

𝐯=α1𝐯1+α2𝐯2++αr𝐯r.\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{r}% \mathbf{v}_{r}.

The reduced echelon form AreA_{\scriptstyle\text{\rm re}} is obtained from AA by the left multiplication

Are=EA,A_{\scriptstyle\text{\rm re}}=EA,

where EE is a product of elementary matrices, so EE is an invertible matrix. The vectors E𝐯1,E𝐯2,,E𝐯rE\mathbf{v}_{1},E\mathbf{v}_{2},\ldots,E\mathbf{v}_{r} are the pivot columns of AreA_{\scriptstyle\text{\rm re}}, and the column 𝐯\mathbf{v} of AA is transformed to the column E𝐯E\mathbf{v} of AreA_{\scriptstyle\text{\rm re}}. Since the pivot columns of AreA_{\scriptstyle\text{\rm re}} form a basis in RanAre\operatorname{Ran}A_{\scriptstyle\text{\rm re}}, the vector E𝐯E\mathbf{v} can be represented as a linear combination

E𝐯=α1E𝐯1+α2E𝐯2++αrE𝐯r.E\mathbf{v}=\alpha_{1}E\mathbf{v}_{1}+\alpha_{2}E\mathbf{v}_{2}+\ldots+\alpha_% {r}E\mathbf{v}_{r}.

Multiplying this equality by E1E^{-1} from the left we get the representation

𝐯=α1𝐯1+α2𝐯2++αr𝐯r,\mathbf{v}=\alpha_{1}\mathbf{v}_{1}+\alpha_{2}\mathbf{v}_{2}+\ldots+\alpha_{r}% \mathbf{v}_{r},

so indeed the pivot columns of AA span RanA\operatorname{Ran}A.

The row space RanAT\operatorname{Ran}A^{T}.

It is easy to see that the pivot rows of the echelon form AeA_{\scriptstyle\text{\rm e}} of AA are linearly independent. Indeed, let 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} be the transposed pivot rows of AeA_{\scriptstyle\text{\rm e}} (transposed because we agreed always to write vectors vertically). Suppose

α1𝐰1+α2𝐰2++αr𝐰r=𝟎.\alpha_{1}\mathbf{w}_{1}+\alpha_{2}\mathbf{w}_{2}+\ldots+\alpha_{r}\mathbf{w}_% {r}=\mathbf{0}.

Consider the first non-zero entry of 𝐰1\mathbf{w}_{1}. Since for all other vectors 𝐰2,𝐰3,,𝐰r\mathbf{w}_{2},\mathbf{w}_{3},\ldots,\mathbf{w}_{r} the corresponding entries equal 0 (by the definition of echelon form), we can conclude that α1=0\alpha_{1}=0. So we can just ignore the first term in the sum.

Consider now the first non-zero entry of 𝐰2\mathbf{w}_{2}. The corresponding entries of the vectors 𝐰3,,𝐰r\mathbf{w}_{3},\ldots,\mathbf{w}_{r} are 0, so α2=0\alpha_{2}=0. Repeating this procedure, we get that αk=0\alpha_{k}=0 k=1,2,,r\forall k=1,2,\ldots,r.

To see that the vectors 𝐰1,𝐰2,,𝐰r\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{r} span the row space, one can notice that row operations do not change the row space. This can be obtained directly by analyzing the row operations, but we present here a more formal way to demonstrate this fact.

For a transformation AA and a set XX let us denote by A(X)A(X) the set of all elements yy which can be represented as y=A(x)y=A(x), xXx\in X,

A(X):={y=A(x):xX}.A(X):=\left\{y=A(x):x\in X\right\}.

If AA is an m×nm\times n matrix, and AeA_{\scriptstyle\text{\rm e}} is its echelon form, then AeA_{\scriptstyle\text{\rm e}} is obtained from AA by left multiplication

Ae=EA,A_{\scriptstyle\text{\rm e}}=EA,

where EE is an m×mm\times m invertible matrix (the product of the corresponding elementary matrices). Then

RanAeT=Ran(ATET)=AT(RanET)=AT(m)=RanAT,\operatorname{Ran}A^{T}_{\scriptstyle\text{\rm e}}=\operatorname{Ran}(A^{T}E^{% T})=A^{T}(\operatorname{Ran}E^{T})=A^{T}(\mathbb{R}^{m})=\operatorname{Ran}A^{% T},

so indeed RanAT=RanAeT\operatorname{Ran}A^{T}=\operatorname{Ran}A^{T}_{\scriptstyle\text{\rm e}}.
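This invariance of the row space under row operations is also easy to check numerically. A small sketch (our addition, using SymPy): stacking the rows of AA and of its echelon form together should not increase the rank, which means that the two row spaces coincide.

```python
from sympy import Matrix

A = Matrix([[1, 1,  2,  2, 1],
            [2, 2,  1,  1, 1],
            [3, 3,  3,  3, 2],
            [1, 1, -1, -1, 0]])
Ae = A.echelon_form()

# If the row spaces of A and Ae coincide, stacking the rows of both
# matrices adds no new directions, so the rank stays the same.
assert Matrix.vstack(A, Ae).rank() == A.rank() == Ae.rank()
```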

2.7.3. The Rank Theorem. Dimensions of fundamental subspaces.

There are many applications in which one needs to find a basis in the column space or in the null space of a matrix. For example, as was shown above, solving a homogeneous equation A𝐱=𝟎A\mathbf{x}=\mathbf{0} amounts to finding a basis in the null space KerA\operatorname{Ker}A. Finding a basis in the column space means simply extracting a basis from a spanning set by removing unnecessary vectors (columns).

However, the most important application of the above methods of computing bases in the fundamental subspaces is the relation between their dimensions.

Theorem 2.7.1 (The Rank Theorem).

For a matrix AA

rankA=rankAT.\operatorname{rank}A=\operatorname{rank}A^{T}.

This theorem is often stated as follows:

The column rank of a matrix coincides with its row rank.

The proof of this theorem is trivial, since the dimensions of both RanA\operatorname{Ran}A and RanAT\operatorname{Ran}A^{T} are equal to the number of pivots in the echelon form of AA.

The following theorem gives us important relations between the dimensions of the fundamental subspaces. It is often also called the Rank Theorem.

Theorem 2.7.2.

Let AA be an m×nm\times n matrix, i.e. a linear transformation from 𝔽n\mathbb{F}^{n} to 𝔽m\mathbb{F}^{m}. Then

  1. 1.

    dimKerA+rankA=n\dim\operatorname{Ker}A+\operatorname{rank}A=n (dimension of the domain of AA);

  2. 2.

    dimKerAT+rankA=m\dim\operatorname{Ker}A^{T}+\operatorname{rank}A=m (dimension of the target space of AA).

Proof.

The proof, modulo the above algorithms of finding bases in the fundamental subspaces, is almost trivial. The first statement is simply the fact that the number of free variables (dimKerA\dim\operatorname{Ker}A) plus the number of basic variables (i.e. the number of pivots, i.e. rankA\operatorname{rank}A) adds up to the number of columns (i.e. to nn).

The second statement, if one takes into account that rankA=rankAT\operatorname{rank}A=\operatorname{rank}A^{T}, is simply the first statement applied to ATA^{T}. ∎
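Both rank theorems are easy to check on any concrete matrix. Here is a small sketch (our addition, using SymPy) verifying them for the matrix from the example in Section 2.7.1.

```python
from sympy import Matrix

A = Matrix([[1, 1,  2,  2, 1],
            [2, 2,  1,  1, 1],
            [3, 3,  3,  3, 2],
            [1, 1, -1, -1, 0]])
m, n = A.shape
r = A.rank()

assert r == A.T.rank()                # Theorem 2.7.1: rank A = rank A^T
assert len(A.nullspace()) + r == n    # dim Ker A   + rank A = n
assert len(A.T.nullspace()) + r == m  # dim Ker A^T + rank A = m
```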

As an application of the above theorem, let us recall the example from Section 2.6. There we considered a system

(23149111131112522238)𝐱=(176814),\left(\begin{array}[]{ccccc}2&3&1&4&-9\\ 1&1&1&1&-3\\ 1&1&1&2&-5\\ 2&2&2&3&-8\end{array}\right)\mathbf{x}=\left(\begin{array}[]{c}17\\ 6\\ 8\\ 14\end{array}\right),

and we claimed that its general solution is given by

𝐱=(31020)+x3(21100)+x5(21021),x3,x5𝔽,\mathbf{x}=\left(\begin{array}[]{c}3\\ 1\\ 0\\ 2\\ 0\end{array}\right)+x_{3}\left(\begin{array}[]{c}-2\\ 1\\ 1\\ 0\\ 0\end{array}\right)+x_{5}\left(\begin{array}[]{c}2\\ -1\\ 0\\ 2\\ 1\end{array}\right)\ ,\qquad x_{3},x_{5}\in\mathbb{F},

or by

𝐱=(31020)+s(21100)+t(00121),s,t𝔽.\mathbf{x}=\left(\begin{array}[]{c}3\\ 1\\ 0\\ 2\\ 0\end{array}\right)+s\left(\begin{array}[]{c}-2\\ 1\\ 1\\ 0\\ 0\end{array}\right)+t\left(\begin{array}[]{c}0\\ 0\\ 1\\ 2\\ 1\end{array}\right)\ ,\qquad s,t\in\mathbb{F}.

We checked in Section 2.6 that a vector 𝐱\mathbf{x} given by either formula is indeed a solution of the equation. But how can we guarantee that either formula describes all the solutions?

First of all, we know that in either formula the last 2 vectors (the ones multiplied by the parameters) belong to KerA\operatorname{Ker}A. It is easy to see that in either case both vectors are linearly independent (two vectors are linearly dependent if and only if one is a multiple of the other).

Now, let us count dimensions: interchanging the first and the second rows and performing the first round of row operations

2R1R12R1(11113231491112522238)(11113011230001200012)\begin{array}[]{c}\phantom{1}\\ -2R_{1}\\ -R_{1}\\ -2R_{1}\end{array}\left(\begin{array}[]{ccccc}1&1&1&1&-3\\ 2&3&1&4&-9\\ 1&1&1&2&-5\\ 2&2&2&3&-8\end{array}\right)\quad\sim\quad\left(\begin{array}[]{ccccc}1&1&1&1&% -3\\ 0&1&-1&2&-3\\ 0&0&0&1&-2\\ 0&0&0&1&-2\end{array}\right)

we see that there are three pivots already, so rankA3\operatorname{rank}A\geq 3. (Actually, we can already see that the rank is 3, but the estimate is enough here.) By Theorem 2.7.2, rankA+dimKerA=5\operatorname{rank}A+\dim\operatorname{Ker}A=5, hence dimKerA2\dim\operatorname{Ker}A\leq 2, and therefore there cannot be more than 2 linearly independent vectors in KerA\operatorname{Ker}A. Therefore, the last 2 vectors in either formula form a basis in KerA\operatorname{Ker}A, so either formula gives all the solutions of the equation.
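The dimension count above is easy to reproduce. The following sketch (our addition, using SymPy) confirms that the rank is exactly 3, that dimKerA=2\dim\operatorname{Ker}A=2, and that the two vectors from the first formula indeed lie in the kernel.

```python
from sympy import Matrix

A = Matrix([[2, 3, 1, 4, -9],
            [1, 1, 1, 1, -3],
            [1, 1, 1, 2, -5],
            [2, 2, 2, 3, -8]])

assert A.rank() == 3
assert len(A.nullspace()) == 2  # dim Ker A = 5 - rank A = 2

# the two kernel vectors from the first formula are indeed in Ker A
for v in ([-2, 1, 1, 0, 0], [2, -1, 0, 2, 1]):
    assert A * Matrix(v) == Matrix([0, 0, 0, 0])
```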

An important corollary of the rank theorem is the following theorem connecting existence and uniqueness for linear equations.

Theorem 2.7.3.

Let AA be an m×nm\times n matrix. Then the equation

A𝐱=𝐛A\mathbf{x}=\mathbf{b}

has a solution for every 𝐛m\mathbf{b}\in\mathbb{R}^{m} if and only if the dual equation

AT𝐱=𝟎A^{T}\mathbf{x}=\mathbf{0}

has a unique (i.e. only the trivial) solution. (Note that in the second equation we have ATA^{T}, not AA.)

Proof.

The proof follows immediately from Theorem 2.7.2 by counting the dimensions. We leave the details as an exercise to the reader. ∎
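As a small illustration (our addition; the matrix is made up for this purpose), one can check both conditions of the theorem on a matrix of full row rank.

```python
from sympy import Matrix, symbols

A = Matrix([[1, 0, 2],
            [0, 1, 3]])       # rank A = 2 = m, so Ran A is all of F^2

assert A.T.nullspace() == []  # A^T x = 0 has only the trivial solution
# ... and A x = b is solvable for every b: here x = (b1, b2, 0)^T works
b1, b2 = symbols('b1 b2')
assert A * Matrix([b1, b2, 0]) == Matrix([b1, b2])
```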

There is a very nice geometric interpretation of the second rank theorem (Theorem 2.7.2). Namely, statement 1 of the theorem says that if a transformation A:𝔽n𝔽mA:\mathbb{F}^{n}\to\mathbb{F}^{m} has trivial kernel (KerA={𝟎}\operatorname{Ker}A=\{\mathbf{0}\}), then the dimensions of the domain 𝔽n\mathbb{F}^{n} and of the range RanA\operatorname{Ran}A coincide. If the kernel is non-trivial, then the transformation “kills” dimKerA\dim\operatorname{Ker}A dimensions, so dimRanA=ndimKerA\dim\operatorname{Ran}A=n-\dim\operatorname{Ker}A.

2.7.4. Completion of a linearly independent system to a basis

As Proposition 2.5.4 from Section 2.5 above asserts, any linearly independent system can be completed to a basis, i.e. given linearly independent vectors 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} in a finite-dimensional vector space VV, one can find vectors 𝐯r+1,𝐯r+2,𝐯n\mathbf{v}_{r+1},\mathbf{v}_{r+2}\ldots,\mathbf{v}_{n} such that the system of vectors 𝐯1,𝐯2,,𝐯n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{n} is a basis in VV.

Theoretically, the proof of this proposition gives us an algorithm for finding the vectors 𝐯r+1,𝐯r+2,,𝐯n\mathbf{v}_{r+1},\mathbf{v}_{r+2},\ldots,\mathbf{v}_{n}, but this algorithm does not look too practical.

The ideas of this section give us a more practical way to perform the completion to a basis.

First of all, notice that if an m×nm\times n matrix is in echelon form, then its non-zero rows (which are clearly linearly independent) can easily be completed to a basis in the whole space 𝔽n\mathbb{F}^{n}: one just needs to add rows in appropriate places, so that the resulting matrix is still in echelon form and has a pivot in every column.

Then the non-zero rows of the new matrix form a basis, which we can order any way we want, because the property of being a basis does not depend on the ordering.

Suppose now that we have linearly independent vectors 𝐯1,𝐯2,,𝐯r𝔽n\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r}\in\mathbb{F}^{n}. Consider the matrix AA with rows 𝐯1T,𝐯2T,,𝐯rT\mathbf{v}^{T}_{1},\mathbf{v}^{T}_{2},\ldots,\mathbf{v}^{T}_{r} and perform row operations to get the echelon form AeA_{\scriptstyle\text{\rm e}}. As we discussed above, the rows of AeA_{\scriptstyle\text{\rm e}} can easily be completed to a basis in 𝔽n\mathbb{F}^{n}. And it turns out that the same vectors that complete the rows of AeA_{\scriptstyle\text{\rm e}} to a basis also complete the original vectors 𝐯1,𝐯2,,𝐯r\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{r} to a basis.

To see that, let the vectors 𝐯r+1,,𝐯n\mathbf{v}_{r+1},\ldots,\mathbf{v}_{n} complete the rows of AeA_{\scriptstyle\text{\rm e}} to a basis in 𝔽n\mathbb{F}^{n}. Then, if we add to the matrix AeA_{\scriptstyle\text{\rm e}} the rows 𝐯r+1T,,𝐯nT\mathbf{v}^{T}_{r+1},\ldots,\mathbf{v}^{T}_{n}, we get an invertible matrix. Let us call this matrix A~e\widetilde{A}_{\scriptstyle\text{\rm e}}, and let A~\widetilde{A} be the matrix obtained from AA by adding the rows 𝐯r+1T,,𝐯nT\mathbf{v}^{T}_{r+1},\ldots,\mathbf{v}^{T}_{n}. The matrix A~e\widetilde{A}_{\scriptstyle\text{\rm e}} can be obtained from A~\widetilde{A} by row operations, so

A~e=EA~,\widetilde{A}_{\scriptstyle\text{\rm e}}=E\widetilde{A},

where EE is the product of the corresponding elementary matrices. Then A~=E1A~e\widetilde{A}=E^{-1}\widetilde{A}_{\scriptstyle\text{\rm e}}, and A~\widetilde{A} is invertible as a product of invertible matrices.

But that means that the rows of A~\widetilde{A} form a basis in 𝔽n\mathbb{F}^{n}, which is exactly what we need.

Remark.

The method of completion to a basis described above may not be the simplest one, but one of its principal advantages is that it works for vector spaces over an arbitrary field.
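The completion procedure is also easy to automate. Below is a sketch (our addition, using SymPy; the helper `complete_to_basis` is ours, not a library function): it adds a standard basis vector for every column of the echelon form without a pivot, exactly as described above.

```python
from sympy import Matrix, eye

def complete_to_basis(A):
    """The rows of A are assumed linearly independent. Return the standard
    basis rows e_j for the non-pivot columns; these rows complete the rows
    of A to a basis of F^n."""
    n = A.cols
    _, pivots = A.rref()  # pivot columns of the echelon form
    return [eye(n)[j, :] for j in range(n) if j not in pivots]

A = Matrix([[1, 1, 0, 0, 0],
            [0, 1, 1, 0, 0]])
B = Matrix.vstack(A, *complete_to_basis(A))  # stack the completing rows below
assert B.rank() == B.cols                    # the rows of B form a basis of F^5
```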

Exercises.

2.7.1.

True or false:

  1. a)

    The rank of a matrix is equal to the number of its non-zero columns;

  2. b)

    The m×nm\times n zero matrix is the only m×nm\times n matrix having rank 0;

  3. c)

    Elementary row operations preserve rank;

  4. d)

    Elementary column operations do not necessarily preserve rank;

  5. e)

    The rank of a matrix is equal to the maximum number of linearly independent columns in the matrix;

  6. f)

    The rank of a matrix is equal to the maximum number of linearly independent rows in the matrix;

  7. g)

    The rank of an n×nn\times n matrix is at most nn;

  8. h)

    An n×nn\times n matrix having rank nn is invertible.

2.7.2.

A 54×3754\times 37 matrix has rank 3131. What are the dimensions of all 4 fundamental subspaces?

2.7.3.

Compute rank and find bases of all four fundamental subspaces for the matrices

(110011110),(12311140120230110000).\left(\begin{array}[]{ccccccc}1&1&0\\ 0&1&1\\ 1&1&0\end{array}\right),\qquad\left(\begin{array}[]{ccccccc}1&2&3&1&1\\ 1&4&0&1&2\\ 0&2&-3&0&1\\ 1&0&0&0&0\end{array}\right).
2.7.4.

Prove that if A:XYA:X\to Y and VV is a subspace of XX then dimAVrankA\operatorname{dim}AV\leq\operatorname{rank}A. (AVAV here means the subspace VV transformed by the transformation AA, i.e. any vector in AVAV can be represented as A𝐯A\mathbf{v}, 𝐯V\mathbf{v}\in V). Deduce from here that rank(AB)rankA\operatorname{rank}(AB)\leq\operatorname{rank}A.

Remark: Here one can use the fact that if VWV\subset W then dimVdimW\operatorname{dim}V\leq\operatorname{dim}W. Do you understand why it is true?

2.7.5.

Prove that if A:XYA:X\to Y and VV is a subspace of XX then dimAVdimV\operatorname{dim}AV\leq\operatorname{dim}V. Deduce from here that rank(AB)rankB\operatorname{rank}(AB)\leq\operatorname{rank}B.

2.7.6.

Prove that if the product ABAB of two n×nn\times n matrices is invertible, then both AA and BB are invertible. Even if you know about determinants, do not use them; we have not covered them yet. Hint: use the previous 2 problems.

2.7.7.

Prove that if A𝐱=𝟎A\mathbf{x}=\mathbf{0} has a unique solution, then the equation AT𝐱=𝐛A^{T}\mathbf{x}=\mathbf{b} has a solution for every right side 𝐛\mathbf{b}.
Hint: count pivots.

2.7.8.

Write a matrix with the required property, or explain why no such matrix exists:

  1. a)

    Column space contains (1,0,0)T(1,0,0)^{T}, (0,0,1)T(0,0,1)^{T}, row space contains (1,1)T(1,1)^{T}, (1,2)T(1,2)^{T};

  2. b)

    Column space is spanned by (1,1,1)T(1,1,1)^{T}, nullspace is spanned by (1,2,3)T(1,2,3)^{T};

  3. c)

    Column space is 4\mathbb{R}^{4}, row space is 3\mathbb{R}^{3}.

Hint: Check first if the dimensions add up.

2.7.9.

If AA has the same four fundamental subspaces as BB, does A=BA=B?

2.7.10.

Complete the rows of a matrix

(e3340π620021πe1100003320000001)\left(\begin{array}[]{rrrrrrr}e^{3}&3&4&0&-\pi&6&-2\\ 0&0&2&-1&\pi^{e}&1&1\\ 0&0&0&0&3&-3&2\\ 0&0&0&0&0&0&1\\ \end{array}\right)

to a basis in 7\mathbb{R}^{7}.

2.7.11.

For a matrix

(1212322155363024144711)\left(\begin{array}[]{rrrrrrr}1&2&-1&2&3\\ 2&2&1&5&5\\ 3&6&-3&0&24\\ -1&-4&4&-7&11\end{array}\right)

find bases in its column and row spaces.

2.7.12.

For the matrix in the previous problem, complete the basis in the row space to a basis in 5\mathbb{R}^{5}.

2.7.13.

For the matrix

A=(1ii1)A=\left(\begin{array}[]{rr}1&i\\ i&-1\\ \end{array}\right)

compute RanA\operatorname{Ran}A and KerA\operatorname{Ker}A. What can you say about the relation between these subspaces?

2.7.14.

Is it possible for a real matrix AA that RanA=KerAT\operatorname{Ran}A=\operatorname{Ker}A^{T}? Is it possible for a complex AA?

2.7.15.

Complete the vectors (1,2,1,2,3)T(1,2,-1,2,3)^{T}, (2,2,1,5,5)T(2,2,1,5,5)^{T}, (1,4,4,7,11)T(-1,-4,4,7,-11)^{T} to a basis in 5\mathbb{R}^{5}.

2.8. Representation of a linear transformation in arbitrary bases. Change of coordinates formula.

The material we have learned about linear transformations and their matrices can be easily extended to transformations in abstract vector spaces with finite bases. In this section we will distinguish between a linear transformation TT and its matrix, the reason being that we consider different bases, so a linear transformation can have different matrix representations with respect to different bases.

2.8.1. Coordinate vector.

Let VV be a vector space with a basis :={𝐛1,𝐛2,,𝐛n}\mathcal{B}:=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\}. Any vector 𝐯V\mathbf{v}\in V admits a unique representation as a linear combination

𝐯=x1𝐛1+x2𝐛2++xn𝐛n=k=1nxk𝐛k.\mathbf{v}=x_{1}\mathbf{b}_{1}+x_{2}\mathbf{b}_{2}+\ldots+x_{n}\mathbf{b}_{n}=% \sum_{k=1}^{n}x_{k}\mathbf{b}_{k}.

The numbers x1,x2,,xnx_{1},x_{2},\ldots,x_{n} are called the coordinates of the vector 𝐯\mathbf{v} in the basis \mathcal{B}. It is convenient to join these coordinates into the so-called coordinate vector of 𝐯\mathbf{v} relative to the basis \mathcal{B}, which is the column vector

[𝐯]:=(x1x2xn)𝔽n.[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}:=\left(\begin{array}[]{c}x_{1}\\ {x}_{2}\\ \vdots\\ {x}_{n}\end{array}\right)\in\mathbb{F}^{n}.

Note that the mapping

𝐯[𝐯]\mathbf{v}\mapsto[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}

is an isomorphism between VV and 𝔽n\mathbb{F}^{n}. It transforms the basis 𝐛1,𝐛2,,𝐛n\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n} to the standard basis 𝐞1,𝐞2,,𝐞n\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n} in 𝔽n\mathbb{F}^{n}.

2.8.2. Matrix of a linear transformation.

Let T:VWT:V\to W be a linear transformation, and let 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\}, :={𝐛1,𝐛2,,𝐛m}\mathcal{B}:=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{m}\} be bases in VV and WW respectively.

A matrix of the transformation TT in (or with respect to) the bases 𝒜\mathcal{A} and \mathcal{B} is an m×nm\times n matrix, denoted by [T]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}, which relates the coordinate vectors [T𝐯][T\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}} and [𝐯]𝒜[\mathbf{v}]_{{}_{\scriptstyle\mathcal{A}}},

[T𝐯]=[T]𝒜[𝐯]𝒜;[T\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}=[T]_{{}_{\scriptstyle\mathcal{B}% \mathcal{A}}}[\mathbf{v}]_{{}_{\scriptstyle\mathcal{A}}};

notice the balance of symbols 𝒜\mathcal{A} and \mathcal{B} here: this is the reason we put the first basis 𝒜\mathcal{A} into the second position.

The matrix [T]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} is easy to find: its kkth column is just the coordinate vector [T𝐚k][T\mathbf{a}_{k}]_{{}_{\scriptstyle\mathcal{B}}} (compare this with finding the matrix of a linear transformation from 𝔽n\mathbb{F}^{n} to 𝔽m\mathbb{F}^{m}).
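As a concrete illustration (our addition; the operator and the code are not from the text), the sketch below assembles, column by column, the matrix of the differentiation operator T=d/dxT=d/dx on the space of polynomials of degree at most 22, using the basis {1,x,x2}\{1,x,x^{2}\} for both the domain and the target.

```python
from sympy import Matrix, S, symbols, diff

x = symbols('x')
basis = [S.One, x, x**2]  # the basis used for both inputs and outputs

def coords(p):
    """Coordinate vector of a polynomial of degree <= 2 in the basis 1, x, x^2."""
    p = p.expand()
    return Matrix([p.coeff(x, k) for k in range(3)])

# the k-th column of the matrix is the coordinate vector of T(b_k) = b_k'
T = Matrix.hstack(*[coords(diff(b, x)) for b in basis])
print(T)  # Matrix([[0, 1, 0], [0, 0, 2], [0, 0, 0]])
```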

As in the case of standard bases, composition of linear transformations is equivalent to multiplication of their matrices: one only has to be a bit more careful about bases. Namely, let T1:XYT_{1}:X\to Y and T2:YZT_{2}:Y\to Z be linear transformations, and let 𝒜,\mathcal{A},\mathcal{B} and 𝒞\mathcal{C} be bases in XX, YY and ZZ respectively. Then for the composition T=T2T1T=T_{2}T_{1},

T:XZ,T𝐱:=T2(T1(𝐱))T:X\to Z,\qquad T\mathbf{x}:=T_{2}(T_{1}(\mathbf{x}))

we have

(2.8.1) [T]𝒞𝒜=[T2T1]𝒞𝒜=[T2]𝒞[T1]𝒜[T]_{{}_{\scriptstyle\mathcal{C}\mathcal{A}}}=[T_{2}T_{1}]_{{}_{\scriptstyle% \mathcal{C}\mathcal{A}}}=[T_{2}]_{{}_{\scriptstyle\mathcal{C}\mathcal{B}}}[T_{% 1}]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}

(notice again the balance of indices here).

The proof here goes exactly as in the case of 𝔽n\mathbb{F}^{n} spaces with standard bases, so we do not repeat it here. Another possibility is to transfer everything to the spaces 𝔽n\mathbb{F}^{n} via the coordinate isomorphisms 𝐯[𝐯]\mathbf{v}\mapsto[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}. Then one does not need any proof: everything follows from the results about matrix multiplication.

2.8.3. Change of coordinate matrix.

Let us have two bases 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\linebreak\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} and ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} in a vector space VV. Consider the identity transformation I=IVI=I_{V} and its matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} in these bases. By the definition

[𝐯]=[I]𝒜[𝐯]𝒜,𝐯V,[\mathbf{v}]_{{}_{\scriptstyle\mathcal{B}}}=[I]_{{}_{\scriptstyle\mathcal{B}% \mathcal{A}}}[\mathbf{v}]_{{}_{\scriptstyle\mathcal{A}}},\qquad\forall\mathbf{% v}\in V,

i.e. for any vector 𝐯V\mathbf{v}\in V the matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} transforms its coordinates in the basis 𝒜\mathcal{A} into coordinates in the basis \mathcal{B}. The matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} is often called the change of coordinates (from the basis 𝒜\mathcal{A} to the basis \mathcal{B}) matrix.

The matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}} is easy to compute: according to the general rule of finding the matrix of a linear transformation, its kkth column is the coordinate representation [𝐚k][\mathbf{a}_{k}]_{{}_{\scriptstyle\mathcal{B}}} of the kkth element of the basis 𝒜\mathcal{A}.

Note that

[I]𝒜=([I]𝒜)1,[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}=([I]_{{}_{\scriptstyle\mathcal{B% }\mathcal{A}}})^{-1},

(this follows immediately from the rule (2.8.1) for the multiplication of matrices), so any change of coordinates matrix is invertible.

An example: change of coordinates from the standard basis

Let our space VV be 𝔽n\mathbb{F}^{n}, and let us have a basis ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} there. We also have the standard basis 𝒮={𝐞1,𝐞2,,𝐞n}\mathcal{S}=\{\mathbf{e}_{1},\mathbf{e}_{2},\ldots,\mathbf{e}_{n}\} there. The change of coordinates matrix [I]𝒮[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}} is easy to compute:

[I]𝒮=[𝐛1,𝐛2,,𝐛n]=:B,[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=[\mathbf{b}_{1},\mathbf{b}_{2},% \ldots,\mathbf{b}_{n}]=:B,

i.e. it is just the matrix BB whose kkth column is the vector (column) 𝐛k\mathbf{b}_{k}. And in the other direction

[I]𝒮=([I]𝒮)1=B1.[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{S}}}=([I]_{{}_{\scriptstyle\mathcal{S% }\mathcal{B}}})^{-1}=B^{-1}.

For example, consider a basis

={(12),(21)}\mathcal{B}=\left\{\left(\begin{array}[]{c}1\\ 2\end{array}\right),\quad\left(\begin{array}[]{c}2\\ 1\end{array}\right)\right\}

in 𝔽2\mathbb{F}^{2}, and let 𝒮\mathcal{S} denote the standard basis there. Then

[I]𝒮=(1221)=:B[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=\left(\begin{array}[]{cc}1&2\\ 2&1\end{array}\right)=:B

and

[I]𝒮=[I]𝒮1=B1=13(1221)[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{S}}}=[I]_{{}_{\scriptstyle\mathcal{S}% \mathcal{B}}}^{-1}=B^{-1}=\frac{1}{3}\left(\begin{array}[]{rr}-1&2\\ 2&-1\end{array}\right)

(we know how to compute inverses, and it is also easy to check that the above matrix is indeed the inverse of BB).
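In coordinates, passing between the two bases is now a one-line computation. A small sketch (our addition, using SymPy), with a vector chosen just for illustration:

```python
from sympy import Matrix

B = Matrix([[1, 2],
            [2, 1]])      # [I]_{SB}: the columns are the basis vectors b_1, b_2
v = Matrix([3, 3])        # a vector written in standard coordinates

coords_B = B.inv() * v    # [v]_B = [I]_{BS} [v]_S
print(coords_B)           # Matrix([[1], [1]]): indeed v = 1*b_1 + 1*b_2
assert B * coords_B == v  # and back to the standard coordinates
```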

An example: going through the standard basis

In the space of polynomials of degree at most 11 we have bases

𝒜={1,1+x},and={1+2x,12x},\mathcal{A}=\{1,1+x\},\quad\text{and}\quad\mathcal{B}=\{1+2x,1-2x\},

and we want to find the change of coordinate matrix [I]𝒜[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}.

Of course, we can always take vectors from the basis 𝒜\mathcal{A} and try to decompose them in the basis \mathcal{B}; it involves solving linear systems, and we know how to do that.

However, I think the following way is simpler. In 1\mathbb{P}_{1} we also have the standard basis 𝒮={1,x}\mathcal{S}=\{1,x\}, and for this basis

[I]𝒮𝒜=(1101)=:A,[I]𝒮=(1122)=:B,[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{A}}}=\left(\begin{array}[]{rr}1&1\\ 0&1\end{array}\right)=:A,\qquad[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=% \left(\begin{array}[]{rr}1&1\\ 2&-2\end{array}\right)=:B,

and taking the inverses

[I]𝒜𝒮=A1=(1101),[I]𝒮=B1=14(2121).[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{S}}}=A^{-1}=\left(\begin{array}[]{rr}% 1&-1\\ 0&1\end{array}\right),\qquad[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{S}}}=B^{-% 1}=\frac{1}{4}\left(\begin{array}[]{rr}2&1\\ 2&-1\end{array}\right).

Then

[I]𝒜=[I]𝒮[I]𝒮𝒜=B1A=14(2121)(1101)[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}=[I]_{{}_{\scriptstyle\mathcal{B}% \mathcal{S}}}[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{A}}}=B^{-1}A=\frac{1}{4}% \left(\begin{array}[]{rr}2&1\\ 2&-1\end{array}\right)\left(\begin{array}[]{rr}1&1\\ 0&1\end{array}\right)

and

[I]𝒜=[I]𝒜𝒮[I]𝒮=A1B=(1101)(1122);[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}=[I]_{{}_{\scriptstyle\mathcal{A}% \mathcal{S}}}[I]_{{}_{\scriptstyle\mathcal{S}\mathcal{B}}}=A^{-1}B=\left(% \begin{array}[]{rr}1&-1\\ 0&1\end{array}\right)\left(\begin{array}[]{rr}1&1\\ 2&-2\end{array}\right);

the reader should notice the balance of indices in the above formulas.
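The two products above are easy to evaluate; a short sketch (our addition, using SymPy):

```python
from sympy import Matrix

A = Matrix([[1, 1], [0, 1]])   # [I]_{SA}
B = Matrix([[1, 1], [2, -2]])  # [I]_{SB}

print(B.inv() * A)  # [I]_{BA} = Matrix([[1/2, 3/4], [1/2, 1/4]])
print(A.inv() * B)  # [I]_{AB} = Matrix([[-1, 3], [2, -2]])

# the two change of coordinates matrices are mutually inverse
assert (B.inv() * A) * (A.inv() * B) == Matrix.eye(2)
```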

2.8.4. Matrix of a transformation and change of coordinates.

Let T:VWT:V\to W be a linear transformation, and let 𝒜\mathcal{A}, 𝒜~\widetilde{\mathcal{A}} be two bases in VV and let \mathcal{B}, ~\widetilde{\mathcal{B}} be two bases in WW. Suppose we know the matrix [T]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}, and we would like to find the matrix representation with respect to new bases 𝒜~\widetilde{\mathcal{A}}, ~\widetilde{\mathcal{B}}, i.e. the matrix [T]~𝒜~[T]_{{}_{\scriptstyle\widetilde{\mathcal{B}}\widetilde{\mathcal{A}}}}. The rule is very simple:

to get the matrix in the “new” bases one has to left and right multiply the matrix in the “old” bases by change of coordinates matrices.

I did not mention here which change of coordinate matrix should go where, because we don’t have any choice if we follow the balance of indices rule. Namely, matrix representation of a linear transformation changes according to the formula

[T]~𝒜~=[I]~[T]𝒜[I]𝒜𝒜~;\vphantom{T^{\mathcal{B}}}[T]_{{}_{\scriptstyle\widetilde{\mathcal{B}}% \widetilde{\mathcal{A}}}}=[I]_{{}_{\scriptstyle\widetilde{\mathcal{B}}\mathcal% {B}}}[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}[I]_{{}_{\scriptstyle% \mathcal{A}\widetilde{\mathcal{A}}}}\ ;

notice the balance of indices here. The proof can be done just by analyzing what each of the matrices does.

2.8.5. Case of one basis: similar matrices

Let VV be a vector space and let 𝒜={𝐚1,𝐚2,,𝐚n}\mathcal{A}=\{\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{n}\} be a basis in VV. Consider a linear transformation T:VVT:V\to V and let [T]𝒜𝒜[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}} be its matrix in this basis (we use the same basis for “inputs” and “outputs”).

The case when we use the same basis for “inputs” and “outputs” is very important (because in this case we can multiply a matrix by itself), so let us study it a bit more carefully. Notice that very often in this case the shorter notation [T]𝒜[T]_{{}_{\scriptstyle\mathcal{A}}} is used instead of [T]𝒜𝒜[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}}. However, the two index notation [T]𝒜𝒜[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}} is better adapted to the balance of indices rule, so I recommend using it (or at least always keeping it in mind) when doing a change of coordinates.

Let ={𝐛1,𝐛2,,𝐛n}\mathcal{B}=\{\mathbf{b}_{1},\mathbf{b}_{2},\ldots,\mathbf{b}_{n}\} be another basis in VV. By the change of coordinate rule above

[T]=[I]𝒜[T]𝒜𝒜[I]𝒜[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{B}}}=[I]_{{}_{\scriptstyle\mathcal{B}% \mathcal{A}}}[T]_{{}_{\scriptstyle\mathcal{A}\mathcal{A}}}[I]_{{}_{% \scriptstyle\mathcal{A}\mathcal{B}}}

Recalling that

[I]𝒜=[I]𝒜1[I]_{{}_{\scriptstyle\mathcal{B}\mathcal{A}}}=[I]_{{}_{\scriptstyle\mathcal{A}% \mathcal{B}}}^{-1}

and denoting Q:=[I]𝒜Q:=[I]_{{}_{\scriptstyle\mathcal{A}\mathcal{B}}}, we can rewrite the above formula as

[T]=Q1[T]𝒜𝒜Q.[T]_{{}_{\scriptstyle\mathcal{B}\mathcal{B}}}=Q^{-1}[T]_{{}_{\scriptstyle% \mathcal{A}\mathcal{A}}}Q.

This gives a motivation for the following definition.

Definition 2.8.1.

We say that a matrix AA is similar to a matrix BB if there exists an invertible matrix QQ such that A=Q1BQA=Q^{-1}BQ.

Since an invertible matrix must be square, it follows by counting dimensions that similar matrices AA and BB have to be square and of the same size. If AA is similar to BB, i.e. if A=Q1BQA=Q^{-1}BQ, then

B=QAQ1=(Q1)1A(Q1)B=QAQ^{-1}=(Q^{-1})^{-1}A(Q^{-1})

(since Q1Q^{-1} is invertible), therefore BB is similar to AA. So, we can just say that AA and BB are similar.

The above reasoning shows that it does not matter where to put QQ and where Q1Q^{-1}: one can use the formula A=QBQ1A=QBQ^{-1} in the definition of similarity.

The above discussion shows that one can treat similar matrices as different matrix representations of the same linear operator (transformation).
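A tiny sketch of this (our addition; the operator and the bases are invented for illustration): conjugating the matrix of TT by the change of coordinates matrix QQ produces the matrix of the same operator in another basis, which here happens to be diagonal.

```python
from sympy import Matrix

M = Matrix([[2, 1],
            [0, 3]])    # [T]_{AA}, the matrix of T in the basis A
Q = Matrix([[1, 1],
            [0, 1]])    # Q = [I]_{AB}: columns are the B-vectors in A-coordinates

print(Q.inv() * M * Q)  # [T]_{BB} = Matrix([[2, 0], [0, 3]])
# M and diag(2, 3) are similar: the same operator in two different bases
```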

Exercises.

2.8.1.

True or false

  1. a)

    Every change of coordinate matrix is square;

  2. b)

    Every change of coordinate matrix is invertible;

  3. c)

    The matrices AA and BB are called similar if B=QTAQB=Q^{T}AQ for some matrix QQ;

  4. d)

    The matrices AA and BB are called similar if B=Q1AQB=Q^{-1}AQ for some matrix QQ;

  5. e)

    Similar matrices do not need to be square.

2.8.2.

Consider the system of vectors

(1,2,1,1)T,(0,1,3,1)T,(0,3,2,0)T,(0,1,0,0)T.(1,2,1,1)^{T},\quad(0,1,3,1)^{T},\quad(0,3,2,0)^{T},\quad(0,1,0,0)^{T}.
  1. a)

Prove that it is a basis in 𝔽4\mathbb{F}^{4}. Try to do a minimal amount of computation.

  2. b)

    Find the change of coordinate matrix that changes the coordinates in this basis to the standard coordinates in 𝔽4\mathbb{F}^{4} (i.e. to the coordinates in the standard basis 𝐞1,,𝐞4\mathbf{e}_{1},\ldots,\mathbf{e}_{4}).

2.8.3.

Find the change of coordinates matrix that changes the coordinates in the basis 1,1+t1,1+t in 1\mathbb{P}_{1} to the coordinates in the basis 1t,2t1-t,2t.

2.8.4.

Let TT be the linear operator in 𝔽2\mathbb{F}^{2} defined (in the standard coordinates) by

T(xy)=(3x+yx2y)T\left(\begin{array}[]{c}x\\ y\end{array}\right)=\left(\begin{array}[]{c}3x+y\\ x-2y\end{array}\right)

Find the matrix of TT in the standard basis and in the basis

(1,1)T,(1,2)T.(1,1)^{T},\qquad(1,2)^{T}.
2.8.5.

Prove that if AA and BB are similar matrices, then traceA=traceB\operatorname{trace}A=\operatorname{trace}B. Hint: recall how trace(XY)\operatorname{trace}(XY) and trace(YX)\operatorname{trace}(YX) are related.

2.8.6.

Are the matrices

(1322)and(0242)\left(\begin{array}[]{cc}1&3\\ 2&2\end{array}\right)\qquad\text{and}\qquad\left(\begin{array}[]{cc}0&2\\ 4&2\end{array}\right)

similar? Justify.