Chapter 3 Determinants

3.1. Introduction.

The reader has probably already met determinants in calculus or algebra, at least the determinants of $2\times 2$ and $3\times 3$ matrices. For a $2\times 2$ matrix

\[
\left(\begin{array}{cc} a & b\\ c & d\end{array}\right)
\]

the determinant is simply $ad-bc$; the determinant of a $3\times 3$ matrix can be found by the “Star of David” rule.
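To make the small cases concrete, here is a minimal Python sketch of both formulas (the function names are ours); `det3` spells out the six products that the “Star of David” picture encodes.

```python
def det2(a, b, c, d):
    # Determinant of the 2x2 matrix [[a, b], [c, d]].
    return a * d - b * c

def det3(a, b, c, d, e, f, g, h, i):
    # "Star of David" rule for [[a, b, c], [d, e, f], [g, h, i]]:
    # three diagonal products minus three anti-diagonal products.
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

print(det2(1, 2, 3, 4))                   # 1*4 - 2*3 = -2
print(det3(1, 2, 0, 1, 1, 5, 1, -3, 0))   # 25
```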

In this chapter we would like to introduce determinants for $n\times n$ matrices. I don’t want to just give a formal definition. First I want to give some motivation, and then derive some properties the determinant should have. Then, if we want to have these properties, we do not have any choice, and we arrive at several equivalent definitions of the determinant.

It is more convenient to start not with the determinant of a matrix, but with the determinant of a system of vectors. There is no real difference here, since we can always join the vectors together (say as columns) to form a matrix.

Let us have $n$ vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ in $\mathbb{R}^n$ (notice that the number of vectors coincides with the dimension), and we want to find the $n$-dimensional volume of the parallelepiped determined by these vectors.

The parallelepiped determined by the vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ can be defined as the collection of all vectors $\mathbf{v}\in\mathbb{R}^n$ that can be represented as

\[
\mathbf{v} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + \ldots + t_n\mathbf{v}_n, \qquad 0\leq t_k\leq 1 \quad \forall k=1,2,\ldots,n.
\]

It can be easily visualized when $n=2$ (parallelogram) and $n=3$ (parallelepiped). So, what is the $n$-dimensional volume?

If $n=2$ it is the area; if $n=3$ it is indeed the volume. In dimension $1$ it is just the length.

Finally, let us introduce some notation. For a system of vectors (columns) $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n$ we will denote its determinant (which we are going to construct) as $D(\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n)$. If we join these vectors in a matrix $A$ (column number $k$ of $A$ is $\mathbf{a}_k$), then we will use the notation $\det A$,

\[
\det A = D(\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n).
\]

Also, for a matrix

\[
A=\left(\begin{array}{cccc} a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right)
\]

its determinant is often denoted by

\[
\left|\begin{array}{cccc} a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right|.
\]

3.2. What properties the determinant should have.

We know that for dimensions 2 and 3 the “volume” of a parallelepiped is determined by the base times height rule: if we pick one vector, then the height is the distance from this vector to the subspace spanned by the remaining vectors, and the base is the $(n-1)$-dimensional volume of the parallelepiped determined by the remaining vectors.

Now let us generalize this idea to higher dimensions. For the moment we do not care about how exactly to determine the height and the base. We will show that if we assume that the base and the height satisfy some natural properties, then we do not have any choice, and the volume (determinant) is uniquely defined.

3.2.1. Linearity in each argument.

First of all, if we multiply the vector $\mathbf{v}_1$ by a positive number $a$, then the height (i.e. the distance to the linear span $\mathcal{L}(\mathbf{v}_2,\ldots,\mathbf{v}_n)$) is multiplied by $a$. If we admit negative heights (and negative volumes), then this property holds for all scalars $a$, and so the determinant $D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n)$ of the system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ should satisfy

\[
D(\alpha\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = \alpha D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n).
\]

Of course, there is nothing special about the vector $\mathbf{v}_1$, so for any index $k$

(3.2.1)
\[
D(\mathbf{v}_1,\ldots,\underset{k}{\alpha\mathbf{v}_k},\ldots,\mathbf{v}_n) = \alpha D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
\]

To get the next property, let us notice that if we add two vectors, then the “height” of the result should be equal to the sum of the “heights” of the summands, i.e. that

(3.2.2)
\[
D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}_k+\mathbf{v}_k},\ldots,\mathbf{v}_n) = D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}_k},\ldots,\mathbf{v}_n) + D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
\]

In other words, the above two properties say that the determinant of $n$ vectors is linear in each argument (vector), meaning that if we fix $n-1$ vectors and interpret the remaining vector as a variable (argument), we get a linear function.
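For readers who like to experiment, here is a quick numerical sanity check of linearity in the first argument, using `numpy.linalg.det` as a stand-in for the function $D$ we are constructing; the specific vectors and scalars are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
v1, u1, v2, v3 = rng.standard_normal((4, 3))   # four vectors in R^3
a, b = 2.5, -1.5

def D(*cols):
    # Determinant of the matrix whose columns are the given vectors.
    return np.linalg.det(np.column_stack(cols))

# Linearity in the first argument: D(a u + b v, ...) = a D(u, ...) + b D(v, ...)
lhs = D(a * u1 + b * v1, v2, v3)
rhs = a * D(u1, v2, v3) + b * D(v1, v2, v3)
print(np.isclose(lhs, rhs))   # True, up to rounding error
```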

Remark.

We already know that linearity is a very nice property that helps in many situations. So, admitting negative heights (and therefore negative volumes) is a very small price to pay for linearity, since we can always take the absolute value afterwards.

In fact, by admitting negative heights, we did not sacrifice anything! On the contrary, we even gained something, because the sign of the determinant contains some information about the system of vectors (its orientation).

3.2.2. Determinant is 0 if two vectors coincide

If $\mathbf{v}_j = \mathbf{v}_k$ for some $j\neq k$, then the vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ are linearly dependent, so

\[
\dim\operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n\} < n.
\]

But then the volume must be 0 (for example, because the “height”, i.e. the distance from $\mathbf{v}_j$ to $\operatorname{span}\{\mathbf{v}_r : r\neq j\}$, is 0). Thus, it is natural to assume that

(3.2.3)
\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = 0 \qquad \text{if } \mathbf{v}_j = \mathbf{v}_k \quad \text{for some } j\neq k.
\]

In what follows, this will be the property of the determinant that we will use.

Remark.

A stronger property, namely that $D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = 0$ for any linearly dependent system of vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$, is also quite natural.

However, this property easily follows from the weaker assumption (3.2.3) and the linearity (3.2.1), (3.2.2); see Proposition 3.3.1 for details.

3.2.3. Preservation under “column replacement”

The next property also seems natural. Namely, if we take a vector, say $\mathbf{v}_j$, and add to it a multiple of another vector $\mathbf{v}_k$, the “height” does not change, so

(3.2.4)
\[
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j+\alpha\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) = D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n).
\]

In other words, if we apply the column operation of the third type, the determinant does not change.

However, this property is not an independent one; it is an easy corollary of (3.2.3) and the linearity (3.2.1), (3.2.2). Namely,

\begin{align*}
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j+\alpha\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) + \alpha D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n);
\end{align*}

the first equality here follows from the linearity (3.2.1), (3.2.2), and the second one is just (3.2.3).

3.2.4. Antisymmetry.

The next property the determinant should have is that if we interchange two vectors, the determinant changes sign:

(3.2.5)
\[
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_j},\ldots,\mathbf{v}_n) = -D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n).
\]
Remark.

Functions of several variables that change sign when one interchanges any two arguments are called antisymmetric.

At first sight this property does not look natural, but it can be deduced from the previous ones. Namely, applying property (3.2.4) three times and then using (3.2.1), we get

\begin{align*}
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j+(\mathbf{v}_k-\mathbf{v}_j)},\ldots,\underset{k}{\mathbf{v}_k-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{(\mathbf{v}_k-\mathbf{v}_j)-\mathbf{v}_k},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= -D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_j},\ldots,\mathbf{v}_n).
\end{align*}

Recall that property (3.2.4) follows from (3.2.3) and the linearity (3.2.1), (3.2.2), so the antisymmetry (3.2.5) also follows from these properties.

3.2.5. Normalization.

The last property is the easiest one. For the standard basis $\mathbf{e}_1,\mathbf{e}_2,\ldots,\mathbf{e}_n$ in $\mathbb{R}^n$ the corresponding parallelepiped is the $n$-dimensional unit cube, so

(3.2.6)
\[
D(\mathbf{e}_1,\mathbf{e}_2,\ldots,\mathbf{e}_n) = 1.
\]

In matrix notation this can be written as

\[
\det I = 1.
\]

3.3. Constructing the determinant.

The plan of the game is now as follows: using the properties that, as we decided in Section 3.2, the determinant should have, we derive other properties of the determinant, some of them highly non-trivial. We will show how to use these properties to compute the determinant using our old friend, row reduction.

Later, in Section 3.4, we will show that the determinant, i.e. a function with the desired properties, exists and is unique. After all, we have to be sure that the object we are computing and studying exists.

While our initial geometric motivation for the determinant and its properties came from considering vectors in the real vector space $\mathbb{R}^n$, so they relate only to matrices with real entries, all the constructions below use only algebraic operations (addition, multiplication, division) and are applicable to matrices with complex entries, and even with entries in an arbitrary field.

So in what follows we are constructing the determinant not just for real matrices, but for complex matrices as well (and also for matrices with entries in an arbitrary field). The nice geometric motivation for the properties works only in the real case, but after we have decided on the properties of the determinant (see properties 1–3 below), everything works in the general case.

3.3.1. Basic properties.

We will use the following basic properties of the determinant:

  1. Determinant is linear in each column, i.e. in vector notation for every index $k$

     \[
     D(\mathbf{v}_1,\ldots,\underset{k}{\alpha\mathbf{u}_k+\beta\mathbf{v}_k},\ldots,\mathbf{v}_n) = \alpha D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}_k},\ldots,\mathbf{v}_n) + \beta D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
     \]

     for all scalars $\alpha$, $\beta$.

  2. Determinant is antisymmetric, i.e. if one interchanges two columns, the determinant changes sign.

  3. Normalization property: $\det I = 1$.

All these properties were discussed above in Section 3.2. The first property is just (3.2.1) and (3.2.2) combined. The second one is (3.2.5), and the last one is the normalization property (3.2.6). Note that we did not include properties (3.2.3) and (3.2.4): they can be deduced from the above three. These three properties completely define the determinant!

3.3.2. Properties of determinant deduced from the basic properties.

In what follows, $\det A = D(\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n)$ is a function satisfying properties 1–3 from Section 3.3.1 above.

Proposition 3.3.1.

For a square matrix $A$ the following statements hold:

  1. If $A$ has a zero column, then $\det A = 0$.

  2. If $A$ has two equal columns, then $\det A = 0$.

  3. If one column of $A$ is a multiple of another, then $\det A = 0$.

  4. If the columns of $A$ are linearly dependent, i.e. if the matrix is not invertible, then $\det A = 0$.

Proof.

Statement 1 follows immediately from linearity. If we multiply the zero column by zero, we do not change the matrix or its determinant. But by property 1 above, we should get 0.

The fact that the determinant is antisymmetric implies statement 2. Indeed, if we interchange two equal columns, we change nothing, so the determinant remains the same. On the other hand, interchanging two columns changes the sign of the determinant, so

\[
\det A = -\det A,
\]

which is possible only if $\det A = 0$.

Statement 3 is an immediate corollary of statement 2 and linearity.

To prove the last statement, let us first suppose that the first vector $\mathbf{v}_1$ is a linear combination of the other vectors,

\[
\mathbf{v}_1 = \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 + \ldots + \alpha_n\mathbf{v}_n = \sum_{k=2}^n \alpha_k\mathbf{v}_k.
\]

Then by linearity we have (in vector notation)

\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = D\Big(\sum_{k=2}^n \alpha_k\mathbf{v}_k,\, \mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_n\Big) = \sum_{k=2}^n \alpha_k D(\mathbf{v}_k,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_n)
\]

and each determinant in the sum is zero because of two equal columns.

Let us now consider the general case, i.e. let us assume that the system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is linearly dependent. Then one of the vectors, say $\mathbf{v}_k$, can be represented as a linear combination of the others. Interchanging this vector with $\mathbf{v}_1$ we arrive at the situation we just treated, so

\[
D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) = -D(\mathbf{v}_k,\ldots,\underset{k}{\mathbf{v}_1},\ldots,\mathbf{v}_n) = -0 = 0,
\]

so the determinant in this case is also 0. ∎

The next proposition generalizes property (3.2.4). As we have already said above, this property can be deduced from the three “basic” properties of the determinant that we are using in this section.

Proposition 3.3.2.

The determinant does not change if we add to a column a linear combination of the other columns (leaving the other columns intact). In particular, the determinant is preserved under “column replacement” (the column operation of the third type).

Note that adding to a column a multiple of itself is prohibited here. We can only add multiples of the other columns.

Proof of Proposition 3.3.2.

Fix a vector $\mathbf{v}_k$, and let $\mathbf{u}$ be a linear combination of the other vectors,

\[
\mathbf{u} = \sum_{j\neq k}\alpha_j\mathbf{v}_j.
\]

Then by linearity

\[
D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k+\mathbf{u}},\ldots,\mathbf{v}_n) = D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) + D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}},\ldots,\mathbf{v}_n),
\]

and by Proposition 3.3.1 the last term is zero. ∎

3.3.3. Determinants of diagonal and triangular matrices.

Now we are ready to compute the determinant for some important special classes of matrices. The first class is the so-called diagonal matrices. Let us recall that a square matrix $A = \{a_{j,k}\}_{j,k=1}^n$ is called diagonal if all entries off the main diagonal are zero, i.e. if $a_{j,k} = 0$ for all $j\neq k$. We will often use the notation $\operatorname{diag}\{a_1,a_2,\ldots,a_n\}$ for the diagonal matrix

\[
\left(\begin{array}{cccc} a_1&0&\ldots&0\\ 0&a_2&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\ldots&a_n\end{array}\right).
\]

Since a diagonal matrix $\operatorname{diag}\{a_1,a_2,\ldots,a_n\}$ can be obtained from the identity matrix $I$ by multiplying column number $k$ by $a_k$ for each $k$, linearity in each column implies the following:

The determinant of a diagonal matrix equals the product of its diagonal entries,
\[
\det(\operatorname{diag}\{a_1,a_2,\ldots,a_n\}) = a_1 a_2 \ldots a_n.
\]

The next important class is the class of so-called triangular matrices. A square matrix $A = \{a_{j,k}\}_{j,k=1}^n$ is called upper triangular if all entries below the main diagonal are 0, i.e. if $a_{j,k} = 0$ for all $k<j$. A square matrix is called lower triangular if all entries above the main diagonal are 0, i.e. if $a_{j,k} = 0$ for all $j<k$. We call a matrix triangular if it is either a lower or an upper triangular matrix.

It is easy to see that

The determinant of a triangular matrix equals the product of its diagonal entries,
\[
\det A = a_{1,1} a_{2,2} \ldots a_{n,n}.
\]

Indeed, if a triangular matrix has a zero on the main diagonal, it is not invertible (this can easily be checked by column operations), and therefore both sides equal zero. If all diagonal entries are non-zero, then using column replacement (column operations of the third type) one can transform the matrix into a diagonal one with the same diagonal entries: for an upper triangular matrix one should first subtract appropriate multiples of the first column from columns number $2,3,\ldots,n$, “killing” all entries in the first row, then subtract appropriate multiples of the second column from columns number $3,\ldots,n$, and so on.

To treat the case of lower triangular matrices one has to do the “column reduction” from right to left, i.e. first subtract appropriate multiples of the last column from columns number $n-1,\ldots,2,1$, and so on.

3.3.4. Computing the determinant.

Now we know how to compute determinants using their properties: one just needs to do column reduction (i.e. row reduction for $A^T$), keeping track of the column operations that change the determinant. Fortunately, the most often used operation, replacement (the operation of the third type), does not change the determinant. So we only need to keep track of interchanges of columns and of multiplications of a column by a scalar.

If an echelon form of $A^T$ does not have pivots in every column (and row), then $A$ is not invertible, so $\det A = 0$. If $A$ is invertible, we arrive at a triangular matrix, and $\det A$ is the product of its diagonal entries times the correction from the column interchanges and multiplications.
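As an illustration, here is a minimal Python sketch of this algorithm; it performs the column reduction of $A$ as row reduction of $A^T$, exactly as described above, and the function name is ours.

```python
import numpy as np

def det_by_reduction(A):
    """det A via column reduction, implemented as row reduction of A^T.

    Replacement operations (type 3) do not change the determinant;
    each interchange flips its sign.  A minimal sketch, not tuned for
    numerical robustness.
    """
    U = np.array(A, dtype=float).T     # row reduction of A^T = column reduction of A
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        p = k + np.argmax(np.abs(U[k:, k]))   # pick the largest pivot candidate
        if U[p, k] == 0.0:
            return 0.0                        # no pivot: A is not invertible
        if p != k:
            U[[k, p]] = U[[p, k]]             # interchange: determinant changes sign
            sign = -sign
        U[k+1:] -= np.outer(U[k+1:, k] / U[k, k], U[k])  # replacement operations
    return sign * np.prod(np.diag(U))         # triangular: product of diagonal entries

A = [[1, 0, -2, 3], [-3, 1, 1, 2], [0, 4, -1, 1], [2, 3, 0, 1]]
print(det_by_reduction(A), np.linalg.det(A))  # the two values should agree
```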

The above algorithm implies that $\det A$ can be zero only if the matrix $A$ is not invertible. Combining this with the last statement of Proposition 3.3.1, we get

Proposition 3.3.3.

$\det A = 0$ if and only if $A$ is not invertible. An equivalent statement: $\det A \neq 0$ if and only if $A$ is invertible.

Note that although we now know how to compute determinants, the determinant is still not defined. One can ask: why don’t we define it as the result we get from the above algorithm? The problem is that formally this result is not well defined: we did not prove that different sequences of column operations yield the same answer.

3.3.5. Determinants of a transpose and of a product. Determinants of elementary matrices.

In this section we prove two important theorems.

Theorem 3.3.4 (Determinant of a transpose).

For a square matrix $A$,

\[
\det A = \det(A^T).
\]

This theorem implies that for all statements about columns that we discussed above, the corresponding statements about rows are also true. In particular, determinants behave under row operations the same way they behave under column operations. So, we can use row operations to compute determinants.

Theorem 3.3.5 (Determinant of a product).

For $n\times n$ matrices $A$ and $B$,

\[
\det(AB) = (\det A)(\det B).
\]

In other words

Determinant of a product equals product of determinants.
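Before turning to the proof, here is a quick numerical illustration of the product rule on random matrices (using `numpy`; the sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 4, 4))   # two random 4x4 matrices
# det(AB) and (det A)(det B) agree up to rounding error
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```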

To prove both theorems we need the following lemma.

Lemma 3.3.6.

For a square matrix $A$ and an elementary matrix $E$ (of the same size),

\[
\det(AE) = (\det A)(\det E).
\]
Proof.

The proof can be done just by direct checking: the determinants of elementary matrices are easy to compute; right multiplication by an elementary matrix is a column operation, and the effect of column operations on the determinant is well known.

It may look like a lucky coincidence that the determinants of elementary matrices agree with the effects of the corresponding column operations, but it is not a coincidence at all.

Namely, for a column operation the corresponding elementary matrix can be obtained from the identity matrix $I$ by this very column operation. So, its determinant is 1 (the determinant of $I$) times the effect of the column operation.

And that is all! It may be hard to realize at first, but the above paragraph is a complete and rigorous proof of the lemma! ∎

Applying Lemma 3.3.6 $N$ times, we get the following corollary.

Corollary 3.3.7.

For any matrix $A$ and any sequence of elementary matrices $E_1, E_2,\ldots,E_N$ (all matrices are $n\times n$),

\[
\det(AE_1E_2\ldots E_N) = (\det A)(\det E_1)(\det E_2)\ldots(\det E_N).
\]
Lemma 3.3.8.

Any invertible matrix is a product of elementary matrices.

Proof.

We know that any invertible matrix is row equivalent to the identity matrix, which is its reduced echelon form. So

\[
I = E_N E_{N-1}\ldots E_2 E_1 A,
\]

and therefore any invertible matrix can be represented as a product of elementary matrices,

\[
A = E_1^{-1}E_2^{-1}\ldots E_{N-1}^{-1}E_N^{-1} I = E_1^{-1}E_2^{-1}\ldots E_{N-1}^{-1}E_N^{-1}
\]

(the inverse of an elementary matrix is an elementary matrix). ∎

Proof of Theorem 3.3.4.

First of all, it can be easily checked that for an elementary matrix $E$ we have $\det E = \det(E^T)$. Notice also that it is sufficient to prove the theorem only for invertible matrices $A$: if $A$ is not invertible, then $A^T$ is also not invertible, and both determinants are zero.

By Lemma 3.3.8 the matrix $A$ can be represented as a product of elementary matrices,

\[
A = E_1 E_2 \ldots E_N,
\]

and by Corollary 3.3.7 the determinant of $A$ is the product of the determinants of the elementary matrices. Since taking the transpose just transposes each elementary matrix and reverses their order, Corollary 3.3.7 implies that $\det A = \det A^T$. ∎

Proof of Theorem 3.3.5.

Let us first suppose that the matrix $B$ is invertible. Then Lemma 3.3.8 implies that $B$ can be represented as a product of elementary matrices,

\[
B = E_1E_2\ldots E_N,
\]

and so by Corollary 3.3.7

\[
\det(AB) = (\det A)\big[(\det E_1)(\det E_2)\ldots(\det E_N)\big] = (\det A)(\det B).
\]

If $B$ is not invertible, then the product $AB$ is also not invertible, and the theorem just says that $0=0$.

To check that the product $AB = C$ is not invertible, let us assume that it is invertible. Then multiplying the identity $AB = C$ by $C^{-1}$ from the left, we get $C^{-1}AB = I$, so $C^{-1}A$ is a left inverse of $B$. So $B$ is left invertible, and since it is square, it is invertible. We have arrived at a contradiction. ∎

3.3.6. Summary of properties of determinant.

First of all, let us say once more that the determinant is defined only for square matrices! Since we now know that $\det A = \det(A^T)$, the statements that we knew about columns are true for rows too.

  1. Determinant is linear in each row (column) when the other rows (columns) are fixed.

  2. If one interchanges two rows (columns) of a matrix $A$, the determinant changes sign.

  3. For a triangular (in particular, for a diagonal) matrix its determinant is the product of the diagonal entries. In particular, $\det I = 1$.

  4. If a matrix $A$ has a zero row (or column), $\det A = 0$.

  5. If a matrix $A$ has two equal rows (columns), $\det A = 0$.

  6. If one of the rows (columns) of $A$ is a linear combination of the other rows (columns), i.e. if the matrix is not invertible, then $\det A = 0$. More generally,

  7. $\det A = 0$ if and only if $A$ is not invertible, or equivalently,

  8. $\det A \neq 0$ if and only if $A$ is invertible.

  9. $\det A$ does not change if we add to a row (column) a linear combination of the other rows (columns). In particular, the determinant is preserved under the row (column) replacement, i.e. under the row (column) operation of the third kind.

  10. $\det A^T = \det A$.

  11. $\det(AB) = (\det A)(\det B)$.

  And finally,

  12. If $A$ is an $n\times n$ matrix, then $\det(aA) = a^n\det A$.

The last property follows from the linearity of the determinant, if we recall that to multiply a matrix $A$ by $a$ we have to multiply each row by $a$, and that each such multiplication multiplies the determinant by $a$.

Exercises.

3.3.1.

If $A$ is an $n\times n$ matrix, how are the determinants $\det A$ and $\det(5A)$ related? Remark: $\det(5A) = 5\det A$ only in the trivial case of $1\times 1$ matrices.

3.3.2.

How are the determinants $\det A$ and $\det B$ related if

  a)
     \[
     A=\left(\begin{array}{ccc} a_1&a_2&a_3\\ b_1&b_2&b_3\\ c_1&c_2&c_3\end{array}\right),\qquad B=\left(\begin{array}{ccc} 2a_1&3a_2&5a_3\\ 2b_1&3b_2&5b_3\\ 2c_1&3c_2&5c_3\end{array}\right);
     \]
  b)
     \[
     A=\left(\begin{array}{ccc} a_1&a_2&a_3\\ b_1&b_2&b_3\\ c_1&c_2&c_3\end{array}\right),\qquad B=\left(\begin{array}{ccc} 3a_1&4a_2+5a_1&5a_3\\ 3b_1&4b_2+5b_1&5b_3\\ 3c_1&4c_2+5c_1&5c_3\end{array}\right).
     \]
3.3.3.

Using column or row operations, compute the determinants

\[
\left|\begin{array}{rrr} 0&1&2\\ -1&0&-3\\ 2&3&0\end{array}\right|,\qquad \left|\begin{array}{rrr} 1&2&3\\ 4&5&6\\ 7&8&9\end{array}\right|,\qquad \left|\begin{array}{rrrr} 1&0&-2&3\\ -3&1&1&2\\ 0&4&-1&1\\ 2&3&0&1\end{array}\right|,\qquad \left|\begin{array}{rr} 1&x\\ 1&y\end{array}\right|.
\]
3.3.4.

A square ($n\times n$) matrix $A$ is called skew-symmetric (or antisymmetric) if $A^T = -A$. Prove that if $A$ is skew-symmetric and $n$ is odd, then $\det A = 0$. Is this true for even $n$?

3.3.5.

A square matrix $A$ is called nilpotent if $A^k = \mathbf{0}$ for some positive integer $k$. Show that for a nilpotent matrix $A$ one has $\det A = 0$.

3.3.6.

Prove that if the matrices $A$ and $B$ are similar, then $\det A = \det B$.

3.3.7.

A real square matrix $Q$ is called orthogonal if $Q^TQ = I$. Prove that if $Q$ is an orthogonal matrix, then $\det Q = \pm 1$.

3.3.8.

Show that

\[
\left|\begin{array}{ccc} 1&x&x^2\\ 1&y&y^2\\ 1&z&z^2\end{array}\right| = (z-x)(z-y)(y-x).
\]

This is a particular case of the so-called Vandermonde determinant.

3.3.9.

Let points $A$, $B$ and $C$ in the plane $\mathbb{R}^2$ have coordinates $(x_1,y_1)$, $(x_2,y_2)$ and $(x_3,y_3)$ respectively. Show that the area of the triangle $ABC$ is the absolute value of

\[
\frac{1}{2}\left|\begin{array}{ccc} 1&x_1&y_1\\ 1&x_2&y_2\\ 1&x_3&y_3\end{array}\right|.
\]

Hint: use row operations and the geometric interpretation of $2\times 2$ determinants (area).

3.3.10.

Let $A$ be a square matrix. Show that the block triangular matrices

\[
\left(\begin{array}{cc} I&*\\ \mathbf{0}&A\end{array}\right),\qquad \left(\begin{array}{cc} A&*\\ \mathbf{0}&I\end{array}\right),\qquad \left(\begin{array}{cc} I&\mathbf{0}\\ *&A\end{array}\right),\qquad \left(\begin{array}{cc} A&\mathbf{0}\\ *&I\end{array}\right)
\]

all have determinant equal to $\det A$. Here $*$ can be anything.

The following problems illustrate the power of block matrix notation.

3.3.11.

Use the previous problem to show that if $A$ and $C$ are square matrices, then

\[
\det\left(\begin{array}{cc} A&B\\ \mathbf{0}&C\end{array}\right) = \det A \det C.
\]

Hint: $\left(\begin{array}{cc} A&B\\ \mathbf{0}&C\end{array}\right) = \left(\begin{array}{cc} I&B\\ \mathbf{0}&C\end{array}\right)\left(\begin{array}{cc} A&\mathbf{0}\\ \mathbf{0}&I\end{array}\right)$.

3.3.12.

Let $A$ be an $m\times n$ and $B$ an $n\times m$ matrix. Prove that

\[
\det\left(\begin{array}{cc} 0&A\\ -B&I\end{array}\right) = \det(AB).
\]

Hint: While it is possible to transform the matrix by row operations to a form where the determinant is easy to compute, the easiest way is to right multiply the matrix by $\left(\begin{array}{cc} I&0\\ B&I\end{array}\right)$.

3.4. Formal definition. Existence and uniqueness of the determinant.

In this section we arrive at the formal definition of the determinant. We show that a function satisfying the basic properties 1, 2, 3 from Section 3.3 exists, and moreover, that such a function is unique, i.e. we do not have any choice in constructing the determinant.

Consider an $n\times n$ matrix $A = \{a_{j,k}\}_{j,k=1}^n$, and let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be its columns, i.e.

\[
\mathbf{v}_k = \left(\begin{array}{c} a_{1,k}\\ a_{2,k}\\ \vdots\\ a_{n,k}\end{array}\right) = a_{1,k}\mathbf{e}_1 + a_{2,k}\mathbf{e}_2 + \ldots + a_{n,k}\mathbf{e}_n = \sum_{j=1}^n a_{j,k}\mathbf{e}_j.
\]

Using linearity of the determinant, we expand it in the first column $\mathbf{v}_1$:

(3.4.1)
\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = D\Big(\sum_{j=1}^n a_{j,1}\mathbf{e}_j,\, \mathbf{v}_2,\ldots,\mathbf{v}_n\Big) = \sum_{j=1}^n a_{j,1} D(\mathbf{e}_j,\mathbf{v}_2,\ldots,\mathbf{v}_n).
\]

Then we expand it in the second column, then in the third, and so on. We get

\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = \sum_{j_1=1}^n \sum_{j_2=1}^n \ldots \sum_{j_n=1}^n a_{j_1,1}a_{j_2,2}\ldots a_{j_n,n}\, D(\mathbf{e}_{j_1},\mathbf{e}_{j_2},\ldots,\mathbf{e}_{j_n}).
\]

Notice that we have to use a different index of summation for each column: we call them $j_1, j_2,\ldots,j_n$; the index $j_1$ here is the same as the index $j$ in (3.4.1).

It is a huge sum: it contains $n^n$ terms. Fortunately, some of the terms are zero. Namely, if any two of the indices $j_1,j_2,\ldots,j_n$ coincide, the determinant $D(\mathbf{e}_{j_1},\mathbf{e}_{j_2},\ldots,\mathbf{e}_{j_n})$ is zero, because there are two equal columns here.

So, let us rewrite the sum, omitting all zero terms. The most convenient way to do that is using the notion of a permutation. Informally, a permutation of an ordered set $\{1,2,\ldots,n\}$ is a rearrangement of its elements. A convenient formal way to represent such a rearrangement is by using a function

\[
\sigma:\{1,2,\ldots,n\}\to\{1,2,\ldots,n\},
\]

where $\sigma(1),\sigma(2),\ldots,\sigma(n)$ gives the new order of the set $1,2,\ldots,n$. In other words, the permutation $\sigma$ rearranges the ordered set $1,2,\ldots,n$ into $\sigma(1),\sigma(2),\ldots,\sigma(n)$.

Such a function $\sigma$ has to be one-to-one (different values for different arguments) and onto (it assumes all possible values from the target space). Functions which are one-to-one and onto are called bijections, and they give a one-to-one correspondence between the domain and the target space.

Footnote. There is another canonical way to represent a permutation by a bijection $\sigma$, namely one in which $\sigma(k)$ gives the new position of the element number $k$. In this representation $\sigma$ rearranges $\sigma(1),\sigma(2),\ldots,\sigma(n)$ into $1,2,\ldots,n$. While in the first representation it is easy to write down the function if you know the rearrangement of the set $1,2,\ldots,n$, the second one is better adapted to the composition of permutations: it coincides with the composition of functions. Namely, if we first perform the permutation that corresponds to a function $\sigma$ and then the one that corresponds to $\tau$, the resulting permutation will correspond to $\tau\circ\sigma$.

Although it is not directly relevant here, let us notice that it is well known in combinatorics that the number of different permutations of the set $\{1,2,\ldots,n\}$ is exactly $n!$. The set of all permutations of the set $\{1,2,\ldots,n\}$ will be denoted $\operatorname{Perm}(n)$.

Using the notion of a permutation, we can rewrite the determinant as

\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = \sum_{\sigma\in\operatorname{Perm}(n)} a_{\sigma(1),1}a_{\sigma(2),2}\ldots a_{\sigma(n),n}\, D(\mathbf{e}_{\sigma(1)},\mathbf{e}_{\sigma(2)},\ldots,\mathbf{e}_{\sigma(n)}).
\]

The matrix with columns $\mathbf{e}_{\sigma(1)},\mathbf{e}_{\sigma(2)},\ldots,\mathbf{e}_{\sigma(n)}$ can be obtained from the identity matrix by finitely many column interchanges, so the determinant

\[
D(\mathbf{e}_{\sigma(1)},\mathbf{e}_{\sigma(2)},\ldots,\mathbf{e}_{\sigma(n)})
\]

is $1$ or $-1$ depending on the number of column interchanges.

To formalize that, we (informally) define the sign (denoted $\operatorname{sign}\sigma$) of a permutation $\sigma$ to be $1$ if an even number of interchanges is necessary to rearrange the $n$-tuple $1,2,\ldots,n$ into $\sigma(1),\sigma(2),\ldots,\sigma(n)$, and $\operatorname{sign}(\sigma) = -1$ if the number of interchanges is odd.

It is a well-known fact from combinatorics that the sign of a permutation is well defined, i.e. that although there are infinitely many ways to get the $n$-tuple $\sigma(1),\sigma(2),\ldots,\sigma(n)$ from $1,2,\ldots,n$, the number of interchanges is either always odd or always even.

One of the ways to show that is to introduce an alternative definition. Let $K = K(\sigma)$ be the number of disorders of $\sigma$, i.e. the number of pairs $(j,k)$, $j,k\in\{1,2,\ldots,n\}$, $j<k$, such that $\sigma(j)>\sigma(k)$, and see if this number is even or odd. We call the permutation $\sigma$ odd if $K$ is odd and even if $K$ is even. Then define $\operatorname{sign}\sigma := (-1)^{K(\sigma)}$; note that this way $\operatorname{sign}\sigma$ is well defined.

We want to show that $\operatorname{sign}\sigma = (-1)^{K(\sigma)}$ can indeed be computed by rearranging the $n$-tuple $1,2,\ldots,n$ into $\sigma(1),\sigma(2),\ldots,\sigma(n)$ and counting the number of interchanges, as was described above.

If $\sigma(k)=k$ for all $k$, then the number of disorders $K(\sigma)$ is 0, so the sign of this identity permutation is $1$. Note also that any elementary transpose, which interchanges two neighbors, changes the sign of a permutation, because it changes (increases or decreases) the number of disorders by exactly 1. So, to get from one permutation to another, one always needs an even number of elementary transposes if the permutations have the same sign, and an odd number if the signs are different.

Finally, any interchange of two entries can be achieved by an odd number of elementary transposes. This implies that the sign changes under an interchange of two entries. So, to get from $1,2,\ldots,n$ to an even permutation (positive sign) one always needs an even number of interchanges, and an odd number of interchanges is needed to get an odd permutation (negative sign).
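The counting of disorders translates directly into code. Here is a minimal Python sketch (the function name is ours) computing $\operatorname{sign}\sigma$ from the definition above:

```python
def sign(sigma):
    """Sign of a permutation given as the tuple (sigma(1), ..., sigma(n)).

    Counts the disorders: pairs j < k with sigma(j) > sigma(k); an O(n^2)
    sketch that follows the definition literally.
    """
    n = len(sigma)
    disorders = sum(1 for j in range(n) for k in range(j + 1, n)
                    if sigma[j] > sigma[k])
    return (-1) ** disorders

print(sign((1, 2, 3)))        #  1: the identity has no disorders
print(sign((2, 1, 3)))        # -1: one elementary transpose
print(sign((5, 4, 1, 2, 3)))  # the permutation from Exercise 3.4.1 below
```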

So, if we want the determinant to satisfy the basic properties 1–3 from Section 3.3, we must define it as

(3.4.2)
\[
\det A = \sum_{\sigma\in\operatorname{Perm}(n)} a_{\sigma(1),1}a_{\sigma(2),2}\ldots a_{\sigma(n),n}\operatorname{sign}(\sigma),
\]

where the sum is taken over all permutations of the set $\{1,2,\ldots,n\}$.

If we define the determinant this way, it is easy to check that it satisfies the basic properties 1–3 from Section 3.3. Indeed, it is linear in each column, because for each column every term (product) in the sum contains exactly one entry from this column.

Interchanging two columns of $A$ just adds an extra interchange to each permutation, so the right side in (3.4.2) changes sign. Finally, for the identity matrix $I$, the right side of (3.4.2) is $1$ (it has one non-zero term).
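To make formula (3.4.2) concrete, here is a minimal Python sketch that evaluates it literally, summing over all $n!$ permutations; the function name is ours, and this is an illustration rather than a practical algorithm.

```python
from itertools import permutations

def det_leibniz(A):
    """Determinant via the sum over permutations, formula (3.4.2).

    The sum has n! terms, so this is an illustration for tiny n,
    not an efficient algorithm.
    """
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # sign(sigma) = (-1)^{number of disorders}
        disorders = sum(1 for j in range(n) for k in range(j + 1, n)
                        if sigma[j] > sigma[k])
        term = 1
        for col in range(n):
            term *= A[sigma[col]][col]          # the entry a_{sigma(k), k}
        total += term if disorders % 2 == 0 else -term
    return total

print(det_leibniz([[1, 2], [3, 4]]))  # ad - bc = 1*4 - 2*3 = -2
```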

Exercises.

3.4.1.

Suppose the permutation $\sigma$ takes $(1,2,3,4,5)$ to $(5,4,1,2,3)$.

  a) Find the sign of $\sigma$;

  b) What does $\sigma^2 := \sigma\circ\sigma$ do to $(1,2,3,4,5)$?

  c) What does the inverse permutation $\sigma^{-1}$ do to $(1,2,3,4,5)$?

  d) What is the sign of $\sigma^{-1}$?

3.4.2.

Let $P$ be a permutation matrix, i.e. an $n\times n$ matrix consisting of zeroes and ones and such that there is exactly one 1 in every row and every column.

  a) Can you describe the corresponding linear transformation? That will explain the name.

  b) Show that $P$ is invertible. Can you describe $P^{-1}$?

  c) Show that for some $N>0$

     \[
     P^N := \underbrace{PP\ldots P}_{N\text{ times}} = I.
     \]

     Use the fact that there are only finitely many permutations.

3.4.3.

Why is there an even number of permutations of $(1,2,\ldots,9)$, and why are exactly half of them odd permutations? Hint: This problem can be hard to solve in terms of permutations, but there is a very simple solution using determinants.

3.4.4.

If $\sigma$ is an odd permutation, explain why $\sigma^2$ is even but $\sigma^{-1}$ is odd.

3.4.5.

How many multiplications and additions are required to compute the determinant of an $n\times n$ matrix using the formal definition (3.4.2) of the determinant? Do not count the operations needed to compute $\operatorname{sign}\sigma$.

3.5. Cofactor expansion.

For an $n\times n$ matrix $A = \{a_{j,k}\}_{j,k=1}^n$ let $A_{j,k}$ denote the $(n-1)\times(n-1)$ matrix obtained from $A$ by crossing out row number $j$ and column number $k$.

Theorem 3.5.1 (Cofactor expansion of determinant).

Let $A$ be an $n\times n$ matrix. For each $j$, $1\leq j\leq n$, the determinant of $A$ can be expanded in row number $j$ as

\[
\det A = a_{j,1}(-1)^{j+1}\det A_{j,1} + a_{j,2}(-1)^{j+2}\det A_{j,2} + \ldots + a_{j,n}(-1)^{j+n}\det A_{j,n} = \sum_{k=1}^n a_{j,k}(-1)^{j+k}\det A_{j,k}.
\]

Similarly, for each $k$, $1\leq k\leq n$, the determinant can be expanded in column number $k$,

\[
\det A = \sum_{j=1}^n a_{j,k}(-1)^{j+k}\det A_{j,k}.
\]
Proof.

Let us first prove the formula for the expansion in row number 1. The formula for the expansion in row number 2 can then be obtained from it by interchanging rows number 1 and 2. Interchanging then rows number 2 and 3, we get the formula for the expansion in row number 3, and so on.

Since $\det A = \det A^T$, the column expansion follows automatically.

Let us first consider the special case when the first row has only one non-zero term $a_{1,1}$. Performing column operations on columns $2,3,\ldots,n$, we transform $A$ to lower triangular form. The determinant of $A$ can then be computed as

the product of the diagonal entries of the triangular matrix, times the correcting factor from the column operations.

But the product of all diagonal entries except the first one (i.e. without $a_{1,1}$), times the correcting factor, is exactly $\det A_{1,1}$, so in this particular case $\det A = a_{1,1}\det A_{1,1}$.

Let us now consider the case when all entries in the first row except $a_{1,2}$ are zeroes. This case can be reduced to the previous one by interchanging columns number 1 and 2, and therefore in this case $\det A = (-1)a_{1,2}\det A_{1,2}$.

The case when $a_{1,3}$ is the only non-zero entry in the first row can be reduced to the previous one by interchanging columns 2 and 3, so in this case $\det A = a_{1,3}\det A_{1,3}$.

Repeating this procedure, we get that in the case when $a_{1,k}$ is the only non-zero entry in the first row, $\det A = (-1)^{1+k}a_{1,k}\det A_{1,k}$.

Footnote. In the case when $a_{1,k}$ is the only non-zero entry in the first row, it may be tempting to exchange columns number 1 and number $k$, to reduce the problem to the case $a_{1,1}\neq 0$. However, when we exchange columns 1 and $k$ we change the order of the other columns: if we just cross out column number $k$, then column number 1 will be the first of the remaining columns. But if we exchange columns 1 and $k$, and then cross out column $k$ (which is now the first one), then column 1 will now be column number $k-1$. To avoid the complications of keeping track of the order of the columns, we can, as we did above, exchange columns number $k$ and $k-1$, reducing everything to the situation we treated in the previous step. Such an operation does not change the order of the rest of the columns.

In the general case, linearity of the determinant in each row implies that

\[
\det A = \det A^{(1)} + \det A^{(2)} + \ldots + \det A^{(n)} = \sum_{k=1}^n \det A^{(k)},
\]

where the matrix $A^{(k)}$ is obtained from $A$ by replacing all entries in the first row except $a_{1,k}$ by 0. As we just discussed above,

\[
\det A^{(k)} = (-1)^{1+k}a_{1,k}\det A_{1,k},
\]

so

\[
\det A = \sum_{k=1}^n (-1)^{1+k}a_{1,k}\det A_{1,k}.
\]

To get the cofactor expansion in the second row, we can interchange the first and second rows and apply the above formula. The row exchange changes the sign, so we get

\[
\det A = -\sum_{k=1}^n (-1)^{1+k}a_{2,k}\det A_{2,k} = \sum_{k=1}^n (-1)^{2+k}a_{2,k}\det A_{2,k}.
\]

Exchanging rows 3 and 2 and expanding in the second row, we get the formula

\[
\det A = \sum_{k=1}^n (-1)^{3+k}a_{3,k}\det A_{3,k},
\]

and so on.

To expand the determinant $\det A$ in a column, one needs to apply the row expansion formula to $A^T$. ∎

Definition.

The numbers

\[
C_{j,k} = (-1)^{j+k}\det A_{j,k}
\]

are called cofactors.

Using this notation, the formula for the expansion of the determinant in row number $j$ can be rewritten as

\[
\det A = a_{j,1}C_{j,1} + a_{j,2}C_{j,2} + \ldots + a_{j,n}C_{j,n} = \sum_{k=1}^n a_{j,k}C_{j,k}.
\]

Similarly, the expansion in column number $k$ can be written as

\[
\det A = a_{1,k}C_{1,k} + a_{2,k}C_{2,k} + \ldots + a_{n,k}C_{n,k} = \sum_{j=1}^n a_{j,k}C_{j,k}.
\]
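The row expansion translates directly into a recursive procedure. Here is a minimal Python sketch expanding in the first row (the function name is ours); as the remarks below explain, this is only reasonable for small matrices.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion in the first row (Theorem 3.5.1).

    A recursive sketch; it performs on the order of n! multiplications,
    so it is only reasonable for small matrices.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        # A_{1,k}: cross out row 1 and column k (here 0-based indices).
        minor = [row[:k] + row[k+1:] for row in A[1:]]
        total += A[0][k] * (-1) ** k * det_cofactor(minor)   # a_{1,k} C_{1,k}
    return total

print(det_cofactor([[1, 2, 0], [1, 1, 5], [1, -3, 0]]))  # 25; cf. Exercise 3.5.2 below
```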
Remark.

Very often the cofactor expansion formula is used as the definition of the determinant. It is not difficult to show that the quantity given by this formula satisfies the basic properties of the determinant: the normalization property is trivial, and the proof of antisymmetry is easy. However, the proof of linearity is a bit tedious (although not too difficult).

Remark.

Although it looks very nice, the cofactor expansion formula is not suitable for computing determinants of matrices bigger than $3\times 3$.

As one can count, it requires more than $n!$ multiplications (to be precise, it requires $\sum_{k=1}^{n-1} n!/k!$ multiplications), and $n!$ grows very rapidly. For example, the cofactor expansion of a $20\times 20$ matrix requires more than $20!\approx 2.433\cdot 10^{18}$ multiplications. It would take a computer performing a billion multiplications per second over 77 years to perform $20!$ multiplications; performing the multiplications required for the cofactor expansion of the determinant of a $20\times 20$ matrix would require more than 132 years.

Footnote. The reader can check the numbers using, for example, WolframAlpha.

On the other hand, computing the determinant of an $n\times n$ matrix using row reduction requires $(n^3+2n-3)/3$ multiplications (and about the same number of additions). It would take a computer performing a million operations per second (very slow by today’s standards) only a fraction of a second to compute the determinant of a $100\times 100$ matrix by row reduction.

It can only be practical to apply the cofactor expansion formula in higher dimensions if a row (or a column) has a lot of zero entries.
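One can tabulate both operation counts in a few lines of Python; the two formulas below are the ones quoted above, and the function names are ours.

```python
from math import factorial

def cofactor_mults(n):
    # Multiplications for cofactor expansion: sum_{k=1}^{n-1} n!/k!
    return sum(factorial(n) // factorial(k) for k in range(1, n))

def reduction_mults(n):
    # Multiplications for row reduction: (n^3 + 2n - 3)/3
    return (n**3 + 2*n - 3) // 3

for n in (3, 10, 20):
    print(n, cofactor_mults(n), reduction_mults(n))
# The gap is dramatic already for n = 10, and astronomical for n = 20.
```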

However, the cofactor expansion formula is of great theoretical importance, as the next section shows.

3.5.1. Cofactor formula for the inverse matrix

The matrix $C = \{C_{j,k}\}_{j,k=1}^n$ whose entries are the cofactors of a given matrix $A$ is called the cofactor matrix of $A$.

Theorem 3.5.2.

Let $A$ be an invertible matrix and let $C$ be its cofactor matrix. Then

\[
A^{-1} = \frac{1}{\det A}\, C^T.
\]
Proof.

Let us find the product $AC^T$. Diagonal entry number $j$ is obtained by multiplying the $j$th row of $A$ by the $j$th column of $C^T$ (i.e. the $j$th row of $C$), so

\[
(AC^T)_{j,j} = a_{j,1}C_{j,1} + a_{j,2}C_{j,2} + \ldots + a_{j,n}C_{j,n} = \det A,
\]

by the cofactor expansion formula.

To get the off-diagonal terms, we need to multiply the $k$th row of $A$ by the $j$th column of $C^T$, $j\neq k$, to get

\[
a_{k,1}C_{j,1} + a_{k,2}C_{j,2} + \ldots + a_{k,n}C_{j,n}.
\]

It follows from the cofactor expansion formula (expanding in the $j$th row) that this is the determinant of the matrix obtained from $A$ by replacing row number $j$ by row number $k$ (and leaving all other rows as they were). But rows $j$ and $k$ of this matrix coincide, so the determinant is 0. So, all off-diagonal entries of $AC^T$ are zeroes (and all diagonal ones equal $\det A$), thus

\[
AC^T = (\det A)\, I.
\]

That means that the matrix $\frac{1}{\det A}\, C^T$ is a right inverse of $A$, and since $A$ is square, it is the inverse. ∎
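Here is a minimal Python sketch of the theorem (the function names are ours). It uses exact `Fraction` arithmetic and a small recursive cofactor-expansion determinant, so it is meant for small matrices only.

```python
from fractions import Fraction

def det(A):
    # Determinant by cofactor expansion in the first row (Theorem 3.5.1).
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([r[:k] + r[k+1:] for r in A[1:]])
               for k in range(len(A)))

def cofactor_inverse(A):
    """A^{-1} = (1/det A) C^T, with C the cofactor matrix (Theorem 3.5.2)."""
    n, d = len(A), det(A)
    # Entry (j, k) of the inverse is C_{k,j} / det A (note the transpose):
    # the minor below crosses out row k and column j of A.
    return [[Fraction((-1) ** (j + k) *
                      det([row[:j] + row[j+1:] for r, row in enumerate(A) if r != k]),
                      d)
             for k in range(n)]
            for j in range(n)]

A = [[1, 1, 0], [2, 1, 2], [0, 1, 1]]   # the last matrix of Exercise 3.5.4 below
print(cofactor_inverse(A))              # rows of A^{-1}, as exact fractions
```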

Recalling that for an invertible matrix $A$ the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution

\[
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det A}\, C^T\mathbf{b},
\]

we get the following corollary of the above theorem.

Corollary 3.5.3 (Cramer’s rule).

For an invertible matrix $A$, entry number $k$ of the solution of the equation $A\mathbf{x}=\mathbf{b}$ is given by the formula

\[
x_k = \frac{\det B_k}{\det A},
\]

where the matrix $B_k$ is obtained from $A$ by replacing column number $k$ of $A$ by the vector $\mathbf{b}$.
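And a sketch of Cramer’s rule in the same style, reusing `det` and `Fraction` from the previous sketch; again, an illustration for small systems rather than a practical solver.

```python
def cramer_solve(A, b):
    """Solve A x = b by Cramer's rule (Corollary 3.5.3).

    Each x_k is det(B_k) / det(A), where B_k is A with column k
    replaced by the vector b.
    """
    n, d = len(A), det(A)
    return [Fraction(det([row[:k] + [b[r]] + row[k+1:]
                          for r, row in enumerate(A)]), d)
            for k in range(n)]

print(cramer_solve([[1, 2], [3, 4]], [5, 6]))  # [Fraction(-4, 1), Fraction(9, 2)]
```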

3.5.2. Some applications of the cofactor formula for the inverse.

Example (Inverting $2\times 2$ matrices).

The cofactor formula really shines when one needs to invert a $2\times 2$ matrix

\[
A = \left(\begin{array}{cc} a&b\\ c&d\end{array}\right).
\]

The cofactors are just entries ($1\times 1$ matrices), and the cofactor matrix is

\[
\left(\begin{array}{cc} d&-c\\ -b&a\end{array}\right),
\]

so the inverse matrix $A^{-1}$ is given by the formula

\[
A^{-1} = \frac{1}{\det A}\left(\begin{array}{cc} d&-b\\ -c&a\end{array}\right).
\]

While the cofactor formula for the inverse does not look practical for dimensions higher than 3, it has great theoretical value, as the examples below illustrate.

Example (Matrix with integer inverse).

Suppose that we want to construct a matrix $A$ with integer entries such that its inverse also has integer entries (inverting such a matrix would make a nice homework problem: no messing with fractions). If $\det A = 1$ and the entries of $A$ are integers, the cofactor formula for inverses implies that $A^{-1}$ also has integer entries.

Note that it is easy to construct an integer matrix $A$ with $\det A = 1$: one should start with a triangular matrix with $1$s on the main diagonal, and then apply several row or column replacements (operations of the third type) to make the matrix look generic.

Example (Inverse of a polynomial matrix).

Another example is to consider a polynomial matrix $A(x)$, i.e. a matrix whose entries are not numbers but polynomials $a_{j,k}(x)$ of the variable $x$. If $\det A(x)\equiv 1$, then the inverse matrix $A^{-1}(x)$ is also a polynomial matrix.

If $\det A(x) = p(x)\not\equiv 0$, it follows from the cofactor expansion that $p(x)$ is a polynomial, so $A^{-1}(x)$ has rational entries; moreover, $p(x)$ is a multiple of each denominator.
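This is easy to see in action with `sympy`. The matrix below is our own example of a polynomial matrix with determinant identically 1, built, as in the integer example, from a triangular matrix by a column replacement; its inverse again has polynomial entries.

```python
import sympy as sp

x = sp.symbols('x')
# Start from the triangular matrix [[1, 0], [x, 1]] and add x times
# column 1 to column 2: the determinant stays identically 1.
A = sp.Matrix([[1, x], [x, 1 + x**2]])
print(sp.simplify(A.det()))  # 1
print(A.inv())               # Matrix([[x**2 + 1, -x], [-x, 1]]): polynomial entries
```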

Exercises.

3.5.1.

Evaluate the determinants using any method:

\[
\left|\begin{array}{rrr} 0&1&1\\ 1&2&-5\\ 6&-4&3\end{array}\right|,\qquad \left|\begin{array}{rrrr} 1&-2&3&-12\\ -5&12&-14&19\\ -9&22&-20&31\\ -4&9&-14&15\end{array}\right|.
\]
3.5.2.

Use row (column) expansion to evaluate the determinants. Note that you don’t need to use the first row (column): picking a row (column) with many zeroes will simplify your calculations.

\[
\left|\begin{array}{rrr} 1&2&0\\ 1&1&5\\ 1&-3&0\end{array}\right|,\qquad \left|\begin{array}{rrrr} 4&-6&-4&4\\ 2&1&0&0\\ 0&-3&1&3\\ -2&2&-3&-5\end{array}\right|.
\]
3.5.3.

For the n×nn\times n matrix

A=(0000a01000a10100a20000an20001an1)A=\left(\begin{array}[]{ccccccccccc}0&0&0&\ldots&0&a_{0}\\ -1&0&0&\ldots&0&a_{1}\\ 0&-1&0&\ldots&0&a_{2}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\ldots&0&a_{n-2}\\ 0&0&0&\ldots&-1&a_{n-1}\\ \end{array}\right)

compute det(A+tI)\det(A+tI), where II is n×nn\times n identity matrix. You should get a nice expression involving a0,a1,,an1a_{0},a_{1},\ldots,a_{n-1} and tt. Row expansion and induction is probably the best way to go.

3.5.4.

Using the cofactor formula, compute the inverses of the matrices

\[
\left(\begin{array}{cc} 1&2\\ 3&4\end{array}\right),\qquad \left(\begin{array}{cc} 19&-17\\ 3&-2\end{array}\right),\qquad \left(\begin{array}{cc} 1&0\\ 3&5\end{array}\right),\qquad \left(\begin{array}{ccc} 1&1&0\\ 2&1&2\\ 0&1&1\end{array}\right).
\]
3.5.5.

Let $D_n$ be the determinant of the $n\times n$ tridiagonal matrix

\[
\left(\begin{array}{ccccc} 1&-1&&&0\\ 1&1&-1&&\\ &1&\ddots&\ddots&\\ &&\ddots&1&-1\\ 0&&&1&1\end{array}\right).
\]

Using cofactor expansion, show that $D_n = D_{n-1} + D_{n-2}$. This yields that the sequence $D_n$ is the Fibonacci sequence $1,2,3,5,8,13,21,\ldots$

3.5.6.

Vandermonde determinant revisited. Our goal is to prove the formula

\[
\left|\begin{array}{ccccc} 1&c_0&c_0^2&\ldots&c_0^n\\ 1&c_1&c_1^2&\ldots&c_1^n\\ \vdots&\vdots&\vdots&&\vdots\\ 1&c_n&c_n^2&\ldots&c_n^n\end{array}\right| = \prod_{0\leq j<k\leq n}(c_k-c_j)
\]

for the $(n+1)\times(n+1)$ Vandermonde determinant.

We will apply induction. To do this

  a) Check that the formula holds for $n=1$, $n=2$.

  b) Call the variable $c_n$ in the last row $x$, and show that the determinant is a polynomial of degree $n$, $A_0+A_1x+A_2x^2+\ldots+A_nx^n$, with the coefficients $A_k$ depending on $c_0,c_1,\ldots,c_{n-1}$.

  c) Show that the polynomial has zeroes at $x=c_0,c_1,\ldots,c_{n-1}$, so it can be represented as $A_n(x-c_0)(x-c_1)\ldots(x-c_{n-1})$, with $A_n$ as above.

  d) Assuming that the formula for the Vandermonde determinant is true for $n-1$, compute $A_n$ and prove the formula for $n$.

3.5.7.

How many multiplications are needed to compute the determinant of an $n\times n$ matrix using the cofactor expansion? Prove the formula.

3.6. Minors and rank.

For a matrix $A$ let us consider its $k\times k$ submatrices, obtained by taking $k$ rows and $k$ columns. The determinant of such a submatrix is called a minor of order $k$. Note that an $m\times n$ matrix has $\binom{m}{k}\cdot\binom{n}{k}$ different $k\times k$ submatrices, and so it has $\binom{m}{k}\cdot\binom{n}{k}$ minors of order $k$.

Theorem 3.6.1.

For a non-zero matrix $A$ its rank equals the maximal integer $k$ such that there exists a non-zero minor of order $k$.

Proof.

Let us first show that if $k > \operatorname{rank}A$, then all minors of order $k$ are 0. Indeed, since the dimension of the column space $\operatorname{Ran}A$ is $\operatorname{rank}A < k$, any $k$ columns of $A$ are linearly dependent. Therefore, for any $k\times k$ submatrix of $A$ its columns are linearly dependent, and so all minors of order $k$ are 0.

To complete the proof we need to show that there exists a non-zero minor of order $k = \operatorname{rank}A$. There can be many such minors, but probably the easiest way to get such a minor is to take the pivot rows and pivot columns (i.e. the rows and columns of the original matrix containing a pivot). This $k\times k$ submatrix has the same pivots as the original matrix, so it is invertible (a pivot in every column and every row) and its determinant is non-zero. ∎

This theorem does not look very useful, because it is much easier to perform row reduction than to compute all minors. However, it is of great theoretical importance, as the following corollary shows.
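Purely to illustrate the theorem (row reduction is far more efficient, as just noted), here is a brute-force Python sketch that searches for a non-zero minor of the largest order; the function name and the numerical tolerance are our choices.

```python
from itertools import combinations
import numpy as np

def rank_by_minors(A):
    """Rank as the largest order of a non-zero minor (Theorem 3.6.1).

    Checks all k-by-k submatrices, so it is hopelessly slow compared to
    row reduction; a sketch purely to illustrate the theorem.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > 1e-9:
                    return k
    return 0

A = [[1, 2, 3], [2, 4, 6], [1, 1, 1]]   # the second row is twice the first
print(rank_by_minors(A), np.linalg.matrix_rank(A))  # both should give 2
```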

Corollary 3.6.2.

Let $A = A(x)$ be an $m\times n$ polynomial matrix (i.e. a matrix whose entries are polynomials of $x$). Then $\operatorname{rank}A(x)$ is constant everywhere, except maybe at finitely many points, where the rank is smaller.

Proof.

Let $r$ be the largest integer such that $\operatorname{rank}A(x) = r$ for some $x$. To show that such an $r$ exists, we first try $r = \min\{m,n\}$. If there exists $x$ such that $\operatorname{rank}A(x) = r$, we have found $r$. If not, we replace $r$ by $r-1$ and try again. After finitely many steps we either stop or hit 0. So, $r$ exists.

Let $x_0$ be a point such that $\operatorname{rank}A(x_0) = r$, and let $M$ be a minor of order $r$ such that $M(x_0)\neq 0$. Since $M(x)$ is the determinant of an $r\times r$ polynomial matrix, $M(x)$ is a polynomial. Since $M(x_0)\neq 0$, it is not identically zero, so it can be zero only at finitely many points. So, everywhere except maybe at finitely many points, $\operatorname{rank}A(x)\geq r$. But by the definition of $r$, $\operatorname{rank}A(x)\leq r$ for all $x$. ∎

3.7. Review exercises for Chapter 3.

3.7.1.

True or false:

  a) Determinant is only defined for square matrices.

  b) If two rows or columns of $A$ are identical, then $\det A = 0$.

  c) If $B$ is the matrix obtained from $A$ by interchanging two rows (or columns), then $\det B = \det A$.

  d) If $B$ is the matrix obtained from $A$ by multiplying a row (column) of $A$ by a scalar $\alpha$, then $\det B = \det A$.

  e) If $B$ is the matrix obtained from $A$ by adding a multiple of a row to some other row, then $\det B = \det A$.

  f) The determinant of a triangular matrix is the product of its diagonal entries.

  g) $\det(A^T) = -\det(A)$.

  h) $\det(AB) = \det(A)\det(B)$.

  i) A matrix $A$ is invertible if and only if $\det A \neq 0$.

  j) If $A$ is an invertible matrix, then $\det(A^{-1}) = 1/\det(A)$.

3.7.2.

Let $A$ be an $n\times n$ matrix. How are $\det(3A)$, $\det(-A)$ and $\det(A^2)$ related to $\det A$?

3.7.3.

If the entries of both $A$ and $A^{-1}$ are integers, is it possible that $\det A = 3$? Hint: what is $\det(A)\det(A^{-1})$?

3.7.4.

Let $\mathbf{v}_1,\mathbf{v}_2$ be vectors in $\mathbb{R}^2$ and let $A$ be the $2\times 2$ matrix with columns $\mathbf{v}_1,\mathbf{v}_2$. Prove that $|\det A|$ is the area of the parallelogram with two sides given by the vectors $\mathbf{v}_1,\mathbf{v}_2$.

Consider first the case when $\mathbf{v}_1 = (x_1,0)^T$. To treat the general case $\mathbf{v}_1 = (x_1,y_1)^T$, left multiply $A$ by a rotation matrix that transforms the vector $\mathbf{v}_1$ into $(\widetilde{x}_1,0)^T$. Hint: what is the determinant of a rotation matrix?

The following problem illustrates the relation between the sign of the determinant and the so-called orientation of a system of vectors.

3.7.5.

Let $\mathbf{v}_1$, $\mathbf{v}_2$ be vectors in $\mathbb{R}^2$. Show that $D(\mathbf{v}_1,\mathbf{v}_2) > 0$ if and only if there exists a rotation $T_\alpha$ such that the vector $T_\alpha\mathbf{v}_1$ is parallel to $\mathbf{e}_1$ (and pointing in the same direction), and $T_\alpha\mathbf{v}_2$ is in the upper half-plane $x_2>0$ (the same half-plane as $\mathbf{e}_2$).

Hint: What is the determinant of a rotation matrix?