Chapter 3 Determinants

3.1. Introduction.

The reader has probably already met determinants in calculus or algebra, at least the determinants of $2\times 2$ and $3\times 3$ matrices. For a $2\times 2$ matrix

\[
\left(\begin{array}{cc} a & b\\ c & d\end{array}\right)
\]

the determinant is simply $ad-bc$; the determinant of a $3\times 3$ matrix can be found by the “Star of David” rule.
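To make the small cases concrete, here is a minimal Python sketch of both formulas (the function names are ours); `det3` spells out the six products that the “Star of David” picture encodes.

```python
def det2(a, b, c, d):
    # Determinant of the 2x2 matrix [[a, b], [c, d]].
    return a * d - b * c

def det3(a, b, c, d, e, f, g, h, i):
    # "Star of David" rule for [[a, b, c], [d, e, f], [g, h, i]]:
    # three diagonal products minus three anti-diagonal products.
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

print(det2(1, 2, 3, 4))                   # 1*4 - 2*3 = -2
print(det3(1, 2, 0, 1, 1, 5, 1, -3, 0))   # 25
```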

In this chapter we would like to introduce determinants for $n\times n$ matrices. I don’t want to just give a formal definition. First I want to give some motivation, and then derive some properties the determinant should have. Then, if we want to have these properties, we do not have any choice, and we arrive at several equivalent definitions of the determinant.

It is more convenient to start not with the determinant of a matrix, but with the determinant of a system of vectors. There is no real difference here, since we can always join the vectors together (say as columns) to form a matrix.

Let us have $n$ vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ in $\mathbb{R}^n$ (notice that the number of vectors coincides with the dimension), and we want to find the $n$-dimensional volume of the parallelepiped determined by these vectors.

The parallelepiped determined by the vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ can be defined as the collection of all vectors $\mathbf{v}\in\mathbb{R}^n$ that can be represented as

\[
\mathbf{v} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + \ldots + t_n\mathbf{v}_n, \qquad 0\leq t_k\leq 1 \quad \forall k=1,2,\ldots,n.
\]

It can be easily visualized when $n=2$ (parallelogram) and $n=3$ (parallelepiped). So, what is the $n$-dimensional volume?

If $n=2$ it is the area; if $n=3$ it is indeed the volume. In dimension $1$ it is just the length.

Finally, let us introduce some notation. For a system of vectors (columns) $\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n$ we will denote its determinant (which we are going to construct) as $D(\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n)$. If we join these vectors in a matrix $A$ (column number $k$ of $A$ is $\mathbf{a}_k$), then we will use the notation $\det A$,

\[
\det A = D(\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n).
\]

Also, for a matrix

\[
A=\left(\begin{array}{cccc} a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right)
\]

its determinant is often denoted by

\[
\left|\begin{array}{cccc} a_{1,1}&a_{1,2}&\ldots&a_{1,n}\\ a_{2,1}&a_{2,2}&\ldots&a_{2,n}\\ \vdots&\vdots&&\vdots\\ a_{n,1}&a_{n,2}&\ldots&a_{n,n}\end{array}\right|.
\]

3.2. What properties the determinant should have.

We know that for dimensions 2 and 3 the “volume” of a parallelepiped is determined by the base times height rule: if we pick one vector, then the height is the distance from this vector to the subspace spanned by the remaining vectors, and the base is the $(n-1)$-dimensional volume of the parallelepiped determined by the remaining vectors.

Now let us generalize this idea to higher dimensions. For the moment we do not care about how exactly to determine the height and the base. We will show that if we assume that the base and the height satisfy some natural properties, then we do not have any choice, and the volume (determinant) is uniquely defined.

3.2.1. Linearity in each argument.

First of all, if we multiply the vector $\mathbf{v}_1$ by a positive number $a$, then the height (i.e. the distance to the linear span $\mathcal{L}(\mathbf{v}_2,\ldots,\mathbf{v}_n)$) is multiplied by $a$. If we admit negative heights (and negative volumes), then this property holds for all scalars $a$, and so the determinant $D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n)$ of the system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ should satisfy

\[
D(\alpha\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = \alpha D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n).
\]

Of course, there is nothing special about the vector $\mathbf{v}_1$, so for any index $k$

(3.2.1)
\[
D(\mathbf{v}_1,\ldots,\underset{k}{\alpha\mathbf{v}_k},\ldots,\mathbf{v}_n) = \alpha D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
\]

To get the next property, let us notice that if we add two vectors, then the “height” of the result should be equal to the sum of the “heights” of the summands, i.e. that

(3.2.2)
\[
D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}_k+\mathbf{v}_k},\ldots,\mathbf{v}_n) = D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}_k},\ldots,\mathbf{v}_n) + D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
\]

In other words, the above two properties say that the determinant of $n$ vectors is linear in each argument (vector), meaning that if we fix $n-1$ vectors and interpret the remaining vector as a variable (argument), we get a linear function.
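For readers who like to experiment, here is a quick numerical sanity check of linearity in the first argument, using `numpy.linalg.det` as a stand-in for the function $D$ we are constructing; the specific vectors and scalars are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
v1, u1, v2, v3 = rng.standard_normal((4, 3))   # four vectors in R^3
a, b = 2.5, -1.5

def D(*cols):
    # Determinant of the matrix whose columns are the given vectors.
    return np.linalg.det(np.column_stack(cols))

# Linearity in the first argument: D(a u + b v, ...) = a D(u, ...) + b D(v, ...)
lhs = D(a * u1 + b * v1, v2, v3)
rhs = a * D(u1, v2, v3) + b * D(v1, v2, v3)
print(np.isclose(lhs, rhs))   # True, up to rounding error
```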

Remark.

We already know that linearity is a very nice property that helps in many situations. So, admitting negative heights (and therefore negative volumes) is a very small price to pay for linearity, since we can always take the absolute value afterwards.

In fact, by admitting negative heights, we did not sacrifice anything! On the contrary, we even gained something, because the sign of the determinant contains some information about the system of vectors (its orientation).

3.2.2. Determinant is 0 if two vectors coincide

If $\mathbf{v}_j = \mathbf{v}_k$ for some $j\neq k$, then the vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ are linearly dependent, so

\[
\dim\operatorname{span}\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n\} < n.
\]

But then the volume must be 0 (for example, because the “height”, i.e. the distance from $\mathbf{v}_j$ to $\operatorname{span}\{\mathbf{v}_r : r\neq j\}$, is 0). Thus, it is natural to assume that

(3.2.3)
\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = 0 \qquad \text{if } \mathbf{v}_j = \mathbf{v}_k \quad \text{for some } j\neq k.
\]

In what follows, this will be the property of the determinant that we will use.

Remark.

A stronger property, namely that $D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = 0$ for any linearly dependent system of vectors $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$, is also quite natural.

However, this property easily follows from the weaker assumption (3.2.3) and the linearity (3.2.1), (3.2.2); see Proposition 3.3.1 for details.

3.2.3. Preservation under “column replacement”

The next property also seems natural. Namely, if we take a vector, say $\mathbf{v}_j$, and add to it a multiple of another vector $\mathbf{v}_k$, the “height” does not change, so

(3.2.4)
\[
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j+\alpha\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) = D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n).
\]

In other words, if we apply the column operation of the third type, the determinant does not change.

However, this property is not an independent one; it is an easy corollary of (3.2.3) and the linearity (3.2.1), (3.2.2). Namely,

\begin{align*}
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j+\alpha\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) + \alpha D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n);
\end{align*}

the first equality here follows from the linearity (3.2.1), (3.2.2), and the second one is just (3.2.3).

3.2.4. Antisymmetry.

The next property the determinant should have is that if we interchange two vectors, the determinant changes sign:

(3.2.5)
\[
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_j},\ldots,\mathbf{v}_n) = -D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n).
\]
Remark.

Functions of several variables that change sign when one interchanges any two arguments are called antisymmetric.

At first sight this property does not look natural, but it can be deduced from the previous ones. Namely, applying property (3.2.4) three times and then using (3.2.1), we get

\begin{align*}
D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j},\ldots,\underset{k}{\mathbf{v}_k-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_j+(\mathbf{v}_k-\mathbf{v}_j)},\ldots,\underset{k}{\mathbf{v}_k-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_k-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{(\mathbf{v}_k-\mathbf{v}_j)-\mathbf{v}_k},\ldots,\mathbf{v}_n)\\
&= D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{-\mathbf{v}_j},\ldots,\mathbf{v}_n)\\
&= -D(\mathbf{v}_1,\ldots,\underset{j}{\mathbf{v}_k},\ldots,\underset{k}{\mathbf{v}_j},\ldots,\mathbf{v}_n).
\end{align*}

Recall that property (3.2.4) follows from (3.2.3) and the linearity (3.2.1), (3.2.2), so the antisymmetry (3.2.5) also follows from these properties.

3.2.5. Normalization.

The last property is the easiest one. For the standard basis $\mathbf{e}_1,\mathbf{e}_2,\ldots,\mathbf{e}_n$ in $\mathbb{R}^n$ the corresponding parallelepiped is the $n$-dimensional unit cube, so

(3.2.6)
\[
D(\mathbf{e}_1,\mathbf{e}_2,\ldots,\mathbf{e}_n) = 1.
\]

In matrix notation this can be written as

\[
\det I = 1.
\]

3.3. Constructing the determinant.

The plan of the game is now as follows: using the properties that, as we decided in Section 3.2, the determinant should have, we derive other properties of the determinant, some of them highly non-trivial. We will show how to use these properties to compute the determinant using our old friend, row reduction.

Later, in Section 3.4, we will show that the determinant, i.e. a function with the desired properties, exists and is unique. After all, we have to be sure that the object we are computing and studying exists.

While our initial geometric motivation for the determinant and its properties came from considering vectors in the real vector space $\mathbb{R}^n$, so they relate only to matrices with real entries, all the constructions below use only algebraic operations (addition, multiplication, division) and are applicable to matrices with complex entries, and even with entries in an arbitrary field.

So in what follows we are constructing the determinant not just for real matrices, but for complex matrices as well (and also for matrices with entries in an arbitrary field). The nice geometric motivation for the properties works only in the real case, but after we have decided on the properties of the determinant (see properties 1–3 below), everything works in the general case.

3.3.1. Basic properties.

We will use the following basic properties of the determinant:

  1. Determinant is linear in each column, i.e. in vector notation for every index $k$

     \[
     D(\mathbf{v}_1,\ldots,\underset{k}{\alpha\mathbf{u}_k+\beta\mathbf{v}_k},\ldots,\mathbf{v}_n) = \alpha D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}_k},\ldots,\mathbf{v}_n) + \beta D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n)
     \]

     for all scalars $\alpha$, $\beta$.

  2. Determinant is antisymmetric, i.e. if one interchanges two columns, the determinant changes sign.

  3. Normalization property: $\det I = 1$.

All these properties were discussed above in Section 3.2. The first property is just (3.2.1) and (3.2.2) combined. The second one is (3.2.5), and the last one is the normalization property (3.2.6). Note that we did not include properties (3.2.3) and (3.2.4): they can be deduced from the above three. These three properties completely define the determinant!

3.3.2. Properties of determinant deduced from the basic properties.

In what follows, $\det A = D(\mathbf{a}_1,\mathbf{a}_2,\ldots,\mathbf{a}_n)$ is a function satisfying properties 1–3 from Section 3.3.1 above.

Proposition 3.3.1.

For a square matrix $A$ the following statements hold:

  1. If $A$ has a zero column, then $\det A = 0$.

  2. If $A$ has two equal columns, then $\det A = 0$.

  3. If one column of $A$ is a multiple of another, then $\det A = 0$.

  4. If the columns of $A$ are linearly dependent, i.e. if the matrix is not invertible, then $\det A = 0$.

Proof.

Statement 1 follows immediately from linearity. If we multiply the zero column by zero, we do not change the matrix or its determinant. But by property 1 above, we should get 0.

The fact that the determinant is antisymmetric implies statement 2. Indeed, if we interchange two equal columns, we change nothing, so the determinant remains the same. On the other hand, interchanging two columns changes the sign of the determinant, so

\[
\det A = -\det A,
\]

which is possible only if $\det A = 0$.

Statement 3 is an immediate corollary of statement 2 and linearity.

To prove the last statement, let us first suppose that the first vector $\mathbf{v}_1$ is a linear combination of the other vectors,

\[
\mathbf{v}_1 = \alpha_2\mathbf{v}_2 + \alpha_3\mathbf{v}_3 + \ldots + \alpha_n\mathbf{v}_n = \sum_{k=2}^n \alpha_k\mathbf{v}_k.
\]

Then by linearity we have (in vector notation)

\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = D\Big(\sum_{k=2}^n \alpha_k\mathbf{v}_k,\, \mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_n\Big) = \sum_{k=2}^n \alpha_k D(\mathbf{v}_k,\mathbf{v}_2,\mathbf{v}_3,\ldots,\mathbf{v}_n)
\]

and each determinant in the sum is zero because of two equal columns.

Let us now consider the general case, i.e. let us assume that the system $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ is linearly dependent. Then one of the vectors, say $\mathbf{v}_k$, can be represented as a linear combination of the others. Interchanging this vector with $\mathbf{v}_1$ we arrive at the situation we just treated, so

\[
D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) = -D(\mathbf{v}_k,\ldots,\underset{k}{\mathbf{v}_1},\ldots,\mathbf{v}_n) = -0 = 0,
\]

so the determinant in this case is also 0. ∎

The next proposition generalizes property (3.2.4). As we have already said above, this property can be deduced from the three “basic” properties of the determinant that we are using in this section.

Proposition 3.3.2.

The determinant does not change if we add to a column a linear combination of the other columns (leaving the other columns intact). In particular, the determinant is preserved under “column replacement” (the column operation of the third type).

Note that adding to a column a multiple of itself is prohibited here. We can only add multiples of the other columns.

Proof of Proposition 3.3.2.

Fix a vector $\mathbf{v}_k$, and let $\mathbf{u}$ be a linear combination of the other vectors,

\[
\mathbf{u} = \sum_{j\neq k}\alpha_j\mathbf{v}_j.
\]

Then by linearity

\[
D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k+\mathbf{u}},\ldots,\mathbf{v}_n) = D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{v}_k},\ldots,\mathbf{v}_n) + D(\mathbf{v}_1,\ldots,\underset{k}{\mathbf{u}},\ldots,\mathbf{v}_n),
\]

and by Proposition 3.3.1 the last term is zero. ∎

3.3.3. Determinants of diagonal and triangular matrices.

Now we are ready to compute the determinant for some important special classes of matrices. The first class is the so-called diagonal matrices. Let us recall that a square matrix $A = \{a_{j,k}\}_{j,k=1}^n$ is called diagonal if all entries off the main diagonal are zero, i.e. if $a_{j,k} = 0$ for all $j\neq k$. We will often use the notation $\operatorname{diag}\{a_1,a_2,\ldots,a_n\}$ for the diagonal matrix

\[
\left(\begin{array}{cccc} a_1&0&\ldots&0\\ 0&a_2&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\ldots&a_n\end{array}\right).
\]

Since a diagonal matrix $\operatorname{diag}\{a_1,a_2,\ldots,a_n\}$ can be obtained from the identity matrix $I$ by multiplying column number $k$ by $a_k$ for each $k$, linearity in each column implies the following:

The determinant of a diagonal matrix equals the product of its diagonal entries,
\[
\det(\operatorname{diag}\{a_1,a_2,\ldots,a_n\}) = a_1 a_2 \ldots a_n.
\]

The next important class is the class of so-called triangular matrices. A square matrix $A = \{a_{j,k}\}_{j,k=1}^n$ is called upper triangular if all entries below the main diagonal are 0, i.e. if $a_{j,k} = 0$ for all $k<j$. A square matrix is called lower triangular if all entries above the main diagonal are 0, i.e. if $a_{j,k} = 0$ for all $j<k$. We call a matrix triangular if it is either a lower or an upper triangular matrix.

It is easy to see that

The determinant of a triangular matrix equals the product of its diagonal entries,
\[
\det A = a_{1,1} a_{2,2} \ldots a_{n,n}.
\]

Indeed, if a triangular matrix has a zero on the main diagonal, it is not invertible (this can easily be checked by column operations), and therefore both sides equal zero. If all diagonal entries are non-zero, then using column replacement (column operations of the third type) one can transform the matrix into a diagonal one with the same diagonal entries: for an upper triangular matrix one should first subtract appropriate multiples of the first column from columns number $2,3,\ldots,n$, “killing” all entries in the first row, then subtract appropriate multiples of the second column from columns number $3,\ldots,n$, and so on.

To treat the case of lower triangular matrices one has to do the “column reduction” from right to left, i.e. first subtract appropriate multiples of the last column from columns number $n-1,\ldots,2,1$, and so on.

3.3.4. Computing the determinant.

Now we know how to compute determinants using their properties: one just needs to do column reduction (i.e. row reduction for $A^T$), keeping track of the column operations that change the determinant. Fortunately, the most often used operation, replacement (the operation of the third type), does not change the determinant. So we only need to keep track of interchanges of columns and of multiplications of a column by a scalar.

If an echelon form of $A^T$ does not have pivots in every column (and row), then $A$ is not invertible, so $\det A = 0$. If $A$ is invertible, we arrive at a triangular matrix, and $\det A$ is the product of its diagonal entries times the correction from the column interchanges and multiplications.
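As an illustration, here is a minimal Python sketch of this algorithm; it performs the column reduction of $A$ as row reduction of $A^T$, exactly as described above, and the function name is ours.

```python
import numpy as np

def det_by_reduction(A):
    """det A via column reduction, implemented as row reduction of A^T.

    Replacement operations (type 3) do not change the determinant;
    each interchange flips its sign.  A minimal sketch, not tuned for
    numerical robustness.
    """
    U = np.array(A, dtype=float).T     # row reduction of A^T = column reduction of A
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        p = k + np.argmax(np.abs(U[k:, k]))   # pick the largest pivot candidate
        if U[p, k] == 0.0:
            return 0.0                        # no pivot: A is not invertible
        if p != k:
            U[[k, p]] = U[[p, k]]             # interchange: determinant changes sign
            sign = -sign
        U[k+1:] -= np.outer(U[k+1:, k] / U[k, k], U[k])  # replacement operations
    return sign * np.prod(np.diag(U))         # triangular: product of diagonal entries

A = [[1, 0, -2, 3], [-3, 1, 1, 2], [0, 4, -1, 1], [2, 3, 0, 1]]
print(det_by_reduction(A), np.linalg.det(A))  # the two values should agree
```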

The above algorithm implies that $\det A$ can be zero only if the matrix $A$ is not invertible. Combining this with the last statement of Proposition 3.3.1, we get

Proposition 3.3.3.

$\det A = 0$ if and only if $A$ is not invertible. An equivalent statement: $\det A \neq 0$ if and only if $A$ is invertible.

Note that although we now know how to compute determinants, the determinant is still not defined. One can ask: why don’t we define it as the result we get from the above algorithm? The problem is that formally this result is not well defined: we did not prove that different sequences of column operations yield the same answer.

3.3.5. Determinants of a transpose and of a product. Determinants of elementary matrices.

In this section we prove two important theorems.

Theorem 3.3.4 (Determinant of a transpose).

For a square matrix $A$,

\[
\det A = \det(A^T).
\]

This theorem implies that for all statements about columns that we discussed above, the corresponding statements about rows are also true. In particular, determinants behave under row operations the same way they behave under column operations. So, we can use row operations to compute determinants.

Theorem 3.3.5 (Determinant of a product).

For $n\times n$ matrices $A$ and $B$,

\[
\det(AB) = (\det A)(\det B).
\]

In other words

Determinant of a product equals product of determinants.
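Before turning to the proof, here is a quick numerical illustration of the product rule on random matrices (using `numpy`; the sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 4, 4))   # two random 4x4 matrices
# det(AB) and (det A)(det B) agree up to rounding error
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```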

To prove both theorems we need the following lemma.

Lemma 3.3.6.

For a square matrix $A$ and an elementary matrix $E$ (of the same size),

\[
\det(AE) = (\det A)(\det E).
\]
Proof.

The proof can be done just by direct checking: the determinants of elementary matrices are easy to compute; right multiplication by an elementary matrix is a column operation, and the effect of column operations on the determinant is well known.

It may look like a lucky coincidence that the determinants of elementary matrices agree with the effects of the corresponding column operations, but it is not a coincidence at all.

Namely, for a column operation the corresponding elementary matrix can be obtained from the identity matrix $I$ by this very column operation. So, its determinant is 1 (the determinant of $I$) times the effect of the column operation.

And that is all! It may be hard to realize at first, but the above paragraph is a complete and rigorous proof of the lemma! ∎

Applying Lemma 3.3.6 $N$ times, we get the following corollary.

Corollary 3.3.7.

For any matrix $A$ and any sequence of elementary matrices $E_1, E_2,\ldots,E_N$ (all matrices are $n\times n$),

\[
\det(AE_1E_2\ldots E_N) = (\det A)(\det E_1)(\det E_2)\ldots(\det E_N).
\]
Lemma 3.3.8.

Any invertible matrix is a product of elementary matrices.

Proof.

We know that any invertible matrix is row equivalent to the identity matrix, which is its reduced echelon form. So

\[
I = E_N E_{N-1}\ldots E_2 E_1 A,
\]

and therefore any invertible matrix can be represented as a product of elementary matrices,

\[
A = E_1^{-1}E_2^{-1}\ldots E_{N-1}^{-1}E_N^{-1} I = E_1^{-1}E_2^{-1}\ldots E_{N-1}^{-1}E_N^{-1}
\]

(the inverse of an elementary matrix is an elementary matrix). ∎

Proof of Theorem 3.3.4.

First of all, it can be easily checked that for an elementary matrix $E$ we have $\det E = \det(E^T)$. Notice also that it is sufficient to prove the theorem only for invertible matrices $A$: if $A$ is not invertible, then $A^T$ is also not invertible, and both determinants are zero.

By Lemma 3.3.8 the matrix $A$ can be represented as a product of elementary matrices,

\[
A = E_1 E_2 \ldots E_N,
\]

and by Corollary 3.3.7 the determinant of $A$ is the product of the determinants of the elementary matrices. Since taking the transpose just transposes each elementary matrix and reverses their order, Corollary 3.3.7 implies that $\det A = \det A^T$. ∎

Proof of Theorem 3.3.5.

Let us first suppose that the matrix $B$ is invertible. Then Lemma 3.3.8 implies that $B$ can be represented as a product of elementary matrices,

\[
B = E_1E_2\ldots E_N,
\]

and so by Corollary 3.3.7

\[
\det(AB) = (\det A)\big[(\det E_1)(\det E_2)\ldots(\det E_N)\big] = (\det A)(\det B).
\]

If $B$ is not invertible, then the product $AB$ is also not invertible, and the theorem just says that $0=0$.

To check that the product $AB = C$ is not invertible, let us assume that it is invertible. Then multiplying the identity $AB = C$ by $C^{-1}$ from the left, we get $C^{-1}AB = I$, so $C^{-1}A$ is a left inverse of $B$. So $B$ is left invertible, and since it is square, it is invertible. We have arrived at a contradiction. ∎

3.3.6. Summary of properties of determinant.

First of all, let us say once more that the determinant is defined only for square matrices! Since we now know that $\det A = \det(A^T)$, the statements that we knew about columns are true for rows too.

  1. Determinant is linear in each row (column) when the other rows (columns) are fixed.

  2. If one interchanges two rows (columns) of a matrix $A$, the determinant changes sign.

  3. For a triangular (in particular, for a diagonal) matrix its determinant is the product of the diagonal entries. In particular, $\det I = 1$.

  4. If a matrix $A$ has a zero row (or column), $\det A = 0$.

  5. If a matrix $A$ has two equal rows (columns), $\det A = 0$.

  6. If one of the rows (columns) of $A$ is a linear combination of the other rows (columns), i.e. if the matrix is not invertible, then $\det A = 0$. More generally,

  7. $\det A = 0$ if and only if $A$ is not invertible, or equivalently,

  8. $\det A \neq 0$ if and only if $A$ is invertible.

  9. $\det A$ does not change if we add to a row (column) a linear combination of the other rows (columns). In particular, the determinant is preserved under the row (column) replacement, i.e. under the row (column) operation of the third kind.

  10. $\det A^T = \det A$.

  11. $\det(AB) = (\det A)(\det B)$.

  And finally,

  12. If $A$ is an $n\times n$ matrix, then $\det(aA) = a^n\det A$.

The last property follows from the linearity of the determinant, if we recall that to multiply a matrix $A$ by $a$ we have to multiply each row by $a$, and that each such multiplication multiplies the determinant by $a$.

Exercises.

3.3.1.

If $A$ is an $n\times n$ matrix, how are the determinants $\det A$ and $\det(5A)$ related? Remark: $\det(5A) = 5\det A$ only in the trivial case of $1\times 1$ matrices.

3.3.2.

How are the determinants $\det A$ and $\det B$ related if

  a)
     \[
     A=\left(\begin{array}{ccc} a_1&a_2&a_3\\ b_1&b_2&b_3\\ c_1&c_2&c_3\end{array}\right),\qquad B=\left(\begin{array}{ccc} 2a_1&3a_2&5a_3\\ 2b_1&3b_2&5b_3\\ 2c_1&3c_2&5c_3\end{array}\right);
     \]
  b)
     \[
     A=\left(\begin{array}{ccc} a_1&a_2&a_3\\ b_1&b_2&b_3\\ c_1&c_2&c_3\end{array}\right),\qquad B=\left(\begin{array}{ccc} 3a_1&4a_2+5a_1&5a_3\\ 3b_1&4b_2+5b_1&5b_3\\ 3c_1&4c_2+5c_1&5c_3\end{array}\right).
     \]
3.3.3.

Using column or row operations, compute the determinants

\[
\left|\begin{array}{rrr} 0&1&2\\ -1&0&-3\\ 2&3&0\end{array}\right|,\qquad \left|\begin{array}{rrr} 1&2&3\\ 4&5&6\\ 7&8&9\end{array}\right|,\qquad \left|\begin{array}{rrrr} 1&0&-2&3\\ -3&1&1&2\\ 0&4&-1&1\\ 2&3&0&1\end{array}\right|,\qquad \left|\begin{array}{rr} 1&x\\ 1&y\end{array}\right|.
\]
3.3.4.

A square ($n\times n$) matrix $A$ is called skew-symmetric (or antisymmetric) if $A^T = -A$. Prove that if $A$ is skew-symmetric and $n$ is odd, then $\det A = 0$. Is this true for even $n$?

3.3.5.

A square matrix $A$ is called nilpotent if $A^k = \mathbf{0}$ for some positive integer $k$. Show that for a nilpotent matrix $A$ one has $\det A = 0$.

3.3.6.

Prove that if the matrices $A$ and $B$ are similar, then $\det A = \det B$.

3.3.7.

A real square matrix $Q$ is called orthogonal if $Q^TQ = I$. Prove that if $Q$ is an orthogonal matrix, then $\det Q = \pm 1$.

3.3.8.

Show that

\[
\left|\begin{array}{ccc} 1&x&x^2\\ 1&y&y^2\\ 1&z&z^2\end{array}\right| = (z-x)(z-y)(y-x).
\]

This is a particular case of the so-called Vandermonde determinant.

3.3.9.

Let points $A$, $B$ and $C$ in the plane $\mathbb{R}^2$ have coordinates $(x_1,y_1)$, $(x_2,y_2)$ and $(x_3,y_3)$ respectively. Show that the area of the triangle $ABC$ is the absolute value of

\[
\frac{1}{2}\left|\begin{array}{ccc} 1&x_1&y_1\\ 1&x_2&y_2\\ 1&x_3&y_3\end{array}\right|.
\]

Hint: use row operations and the geometric interpretation of $2\times 2$ determinants (area).

3.3.10.

Let $A$ be a square matrix. Show that the block triangular matrices

\[
\left(\begin{array}{cc} I&*\\ \mathbf{0}&A\end{array}\right),\qquad \left(\begin{array}{cc} A&*\\ \mathbf{0}&I\end{array}\right),\qquad \left(\begin{array}{cc} I&\mathbf{0}\\ *&A\end{array}\right),\qquad \left(\begin{array}{cc} A&\mathbf{0}\\ *&I\end{array}\right)
\]

all have determinant equal to $\det A$. Here $*$ can be anything.

The following problems illustrate the power of block matrix notation.

3.3.11.

Use the previous problem to show that if $A$ and $C$ are square matrices, then

\[
\det\left(\begin{array}{cc} A&B\\ \mathbf{0}&C\end{array}\right) = \det A \det C.
\]

Hint: $\left(\begin{array}{cc} A&B\\ \mathbf{0}&C\end{array}\right) = \left(\begin{array}{cc} I&B\\ \mathbf{0}&C\end{array}\right)\left(\begin{array}{cc} A&\mathbf{0}\\ \mathbf{0}&I\end{array}\right)$.

3.3.12.

Let $A$ be an $m\times n$ and $B$ an $n\times m$ matrix. Prove that

\[
\det\left(\begin{array}{cc} 0&A\\ -B&I\end{array}\right) = \det(AB).
\]

Hint: While it is possible to transform the matrix by row operations to a form where the determinant is easy to compute, the easiest way is to right multiply the matrix by $\left(\begin{array}{cc} I&0\\ B&I\end{array}\right)$.

3.4. Formal definition. Existence and uniqueness of the determinant.

In this section we arrive at the formal definition of the determinant. We show that a function satisfying the basic properties 1, 2, 3 from Section 3.3 exists, and moreover, that such a function is unique, i.e. we do not have any choice in constructing the determinant.

Consider an $n\times n$ matrix $A = \{a_{j,k}\}_{j,k=1}^n$, and let $\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n$ be its columns, i.e.

\[
\mathbf{v}_k = \left(\begin{array}{c} a_{1,k}\\ a_{2,k}\\ \vdots\\ a_{n,k}\end{array}\right) = a_{1,k}\mathbf{e}_1 + a_{2,k}\mathbf{e}_2 + \ldots + a_{n,k}\mathbf{e}_n = \sum_{j=1}^n a_{j,k}\mathbf{e}_j.
\]

Using linearity of the determinant, we expand it in the first column $\mathbf{v}_1$:

(3.4.1)
\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = D\Big(\sum_{j=1}^n a_{j,1}\mathbf{e}_j,\, \mathbf{v}_2,\ldots,\mathbf{v}_n\Big) = \sum_{j=1}^n a_{j,1} D(\mathbf{e}_j,\mathbf{v}_2,\ldots,\mathbf{v}_n).
\]

Then we expand it in the second column, then in the third, and so on. We get

\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = \sum_{j_1=1}^n \sum_{j_2=1}^n \ldots \sum_{j_n=1}^n a_{j_1,1}a_{j_2,2}\ldots a_{j_n,n}\, D(\mathbf{e}_{j_1},\mathbf{e}_{j_2},\ldots,\mathbf{e}_{j_n}).
\]

Notice that we have to use a different index of summation for each column: we call them $j_1, j_2,\ldots,j_n$; the index $j_1$ here is the same as the index $j$ in (3.4.1).

It is a huge sum: it contains $n^n$ terms. Fortunately, some of the terms are zero. Namely, if any two of the indices $j_1,j_2,\ldots,j_n$ coincide, the determinant $D(\mathbf{e}_{j_1},\mathbf{e}_{j_2},\ldots,\mathbf{e}_{j_n})$ is zero, because there are two equal columns here.

So, let us rewrite the sum, omitting all zero terms. The most convenient way to do that is using the notion of a permutation. Informally, a permutation of an ordered set $\{1,2,\ldots,n\}$ is a rearrangement of its elements. A convenient formal way to represent such a rearrangement is by using a function

\[
\sigma:\{1,2,\ldots,n\}\to\{1,2,\ldots,n\},
\]

where $\sigma(1),\sigma(2),\ldots,\sigma(n)$ gives the new order of the set $1,2,\ldots,n$. In other words, the permutation $\sigma$ rearranges the ordered set $1,2,\ldots,n$ into $\sigma(1),\sigma(2),\ldots,\sigma(n)$.

Such a function $\sigma$ has to be one-to-one (different values for different arguments) and onto (it assumes all possible values from the target space). Functions which are one-to-one and onto are called bijections, and they give a one-to-one correspondence between the domain and the target space.

Footnote. There is another canonical way to represent a permutation by a bijection $\sigma$, namely one in which $\sigma(k)$ gives the new position of the element number $k$. In this representation $\sigma$ rearranges $\sigma(1),\sigma(2),\ldots,\sigma(n)$ into $1,2,\ldots,n$. While in the first representation it is easy to write down the function if you know the rearrangement of the set $1,2,\ldots,n$, the second one is better adapted to the composition of permutations: it coincides with the composition of functions. Namely, if we first perform the permutation that corresponds to a function $\sigma$ and then the one that corresponds to $\tau$, the resulting permutation will correspond to $\tau\circ\sigma$.

Although it is not directly relevant here, let us notice that it is well known in combinatorics that the number of different permutations of the set $\{1,2,\ldots,n\}$ is exactly $n!$. The set of all permutations of the set $\{1,2,\ldots,n\}$ will be denoted $\operatorname{Perm}(n)$.

Using the notion of a permutation, we can rewrite the determinant as

\[
D(\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n) = \sum_{\sigma\in\operatorname{Perm}(n)} a_{\sigma(1),1}a_{\sigma(2),2}\ldots a_{\sigma(n),n}\, D(\mathbf{e}_{\sigma(1)},\mathbf{e}_{\sigma(2)},\ldots,\mathbf{e}_{\sigma(n)}).
\]

The matrix with columns $\mathbf{e}_{\sigma(1)},\mathbf{e}_{\sigma(2)},\ldots,\mathbf{e}_{\sigma(n)}$ can be obtained from the identity matrix by finitely many column interchanges, so the determinant

\[
D(\mathbf{e}_{\sigma(1)},\mathbf{e}_{\sigma(2)},\ldots,\mathbf{e}_{\sigma(n)})
\]

is $1$ or $-1$ depending on the number of column interchanges.

To formalize that, we (informally) define the sign (denoted $\operatorname{sign}\sigma$) of a permutation $\sigma$ to be $1$ if an even number of interchanges is necessary to rearrange the $n$-tuple $1,2,\ldots,n$ into $\sigma(1),\sigma(2),\ldots,\sigma(n)$, and $\operatorname{sign}(\sigma) = -1$ if the number of interchanges is odd.

It is a well-known fact from combinatorics that the sign of a permutation is well defined, i.e. that although there are infinitely many ways to get the $n$-tuple $\sigma(1),\sigma(2),\ldots,\sigma(n)$ from $1,2,\ldots,n$, the number of interchanges is either always odd or always even.

One of the ways to show that is to introduce an alternative definition. Let $K = K(\sigma)$ be the number of disorders of $\sigma$, i.e. the number of pairs $(j,k)$, $j,k\in\{1,2,\ldots,n\}$, $j<k$, such that $\sigma(j)>\sigma(k)$, and see if this number is even or odd. We call the permutation $\sigma$ odd if $K$ is odd and even if $K$ is even. Then define $\operatorname{sign}\sigma := (-1)^{K(\sigma)}$; note that this way $\operatorname{sign}\sigma$ is well defined.

We want to show that $\operatorname{sign}\sigma = (-1)^{K(\sigma)}$ can indeed be computed by rearranging the $n$-tuple $1,2,\ldots,n$ into $\sigma(1),\sigma(2),\ldots,\sigma(n)$ and counting the number of interchanges, as was described above.

If $\sigma(k)=k$ for all $k$, then the number of disorders $K(\sigma)$ is 0, so the sign of this identity permutation is $1$. Note also that any elementary transpose, which interchanges two neighbors, changes the sign of a permutation, because it changes (increases or decreases) the number of disorders by exactly 1. So, to get from one permutation to another, one always needs an even number of elementary transposes if the permutations have the same sign, and an odd number if the signs are different.

Finally, any interchange of two entries can be achieved by an odd number of elementary transposes. This implies that the sign changes under an interchange of two entries. So, to get from $1,2,\ldots,n$ to an even permutation (positive sign) one always needs an even number of interchanges, and an odd number of interchanges is needed to get an odd permutation (negative sign).
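The counting of disorders translates directly into code. Here is a minimal Python sketch (the function name is ours) computing $\operatorname{sign}\sigma$ from the definition above:

```python
def sign(sigma):
    """Sign of a permutation given as the tuple (sigma(1), ..., sigma(n)).

    Counts the disorders: pairs j < k with sigma(j) > sigma(k); an O(n^2)
    sketch that follows the definition literally.
    """
    n = len(sigma)
    disorders = sum(1 for j in range(n) for k in range(j + 1, n)
                    if sigma[j] > sigma[k])
    return (-1) ** disorders

print(sign((1, 2, 3)))        #  1: the identity has no disorders
print(sign((2, 1, 3)))        # -1: one elementary transpose
print(sign((5, 4, 1, 2, 3)))  # the permutation from Exercise 3.4.1 below
```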

So, if we want the determinant to satisfy the basic properties 1–3 from Section 3.3, we must define it as

(3.4.2)
\[
\det A = \sum_{\sigma\in\operatorname{Perm}(n)} a_{\sigma(1),1}a_{\sigma(2),2}\ldots a_{\sigma(n),n}\operatorname{sign}(\sigma),
\]

where the sum is taken over all permutations of the set $\{1,2,\ldots,n\}$.

If we define the determinant this way, it is easy to check that it satisfies the basic properties 1–3 from Section 3.3. Indeed, it is linear in each column, because for each column every term (product) in the sum contains exactly one entry from this column.

Interchanging two columns of $A$ just adds an extra interchange to each permutation, so the right side in (3.4.2) changes sign. Finally, for the identity matrix $I$, the right side of (3.4.2) is $1$ (it has one non-zero term).
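To make formula (3.4.2) concrete, here is a minimal Python sketch that evaluates it literally, summing over all $n!$ permutations; the function name is ours, and this is an illustration rather than a practical algorithm.

```python
from itertools import permutations

def det_leibniz(A):
    """Determinant via the sum over permutations, formula (3.4.2).

    The sum has n! terms, so this is an illustration for tiny n,
    not an efficient algorithm.
    """
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        # sign(sigma) = (-1)^{number of disorders}
        disorders = sum(1 for j in range(n) for k in range(j + 1, n)
                        if sigma[j] > sigma[k])
        term = 1
        for col in range(n):
            term *= A[sigma[col]][col]          # the entry a_{sigma(k), k}
        total += term if disorders % 2 == 0 else -term
    return total

print(det_leibniz([[1, 2], [3, 4]]))  # ad - bc = 1*4 - 2*3 = -2
```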

Exercises.

3.4.1.

Suppose the permutation $\sigma$ takes $(1,2,3,4,5)$ to $(5,4,1,2,3)$.

  a) Find the sign of $\sigma$;

  b) What does $\sigma^2 := \sigma\circ\sigma$ do to $(1,2,3,4,5)$?

  c) What does the inverse permutation $\sigma^{-1}$ do to $(1,2,3,4,5)$?

  d) What is the sign of $\sigma^{-1}$?

3.4.2.

Let $P$ be a permutation matrix, i.e. an $n\times n$ matrix consisting of zeroes and ones and such that there is exactly one 1 in every row and every column.

  a) Can you describe the corresponding linear transformation? That will explain the name.

  b) Show that $P$ is invertible. Can you describe $P^{-1}$?

  c) Show that for some $N>0$

     \[
     P^N := \underbrace{PP\ldots P}_{N\text{ times}} = I.
     \]

     Use the fact that there are only finitely many permutations.

3.4.3.

Why is there an even number of permutations of $(1,2,\ldots,9)$, and why are exactly half of them odd permutations? Hint: This problem can be hard to solve in terms of permutations, but there is a very simple solution using determinants.

3.4.4.

If $\sigma$ is an odd permutation, explain why $\sigma^2$ is even but $\sigma^{-1}$ is odd.

3.4.5.

How many multiplications and additions are required to compute the determinant of an $n\times n$ matrix using the formal definition (3.4.2) of the determinant? Do not count the operations needed to compute $\operatorname{sign}\sigma$.

3.5. Cofactor expansion.

For an $n\times n$ matrix $A = \{a_{j,k}\}_{j,k=1}^n$ let $A_{j,k}$ denote the $(n-1)\times(n-1)$ matrix obtained from $A$ by crossing out row number $j$ and column number $k$.

Theorem 3.5.1 (Cofactor expansion of determinant).

Let $A$ be an $n\times n$ matrix. For each $j$, $1\leq j\leq n$, the determinant of $A$ can be expanded in row number $j$ as

\[
\det A = a_{j,1}(-1)^{j+1}\det A_{j,1} + a_{j,2}(-1)^{j+2}\det A_{j,2} + \ldots + a_{j,n}(-1)^{j+n}\det A_{j,n} = \sum_{k=1}^n a_{j,k}(-1)^{j+k}\det A_{j,k}.
\]

Similarly, for each $k$, $1\leq k\leq n$, the determinant can be expanded in column number $k$,

\[
\det A = \sum_{j=1}^n a_{j,k}(-1)^{j+k}\det A_{j,k}.
\]
Proof.

Let us first prove the formula for the expansion in row number 1. The formula for the expansion in row number 2 can then be obtained from it by interchanging rows number 1 and 2. Interchanging then rows number 2 and 3, we get the formula for the expansion in row number 3, and so on.

Since $\det A = \det A^T$, the column expansion follows automatically.

Let us first consider the special case when the first row has only one non-zero term $a_{1,1}$. Performing column operations on columns $2,3,\ldots,n$, we transform $A$ to lower triangular form. The determinant of $A$ can then be computed as

the product of the diagonal entries of the triangular matrix, times the correcting factor from the column operations.

But the product of all diagonal entries except the first one (i.e. without $a_{1,1}$), times the correcting factor, is exactly $\det A_{1,1}$, so in this particular case $\det A = a_{1,1}\det A_{1,1}$.

Let us now consider the case when all entries in the first row except $a_{1,2}$ are zeroes. This case can be reduced to the previous one by interchanging columns number 1 and 2, and therefore in this case $\det A = (-1)a_{1,2}\det A_{1,2}$.

The case when $a_{1,3}$ is the only non-zero entry in the first row can be reduced to the previous one by interchanging columns 2 and 3, so in this case $\det A = a_{1,3}\det A_{1,3}$.

Repeating this procedure, we get that in the case when $a_{1,k}$ is the only non-zero entry in the first row, $\det A = (-1)^{1+k}a_{1,k}\det A_{1,k}$.

Footnote. In the case when $a_{1,k}$ is the only non-zero entry in the first row, it may be tempting to exchange columns number 1 and number $k$, to reduce the problem to the case $a_{1,1}\neq 0$. However, when we exchange columns 1 and $k$ we change the order of the other columns: if we just cross out column number $k$, then column number 1 will be the first of the remaining columns. But if we exchange columns 1 and $k$, and then cross out column $k$ (which is now the first one), then column 1 will now be column number $k-1$. To avoid the complications of keeping track of the order of the columns, we can, as we did above, exchange columns number $k$ and $k-1$, reducing everything to the situation we treated in the previous step. Such an operation does not change the order of the rest of the columns.

In the general case, linearity of the determinant in each row implies that

\[
\det A = \det A^{(1)} + \det A^{(2)} + \ldots + \det A^{(n)} = \sum_{k=1}^n \det A^{(k)},
\]

where the matrix $A^{(k)}$ is obtained from $A$ by replacing all entries in the first row except $a_{1,k}$ by 0. As we just discussed above,

\[
\det A^{(k)} = (-1)^{1+k}a_{1,k}\det A_{1,k},
\]

so

\[
\det A = \sum_{k=1}^n (-1)^{1+k}a_{1,k}\det A_{1,k}.
\]

To get the cofactor expansion in the second row, we can interchange the first and second rows and apply the above formula. The row exchange changes the sign, so we get

\[
\det A = -\sum_{k=1}^n (-1)^{1+k}a_{2,k}\det A_{2,k} = \sum_{k=1}^n (-1)^{2+k}a_{2,k}\det A_{2,k}.
\]

Exchanging rows 3 and 2 and expanding in the second row, we get the formula

\[
\det A = \sum_{k=1}^n (-1)^{3+k}a_{3,k}\det A_{3,k},
\]

and so on.

To expand the determinant $\det A$ in a column, one needs to apply the row expansion formula to $A^T$. ∎

Definition.

The numbers

\[
C_{j,k} = (-1)^{j+k}\det A_{j,k}
\]

are called cofactors.

Using this notation, the formula for the expansion of the determinant in row number $j$ can be rewritten as

\[
\det A = a_{j,1}C_{j,1} + a_{j,2}C_{j,2} + \ldots + a_{j,n}C_{j,n} = \sum_{k=1}^n a_{j,k}C_{j,k}.
\]

Similarly, the expansion in column number $k$ can be written as

\[
\det A = a_{1,k}C_{1,k} + a_{2,k}C_{2,k} + \ldots + a_{n,k}C_{n,k} = \sum_{j=1}^n a_{j,k}C_{j,k}.
\]
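The row expansion translates directly into a recursive procedure. Here is a minimal Python sketch expanding in the first row (the function name is ours); as the remarks below explain, this is only reasonable for small matrices.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion in the first row (Theorem 3.5.1).

    A recursive sketch; it performs on the order of n! multiplications,
    so it is only reasonable for small matrices.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        # A_{1,k}: cross out row 1 and column k (here 0-based indices).
        minor = [row[:k] + row[k+1:] for row in A[1:]]
        total += A[0][k] * (-1) ** k * det_cofactor(minor)   # a_{1,k} C_{1,k}
    return total

print(det_cofactor([[1, 2, 0], [1, 1, 5], [1, -3, 0]]))  # 25; cf. Exercise 3.5.2 below
```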
Remark.

Very often the cofactor expansion formula is used as the definition of the determinant. It is not difficult to show that the quantity given by this formula satisfies the basic properties of the determinant: the normalization property is trivial, and the proof of antisymmetry is easy. However, the proof of linearity is a bit tedious (although not too difficult).

Remark.

Although it looks very nice, the cofactor expansion formula is not suitable for computing determinants of matrices bigger than $3\times 3$.

As one can count, it requires more than $n!$ multiplications (to be precise, it requires $\sum_{k=1}^{n-1} n!/k!$ multiplications), and $n!$ grows very rapidly. For example, the cofactor expansion of a $20\times 20$ matrix requires more than $20!\approx 2.433\cdot 10^{18}$ multiplications. It would take a computer performing a billion multiplications per second over 77 years to perform $20!$ multiplications; performing the multiplications required for the cofactor expansion of the determinant of a $20\times 20$ matrix would require more than 132 years.

Footnote. The reader can check the numbers using, for example, WolframAlpha.

On the other hand, computing the determinant of an $n\times n$ matrix using row reduction requires $(n^3+2n-3)/3$ multiplications (and about the same number of additions). It would take a computer performing a million operations per second (very slow by today’s standards) only a fraction of a second to compute the determinant of a $100\times 100$ matrix by row reduction.

It can only be practical to apply the cofactor expansion formula in higher dimensions if a row (or a column) has a lot of zero entries.
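One can tabulate both operation counts in a few lines of Python; the two formulas below are the ones quoted above, and the function names are ours.

```python
from math import factorial

def cofactor_mults(n):
    # Multiplications for cofactor expansion: sum_{k=1}^{n-1} n!/k!
    return sum(factorial(n) // factorial(k) for k in range(1, n))

def reduction_mults(n):
    # Multiplications for row reduction: (n^3 + 2n - 3)/3
    return (n**3 + 2*n - 3) // 3

for n in (3, 10, 20):
    print(n, cofactor_mults(n), reduction_mults(n))
# The gap is dramatic already for n = 10, and astronomical for n = 20.
```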

However, the cofactor expansion formula is of great theoretical importance, as the next section shows.

3.5.1. Cofactor formula for the inverse matrix

The matrix $C = \{C_{j,k}\}_{j,k=1}^n$ whose entries are the cofactors of a given matrix $A$ is called the cofactor matrix of $A$.

Theorem 3.5.2.

Let $A$ be an invertible matrix and let $C$ be its cofactor matrix. Then

\[
A^{-1} = \frac{1}{\det A}\, C^T.
\]
Proof.

Let us find the product $AC^T$. Diagonal entry number $j$ is obtained by multiplying the $j$th row of $A$ by the $j$th column of $C^T$ (i.e. the $j$th row of $C$), so

\[
(AC^T)_{j,j} = a_{j,1}C_{j,1} + a_{j,2}C_{j,2} + \ldots + a_{j,n}C_{j,n} = \det A,
\]

by the cofactor expansion formula.

To get the off-diagonal terms, we need to multiply the $k$th row of $A$ by the $j$th column of $C^T$, $j\neq k$, to get

\[
a_{k,1}C_{j,1} + a_{k,2}C_{j,2} + \ldots + a_{k,n}C_{j,n}.
\]

It follows from the cofactor expansion formula (expanding in the $j$th row) that this is the determinant of the matrix obtained from $A$ by replacing row number $j$ by row number $k$ (and leaving all other rows as they were). But rows $j$ and $k$ of this matrix coincide, so the determinant is 0. So, all off-diagonal entries of $AC^T$ are zeroes (and all diagonal ones equal $\det A$), thus

\[
AC^T = (\det A)\, I.
\]

That means that the matrix $\frac{1}{\det A}\, C^T$ is a right inverse of $A$, and since $A$ is square, it is the inverse. ∎
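Here is a minimal Python sketch of the theorem (the function names are ours). It uses exact `Fraction` arithmetic and a small recursive cofactor-expansion determinant, so it is meant for small matrices only.

```python
from fractions import Fraction

def det(A):
    # Determinant by cofactor expansion in the first row (Theorem 3.5.1).
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([r[:k] + r[k+1:] for r in A[1:]])
               for k in range(len(A)))

def cofactor_inverse(A):
    """A^{-1} = (1/det A) C^T, with C the cofactor matrix (Theorem 3.5.2)."""
    n, d = len(A), det(A)
    # Entry (j, k) of the inverse is C_{k,j} / det A (note the transpose):
    # the minor below crosses out row k and column j of A.
    return [[Fraction((-1) ** (j + k) *
                      det([row[:j] + row[j+1:] for r, row in enumerate(A) if r != k]),
                      d)
             for k in range(n)]
            for j in range(n)]

A = [[1, 1, 0], [2, 1, 2], [0, 1, 1]]   # the last matrix of Exercise 3.5.4 below
print(cofactor_inverse(A))              # rows of A^{-1}, as exact fractions
```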

Recalling that for an invertible matrix $A$ the equation $A\mathbf{x}=\mathbf{b}$ has a unique solution

\[
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det A}\, C^T\mathbf{b},
\]

we get the following corollary of the above theorem.

Corollary 3.5.3 (Cramer’s rule).

For an invertible matrix $A$, entry number $k$ of the solution of the equation $A\mathbf{x}=\mathbf{b}$ is given by the formula

\[
x_k = \frac{\det B_k}{\det A},
\]

where the matrix $B_k$ is obtained from $A$ by replacing column number $k$ of $A$ by the vector $\mathbf{b}$.
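And a sketch of Cramer’s rule in the same style, reusing `det` and `Fraction` from the previous sketch; again, an illustration for small systems rather than a practical solver.

```python
def cramer_solve(A, b):
    """Solve A x = b by Cramer's rule (Corollary 3.5.3).

    Each x_k is det(B_k) / det(A), where B_k is A with column k
    replaced by the vector b.
    """
    n, d = len(A), det(A)
    return [Fraction(det([row[:k] + [b[r]] + row[k+1:]
                          for r, row in enumerate(A)]), d)
            for k in range(n)]

print(cramer_solve([[1, 2], [3, 4]], [5, 6]))  # [Fraction(-4, 1), Fraction(9, 2)]
```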

3.5.2. Some applications of the cofactor formula for the inverse.

Example (Inverting $2\times 2$ matrices).

The cofactor formula really shines when one needs to invert a $2\times 2$ matrix

\[
A = \left(\begin{array}{cc} a&b\\ c&d\end{array}\right).
\]

The cofactors are just entries ($1\times 1$ matrices), and the cofactor matrix is

\[
\left(\begin{array}{cc} d&-c\\ -b&a\end{array}\right),
\]

so the inverse matrix $A^{-1}$ is given by the formula

\[
A^{-1} = \frac{1}{\det A}\left(\begin{array}{cc} d&-b\\ -c&a\end{array}\right).
\]

While the cofactor formula for the inverse does not look practical for dimensions higher than 3, it has great theoretical value, as the examples below illustrate.

Example (Matrix with integer inverse).

Suppose that we want to construct a matrix $A$ with integer entries such that its inverse also has integer entries (inverting such a matrix would make a nice homework problem: no messing with fractions). If $\det A = 1$ and the entries of $A$ are integers, the cofactor formula for inverses implies that $A^{-1}$ also has integer entries.

Note that it is easy to construct an integer matrix $A$ with $\det A = 1$: one should start with a triangular matrix with $1$s on the main diagonal, and then apply several row or column replacements (operations of the third type) to make the matrix look generic.

Example (Inverse of a polynomial matrix).

Another example is to consider a polynomial matrix $A(x)$, i.e. a matrix whose entries are not numbers but polynomials $a_{j,k}(x)$ of the variable $x$. If $\det A(x)\equiv 1$, then the inverse matrix $A^{-1}(x)$ is also a polynomial matrix.

If $\det A(x) = p(x)\not\equiv 0$, it follows from the cofactor expansion that $p(x)$ is a polynomial, so $A^{-1}(x)$ has rational entries; moreover, $p(x)$ is a multiple of each denominator.
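This is easy to see in action with `sympy`. The matrix below is our own example of a polynomial matrix with determinant identically 1, built, as in the integer example, from a triangular matrix by a column replacement; its inverse again has polynomial entries.

```python
import sympy as sp

x = sp.symbols('x')
# Start from the triangular matrix [[1, 0], [x, 1]] and add x times
# column 1 to column 2: the determinant stays identically 1.
A = sp.Matrix([[1, x], [x, 1 + x**2]])
print(sp.simplify(A.det()))  # 1
print(A.inv())               # Matrix([[x**2 + 1, -x], [-x, 1]]): polynomial entries
```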

Exercises.

3.5.1.

Evaluate the determinants using any method:

\[
\left|\begin{array}{rrr} 0&1&1\\ 1&2&-5\\ 6&-4&3\end{array}\right|,\qquad \left|\begin{array}{rrrr} 1&-2&3&-12\\ -5&12&-14&19\\ -9&22&-20&31\\ -4&9&-14&15\end{array}\right|.
\]
3.5.2.

Use row (column) expansion to evaluate the determinants. Note that you don’t need to use the first row (column): picking a row (column) with many zeroes will simplify your calculations.

\[
\left|\begin{array}{rrr} 1&2&0\\ 1&1&5\\ 1&-3&0\end{array}\right|,\qquad \left|\begin{array}{rrrr} 4&-6&-4&4\\ 2&1&0&0\\ 0&-3&1&3\\ -2&2&-3&-5\end{array}\right|.
\]
3.5.3.

For the n×nn\times n matrix

A=(0000a01000a10100a20000an20001an1)A=\left(\begin{array}[]{ccccccccccc}0&0&0&\ldots&0&a_{0}\\ -1&0&0&\ldots&0&a_{1}\\ 0&-1&0&\ldots&0&a_{2}\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\ldots&0&a_{n-2}\\ 0&0&0&\ldots&-1&a_{n-1}\\ \end{array}\right)

compute det(A+tI)\det(A+tI), where II is n×nn\times n identity matrix. You should get a nice expression involving a0,a1,,an1a_{0},a_{1},\ldots,a_{n-1} and tt. Row expansion and induction is probably the best way to go.

3.5.4.

Using the cofactor formula, compute the inverses of the matrices

\[
\left(\begin{array}{cc} 1&2\\ 3&4\end{array}\right),\qquad \left(\begin{array}{cc} 19&-17\\ 3&-2\end{array}\right),\qquad \left(\begin{array}{cc} 1&0\\ 3&5\end{array}\right),\qquad \left(\begin{array}{ccc} 1&1&0\\ 2&1&2\\ 0&1&1\end{array}\right).
\]
3.5.5.

Let $D_n$ be the determinant of the $n\times n$ tridiagonal matrix

\[
\left(\begin{array}{ccccc} 1&-1&&&0\\ 1&1&-1&&\\ &1&\ddots&\ddots&\\ &&\ddots&1&-1\\ 0&&&1&1\end{array}\right).
\]

Using cofactor expansion, show that $D_n = D_{n-1} + D_{n-2}$. This yields that the sequence $D_n$ is the Fibonacci sequence $1,2,3,5,8,13,21,\ldots$

3.5.6.

Vandermonde determinant revisited. Our goal is to prove the formula

\[
\left|\begin{array}{ccccc} 1&c_0&c_0^2&\ldots&c_0^n\\ 1&c_1&c_1^2&\ldots&c_1^n\\ \vdots&\vdots&\vdots&&\vdots\\ 1&c_n&c_n^2&\ldots&c_n^n\end{array}\right| = \prod_{0\leq j<k\leq n}(c_k-c_j)
\]

for the $(n+1)\times(n+1)$ Vandermonde determinant.

We will apply induction. To do this

  a) Check that the formula holds for $n=1$, $n=2$.

  b) Call the variable $c_n$ in the last row $x$, and show that the determinant is a polynomial of degree $n$, $A_0+A_1x+A_2x^2+\ldots+A_nx^n$, with the coefficients $A_k$ depending on $c_0,c_1,\ldots,c_{n-1}$.

  c) Show that the polynomial has zeroes at $x=c_0,c_1,\ldots,c_{n-1}$, so it can be represented as $A_n(x-c_0)(x-c_1)\ldots(x-c_{n-1})$, with $A_n$ as above.

  d) Assuming that the formula for the Vandermonde determinant is true for $n-1$, compute $A_n$ and prove the formula for $n$.

3.5.7.

How many multiplications are needed to compute the determinant of an $n\times n$ matrix using the cofactor expansion? Prove the formula.

3.6. Minors and rank.

For a matrix $A$ let us consider its $k\times k$ submatrices, obtained by taking $k$ rows and $k$ columns. The determinant of such a submatrix is called a minor of order $k$. Note that an $m\times n$ matrix has $\binom{m}{k}\cdot\binom{n}{k}$ different $k\times k$ submatrices, and so it has $\binom{m}{k}\cdot\binom{n}{k}$ minors of order $k$.

Theorem 3.6.1.

For a non-zero matrix $A$ its rank equals the maximal integer $k$ such that there exists a non-zero minor of order $k$.

Proof.

Let us first show that if $k > \operatorname{rank}A$, then all minors of order $k$ are 0. Indeed, since the dimension of the column space $\operatorname{Ran}A$ is $\operatorname{rank}A < k$, any $k$ columns of $A$ are linearly dependent. Therefore, for any $k\times k$ submatrix of $A$ its columns are linearly dependent, and so all minors of order $k$ are 0.

To complete the proof we need to show that there exists a non-zero minor of order $k = \operatorname{rank}A$. There can be many such minors, but probably the easiest way to get such a minor is to take the pivot rows and pivot columns (i.e. the rows and columns of the original matrix containing a pivot). This $k\times k$ submatrix has the same pivots as the original matrix, so it is invertible (a pivot in every column and every row) and its determinant is non-zero. ∎

This theorem does not look very useful, because it is much easier to perform row reduction than to compute all minors. However, it is of great theoretical importance, as the following corollary shows.
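Purely to illustrate the theorem (row reduction is far more efficient, as just noted), here is a brute-force Python sketch that searches for a non-zero minor of the largest order; the function name and the numerical tolerance are our choices.

```python
from itertools import combinations
import numpy as np

def rank_by_minors(A):
    """Rank as the largest order of a non-zero minor (Theorem 3.6.1).

    Checks all k-by-k submatrices, so it is hopelessly slow compared to
    row reduction; a sketch purely to illustrate the theorem.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > 1e-9:
                    return k
    return 0

A = [[1, 2, 3], [2, 4, 6], [1, 1, 1]]   # the second row is twice the first
print(rank_by_minors(A), np.linalg.matrix_rank(A))  # both should give 2
```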

Corollary 3.6.2.

Let $A = A(x)$ be an $m\times n$ polynomial matrix (i.e. a matrix whose entries are polynomials of $x$). Then $\operatorname{rank}A(x)$ is constant everywhere, except maybe at finitely many points, where the rank is smaller.

Proof.

Let $r$ be the largest integer such that $\operatorname{rank}A(x) = r$ for some $x$. To show that such an $r$ exists, we first try $r = \min\{m,n\}$. If there exists $x$ such that $\operatorname{rank}A(x) = r$, we have found $r$. If not, we replace $r$ by $r-1$ and try again. After finitely many steps we either stop or hit 0. So, $r$ exists.

Let $x_0$ be a point such that $\operatorname{rank}A(x_0) = r$, and let $M$ be a minor of order $r$ such that $M(x_0)\neq 0$. Since $M(x)$ is the determinant of an $r\times r$ polynomial matrix, $M(x)$ is a polynomial. Since $M(x_0)\neq 0$, it is not identically zero, so it can be zero only at finitely many points. So, everywhere except maybe at finitely many points, $\operatorname{rank}A(x)\geq r$. But by the definition of $r$, $\operatorname{rank}A(x)\leq r$ for all $x$. ∎

3.7. Review exercises for Chapter 3.

3.7.1.

True or false:

  a) Determinant is only defined for square matrices.

  b) If two rows or columns of $A$ are identical, then $\det A = 0$.

  c) If $B$ is the matrix obtained from $A$ by interchanging two rows (or columns), then $\det B = \det A$.

  d) If $B$ is the matrix obtained from $A$ by multiplying a row (column) of $A$ by a scalar $\alpha$, then $\det B = \det A$.

  e) If $B$ is the matrix obtained from $A$ by adding a multiple of a row to some other row, then $\det B = \det A$.

  f) The determinant of a triangular matrix is the product of its diagonal entries.

  g) $\det(A^T) = -\det(A)$.

  h) $\det(AB) = \det(A)\det(B)$.

  i) A matrix $A$ is invertible if and only if $\det A \neq 0$.

  j) If $A$ is an invertible matrix, then $\det(A^{-1}) = 1/\det(A)$.

3.7.2.

Let $A$ be an $n\times n$ matrix. How are $\det(3A)$, $\det(-A)$ and $\det(A^2)$ related to $\det A$?

3.7.3.

If the entries of both $A$ and $A^{-1}$ are integers, is it possible that $\det A = 3$? Hint: what is $\det(A)\det(A^{-1})$?

3.7.4.

Let $\mathbf{v}_1,\mathbf{v}_2$ be vectors in $\mathbb{R}^2$ and let $A$ be the $2\times 2$ matrix with columns $\mathbf{v}_1,\mathbf{v}_2$. Prove that $|\det A|$ is the area of the parallelogram with two sides given by the vectors $\mathbf{v}_1,\mathbf{v}_2$.

Consider first the case when $\mathbf{v}_1 = (x_1,0)^T$. To treat the general case $\mathbf{v}_1 = (x_1,y_1)^T$, left multiply $A$ by a rotation matrix that transforms the vector $\mathbf{v}_1$ into $(\widetilde{x}_1,0)^T$. Hint: what is the determinant of a rotation matrix?

The following problem illustrates the relation between the sign of the determinant and the so-called orientation of a system of vectors.

3.7.5.

Let $\mathbf{v}_1$, $\mathbf{v}_2$ be vectors in $\mathbb{R}^2$. Show that $D(\mathbf{v}_1,\mathbf{v}_2) > 0$ if and only if there exists a rotation $T_\alpha$ such that the vector $T_\alpha\mathbf{v}_1$ is parallel to $\mathbf{e}_1$ (and pointing in the same direction), and $T_\alpha\mathbf{v}_2$ is in the upper half-plane $x_2>0$ (the same half-plane as $\mathbf{e}_2$).

Hint: What is the determinant of a rotation matrix?