The determinant is one of linear algebra's most deceptively simple quantities: a single number that captures the key geometric features of a linear transformation. Ask how a linear map $A: \mathbb{R}^n \to \mathbb{R}^n$ distorts volume, and the answer is $|\det A|$. Ask whether the map is invertible, and the answer is $\det A \neq 0$. Ask whether the map flips a right-handed coordinate system into a left-handed one, and the sign of $\det A$ tells you. This compression of so much information into one number is not an accident; it reflects an algebraic structure that forces the determinant to be exactly what it is.
But the determinant's economy conceals a risk: the formulas arrive before the meaning does. The $2 \times 2$ formula $ad - bc$ can be memorized in seconds. The cofactor expansion formula can be applied row by row to any square matrix. These procedures are correct and sometimes necessary, but they obscure the *reason* the determinant exists and why it has the properties it does. This chapter builds the determinant from its geometric meaning outward, so that every formula is a consequence of something you already understand.
[example: Failure of Naive Inversion]
Consider the system
\begin{align*}
2x_1 + 4x_2 &= 6 \\
x_1 + 2x_2 &= 3.
\end{align*}
The coefficient matrix is
\begin{align*}
A &= \begin{pmatrix} 2 & 4 \\ 1 & 2 \end{pmatrix}.
\end{align*}
The second equation is exactly half the first, so the two equations describe the same line in $\mathbb{R}^2$. There is a whole family of solutions — no unique answer. Equivalently, the columns $(2, 1)^\top$ and $(4, 2)^\top$ are proportional: the second is twice the first. The map $A: \mathbb{R}^2 \to \mathbb{R}^2$ collapses the entire plane onto a line, losing a whole dimension of information. Computing $\det A = 2 \cdot 2 - 4 \cdot 1 = 4 - 4 = 0$ reveals this collapse immediately. When $\det A = 0$, the matrix is not invertible, and linear systems $Ax = b$ either have no solution or infinitely many.
Now perturb: replace $A$ by
\begin{align*}
B &= \begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix}.
\end{align*}
The columns $(2, 1)^\top$ and $(4, 3)^\top$ are no longer proportional. The map $B$ sends the unit square with vertices $(0,0), (1,0), (0,1), (1,1)$ to a parallelogram with vertices
\begin{align*}
B\begin{pmatrix}0\\0\end{pmatrix} &= \begin{pmatrix}0\\0\end{pmatrix}, \quad
B\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}2\\1\end{pmatrix}, \quad
B\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}4\\3\end{pmatrix}, \quad
B\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}6\\4\end{pmatrix}.
\end{align*}
The area of this parallelogram is $|2 \cdot 3 - 4 \cdot 1| = |6 - 4| = 2$. Observe that the unit square has area $1$, and the map $B$ scales area by exactly $|\det B| = 2$. This is the geometric meaning of the determinant in dimension two: it measures the signed area of the parallelogram spanned by the columns of the matrix.
[/example]
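The two computations in the example can be cross-checked numerically. The following is a minimal sketch, not part of the original text: it maps the unit square through $A$ and $B$ and measures the image with the shoelace formula, which is independent of the $ad - bc$ formula. The helper names `det2`, `apply`, and `shoelace_area` are illustrative choices.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v = (x, y)."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def shoelace_area(points):
    """Unsigned area of a polygon whose vertices are listed in boundary order."""
    total = 0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


A = [[2, 4], [1, 2]]
B = [[2, 4], [1, 3]]

# Unit square, traversed counterclockwise along its boundary.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

image_A = [apply(A, v) for v in square]
image_B = [apply(B, v) for v in square]

print(det2(A), shoelace_area(image_A))  # 0 0.0
print(det2(B), shoelace_area(image_B))  # 2 2.0
```

The image under $A$ is a degenerate polygon with zero area, while the image under $B$ has area $2$, matching $|\det A|$ and $|\det B|$.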
The example reveals two faces of the determinant: an algebraic criterion for invertibility, and a geometric measure of volume distortion. The definition below unifies both faces into a single characterization.
## Definition
What properties should a "signed volume" function on the columns of a matrix satisfy? First, it should be linear in each column separately — scaling a column by $t$ should scale the volume by $t$. Second, it should vanish whenever two columns are equal — a parallelotope with two equal edges is flat and has zero volume. It turns out these two requirements, together with a normalization condition, force a unique function.
[definition: Multilinear Alternating Form]
Let $F$ be a field and $n \geq 1$. A function
\begin{align*}
D: (F^n)^n &\to F
\end{align*}
is called **multilinear** if it is linear in each argument separately: for every $j \in \{1, \ldots, n\}$, every choice of columns $v_1, \ldots, \hat{v}_j, \ldots, v_n \in F^n$ (all columns except the $j$-th fixed), and every $v, w \in F^n$ and $\lambda \in F$,
\begin{align*}
D(v_1, \ldots, v + \lambda w, \ldots, v_n) &= D(v_1, \ldots, v, \ldots, v_n) + \lambda\, D(v_1, \ldots, w, \ldots, v_n).
\end{align*}
The function $D$ is called **alternating** if $D(v_1, \ldots, v_n) = 0$ whenever $v_i = v_j$ for some $i \neq j$.
[/definition]
[remark: Alternating Implies Antisymmetry]
If $D$ is multilinear and alternating, then swapping any two arguments reverses the sign of $D$. To see why: if $D(\ldots, v, \ldots, w, \ldots) = 0$ when $v = w$, then by multilinearity
\begin{align*}
0 &= D(\ldots, v+w, \ldots, v+w, \ldots) \\
&= D(\ldots, v, \ldots, v, \ldots) + D(\ldots, v, \ldots, w, \ldots) + D(\ldots, w, \ldots, v, \ldots) + D(\ldots, w, \ldots, w, \ldots) \\
&= 0 + D(\ldots, v, \ldots, w, \ldots) + D(\ldots, w, \ldots, v, \ldots) + 0,
\end{align*}
which forces $D(\ldots, w, \ldots, v, \ldots) = -D(\ldots, v, \ldots, w, \ldots)$.
[/remark]
The remarkable fact is that the space of multilinear alternating functions on $n$ columns of vectors in $F^n$ is exactly one-dimensional — there is essentially only one such function, once you fix its value on the identity matrix. That normalization gives the determinant.
[definition: Determinant]
Let $A \in M_{n \times n}(F)$ be an $n \times n$ matrix over a field $F$, with columns $a_1, \ldots, a_n \in F^n$. The **determinant** of $A$, written $\det A$, is the value at $A$ of the unique function
\begin{align*}
\det: M_{n \times n}(F) &\to F
\end{align*}
that is multilinear and alternating as a function of the columns $a_1, \ldots, a_n$ and satisfies $\det I_n = 1$, where $I_n$ is the $n \times n$ identity matrix.
Equivalently, the determinant is given by the Leibniz formula
\begin{align*}
\det A &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{i,\sigma(i)},
\end{align*}
where the sum runs over all permutations $\sigma$ of $\{1, \ldots, n\}$ and $\operatorname{sgn}(\sigma) \in \{+1, -1\}$ is the sign of $\sigma$.
[/definition]
[explanation: Why the Leibniz Formula Follows from the Axiomatic Definition]
Given the three axioms — multilinearity, alternating, normalized at the identity — the Leibniz formula is forced. Write each column $a_j = \sum_{i=1}^n A_{ij} e_i$ where $e_1, \ldots, e_n$ are the standard basis vectors. By multilinearity applied to all $n$ columns:
\begin{align*}
\det(a_1, \ldots, a_n) &= \det\!\left(\sum_{i_1} A_{i_1 1} e_{i_1},\, \ldots,\, \sum_{i_n} A_{i_n n} e_{i_n}\right) \\
&= \sum_{i_1, \ldots, i_n} A_{i_1 1} \cdots A_{i_n n}\, \det(e_{i_1}, \ldots, e_{i_n}).
\end{align*}
The alternating property says $\det(e_{i_1}, \ldots, e_{i_n}) = 0$ if any two indices coincide. So the only surviving terms are those where $(i_1, \ldots, i_n)$ is a permutation of $(1, \ldots, n)$. Writing $i_k = \sigma(k)$ for a permutation $\sigma \in S_n$, we get $\det(e_{\sigma(1)}, \ldots, e_{\sigma(n)}) = \operatorname{sgn}(\sigma) \det(e_1, \ldots, e_n) = \operatorname{sgn}(\sigma)$, using antisymmetry and the normalization $\det I_n = 1$. Hence $\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{k=1}^{n} A_{\sigma(k),k}$. Reindexing each product by $i = \sigma(k)$ and then summing over $\sigma^{-1}$ in place of $\sigma$ (the two have the same sign) yields exactly the Leibniz formula.
[/explanation]
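The Leibniz formula translates almost word for word into code. The sketch below is illustrative only: it assumes matrices are stored as lists of rows with zero-based indices, and it computes $\operatorname{sgn}(\sigma)$ by counting inversions, which is one of several equivalent ways to obtain the sign. The names `sign` and `det_leibniz` are not from the text.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of range(n), computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Determinant of a square matrix (list of rows) via the Leibniz formula."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= a[i][sigma[i]]  # entry A_{i, sigma(i)}, zero-indexed
        total += term
    return total


print(det_leibniz([[2, 4], [1, 3]]))                   # 2
print(det_leibniz([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 1
```

Because the loop visits all $n!$ permutations, this is useful as a reference implementation for small $n$ rather than as a practical algorithm.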
For small matrices, the Leibniz formula reduces to familiar expressions. In dimension $2$, there are $2! = 2$ permutations of $\{1, 2\}$: the identity (sign $+1$) and the transposition $(12)$ (sign $-1$). So
\begin{align*}
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} &= ad - bc.
\end{align*}
In dimension $3$, there are $3! = 6$ permutations, giving the rule of Sarrus:
\begin{align*}
\det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & k \end{pmatrix} &= aek + bfg + cdh - ceg - bdk - afh.
\end{align*}
Rather than memorizing these, it is more useful to have a recursive procedure. The Leibniz formula sums over all $n!$ permutations, which is completely impractical to evaluate by hand for $n \geq 5$. What we want is a way to reduce an $n \times n$ determinant to a combination of $(n-1) \times (n-1)$ determinants — and to keep reducing until we hit $2 \times 2$ or $1 \times 1$ cases we can handle directly. This is possible precisely because the determinant is multilinear: if we fix all but one column (or row), the determinant is linear in the remaining column. Expanding that linearity along an entire row converts the $n \times n$ problem into $n$ problems of size $(n-1) \times (n-1)$, each obtained by deleting the chosen row and the corresponding column. The sign factor $(-1)^{i+j}$ arises from tracking how the deletion permutes the remaining rows and columns relative to the standard order. The resulting procedure is the **cofactor expansion**.
[definition: Cofactor Expansion]
Let $A \in M_{n \times n}(F)$. For $1 \leq i, j \leq n$, let $M_{ij}$ denote the $(n-1) \times (n-1)$ submatrix obtained by deleting row $i$ and column $j$ from $A$; its determinant $\det M_{ij}$ is the $(i,j)$-**minor**. The $(i,j)$-**cofactor** is
\begin{align*}
C_{ij} &= (-1)^{i+j} \det M_{ij}.
\end{align*}
The **cofactor expansion along row $i$** is
\begin{align*}
\det A &= \sum_{j=1}^{n} A_{ij}\, C_{ij} = \sum_{j=1}^{n} (-1)^{i+j} A_{ij}\, \det M_{ij}.
\end{align*}
Equivalently, expanding along column $j$:
\begin{align*}
\det A &= \sum_{i=1}^{n} A_{ij}\, C_{ij} = \sum_{i=1}^{n} (-1)^{i+j} A_{ij}\, \det M_{ij}.
\end{align*}
[/definition]
[example: Cofactor Expansion in Dimension 3]
Compute $\det A$ for
\begin{align*}
A &= \begin{pmatrix} 1 & 2 & 0 \\ 3 & -1 & 2 \\ 0 & 1 & 4 \end{pmatrix}.
\end{align*}
Expanding along the first row (choosing it because $A_{13} = 0$ eliminates one $2 \times 2$ determinant):
\begin{align*}
\det A &= 1 \cdot (-1)^{1+1} \det \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix}
+ 2 \cdot (-1)^{1+2} \det \begin{pmatrix} 3 & 2 \\ 0 & 4 \end{pmatrix}
+ 0 \cdot (\cdots).
\end{align*}
The two nonzero minors are:
\begin{align*}
\det \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix} &= (-1)(4) - (2)(1) = -4 - 2 = -6, \\
\det \begin{pmatrix} 3 & 2 \\ 0 & 4 \end{pmatrix} &= (3)(4) - (2)(0) = 12.
\end{align*}
Therefore:
\begin{align*}
\det A &= 1 \cdot (+1) \cdot (-6) + 2 \cdot (-1) \cdot 12 + 0 = -6 - 24 = -30.
\end{align*}
The sign $\det A = -30 < 0$ means the transformation $A$ reverses orientation. Its absolute value $30$ means $A$ scales volumes by a factor of $30$.
[/example]
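The recursive structure of cofactor expansion can be sketched in a few lines. This is an illustrative implementation under the assumption that matrices are lists of rows with zero-based indices; since the expansion is along the first row, the sign $(-1)^{1+j}$ in one-based indexing becomes `(-1) ** j` in zero-based indexing. The names `minor` and `det_cofactor` are hypothetical.

```python
def minor(a, i, j):
    """Submatrix of a with row i and column j deleted (zero-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_cofactor(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        if a[0][j] == 0:
            continue  # a zero entry contributes nothing, so skip its minor
        total += (-1) ** j * a[0][j] * det_cofactor(minor(a, 0, j))
    return total


A = [[1, 2, 0], [3, -1, 2], [0, 1, 4]]
print(det_cofactor(A))  # -30
```

Skipping zero entries mirrors the choice made in the example, where the zero in position $(1,3)$ removes one $2 \times 2$ determinant from the computation.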
## Multiplicativity and Invertibility
The single most important property of the determinant — the one that makes it a practical tool — is that it converts matrix multiplication into scalar multiplication. This is the content of the multiplicative property, and it is far from obvious: applying two linear maps in sequence corresponds to *multiplying* their volume-scaling factors.
To see why this is remarkable, consider that $\det(A + B) \neq \det A + \det B$ in general — the determinant is not linear as a function of the whole matrix, only as a function of individual columns. Yet the product formula holds exactly.
[quotetheorem:395]
This theorem has an immediate corollary that connects the determinant to the fundamental question of invertibility.
[quotetheorem:396]
[explanation: Why Multiplicativity Implies the Invertibility Criterion]
If $A$ is invertible, then $AA^{-1} = I_n$, so by multiplicativity $\det A \cdot \det(A^{-1}) = \det I_n = 1$. This forces $\det A \neq 0$ and gives $\det(A^{-1}) = (\det A)^{-1}$.
Conversely, suppose $\det A \neq 0$. Row reduce $A$ to an upper triangular matrix $U$ using only row swaps and row additions. A row swap changes the sign of the determinant and a row addition leaves it unchanged, so $\det U = \pm \det A \neq 0$. Since the determinant of a triangular matrix is the product of its diagonal entries, every diagonal entry of $U$ is nonzero, so $Ux = 0$ has only the trivial solution by back-substitution. Row operations do not change the solution set of a homogeneous system, so $Ax = 0$ also has only the trivial solution. Hence the columns of $A$ are linearly independent, $A$ is injective, and an injective linear map $F^n \to F^n$ is bijective by rank-nullity. So $A$ is invertible.
The geometric picture is clean: $\det A = 0$ means $A$ collapses the unit cube to a lower-dimensional object (zero volume), destroying information. Such a map cannot be inverted.
[/explanation]
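A quick numerical check makes the contrast between products and sums concrete. The sketch below is illustrative: the matrix `B` is an arbitrary choice made here, not one taken from the text, and the helpers `det2`, `matmul`, and `matadd` are hypothetical names.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(x, y):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


A = [[2, 4], [1, 3]]  # det A = 2
B = [[1, 2], [3, 4]]  # det B = -2 (arbitrary illustrative choice)

print(det2(matmul(A, B)), det2(A) * det2(B))  # -4 -4
print(det2(matadd(A, B)), det2(A) + det2(B))  # -3 0
```

The first line shows the product rule holding exactly; the second shows that no analogous rule holds for sums.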
[example: Failure of Invertibility — A Non-Obvious Case]
Consider the $3 \times 3$ matrix
\begin{align*}
A &= \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.
\end{align*}
The rows look independent at a glance, but they are not. Subtract: row $2$ $-$ row $1$ $= (3, 3, 3)$, and row $3$ $-$ row $2$ $= (3, 3, 3)$. The differences are equal, so row $3$ $= 2 \cdot$ row $2$ $-$ row $1$. Equivalently, the three rows $r_1, r_2, r_3$ satisfy $r_1 - 2r_2 + r_3 = 0$: they are linearly dependent.
The determinant confirms this:
\begin{align*}
\det A &= 1 \cdot \det\begin{pmatrix}5&6\\8&9\end{pmatrix}
- 2 \cdot \det\begin{pmatrix}4&6\\7&9\end{pmatrix}
+ 3 \cdot \det\begin{pmatrix}4&5\\7&8\end{pmatrix} \\
&= 1(45 - 48) - 2(36 - 42) + 3(32 - 35) \\
&= 1(-3) - 2(-6) + 3(-3) \\
&= -3 + 12 - 9 = 0.
\end{align*}
So $A$ is not invertible, even though no two rows are equal or proportional in the naive sense.
[/example]
The determinant is also unchanged by transposition:
[quotetheorem:394]
This symmetry between rows and columns is not obvious from the geometric picture (where we defined determinants in terms of columns), but it follows from the Leibniz formula by reindexing: the sum over permutations of the column indices equals the sum over permutations of the row indices, because $\operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}(\sigma)$.
## Effect of Row Operations
In practice, the most efficient way to compute a large determinant is not cofactor expansion (which costs $O(n!)$ operations) but row reduction (which costs $O(n^3)$ operations). Understanding how elementary row operations affect the determinant makes this precise.
Why are these three types of row operation singled out? Because together they generate the group of invertible row transformations: every invertible row transformation (every operation that can be undone) is a composition of these three moves. Row swaps implement permutations of rows; row scalings implement rescaling by a nonzero factor (which can always be undone by scaling back); row additions implement shearing (which can be undone by subtracting the same multiple). The list is not strictly minimal, since a row swap can be assembled from three row additions and a scaling by $-1$, but all three are kept because each has a simple, separate effect on the determinant. Non-invertible operations, such as zeroing out a row, are excluded because they destroy information and their effect cannot be undone. Formalizing exactly these three makes the class precise: it is not an arbitrary list of useful tricks, but a complete set of building blocks for reversible row manipulation.
[definition: Elementary Row Operations]
Let $A \in M_{n \times n}(F)$. The three **elementary row operations** are:
1. **Row swap:** interchange rows $i$ and $j$ ($i \neq j$).
2. **Row scaling:** multiply row $i$ by a nonzero scalar $\lambda \in F^\times$.
3. **Row addition:** add $\lambda$ times row $j$ to row $i$ ($i \neq j$, $\lambda \in F$).
[/definition]
[quotetheorem:3298]
[explanation: Why Row Addition Does Not Change the Determinant]
This is the most surprising of the three rules, and it follows directly from multilinearity. Suppose we add $\lambda$ times row $j$ to row $i$, producing matrix $A'$. The determinant of $A'$ equals
\begin{align*}
\det A' &= \det(\ldots, r_i + \lambda r_j, \ldots, r_j, \ldots),
\end{align*}
where $r_i$ is the old row $i$ and $r_j$ is row $j$. By multilinearity in row $i$:
\begin{align*}
\det A' &= \det(\ldots, r_i, \ldots, r_j, \ldots) + \lambda \det(\ldots, r_j, \ldots, r_j, \ldots).
\end{align*}
The second term has row $j$ appearing twice (in positions $i$ and $j$), so it vanishes by the alternating property. We are left with $\det A' = \det A$.
This is precisely why row reduction is valid for computing determinants: only the row swaps and row scalings contribute, and we can track their effects as we reduce $A$ to upper triangular form.
[/explanation]
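The effect of each elementary row operation can be observed directly on a small matrix. The sketch below is illustrative and assumes the standard rules: a swap flips the sign, scaling a row by $\lambda$ multiplies the determinant by $\lambda$, and a row addition changes nothing. The test matrix and the helper names are choices made here for the illustration.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def swap_rows(a, i, j):
    """Return a copy of a with rows i and j interchanged."""
    b = [row[:] for row in a]
    b[i], b[j] = b[j], b[i]
    return b


def scale_row(a, i, lam):
    """Return a copy of a with row i multiplied by lam."""
    b = [row[:] for row in a]
    b[i] = [lam * x for x in b[i]]
    return b


def add_row_multiple(a, i, j, lam):
    """Return a copy of a with lam times row j added to row i."""
    b = [row[:] for row in a]
    b[i] = [x + lam * y for x, y in zip(b[i], b[j])]
    return b


A = [[1, 2, 0], [3, -1, 2], [0, 1, 4]]
print(det(A))                             # -30
print(det(swap_rows(A, 0, 2)))            # 30
print(det(scale_row(A, 1, 5)))            # -150
print(det(add_row_multiple(A, 2, 0, 7)))  # -30
```

Each helper returns a new matrix rather than modifying its argument, so the three operations are all applied to the same starting matrix.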
[example: Computing a Determinant by Row Reduction]
Let
\begin{align*}
A &= \begin{pmatrix} 2 & 1 & 3 \\ 4 & 5 & 6 \\ 2 & 8 & 9 \end{pmatrix}.
\end{align*}
Perform row reduction, tracking determinant changes.
Step 1: Subtract $2$ times row $1$ from row $2$ (row addition, no change to $\det$):
\begin{align*}
A_1 &= \begin{pmatrix} 2 & 1 & 3 \\ 0 & 3 & 0 \\ 2 & 8 & 9 \end{pmatrix}, \quad \det A_1 = \det A.
\end{align*}
Step 2: Subtract $1$ times row $1$ from row $3$ (row addition, no change):
\begin{align*}
A_2 &= \begin{pmatrix} 2 & 1 & 3 \\ 0 & 3 & 0 \\ 0 & 7 & 6 \end{pmatrix}, \quad \det A_2 = \det A.
\end{align*}
Step 3: Subtract $\tfrac{7}{3}$ times row $2$ from row $3$ (row addition, no change):
\begin{align*}
A_3 &= \begin{pmatrix} 2 & 1 & 3 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \end{pmatrix}, \quad \det A_3 = \det A.
\end{align*}
The matrix $A_3$ is upper triangular. The determinant of an upper triangular matrix equals the product of its diagonal entries (expand down the first column repeatedly, or use the Leibniz formula — the only permutation contributing a nonzero term is the identity):
\begin{align*}
\det A_3 &= 2 \cdot 3 \cdot 6 = 36.
\end{align*}
No row swaps or scalings were performed, so $\det A = \det A_3 = 36$.
[/example]
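The procedure in this example generalizes to a short routine. The following is a sketch under stated assumptions: it uses Python's `fractions.Fraction` for exact arithmetic (so that a multiplier such as $\tfrac{7}{3}$ introduces no rounding), it uses only row swaps and row additions, and the function name `det_row_reduction` is hypothetical.

```python
from fractions import Fraction


def det_row_reduction(a):
    """Determinant via reduction to upper triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # each row swap flips the sign of the determinant
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Row addition: leaves the determinant unchanged.
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]  # product of the diagonal of the triangular form
    return result


A = [[2, 1, 3], [4, 5, 6], [2, 8, 9]]
print(det_row_reduction(A))  # 36
```

With floating-point entries one would typically choose the largest available pivot for numerical stability; that refinement is omitted here because exact fractions make it unnecessary.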
[quotetheorem:3299]
## Geometric Interpretation: Volume and Orientation
The abstract definition conceals a rich geometry. In this section we make the connection explicit: the determinant measures signed volume in $\mathbb{R}^n$, and its sign encodes orientation.
Every linear map $A: \mathbb{R}^n \to \mathbb{R}^n$ acts on geometric objects. Lines map to lines or points, planes map to planes, lines, or points, and, crucially, the unit cube $[0,1]^n$ maps to a parallelepiped, which is degenerate (flat) exactly when $A$ is singular. The $n$-dimensional volume of that parallelepiped is exactly $|\det A|$.
[illustration:det-parallelogram]
[definition: Signed Volume]
Let $v_1, \ldots, v_n \in \mathbb{R}^n$. The **signed volume** (or **signed $n$-dimensional measure**) of the parallelepiped
\begin{align*}
P &= \left\{\sum_{i=1}^{n} t_i v_i : 0 \leq t_i \leq 1\right\}
\end{align*}
spanned by $v_1, \ldots, v_n$ is $\det A$, where $A$ is the matrix with columns $v_1, \ldots, v_n$. The **volume** (unsigned) is $|\det A|$.
[/definition]
[explanation: Why the Sign Encodes Orientation]
In $\mathbb{R}^2$, the standard basis $\{e_1, e_2\}$ defines a right-handed frame: rotating $e_1$ by $90^\circ$ counterclockwise gives $e_2$. A matrix $A$ with $\det A > 0$ maps this frame to another right-handed frame — it preserves orientation. A matrix $A$ with $\det A < 0$ sends it to a left-handed frame — the image of $e_1$ and $e_2$ has opposite "handedness." A reflection across the $x$-axis, for example, is represented by
\begin{align*}
R &= \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \det R = -1,
\end{align*}
confirming that reflections reverse orientation.
In $\mathbb{R}^n$, "orientation" means the equivalence class of ordered bases under the relation "connected by a positive-determinant transition matrix." There are exactly two orientations. A transformation preserves orientation iff its determinant is positive, and reverses orientation iff its determinant is negative.
[/explanation]
[quotetheorem:3300]
[remark: Connection to the Change of Variables Formula]
The Volume Scaling Theorem is the linear-algebra heart of the change of variables formula in multivariable integration. When $\varphi: U \to V$ is a $C^1$ diffeomorphism between open sets in $\mathbb{R}^n$, the Jacobian matrix $J\varphi_x$ at each point $x$ is the linear approximation to $\varphi$ near $x$. The infinitesimal volume element transforms as $d\mathcal{L}^n(\varphi(x)) = |\det J\varphi_x|\, d\mathcal{L}^n(x)$, which is why the change of variables formula reads $\int_V f\, d\mathcal{L}^n = \int_U (f \circ \varphi)\, |\det J\varphi_x|\, d\mathcal{L}^n$.
[/remark]
[example: Shear Preserves Area]
A horizontal shear by factor $k$ in $\mathbb{R}^2$ is the matrix
\begin{align*}
S_k &= \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}.
\end{align*}
Geometrically, $S_k$ fixes the $x$-axis and slides every other horizontal line: the point $(x, y)$ maps to $(x + ky, y)$. The unit square with vertices $(0,0)$, $(1,0)$, $(0,1)$, $(1,1)$ maps to the parallelogram with vertices $(0,0)$, $(1,0)$, $(k,1)$, $(k+1,1)$. This parallelogram has the same base (length $1$ along the $x$-axis) and the same height (vertical distance $1$), so its area is still $1$.
The determinant confirms this:
\begin{align*}
\det S_k &= 1 \cdot 1 - k \cdot 0 = 1.
\end{align*}
So $\det S_k = 1$ for every $k$, no matter how extreme the shear. This matches the geometry, since shearing slides horizontal lines without stretching or compressing area. It can still be surprising at first: the image of the unit square looks more and more distorted as $k$ grows, but the area-scaling factor never changes.
Contrast this with an orthogonal projection onto the $x$-axis:
\begin{align*}
P &= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
\end{align*}
The projection collapses the entire plane onto the $x$-axis. The unit square maps to the segment $[0,1] \times \{0\}$, which is one-dimensional and has zero area. Accordingly, $\det P = 1 \cdot 0 - 0 \cdot 0 = 0$, confirming that $P$ is not invertible and destroys volume entirely.
[/example]
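These area claims can be verified without using the determinant at all. The sketch below is illustrative: it computes the signed area of the image of the unit square with the shoelace formula, under the convention that counterclockwise vertex order gives a positive value. The helper names are hypothetical, and the reflection across the $x$-axis is included only for comparison with the orientation discussion.

```python
def apply(m, v):
    """Apply the 2x2 matrix m to the vector v = (x, y)."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def signed_area(points):
    """Shoelace formula: positive when the vertices run counterclockwise."""
    total = 0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # counterclockwise unit square


def image_area(m):
    """Signed area of the image of the unit square under m."""
    return signed_area([apply(m, v) for v in square])


for k in (0, 1, 10, 1000):
    print(k, image_area([[1, k], [0, 1]]))  # the area is 1.0 for every k

print(image_area([[1, 0], [0, 0]]))   # 0.0  (projection onto the x-axis)
print(image_area([[1, 0], [0, -1]]))  # -1.0 (reflection across the x-axis)
```

The negative value for the reflection reflects the reversed vertex order of the image, which is the computational counterpart of reversed orientation.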
## The Adjugate and Cramer's Rule
When a matrix $A$ is invertible, there is a formula for $A^{-1}$ in terms of cofactors. This formula is rarely the most efficient computational method, but it is theoretically important and gives rise to Cramer's Rule for solving linear systems.
The key observation is that the cofactors $C_{ij}$ of $A$ encode more information than the determinant alone. When we expand $\det A$ along row $i$, we get $\sum_j A_{ij} C_{ij} = \det A$. But if we expand along row $i$ using the cofactors from row $k \neq i$ — a "wrong row" expansion — the result is always zero. This is because we are effectively computing the determinant of a matrix with two identical rows.
[quotetheorem:3304]
This "orthogonality" of cofactors motivates the adjugate matrix.
[definition: Adjugate Matrix]
Let $A \in M_{n \times n}(F)$. The **adjugate** (or **classical adjoint**) of $A$ is the $n \times n$ matrix
\begin{align*}
\operatorname{adj}(A) &\in M_{n \times n}(F)
\end{align*}
whose $(i,j)$-entry is the $(j,i)$-cofactor: $(\operatorname{adj}(A))_{ij} = C_{ji} = (-1)^{i+j} \det M_{ji}$.
Note the transpose: the $(i,j)$-entry of $\operatorname{adj}(A)$ is $C_{ji}$, not $C_{ij}$.
[/definition]
The reason for the transpose becomes clear when we compute the product $A \cdot \operatorname{adj}(A)$:
\begin{align*}
(A \cdot \operatorname{adj}(A))_{ik} &= \sum_{j=1}^{n} A_{ij}\, (\operatorname{adj}(A))_{jk} = \sum_{j=1}^{n} A_{ij}\, C_{kj}.
\end{align*}
When $i = k$, this equals $\det A$ (cofactor expansion). When $i \neq k$, this equals $0$ (wrong-row expansion). So:
[quotetheorem:397]
[example: Explicit Inverse of a 2×2 Matrix]
Let
\begin{align*}
A &= \begin{pmatrix} 3 & 5 \\ 1 & 2 \end{pmatrix}.
\end{align*}
First, $\det A = 3 \cdot 2 - 5 \cdot 1 = 6 - 5 = 1 \neq 0$, so $A$ is invertible.
The four cofactors are:
\begin{align*}
C_{11} &= (-1)^{1+1} \det(2) = 2, \\
C_{12} &= (-1)^{1+2} \det(1) = -1, \\
C_{21} &= (-1)^{2+1} \det(5) = -5, \\
C_{22} &= (-1)^{2+2} \det(3) = 3.
\end{align*}
The adjugate transposes these: the $(i,j)$-entry of $\operatorname{adj}(A)$ is $C_{ji}$, not $C_{ij}$. So
\begin{align*}
\operatorname{adj}(A) &= \begin{pmatrix} C_{11} & C_{21} \\ C_{12} & C_{22} \end{pmatrix} = \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix}.
\end{align*}
Note the transpose: the $(1,2)$-entry is $C_{21} = -5$ (the cofactor from position $(2,1)$), not $C_{12} = -1$.
Since $\det A = 1$, the inverse is simply
\begin{align*}
A^{-1} &= \frac{1}{1} \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix} = \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix}.
\end{align*}
Verification: compute $A \cdot A^{-1}$ directly.
\begin{align*}
A \cdot A^{-1} &= \begin{pmatrix} 3 & 5 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & -5 \\ -1 & 3 \end{pmatrix} \\
&= \begin{pmatrix} 3 \cdot 2 + 5 \cdot (-1) & 3 \cdot (-5) + 5 \cdot 3 \\ 1 \cdot 2 + 2 \cdot (-1) & 1 \cdot (-5) + 2 \cdot 3 \end{pmatrix} \\
&= \begin{pmatrix} 6 - 5 & -15 + 15 \\ 2 - 2 & -5 + 6 \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2. \checkmark
\end{align*}
[/example]
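The same construction works for any size. The sketch below is one possible implementation, not a recommended numerical method: it builds the adjugate entry by entry from cofactors and divides by the determinant using exact fractions. The helper names and the convention that a $0 \times 0$ determinant equals $1$ (needed so that the $1 \times 1$ case falls out of the recursion) are choices made here.

```python
from fractions import Fraction


def minor(a, i, j):
    """Submatrix of a with row i and column j deleted (zero-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 0:
        return 1  # empty determinant, so that det([[x]]) == x
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjugate(a):
    """Entry (i, j) of adj(A) is the cofactor C_{ji}: note the swapped indices."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]


def inverse(a):
    """Inverse as adj(A) / det(A), in exact rational arithmetic."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(a)]


A = [[3, 5], [1, 2]]
print(adjugate(A))  # [[2, -5], [-1, 3]]
```

Calling `inverse(A)` returns the same entries as `Fraction` objects, since $\det A = 1$ here.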
The adjugate formula immediately gives Cramer's Rule, an explicit expression for the solution of a linear system $Ax = b$ when $A$ is invertible.
[quotetheorem:3305]
[remark: Cramer's Rule in Practice]
Cramer's Rule requires computing $n+1$ determinants of $n \times n$ matrices, each costing $O(n^3)$ operations by row reduction, for a total of $O(n^4)$. That is worse than solving the system directly by Gaussian elimination, which costs $O(n^3)$. Cramer's Rule is important for proofs and for expressing solutions symbolically (e.g., when entries of $A$ are formal variables), but for numerical computation with specific numbers, row reduction is the better choice.
[/remark]
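For small systems, Cramer's Rule is easy to write down in code. The sketch below assumes the usual statement of the rule, $x_i = \det(A_i) / \det A$, where $A_i$ is $A$ with column $i$ replaced by $b$. The right-hand side `b` and the function names are hypothetical choices made for the illustration.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def cramer_solve(a, b):
    """Solve a x = b by Cramer's Rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's Rule needs a nonzero determinant")
    solution = []
    for i in range(len(a)):
        # Build A_i: replace column i of A by the right-hand side b.
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i), d))
    return solution


A = [[2, 4], [1, 3]]
b = [6, 4]  # hypothetical right-hand side
print(*cramer_solve(A, b))  # 1 1
```

Substituting back confirms the answer: $2 \cdot 1 + 4 \cdot 1 = 6$ and $1 \cdot 1 + 3 \cdot 1 = 4$.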
## Eigenvalues and the Characteristic Polynomial
One of the most important applications of the determinant in linear algebra is the detection of eigenvalues. A scalar $\lambda \in F$ is an eigenvalue of $A \in M_{n \times n}(F)$ if and only if the matrix $A - \lambda I_n$ is not invertible, which happens exactly when $\det(A - \lambda I_n) = 0$.
[definition: Characteristic Polynomial]
Let $A \in M_{n \times n}(F)$. The **characteristic polynomial** of $A$ is the polynomial in the variable $\lambda$:
\begin{align*}
p_A(\lambda) &= \det(\lambda I_n - A) \in F[\lambda].
\end{align*}
The **eigenvalues** of $A$ are the roots of $p_A$ in $F$ (or in an algebraic closure of $F$).
[/definition]
[remark: Sign Convention for the Characteristic Polynomial]
Some authors define the characteristic polynomial as $\det(A - \lambda I_n)$ rather than $\det(\lambda I_n - A)$. These differ by the sign $(-1)^n$, which matters for the sign of the leading coefficient. The convention $\det(\lambda I_n - A)$ ensures that $p_A$ is a monic polynomial (leading coefficient $1$), which is standard in most modern treatments.
[/remark]
The characteristic polynomial has degree exactly $n$, with leading term $\lambda^n$ and constant term $(-1)^n \det A$. Expanding $\det(\lambda I_n - A)$ yields the coefficients in terms of traces of powers of $A$ and sub-determinants — but the two most important coefficients are:
[quotetheorem:3306]
[explanation: Why det A Equals the Product of Eigenvalues]
Over an algebraically closed field (such as $\mathbb{C}$), the characteristic polynomial factors completely:
\begin{align*}
p_A(\lambda) &= \prod_{i=1}^{n} (\lambda - \lambda_i).
\end{align*}
Setting $\lambda = 0$ gives $p_A(0) = \prod_{i=1}^{n} (-\lambda_i) = (-1)^n \prod_{i=1}^{n} \lambda_i$. But we also have $p_A(0) = \det(0 \cdot I_n - A) = \det(-A) = (-1)^n \det A$. Equating these: $(-1)^n \det A = (-1)^n \prod_{i=1}^{n} \lambda_i$, and dividing by $(-1)^n$ gives $\det A = \prod_{i=1}^{n} \lambda_i$.
[/explanation]
[example: Characteristic Polynomial of a 2×2 Matrix]
For
\begin{align*}
A &= \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix},
\end{align*}
the characteristic polynomial is
\begin{align*}
p_A(\lambda) &= \det\begin{pmatrix} \lambda - 3 & -1 \\ -2 & \lambda - 4 \end{pmatrix} = (\lambda - 3)(\lambda - 4) - (-1)(-2) = \lambda^2 - 7\lambda + 10.
\end{align*}
The roots are $\lambda = 2$ and $\lambda = 5$. Check: $\det A = 3 \cdot 4 - 1 \cdot 2 = 10 = 2 \cdot 5$ and $\operatorname{tr} A = 3 + 4 = 7 = 2 + 5$. Both the product and sum of eigenvalues are encoded in the characteristic polynomial, as predicted.
[/example]
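The computation in this example can be reproduced mechanically. The sketch below is restricted to the $2 \times 2$ case, where $\det(\lambda I_2 - A) = \lambda^2 - (\operatorname{tr} A)\lambda + \det A$; it evaluates this polynomial and compares it with a direct evaluation of $\det(\lambda I_2 - A)$. The function names are illustrative.

```python
def char_poly_2x2(a):
    """Coefficients (1, -tr A, det A) of det(lambda*I - A) for a 2x2 matrix."""
    (p, q), (r, s) = a
    return (1, -(p + s), p * s - q * r)


def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficients, highest degree first."""
    value = 0
    for c in coeffs:
        value = value * x + c  # Horner's rule
    return value


def det_shifted(a, lam):
    """det(lam*I - A) computed directly from the 2x2 formula, for comparison."""
    (p, q), (r, s) = a
    return (lam - p) * (lam - s) - (-q) * (-r)


A = [[3, 1], [2, 4]]
coeffs = char_poly_2x2(A)
print(coeffs)                                     # (1, -7, 10)
print([evaluate(coeffs, lam) for lam in (2, 5)])  # [0, 0]
```

Both eigenvalues are roots, and `det_shifted(A, lam)` agrees with `evaluate(coeffs, lam)` for every integer `lam` one cares to try.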
## Similarity Invariance and the Trace
The determinant and trace are the most fundamental invariants of a linear map — quantities that do not depend on the choice of basis. This is what makes them genuinely geometric, rather than artifacts of coordinates.
When we change basis using an invertible matrix $P$, the matrix representing the same linear transformation changes from $A$ to $PAP^{-1}$. Two matrices related this way are called **similar**.
Why does similarity matter? Because a linear map $T: V \to V$ does not come with a preferred basis — any basis gives a valid matrix representative, and different choices give different-looking matrices. Similarity is precisely the equivalence relation that identifies all these matrix representations of the same underlying map. A quantity is intrinsic to the map (rather than to a particular matrix) if and only if it is invariant under similarity. The determinant and trace are two such quantities: they tell you something about the geometry of $T$ itself, not about how you chose to write it down. Without this invariance, neither $\det$ nor $\operatorname{tr}$ would have geometric meaning.
[definition: Similar Matrices]
Two matrices $A, B \in M_{n \times n}(F)$ are **similar** if there exists an invertible matrix $P \in M_{n \times n}(F)$ such that
\begin{align*}
B &= PAP^{-1}.
\end{align*}
Similarity is an equivalence relation on $M_{n \times n}(F)$. For a fixed linear map $T: V \to V$ on an $n$-dimensional vector space $V$ over $F$, the matrices representing $T$ in the various bases of $V$ form exactly one similarity class.
[/definition]
[quotetheorem:401]
[explanation: Proof of Similarity Invariance]
For the determinant:
\begin{align*}
\det(PAP^{-1}) &= \det P \cdot \det A \cdot \det(P^{-1}) = \det P \cdot \det A \cdot \frac{1}{\det P} = \det A.
\end{align*}
For the characteristic polynomial:
\begin{align*}
\det(\lambda I - PAP^{-1}) &= \det(P(\lambda I - A)P^{-1}) = \det P \cdot \det(\lambda I - A) \cdot \det(P^{-1}) = \det(\lambda I - A).
\end{align*}
For the trace: $\operatorname{tr}(PAP^{-1}) = \operatorname{tr}(P^{-1} \cdot PA) = \operatorname{tr}(A)$, using the cyclic property of trace ($\operatorname{tr}(XY) = \operatorname{tr}(YX)$ for all $X, Y$).
These invariances confirm that $\det$, $\operatorname{tr}$, and $p_A$ are properties of the linear map itself, independent of which basis we choose to write it in.
[/explanation]
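The invariance can be checked on a concrete pair of similar matrices. The sketch below is illustrative: the change-of-basis matrix `P` is an arbitrary invertible matrix chosen here, exact fractions are used so that the comparison is not blurred by rounding, and the helper names are hypothetical.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


def inverse2(m):
    """Inverse of an invertible 2x2 matrix, in exact rational arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(det2(m))
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[3, 1], [2, 4]]
P = [[1, 2], [3, 4]]  # hypothetical change-of-basis matrix, det P = -2
B = matmul(matmul(P, A), inverse2(P))  # B = P A P^{-1}

print(det2(A), det2(B))    # 10 10
print(trace(A), trace(B))  # 7 7
```

The entries of `B` look nothing like those of `A`, yet the determinant and trace agree, as the proof predicts.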
[remark: Determinant of a Linear Map]
Because $\det A$ is invariant under similarity, we can define the **determinant of a linear map** $T: V \to V$ on any finite-dimensional vector space $V$: choose any basis, form the matrix $A$ representing $T$, and set $\det T := \det A$. The invariance guarantees this is well-defined — a different basis gives a similar matrix with the same determinant.
[/remark]
[example: Two Matrices with the Same Determinant That Are Not Similar]
Similarity is a strictly finer equivalence relation than "having the same determinant." Consider
\begin{align*}
A &= \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}.
\end{align*}
Both have $\det A = \det B = 4$. But $A = 2I_2$ commutes with every matrix, while $B$ does not. More concretely, $A$ has characteristic polynomial $(\lambda - 2)^2$ with only one eigenvalue $\lambda = 2$, while $B$ has characteristic polynomial $(\lambda - 1)(\lambda - 4)$ with two distinct eigenvalues $\lambda = 1$ and $\lambda = 4$. Since similar matrices have the same characteristic polynomial, $A$ and $B$ are not similar. Sharing a determinant is necessary but not sufficient for similarity.
[/example]