[proofplan]
We describe an explicit algorithm that reduces $A$ to canonical form using three types of elementary operations: row/column swaps, row/column shears (adding a scalar multiple of one row/column to another), and row/column scalings. Each operation corresponds to left- or right-multiplication by an elementary matrix. The algorithm processes pivots one at a time, maintaining the invariant that after processing pivot $k$, the top-left $k \times k$ block is $I_k$ with zeros in the rest of rows and columns $1, \dots, k$.
[/proofplan]
[step:Establish that each type of elementary matrix is invertible]
[claim:Elementary Matrices Invertible]
Each elementary matrix is invertible, with inverse of the same type: $(S_{ij}^n)^{-1} = S_{ij}^n$, $(E_{ij}^n(\lambda))^{-1} = E_{ij}^n(-\lambda)$, and $(T_i^n(\lambda))^{-1} = T_i^n(\lambda^{-1})$.
[/claim]
[proof]
For the swap matrix: $S_{ij}^n S_{ij}^n = I_n$, since swapping rows $i$ and $j$ twice returns to the original configuration.
For the shear matrix: $E_{ij}^n(\lambda) E_{ij}^n(-\lambda) = I_n$, since adding $\lambda$ times row $j$ to row $i$ and then subtracting $\lambda$ times row $j$ from row $i$ yields the identity.
For the scaling matrix: $T_i^n(\lambda) T_i^n(\lambda^{-1}) = I_n$, since scaling row $i$ by $\lambda$ and then by $\lambda^{-1}$ returns to the original. Here $\lambda \neq 0$ ensures $\lambda^{-1}$ exists.
[/proof]
[/step]
[step:Describe the column-by-column reduction algorithm]
We process columns $k = 1, 2, \dots$ in order, maintaining the invariant that after processing column $k$, the top-left $k \times k$ block is $I_k$ and all other entries in the first $k$ rows and columns are zero.
**Processing column $k$:** If all entries $a_{ij}$ with $i \geq k$ and $j \geq k$ in the remaining submatrix are zero, the algorithm terminates (no further pivots exist). Otherwise:
(a) **Pivot selection:** Find some $a_{ij} \neq 0$ with $i \geq k$, $j \geq k$. Apply the row swap $S_{ik}$ and the column swap $S_{jk}$ to move this entry to position $(k, k)$.
(b) **Normalise the pivot:** Apply the row scaling $T_k(a_{kk}^{-1})$ to make the $(k,k)$-entry equal to $1$.
(c) **Clear column $k$ below the pivot:** For each $i > k$ with $a_{ik} \neq 0$, apply the row shear $E_{ik}(-a_{ik})$ to subtract $a_{ik}$ times row $k$ from row $i$.
(d) **Clear row $k$ to the right of the pivot:** For each $j > k$ with $a_{kj} \neq 0$, apply the column shear $E_{kj}(-a_{kj})$ to subtract $a_{kj}$ times column $k$ from column $j$.
(e) **Clear column $k$ above the pivot:** For each $i < k$ with $a_{ik} \neq 0$, apply the row shear $E_{ik}(-a_{ik})$.
(f) **Clear row $k$ to the left of the pivot:** For each $j < k$ with $a_{kj} \neq 0$, apply the column shear $E_{kj}(-a_{kj})$.
After processing, the $(k,k)$-entry is $1$ and all other entries in row $k$ and column $k$ are $0$, preserving the invariant.
[guided]
The algorithm generalises Gaussian elimination to both rows and columns simultaneously. At each stage, we select a non-zero entry in the unprocessed submatrix, move it to the diagonal via swaps, normalise it to $1$ via scaling, and then use it to eliminate all other entries in its row and column via shears.
Why does the invariant hold? After processing column $k$, the $(k,k)$-entry is $1$ by step (b), entries below it in column $k$ are $0$ by step (c), entries to the right in row $k$ are $0$ by step (d), entries above in column $k$ are $0$ by step (e), and entries to the left in row $k$ are $0$ by step (f). The operations in steps (c)--(f) do not disturb the previously processed rows and columns $1, \dots, k-1$ because they only add multiples of row $k$ or column $k$ (which has zeros in positions $1, \dots, k-1$ of that row/column).
[/guided]
[/step]
[step:Verify termination and identify the canonical form]
The algorithm terminates after at most $\min(n, m)$ pivot steps. If it successfully processes $r$ pivots before finding the remaining submatrix is entirely zero, the result is
\begin{align*}
\begin{pmatrix} I_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}.
\end{align*}
The number of pivots $r$ equals $\mathrm{rank}(A)$, since elementary row and column operations do not change the rank (they correspond to multiplication by invertible matrices).
[/step]
[step:Express the reduction as a product of elementary matrices]
Each elementary row operation is left-multiplication by an elementary matrix, and each elementary column operation is right-multiplication by an elementary matrix. Since each elementary matrix is invertible (by the first step), composing all row operations gives an invertible matrix $Q^{-1}$ and composing all column operations gives an invertible matrix $P$, with
\begin{align*}
Q^{-1}AP = \begin{pmatrix} I_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}.
\end{align*}
Since $Q^{-1}$ and $P$ are products of elementary matrices, this provides a constructive realisation of the [Canonical Form for Linear Maps](/theorems/388).
[/step]