[proofplan]
We regard the determinant as the alternating multilinear function of the columns of a real $n\times n$ matrix normalized to take value $1$ on the identity matrix. Expanding each column of $A$ in the standard basis and using multilinearity expresses $\det A$ as a sum over all functions from $\{1,\dots,n\}$ to itself. Alternation kills exactly the terms in which a basis vector is repeated, so the surviving terms are indexed by permutations. Each surviving determinant is the determinant of the corresponding permutation matrix, hence equals the signature of the permutation.
[/proofplan]
[step:Expand the determinant by multilinearity in the columns]
For each $j\in\{1,\dots,n\}$, let $c_j\in\mathbb{R}^n$ denote the $j$-th column of $A$, so
\begin{align*}
c_j=\sum_{i=1}^n A_{ij}e_i,
\end{align*}
where $e_i\in\mathbb{R}^n$ is the $i$-th standard basis vector. Define the finite set
\begin{align*}
\mathcal{F}:=\{f:\{1,\dots,n\}\to\{1,\dots,n\}\}.
\end{align*}
By multilinearity of $\det$ in the columns,
\begin{align*}
\det A
=
\sum_{f\in\mathcal{F}}
\left(\prod_{j=1}^n A_{f(j),j}\right)
\det(e_{f(1)},\dots,e_{f(n)}).
\end{align*}
[guided]
The determinant is multilinear in its columns, so we can expand one column at a time. For each $j\in\{1,\dots,n\}$, the $j$-th column $c_j\in\mathbb{R}^n$ has the standard basis expansion
\begin{align*}
c_j=\sum_{i=1}^n A_{ij}e_i.
\end{align*}
Thus the determinant of $A$ is
\begin{align*}
\det A=\det(c_1,\dots,c_n).
\end{align*}
Multilinearity says that after substituting the basis expansion of every column, we must choose one basis vector from each column and multiply the corresponding scalar coefficients. A simultaneous choice of one row index for every column is precisely a function
\begin{align*}
f:\{1,\dots,n\}\to\{1,\dots,n\}.
\end{align*}
For such a function $f$, the chosen vector in column $j$ is $e_{f(j)}$, and the chosen coefficient is $A_{f(j),j}$. Therefore multilinearity gives
\begin{align*}
\det A
=
\sum_{f:\{1,\dots,n\}\to\{1,\dots,n\}}
\left(\prod_{j=1}^n A_{f(j),j}\right)
\det(e_{f(1)},\dots,e_{f(n)}).
\end{align*}
This expansion is the point at which all possible index selections appear; the next step will use alternation to remove the selections that cannot contribute to the determinant.
[/guided]
[/step]
[step:Discard the terms whose selected basis vectors repeat]
Let $f\in\mathcal{F}$. If $f$ is not injective, then there exist distinct indices $j,k\in\{1,\dots,n\}$ such that $f(j)=f(k)$. Hence the $j$-th and $k$-th columns in the determinant
\begin{align*}
\det(e_{f(1)},\dots,e_{f(n)})
\end{align*}
are equal. Since the determinant is alternating in its columns, this determinant is $0$.
Because the domain and codomain of $f$ are finite sets with the same cardinality $n$, injectivity of $f$ is equivalent to bijectivity. Hence only bijections $f:\{1,\dots,n\}\to\{1,\dots,n\}$ survive, and these are exactly the elements of $S_n$. Therefore
\begin{align*}
\det A
=
\sum_{f\in S_n}
\left(\prod_{j=1}^n A_{f(j),j}\right)
\det(e_{f(1)},\dots,e_{f(n)}).
\end{align*}
[guided]
Let $f\in\mathcal{F}$ be one of the functions appearing in the multilinear expansion. The determinant is alternating in its columns, meaning that if two columns are equal, then the determinant is $0$. If $f$ is not injective, there exist distinct indices $j,k\in\{1,\dots,n\}$ such that $f(j)=f(k)$. Then the $j$-th and $k$-th columns of
\begin{align*}
\det(e_{f(1)},\dots,e_{f(n)})
\end{align*}
are both the same standard basis vector $e_{f(j)}=e_{f(k)}$, so alternation gives
\begin{align*}
\det(e_{f(1)},\dots,e_{f(n)})=0.
\end{align*}
Thus every non-injective choice contributes zero to the expansion.
It remains to identify which functions are injective. The domain and codomain of $f$ are both the finite set $\{1,\dots,n\}$, so an injective function $f:\{1,\dots,n\}\to\{1,\dots,n\}$ is automatically bijective. Conversely, every bijection is injective. Therefore the surviving functions are exactly the permutations of $\{1,\dots,n\}$, namely the elements of $S_n$. Removing the zero terms from the multilinear expansion gives
\begin{align*}
\det A
=
\sum_{f\in S_n}
\left(\prod_{j=1}^n A_{f(j),j}\right)
\det(e_{f(1)},\dots,e_{f(n)}).
\end{align*}
[/guided]
[/step]
[step:Reindex the surviving sum by the inverse permutation]
For each surviving permutation $f\in S_n$, define $\sigma:=f^{-1}\in S_n$. Then $f(j)=\sigma^{-1}(j)$ for every $j\in\{1,\dots,n\}$. Substituting this into the surviving sum gives
\begin{align*}
\det A
=
\sum_{\sigma\in S_n}
\left(\prod_{j=1}^n A_{\sigma^{-1}(j),j}\right)
\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)}).
\end{align*}
In the product, the change of index $j=\sigma(i)$ gives
\begin{align*}
\prod_{j=1}^n A_{\sigma^{-1}(j),j}
=
\prod_{i=1}^n A_{i,\sigma(i)}.
\end{align*}
Thus
\begin{align*}
\det A
=
\sum_{\sigma\in S_n}
\left(\prod_{i=1}^n A_{i,\sigma(i)}\right)
\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)}).
\end{align*}
[guided]
The surviving sum is indexed by bijections $f\in S_n$, but the target formula uses the entry $A_{i,\sigma(i)}$, where the first index is the row and the second index is the image of the row under the permutation. Our current product is instead written column-by-column as $A_{f(j),j}$. To put it in the desired row-by-row form, define
\begin{align*}
\sigma:=f^{-1}\in S_n.
\end{align*}
Then $f(j)=\sigma^{-1}(j)$ for every $j\in\{1,\dots,n\}$. Substituting this into the surviving sum gives
\begin{align*}
\det A
=
\sum_{\sigma\in S_n}
\left(\prod_{j=1}^n A_{\sigma^{-1}(j),j}\right)
\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)}).
\end{align*}
Now use the change of finite index $j=\sigma(i)$. Since $\sigma$ is a bijection of $\{1,\dots,n\}$, as $i$ ranges over $\{1,\dots,n\}$ the index $j=\sigma(i)$ also ranges over $\{1,\dots,n\}$ exactly once. Hence
\begin{align*}
\prod_{j=1}^n A_{\sigma^{-1}(j),j}
=
\prod_{i=1}^n A_{\sigma^{-1}(\sigma(i)),\sigma(i)}
=
\prod_{i=1}^n A_{i,\sigma(i)}.
\end{align*}
Therefore
\begin{align*}
\det A
=
\sum_{\sigma\in S_n}
\left(\prod_{i=1}^n A_{i,\sigma(i)}\right)
\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)}).
\end{align*}
[/guided]
[/step]
[step:Identify the remaining determinants with signatures]
For $\sigma\in S_n$, let $P_\sigma\in\mathbb{R}^{n\times n}$ denote the permutation matrix whose $j$-th column is $e_{\sigma^{-1}(j)}$. Equivalently, for every $i,j\in\{1,\dots,n\}$, its entries satisfy $(P_\sigma)_{ij}=1$ if $j=\sigma(i)$ and $(P_\sigma)_{ij}=0$ if $j\ne\sigma(i)$.
Therefore
\begin{align*}
\det(P_\sigma)=\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)}).
\end{align*}
The hypotheses of [citetheorem:7881] are satisfied because $n\in\mathbb{N}$ and $\sigma\in S_n$, and the matrix just defined is precisely the permutation matrix $P_\sigma$ appearing in that theorem. Hence
\begin{align*}
\det(P_\sigma)=\operatorname{sgn}(\sigma).
\end{align*}
Substituting this identity into the preceding expression for $\det A$ yields
\begin{align*}
\det A
=
\sum_{\sigma\in S_n}
\operatorname{sgn}(\sigma)\prod_{i=1}^n A_{i,\sigma(i)}.
\end{align*}
This is the desired Leibniz formula.
[guided]
Fix $\sigma\in S_n$. Define $P_\sigma\in\mathbb{R}^{n\times n}$ to be the permutation matrix whose $j$-th column is $e_{\sigma^{-1}(j)}$. Equivalently, for every $i,j\in\{1,\dots,n\}$, the entries of $P_\sigma$ are given by $(P_\sigma)_{ij}=1$ when $j=\sigma(i)$ and $(P_\sigma)_{ij}=0$ when $j\ne\sigma(i)$. Since the determinant of a matrix is the determinant of its ordered list of columns, this definition gives
\begin{align*}
\det(P_\sigma)=\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)}).
\end{align*}
We now invoke [citetheorem:7881]. Its hypotheses are satisfied because $n\in\mathbb{N}$ and $\sigma\in S_n$, and the matrix just defined is the permutation matrix $P_\sigma$ appearing in that theorem. Therefore
\begin{align*}
\det(P_\sigma)=\operatorname{sgn}(\sigma).
\end{align*}
Combining the two displayed identities yields
\begin{align*}
\det(e_{\sigma^{-1}(1)},\dots,e_{\sigma^{-1}(n)})=\operatorname{sgn}(\sigma).
\end{align*}
Substituting this into the reindexed determinant expansion gives
\begin{align*}
\det A
=
\sum_{\sigma\in S_n}
\operatorname{sgn}(\sigma)\prod_{i=1}^n A_{i,\sigma(i)}.
\end{align*}
This is exactly the Leibniz formula stated in the theorem.
[/guided]
[/step]