[proofplan]
A symmetric bilinear form over a field is non-degenerate if and only if its Gram matrix with respect to some (equivalently, any) basis has nonzero determinant. We compute the Gram matrix of the trace form by writing the trace via the $K$-embeddings of $L$ (from [Norm and Trace via Embeddings](/theorems/1577) in the general separable setting): this exhibits the Gram matrix as $S^\top S$ where $S = (\sigma_i(\alpha_j))$, so its determinant is $(\det S)^2$. To show $\det S \neq 0$, we use the [primitive element theorem](/theorems/???) to choose the basis $\alpha_j = \theta^{j-1}$, making $S$ a Vandermonde matrix whose determinant factors as $\prod_{i<j}(\sigma_i(\theta) - \sigma_j(\theta))$. Separability forces the embeddings to be distinct, and distinct embeddings must disagree on the primitive element $\theta$, so every Vandermonde factor is nonzero.
[/proofplan]
[step:Reduce non-degeneracy to non-vanishing of the Gram determinant]
Let $B: L \times L \to K$ be the symmetric $K$-bilinear form $B(x, y) = \operatorname{tr}_{L/K}(xy)$. Given the $K$-basis $\alpha_1, \ldots, \alpha_n$ of $L$, the Gram matrix of $B$ is
\begin{align*}
G := (B(\alpha_i, \alpha_j))_{i,j=1}^n = (\operatorname{tr}_{L/K}(\alpha_i \alpha_j))_{i,j=1}^n \in K^{n \times n}.
\end{align*}
For a symmetric $K$-bilinear form on a finite-dimensional $K$-vector space, non-degeneracy is equivalent to $\det G \neq 0$. Indeed, $B$ is degenerate iff there exists nonzero $y \in L$ with $B(x, y) = 0$ for all $x \in L$ iff the linear map
\begin{align*}
L &\to L^* \\
y &\mapsto B(\cdot, y)
\end{align*}
has nontrivial kernel iff its matrix (which is $G$ in the chosen bases) is singular iff $\det G = 0$. Equivalence holds because the determinant of $G$ is independent of the basis up to a nonzero square factor (change of basis by $P$ replaces $G$ by $P^\top G P$, and $\det(P^\top G P) = (\det P)^2 \det G$), so non-vanishing is a property of $B$ alone.
We reduce the theorem to showing $\det G \neq 0$.
[guided]
We begin by unpacking what non-degeneracy means and why it is equivalent to the determinant condition stated.
**Definition.** A symmetric $K$-bilinear form $B$ on a finite-dimensional $K$-vector space $V$ is **non-degenerate** if, whenever $B(x, y) = 0$ for all $x \in V$, we have $y = 0$. Equivalently, the induced map
\begin{align*}
\Phi_B : V &\to V^* \\
y &\mapsto B(\cdot, y)
\end{align*}
is injective, which for a finite-dimensional space is equivalent to being an isomorphism (since $\dim V = \dim V^*$).
**Gram matrix and determinant.** Fix a basis $\alpha_1, \ldots, \alpha_n$ of $V$ and its dual basis $\alpha_1^*, \ldots, \alpha_n^*$ of $V^*$. The matrix of $\Phi_B$ in these bases has $(i, j)$ entry $B(\alpha_i, \alpha_j)$, which is exactly the Gram matrix $G$. Therefore $\Phi_B$ is invertible iff $G$ is invertible iff $\det G \neq 0$.
**Basis-independence of the determinant condition.** If $\alpha_j' = \sum_k P_{kj} \alpha_k$ is another basis, linked to the old by an invertible matrix $P \in \operatorname{GL}_n(K)$, then a direct computation gives
\begin{align*}
G' = P^\top G P,
\end{align*}
so $\det G' = (\det P)^2 \det G$. Since $\det P \in K^\times$ is nonzero, $\det G'$ and $\det G$ are both zero or both nonzero. Hence the non-vanishing of $\det G$ is a property of $B$, independent of basis.
Applying this to our setting: $V = L$, $B(x, y) = \operatorname{tr}_{L/K}(xy)$. We have reduced the statement of the theorem to showing $\det G \neq 0$ for some (equivalently, any) $K$-basis of $L$.
[/guided]
[/step]
[step:Factor the Gram matrix as $S^\top S$ via the $K$-embeddings]
Let $\sigma_1, \ldots, \sigma_n: L \to \bar{K}$ be the $n$ distinct $K$-embeddings of $L$ into an algebraic closure $\bar{K}$ of $K$. (There are exactly $n = [L:K]$ of them because $L/K$ is separable.) By [Norm and Trace via Embeddings](/theorems/1577), for every $\beta \in L$,
\begin{align*}
\operatorname{tr}_{L/K}(\beta) = \sum_{k=1}^n \sigma_k(\beta).
\end{align*}
Define the matrix $S \in \bar{K}^{n \times n}$ by $S_{ij} := \sigma_i(\alpha_j)$. Compute the $(i, j)$ entry of $S^\top S$:
\begin{align*}
(S^\top S)_{ij} = \sum_{k=1}^n (S^\top)_{ik} S_{kj} = \sum_{k=1}^n S_{ki} S_{kj} = \sum_{k=1}^n \sigma_k(\alpha_i) \sigma_k(\alpha_j).
\end{align*}
Since each $\sigma_k$ is a ring homomorphism, $\sigma_k(\alpha_i)\sigma_k(\alpha_j) = \sigma_k(\alpha_i \alpha_j)$, so
\begin{align*}
(S^\top S)_{ij} = \sum_{k=1}^n \sigma_k(\alpha_i \alpha_j) = \operatorname{tr}_{L/K}(\alpha_i \alpha_j) = G_{ij}.
\end{align*}
Therefore $G = S^\top S$, and applying $\det$:
\begin{align*}
\det G = \det(S^\top S) = \det(S^\top) \det(S) = (\det S)^2. \tag{$\star$}
\end{align*}
[guided]
To evaluate $\det G$, we need a workable expression for the entries $\operatorname{tr}_{L/K}(\alpha_i \alpha_j) \in K$. The trace is defined as $\operatorname{tr}_{L/K}(\beta) = \operatorname{tr}(m_\beta)$ where $m_\beta: L \to L$ is multiplication by $\beta$, viewed as a $K$-linear endomorphism of the $n$-dimensional $K$-vector space $L$. This definition is basis-free but hard to manipulate computationally.
**The embedding formula.** The separability of $L/K$ provides a more symmetric expression. Let $\sigma_1, \ldots, \sigma_n: L \to \bar{K}$ denote the $n$ distinct $K$-embeddings of $L$ into an algebraic closure $\bar{K}$ of $K$ (each $\sigma_k$ fixes $K$ pointwise). Separability ensures there are exactly $n = [L:K]$ such embeddings (for inseparable extensions there would be fewer). By [Norm and Trace via Embeddings](/theorems/1577),
\begin{align*}
\operatorname{tr}_{L/K}(\beta) = \sum_{k=1}^n \sigma_k(\beta), \qquad \beta \in L.
\end{align*}
**Building the matrix $S$.** Define $S = (\sigma_i(\alpha_j))_{i,j=1}^n$: rows are indexed by embeddings, columns by basis elements. The transpose has entries $(S^\top)_{ik} = S_{ki} = \sigma_k(\alpha_i)$, so the $(i,j)$ entry of $S^\top S$ is
\begin{align*}
(S^\top S)_{ij} = \sum_k \sigma_k(\alpha_i) \sigma_k(\alpha_j).
\end{align*}
Here is where the **homomorphism property** of $\sigma_k$ matters: each $\sigma_k: L \to \bar{K}$ is a ring homomorphism, so $\sigma_k(\alpha_i) \sigma_k(\alpha_j) = \sigma_k(\alpha_i \alpha_j)$. Substituting:
\begin{align*}
(S^\top S)_{ij} = \sum_k \sigma_k(\alpha_i \alpha_j) = \operatorname{tr}_{L/K}(\alpha_i \alpha_j) = G_{ij}.
\end{align*}
So $G = S^\top S$ as matrices (entries a priori in $\bar{K}$, but $G$ actually has entries in $K$ since $\operatorname{tr}_{L/K}: L \to K$).
**Taking determinants.** Multiplicativity of $\det$ and $\det(S^\top) = \det(S)$ give
\begin{align*}
\det G = \det(S^\top) \det(S) = (\det S)^2.
\end{align*}
We have reduced the problem to showing $\det S \neq 0$ in $\bar{K}$.
[/guided]
[/step]
[step:Specialize to the basis $\alpha_j = \theta^{j-1}$ from the primitive element theorem]
By the observation in Step 1, it suffices to prove $\det G \neq 0$ for any single $K$-basis of $L$. We choose a convenient one. By the [primitive element theorem](/theorems/???) — applicable because $L/K$ is a finite separable extension — there exists $\theta \in L$ with $L = K(\theta)$. For such $\theta$, the powers
\begin{align*}
1, \theta, \theta^2, \ldots, \theta^{n-1}
\end{align*}
form a $K$-basis of $L$ (standard fact about simple extensions: $[K(\theta):K] = \deg p_\theta = n$, and $1, \theta, \ldots, \theta^{n-1}$ is a basis because any polynomial in $\theta$ of degree $\geq n$ can be reduced mod $p_\theta$).
Taking $\alpha_j := \theta^{j-1}$ for $j = 1, \ldots, n$, the matrix $S$ becomes
\begin{align*}
S_{ij} = \sigma_i(\alpha_j) = \sigma_i(\theta^{j-1}) = \sigma_i(\theta)^{j-1}.
\end{align*}
This is the **Vandermonde matrix** in the entries $\sigma_i(\theta)$, whose determinant is well-known:
\begin{align*}
\det S = \prod_{1 \leq i < j \leq n} (\sigma_j(\theta) - \sigma_i(\theta)). \tag{$\dagger$}
\end{align*}
[guided]
The factorization $G = S^\top S$ holds for *any* $K$-basis $\alpha_1, \ldots, \alpha_n$. By the basis-independence argument in Step 1, the non-vanishing of $\det G$ is a property of the form $B$, not the chosen basis, so we may evaluate $\det G$ for whichever basis makes the computation cleanest.
**Choosing the basis.** The [primitive element theorem](/theorems/???) states: every finite separable extension $L/K$ is generated by a single element — there exists $\theta \in L$ with $L = K(\theta)$. Our hypotheses ($L/K$ separable, $[L:K] = n < \infty$) are exactly what is needed.
With such a $\theta$, the powers $1, \theta, \theta^2, \ldots, \theta^{n-1}$ form a $K$-basis of $L = K(\theta)$: this is the standard basis of a simple algebraic extension. Reason: the minimal polynomial $p_\theta \in K[t]$ of $\theta$ over $K$ has degree $n = [L:K]$ (as $L = K[\theta] \cong K[t]/(p_\theta)$), so $\{1, \theta, \ldots, \theta^{n-1}\}$ is a $K$-basis.
**Substituting into $S$.** With $\alpha_j := \theta^{j-1}$,
\begin{align*}
S_{ij} = \sigma_i(\theta^{j-1}) = \sigma_i(\theta)^{j-1},
\end{align*}
using that $\sigma_i$ is a ring homomorphism so $\sigma_i(\theta^{j-1}) = \sigma_i(\theta)^{j-1}$. Writing $t_i := \sigma_i(\theta) \in \bar{K}$, the matrix is
\begin{align*}
S = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-1} \end{pmatrix}.
\end{align*}
This is the **Vandermonde matrix** in the variables $t_1, \ldots, t_n$. The Vandermonde determinant identity states
\begin{align*}
\det S = \prod_{1 \leq i < j \leq n} (t_j - t_i) = \prod_{i < j} (\sigma_j(\theta) - \sigma_i(\theta)).
\end{align*}
The identity is proved by showing both sides are polynomials in the $t_i$ of the same degree $\binom{n}{2}$, both vanish when any $t_i = t_j$, and comparing leading terms — it is a standard fact from linear algebra.
[/guided]
[/step]
[step:Conclude $\det S \neq 0$ from distinctness of embeddings on the primitive element]
From $(\dagger)$, $\det S = 0$ iff $\sigma_i(\theta) = \sigma_j(\theta)$ for some pair $i < j$. We show no such pair exists.
The embeddings $\sigma_1, \ldots, \sigma_n: L \to \bar{K}$ are distinct (by hypothesis, since separability of $L/K$ produces $n$ *distinct* $K$-embeddings). Suppose for contradiction $\sigma_i(\theta) = \sigma_j(\theta)$ for some $i \neq j$. Every element of $L = K(\theta)$ has the form $f(\theta)$ for some $f \in K[t]$ of degree $< n$. Applying $\sigma_i$ and $\sigma_j$ (both fix $K$ pointwise):
\begin{align*}
\sigma_i(f(\theta)) = f(\sigma_i(\theta)) = f(\sigma_j(\theta)) = \sigma_j(f(\theta)),
\end{align*}
for every $f \in K[t]$, hence for every element of $L$. Thus $\sigma_i = \sigma_j$ on $L$, contradicting distinctness.
Therefore $\sigma_i(\theta) \neq \sigma_j(\theta)$ for all $i < j$, every factor in $(\dagger)$ is nonzero, and
\begin{align*}
\det S = \prod_{i<j}(\sigma_j(\theta) - \sigma_i(\theta)) \neq 0 \quad \text{in } \bar{K}.
\end{align*}
Substituting into $(\star)$,
\begin{align*}
\det G = (\det S)^2 \neq 0 \quad \text{in } K,
\end{align*}
where we use that $\det G \in K$ (each entry $\operatorname{tr}_{L/K}(\alpha_i \alpha_j) \in K$) and $(\det S)^2$ is Galois-invariant hence lies in $K$. This completes the proof.
[guided]
From $(\dagger)$, $\det S$ is a product of differences $\sigma_j(\theta) - \sigma_i(\theta)$. A product of elements of a field is zero iff one of the factors is zero, so $\det S = 0$ iff some pair of embeddings agrees on $\theta$. We rule this out.
**Distinctness of embeddings.** By hypothesis the $\sigma_i$ are the $n$ distinct $K$-embeddings of $L$ into $\bar{K}$. Here "distinct" means as maps $L \to \bar{K}$: if $i \neq j$, there is some $x \in L$ with $\sigma_i(x) \neq \sigma_j(x)$.
**Why distinctness on $L$ implies distinctness on $\theta$.** This is where the **primitive element** property kicks in: since $L = K(\theta)$, every $x \in L$ has the form $x = f(\theta)$ for some $f \in K[t]$ (with $\deg f < n$, using the minimal polynomial relation). The embedding $\sigma_i$ fixes $K$ pointwise (by definition of $K$-embedding) and is a ring homomorphism, so
\begin{align*}
\sigma_i(f(\theta)) = f(\sigma_i(\theta)).
\end{align*}
(Expand $f(t) = \sum_k c_k t^k$ with $c_k \in K$: then $\sigma_i(\sum c_k \theta^k) = \sum \sigma_i(c_k) \sigma_i(\theta)^k = \sum c_k \sigma_i(\theta)^k = f(\sigma_i(\theta))$.)
So if $\sigma_i(\theta) = \sigma_j(\theta)$, then for every $x = f(\theta) \in L$,
\begin{align*}
\sigma_i(x) = f(\sigma_i(\theta)) = f(\sigma_j(\theta)) = \sigma_j(x),
\end{align*}
contradicting the distinctness $\sigma_i \neq \sigma_j$.
**Conclusion.** Every factor $\sigma_j(\theta) - \sigma_i(\theta)$ (for $i < j$) is nonzero, so
\begin{align*}
\det S = \prod_{i<j}(\sigma_j(\theta) - \sigma_i(\theta)) \neq 0
\end{align*}
in $\bar{K}$, and therefore $\det G = (\det S)^2 \neq 0$ in $\bar{K}$. Since $\det G \in K$ (every entry $\operatorname{tr}_{L/K}(\alpha_i \alpha_j) \in K$), we have $\det G \neq 0$ in $K$. By Step 1, the trace form is non-degenerate.
**Where every hypothesis is used.**
- *Separability*: produces exactly $n$ distinct $K$-embeddings (otherwise fewer, with multiplicities showing up in the trace formula and the matrix $S$ becoming row-deficient).
- *Separability*: again, in the primitive element theorem, to find $\theta$ with $L = K(\theta)$.
- *Primitive element*: lets us write $S$ as a Vandermonde matrix with a closed-form determinant.
For inseparable extensions the trace form is in fact degenerate — indeed the trace of every element is zero in characteristic $p$ for sufficiently inseparable extensions, so the embedding-theoretic strategy cannot work there.
[/guided]
[/step]