[proofplan]
We split into two cases. When $K$ is finite, $L$ is also finite, and the multiplicative group $L^*$ is a finite subgroup of the multiplicative group of a field; by the [Finite Multiplicative Subgroups Are Cyclic](/theorems/1266) theorem, $L^*$ is cyclic, and any generator is a primitive element. When $K$ is infinite, we reduce by induction on the number of generators to the two-generator case $L = K(\lambda, \beta)$. We then construct a candidate $\gamma = \beta + a\lambda$ for a carefully chosen $a \in K$, and show — by analysing the common roots of $P_\lambda$ and the polynomial $P_\beta(\gamma - at) \in K(\gamma)[t]$ — that for all but finitely many values of $a$, the element $\lambda$ is the unique common root, which forces $\lambda \in K(\gamma)$ via the Euclidean algorithm in $K(\gamma)[t]$. Since $K$ is infinite, a valid $a$ exists.
[/proofplan]
[step:Handle the finite field case via cyclicity of $L^*$]
Suppose $K$ is finite. Then $L$, being a finite extension of a finite field, is itself a finite field. The set $L^* := L \setminus \{0\}$ is the multiplicative group of the field $L$, of order $|L^*| = |L| - 1 < \infty$; in particular, it is a finite subgroup of that multiplicative group (namely the whole group). By the [Finite Multiplicative Subgroups Are Cyclic](/theorems/1266) theorem, $L^*$ is cyclic.
Let $\gamma$ be a generator of $L^*$. Every nonzero element of $L$ is a power of $\gamma$, so $L^* \subset K(\gamma)$. Since $K(\gamma)$ is a subfield of $L$ containing $L^*$, it also contains $0$, and therefore $L \subset K(\gamma)$. The reverse inclusion $K(\gamma) \subset L$ holds because $\gamma \in L$ and $L$ is a field containing $K$. Hence $L = K(\gamma)$.
[guided]
When $K$ is finite with $|K| = q$, the extension $L/K$ has degree $[L : K] = d$ for some $d \ge 1$, so $|L| = q^d$. The multiplicative group $L^* = L \setminus \{0\}$ has order $q^d - 1$.
The key observation is that $L^*$ sits inside the multiplicative group of the field $L$. Not every finite group is cyclic (e.g., $(\mathbb{Z}/8\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ is not cyclic), but finite subgroups of the multiplicative group of a *field* are always cyclic. The reason is the polynomial root-counting property of fields: for every $k \ge 1$, the polynomial $t^k - 1$ has at most $k$ roots in $L$, which constrains the order structure of $L^*$ so tightly that it must be cyclic. This is the content of [Finite Multiplicative Subgroups Are Cyclic](/theorems/1266).
Applying that theorem: $L^*$ is cyclic. Let $\gamma$ be a generator. Then $\langle \gamma \rangle = L^*$, so every nonzero element of $L$ is a power of $\gamma$ and hence lies in $K(\gamma)$. Since $K(\gamma)$ is a field (in particular closed under addition), it contains $0$ as well. Therefore $L \subset K(\gamma)$. Conversely, $\gamma \in L$ and $K \subset L$, so $K(\gamma) \subset L$. We conclude $L = K(\gamma)$.
Note that the separability hypothesis is not needed in the finite case — the argument works for any finite extension of a finite field. This is consistent with the fact that every algebraic extension of a finite field is automatically separable (the Frobenius endomorphism is injective, hence surjective on a finite field, so every finite field is perfect).
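As a small illustration (an example chosen here for concreteness, not part of the proof), take $K = \mathbb{F}_3$ and $L = \mathbb{F}_9 = \mathbb{F}_3(i)$ with $i^2 = -1$; this is a field because $-1$ is not a square in $\mathbb{F}_3$. Here $q = 3$, $d = 2$, and $|L^*| = 8$. The powers of the candidate $\gamma = 1 + i$ are
\begin{align*}
\gamma^1 &= 1 + i, & \gamma^2 &= 2i, & \gamma^3 &= 1 + 2i, & \gamma^4 &= 2, \\
\gamma^5 &= 2 + 2i, & \gamma^6 &= i, & \gamma^7 &= 2 + i, & \gamma^8 &= 1.
\end{align*}
These are all eight nonzero elements of $L$, so $\gamma$ generates $L^*$ and $L = \mathbb{F}_3(\gamma)$.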
[/guided]
[/step]
[step:Reduce the infinite-field case to two generators by induction]
Suppose $K$ is infinite. Since $L/K$ is a finite extension, we may write $L = K(\alpha_1, \ldots, \alpha_r)$ for finitely many elements $\alpha_1, \ldots, \alpha_r \in L$ that are algebraic over $K$.
We proceed by induction on $r$. The base case $r = 1$ gives $L = K(\alpha_1)$, which is already simple. For the inductive step, assume $r \ge 2$ and that every finite separable extension of $K$ generated by at most $r - 1$ elements is simple. Consider the sub-extension $K(\alpha_1, \ldots, \alpha_{r-1}) / K$. This is a finite extension contained in $L/K$. Since $L/K$ is separable and $K(\alpha_1, \ldots, \alpha_{r-1})$ is an intermediate field, the sub-extension $K(\alpha_1, \ldots, \alpha_{r-1})/K$ is also separable: every element $\alpha \in K(\alpha_1, \ldots, \alpha_{r-1})$ lies in $L$, so its minimal polynomial over $K$ has no repeated roots in $\bar{K}$ (by the separability of $L/K$). By the inductive hypothesis, $K(\alpha_1, \ldots, \alpha_{r-1}) = K(\beta)$ for some $\beta$. Then
\begin{align*}
L = K(\alpha_1, \ldots, \alpha_r) = K(\beta, \alpha_r).
\end{align*}
Renaming $\lambda := \alpha_r$, it suffices to prove the theorem for $L = K(\lambda, \beta)$ with $K$ infinite and $L/K$ finite and separable.
[guided]
The purpose of this step is to isolate the core difficulty. An extension $K(\alpha_1, \ldots, \alpha_r)$ with many generators looks complicated, but the induction peels off generators one at a time, replacing $r - 1$ of them with a single primitive element $\beta$. The argument rests on one structural fact: separability is inherited by sub-extensions.
Why does separability pass to $K(\alpha_1, \ldots, \alpha_{r-1})/K$? Take any $\alpha$ in $K(\alpha_1, \ldots, \alpha_{r-1})$. Since $K(\alpha_1, \ldots, \alpha_{r-1}) \subset L$ and $L/K$ is separable, the element $\alpha$ is separable over $K$ — its minimal polynomial $\operatorname{min}_K(\alpha)$ has distinct roots in $\bar{K}$. This holds for every element of the sub-extension, so $K(\alpha_1, \ldots, \alpha_{r-1})/K$ is separable.
After the induction, the entire problem collapses to the two-generator case $L = K(\lambda, \beta)$. The remaining steps handle this case.
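To see the reduction in a concrete case (an illustrative example, assuming $K = \mathbb{Q}$ and $r = 3$), consider $L = \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{5})$. The inductive hypothesis applies to $\mathbb{Q}(\sqrt{2}, \sqrt{3})$, which is generated by the single element $\sqrt{2} + \sqrt{3}$, so one possible choice is
\begin{align*}
L = \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{5}) = \mathbb{Q}(\beta, \lambda), \qquad \beta = \sqrt{2} + \sqrt{3}, \quad \lambda = \sqrt{5}.
\end{align*}
Only the two-generator problem for $\mathbb{Q}(\beta, \lambda)$ then remains.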
[/guided]
[/step]
[step:Set up the minimal polynomials and construct $\gamma = \beta + a\lambda$]
We now work in the two-generator case: $L = K(\lambda, \beta)$ with $K$ infinite and $L/K$ finite and separable. Let $P_\lambda \in K[t]$ denote the minimal polynomial of $\lambda$ over $K$, with $\deg P_\lambda = m$, and let $P_\beta \in K[t]$ denote the minimal polynomial of $\beta$ over $K$, with $\deg P_\beta = n$.
Since $L/K$ is separable, both $\lambda$ and $\beta$ are separable over $K$. Therefore $P_\lambda$ and $P_\beta$ have no repeated roots in $\bar{K}$. Let $\lambda_1 = \lambda, \lambda_2, \ldots, \lambda_m$ denote the distinct roots of $P_\lambda$ in $\bar{K}$, and let $\beta_1 = \beta, \beta_2, \ldots, \beta_n$ denote the distinct roots of $P_\beta$ in $\bar{K}$.
For a parameter $a \in K$ to be determined, define
\begin{align*}
\gamma := \beta + a\lambda \in L.
\end{align*}
Since $\gamma \in L$ and $K \subset K(\gamma) \subset L$, the extension $L/K$ is simple via $\gamma$ if and only if $K(\gamma) = L = K(\lambda, \beta)$. Since $\beta = \gamma - a\lambda$ and $a \in K \subset K(\gamma)$, we have $\beta \in K(\gamma)$ whenever $\lambda \in K(\gamma)$. Therefore it suffices to find $a \in K \setminus \{0\}$ such that $\lambda \in K(\gamma)$; the restriction $a \neq 0$ is a harmless convention, since it removes only one candidate value.
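As a running illustration of this setup (the specific choice $K = \mathbb{Q}$, $\lambda = \sqrt{2}$, $\beta = \sqrt{3}$ is an assumption made here for concreteness), the data read
\begin{align*}
P_\lambda(t) &= t^2 - 2, & \lambda_1 &= \sqrt{2}, & \lambda_2 &= -\sqrt{2}, & m &= 2, \\
P_\beta(t) &= t^2 - 3, & \beta_1 &= \sqrt{3}, & \beta_2 &= -\sqrt{3}, & n &= 2,
\end{align*}
and the candidate is $\gamma = \sqrt{3} + a\sqrt{2}$ with $a \in \mathbb{Q} \setminus \{0\}$ still to be chosen. The goal in this example is to pick $a$ so that $\sqrt{2} \in \mathbb{Q}(\gamma)$.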
[/step]
[step:Identify the common roots of $P_\lambda$ and $P_\beta(\gamma - at)$ in $K(\gamma)[t]$]
Define the polynomial
\begin{align*}
f(t) := P_\beta(\gamma - at) \in K(\gamma)[t].
\end{align*}
The coefficients of $f$ lie in $K(\gamma)$ because $\gamma \in K(\gamma)$, $a \in K \subset K(\gamma)$, and the coefficients of $P_\beta$ lie in $K \subset K(\gamma)$.
Evaluate $f$ at $t = \lambda$:
\begin{align*}
f(\lambda) = P_\beta(\gamma - a\lambda) = P_\beta(\beta) = 0,
\end{align*}
using $\gamma - a\lambda = \beta$. So $\lambda$ is a root of $f$. Since $\lambda$ is also a root of $P_\lambda$ (by definition of the minimal polynomial), $\lambda$ is a common root of $P_\lambda$ and $f$ in $\bar{K}$.
We now determine all common roots. An element $\lambda_j$ (a root of $P_\lambda$) is also a root of $f$ if and only if
\begin{align*}
P_\beta(\gamma - a\lambda_j) = 0,
\end{align*}
which holds if and only if $\gamma - a\lambda_j = \beta_i$ for some $i \in \{1, \ldots, n\}$. Substituting $\gamma = \beta + a\lambda$, this becomes
\begin{align*}
\beta + a\lambda - a\lambda_j = \beta_i,
\end{align*}
equivalently
\begin{align*}
a(\lambda - \lambda_j) = \beta_i - \beta.
\end{align*}
**Case $j = 1$:** Then $\lambda_j = \lambda$, so the left-hand side is $0$. The equation reduces to $\beta_i = \beta$, which holds only for $i = 1$. This confirms that $\lambda$ itself is always a common root, regardless of the choice of $a$.
**Case $j \ge 2$:** Then $\lambda_j \neq \lambda$, so $\lambda - \lambda_j \neq 0$ and we may solve:
\begin{align*}
a = \frac{\beta_i - \beta}{\lambda - \lambda_j}.
\end{align*}
For a fixed $j \ge 2$, the root $\lambda_j$ is a common root of $P_\lambda$ and $f$ if and only if $a$ equals this value for some $i \in \{1, \ldots, n\}$. For each pair $(i, j)$ with $1 \le i \le n$ and $2 \le j \le m$, there is at most one such "bad" value of $a$.
[guided]
The idea behind the polynomial $f(t) = P_\beta(\gamma - at)$ is to encode the condition "$\gamma - at$ is a root of $P_\beta$" as a polynomial equation in $t$. Since we chose $\gamma = \beta + a\lambda$, evaluating at $t = \lambda$ automatically gives $\gamma - a\lambda = \beta$, which is a root of $P_\beta$. So $\lambda$ is guaranteed to be a common root of $P_\lambda$ and $f$ — the question is whether it is the *only* common root.
The common roots of $P_\lambda$ and $f$ are precisely those $\lambda_j$ (roots of $P_\lambda$) for which $\gamma - a\lambda_j$ is a root of $P_\beta$ — that is, $\gamma - a\lambda_j = \beta_i$ for some $i$. The substitution $\gamma = \beta + a\lambda$ converts this into $a(\lambda - \lambda_j) = \beta_i - \beta$.
For $j = 1$, this is $0 = \beta_i - \beta$, which forces $i = 1$ (since the $\beta_i$ are distinct). So the pair $(\lambda, \beta) = (\lambda_1, \beta_1)$ always contributes a common root, as expected.
For $j \ge 2$, we can divide by $\lambda - \lambda_j \neq 0$ (the $\lambda_j$ are distinct by separability of $P_\lambda$) to get $a = (\beta_i - \beta)/(\lambda - \lambda_j)$. Each pair $(i, j)$ gives at most one forbidden value. The critical question is whether we can avoid all of them simultaneously — this is addressed in the next step.
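A sample computation may help (assuming, for illustration only, $K = \mathbb{Q}$, $\lambda = \sqrt{2}$, $\beta = \sqrt{3}$, so that $P_\lambda = t^2 - 2$ and $P_\beta = t^2 - 3$). The only root of $P_\lambda$ other than $\lambda$ is $\lambda_2 = -\sqrt{2}$, with $\lambda - \lambda_2 = 2\sqrt{2}$, and the two pairs $(i, 2)$ give
\begin{align*}
i = 1: \quad a &= \frac{\beta_1 - \beta}{\lambda - \lambda_2} = \frac{0}{2\sqrt{2}} = 0, \\
i = 2: \quad a &= \frac{\beta_2 - \beta}{\lambda - \lambda_2} = \frac{-2\sqrt{3}}{2\sqrt{2}} = -\frac{\sqrt{6}}{2}.
\end{align*}
So in this example $-\sqrt{2}$ is a second common root of $P_\lambda$ and $f$ exactly when $a \in \{0, -\sqrt{6}/2\}$.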
[/guided]
[/step]
[step:Choose $a \in K$ to make $\lambda$ the unique common root]
The set of "bad" values of $a$ — those for which some $\lambda_j$ with $j \ge 2$ is also a root of $f$ — is
\begin{align*}
S := \left\{ \frac{\beta_i - \beta}{\lambda - \lambda_j} : 1 \le i \le n,\; 2 \le j \le m \right\} \subset \bar{K}.
\end{align*}
This set has at most $(m - 1) \cdot n$ elements, so it is finite. Since $K$ is infinite, the set $K \setminus (S \cup \{0\})$ is nonempty (an infinite set cannot be exhausted by finitely many exclusions). Choose $a \in K \setminus (S \cup \{0\})$.
For this choice of $a$, the only common root of $P_\lambda$ and $f$ in $\bar{K}$ is $\lambda$ itself.
[guided]
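This is the step where the hypothesis "$K$ is infinite" is consumed. The forbidden set $S$ has at most $(m-1)n$ elements, a finite number. Some of these values may not lie in $K$ (they are defined in $\bar{K}$), but even if all of them did, $K$ being infinite guarantees that $K \setminus S$ is nonempty, and it stays nonempty after removing $0$ as well. Excluding $0$ keeps the choice in line with the restriction $a \in K \setminus \{0\}$ from the construction of $\gamma$; it is a harmless convention rather than a logical necessity, since $\beta = \gamma - a\lambda$ lies in $K(\gamma)$ for every $a \in K$ once $\lambda$ does.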
What happens over a finite field? If $|K| \le (m-1)n$, the set $S$ could exhaust all of $K$, and no valid $a$ would exist. This is precisely why the finite case requires a separate argument (via cyclicity of $L^*$) — the polynomial method does not apply when $K$ has too few elements.
With the choice of $a$ made, we know: among the roots $\lambda_1, \ldots, \lambda_m$ of $P_\lambda$, only $\lambda_1 = \lambda$ is also a root of $f(t) = P_\beta(\gamma - at)$.
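For a concrete instance of this choice (an illustrative example, assuming $K = \mathbb{Q}$, $\lambda = \sqrt{2}$, $\beta = \sqrt{3}$, hence $m = n = 2$ and $\lambda_2 = -\sqrt{2}$, $\beta_2 = -\sqrt{3}$), the bound $(m-1)n = 2$ is attained:
\begin{align*}
S = \left\{ \frac{\beta_i - \beta}{\lambda - \lambda_2} : i = 1, 2 \right\} = \left\{ 0,\; -\frac{\sqrt{6}}{2} \right\}, \qquad S \cap \mathbb{Q} = \{0\}.
\end{align*}
Every $a \in \mathbb{Q} \setminus \{0\}$ is therefore admissible in this example; taking $a = 1$ gives $\gamma = \sqrt{3} + \sqrt{2}$.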
[/guided]
[/step]
[step:Conclude $\lambda \in K(\gamma)$ via the gcd in $K(\gamma)[t]$]
Both $P_\lambda$ and $f$ are polynomials in $K(\gamma)[t]$: $P_\lambda$ has coefficients in $K \subset K(\gamma)$, and $f$ has coefficients in $K(\gamma)$ as shown above. Since $K(\gamma)$ is a field, $K(\gamma)[t]$ is a Euclidean domain, and the greatest common divisor $g := \gcd(P_\lambda, f) \in K(\gamma)[t]$ is well-defined (up to units).
The polynomial $g$ divides $P_\lambda$ in $K(\gamma)[t]$. Since $P_\lambda$ is separable (it has no repeated roots in $\bar{K}$), every divisor of $P_\lambda$ in $\bar{K}[t]$ also has no repeated roots. In particular, $g$ has no repeated roots.
The roots of $g$ in $\bar{K}$ are precisely the common roots of $P_\lambda$ and $f$ in $\bar{K}$. By the choice of $a$ in the previous step, $\lambda$ is the only common root. Since $g$ has no repeated roots and $\lambda$ is its sole root, $g$ must be linear:
\begin{align*}
g(t) = c(t - \lambda)
\end{align*}
for some $c \in K(\gamma) \setminus \{0\}$ (the leading coefficient is a unit in $K(\gamma)$). Since $g \in K(\gamma)[t]$, the coefficient of $t^0$ in $g$ — namely $-c\lambda$ — lies in $K(\gamma)$. As $c \in K(\gamma) \setminus \{0\}$ and $K(\gamma)$ is a field, we conclude $\lambda = -(-c\lambda)/c \in K(\gamma)$.
[guided]
The Euclidean algorithm in $K(\gamma)[t]$ is the mechanism that converts root information into coefficient information. We know that $P_\lambda, f \in K(\gamma)[t]$, so the Euclidean algorithm computes their gcd $g$ using only operations in $K(\gamma)$ — in particular, $g \in K(\gamma)[t]$.
The roots of $g$ over $\bar{K}$ are exactly the common roots of $P_\lambda$ and $f$ over $\bar{K}$. We established that $\lambda$ is the only such root. Moreover, $g$ divides $P_\lambda$, and $P_\lambda$ is separable, so $g$ is also separable (a divisor of a separable polynomial cannot introduce repeated roots). A separable polynomial with a single root must be linear: $g(t) = c(t - \lambda)$ for some nonzero $c$.
Since $g \in K(\gamma)[t]$, its coefficients $c$ and $-c\lambda$ both lie in $K(\gamma)$. Since $c \neq 0$ and $K(\gamma)$ is a field, we can divide: $\lambda = -(-c\lambda) \cdot c^{-1} \in K(\gamma)$.
Why is the Euclidean algorithm essential here? The common roots of $P_\lambda$ and $f$ live in $\bar{K}$ — a potentially enormous algebraically closed field. We cannot directly "see" these roots from within $K(\gamma)$. But the gcd, computed entirely within $K(\gamma)[t]$, captures all the shared root information. The gcd being linear with coefficients in $K(\gamma)$ is what pulls $\lambda$ down into $K(\gamma)$.
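The gcd can be computed by hand in a small case (a sketch under the illustrative assumptions $K = \mathbb{Q}$, $\lambda = \sqrt{2}$, $\beta = \sqrt{3}$, $a = 1$, so $\gamma = \sqrt{3} + \sqrt{2}$, $P_\lambda = t^2 - 2$ and $P_\beta = t^2 - 3$). One step of the Euclidean algorithm already produces a linear polynomial:
\begin{align*}
f(t) &= (\gamma - t)^2 - 3 = t^2 - 2\gamma t + \gamma^2 - 3, \\
f(t) - P_\lambda(t) &= -2\gamma t + (\gamma^2 - 1).
\end{align*}
The gcd $g$ divides this linear polynomial and has $\lambda$ as a root, so up to a unit $g(t) = t - \frac{\gamma^2 - 1}{2\gamma}$. Reading off the root and using $\gamma^2 = 5 + 2\sqrt{6}$ confirms the conclusion:
\begin{align*}
\frac{\gamma^2 - 1}{2\gamma} = \frac{2 + \sqrt{6}}{\sqrt{3} + \sqrt{2}} = (2 + \sqrt{6})(\sqrt{3} - \sqrt{2}) = \sqrt{2} = \lambda.
\end{align*}
The expression on the left involves only $\gamma$ and rational numbers, which is what places $\sqrt{2}$ in $\mathbb{Q}(\gamma)$.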
[/guided]
[/step]
[step:Assemble the conclusion $L = K(\gamma)$]
From the previous step, $\lambda \in K(\gamma)$. Since $a \in K \subset K(\gamma)$ and $\gamma \in K(\gamma)$, we have
\begin{align*}
\beta = \gamma - a\lambda \in K(\gamma).
\end{align*}
Therefore both $\lambda$ and $\beta$ belong to $K(\gamma)$, giving $K(\lambda, \beta) \subset K(\gamma)$. Conversely, $\gamma = \beta + a\lambda \in K(\lambda, \beta)$, so $K(\gamma) \subset K(\lambda, \beta)$. Hence
\begin{align*}
L = K(\lambda, \beta) = K(\gamma).
\end{align*}
This proves that the two-generator extension $K(\lambda, \beta)/K$ is simple. Combined with the inductive reduction from the second step and the finite field case from the first step, the proof is complete: every finite separable extension $L/K$ satisfies $L = K(\gamma)$ for some $\gamma \in L$.
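As a closing sanity check (an illustrative example with the assumed data $K = \mathbb{Q}$, $\lambda = \sqrt{2}$, $\beta = \sqrt{3}$, $a = 1$, $\gamma = \sqrt{3} + \sqrt{2}$), one verifies directly that $\sqrt{2} = \frac{\gamma^2 - 1}{2\gamma}$, and then
\begin{align*}
\beta = \gamma - a\lambda = \gamma - \frac{\gamma^2 - 1}{2\gamma} = \frac{\gamma^2 + 1}{2\gamma} = \sqrt{3}.
\end{align*}
Both generators are thus rational expressions in $\gamma$. Consistently, $\gamma^2 = 5 + 2\sqrt{6}$ gives $(\gamma^2 - 5)^2 = 24$, so $\gamma$ is a root of $t^4 - 10t^2 + 1$, a polynomial of degree $4 = [\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}]$.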
[/step]