[proofplan]
We prove that $|\mathrm{Gal}(L/K)| \leq [L:K]$ for any finite extension $L/K$ by induction on the number of generators. Write $L = K(\alpha_1, \ldots, \alpha_r)$ and form the tower $K = F_0 \subset F_1 \subset \cdots \subset F_r = L$ where $F_i = K(\alpha_1, \ldots, \alpha_i)$. Each $K$-automorphism of $L$ must send $\alpha_i$ to a root of its minimal polynomial over $F_{i-1}$, giving at most $d_i = [F_i : F_{i-1}]$ choices at the $i$-th step. The product $d_1 \cdots d_r = [L:K]$ by the Tower Law, completing the bound.
[/proofplan]
[step:Construct the generating tower]
Since $L/K$ is a finite extension, $L$ is finitely generated over $K$. Write $L = K(\alpha_1, \ldots, \alpha_r)$ and define intermediate fields
\begin{align*}
F_0 &= K, \\
F_i &= K(\alpha_1, \ldots, \alpha_i) \quad \text{for } i = 1, \ldots, r,
\end{align*}
so that $F_r = L$. Each step $F_{i-1} \subset F_i = F_{i-1}(\alpha_i)$ is a simple extension. Let $f_i \in F_{i-1}[x]$ be the minimal polynomial of $\alpha_i$ over $F_{i-1}$, and set $d_i = \deg f_i = [F_i : F_{i-1}]$.
[/step]
[step:Count the choices at each level of the tower]
Let $\sigma \in \mathrm{Gal}(L/K)$. Because $\sigma$ fixes $K$ pointwise, it is completely determined by the images $\sigma(\alpha_1), \ldots, \sigma(\alpha_r)$.
Consider what happens at the $i$-th level. Suppose the restriction of $\sigma$ to $F_{i-1}$ has already been determined (the base case $i = 1$ is trivial since $\sigma$ fixes $F_0 = K$). Apply $\sigma$ to the relation $f_i(\alpha_i) = 0$. Because the coefficients of $f_i$ lie in $F_{i-1}$ and $\sigma$ acts on them as $\sigma|_{F_{i-1}}$, we obtain
\begin{align*}
\sigma(f_i)(\sigma(\alpha_i)) = 0,
\end{align*}
where $\sigma(f_i)$ denotes the polynomial obtained by applying $\sigma$ to each coefficient of $f_i$. In particular, $\sigma(\alpha_i)$ must be a root of $\sigma(f_i)$ that lies in $L$. Since $\deg \sigma(f_i) = \deg f_i = d_i$, there are at most $d_i$ roots in $L$, so there are **at most $d_i$ possible values** for $\sigma(\alpha_i)$ once $\sigma|_{F_{i-1}}$ is fixed.
[guided]
Two distinct phenomena can make the count strictly less than $d_i$:
1. **Missing roots (failure of normality).** The polynomial $\sigma(f_i)$ may have roots that do not lie in $L$. For example, $\mathbb{Q}(\sqrt[3]{2})/\mathbb{Q}$ has degree $3$, but the minimal polynomial $x^3 - 2$ has only one real root in $\mathbb{Q}(\sqrt[3]{2})$; the other two roots are complex. Hence $|\mathrm{Gal}(\mathbb{Q}(\sqrt[3]{2})/\mathbb{Q})| = 1 < 3$.
2. **Repeated roots (failure of separability).** Over fields of positive characteristic, $f_i$ can be inseparable, meaning it has fewer than $d_i$ distinct roots in any extension. For instance, if $K = \mathbb{F}_p(t)$ and $\alpha^p = t$, then $f = x^p - t$ has $\alpha$ as its only root (with multiplicity $p$).
Equality $|\mathrm{Gal}(L/K)| = [L:K]$ holds at every step simultaneously precisely when $L/K$ is a **Galois extension**, i.e., both **normal** (every irreducible polynomial over $K$ that has one root in $L$ splits completely in $L$) and **separable** (every element of $L$ is a root of a separable polynomial over $K$). This is exactly the characterization proved in the Fundamental Theorem of Galois Theory.
[/guided]
[/step]
[step:Multiply via the Tower Law]
Combining all levels, each $\sigma \in \mathrm{Gal}(L/K)$ is determined by the sequence of choices $(\sigma(\alpha_1), \ldots, \sigma(\alpha_r))$, with at most $d_i$ options at step $i$. Therefore
\begin{align*}
|\mathrm{Gal}(L/K)| \;\leq\; d_1 \cdot d_2 \cdots d_r.
\end{align*}
By the Tower Law applied to the chain $K = F_0 \subset F_1 \subset \cdots \subset F_r = L$,
\begin{align*}
[L : K] \;=\; [F_r : F_{r-1}]\cdot [F_{r-1} : F_{r-2}] \cdots [F_1 : F_0] \;=\; d_r \cdot d_{r-1} \cdots d_1.
\end{align*}
Hence $|\mathrm{Gal}(L/K)| \leq [L:K]$. $\blacksquare$
[/step]