[proofplan]
We fix an arbitrary $\sigma \in \operatorname{Gal}(L/K)$ and lift it to the automorphism $\tau \in \operatorname{Gal}(M/K)$ that maximises $i_{M/K}(\tau)$ among all lifts of $\sigma$. The core of the proof is the pointwise identity $i_{M/K}(\tau g) = \min(i_{M/L}(g),\, i_{M/K}(\tau))$ for all $g \in H$, which we establish using the ultrametric inequality on $\mathcal{O}_M = \mathcal{O}_K[\alpha]$ and the maximality of $\tau$. From this identity, we derive $i_{L/K}(\sigma) - 1 = \eta_{M/L}(i_{M/K}(\tau) - 1)$ by expanding both sides via the averaging formula for $i_{L/K}$ and the integral definition of $\eta_{M/L}$. The theorem then follows from the chain of equivalences: $\sigma \in G_s(M/K)H/H$ iff $i_{M/K}(\tau) - 1 \geq s$ iff $i_{L/K}(\sigma) - 1 \geq \eta_{M/L}(s) = t$ iff $\sigma \in G_t(L/K)$.
[/proofplan]
[step:Choose a lift $\tau$ of $\sigma$ that maximises $i_{M/K}(\tau)$]
Let $\sigma \in \operatorname{Gal}(L/K)$. The coset of lifts of $\sigma$ to $\operatorname{Gal}(M/K)$ is $\{\tau g : g \in H\}$ for any single lift $\tau$. Since $H = \operatorname{Gal}(M/L)$ is finite, we may choose $\tau$ in this coset so that
\begin{align*}
i_{M/K}(\tau) = \max_{g \in H} i_{M/K}(\tau g).
\end{align*}
Fix such a $\tau$ for the remainder of the proof. Let $\alpha \in \mathcal{O}_M$ be a generator of the extension $M/K$, so that $\mathcal{O}_M = \mathcal{O}_K[\alpha]$.
[/step]
[step:Establish the pointwise identity $i_{M/K}(\tau g) = \min(i_{M/L}(g),\, i_{M/K}(\tau))$ for all $g \in H$]
Recall that for any $\rho \in \operatorname{Gal}(M/K) \setminus \{\operatorname{id}\}$, the index $i_{M/K}(\rho)$ is defined by
\begin{align*}
i_{M/K}(\rho) = v_M(\rho(\alpha) - \alpha),
\end{align*}
where $v_M$ is the normalised valuation on $M$ and $\mathcal{O}_M = \mathcal{O}_K[\alpha]$.
**Lower bound.** Write $\tau g(\alpha) - \alpha = (\tau g(\alpha) - \tau(\alpha)) + (\tau(\alpha) - \alpha)$. Since $\tau$ is an isometry of $M$ (it preserves $v_M$), we have
\begin{align*}
v_M(\tau g(\alpha) - \tau(\alpha)) = v_M(g(\alpha) - \alpha) = i_{M/K}(g).
\end{align*}
Note that for $g \in H = \operatorname{Gal}(M/L)$, the element $g$ fixes $L$ pointwise, and in particular $g$ fixes $\mathcal{O}_L$. Since $\mathcal{O}_L = \mathcal{O}_K[\alpha] \cap L$, we have $i_{M/K}(g) = i_{M/L}(g)$ when $g \neq \operatorname{id}$ (the indices computed via $\alpha$ agree because $\mathcal{O}_M = \mathcal{O}_K[\alpha]$ and the valuation is the same). By the ultrametric inequality on $v_M$:
\begin{align*}
i_{M/K}(\tau g) = v_M(\tau g(\alpha) - \alpha) \geq \min(v_M(\tau g(\alpha) - \tau(\alpha)),\, v_M(\tau(\alpha) - \alpha)) = \min(i_{M/L}(g),\, i_{M/K}(\tau)).
\end{align*}
When $g = \operatorname{id}$, we have $\tau g = \tau$ and $i_{M/L}(\operatorname{id}) = +\infty$ by convention, so $\min(+\infty, i_{M/K}(\tau)) = i_{M/K}(\tau) = i_{M/K}(\tau g)$, and the identity holds.
**Upper bound and case analysis for $g \neq \operatorname{id}$.** From the maximality of $\tau$:
\begin{align*}
i_{M/K}(\tau) \geq i_{M/K}(\tau g).
\end{align*}
*Case 1: $i_{M/L}(g) \geq i_{M/K}(\tau)$.* The lower bound gives $i_{M/K}(\tau g) \geq \min(i_{M/L}(g), i_{M/K}(\tau)) = i_{M/K}(\tau)$. Combined with the upper bound $i_{M/K}(\tau g) \leq i_{M/K}(\tau)$, we get $i_{M/K}(\tau g) = i_{M/K}(\tau) = \min(i_{M/L}(g), i_{M/K}(\tau))$.
*Case 2: $i_{M/L}(g) < i_{M/K}(\tau)$.* The two terms in $\tau g(\alpha) - \alpha = (\tau g(\alpha) - \tau(\alpha)) + (\tau(\alpha) - \alpha)$ have valuations $i_{M/L}(g)$ and $i_{M/K}(\tau)$, which are distinct. By the ultrametric equality for terms of different valuation (the [Isosceles Triangle Principle](/theorems/???)), the valuation of the sum equals the minimum:
\begin{align*}
i_{M/K}(\tau g) = v_M(\tau g(\alpha) - \alpha) = \min(i_{M/L}(g),\, i_{M/K}(\tau)) = i_{M/L}(g).
\end{align*}
In both cases, $i_{M/K}(\tau g) = \min(i_{M/L}(g),\, i_{M/K}(\tau))$.
[guided]
The heart of Herbrand's theorem is this pointwise identity. The strategy is to decompose $\tau g(\alpha) - \alpha$ into two pieces — one measuring $g$ and one measuring $\tau$ — and use ultrametric arithmetic to determine the valuation of the sum.
Write $\tau g(\alpha) - \alpha = (\tau g(\alpha) - \tau(\alpha)) + (\tau(\alpha) - \alpha)$. Since $\tau$ is an automorphism of $M$ preserving the valuation $v_M$, the first term has valuation
\begin{align*}
v_M(\tau g(\alpha) - \tau(\alpha)) = v_M(\tau(g(\alpha) - \alpha)) = v_M(g(\alpha) - \alpha) = i_{M/L}(g),
\end{align*}
and the second term has valuation $v_M(\tau(\alpha) - \alpha) = i_{M/K}(\tau)$.
The ultrametric inequality gives the lower bound $i_{M/K}(\tau g) \geq \min(i_{M/L}(g), i_{M/K}(\tau))$. The upper bound $i_{M/K}(\tau g) \leq i_{M/K}(\tau)$ is the maximality hypothesis on $\tau$.
Now we split into cases. If $i_{M/L}(g) \geq i_{M/K}(\tau)$, both the lower and upper bounds equal $i_{M/K}(\tau)$, so $i_{M/K}(\tau g) = i_{M/K}(\tau) = \min(i_{M/L}(g), i_{M/K}(\tau))$.
If $i_{M/L}(g) < i_{M/K}(\tau)$, the two terms have *distinct* valuations. The Isosceles Triangle Principle in non-archimedean valued fields states: if $|x| \neq |y|$, then $|x + y| = \max(|x|, |y|)$. Equivalently, for valuations, if $v(a) \neq v(b)$, then $v(a + b) = \min(v(a), v(b))$. Applying this with $a = \tau g(\alpha) - \tau(\alpha)$ (valuation $i_{M/L}(g)$) and $b = \tau(\alpha) - \alpha$ (valuation $i_{M/K}(\tau) > i_{M/L}(g)$), the valuation of the sum is $i_{M/L}(g)$, which equals $\min(i_{M/L}(g), i_{M/K}(\tau))$.
Why is the maximality hypothesis essential? Without it, we would have no upper bound on $i_{M/K}(\tau g)$ in Case 1. Different lifts of $\sigma$ give different values of $i_{M/K}(\tau g)$, and the identity holds only for the *maximal* lift.
[/guided]
[/step]
[step:Derive the averaging identity $i_{L/K}(\sigma) - 1 = \eta_{M/L}(i_{M/K}(\tau) - 1)$]
The index $i_{L/K}(\sigma)$ for $\sigma \in \operatorname{Gal}(L/K)$ can be computed via the averaging formula:
\begin{align*}
i_{L/K}(\sigma) = \frac{1}{e_{M/L}} \sum_{g \in H} i_{M/K}(\tau g),
\end{align*}
where $e_{M/L} = |H_0| = |G_0(M/L)|$ is the ramification index of $M/L$, and $\tau$ is any lift of $\sigma$ (the value is independent of the choice of lift). Applying the pointwise identity from the previous step:
\begin{align*}
i_{L/K}(\sigma) = \frac{1}{e_{M/L}} \sum_{g \in H} \min(i_{M/L}(g),\, i_{M/K}(\tau)).
\end{align*}
The Herbrand function $\eta_{M/L}$ is defined by
\begin{align*}
\eta_{M/L}(s) = \frac{1}{e_{M/L}} \sum_{g \in H} \min(i_{M/L}(g),\, s + 1) - 1.
\end{align*}
Substituting $s = i_{M/K}(\tau) - 1$:
\begin{align*}
\eta_{M/L}(i_{M/K}(\tau) - 1) &= \frac{1}{e_{M/L}} \sum_{g \in H} \min(i_{M/L}(g),\, i_{M/K}(\tau)) - 1.
\end{align*}
Comparing with the formula for $i_{L/K}(\sigma)$:
\begin{align*}
i_{L/K}(\sigma) = \eta_{M/L}(i_{M/K}(\tau) - 1) + 1,
\end{align*}
which gives the desired identity $i_{L/K}(\sigma) - 1 = \eta_{M/L}(i_{M/K}(\tau) - 1)$.
[/step]
[step:Conclude the theorem via a chain of equivalences]
Set $t = \eta_{M/L}(s)$. For $\sigma \in \operatorname{Gal}(L/K)$ with maximal lift $\tau$, the following equivalences hold:
\begin{align*}
\sigma \in \frac{G_s(M/K) \cdot H}{H}
&\iff \text{some lift of } \sigma \text{ lies in } G_s(M/K) \\
&\iff i_{M/K}(\tau) - 1 \geq s \quad \text{(since } \tau \text{ is the lift maximising } i_{M/K}\text{)} \\
&\iff \eta_{M/L}(i_{M/K}(\tau) - 1) \geq \eta_{M/L}(s) \quad \text{(since } \eta_{M/L} \text{ is non-decreasing)} \\
&\iff i_{L/K}(\sigma) - 1 \geq t \\
&\iff \sigma \in G_t(L/K).
\end{align*}
The second equivalence uses the maximality of $\tau$: $\sigma$ has *some* lift in $G_s(M/K)$ if and only if the lift with the largest $i_{M/K}$-value lies in $G_s(M/K)$. The third equivalence uses that $\eta_{M/L}$ is a non-decreasing piecewise-linear function (it is a non-negative linear combination of $\min$ terms). The fourth equivalence is the identity from the previous step.
Since this holds for every $\sigma \in \operatorname{Gal}(L/K)$, we conclude
\begin{align*}
\frac{G_s(M/K) \cdot H}{H} = G_t(L/K),
\end{align*}
where $t = \eta_{M/L}(s)$. The isomorphism form follows from the second isomorphism theorem: $G_s(M/K) \cdot H / H \cong G_s(M/K) / (G_s(M/K) \cap H) = G_s(M/K) / G_s(M/L)$, since $G_s(M/K) \cap H = G_s(M/L)$ by definition of the lower numbering ramification groups.
[guided]
The chain of equivalences translates the condition "$\sigma$ belongs to the image of $G_s(M/K)$ in $\operatorname{Gal}(L/K)$" into the condition "$\sigma \in G_t(L/K)$" via the Herbrand function.
Let us verify each step carefully.
**First equivalence.** By definition, $G_s(M/K) \cdot H / H$ consists of cosets $\tau' H$ where $\tau' \in G_s(M/K)$. An element $\sigma \in \operatorname{Gal}(L/K)$ lies in this quotient if and only if it has at least one lift $\tau'$ with $i_{M/K}(\tau') - 1 \geq s$.
**Second equivalence.** Among all lifts of $\sigma$, the one maximising $i_{M/K}$ is $\tau$ (by construction). If $\tau \in G_s(M/K)$, then $\sigma$ has a lift in $G_s(M/K)$. Conversely, if some lift $\tau g \in G_s(M/K)$, then $i_{M/K}(\tau g) - 1 \geq s$, and by maximality $i_{M/K}(\tau) \geq i_{M/K}(\tau g)$, so $i_{M/K}(\tau) - 1 \geq s$ as well.
**Third equivalence.** The function $\eta_{M/L}$ is non-decreasing because it is defined as $\eta_{M/L}(s) = \frac{1}{e_{M/L}} \sum_{g \in H} \min(i_{M/L}(g), s + 1) - 1$, and $\min(i_{M/L}(g), s + 1)$ is non-decreasing in $s$ for each $g$. Therefore $i_{M/K}(\tau) - 1 \geq s$ implies $\eta_{M/L}(i_{M/K}(\tau) - 1) \geq \eta_{M/L}(s)$. For the converse, if $i_{M/K}(\tau) - 1 < s$, then $\eta_{M/L}(i_{M/K}(\tau) - 1) < \eta_{M/L}(s)$ because $\eta_{M/L}$ is strictly increasing on any interval where at least one ramification break is not yet reached (in fact $\eta_{M/L}$ is piecewise linear with positive slopes on $[-1, \infty)$).
**Fourth equivalence.** This is exactly the identity $i_{L/K}(\sigma) - 1 = \eta_{M/L}(i_{M/K}(\tau) - 1)$ proved in the previous step.
The isomorphism form $G_t(L/K) \cong G_s(M/K) / G_s(M/L)$ follows from the second isomorphism theorem for groups: $G_s(M/K) \cdot H / H \cong G_s(M/K) / (G_s(M/K) \cap H)$. Since $H = \operatorname{Gal}(M/L)$ and the lower numbering is defined intrinsically on $\operatorname{Gal}(M/K)$, an element $g \in H$ lies in $G_s(M/K)$ if and only if $i_{M/K}(g) - 1 \geq s$, which for $g \in H$ is equivalent to $i_{M/L}(g) - 1 \geq s$, i.e., $g \in G_s(M/L)$. Therefore $G_s(M/K) \cap H = G_s(M/L)$.
[/guided]
[/step]