[proofplan]
We show that the upper-numbering ramification filtration is compatible with quotients by proving $G^t(M/K) \cdot H / H = G^t(L/K)$ for all $t \geq -1$. The key ingredient is the composition law $\eta_{M/K} = \eta_{L/K} \circ \eta_{M/L}$ for the Herbrand functions, which we establish by verifying that both sides are continuous and piecewise-linear, agree on $[-1, 0]$, and have identical slopes on each linear piece (using the integral formula and [Herbrand's Theorem](/theorems/???)). With the composition law in hand, we apply Herbrand's theorem at $s = \psi_{M/K}(t)$ and use invertibility of $\eta_{L/K}$ to identify the resulting lower-numbering index with $\psi_{L/K}(t)$, yielding the desired compatibility.
[/proofplan]
[step:Establish the composition law $\eta_{M/K} = \eta_{L/K} \circ \eta_{M/L}$]
Define the Herbrand function for a finite Galois extension $E/F$ by
\begin{align*}
\eta_{E/F}(s) = \int_0^s \frac{|G_{\lceil u \rceil}(E/F)|}{|G_0(E/F)|} \, d\mathcal{L}^1(u)
\end{align*}
for $s \geq 0$ (extended to $s \in [-1, 0]$ by $\eta_{E/F}(s) = s$). Both $\eta_{M/K}$ and $\eta_{L/K} \circ \eta_{M/L}$ are continuous piecewise-linear functions $[-1, \infty) \to [-1, \infty)$ that restrict to the identity on $[-1, 0]$; in particular $\eta_{M/K}(-1) = -1$ and $(\eta_{L/K} \circ \eta_{M/L})(-1) = -1$.
To show they agree, it suffices to show they have the same derivative wherever both are differentiable (i.e., away from the finitely many break points). At a point $s > 0$ that is not a break of the lower-numbering filtration of $M/K$ or $M/L$, and for which $\eta_{M/L}(s)$ is not a break for $L/K$, the derivative of $\eta_{M/K}$ is
\begin{align*}
\eta_{M/K}'(s) = \frac{|G_{\lceil s \rceil}(M/K)|}{|G_0(M/K)|}.
\end{align*}
By the chain rule, the derivative of $\eta_{L/K} \circ \eta_{M/L}$ is
\begin{align*}
(\eta_{L/K} \circ \eta_{M/L})'(s) = \eta_{L/K}'(\eta_{M/L}(s)) \cdot \eta_{M/L}'(s) = \frac{|G_{\lceil \eta_{M/L}(s) \rceil}(L/K)|}{|G_0(L/K)|} \cdot \frac{|G_{\lceil s \rceil}(M/L)|}{|G_0(M/L)|}.
\end{align*}
By [Herbrand's Theorem](/theorems/???) applied at the real index $s$ (recall that $G_s = G_{\lceil s \rceil}$), $G_{\lceil s \rceil}(M/K) \cdot H / H = G_{\lceil \eta_{M/L}(s) \rceil}(L/K)$, which gives
\begin{align*}
|G_{\lceil \eta_{M/L}(s) \rceil}(L/K)| = \frac{|G_{\lceil s \rceil}(M/K)|}{|G_{\lceil s \rceil}(M/L)|}
\end{align*}
(using $|G_{\lceil s \rceil}(M/K) \cdot H / H| = |G_{\lceil s \rceil}(M/K)| / |G_{\lceil s \rceil}(M/K) \cap H| = |G_{\lceil s \rceil}(M/K)| / |G_{\lceil s \rceil}(M/L)|$). Since each inertia group has order equal to the corresponding ramification index and $e_{M/K} = e_{M/L} \cdot e_{L/K}$, we also have $|G_0(M/K)| = |G_0(M/L)| \cdot |G_0(L/K)|$. Substitution gives
\begin{align*}
\frac{|G_{\lceil \eta_{M/L}(s) \rceil}(L/K)|}{|G_0(L/K)|} \cdot \frac{|G_{\lceil s \rceil}(M/L)|}{|G_0(M/L)|} = \frac{|G_{\lceil s \rceil}(M/K)| \cdot |G_{\lceil s \rceil}(M/L)|}{|G_{\lceil s \rceil}(M/L)| \cdot |G_0(L/K)| \cdot |G_0(M/L)|} = \frac{|G_{\lceil s \rceil}(M/K)|}{|G_0(M/K)|}.
\end{align*}
Since the derivatives agree at all but finitely many points of $(0, \infty)$ and both functions are continuous and equal to the identity on $[-1, 0]$, we conclude $\eta_{M/K} = \eta_{L/K} \circ \eta_{M/L}$.
[guided]
The composition law for Herbrand functions is the engine that makes the upper numbering well-behaved in towers. Let us carefully verify why it holds.
Both $\eta_{M/K}$ and $\eta_{L/K} \circ \eta_{M/L}$ are continuous, piecewise-linear functions on $[-1, \infty)$, both equal to the identity on $[-1, 0]$, both taking $-1$ to $-1$. Two such functions agree everywhere provided they have the same derivative on every linear piece.
The derivative of $\eta_{M/K}$ at a non-break point $s > 0$ is $\eta_{M/K}'(s) = |G_{\lceil s \rceil}(M/K)| / |G_0(M/K)|$, which is the reciprocal of the index of the $\lceil s \rceil$-th ramification group in the inertia group.
For the composition, the chain rule gives $(\eta_{L/K} \circ \eta_{M/L})'(s) = \eta_{L/K}'(\eta_{M/L}(s)) \cdot \eta_{M/L}'(s)$. This product involves two ratios of ramification group orders. Herbrand's theorem relates the ramification groups in the tower: the quotient $G_s(M/K) \cdot H / H \cong G_s(M/K) / G_s(M/L)$ is equal to $G_{\eta_{M/L}(s)}(L/K)$. This lets us write $|G_{\eta_{M/L}(s)}(L/K)| = |G_s(M/K)| / |G_s(M/L)|$. Combined with $|G_0(M/K)| = |G_0(M/L)| \cdot |G_0(L/K)|$, the two-factor product collapses to $|G_{\lceil s \rceil}(M/K)| / |G_0(M/K)|$.
Why is this composition law important? It implies that $\psi_{M/K} = \psi_{M/L} \circ \psi_{L/K}$ (compose the inverses in the opposite order), and therefore the upper numbering — defined via $G^t = G_{\psi(t)}$ — satisfies the compatibility $G^t(M/K) \cdot H / H = G^t(L/K)$, unlike the lower numbering which does not pass to quotients cleanly.
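To see the composition law in a concrete case, here is a worked instance. It is an illustration under stated assumptions, not part of the proof. Assume $p$ is an odd prime and take $K = \mathbb{Q}_p$, $L = \mathbb{Q}_p(\zeta_p)$, $M = \mathbb{Q}_p(\zeta_{p^2})$, so that $|\operatorname{Gal}(M/K)| = p(p-1)$ and $|H| = p$. We take as given the standard description of the lower-numbering groups of this cyclotomic extension: $G_0(M/K) = \operatorname{Gal}(M/K)$, $G_i(M/K) = H$ for $1 \leq i \leq p-1$, and $G_i(M/K) = 1$ for $i \geq p$. Intersecting with $H$ gives $G_i(M/L) = H$ for $0 \leq i \leq p-1$ and $G_i(M/L) = 1$ for $i \geq p$, while $L/K$ is tamely ramified of degree $p-1$. Since the slope of each Herbrand function is the reciprocal of the index $[G_0 : G_{\lceil s \rceil}]$, for $s \geq 0$ we get
\begin{align*}
\eta_{M/K}(s) &= \begin{cases} \dfrac{s}{p-1}, & 0 \leq s \leq p-1, \\[2mm] 1 + \dfrac{s-(p-1)}{p(p-1)}, & s \geq p-1, \end{cases} \\
\eta_{M/L}(s) &= \begin{cases} s, & 0 \leq s \leq p-1, \\[1mm] (p-1) + \dfrac{s-(p-1)}{p}, & s \geq p-1, \end{cases} \\
\eta_{L/K}(s) &= \frac{s}{p-1}.
\end{align*}
Composing the last two, for $s \geq p-1$,
\begin{align*}
\eta_{L/K}(\eta_{M/L}(s)) = \frac{1}{p-1}\left( (p-1) + \frac{s-(p-1)}{p} \right) = 1 + \frac{s-(p-1)}{p(p-1)} = \eta_{M/K}(s),
\end{align*}
and for $0 \leq s \leq p-1$ both sides equal $s/(p-1)$, in agreement with the composition law.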
[/guided]
[/step]
[step:Translate Herbrand's theorem from lower to upper numbering using the composition law]
Let $t \in [-1, \infty)$. Define $s = \psi_{M/K}(t)$, where $\psi_{M/K} = \eta_{M/K}^{-1}$ is the inverse Herbrand function. By definition of the upper-numbering filtration:
\begin{align*}
G^t(M/K) = G_s(M/K) = G_{\psi_{M/K}(t)}(M/K).
\end{align*}
Set $u = \eta_{M/L}(s)$. By [Herbrand's Theorem](/theorems/???):
\begin{align*}
\frac{G_s(M/K) \cdot H}{H} = G_u(L/K).
\end{align*}
We claim $u = \psi_{L/K}(t)$. Apply $\eta_{L/K}$ to $u$:
\begin{align*}
\eta_{L/K}(u) = \eta_{L/K}(\eta_{M/L}(s)) = \eta_{M/K}(s) = \eta_{M/K}(\psi_{M/K}(t)) = t,
\end{align*}
where the second equality is the composition law established in the previous step. Since $\eta_{L/K}$ is a strictly increasing continuous bijection $[-1, \infty) \to [-1, \infty)$, it is invertible, and $u = \eta_{L/K}^{-1}(t) = \psi_{L/K}(t)$.
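To see the identification $u = \psi_{L/K}(t)$ in numbers, consider the following illustrative case (an example under stated assumptions, not part of the proof). Assume $p$ is an odd prime, $K = \mathbb{Q}_p$, $L = \mathbb{Q}_p(\zeta_p)$, $M = \mathbb{Q}_p(\zeta_{p^2})$. The standard computation of the ramification groups of this tower gives, for $s \geq p-1$, $\eta_{M/K}(s) = 1 + \frac{s-(p-1)}{p(p-1)}$ and $\eta_{M/L}(s) = (p-1) + \frac{s-(p-1)}{p}$, and for $s \geq 0$, $\eta_{L/K}(s) = \frac{s}{p-1}$. Taking $t = 2$:
\begin{align*}
s &= \psi_{M/K}(2) = (p-1) + p(p-1) = p^2 - 1, \\
u &= \eta_{M/L}(p^2 - 1) = (p-1) + \frac{(p^2-1)-(p-1)}{p} = 2(p-1), \\
\psi_{L/K}(2) &= 2(p-1) = u.
\end{align*}
Here $u \neq s$: the lower index changes when passing from $M/K$ to $L/K$, while the upper index $t = 2$ is the same on both sides.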
[/step]
[step:Conclude the compatibility of the upper-numbering filtration with quotients]
From the previous step, for each $t \in [-1, \infty)$:
\begin{align*}
\frac{G^t(M/K) \cdot H}{H} = \frac{G_{\psi_{M/K}(t)}(M/K) \cdot H}{H} = G_{\psi_{L/K}(t)}(L/K) = G^t(L/K).
\end{align*}
The first equality is the definition of the upper-numbering filtration for $M/K$. The second is Herbrand's theorem with the identification $u = \psi_{L/K}(t)$. The third is the definition of the upper-numbering filtration for $L/K$.
This is the desired compatibility: the upper-numbering ramification filtration commutes with passage to the quotient $\operatorname{Gal}(L/K) = \operatorname{Gal}(M/K) / H$, for every $t \geq -1$.
[guided]
This final step assembles the ingredients. The entire argument is:
1. Start with $G^t(M/K) = G_{\psi_{M/K}(t)}(M/K)$ — this is just the definition of upper numbering.
2. Apply Herbrand's theorem (which is a statement about the *lower* numbering) to get $G_s(M/K) \cdot H / H = G_u(L/K)$ where $u = \eta_{M/L}(s)$ and $s = \psi_{M/K}(t)$.
3. Use the composition law $\eta_{M/K} = \eta_{L/K} \circ \eta_{M/L}$ to identify $u$: since $\eta_{L/K}(u) = \eta_{L/K}(\eta_{M/L}(s)) = \eta_{M/K}(s) = t$, we get $u = \psi_{L/K}(t)$.
4. Rewrite $G_u(L/K) = G_{\psi_{L/K}(t)}(L/K) = G^t(L/K)$ — again just the definition of upper numbering.
The point of the upper numbering is precisely this compatibility. The lower-numbering filtration $(G_s)$ does not pass to quotients in a clean way: Herbrand's theorem involves the non-trivial function $\eta_{M/L}$. The upper numbering absorbs this function into the re-indexing $G^t = G_{\psi(t)}$, making the passage to quotients automatic. This is why the upper numbering — despite being less intuitive than the lower numbering — is the "correct" invariant for studying ramification in towers and in infinite extensions.
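As a concrete check of the conclusion, consider again an illustrative case (assumed for the example, not part of the proof): $p$ an odd prime, $K = \mathbb{Q}_p$, $L = \mathbb{Q}_p(\zeta_p)$, $M = \mathbb{Q}_p(\zeta_{p^2})$, with $H = \operatorname{Gal}(M/L)$ of order $p$. Taking as given the standard fact that the lower-numbering breaks of $M/K$ are $0$ and $p-1$ (with $G_i(M/K) = H$ for $1 \leq i \leq p-1$) and that $L/K$ is tamely ramified, the upper-numbering filtrations are
\begin{align*}
G^t(M/K) &= \begin{cases} \operatorname{Gal}(M/K), & -1 \leq t \leq 0, \\ H, & 0 < t \leq 1, \\ 1, & t > 1, \end{cases} &
G^t(L/K) &= \begin{cases} \operatorname{Gal}(L/K), & -1 \leq t \leq 0, \\ 1, & t > 0. \end{cases}
\end{align*}
Passing to the quotient by $H$ sends $\operatorname{Gal}(M/K)$ to $\operatorname{Gal}(M/K)/H = \operatorname{Gal}(L/K)$ and sends both $H$ and $1$ to the trivial group, so $G^t(M/K) \cdot H / H = G^t(L/K)$ for every $t \geq -1$, as the theorem predicts. The upper break of $M/K$ at $t = 1$ becomes invisible in $L/K$ because the corresponding group lies inside $H$.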
[/guided]
[/step]