[step: Decompose the $K$-embeddings of $M$]
Let $\sigma_1, \dots, \sigma_r$ be the distinct $K$-embeddings $L \hookrightarrow \bar{K}$, where $r = [L:K]$ (separability gives exactly $r$). For each $\sigma_i$, let $\tau_{i1}, \dots, \tau_{is}$ be the distinct extensions of $\sigma_i$ to embeddings $M \hookrightarrow \bar{K}$, where $s = [M:L]$ (again by separability). Every $K$-embedding $M \hookrightarrow \bar{K}$ restricts to some $\sigma_i$ on $L$, so the full set of $K$-embeddings of $M$ is
\begin{align*}
\{\,\tau_{ij} : 1 \le i \le r,\; 1 \le j \le s\,\},
\end{align*}
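which is the disjoint union of the $r$ families $\{\tau_{i1}, \dots, \tau_{is}\}$ (families with different $i$ restrict differently to $L$), and so has $rs = [L:K]\,[M:L] = [M:K]$ elements.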
[guided]
This is the key structural fact: the $[M:K]$ embeddings of $M$ partition into $[L:K]$ families, one for each embedding of $L$, each family of size $[M:L]$.
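For a concrete picture, here is a small illustration under assumed data that does not appear in the argument above: take $K = \mathbb{Q}$, $L = \mathbb{Q}(\sqrt{2})$ and $M = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, so that $r = s = 2$. The two embeddings of $L$ are $\sigma_1 : \sqrt{2} \mapsto \sqrt{2}$ and $\sigma_2 : \sqrt{2} \mapsto -\sqrt{2}$, and each extends to $M$ in two ways according to the sign chosen for $\sqrt{3}$:
\begin{align*}
\tau_{11} &: (\sqrt{2}, \sqrt{3}) \mapsto (\sqrt{2}, \sqrt{3}), &
\tau_{12} &: (\sqrt{2}, \sqrt{3}) \mapsto (\sqrt{2}, -\sqrt{3}), \\
\tau_{21} &: (\sqrt{2}, \sqrt{3}) \mapsto (-\sqrt{2}, \sqrt{3}), &
\tau_{22} &: (\sqrt{2}, \sqrt{3}) \mapsto (-\sqrt{2}, -\sqrt{3}).
\end{align*}
The first row is the family lying over $\sigma_1$ and the second the family lying over $\sigma_2$, giving $2 \cdot 2 = 4 = [M:K]$ embeddings in total.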
[step: Group the sum and product to obtain both tower formulas]
**Trace.** For $\alpha \in M$, the definition gives
\begin{align*}
\mathrm{Tr}_{M/K}(\alpha)
&= \sum_{i=1}^{r}\sum_{j=1}^{s} \tau_{ij}(\alpha).
\end{align*}
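The inner sum over the extensions of a fixed $\sigma_i$ equals $\sigma_i$ applied to $\mathrm{Tr}_{M/L}(\alpha)$. To see this, let $\rho_1, \dots, \rho_s$ be the $L$-embeddings $M \hookrightarrow \bar{K}$, so that $\mathrm{Tr}_{M/L}(\alpha) = \sum_{j=1}^{s} \rho_j(\alpha) \in L$, and extend $\sigma_i$ to an automorphism $\tilde{\sigma}_i$ of $\bar{K}$ (possible because $\bar{K}$ is an algebraic closure of $L$). The compositions $\tilde{\sigma}_i \circ \rho_1, \dots, \tilde{\sigma}_i \circ \rho_s$ are $s$ distinct embeddings of $M$ that restrict to $\sigma_i$ on $L$, so they are exactly $\tau_{i1}, \dots, \tau_{is}$ in some order. Hence $\sigma_i\!\bigl(\mathrm{Tr}_{M/L}(\alpha)\bigr) = \tilde{\sigma}_i\!\bigl(\sum_{j=1}^{s} \rho_j(\alpha)\bigr) = \sum_{j=1}^{s} \tau_{ij}(\alpha)$. Summing over all $\sigma_i$,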
\begin{align*}
\mathrm{Tr}_{M/K}(\alpha)
&= \sum_{i=1}^{r} \sigma_i\!\bigl(\mathrm{Tr}_{M/L}(\alpha)\bigr)
= \mathrm{Tr}_{L/K}\!\bigl(\mathrm{Tr}_{M/L}(\alpha)\bigr).
\end{align*}
**Norm.** Replacing sums with products in the same grouping,
\begin{align*}
\mathrm{N}_{M/K}(\alpha)
&= \prod_{i=1}^{r}\prod_{j=1}^{s} \tau_{ij}(\alpha)
= \prod_{i=1}^{r} \sigma_i\!\bigl(\mathrm{N}_{M/L}(\alpha)\bigr)
= \mathrm{N}_{L/K}\!\bigl(\mathrm{N}_{M/L}(\alpha)\bigr).
\end{align*}
The inner product equals $\sigma_i\!\bigl(\mathrm{N}_{M/L}(\alpha)\bigr)$ by exactly the same reasoning as for the trace, with multiplication in place of addition. $\blacksquare$
[guided]
The essential move in both cases is the same: reindex a single sum (or product) over all $[M:K]$ embeddings as a double sum (or product) — first over the $[L:K]$ embeddings of $L$, then over the $[M:L]$ extensions of each. The outer operation then becomes $\mathrm{Tr}_{L/K}$ (or $\mathrm{N}_{L/K}$) applied to the element $\mathrm{Tr}_{M/L}(\alpha)$ (or $\mathrm{N}_{M/L}(\alpha)$) that already lives in $L$.
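As a sanity check, both formulas can be verified by hand on a small example. The fields and the element below are assumptions chosen for illustration and are not part of the proof: $K = \mathbb{Q}$, $L = \mathbb{Q}(\sqrt{2})$, $M = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ and $\alpha = \sqrt{2} + \sqrt{3}$, whose four conjugates over $K$ are $\pm\sqrt{2} \pm \sqrt{3}$. Computing in two stages gives
\begin{align*}
\mathrm{Tr}_{M/L}(\alpha) &= (\sqrt{2} + \sqrt{3}) + (\sqrt{2} - \sqrt{3}) = 2\sqrt{2}, &
\mathrm{Tr}_{L/K}(2\sqrt{2}) &= 2\sqrt{2} + (-2\sqrt{2}) = 0, \\
\mathrm{N}_{M/L}(\alpha) &= (\sqrt{2} + \sqrt{3})(\sqrt{2} - \sqrt{3}) = -1, &
\mathrm{N}_{L/K}(-1) &= (-1)(-1) = 1.
\end{align*}
Computing directly over all four embeddings of $M$ gives the same values:
\begin{align*}
\mathrm{Tr}_{M/K}(\alpha) &= \sum_{\pm,\pm} \bigl(\pm\sqrt{2} \pm \sqrt{3}\bigr) = 0, \\
\mathrm{N}_{M/K}(\alpha) &= \bigl[(\sqrt{2} + \sqrt{3})(\sqrt{2} - \sqrt{3})\bigr]\,\bigl[(-\sqrt{2} + \sqrt{3})(-\sqrt{2} - \sqrt{3})\bigr] = (-1)(-1) = 1.
\end{align*}
The bracketing in the last line is the grouping used in the proof: the first bracket is the product over the extensions of $\sqrt{2} \mapsto \sqrt{2}$, the second over the extensions of $\sqrt{2} \mapsto -\sqrt{2}$.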