[proofplan]
The proof determines $\operatorname{Tr}_{L/K}(\alpha)$ and $\operatorname{N}_{L/K}(\alpha)$ by computing the matrix of the $K$-linear multiplication map $m_\alpha$ in a basis adapted to the tower $K \subset K(\alpha) \subset L$. We fix a $K$-basis $\{1, \alpha, \ldots, \alpha^{d-1}\}$ of $K(\alpha)$ and a $K(\alpha)$-basis $\{w_1, \ldots, w_r\}$ of $L$, forming the $K$-basis $\{w_j \alpha^i : 0 \le i \le d-1, \, 1 \le j \le r\}$ of $L$. In this basis, $m_\alpha$ acts on each $K(\alpha)$-line independently, producing a block-diagonal matrix with $r$ identical copies of the companion matrix of $P_\alpha$. The trace and determinant of this block-diagonal matrix yield the formulas $\operatorname{Tr}_{L/K}(\alpha) = -r \, a_{d-1}$ and $\operatorname{N}_{L/K}(\alpha) = (-1)^{dr} a_0^r$.
[/proofplan]
[step:Construct a $K$-basis of $L$ from the tower $K \subset K(\alpha) \subset L$]
Set $d := [K(\alpha) : K] = \deg P_\alpha$ and $r := [L : K(\alpha)]$. Since $P_\alpha = t^d + a_{d-1}t^{d-1} + \cdots + a_0$ is the minimal polynomial of $\alpha$ over $K$, the set $\{1, \alpha, \alpha^2, \ldots, \alpha^{d-1}\}$ is a $K$-basis of $K(\alpha)$.
Fix a $K(\alpha)$-basis $\{w_1, \ldots, w_r\}$ of $L$. By the Tower Law, $[L : K] = [L : K(\alpha)] \cdot [K(\alpha) : K] = rd$. The set
\begin{align*}
\mathcal{B} := \{w_j \alpha^i : 0 \le i \le d-1, \, 1 \le j \le r\}
\end{align*}
is a $K$-basis of $L$ with $|\mathcal{B}| = rd = [L : K]$. This is the standard basis-product construction for towers of extensions: every element of $L$ is a $K(\alpha)$-linear combination of the $w_j$, and each coefficient in $K(\alpha)$ is a $K$-linear combination of $1, \alpha, \ldots, \alpha^{d-1}$, so $\mathcal{B}$ spans $L$ over $K$. A spanning family of $rd = [L : K]$ elements is automatically linearly independent, hence a basis.
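For illustration (an example assumed here, not taken from the source), take $K = \mathbb{Q}$, $\alpha = \sqrt{2}$, and $L = \mathbb{Q}(\sqrt{2}, \sqrt{3})$. Then $P_\alpha = t^2 - 2$, so $d = 2$, and since $\sqrt{3} \notin \mathbb{Q}(\sqrt{2})$ we have $r = 2$ with $K(\alpha)$-basis $\{w_1, w_2\} = \{1, \sqrt{3}\}$. The product construction gives
\begin{align*}
\mathcal{B} = \{w_j \alpha^i : 0 \le i \le 1, \, 1 \le j \le 2\} = \{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}, \qquad |\mathcal{B}| = 4 = rd = [L : \mathbb{Q}].
\end{align*}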
[guided]
The trace and norm of $\alpha$ over $L/K$ are defined as the trace and determinant of the $K$-linear multiplication map
\begin{align*}
m_\alpha: L &\to L \\
x &\mapsto \alpha x.
\end{align*}
To compute $\operatorname{Tr}(m_\alpha)$ and $\det(m_\alpha)$, we need to represent $m_\alpha$ as a matrix with respect to some $K$-basis of $L$. The choice of basis matters for the matrix entries but not for the trace and determinant (which are basis-independent). A good choice is one that reveals the structure of $m_\alpha$.
The key observation is that $m_\alpha$ commutes with scalar multiplication by elements of $K(\alpha)$: for $c \in K(\alpha)$ and $x \in L$, $m_\alpha(cx) = \alpha(cx) = c(\alpha x) = c \cdot m_\alpha(x)$. This means $m_\alpha$ is not merely $K$-linear but $K(\alpha)$-linear. Consequently, if we choose a basis that respects the $K(\alpha)$-module structure of $L$, the matrix of $m_\alpha$ will decompose into blocks.
Set $d := \deg P_\alpha = [K(\alpha) : K]$ and $r := [L : K(\alpha)]$. The set $\{1, \alpha, \ldots, \alpha^{d-1}\}$ is a $K$-basis of $K(\alpha)$ because $P_\alpha$ is the minimal polynomial of $\alpha$ over $K$: it is the monic polynomial of least degree in $K[t]$ satisfied by $\alpha$. A nontrivial $K$-linear relation among $1, \alpha, \ldots, \alpha^{d-1}$ would be a nonzero polynomial of degree less than $d$ vanishing at $\alpha$, so these elements are linearly independent over $K$. They also span $K(\alpha)$, because every higher power $\alpha^k$ ($k \ge d$) can be reduced using the relation $\alpha^d = -a_{d-1}\alpha^{d-1} - \cdots - a_0$.
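As a small worked instance of this reduction (the element is an assumption chosen for illustration, not part of the source), let $K = \mathbb{Q}$ and $\alpha = 1 + \sqrt{2}$, so that $P_\alpha = t^2 - 2t - 1$, $d = 2$, and $\alpha^2 = 2\alpha + 1$. Then
\begin{align*}
\alpha^3 &= \alpha \cdot \alpha^2 = 2\alpha^2 + \alpha = 2(2\alpha + 1) + \alpha = 5\alpha + 2, \\
\alpha^4 &= \alpha \cdot \alpha^3 = 5\alpha^2 + 2\alpha = 5(2\alpha + 1) + 2\alpha = 12\alpha + 5,
\end{align*}
so each higher power lands back in $\operatorname{span}_{\mathbb{Q}}\{1, \alpha\}$.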
Fix a $K(\alpha)$-basis $\{w_1, \ldots, w_r\}$ of $L$. By the Tower Law, $[L : K] = rd$. The products $\{w_j \alpha^i : 0 \le i \le d-1, \, 1 \le j \le r\}$ form a $K$-basis $\mathcal{B}$ of $L$ with $rd$ elements. We order this basis by grouping all powers of $\alpha$ with the same $w_j$ together:
\begin{align*}
\mathcal{B} = \{w_1, w_1 \alpha, \ldots, w_1 \alpha^{d-1}, \; w_2, w_2 \alpha, \ldots, w_2 \alpha^{d-1}, \; \ldots, \; w_r, w_r \alpha, \ldots, w_r \alpha^{d-1}\}.
\end{align*}
This ordering will produce the block-diagonal structure we seek.
[/guided]
[/step]
[step:Show that $m_\alpha$ acts on each $K(\alpha)$-line $w_j \cdot K(\alpha)$ independently, producing $r$ identical companion blocks]
For each $j \in \{1, \ldots, r\}$, define the $d$-dimensional $K$-subspace $V_j := \operatorname{span}_K\{w_j, w_j\alpha, \ldots, w_j\alpha^{d-1}\}$. Since $\mathcal{B}$ is a $K$-basis of $L$, we have the direct sum decomposition $L = V_1 \oplus V_2 \oplus \cdots \oplus V_r$.
The map $m_\alpha$ preserves each $V_j$: for $0 \le i \le d-2$,
\begin{align*}
m_\alpha(w_j \alpha^i) = w_j \alpha^{i+1} \in V_j,
\end{align*}
and for $i = d-1$, the relation $\alpha^d = -a_{d-1}\alpha^{d-1} - \cdots - a_1 \alpha - a_0$ (which holds because $P_\alpha(\alpha) = 0$) gives
\begin{align*}
m_\alpha(w_j \alpha^{d-1}) = w_j \alpha^d = w_j(-a_{d-1}\alpha^{d-1} - a_{d-2}\alpha^{d-2} - \cdots - a_0) = -a_0 w_j - a_1 w_j\alpha - \cdots - a_{d-1} w_j\alpha^{d-1}.
\end{align*}
Hence $m_\alpha(V_j) \subset V_j$ for each $j$. The matrix of $m_\alpha|_{V_j}$ with respect to the ordered basis $(w_j, w_j\alpha, \ldots, w_j\alpha^{d-1})$ is the companion matrix of $P_\alpha$:
\begin{align*}
C_{P_\alpha} = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{d-1} \end{pmatrix} \in K^{d \times d}.
\end{align*}
The columns of $C_{P_\alpha}$ record the images: the $i$-th column ($1 \le i \le d-1$) has a single $1$ in position $(i+1, i)$ (encoding $w_j\alpha^{i-1} \mapsto w_j\alpha^i$), and the $d$-th column has entries $(-a_0, -a_1, \ldots, -a_{d-1})^\top$ (encoding $w_j\alpha^{d-1} \mapsto -a_0 w_j - \cdots - a_{d-1}w_j\alpha^{d-1}$).
Since the matrix $C_{P_\alpha}$ is the same for every $j \in \{1, \ldots, r\}$ (the coefficients $a_0, \ldots, a_{d-1}$ depend only on $P_\alpha$, not on the choice of $w_j$), the matrix of $m_\alpha$ with respect to the full basis $\mathcal{B}$ is block-diagonal:
\begin{align*}
[m_\alpha]_{\mathcal{B}} = \begin{pmatrix} C_{P_\alpha} & & \\ & \ddots & \\ & & C_{P_\alpha} \end{pmatrix} \in K^{rd \times rd},
\end{align*}
with $r$ identical blocks of size $d \times d$ along the diagonal.
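To see the shape in a small case (an assumed example, not part of the source), take $K = \mathbb{Q}$, $\alpha = \sqrt{2}$, $L = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, so that $P_\alpha = t^2 - 2$, $a_0 = -2$, $a_1 = 0$, and $d = r = 2$. With $w_1 = 1$, $w_2 = \sqrt{3}$ and the ordered basis $\mathcal{B} = (1, \sqrt{2}, \sqrt{3}, \sqrt{6})$, multiplication by $\sqrt{2}$ sends $1 \mapsto \sqrt{2}$, $\sqrt{2} \mapsto 2$, $\sqrt{3} \mapsto \sqrt{6}$, and $\sqrt{6} \mapsto 2\sqrt{3}$. Recording these images as columns gives
\begin{align*}
[m_\alpha]_{\mathcal{B}} = \begin{pmatrix} 0 & 2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad C_{P_\alpha} = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix},
\end{align*}
that is, two copies of the same $2 \times 2$ companion block.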
[guided]
Why does $m_\alpha$ preserve each subspace $V_j$? The subspace $V_j = w_j \cdot \operatorname{span}_K\{1, \alpha, \ldots, \alpha^{d-1}\} = w_j \cdot K(\alpha)$ is a copy of $K(\alpha)$ inside $L$, scaled by $w_j$. Multiplication by $\alpha$ maps $K(\alpha)$ into itself (since $K(\alpha)$ is a subfield containing $\alpha$), so $m_\alpha$ maps $w_j \cdot K(\alpha)$ into $w_j \cdot \alpha K(\alpha) \subseteq w_j \cdot K(\alpha) = V_j$, with equality whenever $\alpha \ne 0$. The crucial point is that multiplication by $\alpha$ acts within $K(\alpha)$, and the factor $w_j$ is simply carried along.
For the top-degree case $i = d-1$: since $P_\alpha(\alpha) = 0$, we have
\begin{align*}
\alpha^d + a_{d-1}\alpha^{d-1} + \cdots + a_1\alpha + a_0 = 0,
\end{align*}
so $\alpha^d = -a_{d-1}\alpha^{d-1} - \cdots - a_0$. Multiplying both sides by $w_j$:
\begin{align*}
w_j\alpha^d = -a_0 w_j - a_1 w_j\alpha - \cdots - a_{d-1} w_j\alpha^{d-1}.
\end{align*}
This is precisely the last column of the companion matrix. The companion matrix $C_{P_\alpha}$ encodes the action of multiplication by $\alpha$ on the $K$-vector space $K(\alpha)$ with respect to the power basis $\{1, \alpha, \ldots, \alpha^{d-1}\}$: it shifts each basis element up by one degree, except at degree $d-1$, where the minimal polynomial relation wraps $\alpha^d$ back into the span of lower powers.
The block-diagonal structure arises because the subspaces $V_1, \ldots, V_r$ are $m_\alpha$-invariant and $L$ is their direct sum. Since each $V_j$ is a copy of $K(\alpha)$ (as a $K$-vector space) on which $\alpha$ acts the same way regardless of the "label" $w_j$, every diagonal block is the same matrix $C_{P_\alpha}$.
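A case with a nonzero entry at the bottom of the last column may help (the element below is an assumption for illustration, not from the source). Let $K = \mathbb{Q}$ and $\alpha = 1 + \sqrt{2}$, so $P_\alpha = t^2 - 2t - 1$ with $a_0 = -1$, $a_1 = -2$, and $\alpha^2 = 2\alpha + 1$. For any $w_j$,
\begin{align*}
m_\alpha(w_j) &= w_j\alpha, \\
m_\alpha(w_j\alpha) &= w_j\alpha^2 = w_j(2\alpha + 1) = 1 \cdot w_j + 2 \cdot w_j\alpha,
\end{align*}
so the block on $V_j$ is
\begin{align*}
C_{P_\alpha} = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & -a_0 \\ 1 & -a_1 \end{pmatrix},
\end{align*}
with no dependence on $w_j$.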
[/guided]
[/step]
[step:Read off trace and determinant from the block-diagonal matrix]
The trace of a block-diagonal matrix is the sum of the traces of its diagonal blocks, and its determinant is the product of their determinants.
**Trace.** The companion matrix $C_{P_\alpha}$ has diagonal entries $0, 0, \ldots, 0, -a_{d-1}$ (the entry $-a_{d-1}$ appears in position $(d, d)$, and all other diagonal entries are zero). Therefore
\begin{align*}
\operatorname{Tr}(C_{P_\alpha}) = -a_{d-1}.
\end{align*}
Since $[m_\alpha]_{\mathcal{B}}$ consists of $r$ copies of $C_{P_\alpha}$:
\begin{align*}
\operatorname{Tr}_{L/K}(\alpha) = \operatorname{Tr}(m_\alpha) = r \cdot \operatorname{Tr}(C_{P_\alpha}) = -r \, a_{d-1}.
\end{align*}
**Determinant.** We compute $\det(C_{P_\alpha})$ by cofactor expansion along the first row. The first row of $C_{P_\alpha}$ is $(0, 0, \ldots, 0, -a_0)$, so the only nonzero entry is in position $(1, d)$, contributing
\begin{align*}
\det(C_{P_\alpha}) = (-1)^{1+d}(-a_0) \cdot M_{1d},
\end{align*}
where $M_{1d}$ is the $(1,d)$-minor, the determinant of the $(d-1) \times (d-1)$ matrix obtained by deleting row $1$ and column $d$. After this deletion the subdiagonal ones of $C_{P_\alpha}$ sit on the diagonal, so the submatrix is the identity matrix:
\begin{align*}
\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \in K^{(d-1) \times (d-1)},
\end{align*}
so $M_{1d} = 1$. Therefore
\begin{align*}
\det(C_{P_\alpha}) = (-1)^{1+d}(-a_0) = (-1)^{d+2} a_0 = (-1)^d a_0.
\end{align*}
Since $[m_\alpha]_{\mathcal{B}}$ consists of $r$ copies of $C_{P_\alpha}$:
\begin{align*}
\operatorname{N}_{L/K}(\alpha) = \det(m_\alpha) = \det(C_{P_\alpha})^r = ((-1)^d a_0)^r = (-1)^{dr} a_0^r.
\end{align*}
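As a sanity check on a small case (an assumed example, not from the source), take $K = \mathbb{Q}$, $\alpha = \sqrt{2}$, $L = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, so $P_\alpha = t^2 - 2$, $d = r = 2$, $a_1 = 0$, $a_0 = -2$. The formulas give
\begin{align*}
\operatorname{Tr}_{L/K}(\sqrt{2}) &= -r \, a_{d-1} = -2 \cdot 0 = 0, \\
\operatorname{N}_{L/K}(\sqrt{2}) &= (-1)^{dr} a_0^r = (-1)^{4} (-2)^2 = 4.
\end{align*}
This agrees with the block description: each companion block $\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$ has trace $0$ and determinant $-2$, and two such blocks give trace $0 + 0 = 0$ and determinant $(-2)^2 = 4$.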
[guided]
**Trace computation.** The trace of $C_{P_\alpha}$ is the sum of its diagonal entries. Reading the diagonal from the matrix definition: in columns $1, \ldots, d-1$ the only nonzero entries are the ones in the subdiagonal positions $(i+1, i)$, which do not lie on the diagonal. The diagonal entries at positions $(1,1), (2,2), \ldots, (d-1, d-1)$ are therefore all $0$, and the $(d,d)$-entry is $-a_{d-1}$ (the last entry of the last column). Hence $\operatorname{Tr}(C_{P_\alpha}) = 0 + 0 + \cdots + 0 + (-a_{d-1}) = -a_{d-1}$.
Why does $-a_{d-1}$ appear? Recall that $a_{d-1}$ is the coefficient of $t^{d-1}$ in $P_\alpha = t^d + a_{d-1}t^{d-1} + \cdots + a_0$. If we factor $P_\alpha$ over a splitting field as $(t - \beta_1)\cdots(t - \beta_d)$, then Vieta's formula gives $a_{d-1} = -(\beta_1 + \cdots + \beta_d)$, so $\operatorname{Tr}(C_{P_\alpha}) = -a_{d-1} = \beta_1 + \cdots + \beta_d$. The trace of the companion matrix equals the sum of the roots, as expected.
**Determinant computation.** We expand $\det(C_{P_\alpha})$ along the first row. The first row is $(0, 0, \ldots, 0, -a_0)$, with the nonzero entry in position $(1, d)$. Cofactor expansion gives:
\begin{align*}
\det(C_{P_\alpha}) &= (-a_0) \cdot (-1)^{1+d} \cdot \det\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.
\end{align*}
The submatrix is the $(d-1) \times (d-1)$ identity matrix (it consists of the subdiagonal ones from $C_{P_\alpha}$, which after deleting row $1$ and column $d$ become diagonal entries). Its determinant is $1$. Therefore:
\begin{align*}
\det(C_{P_\alpha}) &= (-a_0)(-1)^{1+d} = (-1)^{d+2} a_0 = (-1)^d a_0.
\end{align*}
The sign simplification uses $(-1)^{d+2} = (-1)^d \cdot (-1)^2 = (-1)^d$.
Again, Vieta's formula confirms: $a_0 = (-1)^d \beta_1 \cdots \beta_d$ (the constant term of a monic polynomial equals $(-1)^d$ times the product of its roots), so $\det(C_{P_\alpha}) = (-1)^d \cdot (-1)^d \beta_1 \cdots \beta_d = \beta_1 \cdots \beta_d$. The determinant of the companion matrix equals the product of the roots.
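For a concrete instance with $d = 3$ (an assumed example, not from the source), take $K = \mathbb{Q}$ and $\alpha = \sqrt[3]{2}$, so $P_\alpha = t^3 - 2$ with $a_0 = -2$ and $a_1 = a_2 = 0$. Expanding along the first row,
\begin{align*}
C_{P_\alpha} = \begin{pmatrix} 0 & 0 & 2 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \det(C_{P_\alpha}) = (-1)^{1+3} \cdot 2 \cdot \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 2 = (-1)^3 \cdot (-2).
\end{align*}
The roots of $t^3 - 2$ in $\mathbb{C}$ are $\sqrt[3]{2}$, $\omega\sqrt[3]{2}$, $\omega^2\sqrt[3]{2}$ with $\omega = e^{2\pi i/3}$. Their product is $\omega^3 \cdot 2 = 2$ and their sum is $\sqrt[3]{2}(1 + \omega + \omega^2) = 0 = -a_2$, matching the determinant and trace of $C_{P_\alpha}$.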
**Assembling the block-diagonal results.** For the full matrix $[m_\alpha]_{\mathcal{B}}$ with $r$ identical diagonal blocks:
\begin{align*}
\operatorname{Tr}_{L/K}(\alpha) &= \sum_{j=1}^{r} \operatorname{Tr}(C_{P_\alpha}) = r \cdot (-a_{d-1}) = -r \, a_{d-1}, \\[6pt]
\operatorname{N}_{L/K}(\alpha) &= \prod_{j=1}^{r} \det(C_{P_\alpha}) = ((-1)^d a_0)^r = (-1)^{dr} a_0^r.
\end{align*}
These are the desired formulas. The factor $r = [L : K(\alpha)]$ measures how many times the minimal polynomial's contribution is "repeated" across the full extension.
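As a final illustration with $r > 1$ and nonzero trace (an assumed example, not from the source), take $K = \mathbb{Q}$, $\alpha = 1 + \sqrt{2}$, and $L = \mathbb{Q}(\sqrt{2}, \sqrt{3})$. Here $K(\alpha) = \mathbb{Q}(\sqrt{2})$, $P_\alpha = t^2 - 2t - 1$, so $d = 2$, $r = 2$, $a_1 = -2$, $a_0 = -1$, and
\begin{align*}
\operatorname{Tr}_{L/K}(\alpha) &= -r \, a_{d-1} = -2 \cdot (-2) = 4, \\
\operatorname{N}_{L/K}(\alpha) &= (-1)^{dr} a_0^r = (-1)^{4} (-1)^2 = 1.
\end{align*}
The same values come from the blocks directly: the companion matrix $\begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}$ has trace $2$ and determinant $-1$, and two copies give trace $2 + 2 = 4$ and determinant $(-1)^2 = 1$.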
[/guided]
[/step]