Suppose for contradiction that reliable transmission at rate $R > C$ is possible, witnessed by a sequence of codes $(C_n)_{n \geq 1}$ with $C_n$ of length $n$, size $\lceil 2^{nR} \rceil$, and $\hat{e}(C_n) \to 0$.
Define the **average error rate** $e(C_n) = \frac{1}{|C_n|} \sum_{c \in C_n} \mathbb{P}(\text{error} \mid c \text{ sent})$. Since $e(C_n) \leq \hat{e}(C_n)$, we also have $e(C_n) \to 0$.
Take $X$ uniform over $C_n$ and $Y$ the decoded output after transmission. Then $p := e(C_n) = \mathbb{P}(X \neq Y)$, and
\begin{align*}
H(X) = \log_2 |C_n| = \log_2 \lceil 2^{nR} \rceil \geq nR - 1.
\end{align*}
By Fano's inequality with $|C_n|$ as the alphabet size,
\begin{align*}
H(X \mid Y) \leq H(p) + p \log_2(|C_n| - 1) \leq 1 + p \cdot nR.
\end{align*}
Since $I(X; Y) = H(X) - H(X \mid Y)$ and the $n$th extension has information capacity $nC$,
\begin{align*}
nC \geq I(X; Y) = H(X) - H(X \mid Y) \geq (nR - 1) - (1 + pnR),
\end{align*}
which simplifies to $nC \geq nR - 2 - pnR$, so
\begin{align*}
p \geq \frac{n(R-C) - 2}{nR} \xrightarrow{n \to \infty} \frac{R-C}{R} \neq 0.
\end{align*}
Since $R > C$, this contradicts $p = e(C_n) \to 0$.