[proofplan]
We prove both inclusions directly from the definition of the subdifferential. A maximizer $x \in C$ in direction $y$ gives a global affine minorant of the support function, so it is a subgradient at $y$. Conversely, if $x$ is a subgradient, the subgradient inequality first forces $y \cdot x = \sigma_C(y)$, and then a compact convex projection argument forces $x \in C$. These two facts identify $x$ as a maximizer of $c \mapsto y \cdot c$ over $C$.
[/proofplan]
[step:Verify that the support function and the exposed face are well-defined]
For each $z \in \mathbb{R}^n$, define the linear functional $\ell_z: C \to \mathbb{R}$ by
\begin{align*}
\ell_z(c) = z \cdot c.
\end{align*}
Since $\ell_z$ is continuous and $C$ is compact and nonempty, $\ell_z$ attains a finite maximum on $C$. Hence $\sigma_C(z)$ is finite for every $z \in \mathbb{R}^n$, and the set
\begin{align*}
A_y := \operatorname{argmax}_{c \in C} y \cdot c
= \{c \in C : y \cdot c = \sigma_C(y)\}
\end{align*}
is nonempty.
[/step]
[step:Show that every maximizer gives a subgradient]
Let $x \in A_y$. Then $x \in C$ and $y \cdot x = \sigma_C(y)$. For an arbitrary $z \in \mathbb{R}^n$, the definition of $\sigma_C(z)$ as a maximum over $C$ gives
\begin{align*}
\sigma_C(z) \ge z \cdot x.
\end{align*}
Using $y \cdot x = \sigma_C(y)$, we rewrite the right-hand side as
\begin{align*}
z \cdot x = \sigma_C(y) + x \cdot (z - y).
\end{align*}
Therefore, for every $z \in \mathbb{R}^n$,
\begin{align*}
\sigma_C(z) \ge \sigma_C(y) + x \cdot (z - y).
\end{align*}
This is exactly the definition of $x \in \partial \sigma_C(y)$. Hence
\begin{align*}
A_y \subset \partial \sigma_C(y).
\end{align*}
[guided]
Let $x \in A_y$. The meaning of $x \in A_y$ is that $x$ is a point of $C$ where the linear functional $c \mapsto y \cdot c$ reaches its maximum. Thus
\begin{align*}
x \in C
\end{align*}
and
\begin{align*}
y \cdot x = \sigma_C(y).
\end{align*}
To prove that $x$ is a subgradient of $\sigma_C$ at $y$, we must prove the global affine lower bound required by the definition of the subdifferential. That means we must show that for every $z \in \mathbb{R}^n$,
\begin{align*}
\sigma_C(z) \ge \sigma_C(y) + x \cdot (z - y).
\end{align*}
Fix $z \in \mathbb{R}^n$. Since $x \in C$ and $\sigma_C(z)$ is the maximum of $c \mapsto z \cdot c$ over all $c \in C$, the value at $x$ is bounded above by that maximum:
\begin{align*}
z \cdot x \le \sigma_C(z).
\end{align*}
Because $x$ maximizes in direction $y$, we also have $y \cdot x = \sigma_C(y)$. Therefore
\begin{align*}
z \cdot x = y \cdot x + x \cdot (z - y) = \sigma_C(y) + x \cdot (z - y).
\end{align*}
Combining the two displayed relations gives
\begin{align*}
\sigma_C(z) \ge \sigma_C(y) + x \cdot (z - y).
\end{align*}
Since $z \in \mathbb{R}^n$ was arbitrary, this proves $x \in \partial \sigma_C(y)$.
[/guided]
[/step]
[step:Use the subgradient inequality to force equality in the support direction]
Let $x \in \partial \sigma_C(y)$. By definition, for every $z \in \mathbb{R}^n$,
\begin{align*}
\sigma_C(z) \ge \sigma_C(y) + x \cdot (z - y).
\end{align*}
Taking $z = 0$ and using $\sigma_C(0) = 0$ gives
\begin{align*}
0 \ge \sigma_C(y) - x \cdot y.
\end{align*}
Thus
\begin{align*}
x \cdot y \ge \sigma_C(y).
\end{align*}
Taking $z = 2y$ gives
\begin{align*}
\sigma_C(2y) \ge \sigma_C(y) + x \cdot y.
\end{align*}
Since $\sigma_C(2y) = 2\sigma_C(y)$, this gives
\begin{align*}
2\sigma_C(y) \ge \sigma_C(y) + x \cdot y.
\end{align*}
Hence
\begin{align*}
\sigma_C(y) \ge x \cdot y.
\end{align*}
Therefore
\begin{align*}
x \cdot y = \sigma_C(y).
\end{align*}
[/step]
[step:Prove that a subgradient must lie in the compact convex set]
We prove $x \in C$. Suppose, for contradiction, that $x \notin C$. Define $d: C \to \mathbb{R}$ by
\begin{align*}
d(c) = |x - c|^2.
\end{align*}
The map $d$ is continuous, and $C$ is compact and nonempty, so there exists $c_0 \in C$ such that
\begin{align*}
|x - c_0|^2 = \min_{c \in C} |x - c|^2.
\end{align*}
Since $x \notin C$, we have $v := x - c_0 \ne 0$.
For any $c \in C$, define $\gamma_c: [0,1] \to C$ by
\begin{align*}
\gamma_c(t) = (1-t)c_0 + tc.
\end{align*}
Convexity of $C$ gives $\gamma_c(t) \in C$ for every $t \in [0,1]$. The real function $\phi_c: [0,1] \to \mathbb{R}$ defined by
\begin{align*}
\phi_c(t) = |x - \gamma_c(t)|^2
\end{align*}
has a minimum at $t = 0$. Indeed, $\gamma_c(t) \in C$ for every $t \in [0,1]$, and $c_0$ was chosen to minimize the distance from $x$ to $C$. Hence for every $h \in (0,1]$,
\begin{align*}
\frac{\phi_c(h)-\phi_c(0)}{h} \ge 0.
\end{align*}
Passing to the limit $h \downarrow 0$ gives that the right derivative at $0$ is nonnegative:
\begin{align*}
0 \le \phi_c'(0) = -2(x - c_0) \cdot (c - c_0).
\end{align*}
Thus
\begin{align*}
v \cdot c \le v \cdot c_0
\end{align*}
for every $c \in C$. Hence
\begin{align*}
\sigma_C(v) = \max_{c \in C} v \cdot c \le v \cdot c_0.
\end{align*}
But
\begin{align*}
v \cdot x = v \cdot c_0 + |v|^2 > v \cdot c_0.
\end{align*}
Therefore
\begin{align*}
v \cdot x - \sigma_C(v) > 0.
\end{align*}
Now rewrite the subgradient inequality as
\begin{align*}
x \cdot z - \sigma_C(z) \le x \cdot y - \sigma_C(y)
\end{align*}
for every $z \in \mathbb{R}^n$. For each $t > 0$, take $z = tv$. Since $t > 0$,
\begin{align*}
\sigma_C(tv) = t\sigma_C(v).
\end{align*}
Thus
\begin{align*}
t(x \cdot v - \sigma_C(v)) \le x \cdot y - \sigma_C(y)
\end{align*}
for every $t > 0$. The coefficient $x \cdot v - \sigma_C(v)$ is positive, so the left-hand side tends to $+\infty$ as $t \to \infty$, while the right-hand side is fixed. This contradiction proves $x \in C$.
[/step]
[step:Conclude that every subgradient is an exposed maximizer]
Let $x \in \partial \sigma_C(y)$. The previous two steps show that $x \in C$ and
\begin{align*}
y \cdot x = \sigma_C(y).
\end{align*}
Therefore $x \in A_y$. Hence
\begin{align*}
\partial \sigma_C(y) \subset A_y.
\end{align*}
Together with the first inclusion,
\begin{align*}
\partial \sigma_C(y) = A_y
= \operatorname{argmax}_{c \in C} y \cdot c.
\end{align*}
This proves the formula for every $y \in \mathbb{R}^n$.
[/step]