[proofplan]
We first record the basic structural facts. Compactness of $X$ ensures that the supremum defining $F$ is attained, so the active set is nonempty. $F$ is convex as a pointwise supremum of convex functions, and $F$ is continuous. The directional derivative formula is proved by two inequalities. Active maximizers at $y$ give the lower bound. Maximizers at nearby points $y+th$ give the upper bound, once compactness and continuity allow us to pass to a limiting active maximizer. The subgradient inclusion follows from the convexity of each active function $\phi(\cdot,x)$. The uniqueness case follows by showing that the first-order error is $o(|u|)$, because maximizers at nearby points $y+u$ must converge to the unique active maximizer.
[/proofplan]
[step:Establish attainment, convexity, and local continuity of the envelope]
Fix $y \in \mathbb{R}^k$. Since $X$ is compact and the map $x \mapsto \phi(y,x)$ from $X$ to $\mathbb{R}$ is continuous, the maximum of $\phi(y,\cdot)$ over $X$ is attained. Hence $X^*(y)$ is nonempty.
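As a concrete illustration, here is a minimal Python sketch under hypothetical choices that are not taken from the source. $X=\{(1,0),(-1,0),(0,\tfrac12)\}\subset\mathbb{R}^2$ is finite, hence compact, and $\phi(y,x)=\tfrac12|y-x|^2$. The sketch computes $F(y)$ and the active set $X^*(y)$. The helper name `active_set` and its floating-point tolerance `tol` are illustrative assumptions.

```python
# Hypothetical instance (not from the source): X is a finite, hence compact,
# subset of R^2 and phi(y, x) = 0.5 * |y - x|^2, convex and smooth in y.
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def F(y):
    # F(y) = max over X of phi(y, x); the max exists because X is finite.
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    # X*(y): points whose value is within tol of F(y) (tol absorbs rounding).
    Fy = F(y)
    return [x for x in X if phi(y, x) >= Fy - tol]

print(F((0.0, 0.0)), active_set((0.0, 0.0)))  # 0.5 [(1.0, 0.0), (-1.0, 0.0)]
print(F((0.5, 0.0)), active_set((0.5, 0.0)))  # 1.125 [(-1.0, 0.0)]
```

At $y=0$ two points are active. At $y=(0.5,0)$ the maximizer is unique.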
For every $y_0,y_1 \in \mathbb{R}^k$ and every $\lambda \in [0,1]$, convexity of $\phi(\cdot,x)$ gives
\begin{align*}
\phi(\lambda y_0+(1-\lambda)y_1,x) \leq \lambda \phi(y_0,x)+(1-\lambda)\phi(y_1,x)
\end{align*}
for every $x \in X$. Bounding $\phi(y_i,x) \leq F(y_i)$ for $i \in \{0,1\}$ and then taking the supremum over $x \in X$ gives
\begin{align*}
F(\lambda y_0+(1-\lambda)y_1) \leq \lambda F(y_0)+(1-\lambda)F(y_1).
\end{align*}
Thus $F$ is convex.
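Convexity of $F$ can be sanity-checked with a minimal Python sketch. The sketch assumes a hypothetical instance that is not taken from the source: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$. It samples random $y_0,y_1,\lambda$ and counts violations of the displayed inequality for $F$, allowing $10^{-12}$ for rounding error. Random sampling can only fail to find a counterexample, so it complements the argument above rather than replacing it.

```python
import random

# Hypothetical instance (not from the source): a finite X in R^2 and
# phi(y, x) = 0.5 * |y - x|^2, which is convex in y for each fixed x.
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def F(y):
    return max(phi(y, x) for x in X)

random.seed(0)
violations = 0
for _ in range(1000):
    y0 = (random.uniform(-3, 3), random.uniform(-3, 3))
    y1 = (random.uniform(-3, 3), random.uniform(-3, 3))
    lam = random.random()
    y_lam = (lam * y0[0] + (1 - lam) * y1[0], lam * y0[1] + (1 - lam) * y1[1])
    # Convexity inequality for F, with a small allowance for rounding error.
    if F(y_lam) > lam * F(y0) + (1 - lam) * F(y1) + 1e-12:
        violations += 1
print("violations:", violations)
```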
We also need continuity at the fixed point $y$. Choose $r>0$ and define the compact set $K_r=\overline{B}(y,r) \times X \subset \mathbb{R}^k \times X$. Since $\phi$ is continuous on the compact set $K_r$, it is uniformly continuous there. Therefore, for every $\varepsilon>0$, there exists $\delta \in (0,r)$ such that, whenever $|z-y|<\delta$ (so that $z \in \overline{B}(y,r)$),
\begin{align*}
|\phi(z,x)-\phi(y,x)| \leq \varepsilon
\end{align*}
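for every $x \in X$. In particular, $\phi(z,x) \leq \phi(y,x)+\varepsilon \leq F(y)+\varepsilon$ and $\phi(y,x) \leq \phi(z,x)+\varepsilon \leq F(z)+\varepsilon$ for every $x \in X$. Taking suprema over $x \in X$ gives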
\begin{align*}
|F(z)-F(y)| \leq \varepsilon.
\end{align*}
Hence $F$ is continuous at $y$.
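The key inequality behind the last display is $|F(z)-F(y)|\leq\sup_{x\in X}|\phi(z,x)-\phi(y,x)|$. The following minimal Python sketch checks it at a few points $z$ approaching $y=0$. It uses a hypothetical instance chosen only for illustration: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$. The helper `sup_gap` is likewise an illustrative name.

```python
# Hypothetical instance (not from the source).
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def F(y):
    return max(phi(y, x) for x in X)

def sup_gap(z, y):
    # sup over X of |phi(z, x) - phi(y, x)|, the quantity made small by
    # uniform continuity of phi on a compact neighbourhood of {y} x X.
    return max(abs(phi(z, x) - phi(y, x)) for x in X)

y = (0.0, 0.0)
rows = []
for r in [1e-1, 1e-2, 1e-3]:
    z = (y[0] + r, y[1] - r)
    rows.append((r, abs(F(z) - F(y)), sup_gap(z, y)))
for r, diff, gap in rows:
    print(f"r={r:g}  |F(z)-F(y)|={diff:.3e}  sup gap={gap:.3e}")
```

Both columns shrink as $z\to y$, and the first never exceeds the second.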
[guided]
The compactness assumption on $X$ is used immediately. For the fixed point $y \in \mathbb{R}^k$, the function $x \mapsto \phi(y,x)$ is continuous on the compact set $X$, so it attains its maximum. Since $F(y)$ is the supremum of the same function over $X$, this maximum equals $F(y)$. Thus there is at least one $x \in X$ such that $\phi(y,x)=F(y)$, which proves $X^*(y)\neq \varnothing$.
Next we verify that $F$ is convex. Let $y_0,y_1 \in \mathbb{R}^k$ and $\lambda \in [0,1]$. For each fixed $x \in X$, the map $\phi(\cdot,x):\mathbb{R}^k \to \mathbb{R}$ is convex, so
\begin{align*}
\phi(\lambda y_0+(1-\lambda)y_1,x) \leq \lambda \phi(y_0,x)+(1-\lambda)\phi(y_1,x).
\end{align*}
Since $\phi(y_0,x) \leq F(y_0)$ and $\phi(y_1,x) \leq F(y_1)$ for every $x \in X$, we get
\begin{align*}
\phi(\lambda y_0+(1-\lambda)y_1,x) \leq \lambda F(y_0)+(1-\lambda)F(y_1).
\end{align*}
Taking the supremum over $x \in X$ yields
\begin{align*}
F(\lambda y_0+(1-\lambda)y_1) \leq \lambda F(y_0)+(1-\lambda)F(y_1).
\end{align*}
Therefore $F$ is convex.
Finally, we prove local continuity of $F$ at $y$. This matters because, in the upper-bound argument, we will choose maximizers $x_t$ at the nearby points $y+th$ and then pass to a limit. To identify the limit as active at $y$, we must know that $F(y+th) \to F(y)$.
Choose $r>0$ and define $K_r=\overline{B}(y,r)\times X$. This set is compact because $\overline{B}(y,r)$ and $X$ are compact. The map $\phi: \mathbb{R}^k \times X \to \mathbb{R}$ is continuous, so its restriction to $K_r$ is uniformly continuous. Hence, for every $\varepsilon>0$, there exists $\delta \in (0,r)$ such that, for all $z \in \overline{B}(y,r)$ with $|z-y|<\delta$ and all $x \in X$,
\begin{align*}
|\phi(z,x)-\phi(y,x)| \leq \varepsilon.
\end{align*}
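Since $\phi(z,x) \leq \phi(y,x)+\varepsilon \leq F(y)+\varepsilon$ and $\phi(y,x) \leq \phi(z,x)+\varepsilon \leq F(z)+\varepsilon$ for every $x \in X$, taking suprema over $x \in X$ gives both inequalities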
\begin{align*}
F(z) \leq F(y)+\varepsilon
\end{align*}
and
\begin{align*}
F(y) \leq F(z)+\varepsilon.
\end{align*}
Thus
\begin{align*}
|F(z)-F(y)| \leq \varepsilon,
\end{align*}
so $F$ is continuous at $y$.
[/guided]
[/step]
[step:Obtain the lower directional bound from active maximizers]
Fix $h \in \mathbb{R}^k$. For every $x \in X^*(y)$ and every $t>0$, the bounds $F(y+th) \geq \phi(y+th,x)$ and $F(y)=\phi(y,x)$ give
\begin{align*}
\frac{F(y+th)-F(y)}{t} \geq \frac{\phi(y+th,x)-\phi(y,x)}{t}.
\end{align*}
Since $\phi(\cdot,x)$ is differentiable at $y$, the right-hand side converges to $\nabla_y\phi(y,x)\cdot h$ as $t \downarrow 0$. Taking the lower limit on the left and then the supremum over $x \in X^*(y)$ gives
\begin{align*}
\liminf_{t \downarrow 0} \frac{F(y+th)-F(y)}{t} \geq \sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h.
\end{align*}
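A minimal numerical sketch of the lower bound follows. It assumes a hypothetical instance that is not taken from the source: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$, so $\nabla_y\phi(y,x)=y-x$. At $y=0$ both points $(\pm1,0)$ are active, with gradients $(\mp1,0)$, so for $h=(1,2)$ the right-hand side equals $1$. For shrinking $t$, the script prints the difference quotient of $F$ next to the quotients of each active $\phi(\cdot,x)$.

```python
# Hypothetical instance (not from the source).
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def grad_phi(y, x):
    # Gradient of phi(., x) at y.
    return (y[0] - x[0], y[1] - x[1])

def F(y):
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    Fy = F(y)
    return [x for x in X if phi(y, x) >= Fy - tol]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

y, h = (0.0, 0.0), (1.0, 2.0)
active = active_set(y)
lower = max(dot(grad_phi(y, x), h) for x in active)  # sup over X*(y)
results = []
for t in [1e-1, 1e-3, 1e-5]:
    yt = (y[0] + t * h[0], y[1] + t * h[1])
    q = (F(yt) - F(y)) / t  # difference quotient of F
    per_x = [(phi(yt, x) - phi(y, x)) / t for x in active]  # each is <= q
    results.append((t, q, per_x))
    print(f"t={t:g}  F-quotient={q:.6f}  active quotients={[round(p, 6) for p in per_x]}")
print("lower bound:", lower)
```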
[/step]
[step:Use nearby maximizers to prove the upper directional bound]
Let $(t_j)_{j=1}^{\infty}$ be any sequence in $(0,\infty)$ with $t_j \downarrow 0$. For each $j \in \mathbb{N}$, choose $x_j \in X^*(y+t_jh)$. This set is nonempty by the first step applied at the point $y+t_jh$. Since $X$ is compact, after passing to a subsequence there exists $x_0 \in X$ such that $x_j \to x_0$.
By joint continuity of $\phi$, maximality of $x_j$ at $y+t_jh$, and continuity of $F$ at $y$,
\begin{align*}
\phi(y,x_0) = \lim_{j \to \infty}\phi(y+t_jh,x_j) = \lim_{j \to \infty}F(y+t_jh)=F(y).
\end{align*}
Thus $x_0 \in X^*(y)$.
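The selection of nearby maximizers can be seen in a minimal Python sketch. The sketch assumes a hypothetical instance that is not taken from the source: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$. The helper `argmax_at` is an illustrative name. At $y=0$ both points $(\pm1,0)$ are active. For shrinking $t$, the script computes maximizers $x_t$ at $y+th$ and checks that their limit lies in $X^*(0)$.

```python
# Hypothetical instance (not from the source).
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def grad_phi(y, x):
    return (y[0] - x[0], y[1] - x[1])

def F(y):
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    Fy = F(y)
    return [x for x in X if phi(y, x) >= Fy - tol]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def argmax_at(z):
    # One maximizer of phi(z, .) over X (the first one in case of ties).
    return max(X, key=lambda x: phi(z, x))

y = (0.0, 0.0)
ts = [1e-1, 1e-2, 1e-3, 1e-4]
selected = {}
for h in [(1.0, 2.0), (-1.0, 2.0)]:
    # Maximizers x_t at the nearby points y + t h.
    xs = [argmax_at((y[0] + t * h[0], y[1] + t * h[1])) for t in ts]
    selected[h] = xs
    x0 = xs[-1]
    print(h, xs, x0 in active_set(y), dot(grad_phi(y, x0), h))
```

In this instance the direction $h$ decides which active point is selected. $h=(1,2)$ picks $(-1,0)$, and $h=(-1,2)$ picks $(1,0)$. In both cases $\nabla_y\phi(0,x_0)\cdot h=1$ equals the supremum over $X^*(0)$. The proof only needs the weaker bound by that supremum.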
For each $j$, the [fundamental theorem of calculus](/theorems/632) along the line segment $s \mapsto y+s h$ gives
\begin{align*}
\phi(y+t_jh,x_j)-\phi(y,x_j)=\int_0^{t_j} \nabla_y\phi(y+sh,x_j)\cdot h \, d\mathcal{L}^1(s).
\end{align*}
Since $x_j$ maximizes $\phi(y+t_jh,\cdot)$ and $\phi(y,x_j) \leq F(y)$,
\begin{align*}
F(y+t_jh)-F(y) \leq \phi(y+t_jh,x_j)-\phi(y,x_j).
\end{align*}
Hence
\begin{align*}
\frac{F(y+t_jh)-F(y)}{t_j} \leq \frac{1}{t_j}\int_0^{t_j} \nabla_y\phi(y+sh,x_j)\cdot h \, d\mathcal{L}^1(s).
\end{align*}
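Along this subsequence, $(y+sh,x_j)\to (y,x_0)$ uniformly in $s\in[0,t_j]$. Since $(z,x)\mapsto \nabla_y\phi(z,x)$ is continuous at $(y,x_0)$, the integrand is uniformly close to $\nabla_y\phi(y,x_0)\cdot h$ for large $j$. Averaging over $[0,t_j]$ therefore gives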
\begin{align*}
\lim_{j \to \infty}\frac{1}{t_j}\int_0^{t_j} \nabla_y\phi(y+sh,x_j)\cdot h \, d\mathcal{L}^1(s)=\nabla_y\phi(y,x_0)\cdot h.
\end{align*}
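Now take $(t_j)$ to be a sequence along which the difference quotients $\frac{F(y+t_jh)-F(y)}{t_j}$ converge to their upper limit as $t \downarrow 0$ (possibly $+\infty$). The extracted subsequence has the same limit. Since $x_0 \in X^*(y)$, the last two displays show that this upper limit is bounded above by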
\begin{align*}
\sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h.
\end{align*}
Therefore
\begin{align*}
\limsup_{t \downarrow 0} \frac{F(y+th)-F(y)}{t} \leq \sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h.
\end{align*}
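Combining this with the lower bound shows that the limit $\lim_{t \downarrow 0} \frac{F(y+th)-F(y)}{t}$ exists and equals $\sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h$. This is the directional derivative formula.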
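The formula can be compared with one-sided difference quotients in a minimal Python sketch. The sketch assumes a hypothetical instance that is not taken from the source: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$. The helpers `formula` and `quotient` and the step `t=1e-7` are illustrative choices.

```python
# Hypothetical instance (not from the source).
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def grad_phi(y, x):
    return (y[0] - x[0], y[1] - x[1])

def F(y):
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    Fy = F(y)
    return [x for x in X if phi(y, x) >= Fy - tol]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def formula(y, h):
    # sup over X*(y) of grad_y phi(y, x) . h
    return max(dot(grad_phi(y, x), h) for x in active_set(y))

def quotient(y, h, t=1e-7):
    # One-sided difference quotient (F(y + t h) - F(y)) / t for small t > 0.
    return (F((y[0] + t * h[0], y[1] + t * h[1])) - F(y)) / t

y = (0.0, 0.0)
checks = []
for h in [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (3.0, -2.0)]:
    checks.append((h, formula(y, h), quotient(y, h)))
    print(h, checks[-1][1], round(checks[-1][2], 6))
```

At $y=0$ the formula reduces in this instance to $h\mapsto|h_1|$. It takes the value $1$ at both $h=(1,0)$ and $h=(-1,0)$. The one-sided derivative is therefore not linear in $h$, and $F$ is not differentiable at $0$.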
[/step]
[step:Convert active gradients into subgradients]
Fix $x \in X^*(y)$. Since $\phi(\cdot,x)$ is convex and differentiable at $y$, the first-order characterization of convexity gives, for every $z \in \mathbb{R}^k$,
\begin{align*}
\phi(z,x) \geq \phi(y,x)+\nabla_y\phi(y,x)\cdot (z-y).
\end{align*}
Because $F(z)\geq \phi(z,x)$ and $\phi(y,x)=F(y)$, we obtain
\begin{align*}
F(z) \geq F(y)+\nabla_y\phi(y,x)\cdot (z-y).
\end{align*}
Thus $\nabla_y\phi(y,x)\in \partial F(y)$.
Now let $m \in \mathbb{N}$, let $x_1,\dots,x_m \in X^*(y)$, and let $\lambda_1,\dots,\lambda_m \in [0,1]$ satisfy $\sum_{i=1}^m \lambda_i=1$. Define
\begin{align*}
g = \sum_{i=1}^m \lambda_i \nabla_y\phi(y,x_i).
\end{align*}
Multiplying the subgradient inequality for each $x_i$ by $\lambda_i$, summing over $i$, and using $\sum_{i=1}^m \lambda_i=1$ gives, for every $z \in \mathbb{R}^k$,
\begin{align*}
F(z) \geq F(y)+g\cdot (z-y).
\end{align*}
Therefore $g \in \partial F(y)$, and hence
\begin{align*}
\operatorname{conv}\{\nabla_y \phi(y,x):x\in X^*(y)\}\subseteq \partial F(y).
\end{align*}
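The inclusion can be probed numerically with a minimal Python sketch. The sketch assumes a hypothetical instance that is not taken from the source: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$. At $y=0$ the active gradients are $(-1,0)$ and $(1,0)$, so their convex hull is the segment $[-1,1]\times\{0\}$. The script draws random convex weights and random points $z$, and records the smallest slack in $F(z)\geq F(y)+g\cdot(z-y)$.

```python
import random

# Hypothetical instance (not from the source).
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def grad_phi(y, x):
    return (y[0] - x[0], y[1] - x[1])

def F(y):
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    Fy = F(y)
    return [x for x in X if phi(y, x) >= Fy - tol]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

random.seed(1)
y = (0.0, 0.0)
grads = [grad_phi(y, x) for x in active_set(y)]  # active gradients at y
worst_slack = float("inf")
for _ in range(2000):
    # Random positive weights, normalised to sum to 1 (a convex combination).
    w = [random.random() + 1e-9 for _ in grads]
    s = sum(w)
    g = (sum(wi * gi[0] for wi, gi in zip(w, grads)) / s,
         sum(wi * gi[1] for wi, gi in zip(w, grads)) / s)
    z = (random.uniform(-3, 3), random.uniform(-3, 3))
    slack = F(z) - (F(y) + dot(g, (z[0] - y[0], z[1] - y[1])))
    worst_slack = min(worst_slack, slack)
print(f"smallest slack: {worst_slack:.3e}")
```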
[/step]
[step:Derive differentiability from uniqueness of the active maximizer]
Assume $X^*(y)=\{x^*(y)\}$. Define
\begin{align*}
g=\nabla_y\phi(y,x^*(y)).
\end{align*}
We prove that $F(y+u)-F(y)-g\cdot u=o(|u|)$ as $u \to 0$.
For the lower bound, the subgradient inequality from the previous step gives
\begin{align*}
F(y+u)-F(y)-g\cdot u \geq 0
\end{align*}
for every $u \in \mathbb{R}^k$.
For the upper bound, let $(u_j)_{j=1}^{\infty}$ be any sequence in $\mathbb{R}^k\setminus\{0\}$ with $u_j \to 0$. Choose $x_j \in X^*(y+u_j)$. By compactness of $X$, after passing to a subsequence, $x_j \to x_0$ for some $x_0 \in X$. The continuity of $F$ at $y$ and of $\phi$ imply
\begin{align*}
\phi(y,x_0)=\lim_{j\to\infty}\phi(y+u_j,x_j)=\lim_{j\to\infty}F(y+u_j)=F(y).
\end{align*}
Hence $x_0 \in X^*(y)$, so uniqueness gives $x_0=x^*(y)$.
For each $j$, using maximality of $x_j$ at $y+u_j$ and the bound $\phi(y,x_j)\leq F(y)$, we get
\begin{align*}
F(y+u_j)-F(y) \leq \phi(y+u_j,x_j)-\phi(y,x_j).
\end{align*}
By the fundamental theorem of calculus along $s \mapsto y+s u_j$,
\begin{align*}
\phi(y+u_j,x_j)-\phi(y,x_j)=\int_0^1 \nabla_y\phi(y+s u_j,x_j)\cdot u_j \, d\mathcal{L}^1(s).
\end{align*}
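Subtracting $g\cdot u_j=\int_0^1 g\cdot u_j \, d\mathcal{L}^1(s)$ and applying the Cauchy--Schwarz inequality to the integrand, we obtain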
\begin{align*}
\frac{F(y+u_j)-F(y)-g\cdot u_j}{|u_j|} \leq \sup_{0\leq s\leq 1}\left|\nabla_y\phi(y+s u_j,x_j)-g\right|.
\end{align*}
Since $u_j \to 0$, $x_j \to x^*(y)$ along the chosen subsequence, and $(z,x)\mapsto \nabla_y\phi(z,x)$ is continuous, the right-hand side tends to $0$.
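Consequently, every sequence $u_j \to 0$ in $\mathbb{R}^k\setminus\{0\}$ has a subsequence along which the upper limit of
\begin{align*}
\frac{F(y+u)-F(y)-g\cdot u}{|u|}
\end{align*}
is at most $0$. Hence the upper limit of this quotient as $u \to 0$ is at most $0$. The lower bound shows that the quotient is nonnegative, so it tends to $0$; that is,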
\begin{align*}
F(y+u)=F(y)+g\cdot u+o(|u|)
\end{align*}
as $u\to 0$. This is precisely differentiability of $F$ at $y$, with
\begin{align*}
\nabla F(y)=g=\nabla_y\phi(y,x^*(y)).
\end{align*}
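This conclusion can be illustrated with a minimal numerical sketch. The sketch assumes a hypothetical instance that is not taken from the source: $X=\{(1,0),(-1,0),(0,\tfrac12)\}$ and $\phi(y,x)=\tfrac12|y-x|^2$. At $y=(0.5,0)$ the only active point is $x^*(y)=(-1,0)$, so the claimed gradient is $g=y-x^*(y)=(1.5,0)$. The script samples eight directions $u$ on circles of shrinking radius. It reports the largest value of $|F(y+u)-F(y)-g\cdot u|/|u|$, which should tend to $0$.

```python
import math

# Hypothetical instance (not from the source).
X = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5)]

def phi(y, x):
    return 0.5 * ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2)

def grad_phi(y, x):
    return (y[0] - x[0], y[1] - x[1])

def F(y):
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    Fy = F(y)
    return [x for x in X if phi(y, x) >= Fy - tol]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

y = (0.5, 0.0)
(x_star,) = active_set(y)  # unpacking fails unless the maximizer is unique
g = grad_phi(y, x_star)
radii = [1e-1, 1e-2, 1e-3, 1e-4]
ratios = []
for r in radii:
    worst = 0.0
    for k in range(8):
        theta = 2 * math.pi * k / 8
        u = (r * math.cos(theta), r * math.sin(theta))
        rem = F((y[0] + u[0], y[1] + u[1])) - F(y) - dot(g, u)
        worst = max(worst, abs(rem) / r)
    ratios.append(worst)
print("gradient:", g)
print("max |remainder| / |u| by radius:", ratios)
```

Here the remainder equals $\tfrac12|u|^2$ for small $u$, so the ratios are close to $\tfrac12 r$.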
[/step]