Danskin Envelope Theorem — Statement & Proof

Theorem

Proof

[proofplan] We first record the compactness consequences: the supremum defining $F$ is attained, the active set is nonempty, and $F$ is locally continuous. The directional derivative formula is proved by two inequalities: active maximizers at $y$ give the lower bound, while maximizers at nearby points $y+th$ give the upper bound after compactness and continuity allow us to pass to a limiting active maximizer. The subgradient inclusion follows from the convexity of each active function $\phi(\cdot,x)$, and the uniqueness case follows by showing that the first-order error is $o(|u|)$ uniformly over nearby maximizers. [/proofplan] [step:Establish attainment, convexity, and local continuity of the envelope] Fix $y \in \mathbb{R}^k$. Since $X$ is compact and the map $x \mapsto \phi(y,x)$ from $X$ to $\mathbb{R}$ is continuous, the maximum of $\phi(y,\cdot)$ over $X$ is attained. Hence $X^*(y)$ is nonempty. For every $y_0,y_1 \in \mathbb{R}^k$ and every $\lambda \in [0,1]$, convexity of $\phi(\cdot,x)$ gives \begin{align*} \phi(\lambda y_0+(1-\lambda)y_1,x) \leq \lambda \phi(y_0,x)+(1-\lambda)\phi(y_1,x) \end{align*} for every $x \in X$. Taking the supremum over $x \in X$ and using $\phi(y_i,x) \leq F(y_i)$ for $i \in \{0,1\}$ gives \begin{align*} F(\lambda y_0+(1-\lambda)y_1) \leq \lambda F(y_0)+(1-\lambda)F(y_1). \end{align*} Thus $F$ is convex. We also need continuity at the fixed point $y$. Choose $r>0$, and define the compact set $K_r=\overline{B}(y,r) \times X \subset \mathbb{R}^k \times X$. The restriction of $\phi$ to $K_r$ is uniformly continuous. Therefore, for every $\varepsilon>0$, there exists $\delta \in (0,r)$ such that, whenever $z \in \overline{B}(y,r)$ and $|z-y|<\delta$, \begin{align*} |\phi(z,x)-\phi(y,x)| \leq \varepsilon \end{align*} for every $x \in X$. Taking suprema over $x \in X$ gives \begin{align*} |F(z)-F(y)| \leq \varepsilon. \end{align*} Hence $F$ is continuous at $y$. [guided] The compactness assumption on $X$ is used immediately. 
For the fixed point $y \in \mathbb{R}^k$, the function $x \mapsto \phi(y,x)$ is continuous on the compact set $X$, so it attains its maximum. Since $F(y)$ is the supremum of the same function over $X$, this maximum equals $F(y)$. Thus there is at least one $x \in X$ such that $\phi(y,x)=F(y)$, which proves $X^*(y)\neq \varnothing$. Next we verify that $F$ is convex. Let $y_0,y_1 \in \mathbb{R}^k$ and $\lambda \in [0,1]$. For each fixed $x \in X$, the map $\phi(\cdot,x):\mathbb{R}^k \to \mathbb{R}$ is convex, so \begin{align*} \phi(\lambda y_0+(1-\lambda)y_1,x) \leq \lambda \phi(y_0,x)+(1-\lambda)\phi(y_1,x). \end{align*} Since $\phi(y_0,x) \leq F(y_0)$ and $\phi(y_1,x) \leq F(y_1)$ for every $x \in X$, we get \begin{align*} \phi(\lambda y_0+(1-\lambda)y_1,x) \leq \lambda F(y_0)+(1-\lambda)F(y_1). \end{align*} Taking the supremum over $x \in X$ yields \begin{align*} F(\lambda y_0+(1-\lambda)y_1) \leq \lambda F(y_0)+(1-\lambda)F(y_1). \end{align*} Therefore $F$ is convex. Finally, we prove local continuity of $F$ at $y$. This matters because, in the upper-bound argument, we will choose maximizers $x_t$ at the nearby points $y+th$ and then pass to a limit. To identify the limit as active at $y$, we must know that $F(y+th) \to F(y)$. Choose $r>0$ and define $K_r=\overline{B}(y,r)\times X$. This set is compact because $\overline{B}(y,r)$ and $X$ are compact. The map $\phi: \mathbb{R}^k \times X \to \mathbb{R}$ is continuous, so its restriction to $K_r$ is uniformly continuous. Hence, for every $\varepsilon>0$, there exists $\delta \in (0,r)$ such that, for all $z \in \overline{B}(y,r)$ with $|z-y|<\delta$ and all $x \in X$, \begin{align*} |\phi(z,x)-\phi(y,x)| \leq \varepsilon. \end{align*} Taking suprema over $x \in X$ gives both inequalities \begin{align*} F(z) \leq F(y)+\varepsilon \end{align*} and \begin{align*} F(y) \leq F(z)+\varepsilon. \end{align*} Thus \begin{align*} |F(z)-F(y)| \leq \varepsilon, \end{align*} so $F$ is continuous at $y$. 
[/guided] [/step] [step:Obtain the lower directional bound from active maximizers] Fix $h \in \mathbb{R}^k$. For every $x \in X^*(y)$ and every $t>0$, we have $F(y+th) \geq \phi(y+th,x)$ and $F(y)=\phi(y,x)$, so \begin{align*} \frac{F(y+th)-F(y)}{t} \geq \frac{\phi(y+th,x)-\phi(y,x)}{t}. \end{align*} Since $\phi(\cdot,x)$ is differentiable at $y$, the right-hand side converges to $\nabla_y\phi(y,x)\cdot h$ as $t \downarrow 0$. Hence the lower limit of the left-hand side is at least $\nabla_y\phi(y,x)\cdot h$ for each $x \in X^*(y)$, and taking the supremum over $x \in X^*(y)$ gives \begin{align*} \liminf_{t \downarrow 0} \frac{F(y+th)-F(y)}{t} \geq \sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h. \end{align*} [/step] [step:Use nearby maximizers to prove the upper directional bound] Let $(t_j)_{j=1}^{\infty}$ be any sequence in $(0,\infty)$ with $t_j \downarrow 0$. For each $j \in \mathbb{N}$, choose $x_j \in X^*(y+t_jh)$, which is possible because the attainment argument of the first step, applied at the point $y+t_jh$, shows that this set is nonempty. Since $X$ is compact, after passing to a subsequence there exists $x_0 \in X$ such that $x_j \to x_0$. By continuity of $\phi$ and continuity of $F$ at $y$, \begin{align*} \phi(y,x_0) = \lim_{j \to \infty}\phi(y+t_jh,x_j) = \lim_{j \to \infty}F(y+t_jh)=F(y). \end{align*} Thus $x_0 \in X^*(y)$. For each $j$, the function $s \mapsto \phi(y+sh,x_j)$ is continuously differentiable because $\nabla_y\phi$ is continuous, so the [fundamental theorem of calculus](/theorems/632) along the line segment $s \mapsto y+s h$ gives \begin{align*} \phi(y+t_jh,x_j)-\phi(y,x_j)=\int_0^{t_j} \nabla_y\phi(y+sh,x_j)\cdot h \, d\mathcal{L}^1(s). \end{align*} Since $x_j$ maximizes at $y+t_jh$, we have $F(y+t_jh)=\phi(y+t_jh,x_j)$, while $F(y) \geq \phi(y,x_j)$ by definition of $F$. Therefore \begin{align*} F(y+t_jh)-F(y) \leq \phi(y+t_jh,x_j)-\phi(y,x_j). \end{align*} Hence \begin{align*} \frac{F(y+t_jh)-F(y)}{t_j} \leq \frac{1}{t_j}\int_0^{t_j} \nabla_y\phi(y+sh,x_j)\cdot h \, d\mathcal{L}^1(s). \end{align*} The continuity of $(z,x)\mapsto \nabla_y\phi(z,x)$ and the convergence $(y+sh,x_j)\to (y,x_0)$ uniformly for $0\leq s\leq t_j$ show that the integrand differs from $\nabla_y\phi(y,x_0)\cdot h$ by at most $|h|\sup_{0\leq s\leq t_j}|\nabla_y\phi(y+sh,x_j)-\nabla_y\phi(y,x_0)|$, which tends to $0$. Consequently \begin{align*} \lim_{j \to \infty}\frac{1}{t_j}\int_0^{t_j} \nabla_y\phi(y+sh,x_j)\cdot h \, d\mathcal{L}^1(s)=\nabla_y\phi(y,x_0)\cdot h. \end{align*} Since $x_0 \in X^*(y)$, every sequence $t_j \downarrow 0$ has a subsequence along which the upper limit of the difference quotients is bounded above by \begin{align*} \sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h. 
\end{align*} Therefore \begin{align*} \limsup_{t \downarrow 0} \frac{F(y+th)-F(y)}{t} \leq \sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h. \end{align*} Combining this with the lower bound proves the directional derivative formula. [/step] [step:Convert active gradients into subgradients] Fix $x \in X^*(y)$. Since $\phi(\cdot,x)$ is convex and differentiable at $y$, its first-order convexity inequality gives, for every $z \in \mathbb{R}^k$, \begin{align*} \phi(z,x) \geq \phi(y,x)+\nabla_y\phi(y,x)\cdot (z-y). \end{align*} Because $F(z)\geq \phi(z,x)$ and $\phi(y,x)=F(y)$, we obtain \begin{align*} F(z) \geq F(y)+\nabla_y\phi(y,x)\cdot (z-y). \end{align*} Thus $\nabla_y\phi(y,x)\in \partial F(y)$. Now let $m \in \mathbb{N}$, let $x_1,\dots,x_m \in X^*(y)$, and let $\lambda_1,\dots,\lambda_m \in [0,1]$ satisfy $\sum_{i=1}^m \lambda_i=1$. Define \begin{align*} g = \sum_{i=1}^m \lambda_i \nabla_y\phi(y,x_i). \end{align*} Multiplying the subgradient inequality for each $x_i$ by $\lambda_i$ and summing over $i$ gives, for every $z \in \mathbb{R}^k$, \begin{align*} F(z) \geq F(y)+g\cdot (z-y). \end{align*} Therefore $g \in \partial F(y)$, and hence \begin{align*} \operatorname{conv}\{\nabla_y \phi(y,x):x\in X^*(y)\}\subseteq \partial F(y). \end{align*} [/step] [step:Derive differentiability from uniqueness of the active maximizer] Assume $X^*(y)=\{x^*(y)\}$. Define \begin{align*} g=\nabla_y\phi(y,x^*(y)). \end{align*} We prove that $F(y+u)-F(y)-g\cdot u=o(|u|)$ as $u \to 0$. For the lower bound, the subgradient inequality from the previous step gives \begin{align*} F(y+u)-F(y)-g\cdot u \geq 0 \end{align*} for every $u \in \mathbb{R}^k$. For the upper bound, let $(u_j)_{j=1}^{\infty}$ be any sequence in $\mathbb{R}^k\setminus\{0\}$ with $u_j \to 0$. Choose $x_j \in X^*(y+u_j)$. By compactness of $X$, after passing to a subsequence, $x_j \to x_0$ for some $x_0 \in X$. 
Continuity of $\phi$, together with continuity of $F$ at $y$, implies \begin{align*} \phi(y,x_0)=\lim_{j\to\infty}\phi(y+u_j,x_j)=\lim_{j\to\infty}F(y+u_j)=F(y). \end{align*} Hence $x_0 \in X^*(y)$, so uniqueness gives $x_0=x^*(y)$. For each $j$, using maximality of $x_j$ at $y+u_j$, that is $F(y+u_j)=\phi(y+u_j,x_j)$, and the inequality $\phi(y,x_j)\leq F(y)$, we get \begin{align*} F(y+u_j)-F(y) \leq \phi(y+u_j,x_j)-\phi(y,x_j). \end{align*} By the fundamental theorem of calculus along $s \mapsto y+s u_j$, \begin{align*} \phi(y+u_j,x_j)-\phi(y,x_j)=\int_0^1 \nabla_y\phi(y+s u_j,x_j)\cdot u_j \, d\mathcal{L}^1(s). \end{align*} Subtracting $g\cdot u_j=\int_0^1 g\cdot u_j \, d\mathcal{L}^1(s)$ from both sides, applying the Cauchy–Schwarz inequality to the integrand, and dividing by $|u_j|$ gives \begin{align*} \frac{F(y+u_j)-F(y)-g\cdot u_j}{|u_j|} \leq \sup_{0\leq s\leq 1}\left|\nabla_y\phi(y+s u_j,x_j)-g\right|. \end{align*} Since $u_j \to 0$, $x_j \to x^*(y)$ along the chosen subsequence, and $(z,x)\mapsto \nabla_y\phi(z,x)$ is continuous, the right-hand side tends to $0$ along that subsequence. Thus every sequence $u_j \to 0$ has a subsequence along which the upper limit of \begin{align*} \frac{F(y+u)-F(y)-g\cdot u}{|u|} \end{align*} is at most $0$, while the lower bound shows that this quotient is always at least $0$. Hence \begin{align*} F(y+u)=F(y)+g\cdot u+o(|u|) \end{align*} as $u\to 0$. This is precisely differentiability of $F$ at $y$, with \begin{align*} \nabla F(y)=g=\nabla_y\phi(y,x^*(y)). \end{align*} [/step]
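
As a numerical sanity check of the directional derivative formula, the following minimal sketch compares the right-hand side $\sup_{x \in X^*(y)} \nabla_y\phi(y,x)\cdot h$ with a one-sided finite difference of $F$. The instance is a hypothetical illustration and is not taken from the theorem: $X$ is a two-point (hence compact) subset of $\mathbb{R}^2$ and $\phi(y,x)=\tfrac12|y-x|^2$, which is convex and continuously differentiable in $y$ with $\nabla_y\phi(y,x)=y-x$. All function names, the tolerance, and the step size are assumptions chosen for the example.

```python
# Hypothetical instance (an assumption for illustration only):
# X is a finite, hence compact, set and phi(y, x) = 0.5 * |y - x|^2.
X = [(-1.0, 0.0), (1.0, 0.0)]

def phi(y, x):
    """phi(y, x) = 0.5 * |y - x|^2, convex and differentiable in y."""
    return 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))

def grad_phi(y, x):
    """Gradient of phi with respect to y, namely y - x."""
    return tuple(yi - xi for yi, xi in zip(y, x))

def F(y):
    """Envelope F(y) = max over x in X of phi(y, x)."""
    return max(phi(y, x) for x in X)

def active_set(y, tol=1e-12):
    """Points of X attaining the maximum at y, up to the tolerance tol."""
    top = F(y)
    return [x for x in X if top - phi(y, x) <= tol]

def danskin_directional(y, h):
    """Maximum over active x of grad_phi(y, x) . h (the formula's right-hand side)."""
    return max(
        sum(g * hi for g, hi in zip(grad_phi(y, x), h))
        for x in active_set(y)
    )

def finite_difference(y, h, t=1e-6):
    """One-sided difference quotient (F(y + t h) - F(y)) / t."""
    shifted = tuple(yi + t * hi for yi, hi in zip(y, h))
    return (F(shifted) - F(y)) / t

if __name__ == "__main__":
    y = (0.0, 1.0)  # both points of X are active here
    for h in [(1.0, 0.0), (-1.0, 0.0), (0.3, -0.7)]:
        print(h, danskin_directional(y, h), finite_difference(y, h))
```

At $y=(0,1)$ both points of $X$ are active, the active gradients are $(1,1)$ and $(-1,1)$, and the formula reduces to $h_2+|h_1|$, so $F$ is not differentiable there. At $y=(1,0)$ the active set is the single point $(-1,0)$, and the uniqueness case gives $\nabla F(y)=(2,0)$. The finite difference agrees with the formula only up to an error of order $t$.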

Prerequisites

Theorems

Fundamental Theorem Of Calculus

Definitions & Concepts

Continuity
