[proofplan]
The [Slater strong duality](/theorems/6684) theorem with the stated closedness hypothesis gives optimal dual multipliers, and zero duality gap identifies the Lagrangian value at $x^*$ with the primal optimum. This identity forces [complementary slackness](/theorems/2559), while optimality of the dual multipliers forces $x^*$ to minimize the Lagrangian over $X$. Differentiability at $x^*$ then converts this constrained minimization statement into the normal-cone stationarity inclusion. Finally, when the data are convex and affine, the same stationarity inequality gives global minimality of the Lagrangian, and feasibility plus complementary slackness identify that lower bound with $f_0(x^*)$.
[/proofplan]
[step:Choose optimal dual multipliers from strong duality]
Let $U \subset \mathbb{R}^n$ be the [open set](/page/Open%20Set) from the theorem statement on which the objective and constraint functions are defined. Define the Lagrangian map $L: U \times \mathbb{R}^m_+ \times \mathbb{R}^p \to \mathbb{R}$ by
\begin{align*}
L:(x,\lambda,\nu) \mapsto f_0(x)+\sum_{i=1}^m \lambda_i f_i(x)+\sum_{j=1}^p \nu_j h_j(x).
\end{align*}
Define the dual function $g: \mathbb{R}^m_+ \times \mathbb{R}^p \to [-\infty,\infty)$ by
\begin{align*}
g:(\lambda,\nu) \mapsto \inf_{x \in X} L(x,\lambda,\nu).
\end{align*}
The hypotheses needed for Slater strong duality are now explicitly in force: $X$ is convex, $f_0,f_1,\dots,f_m$ are convex on $X$, $h_1,\dots,h_p$ are affine, Slater's condition holds, and the stated closedness hypothesis holds. The theorem statement assumes the conclusion of that duality result, namely strong duality with dual attainment. Hence there are multipliers $\lambda^* \in \mathbb{R}^m_+$ and $\nu^* \in \mathbb{R}^p$ such that
\begin{align*}
g(\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Since $x^*$ is primal feasible, the primal feasibility relations
\begin{align*}
f_i(x^*) \leq 0 \text{ for } 1 \leq i \leq m
\end{align*}
and
\begin{align*}
h_j(x^*)=0 \text{ for } 1 \leq j \leq p
\end{align*}
hold by definition of primal optimality. Since $\lambda^* \in \mathbb{R}^m_+$, the dual feasibility relations $\lambda_i^* \geq 0$ hold for $1 \leq i \leq m$.
[/step]
[step:Derive complementary slackness from equality of primal and dual values]
Because $g(\lambda^*,\nu^*)$ is the infimum of $L(\cdot,\lambda^*,\nu^*)$ over $X$, we have
\begin{align*}
g(\lambda^*,\nu^*) \leq L(x^*,\lambda^*,\nu^*).
\end{align*}
Using feasibility of $x^*$ and the definition of $L$,
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*)+\sum_{j=1}^p \nu_j^* h_j(x^*).
\end{align*}
Since $h_j(x^*)=0$ for every $j$, this becomes
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*).
\end{align*}
For each $i$, dual feasibility gives $\lambda_i^* \geq 0$ and primal feasibility gives $f_i(x^*) \leq 0$, hence $\lambda_i^* f_i(x^*) \leq 0$. Therefore
\begin{align*}
L(x^*,\lambda^*,\nu^*) \leq f_0(x^*).
\end{align*}
Combining this with $g(\lambda^*,\nu^*)=f_0(x^*)$ and $g(\lambda^*,\nu^*) \leq L(x^*,\lambda^*,\nu^*)$ yields
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Thus
\begin{align*}
\sum_{i=1}^m \lambda_i^* f_i(x^*)=0.
\end{align*}
Every summand satisfies $\lambda_i^* f_i(x^*) \leq 0$, so each summand must be zero. Hence
\begin{align*}
\lambda_i^* f_i(x^*)=0 \text{ for } 1 \leq i \leq m.
\end{align*}
[guided]
The point of this step is to extract the equality conditions hidden inside zero duality gap. Since $g(\lambda^*,\nu^*)$ is defined as the infimum of $L(x,\lambda^*,\nu^*)$ over all $x \in X$, evaluating at the particular point $x^*$ gives
\begin{align*}
g(\lambda^*,\nu^*) \leq L(x^*,\lambda^*,\nu^*).
\end{align*}
Strong duality and dual attainment give
\begin{align*}
g(\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Now expand the Lagrangian at $x^*$. By definition,
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*)+\sum_{j=1}^p \nu_j^* h_j(x^*).
\end{align*}
The equality constraints vanish at $x^*$ because $x^*$ is feasible, so
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*).
\end{align*}
Each term in the remaining sum is nonpositive: $\lambda_i^* \geq 0$ by dual feasibility, while $f_i(x^*) \leq 0$ by primal feasibility. Therefore
\begin{align*}
L(x^*,\lambda^*,\nu^*) \leq f_0(x^*).
\end{align*}
Together with $f_0(x^*)=g(\lambda^*,\nu^*) \leq L(x^*,\lambda^*,\nu^*)$, this forces equality:
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Substituting the expanded expression for $L$ gives
\begin{align*}
\sum_{i=1}^m \lambda_i^* f_i(x^*)=0.
\end{align*}
A sum of nonpositive [real numbers](/page/Real%20Numbers) can equal zero only when every summand is zero. Hence
\begin{align*}
\lambda_i^* f_i(x^*)=0 \text{ for } 1 \leq i \leq m.
\end{align*}
This is complementary slackness.
[/guided]
[/step]
[step:Show that $x^*$ minimizes the optimal Lagrangian over $X$]
From the preceding step,
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Since $g(\lambda^*,\nu^*)=f_0(x^*)$, we obtain
\begin{align*}
L(x^*,\lambda^*,\nu^*)=g(\lambda^*,\nu^*).
\end{align*}
By definition of the dual function, $g(\lambda^*,\nu^*)=\inf_{x \in X} L(x,\lambda^*,\nu^*)$. Therefore
\begin{align*}
L(x^*,\lambda^*,\nu^*)=\inf_{x \in X} L(x,\lambda^*,\nu^*).
\end{align*}
Thus $x^*$ minimizes the function $x \mapsto L(x,\lambda^*,\nu^*)$ over $X$.
[/step]
[step:Convert Lagrangian minimality into the normal-cone stationarity condition]
Define the function $\Phi: U \to \mathbb{R}$ by
\begin{align*}
\Phi:x \mapsto L(x,\lambda^*,\nu^*).
\end{align*}
Since $f_0,f_1,\dots,f_m,h_1,\dots,h_p$ are differentiable at $x^*$ as functions on the open set $U$, the function $\Phi$ is differentiable at $x^*$ in the ambient Euclidean sense and
\begin{align*}
\nabla \Phi(x^*)=\nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*).
\end{align*}
Fix $x \in X$ and define the line segment map $\gamma_x: [0,1] \to X$ by
\begin{align*}
\gamma_x:t \mapsto x^*+t(x-x^*).
\end{align*}
This map has image in $X$ because $X$ is convex. Since $x^*$ minimizes $\Phi|_X$ over $X$, define the one-variable function $\psi_x: [0,1] \to \mathbb{R}$ by
\begin{align*}
\psi_x:t \mapsto \Phi(\gamma_x(t)).
\end{align*}
Then $\psi_x$ has a minimum at $t=0$. Differentiability of the ambient function $\Phi:U\to\mathbb R$ at $x^*$ gives the right derivative
\begin{align*}
\psi_x'(0+)=\nabla \Phi(x^*)\cdot(x-x^*).
\end{align*}
Because $\psi_x(t) \geq \psi_x(0)$ for all $t \in [0,1]$, this right derivative is nonnegative:
\begin{align*}
\nabla \Phi(x^*)\cdot(x-x^*) \geq 0.
\end{align*}
Equivalently,
\begin{align*}
(-\nabla \Phi(x^*))\cdot(x-x^*) \leq 0 \text{ for every } x \in X.
\end{align*}
By the definition of $N_X(x^*)$, this means $-\nabla \Phi(x^*) \in N_X(x^*)$, which is equivalent to
\begin{align*}
0 \in \nabla \Phi(x^*)+N_X(x^*).
\end{align*}
Substituting the formula for $\nabla \Phi(x^*)$ gives
\begin{align*}
0 \in \nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*)+N_X(x^*).
\end{align*}
[guided]
We now turn the global minimization statement for the Lagrangian into a first-order condition. From the preceding step, $x^*$ minimizes $L(\cdot,\lambda^*,\nu^*)$ over $X$. The gradients in the KKT condition are ambient Euclidean gradients, so we use the open set $U \subset \mathbb{R}^n$ from the theorem statement. Define $\Phi: U \to \mathbb{R}$ by
\begin{align*}
\Phi:x \mapsto L(x,\lambda^*,\nu^*).
\end{align*}
Because each function $f_0,f_1,\dots,f_m,h_1,\dots,h_p$ is differentiable at $x^*$ as a function on $U$, and because $\lambda_i^*$ and $\nu_j^*$ are fixed real coefficients, $\Phi$ is differentiable at $x^*$ in the ambient Euclidean sense with gradient
\begin{align*}
\nabla \Phi(x^*)=\nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*).
\end{align*}
We know from the previous step that $x^*$ minimizes $\Phi$ over the convex set $X$. To test the first-order consequence in an arbitrary feasible direction, fix $x \in X$ and define $\gamma_x: [0,1] \to X$ by
\begin{align*}
\gamma_x:t \mapsto x^*+t(x-x^*).
\end{align*}
The image of $\gamma_x$ lies in $X$ because $X$ is convex: every point $\gamma_x(t)$ is a convex combination of $x^*$ and $x$. Now define $\psi_x: [0,1] \to \mathbb{R}$ by
\begin{align*}
\psi_x:t \mapsto \Phi(\gamma_x(t)).
\end{align*}
Since $x^*$ minimizes $\Phi$ on $X$, the point $t=0$ minimizes $\psi_x$ on $[0,1]$. Hence
\begin{align*}
\psi_x(t)-\psi_x(0) \geq 0 \text{ for every } t \in [0,1].
\end{align*}
For $t>0$, divide by $t$:
\begin{align*}
\frac{\psi_x(t)-\psi_x(0)}{t} \geq 0.
\end{align*}
Taking the limit as $t \downarrow 0$ and using differentiability of the ambient map $\Phi:U\to\mathbb R$ at $x^*$ gives
\begin{align*}
\nabla \Phi(x^*)\cdot(x-x^*) \geq 0.
\end{align*}
This inequality holds for every $x \in X$. Rewriting it with the sign reversed gives
\begin{align*}
(-\nabla \Phi(x^*))\cdot(x-x^*) \leq 0 \text{ for every } x \in X.
\end{align*}
By the definition of the normal cone
\begin{align*}
N_X(x^*)=\{w \in \mathbb{R}^n: w\cdot(x-x^*)\leq 0 \text{ for every } x \in X\},
\end{align*}
we have $-\nabla \Phi(x^*) \in N_X(x^*)$. This is exactly
\begin{align*}
0 \in \nabla \Phi(x^*)+N_X(x^*).
\end{align*}
Substituting the expression for $\nabla \Phi(x^*)$ yields
\begin{align*}
0 \in \nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*)+N_X(x^*).
\end{align*}
This is the normal-cone stationarity condition.
[/guided]
[/step]
[step:Reduce stationarity to the gradient equation at an interior point]
Assume that $X$ is full-dimensional and $x^* \in \operatorname{int}(X)$. For $r>0$, let $B(x^*,r)$ denote the open Euclidean ball
\begin{align*}
B(x^*,r)=\{y \in \mathbb{R}^n: |y-x^*|<r\}.
\end{align*}
We show that $N_X(x^*)=\{0\}$. The vector $0 \in \mathbb{R}^n$ belongs to $N_X(x^*)$ by definition. Conversely, let $w \in N_X(x^*)$. Since $x^* \in \operatorname{int}(X)$, there exists $r>0$ such that $B(x^*,r) \subset X$. If $w \neq 0$, define $y \in \mathbb{R}^n$ by
\begin{align*}
y=x^*+\frac{r}{2|w|}w.
\end{align*}
Then $y \in B(x^*,r) \subset X$. Since $w \in N_X(x^*)$,
\begin{align*}
w\cdot(y-x^*) \leq 0.
\end{align*}
But the definition of $y$ gives
\begin{align*}
w\cdot(y-x^*)=\frac{r}{2|w|}|w|^2=\frac{r|w|}{2}>0,
\end{align*}
a contradiction. Hence $w=0$, so $N_X(x^*)=\{0\}$. The normal-cone stationarity inclusion therefore becomes
\begin{align*}
\nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*)=0.
\end{align*}
If $X$ is lower-dimensional, even points in $\operatorname{ri}(X)$ may have nonzero normal vectors orthogonal to the affine hull of $X$, so the preceding argument does not apply and the normal-cone inclusion remains the correct condition.
[guided]
The stationarity inclusion says that the negative gradient of the optimal Lagrangian may be balanced by a normal vector to the feasible set. At a genuine interior point of a full-dimensional feasible set, there are no nonzero normal vectors. We verify this directly from the definition.
Assume that $X$ is full-dimensional and $x^* \in \operatorname{int}(X)$. For $r>0$, let $B(x^*,r)$ denote the open Euclidean ball
\begin{align*}
B(x^*,r)=\{y \in \mathbb{R}^n: |y-x^*|<r\}.
\end{align*}
We prove $N_X(x^*)=\{0\}$. First, $0 \in N_X(x^*)$ because
\begin{align*}
0\cdot(x-x^*)=0 \leq 0 \text{ for every } x \in X.
\end{align*}
Conversely, let $w \in N_X(x^*)$. Since $x^* \in \operatorname{int}(X)$, there exists $r>0$ such that $B(x^*,r) \subset X$. Suppose, for contradiction, that $w \neq 0$. Define $y \in \mathbb{R}^n$ by
\begin{align*}
y=x^*+\frac{r}{2|w|}w.
\end{align*}
Then
\begin{align*}
|y-x^*|=\frac{r}{2|w|}|w|=\frac{r}{2}<r,
\end{align*}
so $y \in B(x^*,r) \subset X$. Because $w \in N_X(x^*)$, the definition of the normal cone gives
\begin{align*}
w\cdot(y-x^*) \leq 0.
\end{align*}
But the chosen point $y$ lies in the direction of $w$, and hence
\begin{align*}
w\cdot(y-x^*)=\frac{r}{2|w|}|w|^2=\frac{r|w|}{2}>0.
\end{align*}
This contradiction shows that no nonzero $w$ belongs to $N_X(x^*)$. Therefore $N_X(x^*)=\{0\}$.
With the normal cone reduced to $\{0\}$, the stationarity inclusion
\begin{align*}
0 \in \nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*)+N_X(x^*)
\end{align*}
becomes the gradient equation
\begin{align*}
\nabla f_0(x^*)+\sum_{i=1}^m \lambda_i^*\nabla f_i(x^*)+\sum_{j=1}^p \nu_j^*\nabla h_j(x^*)=0.
\end{align*}
The full-dimensional interior assumption is essential for this reduction. If $X$ is lower-dimensional, a point may lie in $\operatorname{ri}(X)$ while still admitting nonzero normal vectors perpendicular to the affine hull of $X$. In that case the normal-cone inclusion, not the unconstrained gradient equation, is the correct stationarity condition.
[/guided]
[/step]
[step:Prove sufficiency for convex problems]
Assume now that $f_0,f_1,\dots,f_m$ are convex on $X$ and $h_1,\dots,h_p$ are affine on $X$. Let $x^* \in X$ be primal feasible, and suppose there exist $\lambda^* \in \mathbb{R}^m_+$ and $\nu^* \in \mathbb{R}^p$ satisfying complementary slackness and normal-cone stationarity.
Define $\Phi: X \to \mathbb{R}$ by
\begin{align*}
\Phi:x \mapsto L(x,\lambda^*,\nu^*).
\end{align*}
Because $f_0,f_1,\dots,f_m$ are convex, $\lambda_i^* \geq 0$ for every $i$, and $h_1,\dots,h_p$ are affine, the function $\Phi$ is convex on $X$. The stationarity condition says that there exists $w^* \in N_X(x^*)$ such that
\begin{align*}
\nabla \Phi(x^*)+w^*=0.
\end{align*}
Thus $w^*=-\nabla \Phi(x^*)$. Since $w^* \in N_X(x^*)$, for every $x \in X$,
\begin{align*}
w^*\cdot(x-x^*) \leq 0.
\end{align*}
Equivalently,
\begin{align*}
\nabla \Phi(x^*)\cdot(x-x^*) \geq 0.
\end{align*}
Convexity of $\Phi$ gives the first-order convexity inequality
\begin{align*}
\Phi(x) \geq \Phi(x^*)+\nabla \Phi(x^*)\cdot(x-x^*) \text{ for every } x \in X.
\end{align*}
Therefore
\begin{align*}
\Phi(x) \geq \Phi(x^*) \text{ for every } x \in X.
\end{align*}
Now let $x \in X$ be any primal feasible point. Since $\lambda_i^* \geq 0$ and $f_i(x) \leq 0$,
\begin{align*}
\sum_{i=1}^m \lambda_i^* f_i(x) \leq 0.
\end{align*}
Since $h_j(x)=0$ for every $j$,
\begin{align*}
L(x,\lambda^*,\nu^*) \leq f_0(x).
\end{align*}
At $x^*$, complementary slackness and equality feasibility give
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Combining these inequalities,
\begin{align*}
f_0(x) \geq L(x,\lambda^*,\nu^*) \geq L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Thus $f_0(x) \geq f_0(x^*)$ for every primal feasible $x \in X$, so $x^*$ is a primal optimum.
[guided]
We now prove the converse direction for convex problems. Assume $f_0,f_1,\dots,f_m$ are convex on $X$ and $h_1,\dots,h_p$ are affine on $X$. Let $x^* \in X$ be primal feasible, and suppose there are multipliers $\lambda^* \in \mathbb{R}^m_+$ and $\nu^* \in \mathbb{R}^p$ satisfying complementary slackness and normal-cone stationarity.
Define $\Phi: X \to \mathbb{R}$ by
\begin{align*}
\Phi:x \mapsto L(x,\lambda^*,\nu^*).
\end{align*}
Why is $\Phi$ convex? The function $f_0$ is convex, each term $\lambda_i^* f_i$ is convex because $\lambda_i^* \geq 0$ and $f_i$ is convex, and each term $\nu_j^* h_j$ is affine because $h_j$ is affine. A sum of convex functions and affine functions is convex, so $\Phi$ is convex on $X$.
The stationarity condition says that there exists $w^* \in N_X(x^*)$ such that
\begin{align*}
\nabla \Phi(x^*)+w^*=0.
\end{align*}
Thus $w^*=-\nabla \Phi(x^*)$. Since $w^* \in N_X(x^*)$, the definition of the normal cone gives, for every $x \in X$,
\begin{align*}
w^*\cdot(x-x^*) \leq 0.
\end{align*}
Substituting $w^*=-\nabla \Phi(x^*)$ gives
\begin{align*}
\nabla \Phi(x^*)\cdot(x-x^*) \geq 0.
\end{align*}
Because $\Phi$ is convex and differentiable at $x^*$, the first-order convexity inequality gives
\begin{align*}
\Phi(x) \geq \Phi(x^*)+\nabla \Phi(x^*)\cdot(x-x^*) \text{ for every } x \in X.
\end{align*}
Combining this with the preceding nonnegative directional inequality yields
\begin{align*}
\Phi(x) \geq \Phi(x^*) \text{ for every } x \in X.
\end{align*}
Thus $x^*$ minimizes the Lagrangian with the proposed multipliers over all of $X$.
Now take any primal feasible point $x \in X$. Since $\lambda_i^* \geq 0$ and $f_i(x) \leq 0$, each product $\lambda_i^* f_i(x)$ is nonpositive, so
\begin{align*}
\sum_{i=1}^m \lambda_i^* f_i(x) \leq 0.
\end{align*}
Since $h_j(x)=0$ for every equality constraint, the Lagrangian at $x$ satisfies
\begin{align*}
L(x,\lambda^*,\nu^*) \leq f_0(x).
\end{align*}
At $x^*$, complementary slackness gives $\lambda_i^* f_i(x^*)=0$ for every $i$, and equality feasibility gives $h_j(x^*)=0$ for every $j$. Therefore
\begin{align*}
L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Putting the inequalities together, we obtain
\begin{align*}
f_0(x) \geq L(x,\lambda^*,\nu^*) \geq L(x^*,\lambda^*,\nu^*)=f_0(x^*).
\end{align*}
Since this holds for every primal feasible $x \in X$, the point $x^*$ is a primal optimum.
[/guided]
[/step]