Saddle Point Theorem for Lagrange Duality — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove both implications directly from the definitions of the Lagrangian, primal feasibility, and the dual function. Starting from primal and dual optimality with zero duality gap, [weak duality](/theorems/2549) forces the optimal Lagrangian value to equal the primal value, and feasibility gives the left saddle inequality. Conversely, the saddle inequalities first force feasibility of $x^*$ and [complementary slackness](/theorems/2559); then the right saddle inequality identifies $x^*$ as a minimizer of the Lagrangian at $(\lambda^*,\nu^*)$, so the dual value equals the primal value. These identities imply primal optimality, dual optimality, and zero duality gap. [/proofplan] [step:Declare the optimization data and dual objects] Throughout the proof, let $D$ denote the common domain of the objective and constraint functions. Let $f_0:D\to\mathbb R$ be the objective, let $f_i:D\to\mathbb R$ for $1\leq i\leq m$ be the inequality constraint functions, and let $h_j:D\to\mathbb R$ for $1\leq j\leq p$ be the equality constraint functions. The feasible set is $\mathcal F=\{x\in D: f_i(x)\leq 0 \text{ for }1\leq i\leq m,\ h_j(x)=0 \text{ for }1\leq j\leq p\}$. The Lagrangian $L:D\times\mathbb R_+^m\times\mathbb R^p\to\mathbb R$ is defined by $L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^m\lambda_i f_i(x)+\sum_{j=1}^p\nu_j h_j(x)$, and the dual function $g:\mathbb R_+^m\times\mathbb R^p\to\mathbb R\cup\{-\infty\}$ is defined by $g(\lambda,\nu)=\inf_{x\in D}L(x,\lambda,\nu)$. The primal and dual optimal values are denoted by $p^*$ and $d^*$, respectively. [/step] [step:Derive the saddle inequalities from optimality and zero duality gap] Assume that $x^*\in\mathcal{F}$ is primal optimal, $(\lambda^*,\nu^*)$ is dual optimal, and $p^*=d^*$. Since $x^*$ is primal optimal, \begin{align*} f_0(x^*)=p^*. \end{align*} Since $(\lambda^*,\nu^*)$ is dual optimal, \begin{align*} g(\lambda^*,\nu^*)=d^*=p^*. \end{align*} By definition of $g$ as an infimum over $D$, for every $x\in D$, \begin{align*} g(\lambda^*,\nu^*)\leq L(x,\lambda^*,\nu^*). \end{align*} Thus, for every $x\in D$, \begin{align*} p^* \leq L(x,\lambda^*,\nu^*). \end{align*} Because $x^*\in\mathcal{F}$, we have $f_i(x^*)\leq 0$ for $1\leq i\leq m$ and $h_j(x^*)=0$ for $1\leq j\leq p$. Since $\lambda_i^*\geq 0$ for each $i$, \begin{align*} L(x^*,\lambda^*,\nu^*) = f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*)+\sum_{j=1}^p \nu_j^* h_j(x^*) \leq f_0(x^*)=p^*. \end{align*} On the other hand, $g(\lambda^*,\nu^*)\leq L(x^*,\lambda^*,\nu^*)$, so \begin{align*} p^*=g(\lambda^*,\nu^*)\leq L(x^*,\lambda^*,\nu^*)\leq p^*. \end{align*} Hence \begin{align*} L(x^*,\lambda^*,\nu^*)=p^*. \end{align*} Combining this with $p^*\leq L(x,\lambda^*,\nu^*)$ gives \begin{align*} L(x^*,\lambda^*,\nu^*)\leq L(x,\lambda^*,\nu^*) \end{align*} for every $x\in D$. Now fix arbitrary $\lambda\in\mathbb{R}_+^m$ and $\nu\in\mathbb{R}^p$. Using feasibility of $x^*$ again, \begin{align*} L(x^*,\lambda,\nu)=f_0(x^*)+\sum_{i=1}^m \lambda_i f_i(x^*)+\sum_{j=1}^p \nu_j h_j(x^*) \leq f_0(x^*)=p^*. \end{align*} Since $L(x^*,\lambda^*,\nu^*)=p^*$, this gives \begin{align*} L(x^*,\lambda,\nu)\leq L(x^*,\lambda^*,\nu^*). \end{align*} Therefore $(x^*,\lambda^*,\nu^*)$ is a saddle point of $L$. [guided] Assume that $x^*$ is primal optimal, $(\lambda^*,\nu^*)$ is dual optimal, and $p^*=d^*$. The goal is to prove the two saddle inequalities \begin{align*} L(x^*,\lambda,\nu)\leq L(x^*,\lambda^*,\nu^*) \leq L(x,\lambda^*,\nu^*) \end{align*} for every $x\in D$, $\lambda\in\mathbb{R}_+^m$, and $\nu\in\mathbb{R}^p$. First, primal optimality gives \begin{align*} f_0(x^*)=p^*. \end{align*} Dual optimality and zero duality gap give \begin{align*} g(\lambda^*,\nu^*)=d^*=p^*. \end{align*} Because $g(\lambda^*,\nu^*)$ is defined as the infimum of $L(\cdot,\lambda^*,\nu^*)$ over $D$, it is bounded above by every value of that function. Thus, for every $x\in D$, \begin{align*} g(\lambda^*,\nu^*)\leq L(x,\lambda^*,\nu^*). \end{align*} Substituting $g(\lambda^*,\nu^*)=p^*$ gives \begin{align*} p^*\leq L(x,\lambda^*,\nu^*). \end{align*} Next we compare $L(x^*,\lambda^*,\nu^*)$ with $f_0(x^*)$. Since $x^*$ is feasible, all inequality constraints satisfy $f_i(x^*)\leq 0$, and all equality constraints satisfy $h_j(x^*)=0$. Since each multiplier $\lambda_i^*$ is nonnegative, each product $\lambda_i^* f_i(x^*)$ is nonpositive. Therefore \begin{align*} L(x^*,\lambda^*,\nu^*) = f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*)+\sum_{j=1}^p \nu_j^* h_j(x^*) \leq f_0(x^*)=p^*. \end{align*} But the definition of $g$ also gives \begin{align*} p^*=g(\lambda^*,\nu^*)\leq L(x^*,\lambda^*,\nu^*). \end{align*} The two inequalities force equality: \begin{align*} L(x^*,\lambda^*,\nu^*)=p^*. \end{align*} Combining this identity with $p^*\leq L(x,\lambda^*,\nu^*)$ yields the right saddle inequality \begin{align*} L(x^*,\lambda^*,\nu^*)\leq L(x,\lambda^*,\nu^*) \end{align*} for every $x\in D$. It remains to prove the left saddle inequality. Let $\lambda\in\mathbb{R}_+^m$ and $\nu\in\mathbb{R}^p$ be arbitrary. Feasibility of $x^*$ again gives $f_i(x^*)\leq 0$ and $h_j(x^*)=0$. Since each $\lambda_i\geq 0$, \begin{align*} L(x^*,\lambda,\nu)=f_0(x^*)+\sum_{i=1}^m \lambda_i f_i(x^*)+\sum_{j=1}^p \nu_j h_j(x^*) \leq f_0(x^*)=p^*. \end{align*} Using $L(x^*,\lambda^*,\nu^*)=p^*$, we obtain \begin{align*} L(x^*,\lambda,\nu)\leq L(x^*,\lambda^*,\nu^*). \end{align*} Thus both saddle inequalities hold, so $(x^*,\lambda^*,\nu^*)$ is a saddle point. [/guided] [/step] [step:Use the left saddle inequality to prove feasibility of $x^*$] Assume conversely that $(x^*,\lambda^*,\nu^*)$ is a saddle point of $L$. Since $L:D\times\mathbb R_+^m\times\mathbb R^p\to\mathbb R$, the value $L(x^*,\lambda^*,\nu^*)$ is finite, and the values $f_0(x^*)$, $f_i(x^*)$, and $h_j(x^*)$ are finite [real numbers](/page/Real%20Numbers) whenever they appear below. Thus for every $\lambda\in\mathbb{R}_+^m$ and every $\nu\in\mathbb{R}^p$, \begin{align*} L(x^*,\lambda,\nu)\leq L(x^*,\lambda^*,\nu^*). \end{align*} We prove that $x^*\in\mathcal{F}$. Fix an index $i\in\{1,\dots,m\}$. For each $t\geq 0$, define $\lambda(t)\in\mathbb{R}_+^m$ by $\lambda_i(t)=t$ and $\lambda_k(t)=0$ for $k\neq i$. Taking $\nu=0\in\mathbb{R}^p$ in the left saddle inequality gives \begin{align*} f_0(x^*)+t f_i(x^*)\leq L(x^*,\lambda^*,\nu^*) \end{align*} for every $t\geq 0$. If $f_i(x^*)>0$, the left-hand side tends to $+\infty$ as $t\to\infty$, contradicting the fixed finite right-hand side. Hence $f_i(x^*)\leq 0$. Since $i$ was arbitrary, all inequality constraints hold. Fix an index $j\in\{1,\dots,p\}$. For each $s\in\mathbb{R}$, define $\nu(s)\in\mathbb{R}^p$ by $\nu_j(s)=s$ and $\nu_\ell(s)=0$ for $\ell\neq j$. Taking $\lambda=0\in\mathbb{R}_+^m$ gives \begin{align*} f_0(x^*)+s h_j(x^*)\leq L(x^*,\lambda^*,\nu^*) \end{align*} for every $s\in\mathbb{R}$. If $h_j(x^*)>0$, then the left-hand side tends to $+\infty$ as $s\to\infty$; if $h_j(x^*)<0$, it tends to $+\infty$ as $s\to-\infty$. Both cases contradict the fixed finite right-hand side. Therefore $h_j(x^*)=0$. Since $j$ was arbitrary, all equality constraints hold, and hence $x^*\in\mathcal{F}$. [/step] [step:Identify the Lagrangian value with the primal and dual values] Since $x^*\in\mathcal{F}$, substituting $\lambda=0\in\mathbb{R}_+^m$ and $\nu=0\in\mathbb{R}^p$ into the left saddle inequality gives \begin{align*} f_0(x^*)\leq L(x^*,\lambda^*,\nu^*). \end{align*} Feasibility also gives $f_i(x^*)\leq 0$ for each $i$ and $h_j(x^*)=0$ for each $j$. Since $\lambda_i^*\geq 0$, \begin{align*} L(x^*,\lambda^*,\nu^*)=f_0(x^*)+\sum_{i=1}^m \lambda_i^* f_i(x^*)+\sum_{j=1}^p \nu_j^* h_j(x^*)\leq f_0(x^*). \end{align*} Therefore \begin{align*} L(x^*,\lambda^*,\nu^*)=f_0(x^*). \end{align*} The right saddle inequality states that for every $x\in D$, \begin{align*} L(x^*,\lambda^*,\nu^*)\leq L(x,\lambda^*,\nu^*). \end{align*} Taking the infimum over $x\in D$ gives \begin{align*} L(x^*,\lambda^*,\nu^*)\leq g(\lambda^*,\nu^*). \end{align*} The reverse inequality follows from the definition of $g$ as an infimum: \begin{align*} g(\lambda^*,\nu^*)\leq L(x^*,\lambda^*,\nu^*). \end{align*} Thus \begin{align*} g(\lambda^*,\nu^*)=L(x^*,\lambda^*,\nu^*)=f_0(x^*). \end{align*} [/step] [step:Conclude primal optimality, dual optimality, and zero duality gap] Let $y\in\mathcal{F}$ be arbitrary. Since $\lambda^*\in\mathbb{R}_+^m$, $f_i(y)\leq 0$ for every $i$, and $h_j(y)=0$ for every $j$, \begin{align*} L(y,\lambda^*,\nu^*)=f_0(y)+\sum_{i=1}^m \lambda_i^* f_i(y)+\sum_{j=1}^p \nu_j^* h_j(y)\leq f_0(y). \end{align*} The right saddle inequality gives \begin{align*} f_0(x^*)=L(x^*,\lambda^*,\nu^*)\leq L(y,\lambda^*,\nu^*)\leq f_0(y). \end{align*} Since this holds for every $y\in\mathcal{F}$, $x^*$ is primal optimal and \begin{align*} p^*=f_0(x^*). \end{align*} For every $\lambda\in\mathbb{R}_+^m$ and $\nu\in\mathbb{R}^p$, the definition of $g$ gives \begin{align*} g(\lambda,\nu)\leq L(x^*,\lambda,\nu). \end{align*} The left saddle inequality and the identity from the previous step give \begin{align*} L(x^*,\lambda,\nu)\leq L(x^*,\lambda^*,\nu^*)=g(\lambda^*,\nu^*). \end{align*} Therefore \begin{align*} g(\lambda,\nu)\leq g(\lambda^*,\nu^*) \end{align*} for every dual feasible pair $(\lambda,\nu)$. Hence $(\lambda^*,\nu^*)$ is dual optimal and \begin{align*} d^*=g(\lambda^*,\nu^*). \end{align*} Together with $g(\lambda^*,\nu^*)=f_0(x^*)=p^*$, this gives \begin{align*} L(x^*,\lambda^*,\nu^*)=f_0(x^*)=p^*=d^*=g(\lambda^*,\nu^*). \end{align*} Thus primal optimality, dual optimality, and zero duality gap all hold, completing the proof. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Saddle Point Theorem for Lagrange Duality (Theorem # 6686)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Saddle Point Theorem for Lagrange Duality (Theorem # 6686)

Discussion

Proof

Explore Further