Karush-Kuhn-Tucker Necessary Conditions under LICQ for Discretized Optimal Control

Theorem


Let $q \in \mathbb{N}$ and let $a,b \in \mathbb{N}\cup\{0\}$. Let $U \subset \mathbb{R}^q$ be an [open set](/page/Open%20Set), let $z^* \in U$, and let $F:U\to\mathbb{R}$, $A:U\to\mathbb{R}^a$, and $G:U\to\mathbb{R}^b$ be continuously differentiable maps. For each $j\in\{1,\dots,a\}$ let $A_j:U\to\mathbb{R}$ denote the $j$-th component of $A$, and for each $i\in\{1,\dots,b\}$ let $G_i:U\to\mathbb{R}$ denote the $i$-th component of $G$. When $a=0$, interpret $A$ as the unique map into $\mathbb{R}^0$ and impose no equality constraints; when $b=0$, interpret $G$ likewise and impose no inequality constraints. Suppose that $z^*$ is a local minimizer of $F$ over the feasible set \begin{align*} \mathcal{F}:=\{z \in U : A(z)=0 \text{ and } G_i(z)\leq 0 \text{ for every } i \in \{1,\dots,b\}\}. \end{align*} In particular $z^*\in\mathcal{F}$, so $A(z^*)=0$ and $G_i(z^*)\leq 0$ for every $i \in \{1,\dots,b\}$. Define the active inequality set at $z^*$ by \begin{align*} I(z^*) := \{i \in \{1,\dots,b\}:G_i(z^*)=0\}. \end{align*} Assume the [linear independence](/page/Linear%20Independence) constraint qualification (LICQ) holds at $z^*$: the vectors \begin{align*} \{\nabla A_j(z^*) : j \in \{1,\dots,a\}\}\cup \{\nabla G_i(z^*) : i \in I(z^*)\} \end{align*} are linearly independent in $\mathbb{R}^q$, with the convention that an empty family is linearly independent. Let $DA_{z^*}:\mathbb{R}^q\to\mathbb{R}^a$ and $DG_{z^*}:\mathbb{R}^q\to\mathbb{R}^b$ denote the total derivatives at $z^*$. Let $DA_{z^*}^{\top}:\mathbb{R}^a\to\mathbb{R}^q$ and $DG_{z^*}^{\top}:\mathbb{R}^b\to\mathbb{R}^q$ denote their transpose linear maps with respect to the Euclidean inner products. 
Then there exist multipliers $\lambda \in \mathbb{R}^a$ and $\mu \in \mathbb{R}^b$ such that \begin{align*} \nabla F(z^*) + DA_{z^*}^{\top}\lambda + DG_{z^*}^{\top}\mu = 0, \end{align*} \begin{align*} A(z^*)=0, \end{align*} \begin{align*} G_i(z^*)\leq 0 \quad \text{for every } i \in \{1,\dots,b\}, \end{align*} \begin{align*} \mu_i \geq 0 \quad \text{for every } i \in \{1,\dots,b\}, \end{align*} and \begin{align*} \mu_iG_i(z^*)=0 \quad \text{for every } i \in \{1,\dots,b\}. \end{align*}

Discussion
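
As a hedged illustration (this example is an assumption chosen for exposition and is not part of the original statement), take $q=2$, $a=0$, $b=1$, $U=\mathbb{R}^2$, $F(z)=z_1+z_2$, and $G_1(z)=z_1^2+z_2^2-2$. The feasible set is the closed disc of radius $\sqrt{2}$ centred at the origin, and the linear function $F$ attains its minimum over this disc at $z^*=(-1,-1)$. The constraint is active because $G_1(z^*)=0$, and LICQ holds because the single active gradient $\nabla G_1(z^*)=(-2,-2)$ is nonzero. Since there are no equality constraints, the term $DA_{z^*}^{\top}\lambda$ is absent, and the multiplier $\mu_1=\tfrac12$ satisfies \begin{align*} \nabla F(z^*)+\mu_1\nabla G_1(z^*)=(1,1)+\tfrac12(-2,-2)=(0,0), \qquad \mu_1\geq 0, \qquad \mu_1G_1(z^*)=0. \end{align*} This matches the stationarity, sign, and complementarity conditions of the theorem.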

Proof

[proofplan] We separate the active and inactive inequality constraints at the local minimizer. Inactive constraints remain inactive in a neighbourhood, so only the equality constraints and active inequalities affect first-order feasible directions. LICQ lets us realize every strictly feasible linearized direction as the initial velocity of a differentiable feasible curve, and approximate every linearized feasible direction by such strict directions; local minimality then forces $\nabla F(z^*)\cdot d\geq 0$ for every direction $d$ in the linearized feasible cone. A finite-dimensional polar-cone/Farkas argument converts that first-order inequality into the existence of equality multipliers and nonnegative active inequality multipliers; extending the inactive multipliers by zero gives the stated KKT system. [/proofplan] [step:Discard inactive inequalities locally and define the linearized feasible cone] Write $I:=I(z^*)$. Since $G_i(z^*)<0$ for every $i\notin I$ and each of the finitely many components $G_i:U\to\mathbb{R}$ is continuous at $z^*$, there is an open neighbourhood $V\subset U$ of $z^*$ such that \begin{align*} G_i(z)<0 \quad \text{for every } z\in V \text{ and every } i\notin I. \end{align*} Thus, near $z^*$, inactive inequalities impose no first-order restriction. Define the linearized feasible cone $K\subset\mathbb{R}^q$ by \begin{align*} K:=\{d\in\mathbb{R}^q : DA_{z^*}d=0 \text{ and } DG_{i,z^*}d\leq 0 \text{ for every } i\in I\}. \end{align*} Here $DA_{z^*}:\mathbb{R}^q\to\mathbb{R}^a$ is the total derivative of $A$ at $z^*$, and $DG_{i,z^*}:\mathbb{R}^q\to\mathbb{R}$ is the total derivative of the scalar map $G_i$ at $z^*$. [/step] [step:Realize strict linearized directions by feasible curves] Let $d\in\mathbb{R}^q$ satisfy \begin{align*} DA_{z^*}d=0 \end{align*} and \begin{align*} DG_{i,z^*}d<0 \quad \text{for every } i\in I. \end{align*} We prove that there exist $\varepsilon>0$ and a differentiable curve $\gamma:[0,\varepsilon)\to U$ such that $\gamma(0)=z^*$, $\gamma'(0)=d$, and $\gamma(t)\in\mathcal{F}$ for every sufficiently small $t\geq 0$. 
First treat the equality constraints. If $a=0$, define $\gamma(t):=z^*+td$ for $t\in[0,\varepsilon)$, where $\varepsilon>0$ is chosen so small that $z^*+td\in U$ for every such $t$ (possible because $U$ is open); the condition $A(\gamma(t))=0$ is then vacuous. If $a>0$, the LICQ family contains $\{\nabla A_j(z^*) : j\in\{1,\dots,a\}\}$ as a linearly independent subfamily, so $DA_{z^*}:\mathbb{R}^q\to\mathbb{R}^a$ is surjective. Since $A$ is $C^1$ near $z^*$ and $A(z^*)=0$, the finite-dimensional [implicit function theorem](/theorems/52) gives a $C^1$ local parametrization of $A^{-1}(\{0\})$ near $z^*$ whose tangent space at $z^*$ is $\ker DA_{z^*}$. Because $d\in\ker DA_{z^*}$, there are $\varepsilon>0$ and a differentiable curve $\gamma:[0,\varepsilon)\to U$ such that \begin{align*} \gamma(0)=z^*, \quad \gamma'(0)=d, \quad A(\gamma(t))=0 \text{ for every } t\in[0,\varepsilon). \end{align*} For each active index $i\in I$, differentiability of $G_i$ at $z^*$ along the curve $\gamma$ gives \begin{align*} G_i(\gamma(t))=G_i(z^*)+tDG_{i,z^*}d+r_i(t), \end{align*} where $r_i:[0,\varepsilon)\to\mathbb{R}$ satisfies $r_i(t)/t\to 0$ as $t\downarrow 0$. Since $G_i(z^*)=0$ and $DG_{i,z^*}d<0$, we have $G_i(\gamma(t))<0$ for every sufficiently small $t>0$. Since $\gamma$ is continuous at $0$ and $\gamma(0)=z^*\in V$, we have $\gamma(t)\in V$ for all sufficiently small $t\geq 0$, so the inactive inequalities remain strictly negative along the curve. Therefore, after reducing $\varepsilon$ if necessary, $\gamma(t)\in\mathcal{F}$ for every $t\in[0,\varepsilon)$. [guided] The purpose of this step is to justify that strict linearized feasible directions are actual first-order motions through feasible points. Fix $d\in\mathbb{R}^q$ with \begin{align*} DA_{z^*}d=0 \end{align*} and \begin{align*} DG_{i,z^*}d<0 \quad \text{for every } i\in I. \end{align*} The equality condition says that $d$ is tangent to the equality constraints at first order, while the strict inequality condition says each active inequality immediately moves into the feasible side. We first build a curve that satisfies the equality constraints exactly. 
If $a=0$, there are no equality constraints, so the straight curve $\gamma(t):=z^*+td$ works for all sufficiently small $t\geq 0$ because $U$ is open. If $a>0$, the gradients $\nabla A_j(z^*)$ are linearly independent because they form a subfamily of the LICQ family. Equivalently, the derivative $DA_{z^*}:\mathbb{R}^q\to\mathbb{R}^a$ has full row rank and is surjective. The finite-dimensional implicit function theorem applies to the $C^1$ map $A:U\to\mathbb{R}^a$ at the point $z^*$ with $A(z^*)=0$ and surjective derivative. It gives that $A^{-1}(\{0\})$ is locally a $C^1$ submanifold and that its tangent space at $z^*$ is $\ker DA_{z^*}$. Since $d\in\ker DA_{z^*}$, the local parametrization of this submanifold yields a differentiable curve $\gamma:[0,\varepsilon)\to U$ satisfying \begin{align*} \gamma(0)=z^*, \quad \gamma'(0)=d, \quad A(\gamma(t))=0 \text{ for every } t\in[0,\varepsilon). \end{align*} Now check the active inequalities along this equality-feasible curve. For each $i\in I$, the component $G_i:U\to\mathbb{R}$ is differentiable at $z^*$. Therefore the one-variable expansion along $\gamma$ is \begin{align*} G_i(\gamma(t))=G_i(z^*)+tDG_{i,z^*}\gamma'(0)+r_i(t), \end{align*} where $r_i:[0,\varepsilon)\to\mathbb{R}$ is a remainder with $r_i(t)/t\to 0$ as $t\downarrow 0$. Since $i\in I$, $G_i(z^*)=0$, and since $\gamma'(0)=d$, this becomes \begin{align*} G_i(\gamma(t))=tDG_{i,z^*}d+r_i(t). \end{align*} Because $DG_{i,z^*}d<0$ and $r_i(t)/t\to 0$, the negative linear term dominates the remainder for all sufficiently small $t>0$. Hence $G_i(\gamma(t))<0$ for every active $i\in I$ and every sufficiently small $t>0$. Finally, if $i\notin I$, then $G_i(z^*)<0$. Continuity of $G_i$ and continuity of $\gamma$ at $0$ imply $G_i(\gamma(t))<0$ for all sufficiently small $t\geq 0$. Thus the curve satisfies the equalities exactly and all inequalities for small positive time, so $\gamma(t)\in\mathcal{F}$. 
[/guided] [/step] [step:Use local minimality to obtain nonnegativity on the whole linearized cone] First suppose $d\in K$ satisfies $DG_{i,z^*}d<0$ for every $i\in I$. By the preceding step, there is a differentiable feasible curve $\gamma:[0,\varepsilon)\to\mathcal{F}$ with $\gamma(0)=z^*$ and $\gamma'(0)=d$. Since $z^*$ is a local minimizer of $F$ on $\mathcal{F}$, \begin{align*} F(\gamma(t))-F(z^*)\geq 0 \end{align*} for all sufficiently small $t\geq 0$. Because $F$ is differentiable at $z^*$ and $\gamma'(0)=d$, the chain rule shows that $(F(\gamma(t))-F(z^*))/t\to\nabla F(z^*)\cdot d$ as $t\downarrow 0$. Dividing the inequality by $t>0$ and letting $t\downarrow 0$ therefore gives \begin{align*} \nabla F(z^*)\cdot d\geq 0. \end{align*} It remains to remove the strictness. LICQ implies the [linear map](/page/Linear%20Map) \begin{align*} L:\ker DA_{z^*}\to\mathbb{R}^{I},\qquad Ld:=(DG_{i,z^*}d)_{i\in I} \end{align*} is surjective. Indeed, if its range were not all of $\mathbb{R}^{I}$, there would be coefficients $\alpha_i$, not all zero, such that \begin{align*} \sum_{i\in I}\alpha_iDG_{i,z^*}d=0 \quad \text{for every } d\in\ker DA_{z^*}. \end{align*} Then $\sum_{i\in I}\alpha_i\nabla G_i(z^*)$ would be orthogonal to $\ker DA_{z^*}$ and hence would lie in the range of $DA_{z^*}^{\top}$, which is the span of $\{\nabla A_j(z^*)\}_{j=1}^a$. Since the $\alpha_i$ are not all zero, this contradicts LICQ. Choose $w\in\ker DA_{z^*}$ such that \begin{align*} DG_{i,z^*}w=-1 \quad \text{for every } i\in I. \end{align*} For arbitrary $d\in K$ and every $\tau>0$, the vector $d+\tau w$ satisfies \begin{align*} DA_{z^*}(d+\tau w)=0 \end{align*} and \begin{align*} DG_{i,z^*}(d+\tau w)\leq -\tau<0 \quad \text{for every } i\in I. \end{align*} Therefore, by the strict case already proved, \begin{align*} \nabla F(z^*)\cdot(d+\tau w)\geq 0. \end{align*} Letting $\tau\downarrow 0$ yields \begin{align*} \nabla F(z^*)\cdot d\geq 0 \quad \text{for every } d\in K. \end{align*} [/step] [step:Represent the polar cone by equality and active inequality gradients] Define the polar cone $K^\circ\subset\mathbb{R}^q$ by \begin{align*} K^\circ:=\{v\in\mathbb{R}^q : v\cdot d\leq 0 \text{ for every } d\in K\}. \end{align*} From the previous step, \begin{align*} -\nabla F(z^*)\in K^\circ. 
\end{align*} Let $DG_{I,z^*}:\mathbb{R}^q\to\mathbb{R}^{I}$ be the linear map defined by \begin{align*} DG_{I,z^*}d:=(DG_{i,z^*}d)_{i\in I}. \end{align*} We write vectors $\mu_I\in\mathbb{R}^{I}$ with coordinates indexed by $i\in I$, and $DG_{I,z^*}^{\top}:\mathbb{R}^{I}\to\mathbb{R}^q$ denotes the transpose map. We use the following finite-dimensional Farkas polar form. If $E:X\to Y$ and $H:X\to\mathbb{R}^m$ are linear maps between finite-dimensional Euclidean spaces, and \begin{align*} C:=\{x\in X:E x=0 \text{ and } (Hx)_r\leq 0 \text{ for every } r\in\{1,\dots,m\}\}, \end{align*} then \begin{align*} C^\circ=\{E^\top \lambda+H^\top \nu : \lambda\in Y,\ \nu\in[0,\infty)^m\}. \end{align*} This is the separating-hyperplane form of Farkas' lemma for a homogeneous system of linear equalities and inequalities. Applying it with $X=\mathbb{R}^q$, $Y=\mathbb{R}^a$, $E=DA_{z^*}$, $m=|I|$, and $H=DG_{I,z^*}$ gives \begin{align*} K^\circ=\{DA_{z^*}^{\top}\lambda+DG_{I,z^*}^{\top}\mu_I : \lambda\in\mathbb{R}^a,\ \mu_I\in\mathbb{R}^{I},\ \mu_i\geq 0 \text{ for every } i\in I\}. \end{align*} The sign convention matches the definition of $K^\circ$: if $v=DA_{z^*}^{\top}\lambda+DG_{I,z^*}^{\top}\mu_I$ with $\mu_i\geq 0$ for every $i\in I$ and $d\in K$, then \begin{align*} v\cdot d=\lambda\cdot DA_{z^*}d+\sum_{i\in I}\mu_iDG_{i,z^*}d\leq 0. \end{align*} Applying this representation to $-\nabla F(z^*)\in K^\circ$, there exist $\lambda\in\mathbb{R}^a$ and $\mu_I\in\mathbb{R}^{I}$ with $\mu_i\geq 0$ for every $i\in I$ such that \begin{align*} -\nabla F(z^*)=DA_{z^*}^{\top}\lambda+DG_{I,z^*}^{\top}\mu_I. \end{align*} Equivalently, \begin{align*} \nabla F(z^*)+DA_{z^*}^{\top}\lambda+DG_{I,z^*}^{\top}\mu_I=0. \end{align*} [/step] [step:Extend the active multipliers and verify the KKT conditions] Define $\mu\in\mathbb{R}^b$ as follows: for $i\in I$, set $\mu_i:=(\mu_I)_i$, and for $i\notin I$, set $\mu_i:=0$. 
Then $\mu_i\geq 0$ for every $i\in\{1,\dots,b\}$, because active multipliers are nonnegative and inactive multipliers are zero. Since $\mu_i=0$ for $i\notin I$, the active stationarity equation becomes the full stationarity equation \begin{align*} \nabla F(z^*)+DA_{z^*}^{\top}\lambda+DG_{z^*}^{\top}\mu=0. \end{align*} Primal feasibility and inequality feasibility are part of the assumption that $z^*$ is a local minimizer over $\mathcal{F}$, so \begin{align*} A(z^*)=0 \end{align*} and \begin{align*} G_i(z^*)\leq 0 \quad \text{for every } i\in\{1,\dots,b\}. \end{align*} Finally, if $i\in I$, then $G_i(z^*)=0$, so $\mu_iG_i(z^*)=0$. If $i\notin I$, then $\mu_i=0$, so again $\mu_iG_i(z^*)=0$. Therefore \begin{align*} \mu_iG_i(z^*)=0 \quad \text{for every } i\in\{1,\dots,b\}. \end{align*} All KKT conditions follow. [/step]
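
The last two steps of the proof (solve the active stationarity equation, then extend the inactive multipliers by zero) can be mirrored numerically. The following is a minimal sketch, assuming a hypothetical toy problem that does not come from the text: minimize $z_1^2+z_2^2$ subject to $z_1+z_2-2=0$, $1.5-z_1\leq 0$, and $z_2-5\leq 0$, whose minimizer is $z^*=(1.5,0.5)$. All names (`grad_F`, `DA`, `DG`, `tol`) are illustrative assumptions. The script does not verify that `z_star` is a minimizer; it only checks LICQ and recovers multipliers at the given point.

```python
import numpy as np

# Hypothetical toy problem (an assumption for illustration), q = 2, a = 1, b = 2:
#   minimize F(z) = z1^2 + z2^2
#   subject to A(z)   = z1 + z2 - 2 = 0,
#              G_1(z) = 1.5 - z1   <= 0,
#              G_2(z) = z2 - 5     <= 0.
z_star = np.array([1.5, 0.5])  # known minimizer of this toy problem


def grad_F(z):
    return 2.0 * z


def A(z):
    return np.array([z[0] + z[1] - 2.0])


def G(z):
    return np.array([1.5 - z[0], z[1] - 5.0])


DA = np.array([[1.0, 1.0]])                # Jacobian of A (constant here)
DG = np.array([[-1.0, 0.0], [0.0, 1.0]])   # Jacobian of G (constant here)

tol = 1e-10
# Active set I(z*), using 0-based indices.
active = np.flatnonzero(np.abs(G(z_star)) <= tol)

# LICQ: the rows of DA and the active rows of DG are linearly independent.
M = np.vstack([DA, DG[active]])
licq_holds = np.linalg.matrix_rank(M) == M.shape[0]

# Solve grad F + DA^T lam + DG_I^T mu_I = 0 for (lam, mu_I) by least squares.
# Under LICQ, M^T has full column rank, so a solution is unique when it exists.
sol, *_ = np.linalg.lstsq(M.T, -grad_F(z_star), rcond=None)
lam = sol[: DA.shape[0]]

# Extend the active multipliers by zero on the inactive indices.
mu = np.zeros(DG.shape[0])
mu[active] = sol[DA.shape[0]:]

stationarity = grad_F(z_star) + DA.T @ lam + DG.T @ mu
complementarity = mu * G(z_star)

print("LICQ holds:", bool(licq_holds))
print("lambda =", lam, "mu =", mu)
print("stationarity residual =", stationarity)
```

Least squares is used here only as a convenient linear solver; a nonzero stationarity residual or a negative entry of `mu` would indicate that the given point is not a KKT point of the toy problem.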
