[proofplan]
We split the proof according to whether the linearized constraint map $g=D(G_0)_{e_0}$ is surjective. If $g$ is not surjective, finite-dimensional linear algebra gives a nonzero covector annihilating $\operatorname{Range}(g)$, hence an abnormal Fritz John multiplier. If $g$ is surjective, the [implicit function theorem](/theorems/52) parametrizes the feasible set near $e_0$ by $\ker g$; differentiating this parametrization and using the local minimum condition shows that $j$ vanishes on $\ker g$. A right inverse for $g$ then defines a covector on $\mathbb{R}^k$ whose pullback equals $j$, giving the normal multiplier equation.
[/proofplan]
[step:Produce an abnormal multiplier when the constraint derivative is not surjective]
The hypothesis says that $e_0$ is a local minimizer of the $C^1$ function $J_0:U\to\mathbb{R}$ on the feasible set
\begin{align*}
\mathcal{M} := \{e \in U : G_0(e)=0\}.
\end{align*}
Indeed, $G_0(e_0)=0$, and there is an open neighbourhood $V\subset U$ of $e_0$ such that $J_0(e_0)\leq J_0(e)$ for every $e\in V\cap \mathcal{M}$.
Assume first that the [linear map](/page/Linear%20Map) $g:E\to\mathbb{R}^k$ is not surjective. Then $\operatorname{Range}(g)$ is a proper linear subspace of the finite-dimensional real [vector space](/page/Vector%20Space) $\mathbb{R}^k$. By the finite-dimensional annihilator theorem for a proper subspace, there exists a nonzero covector $\lambda\in(\mathbb{R}^k)^*$ such that $\lambda(y)=0$ for every $y\in\operatorname{Range}(g)$. Set $\lambda_0:=0$. Then $(\lambda_0,\lambda)\neq(0,0)$ and, for every $h\in E$,
\begin{align*}
\lambda_0 j(h)+\lambda(g(h))=\lambda(g(h))=0,
\end{align*}
because $g(h)\in\operatorname{Range}(g)$. Thus the desired Fritz John equation holds in the nonsurjective case.
[guided]
We first record the feasible set because the local minimum hypothesis will be used in the surjective case. Define
\begin{align*}
\mathcal{M} := \{e \in U : G_0(e)=0\}.
\end{align*}
Since $G_0(e_0)=0$, the point $e_0$ lies in $\mathcal{M}$. The stated neighbourhood condition says that $J_0(e_0)\leq J_0(e)$ for all feasible points $e$ sufficiently close to $e_0$, so $e_0$ is a local minimizer of $J_0$ restricted to $\mathcal{M}$.
Now suppose $g:E\to\mathbb{R}^k$ is not surjective. Then $\operatorname{Range}(g)$ is a proper subspace of $\mathbb{R}^k$. By the finite-dimensional annihilator theorem for a proper subspace, there exists $\lambda\in(\mathbb{R}^k)^*$, $\lambda\neq 0$, such that $\lambda(y)=0$ for every $y\in\operatorname{Range}(g)$. We define $\lambda_0:=0$. This gives a nonzero multiplier pair because $\lambda\neq 0$.
For every direction $h\in E$, the vector $g(h)$ belongs to $\operatorname{Range}(g)$ by definition of the range. Hence
\begin{align*}
\lambda_0 j(h)+\lambda(g(h))=\lambda(g(h))=0.
\end{align*}
This proves the Fritz John identity when the constraint derivative has deficient range. The multiplier is abnormal because the objective multiplier is $\lambda_0=0$.
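For a concrete illustration of the abnormal case (the data below are an assumed toy example, not part of the statement), take $E=U=\mathbb{R}^2$, $k=1$, $e_0=(0,0)$, $J_0(x,y)=x$, and $G_0(x,y)=x^2+y^2$. The feasible set is $\{(0,0)\}$, so $e_0$ is trivially a local minimizer, while
\begin{align*}
g(h)=D(G_0)_{(0,0)}(h)=0, \qquad j(h)=D(J_0)_{(0,0)}(h)=h_1.
\end{align*}
Here $\operatorname{Range}(g)=\{0\}$, so every nonzero $\lambda\in\mathbb{R}^*$ annihilates it and $(0,\lambda)$ satisfies the Fritz John equation. No pair with $\lambda_0\neq 0$ can work in this example, since $\lambda_0j(h)+\lambda(g(h))=\lambda_0h_1$.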
[/guided]
[/step]
[step:Parametrize the feasible set when the constraint derivative is surjective]
Assume now that
\begin{align*}
g=D(G_0)_{e_0}:E\to\mathbb{R}^k
\end{align*}
is surjective. Define the kernel subspace $K:=\ker g\subset E$. Since $E$ is finite-dimensional and $g$ is surjective, there exists a linear right inverse
\begin{align*}
R:\mathbb{R}^k\to E
\end{align*}
such that $g\circ R=\operatorname{id}_{\mathbb{R}^k}$. Define the linear map
\begin{align*}
\Phi:K\times\mathbb{R}^k\to E, \qquad \Phi(k_0,y)=k_0+R(y).
\end{align*}
This map is an isomorphism. It is injective because $k_0+R(y)=0$ implies $0=g(k_0+R(y))=y$, hence $k_0=0$; it is surjective because every $h\in E$ decomposes as
\begin{align*}
h=(h-R(g(h)))+R(g(h)),
\end{align*}
with $h-R(g(h))\in K$.
Using this isomorphism, define the local constraint map
\begin{align*}
H:W\subset K\times\mathbb{R}^k\to\mathbb{R}^k, \qquad H(k_0,y)=G_0(e_0+\Phi(k_0,y)),
\end{align*}
where $W:=\{(k_0,y)\in K\times\mathbb{R}^k:e_0+\Phi(k_0,y)\in U\}$. The set $W$ is open, $(0,0)\in W$, and $H$ is $C^1$. Its derivative in the $\mathbb{R}^k$ variable at $(0,0)$ is
\begin{align*}
D_yH_{(0,0)}(y)=D(G_0)_{e_0}(R(y))=g(R(y))=y.
\end{align*}
Thus $D_yH_{(0,0)}=\operatorname{id}_{\mathbb{R}^k}$ is invertible. The spaces $K$ and $\mathbb{R}^k$ are finite-dimensional normed real vector spaces, $W$ is an open neighbourhood of $(0,0)$ in $K\times\mathbb{R}^k$, the map $H:W\to\mathbb{R}^k$ is $C^1$, and its derivative in the second variable at $(0,0)$ is an isomorphism. By the finite-dimensional implicit function theorem, there are open neighbourhoods $A\subset K$ of $0$ and $B\subset\mathbb{R}^k$ of $0$, and a $C^1$ map
\begin{align*}
\psi:A\to B
\end{align*}
with $\psi(0)=0$ such that, for $(k_0,y)\in A\times B$,
\begin{align*}
H(k_0,y)=0 \quad\text{if and only if}\quad y=\psi(k_0).
\end{align*}
[guided]
Assume now that the derivative of the constraint map,
\begin{align*}
g=D(G_0)_{e_0}:E\to\mathbb{R}^k,
\end{align*}
is surjective. We want to replace the constrained problem near $e_0$ by an unconstrained problem on the kernel of $g$. Define
\begin{align*}
K:=\ker g\subset E.
\end{align*}
Because $g$ is surjective, we may choose a preimage under $g$ of each standard basis vector of $\mathbb{R}^k$ and extend linearly; this yields a linear right inverse
\begin{align*}
R:\mathbb{R}^k\to E
\end{align*}
such that $g\circ R=\operatorname{id}_{\mathbb{R}^k}$. This choice lets us split each direction into a kernel part and a constraint-changing part. Define
\begin{align*}
\Phi:K\times\mathbb{R}^k\to E, \qquad \Phi(k_0,y)=k_0+R(y).
\end{align*}
The map $\Phi$ is injective because $\Phi(k_0,y)=0$ implies $0=g(k_0+R(y))=y$, and then $k_0=0$. It is surjective because every $h\in E$ has the decomposition
\begin{align*}
h=(h-R(g(h)))+R(g(h)),
\end{align*}
where $g(h-R(g(h)))=g(h)-g(R(g(h)))=g(h)-g(h)=0$, so $h-R(g(h))\in K$.
Now define the [open set](/page/Open%20Set)
\begin{align*}
W:=\{(k_0,y)\in K\times\mathbb{R}^k:e_0+\Phi(k_0,y)\in U\}
\end{align*}
and the local constraint map
\begin{align*}
H:W\to\mathbb{R}^k, \qquad H(k_0,y)=G_0(e_0+\Phi(k_0,y)).
\end{align*}
The set $W$ is open in the finite-dimensional vector space $K\times\mathbb{R}^k$ because $U$ is open and $(k_0,y)\mapsto e_0+\Phi(k_0,y)$ is continuous. Also $(0,0)\in W$ because $e_0\in U$, and $H$ is $C^1$ because $G_0$ is $C^1$ and $\Phi$ is linear. The derivative of $H$ in the $\mathbb{R}^k$ variable at $(0,0)$ is
\begin{align*}
D_yH_{(0,0)}(y)=D(G_0)_{e_0}(R(y))=g(R(y))=y.
\end{align*}
Thus $D_yH_{(0,0)}=\operatorname{id}_{\mathbb{R}^k}$, which is an isomorphism. The finite-dimensional implicit function theorem applies to the $C^1$ map $H:W\to\mathbb{R}^k$ between finite-dimensional normed spaces with the second-variable derivative invertible at $(0,0)$. Therefore there are open neighbourhoods $A\subset K$ of $0$ and $B\subset\mathbb{R}^k$ of $0$, and a $C^1$ map
\begin{align*}
\psi:A\to B
\end{align*}
with $\psi(0)=0$ such that, for $(k_0,y)\in A\times B$,
\begin{align*}
H(k_0,y)=0 \quad\text{if and only if}\quad y=\psi(k_0).
\end{align*}
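Equivalently, the feasible points of the form $e_0+\Phi(k_0,y)$ with $(k_0,y)\in A\times B$ are exactly the points $e_0+\Phi(k_0,\psi(k_0))$ with $k_0\in A$, so the kernel variable $k_0$ parametrizes all nearby feasible points in these coordinates.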
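As a sketch of this construction in an assumed toy example (not taken from the statement), let $E=U=\mathbb{R}^2$, $k=1$, $G_0(x,y)=x^2+y^2-1$, and $e_0=(0,-1)$. Then $g(h)=-2h_2$ is surjective, $K=\{(t,0):t\in\mathbb{R}\}$, and one admissible right inverse is $R(s)=(0,-s/2)$. Writing $k_0=(t,0)$,
\begin{align*}
\Phi(k_0,s)&=(t,-s/2),\\
H(k_0,s)&=t^2+(-1-s/2)^2-1=t^2+s+\tfrac{s^2}{4},\\
D_yH_{(0,0)}(s)&=s.
\end{align*}
With $A=\{(t,0):|t|<1\}$ and $B=(-2,2)$, the only solution of $H(k_0,s)=0$ with $s\in B$ is $s=\psi(k_0)=2\sqrt{1-t^2}-2$, and $e_0+\Phi(k_0,\psi(k_0))=(t,-\sqrt{1-t^2})$ traces the lower unit semicircle.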
[/guided]
[/step]
[step:Differentiate the reduced objective along the kernel directions]
Shrink $A$ to a smaller open neighbourhood of $0$ if necessary so that $e_0+\Phi(k_0,\psi(k_0))\in V$ for every $k_0\in A$. Define the reduced objective
\begin{align*}
F:A\to\mathbb{R}, \qquad F(k_0)=J_0(e_0+\Phi(k_0,\psi(k_0))).
\end{align*}
For every $k_0\in A$, the point $e_0+\Phi(k_0,\psi(k_0))$ is feasible because $H(k_0,\psi(k_0))=0$, and it lies in $V$ by the choice of $A$. Hence the local minimality hypothesis gives $F(0)\leq F(k_0)$ for all $k_0\in A$. Since $F$ is $C^1$ on the open set $A$ and attains its minimum at $0$, we get $DF_0=0$ as an element of $K^*$.
We next compute $D\psi_0$. Differentiating the identity $H(k_0,\psi(k_0))=0$ at $k_0=0$ in a direction $v\in K$ gives
\begin{align*}
0=D_kH_{(0,0)}(v)+D_yH_{(0,0)}(D\psi_0(v)).
\end{align*}
Here $D_kH_{(0,0)}(v)=g(v)=0$ because $v\in K$, and $D_yH_{(0,0)}=\operatorname{id}_{\mathbb{R}^k}$. Thus $D\psi_0(v)=0$ for every $v\in K$.
Now differentiate $F$ at $0$. For every $v\in K$, the chain rule gives
\begin{align*}
0=DF_0(v)=j(\Phi(v,D\psi_0(v)))=j(v+R(D\psi_0(v)))=j(v).
\end{align*}
Thus $j$ vanishes on $K=\ker g$.
[guided]
The parametrization from the previous step converts the constrained local minimum into an ordinary local minimum on the kernel space $K$. Shrink $A$ if necessary so that
\begin{align*}
e_0+\Phi(k_0,\psi(k_0))\in V
\end{align*}
for every $k_0\in A$; this is possible because $k_0\mapsto e_0+\Phi(k_0,\psi(k_0))$ is continuous, sends $0$ to $e_0$, and $V$ is an open neighbourhood of $e_0$. Define the reduced objective map
\begin{align*}
F:A\to\mathbb{R}, \qquad F(k_0)=J_0(e_0+\Phi(k_0,\psi(k_0))).
\end{align*}
The map $F$ is $C^1$ because $J_0$, $\Phi$, and $\psi$ are $C^1$. For each $k_0\in A$, the point $e_0+\Phi(k_0,\psi(k_0))$ is feasible, since
\begin{align*}
G_0(e_0+\Phi(k_0,\psi(k_0)))=H(k_0,\psi(k_0))=0.
\end{align*}
It also lies in $V$ after the shrinkage of $A$. Hence the local minimality hypothesis gives
\begin{align*}
F(0)=J_0(e_0)\leq J_0(e_0+\Phi(k_0,\psi(k_0)))=F(k_0)
\end{align*}
for every $k_0\in A$. Since $A$ is an open neighbourhood of $0$ in the finite-dimensional vector space $K$ and $F$ is differentiable at $0$, the derivative of $F$ at this local minimum is zero:
\begin{align*}
DF_0=0\in K^*.
\end{align*}
Next we compute the derivative of the implicit function $\psi$ at $0$. The identity
\begin{align*}
H(k_0,\psi(k_0))=0
\end{align*}
holds for every $k_0\in A$. Differentiating this identity at $0$ in a direction $v\in K$ and using the chain rule gives
\begin{align*}
0=D_kH_{(0,0)}(v)+D_yH_{(0,0)}(D\psi_0(v)).
\end{align*}
The first term is
\begin{align*}
D_kH_{(0,0)}(v)=D(G_0)_{e_0}(v)=g(v)=0,
\end{align*}
because $v\in K=\ker g$. The second-variable derivative is $D_yH_{(0,0)}=\operatorname{id}_{\mathbb{R}^k}$, so the displayed identity becomes $D\psi_0(v)=0$.
Finally differentiate the reduced objective. For every $v\in K$, the chain rule gives
\begin{align*}
0=DF_0(v)=D(J_0)_{e_0}(\Phi(v,D\psi_0(v)))=j(v+R(D\psi_0(v)))=j(v).
\end{align*}
Thus the objective derivative $j$ vanishes on every vector in $K=\ker g$.
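To see these identities in an assumed toy example (an illustration, not part of the statement), take $E=U=\mathbb{R}^2$, $k=1$, $G_0(x,y)=x^2+y^2-1$, $J_0(x,y)=y$, and $e_0=(0,-1)$, with the right inverse $R(s)=(0,-s/2)$ of $g(h)=-2h_2$. Then $K=\{(t,0):t\in\mathbb{R}\}$, and for $|t|<1$ one finds $\psi(t,0)=2\sqrt{1-t^2}-2$, so that
\begin{align*}
F(t,0)&=J_0\bigl(t,-\sqrt{1-t^2}\bigr)=-\sqrt{1-t^2}\geq -1=F(0,0),\\
\frac{d}{dt}\psi(t,0)\Big|_{t=0}&=\frac{-2t}{\sqrt{1-t^2}}\Big|_{t=0}=0,\\
\frac{d}{dt}F(t,0)\Big|_{t=0}&=\frac{t}{\sqrt{1-t^2}}\Big|_{t=0}=0.
\end{align*}
This agrees with $j(v)=v_2=0$ for every $v=(t,0)\in K$.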
[/guided]
[/step]
[step:Build the normal multiplier from the quotient by the kernel]
Define a covector
\begin{align*}
\mu:\mathbb{R}^k\to\mathbb{R}, \qquad \mu(y)=j(R(y)).
\end{align*}
This map is linear because $j$ and $R$ are linear. For each $h\in E$, the decomposition used to prove that $\Phi$ is surjective gives $h=(h-R(g(h)))+R(g(h))$ with $h-R(g(h))\in\ker g$. Since $j$ is linear and vanishes on $\ker g$,
\begin{align*}
j(h)=j(R(g(h)))=\mu(g(h)).
\end{align*}
Define $\lambda\in(\mathbb{R}^k)^*$ by $\lambda:=-\mu$. Then, for every $h\in E$,
\begin{align*}
j(h)+\lambda(g(h))=j(h)-\mu(g(h))=0.
\end{align*}
Thus $(1,\lambda)$ is a nonzero multiplier pair satisfying the Fritz John equation. Combining this normal construction in the surjective case with the abnormal construction in the nonsurjective case proves the multiplier alternative. The equivalence with
\begin{align*}
\lambda_0D(J_0)_{e_0}+\lambda\circ D(G_0)_{e_0}=0
\end{align*}
as an element of $E^*$ follows directly from the definitions of $j$ and $g$, and the proof is complete.
[guided]
We now convert the fact that $j$ vanishes on $\ker g$ into a multiplier on $\mathbb{R}^k$. Define the covector
\begin{align*}
\mu:\mathbb{R}^k\to\mathbb{R}, \qquad \mu(y)=j(R(y)).
\end{align*}
This map is linear because both $R$ and $j$ are linear. For any $h\in E$, use the decomposition associated to the right inverse $R$:
\begin{align*}
h=(h-R(g(h)))+R(g(h)).
\end{align*}
The first summand lies in $\ker g$ because
\begin{align*}
g(h-R(g(h)))=g(h)-g(R(g(h)))=g(h)-g(h)=0.
\end{align*}
Since the previous step proved that $j$ vanishes on $\ker g$, we obtain
\begin{align*}
j(h)=j(R(g(h)))=\mu(g(h)).
\end{align*}
Define
\begin{align*}
\lambda:=-\mu\in(\mathbb{R}^k)^*.
\end{align*}
Then, for every $h\in E$,
\begin{align*}
j(h)+\lambda(g(h))=j(h)-\mu(g(h))=0.
\end{align*}
Thus $(1,\lambda)$ is a nonzero multiplier pair, and the multiplier is normal because its first component is $1$. The first step handled the nonsurjective case by constructing an abnormal pair with $\lambda_0=0$; the present construction handles the surjective case with $\lambda_0=1$. These two cases exhaust all possibilities for the linear map $g:E\to\mathbb{R}^k$. Finally, since $j=D(J_0)_{e_0}$ and $g=D(G_0)_{e_0}$ by definition, the identity
\begin{align*}
\lambda_0j(h)+\lambda(g(h))=0\quad\text{for every }h\in E
\end{align*}
is exactly the assertion that
\begin{align*}
\lambda_0D(J_0)_{e_0}+\lambda\circ D(G_0)_{e_0}=0
\end{align*}
as a linear functional on $E$.
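As a check in an assumed toy example (not part of the statement), take $E=U=\mathbb{R}^2$, $k=1$, $G_0(x,y)=x^2+y^2-1$, $J_0(x,y)=y$, and $e_0=(0,-1)$, so that $j(h)=h_2$ and $g(h)=-2h_2$, with the right inverse $R(s)=(0,-s/2)$. The construction above then gives
\begin{align*}
\mu(s)&=j(R(s))=-\tfrac{s}{2},\\
\lambda(s)&=-\mu(s)=\tfrac{s}{2},\\
j(h)+\lambda(g(h))&=h_2+\tfrac{1}{2}(-2h_2)=0,
\end{align*}
which recovers the classical Lagrange multiplier $\tfrac12$ for minimizing $y$ on the unit circle.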
[/guided]
[/step]