[proofplan]
We enlarge the state by the accumulated running cost and use the assumed endpoint separation condition to obtain a nonzero terminal multiplier. The multiplier is oriented so that it pairs nonpositively with every terminal first-order variation in the cone $K$ and nonnegatively with every endpoint-tangent first-order descent direction. Solving the adjoint equation backward transports this terminal multiplier to a costate $p(t)$. Finally, evaluating the multiplier inequality on one-needle variations gives the Hamiltonian maximum condition at Lebesgue points, hence almost everywhere.
[/proofplan]
[step:Separate the terminal variation cone from endpoint descent directions]
Let
\begin{align*}
a=\nabla\Phi(x^*(t_1))\in\mathbb{R}^n
\end{align*}
and define the endpoint tangent descent cone
\begin{align*}
D=\{(w,\eta)\in\mathbb{R}^n\times\mathbb{R}:w\in T_{x^*(t_1)}M,\ \eta+a\cdot w<0\}.
\end{align*}
The set $K$ is a closed convex cone by definition, and $D$ is a convex cone, relatively open in the vector subspace $T_{x^*(t_1)}M\times\mathbb{R}$. By the assumed separation condition, $K\cap D=\varnothing$. We use the following finite-dimensional cone-separation form of the separating hyperplane theorem: if $C\subset\mathbb{R}^N$ is a closed convex cone, $O$ is a nonempty convex cone relatively open in a linear subspace, and $C\cap O=\varnothing$, then there is a nonzero linear functional that is nonpositive on $C$ and nonnegative on $O$. This form applies although $0\in K$ and $0\notin D$: only weak separation by a hyperplane through the origin is claimed, so the functional is allowed to vanish at the vertex $0$, which belongs to $K$ and to the closure of $D$. Hence we obtain a nonzero vector
\begin{align*}
(q,p_0)\in\mathbb{R}^n\times\mathbb{R}
\end{align*}
such that
\begin{align*}
q\cdot w+p_0\eta\leq 0
\end{align*}
for every $(w,\eta)\in K$, and
\begin{align*}
q\cdot w+p_0\eta\geq 0
\end{align*}
for every $(w,\eta)\in D$.
We now extract the sign and transversality information from the [second inequality](/theorems/2136). Taking $w=0$ and any $\eta<0$ gives $p_0\eta\geq 0$, hence $p_0\leq 0$. Next fix $w\in T_{x^*(t_1)}M$. For every $\varepsilon>0$, the point $(w,-a\cdot w-\varepsilon)$ belongs to $D$, so
\begin{align*}
q\cdot w+p_0(-a\cdot w-\varepsilon)\geq 0.
\end{align*}
Letting $\varepsilon\downarrow 0$ gives
\begin{align*}
(q-p_0a)\cdot w\geq 0.
\end{align*}
Replacing $w$ by $-w$ gives the reverse inequality, and therefore
\begin{align*}
(q-p_0a)\cdot w=0
\end{align*}
for every $w\in T_{x^*(t_1)}M$. Thus
\begin{align*}
q-p_0\nabla\Phi(x^*(t_1))\perp T_{x^*(t_1)}M.
\end{align*}
[guided]
The separation assumption says that the cone $K$ of attainable first-order terminal variations never enters the cone of endpoint-tangent variations that would strictly decrease the terminal augmented cost. We convert that geometric exclusion into a multiplier.
Set
\begin{align*}
a=\nabla\Phi(x^*(t_1))\in\mathbb{R}^n
\end{align*}
and define
\begin{align*}
D=\{(w,\eta)\in\mathbb{R}^n\times\mathbb{R}:w\in T_{x^*(t_1)}M,\ \eta+a\cdot w<0\}.
\end{align*}
Here $w$ is a first-order endpoint displacement tangent to the endpoint constraint manifold $M$, and $\eta$ is a first-order change in the accumulated running cost. The inequality $\eta+a\cdot w<0$ is exactly the strict first-order decrease of $\Phi(x(t_1))+j(t_1)$ in the direction $(w,\eta)$.
The set $K$ is a closed convex cone by its definition as the closure of a convex cone. The set $D$ is a convex cone because $T_{x^*(t_1)}M$ is a [vector space](/page/Vector%20Space) and the map $(w,\eta)\mapsto\eta+a\cdot w$ is linear, so the strict inequality defining $D$ is preserved under positive scaling and under convex combinations. It is nonempty, since $(0,-1)\in D$. The hypothesis gives $K\cap D=\varnothing$.
We now use the finite-dimensional cone-separation form of the separating hyperplane theorem. The version needed here says: if $C\subset\mathbb{R}^N$ is a closed convex cone, $O$ is a nonempty convex cone relatively open in a linear subspace, and $C\cap O=\varnothing$, then some nonzero linear functional is nonpositive on $C$ and nonnegative on $O$. This is the correct form for the present situation because $K$ contains $0$, while $D$ is open relative to $T_{x^*(t_1)}M\times\mathbb{R}$ and does not contain $0$; ordinary strict separation of two closed sets is not what is being used. Applying this cone-separation theorem with $C=K$ and $O=D$, there is a nonzero vector
\begin{align*}
(q,p_0)\in\mathbb{R}^n\times\mathbb{R}
\end{align*}
whose associated linear functional is nonpositive on $K$ and nonnegative on $D$:
\begin{align*}
q\cdot w+p_0\eta\leq 0
\end{align*}
for every $(w,\eta)\in K$, and
\begin{align*}
q\cdot w+p_0\eta\geq 0
\end{align*}
for every $(w,\eta)\in D$.
The orientation is important. We choose it so that $K$ lies in the nonpositive halfspace; combined with the sign $p_0\leq 0$ established next, this convention is what later yields a Hamiltonian maximum rather than a minimum.
To see $p_0\leq 0$, take $w=0$. Then $(0,\eta)\in D$ for every $\eta<0$, so
\begin{align*}
p_0\eta\geq 0
\end{align*}
for every $\eta<0$. This forces $p_0\leq 0$: if $p_0>0$, the choice $\eta=-1$ would give $p_0\eta=-p_0<0$.
Now fix an arbitrary tangent vector $w\in T_{x^*(t_1)}M$. For every $\varepsilon>0$, the vector $(w,-a\cdot w-\varepsilon)$ belongs to $D$, hence
\begin{align*}
q\cdot w+p_0(-a\cdot w-\varepsilon)\geq 0.
\end{align*}
The left-hand side equals $(q-p_0a)\cdot w-p_0\varepsilon$, so letting $\varepsilon\downarrow 0$ gives
\begin{align*}
(q-p_0a)\cdot w\geq 0.
\end{align*}
Applying the same argument to $-w\in T_{x^*(t_1)}M$ gives
\begin{align*}
(q-p_0a)\cdot w\leq 0.
\end{align*}
Therefore
\begin{align*}
(q-p_0a)\cdot w=0
\end{align*}
for every $w\in T_{x^*(t_1)}M$. Since $a=\nabla\Phi(x^*(t_1))$, this is precisely
\begin{align*}
q-p_0\nabla\Phi(x^*(t_1))\perp T_{x^*(t_1)}M.
\end{align*}
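As a hedged illustration, not part of the argument, consider the simplest case $n=1$ with $M=\mathbb{R}$, so that $T_{x^*(t_1)}M=\mathbb{R}$; this special case is an assumption made only for this example. Then $D$ is the open half-plane $\{(w,\eta):\eta+aw<0\}$, any closed convex cone $K$ disjoint from $D$ lies in the closed half-plane $\{(w,\eta):\eta+aw\geq 0\}$, and every nonzero functional that is nonnegative on $D$ has the form
\begin{align*}
(q,p_0)=-\lambda\,(a,1),\qquad \lambda>0.
\end{align*}
For this multiplier one reads off directly
\begin{align*}
p_0=-\lambda<0,\qquad q-p_0a=-\lambda a+\lambda a=0,
\end{align*}
which agrees with the sign condition and the orthogonality relation derived above.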
[/guided]
[/step]
[step:Transport the terminal multiplier backward by the adjoint equation]
Define the coefficient maps
\begin{align*}
A:[t_0,t_1]\to\mathbb{R}^{n\times n}
\end{align*}
and
\begin{align*}
b:[t_0,t_1]\to\mathbb{R}^n
\end{align*}
by
\begin{align*}
A(t)=\frac{\partial f}{\partial x}(x^*(t),u^*(t))
\end{align*}
and
\begin{align*}
b(t)=\frac{\partial L}{\partial x}(x^*(t),u^*(t)).
\end{align*}
Because $x^*$ is continuous on the compact interval $[t_0,t_1]$, its image is compact in $X$. Since $U$ is compact and the derivatives above are continuous on $X\times U$, they are bounded on the compact set $x^*([t_0,t_1])\times U$. As $u^*$ is Lebesgue measurable, the compositions $A$ and $b$ are therefore bounded and Lebesgue measurable.
Let
\begin{align*}
p:[t_0,t_1]\to\mathbb{R}^n
\end{align*}
be the unique absolutely continuous solution of the terminal-value linear equation
\begin{align*}
p(t_1)=q
\end{align*}
and
\begin{align*}
\dot p(t)=-A(t)^\top p(t)-p_0b(t)
\end{align*}
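for $\mathcal{L}^1$-a.e. $t\in[t_0,t_1]$. This equation is well posed because the coefficients are bounded and measurable, hence integrable on $[t_0,t_1]$, which is what the Carath\'eodory existence and uniqueness theory for linear systems requires.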
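As a sketch that is not needed for the proof, the solution can be written explicitly. The symbol $\Psi$ is notation introduced only for this remark: assume $\Psi(s,t)$ denotes the state transition matrix of $\dot z=A z$, so that $\partial_s\Psi(s,t)=A(s)\Psi(s,t)$ and $\Psi(t,t)=I$. Under this assumption,
\begin{align*}
p(t)=\Psi(t_1,t)^\top q+p_0\int_t^{t_1}\Psi(s,t)^\top b(s)\,ds.
\end{align*}
Setting $t=t_1$ recovers $p(t_1)=q$, and differentiating in $t$ with $\partial_t\Psi(s,t)=-\Psi(s,t)A(t)$ gives
\begin{align*}
\dot p(t)&=-A(t)^\top\Psi(t_1,t)^\top q-p_0b(t)-p_0\int_t^{t_1}A(t)^\top\Psi(s,t)^\top b(s)\,ds\\
&=-A(t)^\top p(t)-p_0b(t).
\end{align*}
In particular, $p$ depends linearly on the terminal data $(q,p_0)$.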
Since $(q,p_0)\neq(0,0)$, the pair $(p_0,p)$ is nontrivial. Indeed, if $(p_0,p(t))=(0,0)$ for every $t$, then evaluating at $t=t_1$ gives $(p_0,q)=(0,0)$, contradicting the nonzero separating multiplier.
[/step]
[step:Convert terminal cone inequalities into Hamiltonian increment inequalities]
Consider a finite needle variation with needle times $\tau_1,\dots,\tau_N$, replacement controls $v_1,\dots,v_N\in U$, and first-order durations $\alpha_1,\dots,\alpha_N\geq 0$. Let
\begin{align*}
(z,r):[t_0,t_1]\to\mathbb{R}^n\times\mathbb{R}
\end{align*}
be the corresponding first-order variation of the augmented system. Its terminal value belongs to $K$, so the separated inequality gives
\begin{align*}
q\cdot z(t_1)+p_0r(t_1)\leq 0.
\end{align*}
We compute this terminal pairing. On each open interval between needle times,
\begin{align*}
\frac{d}{dt}\bigl(p(t)\cdot z(t)+p_0r(t)\bigr)=\dot p(t)\cdot z(t)+p(t)\cdot \dot z(t)+p_0\dot r(t)
\end{align*}
for $\mathcal{L}^1$-a.e. $t$. Substituting $\dot z=A z$, $\dot r=b\cdot z$, and $\dot p=-A^\top p-p_0b$ gives
\begin{align*}
\frac{d}{dt}\bigl(p(t)\cdot z(t)+p_0r(t)\bigr)=0.
\end{align*}
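Written out term by term, with the identity $(A^\top p)\cdot z=p\cdot(Az)$ and with the time argument suppressed, the cancellation reads
\begin{align*}
\dot p\cdot z+p\cdot\dot z+p_0\dot r
&=(-A^\top p-p_0b)\cdot z+p\cdot(Az)+p_0\,b\cdot z\\
&=-p\cdot(Az)-p_0\,b\cdot z+p\cdot(Az)+p_0\,b\cdot z\\
&=0.
\end{align*}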
Since $(z(t_0),r(t_0))=(0,0)$, the value at $t_1$ is the sum of the jumps of this pairing at the needle times.
At the needle time $\tau_i$, define
\begin{align*}
\Delta f_i=f(x^*(\tau_i),v_i)-f(x^*(\tau_i),u^*(\tau_i))
\end{align*}
and
\begin{align*}
\Delta L_i=L(x^*(\tau_i),v_i)-L(x^*(\tau_i),u^*(\tau_i)).
\end{align*}
The prescribed jump conditions give
\begin{align*}
z(\tau_i^+)-z(\tau_i^-)=\alpha_i\Delta f_i
\end{align*}
and
\begin{align*}
r(\tau_i^+)-r(\tau_i^-)=\alpha_i\Delta L_i.
\end{align*}
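Because $p$ is continuous, the pairing $p\cdot z+p_0r$ jumps by $\alpha_i\bigl(p(\tau_i)\cdot\Delta f_i+p_0\Delta L_i\bigr)$ at $\tau_i$. Summing the jumps and using $p(t_1)=q$ gives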
\begin{align*}
q\cdot z(t_1)+p_0r(t_1)=\sum_{i=1}^N \alpha_i\bigl(p(\tau_i)\cdot\Delta f_i+p_0\Delta L_i\bigr).
\end{align*}
Using the definition of $H$, this becomes
\begin{align*}
q\cdot z(t_1)+p_0r(t_1)=\sum_{i=1}^N \alpha_i\bigl(H(x^*(\tau_i),p(\tau_i),v_i,p_0)-H(x^*(\tau_i),p(\tau_i),u^*(\tau_i),p_0)\bigr).
\end{align*}
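The identification of each summand with a Hamiltonian increment is a one-line expansion of $H(x,p,u,p_0)=p\cdot f(x,u)+p_0L(x,u)$. For a fixed needle index $i$, abbreviating $x_i=x^*(\tau_i)$, $p_i=p(\tau_i)$, and $u_i=u^*(\tau_i)$ (abbreviations used only in this display),
\begin{align*}
H(x_i,p_i,v_i,p_0)-H(x_i,p_i,u_i,p_0)
&=p_i\cdot\bigl(f(x_i,v_i)-f(x_i,u_i)\bigr)+p_0\bigl(L(x_i,v_i)-L(x_i,u_i)\bigr)\\
&=p_i\cdot\Delta f_i+p_0\Delta L_i.
\end{align*}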
Combining with the cone inequality yields
\begin{align*}
\sum_{i=1}^N \alpha_i\bigl(H(x^*(\tau_i),p(\tau_i),v_i,p_0)-H(x^*(\tau_i),p(\tau_i),u^*(\tau_i),p_0)\bigr)\leq 0.
\end{align*}
[/step]
[step:Localize the one-needle inequality to obtain the Hamiltonian maximum condition]
Apply the preceding inequality to one needle, so $N=1$, with a Lebesgue point $\tau$ of $u^*$, an arbitrary $v\in U$, and an arbitrary first-order duration $\alpha>0$. The inequality becomes
\begin{align*}
\alpha\bigl(H(x^*(\tau),p(\tau),v,p_0)-H(x^*(\tau),p(\tau),u^*(\tau),p_0)\bigr)\leq 0.
\end{align*}
Since $\alpha>0$, division by $\alpha$ gives
\begin{align*}
H(x^*(\tau),p(\tau),v,p_0)\leq H(x^*(\tau),p(\tau),u^*(\tau),p_0)
\end{align*}
for every $v\in U$ and every Lebesgue point $\tau$ of $u^*$ for which the needle variation expansion is valid.
The set of Lebesgue points of the measurable map $u^*$ has full $\mathcal{L}^1$ measure in $[t_0,t_1]$. Hence, for $\mathcal{L}^1$-a.e. $t\in[t_0,t_1]$,
\begin{align*}
H(x^*(t),p(t),u^*(t),p_0)=\max_{v\in U}H(x^*(t),p(t),v,p_0).
\end{align*}
The maximum is attained because $U$ is compact and, for fixed $x^*(t)$, $p(t)$, and $p_0$, the map $v\mapsto H(x^*(t),p(t),v,p_0)$ is continuous on $U$.
[guided]
The previous step turned every finite needle variation into an inequality involving Hamiltonian increments. To obtain a pointwise condition, we use the simplest possible needle: one change of the control at one Lebesgue point.
Fix a Lebesgue point $\tau$ of $u^*$ and fix an arbitrary replacement value $v\in U$. Take one needle at time $\tau$ with first-order duration $\alpha>0$. The finite-needle inequality becomes
\begin{align*}
\alpha\bigl(H(x^*(\tau),p(\tau),v,p_0)-H(x^*(\tau),p(\tau),u^*(\tau),p_0)\bigr)\leq 0.
\end{align*}
The parameter $\alpha$ is positive, so we may divide by it without changing the direction of the inequality:
\begin{align*}
H(x^*(\tau),p(\tau),v,p_0)\leq H(x^*(\tau),p(\tau),u^*(\tau),p_0).
\end{align*}
Because $v\in U$ was arbitrary, the optimal control value $u^*(\tau)$ maximizes the Hamiltonian over all admissible control values at the time $\tau$.
The role of the Lebesgue point hypothesis is to justify the first-order needle expansion: the short interval on which the control is changed samples $u^*$ by its point value $u^*(\tau)$ to first order. Since $u^*:[t_0,t_1]\to U$ is Lebesgue measurable and $U\subset\mathbb{R}^m$, almost every time is a Lebesgue point of $u^*$. Therefore the inequality holds for $\mathcal{L}^1$-a.e. $t\in[t_0,t_1]$.
Finally, the word "maximum" is justified, rather than merely "supremum", because $U$ is compact and the function
\begin{align*}
v\mapsto H(x^*(t),p(t),v,p_0)
\end{align*}
is continuous on $U$. The continuity follows from the continuity of $f$ and $L$ in $(x,u)$ and the formula
\begin{align*}
H(x,p,u,p_0)=p\cdot f(x,u)+p_0L(x,u).
\end{align*}
Thus, for $\mathcal{L}^1$-a.e. $t\in[t_0,t_1]$,
\begin{align*}
H(x^*(t),p(t),u^*(t),p_0)=\max_{v\in U}H(x^*(t),p(t),v,p_0).
\end{align*}
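As a hedged illustration of how the maximum condition determines the control, consider a control-affine special case. The maps $g_0$, $G$, $L_0$ and the choice of $U$ below are assumptions made only for this example and are not part of the theorem: suppose $U=[-1,1]^m$, $f(x,u)=g_0(x)+G(x)u$ with $G(x)\in\mathbb{R}^{n\times m}$, and $L(x,u)=L_0(x)$. Then
\begin{align*}
H(x,p,u,p_0)=p\cdot g_0(x)+\bigl(G(x)^\top p\bigr)\cdot u+p_0L_0(x),
\end{align*}
which is affine in $u$, so the maximum over $U$ is computed componentwise:
\begin{align*}
\max_{v\in U}H(x,p,v,p_0)=p\cdot g_0(x)+\sum_{i=1}^m\bigl|\bigl(G(x)^\top p\bigr)_i\bigr|+p_0L_0(x).
\end{align*}
In this case the maximum condition forces $u^*_i(t)=\operatorname{sign}\bigl((G(x^*(t))^\top p(t))_i\bigr)$ for a.e. $t$ at which that component is nonzero, and gives no information about $u^*_i(t)$ where it vanishes.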
[/guided]
[/step]
[step:Identify the state equation and adjoint equation with Hamiltonian derivatives]
For $x\in X$, $p\in\mathbb{R}^n$, $u\in U$, and $p_0\in\mathbb{R}$, the Hamiltonian is
\begin{align*}
H(x,p,u,p_0)=p\cdot f(x,u)+p_0L(x,u).
\end{align*}
Differentiating with respect to $p$ gives
\begin{align*}
\frac{\partial H}{\partial p}(x,p,u,p_0)=f(x,u).
\end{align*}
Since $(x^*,u^*)$ is admissible,
\begin{align*}
\dot{x}^*(t)=f(x^*(t),u^*(t))=\frac{\partial H}{\partial p}(x^*(t),p(t),u^*(t),p_0)
\end{align*}
for $\mathcal{L}^1$-a.e. $t\in[t_0,t_1]$.
Differentiating with respect to $x$ gives
\begin{align*}
\frac{\partial H}{\partial x}(x,p,u,p_0)=\left(\frac{\partial f}{\partial x}(x,u)\right)^\top p+p_0\frac{\partial L}{\partial x}(x,u).
\end{align*}
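The transpose can be checked in components. Writing $f=(f_1,\dots,f_n)$ and differentiating $H=\sum_{i=1}^n p_if_i+p_0L$ with respect to $x_j$ gives
\begin{align*}
\frac{\partial H}{\partial x_j}(x,p,u,p_0)
=\sum_{i=1}^n p_i\frac{\partial f_i}{\partial x_j}(x,u)+p_0\frac{\partial L}{\partial x_j}(x,u),
\end{align*}
and the sum on the right is the $j$-th component of $\left(\frac{\partial f}{\partial x}(x,u)\right)^\top p$. Along $(x^*(t),u^*(t))$ the right-hand side is therefore the $j$-th component of $A(t)^\top p+p_0b(t)$.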
Using the definition of $A$ and $b$, the adjoint equation constructed above is exactly
\begin{align*}
\dot p(t)=-\frac{\partial H}{\partial x}(x^*(t),p(t),u^*(t),p_0)
\end{align*}
for $\mathcal{L}^1$-a.e. $t\in[t_0,t_1]$.
[/step]
[step:Record transversality and nontriviality]
The terminal condition for the adjoint construction was $p(t_1)=q$. The separation step proved
\begin{align*}
q-p_0\nabla\Phi(x^*(t_1))\perp T_{x^*(t_1)}M.
\end{align*}
Substituting $q=p(t_1)$ gives the terminal transversality condition
\begin{align*}
p(t_1)-p_0\nabla\Phi(x^*(t_1))\perp T_{x^*(t_1)}M.
\end{align*}
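Two standard special cases illustrate this condition; both are assumptions about $M$ made only for illustration. If the endpoint is free, so that $T_{x^*(t_1)}M=\mathbb{R}^n$, the orthogonality forces
\begin{align*}
p(t_1)=p_0\nabla\Phi(x^*(t_1)).
\end{align*}
In that case $p_0=0$ would give $p(t_1)=q=0$, contradicting $(q,p_0)\neq(0,0)$, so $p_0<0$, and rescaling the multiplier to $p_0=-1$ gives $p(t_1)=-\nabla\Phi(x^*(t_1))$. If instead the endpoint is fixed, so that $M$ is a single point and $T_{x^*(t_1)}M=\{0\}$, the condition is vacuous and $p(t_1)$ is unconstrained.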
The same separation step gave $(q,p_0)\neq(0,0)$, and the terminal condition $p(t_1)=q$ implies that $(p_0,p(t))$ is not identically zero on $[t_0,t_1]$. Together with the state equation, adjoint equation, Hamiltonian maximum condition, and $p_0\leq 0$, this proves the theorem.
[/step]