[proofplan]
The proof is the standard verification argument: compare the candidate value function $W$ with the cost accumulated along an arbitrary admissible trajectory. The HJB equation says that, along every admissible control, the derivative of $W$ along the controlled path is bounded below by the negative running cost. Integrating this differential inequality from the initial time to the terminal time and using $W(T,\cdot)=g$ gives $W \leq J$, and taking the infimum over controls gives $W \leq V$. If a feedback attains the Hamiltonian infimum along its closed-loop trajectory, the same calculation is an equality, so the corresponding control attains the value.
[/proofplan]
[step:Derive the differential inequality along an arbitrary admissible trajectory]
Fix $(t,x) \in [0,T]\times \mathbb{R}^n$ and $u \in \mathcal{A}[t,T]$. Let $x_u: [t,T]\to \mathbb{R}^n$ be the corresponding state trajectory, so $x_u(t)=x$ and $\dot{x}_u(s)=f(x_u(s),u(s))$ for the relevant times $s \in [t,T]$.
If $t=T$, then
\begin{align*}
W(T,x)=g(x)=J_{T,x}[u],
\end{align*}
because the running cost over $[T,T]$ is zero. Hence assume $t<T$.
For each $s \in [t,T)$, the definition of the infimum gives
\begin{align*}
\inf_{a\in U}\{\ell(x_u(s),a)+\nabla_x W(s,x_u(s))\cdot f(x_u(s),a)\} \leq \ell(x_u(s),u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s)).
\end{align*}
Using the HJB equation at $(s,x_u(s))$ and adding $\partial_t W(s,x_u(s))$ to the preceding inequality yields
\begin{align*}
0 \leq \partial_t W(s,x_u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s))+\ell(x_u(s),u(s)).
\end{align*}
[guided]
Fix the initial pair $(t,x)$ and an admissible control $u \in \mathcal{A}[t,T]$. The associated trajectory is the map $x_u: [t,T]\to \mathbb{R}^n$ satisfying $x_u(t)=x$ and the controlled state equation
\begin{align*}
\dot{x}_u(s)=f(x_u(s),u(s)).
\end{align*}
The endpoint case $t=T$ contains no evolution. The integral over the one-point interval $[T,T]$ is zero, and the terminal condition gives
\begin{align*}
J_{T,x}[u]=g(x)=W(T,x).
\end{align*}
Thus the desired inequality is an equality when $t=T$, and we may assume $t<T$.
The HJB equation compares $W$ against all possible instantaneous controls. At time $s \in [t,T)$ and state $x_u(s)$, the actual control value $u(s)$ is one admissible element of $U$. Therefore the infimum over all $a \in U$ is bounded above by the value obtained at $a=u(s)$:
\begin{align*}
\inf_{a\in U}\{\ell(x_u(s),a)+\nabla_x W(s,x_u(s))\cdot f(x_u(s),a)\} \leq \ell(x_u(s),u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s)).
\end{align*}
The HJB equation says
\begin{align*}
\partial_t W(s,x_u(s))+\inf_{a\in U}\{\ell(x_u(s),a)+\nabla_x W(s,x_u(s))\cdot f(x_u(s),a)\}=0.
\end{align*}
Substituting the preceding upper bound for the infimum gives the pointwise differential inequality
\begin{align*}
0 \leq \partial_t W(s,x_u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s))+\ell(x_u(s),u(s)).
\end{align*}
This is the core verification inequality: the infinitesimal change of $W$ along the controlled path cannot be smaller than the negative of the running cost.
[/guided]
[/step]
[step:Integrate the chain-rule identity to compare $W$ with the cost]
Define
\begin{align*}
q_u: [t,T]\to \mathbb{R}, \qquad q_u(s):=W(s,x_u(s)).
\end{align*}
By the chain-rule hypothesis applied to $q_u$ and the state equation,
\begin{align*}
q_u'(s)=\partial_t W(s,x_u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s))
\end{align*}
for the times at which the chain rule is asserted. Combining this identity with the differential inequality from the previous step gives
\begin{align*}
q_u'(s)+\ell(x_u(s),u(s))\geq 0.
\end{align*}
Equivalently,
\begin{align*}
-q_u'(s)\leq \ell(x_u(s),u(s)).
\end{align*}
Integrating over $[t,T]$ with respect to $\mathcal{L}^1$ and using the fundamental identity for the derivative of $q_u$ on $[t,T]$, we obtain
\begin{align*}
q_u(t)-q_u(T)\leq \int_{[t,T]} \ell(x_u(s),u(s))\,d\mathcal{L}^1(s).
\end{align*}
Since $q_u(t)=W(t,x)$ and, by the terminal condition, $q_u(T)=W(T,x_u(T))=g(x_u(T))$, this becomes
\begin{align*}
W(t,x)\leq \int_{[t,T]} \ell(x_u(s),u(s))\,d\mathcal{L}^1(s)+g(x_u(T)).
\end{align*}
The right-hand side is precisely $J_{t,x}[u]$, so
\begin{align*}
W(t,x)\leq J_{t,x}[u].
\end{align*}
If the running cost integral is $+\infty$, the same inequality is interpreted in the extended-real order and remains valid; the hypotheses that the integral and cost are well defined exclude undefined extended-real arithmetic.
[/step]
[step:Take the infimum over admissible controls]
The preceding estimate holds for the fixed admissible control $u \in \mathcal{A}[t,T]$. Since $u$ was arbitrary, $W(t,x)$ is a lower bound for the set of extended-[real numbers](/page/Real%20Numbers)
\begin{align*}
\{J_{t,x}[v]: v\in \mathcal{A}[t,T]\}.
\end{align*}
Therefore, by the definition of the infimum,
\begin{align*}
W(t,x)\leq \inf_{v\in \mathcal{A}[t,T]}J_{t,x}[v]=V(t,x).
\end{align*}
This proves the verification inequality for every initial pair $(t,x)$.
[/step]
[step:Use Hamiltonian attainment along the closed-loop trajectory to get equality]
Now fix the initial pair $(t,x)$ from the optimality part of the theorem. Let
\begin{align*}
a^*: [t,T]\times \mathbb{R}^n\to U
\end{align*}
be the feedback map, let $x^*: [t,T]\to \mathbb{R}^n$ be the closed-loop trajectory, and define $u^*: [t,T]\to U$ by
\begin{align*}
u^*(s):=a^*(s,x^*(s)).
\end{align*}
By hypothesis, $u^*\in \mathcal{A}[t,T]$ and $J_{t,x}[u^*]$ is finite.
If $t=T$, then $W(T,x)=g(x)=J_{T,x}[u^*]$, so assume $t<T$. The Hamiltonian attainment hypothesis gives, for every $s\in[t,T)$,
\begin{align*}
\ell(x^*(s),u^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))=\inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}.
\end{align*}
Using the HJB equation at $(s,x^*(s))$, we obtain
\begin{align*}
\partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))+\ell(x^*(s),u^*(s))=0.
\end{align*}
Define
\begin{align*}
q_*: [t,T]\to \mathbb{R}, \qquad q_*(s):=W(s,x^*(s)).
\end{align*}
The chain-rule hypothesis applied to the admissible control $u^*$ and its trajectory $x^*$ gives
\begin{align*}
q_*'(s)=\partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s)).
\end{align*}
Hence
\begin{align*}
q_*'(s)+\ell(x^*(s),u^*(s))=0.
\end{align*}
Integrating over $[t,T]$ with respect to $\mathcal{L}^1$ gives
\begin{align*}
q_*(t)-q_*(T)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s).
\end{align*}
Using $q_*(t)=W(t,x)$ and $q_*(T)=W(T,x^*(T))=g(x^*(T))$, we obtain
\begin{align*}
W(t,x)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s)+g(x^*(T))=J_{t,x}[u^*].
\end{align*}
[guided]
The lower-bound part of the proof used only that the actual control is one candidate in the infimum. For the feedback control, we assume something stronger: along the closed-loop trajectory, the chosen feedback value actually attains the infimum in the Hamiltonian.
Let $x^*: [t,T]\to \mathbb{R}^n$ be the solution of the closed-loop equation and let $u^*: [t,T]\to U$ be the induced control
\begin{align*}
u^*(s)=a^*(s,x^*(s)).
\end{align*}
The hypotheses state that $u^*\in\mathcal{A}[t,T]$ and that its cost $J_{t,x}[u^*]$ is finite, so this control is a legitimate competitor in the value problem.
When $t=T$, the equality follows immediately from the terminal condition:
\begin{align*}
W(T,x)=g(x)=J_{T,x}[u^*].
\end{align*}
Assume now that $t<T$. The Hamiltonian attainment hypothesis says that, for every $s\in[t,T)$, the value $u^*(s)$ realizes the same quantity as the infimum:
\begin{align*}
\ell(x^*(s),u^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))=\inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}.
\end{align*}
The HJB equation at the point $(s,x^*(s))$ is
\begin{align*}
\partial_t W(s,x^*(s))+\inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}=0.
\end{align*}
Substituting the attained value of the infimum gives an equality, not merely an inequality:
\begin{align*}
\partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))+\ell(x^*(s),u^*(s))=0.
\end{align*}
Now define the real-valued function
\begin{align*}
q_*: [t,T]\to \mathbb{R}, \qquad q_*(s):=W(s,x^*(s)).
\end{align*}
The chain-rule hypothesis applies to the admissible control $u^*$ and the corresponding trajectory $x^*$, so
\begin{align*}
q_*'(s)=\partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s)).
\end{align*}
Combining this identity with the preceding equality yields
\begin{align*}
q_*'(s)+\ell(x^*(s),u^*(s))=0.
\end{align*}
Integrating this equality over $[t,T]$ with respect to $\mathcal{L}^1$ gives
\begin{align*}
q_*(t)-q_*(T)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s).
\end{align*}
Finally, $q_*(t)=W(t,x)$ because $x^*(t)=x$, and $q_*(T)=W(T,x^*(T))=g(x^*(T))$ by the terminal condition. Therefore
\begin{align*}
W(t,x)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s)+g(x^*(T))=J_{t,x}[u^*].
\end{align*}
The feedback condition has converted the verification inequality into an exact identity along the closed-loop path.
[/guided]
[/step]
[step:Conclude optimality of the feedback control]
From the first part of the proof applied at the fixed initial pair $(t,x)$, we have
\begin{align*}
W(t,x)\leq V(t,x).
\end{align*}
Since $u^*\in\mathcal{A}[t,T]$, the definition of $V(t,x)$ as an infimum gives
\begin{align*}
V(t,x)\leq J_{t,x}[u^*].
\end{align*}
The previous step proved
\begin{align*}
W(t,x)=J_{t,x}[u^*].
\end{align*}
Combining the three relations yields
\begin{align*}
W(t,x)=V(t,x)=J_{t,x}[u^*].
\end{align*}
Thus $u^*$ attains the infimum defining $V(t,x)$, and hence $u^*$ is optimal from the initial pair $(t,x)$.
[/step]