[proofplan]
Fix a point $(t,x)$ with $t<T$ and compare the dynamic programming identity on the short interval $[t,t+h]$ with the first-order Taylor expansion of $V$ along controlled trajectories. Constant controls give one inequality against each fixed action $a\in U$, hence against the infimum over $U$. The assumed approximate minimizers for the short-time problem give the reverse inequality up to an $o(h)$ error. Dividing by $h$ and letting $h\downarrow 0$ identifies $-\partial_t V$ with the Hamiltonian, and the terminal condition $V(T,x)=g(x)$ is taken directly from the hypotheses.
[/proofplan]
[step:Fix a short time interval and define the Hamiltonian convention]
Fix $(t,x)\in [0,T)\times\mathbb{R}^n$. Choose $h>0$ so small that $t+h\leq T$ and the dynamic programming principle is valid on $[t,t+h]$. For $a\in U$, define the constant control
\begin{align*}
u_a:[t,t+h]\to U,\qquad u_a(s)=a.
\end{align*}
By hypothesis, $u_a\in\mathcal{A}_{t,t+h}$ for all sufficiently small $h>0$.
For $p\in\mathbb{R}^n$, the Hamiltonian convention used in this theorem is
\begin{align*}
H(x,p)=\inf_{a\in U}\{\ell(x,a)+p\cdot f(x,a)\}.
\end{align*}
Thus proving the Hamilton-Jacobi-Bellman equation is exactly proving
\begin{align*}
-\partial_t V(t,x)=H(x,\nabla_x V(t,x)).
\end{align*}
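As an illustration only (the following data are an assumption made for this remark and are not part of the theorem), take $U$ to be the closed unit ball of $\mathbb{R}^n$, $f(x,a)=a$ and $\ell(x,a)=1$. The infimum defining $H$ is then attained at $a=-p/|p|$ when $p\neq 0$, and every $a\in U$ is a minimizer when $p=0$, so
\begin{align*}
H(x,p)=\inf_{|a|\leq 1}\{1+p\cdot a\}=1-|p|.
\end{align*}
For this choice the equation to be proved would read $-\partial_t V(t,x)=1-|\nabla_x V(t,x)|$.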
[/step]
[step:Use constant controls to obtain the lower bound for each action]
Fix $a\in U$, and let $y_a:[t,t+h]\to\mathbb{R}^n$ denote the trajectory generated by $u_a$, so that $y_a(t)=x$ and
\begin{align*}
\dot y_a(s)=f(y_a(s),a).
\end{align*}
Since the infimum in the dynamic programming principle is bounded above by the cost of the admissible control $u_a$,
\begin{align*}
V(t,x)\leq \int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)+V(t+h,y_a(t+h)).
\end{align*}
Since $y_a\in C^1([t,t+h];\mathbb{R}^n)$ and $\dot y_a(t)=f(x,a)$, the first-order expansion of a differentiable curve gives
\begin{align*}
y_a(t+h)=x+h f(x,a)+o(h)
\end{align*}
as $h\downarrow 0$. Since $\ell$ is continuous and $y_a(s)\to x$ as $s\downarrow t$, the elementary averaging property for continuous functions on shrinking intervals gives
\begin{align*}
\int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)=h\ell(x,a)+o(h).
\end{align*}
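The averaging property can be made explicit as follows; the modulus $\omega_a$ is notation introduced only for this remark. Set $\omega_a(h)=\sup_{s\in[t,t+h]}|\ell(y_a(s),a)-\ell(x,a)|$, which tends to $0$ as $h\downarrow 0$ because $s\mapsto\ell(y_a(s),a)$ is continuous at $s=t$ with value $\ell(x,a)$. Then
\begin{align*}
\left|\int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)-h\ell(x,a)\right|\leq \int_t^{t+h}|\ell(y_a(s),a)-\ell(x,a)|\,d\mathcal{L}^1(s)\leq h\,\omega_a(h)=o(h).
\end{align*}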
Since $V\in C^1([0,T]\times\mathbb{R}^n)$, Taylor expansion at $(t,x)$, combined with the expansion of $y_a(t+h)$, gives
\begin{align*}
V(t+h,y_a(t+h))=V(t,x)+h\partial_t V(t,x)+h\nabla_x V(t,x)\cdot f(x,a)+o(h).
\end{align*}
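In more detail, writing $\Delta_h=y_a(t+h)-x$ (a symbol introduced only for this remark), differentiability of $V$ at $(t,x)$ gives
\begin{align*}
V(t+h,x+\Delta_h)=V(t,x)+h\partial_t V(t,x)+\nabla_x V(t,x)\cdot\Delta_h+o(h+|\Delta_h|).
\end{align*}
The trajectory expansion gives $\Delta_h=hf(x,a)+o(h)$, so $|\Delta_h|=O(h)$ and therefore
\begin{align*}
\nabla_x V(t,x)\cdot\Delta_h=h\nabla_x V(t,x)\cdot f(x,a)+o(h),\qquad o(h+|\Delta_h|)=o(h),
\end{align*}
which recovers the stated expansion.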
Substituting the expansions of the running cost and of $V(t+h,y_a(t+h))$ into the dynamic-programming inequality and cancelling $V(t,x)$ yields
\begin{align*}
0\leq h\{\ell(x,a)+\partial_t V(t,x)+\nabla_x V(t,x)\cdot f(x,a)\}+o(h).
\end{align*}
Dividing by $h>0$ and letting $h\downarrow 0$ gives
\begin{align*}
-\partial_t V(t,x)\leq \ell(x,a)+\nabla_x V(t,x)\cdot f(x,a).
\end{align*}
Because $a\in U$ was arbitrary,
\begin{align*}
-\partial_t V(t,x)\leq \inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}.
\end{align*}
[guided]
We first extract information from controls that we can name explicitly. Fix an action $a\in U$, and define the constant admissible control
\begin{align*}
u_a:[t,t+h]\to U,\qquad u_a(s)=a.
\end{align*}
Let $y_a:[t,t+h]\to\mathbb{R}^n$ be the corresponding trajectory, so $y_a(t)=x$ and
\begin{align*}
\dot y_a(s)=f(y_a(s),a).
\end{align*}
The dynamic programming principle says that $V(t,x)$ is the infimum of the short-time running cost plus the future value. Since an infimum is no larger than any particular admissible cost, applying it to $u_a$ gives
\begin{align*}
V(t,x)\leq \int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)+V(t+h,y_a(t+h)).
\end{align*}
Now we expand each term to first order in $h$. The theorem statement declares $y_a\in C^1([t,t+h];\mathbb{R}^n)$ and the ODE gives $\dot y_a(t)=f(y_a(t),a)=f(x,a)$. Therefore the first-order expansion of the differentiable curve $y_a$ at $t$ gives
\begin{align*}
y_a(t+h)=x+h f(x,a)+o(h)
\end{align*}
as $h\downarrow 0$. The running cost is also first-order in the interval length. The map $s\mapsto \ell(y_a(s),a)$ is continuous at $t$, so the elementary averaging property on shrinking intervals gives
\begin{align*}
\int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)=h\ell(x,a)+o(h).
\end{align*}
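The expansion of $y_a(t+h)$ used above can also be checked directly from the fundamental theorem of calculus, which applies because $y_a$ is $C^1$ with $\dot y_a(t)=f(x,a)$:
\begin{align*}
|y_a(t+h)-x-hf(x,a)|=\left|\int_t^{t+h}\bigl(\dot y_a(s)-\dot y_a(t)\bigr)\,d\mathcal{L}^1(s)\right|\leq h\sup_{s\in[t,t+h]}|\dot y_a(s)-\dot y_a(t)|.
\end{align*}
The supremum tends to $0$ as $h\downarrow 0$ by continuity of $\dot y_a$ at $t$, so the right-hand side is $o(h)$.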
Finally, $V\in C^1([0,T]\times\mathbb{R}^n)$, so Taylor expansion in both the time and space variables at $(t,x)$ gives
\begin{align*}
V(t+h,y_a(t+h))=V(t,x)+h\partial_t V(t,x)+h\nabla_x V(t,x)\cdot f(x,a)+o(h).
\end{align*}
Substituting the expansions of the running cost and of $V(t+h,y_a(t+h))$ into the dynamic-programming inequality and cancelling $V(t,x)$ from both sides gives
\begin{align*}
0\leq h\{\ell(x,a)+\partial_t V(t,x)+\nabla_x V(t,x)\cdot f(x,a)\}+o(h).
\end{align*}
After division by $h>0$ and passage to the limit $h\downarrow 0$, we obtain
\begin{align*}
-\partial_t V(t,x)\leq \ell(x,a)+\nabla_x V(t,x)\cdot f(x,a).
\end{align*}
This inequality holds for every fixed $a\in U$, so the left-hand side is a lower bound for the right-hand side over all actions and hence does not exceed the greatest lower bound:
\begin{align*}
-\partial_t V(t,x)\leq \inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}.
\end{align*}
[/guided]
[/step]
[step:Use approximate minimizers to obtain the reverse bound for the infimum]
By the assumed first-order approximate-minimizer consistency, there is a remainder $r_{t,x}(h)=o(h)$ as $h\downarrow 0$ such that
\begin{align*}
0\geq h\left\{\partial_t V(t,x)+\inf_{a\in U}\bigl(\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\bigr)\right\}+r_{t,x}(h).
\end{align*}
Dividing by $h>0$ and letting $h\downarrow 0$ gives
\begin{align*}
-\partial_t V(t,x)\geq \inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}.
\end{align*}
This is precisely the reverse inequality to the one obtained from constant controls.
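The passage to the limit in this step uses only the definition of $o(h)$. As a sketch of the computation, dividing the consistency inequality by $h>0$ gives
\begin{align*}
0\geq \partial_t V(t,x)+\inf_{a\in U}\bigl(\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\bigr)+\frac{r_{t,x}(h)}{h},\qquad \frac{r_{t,x}(h)}{h}\to 0\ \text{as } h\downarrow 0.
\end{align*}
The first two terms on the right do not depend on $h$, so the inequality persists in the limit.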
[/step]
[step:Identify the Hamiltonian and write the differential equation]
Combining the two inequalities gives
\begin{align*}
-\partial_t V(t,x)=\inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}.
\end{align*}
By the definition of $H$,
\begin{align*}
-\partial_t V(t,x)=H(x,\nabla_x V(t,x)).
\end{align*}
Equivalently,
\begin{align*}
\partial_t V(t,x)+H(x,\nabla_x V(t,x))=0.
\end{align*}
Since $(t,x)\in [0,T)\times\mathbb{R}^n$ was arbitrary, the Hamilton-Jacobi-Bellman equation holds on $[0,T)\times\mathbb{R}^n$.
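As a sanity check on the sign conventions, consider the following illustrative data, which are assumptions made for this remark and not part of the theorem: $U=\mathbb{R}^n$, $f(x,a)=a$, $\ell(x,a)=\tfrac12|a|^2$, and $g(x)=c\cdot x$ for a fixed $c\in\mathbb{R}^n$. Assuming further that $V$ is the usual infimum of running cost plus terminal cost over (say, square-integrable) controls $u$ on $[t,T]$, completing the square gives
\begin{align*}
&\int_t^T\tfrac12|u(s)|^2\,d\mathcal{L}^1(s)+c\cdot\Bigl(x+\int_t^T u(s)\,d\mathcal{L}^1(s)\Bigr)\\
&\qquad=c\cdot x-\tfrac12(T-t)|c|^2+\int_t^T\tfrac12|u(s)+c|^2\,d\mathcal{L}^1(s),
\end{align*}
so that $V(t,x)=c\cdot x-\tfrac12(T-t)|c|^2$, attained by $u\equiv -c$. In this case
\begin{align*}
\partial_t V(t,x)=\tfrac12|c|^2,\qquad \nabla_x V(t,x)=c,\qquad H(x,c)=\inf_{a\in\mathbb{R}^n}\{\tfrac12|a|^2+c\cdot a\}=-\tfrac12|c|^2,
\end{align*}
hence $\partial_t V(t,x)+H(x,\nabla_x V(t,x))=0$ and $V(T,x)=c\cdot x=g(x)$, in agreement with the equation just derived.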
[/step]
[step:Evaluate the value function at the terminal time]
The terminal condition is part of the hypotheses: for every $x\in\mathbb{R}^n$,
\begin{align*}
V(T,x)=g(x).
\end{align*}
Together with the Hamilton-Jacobi-Bellman equation on $[0,T)\times\mathbb{R}^n$ established in the previous step, this completes the proof.
[/step]