Classical HJB Verification Theorem — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Let $T > 0$, let $U$ be a nonempty control set, and let $f: \mathbb{R}^n \times U \to \mathbb{R}^n$, $\ell: \mathbb{R}^n \times U \to \mathbb{R}$, and $g: \mathbb{R}^n \to \mathbb{R}$ be given functions. For each $t \in [0,T]$, let $\mathcal{A}[t,T]$ be a class of admissible controls $u: [t,T] \to U$. For each initial pair $(t,x) \in [0,T] \times \mathbb{R}^n$ and each $u \in \mathcal{A}[t,T]$, assume that there is a corresponding state trajectory $x_u: [t,T] \to \mathbb{R}^n$ satisfying \begin{align*} \dot{x}_u(s) = f(x_u(s),u(s)) \end{align*} for the relevant times $s \in [t,T]$ and satisfying $x_u(t)=x$. Assume that the running cost integral is well defined as an extended real number for every admissible control, and define \begin{align*} J_{t,x}[u] := \int_{[t,T]} \ell(x_u(s),u(s))\, d\mathcal{L}^1(s) + g(x_u(T)). \end{align*} Define the value function $V: [0,T] \times \mathbb{R}^n \to [-\infty,\infty]$ by \begin{align*} V(t,x) := \inf_{u \in \mathcal{A}[t,T]} J_{t,x}[u]. \end{align*} Let $W \in C^1([0,T]\times \mathbb{R}^n)$ satisfy the Hamilton-Jacobi-Bellman equation \begin{align*} \partial_t W(t,x) + \inf_{a \in U}\{\ell(x,a) + \nabla_x W(t,x)\cdot f(x,a)\} = 0 \end{align*} for every $(t,x) \in [0,T) \times \mathbb{R}^n$, and the terminal condition \begin{align*} W(T,x)=g(x) \end{align*} for every $x \in \mathbb{R}^n$. Assume that for every initial pair $(t,x) \in [0,T]\times \mathbb{R}^n$ and every $u \in \mathcal{A}[t,T]$, the state equation is well posed and the chain rule applies to the function \begin{align*} q_u: [t,T] \to \mathbb{R}, \qquad q_u(s):=W(s,x_u(s)). \end{align*} Then \begin{align*} W(t,x) \leq J_{t,x}[u] \end{align*} for every $(t,x) \in [0,T]\times \mathbb{R}^n$ and every $u \in \mathcal{A}[t,T]$. Consequently, \begin{align*} W(t,x) \leq V(t,x) \end{align*} for every $(t,x) \in [0,T]\times \mathbb{R}^n$. Moreover, fix an initial pair $(t,x) \in [0,T]\times \mathbb{R}^n$. Suppose there exists a measurable feedback map \begin{align*} a^*: [t,T]\times \mathbb{R}^n \to U \end{align*} such that the closed-loop equation \begin{align*} \dot{x}^*(s)=f(x^*(s),a^*(s,x^*(s))), \qquad x^*(t)=x \end{align*} is well posed on $[t,T]$. Define the feedback-generated control $u^*: [t,T]\to U$ by \begin{align*} u^*(s):=a^*(s,x^*(s)). \end{align*} Assume that $u^* \in \mathcal{A}[t,T]$, that $J_{t,x}[u^*]$ is finite, and that for every $s \in [t,T)$, \begin{align*} \ell(x^*(s),a^*(s,x^*(s)))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a^*(s,x^*(s))) = \inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}. \end{align*} Then \begin{align*} W(t,x)=V(t,x)=J_{t,x}[u^*]. \end{align*} In particular, $u^*$ is optimal from the initial pair $(t,x)$.

Discussion

Proof

[proofplan] The proof is the standard verification argument: compare the candidate value function $W$ with the cost accumulated along an arbitrary admissible trajectory. The HJB equation says that, along every admissible control, the derivative of $W$ along the controlled path is bounded below by the negative running cost. Integrating this differential inequality from the initial time to the terminal time and using $W(T,\cdot)=g$ gives $W \leq J$, and taking the infimum over controls gives $W \leq V$. If a feedback attains the Hamiltonian infimum along its closed-loop trajectory, the same calculation is an equality, so the corresponding control attains the value. [/proofplan] [step:Derive the differential inequality along an arbitrary admissible trajectory] Fix $(t,x) \in [0,T]\times \mathbb{R}^n$ and $u \in \mathcal{A}[t,T]$. Let $x_u: [t,T]\to \mathbb{R}^n$ be the corresponding state trajectory, so $x_u(t)=x$ and $\dot{x}_u(s)=f(x_u(s),u(s))$ for the relevant times $s \in [t,T]$. If $t=T$, then \begin{align*} W(T,x)=g(x)=J_{T,x}[u], \end{align*} because the running cost over $[T,T]$ is zero. Hence assume $t<T$. For each $s \in [t,T)$, the definition of the infimum gives \begin{align*} \inf_{a\in U}\{\ell(x_u(s),a)+\nabla_x W(s,x_u(s))\cdot f(x_u(s),a)\} \leq \ell(x_u(s),u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s)). \end{align*} Using the HJB equation at $(s,x_u(s))$ and adding $\partial_t W(s,x_u(s))$ to the preceding inequality yields \begin{align*} 0 \leq \partial_t W(s,x_u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s))+\ell(x_u(s),u(s)). \end{align*} [guided] Fix the initial pair $(t,x)$ and an admissible control $u \in \mathcal{A}[t,T]$. The associated trajectory is the map $x_u: [t,T]\to \mathbb{R}^n$ satisfying $x_u(t)=x$ and the controlled state equation \begin{align*} \dot{x}_u(s)=f(x_u(s),u(s)). \end{align*} The endpoint case $t=T$ contains no evolution. The integral over the one-point interval $[T,T]$ is zero, and the terminal condition gives \begin{align*} J_{T,x}[u]=g(x)=W(T,x). \end{align*} Thus the desired inequality is an equality when $t=T$, and we may assume $t<T$. The HJB equation compares $W$ against all possible instantaneous controls. At time $s \in [t,T)$ and state $x_u(s)$, the actual control value $u(s)$ is one admissible element of $U$. Therefore the infimum over all $a \in U$ is bounded above by the value obtained at $a=u(s)$: \begin{align*} \inf_{a\in U}\{\ell(x_u(s),a)+\nabla_x W(s,x_u(s))\cdot f(x_u(s),a)\} \leq \ell(x_u(s),u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s)). \end{align*} The HJB equation says \begin{align*} \partial_t W(s,x_u(s))+\inf_{a\in U}\{\ell(x_u(s),a)+\nabla_x W(s,x_u(s))\cdot f(x_u(s),a)\}=0. \end{align*} Substituting the preceding upper bound for the infimum gives the pointwise differential inequality \begin{align*} 0 \leq \partial_t W(s,x_u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s))+\ell(x_u(s),u(s)). \end{align*} This is the core verification inequality: the infinitesimal change of $W$ along the controlled path cannot be smaller than the negative of the running cost. [/guided] [/step] [step:Integrate the chain-rule identity to compare $W$ with the cost] Define \begin{align*} q_u: [t,T]\to \mathbb{R}, \qquad q_u(s):=W(s,x_u(s)). \end{align*} By the chain-rule hypothesis applied to $q_u$ and the state equation, \begin{align*} q_u'(s)=\partial_t W(s,x_u(s))+\nabla_x W(s,x_u(s))\cdot f(x_u(s),u(s)) \end{align*} for the times at which the chain rule is asserted. Combining this identity with the differential inequality from the previous step gives \begin{align*} q_u'(s)+\ell(x_u(s),u(s))\geq 0. \end{align*} Equivalently, \begin{align*} -q_u'(s)\leq \ell(x_u(s),u(s)). \end{align*} Integrating over $[t,T]$ with respect to $\mathcal{L}^1$ and using the fundamental identity for the derivative of $q_u$ on $[t,T]$, we obtain \begin{align*} q_u(t)-q_u(T)\leq \int_{[t,T]} \ell(x_u(s),u(s))\,d\mathcal{L}^1(s). \end{align*} Since $q_u(t)=W(t,x)$ and, by the terminal condition, $q_u(T)=W(T,x_u(T))=g(x_u(T))$, this becomes \begin{align*} W(t,x)\leq \int_{[t,T]} \ell(x_u(s),u(s))\,d\mathcal{L}^1(s)+g(x_u(T)). \end{align*} The right-hand side is precisely $J_{t,x}[u]$, so \begin{align*} W(t,x)\leq J_{t,x}[u]. \end{align*} If the running cost integral is $+\infty$, the same inequality is interpreted in the extended-real order and remains valid; the hypotheses that the integral and cost are well defined exclude undefined extended-real arithmetic. [/step] [step:Take the infimum over admissible controls] The preceding estimate holds for the fixed admissible control $u \in \mathcal{A}[t,T]$. Since $u$ was arbitrary, $W(t,x)$ is a lower bound for the set of extended-[real numbers](/page/Real%20Numbers) \begin{align*} \{J_{t,x}[v]: v\in \mathcal{A}[t,T]\}. \end{align*} Therefore, by the definition of the infimum, \begin{align*} W(t,x)\leq \inf_{v\in \mathcal{A}[t,T]}J_{t,x}[v]=V(t,x). \end{align*} This proves the verification inequality for every initial pair $(t,x)$. [/step] [step:Use Hamiltonian attainment along the closed-loop trajectory to get equality] Now fix the initial pair $(t,x)$ from the optimality part of the theorem. Let \begin{align*} a^*: [t,T]\times \mathbb{R}^n\to U \end{align*} be the feedback map, let $x^*: [t,T]\to \mathbb{R}^n$ be the closed-loop trajectory, and define $u^*: [t,T]\to U$ by \begin{align*} u^*(s):=a^*(s,x^*(s)). \end{align*} By hypothesis, $u^*\in \mathcal{A}[t,T]$ and $J_{t,x}[u^*]$ is finite. If $t=T$, then $W(T,x)=g(x)=J_{T,x}[u^*]$, so assume $t<T$. The Hamiltonian attainment hypothesis gives, for every $s\in[t,T)$, \begin{align*} \ell(x^*(s),u^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))=\inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}. \end{align*} Using the HJB equation at $(s,x^*(s))$, we obtain \begin{align*} \partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))+\ell(x^*(s),u^*(s))=0. \end{align*} Define \begin{align*} q_*: [t,T]\to \mathbb{R}, \qquad q_*(s):=W(s,x^*(s)). \end{align*} The chain-rule hypothesis applied to the admissible control $u^*$ and its trajectory $x^*$ gives \begin{align*} q_*'(s)=\partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s)). \end{align*} Hence \begin{align*} q_*'(s)+\ell(x^*(s),u^*(s))=0. \end{align*} Integrating over $[t,T]$ with respect to $\mathcal{L}^1$ gives \begin{align*} q_*(t)-q_*(T)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s). \end{align*} Using $q_*(t)=W(t,x)$ and $q_*(T)=W(T,x^*(T))=g(x^*(T))$, we obtain \begin{align*} W(t,x)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s)+g(x^*(T))=J_{t,x}[u^*]. \end{align*} [guided] The lower-bound part of the proof used only that the actual control is one candidate in the infimum. For the feedback control, we assume something stronger: along the closed-loop trajectory, the chosen feedback value actually attains the infimum in the Hamiltonian. Let $x^*: [t,T]\to \mathbb{R}^n$ be the solution of the closed-loop equation and let $u^*: [t,T]\to U$ be the induced control \begin{align*} u^*(s)=a^*(s,x^*(s)). \end{align*} The hypotheses state that $u^*\in\mathcal{A}[t,T]$ and that its cost $J_{t,x}[u^*]$ is finite, so this control is a legitimate competitor in the value problem. When $t=T$, the equality follows immediately from the terminal condition: \begin{align*} W(T,x)=g(x)=J_{T,x}[u^*]. \end{align*} Assume now that $t<T$. The Hamiltonian attainment hypothesis says that, for every $s\in[t,T)$, the value $u^*(s)$ realizes the same quantity as the infimum: \begin{align*} \ell(x^*(s),u^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))=\inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}. \end{align*} The HJB equation at the point $(s,x^*(s))$ is \begin{align*} \partial_t W(s,x^*(s))+\inf_{a\in U}\{\ell(x^*(s),a)+\nabla_x W(s,x^*(s))\cdot f(x^*(s),a)\}=0. \end{align*} Substituting the attained value of the infimum gives an equality, not merely an inequality: \begin{align*} \partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s))+\ell(x^*(s),u^*(s))=0. \end{align*} Now define the real-valued function \begin{align*} q_*: [t,T]\to \mathbb{R}, \qquad q_*(s):=W(s,x^*(s)). \end{align*} The chain-rule hypothesis applies to the admissible control $u^*$ and the corresponding trajectory $x^*$, so \begin{align*} q_*'(s)=\partial_t W(s,x^*(s))+\nabla_x W(s,x^*(s))\cdot f(x^*(s),u^*(s)). \end{align*} Combining this identity with the preceding equality yields \begin{align*} q_*'(s)+\ell(x^*(s),u^*(s))=0. \end{align*} Integrating this equality over $[t,T]$ with respect to $\mathcal{L}^1$ gives \begin{align*} q_*(t)-q_*(T)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s). \end{align*} Finally, $q_*(t)=W(t,x)$ because $x^*(t)=x$, and $q_*(T)=W(T,x^*(T))=g(x^*(T))$ by the terminal condition. Therefore \begin{align*} W(t,x)=\int_{[t,T]}\ell(x^*(s),u^*(s))\,d\mathcal{L}^1(s)+g(x^*(T))=J_{t,x}[u^*]. \end{align*} The feedback condition has converted the verification inequality into an exact identity along the closed-loop path. [/guided] [/step] [step:Conclude optimality of the feedback control] From the first part of the proof applied at the fixed initial pair $(t,x)$, we have \begin{align*} W(t,x)\leq V(t,x). \end{align*} Since $u^*\in\mathcal{A}[t,T]$, the definition of $V(t,x)$ as an infimum gives \begin{align*} V(t,x)\leq J_{t,x}[u^*]. \end{align*} The previous step proved \begin{align*} W(t,x)=J_{t,x}[u^*]. \end{align*} Combining the three relations yields \begin{align*} W(t,x)=V(t,x)=J_{t,x}[u^*]. \end{align*} Thus $u^*$ attains the infimum defining $V(t,x)$, and hence $u^*$ is optimal from the initial pair $(t,x)$. [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.

Classical HJB Verification Theorem (Theorem # 7631)

Discussion

Proof

Prerequisites (0/2 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Classical HJB Verification Theorem (Theorem # 7631)

Discussion

Proof

Prerequisites (0/2 completed)

Prerequisites Graph

Explore Further