Hamilton-Jacobi-Bellman Equation from the Dynamic Programming Principle

Hamilton-Jacobi-Bellman Equation from the Dynamic Programming Principle (Theorem # 7630)

Theorem



Proof

[proofplan] Fix a point $(t,x)$ with $t<T$ and compare the dynamic programming identity on the short interval $[t,t+h]$ with the first-order Taylor expansion of $V$ along controlled trajectories. Constant controls give one inequality against each fixed action $a\in U$, hence against the infimum over $U$. The assumed approximate minimizers for the short-time problem give the reverse inequality up to an $o(h)$ error. Dividing by $h$ and letting $h\downarrow 0$ identifies $-\partial_t V$ with the Hamiltonian, and the terminal condition $V(T,\cdot)=g$ is taken directly from the hypotheses. [/proofplan]

[step:Fix a short time interval and define the Hamiltonian convention] Fix $(t,x)\in [0,T)\times\mathbb{R}^n$. Choose $h>0$ small enough that $t+h\leq T$ and the dynamic programming principle is valid on $[t,t+h]$. For $a\in U$, define the constant control \begin{align*} u_a:[t,t+h]\to U,\qquad u_a(s)=a. \end{align*} By hypothesis, $u_a\in\mathcal{A}_{t,t+h}$ for all sufficiently small $h>0$. For $p\in\mathbb{R}^n$, the Hamiltonian convention used in this theorem is \begin{align*} H(x,p)=\inf_{a\in U}\{\ell(x,a)+p\cdot f(x,a)\}. \end{align*} Proving the Hamilton-Jacobi-Bellman equation at $(t,x)$ therefore amounts to proving \begin{align*} -\partial_t V(t,x)=H(x,\nabla_x V(t,x)). \end{align*} [/step]

[step:Use constant controls to obtain the lower bound for each action] Fix $a\in U$, and let $y_a:[t,t+h]\to\mathbb{R}^n$ denote the trajectory generated by $u_a$, so that $y_a(t)=x$ and \begin{align*} \dot y_a(s)=f(y_a(s),a). \end{align*} Since the infimum in the dynamic programming principle is bounded above by the cost of the admissible control $u_a$, \begin{align*} V(t,x)\leq \int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)+V(t+h,y_a(t+h)). \end{align*} Since $y_a\in C^1([t,t+h];\mathbb{R}^n)$ and $\dot y_a(t)=f(x,a)$, the first-order expansion of a differentiable curve gives \begin{align*} y_a(t+h)=x+h f(x,a)+o(h) \end{align*} as $h\downarrow 0$.
Since $\ell$ is continuous and $y_a(s)\to x$ as $s\downarrow t$, the elementary averaging property for continuous functions on shrinking intervals gives \begin{align*} \int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)=h\ell(x,a)+o(h). \end{align*} Since $V\in C^1([0,T]\times\mathbb{R}^n)$, Taylor expansion at $(t,x)$, combined with the expansion of $y_a(t+h)$, gives \begin{align*} V(t+h,y_a(t+h))=V(t,x)+h\partial_t V(t,x)+h\nabla_x V(t,x)\cdot f(x,a)+o(h). \end{align*} Substituting these two expansions into the dynamic-programming inequality and cancelling $V(t,x)$ from both sides yields \begin{align*} 0\leq h\{\ell(x,a)+\partial_t V(t,x)+\nabla_x V(t,x)\cdot f(x,a)\}+o(h). \end{align*} Dividing by $h>0$ and letting $h\downarrow 0$ gives \begin{align*} -\partial_t V(t,x)\leq \ell(x,a)+\nabla_x V(t,x)\cdot f(x,a). \end{align*} Because $a\in U$ was arbitrary, \begin{align*} -\partial_t V(t,x)\leq \inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}. \end{align*}

[guided] We first extract information from controls that we can name explicitly. Fix an action $a\in U$, and define the constant admissible control \begin{align*} u_a:[t,t+h]\to U,\qquad u_a(s)=a. \end{align*} Let $y_a:[t,t+h]\to\mathbb{R}^n$ be the corresponding trajectory, so $y_a(t)=x$ and \begin{align*} \dot y_a(s)=f(y_a(s),a). \end{align*} The dynamic programming principle says that $V(t,x)$ is the infimum of the short-time running cost plus the future value. Since an infimum is no larger than any particular admissible cost, applying it to $u_a$ gives \begin{align*} V(t,x)\leq \int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)+V(t+h,y_a(t+h)). \end{align*} Now we expand each term to first order in $h$. The theorem statement declares $y_a\in C^1([t,t+h];\mathbb{R}^n)$, and the ODE gives $\dot y_a(t)=f(y_a(t),a)=f(x,a)$. Therefore the first-order expansion of the differentiable curve $y_a$ at $t$ gives \begin{align*} y_a(t+h)=x+h f(x,a)+o(h) \end{align*} as $h\downarrow 0$. The running cost is also first-order in the interval length.
The map $s\mapsto \ell(y_a(s),a)$ is continuous at $t$, so the elementary averaging property on shrinking intervals gives \begin{align*} \int_t^{t+h}\ell(y_a(s),a)\,d\mathcal{L}^1(s)=h\ell(x,a)+o(h). \end{align*} Finally, $V\in C^1([0,T]\times\mathbb{R}^n)$, so Taylor expansion in both the time and space variables at $(t,x)$, combined with the expansion of $y_a(t+h)$, gives \begin{align*} V(t+h,y_a(t+h))=V(t,x)+h\partial_t V(t,x)+h\nabla_x V(t,x)\cdot f(x,a)+o(h). \end{align*} Substituting the two first-order expansions into the dynamic-programming inequality and cancelling $V(t,x)$ from both sides gives \begin{align*} 0\leq h\{\ell(x,a)+\partial_t V(t,x)+\nabla_x V(t,x)\cdot f(x,a)\}+o(h). \end{align*} Dividing by $h>0$ and letting $h\downarrow 0$, we obtain \begin{align*} -\partial_t V(t,x)\leq \ell(x,a)+\nabla_x V(t,x)\cdot f(x,a). \end{align*} This inequality holds for every fixed $a\in U$. Therefore the left-hand side is bounded above by the greatest lower bound of the right-hand side over all actions: \begin{align*} -\partial_t V(t,x)\leq \inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}. \end{align*} [/guided] [/step]

[step:Use approximate minimizers to obtain the reverse bound for the infimum] By the assumed first-order approximate-minimizer consistency, there is a remainder $r_{t,x}(h)=o(h)$ as $h\downarrow 0$ such that \begin{align*} 0\geq h\left\{\partial_t V(t,x)+\inf_{a\in U}\bigl(\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\bigr)\right\}+r_{t,x}(h). \end{align*} Dividing by $h>0$ and letting $h\downarrow 0$ gives \begin{align*} -\partial_t V(t,x)\geq \inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}. \end{align*} This is the reverse of the inequality obtained from constant controls. [/step]

[step:Identify the Hamiltonian and write the differential equation] Combining the two inequalities gives \begin{align*} -\partial_t V(t,x)=\inf_{a\in U}\{\ell(x,a)+\nabla_x V(t,x)\cdot f(x,a)\}.
\end{align*} By the definition of $H$, \begin{align*} -\partial_t V(t,x)=H(x,\nabla_x V(t,x)). \end{align*} Equivalently, \begin{align*} \partial_t V(t,x)+H(x,\nabla_x V(t,x))=0. \end{align*} Since $(t,x)\in [0,T)\times\mathbb{R}^n$ was arbitrary, the Hamilton-Jacobi-Bellman equation holds on $[0,T)\times\mathbb{R}^n$. [/step]

[step:Evaluate the value function at the terminal time] The terminal condition is part of the hypotheses: for every $x\in\mathbb{R}^n$, \begin{align*} V(T,x)=g(x). \end{align*} Together with the equation on $[0,T)\times\mathbb{R}^n$, this completes the proof. [/step]
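
As an informal sanity check of the two ingredients used in the proof, the sketch below tests them numerically on a toy problem. Everything in it is an illustrative assumption and not part of the theorem: the dynamics $f(x,a)=a$, running cost $\ell(x,a)=a^2/2$, terminal cost $g(x)=x$, horizon $T=1$, action set $U=[-2,2]$ (sampled on a grid), and the candidate value function $V(t,x)=x-(T-t)/2$. For this choice $H(x,p)=-p^2/2$ whenever $|p|\leq 2$, so one expects $\partial_t V+H(x,\nabla_x V)=\tfrac12-\tfrac12=0$, and the best constant control on $[t,t+h]$ should reproduce $V(t,x)$.

```python
# Hypothetical toy problem (an assumption for illustration, not from the theorem):
#   dynamics f(x, a) = a, running cost l(x, a) = a**2 / 2,
#   terminal cost g(x) = x, horizon T = 1, actions U = [-2, 2].
# Candidate value function: V(t, x) = x - (T - t) / 2.

T = 1.0
# Uniform grid on U = [-2, 2]; it contains the minimizing action a = -1.
ACTIONS = [-2.0 + 4.0 * k / 400 for k in range(401)]


def f(x, a):
    """Controlled dynamics: dy/ds = a."""
    return a


def running_cost(x, a):
    """Running cost l(x, a) = a^2 / 2."""
    return 0.5 * a * a


def value(t, x):
    """Candidate value function V(t, x) = x - (T - t) / 2."""
    return x - 0.5 * (T - t)


def hamiltonian(x, p):
    """H(x, p) = inf over a in U of l(x, a) + p * f(x, a), taken on the action grid."""
    return min(running_cost(x, a) + p * f(x, a) for a in ACTIONS)


def hjb_residual(t, x, eps=1e-6):
    """Central-difference estimate of dV/dt + H(x, dV/dx); should be close to 0."""
    dv_dt = (value(t + eps, x) - value(t - eps, x)) / (2 * eps)
    dv_dx = (value(t, x + eps) - value(t, x - eps)) / (2 * eps)
    return dv_dt + hamiltonian(x, dv_dx)


def dpp_gap(t, x, h):
    """Best constant-control cost on [t, t+h] minus V(t, x).

    For a constant action a the trajectory is y(t+h) = x + h*a and the
    running cost integrates exactly to h * a^2 / 2 in this toy problem.
    """
    best = min(
        h * running_cost(x, a) + value(t + h, x + h * f(x, a)) for a in ACTIONS
    )
    return best - value(t, x)


if __name__ == "__main__":
    print("HJB residual:", hjb_residual(0.3, 0.7))
    print("DPP gap with constant controls:", dpp_gap(0.3, 0.7, 0.1))
```

In this toy problem the optimal control happens to be constant, so the constant-control inequality from the second step is attained with equality. In general only the inequality holds, and the reverse bound has to come from the approximate-minimizer hypothesis used in the third step.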
