Deterministic Dynamic Programming Principle — Statement & Proof

Theorem


Let $T>0$, let $U$ be a nonempty control-value set, let $f:[0,T]\times\mathbb{R}^n\times U\to \mathbb{R}^n$ be a controlled vector field, let $\ell:[0,T]\times\mathbb{R}^n\times U\to \mathbb{R}$ be a running cost, and let $g:\mathbb{R}^n\to \mathbb{R}$ be a terminal cost. For each $0\leq a\leq b\leq T$ and $z\in\mathbb{R}^n$, let $\mathcal{A}_{a,z}[a,b]$ be a nonempty set of admissible controls on $[a,b]$ starting from $z$, where each $u\in\mathcal{A}_{a,z}[a,b]$ is a map $u:[a,b]\to U$.

Assume that for every $u\in\mathcal{A}_{a,z}[a,b]$ the initial-value problem \begin{align*} \dot{x}(s)=f(s,x(s),u(s)), \quad x(a)=z \end{align*} is well posed on $[a,b]$. Denote its solution by \begin{align*} x^{a,z;u}:[a,b]\to\mathbb{R}^n. \end{align*} Assume that for every $0\leq a\leq b\leq T$, every $z\in\mathbb{R}^n$, and every $u\in\mathcal{A}_{a,z}[a,b]$, the running cost integral \begin{align*} \int_{[a,b]} \ell(s,x^{a,z;u}(s),u(s))\,d\mathcal{L}^1(s) \end{align*} is a finite real number. Also assume that $g(x^{a,z;u}(b))\in\mathbb{R}$ for every such trajectory endpoint.

For $0\leq t\leq T$, $x\in\mathbb{R}^n$, and $u\in\mathcal{A}_{t,x}[t,T]$, define the cost functional \begin{align*} J(t,x;u)=\int_{[t,T]} \ell(s,x^{t,x;u}(s),u(s))\,d\mathcal{L}^1(s)+g(x^{t,x;u}(T)). \end{align*} Define the value function \begin{align*} V(t,x)=\inf_{u\in\mathcal{A}_{t,x}[t,T]} J(t,x;u). \end{align*} Assume $V(t,x)\in\mathbb{R}$ for all $0\leq t\leq T$ and $x\in\mathbb{R}^n$.

Assume the admissible control classes are stable under restriction and concatenation in the following sense.

First, if $u\in\mathcal{A}_{t,x}[t,T]$ and $0\leq t\leq \tau\leq T$, then $u|_{[t,\tau]}\in\mathcal{A}_{t,x}[t,\tau]$ and $u|_{[\tau,T]}\in\mathcal{A}_{\tau,y}[\tau,T]$, where \begin{align*} y=x^{t,x;u}(\tau). \end{align*} The restricted trajectories agree with the full trajectory on their intervals: \begin{align*} x^{t,x;u|_{[t,\tau]}}(s)=x^{t,x;u}(s) \quad \text{for } s\in[t,\tau], \end{align*} and \begin{align*} x^{\tau,y;u|_{[\tau,T]}}(s)=x^{t,x;u}(s) \quad \text{for } s\in[\tau,T]. \end{align*}

Second, if $u_1\in\mathcal{A}_{t,x}[t,\tau]$, if \begin{align*} y=x^{t,x;u_1}(\tau), \end{align*} and if $u_2\in\mathcal{A}_{\tau,y}[\tau,T]$, then the concatenated control \begin{align*} u_1\oplus_\tau u_2:[t,T]\to U \end{align*} belongs to $\mathcal{A}_{t,x}[t,T]$, and its trajectory satisfies \begin{align*} x^{t,x;u_1\oplus_\tau u_2}(s)=x^{t,x;u_1}(s) \quad \text{for } s\in[t,\tau], \end{align*} while \begin{align*} x^{t,x;u_1\oplus_\tau u_2}(s)=x^{\tau,y;u_2}(s) \quad \text{for } s\in[\tau,T]. \end{align*}

Finally, assume that for every $0\leq t\leq \tau\leq T$, every $x\in\mathbb{R}^n$, every $u_1\in\mathcal{A}_{t,x}[t,\tau]$, every endpoint $y=x^{t,x;u_1}(\tau)$, and every $\varepsilon>0$, there exists $u_\varepsilon\in\mathcal{A}_{\tau,y}[\tau,T]$ such that \begin{align*} J(\tau,y;u_\varepsilon)\leq V(\tau,y)+\varepsilon. \end{align*}

Then for every $0\leq t\leq \tau\leq T$ and every $x\in\mathbb{R}^n$, \begin{align*} V(t,x)=\inf_{u\in\mathcal{A}_{t,x}[t,\tau]}\left\{\int_{[t,\tau]} \ell(s,x^{t,x;u}(s),u(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u}(\tau)\bigr)\right\}. \end{align*}
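
As an illustrative sanity check of the identity (a sketch that is not part of the theorem itself), consider the following one-dimensional example. The specific data are assumptions chosen for illustration: $n=1$, $U=[-1,1]$, $f(s,x,u)=u$, $\ell\equiv 0$, $g(x)=|x|$, with $\mathcal{A}_{a,z}[a,b]$ taken to be the Lebesgue-measurable maps $u:[a,b]\to[-1,1]$ and the initial-value problem understood in integrated form. Under these assumptions the trajectory is \begin{align*} x^{t,x;u}(s)=x+\int_{[t,s]} u(r)\,d\mathcal{L}^1(r), \end{align*} so the states reachable at time $\tau$ from $(t,x)$ are exactly the points of $[x-(\tau-t),\,x+(\tau-t)]$, and therefore \begin{align*} V(t,x)=\max\{|x|-(T-t),\,0\}. \end{align*} Since $\ell\equiv 0$, the right-hand side of the dynamic programming identity reduces to an infimum of $V(\tau,\cdot)$ over the reachable set: \begin{align*} \inf_{|y-x|\leq \tau-t} \max\{|y|-(T-\tau),\,0\} =\max\bigl\{\max\{|x|-(\tau-t),\,0\}-(T-\tau),\,0\bigr\} =\max\{|x|-(T-t),\,0\}. \end{align*} The last equality is checked in two cases. If $|x|\geq \tau-t$, the inner maximum equals $|x|-(\tau-t)$, and subtracting $T-\tau$ gives $|x|-(T-t)$. If $|x|<\tau-t$, the middle expression equals $0$, and $|x|-(T-t)<\tau-T\leq 0$, so the right-hand expression is also $0$. In both cases the result equals $V(t,x)$, as the theorem predicts.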

Discussion

Proof

[proofplan] Fix $0\leq t\leq \tau\leq T$ and $x\in\mathbb{R}^n$. We prove the equality by proving two inequalities. A full admissible control on $[t,T]$ can be restricted to a prefix on $[t,\tau]$ and a tail on $[\tau,T]$, and the tail cost is bounded below by the value function at the intermediate state. Conversely, any admissible prefix can be concatenated with an $\varepsilon$-optimal tail from its endpoint, producing a full control whose cost is within $\varepsilon$ of the dynamic-programming expression. [/proofplan] [step:Define the dynamic-programming expression to be compared with $V(t,x)$] Fix $0\leq t\leq \tau\leq T$ and $x\in\mathbb{R}^n$. Define the extended real quantity $R(t,x,\tau)$ by \begin{align*} R(t,x,\tau)=\inf_{u\in\mathcal{A}_{t,x}[t,\tau]}\left\{\int_{[t,\tau]} \ell(s,x^{t,x;u}(s),u(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u}(\tau)\bigr)\right\}. \end{align*} By the hypotheses, the expression inside this infimum is a finite real number for every admissible prefix $u\in\mathcal{A}_{t,x}[t,\tau]$, and this set is nonempty, so $R(t,x,\tau)<+\infty$. The two inequalities below show in particular that $R(t,x,\tau)>-\infty$, so the infimum is finite. It remains to prove \begin{align*} V(t,x)=R(t,x,\tau). \end{align*} [/step] [step:Restrict a full control to prove $V(t,x)\geq R(t,x,\tau)$] Let $w\in\mathcal{A}_{t,x}[t,T]$ be arbitrary. Define the restricted prefix control as the map $u_1:[t,\tau]\to U$ given by \begin{align*} u_1=w|_{[t,\tau]}, \end{align*} and define the intermediate state \begin{align*} y=x^{t,x;w}(\tau). \end{align*} By the restriction stability hypothesis, $u_1\in\mathcal{A}_{t,x}[t,\tau]$ and $w|_{[\tau,T]}\in\mathcal{A}_{\tau,y}[\tau,T]$. The restricted-trajectory identities give \begin{align*} x^{t,x;u_1}(s)=x^{t,x;w}(s) \quad \text{for } s\in[t,\tau] \end{align*} and \begin{align*} x^{\tau,y;w|_{[\tau,T]}}(s)=x^{t,x;w}(s) \quad \text{for } s\in[\tau,T]. 
\end{align*} Using these trajectory identities and additivity of the one-dimensional [Lebesgue integral](/page/Lebesgue%20Integral) over the adjacent intervals $[t,\tau]$ and $[\tau,T]$, whose overlap $\{\tau\}$ has $\mathcal{L}^1$-measure zero, we split the cost of $w$ as \begin{align*} J(t,x;w)=\int_{[t,\tau]} \ell(s,x^{t,x;w}(s),w(s))\,d\mathcal{L}^1(s)+J(\tau,y;w|_{[\tau,T]}). \end{align*} Since $w|_{[\tau,T]}\in\mathcal{A}_{\tau,y}[\tau,T]$, the definition of $V(\tau,y)$ as an infimum gives \begin{align*} J(\tau,y;w|_{[\tau,T]})\geq V(\tau,y). \end{align*} Therefore \begin{align*} J(t,x;w)\geq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u_1}(\tau)\bigr). \end{align*} The right-hand side is one of the quantities over which $R(t,x,\tau)$ takes its infimum, so \begin{align*} J(t,x;w)\geq R(t,x,\tau). \end{align*} Taking the infimum over all $w\in\mathcal{A}_{t,x}[t,T]$ yields \begin{align*} V(t,x)\geq R(t,x,\tau). \end{align*} [guided] We prove the lower bound by starting with a full admissible control and cutting it at the intermediate time. Let \begin{align*} w\in\mathcal{A}_{t,x}[t,T] \end{align*} be arbitrary, and define the prefix control as the map $u_1:[t,\tau]\to U$ given by \begin{align*} u_1=w|_{[t,\tau]}. \end{align*} Also define the intermediate state reached by the full trajectory: \begin{align*} y=x^{t,x;w}(\tau). \end{align*} The restriction hypothesis applies to this full control and gives two admissible restricted controls: $u_1\in\mathcal{A}_{t,x}[t,\tau]$ and $w|_{[\tau,T]}\in\mathcal{A}_{\tau,y}[\tau,T]$. It also gives the trajectory identities \begin{align*} x^{t,x;u_1}(s)=x^{t,x;w}(s) \quad \text{for } s\in[t,\tau] \end{align*} and \begin{align*} x^{\tau,y;w|_{[\tau,T]}}(s)=x^{t,x;w}(s) \quad \text{for } s\in[\tau,T]. \end{align*} These identities are the reason the cost can be separated at time $\tau$. 
The one-dimensional Lebesgue integral is additive over $[t,\tau]$ and $[\tau,T]$ because their overlap is the singleton $\{\tau\}$ and $\mathcal{L}^1(\{\tau\})=0$. Using the trajectory identities in the two pieces gives \begin{align*} J(t,x;w)=\int_{[t,\tau]} \ell(s,x^{t,x;w}(s),w(s))\,d\mathcal{L}^1(s)+J(\tau,y;w|_{[\tau,T]}). \end{align*} Since $w|_{[\tau,T]}$ is admissible from $(\tau,y)$, the definition of the value function as an infimum over all admissible tails gives \begin{align*} J(\tau,y;w|_{[\tau,T]})\geq V(\tau,y). \end{align*} Substituting this lower bound and replacing $w$ by its prefix $u_1$ on $[t,\tau]$, we obtain \begin{align*} J(t,x;w)\geq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u_1}(\tau)\bigr). \end{align*} The right-hand side is one admissible-prefix quantity appearing in the infimum defining $R(t,x,\tau)$, so it is at least $R(t,x,\tau)$. Hence \begin{align*} J(t,x;w)\geq R(t,x,\tau). \end{align*} Because $w$ was an arbitrary full control in $\mathcal{A}_{t,x}[t,T]$, taking the infimum over all such $w$ gives \begin{align*} V(t,x)\geq R(t,x,\tau). \end{align*} [/guided] [/step] [step:Concatenate an arbitrary prefix with an $\varepsilon$-optimal tail to prove $V(t,x)\leq R(t,x,\tau)$] Let $\varepsilon>0$ and let $u_1\in\mathcal{A}_{t,x}[t,\tau]$ be arbitrary. Define \begin{align*} y=x^{t,x;u_1}(\tau). \end{align*} By the $\varepsilon$-optimality hypothesis applied at time $\tau$ and state $y$, there exists $u_2\in\mathcal{A}_{\tau,y}[\tau,T]$ such that \begin{align*} J(\tau,y;u_2)\leq V(\tau,y)+\varepsilon. \end{align*} Define the concatenated control \begin{align*} w=u_1\oplus_\tau u_2. \end{align*} By concatenation stability, $w\in\mathcal{A}_{t,x}[t,T]$, and its trajectory agrees with $x^{t,x;u_1}$ on $[t,\tau]$ and with $x^{\tau,y;u_2}$ on $[\tau,T]$. 
Since $[t,\tau]\cap[\tau,T]=\{\tau\}$ and $\mathcal{L}^1(\{\tau\})=0$, additivity of the one-dimensional Lebesgue integral over these adjacent intervals gives \begin{align*} J(t,x;w)=\int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+J(\tau,y;u_2). \end{align*} Using the choice of $u_2$, we obtain \begin{align*} J(t,x;w)\leq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u_1}(\tau)\bigr)+\varepsilon. \end{align*} Since $V(t,x)$ is the infimum of $J(t,x;\cdot)$ over all full controls and $w$ is an admissible full control, \begin{align*} V(t,x)\leq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u_1}(\tau)\bigr)+\varepsilon. \end{align*} This holds for every $u_1\in\mathcal{A}_{t,x}[t,\tau]$, so taking the infimum over $u_1$ gives \begin{align*} V(t,x)\leq R(t,x,\tau)+\varepsilon. \end{align*} Because $\varepsilon>0$ was arbitrary and all quantities are finite [real numbers](/page/Real%20Numbers), we conclude \begin{align*} V(t,x)\leq R(t,x,\tau). \end{align*} [guided] We now prove the reverse inequality, where the main issue is that the infimum defining $V(\tau,y)$ need not be attained. Fix $\varepsilon>0$ and choose an arbitrary prefix control \begin{align*} u_1\in\mathcal{A}_{t,x}[t,\tau]. \end{align*} Let \begin{align*} y=x^{t,x;u_1}(\tau) \end{align*} be the state reached at time $\tau$ by this prefix. The dynamic-programming expression wants to attach the number $V(\tau,y)$ after the prefix cost. Since $V(\tau,y)$ is an infimum, there may be no tail control whose cost is exactly $V(\tau,y)$. The hypothesis supplies the substitute we need: there exists a tail control \begin{align*} u_2\in\mathcal{A}_{\tau,y}[\tau,T] \end{align*} such that \begin{align*} J(\tau,y;u_2)\leq V(\tau,y)+\varepsilon. \end{align*} Now concatenate the prefix and this nearly optimal tail. Define \begin{align*} w=u_1\oplus_\tau u_2. 
\end{align*} The concatenation hypothesis verifies admissibility: $w\in\mathcal{A}_{t,x}[t,T]$. It also gives the trajectory identity needed to split the cost: on $[t,\tau]$ the trajectory of $w$ is $x^{t,x;u_1}$, and on $[\tau,T]$ it is $x^{\tau,y;u_2}$. The intervals $[t,\tau]$ and $[\tau,T]$ overlap only at $\{\tau\}$, and $\mathcal{L}^1(\{\tau\})=0$, so additivity of the Lebesgue integral gives the cost decomposition \begin{align*} J(t,x;w)=\int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+J(\tau,y;u_2). \end{align*} Substituting the $\varepsilon$-optimal estimate for the tail gives \begin{align*} J(t,x;w)\leq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V(\tau,y)+\varepsilon. \end{align*} Since $y=x^{t,x;u_1}(\tau)$, this is \begin{align*} J(t,x;w)\leq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u_1}(\tau)\bigr)+\varepsilon. \end{align*} Finally, $V(t,x)$ is the infimum of all full-control costs, and $w$ is one admissible full control. Hence \begin{align*} V(t,x)\leq J(t,x;w). \end{align*} Combining this with the previous bound yields \begin{align*} V(t,x)\leq \int_{[t,\tau]} \ell(s,x^{t,x;u_1}(s),u_1(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u_1}(\tau)\bigr)+\varepsilon. \end{align*} This inequality holds for every prefix $u_1\in\mathcal{A}_{t,x}[t,\tau]$. Taking the infimum over all such $u_1$ gives \begin{align*} V(t,x)\leq R(t,x,\tau)+\varepsilon. \end{align*} Because $\varepsilon>0$ was arbitrary and both sides are finite real numbers, letting $\varepsilon$ tend to $0$ gives \begin{align*} V(t,x)\leq R(t,x,\tau). \end{align*} [/guided] [/step] [step:Combine the two inequalities to obtain the dynamic programming identity] The restriction argument proved \begin{align*} V(t,x)\geq R(t,x,\tau), \end{align*} and the concatenation argument proved \begin{align*} V(t,x)\leq R(t,x,\tau). \end{align*} Therefore \begin{align*} V(t,x)=R(t,x,\tau). 
\end{align*} By the definition of $R(t,x,\tau)$, this is exactly \begin{align*} V(t,x)=\inf_{u\in\mathcal{A}_{t,x}[t,\tau]}\left\{\int_{[t,\tau]} \ell(s,x^{t,x;u}(s),u(s))\,d\mathcal{L}^1(s)+V\bigl(\tau,x^{t,x;u}(\tau)\bigr)\right\}. \end{align*} This proves the deterministic dynamic programming principle. [/step]
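
The two-inequality argument above can be checked numerically on a discrete-time analogue. The sketch below is only an illustration under stated assumptions and is not an implementation of the continuous-time theorem: time is replaced by integer steps `0..N`, the control set `U`, the dynamics `f`, the running cost `ell`, and the terminal cost `g` are all hypothetical choices, and admissible controls are simply all finite sequences in `U` (a class that is trivially stable under restriction and concatenation). The value function is computed by brute-force enumeration, and `R` is the restrict-then-concatenate expression from the proof.

```python
from itertools import product

# Hypothetical problem data, chosen only for illustration.
U = (-1, 0, 1)   # finite control-value set
N = 5            # number of time steps (discrete stand-in for T)

def f(k, x, u):
    """Discrete dynamics: next state from state x under control u at step k."""
    return x + u

def ell(k, x, u):
    """Running cost paid at step k."""
    return abs(x) + u * u

def g(x):
    """Terminal cost."""
    return x * x

def rollout(k, x, controls):
    """Apply a control sequence from (k, x); return (running cost, end state)."""
    cost = 0
    for j, u in enumerate(controls):
        cost += ell(k + j, x, u)
        x = f(k + j, x, u)
    return cost, x

def J(k, x, controls):
    """Total cost of a full control sequence of length N - k."""
    cost, x_end = rollout(k, x, controls)
    return cost + g(x_end)

def V(k, x):
    """Value function by brute force over all full control sequences."""
    return min(J(k, x, c) for c in product(U, repeat=N - k))

def R(k, x, m):
    """Dynamic-programming expression: best prefix cost on [k, m] plus V at step m."""
    best = None
    for prefix in product(U, repeat=m - k):
        cost, y = rollout(k, x, prefix)
        candidate = cost + V(m, y)
        best = candidate if best is None else min(best, candidate)
    return best

def check_dpp(x0):
    """True if V(0, x0) equals R(0, x0, m) for every intermediate step m."""
    return all(V(0, x0) == R(0, x0, m) for m in range(N + 1))

if __name__ == "__main__":
    for x0 in range(-3, 4):
        print(x0, V(0, x0), check_dpp(x0))
```

Integer costs are used so the comparison `V(0, x0) == R(0, x0, m)` is exact. In this finite setting every infimum is attained, so the $\varepsilon$-optimal tail of the proof can be taken to be an exactly optimal one.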


Definitions & Concepts

Lebesgue Integral


Deterministic Dynamic Programming Principle (Theorem # 7629)
