Infinite-Horizon Linear Quadratic Regulator Feedback Theorem (Theorem # 6362)

Proof

[proofplan] The proof first invokes the stabilizing algebraic Riccati theorem to obtain the unique non-negative symmetric solution $P$ and the Hurwitz closed-loop matrix. With this matrix fixed, we rewrite the running cost by completing the square and using the algebraic Riccati equation along trajectories. The key limiting point is a transversality argument: every finite-cost admissible trajectory satisfies $x(t)^\top Px(t)\to 0$, so the finite-horizon square identity passes to the infinite horizon. The resulting exact decomposition of the cost proves optimality of the feedback input, and equality forces the feedback deviation to vanish almost everywhere. [/proofplan] [step:Obtain the stabilizing algebraic Riccati solution] By the stabilizing algebraic Riccati existence and uniqueness theorem, applied to the matrices $A\in\mathbb R^{n\times n}$, $B\in\mathbb R^{n\times m}$, $Q=Q^\top\ge 0$, and $R=R^\top>0$, the stabilizability of $(A,B)$ and detectability of $(Q^{1/2},A)$ imply that there exists a unique matrix $P\in\mathbb R^{n\times n}$ such that $P=P^\top\ge 0$, \begin{align*} A^\top P+PA-PBR^{-1}B^\top P+Q=0, \end{align*} and the matrix $A_c\in\mathbb R^{n\times n}$ defined by \begin{align*} A_c:=A-BR^{-1}B^\top P \end{align*} is Hurwitz. [claim:The feedback trajectory is admissible] For each $x_0\in\mathbb R^n$, the feedback law generates an admissible input $u^*\in\mathcal U$. [/claim] [proof] Define $x^*:[0,\infty)\to\mathbb R^n$ by \begin{align*} x^*(t):=e^{tA_c}x_0. \end{align*} Then $x^*$ is continuously differentiable and satisfies $\dot{x}^*(t)=A_cx^*(t)$ and $x^*(0)=x_0$. Define $u^*:[0,\infty)\to\mathbb R^m$ by \begin{align*} u^*(t):=-R^{-1}B^\top Px^*(t). \end{align*} Since $x^*$ is continuous and $R^{-1}B^\top P$ is a fixed matrix, $u^*$ is continuous, hence bounded and square-integrable on every compact interval. Thus $u^*\in L^2_{\mathrm{loc}}([0,\infty);\mathbb R^m)=\mathcal U$. 
Moreover \begin{align*} Ax^*(t)+Bu^*(t)=\left(A-BR^{-1}B^\top P\right)x^*(t)=A_cx^*(t)=\dot{x}^*(t), \end{align*} so $x^*$ is the state generated by $u^*$. [/proof] [/step] [step:Complete the square in the running cost] Let $u\in\mathcal U$, and let $x:[0,\infty)\to\mathbb R^n$ denote the locally absolutely continuous solution of \begin{align*} \dot{x}(t)=Ax(t)+Bu(t),\qquad x(0)=x_0. \end{align*} Define $w:[0,\infty)\to\mathbb R^m$ by \begin{align*} w(t):=u(t)+R^{-1}B^\top Px(t). \end{align*} Using the algebraic Riccati equation to replace $Q$ by \begin{align*} Q=-A^\top P-PA+PBR^{-1}B^\top P, \end{align*} and expanding $w(t)^\top Rw(t)$ gives, for almost every $t\ge 0$, \begin{align*} x(t)^\top Qx(t)+u(t)^\top Ru(t)=w(t)^\top Rw(t)-\frac{d}{dt}\left(x(t)^\top Px(t)\right). \end{align*} [guided] We want to compare every admissible input with the feedback input, so the useful quantity is the deviation from feedback. Define $w:[0,\infty)\to\mathbb R^m$ by \begin{align*} w(t):=u(t)+R^{-1}B^\top Px(t). \end{align*} Then $w(t)=0$ exactly when $u(t)=-R^{-1}B^\top Px(t)$, so proving optimality should reduce to proving that a non-negative integral of $w(t)^\top Rw(t)$ remains. Because $x$ is locally absolutely continuous and $P$ is a fixed matrix, the scalar map $t\mapsto x(t)^\top Px(t)$ is locally absolutely continuous. Its derivative exists for almost every $t\ge 0$, and the product rule gives \begin{align*} \frac{d}{dt}\left(x(t)^\top Px(t)\right)=\dot{x}(t)^\top Px(t)+x(t)^\top P\dot{x}(t). \end{align*} Using $P=P^\top$ and $\dot{x}(t)=Ax(t)+Bu(t)$, this becomes \begin{align*} \frac{d}{dt}\left(x(t)^\top Px(t)\right)=x(t)^\top(A^\top P+PA)x(t)+2u(t)^\top B^\top Px(t). \end{align*} The algebraic Riccati equation is \begin{align*} A^\top P+PA-PBR^{-1}B^\top P+Q=0. \end{align*} Solving it for $Q$ gives \begin{align*} Q=-A^\top P-PA+PBR^{-1}B^\top P. \end{align*} Therefore \begin{align*} x(t)^\top Qx(t)=-x(t)^\top(A^\top P+PA)x(t)+x(t)^\top PBR^{-1}B^\top Px(t). 
\end{align*} Now expand the square using $R=R^\top$: \begin{align*} w(t)^\top Rw(t)=u(t)^\top Ru(t)+2u(t)^\top B^\top Px(t)+x(t)^\top PBR^{-1}B^\top Px(t). \end{align*} Combining the last three displayed identities gives \begin{align*} x(t)^\top Qx(t)+u(t)^\top Ru(t)=w(t)^\top Rw(t)-\frac{d}{dt}\left(x(t)^\top Px(t)\right). \end{align*} This identity is the core of the verification argument: the cost is a non-negative square plus a total derivative. [/guided] [/step] [step:Pass the square identity to the infinite horizon using transversality] For $T>0$, integrate the identity from the previous step over $[0,T]$ with respect to one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) $\mathcal L^1$. Since $t\mapsto x(t)^\top Px(t)$ is locally absolutely continuous, the [fundamental theorem of calculus](/theorems/632) gives \begin{align*} \int_0^{\!T}\left(x(t)^\top Qx(t)+u(t)^\top Ru(t)\right)\,d\mathcal L^1(t)=x_0^\top Px_0-x(T)^\top Px(T)+\int_0^{\!T}w(t)^\top Rw(t)\,d\mathcal L^1(t). \end{align*} We claim that every admissible $u\in\mathcal U$ with finite cost satisfies \begin{align*} x(T)^\top Px(T)\to 0 \end{align*} as $T\to\infty$. Indeed, let $C\in\mathbb R^{n\times n}$ denote the symmetric non-negative square root $C=Q^{1/2}$. Since $(C,A)$ is detectable, choose a matrix $G\in\mathbb R^{n\times n}$ such that $A-GC$ is Hurwitz. Finite cost and $R>0$ imply $u\in L^2([0,\infty);\mathbb R^m)$, while finite cost and $Q=C^\top C$ imply $Cx\in L^2([0,\infty);\mathbb R^n)$. The state equation can be rewritten as \begin{align*} \dot{x}(t)=(A-GC)x(t)+Bu(t)+GCx(t). \end{align*} The input map $r:[0,\infty)\to\mathbb R^n$ defined by \begin{align*} r(t):=Bu(t)+GCx(t) \end{align*} belongs to $L^2([0,\infty);\mathbb R^n)$. Since $A-GC$ is Hurwitz, the variation-of-constants formula and the standard $L^2$-stability estimate for Hurwitz linear systems imply $x(t)\to 0$ as $t\to\infty$. Because $P$ is fixed, this gives $x(T)^\top Px(T)\to 0$. 
If $J_{x_0}[u]=\infty$, then $J_{x_0}[u]\ge x_0^\top Px_0$ holds trivially because $x_0^\top Px_0$ is finite. If $J_{x_0}[u]<\infty$, the transversality just proved and monotone convergence for the non-negative functions $w(t)^\top Rw(t)$ give \begin{align*} J_{x_0}[u]=x_0^\top Px_0+\int_0^\infty w(t)^\top Rw(t)\,d\mathcal L^1(t). \end{align*} Since $R>0$, the integral on the right is non-negative. Hence \begin{align*} J_{x_0}[u]\ge x_0^\top Px_0 \end{align*} for every admissible $u\in\mathcal U$. [guided] The finite-horizon identity contains a terminal term, so we cannot simply discard it when $T\to\infty$. The right approach is to prove transversality: for every finite-cost input, the terminal quantity $x(T)^\top Px(T)$ tends to zero. Let $C\in\mathbb R^{n\times n}$ be the symmetric non-negative square root $C=Q^{1/2}$, so $Q=C^\top C$. Since $(C,A)$ is detectable, there exists a matrix $G\in\mathbb R^{n\times n}$ such that $A-GC$ is Hurwitz. If $J_{x_0}[u]<\infty$, then the positive definiteness of $R$ gives $u\in L^2([0,\infty);\mathbb R^m)$, and the identity $x(t)^\top Qx(t)=|Cx(t)|^2$ gives $Cx\in L^2([0,\infty);\mathbb R^n)$. Rewriting the state equation gives \begin{align*} \dot{x}(t)=(A-GC)x(t)+Bu(t)+GCx(t). \end{align*} Define $r:[0,\infty)\to\mathbb R^n$ by \begin{align*} r(t):=Bu(t)+GCx(t). \end{align*} Because $B$ and $G$ are fixed matrices, $u\in L^2([0,\infty);\mathbb R^m)$, and $Cx\in L^2([0,\infty);\mathbb R^n)$, we have $r\in L^2([0,\infty);\mathbb R^n)$. The matrix $A-GC$ is Hurwitz, so the stable linear system driven by the $L^2$ input $r$ satisfies $x(t)\to 0$ as $t\to\infty$ by the variation-of-constants formula and the standard $L^2$-stability estimate for Hurwitz systems. Therefore \begin{align*} x(T)^\top Px(T)\to 0. \end{align*} Now integrate the square-completion identity over $[0,T]$: \begin{align*} \int_0^{\!T}\left(x(t)^\top Qx(t)+u(t)^\top Ru(t)\right)\,d\mathcal L^1(t)=x_0^\top Px_0-x(T)^\top Px(T)+\int_0^{\!T}w(t)^\top Rw(t)\,d\mathcal L^1(t). 
\end{align*} Let $T\to\infty$. The left-hand side increases to $J_{x_0}[u]$ by monotone convergence, the terminal term tends to zero by transversality, and the square term increases to its infinite-horizon integral by monotone convergence. Thus \begin{align*} J_{x_0}[u]=x_0^\top Px_0+\int_0^\infty w(t)^\top Rw(t)\,d\mathcal L^1(t). \end{align*} Since $R>0$, $w(t)^\top Rw(t)\ge 0$ for almost every $t$, and hence $J_{x_0}[u]\ge x_0^\top Px_0$. If $J_{x_0}[u]=\infty$, the same lower bound holds trivially because $x_0^\top Px_0$ is finite. [/guided] [/step] [step:Show that the feedback input attains the lower bound] For the feedback input $u^*$ from the first step, the corresponding deviation $w^*:[0,\infty)\to\mathbb R^m$ is identically zero because \begin{align*} w^*(t)=u^*(t)+R^{-1}B^\top Px^*(t)=0. \end{align*} Integrating the identity from the square-completion step over $[0,T]$ therefore gives, for every $T>0$, \begin{align*} \int_0^{\!T}\left(x^*(t)^\top Qx^*(t)+u^*(t)^\top Ru^*(t)\right)\,d\mathcal L^1(t)=x_0^\top Px_0-x^*(T)^\top Px^*(T). \end{align*} Since $A_c$ is Hurwitz, $x^*(T)=e^{TA_c}x_0\to 0$ as $T\to\infty$. Hence $x^*(T)^\top Px^*(T)\to 0$, and therefore \begin{align*} J_{x_0}[u^*]=x_0^\top Px_0. \end{align*} Together with the lower bound for arbitrary admissible $u$, this proves that $u^*$ is optimal. [/step] [step:Use equality in the infinite-horizon decomposition to prove uniqueness] Let $u\in\mathcal U$ be optimal, and let $x$ and $w$ be the corresponding state and feedback-deviation maps defined above. The feedback input has cost $x_0^\top Px_0$, so optimality gives $J_{x_0}[u]=x_0^\top Px_0<\infty$. Applying the infinite-horizon identity from the transversality step yields \begin{align*} 0=J_{x_0}[u]-x_0^\top Px_0=\int_0^\infty w(t)^\top Rw(t)\,d\mathcal L^1(t). \end{align*} Since $R>0$, the integrand $w(t)^\top Rw(t)$ is non-negative and vanishes only when $w(t)=0$. 
Thus $w(t)=0$ for $\mathcal L^1$-almost every $t\in[0,\infty)$, meaning \begin{align*} u(t)=-R^{-1}B^\top Px(t) \end{align*} for $\mathcal L^1$-almost every $t\ge 0$. The state equation then agrees almost everywhere with the closed-loop equation \begin{align*} \dot{x}(t)=A_cx(t),\qquad x(0)=x_0. \end{align*} Uniqueness of locally absolutely continuous solutions of this linear ordinary differential equation gives $x(t)=x^*(t)$ for all $t\ge 0$, and hence $u(t)=u^*(t)$ for $\mathcal L^1$-almost every $t\ge 0$. Therefore the optimal input is unique as an element of $L^2_{\mathrm{loc}}([0,\infty);\mathbb R^m)$, and it is generated by the stated feedback law. [/step]
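
The cost decomposition used in the proof can be checked numerically in the scalar case $n=m=1$. The following is a minimal sketch, not part of the proof: the data `a`, `b`, `q`, `r`, `x0`, the deviation $w(t)=\varepsilon e^{-t}$, and the helper `cost` are all hypothetical choices made for illustration. It solves the scalar algebraic Riccati equation in closed form, confirms that the closed-loop coefficient is negative, and compares a midpoint-rule approximation of the cost with the predicted value $x_0^\top Px_0+\int_0^\infty w(t)^\top Rw(t)\,d\mathcal L^1(t)$.

```python
import math

# Hypothetical scalar data, chosen only for illustration (not from the source):
# dx/dt = a*x + b*u with running cost q*x**2 + r*u**2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
x0 = 2.0

# Scalar algebraic Riccati equation: 2*a*p - (b**2/r)*p**2 + q = 0.
# The larger root is the non-negative, stabilizing one.
s = b * b / r
p = (a + math.sqrt(a * a + s * q)) / s
are_residual = 2 * a * p - s * p * p + q

a_c = a - s * p   # closed-loop coefficient A_c = A - B R^{-1} B^T P
k = b * p / r     # feedback gain R^{-1} B^T P


def cost(eps, T=40.0, n=200_000):
    """Midpoint-rule approximation of the cost on [0, T] for the input
    u(t) = -k*x(t) + w(t) with deviation w(t) = eps*exp(-t)."""
    dt = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        # Variation-of-constants solution of dx/dt = a_c*x + b*w, x(0) = x0
        # (valid because a_c != -1 for the data above).
        x = math.exp(a_c * t) * x0 + b * eps * (
            math.exp(a_c * t) - math.exp(-t)
        ) / (a_c + 1.0)
        w = eps * math.exp(-t)
        u = -k * x + w
        total += (q * x * x + r * u * u) * dt
    return total


eps = 0.5
optimal = cost(0.0)       # feedback input, deviation w = 0
perturbed = cost(eps)     # feedback input plus a nonzero deviation
# Predicted by the decomposition: x0*P*x0 plus the integral of w*R*w,
# where the integral of (eps*exp(-t))**2 over [0, inf) equals eps**2 / 2.
predicted = p * x0 * x0 + r * eps * eps / 2.0

print(are_residual, a_c)
print(optimal, p * x0 * x0)
print(perturbed, predicted)
```

Each printed pair should agree up to quadrature and truncation error. The perturbed cost exceeds the optimal one by approximately $r\varepsilon^2/2$, which illustrates why a zero excess cost forces $w=0$ almost everywhere.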

