Separation Principle for Infinite-Horizon Average-Cost LQG Control

Theorem


Let $n,m,r,q \in \mathbb N$. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. Let $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, $C \in \mathbb R^{r \times n}$, $G \in \mathbb R^{n \times q}$, let $Q \in \mathbb R^{n \times n}$ be symmetric positive semidefinite, let $R \in \mathbb R^{m \times m}$ be symmetric positive definite, let $W \in \mathbb R^{q \times q}$ be symmetric positive semidefinite, and let $V \in \mathbb R^{r \times r}$ be symmetric positive definite. Consider the continuous-time linear Gaussian system \begin{align*} dx(t) = A x(t)\,dt + B u(t)\,dt + G\,dw(t) \end{align*} and \begin{align*} dy(t) = Cx(t)\,dt + dv(t), \end{align*} where $w$ and $v$ are independent Wiener processes with incremental covariance matrices $W\,dt$ and $V\,dt$, respectively, and the initial state $x(0):\Omega\to\mathbb R^n$ is Gaussian, square-integrable, and independent of $w$ and $v$. Let $\mathcal Y_t := \sigma(y(s):0 \le s \le t)$ be the observation filtration. An admissible control is an $\mathbb R^m$-valued process $u$ that is causal and $\mathcal Y_t$-adapted, is square-integrable on every finite interval, yields finite second moments on every finite interval for the corresponding state $x$ and conditional mean $\hat{x}(t)=\mathbb E[x(t)\mid\mathcal Y_t]$, satisfies the sublinear-growth terminal condition \begin{align*} \limsup_{T\to\infty}\frac{1}{T}\mathbb E[\hat{x}(T)^\top P\hat{x}(T)]=0 \end{align*} for the stabilising regulator solution $P$ below, and for which the average cost \begin{align*} J_{\mathrm{av}}[u] := \limsup_{T \to \infty} \frac{1}{T}\mathbb E\left[\int_0^{\!T}\left(x(t)^\top Qx(t) + u(t)^\top R u(t)\right)\,d\mathcal L^1(t)\right] \end{align*} is well-defined in $[0,\infty]$. Assume that $(A,B)$ is stabilisable, $(Q^{1/2},A)$ is detectable, $(A,C)$ is detectable, and $(A,GW^{1/2})$ is stabilisable. 
Then the algebraic Riccati equations \begin{align*} A^\top P + PA - PBR^{-1}B^\top P + Q = 0 \end{align*} and \begin{align*} A\Sigma + \Sigma A^\top + GWG^\top - \Sigma C^\top V^{-1}C\Sigma = 0 \end{align*} have symmetric positive semidefinite stabilising solutions $P \in \mathbb R^{n \times n}$ and $\Sigma \in \mathbb R^{n \times n}$. Assume in addition that the initial conditional covariance equals the steady-state filtering covariance, that is, \begin{align*} \mathbb E[(x(0)-\mathbb E[x(0)\mid\mathcal Y_0])(x(0)-\mathbb E[x(0)\mid\mathcal Y_0])^\top\mid\mathcal Y_0]=\Sigma. \end{align*} Then the minimum of $J_{\mathrm{av}}[u]$ over all admissible controls is attained by the certainty-equivalence controller \begin{align*} u(t) = -R^{-1}B^\top P\,\hat{x}(t), \end{align*} where \begin{align*} \hat{x}(t) := \mathbb E[x(t)\mid \mathcal Y_t] \end{align*} is the Kalman-Bucy conditional mean estimate, equivalently the solution of \begin{align*} d\hat{x}(t)=A\hat{x}(t)\,dt+Bu(t)\,dt+L(dy(t)-C\hat{x}(t)\,dt) \end{align*} with filter gain $L=\Sigma C^\top V^{-1}$. The minimum average cost is \begin{align*} \operatorname{tr}(Q\Sigma)+\operatorname{tr}(PLVL^\top). \end{align*}
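The quantities in the statement can be computed numerically. The following is a minimal sketch, not part of the theorem: it assumes SciPy's `solve_continuous_are` (which returns the stabilising solution when one exists), and both the helper name `lqg_steady_state` and the double-integrator data are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def lqg_steady_state(A, B, C, G, Q, R, W, V):
    """Solve both Riccati equations and evaluate the cost formula.

    Hypothetical helper for illustration. It assumes the stabilisability
    and detectability hypotheses of the theorem hold for the given data.
    """
    # Regulator ARE: A^T P + P A - P B R^{-1} B^T P + Q = 0.
    P = solve_continuous_are(A, B, Q, R)
    # Filter ARE, solved in dual form:
    # A S + S A^T + G W G^T - S C^T V^{-1} C S = 0.
    Sigma = solve_continuous_are(A.T, C.T, G @ W @ G.T, V)
    K = np.linalg.solve(R, B.T @ P)        # K = R^{-1} B^T P
    L = np.linalg.solve(V, C @ Sigma).T    # L = Sigma C^T V^{-1}
    cost = np.trace(Q @ Sigma) + np.trace(P @ L @ V @ L.T)
    return {"P": P, "Sigma": Sigma, "K": K, "L": L, "cost": cost}


# Illustrative data (an assumption): a double integrator with position
# measurement and noise entering through the velocity channel.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
G = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
W = np.array([[1.0]])
V = np.array([[1.0]])

sol = lqg_steady_state(A, B, C, G, Q, R, W, V)
print("regulator poles:", np.linalg.eigvals(A - B @ sol["K"]))
print("filter poles:   ", np.linalg.eigvals(A - sol["L"] @ C))
print("minimum average cost:", sol["cost"])
```

For this particular data the Riccati equations can also be solved by hand, giving $P$ with diagonal entries $\sqrt 3$ and off-diagonal entries $1$, $\Sigma$ with diagonal entries $\sqrt 2$ and off-diagonal entries $1$, and minimum average cost $4\sqrt 2+3\sqrt 3\approx 10.853$.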

Discussion
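
A scalar instance makes the formulas concrete. The numerical values here are assumptions chosen purely for illustration: take $n=m=r=q=1$ with $A=0$, $B=C=G=1$ and $Q=R=W=V=1$, for which all four stabilisability and detectability hypotheses hold. Discarding the negative roots, which are not positive semidefinite, the two Riccati equations and the resulting gains reduce to \begin{align*} -P^2+1=0 \ \Rightarrow\ P=1, \qquad K=R^{-1}B^\top P=1, \qquad A-BK=-1, \end{align*} \begin{align*} 1-\Sigma^2=0 \ \Rightarrow\ \Sigma=1, \qquad L=\Sigma C^\top V^{-1}=1, \qquad A-LC=-1, \end{align*} so both closed-loop matrices are Hurwitz, the optimal controller is $u(t)=-\hat{x}(t)$, and the minimum average cost is \begin{align*} \operatorname{tr}(Q\Sigma)+\operatorname{tr}(PLVL^\top)=1+1=2. \end{align*}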

Proof

[proofplan] The proof is the continuous-time LQG separation argument. First, the stabilising Riccati equations provide the two matrices that define the regulator gain and the filter gain. Then the [conditional expectation](/page/Conditional%20Expectation) decomposition $x=\hat{x}+\tilde{x}$ separates the state cost into a control-dependent term involving $\hat{x}$ and a covariance term independent of the admissible control. Finally, applying the fully observed LQR square-completion calculation to the conditional mean dynamics gives the minimising feedback $u=-R^{-1}B^\top P\hat{x}$. [/proofplan] [step:Define the regulator and filter gains from the stabilising Riccati equations] We use the [Continuous-Time Algebraic Riccati Equation Existence and LQR Optimality Theorem](/theorems/6403) [quotetheorem:TEMP-43] and the [Continuous-Time Kalman Filter Algebraic Riccati Theorem](/theorems/6407) [quotetheorem:TEMP-47]. The LQR Riccati theorem says that, when $(A,B)$ is stabilisable, $(Q^{1/2},A)$ is detectable, and $R=R^\top>0$, the regulator algebraic Riccati equation has a unique symmetric positive semidefinite stabilising solution. The steady-state Kalman-Bucy Riccati theorem says that, when $(A,C)$ is detectable, $(A,GW^{1/2})$ is stabilisable, and $V=V^\top>0$, the filter covariance algebraic Riccati equation has a unique symmetric positive semidefinite stabilising solution. These are exactly the hypotheses of the theorem, so there exist unique symmetric positive semidefinite stabilising solutions $P$ and $\Sigma$ of the two algebraic Riccati equations. Define the regulator gain $K \in \mathbb R^{m \times n}$ by \begin{align*} K := R^{-1}B^\top P. \end{align*} Define the Kalman-Bucy filter gain $L \in \mathbb R^{n \times r}$ by \begin{align*} L := \Sigma C^\top V^{-1}. \end{align*} The stabilising property means that $A-BK$ and $A-LC$ are Hurwitz matrices, that is, all of their eigenvalues have negative real part. 
[/step] [step:Write the Kalman-Bucy estimate dynamics] For every admissible control $u$, define the conditional mean process $\hat{x}: [0,\infty) \times \Omega \to \mathbb R^n$ by \begin{align*} \hat{x}(t,\omega)=\mathbb E[x(t)\mid \mathcal Y_t](\omega). \end{align*} Because the initial state is Gaussian and independent of the Gaussian driving noises, the model is a linear Gaussian filtering model. The [Kalman-Bucy Filter Theorem](/theorems/6406) [quotetheorem:TEMP-46] applies because the coefficients are deterministic, the initial state is Gaussian, the noises are independent, the measurement covariance $V$ is positive definite, and the initial filtering covariance equals $\Sigma$, all of which are assumed in the statement. Since $u$ is $\mathcal Y_t$-adapted, it is known to the filter at time $t$ and enters as an adapted input; it does not change the error covariance equation. Because the initial filtering covariance is $\Sigma$, the filter gain is the constant matrix $L$, and the theorem gives \begin{align*} d\hat{x}(t) = A\hat{x}(t)\,dt + Bu(t)\,dt + L\left(dy(t)-C\hat{x}(t)\,dt\right). \end{align*} The innovation process $\nu$ is defined by \begin{align*} d\nu(t) := dy(t)-C\hat{x}(t)\,dt. \end{align*} It is a Gaussian Wiener process with covariance $V\,dt$ relative to the observation filtration. Hence the conditional mean is a fully observed controlled diffusion driven by the innovation: \begin{align*} d\hat{x}(t) = A\hat{x}(t)\,dt + Bu(t)\,dt + L\,d\nu(t). \end{align*} [/step] [step:Separate the conditional state cost by orthogonality of conditional expectation] Define the estimation error process $\tilde{x}$ by \begin{align*} \tilde{x}(t) := x(t)-\hat{x}(t). \end{align*} Since $\hat{x}(t)=\mathbb E[x(t)\mid \mathcal Y_t]$, the defining orthogonality property of conditional expectation gives \begin{align*} \mathbb E[\tilde{x}(t)\mid \mathcal Y_t] = 0. \end{align*} For each $t \ge 0$, expand the quadratic form: \begin{align*} x(t)^\top Qx(t) = \hat{x}(t)^\top Q\hat{x}(t) + 2\hat{x}(t)^\top Q\tilde{x}(t) + \tilde{x}(t)^\top Q\tilde{x}(t). 
\end{align*} Taking conditional expectation with respect to $\mathcal Y_t$, the cross term vanishes because $\hat{x}(t)$ is $\mathcal Y_t$-measurable and $\mathbb E[\tilde{x}(t)\mid \mathcal Y_t]=0$. Thus \begin{align*} \mathbb E[x(t)^\top Qx(t)\mid \mathcal Y_t] = \hat{x}(t)^\top Q\hat{x}(t) + \mathbb E[\tilde{x}(t)^\top Q\tilde{x}(t)\mid \mathcal Y_t]. \end{align*} [guided] The purpose of this step is to identify exactly which part of the state cost can be influenced by the output-feedback control. Define the estimation error by \begin{align*} \tilde{x}(t) := x(t)-\hat{x}(t). \end{align*} Because $\hat{x}(t)$ is the conditional expectation of $x(t)$ given $\mathcal Y_t$, it is the $L^2$ projection of $x(t)$ onto the $\mathcal Y_t$-measurable random variables. Therefore the error is conditionally centred: \begin{align*} \mathbb E[\tilde{x}(t)\mid \mathcal Y_t] = \mathbb E[x(t)-\hat{x}(t)\mid \mathcal Y_t] = \hat{x}(t)-\hat{x}(t)=0. \end{align*} Now expand the quadratic cost using $x(t)=\hat{x}(t)+\tilde{x}(t)$: \begin{align*} x(t)^\top Qx(t) = \hat{x}(t)^\top Q\hat{x}(t) + 2\hat{x}(t)^\top Q\tilde{x}(t) + \tilde{x}(t)^\top Q\tilde{x}(t). \end{align*} The middle term disappears after conditioning. Indeed, $\hat{x}(t)^\top Q$ is $\mathcal Y_t$-measurable, so it can be pulled through the conditional expectation: \begin{align*} \mathbb E[\hat{x}(t)^\top Q\tilde{x}(t)\mid \mathcal Y_t] = \hat{x}(t)^\top Q\,\mathbb E[\tilde{x}(t)\mid \mathcal Y_t] = 0. \end{align*} Therefore \begin{align*} \mathbb E[x(t)^\top Qx(t)\mid \mathcal Y_t] = \hat{x}(t)^\top Q\hat{x}(t) + \mathbb E[\tilde{x}(t)^\top Q\tilde{x}(t)\mid \mathcal Y_t]. \end{align*} This is the algebraic heart of certainty equivalence: the only control-dependent state variable in the conditional optimisation is the estimate $\hat{x}(t)$. 
[/guided] [/step] [step:Observe that the estimation-error covariance is independent of the chosen control] Let $S(t) \in \mathbb R^{n \times n}$ denote the conditional covariance matrix of the estimation error: \begin{align*} S(t) := \mathbb E[\tilde{x}(t)\tilde{x}(t)^\top\mid \mathcal Y_t]. \end{align*} For this linear Gaussian system with independent process and measurement noises, the Kalman-Bucy covariance theorem gives that $S(t)$ is deterministic and satisfies the autonomous Riccati equation \begin{align*} \frac{dS}{dt}(t) = AS(t)+S(t)A^\top+GWG^\top-S(t)C^\top V^{-1}CS(t). \end{align*} The initial condition is $S(0)=\Sigma$, and $\Sigma$ solves the corresponding algebraic Riccati equation, so uniqueness for the Riccati initial-value problem gives $S(t)=\Sigma$ for all $t\ge 0$. This equation contains $A,C,G,W,V$ but not $u$. Hence the conditional error-cost term is \begin{align*} \mathbb E[\tilde{x}(t)^\top Q\tilde{x}(t)\mid \mathcal Y_t] = \operatorname{tr}(QS(t)), \end{align*} and it is independent of the admissible control; its long-time average contribution is the finite constant $\operatorname{tr}(Q\Sigma)$. Consequently minimising $J_{\mathrm{av}}[u]$ is equivalent to minimising the control-dependent average cost \begin{align*} \limsup_{T \to \infty}\frac{1}{T}\mathbb E\left[\int_0^{\!T}\left(\hat{x}(t)^\top Q\hat{x}(t)+u(t)^\top Ru(t)\right)\,d\mathcal L^1(t)\right]. \end{align*} [/step] [step:Complete the square for the conditional mean dynamics] For the conditional mean dynamics \begin{align*} d\hat{x}(t)=A\hat{x}(t)\,dt+Bu(t)\,dt+L\,d\nu(t), \end{align*} apply [Itô's formula](/theorems/2099) to the function $V_P:\mathbb R^n \to \mathbb R$ defined by \begin{align*} V_P(z)=z^\top Pz. 
\end{align*} Using the Riccati identity \begin{align*} A^\top P+PA+Q-PBR^{-1}B^\top P=0, \end{align*} we obtain the following identity first for stopped processes at $\tau_N:=\inf\{t:|\hat{x}(t)|\ge N\}\wedge N$: \begin{align*} \hat{x}(t)^\top Q\hat{x}(t)+u(t)^\top Ru(t) = \left(u(t)+K\hat{x}(t)\right)^\top R\left(u(t)+K\hat{x}(t)\right) -\frac{d}{dt}\left(\hat{x}(t)^\top P\hat{x}(t)\right) +\operatorname{tr}(PLVL^\top) \end{align*} after taking expectations, where the stopped stochastic integral has expectation zero because its integrand is bounded and predictable. For fixed $T$, local square-integrability of $u$ and locally finite second moments of $\hat{x}$ make the drift terms integrable on $[0,T]$. The stopped terminal quadratic terms increase to the unstopped terminal quadratic term along a subsequence and are uniformly controlled by these finite second moments, so dominated convergence for the drift integrals and $L^1$ convergence of the stopped terminal terms let $N\to\infty$. Thus the identity holds on every finite interval. Integrating over $[0,T]$ with respect to $\mathcal L^1$ gives \begin{align*} \mathbb E\left[\int_0^{\!T}\left(\hat{x}(t)^\top Q\hat{x}(t)+u(t)^\top Ru(t)\right)\,d\mathcal L^1(t)\right] \end{align*} \begin{align*} = \mathbb E\left[\int_0^{\!T}\left(u(t)+K\hat{x}(t)\right)^\top R\left(u(t)+K\hat{x}(t)\right)\,d\mathcal L^1(t)\right] +\mathbb E[\hat{x}(0)^\top P\hat{x}(0)] -\mathbb E[\hat{x}(T)^\top P\hat{x}(T)] +T\operatorname{tr}(PLVL^\top). \end{align*} Because $R$ is positive definite, the integrand \begin{align*} \left(u(t)+K\hat{x}(t)\right)^\top R\left(u(t)+K\hat{x}(t)\right) \end{align*} is nonnegative and is minimised pointwise by \begin{align*} u(t)=-K\hat{x}(t)=-R^{-1}B^\top P\hat{x}(t). 
\end{align*} [/step] [step:Conclude optimality of the certainty-equivalence controller] Under the feedback \begin{align*} u(t)=-K\hat{x}(t), \end{align*} the estimate evolves with drift matrix $A-BK$, and the estimation error evolves with stabilising filter matrix $A-LC$. Since both matrices are Hurwitz, the closed-loop second moments are bounded on $[0,\infty)$, and the boundary term \begin{align*} \frac{1}{T}\left(\mathbb E[\hat{x}(0)^\top P\hat{x}(0)]-\mathbb E[\hat{x}(T)^\top P\hat{x}(T)]\right) \end{align*} vanishes in the average-cost limit. For an arbitrary admissible control, divide the square-completion identity by $T$ and take $\limsup_{T\to\infty}$. The initial term divided by $T$ tends to zero, the terminal condition in the definition of admissibility removes the terminal quadratic term, and the square term is nonnegative. Hence every admissible control has average cost at least the value obtained when $u+K\hat{x}=0$. Under the feedback $u=-K\hat{x}$, the estimate evolves with the Hurwitz drift $A-BK$, so its second moment is bounded and the same terminal condition holds; the square term vanishes identically. Therefore the minimum of the infinite-horizon average-cost LQG problem over admissible controls is achieved by \begin{align*} u(t)=-R^{-1}B^\top P\hat{x}(t), \end{align*} where $\hat{x}(t)=\mathbb E[x(t)\mid\mathcal Y_t]$ is generated by the Kalman-Bucy filter. The independent estimation-error contribution to the original state cost is $\operatorname{tr}(Q\Sigma)$ per unit time, and the square-completion trace contribution from the conditional mean equation is $\operatorname{tr}(PLVL^\top)$ per unit time. Hence the achieved and minimal average cost is \begin{align*} \operatorname{tr}(Q\Sigma)+\operatorname{tr}(PLVL^\top). \end{align*} This proves the certainty-equivalence form and the separation of the regulator Riccati equation from the filter Riccati equation. [guided] We verify the hypotheses of the external inputs first. 
The regulator Riccati theorem [quotetheorem:TEMP-43] applies because $(A,B)$ is stabilisable, $(Q^{1/2},A)$ is detectable, and $R=R^\top>0$. It gives the stabilising solution $P$ and the gain $K=R^{-1}B^\top P$. The steady-state filter Riccati theorem [quotetheorem:TEMP-47] applies because $(A,C)$ is detectable, $(A,GW^{1/2})$ is stabilisable, and $V=V^\top>0$. It gives the stabilising covariance $\Sigma$ and the gain $L=\Sigma C^\top V^{-1}$. The Kalman-Bucy filter theorem [quotetheorem:TEMP-46] applies because the initial state is Gaussian and independent of the Gaussian noises, the measurement covariance is positive definite, and the observation-adapted control $u$ is known to the filter at time $t$. The steady-state initial covariance assumption gives $S(0)=\Sigma$, so uniqueness for the covariance Riccati equation keeps $S(t)=\Sigma$ for all $t\ge 0$. Therefore the exact conditional mean equation is \begin{align*} d\hat{x}(t)=A\hat{x}(t)\,dt+Bu(t)\,dt+L(dy(t)-C\hat{x}(t)\,dt), \end{align*} where $dy(t)-C\hat{x}(t)\,dt=d\nu(t)$ is the innovation increment with covariance $V\,dt$. Now decompose the state as $x(t)=\hat{x}(t)+\tilde{x}(t)$. Since $\hat{x}(t)$ is the conditional expectation of $x(t)$, the error satisfies $\mathbb E[\tilde{x}(t)\mid\mathcal Y_t]=0$. Expanding $x(t)^\top Qx(t)$ and conditioning on $\mathcal Y_t$ kills the cross term. Because the conditional covariance is $\Sigma$, the error contribution is \begin{align*} \mathbb E[\tilde{x}(t)^\top Q\tilde{x}(t)\mid\mathcal Y_t]=\operatorname{tr}(Q\Sigma). \end{align*} Thus the original average cost is the conditional-mean average cost plus the fixed constant $\operatorname{tr}(Q\Sigma)$. It remains to minimize the conditional-mean part. For fixed $T$ and $N$, stop the process at $\tau_N=\inf\{t:|\hat{x}(t)|\ge N\}\wedge N$. On $[0,T\wedge\tau_N]$ the Itô integrands in $V_P(\hat{x})=\hat{x}^\top P\hat{x}$ are bounded predictable processes, so the stochastic integral has expectation zero. 
The drift identity obtained from Itô's formula, after taking expectations, is \begin{align*} \hat{x}(t)^\top Q\hat{x}(t)+u(t)^\top Ru(t) = (u(t)+K\hat{x}(t))^\top R(u(t)+K\hat{x}(t))-\frac{d}{dt}(\hat{x}(t)^\top P\hat{x}(t))+\operatorname{tr}(PLVL^\top). \end{align*} The equality follows by substituting the regulator Riccati identity $A^\top P+PA+Q-PBR^{-1}B^\top P=0$ and $K=R^{-1}B^\top P$; the trace term is the Itô correction contributed by the innovation, whose incremental covariance is $V\,dt$. The admissibility hypotheses give local square-integrability of $u$ and locally finite second moments of $\hat{x}$, so the stopped drift integrals converge in $L^1$ to the unstopped drift integrals on $[0,T]$, and the stopped terminal quadratic terms converge in $L^1$ to $\hat{x}(T)^\top P\hat{x}(T)$. Hence the integrated identity holds without stopping. Divide that identity by $T$ and take $\limsup_{T\to\infty}$. The initial term divided by $T$ tends to zero. The admissibility terminal condition removes the terminal quadratic term. The square term is nonnegative because $R>0$, and it is zero exactly when $u(t)=-K\hat{x}(t)$. Under this feedback the matrix $A-BK$ is Hurwitz, so the conditional-mean second moment is bounded and the terminal condition is satisfied. Therefore the certainty-equivalence feedback attains the minimum, and adding back the fixed error-cost constant gives the minimum average cost \begin{align*} \operatorname{tr}(Q\Sigma)+\operatorname{tr}(PLVL^\top). \end{align*} [/guided] [/step]
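
The conclusion can be sanity-checked numerically for linear feedbacks $u=-K'\hat{x}$ with $A-BK'$ Hurwitz. The sketch below rests on a standard stationary-covariance fact that the proof does not spell out: the second moment of $\hat{x}$ converges to the solution $X$ of the Lyapunov equation $(A-BK')X+X(A-BK')^\top+LVL^\top=0$, so the average cost is $\operatorname{tr}((Q+K'^\top RK')X)+\operatorname{tr}(Q\Sigma)$. The helper name `average_cost_linear_feedback`, the double-integrator data, and the detuned gain $1.5K$ are hypothetical choices for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov


def average_cost_linear_feedback(A, B, Q, R, L, V, Sigma, K_fb):
    """Average cost of u = -K_fb x_hat, assuming A - B K_fb is Hurwitz.

    Hypothetical helper: uses the stationary second moment of x_hat.
    """
    A_cl = A - B @ K_fb
    # solve_continuous_lyapunov(a, q) solves a X + X a^H = q, so pass -L V L^T.
    X = solve_continuous_lyapunov(A_cl, -L @ V @ L.T)
    return np.trace((Q + K_fb.T @ R @ K_fb) @ X) + np.trace(Q @ Sigma)


# Illustrative data (an assumption): double integrator, position measurement.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
G = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
W, V = np.array([[1.0]]), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)                     # regulator ARE
Sigma = solve_continuous_are(A.T, C.T, G @ W @ G.T, V)   # filter ARE (dual)
K = np.linalg.solve(R, B.T @ P)                          # K = R^{-1} B^T P
L = np.linalg.solve(V, C @ Sigma).T                      # L = Sigma C^T V^{-1}

formula = np.trace(Q @ Sigma) + np.trace(P @ L @ V @ L.T)
cost_opt = average_cost_linear_feedback(A, B, Q, R, L, V, Sigma, K)
cost_detuned = average_cost_linear_feedback(A, B, Q, R, L, V, Sigma, 1.5 * K)

print("formula tr(Q Sigma) + tr(P L V L^T):", formula)
print("cost under u = -K x_hat:            ", cost_opt)
print("cost under u = -1.5 K x_hat:        ", cost_detuned)
```

The first two printed values should agree up to rounding, and the detuned gain should cost strictly more, reflecting the nonnegative square term $(u+K\hat{x})^\top R(u+K\hat{x})$ in the square-completion identity. This checks only linear feedbacks of the estimate, not the full admissible class.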
