This course develops the mathematical foundations of linear control theory, focusing on how dynamical systems are modeled, analyzed, and regulated in state-space form. It begins with the basic language of linear systems and solution operators, then studies stability as the first criterion for understanding long-term behavior. From there, the course turns to the structural properties that determine what can and cannot be done with a system: controllability, reachability, observability, and the duality between them.
As the chapters progress, the course builds the tools needed to simplify and analyze linear systems through canonical forms, minimal realizations, and the Kalman decomposition. These ideas support the design of controllers and observers, including state feedback, pole placement, and deterministic state estimation. The later chapters extend these design principles to optimal control and stochastic estimation with linear quadratic regulation and Kalman filtering, before combining them through the separation principle and output feedback.
The final chapter on robustness margins and model limitations places the theory in context, showing how idealized linear models behave under uncertainty and approximation. Overall, the course moves from fundamental system descriptions to analysis, synthesis, estimation, and robustness, giving a coherent pathway from theory to practical controller design.
# Introduction
This opening chapter sets the agenda for the course. Linear control theory studies systems whose internal state evolves by linear differential equations and whose behaviour can be altered through chosen inputs. The main question is not only how such systems move, but how to design inputs and feedback laws that make them move in a prescribed, stable, and robust way. Chapters 3 and 4 turn these questions into reachability and observability rank tests; Chapters 5 and 6 organize them through canonical forms and Kalman decomposition; Chapters 9 through 11 develop Riccati equations, estimators, and output-feedback controllers.
The course is finite-dimensional and state-space based. We work with vectors $x(t) \in \mathbb R^n$, inputs $u(t) \in \mathbb R^m$, outputs $y(t) \in \mathbb R^p$, and matrices of compatible sizes. The prerequisites are linear algebra, ordinary differential equations, matrix analysis, Laplace transforms, basic random vectors, and elementary convex optimization.
## What Is the Control Problem?
The first modelling problem is to separate three roles: the state records what the system remembers, the input is the externally chosen signal, and the output is the measured quantity. This separation matters because control design often has access to the input and output but not to the full state. The course therefore begins with a model class that keeps the internal dynamics visible.
[definition: Continuous-Time Linear State-Space System]
A continuous-time linear state-space system consists of integers $n,m,p \in \mathbb N$, matrices $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, $C \in \mathbb R^{p \times n}$, and $D \in \mathbb R^{p \times m}$, and equations
\begin{align*}
\dot{x}(t) = Ax(t)+Bu(t), \qquad y(t) = Cx(t)+Du(t),
\end{align*}
on a time interval $I\subseteq\mathbb R$, where $u\in L^2_{\mathrm{loc}}(I;\mathbb R^m)$ is the input, $x:I\to\mathbb R^n$ is an absolutely continuous state trajectory satisfying the state equation a.e., and $y\in L^2_{\mathrm{loc}}(I;\mathbb R^p)$ is the output.
[/definition]
The matrix $A$ describes autonomous evolution, $B$ describes how the input enters the state equation, $C$ selects or combines state variables for measurement, and $D$ records direct feedthrough from input to output. Much of the course asks which properties depend on the particular coordinates chosen for $x$ and which properties are intrinsic to the input-output behaviour.
[example: Mass-Spring-Damper Model]
Consider a mass $M>0$ attached to a spring with constant $k>0$ and a damper with coefficient $c\ge 0$, driven by an external force $u(t)$. With displacement $q(t)$, Newton's law is
\begin{align*}
M\ddot q(t)+c\dot q(t)+kq(t)=u(t).
\end{align*}
Define the state variables by
\begin{align*}
x_1(t)=q(t), \qquad x_2(t)=\dot q(t).
\end{align*}
Then
\begin{align*}
\dot x_1(t)=\frac{d}{dt}q(t)=\dot q(t)=x_2(t).
\end{align*}
Solving Newton's law for $\ddot q(t)$ gives
\begin{align*}
M\ddot q(t)=u(t)-c\dot q(t)-kq(t).
\end{align*}
Since $M>0$, division by $M$ gives
\begin{align*}
\ddot q(t)=-\frac{k}{M}q(t)-\frac{c}{M}\dot q(t)+\frac{1}{M}u(t).
\end{align*}
Because $x_2(t)=\dot q(t)$, we have $\dot x_2(t)=\ddot q(t)$, and substituting $q(t)=x_1(t)$ and $\dot q(t)=x_2(t)$ gives
\begin{align*}
\dot x_2(t)=-\frac{k}{M}x_1(t)-\frac{c}{M}x_2(t)+\frac{1}{M}u(t).
\end{align*}
Thus the system has the state-space form $\dot x=Ax+Bu$, where $A\in\mathbb R^{2\times 2}$ has entries $a_{11}=0$, $a_{12}=1$, $a_{21}=-k/M$, and $a_{22}=-c/M$, while $B\in\mathbb R^{2\times 1}$ has entries $b_1=0$ and $b_2=1/M$. If the measured output is displacement, then
\begin{align*}
y(t)=x_1(t)=1\cdot x_1(t)+0\cdot x_2(t)+0\cdot u(t).
\end{align*}
So the output equation is $y=Cx+Du$ with $C$ the row vector with entries $1$ and $0$, and with $D=0$. The scalar second-order equation becomes a two-dimensional first-order system because the future motion is determined by the current position $q(t)$, the current velocity $\dot q(t)$, and the future input $u$.
[/example]
This example is representative: the dimension of the state is not the number of measured signals, but the number of variables needed to continue the motion once the future input is specified. The same idea appears in electrical circuits, mechanical systems, chemical networks, and linearized nonlinear models.
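As a numerical sanity check on the mass-spring-damper model, the sketch below assembles the matrices with NumPy and confirms that the characteristic polynomial of $A$ is $s^2+(c/M)s+k/M$, the monic form of $Ms^2+cs+k$. The helper name `mass_spring_damper` and the parameter values are illustrative assumptions, not part of the course text.

```python
import numpy as np

def mass_spring_damper(M, c, k):
    """Return (A, B, C, D) for M q'' + c q' + k q = u with state (q, q') and output q."""
    A = np.array([[0.0, 1.0],
                  [-k / M, -c / M]])
    B = np.array([[0.0],
                  [1.0 / M]])
    C = np.array([[1.0, 0.0]])
    D = np.array([[0.0]])
    return A, B, C, D

# Illustrative values only (not taken from the text).
M, c, k = 2.0, 3.0, 5.0
A, B, C, D = mass_spring_damper(M, c, k)

# np.poly returns the coefficients of det(sI - A), highest power first.
char_poly = np.poly(A)
print(np.allclose(char_poly, [1.0, c / M, k / M]))
```

The state dimension is two even though only one signal is measured, which matches the remark that state dimension counts the variables needed to continue the motion.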
## Why Linearity Matters
The next question is why the linear class is powerful enough to deserve a full theory. Linearity gives a superposition principle, connects differential equations to matrix algebra, and turns design problems into solvability questions for linear or quadratic equations. It also supplies local models for nonlinear systems near operating points, so the tools developed here become a first layer of nonlinear control analysis.
[quotetheorem:6361]
[citeproof:6361]
The theorem explains why reachable sets, observable subspaces, transfer functions, and feedback laws can be studied through linear algebra: the map from initial state and input to the resulting trajectory respects linear combinations. The hypotheses are doing real work. The matrices $A$ and $B$ must be fixed, and the state equation must be linear in both $x$ and $u$; for example, in the scalar saturated system $\dot{x}=\operatorname{sat}(u)$ with $x(0)=0$, $u_1(t)=1$ and $u_2(t)=0$ give $x_1(t)=t$ and $x_2(t)=0$, but the input $3u_1-2u_2$ gives $\dot{x}=\operatorname{sat}(3)=1$, not the trajectory $3t$. The theorem also does not say that every desired trajectory can be produced, only that the set of trajectories produced from a chosen class of inputs has a linear structure. Reachability will ask whether this linear set is large enough to fill the whole state space.
[example: Superposing Two Inputs]
Let $\dot{x}=Ax+Bu$ with zero initial state. Suppose $u_1$ produces $x_1$ and $u_2$ produces $x_2$, meaning
\begin{align*}
\dot{x}_1(t)=Ax_1(t)+Bu_1(t),\qquad x_1(0)=0.
\end{align*}
Also,
\begin{align*}
\dot{x}_2(t)=Ax_2(t)+Bu_2(t),\qquad x_2(0)=0.
\end{align*}
For the combined input, define
\begin{align*}
u(t)=3u_1(t)-2u_2(t).
\end{align*}
We show that the corresponding state is
\begin{align*}
x(t)=3x_1(t)-2x_2(t).
\end{align*}
First,
\begin{align*}
x(0)=3x_1(0)-2x_2(0)=3\cdot 0-2\cdot 0=0.
\end{align*}
Differentiating the proposed trajectory and substituting the two state equations gives
\begin{align*}
\dot{x}(t)=3\dot{x}_1(t)-2\dot{x}_2(t).
\end{align*}
Thus
\begin{align*}
\dot{x}(t)=3\bigl(Ax_1(t)+Bu_1(t)\bigr)-2\bigl(Ax_2(t)+Bu_2(t)\bigr).
\end{align*}
Distributing the scalar factors gives
\begin{align*}
\dot{x}(t)=3Ax_1(t)+3Bu_1(t)-2Ax_2(t)-2Bu_2(t).
\end{align*}
Using linearity of matrix multiplication,
\begin{align*}
\dot{x}(t)=A\bigl(3x_1(t)-2x_2(t)\bigr)+B\bigl(3u_1(t)-2u_2(t)\bigr).
\end{align*}
By the definitions of $x(t)$ and $u(t)$, this is
\begin{align*}
\dot{x}(t)=Ax(t)+Bu(t).
\end{align*}
Therefore driving the system with $3u_1-2u_2$ produces the trajectory $3x_1-2x_2$. In particular, at any terminal time $T$,
\begin{align*}
x(T)=3x_1(T)-2x_2(T).
\end{align*}
So the operation sending an admissible input to its terminal state respects this linear combination, which is the basic reason input design can be treated as a linear mapping problem.
[/example]
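The superposition argument in the example can also be observed numerically. The following is a minimal sketch, assuming NumPy, an illustrative pair $(A,B)$, and illustrative inputs; the helper `simulate` is hypothetical. It uses forward Euler, which only approximates the true trajectory, but the scheme is itself linear in the input, so the identity $x=3x_1-2x_2$ holds up to rounding.

```python
import numpy as np

def simulate(A, B, u, T=1.0, steps=1000):
    """Forward-Euler approximation of x' = Ax + Bu with x(0) = 0; returns x(T)."""
    dt = T / steps
    x = np.zeros(A.shape[0])
    for i in range(steps):
        t = i * dt
        x = x + dt * (A @ x + B @ u(t))
    return x

# Illustrative system and inputs (assumptions, not from the text).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
u1 = lambda t: np.array([np.sin(t)])
u2 = lambda t: np.array([1.0])
u = lambda t: 3 * u1(t) - 2 * u2(t)

x1 = simulate(A, B, u1)
x2 = simulate(A, B, u2)
x = simulate(A, B, u)
print(np.allclose(x, 3 * x1 - 2 * x2))
```

The check concerns linear structure rather than integration accuracy: a coarser step would change all three terminal states but not the relation between them.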
The [linear map](/page/Linear%20Map) from inputs to terminal states is the source of the reachability theory in the first half of the course. The same structural viewpoint reappears in observability, where the question is whether the output signal contains enough information to reconstruct the hidden initial state.
## The Main Structural Questions
Before designing controllers, we need to know what the model permits. Some systems cannot be driven to every state because the input enters in too few directions. Some systems cannot be reconstructed from outputs because the sensor misses a hidden mode. These obstructions are algebraic, and the course develops rank tests that expose them.
For the rest of this introductory discussion, an admissible input on $[0,T]$ means a function $u\in L^2([0,T];\mathbb R^m)$. This is broad enough for the energy-based problems later in the course and still gives a well-defined absolutely continuous state trajectory through the variation-of-constants formula.
[definition: Reachability Question]
For the system $\dot{x}=Ax+Bu$ with $A\in\mathbb R^{n\times n}$ and $B\in\mathbb R^{n\times m}$, the reachability question at time $T>0$ asks for the subset
\begin{align*}
\mathcal R(T)=\{x(T)\in\mathbb R^n : x:[0,T]\to\mathbb R^n \text{ solves } \dot{x}=Ax+Bu,\ x(0)=0,\ u\in L^2([0,T];\mathbb R^m)\}.
\end{align*}
[/definition]
This question leads to the controllability matrix and the Kalman rank condition. The point of the rank condition is that a dynamical question about all possible input functions becomes a finite-dimensional question involving $B,AB,\dots,A^{n-1}B$.
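The finite-dimensional test can be sketched in a few lines. The code below, assuming NumPy and two illustrative systems, stacks $B,AB,\dots,A^{n-1}B$ and computes the rank; the helper name `controllability_matrix` is hypothetical, and the precise statement linking this rank to $\mathcal R(T)$ is developed later in the course.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^{n-1}B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# The input reaches both states through the dynamics: full rank.
A1 = np.array([[0.0, 1.0], [-2.0, -3.0]])
B1 = np.array([[0.0], [1.0]])
rank1 = np.linalg.matrix_rank(controllability_matrix(A1, B1))

# Decoupled states with the input entering only the first one: rank deficient.
A2 = np.diag([0.0, -1.0])
B2 = np.array([[1.0], [0.0]])
rank2 = np.linalg.matrix_rank(controllability_matrix(A2, B2))

print(rank1, rank2)  # prints: 2 1
```

In the second system the input never influences the second coordinate, which is the kind of obstruction described at the start of this section.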
[definition: Observability Question]
For $T>0$, $A\in\mathbb R^{n\times n}$, and $C\in\mathbb R^{p\times n}$, consider the autonomous output system
\begin{align*}
\dot{x}(t)=Ax(t), \qquad y(t)=Cx(t), \qquad t\in[0,T].
\end{align*}
The observability question asks whether the output map
\begin{align*}
\mathcal O_T:\mathbb R^n&\longrightarrow C([0,T];\mathbb R^p),&
\mathcal O_T(x_0)(t)&=Ce^{tA}x_0
\end{align*}
is injective.
[/definition]
Observability is the measurement-side partner of reachability. Its algebraic test involves $C,CA,\dots,CA^{n-1}$, and its role becomes decisive when the controller must be built from sensor data rather than full state information.
[example: An Unmeasured Mode]
Let $A=\operatorname{diag}(0,-1)$ and $C=\begin{pmatrix}1&0\end{pmatrix}$. For the autonomous system $\dot x=Ax$ and $y=Cx$, write $x(t)=(x_1(t),x_2(t))^\top$. Multiplying by the diagonal matrix gives
\begin{align*}
Ax(t)=\operatorname{diag}(0,-1)(x_1(t),x_2(t))^\top=(0,-x_2(t))^\top.
\end{align*}
Hence $\dot x_1(t)=0$ and $\dot x_2(t)=-x_2(t)$. If $x(0)=(a,b)^\top$, then $x_1(t)=a$, since its derivative is zero and $x_1(0)=a$. Also $x_2(t)=be^{-t}$, since $\frac{d}{dt}(be^{-t})=-be^{-t}$ and $be^0=b$. Thus
\begin{align*}
x(t)=(a,be^{-t})^\top.
\end{align*}
The measured output is
\begin{align*}
y(t)=Cx(t)=\begin{pmatrix}1&0\end{pmatrix}(a,be^{-t})^\top=1\cdot a+0\cdot be^{-t}=a.
\end{align*}
Now compare two initial states $x^{(b)}(0)=(a,b)^\top$ and $x^{(d)}(0)=(a,d)^\top$ with $b\ne d$. Their trajectories are
\begin{align*}
x^{(b)}(t)=(a,be^{-t})^\top \quad \text{and} \quad x^{(d)}(t)=(a,de^{-t})^\top.
\end{align*}
These trajectories differ in the second component whenever $b\ne d$, because $be^{-t}\ne de^{-t}$ for every $t\ge 0$. But their measured outputs agree for all $t$:
\begin{align*}
y^{(b)}(t)=\begin{pmatrix}1&0\end{pmatrix}(a,be^{-t})^\top=a=\begin{pmatrix}1&0\end{pmatrix}(a,de^{-t})^\top=y^{(d)}(t).
\end{align*}
The output map cannot distinguish changes in the initial value of the second state component, so this sensor choice leaves one dynamical mode unmeasured.
[/example]
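The same conclusion can be read off from the matrix built from $C$ and $CA$. This is a minimal sketch assuming NumPy; the helper name `observability_matrix` is hypothetical. It computes the rank for the example above and checks that the direction $(0,1)^\top$ lies in the kernel, so initial states differing along it produce identical outputs.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack [C; CA; ...; CA^{n-1}] row-wise."""
    n = A.shape[0]
    rows = [C]
    for _ in range(n - 1):
        rows.append(rows[-1] @ A)
    return np.vstack(rows)

A = np.diag([0.0, -1.0])
C = np.array([[1.0, 0.0]])
O = observability_matrix(A, C)
rank = np.linalg.matrix_rank(O)

# Direction of the unmeasured second state component.
hidden = np.array([0.0, 1.0])
print(rank, np.allclose(O @ hidden, 0.0))
```

A rank of one in a two-dimensional state space indicates a one-dimensional family of initial states that the sensor cannot separate.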
Reachability and observability are not merely diagnostic. They determine which unstable modes can be controlled, which hidden modes can be estimated, and when a state-space model is minimal among all models producing the same transfer function.
## Feedback, Optimization, and Estimation
Once the structural obstructions are understood, the course turns to construction. Open-loop control asks us to choose the whole input signal in advance, so modelling errors or disturbances are not corrected after they occur. Feedback is the basic remedy: the input is recomputed from the measured or estimated state, changing the closed-loop dynamics rather than merely injecting a preplanned forcing term. Poorly chosen feedback can move eigenvalues in the wrong direction or amplify noise, so stability and performance become matrix-design problems.
[definition: State Feedback Law]
For the system $\dot{x}=Ax+Bu$ with $A\in\mathbb R^{n\times n}$ and $B\in\mathbb R^{n\times m}$ on a time interval $[0,T]$, a linear state feedback law with reference consists of a matrix $K\in \mathbb R^{m\times n}$, a reference signal $r\in L^2([0,T];\mathbb R^m)$, and the feedback map
\begin{align*}
\mathcal F_{K,r}: C([0,T];\mathbb R^n) \longrightarrow L^2([0,T];\mathbb R^m)
\end{align*}
defined by $u=\mathcal F_{K,r}(x)$, where
\begin{align*}
u(t)=Kx(t)+r(t).
\end{align*}
[/definition]
Substituting the feedback law gives the closed-loop state equation $\dot{x}=(A+BK)x+Br$. The central pole-placement problem asks how much freedom the matrix $K$ gives over the eigenvalues of $A+BK$.
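For a two-state system the pole-placement question can be answered by matching coefficients. The sketch below, assuming NumPy, uses the mass-spring-damper matrices with illustrative parameters and an arbitrarily chosen pair of target eigenvalues. With $K=\begin{pmatrix}k_1&k_2\end{pmatrix}$ and the convention $u=Kx+r$, the characteristic polynomial of $A+BK$ is $s^2+\frac{c-k_2}{M}s+\frac{k-k_1}{M}$, which is matched to $(s-p_1)(s-p_2)$.

```python
import numpy as np

# Undamped oscillator, illustrative values (assumptions).
M, c, k = 1.0, 0.0, 1.0
A = np.array([[0.0, 1.0], [-k / M, -c / M]])
B = np.array([[0.0], [1.0 / M]])

# Desired closed-loop eigenvalues (an arbitrary choice for illustration).
p1, p2 = -2.0, -3.0

# Match s^2 + ((c - k2)/M) s + (k - k1)/M with s^2 - (p1 + p2) s + p1*p2.
k1 = k - M * p1 * p2
k2 = c + M * (p1 + p2)
K = np.array([[k1, k2]])

closed_loop = A + B @ K
eigs = np.sort(np.linalg.eigvals(closed_loop).real)
print(eigs)
```

This hand computation works because the input reaches both states; the general question of when such a $K$ exists is the subject of the pole-placement chapter.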
[definition: Linear Quadratic Regulator Problem]
Given matrices $A\in \mathbb R^{n\times n}$, $B\in \mathbb R^{n\times m}$, $Q=Q^\top \ge 0$, $R=R^\top>0$, and an initial state $x_0\in\mathbb R^n$, let
\begin{align*}
\mathcal U=L^2_{\mathrm{loc}}([0,\infty);\mathbb R^m).
\end{align*}
For each $u\in\mathcal U$, let $x_u:[0,\infty)\to\mathbb R^n$ be the absolutely continuous solution of $\dot{x}=Ax+Bu$ with $x(0)=x_0$. The infinite-horizon linear quadratic regulator functional is the map $J:\mathcal U\to [0,\infty]$ defined by
\begin{align*}
J[u]=\int_0^\infty \left(x_u(t)^\top Qx_u(t)+u(t)^\top Ru(t)\right)\,dt.
\end{align*}
The infinite-horizon linear quadratic regulator problem is to find $u^*\in\mathcal U$ such that $J[u^*]\le J[u]$ for all $u\in\mathcal U$.
[/definition]
The linear quadratic regulator connects control to convex optimization and matrix Riccati equations. It is important because it does not merely stabilize the system; it balances state error against input effort through explicit weights. If $Q$ ignores an unstable state direction, the cost may fail to penalize a dangerous mode; if $R$ is not positive definite, the input penalty may lose strict convexity and the pointwise minimization in $u$ may become ill-posed. This raises the next design question: when the performance criterion is quadratic and the dynamics are linear, can the optimal input still be implemented as a linear feedback law?
[quotetheorem:6362]
[citeproof:6362]
This result previews a recurring pattern: the best controller is found by proving that a matrix equation has a solution with the right positivity and stability properties. Each hypothesis has a distinct role. Stabilizability is weaker than controllability but still ensures that unstable directions can be influenced; without it, an uncontrollable unstable mode can make the infinite-horizon cost infinite. In the scalar system $\dot{x}=x$ with $B=0$, $Q=1$, and $R=1$, the input does not enter the dynamics, so $x(t)=e^t x_0$ and the cost is infinite whenever $x_0\ne 0$. Detectability of $(Q^{1/2},A)$ prevents unstable state directions from being invisible to the state penalty. In the scalar system $\dot{x}=x+u$ with $Q=0$ and $R=1$, the Riccati equation reduces to $2P-P^2=0$, with roots $P=0$ and $P=2$. The zero solution $P=0$ satisfies the equation but gives zero feedback gain, so the closed-loop matrix remains $A=1$ and is not stabilizing, whereas $P=2$ gives the closed-loop matrix $-1$. This illustrates why the stabilizing solution and the detectability hypothesis matter. The condition $Q\ge 0$ makes the state cost nonnegative, while $R>0$ makes input effort strictly costly and makes the completion-of-the-square step select a unique minimizing input. Thus solving the algebraic Riccati equation is not by itself enough: the solution must be the stabilizing nonnegative solution, and only then does the feedback formula give both optimality and closed-loop stability.
The theorem also has sharp boundaries. It assumes exact knowledge of $A$, $B$, $Q$, and $R$, and it does not by itself provide robustness margins for modelling error, actuator limits, or unmodelled dynamics. The formula is a full-state feedback law, so it requires access to $x(t)$; when only $y(t)$ is measured, the controller must be combined with an observer or Kalman filter. That estimator side has the same flavour, with covariance matrices and stochastic estimation replacing cost matrices and regulator gains.
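The scalar case makes the Riccati mechanism concrete. The sketch below is written under stated assumptions: the algebraic Riccati equation is taken in the standard form $A^\top P+PA-PBR^{-1}B^\top P+Q=0$, the gain is $K=-R^{-1}B^\top P$ in the convention $u=Kx$, and the helper names `scalar_lqr` and `feedback_cost` are hypothetical. For $\dot x=ax+bu$ with weights $q>0$ and $r>0$ it computes the stabilizing root, evaluates the cost of a static feedback $u=\kappa x$ in closed form, and compares the optimal gain with perturbed gains.

```python
import math

def scalar_lqr(a, b, q, r):
    """Stabilizing root of 2aP - (b^2/r)P^2 + q = 0 and the gain K in u = Kx."""
    P = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    K = -b * P / r
    return P, K

def feedback_cost(a, b, q, r, kappa, x0):
    """Cost of u = kappa*x, assuming q > 0 and x0 != 0 (infinite if not stabilizing)."""
    rate = a + b * kappa
    if rate >= 0:
        return math.inf
    # x(t) = exp(rate*t) x0, so the integrand is (q + r kappa^2) x0^2 exp(2 rate t).
    return (q + r * kappa * kappa) * x0 * x0 / (-2.0 * rate)

a, b, q, r, x0 = 1.0, 1.0, 1.0, 1.0, 1.0
P, K = scalar_lqr(a, b, q, r)
J_opt = feedback_cost(a, b, q, r, K, x0)
J_low = feedback_cost(a, b, q, r, K - 0.5, x0)
J_high = feedback_cost(a, b, q, r, K + 0.5, x0)
print(P, J_opt, J_low, J_high)

# The Q = 0 example from the text: the stabilizing root is P = 2, not P = 0.
P0, K0 = scalar_lqr(1.0, 1.0, 0.0, 1.0)
print(P0, 1.0 + 1.0 * K0)
```

In this sketch the optimal cost equals $Px_0^2$, and both perturbed gains cost more, which is consistent with the optimality statement of the theorem.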
## How the Course Fits Together
The difficulty in linear control is that the main design tasks depend on several layers at once: differential equations describe motion, linear algebra describes structural obstructions, optimization selects good inputs, and signal-processing ideas explain what can be inferred from measurements. The lectures build in an order that keeps those layers separated before combining them. First we solve state-space equations and relate them to transfer functions. Then we study stability of autonomous linear dynamics, followed by reachability, observability, and the Kalman decomposition. With those structural tools in place, we design feedback, solve quadratic regulation problems, construct observers and Kalman filters, and combine estimation with control through the separation principle.
[remark: Mathematical Thread]
The same linear-algebraic objects appear throughout the course: invariant subspaces, matrix exponentials, spectra, Gramians, adjoints, and positive definite solutions of Lyapunov or Riccati equations. Learning to recognize these objects is part of the course's purpose.
[/remark]
The course is therefore both a modelling course and a proof-based systems course. Its endpoint is output-feedback synthesis: designing a controller when the state is not fully observed, the model is linear, and performance is measured through quadratic or transfer-function criteria. Chapter 1 begins the technical development by solving $\dot{x}=Ax+Bu$ and extracting the input-output map from the state-space representation.
# 1. State-Space Models and Solution Operators
Linear systems begin with a modelling decision: separate the variables that describe the internal condition of a device from the variables that are imposed on it and measured from it. This chapter sets up that language for finite-dimensional continuous-time systems. We pass from differential equations to solution operators, then from solution operators to transfer functions, which gives the bridge between time-domain state evolution and input-output frequency-domain descriptions.
## State, Input, Output, and Trajectories
The first question is how to encode a physical system so that the same notation covers mechanical, electrical, and signal-processing examples. The state-space form records internal variables in a vector $x(t)$, external controls in a vector $u(t)$, and measured quantities in a vector $y(t)$.
[definition: State-Space Matrix Data]
The matrix data of a finite-dimensional continuous-time state-space model consist of integers $n,m,p \in \mathbb N$ and matrices $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, $C \in \mathbb R^{p \times n}$, and $D \in \mathbb R^{p \times m}$. With the trajectory convention fixed in the Introduction, these data are written in the shorthand form
\begin{align*}
\dot{x}(t)=A x(t)+B u(t), \qquad y(t)=Cx(t)+Du(t),
\end{align*}
where $x(t) \in \mathbb R^n$, $u(t) \in \mathbb R^m$, and $y(t) \in \mathbb R^p$.
[/definition]
The matrix $A$ describes the autonomous dynamics, $B$ describes how inputs enter the state equation, $C$ describes which state combinations are observed, and $D$ describes direct feedthrough from input to output. The definition in the Introduction already specified the time interval, input regularity, absolute continuity of trajectories, and a.e. interpretation of the differential equation; the next definition restates those conventions on a finite horizon, with the input class enlarged from $L^2$ to $L^1$ (on a bounded interval, $L^2([0,T];\mathbb R^m)\subseteq L^1([0,T];\mathbb R^m)$).
[definition: Admissible Control and Trajectory]
Let $T>0$. An admissible control on $[0,T]$ is a function $u \in L^1([0,T];\mathbb R^m)$. Given $x_0 \in \mathbb R^n$ and an admissible control $u$, a trajectory is an absolutely [continuous function](/page/Continuous%20Function) $x:[0,T]\to \mathbb R^n$ such that $x(0)=x_0$ and
\begin{align*}
\dot{x}(t)=Ax(t)+Bu(t)
\end{align*}
for a.e. $t\in[0,T]$.
[/definition]
The a.e. qualification reflects the fact that an $L^1$ input need not have pointwise values everywhere. The output $y$ is then defined a.e. by $y(t)=Cx(t)+Du(t)$, and if $u$ is continuous then all equations hold pointwise.
[example: Mass-Spring-Damper Model]
Consider a mass $M>0$ attached to a spring with constant $k>0$ and a damper with coefficient $c\ge 0$, driven by an external force $u(t)$. Its displacement $q(t)$ satisfies
\begin{align*}
M\ddot{q}(t)+c\dot{q}(t)+kq(t)=u(t).
\end{align*}
Set $x_1(t)=q(t)$ and $x_2(t)=\dot q(t)$. Differentiating the first definition gives
\begin{align*}
\dot{x}_1(t)=\dot q(t)=x_2(t).
\end{align*}
Solving the scalar equation for $\ddot q(t)$ gives
\begin{align*}
M\ddot q(t)=u(t)-c\dot q(t)-kq(t).
\end{align*}
Since $M>0$, division by $M$ gives
\begin{align*}
\ddot q(t)=\frac{1}{M}u(t)-\frac{c}{M}\dot q(t)-\frac{k}{M}q(t).
\end{align*}
Using $\dot{x}_2(t)=\ddot q(t)$, $q(t)=x_1(t)$, and $\dot q(t)=x_2(t)$ yields
\begin{align*}
\dot{x}_2(t)=-\frac{k}{M}x_1(t)-\frac{c}{M}x_2(t)+\frac{1}{M}u(t).
\end{align*}
Thus, for $x(t)=(x_1(t),x_2(t))^\top$, the state-space matrices are determined by $A_{11}=0$, $A_{12}=1$, $A_{21}=-k/M$, $A_{22}=-c/M$, $B_1=0$, and $B_2=1/M$. If the measured output is displacement, then
\begin{align*}
y(t)=x_1(t)=1\cdot x_1(t)+0\cdot x_2(t)+0\cdot u(t).
\end{align*}
The second-order mechanical equation has become a first-order state-space system by storing displacement and velocity as the two state coordinates.
[/example]
The same formalism also handles circuits, where the state variables are chosen from capacitor voltages and inductor currents. The value of the state-space model is that after this conversion, the analysis depends on matrices rather than on the original physical domain.
[example: RLC Circuit Realization]
For a series RLC circuit with input voltage $u(t)$, resistor $R>0$, inductor $L>0$, and capacitor $C_0>0$, let $x_1(t)$ be the capacitor voltage and let $x_2(t)$ be the inductor current. Since the elements are in series, the capacitor current is also $x_2(t)$, and the capacitor law $i_C=C_0\dot v_C$ gives
\begin{align*}
x_2(t)=C_0\dot{x}_1(t).
\end{align*}
Because $C_0>0$, division by $C_0$ gives
\begin{align*}
\dot{x}_1(t)=\frac{1}{C_0}x_2(t).
\end{align*}
Kirchhoff's voltage law around the loop reads
\begin{align*}
u(t)=v_C(t)+v_R(t)+v_L(t).
\end{align*}
Here $v_C(t)=x_1(t)$, Ohm's law gives $v_R(t)=Rx_2(t)$, and the inductor law gives $v_L(t)=L\dot{x}_2(t)$. Substituting these expressions into the loop equation gives
\begin{align*}
u(t)=x_1(t)+Rx_2(t)+L\dot{x}_2(t).
\end{align*}
Moving the first two terms to the other side gives
\begin{align*}
L\dot{x}_2(t)=u(t)-x_1(t)-Rx_2(t).
\end{align*}
Since $L>0$, division by $L$ gives
\begin{align*}
\dot{x}_2(t)=-\frac{1}{L}x_1(t)-\frac{R}{L}x_2(t)+\frac{1}{L}u(t).
\end{align*}
Thus, for $x(t)=(x_1(t),x_2(t))^\top$, the state matrix is determined by $A_{11}=0$, $A_{12}=1/C_0$, $A_{21}=-1/L$, and $A_{22}=-R/L$, while the input matrix is determined by $B_1=0$ and $B_2=1/L$.
If the output is capacitor voltage, then
\begin{align*}
y(t)=x_1(t)=1\cdot x_1(t)+0\cdot x_2(t)+0\cdot u(t),
\end{align*}
so the output row is determined by $C_1=1$, $C_2=0$, and $D=0$. If the output is resistor voltage, then
\begin{align*}
y(t)=Rx_2(t)=0\cdot x_1(t)+R\cdot x_2(t)+0\cdot u(t),
\end{align*}
so the output row is determined by $C_1=0$, $C_2=R$, and $D=0$. The same circuit therefore gives different output matrices depending on which physical voltage is measured, while the state dynamics remain the same.
[/example]
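One way to check the circuit model is to compute its equilibrium under a constant source voltage. The sketch below assumes NumPy and illustrative component values; it solves $0=Ax+Bu_0$ and evaluates both output choices. Physically one expects the capacitor to charge to the source voltage and the current to vanish, so the resistor voltage should be zero.

```python
import numpy as np

# Illustrative component values (assumptions, not from the text).
R, L, C0 = 2.0, 0.5, 0.25
A = np.array([[0.0, 1.0 / C0], [-1.0 / L, -R / L]])
B = np.array([[0.0], [1.0 / L]])
C_cap = np.array([[1.0, 0.0]])   # output: capacitor voltage
C_res = np.array([[0.0, R]])     # output: resistor voltage

u0 = 3.0
# Equilibrium for constant input: 0 = A x + B u0.
x_eq = np.linalg.solve(A, -B * u0).ravel()
y_cap = (C_cap @ x_eq).item()
y_res = (C_res @ x_eq).item()
print(x_eq, y_cap, y_res)
```

Both outputs are computed from the same equilibrium state, since only the output row changes between the two measurement choices.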
These examples illustrate that the state variables are not arbitrary bookkeeping. A useful state contains enough information to predict future motion once the future input is known.
## Matrix Exponentials and Solution Operators
Once a system is written in state-space form, the next question is how the state at time $t$ depends on the initial state and the input history. For the unforced equation $\dot{x}=Ax$, this requires a replacement for the scalar exponential solution of $\dot{x}=ax$.
[definition: Matrix Exponential]
For $A\in \mathbb R^{n\times n}$, the matrix exponential is the map
\begin{align*}
\exp_A:\mathbb R\to \mathbb R^{n\times n}, \qquad \exp_A(t)=e^{At}:=\sum_{k=0}^{\infty}\frac{t^kA^k}{k!}.
\end{align*}
[/definition]
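The defining series can be evaluated directly by truncation. The following sketch assumes NumPy and a hypothetical helper `expm_series`; summing a fixed number of terms is adequate for small $\|tA\|$ but is not a robust general-purpose method. It is tested on an illustrative matrix with $A^2=-I$, for which $e^{At}=\cos(t)\,I+\sin(t)\,A$, and it also checks the group property $e^{A(t+s)}=e^{At}e^{As}$ and the inverse $e^{-At}$.

```python
import numpy as np

def expm_series(A, t, terms=40):
    """Partial sum of sum_k (tA)^k / k! with a fixed number of terms."""
    n = A.shape[0]
    term = np.eye(n)
    total = np.eye(n)
    for k in range(1, terms):
        term = term @ (t * A) / k   # builds (tA)^k / k! from the previous term
        total = total + term
    return total

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # illustrative: A @ A = -I
t, s = 0.7, 0.4
E = expm_series(A, t)

rotation = np.cos(t) * np.eye(2) + np.sin(t) * A
print(np.allclose(E, rotation))
print(np.allclose(expm_series(A, t + s), E @ expm_series(A, s)))
print(np.allclose(E @ expm_series(A, -t), np.eye(2)))
```

Each check should print `True` for these values; for matrices of large norm a truncated series can lose accuracy, so this is a teaching device rather than a recommended algorithm.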
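The series converges absolutely in every matrix norm, since $\|t^kA^k/k!\|\le (|t|\,\|A\|)^k/k!$ for any submultiplicative norm and all norms on $\mathbb R^{n\times n}$ are equivalent. It defines a one-parameter family of invertible matrices, with $(e^{At})^{-1}=e^{-At}$. Since the course will use this object as a flow map, the next step is to verify that it actually solves the autonomous state equation.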
[quotetheorem:6363]
[citeproof:6363]
This theorem identifies $e^{At}$ as the solution operator for free motion. The hypothesis that $A$ is a fixed square matrix is essential: if the coefficient depends on time, the expression $e^{A(t-s)}$ no longer captures the accumulated evolution unless the coefficient matrices commute in a special way. The theorem does not say that every component decays or grows like a single scalar exponential; Jordan structure can introduce polynomial factors, and complex eigenvalues can produce oscillation. Real control systems are driven by inputs, so the next obstruction is to account for forcing that is distributed across the whole past interval rather than concentrated at the initial time.
[quotetheorem:6364]
[citeproof:6364]
The formula splits the state into the free response $e^{At}x_0$ and the forced response produced by the input. The $L^1$ hypothesis is the natural finite-total-magnitude assumption needed for the integral to be meaningful; if an input is more singular, such as an ideal impulse, it must be treated distributionally rather than as an ordinary trajectory. The formula does not claim that the current state depends only on the current input value: it depends on the whole past input history through the kernel $e^{A(t-\tau)}B$. This memory kernel is the starting point for reachability, observability, feedback design, and the impulse-response viewpoint used next.
[illustration:state-space-memory-kernel]
[example: Step Input for a Scalar System]
Let $\dot{x}(t)=ax(t)+bu(t)$ with $a,b\in\mathbb R$, initial condition $x(0)=x_0$, and constant input $u(t)=u_0$. By the *Variation of Constants Formula*, the state is
\begin{align*}
x(t)=e^{at}x_0+\int_0^t e^{a(t-\tau)}bu(\tau)\,d\tau.
\end{align*}
Since $u(\tau)=u_0$ for every $\tau$, this becomes
\begin{align*}
x(t)=e^{at}x_0+bu_0\int_0^t e^{a(t-\tau)}\,d\tau.
\end{align*}
If $a\neq 0$, compute the integral explicitly:
\begin{align*}
\int_0^t e^{a(t-\tau)}\,d\tau=\int_0^t e^{at-a\tau}\,d\tau.
\end{align*}
Because $e^{at}$ is constant with respect to $\tau$,
\begin{align*}
\int_0^t e^{at-a\tau}\,d\tau=e^{at}\int_0^t e^{-a\tau}\,d\tau.
\end{align*}
An antiderivative of $e^{-a\tau}$ is $-\frac{1}{a}e^{-a\tau}$, so
\begin{align*}
e^{at}\int_0^t e^{-a\tau}\,d\tau=e^{at}\left(-\frac{1}{a}e^{-at}+\frac{1}{a}\right).
\end{align*}
Multiplying through by $e^{at}$ gives
\begin{align*}
e^{at}\left(-\frac{1}{a}e^{-at}+\frac{1}{a}\right)=-\frac{1}{a}+\frac{e^{at}}{a}.
\end{align*}
Hence
\begin{align*}
\int_0^t e^{a(t-\tau)}\,d\tau=\frac{e^{at}-1}{a}.
\end{align*}
Substituting this into the solution formula gives
\begin{align*}
x(t)=e^{at}x_0+\frac{bu_0}{a}(e^{at}-1).
\end{align*}
If $a=0$, then $e^{a(t-\tau)}=e^0=1$ and $e^{at}=e^0=1$, so
\begin{align*}
x(t)=x_0+bu_0\int_0^t 1\,d\tau.
\end{align*}
Since $\int_0^t 1\,d\tau=t$, this gives
\begin{align*}
x(t)=x_0+bu_0t.
\end{align*}
Thus the first term is the natural motion from the initial state, while the input contributes the accumulated forcing over the interval $[0,t]$.
[/example]
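The closed-form step response can be compared with a direct numerical integration of the differential equation. The sketch below uses only the standard library; the helper names `step_response` and `rk4` and all numerical values are illustrative assumptions. It covers a stable case, the case $a=0$, and an unstable case.

```python
import math

def step_response(a, b, x0, u0, t):
    """Closed form from the example, including the a = 0 case."""
    if a == 0.0:
        return x0 + b * u0 * t
    return math.exp(a * t) * x0 + (b * u0 / a) * (math.exp(a * t) - 1.0)

def rk4(a, b, x0, u0, t, steps=2000):
    """Classical Runge-Kutta integration of x' = a x + b u0 on [0, t]."""
    f = lambda x: a * x + b * u0
    h = t / steps
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

for a in (-1.5, 0.0, 0.8):
    exact = step_response(a, 2.0, 1.0, 0.5, 3.0)
    approx = rk4(a, 2.0, 1.0, 0.5, 3.0)
    print(a, exact, approx)
```

The two columns should agree to many digits, and the $a=0$ branch is the limit of the general formula because $(e^{at}-1)/a\to t$ as $a\to 0$.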
The scalar example shows the formula in a case where the exponential can be computed directly. For larger systems, computing $e^{At}$ by summing infinitely many powers is not practical; this motivates the next theorem, which uses the characteristic polynomial to reduce the expression to finitely many matrix powers.
[quotetheorem:6365]
[citeproof:6365]
This representation explains why repeated eigenvalues and Jordan blocks produce polynomial factors multiplying exponentials. The finite-dimensional hypothesis is essential because Cayley-Hamilton is a matrix theorem; for operators on infinite-dimensional spaces, exponentials may exist without being reducible to finitely many powers. The result does not by itself diagonalize $A$ or make the coefficient functions $\alpha_k$ numerically stable to compute, especially near repeated eigenvalues. Its role here is conceptual as well as computational: it shows that all state-transition behaviour is encoded by finitely many algebraic data of $A$, which prepares the later passage from time-domain kernels to rational transfer functions.
[example: Jordan Block Exponential]
Let $A=\lambda I+N$, where $N$ is a nonzero $2\times 2$ nilpotent matrix with $N^2=0$. We compute $e^{At}$ from the matrix exponential series. Since $I$ commutes with $N$, the binomial formula applies to $\lambda I+N$. For each integer $r\ge 1$,
\begin{align*}
(\lambda I+N)^r=\sum_{j=0}^{r}\binom{r}{j}(\lambda I)^{r-j}N^j.
\end{align*}
All terms with $j\ge 2$ vanish because $N^j=N^{j-2}N^2=0$, so
\begin{align*}
(\lambda I+N)^r=(\lambda I)^r+r(\lambda I)^{r-1}N.
\end{align*}
Since $(\lambda I)^r=\lambda^r I$ and $(\lambda I)^{r-1}N=\lambda^{r-1}N$, this becomes
\begin{align*}
(\lambda I+N)^r=\lambda^r I+r\lambda^{r-1}N.
\end{align*}
Using the definition of the matrix exponential,
\begin{align*}
e^{At}=I+\sum_{r=1}^{\infty}\frac{t^r}{r!}\left(\lambda^r I+r\lambda^{r-1}N\right).
\end{align*}
Collecting the coefficients of $I$ and $N$ gives
\begin{align*}
e^{At}=\left(\sum_{r=0}^{\infty}\frac{(\lambda t)^r}{r!}\right)I+\left(\sum_{r=1}^{\infty}\frac{r\lambda^{r-1}t^r}{r!}\right)N.
\end{align*}
In the second series, $r/r!=1/(r-1)!$, hence
\begin{align*}
\sum_{r=1}^{\infty}\frac{r\lambda^{r-1}t^r}{r!}=t\sum_{r=1}^{\infty}\frac{(\lambda t)^{r-1}}{(r-1)!}.
\end{align*}
Reindexing with $q=r-1$ gives
\begin{align*}
t\sum_{r=1}^{\infty}\frac{(\lambda t)^{r-1}}{(r-1)!}=t\sum_{q=0}^{\infty}\frac{(\lambda t)^q}{q!}=te^{\lambda t}.
\end{align*}
Therefore
\begin{align*}
e^{At}=e^{\lambda t}I+te^{\lambda t}N=e^{\lambda t}(I+tN).
\end{align*}
The exponential is not merely the scalar factor $e^{\lambda t}$ times the identity: the nonzero nilpotent part contributes the polynomial factor $t$, which records the defective Jordan structure and is the basic source of transient growth in such systems.
[/example]
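Both steps of the Jordan block computation can be checked numerically. This is a minimal sketch assuming NumPy, with an illustrative choice of $\lambda$, $t$, and the nilpotent matrix $N$. It verifies the power formula for one exponent and compares partial sums of the exponential series with the closed form $e^{\lambda t}(I+tN)$.

```python
import numpy as np

lam, t = -0.5, 2.0                       # illustrative values (assumptions)
N = np.array([[0.0, 1.0], [0.0, 0.0]])   # nonzero nilpotent matrix with N @ N = 0
I = np.eye(2)
A = lam * I + N

# Power formula (lam I + N)^r = lam^r I + r lam^(r-1) N, checked for r = 5.
r = 5
power_ok = np.allclose(np.linalg.matrix_power(A, r),
                       lam**r * I + r * lam**(r - 1) * N)

# Partial sums of the exponential series versus the closed form.
term, series = I.copy(), I.copy()
for k in range(1, 40):
    term = term @ (t * A) / k
    series = series + term
closed_form = np.exp(lam * t) * (I + t * N)

print(power_ok, np.allclose(series, closed_form))
```

The off-diagonal entry of the result is $te^{\lambda t}$, the polynomial-times-exponential term that a diagonalizable matrix with the same eigenvalue would not produce.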
## Impulse Responses and Transfer Functions
The time-domain solution tells us how every input history affects the state. For input-output analysis, the question becomes: what operator sends $u$ to $y$, especially when the initial condition is zero?
The forced part of the variation-of-constants formula already has convolution form. To include direct feedthrough without leaving finite-dimensional systems, we record the ordinary convolution kernel and keep the instantaneous feedthrough term separately.
[definition: Impulse Response]
For the state-space system $(A,B,C,D)$, the dynamic impulse-response kernel is the causal matrix-valued function
\begin{align*}
h(t)=Ce^{At}B,\qquad t\ge 0.
\end{align*}
The full input-output impulse response is the pair consisting of this kernel $h$ and the direct feedthrough matrix $D$.
[/definition]
Equivalently, an ordinary input $u$ with zero initial condition produces
\begin{align*}
y(t)=\int_0^t h(t-\tau)u(\tau)\,d\tau+Du(t).
\end{align*}
Substituting the definition of $h$ gives the concrete formula
\begin{align*}
y(t)=\int_0^t Ce^{A(t-\tau)}Bu(\tau)\,d\tau+Du(t).
\end{align*}
The impulse response is the time-domain fingerprint of the input-output map. Applying the [Laplace transform](/page/Laplace%20Transform) to the causal kernel $h$ and adding the feedthrough matrix $D$ produces an algebraic object, which motivates the next definition of the transfer function.
[definition: Transfer Function]
The transfer function of the state-space system $(A,B,C,D)$ is the map
\begin{align*}
G:\rho(A)\to \mathbb C^{p\times m}, \qquad G(s)=C(sI-A)^{-1}B+D,
\end{align*}
where
\begin{align*}
\rho(A):=\{s\in\mathbb C: sI-A \text{ is invertible}\}.
\end{align*}
[/definition]
The variable $s$ is the Laplace-domain frequency variable. As written, the definition is only an algebraic formula; the control meaning is that it should be the input-output law obtained from the differential equation after transients from the initial state have been removed. The obstruction is that nonzero initial data and spectral values of $A$ produce extra terms or undefined resolvents, so the precise derivation must specify zero initial condition and $s\in\rho(A)$.
[quotetheorem:6366]
[citeproof:6366]
The zero-initial-condition hypothesis is essential because otherwise the output contains an additional term $Ce^{At}x_0$ whose Laplace transform is not determined by the input. The restriction $s\in\rho(A)$ is also essential: at spectral values of $A$, the resolvent matrix is not defined, and these singularities are exactly where poles of the transfer function arise after cancellation is accounted for. The theorem does not say that the transfer function remembers every state coordinate; coordinates that are unreachable from the input or invisible at the output can disappear from $G(s)$. Chapters 3 through 5 make this loss of internal information precise through reachability, observability, and minimal realization theory.
[example: Mass-Spring-Damper Transfer Function]
For the mass-spring-damper realization with displacement output, the state equations are $\dot{x}_1=x_2$ and $\dot{x}_2=-(k/M)x_1-(c/M)x_2+(1/M)u$, so $B=(0,1/M)^\top$, $C=(1,0)$, and $D=0$. To compute $C(sI-A)^{-1}B$, let $z=(z_1,z_2)^\top$ be the solution of $(sI-A)z=B$. Since $A_{11}=0$, $A_{12}=1$, $A_{21}=-k/M$, and $A_{22}=-c/M$, the equation $(sI-A)z=B$ is the pair
\begin{align*}
sz_1-z_2&=0,\\
\frac{k}{M}z_1+\left(s+\frac{c}{M}\right)z_2&=\frac{1}{M}.
\end{align*}
The first equation gives $z_2=sz_1$. Substituting this into the second equation gives
\begin{align*}
\frac{k}{M}z_1+\left(s+\frac{c}{M}\right)sz_1=\frac{1}{M}.
\end{align*}
Expanding the coefficient of $z_1$ gives
\begin{align*}
\left(\frac{k}{M}+s^2+\frac{c}{M}s\right)z_1=\frac{1}{M}.
\end{align*}
Combining the terms over the common denominator $M$ gives
\begin{align*}
\frac{Ms^2+cs+k}{M}z_1=\frac{1}{M}.
\end{align*}
Multiplying both sides by $M$ gives
\begin{align*}
(Ms^2+cs+k)z_1=1.
\end{align*}
Hence
\begin{align*}
z_1=\frac{1}{Ms^2+cs+k}.
\end{align*}
Since $C=(1,0)$, we have $Cz=z_1$, and since $D=0$,
\begin{align*}
G(s)=C(sI-A)^{-1}B+D=\frac{1}{Ms^2+cs+k}.
\end{align*}
Thus the state-space formula recovers the usual Laplace-domain relation from external force to displacement: the mass, damper, and spring contribute the denominator terms $Ms^2$, $cs$, and $k$, respectively.
[/example]
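As a numerical sanity check, the state-space formula can be evaluated at a test point and compared with the rational expression. This is only a sketch: the parameter values, the test point `s`, and the helper name `transfer_function` are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def transfer_function(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^{-1} B + D by solving (sI - A) z = B."""
    n = A.shape[0]
    z = np.linalg.solve(s * np.eye(n) - A, B)
    return C @ z + D

M, c, k = 2.0, 0.5, 3.0                      # illustrative mass, damping, stiffness
A = np.array([[0.0, 1.0], [-k / M, -c / M]])
B = np.array([[0.0], [1.0 / M]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

s = 1.0 + 2.0j                               # a point in the resolvent set of A
G_state_space = transfer_function(A, B, C, D, s)[0, 0]
G_formula = 1.0 / (M * s**2 + c * s + k)
```

Solving the linear system instead of forming the inverse mirrors the hand computation, which also solves $(sI-A)z=B$ directly.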
The mass-spring-damper computation starts from a physical second-order equation. A complementary construction starts with an abstract scalar differential equation and builds a state by stacking derivatives.
[example: Companion Form for a Scalar ODE]
Consider the forced scalar equation
\begin{align*}
y^{(n)}+a_{n-1}y^{(n-1)}+\cdots+a_1\dot{y}+a_0y=b_0u.
\end{align*}
Set $x_i=y^{(i-1)}$ for each $i\in\{1,\dots,n\}$, so $x_1=y$, $x_2=\dot y$, and $x_n=y^{(n-1)}$. For $1\le i<n$,
\begin{align*}
\dot{x}_i=\frac{d}{dt}y^{(i-1)}=y^{(i)}=x_{i+1}.
\end{align*}
Solving the scalar equation for $y^{(n)}$ gives
\begin{align*}
y^{(n)}=-a_{n-1}y^{(n-1)}-\cdots-a_1\dot y-a_0y+b_0u.
\end{align*}
Since $\dot{x}_n=\frac{d}{dt}y^{(n-1)}=y^{(n)}$, and since $y=x_1$, $\dot y=x_2$, ..., $y^{(n-1)}=x_n$, the last state equation is
\begin{align*}
\dot{x}_n=-a_0x_1-a_1x_2-\cdots-a_{n-1}x_n+b_0u.
\end{align*}
Thus the state matrix has $A_{i,i+1}=1$ for $1\le i<n$, last row entries $A_{n,j}=-a_{j-1}$ for $1\le j\le n$, and all other entries equal to $0$. The input matrix has $B_n=b_0$ and $B_i=0$ for $1\le i<n$, while the output matrices are $C=(1,0,\dots,0)$ and $D=0$.
To compute the transfer function, let $z=(z_1,\dots,z_n)^\top$ solve $(sI-A)z=B$. The first row gives
\begin{align*}
sz_1-z_2=0,
\end{align*}
so $z_2=sz_1$. The second row gives
\begin{align*}
sz_2-z_3=0,
\end{align*}
so $z_3=sz_2=s^2z_1$. Continuing through row $n-1$ gives
\begin{align*}
z_i=s^{i-1}z_1
\end{align*}
for every $i\in\{1,\dots,n\}$. The last row of $(sI-A)z=B$ is
\begin{align*}
a_0z_1+a_1z_2+\cdots+a_{n-2}z_{n-1}+(s+a_{n-1})z_n=b_0.
\end{align*}
Substituting $z_i=s^{i-1}z_1$ into each term gives
\begin{align*}
a_0z_1+a_1sz_1+\cdots+a_{n-2}s^{n-2}z_1+(s+a_{n-1})s^{n-1}z_1=b_0.
\end{align*}
The final product expands as
\begin{align*}
(s+a_{n-1})s^{n-1}z_1=s^nz_1+a_{n-1}s^{n-1}z_1.
\end{align*}
Therefore the last-row equation becomes
\begin{align*}
\left(s^n+a_{n-1}s^{n-1}+\cdots+a_1s+a_0\right)z_1=b_0.
\end{align*}
Whenever the denominator is nonzero, division gives
\begin{align*}
z_1=\frac{b_0}{s^n+a_{n-1}s^{n-1}+\cdots+a_1s+a_0}.
\end{align*}
Since $C=(1,0,\dots,0)$, we have $Cz=z_1$, and since $D=0$,
\begin{align*}
G(s)=C(sI-A)^{-1}B+D=\frac{b_0}{s^n+a_{n-1}s^{n-1}+\cdots+a_1s+a_0}.
\end{align*}
The companion matrix stores the derivative chain in the upper rows and places the original scalar equation in the last row, so the denominator of the transfer function is exactly the differential operator polynomial from the ODE.
[/example]
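The companion construction is mechanical enough to automate. The sketch below assumes NumPy; the function name `companion_realization` and the sample coefficients are hypothetical choices for illustration.

```python
import numpy as np

def companion_realization(a, b0):
    """Build (A, B, C, D) for y^(n) + a[n-1] y^(n-1) + ... + a[0] y = b0 u."""
    n = len(a)
    A = np.zeros((n, n))
    A[np.arange(n - 1), np.arange(1, n)] = 1.0   # derivative chain: x_i' = x_{i+1}
    A[-1, :] = -np.asarray(a, dtype=float)       # last row: -a_0, ..., -a_{n-1}
    B = np.zeros((n, 1))
    B[-1, 0] = b0
    C = np.zeros((1, n))
    C[0, 0] = 1.0
    D = np.zeros((1, 1))
    return A, B, C, D

a = [6.0, 11.0, 6.0]     # s^3 + 6 s^2 + 11 s + 6 = (s + 1)(s + 2)(s + 3)
b0 = 4.0
A, B, C, D = companion_realization(a, b0)

s = 0.7 + 1.3j           # test point away from the roots -1, -2, -3
G_state_space = (C @ np.linalg.solve(s * np.eye(3) - A, B) + D)[0, 0]
G_polynomial = b0 / np.polyval([1.0] + a[::-1], s)   # highest-degree coefficient first
eigenvalues = np.sort(np.linalg.eigvals(A).real)
```

The eigenvalues of the companion matrix coincide with the roots of the denominator polynomial, which is why this form is convenient for pole computations.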
The examples each realize a prescribed input-output map through one particular choice of state. That choice is not unique: any invertible change of state coordinates produces another realization with the same transfer function. A realization is minimal when no lower-dimensional state-space system has the same transfer function; in this course, minimality will later be characterized by reachability together with observability.
Once the solution operator has been built, the natural question is how its internal dynamics behave over time. Chapter 2 focuses on the homogeneous flow $\dot{x}=Ax$, using spectra and Lyapunov ideas to separate stable behaviour from unstable or marginal modes.
# 2. Stability of Linear Dynamics
Building on the state-space solution operators from Chapter 1, linear control theory next studies the homogeneous equation $\dot{x}=Ax$ because every controlled or observed system inherits its internal behaviour from this flow. The chapter assumes the standard prerequisites from linear algebra and ordinary differential equations: eigenvalues, Jordan blocks, matrix exponentials, and existence-uniqueness for autonomous ODEs. Chapter 1 gave the solution operator $e^{tA}$ and the variation-of-constants formula; this chapter asks what the trajectories $e^{tA}x_0$ do as $t \to \infty$. The answer is both spectral and geometric: eigenvalues classify the modes, while Lyapunov functions certify decay without solving the ODE explicitly.
## Stability Notions for Linear Flows
The first question is what it should mean for the equilibrium $x=0$ of $x'=Ax$ to be stable. In applications, there are several levels of stability: small perturbations may remain small, may decay to zero, or may decay at a uniform exponential rate. These distinctions matter because feedback design usually aims for exponential decay, while systems with conserved energy can be stable without converging.
[definition: Lyapunov Stability]
Let $A \in \mathbb R^{n \times n}$. The equilibrium $0$ of $x'=Ax$ is Lyapunov stable if for every $\varepsilon>0$ there exists $\delta>0$ such that, whenever $|x_0|<\delta$, the solution $x(t)=e^{tA}x_0$ satisfies $|x(t)|<\varepsilon$ for all $t\ge 0$.
[/definition]
Lyapunov stability records bounded response to small initial errors, but it does not say that the error is corrected. To distinguish neutral motion from genuine settling, the next notion adds convergence back to the equilibrium.
[definition: Asymptotic Stability]
Let $A \in \mathbb R^{n \times n}$. The equilibrium $0$ of $x'=Ax$ is asymptotically stable if it is Lyapunov stable and there exists $r>0$ such that, for every $x_0\in\mathbb R^n$ with $|x_0|<r$, the corresponding solution satisfies $e^{tA}x_0 \to 0$ as $t\to\infty$.
[/definition]
For linear systems, local attraction propagates to every initial condition by scaling, but design and robustness estimates usually require a rate. This motivates the uniform exponential bound used throughout finite-dimensional linear control.
[definition: Exponential Stability]
Let $A \in \mathbb R^{n \times n}$. The equilibrium $0$ of $x'=Ax$ is exponentially stable if there exist constants $M\ge 1$ and $\alpha>0$ such that
\begin{align*}
|e^{tA}x_0| \le M e^{-\alpha t}|x_0|
\end{align*}
for all $t\ge 0$ and all $x_0\in\mathbb R^n$.
[/definition]
The constants $M$ and $\alpha$ separate transient amplification from asymptotic decay. The same eigenvalues may permit a large $M$ when eigenvectors are poorly conditioned, so exponential stability is not the same as monotone decay in the Euclidean norm.
[example: Stable And Marginal Oscillators]
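Let $\omega>0$ and let $A_0$ be the linear map on $\mathbb R^2$ defined by $A_0e_1=\omega e_2$ and $A_0e_2=-\omega e_1$. By linearity, these defining relations give $A_0(ae_1+be_2)=-\omega b e_1+\omega a e_2$. Applying $A_0$ once more gives $A_0^2(ae_1+be_2)=-\omega^2(ae_1+be_2)$, and hence $A_0^2=-\omega^2 I$. Separating even and odd powers in the exponential series,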
\begin{align*}
e^{tA_0}=\sum_{m=0}^{\infty}\frac{t^{2m}A_0^{2m}}{(2m)!}+\sum_{m=0}^{\infty}\frac{t^{2m+1}A_0^{2m+1}}{(2m+1)!}.
\end{align*}
Since $A_0^{2m}=(-1)^m\omega^{2m}I$ and $A_0^{2m+1}=(-1)^m\omega^{2m}A_0$, this becomes
\begin{align*}
e^{tA_0}=\cos(\omega t)I+\frac{\sin(\omega t)}{\omega}A_0.
\end{align*}
Therefore, for $x_0=ae_1+be_2$,
\begin{align*}
e^{tA_0}x_0=(a\cos(\omega t)-b\sin(\omega t))e_1+(a\sin(\omega t)+b\cos(\omega t))e_2.
\end{align*}
Taking the squared Euclidean norm gives
\begin{align*}
|e^{tA_0}x_0|^2=(a\cos(\omega t)-b\sin(\omega t))^2+(a\sin(\omega t)+b\cos(\omega t))^2.
\end{align*}
Expanding the two squares,
\begin{align*}
|e^{tA_0}x_0|^2=a^2\cos^2(\omega t)-2ab\cos(\omega t)\sin(\omega t)+b^2\sin^2(\omega t)+a^2\sin^2(\omega t)+2ab\sin(\omega t)\cos(\omega t)+b^2\cos^2(\omega t).
\end{align*}
The mixed terms cancel, and $\sin^2(\omega t)+\cos^2(\omega t)=1$, so
\begin{align*}
|e^{tA_0}x_0|^2=a^2+b^2=|x_0|^2.
\end{align*}
Given $\varepsilon>0$, choosing $\delta=\varepsilon$ gives $|e^{tA_0}x_0|=|x_0|<\varepsilon$ for all $t\ge 0$, so the equilibrium is Lyapunov stable. It is not asymptotically stable, because for $x_0=e_1$ the identity above gives $|e^{tA_0}e_1|=1$ for every $t\ge 0$, so $e^{tA_0}e_1$ cannot converge to $0$.
If damping is added by setting $A_\gamma=A_0-\gamma I$ with $\gamma>0$, then $A_0$ and $-\gamma I$ commute, so the exponential of the sum factors:
\begin{align*}
e^{tA_\gamma}=e^{t(A_0-\gamma I)}=e^{tA_0}e^{-\gamma t I}=e^{-\gamma t}e^{tA_0}.
\end{align*}
Using the norm preservation already proved,
\begin{align*}
|e^{tA_\gamma}x_0|=e^{-\gamma t}|e^{tA_0}x_0|=e^{-\gamma t}|x_0|.
\end{align*}
Thus the undamped oscillator is Lyapunov stable but not asymptotically stable, while the damped oscillator is exponentially stable with $M=1$ and $\alpha=\gamma$.
[/example]
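The two norm identities can be observed numerically by evaluating the closed-form flow on a time grid. This is a minimal sketch assuming NumPy; the values of `omega`, `gamma`, and `x0` and the helper `oscillator_flow` are illustrative assumptions.

```python
import numpy as np

def oscillator_flow(omega, gamma, t):
    """Closed form e^{-gamma t} (cos(omega t) I + sin(omega t) / omega * A_0)."""
    # Matrix of A_0 in the basis (e1, e2): A_0 e1 = omega e2, A_0 e2 = -omega e1.
    A0 = np.array([[0.0, -omega], [omega, 0.0]])
    rotation = np.cos(omega * t) * np.eye(2) + np.sin(omega * t) / omega * A0
    return np.exp(-gamma * t) * rotation

omega, gamma = 3.0, 0.4                 # illustrative values
x0 = np.array([1.0, -2.0])
times = np.linspace(0.0, 10.0, 201)

undamped = np.array([np.linalg.norm(oscillator_flow(omega, 0.0, t) @ x0) for t in times])
damped = np.array([np.linalg.norm(oscillator_flow(omega, gamma, t) @ x0) for t in times])
envelope = np.exp(-gamma * times) * np.linalg.norm(x0)
```

Here `undamped` stays at $|x_0|$ for every sample time, while `damped` follows the envelope $e^{-\gamma t}|x_0|$, matching $M=1$ and $\alpha=\gamma$.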
The oscillator example shows that imaginary-axis eigenvalues are not automatically unstable. What remains is to decide whether the corresponding modes are semisimple or whether Jordan blocks create polynomial growth.
## Spectral Classification of Stability
The problem now becomes spectral: given $A$, can stability be read from the locations and Jordan structure of its eigenvalues? The matrix exponential turns each Jordan block into an exponential factor multiplied by a polynomial in $t$. Hence the real parts of eigenvalues control decay or growth, while Jordan blocks on the imaginary axis decide boundedness.
[definition: Hurwitz Matrix]
A matrix $A\in\mathbb R^{n\times n}$ is Hurwitz if every eigenvalue $\lambda\in\mathbb C$ of $A$ satisfies $\operatorname{Re}(\lambda)<0$.
[/definition]
Hurwitz matrices describe the spectral configuration expected to force decay of every mode, but stability is not determined by eigenvalue locations alone on the boundary. A mode with negative real part decays, a mode with positive real part grows, and an imaginary-axis eigenvalue is harmless only when its Jordan block introduces no polynomial factor. The classification needed here must therefore combine real parts with semisimplicity of boundary eigenvalues.
The useful stability criterion should turn these block-level alternatives into exact tests for Lyapunov stability, asymptotic stability, and exponential stability that apply to every finite-dimensional linear system at once. Such a classification also gives a benchmark for the Lyapunov methods that follow: those methods should recover decay without inspecting every Jordan block, and they can be checked against an exact spectral standard rather than against examples one at a time.
[quotetheorem:6367]
[citeproof:6367]
This theorem explains why asymptotic and exponential stability coincide for finite-dimensional linear autonomous systems. It also separates three different boundary phenomena that are easy to confuse. A pure rotation has eigenvalues on the imaginary axis and is Lyapunov stable, but it is not asymptotically stable because no energy is dissipated. A defective nilpotent block also has its spectrum on the imaginary axis, at the eigenvalue $0$, yet it is not Lyapunov stable because the nilpotent part produces polynomial growth. A single eigenvalue with positive real part gives an exponentially growing mode, so even boundedness of nearby trajectories fails.
The hypotheses therefore cannot be weakened to a statement about non-positive real parts alone. The theorem also does not say that all stable systems have decreasing Euclidean norm; non-normal Hurwitz matrices can have transient growth before the eventual exponential decay dominates. What the theorem provides is an eventual bound after choosing suitable constants, and this distinction motivates Lyapunov functions: they search for a better quadratic geometry in which the decay is visible directly.
[example: Defective Repeated Eigenvalue]
Let $A$ be the nilpotent linear map on $\mathbb R^2$ satisfying $Ae_1=0$ and $Ae_2=e_1$. For $x=ae_1+be_2$, linearity gives
\begin{align*}
A(ae_1+be_2)=aAe_1+bAe_2=be_1.
\end{align*}
Applying $A$ once more,
\begin{align*}
A^2(ae_1+be_2)=A(be_1)=bAe_1=0,
\end{align*}
so $A^2=0$.
The eigenvalue equation $Ax=\lambda x$ for $x=ae_1+be_2\ne 0$ becomes
\begin{align*}
be_1=\lambda ae_1+\lambda be_2.
\end{align*}
Comparing the $e_2$ coefficient gives $\lambda b=0$. If $b\ne0$, then $\lambda=0$. If $b=0$, then $x=ae_1$ with $a\ne0$, and the equation becomes
\begin{align*}
0=\lambda ae_1,
\end{align*}
so again $\lambda=0$. Thus the only eigenvalue is $0$. Its eigenspace is
\begin{align*}
\ker A=\{ae_1+be_2:be_1=0\}=\operatorname{span}\{e_1\}.
\end{align*}
This eigenspace is one-dimensional, while the algebraic multiplicity of the only eigenvalue is $2$, so the eigenvalue $0$ is not semisimple.
Because $A^k=0$ for every $k\ge2$, the exponential series terminates:
\begin{align*}
e^{tA}=I+tA+\sum_{k=2}^{\infty}\frac{t^kA^k}{k!}=I+tA.
\end{align*}
For the initial condition $x_0=e_2$,
\begin{align*}
e^{tA}e_2=(I+tA)e_2=e_2+tAe_2=e_2+te_1.
\end{align*}
Hence
\begin{align*}
|e^{tA}e_2|^2=|te_1+e_2|^2=t^2+1.
\end{align*}
This trajectory is unbounded as $t\to\infty$.
To see the failure of Lyapunov stability directly, fix any $\delta>0$ and choose $x_0=(\delta/2)e_2$. Then $|x_0|=\delta/2<\delta$, but
\begin{align*}
|e^{tA}x_0|=\frac{\delta}{2}|e^{tA}e_2|=\frac{\delta}{2}\sqrt{t^2+1}.
\end{align*}
For all sufficiently large $t$, this quantity exceeds $1$. Therefore no matter how small the initial neighbourhood is, some trajectory starting inside it eventually leaves the unit ball. The equilibrium is not Lyapunov stable even though the spectrum lies on the imaginary axis.
[/example]
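The escape from the unit ball can be made quantitative: solving $(\delta/2)\sqrt{t^2+1}=1$ gives the exit time explicitly. The sketch below assumes NumPy, and the names `flow` and `exit_time` and the value of `delta` are illustrative assumptions.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # A e1 = 0, A e2 = e1

def flow(t):
    """e^{tA} = I + tA, exact because A @ A = 0."""
    return np.eye(2) + t * A

def exit_time(delta, radius=1.0):
    """Time at which the trajectory from (delta / 2) e2 reaches the given radius."""
    # Solves (delta / 2) * sqrt(t^2 + 1) = radius for t >= 0; needs delta / 2 <= radius.
    return np.sqrt((2.0 * radius / delta) ** 2 - 1.0)

delta = 1e-3
x0 = np.array([0.0, delta / 2.0])
t_exit = exit_time(delta)
norm_at_exit = np.linalg.norm(flow(t_exit) @ x0)
norm_later = np.linalg.norm(flow(2.0 * t_exit) @ x0)
```

Shrinking `delta` only delays the exit, roughly like $2/\delta$; it never prevents it.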
The defective block example isolates the difference between spectral location and spectral structure. For nonlinear models, the same matrix appears as the Jacobian at an equilibrium, so we need a criterion explaining when the linear spectral picture survives small nonlinear remainders.
[quotetheorem:698]
[citeproof:698]
The indirect criterion is not a substitute for the exact spectral theorem when the system is already linear, and it is deliberately silent in the borderline case. If no eigenvalue of $Jf_{x^*}$ has positive real part but at least one lies on the imaginary axis (including $0$), the linearisation alone is inconclusive: nonlinear terms can create attraction, repulsion, or neutral motion on a centre direction. For instance, in one dimension $x'=-x^3$ is locally asymptotically stable at $0$, while $x'=x^3$ is unstable, although both have Jacobian $0$ at the equilibrium.
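The two scalar equations can be compared through their explicit solutions, obtained by separation of variables. The following sketch uses only the Python standard library; the function names and the initial value `x0` are illustrative assumptions.

```python
import math

def decaying_solution(x0, t):
    """Closed-form solution of x' = -x^3 with x(0) = x0."""
    return x0 / math.sqrt(1.0 + 2.0 * x0**2 * t)

def growing_solution(x0, t):
    """Closed-form solution of x' = x^3, valid only for t < 1 / (2 x0^2)."""
    return x0 / math.sqrt(1.0 - 2.0 * x0**2 * t)

x0 = 0.1
blow_up_time = 1.0 / (2.0 * x0**2)             # the growing solution escapes here
late_decay = decaying_solution(x0, 1.0e4)       # slow algebraic decay toward 0
near_escape = growing_solution(x0, 0.999 * blow_up_time)
```

The decay in the first equation is algebraic rather than exponential, which is consistent with a zero Jacobian: no linear rate is available, and the behaviour is decided entirely by the cubic term.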
The hypotheses in the criterion encode the mechanism of the proof. The condition $f(x^*)=0$ is needed so that $x^*$ is an equilibrium whose stability is being tested; without it, trajectories starting near $x^*$ are not being compared with a stationary solution at $x^*$. The $C^1$ assumption gives the expansion $f(x^*+\xi)=Jf_{x^*}\xi+r(\xi)$ with $|r(\xi)|/|\xi|\to 0$, so the nonlinear system is a small-o perturbation of its linearisation near the equilibrium. That small-o control is what lets a strict Hurwitz decay estimate, or a strict unstable mode, survive after restricting to a sufficiently small neighbourhood.
The limitation is important in control design. Stabilising the linearisation is a reliable first target when the closed-loop Jacobian is Hurwitz, but marginal linearisations require higher-order analysis, centre manifold reductions, or a nonlinear Lyapunov function. The next section develops the quadratic Lyapunov functions that make the Hurwitz case constructive.
## Quadratic Lyapunov Functions
A spectral test is decisive, but it may be inconvenient in computation and design. Lyapunov's second method replaces explicit solutions with an energy-like scalar function that decreases along trajectories. For linear systems, the natural candidates are quadratic forms.
[definition: Positive Definite Matrix]
A symmetric matrix $P\in\mathbb R^{n\times n}$ is positive definite, written $P>0$, if
\begin{align*}
x^\top P x>0
\end{align*}
for every $x\in\mathbb R^n\setminus\{0\}$.
[/definition]
A positive definite $P$ defines a quadratic energy $V(x)=x^\top Px$. To use this energy for stability, we must impose a condition that makes it decrease along every nonzero trajectory of $x'=Ax$.
[definition: Quadratic Lyapunov Function]
Let $A\in\mathbb R^{n\times n}$. A quadratic Lyapunov function for $x'=Ax$ is a function $V:\mathbb R^n\to\mathbb R$ of the form
\begin{align*}
V(x)=x^\top P x,
\end{align*}
where $P=P^\top>0$ and
\begin{align*}
A^\top P+PA<0.
\end{align*}
[/definition]
The matrix inequality appears by differentiating $V$ along solutions. What still has to be justified is that a decreasing quadratic quantity really controls the Euclidean size of the state and gives a uniform decay rate, rather than merely decreasing in some preferred coordinates. Positive definiteness supplies comparison with $|x|^2$, while strict negativity supplies a derivative bound strong enough to force exponential decay.
The needed bridge is a stability certificate: from the algebraic inequalities on $P$ and $A^\top P+PA$, we must obtain an actual bound on trajectories of $x'=Ax$. This is the point at which the geometry of the quadratic form is converted into exponential convergence in the original state space.
[quotetheorem:6368]
[citeproof:6368]
The theorem turns a matrix inequality into a trajectory estimate, and both strict hypotheses are doing real work. Positive definiteness of $P$ makes $V(x)=x^\top Px$ comparable to $|x|^2$ from above and below; without that lower bound, a decreasing quadratic form may ignore some nonzero directions. Strict negativity of $A^\top P+PA$ makes the derivative dominate the state size in every direction, which is what produces a uniform exponential rate.
Non-strict inequalities do not give the same conclusion. For a pure rotation, $P=I$ gives $A^\top P+PA=0$, so $V$ is conserved and no convergence follows. For $A=0$, every positive definite $P$ also gives zero derivative, again yielding stability but not decay. This gap motivates the Lyapunov equation: instead of asking for an unspecified strict inequality, prescribe a positive definite $Q$ and solve $A^\top P+PA=-Q$ to force a quantified decrease.
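The difference between strict and non-strict inequalities is easy to test numerically. The sketch below, assuming NumPy, checks the two defining conditions of a quadratic Lyapunov function through symmetric eigenvalues; the helper name `is_quadratic_lyapunov`, the tolerance, and the sample matrices are assumptions made for illustration.

```python
import numpy as np

def is_quadratic_lyapunov(A, P, tol=1e-12):
    """Check P = P^T > 0 and A^T P + P A < 0 using symmetric eigenvalues."""
    if not np.allclose(P, P.T):
        return False
    S = A.T @ P + P @ A
    S = 0.5 * (S + S.T)                      # remove rounding asymmetry before eigvalsh
    p_min = np.linalg.eigvalsh(P).min()
    s_max = np.linalg.eigvalsh(S).max()
    return bool(p_min > tol and s_max < -tol)

omega, gamma = 3.0, 0.4                      # illustrative values
A_rotation = np.array([[0.0, -omega], [omega, 0.0]])
A_damped = A_rotation - gamma * np.eye(2)
identity = np.eye(2)

rotation_certified = is_quadratic_lyapunov(A_rotation, identity)      # A^T + A = 0
damped_certified = is_quadratic_lyapunov(A_damped, identity)          # A^T + A = -2 gamma I
zero_certified = is_quadratic_lyapunov(np.zeros((2, 2)), identity)    # derivative is 0
```

Only the damped case passes. A failed check with one particular $P$ does not by itself show that no certificate exists; for the rotation and for $A=0$ that conclusion comes from the lack of decay noted above.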
## Lyapunov Equations
The remaining question is how to find $P$. Instead of searching directly for a strict inequality, we prescribe a positive definite matrix $Q$ and solve a linear matrix equation. This is the continuous-time Lyapunov equation.
[definition: Continuous-Time Lyapunov Equation]
Let $A,Q\in\mathbb R^{n\times n}$ with $Q=Q^\top$. The continuous-time Lyapunov equation for $A$ and $Q$ is
\begin{align*}
A^\top P+PA=-Q,
\end{align*}
where the unknown is a symmetric matrix $P=P^\top$.
[/definition]
Solving this equation with $Q>0$ would produce the strict inequality needed above, but it is not automatic that such a symmetric positive definite solution exists. The obstruction is spectral: if $A$ has nondecaying modes, the accumulated energy represented by the Lyapunov integral can fail to converge or can miss directions. The useful criterion must say exactly when prescribing $Q>0$ leads to a valid positive definite certificate.
[quotetheorem:6369]
[citeproof:6369]
This theorem is a bridge between eigenvalue placement and convex certificates. The Hurwitz hypothesis is essential, not a technical convenience. If $A=0$ and $Q>0$, the equation becomes $0=-Q$, which has no solution. If $A$ is a pure rotation and $Q=I$, a solution $P$ would make $V(x)=x^\top Px$ satisfy $\dot V=-|x|^2=-|x_0|^2$ along the norm-preserving rotating flow, so $V$ would decrease without bound on a bounded orbit, an impossibility; equivalently, the integral formula fails to converge.
The theorem also does not supply a positive definite certificate when the data are weakened. If $A$ is not Hurwitz, a solution may fail to exist or may fail to be positive definite. If $Q$ is only semidefinite, the equation can miss undamped directions and may certify decay only on the directions observed by $Q$. Requiring $Q>0$ and $A$ Hurwitz gives the exact setting in which solving a linear matrix equation produces the strict Lyapunov inequality needed above. In later control synthesis, the same certificate becomes a robustness margin: if $A^\top P+PA\le -Q$ with $Q>0$, then sufficiently small model perturbations still leave a negative derivative after the perturbation terms are estimated against $P$ and $Q$.
[example: Constructing A Lyapunov Matrix By Integration]
Let $A=\operatorname{diag}(-1,-2)$ and $Q=I$. The exponential of a diagonal matrix is obtained by exponentiating its diagonal entries, so
\begin{align*}
e^{tA}=\operatorname{diag}(e^{-t},e^{-2t}).
\end{align*}
Using the integral formula from the *[Lyapunov Equation Theorem For Hurwitz Matrices](/theorems/6369)*,
\begin{align*}
P=\int_0^\infty e^{tA^\top}Qe^{tA}\,dt.
\end{align*}
Here $A=A^\top$ and $Q=I$, hence
\begin{align*}
e^{tA^\top}Qe^{tA}=\operatorname{diag}(e^{-t},e^{-2t})I\operatorname{diag}(e^{-t},e^{-2t}).
\end{align*}
Multiplication by $I$ leaves the diagonal matrix unchanged, and multiplying diagonal matrices multiplies corresponding diagonal entries, so
\begin{align*}
e^{tA^\top}Qe^{tA}=\operatorname{diag}(e^{-2t},e^{-4t}).
\end{align*}
Therefore
\begin{align*}
P=\int_0^\infty \operatorname{diag}(e^{-2t},e^{-4t})\,dt.
\end{align*}
Integrating entry by entry gives
\begin{align*}
P=\operatorname{diag}\left(\int_0^\infty e^{-2t}\,dt,\int_0^\infty e^{-4t}\,dt\right).
\end{align*}
The first integral is
\begin{align*}
\int_0^\infty e^{-2t}\,dt=\left[-\frac12 e^{-2t}\right]_0^\infty=\frac12,
\end{align*}
and the second is
\begin{align*}
\int_0^\infty e^{-4t}\,dt=\left[-\frac14 e^{-4t}\right]_0^\infty=\frac14.
\end{align*}
Thus
\begin{align*}
P=\operatorname{diag}\left(\frac12,\frac14\right).
\end{align*}
Now verify the Lyapunov equation explicitly. Since $A=A^\top$,
\begin{align*}
A^\top P+PA=AP+PA.
\end{align*}
The two products are
\begin{align*}
AP=\operatorname{diag}(-1,-2)\operatorname{diag}\left(\frac12,\frac14\right)=\operatorname{diag}\left(-\frac12,-\frac12\right)
\end{align*}
and
\begin{align*}
PA=\operatorname{diag}\left(\frac12,\frac14\right)\operatorname{diag}(-1,-2)=\operatorname{diag}\left(-\frac12,-\frac12\right).
\end{align*}
Adding them gives
\begin{align*}
A^\top P+PA=\operatorname{diag}\left(-\frac12,-\frac12\right)+\operatorname{diag}\left(-\frac12,-\frac12\right)=\operatorname{diag}(-1,-1)=-I.
\end{align*}
The matrix $P$ is symmetric and its diagonal entries $\frac12$ and $\frac14$ are positive, so $P>0$. Hence $V(x)=x^\top Px$ is a quadratic Lyapunov function for $x'=Ax$, and the entries of $P$ record the accumulated future energy in the two exponentially decaying coordinate modes.
[/example]
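For matrices that are not diagonal, the integral is rarely computed by hand. One standard alternative is to rewrite the Lyapunov equation as an ordinary linear system for the column-stacked entries of $P$, using the identity $\operatorname{vec}(AXB)=(B^\top\otimes A)\operatorname{vec}(X)$. The sketch below assumes NumPy; the function name `solve_lyapunov` and the second test matrix are illustrative assumptions, and this dense approach is only sensible for small $n$.

```python
import numpy as np

def solve_lyapunov(A, Q):
    """Solve A^T P + P A = -Q through the Kronecker (vectorized) linear system."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(I, A.T) + np.kron(A.T, I)        # acts on the column-stacked vec(P)
    vec_P = np.linalg.solve(K, -Q.flatten(order="F"))
    P = vec_P.reshape((n, n), order="F")
    return 0.5 * (P + P.T)                        # symmetrize away rounding error

A = np.diag([-1.0, -2.0])
Q = np.eye(2)
P = solve_lyapunov(A, Q)                          # compare with diag(1/2, 1/4)

# A non-normal Hurwitz matrix chosen for illustration (not from the text).
A2 = np.array([[-1.0, 4.0], [0.0, -2.0]])
P2 = solve_lyapunov(A2, Q)
residual2 = A2.T @ P2 + P2 @ A2 + Q
```

The Kronecker matrix is invertible exactly when no two eigenvalues of $A$ sum to zero, which holds in particular when $A$ is Hurwitz.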
The integral construction also shows why a Lyapunov matrix should be interpreted as accumulated future energy. If $Q$ weights the instantaneous state size, then $x_0^\top Px_0$ equals the total $Q$-weighted energy of the unforced trajectory:
\begin{align*}
x_0^\top Px_0=\int_0^\infty (e^{tA}x_0)^\top Q(e^{tA}x_0)\,dt.
\end{align*}
This interpretation will reappear in linear-quadratic regulation, where the matrix solving a Riccati equation plays the same role for optimally controlled trajectories.
The same energy viewpoint also connects the chapter to mechanics. For an undamped conservative system, a quadratic Hamiltonian is typically conserved and gives Lyapunov stability without attraction. Adding damping or feedback changes the energy balance from conservation to dissipation, and the Lyapunov equation is the linear algebraic form of that dissipative estimate.
After understanding how the free dynamics evolve, we ask which of those states can actually be generated by an input. Chapter 3 takes the state-transition machinery from Chapter 1 and uses it to define reachability and controllability on finite time intervals.
# 3. Controllability and Reachability
This chapter asks which states of a linear system can be produced by choosing an input, and how to construct such an input on a prescribed time interval. The previous chapters treated the unforced dynamics $\dot{x}=Ax$ and the solution operator $e^{At}$; here the input term $Bu$ becomes the object of study. The central theme is that a dynamical question about steering trajectories is equivalent to finite-dimensional linear algebra involving $A$ and $B$.
We work with the continuous-time, finite-dimensional system
\begin{align*}
\dot{x}(t) &= Ax(t)+Bu(t), \qquad x(t)\in \mathbb R^n,\quad u(t)\in \mathbb R^m,
\end{align*}
where $A\in \mathbb R^{n\times n}$ and $B\in \mathbb R^{n\times m}$. Unless stated otherwise, controls are taken in $L^2([0,T];\mathbb R^m)$ on finite horizons $T>0$.
## Reachable Subspaces on a Finite Horizon
The first problem is constructive: starting from $x(0)=0$, which terminal states can we obtain at time $T$? The variation-of-constants formula from Chapter 1 gives the terminal state as a linear function of the whole input signal, so reachability is a question about the range of a [bounded linear operator](/page/Bounded%20Linear%20Operator)
\begin{align*}
\mathcal L_T : L^2([0,T];\mathbb R^m) &\to \mathbb R^n, & \mathcal L_Tu&=\int_0^{\!T} e^{A(T-s)}Bu(s)\,ds.
\end{align*}
The reachable set is precisely $\operatorname{Range}(\mathcal L_T)$.
[definition: Reachable Set on a Horizon]
For $T>0$, the reachable set from the origin at time $T$ is
\begin{align*}
\mathcal R_T := \left\{\int_0^{\!T} e^{A(T-s)}Bu(s)\,ds : u\in L^2([0,T];\mathbb R^m)\right\}\subset \mathbb R^n.
\end{align*}
[/definition]
The set $\mathcal R_T$ is a linear subspace because the input-to-state map is linear. The dependence on $T$ looks analytic at first, but the theorem below shows that for time-invariant systems the subspace is determined by finitely many columns.
[quotetheorem:6370]
[citeproof:6370]
This result turns the reachable set into a computable object. Instead of searching over all input functions, we compute the span generated by applying the drift matrix $A$ repeatedly to the actuator directions in $B$. The finite-dimensional and time-invariant hypotheses are doing real work here: Cayley-Hamilton reduces all powers of $A$ to the first $n$ powers, and the same fixed matrices $A$ and $B$ govern the whole interval. For a time-varying system $\dot{x}=A(t)x+B(t)u$, the reachable directions depend on the transition matrix and on the full time profile of $B(t)$, so there need not be a single finite matrix $[B,AB,\dots,A^{n-1}B]$ that captures reachability.
[example: Time-Varying Actuator Not Captured by a Fixed Kalman Matrix]
Take $A(t)=0$ on $\mathbb R^2$ and let $B(t)=(1,t)^\top$ on $[0,1]$. Starting from the origin, a scalar input $u$ gives terminal state
\begin{align*}
x(1)=\int_0^1 B(t)u(t)\,dt=\int_0^1 (1,t)^\top u(t)\,dt=\left(\int_0^1 u(t)\,dt,\int_0^1 t\,u(t)\,dt\right)^\top.
\end{align*}
For $u_1(t)=1$, this becomes
\begin{align*}
x_1(1)=\left(\int_0^1 1\,dt,\int_0^1 t\,dt\right)^\top=\left(1,\frac12\right)^\top.
\end{align*}
For $u_2(t)=t-\frac12$, the first coordinate is
\begin{align*}
\int_0^1 \left(t-\frac12\right)\,dt=\left[\frac{t^2}{2}-\frac{t}{2}\right]_0^1=0,
\end{align*}
and the second coordinate is
\begin{align*}
\int_0^1 t\left(t-\frac12\right)\,dt=\int_0^1 \left(t^2-\frac{t}{2}\right)\,dt=\left[\frac{t^3}{3}-\frac{t^2}{4}\right]_0^1=\frac13-\frac14=\frac1{12}.
\end{align*}
Thus
\begin{align*}
x_2(1)=\left(0,\frac1{12}\right)^\top.
\end{align*}
The two terminal vectors are linearly independent: if
\begin{align*}
a\left(1,\frac12\right)^\top+b\left(0,\frac1{12}\right)^\top=(0,0)^\top,
\end{align*}
then the first coordinate gives $a=0$, and then the second coordinate gives $b/12=0$, so $b=0$. Hence the reachable subspace contains two independent vectors in $\mathbb R^2$, and therefore it is all of $\mathbb R^2$.
By contrast, if one freezes the system to a fixed pair with $A=0$ and one actuator column $B=b\in\mathbb R^2$, then
\begin{align*}
AB=0b=0.
\end{align*}
The corresponding Kalman matrix has columns $b$ and $0$, so its range is $\operatorname{span}\{b\}$, which has dimension at most $1$. The variation of the actuator direction from $(1,0)^\top$ at $t=0$ to $(1,1)^\top$ at $t=1$ creates reachability that no frozen single-column Kalman matrix with $A=0$ can record.
[/example]
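The two terminal states can be reproduced by numerical quadrature. This is a sketch assuming NumPy; Gauss-Legendre quadrature is used because it integrates these polynomial integrands exactly up to rounding, and the helper name `terminal_state` is a hypothetical choice.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(8)
t = 0.5 * (nodes + 1.0)          # map quadrature nodes from [-1, 1] to [0, 1]
w = 0.5 * weights

def terminal_state(u):
    """x(1) = integral_0^1 (1, t)^T u(t) dt for A(t) = 0 and B(t) = (1, t)^T."""
    values = u(t)
    return np.array([np.sum(w * values), np.sum(w * t * values)])

x_first = terminal_state(lambda s: np.ones_like(s))    # input u_1(t) = 1
x_second = terminal_state(lambda s: s - 0.5)           # input u_2(t) = t - 1/2
reachable_rank = np.linalg.matrix_rank(np.column_stack([x_first, x_second]))
```

A rank of 2 confirms that these two inputs already span the plane, in contrast with any frozen single-column pair with $A=0$.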
The theorem also does not produce a steering control. It identifies the terminal states that are possible under unconstrained $L^2$ controls, but it says nothing about input bounds, sign constraints, or the energy required to reach a particular point. For instance, if the span is a proper subspace, no choice of unconstrained control reaches a target outside it; if the span is all of $\mathbb R^n$, a further construction is still needed to find a control for a specified target.
[example: Double Integrator Reachable Subspace]
Consider the double integrator with state $x=(x_1,x_2)^\top$. Its dynamics can be written as $\dot{x}(t)=Ax(t)+Bu(t)$, where $B=e_2$ and $A$ is the linear map satisfying $Ae_1=0$ and $Ae_2=e_1$. Therefore
\begin{align*}
AB=Ae_2=e_1.
\end{align*}
The reachable subspace is generated by the columns $B$ and $AB$, so in this case it is
\begin{align*}
\operatorname{span}\{B,AB\}=\operatorname{span}\{e_2,e_1\}.
\end{align*}
To check that these two vectors span all of $\mathbb R^2$, suppose
\begin{align*}
ae_2+be_1=0.
\end{align*}
Taking coordinates gives $(b,a)^\top=(0,0)^\top$, hence $b=0$ and $a=0$. Thus $e_2$ and $e_1$ are linearly independent, so
\begin{align*}
\operatorname{span}\{e_2,e_1\}=\mathbb R^2.
\end{align*}
The input acts directly only on velocity through $B=e_2$, but the coupling $\dot{x}_1=x_2$ turns velocity actuation into reachability of the position direction $e_1$.
[/example]
The reachable subspace describes where motion is possible, but not yet which input realizes a desired terminal point. The next object encodes both reachability and an energy-minimizing steering formula.
## Controllability Gramians and Steering Controls
Once a target $x_1$ lies in $\mathcal R_T$, we need an actual control $u$ satisfying $x(T)=x_1$. Among all such controls, the most natural choice in an $L^2$ framework is the one with minimum input energy. This leads to a symmetric matrix obtained by composing the reachability operator with its adjoint.
[definition: Controllability Gramian]
For $T>0$, the finite-horizon controllability Gramian is
\begin{align*}
W_T := \int_0^{\!T} e^{As}BB^\top e^{A^\top s}\,ds \in \mathbb R^{n\times n}.
\end{align*}
[/definition]
The change of variables $s\mapsto T-s$ shows that the same range information is obtained from the kernel $e^{A(T-s)}B$. Since the Gramian is symmetric and positive semidefinite, the next question is whether its null space is precisely the set of state directions invisible to all steering attempts.
[quotetheorem:6371]
[citeproof:6371]
The criterion says that full reachability is the same as invertibility of a concrete matrix. When $W_T$ is singular, the conclusion is not that the system has no motion; rather, motion is confined to the subspace $\operatorname{Range}(W_T)$. For example, if $A=0$ and $B=e_1$ in $\mathbb R^2$, then the only nonzero entry of the Gramian is $(W_T)_{11}=T$, so the first coordinate is reachable and the second coordinate is not. This interpretation uses the finite-dimensional fact that a symmetric positive semidefinite matrix satisfies $\operatorname{Range}(W_T)=(\ker W_T)^\bot$; in more general operator settings, range-closure issues can enter.
A criterion alone does not steer the system, because invertibility tells us that a control exists without identifying which signal to apply. The next result turns the range criterion into an explicit synthesis formula: the inverse Gramian selects the state-space vector whose adjoint reachability image supplies the desired terminal state. This is the point where the Hilbert-space geometry of least-squares projection becomes a concrete input law.
[quotetheorem:6372]
[citeproof:6372]
The nonsingularity assumption is exactly what permits the formula to be written for every target $x_1\in\mathbb R^n$. If $W_T$ is singular but $x_1\in\operatorname{Range}(W_T)$, steering is still possible, but $W_T^{-1}x_1$ must be replaced by some solution $y$ of $W_Ty=x_1$, such as the minimum-norm solution $y=W_T^{+}x_1$ given by the Moore-Penrose pseudoinverse. If $x_1\notin\operatorname{Range}(W_T)$, no unconstrained $L^2$ control reaches the target from the origin.
For nonzero initial states, steering from $x(0)=x_0$ to $x(T)=x_1$ is reduced to the origin problem by subtracting the free response. The target for the controlled part is
\begin{align*}
x_1-e^{AT}x_0.
\end{align*}
The same Gramian formula applies to this displayed target whenever the system is fully controllable. The formula concerns unconstrained controls; it does not account for amplitude bounds, saturation, positivity constraints, or other admissibility restrictions. Among unconstrained $L^2$ controls, the Gramian construction is the Hilbert-space analogue of a least-squares solution and gives the minimum-energy steering control when $W_T$ is nonsingular.
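As a sketch of why this reduction works, assume the steering formula has the form used in the double integrator example, namely a control $B^\top e^{A^\top(T-t)}\eta$ with $\eta$ obtained from the inverse Gramian. The substitution $s=T-t$ rewrites the Gramian as $\int_0^{\!T} e^{A(T-t)}BB^\top e^{A^\top(T-t)}\,dt$, so for nonsingular $W_T$ the candidate control is
\begin{align*}
u(t)=B^\top e^{A^\top(T-t)}W_T^{-1}\bigl(x_1-e^{AT}x_0\bigr),
\end{align*}
and the variation-of-constants formula gives
\begin{align*}
x(T)&=e^{AT}x_0+\int_0^{\!T} e^{A(T-t)}Bu(t)\,dt\\
&=e^{AT}x_0+\left(\int_0^{\!T} e^{A(T-t)}BB^\top e^{A^\top(T-t)}\,dt\right)W_T^{-1}\bigl(x_1-e^{AT}x_0\bigr)\\
&=e^{AT}x_0+W_TW_T^{-1}\bigl(x_1-e^{AT}x_0\bigr)=x_1.
\end{align*}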
[example: Minimum-Energy Steering for the Double Integrator]
For the double integrator, $A^2=0$, so $e^{As}=I+sA$. Since $B=e_2$ and $Ae_2=e_1$, we have
\begin{align*}
e^{As}B=(I+sA)e_2=e_2+s e_1=(s,1)^\top.
\end{align*}
Thus $W_T=\int_0^{\!T} (e^{As}B)(e^{As}B)^\top\,ds$, so its entries are
\begin{align*}
(W_T)_{11}=\int_0^{\!T} s^2\,ds=T^3/3.
\end{align*}
\begin{align*}
(W_T)_{12}=(W_T)_{21}=\int_0^{\!T} s\,ds=T^2/2.
\end{align*}
\begin{align*}
(W_T)_{22}=\int_0^{\!T} 1\,ds=T.
\end{align*}
The determinant is
\begin{align*}
\det W_T=(T^3/3)T-(T^2/2)^2=T^4/3-T^4/4=T^4/12.
\end{align*}
Since $T>0$, this determinant is nonzero, so $W_T$ is invertible. Its inverse has entries
\begin{align*}
(W_T^{-1})_{11}=12/T^3.
\end{align*}
\begin{align*}
(W_T^{-1})_{12}=(W_T^{-1})_{21}=-6/T^2.
\end{align*}
\begin{align*}
(W_T^{-1})_{22}=4/T.
\end{align*}
For a target terminal state $y=(p,v)^\top$, set
\begin{align*}
\alpha=12p/T^3-6v/T^2.
\end{align*}
\begin{align*}
\beta=-6p/T^2+4v/T.
\end{align*}
Because $B^\top e^{A^\top(T-t)}=(T-t,1)$, the Gramian steering formula gives
\begin{align*}
u(t)=(T-t)\alpha+\beta.
\end{align*}
Substituting this control into the terminal-state formula and writing $r=T-t$, the first coordinate of $x(T)$ is
\begin{align*}
\int_0^{\!T} r(\alpha r+\beta)\,dr=\alpha T^3/3+\beta T^2/2.
\end{align*}
Using the displayed values of $\alpha$ and $\beta$,
\begin{align*}
\alpha T^3/3+\beta T^2/2=(12p/T^3-6v/T^2)T^3/3+(-6p/T^2+4v/T)T^2/2=4p-2Tv-3p+2Tv=p.
\end{align*}
The second coordinate is
\begin{align*}
\int_0^{\!T} (\alpha r+\beta)\,dr=\alpha T^2/2+\beta T.
\end{align*}
Again substituting $\alpha$ and $\beta$ gives
\begin{align*}
\alpha T^2/2+\beta T=(12p/T^3-6v/T^2)T^2/2+(-6p/T^2+4v/T)T=6p/T-3v-6p/T+4v=v.
\end{align*}
Hence $x(T)=(p,v)^\top$, so every terminal position and velocity is reachable, and the affine-in-time input $u(t)=(T-t)\alpha+\beta$ is the corresponding Gramian steering control.
[/example]
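A numerical sanity check of this example can be sketched as follows. The horizon $T=2$ and the target $(p,v)=(1,-0.5)$ are arbitrary illustrative values, NumPy is assumed to be available, and a classical fourth-order Runge-Kutta loop stands in for any ODE integrator.

```python
import numpy as np

T = 2.0
target = np.array([1.0, -0.5])            # hypothetical (p, v)

# Closed-form Gramian of the double integrator on [0, T].
W = np.array([[T**3 / 3, T**2 / 2],
              [T**2 / 2, T]])
alpha, beta = np.linalg.solve(W, target)  # (alpha, beta) = W^{-1} (p, v)


def u(t):
    """Affine-in-time Gramian steering control."""
    return (T - t) * alpha + beta


def f(t, x):
    """Double integrator: x1' = x2, x2' = u."""
    return np.array([x[1], u(t)])


x = np.zeros(2)                           # start at the origin
steps = 2000
h = T / steps
for k in range(steps):
    t = k * h
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Input energy: the integral of u^2 equals target^T W^{-1} target.
energy = target @ np.array([alpha, beta])
print(x, energy)
```

The simulated terminal state should agree with the target up to integration error, and `energy` evaluates the quadratic form $x_1^\top W_T^{-1}x_1$ for this target.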
The Gramian criterion is powerful when a horizon is fixed and energy matters. For structural analysis, however, it is often faster to test the finite controllability matrix directly.
## The Kalman Rank Condition
The main structural question is whether every state direction can be generated by repeated interaction between the dynamics $A$ and the input directions $B$. Since the reachable subspace formula already identifies the relevant span, controllability becomes a rank condition.
[definition: Controllability]
The pair $(A,B)$ is controllable if for some $T>0$ the reachable set satisfies
\begin{align*}
\mathcal R_T=\mathbb R^n.
\end{align*}
[/definition]
For time-invariant finite-dimensional systems, the phrase "for some $T>0$" may be replaced by "for every $T>0$". The definition is phrased in terms of reachable terminal states, which is hard to check directly because it quantifies over all input functions. The finite-dimensional obstruction is whether the directions produced by $B,AB,\dots,A^{n-1}B$ span the whole state space; if they do not, the component of the state along any direction orthogonal to their span cannot be assigned from the origin.
[quotetheorem:6373]
[citeproof:6373]
This theorem is the standard first test for controllability. Its interpretation is not merely algebraic: $B$ gives the directly actuated directions, $AB$ gives directions reached after one interaction with the dynamics, and higher powers describe further propagation through the state coupling. Rank deficiency gives a concrete obstruction: if a nonzero vector $q$ is orthogonal to every column of the controllability matrix, then every reachable terminal state satisfies $q^\top x(T)=0$, so the component in the $q$-direction cannot be assigned from the origin. The time-varying actuator example above also marks the boundary of the theorem: reachability there comes from the changing column $B(t)$, not from repeated multiplication by one fixed matrix $A$.
[illustration:controllability-propagation-chain]
The test is also a structural, unconstrained-input statement. It does not say that a target can be reached with bounded amplitude, with nonnegative inputs, or with good numerical conditioning. In computations, the controllability matrix may have full rank while being nearly rank deficient, meaning the system is theoretically controllable but some directions require very large input energy over the chosen horizon.
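The gap between full rank and near rank deficiency can be illustrated with singular values. The sketch below uses a hypothetical two-mode system whose eigenvalues differ by a small assumed gap `eps`; the pair is controllable for every nonzero `eps`, yet the controllability matrix is almost singular.

```python
import numpy as np

eps = 1e-6                                 # hypothetical gap between the two modes
A = np.diag([1.0, 1.0 + eps])
B = np.array([[1.0], [1.0]])

ctrb = np.hstack([B, A @ B])               # [B, AB] for n = 2
rank = np.linalg.matrix_rank(ctrb)
sigma = np.linalg.svd(ctrb, compute_uv=False)   # singular values, descending

print(rank, sigma, sigma[0] / sigma[-1])
```

The rank is full, but the ratio of largest to smallest singular value is large, which is one way to quantify how close the pair is to losing controllability.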
[example: Actuator Placement in a Three-State Chain]
Let $A\in\mathbb R^{3\times 3}$ be defined on the standard basis by $Ae_1=0$, $Ae_2=e_1$, and $Ae_3=e_2$. First place the scalar actuator in the third state, so $B=e_3$. Then
\begin{align*}
B=e_3.
\end{align*}
Also,
\begin{align*}
AB=Ae_3=e_2.
\end{align*}
Applying $A$ once more gives
\begin{align*}
A^2B=A(AB)=Ae_2=e_1.
\end{align*}
Thus the controllability matrix has columns $e_3$, $e_2$, and $e_1$. If
\begin{align*}
a e_3+b e_2+c e_1=0,
\end{align*}
then the left side has coordinates $(c,b,a)^\top$, so
\begin{align*}
(c,b,a)^\top=(0,0,0)^\top.
\end{align*}
Hence $c=0$, $b=0$, and $a=0$. The columns $e_3,e_2,e_1$ are linearly independent, so the controllability matrix has rank $3$. By the *[Kalman Controllability Rank Theorem](/theorems/6373)*, the pair $(A,e_3)$ is controllable.
Now place the actuator in the first state, so $B=e_1$. Then
\begin{align*}
B=e_1.
\end{align*}
Since $Ae_1=0$, we get
\begin{align*}
AB=Ae_1=0.
\end{align*}
Therefore
\begin{align*}
A^2B=A(AB)=A0=0.
\end{align*}
The controllability matrix has columns $e_1$, $0$, and $0$, so its range is
\begin{align*}
\operatorname{span}\{e_1,0,0\}=\operatorname{span}\{e_1\}.
\end{align*}
This subspace has dimension $1$, not $3$, so the pair $(A,e_1)$ is not controllable by the *Kalman Controllability Rank Theorem*. The same chain dynamics transmits actuation from the third state through $e_3\mapsto e_2\mapsto e_1$, but an actuator placed at $e_1$ cannot generate the missing $e_2$ or $e_3$ directions.
[/example]
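The two actuator placements can be compared numerically. This is a minimal sketch assuming NumPy; the helper `controllability_matrix` is a hypothetical name, and the columns of `A` are the images $Ae_1,Ae_2,Ae_3$ from the example.

```python
import numpy as np


def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)


# Chain dynamics: A e1 = 0, A e2 = e1, A e3 = e2.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
e1 = np.array([[1.0], [0.0], [0.0]])
e3 = np.array([[0.0], [0.0], [1.0]])

rank_e3 = np.linalg.matrix_rank(controllability_matrix(A, e3))
rank_e1 = np.linalg.matrix_rank(controllability_matrix(A, e1))
print(rank_e3, rank_e1)
```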
Rank tests can become numerically delicate for high-dimensional systems, and they do not explicitly identify which dynamical mode is missing. The PBH test reformulates controllability in terms of eigenvectors of $A^\top$, making uncontrollable modes visible.
## The PBH Controllability Test
The final problem in this chapter is modal: can an eigenmode of the autonomous dynamics escape the influence of the input? A left eigenvector $q^\top A=\lambda q^\top$ measures a scalar component $q^\top x(t)$ of the state, and the input can affect that component only through $q^\top B$.
[quotetheorem:6374]
[citeproof:6374]
The PBH test says that uncontrollability is witnessed by a specific autonomous mode. If $q^*A=\lambda q^*$ and $q^*B=0$, then $\frac{d}{dt}\bigl(q^*x(t)\bigr)=q^*Ax(t)+q^*Bu(t)=\lambda\,q^*x(t)$, so the scalar projection $q^*x(t)$ evolves with no input term at all and the controller cannot assign that modal component. The complex formulation is essential even for real matrices, because real matrices may have non-real eigenvalues; a complex left eigenvector represents a two-dimensional real oscillatory mode through its real and imaginary parts. Thus the PBH obstruction detects both real exponential modes and complex conjugate modal pairs.
This is still a theorem about finite-dimensional linear time-invariant systems. For nonlinear systems, linearization may give useful local information but does not by itself decide global controllability. For example, the scalar nonlinear system $\dot{x}=u^2$ has linearization $\dot{\xi}=0$ with respect to the input at $u=0$, while the nonlinear system can still move monotonically to the right and cannot move left. For time-varying linear systems, the obstruction is not expressed by eigenvectors of a single fixed matrix $A$, because there is no single autonomous modal decomposition governing the whole horizon; the example with $B(t)=(1,t)^\top$ has no fixed left eigenvector test that represents the changing input direction.
[example: Uncontrollable Eigenmode Detected by a Left Eigenvector]
Let $A=\operatorname{diag}(1,2)$, $B=e_1=(1,0)^\top$, and $q=e_2=(0,1)^\top$. Since the data are real, $q^*=q^\top=(0,1)$. Multiplying on the left gives
\begin{align*}
q^*A=(0,1)\operatorname{diag}(1,2)=(0\cdot 1+1\cdot 0,\;0\cdot 0+1\cdot 2)=(0,2)=2(0,1)=2q^*.
\end{align*}
Thus $q$ is a left eigenvector of $A$ with eigenvalue $\lambda=2$. The same vector annihilates the input direction because
\begin{align*}
q^*B=(0,1)(1,0)^\top=0\cdot 1+1\cdot 0=0.
\end{align*}
Therefore there is a nonzero left eigenvector of $A$ that is invisible to $B$, so by the *[Popov-Belevitch-Hautus Controllability Test](/theorems/6374)* the pair $(A,B)$ is not controllable.
This obstruction is also visible directly from the differential equations. Writing $x=(x_1,x_2)^\top$, we have
\begin{align*}
Ax+Bu=(x_1,2x_2)^\top+(u,0)^\top=(x_1+u,2x_2)^\top.
\end{align*}
Hence
\begin{align*}
\dot{x}_1=x_1+u.
\end{align*}
\begin{align*}
\dot{x}_2=2x_2.
\end{align*}
Suppose $x_2(0)=0$. Since $\dot{x}_2=2x_2$ along every solution, regardless of the input,
\begin{align*}
\frac{d}{dt}\left(e^{-2t}x_2(t)\right)=e^{-2t}\dot{x}_2(t)-2e^{-2t}x_2(t)=e^{-2t}(2x_2(t))-2e^{-2t}x_2(t)=0.
\end{align*}
So $e^{-2t}x_2(t)=e^0x_2(0)=0$, and multiplying by $e^{2t}$ gives
\begin{align*}
x_2(t)=0
\end{align*}
for every $t$. The input can affect the first coordinate, but it never enters the eigenmode measured by $q^*x=x_2$, so terminal states with nonzero second coordinate cannot be reached from the origin.
[/example]
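The same obstruction can be found by a rank computation. The sketch below assumes the rank form of the test, $\operatorname{rank}\begin{bmatrix}\lambda I-A & B\end{bmatrix}=n$ for every eigenvalue $\lambda$ of $A$; a rank drop at $\lambda$ means some nonzero $q$ satisfies $q^*(\lambda I-A)=0$ and $q^*B=0$. The function name and the tolerance are hypothetical choices.

```python
import numpy as np


def pbh_uncontrollable_eigenvalues(A, B, tol=1e-9):
    """Eigenvalues of A at which [lambda*I - A, B] loses row rank."""
    n = A.shape[0]
    bad = []
    for lam in np.linalg.eigvals(A):
        M = np.hstack([lam * np.eye(n) - A, B])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            bad.append(lam)
    return bad


A = np.diag([1.0, 2.0])
B = np.array([[1.0], [0.0]])
bad = pbh_uncontrollable_eigenvalues(A, B)
print(bad)
```

For this pair the only flagged eigenvalue is the one belonging to the second coordinate, in agreement with the left eigenvector found by hand.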
The Kalman, Gramian, and PBH tests are three views of the same phenomenon. The Gramian is best suited to finite-horizon steering and energy; the Kalman matrix gives a finite algebraic rank test; the PBH test identifies missing eigenmodes. These viewpoints also explain why numerical linear algebra matters in control: rank-revealing factorizations, singular values of $W_T$, and conditioning of the controllability matrix measure how expensive or unstable steering computations may become. The same ideas will reappear in dual form in observability, where inputs are replaced by outputs and reachable directions are replaced by distinguishable state directions.
Controllability describes what inputs can accomplish; observability asks what can be recovered from the output. Chapter 4 makes that dual viewpoint precise, showing how output measurements reveal, or fail to reveal, the hidden state.
# 4. Observability and Duality
Observability asks the inverse question to controllability. Instead of asking which states can be reached by a suitable input, we ask which internal states can be inferred from the measured output. This chapter develops the algebraic and energy tests for observability, then shows that they are exactly the controllability tests from Chapter 3 applied to the transposed system.
Throughout the chapter we consider the finite-dimensional continuous-time linear system
\begin{align*}
\dot{x}(t) &= Ax(t) + Bu(t), & y(t) &= Cx(t) + Du(t),
\end{align*}
where $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, $C \in \mathbb R^{p \times n}$, and $D \in \mathbb R^{p \times m}$. Observability is a property of the pair $(C,A)$, because the direct feedthrough and the known input can be subtracted from the measured output.
## Indistinguishable States and Output Data
What information about $x(0)$ is contained in the curve $y(t)$ on a time interval? If the input is known, the forced part of the output is known as well, so the essential question is whether the homogeneous output $C e^{tA}x_0$ determines the initial state $x_0$.
[definition: Indistinguishable States]
For the homogeneous system $\dot{x}(t)=Ax(t)$, $y(t)=Cx(t)$, two states $x_1,x_2 \in \mathbb R^n$ are indistinguishable on $[0,T]$ if
\begin{align*}
C e^{tA}x_1 = C e^{tA}x_2 \quad \text{for every } t \in [0,T].
\end{align*}
[/definition]
Indistinguishability is a linear relation: only the difference $x_1-x_2$ matters. Thus the obstruction to state reconstruction is the set of nonzero initial states that generate zero output.
[definition: Unobservable Subspace]
For the pair $(C,A)$, the unobservable subspace is
\begin{align*}
\mathcal N_o := \{x \in \mathbb R^n : C e^{tA}x = 0 \text{ for every } t \ge 0\}.
\end{align*}
[/definition]
The pair is observable precisely when this obstruction vanishes. This makes observability a uniqueness property for the inverse problem of recovering $x(0)$ from output data.
[definition: Observable Pair]
The pair $(C,A)$ is observable if $\mathcal N_o = \{0\}$.
[/definition]
When a nonzero vector $x$ lies in $\mathcal N_o$, the outputs produced by $x_0$ and by $x_0+x$ are identical for the same known input. A sensor may therefore miss an entire mode of the internal dynamics.
[example: Hidden Mode In A Sensor Network]
Here $A=\operatorname{diag}(0,-1,-2)$ and $C=\begin{pmatrix}1&0&0\end{pmatrix}$. For an initial state $x(0)=(a,b,c)$, the diagonal form gives
\begin{align*}
e^{tA}=\operatorname{diag}(1,e^{-t},e^{-2t}).
\end{align*}
Therefore
\begin{align*}
e^{tA}x(0)=(a,e^{-t}b,e^{-2t}c).
\end{align*}
Applying the sensor,
\begin{align*}
Ce^{tA}x(0)=\begin{pmatrix}1&0&0\end{pmatrix}(a,e^{-t}b,e^{-2t}c)=a.
\end{align*}
Thus the measured output is $y(t)=a=x_1(0)$ for every $t\ge 0$, with no dependence on the decaying modes $e^{-t}x_2(0)$ and $e^{-2t}x_3(0)$.
Now $x=(a,b,c)$ lies in $\mathcal N_o$ exactly when $Ce^{tA}x=0$ for every $t\ge 0$. From the computation above,
\begin{align*}
Ce^{tA}(a,b,c)=a,
\end{align*}
so this condition is equivalent to $a=0$, while $b$ and $c$ are arbitrary. Hence
\begin{align*}
\mathcal N_o=\{(0,b,c):b,c\in\mathbb R\}.
\end{align*}
This subspace contains nonzero states, so the pair is not observable: the sensor distinguishes the first coordinate but misses the two internal decaying modes.
[/example]
This example shows that stability of an unmeasured mode is not the same as observability. A hidden mode may decay, but it still cannot be reconstructed from the sensor data.
## The Observability Matrix
How can the condition $C e^{tA}x=0$ for all $t$ be tested with finitely many linear equations? The answer comes from differentiating the output at $t=0$ and then using Cayley-Hamilton to stop at the derivatives of order $0,1,\dots,n-1$.
[definition: Observability Matrix]
The observability matrix of the pair $(C,A)$ is the linear map $\mathcal O(C,A):\mathbb R^n \to \mathbb R^{np}$ represented by the block column matrix
\begin{align*}
\mathcal O(C,A) := \operatorname{col}(C,CA,CA^2,\dots,CA^{n-1}) \in \mathbb R^{np \times n}.
\end{align*}
[/definition]
The rows of $\mathcal O(C,A)$ record the initial output derivatives. Indeed, for $x(0)=x_0$ in the homogeneous system,
\begin{align*}
\frac{d^k}{dt^k}y(0)=CA^k x_0, \qquad k \ge 0.
\end{align*}
The definition gives a finite collection of tests, but the observability problem asks for vanishing of $C e^{tA}x_0$ over a continuum of times. The next theorem proves that the first $n$ blocks contain all the information needed.
[quotetheorem:6375]
[citeproof:6375]
The rank theorem turns an infinite-time inverse problem into a finite matrix test, but the rank condition is essential rather than cosmetic. If $C=0$, then every block $CA^k$ vanishes and every initial state produces zero output, so no amount of differentiating the output can recover the state. More generally, any nonzero vector in $\ker \mathcal O(C,A)$ gives a concrete pair of indistinguishable states, namely $0$ and that vector.
The theorem also says less than a reconstruction algorithm might require. Full rank gives uniqueness in exact arithmetic with exact output data, but it does not say that reconstruction is well conditioned: the smallest singular value of $\mathcal O(C,A)$ may be very small, so small measurement errors in output derivatives can be amplified. This limitation motivates the energy-based Gramian test below, which uses output over an interval rather than finitely many derivatives at one time.
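Both the kernel and the singular values of $\mathcal O(C,A)$ can be read off from a singular value decomposition. The sketch below revisits the sensor-network pair $A=\operatorname{diag}(0,-1,-2)$, $C=\begin{pmatrix}1&0&0\end{pmatrix}$; the helper name and the threshold `1e-12` are hypothetical choices, and NumPy is assumed.

```python
import numpy as np


def observability_matrix(C, A):
    """Stack C, CA, ..., CA^(n-1) row-wise."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


A = np.diag([0.0, -1.0, -2.0])
C = np.array([[1.0, 0.0, 0.0]])
O = observability_matrix(C, A)

# Right singular vectors belonging to zero singular values span ker O.
_, s, Vt = np.linalg.svd(O)
rank = int(np.sum(s > 1e-12))
kernel_basis = Vt[rank:].T                 # columns form a basis of ker O

print(rank, s)
print(kernel_basis)
```

Every kernel vector has zero first coordinate, matching the unobservable subspace $\{(0,b,c)\}$ computed by hand.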
[example: Position-Only Measurement Of A Mechanical Oscillator]
Let $A$ be the linear map $A(q,v)=(v,-\omega^2 q)$ and let $C$ be the sensor map $C(q,v)=q$, where $\omega>0$. Then $x=(q,v)$ satisfies $\dot{x}=Ax$, and the measured output is $y=Cx=q$. The first block of the observability matrix sends $(q,v)$ to $C(q,v)=q$. For the second block, first apply $A$ and then $C$:
\begin{align*}
CA(q,v)=C(A(q,v))=C(v,-\omega^2 q)=v.
\end{align*}
Thus the observability matrix acts by
\begin{align*}
\mathcal O(C,A)(q,v)=(C(q,v),CA(q,v))=(q,v).
\end{align*}
So $\mathcal O(C,A)$ is the identity map on $\mathbb R^2$, hence $\operatorname{rank}\mathcal O(C,A)=2$. Since the state space has dimension $2$, the pair is observable by the *[Kalman Observability Rank Theorem](/theorems/6375)*.
The same calculation shows how the two initial coordinates appear in the measured output and its first derivative:
\begin{align*}
y(0)=Cx(0)=C(q(0),v(0))=q(0).
\end{align*}
Also, because $\dot{x}(0)=Ax(0)$, differentiating $y(t)=Cx(t)$ gives
\begin{align*}
\dot y(0)=C\dot{x}(0)=CAx(0)=CA(q(0),v(0))=v(0).
\end{align*}
Thus position-only measurement is enough for the ideal undamped oscillator because the dynamics transfer the unmeasured initial velocity into the first derivative of the measured position.
[/example]
The oscillator example captures the role of dynamics: an unmeasured coordinate can become visible if the state equation transfers it into a measured coordinate. The next test gives a modal, eigenvalue-by-eigenvalue version of the same idea.
[quotetheorem:6376]
[citeproof:6376]
PBH isolates unobservability at the level of modes. A mode is hidden precisely when an eigenvector lies in the kernel of the sensor map, and this is why repeated eigenvalues require particular care: seeing one direction in an eigenspace does not mean seeing the whole eigenspace. The invariant-subspace argument in the proof is what makes ordinary eigenvectors enough, even when the observability matrix kernel is first detected through a mixture of generalized modal directions.
The test is still qualitative. It identifies which modal directions are invisible, but it does not measure how strongly visible directions appear in the output or how sensitive reconstruction is to noise. For that, the rank information has to be supplemented by singular values or by the observability Gramian.
[example: Repeated Eigenvalue With One Visible Direction]
Let $A=I_2$ and $C=\begin{bmatrix}1&0\end{bmatrix}$, so for $x=(x_1,x_2)\in\mathbb R^2$ we have $Cx=x_1$. Since $A=I_2$, every nonzero vector satisfies
\begin{align*}
Ax=I_2x=x=1x.
\end{align*}
Thus the whole plane is the eigenspace for the repeated eigenvalue $1$. In particular,
\begin{align*}
A(0,1)=(0,1)
\end{align*}
and
\begin{align*}
C(0,1)=0.
\end{align*}
So the eigenvector $(0,1)$ is invisible to the sensor.
At $\lambda=1$, the PBH map is
\begin{align*}
\operatorname{col}(\lambda I-A,C)=\operatorname{col}(I_2-I_2,C)=\operatorname{col}(0,C).
\end{align*}
Applied to an arbitrary vector $(u,v)$, this gives
\begin{align*}
\operatorname{col}(0,C)(u,v)=(0,0,u).
\end{align*}
Its image is therefore
\begin{align*}
\{(0,0,u):u\in\mathbb R\},
\end{align*}
which is one-dimensional. Hence
\begin{align*}
\operatorname{rank}\operatorname{col}(I_2-A,C)=1<2.
\end{align*}
By the *PBH Observability Test*, the pair $(C,A)$ is not observable. The point is that measuring one direction in a repeated eigenspace does not measure the whole eigenspace; here the second coordinate remains hidden.
[/example]
The rank and PBH criteria are exact, but they do not yet quantify how well the state can be reconstructed from noisy or finite-energy output data. That quantitative role is played by the observability Gramian.
## The Observability Gramian
If $C e^{tA}x_0$ is the output produced by $x_0$, then the squared output energy on $[0,T]$ is a quadratic function of $x_0$. The matrix representing this quadratic form measures how much each state direction is seen by the sensors over the observation window.
[definition: Observability Gramian]
For $T>0$, the observability Gramian of $(C,A)$ on $[0,T]$ is the linear map $W_o(T):\mathbb R^n \to \mathbb R^n$ represented by
\begin{align*}
W_o(T) := \int_0^{\!T} e^{tA^\top}C^\top C e^{tA}\,dt \in \mathbb R^{n \times n}.
\end{align*}
[/definition]
For every $x_0 \in \mathbb R^n$, the Gramian satisfies
\begin{align*}
x_0^\top W_o(T)x_0 = \int_0^{\!T} |C e^{tA}x_0|^2\,dt.
\end{align*}
Thus $W_o(T)$ is positive semidefinite, and its kernel consists of the directions that produce zero output energy on the interval. The important question is whether this energy test detects exactly the same directions as the rank test.
[quotetheorem:6377]
[citeproof:6377]
The hypothesis $T>0$ matters. At $T=0$ the integral is the zero matrix, even for an observable pair, so positive definiteness cannot hold on a degenerate observation window. For every genuine interval, analyticity of $C e^{tA}x_0$ prevents a nonzero observable state from having zero output energy throughout the interval.
The Gramian gives more than a yes-or-no test, but positive definiteness still does not guarantee numerical stability. Its smallest eigenvalue controls the worst observed output energy among unit initial states, so a positive but tiny eigenvalue marks a direction that is theoretically observable but highly noise-sensitive. This is exactly the quantity that appears when one writes the explicit reconstruction formula from continuous output data.
[example: Reconstructing Initial State From Output Data]
Assume $(C,A)$ is observable, let $T>0$, and suppose the homogeneous output $y(t)=Ce^{tA}x_0$ is known for $0\le t\le T$. For each time $t$, substituting the measured output into the weighted signal gives
\begin{align*}
e^{tA^\top}C^\top y(t)=e^{tA^\top}C^\top\bigl(Ce^{tA}x_0\bigr).
\end{align*}
By associativity of matrix multiplication,
\begin{align*}
e^{tA^\top}C^\top\bigl(Ce^{tA}x_0\bigr)=e^{tA^\top}C^\top C e^{tA}x_0.
\end{align*}
Integrating this identity over $[0,T]$ gives
\begin{align*}
\int_0^{\!T} e^{tA^\top}C^\top y(t)\,dt=\int_0^{\!T} e^{tA^\top}C^\top C e^{tA}x_0\,dt.
\end{align*}
Since $x_0$ is constant with respect to $t$, it can be pulled outside the matrix integral:
\begin{align*}
\int_0^{\!T} e^{tA^\top}C^\top C e^{tA}x_0\,dt=\left(\int_0^{\!T} e^{tA^\top}C^\top C e^{tA}\,dt\right)x_0.
\end{align*}
By the definition of the observability Gramian, this becomes
\begin{align*}
\int_0^{\!T} e^{tA^\top}C^\top y(t)\,dt=W_o(T)x_0.
\end{align*}
Because $(C,A)$ is observable and $T>0$, the *[Gramian Observability Criterion](/theorems/6377)* says that $W_o(T)$ is positive definite, hence invertible. Multiplying the last identity by $W_o(T)^{-1}$ gives
\begin{align*}
W_o(T)^{-1}\int_0^{\!T} e^{tA^\top}C^\top y(t)\,dt=W_o(T)^{-1}W_o(T)x_0.
\end{align*}
Since $W_o(T)^{-1}W_o(T)=I$, we obtain
\begin{align*}
x_0=W_o(T)^{-1}\int_0^{\!T} e^{tA^\top}C^\top y(t)\,dt.
\end{align*}
Thus the observed output curve on $[0,T]$ determines the unique initial state that produced it.
[/example]
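The reconstruction formula can be tried on sampled data. The sketch below uses the position-measured oscillator, for which $Ce^{tA}=(\cos\omega t,\ \sin(\omega t)/\omega)$; the values of `omega`, `T`, and `x0_true` are hypothetical, and trapezoid quadrature is an assumed stand-in for the exact integrals.

```python
import numpy as np

omega, T = 2.0, 1.5
x0_true = np.array([0.7, -1.2])            # hypothetical initial (q, v)


def output_row(t):
    """Row vector C e^{tA} for A(q, v) = (v, -omega^2 q), C(q, v) = q."""
    return np.array([np.cos(omega * t), np.sin(omega * t) / omega])


ts = np.linspace(0.0, T, 2001)
w = np.full(ts.size, ts[1] - ts[0])        # trapezoid weights
w[0] *= 0.5
w[-1] *= 0.5

rows = np.array([output_row(t) for t in ts])   # shape (N, 2)
y = rows @ x0_true                             # noise-free output samples

W_o = (rows * w[:, None]).T @ rows         # approximates W_o(T)
rhs = (rows * w[:, None]).T @ y            # approximates the weighted output integral
x0_hat = np.linalg.solve(W_o, rhs)

print(x0_hat, np.linalg.eigvalsh(W_o))
```

Because the same quadrature weights are used on both sides, the recovered state matches `x0_true` up to rounding; with noisy samples, the smallest eigenvalue of `W_o` indicates how strongly errors are amplified.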
This reconstruction formula uses a finite observation window, while stable systems often permit all future output energy to be aggregated. The theorem below gives a clean sufficient setting: when $A$ is Hurwitz, the infinite-horizon Gramian is finite and is governed by a Lyapunov equation.
[quotetheorem:6378]
[citeproof:6378]
The Hurwitz hypothesis is what makes the all-future energy integral automatically finite for every sensor matrix $C$. For example, if $A=[1]$ and $C=[1]$, the pair is observable but
\begin{align*}
\int_0^\infty e^{2t}\,dt
\end{align*}
diverges, so the infinite-horizon Gramian does not exist despite observability. Thus the theorem is not merely an observability statement; it combines observability with stability of the forward dynamics.
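For a Hurwitz matrix the infinite-horizon Gramian can be computed without integration. The sketch below assumes the theorem's Lyapunov equation has the standard form $A^\top W+WA+C^\top C=0$ and solves it as a linear system via Kronecker products; the matrices are hypothetical illustrative data.

```python
import numpy as np

A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])                # Hurwitz: eigenvalues -1 and -2
C = np.array([[1.0, 1.0]])
n = A.shape[0]
Q = C.T @ C

# With row-major flattening, vec(A^T W) = (A^T kron I) vec(W)
# and vec(W A) = (I kron A^T) vec(W).
K = np.kron(A.T, np.eye(n)) + np.kron(np.eye(n), A.T)
W = np.linalg.solve(K, -Q.flatten()).reshape(n, n)

residual = A.T @ W + W @ A + Q
print(W)
print(np.linalg.eigvalsh(W))
```

Here the entries agree with the direct integrals $\int_0^\infty e^{(a_i+a_j)t}\,dt=-1/(a_i+a_j)$, and the positive eigenvalues reflect observability of this particular pair.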
There is also a limitation in the other direction. Non-Hurwitz systems may have finite output energy in special degenerate cases, such as unstable directions annihilated by $C$, but then the Gramian cannot be expected to be positive definite for state reconstruction. In estimation and filtering the Hurwitz assumption is therefore the natural regime where the Lyapunov equation gives a well-posed infinite-horizon measure of output information. The final section explains why every observability statement mirrors a controllability statement.
## Duality Between Controllability and Observability
Why do the observability tests look like transposed versions of controllability tests? The reason is that controllability asks whether columns generated by $B,AB,\dots,A^{n-1}B$ span the state space, while observability asks whether rows generated by $C,CA,\dots,CA^{n-1}$ separate the state space.
The finite-dimensional dual system is obtained by replacing the input matrix with the transposed output matrix and replacing $A$ with $A^\top$. Its controllability matrix is
\begin{align*}
\mathcal C(A^\top,C^\top)=\begin{bmatrix}C^\top&A^\top C^\top&\cdots&(A^\top)^{n-1}C^\top\end{bmatrix}.
\end{align*}
Transposing the observability matrix gives
\begin{align*}
\mathcal O(C,A)^\top
=\begin{bmatrix}C^\top&A^\top C^\top&\cdots&(A^\top)^{n-1}C^\top\end{bmatrix}
=\mathcal C(A^\top,C^\top).
\end{align*}
Therefore $\operatorname{rank}\mathcal O(C,A)=\operatorname{rank}\mathcal C(A^\top,C^\top)$, so $(C,A)$ is observable exactly when the dual pair $(A^\top,C^\top)$, with state matrix $A^\top$ and input matrix $C^\top$, is controllable.
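The transpose identity is easy to confirm on sample data. The sketch below draws a hypothetical random pair with a fixed seed and compares the observability matrix with the controllability matrix of the dual pair; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed, illustrative data only
n, p = 3, 2
A = rng.standard_normal((n, n))
C = rng.standard_normal((p, n))

# Observability matrix of (C, A): rows C, CA, ..., CA^(n-1).
obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
# Controllability matrix of the dual pair (A^T, C^T).
ctrb_dual = np.hstack([np.linalg.matrix_power(A.T, k) @ C.T for k in range(n)])

same_matrix = np.allclose(obsv.T, ctrb_dual)
same_rank = np.linalg.matrix_rank(obsv) == np.linalg.matrix_rank(ctrb_dual)
print(same_matrix, same_rank)
```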
Duality is not only a mnemonic, but it depends on the same finite-dimensional rank hypotheses as the Kalman tests. If $A=I_2$ and $C x=x_1$, then $\mathcal O(C,A)$ has rank $1$, and the dual controllability matrix for $(A^\top,C^\top)$ also has rank $1$; the same missing direction appears as an unobservable state in the original system and as an unreachable state in the dual system.
The principle transfers algebraic yes-or-no statements; it does not improve conditioning. Because $\mathcal O(C,A)$ and $\mathcal C(A^\top,C^\top)$ are transposes of each other, they have the same singular values, so the dual pair is controllable with a badly conditioned controllability matrix exactly when the original pair is observable with a badly conditioned observability matrix. To compare the quantitative energy interpretations, we need the corresponding Gramian identity.
[quotetheorem:6379]
[citeproof:6379]
The equality is formal, but its hypotheses and interpretation still matter. It uses the standard continuous-time finite-dimensional Gramian definitions over the same interval $[0,T]$; changing the time horizon or replacing the system by a time-varying one would require a different formula. In the example $A=I_2$ and $C x=x_1$, both the observability Gramian and the dual controllability Gramian have rank $1$, so the shared Gramian identity records the same missing direction from both viewpoints.
This identity also has a limitation: it does not make an ill-conditioned reconstruction problem well conditioned. It only says that the poor direction can be studied equivalently as weak observability of $(C,A)$ or weak controllability of $(A^\top,C^\top)$. The same transposition explains the dual PBH test: controllability rules out left eigenvectors annihilating $B$, while observability rules out right eigenvectors annihilated by $C$.
[remark: Design Consequence Of Duality]
Observer design for $(C,A)$ is algebraically dual to state-feedback design for $(A^\top,C^\top)$. In later chapters this becomes the pole-placement rule: choosing an observer gain $L$ so that $A-LC$ has desired eigenvalues is equivalent to choosing a feedback gain for the transposed pair.
[/remark]
Observability completes the structural pair with controllability. Controllability says inputs can move the state through all directions; observability says outputs distinguish all directions. A realization that has both properties is minimal from the input-output viewpoint, which is the bridge to realization theory, state estimation, and output-feedback control.
Reachability and observability are the structural tests that decide whether a realization contains redundant state variables. Chapter 5 uses them to build canonical forms and to identify when a transfer function admits a minimal realization.
# 5. Canonical Forms and Minimal Realizations
Canonical forms turn an abstract state-space realization of a transfer function into matrices with a prescribed algebraic shape. Chapters 3 and 4 established reachability and observability as intrinsic tests for whether inputs can generate states and outputs can detect states. This chapter asks how those tests behave under changes of coordinates, how companion matrices encode scalar transfer functions, and how to remove state variables that do not affect the input-output map.
## Coordinate Changes and Invariant Properties
A state vector is a coordinate description of the internal condition of a system, not a physical object with a unique basis. The first problem is to distinguish genuine system properties from features created by a particular choice of state coordinates. A singular change of variables would collapse distinct states together, so it could destroy information about reachability, observability, and transfer functions rather than merely rename coordinates.
[definition: Similar State-Space Realizations]
Let
\begin{align*}
\Sigma &: \dot{x}=Ax+Bu, \qquad y=Cx+Du
\end{align*}
be a continuous-time linear system with $x\in \mathbb R^n$, $u\in \mathbb R^m$, and $y\in \mathbb R^p$. Let $\tau:\mathbb R^n\to\mathbb R^n$ be the invertible linear map $\tau(z)=Tz$, where $T\in \mathbb R^{n\times n}$ is invertible, and write $z=T^{-1}x$. The transformed realization is
\begin{align*}
\tilde A = T^{-1}AT, \qquad \tilde B = T^{-1}B, \qquad \tilde C = CT, \qquad \tilde D = D.
\end{align*}
The two realizations are called similar.
[/definition]
Similarity says that the trajectories are the same curves written as $x(t)=Tz(t)$. To use canonical forms without changing the plant, we need to verify that the external map from $u$ to $y$ survives this coordinate change.
[quotetheorem:6380]
[citeproof:6380]
The invertibility hypothesis is doing real work: if $T$ were singular, two different states could be identified and the formula for $(sI-\tilde A)^{-1}$ need not represent a realization on the same state space. For example, projecting a two-state system onto its first coordinate can remove a mode that appears in $C(sI-A)^{-1}B$, so the transfer function may change. The theorem does not say that every realization with the same transfer function is obtained by similarity; that stronger statement requires minimality later in the chapter. What it does give is permission to choose convenient coordinates before computing canonical matrices.
Throughout the companion-form constructions, $e_j\in\mathbb R^n$ denotes the $j$th standard basis vector, with the ambient dimension determined by context.
[example: Similarity Change Preserving Transfer Function]
Consider
\begin{align*}
A_{11}=0,\quad A_{12}=1,\quad A_{21}=-2,\quad A_{22}=-3,\quad B_1=0,\quad B_2=1,\quad C_1=1,\quad C_2=0,\quad D=0.
\end{align*}
The change of coordinates is
\begin{align*}
T_{11}=1,\quad T_{12}=1,\quad T_{21}=0,\quad T_{22}=1,\quad (T^{-1})_{11}=1,\quad (T^{-1})_{12}=-1,\quad (T^{-1})_{21}=0,\quad (T^{-1})_{22}=1.
\end{align*}
First compute $AT$. Its entries are
\begin{align*}
(AT)_{11}=0\cdot 1+1\cdot 0=0,\quad (AT)_{12}=0\cdot 1+1\cdot 1=1,\quad (AT)_{21}=(-2)\cdot 1+(-3)\cdot 0=-2,\quad (AT)_{22}=(-2)\cdot 1+(-3)\cdot 1=-5.
\end{align*}
Therefore $\tilde A=T^{-1}AT$ has entries
\begin{align*}
\tilde A_{11}=1\cdot 0+(-1)(-2)=2,\quad \tilde A_{12}=1\cdot 1+(-1)(-5)=6,\quad \tilde A_{21}=0\cdot 0+1\cdot(-2)=-2,\quad \tilde A_{22}=0\cdot 1+1\cdot(-5)=-5.
\end{align*}
The transformed input and output matrices are
\begin{align*}
\tilde B_1=1\cdot 0+(-1)\cdot 1=-1,\quad \tilde B_2=0\cdot 0+1\cdot 1=1,\quad \tilde C_1=1\cdot 1+0\cdot 0=1,\quad \tilde C_2=1\cdot 1+0\cdot 1=1.
\end{align*}
For the original realization, solving $(sI-A)v=B$ gives the two scalar equations
\begin{align*}
sv_1-v_2=0,\quad 2v_1+(s+3)v_2=1.
\end{align*}
The first equation gives $v_2=sv_1$. Substituting this into the second equation gives
\begin{align*}
2v_1+(s+3)sv_1=1.
\end{align*}
Hence
\begin{align*}
(s^2+3s+2)v_1=1.
\end{align*}
Since $C=(1,0)$, the transfer function of the original realization is
\begin{align*}
C(sI-A)^{-1}B=v_1=\frac{1}{s^2+3s+2}.
\end{align*}
For the transformed realization, solving $(sI-\tilde A)w=\tilde B$ gives
\begin{align*}
(s-2)w_1-6w_2=-1,\quad 2w_1+(s+5)w_2=1.
\end{align*}
The determinant of this $2$ by $2$ system is
\begin{align*}
(s-2)(s+5)-(-6)(2)=s^2+3s-10+12=s^2+3s+2.
\end{align*}
By the two-variable elimination formula, the first component is
\begin{align*}
w_1=\frac{(-1)(s+5)-(-6)(1)}{s^2+3s+2}=\frac{1-s}{s^2+3s+2}.
\end{align*}
The second component is
\begin{align*}
w_2=\frac{(s-2)(1)-2(-1)}{s^2+3s+2}=\frac{s}{s^2+3s+2}.
\end{align*}
Since $\tilde C=(1,1)$, we get
\begin{align*}
\tilde C(sI-\tilde A)^{-1}\tilde B=w_1+w_2=\frac{1-s}{s^2+3s+2}+\frac{s}{s^2+3s+2}=\frac{1}{s^2+3s+2}.
\end{align*}
Because $D=\tilde D=0$, both coordinate descriptions have the same transfer function
\begin{align*}
G(s)=\frac{1}{s^2+3s+2}.
\end{align*}
The matrices have changed, but the input-output map has not.
[/example]
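The calculation above can also be checked numerically. The following is a minimal sketch in Python using exact rational arithmetic; the helper names `matmul` and `transfer_2x2` are illustrative, and the sample points are arbitrary values of $s$ away from the poles $-1$ and $-2$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transfer_2x2(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B for a two-state SISO model via Cramer's rule."""
    m11, m12 = s - A[0][0], -A[0][1]
    m21, m22 = -A[1][0], s - A[1][1]
    det = m11 * m22 - m12 * m21
    v1 = (B[0][0] * m22 - m12 * B[1][0]) / det
    v2 = (m11 * B[1][0] - m21 * B[0][0]) / det
    return C[0][0] * v1 + C[0][1] * v2

# Data of the example: original realization and the coordinate change T.
A = [[0, 1], [-2, -3]]
B = [[0], [1]]
C = [[1, 0]]
T = [[1, 1], [0, 1]]
T_inv = [[1, -1], [0, 1]]

# Transformed realization: T^{-1} A T, T^{-1} B, C T.
A_t = matmul(matmul(T_inv, A), T)
B_t = matmul(T_inv, B)
C_t = matmul(C, T)

# Compare the two transfer functions at a few sample points (assumed, arbitrary).
samples = [Fraction(1), Fraction(3, 2), Fraction(5)]
values = [(transfer_2x2(A, B, C, s), transfer_2x2(A_t, B_t, C_t, s)) for s in samples]
```

Each pair in `values` holds the original and transformed transfer function at one sample point. Agreement at finitely many points does not prove equality of rational functions; it is only a sanity check on the hand computation.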
The example shows input-output invariance in one calculation. For controller and observer design we also need the rank tests to be invariant, since otherwise reachability and observability would depend on coordinates rather than on the system.
[quotetheorem:6381]
[citeproof:6381]
The theorem depends on the same invertibility condition as transfer-function invariance, because rank is preserved by multiplication with an invertible matrix and can drop under a singular projection. As a concrete failure, a reachable two-state pair can become unreachable if the coordinate map collapses the input direction to zero. The statement also does not claim that reachability implies observability, or that either property follows from having a particular transfer function before minimality is imposed. Its role is narrower and essential: once a rank test passes, any canonical coordinate system obtained by similarity keeps that pass.
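For the two-state example earlier in this section, the invariance of the rank tests can be seen directly: the transformed reachability matrix is $T^{-1}$ times the original one, and the transformed observability matrix is the original one times $T$. The sketch below checks this with integer arithmetic; the helper names are illustrative.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def hstack(X, Y):
    """Place two matrices with the same number of rows side by side."""
    return [rx + ry for rx, ry in zip(X, Y)]

def det2(M):
    """Determinant of a 2-by-2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A, B, C = [[0, 1], [-2, -3]], [[0], [1]], [[1, 0]]
T, T_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]
A_t, B_t, C_t = matmul(matmul(T_inv, A), T), matmul(T_inv, B), matmul(C, T)

R = hstack(B, matmul(A, B))              # columns B, AB
R_t = hstack(B_t, matmul(A_t, B_t))      # columns of the transformed pair
O = C + matmul(C, A)                     # rows C, CA
O_t = C_t + matmul(C_t, A_t)             # rows of the transformed pair
```

Because $\det T=1$ in this example the determinants agree exactly; for a general invertible $T$ they differ by the nonzero factor $\det T$ or its inverse, which is enough to preserve rank.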
## Controllable Companion Form
Suppose a transfer function is given as a rational function and we need a state-space model whose reachability is built into the matrix shape. Without such a normal form, a realization may contain denominator factors whose modes cannot actually be reached by the input, so pole placement calculations would be performed on states that the controller cannot move. The companion construction avoids this failure for scalar strictly proper functions by placing the denominator coefficients in the state matrix and using the input to drive one distinguished coordinate.
[definition: Controllable Companion Pair]
Let
\begin{align*}
p(s)=s^n+a_{n-1}s^{n-1}+\cdots+a_1s+a_0
\end{align*}
be a monic polynomial. The controllable companion pair associated to $p$ is the pair $(A_c,B_c)$ with $B_c=e_n$ and with entries
\begin{align*}
(A_c)_{i,i+1}=1 \text{ for } 1\le i<n, \qquad (A_c)_{n,j}=-a_{j-1} \text{ for } 1\le j\le n,
\end{align*}
all other entries being zero.
[/definition]
The matrix $A_c$ is arranged so that the input enters the highest derivative coordinate in the corresponding scalar differential equation. We now need to know that this convenient shape has not inserted an unreachable state and that it has the intended denominator.
[quotetheorem:6382]
[citeproof:6382]
The single-input and reachability hypotheses are necessary for this exact companion pair: with two or more inputs, $B$ has several columns and no single column need be a cyclic vector whose iterates under $A$ form a basis of the state space, and with an unreachable pair the input-generated subspace is too small. For example, if $B=0$ and $n>0$, no similarity can turn $B$ into $e_n$, since invertible maps preserve the zero vector. The theorem also does not choose the output row or realize a numerator; it only normalizes the input-dynamics pair. The next result solves that missing realization problem by choosing $C_c$ so that the resolvent expansion produces the desired rational function.
[quotetheorem:6383]
[citeproof:6383]
The strict properness hypothesis separates dynamic memory from direct feedthrough; if the numerator had degree $n$, the quotient would contribute a nonzero $D$ term rather than being represented by $C_c(sI-A_c)^{-1}B_c$ alone. The scalar SISO hypothesis is also part of the formula, since matrix-valued transfer functions require several input and output directions and cannot be encoded by one row $C_c$ in this way. This theorem does not assert minimality after algebraic cancellations: if numerator and denominator share a factor, the companion realization of the uncancelled expression may contain a hidden mode. The statement is most transparent in a numerical example, where the denominator fixes the dynamics and the numerator fixes the output row.
[example: Building a Controllable Canonical Realization]
For
\begin{align*}
G(s)=\frac{2s+5}{s^3+4s^2+3s+2},
\end{align*}
the denominator coefficients are $a_0=2$, $a_1=3$, and $a_2=4$. The controllable companion pair therefore has entries
\begin{align*}
(A_c)_{12}=1,\quad (A_c)_{23}=1,\quad (A_c)_{31}=-2,\quad (A_c)_{32}=-3,\quad (A_c)_{33}=-4,
\end{align*}
with all other entries zero, and
\begin{align*}
B_c=e_3.
\end{align*}
The numerator is
\begin{align*}
2s+5=5+2s+0s^2,
\end{align*}
so take
\begin{align*}
C_c=(5,2,0), \qquad D=0.
\end{align*}
To compute the transfer function, solve $(sI-A_c)v=B_c$ with $v=(v_1,v_2,v_3)^\top$. The three scalar equations are
\begin{align*}
sv_1-v_2=0,
\end{align*}
\begin{align*}
sv_2-v_3=0,
\end{align*}
and
\begin{align*}
2v_1+3v_2+(s+4)v_3=1.
\end{align*}
The first equation gives
\begin{align*}
v_2=sv_1.
\end{align*}
The second equation then gives
\begin{align*}
v_3=sv_2=s^2v_1.
\end{align*}
Substituting these expressions into the third equation gives
\begin{align*}
2v_1+3sv_1+(s+4)s^2v_1=1.
\end{align*}
Expanding the left-hand side gives
\begin{align*}
(2+3s+s^3+4s^2)v_1=1.
\end{align*}
Reordering powers of $s$ gives
\begin{align*}
(s^3+4s^2+3s+2)v_1=1.
\end{align*}
Hence
\begin{align*}
v_1=\frac{1}{s^3+4s^2+3s+2}.
\end{align*}
Using $v_2=sv_1$ and $v_3=s^2v_1$, we also get
\begin{align*}
v_2=\frac{s}{s^3+4s^2+3s+2}.
\end{align*}
\begin{align*}
v_3=\frac{s^2}{s^3+4s^2+3s+2}.
\end{align*}
Therefore
\begin{align*}
C_c(sI-A_c)^{-1}B_c=5v_1+2v_2+0v_3.
\end{align*}
Substituting the computed components gives
\begin{align*}
C_c(sI-A_c)^{-1}B_c=\frac{5}{s^3+4s^2+3s+2}+\frac{2s}{s^3+4s^2+3s+2}.
\end{align*}
Combining the fractions gives
\begin{align*}
C_c(sI-A_c)^{-1}B_c=\frac{2s+5}{s^3+4s^2+3s+2}.
\end{align*}
Since $D=0$, the realization has transfer function $G(s)$.
It remains to verify reachability in this concrete realization. The first reachability column is
\begin{align*}
B_c=e_3.
\end{align*}
Multiplying by $A_c$ gives
\begin{align*}
A_cB_c=(0,1,-4)^\top.
\end{align*}
Multiplying once more gives
\begin{align*}
A_c^2B_c=A_c(0,1,-4)^\top=(1,-4,13)^\top,
\end{align*}
because the first coordinate is $1$, the second coordinate is $-4$, and the third coordinate is $-3\cdot 1+(-4)(-4)=13$. Thus the reachability matrix has columns
\begin{align*}
(0,0,1)^\top,\qquad (0,1,-4)^\top,\qquad (1,-4,13)^\top.
\end{align*}
The matrix $\mathcal R$ with these columns is symmetric, so its rows are also $(0,0,1)$, $(0,1,-4)$, and $(1,-4,13)$. Cofactor expansion along the first row gives
\begin{align*}
\det \mathcal R=0\cdot(1\cdot 13-(-4)(-4))-0\cdot(0\cdot 13-(-4)\cdot 1)+1\cdot(0\cdot(-4)-1\cdot 1).
\end{align*}
Therefore
\begin{align*}
\det \mathcal R=-1.
\end{align*}
The determinant is nonzero, so $\mathcal R$ has rank $3$; all three states are reachable, and the companion construction realizes the desired transfer function without introducing an unreachable state.
[/example]
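The construction in this example is mechanical enough to sketch in code. The function name `controllable_companion` and its coefficient ordering (constant term first, leading $1$ of the denominator omitted) are choices made here for illustration. The check evaluates the resolvent at the arbitrary sample point $s=1$.

```python
from fractions import Fraction

def controllable_companion(den, num):
    """Return (A_c, B_c, C_c) for num(s)/den(s), strictly proper, den monic.

    den = [a_0, ..., a_{n-1}], num = [b_0, b_1, ...], constant term first.
    """
    n = len(den)
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1                   # ones on the superdiagonal
    A[n - 1] = [-a for a in den]          # last row: -a_0, ..., -a_{n-1}
    B = [0] * (n - 1) + [1]               # e_n
    C = list(num) + [0] * (n - len(num))  # numerator coefficients, zero padded
    return A, B, C

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along row one."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

den, num = [2, 3, 4], [5, 2]              # G(s) = (2s+5)/(s^3+4s^2+3s+2)
A_c, B_c, C_c = controllable_companion(den, num)

# Reachability columns B, AB, A^2 B, then the matrix having them as columns.
cols = [B_c]
for _ in range(2):
    cols.append(matvec(A_c, cols[-1]))
R = [list(row) for row in zip(*cols)]
det_R = det3(R)

# Check (sI - A_c) v = B_c with v_k = s^(k-1)/p(s) at the sample point s = 1.
s = Fraction(1)
p = s ** len(den) + sum(a * s ** k for k, a in enumerate(den))
v = [s ** k / p for k in range(len(den))]
residual = [s * vi - avi - bi for vi, avi, bi in zip(v, matvec(A_c, v), B_c)]
G_at_s = sum(c * vi for c, vi in zip(C_c, v))
```

Here `residual` should vanish and `G_at_s` should equal $G(1)=7/10$, matching the hand computation.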
If the transfer function is proper but not strictly proper, the polynomial part contributes a direct feedthrough term. For a scalar transfer function this means the numerator and denominator have the same degree, so polynomial division gives a nonzero constant $D$ plus a strictly proper remainder, and the companion construction realizes the remainder.
[remark: Direct Feedthrough in Companion Realizations]
For a proper scalar transfer function $G(s)$, write $G(s)=D+G_0(s)$ where $G_0$ is strictly proper. A state-space realization may then use the companion form for $G_0$ and place the quotient constant in $D$. This separates instantaneous input-output dependence from dynamic memory.
[/remark]
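The split $G(s)=D+G_0(s)$ is a single step of polynomial division. The sketch below is one way to carry it out; the function name `split_feedthrough` and the test function $G(s)=(s^2+1)/(s^2+3s+2)$ are hypothetical and do not come from the examples in this chapter.

```python
from fractions import Fraction

def split_feedthrough(num, den):
    """Split a proper scalar num/den into D + rem/den.

    Coefficients are listed constant term first; den has degree len(den)-1
    with nonzero leading coefficient, and deg(num) <= deg(den) is assumed.
    """
    n = len(den) - 1
    padded = [Fraction(c) for c in num] + [Fraction(0)] * (len(den) - len(num))
    D = padded[n] / den[n]                             # ratio of leading coefficients
    rem = [padded[k] - D * den[k] for k in range(n)]   # numerator of G_0
    return D, rem

# Hypothetical example: G(s) = (s^2 + 1) / (s^2 + 3s + 2).
D, rem = split_feedthrough([1, 0, 1], [2, 3, 1])

# A strictly proper input returns D = 0 and the numerator unchanged.
D0, rem0 = split_feedthrough([5, 2], [2, 3, 4, 1])
```

For the hypothetical $G$ this gives $D=1$ and remainder numerator $-3s-1$, so $G_0(s)=(-3s-1)/(s^2+3s+2)$ would be the part realized in companion form.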
The controllable companion form is input-adapted. There is a companion construction in which the output samples one distinguished coordinate and observability is built into the matrix shape.
## Observable Companion Form
If measurements rather than actuation are the focus, a useful coordinate system is one in which the output and its successive derivatives reveal the state. The obstruction is that two different states can produce the same entire output history when the pair is not observable, making any observer based on those measurements underdetermined. The observable companion form used here chooses the first coordinate as the measured output and arranges the matrix so that repeated output derivatives recover the remaining coordinates.
[definition: Observable Companion Pair]
For the monic polynomial
\begin{align*}
p(s)=s^n+a_{n-1}s^{n-1}+\cdots+a_1s+a_0,
\end{align*}
the observable companion pair is $(C_o,A_o)$ with $C_o=e_1^\top$ and with entries
\begin{align*}
(A_o)_{i,i+1}=1\quad(1\le i\le n-1),\qquad (A_o)_{n,j}=-a_{j-1}\quad(1\le j\le n),
\end{align*}
all other entries being $0$.
[/definition]
This convention makes the rows of the observability matrix equal to the coordinate rows $e_1^\top,e_2^\top,\dots,e_n^\top$, so that $\mathcal O=I_n$. To use it as a measurement-adapted normal form, we need the rank statement and the same denominator statement.
[quotetheorem:6384]
[citeproof:6384]
The hypotheses mirror the reachable theorem at the level of cyclic data: single-output observability provides one cyclic covector, while a nonobservable pair cannot be made observable by a similarity transformation. For example, if $C=0$ and $n>0$, every transformed output row remains zero, so no coordinate basis can produce $e_1^\top$. The theorem does not state that the system is reachable, nor does it determine a numerator for a transfer function. The observable form realizes the same scalar transfer functions as the controllable form, but the numerator coefficients enter through the input column instead of the output row.
[example: Observable Canonical Realization]
For
\begin{align*}
G(s)=\frac{2s+5}{s^3+4s^2+3s+2},
\end{align*}
the denominator coefficients are $a_0=2$, $a_1=3$, and $a_2=4$. The observable companion matrix has nonzero entries
\begin{align*}
(A_o)_{12}=1,\quad (A_o)_{23}=1,\quad (A_o)_{31}=-2,\quad (A_o)_{32}=-3,\quad (A_o)_{33}=-4.
\end{align*}
Take
\begin{align*}
C_o=e_1^\top=(1,0,0),\qquad B_o=(0,2,-3)^\top,\qquad D=0.
\end{align*}
To compute the transfer function, solve $(sI-A_o)w=B_o$ with $w=(w_1,w_2,w_3)^\top$. From the entries of $A_o$, the three scalar equations are
\begin{align*}
sw_1-w_2=0.
\end{align*}
\begin{align*}
sw_2-w_3=2.
\end{align*}
\begin{align*}
2w_1+3w_2+(s+4)w_3=-3.
\end{align*}
The first equation gives
\begin{align*}
w_2=sw_1.
\end{align*}
The second equation gives
\begin{align*}
w_3=sw_2-2=s^2w_1-2.
\end{align*}
Substituting these expressions into the third equation gives
\begin{align*}
2w_1+3sw_1+(s+4)(s^2w_1-2)=-3.
\end{align*}
Expanding the left-hand side gives
\begin{align*}
(s^3+4s^2+3s+2)w_1-2s-8=-3.
\end{align*}
Therefore
\begin{align*}
(s^3+4s^2+3s+2)w_1=2s+5.
\end{align*}
Thus
\begin{align*}
w_1=\frac{2s+5}{s^3+4s^2+3s+2}.
\end{align*}
Since $C_o=(1,0,0)$, we get
\begin{align*}
C_o(sI-A_o)^{-1}B_o=w_1=\frac{2s+5}{s^3+4s^2+3s+2}.
\end{align*}
Because $D=0$, this realization has transfer function $G(s)$.
It remains to verify observability in this concrete realization. The first observability row is
\begin{align*}
C_o=(1,0,0).
\end{align*}
Multiplying by $A_o$ gives
\begin{align*}
C_oA_o=(0,1,0).
\end{align*}
Multiplying once more gives
\begin{align*}
C_oA_o^2=(0,1,0)A_o=(0,0,1).
\end{align*}
Thus the observability matrix has rows $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$. Hence
\begin{align*}
\mathcal O=I_3.
\end{align*}
Therefore
\begin{align*}
\det \mathcal O=1.
\end{align*}
The determinant is nonzero, so $\mathcal O$ has rank $3$. All three states are observable from the output derivatives in the ideal noiseless model.
[/example]
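The input column $B_o=(0,2,-3)^\top$ in this example is not arbitrary. Since $C_oA_o^{k}=e_{k+1}^\top$ in this convention, the entries of $B_o$ are the first $n$ coefficients $h_1,\dots,h_n$ of the expansion $G(s)=\sum_{k\ge 1}h_k s^{-k}$. The sketch below computes them by matching coefficients in $\text{num}(s)=p(s)\sum_k h_k s^{-k}$; the function name `markov_parameters` is illustrative.

```python
def markov_parameters(den, num):
    """First n expansion coefficients h_1..h_n of num(s)/den(s) in powers of 1/s.

    den = [a_0, ..., a_{n-1}] for a monic denominator, num = [b_0, b_1, ...],
    both listed constant term first; the function is assumed strictly proper.
    """
    n = len(den)
    b = list(num) + [0] * (n - len(num))
    h = []
    for k in range(1, n + 1):
        acc = b[n - k]                        # coefficient of s^(n-k) in num
        for j in range(1, k):
            acc -= den[n - j] * h[k - j - 1]  # subtract a_{n-j} * h_{k-j}
        h.append(acc)
    return h

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

den, num = [2, 3, 4], [5, 2]                  # G(s) = (2s+5)/(s^3+4s^2+3s+2)
B_o = markov_parameters(den, num)

# Cross-check: the controllable companion form gives the same values C_c A^k B_c.
A = [[0, 1, 0], [0, 0, 1], [-2, -3, -4]]
C_c, col = [5, 2, 0], [0, 0, 1]
check = []
for _ in range(3):
    check.append(sum(c * x for c, x in zip(C_c, col)))
    col = matvec(A, col)
```

Both `B_o` and `check` reproduce the column used in the example, which is consistent with the two companion forms realizing the same transfer function.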
The companion forms show that realizations are far from unique. Minimal realization theory asks which of these many realizations have the smallest possible state dimension.
## Minimal Realizations and McMillan Degree
A transfer function can be represented by many state-space models, including models with extra states that never affect the input-output map. The failure mode is common: building a model from an uncancelled rational expression can introduce modes that appear in the state matrix but cannot be identified from input-output data. The next question is how to detect and remove such states, and how to identify the intrinsic dynamic order of the transfer function.
[definition: Minimal Realization]
A realization $(A,B,C,D)$ of a transfer function $G(s)$ is minimal if no realization of $G(s)$ has smaller state dimension.
[/definition]
Minimality names the absence of removable states in a particular realization. To compare this across all possible realizations of the same transfer function, we need a number attached to the transfer function itself rather than to a chosen model.
[definition: McMillan Degree]
Let $G\in \mathbb R(s)^{p\times m}$ be a proper rational transfer matrix, regarded as a function $G:\Omega\to\mathbb C^{p\times m}$ on the set $\Omega\subset\mathbb C$ where all entries are defined. The McMillan degree $\delta(G)$ is the state dimension of any minimal realization of $G(s)$.
[/definition]
The degree is well defined because, by the definition of minimality, every minimal realization has the same smallest possible state dimension; the theorem below gives a rank test for recognizing minimal realizations. In the scalar case, the McMillan degree is the degree of the denominator after cancelling common factors between numerator and denominator.
[quotetheorem:6385]
[citeproof:6385]
Both rank hypotheses are needed. If reachability fails, an input can never excite some state direction; if observability fails, a nonzero state direction can remain invisible at the output. A concrete counterexample is the realization $\dot{x}_1=-x_1+u$, $\dot{x}_2=-2x_2$, $y=x_1$, whose second state changes the characteristic polynomial but not the transfer function $1/(s+1)$. The theorem does not say that every nonminimal realization has an evident zero row or column in its original coordinates; the redundant part may appear only after a Kalman decomposition. This is the main bridge between algebraic transfer functions and geometric state-space structure, because it says that a state is redundant exactly when it is either not reachable from the input or not visible in the output.
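The counterexample can be verified in a few lines. In matrix form it has $A=\operatorname{diag}(-1,-2)$, $B=(1,0)^\top$, $C=(1,0)$, so the reachability columns and observability rows are
\begin{align*}
B=(1,0)^\top,\quad AB=(-1,0)^\top,\qquad C=(1,0),\quad CA=(-1,0).
\end{align*}
Both matrices have rank $1<2$, and the direction $e_2$ is neither reached by the input nor seen at the output. The resolvent is diagonal, which gives
\begin{align*}
C(sI-A)^{-1}B=\frac{1}{s+1},\qquad \det(sI-A)=(s+1)(s+2).
\end{align*}
The factor $s+2$ appears in the characteristic polynomial but not in the transfer function.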
[example: Nonminimal Realization with a Cancelled Pole]
Consider the uncancelled scalar transfer function
\begin{align*}
G(s)=\frac{s+1}{(s+1)(s+2)}.
\end{align*}
Expanding the denominator gives
\begin{align*}
(s+1)(s+2)=s^2+3s+2.
\end{align*}
The controllable companion realization of the uncancelled expression has
\begin{align*}
A_{11}=0,\quad A_{12}=1,\quad A_{21}=-2,\quad A_{22}=-3,\quad B_1=0,\quad B_2=1,\quad C_1=1,\quad C_2=1,\quad D=0.
\end{align*}
To verify its transfer function, solve $(sI-A)v=B$ with $v=(v_1,v_2)^\top$. The two scalar equations are
\begin{align*}
sv_1-v_2=0,\qquad 2v_1+(s+3)v_2=1.
\end{align*}
The first equation gives
\begin{align*}
v_2=sv_1.
\end{align*}
Substituting this into the second equation gives
\begin{align*}
2v_1+(s+3)sv_1=1.
\end{align*}
Expanding the coefficient of $v_1$ gives
\begin{align*}
2v_1+(s^2+3s)v_1=1.
\end{align*}
Thus
\begin{align*}
(s^2+3s+2)v_1=1.
\end{align*}
Hence
\begin{align*}
v_1=\frac{1}{s^2+3s+2}.
\end{align*}
Using $v_2=sv_1$ gives
\begin{align*}
v_2=\frac{s}{s^2+3s+2}.
\end{align*}
Multiplication by $C=(1,1)$ gives
\begin{align*}
C(sI-A)^{-1}B=v_1+v_2.
\end{align*}
Substituting the two components gives
\begin{align*}
C(sI-A)^{-1}B=\frac{1}{s^2+3s+2}+\frac{s}{s^2+3s+2}.
\end{align*}
Combining numerators gives
\begin{align*}
C(sI-A)^{-1}B=\frac{s+1}{s^2+3s+2}.
\end{align*}
Since $s^2+3s+2=(s+1)(s+2)$, this is the uncancelled expression
\begin{align*}
C(sI-A)^{-1}B=\frac{s+1}{(s+1)(s+2)}.
\end{align*}
As a rational function after cancellation, this equals
\begin{align*}
\frac{1}{s+2}.
\end{align*}
The cancelled mode is visible in the observability test. The first observability row is
\begin{align*}
C=(1,1).
\end{align*}
Multiplying by $A$ gives
\begin{align*}
CA=(-2,-2),
\end{align*}
because the first entry is $1\cdot 0+1\cdot(-2)=-2$ and the second entry is $1\cdot 1+1\cdot(-3)=-2$. Therefore the observability matrix has first row $(1,1)$ and second row $(-2,-2)$. Its determinant is
\begin{align*}
1\cdot(-2)-1\cdot(-2)=0.
\end{align*}
So the two-state realization is not observable. More explicitly, the vector $q=(1,-1)^\top$ is invisible at the output because
\begin{align*}
Cq=1-1=0.
\end{align*}
It is also an eigenvector of $A$:
\begin{align*}
Aq=(-1,1)^\top.
\end{align*}
Since
\begin{align*}
(-1,1)^\top=-1(1,-1)^\top,
\end{align*}
the unobservable state direction has eigenvalue $-1$. Thus the pole $s=-1$ belongs to a hidden unobservable mode and is cancelled from the input-output transfer function.
After cancellation, the reduced transfer function is
\begin{align*}
G(s)=\frac{1}{s+2}.
\end{align*}
A one-state realization is
\begin{align*}
\dot{x}=-2x+u,\qquad y=x.
\end{align*}
Here
\begin{align*}
A=-2,\quad B=1,\quad C=1,\quad D=0.
\end{align*}
Its transfer function is
\begin{align*}
C(sI-A)^{-1}B=1\cdot\frac{1}{s-(-2)}\cdot 1=\frac{1}{s+2}.
\end{align*}
Since $B=1\ne 0$, the one-state reachability matrix has rank $1$. Since $C=1\ne 0$, the one-state observability matrix has rank $1$. By the *[Minimal Realization Theorem](/theorems/6385)*, this realization is minimal. The McMillan degree is therefore $1$, not the degree $2$ of the uncancelled denominator.
[/example]
For matrix-valued transfer functions, cancellations are more subtle than cancelling scalar polynomials entry by entry. The McMillan degree counts the total pole structure of the transfer matrix after all common state-space redundancies have been removed.
[remark: Poles, Zeros, and Hidden Modes]
An eigenvalue of $A$ whose mode is unreachable cannot be excited by the input, while one whose mode is unobservable cannot be detected at the output. Either type of mode may appear as a root of $\det(sI-A)$ without appearing as a pole of the transfer function. Pole-zero cancellation is the transfer-function shadow of this state-space redundancy.
[/remark]
The remark identifies why nonminimal poles can appear in state matrices. The remaining question is whether two models that have no such hidden modes can still represent the same transfer function in genuinely different ways.
[quotetheorem:6386]
[citeproof:6386]
Reachability and observability are essential here because they make the map built from matching Markov parameters both well-defined and bijective. If either realization is nonminimal, the conclusion can fail: one may append an unreachable and unobservable extra mode with any eigenvalue and obtain the same transfer function in a larger state space, which cannot be similar to the original model. The theorem also does not choose a preferred coordinate basis, so it gives uniqueness only up to similarity rather than equality of matrices. This provides a practical warning: if two minimal models of the same plant look different, the difference is a coordinate transformation, not a different input-output system.
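As an illustration, consider the controllable and observable companion realizations of $G(s)=(2s+5)/(s^3+4s^2+3s+2)$ from the earlier examples. The sketch below assumes both are minimal (numerator and denominator share no root) and takes the observability matrix of the controllable form as a candidate for $T^{-1}$. This works here because the observability matrix of the observable form is the identity. The variable name `S` is illustrative.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along row one."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Both companion forms use the same state matrix for this denominator.
A = [[0, 1, 0], [0, 0, 1], [-2, -3, -4]]
B_c, C_c = [[0], [0], [1]], [[5, 2, 0]]      # controllable companion form
B_o, C_o = [[0], [2], [-3]], [[1, 0, 0]]     # observable companion form

# Observability matrix of the controllable form: rows C_c, C_c A, C_c A^2.
rows = [C_c]
for _ in range(2):
    rows.append(matmul(rows[-1], A))
S = [r[0] for r in rows]                     # candidate T^{-1}, with x_o = S x_c

# Similarity relations written without inverting S.
same_dynamics = matmul(S, A) == matmul(A, S)  # A_o S = S A_c
same_input = matmul(S, B_c) == B_o            # B_o = S B_c
same_output = matmul(C_o, S) == C_c           # C_o S = C_c
invertible = det3(S) != 0
```

All four flags should be true, so the two realizations differ only by the coordinate change $x_o=Sx_c$.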
## Procedure for Constructing Minimal Canonical Models
A common modelling task starts with a rational transfer function and asks for a state-space realization suitable for controller design. The following workflow separates algebraic simplification from coordinate selection.
[explanation: Minimal Realization Workflow]
Begin with the proper transfer function $G(s)$. If $G$ is scalar, cancel common factors in the numerator and denominator; if $G$ is matrix-valued, use a state-space or Smith-McMillan computation rather than entrywise cancellation alone. Separate any direct feedthrough term $D$ by polynomial division.
Realize the strictly proper part in controllable companion form when input design and reachability are the priority, or in observable companion form when output reconstruction and observer design are the priority. Then test reachability and observability of the resulting realization. If both tests pass, the realization is minimal; if not, apply the reachability and observability decompositions to remove redundant blocks.
[/explanation]
The workflow is not just computational bookkeeping. It prevents cancelled dynamics from being mistaken for controllable plant modes, which matters when assigning poles, designing observers, or estimating model order from data.
[example: From Transfer Function to Minimal Companion Model]
Let
\begin{align*}
G(s)=\frac{s^2+3s+2}{s^3+6s^2+11s+6}.
\end{align*}
The numerator factors because
\begin{align*}
(s+1)(s+2)=s^2+2s+s+2=s^2+3s+2.
\end{align*}
For the denominator, first compute
\begin{align*}
(s+1)(s+2)=s^2+3s+2.
\end{align*}
Then
\begin{align*}
(s^2+3s+2)(s+3)=s^3+3s^2+3s^2+9s+2s+6.
\end{align*}
Combining like powers gives
\begin{align*}
(s^2+3s+2)(s+3)=s^3+6s^2+11s+6.
\end{align*}
Thus, as a rational function,
\begin{align*}
G(s)=\frac{(s+1)(s+2)}{(s+1)(s+2)(s+3)}=\frac{1}{s+3}.
\end{align*}
The reduced denominator is $s+3$, so the one-state companion realization is
\begin{align*}
A=-3,\qquad B=1,\qquad C=1,\qquad D=0.
\end{align*}
Its transfer function is
\begin{align*}
C(sI-A)^{-1}B+D=1\cdot (s-(-3))^{-1}\cdot 1+0.
\end{align*}
Since $s-(-3)=s+3$, this becomes
\begin{align*}
C(sI-A)^{-1}B+D=\frac{1}{s+3}.
\end{align*}
The reachability matrix is the $1$ by $1$ matrix
\begin{align*}
\mathcal R=(B)=(1),
\end{align*}
so it has rank $1$. The observability matrix is
\begin{align*}
\mathcal O=(C)=(1),
\end{align*}
so it also has rank $1$. By the *[Minimal Realization Theorem](/theorems/6385)*, this one-state realization is minimal, and the McMillan degree of $G$ is $1$.
A third-order companion realization built from the uncancelled denominator would have state dimension $3$, but the transfer function has already reduced to a degree-one denominator. The factors $s+1$ and $s+2$ cancel before the input-output map is formed, so the corresponding modes are redundant rather than genuine assignable plant poles in a minimal feedback design.
[/example]
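The cancellation step in this example can be automated for scalar transfer functions with a polynomial greatest common divisor. The sketch below uses the Euclidean algorithm over the rationals; the function names are illustrative, coefficients are listed highest degree first, and exact arithmetic is assumed. With floating-point coefficients, deciding whether a remainder is zero would need a tolerance.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def divmod_poly(num, den):
    """Polynomial long division: return (quotient, remainder)."""
    num = [Fraction(c) for c in num]
    den = trim([Fraction(c) for c in den])
    quot = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quot.append(factor)
        for i in range(len(den)):
            num[i] -= factor * den[i]         # cancel the leading term
        num.pop(0)
    return quot or [Fraction(0)], trim(num or [Fraction(0)])

def gcd_poly(p, q):
    """Monic greatest common divisor by the Euclidean algorithm."""
    p, q = trim([Fraction(c) for c in p]), trim([Fraction(c) for c in q])
    while q != [0]:
        _, r = divmod_poly(p, q)
        p, q = q, r
    return [c / p[0] for c in p]

num = [1, 3, 2]          # s^2 + 3s + 2
den = [1, 6, 11, 6]      # s^3 + 6s^2 + 11s + 6
g = gcd_poly(num, den)
red_num, rem_num = divmod_poly(num, g)
red_den, rem_den = divmod_poly(den, g)

# One-state realization read off from the reduced denominator s + 3.
A, B, C, D = -red_den[1] / red_den[0], 1, red_num[0], 0
```

The common factor found is $s^2+3s+2$, leaving the reduced function $1/(s+3)$ and the realization $A=-3$, $B=1$, $C=1$, $D=0$ used above.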
Canonical forms make the link between rational functions and state-space geometry explicit. Companion realizations encode the denominator and numerator coefficients directly, similarity transformations identify equivalent coordinate descriptions, and the minimal realization theorem tells us exactly when the state dimension is intrinsic. These ideas prepare for feedback design: pole placement and observer construction are meaningful on the reachable and observable part of the system, which is precisely the minimal part.
Once minimality is understood, the remaining state-space structure can be separated into the parts that matter for control and observation and the parts that do not. Chapter 6 performs that decomposition explicitly, setting up the clean split needed for later design.
# 6. Kalman Decomposition
The previous chapters isolated controllability and observability as separate tests on a state-space model. In practice the two tests interact: a state direction may be reachable but invisible at the output, visible but unreachable from the input, both, or neither. Kalman decomposition is the coordinate theorem that separates these four behaviours and explains which part of a realization is actually seen by the input-output map.
The chapter has two goals. First, we construct the four invariant pieces from the reachable subspace and the unobservable subspace. Second, we use the resulting block form to identify the minimal subsystem and to clarify which modes can be moved by feedback, reconstructed by observers, or detected from transfer-function data. The prerequisites are the reachability and observability rank tests, similarity transformations of state-space realizations, and the transfer function formula obtained from the resolvent of $A$.
## The Four State Components
The basic problem is to replace a state coordinate system chosen by modelling convenience with one adapted to the two structural subspaces. For a continuous-time linear system
\begin{align*}
\dot{x}(t) = Ax(t)+Bu(t), \qquad y(t)=Cx(t)+Du(t),
\end{align*}
with $x(t)\in \mathbb R^n$, $u(t)\in \mathbb R^m$, and $y(t)\in \mathbb R^p$, the reachable subspace records what inputs can create, while the unobservable subspace records what outputs cannot distinguish from zero.
[definition: Reachable Subspace]
For matrices $A\in \mathbb R^{n\times n}$ and $B\in \mathbb R^{n\times m}$, the reachable subspace is
\begin{align*}
\mathcal R(A,B)=\operatorname{span}\{Bv,ABv,\dots,A^{n-1}Bv: v\in \mathbb R^m\}\subset \mathbb R^n.
\end{align*}
[/definition]
This subspace is $A$-invariant and contains the input directions: Cayley-Hamilton gives $A\mathcal R(A,B)\subset \mathcal R(A,B)$, and $\operatorname{im}B\subset \mathcal R(A,B)$ holds by construction. The dual obstruction is a subspace on which every free response has zero measured output.
[definition: Unobservable Subspace]
For matrices $A\in \mathbb R^{n\times n}$ and $C\in \mathbb R^{p\times n}$, the unobservable subspace is
\begin{align*}
\mathcal N(A,C)=\bigcap_{k=0}^{n-1}\ker(CA^k)\subset \mathbb R^n.
\end{align*}
[/definition]
The unobservable subspace is $A$-invariant by Cayley-Hamilton, so invisible initial states remain invisible under the autonomous dynamics. This still does not say how the two structural tests interact: a direction may lie in $\mathcal R(A,B)$, in $\mathcal N(A,C)$, in both, or in neither after choosing suitable complements. We therefore need a four-way vocabulary before a four-block coordinate theorem can be stated.
[definition: Kalman State Types]
For a realization $(A,B,C,D)$ on $\mathbb R^n$, the state directions are classified by membership in the reachable subspace $\mathcal R(A,B)$ and the unobservable subspace $\mathcal N(A,C)$. A direction is reachable-unobservable if it lies in $\mathcal R(A,B)\cap\mathcal N(A,C)$, reachable-observable if it lies in a chosen complement of $\mathcal R(A,B)\cap\mathcal N(A,C)$ inside $\mathcal R(A,B)$, unreachable-unobservable if it lies in a chosen complement of $\mathcal R(A,B)\cap\mathcal N(A,C)$ inside $\mathcal N(A,C)$, and unreachable-observable if it lies in a chosen complement of $\mathcal R(A,B)+\mathcal N(A,C)$ inside $\mathbb R^n$.
[/definition]
The definition uses complements because the four pieces are subspaces after a choice of basis, not canonical labelled subsets of vectors in the original coordinates. The dimensions of the pieces are canonical, and the block form below makes the dependence on the chosen complements harmless.
[example: Two Independent Tests Do Not Give Two Blocks]
Consider a system with $n=3$ whose reachable and unobservable subspaces are
\begin{align*}
\mathcal R(A,B)=\operatorname{span}(e_1,e_2), \qquad \mathcal N(A,C)=\operatorname{span}(e_2,e_3).
\end{align*}
We compute their overlap explicitly. If $v\in \mathcal R(A,B)\cap \mathcal N(A,C)$, then $v=a e_1+b e_2=c e_2+d e_3$ for some scalars $a,b,c,d$. Comparing coordinates in the basis $(e_1,e_2,e_3)$ gives $a=0$, $b=c$, and $d=0$, so $v=b e_2$. Hence
\begin{align*}
\mathcal R(A,B)\cap \mathcal N(A,C)=\operatorname{span}(e_2).
\end{align*}
Thus $e_2$ is both reachable and unobservable.
The reachable part is
\begin{align*}
\mathcal R(A,B)=\operatorname{span}(e_1)\oplus \operatorname{span}(e_2),
\end{align*}
and $e_1\notin \mathcal N(A,C)$, so $e_1$ represents a reachable-observable direction. Similarly,
\begin{align*}
\mathcal N(A,C)=\operatorname{span}(e_2)\oplus \operatorname{span}(e_3),
\end{align*}
and $e_3\notin \mathcal R(A,B)$, so $e_3$ is unreachable and unobservable. Finally,
\begin{align*}
\mathcal R(A,B)+\mathcal N(A,C)=\operatorname{span}(e_1,e_2,e_3)=\mathbb R^3.
\end{align*}
Therefore there is no remaining unreachable-observable complement in this example. The two tests overlap along $\operatorname{span}(e_2)$, so the decomposition must record both $\mathcal R(A,B)\cap\mathcal N(A,C)$ and $\mathcal R(A,B)+\mathcal N(A,C)$, not just two independent yes-or-no labels.
[/example]
## Kalman Decomposition Theorem
The question is now whether the preceding classification can be achieved by a similarity transformation that also respects the differential equation. The answer is yes: after changing coordinates, the matrices have a block triangular form in which the input can enter only reachable blocks and the output can see only observable blocks.
[quotetheorem:6387]
[citeproof:6387]
The exact values of the off-diagonal blocks depend on the chosen ordered basis, but the displayed zeros carry the system-theoretic information. The hypotheses cannot be weakened to an arbitrary decomposition into four vector-space summands: for example, if a chosen “reachable” summand is not $A$-invariant, then $A$ can immediately drive it into an unreachable coordinate and the lower-left zero blocks disappear. The theorem also does not claim that the complements are canonical or orthogonal; it gives a similarity normal form whose invariant content is the pair of subspaces $\mathcal R(A,B)$ and $\mathcal N(A,C)$. This is why later transfer-function and design results use only the forced zero rows and columns, not the particular numerical off-diagonal entries.
[remark: Coordinate Dependence]
Kalman decomposition is a similarity statement, not an [orthogonal projection theorem](/theorems/4916). Different choices of complements give different off-diagonal blocks, while the dimensions of the four diagonal state classes are invariant under similarity.
[/remark]
This explains why computations often use rank-revealing algorithms rather than a canonical formula. The invariant content is the pair of subspaces and the induced minimal block, not the decorative entries produced by a particular basis.
[example: Four Blocks in Dimension Four]
Let
\begin{align*}
A=\operatorname{diag}(-1,-2,5,7),\qquad B=e_1+e_2,\qquad Cz=z_1+z_3.
\end{align*}
The reachability generators are obtained by applying the diagonal matrix $A$ repeatedly to $B$. Since $Ae_1=-e_1$ and $Ae_2=-2e_2$, the generators $B$, $AB$, $A^2B$, and $A^3B$ are
\begin{align*}
B=e_1+e_2.
\end{align*}
\begin{align*}
AB=-e_1-2e_2.
\end{align*}
\begin{align*}
A^2B=e_1+4e_2.
\end{align*}
\begin{align*}
A^3B=-e_1-8e_2.
\end{align*}
All these vectors lie in $\operatorname{span}(e_1,e_2)$. The first two already span this plane: if
\begin{align*}
\alpha(e_1+e_2)+\beta(-e_1-2e_2)=0,
\end{align*}
then the $e_1$-coordinate gives $\alpha-\beta=0$, and the $e_2$-coordinate gives $\alpha-2\beta=0$. Subtracting the two equations gives $-\beta=0$, hence $\beta=0$ and then $\alpha=0$. Therefore $B$ and $AB$ are linearly independent, so
\begin{align*}
\mathcal R(A,B)=\operatorname{span}(e_1,e_2).
\end{align*}
For the unobservable subspace, write $v=v_1e_1+v_2e_2+v_3e_3+v_4e_4$. Because $A$ is diagonal,
\begin{align*}
Cv=v_1+v_3.
\end{align*}
\begin{align*}
CAv=-v_1+5v_3.
\end{align*}
If $v\in\mathcal N(A,C)$, both expressions are zero. Thus $v_1+v_3=0$ and $-v_1+5v_3=0$. Adding these two equations gives $6v_3=0$, so $v_3=0$, and then $v_1=0$. The coordinates $v_2$ and $v_4$ never appear in $CA^k v$, since $C$ kills $e_2$ and $e_4$ after every diagonal evolution. Hence
\begin{align*}
\mathcal N(A,C)=\operatorname{span}(e_2,e_4).
\end{align*}
Thus $e_1\in\mathcal R(A,B)$ and $e_1\notin\mathcal N(A,C)$, so the first coordinate is reachable-observable. The vector $e_2$ lies in $\mathcal R(A,B)\cap\mathcal N(A,C)$, so the second coordinate is reachable-unobservable. Also
\begin{align*}
\mathcal R(A,B)+\mathcal N(A,C)=\operatorname{span}(e_1,e_2,e_4),
\end{align*}
so $e_3$ spans a complement to this sum and is unreachable-observable. Finally, $e_4\in\mathcal N(A,C)$ and $e_4\notin\mathcal R(A,B)$, so the fourth coordinate is unreachable-unobservable. Each Kalman state type has dimension $1$, and four nonzero types cannot occur in dimension smaller than $4$.
[/example]
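The four dimensions in the example above can be cross-checked by rank computations. The following is a minimal sketch, not a numerically robust algorithm: it uses exact rational arithmetic from the Python standard library, and the helper names `matmul`, `rank`, and `kalman_dimensions` are illustrative choices rather than notation from the course. It relies on the identity $\dim(\mathcal R\cap\mathcal N)=\dim\mathcal R-\operatorname{rank}(\mathcal O\,\mathcal C)$, where $\mathcal C$ and $\mathcal O$ denote the reachability and observability matrices, together with the dimension formula for $\mathcal R+\mathcal N$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def kalman_dimensions(A, B, C):
    """Dimensions of the four Kalman state classes of (A, B, C)."""
    n = len(A)
    # Reachability matrix [B, AB, ..., A^{n-1}B], assembled row by row.
    blocks, M = [], B
    for _ in range(n):
        blocks.append(M)
        M = matmul(A, M)
    reach = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    # Observability matrix with block rows C, CA, ..., CA^{n-1}.
    obs, M = [], C
    for _ in range(n):
        obs.extend(M)
        M = matmul(M, A)
    dim_R = rank(reach)                # dim of the reachable subspace
    dim_N = n - rank(obs)              # dim of the unobservable subspace
    ro = rank(matmul(obs, reach))      # dim R - dim(R ∩ N)
    r_uo = dim_R - ro                  # dim(R ∩ N)
    ur_uo = dim_N - r_uo               # part of N outside R ∩ N
    ur_o = n - dim_R - ur_uo           # n - dim(R + N)
    return {
        "reachable-observable": ro,
        "reachable-unobservable": r_uo,
        "unreachable-observable": ur_o,
        "unreachable-unobservable": ur_uo,
    }


A = [[-1, 0, 0, 0], [0, -2, 0, 0], [0, 0, 5, 0], [0, 0, 0, 7]]
B = [[1], [1], [0], [0]]
C = [[1, 0, 1, 0]]
print(kalman_dimensions(A, B, C))  # each of the four classes has dimension 1
```

Exact fractions are used so that the rank decisions are not affected by rounding; for floating-point data a rank-revealing factorisation with a tolerance would be the more realistic choice.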
## Transfer Functions and Minimal Realizations
The decomposition becomes useful when we ask which states can be inferred from input-output experiments. If the initial state is fixed at zero, the transfer function is
\begin{align*}
G(s)=C(sI-A)^{-1}B+D,
\end{align*}
where $s$ belongs to the resolvent set of $A$. Kalman decomposition shows that only the reachable-observable block can affect this rational matrix.
[quotetheorem:6388]
[citeproof:6388]
The theorem separates internal dynamics from external behaviour. The zero-initial-state assumption is essential: an unreachable-observable state can affect the measured output if it is nonzero initially, even though no input can create it from rest. The theorem also does not say that the hidden diagonal eigenvalues are absent from the state equation; it says that their contributions cancel or never enter the rational matrix $G(s)$. This is the first appearance of rational matrix cancellation in the chapter, and it motivates the realization-theoretic question: when has a model kept only those state variables that the input-output map can actually require?
[definition: Minimal Realization]
Let $G$ be a real rational $p\times m$ matrix transfer function. A finite-dimensional continuous-time realization $(A,B,C,D)$ over $\mathbb R$ of $G$ is minimal if no finite-dimensional continuous-time realization over $\mathbb R$ of the same transfer function has smaller state dimension.
[/definition]
Minimality is the state-space analogue of having no redundant internal variables. A concrete failure case is obtained by adjoining an unreachable and unobservable scalar equation $\dot w=\lambda w$ to any realization; the transfer function is unchanged, but the state dimension has increased by one. The previous theorem gives the candidate minimal subsystem, while the next result states the usual rank-test characterization.
[quotetheorem:6389]
[citeproof:6389]
The criterion says that minimality is not an extra analytic condition; it is exactly the simultaneous absence of unreachable and unobservable state directions. Both hypotheses are necessary: an observable but unreachable extra scalar mode, or a reachable but unobservable extra scalar mode, leaves the transfer function unchanged while making the realization nonminimal in the sense detected by the decomposition. The theorem does not provide a numerically stable algorithm for finding the minimal subsystem; it identifies the algebraic condition that such an algorithm must enforce through rank computations. Once such a model has been reached, the next issue is whether two different minimal realizations can describe genuinely different internal dynamics or only different coordinates. Minimality matters for this question: without it, the same transfer function can be enlarged in two ways by hidden scalar modes with different eigenvalues, producing nonminimal realizations that are not similar because their spectra differ.
[quotetheorem:6390]
[citeproof:6390]
This uniqueness result is the reason transfer-function models and minimal state-space models carry the same finite-dimensional dynamics. Minimality is necessary: adding an unreachable-unobservable scalar block with eigenvalue $2$ or with eigenvalue $3$ gives the same transfer function but two nonminimal realizations that cannot be similar. The theorem also does not choose a preferred coordinate system for the minimal state; it says only that every minimal coordinate system represents the same dynamics up to an invertible change of basis. This prepares the design interpretation, where the invariant question is not which coordinates were chosen, but which eigenvalues lie in reachable or observable invariant subspaces.
[example: Extracting the Minimal Realization]
For the four-dimensional system from the previous example, $A=\operatorname{diag}(-1,-2,5,7)$, $B=e_1+e_2$, $Cz=z_1+z_3$, and $D=0$. Hence
\begin{align*}
sI-A=\operatorname{diag}(s+1,s+2,s-5,s-7).
\end{align*}
For $s\notin\{-1,-2,5,7\}$, its inverse is
\begin{align*}
(sI-A)^{-1}=\operatorname{diag}\left(\frac{1}{s+1},\frac{1}{s+2},\frac{1}{s-5},\frac{1}{s-7}\right).
\end{align*}
Since $B=e_1+e_2$, multiplying the diagonal inverse by $B$ gives
\begin{align*}
(sI-A)^{-1}B=\frac{1}{s+1}e_1+\frac{1}{s+2}e_2.
\end{align*}
Applying $C$ to this vector uses $Ce_1=1$ and $Ce_2=0$, so
\begin{align*}
C(sI-A)^{-1}B=C\left(\frac{1}{s+1}e_1+\frac{1}{s+2}e_2\right)=\frac{1}{s+1}Ce_1+\frac{1}{s+2}Ce_2=\frac{1}{s+1}.
\end{align*}
Therefore the zero-state transfer function of the full realization is
\begin{align*}
G(s)=C(sI-A)^{-1}B+D=\frac{1}{s+1}.
\end{align*}
The reachable-observable block is the scalar subsystem
\begin{align*}
\dot z_1=-z_1+u,\quad y=z_1.
\end{align*}
Its transfer function is
\begin{align*}
1\cdot \frac{1}{s-(-1)}\cdot 1=\frac{1}{s+1}.
\end{align*}
Thus this one-dimensional subsystem is the minimal realization extracted from the Kalman decomposition. The eigenvalues $-2$, $5$, and $7$ are still eigenvalues of the full state matrix, but the factors $s+2$, $s-5$, and $s-7$ do not appear in the zero-state transfer function because their coordinates are respectively reachable-unobservable, unreachable-observable, and unreachable-unobservable.
[/example]
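One way to cross-check the extraction above without symbolic algebra is to compare Markov parameters $CA^kB$. For large $|s|$ the transfer function has the expansion $G(s)=D+\sum_{k\ge 0}CA^kB\,s^{-(k+1)}$, so two realizations with the same $D$ and the same Markov parameters have the same transfer function. The sketch below is only a finite check on the first eight parameters; the function name `markov_parameters` and the variable names are illustrative assumptions.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def markov_parameters(A, B, C, count):
    """Return the scalars C A^k B for k = 0, ..., count-1 (single input, single output)."""
    params, M = [], B
    for _ in range(count):
        params.append(matmul(C, M)[0][0])
        M = matmul(A, M)  # advance from A^k B to A^{k+1} B
    return params


# Full four-state realization from the example.
A_full = [[-1, 0, 0, 0], [0, -2, 0, 0], [0, 0, 5, 0], [0, 0, 0, 7]]
B_full = [[1], [1], [0], [0]]
C_full = [[1, 0, 1, 0]]

# Reachable-observable scalar block: dz1/dt = -z1 + u, y = z1.
A_min, B_min, C_min = [[-1]], [[1]], [[1]]

full = markov_parameters(A_full, B_full, C_full, 8)
minimal = markov_parameters(A_min, B_min, C_min, 8)
print(full == minimal)  # True: both lists alternate 1, -1, 1, -1, ...
```

The eigenvalues $-2$, $5$, and $7$ never show up in these numbers, which is the Markov-parameter form of the statement that only the reachable-observable block enters $G(s)$.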
## Feedback, Observers, and Hidden Instability
The last issue is design. Kalman decomposition does not merely simplify realization theory; it tells us which design objectives are mathematically possible. State feedback can move only reachable modes, while output injection can move only observable modes.
[quotetheorem:6391]
[citeproof:6391]
This is the structural reason stabilizability is weaker than reachability: unreachable modes are acceptable only when their autonomous dynamics are already stable. The zero-row hypothesis for $\tilde B$ is essential; if an input channel entered an unreachable block, that block would be reachable by definition and feedback could alter its diagonal dynamics. The theorem does not say that every reachable eigenvalue can be placed without the usual controllability assumptions on the reachable subsystem; it says only that unreachable eigenvalues are excluded from assignment. The observer-side statement is dual, with quotient spaces replaced by annihilating output maps.
[quotetheorem:6392]
[citeproof:6392]
The design interpretation is immediate: detectability is the condition that unobservable modes already decay, just as stabilizability is the condition that unreachable modes already decay. The zero-column hypothesis for $\tilde C$ is essential; if an output measured an unobservable coordinate, that coordinate would not belong to $\mathcal N(A,C)$ and output injection could affect its error dynamics. The theorem does not promise observer pole placement on every observable block unless the observable subsystem satisfies the corresponding dual rank condition. Kalman decomposition places these statements in the same coordinate picture and shows how feedback and observers are dual statements about invariant subspaces.
[example: Inaccessible Unstable Mode]
Take the two-state Kalman-form realization with state $z=(z_1,z_2)$,
\begin{align*}
A=\operatorname{diag}(-1,2),\qquad B=e_1,\qquad Cz=z_1,\qquad D=0.
\end{align*}
The input enters only the first coordinate and the output measures only the first coordinate, so $z_1$ is the reachable-observable coordinate and $z_2$ is the unreachable-unobservable coordinate. For $s\notin\{-1,2\}$,
\begin{align*}
sI-A=\operatorname{diag}(s+1,s-2).
\end{align*}
Hence
\begin{align*}
(sI-A)^{-1}=\operatorname{diag}\left(\frac{1}{s+1},\frac{1}{s-2}\right).
\end{align*}
Multiplying by $B=e_1$ gives
\begin{align*}
(sI-A)^{-1}B=\frac{1}{s+1}e_1.
\end{align*}
Applying $C$ gives
\begin{align*}
G(s)=C(sI-A)^{-1}B+D=\frac{1}{s+1}Ce_1+0=\frac{1}{s+1}.
\end{align*}
Thus every zero-state input-output experiment sees only the stable pole $-1$, while the hidden pole $2$ is absent from the transfer function.
The hidden coordinate still appears in the internal state equation. With $u=0$, the coordinate equations are $\dot z_1=-z_1$ and $\dot z_2=2z_2$. Since $\frac{d}{dt}(e^{-2t}z_2(t))=e^{-2t}\dot z_2(t)-2e^{-2t}z_2(t)=0$, the second equation has solution
\begin{align*}
z_2(t)=e^{2t}z_2(0).
\end{align*}
Therefore any initial condition with $z_2(0)\neq 0$ produces exponential growth in the hidden state.
Now apply state feedback $u=Kz+v$ with $Kz=k_1z_1+k_2z_2$. Since $BKz=(k_1z_1+k_2z_2)e_1$, the feedback enters only the first coordinate, whose closed-loop equation is
\begin{align*}
\dot z_1=(-1+k_1)z_1+k_2z_2+v.
\end{align*}
The hidden coordinate still satisfies
\begin{align*}
\dot z_2=2z_2.
\end{align*}
Equivalently, $A+BK$ is upper triangular with diagonal entries $-1+k_1$ and $2$, so the unreachable eigenvalue $2$ remains fixed.
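Similarly, for an observer gain $L=(\ell_1,\ell_2)^\top$, the error dynamics use $A-LC$. Write the error vector as $e=(e_1,e_2)^\top$, where $e_1$ and $e_2$ here denote error coordinates rather than basis vectors. Since $Ce=e_1$, the first error coordinate satisfies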
\begin{align*}
\dot e_1=(-1-\ell_1)e_1.
\end{align*}
The second error coordinate satisfies
\begin{align*}
\dot e_2=-\ell_2 e_1+2e_2.
\end{align*}
Thus $A-LC$ is lower triangular with diagonal entries $-1-\ell_1$ and $2$, so the unobservable eigenvalue $2$ also remains fixed. The unstable mode is invisible in the transfer function, immovable by the given input, and not reconstructible from the given output.
[/example]
This example is a warning about the difference between external stability and internal stability. A nonminimal realization can look harmless through its transfer function while containing unstable autonomous dynamics invisible to experiments performed from zero initial state.
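The immovability of the hidden eigenvalue in the inaccessible-unstable-mode example can be spot-checked for a handful of gains. This is a small sketch under the example's data $A=\operatorname{diag}(-1,2)$, $B=e_1$, $Cz=z_1$; the helper names and the sample gains are arbitrary illustrative choices, and a finite sample is of course a sanity check rather than a proof.

```python
A = [[-1, 0], [0, 2]]
B = [1, 0]   # input enters only the first coordinate
C = [1, 0]   # output measures only the first coordinate


def char_poly_at(M, s):
    """Evaluate det(sI - M) for a 2x2 matrix M."""
    return (s - M[0][0]) * (s - M[1][1]) - M[0][1] * M[1][0]


def closed_loop(K):
    """State-feedback matrix A + BK for a row gain K = (k1, k2)."""
    return [[A[i][j] + B[i] * K[j] for j in range(2)] for i in range(2)]


def observer_error(L):
    """Observer error matrix A - LC for a column gain L = (l1, l2)."""
    return [[A[i][j] - L[i] * C[j] for j in range(2)] for i in range(2)]


sample_gains = [(0, 0), (-3, 5), (10, -7), (-100, 42)]

# s = 2 is a root of the characteristic polynomial for every sampled gain.
feedback_fixed = all(char_poly_at(closed_loop(K), 2) == 0 for K in sample_gains)
observer_fixed = all(char_poly_at(observer_error(L), 2) == 0 for L in sample_gains)
print(feedback_fixed, observer_fixed)  # True True
```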
## Summary of the Decomposition Viewpoint
Kalman decomposition organizes a realization into four state classes determined by reachability and observability. Only the reachable-observable part contributes to the transfer function and survives in a minimal realization. Minimal realizations are reachable, observable, and unique up to similarity, so they are the correct state-space representatives of a transfer function.
For design, the same block form identifies immovable modes. State feedback cannot assign unreachable eigenvalues, and observer injection cannot assign unobservable eigenvalues. The practical message is that before designing gains, one should know which modes are controlled, which modes are measured, and which modes are merely artifacts or hidden liabilities of the chosen realization.
The Kalman decomposition tells us which modes are usable, but not yet how to move them. Chapter 7 turns the reachable part of the system into a design object, using state feedback to assign closed-loop poles where the structure allows it.
# 7. State Feedback and Pole Placement
State feedback is the first point in the course where structural properties become design tools. Chapters 2 and 3 described how spectra control stability and which states can be reached; here the question is whether measuring the full state allows us to reshape the dynamics by choosing the input as a function of the state. The central message is that controllable modes can be assigned arbitrary closed-loop eigenvalues, while uncontrollable modes remain fixed and must already be stable if stabilisation is to be possible.
## Static State Feedback and Closed-Loop Dynamics
The design problem begins with a plant whose state is available for measurement. If the open-loop dynamics have undesirable eigenvalues, we ask whether a linear control law can move them without changing the state variables or adding controller dynamics.
[definition: Static State Feedback]
Let $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, and consider the system
\begin{align*}
\dot{x} = Ax + Bu.
\end{align*}
A static state feedback law with external input $v: [0,\infty) \to \mathbb R^m$ is the map
\begin{align*}
F_K: \mathbb R^n \times \mathbb R^m \to \mathbb R^m, \qquad F_K(x,v)=Kx+v,
\end{align*}
where $K \in \mathbb R^{m \times n}$ is the feedback gain.
[/definition]
The gain $K$ is useful only through the differential equation it creates after substitution. Since all stability and pole-placement questions will be asked about that substituted equation, we need a name for the matrix that replaces $A$ in the autonomous part. This motivates the closed-loop matrix definition.
[definition: Closed-Loop Matrix]
For the system $\dot{x}=Ax+Bu$ under the static state feedback $u=Kx+v$, the closed-loop matrix is
\begin{align*}
A_K := A + BK.
\end{align*}
[/definition]
Substituting the feedback law gives the closed-loop dynamics
\begin{align*}
\dot{x} = A_Kx + Bv.
\end{align*}
The external input $v$ is kept in the formula because feedback design is usually only the first layer of a controller. It shapes the autonomous part $\dot{x}=A_Kx$ while preserving an input channel for tracking, disturbance rejection, or later optimal design.
[example: Double Integrator Feedback Form]
Consider the double integrator
\begin{align*}
\dot{x}_1=x_2,\qquad \dot{x}_2=u.
\end{align*}
Here $A_{12}=1$, all other entries of $A$ are zero, and $B=e_2$, so $Ax=(x_2,0)^\top$ and $Bu=(0,u)^\top$. For $K=(k_1\ k_2)$, the feedback law is
\begin{align*}
u=Kx+v=k_1x_1+k_2x_2+v.
\end{align*}
Substituting this into the system gives
\begin{align*}
\dot{x}_1=x_2.
\end{align*}
\begin{align*}
\dot{x}_2=k_1x_1+k_2x_2+v.
\end{align*}
The closed-loop matrix $A+BK$ has entries $(A+BK)_{11}=0$, $(A+BK)_{12}=1$, $(A+BK)_{21}=k_1$, and $(A+BK)_{22}=k_2$. Hence $sI-(A+BK)$ has entries $s$, $-1$, $-k_1$, and $s-k_2$, so the $2\times 2$ determinant formula gives
\begin{align*}
\det(sI-(A+BK))=s(s-k_2)-(-1)(-k_1).
\end{align*}
Therefore
\begin{align*}
\det(sI-(A+BK))=s^2-k_2s-k_1.
\end{align*}
If the desired poles are $\lambda_1$ and $\lambda_2$, then the desired characteristic polynomial is
\begin{align*}
(s-\lambda_1)(s-\lambda_2)=s^2-(\lambda_1+\lambda_2)s+\lambda_1\lambda_2.
\end{align*}
Matching coefficients with $s^2-k_2s-k_1$ gives $-k_2=-(\lambda_1+\lambda_2)$ and $-k_1=\lambda_1\lambda_2$, so
\begin{align*}
k_2=\lambda_1+\lambda_2,\qquad k_1=-\lambda_1\lambda_2.
\end{align*}
Thus the feedback entries set the two coefficients of the closed-loop polynomial: $k_2$ controls the trace term, while $k_1$ controls the determinant term.
[/example]
The double integrator is special in that each gain entry sets exactly one coefficient of the closed-loop polynomial. In higher dimension the gain does not usually act coefficient-by-coefficient, so we need structural tests that say when every desired polynomial can be obtained.
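For the double integrator, the coefficient matching $k_2=\lambda_1+\lambda_2$, $k_1=-\lambda_1\lambda_2$ translates directly into code. The sketch below is illustrative only; the function names are assumptions, and the check that the poles are real or a conjugate pair is there because only then is the gain real.

```python
def double_integrator_gain(lam1, lam2):
    """Return (k1, k2) so that s^2 - k2 s - k1 = (s - lam1)(s - lam2)."""
    total = complex(lam1) + complex(lam2)
    product = complex(lam1) * complex(lam2)
    if abs(total.imag) > 1e-12 or abs(product.imag) > 1e-12:
        raise ValueError("poles must be real or a complex-conjugate pair")
    return -product.real, total.real


def closed_loop_poly(k1, k2, s):
    """Evaluate det(sI - (A + BK)) = s^2 - k2 s - k1 at s."""
    return s * (s - k2) - k1


k1, k2 = double_integrator_gain(-1 + 2j, -1 - 2j)
print(k1, k2)  # -5.0 -2.0, so the closed-loop polynomial is s^2 + 2s + 5
```

Both requested poles are roots of the resulting polynomial, which can be confirmed by evaluating `closed_loop_poly(k1, k2, s)` at each of them.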
[definition: Closed-Loop Spectrum]
The closed-loop spectrum of $\dot{x}=Ax+Bu$ under $u=Kx+v$ is the spectrum
\begin{align*}
\sigma(A+BK).
\end{align*}
[/definition]
The corresponding closed-loop characteristic polynomial is
\begin{align*}
p_K(s) := \det(sI-A-BK).
\end{align*}
The first obstruction is immediate from uncontrollable directions. If a state component cannot be affected by any input, feedback cannot move the eigenvalues supported entirely on that component.
[definition: Stabilisability]
The pair $(A,B)$ is stabilisable if there exists $K \in \mathbb R^{m \times n}$ such that $A+BK$ is Hurwitz, meaning every eigenvalue of $A+BK$ has negative real part.
[/definition]
Stabilisability is weaker than controllability. It allows uncontrollable modes, provided each such mode already has negative real part and therefore does not obstruct asymptotic stability.
[example: One Uncontrollable Stable Mode]
Let $A$ have entries $A_{12}=1$ and $A_{33}=-2$, all other entries zero, and let $B=e_2$. Then
\begin{align*}
Ax=(x_2,0,-2x_3)^\top
\end{align*}
and
\begin{align*}
Bu=(0,u,0)^\top.
\end{align*}
For $K=(k_1\ k_2\ 0)$, the feedback law is
\begin{align*}
u=Kx+v=k_1x_1+k_2x_2+v.
\end{align*}
Substitution gives
\begin{align*}
\dot{x}_1=x_2.
\end{align*}
\begin{align*}
\dot{x}_2=k_1x_1+k_2x_2+v.
\end{align*}
\begin{align*}
\dot{x}_3=-2x_3.
\end{align*}
Thus the third state remains decoupled from both $u$ and $v$, so the eigenvalue $-2$ is fixed under this feedback.
The closed-loop matrix $A+BK$ has entries $(A+BK)_{12}=1$, $(A+BK)_{21}=k_1$, $(A+BK)_{22}=k_2$, $(A+BK)_{33}=-2$, and all other entries zero. Hence $sI-(A+BK)$ has a decoupled third coordinate with diagonal entry $s+2$, so its determinant is the product of $s+2$ and the determinant of the first two coordinates:
\begin{align*}
\det(sI-(A+BK))=(s+2)\det\begin{pmatrix} s&-1\\ -k_1&s-k_2\end{pmatrix}.
\end{align*}
Using the $2\times 2$ determinant formula,
\begin{align*}
\det\begin{pmatrix} s&-1\\ -k_1&s-k_2\end{pmatrix}=s(s-k_2)-(-1)(-k_1).
\end{align*}
Therefore
\begin{align*}
\det(sI-(A+BK))=(s+2)(s^2-k_2s-k_1).
\end{align*}
To place the controllable two-dimensional part at desired poles $\lambda_1$ and $\lambda_2$, where $\operatorname{Re}\lambda_1<0$ and $\operatorname{Re}\lambda_2<0$, match
\begin{align*}
s^2-k_2s-k_1=(s-\lambda_1)(s-\lambda_2).
\end{align*}
Expanding the right-hand side gives
\begin{align*}
(s-\lambda_1)(s-\lambda_2)=s^2-(\lambda_1+\lambda_2)s+\lambda_1\lambda_2.
\end{align*}
Comparing coefficients gives
\begin{align*}
-k_2=-(\lambda_1+\lambda_2).
\end{align*}
\begin{align*}
-k_1=\lambda_1\lambda_2.
\end{align*}
Thus
\begin{align*}
k_2=\lambda_1+\lambda_2.
\end{align*}
\begin{align*}
k_1=-\lambda_1\lambda_2.
\end{align*}
With this choice, the closed-loop spectrum is $\{\lambda_1,\lambda_2,-2\}$. The two assignable poles are placed in the open left half-plane, and the uncontrollable pole $-2$ was already stable, so the whole closed-loop system is stabilised.
[/example]
This example previews the general stabilisability criterion: the only uncontrollable modes that matter for stabilisation are those with nonnegative real part. We next make pole assignment precise for controllable systems, then return to this weaker condition.
## Pole Assignment for Controllable Systems
The main design question is now exact: given a desired monic polynomial of degree $n$, can we find a static feedback gain whose closed-loop characteristic polynomial equals it? For single-input systems the answer is constructive once the system is controllable.
[definition: Controllability Matrix]
For $A \in \mathbb R^{n\times n}$ and $B \in \mathbb R^{n\times 1}$, the controllability matrix is
\begin{align*}
\mathcal C(A,B) := \begin{pmatrix}B&AB&\cdots&A^{n-1}B\end{pmatrix}.
\end{align*}
[/definition]
When $\mathcal C(A,B)$ is invertible, the ordered list $B,AB,\dots,A^{n-1}B$ is a basis of $\mathbb R^n$. From this basis one can construct coordinates in which the system has companion form, where changing the last row changes the characteristic polynomial directly. This coefficient control is exactly the mechanism needed to prove that controllability implies arbitrary pole assignment.
[quotetheorem:6393]
[citeproof:6393]
The theorem is often stated as pole assignment: specifying a multiset of complex numbers closed under conjugation is the same as specifying a real monic characteristic polynomial. The controllability hypothesis is necessary, not merely technical: if $A=\operatorname{diag}(1,-1)$ and $B=e_2$, then the first state is unaffected by the input, so $1$ remains an eigenvalue of $A+BK$ for every $K$. Thus the theorem does not say that arbitrary eigenvalues can be assigned for every plant; it says that controllability is exactly the structural condition under which no open-loop eigenvalue is fixed by the input geometry. Repeated roots are allowed, although numerical algorithms may become ill-conditioned when desired poles are clustered.
[example: Repeated Poles In Companion Form]
Consider the companion-form single-input system with $B=e_3$ and with $A_{12}=A_{23}=1$, $A_{31}=-a_0$, $A_{32}=-a_1$, $A_{33}=-a_2$, all other entries zero. Thus $A$ has first row $(0,1,0)$, second row $(0,0,1)$, and third row $(-a_0,-a_1,-a_2)$. For $K=(k_0\ k_1\ k_2)$, the product $BK=e_3K$ has first row $(0,0,0)$, second row $(0,0,0)$, and third row $(k_0,k_1,k_2)$. Therefore $A+BK$ has first row $(0,1,0)$, second row $(0,0,1)$, and third row
\begin{align*}
(-a_0+k_0,\,-a_1+k_1,\,-a_2+k_2).
\end{align*}
We compute the characteristic polynomial from $sI-(A+BK)$. Its first row is $(s,-1,0)$, its second row is $(0,s,-1)$, and its third row is
\begin{align*}
(a_0-k_0,\,a_1-k_1,\,s+a_2-k_2).
\end{align*}
Expanding along the first row gives
\begin{align*}
\det(sI-A-BK)=s\bigl(s(s+a_2-k_2)-(-1)(a_1-k_1)\bigr)-(-1)\bigl(0\cdot(s+a_2-k_2)-(-1)(a_0-k_0)\bigr).
\end{align*}
The first minor is
\begin{align*}
s(s+a_2-k_2)-(-1)(a_1-k_1)=s(s+a_2-k_2)+a_1-k_1.
\end{align*}
The second minor is
\begin{align*}
0\cdot(s+a_2-k_2)-(-1)(a_0-k_0)=a_0-k_0.
\end{align*}
Substituting these values into the expansion gives
\begin{align*}
\det(sI-A-BK)=s\bigl(s(s+a_2-k_2)+a_1-k_1\bigr)+(a_0-k_0).
\end{align*}
Expanding the right-hand side,
\begin{align*}
\det(sI-A-BK)=s^3+(a_2-k_2)s^2+(a_1-k_1)s+(a_0-k_0).
\end{align*}
To assign the repeated pole $-\alpha$, the desired characteristic polynomial is $(s+\alpha)^3$. First,
\begin{align*}
(s+\alpha)^2=s^2+2\alpha s+\alpha^2.
\end{align*}
Hence
\begin{align*}
(s+\alpha)^3=(s+\alpha)(s^2+2\alpha s+\alpha^2).
\end{align*}
Expanding term by term,
\begin{align*}
(s+\alpha)^3=s^3+2\alpha s^2+\alpha^2s+\alpha s^2+2\alpha^2s+\alpha^3.
\end{align*}
Combining like powers of $s$ gives
\begin{align*}
(s+\alpha)^3=s^3+3\alpha s^2+3\alpha^2s+\alpha^3.
\end{align*}
Matching this with
\begin{align*}
s^3+(a_2-k_2)s^2+(a_1-k_1)s+(a_0-k_0)
\end{align*}
requires
\begin{align*}
a_2-k_2=3\alpha,\qquad a_1-k_1=3\alpha^2,\qquad a_0-k_0=\alpha^3.
\end{align*}
Solving these three equations gives
\begin{align*}
k_0=a_0-\alpha^3,\qquad k_1=a_1-3\alpha^2,\qquad k_2=a_2-3\alpha.
\end{align*}
With this choice, $\det(sI-A-BK)=(s+\alpha)^3$, so the closed-loop matrix has the repeated pole $-\alpha$ with algebraic multiplicity $3$.
[/example]
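The companion-form recipe above amounts to subtracting desired coefficients from plant coefficients. The following sketch implements that recipe for the three-state case and verifies the result by evaluating the determinant at several integer points. The plant coefficients $(a_0,a_1,a_2)=(2,-1,4)$ and the value $\alpha=3$ are arbitrary sample data, and all function names are illustrative.

```python
from fractions import Fraction


def poly_from_roots(roots):
    """Coefficients [c0, c1, ..., 1], lowest degree first, of prod (s - r)."""
    coeffs = [Fraction(1)]
    for r in roots:
        shifted = [Fraction(0)] + coeffs                   # s * p(s)
        scaled = [-r * c for c in coeffs] + [Fraction(0)]  # -r * p(s)
        coeffs = [x + y for x, y in zip(shifted, scaled)]
    return coeffs


def companion_gain(a, roots):
    """Gain (k0, ..., k_{n-1}) with k_i = a_i - d_i, d_i the desired coefficients."""
    desired = poly_from_roots(roots)[:-1]  # drop the leading 1
    return [ai - di for ai, di in zip(a, desired)]


def companion_closed_loop(a, k):
    """A + BK for the companion pair with last row (-a_i + k_i) and B = e_n."""
    n = len(a)
    M = [[Fraction(int(j == i + 1)) for j in range(n)] for i in range(n - 1)]
    M.append([-ai + ki for ai, ki in zip(a, k)])
    return M


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


a = [2, -1, 4]   # sample plant coefficients a0, a1, a2
alpha = 3        # desired repeated pole at -alpha
k = companion_gain(a, [-alpha] * 3)
closed = companion_closed_loop(a, k)


def char_poly_at(s):
    """Evaluate det(sI - (A + BK)) at s."""
    return det3([[(s if i == j else 0) - closed[i][j] for j in range(3)]
                 for i in range(3)])


print(k == [-25, -28, -5])  # True: k_i = a_i - (alpha^3, 3 alpha^2, 3 alpha)
```

A monic cubic is determined by its values at four points, so agreement of `char_poly_at(s)` with $(s+\alpha)^3$ at four or more integers confirms the assignment.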
Companion form gives an exact coefficient-matching recipe, but it is inefficient to compute a full coordinate transformation by hand for each system. The underlying problem is to extract the same last-row coefficient information directly from the original coordinates. Controllability is the necessary condition: only when the controllability matrix is invertible can one isolate the coordinate direction on which the desired characteristic polynomial should act.
[quotetheorem:6394]
[citeproof:6394]
Ackermann's formula is valuable conceptually and for small examples. Its controllability hypothesis is exactly what makes the expression meaningful: if $\mathcal C(A,B)$ is singular, the row $e_n^\top\mathcal C(A,B)^{-1}$ is not defined, and this algebraic failure reflects an actual fixed mode rather than a defect of notation. For example, if $A=\operatorname{diag}(1,-1)$ and $B=e_2$, then $\mathcal C(A,B)$ has first row zero and no feedback gain can remove the unstable pole at $1$. In numerical work, pole placement is usually computed by more stable algorithms because forming high powers of $A$ and inverting $\mathcal C(A,B)$ can amplify conditioning problems.
[example: Ackermann Formula For The Double Integrator]
For the double integrator, $A e_1=0$, $A e_2=e_1$, and $B=e_2$. Hence
\begin{align*}
AB=Ae_2=e_1.
\end{align*}
Therefore the controllability matrix has first column $e_2$ and second column $e_1$:
\begin{align*}
\mathcal C(A,B)=\begin{pmatrix}0&1\\1&0\end{pmatrix}.
\end{align*}
This matrix swaps $e_1$ and $e_2$, so applying it twice fixes both basis vectors. Thus
\begin{align*}
\mathcal C(A,B)^2=I,
\end{align*}
and hence $\mathcal C(A,B)^{-1}=\mathcal C(A,B)$.
Let the desired polynomial be $q(s)=s^2+a_1s+a_0$. Since $Ae_1=0$ and $Ae_2=e_1$, we have
\begin{align*}
A^2e_1=A0=0.
\end{align*}
Also,
\begin{align*}
A^2e_2=Ae_1=0.
\end{align*}
Thus $A^2=0$, and substituting $A$ into $q$ gives
\begin{align*}
q(A)=A^2+a_1A+a_0I=a_1A+a_0I.
\end{align*}
The matrix $a_1A+a_0I$ has first row $(a_0,a_1)$ and second row $(0,a_0)$.
Ackermann's formula gives
\begin{align*}
K=-e_2^\top\mathcal C(A,B)^{-1}q(A).
\end{align*}
Because $\mathcal C(A,B)$ swaps the two coordinates,
\begin{align*}
e_2^\top\mathcal C(A,B)^{-1}=e_2^\top\mathcal C(A,B)=e_1^\top.
\end{align*}
Therefore
\begin{align*}
K=-e_1^\top q(A).
\end{align*}
Since the first row of $q(A)$ is $(a_0,a_1)$, this becomes
\begin{align*}
K=\begin{pmatrix}-a_0&-a_1\end{pmatrix}.
\end{align*}
With this gain, $BK=e_2K$, so $BK$ has first row $(0,0)$ and second row $(-a_0,-a_1)$. Adding this to $A$, whose first row is $(0,1)$ and second row is $(0,0)$, gives $A+BK$ with first row $(0,1)$ and second row $(-a_0,-a_1)$. Hence $sI-(A+BK)$ has first row $(s,-1)$ and second row $(a_0,s+a_1)$. Using the $2\times 2$ determinant formula,
\begin{align*}
\det(sI-A-BK)=s(s+a_1)-(-1)a_0=s^2+a_1s+a_0.
\end{align*}
Thus Ackermann's formula produces exactly the feedback gain whose closed-loop characteristic polynomial is the prescribed polynomial $q(s)$.
[/example]
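The computation in the example above can be mirrored by a short exact-arithmetic sketch of $K=-e_n^\top\mathcal C(A,B)^{-1}q(A)$ for a single-input pair, using the sign convention $u=Kx+v$ of this chapter. The function names `solve` and `ackermann` and the sample polynomial $q(s)=s^2+5s+6$ are assumptions made for illustration. Instead of inverting $\mathcal C(A,B)$, the sketch solves $w\,\mathcal C(A,B)=e_n^\top$ for the row $w$, and it is intended for small examples only, in line with the conditioning caveat stated before the example.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination; raise if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, b)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if pivot is None:
            raise ValueError("controllability matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def ackermann(A, b, q):
    """Gain K with det(sI - A - bK) = s^n + q[n-1] s^(n-1) + ... + q[0].

    A is n x n, b is the input column as a flat list, q lists q0, ..., q_{n-1}.
    """
    n = len(A)
    # Columns b, Ab, ..., A^{n-1}b of the controllability matrix.
    cols, v = [], list(b)
    for _ in range(n):
        cols.append(v)
        v = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    # w = e_n^T C^{-1} satisfies w . (A^j b) = 1 if j = n-1, else 0.
    w = solve(cols, [0] * (n - 1) + [1])
    # q(A) by Horner's scheme, starting from the leading coefficient 1.
    Q = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for c in reversed(q):
        Q = matmul(A, Q)
        for i in range(n):
            Q[i][i] += c
    return [-sum(w[i] * Q[i][j] for i in range(n)) for j in range(n)]


# Double integrator with desired polynomial s^2 + 5s + 6.
K = ackermann([[0, 1], [0, 0]], [0, 1], [6, 5])
print(K == [-6, -5])  # True, matching K = (-a0, -a1)
```

For the uncontrollable pair $A=\operatorname{diag}(1,-1)$, $B=e_2$, the call raises `ValueError`, which is the computational counterpart of the singular controllability matrix discussed above.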
The single-input theorem captures the cleanest version of pole assignment, but most physical systems have several actuators. Multiple inputs add freedom rather than a new obstruction: a controllable multi-input pair can be reduced to chains of single-input companion blocks, and feedback can coordinate those chains to prescribe the full closed-loop spectrum.
[quotetheorem:6395]
[citeproof:6395]
The theorem does not say that every gain is acceptable from an engineering viewpoint. It says that the algebraic obstruction has disappeared; robustness, actuator size, saturation, and modelling error remain separate design constraints. The controllability assumption is still necessary: for $A=\operatorname{diag}(1,-1)$ and $B=e_2$, the first coordinate is never actuated, so $1$ is an eigenvalue of $A+BK$ for every $K\in\mathbb R^{1\times 2}$. A separate caution is that the single-input coefficient-matching picture does not automatically generalise by selecting one input direction; multi-input systems require the full controllability-chain structure.
## Stabilisation Under Weaker Structural Conditions
Full pole assignment is more than is needed for asymptotic stability. Stabilisation asks only whether all closed-loop poles can be put in the open left half-plane, so stable uncontrollable modes may be tolerated.
[quotetheorem:6396]
[citeproof:6396]
This theorem turns stabilisation into a decomposition statement: separate controllable dynamics from uncontrollable dynamics, then check where the fixed eigenvalues lie. Its hypothesis cannot be weakened to the existence of some stable controllable modes: if $A=\operatorname{diag}(1,-1)$ and $B=e_2$, the unstable eigenvalue $1$ is uncontrollable and remains present under every feedback gain, so no stabilising feedback exists. The theorem also does not promise arbitrary placement of the uncontrollable stable modes; it only says that their fixed locations do not prevent asymptotic stability. In computations, explicitly building the Kalman decomposition can be more work than necessary, especially when the only question is whether unstable modes are actuated. This motivates the PBH stabilisability criterion, which replaces the decomposition by a rank test at closed-right-half-plane spectral values.
[quotetheorem:6397]
[citeproof:6397]
The PBH form is often the most useful test because it isolates only the dangerous part of the spectrum. It is not a full controllability test: stable uncontrollable eigenvalues may fail the full PBH condition, but they do not prevent feedback stabilisation. The restriction to $\operatorname{Re}\lambda\ge 0$ is what makes the test match stabilisability rather than controllability. If $A=\operatorname{diag}(1,-1)$ and $B=e_2$, then the rank condition fails at $\lambda=1$, and this failure exactly records the immovable unstable mode. By contrast, for a pair whose only rank failure occurs at $\lambda=-1$, stabilisation is not obstructed, though arbitrary pole placement is. The criterion therefore separates the stabilisation problem from the stronger design problem of assigning every closed-loop pole.
[example: PBH Test With A Stable Uncontrollable Mode]
Let $A$ have rows $(0,1,0)$, $(0,0,0)$, and $(0,0,-2)$, and let $B=e_2=(0,1,0)^\top$. We compute the open-loop eigenvalues and then apply the *PBH Stabilisability Criterion*. Since $sI-A$ is upper triangular with diagonal entries $s$, $s$, and $s+2$, its determinant is
\begin{align*}
\det(sI-A)=s\cdot s\cdot (s+2)=s^2(s+2).
\end{align*}
Thus the eigenvalues of $A$ are $0$, $0$, and $-2$.
For a scalar $\lambda$, the PBH matrix $\begin{pmatrix}\lambda I-A&B\end{pmatrix}$ has rows
\begin{align*}
(\lambda,-1,0,0),\qquad (0,\lambda,0,1),\qquad (0,0,\lambda+2,0).
\end{align*}
At the stable eigenvalue $\lambda=-2$, these rows become
\begin{align*}
(-2,-1,0,0),\qquad (0,-2,0,1),\qquad (0,0,0,0).
\end{align*}
The third row is zero, so the rank is at most $2$. Hence the mode at $-2$ is uncontrollable, but $\operatorname{Re}(-2)<0$, so this rank failure is not an obstruction to stabilisability.
It remains to check the only eigenvalue in the closed right half-plane, namely $\lambda=0$. Substitution gives the three rows
\begin{align*}
r_1=(0,-1,0,0),\qquad r_2=(0,0,0,1),\qquad r_3=(0,0,2,0).
\end{align*}
If $c_1r_1+c_2r_2+c_3r_3=0$, then comparing columns $2$, $4$, and $3$ gives
\begin{align*}
-c_1=0,\qquad c_2=0,\qquad 2c_3=0.
\end{align*}
Therefore $c_1=c_2=c_3=0$, so the three rows are linearly independent and
\begin{align*}
\operatorname{rank}\begin{pmatrix}0I-A&B\end{pmatrix}=3.
\end{align*}
Every open-loop eigenvalue with nonnegative real part therefore passes the PBH rank test, and the pair $(A,B)$ is stabilisable even though the stable eigenvalue $-2$ is uncontrollable.
[/example]
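The rank computations in this example can be reproduced mechanically. The following is a minimal Python sketch rather than part of the formal development: the helper names `rank` and `pbh_matrix` are illustrative, exact rational arithmetic from the standard library stands in for a numerical linear-algebra package, and only real eigenvalues are handled, which is all this example needs.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a rational matrix by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0                                   # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def pbh_matrix(A, B, lam):
    """Rows of the block matrix [lam*I - A, B] for a real scalar lam."""
    n = len(A)
    return [[(lam if i == j else 0) - A[i][j] for j in range(n)] + list(B[i])
            for i in range(n)]

# Data of the example; the eigenvalues are read off from det(sI - A) = s^2 (s + 2).
A = [[0, 1, 0], [0, 0, 0], [0, 0, -2]]
B = [[0], [1], [0]]
eigenvalues = [0, -2]

pbh_ranks = {lam: rank(pbh_matrix(A, B, lam)) for lam in eigenvalues}
# Stabilisability only needs full rank at eigenvalues with nonnegative real part.
stabilisable = all(pbh_ranks[lam] == len(A) for lam in eigenvalues if lam >= 0)
print(pbh_ranks, stabilisable)
```

The dictionary records full rank $3$ at $\lambda=0$ and rank $2$ at $\lambda=-2$, matching the hand computation: the rank drop occurs only at a stable eigenvalue, so the stabilisability flag is true.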
The chapter's design logic can now be summarised in three levels. Controllability permits arbitrary pole assignment. Stabilisability permits enough pole assignment to make the system asymptotically stable. The PBH rank test identifies the difference by checking controllability only at eigenvalues that could obstruct stability.
State feedback gives control of the dynamics only when the full state is available. Chapter 8 relaxes that assumption and develops observers, so that the missing state can be reconstructed from measured outputs and known inputs.
# 8. Observers and State Estimation Without Noise
State feedback assumes that the full state $x(t)$ is available for measurement, but most physical systems only expose selected outputs $y(t)=Cx(t)$. This chapter asks when the missing state variables can be reconstructed from the measured output and the known input, without modelling process noise or sensor noise. The main mechanism is the Luenberger observer: a copy of the plant corrected by output prediction error. The design theory mirrors the state-feedback pole-placement theory of Chapter 7, with observability and detectability replacing reachability and stabilisability.
## Estimation from Output Measurements
The central problem is to build a dynamical system whose state $\tilde{x}(t)$ converges to the true state $x(t)$ while using only the known input $u(t)$ and the measured output $y(t)$. If convergence is required for every initial mismatch, the dynamics of the mismatch must be autonomous and stable. This requirement is the reason the observer correction is injected through the output error $y-C\tilde{x}$.
[definition: Luenberger Observer]
Let $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, and $C \in \mathbb R^{p \times n}$. For the plant
\begin{align*}
\dot{x} &= Ax+Bu, & y&=Cx,
\end{align*}
a Luenberger observer with observer gain $L \in \mathbb R^{n \times p}$ is the system
\begin{align*}
\dot{\tilde{x}} = A\tilde{x}+Bu+L(y-C\tilde{x}).
\end{align*}
[/definition]
Its observer state trajectory is a map $\tilde{x}:[0,\infty)\to\mathbb R^n$, and the estimation error is $e:=x-\tilde{x}$. The observer contains the same input channel as the plant, so the known forcing cancels out of the error equation. The remaining dynamics are controlled only by the matrix $A-LC$, and this turns observer design into a matrix assignment problem.
[quotetheorem:6398]
[citeproof:6398]
This theorem separates estimation from the particular input signal, but it also shows exactly where the model-matching hypothesis enters. If the observer uses the wrong input channel, for instance if the plant is $\dot x=u$ and $y=x$ but the observer omits $u$, then $\dot e=u-Le$, so a persistent input prevents convergence even when $L>0$. The theorem also does not say that every choice of $L$ works: if $C=0$ and $A$ has an unstable eigenvalue, then $A-LC=A$ for every gain and the output contains no information with which to correct that error. This motivates the next structural question: which directions in the state can actually be influenced by output injection, and which invisible directions must already be stable?
[example: Velocity Estimation from Position]
Consider the double integrator
\begin{align*}
\dot{x}_1=x_2,\qquad \dot{x}_2=u,\qquad y=x_1.
\end{align*}
Here position is measured and velocity is not. The plant matrices are determined by $A_{12}=1$, all other entries of $A$ equal to $0$, $B=(0,1)^\top$, and $C=(1,0)$. For $L=(\ell_1,\ell_2)^\top$, the product $LC$ has first row $(\ell_1,0)$ and second row $(\ell_2,0)$, because each entry is $(LC)_{ij}=L_iC_j$. Therefore $A-LC$ has first row $(-\ell_1,1)$ and second row $(-\ell_2,0)$.
We compute the characteristic polynomial from the $2\times 2$ determinant formula:
\begin{align*}
\det(sI-(A-LC))=\det\begin{pmatrix}s+\ell_1&-1\\ \ell_2&s\end{pmatrix}.
\end{align*}
Thus
\begin{align*}
\det(sI-(A-LC))=(s+\ell_1)s-(-1)\ell_2.
\end{align*}
Expanding the two terms gives
\begin{align*}
(s+\ell_1)s-(-1)\ell_2=s^2+\ell_1s+\ell_2.
\end{align*}
If $\ell_1=2\omega$ and $\ell_2=\omega^2$, then
\begin{align*}
s^2+\ell_1s+\ell_2=s^2+2\omega s+\omega^2=(s+\omega)^2.
\end{align*}
Hence both observer error eigenvalues are $-\omega$. For $\omega>0$, the error dynamics are stable, and the unmeasured velocity estimate converges with a rate controlled by the chosen pole location $-\omega$.
[/example]
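A short simulation makes the convergence claim concrete. The sketch below is illustrative only: it uses forward-Euler integration, and the initial conditions, the input $u(t)=\sin t$, the step size, and the function name `simulate` are all assumptions chosen for the demonstration rather than data from the example.

```python
import math

def simulate(omega, T=10.0, dt=1e-3):
    """Forward-Euler run of the double integrator and its Luenberger observer.

    Returns the final estimation error (e1, e2) = x - x_hat.
    """
    l1, l2 = 2.0 * omega, omega ** 2      # gains giving (s + omega)^2
    x1, x2 = 1.0, -0.5                    # true state (hypothetical initial condition)
    xh1, xh2 = 0.0, 0.0                   # observer starts with no knowledge
    t = 0.0
    for _ in range(int(round(T / dt))):
        u = math.sin(t)                   # arbitrary known input
        innov = x1 - xh1                  # output error y - C x_hat, with y = x1
        dx1, dx2 = x2, u                  # plant
        dxh1, dxh2 = xh2 + l1 * innov, u + l2 * innov   # plant copy plus correction
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        xh1, xh2 = xh1 + dt * dxh1, xh2 + dt * dxh2
        t += dt
    return x1 - xh1, x2 - xh2

e1, e2 = simulate(omega=2.0)
print(e1, e2)
```

Because the plant and the observer receive the same input, the input cancels from the error recursion, and with $\omega=2$ both error components are negligible after ten time units. Changing the input inside `simulate` should leave the returned errors essentially unchanged.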
The example shows the practical meaning of output injection: the measured position error corrects both the position estimate and, through the second component of $L$, the velocity estimate. This succeeds because the hidden velocity leaves a signature in the derivative of the measured position. The next structural question is how to recognise, from $(C,A)$ alone, which state directions leave such signatures and which directions cannot be recovered from output data.
## Observability and Detectability for Estimation
If a state direction never affects the measured output, no deterministic observer can infer its initial value from output data. This obstruction is harmless only when the hidden direction already decays under the plant dynamics. Detectability is the condition that captures exactly this weaker requirement.
[definition: Observability Matrix]
For $A \in \mathbb R^{n \times n}$ and $C \in \mathbb R^{p \times n}$, the observability matrix $\mathcal O(C,A) \in \mathbb R^{np \times n}$ is the block matrix whose block rows are $C,CA,\dots,CA^{n-1}$. The pair $(C,A)$ is observable if $\operatorname{rank}\mathcal O(C,A)=n$.
[/definition]
Observability says that no nonzero initial state can produce identically zero output under the homogeneous dynamics. For asymptotic estimation, however, we do not need to recover every hidden direction with equal force: an unobservable component that already decays under $\dot{x}=Ax$ will disappear from the estimation error even without correction. This motivates a weaker condition that only forbids unobservable modes on the imaginary axis or in the open right half-plane.
[definition: Detectability]
Let $A \in \mathbb R^{n \times n}$ and $C \in \mathbb R^{p \times n}$. Define the unobservable subspace
\begin{align*}
\mathcal N = \bigcap_{k=0}^{n-1} \ker(CA^k).
\end{align*}
The pair $(C,A)$ is detectable if the restriction $A|_{\mathcal N}:\mathcal N \to \mathcal N$ has all eigenvalues in the open left half-plane.
[/definition]
Detectability names the exact modal obstruction to convergence, but it is not yet a convenient test from the matrices. To use the condition in design, we need a rank criterion that detects the same non-decaying invisible modes without first constructing an observability decomposition. The next theorem supplies that test.
[quotetheorem:6399]
[citeproof:6399]
This criterion is often the most direct way to decide whether convergence is possible, because it tests only the modes that could persist or grow. Its necessity can be seen in the scalar invisible unstable system $A=(1)$ and $C=(0)$: at $\lambda=1$ the stacked matrix has rank $0$, and no gain can change $A-LC=1$. The criterion is not a pole-placement theorem and does not prescribe a numerical gain; it only decides whether some stabilising observer gain can exist. It also explains why full observability is stronger than needed: stable hidden dynamics may be left alone, whereas unstable or marginal hidden dynamics make estimation impossible.
[example: Detectable System with One Unobservable Stable Mode]
Let $A=\operatorname{diag}(0,-2)$ and $C=(1,0)$. The observability matrix has rows $C$ and $CA$. Since
\begin{align*}
CA=(1,0)\operatorname{diag}(0,-2)=(0,0),
\end{align*}
its two rows are $(1,0)$ and $(0,0)$, so its row space is $\operatorname{span}\{(1,0)\}$ and $\operatorname{rank}\mathcal O(C,A)=1<2$. Thus the pair is not observable.
The unobservable subspace is
\begin{align*}
\ker C\cap\ker(CA)=\{(x_1,x_2)^\top:x_1=0\}\cap\mathbb R^2=\operatorname{span}\{e_2\}.
\end{align*}
On this subspace,
\begin{align*}
Ae_2=\operatorname{diag}(0,-2)e_2=-2e_2,
\end{align*}
so the only unobservable eigenvalue is $-2$, which lies in the open left half-plane. Hence the pair is detectable.
Now choose an observer gain $L=(\ell,0)^\top$. Then
\begin{align*}
LC=(\ell,0)^\top(1,0)
\end{align*}
has first row $(\ell,0)$ and second row $(0,0)$, and therefore
\begin{align*}
A-LC=\operatorname{diag}(0,-2)-LC=\operatorname{diag}(-\ell,-2).
\end{align*}
The observer error eigenvalues are therefore $-\ell$ and $-2$. If $\ell>0$, both lie in the open left half-plane, so the observer error converges to zero even though the second state component is not observable.
[/example]
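The gain $L=(\ell,0)^\top$ in the example is not a special choice as far as the hidden mode is concerned. As a short supplementary check, take a general gain $L=(\ell_1,\ell_2)^\top$, where the symbols $\ell_1,\ell_2$ are introduced here only for illustration. Then
\begin{align*}
A-LC=\begin{pmatrix}0&0\\0&-2\end{pmatrix}-\begin{pmatrix}\ell_1&0\\ \ell_2&0\end{pmatrix}=\begin{pmatrix}-\ell_1&0\\-\ell_2&-2\end{pmatrix},
\end{align*}
which is lower triangular, so
\begin{align*}
\det(sI-(A-LC))=(s+\ell_1)(s+2).
\end{align*}
The eigenvalue $-2$ appears for every gain: output injection moves only the observable eigenvalue, and the error converges exactly when $\ell_1>0$.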
Detectability is therefore the right structural assumption for state estimation without noise. Observability will be needed when arbitrary pole placement is required, because then even the stable hidden modes must be movable.
## Observer Pole Placement
The next question is not merely whether the error can be made stable, but whether its convergence rates can be assigned in advance. Since the observer error matrix is $A-LC$, the problem is dual to state-feedback pole placement for $A-BK$. The duality comes from transposition: eigenvalues of $A-LC$ are eigenvalues of $(A-LC)^\top=A^\top-C^\top L^\top$.
[quotetheorem:6400]
[citeproof:6400]
Pole placement should be read as a design freedom, not as a recommendation to choose arbitrarily fast poles. The observability hypothesis is necessary: if $A=\operatorname{diag}(1,-1)$ and $C=(0,1)$, then the unstable first mode is invisible and $A-LC$ always has eigenvalue $1$, whatever $L$ is. Detectability would still be enough to stabilise the error when all invisible modes are already stable, but it would not allow arbitrary assignment of those hidden eigenvalues because output injection cannot move them. Large observer gains can amplify measurement errors and unmodelled dynamics; this chapter ignores noise, but later output-feedback design must account for that tradeoff.
[example: Observable Companion-Form Observer]
Consider the observable companion-form pair with $C=(0,0,1)$ and with $A$ determined by $A_{13}=-a_0$, $A_{21}=1$, $A_{23}=-a_1$, $A_{32}=1$, $A_{33}=-a_2$, all other entries being $0$. For $L=(\ell_0,\ell_1,\ell_2)^\top$, the output-injection matrix $LC$ has entries $(LC)_{ij}=L_iC_j$. Since $C_1=0$, $C_2=0$, and $C_3=1$, its only possibly nonzero entries are $(LC)_{13}=\ell_0$, $(LC)_{23}=\ell_1$, and $(LC)_{33}=\ell_2$.
Set
\begin{align*}
b_0=a_0+\ell_0,\qquad b_1=a_1+\ell_1,\qquad b_2=a_2+\ell_2.
\end{align*}
Then $A-LC$ has nonzero entries $(A-LC)_{13}=-b_0$, $(A-LC)_{21}=1$, $(A-LC)_{23}=-b_1$, $(A-LC)_{32}=1$, and $(A-LC)_{33}=-b_2$. Hence $sI-(A-LC)$ has nonzero entries $d_{11}=s$, $d_{13}=b_0$, $d_{21}=-1$, $d_{22}=s$, $d_{23}=b_1$, $d_{32}=-1$, and $d_{33}=s+b_2$.
Using the $3\times 3$ determinant formula,
\begin{align*}
\det(d)=d_{11}d_{22}d_{33}+d_{12}d_{23}d_{31}+d_{13}d_{21}d_{32}-d_{13}d_{22}d_{31}-d_{12}d_{21}d_{33}-d_{11}d_{23}d_{32}.
\end{align*}
Substituting the entries listed above gives
\begin{align*}
\det(sI-(A-LC))=s\cdot s\cdot(s+b_2)+0\cdot b_1\cdot0+b_0\cdot(-1)\cdot(-1)-b_0\cdot s\cdot0-0\cdot(-1)\cdot(s+b_2)-s\cdot b_1\cdot(-1).
\end{align*}
Therefore
\begin{align*}
\det(sI-(A-LC))=s^2(s+b_2)+b_0+sb_1=s^3+b_2s^2+b_1s+b_0.
\end{align*}
Replacing $b_i$ by $a_i+\ell_i$ yields
\begin{align*}
\det(sI-(A-LC))=s^3+(a_2+\ell_2)s^2+(a_1+\ell_1)s+(a_0+\ell_0).
\end{align*}
To obtain the desired monic polynomial $s^3+\alpha_2s^2+\alpha_1s+\alpha_0$, match coefficients:
\begin{align*}
a_2+\ell_2=\alpha_2,\qquad a_1+\ell_1=\alpha_1,\qquad a_0+\ell_0=\alpha_0.
\end{align*}
Thus
\begin{align*}
\ell_0=\alpha_0-a_0,\qquad \ell_1=\alpha_1-a_1,\qquad \ell_2=\alpha_2-a_2.
\end{align*}
In this coordinate form, observer pole placement reduces to changing the three [coefficients of the characteristic polynomial](/theorems/3306) by choosing the three entries of $L$.
[/example]
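The coefficient-matching rule is easy to automate. The sketch below is a hedged illustration in plain Python: the open-loop coefficients and the desired poles are hypothetical numbers, the helper names are invented for this sketch, and only real desired poles are handled. It checks the resulting gain against the $3\times 3$ determinant formula rather than against the closed-form polynomial.

```python
def poly_from_roots(roots):
    """Ascending coefficients [c0, c1, ..., 1] of the monic polynomial prod (s - r)."""
    coeffs = [1.0]
    for r in roots:
        shifted = [0.0] + coeffs                    # s * p(s)
        scaled = [-r * c for c in coeffs] + [0.0]   # -r * p(s)
        coeffs = [u + v for u, v in zip(shifted, scaled)]
    return coeffs

def observer_gain_companion(a, poles):
    """Gain (l0, l1, l2) from l_i = alpha_i - a_i in observable companion form."""
    alpha = poly_from_roots(poles)[:-1]             # drop the leading coefficient 1
    return [alpha_i - a_i for alpha_i, a_i in zip(alpha, a)]

def error_matrix(a, L):
    """A - LC for the companion pair with C = (0, 0, 1)."""
    return [[0.0, 0.0, -(a[0] + L[0])],
            [1.0, 0.0, -(a[1] + L[1])],
            [0.0, 1.0, -(a[2] + L[2])]]

def charpoly_at(M, s):
    """det(sI - M) for a 3x3 matrix M, using the six-term determinant formula."""
    d = [[(s if i == j else 0.0) - M[i][j] for j in range(3)] for i in range(3)]
    return (d[0][0] * d[1][1] * d[2][2] + d[0][1] * d[1][2] * d[2][0]
            + d[0][2] * d[1][0] * d[2][1] - d[0][2] * d[1][1] * d[2][0]
            - d[0][1] * d[1][0] * d[2][2] - d[0][0] * d[1][2] * d[2][1])

a = [6.0, 11.0, 6.0]           # hypothetical open-loop coefficients a0, a1, a2
poles = [-4.0, -5.0, -6.0]     # hypothetical desired observer poles
L = observer_gain_companion(a, poles)
residuals = [charpoly_at(error_matrix(a, L), p) for p in poles]
print(L, residuals)
```

For these numbers the desired polynomial is $(s+4)(s+5)(s+6)=s^3+15s^2+74s+120$, so the gain is $(114,63,9)$ and the characteristic polynomial of $A-LC$ vanishes at each requested pole.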
This companion-form computation is the observer version of coefficient matching for controllable canonical form. In general coordinates, the same design is carried out by transforming to observable canonical form or by using numerical pole placement algorithms on the dual system.
## Reduced-Order Observers
A full-order observer estimates every component of $x$, including combinations already measured directly in $y=Cx$. When $C$ has full row rank $p$, it is often wasteful to run an $n$-dimensional estimator. A reduced-order observer estimates only the $n-p$ missing state coordinates after a change of variables that makes the measured part explicit.
[definition: Output-Splitting Coordinates]
Assume $C \in \mathbb R^{p \times n}$ has rank $p$. An output-splitting coordinate map is an invertible linear map $T:\mathbb R^n \to \mathbb R^n$ such that, for $z=Tx$ and $z=(z_1,z_2) \in \mathbb R^p \times \mathbb R^{n-p}$,
\begin{align*}
CT^{-1}=\begin{pmatrix} I_p & 0 \end{pmatrix}, \qquad z_1=y.
\end{align*}
[/definition]
For a plant $\dot x=Ax+Bu$, output-splitting coordinates write the transformed system in block form. Partition $TAT^{-1}$ into blocks $A_{11},A_{12},A_{21},A_{22}$ conforming to $z=(z_1,z_2)$, and partition $TB$ into blocks $B_1,B_2$. The measured coordinate satisfies
\begin{align*}
\dot z_1 = A_{11}z_1+A_{12}z_2+B_1u.
\end{align*}
The unmeasured coordinate satisfies
\begin{align*}
\dot z_2 = A_{21}z_1+A_{22}z_2+B_2u.
\end{align*}
Output-splitting coordinates separate the measured variables from the variables that still need estimation. The first transformed equation contains the term $A_{12}z_2$, so it is the channel through which the unmeasured state influences measured data. This raises the construction problem: can we use that channel to build an estimator of dimension $n-p$ whose error has assignable stable dynamics?
[quotetheorem:6401]
[citeproof:6401]
The reduced-order design is most useful when sensors give accurate measurements of some states and the model is trusted enough to reconstruct the rest, but the observability hypothesis on $(A_{12},A_{22})$ is doing real work. If $A_{12}=0$ and $A_{22}=1$, the unmeasured state does not enter the measured equation at all, and $A_{22}-MA_{12}=1$ for every $M$, so no reduced observer can make that error decay. The theorem also does not construct a noise filter or a robustness guarantee: it assumes the model and the measured coordinate are exact, and it only gives deterministic convergence of the unmeasured-state error. It also clarifies why observer order is a modelling choice: the mathematical state of the estimator need not duplicate the plant state.
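To make the role of $A_{22}-MA_{12}$ concrete, the following sketch simulates a first-order velocity estimator for the double integrator. It assumes one common construction, which may differ in notation from the theorem: an internal variable $w$ that estimates $z_2-Mz_1$, so that differentiating the output is avoided. The variable name `w`, the initial conditions, the input $u(t)=\cos t$, and the Euler step are all illustrative assumptions.

```python
import math

def reduced_observer_error(M, T=5.0, dt=1e-3):
    """Euler simulation of a one-dimensional estimator for the double integrator.

    Coordinates: z1 = x1 = y (measured), z2 = x2 (unmeasured), so
    A11 = 0, A12 = 1, A21 = 0, A22 = 0, B1 = 0, B2 = 1.
    Returns the final error z2 - z2_hat.
    """
    x1, x2 = 0.3, 1.0          # hypothetical true initial state
    w = 0.0                    # estimator state; z2_hat = w + M * y
    t = 0.0
    for _ in range(int(round(T / dt))):
        u = math.cos(t)        # arbitrary known input
        y = x1
        z2_hat = w + M * y
        # w' = (A22 - M*A12) z2_hat + (A21 - M*A11) y + (B2 - M*B1) u
        dw = -M * z2_hat + u
        x1, x2, w = x1 + dt * x2, x2 + dt * u, w + dt * dw
        t += dt
    return x2 - (w + M * x1)

err_with_gain = reduced_observer_error(M=3.0)
err_without_gain = reduced_observer_error(M=0.0)
print(err_with_gain, err_without_gain)
```

Here $A_{22}-MA_{12}=-M$, so the error decays for $M>0$ and stays at its initial value when $M=0$. The estimator has dimension $1=n-p$, while the plant has dimension $2$.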
[remark: Detectability Versus Observability in Observer Design]
Detectability is sufficient for some full-order observer to converge, because only unstable and marginal modes must be corrected by output injection. Observability is required for arbitrary assignment of all $n$ observer poles. Reduced-order pole placement requires the corresponding unmeasured subsystem pair to be observable after the measured coordinates are separated.
[/remark]
The chapter's main message is the duality between control and estimation. State feedback changes $A$ by subtracting $BK$ and relies on reachability or stabilisability; output injection changes $A$ by subtracting $LC$ and relies on observability or detectability. In the noise-free setting, the observer is therefore a deterministic dynamic inverse for the measured output, stabilised by the same pole-placement ideas used for controller design. This is also the starting point for output feedback: a controller can act on $\tilde{x}$ when $x$ is unavailable, while later Kalman filtering replaces the deterministic correction $L(y-C\tilde{x})$ by a gain chosen from a noise model rather than by pole placement alone.
With observers in hand, the design problem shifts from structural feasibility to optimal performance. Chapter 9 replaces pole placement by a quadratic criterion and shows how linear quadratic regulation produces an optimal stabilising feedback law.
# 9. Linear Quadratic Regulation
Linear quadratic regulation turns the design question from pole placement into an optimization problem. Chapters 3, 4, and 7 developed reachability, observability, and stabilisation as structural properties of $\dot{x}=Ax+Bu$; here the controller is chosen by minimizing a quadratic cost that penalizes both state deviation and control effort. The result is a feedback law obtained from a Riccati equation, with finite-horizon and infinite-horizon versions reflecting whether the control problem has a terminal time or runs forever.
## Finite-Horizon Quadratic Regulation
How should a controller trade a smaller state against a larger input over a fixed interval $[0,T]$? The finite-horizon LQR problem answers this by assigning a quadratic running cost and a quadratic terminal cost, then optimizing over all admissible inputs. The key point is that the optimal policy is not chosen by guessing poles; it is obtained by propagating a matrix differential equation backward from the terminal condition.
[definition: Finite-Horizon LQR Problem]
Let $A \in \mathbb R^{n \times n}$, $B \in \mathbb R^{n \times m}$, $Q \in \mathbb R^{n \times n}$, $R \in \mathbb R^{m \times m}$, and $S \in \mathbb R^{n \times n}$ satisfy $Q = Q^\top \ge 0$, $R = R^\top > 0$, and $S = S^\top \ge 0$. For $x_0 \in \mathbb R^n$ and $T>0$, let $\mathcal U_T(x_0)$ be the set of all $u\in L^2([0,T];\mathbb R^m)$ for which the state equation
\begin{align*}
x'(t) = Ax(t) + Bu(t), \qquad x(0)=x_0,
\end{align*}
has an absolutely continuous solution $x:[0,T]\to\mathbb R^n$. The finite-horizon cost functional is the map $J_T:\mathcal U_T(x_0)\to\mathbb R$ defined by
\begin{align*}
J_T[u;x_0] = x(T)^\top Sx(T) + \int_0^{T} \left(x(t)^\top Qx(t) + u(t)^\top Ru(t)\right)\,dt.
\end{align*}
The finite-horizon LQR problem is to minimize $J_T[u;x_0]$ over $u\in\mathcal U_T(x_0)$.
[/definition]
The matrices have different design roles. The matrix $Q$ penalizes directions in the state space, $R$ prices control energy, and $S$ expresses the terminal penalty. Positive definiteness of $R$ is essential for strict convexity in the input and for the feedback formula to contain $R^{-1}$.
[example: Scalar Regulator]
In the one-dimensional system $x'(t)=ax(t)+bu(t)$, the matrices in the finite-horizon Riccati equation are $A=(a)$, $B=(b)$, $Q=(q)$, $R=(r)$, and $S=(s)$, with $b\ne 0$, $q\ge 0$, $r>0$, and $s\ge 0$. Writing $P(t)=(p(t))$, the matrix terms become ordinary scalar products:
\begin{align*}
A^\top P(t)=ap(t).
\end{align*}
Also,
\begin{align*}
P(t)A=p(t)a=ap(t).
\end{align*}
Since $R^{-1}=(1/r)$ and $B^\top=(b)$, the quadratic input term is
\begin{align*}
P(t)BR^{-1}B^\top P(t)=p(t)b\frac{1}{r}bp(t)=\frac{b^2}{r}p(t)^2.
\end{align*}
Substituting these scalar identities into the differential Riccati equation $-\dot P=A^\top P+PA-PBR^{-1}B^\top P+Q$, which is defined formally after this example, gives
\begin{align*}
-\dot p(t)=ap(t)+ap(t)-\frac{b^2}{r}p(t)^2+q=2ap(t)-\frac{b^2}{r}p(t)^2+q.
\end{align*}
The terminal condition $P(T)=S$ is exactly $p(T)=s$.
The feedback law also reduces to a scalar formula. From
\begin{align*}
R^{-1}B^\top P(t)=\frac{1}{r}bp(t)=\frac{b}{r}p(t),
\end{align*}
the optimal input is
\begin{align*}
u(t)=-\frac{b}{r}p(t)x(t).
\end{align*}
Thus even in one dimension the gain is governed by a nonlinear terminal-value equation, and the nonlinear term $-\frac{b^2}{r}p(t)^2$ is exactly the scalar remnant of the feedback-through-the-input term $-PBR^{-1}B^\top P$.
[/example]
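The scalar terminal-value problem can be integrated numerically by stepping backward from $t=T$. The sketch below is a minimal illustration using a classical fourth-order Runge-Kutta step in plain Python; the function name `scalar_riccati`, the grid size, and the numerical data are assumptions made for the demonstration.

```python
def scalar_riccati(a, b, q, r, s, T, steps=2000):
    """Integrate -p' = 2 a p - (b^2 / r) p^2 + q backward from p(T) = s.

    Returns the list p[k] approximating p(k * T / steps) for k = 0..steps.
    """
    def f(p):                          # dp/dt, i.e. minus the Riccati right-hand side
        return -(2.0 * a * p - (b * b / r) * p * p + q)

    h = T / steps
    p = [0.0] * (steps + 1)
    p[steps] = s                       # terminal condition
    for k in range(steps, 0, -1):      # RK4 step of size -h, from t_k to t_{k-1}
        k1 = f(p[k])
        k2 = f(p[k] - 0.5 * h * k1)
        k3 = f(p[k] - 0.5 * h * k2)
        k4 = f(p[k] - h * k3)
        p[k - 1] = p[k] - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

# Hypothetical data: a = b = q = r = 1, no terminal penalty, horizon T = 10.
p = scalar_riccati(a=1.0, b=1.0, q=1.0, r=1.0, s=0.0, T=10.0)
gain_at_start = (1.0 / 1.0) * p[0]     # (b / r) * p(0)
print(p[0], p[-1])
```

With these values the right-hand side vanishes at $p=1+\sqrt{2}$, the positive root of $2p-p^2+1=0$, and on this long horizon the computed $p(0)$ lies very close to that value while $p(T)=0$ is enforced exactly. The feedback gain $\frac{b}{r}p(t)$ is therefore nearly constant early in the interval and drops toward zero near the terminal time.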
The scalar case suggests that the value of the optimization problem should be quadratic in the current state. Dynamic programming turns that ansatz into a derivation: if the cost-to-go from time $t$ is $x^\top P(t)x$, the principle of optimality forces a matrix equation for $P(t)$.
[definition: Differential Riccati Equation]
For the finite-horizon data $(A,B,Q,R,S,T)$, the differential Riccati equation is the terminal-value problem for a map $P\in C^1([0,T];\operatorname{Sym}_n(\mathbb R))$ given by
\begin{align*}
-\dot P(t) = A^\top P(t) + P(t)A - P(t)BR^{-1}B^\top P(t) + Q,\qquad P(T)=S,
\end{align*}
where $\operatorname{Sym}_n(\mathbb R)$ denotes the space of real symmetric $n\times n$ matrices.
[/definition]
The equation is solved backward in time, since the terminal cost is specified at $T$. The negative sign in front of $\dot P$ records this backward propagation of the remaining value of state deviations. This backward propagation is also the mechanism that prevents two common design mistakes: choosing an input that lowers the present state while creating an expensive terminal state, and choosing a feedback gain that ignores how future control effort is priced. The theorem packages the Riccati propagation and the completion-of-squares certificate into a single optimality statement.
[quotetheorem:6402]
[citeproof:6402]
This theorem gives both the controller and a certificate of optimality. Since $V(t,x)=x^\top P(t)x$ has state Hessian $2P(t)$, the matrix $P(t)$ is half of that Hessian; large eigenvalues of $P(t)$ identify directions that are costly at time $t$.
The sign assumptions are part of the optimization problem, not cosmetic hypotheses. The conditions $Q\ge 0$ and $S\ge 0$ prevent the cost from rewarding large state magnitudes, while $R>0$ makes the instantaneous minimization over $u$ strictly convex and gives a well-defined inverse $R^{-1}$. For a concrete singular-$R$ failure, take $A=0$, $B=0$, $Q=0$, $S=0$, and $R=(0)$; every input has cost zero, so the minimizer is not unique. For an indefinite terminal cost, take $A=0$, $B=1$, $Q=0$, $R=1$, $S=-1$, $x_0=0$, and $T=2$; the constant control $u(t)=M$ gives $x(2)=2M$ and cost $2M^2-4M^2=-2M^2$, so the cost is unbounded below as $M\to\infty$. The theorem also does not assert robustness to actuator saturation or modelling error; it is an exact linear-quadratic statement for the specified admissible controls. Its backward Riccati construction prepares the infinite-horizon theory: as the terminal time is moved farther into the future, the early-time matrices $P(t)$ are candidates to settle to a stationary matrix, and the differential equation should then lose its derivative term.
[example: Double Integrator with Position-Control Cost]
For the double integrator, the state equation has $A_{12}=1$, all other entries of $A$ equal to $0$, and $B=e_2$. Take $Q=\operatorname{diag}(1,0)$ and $R=(r)$ with $r>0$. Write the symmetric Riccati matrix by its entries $p_{11}(t)$, $p_{12}(t)=p_{21}(t)$, and $p_{22}(t)$.
Suppressing the common time argument $t$, the open-loop terms in the Riccati equation are determined entry by entry. Since $A_{12}=1$ is the only nonzero entry of $A$, the entries of $A^\top P$ are
\begin{align*}
(A^\top P)_{11}=0,\quad (A^\top P)_{12}=0,\quad (A^\top P)_{21}=p_{11},\quad (A^\top P)_{22}=p_{12}.
\end{align*}
Similarly,
\begin{align*}
(PA)_{11}=0,\quad (PA)_{12}=p_{11},\quad (PA)_{21}=0,\quad (PA)_{22}=p_{12}.
\end{align*}
Because $B=e_2$, the product $PB$ is the second column of $P$ and $B^\top P$ is the second row of $P$. Thus
\begin{align*}
(PBR^{-1}B^\top P)_{ij}=\frac{1}{r}p_{i2}p_{2j}.
\end{align*}
In particular,
\begin{align*}
(PBR^{-1}B^\top P)_{11}=\frac{p_{12}^2}{r},\quad (PBR^{-1}B^\top P)_{12}=\frac{p_{12}p_{22}}{r},\quad (PBR^{-1}B^\top P)_{22}=\frac{p_{22}^2}{r}.
\end{align*}
Substituting these entries into
\begin{align*}
-\dot P=A^\top P+PA-PBR^{-1}B^\top P+Q
\end{align*}
gives the scalar system
\begin{align*}
-\dot p_{11}&=1-\frac{p_{12}^2}{r},\\
-\dot p_{12}&=p_{11}-\frac{p_{12}p_{22}}{r},\\
-\dot p_{22}&=2p_{12}-\frac{p_{22}^2}{r}.
\end{align*}
The feedback gain is obtained from the second row of $P$:
\begin{align*}
R^{-1}B^\top P=\frac{1}{r}(p_{12},p_{22}).
\end{align*}
Therefore the optimal input has the form
\begin{align*}
u(t)=-\frac{1}{r}\bigl(p_{12}(t)x_1(t)+p_{22}(t)x_2(t)\bigr).
\end{align*}
Although $Q$ contains no direct velocity penalty, the equation for $p_{12}$ is driven by $p_{11}$ and the equation for $p_{22}$ is driven by $p_{12}$, so the position cost propagates through the dynamics into a velocity-dependent feedback term.
[/example]
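The three coupled scalar equations can be propagated backward numerically to see the velocity gain appear. The following is a sketch under stated assumptions: zero terminal weight $S=0$, $r=1$, a horizon of $T=20$, and a plain-Python Runge-Kutta integrator whose names (`riccati_rhs`, `solve_backward`) are illustrative.

```python
import math

def riccati_rhs(p, r):
    """Time derivatives (dp11/dt, dp12/dt, dp22/dt) from the scalar Riccati system."""
    p11, p12, p22 = p
    return (-(1.0 - p12 * p12 / r),
            -(p11 - p12 * p22 / r),
            -(2.0 * p12 - p22 * p22 / r))

def solve_backward(r, T, S=(0.0, 0.0, 0.0), steps=4000):
    """RK4 from the terminal condition P(T) = S down to t = 0; returns P(0) entries."""
    h = -T / steps                                   # negative step: backward in time
    p = tuple(S)
    for _ in range(steps):
        k1 = riccati_rhs(p, r)
        k2 = riccati_rhs(tuple(x + 0.5 * h * k for x, k in zip(p, k1)), r)
        k3 = riccati_rhs(tuple(x + 0.5 * h * k for x, k in zip(p, k2)), r)
        k4 = riccati_rhs(tuple(x + h * k for x, k in zip(p, k3)), r)
        p = tuple(x + (h / 6.0) * (c1 + 2.0 * c2 + 2.0 * c3 + c4)
                  for x, c1, c2, c3, c4 in zip(p, k1, k2, k3, k4))
    return p

r = 1.0
p11, p12, p22 = solve_backward(r=r, T=20.0)
gain = (p12 / r, p22 / r)          # feedback row (1/r) (p12, p22) at t = 0
print(p11, p12, p22, gain)
```

For $r=1$ the values $p_{12}=1$ and $p_{11}=p_{22}=\sqrt{2}$ make all three right-hand sides vanish, and on this long horizon the computed entries at $t=0$ sit very close to them. The velocity gain $p_{22}/r$ is nonzero even though $Q$ penalizes position only.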
The double-integrator example emphasizes that LQR feedback does not act only on the directly penalized component of the state. The dynamics propagate the state penalty through the reachable directions, and the Riccati equation records this propagation through the mixed terms of $P(t)$.
## Infinite-Horizon Regulation and the Algebraic Riccati Equation
What changes when there is no terminal time and the controller must regulate forever? The value function should become time-independent, provided the plant has enough controllability to stabilise unstable directions and enough observability through the cost to detect unpenalized unstable directions. The differential Riccati equation is then replaced by an algebraic equation.
[definition: Infinite-Horizon LQR Problem]
Let $A \in \mathbb R^{n\times n}$, $B\in\mathbb R^{n\times m}$, $Q=C^\top C$ for some $C\in\mathbb R^{p\times n}$, and $R=R^\top>0$. For $x_0\in\mathbb R^n$, let $\mathcal U_\infty(x_0)$ be the set of all $u\in L^2_{\mathrm{loc}}([0,\infty);\mathbb R^m)$ for which
\begin{align*}
x'(t)=Ax(t)+Bu(t),\qquad x(0)=x_0,
\end{align*}
has an absolutely continuous solution on every compact interval. The infinite-horizon cost functional is the extended-real map $J_\infty:\mathcal U_\infty(x_0)\to[0,\infty]$ defined by
\begin{align*}
J_\infty[u;x_0]=\int_0^\infty \left(x(t)^\top Qx(t)+u(t)^\top Ru(t)\right)\,dt.
\end{align*}
The infinite-horizon LQR problem is to minimize $J_\infty[u;x_0]$ over $u\in\mathcal U_\infty(x_0)$.
[/definition]
The infinite-horizon problem removes the terminal weight and asks for finite accumulated cost over an unbounded time interval. The factorisation $Q=C^\top C$ identifies the part of the state measured by the running cost, and hence the directions visible to the performance criterion. The role of the Riccati equation changes accordingly: the finite-horizon equation was anchored by the terminal condition $P(T)=S$, whereas the infinite-horizon problem has no final time from which to propagate a value matrix backward, so the differential equation alone cannot say what stationary optimality should mean.
The natural replacement is a stationary equation for the value matrix. If the finite-horizon value matrices settle to a constant matrix $P$ as the horizon grows, the derivative term in the differential Riccati equation disappears. What remains is an algebraic balance between open-loop growth, the decrease supplied by feedback through $B$, and the running state penalty $Q$; this is the equation that a time-independent quadratic value function $x^\top Px$ must satisfy.
We name this stationary equation before stating any regulator theorem, so that the algebraic balance is separated from the later hypotheses: stabilisability will rule out uncontrollable unstable directions, detectability will rule out unobserved unstable directions, and the equation below records the candidate quadratic value matrix that those hypotheses select.
[definition: Algebraic Riccati Equation]
Let $A\in\mathbb R^{n\times n}$, $B\in\mathbb R^{n\times m}$, $Q\in\mathbb R^{n\times n}$ with $Q=Q^\top\ge 0$, and $R\in\mathbb R^{m\times m}$ with $R=R^\top>0$. The continuous-time algebraic Riccati equation associated with $(A,B,Q,R)$ is
\begin{align*}
A^\top P+PA-PBR^{-1}B^\top P+Q=0,
\end{align*}
solved for $P\in\operatorname{Sym}_n(\mathbb R)$.
[/definition]
The algebraic equation encodes stationarity of the value matrix, but stationarity alone does not identify the regulator. Since Riccati equations may have several symmetric solutions, the next condition selects the solution whose feedback produces a closed loop that decays in time.
[definition: Stabilising Solution]
A symmetric solution $P$ of the algebraic Riccati equation is stabilising if
\begin{align*}
A-BR^{-1}B^\top P
\end{align*}
is Hurwitz.
[/definition]
This definition selects the Riccati solution that yields a controller usable on the infinite horizon. Without the Hurwitz condition, a matrix $P$ may satisfy the algebraic equation while the resulting feedback fails to make the integral cost finite for all initial states.
[quotetheorem:6403]
[citeproof:6403]
The stabilisability and detectability assumptions are the infinite-horizon analogues of the two ways a regulator can fail. An uncontrollable unstable mode cannot be moved by the input, while an undetectable unstable mode can avoid the state cost and still prevent decay. For stabilisability failure, take $A=(1)$ and $B=(0)$: the state $x(t)=e^t x_0$ cannot be stabilised, so a positive state cost gives infinite cost for $x_0\ne 0$. For detectability failure, take $A=(1)$, $B=(1)$, $C=(0)$, and $R=(1)$, so $Q=C^\top C=(0)$. The mode $x(t)=e^t x_0$ is unstable and invisible to the running state cost; the algebraic Riccati equation has the non-stabilising solution $P=0$, whose feedback leaves the closed-loop matrix equal to $1$. Detectability rules out exactly this kind of invisible unstable mode, while allowing invisible modes that are already stable.
The theorem also has a uniqueness limitation that is worth separating from the stabilising conclusion. Riccati equations can have several symmetric positive semidefinite solutions, but at most one of them is stabilising, and under the stated hypotheses exactly one is. In the scalar system $x'=x+u$ with $Q=0$ and $R=(1)$, which is the detectability-failure example above, the algebraic equation is $2P-P^2=0$, so both $P=0$ and $P=2$ solve it; the feedback from $P=0$ leaves the unstable open-loop dynamics unchanged, while $P=2$ gives the stabilising closed loop $x'=-x$. Thus the word stabilising is doing real selection work, and outside the stabilisable/detectable regime the algebraic equation alone is not a reliable synthesis rule.
[example: Unpenalized Stable Mode]
Let $A=\operatorname{diag}(-1,1)$, $B=(0,1)^\top$, $Q=\operatorname{diag}(0,1)$, and $R=(1)$. Since $Q=C^\top C$ with $C=(0,1)$, the running cost sees only the second coordinate. The only unstable eigenvalue of $A$ is $\lambda=1$. For stabilisability, the columns of $\lambda I-A$ together with $B$ are $(2,0)^\top$, $(0,0)^\top$, and $(0,1)^\top$, which span $\mathbb R^2$. For detectability, the rows of $\lambda I-A$ together with $C$ are $(2,0)$, $(0,0)$, and $(0,1)$, which also span $\mathbb R^2$. Thus the unstable mode is both actuated and seen by the cost output.
Write a symmetric Riccati candidate by its entries $P_{11}=\alpha$, $P_{12}=P_{21}=\beta$, and $P_{22}=\gamma$. Because $A=\operatorname{diag}(-1,1)$, the entries of $A^\top P+PA$ are
\begin{align*}
(A^\top P+PA)_{11}=-2\alpha,\qquad (A^\top P+PA)_{12}=0,\qquad (A^\top P+PA)_{22}=2\gamma.
\end{align*}
Also $R^{-1}=1$, $PB=(\beta,\gamma)^\top$, and $B^\top P=(\beta,\gamma)$, so
\begin{align*}
(PBR^{-1}B^\top P)_{11}=\beta^2,\qquad (PBR^{-1}B^\top P)_{12}=\beta\gamma,\qquad (PBR^{-1}B^\top P)_{22}=\gamma^2.
\end{align*}
Substituting these entries into
\begin{align*}
A^\top P+PA-PBR^{-1}B^\top P+Q=0
\end{align*}
gives
\begin{align*}
-2\alpha-\beta^2=0.
\end{align*}
\begin{align*}
-\beta\gamma=0.
\end{align*}
\begin{align*}
2\gamma-\gamma^2+1=0.
\end{align*}
Since $P\ge 0$ requires $\alpha\ge 0$, while the first equation gives $\alpha=-\beta^2/2\le 0$, we get $\alpha=0$ and then $\beta=0$; the second equation then holds automatically. The last equation is
\begin{align*}
\gamma^2-2\gamma-1=0,
\end{align*}
so
\begin{align*}
\gamma=1+\sqrt{2}
\end{align*}
is the positive root. Hence the positive semidefinite candidate has entries
\begin{align*}
P_{11}=0,\qquad P_{12}=P_{21}=0,\qquad P_{22}=1+\sqrt{2}.
\end{align*}
Its feedback row is
\begin{align*}
R^{-1}B^\top P=(0,1+\sqrt{2}),
\end{align*}
so the closed-loop matrix has diagonal entries $-1$ and
\begin{align*}
1-(1+\sqrt{2})=-\sqrt{2}.
\end{align*}
Both closed-loop eigenvalues are negative, so the solution is stabilising. The Riccati solution therefore assigns zero value to the unpenalized autonomous stable first coordinate and positive value $1+\sqrt{2}$ to the controlled unstable second coordinate.
[/example]
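The hand computation in this example can be cross-checked by evaluating the Riccati residual directly. This is a small sketch using nested lists as matrices; the helpers `matmul`, `transpose`, and `combine` are hypothetical conveniences, not a library API.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def combine(X, Y, sign=1.0):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[-1.0, 0.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
Q = [[0.0, 0.0], [0.0, 1.0]]
gamma = 1.0 + math.sqrt(2.0)
P = [[0.0, 0.0], [0.0, gamma]]

# With R = 1, the gain is K = B^T P and the quadratic term is P B B^T P = K^T K.
K = matmul(transpose(B), P)
residual = combine(combine(matmul(transpose(A), P), matmul(P, A)),
                   matmul(transpose(K), K), sign=-1.0)
residual = combine(residual, Q)
closed_loop = combine(A, matmul(B, K), sign=-1.0)

print("Riccati residual:", residual)
print("closed-loop matrix:", closed_loop)
```

The residual should vanish up to rounding, and the closed-loop matrix should be diagonal with entries $-1$ and $-\sqrt{2}$.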
This example shows why detectability is weaker than observability in regulation. Modes invisible to the cost are permitted when their own dynamics already decay, because they do not obstruct finite accumulated cost.
## Energy Interpretation and Closed-Loop Geometry
Why does the same matrix $P$ describe both optimal cost and feedback gain? The answer is that $x^\top Px$ is a storage function: it measures the least future energy needed to regulate the state. The Riccati identity says that, along the optimal closed loop, the rate of decrease of this stored value equals the running cost being paid.
[quotetheorem:6404]
[citeproof:6404]
The identity is the Lyapunov interpretation of LQR. It says that optimal regulation dissipates the value function at exactly the rate at which the cost is accumulated.
Each hypothesis in the identity has a specific role. The matrix $P$ must be the stabilising solution, not merely a symmetric solution of the algebraic Riccati equation, because the storage function $x^\top Px$ is meant to represent the value of the infinite-horizon optimization problem. A non-stabilising solution can satisfy the same algebraic cancellation while producing a feedback law that does not regulate the plant.
The Hurwitz condition is what turns the differential dissipation identity into the infinite-horizon cost formula. Without closed-loop decay, integration over $[0,T]$ leaves the boundary term $x(T)^\top Px(T)$, and there is no reason for that term to vanish as $T\to\infty$. A boundary example is $A=(0)$, $B=(0)$, $Q=(0)$ with $P=(1)$: the algebraic Riccati equation is satisfied and $d(x^\top Px)/dt=0$, but $x(t)^\top Px(t)=x_0^2$ never decays, so $x_0^\top Px_0$ is not equal to the accumulated running cost.
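A scalar sanity check makes the role of the boundary term concrete. The sketch below assumes the scalar plant $x'=ax+bu$ with cost weights $q$ and $r$, uses the positive root of the scalar algebraic Riccati equation, and evaluates the closed-loop trajectory in closed form. The numerical values and the function names are illustrative assumptions.

```python
import math

def stabilising_p(a, b, q, r):
    """Positive root of the scalar ARE 2*a*p - (b**2 / r) * p**2 + q = 0."""
    return (r / (b * b)) * (a + math.sqrt(a * a + b * b * q / r))

def cost_and_boundary(a, b, q, r, x0, T):
    """Running cost on [0, T], boundary term p*x(T)**2, and p*x0**2 along u = -k*x."""
    p = stabilising_p(a, b, q, r)
    k = b * p / r
    lam = a - b * k                        # closed-loop pole, negative here
    xT = x0 * math.exp(lam * T)
    # Integral of (q + r*k^2) * x0^2 * exp(2*lam*t) over [0, T].
    running = (q + r * k * k) * x0 * x0 * (math.exp(2 * lam * T) - 1) / (2 * lam)
    return running, p * xT * xT, p * x0 * x0

for T in (0.5, 2.0, 10.0):
    running, boundary, value = cost_and_boundary(1.0, 1.0, 1.0, 1.0, x0=2.0, T=T)
    print(f"T={T:5.1f}  cost={running:.6f}  boundary={boundary:.6f}  "
          f"sum={running + boundary:.6f}  x0*P*x0={value:.6f}")
```

For every horizon the accumulated cost plus the boundary term equals $x_0^\top Px_0$; only closed-loop decay makes the boundary term disappear as $T$ grows.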
This is also the point at which the chapter connects to later observer design. The regulator Riccati equation constructs a Lyapunov certificate for the controlled plant; the estimator Riccati equation will play the analogous role for error dynamics. The separation principle relies on these two closed-loop energy pictures being compatible, so the identity is an optimal closed-loop statement rather than a general identity for every symmetric Riccati solution.
[example: Cheap-Control Limit and Eigenvalue Movement]
Consider a controllable single-input system with $R=(\varepsilon)$ and fixed $Q\ge 0$, where $\varepsilon>0$. The feedback gain has the form
\begin{align*}
K_\varepsilon=R^{-1}B^\top P_\varepsilon=\varepsilon^{-1}B^\top P_\varepsilon.
\end{align*}
Thus decreasing $\varepsilon$ makes the input cheaper in the cost, and the scalar calculation shows how this can push the closed-loop pole farther into the left half-plane.
For $x'=ax+bu$ with $b\ne 0$ and $q>0$, the scalar algebraic Riccati equation is
\begin{align*}
2ap_\varepsilon-\frac{b^2}{\varepsilon}p_\varepsilon^2+q=0.
\end{align*}
Multiplying by $\varepsilon$ gives
\begin{align*}
2a\varepsilon p_\varepsilon-b^2p_\varepsilon^2+\varepsilon q=0.
\end{align*}
Multiplying by $-1$ gives the quadratic equation
\begin{align*}
b^2p_\varepsilon^2-2a\varepsilon p_\varepsilon-\varepsilon q=0.
\end{align*}
Applying the [quadratic formula](/theorems/1301) to this equation gives
\begin{align*}
p_\varepsilon=\frac{2a\varepsilon\pm\sqrt{(2a\varepsilon)^2+4b^2\varepsilon q}}{2b^2}.
\end{align*}
Expanding the square inside the radical gives
\begin{align*}
p_\varepsilon=\frac{2a\varepsilon\pm\sqrt{4a^2\varepsilon^2+4b^2\varepsilon q}}{2b^2}.
\end{align*}
Factoring $4\varepsilon^2$ from the radical gives
\begin{align*}
p_\varepsilon=\frac{2a\varepsilon\pm 2\varepsilon\sqrt{a^2+\frac{b^2q}{\varepsilon}}}{2b^2}.
\end{align*}
Factoring $\varepsilon$ out of the numerator and canceling the common factor $2$ gives
\begin{align*}
p_\varepsilon=\frac{\varepsilon}{b^2}\left(a\pm\sqrt{a^2+\frac{b^2q}{\varepsilon}}\right).
\end{align*}
Since $q>0$, $\varepsilon>0$, and $b\ne 0$, we have
\begin{align*}
\sqrt{a^2+\frac{b^2q}{\varepsilon}}>|a|.
\end{align*}
Therefore $a-\sqrt{a^2+b^2q/\varepsilon}<0$, while $a+\sqrt{a^2+b^2q/\varepsilon}>0$, so the positive semidefinite stabilising root is
\begin{align*}
p_\varepsilon=\frac{\varepsilon}{b^2}\left(a+\sqrt{a^2+\frac{b^2q}{\varepsilon}}\right).
\end{align*}
The scalar feedback is
\begin{align*}
u(t)=-\frac{b}{\varepsilon}p_\varepsilon x(t).
\end{align*}
Substituting this into $x'=ax+bu$ gives
\begin{align*}
x'(t)=\left(a-\frac{b^2}{\varepsilon}p_\varepsilon\right)x(t).
\end{align*}
Using the displayed formula for $p_\varepsilon$, the closed-loop pole is
\begin{align*}
a-\frac{b^2}{\varepsilon}p_\varepsilon=a-\frac{b^2}{\varepsilon}\cdot \frac{\varepsilon}{b^2}\left(a+\sqrt{a^2+\frac{b^2q}{\varepsilon}}\right).
\end{align*}
Canceling $\frac{b^2}{\varepsilon}\cdot\frac{\varepsilon}{b^2}=1$ gives
\begin{align*}
a-\frac{b^2}{\varepsilon}p_\varepsilon=a-\left(a+\sqrt{a^2+\frac{b^2q}{\varepsilon}}\right).
\end{align*}
Therefore
\begin{align*}
a-\frac{b^2}{\varepsilon}p_\varepsilon=-\sqrt{a^2+\frac{b^2q}{\varepsilon}}.
\end{align*}
As $\varepsilon\downarrow 0$, the term $b^2q/\varepsilon$ tends to $+\infty$, so the closed-loop pole tends to $-\infty$. In this scalar model, cheap control produces an increasingly fast stable closed-loop mode.
[/example]
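The closed-form expressions of this example are easy to tabulate. The following sketch evaluates the stabilising root, the gain, and the closed-loop pole for a few values of $\varepsilon$; the plant values $a$, $b$, $q$ and the function name `cheap_control` are hypothetical choices for illustration.

```python
import math

def cheap_control(a, b, q, eps):
    """Stabilising Riccati root, feedback gain, and closed-loop pole for R = eps."""
    root = math.sqrt(a * a + b * b * q / eps)
    p = (eps / (b * b)) * (a + root)
    gain = b * p / eps
    pole = a - b * gain
    return p, gain, pole

a, b, q = 1.0, 2.0, 3.0   # hypothetical plant and state weight
for eps in (1.0, 1e-2, 1e-4):
    p, gain, pole = cheap_control(a, b, q, eps)
    print(f"eps={eps:g}  p={p:.5f}  gain={gain:.3f}  pole={pole:.3f}")
```

As $\varepsilon$ shrinks, the pole moves left while the gain grows, which is the tradeoff between regulation speed and control effort in this scalar model.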
Cheap control illustrates the design tradeoff hidden inside $R$. Making input inexpensive improves regulation speed in directions that the actuator can reach, but it may also demand large gains and amplify modelling errors or actuator limitations.
[remark: Relation to Pole Placement]
Pole placement asks for a feedback gain producing a prescribed closed-loop spectrum, while LQR chooses the gain minimizing a quadratic performance index. When $(A,B)$ is controllable, both methods can stabilise the same finite-dimensional plant, but LQR additionally supplies a Lyapunov function and an energy interpretation. This places LQR closer to variational principles and dissipativity theory than to purely spectral assignment: the controller is certified by a value function, not just by eigenvalue locations. The price is that the designer specifies weights rather than closed-loop poles directly.
[/remark]
The chapter completes the transition from structural linear systems theory to optimal synthesis. The next step in the course is to combine these ideas with state estimation: when the full state is not measured, the controller will use an observer, and the separation principle will explain why the regulator and estimator Riccati equations can be designed separately.
The LQR framework assumes the state is known, but in practice the controller must often work from noisy measurements. Chapter 10 therefore adds stochastic disturbances and measurement noise, leading to the Kalman filter as the optimal estimator.
# 10. Kalman Filtering
Kalman filtering answers the estimation question left open by the deterministic observer design of Chapter 8: how should a controller reconstruct the state when the model is driven by random disturbances and the measurements are noisy? The preceding observer and LQR chapters treated state reconstruction and optimal feedback as separate deterministic problems. This chapter adds probabilistic structure, first in discrete time and then in continuous time, and shows that Gaussian linear systems admit finite-dimensional filters governed only by a mean estimate and an error covariance.
## Probability Notation for Filtering
The filtering sections use a small amount of probability language to keep the statements precise. A probability space $(\Omega,\mathcal F,\mathbb P)$ is the background sample space for all random variables. A sigma-algebra is a collection of events; in filtering it represents the information currently available. The Borel sigma-algebra on $\mathbb R^n$ is the standard event collection generated by open sets, so saying that $x_k$ is an $\mathbb R^n$-valued random vector means its coordinate events are measurable in this usual sense.
A filtration $(\mathcal F_t)_{t\ge 0}$ is an increasing family of sigma-algebras, so $\mathcal F_t$ means the information known by time $t$. A process is adapted when its value at time $t$ depends only on information in $\mathcal F_t$. In discrete time, conditioning on $y_0,\dots,y_k$ is shorthand for conditioning on the information generated by those measurements. Thus $\mathbb E[x_k\mid y_0,\dots,y_k]$ is the best mean-square estimate of $x_k$ using the measurements available up to time $k$, and its covariance records the remaining mean-square error.
The continuous-time models use [Brownian motion](/page/Brownian%20Motion), also called a Wiener process, as the idealized source of white Gaussian noise. If $W_t$ is standard Brownian motion in $\mathbb R^q$, then its increments over disjoint time intervals are independent Gaussian random vectors, and the covariance of an increment over an interval of length $\Delta t$ is exactly $\Delta t\,I_q$. A covariance intensity such as $G(t)G(t)^\top$ or $R(t)$ is the matrix multiplying this infinitesimal variance rate. Equations with terms such as $dW_t$ and $dV_t$ are read in integrated form over time intervals; the displayed differentials are compact notation for those integral equations. Two stochastic processes are indistinguishable when their sample paths agree for all times except on a probability-zero event. These explanations are enough for the Kalman and Kalman-Bucy formulas below; the course uses them as linear-Gaussian modelling notation, not as a full course in [stochastic calculus](/page/Stochastic%20Calculus).
## Linear Stochastic Systems and Least-Squares Estimation
The basic problem is to estimate an unobserved state from a stream of corrupted measurements. In deterministic observer theory, the innovation $y-C\hat{x}$ measures disagreement between the model and the output; in stochastic filtering, the same innovation is weighted by the amount of uncertainty in the prediction and in the sensor.
[definition: Discrete-Time Linear Gaussian State Model]
Let $(x_k)_{k\ge 0}$ be an $\mathbb R^n$-valued state process and $(y_k)_{k\ge 0}$ an $\mathbb R^p$-valued measurement process. For each $k\ge 0$, let
\begin{align*}
A_k &: \mathbb R^n\to \mathbb R^n, & B_k &: \mathbb R^m\to \mathbb R^n, & C_k &: \mathbb R^n\to \mathbb R^p
\end{align*}
be linear maps, represented by real matrices, and let $u_k\in\mathbb R^m$ be known. The model is
\begin{align*}
x_{k+1} = A_k x_k + B_k u_k + w_k.
\end{align*}
\begin{align*}
y_k = C_k x_k + v_k.
\end{align*}
Here $w_k\sim \mathcal N(0,Q_k)$ with $Q_k:\mathbb R^n\to\mathbb R^n$ symmetric positive semidefinite, $v_k\sim \mathcal N(0,R_k)$ with $R_k:\mathbb R^p\to\mathbb R^p$ symmetric positive definite, $x_0\sim \mathcal N(\bar{x}_0,P_0)$ with $P_0:\mathbb R^n\to\mathbb R^n$ symmetric positive semidefinite, and the random vectors $x_0,w_0,w_1,\dots,v_0,v_1,\dots$ are independent.
[/definition]
The model separates what is known from what is uncertain. The matrices and inputs are data, while the state, disturbances, and sensor errors are random vectors. The positive definiteness of $R_k$ means that every nonzero linear combination of the measurement channels carries noise of positive variance.
[example: Scalar Random Walk Model]
Consider the scalar model $x_{k+1}=x_k+w_k$ and $y_k=x_k+v_k$, with $w_k\sim\mathcal N(0,q)$, $v_k\sim\mathcal N(0,r)$, $q\ge 0$, and $r>0$. Here $A=1$ and $C=1$. If the predicted estimate and covariance at time $k$ are $\hat{x}_{k\mid k-1}$ and $p_{k\mid k-1}$, then the innovation is
\begin{align*}
y_k-C\hat{x}_{k\mid k-1}=y_k-\hat{x}_{k\mid k-1}.
\end{align*}
Its covariance is
\begin{align*}
S_k=Cp_{k\mid k-1}C^\top+r=1\cdot p_{k\mid k-1}\cdot 1+r=p_{k\mid k-1}+r.
\end{align*}
Thus the scalar gain is
\begin{align*}
K_k=p_{k\mid k-1}C^\top S_k^{-1}=p_{k\mid k-1}\cdot 1\cdot \frac{1}{p_{k\mid k-1}+r}=\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}.
\end{align*}
Substituting this gain into the correction step gives
\begin{align*}
\hat{x}_{k\mid k}=\hat{x}_{k\mid k-1}+\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}(y_k-\hat{x}_{k\mid k-1}).
\end{align*}
Expanding the right-hand side,
\begin{align*}
\hat{x}_{k\mid k}=\left(1-\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}\right)\hat{x}_{k\mid k-1}+\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}y_k.
\end{align*}
Since
\begin{align*}
1-\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}=\frac{p_{k\mid k-1}+r-p_{k\mid k-1}}{p_{k\mid k-1}+r}=\frac{r}{p_{k\mid k-1}+r},
\end{align*}
the update is the weighted average
\begin{align*}
\hat{x}_{k\mid k}=\frac{r}{p_{k\mid k-1}+r}\hat{x}_{k\mid k-1}+\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}y_k.
\end{align*}
The two weights add to
\begin{align*}
\frac{r}{p_{k\mid k-1}+r}+\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}=\frac{r+p_{k\mid k-1}}{p_{k\mid k-1}+r}=1.
\end{align*}
The covariance correction becomes
\begin{align*}
p_{k\mid k}=(1-K_kC)p_{k\mid k-1}=\left(1-\frac{p_{k\mid k-1}}{p_{k\mid k-1}+r}\right)p_{k\mid k-1}.
\end{align*}
Using the same fraction identity,
\begin{align*}
p_{k\mid k}=\frac{r}{p_{k\mid k-1}+r}p_{k\mid k-1}=\frac{rp_{k\mid k-1}}{p_{k\mid k-1}+r}.
\end{align*}
The next predicted covariance is
\begin{align*}
p_{k+1\mid k}=1\cdot p_{k\mid k}\cdot 1+q=p_{k\mid k}+q.
\end{align*}
Thus the scalar filter blends the old prediction with the new noisy observation: larger sensor noise $r$ shifts weight toward $\hat{x}_{k\mid k-1}$, while larger predicted uncertainty $p_{k\mid k-1}$ shifts weight toward $y_k$.
[/example]
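The formulas of this example translate line by line into code. The following is a minimal sketch of one correction followed by one prediction for the scalar random walk; the function name `random_walk_step`, the measurement values, and the noise levels are hypothetical.

```python
def random_walk_step(x_pred, p_pred, y, q, r):
    """Correct with y_k, then predict one step, for x_{k+1}=x_k+w_k and y_k=x_k+v_k."""
    gain = p_pred / (p_pred + r)                 # K_k
    x_filt = x_pred + gain * (y - x_pred)        # weighted average of prediction and y_k
    p_filt = r * p_pred / (p_pred + r)           # p_{k|k}
    return x_filt, p_filt, x_filt, p_filt + q    # (x_{k|k}, p_{k|k}, x_{k+1|k}, p_{k+1|k})

# Hypothetical measurements and noise levels, chosen only for illustration.
x_pred, p_pred = 0.0, 1.0
for y in (1.2, 0.9, 1.1):
    x_filt, p_filt, x_pred, p_pred = random_walk_step(x_pred, p_pred, y, q=0.1, r=0.5)
    print(f"y={y:.1f}  filtered={x_filt:.4f}  variance={p_filt:.4f}")
```

Because $A=1$ and there is no input, the next predicted estimate equals the current filtered estimate, and only the variance changes in the prediction step.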
The scalar formula points to the general structure of the filter: each step must specify what information is available before the new observation and what information is available after it. Before introducing notation for those two estimates, we need the Hilbert-space principle that identifies the best least-squares estimator from a chosen information space by an orthogonality condition.
[quotetheorem:6405]
[citeproof:6405]
Closedness is essential because the metric [projection theorem](/theorems/1985) is a theorem about closed subspaces: if the available estimators form only a dense proper subspace, an infimum of squared errors may exist without being attained inside that subspace. Square-integrability is also doing real work, since the criterion is an $L^2$ distance and the covariance computations used later are finite only in that setting. Linearity matters because orthogonality characterizes projection onto a linear information space; for nonlinear classes of estimators, the best approximation need not be described by a single linear orthogonality equation. Thus the theorem is not yet a Kalman filter: it only says how to recognize the best least-squares estimator once the admissible information space has been specified.
For Gaussian systems, least-squares and [conditional expectation](/page/Conditional%20Expectation) agree, and the orthogonality equations reduce to linear algebra. A recursive filter must distinguish two information states: the estimate made before the current measurement arrives and the estimate after that measurement has been incorporated. Without separate notation for these two moments, the covariance update would conflate model propagation with measurement correction.
The next definitions fix that bookkeeping precisely. They name the two conditional means and their error covariances so the Kalman recursion can state separately which quantities are propagated by the dynamics and which are corrected by the latest observation.
[definition: Prediction and Filtering Estimates]
For the discrete-time model, define $\hat{x}_{k\mid k-1}=\mathbb E[x_k\mid y_0,\dots,y_{k-1}]$ and $\hat{x}_{k\mid k}=\mathbb E[x_k\mid y_0,\dots,y_k]$. The corresponding error covariances are
\begin{align*}
P_{k\mid k-1} = \mathbb E[(x_k-\hat{x}_{k\mid k-1})(x_k-\hat{x}_{k\mid k-1})^\top].
\end{align*}
\begin{align*}
P_{k\mid k} = \mathbb E[(x_k-\hat{x}_{k\mid k})(x_k-\hat{x}_{k\mid k})^\top].
\end{align*}
[/definition]
The notation separates the time being estimated from the latest measurement used. Prediction moves the model forward, while correction incorporates a new observation.
## The Discrete-Time Kalman Filter
How does the filter update without storing all past measurements? The Gaussian-linear assumption makes the conditional distribution of $x_k$ given measurements Gaussian, so the mean and covariance are sufficient statistics. The recursion below is the central computational result.
[remark: Discrete-Time Kalman Filter Recursion]
For the discrete-time linear Gaussian model, assume the prediction mean and covariance before observing $y_k$ are $\hat{x}_{k\mid k-1}$ and $P_{k\mid k-1}$. Define the innovation and innovation covariance by
\begin{align*}
\nu_k=y_k-C_k\hat{x}_{k\mid k-1}.
\end{align*}
\begin{align*}
S_k=C_kP_{k\mid k-1}C_k^\top+R_k.
\end{align*}
When $S_k$ is invertible, the Kalman gain is
\begin{align*}
K_k=P_{k\mid k-1}C_k^\top S_k^{-1}.
\end{align*}
The measurement update is
\begin{align*}
\hat{x}_{k\mid k}=\hat{x}_{k\mid k-1}+K_k\nu_k.
\end{align*}
\begin{align*}
P_{k\mid k}=P_{k\mid k-1}-K_kS_kK_k^\top.
\end{align*}
The one-step prediction is
\begin{align*}
\hat{x}_{k+1\mid k}=A_k\hat{x}_{k\mid k}+B_ku_k.
\end{align*}
\begin{align*}
P_{k+1\mid k}=A_kP_{k\mid k}A_k^\top+Q_k.
\end{align*}
[/remark]
The gain has the same role as the observer gain in a Luenberger observer, but it is chosen from covariance information rather than assigned directly. The innovation $y_k-C_k\hat{x}_{k\mid k-1}$ is the part of the measurement not predicted by the model. The Gaussian assumption is what turns the conditional mean into a finite-dimensional recursion for only $\hat{x}_{k\mid k}$ and $P_{k\mid k}$; for a non-Gaussian prior or nonlinear observation model, the conditional distribution may require higher moments or even a full density. Independence of $w_k$ and $v_k$ from the past prevents old information from re-entering the update through hidden correlations; if the noises are correlated, the gain and covariance equations acquire extra cross-covariance terms. The condition $R_k>0$ ensures that $S_k=C_kP_{k\mid k-1}C_k^\top+R_k$ is positive definite, hence invertible, whatever the predicted covariance is. With singular measurement noise, some linear combinations of $y_k$ may be exact constraints rather than noisy observations, and the displayed inverse must be replaced by a constrained update or a pseudoinverse formulation.
[example: Tracking Position and Velocity from Noisy Position Data]
Let $x_k=(p_k,v_k)\in\mathbb R^2$, where $p_k$ is position and $v_k$ is velocity, and let the sampling time be $h>0$. The constant-velocity model is determined by
\begin{align*}
A(p,v)=(p+hv,v)
\end{align*}
and
\begin{align*}
C(p,v)=p.
\end{align*}
Thus the state equation predicts the next position by adding $h$ times the current velocity, while the measurement observes only position.
Write the predicted covariance at time $k$ in coordinates as
\begin{align*}
P_{k\mid k-1}(z_1,z_2)=(az_1+bz_2,bz_1+dz_2),
\end{align*}
so $a$ is the predicted position-error variance, $d$ is the predicted velocity-error variance, and $b$ is the predicted position-velocity error covariance. If the scalar measurement noise variance is $r>0$, then the innovation covariance from *Discrete-Time Kalman Filter Recursion* is
\begin{align*}
S_k=CP_{k\mid k-1}C^\top+r.
\end{align*}
Since $C^\top s=(s,0)$ for a scalar $s$, we have
\begin{align*}
P_{k\mid k-1}C^\top s=P_{k\mid k-1}(s,0)=(as,bs).
\end{align*}
Applying $C$ gives
\begin{align*}
CP_{k\mid k-1}C^\top s=C(as,bs)=as.
\end{align*}
Therefore, as a scalar covariance,
\begin{align*}
S_k=a+r.
\end{align*}
The Kalman gain is
\begin{align*}
K_k=P_{k\mid k-1}C^\top S_k^{-1}.
\end{align*}
For a scalar innovation $\eta$, this gives
\begin{align*}
K_k\eta=P_{k\mid k-1}C^\top\frac{\eta}{a+r}.
\end{align*}
Using $C^\top s=(s,0)$ again,
\begin{align*}
K_k\eta=P_{k\mid k-1}\left(\frac{\eta}{a+r},0\right)=\left(\frac{a\eta}{a+r},\frac{b\eta}{a+r}\right).
\end{align*}
Hence the gain has position component $a/(a+r)$ and velocity component $b/(a+r)$.
The innovation is
\begin{align*}
y_k-C\hat{x}_{k\mid k-1}=y_k-\hat p_{k\mid k-1}.
\end{align*}
Substituting the gain into the correction equation from *Discrete-Time Kalman Filter Recursion* yields
\begin{align*}
\hat{x}_{k\mid k}=\hat{x}_{k\mid k-1}+K_k\bigl(y_k-\hat p_{k\mid k-1}\bigr).
\end{align*}
Taking the first coordinate gives
\begin{align*}
\hat p_{k\mid k}=\hat p_{k\mid k-1}+\frac{a}{a+r}\bigl(y_k-\hat p_{k\mid k-1}\bigr).
\end{align*}
Taking the second coordinate gives
\begin{align*}
\hat v_{k\mid k}=\hat v_{k\mid k-1}+\frac{b}{a+r}\bigl(y_k-\hat p_{k\mid k-1}\bigr).
\end{align*}
The same position innovation changes the velocity estimate exactly when $b\ne 0$, because $b$ measures the predicted covariance between position error and velocity error. Thus a position-only sensor can still correct velocity through the off-diagonal covariance in $P_{k\mid k-1}$.
[/example]
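The way the off-diagonal covariance couples position data to velocity can be traced in a few lines. The sketch below specialises the recursion to this two-state model with a scalar position measurement. The covariance is stored as a pair of pairs, and the process-noise covariance `Q`, the numerical values, and the function names are assumptions made for illustration.

```python
def correct(p_hat, v_hat, P, y, r):
    """Measurement update with C(p, v) = p; P = ((a, b), (b, d)) is the predicted covariance."""
    (a, b), (_, d) = P
    s = a + r                      # innovation covariance S_k
    kp, kv = a / s, b / s          # position and velocity gain components
    nu = y - p_hat                 # innovation
    P_filt = ((a - a * a / s, b - a * b / s),
              (b - a * b / s, d - b * b / s))   # P - K S K^T
    return p_hat + kp * nu, v_hat + kv * nu, P_filt

def predict(p_hat, v_hat, P, h, Q):
    """Time update with A(p, v) = (p + h v, v); Q is a 2x2 process-noise covariance."""
    (a, b), (_, d) = P
    P_pred = ((a + 2 * h * b + h * h * d + Q[0][0], b + h * d + Q[0][1]),
              (b + h * d + Q[1][0], d + Q[1][1]))   # A P A^T + Q
    return p_hat + h * v_hat, v_hat, P_pred

# Start with uncorrelated errors (b = 0): the first correction leaves velocity untouched.
p1, v1, P1 = correct(0.0, 0.0, ((1.0, 0.0), (0.0, 1.0)), y=1.0, r=1.0)
# Prediction creates the cross covariance b = h*d, so the next correction moves velocity.
p2, v2, P2 = predict(p1, v1, P1, h=1.0, Q=((0.0, 0.0), (0.0, 0.0)))
p3, v3, P3 = correct(p2, v2, P2, y=2.5, r=0.5)
print(v1, P2, v3)
```

The first correction cannot change the velocity estimate because $b=0$; after one prediction step the dynamics have generated $b=hd\ne 0$, and the same kind of position innovation now corrects velocity.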
For numerical work, the covariance update is often written in a form that preserves symmetry and positive semidefiniteness under roundoff. This is the Joseph form.
[remark: Joseph Covariance Form]
The correction covariance can also be written as
\begin{align*}
P_{k\mid k}=(I-K_kC_k)P_{k\mid k-1}(I-K_kC_k)^\top+K_kR_kK_k^\top.
\end{align*}
For the optimal gain this equals $(I-K_kC_k)P_{k\mid k-1}$ in exact arithmetic. The displayed form is preferred in implementations where preserving $P_{k\mid k}\ge 0$ matters.
[/remark]
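A scalar comparison shows what the Joseph form buys. The sketch below takes $C=1$ and compares the short update $(1-K)P$ with the Joseph expression, first at the optimal gain and then at a deliberately wrong gain. The function names and numbers are hypothetical.

```python
def simple_update(p_pred, gain):
    """(1 - K C) P with C = 1; equals the error variance only for the optimal gain."""
    return (1.0 - gain) * p_pred

def joseph_update(p_pred, gain, r):
    """Joseph form with C = 1: (1 - K)^2 P + K^2 R, a sum of nonnegative terms."""
    return (1.0 - gain) ** 2 * p_pred + gain ** 2 * r

p_pred, r = 4.0, 1.0
optimal = p_pred / (p_pred + r)
for gain in (optimal, 1.5):   # 1.5 is a deliberately wrong gain
    print(gain, simple_update(p_pred, gain), joseph_update(p_pred, gain, r))
```

At the optimal gain the two expressions agree up to rounding. At the wrong gain the short formula returns a negative number, while the Joseph form still returns the variance of the corrected error $(1-K)e-Kv$, which is why it tolerates gain perturbations better.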
The covariance recursion is deterministic. Once $A_k,C_k,Q_k,R_k$ and $P_0$ are fixed, the matrices $P_{k\mid k}$ do not depend on the realized measurement values.
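Since no measurement value enters the covariance recursion, it can be run offline before any data arrive. The sketch below does this for the scalar random walk model and compares the result with the fixed point of the predicted-variance map, $p=(q+\sqrt{q^2+4qr})/2$, obtained by solving $p=rp/(p+r)+q$. That fixed-point formula is derived here for illustration and is not stated in the text; the values of $q$, $r$, and $p_0$ are hypothetical.

```python
import math

def predicted_variances(p0, q, r, steps):
    """Predicted variances p_{k|k-1} for the scalar random walk; no measurements needed."""
    history = [p0]
    for _ in range(steps):
        p_pred = history[-1]
        p_filt = r * p_pred / (p_pred + r)   # correction
        history.append(p_filt + q)           # prediction
    return history

q, r = 0.1, 1.0
history = predicted_variances(p0=5.0, q=q, r=r, steps=50)
fixed_point = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
print(history[-1], fixed_point)
```

In this run the offline sequence settles near the fixed point, which suggests why precomputed or constant gains are practical for time-invariant models.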
## Continuous-Time Kalman-Bucy Filtering
The continuous-time version asks for an estimator driven by a measurement signal observed continuously in time. The answer resembles an observer with a time-varying gain, while the gain is still obtained from a Riccati equation for the estimation covariance.
[definition: Continuous-Time Linear Gaussian State Model]
Let $x(t)\in\mathbb R^n$ and $y(t)\in\mathbb R^p$. For each $t\ge 0$, let
\begin{align*}
A(t)&:\mathbb R^n\to\mathbb R^n, & B(t)&:\mathbb R^m\to\mathbb R^n, & G(t)&:\mathbb R^q\to\mathbb R^n,
\end{align*}
\begin{align*}
C(t)&:\mathbb R^n\to\mathbb R^p, & D(t)&:\mathbb R^r\to\mathbb R^p
\end{align*}
be linear maps, represented by real matrices, and let $u(t)\in\mathbb R^m$ be known. The state and observation equations are
\begin{align*}
dx(t) = A(t)x(t)\,dt+B(t)u(t)\,dt+G(t)\,dW_t.
\end{align*}
\begin{align*}
dy(t) = C(t)x(t)\,dt+D(t)\,dV_t.
\end{align*}
Here $(W_t)_{t\ge 0}$ is a standard Brownian motion in $\mathbb R^q$, $(V_t)_{t\ge 0}$ is a standard Brownian motion in $\mathbb R^r$, the two Brownian motions are independent, $x(0)$ is Gaussian and independent of them, and $D(t)D(t)^\top=R(t):\mathbb R^p\to\mathbb R^p$ is symmetric positive definite.
[/definition]
The measurement equation should be read in integrated form: the observed process has a drift depending on $x(t)$ plus white measurement noise. This raises the next problem: find an adapted estimator whose increments use only the new innovation $dy(t)-C(t)\hat{x}(t)\,dt$ and whose covariance can be propagated without knowing the realized observation path.
[quotetheorem:6406]
[citeproof:6406]
The covariance equation differs from the control Riccati equation in sign and interpretation. LQR propagates the value of future control effort backward in time, while filtering propagates present estimation uncertainty forward in time. The hypotheses also mark the limits of the formula. If $R(t)$ is degenerate, some observed directions have no white-noise variance, so $R(t)^{-1}$ is not defined and the observation may impose exact constraints instead of giving a regular innovation. If the estimator is allowed to depend on future observations, it becomes a smoothing problem rather than a filtering problem, and adaptedness is lost. If the model is non-Gaussian or nonlinear, the conditional mean may still exist, but it generally no longer closes through only $\hat{x}(t)$ and $P(t)$.
[example: Scalar Kalman-Bucy Filter]
For the scalar state equation $dx=ax\,dt+\sigma\,dW_t$ and measurement $dy=cx\,dt+\rho\,dV_t$, the scalar coefficients in the Kalman-Bucy covariance equation are $A=a$, $G=\sigma$, $C=c$, and $R=\rho^2$. The measurement covariance must be positive, so we assume $\rho\ne 0$; then $R^{-1}=1/\rho^2$.
The covariance equation is
\begin{align*}
\dot P=AP+PA^\top+GG^\top-PC^\top R^{-1}CP.
\end{align*}
Substituting the scalar quantities, and using that the transpose of a scalar is the scalar itself, this becomes
\begin{align*}
\dot P=aP+Pa+\sigma\sigma-(Pc)\frac{1}{\rho^2}(cP).
\end{align*}
Combining the scalar products gives
\begin{align*}
\dot P=2aP+\sigma^2-\frac{c^2P^2}{\rho^2}.
\end{align*}
Equivalently,
\begin{align*}
\dot P=2aP+\sigma^2-\frac{c^2}{\rho^2}P^2.
\end{align*}
The gain formula gives
\begin{align*}
K=PC^\top R^{-1}=P\cdot c\cdot \frac{1}{\rho^2}=\frac{Pc}{\rho^2}.
\end{align*}
Thus large measurement-noise variance $\rho^2$ decreases the multiplier on the innovation, while large process-noise variance $\sigma^2$ increases the covariance growth term and can raise the gain through the factor $P$ after transients.
[/example]
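The scalar covariance equation can be integrated numerically to see the balance between noise growth and measurement correction. The sketch below uses a classical fourth-order Runge-Kutta step and compares the result with the positive equilibrium $P=(\rho^2/c^2)\bigl(a+\sqrt{a^2+c^2\sigma^2/\rho^2}\bigr)$, obtained by setting $\dot P=0$. The coefficient values, the horizon, and the function names are illustrative assumptions.

```python
import math

def riccati_rhs(P, a, sigma, c, rho):
    """Right-hand side of dP/dt = 2 a P + sigma^2 - (c^2 / rho^2) P^2."""
    return 2.0 * a * P + sigma ** 2 - (c ** 2 / rho ** 2) * P ** 2

def integrate(P0, a, sigma, c, rho, T, steps):
    """Classical fourth-order Runge-Kutta integration of the scalar covariance equation."""
    dt, P = T / steps, P0
    for _ in range(steps):
        k1 = riccati_rhs(P, a, sigma, c, rho)
        k2 = riccati_rhs(P + 0.5 * dt * k1, a, sigma, c, rho)
        k3 = riccati_rhs(P + 0.5 * dt * k2, a, sigma, c, rho)
        k4 = riccati_rhs(P + dt * k3, a, sigma, c, rho)
        P += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return P

a, sigma, c, rho = 0.5, 1.0, 2.0, 0.5     # hypothetical coefficients
P_T = integrate(P0=0.0, a=a, sigma=sigma, c=c, rho=rho, T=5.0, steps=5000)
P_eq = (rho ** 2 / c ** 2) * (a + math.sqrt(a ** 2 + c ** 2 * sigma ** 2 / rho ** 2))
print(P_T, P_eq, "gain:", P_T * c / rho ** 2)
```

Even though $a>0$ makes the state itself unstable here, the covariance settles at a finite value because the quadratic measurement term eventually dominates the linear growth term.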
The scalar example shows the balance encoded by the Riccati equation: process noise creates uncertainty, and informative measurements remove uncertainty. In multiple dimensions, the same balance is constrained by which state directions are observed and which directions are excited by disturbances.
## Steady-State Filtering and Detectability
For time-invariant systems, engineers often use a constant filter gain after transients. The mathematical question is when the covariance Riccati equation converges to a stabilizing algebraic solution.
[definition: Detectability for Filtering]
Let $A:\mathbb R^n\to\mathbb R^n$ and $C:\mathbb R^n\to\mathbb R^p$ be real linear maps, extended complex-linearly to $\mathbb C^n$ and $\mathbb C^p$. The pair $(A,C)$ is detectable if every eigenvector $z\in\mathbb C^n$ of $A$ with $Az=\lambda z$ and $\operatorname{Re}(\lambda)\ge 0$ satisfies $Cz\ne 0$.
[/definition]
Detectability is the observability condition relevant to estimation. It handles how measurements see unstable modes, but the covariance Riccati equation also contains the process-noise factor $G$. We therefore need the dual condition that rules out unforced unstable directions in the covariance dynamics.
[definition: Stabilizability of Process Noise]
Let $A:\mathbb R^n\to\mathbb R^n$ and $G:\mathbb R^q\to\mathbb R^n$ be real linear maps, extended complex-linearly where eigenvectors are considered, and set $Q=GG^\top:\mathbb R^n\to\mathbb R^n$. The pair $(A,G)$ is stabilizable if every eigenvector $z\in\mathbb C^n$ of $A^\top$ with $A^\top z=\lambda z$ and $\operatorname{Re}(\lambda)\ge 0$ satisfies $G^\top z\ne 0$.
[/definition]
This condition prevents the Riccati equation from ignoring unstable directions that carry no process-noise covariance. With both the observable unstable directions and the noise-driven unstable directions controlled, the remaining question is convergence of the Riccati flow to a constant stabilizing covariance.
[quotetheorem:6407]
[citeproof:6407]
The two structural hypotheses exclude different failure modes. If $(A,C)$ is not detectable, an unstable mode may be invisible in the measurements; then no filter gain based on $y$ can stabilize the estimation error in that direction. If $(A,G)$ is not stabilizable, the Riccati equation may fail to select the stabilizing covariance because an unstable adjoint direction is not represented in the process-noise covariance. The theorem also does not say that every initial transient is short, nor that the steady-state gain is appropriate for strongly time-varying or incorrectly modelled systems; it only gives convergence under the stated time-invariant hypotheses.
The steady-state gain
\begin{align*}
K_\infty=P_\infty C^\top R^{-1}
\end{align*}
turns the filter into a fixed-gain stochastic observer. This is the filter used in many time-invariant applications after an initial covariance transient.
[example: Covariance Convergence in a Detectable System]
Let $A(z_1,z_2)=(z_2,0)$, let $C(z_1,z_2)=z_1$, let $G s=(0,s)$, and let $R=(r)$ with $r>0$. We first verify the two structural hypotheses needed for the steady-state covariance theorem.
To check detectability, let $z=(z_1,z_2)\in\mathbb C^2$ be an eigenvector of $A$ with eigenvalue $\lambda$ and $\operatorname{Re}(\lambda)\ge 0$. The equation $Az=\lambda z$ is
\begin{align*}
(z_2,0)=(\lambda z_1,\lambda z_2).
\end{align*}
Thus
\begin{align*}
z_2=\lambda z_1
\end{align*}
and
\begin{align*}
0=\lambda z_2.
\end{align*}
Since $A^2=0$, every eigenvalue of $A$ is $0$, so $\lambda=0$. Therefore $z_2=0$. Because $z$ is an eigenvector, $z\ne 0$, hence $z_1\ne 0$. Applying the measurement map gives
\begin{align*}
Cz=C(z_1,0)=z_1\ne 0.
\end{align*}
So every eigenvector of $A$ with nonnegative real-part eigenvalue is seen by $C$, and $(A,C)$ is detectable.
To check stabilizability of the process noise, use $A^\top(z_1,z_2)=(0,z_1)$ and $G^\top(z_1,z_2)=z_2$. If $z=(z_1,z_2)\in\mathbb C^2$ is an eigenvector of $A^\top$ with eigenvalue $\lambda$ and $\operatorname{Re}(\lambda)\ge 0$, then
\begin{align*}
(0,z_1)=(\lambda z_1,\lambda z_2).
\end{align*}
Hence
\begin{align*}
0=\lambda z_1
\end{align*}
and
\begin{align*}
z_1=\lambda z_2.
\end{align*}
Again $A^\top$ has only the eigenvalue $0$, so $\lambda=0$ and $z_1=0$. Since $z$ is an eigenvector, $z\ne 0$, hence $z_2\ne 0$. Therefore
\begin{align*}
G^\top z=G^\top(0,z_2)=z_2\ne 0.
\end{align*}
Thus $(A,G)$ is stabilizable.
The hypotheses of *Steady-State Filter Riccati Theorem* are satisfied, so the covariance solution $P(t)$ converges to a positive semidefinite stabilizing limit $P_\infty$. If
\begin{align*}
P_\infty(z_1,z_2)=(\alpha z_1+\beta z_2,\beta z_1+\delta z_2),
\end{align*}
then $C^\top s=(s,0)$ and $R^{-1}s=s/r$. The steady-state gain satisfies
\begin{align*}
K_\infty s=P_\infty C^\top R^{-1}s=P_\infty(s/r,0)=(\alpha s/r,\beta s/r).
\end{align*}
Therefore the gain vector is $(\alpha/r,\beta/r)$. For an error vector $e=(e_1,e_2)$, the limiting error dynamics are generated by
\begin{align*}
(A-K_\infty C)e=Ae-K_\infty(Ce).
\end{align*}
Since $Ae=(e_2,0)$ and $Ce=e_1$,
\begin{align*}
K_\infty(Ce)=K_\infty e_1=(\alpha e_1/r,\beta e_1/r).
\end{align*}
Hence
\begin{align*}
(A-K_\infty C)e=(e_2-\alpha e_1/r,-\beta e_1/r).
\end{align*}
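The theorem says this linear map is Hurwitz. Reading $z_1$ as position and $z_2$ as velocity, the steady-state filter gain stabilizes the estimation error even though the sensor measures only position: detectability lets the position measurement see the velocity mode through the dynamics, and the acceleration-noise channel $G$ makes the velocity direction part of the stabilizable covariance dynamics, so the Riccati limit is the stabilizing solution.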
[/example]
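The example stops at the structural conclusion, but for this particular system the algebraic equation $AP+PA^\top+GG^\top-PC^\top R^{-1}CP=0$ can be solved entrywise. A short calculation, not carried out in the text and offered here as an assumption to be checked, suggests $\alpha=\sqrt{2}\,r^{3/4}$, $\beta=r^{1/2}$, and $\delta=\sqrt{2}\,r^{1/4}$. The sketch below verifies these candidate entries and computes the error poles; all function names are hypothetical.

```python
import math

def steady_state_entries(r):
    """Candidate (alpha, beta, delta) solving the filter ARE entrywise for this example."""
    return math.sqrt(2.0) * r ** 0.75, math.sqrt(r), math.sqrt(2.0) * r ** 0.25

def are_residuals(alpha, beta, delta, r):
    """Entries (1,1), (1,2), (2,2) of A P + P A^T + G G^T - P C^T R^{-1} C P."""
    return (2.0 * beta - alpha ** 2 / r,
            delta - alpha * beta / r,
            1.0 - beta ** 2 / r)

def error_poles(alpha, beta, r):
    """Roots of s^2 + (alpha/r) s + beta/r, the characteristic polynomial of A - K C."""
    tr, det = -alpha / r, beta / r
    disc = complex(tr * tr - 4.0 * det) ** 0.5
    return (tr + disc) / 2.0, (tr - disc) / 2.0

for r in (0.25, 1.0, 4.0):
    alpha, beta, delta = steady_state_entries(r)
    print(r, are_residuals(alpha, beta, delta, r), error_poles(alpha, beta, r))
```

The residuals vanish up to rounding and both error poles have negative real part, consistent with the Hurwitz conclusion of the theorem. Smaller measurement noise $r$ moves the poles farther left.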
The filtering chapter completes the estimator side of the bridge from deterministic state-space theory to output-feedback control: the Kalman filter supplies the state estimate, and LQR supplies the state-feedback law. Chapter 11 shows how the separation principle combines state feedback and state estimation into an implementable output-feedback controller without redesigning either component.
# 11. Separation Principle and Output Feedback
This chapter completes the passage from ideal state feedback to implementable output feedback. Chapters 7 and 9 designed regulators under the assumption that the full state $x(t)$ is available, and Chapters 8 and 10 designed estimators that reconstruct $x(t)$ from measured outputs. The separation principle says that, under the right structural hypotheses, these two designs can be carried out independently and then combined without losing closed-loop stability.
## Combining State Feedback with State Estimation
The practical problem is that LQR gives a feedback law $u(t)=-Kx(t)$, while many systems only measure $y(t)=Cx(t)$, possibly corrupted by noise. The natural replacement is to run an observer or Kalman filter in parallel with the plant and feed back the estimate $\hat{x}(t)$ rather than the true state. The main question is whether the regulator poles and estimator poles interfere with each other after the loops are connected.
Consider the deterministic plant
\begin{align*}
\dot{x}(t) = Ax(t)+Bu(t), \qquad y(t)=Cx(t),
\end{align*}
where $x(t) \in \mathbb R^n$, $u(t) \in \mathbb R^m$, and $y(t) \in \mathbb R^p$. Suppose a state-feedback gain $K \in \mathbb R^{m \times n}$ has been chosen, and suppose an observer gain $L \in \mathbb R^{n \times p}$ has been chosen.
[definition: Observer-Based State Feedback]
Let $\mathcal Y$ be a class of measured output signals $y:[0,\infty)\to \mathbb R^p$ for which the following differential equation has a solution, and let $\mathcal U$ be the corresponding class of control signals $u:[0,\infty)\to \mathbb R^m$. The observer-based state-feedback controller associated to $K$ and $L$ is the causal dynamic output-feedback operator
\begin{align*}
\mathcal C_{K,L}:\mathcal Y\to \mathcal U, \qquad y\mapsto u,
\end{align*}
with controller state $\hat{x}:[0,\infty)\to \mathbb R^n$ defined by
\begin{align*}
\dot{\hat{x}}(t) = A\hat{x}(t)+Bu(t)+L(y(t)-C\hat{x}(t)), \qquad u(t) = -K\hat{x}(t).
\end{align*}
[/definition]
The estimate is propagated using the same model as the plant, and the innovation $y-C\hat{x}$ corrects it using measured output error. The feedback law is the same algebraic law as full-state feedback, but applied to the observer state.
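The controller equations translate directly into a discrete update. The following Python sketch is illustrative only: it assumes a forward-Euler step with a hypothetical step size `dt`, uses plain lists rather than a matrix library, and takes assumed double-integrator data with gains $K=(1,\sqrt{3})$ and $L=(5,6)^\top$. The function names `controller_step` and `simulate` are introduced here and are not part of the course.

```python
import math

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def controller_step(xhat, y, A, B, C, K, L, dt):
    """One forward-Euler step of the observer-based controller.

    Returns (u, xhat_next), where u = -K xhat uses the current estimate and
    xhat_next integrates xhat' = A xhat + B u + L (y - C xhat).
    """
    u = [-ui for ui in mat_vec(K, xhat)]
    innovation = [yi - ci for yi, ci in zip(y, mat_vec(C, xhat))]
    drift = [a + b + l for a, b, l in
             zip(mat_vec(A, xhat), mat_vec(B, u), mat_vec(L, innovation))]
    xhat_next = [xi + dt * di for xi, di in zip(xhat, drift)]
    return u, xhat_next

def simulate(x0, xhat0, A, B, C, K, L, dt=1e-3, t_end=20.0):
    """Forward-Euler simulation of plant plus controller; returns final (x, xhat)."""
    x, xhat = list(x0), list(xhat0)
    for _ in range(int(round(t_end / dt))):
        y = mat_vec(C, x)  # noise-free measurement y = C x
        u, xhat_next = controller_step(xhat, y, A, B, C, K, L, dt)
        xdot = [a + b for a, b in zip(mat_vec(A, x), mat_vec(B, u))]
        x = [xi + dt * di for xi, di in zip(x, xdot)]
        xhat = xhat_next
    return x, xhat

# Assumed illustrative data: double integrator with position measurement.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0], [1.0]]
C = [[1.0, 0.0]]
K = [[1.0, math.sqrt(3.0)]]  # a stabilising state-feedback gain
L = [[5.0], [6.0]]           # a stabilising observer gain

x_final, xhat_final = simulate([1.0, 0.0], [0.0, 0.0], A, B, C, K, L)
print("final plant state:   ", x_final)
print("final observer state:", xhat_final)
```

The controller only ever sees `y` and its own state `xhat`; the plant state `x` appears in `simulate` solely to generate the measurement.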
[example: Observer Feedback for a Double Integrator]
For the double integrator, write
\begin{align*}
A=\left(\begin{array}{cc}0&1\cr 0&0\end{array}\right),\qquad B=\left(\begin{array}{c}0\cr 1\end{array}\right),\qquad C=\left(\begin{array}{cc}1&0\end{array}\right),
\end{align*}
so that $\dot{x}_1=x_2$, $\dot{x}_2=u$, and $y=x_1$. With $Q=I$ and $R=1$, the continuous-time LQR Riccati equation is
\begin{align*}
A^\top P+PA-PBB^\top P+I=0.
\end{align*}
For a symmetric matrix $P=\left(\begin{array}{cc}p_{11}&p_{12}\cr p_{12}&p_{22}\end{array}\right)$, the first two products are
\begin{align*}
A^\top P=\left(\begin{array}{cc}0&0\cr 1&0\end{array}\right)\left(\begin{array}{cc}p_{11}&p_{12}\cr p_{12}&p_{22}\end{array}\right)=\left(\begin{array}{cc}0&0\cr p_{11}&p_{12}\end{array}\right).
\end{align*}
\begin{align*}
PA=\left(\begin{array}{cc}p_{11}&p_{12}\cr p_{12}&p_{22}\end{array}\right)\left(\begin{array}{cc}0&1\cr 0&0\end{array}\right)=\left(\begin{array}{cc}0&p_{11}\cr 0&p_{12}\end{array}\right).
\end{align*}
Since $BB^\top=\left(\begin{array}{cc}0&0\cr 0&1\end{array}\right)$, the quadratic term is
\begin{align*}
PBB^\top P=\left(\begin{array}{cc}p_{11}&p_{12}\cr p_{12}&p_{22}\end{array}\right)\left(\begin{array}{cc}0&0\cr 0&1\end{array}\right)\left(\begin{array}{cc}p_{11}&p_{12}\cr p_{12}&p_{22}\end{array}\right)=\left(\begin{array}{cc}p_{12}^2&p_{12}p_{22}\cr p_{12}p_{22}&p_{22}^2\end{array}\right).
\end{align*}
Substituting these expressions into the Riccati equation gives
\begin{align*}
\left(\begin{array}{cc}1-p_{12}^2&p_{11}-p_{12}p_{22}\cr p_{11}-p_{12}p_{22}&2p_{12}+1-p_{22}^2\end{array}\right)=0.
\end{align*}
Thus
\begin{align*}
1-p_{12}^2=0,\qquad p_{11}-p_{12}p_{22}=0,\qquad 2p_{12}+1-p_{22}^2=0.
\end{align*}
The positive definite stabilising solution takes $p_{12}=1$, since $p_{12}=-1$ would force $p_{22}^2=-1$. Then $p_{22}^2=3$, positivity of $P$ selects $p_{22}=\sqrt{3}$, and $p_{11}=p_{12}p_{22}=\sqrt{3}$. Hence
\begin{align*}
P=\left(\begin{array}{cc}\sqrt{3}&1\cr 1&\sqrt{3}\end{array}\right),\qquad K=R^{-1}B^\top P=\left(\begin{array}{cc}1&\sqrt{3}\end{array}\right).
\end{align*}
The full-state LQR closed-loop matrix is
\begin{align*}
A-BK=\left(\begin{array}{cc}0&1\cr 0&0\end{array}\right)-\left(\begin{array}{c}0\cr 1\end{array}\right)\left(\begin{array}{cc}1&\sqrt{3}\end{array}\right)=\left(\begin{array}{cc}0&1\cr -1&-\sqrt{3}\end{array}\right).
\end{align*}
Its characteristic polynomial is
\begin{align*}
\det\left(\lambda I-(A-BK)\right)=\det\left(\begin{array}{cc}\lambda&-1\cr 1&\lambda+\sqrt{3}\end{array}\right)=\lambda(\lambda+\sqrt{3})+1=\lambda^2+\sqrt{3}\lambda+1.
\end{align*}
The roots are $\lambda=(-\sqrt{3}+i)/2$ and $\lambda=(-\sqrt{3}-i)/2$, so the regulator poles lie in the open left half-plane.
Now take an observer gain $L=\left(\begin{array}{c}l_1\cr l_2\end{array}\right)$. Since
\begin{align*}
LC=\left(\begin{array}{c}l_1\cr l_2\end{array}\right)\left(\begin{array}{cc}1&0\end{array}\right)=\left(\begin{array}{cc}l_1&0\cr l_2&0\end{array}\right),
\end{align*}
the observer error matrix is
\begin{align*}
A-LC=\left(\begin{array}{cc}0&1\cr 0&0\end{array}\right)-\left(\begin{array}{cc}l_1&0\cr l_2&0\end{array}\right)=\left(\begin{array}{cc}-l_1&1\cr -l_2&0\end{array}\right).
\end{align*}
Its characteristic polynomial is
\begin{align*}
\det\left(\lambda I-(A-LC)\right)=\det\left(\begin{array}{cc}\lambda+l_1&-1\cr l_2&\lambda\end{array}\right)=\lambda(\lambda+l_1)+l_2=\lambda^2+l_1\lambda+l_2.
\end{align*}
Choosing $l_1=5$ and $l_2=6$ gives
\begin{align*}
\lambda^2+5\lambda+6=(\lambda+2)(\lambda+3),
\end{align*}
so $A-LC$ has observer poles $-2$ and $-3$.
With these gains, the implemented controller is
\begin{align*}
u=-\hat{x}_1-\sqrt{3}\hat{x}_2.
\end{align*}
\begin{align*}
\dot{\hat{x}}_1=\hat{x}_2+5(y-\hat{x}_1).
\end{align*}
\begin{align*}
\dot{\hat{x}}_2=u+6(y-\hat{x}_1).
\end{align*}
Thus the controller has its own two-dimensional state $(\hat{x}_1,\hat{x}_2)$, is driven only by the measured position $y=x_1$, and applies the same feedback law as full-state LQR after replacing the state $(x_1,x_2)$, including the unmeasured velocity $x_2$, by the observer estimate $(\hat{x}_1,\hat{x}_2)$.
[/example]
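The hand computation in the example above can be checked numerically. The sketch below is a minimal verification in plain Python, not a general Riccati solver: the helper names (`matmul`, `transpose`, `add`, `eig2`) are hypothetical, and the eigenvalue routine is only valid for $2\times 2$ matrices.

```python
import cmath
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, sign=1.0):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def eig2(M):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0], [1.0]]
C = [[1.0, 0.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
r3 = math.sqrt(3.0)

# Candidate Riccati solution from the example.
P = [[r3, 1.0], [1.0, r3]]

# Residual of A^T P + P A - P B B^T P + I (Q = I, R = 1).
PB = matmul(P, B)
linear_part = add(matmul(transpose(A), P), matmul(P, A))
quadratic_part = matmul(PB, transpose(PB))  # equals P B B^T P since P is symmetric
residual = add(add(linear_part, quadratic_part, sign=-1.0), I2)

K = transpose(PB)        # K = R^{-1} B^T P with R = 1
L = [[5.0], [6.0]]       # observer gain chosen in the example

reg_poles = eig2(add(A, matmul(B, K), sign=-1.0))  # eigenvalues of A - B K
obs_poles = eig2(add(A, matmul(L, C), sign=-1.0))  # eigenvalues of A - L C

print("Riccati residual:", residual)
print("K:", K)
print("regulator poles:", reg_poles)
print("observer poles:", obs_poles)
```

A residual at the level of floating-point rounding confirms $P$, and the two pole pairs should agree with the factorisations found by hand.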
This example shows why output feedback is not merely a static replacement for state feedback. The controller has its own state, and the closed-loop system is the cascade of the plant dynamics and the dynamics of the mismatch between the plant state and controller state. The following definition introduces that mismatch so that the cascade can be written as a triangular system.
[definition: Estimation Error]
Let $x:[0,\infty)\to \mathbb R^n$ be the plant state trajectory and let $\hat{x}:[0,\infty)\to \mathbb R^n$ be an observer state trajectory. The estimation error is the map $e:[0,\infty)\to \mathbb R^n$ defined by
\begin{align*}
e(t)=x(t)-\hat{x}(t).
\end{align*}
[/definition]
Subtracting the observer equation from the plant equation gives the autonomous error equation
\begin{align*}
\dot{e}(t)=(A-LC)e(t).
\end{align*}
Thus the error dynamics depend on $A$, $L$, and $C$, but not on the regulator gain $K$ or on the input $u$. This autonomy is the algebraic source of separation.
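For completeness, the subtraction can be written out in full. The only facts used are $y=Cx$ for the noise-free plant and the observation that the observer is driven by the same input $u$ as the plant, so the term $Bu$ cancels:
\begin{align*}
\dot{e}=\dot{x}-\dot{\hat{x}}&=\bigl(Ax+Bu\bigr)-\bigl(A\hat{x}+Bu+L(Cx-C\hat{x})\bigr)\\
&=A(x-\hat{x})-LC(x-\hat{x})=(A-LC)e.
\end{align*}
If the observer were driven by an input different from the one applied to the plant, the difference would appear as a forcing term in the error equation and the cancellation would fail.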
## The Separation Principle
The design problem is now split into two questions: can $K$ make $A-BK$ stable, and can $L$ make $A-LC$ stable? These are exactly the stabilisability and detectability hypotheses developed earlier in the course. The separation principle states that these two independent possibilities are sufficient for the dynamic output-feedback controller to stabilise the original plant.
[quotetheorem:6408]
[citeproof:6408]
The theorem gives a constructive recipe: design the regulator as if the state were measured, design the observer as if the input were known, and then connect them. Its conclusion is spectral, not merely qualitative, so pole placement and Riccati-based design can be read directly from the diagonal blocks.
Each hypothesis removes a specific obstruction. If $(A,B)$ has an unstable mode that no input can affect, then no choice of $K$ can make $A-BK$ Hurwitz, and the same uncontrollable growth remains under any observer-based controller of this form. A scalar example is $A=(1)$ and $B=(0)$: for every gain $K$, the closed-loop matrix remains $A-BK=(1)$. If $(A,C)$ has an unstable mode that does not appear in the measured output, then the observer error in that mode cannot be corrected by $L(y-C\hat{x})$, so $A-LC$ cannot be made Hurwitz. A scalar example is $A=(1)$ and $C=(0)$: for every gain $L$, the observer error matrix remains $A-LC=(1)$. Separation also does not say that every output-feedback stabilisation problem reduces to this observer architecture; it says that once stabilising gains $K$ and $L$ exist, connecting them produces a stable nominal closed loop. The remaining design questions are therefore about transient behaviour, noise sensitivity, and robustness, which are not captured by the pole-union statement alone.
[example: Full-State LQR Versus Observer-Based Feedback]
Let $K$ be the LQR gain for a stabilisable pair $(A,B)$ with $Q\ge 0$ and $R>0$, and let $L$ be an observer gain such that $A-LC$ has eigenvalues in the open left half-plane. With full-state feedback $u=-Kx$, substituting the control law into the plant equation gives
\begin{align*}
\dot{x}=Ax+Bu=Ax+B(-Kx)=Ax-BKx=(A-BK)x.
\end{align*}
Thus the full-state closed-loop poles are the eigenvalues of $A-BK$.
With observer-based feedback, $u=-K\hat{x}$ and the estimation error is $e=x-\hat{x}$, so $\hat{x}=x-e$. Substituting this identity into the plant equation gives
\begin{align*}
\dot{x}=Ax+B(-K\hat{x})=Ax-BK(x-e)=Ax-BKx+BKe=(A-BK)x+BKe.
\end{align*}
The observer error satisfies
\begin{align*}
\dot{e}=(A-LC)e.
\end{align*}
Therefore the combined dynamics in the coordinates $(x,e)$ are
\begin{align*}
\left(\begin{array}{c}\dot{x}\cr \dot{e}\end{array}\right)=\left(\begin{array}{cc}A-BK&BK\cr 0&A-LC\end{array}\right)\left(\begin{array}{c}x\cr e\end{array}\right).
\end{align*}
For every $\lambda$, the characteristic determinant factors because this matrix is block upper triangular:
\begin{align*}
\det\left(\lambda I-\left(\begin{array}{cc}A-BK&BK\cr 0&A-LC\end{array}\right)\right)=\det(\lambda I-(A-BK))\det(\lambda I-(A-LC)).
\end{align*}
Hence the observer-based closed-loop poles are exactly the LQR regulator poles together with the observer poles, counted with algebraic multiplicity.
The transient response can still differ from full-state LQR because the plant equation contains the extra forcing term $BKe$; as $e(t)$ decays this term vanishes, but before it vanishes it can change the plant trajectory.
[/example]
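The factorisation can also be checked in the coordinates $(x,\hat{x})$ in which the controller is actually implemented, where the closed-loop matrix is not triangular. The sketch below assumes the double-integrator gains $K=(1,\sqrt{3})$ and $L=(5,6)^\top$ and computes the characteristic polynomial with the Faddeev-LeVerrier recursion, a technique chosen here for convenience and not used elsewhere in the course. The helper names are hypothetical.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(M):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(lambda I - M),
    computed with the Faddeev-LeVerrier recursion."""
    n = len(M)
    N = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coeffs = [1.0]
    for k in range(1, n + 1):
        MN = matmul(M, N)
        c = -sum(MN[i][i] for i in range(n)) / k
        coeffs.append(c)
        N = [[MN[i][j] + (c if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return coeffs

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (highest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

k1, k2 = 1.0, math.sqrt(3.0)  # assumed regulator gain K = (k1, k2)
l1, l2 = 5.0, 6.0             # assumed observer gain L = (l1, l2)^T

# Closed loop in the coordinates (x1, x2, xhat1, xhat2):
#   x'    = A x - B K xhat
#   xhat' = L C x + (A - B K - L C) xhat
M = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -k1, -k2],
    [l1, 0.0, -l1, 1.0],
    [l2, 0.0, -k1 - l2, -k2],
]

closed_loop = char_poly(M)
# Regulator polynomial times observer polynomial for the double integrator.
product = poly_mul([1.0, k2, k1], [1.0, l1, l2])

print("closed-loop polynomial:", closed_loop)
print("product of the blocks: ", product)
```

The two coefficient lists agree because the change of variables $e=x-\hat{x}$ is a similarity transformation, which leaves the characteristic polynomial unchanged.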
This comparison is important in simulations: a fast estimator makes the observer-based response close to the full-state LQR response after a short initial layer, while a slow estimator can dominate the transient even though the closed loop is stable. The calculation also uses a general matrix fact: for a block upper triangular matrix with square diagonal blocks $M_{11}$ and $M_{22}$ and a zero lower-left block, the determinant is $(\det M_{11})(\det M_{22})$. The same factorization holds for block lower triangular matrices.
This algebraic result is why the off-diagonal coupling $BK$ does not shift closed-loop eigenvalues. The hypotheses are structural rather than cosmetic: the diagonal blocks must be square so that their spectra and characteristic polynomials are defined, and the lower-left block must be zero so that the determinant factors into diagonal-block determinants. If a matrix has both off-diagonal couplings present, the eigenvalues can move. For instance,
\begin{align*}
\left(\begin{array}{cc} 0 & 1 \cr 1 & 0 \end{array}\right)
\end{align*}
has diagonal blocks $0$ and $0$, but its eigenvalues are $1$ and $-1$ because the lower-left coupling is nonzero. Thus triangular cascade form is what protects the diagonal spectra. The upper-right block can still affect eigenvectors, conditioning, and transient amplification, so separation should not be interpreted as a guarantee of identical time responses. This distinction will recur in robustness arguments: cascade structure preserves nominal poles, but the coupling channel still controls how modelling error, estimator transients, and sensor noise enter the regulated plant.
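A small numerical experiment makes the contrast concrete. This is a sketch for scalar blocks only, with hypothetical diagonal entries $-1$ and $-2$; the helper `eig2` is introduced here for illustration.

```python
import cmath

def eig2(a, b, c, d):
    """Eigenvalues of the 2x2 matrix [[a, b], [c, d]], sorted by real part."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return sorted([(tr - disc) / 2.0, (tr + disc) / 2.0], key=lambda z: z.real)

# Upper triangular case: the coupling b never enters the eigenvalues.
triangular = {b: eig2(-1.0, b, 0.0, -2.0) for b in (0.0, 1.0, 100.0)}
for b, eigs in triangular.items():
    print("upper-right coupling", b, "->", eigs)

# Both couplings present: the diagonal entries no longer give the spectrum.
coupled = eig2(0.0, 1.0, 1.0, 0.0)
print("both couplings ->", coupled)
```

Only the eigenvalues are insensitive to the upper-right entry; the eigenvectors, and with them the transient response, do change as that entry grows.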
## Certainty Equivalence and the LQG Controller
The stochastic version of the same architecture starts from a different question. In LQR the state is known and the disturbance model is irrelevant to the feedback gain, while in Kalman filtering the control input is known and the cost function is irrelevant to the estimator gain. LQG combines these two Riccati designs and uses the estimate as though it were the true state.
Consider the continuous-time stochastic linear system
\begin{align*}
dx(t)=Ax(t)\,dt+Bu(t)\,dt+G\,dw(t), \qquad dy(t)=Cx(t)\,dt+dv(t),
\end{align*}
where $w$ and $v$ are independent Wiener processes with covariance intensities $W\ge 0$ and $V>0$. With persistent process noise, the undiscounted total quadratic cost over $[0,\infty)$ is usually infinite even for a stabilising controller, so the infinite-horizon objective is formulated as the long-run average cost
\begin{align*}
J_{\mathrm{av}}[u]=\limsup_{T\to\infty}\frac{1}{T}\mathbb E\left[\int_0^{T} \left(x(t)^\top Qx(t)+u(t)^\top Ru(t)\right)\,dt\right],
\end{align*}
with $Q\ge 0$ and $R>0$.
[definition: LQG Controller]
On a probability space $(\Omega,\mathcal F,\mathbb P)$ carrying the process and measurement noises, an infinite-horizon LQG controller is the causal dynamic output-feedback system with controller state $\hat{x}:[0,\infty)\times \Omega\to \mathbb R^n$, measurement process $y:[0,\infty)\times \Omega\to \mathbb R^p$ adapted to the output filtration, and control output $u:[0,\infty)\times \Omega\to \mathbb R^m$ adapted to the same filtration, defined by
\begin{align*}
d\hat{x}(t)=A\hat{x}(t)\,dt+Bu(t)\,dt+L\bigl(dy(t)-C\hat{x}(t)\,dt\bigr), \qquad u(t)=-K\hat{x}(t),
\end{align*}
where $K \in \mathbb R^{m\times n}$ and $L \in \mathbb R^{n\times p}$ are constant gains.
[/definition]
In the LQG synthesis, the regulator gain is $K=R^{-1}B^\top P$, where $P$ is the stabilising solution of the LQR algebraic Riccati equation. The filter gain is $L=\Sigma C^\top V^{-1}$, where $\Sigma \ge 0$ denotes the steady-state Kalman error covariance, obtained from the Kalman-Bucy algebraic Riccati equation. The term certainty equivalence refers to the form of the control law: the controller uses the full-information LQR formula and substitutes the conditional mean estimate for the unknown state. This is a strong claim, because it says that the stochastic output-feedback optimum is obtained by solving two separate Riccati equations rather than one coupled design problem. The next theorem states the hypotheses under which this separation of estimation and regulation is valid.
[quotetheorem:6409]
[citeproof:6409]
The theorem does not say that uncertainty is absent. It says that uncertainty is accounted for by the filter, and the controller acts on the best state estimate using the same law as in the full-state quadratic regulator.
The assumptions again correspond to concrete failure modes. If $(A,B)$ is not stabilisable, an unstable state component may remain outside the reach of the input, so no finite average-cost stabilising LQR law exists for that component. If $(A,Q^{1/2})$ is not detectable, an unpenalised unstable mode can prevent the Riccati equation from selecting a stabilising regulator in a well-posed way. If $(A,C)$ is not detectable, the filter cannot reconstruct an unstable unmeasured component, and if $(A,GW^{1/2})$ lacks the stated stabilisability property, the Kalman covariance Riccati equation may fail to have the stabilising solution needed for a stationary filter. For instance, take $A=(1)$ and $C=(0)$. The scalar state is unstable and entirely unobserved, so the Kalman innovation carries no information about the growing mode; the filter cannot produce a stable estimate of that component from the measurement history. Gaussianity and linearity matter because the conditional mean and covariance then form a finite-dimensional sufficient statistic; with nonlinear dynamics or non-Gaussian noise, higher conditional moments can affect optimal decisions. Independence of process and measurement noise is also part of the standard filter derivation; correlated noises require modified Riccati equations and gains. Certainty equivalence therefore identifies the optimal law within the classical linear-Gaussian average-cost model, but it does not provide robustness margins, saturation guarantees, or optimality for nonlinear or distributionally misspecified plants.
[example: LQG Control of the Double Integrator]
For the noisy double integrator with position measurements,
\begin{align*}
dx_1(t)=x_2(t)\,dt,\qquad dx_2(t)=u(t)\,dt+\sigma_w\,dw(t),\qquad dy(t)=x_1(t)\,dt+\sigma_v\,dv(t),
\end{align*}
write
\begin{align*}
A=\left(\begin{array}{cc}0&1\cr 0&0\end{array}\right),\qquad C=\left(\begin{array}{cc}1&0\end{array}\right),\qquad G=\left(\begin{array}{c}0\cr \sigma_w\end{array}\right),\qquad V=\sigma_v^2.
\end{align*}
Let the steady-state Kalman covariance be
\begin{align*}
\Sigma=\left(\begin{array}{cc}s_{11}&s_{12}\cr s_{12}&s_{22}\end{array}\right).
\end{align*}
The Kalman-Bucy algebraic Riccati equation is
\begin{align*}
A\Sigma+\Sigma A^\top+GG^\top-\Sigma C^\top V^{-1}C\Sigma=0.
\end{align*}
The two linear terms are
\begin{align*}
A\Sigma=\left(\begin{array}{cc}0&1\cr 0&0\end{array}\right)\left(\begin{array}{cc}s_{11}&s_{12}\cr s_{12}&s_{22}\end{array}\right)=\left(\begin{array}{cc}s_{12}&s_{22}\cr 0&0\end{array}\right)
\end{align*}
and
\begin{align*}
\Sigma A^\top=\left(\begin{array}{cc}s_{11}&s_{12}\cr s_{12}&s_{22}\end{array}\right)\left(\begin{array}{cc}0&0\cr 1&0\end{array}\right)=\left(\begin{array}{cc}s_{12}&0\cr s_{22}&0\end{array}\right).
\end{align*}
The process-noise term is
\begin{align*}
GG^\top=\left(\begin{array}{c}0\cr \sigma_w\end{array}\right)\left(\begin{array}{cc}0&\sigma_w\end{array}\right)=\left(\begin{array}{cc}0&0\cr 0&\sigma_w^2\end{array}\right).
\end{align*}
For the measurement term, first
\begin{align*}
\Sigma C^\top=\left(\begin{array}{cc}s_{11}&s_{12}\cr s_{12}&s_{22}\end{array}\right)\left(\begin{array}{c}1\cr 0\end{array}\right)=\left(\begin{array}{c}s_{11}\cr s_{12}\end{array}\right).
\end{align*}
Also
\begin{align*}
C\Sigma=\left(\begin{array}{cc}1&0\end{array}\right)\left(\begin{array}{cc}s_{11}&s_{12}\cr s_{12}&s_{22}\end{array}\right)=\left(\begin{array}{cc}s_{11}&s_{12}\end{array}\right).
\end{align*}
Since $V^{-1}=1/\sigma_v^2$, this gives
\begin{align*}
\Sigma C^\top V^{-1}C\Sigma=\frac{1}{\sigma_v^2}\left(\begin{array}{c}s_{11}\cr s_{12}\end{array}\right)\left(\begin{array}{cc}s_{11}&s_{12}\end{array}\right)=\frac{1}{\sigma_v^2}\left(\begin{array}{cc}s_{11}^2&s_{11}s_{12}\cr s_{11}s_{12}&s_{12}^2\end{array}\right).
\end{align*}
Equating entries in the Riccati equation gives
\begin{align*}
2s_{12}-\frac{s_{11}^2}{\sigma_v^2}=0.
\end{align*}
\begin{align*}
s_{22}-\frac{s_{11}s_{12}}{\sigma_v^2}=0.
\end{align*}
\begin{align*}
\sigma_w^2-\frac{s_{12}^2}{\sigma_v^2}=0.
\end{align*}
For $\sigma_w>0$ and $\sigma_v>0$, the first equation forces $s_{12}=s_{11}^2/(2\sigma_v^2)\ge 0$, so the third equation gives
\begin{align*}
s_{12}=\sigma_w\sigma_v.
\end{align*}
Substituting this into the first equation gives
\begin{align*}
s_{11}^2=2s_{12}\sigma_v^2=2\sigma_w\sigma_v^3,
\end{align*}
so
\begin{align*}
s_{11}=\sqrt{2\sigma_w\sigma_v^3}.
\end{align*}
Substituting $s_{11}$ and $s_{12}$ into the second equation gives
\begin{align*}
s_{22}=\frac{s_{11}s_{12}}{\sigma_v^2}=\frac{\sqrt{2\sigma_w\sigma_v^3}\,\sigma_w\sigma_v}{\sigma_v^2}=\sqrt{2\sigma_w^3\sigma_v}.
\end{align*}
Therefore the Kalman gain is
\begin{align*}
L=\Sigma C^\top V^{-1}=\frac{1}{\sigma_v^2}\left(\begin{array}{cc}s_{11}&s_{12}\cr s_{12}&s_{22}\end{array}\right)\left(\begin{array}{c}1\cr 0\end{array}\right)=\left(\begin{array}{c}\sqrt{2\sigma_w/\sigma_v}\cr \sigma_w/\sigma_v\end{array}\right).
\end{align*}
Thus larger measurement-noise intensity $\sigma_v^2$ decreases both entries of $L$, so the innovation $dy-C\hat{x}\,dt$ is weighted less. Larger process-noise intensity $\sigma_w^2$ increases both entries of $L$, so the filter responds more strongly to the measured position innovation.
The LQG control law still has the certainty-equivalence form
\begin{align*}
u(t)=-K\hat{x}(t)=-k_1\hat{x}_1(t)-k_2\hat{x}_2(t).
\end{align*}
Changing $\sigma_w^2$ or $\sigma_v^2$ changes the estimator gain $L$, but it does not change the algebraic feedback form: the regulator acts on the estimate as if it were the state.
[/example]
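The closed-form covariance and gain from the example above are easy to tabulate and to check against the Riccati equation. The following Python sketch does so for a few assumed noise levels; the function names and the chosen values of $\sigma_w$ and $\sigma_v$ are illustrative.

```python
import math

def kalman_double_integrator(sigma_w, sigma_v):
    """Closed-form steady-state covariance and Kalman gain from the example."""
    s12 = sigma_w * sigma_v
    s11 = math.sqrt(2.0 * sigma_w * sigma_v ** 3)
    s22 = math.sqrt(2.0 * sigma_w ** 3 * sigma_v)
    Sigma = [[s11, s12], [s12, s22]]
    L = [s11 / sigma_v ** 2, s12 / sigma_v ** 2]  # L = Sigma C^T V^{-1}
    return Sigma, L

def riccati_residual(Sigma, sigma_w, sigma_v):
    """Entries (1,1), (1,2), (2,2) of
    A Sigma + Sigma A^T + G G^T - Sigma C^T V^{-1} C Sigma."""
    (s11, s12), (_, s22) = Sigma
    V = sigma_v ** 2
    return (2.0 * s12 - s11 ** 2 / V,
            s22 - s11 * s12 / V,
            sigma_w ** 2 - s12 ** 2 / V)

results = {}
for sigma_w, sigma_v in [(1.0, 1.0), (1.0, 4.0), (4.0, 1.0)]:
    Sigma, L = kalman_double_integrator(sigma_w, sigma_v)
    results[(sigma_w, sigma_v)] = (L, riccati_residual(Sigma, sigma_w, sigma_v))
    print(f"sigma_w={sigma_w}, sigma_v={sigma_v}: L={L}")
```

One observation that follows from the formulas: the gain satisfies $l_1^2=2l_2$, so the error polynomial $\lambda^2+l_1\lambda+l_2$ has damping ratio $1/\sqrt{2}$ for every noise ratio, and only its natural frequency $\sqrt{\sigma_w/\sigma_v}$ changes.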
This double-integrator example is the standard mental model for LQG: the regulator decides how aggressively to move a known mass, and the filter decides how much velocity information to infer from noisy position readings.
## Slow and Fast Observers
Separation gives closed-loop eigenvalues, but implementation also asks how observer speed affects transients and robustness. Making observer poles faster reduces estimation error quickly in the nominal model, yet it may amplify measurement noise and actuator activity. A useful design practice is to compare slow, matched, and fast observers rather than treating faster estimation as automatically better.
[example: Output Feedback with Slow Versus Fast Observer Poles]
Suppose the regulator block $A-BK$ has dominant poles near $-1$ and $-2$. The corresponding modal factors have the form $e^{-t}$ and $e^{-2t}$, so the slower regulator time scale is governed by $e^{-t}$. At $t=5$,
\begin{align*}
e^{-5}\approx 0.0067,\qquad e^{-10}\approx 0.000045,
\end{align*}
so both regulator modes have decayed substantially by this time scale.
Now compare an observer error matrix $A-LC$ with poles near $-0.2$ and $-0.3$. Its modal factors are $e^{-0.2t}$ and $e^{-0.3t}$. At the same time $t=5$,
\begin{align*}
e^{-0.2\cdot 5}=e^{-1}\approx 0.3679,\qquad e^{-0.3\cdot 5}=e^{-1.5}\approx 0.2231.
\end{align*}
Thus the estimation error can still be a visible part of the closed-loop motion after the regulator modes have mostly decayed. In the separated coordinates,
\begin{align*}
\dot{x}=(A-BK)x+BKe,\qquad \dot{e}=(A-LC)e,
\end{align*}
so the plant is driven not only by the regulator term $(A-BK)x$ but also by the forcing term $BKe$. When $e(t)$ decays on the slower $e^{-0.2t}$ or $e^{-0.3t}$ scale, that forcing can dominate the transient plant response.
If instead $A-LC$ has poles near $-5$ and $-6$, then the observer modal factors are $e^{-5t}$ and $e^{-6t}$. At $t=1$,
\begin{align*}
e^{-5}\approx 0.0067,\qquad e^{-6}\approx 0.0025.
\end{align*}
On this faster observer scale, $e(t)$ becomes small before the regulator modes $e^{-t}$ and $e^{-2t}$ have fully died away, so the forcing term $BKe$ quickly becomes negligible and the plant trajectory approaches the full-state feedback trajectory.
The same large observer gain that moves the poles far left also multiplies measurement error in the innovation term. If the measured output is $y=Cx+\eta$, then
\begin{align*}
L(y-C\hat{x})=L(Cx+\eta-C\hat{x})=LC(x-\hat{x})+L\eta=LCe+L\eta.
\end{align*}
The term $L\eta$ shows explicitly why high-frequency sensor noise can be injected into the observer state and then into the control law
\begin{align*}
u=-K\hat{x}.
\end{align*}
Fast observer poles can therefore improve nominal estimation transients while producing larger or more oscillatory control inputs when the measurement noise is significant.
[/example]
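The slow and fast cases can be compared in a short noise-free simulation of the separated coordinates. The sketch below assumes a double-integrator plant, a regulator gain $K=(2,3)$ placing the regulator poles at $-1$ and $-2$, observer gains placing the observer poles at $(-0.2,-0.3)$ and $(-5,-6)$, and a unit initial error in the velocity estimate. All of these values, the forward-Euler step, and the function name `simulate_cascade` are assumptions made for illustration.

```python
def simulate_cascade(k, l, x0, e0, dt=1e-3, t_end=5.0):
    """Forward-Euler simulation of the double-integrator cascade
        x' = (A - B K) x + B K e,   e' = (A - L C) e,
    with K = (k1, k2) and L = (l1, l2)^T.
    Returns the final state, the final error, and the largest deviation of
    x1(t) from the full-state-feedback trajectory (the case e = 0)."""
    k1, k2 = k
    l1, l2 = l
    x1, x2 = x0
    f1, f2 = x0  # full-state-feedback reference trajectory
    e1, e2 = e0
    max_dev = 0.0
    for _ in range(int(round(t_end / dt))):
        dx1, dx2 = x2, -k1 * x1 - k2 * x2 + k1 * e1 + k2 * e2
        df1, df2 = f2, -k1 * f1 - k2 * f2
        de1, de2 = -l1 * e1 + e2, -l2 * e1
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        f1, f2 = f1 + dt * df1, f2 + dt * df2
        e1, e2 = e1 + dt * de1, e2 + dt * de2
        max_dev = max(max_dev, abs(x1 - f1))
    return (x1, x2), (e1, e2), max_dev

K = (2.0, 3.0)        # regulator polynomial lambda^2 + 3 lambda + 2
L_slow = (0.5, 0.06)  # observer polynomial (lambda + 0.2)(lambda + 0.3)
L_fast = (11.0, 30.0) # observer polynomial (lambda + 5)(lambda + 6)
x0, e0 = (1.0, 0.0), (0.0, 1.0)

_, e_slow, dev_slow = simulate_cascade(K, L_slow, x0, e0)
_, e_fast, dev_fast = simulate_cascade(K, L_fast, x0, e0)
print("slow observer: e(5) =", e_slow, " max deviation =", dev_slow)
print("fast observer: e(5) =", e_fast, " max deviation =", dev_fast)
```

With these assumed values the slow observer still carries a position-estimate error larger than one at $t=5$, and the plant trajectory departs from the full-state response far more than in the fast case. The simulation has no measurement noise, so it shows only the nominal benefit of fast poles and none of the noise cost described above.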
The separation principle is therefore a stability and pole-location theorem, not a complete robustness theorem. It licenses independent synthesis of $K$ and $L$, while leaving bandwidth, noise sensitivity, saturation, and modelling error as design constraints.
[remark: What Separation Does Not Cover]
The closed-loop spectrum separates for the nominal linear model. Performance margins, transient amplification, and robustness to unmodelled dynamics are affected by the interaction of $K$, $L$, actuator limits, sensor noise, and plant uncertainty. These issues motivate loop-transfer recovery, $H^\infty$ methods, and robust output-feedback design in later courses.
[/remark]
The chapter's main lesson is that output feedback for linear systems has a rare modular structure. Stabilising state feedback and stable state estimation can be designed as separate mathematical problems, then joined into a dynamic controller whose nominal poles are exactly the regulator poles together with the estimator poles.
After combining regulation and estimation, the course turns to what happens when the nominal model is only approximate. Chapter 12 studies robustness margins and model limitations, asking how much uncertainty the closed loop can tolerate before the design breaks down.
# 12. Robustness Margins and Model Limitations
This chapter revisits feedback design after Chapters 3, 4, 7, and 8 have supplied controllability, observability, pole placement, and observer construction as nominal algebraic tools. The central question is no longer whether a chosen model can be stabilised, but whether the resulting closed loop tolerates disturbances, sensor noise, hidden modes, and unmodelled dynamics. The chapter develops the standard robustness coordinates $S$ and $T$, explains observer bandwidth as a noise-amplification tradeoff, and finishes with internal stability and small-gain margins.
## Sensitivity in Linear Feedback Loops
A stabilising feedback loop is not judged only by the location of its nominal closed-loop poles. The practical question is how the closed loop maps external signals into tracking errors, plant outputs, and control actions. For a scalar loop with plant $P(s)$ and controller $K(s)$, assume negative feedback with reference $r$, output disturbance $d$, sensor noise $n$, output $y$, and error signal $e = r - y - n$.
[definition: Loop Transfer Function]
Let $P,K \in \mathbb{R}(s)$ be scalar real-rational transfer functions for which the product is defined. The loop transfer function is the map $L: \mathbb{C} \setminus \operatorname{poles}(PK) \to \mathbb{C}$ given by
\begin{align*}
L(s) := P(s)K(s).
\end{align*}
[/definition]
The loop transfer function is the open-loop product seen around the feedback cycle. To compare disturbance rejection with noise transmission, we need the two closed-loop factors obtained by solving through the denominator $1+L$.
[definition: Sensitivity And Complementary Sensitivity]
Let $L \in \mathbb{R}(s)$ be a scalar loop transfer function, and assume $1+L$ is not the zero rational function. Let
\begin{align*}
\Omega_S := \mathbb{C}\setminus \operatorname{poles}\bigl((1+L)^{-1}\bigr), \qquad
\Omega_T := \mathbb{C}\setminus \operatorname{poles}\bigl(L(1+L)^{-1}\bigr).
\end{align*}
The sensitivity function is the scalar rational map $S:\Omega_S\to\mathbb{C}$ and the complementary sensitivity function is the scalar rational map $T:\Omega_T\to\mathbb{C}$ defined by
\begin{align*}
S(s) := \frac{1}{1+L(s)}, \qquad T(s) := \frac{L(s)}{1+L(s)}.
\end{align*}
[/definition]
These functions satisfy $S+T=1$, so reducing one typically enlarges the other over some frequency range. To use this identity in feedback design, one must know which physical signals these rational functions actually govern. The obstruction is that disturbance rejection, reference tracking, and measurement-noise transmission enter the closed loop at different summing points, so the same algebraic loop $L$ produces different closed-loop maps.
[quotetheorem:6410]
[citeproof:6410]
The theorem explains the central tradeoff of classical feedback design. Its hypotheses matter: if $1+L$ has an imaginary-axis zero, the displayed transfer functions have a pole on the stability boundary, so the frequency response no longer represents a stable closed-loop map. The result also does not say that small $S$ is always good, because $S+T=1$ forces any reduction in disturbance sensitivity to affect the complementary channel. Large $|L(i\omega)|$ makes $|S(i\omega)|$ small and rejects low-frequency disturbances, while the same choice makes $T(i\omega)$ close to $1$ and transmits measurement noise at frequencies where the loop gain remains high. This prepares the Bode-integral limitation, where the impossibility of making $S$ uniformly small is made quantitative.
[illustration:sensitivity-complementary-tradeoff]
[example: High Gain Feedback Tradeoff]
Let $P(s)=1/(s+1)$ and $K(s)=k$ with $k>0$. The loop transfer function is the product
\begin{align*}
L(s)=P(s)K(s)=\frac{1}{s+1}k=\frac{k}{s+1}.
\end{align*}
Using $S=1/(1+L)$ gives
\begin{align*}
S(s)=\frac{1}{1+\frac{k}{s+1}}=\frac{1}{\frac{s+1+k}{s+1}}=\frac{s+1}{s+1+k}.
\end{align*}
Using $T=L/(1+L)$ gives
\begin{align*}
T(s)=\frac{\frac{k}{s+1}}{1+\frac{k}{s+1}}=\frac{\frac{k}{s+1}}{\frac{s+1+k}{s+1}}=\frac{k}{s+1+k}.
\end{align*}
On the imaginary axis,
\begin{align*}
S(i\omega)=\frac{1+i\omega}{1+k+i\omega}.
\end{align*}
Therefore
\begin{align*}
|S(i\omega)|^2=\frac{|1+i\omega|^2}{|1+k+i\omega|^2}=\frac{1+\omega^2}{(1+k)^2+\omega^2}.
\end{align*}
At $\omega=0$ this becomes
\begin{align*}
|S(0)|=\frac{1}{1+k}.
\end{align*}
Thus increasing $k$ reduces the constant-disturbance gain from $d$ to $y$, because that disturbance path is multiplied by $S$.
For the complementary sensitivity,
\begin{align*}
T(i\omega)=\frac{k}{1+k+i\omega}.
\end{align*}
Hence
\begin{align*}
|T(i\omega)|^2=\frac{k^2}{(1+k)^2+\omega^2}.
\end{align*}
When $\omega$ is much larger than $1+k$, the term $\omega^2$ dominates the denominator, so $|T(i\omega)|$ is approximately $k/\omega$ and is small. The transition occurs at frequencies on the order of $1+k$, so increasing $k$ moves the transition to higher frequency. High gain therefore improves low-frequency disturbance rejection, but it also keeps $T$ large over a wider band, increasing the range of sensor-noise frequencies transmitted to the output.
[/example]
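The tradeoff in the example above can be tabulated with complex arithmetic. This is a minimal sketch for the specific loop $L(s)=k/(s+1)$; the gains and frequencies sampled are arbitrary illustrative choices.

```python
def sensitivities(k, omega):
    """Return (S, T) at s = i*omega for P(s) = 1/(s+1) and K(s) = k."""
    s = complex(0.0, omega)
    loop = k / (s + 1.0)
    return 1.0 / (1.0 + loop), loop / (1.0 + loop)

frequencies = (1.0, 10.0, 100.0, 1000.0)
for k in (1.0, 10.0, 100.0):
    S0, _ = sensitivities(k, 0.0)
    T_mags = [round(abs(sensitivities(k, w)[1]), 3) for w in frequencies]
    print(f"k={k:6.1f}  |S(0)|={abs(S0):.4f}  |T| at w=1,10,100,1000: {T_mags}")
```

Reading down the table, $|S(0)|$ shrinks as $k$ grows, while the frequency at which $|T|$ starts to fall moves to the right.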
The next obstruction is deeper than this elementary computation. Even if a controller is allowed to have high order, stable feedback cannot make the sensitivity function small at every frequency for many important plant classes.
[remark: Bode Sensitivity Integral As A Limitation Principle]
Let $L \in \mathbb{R}(s)$ be a proper scalar loop transfer function, and set $S=(1+L)^{-1}$. Assume that $S$ is stable, that neither $L$ nor $1+L$ has zeros or poles on the imaginary axis, that there are no unstable pole-zero cancellations in the formation of $L$, and that $L$ has relative degree at least two, so that $L(s)=O(1/s^2)$ as $|s|\to\infty$ in the closed right half-plane. If $\mathcal P_+$ is the multiset of poles $p$ of $L$ with $\operatorname{Re}(p)>0$, then
\begin{align*}
\int_0^\infty \log |S(i\omega)|\,d\omega = \pi \sum_{p \in \mathcal P_+} \operatorname{Re}(p).
\end{align*}
[/remark]
This statement is often called the waterbed effect: lowering $|S|$ over one frequency band forces a compensating increase elsewhere, with unstable open-loop poles increasing the required total excess. The high-frequency rolloff assumption is part of the statement, because additional feedthrough or slower decay changes the contour contribution at infinity. The exclusion of unstable cancellations is also essential: a cancelled unstable mode can disappear from the displayed transfer function while remaining internally present in a realization. In this chapter the result functions as a limitation theorem rather than as a design recipe: it tells the designer which frequency-domain tradeoffs cannot be removed by increasing controller order.
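The integral identity can be checked numerically on small examples. The sketch below uses two hypothetical loops, both with relative degree two and a stable closed loop: one with no open-loop right half-plane pole and one with a single pole at $s=1$. The substitution $\omega=\tan\theta$ and the midpoint rule are numerical conveniences chosen here, not part of the statement.

```python
import math

def bode_integral(loop, n=100_000):
    """Midpoint-rule approximation of the integral of log|S(i w)| over
    w in [0, inf), using the substitution w = tan(theta)."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * h
        w = math.tan(theta)
        S = 1.0 / (1.0 + loop(complex(0.0, w)))
        total += math.log(abs(S)) / math.cos(theta) ** 2  # dw = dtheta / cos^2
    return total * h

def stable_loop(s):
    """Hypothetical loop with no right half-plane pole."""
    return 2.0 / (s + 1.0) ** 2

def unstable_loop(s):
    """Hypothetical loop with one right half-plane pole at s = 1."""
    return 4.0 / ((s - 1.0) * (s + 2.0))

integral_stable = bode_integral(stable_loop)
integral_unstable = bode_integral(unstable_loop)
print("stable open loop:  ", integral_stable, "(compare with 0)")
print("unstable open loop:", integral_unstable, "(compare with pi =", math.pi, ")")
```

For the second loop the closed-loop polynomial is $(s-1)(s+2)+4=s^2+s+2$, which is Hurwitz, so the stability hypothesis holds even though the open loop is unstable.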
[example: Disturbance Rejection Cannot Be Uniform]
Suppose the hypotheses of the *[Bode Sensitivity Integral](/theorems/6411)* give
\begin{align*}
\int_0^\infty \log |S(i\omega)|\,d\omega=0.
\end{align*}
Assume the design achieves disturbance rejection on a low-frequency interval $I$ of positive length, so $|S(i\omega)|<1$ for every $\omega\in I$. Choose $\omega_0$ in the interior of $I$. Since $S$ is stable and $L$ has no poles on the imaginary axis, $S$ has neither poles nor zeros there, so $\omega\mapsto \log |S(i\omega)|$ is finite and continuous near $\omega_0$. Because $\log |S(i\omega_0)|<0$, there is an interval $J\subset I$ of positive length $|J|$ and a number $\alpha>0$ such that
\begin{align*}
\log |S(i\omega)|\le -\alpha
\end{align*}
for every $\omega\in J$. Hence
\begin{align*}
\int_J \log |S(i\omega)|\,d\omega \le \int_J (-\alpha)\,d\omega = -\alpha |J|<0.
\end{align*}
Suppose, for contradiction, that $\log |S(i\omega)|\le 0$ almost everywhere outside $J$. Then
\begin{align*}
\int_0^\infty \log |S(i\omega)|\,d\omega
=
\int_J \log |S(i\omega)|\,d\omega
+
\int_{[0,\infty)\setminus J}\log |S(i\omega)|\,d\omega
<0,
\end{align*}
contradicting the integral value $0$. Therefore $\log |S(i\omega)|>0$ on some set of frequencies of positive measure, and on that set
\begin{align*}
\log |S(i\omega)|>0
\quad\Longleftrightarrow\quad
|S(i\omega)|>e^0=1.
\end{align*}
Thus improving disturbance rejection on one frequency band necessarily creates a frequency range where disturbances are amplified, which is why specifications are stated over bands rather than as uniform improvement at all frequencies.
[/example]
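A concrete loop makes the amplification band visible. The following sketch uses the hypothetical loop $L(s)=10/((s+1)(s+2))$, chosen here only for illustration, and scans $|S(i\omega)|$ on a uniform grid; the function and variable names are not from the course.

```python
def sensitivity(w, gain=10.0):
    """S(i w) for the illustrative loop L(s) = gain / ((s + 1)(s + 2))."""
    s = 1j * w
    return 1.0 / (1.0 + gain / ((s + 1.0) * (s + 2.0)))

freqs = [0.01 * j for j in range(5001)]      # 0 to 50 rad/s
mags = [abs(sensitivity(w)) for w in freqs]

peak = max(mags)
w_peak = freqs[mags.index(peak)]
amplified = [w for w, m in zip(freqs, mags) if m > 1.0]

print("|S(0)| =", mags[0])                   # rejection at low frequency
print("peak |S| =", peak, "at w =", w_peak)  # amplification elsewhere
print("first grid frequency with |S| > 1:", amplified[0])
```

For this particular loop, $|S(i\omega)|=1$ exactly when $\omega^2=7$, so the scan should report amplification starting just above $\omega\approx 2.65$.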
## Noise Amplification and Observer Bandwidth
The observer chapters showed how to assign estimator poles when the pair $(C,A)$ is observable. The robustness question is what happens when the measurement is noisy, because fast estimation requires high gain from the measurement residual into the state estimate. This section treats the observer as another feedback loop whose bandwidth must be chosen with the sensor model in mind.
[definition: Luenberger Observer With Measurement Noise]
Let $X=\mathbb{R}^n$ be the state space, $U=\mathbb{R}^m$ the input space, $Y=\mathbb{R}^p$ the measured-output space, and $N=\mathbb{R}^p$ the measurement-noise space. Let
\begin{align*}
A:X\to X,\qquad B:U\to X,\qquad C:X\to Y,\qquad H:Y\to X
\end{align*}
be linear maps. For the system with state equation $\dot{x}=Ax+Bu$ and measured output $y=Cx+n$, where $x(t)\in X$, $u(t)\in U$, $y(t)\in Y$, and $n(t)\in N$, a Luenberger observer with gain $H$ is the system on $X$ given by
\begin{align*}
\dot{\hat{x}} = A\hat{x}+Bu+H(y-C\hat{x}).
\end{align*}
[/definition]
The estimation error $\tilde{x}=x-\hat{x}$ is the right coordinate for assessing the design because the plant state itself mixes control input, initial condition, and sensor error. After subtracting the observer from the plant, the input $u$ cancels, but measurement noise remains injected through the gain $H$. The key design obstruction is therefore not only whether $A-HC$ is stable, but how the chosen observer gain filters or amplifies noise in the error channel.
[quotetheorem:6412]
[citeproof:6412]
Fast observer poles reduce the homogeneous error quickly, but the same gain matrix $H$ multiplies measurement noise before it is filtered by the observer dynamics. The Hurwitz hypothesis is the boundary between a stable estimator filter and an estimator whose error grows through its own unstable dynamics even when the noise is identically zero. The theorem does not choose $H$; it only exposes the transfer map that must be checked after pole assignment. The resulting controller may look excellent in a noise-free simulation and perform poorly with real sensors, which is why observer design is paired with bandwidth and noise modelling rather than treated as pure eigenvalue placement.
[example: Aggressive Scalar Observer]
Consider the scalar plant $\dot{x}=ax+bu$ with measured output $y=x+n$, and use the observer
\begin{align*}
\dot{\hat{x}}=a\hat{x}+bu+h(y-\hat{x}).
\end{align*}
For $\tilde{x}=x-\hat{x}$, substituting $y=x+n$ into the observer equation gives
\begin{align*}
\dot{\tilde{x}}
&=\dot{x}-\dot{\hat{x}}\\
&=(ax+bu)-\bigl(a\hat{x}+bu+h(x+n-\hat{x})\bigr)\\
&=a(x-\hat{x})-h(x-\hat{x})-hn\\
&=(a-h)\tilde{x}-hn.
\end{align*}
If $n=0$ and $h>a$, then
\begin{align*}
\dot{\tilde{x}}=-(h-a)\tilde{x}.
\end{align*}
The scalar homogeneous solution is
\begin{align*}
\tilde{x}(t)=e^{-(h-a)t}\tilde{x}(0),
\end{align*}
so the nominal estimation error decays at exponential rate $h-a$.
To compute the noise-to-error transfer function, take Laplace transforms with zero initial condition:
\begin{align*}
s\tilde{X}(s)=(a-h)\tilde{X}(s)-hN(s).
\end{align*}
Moving the $\tilde{X}(s)$ terms to the left gives
\begin{align*}
(s-a+h)\tilde{X}(s)=-hN(s).
\end{align*}
Dividing by $(s-a+h)N(s)$ gives
\begin{align*}
\frac{\tilde{X}(s)}{N(s)}=-\frac{h}{s-a+h}.
\end{align*}
Thus
\begin{align*}
G_{n\to\tilde{x}}(s)=-\frac{h}{s-a+h}.
\end{align*}
On the imaginary axis,
\begin{align*}
G_{n\to\tilde{x}}(i\omega)=-\frac{h}{i\omega+h-a}.
\end{align*}
Therefore
\begin{align*}
|G_{n\to\tilde{x}}(i\omega)|^2=\frac{h^2}{(h-a)^2+\omega^2}.
\end{align*}
At $\omega=0$, assuming also $h>0$,
\begin{align*}
|G_{n\to\tilde{x}}(0)|=\frac{h}{h-a},
\end{align*}
and this tends to $1$ as $h\to\infty$. The half-power frequency relative to the zero-frequency gain is determined by
\begin{align*}
\frac{h^2}{(h-a)^2+\omega^2}=\frac{1}{2}\frac{h^2}{(h-a)^2}.
\end{align*}
Cancelling $h^2$ and cross-multiplying gives
\begin{align*}
2(h-a)^2=(h-a)^2+\omega^2.
\end{align*}
Hence
\begin{align*}
\omega^2=(h-a)^2.
\end{align*}
Since frequency is nonnegative and $h>a$, the half-power frequency is
\begin{align*}
\omega=h-a.
\end{align*}
Increasing $h$ therefore makes the nominal observer error decay faster, but it also moves the noise bandwidth upward, so sensor noise is passed into the estimation error over a wider frequency range.
[/example]
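The tradeoff in this example can be tabulated directly from the transfer function $G_{n\to\tilde{x}}(s)=-h/(s-a+h)$. The sketch below assumes a hypothetical plant pole $a=1$ and a few sample gains; the helper name `noise_gain` is introduced here for illustration.

```python
def noise_gain(w, a, h):
    """|G(i w)| for G(s) = -h / (s - a + h), the noise-to-error transfer function."""
    return abs(-h / (1j * w + h - a))

a = 1.0  # hypothetical unstable plant pole
for h in (2.0, 5.0, 20.0):
    dc = noise_gain(0.0, a, h)
    w_half = h - a                     # half-power frequency from the example
    ratio = (noise_gain(w_half, a, h) / dc) ** 2
    print(f"h={h:5.1f}  decay rate={h - a:5.1f}  DC gain={dc:.3f}  "
          f"power ratio at w=h-a: {ratio:.3f}")
```

Each row should show a power ratio of $0.5$ at $\omega=h-a$, while the decay rate and the noise bandwidth grow together as $h$ increases.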
## Internal Stability of Interconnected State Space Systems
Input-output stability of a single closed-loop transfer function is not enough when the system is an interconnection with hidden states. A pole-zero cancellation can hide an unstable internal mode from a chosen input-output channel. Internal stability asks for stability of every state in the interconnected realization, not only boundedness of a displayed transfer function.
[definition: Internal Stability]
A finite-dimensional linear interconnection is internally stable if, with all external inputs set to zero, every internal state component converges to $0$ exponentially for every initial condition.
[/definition]
This definition is stated in terms of an actual finite-dimensional realization of the interconnected system, with its specified internal state space and state equation. For dynamic output feedback, the next step is to build the combined state equation and reduce internal stability to the Hurwitz property of its closed-loop generator.
[quotetheorem:6413]
[citeproof:6413]
The criterion makes the danger of exact cancellation precise. The well-posedness assumption excludes algebraic feedback contradictions, while the Hurwitz condition tests every component of the combined plant-controller state. A transfer function may lose an unstable pole after algebraic cancellation, while the cancelled mode remains an eigenvalue of the interconnected state matrix. The theorem does not say that every realization of a stable input-output transfer function is internally stable; minimality and realization structure still matter. This is the reason robustness analysis treats internal stability before applying uncertainty margins.
[example: Hidden Unstable Mode]
Consider the realization
\begin{align*}
\dot{x}_1=-x_1+u,\qquad \dot{x}_2=x_2,\qquad y=x_1.
\end{align*}
To compute the input-output transfer function, take Laplace transforms with zero initial condition. The first state equation gives
\begin{align*}
sX_1(s)=-X_1(s)+U(s).
\end{align*}
Moving the $X_1(s)$ terms to the left gives
\begin{align*}
(s+1)X_1(s)=U(s).
\end{align*}
Since $Y(s)=X_1(s)$, division by $U(s)$ gives
\begin{align*}
\frac{Y(s)}{U(s)}=\frac{X_1(s)}{U(s)}=\frac{1}{s+1}.
\end{align*}
Thus the transfer function from $u$ to $y$ has only the stable pole $-1$.
The second state does not appear in the output equation, since $y=x_1$ contains no $x_2$ term. With external input set to zero, its internal dynamics are
\begin{align*}
\dot{x}_2=x_2.
\end{align*}
The scalar solution is
\begin{align*}
x_2(t)=e^t x_2(0),
\end{align*}
because
\begin{align*}
\frac{d}{dt}\bigl(e^t x_2(0)\bigr)=e^t x_2(0)=x_2(t).
\end{align*}
If $x_2(0)\ne 0$, then $|x_2(t)|=e^t|x_2(0)|\to\infty$ as $t\to\infty$, so this internal state does not converge to $0$ exponentially. The realization is therefore not internally stable, even though the displayed input-output transfer function is stable; this is why hidden modes cannot be ignored in feedback analysis.
[/example]
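A short simulation shows the same behaviour in the time domain. The sketch below integrates the realization with forward Euler under a unit step input; the step size, horizon, and tiny initial value of $x_2$ are assumptions chosen for illustration.

```python
def simulate(x1, x2, u=1.0, dt=1e-3, t_final=25.0):
    """Forward-Euler simulation of x1' = -x1 + u, x2' = x2, with y = x1."""
    steps = int(round(t_final / dt))
    for _ in range(steps):
        x1, x2 = x1 + dt * (-x1 + u), x2 + dt * x2
    return x1, x2   # the output y equals x1

# A very small initial disturbance in the hidden state is enough.
y_final, x2_final = simulate(x1=0.0, x2=1e-6)
print("output y(25)  =", y_final)    # settles near the DC gain 1
print("hidden x2(25) =", x2_final)   # grows roughly like 1e-6 * e^25
```

The measured output looks perfectly well behaved, while the unobserved state has grown by many orders of magnitude.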
## Small Gain and Stable Unmodelled Dynamics
A model is often trusted only up to an uncertainty block: neglected flexible modes, delay approximations, actuator dynamics, or sensor filters. The small-gain viewpoint asks whether stability persists for every stable uncertainty whose gain is below a specified margin. This gives a robustness test that uses norms rather than exact pole locations.
[definition: Stable SISO Transfer Function Norm]
Let $RH_\infty$ denote the set of proper scalar real-rational transfer functions with no poles in the closed right half-plane. The stable SISO transfer-function norm is the functional $\|\cdot\|_\infty: RH_\infty \to \mathbb{R}_{\ge 0}$ defined by
\begin{align*}
\|G\|_\infty := \sup_{\omega\in\mathbb R}|G(i\omega)|.
\end{align*}
[/definition]
This norm measures the largest sinusoidal amplification over frequency. In a feedback loop, the product of two stable gains controls whether repeated circulation around the loop can amplify a signal without bound, which leads to the small-gain stability test.
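In practice the supremum is often estimated by scanning a frequency grid. The sketch below is a minimal version of that idea: the function name `hinf_norm_estimate`, the grid limits, and the two test transfer functions are assumptions made here, and a grid scan only gives a lower bound on the true norm.

```python
def hinf_norm_estimate(G, w_min=1e-3, w_max=1e3, n=20001):
    """Grid estimate of sup_w |G(i w)|; a lower bound on the true norm."""
    best = abs(G(0.0))
    for j in range(n):
        # Logarithmically spaced frequencies between w_min and w_max.
        w = w_min * (w_max / w_min) ** (j / (n - 1))
        best = max(best, abs(G(1j * w)))
    return best

def lag(s):
    """First-order lag 1 / (s + 1)."""
    return 1.0 / (s + 1.0)

def resonant(s, zeta=0.1):
    """Lightly damped second-order system 1 / (s^2 + 2 zeta s + 1)."""
    return 1.0 / (s * s + 2.0 * zeta * s + 1.0)

lag_norm = hinf_norm_estimate(lag)
resonant_norm = hinf_norm_estimate(resonant)
print("lag      :", lag_norm)        # peak at w = 0
print("resonant :", resonant_norm)   # peak near w = 1
```

For the resonant example the exact peak is $1/(2\zeta\sqrt{1-\zeta^2})\approx 5.025$ with $\zeta=0.1$, which the grid estimate should reproduce closely. Both systems have DC gain $1$, so the norm distinguishes them only through the resonance.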
[quotetheorem:6414]
[citeproof:6414]
The theorem turns uncertainty modelling into a margin calculation. The strict inequality is essential: equality can place the Nyquist curve through $-1$, producing a marginal or unstable closed loop. Stability of $G$ and $\Delta$ is also part of the hypothesis; in this form the test gives no conclusion when either block contains unstable dynamics. If the nominal closed loop exposes an uncertainty channel with transfer function $G$, then any stable multiplicative or additive error small enough in $\|\cdot\|_\infty$ preserves stability. This connects the frequency-domain sensitivity viewpoint with model validation, because the same frequencies at which $S$ and $T$ are large are usually the frequencies at which uncertainty margins are tight.
[example: Unmodelled Actuator Lag]
Suppose $\tau>0$ and a controller was designed for the ideal actuator $1$, while the actual actuator is
\begin{align*}
A_\tau(s)=\frac{1}{\tau s+1}.
\end{align*}
The additive modelling error is
\begin{align*}
A_\tau(s)-1=\frac{1}{\tau s+1}-1.
\end{align*}
Putting the two terms over the common denominator $\tau s+1$ gives
\begin{align*}
A_\tau(s)-1=\frac{1-(\tau s+1)}{\tau s+1}.
\end{align*}
Since $1-(\tau s+1)=-\tau s$, this becomes
\begin{align*}
A_\tau(s)-1=\frac{-\tau s}{\tau s+1}.
\end{align*}
On the imaginary axis,
\begin{align*}
A_\tau(i\omega)-1=\frac{-\tau i\omega}{1+\tau i\omega}.
\end{align*}
Therefore
\begin{align*}
|A_\tau(i\omega)-1|^2=\frac{|-\tau i\omega|^2}{|1+\tau i\omega|^2}.
\end{align*}
The numerator is
\begin{align*}
|-\tau i\omega|^2=\tau^2\omega^2,
\end{align*}
and the denominator is
\begin{align*}
|1+\tau i\omega|^2=1+\tau^2\omega^2.
\end{align*}
Hence
\begin{align*}
|A_\tau(i\omega)-1|^2=\frac{\tau^2\omega^2}{1+\tau^2\omega^2}.
\end{align*}
At $\omega=0$ this gives
\begin{align*}
|A_\tau(0)-1|^2=0,
\end{align*}
so the modelling error is zero at constant signals. For high frequency,
\begin{align*}
\frac{\tau^2\omega^2}{1+\tau^2\omega^2}=\frac{1}{1+\frac{1}{\tau^2\omega^2}}.
\end{align*}
As $\omega\to\infty$, the term $1/(\tau^2\omega^2)$ tends to $0$, so
\begin{align*}
\lim_{\omega\to\infty}|A_\tau(i\omega)-1|^2=1.
\end{align*}
Thus the error magnitude tends to $1$ at high frequency.
At the actuator bandwidth scale $\omega=1/\tau$,
\begin{align*}
|A_\tau(i/\tau)-1|^2=\frac{\tau^2(1/\tau)^2}{1+\tau^2(1/\tau)^2}.
\end{align*}
Since $\tau^2(1/\tau)^2=1$, this reduces to
\begin{align*}
|A_\tau(i/\tau)-1|^2=\frac{1}{2}.
\end{align*}
Therefore
\begin{align*}
|A_\tau(i/\tau)-1|=\frac{1}{\sqrt{2}}.
\end{align*}
If the nominal uncertainty channel is $G$, the small-gain sufficient condition is
\begin{align*}
\|G(A_\tau-1)\|_\infty<1
\end{align*}
by the *[Small Gain Theorem For Stable SISO Transfer Functions](/theorems/6414)*. Since the supremum over all real frequencies is at least the value at $\omega=1/\tau$,
\begin{align*}
\|G(A_\tau-1)\|_\infty\ge |G(i/\tau)|\,|A_\tau(i/\tau)-1|.
\end{align*}
Using the computed value of $|A_\tau(i/\tau)-1|$ gives
\begin{align*}
\|G(A_\tau-1)\|_\infty\ge \frac{|G(i/\tau)|}{\sqrt{2}}.
\end{align*}
Thus any design with $|G(i/\tau)|\ge \sqrt{2}$ fails this sufficient small-gain margin test at $\omega=1/\tau$. The actuator lag is harmless only where the nominal loop leaves enough frequency-domain margin, so aggressive high-bandwidth designs must include actuator bandwidth in the model.
[/example]
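The small-gain test for the actuator lag can be evaluated numerically once a channel $G$ is fixed. The sketch below assumes the hypothetical channel $G(s)=g/(s+1)$, which is not taken from the course; for this choice a direct calculation gives $\|G(A_\tau-1)\|_\infty=g\tau/(1+\tau)$, so the grid estimate can be compared with a closed form.

```python
def loop_gain_peak(g, tau, n=20001, w_min=1e-4, w_max=1e4):
    """Grid estimate of ||G (A_tau - 1)||_inf for the assumed G(s) = g / (s + 1)."""
    best = 0.0
    for j in range(n):
        w = w_min * (w_max / w_min) ** (j / (n - 1))
        s = 1j * w
        G = g / (s + 1.0)
        err = 1.0 / (tau * s + 1.0) - 1.0   # additive error A_tau(s) - 1
        best = max(best, abs(G * err))
    return best

g = 3.0
for tau in (0.1, 1.0):
    peak = loop_gain_peak(g, tau)
    verdict = "passes" if peak < 1.0 else "fails"
    print(f"tau={tau}: peak={peak:.4f}, "
          f"closed form={g * tau / (1 + tau):.4f}, small-gain test {verdict}")
```

With these numbers the fast actuator ($\tau=0.1$) passes and the slow one ($\tau=1$) fails. Because the condition is only sufficient, a failed test means the margin is not certified, not that the loop is necessarily unstable.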
## Limits of Exact Pole Placement
Pole placement is an algebraic achievement: under controllability or observability hypotheses, the designer can assign a finite list of eigenvalues. Robust control asks a different question: whether the assigned dynamics survive perturbations, saturations, neglected states, and noise. The answer depends on eigenvector conditioning, input size, model validity, and the signal paths described above.
[remark: Pole Locations Are Not The Whole Closed Loop]
Two matrices may have the same eigenvalues while having very different transient behaviour because their eigenvectors are conditioned differently. A closed-loop matrix with far-left eigenvalues can still generate large short-time amplification before asymptotic decay. Thus a pole-placement design should be checked through state norms, control effort, sensitivity functions, and uncertainty margins.
[/remark]
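A two-state example illustrates the remark. The matrix $\begin{pmatrix}-1 & c\\ 0 & -2\end{pmatrix}$ has eigenvalues $-1$ and $-2$ for every coupling $c$, but its transient depends strongly on $c$. The sketch below uses the closed-form solution from $x(0)=(0,1)$; the matrix, the initial state, and the helper names are illustrative assumptions.

```python
import math

def state_norm(t, c):
    """Euclidean norm of exp(A t) x0 for A = [[-1, c], [0, -2]], x0 = (0, 1)."""
    x2 = math.exp(-2.0 * t)
    x1 = c * (math.exp(-t) - math.exp(-2.0 * t))
    return math.hypot(x1, x2)

def peak_norm(c, t_final=10.0, dt=1e-3):
    """Largest state norm on a uniform time grid over [0, t_final]."""
    steps = int(round(t_final / dt))
    return max(state_norm(j * dt, c) for j in range(steps + 1))

for c in (0.0, 100.0):
    print(f"coupling c={c:6.1f}: eigenvalues -1, -2, peak norm = {peak_norm(c):.3f}")
```

With $c=0$ the norm never exceeds its initial value $1$; with $c=100$ it rises to about $25$ near $t=\ln 2$ before decaying, even though the eigenvalues are identical.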
The most common modelling failure is not a subtle transfer-function issue but a violation of the assumed input equation. A mathematical feedback law may request values of $u$ that the actuator cannot deliver.
[example: Actuator Saturation As A Modelling Warning]
For the scalar system $\dot{x}=x+u$, choose a gain $k>1$. In the unsaturated model the feedback law is $u=-kx$. Substituting this input into the state equation gives
\begin{align*}
\dot{x}
&=x+u\\
&=x-kx\\
&=(1-k)x\\
&=-(k-1)x.
\end{align*}
Thus the nominal closed-loop scalar equation is stable: its solution is
\begin{align*}
x(t)=e^{-(k-1)t}x(0),
\end{align*}
and since $k-1>0$, $e^{-(k-1)t}\to 0$ as $t\to\infty$.
Now impose an actuator bound $|u|\le M$ with $M>0$. The implemented input is $u=\operatorname{sat}(-kx)$, where $\operatorname{sat}(v)=M$ for $v\ge M$, $\operatorname{sat}(v)=v$ for $-M\le v\le M$, and $\operatorname{sat}(v)=-M$ for $v\le -M$. The linear feedback is actually implemented only when
\begin{align*}
-M\le -kx\le M.
\end{align*}
Since $k>0$, this is equivalent to
\begin{align*}
-\frac{M}{k}\le x\le \frac{M}{k}.
\end{align*}
Outside this interval the closed-loop equation is not $\dot{x}=-(k-1)x$. If $x>M/k$, then $-kx<-M$, so $u=-M$ and
\begin{align*}
\dot{x}=x-M.
\end{align*}
If $x<-M/k$, then $-kx>M$, so $u=M$ and
\begin{align*}
\dot{x}=x+M.
\end{align*}
For example, when $x>\max\{M/k,M\}$, the saturated dynamics satisfy
\begin{align*}
\dot{x}=x-M>0.
\end{align*}
So the state moves farther in the positive direction instead of decaying. The high-gain pole-placement calculation is therefore a local model calculation, not a global stability guarantee; actuator constraints and regions of attraction must be checked even for this first-order plant.
[/example]
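The local nature of the design shows up immediately in simulation. The sketch below integrates $\dot{x}=x+\operatorname{sat}(-kx)$ by forward Euler with the hypothetical values $k=5$ and $M=1$; the step size, horizon, and initial states are likewise chosen only for illustration.

```python
def sat(v, M):
    """Clip v to the interval [-M, M]."""
    return max(-M, min(M, v))

def simulate(x0, k=5.0, M=1.0, dt=1e-3, t_final=10.0):
    """Forward-Euler simulation of x' = x + sat(-k x)."""
    x = x0
    for _ in range(int(round(t_final / dt))):
        x += dt * (x + sat(-k * x, M))
    return x

for x0 in (0.1, 0.9, 1.1):
    print(f"x(0)={x0}: x(10) = {simulate(x0):.4g}")
```

For these values the trajectories starting at $0.1$ and $0.9$ decay to zero, while the one starting at $1.1$ grows, consistent with $\dot{x}=x-M>0$ for $x>M$. The initial state $0.9$ lies outside the linear region $|x|\le M/k$ but still converges, because $\dot{x}=x-M<0$ there.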
The course ends with a synthesis principle. Linear design methods are powerful because controllability, observability, Riccati equations, Kalman filtering, and separation reduce hard design tasks to structured algebra. They remain reliable only when paired with robustness checks that respect the frequency content of disturbances, the quality of measurements, internal stability of realizations, and the physical limits of actuators.
## Beyond and Connected Topics
These notes sit between finite-dimensional linear algebra and the broader theory of dynamical systems. The state-space viewpoint used throughout is the linear version of the autonomous systems studied in [Dynamics of ODEs](/page/Dynamics%20of%20ODEs): equilibria, linearisation, and qualitative stability become explicit matrix calculations when the vector field is linear. The robustness discussion also points toward nonlinear stability and bifurcation theory, where pole locations alone no longer decide the dynamics.
Several later directions use the same input-output perspective with richer state spaces. Rough paths and controlled differential equations extend the idea of a system driven by inputs beyond smooth forcing, while stochastic filtering turns noisy measurements into conditional state estimates. Optimisation and statistics provide another route outward: Riccati equations arise as the optimality conditions of quadratic optimisation problems, and Kalman filtering is the linear-Gaussian case of recursive inference.
For an applied course sequence, the natural next topics are nonlinear control, robust control, stochastic control, and system identification. Each keeps the same central question from this page: which properties of a model survive feedback, uncertainty, measurement noise, and implementation constraints?
## References
Androma, [Dynamics of ODEs](/page/Dynamics%20of%20ODEs).
Androma, [Hopf Bifurcation](/page/Hopf%20Bifurcation).
Androma, [Bifurcation Theory of One-Dimensional Maps](/page/Bifurcation%20Theory%20of%20One-Dimensional%20Maps).
Control Theory I: Linear Systems
Created by admin on 6/11/2026 | Last updated on 6/11/2026