This course develops the theory and design methods for nonlinear and optimal control systems. It moves beyond linear models to study equilibria, stability, feedback design, observability, and performance optimization for systems whose dynamics are genuinely nonlinear. The central goal is to understand how to analyze such systems rigorously and then turn that analysis into control laws, estimators, and algorithms that work in practice.
The chapters build in a natural progression. The first part establishes nonlinear system behaviour and Lyapunov stability, then uses those tools for feedback design, feedback linearization, and observer construction. The second part turns to optimal control, starting from the [calculus of variations](/page/Calculus%20of%20Variations) and the Pontryagin Maximum Principle, then moving to dynamic programming, Hamilton-Jacobi-Bellman equations, and viscosity solutions for nonsmooth value functions. The final chapters address constrained and robust control, including model predictive control, disturbance-aware design, and numerical methods, tying the theory to computational case studies and implementation.
# Introduction
This opening chapter sets the scope of the course and fixes the vocabulary used throughout the notes. The first control course treats linear models, transfer functions, controllability, observability, LQR, and Kalman filtering as the central language. Here the same questions are asked for systems whose dynamics may curve, saturate, switch behaviour across regions of state space, or impose hard constraints on feasible motion.
The course has two intertwined themes. Nonlinear control asks how to design feedback laws when superposition is unavailable and local linear models do not capture global behaviour. Optimal control asks how to choose inputs by minimizing a cost, often under differential constraints, endpoint constraints, uncertainty, and computational limits.
## What Changes Beyond Linear Control?
The first question is why linear theory is not enough. Linearization remains one of the main tools, but a nonlinear system may have several equilibria, finite escape time, invariant sets, state constraints, or behaviour determined by terms that vanish in the first derivative. A controller that stabilizes the tangent model may describe only a local picture, while the applied problem may require a large region of attraction or performance guarantees far from equilibrium.
A useful baseline is the controlled ordinary differential equation. It tells us what the state is, what the input is allowed to do, and how a trajectory is generated.
[definition: Controlled Dynamical System]
A controlled dynamical system consists of a state space $X \subset \mathbb R^n$, a control set $U \subset \mathbb R^m$, and a map $f: X \times U \to \mathbb R^n$ such that admissible trajectories satisfy
\begin{align*}
\dot{x}(t) = f(x(t), u(t)).
\end{align*}
[/definition]
In most of the course, $X$ is an open subset of $\mathbb R^n$ or a constraint set inside $\mathbb R^n$, and admissible controls are [measurable functions](/page/Measurable%20Functions) $u: [0,T] \to U$. This definition is deliberately broad: it includes mechanical systems, vehicles, population models, chemical reactors, and discretized PDE models.
[example: Pendulum With Torque Input]
Let $x_1$ be the angle of a pendulum and let $x_2$ be the angular velocity. With torque input $u \in \mathbb R$, damping coefficient $b \ge 0$, gravitational acceleration $g>0$, and length $\ell>0$, the state model is
\begin{align*}
\dot{x}_1=x_2.
\end{align*}
\begin{align*}
\dot{x}_2=-\frac{g}{\ell}\sin x_1-bx_2+u.
\end{align*}
For a constant input $u=\bar u$, an equilibrium $(\bar x_1,\bar x_2)$ must satisfy
\begin{align*}
0=\bar x_2.
\end{align*}
\begin{align*}
0=-\frac{g}{\ell}\sin \bar x_1-b\bar x_2+\bar u.
\end{align*}
Substituting $\bar x_2=0$ gives
\begin{align*}
0=-\frac{g}{\ell}\sin \bar x_1+\bar u.
\end{align*}
Equivalently,
\begin{align*}
\sin \bar x_1=\frac{\ell}{g}\bar u.
\end{align*}
Since $|\sin \bar x_1|\le 1$, an equilibrium exists only when $|\bar u|\le g/\ell$. In particular, when $\bar u=0$, every angle $\bar x_1=k\pi$ with $k\in\mathbb Z$ and velocity $\bar x_2=0$ is an equilibrium.
The nonlinear vector field is
\begin{align*}
f(x_1,x_2,u)=\left(x_2,-\frac{g}{\ell}\sin x_1-bx_2+u\right).
\end{align*}
Its derivatives with respect to the state variables are
\begin{align*}
\frac{\partial f_1}{\partial x_1}=0,\quad \frac{\partial f_1}{\partial x_2}=1,\quad \frac{\partial f_2}{\partial x_1}=-\frac{g}{\ell}\cos x_1,\quad \frac{\partial f_2}{\partial x_2}=-b.
\end{align*}
Therefore the linearization at an equilibrium angle $\bar x_1$ sends a perturbation $(\xi_1,\xi_2)$ to
\begin{align*}
\left(\xi_2,-\frac{g}{\ell}\cos(\bar x_1)\xi_1-b\xi_2\right).
\end{align*}
At the downward equilibrium $\bar x_1=0$, $\cos 0=1$, so the linearized perturbation equation is
\begin{align*}
\dot{\xi}_1=\xi_2,\quad \dot{\xi}_2=-\frac{g}{\ell}\xi_1-b\xi_2.
\end{align*}
At the upright equilibrium $\bar x_1=\pi$, $\cos \pi=-1$, so the linearized perturbation equation is
\begin{align*}
\dot{\xi}_1=\xi_2,\quad \dot{\xi}_2=\frac{g}{\ell}\xi_1-b\xi_2.
\end{align*}
The two linearizations differ by the sign of the stiffness term, even though they come from the same nonlinear equation; this is why a torque law designed near one equilibrium cannot be transferred to the other without checking the nonlinear dynamics and the region where the local model is valid.
[/example]
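The sign change in the stiffness term can be checked numerically. The following Python sketch computes the eigenvalues of the linearized system matrix at both equilibria from its characteristic polynomial $\lambda^2+b\lambda+\frac{g}{\ell}\cos\bar x_1=0$. The parameter values and the helper name `linearization_eigenvalues` are illustrative assumptions, not part of the notes.

```python
import cmath
import math


def linearization_eigenvalues(x1_bar, g=9.81, ell=1.0, b=0.5):
    """Eigenvalues of the matrix [[0, 1], [-(g/ell) cos(x1_bar), -b]].

    They are the roots of lam^2 + b*lam + (g/ell)*cos(x1_bar) = 0.
    The default parameter values are illustrative assumptions.
    """
    stiffness = (g / ell) * math.cos(x1_bar)
    disc = cmath.sqrt(b * b - 4.0 * stiffness)
    return (-b + disc) / 2.0, (-b - disc) / 2.0


down = linearization_eigenvalues(0.0)       # downward equilibrium
up = linearization_eigenvalues(math.pi)     # upright equilibrium
print("downward:", down)
print("upright: ", up)
```

With positive damping, both downward eigenvalues have negative real part, while the upright equilibrium has one positive and one negative real eigenvalue, so its linearization is a saddle.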
The pendulum also illustrates the first recurring design split. We can either design locally, using a tangent approximation near a chosen equilibrium, or design globally, using energy, invariance, geometry, or optimization.
## Feedback, Stability, and Lyapunov Functions
The next question is what it means for a nonlinear controller to succeed. In linear systems, stability is often read from eigenvalues of a matrix. In nonlinear systems, stability is a property of an equilibrium, an invariant set, or a trajectory, and the answer may depend strongly on the initial condition.
A feedback law closes the loop by choosing the input as a function of the current state. This turns the controlled system into an autonomous dynamical system, so stability theory becomes the core diagnostic.
[definition: State Feedback Law]
For a controlled system $\dot{x}=f(x,u)$ with state space $X \subset \mathbb R^n$ and control set $U \subset \mathbb R^m$, a state feedback law is a map $k: X \to U$. The associated closed-loop system is
\begin{align*}
\dot{x}(t) = f(x(t), k(x(t))).
\end{align*}
[/definition]
The regularity of $k$ matters. Continuous feedback often suffices for stability statements, while discontinuous feedback and sampled-data feedback require more care about the meaning of a solution. Before measuring stability, we need the point or set around which the closed-loop motion is being compared.
[definition: Equilibrium Of A Closed-Loop System]
Let $g: X \to \mathbb R^n$ define an autonomous system $\dot{x}=g(x)$ on $X \subset \mathbb R^n$. A point $x^* \in X$ is an equilibrium if
\begin{align*}
g(x^*) = 0.
\end{align*}
[/definition]
An equilibrium identifies the target state, but it does not by itself say whether nearby trajectories stay nearby or approach the target. This motivates the following definition: we need a scalar certificate that measures displacement from the equilibrium and can be differentiated along closed-loop motion.
[definition: Lyapunov Candidate]
Let $x^*$ be an equilibrium of $\dot{x}=g(x)$ on $X \subset \mathbb R^n$. A Lyapunov candidate near $x^*$ is a continuously differentiable function $V: D \to \mathbb R$, defined on a neighbourhood $D \subset X$ of $x^*$, such that $V(x^*)=0$ and $V(x)>0$ for all $x \in D \setminus \{x^*\}$.
[/definition]
The derivative of $V$ along trajectories is
\begin{align*}
\dot{V}(x) = \nabla V(x) \cdot g(x).
\end{align*}
If $\dot{V}$ is negative away from the equilibrium, then the level sets of $V$ behave like nested traps for the motion.
[example: Energy As A Lyapunov Candidate]
For the damped pendulum with no input, the dynamics are $\dot{x}_1=x_2$ and
\begin{align*}
\dot{x}_2 = -\frac{g}{\ell}\sin x_1 - b x_2.
\end{align*}
Near the downward equilibrium $(0,0)$, consider the shifted mechanical energy
\begin{align*}
V(x_1,x_2)=\frac{1}{2}x_2^2 + \frac{g}{\ell}(1-\cos x_1).
\end{align*}
It satisfies $V(0,0)=0$. Also, if $|x_1|<\pi$ and $(x_1,x_2)\ne(0,0)$, then $1-\cos x_1\ge 0$, with equality only when $x_1=0$, and $\frac{1}{2}x_2^2\ge 0$, with equality only when $x_2=0$. Hence $V(x_1,x_2)>0$ on this neighbourhood away from $(0,0)$.
The partial derivatives of $V$ are
\begin{align*}
\frac{\partial V}{\partial x_1}(x_1,x_2)=\frac{g}{\ell}\sin x_1
\end{align*}
and
\begin{align*}
\frac{\partial V}{\partial x_2}(x_1,x_2)=x_2.
\end{align*}
Therefore, along trajectories,
\begin{align*}
\dot V(x_1,x_2)=\frac{\partial V}{\partial x_1}\dot{x}_1+\frac{\partial V}{\partial x_2}\dot{x}_2.
\end{align*}
Substituting the derivatives of $V$ and the pendulum equations gives
\begin{align*}
\dot V(x_1,x_2)=\frac{g}{\ell}\sin x_1 \cdot x_2+x_2\left(-\frac{g}{\ell}\sin x_1-bx_2\right).
\end{align*}
Expanding the second term,
\begin{align*}
\dot V(x_1,x_2)=\frac{g}{\ell}x_2\sin x_1-\frac{g}{\ell}x_2\sin x_1-bx_2^2.
\end{align*}
The two sine terms cancel, so
\begin{align*}
\dot V(x_1,x_2)=-b x_2^2.
\end{align*}
Thus $V$ never increases when $b\ge 0$. The limitation is that $\dot V=0$ whenever $x_2=0$, even at points with $x_1\ne 0$, so energy decrease alone does not prove convergence to the downward equilibrium; an invariance argument is needed to rule out motion that remains in the zero-derivative set.
[/example]
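As a hedged numerical check of this computation, one can integrate the unforced damped pendulum with a classical Runge-Kutta step and track $V$ along the way. The parameter values, step size, initial state, and function names below are illustrative assumptions.

```python
import math

G_OVER_L = 9.81   # g / ell, illustrative value
B = 0.3           # damping coefficient b, illustrative value


def f(x):
    """Vector field of the damped pendulum with no input."""
    x1, x2 = x
    return (x2, -G_OVER_L * math.sin(x1) - B * x2)


def V(x):
    """Shifted mechanical energy used as the Lyapunov candidate."""
    x1, x2 = x
    return 0.5 * x2 ** 2 + G_OVER_L * (1.0 - math.cos(x1))


def V_dot(x):
    """Directional derivative grad V(x) . f(x), computed term by term."""
    x1, x2 = x
    grad = (G_OVER_L * math.sin(x1), x2)
    fx = f(x)
    return grad[0] * fx[0] + grad[1] * fx[1]


def rk4_step(x, dt):
    """One classical fourth-order Runge-Kutta step for x' = f(x)."""
    def shifted(base, k, scale):
        return tuple(bi + scale * ki for bi, ki in zip(base, k))
    k1 = f(x)
    k2 = f(shifted(x, k1, dt / 2))
    k3 = f(shifted(x, k2, dt / 2))
    k4 = f(shifted(x, k3, dt))
    return tuple(
        xi + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        for xi, a1, a2, a3, a4 in zip(x, k1, k2, k3, k4)
    )


x = (1.0, 0.0)                 # released from rest at one radian
energies = [V(x)]
for _ in range(2000):          # 20 time units with dt = 0.01
    x = rk4_step(x, 0.01)
    energies.append(V(x))

print("V at start:", energies[0], " V at end:", energies[-1])
print("V_dot at (0.7, -1.3):", V_dot((0.7, -1.3)), " vs -b*x2^2:", -B * 1.3 ** 2)
```

The simulation only illustrates the decrease of $V$ for one initial state; it is not a substitute for the invariance argument.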
This example foreshadows LaSalle's invariance principle in Chapter 2, which extends strict Lyapunov decrease to cases where the derivative is only nonpositive. It also explains why the course treats stability, attractivity, exponential rates, and regions of attraction separately.
## Optimization As A Control Principle
The third question is how to choose among many stabilizing or feasible controls. Engineering specifications usually include time, energy, tracking error, terminal accuracy, actuator bounds, or safety constraints, so control design naturally becomes an optimization problem over trajectories.
[definition: Finite-Horizon Optimal Control Problem]
A finite-horizon optimal control problem consists of a time interval $[0,T]$, dynamics $\dot{x}=f(x,u)$, an initial condition $x(0)=x_0$, a control constraint $u(t) \in U$, a running cost $L: \mathbb R^n \times U \to \mathbb R$, and a terminal cost $\Phi: \mathbb R^n \to \mathbb R$. Let $\mathcal U$ denote the admissible measurable controls $u:[0,T]\to U$ for which the corresponding trajectory is defined on $[0,T]$. The objective is to minimize the cost functional $J:\mathcal U \to \mathbb R$ given by
\begin{align*}
J[u] = \Phi(x(T)) + \int_0^T L(x(t),u(t))\,dt
\end{align*}
over admissible controls and their corresponding trajectories.
[/definition]
This formulation shifts attention from a single feedback map to an optimization problem constrained by an ODE. Two major methods will appear: Pontryagin's maximum principle, which gives necessary conditions using an adjoint equation, and dynamic programming, which characterizes the value function through a Hamilton-Jacobi-Bellman equation.
[example: Minimum-Energy Transfer For A Double Integrator]
Consider the controlled system $\dot{x}_1=x_2$ and
\begin{align*}
\dot{x}_2=u
\end{align*}
on $[0,T]$, with fixed endpoints $x(0)=(a,b)$ and $x(T)=(c,d)$, and cost
\begin{align*}
J[u]=\int_0^T u(t)^2\,dt.
\end{align*}
Using the Hamiltonian
\begin{align*}
H(x,u,\lambda)=u^2+\lambda_1x_2+\lambda_2u,
\end{align*}
the pointwise minimizing condition is
\begin{align*}
0=\frac{\partial H}{\partial u}=2u+\lambda_2.
\end{align*}
Hence
\begin{align*}
u(t)=-\frac{1}{2}\lambda_2(t).
\end{align*}
The adjoint equations are
\begin{align*}
\dot{\lambda}_1=-\frac{\partial H}{\partial x_1}=0.
\end{align*}
\begin{align*}
\dot{\lambda}_2=-\frac{\partial H}{\partial x_2}=-\lambda_1.
\end{align*}
Therefore $\lambda_1(t)=p$ for some constant $p$, and integrating $\dot{\lambda}_2=-p$ gives
\begin{align*}
\lambda_2(t)=-pt+q
\end{align*}
for another constant $q$. Substituting into $u(t)=-\lambda_2(t)/2$ gives
\begin{align*}
u(t)=\frac{p}{2}t-\frac{q}{2}.
\end{align*}
Thus the minimizing input has the affine form
\begin{align*}
u(t)=\alpha t+\beta.
\end{align*}
Now impose the endpoint conditions. Since $\dot{x}_2=u$, we have
\begin{align*}
x_2(t)=b+\int_0^t(\alpha s+\beta)\,ds=b+\frac{\alpha}{2}t^2+\beta t.
\end{align*}
At $t=T$ this gives
\begin{align*}
d-b=\frac{\alpha}{2}T^2+\beta T.
\end{align*}
Since $\dot{x}_1=x_2$, we also have
\begin{align*}
x_1(t)=a+\int_0^t\left(b+\frac{\alpha}{2}s^2+\beta s\right)\,ds=a+bt+\frac{\alpha}{6}t^3+\frac{\beta}{2}t^2.
\end{align*}
At $t=T$ this gives
\begin{align*}
c-a-bT=\frac{\alpha}{6}T^3+\frac{\beta}{2}T^2.
\end{align*}
Set $\Delta v=d-b$ and $\Delta p=c-a-bT$. The two endpoint equations are
\begin{align*}
\frac{\alpha}{2}T^2+\beta T=\Delta v.
\end{align*}
\begin{align*}
\frac{\alpha}{6}T^3+\frac{\beta}{2}T^2=\Delta p.
\end{align*}
From the first equation,
\begin{align*}
\beta=\frac{\Delta v}{T}-\frac{\alpha}{2}T.
\end{align*}
Substituting this into the second equation gives
\begin{align*}
\frac{\alpha}{6}T^3+\frac{1}{2}T^2\left(\frac{\Delta v}{T}-\frac{\alpha}{2}T\right)=\Delta p.
\end{align*}
Expanding the left-hand side gives
\begin{align*}
\frac{\alpha}{6}T^3+\frac{\Delta v}{2}T-\frac{\alpha}{4}T^3=\Delta p.
\end{align*}
Combining the $\alpha$ terms gives
\begin{align*}
-\frac{\alpha}{12}T^3+\frac{\Delta v}{2}T=\Delta p.
\end{align*}
Therefore
\begin{align*}
\alpha=\frac{6\Delta v}{T^2}-\frac{12\Delta p}{T^3}.
\end{align*}
Substituting this value into $\beta=\Delta v/T-\alpha T/2$ gives
\begin{align*}
\beta=\frac{\Delta v}{T}-\frac{T}{2}\left(\frac{6\Delta v}{T^2}-\frac{12\Delta p}{T^3}\right).
\end{align*}
Expanding gives
\begin{align*}
\beta=\frac{\Delta v}{T}-\frac{3\Delta v}{T}+\frac{6\Delta p}{T^2}.
\end{align*}
Thus
\begin{align*}
\beta=\frac{6\Delta p}{T^2}-\frac{2\Delta v}{T}.
\end{align*}
The minimum-energy input is therefore
\begin{align*}
u(t)=\left(\frac{6(d-b)}{T^2}-\frac{12(c-a-bT)}{T^3}\right)t+\frac{6(c-a-bT)}{T^2}-\frac{2(d-b)}{T}.
\end{align*}
This example shows how the costate $\lambda_2$ acts as the multiplier attached to the acceleration equation $\dot{x}_2=u$: minimizing the Hamiltonian turns that multiplier directly into the input.
[/example]
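The closed-form coefficients can be sanity-checked by substituting them back into the trajectory formulas. The sketch below does this in Python for an illustrative rest-to-rest transfer; the function names and the numerical endpoint data are assumptions chosen for the check.

```python
def min_energy_coefficients(a, b, c, d, T):
    """Return (alpha, beta) for u(t) = alpha*t + beta from the endpoint equations."""
    dv = d - b
    dp = c - a - b * T
    alpha = 6.0 * dv / T ** 2 - 12.0 * dp / T ** 3
    beta = 6.0 * dp / T ** 2 - 2.0 * dv / T
    return alpha, beta


def state_at(t, a, b, alpha, beta):
    """Closed-form (x1(t), x2(t)) under the input u(s) = alpha*s + beta."""
    x2 = b + 0.5 * alpha * t ** 2 + beta * t
    x1 = a + b * t + alpha * t ** 3 / 6.0 + 0.5 * beta * t ** 2
    return x1, x2


def energy(alpha, beta, T):
    """Cost J: the integral of (alpha*t + beta)^2 over [0, T], done exactly."""
    return alpha ** 2 * T ** 3 / 3.0 + alpha * beta * T ** 2 + beta ** 2 * T


# Illustrative rest-to-rest transfer: from (0, 0) to (1, 0) in time T = 2.
a, b, c, d, T = 0.0, 0.0, 1.0, 0.0, 2.0
alpha, beta = min_energy_coefficients(a, b, c, d, T)
print("alpha, beta:", alpha, beta)
print("x(T):", state_at(T, a, b, alpha, beta))
print("J:", energy(alpha, beta, T))
```

For this data the input is $u(t)=1.5-1.5t$: it accelerates during the first half of the interval and brakes symmetrically during the second half.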
Optimal control also gives a bridge back to feedback. To avoid confusing this object with a Lyapunov function, write the value function as $\mathcal V(t,x)$. If $\mathcal V$ is known, the minimizing control can often be recovered from the Hamiltonian minimization rule. In practice, computing $\mathcal V$ exactly is hard in high dimension, which motivates approximation, viscosity solutions, and model predictive control.
## Computation, Constraints, and Model Predictive Control
The final question is how theoretical control laws become implementable algorithms. Real systems operate with actuator limits, state constraints, disturbances, model error, and sampled measurements. A useful controller must therefore be compatible with numerical optimization and robust enough to tolerate imperfect modelling.
[definition: Model Predictive Control]
Model predictive control is a feedback strategy in which, at each sampling time, a finite-horizon constrained optimal control problem is solved from the current state, the first part of the computed control is applied, and the procedure is repeated at the next sampling time.
[/definition]
The defining feature is receding-horizon implementation. The optimization problem looks open-loop over the prediction horizon, but repeated replanning turns it into a feedback mechanism.
[example: Constrained Velocity Control]
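Consider the scalar integrator $\dot{x}=v$ with state constraint $x\in[-2,2]$ and input bound $|v|\le 1$. At a sampling time, let the current state be $x\in[-2,2]$ and choose a constant input $v$ over a short horizon $h>0$. The predicted motion satisfies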
\begin{align*}
x(s)=x+sv
\end{align*}
for $0\le s\le h$, because $\dot{x}=v$ and $x(0)=x$. Consider the one-step constrained problem: minimize $(x+hv)^2$ subject to $|v|\le 1$ and $x+sv\in[-2,2]$ for every $0\le s\le h$.
First ignore the state constraint and set $y=x+hv$. The input bound $-1\le v\le 1$ implies
\begin{align*}
-h\le hv\le h.
\end{align*}
Adding $x$ gives
\begin{align*}
x-h\le y\le x+h.
\end{align*}
Thus the terminal state must lie in $[x-h,x+h]$, and the cost is $y^2$.
If $x>h$, then every $y\in[x-h,x+h]$ is positive, so $y^2$ is minimized at the endpoint closest to $0$, namely $y=x-h$. Solving $x+hv=x-h$ gives
\begin{align*}
hv=-h.
\end{align*}
Since $h>0$, this gives
\begin{align*}
v=-1.
\end{align*}
If $0\le x\le h$, then $0\in[x-h,x+h]$, so the minimum is attained at $y=0$. Solving $x+hv=0$ gives
\begin{align*}
v=-\frac{x}{h}.
\end{align*}
The same calculation on the negative side gives $v=1$ when $x<-h$, and $v=-x/h$ when $-h\le x\le 0$. Therefore the first applied input is $v(x)=1$ for $x<-h$, $v(x)=-x/h$ for $|x|\le h$, and $v(x)=-1$ for $x>h$.
For this input, the predicted terminal state is either $0$ or a point between $x$ and $0$. Since $[-2,2]$ is an interval and $x(s)=x+sv$ traces the line segment from $x$ to $x+hv$, every predicted state remains in $[-2,2]$ whenever the sampled state starts in $[-2,2]$. The optimization therefore produces saturated motion far from the origin and smaller inputs near the origin, while enforcing the input and state constraints inside the control computation.
[/example]
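The resulting feedback is a saturation of $-x/h$, and the receding-horizon loop can be sketched in a few lines of Python. The function names, the grid-search cross-check, and the values $x_0=2$, $h=0.5$ are illustrative assumptions.

```python
def mpc_input(x, h):
    """First applied input of the one-step problem: -x/h saturated to [-1, 1]."""
    return max(-1.0, min(1.0, -x / h))


def receding_horizon(x0, h, steps):
    """Apply the one-step optimizer repeatedly, holding each input for time h."""
    xs = [x0]
    for _ in range(steps):
        v = mpc_input(xs[-1], h)
        xs.append(xs[-1] + h * v)   # exact update, since x' = v with v constant
    return xs


def brute_force_input(x, h, n=2001):
    """Grid search over v in [-1, 1] as an independent check of mpc_input.

    The state constraint is checked only at s = h, which suffices here
    because x(s) is a line segment that starts inside the interval [-2, 2].
    """
    best_v, best_cost = None, float("inf")
    for i in range(n):
        v = -1.0 + 2.0 * i / (n - 1)
        y = x + h * v
        if -2.0 <= y <= 2.0 and y * y < best_cost:
            best_v, best_cost = v, y * y
    return best_v


trajectory = receding_horizon(2.0, 0.5, 6)
print(trajectory)
print(brute_force_input(0.3, 0.5), mpc_input(0.3, 0.5))
```

Starting from $x_0=2$ with $h=0.5$, the sampled state moves through $2, 1.5, 1, 0.5, 0$ under the saturated input $v=-1$ and then stays at the origin.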
The price of this flexibility is that stability is no longer automatic. Chapter 10 proves stability under terminal costs, terminal sets, and Lyapunov-type decrease conditions on the receding-horizon value function.
## How The Course Is Organized
The guiding problem for the course is to connect analysis, design, and computation without treating them as separate subjects. We start with nonlinear systems, equilibria, linearization, and existence of trajectories. We then develop Lyapunov stability theory, including direct methods, invariant sets, and converse results.
The middle part of the course studies feedback design. Linearization-based methods give local designs, while feedback linearization, backstepping, control Lyapunov functions, and input-to-state stability give tools for structured nonlinear systems and robustness questions.
The final part turns to optimal control. Pontryagin's maximum principle gives necessary conditions in terms of state and costate equations. Dynamic programming introduces value functions and Hamilton-Jacobi-Bellman equations, including viscosity solutions when classical differentiability fails. Model predictive control closes the course by showing how finite-horizon optimization can be used as an implementable nonlinear feedback method.
[remark: Relation To The First Control Course]
Linear systems are not discarded in this course. They remain the local model, the computational approximation, and the source of many design templates. The main change is that every linear argument must now be checked against the nonlinear dynamics, the region of validity, and the constraints of the control problem.
[/remark]
By the end of the course, a successful analysis should answer four questions. What are the relevant trajectories and equilibria? What stability or performance property is required? Which Lyapunov, geometric, or optimization principle certifies it? How can the resulting controller be implemented under constraints?
The opening chapter has now posed the organizing questions for the course: what trajectories matter, which properties must be certified, and what tools can justify a controller under constraints. The next chapter begins the technical part of that program by defining nonlinear control systems, equilibria, and invariant behaviour in the state-space language that will be used throughout the course.
# 1. Nonlinear Control Systems and Equilibria
Nonlinear control begins with the same state-space viewpoint as linear control, but the vector field is no longer a matrix and many global conclusions disappear. This chapter sets up the objects that will be used throughout the course: controlled trajectories, admissible inputs, equilibria, invariant sets, and linear approximations near steady behaviour. The emphasis is on separating what is intrinsic to the nonlinear system from what is an artefact of a chosen input, coordinate system, or local approximation.
## Control-Affine Systems and Controlled Trajectories
The first question is what data are needed to specify a nonlinear controlled motion. In linear systems the equation $\dot{x}=Ax+Bu$ hides several choices: the class of inputs, the time interval on which solutions are considered, and the regularity needed to make the differential equation meaningful. For nonlinear systems these choices have to be stated explicitly.
[definition: Controlled Dynamical System]
Let $X\subseteq \mathbb R^n$ be a state space and let $U\subseteq \mathbb R^m$ be an input set. A controlled dynamical system on $X$ with inputs in $U$ is an equation
\begin{align*}
\dot{x}(t)=f(x(t),u(t)),
\end{align*}
where $f:X\times U\to \mathbb R^n$ is a vector field and $u:I\to U$ is an input signal on an interval $I\subseteq \mathbb R$.
[/definition]
The definition records the ingredients but does not yet give a solvable problem, because not every time-dependent input should be allowed. We need a class of controls broad enough to include switching and saturation, while still making the right-hand side integrable in time.
[definition: Admissible Control]
Let $I\subseteq \mathbb R$ be an interval and $U\subseteq \mathbb R^m$. An admissible control is a measurable map $u:I\to U$ that is locally essentially bounded.
[/definition]
Admissible controls may have jumps, so the differential equation should not be read as a pointwise classical equation at every time. We need a trajectory concept that survives such inputs and still records the initial condition and accumulated motion.
[definition: Controlled Trajectory]
Let $u:I\to U$ be an admissible control and let $t_0\in I$. A controlled trajectory starting from $x_0\in X$ at time $t_0$ is an absolutely continuous map $x:I\to X$ satisfying $x(t_0)=x_0$ and
\begin{align*}
x(t)=x_0+\int_{t_0}^{t} f(x(s),u(s))\,ds
\end{align*}
for all $t\in I$.
[/definition]
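To see how the integral equation handles a discontinuous input, the following Python sketch approximates it by a left Riemann sum (forward Euler) for the scalar integrator $\dot{x}=u$ with a bang-bang control. The system, the switching time, and the function names are illustrative assumptions.

```python
def switching_control(t):
    """Bang-bang input with a jump at t = 1 (an illustrative choice)."""
    return 1.0 if t < 1.0 else -1.0


def trajectory_by_quadrature(f, u, x0, t0, dt, steps):
    """Approximate x(t) = x0 + integral of f(x(s), u(s)) ds by a left Riemann sum.

    This is the forward Euler scheme, written to mirror the integral equation.
    """
    ts, xs = [t0], [x0]
    for k in range(steps):
        t, x = ts[-1], xs[-1]
        xs.append(x + dt * f(x, u(t)))
        ts.append(t0 + (k + 1) * dt)
    return ts, xs


def integrator(x, u):
    """Right-hand side of the scalar integrator x' = u."""
    return u


dt = 2.0 ** -7      # a power of two keeps the time grid exact in floating point
ts, xs = trajectory_by_quadrature(integrator, switching_control, 0.0, 0.0, dt, 256)
print("x(1) =", xs[128], " x(2) =", xs[256])
```

The input jumps at $t=1$, yet the computed state rises to $1$ and returns to $0$ without any jump: the trajectory is continuous with a corner at the switching time, which is exactly what the integral equation permits.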
This integral formulation makes feedback and open-loop inputs fit into the same notation, but it does not yet ensure that a trajectory exists or is unique. We need an existence theorem that states exactly which regularity assumptions make the controlled model well posed.
[quotetheorem:7601]
[proofunderconstruction:7601]
The hypotheses are close to minimal for the Picard iteration strategy. Continuity in the right-hand side ensures the integral equation is meaningful, while Lipschitz dependence in $x$ is what prevents branching of solutions; for instance, scalar equations such as $\dot{x}=\sqrt{|x|}$ can have non-unique solutions from $x_0=0$. The compact essential-range condition on the input is needed because a $U$-valued input may approach the boundary of an open input set on every short interval, destroying uniform estimates on the vector field. The theorem is also local: it does not prevent finite-time escape, and it gives no global stability, boundedness, or controllability conclusion.
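Both failure modes can be made concrete with explicit formulas. The Python sketch below checks that $x(t)\equiv 0$ and $x(t)=t^2/4$ both solve $\dot{x}=\sqrt{|x|}$ from $x_0=0$, and that $x(t)=1/(1-t)$ solves $\dot{x}=x^2$ with $x(0)=1$ on $[0,1)$, so it escapes at $t=1$. The second equation and the helper `residual` are illustrative choices, not taken from the notes.

```python
import math


def residual(x, x_dot, rhs, ts):
    """Largest |x'(t) - rhs(x(t))| over the sample times ts."""
    return max(abs(x_dot(t) - rhs(x(t))) for t in ts)


def sqrt_rhs(x):
    """Right-hand side of x' = sqrt(|x|), which is not Lipschitz at x = 0."""
    return math.sqrt(abs(x))


ts = [0.01 * k for k in range(301)]          # sample times in [0, 3]

# Two different solutions of x' = sqrt(|x|) with x(0) = 0.
res_zero = residual(lambda t: 0.0, lambda t: 0.0, sqrt_rhs, ts)
res_parabola = residual(lambda t: t * t / 4.0, lambda t: t / 2.0, sqrt_rhs, ts)

# Finite escape: x(t) = 1/(1 - t) solves x' = x^2 with x(0) = 1 on [0, 1).
ts_escape = [0.01 * k for k in range(100)]   # stays strictly below t = 1
res_escape = residual(
    lambda t: 1.0 / (1.0 - t),
    lambda t: 1.0 / (1.0 - t) ** 2,
    lambda x: x * x,
    ts_escape,
)
print(res_zero, res_parabola, res_escape)
```

All three residuals vanish up to rounding, so the two parabola-type solutions branch from the same initial condition, and the third solution cannot be continued past $t=1$.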
The theorem gives local well-posedness, but its hypotheses apply to many vector fields without revealing how the input enters. We need a structural form that separates autonomous drift from controlled directions, because later controllability and feedback design use that separation.
[definition: Control Affine System]
Let $X\subseteq \mathbb R^n$ and $U\subseteq \mathbb R^m$. A control-affine system is a controlled dynamical system of the form
\begin{align*}
\dot{x}=f_0(x)+\sum_{i=1}^{m} u_i f_i(x),
\end{align*}
where $f_0,f_1,\dots,f_m:X\to \mathbb R^n$ are vector fields and $u=(u_1,\dots,u_m)\in U$.
[/definition]
The vector field $f_0$ is called the drift, and $f_i$ is the direction in which the input component $u_i$ pushes the state. This form is central because many mechanical and robotic systems have inputs that enter through forces, torques, or velocities while the state dependence remains nonlinear.
[example: Pendulum With Torque Input]
Consider a pendulum with angle $\theta$, angular velocity $\omega$, damping coefficient $b\ge 0$, gravitational acceleration $g>0$, length $\ell>0$, and torque input $u\in\mathbb R$. With state $x=(\theta,\omega)\in\mathbb R^2$, the equations are
\begin{align*}
\dot{\theta}=\omega,\qquad \dot{\omega}=-\frac{g}{\ell}\sin\theta-b\omega+u.
\end{align*}
Thus the vector field is
\begin{align*}
f((\theta,\omega),u)=\left(\omega,-\frac{g}{\ell}\sin\theta-b\omega+u\right).
\end{align*}
Define
\begin{align*}
f_0(\theta,\omega)=\left(\omega,-\frac{g}{\ell}\sin\theta-b\omega\right),\qquad f_1(\theta,\omega)=(0,1).
\end{align*}
Then
\begin{align*}
f_0(\theta,\omega)+u f_1(\theta,\omega)=\left(\omega,-\frac{g}{\ell}\sin\theta-b\omega\right)+u(0,1).
\end{align*}
Multiplying the input vector field by $u$ gives
\begin{align*}
u(0,1)=(0,u).
\end{align*}
Adding the two vectors componentwise gives
\begin{align*}
\left(\omega,-\frac{g}{\ell}\sin\theta-b\omega\right)+(0,u)=\left(\omega,-\frac{g}{\ell}\sin\theta-b\omega+u\right).
\end{align*}
Therefore
\begin{align*}
f((\theta,\omega),u)=f_0(\theta,\omega)+u f_1(\theta,\omega),
\end{align*}
so the pendulum is control-affine with drift $f_0$ and input vector field $f_1$. The example shows that the nonlinear term $\sin\theta$ belongs to the drift, while the actuator enters linearly through the constant direction $(0,1)$.
[/example]
The next useful object is the flow, which packages all initial conditions at once. For autonomous systems without controls, the flow is a family of maps indexed by time. With controls, the corresponding map also depends on the chosen input signal.
[definition: Controlled Flow Map]
Fix a class $\mathcal U$ of admissible controls. The controlled flow map is the partially defined map
\begin{align*}
\varphi:\mathcal D\to X,
\end{align*}
where $\mathcal D$ is the set of all tuples $(t,t_0,x_0,u)\in \mathbb R\times \mathbb R\times X\times\mathcal U$ for which the controlled trajectory with input $u$ and initial condition $x(t_0)=x_0$ exists at time $t$. It is defined by
\begin{align*}
\varphi(t,t_0,x_0;u)=x(t),
\end{align*}
where $x$ is that trajectory.
[/definition]
Flow notation is convenient when comparing several initial states under the same input. In feedback control, the input is no longer an externally fixed signal, so the closed-loop vector field gets its own flow.
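A small Python sketch can make the flow notation concrete. It uses the scalar system $\dot{x}=-x+u$ with a constant input, a hypothetical example chosen because its flow is known in closed form, and checks the composition property $\varphi(t_2,t_0,x_0;u)=\varphi(t_2,t_1,\varphi(t_1,t_0,x_0;u);u)$. That property is standard when trajectories are unique, but it is not stated in the definition above.

```python
import math


def flow(t, t0, x0, u_bar):
    """Closed-form flow of x' = -x + u for a constant input u_bar.

    The scalar system is a hypothetical example chosen because its
    trajectories are known explicitly.
    """
    return u_bar + (x0 - u_bar) * math.exp(-(t - t0))


x0, u_bar = 2.0, 0.5
direct = flow(3.0, 0.0, x0, u_bar)
via_midpoint = flow(3.0, 1.2, flow(1.2, 0.0, x0, u_bar), u_bar)
print(direct, via_midpoint)
```

Stopping at the intermediate time $t_1=1.2$ and restarting from the state reached there gives the same final state as flowing directly, up to rounding.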
[example: Unicycle Kinematics]
Let the unicycle state be $x=(p_1,p_2,\theta)\in \mathbb R^2\times S^1$, where $(p_1,p_2)$ is position and $\theta$ is heading. On a local chart for $S^1$, treat $\theta$ as a real coordinate. With speed input $v$ and angular velocity input $\omega$, the kinematic equations can be written as the vector equation
\begin{align*}\dot{x}=(\dot{p}_1,\dot{p}_2,\dot{\theta})=(v\cos\theta,v\sin\theta,\omega).\end{align*}
Define the drift and input vector fields by
\begin{align*}f_0(p_1,p_2,\theta)=(0,0,0),\qquad f_1(p_1,p_2,\theta)=(\cos\theta,\sin\theta,0),\qquad f_2(p_1,p_2,\theta)=(0,0,1).\end{align*}
Multiplying the first input vector field by $v$ gives
\begin{align*}v f_1(p_1,p_2,\theta)=v(\cos\theta,\sin\theta,0)=(v\cos\theta,v\sin\theta,0).\end{align*}
Multiplying the second input vector field by $\omega$ gives
\begin{align*}\omega f_2(p_1,p_2,\theta)=\omega(0,0,1)=(0,0,\omega).\end{align*}
Adding the drift and the two input terms componentwise gives
\begin{align*}f_0(p_1,p_2,\theta)+v f_1(p_1,p_2,\theta)+\omega f_2(p_1,p_2,\theta)=(v\cos\theta,v\sin\theta,\omega).\end{align*}
Therefore
\begin{align*}\dot{x}=f_0(x)+v f_1(x)+\omega f_2(x),\end{align*}
so the unicycle is control-affine on each local chart. The example shows that the forward velocity direction $(\cos\theta,\sin\theta,0)$ depends on the current heading, while the turning direction $(0,0,1)$ is independent of position and heading.
[/example]
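The control-affine form translates directly into code. The Python sketch below assembles the right-hand side from $f_1$ and $f_2$ and integrates it with a classical Runge-Kutta step under constant inputs. The step counts, input values, and function names are illustrative assumptions, and the heading is kept as an unwrapped real coordinate.

```python
import math


def unicycle_rhs(x, v, omega):
    """Control-affine right-hand side f_0 + v f_1 + omega f_2 with f_0 = 0."""
    _, _, theta = x
    f1 = (math.cos(theta), math.sin(theta), 0.0)
    f2 = (0.0, 0.0, 1.0)
    return tuple(v * a + omega * b for a, b in zip(f1, f2))


def rk4_step(x, v, omega, dt):
    """One classical Runge-Kutta step with the inputs held constant."""
    def add(base, k, scale):
        return tuple(bi + scale * ki for bi, ki in zip(base, k))
    k1 = unicycle_rhs(x, v, omega)
    k2 = unicycle_rhs(add(x, k1, dt / 2), v, omega)
    k3 = unicycle_rhs(add(x, k2, dt / 2), v, omega)
    k4 = unicycle_rhs(add(x, k3, dt), v, omega)
    return tuple(
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def simulate(x0, v, omega, duration, steps):
    """Integrate from x0 over the given duration with constant inputs."""
    x, dt = x0, duration / steps
    for _ in range(steps):
        x = rk4_step(x, v, omega, dt)
    return x


straight = simulate((0.0, 0.0, 0.0), 1.0, 0.0, 2.0, 200)
circle = simulate((0.0, 0.0, 0.0), 1.0, 1.0, 2.0 * math.pi, 2000)
print("straight:", straight)
print("circle:  ", circle)
```

With $\omega=0$ the vehicle moves along its heading, and with $v=\omega=1$ it traces a unit circle and returns to its starting position after time $2\pi$, up to integration error.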
## Equilibria, Invariant Sets, and Linearization
The next question is how to describe steady behaviour in a controlled system. In an uncontrolled ODE an equilibrium is a point where the vector field vanishes. In a controlled system, vanishing may depend on whether the input is fixed, free to choose, or determined by feedback.
[definition: Equilibrium For A Fixed Input]
Let $\dot{x}=f(x,u)$ be a controlled dynamical system and fix $\bar{u}\in U$. A point $x^*\in X$ is an equilibrium for the constant input $\bar{u}$ if
\begin{align*}
f(x^*,\bar{u})=0.
\end{align*}
[/definition]
Freezing the input turns the model into an autonomous ODE and gives the right notion for local phase portraits. We also need a more design-oriented notion that asks whether some constant input can hold the state at rest.
[definition: Controlled Equilibrium]
A point $x^*\in X$ is a controlled equilibrium if there exists $\bar{u}\in U$ such that
\begin{align*}
f(x^*,\bar{u})=0.
\end{align*}
[/definition]
The distinction matters because a state may be maintainable with one input but not with another. The same state can also become an equilibrium after feedback even if it is not an equilibrium for zero input.
[example: Logistic Growth With Harvesting Control]
Let $x(t)\ge 0$ be a population, let $r>0$ and $K>0$, and let $u(t)\ge 0$ be a harvesting rate. For the controlled model
\begin{align*}
\dot{x}=rx\left(1-\frac{x}{K}\right)-u
\end{align*}
a fixed-input equilibrium for the constant harvest $\bar{u}$ is a number $x^*\ge 0$ such that
\begin{align*}
0=rx^*\left(1-\frac{x^*}{K}\right)-\bar{u}.
\end{align*}
Equivalently,
\begin{align*}
\bar{u}=rx^*-\frac{r}{K}(x^*)^2.
\end{align*}
Multiplying by $K/r>0$ gives
\begin{align*}
\frac{K\bar{u}}{r}=Kx^*-(x^*)^2.
\end{align*}
Moving all terms to one side gives the quadratic equation
\begin{align*}
(x^*)^2-Kx^*+\frac{K\bar{u}}{r}=0.
\end{align*}
The [quadratic formula](/theorems/1301) gives
\begin{align*}
x^*=\frac{K\pm\sqrt{K^2-4K\bar{u}/r}}{2}.
\end{align*}
Thus the number of positive equilibria is determined by the discriminant
\begin{align*}
\Delta=K^2-\frac{4K\bar{u}}{r}=K^2\left(1-\frac{4\bar{u}}{rK}\right).
\end{align*}
If $0<\bar{u}<rK/4$, then $1-4\bar{u}/(rK)>0$, so $\Delta>0$ and the two roots
\begin{align*}
x^*_{\pm}=\frac{K\pm\sqrt{K^2-4K\bar{u}/r}}{2}
\end{align*}
are distinct and positive. If $\bar{u}=rK/4$, then
\begin{align*}
\Delta=K^2-\frac{4K}{r}\cdot\frac{rK}{4}=K^2-K^2=0,
\end{align*}
so the two roots coalesce at
\begin{align*}
x^*=\frac{K}{2}.
\end{align*}
If $\bar{u}>rK/4$, then $\Delta<0$, so the quadratic equation has no real root and hence no positive fixed-input equilibrium. Changing the constant harvest therefore creates two maintainable population levels, merges them into one threshold equilibrium, or removes positive equilibria altogether.
[/example]
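The three regimes can be reproduced with a short Python function that solves the quadratic equation. The function name and the values $r=1$, $K=4$ (so that the threshold harvest is $rK/4=1$) are illustrative assumptions.

```python
import math


def harvest_equilibria(u_bar, r, K):
    """Nonnegative real roots of x^2 - K x + K u_bar / r = 0, in increasing order."""
    disc = K * K - 4.0 * K * u_bar / r
    if disc < 0.0:
        return []
    if disc == 0.0:
        # Exact comparison is fine for this illustration; with general data
        # a tolerance would be the safer choice.
        return [K / 2.0]
    root = math.sqrt(disc)
    return [x for x in ((K - root) / 2.0, (K + root) / 2.0) if x >= 0.0]


r, K = 1.0, 4.0                       # illustrative values; threshold r*K/4 = 1
for u_bar in (0.75, 1.0, 2.0):
    print(u_bar, harvest_equilibria(u_bar, r, K))
```

For these values the harvest $\bar u=0.75$ gives the two equilibria $1$ and $3$, the threshold $\bar u=1$ gives the single equilibrium $K/2=2$, and $\bar u=2$ gives none.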
Equilibria describe states that do not move, but nonlinear systems often confine motion to a curve, surface, or physically meaningful region. We need a set-based notion that records when trajectories cannot escape in forward time.
[definition: Positively Invariant Set]
Let $\dot{x}=f(x,u)$ be a controlled dynamical system and fix an admissible control $u$. A set $M\subseteq X$ is positively invariant for $u$ if every trajectory with $x(t_0)\in M$ satisfies $x(t)\in M$ for all $t\ge t_0$ for which the trajectory is defined.
[/definition]
Positive invariance is a forward-time notion. It is weaker than saying the vector field vanishes: motion may occur inside the set, but the dynamics do not cross its boundary outward.
[example: Nonnegative Population States]
Consider the scalar vector field
\begin{align*}
F(x,u)=rx\left(1-\frac{x}{K}\right)-u.
\end{align*}
Assume that along states with $0\le x(t)\le K$ the control satisfies
\begin{align*}
0\le u(t)\le rx(t)\left(1-\frac{x(t)}{K}\right).
\end{align*}
If $x(t)\in[0,K]$, then the upper bound gives
\begin{align*}
F(x(t),u(t))=rx(t)\left(1-\frac{x(t)}{K}\right)-u(t)\ge 0.
\end{align*}
At the lower endpoint $x=0$,
\begin{align*}
r\cdot 0\left(1-\frac{0}{K}\right)=0,
\end{align*}
so the constraint becomes $0\le u(t)\le 0$, hence $u(t)=0$ and
\begin{align*}
F(0,u(t))=0-u(t)=0.
\end{align*}
At the upper endpoint $x=K$,
\begin{align*}
rK\left(1-\frac{K}{K}\right)=rK(1-1)=0,
\end{align*}
so again $u(t)=0$ and
\begin{align*}
F(K,u(t))=0-u(t)=0\le 0.
\end{align*}
Thus the vector field does not point below $0$ at the lower boundary and does not point above $K$ at the upper boundary. Therefore a controlled trajectory starting in $[0,K]$ cannot cross either endpoint while it is defined, so $[0,K]$ is positively invariant for this constrained input class.
[/example]
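The invariance claim can be illustrated with a short simulation. The sketch below assumes one particular admissible feedback, $u=\alpha\,rx(1-x/K)$ with $0\le\alpha\le 1$, and uses a forward-Euler discretization; the name `simulate_harvest`, the parameter values, and the step size are all illustrative assumptions, and the discrete trajectory only approximates the continuous one.

```python
def simulate_harvest(x0, r=1.0, K=4.0, alpha=0.5, h=0.01, steps=2000):
    """Forward-Euler trajectory of x' = r x (1 - x/K) - u with u = alpha * growth."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        growth = r * x * (1.0 - x / K)
        u = alpha * growth            # admissible on [0, K]: 0 <= u <= growth
        x = x + h * (growth - u)      # Euler step of x' = F(x, u)
        xs.append(x)
    return xs

trajectory = simulate_harvest(0.5)
print(min(trajectory), max(trajectory))   # both values should lie in [0, K]
```

Starting at either endpoint, the simulated state does not move, which matches the boundary computation in the example.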
The main local tool near an equilibrium is linearization. It replaces the nonlinear vector field by its first derivative at the equilibrium and asks whether the local phase portrait is already determined by this first-order part.
[definition: Linearization At A Fixed-Input Equilibrium]
Let $f:X\times U\to \mathbb R^n$ be continuously differentiable and let $x^*$ be an equilibrium for the constant input $\bar{u}$. The linearization at $(x^*,\bar{u})$ is the linear system
\begin{align*}
\dot{y}=Ay+Bv,
\end{align*}
where
\begin{align*}
A&=J_x f_{(x^*,\bar{u})}, & B&=J_u f_{(x^*,\bar{u})},
\end{align*}
and $y=x-x^*$, $v=u-\bar{u}$.
[/definition]
Here $J_x f_{(x^*,\bar{u})}$ and $J_u f_{(x^*,\bar{u})}$ are Jacobian matrices with respect to state and input variables. If the input is frozen at $\bar{u}$, the relevant local autonomous approximation is $\dot{y}=Ay$.
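For a scalar model the Jacobians reduce to ordinary partial derivatives, which can be approximated by central differences. The sketch below applies this to the logistic harvesting field $F(x,u)=rx(1-x/K)-u$; the function names and the sample values $r=1$, $K=4$, $\bar{u}=0.75$ (for which the equilibria are $x^*=1$ and $x^*=3$) are assumptions made for illustration.

```python
def F(x, u, r=1.0, K=4.0):
    """Logistic harvesting vector field F(x, u) = r x (1 - x/K) - u."""
    return r * x * (1.0 - x / K) - u

def linearize(x_star, u_bar, eps=1e-6):
    """Central-difference approximations of A = dF/dx and B = dF/du."""
    A = (F(x_star + eps, u_bar) - F(x_star - eps, u_bar)) / (2.0 * eps)
    B = (F(x_star, u_bar + eps) - F(x_star, u_bar - eps)) / (2.0 * eps)
    return A, B

for x_star in (1.0, 3.0):
    print(x_star, linearize(x_star, 0.75))
```

Analytically $A=r(1-2x^*/K)$ and $B=-1$, so for these values $A$ is positive at the lower equilibrium and negative at the upper one.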
[definition: Hyperbolic Equilibrium]
Let $\dot{x}=F(x)$ be a continuously differentiable autonomous system and let $x^*$ be an equilibrium. The equilibrium $x^*$ is hyperbolic if no eigenvalue of $JF_{x^*}$ has real part equal to $0$.
[/definition]
Hyperbolicity excludes centre directions, where first-order information does not decide whether nearby trajectories spiral, drift, or remain on nonlinear centre manifolds. We need a theorem explaining why, once centre directions are absent, the nonlinear and linear phase portraits have the same qualitative orbit structure.
[quotetheorem:2777]
The usual dynamical-systems development uses fixed-point arguments on spaces of orbit corrections and the exponential splitting into stable and unstable linear subspaces. Hyperbolicity is essential: for $\dot{x}=x^2$ at $0$, the linearization is $\dot{y}=0$, but the nonlinear system has one-sided escape behaviour that a zero linear system does not capture. The theorem also gives only a topological conjugacy, so it preserves the ordering and shape of orbits but not distances, angles, differentiability of the conjugacy, or precise exponential rates. For control design we need a more concrete stability test, so the next result translates the eigenvalues of the linearization into local stability information.
[quotetheorem:6852]
[citeproof:6852]
This theorem is the first bridge from nonlinear control back to linear systems theory. It justifies using eigenvalue placement as a local design goal after feedback has produced a desired equilibrium, while warning that non-hyperbolic cases require tools beyond linearization. The warning is real: $\dot{x}=-x^3$ has zero linearization at $0$ but is asymptotically stable, $\dot{x}=x^3$ has the same zero linearization but is unstable, and $\dot{x}=0$ is stable without attraction. Thus eigenvalues with zero real part are not a small technical inconvenience; they mark exactly the cases where higher-order nonlinear terms can decide the local behaviour.
[example: Pendulum Phase Portrait Near Downward Equilibrium]
For the damped pendulum with zero torque, the vector field is
\begin{align*}
F(\theta,\omega)=\left(\omega,-\frac{g}{\ell}\sin\theta-b\omega\right).
\end{align*}
At $(0,0)$,
\begin{align*}
F(0,0)=\left(0,-\frac{g}{\ell}\sin 0-b\cdot 0\right)=(0,0),
\end{align*}
so $(0,0)$ is an equilibrium. The state derivatives are
\begin{align*}
\frac{\partial F_1}{\partial \theta}=0,\quad \frac{\partial F_1}{\partial \omega}=1,\quad \frac{\partial F_2}{\partial \theta}=-\frac{g}{\ell}\cos\theta,\quad \frac{\partial F_2}{\partial \omega}=-b.
\end{align*}
Evaluating at $(0,0)$ and using $\cos 0=1$, the linearization has entries
\begin{align*}
A_{11}=0,\quad A_{12}=1,\quad A_{21}=-\frac{g}{\ell},\quad A_{22}=-b.
\end{align*}
For a $2\times 2$ matrix with entries $A_{11},A_{12},A_{21},A_{22}$, the characteristic polynomial is
\begin{align*}
\det(\lambda I-A)=(\lambda-A_{11})(\lambda-A_{22})-A_{12}A_{21}.
\end{align*}
Substituting the four entries gives
\begin{align*}
\det(\lambda I-A)=(\lambda-0)(\lambda-(-b))-1\left(-\frac{g}{\ell}\right).
\end{align*}
Thus
\begin{align*}
\det(\lambda I-A)=\lambda(\lambda+b)+\frac{g}{\ell}=\lambda^2+b\lambda+\frac{g}{\ell}.
\end{align*}
The quadratic formula gives
\begin{align*}
\lambda_{\pm}=\frac{-b\pm\sqrt{b^2-4g/\ell}}{2}.
\end{align*}
If $b^2<4g/\ell$, then the square root is imaginary and both eigenvalues have real part $-b/2<0$. If $b^2=4g/\ell$, then both eigenvalues equal $-b/2<0$. If $b^2>4g/\ell$, then $0<\sqrt{b^2-4g/\ell}<b$, because $4g/\ell>0$. Therefore
\begin{align*}
-b+\sqrt{b^2-4g/\ell}<0
\end{align*}
and
\begin{align*}
-b-\sqrt{b^2-4g/\ell}<0.
\end{align*}
In every case with $b>0$, all eigenvalues of the linearization have negative real part, so the *Linearization Test For Hyperbolic Equilibria* implies that the downward equilibrium is locally asymptotically stable. The local phase portrait is a spiral when $b^2<4g/\ell$, a repeated-node case when $b^2=4g/\ell$, and a node when $b^2>4g/\ell$.
[/example]
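The eigenvalue computation above is easy to reproduce for concrete parameters. The following sketch uses hypothetical helper names (`pendulum_eigenvalues`, `classify`) and the assumed value $g/\ell=1$, so that the threshold between spiral and node is $b=2$.

```python
import cmath

def pendulum_eigenvalues(g_over_l, b):
    """Roots of lambda^2 + b*lambda + g/l = 0 at the downward equilibrium."""
    root = cmath.sqrt(b * b - 4.0 * g_over_l)   # complex square root handles all cases
    return (-b + root) / 2.0, (-b - root) / 2.0

def classify(g_over_l, b):
    """Local portrait type according to the sign of b^2 - 4 g/l."""
    disc = b * b - 4.0 * g_over_l
    if disc < 0:
        return "spiral"
    if disc == 0:
        return "repeated node"
    return "node"

for b in (1.0, 2.0, 3.0):   # assumed damping values with g/l = 1
    lam_plus, lam_minus = pendulum_eigenvalues(1.0, b)
    print(b, classify(1.0, b), lam_plus, lam_minus)
```

For every positive damping value tried, both printed eigenvalues have negative real part, in agreement with the case analysis.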
## Input-To-State Viewpoints and Forward Completeness
The final question in this chapter is how to discuss solutions beyond a local time interval when inputs are present. Stability and optimal control both require trajectories over long horizons, so a local existence theorem must be supplemented by hypotheses that prevent finite-time escape.
[definition: Forward Completeness]
A controlled dynamical system $\dot{x}=f(x,u)$ is forward complete for a class $\mathcal U$ of admissible controls if, for every $u\in\mathcal U$, every initial time $t_0$, and every initial state $x_0\in X$, the corresponding maximal trajectory exists for all $t\ge t_0$.
[/definition]
Forward completeness is a property of the system together with the input class, but verifying it directly from maximal solutions is often difficult. We need a usable sufficient condition that turns finite-time escape into an estimate on the size of the vector field.
[quotetheorem:7602]
[citeproof:7602]
This result is frequently used as a model check: polynomial vector fields of degree greater than one fail the linear growth condition globally, while saturated or damped systems often satisfy it on physically relevant regions. The scalar equation $\dot{x}=x^2$ with $x(0)>0$ shows what can go wrong without a growth bound, since its solution blows up at the finite time $t=1/x(0)$. The theorem is only a sufficient condition for existence over long horizons; it does not imply boundedness, stability, or convergence unless additional estimates are available.
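The finite escape time of $\dot{x}=x^2$ can be made explicit. Separating variables gives $x(t)=x_0/(1-x_0t)$ for $x(0)=x_0>0$, valid on $0\le t<1/x_0$. The sketch below evaluates this formula; the function names and the sample value $x_0=2$ are illustrative assumptions.

```python
def escape_time(x0):
    """Blow-up time of x' = x^2 with x(0) = x0 > 0."""
    return 1.0 / x0

def blowup_solution(x0, t):
    """Closed-form solution x(t) = x0 / (1 - x0 t) on 0 <= t < 1/x0."""
    if not 0.0 <= t < escape_time(x0):
        raise ValueError("t must lie in [0, 1/x0)")
    return x0 / (1.0 - x0 * t)

x0 = 2.0   # assumed initial state; the escape time is then 0.5
for t in (0.0, 0.25, 0.4, 0.49, 0.499):
    print(t, blowup_solution(x0, t))   # values grow without bound as t -> 0.5
```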
[example: Forward Completeness Of Logistic Harvesting Under Bounded Inputs]
Consider inputs in the constrained class
\begin{align*}
0\le u(t)\le r x(t)\left(1-\frac{x(t)}{K}\right)
\end{align*}
whenever $0\le x(t)\le K$. At $x=0$ this condition forces
\begin{align*}
0\le u(t)\le r\cdot 0\left(1-\frac{0}{K}\right)=0,
\end{align*}
so $u(t)=0$ and
\begin{align*}
\dot{x}=r\cdot 0\left(1-\frac{0}{K}\right)-u(t)=0.
\end{align*}
At $x=K$ the logistic term is
\begin{align*}
rK\left(1-\frac{K}{K}\right)=rK(1-1)=0,
\end{align*}
so the same constraint gives $u(t)=0$ and hence
\begin{align*}
\dot{x}=0-u(t)=0.
\end{align*}
For $0<x<K$, the upper bound on $u(t)$ gives
\begin{align*}
\dot{x}=rx(t)\left(1-\frac{x(t)}{K}\right)-u(t)\ge 0.
\end{align*}
Thus a trajectory starting in $[0,K]$ cannot cross below $0$, and at the upper endpoint the vector field is not pointing out of $[0,K]$; hence the trajectory remains in the compact interval $[0,K]$ for as long as it is defined.
On any finite time interval, admissibility gives a local essential bound $|u(t)|\le M$. For $x\in[0,K]$,
\begin{align*}
0\le x\le K
\end{align*}
implies
\begin{align*}
0\le 1-\frac{x}{K}\le 1,
\end{align*}
so
\begin{align*}
0\le rx\left(1-\frac{x}{K}\right)\le rK.
\end{align*}
Therefore
\begin{align*}
\left|rx\left(1-\frac{x}{K}\right)-u(t)\right|\le rK+M.
\end{align*}
The right-hand side is bounded on the compact state interval and on each compact time interval, so the local existence theorem *[Picard Lindelof](/theorems/69) Theorem For Controlled Odes* can be restarted at every finite time. No finite-time escape is possible, and the logistic harvesting model is forward complete on $[0,K]$ for this constrained input class.
This conclusion does not come from global linear growth on all of $\mathbb R$. With $u=0$ and $x\ge K$,
\begin{align*}
\left|rx\left(1-\frac{x}{K}\right)\right|=rx\left(\frac{x}{K}-1\right)=\frac{r}{K}x^2-rx.
\end{align*}
If constants $a,b\ge 0$ satisfied $\left|rx(1-x/K)\right|\le a+b|x|$ for all $x\in\mathbb R$, then for every $x\ge K$ we would have
\begin{align*}
\frac{r}{K}x^2-rx\le a+bx.
\end{align*}
Equivalently,
\begin{align*}
\frac{r}{K}x^2-(r+b)x-a\le 0.
\end{align*}
The left-hand side is a quadratic polynomial with positive leading coefficient $r/K>0$, so it becomes positive for sufficiently large $x$, contradicting the inequality. The compact invariant interval, not a global linear-growth bound, is what gives forward completeness here.
[/example]
Input-to-state language treats the control signal as a disturbance or command whose size should bound the size of the state. This viewpoint prepares for Lyapunov stability, where we will estimate trajectories using scalar comparison inequalities.
[definition: Input To State Estimate]
Let $\dot{x}=f(x,u)$ have an equilibrium at $x=0$ for $u=0$. An input-to-state estimate consists of functions $\beta:[0,\infty)\times[0,\infty)\to[0,\infty)$ and $\gamma:[0,\infty)\to[0,\infty)$ such that $\beta$ is continuous, increasing in its first variable, satisfies $\beta(r,s)\to 0$ as $s\to\infty$ for each fixed $r$, and $\gamma$ is continuous, nondecreasing, and satisfies $\gamma(0)=0$. For every admissible input $u$ and every corresponding trajectory,
\begin{align*}
|x(t)|\le \beta(|x(t_0)|,t-t_0)+\gamma(\|u\|_{L^\infty([t_0,t])})
\end{align*}
for all $t\ge t_0$ for which the trajectory is defined.
[/definition]
The term involving $\beta$ describes decay from the initial condition when the input is absent, while the term involving $\gamma$ describes the ultimate size forced by the input. The definition is stated here as an estimate rather than as a full stability property; the Lyapunov version is developed in Chapter 11 after the stability machinery of Chapter 2 is in place.
[example: Stable Scalar System With Additive Input]
Consider $\dot{x}=-x+u$ on $\mathbb R$, with $u\in L^\infty_{\mathrm{loc}}([0,\infty))$, and fix $t\ge t_0$. Multiplying the differential equation by the integrating factor $e^t$ gives
\begin{align*}
e^t\dot{x}(t)+e^t x(t)=e^t u(t).
\end{align*}
By the product rule,
\begin{align*}
\frac{d}{dt}\bigl(e^t x(t)\bigr)=e^t\dot{x}(t)+e^t x(t),
\end{align*}
so
\begin{align*}
\frac{d}{dt}\bigl(e^t x(t)\bigr)=e^t u(t).
\end{align*}
Integrating from $t_0$ to $t$ gives
\begin{align*}
e^t x(t)-e^{t_0}x(t_0)=\int_{t_0}^{t} e^s u(s)\,ds.
\end{align*}
Multiplying by $e^{-t}$ yields
\begin{align*}
x(t)=e^{-(t-t_0)}x(t_0)+\int_{t_0}^{t} e^{-(t-s)}u(s)\,ds.
\end{align*}
Taking absolute values and using the triangle inequality gives
\begin{align*}
|x(t)|\le e^{-(t-t_0)}|x(t_0)|+\int_{t_0}^{t} e^{-(t-s)}|u(s)|\,ds.
\end{align*}
For almost every $s\in[t_0,t]$,
\begin{align*}
|u(s)|\le \|u\|_{L^\infty([t_0,t])},
\end{align*}
so
\begin{align*}
\int_{t_0}^{t} e^{-(t-s)}|u(s)|\,ds\le \|u\|_{L^\infty([t_0,t])}\int_{t_0}^{t} e^{-(t-s)}\,ds.
\end{align*}
Since
\begin{align*}
\int_{t_0}^{t} e^{-(t-s)}\,ds=1-e^{-(t-t_0)}\le 1,
\end{align*}
we obtain
\begin{align*}
|x(t)|\le e^{-(t-t_0)}|x(t_0)|+\|u\|_{L^\infty([t_0,t])}.
\end{align*}
Thus the estimate has the input-to-state form with $\beta(r,s)=e^{-s}r$ and $\gamma(r)=r$: the initial condition decays exponentially, while the input contributes at most its essential supremum over the time interval.
[/example]
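The estimate can be checked on a sampled trajectory. The sketch below assumes an input that is held constant on each step of length $h$, for which the variation-of-constants formula gives the exact update $x_{k+1}=e^{-h}x_k+(1-e^{-h})u_k$. The function name `simulate_scalar`, the test input $\sin(3t)$, and the initial state are illustrative assumptions. The check uses the supremum of $|u|$ over the whole horizon, which is at least as large as the supremum over $[t_0,t]$, so it tests a slightly weaker bound.

```python
import math

def simulate_scalar(x0, inputs, h):
    """Exact samples of x' = -x + u for u held constant on each step of length h."""
    decay = math.exp(-h)
    xs = [x0]
    for u in inputs:
        xs.append(decay * xs[-1] + (1.0 - decay) * u)   # variation of constants
    return xs

h = 0.01
inputs = [math.sin(3.0 * k * h) for k in range(1000)]   # assumed bounded test input
xs = simulate_scalar(5.0, inputs, h)
u_sup = max(abs(u) for u in inputs)

# Gap between |x(t_k)| and e^{-t_k}|x0| + sup|u| at grid times t_k = k*h (t0 = 0).
worst_gap = max(abs(x) - (math.exp(-k * h) * 5.0 + u_sup) for k, x in enumerate(xs))
print(worst_gap)   # non-positive when the estimate holds on the grid
```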
The chapter has now fixed the basic language of nonlinear control systems: equilibria, invariant sets, linearization, forward completeness, and input-to-state estimates. The next step is to ask when these objects behave predictably. Chapter 2 develops Lyapunov theory as the main stability language, showing how stability can be proved without solving the nonlinear dynamics explicitly, even when linearization is inconclusive.
# Lyapunov Stability Theory
This chapter develops Lyapunov theory as the main stability language for nonlinear control systems. Chapter 1 introduced equilibria, positively invariant sets, controlled trajectories, and linearization; here the aim is to decide whether an equilibrium is stable without solving the nonlinear differential equation. The central idea is to replace explicit trajectories by scalar functions that decrease along trajectories, much as mechanical energy decreases in a damped system.
## Stability Notions and Regions of Attraction
The first problem is to say what it means for an equilibrium to be stable in a nonlinear system. Linear systems suggest a spectral classification, but nonlinear systems may be stable only near the equilibrium and may have several competing attractors. We therefore distinguish staying near, converging to, and converging at a uniform exponential rate.
[definition: Equilibrium Of An Autonomous System]
Let $f: D \to \mathbb R^n$ be locally Lipschitz on an [open set](/page/Open%20Set) $D \subset \mathbb R^n$. A point $x^* \in D$ is an equilibrium of the autonomous system $\dot{x} = f(x)$ if $f(x^*) = 0$.
[/definition]
After a change of variables $y = x - x^*$, the equilibrium is moved to the origin. This convention lets us ask the first stability question: if the initial condition starts close to the equilibrium, does the whole future trajectory remain close to it?
[definition: Lyapunov Stability]
Let $0 \in D$ be an equilibrium of $\dot{x} = f(x)$. The equilibrium $0$ is Lyapunov stable if for every $\varepsilon > 0$ there exists $\delta > 0$ such that, whenever $|x_0| < \delta$, the solution $x(t;x_0)$ exists and satisfies $|x(t;x_0)| < \varepsilon$ for all $t \ge 0$.
[/definition]
Stability prevents escape from a neighbourhood, but it does not require convergence. A centre for the undamped pendulum is stable in this sense: nearby trajectories remain nearby, while typically circling the equilibrium. For control, the next question is whether the effect of the initial perturbation actually disappears as $t \to \infty$.
[definition: Asymptotic Stability]
Let $0 \in D$ be an equilibrium of $\dot{x} = f(x)$. The equilibrium $0$ is asymptotically stable if it is Lyapunov stable and there exists $r > 0$ such that $|x_0| < r$ implies
\begin{align*}
\lim_{t \to \infty} x(t;x_0) = 0.
\end{align*}
[/definition]
Asymptotic stability says that small disturbances decay, but it gives no quantitative rate. Rates matter in robustness estimates, tracking design, and sampled-data implementations, so we next ask for a uniform exponential envelope on the decay.
[definition: Exponential Stability]
Let $0 \in D$ be an equilibrium of $\dot{x} = f(x)$. The equilibrium $0$ is exponentially stable if there exist constants $M \ge 1$, $\alpha > 0$, and $r > 0$ such that, whenever $|x_0| < r$, the solution satisfies
\begin{align*}
|x(t;x_0)| \le M e^{-\alpha t}|x_0| \qquad \text{for all } t \ge 0.
\end{align*}
[/definition]
Exponential stability implies asymptotic stability, but many nonlinear equilibria converge at slower rates. The following scalar system separates the qualitative and quantitative notions.
[example: Algebraic Decay In A Scalar Nonlinear System]
Consider $\dot{x}=-x^3$ on $\mathbb R$. If $x_0=0$, then $x(t;0)=0$ for all $t\ge 0$. For $x_0\ne 0$, the solution cannot cross $0$ by uniqueness, so on its trajectory we may differentiate $x(t)^{-2}$:
\begin{align*}
\frac{d}{dt}\bigl(x(t)^{-2}\bigr)=-2x(t)^{-3}\dot{x}(t).
\end{align*}
Substituting $\dot{x}(t)=-x(t)^3$ gives
\begin{align*}
\frac{d}{dt}\bigl(x(t)^{-2}\bigr)=2.
\end{align*}
Integrating from $0$ to $t$ gives
\begin{align*}
x(t)^{-2}=x_0^{-2}+2t.
\end{align*}
Since the sign of $x(t)$ is the same as the sign of $x_0$, this is equivalent to
\begin{align*}
x(t;x_0)=\frac{x_0}{\sqrt{1+2tx_0^2}}.
\end{align*}
Thus
\begin{align*}
|x(t;x_0)|=\frac{|x_0|}{\sqrt{1+2tx_0^2}}\le |x_0|
\end{align*}
for every $t\ge 0$, which gives Lyapunov stability. Also,
\begin{align*}
\lim_{t\to\infty}\frac{|x_0|}{\sqrt{1+2tx_0^2}}=0,
\end{align*}
so every trajectory converges to the origin. Hence the origin is asymptotically stable.
It is not exponentially stable. If exponential stability held, then for some $M\ge 1$, $\alpha>0$, and $r>0$, every $0<|x_0|<r$ would satisfy
\begin{align*}
\frac{|x(t;x_0)|}{|x_0|}\le Me^{-\alpha t}
\end{align*}
for all $t\ge 0$. Using the explicit solution, this would mean
\begin{align*}
(1+2tx_0^2)^{-1/2}\le Me^{-\alpha t}.
\end{align*}
Multiplying by $e^{\alpha t}$ and then squaring gives
\begin{align*}
e^{2\alpha t}\le M^2(1+2tx_0^2).
\end{align*}
The left-hand side grows exponentially in $t$, while the right-hand side is linear in $t$, so the inequality fails for sufficiently large $t$. The origin therefore attracts nearby trajectories only at an algebraic rate, not at any uniform exponential rate.
[/example]
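The failure of any exponential envelope can be observed numerically from the closed-form solution. The sketch below scans a time grid for the first instant at which $|x(t;x_0)|>Me^{-\alpha t}|x_0|$; the function names, the grid spacing, and the sample constants $M=10$, $\alpha=0.1$, $x_0=0.5$ are assumptions chosen for illustration.

```python
import math

def cubic_solution(x0, t):
    """Closed-form solution of x' = -x^3 with x(0) = x0."""
    return x0 / math.sqrt(1.0 + 2.0 * t * x0 * x0)

def first_violation(x0, M, alpha, dt=0.1, t_max=1000.0):
    """First grid time with |x(t)| > M e^{-alpha t} |x0|, or None if none is found."""
    t = 0.0
    while t <= t_max:
        if abs(cubic_solution(x0, t)) > M * math.exp(-alpha * t) * abs(x0):
            return t
        t += dt
    return None

print(first_violation(0.5, M=10.0, alpha=0.1))
```

Increasing $M$ or decreasing $\alpha$ delays the violation but does not remove it, which is the content of the argument above.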
As the zero-linearization examples in Chapter 1 already suggested, local convergence need not carry a single universal rate. A different nonlinear issue is spatial rather than temporal: even when convergence holds near the equilibrium, other initial conditions may approach different attractors or escape. This motivates recording the full set of initial states that converge to the chosen equilibrium.
[definition: Region Of Attraction]
Let $0 \in D$ be an asymptotically stable equilibrium of $\dot{x} = f(x)$. The region of attraction of $0$ is
\begin{align*}
\mathcal A(0) = \{x_0 \in D : x(t;x_0) \text{ exists for all } t \ge 0 \text{ and } x(t;x_0) \to 0 \text{ as } t \to \infty\}.
\end{align*}
[/definition]
The region of attraction is usually hard to compute exactly. The phase line example below shows why basin boundaries are a genuine nonlinear feature rather than a technical detail.
[example: Two Stable Equilibria In One Dimension]
Consider the scalar autonomous system
\begin{align*}
\dot{x}=x-x^3.
\end{align*}
The equilibria are the roots of the right-hand side:
\begin{align*}
x-x^3=x(1-x^2)=x(1-x)(1+x),
\end{align*}
so the equilibria are $x=-1$, $x=0$, and $x=1$.
The sign of the vector field determines the phase line. If $0<x<1$, then $x>0$, $1-x>0$, and $1+x>0$, so
\begin{align*}
x-x^3=x(1-x)(1+x)>0.
\end{align*}
If $x>1$, then $x>0$, $1-x<0$, and $1+x>0$, so
\begin{align*}
x-x^3=x(1-x)(1+x)<0.
\end{align*}
Thus solutions starting with $x_0>0$ move toward $1$: from $(0,1)$ they increase, and from $(1,\infty)$ they decrease. Since a solution cannot cross an equilibrium by uniqueness, the intervals $(0,1)$ and $(1,\infty)$ are forward invariant. In either case the trajectory is monotone and bounded, hence has a limit $L\ge 0$. Passing to the limit in the autonomous equation forces
\begin{align*}
0=L-L^3=L(1-L)(1+L),
\end{align*}
and the only possible positive limit compatible with the invariant intervals is $L=1$. Therefore every solution with $x_0>0$ converges to $1$.
Similarly, if $-1<x<0$, then $x<0$, $1-x>0$, and $1+x>0$, so
\begin{align*}
x-x^3=x(1-x)(1+x)<0.
\end{align*}
If $x<-1$, then $x<0$, $1-x>0$, and $1+x<0$, so
\begin{align*}
x-x^3=x(1-x)(1+x)>0.
\end{align*}
Hence solutions starting with $x_0<0$ move toward $-1$, and the same monotonicity-and-limit argument gives $x(t;x_0)\to -1$.
The equilibrium $0$ is unstable: for every $\delta>0$, an initial condition $x_0\in(0,\delta)$ satisfies $x(t;x_0)\to 1$, so for large $t$ it leaves the neighbourhood $|x|<1/2$ of the origin. Thus the phase line splits into two basins, $(-\infty,0)$ and $(0,\infty)$, separated by the unstable equilibrium $0$.
[/example]
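The two basins can be seen by integrating from initial conditions on either side of the origin. The sketch below uses a classical fourth-order Runge-Kutta step; the function names, step size, and horizon are illustrative assumptions, and the numerical result only approximates the true limit.

```python
def rk4_step(f, x, h):
    """One classical Runge-Kutta step for the scalar ODE x' = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def limit_estimate(x0, h=0.01, steps=3000):
    """Integrate x' = x - x^3 up to time h*steps and return the final state."""
    x = x0
    for _ in range(steps):
        x = rk4_step(lambda y: y - y ** 3, x, h)
    return x

for x0 in (-3.0, -0.1, 0.1, 3.0):
    print(x0, limit_estimate(x0))   # negative starts end near -1, positive near +1
```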
## Lyapunov Direct Method and Strict Lyapunov Functions
The main practical problem is to prove stability without integrating the ODE. The guiding observation is mechanical: if a nonnegative energy is small near equilibrium and never increases, the trajectory cannot move to a high-energy part of the state space. If the energy decreases strictly away from the equilibrium, it should force convergence.
[definition: Positive Definite Function]
Let $D \subset \mathbb R^n$ be a neighbourhood of $0$. A [continuous function](/page/Continuous%20Function) $V: D \to \mathbb R$ is positive definite on $D$ if $V(0)=0$ and $V(x)>0$ for all $x \in D \setminus \{0\}$.
[/definition]
Positive definiteness makes sublevel sets of $V$ serve as nonlinear neighbourhoods of the origin. To use these sublevel sets along trajectories, we need a derivative of $V$ in the direction of the vector field.
[definition: Orbital Derivative]
Let $f: D \to \mathbb R^n$ be a vector field and let $V: D \to \mathbb R$ be a function in $C^1(D)$. The orbital derivative of $V$ along $\dot{x}=f(x)$ is the function $\dot{V}:D \to \mathbb R$ defined by
\begin{align*}
\dot{V}(x) = \nabla V(x) \cdot f(x).
\end{align*}
[/definition]
The chain rule gives, along any solution,
\begin{align*}
\frac{d}{dt}V(x(t)) = \dot{V}(x(t)).
\end{align*}
Hence sign conditions on $\dot{V}$ become monotonicity statements for the scalar function $t \mapsto V(x(t))$. We first name the nonincreasing case, which is designed to prove that trajectories cannot leave small sublevel sets.
[definition: Lyapunov Function]
Let $f: D \to \mathbb R^n$ be locally Lipschitz on a neighbourhood $D \subset \mathbb R^n$ of $0$, and suppose $0$ is an equilibrium of $\dot{x}=f(x)$. A function $V: D \to \mathbb R$ in $C^1(D)$ is a Lyapunov function for $0$ if $V$ is positive definite and $\dot{V}(x) \le 0$ for all $x \in D$.
[/definition]
A Lyapunov function gives confinement, but convergence may fail if trajectories can move forever on a level set. This raises the question of what extra inequality rules out such persistent motion and certifies attraction rather than boundedness alone. The answer is to require strict decrease away from the equilibrium.
[definition: Strict Lyapunov Function]
Let $f: D \to \mathbb R^n$ be locally Lipschitz on a neighbourhood $D \subset \mathbb R^n$ of $0$, and suppose $0$ is an equilibrium of $\dot{x}=f(x)$. A function $V: D \to \mathbb R$ in $C^1(D)$ is a strict Lyapunov function for $0$ if $V$ is positive definite and $\dot{V}(x) < 0$ for all $x \in D \setminus \{0\}$.
[/definition]
The previous definitions were chosen so that sublevel sets behave like positively invariant neighbourhoods, in the sense of the forward-invariance language introduced in Chapter 1. The theorem below is the basic direct method: it converts the sign of a scalar orbital derivative into Lyapunov stability, and with strict decrease into asymptotic stability.
[quotetheorem:7603]
[citeproof:7603]
The hypotheses each have a distinct role. Positive definiteness is what makes small values of $V$ mean closeness to the origin; without it, a nonincreasing scalar function might trap trajectories near the wrong set. The inequality $\dot{V}\le 0$ gives stability but not attraction, as the undamped oscillator with conserved energy shows. Strict negativity away from $0$ rules out such persistent level-set motion locally, although it still gives no global basin unless the relevant sublevel sets are known to remain inside the domain. The theorem explains why Lyapunov functions are useful for the closed-loop autonomous systems introduced in Chapter 1, but it does not tell us how to find them. In mechanical examples, the total energy is often the right starting point.
[example: Damped Pendulum Energy Function]
Consider the damped pendulum with damping constant $c>0$ and state $x=(x_1,x_2)\in\mathbb R^2$:
\begin{align*}
\dot{x}_1=x_2.
\end{align*}
\begin{align*}
\dot{x}_2=-\sin x_1-cx_2.
\end{align*}
Around the downward equilibrium $(0,0)$, take
\begin{align*}
V(x_1,x_2)=1-\cos x_1+\frac{1}{2}x_2^2.
\end{align*}
On the strip $|x_1|<\pi$, we have $1-\cos x_1\ge 0$, with equality only when $x_1=0$, and $\frac{1}{2}x_2^2\ge 0$, with equality only when $x_2=0$. Hence $V(0,0)=0$ and $V(x_1,x_2)>0$ for every $(x_1,x_2)\ne(0,0)$ in this strip, so $V$ is positive definite there.
Its gradient is
\begin{align*}
\nabla V(x_1,x_2)=(\sin x_1,x_2).
\end{align*}
Therefore the orbital derivative along the pendulum dynamics is
\begin{align*}
\dot{V}(x_1,x_2)=(\sin x_1)x_2+x_2(-\sin x_1-cx_2).
\end{align*}
Expanding the second term gives
\begin{align*}
\dot{V}(x_1,x_2)=\sin x_1\,x_2-x_2\sin x_1-cx_2^2.
\end{align*}
The two mixed terms cancel, so
\begin{align*}
\dot{V}(x_1,x_2)=-c x_2^2.
\end{align*}
Since $c>0$ and $x_2^2\ge 0$, this gives
\begin{align*}
\dot{V}(x_1,x_2)\le 0.
\end{align*}
Thus $V$ is a Lyapunov function near $(0,0)$, and the *Lyapunov Stability Theorem* gives Lyapunov stability. The derivative is not strict, because $\dot{V}(x_1,0)=0$ for every $x_1$ with $|x_1|<\pi$, so this calculation alone does not prove convergence; LaSalle's principle below identifies which part of the set $x_2=0$ can actually contain a full trajectory.
[/example]
The same energy construction extends to other nonlinear restoring forces: the candidate storage function is built from the potential of the restoring force, and the calculation must check both positivity and decay.
[example: Nonlinear Mass-Spring System]
Consider the damped nonlinear oscillator with $c>0$:
\begin{align*}
\dot{x}_1 = x_2,
\end{align*}
\begin{align*}
\dot{x}_2 = -x_1 - x_1^3 - c x_2.
\end{align*}
Take the energy candidate
\begin{align*}
V(x_1,x_2)=\frac{1}{2}x_2^2 + \frac{1}{2}x_1^2 + \frac{1}{4}x_1^4.
\end{align*}
Each term in $V$ is nonnegative. If $V(x_1,x_2)=0$, then $\frac{1}{2}x_2^2=0$, $\frac{1}{2}x_1^2=0$, and $\frac{1}{4}x_1^4=0$, so $x_2=0$ and $x_1=0$. Thus $V(0,0)=0$ and $V(x_1,x_2)>0$ whenever $(x_1,x_2)\ne(0,0)$, so $V$ is positive definite. Also,
\begin{align*}
V(x_1,x_2)\ge \frac{1}{2}x_1^2+\frac{1}{2}x_2^2=\frac{1}{2}|x|^2,
\end{align*}
so $V(x)\to\infty$ as $|x|\to\infty$; hence $V$ is radially unbounded.
The partial derivatives are
\begin{align*}
\frac{\partial V}{\partial x_1}(x_1,x_2)=x_1+x_1^3.
\end{align*}
\begin{align*}
\frac{\partial V}{\partial x_2}(x_1,x_2)=x_2.
\end{align*}
Therefore the orbital derivative along the vector field is
\begin{align*}
\dot{V}(x_1,x_2)=(x_1+x_1^3)x_2+x_2(-x_1-x_1^3-cx_2).
\end{align*}
Expanding the two products gives
\begin{align*}
\dot{V}(x_1,x_2)=x_1x_2+x_1^3x_2-x_1x_2-x_1^3x_2-cx_2^2.
\end{align*}
The terms $x_1x_2$ and $-x_1x_2$ cancel, and the terms $x_1^3x_2$ and $-x_1^3x_2$ cancel, leaving
\begin{align*}
\dot{V}(x_1,x_2)=-c x_2^2.
\end{align*}
Since $c>0$ and $x_2^2\ge 0$, we have
\begin{align*}
\dot{V}(x_1,x_2)\le 0.
\end{align*}
The decrease is only semidefinite, because $\dot{V}(x_1,0)=0$ for every $x_1$, not only at the origin. Thus this energy calculation proves nonincrease of $V$ but does not by itself prove convergence; the remaining question is whether any nonzero trajectory can stay inside the zero-derivative set $x_2=0$.
[/example]
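The cancellation in the orbital derivative can be confirmed pointwise by evaluating $\nabla V\cdot f$ directly. The following is a minimal sketch; the function names, the damping value $c=0.7$, and the sample points are assumptions made for the check.

```python
def f(x1, x2, c):
    """Vector field of the damped nonlinear oscillator."""
    return x2, -x1 - x1 ** 3 - c * x2

def grad_V(x1, x2):
    """Gradient of V = x2^2/2 + x1^2/2 + x1^4/4."""
    return x1 + x1 ** 3, x2

def orbital_derivative(x1, x2, c):
    """Vdot = grad V . f, evaluated at a single point."""
    g1, g2 = grad_V(x1, x2)
    f1, f2 = f(x1, x2, c)
    return g1 * f1 + g2 * f2

c = 0.7   # assumed damping constant
for x1, x2 in [(0.3, -1.2), (2.0, 0.5), (-1.5, 0.0)]:
    print(orbital_derivative(x1, x2, c), -c * x2 ** 2)   # the two columns should agree
```

The last sample point lies on $x_2=0$ away from the origin, where the derivative vanishes even though the state is not at equilibrium.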
This raises the quantitative question of what stronger Lyapunov inequalities imply when they are available. In design problems we sometimes obtain bounds showing that $V$ is comparable to $|x|^2$ and that its decay is comparable to $-|x|^2$. Those estimates should produce not only convergence but an explicit exponential rate.
[quotetheorem:7604]
[citeproof:7604]
The quadratic comparison is stronger than positive definiteness: it says that $V$ behaves like squared distance near the equilibrium. Each inequality has a separate job. Without the lower bound, exponential decay of $V(x(t))$ need not transfer to $|x(t)|$ at the expected rate; for instance $V(x)=x^4$ may decay exponentially while $|x|$ decays at only half the exponent predicted by a quadratic comparison. Without the upper bound, $\dot{V}\le -c_3|x|^2$ does not imply a closed differential inequality of the form $\dot{V}\le -aV$. Without the derivative bound, $\dot{x}=-x^3$ with $V(x)=x^2/2$ has asymptotic stability but only algebraic decay. The local ball is also essential: estimates valid only near $0$ say nothing about a trajectory that leaves that neighbourhood before the inequalities become useful, so the proof combines decay with a trapping argument. This quantitative result raises a natural reverse question: if an equilibrium is already known to be stable, is the search for a Lyapunov function always justified? Converse theorems answer this at a structural level, even though they may not give an easily computable formula.
[quotetheorem:7605]
In this course the converse theorem is used as a conceptual result rather than as a construction. Under the stated $C^1$ vector-field hypothesis, the theorem gives a checkable local converse to the direct Lyapunov method: local asymptotic stability is enough to guarantee some differentiable strict Lyapunov certificate, although it may not be easy to compute. A common construction integrates a positive cost along trajectories, for example
\begin{align*}
V(x_0)=\int_0^\infty q(x(t;x_0))\,dt,
\end{align*}
with a positive definite function $q$, and then proves regularity of this value function. The asymptotic stability assumption is essential: if nearby trajectories do not converge to the equilibrium, such an integral may diverge or may fail to be positive definite with strict orbital decay. Forward existence and remaining inside $D$ prevent the construction from losing trajectories through finite-time escape or leaving the region where the vector field is controlled. The $C^1$ hypothesis on $f$ is the regularity input used by this version of the theorem; weaker converse theorems exist, but they require more technical statements about nonsmooth or merely continuous Lyapunov functions. The point for later control design is therefore not that the converse theorem gives a practical formula, but that searching for Lyapunov certificates is theoretically complete under the right local hypotheses.
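In the simplest case the integral construction can be evaluated by hand and compared with quadrature. The sketch below assumes the scalar system $\dot{x}=-x$ and the cost $q(x)=x^2$, neither of which is taken from the theorem; then $x(t;x_0)=e^{-t}x_0$ and $V(x_0)=\int_0^\infty e^{-2t}x_0^2\,dt=x_0^2/2$, with $\dot{V}(x)=x\cdot(-x)=-q(x)$. The function name, truncation horizon, and grid size are illustrative choices.

```python
import math

def value_function(x0, T=40.0, n=40000):
    """Trapezoidal approximation of the integral of q(x(t; x0)) over [0, T]
    for x' = -x and q(x) = x^2."""
    h = T / n
    total = 0.0
    for k in range(n + 1):
        x = x0 * math.exp(-k * h)              # exact flow of x' = -x
        weight = 0.5 if k in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * x * x * h
    return total

for x0 in (0.5, 1.0, 2.0):
    print(value_function(x0), 0.5 * x0 * x0)   # quadrature vs the closed form x0^2 / 2
```

The choice of $q$ matters: for slowly converging systems the same integral may diverge, which is one reason the general construction requires care.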
## LaSalle Invariance Principle and Barbalat Lemma
Many natural Lyapunov functions have only semidefinite decay. The damped pendulum and nonlinear mass-spring examples both give $\dot{V}=-c x_2^2$, so the derivative does not rule out motion on the set $x_2=0$. The next problem is to decide whether a trajectory can remain forever in the zero-derivative set.
[definition: Positively Invariant Set]
Let $f: D \to \mathbb R^n$ be locally Lipschitz on an open set $D \subset \mathbb R^n$. A set $M \subset D$ is positively invariant for $\dot{x}=f(x)$ if $x_0 \in M$ implies $x(t;x_0) \in M$ for all $t \ge 0$ for which the solution exists.
[/definition]
This definition isolates the pieces of state space that can support entire forward motions. The relevant subset of $\{\dot{V}=0\}$ is not the whole zero-derivative set, but the largest part of it that the dynamics can keep invariant. This raises the invariance question: when $V$ only decreases semidefinitely, which invariant set must the trajectory approach?
[quotetheorem:2783]
Compactness ensures that omega-limit sets are nonempty and that $V$ is bounded below along the trapped motion. Positive invariance is what keeps the whole future trajectory inside the set where the inequality $\dot{V}\le 0$ is known; without it, the conclusion would not follow from estimates on $\Omega$ alone. For example, in the scalar system $\dot{x}=1$ with $V(x)=-x$ on $\Omega=[0,1]$, the inequality $\dot{V}=-1\le 0$ holds on $\Omega$, but a solution starting at $x_0=1/2$ leaves $\Omega$ after finite time, so no conclusion about its limiting behaviour can be drawn from the estimate on $\Omega$. Compactness is separate: on a noncompact positively invariant set, boundedness of $V(x(t))$ need not produce a nonempty omega-limit set, so the compactness step in the proof can fail. The largest invariant subset is necessary because the set $E=\{\dot{V}=0\}$ may contain points through which trajectories pass instantaneously rather than remain, so LaSalle does not say that solutions converge to all of $E$. The pendulum energy calculation now becomes decisive. The set where $\dot{V}=0$ is $\{x_2=0\}$, but the only trajectory that remains in this set near the downward equilibrium is the equilibrium itself.
[example: Damped Pendulum via LaSalle]
For the damped pendulum with $c>0$, use the energy
\begin{align*}
V(x_1,x_2)=1-\cos x_1+\frac{1}{2}x_2^2.
\end{align*}
Fix $0<\rho<2$, and let $\Omega_\rho$ be the connected component containing $(0,0)$ of the sublevel set
\begin{align*}
\{(x_1,x_2):V(x_1,x_2)\le \rho\}.
\end{align*}
If $(x_1,x_2)\in\Omega_\rho$, then $1-\cos x_1\le \rho<2$, so $\cos x_1> -1$ and $x_1$ is never an odd multiple of $\pi$. The component containing $(0,0)$ is connected and therefore cannot cross the lines $x_1=\pm\pi$, hence $|x_1|<\pi$ on this component. Also $\frac{1}{2}x_2^2\le \rho$, so $|x_2|\le \sqrt{2\rho}$. Thus $\Omega_\rho$ is bounded and closed, hence compact.
Along the pendulum dynamics,
\begin{align*}
\dot{x}_1=x_2
\end{align*}
and
\begin{align*}
\dot{x}_2=-\sin x_1-cx_2.
\end{align*}
The gradient of $V$ is
\begin{align*}
\nabla V(x_1,x_2)=(\sin x_1,x_2).
\end{align*}
Therefore
\begin{align*}
\dot{V}(x_1,x_2)=(\sin x_1)x_2+x_2(-\sin x_1-cx_2).
\end{align*}
Expanding the second product gives
\begin{align*}
\dot{V}(x_1,x_2)=\sin x_1\,x_2-x_2\sin x_1-cx_2^2.
\end{align*}
The two mixed terms are equal with opposite signs, so
\begin{align*}
\dot{V}(x_1,x_2)=-c x_2^2\le 0.
\end{align*}
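Hence $V(x(t))$ is nonincreasing along trajectories, so a trajectory starting in $\Omega_\rho$ stays in the sublevel set $\{V\le\rho\}$. Because the trajectory is a continuous curve, it cannot jump to another connected component, and therefore it remains in $\Omega_\rho$.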
The zero-derivative set inside $\Omega_\rho$ is
\begin{align*}
E=\{(x_1,x_2)\in\Omega_\rho:x_2=0\}.
\end{align*}
If a trajectory remains in $E$ for all future time, then $x_2(t)=0$ for all $t$. From $\dot{x}_1=x_2$, this gives
\begin{align*}
\dot{x}_1(t)=0.
\end{align*}
Since $x_2(t)$ is identically zero, also $\dot{x}_2(t)=0$. Substituting $x_2(t)=0$ into the second equation gives
\begin{align*}
0=\dot{x}_2(t)=-\sin x_1(t).
\end{align*}
Thus $\sin x_1(t)=0$. Because the whole set lies in $|x_1|<\pi$, the only such value is $x_1(t)=0$. Therefore the largest positively invariant subset of $E$ is $\{(0,0)\}$. By the *LaSalle Invariance Principle*, every solution starting in $\Omega_\rho$ converges to the downward equilibrium $(0,0)$.
[/example]
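As a numerical sanity check of the LaSalle argument above, the following minimal sketch integrates the damped pendulum with a hand-written Runge-Kutta step and records the energy along the way. The damping value $c=0.5$, the initial state $(2,0)$, the step size, and the helper names `rk4_step` and `simulate` are illustrative assumptions, not part of the notes.

```python
import math

def pendulum(x, c):
    """Right-hand side of the damped pendulum: x1' = x2, x2' = -sin(x1) - c*x2."""
    x1, x2 = x
    return (x2, -math.sin(x1) - c * x2)

def energy(x):
    """Energy V(x1, x2) = 1 - cos(x1) + x2^2 / 2 used in the example."""
    x1, x2 = x
    return 1.0 - math.cos(x1) + 0.5 * x2 * x2

def rk4_step(f, x, h, c):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(x, c)
    k2 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)), c)
    k3 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)), c)
    k4 = f(tuple(xi + h * ki for xi, ki in zip(x, k3)), c)
    return tuple(xi + h * (p + 2.0 * q + 2.0 * r + s) / 6.0
                 for xi, p, q, r, s in zip(x, k1, k2, k3, k4))

def simulate(x0, c=0.5, h=0.01, t_final=60.0):
    """Integrate from x0 and return the list of visited states."""
    traj = [x0]
    for _ in range(int(round(t_final / h))):
        traj.append(rk4_step(pendulum, traj[-1], h, c))
    return traj

x0 = (2.0, 0.0)                      # assumed start; V(x0) = 1 - cos(2) < 2
traj = simulate(x0)
energies = [energy(x) for x in traj]

print("initial energy:", energies[0])
print("final state   :", traj[-1])
```

With these assumed values one would expect the recorded energies to be nonincreasing up to integration error, the angle to stay inside $|x_1|<\pi$, and the final state to lie close to $(0,0)$. A simulation of this kind illustrates the conclusion for one initial condition; it does not replace the invariance argument.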
Sublevel sets are also the standard way to estimate basins of attraction. The method is conservative but computable, which is why it appears often in nonlinear control design.
[example: Estimating a Basin of Attraction by Sublevel Sets]
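Let $D\subseteq\mathbb R^n$ be open with $0\in D$, let $f:D\to\mathbb R^n$ be locally Lipschitz with $f(0)=0$, and let $V\in C^1(D;\mathbb R)$ be positive definite on $D$ with $\dot V(x)=\nabla V(x)\cdot f(x)<0$ for every $x\in D\setminus\{0\}$. For $c>0$, suppose the sublevel set $\Omega_c=\{x\in D:V(x)\le c\}$ is compact, and write $\mathcal A(0)$ for the region of attraction of the origin.
[claim]Under these hypotheses, the compact sublevel set $\Omega_c$ is an inner estimate of the region of attraction: $\Omega_c\subset \mathcal A(0)$.[/claim]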
[proof]Let $x(t)=x(t;x_0)$ be a solution with $x_0\in\Omega_c$. Along the solution, the chain rule and the definition of the orbital derivative give
\begin{align*}
\frac{d}{dt}V(x(t))=\dot V(x(t)).
\end{align*}
If $x(t)\ne 0$, then $\dot V(x(t))<0$ by hypothesis, while if $x(t)=0$, the equilibrium condition gives $x(s)=0$ for later times and hence $V(x(s))=0$. Thus, for every time for which the trajectory is defined and remains in $D$,
\begin{align*}
\frac{d}{dt}V(x(t))\le 0.
\end{align*}
Integrating this differential inequality from $0$ to $t$ gives
\begin{align*}
V(x(t))\le V(x_0).
\end{align*}
Since $x_0\in\Omega_c$, we have $V(x_0)\le c$, and therefore
\begin{align*}
V(x(t))\le c.
\end{align*}
Hence $x(t)\in\Omega_c$ for all times for which the solution is defined, so $\Omega_c$ is positively invariant.
Because $\Omega_c$ is compact and contained in $D$, a trajectory trapped in $\Omega_c$ cannot leave every compact subset of $D$ in finite time. Thus each solution with $x_0\in\Omega_c$ exists for all $t\ge 0$. On $\Omega_c$, the function $V$ is positive definite and satisfies $\dot V(x)<0$ for every $x\ne 0$, so the strict part of the *Lyapunov Stability Theorem* implies
\begin{align*}
\lim_{t\to\infty}x(t;x_0)=0.
\end{align*}
Therefore every $x_0\in\Omega_c$ belongs to $\mathcal A(0)$, which proves
\begin{align*}
\Omega_c\subset \mathcal A(0).
\end{align*}
[/proof]
The practical meaning is that one does not need the exact basin: any compact sublevel set lying inside the domain where the strict Lyapunov inequalities hold gives a certified inner approximation of the region of attraction.
[/example]
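The sublevel-set estimate can be tried on a concrete scalar system. The sketch below uses the hypothetical vector field $f(x)=-x+x^3$ with $V(x)=x^2/2$ on $D=(-1,1)$, where $\dot V(x)=-x^2(1-x^2)<0$ for $0<|x|<1$; this system, the level $c=0.4$, and all function names are assumptions chosen for illustration and do not come from the notes.

```python
import math

def f(x):
    """Hypothetical scalar vector field with equilibria at -1, 0, and 1."""
    return -x + x**3

def V(x):
    """Candidate Lyapunov function V(x) = x^2 / 2."""
    return 0.5 * x * x

def V_dot(x):
    """Orbital derivative V'(x) f(x) = -x^2 (1 - x^2)."""
    return x * f(x)

def rk4(x, h):
    """One classical Runge-Kutta step for the scalar ODE x' = f(x)."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

c = 0.4                       # any level c < 1/2 keeps Omega_c inside D = (-1, 1)
radius = math.sqrt(2.0 * c)   # Omega_c = {x : |x| <= radius}

# Strict decrease at every sampled nonzero point of Omega_c.
samples = [radius * k / 200.0 for k in range(-200, 201) if k != 0]
strict_decrease = all(V_dot(x) < 0.0 for x in samples)

def final_state(x0, h=0.01, t_final=20.0):
    """Integrate from x0 and return the state at t_final."""
    x = x0
    for _ in range(int(round(t_final / h))):
        x = rk4(x, h)
    return x

# Start on the boundary of Omega_c, the hardest points of the estimate.
endpoints = [final_state(x0) for x0 in (-radius, radius)]
print(strict_decrease, endpoints)
```

For this assumed system the exact basin is the interval $(-1,1)$, so every $\Omega_c$ with $c<1/2$ is a strict inner estimate, which shows the conservatism mentioned before the example. The sampled inequality check is only a finite test, not a proof of strict decrease on all of $\Omega_c$.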
The sublevel-set method is geometric, while many control proofs are written as signal estimates, echoing the input-to-state estimate viewpoint introduced at the end of Chapter 1. After integrating an identity such as $\dot{V}=-W$, we often know that a nonnegative signal $W(t)$ has finite integral; the remaining question is what extra regularity turns finite accumulated energy into pointwise decay.
[quotetheorem:7606]
[citeproof:7606]
The assumptions have distinct roles. The finite integral is the source of decay information; [uniform continuity](/page/Uniform%20Continuity) alone gives no decay, as the constant signal $g(t)=1$ shows. Nonnegativity prevents cancellation: for the oscillatory signal $g(t)=\sin t$, the running integral $\int_0^T\sin t\,dt=1-\cos T$ stays bounded only because positive and negative lobes cancel, and $g$ does not converge to $0$, so the Lyapunov application uses $g=W(x(t))\ge 0$. The uniform continuity assumption prevents narrow spikes of fixed height from moving farther and farther out in time while contributing finite total area. Without it, integrability alone does not force pointwise decay: a continuous nonnegative function made of triangular spikes of height $1$ and widths summing to a finite number has finite integral but does not converge to $0$.
[Barbalat's lemma](/theorems/7606) proves pointwise decay of the signal $g(t)$ only; it does not by itself identify the limit of the full state $x(t)$, prove convergence of $x(t)$, or give a rate. In Lyapunov arguments, Barbalat is usually applied to $g(t)=W(x(t))\ge 0$ where $\dot{V}=-W$ and $W(x(t))$ is uniformly continuous. The finite integral follows from integrating $\dot{V}$, and uniform continuity often follows from boundedness of $x(t)$ and local Lipschitz bounds on the vector field.
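The spike counterexample mentioned above can be written down explicitly. In the sketch below, the signal has a triangular spike of height $1$ centred at each integer $n\ge 1$ with half-width $2^{-(n+1)}$, so the $n$-th spike has area $2^{-(n+1)}$ and the total area is $1/2$. The particular widths and the names `g` and `spike_area` are assumptions made for illustration.

```python
def g(t):
    """Nonnegative continuous signal with a height-1 triangular spike at each integer n >= 1."""
    n = round(t)                         # nearest integer: the only spike that can contain t
    if n < 1:
        return 0.0
    half_width = 2.0 ** (-(n + 1))       # spikes get narrower as n grows
    return max(0.0, 1.0 - abs(t - n) / half_width)

def spike_area(n):
    """Area of the n-th triangle: (base * height) / 2 with base 2 * half_width."""
    return 2.0 ** (-(n + 1))

# Partial sums of the areas stay below 1/2, so the integral of g is finite.
partial_areas = [sum(spike_area(n) for n in range(1, N + 1)) for N in (5, 10, 20)]

# The signal keeps returning to height 1, so it does not converge to 0.
peaks = [g(float(n)) for n in range(1, 30)]

# Failure of uniform continuity: g drops from 1 to 0 over a distance 2^-(n+1).
drop_at_10 = g(10.0) - g(10.0 + 2.0 ** -11)

print(partial_areas, min(peaks), drop_at_10)
```

The last quantity shows why uniform continuity fails for this signal: a change of size $1$ occurs over intervals whose length tends to $0$, which is exactly the behaviour the lemma's hypothesis excludes.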
[example: Using Barbalat for the Nonlinear Mass-Spring System]
For the nonlinear mass-spring system
\begin{align*}
\dot{x}_1=x_2
\end{align*}
and
\begin{align*}
\dot{x}_2=-x_1-x_1^3-cx_2,
\end{align*}
with $c>0$, use the energy function from the previous example,
\begin{align*}
V(x_1,x_2)=\frac{1}{2}x_2^2+\frac{1}{2}x_1^2+\frac{1}{4}x_1^4.
\end{align*}
There we computed
\begin{align*}
\dot V(x_1,x_2)=-c x_2^2.
\end{align*}
Along a trajectory $x(t)=x(t;x_0)$, the chain rule gives
\begin{align*}
\frac{d}{dt}V(x(t))=\dot V(x(t))=-c x_2(t)^2.
\end{align*}
Integrating from $0$ to $T$ gives
\begin{align*}
V(x(T))-V(x(0))=-c\int_0^T x_2(t)^2\,dt.
\end{align*}
Rearranging,
\begin{align*}
c\int_0^T x_2(t)^2\,dt=V(x(0))-V(x(T)).
\end{align*}
Since $V$ is a sum of nonnegative terms, $V(x(T))\ge 0$, so
\begin{align*}
c\int_0^T x_2(t)^2\,dt\le V(x(0)).
\end{align*}
Dividing by $c>0$ gives
\begin{align*}
\int_0^T x_2(t)^2\,dt\le \frac{V(x(0))}{c}.
\end{align*}
Because this bound is independent of $T$, monotone convergence gives
\begin{align*}
\int_0^\infty x_2(t)^2\,dt\le \frac{V(x(0))}{c}<\infty.
\end{align*}
It remains to justify the uniform continuity needed for *Barbalat Lemma*. Since $V(x(t))\le V(x(0))$ for all $t\ge 0$, the trajectory stays in the sublevel set
\begin{align*}
\{x:V(x)\le V(x(0))\}.
\end{align*}
This set is bounded because $V(x)\ge \frac{1}{2}|x|^2$, and it is closed because $V$ is continuous. Hence $x_1(t)$ and $x_2(t)$ are bounded. The vector field is polynomial, so on this bounded set $\dot{x}_2(t)=-x_1(t)-x_1(t)^3-cx_2(t)$ is also bounded. For $g(t)=x_2(t)^2$,
\begin{align*}
g'(t)=2x_2(t)\dot{x}_2(t).
\end{align*}
Both factors on the right are bounded, so $g'$ is bounded, and therefore $g$ is uniformly continuous. Since $g\ge 0$ and $\int_0^\infty g(t)\,dt<\infty$, *Barbalat Lemma* gives
\begin{align*}
x_2(t)^2\to 0.
\end{align*}
Thus
\begin{align*}
x_2(t)\to 0.
\end{align*}
Now take any [limit point](/page/Limit%20Point) $z=(z_1,z_2)$ of the bounded trajectory. Because $x_2(t)\to 0$, every such limit point satisfies
\begin{align*}
z_2=0.
\end{align*}
The shifted trajectories through times tending to infinity have subsequential limits that solve the same autonomous system. On such a limiting trajectory, the second component is identically $0$, so its derivative is also identically $0$. Substituting $x_2=0$ into the second state equation gives
\begin{align*}
0=\dot{x}_2=-x_1-x_1^3.
\end{align*}
Equivalently,
\begin{align*}
x_1+x_1^3=0.
\end{align*}
Factoring,
\begin{align*}
x_1(1+x_1^2)=0.
\end{align*}
Since $1+x_1^2>0$ for every real $x_1$, the only possibility is
\begin{align*}
x_1=0.
\end{align*}
Thus every limit point is $(0,0)$. The trajectory is bounded and all of its limit points equal the origin, so
\begin{align*}
x(t;x_0)\to (0,0).
\end{align*}
The energy identity first proves that the velocity has finite accumulated energy, and Barbalat's lemma turns that integral information into pointwise decay; the dynamics then force the only possible limiting position to be the origin.
[/example]
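The energy identity in this example lends itself to a numerical check. The following sketch integrates the nonlinear mass-spring system, accumulates $\int_0^T x_2(t)^2\,dt$ with the trapezoidal rule, and compares $c\int_0^T x_2^2\,dt$ with $V(x(0))-V(x(T))$. The damping $c=0.5$, the initial state $(1.5,0)$, the horizon, and the step size are assumed values for illustration only.

```python
def field(x, c):
    """Nonlinear mass-spring dynamics: x1' = x2, x2' = -x1 - x1^3 - c*x2."""
    x1, x2 = x
    return (x2, -x1 - x1**3 - c * x2)

def V(x):
    """Energy V = x2^2/2 + x1^2/2 + x1^4/4 from the example."""
    x1, x2 = x
    return 0.5 * x2**2 + 0.5 * x1**2 + 0.25 * x1**4

def rk4_step(x, h, c):
    """One classical Runge-Kutta step of size h."""
    k1 = field(x, c)
    k2 = field((x[0] + 0.5 * h * k1[0], x[1] + 0.5 * h * k1[1]), c)
    k3 = field((x[0] + 0.5 * h * k2[0], x[1] + 0.5 * h * k2[1]), c)
    k4 = field((x[0] + h * k3[0], x[1] + h * k3[1]), c)
    return (x[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
            x[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)

c, h, t_final = 0.5, 0.01, 60.0      # assumed damping, step size, and horizon
x = (1.5, 0.0)                       # assumed initial state
V0 = V(x)

integral = 0.0                       # trapezoidal approximation of the integral of x2^2
for _ in range(int(round(t_final / h))):
    x_next = rk4_step(x, h, c)
    integral += 0.5 * h * (x[1]**2 + x_next[1]**2)
    x = x_next

bound = V0 / c                                    # a priori bound from the example
balance_residual = abs(c * integral - (V0 - V(x)))  # energy identity, up to numerical error

print("integral of x2^2:", integral, "bound:", bound)
print("final state:", x)
```

Up to discretization error, the accumulated integral should stay below $V(x(0))/c$ and the final state should be close to the origin, in line with the Barbalat argument. The computation only illustrates one trajectory and says nothing about a convergence rate in general.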
The chapter's main lesson is that Lyapunov analysis replaces explicit solutions by the geometry of scalar functions. Stability comes from trapping sublevel sets, asymptotic convergence comes from strict decrease or invariance analysis, and basin estimates come from finding compact sublevel sets inside the domain where the Lyapunov inequalities hold.
Chapter 2 showed how Lyapunov functions certify stability, estimate basins of attraction, and handle cases where linearization gives no clear answer. Chapter 3 uses the same ideas in the opposite direction, choosing the feedback law so that a suitable Lyapunov function decreases along closed-loop trajectories.
# 3. Lyapunov-Based Feedback Design
This chapter turns Lyapunov stability theory into a design method. In Chapter 2, a Lyapunov function certified stability for a closed-loop system already in hand; here the Lyapunov function becomes a tool for constructing the feedback law itself. The central question is: given a nonlinear control system, can we choose the input so that a proposed energy-like quantity decreases along every nonzero trajectory? The answer leads to control Lyapunov functions, universal formulas for affine systems, and recursive backstepping for systems whose dynamics have a triangular structure.
## From Lyapunov Certificates to Feedback Laws
Suppose the plant is a controlled ODE
\begin{align*}
\dot{x} = f(x,u), \qquad x \in \mathbb R^n,\quad u \in U \subseteq \mathbb R^m,
\end{align*}
with $f(0,0)=0$. A usual Lyapunov argument begins after the feedback $u=k(x)$ has been chosen. Feedback design reverses the order: first select a positive definite function $V$, then ask whether for each state $x \ne 0$ there is some admissible input that makes the derivative of $V$ negative.
The following definition formalises this design requirement. It is local or global according to the domain on which the inequalities are imposed.
[definition: Control Lyapunov Function]
Let $D \subseteq \mathbb R^n$ be a neighbourhood of $0$, let $U \subseteq \mathbb R^m$, and let $f:D \times U \to \mathbb R^n$ satisfy $f(0,0)=0$. A function $V \in C^1(D;\mathbb R)$ is a control Lyapunov function for $\dot{x}=f(x,u)$ on $D$ if $V(0)=0$, $V(x)>0$ for all $x \in D\setminus\{0\}$, and for every $x \in D\setminus\{0\}$ there exists $u \in U$ such that
\begin{align*}
\nabla V(x)\cdot f(x,u) < 0.
\end{align*}
[/definition]
The definition says that, at every nonzero state, the input has enough authority to point the vector field strictly into the sublevel set of $V$ through that state. It does not itself give a feedback law, because the input that works may depend on $x$ in a discontinuous or nonunique way. The design problem is to select such inputs coherently.
[example: Scalar Integrator Control Lyapunov Function]
Consider the scalar integrator $\dot{x}=u$ with unrestricted input $u\in\mathbb R$, and choose
\begin{align*}
V(x)=\frac{x^2}{2}.
\end{align*}
Then $V(0)=0$, and if $x\ne0$ then $x^2>0$, so $V(x)>0$. Since $V'(x)=x$, the derivative of $V$ along a trajectory of $\dot{x}=u$ is
\begin{align*}
\dot V(x)=V'(x)\dot{x}=xu.
\end{align*}
For each $x\ne0$, choose the admissible input $u=-x$. Substituting this input gives
\begin{align*}
\dot V(x)=x(-x)=-x^2<0.
\end{align*}
Therefore $V$ satisfies the control Lyapunov function condition. Under the feedback $u=-x$, the closed-loop equation is
\begin{align*}
\dot{x}=-x.
\end{align*}
Its solution from $x(0)=x_0$ is $x(t)=x_0e^{-t}$, so
\begin{align*}
|x(t)|=|x_0|e^{-t}.
\end{align*}
The origin is therefore globally exponentially stable, and the rate of decrease of $V$ along closed-loop trajectories is exactly the squared state magnitude: $\dot V=-x^2=-2V$.
[/example]
This example hides the selection difficulty because an obvious smooth stabilizing input is available: any $u=-kx$ with $k>0$ works, and the choice $u=-x$ is made without effort. For more general systems, the set of inputs that decrease $V$ can change shape with $x$, and the course needs a result connecting this pointwise CLF condition with the existence of an actual stabilizing feedback.
[quotetheorem:7607]
[citeproof:7607]
Artstein's theorem is conceptually important because it converts stabilization into the search for a scalar function. Each hypothesis rules out a concrete obstruction. Without the small-control property, stabilizing inputs may have to stay bounded away from $0$ along states approaching the origin, so any feedback with $k(0)=0$ would be discontinuous at the equilibrium. Without properness on the region being stabilized, a negative derivative of $V$ can keep trajectories inside decreasing level sets without preventing escape toward the boundary of $D$ or to infinity. The affine structure is also part of the selection theorem: it makes the set of decreasing inputs a half-space or an intersection of half-spaces, rather than an arbitrary state-dependent set with no continuous selector. The theorem is not yet a practical formula because it leaves the feedback implicit. For affine systems with scalar input, Sontag's formula provides an explicit stabilizer from the same data.
## Sontag's Universal Formula for Affine Systems
How can the stabilizing input be written directly from $V$ without solving an optimization problem at each state? For a single-input affine system, the derivative of $V$ along trajectories separates into a drift term and a control term. The feedback design can therefore be expressed through two scalar functions.
[definition: Lie Derivative Data for a Control Lyapunov Function]
Let $D\subseteq\mathbb R^n$ be open, let $f:D\to\mathbb R^n$ and $g:D\to\mathbb R^n$, and consider the single-input affine system $\dot{x}=f(x)+g(x)u$ with $u\in\mathbb R$. For $V\in C^1(D;\mathbb R)$, define $a:D\to\mathbb R$ and $b:D\to\mathbb R$ by
\begin{align*}
a(x) = \nabla V(x)\cdot f(x).
\end{align*}
\begin{align*}
b(x) = \nabla V(x)\cdot g(x).
\end{align*}
[/definition]
With this notation, the derivative of $V$ under feedback $u=k(x)$ is $a(x)+b(x)k(x)$. If $b(x)\ne0$, a sufficiently large input of the opposite sign to $b(x)$ can force decrease. The only dangerous points are those where $b(x)=0$, because the control has no first-order effect on $V$ there; this is why the next condition controls the behaviour of stabilizing inputs near the origin.
[definition: Small-Control Property]
A control Lyapunov function $V$ for the single-input affine system has the small-control property if for every $\varepsilon>0$ there exists $\delta>0$ such that whenever $0<|x|<\delta$, there is $u\in\mathbb R$ with $|u|<\varepsilon$ and
\begin{align*}
a(x)+b(x)u<0.
\end{align*}
[/definition]
The small-control property ensures continuity of the stabilizing feedback at the equilibrium. Without it, the CLF may require inputs bounded away from zero near the origin, which produces a discontinuity at $x=0$ when the equilibrium input is $0$.
[quotetheorem:7608]
[citeproof:7608]
The formula is often called universal because it depends only on $a$ and $b$, not on a model-specific algebraic construction. The hypotheses explain exactly where the formula can fail. If $b(x)=0$ at a nonzero point and $a(x)\ge0$, no scalar input can make $a(x)+b(x)u$ negative there, so the proposed $V$ was not a CLF. If the small-control property fails, the algebraic expression may still decrease $V$ away from $0$, but it need not satisfy $k(x)\to0$ as $x\to0$; this creates a discontinuity at the equilibrium and can destroy the usual closed-loop existence and stability conclusion. Regularity of $f$, $g$, and $V$ is also needed so that the closed-loop ODE has the solutions to which Lyapunov's theorem is applied. The main design limitation remains that a CLF must already be known. The next example shows how the formula recovers familiar stabilizing feedback for an elementary plant.
[example: Integrator Chain Stabilization]
For the double integrator $\dot{x}_1=x_2$ and $\dot{x}_2=u$, choose the stabilizing linear feedback
\begin{align*}
u=-2x_1-3x_2.
\end{align*}
The closed-loop equations are $\dot{x}_1=x_2$ and $\dot{x}_2=-2x_1-3x_2$. Take $P$ with entries $p_{11}=5/4$, $p_{12}=p_{21}=1/4$, and $p_{22}=1/4$, and define
\begin{align*}
V(x)=x^\top Px=\frac{5}{4}x_1^2+\frac{1}{2}x_1x_2+\frac{1}{4}x_2^2.
\end{align*}
The first leading principal minor is $p_{11}=5/4>0$, and the determinant is
\begin{align*}
\frac{5}{4}\cdot\frac{1}{4}-\frac{1}{4}\cdot\frac{1}{4}=\frac{5}{16}-\frac{1}{16}=\frac{1}{4}>0.
\end{align*}
Thus $P$ is positive definite, so $V(x)>0$ for every $x\ne0$ and $V(0)=0$.
Now compute the derivative of $V$ under the linear feedback. Its gradient is
\begin{align*}
\nabla V(x)=\left(\frac{5}{2}x_1+\frac{1}{2}x_2,\frac{1}{2}x_1+\frac{1}{2}x_2\right).
\end{align*}
Therefore
\begin{align*}
\dot V=\left(\frac{5}{2}x_1+\frac{1}{2}x_2\right)x_2+\left(\frac{1}{2}x_1+\frac{1}{2}x_2\right)(-2x_1-3x_2).
\end{align*}
Expanding the second product gives
\begin{align*}
\left(\frac{1}{2}x_1+\frac{1}{2}x_2\right)(-2x_1-3x_2)=-x_1^2-\frac{3}{2}x_1x_2-x_1x_2-\frac{3}{2}x_2^2.
\end{align*}
Substituting this into the derivative,
\begin{align*}
\dot V=\frac{5}{2}x_1x_2+\frac{1}{2}x_2^2-x_1^2-\frac{3}{2}x_1x_2-x_1x_2-\frac{3}{2}x_2^2.
\end{align*}
The mixed terms cancel because $\frac{5}{2}-\frac{3}{2}-1=0$, and the $x_2^2$ terms combine as $\frac{1}{2}-\frac{3}{2}=-1$, hence
\begin{align*}
\dot V=-x_1^2-x_2^2<0
\end{align*}
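for every $x\ne0$. Thus this quadratic function is a control Lyapunov function for the double integrator: at each nonzero state, the input value $u=-2x_1-3x_2$ is an admissible witness for the decrease condition.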
For the open-loop affine form, write $f(x)=(x_2,0)$ and $g(x)=(0,1)$. The Lie derivative data are
\begin{align*}
a(x)=\nabla V(x)\cdot f(x)=\left(\frac{5}{2}x_1+\frac{1}{2}x_2\right)x_2=\frac{5}{2}x_1x_2+\frac{1}{2}x_2^2.
\end{align*}
Also,
\begin{align*}
b(x)=\nabla V(x)\cdot g(x)=\frac{1}{2}x_1+\frac{1}{2}x_2.
\end{align*}
For $b(x)\ne0$, Sontag's formula gives
\begin{align*}
k(x)=-\frac{a(x)+\sqrt{a(x)^2+b(x)^4}}{b(x)}.
\end{align*}
Substitution into $\dot V=a+bk$ yields
\begin{align*}
\dot V=a(x)+b(x)\left(-\frac{a(x)+\sqrt{a(x)^2+b(x)^4}}{b(x)}\right).
\end{align*}
Cancelling the nonzero factor $b(x)$ gives
\begin{align*}
\dot V=a(x)-a(x)-\sqrt{a(x)^2+b(x)^4}=-\sqrt{a(x)^2+b(x)^4}<0
\end{align*}
whenever $(a(x),b(x))\ne(0,0)$. If $b(x)=0$, then $x_2=-x_1$, and
\begin{align*}
a(x)=\frac{5}{2}x_1(-x_1)+\frac{1}{2}(-x_1)^2=-\frac{5}{2}x_1^2+\frac{1}{2}x_1^2=-2x_1^2.
\end{align*}
For a nonzero state with $b(x)=0$, this gives $a(x)<0$, so the formula's value $k(x)=0$ still gives $\dot V=a(x)<0$. Thus Sontag's feedback is a nonlinear stabilizing selector for the same quadratic energy.
[/example]
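The calculations in this example can be checked numerically. The sketch below first verifies that the closed-loop matrix $A$ and the matrix $P$ satisfy $A^\top P+PA=-I$, which is the matrix form of $\dot V=-x_1^2-x_2^2$, and then evaluates Sontag's feedback on a grid to confirm $\dot V=-\sqrt{a^2+b^4}$ where $b\ne 0$ and $\dot V=a<0$ where $b=0$. The grid, the tolerance handling, and the helper names are assumptions made for illustration.

```python
import math

A = [[0.0, 1.0], [-2.0, -3.0]]       # closed loop under u = -2*x1 - 3*x2
P = [[1.25, 0.25], [0.25, 0.25]]     # V(x) = x^T P x

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

PA = matmul(P, A)
# P is symmetric, so A^T P = (P A)^T and the Lyapunov expression is (PA)^T + PA.
lyap = [[PA[j][i] + PA[i][j] for j in range(2)] for i in range(2)]

def lie_data(x1, x2):
    """a = grad V . f and b = grad V . g for f = (x2, 0) and g = (0, 1)."""
    return (2.5 * x1 + 0.5 * x2) * x2, 0.5 * x1 + 0.5 * x2

def sontag(x1, x2):
    """Sontag's formula, with the value 0 where b vanishes."""
    a, b = lie_data(x1, x2)
    if b == 0.0:
        return 0.0
    return -(a + math.sqrt(a * a + b ** 4)) / b

def V_dot(x1, x2, u):
    """Derivative of V along x1' = x2, x2' = u."""
    a, b = lie_data(x1, x2)
    return a + b * u

grid = [(i * 0.5, j * 0.5) for i in range(-4, 5) for j in range(-4, 5)
        if (i, j) != (0, 0)]

max_residual = 0.0
all_negative = True
for x1, x2 in grid:
    a, b = lie_data(x1, x2)
    rate = V_dot(x1, x2, sontag(x1, x2))
    expected = a if b == 0.0 else -math.sqrt(a * a + b ** 4)
    max_residual = max(max_residual, abs(rate - expected))
    all_negative = all_negative and rate < 0.0

print(lyap, max_residual, all_negative)
```

In floating-point arithmetic the numerator $a+\sqrt{a^2+b^4}$ loses accuracy when $a<0$ and $|b|$ is very small, so a production implementation would likely need a more careful evaluation near the set $b=0$; the exact test `b == 0.0` is adequate only for this grid.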
Near a mechanical equilibrium, the CLF may arise from energy shaping or from a quadratic approximation. For the inverted pendulum near the upright position, the affine form separates the natural pendulum dynamics from the torque input, and the CLF derivative records whether torque can remove the local energy error.
[example: Inverted Pendulum Near Upright]
Let $x_1$ be the angular displacement from the upright equilibrium and $x_2$ the angular velocity, with
\begin{align*}
\dot{x}_1=x_2.
\end{align*}
\begin{align*}
\dot{x}_2=\alpha\sin x_1+\beta u,
\end{align*}
where $\alpha,\beta>0$. Use the linear stabilizing torque
\begin{align*}
u=-\frac{\alpha+2}{\beta}x_1-\frac{3}{\beta}x_2.
\end{align*}
Then
\begin{align*}
\dot{x}_2=\alpha\sin x_1-(\alpha+2)x_1-3x_2.
\end{align*}
Equivalently,
\begin{align*}
\dot{x}_2=-2x_1-3x_2+\alpha(\sin x_1-x_1).
\end{align*}
Take the same quadratic function used for the double-integrator calculation,
\begin{align*}
V(x)=\frac{5}{4}x_1^2+\frac{1}{2}x_1x_2+\frac{1}{4}x_2^2.
\end{align*}
Its matrix has first leading principal minor $5/4>0$ and determinant
\begin{align*}
\frac{5}{4}\cdot\frac{1}{4}-\frac{1}{4}\cdot\frac{1}{4}=\frac{5}{16}-\frac{1}{16}=\frac{1}{4}>0,
\end{align*}
so $V(x)>0$ for $x\ne0$ and $V(0)=0$. Also
\begin{align*}
\nabla V(x)=\left(\frac{5}{2}x_1+\frac{1}{2}x_2,\frac{1}{2}x_1+\frac{1}{2}x_2\right).
\end{align*}
Along the nonlinear closed loop,
\begin{align*}
\dot V=\left(\frac{5}{2}x_1+\frac{1}{2}x_2\right)x_2+\left(\frac{1}{2}x_1+\frac{1}{2}x_2\right)\left(-2x_1-3x_2+\alpha(\sin x_1-x_1)\right).
\end{align*}
Separate the linear part from the nonlinear remainder:
\begin{align*}
\dot V=\left[\left(\frac{5}{2}x_1+\frac{1}{2}x_2\right)x_2+\left(\frac{1}{2}x_1+\frac{1}{2}x_2\right)(-2x_1-3x_2)\right]+\frac{\alpha}{2}(x_1+x_2)(\sin x_1-x_1).
\end{align*}
The bracketed expression is
\begin{align*}
\frac{5}{2}x_1x_2+\frac{1}{2}x_2^2-x_1^2-\frac{3}{2}x_1x_2-x_1x_2-\frac{3}{2}x_2^2.
\end{align*}
Since $\frac{5}{2}-\frac{3}{2}-1=0$ and $\frac{1}{2}-\frac{3}{2}=-1$, this becomes
\begin{align*}
-x_1^2-x_2^2.
\end{align*}
Therefore
\begin{align*}
\dot V=-x_1^2-x_2^2+\frac{\alpha}{2}(x_1+x_2)(\sin x_1-x_1).
\end{align*}
For $|x_1|\le1$, [Taylor's theorem](/theorems/827) with remainder gives $|\sin x_1-x_1|\le |x_1|^3/6$. Hence, if $r^2=x_1^2+x_2^2$, then $|x_1|\le r$, $|x_2|\le r$, and
\begin{align*}
\left|\frac{\alpha}{2}(x_1+x_2)(\sin x_1-x_1)\right|\le\frac{\alpha}{2}(|x_1|+|x_2|)\frac{|x_1|^3}{6}.
\end{align*}
Using $|x_1|+|x_2|\le2r$ and $|x_1|^3\le r^3$ gives
\begin{align*}
\left|\frac{\alpha}{2}(x_1+x_2)(\sin x_1-x_1)\right|\le\frac{\alpha}{6}r^4.
\end{align*}
Thus
\begin{align*}
\dot V\le -r^2+\frac{\alpha}{6}r^4.
\end{align*}
On any neighbourhood where $r^2<3/\alpha$, this implies
\begin{align*}
\dot V<-\frac{1}{2}r^2<0
\end{align*}
for every nonzero state. Thus the quadratic function is a local control Lyapunov function for the nonlinear pendulum near the upright equilibrium, and Sontag's formula may be applied on such a neighbourhood to select a stabilizing torque from the corresponding Lie derivative data.
[/example]
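The local estimate $\dot V<-\tfrac12 r^2$ can be probed numerically by evaluating $\dot V$ directly from the closed-loop pendulum dynamics on a polar grid inside the disc $r^2<3/\alpha$. The parameter values $\alpha=4$ and $\beta=1.5$ are hypothetical, chosen so that the disc also satisfies $|x_1|\le 1$; the grid resolution and function names are likewise assumptions.

```python
import math

def closed_loop(x1, x2, alpha, beta):
    """Pendulum near upright under the linear torque from the example."""
    u = -(alpha + 2.0) / beta * x1 - 3.0 / beta * x2
    return x2, alpha * math.sin(x1) + beta * u

def V_dot(x1, x2, alpha, beta):
    """grad V . (x1', x2') with grad V = (5/2 x1 + 1/2 x2, 1/2 x1 + 1/2 x2)."""
    dx1, dx2 = closed_loop(x1, x2, alpha, beta)
    return (2.5 * x1 + 0.5 * x2) * dx1 + (0.5 * x1 + 0.5 * x2) * dx2

alpha, beta = 4.0, 1.5             # hypothetical plant parameters
r_max = math.sqrt(3.0 / alpha)     # the estimate is claimed for r^2 < 3 / alpha

# margin = (-r^2 / 2) - V_dot should be positive at every sampled nonzero state.
worst_margin = float("inf")
for i in range(1, 100):            # radii strictly between 0 and r_max
    r = r_max * i / 100.0
    for j in range(72):            # angles in 5 degree steps
        theta = 2.0 * math.pi * j / 72.0
        x1, x2 = r * math.cos(theta), r * math.sin(theta)
        margin = -0.5 * r * r - V_dot(x1, x2, alpha, beta)
        worst_margin = min(worst_margin, margin)

print("smallest sampled margin:", worst_margin)
```

A positive smallest margin is consistent with the bound derived above, and the gap between the sampled margins and zero indicates how conservative the estimates $|x_1|+|x_2|\le 2r$ and $|x_1|^3\le r^3$ are. A finite grid cannot certify the inequality on the whole disc.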
Sontag's formula solves the feedback selection problem once a CLF is available. A major remaining design question is how to build such a function for nonlinear systems with more structure than a single integrator. Backstepping answers this by constructing the Lyapunov function and feedback recursively.
## Recursive Backstepping for Strict-Feedback Systems
What if the system contains a state variable that behaves like a virtual input for a lower-dimensional subsystem? Backstepping exploits this situation by first stabilizing the lower subsystem with a fictitious control, then forcing the actual next state to track that fictitious control. The method is especially useful for strict-feedback systems, where the control enters through a triangular chain.
[definition: Strict-Feedback Form]
Let $D\subseteq\mathbb R^n$ be a neighbourhood of $0$, and let $D_i\subseteq\mathbb R^i$ denote the projection of $D$ onto the first $i$ coordinates. A system with state $x=(x_1,\dots,x_n)\in D$ and scalar input $u\in\mathbb R$ is in strict-feedback form if there are functions $f_i,g_i:D_i\to\mathbb R$ for $1\le i\le n-1$ and functions $f_n,g_n:D\to\mathbb R$ such that
\begin{align*}
\dot{x}_i=f_i(x_1,\dots,x_i)+g_i(x_1,\dots,x_i)x_{i+1}, \qquad 1\le i\le n-1,
\end{align*}
and
\begin{align*}
\dot{x}_n=f_n(x)+g_n(x)u,
\end{align*}
where each $g_i$ is nonzero on its domain.
[/definition]
The triangular structure means that $x_{i+1}$ can be treated as a temporary control for the first $i$ equations. Backstepping introduces tracking errors $z_i$ that measure the difference between each state and the stabilizing virtual control designed at the previous step.
[example: Scalar Strict-Feedback First Step]
Consider $\dot{x}_1=f_1(x_1)+g_1(x_1)x_2$, and suppose a smooth virtual control $\alpha_1(x_1)$ has been chosen so that the virtual closed subsystem
\begin{align*}
\dot{x}_1=f_1(x_1)+g_1(x_1)\alpha_1(x_1)
\end{align*}
has a positive definite Lyapunov function $V_1(x_1)$ with
\begin{align*}
V_1'(x_1)\bigl(f_1(x_1)+g_1(x_1)\alpha_1(x_1)\bigr)<0
\end{align*}
for $x_1\ne0$. Define the tracking error
\begin{align*}
z_2=x_2-\alpha_1(x_1).
\end{align*}
Then $x_2=\alpha_1(x_1)+z_2$, so the first equation becomes
\begin{align*}
\dot{x}_1=f_1(x_1)+g_1(x_1)\alpha_1(x_1)+g_1(x_1)z_2.
\end{align*}
Augment the Lyapunov function by setting
\begin{align*}
V_2(x_1,z_2)=V_1(x_1)+\frac{1}{2}z_2^2.
\end{align*}
If the next design variable controls $\dot{x}_2$ through a temporary input $v=\dot{x}_2$, then differentiating $z_2=x_2-\alpha_1(x_1)$ gives
\begin{align*}
\dot z_2=v-\alpha_1'(x_1)\dot{x}_1.
\end{align*}
Therefore
\begin{align*}
\dot V_2=V_1'(x_1)\dot{x}_1+z_2\dot z_2.
\end{align*}
Substituting the expressions for $\dot{x}_1$ and $\dot z_2$ gives
\begin{align*}
\dot V_2=V_1'(x_1)\bigl(f_1(x_1)+g_1(x_1)\alpha_1(x_1)+g_1(x_1)z_2\bigr)+z_2\bigl(v-\alpha_1'(x_1)\dot{x}_1\bigr).
\end{align*}
Separating the stabilizing part of the first subsystem from the terms multiplied by $z_2$,
\begin{align*}
\dot V_2=V_1'(x_1)\bigl(f_1(x_1)+g_1(x_1)\alpha_1(x_1)\bigr)+z_2\bigl(V_1'(x_1)g_1(x_1)+v-\alpha_1'(x_1)\dot{x}_1\bigr).
\end{align*}
Choose, for any $c_2>0$,
\begin{align*}
v=\alpha_1'(x_1)\dot{x}_1-V_1'(x_1)g_1(x_1)-c_2z_2.
\end{align*}
Then the bracketed $z_2$ coefficient becomes
\begin{align*}
V_1'(x_1)g_1(x_1)+\alpha_1'(x_1)\dot{x}_1-V_1'(x_1)g_1(x_1)-c_2z_2-\alpha_1'(x_1)\dot{x}_1=-c_2z_2.
\end{align*}
Thus
\begin{align*}
\dot V_2=V_1'(x_1)\bigl(f_1(x_1)+g_1(x_1)\alpha_1(x_1)\bigr)-c_2z_2^2.
\end{align*}
The first term is negative whenever $x_1\ne0$, and the second term is negative whenever $z_2\ne0$, so the augmented design stabilizes the enlarged state by making the new variable track the virtual control $\alpha_1(x_1)$.
[/example]
The recursive theorem packages this calculation. Each step adds one squared tracking error to the Lyapunov function and chooses the next virtual input so that all cross terms are either cancelled or dominated by negative quadratic terms.
[quotetheorem:7609]
[citeproof:7609]
Backstepping is constructive but algebraically demanding, and its hypotheses are structural rather than cosmetic. The nonvanishing condition on $g_i$ is needed because each step divides by $g_i$ to solve for the next virtual or actual control; if $g_i$ vanishes, the required cancellation may demand an infinite input or may be impossible at that state. Smoothness is what allows the derivatives of the virtual controls $\alpha_i$ to be computed in the next recursive step. Properness is the global compactness condition that prevents the completed Lyapunov function from decreasing while trajectories escape to infinity. The strict-feedback triangular form is the reason the induction closes: if $x_{i+1}$ entered earlier equations in a non-triangular way, treating it as a virtual control would introduce uncontrolled terms depending on future states. Its advantage, when these hypotheses hold, is that every new state variable is handled by the same template: define an error, extend the Lyapunov function, differentiate, cancel, and add damping.
[example: Backstepping for a Two-State Strict-Feedback System]
Consider $\dot{x}_1=x_1^3+x_2$ and $\dot{x}_2=u$. Choose
\begin{align*}
\alpha_1(x_1)=-x_1-x_1^3.
\end{align*}
If $x_2=\alpha_1(x_1)$, then
\begin{align*}
\dot{x}_1=x_1^3+(-x_1-x_1^3)=-x_1,
\end{align*}
so the first state is stabilized by the virtual control.
Define
\begin{align*}
z_2=x_2-\alpha_1(x_1)=x_2+x_1+x_1^3.
\end{align*}
Then $x_2=\alpha_1(x_1)+z_2$, and hence
\begin{align*}
\dot{x}_1=x_1^3+\alpha_1(x_1)+z_2=-x_1+z_2.
\end{align*}
Use the augmented Lyapunov function
\begin{align*}
V(x_1,z_2)=\frac{1}{2}x_1^2+\frac{1}{2}z_2^2.
\end{align*}
Since $\alpha_1'(x_1)=-1-3x_1^2$, differentiating $z_2=x_2-\alpha_1(x_1)$ gives
\begin{align*}
\dot z_2=u-\alpha_1'(x_1)\dot{x}_1=u+(1+3x_1^2)\dot{x}_1.
\end{align*}
Substituting $\dot{x}_1=-x_1+z_2$ gives
\begin{align*}
\dot z_2=u+(1+3x_1^2)(-x_1+z_2).
\end{align*}
Now differentiate $V$:
\begin{align*}
\dot V=x_1\dot{x}_1+z_2\dot z_2.
\end{align*}
Substituting the two state equations in $(x_1,z_2)$ gives
\begin{align*}
\dot V=x_1(-x_1+z_2)+z_2\left(u+(1+3x_1^2)(-x_1+z_2)\right).
\end{align*}
Expanding only the first product,
\begin{align*}
\dot V=-x_1^2+x_1z_2+z_2\left(u+(1+3x_1^2)(-x_1+z_2)\right).
\end{align*}
Equivalently,
\begin{align*}
\dot V=-x_1^2+z_2\left(x_1+u+(1+3x_1^2)(-x_1+z_2)\right).
\end{align*}
Choose any $c>0$ and set
\begin{align*}
u=-x_1-(1+3x_1^2)(-x_1+z_2)-cz_2.
\end{align*}
Then the coefficient of $z_2$ becomes
\begin{align*}
x_1-x_1-(1+3x_1^2)(-x_1+z_2)-cz_2+(1+3x_1^2)(-x_1+z_2)=-cz_2.
\end{align*}
Therefore
\begin{align*}
\dot V=-x_1^2-cz_2^2.
\end{align*}
This is negative for every $(x_1,z_2)\ne(0,0)$.
Writing the feedback in the original coordinates, use $z_2=x_2+x_1+x_1^3$ and $-x_1+z_2=x_1^3+x_2$ to obtain
\begin{align*}
u=-x_1-(1+3x_1^2)(x_1^3+x_2)-c(x_2+x_1+x_1^3).
\end{align*}
Each term of the feedback has a separate job: $-x_1$ cancels the cross term $x_1z_2$ in $\dot V$, the term $-(1+3x_1^2)(x_1^3+x_2)$ cancels the derivative of the virtual control along the $x_1$ dynamics, and the damping term $-cz_2$ forces $x_2$ to track $\alpha_1(x_1)$.
[/example]
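The backstepping law from this example can be tested in the original coordinates. The sketch below checks the identity $\dot V=-x_1^2-cz_2^2$ on a grid and then simulates the closed loop. For the assumed gain $c=1$ the identity reads $\dot V=-2V$, so $V(x(t))=V(x(0))e^{-2t}$, which gives a reference value for the simulation. The gain, the initial state $(1,-1)$, and the helper names are illustrative assumptions.

```python
import math

def feedback(x1, x2, c):
    """Backstepping law from the example, written in the original coordinates."""
    z2 = x2 + x1 + x1**3
    return -x1 - (1.0 + 3.0 * x1**2) * (x1**3 + x2) - c * z2

def field(x, c):
    """Closed loop: x1' = x1^3 + x2, x2' = u."""
    x1, x2 = x
    return (x1**3 + x2, feedback(x1, x2, c))

def V(x):
    """V = x1^2/2 + z2^2/2 with z2 = x2 + x1 + x1^3."""
    x1, x2 = x
    z2 = x2 + x1 + x1**3
    return 0.5 * x1**2 + 0.5 * z2**2

def V_dot(x, c):
    """Chain rule: V' = x1*x1' + z2*z2' with z2' = x2' + (1 + 3*x1^2)*x1'."""
    x1, x2 = x
    z2 = x2 + x1 + x1**3
    dx1, dx2 = field(x, c)
    return x1 * dx1 + z2 * (dx2 + (1.0 + 3.0 * x1**2) * dx1)

def rk4_step(x, h, c):
    """One classical Runge-Kutta step of size h."""
    k1 = field(x, c)
    k2 = field((x[0] + 0.5 * h * k1[0], x[1] + 0.5 * h * k1[1]), c)
    k3 = field((x[0] + 0.5 * h * k2[0], x[1] + 0.5 * h * k2[1]), c)
    k4 = field((x[0] + h * k3[0], x[1] + h * k3[1]), c)
    return (x[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
            x[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)

c = 1.0                                               # assumed damping gain
grid = [(i * 0.5, j * 0.5) for i in range(-4, 5) for j in range(-4, 5)]
max_residual = max(
    abs(V_dot(x, c) + x[0]**2 + c * (x[1] + x[0] + x[0]**3)**2) for x in grid)

x, h, t_final = (1.0, -1.0), 0.001, 5.0               # assumed start, step, horizon
V0 = V(x)
for _ in range(int(round(t_final / h))):
    x = rk4_step(x, h, c)
V_ratio = V(x) / (V0 * math.exp(-2.0 * t_final))      # close to 1 when V' = -2V holds

print(max_residual, V_ratio)
```

The feedback contains terms up to fifth order in $x_1$, so the control effort grows quickly away from the origin; the simulation uses a small step for that reason.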
## Comparing CLF Design, Sontag Feedback, and Backstepping
The three methods in this chapter answer different parts of the same stabilization question. A control Lyapunov function is the certificate: it says that stabilizing control values exist at every nonzero state. Sontag's formula is a selector: for single-input affine systems it turns that certificate into an explicit feedback. Backstepping is a constructor: for strict-feedback systems it builds both the certificate and the feedback by recursion.
[remark: Choosing a Design Method]
Use CLF reasoning when the main challenge is to certify stabilizability or to compare candidate energy functions. Use Sontag's formula when a scalar-input affine system already has a suitable CLF and an explicit stabilizer is desired. Use backstepping when the plant has a triangular structure and each state can be interpreted as a virtual control for the preceding subsystem.
[/remark]
The methods also differ in regularity and robustness. Artstein's theorem highlights that a CLF may guarantee a continuous feedback only under an additional small-control condition. Sontag's formula gives a concrete feedback but can be nonsmooth away from the origin if the CLF data are nonsmooth. Backstepping usually produces smooth feedback under smooth hypotheses, but the formula may become long and sensitive to modelling errors introduced in early recursive steps.
[explanation: Lyapunov Design Workflow]
A practical nonlinear stabilization workflow begins by identifying the control structure. If the system is affine and low-dimensional, try to guess a CLF from energy, linearization, or physical storage. Compute $a(x)=\nabla V(x)\cdot f(x)$ and $b(x)=\nabla V(x)\cdot g(x)$, then check whether $b(x)=0$ forces $a(x)<0$ away from the equilibrium. If the system is strict-feedback, avoid guessing the full Lyapunov function at once; instead, stabilize the first equation virtually and extend the design one state at a time. In every case, the final verification is the same Lyapunov calculation: the closed-loop derivative must be negative definite on the intended region.
[/explanation]
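The check "does $b(x)=0$ force $a(x)<0$?" from the workflow can be screened on sample points before any proof is attempted. The sketch below applies a hypothetical helper `clf_violations` to two candidates for the double integrator: the quadratic $V=x^\top Px$ from the earlier example, and the candidate $V=\tfrac12(x_1^2+x_2^2)$, which is introduced here only as a contrasting assumption and gives $a=x_1x_2$, $b=x_2$.

```python
def clf_violations(a, b, points, tol=1e-12):
    """Sampled nonzero states where b(x) vanishes (up to tol) while a(x) >= 0."""
    return [x for x in points
            if any(x) and abs(b(x)) <= tol and a(x) >= 0.0]

# Sample grid on [-2, 2]^2 with spacing 1/4; the origin is skipped by any(x).
grid = [(i / 4.0, j / 4.0) for i in range(-8, 9) for j in range(-8, 9)]

# Candidate 1: V = x^T P x from the double-integrator example.
def a_quadratic(x):
    return (2.5 * x[0] + 0.5 * x[1]) * x[1]

def b_quadratic(x):
    return 0.5 * x[0] + 0.5 * x[1]

# Candidate 2 (contrasting assumption): V = (x1^2 + x2^2) / 2.
def a_naive(x):
    return x[0] * x[1]

def b_naive(x):
    return x[1]

good = clf_violations(a_quadratic, b_quadratic, grid)
bad = clf_violations(a_naive, b_naive, grid)

print("violations for x^T P x    :", len(good))
print("violations for |x|^2 / 2  :", len(bad))
```

For the second candidate every sampled state on the axis $x_2=0$ is flagged, because there $\dot V=x_1x_2+x_2u$ vanishes for every input. A sampled screen like this can only reject a candidate; an empty list of violations still has to be backed by the algebraic argument.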
This chapter closes the Lyapunov part of the course by showing how stability certificates become stabilizing feedback laws. The next chapter changes perspective: instead of prescribing the decrease of a Lyapunov function, it asks when a nonlinear system can be transformed into a linear one by a change of coordinates and feedback, leading to feedback linearisation and normal forms.
# 4. Feedback Linearisation and Normal Forms
Feedback linearisation asks when a nonlinear control system can be transformed, by a change of coordinates and a feedback law, into a linear system. Chapters 2 and 3 used Lyapunov functions to prove stability and design stabilising controls without trying to remove the nonlinearities. Here the viewpoint changes: we use Lie derivatives and differential-geometric rank conditions to decide which nonlinearities are removable and which remain as internal dynamics. The chapter assumes the reader is comfortable with smooth ODEs, the chain rule, local diffeomorphisms and the [inverse function theorem](/theorems/51), basic linear controllability for pairs $(A,B)$, and Lie brackets of vector fields at the level needed to state [Frobenius' theorem](/theorems/2453). The main outputs are input-output linearisation, full-state feedback linearisation, and the [Byrnes-Isidori normal form](/theorems/7612).
## Lie Derivatives and Input-Output Linearisation
The first problem is to understand how an input appears in successive derivatives of a measured output. For a control-affine system, differentiating the output along trajectories separates the autonomous drift from the controlled vector field, so we begin by fixing the class of systems and outputs under discussion.
[definition: Control-Affine System With Output]
Let $U \subset \mathbb R^n$ be open. A single-input control-affine system with output is a system
\begin{align*}
\dot{x}=f(x)+g(x)u, \qquad y=h(x),
\end{align*}
where $f,g:U\to \mathbb R^n$ and $h:U\to \mathbb R$ are smooth maps, $x\in U$, $u\in \mathbb R$, and $y\in \mathbb R$.
[/definition]
The definition isolates the drift $f$, the controlled direction $g$, and the measured quantity $h$. The next problem is to differentiate $h(x(t))$ in a way that records which vector field is driving the motion, which motivates the Lie derivative.
[definition: Lie Derivative]
Let $U\subset \mathbb R^n$ be open, let $X:U\to \mathbb R^n$ be a smooth vector field, and let $h:U\to \mathbb R$ be smooth. The Lie derivative of $h$ along $X$ is the smooth function $L_Xh:U\to\mathbb R$ defined by
\begin{align*}
L_X h(x) := \nabla h(x)\cdot X(x).
\end{align*}
Higher Lie derivatives along the same vector field $X$ are defined recursively by $L_X^0h=h$ and $L_X^{k+1}h=L_X(L_X^k h)$.
[/definition]
Along a trajectory of $\dot{x}=f(x)+g(x)u$, the first output derivative is
\begin{align*}
\dot{y}=L_fh(x)+L_gh(x)u.
\end{align*}
If $L_gh$ vanishes on the operating region, the input does not appear in the first derivative. This raises the next question: at which derivative does the input first appear with a nonzero coefficient?
[definition: Relative Degree]
Let $U\subset \mathbb R^n$ be open, let $f,g:U\to\mathbb R^n$ be smooth vector fields, and let $h:U\to\mathbb R$ be a smooth output map. The control-affine system
\begin{align*}
\dot{x}=f(x)+g(x)u, \qquad y=h(x),
\end{align*}
has relative degree $r$ on $U$ if
\begin{align*}
L_gL_f^k h(x)=0 \quad \text{for all }x\in U\text{ and }k=0,\dots,r-2,
\end{align*}
and
\begin{align*}
L_gL_f^{r-1}h(x)\neq 0 \quad \text{for all }x\in U.
\end{align*}
[/definition]
The integer $r$ is a structural property of the chosen output, not only of the state equation. A poorly chosen output may hide the input for several derivatives, while another output for the same plant may expose the input immediately. The following example shows the calculation in a mechanical system where the input acts on acceleration rather than position.
[example: Relative Degree Of A Pendulum Angle Output]
Consider the torque-driven pendulum
\begin{align*}
\dot{x}_1=x_2, \qquad \dot{x}_2=-\sin x_1+u,
\end{align*}
with output $y=x_1$. In control-affine form, the drift and input vector fields are
\begin{align*}
f(x)=(x_2,-\sin x_1), \qquad g(x)=(0,1),
\end{align*}
and the output map is $h(x)=x_1$.
Since $\nabla h(x)=(1,0)$, the Lie derivative of $h$ along $f$ is
\begin{align*}
L_fh(x)=\nabla h(x)\cdot f(x)=(1,0)\cdot (x_2,-\sin x_1)=x_2.
\end{align*}
The Lie derivative of $h$ along $g$ is
\begin{align*}
L_gh(x)=\nabla h(x)\cdot g(x)=(1,0)\cdot (0,1)=0.
\end{align*}
Thus the input does not appear in the first output derivative:
\begin{align*}
\dot y=L_fh(x)+L_gh(x)u=x_2+0\cdot u=x_2.
\end{align*}
Now $L_fh(x)=x_2$, so $\nabla(L_fh)(x)=(0,1)$. Hence
\begin{align*}
L_gL_fh(x)=\nabla(L_fh)(x)\cdot g(x)=(0,1)\cdot(0,1)=1.
\end{align*}
Also,
\begin{align*}
L_f^2h(x)=\nabla(L_fh)(x)\cdot f(x)=(0,1)\cdot(x_2,-\sin x_1)=-\sin x_1.
\end{align*}
Therefore the second output derivative is
\begin{align*}
\ddot y=L_f^2h(x)+L_gL_fh(x)u=-\sin x_1+u.
\end{align*}
The relative-degree conditions are $L_gh=0$ and $L_gL_fh=1\neq 0$, so the angle output has relative degree $2$. The calculation shows that torque affects the angle through acceleration, not directly through the first derivative of the angle.
[/example]
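The Lie derivatives in the pendulum example can be cross-checked numerically. The sketch below approximates gradients by central differences, so the results are accurate only up to finite-difference error. The helper names `grad` and `lie` and the test point `x0` are assumptions introduced for this illustration.

```python
import math

def grad(func, x, step=1e-5):
    """Central-difference gradient of a scalar function at the point x."""
    out = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += step
        xm[i] -= step
        out.append((func(xp) - func(xm)) / (2.0 * step))
    return out

def lie(X, h):
    """Return the function x -> L_X h(x) = grad h(x) . X(x)."""
    return lambda x: sum(gi * Xi for gi, Xi in zip(grad(h, x), X(x)))

# Pendulum data from the example: drift f, input field g, output h.
f = lambda x: (x[1], -math.sin(x[0]))
g = lambda x: (0.0, 1.0)
h = lambda x: x[0]

Lfh = lie(f, h)        # expected to approximate x2
Lgh = lie(g, h)        # expected to approximate 0
LgLfh = lie(g, Lfh)    # expected to approximate 1
Lf2h = lie(f, Lfh)     # expected to approximate -sin(x1)

x0 = [0.7, -0.3]       # arbitrary test point (assumption)
print(Lgh(x0), LgLfh(x0), Lf2h(x0))
```

Because `lie` returns a function, higher Lie derivatives are obtained by composition, mirroring the recursive definition $L_X^{k+1}h=L_X(L_X^kh)$.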
The pendulum calculation shows the design opportunity created by a nonzero decoupling coefficient. Once the input appears, we can ask whether a feedback law can prescribe the highest output derivative directly.
[quotetheorem:7610]
[citeproof:7610]
This theorem turns the output channel into a chain of integrators, and each relative-degree hypothesis is needed for that conclusion. The vanishing conditions $L_gL_f^k h=0$ for $k<r-1$ ensure that no input derivative or earlier input term appears before the chosen final equation; otherwise the output would not have the clean integrator-chain structure assumed by linear design. The nonzero decoupling coefficient is equally essential because the feedback divides by $L_gL_f^{r-1}h$ and so becomes singular at points where this coefficient vanishes. For instance, for $\dot{x}=u$ and output $y=x^2$, one has $L_gh(x)=2x$, so the feedback
\begin{align*}
u=\frac{v}{2x}
\end{align*}
fails at $x=0$ even though away from $0$ the output can be assigned by feedback.
We can then choose $v$ by linear design, for instance
\begin{align*}
v=y_d^{(r)}-a_{r-1}(y^{(r-1)}-y_d^{(r-1)})-\cdots-a_0(y-y_d),
\end{align*}
where the polynomial $s^r+a_{r-1}s^{r-1}+\cdots+a_0$ is Hurwitz. The remaining question is what happens to state variables not visible through the output derivatives, as the next example illustrates.
[example: Input-Output Linearisation Of A Cart-Pole Output]
For the cart-pole coordinates $x=(p,\dot p,\theta,\dot\theta)$, write the mechanical equations in block form as
\begin{align*}
d_{11}(x)\ddot p+d_{12}(x)\ddot\theta+c_1(x)=u
\end{align*}
and
\begin{align*}
d_{21}(x)\ddot p+d_{22}(x)\ddot\theta+c_2(x)=0,
\end{align*}
where the input force acts only in the cart equation. Take $y=p$. Then
\begin{align*}
\dot y=\dot p,
\end{align*}
so $u$ does not appear in the first output derivative.
Let
\begin{align*}
\Delta(x)=d_{11}(x)d_{22}(x)-d_{12}(x)d_{21}(x).
\end{align*}
To isolate $\ddot p$, multiply the first equation by $d_{22}(x)$:
\begin{align*}
d_{11}(x)d_{22}(x)\ddot p+d_{12}(x)d_{22}(x)\ddot\theta+d_{22}(x)c_1(x)=d_{22}(x)u.
\end{align*}
Multiply the second equation by $d_{12}(x)$:
\begin{align*}
d_{12}(x)d_{21}(x)\ddot p+d_{12}(x)d_{22}(x)\ddot\theta+d_{12}(x)c_2(x)=0.
\end{align*}
Subtracting the second displayed equation from the first cancels the $\ddot\theta$ term and gives
\begin{align*}
\Delta(x)\ddot p+d_{22}(x)c_1(x)-d_{12}(x)c_2(x)=d_{22}(x)u.
\end{align*}
Hence, on any region where $\Delta(x)\neq 0$,
\begin{align*}
\ddot y=\ddot p=\frac{d_{12}(x)c_2(x)-d_{22}(x)c_1(x)}{\Delta(x)}+\frac{d_{22}(x)}{\Delta(x)}u.
\end{align*}
Thus the decoupling coefficient for the cart-position output is
\begin{align*}
b(x)=\frac{d_{22}(x)}{\Delta(x)}.
\end{align*}
On a regular region where $b(x)\neq 0$, the feedback
\begin{align*}
u=\frac{v-\frac{d_{12}(x)c_2(x)-d_{22}(x)c_1(x)}{\Delta(x)}}{\frac{d_{22}(x)}{\Delta(x)}}
\end{align*}
is well-defined. Multiplying numerator and denominator by $\Delta(x)$ gives the equivalent form
\begin{align*}
u=\frac{\Delta(x)v-d_{12}(x)c_2(x)+d_{22}(x)c_1(x)}{d_{22}(x)}.
\end{align*}
Substitution into the expression for $\ddot y$ yields
\begin{align*}
\ddot y=\frac{d_{12}c_2-d_{22}c_1}{\Delta}+\frac{d_{22}}{\Delta}\cdot\frac{\Delta v-d_{12}c_2+d_{22}c_1}{d_{22}}=v.
\end{align*}
The output channel is therefore a double integrator in the new input $v$, while the remaining variables $\theta$ and $\dot\theta$ are not eliminated; after the feedback, their evolution is still determined by the coupled pole equation.
[/example]
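The algebra in the cart-pole example can be checked at a sample state. The coefficients below follow one common textbook parametrisation and are assumptions for this sketch only, since sign conventions and parameter values differ between sources. The code computes $u$ from the feedback formula, solves the two block equations for the accelerations, and confirms that $\ddot p$ matches the chosen $v$.

```python
import math

# Illustrative parameters (assumed values, not from the notes).
M_CART, M_POLE, LENGTH, GRAV = 1.0, 0.2, 0.5, 9.81

def coefficients(x):
    """Block coefficients d11, d12, d21, d22, c1, c2 for one assumed cart-pole model."""
    _, _, theta, theta_dot = x
    d11 = M_CART + M_POLE
    d12 = d21 = M_POLE * LENGTH * math.cos(theta)
    d22 = M_POLE * LENGTH ** 2
    c1 = -M_POLE * LENGTH * theta_dot ** 2 * math.sin(theta)
    c2 = -M_POLE * GRAV * LENGTH * math.sin(theta)
    return d11, d12, d21, d22, c1, c2

def linearising_input(x, v):
    """u = (Delta v - d12 c2 + d22 c1) / d22, valid where d22 != 0."""
    d11, d12, d21, d22, c1, c2 = coefficients(x)
    delta = d11 * d22 - d12 * d21
    return (delta * v - d12 * c2 + d22 * c1) / d22

def accelerations(x, u):
    """Solve the two block equations for (p_ddot, theta_ddot) by Cramer's rule."""
    d11, d12, d21, d22, c1, c2 = coefficients(x)
    delta = d11 * d22 - d12 * d21
    r1, r2 = u - c1, -c2
    p_ddot = (r1 * d22 - d12 * r2) / delta
    theta_ddot = (d11 * r2 - d21 * r1) / delta
    return p_ddot, theta_ddot

x = (0.0, 0.3, 0.4, -1.2)   # sample state (p, p_dot, theta, theta_dot)
v = 0.75                    # desired cart acceleration
u = linearising_input(x, v)
p_ddot, theta_ddot = accelerations(x, u)
print(p_ddot, theta_ddot)
```

The pole acceleration `theta_ddot` is whatever the second block equation dictates once $\ddot p=v$ is imposed, which is the numerical face of the remark that $\theta$ and $\dot\theta$ are not eliminated by the feedback.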
## Full-State Linearisation and Diffeomorphisms
Input-output linearisation controls only the differentiated output. The stronger problem is to ask when all $n$ state variables can be converted into a controllable linear chain by a smooth coordinate change and a feedback law.
[definition: Diffeomorphism]
Let $U,V\subset\mathbb R^n$ be open. A map $\Phi:U\to V$ is a diffeomorphism if $\Phi$ is smooth, bijective, its inverse $\Phi^{-1}:V\to U$ is smooth, and $D\Phi_x:\mathbb R^n\to\mathbb R^n$ is invertible for every $x\in U$.
[/definition]
The diffeomorphism requirement prevents the coordinate transformation from folding the state space or losing dimension. The next task is to name the precise [equivalence relation](/page/Equivalence%20Relation) between a nonlinear system and a linear controllable model.
[definition: Feedback Equivalence To A Linear System]
Let $U\subset\mathbb R^n$ be open and let $f,g:U\to\mathbb R^n$ be smooth vector fields. The control-affine system $\dot{x}=f(x)+g(x)u$ is locally feedback equivalent to a controllable linear system near $x_0\in U$ if there are neighbourhoods $W\subset U$ of $x_0$ and $V\subset\mathbb R^n$ of $0$, a diffeomorphism $z=\Phi(x):W\to V$, and smooth functions $\alpha,\beta:W\to\mathbb R$ with $\beta(x)\neq 0$, such that the feedback $u=\alpha(x)+\beta(x)v$ transforms the system into
\begin{align*}
\dot{z}=Az+Bv,
\end{align*}
where $(A,B)$ is a controllable linear pair.
[/definition]
This definition says what success looks like, but it does not yet provide a test. The test must measure how the input direction changes when transported by the drift, and this leads to Lie brackets of vector fields.
[definition: Lie Bracket Of Vector Fields]
Let $X,Y:U\to\mathbb R^n$ be smooth vector fields on an open set $U\subset\mathbb R^n$. Their Lie bracket is the vector field $[X,Y]:U\to\mathbb R^n$ defined by
\begin{align*}
[X,Y](x):=DY_x(X(x))-DX_x(Y(x)).
\end{align*}
For a drift $f$ and input vector field $g$, define $\operatorname{ad}_f^0g=g$ and $\operatorname{ad}_f^{k+1}g=[f,\operatorname{ad}_f^kg]$.
[/definition]
The span of these iterated brackets is the nonlinear counterpart of the controllability matrix. Full rank is necessary, but not sufficient for constructing coordinates; the lower-dimensional directions must also close under brackets, which motivates involutivity.
The notation is borrowed from differential geometry but is used only locally here. The symbol $\operatorname{Gr}(k,n)$ denotes the Grassmannian of $k$-dimensional linear subspaces of $\mathbb R^n$. If $M$ is a smooth manifold, then $TM$ denotes its tangent bundle, and $\Gamma(E)$ denotes the smooth sections of a vector subbundle $E\subseteq TM$.
Coordinate construction fails if the admissible directions are not closed under infinitesimal commutators. The obstruction is concrete: a pair of vector fields can each lie in the proposed distribution while their bracket points outside it; then small commutator loops drift away from any candidate coordinate leaf, so the directions cannot be tangent to a family of local slices.
[definition: Involutive Distribution]
Let $U\subset\mathbb R^n$ be open. A smooth distribution is a map $\Delta:U\to\bigcup_{k=0}^n \operatorname{Gr}(k,n)$ such that each $\Delta(x)\subseteq\mathbb R^n$ is a linear subspace spanned locally by smooth vector fields. The distribution $\Delta$ is involutive if, whenever smooth vector fields $X,Y:U\to\mathbb R^n$ take values in $\Delta$, the bracket $[X,Y]:U\to\mathbb R^n$ also takes values in $\Delta$.
[/definition]
Involutivity is the compatibility condition for straightening a family of directions into coordinate planes. The obstruction is that a plane field may twist as one moves through space, so that following two admissible directions and then comparing the resulting infinitesimal motions produces a direction outside the original field. The geometric question is therefore whether the bracket-closure condition is exactly what is needed for the directions to be tangent to genuine local coordinate slices.
[quotetheorem:1522]
This theorem is quoted from differential geometry. Its hypotheses are not cosmetic: constant rank prevents the dimension of the admissible direction field from jumping, while involutivity is the condition that the proposed coordinate planes fit together consistently. A concrete obstruction is the distribution on $\mathbb R^3$ spanned by $X=\partial_x+y\partial_z$ and $Y=\partial_y$; here $[X,Y]=-\partial_z$, which is not in $\operatorname{span}\{X,Y\}$, so the two-plane field twists out of itself and cannot be straightened into surfaces $z_3=\text{constant}$. In the feedback-linearisation theorem, Frobenius supplies the coordinate-straightening step needed after the algebraic rank condition has identified enough independent directions.
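The bracket in this obstruction can be reproduced numerically from the definition $[X,Y](x)=DY_x(X(x))-DX_x(Y(x))$. The sketch below uses central-difference Jacobians; the helper names and the evaluation point are assumptions made for illustration. It also computes the determinant of the matrix with rows $X$, $Y$, $[X,Y]$, which is nonzero exactly when the bracket leaves $\operatorname{span}\{X,Y\}$ at that point.

```python
def jacobian(F, x, step=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates dF_i/dx_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += step
        xm[j] -= step
        Fp, Fm = F(xp), F(xm)
        cols.append([(Fp[i] - Fm[i]) / (2.0 * step) for i in range(len(Fp))])
    # cols[j][i] holds dF_i/dx_j, so transpose into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

def matvec(A, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]

def lie_bracket(X, Y, x):
    """[X, Y](x) = DY_x X(x) - DX_x Y(x)."""
    first = matvec(jacobian(Y, x), X(x))
    second = matvec(jacobian(X, x), Y(x))
    return [fi - si for fi, si in zip(first, second)]

def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix with rows r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))

X = lambda p: [1.0, 0.0, p[1]]   # d/dx + y d/dz
Y = lambda p: [0.0, 1.0, 0.0]    # d/dy

point = [0.3, -1.1, 2.0]         # arbitrary evaluation point (assumption)
bracket = lie_bracket(X, Y, point)
volume = det3(X(point), Y(point), bracket)
print(bracket, volume)
```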
The preceding definitions give two separate tests: the bracket-generated directions must span the tangent space, and the lower bracket directions must be integrable enough to define level sets of a candidate output. The next result combines these tests into a sufficient local construction of a coordinate system with relative degree equal to the full state dimension.
[quotetheorem:7611]
[citeproof:7611]
Full-state feedback linearisation is powerful but restrictive, and each hypothesis rules out a different failure mode. If $\dim\mathcal C(x_0)<n$, the iterated input directions do not span the tangent space, just as a linear pair with deficient controllability matrix cannot be transformed into a controllable canonical form. If the rank of $\Delta_{n-1}$ changes nearby, any proposed coordinate construction is singular at the rank-change point. Rank alone is also insufficient: the non-involutive distribution spanned by $\partial_x+y\partial_z$ and $\partial_y$ has constant rank two, but its bracket produces the missing $\partial_z$ direction, so there is no family of local hypersurfaces whose tangent spaces are exactly the distribution. This is why the criterion combines a Kalman-like rank test with the Frobenius condition rather than using rank by itself. A fully actuated joint model shows the favourable case.
[example: Exact Linearisation Of A Robotic Joint Model]
Consider a single robotic joint with inertia $M(q)>0$, Coriolis/friction term $C(q,\dot q)$, gravity term $G(q)$, and torque input $u$:
\begin{align*}
M(q)\ddot q+C(q,\dot q)+G(q)=u.
\end{align*}
Set $x_1=q$ and $x_2=\dot q$. Then $\dot{x}_1=\dot q=x_2$, and $M(x_1)>0$ guarantees that $M(x_1)^{-1}$ exists for the division carried out below. Substituting $\ddot q=\dot{x}_2$ into the second-order equation gives
\begin{align*}
M(x_1)\dot{x}_2+C(x_1,x_2)+G(x_1)=u.
\end{align*}
Subtracting $C(x_1,x_2)+G(x_1)$ from both sides gives
\begin{align*}
M(x_1)\dot{x}_2=u-C(x_1,x_2)-G(x_1).
\end{align*}
Multiplying by $M(x_1)^{-1}$ gives
\begin{align*}
\dot{x}_2=M(x_1)^{-1}\bigl(u-C(x_1,x_2)-G(x_1)\bigr).
\end{align*}
Choose the feedback
\begin{align*}
u=C(x_1,x_2)+G(x_1)+M(x_1)v.
\end{align*}
Substituting this expression for $u$ into the $\dot{x}_2$ equation gives
\begin{align*}
\dot{x}_2=M(x_1)^{-1}\bigl(C(x_1,x_2)+G(x_1)+M(x_1)v-C(x_1,x_2)-G(x_1)\bigr).
\end{align*}
The Coriolis/friction and gravity terms cancel inside the parentheses, so
\begin{align*}
\dot{x}_2=M(x_1)^{-1}M(x_1)v.
\end{align*}
Because $M(x_1)^{-1}M(x_1)=1$, this becomes
\begin{align*}
\dot{x}_2=v.
\end{align*}
Thus the closed-loop state equations are $\dot{x}_1=x_2$ and $\dot{x}_2=v$, so the joint is transformed into a double integrator driven by the new input $v$.
[/example]
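The cancellation can be exercised in simulation. In the sketch below the functions `M`, `C`, `G` are hypothetical placeholders chosen only so that $M(q)>0$, and the outer loop $v=-4x_1-4x_2$ is one possible linear design whose characteristic polynomial $s^2+4s+4$ is Hurwitz. The code checks the pointwise identity $\dot x_2=v$ and then integrates the closed loop with a classical Runge-Kutta step.

```python
import math

# Hypothetical joint model terms (illustrative values, not from the notes).
def M(q):
    return 1.0 + 0.5 * math.cos(q) ** 2   # inertia, always positive

def C(q, dq):
    return 0.4 * dq                       # friction-like term

def G(q):
    return 4.9 * math.sin(q)              # gravity term

def feedback(x, v):
    """Cancelling feedback u = C + G + M v."""
    q, dq = x
    return C(q, dq) + G(q) + M(q) * v

def plant(x, u):
    """State equations x1' = x2, x2' = M^{-1} (u - C - G)."""
    q, dq = x
    return (dq, (u - C(q, dq) - G(q)) / M(q))

def closed_loop(x):
    v = -4.0 * x[0] - 4.0 * x[1]   # s^2 + 4 s + 4 has a double root at -2
    return plant(x, feedback(x, v))

def rk4_step(rhs, x, dt):
    k1 = rhs(x)
    k2 = rhs(tuple(xi + 0.5 * dt * ki for xi, ki in zip(x, k1)))
    k3 = rhs(tuple(xi + 0.5 * dt * ki for xi, ki in zip(x, k2)))
    k4 = rhs(tuple(xi + dt * ki for xi, ki in zip(x, k3)))
    return tuple(xi + dt * (p + 2.0 * q + 2.0 * r + s) / 6.0
                 for xi, p, q, r, s in zip(x, k1, k2, k3, k4))

# Pointwise check: with the cancelling feedback, the acceleration equals v.
x_test = (0.3, -0.7)
acc_check = plant(x_test, feedback(x_test, 1.5))[1]

# Closed-loop simulation from rest at q = 1 rad, 10 seconds with dt = 0.01.
x = (1.0, 0.0)
for _ in range(1000):
    x = rk4_step(closed_loop, x, 0.01)
print(acc_check, x)
```

The simulation assumes the model terms are known exactly. With mismatched `C` or `G` in the feedback, the cancellation is only approximate and the closed loop is no longer a pure double integrator.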
## Zero Dynamics and Minimum-Phase Behaviour
When the relative degree $r$ is smaller than the state dimension $n$, input-output linearisation leaves $n-r$ internal coordinates. The central question is whether forcing the output to zero produces stable hidden motion or an unstable internal response.
[definition: Zero Dynamics Manifold]
Let $U\subset\mathbb R^n$ be open, let $f,g:U\to\mathbb R^n$ be smooth vector fields, and let $h:U\to\mathbb R$ be a smooth output map. Suppose the control-affine system
\begin{align*}
\dot{x}=f(x)+g(x)u, \qquad y=h(x),
\end{align*}
has relative degree $r$ on $U$. Suppose the constraint map $H:U\to\mathbb R^r$ defined by
\begin{align*}
H(x):=(h(x),L_fh(x),\dots,L_f^{r-1}h(x))
\end{align*}
has constant rank $r$ on the zero set $H^{-1}(0)$. The zero dynamics manifold is the embedded submanifold $\mathcal Z\subset U$ defined by
\begin{align*}
\mathcal Z:=\{x\in U: h(x)=L_fh(x)=\cdots=L_f^{r-1}h(x)=0\}.
\end{align*}
[/definition]
The manifold $\mathcal Z$ is the set where the output and its first $r-1$ derivatives vanish. To keep the trajectory on this set, the input must be chosen so that the next derivative also vanishes.
[definition: Zero Dynamics]
Let $U\subset\mathbb R^n$ be open, let $f,g:U\to\mathbb R^n$ be smooth vector fields, and let $h:U\to\mathbb R$ be a smooth output map. Suppose $\dot{x}=f(x)+g(x)u$, $y=h(x)$ has relative degree $r$ on $U$, and let $\mathcal Z\subset U$ be its zero dynamics manifold. Let
\begin{align*}
U_0:=\{x\in U: L_gL_f^{r-1}h(x)\neq 0\}.
\end{align*}
The zeroing feedback is the map $u_0:U_0\to\mathbb R$ defined by
\begin{align*}
u_0(x):=-\frac{L_f^rh(x)}{L_gL_f^{r-1}h(x)}.
\end{align*}
The zero dynamics are the dynamics induced on $\mathcal Z\cap U_0$ by
\begin{align*}
\dot{x}=f(x)+g(x)u_0(x).
\end{align*}
[/definition]
Zero dynamics are not an extra modelling choice; they are forced by the demand that the output remain identically zero. Their stability is the nonlinear analogue of stable transmission zeros in linear systems. This motivates the minimum-phase condition.
[definition: Minimum-Phase Nonlinear System]
Let $U\subset\mathbb R^n$ be open, let $f,g:U\to\mathbb R^n$ be smooth vector fields, and let $h:U\to\mathbb R$ be a smooth output map. Suppose the control-affine system $\dot{x}=f(x)+g(x)u$, $y=h(x)$ has relative degree $r$ on $U$, with zero dynamics manifold $\mathcal Z\subset U$ and zeroing feedback $u_0:U_0\to\mathbb R$. The system is minimum phase near an equilibrium $x^*\in U$ if $x^*\in\mathcal Z\cap U_0$ and $x^*$ is a locally asymptotically stable equilibrium of the vector field $x\mapsto f(x)+g(x)u_0(x)$ restricted to $\mathcal Z\cap U_0$.
[/definition]
Minimum-phase behaviour is what allows aggressive output tracking without exciting unstable hidden motion. If the zero dynamics are unstable, the output can look well-controlled while the internal state diverges. Aircraft pitch control provides a standard example where this distinction matters.
[example: Zero Dynamics Of A Nonlinear Aircraft Pitch Model]
Use coordinates $x=(\gamma,q,\alpha)$, where $\gamma$ is flight-path angle, $q$ is pitch rate, and $\alpha$ is angle of attack. A common simplified pitch model has the form
\begin{align*}
\dot{\gamma}=q-F(\alpha), \qquad \dot{q}=M(\alpha,q)+B(\alpha)\delta, \qquad \dot{\alpha}=A(\alpha,q),
\end{align*}
where $\delta$ is elevator deflection. Take $y=\gamma$ and work on a regular flight envelope where $B(\alpha)\neq 0$.
The first output derivative is
\begin{align*}
\dot{y}=\dot{\gamma}=q-F(\alpha).
\end{align*}
Differentiating once more along trajectories gives
\begin{align*}
\ddot{y}=\dot{q}-F'(\alpha)\dot{\alpha}.
\end{align*}
Substituting the state equations for $\dot q$ and $\dot \alpha$ gives
\begin{align*}
\ddot{y}=M(\alpha,q)+B(\alpha)\delta-F'(\alpha)A(\alpha,q).
\end{align*}
Thus the elevator appears in the second output derivative with coefficient $B(\alpha)$, so the zeroing feedback is obtained by imposing $\ddot y=0$:
\begin{align*}
0=M(\alpha,q)+B(\alpha)\delta_0-F'(\alpha)A(\alpha,q).
\end{align*}
Subtracting $M(\alpha,q)-F'(\alpha)A(\alpha,q)$ from both sides gives
\begin{align*}
B(\alpha)\delta_0=F'(\alpha)A(\alpha,q)-M(\alpha,q).
\end{align*}
Since $B(\alpha)\neq 0$, division by $B(\alpha)$ gives
\begin{align*}
\delta_0=\frac{F'(\alpha)A(\alpha,q)-M(\alpha,q)}{B(\alpha)}.
\end{align*}
The zero-output constraints are $y=0$ and $\dot y=0$, hence
\begin{align*}
\gamma=0
\end{align*}
and
\begin{align*}
q-F(\alpha)=0.
\end{align*}
The second constraint is equivalent to
\begin{align*}
q=F(\alpha).
\end{align*}
Restricting the remaining state equation $\dot{\alpha}=A(\alpha,q)$ to the zero dynamics manifold therefore gives the internal equation
\begin{align*}
\dot{\alpha}=A(\alpha,F(\alpha)).
\end{align*}
If $\alpha_*$ is a trim point, then it satisfies
\begin{align*}
A(\alpha_*,F(\alpha_*))=0.
\end{align*}
Linearising the scalar internal equation at $\alpha_*$ gives
\begin{align*}
\dot{\tilde{\alpha}}=\left(\frac{\partial A}{\partial \alpha}(\alpha_*,F(\alpha_*))+\frac{\partial A}{\partial q}(\alpha_*,F(\alpha_*))F'(\alpha_*)\right)\tilde{\alpha}.
\end{align*}
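The aircraft is minimum phase at this trim when the displayed coefficient is negative, since the scalar internal equation is then locally exponentially stable. When the coefficient is positive, the flight-path angle can be held at zero while the hidden angle-of-attack dynamics move away from trim. A zero coefficient leaves the linear test inconclusive, and stability is then decided by higher-order terms of $A(\alpha,F(\alpha))$.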
[/example]
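The minimum-phase test in this example reduces to the sign of one scalar. The sketch below estimates it as the slope of $\alpha\mapsto A(\alpha,F(\alpha))$ at the trim point, which by the chain rule equals the displayed coefficient. The functions `F`, `A_stable`, and `A_unstable` are hypothetical placeholders, not aircraft data, chosen so that $\alpha_*=0$ is a trim point in both cases.

```python
import math

def internal_coefficient(A, F, alpha_star, step=1e-6):
    """Central-difference slope of alpha -> A(alpha, F(alpha)) at the trim point."""
    phi = lambda al: A(al, F(al))
    return (phi(alpha_star + step) - phi(alpha_star - step)) / (2.0 * step)

# Hypothetical model functions (placeholders, not aircraft data).
F = lambda al: math.sin(al)
A_stable = lambda al, q: -al - 0.5 * q     # slope at trim: -1 - 0.5 * F'(0) = -1.5
A_unstable = lambda al, q: al - 0.5 * q    # slope at trim:  1 - 0.5 * F'(0) =  0.5

alpha_star = 0.0   # trim: A(0, F(0)) = 0 for both placeholder models
coef_stable = internal_coefficient(A_stable, F, alpha_star)
coef_unstable = internal_coefficient(A_unstable, F, alpha_star)
print(coef_stable, coef_unstable)
```

A negative value indicates locally stable internal dynamics for that placeholder model, and a positive value indicates that holding $\gamma=0$ would drive $\alpha$ away from trim.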
## Byrnes-Isidori Normal Form
The last problem is to write the preceding decomposition in coordinates. The Byrnes-Isidori normal form separates the externally linearised chain from the internal zero dynamics and is the standard local model for analysing tracking and robustness.
[quotetheorem:7612]
[citeproof:7612]
In these coordinates, the zero dynamics are obtained by setting $\xi=0$ and $v=0$:
\begin{align*}
\dot{\eta}=q(0,\eta).
\end{align*}
The extra regularity hypotheses in the theorem are what make this decomposition a genuine coordinate statement rather than only a formal calculation of derivatives. If the output-derivative differentials lose rank, then $(h,L_fh,\dots,L_f^{r-1}h)$ cannot be part of a local chart; for example, $\dot{x}=u$ with $y=x^2$ has a singular output coordinate at $x=0$ because $dy=2x\,dx$ vanishes there. If no completion functions satisfying $L_g\psi_j=0$ can be found, the input leaks into the proposed internal coordinates and the displayed split between external and internal dynamics is false. Thus normal form makes visible the distinction between exact tracking of $y$ and stability of the hidden subsystem, but only away from singular points of the coordinate map and the decoupling coefficient. A controller based only on the chain of integrators is acceptable only when the corresponding internal dynamics are stable on the operating region.
[remark: Local Nature Of Feedback Linearisation]
All feedback-linearisation statements in this chapter are local unless global hypotheses are added. Singularities occur when the decoupling coefficient $L_gL_f^{r-1}h$ vanishes or when the coordinate map ceases to be injective. Separately, actuator limits can make the cancelling feedback infeasible even where it is well-defined. For these reasons, exact cancellation is often paired with robust or Lyapunov-based design in applications.
[/remark]
The chapter therefore gives a hierarchy. Relative degree and input-output linearisation control what the measured output can do; full-state feedback linearisation asks whether the whole state can be turned into a linear controllable system; zero dynamics and the Byrnes-Isidori form explain what remains when only part of the state is externally linearised.
The feedback linearisation chapter has identified the structural conditions under which part or all of a nonlinear system can be made linear. The next chapter leaves full-state design and turns to the harder case of output feedback, where only measurements are available and observers must reconstruct the hidden state.
# 5. Nonlinear Observers and Output Feedback
This chapter turns from state-feedback design to the harder situation in which the controller does not see the full state. The background expected is local stability theory for nonlinear systems, linear observability and detectability, Lyapunov methods, and the linear Kalman filter or at least its Riccati equation. In linear theory the Kalman filter and the separation principle give a clean route: estimate the state, then apply the state-feedback law to the estimate. Nonlinear systems keep the same ambition but lose the global linear superposition structure that made the linear argument work. The chapter studies what remains: high-gain observers for observable normal forms, extended Kalman filtering as a local estimation method, and the restrictions under which output-feedback stabilization can be justified.
## Detectability Beyond Linear Systems
The first question is what it should mean for an unmeasured nonlinear state to be recoverable from an output signal. For a linear system, detectability says that all unstable modes are visible through the output. For a nonlinear system, the corresponding idea must account for trajectories, nonlinear output maps, and the possibility that two different initial states produce the same output for a finite or infinite time interval.
[definition: Nonlinear Observability Along Inputs]
Let $U \subseteq \mathbb R^m$, let $f:\mathbb R^n\times U\to\mathbb R^n$, and let $h:\mathbb R^n\to\mathbb R^p$. For an admissible input $u:[0,T]\to U$, consider the state equation and output map
\begin{align*}
\dot{x}=f(x,u), \qquad y=h(x),
\end{align*}
The system is observable on a set $D \subseteq \mathbb R^n$ over $[0,T]$ for the input $u$ if, whenever two solutions $x_1,x_2:[0,T]\to\mathbb R^n$ generated by $u$ with $x_1(0),x_2(0)\in D$ satisfy
\begin{align*}
h(x_1(t)) = h(x_2(t)) \quad \text{for all } t\in[0,T],
\end{align*}
then $x_1(0)=x_2(0)$.
[/definition]
This definition separates the geometric question of distinguishability from the algorithmic question of building an observer. It also shows why nonlinear observability is often local and input-dependent: two states may be distinguishable under one input but not under another. For stabilization, however, exact reconstruction can be more than is needed, which motivates a weaker asymptotic notion.
[definition: Nonlinear Detectability]
Let $f:\mathbb R^n\to\mathbb R^n$ be a vector field, let $h:\mathbb R^n\to\mathbb R^p$ be an output map, and consider
\begin{align*}
\dot{x}=f(x), \qquad y=h(x).
\end{align*}
Let $D\subseteq \mathbb R^n$ be a set of initial states from which the solution exists for all $t\ge 0$, and let $\varphi_t:D\to\mathbb R^n$ denote the time-$t$ flow map, $x_0\mapsto \varphi_t(x_0)$. The set $D$ is detectable if for any $x_1,x_2\in D$ such that
\begin{align*}
h(\varphi_t(x_1))=h(\varphi_t(x_2)) \quad \text{for all } t\ge 0,
\end{align*}
the distance between the two trajectories satisfies $|\varphi_t(x_1)-\varphi_t(x_2)|\to 0$ as $t\to\infty$.
[/definition]
Detectability is weaker than observability because indistinguishable states are allowed, provided their future behaviour becomes asymptotically the same. This is the right property for stabilization: a controller need not reconstruct state components whose errors decay without intervention. The simplest illustration is a hidden stable coordinate.
[example: Unobservable Stable Mode]
Consider the system
\begin{align*}
\dot{x}_1=-x_1, \qquad \dot{x}_2=x_2, \qquad y=x_2.
\end{align*}
For an initial state $x(0)=(a,b)$, the two scalar equations solve to
\begin{align*}
x_1(t)=ae^{-t}, \qquad x_2(t)=be^t,
\end{align*}
because $\frac{d}{dt}(ae^{-t})=-ae^{-t}$ and $\frac{d}{dt}(be^t)=be^t$. Hence the output from $(a,b)$ is $y(t)=be^t$.
The system is not observable on any set containing two points $(a,b)$ and $(\tilde a,b)$ with $a\ne \tilde a$, since their outputs satisfy
\begin{align*}
y(t)=be^t=\tilde y(t)
\end{align*}
for every $t\ge 0$, while their initial states are different. However, the corresponding state difference is
\begin{align*}
x(t)-\tilde x(t)=(ae^{-t}-\tilde a e^{-t}, be^t-be^t)=((a-\tilde a)e^{-t},0).
\end{align*}
Therefore
\begin{align*}
|x(t)-\tilde x(t)|=|a-\tilde a|e^{-t},
\end{align*}
which tends to $0$ as $t\to\infty$. Thus the unmeasured coordinate is not reconstructible from the output, but its estimation error decays on its own, so detectability permits such hidden stable dynamics.
[/example]
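The two claims of the example, identical outputs and a decaying state gap, can be tabulated from the closed-form solutions. The initial values `a`, `a_tilde`, `b` and the sample times below are arbitrary choices made for this sketch.

```python
import math

def state(a, b, t):
    """Closed-form solution of x1' = -x1, x2' = x2 from x(0) = (a, b)."""
    return (a * math.exp(-t), b * math.exp(t))

def output(a, b, t):
    """Measured output y = x2."""
    return state(a, b, t)[1]

a, a_tilde, b = 2.0, -1.0, 0.5        # two initial states sharing the same b
times = [0.0, 1.0, 5.0, 10.0]

# The outputs coincide at every sampled time, so the states are indistinguishable.
same_output = all(output(a, b, t) == output(a_tilde, b, t) for t in times)

# The Euclidean gap between the two trajectories equals |a - a_tilde| e^{-t}.
gaps = []
for t in times:
    x, x_tilde = state(a, b, t), state(a_tilde, b, t)
    gaps.append(math.hypot(x[0] - x_tilde[0], x[1] - x_tilde[1]))
print(same_output, gaps)
```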
The example distinguishes what must be estimated from what may be allowed to decay. To design an actual observer, one needs more than detectability as a qualitative property: the measured output must expose the hidden coordinates in a form that can be fed back dynamically. The useful case is when differentiating the output successively reveals the state one coordinate at a time, leaving only known nonlinear terms to be compensated.
[definition: Observable Chain Form]
A single-output system is in observable chain form on a domain $D\subseteq\mathbb R^n$ if it can be written as
\begin{align*}
\dot{x}_i=x_{i+1}+\phi_i(x,u) \text{ for } 1\le i<n, \qquad \dot{x}_n=\phi_n(x,u), \qquad y=x_1,
\end{align*}
where the input $u$ takes values in a set $U\subseteq\mathbb R^m$ and each $\phi_i:D\times U\to\mathbb R$ is locally Lipschitz.
[/definition]
The chain form mirrors repeated differentiation of the output: $x_1$ is measured, $x_2$ appears in the first derivative, and so on. This makes it possible to replace numerical differentiation by a dynamic estimator with gains placed through a Hurwitz polynomial. This motivates the following definition.
[definition: High-Gain Observer]
For a system in observable chain form, choose constants $a_1,\dots,a_n$ such that the polynomial
\begin{align*}
s^n+a_1s^{n-1}+\cdots+a_{n-1}s+a_n
\end{align*}
is Hurwitz. For $\varepsilon>0$, the high-gain observer is
\begin{align*}
\dot{\hat{x}}_i=\hat{x}_{i+1}+\phi_i(\hat{x},u)+\frac{a_i}{\varepsilon^i}(y-\hat{x}_1) \text{ for } 1\le i<n, \qquad \dot{\hat{x}}_n=\phi_n(\hat{x},u)+\frac{a_n}{\varepsilon^n}(y-\hat{x}_1).
\end{align*}
[/definition]
The gains become large as $\varepsilon$ decreases, so the observer reacts rapidly to output mismatch. This gives fast convergence for nominal models, but it also amplifies measurement noise and can create a peaking transient before the estimate enters a useful neighbourhood. The following convergence theorem states the guarantee obtained under boundedness and local Lipschitz hypotheses.
[quotetheorem:7613]
[citeproof:7613]
The theorem is local in several separate senses, and each hypothesis excludes a real failure mode. Compact positive invariance prevents the plant from leaving the region where the observable chain model and Lipschitz constants are valid; without it, even a correct local observer may be estimating the wrong coordinates after the trajectory exits the chart. Bounded input is needed because the nonlinear terms $\phi_i(x,u)$ may have Lipschitz constants that grow with $u$; a rapidly growing input can dominate the high-gain correction. Hurwitzness of the injection polynomial is the linear core of the argument, and if the companion matrix has an eigenvalue with non-negative real part then the scaled error equation is not exponentially stable. Finally, observer boundedness cannot be ignored: peaking can push $\hat{x}$ outside the neighbourhood $N$, at which point the proof no longer controls the nonlinear remainder. In implementation, saturation is often added to the nonlinear terms to prevent peaking from pushing the observer far outside the modelled operating set. A pendulum shows the construction in a familiar mechanical model.
[example: Pendulum With Angle-Only Sensing]
For a pendulum with angle $x_1=\theta$, angular velocity $x_2=\dot{\theta}$, and input torque $u$, the model is
\begin{align*}
\dot{x}_1=x_2, \qquad \dot{x}_2=-\frac{g}{\ell}\sin x_1+\frac{1}{m\ell^2}u, \qquad y=x_1.
\end{align*}
Comparing this with the observable chain form for $n=2$,
\begin{align*}
\dot{x}_1=x_2+\phi_1(x,u), \qquad \dot{x}_2=\phi_2(x,u), \qquad y=x_1,
\end{align*}
gives
\begin{align*}
\phi_1(x,u)=0, \qquad \phi_2(x,u)=-\frac{g}{\ell}\sin x_1+\frac{1}{m\ell^2}u.
\end{align*}
Choose $a_1,a_2$ so that $s^2+a_1s+a_2$ is Hurwitz. The corresponding high-gain observer is
\begin{align*}
\dot{\hat{x}}_1=\hat{x}_2+\frac{a_1}{\varepsilon}(y-\hat{x}_1), \qquad \dot{\hat{x}}_2=-\frac{g}{\ell}\sin \hat{x}_1+\frac{1}{m\ell^2}u+\frac{a_2}{\varepsilon^2}(y-\hat{x}_1).
\end{align*}
Since $y=x_1$, the innovation is
\begin{align*}
y-\hat{x}_1=x_1-\hat{x}_1.
\end{align*}
Thus the measured angle corrects the angle estimate through the term $(a_1/\varepsilon)(x_1-\hat{x}_1)$ and corrects the velocity estimate through the term $(a_2/\varepsilon^2)(x_1-\hat{x}_1)$, whose gain is the larger of the two once $\varepsilon$ is small enough.
If the measured angle is noisy, say $y_m=x_1+\nu$, then the innovation used by the observer becomes
\begin{align*}
y_m-\hat{x}_1=(x_1-\hat{x}_1)+\nu.
\end{align*}
The injected terms are therefore
\begin{align*}
\frac{a_1}{\varepsilon}(y_m-\hat{x}_1)=\frac{a_1}{\varepsilon}(x_1-\hat{x}_1)+\frac{a_1}{\varepsilon}\nu, \qquad \frac{a_2}{\varepsilon^2}(y_m-\hat{x}_1)=\frac{a_2}{\varepsilon^2}(x_1-\hat{x}_1)+\frac{a_2}{\varepsilon^2}\nu.
\end{align*}
So the observer avoids explicitly differentiating the angle measurement, but decreasing $\varepsilon$ multiplies angle noise by $1/\varepsilon$ in the first observer equation and by $1/\varepsilon^2$ in the second. This is the basic tradeoff: high gain can reconstruct angular velocity rapidly from angle-only sensing, but it also amplifies high-frequency sensor error.
[/example]
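The pendulum observer above can be tried numerically. The following is a minimal sketch, not a tuned implementation: it assumes zero input torque, noise-free angle measurements, forward-Euler integration, and the illustrative choices $a_1=2$, $a_2=1$ (so that $s^2+a_1s+a_2=(s+1)^2$), $\varepsilon=0.05$, and unit mass and length. The function name and all numerical values are hypothetical.

```python
import math


def simulate_high_gain_observer(eps=0.05, a1=2.0, a2=1.0, g=9.81, ell=1.0, m=1.0,
                                dt=1e-4, t_final=2.0):
    """Forward-Euler simulation of the pendulum and its high-gain observer (u = 0)."""
    x1, x2 = 0.5, 0.0      # true angle and angular velocity
    xh1, xh2 = 0.0, 0.0    # observer state, started with a wrong angle estimate
    u = 0.0                # assumption: no input torque in this sketch
    peak_velocity_error = 0.0
    for _ in range(int(round(t_final / dt))):
        y = x1                       # angle-only measurement
        innovation = y - xh1
        # Plant dynamics.
        dx1 = x2
        dx2 = -(g / ell) * math.sin(x1) + u / (m * ell ** 2)
        # Observer: copy of the model plus scaled output injection.
        dxh1 = xh2 + (a1 / eps) * innovation
        dxh2 = (-(g / ell) * math.sin(xh1) + u / (m * ell ** 2)
                + (a2 / eps ** 2) * innovation)
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        xh1, xh2 = xh1 + dt * dxh1, xh2 + dt * dxh2
        peak_velocity_error = max(peak_velocity_error, abs(x2 - xh2))
    return abs(x1 - xh1), abs(x2 - xh2), peak_velocity_error
```

The third return value records the largest velocity estimation error along the run. With these values it exceeds the initial angle error even though both final errors are negligible, which is the peaking transient in miniature.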
## Extended Kalman Filtering as Local Nonlinear Estimation
High-gain observers are deterministic and structure-driven, while many engineering systems come with process noise, sensor noise, and a statistical model. The next question is how the linear Kalman filter can be adapted when the state and output equations are nonlinear. The extended Kalman filter answers by repeatedly linearising the model along the current estimate and applying the Riccati-based correction from linear filtering.
[definition: Continuous-Time Nonlinear Filtering Model]
A continuous-time nonlinear state-space model consists of maps $f:\mathbb R^n\times\mathbb R^m\to\mathbb R^n$ and $h:\mathbb R^n\to\mathbb R^p$ and has the form
\begin{align*}
\dot{x}=f(x,u)+w, \qquad y=h(x)+v,
\end{align*}
where $x\in\mathbb R^n$, $u\in\mathbb R^m$, $y\in\mathbb R^p$, the drift map is $(x,u)\mapsto f(x,u)$, the output map is $x\mapsto h(x)$, and $w$ and $v$ are zero-mean noise processes with covariance matrices $Q\in\mathbb R^{n\times n}$ and $R\in\mathbb R^{p\times p}$, with $R$ positive definite.
[/definition]
The covariance matrices encode how much trust the estimator places in the model and in the output. The linear Kalman filter uses these matrices with fixed system matrices; for a nonlinear model, the corresponding matrices must be recomputed from the current estimated operating point. This motivates the following definition.
[definition: Continuous-Time Extended Kalman Filter]
Let $f:\mathbb R^n\times\mathbb R^m\to\mathbb R^n$ and $h:\mathbb R^n\to\mathbb R^p$ be smooth maps. Along an estimate $\hat{x}:[0,T]\to\mathbb R^n$ and input $u:[0,T]\to\mathbb R^m$, define the Jacobian matrices
\begin{align*}
A(t)=J_x f_{(\hat{x}(t),u(t))}\in\mathbb R^{n\times n}, \qquad C(t)=Jh_{\hat{x}(t)}\in\mathbb R^{p\times n}.
\end{align*}
The continuous-time extended Kalman filter is the dynamical system
\begin{align*}
\dot{\hat{x}}=f(\hat{x},u)+K(t)(y-h(\hat{x})), \qquad K(t)=P(t)C(t)^\top R^{-1}, \qquad \dot{P}=A(t)P+PA(t)^\top-PC(t)^\top R^{-1}C(t)P+Q,
\end{align*}
where $P(t)\in\mathbb R^{n\times n}$ is symmetric positive semidefinite.
[/definition]
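A scalar instance shows how the three equations of the definition interact. The sketch below is only an illustration under stated assumptions: the model $\dot x=-x^3$, $y=x$ is a hypothetical choice, the run is noise-free, the values of $Q$, $R$, and $P(0)$ are arbitrary, and the continuous-time equations are discretised with forward Euler.

```python
def run_scalar_ekf(q=0.1, r=0.1, dt=1e-3, t_final=5.0):
    """Euler-discretised continuous-time EKF for x' = -x**3, y = x (noise-free run)."""
    x, x_hat, p = 1.0, 0.0, 1.0
    gains = []
    for _ in range(int(round(t_final / dt))):
        y = x                       # measurement h(x) = x
        a = -3.0 * x_hat ** 2       # A(t): derivative of f evaluated at the estimate
        c = 1.0                     # C(t): derivative of h evaluated at the estimate
        k = p * c / r               # K = P C^T R^{-1}
        dx = -x ** 3
        dx_hat = -x_hat ** 3 + k * (y - x_hat)        # f(x_hat) + K (y - h(x_hat))
        dp = 2.0 * a * p - (p * c) ** 2 / r + q       # scalar Riccati equation
        x, x_hat, p = x + dt * dx, x_hat + dt * dx_hat, p + dt * dp
        gains.append(k)
    return x, x_hat, p, gains
```

In this run the gain starts large, because $P(0)$ is large relative to $R$, and then settles as the Riccati equation approaches a balance that depends on the current estimate through $A(t)$.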
The innovation $y-h(\hat{x})$ is the mismatch between the measured output and the predicted output. The gain $K(t)$ changes with time because the best local linear approximation changes as the estimate moves through state space. The next result states precisely what local approximation the EKF is using.
[quotetheorem:7614]
[citeproof:7614]
This result explains both the strength and the danger of the EKF, and the hypotheses mark the failure modes. The $C^2$ assumption is what makes the remainder quadratic; with only a Lipschitz output map, a kink such as $h(x)=|x|$ at the operating point has no single Jacobian that gives a valid first-order correction. Neighbourhood confinement is also essential: once $\hat{x}$ or $x$ leaves the region where the second derivatives are bounded, the constant $M$ in the remainder estimate no longer exists. Local observability and stability of the linearised error system are separate requirements, not consequences of the Taylor expansion; if $C(t)$ loses rank along the trajectory, an error direction may be invisible, and if $A(t)-K(t)C(t)$ is not stable then the first-order model amplifies rather than damps estimation error. Range-bearing tracking is a standard case where the geometry of the output map matters.
[example: Range-Bearing Tracking]
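The target state is $x=(p_1,p_2,v_1,v_2)$, with planar position $(p_1,p_2)$ measured from a sensor at the origin and velocity $(v_1,v_2)$. Let $r=\sqrt{p_1^2+p_2^2}$ and assume $r>0$, since range and bearing are singular at the sensor origin. The target has nearly constant velocity dynamics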
\begin{align*}
\dot{p}_1=v_1, \qquad \dot{p}_2=v_2, \qquad \dot{v}_1=w_1, \qquad \dot{v}_2=w_2.
\end{align*}
The output map is
\begin{align*}
h(x)=\left(r,\operatorname{atan2}(p_2,p_1)\right).
\end{align*}
For the range component, differentiating $r=(p_1^2+p_2^2)^{1/2}$ gives
\begin{align*}
\frac{\partial r}{\partial p_1}=\frac{1}{2}(p_1^2+p_2^2)^{-1/2}(2p_1)=\frac{p_1}{r}.
\end{align*}
Similarly,
\begin{align*}
\frac{\partial r}{\partial p_2}=\frac{1}{2}(p_1^2+p_2^2)^{-1/2}(2p_2)=\frac{p_2}{r}.
\end{align*}
The range does not depend on $v_1$ or $v_2$, so
\begin{align*}
\frac{\partial r}{\partial v_1}=0, \qquad \frac{\partial r}{\partial v_2}=0.
\end{align*}
For the bearing component $\theta=\operatorname{atan2}(p_2,p_1)$, away from $r=0$ its differential is
\begin{align*}
d\theta=\frac{-p_2}{p_1^2+p_2^2}\,dp_1+\frac{p_1}{p_1^2+p_2^2}\,dp_2.
\end{align*}
Since $p_1^2+p_2^2=r^2$, this gives
\begin{align*}
\frac{\partial \theta}{\partial p_1}=-\frac{p_2}{r^2}, \qquad \frac{\partial \theta}{\partial p_2}=\frac{p_1}{r^2}.
\end{align*}
The bearing also does not depend on $v_1$ or $v_2$, so
\begin{align*}
\frac{\partial \theta}{\partial v_1}=0, \qquad \frac{\partial \theta}{\partial v_2}=0.
\end{align*}
Therefore, at an estimate $\hat{x}=(\hat p_1,\hat p_2,\hat v_1,\hat v_2)$ with
\begin{align*}
\hat r=\sqrt{\hat p_1^2+\hat p_2^2},
\end{align*}
the first row of the output Jacobian $Jh_{\hat{x}}$ is
\begin{align*}
\left(\frac{\hat p_1}{\hat r},\frac{\hat p_2}{\hat r},0,0\right),
\end{align*}
and the second row is
\begin{align*}
\left(-\frac{\hat p_2}{\hat r^2},\frac{\hat p_1}{\hat r^2},0,0\right).
\end{align*}
As $\hat r\to 0$, the range row $(\hat p_1/\hat r,\hat p_2/\hat r,0,0)$ remains a unit vector, but its direction varies at a rate of order $1/\hat r$ with the position estimate, while the bearing row has norm $1/\hat r$ and is unbounded. Small position errors near the sensor can therefore produce large changes in the linearized measurement equation. Thus EKF performance in range-bearing tracking depends on the geometry of the output map, not only on the covariance matrices.
[/example]
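The Jacobian rows derived in the example can be checked against central finite differences. This is a small sketch using only the Python standard library; the function names and the test point are illustrative, and the check is meaningful only away from $r=0$ and away from the branch cut of `atan2`.

```python
import math


def h(x):
    """Range-bearing output map for a state x = (p1, p2, v1, v2)."""
    p1, p2, _, _ = x
    return (math.hypot(p1, p2), math.atan2(p2, p1))


def jacobian_h(x):
    """Analytic output Jacobian, valid for r > 0."""
    p1, p2, _, _ = x
    r = math.hypot(p1, p2)
    if r == 0.0:
        raise ValueError("range and bearing are singular at the sensor origin")
    return [[p1 / r, p2 / r, 0.0, 0.0],
            [-p2 / r ** 2, p1 / r ** 2, 0.0, 0.0]]


def finite_difference_jacobian(x, step=1e-6):
    """Central-difference approximation of the same Jacobian."""
    rows = [[0.0] * 4 for _ in range(2)]
    for j in range(4):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        hp, hm = h(xp), h(xm)
        for i in range(2):
            rows[i][j] = (hp[i] - hm[i]) / (2.0 * step)
    return rows
```

At the point $(3,4,1,-2)$, where $r=5$, the analytic rows are $(0.6,0.8,0,0)$ and $(-0.16,0.12,0,0)$. The bearing row has norm $1/r$, so repeating the comparison at points closer to the origin shows its entries growing.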
## Output-Feedback Stabilization and Separation Issues
The final question is whether an observer can simply be combined with a stabilizing state-feedback controller. For linear systems, stabilizability and detectability imply that the controller and observer may be designed separately. Nonlinear systems lack such a general global separation theorem, because the estimation error changes the state trajectory and the state feedback changes the observer's operating region.
[definition: Dynamic Output-Feedback Controller]
For a plant with $f:\mathbb R^n\times\mathbb R^m\to\mathbb R^n$ and $h:\mathbb R^n\to\mathbb R^p$ given by
\begin{align*}
\dot{x}=f(x,u), \qquad y=h(x),
\end{align*}
a dynamic output-feedback controller has an internal state $\zeta\in\mathbb R^q$ and equations
\begin{align*}
\dot{\zeta}=\alpha(\zeta,y), \qquad u=\beta(\zeta,y),
\end{align*}
where $\alpha:\mathbb R^q\times\mathbb R^p\to\mathbb R^q$ is the controller vector field $(\zeta,y)\mapsto\alpha(\zeta,y)$ and $\beta:\mathbb R^q\times\mathbb R^p\to\mathbb R^m$ is the output map $(\zeta,y)\mapsto\beta(\zeta,y)$.
[/definition]
This definition covers filters, compensators, and observer-based designs. In the observer-based special case, the missing state creates a concrete implementation problem: a state-feedback law $u=k(x)$ cannot be applied because $x$ is not measured. The standard workaround is to run an observer and feed the controller the estimate, while keeping track of the fact that the resulting closed loop now couples plant and estimation error dynamics.
[definition: Certainty-Equivalence Output Feedback]
Let $f:\mathbb R^n\times\mathbb R^m\to\mathbb R^n$ define the plant $\dot{x}=f(x,u)$, let $k:\mathbb R^n\to\mathbb R^m$ be a state-feedback law, and let $F:\mathbb R^n\times\mathbb R^p\times\mathbb R^m\to\mathbb R^n$ define an observer vector field by
\begin{align*}
\dot{\hat{x}} = F(\hat{x},y,k(\hat{x})).
\end{align*}
Here $k$ is the map $x\mapsto k(x)$ and $F$ is the map $(\hat{x},y,u)\mapsto F(\hat{x},y,u)$. The certainty-equivalence output-feedback controller is
\begin{align*}
u=k(\hat{x}).
\end{align*}
[/definition]
The phrase certainty-equivalence means that the controller behaves as though the estimate were the true state. The central limitation is that nonlinear stability margins may be destroyed during observer transients. A local separation theorem is therefore formulated with exponential margins and small initial errors.
[quotetheorem:7615]
[citeproof:7615]
The theorem is intentionally local, and each assumption rules out a different way in which separation can fail. Exponential margins matter because merely asymptotic subsystems can be destabilized by small persistent coupling terms; a slow observer error may feed the state equation for long enough to leave the attraction region. Small initial observer error matters because certainty-equivalence control can apply a large wrong input during peaking, even when the observer would converge for the open-loop plant. Local Lipschitz coupling is also structural: if the vector field has non-Lipschitz dependence on $(x,e)$ near the origin, the cross terms in the composite Lyapunov estimate may not be bounded by constants times $|x||e|$ or $|e|^2$. The result gives a rigorous version of the engineering rule: separation is acceptable when the state-feedback loop and the observer error loop both have enough local exponential margin. Feedback linearization from Chapter 4 supplies a concrete design where this rule is useful.
[example: Output Feedback for a Feedback-Linearizable System]
Suppose the system has relative degree $n$ near the origin and has been written in coordinates $z=(z_1,\dots,z_n)$ as
\begin{align*}
\dot{z}_i=z_{i+1} \text{ for } 1\le i<n, \qquad \dot{z}_n=a(z)+b(z)u, \qquad y=z_1.
\end{align*}
Assume $b$ is bounded away from $0$ on the coordinate neighbourhood, so $b(z)^{-1}$ is well-defined there. If the full state $z$ were measured, choose
\begin{align*}
u=b(z)^{-1}(-a(z)+v).
\end{align*}
Substituting this into the last equation gives
\begin{align*}
\dot{z}_n=a(z)+b(z)b(z)^{-1}(-a(z)+v)=a(z)-a(z)+v=v.
\end{align*}
Thus the closed-loop coordinates satisfy
\begin{align*}
\dot{z}_i=z_{i+1} \text{ for } 1\le i<n, \qquad \dot{z}_n=v.
\end{align*}
Now assign
\begin{align*}
v=-c_1z_1-\cdots-c_nz_n.
\end{align*}
Then
\begin{align*}
\dot{z}_n=-c_1z_1-\cdots-c_nz_n,
\end{align*}
so the nominal closed loop is the companion-form linear system whose characteristic polynomial is
\begin{align*}
s^n+c_ns^{n-1}+c_{n-1}s^{n-2}+\cdots+c_2s+c_1.
\end{align*}
Choosing $c_1,\dots,c_n$ so that this polynomial is Hurwitz makes the full-state feedback linearized model locally exponentially stable in these coordinates.
If only $z_1$ is measured, the measured output is
\begin{align*}
y=z_1,
\end{align*}
and the observer uses the innovation
\begin{align*}
y-\hat z_1=z_1-\hat z_1.
\end{align*}
A high-gain observer for the normal form has the first $n-1$ equations
\begin{align*}
\dot{\hat z}_i=\hat z_{i+1}+\frac{\alpha_i}{\varepsilon^i}(z_1-\hat z_1) \text{ for } 1\le i<n,
\end{align*}
and the last equation
\begin{align*}
\dot{\hat z}_n=a(\hat z)+b(\hat z)u+\frac{\alpha_n}{\varepsilon^n}(z_1-\hat z_1),
\end{align*}
where $\alpha_1,\dots,\alpha_n$ are chosen so that the observer polynomial $s^n+\alpha_1s^{n-1}+\cdots+\alpha_n$ is Hurwitz. Certainty-equivalence output feedback replaces $z$ by $\hat z$ in the feedback-linearizing law:
\begin{align*}
u=b(\hat z)^{-1}\left(-a(\hat z)-c_1\hat z_1-\cdots-c_n\hat z_n\right).
\end{align*}
If the estimate were exact, $\hat z=z$, then this implemented input would reduce to the full-state input, and the plant equation would again give
\begin{align*}
\dot z_n=a(z)+b(z)b(z)^{-1}\left(-a(z)-c_1z_1-\cdots-c_nz_n\right)=-c_1z_1-\cdots-c_nz_n.
\end{align*}
With estimation error present, the controller applies the input computed from $\hat z$, not from $z$. Therefore the plant actually sees
\begin{align*}
\dot z_n=a(z)+b(z)b(\hat z)^{-1}\left(-a(\hat z)-c_1\hat z_1-\cdots-c_n\hat z_n\right).
\end{align*}
This coincides with the nominal stable equation only when $\hat z=z$. It remains a small perturbation of that equation only while the mismatch between $z$ and $\hat z$ is small enough that the coordinate chart remains valid, $b(\hat z)$ remains nonzero, and the observer transient stays in the neighbourhood where the high-gain estimates are controlled. The example shows the local separation picture: the state-feedback design stabilizes the exact-coordinate model, while the observer-based implementation is justified only inside the region where the feedback linearization and the high-gain observer estimates are both valid.
[/example]
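For $n=2$ the certainty-equivalence loop of the example can be simulated directly. The sketch below is illustrative only: the functions $a(z)=-\sin z_1$ and $b(z)=2+\cos z_1$ are hypothetical choices with $b$ bounded away from zero, the gains $c=(1,2)$ and $\alpha=(2,1)$ both give the polynomial $(s+1)^2$, and forward Euler is used for both plant and observer.

```python
import math


def a(z1, z2):
    return -math.sin(z1)         # hypothetical drift term


def b(z1, z2):
    return 2.0 + math.cos(z1)    # hypothetical input gain, always in [1, 3]


def simulate_output_feedback(eps=0.05, c=(1.0, 2.0), alpha=(2.0, 1.0),
                             dt=5e-4, t_final=15.0):
    """Plant in normal form, high-gain observer, and certainty-equivalence control."""
    z1, z2 = 0.3, 0.0            # true state
    zh1, zh2 = 0.0, 0.0          # observer state
    for _ in range(int(round(t_final / dt))):
        # Certainty-equivalence law: the estimate replaces the state.
        u = (-a(zh1, zh2) - c[0] * zh1 - c[1] * zh2) / b(zh1, zh2)
        innovation = z1 - zh1    # y - zh1 with y = z1
        dz1 = z2
        dz2 = a(z1, z2) + b(z1, z2) * u
        dzh1 = zh2 + (alpha[0] / eps) * innovation
        dzh2 = a(zh1, zh2) + b(zh1, zh2) * u + (alpha[1] / eps ** 2) * innovation
        z1, z2 = z1 + dt * dz1, z2 + dt * dz2
        zh1, zh2 = zh1 + dt * dzh1, zh2 + dt * dzh2
    return (z1, z2), (zh1, zh2)
```

With the small initial mismatch used here both the state and the estimation error decay. The sketch says nothing about large initial errors, where the peaking transient enters the input through $\hat z_2$.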
The feedback-linearizable example also marks the boundary of the method: its conclusions live inside a chosen coordinate chart and a chosen operating region. This motivates a final warning about why the linear separation theorem should not be imported into nonlinear control without hypotheses.
[remark: Why Global Separation Fails]
Nonlinear output feedback can fail globally even when the state-feedback law and observer look reasonable in isolation. The observer may converge only for inputs that keep the plant in a compact set, while the controller using a poor estimate may drive the state outside that set. Nonlinear systems can also have multiple indistinguishable states, finite escape under bad transient inputs, or output maps whose rank changes across the state space. These phenomena have no counterpart in the finite-dimensional linear separation theorem.
[/remark]
The practical design lesson is to treat output feedback as a coupled closed-loop problem. High-gain observers and EKFs are powerful local estimators, but their region of validity, noise sensitivity, and interaction with the controller must be part of the stability argument rather than an afterthought.
Once the full state is no longer available, control and estimation can be designed separately only under local hypotheses such as those above. Chapter 6 begins the optimal-control part of the course by formulating admissible trajectories, endpoint conditions, and cost functionals so that control design becomes an optimization problem.
# 6. Calculus of Variations and Optimal Control Problems
Optimal control problems become mathematical optimization problems only after we specify an admissible class, a state equation, endpoint requirements, and a cost. The prerequisites for this chapter are ordinary differential equations for controlled systems, multivariable calculus, compactness in spaces of curves, and the finite-dimensional multiplier rule for constrained optimization; the final section also uses the direct method from the calculus of variations. Earlier chapters treated nonlinear systems mostly through trajectories, stability, and feedback; here the trajectory is chosen by minimizing a performance criterion. The chapter develops the standard Bolza, Lagrange, and Mayer formulations, derives the Euler-Lagrange and transversality conditions as prototype necessary conditions, and closes with the compactness issues behind existence of optimal controls.
## From Performance Criteria to Standard Formulations
The first modelling problem is how to write the objective without losing information about what happens during the motion or at the final state. A spacecraft, robot, or vehicle may pay for fuel continuously while also paying a terminal penalty for missing a target. This motivates the Bolza formulation, which keeps both kinds of cost in one problem.
[definition: Bolza Optimal Control Problem]
Let $t_0<t_1$, let $U$ be a nonempty control set, let $\mathcal U$ be a specified class of measurable maps $u:[t_0,t_1]\to U$, and let $\mathcal X=AC([t_0,t_1];\mathbb R^n)$. Let $f:[t_0,t_1]\times\mathbb R^n\times U\to\mathbb R^n$ be the controlled vector field, let $L:[t_0,t_1]\times\mathbb R^n\times U\to\mathbb R$ be the running cost, and let $\Phi:\mathbb R^n\to\mathbb R$ be the terminal cost.
Let $E_0\subset\mathbb R^n$ be the allowed initial set and let $E_1\subset\mathbb R^n\times\mathbb R^n$ be the allowed endpoint set. The admissible set is
\begin{align*}
\mathcal A_B=\{(x,u)\in\mathcal X\times\mathcal U:x(t_0)\in E_0,\ (x(t_0),x(t_1))\in E_1,\ \dot x(t)=f(t,x(t),u(t))\text{ for a.e. }t\}.
\end{align*}
A Bolza optimal control problem is the minimization of the functional $J:\mathcal A_B\to\mathbb R\cup\{\infty\}$ defined by
\begin{align*}
J[x,u]=\Phi(x(t_1))+\int_{t_0}^{t_1}L(t,x(t),u(t))\,dt.
\end{align*}
[/definition]
The Bolza form separates terminal performance from accumulated running cost. In many problems the terminal term is absent, so it is useful to name the purely accumulated version before converting between formulations. This motivates the Lagrange formulation.
[definition: Lagrange Optimal Control Problem]
Let $t_0<t_1$, let $U$ be a nonempty control set, let $\mathcal U$ be a specified class of measurable maps $u:[t_0,t_1]\to U$, and let $f:[t_0,t_1]\times\mathbb R^n\times U\to\mathbb R^n$ be the controlled vector field. Let $\mathcal A_L\subset AC([t_0,t_1];\mathbb R^n)\times\mathcal U$ be the admissible set of pairs satisfying
\begin{align*}
\dot x(t)=f(t,x(t),u(t))\quad\text{for a.e. }t\in[t_0,t_1],
\end{align*}
together with the prescribed endpoint constraints and path constraints. Let $L:[t_0,t_1]\times\mathbb R^n\times U\to\mathbb R$. A Lagrange optimal control problem is the minimization of the functional $J:\mathcal A_L\to\mathbb R\cup\{\infty\}$ defined by
\begin{align*}
J[x,u]=\int_{t_0}^{t_1}L(t,x(t),u(t))\,dt.
\end{align*}
[/definition]
The Lagrange form includes fuel minimization, energy minimization, tracking error, and, when the terminal time is allowed to vary, time minimization with $L=1$. The opposite extreme is also important: dynamic programming and adjoint methods often become cleaner when all cost is stored at the final time. This motivates the Mayer formulation.
[definition: Mayer Optimal Control Problem]
Let $t_0<t_1$, let $U$ be a nonempty control set, let $\mathcal U$ be a specified class of measurable maps $u:[t_0,t_1]\to U$, and let $f:[t_0,t_1]\times\mathbb R^n\times U\to\mathbb R^n$ be the controlled vector field. Let $\mathcal A_M\subset AC([t_0,t_1];\mathbb R^n)\times\mathcal U$ be the admissible set of pairs satisfying
\begin{align*}
\dot x(t)=f(t,x(t),u(t))\quad\text{for a.e. }t\in[t_0,t_1],
\end{align*}
together with the prescribed endpoint constraints and path constraints. Let $\Phi:\mathbb R^n\to\mathbb R$. A Mayer optimal control problem is the minimization of the functional $J:\mathcal A_M\to\mathbb R\cup\{\infty\}$ defined by
\begin{align*}
J[x,u]=\Phi(x(t_1)).
\end{align*}
[/definition]
The three formulations are not competing theories; they are different encodings of the same optimization data. The practical obstruction is that a running cost cannot be read directly from the terminal state, while many later arguments are cleanest for terminal-cost problems. Adding an accumulator state should solve this bookkeeping problem only if it preserves admissible trajectories and objective values, so one needs a precise equivalence statement before transferring necessary conditions between forms.
[quotetheorem:7616]
[citeproof:7616]
This result explains why adding state variables is a legitimate modelling operation rather than a change of problem. Each hypothesis has a specific role. The condition $z(t_0)=0$ fixes the accumulated-cost state; if instead $z(t_0)=c$, the Mayer terminal value becomes $\Phi(x(t_1))+c+\int_{t_0}^{t_1}L\,dt$, so the augmented problem represents a shifted objective. The differential equation for $z$ is also essential: replacing $\dot z=L$ by $\dot z=\alpha L$ with $\alpha\neq1$ changes the weight of the running cost, and minimizers can change when two admissible controls trade terminal cost against accumulated cost. The theorem also assumes that the same endpoint and path constraints are imposed on the original variables; adding a new terminal constraint on $z(t_1)$ would restrict the admissible set and break equivalence. It does not assert existence or uniqueness of minimizers; it only transfers admissible pairs and objective values when the augmented state equation is meaningful.
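The accumulator construction can be checked numerically for a fixed control. The sketch below assumes a hypothetical scalar problem ($\dot x=-x+u$, $L=x^2+u^2$, $\Phi(x)=x^2$, $u(t)=\sin t$); it integrates the augmented state $(x,z)$ with a classical Runge-Kutta step and compares the Mayer value $\Phi(x(t_1))+z(t_1)$ with the Bolza cost obtained by trapezoidal quadrature along the same trajectory.

```python
import math


def f(t, x, u):
    return -x + u                 # hypothetical state equation


def running_cost(t, x, u):
    return x * x + u * u          # hypothetical running cost L


def terminal_cost(x):
    return x * x                  # hypothetical terminal cost Phi


def control(t):
    return math.sin(t)            # fixed open-loop control used for the comparison


def augmented_rhs(t, x, z):
    u = control(t)
    return f(t, x, u), running_cost(t, x, u)    # (x', z') with z' = L


def compare_bolza_and_mayer(x0=1.0, z0=0.0, t0=0.0, t1=2.0, n=2000):
    dt = (t1 - t0) / n
    x, z = x0, z0
    xs = [x0]
    for k in range(n):
        t = t0 + k * dt
        k1x, k1z = augmented_rhs(t, x, z)
        k2x, k2z = augmented_rhs(t + dt / 2, x + dt / 2 * k1x, z + dt / 2 * k1z)
        k3x, k3z = augmented_rhs(t + dt / 2, x + dt / 2 * k2x, z + dt / 2 * k2z)
        k4x, k4z = augmented_rhs(t + dt, x + dt * k3x, z + dt * k3z)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        z += dt / 6 * (k1z + 2 * k2z + 2 * k3z + k4z)
        xs.append(x)
    mayer = terminal_cost(x) + z
    # Bolza cost: terminal term plus trapezoidal quadrature of L along the trajectory.
    ls = [running_cost(t0 + k * dt, xs[k], control(t0 + k * dt)) for k in range(n + 1)]
    integral = dt * (sum(ls) - 0.5 * (ls[0] + ls[-1]))
    bolza = terminal_cost(xs[-1]) + integral
    return bolza, mayer
```

Calling `compare_bolza_and_mayer(z0=3.0)` instead of the default shifts the Mayer value by $3$ while leaving the Bolza cost unchanged, which is the shifted-objective effect of a nonzero initial accumulator value.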
This conversion is a bridge to Pontryagin theory in Chapter 7 and dynamic programming in Chapter 8, where terminal-cost formulations are often algebraically cleaner. Before deriving necessary conditions, it is useful to see a concrete Lagrange problem where the cost has direct physical meaning. This motivates the minimum-energy steering example.
[example: Minimum Energy Steering]
Consider the controlled linear system $\dot x(t)=Ax(t)+Bu(t)$ on $[0,T]$, with $x(0)=x_0$ and required terminal state $x(T)=x_1$. For a square-integrable control $u$, the variation-of-constants formula gives
\begin{align*}
x(T)=e^{AT}x_0+\int_0^T e^{A(T-s)}Bu(s)\,ds.
\end{align*}
Thus the terminal constraint is equivalent to
\begin{align*}
\int_0^T e^{A(T-s)}Bu(s)\,ds=x_1-e^{AT}x_0.
\end{align*}
Let $d=x_1-e^{AT}x_0$. To minimize
\begin{align*}
J[u]=\frac12\int_0^T |u(t)|^2\,dt
\end{align*}
subject to the linear constraint above, introduce a multiplier $\lambda\in\mathbb R^n$ and consider the first variation of
\begin{align*}
\mathcal L[u,\lambda]=\frac12\int_0^T |u(t)|^2\,dt-\lambda\cdot\left(\int_0^T e^{A(T-s)}Bu(s)\,ds-d\right).
\end{align*}
For any square-integrable variation $v$,
\begin{align*}
\frac{d}{d\varepsilon}\mathcal L[u+\varepsilon v,\lambda]\bigg|_{\varepsilon=0}=\int_0^T u(t)\cdot v(t)\,dt-\lambda\cdot\int_0^T e^{A(T-s)}Bv(s)\,ds.
\end{align*}
Using $a\cdot Bv=(B^\top a)\cdot v$ with $a=e^{A^\top(T-s)}\lambda$, this becomes
\begin{align*}
\frac{d}{d\varepsilon}\mathcal L[u+\varepsilon v,\lambda]\bigg|_{\varepsilon=0}=\int_0^T \left(u(t)-B^\top e^{A^\top(T-t)}\lambda\right)\cdot v(t)\,dt.
\end{align*}
Since this must vanish for every $v$, the minimizing control has the form
\begin{align*}
u^*(t)=B^\top e^{A^\top(T-t)}\lambda.
\end{align*}
Enforcing the terminal constraint determines $\lambda$:
\begin{align*}
d=\int_0^T e^{A(T-s)}BB^\top e^{A^\top(T-s)}\lambda\,ds.
\end{align*}
With
\begin{align*}
W_T=\int_0^T e^{A(T-s)}BB^\top e^{A^\top(T-s)}\,ds,
\end{align*}
this is exactly
\begin{align*}
W_T\lambda=x_1-e^{AT}x_0.
\end{align*}
If $W_T$ is invertible, then
\begin{align*}
\lambda=W_T^{-1}(x_1-e^{AT}x_0),
\end{align*}
and therefore
\begin{align*}
u^*(t)=B^\top e^{A^\top(T-t)}W_T^{-1}(x_1-e^{AT}x_0).
\end{align*}
This control is not only stationary but minimizing. If $u$ is any other feasible control, set $v=u-u^*$. Then $v$ satisfies the homogeneous endpoint constraint
\begin{align*}
\int_0^T e^{A(T-s)}Bv(s)\,ds=0.
\end{align*}
Also,
\begin{align*}
\int_0^T u^*(t)\cdot v(t)\,dt=\lambda\cdot\int_0^T e^{A(T-s)}Bv(s)\,ds=0.
\end{align*}
Hence
\begin{align*}
J[u]=\frac12\int_0^T |u^*(t)+v(t)|^2\,dt.
\end{align*}
Expanding the square and using the vanishing cross term gives
\begin{align*}
J[u]=J[u^*]+\frac12\int_0^T |v(t)|^2\,dt\ge J[u^*].
\end{align*}
Thus the displayed open-loop control is the unique minimum-energy steering control when the Gramian is invertible.
[/example]
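For the double integrator $A=\begin{pmatrix}0&1\\0&0\end{pmatrix}$, $B=(0,1)^\top$ the Gramian has the closed form $W_T=\begin{pmatrix}T^3/3&T^2/2\\T^2/2&T\end{pmatrix}$, so the formula for $u^*$ can be evaluated by hand or in a few lines. The sketch below is an illustration for this one system; the function names, the Runge-Kutta integrator, and the comparison control are choices made here, and the perturbation $6t^2-6t+1$ satisfies the homogeneous endpoint constraint only for $T=1$.

```python
def minimum_energy_control(x0, x1, T):
    """Return u*(t) = B^T exp(A^T (T - t)) W_T^{-1} d for the double integrator."""
    # d = x1 - exp(AT) x0 with exp(AT) = [[1, T], [0, 1]].
    d1 = x1[0] - (x0[0] + T * x0[1])
    d2 = x1[1] - x0[1]
    # Gramian entries and the solution of W_T * lam = d by Cramer's rule.
    w11, w12, w22 = T ** 3 / 3.0, T ** 2 / 2.0, T
    det = w11 * w22 - w12 * w12            # equals T^4 / 12 > 0
    lam1 = (w22 * d1 - w12 * d2) / det
    lam2 = (-w12 * d1 + w11 * d2) / det
    return lambda t: (T - t) * lam1 + lam2


def simulate(x0, u, T, n=1000):
    """RK4 integration of position, velocity, and the energy (1/2) * integral of u^2."""
    dt = T / n
    p, v, e = x0[0], x0[1], 0.0

    def rhs(t, v_now):
        ut = u(t)
        return v_now, ut, 0.5 * ut * ut

    for k in range(n):
        t = k * dt
        k1 = rhs(t, v)
        k2 = rhs(t + dt / 2, v + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, v + dt / 2 * k2[1])
        k4 = rhs(t + dt, v + dt * k3[1])
        p += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        e += dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
    return (p, v), e


def perturbed_control(u_star, beta):
    """Another feasible control for T = 1: adds a variation with zero endpoint effect."""
    return lambda t: u_star(t) + beta * (6.0 * t * t - 6.0 * t + 1.0)
```

For a rest-to-rest transfer from $(0,0)$ to $(1,0)$ with $T=1$ this gives $u^*(t)=6-12t$ with energy $6$. The perturbed control with $\beta=5$ reaches the same endpoint with energy $6+\tfrac12\beta^2\int_0^1(6t^2-6t+1)^2\,dt=8.5$, matching the identity $J[u]=J[u^*]+\tfrac12\int_0^T|v|^2\,dt$.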
The example has fixed endpoints, so no boundary term survives in the first variation. The next section asks what stationarity says in the interior of the interval and prepares the control-theoretic adjoint equations by revisiting the classical calculus of variations.
## Euler-Lagrange Equations as the Prototype Necessary Condition
The basic necessary-condition problem is to identify which differential equation an optimal curve must satisfy. In the classical variational setting the velocity plays the role of the control, so the computation is simpler than Pontryagin's maximum principle but already contains the same first-variation structure. This motivates the classical variational problem.
[definition: Classical Variational Problem]
Let $a<b$, let $F:[a,b]\times\mathbb R^n\times\mathbb R^n\to\mathbb R$ be continuously differentiable, and fix $q_a,q_b\in\mathbb R^n$. The admissible curve class is
\begin{align*}
\mathcal C_{q_a,q_b}=\{q\in C^1([a,b];\mathbb R^n):q(a)=q_a,\ q(b)=q_b\}.
\end{align*}
The fixed-endpoint classical variational problem is the minimization of the functional $I:\mathcal C_{q_a,q_b}\to\mathbb R$ defined by
\begin{align*}
I[q]=\int_a^b F(t,q(t),\dot q(t))\,dt.
\end{align*}
[/definition]
The definition turns optimality into a statement about all nearby curves with the same endpoints. If endpoint variations were allowed, [integration by parts](/theorems/210) would leave boundary terms and the interior equation alone would miss part of the stationarity condition. If the curve were not varied against an arbitrary family of compactly supported perturbations, the fundamental lemma would not force a pointwise differential equation. Because fixed-endpoint variations vanish at $a$ and $b$, the boundary contribution disappears, leaving only the interior stationarity condition. This motivates the Euler-Lagrange equation.
[quotetheorem:6835]
[citeproof:6835]
The Euler-Lagrange equation is a necessary condition, not a complete optimality test. The $C^2$ regularity of $F$ and $q$ is what permits differentiating the first variation and interpreting the result as a classical pointwise equation; for the length functional $F(v)=|v|$, a minimizer with zero velocity at an interior point is not covered because $F_v$ is not defined there. The fixed-endpoint hypothesis is also used in a precise way. If $q(b)$ is free, [integration by parts](/theorems/2098) leaves the boundary condition $F_v(b,q(b),\dot q(b))=0$ rather than allowing that term to disappear; for $F(v)=|v|^2/2$ this would require $\dot q(b)=0$, a condition absent from the fixed-endpoint theorem. The assumption that variations are arbitrary in the interior is what lets the fundamental lemma produce a pointwise equation; restricting variations to a smaller class, such as perturbations satisfying an extra integral constraint, would introduce a multiplier term instead.
The equation produces candidate extremals, which still need boundary conditions, convexity or second-variation information, and comparison arguments before minimality can be concluded. A classical example shows how a geometric optimization problem becomes a Lagrange control problem.
[example: Brachistochrone as a Control Problem]
A particle slides under gravity from $(0,0)$ to $(x_1,y_1)$ with $y_1>0$, where $y$ is measured downward. If the curve is written as $y=y(x)$, then an arclength element is $\sqrt{1+(y'(x))^2}\,dx$, and [conservation of energy](/theorems/1335) gives speed $\sqrt{2gy(x)}$. Therefore the travel time is
\begin{align*}
T[y]=\int_0^{x_1}\frac{\sqrt{1+(y'(x))^2}}{\sqrt{2gy(x)}}\,dx.
\end{align*}
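This is a Lagrange problem in which $x$ plays the role of time, the state equation is $y'=p$ with the slope $p$ as control, and the running cost is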
\begin{align*}
F(y,p)=\frac{\sqrt{1+p^2}}{\sqrt{2gy}}.
\end{align*}
Since $F$ has no explicit $x$-dependence, the *[Beltrami identity](/theorems/3505)* gives
\begin{align*}
F-pF_p=C
\end{align*}
along any smooth extremal. Here
\begin{align*}
F_p(y,p)=\frac{p}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Thus
\begin{align*}
F-pF_p=\frac{\sqrt{1+p^2}}{\sqrt{2gy}}-\frac{p^2}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Putting the two terms over the same denominator gives
\begin{align*}
F-pF_p=\frac{1+p^2-p^2}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Hence
\begin{align*}
\frac{1}{\sqrt{2gy}\sqrt{1+(y')^2}}=C.
\end{align*}
Writing $C=1/\sqrt{2ga}$ with $a>0$, this becomes
\begin{align*}
y(1+(y')^2)=a.
\end{align*}
Solving for $y'$ gives
\begin{align*}
(y')^2=\frac{a-y}{y}.
\end{align*}
On a descending branch, where $y'>0$ because $y$ is measured downward,
\begin{align*}
\frac{dx}{dy}=\sqrt{\frac{y}{a-y}}.
\end{align*}
Introduce the parameter $\theta$ by
\begin{align*}
y=\frac a2(1-\cos\theta).
\end{align*}
Then
\begin{align*}
dy=\frac a2\sin\theta\,d\theta.
\end{align*}
Also,
\begin{align*}
a-y=\frac a2(1+\cos\theta).
\end{align*}
Therefore
\begin{align*}
\sqrt{\frac{y}{a-y}}=\sqrt{\frac{1-\cos\theta}{1+\cos\theta}}.
\end{align*}
Using $1-\cos\theta=2\sin^2(\theta/2)$ and $1+\cos\theta=2\cos^2(\theta/2)$, this is
\begin{align*}
\sqrt{\frac{y}{a-y}}=\tan(\theta/2).
\end{align*}
Hence
\begin{align*}
dx=\tan(\theta/2)\frac a2\sin\theta\,d\theta.
\end{align*}
Since $\tan(\theta/2)=\sin\theta/(1+\cos\theta)$, we get
\begin{align*}
dx=\frac a2\frac{\sin^2\theta}{1+\cos\theta}\,d\theta.
\end{align*}
Using $\sin^2\theta=(1-\cos\theta)(1+\cos\theta)$, this reduces to
\begin{align*}
dx=\frac a2(1-\cos\theta)\,d\theta.
\end{align*}
Integrating from $\theta=0$ and using $x(0)=0$ gives
\begin{align*}
x=\frac a2(\theta-\sin\theta).
\end{align*}
Thus the extremal curve has the parametric form
\begin{align*}
x(\theta)=\frac a2(\theta-\sin\theta),\qquad y(\theta)=\frac a2(1-\cos\theta).
\end{align*}
The endpoint condition $(x(\theta_1),y(\theta_1))=(x_1,y_1)$ determines the constants $a$ and $\theta_1$, and the resulting curve is a cycloid arc, the classical brachistochrone.
[/example]
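The endpoint condition has no closed-form solution, but it reduces to one scalar equation: dividing the two parametric formulas gives $x_1/y_1=(\theta_1-\sin\theta_1)/(1-\cos\theta_1)$, whose right-hand side is increasing on $(0,2\pi)$. The sketch below solves it by bisection and then recovers $a$. It also uses the fact, obtained by substituting the parametrisation into the travel-time integral, that $dt=\sqrt{a/(2g)}\,d\theta$ along the cycloid. Function names and the bracketing interval are illustrative choices.

```python
import math


def cycloid_through(x1, y1, iterations=200):
    """Find (a, theta1) with a/2*(theta1 - sin theta1) = x1 and a/2*(1 - cos theta1) = y1."""
    target = x1 / y1

    def ratio(theta):
        return (theta - math.sin(theta)) / (1.0 - math.cos(theta))

    lo, hi = 1e-6, 2.0 * math.pi - 1e-6    # ratio is increasing on this interval
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    theta1 = 0.5 * (lo + hi)
    a = 2.0 * y1 / (1.0 - math.cos(theta1))
    return a, theta1


def cycloid_time(a, theta1, g=9.81):
    """Travel time along the cycloid arc, using dt = sqrt(a / (2 g)) d(theta)."""
    return theta1 * math.sqrt(a / (2.0 * g))


def straight_line_time(x1, y1, g=9.81):
    """Travel time from rest down the straight chord to (x1, y1), y measured downward."""
    return math.sqrt(2.0 * (x1 * x1 + y1 * y1) / (g * y1))
```

For the endpoint $(1,1)$ the cycloid time is shorter than the straight-chord time, as expected of the brachistochrone. The comparison is a sanity check on the extremal, not a proof of minimality.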
The brachistochrone shows that the state path and endpoint data interact through a global stationarity condition. The next complication is that endpoints are often constrained but not fixed, so the boundary terms no longer vanish automatically.
## Endpoint Constraints and Transversality Conditions
In applications the final state may lie on a target surface, the final time may be free, or several endpoint equalities and inequalities may be imposed. The problem is to translate these geometric endpoint restrictions into boundary conditions for an extremal. This motivates the formal notion of an endpoint constraint.
[definition: Endpoint Constraint]
Let $\mathcal X\subset AC([t_0,t_1];\mathbb R^n)$ be a trajectory class, and let $G:\mathbb R^n\times\mathbb R^n\times\mathbb R\times\mathbb R\to\mathbb R^k$ be a specified map. An equality endpoint constraint is the requirement that the endpoint evaluation map $\Gamma:\mathcal X\to\mathbb R^k$ defined by
\begin{align*}
\Gamma(x)=G(x(t_0),x(t_1),t_0,t_1)
\end{align*}
satisfies $\Gamma(x)=0$.
[/definition]
An endpoint constraint restricts which endpoint variations are allowed. The first variation still produces boundary terms, but now those terms only need to vanish against variations tangent to the constraint set. This motivates transversality.
[definition: Transversality Condition]
Let $V$ be the [vector space](/page/Vector%20Space) of first-order admissible endpoint variations and let $\beta:V\to\mathbb R$ be the boundary linear functional produced by the first variation of the cost. A transversality condition is the requirement
\begin{align*}
\beta(\eta)=0\quad\text{for every }\eta\in V.
\end{align*}
[/definition]
The definition says what kind of condition we seek, but the geometric content appears when the terminal state is constrained to a smooth target manifold. In that case, admissible terminal variations are tangent vectors, so the terminal momentum must be normal to the target. This motivates the terminal-manifold transversality theorem.
The picture encoded by the theorem is local at the endpoint: once the interior Euler-Lagrange equation has removed the bulk variation, the only remaining first-order term is the terminal momentum paired with the allowed terminal displacement. Since the endpoint may move only along $M$, stationarity forces that momentum to annihilate the tangent space $T_{q(b)}M$.
[quotetheorem:7617]
[citeproof:7617]
Thus terminal momentum lies in the normal space to the terminal manifold. The smooth embedded-submanifold hypothesis is what gives a well-defined tangent space of first-order terminal variations; if the target is a corner such as $M=\{(x_1,x_2):x_2\ge |x_1|\}$ at the origin, the terminal variations do not form a vector space and the correct replacement is a normal cone. The Euler-Lagrange hypothesis is also used: without cancellation of the interior first variation, the boundary pairing alone need not vanish against tangent terminal variations. The fixed initial point matters as well, since freeing $q(a)$ would produce an additional initial transversality condition involving $-F_v(a,q(a),\dot q(a))$. The theorem does not prove that every curve satisfying this condition is optimal, since the same boundary condition may hold for maxima or saddle extremals.
This normality condition is the geometric prototype for the endpoint costate condition in Pontryagin's maximum principle. If the terminal time is also free, varying the interval length produces an additional boundary term involving the Hamiltonian. This motivates the free-terminal-time transversality condition.
[quotetheorem:7618]
[citeproof:7618]
Free-time transversality explains why Hamiltonian boundary conditions appear in time-optimal control. The autonomy hypothesis matters: if $F$ depends explicitly on time, the same boundary calculation still gives a terminal Hamiltonian condition at the moving endpoint, but the Hamiltonian need not be conserved along the interval. The fixed terminal point hypothesis is also doing work, because a free terminal point would leave an additional momentum transversality condition; for $F(v)=|v|^2/2+1$, allowing $q(b)$ to move would additionally force $p(b)=\dot q(b)=0$, which is incompatible with many fixed-target transfers. The assumption that the admissible terminal times form an open interval matters because the proof uses both positive and negative variations; at a one-sided time constraint the conclusion becomes an inequality condition instead of $H(b)=0$. This theorem is therefore a boundary stationarity statement, not a sufficiency theorem for optimality.
A realistic endpoint constraint often combines equality and inequality requirements, so the next example records how active endpoint constraints contribute multipliers.
[example: Constrained Landing Problem]
For the terminal endpoint, write $r(T)=(r_1(T),r_2(T))$ and $v(T)=(v_1(T),v_2(T))$. The landing equalities are
\begin{align*}
g_1(r(T),v(T))=r_1(T)-R=0,\qquad g_2(r(T),v(T))=r_2(T)=0.
\end{align*}
If the velocity safe set is described locally by inequalities $h_j(v)\le 0$ and $A$ denotes the indices active at the terminal velocity, meaning $h_j(v(T))=0$ for $j\in A$, then the endpoint multiplier contribution has the form
\begin{align*}
\mu_1(r_1(T)-R)+\mu_2r_2(T)+\sum_{j\in A}\nu_jh_j(v(T)),
\end{align*}
where $\nu_j\ge 0$ for active inequality constraints.
Let the terminal costate be $p(T)=(p_r(T),p_v(T))\in\mathbb R^2\times\mathbb R^2$. The equality part contributes the endpoint gradient
\begin{align*}
\nabla_{(r,v)}\bigl(\mu_1(r_1-R)+\mu_2r_2\bigr)=(\mu_1,\mu_2,0,0).
\end{align*}
The active velocity inequalities contribute
\begin{align*}
\nabla_{(r,v)}\left(\sum_{j\in A}\nu_jh_j(v)\right)=\left(0,0,\sum_{j\in A}\nu_j\nabla h_j(v)\right).
\end{align*}
Thus a terminal transversality condition can be written as
\begin{align*}
p_r(T)=(\mu_1,\mu_2),\qquad p_v(T)=\sum_{j\in A}\nu_j\nabla h_j(v(T)).
\end{align*}
Equivalently, the velocity component $p_v(T)$ lies in the normal cone to the safe velocity set at $v(T)$, while the position component $p_r(T)$ is determined by the two landing equality multipliers.
In the equality-only case there are no active velocity terms, so the endpoint multiplier expression is just
\begin{align*}
\mu_1(r_1(T)-R)+\mu_2r_2(T).
\end{align*}
Taking its derivative with respect to $(r_1,r_2,v_1,v_2)$ gives
\begin{align*}
(\mu_1,\mu_2,0,0),
\end{align*}
so the terminal costate components conjugate to position are $p_{r_1}(T)=\mu_1$ and $p_{r_2}(T)=\mu_2$, while the velocity-conjugate components receive no equality contribution.
[/example]
Endpoint multipliers also lead to a distinction that becomes important in Pontryagin's maximum principle. Sometimes the objective multiplier is nonzero and the cost participates in stationarity; sometimes the endpoint constraints alone carry the multiplier rule. This motivates normal and abnormal extremals.
## Normal and Abnormal Extremals
The multiplier rule for constrained optimization introduces a scalar multiplier on the objective and a vector of multipliers on constraints. The problem is to understand whether the objective multiplier can vanish. If it vanishes, the stationarity equations are driven by feasibility geometry rather than by cost.
[definition: Normal Extremal]
Consider a first-order multiplier system for an endpoint-constrained variational or optimal control problem, with objective functional $J:\mathcal A\to\mathbb R$ and constraint map $G:\mathcal A\to\mathbb R^k$. An extremal is normal if the scalar multiplier $\lambda_0$ multiplying the first variation of $J$ is nonzero.
[/definition]
Normality permits the standard normalization of the objective multiplier, so the cost remains visible in the adjoint equations. The complementary case is needed because constrained endpoint maps can be singular, and then stationarity may be forced without any cost multiplier. This motivates the definition of an abnormal extremal.
[definition: Abnormal Extremal]
Consider a first-order multiplier system for an endpoint-constrained variational or optimal control problem, with objective functional $J:\mathcal A\to\mathbb R$ and constraint map $G:\mathcal A\to\mathbb R^k$. An extremal is abnormal if the scalar multiplier $\lambda_0$ multiplying the first variation of $J$ is zero.
[/definition]
The definitions separate two possible behaviours, but a theorem is needed to justify that they exhaust the multiplier alternatives. A concrete way abnormality can occur is through a singular endpoint map: if the derivative of the map from variations to endpoint displacement has deficient range, then some nonzero covector can annihilate every feasible first-order endpoint displacement. In that situation the constraint geometry can impose stationarity even when the first variation of the objective is not used. This motivates the multiplier alternative for endpoint-constrained problems.
[quotetheorem:7619]
[citeproof:7619]
The theorem anticipates the maximum principle: the costate and Hamiltonian equations persist in both cases, but the running-cost term disappears in the abnormal case. The finite-dimensional reduction is a model for the endpoint map obtained from linearizing a control system, so the space $E$ represents actual admissible perturbations rather than arbitrary endpoint displacements. The theorem does not identify which abnormal extremals are minimizers; it only says that first-order stationarity may be supported by endpoint geometry alone. This makes existence of actual minimizers even more important, since necessary conditions may have several competing extremals.
## Existence, Compactness, and Direct Methods
Necessary conditions matter most when a minimizer exists or when a minimizing sequence has a meaningful limit. The direct method asks for two ingredients: compactness to obtain a convergent subsequence and lower semicontinuity to pass the cost to the limit. In optimal control, both ingredients are tied to the topology chosen for controls and trajectories.
[definition: Admissible Pair]
Let $\mathcal U$ be a specified class of measurable controls $u:[t_0,t_1]\to U$, let $\mathcal X\subset AC([t_0,t_1];\mathbb R^n)$ be a trajectory class, let $f:[t_0,t_1]\times\mathbb R^n\times U\to\mathbb R^n$, and let $\mathcal C\subset\mathcal X\times\mathcal U$ encode endpoint and path constraints. An admissible pair is an element $(x,u)\in\mathcal X\times\mathcal U$ such that
\begin{align*}
(x,u)\in\mathcal C,\qquad \dot x(t)=f(t,x(t),u(t))\quad\text{for a.e. }t\in[t_0,t_1].
\end{align*}
[/definition]
The admissible set must be closed under the convergence used in the proof. Even with compact trajectories, a limit is not useful if it no longer satisfies the dynamics or constraints. To turn compactness into existence, we also need a cost condition that survives passage to the limit; this motivates lower semicontinuity.
[definition: Lower Semicontinuity]
Let $X$ be a [topological space](/page/Topological%20Space) and let $J:X\to\mathbb R\cup\{\infty\}$. The functional $J$ is lower semicontinuous if for every convergent sequence $z_k\to z$ in $X$,
\begin{align*}
J[z]\le\liminf_{k\to\infty}J[z_k].
\end{align*}
[/definition]
Lower semicontinuity is the inequality needed in minimization: the limit of a minimizing sequence cannot have a cost larger than the infimum that the sequence approaches. When this cost condition is combined with compactness of the admissible set, it yields a minimizer. This motivates the [Weierstrass existence theorem for the direct method](/theorems/7620).
[quotetheorem:7620]
[citeproof:7620]
Each assumption in the Weierstrass theorem rules out a different failure mode. Nonemptiness prevents the infimum from being taken over no admissible objects. Boundedness below prevents a minimizing sequence from driving the value to $-\infty$, as happens for $J[u]=-\int_0^1 |u(t)|^2\,dt$ with unconstrained controls. [Sequential compactness](/page/Sequential%20Compactness) is needed because the open interval $\mathcal A=(0,1)$ with $J[z]=z$ has infimum $0$ but no minimizer in $\mathcal A$. Lower semicontinuity is needed because on $\mathcal A=[0,1]$ the functional with $J(0)=1$ and $J(z)=z$ for $z>0$ has infimum $0$ but no point attaining it. The abstract theorem leaves a control-specific question: where does compactness come from? For ordinary differential equations, bounded controls and growth estimates give uniform bounds and equicontinuity of trajectories. This motivates the [compactness theorem](/theorems/2748) for trajectories.
[quotetheorem:7621]
[citeproof:7621]
Trajectory compactness follows from a linear growth bound and compactness of the control range, but the theorem deliberately stops short of proving convergence to an admissible controlled trajectory. The common initial condition is important for the uniform bound; without it, the constant solutions $x_k(t)=k$ for $\dot x=0$ are neither uniformly bounded nor compact in $C([t_0,t_1];\mathbb R^n)$. The linear growth bound prevents finite-time escape uniformly in the sequence; for $\dot x=x^2$ with initial data near a blow-up threshold, no uniform estimate on a fixed interval follows from compactness of controls alone. Compactness of $U_c$ keeps the vector field uniformly bounded on compact state regions; if controls are unbounded, $\dot x=u$ with $u_k(t)=k$ loses equicontinuity. The conclusion is exactly the input needed for Arzela-Ascoli, connecting ordinary differential equation estimates with compactness in $C([t_0,t_1];\mathbb R^n)$.
Trajectory compactness does not by itself guarantee compactness of controls. Rapid switching between control values can converge only weakly and may create averaged dynamics. The following remark records the obstruction that motivates convexity and relaxed-control hypotheses.
[remark: Rapid Switching and Loss of Closure]
A sequence of measurable controls taking values in a compact set need not have a pointwise convergent subsequence. Fast alternation between two admissible control values can converge weakly to an averaged value that is not an allowed pointwise control. Without additional structure, the limiting trajectory may solve a convexified system rather than the original one.
[/remark]
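The obstruction in the remark can be observed numerically. The following is a minimal sketch, assuming the scalar system $\dot x=u$ with $x(0)=0$ on $[0,1]$ and a control that alternates between $+1$ and $-1$ on $2k$ pieces of equal length; the function name `chattering_trajectory` and the grid parameter `cells_per_piece` are illustrative choices, not part of the notes.

```python
import numpy as np

def chattering_trajectory(k, cells_per_piece=200):
    """Integrate x' = u_k, x(0) = 0, on [0, 1] for a control that alternates
    between +1 and -1 on 2k pieces of equal length."""
    n = 2 * k * cells_per_piece                # number of grid cells
    piece = np.arange(n) // cells_per_piece    # index of the piece containing each cell
    u = np.where(piece % 2 == 0, 1.0, -1.0)    # +1 on even pieces, -1 on odd pieces
    # The control is constant on each cell, so this cumulative sum is the
    # exact integral of u at the grid points.
    x = np.concatenate(([0.0], np.cumsum(u) / n))
    return u, x

for k in (1, 4, 16, 64):
    u, x = chattering_trajectory(k)
    # Columns: k, sup |x_k|, average control value.
    print(k, np.max(np.abs(x)), u.mean())
```

Every control in the sequence takes values in $\{-1,1\}$, yet the printed sup norms shrink like $1/(2k)$, so the trajectories converge uniformly to $x\equiv 0$. The limit solves $\dot x=0$, which corresponds to the averaged value $0$ and not to any pointwise value in $\{-1,1\}$.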
The obstruction in the remark is resolved in many existence theorems by convexity of the velocity-cost epigraph. Convexity ensures that weak limits of oscillating controls can be represented without increasing the running cost. This motivates the standard convex existence result.
[quotetheorem:7622]
[citeproof:7622]
This theorem identifies the structural requirements behind many applied existence proofs: compactness of trajectories, closure of admissible dynamics, and lower semicontinuity of the cost. The convex epigraph hypothesis is the point that prevents rapid switching from leaving the original admissible problem; for a scalar system with $U_c=\{-1,1\}$ and $\dot x=u$, alternating rapidly between $-1$ and $1$ produces the averaged velocity $0$, which is not generated by any pointwise control value in $U_c$. Compactness of the state constraint prevents a minimizing sequence from escaping spatially; with $\dot x=u$ and no state bound, endpoint-free problems can send $x(t_1)$ to infinity while lowering a terminal reward. Closedness of the endpoint set is needed because endpoint limits preserve membership only for closed sets; if the target is the open set $(0,1)$, a sequence with terminal states $1/k$ can converge to an infeasible boundary point. Lower semicontinuity of $\Phi$ is separate from the running-cost epigraph condition; if $\Phi(0)=1$ and $\Phi(x)=0$ for $x\neq0$, terminal states approaching $0$ can lose the infimum at the limit. The theorem is an existence result only, not a uniqueness theorem and not a characterization of the minimizer.
The final example shows these hypotheses in a familiar bounded-input problem.
[example: Energy Minimization with Box-Constrained Input]
Let $U_c=[-M,M]^m$, let $K\subset\mathbb R^n$ be a closed target set, and consider admissible measurable controls $u:[0,T]\to U_c$ for which the solution of $\dot x=Ax+Bu$, $x(0)=x_0$, satisfies $x(T)\in K$. Since $u(t)\in[-M,M]^m$, each component obeys $|u_i(t)|\le M$, so
\begin{align*}
|u(t)|^2=\sum_{i=1}^m |u_i(t)|^2\le \sum_{i=1}^m M^2=mM^2.
\end{align*}
Hence $|u(t)|\le \sqrt m\,M$, and the vector field satisfies
\begin{align*}
|Ax(t)+Bu(t)|\le \|A\|\,|x(t)|+\|B\|\,\sqrt m\,M.
\end{align*}
Writing the integral form of the state equation gives
\begin{align*}
|x(t)|\le |x_0|+\int_0^t \|A\|\,|x(s)|\,ds+\int_0^t \|B\|\sqrt m\,M\,ds.
\end{align*}
Since $t\le T$,
\begin{align*}
|x(t)|\le |x_0|+\|B\|\sqrt m\,M\,T+\int_0^t \|A\|\,|x(s)|\,ds.
\end{align*}
By *Gronwall's inequality*,
\begin{align*}
|x(t)|\le \left(|x_0|+\|B\|\sqrt m\,M\,T\right)e^{\|A\|T}.
\end{align*}
Thus all admissible trajectories are uniformly bounded; the same estimate for $|\dot x(t)|=|Ax(t)+Bu(t)|$ gives a uniform derivative bound, so the trajectories are equicontinuous.
The running cost is
\begin{align*}
L(u)=|u|^2.
\end{align*}
For $u_1,u_2\in U_c$ and $\theta\in[0,1]$, convexity of the box gives $\theta u_1+(1-\theta)u_2\in U_c$. Expanding the square,
\begin{align*}
|\theta u_1+(1-\theta)u_2|^2=\theta^2|u_1|^2+(1-\theta)^2|u_2|^2+2\theta(1-\theta)u_1\cdot u_2.
\end{align*}
Also,
\begin{align*}
\theta |u_1|^2+(1-\theta)|u_2|^2-|\theta u_1+(1-\theta)u_2|^2=\theta(1-\theta)|u_1-u_2|^2\ge 0.
\end{align*}
Therefore
\begin{align*}
|\theta u_1+(1-\theta)u_2|^2\le \theta |u_1|^2+(1-\theta)|u_2|^2,
\end{align*}
so the integrand is convex in $u$.
For fixed $(t,x)$, the velocity-cost epigraph is
\begin{align*}
Q(t,x)=\{(Ax+Bu,\ell):u\in[-M,M]^m,\ \ell\ge |u|^2\}.
\end{align*}
If $(Ax+Bu_i,\ell_i)\in Q(t,x)$ for $i=1,2$, set $u_\theta=\theta u_1+(1-\theta)u_2$. Then $u_\theta\in[-M,M]^m$ and
\begin{align*}
\theta(Ax+Bu_1)+(1-\theta)(Ax+Bu_2)=Ax+B u_\theta.
\end{align*}
Moreover,
\begin{align*}
\theta\ell_1+(1-\theta)\ell_2\ge \theta |u_1|^2+(1-\theta)|u_2|^2\ge |u_\theta|^2.
\end{align*}
Thus $Q(t,x)$ is convex. If $(x_k,u_k)$ is a minimizing sequence of admissible pairs, the uniform bound and equicontinuity give, by Arzela-Ascoli, a uniformly convergent subsequence $x_k\to x$. Since $K$ is closed and $x_k(T)\in K$, the limit satisfies $x(T)\in K$. The hypotheses of *Existence Under Convex Velocity-Cost Hypotheses* are therefore met in this bounded-input linear problem, assuming the admissible set is nonempty, and an optimal control exists. The optimal control need not stay in the interior of $[-M,M]^m$; the minimum may occur with some components equal to $\pm M$ on subintervals.
[/example]
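The Gronwall estimate in the example can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the matrices `A`, `B`, the data `x0`, `M`, `T`, and the forward-Euler discretization are hypothetical choices, and matrix norms are taken to be spectral norms. Forward Euler is used because its iterates satisfy the same bound through the discrete Gronwall inequality.

```python
import numpy as np

def gronwall_bound(A, B, x0, M, T):
    """Right-hand side of the Gronwall estimate, using spectral norms."""
    m = B.shape[1]
    forcing = np.linalg.norm(B, 2) * np.sqrt(m) * M * T
    return (np.linalg.norm(x0) + forcing) * np.exp(np.linalg.norm(A, 2) * T)

def max_state_norm(A, B, x0, controls, T):
    """Forward-Euler simulation of x' = Ax + Bu; `controls` has shape (steps, m)."""
    h = T / len(controls)
    x = np.array(x0, dtype=float)
    worst = np.linalg.norm(x)
    for u in controls:
        x = x + h * (A @ x + B @ u)
        worst = max(worst, np.linalg.norm(x))
    return worst

# Hypothetical problem data, chosen only for illustration.
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])
M, T, steps = 1.5, 2.0, 4000

bound = gronwall_bound(A, B, x0, M, T)
# Sample piecewise-constant controls with values in the box [-M, M]^m.
worst = max(
    max_state_norm(A, B, x0, rng.uniform(-M, M, size=(steps, 1)), T)
    for _ in range(20)
)
print(worst, bound, worst <= bound)
```

The bound is typically far from tight, which is harmless for the existence argument: only uniformity over all admissible controls is used.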
The chapter has moved from modelling objectives to first-order necessary conditions and then to existence, setting up optimal control as the minimization of a cost over admissible dynamical trajectories. Chapter 7 replaces the Euler-Lagrange equation by Pontryagin's maximum principle, where the state equation, costate equation, endpoint transversality, and Hamiltonian maximization appear as one unified system of first-order necessary conditions.
# 7. Pontryagin Maximum Principle
This chapter turns optimal control from a variational problem into a Hamiltonian boundary-value problem. The central question is: if a control $u$ minimizes a cost subject to nonlinear dynamics, what first-order conditions must the state, control, and endpoint satisfy? The answer is the Pontryagin maximum principle, which introduces costates and switching functions and explains why many constrained optimal controls sit on the boundary of the admissible control set.
## Hamiltonians, Costates, and Switching Functions
The Euler-Lagrange equations from the calculus of variations do not directly handle control constraints such as $u(t) \in U$ or dynamics of the form $\dot{x}=f(x,u)$. We need a first-order condition that treats the state equation as a constraint while preserving the pointwise nature of the control choice.
[definition: Admissible Optimal Control Problem]
Let $t_0<t_1$, let $X\subset \mathbb R^n$ and $U\subset \mathbb R^m$, and let $f:X\times U\to \mathbb R^n$, $L:X\times U\to \mathbb R$, and $\Phi:X\to \mathbb R$ be given functions. An admissible pair is a pair $(x,u)$ such that $u:[t_0,t_1]\to U$ is measurable, $x:[t_0,t_1]\to X$ is absolutely continuous, and
\begin{align*}
\dot{x}(t)=f(x(t),u(t)) \quad \text{for a.e. } t\in[t_0,t_1].
\end{align*}
The admissible-pair class is
\begin{align*}
\mathcal A:=\{(x,u): (x,u) \text{ is an admissible pair on } [t_0,t_1]\}.
\end{align*}
The Bolza cost functional is the map $J:\mathcal A\to \mathbb R$ given by
\begin{align*}
J[x,u]=\Phi(x(t_1))+\int_{t_0}^{t_1} L(x(t),u(t))\,dt.
\end{align*}
[/definition]
This formulation separates the differential constraint from the quantity being minimized. To attach a multiplier to the state equation and still make a pointwise choice of $u(t)$, the next object combines dynamics and running cost into a single scalar function.
[definition: Control Hamiltonian]
For a minimization problem with multiplier $p_0\le 0$, state space $X\subset \mathbb R^n$, control set $U\subset \mathbb R^m$, dynamics $f:X\times U\to \mathbb R^n$, and running cost $L:X\times U\to \mathbb R$, the control Hamiltonian is the map $H:X\times\mathbb R^n\times U\times(-\infty,0]\to\mathbb R$ given by
\begin{align*}
H(x,p,u,p_0)=p\cdot f(x,u)+p_0 L(x,u).
\end{align*}
[/definition]
The sign convention $p_0\le 0$ makes the optimal control maximize $H$ in the maximum principle. Since this Hamiltonian contains an unknown multiplier paired with the state equation, we next name the time-dependent multiplier that carries endpoint information backward through the trajectory.
[definition: Costate]
A costate is an absolutely continuous function $p:[t_0,t_1]\to\mathbb R^n$ associated with an admissible trajectory $x:[t_0,t_1]\to X$ and satisfying an adjoint equation determined by the Hamiltonian.
[/definition]
The costate is the Lagrange multiplier for the state equation, and its evolution runs backward from terminal transversality data. Once $p$ is known, the remaining local question is how the Hamiltonian depends on each constrained control component, which leads to the switching functions.
[definition: Switching Function]
For a control-affine Hamiltonian of the form
\begin{align*}
H(x,p,u,p_0)=H_0(x,p,p_0)+\sum_{i=1}^m \varphi_i(t)u_i,
\end{align*}
where $H_0:X\times\mathbb R^n\times(-\infty,0]\to\mathbb R$ is the part independent of $u$ along the chosen extremal and each $\varphi_i:[t_0,t_1]\to\mathbb R$ is measurable, the functions $\varphi_i$ are the switching functions.
[/definition]
Switching functions are the coefficients that decide which boundary value of the control is favoured. These definitions now assemble into the maximum principle: a necessary condition that gives the state equation, the backward adjoint equation, the pointwise maximization rule, and endpoint transversality in one statement.
[quotetheorem:7623]
[citeproof:7623]
The maximum principle is a necessary condition, not a sufficient condition. Compactness of $U$ and continuity of $H$ are what make the displayed maximum meaningful; for instance, if $U=(0,1)$ and $H(u)=u$, the supremum is $1$ but no admissible control value attains it. The smoothness assumptions justify the linearized state equation and the adjoint equation, while the endpoint nondegeneracy condition rules out the case where separation produces only a vacuous multiplier. The theorem also does not assert existence or uniqueness of an optimal control, nor does it decide whether an extremal satisfying the equations is a minimizer. It reduces the search for candidates to controlled Hamiltonian trajectories, after which boundary conditions, switching structure, and second-order information decide which candidates are optimal.
[example: Scalar Quadratic Tracking]
Consider $\dot{x}=u$ on $[0,1]$, with $u\in\mathbb R$, $x(0)=x_0$, and
\begin{align*}
J[u]=\frac{1}{2}x(1)^2+\frac{1}{2}\int_0^1 u(t)^2\,dt.
\end{align*}
With the normal convention $p_0=-1$, the control Hamiltonian is
\begin{align*}
H(x,p,u)=p u-\frac{1}{2}u^2.
\end{align*}
For fixed $p$, the maximization condition requires maximizing the concave quadratic $u\mapsto pu-\frac{1}{2}u^2$. Its derivative is
\begin{align*}
\frac{\partial H}{\partial u}=p-u.
\end{align*}
Thus the critical point satisfies $p-u=0$, so
\begin{align*}
u^*(t)=p(t).
\end{align*}
Since $\partial^2 H/\partial u^2=-1<0$, this critical point is the global maximizer over $\mathbb R$.
The adjoint equation gives
\begin{align*}
\dot p(t)=-\frac{\partial H}{\partial x}(x(t),p(t),u(t))=0,
\end{align*}
because $H$ has no $x$-dependence. Hence $p(t)=p$ is constant, and therefore $u^*(t)=p$ is constant. Integrating $\dot{x}=p$ from $0$ to $1$ gives
\begin{align*}
x(1)-x(0)=\int_0^1 p\,dt=p.
\end{align*}
Since $x(0)=x_0$, this is
\begin{align*}
x(1)=x_0+p.
\end{align*}
The terminal cost is $\Phi(x)=\frac{1}{2}x^2$, so $\nabla\Phi(x)=x$. By the free-endpoint transversality condition in this sign convention, $p(1)=-\nabla\Phi(x(1))$, hence
\begin{align*}
p=-x(1).
\end{align*}
Substituting $x(1)=x_0+p$ gives
\begin{align*}
p=-(x_0+p).
\end{align*}
Moving the $p$-terms to the left gives
\begin{align*}
2p=-x_0.
\end{align*}
Therefore
\begin{align*}
p=-\frac{x_0}{2},
\end{align*}
and the optimal control candidate is
\begin{align*}
u^*(t)=-\frac{x_0}{2}.
\end{align*}
The corresponding terminal state is
\begin{align*}
x(1)=x_0-\frac{x_0}{2}=\frac{x_0}{2}.
\end{align*}
This agrees with minimizing over constant controls. If $u(t)\equiv u$, then $x(1)=x_0+u$, so the reduced cost is
\begin{align*}
F(u)=\frac{1}{2}(x_0+u)^2+\frac{1}{2}u^2.
\end{align*}
Expanding,
\begin{align*}
F(u)=\frac{1}{2}(x_0^2+2x_0u+u^2)+\frac{1}{2}u^2.
\end{align*}
Combining like terms,
\begin{align*}
F(u)=\frac{1}{2}x_0^2+x_0u+u^2.
\end{align*}
Thus
\begin{align*}
F'(u)=x_0+2u.
\end{align*}
The critical point satisfies $x_0+2u=0$, so
\begin{align*}
u=-\frac{x_0}{2}.
\end{align*}
Since $F''(u)=2>0$, this critical point is the unique minimizer among constant controls, matching the maximum-principle candidate.
[/example]
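The candidate can also be compared with a direct discretization that does not restrict attention to constant controls. The following is a minimal sketch, assuming a piecewise-constant control on $n$ equal cells of width $h=1/n$; the function name `discretized_optimal_control` and the sample value of `x0` are illustrative. The discretized cost is a strictly convex quadratic in the control vector, so its minimizer is found by solving the linear stationarity system.

```python
import numpy as np

def discretized_optimal_control(x0, n=200):
    """Minimize 0.5*(x0 + h*sum(u))**2 + 0.5*h*sum(u**2) over u in R^n, h = 1/n."""
    h = 1.0 / n
    ones = np.ones(n)
    # Stationarity in each component: h*(x0 + h*sum(u)) + h*u_i = 0,
    # which is the linear system (I + h * 1 1^T) u = -x0 * 1.
    return np.linalg.solve(np.eye(n) + h * np.outer(ones, ones), -x0 * ones)

x0 = 3.0                                   # hypothetical initial state
u = discretized_optimal_control(x0)
x_terminal = x0 + u.sum() / len(u)         # x(1) = x0 + integral of u
print(u.min(), u.max(), x_terminal)
```

For this problem the discretized minimizer is constant in time and equal to $-x_0/2$ on every cell, with terminal state $x_0/2$, in agreement with the maximum-principle candidate.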
## Fixed-Time, Free-Time, and Terminal-Constraint Versions
Different optimal control problems differ less in their Hamiltonian equations than in their endpoint conditions. The question in each version is which variations are allowed at the final time and which multipliers must be added to enforce terminal restrictions.
[definition: Transversality Condition]
Let $E$ be the endpoint state space, let $S\subset E$ be the terminal constraint set, and let $\mathcal V\subset T_{x^*(t_1)}E$ be the class of first-order endpoint variations allowed by the terminal constraint and by the admissible control variations. A transversality condition is an endpoint condition on the costate obtained by requiring the first variation of the endpoint Lagrangian to vanish on $\mathcal V$, or to belong to the corresponding normal cone when $S$ is not a smooth manifold.
[/definition]
This definition turns endpoint geometry into a boundary condition for the adjoint equation. It motivates the first endpoint theorem: when the terminal state is free, all endpoint variations are allowed, so the terminal cost determines the final costate without any extra normal multiplier.
[quotetheorem:7624]
[citeproof:7624]
The free endpoint hypothesis is essential because the proof uses arbitrary terminal variations. If instead the terminal state is fixed, as in $\dot{x}=u$ with $x(0)=0$ and $x(1)=1$, there is no condition forcing $p(1)=-\nabla\Phi(x(1))$; the endpoint is not allowed to move. Normality and differentiability of $\Phi$ are also part of the statement: in an abnormal extremal the multiplier of the cost can vanish, and for a nonsmooth terminal cost such as $\Phi(x)=|x|$ at $x=0$ the gradient formula must be replaced by a subdifferential condition. The result therefore gives only the terminal boundary condition for this particular endpoint geometry, and the next theorem explains how the condition changes when the endpoint is restricted to a manifold.
[quotetheorem:7625]
[citeproof:7625]
The smooth embedded-submanifold hypothesis is what makes $T_{x^*(t_1)}M$ and $N_{x^*(t_1)}M$ well-defined linear spaces. If the terminal set has a corner, for example $M=\{(x_1,x_2): x_2\ge |x_1|\}$ at the origin, there is no single tangent space and the condition must be formulated using tangent cones and normal cones. The theorem also does not determine the normal component of $p(t_1)+\nabla\Phi(x^*(t_1))$; it only says that tangent endpoint variations cannot detect it. Terminal manifold transversality controls spatial endpoint variations, while free final time adds a scalar endpoint variation along the trajectory itself, so the next necessary condition involves the Hamiltonian value rather than only the terminal costate.
[quotetheorem:7626]
[citeproof:7626]
The autonomy assumption is what makes the Hamiltonian conserved; if $L(t,x,u)=t$ or $f$ depends explicitly on $t$, differentiating the Hamiltonian produces an extra $\partial H/\partial t$ term. The absence of a terminal-time cost is also essential: adding a term $\psi(t_1)$ to the cost changes the endpoint condition to $H(t_1)-\psi'(t_1)=0$ in the same sign convention, as one checks by writing the time-optimal cost either as $\psi(t_1)=t_1$ with $L\equiv 0$ or as $L\equiv 1$ with $\psi\equiv 0$; both give $p\cdot f=1$ at the terminal time. If the final time is fixed, no variation $\delta t_1$ is allowed, so the scalar Hamiltonian condition need not hold. This condition is often the fastest way to remove spurious extremals in time-optimal problems; when the cost is final time, the running cost is $L\equiv 1$, and the normal Hamiltonian becomes $H=p\cdot f-1$.
[example: Free-Time Integrator]
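Consider the problem of steering $\dot{x}=u$ with $|u|\le 1$ from $x(0)=x_0>0$ to the target $x(t_1)=0$ in minimum time, with the final time $t_1$ free. The normal time-minimization Hamiltonian is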
\begin{align*}
H(x,p,u)=pu-1.
\end{align*}
The Hamiltonian has no $x$-dependence, so the adjoint equation is
\begin{align*}
\dot p(t)=-\frac{\partial H}{\partial x}(x(t),p(t),u(t))=0.
\end{align*}
Hence $p(t)=p$ is constant. For this fixed constant $p$, the maximization condition over $[-1,1]$ requires maximizing $v\mapsto pv-1$. If $p>0$, then $pv-1$ is increasing in $v$, so the maximizer is $v=1$; if $p<0$, then $pv-1$ is decreasing in $v$, so the maximizer is $v=-1$. Thus, for $p\ne 0$,
\begin{align*}
u^*(t)=\operatorname{sgn}(p).
\end{align*}
By the free-final-time Hamiltonian condition, the maximized Hamiltonian must vanish along the extremal. Therefore
\begin{align*}
0=\max_{|v|\le 1}(pv-1)=|p|-1.
\end{align*}
Thus $|p|=1$, so $p=1$ or $p=-1$. If $p=1$, then $u^*(t)=1$, and integrating the state equation gives
\begin{align*}
x(t_1)-x(0)=\int_0^{t_1}1\,dt=t_1,
\end{align*}
so $x(t_1)=x_0+t_1>0$, which cannot satisfy $x(t_1)=0$. Hence the feasible extremal has $p=-1$ and $u^*(t)=-1$. Integrating $\dot x=-1$ from $0$ to $t_1$ gives
\begin{align*}
x(t_1)-x_0=\int_0^{t_1}-1\,dt=-t_1.
\end{align*}
Since $x(t_1)=0$, this becomes
\begin{align*}
-x_0=-t_1.
\end{align*}
Therefore
\begin{align*}
t_1=x_0.
\end{align*}
The optimal motion uses the largest admissible speed toward the origin, and the travel time equals the initial distance.
[/example]
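A quick numerical comparison illustrates the conclusion. The sketch below is restricted to constant controls pointing toward the origin, so it is a consistency check and not a proof of optimality; the value of `x0` and the helper names `arrival_time` and `hamiltonian` are hypothetical.

```python
import numpy as np

def arrival_time(x0, v):
    """Time for x' = v with constant v in [-1, 0) to move from x0 > 0 to 0."""
    return x0 / (-v)

def hamiltonian(p, v):
    """Normal time-minimization Hamiltonian H = p*v - 1 for the integrator."""
    return p * v - 1.0

x0 = 2.5                                   # hypothetical initial distance
speeds = np.linspace(-1.0, -0.05, 20)      # constant controls pointing toward the origin
times = arrival_time(x0, speeds)
best_speed = speeds[np.argmin(times)]

print(best_speed, times.min())             # fastest constant control and its arrival time
print(hamiltonian(-1.0, best_speed))       # Hamiltonian value on the extremal p = -1
```

Among the sampled constant controls, the fastest is the extreme value $v=-1$ with arrival time $x_0$, and the Hamiltonian evaluated at $p=-1$, $u=-1$ vanishes, as the free-final-time condition requires.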
## Bang-Bang Control and Singular Arcs
When the control set has corners, Hamiltonian maximization often selects an extreme point. The main question becomes whether the switching functions cross zero at isolated times, producing bang-bang controls, or vanish on intervals, producing singular arcs.
[definition: Bang-Bang Control]
A control $u:[t_0,t_1]\to U$ is bang-bang if $u(t)$ belongs to the set of extreme points of $U$ for a.e. $t\in[t_0,t_1]$.
[/definition]
This definition identifies the geometric shape of controls selected by a linear Hamiltonian over a box. It motivates the bang-bang principle for linear systems, where the switching functions $B^\top p$ determine which extreme point of the box maximizes the Hamiltonian at each time.
[quotetheorem:7627]
[citeproof:7627]
The box hypothesis is what makes the maximization split into independent scalar endpoint choices. For a non-box control set such as the Euclidean disk $\{u\in\mathbb R^2: |u|\le 1\}$, maximizing a linear function gives a boundary direction $u=(B^\top p)/|B^\top p|$ rather than componentwise upper and lower bounds. The isolated-zero assumption is also necessary: if a switching function vanishes on an interval, the maximum condition does not determine that component there, and the arc may be singular rather than bang-bang. The result is only a switching rule for normal extremals; it does not by itself say how many switches occur or whether the resulting trajectory satisfies the boundary conditions. The double integrator is the standard model where the switching geometry can be computed by hand.
[example: Double Integrator Time-Optimal Control]
Let $x_1$ denote position and $x_2$ velocity, and consider the time-minimization problem
\begin{align*}
\dot{x}_1=x_2,\qquad \dot{x}_2=u,\qquad |u|\le 1
\end{align*}
with terminal state $(x_1,x_2)=(0,0)$. In the normal convention for time minimization, the running cost is $L\equiv 1$, so the Hamiltonian is
\begin{align*}
H(x,p,u)=p_1x_2+p_2u-1.
\end{align*}
The only term depending on $u$ is $p_2u$, so the switching function is $\varphi(t)=p_2(t)$. The adjoint equations are obtained by differentiating $H$ with respect to $x_1$ and $x_2$:
\begin{align*}
\dot{p}_1(t)=-\frac{\partial H}{\partial x_1}=0
\end{align*}
and
\begin{align*}
\dot{p}_2(t)=-\frac{\partial H}{\partial x_2}=-p_1(t).
\end{align*}
Since $\dot p_1=0$, there is a constant $c$ with $p_1(t)=c$. Substituting this into the second adjoint equation gives
\begin{align*}
\dot p_2(t)=-c.
\end{align*}
Integrating from a reference time $t_0$ to $t$ gives
\begin{align*}
p_2(t)-p_2(t_0)=\int_{t_0}^t -c\,ds=-c(t-t_0).
\end{align*}
Hence
\begin{align*}
p_2(t)=p_2(t_0)-c(t-t_0),
\end{align*}
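so $p_2$ is affine in time. An affine function has at most one zero unless it vanishes identically. The identically zero case would require $c=0$ and $p_2\equiv 0$, hence $p\equiv 0$ and $H\equiv -1$, which contradicts the free-final-time condition $H=0$. Therefore a normal extremal has no singular arc and at most one switching time.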
For each fixed time, maximizing $p_2(t)v$ over $-1\le v\le 1$ gives
\begin{align*}
u^*(t)=1\quad \text{if }p_2(t)>0
\end{align*}
and
\begin{align*}
u^*(t)=-1\quad \text{if }p_2(t)<0.
\end{align*}
Thus the control is bang-bang away from zeros of $p_2$, and because $p_2$ has at most one zero, the nonsingular extremal is either all acceleration, all braking, accelerate-then-brake, or brake-then-accelerate.
The switching curve is found by integrating the constant-control arcs that end at the origin. If $u=1$, then $\dot x_2=1$, so along the arc
\begin{align*}
\frac{dx_1}{dx_2}=\frac{\dot x_1}{\dot x_2}=\frac{x_2}{1}=x_2.
\end{align*}
Integrating gives
\begin{align*}
x_1=\frac{1}{2}x_2^2+C.
\end{align*}
The arc passes through $(0,0)$, so $C=0$, and the braking arc with $u=1$ is
\begin{align*}
x_1=\frac{1}{2}x_2^2,\qquad x_2\le 0.
\end{align*}
If $u=-1$, then $\dot x_2=-1$, so
\begin{align*}
\frac{dx_1}{dx_2}=\frac{\dot x_1}{\dot x_2}=\frac{x_2}{-1}=-x_2.
\end{align*}
Integrating gives
\begin{align*}
x_1=-\frac{1}{2}x_2^2+C.
\end{align*}
Again the arc passes through $(0,0)$, so $C=0$, and the braking arc with $u=-1$ is
\begin{align*}
x_1=-\frac{1}{2}x_2^2,\qquad x_2\ge 0.
\end{align*}
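Together these two half-parabolas form the switching curve $x_1=-\frac{1}{2}x_2|x_2|$, which divides the phase plane into two regions. To the right of the curve, where $x_1>-\frac{1}{2}x_2|x_2|$, the unique nonsingular time-optimal candidate applies $u=-1$ until it meets the arc $x_1=\frac{1}{2}x_2^2$, $x_2\le 0$, and then switches to $u=1$. To the left of the curve, it applies $u=1$ until it meets the arc $x_1=-\frac{1}{2}x_2^2$, $x_2\ge 0$, and then switches to $u=-1$.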
[/example]
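The switching geometry of the double integrator can be turned into a small closed-form planner. The following Python sketch is illustrative only: the names `switching_value`, `flow`, `plan`, and `run` are hypothetical, and the switch and arrival times are obtained by intersecting a constant-control parabola with the curve $x_1=-\frac{1}{2}x_2|x_2|$ formed by the two terminal arcs. It assumes the bound $|u|\le 1$ and the target $(0,0)$ from the example.

```python
import math

def switching_value(x1, x2):
    """s = x1 + x2|x2|/2, which vanishes exactly on the two terminal arcs."""
    return x1 + 0.5 * x2 * abs(x2)

def flow(x1, x2, u, t):
    """Exact solution of x1' = x2, x2' = u for constant u over duration t."""
    return x1 + x2 * t + 0.5 * u * t * t, x2 + u * t

def plan(x1, x2):
    """Return (first_control, switch_time, final_time) for the bang-bang candidate."""
    s = switching_value(x1, x2)
    if s == 0.0:
        # Already on a terminal arc: a single constant control reaches the origin.
        u = 0.0 if x2 == 0.0 else -math.copysign(1.0, x2)
        return u, abs(x2), abs(x2)
    sign = 1.0 if s > 0.0 else -1.0
    y1, y2 = sign * x1, sign * x2          # reflect so the first control is -1
    root = math.sqrt(y1 + 0.5 * y2 * y2)   # speed at the switching instant
    return -sign, y2 + root, y2 + 2.0 * root

def run(x1, x2):
    """Follow the plan using the exact arc formulas; return the terminal state."""
    u, t_switch, t_final = plan(x1, x2)
    x1, x2 = flow(x1, x2, u, t_switch)
    return flow(x1, x2, -u, t_final - t_switch)
```

Starting at rest from $(1,0)$, for instance, `plan(1.0, 0.0)` returns first control $-1$, a switch at $t=1$, and arrival at $t=2$. The sketch only constructs the candidate singled out by the maximum principle; it does not by itself prove optimality.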
Zeros of the switching function that persist over an interval require higher-order analysis. Differentiating the switching condition may eventually reveal the control, and the sign of the resulting second variation imposes an additional necessary condition.
[definition: Singular Arc]
For a control-affine problem with switching function $\varphi_i$, an interval $I\subset[t_0,t_1]$ is a singular arc for the component $u_i$ if
\begin{align*}
\varphi_i(t)=0 \quad \text{for all } t\in I.
\end{align*}
[/definition]
On a singular arc the first-order maximization condition is degenerate in the corresponding control direction. The next condition records the sign required for the differentiated stationarity equation to be compatible with a maximum.
[quotetheorem:7628]
The Legendre--Clebsch condition comes from second-variation arguments and is usually developed in a more specialized treatment of singular optimal control. In this course it is used as a diagnostic: it can rule out candidate singular arcs produced by repeated differentiation of the switching function. The hypotheses matter. If the first control-dependent derivative is odd, the usual even-order sign test is not the right object; if $\partial(d^{2r}\varphi/dt^{2r})/\partial u=0$, differentiating has not yet solved for the singular control; and if the extremal is abnormal, the sign convention can change because the running-cost multiplier has vanished.
[example: Singular Control in a Scalar Affine System]
Consider the scalar affine system
\begin{align*}
\dot{x}=u,\qquad u\in[-1,1],
\end{align*}
with running cost $L(x)=\frac{1}{2}x^2$ and normal Hamiltonian
\begin{align*}
H(x,p,u)=pu-\frac{1}{2}x^2.
\end{align*}
Because the Hamiltonian is affine in $u$, the coefficient of $u$ is the switching function:
\begin{align*}
\varphi(t)=p(t).
\end{align*}
The adjoint equation is
\begin{align*}
\dot p(t)=-\frac{\partial H}{\partial x}(x(t),p(t),u(t)).
\end{align*}
Since
\begin{align*}
\frac{\partial H}{\partial x}(x,p,u)=\frac{\partial}{\partial x}\left(pu-\frac{1}{2}x^2\right)=-x,
\end{align*}
we get
\begin{align*}
\dot p(t)=x(t).
\end{align*}
On a singular arc, $\varphi(t)=0$ throughout the interval, so
\begin{align*}
p(t)=0.
\end{align*}
Differentiating this identity along the arc gives
\begin{align*}
\dot p(t)=0.
\end{align*}
Combining this with $\dot p(t)=x(t)$ gives
\begin{align*}
x(t)=0.
\end{align*}
To keep $x(t)=0$ on the same interval, differentiate once more along the state equation:
\begin{align*}
0=\dot x(t)=u(t).
\end{align*}
Thus the singular control candidate is
\begin{align*}
u(t)=0.
\end{align*}
Equivalently, the successive derivatives of the switching function are
\begin{align*}
\varphi(t)=p(t),
\end{align*}
\begin{align*}
\dot\varphi(t)=\dot p(t)=x(t),
\end{align*}
and
\begin{align*}
\ddot\varphi(t)=\dot x(t)=u(t).
\end{align*}
The control first appears in the second derivative of the switching function, and setting that derivative equal to zero yields the equilibrium singular arc $x(t)=0$, $u(t)=0$.
[/example]
A second applied setting where bang-bang and singular behavior meet is minimum-fuel control. Fuel costs usually involve $|u|$, so the Hamiltonian has flat regions where turning the actuator off is optimal and boundary regions where full thrust is optimal.
[example: Minimum-Fuel Spacecraft Reorientation]
For the simplified rotational model $\dot{x}=Ax+Bu$ with torque bounds $|u_i|\le u_{\max}$, the normal minimum-fuel Hamiltonian is
\begin{align*}
H(x,p,u)=p\cdot(Ax+Bu)-\sum_{i=1}^m |u_i|.
\end{align*}
Using $p\cdot Bu=(B^\top p)\cdot u$, this becomes
\begin{align*}
H(x,p,u)=p\cdot Ax+\sum_{i=1}^m\left((B^\top p)_i u_i-|u_i|\right).
\end{align*}
The term $p\cdot Ax$ is independent of $u$, so maximizing $H$ over the box $|u_i|\le u_{\max}$ separates into the scalar problems
\begin{align*}
\max_{|u_i|\le u_{\max}}\left(\alpha_i u_i-|u_i|\right),\qquad \alpha_i:=(B^\top p)_i.
\end{align*}
Fix one component and write $h(u)=\alpha u-|u|$ on $[-u_{\max},u_{\max}]$. If $u\ge 0$, then $|u|=u$, so
\begin{align*}
h(u)=\alpha u-u=(\alpha-1)u.
\end{align*}
If $u\le 0$, then $|u|=-u$, so
\begin{align*}
h(u)=\alpha u+u=(\alpha+1)u.
\end{align*}
When $|\alpha|<1$, we have $\alpha-1<0$ and $\alpha+1>0$. Thus $(\alpha-1)u\le 0$ for $u\in[0,u_{\max}]$, while $(\alpha+1)u\le 0$ for $u\in[-u_{\max},0]$. Since $h(0)=0$, the unique maximizer is
\begin{align*}
u_i^*=0.
\end{align*}
When $\alpha>1$, the expression $(\alpha-1)u$ is increasing on $[0,u_{\max}]$, so its maximum there is attained at $u=u_{\max}$. On $[-u_{\max},0]$, the expression $(\alpha+1)u$ is at most $0$, while
\begin{align*}
h(u_{\max})=(\alpha-1)u_{\max}>0.
\end{align*}
Therefore
\begin{align*}
u_i^*=u_{\max}.
\end{align*}
When $\alpha<-1$, the expression $(\alpha+1)u$ is maximized on $[-u_{\max},0]$ at $u=-u_{\max}$, and
\begin{align*}
h(-u_{\max})=(\alpha+1)(-u_{\max})=-(\alpha+1)u_{\max}>0.
\end{align*}
Hence
\begin{align*}
u_i^*=-u_{\max}.
\end{align*}
At the boundary $\alpha=1$, the positive side gives
\begin{align*}
h(u)=(1-1)u=0\qquad \text{for }0\le u\le u_{\max},
\end{align*}
while the negative side gives $h(u)=2u\le 0$. Thus every $u\in[0,u_{\max}]$ maximizes that component. At the boundary $\alpha=-1$, every $u\in[-u_{\max},0]$ maximizes that component. Consequently, for each actuator,
\begin{align*}
u_i^*=0\quad \text{if } |(B^\top p)_i|<1,
\end{align*}
\begin{align*}
u_i^*=u_{\max}\operatorname{sgn}((B^\top p)_i)\quad \text{if } |(B^\top p)_i|>1.
\end{align*}
The surfaces $|(B^\top p)_i|=1$ are precisely where coast arcs and full-thrust arcs can meet, which is the switching mechanism behind thrust-coast-thrust minimum-fuel maneuvers.
[/example]
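The per-actuator rule derived in the minimum-fuel example is a dead-zone map, and it is short enough to check by brute force. The Python sketch below is a minimal illustration; the function names are hypothetical, and the grid search is only a sanity check of the scalar maximization of $\alpha u-|u|$, not part of any solver.

```python
def min_fuel_control(alpha, u_max):
    """Maximizer of alpha*u - |u| on [-u_max, u_max] when |alpha| != 1."""
    if abs(alpha) < 1.0:
        return 0.0            # coast: thrust costs more than it gains
    if alpha > 1.0:
        return u_max          # full positive thrust
    if alpha < -1.0:
        return -u_max         # full negative thrust
    return None               # |alpha| = 1: the maximizer is not unique

def brute_force_maximizer(alpha, u_max, n=401):
    """Grid search for the maximizer of h(u) = alpha*u - |u| on [-u_max, u_max]."""
    grid = [-u_max + 2.0 * u_max * k / (n - 1) for k in range(n)]
    return max(grid, key=lambda u: alpha * u - abs(u))
```

Returning `None` at $|\alpha|=1$ is a design choice for this sketch: it makes explicit that the maximum condition alone does not select a control on the switching surfaces.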
The maximum principle therefore gives a workflow rather than a closed-form solver. Write the Hamiltonian, derive the adjoint equation and transversality conditions, maximize over the control set, analyze switching functions, and then enforce the boundary conditions.
Pontryagin's maximum principle converted optimality into a boundary-value problem with adjoint dynamics and Hamiltonian maximization. The next chapter gives the complementary dynamic-programming viewpoint, using the value function and the Hamilton-Jacobi-Bellman equation to characterize optimal control globally.
# 8. Dynamic Programming and HJB Equations
Dynamic programming treats optimal control by asking how the best achievable cost changes when the initial time and state are changed. In the preceding parts of the course, Pontryagin's maximum principle converted an optimization problem into necessary conditions along an optimal trajectory. This chapter develops the complementary viewpoint: the value function stores the global optimum, and its local infinitesimal form is the Hamilton-Jacobi-Bellman equation. The resulting theory explains why Riccati equations appear in linear-quadratic regulation and gives a verification method for synthesising feedback controls from smooth solutions.
## Value Functions and Bellman Optimality
The central question is how to express an optimal control problem from any intermediate state without restarting the whole calculation. If a controller is optimal from time $t$ and state $x$, then after following it for a short time, the remaining part should still be optimal for the new state. Dynamic programming turns this consistency requirement into an equation for the value function.
Consider a controlled system
\begin{align*}
\dot{x}(s) &= f(x(s), u(s)), \qquad x(t)=x,
\end{align*}
where $x(s) \in \mathbb R^n$, $u(s) \in U \subset \mathbb R^m$, and admissible controls are measurable maps $u:[t,T]\to U$ for which the state equation has a unique trajectory. Let the running cost be $\ell:\mathbb R^n\times U\to \mathbb R$ and the terminal cost be $g:\mathbb R^n\to \mathbb R$.
[definition: Finite-Horizon Value Function]
Let $\mathcal A[t,T]$ denote the class of measurable controls $u:[t,T]\to U$ for which the initial-value problem $\dot{x}(s)=f(x(s),u(s))$, $x(t)=x$, has a unique admissible trajectory. The finite-horizon value function is the map $V:[0,T]\times\mathbb R^n\to \mathbb R\cup\{\pm\infty\}$ defined by
\begin{align*}
V(t,x)=\inf_{u\in\mathcal A[t,T]}\left\{\int_t^T \ell(x_u(s),u(s))\,ds + g(x_u(T))\right\},
\end{align*}
where $x_u(\cdot)$ denotes the trajectory generated by $u$ from $x_u(t)=x$.
[/definition]
The value function packages the whole family of optimization problems indexed by initial data. Its dependence on $t$ is backward: when $t$ is close to $T$, there is less time to influence the terminal cost, while when $t$ is earlier, the controller has more freedom to trade running cost against terminal performance.
[example: Scalar Finite-Horizon Regulator]
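Let $\dot{x}=u$ with $u\in\mathbb R$, running cost $\ell(x,u)=\frac{1}{2}u^2$, and terminal cost $g(x)=\frac{q}{2}x^2$ with $q>0$. The HJB equation derived in the next section reads $\partial_t V+\inf_{a\in\mathbb R}\{\frac{1}{2}a^2+a\,\partial_x V\}=0$ for this problem. For a quadratic candidate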
\begin{align*}
V(t,x)=\frac{1}{2}P(t)x^2,
\end{align*}
we have
\begin{align*}
\partial_t V(t,x)=\frac{1}{2}\dot P(t)x^2
\end{align*}
and
\begin{align*}
\partial_x V(t,x)=P(t)x.
\end{align*}
The pointwise control minimization is
\begin{align*}
\inf_{a\in\mathbb R}\left\{\frac{1}{2}a^2+aP(t)x\right\}.
\end{align*}
Completing the square gives
\begin{align*}
\frac{1}{2}a^2+aP(t)x=\frac{1}{2}\left(a+P(t)x\right)^2-\frac{1}{2}P(t)^2x^2,
\end{align*}
so the minimum is attained at
\begin{align*}
a=-P(t)x
\end{align*}
and its value is
\begin{align*}
-\frac{1}{2}P(t)^2x^2.
\end{align*}
Thus the quadratic ansatz satisfies the HJB equation precisely when
\begin{align*}
\frac{1}{2}\dot P(t)x^2-\frac{1}{2}P(t)^2x^2=0
\end{align*}
for every $x$, equivalently
\begin{align*}
\dot P(t)=P(t)^2.
\end{align*}
Writing this backward from the terminal condition $V(T,x)=\frac{q}{2}x^2$ gives
\begin{align*}
-\dot P(t)=-P(t)^2, \qquad P(T)=q.
\end{align*}
Let
\begin{align*}
P(t)=\frac{q}{1+q(T-t)}.
\end{align*}
Then $P(T)=q$, and differentiating the displayed formula gives
\begin{align*}
\dot P(t)=\frac{q^2}{(1+q(T-t))^2}.
\end{align*}
Since
\begin{align*}
P(t)^2=\frac{q^2}{(1+q(T-t))^2},
\end{align*}
we have $\dot P(t)=P(t)^2$, so this $P$ solves the Riccati equation. The optimal feedback obtained from the pointwise minimizer is therefore
\begin{align*}
u^*(t,x)=-P(t)x=-\frac{q}{1+q(T-t)}x.
\end{align*}
The factor $P(t)$ increases as $t$ approaches $T$, so the value function records both the remaining time horizon and the strength $q$ of the terminal penalty.
[/example]
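The claim that $\frac{1}{2}P(t)x^2$ is the optimal cost of the scalar regulator can be checked by simulating the closed loop. The Python sketch below is a minimal illustration under the example's data (running cost $\frac{1}{2}u^2$, terminal cost $\frac{q}{2}x^2$); the function names and the classical fourth-order Runge-Kutta integrator are choices made here for illustration, not part of the notes.

```python
def gain(t, T, q):
    """Closed-form Riccati solution P(t) = q / (1 + q (T - t))."""
    return q / (1.0 + q * (T - t))

def rollout_cost(x0, T, q, feedback, steps=2000):
    """Integrate x' = u and J' = u^2 / 2 with RK4, then add the terminal cost."""
    def rhs(t, x):
        u = feedback(t, x)
        return u, 0.5 * u * u

    h = T / steps
    x, J = x0, 0.0
    for k in range(steps):
        t = k * h
        k1 = rhs(t, x)
        k2 = rhs(t + 0.5 * h, x + 0.5 * h * k1[0])
        k3 = rhs(t + 0.5 * h, x + 0.5 * h * k2[0])
        k4 = rhs(t + h, x + h * k3[0])
        x += h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        J += h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return J + 0.5 * q * x * x

T, q, x0 = 1.0, 2.0, 1.5
optimal = rollout_cost(x0, T, q, lambda t, x: -gain(t, T, q) * x)
predicted = 0.5 * gain(0.0, T, q) * x0 * x0
do_nothing = rollout_cost(x0, T, q, lambda t, x: 0.0)
```

With these values the simulated closed-loop cost agrees with $\frac{1}{2}P(0)x_0^2$ up to integration error, while the zero control pays the full terminal penalty $\frac{q}{2}x_0^2$.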
The finite-horizon regulator suggests that optimality should be stable under cutting the time interval into pieces. The obstruction is that an apparently optimal plan could fail to remain optimal after an intermediate time if its tail were not optimal for the state it reaches; then local decisions could not be assembled into a global optimum. Dynamic programming identifies the exact recursive condition that rules out this inconsistency and later becomes the source of the HJB equation.
[quotetheorem:7629]
[citeproof:7629]
The principle says that a global optimum has no preferred starting time. The restriction and concatenation hypotheses are not cosmetic: if a control class imposed a global smoothness or communication constraint that failed after splicing two controls at $\tau$, then the right-hand side could use pieces that do not form an admissible full control. The $\varepsilon$-optimal tail assumption is the replacement for an actual minimizer; without it, the formula can fail to produce the reverse inequality when the infimum is approached but not attained. The theorem also does not assert that an optimal control exists or that $V$ is smooth. It only gives the exact recursive identity from which the short-time HJB calculation will be derived.
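A quick numerical check of the recursive identity is possible on the scalar regulator, assuming the principle takes the usual form $V(t,x)=\inf_u\{\int_t^\tau \ell\,ds+V(\tau,x_u(\tau))\}$. The Python sketch below restricts the head control on $[t,\tau]$ to constants, which in general gives only an upper bound; for this particular problem the optimal control happens to be constant, so the bound is attained. All function names are hypothetical.

```python
def value(t, x, T, q):
    """Value function of the scalar regulator: V = P(t) x^2 / 2."""
    return 0.5 * q / (1.0 + q * (T - t)) * x * x

def one_step_bellman(t, tau, x, T, q):
    """Minimize a^2 (tau - t) / 2 + V(tau, x + a (tau - t)) over constants a."""
    h = tau - t
    p_tau = q / (1.0 + q * (T - tau))
    # Stationary point of the convex quadratic in a.
    a_star = -p_tau * x / (1.0 + p_tau * h)
    head = 0.5 * a_star * a_star * h
    tail = value(tau, x + a_star * h, T, q)
    return head + tail, a_star

lhs = value(0.2, 1.3, 1.0, 2.0)
rhs, a_star = one_step_bellman(0.2, 0.7, 1.3, 1.0, 2.0)
```

Here `lhs` and `rhs` agree, and `a_star` coincides with the feedback $-P(t)x$ evaluated at the starting point, which is consistent with the tail of an optimal plan being optimal from the state it reaches.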
[remark: Time Direction]
The value function in a terminal-cost problem satisfies a backward equation. The terminal condition is known at $T$, and the HJB equation propagates information from $T$ toward earlier times.
[/remark]
## Derivation of the Hamilton-Jacobi-Bellman Equation
The next problem is to extract a differential equation from Bellman's identity. If $V$ is smooth and the dynamic programming principle is applied over a short interval $[t,t+h]$, then the first-order expansion of the value at the new state must balance the running cost paid over that interval.
Assume for this derivation that $V\in C^1([0,T]\times\mathbb R^n)$ and that controls can be held constant over a short interval. For a fixed $a\in U$, the trajectory satisfies $x(t+h)=x+h f(x,a)+o(h)$, and dynamic programming gives
\begin{align*}
V(t,x) \le \int_t^{t+h}\ell(x(s),a)\,ds + V(t+h,x(t+h)).
\end{align*}
Expanding the right-hand side and cancelling $V(t,x)$ yields
\begin{align*}
0 \le h\left(\ell(x,a)+\partial_t V(t,x)+\nabla_x V(t,x)\cdot f(x,a)\right)+o(h).
\end{align*}
Optimizing over $a$ gives the formal differential equation.
[definition: Control Hamiltonian]
The control Hamiltonian associated with the running cost $\ell$ and dynamics $f$ is the map $H:\mathbb R^n\times\mathbb R^n\to \mathbb R\cup\{\pm\infty\}$ defined by
\begin{align*}
H(x,p)=\inf_{a\in U}\{\ell(x,a)+p\cdot f(x,a)\}.
\end{align*}
[/definition]
With this convention, minimization problems produce a Hamiltonian that already contains the pointwise choice of control. The natural next step is to replace the infinitesimal Bellman inequality by a PDE whose vanishing expresses that no admissible instantaneous control can improve the value.
[quotetheorem:7630]
[citeproof:7630]
The HJB equation is nonlinear even when the dynamics are linear, because the minimization over controls usually depends on the gradient of $V$. The differentiability hypothesis is the major limitation: for example, if two different terminal routes are equally good, the value function may have a kink where the preferred route switches, and the displayed PDE is then not valid in the classical pointwise sense. The short-time admissibility and approximate-minimizer assumptions also matter, since the derivation tests the value function by freezing a control over $[t,t+h]$ and comparing it with nearly optimal controls over the same interval. A concrete failure occurs when $U$ is not compact and the infimum over $[t,t+h]$ is approached only by controls escaping to the boundary of $U$ or to unbounded amplitudes: the first-order inequality may identify a formal Hamiltonian value that no admissible short-time control realizes with an $o(h)$ error. Measurability can fail in a related way if the set of near minimizers jumps too irregularly with the reached state, so that the proposed local choices cannot be assembled into an admissible control. Thus the theorem is best read as a classical calculation under regularity; nonsmooth value functions require the viscosity interpretation developed after this chapter. When the minimizing control can be written as a function of $(t,x)$, the same equation becomes a feedback design tool.
[example: Nonlinear Scalar Minimum-Effort Problem]
Let $\dot{x}=u$, $u\in\mathbb R$, and
\begin{align*}
J_{t,x}[u]=\int_t^T \frac{1}{2}u(s)^2\,ds + g(x(T)).
\end{align*}
Here $f(x,a)=a$ and $\ell(x,a)=\frac{1}{2}a^2$, so for a momentum variable $p$ the Hamiltonian is
\begin{align*}
H(x,p)=\inf_{a\in\mathbb R}\left\{\frac{1}{2}a^2+pa\right\}.
\end{align*}
For each fixed $p$, complete the square:
\begin{align*}
\frac{1}{2}a^2+pa=\frac{1}{2}\left(a^2+2pa\right)=\frac{1}{2}\left((a+p)^2-p^2\right)=\frac{1}{2}(a+p)^2-\frac{1}{2}p^2.
\end{align*}
Since $\frac{1}{2}(a+p)^2\ge 0$ for all $a\in\mathbb R$, the smallest value occurs exactly when $a+p=0$, that is, when $a=-p$. Substituting $a=-p$ gives
\begin{align*}
\frac{1}{2}(-p)^2+p(-p)=\frac{1}{2}p^2-p^2=-\frac{1}{2}p^2,
\end{align*}
and therefore
\begin{align*}
H(x,p)=-\frac{1}{2}p^2.
\end{align*}
Substituting $p=\partial_x V(t,x)$ into the HJB equation $\partial_t V(t,x)+H(x,\partial_x V(t,x))=0$ gives
\begin{align*}
\partial_t V(t,x)-\frac{1}{2}\left(\partial_x V(t,x)\right)^2=0,
\end{align*}
with terminal condition
\begin{align*}
V(T,x)=g(x).
\end{align*}
The pointwise minimizing control is $a=-p$, so wherever $V$ is differentiable the feedback suggested by the equation is
\begin{align*}
u^*(t,x)=-\partial_x V(t,x).
\end{align*}
Thus the spatial derivative of the value function is not just a sensitivity: in this minimum-effort problem it directly determines the optimal control direction and magnitude.
[/example]
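To see the feedback $u^*=-\partial_x V$ for a non-quadratic terminal cost, one can take $g(y)=|y|$, a choice made here purely for illustration. For $\dot x=u$ with running cost $\frac{1}{2}u^2$, the cheapest way to reach an endpoint $y$ at time $T$ is at constant speed, with cost $(y-x)^2/(2(T-t))$, so $V(t,x)=\min_y\{(y-x)^2/(2(T-t))+|y|\}$. The Python sketch below compares the resulting closed form with a brute-force search and estimates the HJB residual by central differences; the function names and grid sizes are hypothetical.

```python
def value(t, x, T):
    """Closed form of min_y (y - x)^2 / (2 (T - t)) + |y| for t < T."""
    tau = T - t
    return x * x / (2.0 * tau) if abs(x) <= tau else abs(x) - 0.5 * tau

def value_by_search(t, x, T, half_width=3.0, n=6000):
    """Brute-force the same minimum over a grid of endpoints y."""
    tau = T - t
    ys = (-half_width + 2.0 * half_width * k / n for k in range(n + 1))
    return min((y - x) ** 2 / (2.0 * tau) + abs(y) for y in ys)

def hjb_residual(t, x, T, eps=1e-5):
    """Central-difference estimate of dV/dt - (dV/dx)^2 / 2."""
    v_t = (value(t + eps, x, T) - value(t - eps, x, T)) / (2.0 * eps)
    v_x = (value(t, x + eps, T) - value(t, x - eps, T)) / (2.0 * eps)
    return v_t - 0.5 * v_x * v_x

def feedback(t, x, T):
    """u* = -dV/dx, a saturated linear law for this terminal cost."""
    tau = T - t
    return -max(-1.0, min(1.0, x / tau))
```

In this illustration the feedback is linear near the origin and saturates at unit speed far from it, which shows how the gradient of the value function sets both the direction and the magnitude of the control.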
The minimum-effort example has a fixed terminal time, so the boundary condition enters at $T$. Many control problems instead end when the state reaches a target or leaves a safe region, and then the value function must remember where exit occurs rather than when a prescribed clock stops. This motivates an exit-time version of the same dynamic programming construction.
[definition: Exit-Time Value Function]
Let $D\subset\mathbb R^n$ be open. For $x\in D$, let $\mathcal A_D[x]$ denote the class of measurable controls $u:[0,\infty)\to U$ for which the trajectory $x_u(\cdot)$ with $x_u(0)=x$ is uniquely defined up to the exit time
\begin{align*}
\tau_D^u(x):=\inf\{s\ge 0:x_u(s)\notin D\}.
\end{align*}
Only controls with $\tau_D^u(x)<\infty$ are included in $\mathcal A_D[x]$. If no such finite-exit control is admissible, the infimum is taken over the empty set and the value is $+\infty$. The exit-time value function is the map $V:D\to \mathbb R\cup\{\pm\infty\}$ defined by
\begin{align*}
V(x)=\inf_{u\in\mathcal A_D[x]}\left\{\int_0^{\tau_D^u(x)}\ell(x_u(s),u(s))\,ds+g(x_u(\tau_D^u(x)))\right\}.
\end{align*}
[/definition]
The boundary condition is imposed on $\partial D$ rather than at a fixed final time. Inside $D$, the stationary version of dynamic programming leads to $H(x,\nabla V(x))=0$ under the same smoothness assumptions.
[example: Optimal Stopping-Style Exit Cost]
Let $D=(-1,1)$, $\dot{x}=u$, and running cost $1+\frac{1}{2}u^2$, with boundary cost $g(-1)=g(1)=0$. For a smooth stationary candidate $V$, the Hamiltonian at an interior point is
\begin{align*}
\inf_{a\in\mathbb R}\left\{1+\frac{1}{2}a^2+aV'(x)\right\}.
\end{align*}
For fixed $x$, complete the square in $a$:
\begin{align*}
1+\frac{1}{2}a^2+aV'(x)=1+\frac{1}{2}\left(a^2+2aV'(x)\right).
\end{align*}
\begin{align*}
1+\frac{1}{2}\left(a^2+2aV'(x)\right)=1+\frac{1}{2}\left((a+V'(x))^2-(V'(x))^2\right).
\end{align*}
\begin{align*}
1+\frac{1}{2}\left((a+V'(x))^2-(V'(x))^2\right)=1+\frac{1}{2}(a+V'(x))^2-\frac{1}{2}(V'(x))^2.
\end{align*}
Since $\frac{1}{2}(a+V'(x))^2\ge 0$, the minimum is attained at $a=-V'(x)$ and has value
\begin{align*}
1-\frac{1}{2}(V'(x))^2.
\end{align*}
Thus the stationary HJB equation is
\begin{align*}
0=1-\frac{1}{2}(V'(x))^2,
\end{align*}
so on every interval where $V$ is classically differentiable,
\begin{align*}
(V'(x))^2=2.
\end{align*}
The boundary conditions $V(-1)=V(1)=0$ and nonnegative running cost suggest the smaller of the two costs for exiting left or right:
\begin{align*}
V(x)=\sqrt{2}\min\{x+1,1-x\}=\sqrt{2}(1-|x|).
\end{align*}
For $-1<x<0$, this gives
\begin{align*}
V(x)=\sqrt{2}(x+1),
\end{align*}
so
\begin{align*}
V'(x)=\sqrt{2}.
\end{align*}
Substituting into the stationary equation gives
\begin{align*}
1-\frac{1}{2}(V'(x))^2=1-\frac{1}{2}(\sqrt{2})^2=1-1=0.
\end{align*}
For $0<x<1$, the same candidate is
\begin{align*}
V(x)=\sqrt{2}(1-x),
\end{align*}
so
\begin{align*}
V'(x)=-\sqrt{2}.
\end{align*}
Again,
\begin{align*}
1-\frac{1}{2}(V'(x))^2=1-\frac{1}{2}(-\sqrt{2})^2=1-1=0.
\end{align*}
At the boundary,
\begin{align*}
V(-1)=\sqrt{2}(1-|-1|)=0
\end{align*}
and
\begin{align*}
V(1)=\sqrt{2}(1-|1|)=0.
\end{align*}
The left derivative at $0$ is $\sqrt{2}$, while the right derivative at $0$ is $-\sqrt{2}$, so this candidate is not classically differentiable at the centre. That cusp records the fact that from $x=0$ the left and right exits have the same optimal cost, and the HJB equation must be interpreted in the viscosity sense there.
[/example]
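The candidate $V(x)=\sqrt{2}(1-|x|)$ can be cross-checked against a direct cost computation. The Python sketch below assumes a restricted family of controls, namely constant speed toward the nearer endpoint of $D=(-1,1)$, so it produces upper bounds on the value; the function names and the speed grid are hypothetical.

```python
import math

def exit_cost(x, speed):
    """Cost of leaving (-1, 1) from x at constant speed toward the nearer endpoint."""
    distance = 1.0 - abs(x)
    exit_time = distance / speed
    return (1.0 + 0.5 * speed * speed) * exit_time

def candidate_value(x):
    """The candidate V(x) = sqrt(2) (1 - |x|) from the stationary HJB equation."""
    return math.sqrt(2.0) * (1.0 - abs(x))

def best_speed(x, speeds):
    """Speed in the given list with the smallest exit cost."""
    return min(speeds, key=lambda s: exit_cost(x, s))

speeds = [0.01 * k for k in range(1, 501)]
x = 0.3
s_best = best_speed(x, speeds)
```

On this grid the best speed lies next to $\sqrt{2}=|V'(x)|$, and the cost at speed $\sqrt{2}$ equals the candidate value, in line with the pointwise minimizer $a=-V'(x)$.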
## Verification Theorems and Smooth Optimal Synthesis
The HJB equation was derived from an already known value function. The synthesis problem runs in the opposite direction: if we find a smooth function solving the HJB equation, when does it equal the value function, and how do we recover an optimal control?
The key idea is to compare the candidate function along arbitrary controlled trajectories. The chain rule and the HJB inequality show that no admissible control can beat the candidate, while a control attaining the Hamiltonian makes the inequalities equalities.
[quotetheorem:7631]
[citeproof:7631]
Verification is powerful because it replaces an infinite-dimensional optimization problem by solving a PDE and minimizing a finite-dimensional Hamiltonian pointwise. Each hypothesis has a concrete role. If $W$ is not differentiable along trajectories, the chain-rule integration that gives the lower bound is unavailable. If the Hamiltonian infimum is not attained, for instance when $U$ is open and the minimizing sequence approaches a control outside $U$, the theorem may prove only $W\le V$ without producing an optimal feedback. If the feedback is discontinuous enough to make the closed-loop ODE nonunique, the formula $u^*(s)=a^*(s,x^*(s))$ does not identify a well-defined controlled trajectory. These are exactly the obstructions that lead from classical verification to viscosity solutions and relaxed or measurable-selection methods.
[example: Smooth Synthesis for a Quadratic Candidate]
Suppose $\dot{x}=Ax+Bu$ and
\begin{align*}
J_{t,x}[u]=\int_t^T \left(x(s)^\top Qx(s)+u(s)^\top Ru(s)\right)\,ds+x(T)^\top Sx(T),
\end{align*}
where $Q,S$ are symmetric positive semidefinite and $R$ is symmetric positive definite. We test a symmetric quadratic candidate
\begin{align*}
W(t,x)=x^\top P(t)x,
\end{align*}
with terminal condition $P(T)=S$. Differentiating in time gives
\begin{align*}
\partial_t W(t,x)=x^\top \dot P(t)x.
\end{align*}
Since $P(t)$ is symmetric,
\begin{align*}
\nabla_x W(t,x)=(P(t)+P(t)^\top)x=2P(t)x.
\end{align*}
The Hamiltonian expression in the HJB equation is
\begin{align*}
x^\top Qx+u^\top Ru+(2P(t)x)^\top(Ax+Bu).
\end{align*}
Separate the terms depending on $u$:
\begin{align*}
x^\top Qx+2x^\top P(t)Ax+u^\top Ru+2x^\top P(t)Bu.
\end{align*}
For fixed $(t,x)$, complete the square in $u$:
\begin{align*}
u^\top Ru+2x^\top P(t)Bu=\left(u+R^{-1}B^\top P(t)x\right)^\top R\left(u+R^{-1}B^\top P(t)x\right)-x^\top P(t)BR^{-1}B^\top P(t)x.
\end{align*}
Indeed, expanding the square gives
\begin{align*}
\left(u+R^{-1}B^\top P(t)x\right)^\top R\left(u+R^{-1}B^\top P(t)x\right)=u^\top Ru+2x^\top P(t)Bu+x^\top P(t)BR^{-1}B^\top P(t)x.
\end{align*}
Because $R$ is positive definite, the square term is minimized exactly when
\begin{align*}
u+R^{-1}B^\top P(t)x=0.
\end{align*}
Thus the pointwise minimizing control is
\begin{align*}
u^*(t,x)=-R^{-1}B^\top P(t)x,
\end{align*}
and the minimized Hamiltonian contribution is
\begin{align*}
x^\top Qx+2x^\top P(t)Ax-x^\top P(t)BR^{-1}B^\top P(t)x.
\end{align*}
Substituting this into the HJB equation gives
\begin{align*}
0=x^\top \dot P(t)x+x^\top Qx+2x^\top P(t)Ax-x^\top P(t)BR^{-1}B^\top P(t)x.
\end{align*}
Since $x^\top P(t)Ax$ is a scalar,
\begin{align*}
x^\top P(t)Ax=\left(x^\top P(t)Ax\right)^\top=x^\top A^\top P(t)x.
\end{align*}
Therefore
\begin{align*}
2x^\top P(t)Ax=x^\top\left(P(t)A+A^\top P(t)\right)x.
\end{align*}
So the HJB equation becomes
\begin{align*}
0=x^\top\left(\dot P(t)+A^\top P(t)+P(t)A-P(t)BR^{-1}B^\top P(t)+Q\right)x.
\end{align*}
For this to hold for every $x$, the symmetric matrix inside the quadratic form must vanish:
\begin{align*}
\dot P(t)+A^\top P(t)+P(t)A-P(t)BR^{-1}B^\top P(t)+Q=0.
\end{align*}
Equivalently,
\begin{align*}
-\dot P(t)=A^\top P(t)+P(t)A-P(t)BR^{-1}B^\top P(t)+Q,
\end{align*}
with $P(T)=S$. Thus the usual finite-horizon LQR feedback law appears by minimizing the Hamiltonian for a quadratic HJB candidate, and the Riccati equation is exactly the condition that the quadratic candidate satisfy the PDE.
[/example]
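In the scalar case $A=a$, $B=b$, $Q=q$, $R=r$, the Riccati equation reduces to $-\dot P=2aP-\frac{b^2}{r}P^2+q$ and can be integrated backward from $P(T)=S$ in a few lines. The Python sketch below is a minimal illustration: the function names, the reversed-time variable $s=T-t$, and the Runge-Kutta integrator are choices made here, and the scalar restriction is an assumption that avoids matrix code.

```python
import math

def riccati_backward(a, b, q, r, S, T, steps=20000):
    """RK4 for dP/ds = 2 a P - (b^2 / r) P^2 + q in reversed time s = T - t."""
    def rhs(P):
        return 2.0 * a * P - (b * b / r) * P * P + q

    h = T / steps
    P = S                      # P at s = 0, that is, at t = T
    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs(P + 0.5 * h * k1)
        k3 = rhs(P + 0.5 * h * k2)
        k4 = rhs(P + h * k3)
        P += h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return P                   # approximates P at t = 0

def feedback_gain(P, b, r):
    """Scalar version of R^{-1} B^T P, so that u = -gain * x."""
    return b * P / r

# Long horizon: P(0) approaches the positive root of 2 a P - b^2 P^2 / r + q = 0.
p_long = riccati_backward(1.0, 1.0, 1.0, 1.0, 0.0, 10.0)
# Pure integrator without state cost: P(t) = S / (1 + S (T - t)).
p_short = riccati_backward(0.0, 1.0, 0.0, 1.0, 2.0, 1.0)
```

With $a=b=q=r=1$ the long-horizon value approaches $1+\sqrt{2}$. The second call reproduces the form $S/(1+S(T-t))$ of the scalar regulator, up to the different normalization of the cost used in this example.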
The connection to Riccati equations is more than a computational coincidence. In LQR, the quadratic ansatz is preserved because linear dynamics and quadratic costs make the HJB equation close on quadratic polynomials. For nonlinear systems, the same equation is usually a fully nonlinear PDE, and exact smooth synthesis is rare.
[remark: Classical and Viscosity Viewpoints]
Classical verification applies when the candidate value function is differentiable enough for the chain rule. When shocks, switching surfaces, or exit boundaries create nonsmooth value functions, Chapter 9 interprets the HJB equation in the viscosity sense rather than discarding it.
[/remark]
The chapter ends with the main conceptual loop. Dynamic programming defines $V$ through optimal subproblems, the small-time limit gives HJB, and verification turns a smooth HJB solution back into an optimal feedback. This loop is one of the central bridges between nonlinear control theory, PDE, and numerical optimal control.
Dynamic programming and the HJB equation have linked optimal control to a value function, but that derivation depends on smoothness that often fails in practice. Chapter 9 removes that restriction by developing viscosity solutions, so the dynamic-programming principle still makes sense when the value function has corners, shocks, or switching surfaces.
# 9. Viscosity Solutions and Nonsmooth Value Functions
In the previous chapters, dynamic programming led from an optimal control problem to a Hamilton-Jacobi-Bellman equation for the value function. That derivation used differentiability, but value functions often inherit corners from terminal costs, switching surfaces, and reachability fronts. This chapter replaces classical differentiability by viscosity inequalities, which test the equation only through smooth functions touching the value function from above or below.
The point of the viscosity framework is not to weaken the optimal control problem; it is to preserve the correct notion of solution when the value function is continuous but nonsmooth. The prerequisites are the dynamic programming principle, the formal HJB derivation, basic semicontinuity, and the maximum principle intuition from first-order PDEs. We first examine why classical solutions fail, then state the Crandall-Lions definition, prove the comparison mechanism in the first-order setting at a sketch level, and finish with stability under approximation.
## Why Classical HJB Solutions Fail
What goes wrong if the dynamic programming equation is interpreted pointwise? The HJB equation contains derivatives of the value function, but optimal decisions can change abruptly when two policies have the same cost. At such switching surfaces the value function may remain continuous while its gradient has a jump, so the equation has no classical meaning at the most important points.
For a finite-horizon deterministic control problem, let the state be $x(t) \in \mathbb R^n$, let the control set be a compact [metric space](/page/Metric%20Space) $A$, and let the dynamics be governed by a continuous map $f: \mathbb R^n\times A\to \mathbb R^n$:
\begin{align*}
\dot{x}(t) = f(x(t),u(t)).
\end{align*}
Let the running cost be a continuous function $\ell: \mathbb R^n\times A\to \mathbb R$ and the terminal cost be a continuous function $g:\mathbb R^n\to\mathbb R$. For $(t,x)\in[0,T]\times\mathbb R^n$, the value function is the map $V:[0,T]\times\mathbb R^n\to\mathbb R\cup\{+\infty\}$ defined by
\begin{align*}
V(t,x) = \inf_{u(\cdot)} \left\{ \int_t^T \ell(x(s),u(s))\,ds + g(x(T)) \right\},
\end{align*}
where the infimum is taken over admissible controls and trajectories satisfying $x(t)=x$. Under the standing boundedness hypotheses used in this chapter, $V$ is finite-valued, so we regard it as a map $V:[0,T]\times\mathbb R^n\to\mathbb R$.
The associated Hamiltonian is the map $H:\mathbb R^n\times\mathbb R^n\to\mathbb R$ given by
\begin{align*}
H(x,p) = \sup_{a\in A}\{-f(x,a)\cdot p - \ell(x,a)\}.
\end{align*}
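Compactness of $A$ and continuity of the displayed expression ensure that this supremum is finite and attained for each $(x,p)$. This sign convention differs from the one in Chapter 8: here $H(x,p)=-\inf_{a\in A}\{\ell(x,a)+p\cdot f(x,a)\}$, so the equation $-\partial_t V+H(x,\nabla V)=0$ is equivalent to $\partial_t V+\inf_{a\in A}\{\ell(x,a)+\nabla V\cdot f(x,a)\}=0$. The formal dynamic programming equation is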
\begin{align*}
-\partial_t V(t,x) + H(x,\nabla V(t,x)) = 0,
\qquad
V(T,x)=g(x).
\end{align*}
This formula is useful only at points where $V$ has the required derivatives. The next example shows that nonsmoothness is not a pathology introduced by exotic data; it appears for the simplest terminal costs.
[example: Absolute Value Terminal Cost]
Consider the one-dimensional problem with no control, dynamics $\dot{x}=0$, zero running cost, terminal time $T$, and terminal cost $g(x)=|x|$. Since $\dot{x}(s)=0$ for every $s\in[t,T]$, the trajectory starting from $x(t)=x$ is constant: $x(s)=x$ for all $s\in[t,T]$. Therefore the accumulated running cost is
\begin{align*}
\int_t^T 0\,ds=0.
\end{align*}
The terminal state is $x(T)=x$, so the value function is
\begin{align*}
V(t,x)=0+g(x(T))=g(x)=|x|.
\end{align*}
For $x>0$, one has $V(t,x)=x$, hence $\partial_t V(t,x)=0$ and $\partial_x V(t,x)=1$. For $x<0$, one has $V(t,x)=-x$, hence $\partial_t V(t,x)=0$ and $\partial_x V(t,x)=-1$. Because $f\equiv 0$ and $\ell\equiv 0$, the Hamiltonian vanishes identically and the formal equation reduces to $-\partial_t V=0$, which holds at every point with $x\ne0$. At $x=0$, however,
\begin{align*}
\lim_{h\downarrow0}\frac{V(t,h)-V(t,0)}{h}=\lim_{h\downarrow0}\frac{|h|}{h}=1,
\end{align*}
while
\begin{align*}
\lim_{h\uparrow0}\frac{V(t,h)-V(t,0)}{h}=\lim_{h\uparrow0}\frac{|h|}{h}=-1.
\end{align*}
The two one-sided spatial derivatives are different, so $\partial_x V(t,0)$ does not exist. Hence the correct value function satisfies the formal HJB equation away from the corner but is not a classical solution on all of $[0,T]\times\mathbb R$; the terminal data alone have forced the value function outside the classical differentiability class.
[/example]
The absolute-value example has no optimization, so it isolates the regularity issue. In controlled systems, the same phenomenon is amplified because different controls can be optimal on different sides of a switching surface. The HJB equation then records a maximization or minimization over competing Hamiltonians, and the winning control may change discontinuously.
[example: Bang-Bang Value Function With A Kink]
Let $\dot{x}=u$ with $u\in[-1,1]$, terminal time $T$, zero running cost, and terminal cost $g(x)=|x|$. Write $\tau=T-t$. For any admissible control,
\begin{align*}
x(T)=x+\int_t^T u(s)\,ds.
\end{align*}
Since $-1\le u(s)\le1$, integration gives
\begin{align*}
-\tau\le \int_t^T u(s)\,ds\le \tau.
\end{align*}
Thus every terminal state must lie in the interval $[x-\tau,x+\tau]$. Conversely, if $y\in[x-\tau,x+\tau]$, the constant control
\begin{align*}
u(s)=\frac{y-x}{\tau}
\end{align*}
is admissible when $\tau>0$, and it drives the state from $x$ to $y$ at time $T$. Therefore
\begin{align*}
V(t,x)=\inf_{y\in[x-\tau,x+\tau]} |y|.
\end{align*}
If $|x|\le\tau$, then $0\in[x-\tau,x+\tau]$, so the infimum is $0$. If $x>\tau$, then every $y\in[x-\tau,x+\tau]$ is positive, so $|y|=y$ and the smallest value occurs at $y=x-\tau$:
\begin{align*}
V(t,x)=x-\tau=|x|-(T-t).
\end{align*}
This endpoint is reached by $u=-1$. If $x<-\tau$, then every $y\in[x-\tau,x+\tau]$ is negative, so $|y|=-y$ and the smallest value occurs at $y=x+\tau$:
\begin{align*}
V(t,x)=-(x+\tau)=|x|-(T-t).
\end{align*}
This endpoint is reached by $u=1$. Combining the three cases gives
\begin{align*}
V(t,x)=\max\{|x|-(T-t),0\}.
\end{align*}
The value is flat inside the reachable interval $|x|\le T-t$ and linear outside it, so the spatial derivative jumps on the moving fronts $|x|=T-t$; those fronts are exactly where the optimal policy changes from moving toward the origin to having already reached zero cost.
[/example]
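The closed form can be checked numerically. The following is a minimal sketch, assuming constant controls sampled on a uniform grid of $[-1,1]$; the function names and the sample count are illustrative choices, not part of the notes. It relies on the observation made in the example that constant controls already reach every point of $[x-\tau,x+\tau]$.

```python
def value_closed_form(t, x, T):
    """Closed form V(t, x) = max(|x| - (T - t), 0) from the example."""
    return max(abs(x) - (T - t), 0.0)


def value_by_search(t, x, T, samples=2001):
    """Approximate V by scanning constant controls u in [-1, 1].

    A constant control u reaches y = x + u * (T - t), so scanning u
    scans the reachable interval [x - tau, x + tau].
    """
    tau = T - t
    best = float("inf")
    for i in range(samples):
        u = -1.0 + 2.0 * i / (samples - 1)   # uniform grid on [-1, 1]
        y = x + u * tau                       # terminal state x(T)
        best = min(best, abs(y))              # terminal cost g(y) = |y|
    return best


if __name__ == "__main__":
    T = 1.0
    for t, x in [(0.0, 0.3), (0.0, 1.5), (0.5, -2.0), (0.9, 0.05)]:
        print(t, x, value_closed_form(t, x, T), value_by_search(t, x, T))
```

Outside the fronts the search returns the endpoint value attained by $u=\pm1$; inside them it returns a value close to zero, limited only by the control grid.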
The moving fronts in this example are the deterministic analogue of shocks in scalar conservation laws. Characteristics carry information from terminal or boundary data, and when different characteristic families collide, the derivative of the value function is no longer single-valued. A solution concept for HJB equations must therefore distinguish the correct continuous value function from other weak candidates without asking for classical differentiability.
[remark: Shocks And Switching Surfaces]
In first-order HJB equations, nonsmoothness often marks the point at which two costs, two trajectories, or two control modes tie. The value function may be Lipschitz, semiconcave, or merely uniformly continuous depending on the hypotheses, but differentiability can fail on sets of codimension one. These sets are not negligible for feedback synthesis because the optimal feedback is commonly discontinuous there.
[/remark]
The question now becomes: how can an equation involving $\partial_t V$ and $\nabla V$ be imposed at a point where those derivatives do not exist? The viscosity answer is to compare $V$ with smooth test functions that touch it locally. The derivatives of the [test function](/page/Test%20Function) replace the missing derivatives of $V$ at the touching point.
## Viscosity Subsolutions And Supersolutions
How should a nonsmooth value function remember the HJB equation at a corner? If a smooth function $\phi$ touches $V$ from above at a point, then $\phi$ is a local smooth overestimate of the value. Dynamic programming should prevent such an overestimate from violating the HJB inequality in the subsolution direction. Touching from below gives the corresponding supersolution condition.
We write a first-order HJB equation in the abstract form
\begin{align*}
F(t,x,r,p_t,p_x)=0,
\end{align*}
where $r$ represents the value, $p_t$ the time derivative, and $p_x\in\mathbb R^n$ the spatial gradient. For the equation $-\partial_t V+H(x,\nabla V)=0$, this means
\begin{align*}
F(t,x,r,p_t,p_x)=-p_t+H(x,p_x).
\end{align*}
[definition: Viscosity Subsolution]
Let $Q\subset \mathbb R\times\mathbb R^n$ be open, and let $F:Q\times\mathbb R\times\mathbb R\times\mathbb R^n\to\mathbb R$ be continuous. A function $u:Q\to\mathbb R$ is a viscosity subsolution of $F(t,x,u,\partial_t u,\nabla u)=0$ if $u$ is upper semicontinuous and, whenever $\phi\in C^1(Q)$ and $u-\phi$ has a local maximum at $(t_0,x_0)\in Q$, one has
\begin{align*}
F(t_0,x_0,u(t_0,x_0),\partial_t\phi(t_0,x_0),\nabla\phi(t_0,x_0))\le 0.
\end{align*}
[/definition]
The subsolution condition controls smooth functions that lie above the candidate and touch it at one point. To obtain a two-sided solution concept, we must also control smooth functions lying below the candidate, because a function can satisfy all upper tests while still being too small to represent the value function.
[definition: Viscosity Supersolution]
Let $Q\subset \mathbb R\times\mathbb R^n$ be open, and let $F:Q\times\mathbb R\times\mathbb R\times\mathbb R^n\to\mathbb R$ be continuous. A function $u:Q\to\mathbb R$ is a viscosity supersolution of $F(t,x,u,\partial_t u,\nabla u)=0$ if $u$ is lower semicontinuous and, whenever $\phi\in C^1(Q)$ and $u-\phi$ has a local minimum at $(t_0,x_0)\in Q$, one has
\begin{align*}
F(t_0,x_0,u(t_0,x_0),\partial_t\phi(t_0,x_0),\nabla\phi(t_0,x_0))\ge 0.
\end{align*}
[/definition]
The supersolution condition prevents the candidate from lying below the dynamic-programming value. When both upper and lower tests are valid, the candidate has the right first-order behaviour from both sides of every possible corner, which motivates the combined definition.
[definition: Viscosity Solution]
Let $Q\subset \mathbb R\times\mathbb R^n$ be open, and let $F:Q\times\mathbb R\times\mathbb R\times\mathbb R^n\to\mathbb R$ be continuous. A continuous function $u:Q\to\mathbb R$ is a viscosity solution of $F(t,x,u,\partial_t u,\nabla u)=0$ if it is both a viscosity subsolution and a viscosity supersolution.
[/definition]
For smooth functions this two-sided test should reduce to the classical equation; otherwise the new definition would not be an extension of the old one. The possible obstruction is that the inequalities are stated using arbitrary touching test functions rather than substituting derivatives of the candidate directly. When the candidate is genuinely $C^1$, every valid touching test has the same first-order jet at the contact point, so the viscosity inequalities should collapse back to the classical PDE.
[quotetheorem:7632]
[citeproof:7632]
This theorem explains why smooth verification arguments from earlier chapters fit into the viscosity framework. The $C^1$ hypothesis is essential: if $u$ has only a.e. derivatives, the test functions at a corner need not see the a.e. gradient, so pointwise classical substitution is no longer meaningful. Continuity of $F$ is also part of the consistency mechanism, because the same first-order jet must produce compatible inequalities from above and below. The theorem does not say that every Lipschitz weak solution is a viscosity solution; entropy conditions for conservation laws provide the familiar warning that weak formulations can admit extra candidates. The new gain is that the definition still applies to the bang-bang value function, where the touching test functions encode the admissible slopes at the kink.
Concrete failures show why these assumptions are not cosmetic. For the regularity hypothesis, let $u(x)=|x|$ on $Q=(-1,1)$ and consider the formal equation $|u_x|=1$. The equation holds at every $x\ne0$ and holds a.e., but at $x=0$ the constant test function touches $u$ from below and has derivative $0$, so the supersolution inequality for $F(p)=|p|-1$ fails. Thus a.e. satisfaction of the classical equation is not a substitute for the $C^1$ consistency argument. For continuity of the operator, the issue is different: at a touching point for a $C^1$ function, the derivatives of $u$ and the test function agree exactly, but a discontinuous operator may assign incompatible values to the same limiting jet depending on whether it is interpreted through upper or lower semicontinuous envelopes. For example, if $F(p)=1$ for $p=0$ and $F(p)=0$ for $p\ne0$, then the raw equation $F(u_x)=0$ is not governed by a single continuous jet-evaluation rule at $p=0$. Standard viscosity theory therefore replaces $F$ by its upper and lower semicontinuous envelopes in the subsolution and supersolution inequalities. The consistency theorem above avoids this extra convention by assuming $F$ is continuous from the start.
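The corner test for $F(p)=|p|-1$ can be mechanized in one dimension. The sketch below assumes a piecewise-linear corner with given one-sided slopes, for which the slopes of $C^1$ functions touching from below fill the interval between the left and right slope when the corner is convex, and the slopes touching from above fill it when the corner is concave. The helper names are hypothetical, and the check samples the slope interval rather than proving the inequality. The comparison with $-|x|$ is added here for contrast and is not part of the discussion above.

```python
def corner_jets(left_slope, right_slope):
    """Slopes of C^1 test functions touching a 1D corner.

    Returns (lower, upper): closed intervals (lo, hi) of admissible test
    slopes from below and from above, or None when no test function exists.
    """
    lower = (left_slope, right_slope) if left_slope <= right_slope else None
    upper = (right_slope, left_slope) if right_slope <= left_slope else None
    return lower, upper


def check_corner(F, left_slope, right_slope, samples=201, tol=1e-12):
    """Sampled check of the viscosity inequalities for F(p) = 0 at a corner."""
    lower, upper = corner_jets(left_slope, right_slope)

    def sample(interval):
        lo, hi = interval
        return [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]

    # Subsolution: F(p) <= 0 for every slope touching from above (vacuous if none).
    sub = upper is None or all(F(p) <= tol for p in sample(upper))
    # Supersolution: F(p) >= 0 for every slope touching from below (vacuous if none).
    sup = lower is None or all(F(p) >= -tol for p in sample(lower))
    return sub, sup


def eikonal(p):
    """Operator F(p) = |p| - 1 used in the text."""
    return abs(p) - 1.0


if __name__ == "__main__":
    print("u = |x|  :", check_corner(eikonal, -1.0, 1.0))   # convex corner
    print("u = -|x| :", check_corner(eikonal, 1.0, -1.0))   # concave corner
```

For $u(x)=|x|$ the supersolution test fails at the corner, as in the text, while the concave corner of $-|x|$ passes both tests even though both functions satisfy $|u_x|=1$ almost everywhere.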
[example: Testing The Kink In A Reachability Value Function]
For $V(t,x)=\max\{|x|-(T-t),0\}$, consider the right moving front $(t_0,x_0)$, where $x_0=T-t_0>0$. In a neighbourhood of this point we have $x>0$, so
\begin{align*}
V(t,x)=\max\{x-(T-t),0\}=\max\{x+t-T,0\}.
\end{align*}
Set $z=x+t-T$. Then $z=0$ at $(t_0,x_0)$, $V=0$ on the side $z\le0$, and $V=z$ on the side $z\ge0$.
Suppose $\phi\in C^1$ touches $V$ from above at $(t_0,x_0)$. Since $V-\phi$ has a local maximum there, $\phi(t_0,x_0)=V(t_0,x_0)=0$ and $\phi\ge V$ near $(t_0,x_0)$. Along the line $(t,x)=(t_0+h,x_0)$, the value function is
\begin{align*}
V(t_0+h,x_0)=\max\{h,0\}.
\end{align*}
For $h>0$, the inequality $\phi\ge V$ gives
\begin{align*}
\frac{\phi(t_0+h,x_0)-\phi(t_0,x_0)}{h}\ge \frac{V(t_0+h,x_0)-V(t_0,x_0)}{h}=1.
\end{align*}
Letting $h\downarrow0$ gives $\partial_t\phi(t_0,x_0)\ge1$. For $h<0$, the same touching inequality gives
\begin{align*}
\phi(t_0+h,x_0)-\phi(t_0,x_0)\ge 0.
\end{align*}
Dividing by $h<0$ reverses the inequality:
\begin{align*}
\frac{\phi(t_0+h,x_0)-\phi(t_0,x_0)}{h}\le0.
\end{align*}
Letting $h\uparrow0$ gives $\partial_t\phi(t_0,x_0)\le0$, contradicting $\partial_t\phi(t_0,x_0)\ge1$. Thus there is no $C^1$ test function touching $V$ from above at the right moving front, so the subsolution condition is vacuous there.
Now suppose $\psi\in C^1$ touches $V$ from below at $(t_0,x_0)$. Then $\psi(t_0,x_0)=0$ and $\psi\le V$ near the point. Along $(t,x)=(t_0+h,x_0)$ with $h>0$,
\begin{align*}
\frac{\psi(t_0+h,x_0)-\psi(t_0,x_0)}{h}\le \frac{V(t_0+h,x_0)-V(t_0,x_0)}{h}=1,
\end{align*}
so $\partial_t\psi(t_0,x_0)\le1$. Along $h<0$, since $V(t_0+h,x_0)=0$ and $\psi(t_0+h,x_0)\le0$,
\begin{align*}
\frac{\psi(t_0+h,x_0)-\psi(t_0,x_0)}{h}\ge0,
\end{align*}
and hence $\partial_t\psi(t_0,x_0)\ge0$. The same argument in the spatial direction gives
\begin{align*}
0\le \partial_x\psi(t_0,x_0)\le1.
\end{align*}
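Moreover, $V=0$ along the front direction $(t,x)=(t_0+h,x_0-h)$, where $z=0$. The function $h\mapsto\psi(t_0+h,x_0-h)$ is therefore nonpositive near $h=0$ and vanishes at $h=0$, so it has a local maximum there and its derivative vanishes: $\partial_t\psi(t_0,x_0)=\partial_x\psi(t_0,x_0)$. Writing $\lambda\in[0,1]$ for this common value, the supersolution inequality for $-\partial_t V+|\partial_x V|=0$ reads
\begin{align*}
-\partial_t\psi(t_0,x_0)+|\partial_x\psi(t_0,x_0)|=-\lambda+|\lambda|=0\ge0,
\end{align*}
so it holds for every lower test. The admissible lower tests therefore encode the slopes allowed by the two one-sided branches, while no single derivative is assigned to $V$ itself at the moving front.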
[/example]
The definitions above are local. To solve a terminal-value control problem, the terminal condition must also be imposed in a way compatible with continuity or semicontinuity. In the simplest continuous setting, this means $u(T,x)=g(x)$ pointwise, and comparison will propagate this boundary ordering into the interior.
## Comparison And Uniqueness
Why does the viscosity definition identify the value function uniquely? The main answer is comparison: if a subsolution starts below a supersolution on the terminal boundary, then it remains below throughout the domain. This principle is the substitute for subtracting two classical solutions and applying a maximum principle to their difference.
[quotetheorem:7633]
[citeproof:7633]
Each hypothesis in the comparison theorem blocks a specific failure mode. Properness, here supplied by $\lambda>0$, prevents adding a positive constant to a subsolution and preserving the same inequality; without some monotonicity in $u$, uniqueness may fail. The continuity assumptions on $H$ are what allow the two doubled points to merge in the limit; discontinuous Hamiltonians can remember which side of a switching interface the test point came from. The spatial-infinity condition is the unbounded-domain replacement for a lateral boundary condition, and without it a positive maximum of $u-v$ can escape to infinity rather than being detected by the viscosity inequalities. Thus the theorem is not a generic uniqueness statement for every first-order PDE; it is a comparison result for a controlled class of proper HJB equations.
For a control value function, comparison still has to be turned into a uniqueness statement for the particular class of admissible candidates. The obstruction is that two bounded continuous solutions on an unbounded state space may differ by hidden behaviour at spatial infinity even when they satisfy the same terminal condition and the same formal PDE. A separate uniqueness result records the extra compatibility condition that rules out those hidden branches and lets comparison be applied in both directions.
[quotetheorem:7634]
[proofunderconstruction:7634]
Uniqueness is what makes the viscosity solution concept useful for control, but it inherits all the restrictions of comparison. If the terminal condition is imposed only a.e., or if solutions are allowed to grow in incompatible ways at infinity, two different viscosity solutions may agree on the formal PDE while representing different boundary behaviour. The bounded uniformly continuous class rules out these hidden boundary branches and keeps the doubled-variable argument inside the intended solution space. If dynamic programming produces a continuous viscosity solution and comparison holds, then every convergent approximation scheme with the same limiting equation must converge to the value function.
[example: Time Optimal Reachability]
Let $K\subset\mathbb R^n$ be a target, and let $T(x)$ denote the minimum time needed to steer the system $\dot{x}=f(x,u)$, $u\in A$, from $x$ into $K$, with $T=0$ on $K$. At a point $x\notin K$ where $T$ is differentiable, dynamic programming over a short time interval of length $h>0$ says that using a constant control $a\in A$ for the first time step gives the candidate cost
\begin{align*}
h+T(x_h),
\end{align*}
where the corresponding trajectory satisfies
\begin{align*}
x_h=x+h f(x,a)+o(h).
\end{align*}
Since $T$ is differentiable at $x$, the first-order expansion of $T$ along this trajectory is
\begin{align*}
T(x_h)=T(x)+\nabla T(x)\cdot (x_h-x)+o(|x_h-x|).
\end{align*}
Substituting $x_h-x=h f(x,a)+o(h)$ gives
\begin{align*}
T(x_h)=T(x)+h\, f(x,a)\cdot \nabla T(x)+o(h).
\end{align*}
Thus the cost of this first step followed by an optimal continuation is
\begin{align*}
h+T(x_h)=T(x)+h\bigl(1+f(x,a)\cdot \nabla T(x)\bigr)+o(h).
\end{align*}
Taking the infimum over $a\in A$ and subtracting $T(x)$ from both sides of the dynamic programming identity gives
\begin{align*}
0=h\inf_{a\in A}\bigl(1+f(x,a)\cdot \nabla T(x)\bigr)+o(h).
\end{align*}
Dividing by $h>0$ and sending $h\downarrow0$ yields
\begin{align*}
\inf_{a\in A}\bigl(1+f(x,a)\cdot \nabla T(x)\bigr)=0.
\end{align*}
Since $1$ is independent of $a$, this is equivalent to
\begin{align*}
1+\inf_{a\in A} f(x,a)\cdot \nabla T(x)=0.
\end{align*}
Multiplying the infimum by $-1$ converts it to a supremum, so the stationary HJB equation is
\begin{align*}
\sup_{a\in A}\{-f(x,a)\cdot \nabla T(x)\}=1.
\end{align*}
The level sets of $T$ are reachable fronts. When two such fronts meet, the arrival time can remain continuous while its gradient becomes nonunique, so the displayed equation is imposed through viscosity test functions rather than by assigning a classical gradient to $T$ at the front intersection.
[/example]
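A one-dimensional instance makes the meeting of two fronts concrete. The sketch below assumes the hypothetical system $\dot x=u$ with $|u|\le1$ on $[-1,1]$ and target $K=\{-1,1\}$, for which the HJB equation reduces to $|T'(x)|=1$ and the arrival time is $T(x)=1-|x|$. The discrete dynamic programming update $T_i=\min(h+T_{i-1},h+T_{i+1})$ is solved by two sweeps; the grid size and function name are illustrative.

```python
def min_time_1d(n=200):
    """Minimum arrival time on [-1, 1] for xdot = u, |u| <= 1, target {-1, 1}."""
    h = 2.0 / n                      # grid spacing; also the time to cross one cell
    T = [float("inf")] * (n + 1)
    T[0] = T[n] = 0.0                # T = 0 on the target
    # Two Gauss-Seidel sweeps enforce T_i = min(h + T_{i-1}, h + T_{i+1}).
    for i in range(1, n):            # left to right: front leaving x = -1
        T[i] = min(T[i], h + T[i - 1])
    for i in range(n - 1, 0, -1):    # right to left: front leaving x = +1
        T[i] = min(T[i], h + T[i + 1])
    return h, T


if __name__ == "__main__":
    h, T = min_time_1d()
    m = len(T) // 2                  # node at x = 0, where the fronts meet
    print("T(0) =", T[m])
    print("one-sided slopes:", (T[m] - T[m - 1]) / h, (T[m + 1] - T[m]) / h)
```

The computed arrival time is continuous at $x=0$, but its one-sided slopes there are approximately $+1$ and $-1$, which is the gradient nonuniqueness described at the end of the example.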
The time-optimal example also shows why boundary conditions need care. On the target one usually has $T=0$, but near the target the value may fail to be smooth because many optimal trajectories can arrive with the same time. Viscosity comparison treats this as a boundary-value problem for a first-order nonlinear PDE, not as a request to choose a preferred smooth branch.
## Stability Under Approximation
Numerical methods, regularization, and model reduction all produce approximate value functions rather than exact ones. The natural question is whether the viscosity solution property survives limits. Stability is the theorem that makes viscosity solutions compatible with approximation schemes and with dynamic programming limits such as time discretization.
[quotetheorem:7635]
[citeproof:7635]
The theorem is short but powerful. It says that the viscosity definition is closed under locally uniform limits, so smoothing the data, discretizing time, or approximating the Hamiltonian does not change the limiting solution concept as long as the approximations converge in the correct topology. The locally [uniform convergence](/page/Uniform%20Convergence) of $u_k$ is used to transfer touching points from $u$ back to nearby functions $u_k$; pointwise convergence alone can lose maxima and can create artificial envelopes. The locally uniform convergence of $F_k$ is equally important because the test derivative is held fixed while the operator is passed to the limit. This theorem does not justify arbitrary weak limits of approximate value functions; in numerical PDE language, it is a stability result that must be paired with compactness, consistency, monotonicity, and comparison.
The hypotheses fail in concrete ways. If locally uniform convergence of $u_k$ is weakened to pointwise convergence, moving spikes can destroy the touching-point argument: for example, on $Q=(-1,1)$ let $u_k$ be continuous, equal to $1$ on a small interval around $1/k$, and equal to $0$ outside a slightly larger interval. Then $u_k\to0$ pointwise, but local maxima of $u_k-\phi$ may live on the moving spike rather than near the touching point of the limit, so the limiting test inequality is not forced. If locally uniform convergence of the operators is dropped, the limiting equation can also change at the exact jet being tested. For instance, let $F_k(p)=p^2+\mathbb{1}_{\{|p|\le 1/k\}}$ and let $F(p)=p^2$. Then $F_k(p)\to F(p)$ for every fixed $p\ne0$, but not locally uniformly near $p=0$; tests with derivative $0$ retain the extra term in the approximating equations and do not pass to the displayed limit. These examples are the reason stability is formulated in the compact-open topology for both the functions and the operators.
[example: Vanishing Viscosity Approximation]
Consider a sequence $\varepsilon_j\downarrow0$ and classical solutions $u^{\varepsilon_j}$ of the regularized terminal-value problems
\begin{align*}
-\partial_t u^{\varepsilon_j}(t,x)+H(x,\nabla u^{\varepsilon_j}(t,x))-\varepsilon_j\Delta u^{\varepsilon_j}(t,x)=0,
\end{align*}
with terminal data $u^{\varepsilon_j}(T,x)=g^{\varepsilon_j}(x)$ and $g^{\varepsilon_j}\to g$ locally uniformly. Write the parabolic operator tested against a smooth function $\phi$ as
\begin{align*}
F_j(x,p_t,p_x,X)=-p_t+H(x,p_x)-\varepsilon_j\operatorname{tr}(X).
\end{align*}
For the limiting first-order equation, the corresponding operator is
\begin{align*}
F(x,p_t,p_x)=-p_t+H(x,p_x).
\end{align*}
If $\phi\in C^2$ is fixed, then $X=D^2\phi(t,x)$ is fixed at the touching point, so
\begin{align*}
F_j(x,\partial_t\phi,\nabla\phi,D^2\phi)-F(x,\partial_t\phi,\nabla\phi)=-\varepsilon_j\operatorname{tr}(D^2\phi).
\end{align*}
Since $\operatorname{tr}(D^2\phi)$ is continuous and bounded on compact neighbourhoods, the right-hand side tends to $0$ locally uniformly as $\varepsilon_j\downarrow0$.
Assume also that $u^{\varepsilon_j}\to u$ locally uniformly. By *Stability Of Viscosity Subsolutions And Supersolutions*, the locally uniform convergence of the functions and operators passes the viscosity inequalities to the limit. Therefore $u$ satisfies
\begin{align*}
-\partial_t u+H(x,\nabla u)=0
\end{align*}
in the viscosity sense, with terminal condition $u(T,x)=g(x)$ inherited from the locally uniform convergence of $g^{\varepsilon_j}$. The term $-\varepsilon_j\Delta u^{\varepsilon_j}$ smooths each approximating equation, but when tested against any fixed smooth function its contribution is exactly $-\varepsilon_j\operatorname{tr}(D^2\phi)$, which vanishes in the limit.
[/example]
For numerical control, stability is paired with monotonicity and consistency. A finite-difference or semi-Lagrangian method should approximate the Hamiltonian consistently, preserve order, and produce locally bounded approximations. Under these conditions, the Barles-Souganidis philosophy says that any locally uniform limit is the viscosity solution, and comparison then upgrades subsequential convergence to full convergence.
[remark: Why Monotone Schemes Matter]
A nonmonotone discretization may converge to a function that solves the wrong weak equation or develops spurious oscillations near kinks. Monotonicity is the discrete shadow of the [comparison principle](/theorems/4870): increasing the data should not decrease the computed value. This is why many reliable HJB solvers use upwind or semi-Lagrangian constructions even when higher-order centered formulas look more accurate on smooth test problems.
[/remark]
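As an illustration of a monotone construction, the following sketch applies a semi-Lagrangian step to the bang-bang problem $-\partial_t V+|\partial_x V|=0$ with terminal data $|x|$. It assumes a uniform grid with time step equal to the grid spacing, so that the controls $u\in\{-1,0,1\}$ land exactly on neighbouring nodes; the domain half-width `L` and node count `n` are arbitrary choices. Each update is a minimum of node values, so increasing the data cannot decrease the result.

```python
def solve_bang_bang(T=1.0, L=3.0, n=300):
    """Monotone semi-Lagrangian scheme for -V_t + |V_x| = 0, V(T, x) = |x|.

    With dt = dx, one backward step is V_new[i] = min(V[i-1], V[i], V[i+1]),
    which is the dynamic programming principle restricted to u in {-1, 0, 1}.
    Returns the grid, the computed values, and the elapsed time tau = T - t.
    """
    dx = 2.0 * L / n
    xs = [-L + i * dx for i in range(n + 1)]
    V = [abs(x) for x in xs]                 # terminal data g(x) = |x|
    steps = int(round(T / dx))               # number of backward time steps
    for _ in range(steps):
        # Indices are clamped at the ends of the grid.
        V = [min(V[max(i - 1, 0)], V[i], V[min(i + 1, n)]) for i in range(n + 1)]
    return xs, V, steps * dx


if __name__ == "__main__":
    xs, V, tau = solve_bang_bang()
    err = max(abs(v - max(abs(x) - tau, 0.0)) for x, v in zip(xs, V))
    print("max error against max(|x| - tau, 0):", err)
```

On this grid the scheme reproduces $\max\{|x|-\tau,0\}$ at the nodes up to rounding, including at the kinks $|x|=\tau$, without any special treatment of the nonsmooth points.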
The chapter's conceptual arc is now complete. Classical HJB theory explains the equation where the value function is differentiable; viscosity theory explains how the same dynamic programming law survives shocks, corners, and switching surfaces. Comparison gives uniqueness, while stability ensures that approximation methods and regularizations converge to the same mathematically meaningful object.
Viscosity theory has shown how the HJB equation remains meaningful even when the value function is nonsmooth. The next chapter uses the same value-function framework in the constrained setting, where the controller must respect state and input bounds and is often implemented in receding-horizon fashion via MPC.
# 10. Constrained Optimal Control and MPC
This chapter brings optimal control into the constrained setting where state and input bounds are part of the problem rather than afterthoughts. The central question is no longer only how to minimize a cost, but how to keep the system inside an admissible region while repeatedly solving finite-horizon problems in real time. Model predictive control answers this by combining finite-horizon optimization, terminal ingredients, and a receding-horizon implementation.
The progression is from geometry to algorithms to guarantees. We first describe state and input constraints through viability and controlled invariance, then formulate finite-horizon receding-horizon control, and finally prove the recursive feasibility and Lyapunov stability mechanisms that make constrained MPC a feedback method rather than just a sequence of open-loop optimizations.
## Constraint Satisfaction and Viability
What does it mean for a nonlinear control system to respect constraints for all future time? A constraint set is useful only if the vector field can be steered so that trajectories do not immediately point out of the allowed region. This leads from pointwise inequalities to the geometric idea of viability.
Consider a controlled system with dynamics $f: X \times U \to \mathbb R^n$,
\begin{align*}
\dot{x}(t) = f(x(t), u(t)), \qquad x(t) \in X \subset \mathbb R^n, \qquad u(t) \in U \subset \mathbb R^m.
\end{align*}
Here $X$ is the admissible state constraint set and $U$ is the admissible input set. The admissible controls are measurable functions $u: [0,\infty) \to U$ for which the corresponding trajectory exists on the time interval under consideration.
[definition: Viable Trajectory]
Let $K \subset \mathbb R^n$ and let $U \subset \mathbb R^m$. A trajectory-control pair $(x,u)$ with $x: [0,T] \to \mathbb R^n$ and measurable $u: [0,T] \to U$ is viable in $K$ on $[0,T]$ for $\dot{x}=f(x,u)$ if $x(t) \in K$ for every $t \in [0,T]$.
[/definition]
This definition turns constraint satisfaction into a property of the entire trajectory. The next concept asks whether every initial condition in a set admits at least one such trajectory.
[definition: Viability Kernel]
Let $K \subset \mathbb R^n$, let $U \subset \mathbb R^m$, and let $f: K \times U \to \mathbb R^n$. The viability kernel of $K$ for $\dot{x}=f(x,u)$ with inputs in $U$ is
\begin{align*}
\operatorname{Viab}(K) = \{x_0 \in K : \text{there exists an admissible control } u \text{ such that } x(t;x_0,u) \in K \text{ for all } t \ge 0\}.
\end{align*}
[/definition]
The viability kernel is the largest part of $K$ from which constraints can be maintained indefinitely. In MPC, the feasible set of the optimization problem is a finite-horizon approximation to this infinite-horizon object.
[definition: Controlled Invariant Set]
Let $C \subset \mathbb R^n$, let $U \subset \mathbb R^m$, and let $f: C \times U \to \mathbb R^n$. The set $C$ is controlled invariant for $\dot{x}=f(x,u)$ under input set $U$ if for every $x_0 \in C$ there exists an admissible control $u: [0,\infty) \to U$ such that the corresponding trajectory satisfies $x(t) \in C$ for all $t \ge 0$.
[/definition]
Controlled invariant sets are the basic terminal ingredients used later. If an MPC trajectory can be forced into a controlled invariant terminal set, then a known local policy can take over after the finite horizon.
To state a usable boundary test for controlled invariance, we need the tangent cone. It records the instantaneous directions that remain compatible with a closed constraint set to first order.
[definition: Bouligand Tangent Cone]
Let $K \subset \mathbb R^n$ be closed and let $x \in K$. The Bouligand tangent cone to $K$ at $x$ is
\begin{align*}
T_K(x) = \left\{v \in \mathbb R^n : \liminf_{h \downarrow 0} \frac{\operatorname{dist}(x + h v, K)}{h} = 0\right\}.
\end{align*}
[/definition]
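The defining ratio can be probed numerically in the simplest case. The sketch below assumes $K=[0,1]\subset\mathbb R$ and evaluates $\operatorname{dist}(x+hv,K)/h$ along a finite decreasing sequence of $h$; a finite sequence only suggests the value of the $\liminf$ and does not prove it. The function names and the step sequence are illustrative.

```python
def dist_to_interval(y, lo, hi):
    """Distance from the point y to the closed interval K = [lo, hi]."""
    return max(lo - y, 0.0, y - hi)


def tangent_ratios(x, v, lo=0.0, hi=1.0, steps=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Return dist(x + h v, K) / h for a decreasing sequence of h."""
    return [dist_to_interval(x + h * v, lo, hi) / h for h in steps]


if __name__ == "__main__":
    print("x = 1, v = -1:", tangent_ratios(1.0, -1.0))   # inward direction
    print("x = 1, v = +1:", tangent_ratios(1.0, +1.0))   # outward direction
    print("x = 0.5, v = +1:", tangent_ratios(0.5, +1.0)) # interior point
```

At the boundary point $x=1$ the inward direction gives ratios equal to zero, consistent with $v=-1\in T_K(1)$, while the outward direction gives ratios close to one. At the interior point every direction gives zero for small $h$.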
For a smooth inequality set $K = \{x : g_i(x) \le 0,\ i=1,\dots,r\}$, the tangent cone at a regular boundary point consists of directions $v$ satisfying $\nabla g_i(x) \cdot v \le 0$ for every active constraint $g_i(x)=0$. The next theorem turns this local boundary test into a global viability criterion, which is the geometric foundation behind terminal invariant sets in MPC.
[quotetheorem:7636]
For a control system, the set-valued map is $F(x)=\{f(x,u):u\in U\}$. The theorem says that at every constrained state, at least one admissible input must generate an allowed instantaneous velocity. The closedness of $K$ is what makes the boundary and tangent cone the right objects: if $K=(0,\infty)$ and $F(x)=\{-1\}$, then every interior tangent cone is all of $\mathbb R$, so the tangent test holds at every point of $K$, but the solution $x(t)=x_0-t$ leaves $K$ in finite time. Upper semicontinuity rules out abrupt changes of the velocity set; for instance, with $K=[0,\infty)$, $F(0)=\{1\}$, and $F(x)=\{-1\}$ for $x>0$, the tangent condition holds at every point, but every solution starting from $x_0>0$ reaches the boundary at time $x_0$ and has no viable continuation: it cannot rest at $0$ because $0\notin F(0)$, and it cannot re-enter $x>0$ because the velocity there is $-1$. Compactness and linear growth are also needed for global-in-time existence: on $K=\mathbb R$ with $F(x)=\{x^2\}$, the tangent condition is automatic, yet solutions with $x_0>0$ blow up in finite time. Convexity prevents a relaxed tangent direction from being mistaken for an implementable velocity; for example, if $K=\{0\}$ and $F(0)=\{-1,1\}$, then the convexified velocity set contains $0$, but the original inclusion has no viable trajectory because neither admissible velocity lies in $T_K(0)=\{0\}$. This is why MPC terminal sets are usually chosen with conservative invariance checks that are easier to verify than the full viability kernel.
[example: Constrained Double Integrator Viability]
Consider the double integrator
\begin{align*}
\dot{x}_1 = x_2, \qquad \dot{x}_2 = u, \qquad |u| \le u_{\max},
\end{align*}
with $u_{\max}>0$ and position constraint $|x_1|\le a$. At the right boundary $x_1=a$, the velocity must satisfy $x_2\le 0$, because $\dot{x}_1=x_2$ and any $x_2>0$ immediately points outside the constraint set. At the left boundary $x_1=-a$, the same argument gives $x_2\ge 0$.
These boundary sign conditions are necessary but not sufficient. Suppose first that $x_2>0$. The best possible braking input before reaching the right wall is $u=-u_{\max}$, because every admissible input satisfies $u(t)\ge -u_{\max}$. Under this maximal braking input,
\begin{align*}
x_2(t)=x_2-u_{\max}t.
\end{align*}
The stopping time is determined by $0=x_2-u_{\max}t_s$, hence
\begin{align*}
t_s=\frac{x_2}{u_{\max}}.
\end{align*}
The corresponding position is
\begin{align*}
x_1(t)=x_1+x_2t-\frac{1}{2}u_{\max}t^2.
\end{align*}
Substituting $t_s=x_2/u_{\max}$ gives
\begin{align*}
x_1(t_s)=x_1+x_2\frac{x_2}{u_{\max}}-\frac{1}{2}u_{\max}\frac{x_2^2}{u_{\max}^2}=x_1+\frac{x_2^2}{2u_{\max}}.
\end{align*}
Therefore the state can avoid crossing $x_1=a$ only if
\begin{align*}
x_1+\frac{x_2^2}{2u_{\max}}\le a,
\end{align*}
equivalently
\begin{align*}
\frac{x_2^2}{2u_{\max}}\le a-x_1.
\end{align*}
Now suppose $x_2<0$. The best possible braking input before reaching the left wall is $u=u_{\max}$. Then
\begin{align*}
x_2(t)=x_2+u_{\max}t.
\end{align*}
The stopping time satisfies $0=x_2+u_{\max}t_s$, so
\begin{align*}
t_s=\frac{-x_2}{u_{\max}}.
\end{align*}
The position during braking is
\begin{align*}
x_1(t)=x_1+x_2t+\frac{1}{2}u_{\max}t^2.
\end{align*}
Substituting $t_s=-x_2/u_{\max}$ gives
\begin{align*}
x_1(t_s)=x_1+x_2\frac{-x_2}{u_{\max}}+\frac{1}{2}u_{\max}\frac{x_2^2}{u_{\max}^2}=x_1-\frac{x_2^2}{2u_{\max}}.
\end{align*}
Thus avoiding the left wall requires
\begin{align*}
x_1-\frac{x_2^2}{2u_{\max}}\ge -a,
\end{align*}
equivalently
\begin{align*}
\frac{x_2^2}{2u_{\max}}\le a+x_1.
\end{align*}
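These conditions are also sufficient: when the relevant inequality holds, maximal braking brings the state to rest at a position with $|x_1|\le a$, and $u=0$ holds it there afterwards. Hence the viable states are exactly those with $|x_1|\le a$ and, depending on the sign of $x_2$, enough remaining distance to brake before the next wall. If $x_2=0$, choosing $u=0$ keeps both $x_2$ and $x_1$ constant, so every state with $|x_1|\le a$ is viable.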
[/example]
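The braking-distance test translates directly into code. The following is a minimal sketch of the inequalities derived in the example; the function names are illustrative, and `brake_peak` evaluates the closed-form stopping position rather than integrating the dynamics.

```python
def is_viable(x1, x2, a, u_max):
    """Braking-distance test for |x1| <= a with inputs |u| <= u_max."""
    if abs(x1) > a:
        return False
    stop = x2 * x2 / (2.0 * u_max)     # distance covered under maximal braking
    if x2 > 0:
        return x1 + stop <= a          # must stop before the right wall
    if x2 < 0:
        return x1 - stop >= -a         # must stop before the left wall
    return True                        # at rest: u = 0 keeps the state fixed


def brake_peak(x1, x2, u_max):
    """Position reached when maximal braking brings x2 to zero."""
    sign = 1.0 if x2 >= 0 else -1.0    # braking input is u = -sign * u_max
    t_s = abs(x2) / u_max              # stopping time
    return x1 + x2 * t_s - 0.5 * sign * u_max * t_s * t_s


if __name__ == "__main__":
    a, u_max = 1.0, 1.0
    for state in [(0.5, 1.0), (0.6, 1.0), (-0.6, -1.0), (0.9, 0.0)]:
        print(state, is_viable(*state, a, u_max), brake_peak(*state, u_max))
```

With $a=u_{\max}=1$, the state $(0.5,1)$ stops exactly at the right wall and is viable, while $(0.6,1)$ satisfies the position constraint now but cannot avoid violating it later.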
The example shows why constraint handling is not a static clipping rule. The correct admissible region depends on the dynamics, the actuator limits, and the ability to recover before a future boundary is reached.
## Terminal Ingredients for Finite Horizons
How can a finite-horizon optimization problem represent an infinite-horizon safety and stability requirement? The standard answer is to add a terminal set, a terminal cost, and a local terminal controller. These ingredients encode what happens after the horizon ends.
[definition: Terminal Set and Terminal Controller]
For a constrained control system $\dot{x}=f(x,u)$ with admissible sets $X$ and $U$, a terminal set and terminal controller are a set $X_f \subset X$ and a feedback $\kappa_f: X_f \to U$ such that the closed-loop trajectory
\begin{align*}
\dot{x} = f(x,\kappa_f(x))
\end{align*}
starting from any $x_0 \in X_f$ remains in $X_f$ for all future time.
[/definition]
The terminal set gives a region where the controller designer already knows how to satisfy constraints. Feasibility requires invariance inside this set, but stability also requires a quantitative account of the cost accumulated after the horizon ends. That quantitative account is the terminal cost decrease condition.
[definition: Terminal Cost Decrease Condition]
Let $\ell: X \times U \to \mathbb R_+$ be a stage cost, let $V_f: X_f \to \mathbb R_+$ be a terminal cost, and let $\kappa_f: X_f \to U$ be a terminal controller. The pair $(V_f,\kappa_f)$ satisfies the terminal decrease condition on $X_f$ if
\begin{align*}
\nabla V_f(x) \cdot f(x,\kappa_f(x)) \le -\ell(x,\kappa_f(x))
\end{align*}
for every $x \in X_f$ where $V_f$ is differentiable.
[/definition]
This condition makes $V_f$ act as the remaining cost after the horizon. For a discrete-time prediction map $F_d: X \times U \to X$, it is the continuous-time analogue of the inequality
\begin{align*}
V_f(F_d(x,\kappa_f(x))) - V_f(x) \le -\ell(x,\kappa_f(x)).
\end{align*}
[example: Local Terminal Set from LQR]
Let the nonlinear dynamics be written near the equilibrium as
\begin{align*}
\dot{x}=f(x,u), \qquad f(0,0)=0,
\end{align*}
with linearization $f(x,u)=Ax+Bu+r(x,u)$, where $r(x,u)/\|(x,u)\|\to 0$ as $(x,u)\to 0$. Use the local feedback $u=Kx$ and set
\begin{align*}
r_K(x)=f(x,Kx)-(A+BK)x.
\end{align*}
Then $r_K(x)/|x|\to 0$ as $x\to 0$.
Assume the LQR weights are $Q=Q^\top>0$ and $R=R^\top>0$, and that $P=P^\top>0$ is chosen so that the closed-loop Riccati identity is
\begin{align*}
(A+BK)^\top P+P(A+BK)=-(Q+K^\top RK).
\end{align*}
For $V_f(x)=x^\top Px$, the gradient is $\nabla V_f(x)=2Px$, so along the nonlinear closed loop $u=Kx$,
\begin{align*}
\dot V_f(x)=2x^\top P f(x,Kx).
\end{align*}
Substituting $f(x,Kx)=(A+BK)x+r_K(x)$ gives
\begin{align*}
\dot V_f(x)=2x^\top P(A+BK)x+2x^\top P r_K(x).
\end{align*}
Since the scalar $x^\top P(A+BK)x$ equals its transpose $x^\top(A+BK)^\top Px$, the first term can be symmetrized:
\begin{align*}
2x^\top P(A+BK)x=x^\top\bigl(P(A+BK)+(A+BK)^\top P\bigr)x.
\end{align*}
Using the Riccati identity,
\begin{align*}
\dot V_f(x)=-x^\top(Q+K^\top RK)x+2x^\top P r_K(x).
\end{align*}
Let $M=Q+K^\top RK$. Since $M>0$, $x^\top Mx\ge \lambda_{\min}(M)|x|^2$. Because $r_K(x)/|x|\to 0$, we can choose $\varepsilon>0$ such that $|2x^\top P r_K(x)|\le \frac12\lambda_{\min}(M)|x|^2$ whenever $|x|\le\varepsilon$. Hence, on that neighbourhood,
\begin{align*}
\dot V_f(x)\le -\frac12\lambda_{\min}(M)|x|^2.
\end{align*}
Finally choose $\alpha>0$ small enough that the ellipsoid
\begin{align*}
X_f=\{x:x^\top Px\le \alpha\}
\end{align*}
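lies inside the ball $|x|\le\varepsilon$ and inside $X$, and $Kx\in U$ for every $x\in X_f$. Thus the LQR quadratic gives a terminal cost whose decrease survives the nonlinear remainder on a sufficiently small terminal ellipsoid. For the quadratic stage cost $\ell(x,u)=x^\top Qx+u^\top Ru$, the same estimate gives $\dot V_f(x)\le -x^\top Mx+\frac12\lambda_{\min}(M)|x|^2\le -\frac12\ell(x,Kx)$ on $X_f$, so the rescaled terminal cost $2V_f$ satisfies the terminal decrease condition as stated.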
[/example]
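The estimate can be checked numerically on a scalar instance. The following Python sketch is illustrative only: the dynamics $f(x,u)=x+x^2+u$ and the weights $Q=R=1$ are assumptions chosen for this check, not data from the notes. For them the scalar Riccati equation $2P-P^2+1=0$ gives $P=1+\sqrt2$, $K=-P$, and $r_K(x)=x^2$, and the radius $\varepsilon$ can be computed in closed form.

```python
import math

# Illustrative scalar instance (assumed, not from the notes):
# f(x, u) = x + x**2 + u, so A = 1, B = 1, and the remainder is r(x, u) = x**2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0

# Positive root of the scalar Riccati equation 2*A*P - (B*P)**2 / R + Q = 0.
P = (A + math.sqrt(A * A + B * B * Q / R)) * R / (B * B)
K = -B * P / R          # LQR gain, u = K x
M = Q + K * R * K       # M = Q + K^T R K

def f(x, u):
    return x + x * x + u

def V_dot(x):
    """Derivative of V_f(x) = P x^2 along the nonlinear closed loop u = K x."""
    return 2.0 * P * x * f(x, K * x)

# Here |2 P x r_K(x)| = 2 P |x|^3, which is at most 0.5 * M * x^2
# exactly when |x| <= M / (4 P).
eps = M / (4.0 * P)

grid = [eps * (i / 200.0 - 1.0) for i in range(401)]   # points in [-eps, eps]
worst_gap = max(V_dot(x) + 0.5 * M * x * x for x in grid)

print(f"P = {P:.4f}, K = {K:.4f}, eps = {eps:.4f}")
print(f"max of V_dot + 0.5*M*x^2 on the grid: {worst_gap:.2e}")
```

A nonpositive maximum (up to rounding) confirms $\dot V_f(x)\le-\frac12 Mx^2$ on the sampled neighbourhood. In higher dimensions $P$ would come from a Riccati solver and $\varepsilon$ from a bound on the remainder, which this sketch does not attempt.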
This construction is local, but that is enough for MPC: the optimizer only needs to reach $X_f$ by the end of the prediction horizon. The global constrained behavior is then built by repeatedly solving the finite-horizon problem.
## Finite-Horizon Receding-Horizon Control
How do we turn optimal control into a feedback law when solving an infinite-horizon constrained problem directly is not computationally realistic? MPC repeatedly solves a finite-horizon problem from the current state, applies the first part of the optimizer, and then shifts the horizon forward.
For a sampling time $\Delta t>0$, write the discrete-time prediction model as
\begin{align*}
x_{k+1}=F(x_k,u_k),
\end{align*}
where $F$ may come from exact flow over one sample or from a numerical discretization of the continuous dynamics. The finite-horizon MPC problem at state $x$ is
\begin{align*}
V_N(x) = \inf_{u_0,\dots,u_{N-1}} \left[\sum_{k=0}^{N-1} \ell(x_k,u_k) + V_f(x_N)\right].
\end{align*}
The constraints are $x_0=x$, $x_{k+1}=F(x_k,u_k)$, $x_k\in X$, $u_k\in U$ for $k=0,\dots,N-1$, and $x_N\in X_f$.
[definition: MPC Feedback Law]
Assume the finite-horizon problem admits an optimizer $(u_0^*(x),\dots,u_{N-1}^*(x))$ at each state $x \in \mathcal X_N$. The MPC feedback law is the map $\kappa_N: \mathcal X_N \to U$ defined by
\begin{align*}
\kappa_N(x)=u_0^*(x).
\end{align*}
[/definition]
When the optimizer is not unique, the value $u_0^*(x)$ depends on which optimizer is selected, and the resulting feedback law may be discontinuous. In practice, a deterministic tie-breaking rule or a strictly convex regularization is often used to obtain a well-defined control input.
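The receding-horizon loop and a deterministic tie-breaking rule can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the scalar model $x_{k+1}=x_k+u_k$, the finite input set, the sets $X=[-5,5]$ and $X_f=[-0.5,0.5]$, the costs, and the helper names. Brute-force enumeration stands in for a real optimizer and is only viable for tiny problems.

```python
from itertools import product

# Toy problem (all numbers are illustrative assumptions):
# x_{k+1} = x_k + u_k, X = [-5, 5], X_f = [-0.5, 0.5], finite input set U.
U = (-1.0, -0.5, 0.0, 0.5, 1.0)
N = 3

def F(x, u):
    return x + u

def stage_cost(x, u):
    return x * x + 0.1 * u * u

def terminal_cost(x):
    return x * x

def solve_finite_horizon(x0):
    """Enumerate all input sequences; return (cost, sequence) or None if infeasible."""
    best = None
    for seq in product(U, repeat=N):
        x, cost, ok = x0, 0.0, True
        for u in seq:
            if abs(x) > 5.0:          # state constraint x_k in X, k = 0..N-1
                ok = False
                break
            cost += stage_cost(x, u)
            x = F(x, u)
        if not ok or abs(x) > 0.5:    # terminal constraint x_N in X_f
            continue
        candidate = (cost + terminal_cost(x), seq)
        # Tuple comparison breaks cost ties by lexicographic order of the sequence.
        if best is None or candidate < best:
            best = candidate
    return best

def kappa_N(x):
    """MPC feedback: first element of the selected optimal sequence."""
    solution = solve_finite_horizon(x)
    if solution is None:
        raise ValueError(f"state {x} is outside the feasible set")
    return solution[1][0]

x = 2.0
trajectory = [x]
for _ in range(4):
    x = F(x, kappa_N(x))      # apply only the first input, then re-solve
    trajectory.append(x)
print(trajectory)
```

The tuple comparison is one possible selection rule; any rule works as long as it is applied consistently at every state.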
[example: Collision-Avoidance MPC]
Consider a planar vehicle with position $p_k=(p_{1,k},p_{2,k})$, velocity $v_k=(v_{1,k},v_{2,k})$, and acceleration input $u_k\in\mathbb R^2$. With sampling time $\Delta t>0$, one common prediction model is
\begin{align*}
p_{k+1}=p_k+\Delta t\,v_k+\frac{1}{2}\Delta t^2u_k,\qquad v_{k+1}=v_k+\Delta t\,u_k.
\end{align*}
For obstacle center $c_j=(c_{j,1},c_{j,2})$ and safety radius $r_j$, the collision-avoidance constraint is
\begin{align*}
\|p_k-c_j\|\ge r_j.
\end{align*}
Equivalently, since both sides are nonnegative,
\begin{align*}
(p_{1,k}-c_{j,1})^2+(p_{2,k}-c_{j,2})^2\ge r_j^2.
\end{align*}
The actuator constraint is similarly written as
\begin{align*}
u_{1,k}^2+u_{2,k}^2\le u_{\max}^2.
\end{align*}
A finite-horizon MPC problem can therefore minimize, for example,
\begin{align*}
\sum_{k=0}^{N-1}\bigl(\|p_k-p_k^{\mathrm{ref}}\|^2+\lambda\|u_k\|^2\bigr)
\end{align*}
subject to the prediction dynamics, the actuator constraint, and the obstacle inequalities for every predicted time $k$ and every obstacle $j$. After solving, the controller applies only $u_0^*$, obtains the next measured state, updates obstacle estimates if needed, and solves the shifted problem again.
The obstacle constraint is nonconvex. For a single obstacle with $c=0$ and radius $r>0$, the points $p^+=(r,0)$ and $p^-=(-r,0)$ both satisfy
\begin{align*}
\|p^+\|^2=r^2,\qquad \|p^-\|^2=r^2.
\end{align*}
Their midpoint is
\begin{align*}
\frac{p^++p^-}{2}=(0,0),
\end{align*}
and this midpoint violates the constraint because
\begin{align*}
\|(0,0)\|^2=0<r^2.
\end{align*}
Thus the feasible position set outside the disk is not convex. In an MPC problem, this means that a trajectory passing above the obstacle and a trajectory passing below the obstacle may lie in different feasible regions, so a local optimizer can return different candidates depending on initialization.
Feasibility can also be lost if avoidance is delayed. Suppose the vehicle is moving directly toward a circular obstacle and the optimizer waits until the remaining distance is too small for the acceleration bound to generate enough lateral displacement. The later optimization problem then has to satisfy the same inequalities
\begin{align*}
(p_{1,k}-c_1)^2+(p_{2,k}-c_2)^2\ge r^2
\end{align*}
with fewer prediction steps and bounded inputs satisfying $\|u_k\|\le u_{\max}$. The issue is therefore not only choosing a low-cost path, but maintaining enough future maneuvering room so that the next constrained optimization problem remains feasible.
[/example]
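The prediction model and the two constraint checks of this example are easy to evaluate directly. The Python sketch below uses assumed numbers (obstacle at the origin with radius $0.5$, $\Delta t=0.5$, $u_{\max}=1$, and two hand-picked input sequences) and hypothetical helper names; it checks candidate trajectories rather than optimizing over them.

```python
def step(p, v, u, dt):
    """One step of the sampled double-integrator prediction model."""
    p_next = tuple(pi + dt * vi + 0.5 * dt * dt * ui for pi, vi, ui in zip(p, v, u))
    v_next = tuple(vi + dt * ui for vi, ui in zip(v, u))
    return p_next, v_next

def rollout(p0, v0, inputs, dt):
    """Predicted positions p_0, ..., p_N for a given input sequence."""
    p, v = p0, v0
    positions = [p]
    for u in inputs:
        p, v = step(p, v, u, dt)
        positions.append(p)
    return positions

def clear_of(p, c, r):
    """Squared form of the constraint ||p - c|| >= r."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 >= r * r

def admissible(u, u_max):
    """Squared form of the actuator constraint ||u|| <= u_max."""
    return u[0] ** 2 + u[1] ** 2 <= u_max ** 2

# Illustrative numbers (assumptions): obstacle at the origin, vehicle heading at it.
c, r, dt, N, u_max = (0.0, 0.0), 0.5, 0.5, 8, 1.0
p0, v0 = (-2.0, 0.0), (1.0, 0.0)

coast = [(0.0, 0.0)] * N     # no avoidance: drives through the obstacle
swerve = [(0.0, 1.0)] * N    # constant lateral acceleration at the input bound

for name, inputs in (("coast", coast), ("swerve", swerve)):
    path = rollout(p0, v0, inputs, dt)
    safe = all(clear_of(p, c, r) for p in path)
    ok_inputs = all(admissible(u, u_max) for u in inputs)
    print(name, "collision-free:", safe, "inputs admissible:", ok_inputs)

# Nonconvexity: both boundary points are feasible, their midpoint is not.
p_plus, p_minus = (r, 0.0), (-r, 0.0)
midpoint = ((p_plus[0] + p_minus[0]) / 2, (p_plus[1] + p_minus[1]) / 2)
print("midpoint feasible:", clear_of(midpoint, c, r))
```

A mirrored sequence with lateral acceleration $(0,-1)$ passes below the obstacle instead, which is the pair of disconnected candidates a local optimizer may choose between.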
The collision-avoidance problem illustrates both the strength and difficulty of MPC. Constraints appear in the optimization problem directly, but nonconvex safe regions can introduce multiple local minimizers and make real-time computation more delicate.
[definition: Feasible Set of an MPC Problem]
The feasible set for horizon $N$ is
\begin{align*}
\mathcal X_N = \{x \in X : \text{there exists } (u_0,\dots,u_{N-1}) \in U^N \text{ satisfying the prediction constraints and } x_N \in X_f\}.
\end{align*}
[/definition]
The set $\mathcal X_N$ is the domain on which the MPC controller is defined. The next question is whether applying the controller keeps the next state inside the same feasible set.
## Recursive Feasibility
Why should tomorrow's finite-horizon problem remain feasible after today's optimizer is used? The key argument is the shift construction: remove the input already applied, shift the remaining optimal sequence forward, and append the terminal controller at the end.
[quotetheorem:7637]
[citeproof:7637]
Recursive feasibility is the safety backbone of MPC. Once a trajectory starts in $\mathcal X_N$, the controller never asks the optimizer to solve an impossible problem, at least under exact modeling and successful numerical solution. The terminal invariance assumption is essential in a concrete way: for the scalar system $x_{k+1}=x_k+u_k$ with $X=[0,2]$, $U=\{1\}$, $X_f=\{1\}$, and horizon $N=1$, the problem is feasible at $x=0$ by choosing $u_0=1$, but after applying it the state is $x^+=1$ and the next problem would require $1+u_0=1$, impossible under $U=\{1\}$. Optimizer existence is also a real hypothesis rather than a technicality: if $x_{k+1}=0$, $X=X_f=\{0\}$, $U=(0,1)$, and the cost is $\ell(x,u)=u$, the infimum is $0$ but no minimizing input exists, so the feedback input $u_0^*(x)$ is not defined. Exact prediction matters as well; if the true plant is $x_{k+1}=F(x_k,u_k)+d_k$ with unmodelled $d_k\ne 0$, the shifted sequence constructed in the proof may no longer satisfy the state constraints. The theorem also does not guarantee convergence to the equilibrium; that requires the value-function decrease argument. The next example shows this mechanism in a physical system where terminal invariance is a local recoverability condition near the desired operating level.
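Before turning to the tank, the scalar counterexample for terminal invariance can be confirmed by brute force. The Python sketch below encodes exactly the data stated above ($x_{k+1}=x_k+u_k$, $X=[0,2]$, $U=\{1\}$, $X_f=\{1\}$, $N=1$); the function name and the enumeration approach are illustrative choices that only make sense for a finite input set.

```python
from itertools import product

def feasible(x0, F, U, in_X, in_Xf, N):
    """Brute-force feasibility test for a finite input set U and horizon N."""
    for seq in product(U, repeat=N):
        x, ok = x0, True
        for u in seq:
            if not in_X(x):       # x_k in X for k = 0, ..., N-1
                ok = False
                break
            x = F(x, u)
        if ok and in_Xf(x):       # x_N in X_f
            return True
    return False

# Data of the counterexample: the terminal set {1} is not invariant under U = {1}.
F = lambda x, u: x + u
U = (1.0,)
in_X = lambda x: 0.0 <= x <= 2.0
in_Xf = lambda x: x == 1.0

x = 0.0
print("feasible at x = 0:", feasible(x, F, U, in_X, in_Xf, N=1))
x_next = F(x, U[0])               # the only admissible input is applied
print("feasible at x+ = 1:", feasible(x_next, F, U, in_X, in_Xf, N=1))
```

The first problem is feasible and the second is not, which is the loss of recursive feasibility described in the text.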
[example: Nonlinear Tank-Level Regulation with Input Saturation]
For a concrete sampled tank model, take the forward-Euler discretization of an inflow-minus-outflow balance
\begin{align*}
h_{k+1}=h_k+\Delta t\bigl(u_k-\beta\sqrt{h_k}\bigr),
\end{align*}
where $\beta>0$ and $\Delta t>0$. Let the desired level satisfy $0<h_r<h_{\max}$, and choose the steady inflow
\begin{align*}
u_r=\beta\sqrt{h_r}.
\end{align*}
Then $h_r$ is an equilibrium because
\begin{align*}
h_r+\Delta t(u_r-\beta\sqrt{h_r})=h_r+\Delta t(\beta\sqrt{h_r}-\beta\sqrt{h_r})=h_r.
\end{align*}
Use the saturated local proportional controller
\begin{align*}
\kappa_f(h)=\min\{u_{\max},\max\{0,u_r-k(h-h_r)\}\},
\end{align*}
with gain $k\ge 0$. On the terminal interval $X_f=[h_r-\delta,h_r+\delta]$, write $e=h-h_r$. If $\delta$ is chosen so that
\begin{align*}
0<\delta\le \min\{h_r,h_{\max}-h_r\},
\end{align*}
then every $h\in X_f$ satisfies $0\le h\le h_{\max}$. If also
\begin{align*}
k\delta\le u_r
\end{align*}
and
\begin{align*}
k\delta\le u_{\max}-u_r,
\end{align*}
then for $|e|\le\delta$,
\begin{align*}
0\le u_r-k e\le u_{\max}.
\end{align*}
Thus the saturation is inactive on $X_f$, and $\kappa_f(h)=u_r-k e$ there.
For $h=h_r+e\in X_f$, the next error under this local controller is
\begin{align*}
e^+=h_{k+1}-h_r.
\end{align*}
Substituting the dynamics and $u_k=u_r-k e$ gives
\begin{align*}
e^+=h_r+e+\Delta t(u_r-k e-\beta\sqrt{h_r+e})-h_r.
\end{align*}
Using $u_r=\beta\sqrt{h_r}$,
\begin{align*}
e^+=e-\Delta t k e-\Delta t\beta(\sqrt{h_r+e}-\sqrt{h_r}).
\end{align*}
For $h_r+e\ge 0$, rationalizing the square-root difference gives
\begin{align*}
\sqrt{h_r+e}-\sqrt{h_r}=\frac{e}{\sqrt{h_r+e}+\sqrt{h_r}}.
\end{align*}
Therefore
\begin{align*}
e^+=\left(1-\Delta t k-\frac{\Delta t\beta}{\sqrt{h_r+e}+\sqrt{h_r}}\right)e.
\end{align*}
If $\Delta t$, $k$, and $\delta$ are such that
\begin{align*}
\Delta t\left(k+\frac{\beta}{\sqrt{h_r-\delta}+\sqrt{h_r}}\right)\le 1,
\end{align*}
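then for every $|e|\le\delta$ the multiplier lies between $0$ and $1$, because $\sqrt{h_r+e}\ge\sqrt{h_r-\delta}$ bounds the subtracted terms by the left-hand side of this condition. Hence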
\begin{align*}
|e^+|\le |e|\le\delta.
\end{align*}
So $h_{k+1}\in[h_r-\delta,h_r+\delta]$, the actuator constraint is respected, and the terminal interval is positively invariant under the saturated local controller. This is the recursive-feasibility role of the terminal set: once the predicted tank level reaches $X_f$, the appended local law can keep it there without violating the physical level or pump constraints.
[/example]
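The invariance argument can be checked by simulation. In the Python sketch below all parameter values ($\beta=1$, $\Delta t=0.1$, $h_r=1$, $h_{\max}=2$, $u_{\max}=2$, $k=0.5$, $\delta=0.5$) are assumptions chosen to satisfy the conditions of the example; the code first asserts those conditions and then simulates from both endpoints of $X_f$.

```python
import math

# Illustrative parameters (assumptions, not values from the notes).
beta, dt = 1.0, 0.1
h_r, h_max, u_max = 1.0, 2.0, 2.0
k, delta = 0.5, 0.5
u_r = beta * math.sqrt(h_r)

# Conditions used in the example.
assert 0.0 < delta <= min(h_r, h_max - h_r)       # level bounds hold on X_f
assert k * delta <= min(u_r, u_max - u_r)         # saturation inactive on X_f
assert dt * (k + beta / (math.sqrt(h_r - delta) + math.sqrt(h_r))) <= 1.0

def kappa_f(h):
    """Saturated proportional terminal controller."""
    return min(u_max, max(0.0, u_r - k * (h - h_r)))

def tank_step(h, u):
    """Forward-Euler tank update."""
    return h + dt * (u - beta * math.sqrt(h))

def simulate(h0, steps=50):
    levels = [h0]
    for _ in range(steps):
        levels.append(tank_step(levels[-1], kappa_f(levels[-1])))
    return levels

for h0 in (h_r - delta, h_r + delta):
    errors = [abs(h - h_r) for h in simulate(h0)]
    print(f"h0 = {h0}: max |e| = {max(errors):.4f}, final |e| = {errors[-1]:.4f}")
```

In both runs the error magnitude never exceeds its initial value $\delta$, as the multiplier bound predicts.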
This tank example highlights a practical point: constraints are often asymmetric and physical. The lower bound prevents an invalid square-root model, the upper bound prevents overflow, and the input saturation reflects pump capacity.
## Lyapunov Stability of MPC
Recursive feasibility keeps the closed loop admissible, but admissibility alone does not imply convergence. Stability comes from using the optimal value $V_N$ as a Lyapunov function and showing that it decreases along the MPC closed loop.
[quotetheorem:7638]
[citeproof:7638]
This stability result explains why the terminal cost and terminal set are paired. The terminal set alone protects feasibility, while the terminal cost supplies the final inequality needed for descent of the finite-horizon value function. The positivity assumptions on the stage cost and value function are necessary for the stability conclusion: for the scalar closed loop $x_{k+1}=x_k$ with zero stage cost and zero terminal cost, the value function is constant and satisfies a nonincrease inequality, but every nonzero initial condition remains nonzero. The terminal decrease condition is also essential: for $x_{k+1}=2x_k$ with no control, stage cost $\ell(x,0)=x^2$, terminal set $X_f=\mathbb R$, and terminal cost $V_f=0$, feasibility is permanent but the shifted-cost comparison gives no descent and the origin is unstable. Continuity and properness cannot be replaced by a merely pointwise positive function; a discontinuous candidate such as $V(0)=0$ and $V(x)=1$ for $x\ne 0$ has sublevel sets that do not give usable neighbourhood information around the origin, while a bounded function such as $V(x)=x^2/(1+x^2)$ cannot control large excursions through compact sublevel sets. Exact model matching and optimizer selection matter as well; model mismatch can break the shifted candidate argument, and nonunique optimizers can make the implemented feedback discontinuous unless a consistent selection rule is imposed. These limitations motivate the robust MPC variants of Chapter 11 and connect the Lyapunov proof to reachability, dynamic programming, and control barrier functions, all of which provide alternative ways to certify that constrained trajectories remain safe.
[remark: Constraint Tightening]
In the presence of disturbances, model mismatch, or numerical discretization error, the nominal shifted trajectory may fail to satisfy the original constraints. Constraint tightening replaces $X$ and $U$ in the optimizer by smaller sets, leaving a margin that absorbs bounded prediction error. Tube MPC implements this idea by optimizing a nominal trajectory and surrounding it with an invariant error tube.
[/remark]
Constraint tightening is the robust counterpart of recursive feasibility because it preserves the shift argument after uncertainty has moved the real state away from the nominal prediction. The nominal optimizer plans inside smaller sets, the ancillary feedback keeps the real trajectory inside a tube around that nominal plan, and the tube radius accounts for disturbances and discretization error. The price is conservatism: every margin removed from $X$ or $U$ reduces the set of initial states for which the tightened problem is feasible.
[example: Tightened Double Integrator MPC]
Consider the sampled double integrator with additive position disturbance
\begin{align*}
x_{1,k+1}=x_{1,k}+\Delta t\,x_{2,k}+\frac{1}{2}\Delta t^2u_k+w_k,\qquad |w_k|\le \bar w,
\end{align*}
and let the nominal prediction satisfy the disturbance-free equation
\begin{align*}
\bar x_{1,k+1}=\bar x_{1,k}+\Delta t\,\bar x_{2,k}+\frac{1}{2}\Delta t^2\bar u_k.
\end{align*}
Write the position tracking error as $e_{1,k}=x_{1,k}-\bar x_{1,k}$. A tube condition with radius $\rho_k$ means
\begin{align*}
|e_{1,k}|\le \rho_k.
\end{align*}
If the MPC problem imposes the tightened nominal constraint
\begin{align*}
|\bar x_{1,k}|\le a-\rho_k,
\end{align*}
then the real position satisfies the original constraint by the triangle inequality:
\begin{align*}
|x_{1,k}|=|\bar x_{1,k}+e_{1,k}|\le |\bar x_{1,k}|+|e_{1,k}|.
\end{align*}
Using the two bounds separately gives
\begin{align*}
|\bar x_{1,k}|+|e_{1,k}|\le (a-\rho_k)+\rho_k.
\end{align*}
Since $(a-\rho_k)+\rho_k=a$, we obtain
\begin{align*}
|x_{1,k}|\le a.
\end{align*}
For example, if the ancillary feedback keeps the position error from growing by more than one disturbance bound per step, then starting from $e_{1,0}=0$ one may take
\begin{align*}
\rho_k=k\bar w.
\end{align*}
Indeed, the recursive estimate $|e_{1,k+1}|\le |e_{1,k}|+\bar w$ gives $|e_{1,1}|\le \bar w$, then $|e_{1,2}|\le 2\bar w$, and continuing by induction gives $|e_{1,k}|\le k\bar w=\rho_k$. The tightened constraint is therefore a bookkeeping device: the nominal trajectory stays inside $[-a+\rho_k,a-\rho_k]$, and the reserved margin $\rho_k$ absorbs the worst-case tube error so that the real trajectory remains inside $[-a,a]$.
[/example]
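The bookkeeping in this example, and the conservatism it causes, can be made concrete. The following Python sketch assumes the linear tube growth $\rho_k=k\bar w$ from the example; the function names and the numbers $a=1$, $\bar w=0.1$ are hypothetical.

```python
def tightened_bounds(a, w_bar, N):
    """Nominal position bounds a - rho_k with rho_k = k * w_bar, for k = 0..N."""
    return [a - k * w_bar for k in range(N + 1)]

def longest_feasible_horizon(a, w_bar):
    """Largest k for which the tightened interval [-a + rho_k, a - rho_k] is nonempty."""
    if w_bar <= 0.0:
        raise ValueError("w_bar must be positive")
    k = 0
    while a - (k + 1) * w_bar >= 0.0:
        k += 1
    return k

a, w_bar, N = 1.0, 0.1, 5
bounds = tightened_bounds(a, w_bar, N)

# Worst case: nominal position on the tightened bound and error e_k = k * w_bar.
worst_real = [bounds[j] + j * w_bar for j in range(N + 1)]

print("tightened bounds:", [round(b, 3) for b in bounds])
print("worst-case real positions:", [round(x, 3) for x in worst_real])
print("longest horizon for w_bar = 0.3:", longest_feasible_horizon(a, 0.3))
```

Because $\rho_k$ grows linearly here, the tightened interval eventually becomes empty, which caps the usable horizon. A contracting ancillary feedback would give a bounded tube instead.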
The chapter's main message is that constrained MPC is built from three linked mechanisms. Viability describes whether constraints are compatible with the dynamics, recursive feasibility ensures that the optimization remains solvable after applying the controller, and the value-function decrease gives stability when terminal ingredients are chosen to represent the infinite-horizon tail.
Constrained optimal control and MPC have now made feasibility, invariance, and recursive solvability part of the design problem itself. The next chapter asks what remains of these guarantees when the model is imperfect, adding disturbances and uncertainty to the nominal optimization framework.
# 11. Robustness and Disturbance-Aware Design
Robust control asks what survives when the model used for design is not the model followed by the plant. Chapters 2, 6-9, and 10 treated nonlinear stability, optimality, and model predictive control under a nominal dynamics model. This chapter adds disturbances, parameter errors, and constraint uncertainty, and it develops three complementary responses: input-to-state stability for analysis, robust Lyapunov design for feedback synthesis, and tube or min-max ideas for constrained optimal control.
## Disturbance-to-State Estimates
The first question is not how to reject every disturbance, but how to measure the size of the resulting deviation. Asymptotic stability of the unforced system says what happens when the input vanishes; robustness asks for a bound that degrades continuously with the magnitude of the input.
We work with a controlled disturbance system
\begin{align*}
\dot{x} = f(x, w), \qquad x(t) \in \mathbb R^n, \quad w(t) \in \mathbb R^m,
\end{align*}
where $w$ is a measurable locally essentially bounded input. The origin is assumed to satisfy $f(0,0)=0$, so the nominal system has an equilibrium at $0$.
To state disturbance bounds compactly, the course uses comparison functions. These functions are the vocabulary for separating transient decay from disturbance amplification.
[definition: Comparison Function Classes]
A continuous function $\alpha:[0,\infty)\to[0,\infty)$ belongs to class $\mathcal K$ if $\alpha(0)=0$ and $\alpha$ is strictly increasing. It belongs to class $\mathcal K_\infty$ if, in addition, $\alpha(r)\to\infty$ as $r\to\infty$. A continuous function $\beta:[0,\infty)\times[0,\infty)\to[0,\infty)$ belongs to class $\mathcal{K}\mathcal{L}$ if $\beta(\cdot,t)$ belongs to class $\mathcal K$ for each $t\ge 0$, and $\beta(r,t)\to 0$ as $t\to\infty$ for each fixed $r\ge 0$.
[/definition]
These functions encode two effects at once: decay from the initial state and gain from the disturbance magnitude. Ordinary asymptotic stability is not enough for a forced system, because even a small persistent input can prevent convergence to the origin. The needed robustness estimate must separate what dies out with time from what remains because of the size of the disturbance.
[definition: Input-to-State Stability]
The system $\dot{x}=f(x,w)$ is input-to-state stable if there exist $\beta\in\mathcal{K}\mathcal{L}$ and $\gamma\in\mathcal K_\infty$ such that, for every initial state $x_0\in\mathbb R^n$ and every locally essentially bounded input $w:[0,\infty)\to\mathbb R^m$, the solution exists for all $t\ge 0$ and satisfies
\begin{align*}
|x(t)| \le \beta(|x_0|,t) + \gamma(\|w\|_{L^\infty(0,t)})
\end{align*}
for all $t\ge 0$.
[/definition]
The definition says that transients generated by $x_0$ decay, while a persistent disturbance leaves a residual state bound determined by its peak magnitude. When $w=0$, ISS implies global asymptotic stability of the origin for the nominal system. The next example computes the two terms in the ISS estimate in a model where the disturbance has a direct physical meaning.
[example: Cruise Control With Road Grade]
Consider the longitudinal vehicle model
\begin{align*}
\dot{v}=-a(v-v_d)+u+d(t), \qquad a>0,
\end{align*}
where $v_d$ is constant. Set $e=v-v_d$, so $\dot e=\dot v$, and choose proportional feedback $u=-k(v-v_d)=-ke$. Substituting $v-v_d=e$ and $u=-ke$ into the vehicle equation gives
\begin{align*}
\dot e=-ae-ke+d(t)=-(a+k)e+d(t).
\end{align*}
Assume $k>-a$ and write $\lambda=a+k>0$. Multiplying the scalar equation $\dot e+\lambda e=d(t)$ by the integrating factor $e^{\lambda t}$ gives
\begin{align*}
e^{\lambda t}\dot e(t)+\lambda e^{\lambda t}e(t)=e^{\lambda t}d(t).
\end{align*}
By the product rule, the left-hand side is $\frac{d}{dt}(e^{\lambda t}e(t))$, hence integration from $0$ to $t$ gives
\begin{align*}
e^{\lambda t}e(t)-e(0)=\int_0^t e^{\lambda s}d(s)\,ds.
\end{align*}
Multiplying by $e^{-\lambda t}$ yields
\begin{align*}
e(t)=e^{-\lambda t}e(0)+\int_0^t e^{-\lambda(t-s)}d(s)\,ds.
\end{align*}
Taking absolute values and using the triangle inequality,
\begin{align*}
|e(t)|\le e^{-\lambda t}|e(0)|+\int_0^t e^{-\lambda(t-s)}|d(s)|\,ds.
\end{align*}
For $0\le s\le t$, $|d(s)|\le \|d\|_{L^\infty(0,t)}$ almost everywhere, so
\begin{align*}
\int_0^t e^{-\lambda(t-s)}|d(s)|\,ds\le \|d\|_{L^\infty(0,t)}\int_0^t e^{-\lambda(t-s)}\,ds.
\end{align*}
With $r=t-s$, the last integral is
\begin{align*}
\int_0^t e^{-\lambda(t-s)}\,ds=\int_0^t e^{-\lambda r}\,dr=\frac{1-e^{-\lambda t}}{\lambda}.
\end{align*}
Therefore
\begin{align*}
|e(t)|\le e^{-(a+k)t}|e(0)|+\frac{1-e^{-(a+k)t}}{a+k}\|d\|_{L^\infty(0,t)}.
\end{align*}
The first term is the decaying effect of the initial speed error, while the second term shows that the disturbance gain is at most $1/(a+k)$; increasing $k$ reduces this gain, subject to actuator limits and passenger-comfort constraints.
[/example]
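The final estimate can be compared with a simulated trajectory. The Python sketch below is a hedged check under assumed values ($a=0.5$, $k=1.5$, $|d|\le D=2$, $e(0)=-3$) and an assumed piecewise-constant test disturbance. Since the disturbance is held constant over each sample, the update uses the exact one-step solution of $\dot e=-\lambda e+d$, so no integration error enters the comparison.

```python
import math

a, k = 0.5, 1.5            # illustrative drag coefficient and feedback gain (assumptions)
lam = a + k
D = 2.0                    # bound on |d(t)|
h, steps = 0.01, 1000      # sample period and number of samples
e0 = -3.0

def disturbance(n):
    """Piecewise-constant road-grade signal with |d| <= D (an assumed test input)."""
    return D * math.sin(0.7 * n * h)

decay = math.exp(-lam * h)
e = e0
violations = 0
for n in range(steps + 1):
    t = n * h
    # ISS bound with the sup norm of d replaced by its upper bound D.
    bound = math.exp(-lam * t) * abs(e0) + (1.0 - math.exp(-lam * t)) / lam * D
    if abs(e) > bound + 1e-9:
        violations += 1
    # Exact update of e' = -lam*e + d over one sample with d held constant.
    e = decay * e + (1.0 - decay) / lam * disturbance(n)

print("bound violations:", violations)
print("asymptotic gain 1/(a+k):", 1.0 / lam)
```

Using $D$ in place of $\|d\|_{L^\infty(0,t)}$ only loosens the bound, so the check remains valid for any disturbance with $|d|\le D$.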
The cruise-control calculation is useful because the scalar equation can be solved explicitly, but nonlinear systems rarely allow that route. We need a definition that certifies ISS through a pointwise derivative inequality, in the same way that ordinary Lyapunov functions certify nominal stability without solving the ODE.
[definition: ISS Lyapunov Function]
A continuously differentiable function $V:\mathbb R^n\to[0,\infty)$ is an ISS Lyapunov function for $\dot{x}=f(x,w)$ if there exist $\alpha_1,\alpha_2,\alpha_3,\sigma\in\mathcal K_\infty$ such that
\begin{align*}
\alpha_1(|x|) \le V(x) \le \alpha_2(|x|)
\end{align*}
and
\begin{align*}
\nabla V(x)\cdot f(x,w) \le -\alpha_3(|x|)+\sigma(|w|)
\end{align*}
for all $x\in\mathbb R^n$ and $w\in\mathbb R^m$.
[/definition]
The derivative inequality says that $V$ decreases outside a disturbance-dependent neighbourhood of the origin. The nontrivial issue is passing from this pointwise differential inequality to a uniform bound along every disturbed trajectory.
The next theorem supplies that missing comparison step: it shows that a global ISS Lyapunov function is not merely a local differential certificate, but a certificate for the full input-to-state estimate with separate transient and disturbance-gain terms. The point is to convert the scalar inequality for $V$ into comparison functions that bound the actual state along every disturbed trajectory.
[quotetheorem:7639]
[citeproof:7639]
The theorem is the disturbance analogue of the Lyapunov direct method, but each hypothesis has a specific role. The lower and upper comparison bounds on $V$ are what let a scalar inequality for $V$ control the Euclidean size of $x$; without proper radial bounds, $V$ could decrease while the state escapes along directions that $V$ does not measure. The derivative inequality is also global in this statement, so the conclusion is global ISS rather than only a local estimate. Local Lipschitz regularity and forward completeness are not cosmetic assumptions: they rule out nonunique trajectories and finite escape, either of which would make a uniform ISS estimate meaningless. The theorem does not say that the disturbance is rejected exactly; it says that the state ultimately lies in a neighbourhood whose size is controlled by the input magnitude. The next example applies the theorem locally to a mechanical system, showing how the derivative inequality absorbs an unknown forcing term.
[example: Perturbed Pendulum Energy Estimate]
For the closed-loop pendulum with $u=-kq$, the equations are
\begin{align*}
\dot q=p,\qquad \dot p=-\sin q-cp-kq+d(t).
\end{align*}
Choose $0<\rho<\sqrt{1+k}$. Then
\begin{align*}
V(q,p)=\frac{1+k}{2}q^2+\frac{1}{2}p^2+\rho qp
\end{align*}
is positive definite, since the symmetric matrix of the quadratic form $2V$ has positive leading entry $1+k$ and determinant $(1+k)-\rho^2>0$.
Differentiating $V$ along the closed-loop dynamics gives
\begin{align*}
\dot V=(1+k)q\dot q+p\dot p+\rho(\dot q\,p+q\dot p).
\end{align*}
Substituting $\dot q=p$ and $\dot p=-\sin q-cp-kq+d(t)$ gives
\begin{align*}
\dot V=(1+k)qp+p(-\sin q-cp-kq+d)+\rho(p^2+q(-\sin q-cp-kq+d)).
\end{align*}
Expanding and collecting like terms,
\begin{align*}
\dot V=p(q-\sin q)-(c-\rho)p^2-\rho q\sin q-\rho cqp-\rho kq^2+(p+\rho q)d.
\end{align*}
Work on $|q|\le \delta$, where $\mu:=1-\delta^2/6>0$ and
\begin{align*}
\mu\le \frac{\sin q}{q}\le 1
\end{align*}
for $q\ne 0$, with the value at $q=0$ understood by continuity. Then
\begin{align*}
q\sin q\ge \mu q^2
\end{align*}
and
\begin{align*}
|q-\sin q|=|q|\left|1-\frac{\sin q}{q}\right|\le \frac{\delta^2}{6}|q|.
\end{align*}
Therefore, with $b:=\delta^2/6+\rho c$,
\begin{align*}
p(q-\sin q)-\rho cqp\le b|q||p|.
\end{align*}
Thus
\begin{align*}
\dot V\le -\rho(k+\mu)q^2-(c-\rho)p^2+b|q||p|+(p+\rho q)d.
\end{align*}
Choose $\rho>0$ and then $\delta>0$ small enough that $\rho<c$, $\mu>0$, and
\begin{align*}
b<2\sqrt{\rho(k+\mu)(c-\rho)}.
\end{align*}
Then the quadratic expression
\begin{align*}
\rho(k+\mu)q^2+(c-\rho)p^2-b|q||p|
\end{align*}
is positive definite, so there is $m>0$ such that
\begin{align*}
\rho(k+\mu)q^2+(c-\rho)p^2-b|q||p|\ge m(q^2+p^2)
\end{align*}
on this neighbourhood. Hence
\begin{align*}
\dot V\le -m(q^2+p^2)+(p+\rho q)d.
\end{align*}
Finally,
\begin{align*}
|(p+\rho q)d|\le |p+\rho q|\,|d|\le \sqrt{1+\rho^2}\sqrt{q^2+p^2}\,|d|.
\end{align*}
[Young's inequality](/theorems/244) gives
\begin{align*}
\sqrt{1+\rho^2}\sqrt{q^2+p^2}\,|d|\le \frac{m}{2}(q^2+p^2)+\frac{1+\rho^2}{2m}|d|^2.
\end{align*}
Combining the last two estimates,
\begin{align*}
\dot V\le -\frac{m}{2}(q^2+p^2)+\frac{1+\rho^2}{2m}|d(t)|^2.
\end{align*}
Thus $V$ satisfies a local estimate of the form $\dot V\le -a_1(q^2+p^2)+a_2|d|^2$ on $|q|\le\delta$, with $a_1=m/2$ and $a_2=(1+\rho^2)/(2m)$. The inequality shows that damping and feedback stiffness create local decay, while the forcing term contributes only through a quadratic disturbance gain.
[/example]
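The constants in this example can be computed and the final inequality sampled on a grid. In the Python sketch below the parameter values ($k=1$, $c=1$, $\rho=0.3$, $\delta=0.5$) are assumptions. The constant $m$ is taken as the smallest eigenvalue of the $2\times2$ matrix of the quadratic form in $(|q|,|p|)$, which is one valid choice rather than the only one.

```python
import math

# Illustrative parameters (assumptions): stiffness k, damping c, cross term rho, radius delta.
k, c, rho, delta = 1.0, 1.0, 0.3, 0.5

mu = 1.0 - delta ** 2 / 6.0
b = delta ** 2 / 6.0 + rho * c
A_q, A_p = rho * (k + mu), c - rho
assert 0.0 < rho < min(c, math.sqrt(1.0 + k))
assert b < 2.0 * math.sqrt(A_q * A_p)

# Smallest eigenvalue of [[A_q, -b/2], [-b/2, A_p]] is a valid constant m.
m = 0.5 * (A_q + A_p) - math.sqrt(0.25 * (A_q - A_p) ** 2 + 0.25 * b * b)

def V_dot(q, p, d):
    """Derivative of V along q' = p, p' = -sin q - c p - k q + d."""
    p_dot = -math.sin(q) - c * p - k * q + d
    return (1.0 + k) * q * p + p * p_dot + rho * (p * p + q * p_dot)

def iss_bound(q, p, d):
    """Right-hand side -(m/2)(q^2 + p^2) + (1 + rho^2)/(2m) d^2."""
    return -0.5 * m * (q * q + p * p) + (1.0 + rho ** 2) / (2.0 * m) * d * d

def linspace(lo, hi, n):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

worst = max(
    V_dot(q, p, d) - iss_bound(q, p, d)
    for q in linspace(-delta, delta, 41)
    for p in linspace(-2.0, 2.0, 41)
    for d in linspace(-1.0, 1.0, 21)
)
print(f"m = {m:.4f}, largest value of V_dot - bound on the grid: {worst:.3e}")
```

A grid check is not a proof, but a positive value here would signal an algebra slip in the constants.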
## Robust Control Lyapunov Functions
The next design question is how to choose feedback when the disturbance is not known in advance. A nominal control Lyapunov function asks for some input that decreases $V$; a robust control Lyapunov function asks for an input that decreases $V$ uniformly over an uncertainty set or with a prescribed attenuation term.
Let the uncertain control-affine dynamics be
\begin{align*}
\dot{x}=f(x)+g(x)u+p(x)d,
\end{align*}
where $f:\mathbb R^n\to\mathbb R^n$, $g:\mathbb R^n\to\mathbb R^{n\times r}$, and $p:\mathbb R^n\to\mathbb R^{n\times m}$ are locally Lipschitz maps. The control is $u\in\mathbb R^r$, and $d\in\mathbb R^m$ is an unknown disturbance satisfying $|d|\le \bar{d}$. The design goal is to make the decrease condition insensitive to the particular value of $d$.
This uniformity requirement changes what a Lyapunov certificate has to prove. The following definition records the required worst-case decrease directly at each nonzero state.
[definition: Robust Control Lyapunov Function]
A continuously differentiable, positive definite, proper function $V:\mathbb R^n\to[0,\infty)$ is a robust control Lyapunov function for the uncertain system if, for each $x\ne 0$, there exists $u\in\mathbb R^r$ such that
\begin{align*}
\sup_{|d|\le \bar d}\, \nabla V(x)\cdot (f(x)+g(x)u+p(x)d) < 0.
\end{align*}
[/definition]
This definition makes the controller responsible for the worst admissible disturbance direction. In practice the supremum term becomes a computable margin: by Cauchy-Schwarz,
\begin{align*}
\sup_{|d|\le \bar d}\nabla V(x)\cdot p(x)d=\bar d\, |p(x)^\top\nabla V(x)|.
\end{align*}
The next theorem is the closed-loop version of the definition: once a feedback realizes the margin everywhere, Lyapunov stability follows uniformly over the uncertainty set.
[quotetheorem:7640]
[citeproof:7640]
The theorem deliberately separates two cases that are often blurred in informal robust-control language. Exact convergence under a persistent bounded disturbance requires the origin to remain an equilibrium for every admissible disturbance; for additive disturbances with $p(0)d\ne 0$, the state cannot stay at the origin no matter how strong the Lyapunov decrease is away from it. Properness of $V$ is needed to turn decreasing storage into bounded trajectories, and local Lipschitz regularity is needed for well-posed closed-loop solutions. The residual constant $c_{\bar d}$ measures the price of robustness: making it smaller gives tighter regulation, but may require more control authority or may be impossible under actuator constraints. Many engineering specifications therefore allow residual motion but require the motion to have limited energy or amplitude. This motivates a storage inequality that measures how much disturbance energy can pass to a chosen performance output.
[definition: Disturbance Attenuation Inequality]
Let $h:\mathbb R^n\times\mathbb R^r\to\mathbb R^q$ be a performance output map, let $z=h(x,u)$, and let $V:\mathbb R^n\to[0,\infty)$ be a continuously differentiable storage function. For an attenuation level $\gamma>0$, $V$ satisfies the disturbance attenuation inequality if, along closed-loop trajectories,
\begin{align*}
\dot V(x) + |z|^2 - \gamma^2 |d|^2 \le 0.
\end{align*}
[/definition]
This inequality says that the final stored energy plus the accumulated output energy is bounded by the initial stored energy plus $\gamma^2$ times the disturbance energy. It is the nonlinear counterpart of $H^\infty$ gain control, and the next example computes the corresponding gain condition in the cruise-control model.
[example: Attenuating Road-Grade Error]
For the cruise-control error system $\dot e=-(a+k)e+d$ with output $z=e$, take $V(e)=\frac{1}{2}e^2$ and assume $a+k>1$. Since $\frac{d}{dt}\frac{1}{2}e(t)^2=e(t)\dot e(t)$, along trajectories we have
\begin{align*}
\dot V=e\bigl(-(a+k)e+d\bigr)=-(a+k)e^2+ed.
\end{align*}
Because $z=e$, the attenuation expression is
\begin{align*}
\dot V+|z|^2-\gamma^2|d|^2=-(a+k)e^2+ed+e^2-\gamma^2d^2.
\end{align*}
Combining the two $e^2$ terms gives
\begin{align*}
\dot V+|z|^2-\gamma^2|d|^2=-(a+k-1)e^2+ed-\gamma^2d^2.
\end{align*}
Complete the square in $e$:
\begin{align*}
-(a+k-1)e^2+ed-\gamma^2d^2=-(a+k-1)\left(e-\frac{d}{2(a+k-1)}\right)^2-\left(\gamma^2-\frac{1}{4(a+k-1)}\right)d^2.
\end{align*}
Indeed, expanding the square gives
\begin{align*}
-(a+k-1)\left(e^2-\frac{ed}{a+k-1}+\frac{d^2}{4(a+k-1)^2}\right)-\gamma^2d^2+\frac{d^2}{4(a+k-1)}=-(a+k-1)e^2+ed-\gamma^2d^2.
\end{align*}
Therefore, if
\begin{align*}
\gamma^2\ge \frac{1}{4(a+k-1)},
\end{align*}
then both terms in the completed-square expression are nonpositive, so
\begin{align*}
\dot V+|z|^2-\gamma^2|d|^2\le 0.
\end{align*}
Equivalently, any
\begin{align*}
\gamma\ge \frac{1}{2\sqrt{a+k-1}}
\end{align*}
certifies the disturbance attenuation inequality. Integrating the inequality from $0$ to $T$ gives
\begin{align*}
\int_0^T |e(t)|^2\,dt\le V(e(0))+\gamma^2\int_0^T |d(t)|^2\,dt.
\end{align*}
Thus increasing the feedback gain $k$ above the threshold $1-a$ lowers the certified road-grade-to-tracking-error energy gain.
[/example]
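The threshold on $\gamma$ and the sign of the attenuation expression can be evaluated directly. The Python sketch below uses the assumed value $a=1$ and three sample gains; the function names are hypothetical.

```python
import math

def certified_gamma(a, k):
    """Smallest gamma certified by V(e) = e^2 / 2; requires a + k > 1."""
    if a + k <= 1.0:
        raise ValueError("the storage function V = e^2/2 needs a + k > 1")
    return 1.0 / (2.0 * math.sqrt(a + k - 1.0))

def attenuation_expression(e, d, a, k, gamma):
    """V_dot + |z|^2 - gamma^2 |d|^2 for e' = -(a+k) e + d and z = e."""
    v_dot = e * (-(a + k) * e + d)
    return v_dot + e * e - gamma ** 2 * d * d

a = 1.0                                   # illustrative value (assumption)
grid = [i / 10.0 for i in range(-30, 31)]
for k in (1.0, 4.0, 16.0):
    gamma = certified_gamma(a, k)
    worst = max(attenuation_expression(e, d, a, k, gamma) for e in grid for d in grid)
    print(f"k = {k:5.1f}: gamma = {gamma:.4f}, max of expression on grid = {worst:.2e}")
```

At the threshold value of $\gamma$ the expression reaches zero along $e=d/(2(a+k-1))$, so the sampled maximum is zero up to rounding rather than strictly negative.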
The attenuation example treats a single feedback loop, but large controllers are assembled from interacting subsystems. Each subsystem may be ISS with respect to signals produced by the other subsystem. The next theorem supplies the missing interconnection test: it prevents the gains from forming a self-amplifying loop.
[quotetheorem:7641]
A dedicated nonlinear stability course usually derives this result from comparison estimates for the two ISS bounds. The strict inequality is important: if the composed gain equals the identity at some amplitude, the estimates permit a self-sustaining internal signal of that size, so decay need not follow from the subsystem bounds alone. If the composed gain exceeds the identity, the interconnection can amplify disturbances around the loop even when each subsystem is ISS in isolation. The theorem also assumes the interconnection is well posed and that the internal signals appearing in the two ISS estimates are exactly the signals fed between the subsystems; it does not certify robustness for algebraic loops or hidden unmodelled channels. Here the theorem is used as a design rule: the loop gain created by the interconnection must remain below the identity gain at every amplitude.
[example: Coupled Actuator and Plant]
Suppose the plant is ISS with respect to actuator error $e$ with internal gain $\gamma_{xe}(r)=2r$, and the actuator error dynamics are ISS with respect to plant motion with internal gain $\gamma_{ex}(r)=ar$, where $a\ge 0$. By the *[ISS Small-Gain Theorem](/theorems/7641)*, the interconnection is certified when the composed gain satisfies $(\gamma_{xe}\circ\gamma_{ex})(r)<r$ for every $r>0$.
For $r>0$, first apply $\gamma_{ex}$:
\begin{align*}
\gamma_{ex}(r)=ar.
\end{align*}
Then apply $\gamma_{xe}$ to that value:
\begin{align*}
(\gamma_{xe}\circ\gamma_{ex})(r)=\gamma_{xe}(ar)=2(ar)=2ar.
\end{align*}
Thus the small-gain condition is
\begin{align*}
2ar<r \qquad \text{for every } r>0.
\end{align*}
Since $r>0$, dividing both sides by $r$ gives
\begin{align*}
2a<1.
\end{align*}
Dividing by $2$ gives
\begin{align*}
a<\frac{1}{2}.
\end{align*}
Therefore the plant-actuator interconnection is certified by the small-gain test exactly when $a<1/2$. If $a$ reaches or exceeds $1/2$, the composed gain $2ar$ is no longer strictly below the identity gain, so the subsystem ISS estimates alone do not rule out amplification around the plant-actuator loop.
[/example]
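The small-gain condition of the example can also be spot-checked numerically. The Python sketch below is only an illustration: the helper names and the sampled amplitudes are assumptions introduced here, and a finite sample cannot prove the inequality for every $r>0$. It is useful mainly as a sanity check when the gains are not linear and the algebra is less transparent.

```python
def composed_gain_below_identity(gamma_outer, gamma_inner, amplitudes):
    """Return True if gamma_outer(gamma_inner(r)) < r at every sampled r > 0."""
    return all(gamma_outer(gamma_inner(r)) < r for r in amplitudes)


def plant_actuator_certified(a, amplitudes):
    """Spot check for the gains gamma_xe(r) = 2 r and gamma_ex(r) = a r."""
    gamma_xe = lambda r: 2.0 * r   # plant gain with respect to actuator error
    gamma_ex = lambda r: a * r     # actuator-error gain with respect to plant motion
    return composed_gain_below_identity(gamma_xe, gamma_ex, amplitudes)


# Sampled amplitudes spanning several decades (an arbitrary illustrative choice).
AMPLITUDES = [10.0 ** k for k in range(-3, 4)]
```

With these samples, `plant_actuator_certified(0.25, AMPLITUDES)` passes, while the boundary value `a = 0.5` fails because the strict inequality is violated.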
## Tube MPC and Min-Max Design
The final question is how to enforce constraints under model uncertainty. Nominal MPC plans one future trajectory, but the true state may deviate from the plan under disturbances. Robust MPC therefore plans a family of possible trajectories and keeps the entire family inside the constraint set.
For a discrete-time uncertain system
\begin{align*}
x_{k+1}=F(x_k,u_k,w_k), \qquad w_k\in W,
\end{align*}
let the constraints be $x_k\in X$ and $u_k\in U$, where $X\subset\mathbb R^n$, $U\subset\mathbb R^r$, and $W\subset\mathbb R^m$ are compact sets. In tube MPC, a nominal state $z_k$ is planned and the applied control has the form
\begin{align*}
u_k=v_k+K(x_k-z_k),
\end{align*}
where $v_k$ is the nominal input and $K$ is a fixed feedback gain, chosen so that the error $e_k=x_k-z_k$ is kept inside a robust positively invariant error set.
The tube construction needs a set that can contain every future error once the current error starts inside it. This is the set-valued analogue of an invariant region for a disturbed discrete-time system.
[definition: Robust Positively Invariant Set]
Let $W\subset\mathbb R^m$ and let $\Phi:\mathbb R^n\times W\to\mathbb R^n$ define the error dynamics $e_{k+1}=\Phi(e_k,w_k)$ with $w_k\in W$. A set $E\subset\mathbb R^n$ is robust positively invariant if
\begin{align*}
e\in E,\ w\in W \implies \Phi(e,w)\in E.
\end{align*}
[/definition]
Once such a set is known, the nominal planner can tighten constraints by the worst-case error set. We need an operation that keeps exactly those nominal points whose whole error cloud still lies inside the original constraint set.
[definition: Pontryagin Difference]
For sets $A,B\subset\mathbb R^n$, the Pontryagin difference is
\begin{align*}
A\ominus B := \{a\in\mathbb R^n : a+B\subset A\}.
\end{align*}
[/definition]
Thus $z\in X\ominus E$ means that every true state $x=z+e$ with $e\in E$ lies in $X$. The remaining issue for tube MPC is simultaneous: the error must stay inside the chosen tube under all disturbances, and the nominal plan must leave enough margin so that adding any admissible error still satisfies the original state and input constraints. Robust invariance handles the first requirement, while Pontryagin tightening encodes the second.
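For intervals on the real line, both notions reduce to endpoint arithmetic. The Python sketch below is illustrative only: the function names and the scalar error model $e_{k+1}=\phi e_k+w_k$ with $|w_k|\le \bar w$ and $|\phi|<1$ are assumptions introduced here. It computes a symmetric robust positively invariant interval and the corresponding tightened constraint interval.

```python
def pontryagin_difference_interval(a, b):
    """Pontryagin difference of closed intervals a = (a_lo, a_hi), b = (b_lo, b_hi).

    Returns the interval {z : z + b is contained in a}, or None if it is empty.
    """
    lo = a[0] - b[0]
    hi = a[1] - b[1]
    return (lo, hi) if lo <= hi else None


def scalar_rpi_interval(phi, w_max):
    """Symmetric RPI interval for e+ = phi * e + w with |w| <= w_max, |phi| < 1."""
    if abs(phi) >= 1.0:
        raise ValueError("need |phi| < 1 for a bounded invariant interval")
    radius = w_max / (1.0 - abs(phi))
    return (-radius, radius)


def is_rpi_interval(interval, phi, w_max):
    """Check |phi| * radius + w_max <= radius for a symmetric interval."""
    radius = interval[1]
    return abs(phi) * radius + w_max <= radius + 1e-12
```

For $\phi=0.5$ and $\bar w=0.25$ the invariant interval is $[-0.5,0.5]$, so the constraint $X=[-2,2]$ tightens to $[-1.5,1.5]$. An error interval wider than $X$ produces an empty difference, which the sketch reports as `None`.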
[quotetheorem:7642]
[citeproof:7642]
The theorem explains the geometry of tube MPC: optimize over a smaller nominal feasible region so that the real trajectory can move inside a certified tube. Robust positive invariance of $E$ is the central hypothesis; if the error set is not invariant, the true state can leave the tube even when the nominal plan satisfies all tightened constraints. The Pontryagin differences must also be nonempty: if $X\ominus E$ or $U\ominus KE$ is empty, the robustified problem has protected the constraints by eliminating every admissible nominal plan. Finally, the guarantee lasts only while the nominal MPC problem remains feasible; recursive feasibility usually requires a terminal set or an additional invariance argument. The price of robustness is loss of usable constraint space, and this loss grows with the disturbance set and with the conservatism of the invariant set $E$.
[example: Tube MPC for a Constrained Vehicle Model]
Consider the sampled vehicle model with state $x_k=(p_k,v_k)$ and disturbance $w_k=(w_{p,k},w_{v,k})$:
\begin{align*}
p_{k+1}=p_k+Tv_k+\frac{T^2}{2}u_k+w_{p,k}, \quad v_{k+1}=v_k+Tu_k+w_{v,k}.
\end{align*}
Let the nominal state be $z_k=(\bar p_k,\bar v_k)$ and let the nominal input be $\nu_k$, with
\begin{align*}
\bar p_{k+1}=\bar p_k+T\bar v_k+\frac{T^2}{2}\nu_k, \quad \bar v_{k+1}=\bar v_k+T\nu_k.
\end{align*}
Use the tube feedback
\begin{align*}
u_k=\nu_k+K_p(p_k-\bar p_k)+K_v(v_k-\bar v_k),
\end{align*}
and write $e^p_k=p_k-\bar p_k$, $e^v_k=v_k-\bar v_k$. Then $u_k-\nu_k=K_p e^p_k+K_v e^v_k$, so subtracting the nominal position update from the true position update gives
\begin{align*}
e^p_{k+1}=e^p_k+T e^v_k+\frac{T^2}{2}(K_p e^p_k+K_v e^v_k)+w_{p,k}.
\end{align*}
Collecting the $e^p_k$ and $e^v_k$ terms,
\begin{align*}
e^p_{k+1}=\left(1+\frac{T^2}{2}K_p\right)e^p_k+\left(T+\frac{T^2}{2}K_v\right)e^v_k+w_{p,k}.
\end{align*}
Similarly, subtracting the nominal speed update from the true speed update gives
\begin{align*}
e^v_{k+1}=e^v_k+T(K_p e^p_k+K_v e^v_k)+w_{v,k}.
\end{align*}
Collecting terms,
\begin{align*}
e^v_{k+1}=T K_p e^p_k+(1+T K_v)e^v_k+w_{v,k}.
\end{align*}
Suppose $K=(K_p,K_v)$ is chosen so that there is a box or zonotope $E$ satisfying the robust invariance condition $(A+BK)E+W\subset E$, where the two scalar equations above are the component form of $e_{k+1}=(A+BK)e_k+w_k$. If $e_0\in E$ and $w_0\in W$, then $e_1=(A+BK)e_0+w_0\in E$. Repeating the same implication, $e_k\in E$ for every $k$ by induction.
Let
\begin{align*}
X=\{(p,v): |p|\le p_{\max},\ |v|\le v_{\max}\}, \quad U=\{u: |u|\le u_{\max}\}.
\end{align*}
If the nominal MPC enforces $z_k\in X\ominus E$, then $z_k+e\in X$ for every $e\in E$ by the definition of the Pontryagin difference. Since $e_k\in E$ and $x_k=z_k+e_k$, it follows that $x_k\in X$. If the nominal MPC also enforces $\nu_k\in U\ominus KE$, then $\nu_k+Ke\in U$ for every $e\in E$. Taking $e=e_k$ gives
\begin{align*}
u_k=\nu_k+Ke_k\in U.
\end{align*}
Thus the planned trajectory uses the tightened sets $X\ominus E$ and $U\ominus KE$ so that every true trajectory whose error remains in the tube satisfies the original position, speed, and acceleration constraints.
[/example]
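The tightening in the vehicle example can be made concrete under strong simplifying assumptions. The Python sketch below assumes $T=1$, the hypothetical gains $K_p=-1$, $K_v=-1.5$ (chosen here so that $(A+BK)^2=0$), and a disturbance box $|w_p|\le 0.125$, $|w_v|\le 0.25$. With a nilpotent closed-loop matrix, the zonotope $E=W\oplus(A+BK)W$ satisfies $(A+BK)E+W=E$, so it is robust positively invariant. Because $E$ is symmetric and $X$, $U$ are symmetric boxes, the tightened bounds follow from the support function of $E$ in the directions $(1,0)$, $(0,1)$, and $K$.

```python
T = 1.0                      # sampling period (illustrative assumption)
K = (-1.0, -1.5)             # (K_p, K_v): assumed deadbeat gains for T = 1
W_MAX = (0.125, 0.25)        # assumed bounds |w_p| <= 0.125, |w_v| <= 0.25

# Closed-loop error matrix A + B K, read off from the two error equations.
M = [[1.0 + 0.5 * T**2 * K[0], T + 0.5 * T**2 * K[1]],
     [T * K[0], 1.0 + T * K[1]]]


def support_box(c, w_max):
    """Support function of the disturbance box W in direction c."""
    return abs(c[0]) * w_max[0] + abs(c[1]) * w_max[1]


def transpose_times(mat, c):
    """Return mat^T c for a 2x2 matrix."""
    return (mat[0][0] * c[0] + mat[1][0] * c[1],
            mat[0][1] * c[0] + mat[1][1] * c[1])


def support_tube(c):
    """Support function of E = W + M W (Minkowski sum) in direction c."""
    return support_box(c, W_MAX) + support_box(transpose_times(M, c), W_MAX)


def tightened_bounds(p_max, v_max, u_max):
    """Bounds describing X - E and U - K E for symmetric box constraints."""
    return (p_max - support_tube((1.0, 0.0)),
            v_max - support_tube((0.0, 1.0)),
            u_max - support_tube(K))
```

For general stable gains the invariant set is an infinite Minkowski sum and needs an outer approximation; the nilpotent choice is used here only because it makes the sum finite and exact.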
Min-max MPC takes a different viewpoint. Instead of precomputing a tube and then solving a nominal optimization problem, it optimizes against the worst disturbance sequence directly.
[definition: Finite-Horizon Min-Max Control Problem]
Let $F:\mathbb R^n\times\mathbb R^r\times W\to\mathbb R^n$ define the dynamics $x_{k+1}=F(x_k,u_k,w_k)$, where $W\subset\mathbb R^m$ is the disturbance set. Let $\ell:\mathbb R^n\times\mathbb R^r\to[0,\infty)$ be the stage cost, let $V_f:\mathbb R^n\to[0,\infty)$ be the terminal cost, and let $X\subset\mathbb R^n$, $U\subset\mathbb R^r$ be the state and input constraint sets. The finite-horizon min-max value is
\begin{align*}
J_N(x)=\inf_{u_0,\dots,u_{N-1}}\sup_{w_0,\dots,w_{N-1}\in W}
\left(\sum_{k=0}^{N-1}\ell(x_k,u_k)+V_f(x_N)\right),
\end{align*}
subject to the dynamics, $x_k\in X$, and $u_k\in U$ for every admissible disturbance sequence.
[/definition]
This formulation is conceptually direct but computationally expensive, because the controller is solving a game against nature. Tube MPC is often used because it separates the geometric robustness certificate from the online optimization.
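The cost of the game against nature is visible even in a toy instance. The Python sketch below is a brute-force illustration under assumptions made here: a scalar system $x_{k+1}=ax_k+u_k+w_k$, quadratic stage and terminal costs, a finite list of candidate inputs, and disturbances restricted to a finite list of levels. State and input constraints are omitted. For linear dynamics and convex costs the worst case over an interval is attained at its endpoints, which is why enumerating extreme levels is reasonable in this special case.

```python
from itertools import product


def worst_case_cost(x0, u_seq, w_levels, a=1.0):
    """Max over disturbance sequences of the cost for x+ = a x + u + w."""
    worst = float("-inf")
    for w_seq in product(w_levels, repeat=len(u_seq)):
        x, cost = x0, 0.0
        for u, w in zip(u_seq, w_seq):
            cost += x * x + u * u          # stage cost l(x, u) = x^2 + u^2
            x = a * x + u + w
        cost += x * x                      # terminal cost V_f(x) = x^2
        worst = max(worst, cost)
    return worst


def min_max_control(x0, horizon, u_levels, w_levels, a=1.0):
    """Enumerate input sequences and return (value, best input sequence)."""
    best_value, best_seq = float("inf"), None
    for u_seq in product(u_levels, repeat=horizon):
        value = worst_case_cost(x0, u_seq, w_levels, a)
        if value < best_value:
            best_value, best_seq = value, u_seq
    return best_value, best_seq
```

The number of evaluated trajectories is the product of the number of input sequences and the number of disturbance sequences, so it grows exponentially with the horizon. This is the practical reason for preferring structured formulations such as tubes.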
[remark: Robustness Trade-Off]
ISS analysis, robust CLFs, and robust MPC address the same difficulty at different levels. ISS gives input-output estimates for a closed-loop system, robust CLFs design feedback with disturbance margins, and tube MPC enforces hard constraints by tightening the nominal planning problem. Stronger robustness margins usually reduce nominal performance, so design requires choosing which errors must be attenuated and which constraints must be protected.
[/remark]
Robust design has shown how to protect constraints and stability margins in the presence of disturbances and model mismatch. The final chapter turns from theory to computation and case studies, where the optimal-control and robustness ideas from the whole course are implemented by direct transcription and numerical methods.
# 12. Numerical Methods and Case Studies
## Direct Transcription of Optimal Control Problems
This chapter turns the optimal-control theory from Chapters 6-11 into numerical algorithms. The prerequisites are the finite-horizon optimal control problem, Pontryagin's maximum principle, the Hamilton-Jacobi-Bellman equation, and the finite-dimensional first-order conditions for constrained nonlinear programming. The main theme is that discretisation is not a cosmetic step: it determines the decision variables, the sparsity pattern, the multipliers, and the interpretation of the computed controller. We move from direct transcription to indirect boundary-value methods, then compare PMP, HJB, and MPC on the same nonlinear examples.
The first computational problem is to replace an infinite-dimensional choice of a trajectory and a control by finitely many decision variables without losing the structure of the dynamics. Direct methods do this by discretising the state, the control, or both, and then solving a nonlinear programming problem. The resulting optimisation problem is often large and sparse, so the useful formulation is the one that exposes the local coupling in time.
Consider the finite-horizon control problem with cost
\begin{align*}
\Phi(x(T)) + \int_0^T L(x(t),u(t))\,dt,
\end{align*}
subject to the dynamics $\dot{x}(t)=f(x(t),u(t))$, the initial condition $x(0)=x_0$, path constraints $g(x(t),u(t))\le 0$, and terminal constraints $\psi(x(T))=0$. Here $x(t)\in\mathbb R^n$, $u(t)\in\mathbb R^m$, and $f$ is nonlinear. A direct method chooses a grid $0=t_0<t_1<\cdots<t_N=T$ and approximates the optimisation variables by finitely many values $x_k\approx x(t_k)$ and $u_k\approx u(t_k)$.
[definition: Direct Shooting]
Fix integers $N\ge 1$, a grid $0=t_0<t_1<\cdots<t_N=T$, a finite-dimensional control parameter space $U_N\subseteq \mathbb R^{mN}$, and a numerical flow map $\varphi_{h_k}: \mathbb R^n\times \mathbb R^m\to \mathbb R^n$ for each step $h_k=t_{k+1}-t_k$. For the class $\mathcal A_{\mathrm{oc}}$ of finite-horizon optimal control problems with fixed initial state $x_0\in\mathbb R^n$, direct shooting is the transcription map
\begin{align*}
\mathcal T_{\mathrm{shoot}}:\mathcal A_{\mathrm{oc}}\times \{(t_k)_{k=0}^N\}\times U_N\longrightarrow \mathcal P_N,
\end{align*}
where $\mathcal P_N$ denotes finite-dimensional nonlinear programmes with decision variable $u_N=(u_0,\dots,u_{N-1})\in U_N$, and where the state sequence is defined recursively by
\begin{align*}
x_{k+1}=\varphi_{h_k}(x_k,u_k),\qquad k=0,\dots,N-1.
\end{align*}
[/definition]
Direct shooting keeps the optimisation dimension small, which is attractive when the control dimension is much smaller than the state dimension. Its weakness is that long integrations make the endpoint and path constraints highly nonlinear functions of the control parameters, so sensitivity to early controls can make the nonlinear programme poorly conditioned.
[example: Direct Shooting for Pendulum Swing-Up]
For a torque-controlled pendulum with state $x=(\theta,\omega)$ and scalar input $u$, let $\theta=0$ denote the upright equilibrium and take positive torque to increase $\theta$. Choose a grid $0=t_0<t_1<\cdots<t_N=T$, set $h_k=t_{k+1}-t_k$, and use piecewise constant torques
\begin{align*}
u(t)=u_k\quad\text{for }t\in[t_k,t_{k+1}).
\end{align*}
The shooting variables are therefore the finite sequence $(u_0,\dots,u_{N-1})$, while the state is generated by integrating
\begin{align*}
\dot{\theta}=\omega,
\end{align*}
\begin{align*}
\dot{\omega}=\frac{g}{\ell}\sin\theta+\frac{1}{I}u_k
\end{align*}
on each interval.
For example, with an explicit Euler step, the one-step update is obtained by replacing each derivative by its value at $(\theta_k,\omega_k,u_k)$:
\begin{align*}
\theta_{k+1}=\theta_k+h_k\omega_k.
\end{align*}
\begin{align*}
\omega_{k+1}=\omega_k+h_k\left(\frac{g}{\ell}\sin\theta_k+\frac{1}{I}u_k\right).
\end{align*}
A typical direct-shooting nonlinear programme is then
\begin{align*}
\min_{u_0,\dots,u_{N-1}}\; q_\theta\theta_N^2+q_\omega\omega_N^2+\sum_{k=0}^{N-1} h_k r u_k^2,
\end{align*}
subject to the recursive state equations above and any torque bounds such as $|u_k|\le u_{\max}$. The terminal terms $q_\theta\theta_N^2$ and $q_\omega\omega_N^2$ penalise failure to arrive at the upright rest state $(0,0)$, while the sum $\sum_{k=0}^{N-1}h_kr u_k^2$ is the rectangular-rule discretisation of the control-energy cost $\int_0^T r u(t)^2\,dt$.
The energy-pumping interpretation comes from multiplying the angular equation by $I\omega$. Since
\begin{align*}
I\omega\dot{\omega}=\frac{Ig}{\ell}\omega\sin\theta+u\omega,
\end{align*}
and
\begin{align*}
\frac{d}{dt}\left(\frac{1}{2}I\omega^2\right)=I\omega\dot{\omega},
\end{align*}
the control contribution to the rate of change of kinetic energy is the power term $u\omega$. Thus a torque with the same sign as $\omega$ increases kinetic energy, while a torque with the opposite sign decreases it. Direct shooting can therefore find swing-up trajectories by selecting early torques with $u_k\omega_k>0$ to build motion and later torques with $u_k\omega_k<0$ to brake near $(\theta,\omega)=(0,0)$. The same recursive map also explains the sensitivity to the initial guess: changing an early $u_k$ changes $\omega_{k+1}$, then changes $\theta_{k+2}$ through the next update, and this propagated phase difference can lead to a very different terminal state.
[/example]
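The objective of the direct-shooting programme is a function of the torque sequence alone, evaluated by rolling out the Euler recursion. The Python sketch below shows that evaluation under assumptions made here: a uniform step $h$, illustrative values for $g/\ell$, $I$, and the weights, and hypothetical function names. A nonlinear programming solver would call such a function repeatedly; no solver is included.

```python
import math


def rollout(theta0, omega0, torques, h, g_over_l=9.81, inertia=1.0):
    """Explicit Euler rollout of the pendulum; returns the list of states."""
    states = [(theta0, omega0)]
    theta, omega = theta0, omega0
    for u in torques:
        # Both updates use the state at step k, as in the displayed recursion.
        theta_next = theta + h * omega
        omega_next = omega + h * (g_over_l * math.sin(theta) + u / inertia)
        theta, omega = theta_next, omega_next
        states.append((theta, omega))
    return states


def shooting_cost(torques, theta0, omega0, h, q_theta=10.0, q_omega=1.0, r=0.1):
    """Terminal penalty plus rectangular-rule control energy."""
    theta_n, omega_n = rollout(theta0, omega0, torques, h)[-1]
    control_energy = sum(h * r * u * u for u in torques)
    return q_theta * theta_n ** 2 + q_omega * omega_n ** 2 + control_energy
```

Starting at the upright rest state with zero torque, the rollout stays at $(0,0)$ and the cost is zero. A single unit torque at the first step changes $\omega_1$ immediately and $\theta_2$ one step later, which is the propagation mechanism described in the example.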
The pendulum example shows the main defect of direct shooting: the optimiser sees the state only through a long simulation map. To improve conditioning, the next discretisation introduces state values as decision variables and asks the optimiser to enforce consistency between short simulated arcs.
[definition: Multiple Shooting]
Fix the same problem class $\mathcal A_{\mathrm{oc}}$, grid, control parameter space $U_N\subseteq\mathbb R^{mN}$, and one-step flow maps $\varphi_{h_k}: \mathbb R^n\times\mathbb R^m\to\mathbb R^n$. Multiple shooting is the transcription map
\begin{align*}
\mathcal T_{\mathrm{ms}}:\mathcal A_{\mathrm{oc}}\times \{(t_k)_{k=0}^N\}\times U_N\longrightarrow \mathcal P_N
\end{align*}
whose nonlinear programme has decision variable
\begin{align*}
z=(x_0,\dots,x_N,u_0,\dots,u_{N-1})\in(\mathbb R^n)^{N+1}\times U_N
\end{align*}
and contains, for every interval $[t_k,t_{k+1}]$, the continuity constraint
\begin{align*}
x_{k+1}=\varphi_{h_k}(x_k,u_k),
\end{align*}
where $h_k=t_{k+1}-t_k$.
[/definition]
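The continuity constraints can be evaluated as defects, one per interval. The Python sketch below uses hypothetical names and a double-integrator Euler step as a stand-in for the flow map $\varphi_{h_k}$; both are assumptions made for illustration. It shows that perturbing one node value changes only the two defects that touch that node.

```python
def continuity_defects(states, controls, step_map, steps):
    """Return the defects x_{k+1} - phi_{h_k}(x_k, u_k) for each interval."""
    defects = []
    for k, (u, h) in enumerate(zip(controls, steps)):
        predicted = step_map(states[k], u, h)
        defects.append(tuple(xn - xp for xn, xp in zip(states[k + 1], predicted)))
    return defects


def euler_double_integrator(x, u, h):
    """Assumed one-step map: explicit Euler for position-velocity dynamics."""
    position, velocity = x
    return (position + h * velocity, velocity + h * u)
```

In a multiple-shooting programme these defects are the equality constraints, and their Jacobian is block banded because each defect depends only on $(x_k,u_k,x_{k+1})$.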
Multiple shooting gives a larger nonlinear programme, but its constraints are local in time. The remaining cost of the method is that numerical integration is still embedded in every interval constraint. This motivates a formulation where the trajectory itself is represented by polynomials and the dynamics are enforced algebraically at selected points.
[definition: Direct Collocation]
Fix a polynomial degree $r\ge 1$, a mesh $0=t_0<\cdots<t_N=T$, collocation nodes $c_j\in[0,1]$ for $j=1,\dots,s$, and finite-dimensional coefficient spaces $X_N$ and $U_N$ for piecewise polynomial state and control curves
\begin{align*}
x_N:[0,T]\to\mathbb R^n,\qquad u_N:[0,T]\to\mathbb R^m.
\end{align*}
Direct collocation is the transcription map
\begin{align*}
\mathcal T_{\mathrm{coll}}:\mathcal A_{\mathrm{oc}}\times \{(t_k)_{k=0}^N,(c_j)_{j=1}^s,r\}\longrightarrow \mathcal P_N
\end{align*}
whose nonlinear programme has decision variables given by the coefficients of $(x_N,u_N)\in X_N\times U_N$ and whose dynamic equality constraints are
\begin{align*}
\dot{x}_N(t_k+c_jh_k)=f(x_N(t_k+c_jh_k),u_N(t_k+c_jh_k))
\end{align*}
for all mesh intervals and collocation nodes, together with the transcribed endpoint and path constraints.
[/definition]
Collocation removes numerical integration from the inner loop of the optimisation algorithm. Instead of repeatedly simulating the system, it asks for a piecewise polynomial curve whose derivative matches the vector field at prescribed points.
[example: Hermite-Simpson Collocation]
On an interval $[t_k,t_{k+1}]$ with step size $h=t_{k+1}-t_k$, write
\begin{align*}
f_k=f(x_k,u_k),\qquad f_c=f(x_c,u_c),\qquad f_{k+1}=f(x_{k+1},u_{k+1}).
\end{align*}
Hermite-Simpson collocation represents the state on this interval by the cubic Hermite polynomial $X(s)$ in the normalized variable $s=(t-t_k)/h$, determined by
\begin{align*}
X(0)=x_k,\qquad X(1)=x_{k+1},\qquad X'(0)=hf_k,\qquad X'(1)=hf_{k+1}.
\end{align*}
The cubic with these four endpoint conditions is
\begin{align*}
X(s)=(2s^3-3s^2+1)x_k+(-2s^3+3s^2)x_{k+1}+h(s^3-2s^2+s)f_k+h(s^3-s^2)f_{k+1}.
\end{align*}
Evaluating at the midpoint $s=1/2$ gives
\begin{align*}
X(1/2)=\left(\frac14-\frac34+1\right)x_k+\left(-\frac14+\frac34\right)x_{k+1}+h\left(\frac18-\frac12+\frac12\right)f_k+h\left(\frac18-\frac14\right)f_{k+1}.
\end{align*}
Thus the midpoint state constraint is
\begin{align*}
x_c=X(1/2)=\frac{x_k+x_{k+1}}{2}+\frac{h}{8}(f_k-f_{k+1}).
\end{align*}
The dynamics also imply the integral identity
\begin{align*}
x_{k+1}-x_k=\int_{t_k}^{t_{k+1}} f(x(t),u(t))\,dt.
\end{align*}
Simpson quadrature on the same interval uses the endpoint and midpoint vector-field values, so the collocation defect constraint is
\begin{align*}
x_{k+1}-x_k=\frac{h}{6}(f_k+4f_c+f_{k+1}).
\end{align*}
Equivalently, substituting back the definitions of $f_k$, $f_c$, and $f_{k+1}$ gives
\begin{align*}
x_{k+1}-x_k=\frac{h}{6}\bigl(f(x_k,u_k)+4f(x_c,u_c)+f(x_{k+1},u_{k+1})\bigr).
\end{align*}
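The midpoint constraint ties $x_c$ to the cubic interpolant, and the defect constraint matches the state increment to the Simpson quadrature of the vector field. Because Simpson quadrature is of higher order than the rectangular rule behind Euler transcription, the method can capture smooth state and control profiles with fewer grid points than first-order Euler transcription.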
[/example]
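The two Hermite-Simpson constraints can be written as residuals that a nonlinear programming solver drives to zero. The Python sketch below treats a scalar state for brevity; the function name and the test dynamics $f(x,u)=u$ are assumptions chosen for illustration.

```python
def hermite_simpson_residuals(f, x_k, u_k, x_c, u_c, x_k1, u_k1, h):
    """Return (midpoint residual, Simpson defect) on one interval of length h."""
    f_k = f(x_k, u_k)
    f_c = f(x_c, u_c)
    f_k1 = f(x_k1, u_k1)
    # Midpoint of the cubic Hermite interpolant.
    midpoint = x_c - (0.5 * (x_k + x_k1) + (h / 8.0) * (f_k - f_k1))
    # Simpson quadrature of the vector field.
    defect = (x_k1 - x_k) - (h / 6.0) * (f_k + 4.0 * f_c + f_k1)
    return midpoint, defect
```

For $\dot x=u$ with $u(t)=t$, the exact solution from $x(0)=0$ is $x(t)=t^2/2$. Sampling it at $t=0$, $h/2$, and $h$ makes both residuals vanish up to rounding, as expected because the trajectory is a polynomial of degree at most three.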
The Hermite-Simpson constraints illustrate that direct transcription produces an ordinary nonlinear programme with sparse equality and inequality constraints. To interpret solver output, and to connect the computation back to optimal-control theory, we need the first-order optimality conditions of that finite-dimensional programme.
[quotetheorem:7643]
[citeproof:7643]
For optimal control, the multipliers attached to the dynamic constraints behave like discrete costates. Differentiability is needed because the stationarity equation is a first-order derivative identity; for nonsmooth objectives or switching costs the appropriate conditions use subgradients instead. Local minimality is also essential: a feasible point that is not locally optimal may satisfy the constraints and still admit a feasible descent direction, such as the unconstrained point $z=1$ for $F(z)=z^2$, where stationarity fails. LICQ rules out degenerate active constraints; without a constraint qualification, as in minimising $F(z)=z$ subject to $z^2\le 0$, the only feasible point $z^*=0$ is trivially optimal, yet no nonnegative multiplier $\mu$ can satisfy $1+2z^*\mu=0$. These limitations explain why multiplier plots are meaningful only after checking feasibility, active sets, and constraint regularity, and they lead into the question of whether such multipliers converge to continuous-time adjoints.
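These checks can be automated for small problems. The Python sketch below evaluates the first-order residuals for a scalar problem $\min F(z)$ subject to $g(z)\le 0$, using the sign convention $F'(z)+\mu g'(z)=0$ that matches the degenerate example in the text. The function name and the dictionary layout are assumptions made here.

```python
def kkt_residuals(grad_f, g, grad_g, z, mu):
    """Residuals for min F(z) s.t. g(z) <= 0 with scalar z and multiplier mu."""
    return {
        "stationarity": grad_f(z) + mu * grad_g(z),
        "primal": max(g(z), 0.0),          # constraint violation
        "dual": max(-mu, 0.0),             # violation of mu >= 0
        "complementarity": mu * g(z),
    }
```

For $F(z)=(z-2)^2$ with $g(z)=z-1$, the point $z=1$ with $\mu=2$ gives zero residuals. For $F(z)=z$ with $g(z)=z^2$ at $z=0$, the stationarity residual equals $1$ for every $\mu$, which is the failure of the constraint qualification seen numerically.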
The next question is whether the discrete problem is faithful to the continuous problem. A method can satisfy the finite-dimensional KKT conditions and still converge to the wrong limiting trajectory if the transcription is inconsistent.
[quotetheorem:7644]
The consistency statement separates three issues that are sometimes conflated: the local defect on a single interval, the accumulated residual over the mesh, and convergence of locally optimal discrete solutions. Smoothness is needed because a high-order collocation formula cannot achieve order $p$ across a kinked control or a discontinuous switching law; a bang-bang minimum with an unresolved switching time gives only low-order residual decay until the mesh captures the switch. Isolation and coercivity prevent the discrete optimiser from drifting along a flat family of continuous optima, while LICQ and strict complementarity keep multipliers and active sets stable under perturbation. The theorem does not assert global optimality of the discrete solution, nor does it cover mesh-independent convergence through state-constraint junctions or abnormal extremals. Its role in the chapter is to justify collocation as a faithful local approximation before we use the same discrete KKT system to interpret multipliers as costates.
[example: Energy-Optimal Robot Arm Motion]
For a two-link robot arm with configuration $q\in\mathbb R^2$, velocity $v=\dot q\in\mathbb R^2$, and torque input $u\in\mathbb R^2$, suppose the mass matrix $M(q)$ is invertible. The second-order dynamics
\begin{align*}
M(q)\ddot{q}+C(q,\dot{q})\dot{q}+G(q)=u
\end{align*}
can be solved for the acceleration by subtracting the Coriolis and gravity terms:
\begin{align*}
M(q)\ddot q=u-C(q,\dot q)\dot q-G(q).
\end{align*}
Multiplying by $M(q)^{-1}$ gives
\begin{align*}
\ddot q=M(q)^{-1}\bigl(u-C(q,\dot q)\dot q-G(q)\bigr).
\end{align*}
With $x=(q,v)$ and $v=\dot q$, the corresponding first-order control system is
\begin{align*}
\dot x=\bigl(\dot q,\dot v\bigr)=\bigl(v,M(q)^{-1}(u-C(q,v)v-G(q))\bigr).
\end{align*}
A collocation transcription with grid values $(q_k,v_k,u_k)$ and fixed endpoint configurations imposes endpoint constraints such as
\begin{align*}
q_0=q_{\mathrm{init}}.
\end{align*}
\begin{align*}
q_N=q_{\mathrm{term}}.
\end{align*}
The energy objective is discretised by a quadrature rule; with a rectangular rule, piecewise constant torques give
\begin{align*}
\int_0^T |u(t)|^2\,dt\approx \sum_{k=0}^{N-1}h_k |u_k|^2.
\end{align*}
Thus the nonlinear programme has decision variables consisting of the grid values of $q$, $v$, and $u$, objective $\sum_{k=0}^{N-1}h_k|u_k|^2$, endpoint constraints on $q_0$ and $q_N$, and collocation defect constraints enforcing
\begin{align*}
\dot q=v
\end{align*}
and
\begin{align*}
\dot v=M(q)^{-1}\bigl(u-C(q,v)v-G(q)\bigr)
\end{align*}
at the chosen collocation points.
The sparsity is local in time: the defect on an interval $[t_k,t_{k+1}]$ involves only variables attached to that interval and its endpoints, rather than all grid variables at once. The energy term penalises large torque values through $|u_k|^2$, while the polynomial control parametrisation and collocation defects constrain what can happen between neighbouring grid points. This is why the collocation solution tends to suppress unresolved high-frequency torque oscillations that a coarse shooting parametrisation may fail to represent accurately.
[/example]
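The first-order form and the rectangular-rule objective are the two functions a transcription code must evaluate. The Python sketch below is a minimal illustration in which the mass matrix, Coriolis matrix, and gravity vector are supplied by the caller as functions; no particular arm model is assumed, and all names are hypothetical.

```python
def solve_2x2(mat, rhs):
    """Solve mat @ x = rhs for a 2x2 matrix by Cramer's rule."""
    (a, b), (c, d) = mat
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("mass matrix must be invertible")
    return ((d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)


def arm_rhs(q, v, u, mass, coriolis, gravity):
    """Return (q_dot, v_dot) for M(q) v_dot + C(q, v) v + G(q) = u."""
    c_mat = coriolis(q, v)
    c_times_v = (c_mat[0][0] * v[0] + c_mat[0][1] * v[1],
                 c_mat[1][0] * v[0] + c_mat[1][1] * v[1])
    g_vec = gravity(q)
    rhs = tuple(u[i] - c_times_v[i] - g_vec[i] for i in range(2))
    return v, solve_2x2(mass(q), rhs)


def energy_objective(torques, steps):
    """Rectangular-rule approximation of the integral of |u|^2."""
    return sum(h * (u[0] ** 2 + u[1] ** 2) for u, h in zip(torques, steps))
```

Solving the linear system at each evaluation, instead of forming $M(q)^{-1}$ explicitly, mirrors common practice for larger arms, where an explicit inverse is usually avoided.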
## Indirect Methods and Costate Boundary-Value Problems
Direct methods discretise first and optimise second. Indirect methods reverse the order: derive first-order necessary conditions in continuous time, then solve the resulting boundary-value problem. The computational question is whether the Hamiltonian equations and their boundary conditions can be solved reliably for the desired extremal.
For the problem with objective
\begin{align*}
\Phi(x(T)) + \int_0^T L(x(t),u(t))\,dt
\end{align*}
and dynamics $\dot{x}(t)=f(x(t),u(t))$, $x(0)=x_0$, the Hamiltonian is
\begin{align*}
H(x,p,u)=p\cdot f(x,u)+L(x,u),
\end{align*}
where $p(t)\in\mathbb R^n$ is the costate. Pontryagin's maximum principle gives a coupled system for $x$ and $p$, together with an optimality condition in $u$.
[definition: Costate Boundary-Value Problem]
Let $f:\mathbb R^n\times\mathbb R^m\to\mathbb R^n$, $L:\mathbb R^n\times\mathbb R^m\to\mathbb R$, and $\Phi:\mathbb R^n\to\mathbb R$ be continuously differentiable, and define $H:\mathbb R^n\times\mathbb R^n\times\mathbb R^m\to\mathbb R$ by
\begin{align*}
H(x,p,u)=p\cdot f(x,u)+L(x,u).
\end{align*}
For a fixed horizon $T>0$, a costate boundary-value problem is the problem of finding functions
\begin{align*}
x\in C^1([0,T];\mathbb R^n),\qquad p\in C^1([0,T];\mathbb R^n),\qquad u:[0,T]\to\mathbb R^m
\end{align*}
satisfying the state equation, the costate equation
\begin{align*}
\dot{p}(t)=-\nabla_x H(x(t),p(t),u(t)),
\end{align*}
the pointwise stationarity or minimisation condition for $u(t)$, the initial condition $x(0)=x_0$, and the terminal transversality condition, for example $p(T)=\nabla\Phi(x(T))$ when the terminal state is free and there are no terminal constraints.
[/definition]
The state condition is imposed at the initial time, while the costate condition is usually imposed at the terminal time. This two-point structure is the main numerical difficulty: the missing initial costate must be chosen so that the terminal condition is satisfied.
[example: Shooting the Costate for a Scalar Regulator]
For a scalar nonlinear regulator with $\dot{x}=f(x)+u$, running cost $x^2+\rho u^2$ with $\rho>0$, and terminal cost $\Phi(x(T))$, the Hamiltonian is
\begin{align*}
H(x,p,u)=p(f(x)+u)+x^2+\rho u^2.
\end{align*}
The stationarity condition is $\partial H/\partial u=0$. Differentiating the displayed Hamiltonian with respect to $u$ gives
\begin{align*}
\frac{\partial H}{\partial u}(x,p,u)=p+2\rho u.
\end{align*}
Thus
\begin{align*}
p+2\rho u=0.
\end{align*}
Solving for $u$ gives
\begin{align*}
u=-\frac{p}{2\rho}.
\end{align*}
Substituting this expression for $u$ into the state equation gives
\begin{align*}
\dot{x}=f(x)-\frac{p}{2\rho}.
\end{align*}
The costate equation is $\dot p=-\partial H/\partial x$. Since
\begin{align*}
\frac{\partial H}{\partial x}(x,p,u)=p f'(x)+2x,
\end{align*}
we obtain
\begin{align*}
\dot p=-p f'(x)-2x.
\end{align*}
Therefore the indirect formulation is the two-dimensional boundary-value problem
\begin{align*}
\dot{x}=f(x)-\frac{p}{2\rho},\qquad \dot p=-p f'(x)-2x,
\end{align*}
with $x(0)=x_0$ and $p(T)=\Phi'(x(T))$.
An indirect shooting method chooses a trial value $\alpha$ for the missing initial costate $p(0)$, solves the initial-value problem
\begin{align*}
x(0)=x_0,\qquad p(0)=\alpha,
\end{align*}
and computes the terminal mismatch
\begin{align*}
S(\alpha)=p_\alpha(T)-\Phi'(x_\alpha(T)).
\end{align*}
The correct initial costate is a root of $S(\alpha)=0$, so the shooting method adjusts $\alpha$ until the forward integration satisfies the terminal transversality condition.
[/example]
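The shooting iteration of the example can be sketched in a few lines. The Python code below is an illustration under assumptions made here: a fixed-step fourth-order Runge-Kutta integrator, bisection as the root finder, and a caller-supplied bracket on which $S(\alpha)$ changes sign. The function names are hypothetical, and the dynamics $f$, its derivative, and $\Phi'$ are passed in as functions.

```python
def integrate_extremal(alpha, x0, f, f_prime, rho, horizon, n_steps=200):
    """RK4 for x' = f(x) - p/(2 rho), p' = -p f'(x) - 2 x, with p(0) = alpha."""
    def rhs(x, p):
        return f(x) - p / (2.0 * rho), -p * f_prime(x) - 2.0 * x

    h = horizon / n_steps
    x, p = x0, alpha
    for _ in range(n_steps):
        k1 = rhs(x, p)
        k2 = rhs(x + 0.5 * h * k1[0], p + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], p + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], p + h * k3[1])
        x += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        p += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x, p


def shoot_costate(x0, f, f_prime, phi_prime, rho, horizon, lo, hi, tol=1e-10):
    """Bisection on S(alpha) = p(T) - Phi'(x(T)); needs a sign change on [lo, hi]."""
    def mismatch(alpha):
        x_t, p_t = integrate_extremal(alpha, x0, f, f_prime, rho, horizon)
        return p_t - phi_prime(x_t)

    s_lo, s_hi = mismatch(lo), mismatch(hi)
    if s_lo * s_hi > 0.0:
        raise ValueError("S(alpha) does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        s_mid = mismatch(mid)
        if s_lo * s_mid <= 0.0:
            hi = mid
        else:
            lo, s_lo = mid, s_mid
    return 0.5 * (lo + hi)
```

Bisection is slow but does not need derivatives of $S$. Newton-type updates are the usual choice once a good initial guess is available, at the price of the fragility discussed in the text.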
Indirect shooting is accurate near a known extremal, but it can be fragile when there are state constraints, control bounds, switching surfaces, or conjugate points. Since direct methods also produce multiplier sequences, it is natural to ask whether those finite-dimensional multipliers are approximations to the costates computed by an indirect method.
[quotetheorem:7645]
[citeproof:7645]
This result explains a common diagnostic in numerical optimal control: a direct collocation solution can be inspected by plotting its multipliers. Normality is needed because abnormal extremals may have a vanishing cost multiplier, so the link between dynamic multipliers and the costate can be nonunique or badly scaled. The unconstrained and smooth hypotheses exclude active path constraints and bang-bang controls; at such points the continuous adjoint can jump or the stationarity condition can become a variational inequality, and a naive multiplier plot may show genuine nonsmooth structure rather than mesh error. Uniform invertibility of the discrete KKT system rules out numerical degeneracy: if the active set changes rapidly or the multiplier is nonunique, nearby meshes can produce different multiplier profiles. With those limitations in mind, smooth scaled multipliers support the interpretation that the discretisation is resolving the same extremal that an indirect PMP computation would target.
## Comparing PMP, HJB, and MPC on the Same System
The final conceptual problem is to understand what each optimal-control viewpoint gives in practice. Pontryagin's maximum principle describes open-loop extremals, Hamilton-Jacobi-Bellman theory describes the value function and optimal feedback, and model predictive control computes a feedback law by repeatedly solving finite-horizon problems. Comparing them on a single nonlinear system shows that they are not competing theories so much as different computational projections of the same optimisation principle.
Consider the inverted pendulum swing-up problem with torque bound $|u|\le u_{\max}$ and cost
\begin{align*}
J[u]=\Phi(x(T))+\int_0^T \bigl(q_1(1-\cos\theta(t))+q_2\omega(t)^2+r u(t)^2\bigr)\,dt.
\end{align*}
The PMP approach seeks a trajectory, costate, and input satisfying the Hamiltonian equations and the pointwise minimisation of $H$. The HJB approach seeks a value function $V(t,x)$ satisfying
\begin{align*}
-\partial_tV(t,x)=\min_{|u|\le u_{\max}}\{\nabla V(t,x)\cdot f(x,u)+q_1(1-\cos\theta)+q_2\omega^2+ru^2\},
\end{align*}
with terminal condition $V(T,x)=\Phi(x)$. The MPC approach solves a shorter finite-horizon version at each sampling time, applies the first part of the computed input, then re-solves after measuring or estimating the new state.
[definition: Receding-Horizon Control Law]
Let $X\subseteq\mathbb R^n$ be the admissible state set, $U\subseteq\mathbb R^m$ the admissible control set, $\Delta>0$ the sampling time, and $T_H>0$ the prediction horizon. A receding-horizon control law is the feedback map
\begin{align*}
\kappa_{\mathrm{MPC}}:X\longrightarrow U
\end{align*}
defined by solving, at each sampling time $t_k=k\Delta$, a finite-horizon optimal control problem on $[t_k,t_k+T_H]$ with initial state $x(t_k)\in X$, selecting an optimal input $u_{k}^{*}:[t_k,t_k+T_H]\to U$, and setting
\begin{align*}
\kappa_{\mathrm{MPC}}(x(t_k))=u_k^*(t_k)
\end{align*}
or applying the first sampled segment of $u_k^*$ on $[t_k,t_{k+1})$.
[/definition]
Receding-horizon control is computational rather than purely analytic. Its strength is that constraints and nonlinear dynamics can be included directly in the online optimisation problem, while its weakness is that stability and feasibility require design conditions beyond pointwise numerical optimisation.
[example: Nonlinear Vehicle Path Tracking]
For the kinematic vehicle, take state $x=(p_1,p_2,\theta)$ and input $a=(v,\omega)$, with dynamics
\begin{align*}
\dot p_1=v\cos\theta,\qquad \dot p_2=v\sin\theta,\qquad \dot\theta=\omega.
\end{align*}
Suppose a reference path is sampled as $(p_{1,k}^{\mathrm{ref}},p_{2,k}^{\mathrm{ref}},\theta_k^{\mathrm{ref}})$ over a prediction horizon. At grid point $k$, the squared position error is
\begin{align*}
|e_{p,k}|^2=(p_{1,k}-p_{1,k}^{\mathrm{ref}})^2+(p_{2,k}-p_{2,k}^{\mathrm{ref}})^2.
\end{align*}
The heading error is
\begin{align*}
e_{\theta,k}=\theta_k-\theta_k^{\mathrm{ref}},
\end{align*}
so its quadratic penalty is
\begin{align*}
e_{\theta,k}^2=(\theta_k-\theta_k^{\mathrm{ref}})^2.
\end{align*}
With weights $q_p,q_\theta,r_v,r_\omega>0$, a typical finite-horizon MPC objective is
\begin{align*}
\sum_{k=0}^{N-1} h_k\bigl(q_p|e_{p,k}|^2+q_\theta e_{\theta,k}^2+r_v v_k^2+r_\omega\omega_k^2\bigr).
\end{align*}
Using an explicit Euler transcription, each derivative is evaluated at $(p_{1,k},p_{2,k},\theta_k,v_k,\omega_k)$, giving
\begin{align*}
p_{1,k+1}=p_{1,k}+h_k v_k\cos\theta_k.
\end{align*}
\begin{align*}
p_{2,k+1}=p_{2,k}+h_k v_k\sin\theta_k.
\end{align*}
\begin{align*}
\theta_{k+1}=\theta_k+h_k\omega_k.
\end{align*}
Road boundaries can be encoded by inequalities such as
\begin{align*}
p_{2,\min}\le p_{2,k}\le p_{2,\max},
\end{align*}
and actuator limits by
\begin{align*}
v_{\min}\le v_k\le v_{\max},\qquad |\omega_k|\le \omega_{\max}.
\end{align*}
At sampling time $t_j$, MPC solves this finite-dimensional problem with initial constraint $x_0=x(t_j)$, applies the first computed input $(v_0,\omega_0)$, then shifts the horizon and solves again from the next measured state. Thus the controller avoids gridding the full three-dimensional state space as an HJB computation would, while also avoiding the fragility of a single open-loop PMP trajectory: when a disturbance changes $(p_1,p_2,\theta)$, the next optimisation uses the changed state as its new initial condition.
[/example]
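A single MPC solve for this example can be imitated with a deliberately crude optimiser. The Python sketch below is illustrative only and rests on assumptions made here: the speed is fixed, the turn rate is held constant over the horizon, the optimiser is an enumeration over a handful of candidate turn rates, and the constraints are omitted. The weights and function names are hypothetical. A practical implementation would hand the same Euler transcription to a nonlinear programming solver.

```python
import math


def euler_step(state, v, omega, h):
    """One explicit Euler step of the kinematic vehicle."""
    p1, p2, theta = state
    return (p1 + h * v * math.cos(theta),
            p2 + h * v * math.sin(theta),
            theta + h * omega)


def horizon_cost(state, v, omega, reference, h,
                 q_p=10.0, q_theta=0.1, r_v=0.01, r_omega=0.01):
    """Finite-horizon objective for inputs held constant over the horizon."""
    cost = 0.0
    for p1_ref, p2_ref, theta_ref in reference:
        p1, p2, theta = state
        position_error_sq = (p1 - p1_ref) ** 2 + (p2 - p2_ref) ** 2
        heading_error = theta - theta_ref
        cost += h * (q_p * position_error_sq + q_theta * heading_error ** 2
                     + r_v * v ** 2 + r_omega * omega ** 2)
        state = euler_step(state, v, omega, h)
    return cost


def mpc_turn_rate(state, reference, h, v=1.0,
                  omega_candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Return the candidate turn rate with the smallest predicted cost."""
    return min(omega_candidates,
               key=lambda w: horizon_cost(state, v, w, reference, h))
```

In closed loop, the returned turn rate would be applied for one sampling interval, the reference window shifted, and the selection repeated from the newly measured state.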
The comparison can be summarised by the object each method primarily computes. PMP computes necessary conditions along candidate optimal trajectories. HJB computes a value function whose gradient supplies feedback when the value function is smooth. MPC computes a sequence of finite-dimensional optimisation problems whose first controls define a feedback implementation.
[remark: When the Three Methods Agree]
If the value function is smooth near an optimal trajectory, then the HJB feedback satisfies the PMP stationarity condition with $p(t)=\nabla V(t,x(t))$. A direct transcription of the same problem has KKT multipliers approximating this costate. A stabilising MPC implementation with a long enough horizon often shadows the same optimal feedback locally, although its law is defined by repeated finite-horizon solves rather than by an explicit formula for $V$.
[/remark]
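As a one-dimensional check of this agreement, consider the illustrative problem (chosen here for simplicity, not one of the course case studies) of minimising $\int_0^\infty (x^2+u^2)\,dt$ subject to $\dot x=u$. Assume the minimisation convention $H(x,u,p)=x^2+u^2+pu$; the sign convention used in the PMP chapter may differ. The stationary HJB equation reads
\begin{align*}
0=\min_{u}\bigl(x^2+u^2+V'(x)u\bigr)=x^2-\tfrac14 V'(x)^2,
\end{align*}
and its nonnegative solution with $V(0)=0$ is $V(x)=x^2$, with minimiser $u=-\tfrac12 V'(x)=-x$. On the PMP side, stationarity gives
\begin{align*}
\frac{\partial H}{\partial u}=2u+p=0,
\end{align*}
so setting $p=V'(x)=2x$ recovers the same feedback $u=-x$. The costate equation is consistent as well: along $\dot x=-x$ we have $\dot p=2\dot x=-2x=-\partial H/\partial x$.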
The agreement of PMP, HJB, and MPC near a trajectory still leaves a long-horizon design question: where does an optimiser spend most of its time when the horizon becomes large? To answer this, we first isolate the static operating point that minimises the running cost while satisfying the steady dynamics.
[definition: Optimal Steady State]
Let $f:\mathbb R^n\times\mathbb R^m\to\mathbb R^n$, $L:\mathbb R^n\times\mathbb R^m\to\mathbb R$, and let $X\subseteq\mathbb R^n$, $U\subseteq\mathbb R^m$ be pointwise state and control constraint sets. An optimal steady state is an element $(\bar{x},\bar{u})\in X\times U$ solving the static optimisation problem
\begin{align*}
\min_{(x,u)\in X\times U} L(x,u)\qquad\text{subject to}\qquad f(x,u)=0.
\end{align*}
[/definition]
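The static problem in this definition can be illustrated on a scalar toy system. The sketch below assumes the hypothetical dynamics $f(x,u)=-x+u$ and running cost $L(x,u)=(x-1)^2+u^2$, neither of which comes from the notes, and it uses a brute-force grid over $u$ rather than a proper nonlinear programming solver. The helper `steady_x` returns the state solving $f(x,u)=0$ for a given input, which is available in closed form only because the toy model is linear.

```python
def optimal_steady_state(steady_x, cost, u_grid, x_bounds):
    """Grid search over inputs; each input is paired with its steady state."""
    best = None
    for u in u_grid:
        x = steady_x(u)  # state satisfying f(x, u) = 0
        if not (x_bounds[0] <= x <= x_bounds[1]):
            continue     # pointwise state constraint X violated
        c = cost(x, u)
        if best is None or c < best[2]:
            best = (x, u, c)
    return best

# Assumed toy model: f(x, u) = -x + u, so the steady state is x = u.
steady_x = lambda u: u
cost = lambda x, u: (x - 1.0) ** 2 + u ** 2
u_grid = [i / 1000 for i in range(-2000, 2001)]  # U = [-2, 2]

unconstrained = optimal_steady_state(steady_x, cost, u_grid, (-2.0, 2.0))
constrained = optimal_steady_state(steady_x, cost, u_grid, (-2.0, 0.4))
```

Without an active state bound the minimiser sits at the interior point where $(x-1)^2+x^2$ is smallest; tightening $X$ to exclude that point moves the optimal steady state onto the boundary of $X$.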
This steady state is a static optimisation object, but it predicts the middle portion of many long dynamic optimisers. The next theorem states the turnpike phenomenon: under structural hypotheses, an optimal trajectory may have complicated endpoint arcs but spends only a bounded amount of time away from the optimal steady state.
[quotetheorem:7646]
The theorem is a structural result about long horizons, not a claim that every optimal-control problem has a turnpike. Uniqueness of the steady state is needed because if two distinct steady states have the same static running cost, an optimiser can spend macroscopic time near either one and no single $\bar{x}$ controls the middle arc. The coercive excess-cost bound is what turns time away from $\bar{x}$ into a quantitative estimate; without it, a flat valley of nearly optimal states can allow long excursions at negligible cost. Endpoint reachability supplies a competitor that travels to the steady state, waits, and travels out with cost bounded independently of $T$; if the endpoints cannot be connected to the steady state, the conclusion can fail even for a favourable running cost. The result does not specify the shape of the entry and exit arcs or prove feedback stability, but it motivates terminal ingredients and steady-state targets in long-horizon MPC.
[example: Turnpike Behaviour in Energy-Optimal Motion]
For a two-link robot arm with dynamics $M(q)\ddot q+C(q,\dot q)\dot q+G(q)=u$, a steady holding motion has $v=\dot q=0$ and $\ddot q=0$. Substituting these values into the dynamics gives
\begin{align*}
M(q)\cdot 0+C(q,0)\cdot 0+G(q)=u.
\end{align*}
Thus a posture $\bar q$ is held by the constant torque $\bar u=G(\bar q)$. If the running energy is $L(q,v,u)=|u|^2$, the static holding cost at $\bar q$ is
\begin{align*}
L(\bar q,0,\bar u)=|G(\bar q)|^2.
\end{align*}
Assume $\bar q$ is the posture that minimises the holding cost $|G(q)|^2$, and that there are admissible entry and exit motions: one moves from the initial configuration to $\bar q$ in time $\tau_{\mathrm{in}}$, and one moves from $\bar q$ to the terminal configuration in time $\tau_{\mathrm{out}}$. Let the total excess energy of these two endpoint arcs over the holding cost be bounded by $A$, independent of the horizon $T$. For $T\ge\tau_{\mathrm{in}}+\tau_{\mathrm{out}}$, the competitor that enters $\bar q$, holds it, and exits has cost
\begin{align*}
J_{\mathrm{comp}}(T)\le T|G(\bar q)|^2+A.
\end{align*}
Since the optimal trajectory has cost no larger than this competitor,
\begin{align*}
J^*(T)\le T|G(\bar q)|^2+A.
\end{align*}
Now suppose the running cost is separated from the holding cost away from $\bar q$: for every fixed $\varepsilon>0$, there is $\alpha_\varepsilon>0$ such that
\begin{align*}
L(q,v,u)-|G(\bar q)|^2\ge \alpha_\varepsilon
\end{align*}
whenever $|q-\bar q|>\varepsilon$. If
\begin{align*}
E_\varepsilon=\{t\in[0,T]:|q^*(t)-\bar q|>\varepsilon\},
\end{align*}
then integrating the previous inequality over $E_\varepsilon$ gives
\begin{align*}
\int_{E_\varepsilon}\bigl(L(q^*(t),v^*(t),u^*(t))-|G(\bar q)|^2\bigr)\,dt\ge \alpha_\varepsilon\mathcal L^1(E_\varepsilon).
\end{align*}
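Since $J^*(T)\le T|G(\bar q)|^2+A$, the total excess cost of the optimal trajectory over $[0,T]$ is at most $A$. Assuming in addition that the excess $L-|G(\bar q)|^2$ is nonnegative on the complement of $E_\varepsilon$, the integral over $E_\varepsilon$ is bounded by the same constant, so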
\begin{align*}
\alpha_\varepsilon\mathcal L^1(E_\varepsilon)\le A.
\end{align*}
Therefore
\begin{align*}
\mathcal L^1(E_\varepsilon)\le \frac{A}{\alpha_\varepsilon}.
\end{align*}
The bound does not grow with $T$, so increasing the horizon mostly adds time near the steady posture $\bar q$ rather than lengthening the entry and exit arcs. This is the turnpike mechanism behind terminal steady-state ingredients in long-horizon MPC.
[/example]
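The horizon-independent bound can be observed numerically on a much simpler problem than the robot arm. The sketch below is an illustration under assumptions, not the arm model: it uses the hypothetical scalar system $x_{k+1}=x_k+hu_k$ with stage cost $h\bigl((x_k-x_s)^2+u_k^2\bigr)$ and a quadratic terminal penalty $m(x_N-x_T)^2$, solved exactly by a backward dynamic-programming recursion for a value function of the form $P_kx^2-2s_kx+\text{const}$. For this cost the separation hypothesis holds with $\alpha_\varepsilon=\varepsilon^2$, and the excess cost is nonnegative everywhere.

```python
def time_away_from_turnpike(T, h=0.05, x_s=1.0, x0=0.0, x_T=0.0,
                            m=1000.0, eps=0.05):
    """Total time the optimal trajectory spends with |x - x_s| > eps."""
    N = round(T / h)
    # Terminal value m (x - x_T)^2 = P x^2 - 2 s x + const.
    P, s = m, m * x_T
    gains = []
    for _ in range(N):
        d = 1.0 + h * P
        # Optimal input at this stage: u = s/d - (P/d) x.
        gains.append((P / d, s / d))
        # Backward update of the quadratic and linear coefficients.
        P, s = h + P / d, h * x_s + s / d
    gains.reverse()  # gains[k] now belongs to stage k

    x, steps_away = x0, 0
    for K, c in gains:
        if abs(x - x_s) > eps:
            steps_away += 1
        x = x + h * (c - K * x)
    return steps_away * h

away = {T: time_away_from_turnpike(T) for T in (10.0, 20.0, 40.0)}
```

Comparing the entries of `away` shows the pattern described above: doubling the horizon leaves the time spent away from $x_s$ essentially unchanged, so the extra time is spent near the steady state.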
## Numerical Practice and Diagnostics
A numerical optimal-control computation is not complete when the solver returns a feasible point. The practitioner must check whether the mesh resolves the trajectory, whether the active constraints make sense, whether multipliers are well scaled, and whether the computed policy behaves robustly under simulation. These diagnostics are where theory, optimisation, and modelling assumptions meet.
The first diagnostic is mesh refinement. A direct solution should be recomputed on finer meshes, or with adaptive refinement, to check that the state, control, objective value, and active set stabilise. For collocation, large defect residuals indicate intervals where the polynomial approximation is not resolving the dynamics.
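A refinement check of this kind can be sketched generically: recompute on successively halved meshes and watch whether the objective stabilises. The example below is a minimal sketch in which the "solver" is only a stand-in, namely an explicit Euler evaluation of $\int_0^1 x^2\,dt$ along $\dot x=-x$ with $x(0)=1$; in practice `objective_on_mesh` would wrap a call to the actual transcription. All function names are hypothetical.

```python
import math

def euler_objective(n, T=1.0, x0=1.0):
    """Stand-in for a transcription: Euler rollout of x' = -x with cost x^2."""
    h = T / n
    x, J = x0, 0.0
    for _ in range(n):
        J += h * x * x   # left-endpoint quadrature of the running cost
        x += h * (-x)    # explicit Euler step
    return J

def refinement_table(objective_on_mesh, n0=10, levels=6):
    """Objective values on meshes n0, 2*n0, ... and successive changes."""
    ns = [n0 * 2 ** i for i in range(levels)]
    values = [objective_on_mesh(n) for n in ns]
    changes = [abs(b - a) for a, b in zip(values, values[1:])]
    return ns, values, changes

def has_stabilised(changes, tol):
    """Accept the mesh when the most recent refinement changed little."""
    return changes[-1] < tol

ns, values, changes = refinement_table(euler_objective)
exact = (1.0 - math.exp(-2.0)) / 2.0  # closed-form value for this toy problem
```

For a first-order scheme the entries of `changes` should shrink by roughly a factor of two per halving; a sequence that stalls or grows suggests the mesh is not yet resolving the trajectory. The same table can be built for states, controls, or the active set.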
The second diagnostic is closed-loop simulation. An open-loop swing-up control from PMP or direct collocation may fail under model error or disturbances, while an MPC policy can correct deviations by re-solving from the measured state. Conversely, an MPC implementation may be computationally expensive or may lose feasibility without a terminal set and terminal cost.
[example: Comparing Open-Loop and Receding-Horizon Pendulum Control]
Suppose a direct collocation solver is applied to the pendulum swing-up problem from the measured initial state $x_0=(\theta_0,\omega_0)$, with $\theta$ measured from the upright position, and returns a grid torque sequence $(u_0^*,\dots,u_{N-1}^*)$. If this sequence is applied open-loop in an explicit Euler simulation, the nominal trajectory satisfies
\begin{align*}
\theta_{k+1}^*&=\theta_k^*+h_k\omega_k^*,\\
\omega_{k+1}^*&=\omega_k^*+h_k\left(\frac{g}{\ell}\sin\theta_k^*+\frac{1}{I}u_k^*\right).
\end{align*}
The terminal penalty in the transcription is small precisely when $\theta_N^*$ and $\omega_N^*$ are close to $0$, so a solution with small terminal penalty brings the nominal simulation close to the upright rest state.
Now perturb only the initial angle, so the actual initial state is $\tilde\theta_0=\theta_0+\delta$ and $\tilde\omega_0=\omega_0$, while the same stored torque $u_k^*$ is still applied. The actual first step is
\begin{align*}
\tilde\theta_1=\theta_0+\delta+h_0\omega_0.
\end{align*}
Since $\theta_1^*=\theta_0+h_0\omega_0$, subtracting gives
\begin{align*}
\tilde\theta_1-\theta_1^*=\delta.
\end{align*}
For the angular velocity,
\begin{align*}
\tilde\omega_1=\omega_0+h_0\left(\frac{g}{\ell}\sin(\theta_0+\delta)+\frac{1}{I}u_0^*\right).
\end{align*}
Subtracting the nominal update
\begin{align*}
\omega_1^*=\omega_0+h_0\left(\frac{g}{\ell}\sin\theta_0+\frac{1}{I}u_0^*\right)
\end{align*}
gives
\begin{align*}
\tilde\omega_1-\omega_1^*=h_0\frac{g}{\ell}\bigl(\sin(\theta_0+\delta)-\sin\theta_0\bigr).
\end{align*}
Using the identity $\sin(a+b)=\sin a\cos b+\cos a\sin b$, this becomes
\begin{align*}
\tilde\omega_1-\omega_1^*=h_0\frac{g}{\ell}\bigl(\sin\theta_0(\cos\delta-1)+\cos\theta_0\sin\delta\bigr).
\end{align*}
Thus even though the torque sequence is unchanged, the perturbed state produces a different angular velocity after one step, and the next angle update then carries that velocity error forward.
A receding-horizon controller does not keep using the original sequence. At sampling time $t_j$, it measures $\tilde x_j=(\tilde\theta_j,\tilde\omega_j)$, solves a new finite-horizon collocation problem with initial constraint $x_0=\tilde x_j$, and applies only the first torque of the new solution. If the prediction horizon is long enough to include both energy pumping and braking, the recomputed torque can choose $u_k\omega_k>0$ while energy must be added and $u_k\omega_k<0$ when braking is needed near the upright state. The difference is therefore structural: open-loop control propagates the initial perturbation through a fixed torque sequence, while receding-horizon control repeatedly changes the sequence to match the currently measured pendulum state.
[/example]
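The one-step error formulas in the example above are easy to confirm numerically. The sketch below applies the same stored torque sequence to a nominal and a perturbed initial state using the explicit Euler update from the example. The parameter values, the torque sequence, and the function names are arbitrary assumptions chosen for illustration.

```python
import math

def euler_pendulum_step(theta, omega, u, h, g=9.81, ell=1.0, inertia=1.0):
    """Explicit Euler update with theta measured from upright."""
    return (theta + h * omega,
            omega + h * ((g / ell) * math.sin(theta) + u / inertia))

def open_loop_deviation(theta0, omega0, delta, torques, h):
    """Deviation of the perturbed run from the nominal run at each step."""
    nominal = (theta0, omega0)
    actual = (theta0 + delta, omega0)  # only the initial angle is perturbed
    deviations = []
    for u in torques:                  # same stored torque for both runs
        nominal = euler_pendulum_step(nominal[0], nominal[1], u, h)
        actual = euler_pendulum_step(actual[0], actual[1], u, h)
        deviations.append((actual[0] - nominal[0], actual[1] - nominal[1]))
    return deviations

def predicted_velocity_error(theta0, delta, h, g=9.81, ell=1.0):
    """First-step velocity error from the angle-addition identity."""
    return h * (g / ell) * (math.sin(theta0) * (math.cos(delta) - 1.0)
                            + math.cos(theta0) * math.sin(delta))

theta0, omega0, delta, h = math.pi, 0.0, 0.1, 0.01
deviations = open_loop_deviation(theta0, omega0, delta, [0.5, -0.2, 0.3], h)
predicted = predicted_velocity_error(theta0, delta, h)
```

The first entry of `deviations` reproduces the angle error $\delta$ and the velocity error `predicted`; the second angle error then equals $\delta$ plus $h$ times the first velocity error, which is the carry-forward effect described in the example.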
The final diagnostic is interpretation of multipliers and switching structure. For bounded-input problems, multipliers and stationarity conditions can reveal bang-bang arcs, singular arcs, and active path constraints. These structures should agree with the qualitative predictions of PMP; disagreement often points to a coarse mesh, poor scaling, or an incorrect transcription.
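A first pass at reading the switching structure off a computed control can be automated. The helper below is a hedged sketch for a scalar bounded input: it labels each grid value as sitting on the upper bound, on the lower bound, or in the interior, and counts label changes. The tolerance and the function names are assumptions, and an interior label only flags a candidate singular arc or a mesh artefact; it does not verify either.

```python
def classify_arcs(controls, u_min, u_max, tol=1e-6):
    """Label each grid control as 'upper', 'lower', or 'interior'."""
    labels = []
    for u in controls:
        if u >= u_max - tol:
            labels.append("upper")
        elif u <= u_min + tol:
            labels.append("lower")
        else:
            labels.append("interior")
    return labels

def count_switches(labels):
    """Number of grid points at which the arc type changes."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

controls = [1.0, 1.0, 1.0, -1.0, -1.0, 0.2, 0.3, 1.0]
labels = classify_arcs(controls, u_min=-1.0, u_max=1.0)
switches = count_switches(labels)
```

Comparing `labels` against the sign pattern of the switching function predicted by PMP gives a quick consistency check; a mismatch that persists under mesh refinement deserves a closer look at scaling and at the transcription itself.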
[remark: Scaling and Solver Reliability]
State variables, controls, costs, and constraints should be scaled so that typical magnitudes are comparable. Bad scaling can make a mathematically sound transcription appear infeasible or cause the optimiser to stop at a point with large physical residuals. In nonlinear control computations, nondimensionalisation and careful choice of state coordinates are often as important as the optimisation algorithm.
[/remark]
By the end of the course, numerical methods should not be viewed as an afterthought to the theory. Direct transcription turns optimal control into sparse nonlinear programming; indirect methods expose the Hamiltonian geometry behind optimality; HJB identifies the feedback ideal; and MPC provides a practical way to approximate constrained nonlinear feedback. The strongest designs use these viewpoints together: theory predicts structure, transcription computes candidates, and closed-loop simulation tests whether the controller achieves the intended behaviour.
## Connections and Further Reading
These notes continue the path from [Control Theory I: Linear Systems](/page/Control%20Theory%20I%3A%20Linear%20Systems). Linear controllability, observability, Kalman filtering, Riccati equations, and linear-quadratic regulation remain the local models behind feedback linearisation, high-gain observers, EKF design, and quadratic MPC terminal ingredients.
The nonlinear-stability chapters connect most directly to [Cambridge II Dynamical Systems](/page/Cambridge%20II%20Dynamical%20Systems), Lyapunov methods, invariant-set arguments, and perturbation theory. The optimal-control chapters connect to [Calculus of Variations I: Classical Theory](/page/Calculus%20of%20Variations%20I%3A%20Classical%20Theory), [Convex Optimisation I: Theory](/page/Convex%20Optimisation%20I%3A%20Theory), and [numerical analysis](/page/Numerical%20Analysis), because PMP, HJB, and direct transcription all turn trajectory design into variational or finite-dimensional optimization problems.
For readers moving further through Androma, natural next stops are viscosity-solution theory for nonsmooth HJB equations, constrained optimization and KKT theory for direct transcription, numerical ODE methods for collocation and shooting, and robust control for disturbance-aware feedback design. The recurring theme is that local differential certificates, global value functions, and repeated finite-horizon optimizations are complementary ways to build nonlinear feedback.
## References
- Hassan K. Khalil, *Nonlinear Systems*, third edition, Prentice Hall, 2002.
- Eduardo D. Sontag, *Mathematical Control Theory: Deterministic Finite Dimensional Systems*, second edition, Springer, 1998.
- Arthur E. Bryson and Yu-Chi Ho, *Applied Optimal Control*, Hemisphere, 1975.
- Dimitri P. Bertsekas, *Dynamic Programming and Optimal Control*, Athena Scientific, 2017.
- Frank L. Lewis, Draguna Vrabie, and Vassilis L. Syrmos, *Optimal Control*, third edition, Wiley, 2012.
- James B. Rawlings, David Q. Mayne, and Moritz M. Diehl, *Model Predictive Control: Theory, Computation, and Design*, second edition, Nob Hill Publishing, 2017.
Contents
- Introduction
- What Changes Beyond Linear Control?
- Feedback, Stability, and Lyapunov Functions
- Optimization As A Control Principle
- Computation, Constraints, and Model Predictive Control
- How The Course Is Organized
- 1. Nonlinear Control Systems and Equilibria
- Control-Affine Systems and Controlled Trajectories
- Equilibria, Invariant Sets, and Linearization
- Input-To-State Viewpoints and Forward Completeness
- 2. Lyapunov Stability Theory
- Stability Notions and Regions of Attraction
- Lyapunov Direct Method and Strict Lyapunov Functions
- LaSalle Invariance Principle and Barbalat Lemma
- 3. Lyapunov-Based Feedback Design
- From Lyapunov Certificates to Feedback Laws
- Sontag's Universal Formula for Affine Systems
- Recursive Backstepping for Strict-Feedback Systems
- Comparing CLF Design, Sontag Feedback, and Backstepping
- 4. Feedback Linearisation and Normal Forms
- Lie Derivatives and Input-Output Linearisation
- Full-State Linearisation and Diffeomorphisms
- Zero Dynamics and Minimum-Phase Behaviour
- Byrnes-Isidori Normal Form
- 5. Nonlinear Observers and Output Feedback
- Detectability Beyond Linear Systems
- Extended Kalman Filtering as Local Nonlinear Estimation
- Output-Feedback Stabilization and Separation Issues
- 6. Calculus of Variations and Optimal Control Problems
- From Performance Criteria to Standard Formulations
- Euler-Lagrange Equations as the Prototype Necessary Condition
- Endpoint Constraints and Transversality Conditions
- Normal and Abnormal Extremals
- Existence, Compactness, and Direct Methods
- 7. Pontryagin Maximum Principle
- Hamiltonians, Costates, and Switching Functions
- Fixed-Time, Free-Time, and Terminal-Constraint Versions
- Bang-Bang Control and Singular Arcs
- 8. Dynamic Programming and HJB Equations
- Value Functions and Bellman Optimality
- Derivation of the Hamilton-Jacobi-Bellman Equation
- Verification Theorems and Smooth Optimal Synthesis
- 9. Viscosity Solutions and Nonsmooth Value Functions
- Why Classical HJB Solutions Fail
- Viscosity Subsolutions And Supersolutions
- Comparison And Uniqueness
- Stability Under Approximation
- 10. Constrained Optimal Control and MPC
- Constraint Satisfaction and Viability
- Terminal Ingredients for Finite Horizons
- Finite-Horizon Receding-Horizon Control
- Recursive Feasibility
- Lyapunov Stability of MPC
- 11. Robustness and Disturbance-Aware Design
- Disturbance-to-State Estimates
- Robust Control Lyapunov Functions
- Tube MPC and Min-Max Design
- 12. Numerical Methods and Case Studies
- Direct Transcription of Optimal Control Problems
- Indirect Methods and Costate Boundary-Value Problems
- Comparing PMP, HJB, and MPC on the Same System
- Numerical Practice and Diagnostics
- Connections and Further Reading
- References
Control Theory II: Nonlinear and Optimal Control
Created by admin on 6/18/2026 | Last updated on 6/18/2026