[proofplan]
The proof rescales the observer error so that the high-gain terms become a fixed Hurwitz companion system multiplied by $\varepsilon^{-1}$. The triangular Lipschitz hypothesis is exactly what prevents the nonlinear perturbation from acquiring negative powers of $\varepsilon$ in the scaled coordinates. A quadratic Lyapunov function for the Hurwitz companion matrix gives a decay term of order $\varepsilon^{-1}$, and choosing $\varepsilon$ sufficiently small lets this term dominate the nonlinear perturbation, whose bound relative to the scaled error is uniform in $\varepsilon\in(0,1]$. Finally, the scaling map is inverted, which produces the factor $\varepsilon^{-(n-1)}$ in the original error coordinates.
[/proofplan]
[step:Write the error dynamics in original coordinates]
Define the error map $e:[0,T]\to\mathbb{R}^n$ by
\begin{align*}
e(t)=\hat{x}(t)-x(t).
\end{align*}
Thus $e_i(t)=\hat{x}_i(t)-x_i(t)$ for $1\le i\le n$. Since $x$ and $\hat{x}$ are absolutely continuous, $e$ is absolutely continuous. For $\mathcal{L}^1$-a.e. $t\in[0,T]$, subtracting the plant equations from the observer equations and using $y(t)=x_1(t)$ gives, for $1\le i\le n-1$,
\begin{align*}
\dot{e}_i(t)=e_{i+1}(t)+\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))-\frac{a_i}{\varepsilon^i}e_1(t),
\end{align*}
and
\begin{align*}
\dot{e}_n(t)=\phi_n(\hat{x}(t),u(t))-\phi_n(x(t),u(t))-\frac{a_n}{\varepsilon^n}e_1(t).
\end{align*}
These identities hold for $\mathcal{L}^1$-a.e. $t$ because both trajectories satisfy their differential equations for $\mathcal{L}^1$-a.e. $t$.
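As an illustration only, and not as part of the argument, consider the case $n=2$. With the observer structure used in the subtraction above, the error equations specialise to
\begin{align*}
\dot{e}_1(t)&=e_2(t)+\phi_1(\hat{x}(t),u(t))-\phi_1(x(t),u(t))-\frac{a_1}{\varepsilon}e_1(t),\\
\dot{e}_2(t)&=\phi_2(\hat{x}(t),u(t))-\phi_2(x(t),u(t))-\frac{a_2}{\varepsilon^2}e_1(t).
\end{align*}
The two injection gains $a_1/\varepsilon$ and $a_2/\varepsilon^2$ carry different powers of $\varepsilon$, which is the imbalance that a coordinate scaling is meant to remove.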
[/step]
[step:Scale the error so the linear part is a Hurwitz companion system]
Define the scaled error map $\eta:[0,T]\to\mathbb{R}^n$ by
\begin{align*}
\eta_i(t)=\frac{e_i(t)}{\varepsilon^{n-i}}, \qquad 1\le i\le n.
\end{align*}
Equivalently, define the diagonal [linear map](/page/Linear%20Map) $S_\varepsilon:\mathbb{R}^n\to\mathbb{R}^n$ by
\begin{align*}
S_\varepsilon(e_1,\dots,e_n)=\left(\frac{e_1}{\varepsilon^{n-1}},\frac{e_2}{\varepsilon^{n-2}},\dots,e_n\right).
\end{align*}
Then $\eta(t)=S_\varepsilon e(t)$. Since $e$ is absolutely continuous and $S_\varepsilon$ is linear, $\eta$ is absolutely continuous.
Define the matrix $A\in\mathbb{R}^{n\times n}$ by setting $A_{i1}=-a_i$ for $1\le i\le n$, $A_{i,i+1}=1$ for $1\le i\le n-1$, and $A_{ij}=0$ otherwise. Thus $A$ is the companion matrix with first column $(-a_1,\dots,-a_n)^\top$ and superdiagonal entries equal to $1$.
The characteristic polynomial of $A$ is
\begin{align*}
\det(sI-A)=s^n+a_1s^{n-1}+\cdots+a_{n-1}s+a_n.
\end{align*}
Hence $A$ is Hurwitz by the hypothesis on $p$.
Define the perturbation map $r_\varepsilon:[0,T]\to\mathbb{R}^n$ componentwise, for $\mathcal{L}^1$-a.e. $t\in[0,T]$, by
\begin{align*}
(r_\varepsilon(t))_i=\frac{\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))}{\varepsilon^{n-i}}, \qquad 1\le i\le n.
\end{align*}
For $i=n$ this denominator is $\varepsilon^0=1$. Dividing the error equations by $\varepsilon^{n-i}$ gives
\begin{align*}
\dot{\eta}(t)=\frac{1}{\varepsilon}A\eta(t)+r_\varepsilon(t)
\end{align*}
for $\mathcal{L}^1$-a.e. $t\in[0,T]$.
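For instance, in the illustrative case $n=2$ the scaling reads $\eta_1=e_1/\varepsilon$ and $\eta_2=e_2$, and dividing the two error equations by $\varepsilon$ and by $1$ respectively gives, with the time argument suppressed,
\begin{align*}
\begin{pmatrix}\dot{\eta}_1\\ \dot{\eta}_2\end{pmatrix}
=\frac{1}{\varepsilon}\begin{pmatrix}-a_1&1\\ -a_2&0\end{pmatrix}\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix}
+\begin{pmatrix}\varepsilon^{-1}\bigl(\phi_1(\hat{x},u)-\phi_1(x,u)\bigr)\\ \phi_2(\hat{x},u)-\phi_2(x,u)\end{pmatrix}.
\end{align*}
In this case the characteristic polynomial can be checked by hand:
\begin{align*}
\det(sI-A)=\det\begin{pmatrix}s+a_1&-1\\ a_2&s\end{pmatrix}=s(s+a_1)+a_2=s^2+a_1s+a_2.
\end{align*}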
[guided]
The purpose of the scaling is to put all observer injection terms on the same $\varepsilon^{-1}$ time scale. We define
\begin{align*}
\eta_i(t)=\frac{e_i(t)}{\varepsilon^{n-i}}, \qquad 1\le i\le n.
\end{align*}
This means that $e_i(t)=\varepsilon^{n-i}\eta_i(t)$. Since $e$ is absolutely continuous and multiplication by the constant $\varepsilon^{-(n-i)}$ preserves absolute continuity, each $\eta_i$ is absolutely continuous.
Now compute the first $n-1$ equations. For $1\le i\le n-1$, we divide the identity for $\dot e_i$ by $\varepsilon^{n-i}$:
\begin{align*}
\dot{\eta}_i(t)=\frac{e_{i+1}(t)}{\varepsilon^{n-i}}+\frac{\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))}{\varepsilon^{n-i}}-\frac{a_i}{\varepsilon^i}\frac{e_1(t)}{\varepsilon^{n-i}}.
\end{align*}
Using $e_{i+1}(t)=\varepsilon^{n-i-1}\eta_{i+1}(t)$ and $e_1(t)=\varepsilon^{n-1}\eta_1(t)$, this becomes
\begin{align*}
\dot{\eta}_i(t)=\frac{1}{\varepsilon}\eta_{i+1}(t)-\frac{a_i}{\varepsilon}\eta_1(t)+\frac{\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))}{\varepsilon^{n-i}}.
\end{align*}
For the last coordinate, the same calculation gives
\begin{align*}
\dot{\eta}_n(t)=-\frac{a_n}{\varepsilon}\eta_1(t)+\phi_n(\hat{x}(t),u(t))-\phi_n(x(t),u(t)).
\end{align*}
These equations are exactly
\begin{align*}
\dot{\eta}(t)=\frac{1}{\varepsilon}A\eta(t)+r_\varepsilon(t),
\end{align*}
where the matrix $A$ has first column $(-a_1,\dots,-a_n)^\top$ and superdiagonal entries equal to $1$. The sign of the first column is negative because the observer uses $y-\hat{x}_1=x_1-\hat{x}_1=-e_1$. The characteristic polynomial of this companion-form error matrix is
\begin{align*}
\det(sI-A)=s^n+a_1s^{n-1}+\cdots+a_{n-1}s+a_n.
\end{align*}
Since this polynomial is Hurwitz by hypothesis, every eigenvalue of $A$ has strictly negative real part. This is the stability input that the Lyapunov argument will use.
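As a sanity check on the formula for the characteristic polynomial, here is the computation in the illustrative case $n=3$, expanding the determinant along the first row:
\begin{align*}
\det(sI-A)=\det\begin{pmatrix}s+a_1&-1&0\\ a_2&s&-1\\ a_3&0&s\end{pmatrix}
=(s+a_1)s^2+(a_2s+a_3)=s^3+a_1s^2+a_2s+a_3.
\end{align*}
The first cofactor is $\det\begin{pmatrix}s&-1\\ 0&s\end{pmatrix}=s^2$ and the second is $\det\begin{pmatrix}a_2&-1\\ a_3&s\end{pmatrix}=a_2s+a_3$, entering with sign $-(-1)=+1$.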
[/guided]
[/step]
[step:Bound the nonlinear perturbation uniformly in the scaled variables]
Assume $0<\varepsilon\le 1$. For $\mathcal{L}^1$-a.e. $t\in[0,T]$, the hypotheses give $x(t)\in K\subset N$, $\hat{x}(t)\in N$, and $u(t)\in U_0$. Therefore, for each $1\le i\le n$,
\begin{align*}
|\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))| \le L\sum_{j=1}^{i}|e_j(t)|.
\end{align*}
Using $e_j(t)=\varepsilon^{n-j}\eta_j(t)$, we obtain
\begin{align*}
|(r_\varepsilon(t))_i| \le L\sum_{j=1}^{i}\varepsilon^{i-j}|\eta_j(t)|.
\end{align*}
Since $0<\varepsilon\le 1$ and $j\le i$, we have $\varepsilon^{i-j}\le 1$, and hence
\begin{align*}
|(r_\varepsilon(t))_i| \le L\sum_{j=1}^{i}|\eta_j(t)| \le Li|\eta(t)| \le Ln|\eta(t)|.
\end{align*}
Consequently,
\begin{align*}
|r_\varepsilon(t)|^2 \le \sum_{i=1}^{n}L^2n^2|\eta(t)|^2=L^2n^3|\eta(t)|^2.
\end{align*}
Define
\begin{align*}
M_L=Ln^{3/2}.
\end{align*}
Then
\begin{align*}
|r_\varepsilon(t)|\le M_L|\eta(t)|
\end{align*}
for $\mathcal{L}^1$-a.e. $t\in[0,T]$.
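To see concretely what the triangular structure buys, take the illustrative case $n=2$. The componentwise bounds above read
\begin{align*}
|(r_\varepsilon(t))_1|\le L|\eta_1(t)|, \qquad |(r_\varepsilon(t))_2|\le L\bigl(\varepsilon|\eta_1(t)|+|\eta_2(t)|\bigr),
\end{align*}
and $M_L=2^{3/2}L$. By contrast, suppose hypothetically (this is not one of the hypotheses) that $\phi_1$ were allowed to depend on $x_2$, so that only $|\phi_1(\hat{x},u)-\phi_1(x,u)|\le L(|e_1|+|e_2|)$ were available. The same substitution $e_1=\varepsilon\eta_1$, $e_2=\eta_2$ would then give
\begin{align*}
|(r_\varepsilon(t))_1|\le \frac{L}{\varepsilon}\bigl(|e_1(t)|+|e_2(t)|\bigr)=L|\eta_1(t)|+\frac{L}{\varepsilon}|\eta_2(t)|.
\end{align*}
The factor $\varepsilon^{-1}$ in the last term is of the same order as the linear part $\varepsilon^{-1}A\eta$, so no bound of the form $|r_\varepsilon(t)|\le M|\eta(t)|$ with $M$ independent of $\varepsilon$ would follow from this estimate.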
[/step]
[step:Choose a quadratic Lyapunov function for the Hurwitz matrix]
Since $A$ is Hurwitz, the [Lyapunov equation theorem for Hurwitz matrices](/theorems/6369) gives a symmetric positive definite matrix $P\in\mathbb{R}^{n\times n}$ satisfying
\begin{align*}
A^\top P+PA=-I.
\end{align*}
Here $I\in\mathbb{R}^{n\times n}$ is the identity matrix. The result invoked states that for every real Hurwitz matrix $A$ there exists a unique real symmetric positive definite matrix $P$ solving $A^\top P+PA=-I$.
Let $\lambda_{\min}>0$ and $\lambda_{\max}>0$ denote respectively the smallest and largest eigenvalues of $P$. Since $P$ is symmetric positive definite, for every $\xi\in\mathbb{R}^n$,
\begin{align*}
\lambda_{\min}|\xi|^2 \le \xi^\top P\xi \le \lambda_{\max}|\xi|^2.
\end{align*}
Let $\|P\|_{\mathrm{op}}$ denote the operator norm of the linear map $P:\mathbb{R}^n\to\mathbb{R}^n$ with respect to the Euclidean norm, so that $|P\xi|\le \|P\|_{\mathrm{op}}|\xi|$ for every $\xi\in\mathbb{R}^n$.
Define the Lyapunov function $V:\mathbb{R}^n\to\mathbb{R}$ by
\begin{align*}
V(\xi)=\xi^\top P\xi.
\end{align*}
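A minimal worked instance, assuming $n=2$ and the illustrative coefficients $a_1=2$, $a_2=1$ (chosen here for convenience, not taken from the statement), so that $\det(sI-A)=(s+1)^2$ is Hurwitz. Writing $P$ with unknown entries $p_{11},p_{12},p_{22}$, the equation $A^\top P+PA=-I$ becomes the linear system $-4p_{11}-2p_{12}=-1$, $p_{11}-2p_{12}-p_{22}=0$, $2p_{12}=-1$, whose solution is
\begin{align*}
A=\begin{pmatrix}-2&1\\ -1&0\end{pmatrix}, \qquad
P=\begin{pmatrix}\tfrac12&-\tfrac12\\ -\tfrac12&\tfrac32\end{pmatrix}.
\end{align*}
One can verify this directly:
\begin{align*}
PA=\begin{pmatrix}-\tfrac12&\tfrac12\\ -\tfrac12&-\tfrac12\end{pmatrix}, \qquad
A^\top P+PA=(PA)^\top+PA=\begin{pmatrix}-1&0\\ 0&-1\end{pmatrix}.
\end{align*}
Since $\operatorname{tr}P=2$ and $\det P=\tfrac12$, the eigenvalues of this $P$ are $\lambda_{\min}=1-\tfrac{\sqrt2}{2}$ and $\lambda_{\max}=1+\tfrac{\sqrt2}{2}$, both positive. Because $P$ is symmetric positive definite, its operator norm coincides with $\lambda_{\max}$.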
[/step]
[step:Differentiate the Lyapunov function and absorb the perturbation]
For $\mathcal{L}^1$-a.e. $t\in[0,T]$, the chain rule for absolutely continuous functions gives
\begin{align*}
\frac{d}{dt}V(\eta(t))=2\eta(t)^\top P\dot{\eta}(t).
\end{align*}
Using the scaled dynamics,
\begin{align*}
\frac{d}{dt}V(\eta(t))=\frac{1}{\varepsilon}\eta(t)^\top(A^\top P+PA)\eta(t)+2\eta(t)^\top P r_\varepsilon(t).
\end{align*}
The Lyapunov equation gives the first term:
\begin{align*}
\frac{1}{\varepsilon}\eta(t)^\top(A^\top P+PA)\eta(t)=-\frac{1}{\varepsilon}|\eta(t)|^2.
\end{align*}
For the perturbation term, the [Cauchy-Schwarz inequality](/theorems/432) in $\mathbb{R}^n$, the operator norm bound $|P\xi|\le \|P\|_{\mathrm{op}}|\xi|$, and the estimate $|r_\varepsilon(t)|\le M_L|\eta(t)|$ give
\begin{align*}
2|\eta(t)^\top P r_\varepsilon(t)| \le 2\|P\|_{\mathrm{op}}M_L|\eta(t)|^2.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}V(\eta(t)) \le -\left(\frac{1}{\varepsilon}-2\|P\|_{\mathrm{op}}M_L\right)|\eta(t)|^2.
\end{align*}
Define
\begin{align*}
\varepsilon_0=\min\left\{1,\frac{1}{4\|P\|_{\mathrm{op}}M_L}\right\}.
\end{align*}
If $0<\varepsilon<\varepsilon_0$, then $2\|P\|_{\mathrm{op}}M_L\le 1/(2\varepsilon_0)<1/(2\varepsilon)$, and therefore
\begin{align*}
\frac{1}{\varepsilon}-2\|P\|_{\mathrm{op}}M_L \ge \frac{1}{2\varepsilon}.
\end{align*}
Thus
\begin{align*}
\frac{d}{dt}V(\eta(t)) \le -\frac{1}{2\varepsilon}|\eta(t)|^2.
\end{align*}
Using $V(\eta(t))\le \lambda_{\max}|\eta(t)|^2$ in the equivalent form $|\eta(t)|^2\ge V(\eta(t))/\lambda_{\max}$, we obtain
\begin{align*}
\frac{d}{dt}V(\eta(t)) \le -\frac{1}{2\lambda_{\max}\varepsilon}V(\eta(t))
\end{align*}
for $\mathcal{L}^1$-a.e. $t\in[0,T]$.
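For a sense of scale, assume the illustrative data $n=2$, $a_1=2$, $a_2=1$ (not taken from the statement). Solving $A^\top P+PA=-I$ by hand for the corresponding $A$ gives $\|P\|_{\mathrm{op}}=\lambda_{\max}=1+\tfrac{\sqrt2}{2}$, and $M_L=2^{3/2}L$, so that
\begin{align*}
4\|P\|_{\mathrm{op}}M_L=4\left(1+\tfrac{\sqrt2}{2}\right)2\sqrt2\,L=8(1+\sqrt2)L,
\qquad
\varepsilon_0=\min\left\{1,\frac{1}{8(1+\sqrt2)L}\right\}.
\end{align*}
With $L=1$ this gives $\varepsilon_0\approx 0.052$, and the decay coefficient in the last inequality is $1/(2\lambda_{\max}\varepsilon)=1/((2+\sqrt2)\varepsilon)\approx 0.293/\varepsilon$. These numbers only illustrate the size of the constants produced by this particular argument; they are not claimed to be sharp.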
[guided]
We now turn the scaled differential equation into decay. The Lyapunov function is
\begin{align*}
V(\xi)=\xi^\top P\xi,
\end{align*}
where $P$ is symmetric positive definite and satisfies
\begin{align*}
A^\top P+PA=-I.
\end{align*}
Here $\|P\|_{\mathrm{op}}$ denotes the operator norm of the linear map $P:\mathbb{R}^n\to\mathbb{R}^n$ with respect to the Euclidean norm, so $|P\xi|\le \|P\|_{\mathrm{op}}|\xi|$ for every $\xi\in\mathbb{R}^n$.
Because $\eta$ is absolutely continuous and $V$ is a polynomial function, the chain rule for absolutely continuous functions applies. Hence, for $\mathcal{L}^1$-a.e. $t$,
\begin{align*}
\frac{d}{dt}V(\eta(t))=2\eta(t)^\top P\dot{\eta}(t).
\end{align*}
Substitute the scaled error dynamics:
\begin{align*}
\frac{d}{dt}V(\eta(t))=2\eta(t)^\top P\left(\frac{1}{\varepsilon}A\eta(t)+r_\varepsilon(t)\right).
\end{align*}
The linear part is symmetrised by writing
\begin{align*}
2\eta(t)^\top PA\eta(t)=\eta(t)^\top(A^\top P+PA)\eta(t).
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}V(\eta(t))=\frac{1}{\varepsilon}\eta(t)^\top(A^\top P+PA)\eta(t)+2\eta(t)^\top P r_\varepsilon(t).
\end{align*}
The Lyapunov equation now gives the good negative term:
\begin{align*}
\frac{1}{\varepsilon}\eta(t)^\top(A^\top P+PA)\eta(t)=-\frac{1}{\varepsilon}|\eta(t)|^2.
\end{align*}
The remaining term is the nonlinear perturbation. We estimate it using the Cauchy-Schwarz inequality and the operator norm of $P$:
\begin{align*}
2|\eta(t)^\top P r_\varepsilon(t)| \le 2|\eta(t)|\,|P r_\varepsilon(t)| \le 2\|P\|_{\mathrm{op}}|\eta(t)|\,|r_\varepsilon(t)|.
\end{align*}
The perturbation bound from the previous step enters here; we re-derive it for completeness. For $\mathcal{L}^1$-a.e. $t\in[0,T]$ we have $x(t)\in K\subset N$, $\hat{x}(t)\in N$, and $u(t)\in U_0$, so the triangular Lipschitz hypothesis gives, for each $1\le i\le n$,
\begin{align*}
|\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))| \le L\sum_{j=1}^{i}|e_j(t)|.
\end{align*}
Using $e_j(t)=\varepsilon^{n-j}\eta_j(t)$ and the definition
\begin{align*}
(r_\varepsilon(t))_i=\frac{\phi_i(\hat{x}(t),u(t))-\phi_i(x(t),u(t))}{\varepsilon^{n-i}},
\end{align*}
we obtain
\begin{align*}
|(r_\varepsilon(t))_i| \le L\sum_{j=1}^{i}\varepsilon^{i-j}|\eta_j(t)|.
\end{align*}
Because $0<\varepsilon\le 1$ and $j\le i$, each factor satisfies $\varepsilon^{i-j}\le 1$. Hence
\begin{align*}
|(r_\varepsilon(t))_i| \le L\sum_{j=1}^{i}|\eta_j(t)| \le Ln|\eta(t)|.
\end{align*}
Squaring and summing over $1\le i\le n$ gives
\begin{align*}
|r_\varepsilon(t)|^2\le L^2n^3|\eta(t)|^2.
\end{align*}
Thus, with $M_L=Ln^{3/2}$, we have $|r_\varepsilon(t)|\le M_L|\eta(t)|$. Substituting this into the Cauchy-Schwarz estimate gives
\begin{align*}
2|\eta(t)^\top P r_\varepsilon(t)| \le 2\|P\|_{\mathrm{op}}M_L|\eta(t)|^2.
\end{align*}
Combining the negative linear contribution with the perturbation bound yields
\begin{align*}
\frac{d}{dt}V(\eta(t)) \le -\left(\frac{1}{\varepsilon}-2\|P\|_{\mathrm{op}}M_L\right)|\eta(t)|^2.
\end{align*}
This is the point where high gain is used quantitatively. The coefficient $2\|P\|_{\mathrm{op}}M_L$ of the destabilising perturbation term is independent of $\varepsilon$, while the Hurwitz linear part contributes $-\varepsilon^{-1}|\eta|^2$. Choose
\begin{align*}
\varepsilon_0=\min\left\{1,\frac{1}{4\|P\|_{\mathrm{op}}M_L}\right\}.
\end{align*}
Then $0<\varepsilon<\varepsilon_0$ implies
\begin{align*}
\frac{1}{\varepsilon}-2\|P\|_{\mathrm{op}}M_L \ge \frac{1}{2\varepsilon}.
\end{align*}
Hence
\begin{align*}
\frac{d}{dt}V(\eta(t)) \le -\frac{1}{2\varepsilon}|\eta(t)|^2.
\end{align*}
Finally, the upper spectral bound $V(\xi)\le\lambda_{\max}|\xi|^2$ implies $|\xi|^2\ge V(\xi)/\lambda_{\max}$. Applying this with $\xi=\eta(t)$ gives
\begin{align*}
\frac{d}{dt}V(\eta(t)) \le -\frac{1}{2\lambda_{\max}\varepsilon}V(\eta(t)).
\end{align*}
This is a scalar differential inequality for the nonnegative absolutely [continuous function](/page/Continuous%20Function) $t\mapsto V(\eta(t))$.
[/guided]
[/step]
[step:Integrate the scalar differential inequality]
Define
\begin{align*}
\mu=\frac{1}{2\lambda_{\max}}.
\end{align*}
The previous step gives
\begin{align*}
\frac{d}{dt}V(\eta(t))\le -\frac{\mu}{\varepsilon}V(\eta(t))
\end{align*}
for $\mathcal{L}^1$-a.e. $t\in[0,T]$. Since $\eta$ is absolutely continuous and $V$ is a polynomial function, the map $w:[0,T]\to\mathbb{R}$ defined by $w(t)=V(\eta(t))$ is absolutely continuous. Since $P$ is positive definite, $w(t)\ge 0$ for every $t\in[0,T]$. Applying the differential form of Gronwall's inequality to the absolutely continuous map $w$, which satisfies $\dot{w}(t)\le-(\mu/\varepsilon)w(t)$ for $\mathcal{L}^1$-a.e. $t$, yields
\begin{align*}
V(\eta(t))\le e^{-\mu t/\varepsilon}V(\eta(0))
\end{align*}
for every $t\in[0,T]$. Using the lower and upper spectral bounds for $P$, we get
\begin{align*}
\lambda_{\min}|\eta(t)|^2 \le V(\eta(t)) \le e^{-\mu t/\varepsilon}V(\eta(0)) \le \lambda_{\max}e^{-\mu t/\varepsilon}|\eta(0)|^2.
\end{align*}
Therefore
\begin{align*}
|\eta(t)| \le \left(\frac{\lambda_{\max}}{\lambda_{\min}}\right)^{1/2}e^{-\mu t/(2\varepsilon)}|\eta(0)|.
\end{align*}
Define
\begin{align*}
C_\eta=\left(\frac{\lambda_{\max}}{\lambda_{\min}}\right)^{1/2},
\qquad
\lambda=\frac{\mu}{2}=\frac{1}{4\lambda_{\max}}.
\end{align*}
Then
\begin{align*}
|\eta(t)|\le C_\eta e^{-\lambda t/\varepsilon}|\eta(0)|
\end{align*}
for every $t\in[0,T]$.
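One way to see the exponential bound for $w(t)=V(\eta(t))$ directly, sketched here as an alternative to citing Gronwall's inequality, is an integrating-factor argument. The map $t\mapsto e^{\mu t/\varepsilon}w(t)$ is absolutely continuous on $[0,T]$ as a product of absolutely continuous functions, and for $\mathcal{L}^1$-a.e. $t\in[0,T]$,
\begin{align*}
\frac{d}{dt}\left(e^{\mu t/\varepsilon}w(t)\right)=e^{\mu t/\varepsilon}\left(\dot{w}(t)+\frac{\mu}{\varepsilon}w(t)\right)\le 0.
\end{align*}
Integrating from $0$ to $t$, which is legitimate for absolutely continuous functions, gives
\begin{align*}
e^{\mu t/\varepsilon}w(t)-w(0)=\int_0^t\frac{d}{ds}\left(e^{\mu s/\varepsilon}w(s)\right)ds\le 0,
\end{align*}
that is, $w(t)\le e^{-\mu t/\varepsilon}w(0)$ for every $t\in[0,T]$.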
[/step]
[step:Return from scaled coordinates to the original observer error]
For $0<\varepsilon<\varepsilon_0\le 1$, the relation $e_i(t)=\varepsilon^{n-i}\eta_i(t)$ gives
\begin{align*}
|e(t)|^2=\sum_{i=1}^{n}\varepsilon^{2(n-i)}|\eta_i(t)|^2 \le \sum_{i=1}^{n}|\eta_i(t)|^2=|\eta(t)|^2.
\end{align*}
Thus $|e(t)|\le|\eta(t)|$. Also,
\begin{align*}
|\eta(0)|^2=\sum_{i=1}^{n}\varepsilon^{-2(n-i)}|e_i(0)|^2 \le \varepsilon^{-2(n-1)}\sum_{i=1}^{n}|e_i(0)|^2=\varepsilon^{-2(n-1)}|e(0)|^2,
\end{align*}
so
\begin{align*}
|\eta(0)|\le \varepsilon^{-(n-1)}|e(0)|.
\end{align*}
Combining these two scaling estimates with the decay estimate for $\eta$ gives
\begin{align*}
|e(t)|\le C_\eta \varepsilon^{-(n-1)}e^{-\lambda t/\varepsilon}|e(0)|
\end{align*}
for every $t\in[0,T]$.
Set $C=C_\eta$. The matrices $A$ and $P$ are determined by the coefficients $a_1,\dots,a_n$, so $C$ and $\lambda$ depend only on $a_1,\dots,a_n$, while $\varepsilon_0$ depends in addition on $L$ and $n$ through $M_L=Ln^{3/2}$. The dimension $n$ is fixed by the length of the coefficient list and by the ambient space containing $K\subset\mathbb{R}^n$. Hence all three constants depend only on the permitted data $K$, $N$, $U_0$, $L$, and $a_1,\dots,a_n$. This proves the asserted local high-gain observer estimate.
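A short computation, offered only as an illustration of how the estimate behaves, shows when the bound drops below the initial error. Since $C\ge 1$ and $0<\varepsilon\le 1$, the prefactor $C\varepsilon^{-(n-1)}$ is at least $1$, and
\begin{align*}
C\varepsilon^{-(n-1)}e^{-\lambda t/\varepsilon}\le 1
\quad\Longleftrightarrow\quad
t\ge \frac{\varepsilon}{\lambda}\left(\ln C+(n-1)\ln\frac{1}{\varepsilon}\right).
\end{align*}
At $t=0$ the right-hand side of the estimate equals $C\varepsilon^{-(n-1)}|e(0)|$, which grows as $\varepsilon\to 0$, but the threshold time above tends to $0$ because $\varepsilon\ln(1/\varepsilon)\to 0$. For the illustrative data $n=2$, $a_1=2$, $a_2=1$ (not taken from the statement), for which $\lambda_{\max}=1+\tfrac{\sqrt2}{2}$ and $\lambda_{\min}=1-\tfrac{\sqrt2}{2}$, one finds $C=1+\sqrt2$ and $\lambda=1/(4+2\sqrt2)\approx 0.146$. This describes the upper bound only; the estimate does not assert that the error actually reaches it.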
[/step]