Observable Companion Form Theorem — Statement & Proof

Theorem


Proof

[proofplan] We first verify the canonical pair directly: the observability matrix of $(C_o,A_o)$ contains the standard coordinate rows in order, so it has full rank. We then compute the characteristic polynomial of $A_o$ from the determinant of $sI-A_o$, obtaining exactly $p(s)$. For a general observable pair $(C,A)$, we use its observability matrix as the coordinate-change matrix; the first $n-1$ rows shift by multiplication with $A$, while the final row is determined by the Cayley-Hamilton identity applied to the given characteristic polynomial. [/proofplan] [step:Verify observability of the canonical pair] Let $I_n \in \mathbb R^{n \times n}$ denote the identity matrix. Let $\mathcal O(C_o,A_o) \in \mathbb R^{n \times n}$ denote the observability matrix whose $k$-th row is $C_oA_o^{k-1}$ for $1 \leq k \leq n$. We claim that \begin{align*} C_oA_o^{k-1}(x) = x_k \end{align*} for every $x=(x_1,\dots,x_n) \in \mathbb R^n$ and every $1 \leq k \leq n$. For $m\in\{0,\dots,n-1\}$, we prove the stronger assertion that the first $n-m$ coordinates of $A_o^m x$ are $x_{m+1},\dots,x_n$. The case $m=0$ is immediate. Suppose the assertion holds for some $m\leq n-2$, and set $y=A_o^m x\in\mathbb R^n$. By the entrywise definition of $A_o$, the $i$-th coordinate of $A_oy$ is $y_{i+1}$ for $1\leq i\leq n-1$. Therefore the first $n-m-1$ coordinates of $A_o^{m+1}x=A_oy$ are $x_{m+2},\dots,x_n$. This proves the stronger assertion by induction. Taking the first coordinate with $m=k-1$ gives $C_oA_o^{k-1}(x)=x_k$ for $1\leq k\leq n$. Equivalently, the first $n$ rows of the observability sequence are the coordinate functionals $x \mapsto x_1,\dots,x \mapsto x_n$. Hence \begin{align*} \mathcal O(C_o,A_o) = I_n. \end{align*} Therefore $\mathcal O(C_o,A_o)$ has rank $n$. By the definition of observability for a single-output pair, this proves that $(C_o,A_o)$ is observable. [guided] Let $I_n \in \mathbb R^{n \times n}$ denote the identity matrix. 
The observability matrix records which linear functionals of the initial state can be recovered from repeated observations. Here the output map is $C_o(x)=x_1$, so the first observed coordinate is the first coordinate. The special shape of $A_o$ shifts coordinates upward: for every $y=(y_1,\dots,y_n)\in\mathbb R^n$ and every $1\leq i\leq n-1$, the $i$-th coordinate of $A_oy$ is $y_{i+1}$. Formally, define $\mathcal O(C_o,A_o) \in \mathbb R^{n \times n}$ to be the matrix whose $k$-th row is $C_oA_o^{k-1}$. We prove a stronger induction statement than just the first-coordinate identity: for each $m\in\{0,\dots,n-1\}$, the first $n-m$ coordinates of $A_o^m x$ are $x_{m+1},\dots,x_n$ for every $x=(x_1,\dots,x_n)\in\mathbb R^n$. The case $m=0$ says that the first $n$ coordinates of $x$ are $x_1,\dots,x_n$, so it is immediate. Assume the statement holds for some $m\leq n-2$, and set $y=A_o^m x\in\mathbb R^n$. By the induction hypothesis, the first $n-m$ coordinates of $y$ are $x_{m+1},\dots,x_n$. Since the $i$-th coordinate of $A_oy$ is $y_{i+1}$ for $1\leq i\leq n-1$, the first $n-m-1$ coordinates of $A_o^{m+1}x=A_oy$ are $x_{m+2},\dots,x_n$. This proves the stronger statement by induction. Now choose $m=k-1$. The first coordinate of $A_o^{k-1}x$ is $x_k$, and applying $C_o$ extracts exactly that first coordinate. Hence \begin{align*} C_oA_o^{k-1}(x) = x_k \end{align*} for every $1\leq k\leq n$. Thus the rows of $\mathcal O(C_o,A_o)$ are exactly the standard coordinate rows. Hence $\mathcal O(C_o,A_o)=I_n$, which has rank $n$. This is precisely the definition of observability for a single-output pair. [/guided] [/step] [step:Compute the characteristic polynomial of the canonical matrix] Let $q \in \mathbb R[s]$ denote the characteristic polynomial of $A_o$, defined by \begin{align*} q(s)=\det(sI_n-A_o) \end{align*} for each $s\in\mathbb R$. 
Define the polynomial matrix $B \in \mathbb R[s]^{n\times n}$ by \begin{align*} B(s)=sI_n-A_o \end{align*} for each $s\in\mathbb R$. By the entrywise definition of $A_o$, the only entries of $B(s)$ that can be nonzero are $B(s)_{ii}=s$ for $1\leq i\leq n-1$, $B(s)_{i,i+1}=-1$ for $1\leq i\leq n-1$, $B(s)_{n,j}=a_{j-1}$ for $1\leq j\leq n-1$, and $B(s)_{n,n}=s+a_{n-1}$. We compute $\det B(s)$ from the Leibniz determinant formula. Because each row $i<n$ has nonzero entries only in columns $i$ and $i+1$, a nonzero term is determined by the column $j$ chosen by the last row. If $1\leq j\leq n-1$, then column $j$ is already taken, so rows $1,\dots,j-1$ must choose their diagonal entries, and rows $j,\dots,n-1$ must choose their superdiagonal entries. The corresponding permutation is the cycle sending $j$ to $j+1$, $j+1$ to $j+2$, and so on, with $n$ sent to $j$; it is a cycle of length $n-j+1$, so its sign is $(-1)^{n-j}$. The product of matrix entries is $s^{j-1}(-1)^{n-j}a_{j-1}$, since it contains $n-j$ factors equal to $-1$, so the sign from the entries is also $(-1)^{n-j}$. The two signs cancel, and the total contribution for this $j$ is \begin{align*} a_{j-1}s^{j-1}. \end{align*} If $j=n$, the first $n-1$ rows all choose their diagonal entries and the last row chooses $s+a_{n-1}$, so the contribution is \begin{align*} s^{n-1}(s+a_{n-1})=s^n+a_{n-1}s^{n-1}. \end{align*} These are the only nonzero Leibniz terms. Summing the contributions over $j=1,\dots,n$ gives \begin{align*} q(s)=s^n+a_{n-1}s^{n-1}+a_{n-2}s^{n-2}+\cdots+a_1s+a_0. \end{align*} Thus $q=p$, so the characteristic polynomial of $A_o$ is $p$. [guided] We need to show that the canonical matrix has the prescribed characteristic polynomial, so we compute the determinant of $sI_n-A_o$ directly. Let $q \in \mathbb R[s]$ denote the characteristic polynomial of $A_o$, defined by \begin{align*} q(s)=\det(sI_n-A_o) \end{align*} for each $s\in\mathbb R$. Define the polynomial matrix $B \in \mathbb R[s]^{n\times n}$ by \begin{align*} B(s)=sI_n-A_o. 
\end{align*} The entrywise form of $A_o$ gives $B(s)_{ii}=s$ for $1\leq i\leq n-1$, $B(s)_{i,i+1}=-1$ for $1\leq i\leq n-1$, $B(s)_{n,j}=a_{j-1}$ for $1\leq j\leq n-1$, and $B(s)_{n,n}=s+a_{n-1}$. We now apply the Leibniz determinant formula to $B(s)$. Because each row $i<n$ has nonzero entries only in columns $i$ and $i+1$, a nonzero product in the determinant is determined by the column $j$ chosen by the last row. If $1\leq j\leq n-1$, then rows $1,\dots,j-1$ must choose their diagonal entries, while rows $j,\dots,n-1$ must choose their superdiagonal entries. The corresponding permutation cycles the columns $j,j+1,\dots,n$, so its sign is $(-1)^{n-j}$. The selected superdiagonal entries contribute $n-j$ factors of $-1$, giving another factor $(-1)^{n-j}$. These two signs multiply to $1$, and the contribution is \begin{align*} a_{j-1}s^{j-1}. \end{align*} If $j=n$, the first $n-1$ rows choose the diagonal entries and the last row chooses $s+a_{n-1}$, giving \begin{align*} s^{n-1}(s+a_{n-1})=s^n+a_{n-1}s^{n-1}. \end{align*} There are no other nonzero Leibniz terms, because choosing any other pattern would require a zero entry in one of the first $n-1$ rows. Therefore \begin{align*} q(s)=s^n+a_{n-1}s^{n-1}+a_{n-2}s^{n-2}+\cdots+a_1s+a_0. \end{align*} This is exactly the polynomial $p(s)$ from the theorem statement, so the characteristic polynomial of $A_o$ is $p$. [/guided] [/step] [step:Use the observability matrix as the coordinate basis] Let $A \in \mathbb R^{n \times n}$ have characteristic polynomial $p$, and let $C:\mathbb R^n \to \mathbb R$ be a [linear map](/page/Linear%20Map) such that $(C,A)$ is observable. Define the observability matrix $\mathcal O(C,A) \in \mathbb R^{n \times n}$ by declaring its $k$-th row to be $CA^{k-1}$ for $1 \leq k \leq n$. For a single-output pair, observability means that this $n\times n$ observability matrix has rank $n$. Hence $\mathcal O(C,A)$ is invertible. Set \begin{align*} P = \mathcal O(C,A)^{-1}. 
\end{align*} We use the observable coordinates $z=\mathcal O(C,A)x$, equivalently $x=Pz$, for a state vector $x\in\mathbb R^n$ and its coordinate vector $z\in\mathbb R^n$. We compute $\mathcal O(C,A)A$. Its first $n-1$ rows are $CA,CA^2,\dots,CA^{n-1}$, which are the second through $n$-th rows of $\mathcal O(C,A)$. Its last row is $CA^n$. Since $A\in\mathbb R^{n\times n}$ is a square real matrix and $p$ is its characteristic polynomial, the [Cayley-Hamilton theorem](/theorems/921) gives $p(A)=0$, that is, \begin{align*} A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I_n = 0. \end{align*} Multiplying on the left by $C$ and solving for $CA^n$ yields \begin{align*} CA^n = -a_{n-1}CA^{n-1} - \cdots - a_1CA - a_0C. \end{align*} Thus the last row of $\mathcal O(C,A)A$ is the linear combination of the rows of $\mathcal O(C,A)$ with coefficients $-a_0,-a_1,\dots,-a_{n-1}$. By the entrywise definition of $A_o$, the $k$-th row of $A_o\mathcal O(C,A)$ is the $(k+1)$-th row of $\mathcal O(C,A)$ for $1\leq k\leq n-1$, and its last row is this same linear combination. Hence \begin{align*} \mathcal O(C,A)A = A_o\mathcal O(C,A). \end{align*} Multiplying on the right by $\mathcal O(C,A)^{-1}=P$ and using $\mathcal O(C,A)=P^{-1}$ gives \begin{align*} P^{-1}AP = A_o. \end{align*} Finally, since the first row of $\mathcal O(C,A)$ is $C$, we have $C=C_o\mathcal O(C,A)$, and therefore \begin{align*} CP = C\mathcal O(C,A)^{-1} = C_o. \end{align*} Therefore the observable pair $(C,A)$ is similar to $(C_o,A_o)$ in the observable coordinate basis determined by $P$. [guided] The goal is to build coordinates in which the output and dynamics take the canonical form. Let $A\in\mathbb R^{n\times n}$ have characteristic polynomial $p$, and let $C:\mathbb R^n\to\mathbb R$ be a linear map such that $(C,A)$ is observable. Define the observability matrix $\mathcal O(C,A)\in\mathbb R^{n\times n}$ by making its $k$-th row equal to $CA^{k-1}$ for $1\leq k\leq n$. For a single-output pair, observability means precisely that this $n\times n$ observability matrix has rank $n$, hence $\mathcal O(C,A)$ is invertible. Set \begin{align*} P=\mathcal O(C,A)^{-1}. 
\end{align*} The observable coordinate vector is $z=\mathcal O(C,A)x$, equivalently $x=Pz$, for a state vector $x\in\mathbb R^n$. We next compute how the dynamics look in these coordinates. Multiplying $\mathcal O(C,A)$ on the right by $A$ shifts the rows: the first $n-1$ rows of $\mathcal O(C,A)A$ are $CA,CA^2,\dots,CA^{n-1}$, which are exactly the second through $n$-th rows of $\mathcal O(C,A)$. The only row not obtained by this shift is the last row, $CA^n$. Since $A\in\mathbb R^{n\times n}$ is a square real matrix and $p$ is its characteristic polynomial, the [Cayley-Hamilton theorem](/theorems/921) gives $p(A)=0$. Thus \begin{align*} A^n+a_{n-1}A^{n-1}+\cdots+a_1A+a_0I_n=0. \end{align*} Multiplying this identity on the left by the linear map $C$ gives \begin{align*} CA^n=-a_{n-1}CA^{n-1}-\cdots-a_1CA-a_0C. \end{align*} Therefore the last row of $\mathcal O(C,A)A$ is the linear combination of the rows of $\mathcal O(C,A)$ with coefficients $-a_0,-a_1,\dots,-a_{n-1}$. Comparing this row-shift structure with the entrywise definition of $A_o$, we obtain \begin{align*} \mathcal O(C,A)A=A_o\mathcal O(C,A). \end{align*} Now multiply on the right by $\mathcal O(C,A)^{-1}=P$ to get \begin{align*} P^{-1}AP=A_o. \end{align*} Finally, the first row of $\mathcal O(C,A)$ is $C$, while the first row of the identity matrix is $C_o$. Hence \begin{align*} CP=C\mathcal O(C,A)^{-1}=C_o. \end{align*} Thus, in the observable coordinate basis determined by $P$, the pair $(C,A)$ becomes $(C_o,A_o)$. This proves that every observable single-output pair with characteristic polynomial $p$ is similar to the canonical pair. [/guided] [/step]
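The three steps of the proof can be checked on a small example. The following is a minimal sketch in exact rational arithmetic, not part of the proof itself. The helper names (`frac`, `companion`, `observability`, `charpoly`, `inverse`, `observable_form`) and the sample pair $(C,A)$ with $n=3$ are illustrative assumptions. The characteristic polynomial is computed with the Faddeev-LeVerrier recursion, a technique swapped in for convenience; the proof above uses the Leibniz formula instead.

```python
from fractions import Fraction

def frac(M):
    """Convert a list-of-rows matrix to exact Fractions."""
    return [[Fraction(v) for v in row] for row in M]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def companion(a):
    """A_o for p(s) = s^n + a[n-1] s^(n-1) + ... + a[0] (assumed layout)."""
    n = len(a)
    rows = [[Fraction(int(j == i + 1)) for j in range(n)] for i in range(n - 1)]
    rows.append([-Fraction(c) for c in a])  # last row: -a_0, ..., -a_{n-1}
    return rows

def observability(C, A):
    """Matrix whose k-th row is C A^(k-1) for k = 1..n; C is a 1 x n matrix."""
    rows, current = [], C
    for _ in range(len(A)):
        rows.append(current[0])
        current = matmul(current, A)
    return rows

def charpoly(A):
    """Faddeev-LeVerrier recursion; returns [a_0, ..., a_{n-1}] of det(sI - A)."""
    n = len(A)
    M, coeffs = identity(n), []
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)  # collected as a_{n-1}, a_{n-2}, ..., a_0
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs[::-1]

def inverse(M):
    """Gauss-Jordan inverse over the rationals; raises ValueError if M is singular."""
    n = len(M)
    aug = [list(row) + unit for row, unit in zip(M, identity(n))]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular, so the pair is not observable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def observable_form(C, A):
    """Return (P, P^-1 A P, C P) with P = O(C, A)^-1."""
    O = observability(C, A)
    P = inverse(O)
    return P, matmul(matmul(O, A), P), matmul(C, P)

# Hypothetical sample pair, chosen only for illustration.
A = frac([[1, 2, 0], [0, 1, 1], [1, 0, 1]])
C = frac([[1, 0, 1]])
a = charpoly(A)
P, A_new, C_new = observable_form(C, A)
print(A_new == companion(a), C_new == frac([[1, 0, 0]]))
```

Working over `Fraction` rather than floating point keeps every comparison exact, so the equalities $P^{-1}AP=A_o$ and $CP=C_o$ can be tested with `==` instead of a tolerance.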

Observable Companion Form Theorem (Theorem # 6384)
