Orthogonality Principle for Least-Squares Estimation

Orthogonality Principle for Least-Squares Estimation (Theorem # 6405)

Theorem



Proof

[proofplan] We prove both directions directly from the Hilbert-space [inner product](/page/Inner%20Product). If $\hat X$ is a minimizer, then along every line $\hat X+tZ$ inside $\mathcal H$ the squared error is minimal at $t=0$, and expanding this quadratic in $t$ forces the linear coefficient to vanish. Conversely, if the residual $X-\hat X$ is orthogonal to every vector in $\mathcal H$, then expanding $X-Z=(X-\hat X)+(\hat X-Z)$ gives a Pythagorean identity, which shows that no $Z \in \mathcal H$ can have smaller error. [/proofplan]

[step:Expand the error along every affine line in $\mathcal H$] Assume first that $\hat X$ minimizes $\mathbb E[|X-Z|^2]$ over $Z \in \mathcal H$. The probability space $(\Omega,\mathcal F,\mathbb P)$ is fixed from the theorem statement, and all random vectors are regarded as elements of $L^2(\Omega;\mathbb R^n)$. Define the residual random vector $R \in L^2(\Omega;\mathbb R^n)$ by $R := X-\hat X$. Fix $Z \in \mathcal H$. Since $\mathcal H$ is a linear subspace and $\hat X,Z \in \mathcal H$, we have $\hat X+tZ \in \mathcal H$ for every $t \in \mathbb R$. Define the function $\phi:\mathbb R \to \mathbb R$ by \begin{align*} \phi(t) := \mathbb E[|X-(\hat X+tZ)|^2]. \end{align*} Because $R,Z \in L^2(\Omega;\mathbb R^n)$, the quantities $\mathbb E[|R|^2]$ and $\mathbb E[|Z|^2]$ are finite by square-integrability, and $\mathbb E[R^\top Z]$ is finite by the [Cauchy-Schwarz inequality](/theorems/432) in $L^2(\Omega;\mathbb R^n)$. Expanding the Euclidean square pointwise and taking expectations gives \begin{align*} \phi(t)=\mathbb E[|R-tZ|^2]=\mathbb E[|R|^2]-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2]. \end{align*} Since $\hat X$ is a minimizer, $\phi(0)\leq \phi(t)$ for every $t \in \mathbb R$. Hence \begin{align*} 0\leq \phi(t)-\phi(0)=-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2] \end{align*} for every $t \in \mathbb R$.

[guided] Assume that $\hat X$ minimizes the least-squares error over $\mathcal H$.
The probability space $(\Omega,\mathcal F,\mathbb P)$ is the one fixed in the theorem statement, and $L^2(\Omega;\mathbb R^n)$ denotes the space of square-integrable $\mathbb R^n$-valued random vectors modulo almost-sure equality. The residual is the part of $X$ not captured by $\hat X$, so define $R \in L^2(\Omega;\mathbb R^n)$ by \begin{align*} R := X-\hat X. \end{align*}

To test whether $R$ is orthogonal to a direction $Z \in \mathcal H$, we move from $\hat X$ along the line determined by $Z$. Fix $Z \in \mathcal H$. Since $\mathcal H$ is a linear subspace and $\hat X,Z \in \mathcal H$, the perturbation $\hat X+tZ$ also belongs to $\mathcal H$ for every $t \in \mathbb R$. Define $\phi:\mathbb R \to \mathbb R$ by \begin{align*} \phi(t) := \mathbb E[|X-(\hat X+tZ)|^2]. \end{align*} This function records the least-squares error along the admissible line through $\hat X$.

We may expand this expression because $R,Z \in L^2(\Omega;\mathbb R^n)$. In particular, $\mathbb E[R^\top Z]$ is finite by the Cauchy-Schwarz inequality in the [Hilbert space](/page/Hilbert%20Space) $L^2(\Omega;\mathbb R^n)$. Since $X-(\hat X+tZ)=R-tZ$, the Euclidean identity $|a-b|^2=|a|^2-2a^\top b+|b|^2$ gives pointwise \begin{align*} |R-tZ|^2=|R|^2-2tR^\top Z+t^2|Z|^2. \end{align*} Taking expectations yields \begin{align*} \phi(t)=\mathbb E[|R|^2]-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2]. \end{align*}

The minimality of $\hat X$ says that $t=0$ minimizes this quadratic function, because $\hat X+tZ$ is admissible for every real $t$. Therefore \begin{align*} 0\leq \phi(t)-\phi(0)=-2t\,\mathbb E[R^\top Z]+t^2\mathbb E[|Z|^2] \end{align*} for every $t \in \mathbb R$. [/guided] [/step]

[step:Force the residual to be orthogonal to every direction in $\mathcal H$] Keep $Z \in \mathcal H$ fixed and set \begin{align*} b := \mathbb E[R^\top Z], \qquad a := \mathbb E[|Z|^2]. \end{align*} The inequality from the previous step is \begin{align*} 0\leq -2tb+t^2a \end{align*} for every $t \in \mathbb R$.
If $b>0$, choosing $t>0$ sufficiently small gives $-2tb+t^2a<0$. If $b<0$, choosing $t<0$ with sufficiently small $|t|$ gives $-2tb+t^2a<0$. Both alternatives contradict the inequality. Thus $b=0$, meaning \begin{align*} \mathbb E[(X-\hat X)^\top Z]=0. \end{align*} Since $Z \in \mathcal H$ was arbitrary, the residual $X-\hat X$ is orthogonal to every element of $\mathcal H$.

[guided] We continue from the variational inequality obtained in the previous step. Fix $Z \in \mathcal H$ and define the [real numbers](/page/Real%20Numbers) $b \in \mathbb R$ and $a \in [0,\infty)$ by \begin{align*} b := \mathbb E[R^\top Z], \qquad a := \mathbb E[|Z|^2]. \end{align*} The finiteness of $b$ follows from the Cauchy-Schwarz inequality in $L^2(\Omega;\mathbb R^n)$, and $a$ is finite because $Z$ is square-integrable. The expansion along the line in the previous step says \begin{align*} 0\leq -2tb+t^2a \end{align*} for every $t \in \mathbb R$.

Why must the linear coefficient vanish? Note that $a+1>0$ because $a\geq 0$. If $b>0$, choose a real number $t$ satisfying $0<t<2b/(a+1)$. Then \begin{align*} -2tb+t^2a=t(-2b+ta)\leq t(-2b+t(a+1))<0, \end{align*} contradicting the inequality. If $b<0$, choose $t<0$ with $0<|t|<2|b|/(a+1)$. Writing $t=-|t|$, we get \begin{align*} -2tb+t^2a=|t|(2b+|t|a)\leq |t|(2b+|t|(a+1))<0, \end{align*} again contradicting the inequality. Both nonzero alternatives are impossible, so $b=0$.

Substituting the definition of $b$ gives \begin{align*} \mathbb E[(X-\hat X)^\top Z]=\mathbb E[R^\top Z]=0. \end{align*} Because $Z \in \mathcal H$ was arbitrary, this proves that $X-\hat X$ is orthogonal to every admissible direction in $\mathcal H$. [/guided] [/step]

[step:Use orthogonality to obtain the Pythagorean identity] Conversely, assume that \begin{align*} \mathbb E[(X-\hat X)^\top Y]=0 \end{align*} for every $Y \in \mathcal H$. Let $Z \in \mathcal H$ be arbitrary, and define $W:=\hat X-Z$.
Since $\mathcal H$ is a linear subspace and $\hat X,Z \in \mathcal H$, we have $W \in \mathcal H$. Hence the assumed orthogonality gives \begin{align*} \mathbb E[(X-\hat X)^\top W]=0. \end{align*} Using $X-Z=(X-\hat X)+(\hat X-Z)=R+W$, where $R:=X-\hat X$, we expand: \begin{align*} \mathbb E[|X-Z|^2]=\mathbb E[|R+W|^2]=\mathbb E[|R|^2]+2\mathbb E[R^\top W]+\mathbb E[|W|^2]. \end{align*} The middle term is zero by orthogonality, so \begin{align*} \mathbb E[|X-Z|^2]=\mathbb E[|X-\hat X|^2]+\mathbb E[|\hat X-Z|^2]. \end{align*} Since $\mathbb E[|\hat X-Z|^2]\geq 0$, it follows that \begin{align*} \mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2]. \end{align*}

[guided] Now assume the orthogonality condition: \begin{align*} \mathbb E[(X-\hat X)^\top Y]=0 \end{align*} for every $Y \in \mathcal H$. We want to prove that $\hat X$ minimizes the squared error. Take an arbitrary competitor $Z \in \mathcal H$. The useful comparison vector is not $Z$ itself, but the displacement from $Z$ to $\hat X$. Define $W \in L^2(\Omega;\mathbb R^n)$ by \begin{align*} W := \hat X-Z. \end{align*} Because $\mathcal H$ is a linear subspace and both $\hat X$ and $Z$ lie in $\mathcal H$, the difference $W$ also lies in $\mathcal H$. Therefore the orthogonality hypothesis applies to $W$, giving \begin{align*} \mathbb E[(X-\hat X)^\top W]=0. \end{align*}

Define $R \in L^2(\Omega;\mathbb R^n)$ by $R:=X-\hat X$. Then \begin{align*} X-Z=(X-\hat X)+(\hat X-Z)=R+W. \end{align*} Expanding the Euclidean square pointwise gives \begin{align*} |X-Z|^2=|R+W|^2=|R|^2+2R^\top W+|W|^2. \end{align*} Taking expectations, and using the orthogonality just obtained for $W$, we get \begin{align*} \mathbb E[|X-Z|^2]=\mathbb E[|R|^2]+2\mathbb E[R^\top W]+\mathbb E[|W|^2]=\mathbb E[|R|^2]+\mathbb E[|W|^2]. \end{align*} Substituting back $R=X-\hat X$ and $W=\hat X-Z$ gives the Pythagorean identity \begin{align*} \mathbb E[|X-Z|^2]=\mathbb E[|X-\hat X|^2]+\mathbb E[|\hat X-Z|^2]. \end{align*}

The second term is nonnegative because it is the expectation of a nonnegative [random variable](/page/Random%20Variable). Hence \begin{align*} \mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2]. \end{align*} This proves that the arbitrary competitor $Z$ cannot improve on $\hat X$. [/guided] [/step]

[step:Conclude the equivalence] The previous step shows that every $Z \in \mathcal H$ satisfies \begin{align*} \mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2], \end{align*} so $\hat X$ minimizes the least-squares error over $\mathcal H$. Together with the first two steps, this proves both implications and completes the proof of the equivalence.

[guided] The previous step proved the key inequality for an arbitrary competitor $Z \in \mathcal H$: \begin{align*} \mathbb E[|X-Z|^2]\geq \mathbb E[|X-\hat X|^2]. \end{align*} This is exactly the statement that $\hat X$ minimizes the least-squares error over $\mathcal H$. The first two steps proved the other implication, namely that minimality forces the orthogonality condition. Therefore the two conditions are equivalent. [/guided] [/step]
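
As a sanity check on the two implications, the following is a minimal numerical sketch, not part of the proof. It assumes a hypothetical finite probability space with six equally likely outcomes, the scalar case $n=1$, the target $X(\omega)=\omega^2$, and the subspace $\mathcal H=\operatorname{span}\{1,Y\}$ with $Y(\omega)=\omega$; all of these choices and all function names are illustrative assumptions. The candidate $\hat X$ is built from the orthogonality conditions, and the code then checks exactly (with rational arithmetic) that the residual is orthogonal to the spanning elements and that the Pythagorean identity holds for one competitor $Z$.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely outcomes 0..5.
OUTCOMES = range(6)
PROB = Fraction(1, 6)


def expect(f):
    """Expectation of a random variable given as a function of the outcome."""
    return sum(PROB * f(w) for w in OUTCOMES)


def X(w):
    """Target random variable (illustrative choice)."""
    return Fraction(w * w)


def Y(w):
    """Together with the constant 1, spans the subspace H = span{1, Y}."""
    return Fraction(w)


# Orthogonality conditions E[(X - a - b*Y) * 1] = 0 and E[(X - a - b*Y) * Y] = 0,
# solved as a 2x2 linear system for the coefficients a and b.
m_y = expect(Y)
m_yy = expect(lambda w: Y(w) ** 2)
m_x = expect(X)
m_xy = expect(lambda w: X(w) * Y(w))
b = (m_xy - m_x * m_y) / (m_yy - m_y ** 2)
a = m_x - b * m_y


def X_hat(w):
    """Candidate estimator in H determined by the orthogonality conditions."""
    return a + b * Y(w)


def R(w):
    """Residual X - X_hat."""
    return X(w) - X_hat(w)


def Z(w):
    """An arbitrary competitor in H (illustrative choice)."""
    return Fraction(1) + 4 * Y(w)


def mse(estimator):
    """Mean squared error E[|X - estimator|^2]."""
    return expect(lambda w: (X(w) - estimator(w)) ** 2)


# Residual is orthogonal to both spanning elements of H.
orth_one = expect(R)
orth_Y = expect(lambda w: R(w) * Y(w))

# Pythagorean identity: E|X-Z|^2 - E|X-X_hat|^2 - E|X_hat-Z|^2 should vanish.
gap = mse(Z) - mse(X_hat) - expect(lambda w: (X_hat(w) - Z(w)) ** 2)

print("orthogonality:", orth_one, orth_Y)
print("Pythagorean gap:", gap)
print("errors:", mse(X_hat), "<=", mse(Z))
```

Exact `Fraction` arithmetic is used so that the orthogonality and Pythagorean checks are equalities rather than floating-point approximations. Orthogonality against the two spanning elements suffices here because the map $Z \mapsto \mathbb E[R^\top Z]$ is linear in $Z$.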


