Normal Equations for Finite-Dimensional Least Squares (Theorem # 6863)

Proof

[proofplan]
We first show that the residual $x-p$ is orthogonal to every vector in $V$ by differentiating the squared distance along arbitrary affine lines $p+tv$ inside $V$. Testing this orthogonality relation against each spanning vector $\varphi_i$ gives the normal equations. Finally, the coefficient matrix is the Gram matrix of the spanning family, whose quadratic form is the squared Hilbert norm of a linear combination of the $\varphi_j$; when the family is linearly independent, this proves invertibility and hence uniqueness.
[/proofplan]

[step:Differentiate the squared distance along every direction in $V$]
Let $v \in V$ be arbitrary. Define the function $q_v: \mathbb{R} \to \mathbb{R}$ by
\begin{align*} q_v(t) := \|x-(p+tv)\|_H^2. \end{align*}
Since $V$ is a linear subspace and $p,v \in V$, we have $p+tv \in V$ for every $t \in \mathbb{R}$. The best-approximation property of $p$ therefore gives
\begin{align*} q_v(0) \leq q_v(t) \end{align*}
for every $t \in \mathbb{R}$, so $q_v$ has a minimum at $0$. Expanding the square in the real [Hilbert space](/page/Hilbert%20Space) $H$ gives
\begin{align*} q_v(t) = \|x-p\|_H^2 - 2t(x-p,v)_H + t^2\|v\|_H^2. \end{align*}
Thus $q_v$ is differentiable and
\begin{align*} q_v'(0) = -2(x-p,v)_H. \end{align*}
Since $q_v$ has a minimum at $0$, we have $q_v'(0)=0$. Hence
\begin{align*} (x-p,v)_H = 0. \end{align*}
Because $v \in V$ was arbitrary, $x-p$ is orthogonal to every element of $V$.

[guided]
The point of this step is to convert the metric statement “$p$ is closest to $x$” into the Hilbert-space statement “the error $x-p$ is orthogonal to $V$.” To do this, fix an arbitrary vector $v \in V$ and move from $p$ in the direction $v$. Since $V$ is a linear subspace and $p,v \in V$, every point $p+tv$ with $t \in \mathbb{R}$ still lies in $V$. Define $q_v: \mathbb{R} \to \mathbb{R}$ by
\begin{align*} q_v(t) := \|x-(p+tv)\|_H^2. \end{align*}
The best-approximation hypothesis says that $p$ is at least as close to $x$ as every other vector in $V$. Applying this to the particular vector $p+tv \in V$ gives
\begin{align*} q_v(0) = \|x-p\|_H^2 \leq \|x-(p+tv)\|_H^2 = q_v(t) \end{align*}
for every $t \in \mathbb{R}$. Therefore $q_v$ has a minimum at $0$.

Now expand $q_v(t)$ using bilinearity and symmetry of the real Hilbert [inner product](/page/Inner%20Product):
\begin{align*} q_v(t) = (x-p-tv,x-p-tv)_H. \end{align*}
This becomes
\begin{align*} q_v(t) = \|x-p\|_H^2 - 2t(x-p,v)_H + t^2\|v\|_H^2. \end{align*}
Thus $q_v$ is a polynomial in $t$, so it is differentiable, and its derivative at $0$ is
\begin{align*} q_v'(0) = -2(x-p,v)_H. \end{align*}
A differentiable real-valued function with a minimum at an interior point has derivative zero there, so $q_v'(0)=0$. Hence
\begin{align*} (x-p,v)_H = 0. \end{align*}
Since the direction $v \in V$ was arbitrary, the residual $x-p$ is orthogonal to every vector in $V$.
[/guided]
[/step]

[step:Test the orthogonality relation against the spanning vectors]
For each $i \in \{1,\dots,n\}$, the vector $\varphi_i$ belongs to $V$. Applying the orthogonality relation from the previous step with $v=\varphi_i$ gives
\begin{align*} (x-p,\varphi_i)_H = 0. \end{align*}
Using the representation $p=\sum_{j=1}^n a_j\varphi_j$ and linearity of the inner product in the first variable, we obtain
\begin{align*} 0 = (x,\varphi_i)_H - \left(\sum_{j=1}^n a_j\varphi_j,\varphi_i\right)_H. \end{align*}
Therefore
\begin{align*} 0 = (x,\varphi_i)_H - \sum_{j=1}^n a_j(\varphi_j,\varphi_i)_H. \end{align*}
Rearranging gives
\begin{align*} \sum_{j=1}^n a_j(\varphi_j,\varphi_i)_H = (x,\varphi_i)_H. \end{align*}
This is the desired normal equation for the index $i$. Since $i$ was arbitrary, the full system holds.
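As a small illustration, which is not part of the original statement, assume $H=\mathbb{R}^3$ with the Euclidean inner product, and take the hypothetical data $\varphi_1=(1,0,0)$, $\varphi_2=(1,1,0)$ and $x=(1,2,3)$. The normal equations for $i=1$ and $i=2$ then read
\begin{align*} a_1 + a_2 = 1, \qquad a_1 + 2a_2 = 3. \end{align*}
Solving gives $a_1=-1$ and $a_2=2$, so
\begin{align*} p = -\varphi_1 + 2\varphi_2 = (1,2,0), \qquad x-p = (0,0,3), \end{align*}
and the residual $x-p$ is indeed orthogonal to both $\varphi_1$ and $\varphi_2$.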
[/step]

[step:Identify the coefficient matrix as a Gram matrix]
Define the real $n \times n$ matrix $G$ by
\begin{align*} G_{ij} := (\varphi_j,\varphi_i)_H \end{align*}
for $i,j \in \{1,\dots,n\}$. Define the coefficient vector $a \in \mathbb{R}^n$ by $a=(a_1,\dots,a_n)$ and define the right-hand side vector $b \in \mathbb{R}^n$ by
\begin{align*} b_i := (x,\varphi_i)_H. \end{align*}
The normal equations are exactly the finite-dimensional linear system
\begin{align*} Ga=b. \end{align*}
[/step]

[step:Prove that the Gram matrix is invertible under linear independence]
Assume that $\varphi_1,\dots,\varphi_n$ are linearly independent. To prove that $G$ is invertible, it is enough to prove that its kernel is zero. Let $c=(c_1,\dots,c_n) \in \mathbb{R}^n$ satisfy $Gc=0$. Define the vector $y \in H$ by
\begin{align*} y := \sum_{j=1}^n c_j\varphi_j. \end{align*}
Since $Gc=0$, for each $i \in \{1,\dots,n\}$ we have
\begin{align*} \sum_{j=1}^n c_j(\varphi_j,\varphi_i)_H = 0. \end{align*}
Multiplying the equation for index $i$ by $c_i$ and summing over $i$ gives
\begin{align*} \sum_{i=1}^n c_i\sum_{j=1}^n c_j(\varphi_j,\varphi_i)_H = 0. \end{align*}
By bilinearity of the real Hilbert inner product, the left-hand side is
\begin{align*} \left(\sum_{j=1}^n c_j\varphi_j,\sum_{i=1}^n c_i\varphi_i\right)_H = \|y\|_H^2. \end{align*}
Therefore
\begin{align*} \|y\|_H^2 = 0. \end{align*}
Positive definiteness of the Hilbert norm gives $y=0$. Since $\varphi_1,\dots,\varphi_n$ are linearly independent, the equality
\begin{align*} \sum_{j=1}^n c_j\varphi_j = 0 \end{align*}
implies $c_1=\dots=c_n=0$. Thus $\ker G=\{0\}$, so $G$ is invertible.
[/step]

[step:Conclude uniqueness of the normal-equation solution]
When $\varphi_1,\dots,\varphi_n$ are linearly independent, the previous step shows that the coefficient matrix $G$ is invertible. Hence the linear system $Ga=b$ has exactly one solution $a \in \mathbb{R}^n$. Therefore the normal equations determine a unique coefficient vector, completing the proof.
[/step]
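
The system $Ga=b$ from the proof can be checked numerically. The following is a minimal sketch, assuming $H=\mathbb{R}^d$ with the Euclidean inner product and linearly independent spanning vectors; the function name `solve_normal_equations` and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np


def solve_normal_equations(phis, x):
    """Solve G a = b for spanning vectors given as rows of `phis`.

    Assumes H = R^d with the Euclidean inner product and that the rows
    of `phis` are linearly independent, so the Gram matrix is invertible.
    """
    Phi = np.asarray(phis, dtype=float)  # shape (n, d), row i is phi_i
    x = np.asarray(x, dtype=float)       # shape (d,)
    G = Phi @ Phi.T                      # G[i, j] = (phi_j, phi_i)
    b = Phi @ x                          # b[i] = (x, phi_i)
    a = np.linalg.solve(G, b)            # unique solution since G is invertible
    p = a @ Phi                          # p = sum_j a_j phi_j
    return a, p


# Hypothetical sample data in R^3.
phis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0]

a, p = solve_normal_equations(phis, x)
residual = np.asarray(x) - p

# The residual should be orthogonal to every spanning vector.
print("coefficients:", a)
print("projection:", p)
print("inner products with residual:", np.asarray(phis) @ residual)
```

Forming $G$ explicitly mirrors the proof. For ill-conditioned spanning families, a least-squares routine such as `np.linalg.lstsq` applied to `Phi.T` is usually the numerically safer choice.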
