Variational Characterization of Cubic Smoothing Splines

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove existence by the direct method. The second-derivative term controls functions modulo affine functions, while the data term at the two distinct sites $x_1$ and $x_n$ controls the affine part. Uniqueness follows from convexity: equality in the convexity inequality forces the difference of two minimizers to be affine and to vanish at every data site, hence to vanish identically. Finally, the first variation gives a weak Euler equation; testing away from the data sites shows that the fourth derivative vanishes on each open interval between knots, and [integration by parts](/theorems/210) gives the natural boundary conditions. [/proofplan] [step:Control the affine part and obtain coercivity in $H^2([a,b])$] Let $I := [a,b]$ and let $L := b-a > 0$. For $u \in H^2(I)$, define the affine map \begin{align*} A_u: I \to \mathbb{R} \end{align*} by requiring $A_u(x_1)=u(x_1)$ and $A_u(x_n)=u(x_n)$. This is well-defined because $x_1<x_n$. Explicitly, \begin{align*} A_u(t) = u(x_1) + \frac{u(x_n)-u(x_1)}{x_n-x_1}(t-x_1). \end{align*} Define the remainder map $w_u: I \to \mathbb{R}$ by \begin{align*} w_u = u-A_u. \end{align*} Then $w_u \in H^2(I)$, $w_u(x_1)=w_u(x_n)=0$, and $w_u''=u''$ in $L^2(I)$. We first record the one-dimensional estimate needed for coercivity. If $w \in H^2(I)$ and $w(x_1)=w(x_n)=0$, then \begin{align*} \|w\|_{H^2(I)} \leq C_0 \|w''\|_{L^2(I)} \end{align*} for a constant $C_0=C_0(a,b,x_1,x_n)>0$. Indeed, since $w \in H^2(I)$, its [weak derivative](/page/Weak%20Derivative) $w'$ has an absolutely continuous representative and \begin{align*} w'(t)=w'(x_1)+\int_{x_1}^t w''(r)\,d\mathcal{L}^1(r) \end{align*} for every $t \in I$. Also \begin{align*} 0=w(x_n)-w(x_1)=\int_{x_1}^{x_n} w'(t)\,d\mathcal{L}^1(t). \end{align*} Therefore \begin{align*} |w'(x_1)| \leq \frac{1}{x_n-x_1}\int_{x_1}^{x_n}\left|\int_{x_1}^t w''(r)\,d\mathcal{L}^1(r)\right|d\mathcal{L}^1(t). \end{align*} By the [Cauchy-Schwarz inequality](/theorems/432) on $[x_1,t]$, \begin{align*} \left|\int_{x_1}^t w''(r)\,d\mathcal{L}^1(r)\right| \leq |t-x_1|^{1/2}\|w''\|_{L^2(I)}. \end{align*} Hence $|w'(x_1)| \leq C_1\|w''\|_{L^2(I)}$ for \begin{align*} C_1 := \frac{1}{x_n-x_1}\int_{x_1}^{x_n} |t-x_1|^{1/2}\,d\mathcal{L}^1(t). \end{align*} It follows that \begin{align*} |w'(t)| \leq (C_1+L^{1/2})\|w''\|_{L^2(I)} \end{align*} for all $t \in I$. Since $w(x_1)=0$, \begin{align*} |w(t)| \leq \int_{x_1}^t |w'(r)|\,d\mathcal{L}^1(r) \leq L(C_1+L^{1/2})\|w''\|_{L^2(I)}. \end{align*} Combining the bounds for $w$, $w'$, and $w''$ gives the asserted estimate with an explicit constant $C_0$ depending only on $a,b,x_1,x_n$. Now let $(u_k)_{k=1}^\infty$ be a sequence in $H^2(I)$ such that $J_\lambda(u_k)$ is bounded. Since the data term is bounded, the real sequences $(u_k(x_1))_{k=1}^\infty$ and $(u_k(x_n))_{k=1}^\infty$ are bounded. Hence the affine functions $A_{u_k}$ are bounded in $H^2(I)$. The second-derivative term gives a bound for $\|u_k''\|_{L^2(I)}$, and the estimate above gives a bound for $w_{u_k}=u_k-A_{u_k}$ in $H^2(I)$. Therefore $(u_k)_{k=1}^\infty$ is bounded in $H^2(I)$. [guided] The obstacle in proving coercivity is that the seminorm \begin{align*} u \mapsto \left(\int_a^b |u''(t)|^2\,d\mathcal{L}^1(t)\right)^{1/2} \end{align*} does not see affine functions. Thus the penalty term controls curvature but not vertical shifts or slopes. The data term at two distinct data sites supplies exactly the missing control. For each $u \in H^2(I)$, define the affine function $A_u: I \to \mathbb{R}$ by \begin{align*} A_u(t) = u(x_1) + \frac{u(x_n)-u(x_1)}{x_n-x_1}(t-x_1). \end{align*} The denominator is nonzero because $x_1<x_n$. Then $w_u := u-A_u$ belongs to $H^2(I)$, satisfies $w_u(x_1)=w_u(x_n)=0$, and has $w_u''=u''$ because $A_u''=0$. We need to prove that a function $w \in H^2(I)$ with $w(x_1)=w(x_n)=0$ is controlled by its second derivative. Since $w' \in H^1(I)$, the one-dimensional absolute continuity representative satisfies \begin{align*} w'(t)=w'(x_1)+\int_{x_1}^t w''(r)\,d\mathcal{L}^1(r). \end{align*} The condition $w(x_n)=w(x_1)$ gives \begin{align*} 0=\int_{x_1}^{x_n} w'(t)\,d\mathcal{L}^1(t). \end{align*} Substituting the formula for $w'$ into this identity gives control of the unknown initial slope $w'(x_1)$. Taking absolute values and using the Cauchy-Schwarz inequality on each interval $[x_1,t]$ yields \begin{align*} |w'(x_1)| \leq \frac{1}{x_n-x_1}\int_{x_1}^{x_n}|t-x_1|^{1/2}\|w''\|_{L^2(I)}\,d\mathcal{L}^1(t). \end{align*} Thus $|w'(x_1)| \leq C_1\|w''\|_{L^2(I)}$, where \begin{align*} C_1 := \frac{1}{x_n-x_1}\int_{x_1}^{x_n} |t-x_1|^{1/2}\,d\mathcal{L}^1(t). \end{align*} Returning to the formula for $w'(t)$ gives \begin{align*} |w'(t)| \leq (C_1+L^{1/2})\|w''\|_{L^2(I)} \end{align*} for every $t \in I$. Since $w(x_1)=0$, integrating $w'$ from $x_1$ to $t$ gives \begin{align*} |w(t)| \leq L(C_1+L^{1/2})\|w''\|_{L^2(I)}. \end{align*} These estimates control the $L^2$ norms of $w$ and $w'$ by $\|w''\|_{L^2(I)}$, so there is a constant $C_0=C_0(a,b,x_1,x_n)>0$ such that \begin{align*} \|w\|_{H^2(I)} \leq C_0\|w''\|_{L^2(I)}. \end{align*} Now suppose $J_\lambda(u_k)$ is bounded. The term \begin{align*} \sum_{i=1}^n |u_k(x_i)-y_i|^2 \end{align*} bounds $u_k(x_1)$ and $u_k(x_n)$, because $y_1$ and $y_n$ are fixed [real numbers](/page/Real%20Numbers). Hence the affine interpolants $A_{u_k}$ are bounded in $H^2(I)$. The penalty term bounds $\|u_k''\|_{L^2(I)}$, so the estimate above bounds $u_k-A_{u_k}$ in $H^2(I)$. Adding back the bounded affine part, we conclude that $(u_k)$ is bounded in $H^2(I)$. [/guided] [/step] [step:Apply the direct method to obtain a minimizer] Let \begin{align*} m := \inf_{u \in H^2(I)} J_\lambda(u). \end{align*} Choose a minimizing sequence $(u_k)_{k=1}^\infty$ in $H^2(I)$ such that $J_\lambda(u_k)\to m$. By the coercivity estimate from the previous step, $(u_k)$ is bounded in $H^2(I)$. Since $H^2(I)$ is a [Hilbert space](/page/Hilbert%20Space), it is reflexive. Therefore the bounded sequence $(u_k)_{k=1}^\infty$ has a subsequence, still denoted $(u_k)$, and a function $s \in H^2(I)$ such that \begin{align*} u_k \rightharpoonup s \end{align*} weakly in $H^2(I)$. For each fixed $i \in \{1,\dots,n\}$, define the point-evaluation functional $E_i: H^2(I) \to \mathbb{R}$ by $E_i(u):=u(x_i)$. The one-dimensional estimate proved in the previous step, applied after subtracting the affine interpolant through two fixed points, implies that point evaluation is bounded on $H^2(I)$. Hence $E_i$ is a bounded linear functional, and [weak convergence](/page/Weak%20Convergence) in $H^2(I)$ gives \begin{align*} u_k(x_i) \to s(x_i). \end{align*} The map $u \mapsto u''$ is continuous from $H^2(I)$ to $L^2(I)$, so $u_k'' \rightharpoonup s''$ weakly in $L^2(I)$. The squared $L^2$ norm is weakly lower semicontinuous: for any weakly convergent sequence $f_k \rightharpoonup f$ in a Hilbert space, the identity \begin{align*} \|f\| = \sup_{\|g\|\leq 1} |(f,g)| \end{align*} and passage to the limit in $(f_k,g)$ give $\|f\| \leq \liminf_{k\to\infty}\|f_k\|$. Applying this in $L^2(I)$ gives \begin{align*} \int_a^b |s''(t)|^2\,d\mathcal{L}^1(t) \leq \liminf_{k\to\infty}\int_a^b |u_k''(t)|^2\,d\mathcal{L}^1(t). \end{align*} Together with pointwise convergence of the data values, this gives \begin{align*} J_\lambda(s)\leq \liminf_{k\to\infty}J_\lambda(u_k)=m. \end{align*} Since $m$ is the infimum, $J_\lambda(s)=m$. Thus $s$ is a minimizer. [/step] [step:Use strict convexity modulo affine functions to prove uniqueness] Suppose $s_0,s_1 \in H^2(I)$ are minimizers of $J_\lambda$. Define the midpoint map $s_{1/2}: I \to \mathbb{R}$ by \begin{align*} s_{1/2} := \frac{s_0+s_1}{2}. \end{align*} Since $H^2(I)$ is a [vector space](/page/Vector%20Space), $s_{1/2}\in H^2(I)$. For each $i \in \{1,\dots,n\}$, the function $r \mapsto |r-y_i|^2$ is convex on $\mathbb{R}$, and the map $v \mapsto \int_a^b |v''(t)|^2\,d\mathcal{L}^1(t)$ is convex on $H^2(I)$. Therefore \begin{align*} J_\lambda(s_{1/2}) \leq \frac{1}{2}J_\lambda(s_0)+\frac{1}{2}J_\lambda(s_1)=m. \end{align*} Thus $s_{1/2}$ is also a minimizer, and equality must hold in both convexity inequalities. Equality in the Hilbert-space norm convexity identity gives \begin{align*} s_0''=s_1'' \end{align*} in $L^2(I)$. Hence $h:=s_0-s_1$ satisfies $h''=0$ in $L^2(I)$, so $h$ agrees on $I$ with an affine function. Equality in the scalar convexity inequality for the data term gives \begin{align*} s_0(x_i)=s_1(x_i) \end{align*} for every $i \in \{1,\dots,n\}$. In particular, \begin{align*} h(x_1)=h(x_n)=0. \end{align*} An affine function vanishing at two distinct points is identically zero. Hence $h=0$ on $I$, so $s_0=s_1$. The minimizer is unique. [/step] [step:Derive the weak Euler equation from the first variation] Let $s \in H^2(I)$ be the unique minimizer. For an arbitrary direction $v \in H^2(I)$, define the variation map $\Phi_v: \mathbb{R} \to \mathbb{R}$ by $\Phi_v(\varepsilon):=J_\lambda(s+\varepsilon v)$. Since $s$ minimizes $J_\lambda$, the real function $\Phi_v$ has a minimum at $\varepsilon=0$. Differentiating the polynomial expression in $\varepsilon$ gives \begin{align*} 0=\Phi_v'(0)=2\sum_{i=1}^n (s(x_i)-y_i)v(x_i)+2\lambda\int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t). \end{align*} Dividing by $2$ yields the Euler equation \begin{align*} \lambda\int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t)+\sum_{i=1}^n (s(x_i)-y_i)v(x_i)=0 \end{align*} for every $v \in H^2(I)$. [/step] [step:Show that the fourth derivative vanishes away from the data sites] Let $U$ be a connected component of $(a,b)\setminus\{x_1,\dots,x_n\}$. Let $\varphi: U \to \mathbb{R}$ be any smooth compactly supported [test function](/page/Test%20Function) on $U$. Extend $\varphi$ by zero to a function $\tilde{\varphi}: I \to \mathbb{R}$. Then $\tilde{\varphi}\in H^2(I)$ and $\tilde{\varphi}(x_i)=0$ for every $i \in \{1,\dots,n\}$. Substituting $v=\tilde{\varphi}$ in the Euler equation gives \begin{align*} \int_U s''(t)\varphi''(t)\,d\mathcal{L}^1(t)=0. \end{align*} This means that the distributional fourth derivative of the distribution induced by $s|_U$ is zero on $U$. We justify the polynomial conclusion directly. Let $T_s \in \mathcal{D}'(U)$ be the [regular distribution](/page/Regular%20Distribution) defined by \begin{align*} T_s(\psi) := \int_U s(t)\psi(t)\,d\mathcal{L}^1(t) \end{align*} for every $\psi \in C_c^\infty(U)$. Since $D^4T_s=0$, the standard one-dimensional structure theorem for distributions with vanishing derivative gives constants $c_0,c_1,c_2,c_3 \in \mathbb{R}$ such that $T_s$ equals the distribution induced by $c_0+c_1t+c_2t^2+c_3t^3$ on $U$. Because $s\in H^2(U)\subset L^2_{\mathrm{loc}}(U)$, equality of the associated regular distributions implies equality of the functions for $\mathcal{L}^1$-a.e. $t\in U$. Replacing $s$ by its continuous representative, $s$ agrees everywhere on $U$ with that polynomial. Hence $s$ is cubic on every connected component of $(a,b)\setminus\{x_1,\dots,x_n\}$. [/step] [step:Prove that $s''$ has matching one-sided traces at interior data sites] Let $c=x_j$ be a data site with $a<c<b$. Choose $\eta>0$ so small that $(c-\eta,c+\eta)$ contains no data site except $c$. Let $v\in H^2(I)$ be supported in $(c-\eta,c+\eta)$, with $v(c)=0$. Then $v(x_i)=0$ for every $i\in\{1,\dots,n\}$, so the Euler equation gives \begin{align*} \int_{c-\eta}^{c+\eta}s''(t)v''(t)\,d\mathcal{L}^1(t)=0. \end{align*} On $(c-\eta,c)$ and $(c,c+\eta)$ the function $s$ is a polynomial of degree at most $3$, so ordinary [integration by parts](/theorems/2098) twice on each side is valid and the fourth-derivative interior terms vanish. The support of $v$ removes the outer endpoint contributions, while the condition $v(c)=0$ removes the terms involving the one-sided third derivatives. The remaining contribution is \begin{align*} \bigl(s''(c-)-s''(c+)\bigr)v'(c)=0, \end{align*} where $s''(c-)$ and $s''(c+)$ denote the left and right polynomial traces of $s''$ at $c$. Since $v'(c)$ can be prescribed arbitrarily while keeping $v(c)=0$ and keeping the support in $(c-\eta,c+\eta)$, we obtain \begin{align*} s''(c-)=s''(c+). \end{align*} Thus $s''$ is continuous across every interior data site in the trace sense. [/step] [step:Recover the natural boundary conditions] We now use the Euler equation with test functions supported near the endpoints. Since $s$ is a polynomial of degree at most $3$ on $(a,x_1)$ when $a<x_1$ and on $(x_n,b)$ when $x_n<b$, the one-sided traces $s''(a)$ and $s''(b)$ are well-defined. If $a=x_1$, the trace $s''(a)$ is understood as the right trace from the first open interval adjacent to $a$; similarly, if $b=x_n$, the trace $s''(b)$ is the left trace. Choose $\eta>0$ so small that $(a,a+\eta)$ contains no data site other than possibly $a$. Choose $v\in H^2(I)$ with support contained in $[a,a+\eta)$ and with $v(a)=0$. Then $v(x_i)=0$ for every data site in the support. On $(a,a+\eta)$ the function $s$ is a polynomial of degree at most $3$ except possibly at $a$ itself, so ordinary integration by parts twice gives the left endpoint contribution $-s''(a)v'(a)+s'''(a)v(a)$. The term $s'''(a)v(a)$ is zero because $v(a)=0$, the interior term vanishes because $s^{(4)}=0$ on $(a,a+\eta)$, and the support removes the right endpoint contribution. Hence \begin{align*} \int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t)=-s''(a)v'(a). \end{align*} The Euler equation then gives \begin{align*} s''(a)v'(a)=0. \end{align*} Since $v'(a)$ can be prescribed arbitrarily while keeping $v(a)=0$, it follows that \begin{align*} s''(a)=0. \end{align*} For the right endpoint, choose $\eta>0$ so small that $(b-\eta,b)$ contains no data site other than possibly $b$. Choose $v\in H^2(I)$ with support contained in $(b-\eta,b]$ and with $v(b)=0$. Then $v(x_i)=0$ for every data site in the support. On $(b-\eta,b)$ the function $s$ is a polynomial of degree at most $3$ except possibly at $b$ itself. Integrating by parts twice gives the right endpoint contribution $s''(b)v'(b)-s'''(b)v(b)$. The term involving $v(b)$ is zero, the interior term vanishes because $s^{(4)}=0$, and the support removes the left endpoint contribution. Therefore \begin{align*} \int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t)=s''(b)v'(b). \end{align*} The Euler equation gives \begin{align*} s''(b)v'(b)=0. \end{align*} Since $v'(b)$ can be prescribed arbitrarily while keeping $v(b)=0$, we obtain \begin{align*} s''(b)=0. \end{align*} Thus the minimizer satisfies the natural boundary conditions. [guided] The Euler equation says that curvature penalty and data residuals balance: \begin{align*} \lambda\int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t)+\sum_{i=1}^n (s(x_i)-y_i)v(x_i)=0. \end{align*} To detect the boundary condition at $a$, choose $\eta>0$ so small that $(a,a+\eta)$ contains no data site other than possibly $a$. We choose $v$ supported in $[a,a+\eta)$ and impose $v(a)=0$. This makes the data sum disappear: if $a$ itself is a data site, then its contribution is zero because $v(a)=0$, and there are no other data sites in the support. On the interval $(a,a+\eta)$, the previous steps show that $s$ is cubic, so $s^{(4)}=0$ there. Ordinary integration by parts twice on $(a,a+\eta)$ gives the left endpoint contribution \begin{align*} -s''(a)v'(a)+s'''(a)v(a). \end{align*} The second term is zero because $v(a)=0$. The interior term vanishes because $s^{(4)}=0$ on $(a,a+\eta)$, and the support of $v$ removes the right endpoint contribution. Therefore the Euler equation forces \begin{align*} 0=\lambda\int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t)=-\lambda s''(a)v'(a). \end{align*} Since $\lambda>0$ and $v'(a)$ can be chosen as any real number while keeping $v(a)=0$, the only possibility is \begin{align*} s''(a)=0. \end{align*} Now detect the boundary condition at $b$. Choose $\eta>0$ so small that $(b-\eta,b)$ contains no data site other than possibly $b$. Choose $v$ supported in $(b-\eta,b]$ and impose $v(b)=0$. The data sum again vanishes, because the only possible data site in the support is $b$ and $v(b)=0$. On $(b-\eta,b)$, the function $s$ is cubic, so $s^{(4)}=0$. Integrating by parts twice gives the right endpoint contribution \begin{align*} s''(b)v'(b)-s'''(b)v(b)=s''(b)v'(b). \end{align*} The Euler equation therefore gives \begin{align*} 0=\lambda\int_a^b s''(t)v''(t)\,d\mathcal{L}^1(t)=\lambda s''(b)v'(b). \end{align*} Since $\lambda>0$ and $v'(b)$ can be chosen as any real number while keeping $v(b)=0$, we obtain \begin{align*} s''(b)=0. \end{align*} These are exactly the natural boundary conditions for a cubic spline. [/guided] [/step] [step:Identify the minimizer as a natural cubic spline with knots among the data sites] The preceding steps show that $s$ agrees with a polynomial of degree at most $3$ on each connected component of $[a,b]\setminus\{x_1,\dots,x_n\}$ and satisfies \begin{align*} s''(a)=s''(b)=0. \end{align*} Since $s\in H^2(I)$, the representatives of $s$ and $s'$ are continuous on $I$. The matching-trace step proves that the one-sided traces of $s''$ agree at every interior data site. Therefore the polynomial pieces match with the standard $C^2$ spline regularity, and the only possible changes in the cubic polynomial pieces occur at the data sites $x_1,\dots,x_n$. Hence the knots of $s$ are contained in $\{x_1,\dots,x_n\}$. Therefore the unique minimizer of $J_\lambda$ is a natural cubic spline with knots among the data sites. This proves the theorem. [/step]

Prerequisites (0/8 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.