Stochastic Scale of Local Polynomial Derivative Estimators (Theorem #6352)

Proof

[proofplan] Let $h_n>0$ denote the bandwidth sequence appearing in the local polynomial estimator. The local polynomial normal equations separate the stochastic scale for estimating the regression level from the deterministic rescaling needed to recover a derivative coefficient. A function-value coefficient has effective sample size $nh_n$, hence stochastic scale $(nh_n)^{-1/2}$. The local-coordinate monomial of degree $\nu$ is expressed in units of $h_n^\nu$, so converting that coefficient into the $\nu$th derivative multiplies the stochastic fluctuation by $h_n^{-\nu}$, with the factor $\nu!$ changing only the constant. [/proofplan]
[step:Record the asymptotic variance supplied by local polynomial normality] Let $h_n>0$ denote the bandwidth used by the local polynomial estimator at sample size $n$. Let $o_{\mathbb P}(1)$ denote a sequence of real-valued random variables converging to $0$ in probability, and let $A_n \xrightarrow{\mathbb P} A$ denote convergence in probability. In this proof, the stochastic standard deviation is the conditional standard deviation given the design variables $X_1,\dots,X_n$.
The local polynomial asymptotic normality assumptions are used through their conditional variance expansion at the interior point $x$; in particular, the hypotheses include the measurability, interior-point, bandwidth, design, and moment conditions needed for that expansion. Thus there exists a deterministic constant $V_{p,\nu}(x) \in (0,\infty)$ such that \begin{align*} \operatorname{Var}\!\left(\hat m_p^{(\nu)}(x) \mid X_1,\dots,X_n\right) = \frac{V_{p,\nu}(x)}{n h_n^{2\nu+1}}\,(1+o_{\mathbb P}(1)). \end{align*}
Since $1+o_{\mathbb P}(1) \xrightarrow{\mathbb P} 1$, this multiplicative factor is positive with probability tending to one. On that event the square root is well-defined, and the square-root map is continuous in a neighbourhood of $1$. Therefore the [Continuous Mapping Theorem](/theorems/1847) gives $\sqrt{1+o_{\mathbb P}(1)}=1+o_{\mathbb P}(1)$. 
Taking square roots gives \begin{align*} \operatorname{sd}\!\left(\hat m_p^{(\nu)}(x) \mid X_1,\dots,X_n\right) = \sqrt{V_{p,\nu}(x)}\,(n h_n^{2\nu+1})^{-1/2}\,(1+o_{\mathbb P}(1)). \end{align*} Since $\sqrt{V_{p,\nu}(x)} \in (0,\infty)$ does not depend on $n$ or $h_n$, the stochastic standard deviation is of order $(n h_n^{2\nu+1})^{-1/2}$.
[guided] First fix the asymptotic notation. Let $h_n>0$ denote the bandwidth used by the local polynomial estimator at sample size $n$. The expression $o_{\mathbb P}(1)$ denotes a sequence of real-valued random variables converging to $0$ in probability, and $A_n \xrightarrow{\mathbb P} A$ denotes convergence in probability. The theorem uses stochastic standard deviation in the local-polynomial sense: it is the conditional standard deviation given the observed design variables $X_1,\dots,X_n$.
The hypothesis of local polynomial asymptotic normality is precisely where the variance calculation enters, but we use more than [weak convergence](/page/Weak%20Convergence) of a normalized estimator: we use the conditional variance expansion included among the local-polynomial asymptotic normality assumptions at the interior point $x$. Those assumptions include the measurability of the data, the fact that $x$ is an interior point, and the design, bandwidth, and moment conditions needed for the conditional variance formula. Therefore there is a finite nonzero deterministic constant $V_{p,\nu}(x)$ for which \begin{align*} \operatorname{Var}\!\left(\hat m_p^{(\nu)}(x) \mid X_1,\dots,X_n\right) = \frac{V_{p,\nu}(x)}{n h_n^{2\nu+1}}\,(1+o_{\mathbb P}(1)). \end{align*}
The stochastic standard deviation is the square root of this variance. Since $1+o_{\mathbb P}(1) \xrightarrow{\mathbb P} 1$, the multiplicative factor is positive with probability tending to one; this matches the fact that the left-hand side is a variance and justifies taking the square root on an event whose probability tends to one. 
The map $r \mapsto \sqrt r$ is continuous at $r=1$, so the [Continuous Mapping Theorem](/theorems/1847) gives $\sqrt{1+o_{\mathbb P}(1)}=1+o_{\mathbb P}(1)$. Taking the square root of the variance expansion gives \begin{align*} \operatorname{sd}\!\left(\hat m_p^{(\nu)}(x) \mid X_1,\dots,X_n\right) = \left[ \frac{V_{p,\nu}(x)}{n h_n^{2\nu+1}}\,(1+o_{\mathbb P}(1)) \right]^{1/2}. \end{align*} Pulling the deterministic factor out of the bracket and applying the identity above yields \begin{align*} \operatorname{sd}\!\left(\hat m_p^{(\nu)}(x) \mid X_1,\dots,X_n\right) = \sqrt{V_{p,\nu}(x)}\,(n h_n^{2\nu+1})^{-1/2}\,(1+o_{\mathbb P}(1)). \end{align*}
The factor $\sqrt{V_{p,\nu}(x)}$ depends on fixed features of the problem, such as the kernel, polynomial order, design density, and conditional error variance at $x$, but it does not vary with $n$. Because $V_{p,\nu}(x)$ is finite and nonzero, multiplication by $\sqrt{V_{p,\nu}(x)}$ changes only the leading constant, not the order.
To connect this variance calculation with the local-polynomial coefficients, let $\hat a_{\nu,n}$ denote the fitted coefficient of the monomial $u^\nu$ when the local polynomial is written in the dimensionless coordinate $u=(t-x)/h_n$. The asymptotic variance representation above already proves the stochastic scale; the following rescaling remark only records why the same exponent appears in local coordinates. A function-value coefficient has effective sample size $n h_n$, and converting the coefficient of $u^\nu$ into the $\nu$th derivative multiplies the fluctuation by $h_n^{-\nu}$, while the Taylor factor $\nu!$ is fixed. Thus \begin{align*} h_n^{-\nu}(n h_n)^{-1/2} = (nh_n^{2\nu+1})^{-1/2}. \end{align*} Hence the stochastic standard deviation has order \begin{align*} (nh_n^{2\nu+1})^{-1/2}. \end{align*} [/guided] [/step]
[step:Explain the derivative rescaling behind the exponent $2\nu+1$] Let $m: \mathbb{R}\to\mathbb{R}$ denote the regression function whose $\nu$th derivative at $x$ is estimated by $\hat m_p^{(\nu)}(x)$. 
Define the local-coordinate map $T_n: \mathbb{R}\to\mathbb{R}$ around the fixed interior point $x$ by \begin{align*} T_n(t)=\frac{t-x}{h_n}. \end{align*} For an observation location $t \in \mathbb{R}$, write $u=T_n(t) \in \mathbb{R}$, so that $t=x+h_nu$ and $t-x=h_nu$. Let $\hat a_{\nu,n}$ denote the fitted coefficient of $u^\nu$ in the local polynomial written in the dimensionless coordinate $u$.
This rescaling step only explains the exponent already obtained from the assumed variance representation; it is neither a separate derivation of that representation nor an independent proof of the coefficient variance. Under the usual Taylor normalization, the derivative estimator is obtained from the local-coordinate coefficient by \begin{align*} \hat m_p^{(\nu)}(x)=\nu!\,h_n^{-\nu}\hat a_{\nu,n}. \end{align*}
In local coordinates, the asymptotic variance expansion invoked in the previous step may be read as recording a function-value-scale fluctuation of order $(n h_n)^{-1/2}$ before derivative rescaling. Multiplying by $h_n^{-\nu}$ when passing from the dimensionless coefficient to the $\nu$th derivative gives \begin{align*} h_n^{-\nu}(n h_n)^{-1/2} = (n h_n^{2\nu+1})^{-1/2}. \end{align*}
The derivative convention introduces the multiplicative factor $\nu!$, because the coefficient of $(t-x)^\nu$ in the Taylor expansion is $m^{(\nu)}(x)/\nu!$. Since $\nu!$ is fixed once $\nu$ is fixed, it changes only the leading constant and not the order. This agrees with the variance scale established above and completes the proof. [/step]
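
As a numerical sanity check of the order $(nh_n^{2\nu+1})^{-1/2}$, the following minimal Python sketch computes the exact conditional variance of a local polynomial derivative estimate from the weighted least squares sandwich formula and rescales it by $nh^{2\nu+1}$. The Epanechnikov kernel, the uniform design on $[0,1]$, homoscedastic errors with variance `sigma2`, and all function names are assumptions made for illustration; none of them come from the theorem's hypotheses.

```python
import math
import numpy as np


def epanechnikov(u):
    # Assumed kernel choice: K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere.
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def conditional_variance(design, x, h, p, nu, sigma2=1.0):
    """Conditional variance of the order-p local polynomial estimate of the
    nu-th derivative at x, given the design, assuming homoscedastic errors."""
    u = (design - x) / h              # dimensionless coordinate u = (t - x) / h
    u = u[np.abs(u) <= 1.0]           # points outside the kernel support get zero weight
    w = epanechnikov(u)
    U = np.vander(u, N=p + 1, increasing=True)   # columns 1, u, ..., u^p
    A = U.T @ (w[:, None] * U)                   # U' W U
    B = U.T @ ((w**2)[:, None] * U)              # U' W^2 U
    A_inv = np.linalg.inv(A)
    cov_a = sigma2 * A_inv @ B @ A_inv           # sandwich covariance of the u-coefficients
    # Derivative estimate is nu! * h^{-nu} * a_nu, so the variance picks up (nu!)^2 * h^{-2 nu}.
    return math.factorial(nu) ** 2 * h ** (-2 * nu) * cov_a[nu, nu]


def scaled_variance(design, x, h, p, nu, sigma2=1.0):
    # Multiply by n * h^{2 nu + 1}; this should stabilise near a constant if the rate is right.
    n = design.size
    return n * h ** (2 * nu + 1) * conditional_variance(design, x, h, p, nu, sigma2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    design = rng.uniform(0.0, 1.0, size=20000)   # assumed design: uniform on [0, 1]
    for h in (0.05, 0.1, 0.2):
        print(h, scaled_variance(design, x=0.5, h=h, p=2, nu=1))
```

If the rate statement holds, the printed values should stay roughly constant as $h$ varies, since each one approximates the constant $V_{p,\nu}(x)$ for this particular kernel, design, and error variance.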
