Whittle Likelihood Approximation Theorem — Statement & Proof

Whittle Likelihood Approximation Theorem (Theorem # 3653)

Theorem


Let $(X_t)_{t\in\mathbb Z}$ be a mean-zero Gaussian stationary process on a probability space $(\Omega,\mathcal F,\mathbb P_{\theta_0})$ whose spectral density is $f_{\theta_0}$ for some $\theta_0$ in a parameter set $\Theta$, where $(f_\theta)_{\theta\in\Theta}$ is a family of spectral densities on $[-\pi,\pi]$. Let $\mathcal L^1$ denote one-dimensional Lebesgue measure on $[-\pi,\pi]$. For each $\theta\in\Theta$, define the autocovariance function $\gamma_\theta:\mathbb Z\to\mathbb R$ by \begin{align*} \gamma_\theta(h)=\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ih\lambda}f_\theta(\lambda)\,d\mathcal L^1(\lambda), \end{align*} and define the Toeplitz covariance matrix $\Gamma_n(\theta)\in\mathbb R^{n\times n}$ by $(\Gamma_n(\theta))_{ij}=\gamma_\theta(i-j)$. Let $\lambda_j=2\pi j/n$, $j=-\lfloor (n-1)/2\rfloor,\dots,\lfloor n/2\rfloor$, be the Fourier frequencies in $(-\pi,\pi]$, and let $C_n(\theta)\in\mathbb R^{n\times n}$ be the positive definite circulant covariance matrix whose eigenvalues are the spectral ordinates $f_\theta(\lambda_j)$, the eigenvalue $f_\theta(\lambda_j)$ belonging to the Fourier eigenvector $(e^{it\lambda_j})_{t=1}^{n}$. Let $X_n=(X_1,\dots,X_n)^\top$ and define the Gaussian log-likelihood $\ell_n(\theta)$ through \begin{align*} -2\ell_n(\theta)=n\log(2\pi)+\log\det\Gamma_n(\theta)+X_n^\top\Gamma_n(\theta)^{-1}X_n, \end{align*} and define the Whittle objective $Q_n(\theta)$, with the same normalising constant $n\log(2\pi)$, by \begin{align*} 2Q_n(\theta)=n\log(2\pi)+\log\det C_n(\theta)+X_n^\top C_n(\theta)^{-1}X_n. \end{align*} Assume that for every compact $K\subset\Theta$ the spectral densities are bounded away from $0$ uniformly, that is, $\inf_{\theta\in K}\inf_{\lambda\in[-\pi,\pi]}f_\theta(\lambda)>0$, that they are sufficiently smooth in $\theta$, and that they satisfy short-memory regularity conditions strong enough to imply the two uniform Toeplitz-circulant approximations \begin{align*} \sup_{\theta\in K}\left|\frac{1}{n}\log\det\Gamma_n(\theta)-\frac{1}{n}\log\det C_n(\theta)\right|\to 0 \end{align*} and \begin{align*} \mathbb E_{\theta_0}\left[\sup_{\theta\in K}\left|\frac{1}{n}X_n^\top\left(\Gamma_n(\theta)^{-1}-C_n(\theta)^{-1}\right)X_n\right|\right]\to 0.
\end{align*} Then, for each compact $K\subset\Theta$, \begin{align*} \sup_{\theta\in K}\left|\frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n}\right|\xrightarrow{\mathbb P}0. \end{align*}
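As a numerical sanity check of the statement, the Python sketch below uses a hypothetical AR(1) family that is not part of the theorem: $\theta=\phi\in K=[-0.5,0.5]$, unit innovation variance, $f_\phi(\lambda)=|1-\phi e^{-i\lambda}|^{-2}$ and $\gamma_\phi(h)=\phi^{|h|}/(1-\phi^2)$, which matches the convention for $\gamma_\theta$ above. Because $C_n(\theta)$ has the Fourier vectors as eigenvectors, $\log\det C_n(\theta)=\sum_j\log f_\theta(\lambda_j)$ and $X_n^\top C_n(\theta)^{-1}X_n=\sum_j I_n(\lambda_j)/f_\theta(\lambda_j)$ with the periodogram $I_n(\lambda)=n^{-1}\left|\sum_{t=1}^n X_te^{-it\lambda}\right|^2$, so $2Q_n$ can be evaluated with one FFT. The helper names (`spec`, `toeplitz_cov`, `circulant_cov`, `quad_objective`, `whittle_fft`) and the true value $\theta_0=0.3$ are illustrative assumptions, and the maximum over a finite grid only approximates the supremum over $K$.

```python
import numpy as np

# Hypothetical example family (assumption, not from the theorem): AR(1) with
# theta = phi, unit innovation variance and f_phi(lam) = 1 / |1 - phi e^{-i lam}|^2,
# matching gamma(h) = (1/2pi) * integral of e^{i h lam} f(lam) over [-pi, pi].
def spec(phi, lam):
    return 1.0 / (1.0 - 2.0 * phi * np.cos(lam) + phi ** 2)

def toeplitz_cov(phi, n):
    # Gamma_n(phi)_{ij} = gamma_phi(i - j) = phi^|i - j| / (1 - phi^2)
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return phi ** lag / (1.0 - phi ** 2)

def circulant_cov(phi, n):
    # C_n(phi)_{st} = (1/n) sum_j f(lam_j) e^{i (s - t) lam_j} with lam_j = 2 pi j / n
    # (the same grid as (-pi, pi] modulo 2 pi), so f(lam_j) is the eigenvalue
    # attached to the Fourier vector (e^{i s lam_j})_s
    c = np.real(np.fft.ifft(spec(phi, 2.0 * np.pi * np.arange(n) / n)))
    return c[np.subtract.outer(np.arange(n), np.arange(n)) % n]

def quad_objective(x, cov):
    # n log(2 pi) + log det(cov) + x^T cov^{-1} x, using a dense O(n^3) solve
    _, logdet = np.linalg.slogdet(cov)
    return len(x) * np.log(2.0 * np.pi) + logdet + x @ np.linalg.solve(cov, x)

def whittle_fft(x, phi):
    # 2 Q_n(phi) via the periodogram I_n(lam_j) = |sum_t x_t e^{-i t lam_j}|^2 / n
    n = len(x)
    f = spec(phi, 2.0 * np.pi * np.arange(n) / n)
    periodogram = np.abs(np.fft.fft(x)) ** 2 / n
    return n * np.log(2.0 * np.pi) + np.sum(np.log(f) + periodogram / f)

# One stationary AR(1) path with theta_0 = 0.3 (assumed true value)
rng = np.random.default_rng(0)
phi0, n_max = 0.3, 800
x_full = np.empty(n_max)
x_full[0] = rng.standard_normal() / np.sqrt(1.0 - phi0 ** 2)  # stationary start
for t in range(1, n_max):
    x_full[t] = phi0 * x_full[t - 1] + rng.standard_normal()

grid = np.linspace(-0.5, 0.5, 21)  # finite grid standing in for K = [-0.5, 0.5]
gaps = {}
for n in (50, 200, 800):
    x = x_full[:n]  # X_n is the first n observations of the same path
    gaps[n] = max(abs(quad_objective(x, toeplitz_cov(p, n)) - whittle_fft(x, p)) / n
                  for p in grid)
    print(f"n={n:4d}  max_grid |(-2 l_n - 2 Q_n) / n| = {gaps[n]:.2e}")
```

The FFT route costs $O(n\log n)$ per evaluation of $Q_n$, against $O(n^3)$ for the dense Toeplitz solve used in this sketch, which is the practical reason for optimising the Whittle objective instead of the exact likelihood.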

Proof

[proofplan] The proof separates the likelihood difference into a deterministic log-determinant term and a random quadratic-form term. The hypotheses state precisely the two uniform Toeplitz-circulant approximations supplied by the short-memory regularity assumptions. The Gaussian, mean-zero stationary, boundedness, smoothness, and short-memory assumptions enter only through those two approximation inputs; once they are available, the rest of the argument is a deterministic limit plus a single probability estimate. The log-determinant term converges uniformly by the first approximation, while the quadratic-form term converges in probability by the second approximation and Markov's inequality. Adding the two estimates gives the asserted uniform convergence after the common normalising constant cancels. [/proofplan] [step:Write the likelihood difference as a sum of determinant and quadratic-form errors] Fix a compact set $K\subset\Theta$. Let $(\Omega,\mathcal F,\mathbb P_{\theta_0})$ denote the probability space carrying the process $(X_t)_{t\in\mathbb Z}$ under the true parameter $\theta_0$, and let $\mathbb{E}_{\theta_0}$ denote expectation with respect to $\mathbb P_{\theta_0}$. For each $n\in\mathbb{N}$, define the random vector \begin{align*} X_n: \Omega &\to \mathbb{R}^n \\ \omega &\mapsto (X_1(\omega),\dots,X_n(\omega))^\top. \end{align*} For $\theta\in K$, define the matrix difference \begin{align*} A_n(\theta):=\Gamma_n(\theta)^{-1}-C_n(\theta)^{-1}\in\mathbb{R}^{n\times n}. \end{align*} The Toeplitz covariance matrix $\Gamma_n(\theta)$ is positive definite because, for $\theta\in K$, the spectral density $f_\theta$ is bounded below by a positive constant on $[-\pi,\pi]$; indeed, for every non-zero $v\in\mathbb R^n$, substituting the definition of $\gamma_\theta$ into $v^\top\Gamma_n(\theta)v=\sum_{j,k=1}^{n}v_jv_k\gamma_\theta(j-k)$ gives \begin{align*} v^\top\Gamma_n(\theta)v=\frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\sum_{j=1}^{n}v_j e^{ij\lambda}\right|^2 f_\theta(\lambda)\,d\mathcal L^1(\lambda).
\end{align*} Since $v\neq 0$, the function $\lambda\mapsto \sum_{j=1}^{n}v_j e^{ij\lambda}=\sum_{j=1}^{n}v_jz^j$, with $z=e^{i\lambda}$, is a non-zero polynomial in $z$ of degree at most $n$, so it vanishes at only finitely many points of $[-\pi,\pi]$. Hence its squared modulus is positive outside a finite set, in particular on a set of positive $\mathcal L^1$-measure. Since $f_\theta$ is bounded below by a positive constant, the displayed integral is strictly positive. The circulant Whittle covariance matrix $C_n(\theta)$ is positive definite by its definition in the theorem statement. Hence both inverses exist and both determinants are positive, so their logarithms are well-defined. Using the definitions of the Gaussian likelihood and the Whittle objective in the theorem statement, \begin{align*} -2\ell_n(\theta)&=n\log(2\pi)+\log\det\Gamma_n(\theta)+X_n^\top\Gamma_n(\theta)^{-1}X_n,\\ 2Q_n(\theta)&=n\log(2\pi)+\log\det C_n(\theta)+X_n^\top C_n(\theta)^{-1}X_n. \end{align*} Subtracting the second identity from the first, the common term $n\log(2\pi)$ cancels, and dividing by $n$ gives \begin{align*} \frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n} =\frac{1}{n}\left(\log\det\Gamma_n(\theta)-\log\det C_n(\theta)\right) +\frac{1}{n}X_n^\top A_n(\theta)X_n. \end{align*} Taking the supremum over $\theta\in K$ and applying the triangle inequality yields \begin{align*} \sup_{\theta\in K}\left|\frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n}\right| &\leq D_n+R_n, \end{align*} where the deterministic error $D_n\in[0,\infty)$ and the random error $R_n:\Omega\to[0,\infty]$ are defined by \begin{align*} D_n&:=\sup_{\theta\in K}\left|\frac{1}{n}\log\det\Gamma_n(\theta)-\frac{1}{n}\log\det C_n(\theta)\right|,\\ R_n&:=\sup_{\theta\in K}\left|\frac{1}{n}X_n^\top A_n(\theta)X_n\right|.
\end{align*} [/step] [step:Use the uniform Toeplitz determinant approximation] By the first uniform Toeplitz-circulant approximation in the theorem statement, applied on the fixed compact set $K\subset\Theta$, we have \begin{align*} D_n=\sup_{\theta\in K}\left|\frac{1}{n}\log\det\Gamma_n(\theta)-\frac{1}{n}\log\det C_n(\theta)\right|\to 0. \end{align*} Since $D_n$ is a deterministic sequence, this convergence implies in particular $D_n\xrightarrow{\mathbb P}0$. [/step] [step:Convert the uniform quadratic-form expectation bound into convergence in probability] The random variable $R_n$ is non-negative by definition; the second hypothesis, which involves $\mathbb{E}_{\theta_0}[R_n]$, implicitly requires $R_n$ to be measurable. That hypothesis gives \begin{align*} \mathbb{E}_{\theta_0}[R_n] =\mathbb{E}_{\theta_0}\left[\sup_{\theta\in K}\left|\frac{1}{n}X_n^\top\left(\Gamma_n(\theta)^{-1}-C_n(\theta)^{-1}\right)X_n\right|\right]\to 0. \end{align*} Let $\varepsilon>0$. Applying [Markov's Inequality](/page/Markov%27s%20Inequality) to the non-negative random variable $R_n$ gives \begin{align*} \mathbb{P}_{\theta_0}(R_n>\varepsilon)\leq \frac{\mathbb{E}_{\theta_0}[R_n]}{\varepsilon}\to 0. \end{align*} Therefore $R_n\xrightarrow{\mathbb P}0$. [guided] The term $R_n$ is the only random part of the likelihood approximation error. We have defined \begin{align*} R_n=\sup_{\theta\in K}\left|\frac{1}{n}X_n^\top\left(\Gamma_n(\theta)^{-1}-C_n(\theta)^{-1}\right)X_n\right|, \end{align*} so $R_n\geq 0$. The theorem statement assumes exactly the uniform quadratic-form approximation \begin{align*} \mathbb{E}_{\theta_0}[R_n]\to 0. \end{align*} To pass from an $L^1$ estimate to convergence in probability, fix $\varepsilon>0$ and apply [Markov's Inequality](/page/Markov%27s%20Inequality) to the non-negative random variable $R_n$: \begin{align*} \mathbb{P}_{\theta_0}(R_n>\varepsilon)\leq \frac{\mathbb{E}_{\theta_0}[R_n]}{\varepsilon}. \end{align*} The denominator $\varepsilon$ is fixed and positive, while the numerator tends to $0$.
Hence \begin{align*} \mathbb{P}_{\theta_0}(R_n>\varepsilon)\to 0. \end{align*} This is precisely the definition of $R_n\xrightarrow{\mathbb P}0$. [/guided] [/step] [step:Combine the two error bounds to obtain the uniform Whittle approximation] Let $\varepsilon>0$. Since $D_n\to 0$, there exists $N\in\mathbb{N}$ such that $D_n\leq \varepsilon/2$ for every $n\geq N$. For such $n$, the bound from the first step gives \begin{align*} \sup_{\theta\in K}\left|\frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n}\right|\leq D_n+R_n\leq \frac{\varepsilon}{2}+R_n, \end{align*} so the left-hand side can exceed $\varepsilon$ only if $R_n>\varepsilon/2$; that is, \begin{align*} \left\{\sup_{\theta\in K}\left|\frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n}\right|>\varepsilon\right\} \subseteq \left\{R_n>\frac{\varepsilon}{2}\right\}. \end{align*} Taking probabilities with respect to the true law $\mathbb{P}_{\theta_0}$ and using $R_n\xrightarrow{\mathbb P}0$, we obtain \begin{align*} \mathbb{P}_{\theta_0}\left(\sup_{\theta\in K}\left|\frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n}\right|>\varepsilon\right) \leq \mathbb{P}_{\theta_0}\left(R_n>\frac{\varepsilon}{2}\right)\to 0. \end{align*} Because $\varepsilon>0$ was arbitrary, this proves \begin{align*} \sup_{\theta\in K}\left|\frac{-2\ell_n(\theta)}{n}-\frac{2Q_n(\theta)}{n}\right|\xrightarrow{\mathbb P}0. \end{align*} [/step]
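The decomposition used in the proof can be inspected numerically. The following sketch again assumes a hypothetical AR(1) family ($\theta=\phi$, unit innovation variance, $f_\phi(\lambda)=|1-\phi e^{-i\lambda}|^{-2}$, true value $\phi_0=0.3$), with a finite grid standing in for $K=[-0.5,0.5]$; the helper names (`spec`, `toeplitz_cov`, `circulant_cov`, `error_terms`, `min_eig_margin`) are illustrative. For each $n$ it reports grid maxima playing the roles of $D_n$ and $R_n$, checks the triangle-inequality bound from the first step, and checks a quantitative form of the positive-definiteness argument: the displayed integral identity together with Parseval's identity gives $v^\top\Gamma_n(\theta)v\geq\left(\inf_\lambda f_\theta(\lambda)\right)|v|^2$, and for this family $\inf_\lambda f_\phi(\lambda)=(1+|\phi|)^{-2}$.

```python
import numpy as np

# Hypothetical AR(1) family (assumption, not part of the theorem):
# theta = phi, unit innovation variance, f_phi(lam) = 1 / |1 - phi e^{-i lam}|^2.
def spec(phi, lam):
    return 1.0 / (1.0 - 2.0 * phi * np.cos(lam) + phi ** 2)

def toeplitz_cov(phi, n):
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return phi ** lag / (1.0 - phi ** 2)  # gamma_phi(i - j)

def circulant_cov(phi, n):
    # Circulant matrix with eigenvalues f_phi(2 pi j / n), j = 0, ..., n - 1
    c = np.real(np.fft.ifft(spec(phi, 2.0 * np.pi * np.arange(n) / n)))
    return c[np.subtract.outer(np.arange(n), np.arange(n)) % n]

def error_terms(x, grid):
    """Grid maxima of |total gap|, |log-det error| (D_n) and |quadratic error| (R_n)."""
    n = len(x)
    gaps, det_errs, quad_errs = [], [], []
    for phi in grid:
        G, C = toeplitz_cov(phi, n), circulant_cov(phi, n)
        det_err = (np.linalg.slogdet(G)[1] - np.linalg.slogdet(C)[1]) / n
        A = np.linalg.inv(G) - np.linalg.inv(C)  # A_n(theta)
        quad_err = x @ A @ x / n
        gaps.append(abs(det_err + quad_err))  # the n log(2 pi) terms have cancelled
        det_errs.append(abs(det_err))
        quad_errs.append(abs(quad_err))
    return max(gaps), max(det_errs), max(quad_errs)

def min_eig_margin(grid, n):
    """Min over the grid of lambda_min(Gamma_n(phi)) - inf_lambda f_phi(lambda)."""
    return min(np.linalg.eigvalsh(toeplitz_cov(phi, n)).min() - 1.0 / (1.0 + abs(phi)) ** 2
               for phi in grid)

# One stationary AR(1) path with phi_0 = 0.3 (assumed true value)
rng = np.random.default_rng(1)
phi0, n_max = 0.3, 800
x_full = np.empty(n_max)
x_full[0] = rng.standard_normal() / np.sqrt(1.0 - phi0 ** 2)
for t in range(1, n_max):
    x_full[t] = phi0 * x_full[t - 1] + rng.standard_normal()

grid = np.linspace(-0.5, 0.5, 11)  # finite stand-in for K
results = {}
for n in (50, 200, 800):
    results[n] = error_terms(x_full[:n], grid)
    sup_gap, d_n, r_n = results[n]
    print(f"n={n:4d}  sup gap={sup_gap:.2e}  D_n={d_n:.2e}  R_n={r_n:.2e}  "
          f"bound holds: {sup_gap <= d_n + r_n + 1e-12}")
```

For this particular family one can check that $A_n(\phi)$ vanishes except in its four corner entries ($-\phi^2$ at positions $(1,1)$ and $(n,n)$, $\phi$ at $(1,n)$ and $(n,1)$), so both $D_n$ and $R_n$ are of order $1/n$ here; for general short-memory families the theorem relies on its two approximation hypotheses instead of such an explicit formula.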

Explore Further

Distribution of the Sample Mean of a Multivariate Normal Sample (probability)
One-Way MANOVA Sum-of-Squares-and-Products Decomposition (probability)
Linear Forecast from the Wold Representation (probability)
Asymptotic Normality of ARMA Maximum Likelihood Estimators (probability)
Law of the Unconscious Statistician (probability)
Linear Filter Spectral Transformation Theorem (probability)
Bartlett Chi-Squared Approximation for Wilks' Lambda Statistic (probability)
Prediction Error Decomposition for the Linear Gaussian State Space Likelihood (probability)
