Dudley Chaining Bound for Separable Gaussian Processes

Theorem

Proof

[proofplan] We prove the bound by chaining each $G_t$ to finite nets at dyadic scales in the canonical pseudometric. The increments at each scale are finitely many Gaussian variables with variance controlled by the scale, so a finite Gaussian maximal estimate bounds the expected largest increment by the scale times the square root of the logarithm of the covering number. We first prove the estimate on finite subsets of a countable separability set, then pass to the countable supremum and finally to $T$ by the stated separability assumption. The dyadic entropy sum is compared with the entropy integral, and all numerical constants are absorbed into a universal constant. [/proofplan] [step:Reduce to the positive-diameter case and fix dyadic nets] Let \begin{align*} D:=\operatorname{diam}(T,d_G). \end{align*} Since $(T,d_G)$ is [totally bounded](/page/Totally%20Bounded), $D<\infty$. If $D=0$, then for every $s,t\in T$, \begin{align*} \mathbb E\left[|G_s-G_t|^2\right]=0. \end{align*} Hence $G_s=G_t$ almost surely for each fixed pair $s,t\in T$. Let $S\subset T$ be the countable dense set from the separability hypothesis. Since $S\times S$ is countable, there is an event $A\in\mathcal F$ with $\mathbb P(A)=1$ such that $G_s(\omega)=G_t(\omega)$ for all $s,t\in S$ and all $\omega\in A$. Therefore \begin{align*} \sup_{s,t\in S}|G_t-G_s|=0 \end{align*} almost surely, and separability gives \begin{align*} \sup_{s,t\in T}|G_t-G_s|=0 \end{align*} almost surely. The desired inequality follows, since the expected supremum is then zero. Assume now that $D>0$. Choose a point $t_0\in T$ and define \begin{align*} T_0:=\{t_0\}. \end{align*} For each integer $k\ge 0$, define \begin{align*} \varepsilon_k:=2^{-k}D, \end{align*} so that $\varepsilon_0=D$ and $d_G(t,t_0)\le \varepsilon_0$ for every $t\in T$. Since $(T,d_G)$ is totally bounded, for each $k\ge 1$ there exists a finite $\varepsilon_k$-net $T_k\subset T$ with \begin{align*} |T_k|=N(T,d_G,\varepsilon_k). \end{align*} For every $k\ge 1$ and every $t\in T$, choose one point $\pi_k(t)\in T_k$ satisfying \begin{align*} d_G(t,\pi_k(t))\le \varepsilon_k. 
\end{align*} Also define the map $\pi_0:T\to T_0$ by $\pi_0(t):=t_0$ for all $t\in T$. [guided] The case $D=0$ has to be removed before building dyadic scales, because the scales $\varepsilon_k=2^{-k}D$ would all vanish. In that case the canonical pseudometric says that every increment has zero second moment: \begin{align*} \mathbb E\left[|G_s-G_t|^2\right]=0. \end{align*} A nonnegative [random variable](/page/Random%20Variable) with expectation zero is zero almost surely, so $G_s=G_t$ almost surely for each fixed pair $s,t\in T$. We cannot immediately intersect over all pairs in $T\times T$, because $T$ may be uncountable. This is precisely why the theorem assumes separability: on the [countable set](/page/Countable%20Set) $S$, the intersection over all pairs $(s,t)\in S\times S$ is still a probability-one event. Thus \begin{align*} \sup_{s,t\in S}|G_t-G_s|=0 \end{align*} almost surely, and the separability identity transfers this to the full index set: \begin{align*} \sup_{s,t\in T}|G_t-G_s|=0 \end{align*} almost surely. Now suppose $D>0$. We choose a coarse base point $t_0\in T$ and set $T_0=\{t_0\}$. At level $k\ge 0$, the scale is \begin{align*} \varepsilon_k=2^{-k}D, \end{align*} so the base level has scale $\varepsilon_0=D$, which bounds $d_G(t,t_0)$ for every $t\in T$. For $k\ge 1$, [total boundedness](/page/Total%20Boundedness) guarantees a finite $\varepsilon_k$-net, and by the definition of the covering number we choose it with minimal cardinality: \begin{align*} |T_k|=N(T,d_G,\varepsilon_k). \end{align*} For every $t\in T$, the fact that $T_k$ is an $\varepsilon_k$-net lets us choose a representative $\pi_k(t)\in T_k$ satisfying \begin{align*} d_G(t,\pi_k(t))\le \varepsilon_k. \end{align*} The representative need not be a nearest point of the net; any point within $\varepsilon_k$ will do. These maps $\pi_k:T\to T_k$ are the chain projections. No measurability condition is imposed on them, because they are deterministic choices on the index set $T$, not random maps on $\Omega$. 
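The net construction can be made concrete on a finite index set. The following is a minimal sketch rather than part of the proof: the helper names `greedy_net`, `project` and `dyadic_nets` are hypothetical, and the toy index set is a grid in $[0,1]$ with $d(s,t)=\sqrt{|s-t|}$, the canonical pseudometric of standard Brownian motion. A greedy pass produces an $\varepsilon_k$-net but not necessarily one of minimal cardinality, whereas the proof takes $|T_k|=N(T,d_G,\varepsilon_k)$.

```python
import math

def diameter(points, dist):
    """Largest pairwise distance D of a finite point set."""
    return max(dist(s, t) for s in points for t in points)

def greedy_net(points, dist, eps):
    """Greedy eps-net: keep a point only if no kept point is within eps of it.

    Every point ends up within eps of the net, but the net is not
    guaranteed to have the minimal cardinality N(T, d, eps) used in the proof.
    """
    net = []
    for p in points:
        if all(dist(p, c) > eps for c in net):
            net.append(p)
    return net

def project(t, net, dist):
    """One admissible choice of pi_k(t): a closest point of the net."""
    return min(net, key=lambda c: dist(t, c))

def dyadic_nets(points, dist, levels):
    """Return D and nets T_0, ..., T_levels at scales eps_k = 2^{-k} D."""
    D = diameter(points, dist)
    nets = [[points[0]]]  # T_0 = {t_0}, with t_0 an arbitrary base point
    for k in range(1, levels + 1):
        nets.append(greedy_net(points, dist, 2.0 ** (-k) * D))
    return D, nets

# Toy index set (an assumption for illustration): a grid in [0, 1] with
# d(s, t) = sqrt(|s - t|), the canonical pseudometric of Brownian motion.
points = [i / 16 for i in range(17)]
dist = lambda s, t: math.sqrt(abs(s - t))
D, nets = dyadic_nets(points, dist, levels=4)

for k, net in enumerate(nets):
    eps_k = 2.0 ** (-k) * D
    worst = max(dist(t, project(t, net, dist)) for t in points)
    # Each row: level, scale, net size, and whether the net covers at that scale.
    print(k, eps_k, len(net), worst <= eps_k)
```

Choosing a closest net point in `project` is only a convenience; the argument uses nothing beyond $d_G(t,\pi_k(t))\le\varepsilon_k$.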
[/guided] [/step] [step:Prove the finite Gaussian maximal estimate used at every scale] [claim:Finite Gaussian maximal estimate] Let $m\in\mathbb N$, and let $Z_1,\dots,Z_m:(\Omega,\mathcal F)\to\mathbb R$ be centred Gaussian random variables. If $\sigma\ge 0$ satisfies \begin{align*} \mathbb E[Z_j^2]\le \sigma^2 \end{align*} for every $1\le j\le m$, then \begin{align*} \mathbb E\left[\max_{1\le j\le m}|Z_j|\right]\le \sigma\sqrt{2\log(2m)}. \end{align*} [/claim] [proof] If $\sigma=0$, then each $Z_j=0$ almost surely, and the estimate is immediate. Assume $\sigma>0$. For every $\lambda>0$, the exponential function $x\mapsto e^{\lambda x}$ is increasing, so \begin{align*} \exp\left(\lambda\max_{1\le j\le m}|Z_j|\right)\le \sum_{j=1}^{m}\left(e^{\lambda Z_j}+e^{-\lambda Z_j}\right). \end{align*} Taking expectation with respect to $\mathbb P$ and using the [moment generating function](/page/Moment%20Generating%20Function) of a centred Gaussian variable gives \begin{align*} \mathbb E\left[\exp\left(\lambda\max_{1\le j\le m}|Z_j|\right)\right]\le 2m\exp\left(\frac{\lambda^2\sigma^2}{2}\right). \end{align*} By [Jensen's inequality](/theorems/9) applied to the concave logarithm, \begin{align*} \lambda\mathbb E\left[\max_{1\le j\le m}|Z_j|\right]\le \log(2m)+\frac{\lambda^2\sigma^2}{2}. \end{align*} Set \begin{align*} \lambda:=\frac{\sqrt{2\log(2m)}}{\sigma}. \end{align*} Substitution yields \begin{align*} \mathbb E\left[\max_{1\le j\le m}|Z_j|\right]\le \sigma\sqrt{2\log(2m)}. \end{align*} [/proof] [/step] [step:Bound the oscillation over an arbitrary finite subset of the separability set] Let $S\subset T$ be the countable dense set from the separability hypothesis. Let $A\subset S$ be a nonempty finite subset. For $m\in\mathbb N$ and $t\in A$, write the telescoping identity \begin{align*} G_{\pi_m(t)}-G_{t_0}=\sum_{k=1}^{m}\left(G_{\pi_k(t)}-G_{\pi_{k-1}(t)}\right). 
\end{align*} For each fixed $t\in A$, \begin{align*} \mathbb E\left[|G_t-G_{\pi_m(t)}|^2\right]=d_G(t,\pi_m(t))^2\le \varepsilon_m^2, \end{align*} so $G_{\pi_m(t)}\to G_t$ in $L^2(\Omega,\mathcal F,\mathbb P)$ and therefore in $L^1(\Omega,\mathcal F,\mathbb P)$. For $u,v\in A$, the triangle inequality gives \begin{align*} |G_u-G_v|\le |G_u-G_{\pi_m(u)}|+|G_v-G_{\pi_m(v)}|+\sum_{k=1}^{m}|G_{\pi_k(u)}-G_{\pi_{k-1}(u)}|+\sum_{k=1}^{m}|G_{\pi_k(v)}-G_{\pi_{k-1}(v)}|. \end{align*} Taking the maximum over $u,v\in A$, then expectation, and then letting $m\to\infty$, gives \begin{align*} \mathbb E\left[\max_{u,v\in A}|G_u-G_v|\right]\le 2\sum_{k=1}^{\infty}\mathbb E\left[\max_{t\in A}|G_{\pi_k(t)}-G_{\pi_{k-1}(t)}|\right]. \end{align*} For $k\ge 1$, every random variable of the form $G_{\pi_k(t)}-G_{\pi_{k-1}(t)}$ is centred Gaussian and has variance \begin{align*} d_G(\pi_k(t),\pi_{k-1}(t))^2\le \left(d_G(\pi_k(t),t)+d_G(t,\pi_{k-1}(t))\right)^2\le (\varepsilon_k+\varepsilon_{k-1})^2\le (2\varepsilon_{k-1})^2. \end{align*} The set of possible pairs $(\pi_k(t),\pi_{k-1}(t))$ is contained in $T_k\times T_{k-1}$. Hence its cardinality is at most $|T_k||T_{k-1}|$. By the finite Gaussian maximal estimate, \begin{align*} \mathbb E\left[\max_{t\in A}|G_{\pi_k(t)}-G_{\pi_{k-1}(t)}|\right]\le 2\varepsilon_{k-1}\sqrt{2\log(2|T_k||T_{k-1}|)}. \end{align*} For $k\ge 2$, the scale satisfies $\varepsilon_k<D/2$. Hence $N(T,d_G,\varepsilon_k)\ge 2$: if a single $\varepsilon_k$-ball covered $T$, then the triangle inequality would give $D\le 2\varepsilon_k<D$, a contradiction. Also $|T_{k-1}|=N(T,d_G,\varepsilon_{k-1})\le N(T,d_G,\varepsilon_k)=|T_k|$, because covering numbers are nondecreasing as the radius decreases. Thus, for $k\ge 2$, \begin{align*} \log(2|T_k||T_{k-1}|)\le \log(2|T_k|^2)\le 3\log |T_k|=3\log N(T,d_G,\varepsilon_k). 
\end{align*} Consequently, for the universal constant $C_1:=2\sqrt{6}$, \begin{align*} \mathbb E\left[\max_{t\in A}|G_{\pi_k(t)}-G_{\pi_{k-1}(t)}|\right]\le C_1\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)} \end{align*} for every $k\ge 2$. The first level will be kept separate, because $N(T,d_G,D/2)$ may equal $1$. From the finite Gaussian maximal estimate with $|T_0|=1$, \begin{align*} \mathbb E\left[\max_{t\in A}|G_{\pi_1(t)}-G_{t_0}|\right]\le 2D\sqrt{2\log(2N(T,d_G,D/2))}. \end{align*} Therefore \begin{align*} \mathbb E\left[\max_{u,v\in A}|G_u-G_v|\right]\le 4D\sqrt{2\log(2N(T,d_G,D/2))}+2C_1\sum_{k=2}^{\infty}\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)}. \end{align*} [guided] The role of the finite set $A\subset S$ is to avoid measurability and limiting difficulties while the chain is being built. For $t\in A$, the chain at depth $m$ is the finite telescoping sum \begin{align*} G_{\pi_m(t)}-G_{t_0}=\sum_{k=1}^{m}\left(G_{\pi_k(t)}-G_{\pi_{k-1}(t)}\right). \end{align*} This identity is purely algebraic: every intermediate term cancels. We must connect $G_{\pi_m(t)}$ back to $G_t$. The canonical pseudometric was defined exactly so that \begin{align*} \mathbb E\left[|G_t-G_{\pi_m(t)}|^2\right]=d_G(t,\pi_m(t))^2. \end{align*} Because $\pi_m(t)$ lies within $\varepsilon_m$ of $t$, \begin{align*} \mathbb E\left[|G_t-G_{\pi_m(t)}|^2\right]\le \varepsilon_m^2. \end{align*} Since $\varepsilon_m\to 0$, this gives $G_{\pi_m(t)}\to G_t$ in $L^2$, hence also in $L^1$. For $u,v\in A$, insert the approximants and use the triangle inequality: \begin{align*} |G_u-G_v|\le |G_u-G_{\pi_m(u)}|+|G_v-G_{\pi_m(v)}|+|G_{\pi_m(u)}-G_{t_0}|+|G_{\pi_m(v)}-G_{t_0}|. \end{align*} Expanding the last two terms by the telescoping identity gives \begin{align*} |G_u-G_v|\le |G_u-G_{\pi_m(u)}|+|G_v-G_{\pi_m(v)}|+\sum_{k=1}^{m}|G_{\pi_k(u)}-G_{\pi_{k-1}(u)}|+\sum_{k=1}^{m}|G_{\pi_k(v)}-G_{\pi_{k-1}(v)}|. 
\end{align*} Taking the maximum over the finite set $A\times A$ and then expectation is legitimate because the maximum of finitely many measurable random variables is measurable. The two endpoint errors vanish in expectation as $m\to\infty$, since $A$ is finite and each error converges to zero in $L^1$. Thus \begin{align*} \mathbb E\left[\max_{u,v\in A}|G_u-G_v|\right]\le 2\sum_{k=1}^{\infty}\mathbb E\left[\max_{t\in A}|G_{\pi_k(t)}-G_{\pi_{k-1}(t)}|\right]. \end{align*} Now fix a level $k\ge 1$. Each increment $G_{\pi_k(t)}-G_{\pi_{k-1}(t)}$ is a centred Gaussian random variable, because it is a linear combination of two coordinates of a centred Gaussian process. Its variance is controlled by the canonical pseudometric: \begin{align*} \mathbb E\left[|G_{\pi_k(t)}-G_{\pi_{k-1}(t)}|^2\right]=d_G(\pi_k(t),\pi_{k-1}(t))^2. \end{align*} The triangle inequality for $d_G$ gives \begin{align*} d_G(\pi_k(t),\pi_{k-1}(t))\le d_G(\pi_k(t),t)+d_G(t,\pi_{k-1}(t))\le \varepsilon_k+\varepsilon_{k-1}\le 2\varepsilon_{k-1}. \end{align*} Thus the finite Gaussian maximal estimate applies with $\sigma=2\varepsilon_{k-1}$. The number of different increments at level $k$ is no larger than the number of possible pairs of net points: \begin{align*} (\pi_k(t),\pi_{k-1}(t))\in T_k\times T_{k-1}. \end{align*} Therefore there are at most $|T_k||T_{k-1}|$ variables to maximize. The finite Gaussian maximal estimate gives \begin{align*} \mathbb E\left[\max_{t\in A}|G_{\pi_k(t)}-G_{\pi_{k-1}(t)}|\right]\le 2\varepsilon_{k-1}\sqrt{2\log(2|T_k||T_{k-1}|)}. \end{align*} For $k\ge 2$, we have $\varepsilon_k<D/2$, and therefore $N(T,d_G,\varepsilon_k)\ge 2$. Indeed, if one $\varepsilon_k$-ball covered $T$, then every two points of $T$ would be at distance at most $2\varepsilon_k<D$, contradicting the definition of $D$ as the supremum of all pairwise distances. 
Since covering numbers cannot decrease when the radius decreases, \begin{align*} |T_{k-1}|=N(T,d_G,\varepsilon_{k-1})\le N(T,d_G,\varepsilon_k)=|T_k|. \end{align*} Hence, for $k\ge 2$, \begin{align*} \log(2|T_k||T_{k-1}|)\le \log(2|T_k|^2)\le 3\log |T_k|=3\log N(T,d_G,\varepsilon_k), \end{align*} where the second inequality uses $|T_k|\ge 2$, so that $\log 2\le \log |T_k|$. Thus the expected maximal level-$k$ increment is bounded by \begin{align*} C_1\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)} \end{align*} with the explicit universal constant $C_1=2\sqrt{6}$. The level $k=1$ is different because $N(T,d_G,D/2)$ can equal $1$. We therefore do not absorb the additive $\log 2$ term into the entropy at that level. The finite Gaussian maximal estimate, applied with $\sigma=2D$ and at most $|T_1||T_0|=N(T,d_G,D/2)$ variables, gives directly \begin{align*} \mathbb E\left[\max_{t\in A}|G_{\pi_1(t)}-G_{t_0}|\right]\le 2D\sqrt{2\log(2N(T,d_G,D/2))}. \end{align*} Summing the first level and the levels $k\ge 2$, and multiplying by the factor $2$ from the chaining inequality, gives \begin{align*} \mathbb E\left[\max_{u,v\in A}|G_u-G_v|\right]\le 4D\sqrt{2\log(2N(T,d_G,D/2))}+2C_1\sum_{k=2}^{\infty}\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)}. \end{align*} [/guided] [/step] [step:Pass from finite subsets to the separable supremum] Enumerate the countable set $S$ as \begin{align*} S=\{q_1,q_2,\dots\}. \end{align*} For $r\in\mathbb N$, define the finite subset \begin{align*} A_r:=\{q_1,\dots,q_r\}. \end{align*} The random variables \begin{align*} M_r:=\max_{u,v\in A_r}|G_u-G_v| \end{align*} satisfy $0\le M_r\le M_{r+1}$ and \begin{align*} \lim_{r\to\infty}M_r=\sup_{u,v\in S}|G_u-G_v| \end{align*} pointwise. By the [monotone convergence theorem](/theorems/509) applied to the [measure space](/page/Measure%20Space) $(\Omega,\mathcal F,\mathbb P)$, \begin{align*} \mathbb E\left[\sup_{u,v\in S}|G_u-G_v|\right]=\lim_{r\to\infty}\mathbb E[M_r]. 
\end{align*} Using the finite-set estimate for each $A_r$ gives \begin{align*} \mathbb E\left[\sup_{u,v\in S}|G_u-G_v|\right]\le 4D\sqrt{2\log(2N(T,d_G,D/2))}+2C_1\sum_{k=2}^{\infty}\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)}. \end{align*} By the separability assumption, \begin{align*} \sup_{s,t\in T}|G_t-G_s|=\sup_{s,t\in S}|G_t-G_s| \end{align*} almost surely. Hence \begin{align*} \mathbb E\left[\sup_{s,t\in T}|G_t-G_s|\right]\le 4D\sqrt{2\log(2N(T,d_G,D/2))}+2C_1\sum_{k=2}^{\infty}\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)}. \end{align*} [/step] [step:Compare the corrected dyadic estimate with the entropy integral] Define the nonincreasing function $H:(0,D]\to[0,\infty)$ by \begin{align*} H(\varepsilon):=\sqrt{\log N(T,d_G,\varepsilon)}. \end{align*} For $\varepsilon\in(0,D/2)$, one has $N(T,d_G,\varepsilon)\ge 2$, because otherwise one ball of radius $\varepsilon$ would cover $T$ and imply $D\le 2\varepsilon<D$. Hence \begin{align*} \int_0^{D/2}H(\varepsilon)\,d\mathcal L^1(\varepsilon)\ge \frac{D}{2}\sqrt{\log 2}. \end{align*} Also $N(T,d_G,D/2)\le N(T,d_G,\varepsilon)$ for every $\varepsilon\in(0,D/2]$, so \begin{align*} D\sqrt{\log(2N(T,d_G,D/2))}\le C_2\int_0^D H(\varepsilon)\,d\mathcal L^1(\varepsilon) \end{align*} for the universal constant $C_2:=2\sqrt{2}/\sqrt{\log 2}$. For $k\ge 2$ and $\varepsilon\in(\varepsilon_{k+1},\varepsilon_k]$, monotonicity of the covering number in the scale gives \begin{align*} H(\varepsilon)\ge H(\varepsilon_k). \end{align*} Therefore \begin{align*} \int_{\varepsilon_{k+1}}^{\varepsilon_k}H(\varepsilon)\,d\mathcal L^1(\varepsilon)\ge (\varepsilon_k-\varepsilon_{k+1})H(\varepsilon_k)=\varepsilon_{k+1}H(\varepsilon_k). \end{align*} Since $\varepsilon_{k-1}=4\varepsilon_{k+1}$, \begin{align*} \varepsilon_{k-1}H(\varepsilon_k)\le 4\int_{\varepsilon_{k+1}}^{\varepsilon_k}H(\varepsilon)\,d\mathcal L^1(\varepsilon). 
\end{align*} Summing over $k\ge 2$ yields \begin{align*} \sum_{k=2}^{\infty}\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)}\le 4\int_0^D\sqrt{\log N(T,d_G,\varepsilon)}\,d\mathcal L^1(\varepsilon). \end{align*} Combining these estimates with the previous step gives \begin{align*} \mathbb E\left[\sup_{s,t\in T}|G_t-G_s|\right]\le (4\sqrt{2}C_2+8C_1)\int_0^D\sqrt{\log N(T,d_G,\varepsilon)}\,d\mathcal L^1(\varepsilon). \end{align*} Thus the theorem holds with the universal constant \begin{align*} C:=4\sqrt{2}C_2+8C_1. \end{align*} If the entropy integral is finite, the displayed inequality immediately implies \begin{align*} \mathbb E\left[\sup_{s,t\in T}|G_t-G_s|\right]<\infty. \end{align*} [guided] The chaining estimate from the previous step has two pieces: the first level and the levels $k\ge 2$. We compare both pieces with the same entropy integral. Define \begin{align*} H(\varepsilon):=\sqrt{\log N(T,d_G,\varepsilon)}, \qquad 0<\varepsilon\le D. \end{align*} This function is nonincreasing in the radius variable: if the allowed radius becomes smaller, at least as many balls are needed, so the covering number cannot decrease. First handle the first-level term. For every $\varepsilon\in(0,D/2)$, the covering number satisfies $N(T,d_G,\varepsilon)\ge 2$. If this failed, then a single ball of radius $\varepsilon$ would contain all of $T$; by the triangle inequality, any two points of $T$ would then have distance at most $2\varepsilon<D$, contradicting the definition of $D$ as the supremum of pairwise distances. Therefore \begin{align*} \int_0^{D/2}H(\varepsilon)\,d\mathcal L^1(\varepsilon)\ge \frac{D}{2}\sqrt{\log 2}. \end{align*} Moreover $N(T,d_G,D/2)\le N(T,d_G,\varepsilon)$ for $0<\varepsilon\le D/2$, so the same interval also controls the factor involving $N(T,d_G,D/2)$: \begin{align*} \int_0^{D/2}H(\varepsilon)\,d\mathcal L^1(\varepsilon)\ge \frac{D}{2}\sqrt{\log N(T,d_G,D/2)}. \end{align*} Since $\log(2N(T,d_G,D/2))=\log 2+\log N(T,d_G,D/2)\le 2\max\{\log 2,\log N(T,d_G,D/2)\}$, the two lower bounds together give \begin{align*} D\sqrt{\log(2N(T,d_G,D/2))}\le 2\sqrt{2}\int_0^{D/2}H(\varepsilon)\,d\mathcal L^1(\varepsilon). \end{align*} Because $\sqrt{\log 2}<1$ and $H\ge 0$, it follows in particular that \begin{align*} D\sqrt{\log(2N(T,d_G,D/2))}\le C_2\int_0^D H(\varepsilon)\,d\mathcal L^1(\varepsilon) \end{align*} with the universal constant $C_2=2\sqrt{2}/\sqrt{\log 2}$. 
The precise value is not important; what matters is that the constant is numerical and does not depend on the process or on $T$. Now handle the tail of the dyadic sum. Fix $k\ge 2$. On the interval $(\varepsilon_{k+1},\varepsilon_k]$, monotonicity gives \begin{align*} H(\varepsilon)\ge H(\varepsilon_k). \end{align*} Integrating this lower bound with respect to one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) gives \begin{align*} \int_{\varepsilon_{k+1}}^{\varepsilon_k}H(\varepsilon)\,d\mathcal L^1(\varepsilon)\ge (\varepsilon_k-\varepsilon_{k+1})H(\varepsilon_k)=\varepsilon_{k+1}H(\varepsilon_k). \end{align*} Since the scales are dyadic, $\varepsilon_{k-1}=4\varepsilon_{k+1}$, and hence \begin{align*} \varepsilon_{k-1}H(\varepsilon_k)\le 4\int_{\varepsilon_{k+1}}^{\varepsilon_k}H(\varepsilon)\,d\mathcal L^1(\varepsilon). \end{align*} The intervals $(\varepsilon_{k+1},\varepsilon_k]$ for $k\ge 2$ are disjoint subintervals of $(0,D]$, so summing gives \begin{align*} \sum_{k=2}^{\infty}\varepsilon_{k-1}\sqrt{\log N(T,d_G,\varepsilon_k)}\le 4\int_0^D\sqrt{\log N(T,d_G,\varepsilon)}\,d\mathcal L^1(\varepsilon). \end{align*} Combining the first-level estimate and the tail estimate with the bound from the preceding step yields \begin{align*} \mathbb E\left[\sup_{s,t\in T}|G_t-G_s|\right]\le (4\sqrt{2}C_2+8C_1)\int_0^D\sqrt{\log N(T,d_G,\varepsilon)}\,d\mathcal L^1(\varepsilon). \end{align*} Thus the theorem holds with the universal constant \begin{align*} C:=4\sqrt{2}C_2+8C_1. \end{align*} If the entropy integral is finite, this displayed inequality gives the asserted finiteness of the expected oscillation supremum. [/guided] [/step]
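As a sanity check on the last step, the comparison between the dyadic sum and the entropy integral can be evaluated numerically. This is only a sketch under stated assumptions, not part of the proof: the covering function $N(\varepsilon)=\max\{1,\lceil D/(2\varepsilon)\rceil\}$ is a hypothetical toy choice (an interval of length $D=1$ in the usual metric), the dyadic sum is truncated at a finite level, and the integral is replaced by a right-endpoint Riemann sum, which is a lower bound because $H$ is nonincreasing. The function names are illustrative.

```python
import math

# Constants named as in the proof.
C1 = 2 * math.sqrt(6)
C2 = 2 * math.sqrt(2) / math.sqrt(math.log(2))
C = 4 * math.sqrt(2) * C2 + 8 * C1

D = 1.0  # diameter of the toy index set (assumption)

def covering_number(eps):
    """Hypothetical toy covering number: an interval of length D, usual metric."""
    return max(1, math.ceil(D / (2 * eps)))

def H(eps):
    """Square root of the metric entropy at radius eps."""
    return math.sqrt(math.log(covering_number(eps)))

def dyadic_tail(levels):
    """Truncated sum over k = 2..levels of eps_{k-1} * H(eps_k)."""
    return sum(
        2.0 ** (-(k - 1)) * D * H(2.0 ** (-k) * D)
        for k in range(2, levels + 1)
    )

def entropy_integral_lower(cells):
    """Right-endpoint Riemann sum on (0, D]; a lower bound since H is nonincreasing."""
    h = D / cells
    return sum(H(i * h) * h for i in range(1, cells + 1))

tail = dyadic_tail(levels=40)
integral = entropy_integral_lower(cells=2 ** 14)

# First-level term D * sqrt(log(2 N(D/2))) and the full chaining bound
# 4 D sqrt(2 log(2 N(D/2))) + 2 C1 * (dyadic tail).
first_level = D * math.sqrt(math.log(2 * covering_number(D / 2)))
chaining_bound = 4 * math.sqrt(2) * first_level + 2 * C1 * tail

print("tail <= 4 * integral:", tail <= 4 * integral)
print("first level <= C2 * integral:", first_level <= C2 * integral)
print("chaining bound <= C * integral:", chaining_bound <= C * integral)
```

Because the Riemann sum underestimates the integral, each comparison here is slightly stronger than the corresponding inequality in the proof for this toy covering function; it says nothing about other index sets.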
