Peskun Ordering Theorem for Finite Reversible Markov Chains

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first remove the stationary mean of $h$, since adding a constant to the summands does not change the variance. On the finite-dimensional [Hilbert space](/page/Hilbert%20Space) of $\pi$-mean-zero functions, reversibility makes each transition kernel a self-adjoint operator, and irreducibility plus aperiodicity force all eigenvalues to lie strictly between $-1$ and $1$. This gives the resolvent formula $\sigma_P^2(g)=\langle g,(I+P)(I-P)^{-1}g\rangle_\pi$ for centered $g$. Finally, the off-diagonal dominance assumption gives the Dirichlet form ordering $I-P_1 \ge I-P_2$, and inversion reverses the order for positive definite [self-adjoint operators](/page/Self-Adjoint%20Operators), yielding the desired comparison. [/proofplan] [step:Center the observable without changing the variance] Define the stationary mean of $h$ by \begin{align*} \pi(h) := \sum_{x \in E} h(x)\pi(x). \end{align*} Define the centered function $g:E \to \mathbb{R}$ by $g(x):=h(x)-\pi(h)$ for $x \in E$. For every $N \in \mathbb{N}$, \begin{align*} \frac{1}{N}\sum_{n=0}^{N-1} h(X_n) = \pi(h) + \frac{1}{N}\sum_{n=0}^{N-1} g(X_n). \end{align*} Since variance is invariant under addition of a deterministic constant, \begin{align*} \operatorname{Var}\left(\frac{1}{N}\sum_{n=0}^{N-1} h(X_n)\right) = \operatorname{Var}\left(\frac{1}{N}\sum_{n=0}^{N-1} g(X_n)\right). \end{align*} It is therefore enough to prove the result for functions $g:E \to \mathbb{R}$ satisfying $\pi(g)=0$. [/step] [step:Represent each reversible kernel on the mean-zero Hilbert space] Let $L^2(\pi)$ be the real Hilbert space of functions $f:E \to \mathbb{R}$ with [inner product](/page/Inner%20Product) \begin{align*} \langle f,k\rangle_\pi := \sum_{x \in E} f(x)k(x)\pi(x). \end{align*} Let \begin{align*} L^2_0(\pi) := \{f:E \to \mathbb{R} : \langle f,\mathbb{1}_E\rangle_\pi = 0\} \end{align*} be the closed subspace of $\pi$-mean-zero functions, where $\mathbb{1}_E:E \to \mathbb{R}$ denotes the constant function $\mathbb{1}_E(x)=1$. For a Markov kernel $P:E \times E \to [0,1]$ reversible with respect to $\pi$, define the linear operator \begin{align*} T_P:L^2(\pi) &\to L^2(\pi) \end{align*} \begin{align*} f &\mapsto T_Pf \end{align*} by \begin{align*} (T_Pf)(x) := \sum_{y \in E} P(x,y)f(y). \end{align*} Reversibility gives, for all $f,k \in L^2(\pi)$, \begin{align*} \langle T_P f,k\rangle_\pi = \sum_{x,y \in E} P(x,y)f(y)k(x)\pi(x). \end{align*} Using $\pi(x)P(x,y)=\pi(y)P(y,x)$ and exchanging the names of $x$ and $y$ gives \begin{align*} \langle T_P f,k\rangle_\pi = \sum_{x,y \in E} f(x)P(x,y)k(y)\pi(x) = \langle f,T_P k\rangle_\pi. \end{align*} Thus $T_P$ is self-adjoint on $L^2(\pi)$. Since $\pi$ is stationary for $P$ by reversibility and stochasticity, $T_P$ preserves $L^2_0(\pi)$. We write $P$ also for the restricted self-adjoint operator \begin{align*} P:L^2_0(\pi) \to L^2_0(\pi). \end{align*} [/step] [step:Show that the resolvent formula gives the asymptotic variance] Let $P:E \times E \to [0,1]$ be irreducible, aperiodic, and reversible with respect to $\pi$, and let $g \in L^2_0(\pi)$. Since $E$ is finite, $L^2_0(\pi)$ is finite-dimensional. By the finite-dimensional [spectral theorem for self-adjoint operators](/theorems/6911), the self-adjoint operator $P:L^2_0(\pi) \to L^2_0(\pi)$ admits an orthonormal eigenbasis $(e_j)_{j=1}^m$ with real eigenvalues $(\lambda_j)_{j=1}^m$. We record why these eigenvalues lie strictly inside $(-1,1)$. If $Pf=\lambda f$ on $L^2(\pi)$ and $x_0 \in E$ satisfies $|f(x_0)|=\max_{x \in E}|f(x)|$, then stochasticity gives $|\lambda|\le 1$ whenever $f \ne 0$; hence every eigenvalue of the finite Markov operator has modulus at most $1$. If $Pf=f$, then the maximum principle and irreducibility force $f$ to be constant on $E$, so the eigenspace for $1$ on $L^2(\pi)$ is $\operatorname{span}\{\mathbb{1}_E\}$; therefore $1$ is not an eigenvalue on $L^2_0(\pi)$. If $Pf=-f$, then equality in the same maximum-modulus argument forces the chain to alternate signs on every possible transition, which partitions the communicating class into two cyclic classes of period $2$; this contradicts aperiodicity. Thus \begin{align*} -1 < \lambda_j < 1 \end{align*} for every $j \in \{1,\dots,m\}$. Write \begin{align*} g = \sum_{j=1}^m a_j e_j, \end{align*} where $a_j := \langle g,e_j\rangle_\pi$. Let $\mathbb{P}$ denote the law of the stationary Markov chain with transition kernel $P$ and initial law $\pi$, and let $\mathbb{E}$ denote expectation with respect to $\mathbb{P}$. For square-integrable real random variables $Y,Z:\Omega \to \mathbb{R}$, define \begin{align*} \operatorname{Cov}(Y,Z) := \mathbb{E}[(Y-\mathbb{E}[Y])(Z-\mathbb{E}[Z])]. \end{align*} Stationarity and the Markov property give \begin{align*} \operatorname{Cov}(g(X_0),g(X_k)) = \langle g,P^k g\rangle_\pi \end{align*} for every $k \in \mathbb{N}\cup\{0\}$. Hence, for $N \in \mathbb{N}$, \begin{align*} N\operatorname{Var}\left(\frac{1}{N}\sum_{n=0}^{N-1}g(X_n)\right) = \sum_{k=-(N-1)}^{N-1}\left(1-\frac{|k|}{N}\right)\langle g,P^{|k|}g\rangle_\pi. \end{align*} Using the eigenbasis expansion, \begin{align*} \langle g,P^k g\rangle_\pi = \sum_{j=1}^m a_j^2\lambda_j^k. \end{align*} Since $|\lambda_j|<1$, the finite geometric series limit gives \begin{align*} \lim_{N \to \infty} \sum_{k=-(N-1)}^{N-1}\left(1-\frac{|k|}{N}\right)\lambda_j^{|k|} = 1 + 2\sum_{k=1}^{\infty}\lambda_j^k = \frac{1+\lambda_j}{1-\lambda_j}. \end{align*} Therefore the stationary asymptotic variance exists and satisfies \begin{align*} \sigma_P^2(g) = \sum_{j=1}^m a_j^2\frac{1+\lambda_j}{1-\lambda_j}. \end{align*} Since $I-P$ is positive definite on $L^2_0(\pi)$, this is equivalently \begin{align*} \sigma_P^2(g)=\langle g,(I+P)(I-P)^{-1}g\rangle_\pi. \end{align*} [guided] We prove the asymptotic variance identity from first principles in the finite-state setting. The advantage of working on $L^2_0(\pi)$ is that the constant eigenfunction has been removed, so the problematic eigenvalue $1$ no longer appears. Because $P$ is reversible with respect to $\pi$, the operator $P:L^2_0(\pi)\to L^2_0(\pi)$ is self-adjoint. Since $L^2_0(\pi)$ is finite-dimensional, the finite-dimensional spectral theorem for self-adjoint operators gives an [orthonormal basis](/page/Orthonormal%20Basis) $(e_j)_{j=1}^m$ of $L^2_0(\pi)$ and [real numbers](/page/Real%20Numbers) $\lambda_1,\dots,\lambda_m$ such that $Pe_j=\lambda_j e_j$ for each $j$. We now verify the spectral restrictions used below. If $Pf=\lambda f$ on $L^2(\pi)$ and $x_0 \in E$ is chosen so that $|f(x_0)|=\max_{x \in E}|f(x)|$, then \begin{align*} |\lambda| |f(x_0)| = |(Pf)(x_0)| \le \sum_{y \in E}P(x_0,y)|f(y)| \le |f(x_0)|. \end{align*} For a nonzero eigenfunction this gives $|\lambda|\le 1$. If $\lambda=1$, equality in the maximum principle and irreducibility force $f$ to be constant on all states reachable from $x_0$, hence on all of $E$; therefore the only eigenfunctions with eigenvalue $1$ are constants, and no nonzero constant lies in $L^2_0(\pi)$. If $\lambda=-1$, equality forces every transition with positive probability to move between opposite signs of an extremal eigenfunction, producing a two-class cyclic decomposition of the irreducible chain. That contradicts aperiodicity. Hence each eigenvalue on $L^2_0(\pi)$ satisfies \begin{align*} -1 < \lambda_j < 1. \end{align*} Now expand the centered observable: \begin{align*} g = \sum_{j=1}^m a_j e_j, \end{align*} where $a_j:=\langle g,e_j\rangle_\pi$. Let $\mathbb{P}$ denote the law of the stationary Markov chain with transition kernel $P$ and initial law $\pi$, and let $\mathbb{E}$ denote expectation with respect to $\mathbb{P}$. For square-integrable real random variables $Y,Z:\Omega \to \mathbb{R}$, define \begin{align*} \operatorname{Cov}(Y,Z) := \mathbb{E}[(Y-\mathbb{E}[Y])(Z-\mathbb{E}[Z])]. \end{align*} Since the chain is stationary, $X_0$ has law $\pi$, and the Markov property gives \begin{align*} \mathbb{E}[g(X_k)\mid X_0] = (P^k g)(X_0). \end{align*} Because $\pi(g)=0$, the covariance at lag $k$ is \begin{align*} \operatorname{Cov}(g(X_0),g(X_k)) = \mathbb{E}[g(X_0)g(X_k)] = \langle g,P^k g\rangle_\pi. \end{align*} The variance of the empirical average is therefore the sum of all lag covariances: \begin{align*} N\operatorname{Var}\left(\frac{1}{N}\sum_{n=0}^{N-1}g(X_n)\right) = \sum_{k=-(N-1)}^{N-1}\left(1-\frac{|k|}{N}\right)\langle g,P^{|k|}g\rangle_\pi. \end{align*} The eigenbasis diagonalizes each covariance term: \begin{align*} \langle g,P^k g\rangle_\pi = \sum_{j=1}^m a_j^2\lambda_j^k. \end{align*} For a fixed eigenvalue $\lambda_j$ with $|\lambda_j|<1$, the weighted covariance sum converges to the two-sided geometric series \begin{align*} 1 + 2\sum_{k=1}^{\infty}\lambda_j^k = \frac{1+\lambda_j}{1-\lambda_j}. \end{align*} There are only finitely many eigenvalues, so we may pass the limit through the finite sum and obtain \begin{align*} \sigma_P^2(g) = \sum_{j=1}^m a_j^2\frac{1+\lambda_j}{1-\lambda_j}. \end{align*} Finally, because $(I-P)e_j=(1-\lambda_j)e_j$ and $1-\lambda_j>0$, the inverse $(I-P)^{-1}$ is well-defined on $L^2_0(\pi)$, and \begin{align*} (I+P)(I-P)^{-1}e_j = \frac{1+\lambda_j}{1-\lambda_j}e_j. \end{align*} Thus the preceding spectral expression is exactly \begin{align*} \sigma_P^2(g)=\langle g,(I+P)(I-P)^{-1}g\rangle_\pi. \end{align*} [/guided] [/step] [step:Convert off-diagonal dominance into Dirichlet form ordering] For $i \in \{1,2\}$, define the positive self-adjoint operator \begin{align*} A_i:L^2_0(\pi) &\to L^2_0(\pi) \end{align*} \begin{align*} f &\mapsto (I-P_i)f. \end{align*} For every $f \in L^2_0(\pi)$, reversibility gives the Dirichlet form identity \begin{align*} \langle f,A_i f\rangle_\pi = \frac{1}{2}\sum_{x,y \in E}\pi(x)P_i(x,y)(f(x)-f(y))^2. \end{align*} Indeed, expanding the right-hand side gives \begin{align*} \frac{1}{2}\sum_{x,y \in E}\pi(x)P_i(x,y)(f(x)-f(y))^2 = \sum_{x \in E}\pi(x)f(x)^2 - \sum_{x,y \in E}\pi(x)P_i(x,y)f(x)f(y), \end{align*} where stochasticity gives the first quadratic term and reversibility identifies the two symmetric square terms. This equals $\langle f,(I-P_i)f\rangle_\pi$. Since the diagonal terms satisfy $(f(x)-f(x))^2=0$, only pairs with $x \ne y$ contribute. The assumption $P_1(x,y)\ge P_2(x,y)$ for $x \ne y$ therefore implies \begin{align*} \langle f,A_1 f\rangle_\pi \ge \langle f,A_2 f\rangle_\pi \end{align*} for every $f \in L^2_0(\pi)$. [/step] [step:Invert the ordered positive operators to compare the resolvents] The preceding step says $A_1-A_2$ is positive semidefinite on $L^2_0(\pi)$. From the spectral conclusion above, $A_i=I-P_i$ is positive definite on $L^2_0(\pi)$ for $i \in \{1,2\}$. We prove the inverse-ordering directly. Define \begin{align*} B:=A_2^{-1/2}A_1A_2^{-1/2}. \end{align*} The square root $A_2^{-1/2}:L^2_0(\pi)\to L^2_0(\pi)$ is defined by the finite-dimensional spectral theorem applied to the positive definite self-adjoint operator $A_2$. Then $B$ is self-adjoint and $B\ge I$, so every eigenvalue of $B$ is at least $1$, hence $B^{-1}\le I$. Since \begin{align*} A_1^{-1}=A_2^{-1/2}B^{-1}A_2^{-1/2}, \end{align*} we obtain $A_1^{-1}\le A_2^{-1}$. [guided] The comparison of asymptotic variances will come from comparing the inverses of $A_1=I-P_1$ and $A_2=I-P_2$. The direction of the inequality reverses under inversion: a larger positive definite operator has a smaller inverse. We already proved that \begin{align*} \langle f,A_1f\rangle_\pi \ge \langle f,A_2f\rangle_\pi \end{align*} for every $f \in L^2_0(\pi)$. In operator notation, this is $A_1\ge A_2$. We also know that $A_2$ is positive definite, because all eigenvalues $\lambda$ of $P_2$ on $L^2_0(\pi)$ satisfy $\lambda<1$, so all eigenvalues $1-\lambda$ of $A_2$ are positive. To prove the inverse comparison rigorously, define \begin{align*} B:=A_2^{-1/2}A_1A_2^{-1/2}. \end{align*} This is a self-adjoint operator on $L^2_0(\pi)$ because $A_1$ and $A_2^{-1/2}$ are self-adjoint. For any $v \in L^2_0(\pi)$, set $w:=A_2^{-1/2}v$. Then \begin{align*} \langle v,Bv\rangle_\pi = \langle w,A_1w\rangle_\pi \ge \langle w,A_2w\rangle_\pi = \langle v,v\rangle_\pi. \end{align*} Thus $B\ge I$. Since $B$ is finite-dimensional and self-adjoint, it has an orthonormal eigenbasis, and the inequality $B\ge I$ means that every eigenvalue of $B$ is at least $1$. Therefore every eigenvalue of $B^{-1}$ is at most $1$, so $B^{-1}\le I$. Now compute the inverse of $A_1$ from the definition of $B$: \begin{align*} A_1 = A_2^{1/2}BA_2^{1/2}. \end{align*} Hence \begin{align*} A_1^{-1}=A_2^{-1/2}B^{-1}A_2^{-1/2}. \end{align*} Because $B^{-1}\le I$, for every $g \in L^2_0(\pi)$ we get \begin{align*} \langle g,A_1^{-1}g\rangle_\pi \le \langle g,A_2^{-1}g\rangle_\pi. \end{align*} This is the precise operator step that converts “more movement off the diagonal” into “smaller asymptotic variance.” [/guided] [/step] [step:Apply the resolvent identity to finish the Peskun comparison] For $i \in \{1,2\}$, the resolvent formula gives \begin{align*} \sigma_{P_i}^2(g)=\langle g,(I+P_i)(I-P_i)^{-1}g\rangle_\pi. \end{align*} Since $A_i=I-P_i$, we have $P_i=I-A_i$, and therefore \begin{align*} (I+P_i)(I-P_i)^{-1} = (2I-A_i)A_i^{-1}=2A_i^{-1}-I. \end{align*} Thus \begin{align*} \sigma_{P_i}^2(g)=2\langle g,A_i^{-1}g\rangle_\pi-\langle g,g\rangle_\pi. \end{align*} Using $A_1^{-1}\le A_2^{-1}$ in quadratic form order gives \begin{align*} \sigma_{P_1}^2(g)\le \sigma_{P_2}^2(g). \end{align*} Since $g=h-\pi(h)$ and centering does not change the variance of the empirical average, the same inequality holds for the original function $h:E\to\mathbb{R}$. The resolvent formula also proves that both asymptotic variance limits exist and are finite. This completes the proof. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.