Androma — The Home of Mathematics on the Internet

[step:Lower bound the Frobenius risk by a spectrally bounded sign packing]We use the following finite-packing fact. There are universal constants $a,b,c>0$ and, for each $d\ge2$, a set $\mathcal A_d$ of symmetric $d\times d$ matrices with zero diagonal and entries in $\{-1,1\}$ off the diagonal such that $|\mathcal A_d|\ge \exp(a d^2)$, $\|A\|_{\mathrm{op}}\le b\sqrt d$ for every $A\in\mathcal A_d$, and \begin{align*} \|A-B\|_F^2 \ge c d^2 \end{align*} for distinct $A,B\in\mathcal A_d$. To obtain it, apply the [Hamming and Gilbert-Varshamov Bounds](/theorems/5738) to the $d(d-1)/2$ off-diagonal sign coordinates, giving a family $\mathcal C$ of sign patterns whose cardinality is exponential in $d^2$ and whose pairwise Hamming distances are at least a fixed positive fraction of the coordinates. A uniformly random symmetric sign matrix has $\|A\|_{\mathrm{op}}\le b\sqrt d$ with probability at least $3/4$ for a universal $b>0$. Multiply every element of $\mathcal C$ entrywise by one common uniformly random sign pattern: this translation preserves Hamming distances and makes each translated element uniformly distributed, so the expected number of translated elements satisfying the operator-norm bound is at least $\frac34|\mathcal C|$. Fix a translation that attains at least this expectation and discard the matrices outside the operator-norm event. The remaining subfamily has cardinality at least $\exp(ad^2)$ after reducing $a>0$. Two symmetric sign matrices whose off-diagonal sign coordinates differ in $h$ positions satisfy $\|A-B\|_F^2=8h$, so the subfamily keeps Frobenius separation at least $cd^2$ after reducing $c>0$. Let $C_1>0$ be a universal constant such that the Gaussian Kullback-Leibler divergence computed below is at most $C_1 n\|\Sigma_A-\Sigma_B\|_F^2$ whenever all eigenvalues of the covariance matrices lie in $[1/4,3/4]$. Since $\|A-B\|_F^2\le 4d^2$ for matrices in $\mathcal A_d$, define $C_2=4C_1$ and $C_3=C_2$. Choose a constant $\gamma>0$ small enough that $C_3\gamma^2\le a/8$ and $\gamma\le 1/(4b)$. Set \begin{align*} \delta &= \gamma\min\left\{\frac{1}{\sqrt d},\frac{1}{\sqrt n}\right\}. \end{align*} For each $A\in\mathcal A_d$, define the covariance matrix \begin{align*} \Sigma_A &= \frac{1}{2}I_d + \delta A. \end{align*} Since $\|\delta A\|_{\mathrm{op}}\le \gamma b\le 1/4$, each eigenvalue of $\Sigma_A$ lies in $[1/4,3/4]$, so $\Sigma_A\in\Theta_d$. For each $A\in\mathcal A_d$, let $P_A$ denote the probability law $\mathcal N(0,\Sigma_A)$ on $\mathbb R^d$. 
For distinct $A,B\in\mathcal A_d$, \begin{align*} \|\Sigma_A-\Sigma_B\|_F^2 = \delta^2\|A-B\|_F^2 \ge c\delta^2 d^2. \end{align*} The Kullback-Leibler divergence between $n$ independent samples from $\mathcal N(0,\Sigma_A)$ and $n$ independent samples from $\mathcal N(0,\Sigma_B)$ is \begin{align*} D_{\mathrm{KL}}(P_A^{\otimes n}\|P_B^{\otimes n}) = \frac{n}{2}\left(\operatorname{tr}(\Sigma_B^{-1}\Sigma_A-I_d)-\log\det(\Sigma_B^{-1}\Sigma_A)\right). \end{align*} Because the eigenvalues of $\Sigma_A$ and $\Sigma_B$ lie in $[1/4,3/4]$, the eigenvalues $\lambda_1,\dots,\lambda_d$ of $\Sigma_B^{-1}\Sigma_A$ lie in $[1/3,3]$, and the divergence equals $\frac n2\sum_i(\lambda_i-1-\log\lambda_i)$. [Taylor's theorem](/theorems/827) for $t-1-\log t$ on $[1/3,3]$ bounds each summand by a universal multiple of $(\lambda_i-1)^2$, and $\sum_i(\lambda_i-1)^2=\|\Sigma_B^{-1/2}(\Sigma_A-\Sigma_B)\Sigma_B^{-1/2}\|_F^2\le 16\|\Sigma_A-\Sigma_B\|_F^2$. Together with $\|A-B\|_F^2\le 4d^2$, $n\delta^2\le\gamma^2$, and the choice of $\gamma$, this gives \begin{align*} D_{\mathrm{KL}}(P_A^{\otimes n}\|P_B^{\otimes n}) \le C_1 n\|\Sigma_A-\Sigma_B\|_F^2 \le C_2 n\delta^2 d^2 \le C_3\gamma^2 d^2 \le \frac{a}{8}d^2. \end{align*} Since $\log|\mathcal A_d|\ge ad^2$, the [Fano Inequality](/theorems/1654) testing argument applied to the uniform prior on $\mathcal A_d$ yields, for universal constants $C_4,C_5>0$, \begin{align*} \inf_{\hat\Sigma}\sup_{\Sigma\in\Theta_d}\mathbb E_\Sigma[\|\hat\Sigma-\Sigma\|_F^2] \ge C_4\delta^2 d^2 \ge C_5\min\left\{d,\frac{d^2}{n}\right\}. \end{align*} Since $d(d+1)$ and $d^2$ are comparable for $d\ge2$, this is the desired Frobenius lower bound.[/step]
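The construction in this step can be checked numerically on a single pair of matrices. The following is a minimal sketch, not part of the proof: it assumes NumPy, draws two random symmetric sign matrices instead of taking them from an actual packing $\mathcal A_d$, and replaces the universal constant $b$ by the empirical ratio `b_hat`. The helper names `random_sign_matrix` and `gaussian_kl` are hypothetical. The value $36$ is one admissible choice of $C_1$, obtained from the Taylor bound $t-1-\log t\le \frac92(t-1)^2$ on $[1/3,3]$ and $\|\Sigma_B^{-1}\|_{\mathrm{op}}\le 4$; the source does not fix a value.

```python
import numpy as np

def random_sign_matrix(d, rng):
    """Symmetric d x d matrix with zero diagonal and off-diagonal entries in {-1, +1}."""
    upper = np.triu(rng.choice([-1.0, 1.0], size=(d, d)), k=1)
    return upper + upper.T

def gaussian_kl(sigma_a, sigma_b, n=1):
    """KL(N(0, sigma_a)^n || N(0, sigma_b)^n) via the trace / log-det formula."""
    d = sigma_a.shape[0]
    ratio = np.linalg.solve(sigma_b, sigma_a)   # Sigma_B^{-1} Sigma_A
    _, logdet = np.linalg.slogdet(ratio)
    return 0.5 * n * (np.trace(ratio) - d - logdet)

rng = np.random.default_rng(0)                  # fixed seed, illustrative only
d, n = 40, 500
A = random_sign_matrix(d, rng)
B = random_sign_matrix(d, rng)

# Assumption: empirical stand-in for the universal constant b.
b_hat = max(np.linalg.norm(M, 2) for M in (A, B)) / np.sqrt(d)
gamma = 1.0 / (4.0 * b_hat)                     # enforces gamma <= 1/(4b)
delta = gamma * min(1 / np.sqrt(d), 1 / np.sqrt(n))

sigma_a = 0.5 * np.eye(d) + delta * A
sigma_b = 0.5 * np.eye(d) + delta * B
eigs = np.concatenate([np.linalg.eigvalsh(sigma_a), np.linalg.eigvalsh(sigma_b)])

frob_sq = np.linalg.norm(sigma_a - sigma_b, "fro") ** 2
hamming = int(np.sum(np.triu(A != B, k=1)))     # differing off-diagonal sign coordinates
kl = gaussian_kl(sigma_a, sigma_b, n)

print("eigenvalue range:", eigs.min(), eigs.max())
print("squared Frobenius distance:", frob_sq, "vs 8*delta^2*h:", 8 * delta**2 * hamming)
print("KL:", kl, "vs 36*n*frob_sq:", 36 * n * frob_sq)
```

The three printed comparisons correspond to the eigenvalue range $[1/4,3/4]$, the identity $\|\Sigma_A-\Sigma_B\|_F^2=8\delta^2h$ for sign patterns at Hamming distance $h$, and the quadratic KL bound.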

[guided]The lower bound must be built from many distinct nonzero covariance matrices inside $\Theta_d$, because a single known matrix such as the zero covariance is easy to estimate. The role of the sign packing is to create many covariance matrices that are separated in Frobenius norm but remain uniformly bounded in operator norm. More precisely, the packing supplies universal constants $a,b,c>0$ and a set $\mathcal A_d$ of symmetric $d\times d$ matrices with zero diagonal and off-diagonal entries in $\{-1,1\}$ such that $|\mathcal A_d|\ge \exp(ad^2)$, $\|A\|_{\mathrm{op}}\le b\sqrt d$ for every $A\in\mathcal A_d$, and $\|A-B\|_F^2\ge cd^2$ whenever $A\ne B$. This is obtained from the [Hamming and Gilbert-Varshamov Bounds](/theorems/5738) on the off-diagonal sign coordinates, together with the random sign-translation pruning argument and the standard operator-norm bound for random symmetric sign matrices. The spectral pruning condition $\|A\|_{\mathrm{op}}\le b\sqrt d$ is what permits perturbations of size comparable to $d^{-1/2}$ while keeping $\frac12 I_d+\delta A$ positive semidefinite and bounded above by $I_d$. Let $C_1>0$ be the universal constant in the KL estimate below, define $C_2=4C_1$, and set $C_3=C_2$. Choose $\gamma>0$ small enough that $C_3\gamma^2\le a/8$ and $\gamma\le 1/(4b)$. Define \begin{align*} \delta &= \gamma\min\left\{\frac{1}{\sqrt d},\frac{1}{\sqrt n}\right\}, \end{align*} and, for $A\in\mathcal A_d$, define \begin{align*} \Sigma_A &= \frac{1}{2}I_d+\delta A. \end{align*} Since $A$ is symmetric, its eigenvalues are real, and the operator norm bound gives $\|\delta A\|_{\mathrm{op}}\le \gamma b\le 1/4$. Therefore every eigenvalue of $\Sigma_A$ belongs to $[1/4,3/4]$, proving $0\preceq \Sigma_A\preceq I_d$, hence $\Sigma_A\in\Theta_d$. For each $A\in\mathcal A_d$, let $P_A$ denote the probability law $\mathcal N(0,\Sigma_A)$ on $\mathbb R^d$. The Frobenius separation is inherited directly from the packing: \begin{align*} \|\Sigma_A-\Sigma_B\|_F^2 =\delta^2\|A-B\|_F^2 \ge c\delta^2 d^2. \end{align*} 
The information distance is small because Gaussian covariance models are locally quadratic in the covariance matrix. For one observation, the covariance Kullback-Leibler formula is \begin{align*} D_{\mathrm{KL}}(\mathcal N(0,\Sigma_A)\|\mathcal N(0,\Sigma_B)) =\frac12\left(\operatorname{tr}(\Sigma_B^{-1}\Sigma_A-I_d)-\log\det(\Sigma_B^{-1}\Sigma_A)\right), \end{align*} and independence multiplies this quantity by $n$. The eigenvalues of $\Sigma_B^{-1}\Sigma_A$ lie in $[1/3,3]$, a fixed compact subset of $(0,\infty)$, so Taylor's theorem bounds the scalar expression $t-1-\log t$ at each eigenvalue by a universal multiple of $(t-1)^2$. Summing over the eigenvalues and using $\|A-B\|_F^2\le 4d^2$ gives \begin{align*} D_{\mathrm{KL}}(P_A^{\otimes n}\|P_B^{\otimes n}) \le C_1n\|\Sigma_A-\Sigma_B\|_F^2 \le C_2n\delta^2d^2. \end{align*} The definition of $\delta$ gives $n\delta^2\le \gamma^2$, so the KL divergence is at most $C_3\gamma^2d^2\le ad^2/8$, while the logarithm of the packing size is at least $ad^2$. The [Fano Inequality](/theorems/1654) testing argument therefore forces a constant probability of confusing two separated covariance matrices. Multiplying that testing error by the squared separation gives, for universal constants $C_4,C_5>0$, \begin{align*} \inf_{\hat\Sigma}\sup_{\Sigma\in\Theta_d}\mathbb E_\Sigma[\|\hat\Sigma-\Sigma\|_F^2] \ge C_4\delta^2d^2 \ge C_5\min\left\{d,\frac{d^2}{n}\right\}. \end{align*} Because $d(d+1)\asymp d^2$ for $d\ge2$, this is the asserted minimax lower bound in Frobenius norm.[/guided]
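The final Fano step can be written out with explicit constants. The display below is a sketch under stated assumptions, not the source's own computation. It uses the standard form of Fano's method for a packing with pairwise separation $2s$, in which the mutual information is bounded by the largest pairwise KL divergence. It also assumes $ad^2\ge 4\log 2$, so that the $\log 2$ term is absorbed. The symbols $M$ and $s$ are introduced here for illustration only.

```latex
\begin{align*}
M &= |\mathcal A_d|, \qquad \log M \ge a d^2, \qquad s^2 = \tfrac14 c\,\delta^2 d^2,\\
\inf_{\hat\Sigma}\sup_{\Sigma\in\Theta_d}\mathbb E_\Sigma\bigl[\|\hat\Sigma-\Sigma\|_F^2\bigr]
  &\ge s^2\left(1-\frac{\max_{A\ne B} D_{\mathrm{KL}}(P_A^{\otimes n}\|P_B^{\otimes n})+\log 2}{\log M}\right)\\
  &\ge \frac{c\,\delta^2 d^2}{4}\left(1-\frac18-\frac{\log 2}{a d^2}\right)
   \ge \frac{c\,\delta^2 d^2}{4}\cdot\frac58
   = \frac{5c}{32}\,\delta^2 d^2,\\
\delta^2 d^2 &= \gamma^2\min\left\{\frac1d,\frac1n\right\} d^2
   = \gamma^2\min\left\{d,\frac{d^2}{n}\right\}.
\end{align*}
```

Under these assumptions one may take $C_4=5c/32$ and $C_5=5c\gamma^2/32$; other admissible constants differ only by universal factors.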
