Androma — The Home of Mathematics on the Internet


[step:Apply a finite Fano bound to the induced testing problem] [claim:Fano lower bound for finitely many Gaussian experiments] Let $J$ be a finite set with $M:=|J|\ge 2$. For each $j\in J$, let $P_j$ be a probability measure on a common measurable space $(\mathcal Y,\mathcal A)$. Let $\tilde j$ be a measurable estimator, that is, a measurable map \begin{align*} \tilde j: \mathcal Y &\to J\cup\{\ast\}. \end{align*} Fix $j_0\in J$. Define $D_{\mathrm{KL}}(P\,\|\,Q)$ to be the Kullback-Leibler divergence of a probability measure $P$ from a probability measure $Q$, with value $+\infty$ if $P$ is not absolutely continuous with respect to $Q$. Then \begin{align*} \max_{j\in J}P_j(\tilde j\ne j) \ge 1-\frac{\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})+\log 2}{\log M}. \end{align*} [/claim] [proof] Let $V$ be a $J$-valued [random variable](/page/Random%20Variable) with the uniform law on $J$, and conditional on the event $\{V=j\}$ let $Y$ be a $\mathcal Y$-valued random variable with law $P_j$. Let $Q$ be the marginal law of $Y$, so \begin{align*} Q=\frac{1}{M}\sum_{m\in J}P_m. \end{align*} Define the mutual information $I(V;Y)$ by \begin{align*} I(V;Y) := \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,Q). \end{align*} If at least one term $D_{\mathrm{KL}}(P_j\,\|\,P_{j_0})$ is infinite, then the right-hand side of the claimed bound equals $-\infty$ and the bound holds trivially. Hence assume all these divergences are finite. Then $P_j\ll P_{j_0}$ for every $j\in J$, and therefore $Q\ll P_{j_0}$. Let $p_j:=dP_j/dP_{j_0}$ and $q:=dQ/dP_{j_0}$. By the definition of $Q$, \begin{align*} q=\frac{1}{M}\sum_{m\in J}p_m \end{align*} $P_{j_0}$-almost everywhere. Expanding the average divergence relative to $P_{j_0}$ gives \begin{align*} \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0}) &= \frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0}. 
\end{align*} Since $p_j=q\,(dP_j/dQ)$ $P_{j_0}$-almost everywhere on the set $\{q>0\}$, and $p_j=0$ on $\{q=0\}$, we have $\log p_j=\log(dP_j/dQ)+\log q$ wherever $p_j>0$, and therefore \begin{align*} \frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0} = \frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log\left(\frac{dP_j}{dQ}\right)\,dP_{j_0} + \frac{1}{M}\sum_{j\in J}\int_{\mathcal Y}p_j\log q\,dP_{j_0}. \end{align*} Since $q=M^{-1}\sum_{j\in J}p_j$, this becomes \begin{align*} \frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0} = \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,Q) + \int_{\mathcal Y}q\log q\,dP_{j_0}. \end{align*} By the definitions of $I(V;Y)$ and $D_{\mathrm{KL}}(Q\,\|\,P_{j_0})$, we conclude \begin{align*} \frac{1}{M}\sum_{j\in J}\int_{\mathcal Y} p_j\log p_j\,dP_{j_0} = I(V;Y)+D_{\mathrm{KL}}(Q\,\|\,P_{j_0}). \end{align*} Because $D_{\mathrm{KL}}(Q\,\|\,P_{j_0})\ge 0$, this identity yields the required information bound \begin{align*} I(V;Y)\le \frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j\,\|\,P_{j_0}). \end{align*} Let $\hat V:=\tilde j(Y)$. Define the error probability $p_e:=\mathbb P(\hat V\ne V)$. Since $\ast\notin J$, every outcome with $\tilde j(Y)=\ast$ is counted as an error, so $\hat V$ is a finite-valued decision rule whose error event is precisely $\{\hat V\ne V\}$. Let $H(V)$ denote the Shannon entropy of $V$, and let $H(V\mid \hat V)$ denote the conditional Shannon entropy of $V$ given $\hat V$. We prove the finite-alphabet entropy estimate needed here. Let $E_0$ be the Bernoulli random variable defined by $E_0=1$ on $\{\hat V\ne V\}$ and $E_0=0$ on $\{\hat V=V\}$. Conditional on $\hat V$ and $E_0=0$, the value of $V$ is determined by $V=\hat V$. Conditional on $\hat V$ and $E_0=1$, the value of $V$ belongs to at most $M$ possibilities. 
Since the joint conditional entropy of the pair $(E_0,V)$ given $\hat V$ is at least the conditional entropy of $V$ alone, we have \begin{align*} H(V\mid \hat V) \le H(E_0,V\mid \hat V). \end{align*} The finite-alphabet [chain rule for entropy](/theorems/1635) gives \begin{align*} H(E_0,V\mid \hat V) = H(E_0\mid \hat V)+H(V\mid \hat V,E_0). \end{align*} The Bernoulli entropy is bounded by $\log 2$, and the conditional entropy of $V$ is $0$ on $\{E_0=0\}$ and at most $\log M$ on $\{E_0=1\}$, an event of probability $p_e$. Therefore \begin{align*} H(V\mid \hat V) \le \log 2+p_e\log M. \end{align*} Moreover, $H(V)=\log M$. Because $\hat V$ is a measurable function of $Y$, conditioning on the full observation $Y$ leaves no more uncertainty than conditioning on $\hat V$ alone, so \begin{align*} H(V\mid \hat V)\ge H(V\mid Y)=H(V)-I(V;Y)=\log M-I(V;Y). \end{align*} Combining the preceding upper and lower bounds on $H(V\mid \hat V)$ gives \begin{align*} p_e \ge 1-\frac{I(V;Y)+\log 2}{\log M}. \end{align*} Finally, \begin{align*} p_e = \frac{1}{M}\sum_{j\in J}P_j(\tilde j\ne j) \le \max_{j\in J}P_j(\tilde j\ne j). \end{align*} Combining this with the information bound proves the claim. [/proof] Choose a fixed index $j_0\in J$. Applying the claim with $P_j=P_j^X$ and $\tilde j=\hat j_X$ gives \begin{align*} R_X \ge 1-\frac{\frac{1}{M}\sum_{j\in J}D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X)+\log 2}{\log M}. \end{align*} [/step]
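As a sanity check on the claim, the following minimal Python sketch evaluates both sides on a toy family of distributions over a finite observation space. The family `dists`, the helper names `kl`, `fano_quantities`, `map_max_error`, and the choice of the maximum-likelihood rule as the estimator are all illustrative assumptions, not part of the argument above. The sketch checks the decomposition $\frac{1}{M}\sum_j D_{\mathrm{KL}}(P_j\,\|\,P_{j_0}) = I(V;Y)+D_{\mathrm{KL}}(Q\,\|\,P_{j_0})$ and compares the worst-case error of one estimator with the Fano lower bound.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats for finite distributions.

    Returns +inf when p is not absolutely continuous with respect to q.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def fano_quantities(dists, j0=0):
    """Compute the quantities appearing in the proof for a finite family."""
    M = len(dists)
    n = len(dists[0])
    # Marginal law Q of Y when V is uniform on the index set.
    mixture = [sum(d[y] for d in dists) / M for y in range(n)]
    avg_kl_ref = sum(kl(d, dists[j0]) for d in dists) / M   # average KL to P_{j0}
    mutual_info = sum(kl(d, mixture) for d in dists) / M    # I(V;Y)
    kl_mix_ref = kl(mixture, dists[j0])                     # D(Q || P_{j0})
    bound = 1.0 - (avg_kl_ref + math.log(2)) / math.log(M)  # Fano lower bound
    return {"avg_kl_ref": avg_kl_ref, "mutual_info": mutual_info,
            "kl_mix_ref": kl_mix_ref, "bound": bound}

def map_max_error(dists):
    """Worst-case error max_j P_j(estimate != j) of the maximum-likelihood rule."""
    M = len(dists)
    n = len(dists[0])
    # For each observation y, pick the index with the largest likelihood.
    decide = [max(range(M), key=lambda j: dists[j][y]) for y in range(n)]
    return max(1.0 - sum(dists[j][y] for y in range(n) if decide[y] == j)
               for j in range(M))

# Hypothetical toy family: P_j puts slightly more mass on the point j.
dists = [[0.28 if y == j else 0.24 for y in range(4)] for j in range(4)]
res = fano_quantities(dists, j0=0)
print(res, map_max_error(dists))
```

Because the four toy distributions are nearly indistinguishable, the average divergence is small and the lower bound is close to $1-\log 2/\log 4 = 1/2$.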


[guided]The only quantitative input from the design matrix is the upper restricted-eigenvalue bound. The constant used for that bound is $\kappa_+>0$, defined by the condition that every $2k$-sparse vector $v\in\mathbb R^d$ satisfies \begin{align*} \frac{1}{n}\|Xv\|_2^2\le \kappa_+\|v\|_2^2. \end{align*} For two alternatives $\beta_j$ and $\beta_{j_0}$, the Gaussian laws have the same covariance $\sigma^2 I_n$ and means $X\beta_j$ and $X\beta_{j_0}$. Let \begin{align*} \mu_j:=X\beta_j,\qquad \mu_{j_0}:=X\beta_{j_0} \end{align*} denote these two mean vectors in $\mathbb R^n$. We compute the divergence from the Gaussian densities with respect to the Lebesgue measure $\mathcal L^n$. For $y\in\mathbb R^n$, \begin{align*} \log\frac{dP_j^X}{dP_{j_0}^X}(y) = -\frac{1}{2\sigma^2}\|y-\mu_j\|_2^2+\frac{1}{2\sigma^2}\|y-\mu_{j_0}\|_2^2. \end{align*} Under $P_j^X$, write $y=\mu_j+\varepsilon$ with $\varepsilon\sim\mathcal N(0,\sigma^2 I_n)$. Then $y-\mu_j=\varepsilon$ and $y-\mu_{j_0}=\varepsilon+(\mu_j-\mu_{j_0})$, so the $\|\varepsilon\|_2^2$ terms cancel, and the expectation of the cross term is zero because $\mathbb E_j[\varepsilon]=0$. Hence \begin{align*} D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X) = \frac{1}{2\sigma^2}\|\mu_j-\mu_{j_0}\|_2^2 = \frac{1}{2\sigma^2}\|X(\beta_j-\beta_{j_0})\|_2^2. \end{align*} Now we check that the restricted-eigenvalue hypothesis applies to $\beta_j-\beta_{j_0}$. The baseline coordinates in $B$ cancel, because every $\beta_j$ has the same value $a$ on $B$. Therefore $\beta_j-\beta_{j_0}$ is supported only on the two varying coordinates $j$ and $j_0$. Since $k\ge 1$, every $2$-sparse vector is $2k$-sparse. Hence, on $E$, \begin{align*} \frac{1}{n}\|X(\beta_j-\beta_{j_0})\|_2^2 \le \kappa_+\|\beta_j-\beta_{j_0}\|_2^2. \end{align*} If $j\ne j_0$, then the difference vector has one coordinate equal to $a$ and one coordinate equal to $-a$, so \begin{align*} \|\beta_j-\beta_{j_0}\|_2^2=a^2+a^2=2a^2. \end{align*} If $j=j_0$, the difference is zero, and the same upper bound remains valid. 
Therefore \begin{align*} D_{\mathrm{KL}}(P_j^X\,\|\,P_{j_0}^X) \le \frac{1}{2\sigma^2}\,n\kappa_+(2a^2) = \frac{\kappa_+ n a^2}{\sigma^2}. \end{align*} [Fano's inequality](/theorems/1654) then forces \begin{align*} R_X \ge 1-\frac{\kappa_+ n a^2/\sigma^2+\log 2}{\log M}. \end{align*} This inequality expresses the core obstruction: if $n a^2/\sigma^2$ is much smaller than $\log M$, then the observations do not carry enough information to identify the active coordinate among $M$ possibilities.[/guided]
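The divergence computation above can be illustrated numerically. The following is a minimal sketch under stated assumptions: the design matrix `X` is random, the dimensions, the baseline set `B`, the index set `J`, and the values of `a` and `sigma` are all hypothetical, and `kappa_plus` is taken to be the crude surrogate $\|X\|_F^2/n$, which bounds $\frac{1}{n}\|Xv\|_2^2/\|v\|_2^2$ for every vector $v$ and therefore for every $2k$-sparse one. It is not the sharp restricted-eigenvalue constant.

```python
import math
import random

def matvec(X, v):
    """Multiply the matrix X (list of rows) by the vector v."""
    return [sum(x * c for x, c in zip(row, v)) for row in X]

def sq_norm(v):
    """Squared Euclidean norm of v."""
    return sum(c * c for c in v)

def gaussian_kl(mu1, mu2, sigma):
    """KL divergence between N(mu1, sigma^2 I) and N(mu2, sigma^2 I)."""
    return sq_norm([u - w for u, w in zip(mu1, mu2)]) / (2.0 * sigma ** 2)

# Hypothetical problem sizes and signal level (assumptions for illustration).
random.seed(0)
n, d, a, sigma = 50, 6, 0.01, 1.0
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
B = [0, 1]          # baseline coordinates, shared by every alternative
J = [2, 3, 4, 5]    # indices of the varying coordinate
M = len(J)

def beta(j):
    """Alternative with value a on B and on the single varying coordinate j."""
    v = [0.0] * d
    for b in B:
        v[b] = a
    v[j] = a
    return v

# Crude valid surrogate for kappa_+: ||Xv||^2 <= ||X||_F^2 ||v||^2 for all v.
kappa_plus = sum(x * x for row in X for x in row) / n

j0 = J[0]
mu0 = matvec(X, beta(j0))
kls = {j: gaussian_kl(matvec(X, beta(j)), mu0, sigma) for j in J}

kl_cap = kappa_plus * n * a ** 2 / sigma ** 2   # bound on each divergence
fano_bound = 1.0 - (kl_cap + math.log(2)) / math.log(M)
print(kls, kl_cap, fano_bound)
```

Every entry of `kls` should sit below `kl_cap`, and the entry for `j0` is exactly zero. Increasing `a` or `n` raises `kl_cap` and eventually makes `fano_bound` negative, at which point the lower bound carries no information.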
