Androma — The Home of Mathematics on the Internet

[guided]The purpose of this step is to convert the almost sure operator norm bound into a usable exponential estimate. Define the scalar function $g:(0,3/L)\to(0,\infty)$ by \begin{align*} g(\theta)=\frac{\theta^2}{2(1-\theta L/3)}. \end{align*} Fix $\theta\in(0,3/L)$. We first prove the scalar inequality that drives the whole argument. If $|x|\le L$, then the power series for the exponential gives \begin{align*} e^{\theta x}=1+\theta x+\sum_{m=2}^{\infty}\frac{\theta^m x^m}{m!}. \end{align*} For $m\ge 2$ and $|x|\le L$, we have $x^m\le |x|^m\le L^{m-2}x^2$, hence \begin{align*} \sum_{m=2}^{\infty}\frac{\theta^m x^m}{m!} \le x^2\sum_{m=2}^{\infty}\frac{\theta^mL^{m-2}}{m!}. \end{align*} The Bernstein simplification replaces the exact exponential remainder by a geometric upper bound. Since $m!\ge 2\cdot 3^{m-2}$ for every $m\ge 2$, we get \begin{align*} \sum_{m=2}^{\infty}\frac{\theta^mL^{m-2}}{m!} \le \frac{\theta^2}{2}\sum_{m=0}^{\infty}\left(\frac{\theta L}{3}\right)^m =\frac{\theta^2}{2(1-\theta L/3)} =g(\theta), \end{align*} where the geometric series converges because $\theta L/3<1$. Therefore \begin{align*} e^{\theta x}\le 1+\theta x+g(\theta)x^2 \end{align*} for every $x\in[-L,L]$. Now apply this scalar inequality to the eigenvalues of the symmetric random matrix $Y_i$. Since $\|Y_i\|_{\mathrm{op}}\le L$ almost surely, every eigenvalue of $Y_i$ belongs to $[-L,L]$ almost surely. Functional calculus therefore gives the Loewner-order inequality \begin{align*} \exp(\theta Y_i)\preceq I+\theta Y_i+g(\theta)Y_i^2 \end{align*} almost surely. Taking expectations is legitimate because all entries are bounded random variables, and expectation preserves the Loewner order: if $A(\omega)\preceq B(\omega)$ almost surely, then $\mathbb E[A]\preceq\mathbb E[B]$. Since $Y_i$ is centered, $\mathbb E[Y_i]=0$, so \begin{align*} \mathbb E[\exp(\theta Y_i)] \preceq I+g(\theta)\mathbb E[Y_i^2]. 
\end{align*} Finally, $\mathbb E[Y_i^2]$ is positive semidefinite, because $Y_i^2$ is positive semidefinite almost surely. Hence $g(\theta)\mathbb E[Y_i^2]$ is positive semidefinite. Applying $1+x\le e^x$ to each eigenvalue gives \begin{align*} I+g(\theta)\mathbb E[Y_i^2]\preceq \exp\left(g(\theta)\mathbb E[Y_i^2]\right). \end{align*} Combining the two Loewner inequalities yields \begin{align*} \mathbb E[\exp(\theta Y_i)]\preceq \exp\left(g(\theta)\mathbb E[Y_i^2]\right). \end{align*}[/guided]
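The scalar inequality behind this step can be checked numerically. The following is a minimal sketch, not part of the proof: it tests $m!\ge 2\cdot 3^{m-2}$ for a range of $m$ and evaluates the gap $1+\theta x+g(\theta)x^2-e^{\theta x}$ on a grid of $\theta\in(0,3/L)$ and $x\in[-L,L]$. The value `L = 2.0`, the grid sizes, and the function names are illustrative assumptions.

```python
import math


def g(theta, L):
    """Bernstein factor g(theta) = theta^2 / (2 (1 - theta L / 3)) for 0 < theta < 3/L."""
    if not 0 < theta < 3 / L:
        raise ValueError("theta must lie in (0, 3/L)")
    return theta**2 / (2 * (1 - theta * L / 3))


def scalar_gap(theta, x, L):
    """Return 1 + theta x + g(theta) x^2 - exp(theta x); nonnegative when |x| <= L."""
    return 1 + theta * x + g(theta, L) * x**2 - math.exp(theta * x)


L = 2.0  # hypothetical almost sure bound, chosen only for illustration

# The factorial bound that turns the exponential remainder into a geometric series.
factorial_ok = all(math.factorial(m) >= 2 * 3 ** (m - 2) for m in range(2, 40))

# Grid over theta in (0, 3/L) and x in [-L, L].
thetas = [k * (3 / L) / 50 for k in range(1, 50)]
xs = [-L + 2 * L * j / 200 for j in range(201)]
min_gap = min(scalar_gap(t, x, L) for t in thetas for x in xs)

# A small negative tolerance absorbs floating-point rounding at x = 0.
print(factorial_ok, min_gap >= -1e-12)
```

A grid check of this kind only illustrates the inequality; the power series argument above is what proves it for every $x\in[-L,L]$.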

[step:Combine the independent summands inside a trace exponential]We use the [Lieb trace moment generating function estimate](https://doi.org/10.1007/s00440-011-0379-z), cited here as Tropp, *User-Friendly Tail Bounds for Sums of Random Matrices*, Theorem 3.6, in the following Loewner-dominated form: if $X_1,\dots,X_n$ are independent symmetric random matrices and $A_i$ are deterministic symmetric matrices such that \begin{align*} \mathbb E[\exp(X_i)]\preceq \exp(A_i) \end{align*} for every $i$, then \begin{align*} \mathbb E\left[\operatorname{tr}\exp\left(\sum_{i=1}^nX_i\right)\right] \le \operatorname{tr}\exp\left(\sum_{i=1}^nA_i\right). \end{align*} Tropp's theorem, the finite-dimensional noncommutative trace moment generating function estimate obtained from Lieb's concavity theorem, is stated with $\log\mathbb E[\exp(X_i)]$ in place of $A_i$. The displayed form follows from it. Indeed, $\exp(X_i)$ is positive definite almost surely, so $\mathbb E[\exp(X_i)]$ is positive definite and its matrix logarithm is well-defined. From $\mathbb E[\exp(X_i)]\preceq\exp(A_i)$ and the operator monotonicity of the matrix logarithm on positive definite matrices, we get \begin{align*} \log\mathbb E[\exp(X_i)]\preceq A_i, \end{align*} and summing over $i$ gives $\sum_{i=1}^n\log\mathbb E[\exp(X_i)]\preceq\sum_{i=1}^nA_i$. The trace exponential map $B\mapsto\operatorname{tr}\exp(B)$ is monotone in the Loewner order on $\mathbb S^d$, so replacing each $\log\mathbb E[\exp(X_i)]$ by the larger matrix $A_i$ preserves the upper bound. Apply this estimate with $X_i:=\theta Y_i$ and \begin{align*} A_i:=g(\theta)\mathbb E[Y_i^2]\in\mathbb S^d. \end{align*} The matrices $\theta Y_i$ are independent and symmetric almost surely, and the preceding step verifies the required moment generating function hypothesis. Since $\sum_{i=1}^n\theta Y_i=\theta S$ and $\sum_{i=1}^nA_i=g(\theta)V$, we obtain \begin{align*} \mathbb E[\operatorname{tr}\exp(\theta S)] \le \operatorname{tr}\exp\left(g(\theta)V\right). \end{align*} Since $V$ is positive semidefinite and $\lambda_{\max}(V)=\sigma^2$, every eigenvalue of $g(\theta)V$ is at most $g(\theta)\sigma^2$. 
Thus \begin{align*} \operatorname{tr}\exp\left(g(\theta)V\right) \le d\exp\left(g(\theta)\sigma^2\right). \end{align*} Consequently, \begin{align*} \mathbb E[\operatorname{tr}\exp(\theta S)] \le d\exp\left(\frac{\theta^2\sigma^2}{2(1-\theta L/3)}\right). \end{align*}[/step]
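The final bound of this step can be sanity-checked on a small example where the expectation is computable exactly. The sketch below is an illustration under stated assumptions, not part of the proof: it takes $Y_i=\varepsilon_iA_i$ with independent Rademacher signs $\varepsilon_i$ and fixed symmetric $2\times 2$ matrices $A_i$ (hypothetical values), and it assumes $S=\sum_iY_i$ and $V=\sum_i\mathbb E[Y_i^2]=\sum_iA_i^2$, with $L=\max_i\|A_i\|_{\mathrm{op}}$ and $\sigma^2=\lambda_{\max}(V)$. Eigenvalues of symmetric $2\times 2$ matrices are computed in closed form, so only the standard library is needed.

```python
import itertools
import math


def eigvals_sym2(a, b, c):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    return mid - rad, mid + rad


def trace_exp(a, b, c):
    """tr exp(M) for M = [[a, b], [b, c]], computed from the eigenvalues."""
    lo, hi = eigvals_sym2(a, b, c)
    return math.exp(lo) + math.exp(hi)


# Hypothetical symmetric coefficient matrices A_i, stored as (a, b, c).
COEFFS = [(1.0, 0.5, -0.3), (0.2, -0.7, 0.4), (-0.6, 0.1, 0.9), (0.3, 0.3, -0.8)]


def expected_trace_exp(theta, coeffs):
    """Exact E tr exp(theta S) for S = sum_i eps_i A_i, averaging over all sign patterns."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(coeffs)):
        a = theta * sum(s * m[0] for s, m in zip(signs, coeffs))
        b = theta * sum(s * m[1] for s, m in zip(signs, coeffs))
        c = theta * sum(s * m[2] for s, m in zip(signs, coeffs))
        total += trace_exp(a, b, c)
    return total / 2 ** len(coeffs)


def bernstein_parameters(coeffs):
    """Return L = max_i ||A_i||_op and sigma^2 = lambda_max(sum_i A_i^2)."""
    L = max(max(abs(ev) for ev in eigvals_sym2(*m)) for m in coeffs)
    # For A = [[a, b], [b, c]]: A^2 = [[a^2 + b^2, b(a + c)], [b(a + c), b^2 + c^2]].
    va = sum(a * a + b * b for a, b, c in coeffs)
    vb = sum(b * (a + c) for a, b, c in coeffs)
    vc = sum(b * b + c * c for a, b, c in coeffs)
    return L, eigvals_sym2(va, vb, vc)[1]


def mgf_bound(theta, L, sigma2, d=2):
    """Right-hand side d * exp(theta^2 sigma^2 / (2 (1 - theta L / 3)))."""
    return d * math.exp(theta**2 * sigma2 / (2 * (1 - theta * L / 3)))


L, sigma2 = bernstein_parameters(COEFFS)
checks = []
for k in range(1, 20):
    theta = k * (3 / L) / 20  # grid inside (0, 3/L)
    checks.append(expected_trace_exp(theta, COEFFS) <= mgf_bound(theta, L, sigma2))
print(all(checks))
```

Rademacher-weighted matrices are used here only because they are centered, bounded in operator norm by $L$, and allow the expectation to be enumerated over $2^n$ sign patterns; any other bounded centered model would need a different way to evaluate the left-hand side.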

[guided]This step is where independence is used in a genuinely matrix-valued way. We apply the [Lieb trace moment generating function estimate](https://doi.org/10.1007/s00440-011-0379-z) in the following Loewner-dominated form: if $X_1,\dots,X_n$ are independent symmetric random matrices and deterministic symmetric matrices $A_1,\dots,A_n$ satisfy \begin{align*} \mathbb E[\exp(X_i)]\preceq\exp(A_i) \end{align*} for every $i$, then \begin{align*} \mathbb E\left[\operatorname{tr}\exp\left(\sum_{i=1}^nX_i\right)\right] \le \operatorname{tr}\exp\left(\sum_{i=1}^nA_i\right). \end{align*} The usual statement has $\log\mathbb E[\exp(X_i)]$ in place of $A_i$, and the displayed form follows from it. First, $\exp(X_i)$ is positive definite almost surely, hence $\mathbb E[\exp(X_i)]$ is positive definite and its logarithm is defined. Second, the matrix logarithm is operator monotone on positive definite matrices, so the hypothesis gives $\log\mathbb E[\exp(X_i)]\preceq A_i$ for every $i$. Third, $B\mapsto\operatorname{tr}\exp(B)$ is monotone in the Loewner order on $\mathbb S^d$, so replacing each $\log\mathbb E[\exp(X_i)]$ by the larger matrix $A_i$ can only increase the right-hand side. We apply the estimate with $X_i:=\theta Y_i$ and \begin{align*} A_i:=g(\theta)\mathbb E[Y_i^2]\in\mathbb S^d. \end{align*} The matrices $\theta Y_i$ are independent because the $Y_i$ are independent, and they are symmetric almost surely because each $Y_i$ is symmetric almost surely. The preceding step proves exactly the required hypothesis \begin{align*} \mathbb E[\exp(\theta Y_i)]\preceq\exp\left(g(\theta)\mathbb E[Y_i^2]\right). \end{align*} Since $\sum_{i=1}^n\theta Y_i=\theta S$ and $\sum_{i=1}^nA_i=g(\theta)V$, we obtain \begin{align*} \mathbb E[\operatorname{tr}\exp(\theta S)] \le \operatorname{tr}\exp\left(g(\theta)V\right). \end{align*} Since $V$ is positive semidefinite and $\lambda_{\max}(V)=\sigma^2$, every eigenvalue of $g(\theta)V$ is at most $g(\theta)\sigma^2$. Summing the exponentials of the $d$ eigenvalues gives \begin{align*} \operatorname{tr}\exp\left(g(\theta)V\right) \le d\exp\left(g(\theta)\sigma^2\right). 
\end{align*} Substituting the definition of $g(\theta)$ yields \begin{align*} \mathbb E[\operatorname{tr}\exp(\theta S)] \le d\exp\left(\frac{\theta^2\sigma^2}{2(1-\theta L/3)}\right). \end{align*}[/guided]
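As a compact summary, the argument of this step can be read as a single chain of inequalities. This is a sketch under the assumption that $S=\sum_{i=1}^nY_i$ and $V=\sum_{i=1}^n\mathbb E[Y_i^2]$: the first inequality is the trace moment generating function estimate in its usual logarithmic form, the second combines operator monotonicity of the logarithm with monotonicity of $B\mapsto\operatorname{tr}\exp(B)$, and the third bounds each of the $d$ eigenvalues of $g(\theta)V$ by $g(\theta)\sigma^2$. \begin{align*} \mathbb E[\operatorname{tr}\exp(\theta S)] &\le \operatorname{tr}\exp\left(\sum_{i=1}^n\log\mathbb E[\exp(\theta Y_i)]\right) \\ &\le \operatorname{tr}\exp\left(g(\theta)V\right) \\ &\le d\exp\left(g(\theta)\sigma^2\right) =d\exp\left(\frac{\theta^2\sigma^2}{2(1-\theta L/3)}\right). \end{align*}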
