[proofplan]
The logarithm converts the product $X_n = \prod_{i=1}^n Y_i$ into the partial sum $S_n := \log X_n = \sum_{i=1}^n \log Y_i$. The summands $Z_i := \log Y_i$ form an i.i.d. $L^2$ sequence with mean $\mu$ and variance $\sigma^2$, so the sample mean $S_n/n$ has mean $\mu$ and variance $\sigma^2/n$. Applying Chebyshev's inequality to $S_n/n - \mu$ with deviation $a$ yields the quantitative bound, and choosing $N \geq \sigma^2/(\varepsilon \delta^2)$ gives the qualitative statement.
[/proofplan]
[step:Rewrite $\log X_n$ as an i.i.d. sum]
Since $Y_i > 0$ almost surely, $\log Y_i$ is almost surely well defined, and the product $X_n = \prod_{i=1}^n Y_i$ satisfies, almost surely,
\begin{align*}
\log X_n = \sum_{i=1}^n \log Y_i.
\end{align*}
Define the random variables $Z_i := \log Y_i$ for $i \geq 1$. Because $(Y_i)_{i \geq 1}$ is i.i.d. and $\log: (0, \infty) \to \mathbb{R}$ is Borel measurable, the sequence $(Z_i)_{i \geq 1}$ is i.i.d. as well. By hypothesis $Z_1 \in L^2$, with $\mathbb{E}[Z_1] = \mu$ and $\operatorname{Var}(Z_1) = \sigma^2 < \infty$. Writing $S_n := \sum_{i=1}^n Z_i = \log X_n$, the statement becomes a deviation bound for the sample mean $S_n/n$.
[/step]
[step:Compute the mean and variance of the sample mean $S_n/n$]
Since expectation is linear,
\begin{align*}
\mathbb{E}\!\left[\frac{S_n}{n}\right] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Z_i] = \frac{1}{n} \cdot n\mu = \mu.
\end{align*}
The $Z_i$ are independent and square-integrable, hence pairwise uncorrelated, so by [Bienaymé's Identity](/theorems/???),
\begin{align*}
\operatorname{Var}(S_n) = \sum_{i=1}^n \operatorname{Var}(Z_i) = n\sigma^2,
\end{align*}
and therefore
\begin{align*}
\operatorname{Var}\!\left(\frac{S_n}{n}\right) = \frac{1}{n^2}\operatorname{Var}(S_n) = \frac{\sigma^2}{n}.
\end{align*}
[guided]
We need both the mean and the variance of the sample mean $S_n/n$ because Chebyshev's inequality controls deviations of a random variable from its mean in terms of its variance. Linearity of expectation applies without any independence assumption and gives
\begin{align*}
\mathbb{E}\!\left[\frac{S_n}{n}\right] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[Z_i] = \mu,
\end{align*}
using only that each $Z_i$ has the common mean $\mu$. The variance, in contrast, is not additive in general: the cross terms are covariances. By [Bienaymé's Identity](/theorems/???),
\begin{align*}
\operatorname{Var}\!\left(\sum_{i=1}^n Z_i\right) = \sum_{i=1}^n \operatorname{Var}(Z_i) + 2 \sum_{1 \leq i < j \leq n} \operatorname{Cov}(Z_i, Z_j),
\end{align*}
so we must check that the covariances vanish. Independence of $(Z_i)$ implies pairwise independence, hence pairwise uncorrelatedness: $\operatorname{Cov}(Z_i, Z_j) = 0$ for $i \neq j$. Thus
\begin{align*}
\operatorname{Var}(S_n) = n\sigma^2.
\end{align*}
Scaling by $1/n$ pulls a factor of $1/n^2$ out of the variance, since $\operatorname{Var}(cW) = c^2 \operatorname{Var}(W)$ for every constant $c$, giving
\begin{align*}
\operatorname{Var}\!\left(\frac{S_n}{n}\right) = \frac{\sigma^2}{n}.
\end{align*}
This $1/n$ decay of the variance is what makes the sample mean concentrate around $\mu$.
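As a purely illustrative instance (the values $\sigma^2 = 4$ and $n = 100$ are assumptions chosen for arithmetic convenience, not part of the statement), the two variance formulas above give
\begin{align*}
\operatorname{Var}(S_{100}) = 100 \cdot 4 = 400, \qquad \operatorname{Var}\!\left(\frac{S_{100}}{100}\right) = \frac{400}{100^2} = 0.04.
\end{align*}
The standard deviation of the sample mean is then $\sqrt{0.04} = 0.2$, compared with $\sqrt{4} = 2$ for a single summand: a reduction by the factor $\sqrt{n} = 10$.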
[/guided]
[/step]
[step:Apply Chebyshev's inequality to $S_n/n$ with deviation $a$]
The random variable $S_n/n$ lies in $L^2$ with mean $\mu$ and variance $\sigma^2/n$. By [Chebyshev's Inequality](/theorems/???) applied to $S_n/n$ with threshold $a > 0$,
\begin{align*}
\mathbb{P}\!\left(\left|\frac{S_n}{n} - \mu\right| \geq a\right) \leq \frac{\operatorname{Var}(S_n/n)}{a^2} = \frac{\sigma^2}{n a^2}.
\end{align*}
Substituting $S_n = \log X_n$ yields the stated bound
\begin{align*}
\mathbb{P}\!\left(\left|\frac{\log X_n}{n} - \mu\right| \geq a\right) \leq \frac{\sigma^2}{n a^2}.
\end{align*}
[guided]
Chebyshev's inequality states: if $W$ is a random variable with $\mathbb{E}[W^2] < \infty$, then for every $a > 0$,
\begin{align*}
\mathbb{P}(|W - \mathbb{E}[W]| \geq a) \leq \frac{\operatorname{Var}(W)}{a^2}.
\end{align*}
We apply this with $W := S_n/n$. The hypothesis $W \in L^2$ is satisfied: each $Z_i \in L^2$ and $L^2$ is a vector space, so $S_n \in L^2$ and therefore $S_n/n \in L^2$. The previous step computed $\mathbb{E}[W] = \mu$ and $\operatorname{Var}(W) = \sigma^2/n$. Substituting,
\begin{align*}
\mathbb{P}\!\left(\left|\frac{S_n}{n} - \mu\right| \geq a\right) \leq \frac{\sigma^2/n}{a^2} = \frac{\sigma^2}{n a^2}.
\end{align*}
Rewriting $S_n = \log X_n$ finishes the first claim.
Why Chebyshev rather than a sharper bound such as Hoeffding or a central limit approximation? Chebyshev uses only the second moment, which is all we have assumed. It is also the canonical route to the weak law of large numbers under an $L^2$ hypothesis, and it yields an explicit non-asymptotic constant.
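The bound can also be read directly in terms of $X_n$. Since $X_n > 0$ almost surely, exponentiating shows that $\left|\frac{\log X_n}{n} - \mu\right| < a$ holds exactly when $e^{n(\mu - a)} < X_n < e^{n(\mu + a)}$, so taking complements gives
\begin{align*}
\mathbb{P}\!\left(e^{n(\mu - a)} < X_n < e^{n(\mu + a)}\right) \geq 1 - \frac{\sigma^2}{n a^2}.
\end{align*}
For a sense of scale, take the illustrative values $\sigma^2 = 1$, $a = 0.1$, and $n = 1000$ (assumptions made only for this example, not part of the statement). Then
\begin{align*}
\frac{\sigma^2}{n a^2} = \frac{1}{1000 \cdot 0.01} = 0.1,
\end{align*}
so $X_{1000}$ lies between $e^{1000(\mu - 0.1)}$ and $e^{1000(\mu + 0.1)}$ with probability at least $0.9$.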
[/guided]
[/step]
[step:Derive the qualitative convergence statement]
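Let $\varepsilon > 0$ and $\delta > 0$. Choose
\begin{align*}
N := \max\left\{1, \left\lceil \frac{\sigma^2}{\varepsilon \delta^2} \right\rceil\right\},
\end{align*}
so that $N \geq 1$ and $N \geq \sigma^2/(\varepsilon \delta^2)$; the maximum with $1$ only matters in the degenerate case $\sigma^2 = 0$, where the ceiling alone would give $N = 0$. For every $n \geq N$, applying the bound from the previous step with $a = \delta$ gives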
\begin{align*}
\mathbb{P}\!\left(\left|\frac{\log X_n}{n} - \mu\right| \geq \delta\right) \leq \frac{\sigma^2}{n \delta^2} \leq \frac{\sigma^2}{N \delta^2} \leq \varepsilon,
\end{align*}
where the middle inequality uses $n \geq N$ and the last uses $N \geq \sigma^2/(\varepsilon\delta^2)$. This establishes the qualitative statement and completes the proof.
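[guided]
To see how the choice of $N$ works in a concrete case, suppose (purely for illustration; these values are assumptions, not part of the statement) that $\sigma^2 = 4$, $\varepsilon = 0.05$, and $\delta = 0.1$. Then
\begin{align*}
\frac{\sigma^2}{\varepsilon \delta^2} = \frac{4}{0.05 \cdot 0.01} = 8000,
\end{align*}
so $N = 8000$, and for every $n \geq 8000$,
\begin{align*}
\mathbb{P}\!\left(\left|\frac{\log X_n}{n} - \mu\right| \geq 0.1\right) \leq \frac{4}{n \cdot 0.01} \leq \frac{4}{8000 \cdot 0.01} = 0.05.
\end{align*}
The threshold grows like $1/\varepsilon$ and like $1/\delta^2$: halving the tolerance $\delta$ quadruples the number of factors this argument requires.
[/guided]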
[/step]