[guided]The goal of this step is to replace each Bernoulli random variable by a uniform exponential estimate that does not depend on its parameter $p_i$. Work on the probability space $(\Omega,\mathcal F,\mathbb P)$ from the statement. Fix $i \in \{1,\dots,n\}$ and define $X_i:\Omega \to \mathbb R$ by
\begin{align*}
X_i(\omega)=Y_i(\omega)-p_i.
\end{align*}
This is the centered version of $Y_i$, since $\mathbb E[X_i]=0$.
We prove that for every $\lambda\in\mathbb R$,
\begin{align*}
\mathbb E[e^{\lambda X_i}]\le e^{\lambda^2/8}.
\end{align*}
If $p_i=0$ or $p_i=1$, then $Y_i=p_i$ almost surely, so $X_i=0$ almost surely. Therefore $\mathbb E[e^{\lambda X_i}]=1$, and the estimate follows from $1\le e^{\lambda^2/8}$.
Now assume $0<p_i<1$. The exponential moment can be computed from the two possible values of $Y_i$:
\begin{align*}
\mathbb E[e^{\lambda X_i}]=\mathbb E[e^{\lambda(Y_i-p_i)}].
\end{align*}
Since $Y_i$ takes only the values $0$ and $1$,
\begin{align*}
\mathbb E[e^{\lambda(Y_i-p_i)}]=p_i e^{\lambda(1-p_i)}+(1-p_i)e^{-\lambda p_i}.
\end{align*}
We want to show that this is at most $e^{\lambda^2/8}$. Taking logarithms is useful because products of exponential moments will later become sums. Define the function $h_i:\mathbb R \to \mathbb R$ by
\begin{align*}
h_i(u)=\log(1-p_i+p_i e^u)-p_i u.
\end{align*}
Then
\begin{align*}
\mathbb E[e^{\lambda X_i}]
=e^{h_i(\lambda)}.
\end{align*}
We estimate $h_i$ by bounding its second derivative. Differentiating gives
\begin{align*}
h_i'(u)=\frac{p_i e^u}{1-p_i+p_i e^u}-p_i.
\end{align*}
Differentiating once more gives
\begin{align*}
h_i''(u)=\frac{p_i(1-p_i)e^u}{(1-p_i+p_i e^u)^2}.
\end{align*}
Define $q_i:\mathbb R \to (0,1)$ by
\begin{align*}
q_i(u)=\frac{p_i e^u}{1-p_i+p_i e^u}.
\end{align*}
With this notation,
\begin{align*}
h_i''(u)=q_i(u)(1-q_i(u)).
\end{align*}
For every $q\in[0,1]$,
\begin{align*}
q(1-q)\le \frac14,
\end{align*}
so $h_i''(u)\le 1/4$ for every $u\in\mathbb R$. Since $h_i(0)=0$ and $h_i'(0)=0$, and since $\mathcal L^1$ denotes one-dimensional Lebesgue measure on $\mathbb R$, the function $h_i$ is twice continuously differentiable on $\mathbb R$, so [Taylor's Theorem](/theorems/827) with Integral Remainder gives, for $\lambda\ge 0$,
\begin{align*}
h_i(\lambda)=\int_0^\lambda (\lambda-u)h_i''(u)\,d\mathcal L^1(u)\le \frac14\int_0^\lambda (\lambda-u)\,d\mathcal L^1(u)=\frac{\lambda^2}{8}.
\end{align*}
For $\lambda<0$, we write the same remainder formula over $[\lambda,0]$ as
\begin{align*}
h_i(\lambda)=\int_\lambda^0 (u-\lambda)h_i''(u)\,d\mathcal L^1(u)\le \frac14\int_\lambda^0 (u-\lambda)\,d\mathcal L^1(u)=\frac{\lambda^2}{8}.
\end{align*}
Therefore
\begin{align*}
\mathbb E[e^{\lambda X_i}]
=e^{h_i(\lambda)}
\le e^{\lambda^2/8}.
\end{align*}
This is the special Bernoulli form of the sub-Gaussian estimate that drives the Chernoff-Hoeffding bound.[/guided]