[proofplan]
We convert each centred sub-Gaussian variable from its $\psi_2$ norm bound into a moment-generating-function bound with variance proxy proportional to $\|X_i\|_{\psi_2}^2$. Independence then factors the moment-generating function of the linear combination into the product of the individual moment-generating functions. Multiplying the resulting estimates gives a single sub-Gaussian moment-generating-function bound with proxy $\sum_i a_i^2\|X_i\|_{\psi_2}^2$, and the same equivalence converts this back to a $\psi_2$ norm estimate.
[/proofplan]
[step:Convert the $\psi_2$ bounds into moment-generating-function bounds]
For each $i \in \{1,\dots,n\}$, define
\begin{align*}
K_i := \|X_i\|_{\psi_2}.
\end{align*}
By the Sub-Gaussian MGF-Orlicz Norm Equivalence there exists a universal constant $c_0>0$ such that, because $X_i$ is centred and $K_i<\infty$,
\begin{align*}
\mathbb E\left[\exp\left(tX_i\right)\right]
\leq
\exp\left(c_0t^2K_i^2\right)
\end{align*}
for every $t\in\mathbb R$ and every $i \in \{1,\dots,n\}$.
[guided]
The role of the $\psi_2$ norm is to quantify sub-Gaussian decay. The form we need for sums is the moment-generating-function form, because independence turns moment-generating functions of sums into products.
For each $i \in \{1,\dots,n\}$, set
\begin{align*}
K_i := \|X_i\|_{\psi_2}.
\end{align*}
The hypotheses give $K_i<\infty$ and $\mathbb E[X_i]=0$. The Sub-Gaussian MGF-Orlicz Norm Equivalence says that a centred real-valued [random variable](/page/Random%20Variable) $Y$ with finite $\psi_2$ norm satisfies
\begin{align*}
\mathbb E\left[\exp\left(tY\right)\right]
\leq
\exp\left(c_0t^2\|Y\|_{\psi_2}^2\right)
\end{align*}
for every $t\in\mathbb R$, where $c_0>0$ is a universal constant. Applying this statement to $Y=X_i$ gives
\begin{align*}
\mathbb E\left[\exp\left(tX_i\right)\right]
\leq
\exp\left(c_0t^2K_i^2\right)
\end{align*}
for every $t\in\mathbb R$ and every $i \in \{1,\dots,n\}$.
This is the point where centering is used: the moment-generating-function characterization in this form gives a purely quadratic exponent only for centred sub-Gaussian variables.
[/guided]
[/step]
[step:Factor the moment-generating function of the linear combination]
Let $(\Omega,\mathcal F,\mathbb P)$ denote the underlying probability space. Define the real-valued random variable $S_a:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
S_a(\omega) := \sum_{i=1}^n a_iX_i(\omega)
\end{align*}
for $\omega\in\Omega$. For arbitrary $t\in\mathbb R$, the identity $\exp(u+v)=\exp(u)\exp(v)$ gives
\begin{align*}
\exp\left(tS_a\right)=\prod_{i=1}^n \exp\left(ta_iX_i\right).
\end{align*}
Since $X_1,\dots,X_n$ are independent, the random variables $\exp(ta_iX_i)$ are independent. Their expectations are finite by the moment-generating-function bound from the previous step applied with $ta_i$ in place of $t$. Hence independence gives
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]=\prod_{i=1}^n \mathbb E\left[\exp\left(ta_iX_i\right)\right].
\end{align*}
Using the bound from the previous step with $ta_i$ in place of $t$ yields
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]\leq\prod_{i=1}^n \exp\left(c_0t^2a_i^2K_i^2\right).
\end{align*}
Multiplying exponentials and collecting the exponent gives
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]\leq\exp\left(c_0t^2\sum_{i=1}^n a_i^2K_i^2\right).
\end{align*}
[guided]
Let $(\Omega,\mathcal F,\mathbb P)$ be the probability space on which the random variables are defined. We first name the linear combination as a random variable: define $S_a:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ by
\begin{align*}
S_a(\omega) := \sum_{i=1}^n a_iX_i(\omega)
\end{align*}
for $\omega\in\Omega$. This definition is measurable because each $X_i$ is a real-valued random variable and finite linear combinations of measurable real-valued functions are measurable.
Why introduce $S_a$? The goal is to prove that this sum is sub-Gaussian, and the moment-generating-function form is designed for sums of independent variables. Fix $t\in\mathbb R$. Using the elementary exponential identity for a finite sum, we have
\begin{align*}
\exp\left(tS_a\right)=\prod_{i=1}^n \exp\left(ta_iX_i\right).
\end{align*}
Since $X_1,\dots,X_n$ are independent, the measurable transforms $\exp(ta_iX_i)$ are also independent. The previous step gives finite expectations for these transforms, because it applies to $X_i$ with the real parameter $ta_i$. Therefore the product rule for expectations of finitely many independent integrable random variables gives
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]=\prod_{i=1}^n \mathbb E\left[\exp\left(ta_iX_i\right)\right].
\end{align*}
Now apply the individual estimate with $ta_i$ in place of $t$ for each index $i\in\{1,\dots,n\}$. This gives
\begin{align*}
\mathbb E\left[\exp\left(ta_iX_i\right)\right]\leq\exp\left(c_0t^2a_i^2K_i^2\right).
\end{align*}
Multiplying these $n$ inequalities yields
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]\leq\prod_{i=1}^n \exp\left(c_0t^2a_i^2K_i^2\right).
\end{align*}
Finally, the product of exponentials is the exponential of the sum of exponents, so
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]\leq\exp\left(c_0t^2\sum_{i=1}^n a_i^2K_i^2\right).
\end{align*}
This is the desired moment-generating-function bound for the whole linear combination.
[/guided]
[/step]
[step:Translate the resulting moment-generating-function bound back to the $\psi_2$ norm]
Define
\begin{align*}
\sigma_a^2 := \sum_{i=1}^n a_i^2K_i^2.
\end{align*}
The previous step proves that
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]
\leq
\exp\left(c_0t^2\sigma_a^2\right)
\end{align*}
for every $t\in\mathbb R$. If $\sigma_a=0$, then $a_iK_i=0$ for every $i$, hence $a_iX_i=0$ almost surely for every $i$, so $S_a=0$ almost surely and the desired estimate holds.
Assume now that $\sigma_a>0$. Since each $X_i$ is centred, linearity of expectation and finiteness of the first moments of sub-Gaussian random variables give
\begin{align*}
\mathbb E[S_a]=\sum_{i=1}^n a_i\mathbb E[X_i]=0.
\end{align*}
The moment-generating-function bound just proved is exactly the hypothesis in the converse direction of the Sub-Gaussian MGF-Orlicz Norm Equivalence, applied to the centred random variable $S_a$ with variance proxy $c_0\sigma_a^2$. Hence there is a universal constant $c_1>0$ such that
\begin{align*}
\|S_a\|_{\psi_2}\leq c_1\sqrt{c_0}\,\sigma_a.
\end{align*}
Define the universal constant $C:=c_1\sqrt{c_0}$. Since $K_i=\|X_i\|_{\psi_2}$ for every $i\in\{1,\dots,n\}$ and $S_a=\sum_{i=1}^n a_iX_i$, this becomes
\begin{align*}
\left\|\sum_{i=1}^n a_iX_i\right\|_{\psi_2}\leq C\left(\sum_{i=1}^n a_i^2\|X_i\|_{\psi_2}^2\right)^{1/2}.
\end{align*}
The constant $C$ is universal because $c_0$ and $c_1$ are universal.
[guided]
We now convert the moment-generating-function estimate back into the Orlicz norm estimate. Define
\begin{align*}
\sigma_a^2 := \sum_{i=1}^n a_i^2K_i^2.
\end{align*}
The previous step proved that, for every $t\in\mathbb R$,
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]\leq\exp\left(c_0t^2\sigma_a^2\right).
\end{align*}
If $\sigma_a=0$, then every summand $a_i^2K_i^2$ is zero. Hence $a_iK_i=0$ for every $i\in\{1,\dots,n\}$. If $a_i=0$, then $a_iX_i=0$ almost surely. If $K_i=0$, then the $\psi_2$ norm of $X_i$ is zero, so $X_i=0$ almost surely. Therefore $a_iX_i=0$ almost surely for every $i$, and $S_a=0$ almost surely. The desired estimate follows in this case.
Assume $\sigma_a>0$. The converse direction of the Sub-Gaussian MGF-Orlicz Norm Equivalence requires a centred random variable and a quadratic moment-generating-function bound. We verify both hypotheses. Since each $X_i$ is centred and sub-Gaussian random variables have finite first moments, linearity of expectation gives
\begin{align*}
\mathbb E[S_a]=\sum_{i=1}^n a_i\mathbb E[X_i]=0.
\end{align*}
The quadratic moment-generating-function bound is precisely the estimate already obtained:
\begin{align*}
\mathbb E\left[\exp\left(tS_a\right)\right]\leq\exp\left(c_0t^2\sigma_a^2\right)
\end{align*}
for every $t\in\mathbb R$. Applying the converse characterization to $S_a$ with variance proxy $c_0\sigma_a^2$ gives a universal constant $c_1>0$ such that
\begin{align*}
\|S_a\|_{\psi_2}\leq c_1\sqrt{c_0}\,\sigma_a.
\end{align*}
Define $C:=c_1\sqrt{c_0}$. Substituting back $S_a=\sum_{i=1}^n a_iX_i$ and $K_i=\|X_i\|_{\psi_2}$ gives
\begin{align*}
\left\|\sum_{i=1}^n a_iX_i\right\|_{\psi_2}\leq C\left(\sum_{i=1}^n a_i^2\|X_i\|_{\psi_2}^2\right)^{1/2}.
\end{align*}
This is the claimed estimate, and $C$ is universal because it is built only from the universal constants $c_0$ and $c_1$.
[/guided]
[/step]