[proofplan]
We apply Markov's inequality to the non-negative random variable $(X - \mathbb{E}[X])^2$. Markov's inequality bounds the probability that a non-negative random variable is at least a given threshold by its expectation divided by that threshold, and specialising to the squared deviation yields Chebyshev's inequality.
[/proofplan]
[step:Establish Markov's inequality for a non-negative random variable]
[claim:Markov's Inequality]
Let $Z \ge 0$ be a random variable with $\mathbb{E}[Z] < \infty$. For any $t > 0$,
\begin{align*}
\mathbb{P}(Z \ge t) \le \frac{\mathbb{E}[Z]}{t}.
\end{align*}
[/claim]
[proof]
Since $Z \ge 0$, we have $Z \ge t \cdot \mathbb{1}_{\{Z \ge t\}}$ (the right side equals $t$ when $Z \ge t$ and $0$ otherwise, so $Z$ dominates it in both cases). Taking expectations of both sides and using monotonicity of expectation:
\begin{align*}
\mathbb{E}[Z] \ge \mathbb{E}[t \cdot \mathbb{1}_{\{Z \ge t\}}] = t \cdot \mathbb{E}[\mathbb{1}_{\{Z \ge t\}}] = t \cdot \mathbb{P}(Z \ge t).
\end{align*}
Dividing by $t > 0$ gives $\mathbb{P}(Z \ge t) \le \mathbb{E}[Z]/t$.
[/proof]
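As a sanity check on the bound just proved, the following minimal sketch evaluates both sides of Markov's inequality exactly for a finite distribution. The helper name `markov_check` and the fair-die example are illustrative assumptions, not part of the proof; the pmf is passed as a dictionary mapping values to probabilities.

```python
from fractions import Fraction


def markov_check(pmf, t):
    """Return (P(Z >= t), E[Z] / t) for a finite pmf {value: probability}.

    Hypothetical helper for illustration: Z must be non-negative and t > 0,
    matching the hypotheses of Markov's inequality.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if any(z < 0 for z in pmf):
        raise ValueError("Z must be non-negative")
    mean = sum(z * p for z, p in pmf.items())        # E[Z]
    tail = sum(p for z, p in pmf.items() if z >= t)  # P(Z >= t)
    return tail, mean / t


# Assumed example: Z is the outcome of a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
tail, bound = markov_check(die, 5)
print(tail, bound)  # 1/3 7/10
```

Here $\mathbb{P}(Z \ge 5) = 1/3$ while $\mathbb{E}[Z]/5 = 7/10$, so the inequality holds with room to spare, which is typical: Markov's bound uses only the mean.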
[/step]
[step:Apply Markov's inequality to $(X - \mathbb{E}[X])^2$ with threshold $a^2$]
Let $\mu = \mathbb{E}[X]$ and let $a > 0$. The event $\{|X - \mu| \ge a\}$ is identical to $\{(X - \mu)^2 \ge a^2\}$: both $|X - \mu|$ and $a$ are non-negative, and squaring is strictly increasing on $[0, \infty)$, so the inequality holds before squaring if and only if it holds after. The random variable $Z = (X - \mu)^2$ is non-negative with $\mathbb{E}[Z] = \operatorname{Var}(X) < \infty$. Applying [Markov's inequality](/theorems/514) with $t = a^2 > 0$:
\begin{align*}
\mathbb{P}(|X - \mu| \ge a) = \mathbb{P}((X - \mu)^2 \ge a^2) \le \frac{\mathbb{E}[(X - \mu)^2]}{a^2} = \frac{\operatorname{Var}(X)}{a^2}.
\end{align*}
[guided]
The trick is to convert a probability about $|X - \mu|$ into a probability about a non-negative random variable, so that Markov's inequality applies.
Why square? The random variable $X - \mu$ can be negative, so Markov's inequality (which requires non-negativity) does not apply directly. But $(X - \mu)^2 \ge 0$, and the events $\{|X - \mu| \ge a\}$ and $\{(X - \mu)^2 \ge a^2\}$ are identical: squaring is strictly increasing on $[0, \infty)$, and both $|X - \mu|$ and $a$ are non-negative, so $|X - \mu| \ge a$ holds if and only if $(X - \mu)^2 \ge a^2$.
Once we recognise this, Markov gives $\mathbb{P}((X - \mu)^2 \ge a^2) \le \frac{\mathbb{E}[(X - \mu)^2]}{a^2}$, and $\mathbb{E}[(X - \mu)^2]$ is exactly $\operatorname{Var}(X)$ by definition.
[/guided]
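The argument of this step can be checked numerically on a small example. The sketch below, again for an assumed fair die and with the hypothetical helper name `chebyshev_check`, computes the probability of the absolute-deviation event, the probability of the squared-deviation event (which should coincide), and the bound $\operatorname{Var}(X)/a^2$, all in exact rational arithmetic.

```python
from fractions import Fraction


def chebyshev_check(pmf, a):
    """Return (P(|X - mu| >= a), P((X - mu)^2 >= a^2), Var(X) / a^2).

    Hypothetical helper for illustration; pmf is a finite {value: probability}
    dictionary and a must be positive.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    mu = sum(x * p for x, p in pmf.items())               # E[X]
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
    tail_abs = sum(p for x, p in pmf.items() if abs(x - mu) >= a)
    tail_sq = sum(p for x, p in pmf.items() if (x - mu) ** 2 >= a ** 2)
    return tail_abs, tail_sq, var / a ** 2


# Assumed example: X is a fair six-sided die, so mu = 7/2 and Var(X) = 35/12.
die = {k: Fraction(1, 6) for k in range(1, 7)}
tail_abs, tail_sq, bound = chebyshev_check(die, 2)
print(tail_abs, tail_sq, bound)  # 1/3 1/3 35/48
```

With $a = 2$, only the outcomes $1$ and $6$ lie at distance at least $2$ from $\mu = 7/2$, so both event probabilities equal $1/3$, which is below the bound $35/48$.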
[/step]