[guided]We want an upper bound for $\mathbb P(X \ge a)$ of exponential form, with the exponent given by a rate function of $a$. The standard device is to replace the event involving $X$ by an event involving $e^{\lambda X}$, because exponential moments are easy to compute for a Bernoulli random variable.
Fix $a \in (p,1)$ and choose $\lambda > 0$. Since $t \mapsto e^{\lambda t}$ is increasing, whenever $X \ge a$ we also have $e^{\lambda X} \ge e^{\lambda a}$. Hence
\begin{align*}
\{X \ge a\} \subseteq \{e^{\lambda X} \ge e^{\lambda a}\}.
\end{align*}
The random variable $e^{\lambda X}: \Omega \to (0,\infty)$ is non-negative. Therefore
\begin{align*}
e^{\lambda a}\mathbb P(e^{\lambda X} \ge e^{\lambda a})
\le \mathbb E[e^{\lambda X}],
\end{align*}
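because $e^{\lambda X}\ge e^{\lambda a}$ on the event $\{e^{\lambda X} \ge e^{\lambda a}\}$, so the part of the expectation coming from that event alone is at least $e^{\lambda a}$ times its probability, and the remaining part is non-negative. We now combine these two observations. First, by monotonicity of probability, the inclusion of events gives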
\begin{align*}
\mathbb P(X \ge a)\le \mathbb P(e^{\lambda X} \ge e^{\lambda a}).
\end{align*}
Then dividing the Markov estimate above by $e^{\lambda a}>0$ gives
\begin{align*}
\mathbb P(e^{\lambda X} \ge e^{\lambda a})\le e^{-\lambda a}\mathbb E[e^{\lambda X}].
\end{align*}
Using $\mathbb E[e^{\lambda X}]=\exp\{\Lambda_X(\lambda)\}$, this yields
\begin{align*}
\mathbb P(X \ge a)\le \exp\{\Lambda_X(\lambda)-\lambda a\}.
\end{align*}
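As a quick sanity check of this estimate, consider the illustrative values $p=\tfrac14$, $a=\tfrac12$ and the not yet optimized choice $\lambda=1$; these numbers are assumptions made only for this example. Since $\mathbb E[e^{X}]=\tfrac34+\tfrac14 e$ for a Bernoulli variable with parameter $\tfrac14$,
\begin{align*}
\exp\{\Lambda_X(1)-\tfrac12\}
= e^{-1/2}\bigl(\tfrac34+\tfrac14 e\bigr)\approx 0.867,
\end{align*}
whereas the exact value is $\mathbb P(X \ge \tfrac12)=\mathbb P(X=1)=\tfrac14$. The estimate holds for this choice of $\lambda$, but it is far from tight for a single Bernoulli variable.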
Now we optimize the exponent. Define $\Phi_a: \mathbb R \to \mathbb R$ by
\begin{align*}
\Phi_a(\lambda)=\lambda a-\Lambda_X(\lambda).
\end{align*}
Since $\Lambda_X(\lambda)=\log(1-p+p e^\lambda)$, differentiation gives
\begin{align*}
\Phi_a'(\lambda)
= a-\frac{p e^\lambda}{1-p+p e^\lambda}.
\end{align*}
The optimal exponential tilt should make this derivative vanish, so we solve
\begin{align*}
a=\frac{p e^\lambda}{1-p+p e^\lambda}.
\end{align*}
Multiplying both sides by $1-p+p e^\lambda$ and expanding gives
\begin{align*}
a(1-p)+ap e^\lambda = p e^\lambda,
\end{align*}
hence
\begin{align*}
a(1-p)=p(1-a)e^\lambda,
\end{align*}
and therefore
\begin{align*}
e^\lambda=\frac{a(1-p)}{p(1-a)}.
\end{align*}
Define
\begin{align*}
\lambda_a=\log\frac{a(1-p)}{p(1-a)}.
\end{align*}
Because $a>p$, we have $a(1-p)-p(1-a)=a-p>0$, so $a(1-p)>p(1-a)$ and therefore $\lambda_a>0$. This matters because the upper-tail argument was carried out only for $\lambda>0$, where the map $t \mapsto e^{\lambda t}$ is increasing, so the stationary point is an admissible choice of $\lambda$.
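To illustrate the formula for $\lambda_a$, take the example values $p=\tfrac14$ and $a=\tfrac12$ (an assumption made only for this illustration). Then
\begin{align*}
e^{\lambda_a}=\frac{\tfrac12\cdot\tfrac34}{\tfrac14\cdot\tfrac12}=3,
\qquad
\lambda_a=\log 3\approx 1.0986>0,
\end{align*}
and substituting back confirms the stationarity condition:
\begin{align*}
\frac{p e^{\lambda_a}}{1-p+p e^{\lambda_a}}
=\frac{\tfrac34}{\tfrac34+\tfrac34}=\tfrac12=a.
\end{align*}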
To confirm that this stationary point really is the optimizer, compute the second derivative:
\begin{align*}
\Phi_a''(\lambda)
= -\frac{p(1-p)e^\lambda}{(1-p+p e^\lambda)^2}.
\end{align*}
Since $p \in (0,1)$ and $e^\lambda>0$, this quantity is strictly negative for every $\lambda \in \mathbb R$. Thus $\Phi_a$ is strictly concave, and its stationary point $\lambda_a$ is the unique maximizer.
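As a numerical illustration of the concavity, take the example values $p=\tfrac14$ and $a=\tfrac12$ (assumed only for this check), for which $e^{\lambda_a}=\frac{a(1-p)}{p(1-a)}=3$. Evaluating the second derivative at the stationary point gives
\begin{align*}
\Phi_a''(\lambda_a)
=-\frac{\tfrac14\cdot\tfrac34\cdot 3}{\bigl(\tfrac34+\tfrac14\cdot 3\bigr)^2}
=-\frac{9/16}{9/4}
=-\tfrac14<0,
\end{align*}
consistent with $\lambda_a$ being a maximizer rather than a minimizer.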
It remains to compute the value of the maximum. Using the defining formula for $\lambda_a$,
\begin{align*}
1-p+p e^{\lambda_a}=1-p+p\frac{a(1-p)}{p(1-a)}.
\end{align*}
Cancelling $p$ in the second term gives
\begin{align*}
1-p+p e^{\lambda_a}=1-p+\frac{a(1-p)}{1-a}.
\end{align*}
Putting the terms over the common denominator $1-a$ gives
\begin{align*}
1-p+p e^{\lambda_a}=\frac{(1-p)(1-a)+a(1-p)}{1-a}.
\end{align*}
The numerator simplifies to $1-p$, so
\begin{align*}
1-p+p e^{\lambda_a}=\frac{1-p}{1-a}.
\end{align*}
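This simplification can be checked on the example values $p=\tfrac14$ and $a=\tfrac12$ (assumed only for illustration), for which $e^{\lambda_a}=\frac{a(1-p)}{p(1-a)}=3$. Computing both sides directly gives
\begin{align*}
1-p+p e^{\lambda_a}=\tfrac34+\tfrac14\cdot 3=\tfrac32,
\qquad
\frac{1-p}{1-a}=\frac{3/4}{1/2}=\tfrac32.
\end{align*}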
By the definition of $\Phi_a$ and the formula for $\Lambda_X$,
\begin{align*}
\Phi_a(\lambda_a)=a\lambda_a-\log(1-p+p e^{\lambda_a}).
\end{align*}
Using the formulas for $\lambda_a$ and $1-p+p e^{\lambda_a}$, this becomes
\begin{align*}
\Phi_a(\lambda_a)=a\log\frac{a(1-p)}{p(1-a)}-\log\frac{1-p}{1-a}.
\end{align*}
Separating logarithms gives
\begin{align*}
\Phi_a(\lambda_a)=a\log\frac{a}{p}+a\log\frac{1-p}{1-a}-\log\frac{1-p}{1-a}.
\end{align*}
Combining the last two terms gives
\begin{align*}
\Phi_a(\lambda_a)=a\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}.
\end{align*}
By the definition of $D(a\|p)$,
\begin{align*}
\Phi_a(\lambda_a)=D(a\|p).
\end{align*}
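For orientation, here is the same computation with the example values $p=\tfrac14$ and $a=\tfrac12$, which are assumptions made only for this illustration. In that case $\lambda_a=\log 3$ and $1-p+p e^{\lambda_a}=\tfrac32$, so
\begin{align*}
\Phi_a(\lambda_a)=\tfrac12\log 3-\log\tfrac32=\tfrac12\log\tfrac43\approx 0.1438,
\end{align*}
while the closed form gives
\begin{align*}
D(\tfrac12\|\tfrac14)=\tfrac12\log 2+\tfrac12\log\tfrac23=\tfrac12\log\tfrac43.
\end{align*}
The two values agree, and $e^{-D(1/2\|1/4)}=\tfrac{\sqrt3}{2}\approx 0.866$, compared with the exact value $\mathbb P(X \ge \tfrac12)=\mathbb P(X=1)=\tfrac14$.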
Taking $\lambda=\lambda_a$ in the exponential estimate gives
\begin{align*}
\mathbb P(X \ge a)\le \exp\{\Lambda_X(\lambda_a)-\lambda_a a\}.
\end{align*}
Since $\Phi_a(\lambda)=\lambda a-\Lambda_X(\lambda)$, the exponent is $-\Phi_a(\lambda_a)$, so
\begin{align*}
\mathbb P(X \ge a)\le \exp\{-\Phi_a(\lambda_a)\}.
\end{align*}
Using $\Phi_a(\lambda_a)=D(a\|p)$, we conclude
\begin{align*}
\mathbb P(X \ge a)\le \exp\{-D(a\|p)\}.
\end{align*}
This proves the upper-tail bound.[/guided]