[guided]The purpose of this step is to reduce the original probability space to the two-set partition $\{A_\varepsilon, A_\varepsilon^c\}$. This reduction suffices because the total variation distance depends only on the gaps $P(E)-Q(E)$ over measurable sets $E$, and a single nearly extremal set carries enough information to prove the estimate.
Since $P \ll Q$, the [Radon-Nikodym theorem](/page/Radon-Nikodym%20Theorem) gives a Radon-Nikodym derivative. Let $r:\Omega\to[0,\infty]$ be the measurable function given by $r(\omega)=\frac{dP}{dQ}(\omega)$ for $\omega\in\Omega$. Thus, for every measurable set $E \in \mathcal{F}$,
\begin{align*}
P(E)=\int_E r\,dQ.
\end{align*}
Because $P(\Omega)=\int_\Omega r\,dQ=1<\infty$, the function $r$ is finite $Q$-a.e. Redefining $r$ to be $0$ on the $Q$-null set where it is infinite, we may assume that $r$ takes values in $[0,\infty)$. This redefinition preserves the displayed identity for every $E\in\mathcal{F}$ and preserves all integrals with respect to $Q$.
We also define the function $\varphi:[0,\infty)\to\mathbb{R}$ by $\varphi(t)=t\log t$ for $t>0$ and $\varphi(0)=0$. The function $\varphi$ is convex on $[0,\infty)$: on $(0,\infty)$ it has second derivative $\varphi''(t)=1/t>0$, and since $t\log t\to 0$ as $t\to 0^+$, the value $\varphi(0)=0$ extends it continuously, which preserves convexity. Moreover $\varphi\geq -1/e$, so $\int_E\varphi(r)\,dQ$ is well defined in $(-\infty,\infty]$ for every $E\in\mathcal{F}$. The relative entropy can be written as
\begin{align*}
D_{\mathrm{KL}}(P \mid Q)=\int_\Omega \varphi(r)\,dQ.
\end{align*}
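As a sanity check on this formula, consider a hypothetical three-point space that is not part of the argument itself: $\Omega=\{1,2,3\}$ with $P=(\tfrac12,\tfrac14,\tfrac14)$ and $Q=(\tfrac14,\tfrac14,\tfrac12)$. Then $r=(2,1,\tfrac12)$ and the integral reduces to a finite sum:
\begin{align*}
\int_\Omega \varphi(r)\,dQ
&=\tfrac14\cdot 2\log 2+\tfrac14\cdot 1\log 1+\tfrac12\cdot\tfrac12\log\tfrac12\\
&=\tfrac12\log 2-\tfrac14\log 2
=\tfrac14\log 2.
\end{align*}
This agrees with the familiar sum $\sum_i P_i\log(P_i/Q_i)=\tfrac12\log 2+\tfrac14\log 1+\tfrac14\log\tfrac12=\tfrac14\log 2$.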
We now estimate the contribution from $A_\varepsilon$. If $b_\varepsilon=Q(A_\varepsilon)>0$, then $Q(\cdot \cap A_\varepsilon)/b_\varepsilon$ is a probability measure on $A_\varepsilon$. [Jensen's inequality](/page/Jensen%27s%20Inequality) applied to the convex function $\varphi$ gives
\begin{align*}
\frac{1}{b_\varepsilon}\int_{A_\varepsilon}\varphi(r)\,dQ
\geq
\varphi\left(
\frac{1}{b_\varepsilon}\int_{A_\varepsilon}r\,dQ
\right).
\end{align*}
Multiplying by $b_\varepsilon$ and using $\int_{A_\varepsilon}r\,dQ=P(A_\varepsilon)=a_\varepsilon$, we get (with the convention $0\log 0=0$ when $a_\varepsilon=0$)
\begin{align*}
\int_{A_\varepsilon}\varphi(r)\,dQ
\geq
b_\varepsilon\varphi\left(\frac{a_\varepsilon}{b_\varepsilon}\right)
=
a_\varepsilon\log\left(\frac{a_\varepsilon}{b_\varepsilon}\right).
\end{align*}
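If $b_\varepsilon=0$, absolute continuity $P \ll Q$ forces $a_\varepsilon=P(A_\varepsilon)=0$. In that case the integral over the $Q$-null set $A_\varepsilon$ is $0$, and the term $a_\varepsilon\log(a_\varepsilon/b_\varepsilon)$ is interpreted as $0$, so the inequality remains valid.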
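To illustrate the Jensen step, here is a small worked example; the space, the measures, and the set are assumptions chosen only for illustration. Take $\Omega=\{1,2,3\}$, $P=(\tfrac12,\tfrac14,\tfrac14)$, $Q=(\tfrac14,\tfrac14,\tfrac12)$, and use $A=\{1,2\}$ in the role of $A_\varepsilon$. Then $r=(2,1,\tfrac12)$, $a=P(A)=\tfrac34$ and $b=Q(A)=\tfrac12$, and
\begin{align*}
\int_{A}\varphi(r)\,dQ
&=\tfrac14\cdot 2\log 2+\tfrac14\cdot 1\log 1
=\tfrac12\log 2\approx 0.347,\\
a\log\left(\frac{a}{b}\right)
&=\tfrac34\log\tfrac32\approx 0.304.
\end{align*}
The inequality is strict here, as expected: $\varphi$ is strictly convex and $r$ is not constant on $A$.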
We next apply the same Jensen argument to the complement $A_\varepsilon^c$, treating the degenerate case $1-b_\varepsilon=0$ separately. Note that $Q(A_\varepsilon^c)=1-b_\varepsilon$ and $P(A_\varepsilon^c)=1-a_\varepsilon$. If $1-b_\varepsilon>0$, then $Q(\cdot\cap A_\varepsilon^c)/(1-b_\varepsilon)$ is a probability measure on $A_\varepsilon^c$, and [Jensen's inequality](/page/Jensen%27s%20Inequality) gives
\begin{align*}
\int_{A_\varepsilon^c}\varphi(r)\,dQ
\geq
(1-a_\varepsilon)\log\left(\frac{1-a_\varepsilon}{1-b_\varepsilon}\right).
\end{align*}
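If $1-b_\varepsilon=0$, then $Q(A_\varepsilon^c)=0$, and absolute continuity $P\ll Q$ gives $P(A_\varepsilon^c)=1-a_\varepsilon=0$. The integral over $A_\varepsilon^c$ is then $0$, and the complementary term $(1-a_\varepsilon)\log\left(\frac{1-a_\varepsilon}{1-b_\varepsilon}\right)$ is interpreted as $0$, so the inequality still holds.
Since $\Omega$ is the disjoint union of $A_\varepsilon$ and $A_\varepsilon^c$, adding the two estimates and using the integral formula for $D_{\mathrm{KL}}(P \mid Q)$ yields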
\begin{align*}
D_{\mathrm{KL}}(P \mid Q)
\geq
a_\varepsilon\log\left(\frac{a_\varepsilon}{b_\varepsilon}\right)
+
(1-a_\varepsilon)\log\left(\frac{1-a_\varepsilon}{1-b_\varepsilon}\right).
\end{align*}
The right-hand side is exactly the relative entropy of the two-point distribution $(a_\varepsilon,1-a_\varepsilon)$ with respect to $(b_\varepsilon,1-b_\varepsilon)$.[/guided]