[proofplan]
We introduce the mixture law $\overline{P}$ of the observation $X$ and express mutual information as the average divergence from each conditional law $P_j$ to $\overline{P}$. We then compare this expression with an arbitrary reference measure $Q$ by expanding Radon-Nikodym derivatives, obtaining an exact decomposition with a non-negative remainder $D(\overline{P}\|Q)$. Finally, when all pairwise divergences are finite, we use the convexity of relative entropy in its second argument to bound each $D(P_j\|\overline{P})$ by the average of the divergences $D(P_j\|P_k)$.
[/proofplan]
[step:Identify the marginal law of $X$ and express mutual information through conditional laws]
Define the mixture probability measure $\overline{P}$ on $(E,\mathcal{E})$ by
\begin{align*}
\overline{P}(A) := \frac{1}{M}\sum_{j=1}^{M} P_j(A), \qquad A \in \mathcal{E}.
\end{align*}
Since $V$ is uniform and the conditional law of $X$ given $V=j$ is $P_j$, the marginal law of $X$ is $\overline{P}$.
For every $j \in \{1,\dots,M\}$, $P_j \ll \overline{P}$: if $\overline{P}(A)=0$, then $\sum_{k=1}^{M}P_k(A)=0$, hence $P_j(A)=0$. Let $r_j: E \to [0,\infty)$ be a Radon-Nikodym derivative, defined $\overline{P}$-almost everywhere by
\begin{align*}
r_j(x) := \frac{dP_j}{d\overline{P}}(x).
\end{align*}
The joint law of $(V,X)$ is
\begin{align*}
\mathbb{P}_{V,X}(\{j\}\times A)=\frac{1}{M}P_j(A), \qquad j \in \{1,\dots,M\},\ A \in \mathcal{E},
\end{align*}
while the product of the marginals is
\begin{align*}
(\mathbb{P}_V\otimes \mathbb{P}_X)(\{j\}\times A)=\frac{1}{M}\overline{P}(A).
\end{align*}
Therefore the Radon-Nikodym derivative of $\mathbb{P}_{V,X}$ with respect to $\mathbb{P}_V\otimes \mathbb{P}_X$ is $r_j(x)$ on $\{j\}\times E$. By the definition of mutual information as relative entropy of the joint law with respect to the product law,
\begin{align*}
I(V;X)= \sum_{j=1}^{M}\frac{1}{M}\int_E \log r_j(x)\, dP_j(x).
\end{align*}
Using the definition of relative entropy for each pair $(P_j,\overline{P})$, this becomes
\begin{align*}
I(V;X)= \frac{1}{M}\sum_{j=1}^{M} D(P_j\|\overline{P}).
\end{align*}
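When $E$ is finite, the identity $I(V;X)=\frac{1}{M}\sum_{j}D(P_j\|\overline{P})$ can be checked numerically by also computing the mutual information directly, as the divergence of the joint law from the product of the marginals. The following is a minimal sketch under stated assumptions: $E$ is a finite set, the conditional laws $P_j$ are the rows of a NumPy array `P`, logarithms are natural, and `kl`, `mutual_information_from_joint` and `mutual_information_from_mixture` are hypothetical helper names introduced only for this illustration.

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p||q) in nats for probability vectors, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf  # p is not absolutely continuous with respect to q
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def mutual_information_from_joint(P):
    """I(V;X) = D(joint || product of marginals) for V uniform on the rows of P."""
    M = P.shape[0]
    joint = P / M                                   # P(V=j, X=x) = P_j(x) / M
    product = np.outer(np.full(M, 1.0 / M), P.mean(axis=0))
    return kl(joint.ravel(), product.ravel())


def mutual_information_from_mixture(P):
    """I(V;X) as the average of D(P_j || Pbar), where Pbar is the row mixture."""
    P_bar = P.mean(axis=0)
    return float(np.mean([kl(row, P_bar) for row in P]))


rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(5), size=3)               # M = 3 laws on a 5-point set
print(mutual_information_from_joint(P), mutual_information_from_mixture(P))
```

For identical rows both functions return $0$, and for the point masses `np.eye(M)` they return $\log M$, as expected when $X$ determines $V$.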
[/step]
[step:Decompose the average divergence through an arbitrary reference law $Q$]
If $D(P_j\|Q)=+\infty$ for some $j$, then
\begin{align*}
I(V;X) \leq \frac{1}{M}\sum_{j=1}^{M}D(P_j\|Q)
\end{align*}
holds because the right-hand side is $+\infty$. Assume therefore that $D(P_j\|Q)<+\infty$ for all $j$. Then $P_j\ll Q$ for all $j$, and hence $\overline{P}\ll Q$.
Let $p_j: E \to [0,\infty)$ be a Radon-Nikodym derivative, defined $Q$-almost everywhere by
\begin{align*}
p_j(x) := \frac{dP_j}{dQ}(x),
\end{align*}
and let $\overline{p}: E \to [0,\infty)$ be a Radon-Nikodym derivative, defined $Q$-almost everywhere by
\begin{align*}
\overline{p}(x) := \frac{d\overline{P}}{dQ}(x).
\end{align*}
Since $\overline{P}=M^{-1}\sum_{k=1}^{M}P_k$, we may choose
\begin{align*}
\overline{p}(x)=\frac{1}{M}\sum_{k=1}^{M}p_k(x)
\end{align*}
for $Q$-almost every $x \in E$. Also,
\begin{align*}
\frac{dP_j}{d\overline{P}}(x)=\frac{p_j(x)}{\overline{p}(x)}
\end{align*}
for $P_j$-almost every $x \in E$. Moreover $p_j(x) \leq M\overline{p}(x)$ for $Q$-almost every $x\in E$, so $\log(p_j/\overline{p}) \leq \log M$ on the set where $p_j>0$. Thus the following use of the logarithm does not form an undefined $+\infty-\infty$ expression. Therefore
\begin{align*}
D(P_j\|\overline{P})= \int_E \log\left(\frac{p_j(x)}{\overline{p}(x)}\right)\, dP_j(x).
\end{align*}
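Since $D(P_j\|Q)<+\infty$, the positive part of $\log p_j$ is $P_j$-integrable; its negative part is always $P_j$-integrable, because $\int_E(\log p_j)^-\,dP_j=\int_{\{0<p_j<1\}}p_j\log(1/p_j)\,dQ\leq e^{-1}$. Hence $\log p_j\in L^1(P_j)$. On the set where $p_j>0$ the preceding bound gives $\log\overline{p}\geq \log p_j-\log M$, so the negative part of $\log\overline{p}$ is $P_j$-integrable as well. Hence the subtraction below is well-defined in the extended real sense: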
\begin{align*}
D(P_j\|\overline{P})= \int_E \log p_j(x)\, dP_j(x)-\int_E \log \overline{p}(x)\, dP_j(x).
\end{align*}
Averaging over $j$ and using the identity $I(V;X)=\frac{1}{M}\sum_{j=1}^{M}D(P_j\|\overline{P})$ from the first step gives
\begin{align*}
I(V;X)= \frac{1}{M}\sum_{j=1}^{M}\int_E \log p_j(x)\, dP_j(x)-\frac{1}{M}\sum_{j=1}^{M}\int_E \log \overline{p}(x)\, dP_j(x).
\end{align*}
By the definition of $D(P_j\|Q)$ and the identity $\overline{P}=M^{-1}\sum_{j=1}^{M}P_j$, this is
\begin{align*}
I(V;X)= \frac{1}{M}\sum_{j=1}^{M}D(P_j\|Q)-\int_E \log \overline{p}(x)\, d\overline{P}(x).
\end{align*}
By the definition of $D(\overline{P}\|Q)$, we conclude
\begin{align*}
I(V;X)= \frac{1}{M}\sum_{j=1}^{M}D(P_j\|Q)-D(\overline{P}\|Q).
\end{align*}
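The decomposition through $Q$ can be tested the same way on a finite $E$. In this sketch, which again assumes natural logarithms and uses a hypothetical helper `kl` for discrete relative entropy, the reference vector `Q` is drawn from a Dirichlet distribution, so it has full support and every $D(P_j\|Q)$ is finite; the function `decomposition_sides` (also a hypothetical name) returns both sides of the identity.

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p||q) in nats for probability vectors, with 0 log 0 = 0."""
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def decomposition_sides(P, Q):
    """Return I(V;X) and avg_j D(P_j||Q) - D(Pbar||Q) for V uniform on the rows of P."""
    P_bar = P.mean(axis=0)
    mutual_info = np.mean([kl(row, P_bar) for row in P])
    avg_to_Q = np.mean([kl(row, Q) for row in P])
    return mutual_info, avg_to_Q - kl(P_bar, Q)


rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(6), size=4)   # M = 4 conditional laws on a 6-point set
Q = rng.dirichlet(np.ones(6))           # arbitrary reference law with full support
lhs, rhs = decomposition_sides(P, Q)
print(lhs, rhs)
```

Passing `Q = P.mean(axis=0)` makes the remainder $D(\overline{P}\|Q)$ vanish, so the right-hand side reduces to the average divergence from the first step.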
[guided]
The goal of this step is to compare every $P_j$ not with the true marginal $\overline{P}$, but with an arbitrary reference law $Q$. If some $D(P_j\|Q)$ is infinite, then the claimed upper bound is immediate because the right-hand side is $+\infty$. Thus the meaningful case is the finite case, where $D(P_j\|Q)<+\infty$ for every $j$.
Finite relative entropy implies absolute continuity, so $P_j\ll Q$ for every $j$. Since $\overline{P}$ is the average of the measures $P_j$, this also gives $\overline{P}\ll Q$. Define the Radon-Nikodym derivative map $p_j: E \to [0,\infty)$ for $j\in\{1,\dots,M\}$ by
\begin{align*}
p_j(x) := \frac{dP_j}{dQ}(x)
\end{align*}
for $Q$-almost every $x\in E$, and define the Radon-Nikodym derivative map $\overline{p}: E \to [0,\infty)$ by
\begin{align*}
\overline{p}(x) := \frac{d\overline{P}}{dQ}(x)
\end{align*}
for $Q$-almost every $x\in E$.
Because $\overline{P}=M^{-1}\sum_{k=1}^{M}P_k$, the derivative of the mixture is the mixture of the derivatives:
\begin{align*}
\overline{p}(x)=\frac{1}{M}\sum_{k=1}^{M}p_k(x)
\end{align*}
for $Q$-almost every $x\in E$.
Now compare $P_j$ to $\overline{P}$. The [chain rule for Radon-Nikodym derivatives](/theorems/1208) gives
\begin{align*}
\frac{dP_j}{d\overline{P}}(x)=\frac{p_j(x)}{\overline{p}(x)}
\end{align*}
for $P_j$-almost every $x\in E$. Substituting this into the relative entropy gives
\begin{align*}
D(P_j\|\overline{P})= \int_E \log\left(\frac{p_j(x)}{\overline{p}(x)}\right)\, dP_j(x).
\end{align*}
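Why is it legitimate to split the logarithm? Since $\overline{p}=M^{-1}\sum_{k=1}^{M}p_k$, we have $p_j\leq M\overline{p}$ $Q$-almost everywhere. Hence $\log(p_j/\overline{p})\leq \log M$ on the set where $p_j>0$. Also, $D(P_j\|Q)<+\infty$ makes the positive part of $\log p_j$ integrable with respect to $P_j$, while the negative part is always $P_j$-integrable because $t\log(1/t)\leq e^{-1}$ for $t\in(0,1)$; so $\log p_j\in L^1(P_j)$. Since $\log\overline{p}\geq \log p_j-\log M$ on $\{p_j>0\}$, the negative part of $\log\overline{p}$ is $P_j$-integrable as well. These facts ensure that the following subtraction is not an undefined $+\infty-\infty$ expression: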
\begin{align*}
D(P_j\|\overline{P})= \int_E \log p_j(x)\, dP_j(x)-\int_E \log \overline{p}(x)\, dP_j(x).
\end{align*}
The first integral is exactly $D(P_j\|Q)$. The second integral becomes a mixture integral after averaging over $j$:
\begin{align*}
\frac{1}{M}\sum_{j=1}^{M}\int_E \log \overline{p}(x)\, dP_j(x)= \int_E \log \overline{p}(x)\, d\overline{P}(x).
\end{align*}
By the definition of relative entropy with density $\overline{p}=d\overline{P}/dQ$, this last integral is
\begin{align*}
\int_E \log \overline{p}(x)\, d\overline{P}(x)= D(\overline{P}\|Q).
\end{align*}
Therefore
\begin{align*}
I(V;X)
= \frac{1}{M}\sum_{j=1}^{M}D(P_j\|Q)-D(\overline{P}\|Q).
\end{align*}
This identity is the central point: the average divergence to the reference law $Q$ exceeds the mutual information by exactly the divergence from the marginal law $\overline{P}$ to $Q$.
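A small example, chosen here for concreteness, makes the identity explicit. Take $M=2$, $E=\{a,b\}$, $P_1=\delta_a$ and $P_2=\delta_b$ (point masses), so that $\overline{P}$ puts mass $1/2$ on each point and $I(V;X)=\log 2$. Let $Q$ put mass $q\in(0,1)$ on $a$ and $1-q$ on $b$. Then
\begin{align*}
\frac{1}{2}\bigl(D(P_1\|Q)+D(P_2\|Q)\bigr)&=-\frac{1}{2}\log\bigl(q(1-q)\bigr),\\
D(\overline{P}\|Q)&=-\log 2-\frac{1}{2}\log\bigl(q(1-q)\bigr),
\end{align*}
and their difference is $\log 2=I(V;X)$ for every $q$. The remainder $D(\overline{P}\|Q)$ vanishes exactly at $q=1/2$, that is, at $Q=\overline{P}$.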
[/guided]
[/step]
[step:Use non-negativity of relative entropy to obtain the reference-law bound]
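We verify $D(\overline{P}\|Q)\geq 0$ in the case treated in the previous step, where $D(P_j\|Q)<+\infty$ for all $j$ and hence $\overline{P}\ll Q$; the remaining case was already settled there. Let $\overline{p}=d\overline{P}/dQ$. Since $\overline{P}$ is a probability measure and $\overline{P}\ll Q$,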
\begin{align*}
\int_E \overline{p}(x)\, dQ(x)=1.
\end{align*}
Using the inequality $\log t \leq t-1$ for $t>0$ with $t=1/\overline{p}(x)$ on the set where $\overline{p}(x)>0$, we obtain
\begin{align*}
-\log \overline{p}(x) \leq \frac{1}{\overline{p}(x)}-1.
\end{align*}
Multiplying by $\overline{p}(x)>0$ yields $-\overline{p}(x)\log \overline{p}(x)\leq 1-\overline{p}(x)$; with the convention $0\log 0=0$ this also holds where $\overline{p}(x)=0$, so it holds for every $x\in E$. On the other hand, by the definition of relative entropy,
\begin{align*}
-D(\overline{P}\|Q)= \int_E -\log \overline{p}(x)\, d\overline{P}(x).
\end{align*}
Using $d\overline{P}=\overline{p}\,dQ$, this equals
\begin{align*}
-D(\overline{P}\|Q)= \int_E -\overline{p}(x)\log \overline{p}(x)\, dQ(x).
\end{align*}
Integrating the pointwise inequality with respect to $Q$ gives
\begin{align*}
\int_E -\overline{p}(x)\log \overline{p}(x)\, dQ(x) \leq \int_E (1-\overline{p}(x))\, dQ(x).
\end{align*}
Since $Q$ and $\overline{P}$ are probability measures,
\begin{align*}
\int_E (1-\overline{p}(x))\, dQ(x)=Q(E)-\overline{P}(E)=0.
\end{align*}
Hence $D(\overline{P}\|Q)\geq 0$. Combining this with the decomposition from the previous step yields
\begin{align*}
I(V;X)\leq \frac{1}{M}\sum_{j=1}^{M}D(P_j\|Q).
\end{align*}
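Non-negativity of the remainder can also be probed numerically: over many random reference laws $Q$, the gap $\frac{1}{M}\sum_j D(P_j\|Q)-I(V;X)$ should never be negative, and it should vanish at $Q=\overline{P}$. The sketch assumes a finite $E$ and natural logarithms; `kl` and `reference_bound_gap` are hypothetical names, and a random search illustrates the inequality rather than proving it.

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p||q) in nats for probability vectors, with 0 log 0 = 0."""
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def reference_bound_gap(P, Q):
    """avg_j D(P_j||Q) - I(V;X); the proof identifies this gap with D(Pbar||Q)."""
    P_bar = P.mean(axis=0)
    mutual_info = np.mean([kl(row, P_bar) for row in P])
    return np.mean([kl(row, Q) for row in P]) - mutual_info


rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(5), size=3)
gaps = [reference_bound_gap(P, rng.dirichlet(np.ones(5))) for _ in range(1000)]
print(min(gaps), reference_bound_gap(P, P.mean(axis=0)))
```

A reference law that misses part of the support of some $P_j$, such as a point mass, makes the gap $+\infty$, which is the trivial case handled at the start of the previous step.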
[/step]
[step:Apply convexity in the second argument to obtain the pairwise bound]
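If $D(P_j\|P_k)=+\infty$ for some $j,k$, the right-hand side of the pairwise bound is $+\infty$ and there is nothing to prove, so assume $D(P_j\|P_k)<+\infty$ for every $j,k\in\{1,\dots,M\}$. Fix $j\in\{1,\dots,M\}$. Define the finite measure $\mu_j$ on $(E,\mathcal{E})$ by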
\begin{align*}
\mu_j := P_j+\sum_{k=1}^{M}P_k.
\end{align*}
Let $p: E \to [0,\infty)$ be a Radon-Nikodym derivative, defined $\mu_j$-almost everywhere by
\begin{align*}
p(x):=\frac{dP_j}{d\mu_j}(x).
\end{align*}
For $k\in\{1,\dots,M\}$, let $q_k: E \to [0,\infty)$ be a Radon-Nikodym derivative, defined $\mu_j$-almost everywhere by
\begin{align*}
q_k(x):=\frac{dP_k}{d\mu_j}(x).
\end{align*}
Define $\overline{q}: E \to [0,\infty)$ by
\begin{align*}
\overline{q}(x):=\frac{1}{M}\sum_{k=1}^{M}q_k(x).
\end{align*}
Then $\overline{q}$ is a Radon-Nikodym derivative of $\overline{P}$ with respect to $\mu_j$.
Since $D(P_j\|P_k)<+\infty$, we have $P_j\ll P_k$ for each $k$, so $q_k(x)>0$ for $P_j$-almost every $x\in E$. Hence, for $\mu_j$-almost every $x$ with $p(x)>0$, all the numbers $q_k(x)/p(x)$ are strictly positive, and by the definition of $\overline{q}$,
\begin{align*}
-\log\left(\frac{\overline{q}(x)}{p(x)}\right)= -\log\left(\frac{1}{M}\sum_{k=1}^{M}\frac{q_k(x)}{p(x)}\right).
\end{align*}
The function $a\mapsto -\log a$ is convex on $(0,\infty)$, so [Jensen's inequality](/theorems/1977) for this function, applied in its finite form ([Jensen's inequality](/theorems/9)) to the numbers $q_k(x)/p(x)>0$ with equal weights $1/M$, gives
\begin{align*}
-\log\left(\frac{1}{M}\sum_{k=1}^{M}\frac{q_k(x)}{p(x)}\right) \leq \frac{1}{M}\sum_{k=1}^{M}-\log\left(\frac{q_k(x)}{p(x)}\right).
\end{align*}
Multiplying by $p(x)$ turns this into $p(x)\log(p(x)/\overline{q}(x))\leq M^{-1}\sum_{k=1}^{M}p(x)\log(p(x)/q_k(x))$ for $\mu_j$-almost every $x$ with $p(x)>0$; where $p(x)=0$ both sides vanish under the convention $0\log 0=0$. Since $p/\overline{q}$ is a version of $dP_j/d\overline{P}$ on $\{p>0\}$ and $dP_j=p\,d\mu_j$,
\begin{align*}
D(P_j\|\overline{P})= \int_E p(x)\log\left(\frac{p(x)}{\overline{q}(x)}\right)\, d\mu_j(x).
\end{align*}
Integrating the pointwise [Jensen inequality](/theorems/515) with respect to $\mu_j$ therefore gives
\begin{align*}
D(P_j\|\overline{P}) \leq \frac{1}{M}\sum_{k=1}^{M}\int_E p(x)\log\left(\frac{p(x)}{q_k(x)}\right)\, d\mu_j(x).
\end{align*}
By the density representation of relative entropy with respect to the dominating measure $\mu_j$,
\begin{align*}
\frac{1}{M}\sum_{k=1}^{M}\int_E p(x)\log\left(\frac{p(x)}{q_k(x)}\right)\, d\mu_j(x)= \frac{1}{M}\sum_{k=1}^{M}D(P_j\|P_k).
\end{align*}
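For each fixed $j$, the argument above yields $D(P_j\|\overline{P})\leq \frac{1}{M}\sum_{k}D(P_j\|P_k)$, which is convexity of relative entropy in its second argument evaluated at the mixture. A minimal numerical sketch, assuming a finite $E$, natural logarithms, and the hypothetical helpers `kl`, `convexity_check` and `pairwise_bound`, checks this per-$j$ bound and its average over $j$:

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p||q) in nats for probability vectors, with 0 log 0 = 0."""
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def convexity_check(P):
    """For each j, return (D(P_j||Pbar), average over k of D(P_j||P_k))."""
    P_bar = P.mean(axis=0)
    M = P.shape[0]
    return [(kl(P[j], P_bar), float(np.mean([kl(P[j], P[k]) for k in range(M)])))
            for j in range(M)]


def pairwise_bound(P):
    """Right-hand side (1/M^2) * sum_{j,k} D(P_j||P_k) of the pairwise bound."""
    M = P.shape[0]
    return sum(kl(P[j], P[k]) for j in range(M) for k in range(M)) / M**2


rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(4), size=5)    # M = 5 laws on a 4-point set
pairs = convexity_check(P)
mutual_info = float(np.mean([left for left, _ in pairs]))   # equals I(V;X)
print(mutual_info, pairwise_bound(P))
```

With disjoint point masses such as `np.eye(3)`, some $D(P_j\|P_k)$ is infinite and `pairwise_bound` returns `inf`, the case in which the pairwise bound holds trivially.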
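Combining the Jensen bound with the density representation above gives $D(P_j\|\overline{P})\leq \frac{1}{M}\sum_{k=1}^{M}D(P_j\|P_k)$ for each $j$. Averaging this inequality over $j$ and using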
\begin{align*}
I(V;X)=\frac{1}{M}\sum_{j=1}^{M}D(P_j\|\overline{P})
\end{align*}
yields
\begin{align*}
I(V;X) \leq \frac{1}{M}\sum_{j=1}^{M}\frac{1}{M}\sum_{k=1}^{M}D(P_j\|P_k).
\end{align*}
Rewriting the double average gives
\begin{align*}
I(V;X)\leq \frac{1}{M^2}\sum_{j=1}^{M}\sum_{k=1}^{M}D(P_j\|P_k).
\end{align*}
This is the claimed pairwise KL bound.
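For a concrete reading of the two bounds (an illustration chosen here, not taken from the source), let $E=\mathbb{R}$ and $P_j=\mathcal{N}(\theta_j,\sigma^2)$ with $\sigma>0$, and write $\bar{\theta}:=M^{-1}\sum_{j}\theta_j$. With natural logarithms, $D(\mathcal{N}(a,\sigma^2)\|\mathcal{N}(b,\sigma^2))=(a-b)^2/(2\sigma^2)$, and the identity $\sum_{j}\sum_{k}(\theta_j-\theta_k)^2=2M\sum_{j}(\theta_j-\bar{\theta})^2$ turns the pairwise bound into
\begin{align*}
I(V;X)\leq \frac{1}{M^2}\sum_{j=1}^{M}\sum_{k=1}^{M}\frac{(\theta_j-\theta_k)^2}{2\sigma^2}=\frac{1}{M\sigma^2}\sum_{j=1}^{M}(\theta_j-\bar{\theta})^2,
\end{align*}
whereas the reference-law bound with $Q=\mathcal{N}(\bar{\theta},\sigma^2)$ gives
\begin{align*}
I(V;X)\leq \frac{1}{M}\sum_{j=1}^{M}\frac{(\theta_j-\bar{\theta})^2}{2\sigma^2},
\end{align*}
which is smaller by a factor of two. Both bounds are strict unless all $\theta_j$ coincide: $\overline{P}$ has variance $\sigma^2+M^{-1}\sum_{j}(\theta_j-\bar{\theta})^2$, so then $\overline{P}\neq Q$ and $D(\overline{P}\|Q)>0$.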
[/step]