[proofplan]
Fix any ciphertext $c \in \mathcal{C}$ with $\mathbb{P}(C = c) > 0$ — such a $c$ exists because $C$ takes values in $\mathcal{C}$. Perfect secrecy forces $\mathbb{P}(M = m \mid C = c) = \mathbb{P}(M = m) > 0$ for every $m \in \mathcal{M}$, so for each $m$ there is at least one key $k$ with $e(m, k) = c$. Choosing such a $k_m$ for each $m$ defines a map $\mathcal{M} \to \mathcal{K}$ which is injective: if $k_m = k_{m'}$ then applying the decryption map $d$ to $c$ under that common key recovers both $m$ and $m'$, forcing $m = m'$. Injectivity from $\mathcal{M}$ to $\mathcal{K}$ between finite sets gives the cardinality inequality.
[/proofplan]
[step:Fix a ciphertext that actually occurs]
Since $C$ is a random variable valued in the finite set $\mathcal{C}$, the law of total probability gives
\begin{align*}
1 = \sum_{c \in \mathcal{C}} \mathbb{P}(C = c).
\end{align*}
Hence the set $\mathcal{C}^+ := \{c \in \mathcal{C} : \mathbb{P}(C = c) > 0\}$ is nonempty. Fix any $c \in \mathcal{C}^+$.
[/step]
[step:Show that every message has a key encrypting it to $c$]
Fix $m \in \mathcal{M}$. By perfect secrecy applied to $m$ and our fixed $c \in \mathcal{C}^+$,
\begin{align*}
\mathbb{P}(M = m \mid C = c) = \mathbb{P}(M = m) > 0,
\end{align*}
where positivity comes from the hypothesis that every message has positive prior probability.
On the other hand, by Bayes' theorem and the definition of conditional probability,
\begin{align*}
\mathbb{P}(M = m \mid C = c) = \frac{\mathbb{P}(M = m,\, C = c)}{\mathbb{P}(C = c)} = \frac{\mathbb{P}(M = m,\, e(M, K) = c)}{\mathbb{P}(C = c)}.
\end{align*}
Expanding the joint probability using independence of $M$ and $K$,
\begin{align*}
\mathbb{P}(M = m,\, e(M, K) = c) = \sum_{k \in \mathcal{K}:\, e(m, k) = c} \mathbb{P}(M = m)\, \mathbb{P}(K = k).
\end{align*}
Positivity of $\mathbb{P}(M = m \mid C = c)$ therefore forces the set
\begin{align*}
\mathcal{K}_m := \{k \in \mathcal{K} : e(m, k) = c\}
\end{align*}
to be nonempty (if it were empty, the numerator above would vanish, making the conditional probability zero, contradicting $\mathbb{P}(M = m) > 0$).
[guided]
We want to conclude that for each message $m$ at least one key $k$ encrypts $m$ to our fixed $c$. How does perfect secrecy force this?
Perfect secrecy says that learning the ciphertext gives no information about the message: $\mathbb{P}(M = m \mid C = c) = \mathbb{P}(M = m)$. Combined with the hypothesis $\mathbb{P}(M = m) > 0$ (every message occurs with positive prior probability), this yields
\begin{align*}
\mathbb{P}(M = m \mid C = c) > 0.
\end{align*}
In words: under our fixed $c$, every message remains possible. The cryptosystem cannot "rule out" any message based on the ciphertext.
Now translate "possible" into "there exists a key". By the definition of conditional probability and the joint distribution,
\begin{align*}
\mathbb{P}(M = m,\, C = c) = \mathbb{P}(M = m \mid C = c) \cdot \mathbb{P}(C = c) > 0.
\end{align*}
Since $C = e(M, K)$, the event $\{M = m,\, C = c\}$ equals $\{M = m,\, e(m, K) = c\}$ (substituting $M = m$). Using independence of $M$ and $K$, this joint event has probability
\begin{align*}
\mathbb{P}(M = m) \cdot \mathbb{P}(e(m, K) = c) = \mathbb{P}(M = m) \cdot \sum_{k:\, e(m,k) = c} \mathbb{P}(K = k).
\end{align*}
Positivity of this product (given $\mathbb{P}(M = m) > 0$) forces $\sum_{k:\, e(m,k) = c} \mathbb{P}(K = k) > 0$, hence the set $\mathcal{K}_m = \{k : e(m, k) = c\}$ is nonempty.
Why do we need $\mathbb{P}(M = m) > 0$? Without it, the perfect-secrecy equation would be trivially satisfied with zero on both sides, and we could not deduce that $c$ is achievable from $m$. The hypothesis rules out "dead" messages.
[/guided]
[/step]
[step:Construct an injection $\mathcal{M} \to \mathcal{K}$ using the axiom of choice on a finite set]
Using that each $\mathcal{K}_m$ is nonempty, define a map
\begin{align*}
\kappa: \mathcal{M} &\to \mathcal{K} \\
m &\mapsto \kappa(m) \in \mathcal{K}_m,
\end{align*}
where $\kappa(m)$ is an arbitrary element of $\mathcal{K}_m$. The map is well-defined because each $\mathcal{K}_m$ is nonempty, and the selection requires only finitely many choices since $\mathcal{M}$ is finite (so no nontrivial use of the axiom of choice is needed).
By construction, $e(m, \kappa(m)) = c$ for every $m \in \mathcal{M}$.
[/step]
[step:Verify injectivity of $\kappa$ via the decryption map]
Suppose $m, m' \in \mathcal{M}$ satisfy $\kappa(m) = \kappa(m') =: k$. Then by the defining property of $\kappa$,
\begin{align*}
e(m, k) = c = e(m', k).
\end{align*}
Apply the decryption map $d$ in the second argument $k$ to both sides. By the cryptosystem axiom $d(e(m, k), k) = m$,
\begin{align*}
m = d(e(m, k), k) = d(c, k) = d(e(m', k), k) = m'.
\end{align*}
Hence $\kappa(m) = \kappa(m')$ implies $m = m'$, so $\kappa$ is injective.
[/step]
[step:Conclude the cardinality inequality]
We have constructed an injective map $\kappa: \mathcal{M} \to \mathcal{K}$ between finite sets. By the pigeonhole principle for finite sets, the existence of an injection from a finite set $A$ to a finite set $B$ implies $|A| \leq |B|$. Therefore
\begin{align*}
|\mathcal{M}| \leq |\mathcal{K}|,
\end{align*}
which is the statement of the theorem.
[/step]