[proofplan]
Both parts are proved by contradiction, each via a local surgery on the code that strictly decreases expected length. For (i), if a more probable symbol were assigned a strictly longer codeword than a less probable one, swapping the two codewords decreases $\mathbb{E}[S]$. For (ii), if the unique codeword at maximum depth had no sibling leaf in the coding tree, we could truncate its last digit without creating a prefix conflict, again strictly decreasing $\mathbb{E}[S]$. Both surgeries produce a prefix-free code of strictly smaller expected length, contradicting optimality of $c$.
[/proofplan]
[step:Prove part (i) by a codeword swap]
Suppose for contradiction that there exist indices $i, j$ with $p_i > p_j$ and $l_i > l_j$. Define a new code $\tilde c$ by swapping the codewords of $\mu_i$ and $\mu_j$:
\begin{align*}
\tilde c(\mu_i) &:= c(\mu_j), & \tilde c(\mu_j) &:= c(\mu_i), & \tilde c(\mu_k) &:= c(\mu_k) \quad (k \notin \{i, j\}).
\end{align*}
Since $\tilde c$ has the same multiset of codewords as $c$, it is prefix-free (the prefix-free property depends only on the multiset of codewords, not on the assignment to symbols).
Its expected length is
\begin{align*}
\mathbb{E}[\tilde S] = \sum_{k=1}^m p_k |\tilde c(\mu_k)| = \sum_{k \notin \{i,j\}} p_k l_k + p_i l_j + p_j l_i.
\end{align*}
Subtracting from $\mathbb{E}[S] = \sum_k p_k l_k = \sum_{k \notin \{i,j\}} p_k l_k + p_i l_i + p_j l_j$:
\begin{align*}
\mathbb{E}[S] - \mathbb{E}[\tilde S]
&= (p_i l_i + p_j l_j) - (p_i l_j + p_j l_i) \\
&= (p_i - p_j)(l_i - l_j).
\end{align*}
By assumption $p_i - p_j > 0$ and $l_i - l_j > 0$, so $\mathbb{E}[S] - \mathbb{E}[\tilde S] > 0$, i.e., $\mathbb{E}[\tilde S] < \mathbb{E}[S]$. This contradicts optimality of $c$. Hence $p_i > p_j$ implies $l_i \leq l_j$.
[guided]
We want to show: among optimal prefix-free codes, more probable symbols never receive longer codewords than less probable ones. The idea is the **exchange argument** — if some pair violated this, we would fix it locally by swapping their codewords, and the swap would strictly improve the expected length, contradicting optimality.
Suppose for contradiction that there exist indices $i, j$ with $p_i > p_j$ and simultaneously $l_i > l_j$ (the more probable symbol has the longer codeword — a bad situation). We surgically define the swap
\begin{align*}
\tilde c(\mu_i) := c(\mu_j), \qquad \tilde c(\mu_j) := c(\mu_i), \qquad \tilde c(\mu_k) := c(\mu_k) \ (k \notin \{i,j\}).
\end{align*}
*Why is $\tilde c$ still prefix-free?* The prefix-free property is a property of the **set** (or multiset) of codewords: a code is prefix-free iff no codeword is a prefix of another. Since $\tilde c$ uses exactly the same codewords as $c$, just re-assigned, this property is preserved.
*Why does $\tilde c$ strictly beat $c$?* Compute the length difference:
\begin{align*}
\mathbb{E}[S] - \mathbb{E}[\tilde S] = (p_i l_i + p_j l_j) - (p_i l_j + p_j l_i) = (p_i - p_j)(l_i - l_j).
\end{align*}
Both factors are **strictly positive** ($p_i > p_j$ gives $p_i - p_j > 0$; $l_i > l_j$ gives $l_i - l_j > 0$), so the product is strictly positive. Hence $\mathbb{E}[\tilde S] < \mathbb{E}[S]$, contradicting optimality of $c$. This contradiction disposes of the hypothesis that $p_i > p_j$ and $l_i > l_j$ can coexist; therefore $p_i > p_j$ forces $l_i \leq l_j$.
*Remark.* When $p_i = p_j$, no conclusion is drawn — the corresponding codewords may freely be swapped, and there is no obstruction to $l_i > l_j$ in that case.
[/guided]
[/step]
[step:Prove part (ii) by truncation of an isolated maximal codeword]
Let $L := \max_i l_i$ denote the maximum codeword length. Assume for contradiction that no two codewords of length $L$ differ only in their final digit.
Pick any codeword $w \in c(\{\mu_1, \ldots, \mu_m\})$ with $|w| = L$; say $w = c(\mu_i)$. Write $w = w' \cdot b$ where $w' \in B^{L-1}$ and $b \in B$ is the final digit.
*Claim.* $w' \cdot b'$ is not a codeword of $c$ for any $b' \in B$ with $b' \neq b$.
Indeed, suppose $w' \cdot b'$ were a codeword of $c$, say $w' \cdot b' = c(\mu_j)$. Then $|c(\mu_j)| = L$ (same length as $w$) and $c(\mu_j)$ differs from $c(\mu_i) = w$ only in the final digit. This pair $(c(\mu_i), c(\mu_j))$ would satisfy the property we assumed impossible — contradiction. So the claim holds.
Next, we show $w'$ is itself not a codeword of $c$. If $w' = c(\mu_k)$ for some $k$, then $c(\mu_k) = w'$ is a strict prefix of $c(\mu_i) = w = w' \cdot b$, violating the prefix-free property of $c$.
Define the truncated code $\tilde c$ by replacing $w$ with $w'$:
\begin{align*}
\tilde c(\mu_i) &:= w', & \tilde c(\mu_k) &:= c(\mu_k) \quad (k \neq i).
\end{align*}
*Prefix-free check.* We must check that $w'$ is prefix-incomparable with $c(\mu_k)$ for every $k \neq i$:
- **$w'$ is not a prefix of $c(\mu_k)$.** If $w'$ were a prefix of $c(\mu_k)$ with $|c(\mu_k)| \geq L$, then $c(\mu_k)$ would begin with $w' \cdot b^\star$ for some $b^\star \in B$. If $b^\star = b$, then $w = w' \cdot b = c(\mu_i)$ is a prefix of $c(\mu_k)$, violating prefix-freeness of $c$. If $b^\star \neq b$, then since $|c(\mu_k)| \leq L$ (by definition of $L$) and $c(\mu_k)$ has the form $w' \cdot b^\star \cdot (\text{more})$, the only way $|c(\mu_k)| \leq L$ is $|c(\mu_k)| = L$ and $c(\mu_k) = w' \cdot b^\star$. But then $c(\mu_k)$ is a codeword of length $L$ differing from $c(\mu_i) = w$ only in the final digit — contradicting our contradiction-hypothesis. So $w'$ is not a prefix of any $c(\mu_k)$. The case $|c(\mu_k)| < |w'| = L - 1$ is impossible for a prefix relation (a longer string cannot be extended *from* a shorter string we are compared to in this direction); more precisely, the relation "$w'$ is a prefix of $c(\mu_k)$" requires $|c(\mu_k)| \geq |w'|$.
- **$c(\mu_k)$ is not a prefix of $w'$.** If $c(\mu_k)$ were a prefix of $w' = $ (first $L - 1$ digits of $w$), then $c(\mu_k)$ would also be a prefix of $w = c(\mu_i)$, violating prefix-freeness of $c$.
Hence $\tilde c$ is prefix-free. Its expected length is
\begin{align*}
\mathbb{E}[\tilde S]
= \sum_{k \neq i} p_k l_k + p_i (L - 1)
= \mathbb{E}[S] - p_i.
\end{align*}
Since $p_i > 0$ (we may discard zero-probability symbols a priori), $\mathbb{E}[\tilde S] < \mathbb{E}[S]$, contradicting optimality of $c$. Therefore two codewords of maximum length must differ only in their final digit.
[guided]
The intuition is a coding-tree picture. A prefix-free binary code corresponds to a rooted binary tree in which codewords are leaves; the length of a codeword is the depth of its leaf. Part (ii) asserts: among optimal codes, at maximum depth two siblings exist — i.e., there is some internal node at depth $L - 1$ both of whose children are leaves.
*Why is this necessary for optimality?* Because if a leaf at depth $L$ had no sibling leaf — equivalently, if the internal node at depth $L - 1$ above it had only this one child as a descendant codeword — then the digit extending the parent node is doing no branching work and can be erased, producing a shorter prefix-free code.
*Formal argument.* Let $L = \max_i l_i$ and pick any codeword $w = c(\mu_i)$ of length $L$. Write $w = w' \cdot b$ with $b$ the last digit. Suppose for contradiction that no two length-$L$ codewords of $c$ differ only in their last digit.
**Sub-claim 1:** $w' \cdot b'$ is not a codeword of $c$ for $b' \neq b$.
If it were, say $w' \cdot b' = c(\mu_j)$, then $c(\mu_j)$ has length $L$ and differs from $c(\mu_i) = w' \cdot b$ only in the last digit — exactly the forbidden configuration.
**Sub-claim 2:** $w'$ itself is not a codeword of $c$.
If $w' = c(\mu_k)$, then $c(\mu_k)$ is a strict prefix of $c(\mu_i) = w' \cdot b = w$, violating prefix-freeness of $c$.
These two sub-claims together say: the string $w'$ is free — it is not a codeword, and neither is any extension of $w'$ other than $w$ itself (extensions $w' \cdot b'$ with $b' \neq b$ are forbidden by Sub-claim 1; longer extensions would be in the subtree rooted at $w' \cdot b'$ and hence would need to share the prefix $w' \cdot b'$ which is not a codeword but we'd need to check nothing below either; for prefix-freeness it suffices to rule out codewords that are prefixes of or prefixed by $w'$).
**Surgery.** Replace $w$ with $w'$ to obtain $\tilde c$. We verified above that $\tilde c$ is prefix-free. The new expected length is
\begin{align*}
\mathbb{E}[\tilde S] = \mathbb{E}[S] - p_i (L - (L - 1)) = \mathbb{E}[S] - p_i.
\end{align*}
Since $p_i > 0$, this is strictly smaller than $\mathbb{E}[S]$, contradicting optimality of $c$.
The contradiction forces the assumption to fail: two codewords of length $L$ must differ only in their final digit.
[/guided]
[/step]