[proofplan]
The proof is a direct analysis of the four criteria as functions of the generalized roots. Roy's statistic is the projection onto the first coordinate, so it cannot detect changes in the smaller roots once $\theta_1$ is fixed. The other three criteria are sums of strictly increasing one-variable functions of the individual roots; for Wilks' criterion this holds after passing from the product $\Lambda$ to $-\log\Lambda$. Therefore an increase in any root, including any of the smaller ones, strictly increases the evidence recorded by Lawley-Hotelling, Pillai, and Wilks, but not necessarily the evidence recorded by Roy.
[/proofplan]
[step:Express each MANOVA criterion as a root-response function]
Let $\Theta_s$ denote the ordered root cone
\begin{align*}
\Theta_s := \{(\theta_1,\dots,\theta_s) \in [0,\infty)^s : \theta_1 \geq \theta_2 \geq \cdots \geq \theta_s\}.
\end{align*}
The four criteria are functions
\begin{align*}
R &: \Theta_s \to [0,\infty), &
R(\theta) &= \theta_1, \\
L &: \Theta_s \to [0,\infty), &
L(\theta) &= \sum_{k=1}^{s} \theta_k, \\
P &: \Theta_s \to [0,s), &
P(\theta) &= \sum_{k=1}^{s} \frac{\theta_k}{1+\theta_k}, \\
\Lambda &: \Theta_s \to (0,1], &
\Lambda(\theta) &= \prod_{k=1}^{s} (1+\theta_k)^{-1}.
\end{align*}
For Wilks' criterion it is convenient to use the equivalent evidence statistic
\begin{align*}
W &: \Theta_s \to [0,\infty), &
W(\theta) &= -\log \Lambda(\theta)
= \sum_{k=1}^{s} \log(1+\theta_k).
\end{align*}
Wilks' test rejects for small values of $\Lambda$. Since $x \mapsto -\log x$ is strictly decreasing on $(0,1]$, the statistic $W=-\log\Lambda$ carries the same ordering of evidence on a reversed numerical scale: larger $W$ corresponds exactly to smaller $\Lambda$.
[guided]
The roots $\theta_1,\dots,\theta_s$ summarize the departure from the null hypothesis in the discriminating directions determined by the generalized eigenvalue problem for $(H,E)$. The point of the theorem is not to rederive the MANOVA distributions, but to compare how the four criteria read the same root vector.
We therefore define the ordered root cone
\begin{align*}
\Theta_s := \{(\theta_1,\dots,\theta_s) \in [0,\infty)^s : \theta_1 \geq \theta_2 \geq \cdots \geq \theta_s\}.
\end{align*}
On this cone, Roy's statistic is the map
\begin{align*}
R &: \Theta_s \to [0,\infty), &
R(\theta) &= \theta_1.
\end{align*}
The Lawley-Hotelling statistic is the map
\begin{align*}
L &: \Theta_s \to [0,\infty), &
L(\theta) &= \sum_{k=1}^{s} \theta_k.
\end{align*}
Pillai's statistic is the map
\begin{align*}
P &: \Theta_s \to [0,s), &
P(\theta) &= \sum_{k=1}^{s} \frac{\theta_k}{1+\theta_k}.
\end{align*}
Wilks' lambda is the map
\begin{align*}
\Lambda &: \Theta_s \to (0,1], &
\Lambda(\theta) &= \prod_{k=1}^{s} (1+\theta_k)^{-1}.
\end{align*}
Because Wilks' test rejects for small values of $\Lambda$, we compare it using the increasing evidence scale
\begin{align*}
W &: \Theta_s \to [0,\infty), &
W(\theta) &= -\log \Lambda(\theta)
= \sum_{k=1}^{s} \log(1+\theta_k).
\end{align*}
The map $\Lambda \mapsto -\log\Lambda$ is strictly decreasing on $(0,1]$, so it preserves the Wilks rejection ordering with the numerical direction reversed: increasing $W$ is exactly the same as decreasing $\Lambda$.
[/guided]
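As a concrete check on these definitions, the following is a minimal Python sketch that evaluates the four criteria and the Wilks evidence scale on a root vector. It assumes the roots are already available as nonnegative floats in nonincreasing order (computing them from $(H,E)$ is outside its scope); the function names `roy`, `lawley_hotelling`, `pillai`, `wilks_lambda`, `wilks_evidence` and `in_root_cone` are hypothetical and not taken from any statistics library.

```python
import math


def in_root_cone(theta):
    """Check membership in the ordered root cone Theta_s."""
    return all(t >= 0 for t in theta) and all(
        a >= b for a, b in zip(theta, theta[1:])
    )


def roy(theta):
    # R(theta) = theta_1, the largest root
    return theta[0]


def lawley_hotelling(theta):
    # L(theta) = sum_k theta_k
    return sum(theta)


def pillai(theta):
    # P(theta) = sum_k theta_k / (1 + theta_k)
    return sum(t / (1.0 + t) for t in theta)


def wilks_lambda(theta):
    # Lambda(theta) = prod_k (1 + theta_k)^{-1}
    return math.prod(1.0 / (1.0 + t) for t in theta)


def wilks_evidence(theta):
    # W(theta) = -log Lambda(theta) = sum_k log(1 + theta_k)
    return sum(math.log1p(t) for t in theta)


theta = [3.0, 1.0, 0.0]  # hypothetical root vector
assert in_root_cone(theta)
print(roy(theta), lawley_hotelling(theta), pillai(theta), wilks_lambda(theta))
# prints: 3.0 4.0 1.25 0.125
```

Using `math.log1p` for $\log(1+t)$ keeps $W$ accurate when some roots are very close to zero.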
[/step]
[step:Show that Roy's criterion ignores all smaller roots once the largest root is fixed]
Let $\theta,\eta \in \Theta_s$ satisfy $\theta_1=\eta_1$. By the definition of $R$,
\begin{align*}
R(\theta) = \theta_1 = \eta_1 = R(\eta).
\end{align*}
Thus changing any of $\theta_2,\dots,\theta_s$ within $\Theta_s$ while keeping $\theta_1$ fixed has no effect on Roy's largest-root statistic.
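A small numerical illustration of this invariance, using two hypothetical root vectors that share $\theta_1$ but differ in the smaller roots; the helper names are illustrative only. The Lawley-Hotelling trace is printed alongside for contrast.

```python
def roy(theta):
    # Roy's statistic reads only the largest root theta_1
    return theta[0]


def lawley_hotelling(theta):
    # Sum of all roots, for contrast
    return sum(theta)


concentrated = [2.0, 0.0, 0.0]  # hypothetical: one dominant root
diffuse = [2.0, 1.5, 1.0]       # hypothetical: same theta_1, extra smaller roots

assert concentrated[0] == diffuse[0]
print(roy(concentrated), roy(diffuse))                            # 2.0 2.0
print(lawley_hotelling(concentrated), lawley_hotelling(diffuse))  # 2.0 4.5
```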
[/step]
[step:Show that Lawley-Hotelling responds linearly to every root]
Fix an index $j \in \{1,\dots,s\}$ and let $\theta,\eta \in \Theta_s$ satisfy $\eta_j > \theta_j$ and $\eta_k=\theta_k$ for every $k \neq j$. Then
\begin{align*}
L(\eta)-L(\theta)
&= \sum_{k=1}^{s} \eta_k - \sum_{k=1}^{s} \theta_k \\
&= \eta_j-\theta_j \\
&> 0.
\end{align*}
Therefore the Lawley-Hotelling trace is strictly increased by increasing any one root while holding the remaining roots fixed.
[guided]
The Lawley-Hotelling statistic is additive in the roots. To isolate the effect of one root, fix $j \in \{1,\dots,s\}$ and compare two root vectors $\theta,\eta \in \Theta_s$ such that only the $j$-th coordinate changes: assume $\eta_j>\theta_j$ and $\eta_k=\theta_k$ for every $k \neq j$. Then every unchanged coordinate cancels in the difference:
\begin{align*}
L(\eta)-L(\theta)
&= \sum_{k=1}^{s} \eta_k - \sum_{k=1}^{s} \theta_k \\
&= \eta_j-\theta_j \\
&> 0.
\end{align*}
This calculation shows how a diffuse departure is counted: every root enters the trace at full size, so raising any one root by $\delta>0$ raises $L$ by exactly $\delta$.
[/guided]
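The cancellation can be checked numerically. The sketch below, with hypothetical helper names and made-up roots, raises one root at a time (indices are 0-based in Python, whereas $j$ is 1-based in the text), confirms that the perturbed vector stays in $\Theta_s$, and shows that $L$ grows by exactly the increment.

```python
def lawley_hotelling(theta):
    # L(theta) = sum of the roots
    return sum(theta)


def in_root_cone(theta):
    # Nonnegative and nonincreasing
    return all(t >= 0 for t in theta) and all(
        a >= b for a, b in zip(theta, theta[1:])
    )


def bump(theta, j, delta):
    """Hypothetical helper: copy theta and increase only root j (0-based) by delta."""
    eta = list(theta)
    eta[j] += delta
    return eta


theta = [4.0, 2.0, 1.0]  # hypothetical roots
delta = 0.5
for j in range(len(theta)):
    eta = bump(theta, j, delta)
    assert in_root_cone(eta)  # the comparison requires eta to stay in Theta_s
    # Every unchanged root cancels, so the difference is exactly delta.
    print(j, lawley_hotelling(eta) - lawley_hotelling(theta))  # 0.5 for every j
```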
[/step]
[step:Show that Pillai's trace responds monotonically but boundedly to every root]
Define the one-variable transform
\begin{align*}
p &: [0,\infty) \to [0,1), &
p(t) &= \frac{t}{1+t}.
\end{align*}
For $0 \leq a < b$, direct subtraction gives
\begin{align*}
p(b)-p(a)
&= \frac{b}{1+b}-\frac{a}{1+a} \\
&= \frac{b(1+a)-a(1+b)}{(1+a)(1+b)} \\
&= \frac{b-a}{(1+a)(1+b)} \\
&> 0.
\end{align*}
Hence $p$ is strictly increasing on $[0,\infty)$. If $\theta,\eta \in \Theta_s$ differ only by an increase in the $j$-th root, then
\begin{align*}
P(\eta)-P(\theta)
= p(\eta_j)-p(\theta_j)
> 0.
\end{align*}
Thus Pillai's trace is strictly increased by increasing any root, with each root contributing through the bounded transform $t \mapsto t/(1+t)$.
[guided]
Pillai's statistic does not add the raw roots. It first applies the bounded transform
\begin{align*}
p &: [0,\infty) \to [0,1), &
p(t) &= \frac{t}{1+t}.
\end{align*}
We need to verify that this transform still increases when the root increases. Let $0 \leq a < b$. Then
\begin{align*}
p(b)-p(a)
&= \frac{b}{1+b}-\frac{a}{1+a} \\
&= \frac{b(1+a)-a(1+b)}{(1+a)(1+b)} \\
&= \frac{b-a}{(1+a)(1+b)} \\
&> 0.
\end{align*}
The denominator is positive because $a,b \geq 0$, and the numerator is positive because $b>a$. Therefore $p$ is strictly increasing on $[0,\infty)$.
Now take $\theta,\eta \in \Theta_s$ that differ only by increasing the $j$-th root. Since all other coordinates are equal, their transformed contributions cancel:
\begin{align*}
P(\eta)-P(\theta)
= p(\eta_j)-p(\theta_j)
> 0.
\end{align*}
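This proves that Pillai's trace detects increases in every root. The word ``boundedly'' refers to the fact that each individual contribution satisfies $0 \leq p(t)<1$. Moreover, by the formula above, raising a root from $a$ to $b$ increases $P$ by $(b-a)/((1+a)(1+b))$, which is smaller than the corresponding increase $b-a$ of the Lawley-Hotelling trace because $(1+a)(1+b)>1$; in this sense large roots are compressed relative to the Lawley-Hotelling trace.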
[/guided]
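The closed form for $p(b)-p(a)$ also quantifies the compression. The following Python sketch (the names `p` and `pillai_gain` are hypothetical) checks the formula against direct subtraction and shows that a fixed unit increase in one root raises Pillai's trace by less and less as the root grows, while the Lawley-Hotelling trace would gain the full unit each time.

```python
import math


def p(t):
    # Pillai's one-root transform t / (1 + t), valued in [0, 1)
    return t / (1.0 + t)


def pillai_gain(a, b):
    # Closed form of p(b) - p(a) for 0 <= a < b
    return (b - a) / ((1.0 + a) * (1.0 + b))


delta = 1.0
for a in [0.0, 1.0, 9.0, 99.0]:
    b = a + delta
    direct = p(b) - p(a)
    assert math.isclose(direct, pillai_gain(a, b))
    assert direct > 0  # p is strictly increasing
    # Lawley-Hotelling would gain exactly delta; Pillai's gain shrinks with a.
    print(f"root {a:>5}: Pillai gain {direct:.6f} vs Lawley-Hotelling gain {delta}")
```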
[/step]
[step:Show that Wilks' criterion responds through the product of all reciprocal root factors]
Define the one-variable transform
\begin{align*}
w &: [0,\infty) \to [0,\infty), &
w(t) &= \log(1+t).
\end{align*}
For $0 \leq a < b$, since $1+a < 1+b$ and the logarithm is strictly increasing on $(0,\infty)$,
\begin{align*}
w(a) < w(b).
\end{align*}
Thus $w$ is strictly increasing on $[0,\infty)$. If $\theta,\eta \in \Theta_s$ differ only by an increase in the $j$-th root, then
\begin{align*}
W(\eta)-W(\theta)
= w(\eta_j)-w(\theta_j)
> 0.
\end{align*}
Equivalently,
\begin{align*}
\Lambda(\eta)
= \prod_{k=1}^{s} (1+\eta_k)^{-1}
< \prod_{k=1}^{s} (1+\theta_k)^{-1}
= \Lambda(\theta),
\end{align*}
because the single changed factor satisfies $(1+\eta_j)^{-1} < (1+\theta_j)^{-1}$ and all other factors are unchanged and positive. Hence Wilks' lambda decreases, and Wilks' evidence statistic increases, when any root is increased.
[guided]
Wilks' lambda is multiplicative:
\begin{align*}
\Lambda(\theta) = \prod_{k=1}^{s} (1+\theta_k)^{-1}.
\end{align*}
Because tests based on Wilks' lambda reject for small values, it is cleaner to work with the increasing statistic
\begin{align*}
W(\theta) = -\log \Lambda(\theta)
= \sum_{k=1}^{s} \log(1+\theta_k).
\end{align*}
The relevant one-variable transform is therefore
\begin{align*}
w &: [0,\infty) \to [0,\infty), &
w(t) &= \log(1+t).
\end{align*}
If $0 \leq a < b$, then $1+a<1+b$, and the logarithm is strictly increasing on $(0,\infty)$. Hence
\begin{align*}
w(a)<w(b).
\end{align*}
So $w$ is strictly increasing on $[0,\infty)$.
Now suppose $\theta,\eta \in \Theta_s$ differ only by increasing the $j$-th root. In the additive Wilks evidence scale, all unchanged coordinates cancel:
\begin{align*}
W(\eta)-W(\theta)
= w(\eta_j)-w(\theta_j)
> 0.
\end{align*}
Returning to Wilks' original lambda scale gives the equivalent product comparison
\begin{align*}
\Lambda(\eta)
= \prod_{k=1}^{s} (1+\eta_k)^{-1}
< \prod_{k=1}^{s} (1+\theta_k)^{-1}
= \Lambda(\theta),
\end{align*}
because the unchanged factors are common to both products and positive, while the changed reciprocal factor is smaller:
\begin{align*}
(1+\eta_j)^{-1} < (1+\theta_j)^{-1}.
\end{align*}
Thus Wilks' criterion also responds to an increase in every root, but through a multiplicative reciprocal form rather than by adding raw roots.
[/guided]
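Both the additive and the multiplicative forms of the comparison can be checked on a small example. In this sketch the root values are made up and the function names are hypothetical; it verifies that $W(\eta)-W(\theta)=\log\bigl((1+\eta_j)/(1+\theta_j)\bigr)$ and that the ratio $\Lambda(\eta)/\Lambda(\theta)$ reduces to the single changed factor $(1+\theta_j)/(1+\eta_j)<1$.

```python
import math


def wilks_lambda(theta):
    # Lambda(theta) = prod_k (1 + theta_k)^{-1}
    return math.prod(1.0 / (1.0 + t) for t in theta)


def wilks_evidence(theta):
    # W(theta) = -log Lambda(theta) = sum_k log(1 + theta_k)
    return sum(math.log1p(t) for t in theta)


theta = [3.0, 1.0, 0.5]  # hypothetical roots
eta = [3.0, 2.0, 0.5]    # only the second root increased; still ordered
j = 1                    # 0-based index of the changed root

dW = wilks_evidence(eta) - wilks_evidence(theta)
# Unchanged roots cancel: dW = w(eta_j) - w(theta_j)
assert math.isclose(dW, math.log((1 + eta[j]) / (1 + theta[j])))
assert dW > 0

ratio = wilks_lambda(eta) / wilks_lambda(theta)
# Common positive factors cancel; only the changed reciprocal factor remains
assert math.isclose(ratio, (1 + theta[j]) / (1 + eta[j]))
assert wilks_lambda(eta) < wilks_lambda(theta)
```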
[/step]
[step:Deduce the qualitative ordering by concentrated and diffuse root patterns]
A root pattern concentrated in one dominant direction is represented by a vector for which $\theta_1$ is large and the remaining roots are zero or small. Roy's statistic records only $\theta_1$, so it is tailored to this type of pattern. A diffuse root pattern is represented by a vector in which several of $\theta_2,\dots,\theta_s$ are positive. By the previous steps, increasing any of these smaller roots with $\theta_1$ fixed leaves $R$ unchanged but strictly increases $L$, $P$, and $W=-\log\Lambda$ (equivalently, strictly decreases $\Lambda$). Therefore Lawley-Hotelling, Pillai, and Wilks respond to every root and are more responsive than Roy to alternatives spread across several discriminating directions. This proves the stated qualitative ordering.
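To illustrate the comparison, the sketch below evaluates all four criteria on two hypothetical root vectors with the same largest root, one concentrated and one diffuse. It only illustrates the monotonicity argument of the previous steps and says nothing about null distributions or power; the vectors and the helper `criteria` are illustrative assumptions.

```python
import math


def criteria(theta):
    """Hypothetical helper: return (R, L, P, W) for a root vector in Theta_s."""
    R = theta[0]
    L = sum(theta)
    P = sum(t / (1.0 + t) for t in theta)
    W = sum(math.log1p(t) for t in theta)  # W = -log(Wilks' lambda)
    return R, L, P, W


concentrated = [4.0, 0.0, 0.0]  # hypothetical: one dominant direction
diffuse = [4.0, 2.0, 1.0]       # hypothetical: same largest root, spread out

for name, c, d in zip("RLPW", criteria(concentrated), criteria(diffuse)):
    print(f"{name}: concentrated={c:.4f} diffuse={d:.4f}")
```

Roy's value is identical for the two vectors, while the other three criteria are strictly larger for the diffuse one.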
[/step]