[proofplan]
We must show that $T$ is a minimal sufficient statistic if and only if, for $x, y$ in the support, $T(x) = T(y)$ is equivalent to the ratio $f(x; \theta)/f(y; \theta)$ being constant in $\theta$. The forward direction assumes $T$ is minimal sufficient. The Factorisation Criterion expresses the ratio in terms of $g(T(x); \theta)/g(T(y); \theta)$, so $T(x) = T(y)$ makes the ratio constant in $\theta$. For the converse implication we build an auxiliary statistic $T'$ from the equivalence classes induced by the ratio, show that $T'$ is sufficient (via Factorisation), and use minimality of $T$ to write $T$ as a function of $T'$, so that constancy of the ratio forces $T(x) = T(y)$. The reverse direction assumes the criterion, shows that $T$ is sufficient by factorising $f$ through a representative of each level set of $T$, and then shows that $T$ is minimal by proving any other sufficient statistic $T'$ must refine $T$, i.e., $T$ is a function of $T'$.
[/proofplan]
[step:Recall the setup and the definition of minimal sufficiency]
Let $X = (X_1, \ldots, X_n)$ take values in $\mathcal{X}$, and let $\{f(\cdot; \theta) : \theta \in \Theta\}$ be a family of densities with respect to a common $\sigma$-finite dominating measure $\nu$ on $\mathcal{X}$. Define the support
\begin{align*}
\mathcal{X}_0 := \{x \in \mathcal{X} : f(x; \theta) > 0 \text{ for some } \theta \in \Theta\}.
\end{align*}
A statistic $T: \mathcal{X} \to \mathcal{T}$ is [minimal sufficient](/page/Minimal%20Sufficient%20Statistic) if $T$ is sufficient and, for every other sufficient statistic $T': \mathcal{X} \to \mathcal{T}'$, there is a measurable map $\psi: \mathcal{T}' \to \mathcal{T}$ with $T = \psi \circ T'$ $\nu$-almost everywhere on $\mathcal{X}_0$. Intuitively, $T$ groups sample points into the coarsest partition that preserves sufficiency.
[guided]
Before we prove anything, let us unpack the definition of minimal sufficiency and the statement of the theorem.
A [sufficient statistic](/page/Sufficient%20Statistic) $T$ induces a partition of the sample space $\mathcal{X}$ into its level sets $\{T = t\}_{t \in \mathcal{T}}$. Different sufficient statistics produce partitions of different granularities: the identity map $x \mapsto x$ is trivially sufficient (its level sets are singletons), but it gives no data reduction. We want the coarsest possible sufficient partition — one that collapses as much of the data as it can without losing information about $\theta$.
Formally, $T$ is [minimal sufficient](/page/Minimal%20Sufficient%20Statistic) if (i) $T$ is sufficient and (ii) every sufficient statistic $T'$ is at least as fine as $T$, in the sense that $T$ is a function of $T'$ (up to a null set). The theorem states a concrete test, the Lehmann–Scheffé criterion: $T$ is minimal sufficient iff the equivalence relation "$f(x; \theta)/f(y; \theta)$ is constant in $\theta$" coincides with "$T(x) = T(y)$".
Why does the likelihood ratio enter? The likelihood function $\theta \mapsto f(x; \theta)$ carries all the information in $x$ about $\theta$. Two points $x$ and $y$ give the same information about $\theta$ precisely when their likelihood functions are proportional — i.e., $f(x; \theta) = c(x, y)\,f(y; \theta)$ for all $\theta$, where $c$ does not depend on $\theta$. The coarsest sufficient partition should identify exactly these pairs.
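As an illustration (a standard example chosen here, not part of the theorem's statement), suppose $X_1, \ldots, X_n$ are i.i.d. Bernoulli($\theta$) with $\Theta = (0, 1)$, and write $s(x) := \sum_{i=1}^n x_i$; the symbol $s$ is introduced only for this example. For $x, y \in \{0, 1\}^n$,
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} = \frac{\theta^{s(x)} (1 - \theta)^{n - s(x)}}{\theta^{s(y)} (1 - \theta)^{n - s(y)}} = \left( \frac{\theta}{1 - \theta} \right)^{s(x) - s(y)},
\end{align*}
which is constant in $\theta$ exactly when $s(x) = s(y)$. In this model the likelihood-ratio relation therefore identifies precisely the samples with the same number of successes.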
We prove both directions. The forward direction assumes minimal sufficiency and derives the likelihood-ratio criterion. The reverse direction assumes the criterion and shows that $T$ is both sufficient and minimal.
[/guided]
[/step]
[step:Assume $T$ is minimal sufficient and prove the ratio criterion]
Suppose $T: \mathcal{X} \to \mathcal{T}$ is minimal sufficient. By the [Factorisation Criterion](/theorems/1425), there exist $g: \mathcal{T} \times \Theta \to [0, \infty)$ and $h: \mathcal{X} \to [0, \infty)$ with
\begin{align*}
f(x; \theta) = g(T(x); \theta)\,h(x).
\end{align*}
For $x, y \in \mathcal{X}_0$ we necessarily have $h(x), h(y) > 0$, since each point has $f > 0$ at some $\theta$, and the ratio becomes
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} = \frac{g(T(x); \theta)}{g(T(y); \theta)} \cdot \frac{h(x)}{h(y)}.
\end{align*}
If $T(x) = T(y)$, then $g(T(x); \theta) = g(T(y); \theta)$ for every $\theta$, so $f(x; \theta) = \frac{h(x)}{h(y)}\,f(y; \theta)$ for all $\theta$: the ratio equals $h(x)/h(y)$, which does not depend on $\theta$.
For the converse direction within this step, define an equivalence relation on $\mathcal{X}_0$ by
\begin{align*}
x \sim y \iff \frac{f(x; \theta)}{f(y; \theta)} \text{ does not depend on } \theta.
\end{align*}
Let $T'(x) := [x]_\sim$ be the equivalence class map. We claim $T'$ is sufficient. For any $x \in \mathcal{X}_0$, fix a representative $x_0 \in [x]_\sim$ once and for all in each class, and write
\begin{align*}
f(x; \theta) = \frac{f(x; \theta)}{f(x_0; \theta)} \cdot f(x_0; \theta).
\end{align*}
By the definition of $\sim$, the first factor depends only on $x$ (not on $\theta$); set $h(x) := f(x; \theta)/f(x_0; \theta)$, which is well-defined. The second factor depends on $\theta$ and on the equivalence class $[x]_\sim = T'(x)$ through $x_0$; set $g(T'(x); \theta) := f(x_0; \theta)$. This gives $f(x; \theta) = g(T'(x); \theta)\,h(x)$, so $T'$ is sufficient by the [Factorisation Criterion](/theorems/1425).
Since $T$ is minimal sufficient, there is a measurable $\psi: \mathcal{T}' \to \mathcal{T}$ with $T = \psi \circ T'$ a.e. on $\mathcal{X}_0$. Hence, for $x, y \in \mathcal{X}_0$ outside a $\nu$-null set, $T'(x) = T'(y)$ implies $T(x) = T(y)$. But $T'(x) = T'(y)$ means exactly $x \sim y$, i.e., the ratio $f(x; \theta)/f(y; \theta)$ does not depend on $\theta$. Therefore
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} \text{ does not depend on } \theta \implies T(x) = T(y),
\end{align*}
completing the forward direction.
[guided]
We assume $T$ is minimal sufficient and want to show the equivalence
\begin{align*}
T(x) = T(y) \iff \frac{f(x; \theta)}{f(y; \theta)} \text{ does not depend on } \theta.
\end{align*}
This has two pieces.
**Easy direction ($\Rightarrow$).** Suppose $T(x) = T(y)$. Sufficiency of $T$ (part of minimal sufficiency) gives, via the [Factorisation Criterion](/theorems/1425), a factorisation
\begin{align*}
f(x; \theta) = g(T(x); \theta)\,h(x),
\end{align*}
for some $g, h \ge 0$. Dividing,
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} = \frac{g(T(x); \theta)\,h(x)}{g(T(y); \theta)\,h(y)} = \frac{g(T(x); \theta)}{g(T(y); \theta)} \cdot \frac{h(x)}{h(y)}.
\end{align*}
If $T(x) = T(y)$, the first factor equals $1$ for every $\theta$, so the whole ratio equals $h(x)/h(y)$, a constant in $\theta$.
**Hard direction ($\Leftarrow$).** Suppose the ratio $f(x; \theta)/f(y; \theta)$ does not depend on $\theta$. We must show $T(x) = T(y)$.
The strategy is to manufacture a candidate sufficient statistic $T'$ out of the ratio relation and invoke the minimality of $T$.
Define the equivalence relation $\sim$ on $\mathcal{X}_0$ by $x \sim y$ iff $f(x; \theta)/f(y; \theta)$ is constant in $\theta$. Reflexivity, symmetry, and transitivity are immediate from properties of constants. Let $T'(x) = [x]_\sim$ be the map to the quotient $\mathcal{X}_0/{\sim}$.
We claim $T'$ is sufficient, and we verify this via [Factorisation](/theorems/1425). Pick, for each equivalence class, a representative $x_0$ (measurability of this selection follows from standard section theorems; we take it for granted here). For $x \in \mathcal{X}_0$ with representative $x_0 \in [x]_\sim$,
\begin{align*}
f(x; \theta) = \underbrace{\frac{f(x; \theta)}{f(x_0; \theta)}}_{=: h(x)} \cdot \underbrace{f(x_0; \theta)}_{=: g(T'(x); \theta)}.
\end{align*}
The first factor depends only on $x$ (not $\theta$) by the defining property of $\sim$: since $x \sim x_0$, the ratio is a function of $x$ and $x_0$ alone, and $x_0$ is determined by the class $[x]_\sim$. The second factor depends on $\theta$ and on $T'(x)$ (since $x_0$ is a function of $T'(x)$). So we have produced the factorisation $f = g \circ T' \cdot h$, and the [Factorisation Criterion](/theorems/1425) says $T'$ is sufficient.
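A concrete instance may help. The following is a sketch under an assumption not made in the text: $X_1, \ldots, X_n$ are i.i.d. $N(\theta, 1)$ with $\Theta = \mathbb{R}$, and $\bar{x} := n^{-1} \sum_{i=1}^n x_i$. Here
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} = \exp\left( -\tfrac{1}{2} \Big( \sum_{i=1}^n x_i^2 - \sum_{i=1}^n y_i^2 \Big) + n\theta\,(\bar{x} - \bar{y}) \right),
\end{align*}
so $x \sim y$ iff $\bar{x} = \bar{y}$. Choosing the representative $x_0 := (\bar{x}, \ldots, \bar{x})$ of $[x]_\sim$ and using $\sum_i (x_i - \theta)^2 = \sum_i (x_i - \bar{x})^2 + n(\bar{x} - \theta)^2$, we get
\begin{align*}
h(x) &= \frac{f(x; \theta)}{f(x_0; \theta)} = \exp\left( -\tfrac{1}{2} \sum_{i=1}^n (x_i - \bar{x})^2 \right), & g(T'(x); \theta) &= f(x_0; \theta) = (2\pi)^{-n/2} \exp\left( -\tfrac{n}{2} (\bar{x} - \theta)^2 \right).
\end{align*}
In this example the choice of representative is explicit and depends continuously on $\bar{x}$, so no appeal to a section theorem is needed.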
Now minimality of $T$ enters. Since $T'$ is sufficient and $T$ is minimal sufficient, there is a measurable $\psi: \mathcal{T}' \to \mathcal{T}$ with $T = \psi \circ T'$ $\nu$-almost everywhere on $\mathcal{X}_0$. Therefore, outside a null set,
\begin{align*}
T'(x) = T'(y) \implies T(x) = \psi(T'(x)) = \psi(T'(y)) = T(y).
\end{align*}
But $T'(x) = T'(y)$ is exactly the statement $x \sim y$, i.e., the ratio is constant in $\theta$. This is what we needed: ratio constant in $\theta$ implies $T(x) = T(y)$.
[/guided]
[/step]
[step:Assume the ratio criterion and prove $T$ is minimal sufficient]
Conversely, suppose $T$ satisfies
\begin{align*}
T(x) = T(y) \iff \frac{f(x; \theta)}{f(y; \theta)} \text{ does not depend on } \theta \quad \text{for all } x, y \in \mathcal{X}_0.
\end{align*}
We show $T$ is sufficient and minimal.
**Sufficiency.** Let $\mathcal{T}_0 := T(\mathcal{X}_0)$. For each $t \in \mathcal{T}_0$, pick a representative $x_t \in T^{-1}(t) \cap \mathcal{X}_0$; we assume the selection $t \mapsto x_t$ can be made measurably, which holds under mild regularity. For $x \in \mathcal{X}_0$ with $T(x) = t$, we have $T(x) = T(x_t)$, so by the assumed criterion the ratio $f(x; \theta)/f(x_t; \theta)$ does not depend on $\theta$. Define
\begin{align*}
h(x) &:= \frac{f(x; \theta)}{f(x_{T(x)}; \theta)}, & g(t; \theta) &:= f(x_t; \theta).
\end{align*}
The function $h$ is well-defined because the right-hand side does not depend on $\theta$. Then
\begin{align*}
f(x; \theta) = g(T(x); \theta)\,h(x),
\end{align*}
so by the [Factorisation Criterion](/theorems/1425), $T$ is sufficient.
**Minimality.** Let $T': \mathcal{X} \to \mathcal{T}'$ be any other sufficient statistic; we show $T$ is a function of $T'$ a.e. on $\mathcal{X}_0$. Since $T'$ is sufficient, Factorisation produces $g'$ and $h'$ with
\begin{align*}
f(x; \theta) = g'(T'(x); \theta)\,h'(x).
\end{align*}
We first note that $h'(x) > 0$ for every $x \in \mathcal{X}_0$. Indeed, if $x \in \mathcal{X}_0$ then by definition there exists $\theta_0 \in \Theta$ with $f(x; \theta_0) > 0$. Since $f(x; \theta_0) = g'(T'(x); \theta_0)\,h'(x)$ and $g', h' \ge 0$, both factors must be strictly positive; in particular $h'(x) > 0$.
Now take $x, y \in \mathcal{X}_0$ with $T'(x) = T'(y)$. Since $h'(x), h'(y) > 0$ (as just established), the ratio
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} = \frac{g'(T'(x); \theta)\,h'(x)}{g'(T'(y); \theta)\,h'(y)} = \frac{h'(x)}{h'(y)}
\end{align*}
does not depend on $\theta$ (the $g'$ factors are identical under $T'(x) = T'(y)$). By the assumed criterion, this forces $T(x) = T(y)$.
Therefore $T$ is constant on the level sets of $T'$, which means $T$ factors through $T'$: there is a well-defined map $\psi: \mathcal{T}'_0 \to \mathcal{T}$ on $\mathcal{T}'_0 := T'(\mathcal{X}_0)$ with $T(x) = \psi(T'(x))$ for all $x \in \mathcal{X}_0$. Measurability of $\psi$ is not automatic; we assume enough regularity for a standard factorisation theorem for measurable functions to apply to $T$ and $T'$. Hence $T$ is minimal sufficient.
[guided]
For the reverse direction we assume the ratio criterion and must prove two things: (a) $T$ is sufficient, and (b) $T$ is minimal, i.e., every other sufficient statistic refines $T$.
**(a) $T$ is sufficient.** The plan is to produce a factorisation $f(x; \theta) = g(T(x); \theta)\,h(x)$ and apply [Factorisation](/theorems/1425).
For each $t$ in the image $\mathcal{T}_0 = T(\mathcal{X}_0)$, pick a representative $x_t$ in the level set $T^{-1}(t) \cap \mathcal{X}_0$. (This measurable selection is a standard but non-trivial step; we assume the regularity needed for it to be valid, which holds under mild conditions — e.g., Polish $\mathcal{X}$, Borel $T$, by standard section theorems.)
For any $x \in \mathcal{X}_0$, set $t = T(x)$. Then $T(x) = T(x_t)$, so the ratio criterion (used left-to-right this time) says $f(x; \theta)/f(x_t; \theta)$ does not depend on $\theta$. We therefore can define
\begin{align*}
h(x) := \frac{f(x; \theta)}{f(x_{T(x)}; \theta)}, \qquad g(t; \theta) := f(x_t; \theta),
\end{align*}
with $h(x)$ well-defined because the right-hand side does not depend on $\theta$. Rearranging gives
\begin{align*}
f(x; \theta) = g(T(x); \theta)\,h(x),
\end{align*}
so by the [Factorisation Criterion](/theorems/1425), $T$ is sufficient.
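For a concrete case of this construction, consider (as an illustrative assumption, not part of the theorem) $X_1, \ldots, X_n$ i.i.d. Poisson($\theta$) with $\Theta = (0, \infty)$, so that $\mathcal{X}_0 = \{0, 1, 2, \ldots\}^n$, and take $T(x) = \sum_{i=1}^n x_i$. One admissible representative of the level set $\{T = t\}$ is $x_t := (t, 0, \ldots, 0)$, which gives
\begin{align*}
g(t; \theta) &= f(x_t; \theta) = \frac{e^{-n\theta}\,\theta^{t}}{t!}, & h(x) &= \frac{f(x; \theta)}{f(x_{T(x)}; \theta)} = \frac{e^{-n\theta}\,\theta^{T(x)} / \prod_{i=1}^n x_i!}{e^{-n\theta}\,\theta^{T(x)} / T(x)!} = \frac{T(x)!}{\prod_{i=1}^n x_i!}.
\end{align*}
The dependence on $\theta$ cancels in $h$, as the argument predicts, and $g(T(x); \theta)\,h(x)$ recovers $f(x; \theta)$. A different choice of representative would rescale $g$ and $h$ by reciprocal factors depending only on $t$, leaving the product unchanged.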
**(b) $T$ is minimal.** Let $T'$ be any other sufficient statistic. We must show $T$ is a function of $T'$ a.e. on $\mathcal{X}_0$.
By sufficiency of $T'$, Factorisation gives $f(x; \theta) = g'(T'(x); \theta)\,h'(x)$. Before forming the ratio of densities, we must verify that $h'$ does not vanish on $\mathcal{X}_0$, so that the quotient $h'(x)/h'(y)$ makes sense. We claim $h'(x) > 0$ for every $x \in \mathcal{X}_0$. To see this, note that $x \in \mathcal{X}_0$ means $f(x; \theta_0) > 0$ for some $\theta_0 \in \Theta$. Since $f(x; \theta_0) = g'(T'(x); \theta_0)\,h'(x)$ with $g', h' \ge 0$, both factors must be strictly positive; in particular $h'(x) > 0$. The same argument gives $h'(y) > 0$ for $y \in \mathcal{X}_0$.
Now consider two points $x, y \in \mathcal{X}_0$ with $T'(x) = T'(y)$. Since $h'(x), h'(y) > 0$, the ratio of densities is
\begin{align*}
\frac{f(x; \theta)}{f(y; \theta)} = \frac{g'(T'(x); \theta)\,h'(x)}{g'(T'(y); \theta)\,h'(y)}.
\end{align*}
Since $T'(x) = T'(y)$, the $g'$ terms are identical and cancel, leaving $h'(x)/h'(y)$, which does not depend on $\theta$.
Now apply the ratio criterion right-to-left: "ratio constant in $\theta$" implies "$T(x) = T(y)$". So whenever $T'(x) = T'(y)$, also $T(x) = T(y)$. This is exactly the statement that $T$ is constant on the level sets of $T'$, which in turn means $T$ factors through $T'$: there is a function $\psi: \mathcal{T}'_0 \to \mathcal{T}$ with $T = \psi \circ T'$ on $\mathcal{X}_0$. The measurability of $\psi$, the standard delicate step, follows under suitable regularity from a measurable factorisation result of Doob–Dynkin type applied to the measurable functions $T$ and $T'$.
Combining (a) and (b), $T$ is minimal sufficient.
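To see part (b) at work, here is a small example under assumptions of our own choosing: $X_1, \ldots, X_n$ i.i.d. Bernoulli($\theta$) with $\Theta = (0, 1)$ and $n \ge 2$, the statistic $T(x) = \sum_{i=1}^n x_i$ (which satisfies the ratio criterion in this model, since $f(x; \theta)/f(y; \theta) = (\theta/(1 - \theta))^{T(x) - T(y)}$), and the finer statistic $T'(x) := (x_1, \sum_{i=2}^n x_i)$. Writing $(a, b) = T'(x)$,
\begin{align*}
f(x; \theta) = \underbrace{\theta^{a + b} (1 - \theta)^{n - a - b}}_{=:\, g'((a, b); \theta)} \cdot \underbrace{1}_{=:\, h'(x)}, \qquad T(x) = \psi(T'(x)) \quad \text{with } \psi(a, b) := a + b,
\end{align*}
so $T'$ is sufficient and $T$ factors through it. The converse fails: $x = (1, 0, 0, \ldots, 0)$ and $y = (0, 1, 0, \ldots, 0)$ satisfy $T(x) = T(y) = 1$ but $T'(x) = (1, 0) \ne (0, 1) = T'(y)$, so $T'$ is not a function of $T$ and is sufficient without being minimal.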
[/guided]
[/step]
[step:Conclude the biconditional]
The two directions together show that $T$ is minimal sufficient if and only if, for all $x, y \in \mathcal{X}_0$,
\begin{align*}
T(x) = T(y) \iff \frac{f(x; \theta)}{f(y; \theta)} \text{ does not depend on } \theta.
\end{align*}
This is the Minimal Sufficiency Criterion.
[/step]