[proofplan]
Both directions rest on a single set-theoretic equivalence: for every $\theta_0 \in \Theta$ and every $x$, the statement $\theta_0 \in I(x)$ is *by definition* equivalent to $x \in A(\theta_0)$. This equivalence turns a statement about the random set $I(X)$ containing the fixed value $\theta_0$ into a statement about the random data $X$ falling into the fixed set $A(\theta_0)$, and so converts a coverage probability into a test acceptance probability, and vice versa. Part (1) starts from a test of size $\alpha$ and derives confidence level $1 - \alpha$; Part (2) starts from confidence level $1 - \alpha$ and derives a test of size $\alpha$. The logical content of both parts is the same identity, read in opposite directions.
[/proofplan]
[step:Establish the foundational equivalence $\theta_0 \in I(x) \iff x \in A(\theta_0)$]
The two parts are proved separately, but both rest on the same set-theoretic equivalence, which we record first.
For Part (1), $I$ is *defined* by $I(x) := \{\theta \in \Theta : x \in A(\theta)\}$. By the defining membership,
\begin{align*}
\theta_0 \in I(x) \iff x \in A(\theta_0),
\end{align*}
for every $\theta_0 \in \Theta$ and every $x$ in the sample space.
For Part (2), $A$ is *defined* by $A(\theta_0) := \{x : \theta_0 \in I(x)\}$. By the defining membership,
\begin{align*}
x \in A(\theta_0) \iff \theta_0 \in I(x),
\end{align*}
for every $\theta_0 \in \Theta$ and every $x$ in the sample space.
These two assertions say the same thing, read from opposite directions. In both parts, the equivalence holds *pointwise in $(x, \theta_0)$*: it is a tautology built into the definition of whichever object is being constructed. Because it holds pointwise, substituting the random data $X$ for $x$ gives an equality of events, $\{X \in A(\theta_0)\} = \{\theta_0 \in I(X)\}$, and equal events have equal probability under any probability measure.
[guided]
The duality theorem is striking because the proof looks almost trivial — it is essentially a translation between two notations for the same underlying object. The "object" here is the subset
\begin{align*}
R := \{(x, \theta) \in \mathcal{X} \times \Theta : \text{the test at parameter } \theta \text{ accepts } H_0 \text{ on data } x\} \subseteq \mathcal{X} \times \Theta.
\end{align*}
Both $A(\theta_0)$ and $I(x)$ are slices of this same set $R$:
\begin{align*}
A(\theta_0) &= \{x : (x, \theta_0) \in R\} &&\text{(horizontal slice at $\theta = \theta_0$)}, \\
I(x) &= \{\theta : (x, \theta) \in R\} &&\text{(vertical slice at $x$)}.
\end{align*}
The equivalence $\theta_0 \in I(x) \iff x \in A(\theta_0)$ just says that a point $(x, \theta_0)$ is in $R$ iff it is in $R$ — both sides of the iff are literal rewrites of $(x, \theta_0) \in R$. Whichever direction of the duality we are proving, we are given one slicing convention as the definition and derive the other from it.
Because the equivalence holds pointwise, we can lift it to probability statements: for fixed $\theta_0$,
\begin{align*}
\{X \in A(\theta_0)\} = \{\theta_0 \in I(X)\}
\end{align*}
is an equality of events (subsets of the sample space), hence has equal probability under any measure.
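The slicing picture can be checked mechanically on a finite toy example. The sketch below is only an illustration under stated assumptions: the sample space $\{0, \ldots, 10\}$, the parameter grid, and the acceptance rule in `accepts` are hypothetical choices made for the example, not part of the theorem. It builds the relation $R$ once and reads off both slices from it.

```python
from itertools import product

# Hypothetical toy setup: data x in {0, ..., 10}, parameter theta on a coarse grid.
SAMPLE_SPACE = range(11)
THETA_GRID = [k / 10 for k in range(11)]

def accepts(x, theta):
    """Illustrative acceptance rule (an assumption): accept when x/10 is within 0.25 of theta."""
    return abs(x / 10 - theta) <= 0.25

# The relation R: all pairs (x, theta) for which the test at theta accepts on data x.
R = {(x, theta) for x, theta in product(SAMPLE_SPACE, THETA_GRID) if accepts(x, theta)}

def acceptance_region(theta0):
    """Horizontal slice A(theta0) = {x : (x, theta0) in R}."""
    return {x for x in SAMPLE_SPACE if (x, theta0) in R}

def confidence_set(x):
    """Vertical slice I(x) = {theta : (x, theta) in R}."""
    return {theta for theta in THETA_GRID if (x, theta) in R}

# Pointwise check of  theta0 in I(x)  <=>  x in A(theta0)  over the whole grid.
equivalence_holds = all(
    (theta0 in confidence_set(x)) == (x in acceptance_region(theta0))
    for x, theta0 in product(SAMPLE_SPACE, THETA_GRID)
)
print(equivalence_holds)  # True: both sides reduce to (x, theta0) in R
```

The check cannot fail for any choice of `accepts`, which is the point: the equivalence is a property of slicing a single set, not of the particular test.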
[/guided]
[/step]
[step:Prove Part (1) — test $\Rightarrow$ confidence set]
Assume the hypothesis of Part (1): for every $\theta_0 \in \Theta$, $A(\theta_0) \subseteq \mathcal{X}$ is the acceptance region of a size $\alpha$ test of $H_0: \theta = \theta_0$ based on the data $X = (X_1, \ldots, X_n) \sim f_{X}(\cdot \mid \theta)$. By the definition of size, this means
\begin{align*}
\mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha,
\end{align*}
where $\mathbb{P}(\cdot \mid \theta = \theta_0)$ denotes the probability under the sampling distribution $f_{X}(\cdot \mid \theta_0)$. (We write $= 1 - \alpha$ rather than $\ge 1 - \alpha$: the null hypothesis $H_0: \theta = \theta_0$ is simple, so a size $\alpha$ test has Type I error probability exactly $\alpha$, and the acceptance probability under $\theta_0$ is exactly $1 - \alpha$.)
Define
\begin{align*}
I: \mathcal{X} &\to \mathcal{P}(\Theta), \\
x &\mapsto \{\theta \in \Theta : x \in A(\theta)\}.
\end{align*}
Fix any $\theta_0 \in \Theta$. By Step 1, the events $\{X \in A(\theta_0)\}$ and $\{\theta_0 \in I(X)\}$ are identical. Therefore, evaluating their probabilities under $\mathbb{P}(\cdot \mid \theta = \theta_0)$,
\begin{align*}
\mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0) = \mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha.
\end{align*}
Since $\theta_0 \in \Theta$ was arbitrary, the probability above equals $1 - \alpha$ for every $\theta_0 \in \Theta$. By the definition of a $100(1-\alpha)\%$ confidence set — a random set $I(X)$ such that $\mathbb{P}(\theta \in I(X) \mid \theta) = 1 - \alpha$ for every $\theta \in \Theta$ — the set $I(X)$ is a $100(1-\alpha)\%$ confidence set for $\theta$. This proves Part (1).
[guided]
The hypothesis is that, for each candidate parameter $\theta_0$, we have a test of $H_0: \theta = \theta_0$ with acceptance region $A(\theta_0)$. The size $\alpha$ condition gives
\begin{align*}
\mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha.
\end{align*}
This says: *when $\theta_0$ is the true parameter*, the probability of observing data $X$ in $A(\theta_0)$ is $1 - \alpha$.
We want to construct a confidence set. The idea — called *inverting the family of tests* — is to declare that $\theta_0$ belongs to the confidence set iff the test at $\theta_0$ would not reject on data $x$: $I(x) := \{\theta : x \in A(\theta)\}$. This is the definition.
To check that $I(X)$ has coverage probability $1 - \alpha$, fix an arbitrary $\theta_0 \in \Theta$ and compute $\mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0)$. By the equivalence from Step 1, this is the same event as $\{X \in A(\theta_0)\}$ — not approximately the same, but literally the same subset of the sample space. So their probabilities under any measure are equal:
\begin{align*}
\mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0) = \mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha.
\end{align*}
The first equality uses the set-theoretic identity of events; the second uses the size $\alpha$ hypothesis. Since $\theta_0$ was arbitrary, this coverage probability is $1 - \alpha$ *at every $\theta \in \Theta$*, which is precisely what a $100(1-\alpha)\%$ confidence set requires.
A conceptual remark: when we evaluate $\mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0)$, the randomness is entirely in $X$. The value $\theta_0$ is fixed, and it plays two roles: it is the point whose membership in $I(X)$ is being checked, and it is the parameter of the sampling distribution that generates $X$. This matches the frequentist interpretation of coverage: the *procedure* $X \mapsto I(X)$, operating on random data from the true distribution, covers the fixed true parameter with the stated probability.
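As a concrete instance of inverting a family of tests, the sketch below assumes a model that is not part of the theorem: $X_1, \ldots, X_n$ i.i.d. $N(\theta, \sigma^2)$ with known $\sigma$, tested by the two-sided $z$-test. All names and constants (`in_acceptance_region`, `confidence_interval`, `estimate_rates`, `SIGMA`, `N_OBS`) are illustrative. It estimates the coverage of the inverted interval and the acceptance rate of the test by simulation, and compares both with the exact acceptance probability under $H_0$.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
SIGMA = 2.0   # assumed known standard deviation
N_OBS = 10    # assumed sample size n
Z = NormalDist().inv_cdf(1 - ALPHA / 2)        # two-sided critical value
HALF_WIDTH = Z * SIGMA / math.sqrt(N_OBS)

def in_acceptance_region(x, theta0):
    """x in A(theta0): the two-sided z-test of H0: theta = theta0 does not reject."""
    return abs(fmean(x) - theta0) <= HALF_WIDTH

def confidence_interval(x):
    """I(x) from inverting the tests: all theta with |mean(x) - theta| <= HALF_WIDTH."""
    centre = fmean(x)
    return centre - HALF_WIDTH, centre + HALF_WIDTH

def estimate_rates(theta0, n_rep=20_000, seed=0):
    """Monte Carlo estimates of P(theta0 in I(X)) and P(X in A(theta0)) when theta = theta0."""
    rng = random.Random(seed)
    covered = accepted = 0
    for _ in range(n_rep):
        x = [rng.gauss(theta0, SIGMA) for _ in range(N_OBS)]
        lower, upper = confidence_interval(x)
        covered += lower <= theta0 <= upper
        accepted += in_acceptance_region(x, theta0)
    return covered / n_rep, accepted / n_rep

# Exact acceptance probability under H0: P(|Z| <= z) for a standard normal Z.
exact_acceptance = NormalDist().cdf(Z) - NormalDist().cdf(-Z)

for theta0 in (-1.0, 0.0, 3.5):
    coverage, acceptance = estimate_rates(theta0)
    print(theta0, round(coverage, 3), round(acceptance, 3), round(exact_acceptance, 3))
```

Under these assumptions the two simulated rates should agree with each other and sit close to $1 - \alpha = 0.95$ at every $\theta_0$ tried, up to Monte Carlo error.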
[/guided]
[/step]
[step:Prove Part (2) — confidence set $\Rightarrow$ test]
Assume the hypothesis of Part (2): $I(X)$ is a $100(1-\alpha)\%$ confidence set for $\theta$, meaning
\begin{align*}
\mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0) = 1 - \alpha
\end{align*}
for every $\theta_0 \in \Theta$. Define, for each $\theta_0 \in \Theta$,
\begin{align*}
A(\theta_0) := \{x \in \mathcal{X} : \theta_0 \in I(x)\}.
\end{align*}
Consider the test of $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ with critical region $C(\theta_0) := \mathcal{X} \setminus A(\theta_0)$ — we reject $H_0$ iff $X \notin A(\theta_0)$. The size of this test is
\begin{align*}
\mathbb{P}(X \in C(\theta_0) \mid \theta = \theta_0) = 1 - \mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0).
\end{align*}
By Step 1, $\{X \in A(\theta_0)\} = \{\theta_0 \in I(X)\}$ as events. Therefore
\begin{align*}
\mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = \mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0) = 1 - \alpha,
\end{align*}
where the last equality uses the confidence-set hypothesis. Substituting,
\begin{align*}
\mathbb{P}(X \in C(\theta_0) \mid \theta = \theta_0) = 1 - (1 - \alpha) = \alpha.
\end{align*}
Hence $A(\theta_0)$ is the acceptance region of a size $\alpha$ test of $H_0: \theta = \theta_0$, for every $\theta_0 \in \Theta$. This proves Part (2) and completes the proof of both directions of the duality.
[guided]
Part (2) is the reverse direction. We start from a $100(1-\alpha)\%$ confidence set $I(X)$ and construct, for each candidate $\theta_0$, a size $\alpha$ test of $H_0: \theta = \theta_0$.
The construction is the mirror image of Part (1)'s inversion: we declare the test at $\theta_0$ to *accept* when $\theta_0$ lies in the observed confidence set, i.e.
\begin{align*}
A(\theta_0) := \{x : \theta_0 \in I(x)\}.
\end{align*}
Equivalently, we reject $H_0: \theta = \theta_0$ when $\theta_0 \notin I(X)$. This is the operational reading of a confidence interval: if the observed interval is $I(x) = (2.4, 3.1)$, then the null value $\theta_0 = 2$ is rejected at level $\alpha$, while the null value $\theta_0 = 3$ is not rejected.
To verify the size, we compute the Type I error probability $\mathbb{P}(X \notin A(\theta_0) \mid \theta = \theta_0)$ under the sampling distribution when $\theta = \theta_0$ is the true parameter. By complement,
\begin{align*}
\mathbb{P}(X \notin A(\theta_0) \mid \theta = \theta_0) = 1 - \mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0).
\end{align*}
The event $\{X \in A(\theta_0)\}$ is, by the Step 1 equivalence, identical to $\{\theta_0 \in I(X)\}$. The confidence-set hypothesis gives $\mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0) = 1 - \alpha$, hence $\mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = 1 - \alpha$. Substituting,
\begin{align*}
\mathbb{P}(X \notin A(\theta_0) \mid \theta = \theta_0) = \alpha,
\end{align*}
which is the definition of a size $\alpha$ test. Since $\theta_0$ was arbitrary, this holds uniformly for every $\theta_0$ — we have constructed an entire family of size $\alpha$ tests, indexed by the null parameter value.
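The construction works for any confidence set, not only for two-sided intervals. The sketch below is a hedged illustration under assumptions of its own: the data are i.i.d. $N(\theta, \sigma^2)$ with known $\sigma$, and the confidence set is the one-sided upper bound $I(x) = (-\infty, \bar{x} + z_{1-\alpha}\,\sigma/\sqrt{n}\,]$, which has coverage exactly $1 - \alpha$ in that model. The function names are hypothetical. The derived test rejects exactly when $\theta_0 \notin I(x)$, and the simulation estimates its Type I error.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
SIGMA = 1.0   # assumed known standard deviation
N_OBS = 8     # assumed sample size n
Z_ONE_SIDED = NormalDist().inv_cdf(1 - ALPHA)

def confidence_set_contains(theta0, x):
    """theta0 in I(x), where I(x) = (-inf, mean(x) + z * sigma / sqrt(n)]."""
    return theta0 <= fmean(x) + Z_ONE_SIDED * SIGMA / math.sqrt(N_OBS)

def rejects(x, theta0):
    """Derived test: reject H0: theta = theta0 iff theta0 is not in I(x)."""
    return not confidence_set_contains(theta0, x)

def estimate_type_one_error(theta0, n_rep=20_000, seed=1):
    """Monte Carlo estimate of P(X not in A(theta0)) when theta = theta0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        x = [rng.gauss(theta0, SIGMA) for _ in range(N_OBS)]
        rejections += rejects(x, theta0)
    return rejections / n_rep

print(round(estimate_type_one_error(2.0), 3))
```

Under these assumptions the estimate should land near $\alpha = 0.05$, up to Monte Carlo error, for any null value passed in.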
The symmetry between Part (1) and Part (2) is exact. Both parts use the same pointwise equivalence (Step 1) and evaluate probabilities under the same distribution, $\mathbb{P}(\cdot \mid \theta = \theta_0)$; the only difference is which side of the identity $\mathbb{P}(X \in A(\theta_0) \mid \theta = \theta_0) = \mathbb{P}(\theta_0 \in I(X) \mid \theta = \theta_0)$ is known from the hypothesis and which is deduced. This is why the theorem deserves to be called a *duality* rather than a pair of separate implications: the two constructions $A \mapsto I$ and $I \mapsto A$ are mutually inverse. If one starts with a family of tests, constructs the confidence set, and then constructs the associated family of tests, one recovers the original acceptance regions exactly, since $\{x : \theta_0 \in I(x)\} = \{x : x \in A(\theta_0)\} = A(\theta_0)$.
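The round trip from tests to confidence sets and back can be checked on a small finite example. The sketch below is illustrative only: the sample space, the parameter labels, and the family `A` of acceptance regions are hypothetical, and no probabilities are involved, since the round trip is purely set-theoretic.

```python
SAMPLE_SPACE = range(6)
THETAS = ["a", "b", "c"]   # hypothetical parameter labels

# A hypothetical family of acceptance regions, one per null value theta0.
A = {"a": {0, 1, 2}, "b": {1, 2, 3, 4}, "c": {4, 5}}

def invert_tests(acceptance):
    """Tests -> confidence sets: I(x) = {theta : x in A(theta)}."""
    return {
        x: {t for t, region in acceptance.items() if x in region}
        for x in SAMPLE_SPACE
    }

def invert_confidence_sets(conf_sets):
    """Confidence sets -> tests: A(theta0) = {x : theta0 in I(x)}."""
    return {
        t: {x for x, members in conf_sets.items() if t in members}
        for t in THETAS
    }

conf_sets = invert_tests(A)
A_again = invert_confidence_sets(conf_sets)
print(A_again == A)  # True: the round trip returns the original acceptance regions
```

Applying `invert_tests` to `A_again` likewise returns `conf_sets`, so in this finite setting the two constructions undo each other in both orders.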
[/guided]
[/step]