[example: Common Parametric Families]
Three families appear repeatedly in this course, and they illustrate two distinct ways in which the choice of family matters: it determines what remains to be estimated, and it shapes what the model predicts.
**Poisson.** If $X \sim \operatorname{Pois}(\mu)$ with $\mu > 0$, then
\begin{align*}
f(x; \mu) = \frac{e^{-\mu}\mu^x}{x!}, \qquad x = 0, 1, 2, \ldots
\end{align*}
The parameter $\mu = \mathbb{E}[X]$ is the mean rate of occurrence. We assume the data come from *some* Poisson distribution, but we do not know $\mu$.
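As a quick numerical check of the Poisson formula, here is a minimal Python sketch (standard library only). It evaluates $f(x;\mu)$ on the log scale and confirms that the probabilities sum to 1 and have mean $\mu$. The function name `poisson_pmf` and the truncation of the infinite support at $x < 60$ are illustrative choices, not part of the course material.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson pmf f(x; mu) = exp(-mu) * mu**x / x!, for x = 0, 1, 2, ..."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    if x < 0:
        return 0.0  # outside the support
    # Log scale avoids overflow in mu**x and x! for large x.
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))


mu = 1.8
support = range(60)  # truncation of the infinite support (illustrative cutoff)
total = sum(poisson_pmf(x, mu) for x in support)
mean = sum(x * poisson_pmf(x, mu) for x in support)
print(f"total mass ~ {total:.12f}, mean ~ {mean:.12f}")
```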
**Binomial.** If $X \sim \operatorname{Bin}(n, \theta)$ with $n$ known and $\theta \in (0,1)$ unknown,
\begin{align*}
f(x; \theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}, \qquad x = 0, 1, \ldots, n.
\end{align*}
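The binomial formula can be checked the same way. This sketch assumes Python 3.8+ for `math.comb`. The values $n = 10$ and $\theta = 0.3$ are arbitrary illustrations, and `binomial_pmf` is a hypothetical helper name.

```python
import math


def binomial_pmf(x: int, n: int, theta: float) -> float:
    """Binomial pmf f(x; theta) = C(n, x) theta^x (1 - theta)^(n - x), x = 0, ..., n."""
    if not 0 < theta < 1:
        raise ValueError("theta must lie in (0, 1)")
    if x < 0 or x > n:
        return 0.0  # outside the support
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


n, theta = 10, 0.3  # n is known; theta plays the role of the unknown parameter
probs = [binomial_pmf(x, n, theta) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))
print(f"total mass ~ {sum(probs):.12f}, mean ~ {mean:.12f} (n * theta = {n * theta})")
```

Because the support is finite, no truncation is needed here, unlike the Poisson case.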
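**Normal.** If $X \sim N(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ both unknown, the parameter is $\theta = (\mu, \sigma^2)$ and $\Theta = \mathbb{R} \times (0,\infty)$ is two-dimensional. If $\sigma^2$ is known, $\Theta = \mathbb{R}$; if $\mu$ is known, $\Theta = (0,\infty)$. Either way, $\Theta$ is then one-dimensional.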
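To make the dimension of $\Theta$ concrete, the following sketch writes the normal density as a function of the pair $\theta = (\mu, \sigma^2)$. It then shows the known-variance family as the special case in which only $\mu$ varies. The helper names, the default $\sigma^2 = 1$, and the Riemann-sum check on $[-20, 20]$ are illustrative assumptions.

```python
import math
from typing import Tuple


def normal_pdf(x: float, theta: Tuple[float, float]) -> float:
    """N(mu, sigma^2) density, with theta = (mu, sigma^2) in R x (0, inf)."""
    mu, sigma2 = theta
    if sigma2 <= 0:
        raise ValueError("sigma^2 must be positive")
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_pdf_known_variance(x: float, mu: float, sigma2: float = 1.0) -> float:
    """sigma^2 fixed in advance: the family is indexed by mu alone, so Theta = R."""
    return normal_pdf(x, (mu, sigma2))


# Riemann-sum check that one two-parameter member integrates to about 1.
theta = (0.5, 2.0)
steps = 40_000
h = 40 / steps  # grid spacing on [-20, 20]
total = h * sum(normal_pdf(-20 + h * i, theta) for i in range(steps + 1))
print(f"integral ~ {total:.8f}")
```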
Now consider what the Poisson assumption buys us. Suppose we observe $n = 10$ counts, say $x = (2, 0, 3, 1, 4, 2, 1, 0, 3, 2)$. Under the Poisson model the only free quantity is $\mu$. The maximum likelihood estimate (derived in Chapter 1) is simply the sample mean, $\hat{\mu} = \bar{x} = 18/10 = 1.8$. The parametric assumption has reduced the problem of learning an entire distribution to estimating a single number.
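The arithmetic can be checked directly. This sketch evaluates the Poisson log-likelihood $\ell(\mu) = \sum_i \left[-\mu + x_i \log \mu - \log x_i!\right]$ for the 10 counts above. It then compares the closed-form estimate $\bar{x}$ with a brute-force grid search. The grid over $(0, 5]$ with step $0.001$ is an arbitrary choice made only for the cross-check.

```python
import math

x = [2, 0, 3, 1, 4, 2, 1, 0, 3, 2]


def poisson_loglik(mu: float, data: list[int]) -> float:
    """Poisson log-likelihood: sum over i of -mu + x_i log(mu) - log(x_i!)."""
    return sum(-mu + xi * math.log(mu) - math.lgamma(xi + 1) for xi in data)


mu_hat = sum(x) / len(x)  # closed-form MLE: the sample mean, 18 / 10

# Brute-force cross-check over a grid on (0, 5] with step 0.001.
grid = [k / 1000 for k in range(1, 5001)]
mu_grid = max(grid, key=lambda m: poisson_loglik(m, x))
print(f"closed form: {mu_hat}, grid search: {mu_grid}")
```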
Without the Poisson assumption, the same 10 observations are compatible with any distribution on $\{0,1,2,\ldots\}$ that assigns positive probability to each of the observed values $0, 1, 2, 3, 4$. A geometric distribution, a negative binomial, or a distribution that places all its mass on $\{0,1,2,3,4\}$: each gives the sample positive likelihood. The sample cannot single one out, because the unrestricted model has a separate unknown probability $P(X = k)$ for every $k$. That is infinitely many free quantities to be determined from only 10 observations. The parametric assumption is the commitment that converts this underdetermined problem into a tractable one.
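One way to see the underdetermination numerically is to score several candidate distributions on the same 10 counts. The sketch below is illustrative and uses the standard library only. The geometric is parameterized to have mean $1.8$. The "empirical" candidate sets $P(X = k)$ to the observed frequency of $k$, which is the unrestricted maximum-likelihood fit. `shifted`, a hypothetical distribution uniform on $\{5, \ldots, 9\}$, is included as a contrast that rules out the observed values.

```python
import math
from collections import Counter

x = [2, 0, 3, 1, 4, 2, 1, 0, 3, 2]
mu = sum(x) / len(x)  # 1.8


def loglik(pmf, data):
    """Log-likelihood of data under pmf; -inf if an observed value has probability 0."""
    total = 0.0
    for xi in data:
        p = pmf(xi)
        if p <= 0:
            return -math.inf
        total += math.log(p)
    return total


def poisson(k):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def geometric(k, p=1 / (1 + mu)):
    # Failures before the first success on {0, 1, 2, ...}; mean (1 - p) / p = mu.
    return p * (1 - p) ** k


counts = Counter(x)


def empirical(k):
    # Unrestricted fit: one free probability per value, set to its observed frequency.
    return counts[k] / len(x)


def shifted(k):
    # Hypothetical: uniform on {5, ..., 9}, so every observed value has probability 0.
    return 0.2 if 5 <= k <= 9 else 0.0


candidates = [("Poisson", poisson), ("geometric", geometric),
              ("empirical", empirical), ("shifted", shifted)]
results = {name: loglik(f, x) for name, f in candidates}
for name, ll in results.items():
    print(f"{name:>10}: {ll:.3f}")
```

All candidates except `shifted` receive finite log-likelihoods. The empirical fit scores highest, yet it puts zero probability on every $k \ge 5$, so fitting the observed counts best does not make it a sensible model for future draws.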
Notice also that the choice of family has predictive consequences. If we believe the data are Poisson with $\mu = \hat{\mu} = 1.8$, then the probability of observing $X = 10$ in a future trial is $e^{-1.8}(1.8)^{10}/10! \approx 1.6 \times 10^{-5}$. If instead we model the data as negative binomial with the same mean, the tail probabilities differ, possibly by a large factor. A negative binomial has variance larger than its mean, so it places more probability far into the upper tail than a Poisson with the same mean. The family is not a neutral container; it shapes what the model predicts beyond the range of the observed data.
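The two tail probabilities can be compared directly. This sketch uses the standard mean-and-size parameterization of the negative binomial: it counts failures before the $r$-th success, with success probability $p = r/(r+\mu)$, so the mean is $\mu$ and the variance is $\mu + \mu^2/r$. The size $r = 2$ is a hypothetical choice, since the text does not fix one.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def negbin_pmf(k: int, mu: float, r: float) -> float:
    """Negative binomial with mean mu and size r > 0 (variance mu + mu**2 / r).

    Counts failures before the r-th success, with success probability p = r / (r + mu).
    """
    p = r / (r + mu)
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))


mu = 1.8
p_pois = poisson_pmf(10, mu)
p_nb = negbin_pmf(10, mu, r=2.0)  # r = 2 is a hypothetical dispersion choice
print(f"P(X = 10): Poisson {p_pois:.2e}, negative binomial (r = 2) {p_nb:.2e}")
```

With $r = 2$ the negative binomial gives $P(X = 10) \approx 1.7 \times 10^{-3}$, roughly a hundred times the Poisson value. As $r \to \infty$ with the mean held fixed, the negative binomial approaches the Poisson and the gap closes.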
[/example]