[proofplan]
We prove the theorem by structural induction on formulas. First we verify that every term is interpreted in the ultraproduct coordinatewise modulo $\mathcal U$; this reduces atomic formulas to the defining interpretations of equality and relation symbols in the ultraproduct. Boolean connectives are then handled by the closure of an ultrafilter under finite intersections and by the fact that an ultrafilter decides complements. The existential quantifier is the only non-formal step: one direction uses a representative of an ultraproduct witness, and the other direction chooses coordinatewise witnesses on a $\mathcal U$-large set.
[/proofplan]
[step:Define the coordinate truth set and reduce terms to coordinatewise evaluation]
Let $P := \prod_{i \in I}|M_i|$. For $a,b \in P$, write
\begin{align*}
a \sim_{\mathcal U} b
\quad \Longleftrightarrow \quad
\{i \in I : a(i)=b(i)\}\in \mathcal U.
\end{align*}
The universe of $M$ is $P/\sim_{\mathcal U}$, and $[a]_{\mathcal U}$ denotes the $\sim_{\mathcal U}$-class of $a \in P$.
For an $L$-formula $\theta(x_1,\dots,x_m)$ and elements $c_1,\dots,c_m \in P$, define its coordinate truth set by
\begin{align*}
\|\theta(c_1,\dots,c_m)\|
:=
\{i \in I : M_i \models \theta(c_1(i),\dots,c_m(i))\}.
\end{align*}
We first prove the term evaluation lemma. For every $L$-term $t(x_1,\dots,x_n)$ and every $a_1,\dots,a_n \in P$, define
\begin{align*}
t_P(a_1,\dots,a_n): I &\to \bigcup_{i \in I}|M_i| \\
i &\mapsto t^{M_i}(a_1(i),\dots,a_n(i)).
\end{align*}
Then $t_P(a_1,\dots,a_n) \in P$, and
\begin{align*}
t^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
[t_P(a_1,\dots,a_n)]_{\mathcal U}.
\end{align*}
This is proved by induction on the construction of $t$. If $t=x_k$, then $t_P(a_1,\dots,a_n)=a_k$, so the identity follows from the interpretation of variables. If
\begin{align*}
t = f(s_1,\dots,s_m)
\end{align*}
for an $m$-ary function symbol $f$ of $L$ and terms $s_1,\dots,s_m$, the induction hypothesis gives
\begin{align*}
s_r^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
[(s_r)_P(a_1,\dots,a_n)]_{\mathcal U}
\end{align*}
for each $r \in \{1,\dots,m\}$. The interpretation of $f$ in the ultraproduct is coordinatewise:
\begin{align*}
f^M([b_1]_{\mathcal U},\dots,[b_m]_{\mathcal U})
=
[i \mapsto f^{M_i}(b_1(i),\dots,b_m(i))]_{\mathcal U}.
\end{align*}
Substituting $b_r=(s_r)_P(a_1,\dots,a_n)$ gives the desired formula for $t$.
[guided]
The point of this preliminary step is that formulas are built out of terms, so atomic formulas can only be checked once we know how terms behave in the ultraproduct.
Let $P := \prod_{i \in I}|M_i|$. Thus an element $a \in P$ is a function assigning to each index $i \in I$ an element $a(i)\in |M_i|$. Define the [equivalence relation](/page/Equivalence%20Relation)
\begin{align*}
a \sim_{\mathcal U} b
\quad \Longleftrightarrow \quad
\{i \in I : a(i)=b(i)\}\in \mathcal U.
\end{align*}
The underlying set of the ultraproduct $M$ is $P/\sim_{\mathcal U}$.
For any formula $\theta(x_1,\dots,x_m)$ and any tuple $c_1,\dots,c_m \in P$, define the coordinate truth set
\begin{align*}
\|\theta(c_1,\dots,c_m)\|
:=
\{i \in I : M_i \models \theta(c_1(i),\dots,c_m(i))\}.
\end{align*}
This notation records exactly where the formula is true among the factors.
Now fix a term $t(x_1,\dots,x_n)$ and elements $a_1,\dots,a_n \in P$. Define
\begin{align*}
t_P(a_1,\dots,a_n): I &\to \bigcup_{i \in I}|M_i| \\
i &\mapsto t^{M_i}(a_1(i),\dots,a_n(i)).
\end{align*}
For each $i$, the value lies in $|M_i|$, so $t_P(a_1,\dots,a_n)\in P$. We prove by induction on terms that
\begin{align*}
t^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
[t_P(a_1,\dots,a_n)]_{\mathcal U}.
\end{align*}
If $t=x_k$, then evaluating $t$ in $M$ returns the $k$th argument:
\begin{align*}
t^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
[a_k]_{\mathcal U}.
\end{align*}
Also $t_P(a_1,\dots,a_n)=a_k$, so the equality follows.
Suppose $t=f(s_1,\dots,s_m)$, where $f$ is an $m$-ary function symbol of $L$. By the induction hypothesis, for each $r \in \{1,\dots,m\}$,
\begin{align*}
s_r^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
[(s_r)_P(a_1,\dots,a_n)]_{\mathcal U}.
\end{align*}
The ultraproduct interprets function symbols coordinatewise:
\begin{align*}
f^M([b_1]_{\mathcal U},\dots,[b_m]_{\mathcal U})
=
[i \mapsto f^{M_i}(b_1(i),\dots,b_m(i))]_{\mathcal U}.
\end{align*}
Applying this with $b_r=(s_r)_P(a_1,\dots,a_n)$ gives
\begin{align*}
t^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
[t_P(a_1,\dots,a_n)]_{\mathcal U}.
\end{align*}
[/guided]
[/step]
[step:Verify atomic formulas from the definition of the ultraproduct]
Let $\varphi(x_1,\dots,x_n)$ be atomic.
First suppose
\begin{align*}
\varphi \equiv (s=t)
\end{align*}
for $L$-terms $s,t$. By the term evaluation lemma,
\begin{align*}
M \models s([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
=
t([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
\end{align*}
if and only if
\begin{align*}
[s_P(a_1,\dots,a_n)]_{\mathcal U}
=
[t_P(a_1,\dots,a_n)]_{\mathcal U}.
\end{align*}
By the definition of $\sim_{\mathcal U}$, this is equivalent to
\begin{align*}
\{i \in I : s^{M_i}(a_1(i),\dots,a_n(i))
=
t^{M_i}(a_1(i),\dots,a_n(i))\}\in \mathcal U,
\end{align*}
which is exactly $\|\varphi(a_1,\dots,a_n)\|\in \mathcal U$.
Now suppose
\begin{align*}
\varphi \equiv R(s_1,\dots,s_m)
\end{align*}
for an $m$-ary relation symbol $R$ of $L$. By the term evaluation lemma and by the definition of the interpretation of $R$ in the ultraproduct,
\begin{align*}
M \models R(s_1([a]_{\mathcal U}),\dots,s_m([a]_{\mathcal U}))
\end{align*}
if and only if
\begin{align*}
\{i \in I : M_i \models R((s_1)_P(a)(i),\dots,(s_m)_P(a)(i))\}\in \mathcal U,
\end{align*}
where $a$ denotes the tuple $(a_1,\dots,a_n)$. This set is precisely $\|\varphi(a_1,\dots,a_n)\|$, so the theorem holds for atomic formulas.
[guided]
There are two kinds of atomic formulas: equalities of terms and applications of relation symbols to terms.
First consider an equality
\begin{align*}
\varphi \equiv (s=t).
\end{align*}
By the term evaluation lemma, the two sides evaluated in $M$ are
\begin{align*}
s^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
&=
[s_P(a_1,\dots,a_n)]_{\mathcal U},\\
t^M([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
&=
[t_P(a_1,\dots,a_n)]_{\mathcal U}.
\end{align*}
Equality in the quotient $P/\sim_{\mathcal U}$ means equality on a $\mathcal U$-large set of coordinates. Therefore
\begin{align*}
M \models s=t
\end{align*}
at the tuple $([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})$ if and only if
\begin{align*}
\{i \in I : s^{M_i}(a_1(i),\dots,a_n(i))
=
t^{M_i}(a_1(i),\dots,a_n(i))\}\in \mathcal U.
\end{align*}
This is exactly the coordinate truth set of the equality formula.
Now consider a relation formula
\begin{align*}
\varphi \equiv R(s_1,\dots,s_m),
\end{align*}
where $R$ is an $m$-ary relation symbol of $L$. The interpretation of $R$ in the ultraproduct is defined by declaring
\begin{align*}
M \models R([b_1]_{\mathcal U},\dots,[b_m]_{\mathcal U})
\end{align*}
if and only if
\begin{align*}
\{i \in I : M_i \models R(b_1(i),\dots,b_m(i))\}\in \mathcal U.
\end{align*}
Taking $b_r=(s_r)_P(a_1,\dots,a_n)$ and using the term evaluation lemma gives precisely
\begin{align*}
M \models \varphi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
\quad \Longleftrightarrow \quad
\|\varphi(a_1,\dots,a_n)\|\in \mathcal U.
\end{align*}
[/guided]
[/step]
[step:Pass the induction through Boolean connectives using ultrafilter laws]
Assume the theorem holds for formulas $\psi$ and $\theta$ with free variables among $x_1,\dots,x_n$.
For conjunction,
\begin{align*}
\|(\psi \wedge \theta)(a_1,\dots,a_n)\|
=
\|\psi(a_1,\dots,a_n)\| \cap \|\theta(a_1,\dots,a_n)\|.
\end{align*}
Since $\mathcal U$ is a filter, this intersection belongs to $\mathcal U$ if and only if both factors belong to $\mathcal U$. By the induction hypothesis, this is equivalent to
\begin{align*}
M \models \psi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
\quad\text{and}\quad
M \models \theta([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}),
\end{align*}
which is equivalent to
\begin{align*}
M \models (\psi \wedge \theta)([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
For negation,
\begin{align*}
\|(\neg \psi)(a_1,\dots,a_n)\|
=
I \setminus \|\psi(a_1,\dots,a_n)\|.
\end{align*}
Since $\mathcal U$ is an ultrafilter, exactly one of a subset of $I$ and its complement belongs to $\mathcal U$. Hence $\|(\neg\psi)(a_1,\dots,a_n)\|\in \mathcal U$ if and only if $\|\psi(a_1,\dots,a_n)\|\notin \mathcal U$. By the induction hypothesis, this is equivalent to
\begin{align*}
M \not\models \psi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}),
\end{align*}
and therefore to
\begin{align*}
M \models \neg\psi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
The other Boolean connectives follow from these two cases by their definitions in terms of $\neg$ and $\wedge$.
[guided]
The Boolean part of the proof is where the algebra of an ultrafilter exactly matches the truth tables of first-order logic.
Assume first that the theorem has already been proved for $\psi$ and $\theta$. For conjunction, the coordinate set on which $\psi \wedge \theta$ is true is the intersection of the coordinate sets on which the two formulas are true:
\begin{align*}
\|(\psi \wedge \theta)(a_1,\dots,a_n)\|
=
\|\psi(a_1,\dots,a_n)\| \cap \|\theta(a_1,\dots,a_n)\|.
\end{align*}
A filter is closed under finite intersections and upward closed, so this intersection lies in $\mathcal U$ if and only if both $\|\psi(a_1,\dots,a_n)\|$ and $\|\theta(a_1,\dots,a_n)\|$ lie in $\mathcal U$. By the induction hypothesis, this is equivalent to
\begin{align*}
M \models \psi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
\end{align*}
and
\begin{align*}
M \models \theta([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
By the semantics of conjunction in $M$, this is equivalent to
\begin{align*}
M \models (\psi \wedge \theta)([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
For negation, the coordinate truth set is the complement:
\begin{align*}
\|(\neg \psi)(a_1,\dots,a_n)\|
=
I \setminus \|\psi(a_1,\dots,a_n)\|.
\end{align*}
The ultrafilter property says that for every subset $A \subset I$, exactly one of $A$ and $I \setminus A$ belongs to $\mathcal U$. Applying this to $A=\|\psi(a_1,\dots,a_n)\|$, we get
\begin{align*}
\|(\neg\psi)(a_1,\dots,a_n)\|\in \mathcal U
\quad \Longleftrightarrow \quad
\|\psi(a_1,\dots,a_n)\|\notin \mathcal U.
\end{align*}
By the induction hypothesis, this is equivalent to saying that $\psi$ is not true in $M$ at the tuple $([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})$. By the semantics of negation, this is exactly
\begin{align*}
M \models \neg\psi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
Disjunction, implication, and biconditional may be reduced to conjunction and negation, so no separate argument is needed for them.
[/guided]
[/step]
[step:Handle existential quantifiers by choosing coordinate witnesses on a large set]
Assume the theorem holds for $\psi(y,x_1,\dots,x_n)$. Let
\begin{align*}
\varphi(x_1,\dots,x_n) \equiv \exists y\,\psi(y,x_1,\dots,x_n).
\end{align*}
First suppose
\begin{align*}
M \models \exists y\,\psi(y,[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
Then there is an element $[b]_{\mathcal U}\in M$, represented by some $b \in P$, such that
\begin{align*}
M \models \psi([b]_{\mathcal U},[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
By the induction hypothesis,
\begin{align*}
\{i \in I : M_i \models \psi(b(i),a_1(i),\dots,a_n(i))\}\in \mathcal U.
\end{align*}
This set is contained in
\begin{align*}
\{i \in I : M_i \models \exists y\,\psi(y,a_1(i),\dots,a_n(i))\},
\end{align*}
so upward closure of $\mathcal U$ gives
\begin{align*}
\|\varphi(a_1,\dots,a_n)\|\in \mathcal U.
\end{align*}
Conversely, suppose
\begin{align*}
A :=
\{i \in I : M_i \models \exists y\,\psi(y,a_1(i),\dots,a_n(i))\}
\in \mathcal U.
\end{align*}
For each $i \in A$, choose $b(i)\in |M_i|$ such that
\begin{align*}
M_i \models \psi(b(i),a_1(i),\dots,a_n(i)).
\end{align*}
For each $i \in I\setminus A$, choose an arbitrary element $b(i)\in |M_i|$, possible because each $M_i$ is nonempty. This defines an element $b\in P$. Since
\begin{align*}
A \subset
\{i \in I : M_i \models \psi(b(i),a_1(i),\dots,a_n(i))\},
\end{align*}
upward closure of $\mathcal U$ gives
\begin{align*}
\{i \in I : M_i \models \psi(b(i),a_1(i),\dots,a_n(i))\}\in \mathcal U.
\end{align*}
By the induction hypothesis,
\begin{align*}
M \models \psi([b]_{\mathcal U},[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
Therefore
\begin{align*}
M \models \exists y\,\psi(y,[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
[guided]
This is the only step where witnesses matter. The induction hypothesis applies to $\psi$, but the formula we want to handle is $\exists y\,\psi$.
Assume first that
\begin{align*}
M \models \exists y\,\psi(y,[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
By the semantics of the existential quantifier in $M$, there is an element of the ultraproduct, say $[b]_{\mathcal U}\in M$ with $b\in P$, such that
\begin{align*}
M \models \psi([b]_{\mathcal U},[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
Now the induction hypothesis applies to $\psi$ with the tuple $(b,a_1,\dots,a_n)$. Hence
\begin{align*}
\{i \in I : M_i \models \psi(b(i),a_1(i),\dots,a_n(i))\}\in \mathcal U.
\end{align*}
At every index in this set, $b(i)$ is a witness to the existential formula in $M_i$. Therefore this set is contained in
\begin{align*}
\{i \in I : M_i \models \exists y\,\psi(y,a_1(i),\dots,a_n(i))\}.
\end{align*}
Since $\mathcal U$ is upward closed, the larger set also belongs to $\mathcal U$. This proves the forward implication for the existential formula.
Conversely, suppose
\begin{align*}
A :=
\{i \in I : M_i \models \exists y\,\psi(y,a_1(i),\dots,a_n(i))\}
\in \mathcal U.
\end{align*}
For each $i\in A$, the existential statement in $M_i$ gives at least one element of $|M_i|$ witnessing it. Choose one such element and call it $b(i)$, so that
\begin{align*}
M_i \models \psi(b(i),a_1(i),\dots,a_n(i)).
\end{align*}
For $i\in I\setminus A$, no witness is required, but we still need a full product element $b\in P$. Since every $M_i$ is nonempty, choose an arbitrary element $b(i)\in |M_i|$ for those indices. Thus we have defined
\begin{align*}
b \in \prod_{i \in I}|M_i| = P.
\end{align*}
By construction,
\begin{align*}
A \subset
\{i \in I : M_i \models \psi(b(i),a_1(i),\dots,a_n(i))\}.
\end{align*}
Because $A\in\mathcal U$ and $\mathcal U$ is upward closed, the right-hand set belongs to $\mathcal U$. Applying the induction hypothesis to $\psi$ gives
\begin{align*}
M \models \psi([b]_{\mathcal U},[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
Therefore $[b]_{\mathcal U}$ witnesses the existential quantifier in $M$, and so
\begin{align*}
M \models \exists y\,\psi(y,[a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U}).
\end{align*}
[/guided]
[/step]
[step:Conclude the structural induction over all formulas]
The atomic case, the Boolean connective cases, and the existential quantifier case cover the usual recursive construction of first-order formulas. Universal quantifiers are handled by the equivalence
\begin{align*}
\forall y\,\psi \equiv \neg\exists y\,\neg\psi,
\end{align*}
using the already proved cases of negation and existential quantification. Hence, by structural induction on $\varphi$, for every $L$-formula $\varphi(x_1,\dots,x_n)$ and every $a_1,\dots,a_n\in P$,
\begin{align*}
M \models \varphi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
\quad \Longleftrightarrow \quad
\|\varphi(a_1,\dots,a_n)\|\in \mathcal U.
\end{align*}
Expanding the definition of the coordinate truth set gives
\begin{align*}
M \models \varphi([a_1]_{\mathcal U},\dots,[a_n]_{\mathcal U})
\quad \Longleftrightarrow \quad
\{i \in I : M_i \models \varphi(a_1(i),\dots,a_n(i))\}\in \mathcal U,
\end{align*}
which is the desired conclusion.
[/step]