[proofplan]
We prove the stronger statement by adding countably many new constant symbols and requiring them to denote pairwise distinct elements. Every finite fragment of the expanded theory is satisfiable because the original theory has finite models of arbitrarily large cardinality. The first-order [compactness theorem](/theorems/2748) then gives a model of the expanded theory, and its reduct to the original language is an infinite model of $T$. The final assertion follows because a theory whose models were exactly the finite structures would have finite models of arbitrarily large size, hence an infinite model, contradicting the asserted description of its models.
[/proofplan]
[step:Expand the language by constants that must be pairwise distinct]
Let $L^*$ be the first-order language obtained from $L$ by adjoining a new constant symbol $c_k$ for each $k \in \mathbb{N}$. Define an $L^*$-theory $\Sigma$ by
\begin{align*}
\Sigma
:=
T
\cup
\{c_i \neq c_j : i,j \in \mathbb{N} \text{ and } i \neq j\}.
\end{align*}
Here each sentence $c_i \neq c_j$ is the $L^*$-sentence asserting that the interpretations of the constant symbols $c_i$ and $c_j$ are distinct.
[guided]
The goal is to force an infinite model using first-order sentences. First-order logic cannot directly say “the universe is infinite” by one sentence, but it can say, for each finite number of named elements, that those named elements are all distinct.
We therefore enlarge the language. Let $L^*$ be the first-order language obtained from $L$ by adding one new constant symbol $c_k$ for every $k \in \mathbb{N}$. These symbols are not assumed to name elements yet; they become interpreted only inside an $L^*$-structure.
Now define the expanded theory
\begin{align*}
\Sigma
:=
T
\cup
\{c_i \neq c_j : i,j \in \mathbb{N} \text{ and } i \neq j\}.
\end{align*}
The set $T$ is viewed as a set of $L^*$-sentences through the inclusion $L \subset L^*$. The new sentences require the interpretations of all the constants $c_1,c_2,\dots$ to be pairwise distinct. Any model of $\Sigma$ will therefore contain infinitely many distinct elements.
[/guided]
[/step]
[step:Verify that every finite part of the expanded theory has a model]
Let $\Sigma_0 \subset \Sigma$ be finite. Only finitely many of the new constant symbols occur in $\Sigma_0$; denote their distinct indices by
\begin{align*}
k_1,\dots,k_m \in \mathbb{N},
\end{align*}
where $m \in \mathbb{N} \cup \{0\}$. By hypothesis, there exists a finite $L$-structure $A$ such that $A \models T$ and $|A| \geq m$.
Choose distinct elements
\begin{align*}
a_1,\dots,a_m \in A.
\end{align*}
Make $A$ into an $L^*$-structure $A^*$ by interpreting $c_{k_r}^{A^*} := a_r$ for $1 \leq r \leq m$, and interpreting every remaining constant symbol $c_k$ arbitrarily as an element of $A$ when $m > 0$. If $m=0$, there are no constant symbols from $\Sigma_0$ to constrain, and any arbitrary interpretation of the new constants gives an $L^*$-expansion $A^*$ of $A$.
Since the $L$-reduct of $A^*$ is $A$ and $A \models T$, every sentence of $\Sigma_0 \cap T$ holds in $A^*$. Since the constants appearing in the finitely many inequalities from $\Sigma_0$ have been interpreted as distinct elements whenever the corresponding indices are distinct, every inequality in $\Sigma_0$ also holds in $A^*$. Hence $A^* \models \Sigma_0$.
[guided]
We must check finite satisfiability because this is exactly the hypothesis needed for compactness. Let $\Sigma_0 \subset \Sigma$ be a finite subset. Although $\Sigma$ contains infinitely many new constants, the finite set $\Sigma_0$ mentions only finitely many of them. Write those indices as
\begin{align*}
k_1,\dots,k_m \in \mathbb{N},
\end{align*}
where $m$ is the number of distinct new constants appearing in $\Sigma_0$.
The hypothesis says that $T$ has finite models of arbitrarily large size. Applying it with this integer $m$, choose a finite $L$-structure $A$ such that
\begin{align*}
A \models T
\quad\text{and}\quad
|A| \geq m.
\end{align*}
Because $A$ has at least $m$ elements, we may choose pairwise distinct elements
\begin{align*}
a_1,\dots,a_m \in A.
\end{align*}
We now turn $A$ into an $L^*$-structure $A^*$. For each $1 \leq r \leq m$, interpret the constant symbol $c_{k_r}$ as $a_r$:
\begin{align*}
c_{k_r}^{A^*} := a_r.
\end{align*}
Any new constant symbol not appearing in $\Sigma_0$ has no effect on whether $A^*$ satisfies $\Sigma_0$, so it may be interpreted arbitrarily as an element of $A$.
Now verify the two kinds of sentences in $\Sigma_0$. First, any sentence in $\Sigma_0 \cap T$ is an $L$-sentence, and the $L$-reduct of $A^*$ is exactly $A$. Since $A \models T$, these sentences hold in $A^*$. Second, any sentence of $\Sigma_0$ of the form $c_i \neq c_j$ involves two distinct constants among the finitely many constants already assigned to distinct elements. Therefore $A^* \models c_i \neq c_j$. Thus $A^* \models \Sigma_0$.
[/guided]
[/step]
[step:Apply compactness to obtain a model with infinitely many distinct elements]
Every finite subset of $\Sigma$ is satisfiable by the previous step. By the first-order compactness theorem (citing a result not yet in the wiki: First-Order Compactness Theorem), there exists an $L^*$-structure $M^*$ such that
\begin{align*}
M^* \models \Sigma.
\end{align*}
For all distinct $i,j \in \mathbb{N}$, the sentence $c_i \neq c_j$ belongs to $\Sigma$, so
\begin{align*}
c_i^{M^*} \neq c_j^{M^*}.
\end{align*}
Thus the underlying set of $M^*$ contains the infinite subset
\begin{align*}
\{c_k^{M^*} : k \in \mathbb{N}\}.
\end{align*}
Therefore $M^*$ is infinite.
[guided]
The previous step proved that every finite piece of $\Sigma$ has a model. The [compactness theorem for first-order logic](/theorems/4290) says that if every finite subset of a first-order theory is satisfiable, then the whole theory is satisfiable. Applying that theorem to the $L^*$-theory $\Sigma$, we obtain an $L^*$-structure $M^*$ such that
\begin{align*}
M^* \models \Sigma.
\end{align*}
Now use the special sentences that we inserted into $\Sigma$. For any distinct indices $i,j \in \mathbb{N}$, the sentence $c_i \neq c_j$ belongs to $\Sigma$. Since $M^*$ satisfies all sentences in $\Sigma$, we have
\begin{align*}
c_i^{M^*} \neq c_j^{M^*}.
\end{align*}
Hence the elements named by the constants
\begin{align*}
c_1^{M^*}, c_2^{M^*}, c_3^{M^*},\dots
\end{align*}
are pairwise distinct. Therefore the underlying set of $M^*$ contains an infinite subset, namely
\begin{align*}
\{c_k^{M^*} : k \in \mathbb{N}\}.
\end{align*}
So $M^*$ is an infinite structure.
[/guided]
[/step]
[step:Take the reduct back to the original language]
Let $M$ be the $L$-reduct of $M^*$, meaning that $M$ has the same underlying set as $M^*$ and interprets exactly the symbols of $L$ as $M^*$ does. Since every sentence of $T$ belongs to $\Sigma$ and $M^* \models \Sigma$, we have
\begin{align*}
M \models T.
\end{align*}
Because $M$ and $M^*$ have the same underlying set, $M$ is infinite. This proves the general statement.
[guided]
The structure $M^*$ is a model in the expanded language $L^*$, but the theorem asks for a model of the original $L$-theory $T$. We therefore forget the interpretations of the added constant symbols. Let $M$ be the $L$-reduct of $M^*$: it has the same underlying set as $M^*$, and every symbol of $L$ is interpreted exactly as it was in $M^*$.
Since $T \subset \Sigma$, every sentence of $T$ is true in $M^*$. These sentences mention only symbols from $L$, so their truth is unchanged when passing from $M^*$ to its $L$-reduct $M$. Hence
\begin{align*}
M \models T.
\end{align*}
The reduct operation does not change the underlying set. Since $M^*$ is infinite, $M$ is infinite. This proves that any first-order theory with finite models of arbitrarily large cardinality has an infinite model.
[/guided]
[/step]
[step:Derive the obstruction to axiomatizing exactly the finite models]
Suppose, for contradiction, that an $L$-theory $T$ has exactly all finite $L$-structures, up to isomorphism, as its models. Then $T$ has finite models of arbitrarily large cardinality. By the general statement already proved, $T$ has an infinite model. This contradicts the assumption that every model of $T$ is finite. Therefore no first-order theory has exactly the finite structures in its language, up to isomorphism, as its class of models.
[/step]