[proofplan]
We add constant symbols naming every element of $M$, and one additional constant symbol $c$ intended to name a new element. We consider the elementary diagram of $M$ together with the sentences $c \neq \bar{m}$ for all $m \in M$. Every finite part of this theory is satisfiable inside $M$, because only finitely many inequalities are mentioned and $M$ is infinite. Compactness then gives a model of the full theory, and the elementary diagram ensures that the named copy of $M$ embeds elementarily into its reduct.
[/proofplan]
[step:Expand the language by constants for $M$ and by one new constant]
Let $\mathcal{L}_M$ denote the first-order language obtained from $\mathcal{L}$ by adjoining, for each $m \in M$, a new constant symbol $\bar{m}$. Let $M_M$ be the $\mathcal{L}_M$-structure whose $\mathcal{L}$-reduct is $M$ and in which each constant symbol $\bar{m}$ is interpreted as the element $m \in M$.
Let $c$ be a new constant symbol not belonging to $\mathcal{L}_M$, and define
\begin{align*}
\mathcal{L}_{M,c} := \mathcal{L}_M \cup \{c\}.
\end{align*}
Define the elementary diagram of $M$ in the language $\mathcal{L}_M$ by
\begin{align*}
\operatorname{Diag}_{\mathrm{el}}(M)
:=
\{\sigma : \sigma \text{ is an } \mathcal{L}_M\text{-sentence and } M_M \models \sigma\}.
\end{align*}
Now define the $\mathcal{L}_{M,c}$-theory
\begin{align*}
\Gamma
:=
\operatorname{Diag}_{\mathrm{el}}(M)
\cup
\{c \neq \bar{m} : m \in M\}.
\end{align*}
[guided]
The purpose of the language expansion is to make every element of $M$ visible to first-order syntax. The symbol $\bar{m}$ is a formal constant symbol, and the structure $M_M$ interprets it as the actual element $m \in M$.
The elementary diagram records every first-order sentence in this expanded language that is true in $M_M$. This is stronger than the atomic diagram: it contains not only equations, relations, and function values, but every first-order statement with parameters from $M$.
We then add one more constant symbol $c$. The theory $\Gamma$ says two things simultaneously: first, the named copy of $M$ satisfies exactly the same first-order facts with parameters from $M$ as it did originally; second, the interpretation of $c$ is distinct from every named element $\bar{m}$.
[/guided]
[/step]
[step:Show every finite fragment of $\Gamma$ is satisfiable]
Let $\Delta \subset \Gamma$ be finite. Since $\Delta$ is finite, there are finitely many elements $m_1,\dots,m_k \in M$ such that the only inequalities of the form $c \neq \bar{m}$ appearing in $\Delta$ are among
\begin{align*}
c \neq \bar{m}_1,\dots,c \neq \bar{m}_k.
\end{align*}
Because $M$ is infinite, there exists an element $b \in M$ such that
\begin{align*}
b \notin \{m_1,\dots,m_k\}.
\end{align*}
Expand $M_M$ to an $\mathcal{L}_{M,c}$-structure $M_{M,c}^{b}$ by interpreting the constant symbol $c$ as $b$. Every sentence in $\Delta \cap \operatorname{Diag}_{\mathrm{el}}(M)$ is true in $M_{M,c}^{b}$, because it is an $\mathcal{L}_M$-sentence true in $M_M$ and does not mention $c$. Also, for each $i \in \{1,\dots,k\}$,
\begin{align*}
M_{M,c}^{b} \models c \neq \bar{m}_i
\end{align*}
because $c^{M_{M,c}^{b}} = b$ and $\bar{m}_i^{M_{M,c}^{b}} = m_i$, with $b \neq m_i$.
Therefore every finite subset $\Delta \subset \Gamma$ is satisfiable.
[guided]
The finite satisfiability argument is the whole reason infinitude of $M$ is needed. A finite fragment of $\Gamma$ can only demand that $c$ be different from finitely many named elements of $M$. It cannot yet demand that $c$ avoid all of $M$.
Let $\Delta \subset \Gamma$ be finite. The part of $\Delta$ coming from the elementary diagram is already true in $M_M$. The only possible obstruction is the finite list of inequalities involving $c$. Since $M$ is infinite, we can choose one element $b \in M$ different from all the finitely many elements named in those inequalities.
We then interpret $c$ as this element $b$. This does not affect any sentence from the elementary diagram, because those sentences are written in the smaller language $\mathcal{L}_M$ and do not mention $c$. It also makes every inequality in the finite list true. Hence the finite fragment $\Delta$ has a model.
[/guided]
[/step]
[step:Apply compactness to obtain a model with a genuinely new named element]
By the [Compactness Theorem for first-order logic](/theorems/4290), since every finite subset of $\Gamma$ is satisfiable, the full theory $\Gamma$ has a model. Here we are citing a result not yet resolved in the wiki: [Compactness Theorem](/theorems/2748) for First-Order Logic.
Let $A$ be an $\mathcal{L}_{M,c}$-structure such that
\begin{align*}
A \models \Gamma.
\end{align*}
Let $N$ denote the $\mathcal{L}$-reduct of $A$. Define a map
\begin{align*}
j: M &\to N \\
m &\mapsto \bar{m}^{A}.
\end{align*}
Also define
\begin{align*}
a := c^{A} \in N.
\end{align*}
For every $m \in M$, the sentence $c \neq \bar{m}$ belongs to $\Gamma$, so $A \models c \neq \bar{m}$. Therefore
\begin{align*}
a \neq j(m)
\end{align*}
for every $m \in M$, and hence $a \notin j[M]$.
[guided]
Compactness lets us pass from finite satisfiability to simultaneous satisfiability of all the requirements in $\Gamma$. The full theory contains one inequality $c \neq \bar{m}$ for every $m \in M$, so in a model of the full theory the interpretation of $c$ is different from every named element.
Let $A$ be such a model. We now forget the extra symbols and keep only the original $\mathcal{L}$-structure; this reduct is called $N$. The constants $\bar{m}$ still live in the expanded structure $A$, and each one names an element of the underlying set of $N$. This gives a map
\begin{align*}
j: M &\to N \\
m &\mapsto \bar{m}^{A}.
\end{align*}
The new element is the interpretation of $c$, namely $a := c^A$. Since $A$ satisfies every sentence $c \neq \bar{m}$, the element $a$ is distinct from every element $j(m)$ in the named copy of $M$.
[/guided]
[/step]
[step:Use the elementary diagram to prove that the named copy of $M$ is elementary]
We prove that $j: M \to N$ is an elementary embedding. First, $j$ is injective. If $m,n \in M$ and $m \neq n$, then the $\mathcal{L}_M$-sentence $\bar{m} \neq \bar{n}$ is true in $M_M$, so it belongs to $\operatorname{Diag}_{\mathrm{el}}(M) \subset \Gamma$. Since $A \models \Gamma$,
\begin{align*}
\bar{m}^{A} \neq \bar{n}^{A},
\end{align*}
and hence $j(m) \neq j(n)$.
Now let $\varphi(x_1,\dots,x_r)$ be an $\mathcal{L}$-formula, and let $m_1,\dots,m_r \in M$. If
\begin{align*}
M \models \varphi(m_1,\dots,m_r),
\end{align*}
then the $\mathcal{L}_M$-sentence $\varphi(\bar{m}_1,\dots,\bar{m}_r)$ belongs to $\operatorname{Diag}_{\mathrm{el}}(M)$. Thus
\begin{align*}
A \models \varphi(\bar{m}_1,\dots,\bar{m}_r),
\end{align*}
and therefore
\begin{align*}
N \models \varphi(j(m_1),\dots,j(m_r)).
\end{align*}
Conversely, if
\begin{align*}
M \not\models \varphi(m_1,\dots,m_r),
\end{align*}
then
\begin{align*}
M \models \neg \varphi(m_1,\dots,m_r),
\end{align*}
so the sentence $\neg \varphi(\bar{m}_1,\dots,\bar{m}_r)$ lies in $\operatorname{Diag}_{\mathrm{el}}(M)$. Since $A \models \Gamma$,
\begin{align*}
N \models \neg \varphi(j(m_1),\dots,j(m_r)).
\end{align*}
Hence, for every $\mathcal{L}$-formula $\varphi(x_1,\dots,x_r)$ and every tuple $(m_1,\dots,m_r) \in M^r$,
\begin{align*}
M \models \varphi(m_1,\dots,m_r)
\iff
N \models \varphi(j(m_1),\dots,j(m_r)).
\end{align*}
Thus $j: M \to N$ is an elementary embedding.
[guided]
The elementary diagram was chosen precisely so that truth with parameters from $M$ is preserved in the model $A$. We first check injectivity. If $m \neq n$ in $M$, then the sentence $\bar{m} \neq \bar{n}$ is true in the expanded structure $M_M$. Since the elementary diagram is included in $\Gamma$, the same sentence is true in $A$. Therefore the elements $\bar{m}^{A}$ and $\bar{n}^{A}$ are distinct, so $j$ is injective.
Now we verify elementarity. Let $\varphi(x_1,\dots,x_r)$ be any formula of the original language $\mathcal{L}$, and let $m_1,\dots,m_r \in M$. The statement that $\varphi$ holds of this tuple in $M$ can be converted into an $\mathcal{L}_M$-sentence by replacing each parameter $m_i$ with the corresponding constant symbol $\bar{m}_i$.
If $M \models \varphi(m_1,\dots,m_r)$, then $\varphi(\bar{m}_1,\dots,\bar{m}_r)$ is in the elementary diagram, so it is true in $A$. Since $N$ is the $\mathcal{L}$-reduct of $A$, this means
\begin{align*}
N \models \varphi(j(m_1),\dots,j(m_r)).
\end{align*}
If $M$ does not satisfy $\varphi(m_1,\dots,m_r)$, then it satisfies the negation. The same diagram argument gives
\begin{align*}
N \models \neg \varphi(j(m_1),\dots,j(m_r)).
\end{align*}
Thus $M$ and $N$ agree on every first-order $\mathcal{L}$-formula evaluated at corresponding tuples. This is exactly the definition of an elementary embedding.
[/guided]
[/step]
[step:Identify $M$ with its elementary image and conclude properness]
Since $j: M \to N$ is an elementary embedding, we may replace $N$ by an isomorphic copy in which $j(m)$ is identified with $m$ for every $m \in M$. Under this identification,
\begin{align*}
M \preceq N.
\end{align*}
The element $a = c^{A}$ satisfies $a \notin j[M]$, so after the identification it satisfies
\begin{align*}
a \in N \setminus M.
\end{align*}
Therefore $N$ is a proper elementary extension of $M$ containing a new element. Finally, if a first-order theory $T$ has an infinite model $M$, applying the result to $M$ gives a proper elementary extension $N \succeq M$. Since elementary extensions satisfy the same sentences in the original language, $N \models T$. Hence every first-order theory with an infinite model has an elementary extension with a new element.
[/step]