[proofplan]
We prove existence and uniqueness of the extension $f: F(S) \to G$. Uniqueness is the easy direction: a homomorphism is determined by its values on generators, and in $F(S)$ every reduced word is built from generators by multiplication and inversion. For existence, we define $f$ on reduced words by applying $\phi$ letter by letter (extended to inverses by $\phi(s^{-1}) := \phi(s)^{-1}$), show this is well-defined using that each element of $F(S)$ has a unique reduced representative, and verify the homomorphism property by analysing the cancellation that occurs when two reduced words are concatenated and reduced.
[/proofplan]
[step:Prove uniqueness of the extension]
Suppose $f_1, f_2: F(S) \to G$ are two homomorphisms satisfying $f_i \circ \operatorname{incl} = \phi$, i.e., $f_i(s) = \phi(s)$ for all $s \in S$. We show $f_1 = f_2$.
Since $f_1$ is a homomorphism, for every $s \in S$ we have $f_1(s^{-1}) = f_1(s)^{-1} = \phi(s)^{-1}$, and likewise $f_2(s^{-1}) = \phi(s)^{-1}$. For any reduced word $w = x_1 x_2 \cdots x_n \in F(S)$ with $x_j \in S \cup S^{-1}$, the homomorphism property gives
\begin{align*}
f_i(w) = f_i(x_1) f_i(x_2) \cdots f_i(x_n).
\end{align*}
Each factor $f_i(x_j)$ is either $\phi(s)$ (if $x_j = s \in S$) or $\phi(s)^{-1}$ (if $x_j = s^{-1}$); in particular $f_1(x_j) = f_2(x_j)$. Hence $f_1(w) = f_2(w)$. Since every element of $F(S)$ has a unique reduced representative, $f_1 = f_2$.
[/step]
[step:Define the candidate homomorphism $f$ on reduced words]
For $s \in S$, set $\phi(s^{-1}) := \phi(s)^{-1} \in G$. This extends $\phi$ to a set map $\tilde{\phi}: S \cup S^{-1} \to G$ where $S^{-1} := \{s^{-1} : s \in S\}$ is the set of formal inverses.
Every element $w \in F(S)$ has a unique reduced expression $w = x_1 x_2 \cdots x_n$ with each $x_j \in S \cup S^{-1}$ and no adjacent pair satisfying $x_j x_{j+1} \in \{s s^{-1}, s^{-1} s : s \in S\}$. (The identity element $e$ is the empty word, $n = 0$.) Define
\begin{align*}
f: F(S) &\to G \\
w = x_1 \cdots x_n &\mapsto \tilde{\phi}(x_1) \tilde{\phi}(x_2) \cdots \tilde{\phi}(x_n),
\end{align*}
with $f(e) := e_G$ (the empty product). This is well-defined because the reduced expression of $w$ is unique.
[guided]
The challenge is that $F(S)$ is a set of equivalence classes of words under the reduction relation; to define a function on $F(S)$, we need to pick a unique representative for each class. The reduced-word normal form provides this canonical representative — every $w \in F(S)$ has exactly one reduced expression.
Having fixed the representative, we can read off its value under $f$: apply $\tilde\phi$ to each letter and multiply the results in $G$. This is unambiguous because the representative is unique.
Note that we have not yet verified the homomorphism property: given two elements $u, v \in F(S)$, the reduced expression of the product $uv$ is not generally the concatenation of the reduced expressions of $u$ and $v$ — cancellation may occur. The next step handles this.
[/guided]
[/step]
[step:Verify the homomorphism property via cancellation of reduced words]
Let $u, v \in F(S)$ with reduced expressions
\begin{align*}
u = x_1 x_2 \cdots x_p, \qquad v = y_1 y_2 \cdots y_q.
\end{align*}
To reduce the concatenation $x_1 \cdots x_p y_1 \cdots y_q$ to a reduced word, we cancel matching pairs at the seam. There exists a unique integer $k$ with $0 \le k \le \min(p, q)$ such that $y_j = x_{p-j+1}^{-1}$ for $j = 1, \ldots, k$ and $y_{k+1} \ne x_{p-k}^{-1}$ (if $k < p$ and $k < q$). Writing $a_j := x_{p-j+1}$ for $j = 1, \ldots, k$, we have $y_j = a_j^{-1}$, and
\begin{align*}
u &= x_1 \cdots x_{p-k} \, a_k \cdots a_1, \\
v &= a_1^{-1} \cdots a_k^{-1} \, y_{k+1} \cdots y_q,
\end{align*}
with the product reducing to
\begin{align*}
uv = x_1 \cdots x_{p-k} \, y_{k+1} \cdots y_q
\end{align*}
as a reduced word (the right-hand side is reduced by the choice of $k$).
Applying $f$ using its definition on reduced words:
\begin{align*}
f(uv) &= \tilde{\phi}(x_1) \cdots \tilde{\phi}(x_{p-k}) \tilde{\phi}(y_{k+1}) \cdots \tilde{\phi}(y_q).
\end{align*}
On the other hand,
\begin{align*}
f(u) f(v) &= \bigl(\tilde{\phi}(x_1) \cdots \tilde{\phi}(x_{p-k}) \tilde{\phi}(a_k) \cdots \tilde{\phi}(a_1)\bigr) \bigl(\tilde{\phi}(a_1^{-1}) \cdots \tilde{\phi}(a_k^{-1}) \tilde{\phi}(y_{k+1}) \cdots \tilde{\phi}(y_q)\bigr).
\end{align*}
By the extension rule $\tilde{\phi}(s^{-1}) = \tilde{\phi}(s)^{-1}$, each product $\tilde{\phi}(a_j) \tilde{\phi}(a_j^{-1}) = \tilde{\phi}(a_j) \tilde{\phi}(a_j)^{-1} = e_G$ in $G$. Cancelling the innermost pair, then the next, and iterating $k$ times:
\begin{align*}
\tilde{\phi}(a_k) \cdots \tilde{\phi}(a_1) \tilde{\phi}(a_1^{-1}) \cdots \tilde{\phi}(a_k^{-1}) &= \tilde{\phi}(a_k) \cdots \tilde{\phi}(a_2) \bigl(\tilde{\phi}(a_1) \tilde{\phi}(a_1)^{-1}\bigr) \tilde{\phi}(a_2)^{-1} \cdots \tilde{\phi}(a_k)^{-1} \\
&= \tilde{\phi}(a_k) \cdots \tilde{\phi}(a_2) \tilde{\phi}(a_2)^{-1} \cdots \tilde{\phi}(a_k)^{-1} = \cdots = e_G.
\end{align*}
Therefore
\begin{align*}
f(u) f(v) = \tilde{\phi}(x_1) \cdots \tilde{\phi}(x_{p-k}) \cdot e_G \cdot \tilde{\phi}(y_{k+1}) \cdots \tilde{\phi}(y_q) = f(uv),
\end{align*}
showing that $f$ is a homomorphism.
[guided]
The homomorphism property would be transparent if the concatenation of two reduced words were automatically reduced — then $f(u)f(v)$ would literally be the product of the $\tilde\phi$-images of all the letters of $u$ followed by all those of $v$, which is exactly $f(uv)$. The obstruction is cancellation at the seam: the last letters of $u$ may cancel against the first letters of $v$.
The seam cancellation has a precise structure. There is a uniquely determined $k$ such that exactly the last $k$ letters of $u$ cancel against the first $k$ letters of $v$ — and because $u$ and $v$ are reduced, those letters must be mutually inverse in a specific pattern: if $a_1, \ldots, a_k$ are the last $k$ letters of $u$ read right-to-left, then the first $k$ letters of $v$ read left-to-right must be $a_1^{-1}, \ldots, a_k^{-1}$.
Why does the rule $\tilde\phi(s^{-1}) := \tilde\phi(s)^{-1}$ ensure the cancellation survives the map to $G$? Because it is precisely the condition that inverse letters in $F(S)$ map to inverse elements in $G$. The $k$ cancellations in $F(S)$ are mirrored by $k$ cancellations in $G$, so nothing is lost in translation.
This is why free groups have the universal property — the only relations imposed are those forced by the group axioms (in particular $s \cdot s^{-1} = e$), and any set map $\phi$ is compatible with these by the extension rule.
[/guided]
[/step]
[step:Verify the commutativity $f \circ \operatorname{incl} = \phi$]
For $s \in S$, the inclusion sends $s$ to the reduced word $s$ of length $1$ in $F(S)$. By definition of $f$ on reduced words, $f(s) = \tilde{\phi}(s) = \phi(s)$. Hence $f(\operatorname{incl}(s)) = f(s) = \phi(s)$, as required.
[/step]
[step:Conclude]
Combining Steps 2-4, the map $f: F(S) \to G$ defined in Step 2 is a group homomorphism satisfying $f \circ \operatorname{incl} = \phi$. By Step 1, any homomorphism with this property equals $f$. This establishes existence and uniqueness, completing the proof of the universal property.
[/step]