[proofplan]
View the inclusion $G \hookrightarrow \operatorname{GL}_n(\mathbb{C})$ as a complex representation of $G$ on $\mathbb{C}^n$ and apply Weyl's unitary trick to produce a $G$-invariant Hermitian inner product $\langle \cdot, \cdot \rangle$ on $\mathbb{C}^n$. Choose a basis orthonormal for $\langle \cdot, \cdot \rangle$ and let $P$ be the change-of-basis matrix from the standard basis to this basis. The dictionary identity $\langle x, y \rangle = (P^{-1}x,\, P^{-1}y)_0$, where $(\cdot, \cdot)_0$ is the standard Hermitian inner product, converts the $G$-invariance of $\langle \cdot, \cdot \rangle$ into the statement that $P^{-1}gP$ preserves $(\cdot, \cdot)_0$ for every $g \in G$, i.e., $P^{-1}GP \leq \operatorname{U}(n)$.
[/proofplan]
[step:Cast the inclusion $G \hookrightarrow \operatorname{GL}_n(\mathbb{C})$ as a representation on $\mathbb{C}^n$]
Let $G \leq \operatorname{GL}_n(\mathbb{C})$ be a finite subgroup. Define
\begin{align*}
\rho: G &\to \operatorname{GL}(\mathbb{C}^n) \\
g &\mapsto g,
\end{align*}
where the matrix $g \in \operatorname{GL}_n(\mathbb{C})$ is identified with the linear map $v \mapsto gv$ acting by matrix multiplication in the standard basis. Matrix multiplication respects products, so $\rho$ is a group homomorphism, hence a complex representation of $G$ on $\mathbb{C}^n$.
[/step]
[step:Apply Weyl's unitary trick to produce a $G$-invariant Hermitian inner product]
Since $G$ is finite, [Weyl's Unitary Trick](/theorems/2411) supplies a Hermitian inner product
\begin{align*}
\langle \cdot, \cdot \rangle: \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}
\end{align*}
that is $G$-invariant: $\langle gv, gw \rangle = \langle v, w \rangle$ for all $g \in G$ and $v, w \in \mathbb{C}^n$.
[/step]
[step:Choose an orthonormal basis $(f_i)$ for $\langle \cdot, \cdot \rangle$ and form the change-of-basis matrix $P$]
By Gram–Schmidt orthogonalisation applied to the standard basis with respect to $\langle \cdot, \cdot \rangle$, there exists a basis $f_1, \dots, f_n$ of $\mathbb{C}^n$ orthonormal for $\langle \cdot, \cdot \rangle$:
\begin{align*}
\langle f_i, f_j \rangle = \delta_{ij}, \qquad 1 \leq i, j \leq n.
\end{align*}
Let $e_1, \dots, e_n$ be the standard basis. Define $P \in \operatorname{GL}_n(\mathbb{C})$ to be the unique matrix satisfying
\begin{align*}
Pe_i = f_i, \qquad 1 \leq i \leq n.
\end{align*}
Equivalently, $P^{-1}f_i = e_i$, and the $i$-th column of $P$ is the coordinate vector of $f_i$ in the standard basis.
[/step]
[step:Establish the dictionary identity $\langle x, y \rangle = (P^{-1}x,\, P^{-1}y)_0$]
Let $(\cdot, \cdot)_0: \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}$ denote the standard Hermitian inner product, defined by sesquilinear extension of $(e_i, e_j)_0 = \delta_{ij}$.
[claim:For all $x, y \in \mathbb{C}^n$, $\langle x, y \rangle = (P^{-1}x,\, P^{-1}y)_0$]
[/claim]
[proof]
Both sides are sesquilinear forms on $\mathbb{C}^n \times \mathbb{C}^n$, so it suffices to check the identity on a basis. We use the basis $(f_k)_{k=1}^n$. For any $1 \leq i, j \leq n$,
\begin{align*}
\langle f_i, f_j \rangle = \delta_{ij}
\end{align*}
by orthonormality of $(f_k)$ for $\langle \cdot, \cdot \rangle$, and
\begin{align*}
(P^{-1}f_i,\, P^{-1}f_j)_0 = (e_i, e_j)_0 = \delta_{ij}
\end{align*}
using $P^{-1}f_i = e_i$ and orthonormality of $(e_k)$ for $(\cdot, \cdot)_0$. The two sesquilinear forms agree on the basis $(f_k)$, so they agree as forms on $\mathbb{C}^n \times \mathbb{C}^n$.
[/proof]
Substituting $x \to Px$, $y \to Py$ (valid since $P$ is bijective) yields the equivalent dictionary identity that we will use in the calculation:
\begin{align*}
\langle Px,\, Py \rangle = (x, y)_0 \qquad \text{for all } x, y \in \mathbb{C}^n. \quad (\diamond)
\end{align*}
[/step]
[step:Show that $P^{-1}gP \in \operatorname{U}(n)$ for every $g \in G$]
Let $g \in G$ and $u, u' \in \mathbb{C}^n$. Compute:
\begin{align*}
(P^{-1}gPu,\; P^{-1}gPu')_0 &= \langle P \cdot P^{-1}gPu,\; P \cdot P^{-1}gPu' \rangle && (\diamond) \text{ with } x := P^{-1}gPu,\, y := P^{-1}gPu' \\
&= \langle gPu,\; gPu' \rangle && (P \cdot P^{-1} = I_n) \\
&= \langle Pu,\, Pu' \rangle && (G\text{-invariance of } \langle \cdot, \cdot \rangle) \\
&= (u, u')_0 && (\diamond) \text{ with } x := u,\, y := u'.
\end{align*}
So $P^{-1}gP$ preserves $(\cdot, \cdot)_0$ for every $g \in G$. By the matrix characterisation of unitarity — a matrix $A \in \operatorname{GL}_n(\mathbb{C})$ preserves $(\cdot, \cdot)_0$ iff $A^* A = I_n$ iff $A \in \operatorname{U}(n)$ — we conclude $P^{-1}gP \in \operatorname{U}(n)$. Therefore
\begin{align*}
P^{-1} G P \leq \operatorname{U}(n).
\end{align*}
Setting $Q := P^{-1}$, we have $QGQ^{-1} = P^{-1} G P \leq \operatorname{U}(n)$, exhibiting $G$ as conjugate to a subgroup of $\operatorname{U}(n)$.
[guided]
The whole proof is a single idea: **a $G$-invariant inner product becomes the standard inner product after a change of basis**. The unitary group $\operatorname{U}(n)$ is by definition the matrices that preserve the standard Hermitian inner product $(\cdot, \cdot)_0$. So if our $G$-invariant inner product $\langle \cdot, \cdot \rangle$ were the standard one, every $g \in G$ would already be unitary by virtue of the $G$-invariance condition $\langle gv, gw\rangle = \langle v, w\rangle$. The change of basis $P$ is the dictionary translating between the two inner products.
**Why is the dictionary identity $\langle x, y \rangle = (P^{-1}x, P^{-1}y)_0$ true?** Expand any $x \in \mathbb{C}^n$ in the basis $(f_i)$: $x = \sum_i x_i f_i$. The coordinate vector of $x$ in the basis $(f_i)$ is $P^{-1}x$ (because $P$ takes the standard basis to $(f_i)$, so $P^{-1}$ takes vectors to their $(f_i)$-coordinates). Hence $x_i = (P^{-1}x)_i$. By orthonormality of $(f_i)$ for $\langle \cdot, \cdot \rangle$,
\begin{align*}
\langle x, y \rangle = \sum_i x_i \overline{y_i} = \sum_i (P^{-1}x)_i \overline{(P^{-1}y)_i} = (P^{-1}x,\, P^{-1}y)_0.
\end{align*}
**Why does $P^{-1}gP$ work, not $PgP^{-1}$?** Because the dictionary $(\diamond)$ has the form $\langle Px, Py\rangle = (x,y)_0$ — that is, $P$ pulls $\langle \cdot, \cdot\rangle$ back to $(\cdot, \cdot)_0$. To make a matrix $A$ preserve $(\cdot, \cdot)_0$, we must arrange that $P A$ preserves $\langle \cdot, \cdot\rangle$ (so that the dictionary translates back to $(\cdot, \cdot)_0$ on both sides). The simplest $A$ arising from $g$ that has this property is $A = P^{-1} g P$: then $P A = g P$, and $g$ preserves $\langle \cdot, \cdot\rangle$ by hypothesis, so $\langle g P x, g P y\rangle = \langle Px, Py\rangle = (x, y)_0$. Tracing back, $(Ax, Ay)_0 = \langle P A x, P A y\rangle = (x, y)_0$.
The hypothesis "$G$ finite" is consumed entirely inside Weyl's unitary trick — its averaging argument requires a finite group. The remainder of the proof is general linear algebra, valid for any finite-dimensional complex inner-product space.
[/guided]
[/step]