[proofplan]
We average an arbitrary [linear map](/page/Linear%20Map) $A:W\to V$ over the group to produce an operator $T_A:W\to V$ that intertwines $\sigma$ with $\rho$. Haar invariance gives the intertwining relation, and [Schur's lemma](/theorems/2414), proved inside the argument for completeness, determines the averaged operator: it is zero when the two irreducible representations are inequivalent, and a scalar multiple of the identity when they coincide. Finally, choosing $A$ to be a rank-one map between basis vectors extracts exactly the desired matrix-coefficient integrals.
[/proofplan]
[step:Average a linear map to produce an intertwining operator]
Let $A:W\to V$ be a complex linear map. Define the averaged linear map
\begin{align*}
T_A:W\to V
\end{align*}
by
\begin{align*}
T_Ax:=\int_G \rho(g)A\sigma(g)^{-1}x\,d\mu(g).
\end{align*}
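This integral is understood entrywise after choosing bases of the finite-dimensional vector spaces $V$ and $W$. Each entry of the integrand is a continuous function on the compact group $G$, because $\rho$ and $\sigma$ are continuous and matrix multiplication and inversion are continuous; hence each entry is integrable with respect to $\mu$.
We prove that $T_A$ is an intertwiner. Let $h\in G$. Since $\mu$ is left-invariant, $\int_G f(g)\,d\mu(g)=\int_G f(hk)\,d\mu(k)$ for every continuous function $f$ on $G$; that is, the substitution $g=hk$ preserves the measure $\mu$. For every $x\in W$,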
\begin{align*}
T_A\sigma(h)x=\int_G \rho(g)A\sigma(g)^{-1}\sigma(h)x\,d\mu(g).
\end{align*}
Using $g=hk$, we have $\rho(g)=\rho(h)\rho(k)$ and $\sigma(g)^{-1}\sigma(h)=\sigma(k)^{-1}$. Therefore
\begin{align*}
T_A\sigma(h)x=\int_G \rho(h)\rho(k)A\sigma(k)^{-1}x\,d\mu(k).
\end{align*}
Since $\rho(h)$ is a fixed linear map independent of $k$, it factors out of the finite-dimensional integral, giving
\begin{align*}
T_A\sigma(h)x=\rho(h)T_Ax.
\end{align*}
Thus
\begin{align*}
T_A\sigma(h)=\rho(h)T_A
\end{align*}
for every $h\in G$.
[guided]
The point of the average is to turn an arbitrary map $A:W\to V$ into a map that respects the two group actions. Define
\begin{align*}
T_A:W\to V
\end{align*}
by
\begin{align*}
T_Ax:=\int_G \rho(g)A\sigma(g)^{-1}x\,d\mu(g).
\end{align*}
This is a legitimate finite-dimensional vector-valued integral: after choosing any basis of $V$, each coordinate of the vector-valued function $g\mapsto \rho(g)A\sigma(g)^{-1}x$ is a continuous complex-valued function on the compact group $G$, hence is integrable with respect to the Haar probability measure $\mu$.
We now check the key property. Fix $h\in G$ and $x\in W$. Starting from the definition,
\begin{align*}
T_A\sigma(h)x=\int_G \rho(g)A\sigma(g)^{-1}\sigma(h)x\,d\mu(g).
\end{align*}
We use left-invariance of Haar measure with the substitution $g=hk$. Under this substitution the integration measure remains $d\mu(k)$. The representation identities give
\begin{align*}
\rho(hk)=\rho(h)\rho(k).
\end{align*}
They also give
\begin{align*}
\sigma(hk)^{-1}\sigma(h)=\sigma(k)^{-1}\sigma(h)^{-1}\sigma(h)=\sigma(k)^{-1}.
\end{align*}
Substituting these identities into the integral yields
\begin{align*}
T_A\sigma(h)x=\int_G \rho(h)\rho(k)A\sigma(k)^{-1}x\,d\mu(k).
\end{align*}
Because $\rho(h):V\to V$ is fixed while $k$ varies, linearity of finite-dimensional integration gives
\begin{align*}
T_A\sigma(h)x=\rho(h)\int_G \rho(k)A\sigma(k)^{-1}x\,d\mu(k).
\end{align*}
The integral on the right is exactly $T_Ax$, so
\begin{align*}
T_A\sigma(h)x=\rho(h)T_Ax.
\end{align*}
Since this holds for every $x\in W$, we have proved
\begin{align*}
T_A\sigma(h)=\rho(h)T_A.
\end{align*}
Thus $T_A$ is an intertwiner from $(\sigma,W)$ to $(\rho,V)$.
[/guided]
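The intertwining relation can be checked numerically in a small case. The following is a minimal sketch, assuming a finite group in place of a general compact group, so that the Haar probability measure is the normalized counting measure. The choice of the dihedral group $D_3$, of its 2-dimensional and permutation representations, and the names `d3_pairs`, `average`, and `max_intertwining_defect` are illustrative assumptions, not part of the proof. Irreducibility is not needed at this stage, so $\sigma$ is taken reducible on purpose.

```python
import numpy as np

def d3_pairs():
    """Return (rho(g), sigma(g)) for the six elements g = r^k f^e of D_3.

    rho is the 2-dimensional rotation/reflection representation and sigma is
    the 3-dimensional permutation representation (both chosen for illustration).
    """
    c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
    R = np.array([[c, -s], [s, c]])            # rho(r): rotation by 120 degrees
    F = np.array([[1.0, 0.0], [0.0, -1.0]])    # rho(f): reflection
    P_r = np.array([[0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])          # sigma(r): e_i -> e_{i+1 mod 3}
    P_f = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])          # sigma(f): swaps e_1 and e_2
    mp = np.linalg.matrix_power
    return [(mp(R, k) @ mp(F, e), mp(P_r, k) @ mp(P_f, e))
            for k in range(3) for e in range(2)]

def average(A, pairs):
    """T_A = (1/|G|) * sum_g rho(g) A sigma(g)^{-1}."""
    return sum(rho @ A @ np.linalg.inv(sigma) for rho, sigma in pairs) / len(pairs)

def max_intertwining_defect(A, pairs):
    """Largest entry of T_A sigma(h) - rho(h) T_A over all h in the group."""
    T = average(A, pairs)
    return max(np.abs(T @ sigma - rho @ T).max() for rho, sigma in pairs)

A = np.arange(6, dtype=float).reshape(2, 3)    # an arbitrary map W = C^3 -> V = C^2
print(max_intertwining_defect(A, d3_pairs()))  # zero up to rounding error
```

The averaged operator is nonzero in this example because the permutation representation contains a copy of the 2-dimensional representation.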
[/step]
[step:Apply Schur's lemma to determine the averaged operator]
[claim:Schur's lemma for the intertwiners appearing here]
Let $(\rho,V)$ and $(\sigma,W)$ be non-zero finite-dimensional irreducible complex representations of $G$. For a complex linear map $S:W\to V$, write $\ker S:=\{x\in W:Sx=0\}$ for its kernel and $\operatorname{im}S:=\{Sx:x\in W\}$ for its image. When $V=W$, write $I_V:V\to V$ for the identity map on $V$. If $S:W\to V$ is a complex linear map satisfying
\begin{align*}
S\sigma(g)=\rho(g)S
\end{align*}
for every $g\in G$, then either $S=0$ or $S$ is an isomorphism of representations. In particular, if $(\rho,V)$ and $(\sigma,W)$ are not isomorphic, then $S=0$. If $V=W$ and $\rho=\sigma$, then every such $S:V\to V$ is of the form $S=\lambda I_V$ for some $\lambda\in\mathbb C$.
[/claim]
[proof]
The kernel $\ker S\subset W$ is invariant under $\sigma$. Indeed, if $x\in\ker S$ and $g\in G$, then
\begin{align*}
S\sigma(g)x=\rho(g)Sx=0.
\end{align*}
Since $W$ is irreducible, $\ker S$ is either $\{0\}$ or $W$.
The image $\operatorname{im}S\subset V$ is invariant under $\rho$. Indeed, if $y=Sx\in\operatorname{im}S$ and $g\in G$, then
\begin{align*}
\rho(g)y=\rho(g)Sx=S\sigma(g)x\in\operatorname{im}S.
\end{align*}
Since $V$ is irreducible, $\operatorname{im}S$ is either $\{0\}$ or $V$.
If $S\neq 0$, then $\ker S\neq W$ and $\operatorname{im}S\neq\{0\}$. Hence $\ker S=\{0\}$ and $\operatorname{im}S=V$, so $S$ is a linear isomorphism. The intertwining identity then says precisely that $S$ is an isomorphism of representations.
Now assume $V=W$ and $\rho=\sigma$. Since $V$ is a non-zero finite-dimensional complex [vector space](/page/Vector%20Space), $S$ has an eigenvalue $\lambda\in\mathbb C$. The map $S-\lambda I_V$ is again an intertwiner. Its kernel is non-zero, so irreducibility forces $\ker(S-\lambda I_V)=V$. Therefore $S=\lambda I_V$.
[/proof]
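The scalar conclusion of the claim can be illustrated by computing the space of self-intertwiners directly. The following is a minimal sketch, assuming the finite group $D_3$ with its 2-dimensional representation; the helper name `commutant_dimension` is hypothetical. It solves the linear system $SM=MS$ by a rank computation. For the irreducible representation the solution space should be one-dimensional (the scalars), while restricting to the rotation subgroup, which is reducible over $\mathbb C$, should give a larger space.

```python
import numpy as np

def commutant_dimension(mats):
    """Dimension of {S : S M = M S for all M in mats}, via a rank computation."""
    n = mats[0].shape[0]
    columns = []
    for k in range(n * n):
        E = np.zeros(n * n)
        E[k] = 1.0
        E = E.reshape(n, n)                       # k-th basis matrix of End(V)
        columns.append(np.concatenate([(E @ M - M @ E).ravel() for M in mats]))
    system = np.stack(columns, axis=1)            # linear map S -> (S M - M S)_M
    return n * n - np.linalg.matrix_rank(system)

c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
R = np.array([[c, -s], [s, c]])                   # rotation by 120 degrees
F = np.array([[1.0, 0.0], [0.0, -1.0]])           # reflection
d3 = [np.linalg.matrix_power(R, k) @ np.linalg.matrix_power(F, e)
      for k in range(3) for e in range(2)]        # irreducible representation of D_3
c3 = [np.linalg.matrix_power(R, k) for k in range(3)]  # rotations only

print(commutant_dimension(d3))  # 1: only scalar multiples of the identity
print(commutant_dimension(c3))  # 2: reducible over C, so non-scalar intertwiners exist
```

The matrices are real, and the rank of a real matrix is the same over $\mathbb R$ and over $\mathbb C$, so the computed dimension is also the complex dimension of the commutant.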
Applying the claim to $S=T_A$, the intertwining relation from the previous step gives
\begin{align*}
T_A=0
\end{align*}
when $(\rho,V)$ and $(\sigma,W)$ are not isomorphic. In the identical case $V=W$ and $\rho=\sigma$, there exists $\lambda_A\in\mathbb C$ such that
\begin{align*}
T_A=\lambda_A I_V.
\end{align*}
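Both outcomes can be observed numerically. The sketch below is only an illustration, assuming the finite group $D_3$ with normalized counting measure as Haar measure; the 2-dimensional and sign representations and the names `d3_reps` and `average` are assumptions made for the example.

```python
import numpy as np

def d3_reps():
    """Return (two_dim, sign): lists of matrices indexed by g = r^k f^e in D_3."""
    c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
    R = np.array([[c, -s], [s, c]])            # rotation by 120 degrees
    F = np.array([[1.0, 0.0], [0.0, -1.0]])    # reflection
    mp = np.linalg.matrix_power
    two_dim = [mp(R, k) @ mp(F, e) for k in range(3) for e in range(2)]
    # The sign representation is the determinant: +1 on rotations, -1 on reflections.
    sign = [np.array([[(-1.0) ** e]]) for k in range(3) for e in range(2)]
    return two_dim, sign

def average(A, rho, sigma):
    """T_A = (1/|G|) * sum_g rho(g) A sigma(g)^{-1}."""
    return sum(r @ A @ np.linalg.inv(s) for r, s in zip(rho, sigma)) / len(rho)

two_dim, sign = d3_reps()

# Inequivalent irreducibles: W = C (sign), V = C^2, so T_A should vanish.
T_cross = average(np.array([[1.0], [2.0]]), two_dim, sign)

# Identical irreducibles: T_A should be a scalar multiple of the identity.
T_same = average(np.array([[1.0, 2.0], [3.0, 4.0]]), two_dim, two_dim)

print(np.allclose(T_cross, 0.0))
print(np.allclose(T_same, T_same[0, 0] * np.eye(2)))
```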
[/step]
[step:Compute the scalar in the identical representation case by taking traces]
Assume in this step that $V=W$ and $\rho=\sigma$. For an endomorphism $B:V\to V$, write $\operatorname{tr}(B)$ for its trace. Since $T_A=\lambda_A I_V$, taking traces gives
\begin{align*}
\operatorname{tr}(T_A)=\lambda_A\dim V.
\end{align*}
Since the trace is linear and the integral is taken entrywise, the trace commutes with the integral, so
\begin{align*}
\operatorname{tr}(T_A)=\int_G \operatorname{tr}(\rho(g)A\rho(g)^{-1})\,d\mu(g).
\end{align*}
The cyclic property of trace gives
\begin{align*}
\operatorname{tr}(\rho(g)A\rho(g)^{-1})=\operatorname{tr}(A)
\end{align*}
for every $g\in G$. Since $\mu(G)=1$,
\begin{align*}
\operatorname{tr}(T_A)=\operatorname{tr}(A).
\end{align*}
Therefore
\begin{align*}
\lambda_A=\frac{\operatorname{tr}(A)}{\dim V}.
\end{align*}
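The formula $\lambda_A=\operatorname{tr}(A)/\dim V$ can be tested on a genuinely complex example. The following is a minimal sketch, assuming the quaternion group $Q_8$ in its 2-dimensional unitary representation, again with normalized counting measure; the names `q8_matrices` and `conjugation_average` are hypothetical.

```python
import numpy as np

def q8_matrices():
    """The eight matrices of the 2-dimensional unitary representation of Q8."""
    qi = np.array([[1j, 0], [0, -1j]])
    qj = np.array([[0, 1], [-1, 0]], dtype=complex)
    qk = qi @ qj
    return [sgn * M for M in (np.eye(2, dtype=complex), qi, qj, qk)
            for sgn in (1, -1)]

def conjugation_average(A, mats):
    """T_A = (1/|G|) * sum_g rho(g) A rho(g)^{-1}."""
    return sum(M @ A @ np.linalg.inv(M) for M in mats) / len(mats)

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # arbitrary complex map

T = conjugation_average(A, q8_matrices())
lam = np.trace(A) / 2                     # predicted scalar tr(A) / dim V

print(np.isclose(np.trace(T), np.trace(A)))   # averaging preserves the trace
print(np.allclose(T, lam * np.eye(2)))        # T_A = (tr(A) / dim V) * I
```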
[/step]
[step:Extract the matrix coefficient integral from a rank-one map]
Fix indices $1\leq j\leq m$ and $1\leq b\leq n$. Define the rank-one linear map
\begin{align*}
A_{jb}:W\to V
\end{align*}
by
\begin{align*}
A_{jb}x=(x,w_b)_Wv_j.
\end{align*}
Equivalently, $A_{jb}w_b=v_j$ and $A_{jb}w_c=0$ for $c\neq b$.
For $1\leq a\leq n$ and $1\leq i\leq m$, compute the $(i,a)$ matrix coefficient of $T_{A_{jb}}$:
\begin{align*}
(T_{A_{jb}}w_a,v_i)_V=\int_G (\rho(g)A_{jb}\sigma(g)^{-1}w_a,v_i)_V\,d\mu(g).
\end{align*}
For a linear map $B:W\to W$, write $B^*:W\to W$ for the adjoint with respect to $(\cdot,\cdot)_W$. Since $\sigma$ is unitary, $\sigma(g)^{-1}=\sigma(g)^*$. With the convention that the [inner product](/page/Inner%20Product) is linear in the first argument,
\begin{align*}
(\sigma(g)^{-1}w_a,w_b)_W=(w_a,\sigma(g)w_b)_W=\overline{(\sigma(g)w_b,w_a)_W}=\overline{\sigma_{ab}(g)}.
\end{align*}
Thus, by the definition of $A_{jb}$,
\begin{align*}
A_{jb}\sigma(g)^{-1}w_a=\overline{\sigma_{ab}(g)}v_j.
\end{align*}
Therefore, since $\rho_{ij}(g)=(\rho(g)v_j,v_i)_V$,
\begin{align*}
(\rho(g)A_{jb}\sigma(g)^{-1}w_a,v_i)_V=\rho_{ij}(g)\overline{\sigma_{ab}(g)}.
\end{align*}
Hence
\begin{align*}
(T_{A_{jb}}w_a,v_i)_V=\int_G \rho_{ij}(g)\overline{\sigma_{ab}(g)}\,d\mu(g).
\end{align*}
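The identity just derived does not use irreducibility, only unitarity of $\sigma$, and it can be checked entry by entry. The sketch below is an illustration under stated assumptions: the group is the finite group $Q_8$ with its 2-dimensional unitary representation used for both $\rho$ and $\sigma$, the bases are the standard bases (so $A_{jb}$ is the matrix unit with a single $1$ in position $(j,b)$), indices are 0-based, and the function names are hypothetical.

```python
import numpy as np

def q8_matrices():
    """The eight matrices of the 2-dimensional unitary representation of Q8."""
    qi = np.array([[1j, 0], [0, -1j]])
    qj = np.array([[0, 1], [-1, 0]], dtype=complex)
    qk = qi @ qj
    return [sgn * M for M in (np.eye(2, dtype=complex), qi, qj, qk)
            for sgn in (1, -1)]

def averaged_entry(i, j, a, b, rho, sigma):
    """(T_{A_jb} w_a, v_i) in standard bases, with A_jb the matrix unit E_{jb}."""
    A = np.zeros((rho[0].shape[0], sigma[0].shape[0]), dtype=complex)
    A[j, b] = 1.0
    # sigma(g)^{-1} equals the conjugate transpose because sigma is unitary.
    T = sum(r @ A @ s.conj().T for r, s in zip(rho, sigma)) / len(rho)
    return T[i, a]

def coefficient_integral(i, j, a, b, rho, sigma):
    """Group average of rho_ij(g) * conj(sigma_ab(g))."""
    return np.mean([r[i, j] * np.conj(s[a, b]) for r, s in zip(rho, sigma)])

mats = q8_matrices()
worst = max(abs(averaged_entry(i, j, a, b, mats, mats)
                - coefficient_integral(i, j, a, b, mats, mats))
            for i in range(2) for j in range(2)
            for a in range(2) for b in range(2))
print(worst)  # zero up to rounding error
```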
If $(\rho,V)$ and $(\sigma,W)$ are not isomorphic, the previous Schur-lemma step gives $T_{A_{jb}}=0$, so the displayed integral is $0$.
Now assume $(\sigma,W)=(\rho,V)$ and the same [orthonormal basis](/page/Orthonormal%20Basis) is used. Then $m=n$, $w_a=v_a$, and $A_{jb}:V\to V$ satisfies $A_{jb}v_b=v_j$ and $A_{jb}v_c=0$ for $c\neq b$. Its trace is
\begin{align*}
\operatorname{tr}(A_{jb})=\delta_{jb}.
\end{align*}
The trace computation gives
\begin{align*}
T_{A_{jb}}=\frac{\delta_{jb}}{\dim V}I_V.
\end{align*}
Taking the $(i,a)$ matrix coefficient gives
\begin{align*}
(T_{A_{jb}}v_a,v_i)_V=\frac{\delta_{jb}}{\dim V}(v_a,v_i)_V=\frac{1}{\dim V}\delta_{ia}\delta_{jb}.
\end{align*}
Combining this with the already computed identity for $(T_{A_{jb}}w_a,v_i)_V$ proves
\begin{align*}
\int_G \rho_{ij}(g)\overline{\rho_{ab}(g)}\,d\mu(g)=\frac{1}{\dim V}\delta_{ia}\delta_{jb}.
\end{align*}
The two asserted orthogonality formulas follow.
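Both orthogonality formulas can be verified at once on a small example. The following is a minimal sketch, assuming the finite group $D_3$ with normalized counting measure, its 2-dimensional representation in the standard orthonormal basis, and its sign representation; the names `d3_reps` and `coefficient_integrals` are illustrative, and indices are 0-based.

```python
import numpy as np

def d3_reps():
    """Return arrays (two_dim, sign) of shape (6, d, d), indexed by g = r^k f^e."""
    c, s = np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)
    R = np.array([[c, -s], [s, c]])            # rotation by 120 degrees
    F = np.array([[1.0, 0.0], [0.0, -1.0]])    # reflection
    mp = np.linalg.matrix_power
    two_dim = [mp(R, k) @ mp(F, e) for k in range(3) for e in range(2)]
    sign = [np.array([[(-1.0) ** e]]) for k in range(3) for e in range(2)]
    return np.stack(two_dim), np.stack(sign)

def coefficient_integrals(rho, sigma):
    """Tensor G[i, j, a, b] = (1/|G|) * sum_g rho_ij(g) * conj(sigma_ab(g))."""
    return np.einsum("gij,gab->ijab", rho, np.conj(sigma)) / rho.shape[0]

two_dim, sign = d3_reps()
same = coefficient_integrals(two_dim, two_dim)    # identical irreducibles
cross = coefficient_integrals(two_dim, sign)      # inequivalent irreducibles

# Predicted value in the identical case: delta_{ia} * delta_{jb} / dim V.
expected = np.einsum("ia,jb->ijab", np.eye(2), np.eye(2)) / 2

print(np.allclose(same, expected))
print(np.allclose(cross, 0.0))
```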
[/step]