[proofplan]
We prove a Cauchy-Schwarz linear-forms estimate: if one distinguished function is evaluated at $x$ and all other bounded functions are evaluated at $x+c_i r$ with each $c_i$ invertible in $G$, then the whole average is bounded by the $U^d(G)$ norm of the distinguished function. The proof of this estimate is an induction on $d$, where each Cauchy-Schwarz step removes one bounded factor and creates one additional Gowers difference of the distinguished function. We then translate the arithmetic progression so that any chosen $f_j$ becomes the distinguished function; since all differences $i-j$ are invertible in $G$, the estimate applies for every $j$, giving the minimum.
[/proofplan]
[step:Prove the Cauchy-Schwarz estimate for one distinguished function]
Let $d\ge 1$ be an integer. Let $c_1,\dots,c_d\in G$ be elements such that multiplication by each $c_i$ is a bijection of $G$. Let
\begin{align*}
F:G&\to\mathbb C,\\
x&\mapsto F(x)
\end{align*}
be any function, and for each $i\in\{1,\dots,d\}$ let
\begin{align*}
g_i:G&\to\mathbb C,\\
x&\mapsto g_i(x)
\end{align*}
be a function satisfying $|g_i(x)|\le 1$ for every $x\in G$. We claim that
\begin{align*}
\left|
\mathbb E_{x,r\in G}
F(x)\prod_{i=1}^d g_i(x+c_i r)
\right|
\le
\|F\|_{U^d(G)}.
\end{align*}
We prove the claim by induction on $d$. For $d=1$, since multiplication by $c_1$ is bijective, the map
\begin{align*}
G\times G&\to G\times G,\\
(x,r)&\mapsto (x,x+c_1r)
\end{align*}
is a bijection. Hence
\begin{align*}
\mathbb E_{x,r\in G}F(x)g_1(x+c_1r)
=
\left(\mathbb E_{x\in G}F(x)\right)
\left(\mathbb E_{y\in G}g_1(y)\right).
\end{align*}
Because $|g_1|\le 1$, the second factor has absolute value at most $1$, and therefore
\begin{align*}
\left|
\mathbb E_{x,r\in G}F(x)g_1(x+c_1r)
\right|
\le
\left|\mathbb E_{x\in G}F(x)\right|
=
\|F\|_{U^1(G)}.
\end{align*}
Assume the estimate holds for $d-1$. Define
\begin{align*}
A
:=
\mathbb E_{x,r\in G}
F(x)\prod_{i=1}^d g_i(x+c_i r).
\end{align*}
Make the bijective change of variables $u=x+c_d r$, so that $x=u-c_d r$. Then
\begin{align*}
A
=
\mathbb E_{u,r\in G}
g_d(u)F(u-c_d r)
\prod_{i=1}^{d-1}g_i(u+(c_i-c_d)r).
\end{align*}
Applying Cauchy-Schwarz in the $u$-variable and using $|g_d(u)|\le 1$ gives
\begin{align*}
|A|^2
\le
\mathbb E_{u\in G}
\left|
\mathbb E_{r\in G}
F(u-c_d r)
\prod_{i=1}^{d-1}g_i(u+(c_i-c_d)r)
\right|^2.
\end{align*}
Expanding the square and setting $h=s-r$, $y=u-c_d r$, we obtain
\begin{align*}
|A|^2
\le
\mathbb E_{h\in G}
\left|
\mathbb E_{y,r\in G}
F(y)\overline{F(y-c_dh)}
\prod_{i=1}^{d-1}
g_i(y+c_i r)
\overline{g_i(y+c_i r+(c_i-c_d)h)}
\right|.
\end{align*}
For each fixed $h\in G$, define
\begin{align*}
F_h:G&\to\mathbb C,\\
y&\mapsto F(y)\overline{F(y-c_dh)},
\end{align*}
and, for $i\in\{1,\dots,d-1\}$, define
\begin{align*}
G_{i,h}:G&\to\mathbb C,\\
z&\mapsto g_i(z)\overline{g_i(z+(c_i-c_d)h)}.
\end{align*}
Since $|g_i|\le 1$, each $G_{i,h}$ is bounded in absolute value by $1$. The induction hypothesis applied to $F_h$ and the coefficients $c_1,\dots,c_{d-1}$ gives
\begin{align*}
|A|^2
\le
\mathbb E_{h\in G}\|F_h\|_{U^{d-1}(G)}.
\end{align*}
By [Jensen's inequality](/theorems/1977) for the convex function $t\mapsto t^{2^{d-1}}$ on $[0,\infty)$,
\begin{align*}
|A|^{2^d}
\le
\mathbb E_{h\in G}\|F_h\|_{U^{d-1}(G)}^{2^{d-1}}.
\end{align*}
Expanding the $U^{d-1}(G)$ norm gives
\begin{align*}
|A|^{2^d}
\le
\mathbb E_{h,y,h_1,\dots,h_{d-1}\in G}
\prod_{\omega\in\{0,1\}^{d-1}}
\mathcal C^{|\omega|}F(y+\omega_1h_1+\cdots+\omega_{d-1}h_{d-1})
\\
\cdot
\prod_{\omega\in\{0,1\}^{d-1}}
\mathcal C^{|\omega|+1}F(y+\omega_1h_1+\cdots+\omega_{d-1}h_{d-1}-c_dh).
\end{align*}
Since multiplication by $c_d$ is a bijection of $G$, the substitution $h_d=-c_dh$ preserves normalized counting measure on $G$. The last average is exactly
\begin{align*}
\mathbb E_{y,h_1,\dots,h_d\in G}
\prod_{\omega\in\{0,1\}^{d}}
\mathcal C^{|\omega|}
F(y+\omega_1h_1+\cdots+\omega_dh_d)
=
\|F\|_{U^d(G)}^{2^d}.
\end{align*}
Taking $2^d$-th roots proves the claim.
[guided]
The estimate is the engine of the proof. It says that a multilinear average is controlled by the Gowers norm of one chosen function, provided all other functions are bounded by $1$ and the linear coefficients attached to them are invertible in $G$.
For $d=1$, the average is
\begin{align*}
\mathbb E_{x,r\in G}F(x)g_1(x+c_1r).
\end{align*}
Because multiplication by $c_1$ is bijective, the pair $(x,x+c_1r)$ runs uniformly over $G\times G$ as $(x,r)$ runs uniformly over $G\times G$. Thus the average factors:
\begin{align*}
\mathbb E_{x,r\in G}F(x)g_1(x+c_1r)
=
\left(\mathbb E_{x\in G}F(x)\right)
\left(\mathbb E_{y\in G}g_1(y)\right).
\end{align*}
The boundedness $|g_1|\le 1$ gives $|\mathbb E_y g_1(y)|\le 1$, so this is bounded by $|\mathbb E_xF(x)|=\|F\|_{U^1(G)}$.
Now assume the result known for $d-1$. We want to remove the last bounded factor $g_d$. Define
\begin{align*}
A
:=
\mathbb E_{x,r\in G}
F(x)\prod_{i=1}^d g_i(x+c_i r).
\end{align*}
Set $u=x+c_dr$. This is a bijective change of variables from $(x,r)$ to $(u,r)$, with inverse $x=u-c_dr$. Therefore
\begin{align*}
A
=
\mathbb E_{u,r\in G}
g_d(u)F(u-c_d r)
\prod_{i=1}^{d-1}g_i(u+(c_i-c_d)r).
\end{align*}
We apply Cauchy-Schwarz in the $u$-variable. The role of this step is to use $|g_d|\le 1$ and make $g_d$ disappear:
\begin{align*}
|A|^2
\le
\mathbb E_{u\in G}
\left|
\mathbb E_{r\in G}
F(u-c_d r)
\prod_{i=1}^{d-1}g_i(u+(c_i-c_d)r)
\right|^2.
\end{align*}
Expanding the square introduces a second copy of the variable $r$, which we call $s$:
\begin{align*}
|A|^2
\le
\mathbb E_{u,r,s\in G}
F(u-c_dr)\overline{F(u-c_ds)}
\prod_{i=1}^{d-1}
g_i(u+(c_i-c_d)r)
\overline{g_i(u+(c_i-c_d)s)}.
\end{align*}
Set $h=s-r$ and $y=u-c_dr$. The map $(u,r,s)\mapsto(y,r,h)$ is bijective on $G^3$. Under this substitution,
\begin{align*}
u-c_dr &= y,\\
u-c_ds &= y-c_dh,\\
u+(c_i-c_d)r &= y+c_ir,\\
u+(c_i-c_d)s &= y+c_ir+(c_i-c_d)h.
\end{align*}
Hence
\begin{align*}
|A|^2
\le
\mathbb E_{h\in G}
\left|
\mathbb E_{y,r\in G}
F(y)\overline{F(y-c_dh)}
\prod_{i=1}^{d-1}
g_i(y+c_i r)
\overline{g_i(y+c_i r+(c_i-c_d)h)}
\right|.
\end{align*}
For fixed $h$, define
\begin{align*}
F_h:G&\to\mathbb C,\\
y&\mapsto F(y)\overline{F(y-c_dh)},
\end{align*}
and
\begin{align*}
G_{i,h}:G&\to\mathbb C,\\
z&\mapsto g_i(z)\overline{g_i(z+(c_i-c_d)h)}.
\end{align*}
Each $G_{i,h}$ is still bounded by $1$. Therefore the induction hypothesis applies to the inner average and gives
\begin{align*}
|A|^2
\le
\mathbb E_{h\in G}\|F_h\|_{U^{d-1}(G)}.
\end{align*}
Raising both sides to the power $2^{d-1}$ and using [Jensen's inequality](/theorems/9) gives
\begin{align*}
|A|^{2^d}
\le
\mathbb E_{h\in G}\|F_h\|_{U^{d-1}(G)}^{2^{d-1}}.
\end{align*}
When the $U^{d-1}$ norm is expanded, this is
\begin{align*}
|A|^{2^d}
\le
\mathbb E_{h,y,h_1,\dots,h_{d-1}\in G}
\prod_{\omega\in\{0,1\}^{d-1}}
\mathcal C^{|\omega|}F(y+\omega_1h_1+\cdots+\omega_{d-1}h_{d-1})
\\
\cdot
\prod_{\omega\in\{0,1\}^{d-1}}
\mathcal C^{|\omega|+1}F(y+\omega_1h_1+\cdots+\omega_{d-1}h_{d-1}-c_dh).
\end{align*}
Finally set $h_d=-c_dh$. This substitution preserves normalized counting measure because multiplication by $c_d$ is bijective. The displayed average becomes exactly the $2^d$-th power of the $U^d(G)$ norm of $F$:
\begin{align*}
|A|^{2^d}
\le
\|F\|_{U^d(G)}^{2^d}.
\end{align*}
Taking $2^d$-th roots proves the Cauchy-Schwarz estimate.
[/guided]
[/step]
[step:Translate the progression so that an arbitrary factor is distinguished]
Fix $j\in\{0,\dots,k-1\}$. Define
\begin{align*}
A
:=
\mathbb E_{x,r\in G}
\prod_{\ell=0}^{k-1}f_\ell(x+\ell r).
\end{align*}
Make the bijective change of variables $y=x+jr$. Then
\begin{align*}
A
=
\mathbb E_{y,r\in G}
f_j(y)
\prod_{\substack{0\le \ell\le k-1\\ \ell\ne j}}
f_\ell(y+(\ell-j)r).
\end{align*}
For every $\ell\ne j$, the integer $|\ell-j|$ lies in $\{1,\dots,k-1\}$, so multiplication by $\ell-j$ is a bijection of $G$ by hypothesis. Applying the Cauchy-Schwarz estimate from the previous step with $d=k-1$, distinguished function $F=f_j$, and the remaining functions $g_i$ equal to the functions $f_\ell$ indexed by $\ell\ne j$, gives
\begin{align*}
|A|
\le
\|f_j\|_{U^{k-1}(G)}.
\end{align*}
[/step]
[step:Take the minimum over the distinguished factor]
The index $j\in\{0,\dots,k-1\}$ was arbitrary. Therefore the estimate
\begin{align*}
\left|
\mathbb E_{x,r\in G}
\prod_{\ell=0}^{k-1}f_\ell(x+\ell r)
\right|
\le
\|f_j\|_{U^{k-1}(G)}
\end{align*}
holds for every $j\in\{0,\dots,k-1\}$. Taking the minimum over $j$ yields
\begin{align*}
\left|
\mathbb E_{x,r\in G}
\prod_{\ell=0}^{k-1}f_\ell(x+\ell r)
\right|
\le
\min_{0\le j\le k-1}\|f_j\|_{U^{k-1}(G)}.
\end{align*}
This is the desired generalized von Neumann bound.
[/step]