[proofplan]
We count three-term arithmetic progressions by Fourier analysis on the finite [vector space](/page/Vector%20Space) $\mathbb{F}_p^n$. If the ambient affine space is large compared with the density, the absence of non-degenerate progressions forces a nonzero Fourier coefficient of the balanced function $\mathbb{1}_A-\alpha$ to be large. A large Fourier coefficient gives a density increment on a codimension-one affine subspace. Iterating this increment either exhausts the dimension or makes the density exceed $1$, and the resulting dimension count gives $\alpha \leq C_p n^{-1}$.
[/proofplan]
[step:Define the Fourier transform and count progressions]
Let $V := \mathbb{F}_p^n$, let $N := |V| = p^n$, and let
\begin{align*}
\omega := e^{2\pi i/p}.
\end{align*}
For $\xi \in V$, define the character
\begin{align*}
\chi_\xi: V &\to \mathbb{C} \\
x &\mapsto \omega^{\xi \cdot x},
\end{align*}
where $\xi \cdot x := \sum_{j=1}^n \xi_j x_j \in \mathbb{F}_p$ is interpreted in the exponent through its representative in $\{0,\dots,p-1\}$.
For a function $g: V \to \mathbb{C}$, define its normalized [Fourier transform](/page/Fourier%20Transform) by
\begin{align*}
\widehat{g}: V &\to \mathbb{C} \\
\xi &\mapsto \mathbb{E}_{x \in V} g(x)\overline{\chi_\xi(x)}
= \frac{1}{N}\sum_{x \in V} g(x)\overline{\chi_\xi(x)}.
\end{align*}
The character orthogonality relation is
\begin{align*}
\mathbb{E}_{x \in V}\chi_\xi(x)
=
\begin{cases}
1, & \xi = 0,\\
0, & \xi \neq 0.
\end{cases}
\end{align*}
Indeed, if $\xi \neq 0$, choose $h \in V$ with $\xi \cdot h \neq 0$; translation by $h$ gives
\begin{align*}
\mathbb{E}_{x \in V}\chi_\xi(x)
=
\mathbb{E}_{x \in V}\chi_\xi(x+h)
=
\chi_\xi(h)\mathbb{E}_{x \in V}\chi_\xi(x),
\end{align*}
and $\chi_\xi(h)\neq 1$, so the expectation is $0$.
Let $a := \mathbb{1}_A:V \to \{0,1\}$. Define the normalized three-term progression count
\begin{align*}
T(a) := \mathbb{E}_{x,d \in V} a(x)a(x+d)a(x+2d).
\end{align*}
Fourier inversion, obtained from the orthogonality relation, gives
\begin{align*}
a(x)=\sum_{\xi \in V}\widehat{a}(\xi)\chi_\xi(x).
\end{align*}
Substituting this into $T(a)$ and using orthogonality in the variables $x$ and $d$ gives
\begin{align*}
T(a)
&=
\sum_{\xi,\eta,\zeta \in V}
\widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta)
\mathbb{E}_{x,d \in V}
\chi_\xi(x)\chi_\eta(x+d)\chi_\zeta(x+2d)\\
&=
\sum_{\xi,\eta,\zeta \in V}
\widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta)
\mathbb{E}_{x \in V}\chi_{\xi+\eta+\zeta}(x)
\mathbb{E}_{d \in V}\chi_{\eta+2\zeta}(d)\\
&=
\sum_{\zeta \in V}\widehat{a}(\zeta)^2\widehat{a}(-2\zeta).
\end{align*}
The last equality uses that $p$ is odd, so multiplication by $2$ is invertible in $\mathbb{F}_p$.
[guided]
The purpose of the Fourier transform is to diagonalize the condition that three points form an arithmetic progression. We work with normalized averages, so all quantities are scale-free in the size $N=p^n$.
For each frequency $\xi \in V$, the character
\begin{align*}
\chi_\xi: V &\to \mathbb{C} \\
x &\mapsto \omega^{\xi \cdot x}
\end{align*}
is a group homomorphism from the additive group of $V$ to the unit circle. The orthogonality identity says that a nontrivial character averages to zero. If $\xi\neq 0$, there is $h\in V$ with $\xi\cdot h\neq 0$. Since translation preserves the uniform average on $V$,
\begin{align*}
\mathbb{E}_{x \in V}\chi_\xi(x)
=
\mathbb{E}_{x \in V}\chi_\xi(x+h)
=
\chi_\xi(h)\mathbb{E}_{x \in V}\chi_\xi(x).
\end{align*}
Because $\chi_\xi(h)\neq 1$, the only possible value of the expectation is $0$. For $\xi=0$, the character is identically $1$, so the expectation is $1$.
Now define $a:=\mathbb{1}_A$. The quantity
\begin{align*}
T(a):=\mathbb{E}_{x,d\in V}a(x)a(x+d)a(x+2d)
\end{align*}
is the normalized number of ordered three-term progressions in $A$, including the degenerate progressions with $d=0$. Fourier inversion writes
\begin{align*}
a(x)=\sum_{\xi\in V}\widehat a(\xi)\chi_\xi(x).
\end{align*}
Substituting this expression at $x$, $x+d$, and $x+2d$, we obtain
\begin{align*}
T(a)
&=
\sum_{\xi,\eta,\zeta \in V}
\widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta)
\mathbb{E}_{x,d \in V}
\chi_\xi(x)\chi_\eta(x+d)\chi_\zeta(x+2d)\\
&=
\sum_{\xi,\eta,\zeta \in V}
\widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta)
\mathbb{E}_{x \in V}\chi_{\xi+\eta+\zeta}(x)
\mathbb{E}_{d \in V}\chi_{\eta+2\zeta}(d).
\end{align*}
The $x$-average vanishes unless $\xi+\eta+\zeta=0$, and the $d$-average vanishes unless $\eta+2\zeta=0$. Since $p$ is odd, $2$ is invertible, so these two equations force $\eta=-2\zeta$ and $\xi=\zeta$. Therefore
\begin{align*}
T(a)=\sum_{\zeta\in V}\widehat a(\zeta)^2\widehat a(-2\zeta).
\end{align*}
This is the Fourier formula that turns a shortage of progressions into information about nonzero Fourier coefficients.
[/guided]
[/step]
[step:Force a large nonzero Fourier coefficient from the missing progressions]
Define the balanced function
\begin{align*}
f: V &\to \mathbb{R} \\
x &\mapsto a(x)-\alpha.
\end{align*}
Then $\widehat{a}(0)=\alpha$, $\widehat{f}(0)=0$, and $\widehat{a}(\xi)=\widehat{f}(\xi)$ for every $\xi \neq 0$.
Since $A$ contains no non-degenerate three-term arithmetic progression, the only progressions counted by $T(a)$ have $d=0$. Hence
\begin{align*}
T(a)=\frac{|A|}{N^2}=\frac{\alpha}{N}.
\end{align*}
Assume first that
\begin{align*}
N \geq 2\alpha^{-2}.
\end{align*}
Then
\begin{align*}
T(a)\leq \frac{\alpha^3}{2}.
\end{align*}
Using the progression-count formula,
\begin{align*}
\left|\sum_{\zeta \in V\setminus\{0\}}\widehat{a}(\zeta)^2\widehat{a}(-2\zeta)\right|
=
|\alpha^3-T(a)|
\geq \frac{\alpha^3}{2}.
\end{align*}
Let
\begin{align*}
M := \max_{\xi \in V\setminus\{0\}}|\widehat{f}(\xi)|.
\end{align*}
Since multiplication by $-2$ is a bijection of $V\setminus\{0\}$,
\begin{align*}
\frac{\alpha^3}{2}
&\leq
\sum_{\zeta \in V\setminus\{0\}}
|\widehat{a}(\zeta)|^2|\widehat{a}(-2\zeta)|\\
&\leq
M\sum_{\zeta \in V\setminus\{0\}}|\widehat{a}(\zeta)|^2.
\end{align*}
[Parseval's identity](/theorems/434) for the normalized Fourier transform gives
\begin{align*}
\sum_{\zeta \in V}|\widehat{a}(\zeta)|^2
=
\mathbb{E}_{x \in V}|a(x)|^2
=
\alpha.
\end{align*}
Therefore
\begin{align*}
M \geq \frac{\alpha^2}{2}.
\end{align*}
[guided]
The balanced function
\begin{align*}
f:V&\to\mathbb{R}\\
x&\mapsto a(x)-\alpha
\end{align*}
removes the constant density contribution from $a$. Since $\mathbb{E}_{x\in V}a(x)=\alpha$, we have $\widehat f(0)=0$, while for every nonzero $\xi$ the constant function $\alpha$ has zero Fourier coefficient, so $\widehat a(\xi)=\widehat f(\xi)$.
The hypothesis that $A$ has no non-degenerate progression means that whenever $a(x)a(x+d)a(x+2d)=1$, we must have $d=0$. Thus the only contributing pairs are $(x,0)$ with $x\in A$, and therefore
\begin{align*}
T(a)=\frac{|A|}{N^2}=\frac{\alpha}{N}.
\end{align*}
If $N\geq 2\alpha^{-2}$, then this count is at most half of the random main term:
\begin{align*}
T(a)\leq \frac{\alpha^3}{2}.
\end{align*}
The Fourier formula separates the zero frequency from the nonzero frequencies:
\begin{align*}
T(a)
=
\widehat a(0)^2\widehat a(0)
+
\sum_{\zeta\in V\setminus\{0\}}\widehat a(\zeta)^2\widehat a(-2\zeta)
=
\alpha^3+
\sum_{\zeta\in V\setminus\{0\}}\widehat a(\zeta)^2\widehat a(-2\zeta).
\end{align*}
Since $T(a)\leq \alpha^3/2$, the nonzero frequencies must account for a cancellation of size at least $\alpha^3/2$:
\begin{align*}
\left|\sum_{\zeta\in V\setminus\{0\}}\widehat a(\zeta)^2\widehat a(-2\zeta)\right|
\geq \frac{\alpha^3}{2}.
\end{align*}
Let
\begin{align*}
M:=\max_{\xi\in V\setminus\{0\}}|\widehat f(\xi)|.
\end{align*}
For $\zeta\neq 0$, both $\zeta$ and $-2\zeta$ are nonzero, because $p$ is odd. Hence
\begin{align*}
|\widehat a(-2\zeta)|=|\widehat f(-2\zeta)|\leq M.
\end{align*}
Therefore
\begin{align*}
\frac{\alpha^3}{2}
&\leq
\sum_{\zeta\in V\setminus\{0\}}
|\widehat a(\zeta)|^2|\widehat a(-2\zeta)|\\
&\leq
M\sum_{\zeta\in V\setminus\{0\}}|\widehat a(\zeta)|^2.
\end{align*}
Parseval's identity follows from character orthogonality and gives
\begin{align*}
\sum_{\zeta\in V}|\widehat a(\zeta)|^2
=
\mathbb{E}_{x\in V}|a(x)|^2.
\end{align*}
Since $a$ is an indicator, $a^2=a$, so the right-hand side is $\alpha$. Thus
\begin{align*}
\frac{\alpha^3}{2}\leq M\alpha,
\end{align*}
and hence
\begin{align*}
M\geq \frac{\alpha^2}{2}.
\end{align*}
This is the key Fourier consequence: if the progression count is too small, then $A$ has a strong bias along some nonzero character.
[/guided]
[/step]
[step:Convert a large Fourier coefficient into a density increment]
Let $\xi \in V\setminus\{0\}$ satisfy
\begin{align*}
|\widehat f(\xi)|\geq \frac{\alpha^2}{2}.
\end{align*}
For each $t\in \mathbb{F}_p$, define the affine hyperplane
\begin{align*}
H_t:=\{x\in V:\xi\cdot x=t\}.
\end{align*}
Each $H_t$ has cardinality $N/p$. Define the density of $A$ on $H_t$ by
\begin{align*}
\alpha_t:=\frac{|A\cap H_t|}{|H_t|}.
\end{align*}
Then
\begin{align*}
\widehat f(\xi)
&=
\frac{1}{N}\sum_{x\in V}(a(x)-\alpha)\omega^{-\xi\cdot x}\\
&=
\frac{1}{p}\sum_{t\in\mathbb{F}_p}(\alpha_t-\alpha)\omega^{-t}.
\end{align*}
Let $\delta_t:=\alpha_t-\alpha$ for $t\in\mathbb{F}_p$. Since the hyperplanes $H_t$ partition $V$ into equal-size pieces,
\begin{align*}
\sum_{t\in\mathbb{F}_p}\delta_t=0.
\end{align*}
Thus
\begin{align*}
\frac{\alpha^2}{2}
\leq |\widehat f(\xi)|
\leq \frac{1}{p}\sum_{t\in\mathbb{F}_p}|\delta_t|
=
\frac{2}{p}\sum_{\delta_t>0}\delta_t
\leq 2\max_{t\in\mathbb{F}_p}\delta_t.
\end{align*}
Therefore there exists $t_0\in\mathbb{F}_p$ such that
\begin{align*}
\alpha_{t_0}\geq \alpha+\frac{\alpha^2}{4}.
\end{align*}
So, provided $p^n\geq 2\alpha^{-2}$, a progression-free set of density $\alpha$ in $V$ has density at least $\alpha+\alpha^2/4$ on some affine hyperplane.
[guided]
A nonzero Fourier coefficient measures correlation with the character $x\mapsto \omega^{\xi\cdot x}$. This character is constant on each affine hyperplane
\begin{align*}
H_t=\{x\in V:\xi\cdot x=t\}.
\end{align*}
Because $\xi\neq 0$, the [linear map](/page/Linear%20Map) $x\mapsto \xi\cdot x$ from $V$ to $\mathbb{F}_p$ is surjective, so each fiber $H_t$ has size $N/p$.
Let
\begin{align*}
\alpha_t:=\frac{|A\cap H_t|}{|H_t|}
\end{align*}
be the density of $A$ on the fiber $H_t$. The Fourier coefficient can be grouped by these fibers:
\begin{align*}
\widehat f(\xi)
&=
\frac{1}{N}\sum_{x\in V}(a(x)-\alpha)\omega^{-\xi\cdot x}\\
&=
\frac{1}{N}\sum_{t\in\mathbb{F}_p}
\sum_{x\in H_t}(a(x)-\alpha)\omega^{-t}\\
&=
\frac{1}{p}\sum_{t\in\mathbb{F}_p}(\alpha_t-\alpha)\omega^{-t}.
\end{align*}
Set $\delta_t:=\alpha_t-\alpha$. Since the $H_t$ partition $V$ into equal-size pieces, the average of the $\alpha_t$ is $\alpha$, and hence
\begin{align*}
\sum_{t\in\mathbb{F}_p}\delta_t=0.
\end{align*}
The positive deviations and negative deviations therefore have the same total magnitude:
\begin{align*}
\sum_{t\in\mathbb{F}_p}|\delta_t|
=
2\sum_{\delta_t>0}\delta_t.
\end{align*}
Using the large Fourier coefficient and the triangle inequality,
\begin{align*}
\frac{\alpha^2}{2}
\leq |\widehat f(\xi)|
\leq
\frac{1}{p}\sum_{t\in\mathbb{F}_p}|\delta_t|
=
\frac{2}{p}\sum_{\delta_t>0}\delta_t
\leq
2\max_{t\in\mathbb{F}_p}\delta_t.
\end{align*}
Thus some fiber satisfies
\begin{align*}
\delta_{t_0}\geq \frac{\alpha^2}{4},
\end{align*}
or equivalently
\begin{align*}
\alpha_{t_0}\geq \alpha+\frac{\alpha^2}{4}.
\end{align*}
This is the density increment step: a large Fourier bias lets us pass to a codimension-one affine subspace where the density has increased by a quadratic amount.
[/guided]
[/step]
[step:Iterate the density increment inside affine subspaces]
The previous argument applies verbatim inside any affine subspace of $V$. More precisely, if $W\subset V$ is an affine subspace of dimension $m\geq 1$, if $B:=A\cap W$ has relative density $\gamma:=|B|/|W|>0$, and if
\begin{align*}
|W|=p^m\geq 2\gamma^{-2},
\end{align*}
then there is an affine hyperplane $W'\subset W$ such that
\begin{align*}
\frac{|A\cap W'|}{|W'|}
\geq
\gamma+\frac{\gamma^2}{4}.
\end{align*}
This follows by choosing a point $w_0\in W$, writing $W=w_0+L$ for an $m$-dimensional linear subspace $L\subset V$, and applying the preceding Fourier argument to the set
\begin{align*}
B_0:=\{y\in L:w_0+y\in A\}\subset L.
\end{align*}
The set $B_0$ has no non-degenerate three-term arithmetic progression in $L$, because such a progression would translate to one in $A\cap W$.
Starting with $W_0:=V$ and $\gamma_0:=\alpha$, construct affine subspaces
\begin{align*}
W_0 \supset W_1 \supset \cdots \supset W_r
\end{align*}
as long as the density increment step applies. Write
\begin{align*}
m_i:=\dim W_i,
\qquad
\gamma_i:=\frac{|A\cap W_i|}{|W_i|}.
\end{align*}
At each successful step,
\begin{align*}
m_{i+1}=m_i-1,
\qquad
\gamma_{i+1}\geq \gamma_i+\frac{\gamma_i^2}{4}.
\end{align*}
Since every $\gamma_i\leq 1$,
\begin{align*}
\frac{1}{\gamma_{i+1}}
\leq
\frac{1}{\gamma_i+\gamma_i^2/4}
=
\frac{1}{\gamma_i}\cdot \frac{1}{1+\gamma_i/4}
\leq
\frac{1}{\gamma_i}-\frac{1}{5}.
\end{align*}
The last inequality uses $0<\gamma_i\leq 1$. Therefore the number $r$ of successful increments satisfies
\begin{align*}
r<\frac{5}{\alpha}.
\end{align*}
[guided]
The density increment argument is not tied to the original ambient space $V$. If $W\subset V$ is an affine subspace, choose $w_0\in W$ and write $W=w_0+L$, where $L$ is a linear subspace. The translation map
\begin{align*}
L&\to W\\
y&\mapsto w_0+y
\end{align*}
identifies subsets of $W$ with subsets of $L$. Under this identification, a progression
\begin{align*}
y,\quad y+d,\quad y+2d
\end{align*}
in $L$ translates to
\begin{align*}
w_0+y,\quad w_0+y+d,\quad w_0+y+2d
\end{align*}
in $W$. If $d\neq 0$, non-degeneracy is preserved. Hence $A\cap W$ remains progression-free inside $W$.
Thus, whenever $B=A\cap W$ has relative density $\gamma$ and $|W|=p^m\geq 2\gamma^{-2}$, the Fourier argument gives an affine hyperplane $W'\subset W$ with density at least
\begin{align*}
\gamma+\frac{\gamma^2}{4}.
\end{align*}
We now iterate. Define $W_0:=V$ and $\gamma_0:=\alpha$. After each successful increment, the dimension drops by exactly one and the density increases:
\begin{align*}
m_{i+1}=m_i-1,
\qquad
\gamma_{i+1}\geq \gamma_i+\frac{\gamma_i^2}{4}.
\end{align*}
The density can never exceed $1$, because it is the relative size of a subset. To turn this into a bound on the number of iterations, look at reciprocals. For $0<\gamma_i\leq 1$,
\begin{align*}
\frac{1}{\gamma_{i+1}}
\leq
\frac{1}{\gamma_i+\gamma_i^2/4}
=
\frac{1}{\gamma_i}\cdot\frac{1}{1+\gamma_i/4}.
\end{align*}
Also,
\begin{align*}
\frac{1}{\gamma_i}-\frac{1}{\gamma_i+\gamma_i^2/4}
=
\frac{1}{4+\gamma_i}
\geq \frac{1}{5}.
\end{align*}
So every successful density increment decreases $1/\gamma_i$ by at least $1/5$. Since initially $1/\gamma_0=1/\alpha$ and reciprocals remain positive, there can be fewer than $5/\alpha$ successful increments:
\begin{align*}
r<\frac{5}{\alpha}.
\end{align*}
[/guided]
[/step]
[step:Bound the terminal dimension and conclude the density estimate]
When the iteration stops at $W_r$, either $\dim W_r=0$ or the size condition for another increment fails. In both cases,
\begin{align*}
m_r \leq \log_p 2 + 2\log_p(\alpha^{-1}).
\end{align*}
Indeed, if $m_r>0$ and the size condition fails, then
\begin{align*}
p^{m_r}<2\gamma_r^{-2}\leq 2\alpha^{-2},
\end{align*}
because $\gamma_r\geq \alpha$.
Now assume first that $N=p^n\geq 2\alpha^{-2}$. Combining the bounds for the number of increments and the terminal dimension gives
\begin{align*}
n
=
r+m_r
<
\frac{5}{\alpha}
+
\log_p 2
+
2\log_p(\alpha^{-1}).
\end{align*}
Multiplying by $\alpha$ and using
\begin{align*}
\sup_{0<u\leq 1} u\log_p(u^{-1})
=
\frac{1}{e\ln p},
\end{align*}
we obtain
\begin{align*}
\alpha n
\leq
5+\log_p 2+\frac{2}{e\ln p}.
\end{align*}
It remains to handle the case $N<2\alpha^{-2}$. Then
\begin{align*}
\alpha<\sqrt{2}\,p^{-n/2}.
\end{align*}
Since
\begin{align*}
\sup_{x>0} xp^{-x/2}=\frac{2}{e\ln p},
\end{align*}
we have
\begin{align*}
\alpha n
<
\frac{2\sqrt{2}}{e\ln p}.
\end{align*}
Therefore the theorem holds with, for example,
\begin{align*}
C_p
:=
5+\log_p 2+\frac{2}{e\ln p}+\frac{2\sqrt{2}}{e\ln p}.
\end{align*}
This constant depends only on $p$, and the desired estimate
\begin{align*}
\alpha\leq C_p n^{-1}
\end{align*}
follows.
[guided]
At the end of the iteration, there are two possible reasons we cannot continue. Either the affine subspace has dimension $0$, or the hypothesis needed for the Fourier step,
\begin{align*}
|W_r|=p^{m_r}\geq 2\gamma_r^{-2},
\end{align*}
has failed. If the dimension is $0$, then $m_r=0$, which is already bounded by the desired expression. If the size condition fails, then
\begin{align*}
p^{m_r}<2\gamma_r^{-2}.
\end{align*}
The densities only increase during the iteration, so $\gamma_r\geq\alpha$. Hence
\begin{align*}
p^{m_r}<2\alpha^{-2},
\end{align*}
and taking logarithms base $p$ gives
\begin{align*}
m_r\leq \log_p2+2\log_p(\alpha^{-1}).
\end{align*}
In the main case $N=p^n\geq 2\alpha^{-2}$, the iteration starts legitimately. Since each successful step lowers the dimension by one,
\begin{align*}
n=r+m_r.
\end{align*}
Using the bound $r<5/\alpha$ and the terminal dimension estimate,
\begin{align*}
n
<
\frac{5}{\alpha}
+
\log_p2
+
2\log_p(\alpha^{-1}).
\end{align*}
Multiplying by $\alpha$ gives
\begin{align*}
\alpha n
<
5+\alpha\log_p2+2\alpha\log_p(\alpha^{-1}).
\end{align*}
For $0<u\leq 1$, the function $u\ln(u^{-1})$ has maximum $1/e$, attained at $u=e^{-1}$. Therefore
\begin{align*}
\alpha\log_p(\alpha^{-1})
=
\frac{\alpha\ln(\alpha^{-1})}{\ln p}
\leq
\frac{1}{e\ln p}.
\end{align*}
Also $\alpha\leq 1$, so $\alpha\log_p2\leq \log_p2$. Thus
\begin{align*}
\alpha n
\leq
5+\log_p2+\frac{2}{e\ln p}.
\end{align*}
If the main size condition fails at the original level, then
\begin{align*}
p^n<2\alpha^{-2},
\end{align*}
which implies
\begin{align*}
\alpha<\sqrt{2}\,p^{-n/2}.
\end{align*}
The function $x\mapsto xp^{-x/2}$ on $(0,\infty)$ has maximum $2/(e\ln p)$, so
\begin{align*}
\alpha n<\frac{2\sqrt{2}}{e\ln p}.
\end{align*}
Combining the two cases, the estimate holds with
\begin{align*}
C_p
:=
5+\log_p 2+\frac{2}{e\ln p}+\frac{2\sqrt{2}}{e\ln p}.
\end{align*}
This constant depends only on the prime $p$, and therefore
\begin{align*}
\alpha\leq C_p n^{-1}.
\end{align*}
[/guided]
[/step]