Meshulam's Theorem on Three-Term Arithmetic Progressions

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We count three-term arithmetic progressions by Fourier analysis on the finite [vector space](/page/Vector%20Space) $\mathbb{F}_p^n$. If the ambient affine space is large compared with the density, the absence of non-degenerate progressions forces a nonzero Fourier coefficient of the balanced function $\mathbb{1}_A-\alpha$ to be large. A large Fourier coefficient gives a density increment on a codimension-one affine subspace. Iterating this increment either exhausts the dimension or makes the density exceed $1$, and the resulting dimension count gives $\alpha \leq C_p n^{-1}$. [/proofplan] [step:Define the Fourier transform and count progressions] Let $V := \mathbb{F}_p^n$, let $N := |V| = p^n$, and let \begin{align*} \omega := e^{2\pi i/p}. \end{align*} For $\xi \in V$, define the character \begin{align*} \chi_\xi: V &\to \mathbb{C} \\ x &\mapsto \omega^{\xi \cdot x}, \end{align*} where $\xi \cdot x := \sum_{j=1}^n \xi_j x_j \in \mathbb{F}_p$ is interpreted in the exponent through its representative in $\{0,\dots,p-1\}$. For a function $g: V \to \mathbb{C}$, define its normalized [Fourier transform](/page/Fourier%20Transform) by \begin{align*} \widehat{g}: V &\to \mathbb{C} \\ \xi &\mapsto \mathbb{E}_{x \in V} g(x)\overline{\chi_\xi(x)} = \frac{1}{N}\sum_{x \in V} g(x)\overline{\chi_\xi(x)}. \end{align*} The character orthogonality relation is \begin{align*} \mathbb{E}_{x \in V}\chi_\xi(x) = \begin{cases} 1, & \xi = 0,\\ 0, & \xi \neq 0. \end{cases} \end{align*} Indeed, if $\xi \neq 0$, choose $h \in V$ with $\xi \cdot h \neq 0$; translation by $h$ gives \begin{align*} \mathbb{E}_{x \in V}\chi_\xi(x) = \mathbb{E}_{x \in V}\chi_\xi(x+h) = \chi_\xi(h)\mathbb{E}_{x \in V}\chi_\xi(x), \end{align*} and $\chi_\xi(h)\neq 1$, so the expectation is $0$. Let $a := \mathbb{1}_A:V \to \{0,1\}$. Define the normalized three-term progression count \begin{align*} T(a) := \mathbb{E}_{x,d \in V} a(x)a(x+d)a(x+2d). \end{align*} Fourier inversion, obtained from the orthogonality relation, gives \begin{align*} a(x)=\sum_{\xi \in V}\widehat{a}(\xi)\chi_\xi(x). \end{align*} Substituting this into $T(a)$ and using orthogonality in the variables $x$ and $d$ gives \begin{align*} T(a) &= \sum_{\xi,\eta,\zeta \in V} \widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta) \mathbb{E}_{x,d \in V} \chi_\xi(x)\chi_\eta(x+d)\chi_\zeta(x+2d)\\ &= \sum_{\xi,\eta,\zeta \in V} \widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta) \mathbb{E}_{x \in V}\chi_{\xi+\eta+\zeta}(x) \mathbb{E}_{d \in V}\chi_{\eta+2\zeta}(d)\\ &= \sum_{\zeta \in V}\widehat{a}(\zeta)^2\widehat{a}(-2\zeta). \end{align*} The last equality uses that $p$ is odd, so multiplication by $2$ is invertible in $\mathbb{F}_p$. [guided] The purpose of the Fourier transform is to diagonalize the condition that three points form an arithmetic progression. We work with normalized averages, so all quantities are scale-free in the size $N=p^n$. For each frequency $\xi \in V$, the character \begin{align*} \chi_\xi: V &\to \mathbb{C} \\ x &\mapsto \omega^{\xi \cdot x} \end{align*} is a group homomorphism from the additive group of $V$ to the unit circle. The orthogonality identity says that a nontrivial character averages to zero. If $\xi\neq 0$, there is $h\in V$ with $\xi\cdot h\neq 0$. Since translation preserves the uniform average on $V$, \begin{align*} \mathbb{E}_{x \in V}\chi_\xi(x) = \mathbb{E}_{x \in V}\chi_\xi(x+h) = \chi_\xi(h)\mathbb{E}_{x \in V}\chi_\xi(x). \end{align*} Because $\chi_\xi(h)\neq 1$, the only possible value of the expectation is $0$. For $\xi=0$, the character is identically $1$, so the expectation is $1$. Now define $a:=\mathbb{1}_A$. The quantity \begin{align*} T(a):=\mathbb{E}_{x,d\in V}a(x)a(x+d)a(x+2d) \end{align*} is the normalized number of ordered three-term progressions in $A$, including the degenerate progressions with $d=0$. Fourier inversion writes \begin{align*} a(x)=\sum_{\xi\in V}\widehat a(\xi)\chi_\xi(x). \end{align*} Substituting this expression at $x$, $x+d$, and $x+2d$, we obtain \begin{align*} T(a) &= \sum_{\xi,\eta,\zeta \in V} \widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta) \mathbb{E}_{x,d \in V} \chi_\xi(x)\chi_\eta(x+d)\chi_\zeta(x+2d)\\ &= \sum_{\xi,\eta,\zeta \in V} \widehat{a}(\xi)\widehat{a}(\eta)\widehat{a}(\zeta) \mathbb{E}_{x \in V}\chi_{\xi+\eta+\zeta}(x) \mathbb{E}_{d \in V}\chi_{\eta+2\zeta}(d). \end{align*} The $x$-average vanishes unless $\xi+\eta+\zeta=0$, and the $d$-average vanishes unless $\eta+2\zeta=0$. Since $p$ is odd, $2$ is invertible, so these two equations force $\eta=-2\zeta$ and $\xi=\zeta$. Therefore \begin{align*} T(a)=\sum_{\zeta\in V}\widehat a(\zeta)^2\widehat a(-2\zeta). \end{align*} This is the Fourier formula that turns a shortage of progressions into information about nonzero Fourier coefficients. [/guided] [/step] [step:Force a large nonzero Fourier coefficient from the missing progressions] Define the balanced function \begin{align*} f: V &\to \mathbb{R} \\ x &\mapsto a(x)-\alpha. \end{align*} Then $\widehat{a}(0)=\alpha$, $\widehat{f}(0)=0$, and $\widehat{a}(\xi)=\widehat{f}(\xi)$ for every $\xi \neq 0$. Since $A$ contains no non-degenerate three-term arithmetic progression, the only progressions counted by $T(a)$ have $d=0$. Hence \begin{align*} T(a)=\frac{|A|}{N^2}=\frac{\alpha}{N}. \end{align*} Assume first that \begin{align*} N \geq 2\alpha^{-2}. \end{align*} Then \begin{align*} T(a)\leq \frac{\alpha^3}{2}. \end{align*} Using the progression-count formula, \begin{align*} \left|\sum_{\zeta \in V\setminus\{0\}}\widehat{a}(\zeta)^2\widehat{a}(-2\zeta)\right| = |\alpha^3-T(a)| \geq \frac{\alpha^3}{2}. \end{align*} Let \begin{align*} M := \max_{\xi \in V\setminus\{0\}}|\widehat{f}(\xi)|. \end{align*} Since multiplication by $-2$ is a bijection of $V\setminus\{0\}$, \begin{align*} \frac{\alpha^3}{2} &\leq \sum_{\zeta \in V\setminus\{0\}} |\widehat{a}(\zeta)|^2|\widehat{a}(-2\zeta)|\\ &\leq M\sum_{\zeta \in V\setminus\{0\}}|\widehat{a}(\zeta)|^2. \end{align*} [Parseval's identity](/theorems/434) for the normalized Fourier transform gives \begin{align*} \sum_{\zeta \in V}|\widehat{a}(\zeta)|^2 = \mathbb{E}_{x \in V}|a(x)|^2 = \alpha. \end{align*} Therefore \begin{align*} M \geq \frac{\alpha^2}{2}. \end{align*} [guided] The balanced function \begin{align*} f:V&\to\mathbb{R}\\ x&\mapsto a(x)-\alpha \end{align*} removes the constant density contribution from $a$. Since $\mathbb{E}_{x\in V}a(x)=\alpha$, we have $\widehat f(0)=0$, while for every nonzero $\xi$ the constant function $\alpha$ has zero Fourier coefficient, so $\widehat a(\xi)=\widehat f(\xi)$. The hypothesis that $A$ has no non-degenerate progression means that whenever $a(x)a(x+d)a(x+2d)=1$, we must have $d=0$. Thus the only contributing pairs are $(x,0)$ with $x\in A$, and therefore \begin{align*} T(a)=\frac{|A|}{N^2}=\frac{\alpha}{N}. \end{align*} If $N\geq 2\alpha^{-2}$, then this count is at most half of the random main term: \begin{align*} T(a)\leq \frac{\alpha^3}{2}. \end{align*} The Fourier formula separates the zero frequency from the nonzero frequencies: \begin{align*} T(a) = \widehat a(0)^2\widehat a(0) + \sum_{\zeta\in V\setminus\{0\}}\widehat a(\zeta)^2\widehat a(-2\zeta) = \alpha^3+ \sum_{\zeta\in V\setminus\{0\}}\widehat a(\zeta)^2\widehat a(-2\zeta). \end{align*} Since $T(a)\leq \alpha^3/2$, the nonzero frequencies must account for a cancellation of size at least $\alpha^3/2$: \begin{align*} \left|\sum_{\zeta\in V\setminus\{0\}}\widehat a(\zeta)^2\widehat a(-2\zeta)\right| \geq \frac{\alpha^3}{2}. \end{align*} Let \begin{align*} M:=\max_{\xi\in V\setminus\{0\}}|\widehat f(\xi)|. \end{align*} For $\zeta\neq 0$, both $\zeta$ and $-2\zeta$ are nonzero, because $p$ is odd. Hence \begin{align*} |\widehat a(-2\zeta)|=|\widehat f(-2\zeta)|\leq M. \end{align*} Therefore \begin{align*} \frac{\alpha^3}{2} &\leq \sum_{\zeta\in V\setminus\{0\}} |\widehat a(\zeta)|^2|\widehat a(-2\zeta)|\\ &\leq M\sum_{\zeta\in V\setminus\{0\}}|\widehat a(\zeta)|^2. \end{align*} Parseval's identity follows from character orthogonality and gives \begin{align*} \sum_{\zeta\in V}|\widehat a(\zeta)|^2 = \mathbb{E}_{x\in V}|a(x)|^2. \end{align*} Since $a$ is an indicator, $a^2=a$, so the right-hand side is $\alpha$. Thus \begin{align*} \frac{\alpha^3}{2}\leq M\alpha, \end{align*} and hence \begin{align*} M\geq \frac{\alpha^2}{2}. \end{align*} This is the key Fourier consequence: if the progression count is too small, then $A$ has a strong bias along some nonzero character. [/guided] [/step] [step:Convert a large Fourier coefficient into a density increment] Let $\xi \in V\setminus\{0\}$ satisfy \begin{align*} |\widehat f(\xi)|\geq \frac{\alpha^2}{2}. \end{align*} For each $t\in \mathbb{F}_p$, define the affine hyperplane \begin{align*} H_t:=\{x\in V:\xi\cdot x=t\}. \end{align*} Each $H_t$ has cardinality $N/p$. Define the density of $A$ on $H_t$ by \begin{align*} \alpha_t:=\frac{|A\cap H_t|}{|H_t|}. \end{align*} Then \begin{align*} \widehat f(\xi) &= \frac{1}{N}\sum_{x\in V}(a(x)-\alpha)\omega^{-\xi\cdot x}\\ &= \frac{1}{p}\sum_{t\in\mathbb{F}_p}(\alpha_t-\alpha)\omega^{-t}. \end{align*} Let $\delta_t:=\alpha_t-\alpha$ for $t\in\mathbb{F}_p$. Since the hyperplanes $H_t$ partition $V$ into equal-size pieces, \begin{align*} \sum_{t\in\mathbb{F}_p}\delta_t=0. \end{align*} Thus \begin{align*} \frac{\alpha^2}{2} \leq |\widehat f(\xi)| \leq \frac{1}{p}\sum_{t\in\mathbb{F}_p}|\delta_t| = \frac{2}{p}\sum_{\delta_t>0}\delta_t \leq 2\max_{t\in\mathbb{F}_p}\delta_t. \end{align*} Therefore there exists $t_0\in\mathbb{F}_p$ such that \begin{align*} \alpha_{t_0}\geq \alpha+\frac{\alpha^2}{4}. \end{align*} So, provided $p^n\geq 2\alpha^{-2}$, a progression-free set of density $\alpha$ in $V$ has density at least $\alpha+\alpha^2/4$ on some affine hyperplane. [guided] A nonzero Fourier coefficient measures correlation with the character $x\mapsto \omega^{\xi\cdot x}$. This character is constant on each affine hyperplane \begin{align*} H_t=\{x\in V:\xi\cdot x=t\}. \end{align*} Because $\xi\neq 0$, the [linear map](/page/Linear%20Map) $x\mapsto \xi\cdot x$ from $V$ to $\mathbb{F}_p$ is surjective, so each fiber $H_t$ has size $N/p$. Let \begin{align*} \alpha_t:=\frac{|A\cap H_t|}{|H_t|} \end{align*} be the density of $A$ on the fiber $H_t$. The Fourier coefficient can be grouped by these fibers: \begin{align*} \widehat f(\xi) &= \frac{1}{N}\sum_{x\in V}(a(x)-\alpha)\omega^{-\xi\cdot x}\\ &= \frac{1}{N}\sum_{t\in\mathbb{F}_p} \sum_{x\in H_t}(a(x)-\alpha)\omega^{-t}\\ &= \frac{1}{p}\sum_{t\in\mathbb{F}_p}(\alpha_t-\alpha)\omega^{-t}. \end{align*} Set $\delta_t:=\alpha_t-\alpha$. Since the $H_t$ partition $V$ into equal-size pieces, the average of the $\alpha_t$ is $\alpha$, and hence \begin{align*} \sum_{t\in\mathbb{F}_p}\delta_t=0. \end{align*} The positive deviations and negative deviations therefore have the same total magnitude: \begin{align*} \sum_{t\in\mathbb{F}_p}|\delta_t| = 2\sum_{\delta_t>0}\delta_t. \end{align*} Using the large Fourier coefficient and the triangle inequality, \begin{align*} \frac{\alpha^2}{2} \leq |\widehat f(\xi)| \leq \frac{1}{p}\sum_{t\in\mathbb{F}_p}|\delta_t| = \frac{2}{p}\sum_{\delta_t>0}\delta_t \leq 2\max_{t\in\mathbb{F}_p}\delta_t. \end{align*} Thus some fiber satisfies \begin{align*} \delta_{t_0}\geq \frac{\alpha^2}{4}, \end{align*} or equivalently \begin{align*} \alpha_{t_0}\geq \alpha+\frac{\alpha^2}{4}. \end{align*} This is the density increment step: a large Fourier bias lets us pass to a codimension-one affine subspace where the density has increased by a quadratic amount. [/guided] [/step] [step:Iterate the density increment inside affine subspaces] The previous argument applies verbatim inside any affine subspace of $V$. More precisely, if $W\subset V$ is an affine subspace of dimension $m\geq 1$, if $B:=A\cap W$ has relative density $\gamma:=|B|/|W|>0$, and if \begin{align*} |W|=p^m\geq 2\gamma^{-2}, \end{align*} then there is an affine hyperplane $W'\subset W$ such that \begin{align*} \frac{|A\cap W'|}{|W'|} \geq \gamma+\frac{\gamma^2}{4}. \end{align*} This follows by choosing a point $w_0\in W$, writing $W=w_0+L$ for an $m$-dimensional linear subspace $L\subset V$, and applying the preceding Fourier argument to the set \begin{align*} B_0:=\{y\in L:w_0+y\in A\}\subset L. \end{align*} The set $B_0$ has no non-degenerate three-term arithmetic progression in $L$, because such a progression would translate to one in $A\cap W$. Starting with $W_0:=V$ and $\gamma_0:=\alpha$, construct affine subspaces \begin{align*} W_0 \supset W_1 \supset \cdots \supset W_r \end{align*} as long as the density increment step applies. Write \begin{align*} m_i:=\dim W_i, \qquad \gamma_i:=\frac{|A\cap W_i|}{|W_i|}. \end{align*} At each successful step, \begin{align*} m_{i+1}=m_i-1, \qquad \gamma_{i+1}\geq \gamma_i+\frac{\gamma_i^2}{4}. \end{align*} Since every $\gamma_i\leq 1$, \begin{align*} \frac{1}{\gamma_{i+1}} \leq \frac{1}{\gamma_i+\gamma_i^2/4} = \frac{1}{\gamma_i}\cdot \frac{1}{1+\gamma_i/4} \leq \frac{1}{\gamma_i}-\frac{1}{5}. \end{align*} The last inequality uses $0<\gamma_i\leq 1$. Therefore the number $r$ of successful increments satisfies \begin{align*} r<\frac{5}{\alpha}. \end{align*} [guided] The density increment argument is not tied to the original ambient space $V$. If $W\subset V$ is an affine subspace, choose $w_0\in W$ and write $W=w_0+L$, where $L$ is a linear subspace. The translation map \begin{align*} L&\to W\\ y&\mapsto w_0+y \end{align*} identifies subsets of $W$ with subsets of $L$. Under this identification, a progression \begin{align*} y,\quad y+d,\quad y+2d \end{align*} in $L$ translates to \begin{align*} w_0+y,\quad w_0+y+d,\quad w_0+y+2d \end{align*} in $W$. If $d\neq 0$, non-degeneracy is preserved. Hence $A\cap W$ remains progression-free inside $W$. Thus, whenever $B=A\cap W$ has relative density $\gamma$ and $|W|=p^m\geq 2\gamma^{-2}$, the Fourier argument gives an affine hyperplane $W'\subset W$ with density at least \begin{align*} \gamma+\frac{\gamma^2}{4}. \end{align*} We now iterate. Define $W_0:=V$ and $\gamma_0:=\alpha$. After each successful increment, the dimension drops by exactly one and the density increases: \begin{align*} m_{i+1}=m_i-1, \qquad \gamma_{i+1}\geq \gamma_i+\frac{\gamma_i^2}{4}. \end{align*} The density can never exceed $1$, because it is the relative size of a subset. To turn this into a bound on the number of iterations, look at reciprocals. For $0<\gamma_i\leq 1$, \begin{align*} \frac{1}{\gamma_{i+1}} \leq \frac{1}{\gamma_i+\gamma_i^2/4} = \frac{1}{\gamma_i}\cdot\frac{1}{1+\gamma_i/4}. \end{align*} Also, \begin{align*} \frac{1}{\gamma_i}-\frac{1}{\gamma_i+\gamma_i^2/4} = \frac{1}{4+\gamma_i} \geq \frac{1}{5}. \end{align*} So every successful density increment decreases $1/\gamma_i$ by at least $1/5$. Since initially $1/\gamma_0=1/\alpha$ and reciprocals remain positive, there can be fewer than $5/\alpha$ successful increments: \begin{align*} r<\frac{5}{\alpha}. \end{align*} [/guided] [/step] [step:Bound the terminal dimension and conclude the density estimate] When the iteration stops at $W_r$, either $\dim W_r=0$ or the size condition for another increment fails. In both cases, \begin{align*} m_r \leq \log_p 2 + 2\log_p(\alpha^{-1}). \end{align*} Indeed, if $m_r>0$ and the size condition fails, then \begin{align*} p^{m_r}<2\gamma_r^{-2}\leq 2\alpha^{-2}, \end{align*} because $\gamma_r\geq \alpha$. Now assume first that $N=p^n\geq 2\alpha^{-2}$. Combining the bounds for the number of increments and the terminal dimension gives \begin{align*} n = r+m_r < \frac{5}{\alpha} + \log_p 2 + 2\log_p(\alpha^{-1}). \end{align*} Multiplying by $\alpha$ and using \begin{align*} \sup_{0<u\leq 1} u\log_p(u^{-1}) = \frac{1}{e\ln p}, \end{align*} we obtain \begin{align*} \alpha n \leq 5+\log_p 2+\frac{2}{e\ln p}. \end{align*} It remains to handle the case $N<2\alpha^{-2}$. Then \begin{align*} \alpha<\sqrt{2}\,p^{-n/2}. \end{align*} Since \begin{align*} \sup_{x>0} xp^{-x/2}=\frac{2}{e\ln p}, \end{align*} we have \begin{align*} \alpha n < \frac{2\sqrt{2}}{e\ln p}. \end{align*} Therefore the theorem holds with, for example, \begin{align*} C_p := 5+\log_p 2+\frac{2}{e\ln p}+\frac{2\sqrt{2}}{e\ln p}. \end{align*} This constant depends only on $p$, and the desired estimate \begin{align*} \alpha\leq C_p n^{-1} \end{align*} follows. [guided] At the end of the iteration, there are two possible reasons we cannot continue. Either the affine subspace has dimension $0$, or the hypothesis needed for the Fourier step, \begin{align*} |W_r|=p^{m_r}\geq 2\gamma_r^{-2}, \end{align*} has failed. If the dimension is $0$, then $m_r=0$, which is already bounded by the desired expression. If the size condition fails, then \begin{align*} p^{m_r}<2\gamma_r^{-2}. \end{align*} The densities only increase during the iteration, so $\gamma_r\geq\alpha$. Hence \begin{align*} p^{m_r}<2\alpha^{-2}, \end{align*} and taking logarithms base $p$ gives \begin{align*} m_r\leq \log_p2+2\log_p(\alpha^{-1}). \end{align*} In the main case $N=p^n\geq 2\alpha^{-2}$, the iteration starts legitimately. Since each successful step lowers the dimension by one, \begin{align*} n=r+m_r. \end{align*} Using the bound $r<5/\alpha$ and the terminal dimension estimate, \begin{align*} n < \frac{5}{\alpha} + \log_p2 + 2\log_p(\alpha^{-1}). \end{align*} Multiplying by $\alpha$ gives \begin{align*} \alpha n < 5+\alpha\log_p2+2\alpha\log_p(\alpha^{-1}). \end{align*} For $0<u\leq 1$, the function $u\ln(u^{-1})$ has maximum $1/e$, attained at $u=e^{-1}$. Therefore \begin{align*} \alpha\log_p(\alpha^{-1}) = \frac{\alpha\ln(\alpha^{-1})}{\ln p} \leq \frac{1}{e\ln p}. \end{align*} Also $\alpha\leq 1$, so $\alpha\log_p2\leq \log_p2$. Thus \begin{align*} \alpha n \leq 5+\log_p2+\frac{2}{e\ln p}. \end{align*} If the main size condition fails at the original level, then \begin{align*} p^n<2\alpha^{-2}, \end{align*} which implies \begin{align*} \alpha<\sqrt{2}\,p^{-n/2}. \end{align*} The function $x\mapsto xp^{-x/2}$ on $(0,\infty)$ has maximum $2/(e\ln p)$, so \begin{align*} \alpha n<\frac{2\sqrt{2}}{e\ln p}. \end{align*} Combining the two cases, the estimate holds with \begin{align*} C_p := 5+\log_p 2+\frac{2}{e\ln p}+\frac{2\sqrt{2}}{e\ln p}. \end{align*} This constant depends only on the prime $p$, and therefore \begin{align*} \alpha\leq C_p n^{-1}. \end{align*} [/guided] [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Fourier Transform

What brings you to Androma?

Start with a route through the knowledge graph.