[proofplan]
We prove Selberg's upper bound by replacing the indicator of the sifted condition with a quadratic divisor sum whose weights satisfy $\lambda_1=1$. Expanding the square converts the problem into a finite quadratic form involving the divisibility counts $A_m$, and the assumed decomposition $A_m=Xg(m)+r_m$ separates the main term from the remainders. The main term is evaluated by diagonalizing the finite Selberg quadratic form and solving its constrained minimization problem. The remainder term is then bounded by the finite triangle inequality.
[/proofplan]
[step:Majorize the sifted indicator by a Selberg square]
Define the divisor-sum map $L:A\to\mathbb R$ by
\begin{align*}
L(a):=\sum_{\substack{d\in\mathcal D: d\mid a}}\lambda_d.
\end{align*}
This is a finite real sum because $\mathcal D$ is finite. If $\gcd(a,P(z))=1$ and $d\in\mathcal D$ divides $a$, then $d$ divides both $a$ and $P(z)$, so $d=1$. Since $\lambda_1=1$, this gives $L(a)=1$ on the sifted elements. Hence, for every $a\in A$,
\begin{align*}
\mathbb{1}_{\{b\in A:\gcd(b,P(z))=1\}}(a)\leq L(a)^2.
\end{align*}
Summing over the finite set $A$ gives
\begin{align*}
S(A,P,z)\leq \sum_{a\in A}L(a)^2.
\end{align*}
[guided]
The Selberg sieve starts by replacing the hard condition $\gcd(a,P(z))=1$ with a square that can be expanded. Define the divisor-sum map $L:A\to\mathbb R$ by
\begin{align*}
L(a):=\sum_{\substack{d\in\mathcal D: d\mid a}}\lambda_d.
\end{align*}
The sum is finite because $\mathcal D=\{d\in\mathbb N:d<D\text{ and }d\mid P(z)\}$ is finite.
Now take an element $a\in A$ satisfying $\gcd(a,P(z))=1$. If $d\in\mathcal D$ and $d\mid a$, then $d\mid P(z)$ by the definition of $\mathcal D$, so $d$ is a common divisor of $a$ and $P(z)$. The coprimality condition forces $d=1$. Therefore the divisor sum collapses to
\begin{align*}
L(a)=\lambda_1=1.
\end{align*}
On the other hand, if $a$ is not sifted, the indicator of the sifted condition is $0$ and the square $L(a)^2$ is nonnegative. Thus the pointwise inequality
\begin{align*}
\mathbb{1}_{\{b\in A:\gcd(b,P(z))=1\}}(a)\leq L(a)^2
\end{align*}
holds for every $a\in A$. Summing this pointwise inequality over the finite set $A$ gives
\begin{align*}
S(A,P,z)\leq \sum_{a\in A}L(a)^2.
\end{align*}
[/guided]
[/step]
[step:Expand the square and identify the divisibility counts]
Using the definition of $L(a)$, we obtain
\begin{align*}
\sum_{a\in A}L(a)^2=\sum_{a\in A}\sum_{\substack{d,e\in\mathcal D: d\mid a,\ e\mid a}}\lambda_d\lambda_e.
\end{align*}
All sums are finite, so interchanging the order of summation gives
\begin{align*}
\sum_{a\in A}L(a)^2=\sum_{d,e\in\mathcal D}\lambda_d\lambda_e\#\{a\in A:d\mid a\text{ and }e\mid a\}.
\end{align*}
For $d,e\in\mathcal D$, the integer $\operatorname{lcm}(d,e)$ is a squarefree divisor of $P(z)$, and the conditions $d\mid a$ and $e\mid a$ are equivalent to $\operatorname{lcm}(d,e)\mid a$. Therefore
\begin{align*}
\#\{a\in A:d\mid a\text{ and }e\mid a\}=A_{\operatorname{lcm}(d,e)}.
\end{align*}
Combining this identity with the preceding step yields
\begin{align*}
S(A,P,z)\leq \sum_{d,e\in\mathcal D}\lambda_d\lambda_e A_{\operatorname{lcm}(d,e)}.
\end{align*}
[/step]
[step:Separate the main term from the remainders]
For $d,e\in\mathcal D$, the integer $\operatorname{lcm}(d,e)$ is a squarefree divisor of $P(z)$, so the assumed decomposition applies:
\begin{align*}
A_{\operatorname{lcm}(d,e)}=Xg(\operatorname{lcm}(d,e))+r_{\operatorname{lcm}(d,e)}.
\end{align*}
Substituting this into the previous bound gives
\begin{align*}
S(A,P,z)\leq X\sum_{d,e\in\mathcal D}\lambda_d\lambda_e g(\operatorname{lcm}(d,e))+\sum_{d,e\in\mathcal D}\lambda_d\lambda_e r_{\operatorname{lcm}(d,e)}.
\end{align*}
By the definition of $Q$, the first finite sum is $Q(\lambda)$, so
\begin{align*}
S(A,P,z)\leq XQ(\lambda)+\sum_{d,e\in\mathcal D}\lambda_d\lambda_e r_{\operatorname{lcm}(d,e)}.
\end{align*}
[/step]
[step:Diagonalize the Selberg quadratic form and compute its minimum]
Since $0\leq g(p)<1$ for every prime $p<z$, each $h(d)$ is finite and nonnegative. Since $1\in\mathcal D$ and $h(1)=1$, we have $G(D,z)>0$.
[claim:The optimal Selberg weights satisfy $Q(\lambda)=1/G(D,z)$]
For every $\mu\in\mathbb R^{\mathcal D}$ satisfying $\mu_1=1$,
\begin{align*}
Q(\mu)\geq \frac{1}{G(D,z)}.
\end{align*}
The unique minimizer is the optimal Selberg weight vector $\lambda$, and it satisfies
\begin{align*}
Q(\lambda)=\frac{1}{G(D,z)}.
\end{align*}
[/claim]
[proof]
Define $b:\mathcal D\to\mathbb R$ by
\begin{align*}
b(r):=\prod_{p\mid r}(1-g(p)).
\end{align*}
For each $\mu\in\mathbb R^{\mathcal D}$, define $y(\mu):\mathcal D\to\mathbb R$ by
\begin{align*}
y_r(\mu):=b(r)\sum_{\substack{d\in\mathcal D: r\mid d}}g(d/r)\mu_d.
\end{align*}
This is well-defined because $\mathcal D$ is closed under divisors, and if $r\mid d$ with $r,d\in\mathcal D$, then $d/r$ is squarefree and divides $P(z)$.
For $r,d\in\mathcal D$, define $c_{r,d}:=0$ if $r\nmid d$, and define
\begin{align*}
c_{r,d}:=b(r)g(d/r)
\end{align*}
if $r\mid d$. We verify the factorization
\begin{align*}
g(\operatorname{lcm}(d,e))=\sum_{r\in\mathcal D}h(r)c_{r,d}c_{r,e}
\end{align*}
for fixed $d,e\in\mathcal D$. The function $g$ is multiplicative on squarefree divisors and $g(1)=1$, so it is enough to compare the local contribution of each prime dividing $P(z)$. If a prime $p$ divides neither $d$ nor $e$, the local contribution is $1$. If $p$ divides exactly one of $d,e$, only the local choice with $p\nmid r$ contributes, and the local contribution is $g(p)$. If $p$ divides both $d$ and $e$, the local contribution from $p\nmid r$ is $g(p)^2$, while the local contribution from $p\mid r$ is
\begin{align*}
\frac{g(p)}{1-g(p)}(1-g(p))^2=g(p)(1-g(p)).
\end{align*}
The sum in this last case is $g(p)$. Thus the product of the local factors is exactly $g(\operatorname{lcm}(d,e))$.
Substituting the factorization into $Q(\mu)$ and collecting terms gives the diagonal identity
\begin{align*}
Q(\mu)=\sum_{r\in\mathcal D}h(r)y_r(\mu)^2.
\end{align*}
The finite inclusion-exclusion identity on the divisor poset gives
\begin{align*}
\mu_1=\sum_{r\in\mathcal D}\mu_{\mathrm{Mob}}(r)h(r)y_r(\mu),
\end{align*}
where $\mu_{\mathrm{Mob}}$ denotes the ordinary Mobius function. Indeed, the coefficient of $\mu_d$ in the right-hand side is
\begin{align*}
\sum_{r\mid d}\mu_{\mathrm{Mob}}(r)h(r)b(r)g(d/r).
\end{align*}
Since $h(r)b(r)=g(r)$ and $g(r)g(d/r)=g(d)$ for $r\mid d$, this coefficient equals
\begin{align*}
g(d)\sum_{r\mid d}\mu_{\mathrm{Mob}}(r).
\end{align*}
The divisor sum of the Mobius function is $1$ when $d=1$ and $0$ when $d>1$, proving the identity.
Now assume $\mu_1=1$. Applying the [Cauchy-Schwarz Inequality](/theorems/432) to the finite real vectors $(h(r)^{1/2}y_r(\mu))_{r\in\mathcal D}$ and $(\mu_{\mathrm{Mob}}(r)h(r)^{1/2})_{r\in\mathcal D}$ gives
\begin{align*}
1\leq \left(\sum_{r\in\mathcal D}h(r)y_r(\mu)^2\right)\left(\sum_{r\in\mathcal D}h(r)\right).
\end{align*}
Using the diagonal identity and the definition of $G(D,z)$, this becomes
\begin{align*}
Q(\mu)\geq \frac{1}{G(D,z)}.
\end{align*}
Equality holds precisely when the two Cauchy-Schwarz vectors are proportional. Taking
\begin{align*}
y_r=\frac{\mu_{\mathrm{Mob}}(r)}{G(D,z)}
\end{align*}
for every $r\in\mathcal D$ gives the constraint
\begin{align*}
\sum_{r\in\mathcal D}\mu_{\mathrm{Mob}}(r)h(r)y_r=1
\end{align*}
and equality in Cauchy-Schwarz. The change of variables from $\mu$ to $y(\mu)$ is triangular with nonzero diagonal entries $b(r)$, because every factor $1-g(p)$ is positive. Hence there is a unique vector $\lambda\in\mathbb R^{\mathcal D}$ with $\lambda_1=1$ giving these $y$-values. By the definition in the theorem statement, this vector is the optimal Selberg weight vector. Therefore
\begin{align*}
Q(\lambda)=\frac{1}{G(D,z)}.
\end{align*}
[/proof]
[guided]
The point of this step is to compute the best possible value of the main quadratic expression. First note that the quantities defining $h$ are legitimate: because $0\leq g(p)<1$, every denominator $1-g(p)$ is positive, so every factor $g(p)/(1-g(p))$ is finite and nonnegative. Also $1\in\mathcal D$ because $D>1$, and the empty product gives $h(1)=1$. Hence
\begin{align*}
G(D,z)=\sum_{d\in\mathcal D}h(d)>0.
\end{align*}
Let $\mu\in\mathbb R^{\mathcal D}$ satisfy $\mu_1=1$. Define $b:\mathcal D\to\mathbb R$ by
\begin{align*}
b(r):=\prod_{p\mid r}(1-g(p)),
\end{align*}
and define the transformed variables $y(\mu):\mathcal D\to\mathbb R$ by
\begin{align*}
y_r(\mu):=b(r)\sum_{\substack{d\in\mathcal D: r\mid d}}g(d/r)\mu_d.
\end{align*}
This definition uses only values of $g$ in its stated domain: if $r\mid d$ and $r,d\in\mathcal D$, then $d/r$ is squarefree and divides $P(z)$.
We now show that these variables diagonalize the quadratic form. For $r,d\in\mathcal D$, define $c_{r,d}:=0$ when $r\nmid d$, and define
\begin{align*}
c_{r,d}:=b(r)g(d/r)
\end{align*}
when $r\mid d$. We compare local factors prime by prime. The explicit assumption $g(1)=1$, together with multiplicativity on squarefree divisors, means that the value of $g$ on a squarefree divisor is the product of its prime factors' values. If a prime $p$ divides neither $d$ nor $e$, the local factor is $1$. If $p$ divides exactly one of $d,e$, only the choice $p\nmid r$ can occur, and the local factor is $g(p)$. If $p$ divides both $d$ and $e$, then the contribution with $p\nmid r$ is $g(p)^2$, while the contribution with $p\mid r$ is
\begin{align*}
\frac{g(p)}{1-g(p)}(1-g(p))^2=g(p)(1-g(p)).
\end{align*}
Adding these two contributions gives $g(p)$. This is exactly the local factor of $g(\operatorname{lcm}(d,e))$. Multiplying over the finitely many primes involved gives
\begin{align*}
g(\operatorname{lcm}(d,e))=\sum_{r\in\mathcal D}h(r)c_{r,d}c_{r,e}.
\end{align*}
Substitution into the finite quadratic form gives
\begin{align*}
Q(\mu)=\sum_{r\in\mathcal D}h(r)y_r(\mu)^2.
\end{align*}
The constraint $\mu_1=1$ must also be written in terms of the $y$-variables. Let $\mu_{\mathrm{Mob}}$ denote the ordinary Mobius function. We claim that
\begin{align*}
\mu_1=\sum_{r\in\mathcal D}\mu_{\mathrm{Mob}}(r)h(r)y_r(\mu).
\end{align*}
To verify this, expand the right-hand side and collect the coefficient of a fixed $\mu_d$. That coefficient is
\begin{align*}
\sum_{r\mid d}\mu_{\mathrm{Mob}}(r)h(r)b(r)g(d/r).
\end{align*}
Since $h(r)b(r)=g(r)$ and $g(r)g(d/r)=g(d)$ for $r\mid d$, this equals
\begin{align*}
g(d)\sum_{r\mid d}\mu_{\mathrm{Mob}}(r).
\end{align*}
The standard Mobius inversion identity says that the divisor sum of $\mu_{\mathrm{Mob}}$ is $1$ for $d=1$ and $0$ for $d>1$. Hence the whole expression is exactly $\mu_1$.
Now apply the [Cauchy-Schwarz Inequality](/theorems/432) to the two finite real vectors
\begin{align*}
(h(r)^{1/2}y_r(\mu))_{r\in\mathcal D}
\end{align*}
and
\begin{align*}
(\mu_{\mathrm{Mob}}(r)h(r)^{1/2})_{r\in\mathcal D}.
\end{align*}
The finite-dimensional hypotheses of Cauchy-Schwarz are satisfied because $\mathcal D$ is finite and all entries are real. Since $\mu_1=1$, the preceding identity gives
\begin{align*}
1\leq \left(\sum_{r\in\mathcal D}h(r)y_r(\mu)^2\right)\left(\sum_{r\in\mathcal D}h(r)\right).
\end{align*}
Using the diagonalization and the definition of $G(D,z)$, this yields
\begin{align*}
Q(\mu)\geq \frac{1}{G(D,z)}.
\end{align*}
Finally we identify the minimizer with the optimal weights appearing in the theorem statement. Equality in Cauchy-Schwarz occurs when the two finite vectors are proportional, and this is achieved by setting
\begin{align*}
y_r=\frac{\mu_{\mathrm{Mob}}(r)}{G(D,z)}
\end{align*}
for every $r\in\mathcal D$. These values satisfy the constraint because
\begin{align*}
\sum_{r\in\mathcal D}\mu_{\mathrm{Mob}}(r)h(r)y_r=1.
\end{align*}
The change from $\mu$ to $y(\mu)$ is triangular with diagonal coefficient $b(r)>0$, so it determines a unique vector $\lambda\in\mathbb R^{\mathcal D}$ with $\lambda_1=1$. This vector is, by definition, the optimal Selberg weight vector. Therefore
\begin{align*}
Q(\lambda)=\frac{1}{G(D,z)}.
\end{align*}
[/guided]
[/step]
[step:Bound the remainder and conclude the upper bound]
From the preceding steps,
\begin{align*}
S(A,P,z)\leq XQ(\lambda)+\sum_{d,e\in\mathcal D}\lambda_d\lambda_e r_{\operatorname{lcm}(d,e)}.
\end{align*}
The minimization step gives
\begin{align*}
XQ(\lambda)=\frac{X}{G(D,z)}.
\end{align*}
For the remainder term, the finite triangle inequality gives
\begin{align*}
\sum_{d,e\in\mathcal D}\lambda_d\lambda_e r_{\operatorname{lcm}(d,e)}\leq \sum_{d,e\in\mathcal D}|\lambda_d\lambda_e|\,|r_{\operatorname{lcm}(d,e)}|.
\end{align*}
Therefore
\begin{align*}
S(A,P,z)\leq \frac{X}{G(D,z)}+\sum_{d,e\in\mathcal D}|\lambda_d\lambda_e|\,|r_{\operatorname{lcm}(d,e)}|.
\end{align*}
Since $\mathcal D=\{d\in\mathbb N:d<D\text{ and }d\mid P(z)\}$, this is equivalently
\begin{align*}
S(A,P,z)\leq \frac{X}{G(D,z)}+\sum_{\substack{d,e<D,\ d\mid P(z),\ e\mid P(z)}}|\lambda_d\lambda_e|\,|r_{\operatorname{lcm}(d,e)}|.
\end{align*}
This is the stated Selberg upper bound.
[/step]