[proofplan]
We encode the set $A$ as a point $x_A$ in the two-sided symbolic space $\{0,1\}^{\mathbb{Z}}$ and take its orbit closure under the shift. Along intervals $\{1,\dots,N_j\}$ on which the density of $A$ tends to $\overline d(A)$, we average Dirac masses over the orbit of $x_A$ and pass to a weak subsequential limit. The empirical averages fail to be shift-invariant only through two endpoint terms, which contribute an error of order $1/N_j$, so the limiting measure is shift-invariant. Cylinder sets in the symbolic system record exactly the finite intersections of shifts of $A$. Passing to the limit therefore gives the correspondence inequality, and the one-coordinate cylinder $B$ gives $\mu(B)=\overline d(A)$.
[/proofplan]
[step:Choose density-realising intervals and build the symbolic orbit closure]
Set
\begin{align*}
d := \overline d(A).
\end{align*}
By the definition of the limit superior, choose a strictly increasing sequence $(N_j)_{j=1}^{\infty}$ in $\mathbb{N}$ such that
\begin{align*}
\lim_{j \to \infty} \frac{|A \cap \{1,\dots,N_j\}|}{N_j} = d.
\end{align*}
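The choice of $(N_j)$ matters when the density ratios oscillate. The following minimal Python sketch uses a hypothetical set chosen only for illustration, $A=\bigcup_{k\geq 0}\{4^k,\dots,2\cdot 4^k-1\}$. For this set the ratios tend to $\overline d(A)=\tfrac23$ along $N_j=2\cdot 4^j-1$, but they equal $\tfrac13$ exactly along $4^{j+1}-1$. So the upper density is realised only along a suitable sequence. The names `in_A` and `density_ratio` are illustrative.

```python
from fractions import Fraction

def in_A(n: int) -> bool:
    """Membership in the hypothetical set A = union of the blocks [4^k, 2*4^k)."""
    if n < 1:
        return False
    k = 0
    while 4 ** (k + 1) <= n:
        k += 1
    # now 4^k <= n < 4^(k+1); n lies in A iff it is in the first half of this range
    return n < 2 * 4 ** k

def density_ratio(N: int) -> Fraction:
    """Exact value of |A ∩ {1, ..., N}| / N."""
    return Fraction(sum(1 for n in range(1, N + 1) if in_A(n)), N)

# Ratios along N_j = 2*4^j - 1 (approaching 2/3) versus along 4^(j+1) - 1 (equal to 1/3).
for j in range(1, 7):
    print(j, float(density_ratio(2 * 4 ** j - 1)), float(density_ratio(4 ** (j + 1) - 1)))
```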
Let $\Omega := \{0,1\}^{\mathbb{Z}}$ with the [product topology](/page/Product%20Topology) and its Borel $\sigma$-algebra $\mathcal{B}(\Omega)$. Define the shift map
\begin{align*}
\sigma: \Omega &\to \Omega \\
x &\mapsto \sigma x,
\end{align*}
where
\begin{align*}
(\sigma x)(m) := x(m+1)
\end{align*}
for every $x \in \Omega$ and every $m \in \mathbb{Z}$. The map $\sigma$ is a homeomorphism, with inverse $\sigma^{-1}$ given by $(\sigma^{-1}x)(m)=x(m-1)$.
Define the indicator sequence of $A$ as the map
\begin{align*}
x_A: \mathbb{Z} &\to \{0,1\} \\
m &\mapsto x_A(m),
\end{align*}
where $x_A(m)=1$ if $m \in A$ and $x_A(m)=0$ otherwise. Since $A \subset \mathbb{N}$, this convention also gives $x_A(m)=0$ for $m \leq 0$.
Let
\begin{align*}
X := \overline{\{\sigma^n x_A : n \in \mathbb{Z}\}} \subset \Omega,
\end{align*}
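where the closure is taken in the product topology. Then $X$ is compact and metrizable, being a closed subset of the compact metrizable space $\Omega$. Moreover $\sigma(X)=X$, since the homeomorphism $\sigma$ maps the orbit $\{\sigma^n x_A : n \in \mathbb{Z}\}$ onto itself and hence maps its closure onto itself. Let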
\begin{align*}
T := \sigma|_X: X \to X
\end{align*}
be the restricted shift homeomorphism, and let $\mathcal{B}:=\mathcal{B}(X)$ be the Borel $\sigma$-algebra of $X$.
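The encoding can be sketched in Python by representing a point of $\Omega$ lazily as a function $\mathbb{Z}\to\{0,1\}$. This representation is chosen for illustration, and the names below are hypothetical. `shift(x, n)` realises $\sigma^n$ through $(\sigma^n x)(m)=x(m+n)$, and negative $n$ gives powers of $\sigma^{-1}$. `window` reads off the finitely many coordinates that a cylinder set can constrain.

```python
from typing import Callable, Tuple

Point = Callable[[int], int]  # a point of {0,1}^Z, evaluated coordinate by coordinate

def indicator(A) -> Point:
    """x_A: m -> 1 if m in A else 0 (so x_A(m) = 0 for m <= 0 when A ⊂ N)."""
    def x(m: int) -> int:
        return 1 if m in A else 0
    return x

def shift(x: Point, n: int = 1) -> Point:
    """sigma^n x, i.e. m -> x(m + n); negative n gives powers of sigma^{-1}."""
    def y(m: int) -> int:
        return x(m + n)
    return y

def window(x: Point, lo: int, hi: int) -> Tuple[int, ...]:
    """The coordinates (x(lo), ..., x(hi)), all that a cylinder on [lo, hi] can see."""
    return tuple(x(m) for m in range(lo, hi + 1))

xA = indicator({1, 2, 4, 7, 11})    # hypothetical finite example set
print(window(xA, -2, 12))
print(window(shift(xA, 3), -2, 8))  # coordinate 0 of sigma^3 x_A equals x_A(3)
```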
[/step]
[step:Extract a weak limit of empirical orbit measures]
For each $j \in \mathbb{N}$, define the Borel probability measure
\begin{align*}
\mu_j := \frac{1}{N_j}\sum_{n=1}^{N_j} \delta_{T^n x_A}
\end{align*}
on $(X,\mathcal{B})$, where $\delta_y$ denotes the Dirac probability measure at $y \in X$.
Since $X$ is compact and metrizable, the space of Borel probability measures on $X$ is sequentially compact in the [weak topology](/page/Weak%20Topology) (citing a result not yet in the wiki: weak compactness of probability measures on compact metric spaces). Therefore there is a subsequence $(N_{j_k})_{k=1}^{\infty}$ and a Borel probability measure $\mu$ on $X$ such that
\begin{align*}
\mu_{j_k} \to \mu
\end{align*}
weakly, meaning that for every continuous function $\varphi: X \to \mathbb{R}$,
\begin{align*}
\lim_{k \to \infty} \int_X \varphi(y)\, d\mu_{j_k}(y)
=
\int_X \varphi(y)\, d\mu(y).
\end{align*}
Because $(N_{j_k})_{k=1}^{\infty}$ is a subsequence of the density-realising sequence, we still have
\begin{align*}
\lim_{k \to \infty} \frac{|A \cap \{1,\dots,N_{j_k}\}|}{N_{j_k}} = d.
\end{align*}
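To see concretely what $\mu_j$ records, note that $(T^n x_A)(m)=x_A(n+m)$. Hence the mass that $\mu_j$ gives to a cylinder $\{y \in X : (y(\mathtt{lo}),\dots,y(\mathtt{hi}))=w\}$ is the frequency of the word $w$ among the blocks $(x_A(n+\mathtt{lo}),\dots,x_A(n+\mathtt{hi}))$ for $1\leq n\leq N_j$. The following minimal Python sketch computes these masses. The function name `empirical_block_masses` and the example set are hypothetical. Integrating against $\mu_j$ a function that depends only on the coordinates $\mathtt{lo},\dots,\mathtt{hi}$ then reduces to a finite sum over these words.

```python
from collections import Counter
from fractions import Fraction

def empirical_block_masses(A, N, lo, hi):
    """Masses that mu_N = (1/N) * sum_{n=1}^N delta_{T^n x_A} gives to the
    cylinders {y : (y(lo), ..., y(hi)) = w}, one entry per word w that occurs.

    Uses (T^n x_A)(m) = x_A(n + m), so each block is read directly off A.
    """
    counts = Counter(
        tuple(1 if n + m in A else 0 for m in range(lo, hi + 1))
        for n in range(1, N + 1)
    )
    return {w: Fraction(c, N) for w, c in counts.items()}

# Hypothetical example set: multiples of 3 or 5 (listed far enough past N + hi).
A = {n for n in range(1, 200) if n % 3 == 0 or n % 5 == 0}
masses = empirical_block_masses(A, 150, 0, 1)
print(masses)
```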
[guided]
For each interval $\{1,\dots,N_j\}$, we put equal mass on the orbit points
\begin{align*}
T x_A,\; T^2x_A,\; \dots,\; T^{N_j}x_A.
\end{align*}
This produces the probability measure
\begin{align*}
\mu_j := \frac{1}{N_j}\sum_{n=1}^{N_j} \delta_{T^n x_A}.
\end{align*}
The compactness input is used only to guarantee a limiting probability measure. Since $X$ is compact and metrizable, the family of Borel probability measures on $X$ is weakly sequentially compact (citing a result not yet in the wiki: weak compactness of probability measures on compact metric spaces). Hence, after passing to a subsequence $(N_{j_k})_{k=1}^{\infty}$, there is a Borel probability measure $\mu$ on $X$ such that for every continuous function $\varphi: X \to \mathbb{R}$,
\begin{align*}
\lim_{k \to \infty} \int_X \varphi(y)\, d\mu_{j_k}(y)
=
\int_X \varphi(y)\, d\mu(y).
\end{align*}
Passing to this subsequence does not disturb the density limit, because every subsequence of a convergent sequence has the same limit:
\begin{align*}
\lim_{k \to \infty} \frac{|A \cap \{1,\dots,N_{j_k}\}|}{N_{j_k}} = d.
\end{align*}
[/guided]
[/step]
[step:Show that the limiting measure is shift-invariant]
Let $\varphi: X \to \mathbb{R}$ be continuous. For each $j \in \mathbb{N}$,
\begin{align*}
\int_X \varphi(Ty)\, d\mu_j(y)
&=
\frac{1}{N_j}\sum_{n=1}^{N_j} \varphi(T^{n+1}x_A),\\
\int_X \varphi(y)\, d\mu_j(y)
&=
\frac{1}{N_j}\sum_{n=1}^{N_j} \varphi(T^n x_A).
\end{align*}
Subtracting the two finite sums gives
\begin{align*}
\left|
\int_X \varphi(Ty)\, d\mu_j(y)
-
\int_X \varphi(y)\, d\mu_j(y)
\right|
=
\frac{1}{N_j}
\left|
\varphi(T^{N_j+1}x_A)-\varphi(Tx_A)
\right|
\leq
\frac{2\|\varphi\|_\infty}{N_j}.
\end{align*}
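Since $\varphi \circ T$ is also continuous, we may set $j=j_k$ and let $k \to \infty$. The right-hand side tends to $0$, and [weak convergence](/page/Weak%20Convergence) applied to both $\varphi \circ T$ and $\varphi$ gives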
\begin{align*}
\int_X \varphi(Ty)\, d\mu(y)
=
\int_X \varphi(y)\, d\mu(y).
\end{align*}
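Since this holds for every continuous function $\varphi: X \to \mathbb{R}$, the pushforward measure $T_*\mu=\mu\circ T^{-1}$ and $\mu$ have the same integrals against all continuous functions. Continuous functions determine Borel probability measures on compact metric spaces (citing a result not yet in the wiki: uniqueness of Borel probability measures from integrals against continuous functions), so $T_*\mu=\mu$, that is, $\mu(T^{-1}E)=\mu(E)$ for every $E \in \mathcal{B}$. Because $T$ is a homeomorphism, $T^{-1}$ is Borel measurable and also preserves $\mu$. Thus $(X,\mathcal{B},\mu,T)$ is an invertible measure-preserving system.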
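The telescoping identity and the bound $2\|\varphi\|_\infty/N_j$ can be checked exactly with rational arithmetic. The following minimal Python sketch assumes a hypothetical test function $\varphi$ that depends only on the coordinates $0,1,2$. Such locally constant functions are continuous on $X$. The set $A$ and all helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical example set, chosen only for illustration.
A = {n for n in range(1, 500) if n % 4 in (1, 2) or n % 7 == 0}

def block(n):
    """Coordinates 0, 1, 2 of T^n x_A, i.e. (x_A(n), x_A(n+1), x_A(n+2))."""
    return tuple(1 if n + m in A else 0 for m in range(3))

def phi(w):
    """Hypothetical locally constant (hence continuous) test function on X."""
    return 2 * w[0] - w[1] * w[2]

SUP_PHI = 2  # max |phi| over all 0/1 words of length 3

def invariance_defect(N):
    """Return  ∫ phi∘T dmu_N - ∫ phi dmu_N  for mu_N = (1/N) sum_{n=1}^N delta_{T^n x_A}."""
    with_T = Fraction(sum(phi(block(n + 1)) for n in range(1, N + 1)), N)
    without_T = Fraction(sum(phi(block(n)) for n in range(1, N + 1)), N)
    # Telescoping: only the endpoint terms n = N + 1 and n = 1 survive.
    assert with_T - without_T == Fraction(phi(block(N + 1)) - phi(block(1)), N)
    # The bound 2 * ||phi||_inf / N from the text.
    assert abs(with_T - without_T) <= Fraction(2 * SUP_PHI, N)
    return with_T - without_T

for N in (10, 100, 400):
    print(N, invariance_defect(N))
```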
[guided]
The only possible failure of exact shift-invariance for the empirical measures comes from the two endpoints of the finite orbit segment. Let $\varphi: X \to \mathbb{R}$ be continuous. Because $\mu_j$ is the average of the Dirac masses at $T^n x_A$ for $1 \leq n \leq N_j$, integrating $\varphi \circ T$ against $\mu_j$ gives
\begin{align*}
\int_X \varphi(Ty)\, d\mu_j(y)
=
\frac{1}{N_j}\sum_{n=1}^{N_j} \varphi(T^{n+1}x_A).
\end{align*}
On the other hand,
\begin{align*}
\int_X \varphi(y)\, d\mu_j(y)
=
\frac{1}{N_j}\sum_{n=1}^{N_j} \varphi(T^n x_A).
\end{align*}
The two sums contain the same middle terms:
\begin{align*}
\varphi(T^2x_A),\dots,\varphi(T^{N_j}x_A).
\end{align*}
Only $\varphi(Tx_A)$ and $\varphi(T^{N_j+1}x_A)$ remain. Therefore
\begin{align*}
\left|
\int_X \varphi(Ty)\, d\mu_j(y)
-
\int_X \varphi(y)\, d\mu_j(y)
\right|
=
\frac{1}{N_j}
\left|
\varphi(T^{N_j+1}x_A)-\varphi(Tx_A)
\right|
\leq
\frac{2\|\varphi\|_\infty}{N_j}.
\end{align*}
The right-hand side tends to $0$ along the subsequence $j=j_k$. Since $\varphi$ and $\varphi \circ T$ are continuous and $\mu_{j_k}$ converges weakly to $\mu$, we may pass to the limit and obtain
\begin{align*}
\int_X \varphi(Ty)\, d\mu(y)
=
\int_X \varphi(y)\, d\mu(y).
\end{align*}
This identity for all continuous test functions is exactly the weak formulation of $T$-invariance of $\mu$, using the fact that continuous functions determine Borel probability measures on compact metric spaces (citing a result not yet in the wiki: uniqueness of Borel probability measures from integrals against continuous functions). Since $T$ is a homeomorphism of $X$, the system is invertible as well as measure-preserving.
[/guided]
[/step]
[step:Define the cylinder set representing membership in $A$]
Define the coordinate-zero cylinder
\begin{align*}
B := \{x \in X : x(0)=1\}.
\end{align*}
The coordinate projection $x \mapsto x(0)$ is continuous on $X$, and $\{1\}$ is open and closed in $\{0,1\}$, so $B$ is a clopen subset of $X$. In particular, $B \in \mathcal{B}$ and its indicator function $\mathbb{1}_B: X \to \mathbb{R}$ is continuous.
For every $n \in \mathbb{N}$,
\begin{align*}
T^n x_A \in B
\iff
(T^n x_A)(0)=1
\iff
x_A(n)=1
\iff
n \in A.
\end{align*}
Therefore
\begin{align*}
\mu_j(B)
=
\frac{1}{N_j}\sum_{n=1}^{N_j}\mathbb{1}_B(T^n x_A)
=
\frac{|A \cap \{1,\dots,N_j\}|}{N_j}.
\end{align*}
Passing to the subsequence and using weak convergence against the continuous function $\mathbb{1}_B$ gives
\begin{align*}
\mu(B)
=
\lim_{k \to \infty}\mu_{j_k}(B)
=
\lim_{k \to \infty}\frac{|A \cap \{1,\dots,N_{j_k}\}|}{N_{j_k}}
=
d.
\end{align*}
Thus $\mu(B)=\overline d(A)$.
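As an illustrative sanity check, which is not part of the argument, take $A=\{2,4,6,\dots\}$, so $d=\tfrac12$. Let $p \in \Omega$ be the periodic point with $p(m)=1$ for even $m$ and $p(m)=0$ for odd $m$. For $m \geq 1-n$ we have $n+m \geq 1$, so
\begin{align*}
(T^n x_A)(m) = x_A(n+m) =
\begin{cases}
p(m), & n \text{ even},\\
(\sigma p)(m), & n \text{ odd}.
\end{cases}
\end{align*}
Hence $T^n x_A \to p$ along even $n$ and $T^n x_A \to \sigma p$ along odd $n$ in the product topology. In particular $p,\sigma p \in X$, and for every continuous $\varphi$ the integrals $\int_X \varphi\, d\mu_j$ converge to $\tfrac12(\varphi(p)+\varphi(\sigma p))$. In this example no subsequence is needed, and
\begin{align*}
\mu = \tfrac12\left(\delta_p+\delta_{\sigma p}\right),
\qquad
\mu(B) = \tfrac12\left(p(0)+(\sigma p)(0)\right) = \tfrac12(1+0) = \tfrac12 = \overline d(A).
\end{align*}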
[/step]
[step:Identify finite intersections of shifted cylinders with shifted intersections of $A$]
Let $F \subset \mathbb{Z}$ be finite. Define the cylinder set
\begin{align*}
C_F := \bigcap_{h \in F} T^{-h}B.
\end{align*}
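If $F=\varnothing$, then $C_F=X$ by the convention for an empty intersection. If $F \neq \varnothing$, each $T^{-h}B$ is clopen because $T$ is a homeomorphism. So $C_F$ is a finite intersection of clopen sets and hence clopen. In either case $C_F \in \mathcal{B}$ and $\mathbb{1}_{C_F}: X \to \mathbb{R}$ is continuous.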
For each $n \in \mathbb{N}$,
\begin{align*}
T^n x_A \in C_F
&\iff
T^n x_A \in T^{-h}B \quad \text{for every } h \in F\\
&\iff
T^{n+h}x_A \in B \quad \text{for every } h \in F\\
&\iff
x_A(n+h)=1 \quad \text{for every } h \in F\\
&\iff
n+h \in A \quad \text{for every } h \in F\\
&\iff
n \in \bigcap_{h \in F}(A-h).
\end{align*}
Consequently,
\begin{align*}
\mu_j(C_F)
=
\frac{1}{N_j}
\left|
\left(\bigcap_{h \in F}(A-h)\right)\cap \{1,\dots,N_j\}
\right|.
\end{align*}
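The counting identity can be checked mechanically. The following minimal Python sketch uses hypothetical helper names and an illustrative set $A$. It computes $\mu_N(C_F)$ by testing the coordinates $x_A(n+h)$ of each orbit point. Separately, it counts $\bigl(\bigcap_{h\in F}(A-h)\bigr)\cap\{1,\dots,N\}$ from the shifted copies $A-h$. Here $F$ may contain negative shifts, and $F=\varnothing$ gives mass $1$, matching $C_\varnothing=X$.

```python
from fractions import Fraction

# Hypothetical example set, listed well past N + max(F) for the checks below.
A = {n for n in range(1, 300) if n % 6 in (0, 1, 4)}

def cylinder_mass(A, F, N):
    """mu_N(C_F): fraction of n in {1, ..., N} with T^n x_A in C_F,
    i.e. (T^n x_A)(h) = x_A(n + h) = 1 for every h in F."""
    hits = sum(1 for n in range(1, N + 1) if all(n + h in A for h in F))
    return Fraction(hits, N)

def intersection_count(A, F, N):
    """|(∩_{h in F} (A - h)) ∩ {1, ..., N}| / N, built from A - h = {a - h : a in A};
    an empty F leaves all of {1, ..., N}."""
    S = set(range(1, N + 1))
    for h in F:
        S &= {a - h for a in A}
    return Fraction(len(S), N)

for F in [(), (0,), (0, 1), (0, 3, -2)]:
    print(F, cylinder_mass(A, F, 200), intersection_count(A, F, 200))
```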
[guided]
The cylinder $C_F$ consists of the points of $X$ whose coordinates indexed by $F$ are all equal to $1$. Its definition expresses this condition through the dynamics:
\begin{align*}
C_F := \bigcap_{h \in F} T^{-h}B.
\end{align*}
For a point $x \in X$, membership in $T^{-h}B$ means $T^h x \in B$, which means the zeroth coordinate of $T^h x$ is $1$. Since $T$ is the left shift, this is the same as saying that the $h$-th coordinate of $x$ is $1$.
Now evaluate this at the orbit point $T^n x_A$. For each $n \in \mathbb{N}$,
\begin{align*}
T^n x_A \in C_F
&\iff
T^{n+h}x_A \in B \quad \text{for every } h \in F\\
&\iff
(T^{n+h}x_A)(0)=1 \quad \text{for every } h \in F\\
&\iff
x_A(n+h)=1 \quad \text{for every } h \in F\\
&\iff
n+h \in A \quad \text{for every } h \in F\\
&\iff
n \in \bigcap_{h \in F}(A-h).
\end{align*}
This equivalence is the core of the correspondence principle: the orbit point $T^n x_A$ lies in the cylinder $C_F$ exactly when $n$ belongs to the finite intersection pattern $\bigcap_{h \in F}(A-h)$. Averaging the indicator of $C_F$ over the first $N_j$ orbit points therefore makes $\mu_j(C_F)$ the relative frequency of this pattern in $\{1,\dots,N_j\}$:
\begin{align*}
\mu_j(C_F)
=
\frac{1}{N_j}\sum_{n=1}^{N_j}\mathbb{1}_{C_F}(T^n x_A)
=
\frac{1}{N_j}
\left|
\left(\bigcap_{h \in F}(A-h)\right)\cap \{1,\dots,N_j\}
\right|.
\end{align*}
[/guided]
[/step]
[step:Pass to the limit and obtain the correspondence inequality]
Since $\mathbb{1}_{C_F}: X \to \mathbb{R}$ is continuous, weak convergence gives
\begin{align*}
\mu(C_F)
=
\lim_{k \to \infty} \mu_{j_k}(C_F).
\end{align*}
Using the counting identity from the previous step, together with the fact that a limit along a subsequence never exceeds the limit superior,
\begin{align*}
\mu(C_F)
&=
\lim_{k \to \infty}
\frac{1}{N_{j_k}}
\left|
\left(\bigcap_{h \in F}(A-h)\right)\cap \{1,\dots,N_{j_k}\}
\right|\\
&\leq
\limsup_{N \to \infty}
\frac{1}{N}
\left|
\left(\bigcap_{h \in F}(A-h)\right)\cap \{1,\dots,N\}
\right|\\
&=
\overline d\left(\bigcap_{h \in F}(A-h)\right).
\end{align*}
Because $C_F=\bigcap_{h \in F}T^{-h}B$, this is exactly
\begin{align*}
\mu\left(\bigcap_{h \in F}T^{-h}B\right)
\leq
\overline d\left(\bigcap_{h \in F}(A-h)\right).
\end{align*}
Together with $\mu(B)=\overline d(A)$ and the $T$-invariance of $\mu$, this proves the Furstenberg correspondence principle.
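As a further hypothetical sanity check, consider a periodic set $A=\{n\geq 1 : n \bmod q \in R\}$. Let $p(m)=1$ exactly when $m \bmod q \in R$. The orbit point $T^n x_A$ agrees with $\sigma^n p$ on all coordinates $m \geq 1-n$, so the limit measure is $\mu=\frac1q\sum_{r=0}^{q-1}\delta_{\sigma^r p}$. In this case the densities exist and the correspondence inequality holds with equality. The following minimal Python sketch compares $\mu(C_F)$ with the frequency of $\bigcap_{h\in F}(A-h)$ in $\{1,\dots,N\}$. The helper name `correspondence_check` and the choice $R=\{0,1\}$, $q=3$ are illustrative.

```python
from fractions import Fraction

def correspondence_check(residues, q, F, N):
    """Hypothetical helper. For A = {n >= 1 : n mod q in residues}, return
    (mu(C_F), |(∩_{h in F}(A - h)) ∩ {1..N}| / N), where mu is the uniform
    average of the Dirac masses at sigma^r p, r = 0, ..., q - 1."""

    def p(m):
        # the periodic point: p(m) = 1 iff m mod q is an allowed residue
        return 1 if m % q in residues else 0

    def in_A(m):
        # membership in A; A contains no m <= 0
        return m >= 1 and m % q in residues

    # (sigma^r p)(h) = p(r + h), so sigma^r p lies in C_F iff p(r + h) = 1 for all h in F
    mu_CF = Fraction(sum(all(p(r + h) for h in F) for r in range(q)), q)
    freq = Fraction(sum(all(in_A(n + h) for h in F) for n in range(1, N + 1)), N)
    return mu_CF, freq

for F in [(0,), (0, 1), (-1, 0), (0, 3)]:
    print(F, correspondence_check({0, 1}, 3, F, 3000))
```

The frequencies differ from $\mu(C_F)$ only by boundary effects of order $1/N$, coming from shifts $n+h\leq 0$ and from $N$ not being a multiple of $q$.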
[/step]