[proofplan]
We prove a statement stronger than the very weak Bernoulli condition requires: conditioning on any positive-measure past cylinder does not change the distribution of any finite future block. This follows directly from the product structure of the Bernoulli measure: a joint cylinder involving disjoint past and future coordinates has measure equal to the product of its past and future factors. Therefore the conditional distribution of each future block equals its unconditional product distribution, the diagonal coupling is admissible for the $\bar d_n$-distance, and its cost is $0$.
[/proofplan]
[step:Define the past cylinders and future name distributions]
For each coordinate index $j\in\mathbb Z$, define the coordinate map
\begin{align*}
\pi_j:X\to A,\quad \pi_j(x)=x_j.
\end{align*}
Fix a block length $n\in\mathbb N$ and a gap $g\in\mathbb N\cup\{0\}$. Define the future-name map
\begin{align*}
\Phi_n:X\to A^n,\quad \Phi_n(x)=(x_0,x_1,\dots,x_{n-1}).
\end{align*}
Let $C\subset X$ be a past cylinder determined by finitely many coordinates in $\{\dots,-g-2,-g-1\}$, and assume $\mu(C)>0$. Thus there are indices $i_1,\dots,i_m\in\mathbb Z$ with $i_r\le -g-1$ for every $r\in\{1,\dots,m\}$ and symbols $c_1,\dots,c_m\in A$ such that
\begin{align*}
C=\{x\in X:\pi_{i_r}(x)=c_r\text{ for every }r\in\{1,\dots,m\}\}.
\end{align*}
Let $\lambda_n$ denote the unconditional distribution of $\Phi_n$ on $A^n$, and let $\lambda_n^C$ denote the conditional distribution of $\Phi_n$ given $C$. Explicitly, for each word $w=(w_0,\dots,w_{n-1})\in A^n$,
\begin{align*}
\lambda_n(\{w\})=\mu(\{x\in X:\Phi_n(x)=w\})
\end{align*}
and
\begin{align*}
\lambda_n^C(\{w\})=\mu(\{x\in X:\Phi_n(x)=w\}\mid C).
\end{align*}
[guided]
The objects we must compare are not the points of $X$ themselves but the laws of their finite future names. For a fixed block length $n$, the map
\begin{align*}
\Phi_n:X\to A^n,\quad \Phi_n(x)=(x_0,x_1,\dots,x_{n-1})
\end{align*}
records exactly the $\mathcal P$-name of $x$ from time $0$ through time $n-1$.
A past cylinder $C$ is an event that restricts only finitely many coordinates, all of which lie before the gap. We write it as
\begin{align*}
C=\{x\in X:\pi_{i_r}(x)=c_r\text{ for every }r\in\{1,\dots,m\}\},
\end{align*}
where each index satisfies $i_r\le -g-1$. The assumption $\mu(C)>0$ is needed so that the conditional probability $\mu(\,\cdot\mid C)$ is defined.
We now define two probability measures on the finite set $A^n$. The first, $\lambda_n$, is the unconditional law of the future name. The second, $\lambda_n^C$, is the law of the same future name after conditioning on the past event $C$. In this Bernoulli case we prove more than the very weak Bernoulli condition asks for: these two measures are identical for every such $C$, $n$, and $g$.
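For a Bernoulli measure $\mu=p^{\mathbb Z}$, the unconditional law $\lambda_n$ is the product law $\lambda_n(\{w\})=p(w_0)\cdots p(w_{n-1})$, writing $p(a)$ for $p(\{a\})$. The Python sketch below tabulates it with exact rational arithmetic. It is only an illustration under stated assumptions: the alphabet is encoded as the keys of a dictionary `p` of `Fraction` probabilities, and the function name `future_law` and the two-letter example are hypothetical choices, not part of the argument.

```python
from fractions import Fraction
from itertools import product


def future_law(p, n):
    """Unconditional law lambda_n of (x_0, ..., x_{n-1}) under mu = p^Z.

    Assumption (illustrative encoding): p maps each symbol of the finite
    alphabet A to its probability, and the probabilities sum to 1.
    """
    law = {}
    for w in product(sorted(p), repeat=n):
        mass = Fraction(1)
        for symbol in w:
            mass *= p[symbol]  # product measure: coordinate probabilities multiply
        law[w] = mass
    return law


# Hypothetical two-letter alphabet with p(a) = 1/3, p(b) = 2/3.
p = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
lam2 = future_law(p, 2)
print(lam2[("a", "b")], sum(lam2.values()))  # 2/9 1
```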
[/guided]
[/step]
[step:Compute the conditional probability of each future word]
Fix a word $w=(w_0,\dots,w_{n-1})\in A^n$, and define its future cylinder
\begin{align*}
F_w=\{x\in X:\pi_j(x)=w_j\text{ for every }j\in\{0,\dots,n-1\}\}.
\end{align*}
The coordinate sets $\{i_1,\dots,i_m\}$ and $\{0,\dots,n-1\}$ are disjoint because $i_r\le -g-1\le -1$ for every $r$. Since $\mu=p^{\mathbb Z}$ is the product measure on $A^{\mathbb Z}$, cylinder events determined by disjoint finite coordinate sets are independent. This finite-dimensional independence applies to $C$, which is determined by $\{i_1,\dots,i_m\}$, and to $F_w$, which is determined by $\{0,\dots,n-1\}$. Hence the measure of the joint cylinder factors:
\begin{align*}
\mu(C\cap F_w)=\mu(C)\mu(F_w).
\end{align*}
Because $\mu(C)>0$, division by $\mu(C)$ gives
\begin{align*}
\lambda_n^C(\{w\})=\mu(F_w\mid C)=\frac{\mu(C\cap F_w)}{\mu(C)}=\mu(F_w)=\lambda_n(\{w\}).
\end{align*}
Since this equality holds for every word $w\in A^n$, the two probability measures on $A^n$ are equal:
\begin{align*}
\lambda_n^C=\lambda_n.
\end{align*}
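As a concrete instance, take the illustrative data (chosen only for this example) $A=\{a,b\}$, $p(a)=\tfrac13$, $p(b)=\tfrac23$, $g=1$, $n=2$, $C=\{x\in X:x_{-3}=b,\ x_{-2}=a\}$, and $w=(a,b)$. Both past indices satisfy $i_r\le -g-1=-2$, and
\begin{align*}
\mu(C)&=p(b)\,p(a)=\tfrac23\cdot\tfrac13=\tfrac29,\\
\mu(C\cap F_w)&=p(b)\,p(a)\,p(a)\,p(b)=\tfrac29\cdot\tfrac29=\tfrac{4}{81},\\
\lambda_2^C(\{w\})&=\frac{\mu(C\cap F_w)}{\mu(C)}=\frac{4/81}{2/9}=\tfrac29=p(a)\,p(b)=\lambda_2(\{w\}).
\end{align*}
The gap coordinate $x_{-1}$ is unconstrained, so it contributes only the factor $p(a)+p(b)=1$.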
[guided]
Fix a word $w=(w_0,\dots,w_{n-1})\in A^n$, and define the future cylinder
\begin{align*}
F_w=\{x\in X:\pi_j(x)=w_j\text{ for every }j\in\{0,\dots,n-1\}\}.
\end{align*}
The event $C$ depends only on the finite coordinate set $\{i_1,\dots,i_m\}$, while $F_w$ depends only on $\{0,\dots,n-1\}$. These two coordinate sets are disjoint because each $i_r\le -g-1\le -1$. Under the product measure $\mu=p^{\mathbb Z}$, cylinder events depending on disjoint finite coordinate sets are independent. Applying this to the pair $C$ and $F_w$ gives
\begin{align*}
\mu(C\cap F_w)=\mu(C)\mu(F_w).
\end{align*}
Because $\mu(C)>0$, the conditional probability of $F_w$ given $C$ is defined, and division by $\mu(C)$ yields
\begin{align*}
\lambda_n^C(\{w\})=\mu(F_w\mid C)=\frac{\mu(C\cap F_w)}{\mu(C)}=\mu(F_w)=\lambda_n(\{w\}).
\end{align*}
This equality holds for every atom $\{w\}$ of the finite set $A^n$. Since probability measures on a finite set are determined by their values on singleton atoms, we conclude
\begin{align*}
\lambda_n^C=\lambda_n.
\end{align*}
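The equality $\lambda_n^C=\lambda_n$ can also be checked numerically without appealing to independence directly: give every configuration on a finite window containing the past indices, the gap, and the future block its product weight, sum over the unconstrained coordinates, and divide by the mass of $C$. The sketch below does this by brute force. The helper `conditional_future_law`, the dictionary encoding of $C$, and the sample values of $p$, $g$, and $n$ are illustrative assumptions; the enumeration is exponential in the window length, so it is meant only for tiny cases, and it assumes $\mu(C)>0$ as in the text.

```python
from fractions import Fraction
from itertools import product


def conditional_future_law(p, n, past):
    """Brute-force lambda_n^C for mu = p^Z (illustrative sketch).

    past: dict {index: symbol} describing the cylinder C; indices are negative.
    Every configuration on the window [lo, n-1] gets its product weight;
    summing over the unconstrained coordinates yields mu(C) and
    mu(C intersect F_w). Assumes mu(C) > 0.
    """
    lo = min(list(past) + [0])
    window = list(range(lo, n))
    alphabet = sorted(p)
    joint = {w: Fraction(0) for w in product(alphabet, repeat=n)}
    mass_c = Fraction(0)
    for config in product(alphabet, repeat=len(window)):
        x = dict(zip(window, config))
        if any(x[i] != c for i, c in past.items()):
            continue  # this configuration is not in C
        weight = Fraction(1)
        for symbol in config:
            weight *= p[symbol]
        mass_c += weight
        joint[tuple(x[j] for j in range(n))] += weight
    return {w: m / mass_c for w, m in joint.items()}


p = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
g, n = 1, 2
past = {-3: "b", -2: "a"}
assert all(i <= -g - 1 for i in past)  # C is determined before the gap
cond = conditional_future_law(p, n, past)
print(cond[("a", "b")])  # 2/9
```

The printed value agrees with the unconditional mass $p(a)\,p(b)=\tfrac29$.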
[/guided]
[/step]
[step:Convert equality of distributions into zero $\bar d_n$-distance]
Define the normalized Hamming distance
\begin{align*}
d_n:A^n\times A^n\to[0,1],\quad d_n(u,v)=\frac{1}{n}\#\{j\in\{0,\dots,n-1\}:u_j\ne v_j\}.
\end{align*}
The Ornstein $\bar d_n$-distance between two probability measures $\alpha$ and $\beta$ on $A^n$ is the infimum of
\begin{align*}
\sum_{u,v\in A^n}d_n(u,v)\Gamma(\{(u,v)\})
\end{align*}
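over all couplings $\Gamma$ of $\alpha$ and $\beta$, that is, all probability measures on $A^n\times A^n$ whose first and second marginals are $\alpha$ and $\beta$. Since $\lambda_n^C=\lambda_n$, we may use the diagonal coupling $\Gamma_\Delta$ on $A^n\times A^n$, defined as follows. For $u\in A^n$, define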
\begin{align*}
\Gamma_\Delta(\{(u,u)\})=\lambda_n(\{u\}).
\end{align*}
For $u,v\in A^n$ with $u\ne v$, define
\begin{align*}
\Gamma_\Delta(\{(u,v)\})=0.
\end{align*}
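Both marginals of $\Gamma_\Delta$ equal $\lambda_n$, and $\lambda_n=\lambda_n^C$, so $\Gamma_\Delta$ is a coupling of $\lambda_n^C$ and $\lambda_n$. Its cost is $0$, because $\Gamma_\Delta$ charges only diagonal atoms and $d_n(u,u)=0$ for every $u\in A^n$. Since every coupling has nonnegative cost, the infimum equals $0$. Hence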
\begin{align*}
\bar d_n(\lambda_n^C,\lambda_n)=0.
\end{align*}
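The diagonal coupling can be built and checked mechanically. The Python sketch below assumes words are encoded as tuples and measures on finite sets as dictionaries of `Fraction` masses; the helper names `hamming`, `diagonal_coupling`, `marginals`, and `cost` are hypothetical. It constructs $\Gamma_\Delta$ from a law on $A^n$, recomputes both marginals, and evaluates the coupling cost $\sum_{u,v}d_n(u,v)\,\Gamma_\Delta(\{(u,v)\})$.

```python
from fractions import Fraction
from itertools import product


def hamming(u, v):
    """Normalized Hamming distance d_n(u, v) between two words of length n."""
    return Fraction(sum(a != b for a, b in zip(u, v)), len(u))


def diagonal_coupling(law):
    """Gamma_Delta: the mass of each word u sits on (u, u); off-diagonal atoms get 0."""
    return {(u, v): (law[u] if u == v else Fraction(0)) for u in law for v in law}


def marginals(coupling):
    """Return the first and second marginals of a coupling on A^n x A^n."""
    first, second = {}, {}
    for (u, v), mass in coupling.items():
        first[u] = first.get(u, Fraction(0)) + mass
        second[v] = second.get(v, Fraction(0)) + mass
    return first, second


def cost(coupling):
    """Coupling cost: sum of d_n(u, v) * Gamma({(u, v)})."""
    return sum(hamming(u, v) * mass for (u, v), mass in coupling.items())


# Illustrative product law lambda_2 for p(a) = 1/3, p(b) = 2/3.
p = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
lam = {}
for w in product(sorted(p), repeat=2):
    mass = Fraction(1)
    for symbol in w:
        mass *= p[symbol]
    lam[w] = mass

gamma = diagonal_coupling(lam)
first, second = marginals(gamma)
print(first == lam, second == lam, cost(gamma))  # True True 0
```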
[guided]
The $\bar d_n$-distance is obtained by minimizing the expected normalized Hamming distance over all couplings of the two word distributions. Here the two distributions are equal: $\lambda_n^C=\lambda_n$. Therefore an optimal coupling is the diagonal one, which pairs each word with itself.
Define $\Gamma_\Delta$ on the atoms of $A^n\times A^n$ by assigning the mass of $u$ to the diagonal atom $(u,u)$:
\begin{align*}
\Gamma_\Delta(\{(u,u)\})=\lambda_n(\{u\})
\end{align*}
for every $u\in A^n$, and by assigning zero mass off the diagonal:
\begin{align*}
\Gamma_\Delta(\{(u,v)\})=0
\end{align*}
whenever $u,v\in A^n$ and $u\ne v$. The first marginal of $\Gamma_\Delta$ is $\lambda_n$, and the second marginal is also $\lambda_n$; since $\lambda_n^C=\lambda_n$, this is a coupling of $\lambda_n^C$ and $\lambda_n$.
On every atom with positive $\Gamma_\Delta$-mass, the two words are identical. Thus $d_n(u,u)=0$ for each $u\in A^n$, and the corresponding coupling cost is
\begin{align*}
\sum_{u,v\in A^n}d_n(u,v)\Gamma_\Delta(\{(u,v)\})=0.
\end{align*}
Since $\bar d_n$ is an infimum over nonnegative costs and one admissible cost is $0$, we get
\begin{align*}
\bar d_n(\lambda_n^C,\lambda_n)=0.
\end{align*}
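Because every coupling has nonnegative cost, a coupling of cost $0$ is automatically optimal. For contrast, the sketch below (illustrative values of $p$, hypothetical helper names) compares the diagonal coupling of $\lambda_2$ with itself against the independent coupling $\lambda_2\otimes\lambda_2$, which is also admissible but pairs words drawn independently and therefore pays for every coordinate where they disagree.

```python
from fractions import Fraction
from itertools import product


def product_law(p, n):
    """Product law lambda_n on A^n for mu = p^Z (illustrative encoding)."""
    law = {}
    for w in product(sorted(p), repeat=n):
        mass = Fraction(1)
        for symbol in w:
            mass *= p[symbol]
        law[w] = mass
    return law


def hamming(u, v):
    """Normalized Hamming distance d_n(u, v)."""
    return Fraction(sum(a != b for a, b in zip(u, v)), len(u))


def coupling_cost(coupling):
    """Sum of d_n(u, v) * Gamma({(u, v)}) over the atoms listed in the coupling."""
    return sum(hamming(u, v) * mass for (u, v), mass in coupling.items())


p = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
lam = product_law(p, 2)
diagonal = {(u, u): m for u, m in lam.items()}  # zero off-diagonal atoms omitted
independent = {(u, v): lam[u] * lam[v] for u in lam for v in lam}
print(coupling_cost(diagonal), coupling_cost(independent))  # 0 4/9
```

With $p(a)=\tfrac13$ and $p(b)=\tfrac23$, two independent coordinates disagree with probability $1-\tfrac19-\tfrac49=\tfrac49$, which is the independent cost; the diagonal coupling removes this cost entirely.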
[/guided]
[/step]
[step:Verify the very weak Bernoulli condition]
Let $\varepsilon>0$ be arbitrary. The preceding steps show that for every $n\in\mathbb N$, every $g\in\mathbb N\cup\{0\}$, and every positive-measure past cylinder $C$ determined before the gap,
\begin{align*}
\bar d_n(\lambda_n^C,\lambda_n)=0<\varepsilon.
\end{align*}
Thus, for every $n$ and $g$, no past cylinder violates the estimate. The exceptional past set in the very weak Bernoulli condition may therefore be chosen to be $\varnothing$, whose $\mu$-measure $0$ is less than $\varepsilon$, uniformly in $n$ and $g$. Hence the coordinate partition $\mathcal P$ of the Bernoulli shift is very weak Bernoulli.
[/step]