[guided]We have shown $\xi(g) = \phi(x)(g)$ for all $g \in \mathcal{F}$. The question is: does this force $\xi = \phi(x)$?
The answer is yes, provided we know that $\xi$ and $\phi(x)$ agree on a large enough set of functionals. Specifically, we use the following argument.
Consider the linear functional $\xi - \phi(x) \in X^{**}$. It vanishes on every $g \in \mathcal{F}$. By linearity, it vanishes on $\operatorname{span}(\mathcal{F})$. Since $\xi - \phi(x)$ is norm-continuous on $X^*$ (as an element of $X^{**}$), it vanishes on $\overline{\operatorname{span}}(\mathcal{F})$, the norm-closure of $\operatorname{span}(\mathcal{F})$ in $X^*$.
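The continuity step can be spelled out with a single estimate. This is a sketch in which $g$ and $h$ are auxiliary symbols introduced only for illustration: $g$ ranges over $\operatorname{span}(\mathcal{F})$ and $h$ is an arbitrary element of $\overline{\operatorname{span}}(\mathcal{F})$.

$$
|(\xi - \phi(x))(h)| = |(\xi - \phi(x))(h - g)| \le \|\xi - \phi(x)\|_{X^{**}} \, \|h - g\|_{X^*}.
$$

The first equality uses $(\xi - \phi(x))(g) = 0$. Since $g$ can be chosen with $\|h - g\|_{X^*}$ arbitrarily small, the left-hand side must be $0$.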
We need $\overline{\operatorname{span}}(\mathcal{F}) = X^*$. Since $x \in Y$ and $(a_n) \subset Y$, the relevant question is whether $\mathcal{F}$ spans a dense subspace of $X^*$. The functionals in $\mathcal{F}$ were chosen so that their restrictions to each $Y_n$ are dense in $\overline{B}_{Y_n^*}$, hence their linear span is dense in $Y_n^*$. But we need density in $X^*$, not just in $Y^*$.
Here we use a sharper argument: since both $\xi$ and $\phi(x)$ lie in $X^{**}$, and they agree on $\mathcal{F}$, we show they agree on all of $X^*$ by showing that the sequence $(a_n)$ witnesses the agreement on every functional.
For any $f \in X^*$: the real sequence $(f(a_n))$ has $f(x)$ as a cluster point. Indeed, by hypothesis (3), $x$ is a weak cluster point of $(a_n)$, so for every $f$ and every $\varepsilon > 0$, infinitely many $n$ satisfy $|f(a_n) - f(x)| < \varepsilon$. We also know that $\xi \in L = \overline{\phi(A)}^{\sigma(X^{**}, X^*)}$, so $\xi(f)$ is a cluster point of $(\phi(a_n)(f))_{n \ge 1} = (f(a_n))_{n \ge 1}$ in $\mathbb{R}$.
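In quantifier form, the two cluster-point statements for a fixed $f \in X^*$ can be written side by side. This is only a restatement of the claims above; $N$ and $n$ are auxiliary indices.

$$
\forall \varepsilon > 0 \ \forall N \ \exists n \ge N : \ |f(a_n) - f(x)| < \varepsilon,
\qquad
\forall \varepsilon > 0 \ \forall N \ \exists n \ge N : \ |f(a_n) - \xi(f)| < \varepsilon.
$$

The witnessing indices $n$ may differ between the two statements, which is why $f(x) = \xi(f)$ does not follow from them directly.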
Now: by the construction, $g_m^{(j)}(a_n) \to \xi(g_m^{(j)})$ for each $g_m^{(j)} \in \mathcal{F}$. But for a general $f \in X^*$, we only know that $\xi(f)$ is a cluster point of $(f(a_n))$, not necessarily the limit. However, we can handle this as follows.
One might try to extract a subsequence $(a_{n_k})$ that converges weakly to $x$: from the full sequence $(a_n)$, extract a subsequence with $|f_1(a_{n_k}) - f_1(x)| < 1/k$ for a single functional $f_1$, then refine for $f_2$, and so on. This procedure, however, only handles countably many functionals, whereas weak convergence requires control over every $f \in X^*$. Instead, we argue as follows: for a general $f$, note that $(f(a_n))$ is a bounded real sequence. We claim it has a unique cluster point, which must then be both $f(x)$ and $\xi(f)$.
To see uniqueness: suppose $\alpha$ and $\beta$ are both cluster points of $(f(a_n))$. Then there exist subsequences with $f(a_{n_k}) \to \alpha$ and $f(a_{n_j}) \to \beta$. Along the subsequence $(a_{n_k})$, we have $g(a_{n_k}) \to \xi(g)$ for every $g \in \mathcal{F}$, since the entire sequence converges on $\mathcal{F}$. Hence any weak cluster point of $(a_{n_k})$ takes the value $\xi(g)$ under each $g \in \mathcal{F}$, and since $\mathcal{F}$ is total over $Y$, any two weak cluster points of $(a_{n_k})$ that lie in $Y$ must be equal. Call this common cluster point $z$. Because $f(z)$ is a cluster point of the convergent sequence $(f(a_{n_k}))$, whose limit is $\alpha$, we get $f(z) = \alpha$. Similarly, from the subsequence $(a_{n_j})$, any weak cluster point $w \in Y$ satisfies $g(w) = \xi(g)$ for all $g \in \mathcal{F}$, so $w = z$ by totality, and $f(w) = \beta$ by the same reasoning. Then $\beta = f(w) = f(z) = \alpha$. So $(f(a_n))$ has a unique cluster point in $\mathbb{R}$. Since $f(x)$ and $\xi(f)$ are both cluster points of this sequence, $f(x) = \xi(f)$.[/guided]