[step:Pass 3 — binarise right-hand sides of length $\ge 3$]
Consider a rule $A \to X_1 X_2 \cdots X_k \in P_2$ with $k \ge 3$ and $X_1, \dots, X_k \in V_1$. Introduce fresh variables $Y_1^{(A, X_1 \cdots X_k)}, \dots, Y_{k - 2}^{(A, X_1 \cdots X_k)}$ (one set of fresh variables per rule, so distinct rules do not share auxiliary variables), abbreviated $Y_1, \dots, Y_{k-2}$ when the rule is clear from context. Replace the rule by the chain of binary rules
\begin{align*}
A &\to X_1\, Y_1, \\
Y_i &\to X_{i+1}\, Y_{i+1}, \qquad 1 \le i \le k - 3, \\
Y_{k-2} &\to X_{k-1}\, X_k.
\end{align*}
Rules with $k = 2$ (binary) and $k = 1$ (necessarily $A \to a$ with $a \in \Sigma$, since unit rules have been eliminated) are left unchanged.
Let $V'$ be $V_1$ augmented by all fresh auxiliaries introduced, $P'$ the resulting production set, and $G' := (V', \Sigma, P', S)$.
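The construction above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the function name `binarise`, the representation of a rule as a pair `(lhs, rhs_tuple)`, and the naming scheme for the fresh auxiliaries are all assumptions made for this example. It also assumes the input lists each rule once, so that embedding the rule in the auxiliary's name keeps auxiliaries of distinct rules distinct.

```python
from typing import List, Tuple

# A rule is (left-hand side, tuple of right-hand-side symbols).
Rule = Tuple[str, Tuple[str, ...]]


def binarise(rules: List[Rule]) -> Tuple[List[Rule], List[str]]:
    """Replace each rule with a right-hand side of length >= 3 by a binary chain.

    Returns the new rule list and the fresh auxiliaries, in order of creation.
    Assumes `rules` contains no duplicate rules.
    """
    new_rules: List[Rule] = []
    fresh: List[str] = []
    for lhs, rhs in rules:
        k = len(rhs)
        if k <= 2:
            # Rules of length 1 or 2 are left unchanged.
            new_rules.append((lhs, rhs))
            continue
        # Hypothetical naming scheme: the name embeds the originating rule,
        # mirroring the superscript (A, X_1 ... X_k) in the text.
        tag = f"{lhs}->{' '.join(rhs)}"
        ys = [f"Y{i}<{tag}>" for i in range(1, k - 1)]  # Y_1, ..., Y_{k-2}
        fresh.extend(ys)
        # A -> X_1 Y_1
        new_rules.append((lhs, (rhs[0], ys[0])))
        # Y_i -> X_{i+1} Y_{i+1} for 1 <= i <= k - 3
        for i in range(1, k - 2):
            new_rules.append((ys[i - 1], (rhs[i], ys[i])))
        # Y_{k-2} -> X_{k-1} X_k
        new_rules.append((ys[-1], (rhs[k - 2], rhs[k - 1])))
    return new_rules, fresh


if __name__ == "__main__":
    example = [("S", ("A", "B", "C", "D")), ("A", ("a",)), ("B", ("A", "A"))]
    out_rules, out_fresh = binarise(example)
    for left, right in out_rules:
        print(left, "->", " ".join(right))
```

On the example input, the single rule of length $k = 4$ is replaced by $k - 1 = 3$ binary rules using $k - 2 = 2$ fresh auxiliaries, while the other two rules pass through unchanged.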
[claim:$\mathcal{L}(G') = \mathcal{L}(G_2)$]
*Forward ($\mathcal{L}(G_2) \subseteq \mathcal{L}(G')$).* Given a $G_2$-derivation, imitate each step in $G'$: a step using a rule of length $1$ or $2$ reuses the identical rule, which belongs to $P'$ unchanged, while a use of a long rule $A \to X_1 \cdots X_k$ is simulated by the chain of $k - 1$ binary steps $A \Rightarrow_{G'} X_1 Y_1 \Rightarrow_{G'} X_1 X_2 Y_2 \Rightarrow_{G'} \cdots \Rightarrow_{G'} X_1 X_2 \cdots X_k$.
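*Reverse ($\mathcal{L}(G') \subseteq \mathcal{L}(G_2)$).* Consider a $G'$-derivation of $w \in \Sigma^*$. Each auxiliary $Y_i^{(A, X_1 \cdots X_k)}$ is the left-hand side of exactly one $G'$-rule (the rule introduced for it in Pass 3) and occurs on the right-hand side of exactly one $G'$-rule, namely the preceding rule of the same chain ($A \to X_1\, Y_1$ for $i = 1$, and $Y_{i-1} \to X_i\, Y_i$ for $i \ge 2$). Hence every occurrence of an auxiliary traces back to a use of $A \to X_1\, Y_1$, and since $w$ contains no variables, each $Y_i$ that appears must eventually be rewritten by its unique rule $Y_i \to X_{i+1}\, Y_{i+1}$ (or $Y_{k-2} \to X_{k-1} X_k$ at the end of the chain). So every chain that is started is also completed. Because rewriting steps at distinct positions of a sentential form commute, the derivation can be reordered so that the $k - 1$ steps of each chain are consecutive. Collapsing each such chain into a single use of the original rule $A \to X_1 \cdots X_k$ in $P_2$ produces a valid $G_2$-derivation of $w$.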
[/claim]
Chaining with the previous equivalences, $\mathcal{L}(G') = \mathcal{L}(G_2) = \mathcal{L}(G)$.
Each rule $A \to X_1 \cdots X_k$ with $k \ge 3$ contributes $k - 2$ fresh auxiliaries and is replaced by $k - 1$ binary rules, a net gain of $k - 2$ rules; rules with $k \le 2$ contribute nothing. Pass 3 therefore adds $\sum_{A \to \alpha \in P_2,\ |\alpha| \ge 3} (|\alpha| - 2) \le \sum_{A \to \alpha \in P,\ |\alpha| \ge 3} (|\alpha| - 2)$ fresh variables, and the same net number of rules, as asserted.
[/step]