[proofplan]
We prove both inclusions by the same simulation argument. If a language is decided in polynomial time by a machine in one model, the other model simulates its computation configuration by configuration. The configuration-size hypothesis keeps every intermediate encoding polynomially bounded in the input length, and the one-step simulation hypothesis then makes each simulated transition polynomially expensive. Multiplying the polynomial number of simulated steps by the polynomial cost per step gives a polynomial-time decider in the other model.
[/proofplan]
[step:Simulate an arbitrary polynomial-time computation from $\mathcal{M}_1$ inside $\mathcal{M}_2$]
Let $L \in \mathrm{P}_{\mathcal{M}_1}$. By definition of $\mathrm{P}_{\mathcal{M}_1}$, there is a deterministic machine $A$ in $\mathcal{M}_1$ and a polynomial
\begin{align*}
p_A: \mathbb{N} \to \mathbb{N}
\end{align*}
such that $A$ decides $L$ and, on every input $x$ of length $n$, halts within $p_A(n)$ steps.
By the configuration-size hypothesis applied to $A$, there is a polynomial
\begin{align*}
q_A: \mathbb{N} \to \mathbb{N}
\end{align*}
such that every configuration of $A$ reachable within $t$ steps from an input of length $n$ has an encoding of length at most $q_A(n+t)$. By the initial-encoding hypothesis with target model $\mathcal{M}_2$, there is an $\mathcal{M}_2$ machine $I_{2,A}$ and a polynomial
\begin{align*}
e_{2,A}: \mathbb{N} \to \mathbb{N}
\end{align*}
such that, on input $x$, the machine $I_{2,A}$ writes the encoding of the initial configuration of $A$ on $x$ in time at most $e_{2,A}(|x|)$. By the status-recognition hypothesis with target model $\mathcal{M}_2$, there is an $\mathcal{M}_2$ machine $H_{2,A}$ and a polynomial
\begin{align*}
h_{2,A}: \mathbb{N} \to \mathbb{N}
\end{align*}
such that, on an encoded configuration $c$ of $A$, the machine $H_{2,A}$ determines whether $c$ is nonhalting, accepting, or rejecting in time at most $h_{2,A}(|c|)$. By the one-step simulation hypothesis with $(i,j)=(1,2)$, there is a simulator machine $S_{2,A}$ in $\mathcal{M}_2$ and a polynomial
\begin{align*}
r_{2,A}: \mathbb{N} \to \mathbb{N}
\end{align*}
such that one transition of $A$ from an encoded configuration $c$ is simulated by $S_{2,A}$ in time at most $r_{2,A}(|c|)$.
Construct an $\mathcal{M}_2$ machine $B$ as follows. On input $x$, the machine $B$ runs $I_{2,A}$ to obtain the encoding of the initial configuration of $A$ on $x$. It then applies $H_{2,A}$ to the current encoded configuration. If the result is accepting, then $B$ accepts; if the result is rejecting, then $B$ rejects; if the result is nonhalting, then $B$ applies $S_{2,A}$ once and repeats the status test on the new encoded configuration. Since $S_{2,A}$ simulates the actual transition relation of $A$, and since $H_{2,A}$ correctly recognizes accepting and rejecting configurations, $B$ accepts exactly when $A$ accepts and rejects exactly when $A$ rejects. Therefore $B$ decides $L$.
[guided]
We start with an arbitrary language $L$ that is polynomial-time decidable in the first model. This means that there is a specific deterministic machine $A$ in $\mathcal{M}_1$ deciding $L$, and there is a polynomial
\begin{align*}
p_A: \mathbb{N} \to \mathbb{N}
\end{align*}
such that every computation of $A$ on an input $x$ of length $n$ stops after at most $p_A(n)$ transitions.
The goal is to build a machine in $\mathcal{M}_2$ deciding the same language. A step-by-step simulator alone is not enough: the simulating machine must also know how to create the first encoded configuration and how to tell when an encoded configuration is accepting, rejecting, or still running. These are exactly the additional encoding-access hypotheses in the theorem statement.
By the configuration-size hypothesis, applied to this fixed machine $A$, there is a polynomial
\begin{align*}
q_A: \mathbb{N} \to \mathbb{N}
\end{align*}
with the following property: if a configuration of $A$ is reachable within $t$ steps from an input of length $n$, then its encoding has length at most $q_A(n+t)$. This matters because every later cost bound is measured as a polynomial in the length of the current encoded configuration.
By the initial-encoding hypothesis with target model $\mathcal{M}_2$, there is an $\mathcal{M}_2$ machine $I_{2,A}$ and a polynomial
\begin{align*}
e_{2,A}: \mathbb{N} \to \mathbb{N}
\end{align*}
such that, on input $x$, the machine $I_{2,A}$ writes the encoding of the initial configuration of $A$ on $x$ in time at most $e_{2,A}(|x|)$. By the status-recognition hypothesis, there is an $\mathcal{M}_2$ machine $H_{2,A}$ and a polynomial
\begin{align*}
h_{2,A}: \mathbb{N} \to \mathbb{N}
\end{align*}
such that, on an encoded configuration $c$, the machine $H_{2,A}$ determines whether $c$ is nonhalting, accepting, or rejecting in time at most $h_{2,A}(|c|)$. By the one-step simulation hypothesis with the ordered pair $(1,2)$, there is a machine $S_{2,A}$ in $\mathcal{M}_2$ and a polynomial
\begin{align*}
r_{2,A}: \mathbb{N} \to \mathbb{N}
\end{align*}
such that $S_{2,A}$ simulates one transition of $A$ from any encoded configuration $c$ in time at most $r_{2,A}(|c|)$.
Define an $\mathcal{M}_2$ machine $B$ by direct simulation. On input $x$, the machine $B$ first runs $I_{2,A}$ to produce the initial encoded configuration of $A$ on $x$. It then runs $H_{2,A}$ on the current encoded configuration. If $H_{2,A}$ reports accepting, then $B$ accepts; if it reports rejecting, then $B$ rejects; if it reports nonhalting, then $B$ applies $S_{2,A}$ once to obtain the next encoded configuration and repeats the same status test. Because $A$ is deterministic, there is only one computation path to simulate. Because $A$ decides $L$, that path eventually reaches the correct accepting or rejecting configuration. Hence $B$ also decides $L$.
[/guided]
[/step]
[step:Bound the total running time of the simulation by a polynomial]
Fix an input $x$ of length $n$. Since $A$ halts within $p_A(n)$ steps, every configuration encountered during the simulation is reachable within some $t \leq p_A(n)$ steps. Hence its encoded length is at most
\begin{align*}
q_A(n+t) \leq q_A(n+p_A(n))
\end{align*}
after replacing $q_A$ by a nondecreasing polynomial upper bound if necessary.
After replacing $h_{2,A}$ and $r_{2,A}$ by nondecreasing polynomial upper bounds if necessary, each status test costs at most
\begin{align*}
h_{2,A}(q_A(n+p_A(n)))
\end{align*}
and each simulated transition costs at most
\begin{align*}
r_{2,A}(q_A(n+p_A(n))).
\end{align*}
The machine $B$ runs the initial encoder once, performs at most $p_A(n)$ simulated transitions, and performs at most $p_A(n)+1$ status tests, including the final halting status test. Thus the total simulation time is bounded by
\begin{align*}
P_{2,A}(n) := e_{2,A}(n) + (p_A(n)+1)\, h_{2,A}(q_A(n+p_A(n))) + p_A(n)\, r_{2,A}(q_A(n+p_A(n))) + p_A(n)
\end{align*}
where the final $p_A(n)$ term accounts for the iteration overhead of the simulation loop. The function $P_{2,A}: \mathbb{N} \to \mathbb{N}$ is a polynomial because it is built from polynomials using addition, multiplication, and composition. Thus $B$ decides $L$ in polynomial time in model $\mathcal{M}_2$, so $L \in \mathrm{P}_{\mathcal{M}_2}$.
[/step]
[step:Repeat the same argument in the opposite direction]
The preceding argument used only the configuration-size hypothesis for the simulated machine and the one-step simulation hypothesis from one model to the other. These hypotheses are symmetric in $\mathcal{M}_1$ and $\mathcal{M}_2$ by assumption. Therefore, if $L \in \mathrm{P}_{\mathcal{M}_2}$, the same construction with the ordered pair $(2,1)$ gives a polynomial-time decider for $L$ in $\mathcal{M}_1$. Hence
\begin{align*}
\mathrm{P}_{\mathcal{M}_2} \subset \mathrm{P}_{\mathcal{M}_1}.
\end{align*}
Together with the inclusion already proved,
\begin{align*}
\mathrm{P}_{\mathcal{M}_1} \subset \mathrm{P}_{\mathcal{M}_2},
\end{align*}
we conclude that
\begin{align*}
\mathrm{P}_{\mathcal{M}_1} = \mathrm{P}_{\mathcal{M}_2}.
\end{align*}
[/step]