[proofplan]
Fix $k \geq 1$. We build $U_k$ by a universal simulation algorithm: on input $(e,x_1,\dots,x_k)$ it first decides whether $e$ is the code of a valid $k$-ary program, and if so it computes the initial configuration of that program on input $(x_1,\dots,x_k)$. It then repeatedly applies the effectively decoded one-step transition rule until the simulated computation halts. The effectiveness assumptions on the Gödel coding ensure that validity checking, initial-configuration construction, transition computation, and halt-output detection are all computable, so the simulator itself is a partial computable function.
[/proofplan]
[step:Fix the arity and extract the computable syntactic data from the coding]
Fix $k \geq 1$. By effectiveness of the Gödel coding, there is a total computable predicate
\begin{align*}
\operatorname{Valid}_k: \mathbb N \to \{0,1\}
\end{align*}
such that $\operatorname{Valid}_k(e)=1$ exactly when $e$ codes a valid program of arity $k$.
There is also a total computable map
\begin{align*}
\operatorname{Init}_k: \mathbb N^{k+1} \to \mathbb N
\end{align*}
which, for $\operatorname{Valid}_k(e)=1$, returns the code of the initial configuration of the program coded by $e$ on input $(x_1,\dots,x_k)$. Finally, there is a total computable map
\begin{align*}
\operatorname{Step}: \mathbb N \times \mathbb N \to \mathbb N
\end{align*}
such that $\operatorname{Step}(e,c)$ is the code of the next configuration of the program coded by $e$ from the configuration coded by $c$, whenever $e$ is valid and $c$ is a configuration code for that program.
These maps exist because, in an effective Gödel coding of a standard machine model, the program syntax, the arity declaration, the finite transition table, and the finite configuration data are uniformly decodable by an algorithm.
[/step]
[step:Define the universal simulator as a partial algorithm]
Define a partial algorithm $M_k$ with input set $\mathbb N^{k+1}$ and output set $\mathbb N$ as follows. On input $(e,x_1,\dots,x_k)$, first compute $\operatorname{Valid}_k(e)$.
If $\operatorname{Valid}_k(e)=0$, enter a fixed divergent computation, for example the computation that repeatedly replaces an internal counter $s \in \mathbb N$ by $s+1$ and never halts.
If $\operatorname{Valid}_k(e)=1$, compute
\begin{align*}
c_0 := \operatorname{Init}_k(e,x_1,\dots,x_k),
\end{align*}
where $c_0 \in \mathbb N$ is the code of the initial configuration. Then construct a sequence of configuration codes $(c_t)_{t \in \mathbb N \cup \{0\}}$ recursively by
\begin{align*}
c_{t+1} := \operatorname{Step}(e,c_t)
\end{align*}
for each $t \in \mathbb N \cup \{0\}$, unless $c_t$ is a halting configuration. At stage $t$, before computing $c_{t+1}$, the algorithm checks whether $c_t$ is a halting configuration. If it is, the algorithm decodes the output value $y \in \mathbb N$ from $c_t$ and halts with output $y$.
Let
\begin{align*}
U_k: \mathbb N^{k+1} \rightharpoonup \mathbb N
\end{align*}
be the partial function computed by $M_k$.
[guided]
We now spell out why this really defines a partial computable function. The input to the universal algorithm is a tuple $(e,x_1,\dots,x_k) \in \mathbb N^{k+1}$. The first operation is the computation of
\begin{align*}
\operatorname{Valid}_k(e) \in \{0,1\}.
\end{align*}
This is a computable test because the Gödel coding is effective: an algorithm can inspect the code $e$ and decide whether it represents a syntactically valid program whose input arity is $k$.
If $\operatorname{Valid}_k(e)=0$, the algorithm deliberately performs a fixed non-halting procedure. This makes $U_k(e,x_1,\dots,x_k)$ undefined, matching the convention that an invalid code represents a nowhere-defined function.
Assume now that $\operatorname{Valid}_k(e)=1$. The algorithm computes the natural number
\begin{align*}
c_0 := \operatorname{Init}_k(e,x_1,\dots,x_k),
\end{align*}
which is the encoded initial configuration of the program coded by $e$ on the input tuple $(x_1,\dots,x_k)$. A configuration code contains all finite data needed to continue the computation: the current state, tape or register contents, head positions or instruction pointer, and any other finite machine-state data used by the chosen standard model.
The algorithm then iterates the computable transition map
\begin{align*}
\operatorname{Step}: \mathbb N \times \mathbb N \to \mathbb N.
\end{align*}
Thus, as long as the current configuration code $c_t$ is not halting, it computes
\begin{align*}
c_{t+1} := \operatorname{Step}(e,c_t).
\end{align*}
The halting test and output decoding are computable because halting configurations and their output fields are part of the effective machine syntax. Therefore every finite stage of the simulation is carried out by an algorithm. The resulting procedure may run forever, but that is exactly what it means for the computed function to be partial. Hence the function $U_k$ computed by this simulator is partial computable.
[/guided]
[/step]
[step:Verify agreement on valid program indices]
Let $e \in \mathbb N$ satisfy $\operatorname{Valid}_k(e)=1$, and let $(x_1,\dots,x_k) \in \mathbb N^k$. By construction, $M_k$ begins from the same initial configuration as the program coded by $e$ on input $(x_1,\dots,x_k)$. The recursive definition
\begin{align*}
c_{t+1}=\operatorname{Step}(e,c_t)
\end{align*}
uses exactly the decoded transition rule of that program. Therefore, for every $t \in \mathbb N \cup \{0\}$, the configuration code $c_t$ is the code of the configuration reached by the coded program after $t$ computation steps.
If the coded program halts after $t$ steps with output $y \in \mathbb N$, then $c_t$ is a halting configuration whose decoded output is $y$, so $M_k$ halts with output $y$. Hence
\begin{align*}
U_k(e,x_1,\dots,x_k)=y=\varphi_{e,k}(x_1,\dots,x_k).
\end{align*}
If the coded program does not halt on $(x_1,\dots,x_k)$, then no $c_t$ is a halting configuration. The simulator therefore never reaches its output instruction, so $U_k(e,x_1,\dots,x_k)$ is undefined. Thus
\begin{align*}
U_k(e,x_1,\dots,x_k) \simeq \varphi_{e,k}(x_1,\dots,x_k)
\end{align*}
for every valid $k$-ary program index $e$.
[/step]
[step:Verify agreement on invalid program indices and conclude universality]
Let $e \in \mathbb N$ satisfy $\operatorname{Valid}_k(e)=0$, and let $(x_1,\dots,x_k) \in \mathbb N^k$. By definition of $M_k$, the computation enters the fixed divergent branch, so $U_k(e,x_1,\dots,x_k)$ is undefined. By the convention in the numbering, $\varphi_{e,k}$ is nowhere defined when $e$ is not a valid code of arity $k$, so $\varphi_{e,k}(x_1,\dots,x_k)$ is also undefined.
Combining the valid-index and invalid-index cases, for every $e \in \mathbb N$ and every $(x_1,\dots,x_k) \in \mathbb N^k$,
\begin{align*}
U_k(e,x_1,\dots,x_k) \simeq \varphi_{e,k}(x_1,\dots,x_k).
\end{align*}
Since $k \geq 1$ was arbitrary, such a universal partial computable function exists for every arity $k$.
[/step]