[proofplan]
We first construct the heat-semigroup candidate $\mu_t=G_t*\mu_0$ and verify that it remains in $\mathcal P_2(\mathbb R^n)$ and converges to $\mu_0$ in $W_2$ as $t\downarrow0$. We then prove directly from the Gaussian kernel that the density $\rho_t$ is smooth and solves the classical heat equation for positive time. Finally, we invoke the standard JKO/EVI identification theorem for the Boltzmann entropy, which states that this heat-semigroup curve is precisely the unique $\operatorname{EVI}_0$ Wasserstein gradient flow of $\operatorname{Ent}$, including for initial data of infinite entropy.
[/proofplan]
[step:Construct the heat-semigroup curve and keep it inside $\mathcal P_2(\mathbb R^n)$]
For $t>0$, define the Borel probability measure $\gamma_t$ on $\mathbb R^n$ by
\begin{align*}
d\gamma_t(z)=G_t(z)\,d\mathcal L^n(z).
\end{align*}
Thus $\gamma_t$ is the centred Gaussian probability measure with covariance matrix $2tI_n$. Define
\begin{align*}
\mu_t:=\gamma_t*\mu_0.
\end{align*}
Equivalently, for every Borel set $A\subset\mathbb R^n$,
\begin{align*}
\mu_t(A)=\int_{\mathbb R^n}\int_{\mathbb R^n}\mathbb 1_A(y+z)\,d\gamma_t(z)\,d\mu_0(y).
\end{align*}
By Tonelli's theorem applied to the non-negative function $(y,z)\mapsto \mathbb 1_A(y+z)$ on $\mathbb R^n\times\mathbb R^n$, this formula defines a Borel probability measure.
We verify the finite second moment. Using the elementary inequality $|y+z|^2\le 2|y|^2+2|z|^2$ for $y,z\in\mathbb R^n$ and applying Tonelli's theorem to the non-negative function $(y,z)\mapsto |y+z|^2$, we get
\begin{align*}
\int_{\mathbb R^n}|x|^2\,d\mu_t(x)\le 2\int_{\mathbb R^n}|y|^2\,d\mu_0(y)+2\int_{\mathbb R^n}|z|^2\,d\gamma_t(z).
\end{align*}
The first term is finite because $\mu_0\in\mathcal P_2(\mathbb R^n)$. For the Gaussian term,
\begin{align*}
\int_{\mathbb R^n}|z|^2\,d\gamma_t(z)=2nt<\infty.
\end{align*}
Therefore $\mu_t\in\mathcal P_2(\mathbb R^n)$ for every $t>0$.
[/step]
[step:Prove convergence to the initial measure in $W_2$]
For $t>0$, define a probability measure $\pi_t$ on $\mathbb R^n\times\mathbb R^n$ as the pushforward of $\mu_0\otimes\gamma_t$ under the Borel map $\Phi_t:\mathbb R^n\times\mathbb R^n\to\mathbb R^n\times\mathbb R^n$ given by
\begin{align*}
\Phi_t(y,z)=(y,y+z).
\end{align*}
The first marginal of $\pi_t$ is $\mu_0$, and the second marginal of $\pi_t$ is $\mu_t$, so $\pi_t$ is a coupling of $\mu_0$ and $\mu_t$. Hence, by the definition of $W_2$ as the infimum of the quadratic transport cost over all couplings,
\begin{align*}
W_2^2(\mu_0,\mu_t)\le \int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_t(x,y).
\end{align*}
Using the definition of $\pi_t$ as a pushforward measure, the right-hand side equals
\begin{align*}
\int_{\mathbb R^n}\int_{\mathbb R^n}|z|^2\,d\gamma_t(z)\,d\mu_0(y)=2nt.
\end{align*}
Therefore $W_2(\mu_0,\mu_t)\le(2nt)^{1/2}$, and $\mu_t\to\mu_0$ in $(\mathcal P_2(\mathbb R^n),W_2)$ as $t\downarrow0$.
[guided]
The goal is to show that the heat semigroup has the correct initial condition in the Wasserstein topology. The clean way to do this is to build an explicit coupling rather than use weak convergence plus moment convergence separately.
For $t>0$, define the Gaussian probability measure $\gamma_t$ on $\mathbb R^n$ by $d\gamma_t(z)=G_t(z)\,d\mathcal L^n(z)$. We let $\pi_t$ be the law of the pair $(Y,Y+Z)$, where $Y$ has law $\mu_0$ and $Z$ has law $\gamma_t$. Formally, this means that $\pi_t$ is the pushforward of $\mu_0\otimes\gamma_t$ under the Borel map $\Phi_t:\mathbb R^n\times\mathbb R^n\to\mathbb R^n\times\mathbb R^n$ defined by
\begin{align*}
\Phi_t(y,z)=(y,y+z).
\end{align*}
The first coordinate of $\Phi_t(y,z)$ is $y$, so the first marginal of $\pi_t$ is $\mu_0$. The second coordinate is $y+z$, and its law is exactly the convolution $\gamma_t*\mu_0=\mu_t$. Thus $\pi_t$ is an admissible coupling between $\mu_0$ and $\mu_t$.
Since $W_2^2$ is the infimum of the cost over all such couplings, evaluating the cost on this particular coupling gives an upper bound:
\begin{align*}
W_2^2(\mu_0,\mu_t)\le \int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_t(x,y).
\end{align*}
Under the map $\Phi_t$, the displacement from the first coordinate to the second coordinate is exactly $z$. Therefore the change-of-variables formula for pushforward measures gives
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_t(x,y)=\int_{\mathbb R^n}\int_{\mathbb R^n}|z|^2\,d\gamma_t(z)\,d\mu_0(y).
\end{align*}
The measure $\mu_0$ is a probability measure, and $\gamma_t$ is the centred Gaussian with covariance matrix $2tI_n$, so
\begin{align*}
\int_{\mathbb R^n}\int_{\mathbb R^n}|z|^2\,d\gamma_t(z)\,d\mu_0(y)=2nt.
\end{align*}
Consequently
\begin{align*}
W_2(\mu_0,\mu_t)\le (2nt)^{1/2}.
\end{align*}
This tends to $0$ as $t\downarrow0$, which proves $\mu_t\to\mu_0$ in $\mathcal P_2(\mathbb R^n)$.
[/guided]
[/step]
[step:Identify the density and verify the heat equation]
For $t>0$, define the map $\rho:(0,\infty)\times\mathbb R^n\to[0,\infty)$ by
\begin{align*}
\rho(t,x)=\int_{\mathbb R^n}G_t(x-y)\,d\mu_0(y).
\end{align*}
For each fixed $t>0$, Tonelli's theorem gives
\begin{align*}
\int_{\mathbb R^n}\rho(t,x)\,d\mathcal L^n(x)=\int_{\mathbb R^n}\int_{\mathbb R^n}G_t(x-y)\,d\mathcal L^n(x)\,d\mu_0(y)=1,
\end{align*}
because $G_t$ integrates to $1$ with respect to $\mathcal L^n$. Hence $\rho_t:=\rho(t,\cdot)$ is a probability density and $\mu_t=\rho_t\,\mathcal L^n$.
We next verify smoothness and the heat equation. Fix $0<a<b<\infty$ and a compact set $K\subset\mathbb R^n$. For every multi-index $\alpha\in\mathbb N_0^n$ and every integer $m\ge0$, the function
\begin{align*}
(t,x,y)\mapsto \partial_t^m D_x^\alpha G_t(x-y)
\end{align*}
is continuous on $[a,b]\times K\times\mathbb R^n$ and is bounded there by a finite constant depending on $a$, $b$, $K$, $m$, and $\alpha$. Differentiation under the integral sign with respect to the finite measure $\mu_0$ therefore gives $\rho\in C^\infty((0,\infty)\times\mathbb R^n)$ and
\begin{align*}
\partial_t\rho(t,x)=\int_{\mathbb R^n}\partial_tG_t(x-y)\,d\mu_0(y),
\end{align*}
while
\begin{align*}
\Delta\rho(t,x)=\int_{\mathbb R^n}\Delta_xG_t(x-y)\,d\mu_0(y).
\end{align*}
A direct differentiation of the Gaussian kernel gives
\begin{align*}
\partial_tG_t(\xi)=\Delta_\xi G_t(\xi)
\end{align*}
for every $t>0$ and every $\xi\in\mathbb R^n$. Since $\Delta_xG_t(x-y)=\Delta_\xi G_t(\xi)$ at $\xi=x-y$, it follows that
\begin{align*}
\partial_t\rho(t,x)=\Delta\rho(t,x)
\end{align*}
for every $(t,x)\in(0,\infty)\times\mathbb R^n$.
[/step]
[step:Use the JKO identification of entropy gradient flow with the heat semigroup]
Write $\operatorname{Dom}(\operatorname{Ent})=\{\nu\in\mathcal P_2(\mathbb R^n):\operatorname{Ent}[\nu]<+\infty\}$ for the effective domain of the entropy. We now use the JKO/EVI identification theorem for the Boltzmann entropy on $\mathcal P_2(\mathbb R^n)$: in $(\mathcal P_2(\mathbb R^n),W_2)$, the lower semicontinuous displacement-convex functional $\operatorname{Ent}$ has a unique $\operatorname{EVI}_0$ gradient flow starting from every $\mu_0\in\overline{\operatorname{Dom}(\operatorname{Ent})}^{W_2}=\mathcal P_2(\mathbb R^n)$, and this flow is exactly the heat-semigroup curve $t\mapsto G_t*\mu_0$ [citetheorem:9575].
The hypotheses of this theorem apply here. The ambient space is exactly $(\mathcal P_2(\mathbb R^n),W_2)$. The functional is exactly the Boltzmann entropy defined above. The entropy is lower semicontinuous and displacement convex by the theorem's hypotheses, and the theorem also includes the closure statement $\overline{\operatorname{Dom}(\operatorname{Ent})}^{W_2}=\mathcal P_2(\mathbb R^n)$. The initial datum $\mu_0$ lies in $\mathcal P_2(\mathbb R^n)$ by assumption, and no finite-entropy assumption is required because the theorem is stated for all initial data in the $W_2$-closure of the entropy domain. Therefore the unique $\operatorname{EVI}_0$ gradient flow of $\operatorname{Ent}$ starting from $\mu_0$ is the curve constructed above:
\begin{align*}
\mu_t=G_t*\mu_0
\end{align*}
for every $t>0$.
Combining this identification with the preceding steps proves all asserted properties: $\mu_t$ is the metric Wasserstein gradient flow of $\operatorname{Ent}$, it is represented for positive time by the smooth density $\rho_t$, the density solves $\partial_t\rho=\Delta\rho$ classically on $(0,\infty)\times\mathbb R^n$, and $\mu_t\to\mu_0$ in $W_2$ as $t\downarrow0$.
[/step]