[proofplan]
We define a distribution on $U \times U$ by the stated oscillatory integral and check that its action on elementary tensors $\phi(x)u(y)$ is exactly the defining Kohn-Nirenberg formula for $(Pu)(\phi)$. This identifies the distribution with the Schwartz kernel of $P$. Smoothness away from the diagonal is proved locally: on a compact set where $|x-y|$ has a positive lower bound, repeated [integration by parts](/theorems/210) in $\xi$ turns the oscillatory integral and all its $x,y$ derivatives into absolutely convergent integrals.
[/proofplan]
[step:Define the candidate kernel as an oscillatory distribution]
Let
\begin{align*}
K_a: C_c^\infty(U \times U) &\to \mathbb{C}
\end{align*}
be the distribution defined by
\begin{align*}
K_a(\psi) = (2\pi)^{-n}\operatorname{Os}\!\int_{U \times U \times \mathbb{R}^n} e^{i(x-y)\cdot \xi}a(x,\xi)\psi(x,y)\,d\mathcal{L}^n(\xi)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(x),
\end{align*}
for $\psi \in C_c^\infty(U \times U)$. This is well-defined by the definition of oscillatory integrals with symbol amplitudes: for each fixed $\psi$, the amplitude
\begin{align*}
b: U \times U \times \mathbb{R}^n &\to \mathbb{C}
\end{align*}
given by
\begin{align*}
b(x,y,\xi)=a(x,\xi)\psi(x,y)
\end{align*}
is compactly supported in $(x,y)$ and has symbol estimates of order $m$ in $\xi$.
[guided]
We first construct the object that should be the kernel. For a [test function](/page/Test%20Function) $\psi \in C_c^\infty(U \times U)$, define a linear functional
\begin{align*}
K_a: C_c^\infty(U \times U) &\to \mathbb{C}
\end{align*}
by
\begin{align*}
K_a(\psi) = (2\pi)^{-n}\operatorname{Os}\!\int_{U \times U \times \mathbb{R}^n} e^{i(x-y)\cdot \xi}a(x,\xi)\psi(x,y)\,d\mathcal{L}^n(\xi)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(x).
\end{align*}
Why is this a legitimate distribution? The only noncompact variable in the integral is $\xi$. The function
\begin{align*}
b: U \times U \times \mathbb{R}^n &\to \mathbb{C}
\end{align*}
defined by
\begin{align*}
b(x,y,\xi)=a(x,\xi)\psi(x,y)
\end{align*}
is compactly supported in $(x,y)$ because $\psi$ is compactly supported. Since $a \in S^m_{1,0}(U \times \mathbb{R}^n)$, differentiating $b$ in $x$, $y$, or $\xi$ gives finite sums of derivatives of $a$ multiplied by derivatives of $\psi$, and therefore the same symbol growth in $\xi$ holds on compact subsets of $U \times U$. Thus $b$ is an admissible oscillatory-integral amplitude. The oscillatory-integral definition then gives a continuous linear functional on $C_c^\infty(U \times U)$, namely a distribution $K_a \in \mathcal{D}'(U \times U)$.
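As a hedged illustration of the symbol bound just used, the Leibniz rule can be written out explicitly. The compact set $K=\{x:(x,y)\in\operatorname{supp}\psi \text{ for some } y\}\subset U$ and the constants $C_{K,\alpha',\gamma}$ are notation introduced here, assuming the usual local estimates $|\partial_x^{\alpha'}\partial_\xi^{\gamma}a(x,\xi)|\le C_{K,\alpha',\gamma}(1+|\xi|)^{m-|\gamma|}$ for $x\in K$. For multi-indices $\alpha,\beta,\gamma$,
\begin{align*}
\partial_x^{\alpha}\partial_y^{\beta}\partial_\xi^{\gamma}b(x,y,\xi)
&=\sum_{\alpha'\le\alpha}\binom{\alpha}{\alpha'}\,\partial_x^{\alpha'}\partial_\xi^{\gamma}a(x,\xi)\,\partial_x^{\alpha-\alpha'}\partial_y^{\beta}\psi(x,y),\\
\left|\partial_x^{\alpha}\partial_y^{\beta}\partial_\xi^{\gamma}b(x,y,\xi)\right|
&\le\left(\sum_{\alpha'\le\alpha}\binom{\alpha}{\alpha'}\,C_{K,\alpha',\gamma}\,\sup_{U \times U}\left|\partial_x^{\alpha-\alpha'}\partial_y^{\beta}\psi\right|\right)(1+|\xi|)^{m-|\gamma|}.
\end{align*}
Only $\xi$-derivatives change the exponent, which is the sense in which $b$ keeps the symbol growth of $a$.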
[/guided]
[/step]
[step:Match the candidate kernel with the operator on elementary tensors]
Let $u,\phi \in C_c^\infty(U)$, and define
\begin{align*}
\psi_{\phi,u}: U \times U &\to \mathbb{C}
\end{align*}
by
\begin{align*}
\psi_{\phi,u}(x,y)=\phi(x)u(y).
\end{align*}
Then $\psi_{\phi,u} \in C_c^\infty(U \times U)$, and by the definition of $K_a$,
\begin{align*}
K_a(\psi_{\phi,u}) = (2\pi)^{-n}\operatorname{Os}\!\int_{U \times U \times \mathbb{R}^n} e^{i(x-y)\cdot \xi}\phi(x)a(x,\xi)u(y)\,d\mathcal{L}^n(\xi)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(x).
\end{align*}
This is exactly the assumed Kohn-Nirenberg formula for $(Pu)(\phi)$. Hence
\begin{align*}
K_a(\phi \otimes u)=(Pu)(\phi)
\end{align*}
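for every $u,\phi \in C_c^\infty(U)$, where $\phi \otimes u$ denotes the function $(x,y)\mapsto \phi(x)u(y)$. This is the defining property of the Schwartz kernel $K_P$. Since finite linear combinations of such tensors are dense in $C_c^\infty(U \times U)$, a distribution on $U \times U$ is determined by its values on them, so $K_a=K_P$ in $\mathcal{D}'(U \times U)$.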
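A quick sanity check, offered as an illustration rather than as part of the proof: take $U=\mathbb{R}^n$ and $a\equiv 1$, a symbol of order $0$. Assuming the Fourier convention $\hat u(\xi)=\int_{\mathbb{R}^n}e^{-iy\cdot\xi}u(y)\,d\mathcal{L}^n(y)$ and the bilinear pairing $(Pu)(\phi)=\int (Pu)\,\phi\,d\mathcal{L}^n$ (both are assumptions about conventions not fixed in this section), integrating first in $y$ and then using Fourier inversion gives
\begin{align*}
K_a(\phi\otimes u)
&=\int_{\mathbb{R}^n}\phi(x)\left((2\pi)^{-n}\int_{\mathbb{R}^n}e^{ix\cdot\xi}\,\hat u(\xi)\,d\mathcal{L}^n(\xi)\right)d\mathcal{L}^n(x)\\
&=\int_{\mathbb{R}^n}\phi(x)u(x)\,d\mathcal{L}^n(x).
\end{align*}
In this case $P$ is the identity and the kernel is the delta distribution carried by the diagonal, which vanishes, and is in particular smooth, away from the diagonal.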
[/step]
[step:Localize away from the diagonal]
Let $W \subset U \times U$ be open with compact closure $\overline{W}$ contained in $(U \times U)\setminus \{(x,x):x \in U\}$. Since $\overline{W}$ is compact and disjoint from the diagonal, we may define
\begin{align*}
\delta_W=\inf\{|x-y|:(x,y)\in \overline{W}\}.
\end{align*}
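Then $\delta_W>0$, because the continuous function $(x,y)\mapsto |x-y|$ attains its minimum on the compact set $\overline{W}$ and does not vanish there. Choose $\eta \in C_c^\infty(U \times U)$ with $\eta=1$ on a neighbourhood of $\overline{W}$ and $\operatorname{supp}\eta$ disjoint from the diagonal; since $\operatorname{supp}\eta$ is compact, $|x-y|$ is also bounded below by a positive constant on $\operatorname{supp}\eta$. Every point of $(U \times U)\setminus \{(x,x):x \in U\}$ has such a neighbourhood $W$, and $K_P=\eta K_P$ on $W$, so it is enough to prove that the localized distribution $\eta K_P$ is represented by a smooth function.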
[/step]
[step:Integrate by parts in $\xi$ to obtain absolute convergence]
For $(x,y)\in \operatorname{supp}\eta$, define the differential operator
\begin{align*}
L_{x,y}: C^\infty(\mathbb{R}^n_\xi) &\to C^\infty(\mathbb{R}^n_\xi)
\end{align*}
by
\begin{align*}
L_{x,y}q(\xi)=\frac{1}{i|x-y|^2}\sum_{j=1}^n (x_j-y_j)\frac{\partial q}{\partial \xi_j}(\xi).
\end{align*}
Then
\begin{align*}
L_{x,y}\left(e^{i(x-y)\cdot \xi}\right)=e^{i(x-y)\cdot \xi}.
\end{align*}
Let $L_{x,y}^t$ denote the formal transpose with respect to $\mathcal{L}^n$ in the $\xi$ variable; since the coefficients of $L_{x,y}$ do not depend on $\xi$, $L_{x,y}^t=-L_{x,y}$. For every integer $N\ge 1$ with $m-N<-n$,
\begin{align*}
&\operatorname{Os}\!\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi}\eta(x,y)a(x,\xi)\,d\mathcal{L}^n(\xi)\\
&\qquad=\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi}(L_{x,y}^t)^N\left(\eta(x,y)a(x,\xi)\right)\,d\mathcal{L}^n(\xi).
\end{align*}
The coefficients of $L_{x,y}^t$ are smooth on $\operatorname{supp}\eta$ because $\operatorname{supp}\eta$ is compact and disjoint from the diagonal, so $|x-y|$ is bounded below there by a positive constant. Each application of $L_{x,y}^t$ differentiates once in $\xi$ and multiplies by a smooth coefficient bounded on $\operatorname{supp}\eta$. Hence $(L_{x,y}^t)^N(\eta a)$ is a symbol of order $m-N$ in $\xi$, uniformly for $(x,y)\in\operatorname{supp}\eta$. Since $m-N<-n$, the last integral is absolutely convergent, uniformly in $(x,y)$.
[guided]
The phase has no stationary point in the $\xi$ variable when $x \ne y$, because its $\xi$-gradient is the nonzero vector $x-y$. This is the reason the kernel becomes smooth away from the diagonal. We turn that observation into an estimate by using an integration-by-parts operator that reproduces the exponential.
For $(x,y)\in \operatorname{supp}\eta$, define
\begin{align*}
L_{x,y}: C^\infty(\mathbb{R}^n_\xi) &\to C^\infty(\mathbb{R}^n_\xi)
\end{align*}
by
\begin{align*}
L_{x,y}q(\xi)=\frac{1}{i|x-y|^2}\sum_{j=1}^n (x_j-y_j)\frac{\partial q}{\partial \xi_j}(\xi).
\end{align*}
This operator is chosen so that applying it to the exponential returns the exponential itself:
\begin{align*}
L_{x,y}\left(e^{i(x-y)\cdot \xi}\right)=\frac{1}{i|x-y|^2}\sum_{j=1}^n (x_j-y_j)i(x_j-y_j)e^{i(x-y)\cdot \xi}=e^{i(x-y)\cdot \xi}.
\end{align*}
Because $\operatorname{supp}\eta$ is compact and disjoint from the diagonal, the denominator $|x-y|^2$ is bounded below there by a positive constant. Thus the coefficients of $L_{x,y}$ and of its formal transpose $L_{x,y}^t$ are smooth and bounded on $\operatorname{supp}\eta$.
Now integrate by parts $N$ times in the $\xi$ variable. No boundary terms appear: one inserts a compactly supported cutoff in $\xi$, integrates by parts, and then passes to the limit that defines the oscillatory integral. Since $L_{x,y}$ fixes the exponential, we obtain
\begin{align*}
&\operatorname{Os}\!\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi}\eta(x,y)a(x,\xi)\,d\mathcal{L}^n(\xi)\\
&\qquad=\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi}(L_{x,y}^t)^N\left(\eta(x,y)a(x,\xi)\right)\,d\mathcal{L}^n(\xi).
\end{align*}
Each application of $L_{x,y}^t$ differentiates the amplitude once in $\xi$ and multiplies by a smooth coefficient depending on $(x,y)$. Since $a \in S^m_{1,0}(U \times \mathbb{R}^n)$, differentiating once in $\xi$ lowers the order by one. Therefore $(L_{x,y}^t)^N(\eta a)$ has order $m-N$ in $\xi$. Choose $N$ so that $m-N<-n$. Then the bound by a constant multiple of $(1+|\xi|)^{m-N}$ is integrable over $\mathbb{R}^n$ with respect to $\mathcal{L}^n$, so the resulting integral is absolutely convergent and locally uniform in $(x,y)$.
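The order count above can be made explicit. The following is a sketch under the assumption that $C_\gamma$ denotes a constant with $|\partial_\xi^{\gamma}a(x,\xi)|\le C_\gamma(1+|\xi|)^{m-|\gamma|}$ for all $x$ in the compact projection of $\operatorname{supp}\eta$ onto the first factor; the symbol $C_\gamma$ is introduced here for illustration. Since the coefficients of $L_{x,y}$ do not depend on $\xi$, the transpose is $L_{x,y}^t=-L_{x,y}$, and the multinomial expansion gives
\begin{align*}
(L_{x,y}^t)^N\left(\eta(x,y)a(x,\xi)\right)
&=\eta(x,y)\left(\frac{i}{|x-y|^2}\right)^{N}\sum_{|\gamma|=N}\frac{N!}{\gamma!}\,(x-y)^{\gamma}\,\partial_\xi^{\gamma}a(x,\xi),\\
\left|(L_{x,y}^t)^N\left(\eta(x,y)a(x,\xi)\right)\right|
&\le |\eta(x,y)|\,|x-y|^{-N}\left(\sum_{|\gamma|=N}\frac{N!}{\gamma!}\,C_\gamma\right)(1+|\xi|)^{m-N}.
\end{align*}
In polar coordinates, $\int_{\mathbb{R}^n}(1+|\xi|)^{m-N}\,d\mathcal{L}^n(\xi)$ is a constant multiple of $\int_0^\infty (1+r)^{m-N}r^{n-1}\,dr$, which is finite exactly when $m-N<-n$.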
[/guided]
[/step]
[step:Differentiate under the absolutely convergent integral]
Let $\alpha,\beta \in \mathbb{N}_0^n$ be multi-indices. Applying $\partial_x^\alpha\partial_y^\beta$ to
\begin{align*}
e^{i(x-y)\cdot \xi}\eta(x,y)a(x,\xi)
\end{align*}
produces a finite sum indexed by $\gamma$ of terms of the form
\begin{align*}
e^{i(x-y)\cdot \xi}p_{\gamma}(\xi)c_{\gamma}(x,y)\partial_x^{\alpha_\gamma}a(x,\xi),
\end{align*}
where, for each index $\gamma$ in that finite sum, $p_{\gamma}$ is a polynomial in $\xi$, $c_{\gamma}\in C_c^\infty(U \times U)$ is supported away from the diagonal, and $\alpha_\gamma$ is a multi-index. If the degree of $p_\gamma$ is $r_\gamma$, then $r_\gamma\le|\alpha|+|\beta|$ and this term has symbol order at most $m+r_\gamma$ in $\xi$. Repeating the previous integration-by-parts argument with $N$ chosen so that $m+r_\gamma-N<-n$ for every term, for instance any $N>m+n+|\alpha|+|\beta|$, gives an absolutely convergent integral representing $\partial_x^\alpha\partial_y^\beta(\eta K_P)$.
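Thus all mixed derivatives of $\eta K_P$ are represented by absolutely convergent integrals whose integrands are continuous in $(x,y)$ and dominated by an integrable function of $\xi$, so by dominated convergence they are continuous functions on $U \times U$. Hence $\eta K_P$ is smooth, and since $W$ was arbitrary, $K_P$ is smooth on $(U \times U)\setminus \{(x,x):x \in U\}$.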
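For instance, in the first-order case $|\alpha|+|\beta|=1$ the expansion can be written out directly (a worked case added for illustration; $j$ is any index in $\{1,\dots,n\}$):
\begin{align*}
\partial_{x_j}\left(e^{i(x-y)\cdot \xi}\eta(x,y)a(x,\xi)\right)
&=e^{i(x-y)\cdot \xi}\left(i\xi_j\,\eta(x,y)a(x,\xi)+\partial_{x_j}\eta(x,y)\,a(x,\xi)+\eta(x,y)\,\partial_{x_j}a(x,\xi)\right),\\
\partial_{y_j}\left(e^{i(x-y)\cdot \xi}\eta(x,y)a(x,\xi)\right)
&=e^{i(x-y)\cdot \xi}\left(-i\xi_j\,\eta(x,y)a(x,\xi)+\partial_{y_j}\eta(x,y)\,a(x,\xi)\right).
\end{align*}
The terms containing $\xi_j$ have order $m+1$ and the remaining terms have order $m$, so any $N$ with $m+1-N<-n$ handles all of them at once.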
[/step]
[step:Conclude the kernel formula and off-diagonal smoothness]
We have shown that the distribution
\begin{align*}
(2\pi)^{-n}\operatorname{Os}\!\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi}a(x,\xi)\,d\mathcal{L}^n(\xi)
\end{align*}
has the defining action of the Schwartz kernel of $P$ on every elementary tensor $\phi(x)u(y)$ with $u,\phi \in C_c^\infty(U)$. Therefore it is $K_P$. The localization argument proves that $K_P$ is represented by a smooth function on every open set whose closure is compact and disjoint from the diagonal. Such sets cover $(U \times U)\setminus \{(x,x):x \in U\}$, so $K_P$ is smooth there. This proves both assertions.
[/step]