[proofplan]
We define the kernel as the oscillatory distribution obtained from the phase $(x-y)\cdot \xi/h$ and the amplitude $a(x,\xi;h)$. Pairing this distribution against a tensor test function $v(x)u(y)$ reproduces exactly the defining oscillatory formula for $\operatorname{Op}_h(a)u$, so it is the Schwartz kernel of the operator. Away from the diagonal $x=y$, the phase has no stationary points in $\xi$, and repeated integration by parts in $\xi$ transfers powers of $h/|x-y|$ onto derivatives of the symbol. Taking sufficiently many integrations by parts makes the $\xi$-integral absolutely convergent, which gives both smoothness and the stated off-diagonal decay.
[/proofplan]
[step:Define the oscillatory kernel distribution by cutoff regularization]
Choose a cutoff function $\rho \in C_c^\infty(\mathbb{R}^n)$ such that $\rho(\xi)=1$ for $|\xi|\le 1$. For $\varepsilon \in (0,1]$, define
\begin{align*}
K_{a,\varepsilon}: \mathbb{R}^n \times \mathbb{R}^n &\to \mathbb{C}
\end{align*}
by
\begin{align*}
K_{a,\varepsilon}(x,y;h) := \frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h} \rho(\varepsilon \xi)a(x,\xi;h)\, d\mathcal{L}^n(\xi).
\end{align*}
Since $\rho(\varepsilon \xi)a(x,\xi;h)$ is compactly supported in $\xi$ and smooth in $(x,\xi)$, this integral defines a smooth function of $(x,y)$.
For $\Phi \in \mathcal{S}(\mathbb{R}^n_x \times \mathbb{R}^n_y)$, define
\begin{align*}
K_{a,\varepsilon}(\Phi) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} K_{a,\varepsilon}(x,y;h)\Phi(x,y)\, d\mathcal{L}^n(y)\, d\mathcal{L}^n(x).
\end{align*}
We next prove that the cutoff limit exists. For $\Phi \in \mathcal{S}(\mathbb{R}^n_x \times \mathbb{R}^n_y)$, define
\begin{align*}
B_\Phi: \mathbb{R}^n_\xi &\to \mathbb{C}
\end{align*}
by
\begin{align*}
B_\Phi(\xi;h) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h}a(x,\xi;h)\Phi(x,y)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(x).
\end{align*}
For each integer $q \geq 0$, integrate by parts in the $x$ variable using
\begin{align*}
e^{i(x-y)\cdot \xi/h} = (1 + |\xi|^2h^{-2})^{-q}(1 - \Delta_x)^q e^{i(x-y)\cdot \xi/h}.
\end{align*}
The transpose of $(1 - \Delta_x)^q$ with respect to $d\mathcal{L}^n(x)$ is itself, and all boundary terms vanish because $\Phi$ is Schwartz in $x$. Thus $B_\Phi(\xi;h)$ is bounded by a finite sum of integrals of derivatives $\partial_x^\mu a(x,\xi;h)$ multiplied by Schwartz functions in $(x,y)$. Since $a \in S^m(T^*\mathbb{R}^n)$, for every $q$ there is a constant $C_{q,\Phi,a,h_0}>0$ such that
\begin{align*}
|B_\Phi(\xi;h)| \leq C_{q,\Phi,a,h_0}(1 + |\xi|^2h^{-2})^{-q}\langle \xi\rangle^m
\end{align*}
for all $\xi \in \mathbb{R}^n$ and all $h \in (0,h_0]$. Choosing $q$ with $2q>m+n$ gives $B_\Phi(\cdot;h) \in L^1(\mathbb{R}^n,d\mathcal{L}^n)$. Therefore dominated convergence gives
\begin{align*}
\lim_{\varepsilon \downarrow 0}K_{a,\varepsilon}(\Phi)=\frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n}B_\Phi(\xi;h)\,d\mathcal{L}^n(\xi).
\end{align*}
This defines a continuous linear functional $K_a \in \mathcal{S}'(\mathbb{R}^n_x \times \mathbb{R}^n_y)$, and this distribution is precisely the oscillatory integral denoted by
\begin{align*}
K_a(x,y;h)=\frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n}e^{i(x-y)\cdot \xi/h}a(x,\xi;h)\,d\mathcal{L}^n(\xi).
\end{align*}
[/step]
[step:Pair the kernel with tensor test functions to identify the operator]
Let $u,v \in \mathcal{S}(\mathbb{R}^n)$, and define the tensor test function
\begin{align*}
\Phi_{v,u}: \mathbb{R}^n \times \mathbb{R}^n &\to \mathbb{C}
\end{align*}
by
\begin{align*}
\Phi_{v,u}(x,y) := v(x)u(y).
\end{align*}
For $\varepsilon \in (0,1]$, the cutoff integral is absolutely convergent, so Fubini's theorem applies to the compactly supported $\xi$-integral and the rapidly decreasing $x,y$ variables. Hence
\begin{align*}
K_{a,\varepsilon}(\Phi_{v,u}) = \int_{\mathbb{R}^n} v(x)\left[\frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h}\rho(\varepsilon \xi)a(x,\xi;h)u(y)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(\xi)\right]d\mathcal{L}^n(x).
\end{align*}
The preceding cutoff-convergence argument applies with $\Phi=\Phi_{v,u}$, so dominated convergence in the transformed $\xi$-integral justifies passing to the limit $\varepsilon \downarrow 0$. The limit of the bracketed expression is exactly the defining oscillatory integral for $\operatorname{Op}_h(a)u$ paired against $v$. Therefore
\begin{align*}
K_a(\Phi_{v,u}) = \operatorname{Op}_h(a)u(v).
\end{align*}
Thus $K_a$ is the Schwartz kernel of $\operatorname{Op}_h(a)$.
[guided]
The purpose of this step is to verify the kernel identity directly, not merely to name the Schwartz kernel theorem. We test the proposed kernel against a product test function because kernels are characterized by how they act on $v(x)u(y)$.
Let
\begin{align*}
\Phi_{v,u}: \mathbb{R}^n \times \mathbb{R}^n &\to \mathbb{C}
\end{align*}
be the Schwartz function
\begin{align*}
\Phi_{v,u}(x,y) := v(x)u(y).
\end{align*}
For the regularized kernel $K_{a,\varepsilon}$, the factor $\rho(\varepsilon \xi)$ makes the $\xi$-integration compactly supported. Since $u$ and $v$ are Schwartz functions and the cutoff amplitude is smooth with compact $\xi$-support, the full integral in $(x,y,\xi)$ is absolutely convergent. Therefore Fubini's theorem permits us to change the order of integration:
\begin{align*}
K_{a,\varepsilon}(\Phi_{v,u}) = \int_{\mathbb{R}^n} v(x)\left[\frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h}\rho(\varepsilon \xi)a(x,\xi;h)u(y)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(\xi)\right]d\mathcal{L}^n(x).
\end{align*}
The expression in brackets is exactly the regularized version of $\operatorname{Op}_h(a)u(x)$. To justify removing the cutoff, define
\begin{align*}
B_{v,u}: \mathbb{R}^n_\xi &\to \mathbb{C}
\end{align*}
by
\begin{align*}
B_{v,u}(\xi;h) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h}a(x,\xi;h)v(x)u(y)\,d\mathcal{L}^n(y)\,d\mathcal{L}^n(x).
\end{align*}
The same integration-by-parts argument used to construct $K_a$ shows that $B_{v,u}(\cdot;h) \in L^1(\mathbb{R}^n,d\mathcal{L}^n)$: applying $(1-\Delta_x)^q$ to the phase and moving it onto $a(x,\xi;h)v(x)u(y)$ gives a bound by $C_{q,u,v,a,h_0}(1+|\xi|^2h^{-2})^{-q}\langle\xi\rangle^m$, and $q$ is chosen so that $2q>m+n$. Hence dominated convergence permits the limit $\varepsilon \downarrow 0$ in the $\xi$-integral and gives
\begin{align*}
K_a(\Phi_{v,u}) = \operatorname{Op}_h(a)u(v).
\end{align*}
This identity is the kernel identity: the distribution $K_a$ encodes the operator because pairing $K_a$ against $v(x)u(y)$ gives the distribution $\operatorname{Op}_h(a)u$ evaluated on $v$.
[/guided]
[/step]
[step:Construct the off-diagonal integration-by-parts operator]
Fix $(x,y) \in \mathbb{R}^n \times \mathbb{R}^n$ with $x \ne y$. Define
\begin{align*}
L_{x,y}: C^\infty(\mathbb{R}^n_\xi) &\to C^\infty(\mathbb{R}^n_\xi)
\end{align*}
by
\begin{align*}
L_{x,y} b(\xi) := \frac{h}{i|x-y|^2}(x-y)\cdot \nabla_\xi b(\xi).
\end{align*}
Then
\begin{align*}
L_{x,y}\left(e^{i(x-y)\cdot \xi/h}\right)=e^{i(x-y)\cdot \xi/h}.
\end{align*}
The formal transpose with respect to Lebesgue measure $d\mathcal{L}^n(\xi)$ is
\begin{align*}
L_{x,y}^t b(\xi) := -\frac{h}{i|x-y|^2}(x-y)\cdot \nabla_\xi b(\xi).
\end{align*}
Thus, after $N$ integrations by parts in the $\xi$ variable,
\begin{align*}
K_a(x,y;h) = \frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h}(L_{x,y}^t)^N a(x,\xi;h)\,d\mathcal{L}^n(\xi)
\end{align*}
as an oscillatory integral.
[/step]
[step:Choose enough integrations by parts to make the off-diagonal integral absolutely convergent]
Let $E \subset \{(x,y):x\ne y\}$ be compact. Define
\begin{align*}
\delta_E := \inf_{(x,y)\in E}|x-y|.
\end{align*}
Since $E$ is compact and disjoint from the diagonal, $\delta_E>0$.
For each $N \in \mathbb{N}$, the operator $(L_{x,y}^t)^N$ is a finite linear combination of terms
\begin{align*}
c_{\nu,N}(x,y)h^N\partial_\xi^\nu
\end{align*}
where $\nu \in \mathbb{N}_0^n$ satisfies $|\nu|=N$ and the coefficient $c_{\nu,N}$ is a smooth function on $\{(x,y):x\ne y\}$ obtained from components of $(x-y)/|x-y|^2$. On $E$, these coefficients satisfy
\begin{align*}
|c_{\nu,N}(x,y)| \le A_{\nu,N}\delta_E^{-N}
\end{align*}
for constants $A_{\nu,N}>0$ depending only on $\nu$ and $N$. Since $a \in S^m(T^*\mathbb{R}^n)$, every $\xi$-derivative lowers symbolic order. Therefore, for each multiindex $\gamma \in \mathbb{N}_0^n$, there is a constant $C_{\gamma,N,E}>0$, depending on $\delta_E^{-1}$, on finitely many coefficients $A_{\nu,N}$, and on the symbol seminorms $\sup_{x,\xi,h}\langle\xi\rangle^{-m+N+|\gamma|}|\partial_x^\mu\partial_\xi^\lambda a(x,\xi;h)|$ with $|\lambda|\le N+|\gamma|$, such that
\begin{align*}
\left|\partial_\xi^\gamma (L_{x,y}^t)^N a(x,\xi;h)\right| \le C_{\gamma,N,E}h^N\langle \xi\rangle^{m-N-|\gamma|}
\end{align*}
for all $(x,y)\in E$, all $\xi \in \mathbb{R}^n$, and all $h\in(0,h_0]$.
Choose $N>m+n$. Then $\langle \xi\rangle^{m-N}$ is integrable over $\mathbb{R}^n$ with respect to $\mathcal{L}^n$. Consequently the integral
\begin{align*}
\frac{1}{(2\pi h)^n}\int_{\mathbb{R}^n} e^{i(x-y)\cdot \xi/h}(L_{x,y}^t)^N a(x,\xi;h)\,d\mathcal{L}^n(\xi)
\end{align*}
is absolutely convergent uniformly for $(x,y)\in E$. This gives a genuine smooth representative of $K_a$ on $E$.
[/step]
[step:Differentiate the off-diagonal formula and estimate the derivatives]
Let $\alpha,\beta \in \mathbb{N}_0^n$ be fixed, and let $N \in \mathbb{N}$ satisfy $N>m+|\alpha|$. Choose an integer
\begin{align*}
M>N+|\alpha|+|\beta|+n+\max\{m,0\}.
\end{align*}
We use $M$ integrations by parts to obtain an absolutely convergent formula, and only at the end weaken the resulting power of $h$ to the power stated with $N$.
Differentiate $\partial_x^\alpha\partial_y^\beta$ of the formula with $(L_{x,y}^t)^M$. This produces a finite sum of terms in which derivatives fall on the phase, on the coefficients of $L_{x,y}^t$, and on the symbol $a$. If $r$ derivatives fall on the phase, then $0\le r\le |\alpha|+|\beta|$ and the resulting factor is bounded by $h^{-r}\langle\xi\rangle^r$. Derivatives falling on the coefficients of $L_{x,y}^t$ are bounded on $E$ by constants depending on $\delta_E^{-1}$ and on finitely many derivatives of the smooth functions $(x-y)_j/|x-y|^2$. Derivatives falling on $a$ are controlled by finitely many symbol seminorms, and each $\xi$-derivative produced by $(L_{x,y}^t)^M$ lowers symbolic order by one. Hence each differentiated term is bounded on $E$ by
\begin{align*}
C_{\alpha,\beta,M,E}h^M h^{-|\alpha|-|\beta|}\langle \xi\rangle^{m+|\alpha|+|\beta|-M}
\end{align*}
where $C_{\alpha,\beta,M,E}>0$ depends on $\delta_E^{-1}$, $h_0$, finitely many derivatives of the coefficients of $L_{x,y}^t$, and finitely many symbol seminorms of $a$.
The choice of $M$ gives $M>m+|\alpha|+|\beta|+n$, so the function $\xi \mapsto \langle\xi\rangle^{m+|\alpha|+|\beta|-M}$ is integrable on $\mathbb{R}^n$ with respect to $\mathcal{L}^n$. Differentiation under the absolutely convergent integral is therefore justified, and multiplying by the prefactor $(2\pi h)^{-n}$ gives
\begin{align*}
|\partial_x^\alpha \partial_y^\beta K_a(x,y;h)| \le C'_{\alpha,\beta,M,E}h^{M-n-|\alpha|-|\beta|}
\end{align*}
for all $(x,y)\in E$ and all $h\in(0,h_0]$.
Since $M>N$ and $0<h\le h_0$, we have
\begin{align*}
h^{M-n-|\alpha|-|\beta|}=h^{N-n-|\alpha|-|\beta|}h^{M-N}\le h_0^{M-N}h^{N-n-|\alpha|-|\beta|}.
\end{align*}
Absorbing the factor $h_0^{M-N}$ into the constant yields
\begin{align*}
|\partial_x^\alpha \partial_y^\beta K_a(x,y;h)| \le C_{\alpha,\beta,N,E}h^{N-n-|\alpha|-|\beta|}
\end{align*}
for all $(x,y)\in E$ and all $h\in(0,h_0]$.
Because this argument applies to arbitrary multiindices $\alpha$ and $\beta$, all derivatives exist and are continuous on every compact set away from the diagonal. Hence $K_a$ is smooth on $\{(x,y):x\ne y\}$.
[/step]
[step:Conclude the kernel representation and the off-diagonal regularity]
The distribution $K_a$ defined by the oscillatory integral satisfies
\begin{align*}
K_a(v\otimes u)=\operatorname{Op}_h(a)u(v)
\end{align*}
for all $u,v\in\mathcal{S}(\mathbb{R}^n)$, so it is the Schwartz kernel of $\operatorname{Op}_h(a)$. The integration-by-parts argument shows that this kernel is represented by a smooth function away from $x=y$ and satisfies the stated off-diagonal estimates. This proves the theorem.
[/step]