This course develops the microlocal viewpoint on distributions: not just where singularities live, but how they are distributed in direction as well. The central object is the wave front set, which refines the notion of singular support by recording the cotangent directions in which a distribution fails to be smooth. From there, the course builds a calculus for detecting, tracking, and transforming singularities under pseudodifferential operators, products, pullbacks, and other basic operations on distributions.
The main themes are singularity analysis, symplectic geometry, and the dynamics of linear partial differential equations. After introducing wave front sets and their behavior under pseudodifferential operators (the calculus developed in [Microlocal Analysis I: Pseudodifferential Operators](/page/Microlocal%20Analysis%20I%3A%20Pseudodifferential%20Operators)), the course studies conormal and Lagrangian distributions as model classes of singularities, then turns to real principal type operators and propagation along bicharacteristics. Oscillatory integrals and the [Stationary Phase Lemma](/theorems/645) provide the analytic bridge to Fourier integral operators, whose canonical relations encode how singularities move under generalized changes of variables. Composition, adjoints, and [Egorov's theorem](/theorems/896) then show how this geometric picture controls operator calculus.
The later chapters apply this framework to hyperbolic equations and parametrix construction, where Fourier integral operators capture the structure of solution operators and the propagation of wave fronts. The course is cumulative: the early chapters establish the language of microlocal regularity, the middle chapters develop the geometric and oscillatory machinery, and the final chapters use that machinery to analyze PDEs and concrete singular phenomena in a unified way.
# Introduction
Microlocal analysis refines the study of singularities by recording both their position and their cotangent direction. The prerequisite course [Microlocal Analysis I: Pseudodifferential Operators](/page/Microlocal%20Analysis%20I%3A%20Pseudodifferential%20Operators) supplied the calculus needed to cut a distribution in phase space; this course uses that calculus to describe how singularities are detected, transformed, and propagated. The central objects are the wave front set, canonical relations, and Fourier integral operators, with hyperbolic PDEs as the guiding class of examples.
These notes assume familiarity with [Distribution](/page/Distribution), [Fourier Transform](/page/Fourier%20Transform), [Sobolev Space](/page/Sobolev%20Space), smooth manifolds, cotangent bundles, and the principal symbol calculus from [Microlocal Analysis I: Pseudodifferential Operators](/page/Microlocal%20Analysis%20I%3A%20Pseudodifferential%20Operators). The aim of this introductory chapter is to set the agenda: why ordinary support and singular support are too coarse, why cotangent directions are the right refinement, and how the later chapters fit into a single geometric picture.
## Why Singular Support Is Not Enough
What information is lost if we only ask where a distribution fails to be smooth? A distribution may be singular at a point for many different geometric reasons: a jump across a hypersurface, a point mass, a conormal concentration, or rapid oscillation approaching the point. The singular support records the base point, but it forgets the frequency directions in which the failure of smoothness is visible.
[definition: Singular Support]
Let $u \in \mathcal{D}'(U)$, where $U \subset \mathbb R^n$ is open. The singular support of $u$, denoted $\operatorname{sing\,supp} u$, is the complement in $U$ of the set of points $x_0 \in U$ for which there exists $\chi \in C_c^\infty(U)$ with $\chi = 1$ on a neighbourhood of $x_0$ and $\chi u \in C^\infty_c(U)$.
[/definition]
The definition localises in $x$, but it does not localise in the dual variable $\xi$. Microlocal analysis keeps the spatial cutoff and adds a conic frequency cutoff, so that one asks whether $\widehat{\chi u}(\xi)$ decays rapidly near a specified non-zero covector direction.
[example: Jump Across A Hypersurface]
Let $u=\mathbb 1_{\{x_1>0\}}$ on $\mathbb R^n$, and write $x=(x_1,x')$. Away from the hyperplane $\{x_1=0\}$ the function is locally constant, hence smooth; at any point $(0,y_0)$ on the hyperplane every neighbourhood contains open sets where $u=0$ and open sets where $u=1$, so no continuous function, and in particular no smooth one, can agree with $u$ almost everywhere there. Thus
\begin{align*}
\operatorname{sing\,supp}u=\{x_1=0\}.
\end{align*}
Choose a cutoff of product form $\chi(x_1,x')=\alpha(x_1)\beta(x')$, with $\alpha,\beta\in C_c^\infty$, $\alpha(0)\ne 0$, and $\int \beta(x')\,dx'\ne 0$. Then
\begin{align*}
\widehat{\chi u}(\xi_1,\xi')=\left(\int_0^\infty \alpha(x_1)e^{-ix_1\xi_1}\,dx_1\right)\left(\int_{\mathbb R^{n-1}}\beta(x')e^{-ix'\cdot \xi'}\,dx'\right).
\end{align*}
Along the normal ray $\xi' =0$, set $t=\xi_1$. Since $\alpha$ is compactly supported, [integration by parts](/theorems/210) on $[0,\infty)$ gives
\begin{align*}
\int_0^\infty \alpha(s)e^{-its}\,ds=\frac{\alpha(0)}{it}+\frac{1}{it}\int_0^\infty \alpha'(s)e^{-its}\,ds.
\end{align*}
Applying the same [integration by parts](/theorems/2098) to the remaining integral gives
\begin{align*}
\int_0^\infty \alpha(s)e^{-its}\,ds=\frac{\alpha(0)}{it}+O(t^{-2}).
\end{align*}
Therefore
\begin{align*}
\widehat{\chi u}(t,0)=\left(\frac{\alpha(0)}{it}+O(t^{-2})\right)\int_{\mathbb R^{n-1}}\beta(x')\,dx',
\end{align*}
so $\widehat{\chi u}(t,0)$ decays like $|t|^{-1}$ as $|t|\to\infty$, not faster than every power of $|t|$. In a closed cone that avoids the $\xi_1$-axis one has $|\xi'|\ge c|\xi|$ for some $c>0$; there the first factor is bounded uniformly in $\xi_1$ by $\int|\alpha|$, while repeated integration by parts in the $x'$ variables shows that the second factor decays rapidly in $|\xi'|$, because derivatives fall only on the smooth compactly supported factor $\beta$. Thus the singularity is detected only in the conormal directions, parallel to $dx_1$, and not at covectors with a non-zero tangential component.
[/example]
This example already indicates why cotangent variables enter. A hypersurface singularity is not equally singular in every direction; it is tied to the conormal bundle of the hypersurface.
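Anticipating the definition of the wave front set given later in this chapter, the outcome of the example can be summarised in one formula. The display is a sketch of the expected answer, stated without a complete proof; the notation $\Sigma=\{x_1=0\}$ and $N^*\Sigma$ for its conormal bundle is introduced only for this illustration.
\begin{align*}
\operatorname{WF}(\mathbb 1_{\{x_1>0\}})=\{(0,x';\xi_1,0): x'\in\mathbb R^{n-1},\ \xi_1\ne 0\}=N^*\Sigma\setminus 0.
\end{align*}
Both signs of $\xi_1$ occur, because the leading term $\alpha(0)/(it)$ in the example has size $|t|^{-1}$ both as $t\to+\infty$ and as $t\to-\infty$.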
## Local Fourier Decay As A Smoothness Test
How can Fourier analysis distinguish smooth behaviour from singular behaviour after a spatial cutoff? Compactly supported smooth functions have rapidly decaying Fourier transforms, while the Fourier transform of a general compactly supported distribution is only guaranteed to have at most polynomial growth. Thus rapid decay of $\widehat{\chi u}$ is the Fourier-side signature of smoothness near the region where $\chi$ is equal to $1$.
[definition: Rapid Decay In A Cone]
Let $\mathbb R^n_0 := \mathbb R^n \setminus \{0\}$. Let $\Gamma \subset \mathbb R^n_0$ be an open conic set and let $v \in \mathcal{D}'(\mathbb R^n)$ have compact support. We say that $\hat v$ is rapidly decreasing in $\Gamma$ if for every $N \in \mathbb N$ there exists $C_N > 0$ such that
\begin{align*}
|\hat v(\xi)| \le C_N(1+|\xi|)^{-N}
\end{align*}
for all $\xi \in \Gamma$.
[/definition]
The cone condition matters because covectors are directional objects: multiplying $\xi$ by a positive scalar does not change the direction of oscillation. Before using conic decay to define directional regularity, we first need the non-directional benchmark showing that rapid decay in every direction is the same as ordinary smoothness.
[quotetheorem:8161]
[citeproof:8161]
The compact support hypothesis after applying $\chi$ is essential: the constant function $1$ is smooth, but its global Fourier transform is a multiple of $\delta_0$ rather than an ordinary rapidly decreasing function. The requirement that $\chi=1$ near $x_0$ is also essential. If $u=\delta_0$ and $x_0=0$, choosing a cutoff with $\chi(0)=0$ gives $\chi u=\chi(0)\delta_0=0$, falsely certifying smoothness at the singular point. Finally, rapid decay has to hold in every frequency direction for ordinary smoothness: the hypersurface jump $u=\mathbb{1}_{\{x_1>0\}}$ has improved decay in tangential directions after localisation, but it remains singular at $x_1=0$ because the normal covector directions fail rapid decay. The theorem therefore supplies the non-directional benchmark from which the wave front set is obtained by replacing all of $\mathbb R^n_0$ with a single conic neighbourhood.
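As a quick sanity check of this benchmark, consider a one-dimensional function that is compactly supported but not smooth. The choice $v=\mathbb 1_{[-1,1]}$ on $\mathbb R$ and the unnormalised convention $\hat v(\xi)=\int e^{-ix\xi}v(x)\,dx$ are assumptions made for this illustration.
\begin{align*}
\hat v(\xi)=\int_{-1}^{1}e^{-ix\xi}\,dx=\frac{e^{-i\xi}-e^{i\xi}}{-i\xi}=\frac{2\sin\xi}{\xi}.
\end{align*}
Along the sequence $\xi_k=\pi/2+k\pi$ one has $|\hat v(\xi_k)|=2/|\xi_k|$, so the rapid-decay estimate already fails for $N=2$. This is consistent with the jumps of $v$ at $x=\pm 1$.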
## The Microlocal Question
What should it mean for a distribution to be regular at a point in one cotangent direction but singular in another? The answer is to test localised Fourier decay only near a chosen non-zero covector. Failure of rapid decay in that direction becomes membership in the wave front set.
[definition: Microlocal Regularity In Euclidean Space]
Let $u \in \mathcal{D}'(U)$, let $x_0 \in U$, and let $\xi_0 \in \mathbb R^n_0$. We say that $u$ is microlocally smooth at $(x_0, \xi_0)$ if there exist $\chi \in C_c^\infty(U)$ with $\chi(x_0) \ne 0$ and an open conic neighbourhood $\Gamma$ of $\xi_0$ such that $\widehat{\chi u}$ is rapidly decreasing in $\Gamma$.
[/definition]
The condition $\chi(x_0)\ne 0$ is equivalent to the more common cutoff convention $\chi=1$ near $x_0$ after shrinking the neighbourhood and multiplying by a second bump. Indeed, if $\chi(x_0)\ne 0$, choose $\rho\in C_c^\infty(U)$ with $\rho=1$ near $x_0$ and with support contained where $\chi$ is non-zero; then $\rho u=(\rho/\chi)\chi u$, and multiplication by the smooth compactly supported function $\rho/\chi$ preserves rapid decay in slightly smaller cones. Later arguments freely use cutoffs equal to $1$ near the base point with this reduction understood.
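The claim that multiplication by a smooth compactly supported function preserves rapid decay in slightly smaller cones can be unpacked by a convolution estimate. The following is a sketch under assumptions not fixed in the text: $v=\chi u$ satisfies $|\hat v(\zeta)|\le C(1+|\zeta|)^M$ everywhere and $|\hat v(\zeta)|\le C_N(1+|\zeta|)^{-N}$ in $\Gamma$; $\psi=\rho/\chi$ is extended by zero to an element of $C_c^\infty$ with $|\hat\psi(\eta)|\le C_K(1+|\eta|)^{-K}$; $\Gamma'\subset\Gamma$ is a closed cone in $\mathbb R^n_0$; the constant $c\in(0,1)$ is chosen so that $\xi\in\Gamma'$ and $|\eta|\le c|\xi|$ imply $\xi-\eta\in\Gamma$; and the Fourier transform is unnormalised.
\begin{align*}
\widehat{\psi v}(\xi)&=(2\pi)^{-n}\int_{\mathbb R^n}\hat\psi(\eta)\,\hat v(\xi-\eta)\,d\eta,\\
\left|\int_{|\eta|\le c|\xi|}\hat\psi(\eta)\,\hat v(\xi-\eta)\,d\eta\right|&\le C_N\,\|\hat\psi\|_{L^1}\,(1+(1-c)|\xi|)^{-N},\\
\left|\int_{|\eta|> c|\xi|}\hat\psi(\eta)\,\hat v(\xi-\eta)\,d\eta\right|&\le C\,C_K\int_{|\eta|>c|\xi|}(1+|\eta|)^{-K}\,(1+(1+c^{-1})|\eta|)^{M}\,d\eta.
\end{align*}
The second line uses $|\xi-\eta|\ge(1-c)|\xi|$ on the inner region, and the third uses $|\xi-\eta|\le(1+c^{-1})|\eta|$ on the outer region. For $K>M+n$ the last integral is $O((1+|\xi|)^{M+n-K})$, and since $N$ and $K$ are arbitrary, both pieces are rapidly decreasing for $\xi\in\Gamma'$.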
The wave front set is the complementary set of directions where this test fails. Its definition is conic in the covector variable and local in the base variable.
[definition: Wave Front Set In Euclidean Space]
Let $u \in \mathcal{D}'(U)$. The wave front set of $u$, denoted $\operatorname{WF}(u)$, is the subset of $U \times (\mathbb R^n \setminus \{0\})$ consisting of all pairs $(x_0, \xi_0)$ at which $u$ is not microlocally smooth.
[/definition]
Although the formal definition belongs to the next chapter, it is useful to see its role immediately: it is a refinement of singular support that remembers direction. Projecting $\operatorname{WF}(u)$ to the base should therefore recover the singular support, and proving this checks that no ordinary singularities have been lost by the directional refinement.
[quotetheorem:8162]
[citeproof:8162]
The non-zero covector condition is necessary: including the zero section would destroy the conic compactness argument, since frequency direction is defined only after removing $\xi=0$. A concrete failure occurs for a smooth compactly supported bump $\rho\in C_c^\infty(U)$ with $\rho(x_0)\ne 0$: if the zero covector were tested as an ordinary direction, $\widehat{\rho}$ need not vanish near $\xi=0$, so the pair $(x_0,0)$ could be incorrectly recorded although $x_0\notin\operatorname{sing\,supp}\rho$. The theorem also depends on covering the non-zero directions by cones on the unit sphere; without conic neighbourhoods, infinitely many isolated frequency tests would not combine into the uniform rapid-decay estimate used in the proof. Note that the theorem does not identify which covectors occur over a singular point; it only says that at least one must occur and that none occur over smooth points. This makes the wave front set a refinement rather than a replacement with unrelated information. The next example marks the extreme case in which every non-zero covector over the singular point is present.
[example: Dirac Mass At The Origin]
For $u=\delta_0$ on $\mathbb R^n$, the singular support is exactly $\{0\}$. If $x_0\ne 0$, choose $\chi\in C_c^\infty(\mathbb R^n)$ supported in a small ball not containing $0$ and equal to $1$ near $x_0$; then $\chi\delta_0=0$, so $u$ is smooth near $x_0$. At $0$, suppose $\delta_0$ were represented by a smooth function $f$ on a neighbourhood $V$ of $0$. For every [test function](/page/Test%20Function) $\varphi\in C_c^\infty(V\setminus\{0\})$,
\begin{align*}
\int_V f(x)\varphi(x)\,dx=\langle \delta_0,\varphi\rangle=\varphi(0)=0.
\end{align*}
Hence $f=0$ on $V\setminus\{0\}$, and by continuity $f(0)=0$, so $f=0$ on $V$. Choosing $\psi\in C_c^\infty(V)$ with $\psi(0)=1$ gives
\begin{align*}
0=\int_V f(x)\psi(x)\,dx=\langle \delta_0,\psi\rangle=\psi(0)=1,
\end{align*}
a contradiction. Thus $\operatorname{sing\,supp}\delta_0=\{0\}$.
Now let $\chi\in C_c^\infty(\mathbb R^n)$ with $\chi(0)\ne 0$. For every test function $\varphi$,
\begin{align*}
\langle \chi\delta_0,\varphi\rangle=\langle \delta_0,\chi\varphi\rangle=\chi(0)\varphi(0)=\langle \chi(0)\delta_0,\varphi\rangle.
\end{align*}
Therefore $\chi\delta_0=\chi(0)\delta_0$. With the unnormalised Fourier convention $\hat v(\xi)=\langle v,e^{-ix\cdot\xi}\rangle$ used in this introduction,
\begin{align*}
\widehat{\chi\delta_0}(\xi)=\langle \chi(0)\delta_0,e^{-ix\cdot \xi}\rangle=\chi(0)e^{-i0\cdot \xi}=\chi(0).
\end{align*}
Fix any non-zero covector $\xi_0$ and any open conic neighbourhood $\Gamma$ of $\xi_0$. Since $t\xi_0\in\Gamma$ for all $t>0$, rapid decay in $\Gamma$ with $N=1$ would require a constant $C_1$ such that
\begin{align*}
|\chi(0)|\le C_1(1+t|\xi_0|)^{-1}
\end{align*}
for all $t>0$. The right-hand side tends to $0$ as $t\to\infty$, while $|\chi(0)|>0$, so no such estimate exists. Hence every non-zero frequency direction over $0$ fails the microlocal smoothness test, and
\begin{align*}
\operatorname{WF}(\delta_0)=\{0\}\times(\mathbb R^n\setminus\{0\}).
\end{align*}
[/example]
The Dirac mass is the opposite extreme from the hypersurface jump: it is singular in every cotangent direction over its support. Much of the course develops tools for distinguishing such patterns without recomputing Fourier transforms from first principles.
## Pseudodifferential Operators As Directional Cutoffs
How can the Fourier cutoff viewpoint be made invariant and compatible with differential equations? Pseudodifferential operators supply the answer. Their symbols localise in both $x$ and $\xi$, and ellipticity becomes the condition that a microlocal cutoff sees all components of the distribution in a chosen conic region.
[definition: Microlocal Ellipticity]
Let $U \subset \mathbb R^n$ be open, and let $A \in \Psi^m(U)$ be a pseudodifferential operator acting continuously $A:C_c^\infty(U) \to C^\infty(U)$ and extending by duality to distributions where the pairing is defined. Let
\begin{align*}
a_m:T^*U\setminus 0 \longrightarrow \mathbb C
\end{align*}
be the principal symbol of $A$, homogeneous of degree $m$ in the cotangent variable. We say that $A$ is elliptic at $(x_0,\xi_0) \in T^*U \setminus 0$ if $a_m(x,\xi) \ne 0$ for all $(x,\xi)$ in some conic neighbourhood of $(x_0,\xi_0)$ with $|\xi|$ sufficiently large. Since $a_m$ is continuous and homogeneous, this is equivalent to $a_m(x_0,\xi_0)\ne 0$.
[/definition]
Ellipticity is the bridge between symbol calculus and wave front sets because an [elliptic operator](/page/Elliptic%20Operator) has a microlocal inverse modulo smoothing errors. The next principle records the detection property that will later replace direct Fourier estimates in most arguments.
[quotetheorem:8163]
[citeproof:8163]
Ellipticity cannot be dropped. On $\mathbb R$ the operator $A=D_x$ annihilates $u=1$, but this only reflects the vanishing of the symbol $\xi$ at the excluded zero covector: $D_x$ is elliptic on all of $T^*\mathbb R\setminus 0$, so it gives no counterexample. A genuine failure occurs for $P=D_{x_1}$ on $\mathbb R^2$: the principal symbol is $\xi_1$, so $P$ is not elliptic at covectors with $\xi_1=0$. The distribution $u=\delta(x_2)$ has wave front set in the $\pm dx_2$ directions over $\{x_2=0\}$, exactly where $\xi_1=0$, but $Pu=0$, showing that a characteristic direction can contain singularities invisible to the operator. The principle therefore uses ellipticity to replace direct Fourier estimates with parametrices, which is the form needed for propagation and for invariant definitions on manifolds.
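The example $P=D_{x_1}$, $u=\delta(x_2)$ can be checked by hand. The sketch below assumes a product cutoff $\chi(x_1,x_2)=\alpha(x_1)\beta(x_2)$ with $\alpha,\beta\in C_c^\infty(\mathbb R)$, $\alpha\ge 0$ not identically zero, and $\beta(0)\ne 0$, a choice made only for this illustration, together with the unnormalised Fourier convention.
\begin{align*}
\widehat{\chi\,\delta(x_2)}(\xi_1,\xi_2)&=\beta(0)\int_{\mathbb R}\alpha(x_1)e^{-ix_1\xi_1}\,dx_1=\beta(0)\,\hat\alpha(\xi_1),\\
D_{x_1}\delta(x_2)&=0.
\end{align*}
The first line is independent of $\xi_2$. In a cone where $|\xi_1|\ge c|\xi|$ it is rapidly decreasing because $\hat\alpha$ is, while along the rays $\xi_1=0$ it equals the non-zero constant $\beta(0)\int\alpha$. The only covectors that fail the decay test are therefore those with $\xi_1=0$, which is precisely where $P$ is not elliptic.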
## Propagation And Canonical Geometry
Once singularities have directions, the next question is how those directions move under PDEs. For elliptic equations, singularities cannot hide inside the elliptic region. For real principal type hyperbolic equations, singularities travel along the Hamilton flow of the principal symbol.
[definition: Hamilton Vector Field]
Let $p \in C^\infty(T^*X)$, where $X$ is a smooth manifold. The Hamilton vector field of $p$ is the smooth section of $T(T^*X)\to T^*X$
\begin{align*}
H_p:T^*X \longrightarrow T(T^*X)
\end{align*}
specified by the identity $\omega(H_p,V)=dp(V)$ for every vector field $V$ on $T^*X$, where $\omega$ is the canonical symplectic form, written $\omega=\sum_{j=1}^n dx_j\wedge d\xi_j$ in canonical coordinates. In local coordinates $(x_1,\dots,x_n)$ on $X$ and cotangent coordinates $(\xi_1,\dots,\xi_n)$, it is given by
\begin{align*}
H_p = \sum_{j=1}^{n} \frac{\partial p}{\partial \xi_j}\frac{\partial}{\partial x_j} - \sum_{j=1}^{n} \frac{\partial p}{\partial x_j}\frac{\partial}{\partial \xi_j}.
\end{align*}
[/definition]
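Two standard symbols illustrate the coordinate formula. Both are chosen here for illustration and are not taken from the text: the transport symbol $p=\xi_1$ on $T^*\mathbb R^n$ and the harmonic oscillator $q=\tfrac12(\xi^2+x^2)$ on $T^*\mathbb R$.
\begin{align*}
H_p&=\frac{\partial}{\partial x_1}, & H_q&=\xi\frac{\partial}{\partial x}-x\frac{\partial}{\partial \xi}.
\end{align*}
The integral curves of $H_p$ are straight lines in the $x_1$ direction with $\xi$ fixed. Those of $H_q$ are the circles $(x(t),\xi(t))=(x_0\cos t+\xi_0\sin t,\ \xi_0\cos t-x_0\sin t)$. In both cases the symbol is constant along the curves, which reflects the general identity $H_pp=0$.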
As a derivation on smooth functions, the same vector field acts by the Poisson bracket $H_p f=df(H_p)=\{p,f\}=\sum_{j=1}^n\left(\frac{\partial p}{\partial \xi_j}\frac{\partial f}{\partial x_j}-\frac{\partial p}{\partial x_j}\frac{\partial f}{\partial \xi_j}\right)$ for $f\in C^\infty(T^*X)$. The Hamilton vector field is the infinitesimal direction in phase space determined by the principal symbol. To state propagation for a PDE, we restrict its integral curves to the characteristic set, because away from that set ellipticity has already removed the possible singularities.
[definition: Bicharacteristic]
Let $p \in C^\infty(T^*X)$ be real-valued. A bicharacteristic of $p$ is a smooth curve
\begin{align*}
\gamma:I\longrightarrow T^*X\setminus 0
\end{align*}
from an interval $I\subset\mathbb R$ such that $\gamma(I)\subset \{(x,\xi) \in T^*X \setminus 0 : p(x,\xi)=0\}$ and
\begin{align*}
\gamma'(t)=H_p(\gamma(t))
\end{align*}
for every $t\in I$.
[/definition]
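For a concrete instance, consider flat space-time $\mathbb R_t\times\mathbb R^n_x$ with dual variables $(\tau,\xi)$ and the symbol $p=\tau^2-|\xi|^2$, which is the principal symbol of the wave operator up to an overall sign that depends on convention. This choice is an assumption made for illustration.
\begin{align*}
H_p&=2\tau\frac{\partial}{\partial t}-2\sum_{j=1}^n\xi_j\frac{\partial}{\partial x_j},\\
\gamma(s)&=(t_0+2\tau_0 s,\ x_0-2\xi_0 s;\ \tau_0,\ \xi_0),\qquad \tau_0^2=|\xi_0|^2,\ (\tau_0,\xi_0)\ne 0.
\end{align*}
The covector stays constant along $\gamma$ because $p$ does not depend on $(t,x)$. Eliminating $s$ gives $x=x_0-(\xi_0/\tau_0)(t-t_0)$, a straight line traversed at unit speed since $|\xi_0/\tau_0|=1$ on the characteristic set.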
Bicharacteristics identify the only phase-space curves along which singularities can persist once elliptic regularity has been removed. A definition of these curves is not enough for applications: to solve hyperbolic equations, we need a theorem saying that wave front sets are constrained to move along them. The propagation principle supplies this constraint under real principal type hypotheses, where the Hamilton flow gives a non-degenerate direction of travel through the characteristic set.
[quotetheorem:8164]
[citeproof:8164]
The real principal type hypothesis is essential. A model failure is $P=D_{x_1}^2$ on $\mathbb R^2$, whose principal symbol is $p(\xi)=\xi_1^2$; on the characteristic set $\{\xi_1=0,\ \xi_2\ne 0\}$ one has $dp=2\xi_1\,d\xi_1=0$, so the Hamilton vector field vanishes and there is no non-degenerate bicharacteristic direction along which the theorem could propagate regularity. The one-dimensional operator $D_x^2$ would not illustrate this, because its symbol $\xi^2$ vanishes only at the excluded zero covector and the operator is therefore elliptic. Multiple-characteristic operators such as $D_t^2-t^2D_x^2$ show the same obstruction in a hyperbolic-looking setting: the principal symbol has a double characteristic at $t=0,\tau=0$, and singular behaviour near that set is not governed by a single smooth Hamilton flow. The theorem does not create regularity at a singular source; it says that away from $\operatorname{WF}(Pu)$ the remaining singularities must occupy whole bicharacteristic arcs rather than isolated points. For the wave operator this is the microlocal form of finite-speed propagation along null geodesics, and in symplectic language those geodesic lifts are Hamilton trajectories. It also motivates Fourier integral operators, whose canonical relations encode the same movement of covectors.
Geometrically, one should picture the characteristic set as the phase-space surface on which the principal symbol vanishes. Elliptic regions off that surface have already been cleared of possible wave front set, while the Hamilton vector field draws the allowed curves on the characteristic surface itself. Propagation says that, except where the equation has a microlocal source term, singular covectors cannot jump from one such curve to another.
## Fourier Integral Operators As Transport Of Singularities
What class of operators describes solution operators for hyperbolic equations, pullbacks by maps, and oscillatory integral parametrices in one framework? Fourier integral operators answer this by replacing the graph of a cotangent map with a more general canonical relation.
[definition: Oscillatory Integral Model]
Let $X$ and $Y$ be smooth manifolds, and work in coordinate patches $U_X\subset X$ and $U_Y\subset Y$. Let $N\in \mathbb N$ with $N\ge 1$, let $\theta$ denote the phase variable in $\mathbb R^N\setminus 0$, let
\begin{align*}
\phi:U_X\times U_Y\times(\mathbb R^N\setminus 0)\longrightarrow \mathbb R
\end{align*}
be a smooth non-degenerate phase function, homogeneous of degree $1$ in $\theta$, and let
\begin{align*}
a\in S^\mu(U_X\times U_Y\times \mathbb R^N)
\end{align*}
be a classical symbol in the $\theta$ variable. Locally, an oscillatory integral operator is first defined on test functions as a continuous [linear map](/page/Linear%20Map) $T:C_c^\infty(U_Y) \to \mathcal D'(U_X)$ of the form
\begin{align*}
Tf(x) = \int e^{i\phi(x,y,\theta)} a(x,y,\theta) f(y)\,d\theta\,dy,
\end{align*}
with integration over $\theta\in\mathbb R^N$ and $y\in U_Y$, interpreted as an oscillatory integral.
[/definition]
The phase function generates a relation between input and output covectors through its stationary set. To make that relation intrinsic, we package it as a conic Lagrangian submanifold of a product of cotangent bundles with the source symplectic form reversed.
[definition: Canonical Relation]
A canonical relation from $Y$ to $X$ is a conic Lagrangian submanifold $C \subset (T^*X \setminus 0) \times (T^*Y \setminus 0)$ with respect to the symplectic form $\pi_X^*\omega_X - \pi_Y^*\omega_Y$.
[/definition]
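The simplest instance comes from the phase of a pseudodifferential operator. As a sketch, take $X=Y=\mathbb R^n$, $N=n$, $\phi(x,y,\theta)=(x-y)\cdot\theta$ and $a=(2\pi)^{-n}$, so that the oscillatory integral model reduces to Fourier inversion and $T$ is the identity. The parametrisation $\xi=\partial_x\phi$, $\eta=-\partial_y\phi$ on the stationary set $\{\partial_\theta\phi=0\}$ is the usual convention and is assumed here rather than derived.
\begin{align*}
C_\phi&=\{(x,\partial_x\phi;\,y,-\partial_y\phi):\partial_\theta\phi=0\}=\{(x,\theta;\,x,\theta):x\in\mathbb R^n,\ \theta\ne 0\},\\
(\pi_X^*\omega_X-\pi_Y^*\omega_Y)\big|_{C_\phi}&=\sum_{j=1}^n dx_j\wedge d\theta_j-\sum_{j=1}^n dx_j\wedge d\theta_j=0.
\end{align*}
The set $C_\phi$ is the diagonal: it is conic, has dimension $2n$, which is half the dimension of the product, and the twisted form vanishes on it, so it is a canonical relation. It relates each covector to itself, as one expects for the identity operator.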
Canonical relations are the geometric data carried by Fourier integral operators. The main mapping statement is that singularities of $Tf$ are contained in the forward image of the singularities of $f$ under this relation, together with any output-only singularities of the operator kernel.
[quotetheorem:8165]
[citeproof:8165]
Proper support is essential. For a concrete model, define on $\mathbb R$ the smooth-kernel operator
\begin{align*}
Tf(x)=\int_{\mathbb R} e^{ixy}f(y)\,dy
\end{align*}
on compactly supported test functions. Its kernel is smooth on $\mathbb R_x\times\mathbb R_y$, but the operator is not properly supported. If it is applied formally to the non-compact smooth input $f(y)=1$, the output is a multiple of $\delta_0$, a singularity not predicted by the empty kernel wave front set. Proper support is what prevents this kind of singularity from entering from infinity when the operator is extended to distributions. The term $\operatorname{WF}'_X(K_T)$ records the possible singularities obtained from applying $T$ to microlocally smooth inputs; in the common FIO situation where the kernel relation has no covectors with zero $Y$ component, this term is empty and the displayed estimate reduces to the simpler canonical-relation image. The theorem is only an inclusion, not an equality; cancellations in the oscillatory integral or vanishing of the principal amplitude may remove some singularities. Its importance is that it turns the analytic problem of estimating an integral into the symplectic problem of following a canonical relation. The [wave equation](/page/Wave%20Equation) gives the main prototype, because its solution operator transports covectors along geodesic flow.
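The claim about the input $f=1$ can be verified by pairing with a test function. The computation below is a formal sketch: $g\in C_c^\infty(\mathbb R)$ is an arbitrary test function, $\hat g(\eta)=\int e^{-ix\eta}g(x)\,dx$ is the unnormalised transform, and the interchange of integrals is understood in the distributional sense.
\begin{align*}
\langle T1,g\rangle=\int_{\mathbb R}\left(\int_{\mathbb R}e^{ixy}g(x)\,dx\right)dy=\int_{\mathbb R}\hat g(-y)\,dy=2\pi\,g(0).
\end{align*}
Hence $T1=2\pi\delta_0$, where the last equality is Fourier inversion at the origin.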
A canonical relation should be read as a phase-space correspondence: it pairs an input covector over $Y$ with an output covector over $X$ while preserving the symplectic structure in the product with the source sign reversed. When the relation is the graph of a canonical transformation, this looks like ordinary transport by a map; in general it can be multi-valued, which is why the mapping theorem naturally gives an inclusion for wave front sets rather than a pointwise formula.
[example: Solution Operator For The Wave Equation]
Let $(M,g)$ be a compact Riemannian manifold with non-negative Laplace-Beltrami operator $\Delta_g$, and let $U(t)=e^{-it\sqrt{\Delta_g}}$ be the half-wave propagator. For $(x,\xi)\in T^*M\setminus 0$, write
\begin{align*}h(x,\xi)=|\xi|_g=\left(\sum_{j,k}g^{jk}(x)\xi_j\xi_k\right)^{1/2}.\end{align*}
The Hamilton equations for $h$ are
\begin{align*}
\dot x^j&=\frac{\partial h}{\partial \xi_j}=\frac{\sum_k g^{jk}(x)\xi_k}{|\xi|_g},\\
\dot \xi_j&=-\frac{\partial h}{\partial x^j}=-\frac{1}{2|\xi|_g}\sum_{k,\ell}\frac{\partial g^{k\ell}}{\partial x^j}(x)\xi_k\xi_\ell.
\end{align*}
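These equations define the geodesic flow $G^t:T^*M\setminus 0\to T^*M\setminus 0$, that is, the Hamilton flow of $h$. Since $(D_t+\sqrt{\Delta_g})U(t)=0$ with $D_t=-i\partial_t$, the half-wave propagator lives on the branch $\tau=-h(x,\xi)$ of the characteristic set $\{\tau^2=h(x,\xi)^2\}$ of the wave operator, and on that branch singularities move forward in time along $G^t$.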
For each fixed $t$, the half-wave operator $U(t)$ is a Fourier integral operator whose canonical relation is
\begin{align*}C_t=\{(x,\xi;y,\eta)\in (T^*M\setminus 0)\times (T^*M\setminus 0):(x,\xi)=G^t(y,\eta)\}.\end{align*}
Applying the *Wave Front Mapping Principle* to this graph relation gives
\begin{align*}\operatorname{WF}(U(t)f)\subset \{G^t(y,\eta):(y,\eta)\in \operatorname{WF}(f)\}.\end{align*}
Thus an initial singularity at $(x_0,\xi_0)$ can only appear at time $t$ at the covector $G^t(x_0,\xi_0)$, so the solution operator transports wave front directions along the geodesic flow rather than merely moving their base points.
[/example]
This example is the model application for the course. The abstract calculus of Fourier integral operators is designed to make such propagation statements stable under composition, parametrices, and changes of coordinates.
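In flat $\mathbb R^n$ the relation can be written out and its Lagrangian property checked directly. The sketch assumes the Euclidean metric, so that $h(\xi)=|\xi|$ and the Hamilton equations of the example integrate to straight lines.
\begin{align*}
G^t(y,\eta)&=\left(y+t\frac{\eta}{|\eta|},\ \eta\right),\qquad C_t=\left\{\left(y+t\frac{\eta}{|\eta|},\eta;\,y,\eta\right):\eta\ne 0\right\},\\
\left(\sum_{j}dx_j\wedge d\xi_j-\sum_{j}dy_j\wedge d\eta_j\right)\Big|_{C_t}&=t\sum_{j,k}\frac{\partial^2 h}{\partial\eta_j\partial\eta_k}\,d\eta_k\wedge d\eta_j=0.
\end{align*}
The last sum vanishes because the Hessian of $h$ is symmetric while the wedge product is antisymmetric. Since $C_t$ is conic and has dimension $2n$, it is a canonical relation, and it moves the base point at unit speed in the direction $\eta/|\eta|$ while leaving the covector unchanged.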
## Structure Of The Course
How do the later chapters build the full theory from this overview? The course begins with the Euclidean definition of the wave front set, then proves its coordinate invariance and reformulates it using pseudodifferential operators. It then studies operations on distributions, propagation for real principal type operators, and the construction of parametrices by oscillatory integrals.
After this foundation, the notes develop Fourier integral operators systematically. Phase functions, Lagrangian distributions, canonical relations, and composition theorems provide the geometric language. This is where the course connects analysis with symplectic geometry: wave front sets live in cotangent bundles, Hamilton vector fields are symplectic invariants, and canonical relations are the natural correspondences between phase spaces. The final applications return to PDE: wave parametrices, singularity propagation, and the microlocal structure of solution operators.
[remark: Guiding Principle]
Microlocal analysis treats a distribution as an object whose singularities live in cotangent space. Pseudodifferential operators detect those singularities, Hamilton flows propagate them, and Fourier integral operators transport them along canonical relations.
[/remark]
This principle will be used repeatedly. Each later construction should be read as a way to make one part of that sentence precise: detection by elliptic cutoffs, propagation by Hamilton dynamics, and transformation by symplectic geometry.
Microlocal analysis sharpens the idea of singular support by asking not only where a distribution fails to be smooth, but in which cotangent directions that failure is seen. The next chapter turns this slogan into a precise invariant: the wave front set.
# 1. Directional Singularities and the Wave Front Set
This opening chapter introduces the central refinement made by microlocal analysis: a distribution can be singular at a point for reasons that live in particular cotangent directions. Ordinary singular support records where smoothness fails, while the wave front set records where rapid decay of the localized Fourier transform fails. The goal is to build this invariant object first in $\mathbb R^n$, then on a smooth manifold, and to record the operations that will be used throughout the course.
## Singular Support and Directional Frequency
The first question is why singular support is too coarse for propagation theory. A hyperbolic equation does not usually move singularities in every direction from a point; it transports them along characteristic covectors. To describe that behaviour, we need to attach frequency direction data to the local failure of smoothness.
[definition: Singular Support]
Let $U \subset \mathbb R^n$ be open and let $u \in \mathcal D'(U)$. The singular support of $u$, denoted $\operatorname{sing\,supp} u$, is the complement in $U$ of the set of points $x_0 \in U$ for which there exists $\chi \in C_c^\infty(U)$ with $\chi = 1$ on a neighbourhood of $x_0$ and $\chi u \in C^\infty_c(U)$.
[/definition]
This definition is local in the base variable, but it does not yet say how to test smoothness in a way that can be split by frequency direction. The next theorem is needed because it converts smoothness near $x_0$ into rapid decay of a localized Fourier transform, which is the condition that can later be imposed only inside selected cones.
[quotetheorem:8166]
[citeproof:8166]
This theorem is the bridge between local regularity and frequency decay. The cutoff hypothesis is essential: without first localizing near $x_0$, decay of a global Fourier transform would mix singularities from unrelated base points. For instance, $u=\delta_1$ on $\mathbb R$ is smooth near $x_0=0$, but its global Fourier transform is a non-decaying exponential, so a global test would falsely report a singularity at $0$. The requirement $\chi=1$ near $x_0$ is equally necessary: if $\chi$ vanishes near $x_0$, then $\chi\delta_{x_0}=0$ has a rapidly decreasing Fourier transform even though $\delta_{x_0}$ is singular at $x_0$. Compact support is what makes the Fourier transform of $\chi u$ a controlled entire object with polynomial growth; multiplying by a non-compact cutoff such as $\chi\equiv 1$ would again test global behaviour and may not even produce a compactly supported distribution.
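The first failure mode can be made explicit. The display uses the normalised one-dimensional convention $\hat f(\xi)=(2\pi)^{-1/2}\int e^{-ix\xi}f(x)\,dx$ and a cutoff $\chi\in C_c^\infty((-1/2,1/2))$ with $\chi=1$ near $0$; the specific support interval is an assumption made for illustration.
\begin{align*}
\widehat{\delta_1}(\xi)&=(2\pi)^{-1/2}e^{-i\xi},\qquad |\widehat{\delta_1}(\xi)|=(2\pi)^{-1/2}\ \text{for all }\xi,\\
\widehat{\chi\delta_1}(\xi)&=(2\pi)^{-1/2}\chi(1)e^{-i\xi}=0.
\end{align*}
The global transform never decays, while the localized transform vanishes identically, which correctly reports that $\delta_1$ is smooth near $0$.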
The result still treats frequency space all at once: it says that smoothness near $x_0$ is equivalent to decay in every high-frequency direction. It therefore cannot distinguish a distribution whose localized transform fails to decay only near one normal ray from a distribution whose localized transform fails to decay in every direction; both are merely reported as singular at $x_0$. This is the exact failure that forces the microlocal refinement. We must ask whether rapid decay holds near a specified nonzero direction $\xi_0$, while ignoring the magnitude of $\xi$ and keeping only the ray of high-frequency escape.
To make this directional question well posed, the neighbourhoods in frequency space must be stable under positive rescaling. A high-frequency estimate is meant to test what happens as $|\xi|\to\infty$ while the direction $\xi/|\xi|$ stays near a chosen point of the sphere, so the neighbourhood should contain whole positive rays rather than isolated frequency magnitudes. Thus, writing $\mathbb R^n_0 := \mathbb R^n\setminus\{0\}$, the next definition isolates the conic subsets that will serve as direction neighbourhoods.
[definition: Conic Set]
Let $V \subset \mathbb R^n_0$. The set $V$ is conic if $\lambda \xi \in V$ for every $\xi \in V$ and every $\lambda>0$.
[/definition]
Conic sets are the natural frequency neighbourhoods because oscillation direction is unchanged by positive rescaling of $\xi$. The size $|\xi|$ measures frequency, while the ray $\mathbb R_+\xi$ records direction.
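A convenient family of open conic neighbourhoods is obtained by thickening a point of the unit sphere. The notation $\Gamma_\varepsilon(\xi_0)$ is introduced here only for illustration.
\begin{align*}
\Gamma_\varepsilon(\xi_0):=\left\{\xi\in\mathbb R^n_0:\left|\frac{\xi}{|\xi|}-\frac{\xi_0}{|\xi_0|}\right|<\varepsilon\right\},\qquad \varepsilon>0.
\end{align*}
The set is conic because $\lambda\xi/|\lambda\xi|=\xi/|\xi|$ for every $\lambda>0$, and it is open because $\xi\mapsto\xi/|\xi|$ is continuous on $\mathbb R^n_0$. As $\varepsilon\to 0$ these cones shrink to the ray $\mathbb R_+\xi_0$. By contrast, a small ball around $\xi_0$ is open but not conic.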
[example: Dirac Mass at the Origin]
Let $u=\delta_0 \in \mathcal D'(\mathbb R^n)$ and let $\chi\in C_c^\infty(\mathbb R^n)$ with $\chi(0)\ne 0$. For every test function $\varphi\in C_c^\infty(\mathbb R^n)$,
\begin{align*}
\langle \chi\delta_0,\varphi\rangle=\langle \delta_0,\chi\varphi\rangle=\chi(0)\varphi(0)=\langle \chi(0)\delta_0,\varphi\rangle.
\end{align*}
Hence $\chi\delta_0=\chi(0)\delta_0$ as distributions. With the Fourier convention $\widehat f(\xi)=(2\pi)^{-n/2}\int e^{-ix\cdot \xi}f(x)\,dx$, this gives
\begin{align*}
\widehat{\chi\delta_0}(\xi)=\langle \chi(0)\delta_0,(2\pi)^{-n/2}e^{-ix\cdot \xi}\rangle=(2\pi)^{-n/2}\chi(0)e^{-i0\cdot \xi}=(2\pi)^{-n/2}\chi(0).
\end{align*}
This constant is nonzero. If $\Gamma\subset\mathbb R^n_0$ is any nonempty open conic set and $\eta\in\Gamma$, then $\lambda\eta\in\Gamma$ for every $\lambda>0$. Were rapid decay to hold on $\Gamma$, then for $N=1$ there would be $C_1>0$ such that
\begin{align*}
(2\pi)^{-n/2}|\chi(0)|\le C_1(1+\lambda|\eta|)^{-1}
\end{align*}
for every $\lambda>0$, but the right-hand side tends to $0$ as $\lambda\to\infty$, contradicting $(2\pi)^{-n/2}|\chi(0)|>0$. Thus the localized Fourier transform has no rapid decay in any conic direction, so the singularity at $0$ carries every nonzero covector direction.
[/example]
The Dirac mass is maximally singular in direction, but many singularities are more selective. A jump discontinuity across a hypersurface, for instance, is expected to see only covectors normal to the hypersurface.
## Localized Fourier Decay in $\mathbb R^n$
The problem is to turn directional decay into a pointwise phase-space condition. At a point $x_0$, we localize in $x$ by a cutoff, and near a covector $\xi_0 \ne 0$, we localize in frequency by an open cone.
[definition: Microlocal Smoothness in Euclidean Space]
Let $U \subset \mathbb R^n$ be open, let $u \in \mathcal D'(U)$, and let $(x_0,\xi_0) \in U \times \mathbb R^n_0$. The distribution $u$ is microlocally smooth at $(x_0,\xi_0)$ if there exist $\chi \in C_c^\infty(U)$ with $\chi=1$ near $x_0$ and an open conic neighbourhood $\Gamma \subset \mathbb R^n_0$ of $\xi_0$ such that for every $N \in \mathbb N$ there exists $C_N>0$ with
\begin{align*}
|\widehat{\chi u}(\xi)| \le C_N(1+|\xi|)^{-N}
\end{align*}
for all $\xi \in \Gamma$.
[/definition]
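As a sanity check of the definition in the simplest case, suppose $u\in C^\infty(U)$; the computation below assumes the Fourier convention $\widehat f(\xi)=(2\pi)^{-n/2}\int e^{-ix\cdot \xi}f(x)\,dx$ used in the Dirac example. For any $\chi\in C_c^\infty(U)$ the product $\chi u$ lies in $C_c^\infty(\mathbb R^n)$, and integrating by parts $N$ times in $x_j$ gives
\begin{align*}
|\xi_j|^N|\widehat{\chi u}(\xi)|=\left|\widehat{\partial_{x_j}^N(\chi u)}(\xi)\right|\le (2\pi)^{-n/2}\int |\partial_{x_j}^N(\chi u)(x)|\,dx.
\end{align*}
Since $|\xi|\le \sqrt n\,\max_j|\xi_j|$, this yields
\begin{align*}
|\xi|^N|\widehat{\chi u}(\xi)|\le n^{N/2}(2\pi)^{-n/2}\max_{1\le j\le n}\int |\partial_{x_j}^N(\chi u)(x)|\,dx
\end{align*}
for all $\xi$. Together with the case $N=0$, this gives the bound $C_N(1+|\xi|)^{-N}$ with $\Gamma=\mathbb R^n_0$, so a smooth function is microlocally smooth at every point and in every direction.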
Microlocal smoothness identifies the good directions, but the later calculus needs a single closed object containing the bad directions. This motivates defining the wave front set as the complement of microlocal smoothness in the punctured cotangent variables.
[definition: Wave Front Set in Euclidean Space]
Let $U \subset \mathbb R^n$ be open and let $u \in \mathcal D'(U)$. The wave front set of $u$ is
\begin{align*}
\operatorname{WF}(u) := \{(x,\xi) \in U \times \mathbb R^n_0 : u \text{ is not microlocally smooth at } (x,\xi)\}.
\end{align*}
[/definition]
By construction, the wave front set stores covector data above points where directional decay fails. A possible danger is that adding covectors might either lose ordinary singular points or introduce a notion unrelated to the classical support of singularities. The basic compatibility question is therefore whether the base projection of these bad directions recovers exactly the singular support of $u$.
[quotetheorem:8162]
[citeproof:8162]
This projection result explains why the wave front set refines singular support rather than replacing it with unrelated data. The exclusion of $\xi=0$ is not cosmetic. Every open neighbourhood of $0$ in $\mathbb R^n$ meets all directions and contains bounded frequencies, so rapid-decay estimates there do not encode a high-frequency ray; for $u=\delta_0$, adding the zero covector would place $(0,0)$ in the wave front set despite the fact that $0$ is not a direction of oscillation. If zero covectors were included over singular points, the projection statement would still remember singular support, but the fibre would contain artificial data that cannot be propagated by Hamiltonian flow or compared with characteristic sets. The theorem also has a limitation: it does not describe which covectors occur over a singular point, only that at least one must occur. The Heaviside example below shows this distinction, since all nonzero covectors over $0$ occur in dimension one, whereas hypersurface jumps in higher dimension only carry normal covectors. The next structural point is that these bad covectors cannot appear as isolated unstable artefacts of a chosen cone, but form a closed conic subset of phase space.
[quotetheorem:8168]
[citeproof:8168]
The theorem is used constantly as a compactness tool on the cosphere bundle. Its limitation is that it is structural, not diagnostic: it says the bad set is closed and conic once the bad directions have been identified, but it does not identify those directions, compute any wave front set, or give a propagation law for how singularities move. Closedness also depends on allowing open conic neighbourhoods in the definition; a pointwise estimate only along a single ray would not be stable under perturbing the covector. For a concrete failure, take $u=\partial_{x_1}\delta_0$ on $\mathbb R^2$. Its localized Fourier transform is a constant multiple of $\xi_1$, so it vanishes on the vertical ray $\{(0,t):t>0\}$, but in every open cone around that ray it has polynomial growth along directions with $\xi_1\ne 0$. A raywise definition would therefore miss a genuine singular direction. Conicity is also a definition choice with content. If the wave front set recorded individual frequencies rather than rays, rescaling a phase $e^{i\lambda x\cdot \xi_0}$ would change the recorded set although the oscillation direction is the same. The theorem says that wave front sets record directions of high-frequency escape, not the magnitude of a particular frequency. Later, this closed conic structure is what permits elliptic symbols to be chosen disjoint from, or supported near, prescribed parts of $\operatorname{WF}(u)$.
[example: Heaviside Function]
Let $H\in\mathcal D'(\mathbb R)$ be the Heaviside function and let $\chi\in C_c^\infty(\mathbb R)$ satisfy $\chi(0)=1$. With the Fourier convention $\widehat f(\xi)=(2\pi)^{-1/2}\int e^{-ix\xi}f(x)\,dx$, the localized transform is
\begin{align*}
\widehat{\chi H}(\xi)=(2\pi)^{-1/2}\int_0^\infty \chi(x)e^{-ix\xi}\,dx.
\end{align*}
For $\xi\ne 0$, integrate by parts using $e^{-ix\xi}=(-i\xi)^{-1}\frac{d}{dx}e^{-ix\xi}$. Since $\chi$ has compact support, the boundary term at $+\infty$ vanishes, and the boundary term at $0$ contributes $\chi(0)/(i\xi)$, so
\begin{align*}
\int_0^\infty \chi(x)e^{-ix\xi}\,dx=\frac{\chi(0)}{i\xi}+\frac{1}{i\xi}\int_0^\infty \chi'(x)e^{-ix\xi}\,dx.
\end{align*}
Applying the same integration-by-parts step to the remaining integral gives
\begin{align*}
\int_0^\infty \chi'(x)e^{-ix\xi}\,dx=\frac{\chi'(0)}{i\xi}+\frac{1}{i\xi}\int_0^\infty \chi''(x)e^{-ix\xi}\,dx.
\end{align*}
Because $\chi''$ is compactly supported, $\left|\int_0^\infty \chi''(x)e^{-ix\xi}\,dx\right|\le \int_0^\infty |\chi''(x)|\,dx$, so
\begin{align*}
\widehat{\chi H}(\xi)=(2\pi)^{-1/2}\frac{1}{i\xi}+O(|\xi|^{-2})
\end{align*}
as $|\xi|\to\infty$.
The leading term is nonzero. Along either conic ray $\xi>0$ or $\xi<0$, the quantity $|\widehat{\chi H}(\xi)|$ is bounded below by a positive multiple of $|\xi|^{-1}$ for all sufficiently large $|\xi|$. Taking $N=2$, no estimate of the form $|\widehat{\chi H}(\xi)|\le C_2(1+|\xi|)^{-2}$ can hold on either ray. Away from $0$, the function $H$ is locally constant and hence smooth. Therefore
\begin{align*}
\operatorname{WF}(H)=\{(0,\xi):\xi\ne 0\}.
\end{align*}
[/example]
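The endpoint expansion can be checked against a closed form. As an illustrative assumption, replace the compactly supported cutoff by $\chi(x)=e^{-x}$, which is not in $C_c^\infty(\mathbb R)$ but decays fast enough on $[0,\infty)$ for the same integrations by parts. Then
\begin{align*}
\int_0^\infty e^{-x}e^{-ix\xi}\,dx=\frac{1}{1+i\xi}=\frac{1}{i\xi}+\frac{1}{\xi^2}+O(|\xi|^{-3}),
\end{align*}
which agrees with the general pattern $\chi(0)/(i\xi)+\chi'(0)/(i\xi)^2$ because $\chi(0)=1$, $\chi'(0)=-1$, and $(i\xi)^2=-\xi^2$. Moreover
\begin{align*}
\left|\frac{1}{1+i\xi}\right|=(1+\xi^2)^{-1/2},
\end{align*}
so the decay is exactly of order $|\xi|^{-1}$ in both directions $\xi\to\pm\infty$, and no faster.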
This one-dimensional jump shows that the wave front set records both orientations of the normal covector. In higher dimensions a jump across a hypersurface has many tangential variables along which the distribution is still smooth, so the singular directions should not fill the whole cotangent fibre. The Fourier transform separates this behaviour: integration by parts in tangential directions gives rapid decay, while the normal variable retains the same endpoint contribution as the one-dimensional Heaviside function. The invariant language for this surviving normal covector data is the conormal bundle of the hypersurface.
[example: Conormal Singularity of a Hypersurface]
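Let $S=\{x\in\mathbb R^n:x_n=0\}$ and consider $H(x_n)$, the indicator function of the half-space $\{x_n>0\}$. Fix $x_0=(a,0)\in S$, write $x=(x',x_n)$ with $x'\in\mathbb R^{n-1}$, and choose a product cutoff $\chi(x)=\rho(x')\psi(x_n)$ with $\rho\in C_c^\infty(\mathbb R^{n-1})$, $\psi\in C_c^\infty(\mathbb R)$, $\rho\ge 0$, $\rho(a)=1$, and $\psi(0)=1$. Then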
\begin{align*}
\widehat{\chi H(x_n)}(\xi',\xi_n)=(2\pi)^{-n/2}\left(\int_{\mathbb R^{n-1}}\rho(x')e^{-ix'\cdot \xi'}\,dx'\right)\left(\int_0^\infty \psi(t)e^{-it\xi_n}\,dt\right).
\end{align*}
The tangential factor is a Schwartz function because $\rho\in C_c^\infty(\mathbb R^{n-1})$. More explicitly, every $\xi'\ne 0$ has a component with $|\xi_j|\ge (n-1)^{-1/2}|\xi'|$ for some $j\le n-1$, and for any $M\in\mathbb N$, integrating by parts $M$ times in $x_j$ gives
\begin{align*}
\int \rho(x')e^{-ix'\cdot \xi'}\,dx'=(i\xi_j)^{-M}\int \partial_{x_j}^M\rho(x')e^{-ix'\cdot \xi'}\,dx',
\end{align*}
so
\begin{align*}
\left|\int \rho(x')e^{-ix'\cdot \xi'}\,dx'\right|\le |\xi_j|^{-M}\int |\partial_{x_j}^M\rho(x')|\,dx'\le C_M(1+|\xi'|)^{-M}.
\end{align*}
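The last inequality is meant for $|\xi'|\ge 1$; for $|\xi'|<1$ the trivial bound $\int|\rho(x')|\,dx'$ suffices. Since the normal factor is bounded by $\int_0^\infty|\psi(t)|\,dt$, in every cone where $|\xi'|\ge c|\xi|$ with $c>0$, the localized Fourier transform decays faster than every power of $|\xi|$.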
It remains to inspect normal directions. For $\xi_n\ne 0$, use $e^{-it\xi_n}=(-i\xi_n)^{-1}\frac{d}{dt}e^{-it\xi_n}$ and compact support of $\psi$:
\begin{align*}
\int_0^\infty \psi(t)e^{-it\xi_n}\,dt=\frac{\psi(0)}{i\xi_n}+\frac{1}{i\xi_n}\int_0^\infty \psi'(t)e^{-it\xi_n}\,dt.
\end{align*}
Since $\psi(0)=1$ and $\left|\int_0^\infty \psi'(t)e^{-it\xi_n}\,dt\right|\le \int_0^\infty |\psi'(t)|\,dt$, the normal factor has leading size $|\xi_n|^{-1}$ and therefore is not rapidly decreasing along either ray $\xi_n>0$ or $\xi_n<0$. On these normal rays $\xi'=0$, so the tangential factor reduces to $\int\rho(x')\,dx'$, which is nonzero provided $\rho$ is chosen nonnegative. Combining the tangential rapid decay with this one-dimensional endpoint term gives
\begin{align*}
\operatorname{WF}(H(x_n))=\{(x,\lambda dx_n):x\in S,\lambda\ne 0\}.
\end{align*}
So a hypersurface jump carries precisely the nonzero covectors normal to the hypersurface, not the tangential covectors along it.
[/example]
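A closely related computation, included here as a hedged sketch, treats the surface measure on the same hyperplane. Write $S=\{x\in\mathbb R^n:x_n=0\}$ and let $\delta_S$ denote the distribution $\langle \delta_S,\varphi\rangle=\int_{\mathbb R^{n-1}}\varphi(x',0)\,dx'$; the name $\delta_S$ is illustrative notation. With a product cutoff $\chi(x)=\rho(x')\psi(x_n)$ and the convention $\widehat f(\xi)=(2\pi)^{-n/2}\int e^{-ix\cdot \xi}f(x)\,dx$,
\begin{align*}
\widehat{\chi\delta_S}(\xi',\xi_n)=(2\pi)^{-n/2}\psi(0)\int_{\mathbb R^{n-1}}\rho(x')e^{-ix'\cdot \xi'}\,dx'.
\end{align*}
The right-hand side does not depend on $\xi_n$. In any cone where $|\xi'|\ge c|\xi|$ it is rapidly decreasing, exactly as for the tangential factor of the jump. On the normal rays $\xi'=0$ it equals
\begin{align*}
\widehat{\chi\delta_S}(0,\xi_n)=(2\pi)^{-n/2}\psi(0)\int_{\mathbb R^{n-1}}\rho(x')\,dx',
\end{align*}
a constant that is nonzero when $\rho\ge 0$, $\rho\not\equiv 0$, and $\psi(0)\ne 0$. So the bad directions are again the normal covectors $\lambda\,dx_n$ with $\lambda\ne 0$; only the size of the transform differs, with no decay here against decay of order $|\xi_n|^{-1}$ for the jump. This is consistent with $\partial_{x_n}H(x_n)=\delta_S$.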
Conormal examples are the model singularities behind boundary value problems, propagation from hypersurfaces, and later the construction of Lagrangian distributions.
## Coordinate Invariance and Manifolds
The Euclidean definition uses the Fourier transform, so the next question is whether it survives a change of coordinates. This is not automatic: a nonlinear coordinate change can bend straight Fourier phases, and a definition tied to a particular coordinate grid would be useless on manifolds. The correct invariant statement is that covectors transform by the cotangent map rather than by the tangent map.
[quotetheorem:8169]
[citeproof:8169]
This result shows that the Euclidean construction has the correct transformation law under coordinate changes. The diffeomorphism hypothesis is essential. For the constant map $F:\mathbb R\to\mathbb R$, $F(y)=0$, pulling back $\delta_0$ is not a distribution defined by composition, and the formal covector map would send every covector to $0$, outside the punctured cotangent bundle. A less degenerate rank-loss example is $F(y)=y^2$: the pullback of $\delta_0$ would require interpreting $\delta(y^2)$, which has a non-integrable Jacobian singularity at the critical point. These examples show why local invertibility is not a technical convenience but the condition that prevents covectors from collapsing and prevents new pullback singularities from being created by critical points. The theorem does not yet define wave front sets on manifolds; it proves that the Euclidean definition gives the same answer in overlapping coordinate charts. It therefore makes the next definition possible: on a manifold, the wave front set should be a subset of the punctured cotangent bundle whose coordinate images are the Euclidean wave front sets.
[definition: Wave Front Set on a Smooth Manifold]
Let $M$ be a smooth manifold and let $u\in \mathcal D'(M)$. For a chart $(U,\varphi)$, write $u_\varphi\in \mathcal D'(\varphi(U))$ for the coordinate representative of $u|_U$ on $\varphi(U)\subset \mathbb R^n$. A point $(x,\xi)\in T^*M\setminus 0$ belongs to $\operatorname{WF}(u)$ if, for every chart $(U,\varphi)$ with $x\in U$,
\begin{align*}
(\varphi(x),(d\varphi_x^{-1})^\top\xi)\in \operatorname{WF}(u_\varphi)\subset \varphi(U)\times \mathbb R^n_0.
\end{align*}
[/definition]
By coordinate invariance, the condition holds in every chart around $x$ as soon as it holds in one, so the definition does not depend on the atlas used. From now on, statements about wave front sets should be read as statements in the punctured cotangent bundle.
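The transformation rule can be seen directly for a linear change of coordinates; the matrix $A$ and the numbers below are illustrative. Let $y=Ax$ with $A$ invertible. A plane wave with frequency $\eta$ in the $y$ coordinates satisfies
\begin{align*}
e^{i\eta\cdot y}=e^{i\eta\cdot Ax}=e^{i(A^\top\eta)\cdot x},
\end{align*}
so it has frequency $\xi=A^\top\eta$ in the $x$ coordinates, that is, $\eta=(A^{-1})^\top\xi$. With $\varphi(x)=Ax$ and $d\varphi_x=A$, this is the expression $(d\varphi_x^{-1})^\top\xi$ in the definition. For instance, on $\mathbb R^2$ with $A=\operatorname{diag}(2,1)$,
\begin{align*}
\xi=(1,1)\mapsto \eta=(\tfrac12,1),\qquad v=(1,1)\mapsto Av=(2,1),\qquad \eta\cdot Av=\tfrac12\cdot 2+1\cdot 1=2=\xi\cdot v.
\end{align*}
Covectors and vectors transform differently, and the pairing between them is preserved.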
[remark: Cotangent Rather Than Tangent Directions]
A wave front direction is a covector because oscillations are measured by phases. If $e^{i\lambda\phi(x)}$ oscillates rapidly, the direction detected by Fourier analysis is $d\phi_x\in T_x^*M$. This is why canonical relations for Fourier integral operators live naturally in cotangent bundles.
[/remark]
This remark identifies the geometric type of the frequency variable. To describe the standard hypersurface examples invariantly, we need the bundle of covectors that vanish on tangent directions to the hypersurface.
[definition: Conormal Bundle of a Hypersurface]
Let $M$ be a smooth manifold and let $S\subset M$ be an embedded hypersurface. The conormal bundle of $S$ is
\begin{align*}
N^*S := \{(x,\xi)\in T^*M:x\in S,\ \xi(v)=0\text{ for all }v\in T_xS\}.
\end{align*}
[/definition]
The zero covector belongs to this set as written, while wave front sets always exclude the zero section. Thus hypersurface jump singularities live in $N^*S\setminus 0$.
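In flat coordinates the definition can be unwound by hand. As an illustration, take $M=\mathbb R^n$ and $S=\{x_n=0\}$, so that $T_xS$ is spanned by $\partial_{x_1},\dots,\partial_{x_{n-1}}$. Writing $\xi=\sum_{j=1}^n\xi_j\,dx_j$, one has $\xi(\partial_{x_k})=\xi_k$, and therefore
\begin{align*}
N^*S=\{(x,\xi):x_n=0,\ \xi_1=\cdots=\xi_{n-1}=0\},\qquad N^*S\setminus 0=\{(x,\lambda\,dx_n):x\in S,\ \lambda\ne 0\}.
\end{align*}
The second set is the one obtained for the half-space jump $H(x_n)$ in the Euclidean computation above.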
The conormal bundle explains the geometric answer for jump singularities, but wave front sets are not limited to discontinuities along submanifolds. They also detect singularities produced by rapid oscillation, where the preferred covector direction comes from the derivative of the phase rather than from a visible boundary.
[example: Oscillatory Function $e^{i/x}$ Near Zero]
Let $u(x)=e^{i/x}$ on $\mathbb R\setminus\{0\}$. Since $|e^{i/x}|=1$, the function is locally integrable near $0$, so it defines a distribution by
\begin{align*}
\langle u,\varphi\rangle=\int_{\mathbb R} e^{i/x}\varphi(x)\,dx
\end{align*}
for $\varphi\in C_c^\infty(\mathbb R)$. It is smooth on $\mathbb R\setminus\{0\}$, so only covectors over $0$ can occur. Choose $\chi\in C_c^\infty(\mathbb R)$ with $\chi=1$ near $0$. With the convention $\widehat f(\xi)=(2\pi)^{-1/2}\int e^{-ix\xi}f(x)\,dx$,
\begin{align*}
\widehat{\chi u}(\xi)=(2\pi)^{-1/2}\int_{\mathbb R}\chi(x)e^{i/x}e^{-ix\xi}\,dx.
\end{align*}
First take $\xi>0$. The phase is $\phi_\xi(x)=x^{-1}-x\xi$, and for $x\ne0$,
\begin{align*}
\phi_\xi'(x)=-x^{-2}-\xi.
\end{align*}
Thus $|\phi_\xi'(x)|\ge \xi$ on $\mathbb R\setminus\{0\}$. Also
\begin{align*}
\frac{1}{\phi_\xi'(x)}=-\frac{x^2}{1+\xi x^2},
\end{align*}
so $1/\phi_\xi'(x)$ extends continuously to $0$ with value $0$. Integrating by parts on $(-\infty,-\varepsilon)$ and $(\varepsilon,\infty)$ using
\begin{align*}
e^{i\phi_\xi(x)}=\frac{1}{i\phi_\xi'(x)}\frac{d}{dx}e^{i\phi_\xi(x)}
\end{align*}
gives boundary terms at $\pm\varepsilon$ proportional to $\chi(\pm\varepsilon)/\phi_\xi'(\pm\varepsilon)$, and these tend to $0$ as $\varepsilon\downarrow0$. Hence
\begin{align*}
\int \chi(x)e^{i\phi_\xi(x)}\,dx=-\int \frac{d}{dx}\left(\frac{\chi(x)}{i\phi_\xi'(x)}\right)e^{i\phi_\xi(x)}\,dx.
\end{align*}
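Each further integration by parts has the same form, again with vanishing boundary terms at $0$, and gains at least a factor $\xi^{-1/2}$: the factor $1/\phi_\xi'$ is $O(\xi^{-1})$, while a derivative falling on it costs at most $O(\xi^{1/2})$, as the rescaling $x=y/\sqrt{\xi}$ shows. Repeating the step sufficiently many times gives
\begin{align*}
|\widehat{\chi u}(\xi)|\le C_N(1+\xi)^{-N}
\end{align*}
for every $N$ and all $\xi>0$; the coefficients remain integrable near $0$ because of the factor $x^2/(1+\xi x^2)$. Thus the positive covector direction is microlocally smooth.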
Now take $\xi=-\lambda$ with $\lambda>0$. Then
\begin{align*}
\widehat{\chi u}(-\lambda)=(2\pi)^{-1/2}\int \chi(x)e^{i(1/x+\lambda x)}\,dx.
\end{align*}
Set $\mu=\sqrt{\lambda}$ and $x=y/\mu$. Then $dx=\mu^{-1}dy$ and
\begin{align*}
1/x+\lambda x=\mu/y+\mu y=\mu\left(y+\frac1y\right).
\end{align*}
Therefore
\begin{align*}
\widehat{\chi u}(-\lambda)=(2\pi)^{-1/2}\mu^{-1}\int \chi(y/\mu)e^{i\mu(y+1/y)}\,dy.
\end{align*}
The rescaled phase $p(y)=y+1/y$ satisfies
\begin{align*}
p'(y)=1-y^{-2}
\end{align*}
and
\begin{align*}
p''(y)=2y^{-3}.
\end{align*}
Thus the only stationary points are $y=1$ and $y=-1$, with $p(1)=2$, $p(-1)=-2$, $p''(1)=2$, and $p''(-1)=-2$. Since $\chi(y/\mu)=1$ near $y=\pm1$ for all large $\mu$, the [one-dimensional stationary phase](/theorems/6985) expansion gives
\begin{align*}
\widehat{\chi u}(-\lambda)=2^{-1/2}\lambda^{-3/4}\left(e^{i(2\sqrt{\lambda}+\pi/4)}+e^{-i(2\sqrt{\lambda}+\pi/4)}\right)+O(\lambda^{-5/4}).
\end{align*}
Equivalently,
\begin{align*}
\widehat{\chi u}(-\lambda)=2^{1/2}\lambda^{-3/4}\cos(2\sqrt{\lambda}+\pi/4)+O(\lambda^{-5/4}).
\end{align*}
Along the sequence with $2\sqrt{\lambda}+\pi/4=2\pi k$, the cosine equals $1$, so $|\widehat{\chi u}(-\lambda)|\ge c\lambda^{-3/4}$ for all sufficiently large $k$. This violates any rapid-decay estimate with, for example, $N=2$ on the negative ray. Consequently
\begin{align*}
\operatorname{WF}(u)=\{(0,\xi):\xi<0\}.
\end{align*}
The singularity is therefore not a jump singularity: its surviving covector direction is selected by the limiting phase derivative $d(1/x)=-x^{-2}\,dx$, which points only in the negative cotangent direction near $0$.
[/example]
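The sign selection can also be read off from a frequency-matching heuristic, offered here as an informal cross-check rather than a proof. The local frequency of $e^{i/x}$ at a point $x\ne 0$ is the derivative of its phase,
\begin{align*}
\frac{d}{dx}\left(\frac1x\right)=-\frac{1}{x^2}<0.
\end{align*}
Matching this with a Fourier frequency $\xi$ requires $-x^{-2}=\xi$, which has no solution for $\xi>0$ and, for $\xi=-\lambda<0$, has the two solutions
\begin{align*}
x_\pm=\pm\lambda^{-1/2}\to 0\quad\text{as }\lambda\to\infty.
\end{align*}
In the rescaled variable $y=\sqrt{\lambda}\,x$ these are the stationary points $y=\pm1$ of $p(y)=y+1/y$ found above, and their collapse onto $x=0$ explains why the negative covectors sit exactly over the origin.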
This example differs from a jump: the singularity is generated by infinitely rapid oscillation rather than by a discontinuity. Wave front sets capture both phenomena in a common cotangent language.
## Basic Operations
A usable invariant must behave predictably under the operations used in analysis. If restriction to an open subset changed covectors over points already inside that subset, the wave front set would not be a local invariant at all. The simplest operations to check are restriction to an open subset, multiplication by a smooth function, and pullback by a diffeomorphism.
[quotetheorem:8170]
[citeproof:8170]
Restriction says that wave front sets are sheaf-like in the base variable. The openness of $V$ is important because the proof needs cutoffs supported inside the domain of restriction. If one tried to replace $V$ by the closed half-line $[0,\infty)\subset\mathbb R$ and restrict the smooth function $1$ by multiplying with its indicator, the result would be the Heaviside distribution, which has wave front set $\{(0,\xi):\xi\ne 0\}$. The new singularity is a boundary artefact of the closed restriction, not a feature of the original distribution. The theorem therefore only compares covectors over points inside an open subset and says nothing about points outside $V$, where $u|_V$ has no value as a distribution on $M$. This locality is what later lets microlocal estimates be checked in coordinate patches before being assembled on a manifold. Multiplication by a smooth function is the next operation, and it can remove singularities where the multiplier vanishes, but it cannot create new directional singularities.
[quotetheorem:8171]
[citeproof:8171]
This theorem is often used with cutoffs: microlocal questions may be localized without adding artificial covectors. The inclusion can be strict. For example, on $\mathbb R$ let $u=\delta_0$ and let $a(x)=x$. Then $a\delta_0=0$, so $\operatorname{WF}(a\delta_0)=\varnothing$, while $\operatorname{WF}(\delta_0)=\{(0,\xi):\xi\ne 0\}$. This specific example is why the equality statement requires $a$ to be nonzero on a neighbourhood of the base point; nonvanishing supplies the smooth inverse multiplier $1/a$ and prevents erasure of the singular component. Finally, diffeomorphisms transport wave front sets exactly by the cotangent lift.
[quotetheorem:8172]
[citeproof:8172]
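A one-dimensional sanity check may help fix the direction of the maps. It assumes the transport rule takes the standard form $\operatorname{WF}(F^*u)=\{(y,dF_y^\top\xi):(F(y),\xi)\in\operatorname{WF}(u)\}$, and the dilation below is an illustrative choice. Let $F(y)=cy$ on $\mathbb R$ with $c\ne 0$ and $u=\delta_0$. Changing variables in the pairing gives
\begin{align*}
\langle F^*\delta_0,\varphi\rangle=\int \delta_0(cy)\varphi(y)\,dy=|c|^{-1}\varphi(0),\qquad\text{so}\qquad F^*\delta_0=|c|^{-1}\delta_0.
\end{align*}
Hence $\operatorname{WF}(F^*\delta_0)=\{(0,\eta):\eta\ne 0\}$. On the other side, the cotangent lift sends $(F(0),\xi)=(0,\xi)$ to $(0,F'(0)\xi)=(0,c\xi)$, and
\begin{align*}
\{(0,c\xi):\xi\ne 0\}=\{(0,\eta):\eta\ne 0\}.
\end{align*}
The two sets agree. For $c<0$ the two orientations are exchanged, but the set as a whole is unchanged because both rays are present.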
The diffeomorphism hypothesis gives equality because no directions are folded together and no rank is lost. The limitation is visible in the projection map $F:\mathbb R^2\to\mathbb R$, $F(x_1,x_2)=x_1$. Pulling back the smooth distribution $1$ is harmless, but pulling back $\delta_0$ gives $\delta(x_1)$, whose wave front set is $\{((0,x_2),\lambda dx_1):\lambda\ne 0\}$; the result lives on a hypersurface and is not described by an invertible cotangent lift. At maps with critical points, the situation can be worse: $F(y)=y^2$ and $u=\delta_0$ lead formally to $\delta(y^2)$, which is not a well-defined distribution by ordinary pullback. For general smooth maps, pullback therefore requires a transversality condition involving the wave front set, so this theorem is the special case where no obstruction can occur. The formula also explains why microlocal analysis naturally talks to symplectic geometry: diffeomorphisms lift to canonical transformations of cotangent bundles.
These operation rules are the minimal calculus needed before bringing in pseudodifferential operators. In the next chapter, microlocal smoothness will be reformulated by elliptic cutoffs and tied to characteristic sets of pseudodifferential operators, making the connection with the symbol calculus from Microlocal Analysis I and preparing the conormal and Lagrangian viewpoint used for Fourier integral operators.
The wave front set gives the geometric language for singularities, but to use it effectively we need operators that can detect them. Pseudodifferential cutoffs provide exactly that mechanism, isolating microlocal regularity through their symbols and elliptic behavior.
# 2. Pseudodifferential Detection of Singularities
These notes develop the microlocal tools needed to study singularities of distributions through phase space. The course assumes the basic theory of distributions, Fourier transform estimates, Sobolev spaces, and the symbolic calculus of pseudodifferential operators. Chapter 1 defined the wave front set by localized Fourier decay; this chapter rewrites that definition as an operator-theoretic test, so that elliptic regularity, parametrices, and the propagation theorems in Chapters 5 and 6 can act directly on $\operatorname{WF}(u)$.
## Microlocal Regularity by Elliptic Cutoffs
How can we say that a distribution is smooth only near one point and one direction? The answer is to use a pseudodifferential cutoff whose principal symbol is nonzero near the covector under inspection. If such a cutoff turns the distribution into a smooth function, then the cutoff has removed no singularity at that covector.
[definition: Elliptic At A Covector]
Let $A:C_c^\infty(X)\to C^\infty(X)$ be a classical pseudodifferential operator with $A\in \Psi^m(X)$, and let $a_m$ be its principal symbol. Assume $A$ is extended to distributions by duality when it is properly supported, or microlocally after inserting compactly supported cutoffs. The operator $A$ is elliptic at $(x_0,\xi_0) \in T^*X \setminus 0$ if $a_m(x_0,\xi_0) \neq 0$.
[/definition]
The condition is conic in the covector variable for classical operators, and it is open by continuity of $a_m$, so ellipticity at one nonzero covector gives ellipticity throughout a small conic neighbourhood of that covector. This is the microlocal substitute for invertibility: it says that the operator sees the chosen phase-space direction.
[example: Directional Cutoff In Euclidean Space]
Let $X=\mathbb R^n$ and choose $(x_0,\xi_0)\in T^*\mathbb R^n\setminus 0$. Pick $\chi\in C_c^\infty(\mathbb R^n)$ with $\chi(x_0)=1$, and choose a degree-zero conic symbol $\psi(\xi)$ supported in a narrow cone about the ray $\mathbb R_+\xi_0$ with $\psi(\xi_0)=1$ after normalizing $\xi_0$ to unit length. For the Kohn-Nirenberg quantization of $a(x,\xi)=\chi(x)\psi(\xi)$, the principal symbol is $a_0(x,\xi)=\chi(x)\psi(\xi)$, so
\begin{align*}
a_0(x_0,\xi_0)=\chi(x_0)\psi(\xi_0)=1\cdot 1=1.
\end{align*}
Thus $A=\operatorname{Op}(\chi(x)\psi(\xi))$ is elliptic at $(x_0,\xi_0)$.
With the Fourier convention $\widehat f(\eta)=\int e^{-ix\cdot\eta}f(x)\,dx$, the operator acts, after the usual cutoff interpretation for distributions, by
\begin{align*}
Au(x)=\chi(x)(2\pi)^{-n}\int e^{ix\cdot\xi}\psi(\xi)\widehat u(\xi)\,d\xi.
\end{align*}
Taking the Fourier transform in $x$ gives
\begin{align*}
\widehat{Au}(\eta)=(2\pi)^{-n}\int \widehat\chi(\eta-\xi)\psi(\xi)\widehat u(\xi)\,d\xi.
\end{align*}
The factor $\psi(\xi)$ discards frequencies outside the chosen cone, while the convolution kernel $\widehat\chi(\eta-\xi)$ is rapidly decreasing because $\chi\in C_c^\infty$. Therefore rapid decay of $\widehat{Au}(\eta)$ in a smaller cone about $\xi_0$ is exactly the Fourier-decay test for the part of $u$ localized near $x_0$ and microlocalized near the ray $\mathbb R_+\xi_0$.
[/example]
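One way to produce such a frequency cutoff explicitly is sketched below; the auxiliary functions $g$ and $\kappa$ and the parameter $\varepsilon$ are illustrative choices, not notation from the rest of the notes. Let $\omega=\xi_0/|\xi_0|$, fix $0<\varepsilon<\tfrac12$, choose $g\in C^\infty(\mathbb R)$ with $g(s)=1$ for $s\ge 1-\varepsilon$ and $g(s)=0$ for $s\le 1-2\varepsilon$, and choose $\kappa\in C_c^\infty(\mathbb R^n)$ with $\kappa=1$ for $|\xi|\le \tfrac14$ and $\kappa=0$ for $|\xi|\ge\tfrac12$. Set
\begin{align*}
\psi(\xi):=(1-\kappa(\xi))\,g\!\left(\frac{\xi\cdot\omega}{|\xi|}\right)\quad(\xi\ne 0),\qquad \psi(0):=0.
\end{align*}
Then $\psi$ is smooth because $1-\kappa$ vanishes near the origin, and for $|\xi|\ge\tfrac12$ and $t\ge 1$,
\begin{align*}
\psi(t\xi)=g\!\left(\frac{t\xi\cdot\omega}{|t\xi|}\right)=g\!\left(\frac{\xi\cdot\omega}{|\xi|}\right)=\psi(\xi),
\end{align*}
so $\psi$ is homogeneous of degree zero at high frequency. Its support lies in the cone $\{\xi\cdot\omega\ge(1-2\varepsilon)|\xi|\}$, and $\psi(\omega)=(1-\kappa(\omega))g(1)=1$ since $|\omega|=1$.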
This example motivates the main local definition. Instead of testing the decay of a localized Fourier transform in a cone, we ask for a single elliptic pseudodifferential cutoff to regularise the distribution.
[definition: Microlocal Smoothness]
Let $u\in \mathcal D'(X)$ and let $(x_0,\xi_0)\in T^*X\setminus 0$. The distribution $u$ is microlocally smooth at $(x_0,\xi_0)$ if there exists a properly supported operator $A:C_c^\infty(X)\to C^\infty(X)$ with $A\in \Psi^0(X)$, extended continuously as $A:\mathcal D'(X)\to\mathcal D'(X)$, such that $A$ is elliptic at $(x_0,\xi_0)$ and $Au\in C^\infty(X)$.
[/definition]
The order zero choice is a convention rather than a restriction: composing with elliptic powers of an order-one operator changes the order but not the microlocal smoothing test. The definition will be useful only if it recovers the Fourier-decay wave front set from Chapter 1, so the next theorem proves that the elliptic cutoff test and the localized Fourier-decay test of Chapter 1 identify exactly the same covectors.
[quotetheorem:8173]
[citeproof:8173]
The theorem gives an operational interpretation of $\operatorname{WF}(u)$: it is exactly the set of covectors where no elliptic pseudodifferential microscope can make $u$ smooth. The ellipticity hypothesis is essential: a cutoff whose principal symbol vanishes near $(x_0,\xi_0)$ can miss a singularity in that direction, as happens when a directional cutoff supported away from the ray $\mathbb R_+\xi_0$ is applied to $\delta_0$. The conclusion also does not say that every regularising operator is elliptic; it says that one elliptic regularising test is enough. This is the form used in elliptic PDE, because the differential or pseudodifferential equation itself supplies the microscope.
[remark: Operator Order In The Criterion]
The criterion remains valid with $A\in \Psi^m(X)$ for any $m\in\mathbb R$. If $Au\in C^\infty(X)$ and $A$ is elliptic at $(x_0,\xi_0)$, a parametrix has order $-m$ and recovers $u$ modulo a microlocally smoothing error. If an order-zero test is desired, compose $A$ with an elliptic operator of order $-m$.
[/remark]
## Properly Supported Operators And The Global Characterization
The local criterion is useful, but it hides a support issue. On noncompact manifolds, a pseudodifferential operator need not send compactly supported distributions to distributions with controlled support unless its Schwartz kernel is proper over each factor. Proper support makes compositions and restrictions behave as expected.
[definition: Properly Supported Pseudodifferential Operator]
Let $A:C_c^\infty(X)\to C^\infty(X)$ be a pseudodifferential operator with Schwartz kernel $K_A\in \mathcal D'(X\times X)$. The operator $A$ is properly supported if both projections
\begin{align*}
\pi_1,\pi_2:\operatorname{supp}K_A \to X
\end{align*}
are proper maps.
[/definition]
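Two extreme cases may help calibrate the definition; the coefficients $a_\alpha$ and the kernel $k$ below are illustrative choices. On an open set $X\subset\mathbb R^n$, a differential operator $P=\sum_{|\alpha|\le m}a_\alpha(x)\partial_x^\alpha$ with smooth coefficients has Schwartz kernel
\begin{align*}
K_P(x,y)=\sum_{|\alpha|\le m}a_\alpha(x)(\partial^\alpha\delta_0)(x-y),\qquad \operatorname{supp}K_P\subset\Delta:=\{(x,x):x\in X\}.
\end{align*}
Both projections restrict to homeomorphisms $\Delta\to X$, so the preimage of a compact set $K\subset X$ in $\operatorname{supp}K_P$ is a closed subset of the compact set $\{(x,x):x\in K\}$, and $P$ is properly supported. By contrast, on $X=\mathbb R$, convolution with $k(z)=e^{-z^2}$ has kernel $K(x,y)=k(x-y)$ with
\begin{align*}
\operatorname{supp}K=\mathbb R\times\mathbb R,\qquad \pi_1^{-1}(\{0\})=\{0\}\times\mathbb R,
\end{align*}
which is not compact. This operator is smoothing but not properly supported.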
Proper support is a technical condition with a concrete payoff: it lets $A$ act continuously on distributions without requiring global compact support. Once this support issue is removed, every regularising pseudodifferential operator can be used as evidence about where $u$ is not singular. The next theorem packages all such evidence into a single intersection formula for $\operatorname{WF}(u)$.
[quotetheorem:8174]
[citeproof:8174]
This result turns the wave front set into an ideal-like object: every operator that regularises $u$ contributes its characteristic set, and the intersection of all those obstructions is the remaining singular geometry. Proper support is needed here; without it, applying $A$ to an arbitrary distribution may not be globally defined, even when the same formula is locally meaningful after cutoffs. The theorem does not identify a smallest single regularising operator; it records the common obstruction left by all properly supported regularising tests. The formulation is also stable under replacing local cutoffs by properly supported representatives.
[example: Delta Distribution Revisited By Operators]
Let $u=\delta_0\in \mathcal D'(\mathbb R^n)$ and let $A\in\Psi^0_{\mathrm{prop}}(\mathbb R^n)$ have Kohn-Nirenberg full symbol $a(x,\xi)\sim a_0(x,\xi)+a_{-1}(x,\xi)+\cdots$. Its kernel is
\begin{align*}
K_A(x,y)=(2\pi)^{-n}\int e^{i(x-y)\cdot \xi}a(x,\xi)\,d\xi.
\end{align*}
Since $\delta_0(\phi)=\phi(0)$, applying the kernel to $\delta_0$ evaluates the $y$ variable at $0$:
\begin{align*}
Au(x)=\langle K_A(x,\cdot),\delta_0\rangle=K_A(x,0)=(2\pi)^{-n}\int e^{ix\cdot \xi}a(x,\xi)\,d\xi.
\end{align*}
Let $\chi\in C_c^\infty(\mathbb R^n)$ satisfy $\chi(0)=1$. With the convention $\widehat f(\eta)=\int e^{-ix\cdot\eta}f(x)\,dx$, the localized Fourier transform is
\begin{align*}
\widehat{\chi Au}(\eta)=(2\pi)^{-n}\int\int e^{-ix\cdot(\eta-\xi)}\chi(x)a(x,\xi)\,dx\,d\xi.
\end{align*}
The standard symbol expansion for this oscillatory integral gives the leading term
\begin{align*}
\widehat{\chi Au}(\eta)=\chi(0)a_0(0,\eta)+r_{-1}(\eta)=a_0(0,\eta)+r_{-1}(\eta),
\end{align*}
where $r_{-1}$ is a symbol of order $-1$. If $a_0(0,\xi_0)\neq 0$, then by homogeneity $a_0(0,t\xi_0)=a_0(0,\xi_0)$ for $t>0$, while $r_{-1}(t\xi_0)=O(t^{-1})$. Hence
\begin{align*}
\widehat{\chi Au}(t\xi_0)=a_0(0,\xi_0)+O(t^{-1}).
\end{align*}
This expression cannot decay rapidly as $t\to\infty$, so $Au$ has a singularity at $(0,\xi_0)$ whenever the principal symbol of $A$ is elliptic there.
Conversely, if a conic microlocal cutoff near $(0,\xi_0)$ kills the complete symbol of $A$, then every homogeneous term in the symbol expansion of that localized operator vanishes near $(0,\xi_0)$. The localized kernel is then generated by an order $-\infty$ symbol, so applying it to $\delta_0$ gives a smooth function near that covector. Thus a directional operator removes the $\xi_0$ component of $\delta_0$ only when its full microlocal symbol is absent there; vanishing of just the principal symbol can leave lower-order singular terms. Taking all properly supported regularizing tests recovers
\begin{align*}
\operatorname{WF}(\delta_0)=\{(0,\xi):\xi\neq 0\}.
\end{align*}
[/example]
The same language distinguishes singular support from wave front set. If every direction over $x_0$ is removed by suitable elliptic cutoffs, then $u$ is smooth near $x_0$; if at least one direction remains, $x_0$ lies in the singular support. The next theorem records this compatibility with the older notion of singularity.
[quotetheorem:8162]
[citeproof:8162]
This theorem confirms that microlocal analysis refines, rather than replaces, ordinary singular support. The quantification over every nonzero covector above $x_0$ is essential: the hypersurface delta distribution is singular at its base points even though many tangential covectors are absent from its wave front set. The theorem does not say that one direction determines ordinary smoothness; it says that all fibre directions together determine it. This fibrewise viewpoint is the bridge to characteristic sets, where only selected directions can obstruct elliptic regularity.
## Microlocal Elliptic Regularity
What does an equation $Pu=f$ tell us about the singularities of $u$? If $P$ is elliptic at a covector, then $P$ loses no microlocal information there, so regularity of $Pu$ forces regularity of $u$ at that covector. This is elliptic regularity stated in phase space.
[definition: Characteristic Set]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a classical pseudodifferential operator with $P\in \Psi^m(X)$, and let $p_m$ be its principal symbol. Assume $P$ is extended to distributions by proper support, or microlocally after inserting cutoffs. The characteristic set of $P$ is
\begin{align*}
\operatorname{Char}(P)=\{(x,\xi)\in T^*X\setminus 0 : p_m(x,\xi)=0\}.
\end{align*}
[/definition]
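For a first example with nonempty characteristic set, consider the wave operator on $\mathbb R_t\times\mathbb R^n_x$; the dual variables $(\tau,\xi)$ are illustrative notation, and $D=-i\partial$ as usual. Since $\partial=iD$ gives $\partial^2=-D^2$,
\begin{align*}
P=\partial_t^2-\Delta_x=-D_t^2+\sum_{j=1}^n D_{x_j}^2,\qquad p_2(t,x,\tau,\xi)=-\tau^2+|\xi|^2.
\end{align*}
Hence
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi):\tau^2=|\xi|^2,\ (\tau,\xi)\ne 0\},
\end{align*}
the light cone in each cotangent fibre. It is nonempty: for instance $\tau=1$, $\xi=(1,0,\dots,0)$ satisfies $\tau^2=|\xi|^2$.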
The characteristic set is where the principal symbol fails to be invertible, so it marks exactly the covectors where a parametrix cannot be built from the principal symbol. Away from this set, the equation should transfer regularity from $Pu$ back to $u$ because the operator has a microlocal inverse. The next theorem makes this expectation precise and identifies the only two possible sources of singularities of $u$: singularities already present in $Pu$, and characteristic covectors of $P$.
[quotetheorem:8175]
[citeproof:8175]
The inclusion is the first major payoff of the operator viewpoint. Proper support ensures that $Pu$ is a globally defined distribution, while ellipticity away from $\operatorname{Char}(P)$ is essential because no parametrix is available at characteristic covectors; for $P=D_{x_1}$, singularities constant in the $x_1$ direction can remain invisible to $Pu$. The theorem does not say that every characteristic covector is singular, only that elliptic regularity cannot rule it out. This distinction is what later propagation theorems refine.
[example: Laplacian Away From Characteristic Directions]
Let $P=-\Delta$ on $\mathbb R^n$, using the convention $D_{x_j}=-i\partial_{x_j}$. Since $\partial_{x_j}=iD_{x_j}$, we have $\partial_{x_j}^2=-D_{x_j}^2$, and therefore
\begin{align*}
-\Delta=-\sum_{j=1}^n \partial_{x_j}^2=\sum_{j=1}^n D_{x_j}^2.
\end{align*}
Replacing each $D_{x_j}$ by $\xi_j$ gives the order-two principal symbol
\begin{align*}
p_2(x,\xi)=\sum_{j=1}^n \xi_j^2=|\xi|^2.
\end{align*}
For $\xi\neq 0$, at least one component $\xi_j$ is nonzero, so $\xi_j^2>0$ and hence
\begin{align*}
|\xi|^2=\sum_{j=1}^n \xi_j^2>0.
\end{align*}
Thus no nonzero covector satisfies $p_2(x,\xi)=0$, so
\begin{align*}
\operatorname{Char}(-\Delta)=\varnothing.
\end{align*}
Now assume $-\Delta u=f$ and $f$ is smooth on an open neighbourhood $U$ of $x_0$. Smoothness of $f$ near $x_0$ means that no covector over $U$ lies in $WF(f)$, so in particular
\begin{align*}
(x_0,\xi_0)\notin WF(f)
\end{align*}
for every $\xi_0\neq 0$. Since $Pu=f$ and $\operatorname{Char}(P)=\varnothing$, *Microlocal Elliptic Regularity* gives
\begin{align*}
WF(u)\subset WF(f)\cup\operatorname{Char}(P)=WF(f)\cup\varnothing=WF(f).
\end{align*}
Therefore no nonzero covector over $x_0$ lies in $WF(u)$. By *Projection To Singular Support*, the projection of $WF(u)$ to the base is the singular support of $u$, so $x_0$ is not in the singular support of $u$. Hence $u$ is smooth near $x_0$. This recovers ordinary local elliptic regularity from the microlocal statement, and the same argument works for any differential operator whose principal symbol is nonzero on $T^*X\setminus 0$.
[/example]
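As a sketch of why an empty characteristic set produces a microlocal inverse, one can write down an explicit parametrix for $-\Delta$ on the Fourier side. The cutoff $\chi$, the multiplier $q$, and the restriction to compactly supported $u$ are assumptions introduced here for illustration; they are not part of the example above. Take $\chi\in C_c^\infty(\mathbb R^n)$ with $\chi=1$ near $0$ and set
\begin{align*}
q(\xi)=\frac{1-\chi(\xi)}{|\xi|^2},\qquad q(\xi)\,|\xi|^2=1-\chi(\xi).
\end{align*}
For $u\in\mathcal E'(\mathbb R^n)$ with $-\Delta u=f$, applying the Fourier multiplier $q(D)$ gives
\begin{align*}
q(D)f=q(D)(-\Delta)u=u-\chi(D)u,\qquad\text{so}\qquad u=q(D)f+\chi(D)u.
\end{align*}
The term $\chi(D)u$ is smooth because its Fourier transform $\chi\widehat u$ is compactly supported, and $q(D)$ is a pseudodifferential operator of order $-2$, so it does not enlarge wave front sets. Under these assumptions the identity exhibits $WF(u)\subset WF(f)$ directly.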
Operators with nonempty characteristic set are more subtle. Elliptic regularity does not say singularities must occur at characteristic covectors; it says that only characteristic covectors can hide singularities not already visible in $Pu$.
[example: A First Order Characteristic Direction]
On $\mathbb R^2$, let $P=D_{x_1}$ with the convention $D_{x_1}=-i\partial_{x_1}$ of the previous example, so that replacing $D_{x_1}$ by $\xi_1$ gives the order-one principal symbol
\begin{align*}
p_1(x,\xi)=\xi_1.
\end{align*}
By the definition of the characteristic set,
\begin{align*}
\operatorname{Char}(P)=\{(x,\xi)\in T^*\mathbb R^2\setminus 0:p_1(x,\xi)=0\}.
\end{align*}
Since $p_1(x,\xi)=\xi_1$, this becomes
\begin{align*}
\operatorname{Char}(P)=\{(x,\xi)\in T^*\mathbb R^2\setminus 0:\xi_1=0\}.
\end{align*}
Now fix $(x_0,\xi_0)\in T^*\mathbb R^2\setminus 0$ with $\xi_{0,1}\neq 0$. Then
\begin{align*}
p_1(x_0,\xi_0)=\xi_{0,1}\neq 0,
\end{align*}
so $P$ is elliptic at $(x_0,\xi_0)$. If $Pu$ is smooth microlocally near $(x_0,\xi_0)$, then $(x_0,\xi_0)\notin WF(Pu)$. Also $(x_0,\xi_0)\notin \operatorname{Char}(P)$ because $\xi_{0,1}\neq 0$. Hence *Microlocal Elliptic Regularity* gives
\begin{align*}
(x_0,\xi_0)\notin WF(u).
\end{align*}
Thus $u$ is microlocally smooth at every covector over which $Pu$ is microlocally smooth and $\xi_1\neq 0$.
When $\xi_{0,1}=0$, the covector lies in $\operatorname{Char}(P)$, so elliptic regularity gives no conclusion about whether $(x_0,\xi_0)$ belongs to $WF(u)$. Those characteristic directions are precisely the directions where later propagation results must supply the missing information.
[/example]
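To see that characteristic covectors can genuinely hide singularities, consider the following hedged illustration; the choice $u=\delta(x_2)$ is an assumption made here and is not taken from the example above. Define $u$ on $\mathbb R^2$ by $u(\Phi)=\int_{\mathbb R}\Phi(x_1,0)\,dx_1$. Since $u$ does not depend on $x_1$,
\begin{align*}
D_{x_1}u=0,\qquad WF(D_{x_1}u)=\varnothing,
\end{align*}
while the standard computation for a hypersurface delta gives
\begin{align*}
WF(u)=\{((x_1,0),\xi_2dx_2):\xi_2\neq 0\}\subset\{\xi_1=0\}=\operatorname{Char}(D_{x_1}).
\end{align*}
So the inclusion $WF(u)\subset WF(Pu)\cup\operatorname{Char}(P)$ holds with all of $WF(u)$ sitting inside the characteristic set, and nothing about these singularities can be read off from $Pu$.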
## Equivalence With The Fourier Definition
The previous chapter defined $WF(u)$ through localized Fourier decay, while this chapter uses pseudodifferential cutoffs. The two languages must agree for the theory to be coherent: Fourier decay gives concrete estimates, and operators give calculus and parametrices.
[quotetheorem:8176]
[citeproof:8176]
This equivalence explains why pseudodifferential methods do not change the object defined in Chapter 1. Proper support and the fixed neighbourhood $U\times\Gamma$ in the third condition are part of the statement, since otherwise $Bu$ may not be globally defined or the quantifier over cutoffs may drift away from the covector under inspection. For a concrete failure mode, the operator on $\mathbb R^n$ with kernel $K(x,y)=1$ sends a test function $\phi\in C_c^\infty(\mathbb R^n)$ to the constant function with value $\int\phi(y)\,dy$, but its formal action on a general distribution such as $u=1$ would require pairing $u$ with the function $y\mapsto K(x,y)=1$, which is not compactly supported; the resulting integral $\int 1\,dy$ diverges, so the usual distributional action is not defined. This is why the global operator formulation restricts to properly supported representatives, while local Fourier tests insert cutoffs before applying the operator. The theorem does not replace Fourier estimates by a weaker notion; it proves that the operator calculus tests exactly the same rapid decay. This is why parametrices and symbolic calculus can be used from now on without redefining $WF(u)$.
[remark: Microsupport Versus Characteristic Set]
The microsupport of an operator records where its complete symbol is active, while the characteristic set records where its principal symbol is not elliptic. In the detection theorem, operators with small microsupport localize the test; operators with small characteristic set certify regularity. Both notions are conic, but they answer different questions.
[/remark]
## Sobolev Wave Front Sets
Smoothness is an all-orders condition. For PDE estimates, it is often necessary to ask how many derivatives are present at a covector, not only whether all derivatives are present. Sobolev wave front sets measure this finite-order microlocal regularity.
[definition: Sobolev Wave Front Set]
Let $u\in\mathcal D'(X)$ and $s\in\mathbb R$. A covector $(x_0,\xi_0)\in T^*X\setminus 0$ is not in $WF^s(u)$ if there exists a properly supported operator $A:C_c^\infty(X)\to C^\infty(X)$ with $A\in\Psi^0(X)$, extended continuously as $A:\mathcal D'(X)\to\mathcal D'(X)$, such that $A$ is elliptic at $(x_0,\xi_0)$ and $Au\in H^s_{\mathrm{loc}}(X)$.
[/definition]
This definition interpolates between distributional regularity and smoothness by replacing $C^\infty$ with a fixed Sobolev order. It is natural to ask whether the ordinary wave front set is recovered by imposing this condition at every Sobolev level. The answer is yes, with a closure inserted to match the closed conic nature of $WF(u)$.
[quotetheorem:8177]
[citeproof:8177]
The previous theorem says that the full wave front set is assembled from all finite Sobolev obstructions together with their possible conic limit points. The neighbourhood condition is essential: being absent from each $WF^s(u)$ at a single covector does not by itself prevent finite-order Sobolev singularities from accumulating at that covector from nearby directions. The result does not assign a single sharp order to every singularity; it identifies smooth microlocal regularity as the absence of all finite-order obstructions on a fixed conic neighbourhood. For estimates, however, the order $s$ is fixed, and the first structural question is how these obstructions change as $s$ varies. Higher Sobolev regularity implies lower Sobolev regularity, so the bad sets increase as the requested order increases.
[quotetheorem:8178]
[citeproof:8178]
Monotonicity describes the Sobolev scale itself. The hypothesis $s\le t$ is essential because the inclusion $H^t_{\mathrm{loc}}\subset H^s_{\mathrm{loc}}$ goes only from higher regularity to lower regularity. For example, the hypersurface delta distribution $u=\delta_{\{x_1=0\}}$ has $WF^s(u)=\varnothing$ for every $s<-1/2$, but $WF^t(u)$ is the full nonzero conormal bundle for every $t\ge -1/2$; taking $s<-1/2\le t$ shows that the reverse inclusion $WF^t(u)\subset WF^s(u)$ fails. The theorem does not compute the threshold for a given distribution, only the nesting of the bad sets once the thresholds are known. PDE applications also require knowing how an operator shifts that scale. An operator of order $m$ costs $m$ derivatives, while a microlocal parametrix recovers those derivatives away from the characteristic set. The Sobolev elliptic [regularity theorem](/theorems/2750) records this derivative bookkeeping in the same form as the smooth result.
[quotetheorem:8179]
[citeproof:8179]
This theorem is often the form used in a priori estimates. The order $m$ matters: for a differential operator of order $m$, applying $P$ differentiates $u$ $m$ times, so the right Sobolev order for $Pu$ is $s-m$. The exclusion of $\operatorname{Char}(P)$ is also essential, as characteristic directions admit no microlocal inverse from the principal symbol alone. The theorem does not give propagation along the characteristic set; it only gives recovery away from it. It says that elliptic inversion recovers exactly the number of derivatives predicted by the operator order, microlocally away from characteristics.
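The derivative bookkeeping can be checked by hand in the constant-coefficient model $P=-\Delta$, $m=2$. The following is a sketch under the assumption that $u\in\mathcal E'(\mathbb R^n)$, so that global Fourier transforms are available; the norm convention $\|u\|_{H^s}^2=\int(1+|\xi|^2)^s|\widehat u(\xi)|^2\,d\xi$ and the splitting at $|\xi|=1$ are choices made here. For $|\xi|\ge 1$ one has $1+|\xi|^2\le 2|\xi|^2$, hence
\begin{align*}
(1+|\xi|^2)^{s}|\widehat u(\xi)|^2
=(1+|\xi|^2)^{s-2}(1+|\xi|^2)^{2}|\widehat u(\xi)|^2
\le 4(1+|\xi|^2)^{s-2}\bigl||\xi|^2\widehat u(\xi)\bigr|^2
=4(1+|\xi|^2)^{s-2}|\widehat{Pu}(\xi)|^2.
\end{align*}
Integrating over $|\xi|\ge 1$ and using that $\widehat u$ is bounded on $|\xi|\le 1$ gives
\begin{align*}
\|u\|_{H^s}^2\le 4\|Pu\|_{H^{s-2}}^2+C_s\sup_{|\xi|\le 1}|\widehat u(\xi)|^2,\qquad C_s=\int_{|\xi|\le 1}(1+|\xi|^2)^s\,d\xi.
\end{align*}
In this model, $Pu\in H^{s-2}$ therefore forces $u\in H^s$: a gain of exactly $m=2$ derivatives, up to a low-frequency term that carries no regularity information.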
[example: Conormal Distribution And Sobolev Thresholds]
Let $u=\delta_{\{x_1=0\}}$ on $\mathbb R^n$, so for $\phi\in C_c^\infty(\mathbb R^n)$,
\begin{align*}
u(\phi)=\int_{\mathbb R^{n-1}}\phi(0,x')\,d\mathcal L^{n-1}(x').
\end{align*}
Near a point $(0,x_0')$ choose a product cutoff $\chi(x_1,x')=\alpha(x_1)\beta(x')$ with $\alpha(0)\neq 0$ and $\beta(x_0')\neq 0$. Then
\begin{align*}
(\chi u)(\phi)=\int_{\mathbb R^{n-1}}\alpha(0)\beta(x')\phi(0,x')\,dx'.
\end{align*}
With $\widehat f(\eta)=\int e^{-ix\cdot\eta}f(x)\,dx$, this gives
\begin{align*}
\widehat{\chi u}(\eta_1,\eta')=\alpha(0)\int_{\mathbb R^{n-1}}e^{-ix'\cdot\eta'}\beta(x')\,dx'=\alpha(0)\widehat\beta(\eta').
\end{align*}
The local $H^s$ condition is measured by
\begin{align*}
\int_{\mathbb R}\int_{\mathbb R^{n-1}}(1+\eta_1^2+|\eta'|^2)^s|\alpha(0)|^2|\widehat\beta(\eta')|^2\,d\eta'\,d\eta_1.
\end{align*}
For fixed $\eta'$, the integral in $\eta_1$ has the same large-$|\eta_1|$ convergence as $\int_1^\infty r^{2s}\,dr$, which is finite exactly when $2s<-1$, that is, $s<-1/2$. For such $s$ one has $(1+\eta_1^2+|\eta'|^2)^s\le(1+\eta_1^2)^s$, so the $\eta_1$-integral is bounded uniformly in $\eta'$. Since $\widehat\beta$ is rapidly decreasing, the full integral is finite for $s<-1/2$.
At a conormal covector $((0,x_0'),\xi_1 dx_1)$ with $\xi_1\neq 0$, choose $\beta$ so that $\widehat\beta(0)\neq 0$. By continuity there are $r_0>0$ and $c_0>0$ with $|\widehat\beta(\eta')|\ge c_0$ for $|\eta'|\le r_0$. Every conic neighbourhood of the ray $\mathbb R_+(\xi_1,0)$ contains a half-tube $\{(\eta_1,\eta'):|\eta'|\le r_0,\ \eta_1\xi_1>0,\ |\eta_1|\ge R\}$ for $R$ large enough, on which $\eta'$ stays bounded while $|\eta_1|\to\infty$. The microlocal Sobolev integral over this half-tube is bounded below by a positive multiple of $\int_R^\infty r^{2s}\,dr$, which diverges when $s\ge -1/2$. Thus the conormal directions belong to $WF^s(u)$ exactly for $s\ge -1/2$.
If $x_1\neq 0$, a cutoff supported near the base point kills $u$, so there is no wave front contribution there. If $x_1=0$ but the covector has a nonzero tangential component $\eta'\neq 0$, then in a cone around that covector one has $|\eta'|\ge c|(\eta_1,\eta')|$ for some $c>0$, and the rapid decay of $\widehat\beta(\eta')$ gives rapid decay in the full frequency variable. Hence
\begin{align*}
WF(u)=\{((0,x'),\xi_1 dx_1):\xi_1\neq 0\},
\end{align*}
and
\begin{align*}
WF^s(u)=\varnothing
\end{align*}
for $s<-1/2$, while
\begin{align*}
WF^s(u)=\{((0,x'),\xi_1 dx_1):\xi_1\neq 0\}
\end{align*}
for $s\ge -1/2$. The Sobolev wave front sets therefore record not only the conormal direction of the hypersurface singularity, but also its sharp regularity threshold.
[/example]
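The threshold can be made quantitative. As a hedged sketch, using only the formula for $\widehat{\chi u}$ derived in the example and a constant $c_s$ introduced here, substitute $\eta_1=(1+|\eta'|^2)^{1/2}t$ in the $\eta_1$-integral:
\begin{align*}
\int_{\mathbb R}(1+\eta_1^2+|\eta'|^2)^s\,d\eta_1
=(1+|\eta'|^2)^{s+1/2}\int_{\mathbb R}(1+t^2)^s\,dt
=c_s\,(1+|\eta'|^2)^{s+1/2},
\end{align*}
where $c_s=\int_{\mathbb R}(1+t^2)^s\,dt$ is finite exactly when $s<-1/2$. For such $s$,
\begin{align*}
\int_{\mathbb R}\int_{\mathbb R^{n-1}}(1+\eta_1^2+|\eta'|^2)^s|\alpha(0)|^2|\widehat\beta(\eta')|^2\,d\eta'\,d\eta_1
=c_s\,|\alpha(0)|^2\int_{\mathbb R^{n-1}}(1+|\eta'|^2)^{s+1/2}|\widehat\beta(\eta')|^2\,d\eta'.
\end{align*}
Read this way, the $H^s$ size of $\chi u$ is a constant multiple of the $H^{s+1/2}(\mathbb R^{n-1})$ size of $\beta$. The half-derivative shift is the same $1/2$ that appears in the threshold.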
Sobolev wave front sets therefore refine the wave front set by assigning a regularity scale to each covector. In later chapters this scale will interact with Fourier integral operators, where orders of amplitudes and dimensions of canonical relations determine the Sobolev loss or gain.
Once singularities can be detected by pseudodifferential operators, the next step is to understand how standard operations on distributions alter their cotangent geometry. Exterior products, pullbacks, pushforwards, and multiplication all obey wave front calculus rules that predict when these operations are legitimate and what singularities they create.
# 3. Operations on Distributions and Wave Front Calculus
This chapter studies the operations on distributions that occur constantly in microlocal analysis: forming exterior products, restricting along maps, integrating along fibres, and multiplying singular objects. The organizing principle is that each operation has a geometric obstruction in the cotangent bundle. The wave front set records exactly the directions in which the operation may fail, so the algebra of distributions becomes a calculus of conic subsets.
Chapter 1 introduced wave front sets by localized Fourier decay, and Chapter 2 recast them using elliptic pseudodifferential cutoffs. We now use them as a working tool. The main theme is that familiar operations from smooth analysis extend to distributions under transversality conditions, and the output wave front set is controlled by a canonical operation on covectors.
## Tensor Products and Exterior Singularities
The first question is how singularities behave when two independent variables are placed side by side. If $u$ is a distribution on $X$ and $v$ is a distribution on $Y$, the [tensor product](/page/Tensor%20Product) $u \otimes v$ is always defined, but its singular directions are not just the Cartesian product of the two wave front sets. Singular behaviour can occur in the $X$-direction alone, in the $Y$-direction alone, or in both directions simultaneously.
[definition: Tensor Product of Distributions]
Let $X$ and $Y$ be smooth manifolds. For $u \in \mathcal{D}'(X)$ and $v \in \mathcal{D}'(Y)$, the tensor product $u \otimes v \in \mathcal{D}'(X \times Y)$ is the distribution determined by
\begin{align*}
(u \otimes v)(\phi \otimes \psi) = u(\phi)v(\psi)
\end{align*}
for all $\phi \in C_c^\infty(X)$ and $\psi \in C_c^\infty(Y)$, extended continuously to $C_c^\infty(X \times Y)$.
[/definition]
This definition is algebraic, but the microlocal content lies in how covectors split under $T^*(X \times Y) \cong T^*X \times T^*Y$. A covector at $(x,y)$ is written $(x,y;\xi,\eta)$, with $\xi \in T_x^*X$ and $\eta \in T_y^*Y$. The next theorem supplies the estimate needed whenever kernels or distributions are placed in independent variables before being restricted or integrated.
[quotetheorem:8180]
[citeproof:8180]
The extra zero-covector terms are essential. The theorem has three distinct hypotheses built into its statement: $u$ and $v$ are arbitrary distributions, the base points must lie in the relevant supports, and the covectors are interpreted in $T^*(X\times Y)\setminus 0$ after the canonical splitting of cotangent spaces. It does not say that singularities of $u\otimes v$ must involve nonzero covectors in both variables; a distribution may be smooth in the $Y$-direction while remaining singular in the $X$-direction, so $(\xi,0)$ must be allowed when $\xi$ is a singular covector of $u$ and $v$ does not vanish at the base point. Nor is the displayed inclusion usually an equality without additional assumptions on the factors, since cancellations or vanishing at the base point may remove some possible directions. The point of the estimate is that it packages all exterior singularities into a form that can be fed into pullback, pushforward, and diagonal restriction later in the chapter.
[example: Tensoring a Delta Distribution with a Smooth Function]
Let $u=\delta_0\in\mathcal{D}'(\mathbb{R})$ and let $f\in C_c^\infty(\mathbb{R})$. For $\Phi\in C_c^\infty(\mathbb{R}^2)$, the tensor product acts by
\begin{align*}
(\delta_0\otimes f)(\Phi)=\delta_0\bigl(x\mapsto \int_{\mathbb{R}} f(y)\Phi(x,y)\,dy\bigr)=\int_{\mathbb{R}} f(y)\Phi(0,y)\,dy.
\end{align*}
Thus $\delta_0\otimes f$ is concentrated on $\{0\}\times \operatorname{supp} f$ and is smooth in the $y$ variable.
Since $f$ is smooth, $\operatorname{WF}(f)=\varnothing$, while
\begin{align*}
\operatorname{WF}(\delta_0)=\{(0,\xi):\xi\neq 0\}.
\end{align*}
The tensor product estimate *Wave Front Set of a Tensor Product* therefore gives
\begin{align*}
\operatorname{WF}(\delta_0\otimes f)\subset \{(0,y;\xi,0):y\in \operatorname{supp}f,\ \xi\neq 0\}.
\end{align*}
For the reverse inclusion, fix $y_0\in\operatorname{supp}f$ and choose cutoffs $\chi,\psi\in C_c^\infty(\mathbb{R})$ with $\chi(0)\neq 0$ and $\psi=1$ near $y_0$. The localized Fourier transform factors as
\begin{align*}
\widehat{(\chi\otimes\psi)(\delta_0\otimes f)}(\xi,\eta)=\chi(0)\widehat{\psi f}(\eta).
\end{align*}
Because $y_0\in\operatorname{supp}f$ and $\psi=1$ near $y_0$, the function $\psi f$ is not identically zero, so there is some $\eta_*$ with $\widehat{\psi f}(\eta_*)\neq 0$. Fix $\xi_0\neq 0$. The points $(\xi,\eta_*)$ with $\xi/\xi_0\to+\infty$ eventually enter every conic neighbourhood of $(\xi_0,0)$, and along them this Fourier transform equals the nonzero constant $\chi(0)\widehat{\psi f}(\eta_*)$, so it has no rapid decay in any such neighbourhood. Hence
\begin{align*}
\operatorname{WF}(\delta_0\otimes f)=\{(0,y;\xi,0):y\in \operatorname{supp}f,\ \xi\neq 0\}.
\end{align*}
This is exactly the exterior singularity where the $x$ covector is nonzero and the $y$ covector is zero, so the $\eta=0$ term in the tensor product estimate is indispensable.
[/example]
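A complementary case shows all three pieces of the estimate at work. The following is a sketch that assumes the tensor product estimate has the standard form
\begin{align*}
\operatorname{WF}(u\otimes v)\subset\bigl(\operatorname{WF}(u)\times\operatorname{WF}(v)\bigr)\cup\bigl(\operatorname{WF}(u)\times(\operatorname{supp}v\times\{0\})\bigr)\cup\bigl((\operatorname{supp}u\times\{0\})\times\operatorname{WF}(v)\bigr),
\end{align*}
with covectors reordered as $(x,y;\xi,\eta)$. Take $u=v=\delta_0$ on $\mathbb R$, so that $u\otimes v=\delta_{(0,0)}$ on $\mathbb R^2$. The three pieces are
\begin{align*}
\{(0,0;\xi,\eta):\xi\neq 0,\ \eta\neq 0\},\qquad \{(0,0;\xi,0):\xi\neq 0\},\qquad \{(0,0;0,\eta):\eta\neq 0\},
\end{align*}
and their union is
\begin{align*}
\{(0,0;\xi,\eta):(\xi,\eta)\neq(0,0)\}=\operatorname{WF}(\delta_{(0,0)}).
\end{align*}
Here the inclusion is an equality, and the two zero-covector pieces supply the coordinate axes of the fibre, which the product $\operatorname{WF}(u)\times\operatorname{WF}(v)$ alone would miss.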
## Pullback and the Normal Set Condition
The next problem is restriction and change of variables. Smooth functions pull back along every smooth map $F:X\to Y$, but distributions cannot always be pulled back: evaluating a distribution on the image of a map may force it to meet a singularity from a forbidden direction. The obstruction is measured by the covectors normal to $F$.
[definition: Normal Set of a Smooth Map]
Let $F:X\to Y$ be a smooth map between smooth manifolds. The normal set of $F$ is the subset $N_F\subset T^*Y\setminus 0$ defined by
\begin{align*}
N_F=\{(F(x),\eta)\in T^*Y\setminus 0 : dF_x^\top \eta=0\}.
\end{align*}
Here $dF_x^\top:T^*_{F(x)}Y\to T_x^*X$ is the transpose of the differential $dF_x:T_xX\to T_{F(x)}Y$.
[/definition]
The set $N_F$ records covectors on $Y$ that annihilate every tangent vector coming from $X$. This is precisely the obstruction that appears when trying to test $u$ only along the image of $F$: a singular covector normal to the image cannot be averaged away by motion inside $X$. The following theorem turns that obstruction into a construction and gives the covector rule for the pulled-back distribution.
[quotetheorem:8181]
[citeproof:8181]
For an embedding, this theorem is the microlocal version of restricting a distribution to a submanifold. The forbidden covectors are precisely those conormal to the embedded submanifold, and the condition is not cosmetic: the distribution $\delta_0$ on $\mathbb{R}$ cannot be pulled back along the point inclusion $i:\{0\}\hookrightarrow\mathbb{R}$, because $N_i=T_0^*\mathbb{R}\setminus 0$ meets $\operatorname{WF}(\delta_0)$. The theorem gives a sufficient construction and a wave front estimate, but it does not define every possible ad hoc restriction, nor does it assert equality in the wave front inclusion. Its value is that it gives a stable criterion compatible with later operations, especially traces of kernels and diagonal pullbacks used to define products.
[example: Restriction to a Hypersurface]
Let $i:H\hookrightarrow M$ be the inclusion of a smooth hypersurface. In local coordinates near a point of $H$, write $H=\{x_n=0\}$ and
\begin{align*}
i(x_1,\ldots,x_{n-1})=(x_1,\ldots,x_{n-1},0).
\end{align*}
For $v=(v_1,\ldots,v_{n-1})\in T_xH$, the differential is
\begin{align*}
di_x(v)=(v_1,\ldots,v_{n-1},0)\in T_xM.
\end{align*}
If $\eta=\eta_1dx_1+\cdots+\eta_ndx_n\in T_x^*M$, then
\begin{align*}
di_x^\top\eta=\eta_1dx_1+\cdots+\eta_{n-1}dx_{n-1}\in T_x^*H.
\end{align*}
Thus $di_x^\top\eta=0$ exactly when $\eta_1=\cdots=\eta_{n-1}=0$, so the nonzero normal covectors are precisely the nonzero multiples of $dx_n$. Therefore
\begin{align*}
N_i=N^*H\setminus 0.
\end{align*}
If $u\in\mathcal{D}'(M)$ satisfies $\operatorname{WF}(u)\cap N^*H=\varnothing$, then the normal obstruction for $i$ is absent, and *Pullback Theorem for Distributions* defines the restriction $i^*u\in\mathcal{D}'(H)$. For example, if $u$ is conormal to another hypersurface $K=\{\rho=0\}$, then its singular covectors lie along nonzero multiples of $d\rho$. The restriction to $H$ is allowed at points of $H\cap K$ whenever no nonzero multiple of $d\rho$ lies in $N^*H$; equivalently, the conormal direction of $K$ is not also normal to $H$. This is the microlocal form of the condition that the singular direction of $u$ must not be invisible to motion along $H$.
[/example]
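The excluded case $K=H$ shows concretely what goes wrong. This is a hedged sketch; the mollifier $\rho$ and the choice $u=\delta(x_n)$ are introduced here for illustration. Write $x=(x',x_n)$ with $x'=(x_1,\ldots,x_{n-1})$, let $H=\{x_n=0\}$, and take $u=\delta(x_n)$. Its wave front set is conormal to $H$:
\begin{align*}
\operatorname{WF}(u)=\{((x',0),\xi_ndx_n):\xi_n\neq 0\}=N^*H\setminus 0=N_i,
\end{align*}
so the hypothesis $\operatorname{WF}(u)\cap N^*H=\varnothing$ fails at every point of $H$. Regularizing with $u_\varepsilon(x)=\varepsilon^{-1}\rho(x_n/\varepsilon)$, where $\rho\in C_c^\infty(\mathbb R)$ and $\int\rho(s)\,ds=1$, gives
\begin{align*}
i^*u_\varepsilon(x')=u_\varepsilon(x',0)=\varepsilon^{-1}\rho(0).
\end{align*}
If $\rho(0)\neq 0$ this diverges as $\varepsilon\to 0$, while if $\rho(0)=0$ the restrictions vanish identically. The outcome depends on the regularization, so no canonical restriction exists in this case.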
The theorem also explains why evaluating a distribution at a singular point is not a harmless operation. For a point inclusion the tangent space is zero, so every nonzero covector at that point lies in the normal set; if $u$ has any singular covector over the point, the normal condition fails.
[example: Trace of a Conormal Distribution]
Let $M=\mathbb{R}^2$ with coordinates $(x_1,x_2)$, let $K=\{x_1=0\}$, and let $u=\delta(x_1)$, so
\begin{align*}
u(\Phi)=\int_{\mathbb{R}}\Phi(0,x_2)\,dx_2
\end{align*}
for every $\Phi\in C_c^\infty(\mathbb{R}^2)$. Let $H=\{x_2=0\}$ and parametrize $H$ by $t\mapsto i(t)=(t,0)$. For $a\partial_t\in T_tH$,
\begin{align*}
di_t(a\partial_t)=a\partial_{x_1}.
\end{align*}
If $\eta=\eta_1dx_1+\eta_2dx_2\in T^*_{(t,0)}M$, then
\begin{align*}
di_t^\top\eta(a\partial_t)=\eta(di_t(a\partial_t))=\eta(a\partial_{x_1})=a\eta_1.
\end{align*}
Thus
\begin{align*}
di_t^\top\eta=\eta_1dt.
\end{align*}
Hence $di_t^\top\eta=0$ exactly when $\eta_1=0$, so
\begin{align*}
N_i=\{((t,0),\eta_2dx_2):\eta_2\neq 0\}=N^*H\setminus 0.
\end{align*}
The wave front set of $\delta(x_1)$ is conormal to $K$:
\begin{align*}
\operatorname{WF}(u)=\{((0,x_2),\xi dx_1):\xi\neq 0\}.
\end{align*}
A covector in $N_i$ has no $dx_1$ component, while a covector in $\operatorname{WF}(u)$ has no $dx_2$ component and has nonzero $dx_1$ component. Therefore
\begin{align*}
\operatorname{WF}(u)\cap N_i=\varnothing.
\end{align*}
By the pullback theorem, the trace $i^*u$ is defined.
To identify it, regularize only in the $x_1$ variable: choose $\rho\in C_c^\infty(\mathbb{R})$ with $\int\rho(s)\,ds=1$ and set $\rho_\varepsilon(t)=\varepsilon^{-1}\rho(t/\varepsilon)$. The smooth functions $u_\varepsilon(x_1,x_2)=\rho_\varepsilon(x_1)$ converge to $\delta(x_1)$ as distributions. Their restrictions to $H$ are
\begin{align*}
i^*u_\varepsilon(t)=u_\varepsilon(t,0)=\rho_\varepsilon(t).
\end{align*}
For $\varphi\in C_c^\infty(H)\cong C_c^\infty(\mathbb{R})$,
\begin{align*}
\int_{\mathbb{R}}\rho_\varepsilon(t)\varphi(t)\,dt=\int_{\mathbb{R}}\rho(s)\varphi(\varepsilon s)\,ds
\end{align*}
after the substitution $t=\varepsilon s$. Since $\varphi(\varepsilon s)\to\varphi(0)$ on the compact support of $\rho$, the limit is
\begin{align*}
\int_{\mathbb{R}}\rho(s)\varphi(0)\,ds=\varphi(0)\int_{\mathbb{R}}\rho(s)\,ds=\varphi(0).
\end{align*}
Thus $i^*u=\delta_0$ on $H$. The trace exists because the singular covectors of $u$ point in the $dx_1$ direction, while the covectors normal to the restriction surface $H$ point in the $dx_2$ direction.
[/example]
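The wave front set of this trace can be compared with the covector rule for pullbacks. The check below assumes the pullback estimate has the standard form $\operatorname{WF}(F^*u)\subset\{(x,dF_x^\top\eta):(F(x),\eta)\in\operatorname{WF}(u)\}$; the theorem statement is not reproduced in this section, so that form is an assumption. A point $(i(t),\eta)=((t,0),\eta_1dx_1+\eta_2dx_2)$ lies in $\operatorname{WF}(\delta(x_1))$ exactly when $t=0$, $\eta_2=0$, and $\eta_1\neq 0$. Applying $di_t^\top\eta=\eta_1dt$ gives
\begin{align*}
\{(t,di_t^\top\eta):(i(t),\eta)\in\operatorname{WF}(u)\}=\{(0,\eta_1dt):\eta_1\neq 0\}=\operatorname{WF}(\delta_0).
\end{align*}
In this example the estimate is sharp: the trace $i^*u=\delta_0$ has exactly the predicted wave front set.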
## Pushforward Under Proper Submersions
Pullback moves distributions against a map; pushforward moves them along a map by integrating over fibres. The operation is not always defined without support conditions, because the image of a noncompact support can escape every compact set. For example, if $\pi:\mathbb{R}^2\to\mathbb{R}$ is projection onto the first coordinate and $u=1\in\mathcal{D}'(\mathbb{R}^2)$, then the formal expression $(\pi_*u)(\phi)=\int_{\mathbb{R}^2}\phi(x_1)\,dx_1dx_2$ diverges for any nonzero nonnegative $\phi\in C_c^\infty(\mathbb{R})$. For proper maps on the support, however, pushforward is a continuous operation on distributions.
[definition: Pushforward of a Compactly Supported Distribution]
Let $F:X\to Y$ be a smooth map and let $u\in \mathcal{E}'(X)$. The pushforward $F_*u\in \mathcal{D}'(Y)$ is defined by
\begin{align*}
(F_*u)(\phi)=u(\phi\circ F)
\end{align*}
for all $\phi\in C_c^\infty(Y)$.
[/definition]
For noncompactly supported distributions, the same formula is valid when $F$ is proper on $\operatorname{supp}u$. The microlocal question is then which covectors of $u$ survive the fibre integration and appear downstairs on $Y$. For submersions, the answer is governed by covectors lifted from the base, giving the estimate used later for kernels and Radon-type transforms.
[quotetheorem:8182]
[citeproof:8182]
This result says that pushforward only sees singularities whose covectors are horizontal with respect to the fibres. The properness assumption is needed even to make the formula define a distribution in general, as the constant distribution under a projection shows: infinite mass in the fibres may be sent to a compact set downstairs. The submersion assumption is also doing real work. For a concrete failure, let $F:\mathbb{R}\to\mathbb{R}$ be $F(x)=x^2$ and let $u=\chi(x)\,dx$ be a compactly supported smooth density with $\chi=1$ near $0$. Although $\operatorname{WF}(u)=\varnothing$, the pushforward is represented on $y>0$ by a density of the form
\begin{align*}
\frac{\chi(\sqrt{y})+\chi(-\sqrt{y})}{2\sqrt{y}}\,dy,
\end{align*}
with a conormal singularity at the critical value $y=0$. Thus a critical point can create a singularity downstairs even when the input is smooth, so the simple horizontal-covector estimate must be replaced by the full canonical-relation calculus. The theorem therefore gives the clean model case for fibre integration, while also indicating exactly which hypotheses later Fourier integral operator results will generalize. Purely vertical oscillation may disappear after integration, while horizontal singularities project to singularities on the base.
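The density formula for $F(x)=x^2$ can be recovered by a direct change of variables. This is a sketch of the computation, with the test function $\varphi\in C_c^\infty(\mathbb R)$ introduced here. By the definition of pushforward,
\begin{align*}
(F_*u)(\varphi)=\int_{\mathbb R}\varphi(x^2)\chi(x)\,dx=\int_0^\infty\varphi(x^2)\chi(x)\,dx+\int_0^\infty\varphi(x^2)\chi(-x)\,dx.
\end{align*}
Substituting $y=x^2$, so that $x=\sqrt y$ and $dx=dy/(2\sqrt y)$, gives
\begin{align*}
(F_*u)(\varphi)=\int_0^\infty\varphi(y)\,\frac{\chi(\sqrt y)+\chi(-\sqrt y)}{2\sqrt y}\,dy.
\end{align*}
Since $\chi=1$ near $0$, the density equals $y^{-1/2}$ for small $y>0$ and vanishes for $y<0$. It is locally integrable but not smooth at $y=0$, which is the singularity created by the critical point.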
[example: Model Projection and Radon-Type Pushforward]
Let $\pi:\mathbb{R}^2\to\mathbb{R}$ be $\pi(x_1,x_2)=x_1$, and let $u\in\mathcal{D}'(\mathbb{R}^2)$ have compact support. For $\eta=\xi_1dx_1\in T_{x_1}^*\mathbb{R}$ and $a\partial_{x_1}+b\partial_{x_2}\in T_{(x_1,x_2)}\mathbb{R}^2$, the differential satisfies
\begin{align*}
d\pi_{(x_1,x_2)}(a\partial_{x_1}+b\partial_{x_2})=a\partial_{x_1}.
\end{align*}
Therefore
\begin{align*}
d\pi_{(x_1,x_2)}^\top(\xi_1dx_1)(a\partial_{x_1}+b\partial_{x_2})=\xi_1dx_1(a\partial_{x_1})=\xi_1a.
\end{align*}
Thus
\begin{align*}
d\pi_{(x_1,x_2)}^\top(\xi_1dx_1)=\xi_1dx_1.
\end{align*}
By *Pushforward Wave Front Estimate*,
\begin{align*}
\operatorname{WF}(\pi_*u)\subset \{(x_1,\xi_1): ((x_1,x_2),\xi_1dx_1)\in \operatorname{WF}(u)\text{ for some }x_2\}.
\end{align*}
As a concrete conormal example, take
\begin{align*}
u=\chi(x_1,x_2)\delta(x_2-g(x_1)),
\end{align*}
where $g\in C^\infty(\mathbb{R})$, $\chi\in C_c^\infty(\mathbb{R}^2)$, and $g'(x_1)\neq 0$ on the relevant support. The curve is the zero set of $F(x_1,x_2)=x_2-g(x_1)$, and
\begin{align*}
dF=-g'(x_1)dx_1+dx_2.
\end{align*}
Hence the conormal covectors along $x_2=g(x_1)$ are the nonzero multiples
\begin{align*}
\lambda(-g'(x_1)dx_1+dx_2),\qquad \lambda\neq 0.
\end{align*}
A lifted covector from the base has the form $\xi_1dx_1$. If such a lifted covector were equal to a conormal covector, then
\begin{align*}
\xi_1dx_1=\lambda(-g'(x_1)dx_1+dx_2).
\end{align*}
Comparing the $dx_2$ coefficients gives $0=\lambda$, and then comparing the $dx_1$ coefficients gives $\xi_1=0$. Thus the only common covector is the zero covector, which is excluded from wave front sets.
It remains to identify the pushforward. For $\varphi\in C_c^\infty(\mathbb{R})$,
\begin{align*}
(\pi_*u)(\varphi)=u(\varphi\circ\pi).
\end{align*}
Since $(\varphi\circ\pi)(x_1,x_2)=\varphi(x_1)$, the defining action of $\delta(x_2-g(x_1))$ gives
\begin{align*}
u(\varphi\circ\pi)=\int_{\mathbb{R}}\chi(x_1,g(x_1))\varphi(x_1)\,dx_1.
\end{align*}
Therefore $\pi_*u$ is the compactly supported smooth density represented by $\chi(x_1,g(x_1))\,dx_1$, so $\operatorname{WF}(\pi_*u)=\varnothing$. The conormal singularity is transverse to the fibres of $\pi$, and fibre integration removes it instead of projecting it to a singularity on the base.
[/example]
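For contrast, a singularity whose covectors are lifted from the base does survive fibre integration. The following is a hedged sketch; the choice of $u$ is an assumption made for illustration. Take $u=\chi(x_1,x_2)\delta(x_1)$ with $\chi\in C_c^\infty(\mathbb R^2)$, meaning $u(\Phi)=\int_{\mathbb R}\chi(0,x_2)\Phi(0,x_2)\,dx_2$. Its singular covectors are conormal to the fibre $\{x_1=0\}$:
\begin{align*}
\operatorname{WF}(u)\subset\{((0,x_2),\xi_1dx_1):\xi_1\neq 0,\ (0,x_2)\in\operatorname{supp}\chi\},
\end{align*}
and these are exactly of the lifted form $\xi_1dx_1$. For $\varphi\in C_c^\infty(\mathbb R)$,
\begin{align*}
(\pi_*u)(\varphi)=u(\varphi\circ\pi)=\int_{\mathbb R}\chi(0,x_2)\varphi(0)\,dx_2=c\,\varphi(0),\qquad c=\int_{\mathbb R}\chi(0,x_2)\,dx_2.
\end{align*}
Thus $\pi_*u=c\,\delta_0$, and when $c\neq 0$ its wave front set is $\{(0,\xi_1):\xi_1\neq 0\}$, as the estimate allows. When $c=0$ the pushforward vanishes, a reminder that the estimate is an inclusion and not an equality.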
## Products and Hörmander Transversality
The final operation is multiplication. Smooth functions multiply distributions without restriction, but multiplying two singular distributions requires a compatibility condition. The product $uv$ can be viewed as pulling back $u\otimes v$ along the diagonal map, so the pullback theorem turns the problem into a wave front transversality criterion.
[definition: Diagonal Map]
Let $X$ be a smooth manifold. The diagonal map is the smooth map $\Delta:X\to X\times X$ defined by
\begin{align*}
\Delta:X\to X\times X,\qquad \Delta(x)=(x,x).
\end{align*}
[/definition]
The normal set of the diagonal consists of opposite covectors at the same base point. Since products are diagonal restrictions of tensor products, this normal set is exactly where multiplication can break down. The next theorem combines the tensor product estimate with the pullback theorem to give a usable criterion for nonlinear distributional expressions.
[quotetheorem:8183]
[citeproof:8183]
This theorem is the main local rule for nonlinear expressions involving distributions. Its hypothesis is necessary for this construction because the diagonal pullback fails exactly when $\operatorname{WF}(u\otimes v)$ meets the conormal bundle of the diagonal; the model failure is $u=v=\delta_0$, where both opposite covectors occur at the origin. It does not say that multiplication is impossible whenever both factors are singular at the same point; it says that their singular covectors must not point in opposite directions. It also does not classify all possible renormalized or problem-specific products, since those may depend on extra choices beyond distribution theory. The theorem supplies a canonical product when the transversality condition holds, and the displayed estimate is the bookkeeping rule used later when kernels are composed or nonlinear PDE terms are interpreted microlocally.
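The hypothesis can be unpacked by computing the normal set of the diagonal map directly. This is a sketch in the notation of the definition of $N_F$, with $(\xi,\eta)\in T_x^*X\times T_x^*X$ denoting a covector at $(x,x)$. For $v\in T_xX$,
\begin{align*}
d\Delta_x(v)=(v,v),\qquad \bigl(d\Delta_x^\top(\xi,\eta)\bigr)(v)=\xi(v)+\eta(v),
\end{align*}
so $d\Delta_x^\top(\xi,\eta)=\xi+\eta$ and
\begin{align*}
N_\Delta=\{(x,x;\xi,-\xi):\xi\in T_x^*X\setminus 0\}.
\end{align*}
Assuming the tensor product estimate in its standard form, a point of $\operatorname{WF}(u\otimes v)$ of this shape has both components nonzero, so it forces $(x,\xi)\in\operatorname{WF}(u)$ and $(x,-\xi)\in\operatorname{WF}(v)$. This is the opposite-covector condition. When it is excluded, the covector rule for pullback sends $(x,x;\xi,\eta)$ to $(x,\xi+\eta)$, which accounts for the sum of covectors in the wave front estimate for $uv$.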
[example: Why Delta Squared Is Not Defined]
On $\mathbb{R}$, the delta distribution has
\begin{align*}
\operatorname{WF}(\delta_0)=\{(0,\xi):\xi\neq 0\}.
\end{align*}
If $u=v=\delta_0$ and $\xi\neq 0$, then $(0,\xi)\in\operatorname{WF}(u)$ and, since $-\xi\neq 0$, also $(0,-\xi)\in\operatorname{WF}(v)$. Thus the opposite-covector hypothesis in *[Hörmander Product Theorem](/theorems/8183)* fails at $0$, so $\delta_0^2$ is not defined by Hörmander's product construction.
The same obstruction appears under mollification. Choose a real $\rho\in C_c^\infty(\mathbb{R})$ with $\int_{\mathbb{R}}\rho(s)\,ds=1$, set $\rho_\varepsilon(x)=\varepsilon^{-1}\rho(x/\varepsilon)$, and test $\rho_\varepsilon^2$ against $\varphi\in C_c^\infty(\mathbb{R})$ with $\varphi=1$ near $0$. Then
\begin{align*}
\langle \rho_\varepsilon^2,\varphi\rangle=\int_{\mathbb{R}}\varepsilon^{-2}\rho(x/\varepsilon)^2\varphi(x)\,dx.
\end{align*}
With $x=\varepsilon s$, this becomes
\begin{align*}
\langle \rho_\varepsilon^2,\varphi\rangle=\varepsilon^{-1}\int_{\mathbb{R}}\rho(s)^2\varphi(\varepsilon s)\,ds.
\end{align*}
For all sufficiently small $\varepsilon$, the compact support of $\rho$ is mapped into the region where $\varphi(\varepsilon s)=1$, so
\begin{align*}
\langle \rho_\varepsilon^2,\varphi\rangle=\varepsilon^{-1}\int_{\mathbb{R}}\rho(s)^2\,ds.
\end{align*}
Since $\rho\not\equiv 0$, the integral $\int\rho(s)^2\,ds$ is positive, and the pairing tends to $+\infty$ as $\varepsilon\to 0$. Therefore the squares of [standard mollifier](/page/Standard%20Mollifier) approximations do not converge to a distribution, matching the microlocal failure of the product criterion.
[/example]
The product theorem also gives positive examples. Boundary values of holomorphic functions, conormal distributions with compatible orientations, and restrictions of kernels away from bad conormal intersections are often multiplied using exactly this criterion.
[example: Multiplying One-Sided Boundary Values]
Let
\begin{align*}
u=(x+i0)^{-1}=\lim_{\varepsilon\downarrow 0}(x+i\varepsilon)^{-1}
\end{align*}
as a boundary value distribution on $\mathbb{R}$. With the Fourier transform convention used for wave front sets, this boundary value has
\begin{align*}
\operatorname{WF}(u)=\{(0,\xi):\xi>0\}.
\end{align*}
Taking $v=u$, the obstruction in *Hörmander Product Theorem* would require some nonzero $\xi$ with
\begin{align*}
(0,\xi)\in\operatorname{WF}(u)
\end{align*}
and
\begin{align*}
(0,-\xi)\in\operatorname{WF}(v).
\end{align*}
The first condition says $\xi>0$, while the second says $-\xi>0$, equivalently $\xi<0$. These two inequalities cannot hold simultaneously, so the opposite-covector obstruction is absent.
Therefore the product $uv$ is defined by Hörmander's construction. Its wave front estimate gives
\begin{align*}
\operatorname{WF}(uv)\subset \operatorname{WF}(u)\cup\operatorname{WF}(v)\cup\{(0,\xi+\eta):(0,\xi)\in\operatorname{WF}(u),\ (0,\eta)\in\operatorname{WF}(v)\}.
\end{align*}
Since $\operatorname{WF}(u)=\operatorname{WF}(v)=\{(0,\xi):\xi>0\}$, the sum term is
\begin{align*}
\{(0,\xi+\eta):\xi>0,\ \eta>0\}=\{(0,\zeta):\zeta>0\}.
\end{align*}
Thus
\begin{align*}
\operatorname{WF}(uv)\subset \{(0,\zeta):\zeta>0\}.
\end{align*}
Concretely, the smooth products $(x+i\varepsilon)^{-2}$ converge to the boundary value $(x+i0)^{-2}$, so this product is the expected one:
\begin{align*}
uv=(x+i0)^{-2}.
\end{align*}
By contrast, $(x-i0)^{-1}$ has the opposite ray $\{(0,\xi):\xi<0\}$, so pairing $(x+i0)^{-1}$ with $(x-i0)^{-1}$ creates opposite covectors at $0$ and fails the same criterion.
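The one-sided case can also be checked on the Fourier side. This is a sketch under the assumed convention $\widehat{\psi}(\xi)=\int e^{-ix\xi}\psi(x)\,dx$, with $H$ denoting the Heaviside function (notation introduced only here). Since $(x+i\varepsilon)^{-1}=-i\int_0^\infty e^{ix\xi}e^{-\varepsilon\xi}\,d\xi$, letting $\varepsilon\downarrow 0$ gives
\begin{align*}
\widehat{u}(\xi)=-2\pi i\,H(\xi).
\end{align*}
The convolution that represents the product is then an integral over the bounded interval $0<\eta<\xi$:
\begin{align*}
\frac{1}{2\pi}(\widehat{u}*\widehat{u})(\xi)=\frac{(-2\pi i)^2}{2\pi}\int_{\mathbb R}H(\xi-\eta)H(\eta)\,d\eta=-2\pi\,\xi\,H(\xi).
\end{align*}
This agrees with the transform of $(x+i0)^{-2}=-\partial_x(x+i0)^{-1}$, which is $-i\xi\,\widehat{u}(\xi)=-2\pi\,\xi\,H(\xi)$. No divergence appears because both factors are supported in $\xi\ge 0$.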
[/example]
## The Calculus Viewpoint
The four operations in this chapter form the basic functorial calculus of wave front sets. Tensor product corresponds to taking exterior variables, pullback corresponds to transporting singularities against a map, pushforward corresponds to projecting covectors along fibres, and product corresponds to restricting the exterior product to the diagonal.
[remark: Geometry of the Four Operations]
Each operation has a base-space formula and a cotangent-space formula. Pullback uses $dF_x^\top$ and requires avoidance of $N_F$; pushforward keeps covectors that lift through $dF_x^\top$; product adds covectors at the same base point; tensor product records both separate and joint singular directions. These rules are the local form of the canonical-relation calculus developed later for Fourier integral operators.
[/remark]
The importance of these estimates is not only that they define operations. They make singularity bookkeeping stable under the constructions used in PDE: traces on hypersurfaces, kernels composed by fibre integration, nonlinear products, and Radon-type transforms. In Chapter 9, these cotangent rules become the model for canonical relations and the composition of Fourier integral operators; first, Chapter 4 develops conormal distributions as the basic kernels to which those rules apply.
Conormal distributions supply the first systematic class of examples where the geometry of singularities is completely visible. They model failure of smoothness along a submanifold and provide the local kernels from which more general microlocal objects are built.
# 4. Conormal Distributions and Model Singularities
This chapter introduces the model class of distributions whose singularities are controlled by a smooth submanifold. The guiding point is that many kernels in analysis are neither smooth nor arbitrary: their failure of smoothness has a normal direction and admits an oscillatory description. The prerequisites are the distribution theory and Fourier decay criteria of Chapter 1, the pseudodifferential detection results of Chapter 2, and the operation rules of Chapter 3. Conormal distributions provide the bridge between those wave front set methods and the Fourier integral kernels that appear later.
## Singularities Normal to a Submanifold
The first question is how to describe a distribution that is singular along a submanifold but smooth in directions tangent to it. The wave front set records covectors, so the geometric object attached to the submanifold should live in the cotangent bundle rather than in the tangent bundle.
[definition: Conormal Bundle]
Let $X$ be a smooth manifold and let $S \subset X$ be an embedded submanifold. The conormal bundle of $S$ in $X$ is
\begin{align*}
N^*S = \{(x,\xi) \in T^*X : x \in S,\ \xi(v)=0 \text{ for every } v \in T_xS\}.
\end{align*}
The punctured conormal bundle is $N^*S \setminus 0$, where the zero covectors are removed.
[/definition]
Thus $N^*S$ consists of covectors that test only normal displacement away from $S$. If $S$ has codimension $k$, then each fibre $N_x^*S$ has dimension $k$, and the punctured conormal directions are the only singular directions permitted for the conormal distributions defined below.
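As a quick illustration, not used later, take the unit sphere $S=\{x\in\mathbb R^n:|x|=1\}$ and identify covectors with vectors through the Euclidean inner product. The tangent space is $T_xS=\{v:x\cdot v=0\}$, so a covector $\xi$ annihilates $T_xS$ exactly when it is parallel to $x$:
\begin{align*}
N^*S=\{(x,\lambda x):|x|=1,\ \lambda\in\mathbb R\}.
\end{align*}
The codimension is $1$, and each fibre is the line spanned by the radial covector, in agreement with the dimension count.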
[example: Conormal Bundle of a Coordinate Plane]
In $X=\mathbb R^n$, write $x=(y,z)$ with $y\in \mathbb R^{n-k}$ and $z\in \mathbb R^k$, and take $S=\{(y,z):z=0\}$. A smooth curve in $S$ through $(y,0)$ has the form $\gamma(t)=(y(t),0)$, so
\begin{align*}
\gamma'(0)=(y'(0),0).
\end{align*}
Conversely, every vector $(v,0)\in \mathbb R^{n-k}\times \mathbb R^k$ occurs as the derivative of the curve $\gamma(t)=(y+tv,0)$, so
\begin{align*}
T_{(y,0)}S=\{(v,0):v\in \mathbb R^{n-k}\}.
\end{align*}
A covector at $(y,0)$ can be written as $(\eta,\zeta)\in(\mathbb R^{n-k})^*\times(\mathbb R^k)^*$, acting by
\begin{align*}
(\eta,\zeta)(v,w)=\eta(v)+\zeta(w).
\end{align*}
It belongs to $N^*_{(y,0)}S$ exactly when it vanishes on every tangent vector $(v,0)$:
\begin{align*}
(\eta,\zeta)(v,0)=\eta(v)+\zeta(0)=\eta(v).
\end{align*}
Thus $(\eta,\zeta)$ annihilates $T_{(y,0)}S$ if and only if $\eta(v)=0$ for every $v\in\mathbb R^{n-k}$, which is equivalent to $\eta=0$. Therefore
\begin{align*}
N^*S=\{(y,0;0,\zeta): y\in \mathbb R^{n-k},\ \zeta\in(\mathbb R^k)^*\}.
\end{align*}
The tangential covector component must vanish, while the normal covector component $\zeta$ is unrestricted; these are precisely the normal frequency directions along the coordinate plane.
[/example]
The local coordinate model suggests the definition of a conormal distribution: smooth dependence along $S$, paired with oscillation in the normal covariable. The definition is local because both submanifolds and oscillatory integrals are best handled in adapted charts.
[definition: Conormal Distribution]
Let $S\subset X$ be an embedded submanifold. A distribution $u\in \mathcal D'(X)$ is conormal to $S$ if, in every coordinate chart identifying $S$ locally with $\{z=0\}\subset \mathbb R^{n-k}_y\times \mathbb R^k_z$, the distribution can be written modulo a smooth function as an oscillatory integral
\begin{align*}
u(y,z)=\frac{1}{(2\pi)^k}\int_{\mathbb R^k} e^{iz\cdot \zeta} a(y,\zeta)\,d\zeta,
\end{align*}
where $a$ is a classical symbol in the covariable $\zeta$.
[/definition]
The phrase "modulo a smooth function" means that the representation is intended to capture the singular part. Different symbol orders give different strengths of singularity, and the next problem is to recognize when a less rigid phase still describes the same conormal geometry.
## Oscillatory Integral Representation
The local formula above raises two problems. First, a submanifold need not be presented globally as $z=0$. Second, even locally, later applications use phases more flexible than $z\cdot \zeta$. The correct invariant language is a nondegenerate phase function whose critical set parametrizes the conormal bundle.
[definition: Clean Conormal Phase]
Let $X$ be a smooth manifold, let $S\subset X$ be an embedded submanifold, and let $U\subset X$ be open. A smooth real-valued function $\phi:U\times (\mathbb R^N\setminus 0)\to \mathbb R$ is a conic phase parametrizing $N^*S$ on $U$ if $\phi$ is homogeneous of degree $1$ in $\theta\in\mathbb R^N\setminus 0$, the critical set
\begin{align*}
C_\phi=\{(x,\theta): \partial_\theta\phi(x,\theta)=0\}
\end{align*}
is a smooth conic manifold, and the map
\begin{align*}
C_\phi &\longrightarrow T^*U, & (x,\theta)&\longmapsto (x,d_x\phi(x,\theta))
\end{align*}
has image contained in $N^*S\setminus 0$ and is a local diffeomorphism onto that image.
[/definition]
This definition isolates the geometry carried by the phase from the analytic strength carried by the amplitude. The model phase $\phi(y,z,\zeta)=z\cdot \zeta$ has critical set $z=0$, and $d_x\phi=(0,\zeta)$, exactly the coordinate description of $N^*S$. To use arbitrary phases without changing the class of distributions, we need a parametrization theorem identifying the model formula with every nondegenerate conormal phase.
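As a second illustration of the definition, suppose $S$ is a hypersurface $\{f=0\}$ with $df\ne 0$ on $S$; the function $f$ and the single phase variable $\theta\in\mathbb R\setminus 0$ are assumptions made only for this sketch. For the phase $\phi(x,\theta)=\theta f(x)$,
\begin{align*}
\partial_\theta\phi(x,\theta)=f(x),\qquad C_\phi=\{(x,\theta):f(x)=0,\ \theta\ne 0\},\qquad d_x\phi(x,\theta)=\theta\,df_x.
\end{align*}
The phase is homogeneous of degree $1$ in $\theta$, the critical set is smooth because $df\ne 0$ on $S$, and the map $(x,\theta)\mapsto(x,\theta\,df_x)$ sends $C_\phi$ bijectively onto $N^*S\setminus 0$, since $df_x$ spans the one-dimensional conormal fibre.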
[quotetheorem:8184]
[citeproof:8184]
The hypotheses in this theorem are doing real work. Homogeneity keeps the phase compatible with conic cotangent geometry, while nondegeneracy is what permits stationary phase and the [implicit function theorem](/theorems/52) to replace the phase by the model $z\cdot \zeta$ microlocally. If the phase has a degenerate critical point, for instance $\phi(z,\theta)=z\theta^2$ near $\theta=0$, the critical set no longer gives a smooth conic parametrization of nonzero normal covectors, and stationary phase produces behaviour not governed by the symbol order alone. Likewise, a phase whose critical set maps into a larger or different subset of $T^*X$ may define an oscillatory distribution, but it is not a conormal distribution to $S$ unless the image is the relevant part of $N^*S\setminus 0$.
Thus the theorem is not saying that every oscillatory integral near $S$ is conormal to $S$. It says that the oscillatory integrals with the correct clean critical geometry are precisely the local models compatible with the conormal bundle. This gives the next test for the definition: if conormality is the correct microlocal model, then its wave front set should not contain directions outside the conormal bundle.
[quotetheorem:8185]
[citeproof:8185]
The assumption that $u$ is conormal is essential. A distribution may be singular on the same set $S$ while carrying tangential oscillation: for example, on $\mathbb R_y\times \mathbb R_z$ with $S=\{z=0\}$, multiplying $\delta(z)$ by a non-smooth distribution in $y$ can introduce wave front directions with nonzero $dy$ component. Those covectors do not annihilate $T_{(y,0)}S$, so the conclusion would fail without the conormal oscillatory representation.
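For a concrete instance, take the distribution in $y$ to be $\delta(y)$, so that the product $\delta(y)\delta(z)$ is the point mass at the origin of $\mathbb R^2$. For any cutoff $\chi\in C_c^\infty(\mathbb R^2)$ with $\chi(0,0)\ne 0$,
\begin{align*}
\widehat{\chi\,\delta(y)\delta(z)}(\eta,\zeta)=\chi(0,0),
\end{align*}
which does not decay in any direction. Hence the wave front set is $\{(0,0;\eta,\zeta):(\eta,\zeta)\ne 0\}$, which contains covectors $(\eta,0)$ with $\eta\ne 0$ lying outside $N^*S$.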
The theorem is also only an inclusion. It does not assert that every nonzero conormal covector occurs in the wave front set; cancellations or vanishing of the leading amplitude may remove directions. For the basic examples, however, the whole punctured conormal bundle is visible, so the next refinement is to measure not only the direction of the singularity but also its strength.
## Order and Principal Symbol
Once conormality identifies the directions of singularity, the next question is how strong the singularity is. The answer is encoded in the order of the amplitude, with a normalization shift depending on the number of oscillatory variables and on the ambient dimension convention used for distribution kernels.
[definition: Conormal Distribution of Order]
Let $S\subset X$ have codimension $k$. In adapted coordinates $x=(y,z)$, a distribution represented as
\begin{align*}
u(y,z)=\frac{1}{(2\pi)^k}\int_{\mathbb R^k} e^{iz\cdot \zeta}a(y,\zeta)\,d\zeta
\end{align*}
with $a\in S^m_{\mathrm{cl}}(\mathbb R^{n-k}_y;\mathbb R^k_\zeta)$ is said in these notes to have local amplitude order $m$. The space of such distributions is denoted $I^m_{\mathrm{loc}}(X;S)$ for this local convention.
[/definition]
The notation $I^m_{\mathrm{loc}}(X;S)$ in this chapter is therefore not the invariant standard conormal order used in many references. Different books shift the index of $I^m(X;S)$ by half-density conventions and by codimension-dependent normalizations. These notes keep the local amplitude order visible because it is the order used in the model computations; when kernels are later written as half-densities, the corresponding shift will be stated at that point.
[example: Delta Distribution on a Submanifold]
Let $S=\{z=0\}\subset \mathbb R^{n-k}_y\times \mathbb R^k_z$. For a test function $\Phi\in C_c^\infty(\mathbb R^{n-k}\times\mathbb R^k)$, the distribution defined by integration over $S$ is
\begin{align*}
\langle \delta_S,\Phi\rangle=\int_{\mathbb R^{n-k}}\Phi(y,0)\,dy.
\end{align*}
With the Fourier transform convention $\widehat{\psi}(\zeta)=\int e^{-iz\cdot\zeta}\psi(z)\,dz$, Fourier inversion gives
\begin{align*}
\psi(0)=\frac{1}{(2\pi)^k}\int_{\mathbb R^k}\widehat{\psi}(\zeta)\,d\zeta.
\end{align*}
Applying this to $\psi(z)=\Phi(y,z)$ for each fixed $y$ gives
\begin{align*}
\Phi(y,0)=\frac{1}{(2\pi)^k}\int_{\mathbb R^k}\int_{\mathbb R^k}e^{-iz\cdot\zeta}\Phi(y,z)\,dz\,d\zeta.
\end{align*}
Replacing $\zeta$ by $-\zeta$ in the outer integral gives the equivalent oscillatory form
\begin{align*}
\langle \delta_S,\Phi\rangle=\left\langle \frac{1}{(2\pi)^k}\int_{\mathbb R^k}e^{iz\cdot\zeta}\,d\zeta,\Phi(y,z)\right\rangle.
\end{align*}
Thus, as a distribution,
\begin{align*}
\delta_S(y,z)=\frac{1}{(2\pi)^k}\int_{\mathbb R^k}e^{iz\cdot\zeta}\,d\zeta.
\end{align*}
The amplitude is $a(y,\zeta)=1$. Since $\partial_y^\alpha\partial_\zeta^\beta a=0$ unless $\alpha=\beta=0$, and $|a(y,\zeta)|=1\le (1+|\zeta|)^0$, this is a classical symbol of order $0$. Therefore $\delta_S$ has local amplitude order $0$ in this convention.
To see the wave front directions, localize by $\chi\in C_c^\infty$ near a point $(y_0,0)\in S$. The Fourier transform of $\chi\delta_S$ is
\begin{align*}
\widehat{\chi\delta_S}(\eta,\zeta)=\int_{\mathbb R^{n-k}}e^{-iy\cdot\eta}\chi(y,0)\,dy.
\end{align*}
This expression is independent of the normal frequency $\zeta$ and is rapidly decreasing in $\eta$. Hence cones with $\eta\ne 0$ contain no wave front directions. Along a nonzero normal covector $(0,\zeta_0)$, it suffices to test with cutoffs $\chi\ge 0$ satisfying $\chi(y_0,0)>0$; for these $\int\chi(y,0)\,dy>0$, so the localized Fourier transform does not decay along $(\eta,\zeta)=(0,\lambda\zeta_0)$ as $\lambda\to\infty$. Thus
\begin{align*}
\operatorname{WF}(\delta_S)=\{(y,0;0,\zeta):\zeta\ne 0\}=N^*S\setminus 0.
\end{align*}
The delta distribution is therefore singular exactly in the normal covector directions and smooth along the tangent variables $y$.
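The same representation indicates how the order grows under normal differentiation. This is a sketch in the same coordinates, with the multi-index $\alpha\in\mathbb N^k$ introduced only for this illustration. Differentiating under the oscillatory integral gives
\begin{align*}
\partial_z^\alpha\delta_S(y,z)=\frac{1}{(2\pi)^k}\int_{\mathbb R^k}e^{iz\cdot\zeta}(i\zeta)^\alpha\,d\zeta.
\end{align*}
The amplitude $(i\zeta)^\alpha$ is a homogeneous polynomial of degree $|\alpha|$, so $\partial_z^\alpha\delta_S$ has local amplitude order $|\alpha|$ in this convention, while the phase, and hence the conormal geometry, is unchanged.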
[/example]
The delta example shows that the top homogeneous part of the amplitude controls the first non-smooth term. A naive leading amplitude is not invariant by itself: even in the model phase $z\cdot \zeta$, a linear change of normal variable $z'=Bz$ changes the dual variable to $\zeta'=B^{-\top}\zeta$ and contributes the corresponding density factor to the oscillatory measure. To compare two conormal distributions of the same order, and later to compose kernels, we need a coordinate-independent way to retain this leading data over the conormal directions.
[definition: Principal Conormal Symbol]
Let $u\in I^m_{\mathrm{loc}}(X;S)$ be represented in adapted coordinates by an amplitude with classical expansion
\begin{align*}
a(y,\zeta)\sim a_m(y,\zeta)+a_{m-1}(y,\zeta)+a_{m-2}(y,\zeta)+\cdots,
\end{align*}
where $a_{m-j}$ is homogeneous of degree $m-j$ for $|\zeta|\ge 1$. The principal conormal symbol of $u$ in this representation is the class of the homogeneous leading term $a_m$ as an element of $S^m_{\mathrm{hom}}(N^*S\setminus 0)$, with the density twisting supplied by the chosen local coordinates and oscillatory measure.
[/definition]
The phrase "in this representation" matters because changing phase and coordinates transforms the leading term by a Jacobian and a Maslov-type factor in more general settings. The next theorem is needed to show that these changes do not destroy the symbol, but only rewrite the same invariant leading object in a new parametrization.
[quotetheorem:8186]
[citeproof:8186]
This theorem is the reason the principal symbol is an invariant geometric object rather than a coordinate artifact. Nondegenerate phase representations are required because the proof divides the oscillatory variables into genuine critical variables and harmless quadratic variables; if the phase is degenerate, the Hessian determinant and signature used by stationary phase may not exist, and the leading power can change. The density Jacobian records the fact that a leading amplitude is measured relative to a choice of oscillatory variables and base coordinates, while the signature factor records the phase convention introduced when quadratic variables are integrated out.
Ignoring these factors gives the wrong comparison between two formulas for the same distribution. For instance, after a change of [normal coordinates](/theorems/2713), the written leading coefficient of the delta distribution changes by the reciprocal Jacobian of the normal variable, although the distribution itself has not changed. In these notes we will often compute the symbol in model coordinates and then appeal to the transformation law for invariance.
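For the delta distribution under a linear change of normal variable, the factor can be written out explicitly. As a sketch, let $B$ be an invertible $k\times k$ matrix (introduced here only for illustration), and set $z'=Bz$ and $\zeta=B^\top\zeta'$, so that $z\cdot\zeta=z'\cdot\zeta'$ and $d\zeta=|\det B|\,d\zeta'$. Then
\begin{align*}
\frac{1}{(2\pi)^k}\int_{\mathbb R^k}e^{iz\cdot\zeta}\,d\zeta=\frac{1}{(2\pi)^k}\int_{\mathbb R^k}e^{iz'\cdot\zeta'}\,|\det B|\,d\zeta'.
\end{align*}
The written amplitude changes from $1$ to $|\det B|$, the reciprocal of $|\det(\partial z/\partial z')|$, in agreement with the identity $\delta(B^{-1}z')=|\det B|\,\delta(z')$.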
## Model Examples and Basic Kernels
The final question in the chapter is why conormal distributions appear so often as kernels. The answer is that many operations impose a constraint, such as equality of variables, membership in a hypersurface, or propagation along a characteristic cone; constraints produce submanifolds, and their kernels are conormal to those submanifolds.
[example: Surface Measure on a Hypersurface]
Let $S=\{x\in \mathbb R^n:f(x)=0\}$, with $f\in C^\infty(\mathbb R^n)$ and $\nabla f\ne 0$ on $S$. Fix $p\in S$. By the implicit function theorem, after shrinking to a neighbourhood $U$ of $p$ there are coordinates $(y,s)\in \mathbb R^{n-1}\times \mathbb R$ such that $s=f(x)$ and $S\cap U=\{s=0\}$.
Write the coordinate map as $x=F(y,s)$. In these coordinates, surface measure on $S$ has the form
\begin{align*}
d\mathcal H^{n-1}\big|_S=J_S(y)\,dy
\end{align*}
where
\begin{align*}
J_S(y)=\sqrt{\det\big((\partial_yF(y,0))^\top\partial_yF(y,0)\big)}
\end{align*}
is smooth and positive. Therefore the surface measure distribution $T_\sigma$ acts on $\psi\in C_c^\infty(U)$ by
\begin{align*}
T_\sigma(\psi)=\int_{\mathbb R^{n-1}}\psi(F(y,0))J_S(y)\,dy.
\end{align*}
If $\Psi(y,s)=\psi(F(y,s))$, this becomes
\begin{align*}
T_\sigma(\psi)=\int_{\mathbb R^{n-1}}\Psi(y,0)J_S(y)\,dy.
\end{align*}
Thus, in the adapted coordinates, $T_\sigma$ is the product of the one-dimensional delta distribution in the normal variable $s$ with the smooth density $J_S(y)\,dy$ along $S$:
\begin{align*}
T_\sigma=J_S(y)\delta(s).
\end{align*}
Using the [Fourier inversion formula](/theorems/528) in the single normal variable,
\begin{align*}
\delta(s)=\frac{1}{2\pi}\int_{\mathbb R}e^{is\zeta}\,d\zeta.
\end{align*}
Hence locally
\begin{align*}
T_\sigma(y,s)=\frac{1}{2\pi}\int_{\mathbb R}e^{is\zeta}J_S(y)\,d\zeta.
\end{align*}
The amplitude $a(y,\zeta)=J_S(y)$ is smooth in $y$ and independent of $\zeta$, so it is a classical symbol of order $0$. Therefore surface measure is conormal to $S$. Since the normal coordinate is $s=f(x)$, the normal covectors are multiples of $ds=df$, and the possible singular directions are
\begin{align*}
N^*S\setminus 0=\{(x,\lambda\,df_x):x\in S,\ \lambda\ne 0\}.
\end{align*}
Thus the wave front set of the surface measure is contained in $N^*S\setminus 0$: the singularity is normal to the hypersurface, while the density along the hypersurface is smooth.
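A concrete instance is the circle $S=\{x\in\mathbb R^2:|x|=R\}$ with $f(x)=|x|^2-R^2$. One admissible choice of adapted coordinates, assumed here only for illustration, uses the polar angle as $y$:
\begin{align*}
F(y,s)=\sqrt{R^2+s}\,(\cos y,\sin y),\qquad f(F(y,s))=s.
\end{align*}
Then $\partial_yF(y,0)=R(-\sin y,\cos y)$, so
\begin{align*}
J_S(y)=R,\qquad T_\sigma=R\,\delta(s),
\end{align*}
and for $\psi$ supported in the chart the pairing is the arc-length integral $\int\psi(R\cos y,R\sin y)\,R\,dy$.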
[/example]
Surface measure is the hypersurface version of the delta example, with a nonconstant density along the submanifold. The same mechanism explains the kernels of familiar operators, but to use it for operators we must account for the extra variables in a Schwartz kernel and the projection of conormal directions.
[example: Identity Operator Kernel]
Fix a smooth positive density $\mu$ on $X$. The Schwartz kernel $K_{\operatorname{id}}$ is the distribution on $X\times X$ determined by
\begin{align*}
\langle K_{\operatorname{id}},\Phi\rangle=\int_X \Phi(x,x)\,d\mu(x)
\end{align*}
for every $\Phi\in C_c^\infty(X\times X)$. Indeed, for $f,\varphi\in C_c^\infty(X)$ and $\Phi(x,y)=\varphi(x)f(y)$, this gives
\begin{align*}
\langle K_{\operatorname{id}},\varphi\otimes f\rangle=\int_X \varphi(x)f(x)\,d\mu(x)
\end{align*}
which is exactly the pairing of $\operatorname{id}(f)=f$ with $\varphi$.
In local coordinates $x=(x^1,\dots,x^n)$ on $X$, write coordinates on $X\times X$ as $(x,y)$ and introduce adapted coordinates
\begin{align*}
q=x,\qquad z=x-y.
\end{align*}
Then the diagonal is $\Delta=\{z=0\}$. If $d\mu=\rho(q)\,dq$ in these coordinates, with $\rho$ smooth and positive, then
\begin{align*}
\langle K_{\operatorname{id}},\Phi\rangle=\int \Phi(q,q)\rho(q)\,dq.
\end{align*}
Since $y=q-z$, this is the same as
\begin{align*}
\langle K_{\operatorname{id}},\Phi\rangle=\int \Phi(q,q-z)\rho(q)\delta(z)\,dq\,dz.
\end{align*}
Using the Fourier representation of the delta distribution in the normal variable $z$,
\begin{align*}
\delta(z)=\frac{1}{(2\pi)^n}\int_{\mathbb R^n}e^{iz\cdot \zeta}\,d\zeta.
\end{align*}
Therefore, locally near the diagonal,
\begin{align*}
K_{\operatorname{id}}(q,z)=\frac{1}{(2\pi)^n}\int_{\mathbb R^n}e^{iz\cdot\zeta}\rho(q)\,d\zeta.
\end{align*}
The amplitude $a(q,\zeta)=\rho(q)$ is smooth in $q$ and independent of $\zeta$, so it is a classical symbol of order $0$. Hence the identity kernel is conormal to $\Delta$.
It remains to identify the conormal directions. A tangent vector to $\Delta$ at $(x,x)$ has the form $(v,v)$ with $v\in T_xX$. A covector at $(x,x)$ has the form $(\xi,\eta)\in T_x^*X\times T_x^*X$, and its value on $(v,v)$ is
\begin{align*}
(\xi,\eta)(v,v)=\xi(v)+\eta(v)=(\xi+\eta)(v).
\end{align*}
Thus $(\xi,\eta)$ annihilates every tangent vector to $\Delta$ exactly when $(\xi+\eta)(v)=0$ for every $v\in T_xX$, which is equivalent to $\eta=-\xi$. Hence
\begin{align*}
N^*\Delta=\{(x,x;\xi,-\xi):\xi\in T_x^*X\}.
\end{align*}
Removing the zero covector gives
\begin{align*}
N^*\Delta\setminus 0=\{(x,x;\xi,-\xi):\xi\ne 0\}.
\end{align*}
By the conormal wave front inclusion, the singular directions of the identity kernel lie in this punctured conormal bundle:
\begin{align*}
\operatorname{WF}(K_{\operatorname{id}})\subset N^*\Delta\setminus 0.
\end{align*}
The pair $(\xi,-\xi)$ records that the same covector is read with opposite signs on the two factors of $X\times X$, which is the local geometric origin of the identity canonical relation.
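In the flat case $X=\mathbb R^n$ with $\mu$ equal to Lebesgue measure, so that $\rho=1$, the formula reduces to Fourier inversion. As a check, under the convention $\widehat{f}(\zeta)=\int e^{-iy\cdot\zeta}f(y)\,dy$, pairing the kernel with $f\in C_c^\infty(\mathbb R^n)$ in the second variable gives
\begin{align*}
\frac{1}{(2\pi)^n}\int_{\mathbb R^n}\int_{\mathbb R^n}e^{i(x-y)\cdot\zeta}f(y)\,dy\,d\zeta=\frac{1}{(2\pi)^n}\int_{\mathbb R^n}e^{ix\cdot\zeta}\widehat{f}(\zeta)\,d\zeta=f(x).
\end{align*}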
[/example]
This diagonal example is the simplest bridge from conormal distributions to pseudodifferential kernels. For a general conormal kernel, the singular support may lie on a submanifold of $X\times Y$ rather than on the diagonal, and integration in the $Y$ variable can project those singularities onto $X$.
Turning that geometric picture into an operator statement creates two separate difficulties. First, the kernel must be supported so that pairing it with a smooth input in the $Y$ variable gives a well-defined distribution on $X$, rather than an uncontrolled integral over noncompact fibres. Second, after the $Y$ variable is eliminated, the only possible new singular covectors on $X$ should be those obtained by projecting the conormal directions of the kernel. The theorem below is the basic mapping result that packages these support and projection requirements for conormal kernels acting on smooth functions.
[quotetheorem:8187]
[citeproof:8187]
Proper support cannot be dropped from this kernel statement. Without it, the integral defining $Af$ for a general smooth input may receive contributions from noncompact parts of the support of $K$, and the resulting functional on test functions in $X$ need not be continuous without extra decay assumptions. The theorem also treats only smooth inputs: it records which singularities the kernel itself can create after integration in $Y$, but it does not yet describe how pre-existing singularities of a distributional input propagate.
That missing propagation statement is exactly what the full Fourier integral operator calculus supplies. In Chapter 9, the simple projection above will be replaced by composition with the twisted canonical relation
\begin{align*}
C_K=\{(x,\xi;y,\eta):(x,y;\xi,-\eta)\in N^*S\setminus 0\}.
\end{align*}
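As a consistency check, take $S=\Delta$, the diagonal from the identity kernel example. The condition $(x,y;\xi,-\eta)\in N^*\Delta\setminus 0$ holds exactly when $y=x$ and $\eta=\xi\ne 0$, so
\begin{align*}
C_{K_{\operatorname{id}}}=\{(x,\xi;x,\xi):\xi\ne 0\},
\end{align*}
which is the graph of the identity map of $T^*X\setminus 0$. The sign twist is what turns the pair $(\xi,-\xi)$ into a relation that leaves covectors unchanged.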
[example: Euclidean Wave Kernel Near the Light Cone]
Consider the oscillatory pieces in the Fourier representation of the sine propagator:
\begin{align*}
K_\pm(t,x)=\int_{\mathbb R^n} e^{i(x\cdot \xi \pm t|\xi|)}a_\pm(\xi)\,d\xi,
\end{align*}
where $a_\pm(\xi)$ is a classical amplitude away from $\xi=0$. Write $\xi=r\omega$ with $r>0$ and $|\omega|=1$. The phase is
\begin{align*}
\phi_\pm(t,x,\xi)=x\cdot \xi \pm t|\xi|.
\end{align*}
Its $\xi$-derivative is
\begin{align*}
\partial_\xi\phi_\pm(t,x,\xi)=x\pm t\frac{\xi}{|\xi|}=x\pm t\omega.
\end{align*}
Thus a stationary point in the frequency variable satisfies
\begin{align*}
x=\mp t\omega.
\end{align*}
Taking Euclidean norms gives $|x|=|t|$, so stationary points can occur only on the light cone $t^2=|x|^2$.
Away from the cone tip $(t,x)=(0,0)$, the function
\begin{align*}
f(t,x)=t^2-|x|^2
\end{align*}
has differential
\begin{align*}
df=2t\,dt-2\sum_{j=1}^n x_j\,dx_j.
\end{align*}
If $f(t,x)=0$ and $(t,x)\ne(0,0)$, then either $t\ne 0$ or $x\ne 0$, so $df\ne 0$. Hence the light cone is a smooth hypersurface away from its tip.
For the $+$ phase, stationarity gives $x=-t\omega$. Since $\xi=r\omega$, the differential in the base variables is
\begin{align*}
d_{t,x}\phi_+=r\,dt+r\sum_{j=1}^n\omega_j\,dx_j.
\end{align*}
On the cone away from the tip, $|t|=|x|\ne 0$, so $t\ne 0$ and $\omega=-x/t$. The differential becomes
\begin{align*}
d_{t,x}\phi_+=r\,dt-\frac{r}{t}\sum_{j=1}^n x_j\,dx_j.
\end{align*}
Since
\begin{align*}
\frac{r}{2t}df=r\,dt-\frac{r}{t}\sum_{j=1}^n x_j\,dx_j,
\end{align*}
we have $d_{t,x}\phi_+=(r/(2t))df$. For the $-$ phase, stationarity gives $x=t\omega$, and
\begin{align*}
d_{t,x}\phi_-=-r\,dt+r\sum_{j=1}^n\omega_j\,dx_j.
\end{align*}
Using $\omega=x/t$, this becomes
\begin{align*}
d_{t,x}\phi_-=-r\,dt+\frac{r}{t}\sum_{j=1}^n x_j\,dx_j.
\end{align*}
Since
\begin{align*}
-\frac{r}{2t}df=-r\,dt+\frac{r}{t}\sum_{j=1}^n x_j\,dx_j,
\end{align*}
we have $d_{t,x}\phi_-=-(r/(2t))df$.
Thus, at every stationary point away from the tip, the base covector is a nonzero multiple of $df$. Therefore the leading singular part of the Euclidean wave kernel is conormal to the smooth light cone, with conormal directions
\begin{align*}
\{(t,x;\lambda\,df_{(t,x)}):t^2=|x|^2,\ (t,x)\ne(0,0),\ \lambda\ne 0\}.
\end{align*}
The stationary equations say exactly that singularities occur when $x$ is reached from the origin at speed one, which is the Euclidean null bicharacteristic geometry of the wave operator.
[/example]
This example previews the propagation theorem: wave equations do not produce arbitrary new singularities, but move conormal singularities along the characteristic geometry. Conormal distributions are therefore both local models for singularities and the building blocks for the Fourier integral operators that transport them.
After conormal singularities are understood as local models, the natural question is how PDE transport them along characteristic geometry. Real principal type operators answer that question by turning the principal symbol into a Hamiltonian flow on cotangent space.
# 5. Real Principal Type and Bicharacteristics
This chapter turns the symbolic calculus from a detector of singularities into a dynamical tool. The principal symbol of a differential or pseudodifferential operator defines a characteristic hypersurface in the cotangent bundle, and the Hamilton vector field of that symbol gives the direction in which high-frequency information travels. The main theme is that, for real principal type operators, microlocal regularity propagates along null bicharacteristics rather than across them.
The prerequisites are the material developed earlier in the course: wave front sets from Chapter 1, Sobolev wave front sets and elliptic regularity from Chapter 2, and the basic symbolic calculus for classical pseudodifferential operators including adjoints and composition. The preceding chapters defined wave front sets and explained how elliptic operators remove microlocal uncertainty. Here ellipticity fails on the characteristic set, so the question becomes geometric: once a singularity reaches a characteristic covector, which neighbouring covectors are forced to share the same regularity? The answer is encoded by Poisson brackets, commutators, and positive commutator estimates.
## Principal Symbols, Characteristic Sets, and Hamilton Vector Fields
What remains of an operator at high frequency after lower-order terms have been discarded? The principal symbol is the leading homogeneous term that controls ellipticity, characteristics, and the classical flow associated with the equation.
[definition: Principal Symbol]
Let $X$ be a smooth manifold. The principal symbol map of order $m$ is the map
\begin{align*}
\sigma_m:\Psi^m(X)\to S^m(T^*X\setminus 0)/S^{m-1}(T^*X\setminus 0).
\end{align*}
Here $\Psi^m(X)$ denotes the properly supported classical pseudodifferential operators acting continuously as
\begin{align*}
P:C_c^\infty(X)\to C^\infty(X)
\end{align*}
and extending by duality to
\begin{align*}
P:\mathcal D'(X)\to \mathcal D'(X).
\end{align*}
For a classical pseudodifferential operator $P\in \Psi^m(X)$ with full symbol
\begin{align*}
p(x,\xi) \sim p_m(x,\xi) + p_{m-1}(x,\xi) + p_{m-2}(x,\xi)+\cdots,
\end{align*}
in a chosen local quantization, where $p_{m-j}$ is homogeneous of degree $m-j$ in $\xi$ for $|\xi| \ge 1$, the principal symbol is locally represented by
\begin{align*}
\sigma_m(P)=[p_m]\in S^m(T^*X\setminus 0)/S^{m-1}(T^*X\setminus 0).
\end{align*}
[/definition]
The local formula is a representative of an invariant object: under changes of coordinates and quantization, the leading homogeneous term transforms according to the pseudodifferential calculus, while all changes below order $m$ disappear in the quotient. Thus the principal symbol records the leading microlocal part of $P$, and this leading part is enough to decide where the elliptic parametrix construction from Chapter 2 applies. The next problem is to name the locus where that construction fails, because propagation theory takes place exactly at those covectors.
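For differential operators the recipe is explicit. As a sketch, assume $D_{x_j}=-i\partial_{x_j}$ and the standard left quantization, with coefficients written to the left of derivatives. For $P=\sum_{|\alpha|\le m}a_\alpha(x)D^\alpha$ on $\mathbb R^n$,
\begin{align*}
p(x,\xi)=\sum_{|\alpha|\le m}a_\alpha(x)\xi^\alpha,\qquad p_m(x,\xi)=\sum_{|\alpha|=m}a_\alpha(x)\xi^\alpha.
\end{align*}
For example, $P=D_{x_1}^2+x_1D_{x_2}+1$ has full symbol $\xi_1^2+x_1\xi_2+1$ and $\sigma_2(P)=[\xi_1^2]$; the terms $x_1\xi_2+1$ have order at most $1$ and vanish in the quotient.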
[definition: Characteristic Set]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a properly supported classical pseudodifferential operator in $\Psi^m(X)$, extended to $P:\mathcal D'(X)\to \mathcal D'(X)$ by duality. Choose a homogeneous representative $p_m\in S^m(T^*X\setminus 0)$ of the principal symbol class $\sigma_m(P)=[p_m]$. The characteristic set of $P$ is
\begin{align*}
\operatorname{Char}(P)=\{(x,\xi)\in T^*X\setminus 0 : p_m(x,\xi)=0\}.
\end{align*}
[/definition]
For a classical operator the leading homogeneous term $p_m$ is the representative used in this definition; lower-order terms in the full symbol do not enter the characteristic set. Away from $\operatorname{Char}(P)$, elliptic regularity from Chapter 2 applies, so the characteristic set is the remaining region where the equation can carry singularities. To understand whether singularities stay fixed or move inside this set, we need the vector field canonically associated with the principal symbol.
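Two standard cases illustrate the definition, assuming the convention $D=-i\partial$. For $P=-\Delta=\sum_{j=1}^nD_{x_j}^2$ on $\mathbb R^n$,
\begin{align*}
p_2(x,\xi)=|\xi|^2,\qquad \operatorname{Char}(P)=\emptyset,
\end{align*}
so $P$ is elliptic. For the wave operator $P=D_t^2-\sum_{j=1}^nD_{x_j}^2$ on $\mathbb R_t\times\mathbb R^n_x$, with dual variables $(\tau,\xi)$,
\begin{align*}
p_2(t,x;\tau,\xi)=\tau^2-|\xi|^2,\qquad \operatorname{Char}(P)=\{(t,x;\tau,\xi):\tau^2=|\xi|^2,\ (\tau,\xi)\ne 0\}.
\end{align*}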
[definition: Hamilton Vector Field]
Let $X$ be a smooth manifold. The Hamilton vector field assignment is the map
\begin{align*}
H:C^\infty(T^*X\setminus 0;\mathbb R)\to \mathfrak X(T^*X\setminus 0),\qquad p\mapsto H_p.
\end{align*}
In local coordinates $(x_1,\dots,x_n,\xi_1,\dots,\xi_n)$ on $T^*X$, $H_p$ is the vector field
\begin{align*}
H_p = \sum_{j=1}^n \left(\frac{\partial p}{\partial \xi_j}\frac{\partial}{\partial x_j}-\frac{\partial p}{\partial x_j}\frac{\partial}{\partial \xi_j}\right).
\end{align*}
[/definition]
This formula is coordinate independent because it is defined by the canonical symplectic form on $T^*X$. Before using $H_p$ for propagation, we must check that it is tangent to the characteristic set rather than pointing away from the equation's characteristic geometry.
[example: Hamilton Flow for a Constant-Coefficient Operator]
Let $X=\mathbb R^n$ with coordinates $(x_1,\dots,x_n)$ and dual coordinates $(\xi_1,\dots,\xi_n)$. For $P=D_{x_1}$, the principal symbol is $p(x,\xi)=\xi_1$, so its derivatives are
\begin{align*}
\frac{\partial p}{\partial \xi_1}=1,\qquad \frac{\partial p}{\partial \xi_j}=0 \text{ for } j\ne 1,\qquad \frac{\partial p}{\partial x_j}=0 \text{ for every } j.
\end{align*}
Substituting these derivatives into the definition of the Hamilton vector field gives
\begin{align*}
H_p=\sum_{j=1}^n \left(\frac{\partial p}{\partial \xi_j}\frac{\partial}{\partial x_j}-\frac{\partial p}{\partial x_j}\frac{\partial}{\partial \xi_j}\right)=\frac{\partial}{\partial x_1}.
\end{align*}
The characteristic set is the zero set of the principal symbol in the punctured cotangent bundle:
\begin{align*}
\operatorname{Char}(P)=\{(x,\xi)\in T^*\mathbb R^n\setminus 0:\xi_1=0\}.
\end{align*}
An integral curve $\gamma(s)=(x(s),\xi(s))$ of $H_p$ satisfies
\begin{align*}
\dot x_1(s)=1,\qquad \dot x_j(s)=0 \text{ for } j\ne 1,\qquad \dot \xi_j(s)=0 \text{ for every } j.
\end{align*}
Hence, for initial data $(x(0),\xi(0))=(x^0,\xi^0)$,
\begin{align*}
x_1(s)=x_1^0+s,\qquad x_j(s)=x_j^0 \text{ for } j\ne 1,\qquad \xi_j(s)=\xi_j^0 \text{ for every } j.
\end{align*}
If $\xi_1^0=0$, then $\xi_1(s)=0$ for all $s$, so the curve stays in $\operatorname{Char}(P)$. Thus the Hamilton flow translates the base point in the $x_1$ direction while leaving the covector fixed; propagation is relevant only on the characteristic set, even though the vector field itself is defined on all of $T^*\mathbb R^n\setminus 0$.
[/example]
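The computation for $P=D_{x_1}$ can be checked numerically. The following is a minimal sketch, assuming a phase-space point is stored as a flat list `z = [x_1, ..., x_n, xi_1, ..., xi_n]`; the helper name `hamilton_field`, the step size `h`, and the sample point are illustrative choices, not part of the course's notation.

```python
from typing import Callable, List


def hamilton_field(p: Callable[[List[float]], float],
                   z: List[float], h: float = 1e-6) -> List[float]:
    """Approximate H_p at z = [x_1..x_n, xi_1..xi_n] by central differences.

    The first n entries are the d/dx_j coefficients   dp/dxi_j,
    the last n entries are the d/dxi_j coefficients  -dp/dx_j.
    """
    n = len(z) // 2

    def partial(i: int) -> float:
        zp = list(z)
        zm = list(z)
        zp[i] += h
        zm[i] -= h
        return (p(zp) - p(zm)) / (2 * h)

    base = [partial(n + j) for j in range(n)]
    fibre = [-partial(j) for j in range(n)]
    return base + fibre


def p_dx1(z: List[float]) -> float:
    """Principal symbol of D_{x_1}: p(x, xi) = xi_1."""
    n = len(z) // 2
    return z[n]


# Hypothetical sample point in T^*R^3 with xi_1 = 0, so it is characteristic.
point = [0.3, -1.0, 2.0, 0.0, 1.0, 0.5]
field = hamilton_field(p_dx1, point)
```

Up to rounding, `field` has a single nonzero entry, the coefficient of $\partial_{x_1}$, in agreement with $H_p=\partial_{x_1}$.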
The example suggests that the zero set of $p$ is preserved by the Hamilton flow, but the course needs the invariant statement because later arguments are local on manifolds and cannot depend on a special coordinate system. The next theorem supplies this invariance and makes bicharacteristics legitimate curves inside the characteristic set.
[quotetheorem:8225]
[citeproof:8225]
The theorem is the first appearance of propagation: characteristic covectors are organised into curves, not isolated points. The reality hypothesis is essential for the later propagation interpretation, because if the leading symbol has genuinely complex values then its Hamilton field is not a real vector field on the real cotangent bundle; for instance $p(\xi)=i\xi_1$ has the same zero set as $\xi_1$ but no real Hamiltonian flow generated by $p$ itself. The theorem does not say that every level set is a smooth hypersurface or that $H_p$ is nonzero on $\{p=0\}$; $p=\xi_1^2$ has $H_p=2\xi_1\partial_{x_1}$, which vanishes on the characteristic set. This limitation motivates the real principal type condition below, where multiple characteristics are excluded before positive commutator estimates are applied.
## Null Bicharacteristics in the Punctured Cotangent Bundle
If $H_p$ preserves the characteristic set, what are the actual curves along which singularities move? Since principal symbols are homogeneous, the geometry lives in $T^*X\setminus 0$ and is conic in the fibre variable.
[definition: Null Bicharacteristic]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a properly supported operator in $\Psi^m(X)$, extended to distributions, and let $p_m\in C^\infty(T^*X\setminus 0;\mathbb R)$ be a real homogeneous representative of its scalar principal symbol. A null bicharacteristic of $P$ is a nonconstant integral curve
\begin{align*}
\gamma:I\to \operatorname{Char}(P)\subset T^*X\setminus 0
\end{align*}
of the Hamilton vector field $H_{p_m}$, where $I\subset \mathbb R$ is an interval.
[/definition]
The word "null" refers to the condition $p=0$. The word "bicharacteristic" emphasises that both base variables and covariables evolve; projecting to $X$ alone can hide essential frequency information.
[remark: Conic Scaling]
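If $p$ is homogeneous of degree $m$ in $\xi$, then $H_p$ is homogeneous of degree $m-1$ in $\xi$, in the sense that its $\partial_{x_j}$ coefficients $\partial p/\partial\xi_j$ are homogeneous of degree $m-1$ and its $\partial_{\xi_j}$ coefficients $-\partial p/\partial x_j$ are homogeneous of degree $m$. Consequently, if $(x(s),\xi(s))$ is an integral curve of $H_p$, then so is $(x(\lambda^{m-1}s),\lambda\xi(\lambda^{m-1}s))$ for every $\lambda>0$. Thus the unparametrised bicharacteristic curves are compatible with the conic structure of wave front sets, although their parametrisation changes under fibre rescaling when $m\ne 1$.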
[/remark]
The fundamental example is the wave operator. Its characteristic set is the light cone in cotangent variables, and its bicharacteristics project to null geodesics in the base.
[example: Bicharacteristics of the Flat Wave Operator]
On $\mathbb R_t\times \mathbb R_x^n$, with dual coordinates $(\tau,\xi_1,\dots,\xi_n)$, consider
\begin{align*}
P=D_t^2-\sum_{j=1}^n D_{x_j}^2.
\end{align*}
With the course convention $D_y=\frac{1}{i}\partial_y$, the symbol of $D_t$ is $\tau$ and the symbol of $D_{x_j}$ is $\xi_j$. Therefore the order-two principal symbol is
\begin{align*}
p(t,x,\tau,\xi)=\tau^2-\sum_{j=1}^n \xi_j^2=\tau^2-|\xi|^2.
\end{align*}
The characteristic set is the zero set of this principal symbol in the punctured cotangent bundle:
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi)\in T^*(\mathbb R^{1+n})\setminus 0:\tau^2-|\xi|^2=0\}.
\end{align*}
Equivalently,
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi):\tau^2=|\xi|^2,\;(\tau,\xi)\ne 0\}.
\end{align*}
Since $p$ is independent of $t$ and $x$, its base derivatives are
\begin{align*}
\frac{\partial p}{\partial t}=0,\qquad \frac{\partial p}{\partial x_j}=0 \text{ for every } j.
\end{align*}
Its fibre derivatives are
\begin{align*}
\frac{\partial p}{\partial \tau}=2\tau,\qquad \frac{\partial p}{\partial \xi_j}=-2\xi_j \text{ for every } j.
\end{align*}
Substituting these derivatives into the Hamilton vector field formula gives
\begin{align*}
H_p=2\tau\frac{\partial}{\partial t}-2\sum_{j=1}^n \xi_j\frac{\partial}{\partial x_j}.
\end{align*}
An integral curve $\gamma(s)=(t(s),x(s),\tau(s),\xi(s))$ of $H_p$ therefore satisfies
\begin{align*}
\dot t(s)=2\tau(s),\qquad \dot x_j(s)=-2\xi_j(s),\qquad \dot\tau(s)=0,\qquad \dot\xi_j(s)=0.
\end{align*}
Thus $\tau(s)=\tau^0$ and $\xi_j(s)=\xi_j^0$ are constant, and the base variables are
\begin{align*}
t(s)=t^0+2\tau^0s,\qquad x_j(s)=x_j^0-2\xi_j^0s.
\end{align*}
If the initial covector is characteristic, so $(\tau^0)^2=|\xi^0|^2$, then the same equality holds for all $s$ because $(\tau(s),\xi(s))=(\tau^0,\xi^0)$. Moreover the projected curve satisfies
\begin{align*}
|x(s)-x^0|^2=4s^2|\xi^0|^2.
\end{align*}
Using $(\tau^0)^2=|\xi^0|^2$, this becomes
\begin{align*}
|x(s)-x^0|^2=4s^2(\tau^0)^2=(t(s)-t^0)^2.
\end{align*}
So the bicharacteristics project to straight null lines in spacetime, with constant covector direction. This is the microlocal form of finite-speed propagation for the constant-coefficient wave equation.
[/example]
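The closed-form curves of this example are easy to test numerically. The sketch below, with hypothetical initial data chosen so that $(\tau^0)^2=|\xi^0|^2$, checks the null-line identity and also illustrates the conic scaling remark for $m=2$: rescaling the covector by $\lambda$ traces the same base line with the parameter divided by $\lambda$. The function name is an illustrative choice.

```python
from typing import List, Tuple


def flat_wave_bicharacteristic(t0: float, x0: List[float], tau0: float,
                               xi0: List[float], s: float) -> Tuple[float, List[float]]:
    """Base point (t(s), x(s)) of the integral curve of H_p, p = tau^2 - |xi|^2."""
    t = t0 + 2.0 * tau0 * s
    x = [xj - 2.0 * xij * s for xj, xij in zip(x0, xi0)]
    return t, x


# Hypothetical characteristic initial data: tau0^2 = 25 = 3^2 + 4^2.
t0, x0 = 1.0, [0.0, -2.0]
tau0, xi0 = 5.0, [3.0, 4.0]
s = 0.7

t, x = flat_wave_bicharacteristic(t0, x0, tau0, xi0, s)
spatial_sq = sum((a - b) ** 2 for a, b in zip(x, x0))  # |x(s) - x0|^2
time_sq = (t - t0) ** 2                                # (t(s) - t0)^2

# Conic scaling: covector multiplied by lam, parameter divided by lam^(m-1) = lam.
lam = 3.0
t_scaled, x_scaled = flat_wave_bicharacteristic(
    t0, x0, lam * tau0, [lam * c for c in xi0], s / lam)
```

Here `spatial_sq` and `time_sq` agree up to rounding, and the rescaled curve reaches the same base point `(t, x)`.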
On a Riemannian manifold the same computation becomes geodesic flow. The Hamiltonian is the dual metric, so the cotangent flow is the canonical lift of geodesic motion.
[example: Riemannian Wave Equation and Geodesic Flow]
Let $(M,g)$ be a Riemannian manifold. In local coordinates on $M$, write
\begin{align*}
|\xi|_{g^{-1}}^2=g^{jk}(x)\xi_j\xi_k.
\end{align*}
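For $P=D_t^2-\Delta_g$ on $\mathbb R_t\times M$, where $\Delta_g$ denotes the nonnegative Laplace-Beltrami operator (so that its principal symbol is $|\xi|_{g^{-1}}^2$, just as $\sum_j D_{x_j}^2$ has symbol $|\xi|^2$ in the flat case), the order-two principal symbol is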
\begin{align*}
p(t,x,\tau,\xi)=\tau^2-g^{jk}(x)\xi_j\xi_k.
\end{align*}
Thus the characteristic set is
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi):\tau^2=g^{jk}(x)\xi_j\xi_k,\;(\tau,\xi)\ne 0\}.
\end{align*}
The derivatives of $p$ are
\begin{align*}
\frac{\partial p}{\partial \tau}=2\tau,\qquad \frac{\partial p}{\partial \xi_j}=-2g^{jk}(x)\xi_k,\qquad \frac{\partial p}{\partial t}=0,\qquad \frac{\partial p}{\partial x_j}=-\frac{\partial g^{kl}}{\partial x_j}(x)\xi_k\xi_l.
\end{align*}
Substituting these into the Hamilton vector field formula gives
\begin{align*}
H_p=2\tau\frac{\partial}{\partial t}-2g^{jk}(x)\xi_k\frac{\partial}{\partial x_j}+\frac{\partial g^{kl}}{\partial x_j}(x)\xi_k\xi_l\frac{\partial}{\partial \xi_j}.
\end{align*}
Therefore an integral curve $\gamma(s)=(t(s),x(s),\tau(s),\xi(s))$ satisfies
\begin{align*}
\dot t(s)=2\tau(s),\qquad \dot x_j(s)=-2g^{jk}(x(s))\xi_k(s),\qquad \dot\tau(s)=0,\qquad \dot\xi_j(s)=\frac{\partial g^{kl}}{\partial x_j}(x(s))\xi_k(s)\xi_l(s).
\end{align*}
Since $\dot\tau=0$, the value $\tau(s)=\tau^0$ is constant. On the characteristic set, $\tau^2=|\xi|_{g^{-1}}^2$ and $(\tau,\xi)\ne 0$, so $\tau^0\ne 0$. We may therefore use $t$ as a parameter. Dividing $\dot x_j$ by $\dot t=2\tau^0$ gives
\begin{align*}
\frac{dx_j}{dt}=-\frac{g^{jk}(x)\xi_k}{\tau^0}.
\end{align*}
Equivalently,
\begin{align*}
\xi_j=-\tau^0 g_{jk}(x)\frac{dx_k}{dt}.
\end{align*}
Taking the $g$-norm of $dx/dt$ and using $\tau^2=g^{jk}\xi_j\xi_k$ gives
\begin{align*}
g_{jk}(x)\frac{dx_j}{dt}\frac{dx_k}{dt}=\frac{g^{jk}(x)\xi_j\xi_k}{(\tau^0)^2}=1.
\end{align*}
The remaining Hamilton equation, the one for $\dot\xi_j$, is exactly the cotangent form of the geodesic equation for the Hamiltonian $|\xi|_{g^{-1}}^2$, with the harmless sign coming from the minus sign in $p=\tau^2-|\xi|_{g^{-1}}^2$. Thus the base projection $x(t)$ is a unit-speed geodesic of $g$, with orientation determined by the sign of $\tau^0$. Hence the null bicharacteristics of $D_t^2-\Delta_g$ project to lightlike curves $(t,x(t))$, and singularities of solutions of the homogeneous wave equation are transported along the geodesic flow on the cosphere bundle.
[/example]
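For a variable metric the Hamilton equations usually have to be integrated numerically. The following is a minimal sketch under stated assumptions: a hypothetical conformally flat dual metric $g^{jk}(x)=c(x)^2\delta^{jk}$ on $\mathbb R^2$ with $c(x)=1+\tfrac12\sin x_1$, a hand-written classical Runge-Kutta step `rk4_step`, and initial data chosen on the characteristic set. It checks that $p$ stays (numerically) zero and that the projected curve has unit $g$-speed.

```python
import math
from typing import Callable, List


def c(x1: float) -> float:
    """Hypothetical conformal factor: g^{jk}(x) = c(x)^2 delta^{jk}."""
    return 1.0 + 0.5 * math.sin(x1)


def dc(x1: float) -> float:
    """Derivative of c with respect to x_1 (c does not depend on x_2)."""
    return 0.5 * math.cos(x1)


def p(y: List[float]) -> float:
    """Principal symbol tau^2 - c(x)^2 |xi|^2 at y = [t, x1, x2, tau, xi1, xi2]."""
    _, x1, _, tau, xi1, xi2 = y
    return tau ** 2 - c(x1) ** 2 * (xi1 ** 2 + xi2 ** 2)


def hamilton_rhs(y: List[float]) -> List[float]:
    """Right-hand side of the Hamilton equations for p."""
    _, x1, _, tau, xi1, xi2 = y
    cc, norm2 = c(x1), xi1 ** 2 + xi2 ** 2
    return [2 * tau, -2 * cc ** 2 * xi1, -2 * cc ** 2 * xi2,
            0.0, 2 * cc * dc(x1) * norm2, 0.0]


def rk4_step(f: Callable[[List[float]], List[float]],
             y: List[float], h: float) -> List[float]:
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
    k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
    k4 = f([a + h * b for a, b in zip(y, k3)])
    return [a + h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]


def g_speed(y: List[float]) -> float:
    """g-norm of dx/dt, where g_{jk} = c^{-2} delta_{jk}."""
    rhs = hamilton_rhs(y)
    v1, v2 = rhs[1] / rhs[0], rhs[2] / rhs[0]
    return math.hypot(v1, v2) / c(y[1])


# Characteristic initial data: |xi| = 1 and tau = c(x0) |xi|.
y = [0.0, 0.2, 0.0, c(0.2), 0.6, 0.8]
for _ in range(1000):
    y = rk4_step(hamilton_rhs, y, 1e-3)
```

After the loop, `p(y)` is zero up to integration error, `y[3]` still equals the initial $\tau^0$, and `g_speed(y)` is close to $1$.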
First-order real principal symbols give a simpler model where the bicharacteristics are the integral curves of a transport vector field. In this case the base projection already contains much of the dynamics, but the cotangent lift still matters: the fibre variables change so that the covector remains on the characteristic hypersurface. This is the microlocal version of the [method of characteristics](/page/Method%20of%20Characteristics), with the added information of which frequency direction is being transported.
[example: First-Order Transport Operator]
On $\mathbb R_t\times \mathbb R_x$, take
\begin{align*}
P=D_t+a(x)D_x,
\end{align*}
with $a\in C^\infty(\mathbb R;\mathbb R)$. With dual coordinates $(\tau,\xi)$ and the convention that $D_t$ has symbol $\tau$ and $D_x$ has symbol $\xi$, the principal symbol is
\begin{align*}
p(t,x,\tau,\xi)=\tau+a(x)\xi.
\end{align*}
Its derivatives are
\begin{align*}
\frac{\partial p}{\partial \tau}=1,\qquad \frac{\partial p}{\partial \xi}=a(x),\qquad \frac{\partial p}{\partial t}=0,\qquad \frac{\partial p}{\partial x}=a'(x)\xi.
\end{align*}
Substituting these into the Hamilton vector field formula gives
\begin{align*}
H_p=\frac{\partial}{\partial t}+a(x)\frac{\partial}{\partial x}-a'(x)\xi\frac{\partial}{\partial \xi}.
\end{align*}
The characteristic set is
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi)\in T^*(\mathbb R^2)\setminus 0:\tau+a(x)\xi=0\}.
\end{align*}
An integral curve $\gamma(s)=(t(s),x(s),\tau(s),\xi(s))$ of $H_p$ satisfies
\begin{align*}
\dot t(s)=1,\qquad \dot x(s)=a(x(s)),\qquad \dot\tau(s)=0,\qquad \dot\xi(s)=-a'(x(s))\xi(s).
\end{align*}
Thus $t(s)=t^0+s$, so using $t$ as the parameter gives
\begin{align*}
\frac{dx}{dt}=a(x),\qquad \frac{d\tau}{dt}=0,\qquad \frac{d\xi}{dt}=-a'(x)\xi.
\end{align*}
Along such a curve,
\begin{align*}
\frac{d}{dt}\bigl(\tau+a(x)\xi\bigr)=0+a'(x)\frac{dx}{dt}\xi+a(x)\frac{d\xi}{dt}.
\end{align*}
Using $\frac{dx}{dt}=a(x)$ and $\frac{d\xi}{dt}=-a'(x)\xi$, this becomes
\begin{align*}
\frac{d}{dt}\bigl(\tau+a(x)\xi\bigr)=a'(x)a(x)\xi-a(x)a'(x)\xi=0.
\end{align*}
Hence if $\tau^0+a(x^0)\xi^0=0$ initially, then $\tau(t)+a(x(t))\xi(t)=0$ for all times for which the curve is defined. The base projection is the ordinary transport characteristic $\dot t=1$, $\dot x=a(x)$, while the fibre component $\xi$ changes exactly so that the covector remains characteristic.
[/example]
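To see the fibre equation at work, one can pick a concrete coefficient. The sketch below assumes the hypothetical choice $a(x)=x$, for which the Hamilton equations have the closed-form solution $x(s)=x^0e^{s}$, $\xi(s)=\xi^0e^{-s}$, and compares it with a hand-written Runge-Kutta integration; the names `a`, `da`, and `rk4_step` are illustrative.

```python
import math
from typing import Callable, List


def a(x: float) -> float:
    """Hypothetical transport coefficient a(x) = x."""
    return x


def da(x: float) -> float:
    """Derivative a'(x) of the coefficient above."""
    return 1.0


def hamilton_rhs(y: List[float]) -> List[float]:
    """Hamilton equations for p = tau + a(x) xi at y = [t, x, tau, xi]."""
    _, x, _, xi = y
    return [1.0, a(x), 0.0, -da(x) * xi]


def rk4_step(f: Callable[[List[float]], List[float]],
             y: List[float], h: float) -> List[float]:
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f([u + 0.5 * h * v for u, v in zip(y, k1)])
    k3 = f([u + 0.5 * h * v for u, v in zip(y, k2)])
    k4 = f([u + h * v for u, v in zip(y, k3)])
    return [u + h * (v1 + 2 * v2 + 2 * v3 + v4) / 6
            for u, v1, v2, v3, v4 in zip(y, k1, k2, k3, k4)]


x0, xi0 = 0.5, 2.0
tau0 = -a(x0) * xi0          # makes the initial covector characteristic
y = [0.0, x0, tau0, xi0]
steps, h = 1000, 1e-3
for _ in range(steps):
    y = rk4_step(hamilton_rhs, y, h)

s_final = steps * h
p_final = y[2] + a(y[1]) * y[3]   # value of tau + a(x) xi at the end point
```

The base point moves outward while $\xi$ decays at the matching rate, so `p_final` remains zero up to integration error.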
These examples show why the propagation theorem cannot be a statement only about points of $X$. The same base point may carry regularity in one covector direction and singularity in another.
## Poisson Brackets and Commutator Estimates
How does the operator calculus detect motion along $H_p$? The bridge is the Poisson bracket: it is both the derivative of one symbol along the Hamilton flow of another and the principal symbol of a commutator.
[definition: Poisson Bracket]
Let $X$ be a smooth manifold. The Poisson bracket is the bilinear map
\begin{align*}
\{\cdot,\cdot\}:C^\infty(T^*X\setminus 0)\times C^\infty(T^*X\setminus 0)\to C^\infty(T^*X\setminus 0).
\end{align*}
For $p,a\in C^\infty(T^*X\setminus 0)$, it is defined by
\begin{align*}
\{p,a\}=H_p a=\sum_{j=1}^n \left(\frac{\partial p}{\partial \xi_j}\frac{\partial a}{\partial x_j}-\frac{\partial p}{\partial x_j}\frac{\partial a}{\partial \xi_j}\right).
\end{align*}
[/definition]
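The sign conventions in this definition are easy to get wrong, so a quick numerical check can help. The following minimal sketch works in one space dimension and approximates the bracket by central differences; the function name `poisson_bracket` and the test symbols are illustrative assumptions.

```python
from typing import Callable

Symbol = Callable[[float, float], float]


def poisson_bracket(p: Symbol, a: Symbol, x: float, xi: float,
                    h: float = 1e-5) -> float:
    """{p, a} = dp/dxi * da/dx - dp/dx * da/dxi at (x, xi), n = 1."""
    dp_dxi = (p(x, xi + h) - p(x, xi - h)) / (2 * h)
    dp_dx = (p(x + h, xi) - p(x - h, xi)) / (2 * h)
    da_dxi = (a(x, xi + h) - a(x, xi - h)) / (2 * h)
    da_dx = (a(x + h, xi) - a(x - h, xi)) / (2 * h)
    return dp_dxi * da_dx - dp_dx * da_dxi


def momentum(x: float, xi: float) -> float:
    return xi


def position(x: float, xi: float) -> float:
    return x


def p_transport(x: float, xi: float) -> float:
    """Spatial part x * xi of a transport symbol with coefficient a(x) = x."""
    return x * xi


def a_square(x: float, xi: float) -> float:
    return x ** 2


canonical = poisson_bracket(momentum, position, 0.4, -1.3)   # {xi, x}
forward = poisson_bracket(p_transport, a_square, 1.5, -0.7)  # {p, a}
backward = poisson_bracket(a_square, p_transport, 1.5, -0.7) # {a, p}
```

With this convention $\{\xi,x\}=1$, and $\{x\xi,x^2\}=2x^2$ is the derivative of $x^2$ along the flow $\dot x=x$; swapping the arguments flips the sign.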
The sign convention is chosen so that $\{p,a\}$ is the derivative of $a$ along the bicharacteristic flow generated by $p$. This derivative becomes useful for PDE only after it is connected to the operator algebra, so we next identify it as the leading symbol of a commutator.
[quotetheorem:8188]
[citeproof:8188]
This commutator formula turns differentiation along bicharacteristics into an operator identity. The scalar-symbol hypothesis matters: for systems, the leading symbols are matrices and the order $m+k$ terms need not commute, so $PA-AP$ can have order $m+k$ rather than $m+k-1$. The statement is also only a principal-symbol statement; it does not identify subprincipal contributions, lower-order imaginary parts, or the sign of the full commutator. The theorem therefore does not by itself produce a regularity estimate. To get one, the commutant must be chosen so that the principal bracket has a sign, and the lower-order terms must either be lower in the Sobolev induction or be placed in regions where regularity is already known.
[example: Failure of Order Drop for a Matrix System]
On $\mathbb R$, let the system operators act on pairs by
\begin{align*}
P(u_1,u_2)=(D_xu_1,0),\qquad A(u_1,u_2)=(u_2,D_xu_1).
\end{align*}
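With respect to the standard basis of $\mathbb C^2$, their full symbols are the linear maps below. For $P$ this is also the order-one principal symbol, while for $A$ the entry $v_2$ has order zero and only $\xi v_1$ is of order one: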
\begin{align*}
p(\xi)(v_1,v_2)=(\xi v_1,0),\qquad a(\xi)(v_1,v_2)=(v_2,\xi v_1).
\end{align*}
Compute the two compositions on an arbitrary vector $(v_1,v_2)$. First apply $a(\xi)$ and then $p(\xi)$:
\begin{align*}
a(\xi)(v_1,v_2)=(v_2,\xi v_1).
\end{align*}
Applying $p(\xi)$ to this result gives
\begin{align*}
p(\xi)a(\xi)(v_1,v_2)=p(\xi)(v_2,\xi v_1)=(\xi v_2,0).
\end{align*}
For the opposite composition, first apply $p(\xi)$:
\begin{align*}
p(\xi)(v_1,v_2)=(\xi v_1,0).
\end{align*}
Applying $a(\xi)$ to this result gives
\begin{align*}
a(\xi)p(\xi)(v_1,v_2)=a(\xi)(\xi v_1,0)=(0,\xi^2v_1).
\end{align*}
Subtracting the two compositions component by component gives
\begin{align*}
(p(\xi)a(\xi)-a(\xi)p(\xi))(v_1,v_2)=(\xi v_2,-\xi^2v_1).
\end{align*}
The second component contains the term $-\xi^2v_1$, which is homogeneous of degree $2=m+k$ (here $m=k=1$), the same leading order as the products $pa$ and $ap$; the first component $\xi v_2$ is of lower order and does not affect the conclusion. Therefore the order-$m+k$ cancellation used for scalar principal symbols fails for this matrix system; system propagation needs extra hypotheses controlling the eigenstructure of the principal symbol.
[/example]
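The symbol computation above amounts to a commutator of $2\times 2$ matrices and can be verified directly. This is a minimal sketch with plain nested lists; the helper names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def p_symbol(xi: float) -> Matrix:
    """Matrix of (v1, v2) -> (xi v1, 0)."""
    return [[xi, 0.0], [0.0, 0.0]]


def a_symbol(xi: float) -> Matrix:
    """Matrix of (v1, v2) -> (v2, xi v1)."""
    return [[0.0, 1.0], [xi, 0.0]]


def symbol_commutator(xi: float) -> Matrix:
    """p(xi) a(xi) - a(xi) p(xi)."""
    pa = matmul2(p_symbol(xi), a_symbol(xi))
    ap = matmul2(a_symbol(xi), p_symbol(xi))
    return [[pa[i][j] - ap[i][j] for j in range(2)] for i in range(2)]


comm_3 = symbol_commutator(3.0)
comm_6 = symbol_commutator(6.0)
```

The lower-left entry of the commutator is $-\xi^2$, so doubling $\xi$ multiplies it by four: the commutator of the symbols keeps a term of order two.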
This example explains why the scalar assumption is not a cosmetic restriction. The positive commutator argument below uses the loss of one order in $[P,A]$ to match the Sobolev order of $Pu$; if the leading matrix commutator remains, the energy identity contains a term at the wrong order and the scalar proof no longer closes.
[quotetheorem:8189]
[citeproof:8189]
This estimate is the point at which the sign of $H_p(a^2)$ becomes a verifiable Sobolev bound rather than a slogan. The real-principal-symbol hypothesis is needed because a genuinely complex principal symbol makes $P-P^*$ an operator of order $m$ rather than $m-1$, and this contributes an uncontrolled leading term to the energy identity; for example, adding an imaginary principal absorption term can create damping or growth rather than reversible propagation. The theorem does not say that regularity appears from nowhere: it requires incoming control through $E$, already-known control through $G$, and a microlocal Sobolev hypothesis on $Pu$. A local flow-box example makes this mechanism visible before it is packaged into the global propagation theorem.
The technical hypotheses rule out specific analytic failures. Proper support keeps compositions and distributional pairings local, so applying the commutant does not import uncontrolled behaviour from a distant part of $X$. Compact microlocalization in a conic chart gives uniform symbol seminorms and fixed cutoffs; without it, the constants in the symbolic remainder estimates could escape along the cosphere variables or across coordinate charts. The regularisation $A_\varepsilon$ is needed because $u$ is initially only a distribution, so pairings such as $(B_\varepsilon u,B_\varepsilon u)_{L^2}$ are legitimate for $\varepsilon>0$ before the uniform estimate is passed to the limit.
[example: A Monotone Commutant Along a Flow Box]
Suppose a characteristic point has a conic flow-box neighbourhood with coordinates $(s,y,\eta)$ on the characteristic set such that $H_p=\partial_s$ there. Choose
\begin{align*}
a(s,y,\eta)=\chi(s)\rho(y,\eta),
\end{align*}
where $\rho$ is supported in the transverse part of the flow box and $\chi$ is nondecreasing, equal to $0$ on the incoming side and positive on the target side. Then
\begin{align*}
a(s,y,\eta)^2=\chi(s)^2\rho(y,\eta)^2.
\end{align*}
Since $H_p=\partial_s$ on the characteristic set and $\rho$ is independent of $s$,
\begin{align*}
H_p(a^2)=\partial_s(\chi(s)^2\rho(y,\eta)^2)=2\chi(s)\chi'(s)\rho(y,\eta)^2.
\end{align*}
Because $\chi(s)\ge 0$ and $\chi'(s)\ge 0$, this quantity is nonnegative, and it is positive exactly where $\chi>0$, $\chi'>0$, and $\rho\ne 0$.
Thus the Poisson bracket $\{p,a^2\}=H_p(a^2)$ has the sign needed for the positive commutator term in the transition region. The commutator therefore controls the target cutoff by using regularity from the incoming slice, which is the local mechanism behind propagation from one slice of a bicharacteristic tube to the next.
[/example]
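A concrete pair of cutoffs makes the sign condition tangible. The sketch below assumes the hypothetical choices $\chi(s)=e^{-1/s}$ for $s>0$ (zero otherwise) and a standard bump $\rho(y)=e^{-1/(1-y^2)}$ for $|y|<1$ in a single transverse variable; it evaluates $2\chi\chi'\rho^2$ on a grid and compares it with a finite-difference derivative of $a^2$ in $s$.

```python
import math


def chi(s: float) -> float:
    """Smooth nondecreasing cutoff: 0 for s <= 0, exp(-1/s) for s > 0."""
    return math.exp(-1.0 / s) if s > 0 else 0.0


def dchi(s: float) -> float:
    """Derivative of chi; nonnegative everywhere."""
    return math.exp(-1.0 / s) / s ** 2 if s > 0 else 0.0


def rho(y: float) -> float:
    """Transverse bump supported in |y| < 1."""
    return math.exp(-1.0 / (1.0 - y * y)) if abs(y) < 1 else 0.0


def a_squared(s: float, y: float) -> float:
    return (chi(s) * rho(y)) ** 2


def bracket(s: float, y: float) -> float:
    """H_p(a^2) = 2 chi chi' rho^2 when H_p = d/ds."""
    return 2.0 * chi(s) * dchi(s) * rho(y) ** 2


# Sample the flow box on a grid covering incoming, transition and target regions.
grid = [(-1.0 + 0.05 * i, -1.2 + 0.1 * j) for i in range(61) for j in range(25)]
values = [bracket(s, y) for s, y in grid]

# Finite-difference check of the formula at one interior point.
h = 1e-5
fd = (a_squared(0.5 + h, 0.2) - a_squared(0.5 - h, 0.2)) / (2 * h)
```

All sampled values are nonnegative, the bracket vanishes identically on the incoming side $s\le 0$, and `fd` matches `bracket(0.5, 0.2)` up to discretisation error.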
The estimate is not merely an energy inequality; it is the mechanism by which symbolic geometry becomes wave front set information. We now package it into the real principal type framework.
## Real Principal Type Operators and Solvability Along Bicharacteristics
Which hypotheses on $P$ make propagation a clean statement along the whole characteristic set? Real principal type excludes multiple characteristics and complex leading directions, ensuring that $H_p$ gives a genuine nonzero real direction on $\operatorname{Char}(P)$.
[definition: Real Principal Type Operator]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a properly supported classical pseudodifferential operator in $\Psi^m(X)$, extended to $P:\mathcal D'(X)\to \mathcal D'(X)$, and let $p\in S^m(T^*X\setminus 0)$ be a homogeneous representative of its scalar principal symbol. The operator $P$ is of real principal type on a conic [open set](/page/Open%20Set) $U\subset T^*X\setminus 0$ if there exists a homogeneous elliptic factor
\begin{align*}
q\in S^{-m}(U), \qquad q(x,\xi)\ne 0 \text{ for all } (x,\xi)\in U,
\end{align*}
such that $qp\in C^\infty(U;\mathbb R)$ and
\begin{align*}
d(qp)\ne 0
\end{align*}
on $U\cap \{p=0\}$.
[/definition]
Multiplication by the elliptic factor $q$ does not change the characteristic set, but it permits symbols that are real up to a nonvanishing factor. The nonvanishing differential condition says that the characteristic set is a smooth hypersurface and the Hamilton vector field is nonzero there.
[remark: Dependence on Elliptic Factors]
Replacing $p$ by $qp$ with $q$ nonvanishing rescales the Hamilton vector field on $\{p=0\}$ by $q$, since $H_{qp}=qH_p+pH_q$ and the second term vanishes where $p=0$. Therefore the unparametrised null bicharacteristics are unchanged. This is why propagation is stated along curves rather than with a preferred time parameter.
[/remark]
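The rescaling in this remark can be tested on the transport symbol. The sketch below assumes the hypothetical data $p=\tau+x\xi$ and $q=(\tau^2+\xi^2)^{-1/2}$, which is nonvanishing and homogeneous of degree $-1$ away from the zero section, and compares finite-difference approximations of $H_{qp}$ and $qH_p$ at a characteristic point. Phase-space points are stored as `z = [t, x, tau, xi]`, and all helper names are illustrative.

```python
import math
from typing import Callable, List


def hamilton_field(f: Callable[[List[float]], float],
                   z: List[float], h: float = 1e-6) -> List[float]:
    """Central-difference H_f at z = [base coords..., fibre coords...]."""
    n = len(z) // 2

    def partial(i: int) -> float:
        zp = list(z)
        zm = list(z)
        zp[i] += h
        zm[i] -= h
        return (f(zp) - f(zm)) / (2 * h)

    return [partial(n + j) for j in range(n)] + [-partial(j) for j in range(n)]


def p(z: List[float]) -> float:
    """Transport symbol tau + a(x) xi with the hypothetical choice a(x) = x."""
    _, x, tau, xi = z
    return tau + x * xi


def q(z: List[float]) -> float:
    """Hypothetical elliptic factor, homogeneous of degree -1 in (tau, xi)."""
    _, _, tau, xi = z
    return 1.0 / math.hypot(tau, xi)


def qp(z: List[float]) -> float:
    return q(z) * p(z)


z0 = [0.0, 2.0, -2.0, 1.0]          # characteristic: tau + x * xi = 0
field_p = hamilton_field(p, z0)
field_qp = hamilton_field(qp, z0)
rescaled = [q(z0) * c for c in field_p]
```

At `z0` the two fields `field_qp` and `rescaled` agree up to discretisation error, so they have the same integral curves through that point up to reparametrisation.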
The remark removes a possible ambiguity in the definition: the curves are intrinsic even though the Hamilton vector field may be rescaled. The main theorem can now state regularity propagation along these intrinsic curves, with the source term $Pu$ supplying the only microlocal obstruction allowed by the equation.
[quotetheorem:8190]
[citeproof:8190]
This theorem should be read as the hyperbolic counterpart of elliptic regularity. The regularity assumption on $Pu$ is necessary: if $Pu$ has a singularity injected at one point of a bicharacteristic, the equation can create or terminate singular behaviour there, so invariance of $\operatorname{WF}^s(u)$ is no longer a statement about the homogeneous flow alone. The real principal type condition is also essential; for $p=\xi_1^2$, the characteristic set $\{\xi_1=0\}$ is a smooth hypersurface, but $dp$ and hence $H_p$ vanish on it, so there is no nonzero bicharacteristic direction along which this theorem could propagate regularity. The theorem does not describe reflection at boundaries, radial points, or multiple characteristics, all of which require additional estimates. Its role here is to identify the interior propagation mechanism that later becomes the local model for Fourier integral operators and canonical relations.
[example: Propagation for the Wave Equation]
For $P=D_t^2-\Delta_g$ on $\mathbb R_t\times M$, with $\Delta_g$ the nonnegative Laplace-Beltrami operator of a Riemannian manifold $(M,g)$, the order-two principal symbol is
\begin{align*}
p(t,x,\tau,\xi)=\tau^2-g^{jk}(x)\xi_j\xi_k=\tau^2-|\xi|_{g^{-1}}^2.
\end{align*}
Hence
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi):\tau^2=|\xi|_{g^{-1}}^2,\;(\tau,\xi)\ne 0\}.
\end{align*}
On this set, $\tau\ne 0$, since $\tau=0$ would force $|\xi|_{g^{-1}}^2=0$, hence $\xi=0$, contradicting $(\tau,\xi)\ne 0$.
The Hamilton equations for this symbol are
\begin{align*}
\dot t=2\tau,\quad \dot x_j=-2g^{jk}(x)\xi_k,\quad \dot\tau=0,\quad \dot\xi_j=\frac{\partial g^{kl}}{\partial x_j}(x)\xi_k\xi_l.
\end{align*}
Since $\dot\tau=0$, $\tau=\tau^0$ is constant along each bicharacteristic. Using $t$ as a parameter gives
\begin{align*}
\frac{dx_j}{dt}=\frac{\dot x_j}{\dot t}=-\frac{g^{jk}(x)\xi_k}{\tau^0}.
\end{align*}
Therefore
\begin{align*}
g_{j\ell}(x)\frac{dx_\ell}{dt}=-\frac{\xi_j}{\tau^0}.
\end{align*}
Taking the $g$-norm and using the characteristic equation gives
\begin{align*}
g_{j\ell}(x)\frac{dx_j}{dt}\frac{dx_\ell}{dt}=\frac{g^{jk}(x)\xi_j\xi_k}{(\tau^0)^2}=1.
\end{align*}
Thus the base projection of a null bicharacteristic is a unit-speed geodesic, with orientation determined by the sign of $\tau^0$.
Now suppose $Pu=f$ and $f$ is microlocally in $H^{s-1}$ along a compact null bicharacteristic segment. Since $m=2$, the source hypothesis in *Real Principal Type Propagation Framework* is exactly $Pu\in H^{s-m+1}=H^{s-1}$. If $u$ is microlocally $H^s$ at one point of the segment, the theorem propagates that $H^s$ regularity through overlapping flow boxes along the same null bicharacteristic, until the segment reaches a boundary, leaves the conic region where the hypotheses hold, or exits the coordinate patch used to describe the flow. Equivalently, wave singularities travel along the lightlike geodesic flow in the characteristic set.
[/example]
The same theorem also covers first-order equations, where the characteristic curves are familiar from the method of characteristics. This case shows that real principal type propagation is a microlocal strengthening of transport along ordinary characteristic curves.
[example: Solvability Along a First-Order Bicharacteristic]
For $P=D_t+a(x)D_x$ on $\mathbb R_t\times\mathbb R_x$, the principal symbol is
\begin{align*}
p(t,x,\tau,\xi)=\tau+a(x)\xi.
\end{align*}
Since $\partial p/\partial \tau=1$, the differential $dp$ is nonzero on the characteristic set, so $P$ is of real principal type. The characteristic set is
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi):\tau+a(x)\xi=0,\;(\tau,\xi)\ne 0\}.
\end{align*}
The Hamilton vector field is computed from the four derivatives
\begin{align*}
\frac{\partial p}{\partial \tau}=1,\qquad \frac{\partial p}{\partial \xi}=a(x),\qquad \frac{\partial p}{\partial t}=0,\qquad \frac{\partial p}{\partial x}=a'(x)\xi.
\end{align*}
Substitution into the Hamilton formula gives
\begin{align*}
H_p=\frac{\partial}{\partial t}+a(x)\frac{\partial}{\partial x}-a'(x)\xi\frac{\partial}{\partial \xi}.
\end{align*}
Thus a bicharacteristic $\gamma(s)=(t(s),x(s),\tau(s),\xi(s))$ satisfies
\begin{align*}
\dot t(s)=1,\qquad \dot x(s)=a(x(s)),\qquad \dot\tau(s)=0,\qquad \dot\xi(s)=-a'(x(s))\xi(s).
\end{align*}
Because $\dot t=1$, we may use $t$ as the parameter, obtaining
\begin{align*}
\frac{dx}{dt}=a(x),\qquad \frac{d\tau}{dt}=0,\qquad \frac{d\xi}{dt}=-a'(x)\xi.
\end{align*}
Along this curve the characteristic quantity is constant, since
\begin{align*}
\frac{d}{dt}\bigl(\tau+a(x)\xi\bigr)=\frac{d\tau}{dt}+a'(x)\frac{dx}{dt}\xi+a(x)\frac{d\xi}{dt}.
\end{align*}
Using the three evolution equations gives
\begin{align*}
\frac{d}{dt}\bigl(\tau+a(x)\xi\bigr)=0+a'(x)a(x)\xi+a(x)(-a'(x)\xi)=0.
\end{align*}
Therefore an initially characteristic covector remains characteristic, and its base projection is exactly the ordinary transport curve $\dot t=1$, $\dot x=a(x)$.
Now suppose $Pu=f$. Since $P$ has order $m=1$, the source hypothesis in *Real Principal Type Propagation Framework* is $Pu=f\in H^{s-m+1}=H^s$ microlocally. If $f$ is microlocally $H^s$ along a compact bicharacteristic segment and $u$ is microlocally $H^s$ on an incoming point or slice of that segment, the propagation theorem carries the same $H^s$ regularity through overlapping flow boxes in the direction of $H_p$. This is the microlocal version of integrating the transport equation along the characteristic curve, with the extra fibre equation recording which covector direction is being transported.
[/example]
The real principal type condition is strong enough to exclude many degenerate behaviours. If $dp$ vanishes on the characteristic set, the Hamilton vector field vanishes there and fails to define a usable propagation direction, and multiple characteristic theory requires additional ideas such as radial point estimates, subprincipal analysis, or second microlocalisation.
[explanation: What This Chapter Contributes]
The chapter gives the geometric core of propagation of singularities. Principal symbols define characteristic sets; Hamilton vector fields foliate those sets by null bicharacteristics; Poisson brackets identify Hamiltonian derivatives with commutators; positive commutator estimates convert sign information into Sobolev bounds. In Chapters 8 through 10, Fourier integral operators will be understood as quantisations of canonical transformations, and this propagation picture becomes the local model for how canonical relations transport wave front sets.
[/explanation]
Propagation of singularities is the dynamical expression of the same Hamiltonian picture: singularities move along bicharacteristics rather than spreading arbitrarily. The next chapter states the propagation theorem in this form; the oscillatory estimates needed to build the Fourier integral operators that realize this transport are developed afterwards.
# 6. Propagation of Singularities
This chapter is about the dynamical law governing microlocal singularities of solutions to linear PDE. It assumes the preceding material on wave front sets, pseudodifferential detection, Sobolev microlocal regularity, principal symbols, elliptic parametrices, and the bicharacteristic framework of Chapter 5. In earlier chapters the wave front set was detected by pseudodifferential cutoffs and transformed by canonical relations; here the principal symbol of a differential operator itself generates a Hamiltonian flow, and singularities are forced to move along that flow. The main result is the Duistermaat-Hörmander propagation theorem for operators of real principal type, together with its standard consequences for hyperbolic equations.
## Real Principal Type and the Propagation Problem
The basic question is local and directional: if $Pu=f$ and $f$ is microlocally smooth near a characteristic covector, how far along the characteristic geometry can a singularity of $u$ persist? Elliptic regularity already answers the question away from the characteristic set, so the new phenomenon occurs only where the principal symbol vanishes.
[definition: Characteristic Set]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a properly supported pseudodifferential operator of order $m$, written $P \in \Psi^m(X)$, with continuous extension $P:\mathcal{D}'(X)\to\mathcal{D}'(X)$. Let $p_m:T^*X\setminus 0\to \mathbb C$ denote the principal homogeneous component of the full symbol. The characteristic set of $P$ is
\begin{align*}
\operatorname{Char}(P) := \{(x,\xi) \in T^*X \setminus 0 : p_m(x,\xi)=0\}.
\end{align*}
[/definition]
The same set may be described using the principal symbol class in $S^m/S^{m-1}$, since changing the representative by a lower-order term does not change the homogeneous leading part on $T^*X\setminus 0$. This definition separates the phase-space directions where $P$ has a microlocal inverse from those where it does not. The first theorem is needed to make that separation precise: before studying propagation on $\operatorname{Char}(P)$, we must prove that no extra singularities survive in the elliptic region.
[quotetheorem:8191]
[citeproof:8191]
The hypotheses are needed in two separate ways. Proper support ensures that $Pu$ is a well-defined distribution and that the parametrix calculation can be localized without uncontrolled behaviour at infinity; without a support condition, a pseudodifferential kernel may import singularities from far outside the coordinate patch. Ellipticity at the chosen covector is also essential: for $P=\partial_{x_1}$, every covector with $\xi_1=0$ is characteristic, and a distribution independent of $x_1$ may have singularities in those directions while $Pu=0$. This failure is the reason the rest of the chapter studies the characteristic set rather than trying to extend the elliptic argument through it. The theorem gives no propagation information on $\operatorname{Char}(P)$ itself. Nor should it be confused with the reverse inclusion $\operatorname{WF}(Pu)\subset\operatorname{WF}(u)$, which holds for every properly supported pseudodifferential operator without any ellipticity: pseudodifferential operators are microlocal, so smoothness of $u$ at a covector always implies smoothness of $Pu$ there. What ellipticity adds is the opposite implication, from $Pu$ back to $u$, and it is this implication that fails on the characteristic set.
The elliptic theorem leaves only the characteristic directions unresolved, so the next issue is whether the characteristic set carries a well-defined flow. The real principal type condition is introduced precisely to exclude multiple-characteristic degeneracies and to ensure that the Hamiltonian vector field gives a genuine direction of propagation.
[definition: Real Principal Type Operator]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a properly supported pseudodifferential operator of order $m$, with extension $P:\mathcal{D}'(X)\to\mathcal{D}'(X)$. The operator $P$ is of real principal type if there is a real-valued homogeneous principal symbol representative
\begin{align*}
p_m:T^*X\setminus 0\to \mathbb R
\end{align*}
of degree $m$ such that its Hamiltonian vector field
\begin{align*}
H_{p_m}:T^*X\setminus 0\to T(T^*X\setminus 0)
\end{align*}
is nonvanishing at every point of $\operatorname{Char}(P)$.
[/definition]
Having a nonzero vector field is still not enough by itself; we need the actual curves it traces through the characteristic set. The next definition names those curves, because the propagation theorem will say that microlocal singularities fill whole such curves unless the equation has a singular right-hand side.
[definition: Null Bicharacteristic]
Let $P$ be of real principal type with real homogeneous principal symbol representative $p_m:T^*X\setminus 0\to\mathbb R$. A null bicharacteristic of $P$ is a maximally extended integral curve $\gamma:I \to \operatorname{Char}(P)$ satisfying
\begin{align*}
\dot{\gamma}(s) = H_{p_m}(\gamma(s)).
\end{align*}
[/definition]
Null bicharacteristics are the characteristic curves in cotangent space rather than in the base manifold alone. The central theorem is needed because elliptic regularity only removes non-characteristic covectors; it does not yet say whether a characteristic singularity can stop, branch, or jump along unrelated directions.
[quotetheorem:8192]
[citeproof:8192]
The real-principal-type assumption rules out several common failure modes. If the principal symbol has a double characteristic, as in $p=\xi_1^2$ on $\mathbb R^n$, then $H_p=2\xi_1\partial_{x_1}$ vanishes on $\{\xi_1=0\}$, so there is no nonzero Hamiltonian direction along which the commutator can transport regularity. If the principal symbol is not real, the commutator no longer supplies a signed energy identity; elliptic damping or growth may dominate the transport picture. The exclusion of $\operatorname{WF}(Pu)$ is equally necessary, since a singular forcing term can create or absorb wave front set at the point where the bicharacteristic meets it, as fundamental solutions of the wave equation demonstrate.
The theorem is most concrete when the Hamiltonian equations can be solved by hand. The constant-coefficient wave operator supplies the model example, because its characteristic variety is the light cone and its bicharacteristics project to straight light rays.
[example: Constant Coefficient Wave Equation]
On $\mathbb R_t \times \mathbb R_x^n$, consider $P=\partial_t^2-\Delta_x$ with principal symbol
\begin{align*}p(t,x,\tau,\xi)= -\tau^2 + |\xi|^2=-\tau^2+\sum_{i=1}^n \xi_i^2.\end{align*}
Using
\begin{align*}H_p=\partial_\tau p\,\partial_t+\sum_{i=1}^n \partial_{\xi_i}p\,\partial_{x_i}-\partial_t p\,\partial_\tau-\sum_{i=1}^n \partial_{x_i}p\,\partial_{\xi_i},\end{align*}
the partial derivatives are
\begin{align*}\partial_\tau p=-2\tau,\qquad \partial_{\xi_i}p=2\xi_i,\qquad \partial_t p=0,\qquad \partial_{x_i}p=0.\end{align*}
Substituting these into the formula for $H_p$ gives
\begin{align*}H_p=-2\tau\,\partial_t+\sum_{i=1}^n 2\xi_i\,\partial_{x_i}.\end{align*}
Thus an integral curve $s\mapsto (t(s),x(s),\tau(s),\xi(s))$ satisfies
\begin{align*}\dot t=-2\tau,\qquad \dot x_i=2\xi_i,\qquad \dot\tau=0,\qquad \dot\xi_i=0\quad\text{for }1\leq i\leq n.\end{align*}
The characteristic set is obtained by setting $p=0$:
\begin{align*}-\tau^2+|\xi|^2=0.\end{align*}
Equivalently, $\tau^2=|\xi|^2$, so since $(\tau,\xi)\neq 0$ on $T^*(\mathbb R^{1+n})\setminus 0$, the two characteristic sheets are
\begin{align*}\tau=|\xi|\qquad\text{and}\qquad \tau=-|\xi|.\end{align*}
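Because $\dot\tau=0$ and $\dot\xi=0$, the covector $(\tau,\xi)$ is constant along each bicharacteristic. Since $\tau^2=|\xi|^2$ and $(\tau,\xi)\neq 0$ force $\tau\neq 0$ on the characteristic set, $t$ is a valid parameter, so reparametrize by $t$. For each component,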
\begin{align*}\frac{dx_i}{dt}=\frac{dx_i/ds}{dt/ds}=\frac{2\xi_i}{-2\tau}=-\frac{\xi_i}{\tau}.\end{align*}
Hence on the sheet $\tau=|\xi|$ the spatial velocity is
\begin{align*}\frac{dx}{dt}=-\frac{\xi}{|\xi|},\end{align*}
while on the sheet $\tau=-|\xi|$ it is
\begin{align*}\frac{dx}{dt}=\frac{\xi}{|\xi|}.\end{align*}
Therefore the bicharacteristics project to straight spatial rays moving at speed $1$, and singularities of solutions with smooth forcing can propagate only along these light rays, not through spacetime directions outside the light cone.
[/example]
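The computation in this example can be checked numerically. The following is a minimal Python sketch, not part of the course development: the function names `hamilton_field` and `ray_velocity` and the sample covector are illustrative assumptions. It evaluates $H_p$ for $p=-\tau^2+|\xi|^2$ on both characteristic sheets and confirms that the projected velocity $dx/dt$ has unit length.

```python
import math

def hamilton_field(tau, xi):
    """Components (dt/ds, dx/ds, dtau/ds, dxi/ds) of H_p for p = -tau^2 + |xi|^2."""
    return -2.0 * tau, [2.0 * c for c in xi], 0.0, [0.0] * len(xi)

def ray_velocity(tau, xi):
    """dx/dt along a bicharacteristic: divide dx/ds by dt/ds (requires tau != 0)."""
    dt, dx, _, _ = hamilton_field(tau, xi)
    return [c / dt for c in dx]

xi = [3.0, -4.0]                        # hypothetical spatial covector, |xi| = 5
norm_xi = math.sqrt(sum(c * c for c in xi))

for tau in (norm_xi, -norm_xi):         # the sheets tau = |xi| and tau = -|xi|
    v = ray_velocity(tau, xi)
    speed = math.sqrt(sum(c * c for c in v))
    print(f"tau={tau:+.1f}  dx/dt={v}  speed={speed}")
```

On the sheet $\tau=|\xi|$ the velocity printed is $-\xi/|\xi|$, and on $\tau=-|\xi|$ it is $\xi/|\xi|$, matching the two formulas in the example.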
## Commutators and the Local Propagation Step
The theorem is global along bicharacteristics, but the proof is built from a local energy estimate in phase space. The problem is to convert the sign of $H_p$ on a cutoff into a gain of microlocal regularity for $u$ at one covector from regularity already known slightly upstream.
[definition: Microlocal Sobolev Regularity]
Let $u \in \mathcal{D}'(X)$, $s \in \mathbb R$, and $(x_0,\xi_0) \in T^*X \setminus 0$. We say that $u$ is microlocally in $H^s$ at $(x_0,\xi_0)$ if there exists a properly supported operator $A:C_c^\infty(X)\to C^\infty(X)$ with $A \in \Psi^0(X)$, elliptic at $(x_0,\xi_0)$ and extended continuously to $\mathcal{D}'(X)$, such that $Au \in H^s_{\mathrm{loc}}(X)$.
[/definition]
A covector lies outside $WF(u)$ exactly when $u$ is microlocally $H^s$ there for every $s$, but the commutator method estimates one Sobolev order at a time. The next theorem is the finite-order local step that will later be iterated and then upgraded to smooth microlocal regularity.
[quotetheorem:8193]
[citeproof:8193]
The transversality and incoming-regularity assumptions are not cosmetic. If the Hamiltonian flow is tangent to the chosen hypersurface, the cutoff may fail to have a signed derivative, and the commutator estimate loses the positive square that controls the target microlocal norm. If the upstream region is singular, the estimate only transports that singularity: for the transport operator $\partial_{x_1}$ on $\mathbb R^n$ with $n\geq 2$, the function $u(x)=\mathbf 1_{\{x_2>0\}}$ solves $\partial_{x_1}u=0$, and its jump across $\{x_2=0\}$, already present for $x_1<0$, persists along every line in the $x_1$ direction rather than disappearing at $x_1=0$. The theorem is therefore a propagation statement, not a smoothing statement, and its purpose is to provide the local step used in the compact-strip iteration below.
This result is the engine of the whole chapter. It is not an elliptic estimate: it cannot create regularity from nowhere, but it transports regularity along the oriented Hamiltonian flow wherever the inhomogeneous term is regular enough.
[remark: Direction of the Estimate]
The commutator can be arranged to propagate regularity forward or backward by changing the monotonicity of the cutoff along $H_p$. The final propagation theorem is symmetric along a bicharacteristic because the argument may be run in either orientation on small subsegments.
[/remark]
A useful way to visualise the proof is as an energy estimate on a moving phase-space window. The window is not a physical subset of $X$, but a conic tube in $T^*X \setminus 0$, so the estimate records both position and frequency direction.
The local normal form $p=\xi_1$ is the simplest chart in which this phase-space window becomes visible. The next example is useful because it strips away curvature and lower-order terms, leaving only transport of regularity along one coordinate direction.
[example: Reflection Free Local Propagation in a Coordinate Chart]
Let $X$ be locally identified with an open subset of $\mathbb R^n$, and suppose that near a characteristic covector the real principal type symbol has been reduced to
\begin{align*}p(x,\xi)=\xi_1.\end{align*}
With the standard symplectic convention,
\begin{align*}H_p=\sum_{j=1}^n \partial_{\xi_j}p\,\partial_{x_j}-\sum_{j=1}^n \partial_{x_j}p\,\partial_{\xi_j}.\end{align*}
The derivatives are
\begin{align*}\partial_{\xi_1}p=1,\qquad \partial_{\xi_j}p=0\text{ for }j\neq 1,\qquad \partial_{x_j}p=0\text{ for every }j.\end{align*}
Substitution gives
\begin{align*}H_p=\partial_{x_1}.\end{align*}
The characteristic set in this chart is
\begin{align*}\operatorname{Char}(P)=\{(x,\xi):\xi_1=0,\ \xi\neq 0\}.\end{align*}
An integral curve $s\mapsto (x(s),\xi(s))$ of $H_p$ therefore satisfies
\begin{align*}\dot x_1(s)=1,\qquad \dot x_j(s)=0\text{ for }j\neq 1,\qquad \dot\xi_j(s)=0\text{ for every }j.\end{align*}
Hence, for initial data $(x(0),\xi(0))=(x^0,\xi^0)$ with $\xi^0_1=0$,
\begin{align*}x_1(s)=x^0_1+s,\qquad x_j(s)=x^0_j\text{ for }j\neq 1,\qquad \xi(s)=\xi^0.\end{align*}
Thus the bicharacteristics are straight lines in the $x_1$ direction, with $x'=(x_2,\ldots,x_n)$ and the whole covector $\xi$ fixed.
If $Pu$ is microlocally smooth in a conic tube around such a segment and $u$ is microlocally smooth on the incoming part $x_1<0$ of that tube, the *Local Positive Commutator Propagation Step* transports the same microlocal smoothness across the segment in the positive $x_1$ direction, after shrinking the tube if necessary. Since the equations for the flow contain no term changing $\xi$, no term changing $x'$, and no branch where $\dot x_1$ changes sign, this interior normal form has only transmission along the coordinate line; there is no reflected bicharacteristic in the chart.
[/example]
## Forward and Backward Bicharacteristic Strips
Once the local step is available, the next question is how to organise the iteration along a curve. Bicharacteristics may be long, may approach the boundary of the coordinate patch, and may require several microlocal charts, so the propagation statement is phrased on compact strips first and then extended by maximal continuation.
[definition: Bicharacteristic Strip]
Let $P:C_c^\infty(X)\to C^\infty(X)$ be a properly supported real-principal-type operator with real homogeneous principal symbol representative $p_m:T^*X\setminus 0\to\mathbb R$, and let $\gamma:I \to \operatorname{Char}(P)$ be a null bicharacteristic for $H_{p_m}$. A bicharacteristic strip is the image $\gamma([a,b])$ of a compact interval $[a,b] \subset I$, together with a conic neighbourhood small enough that the Hamiltonian flow is defined throughout the interval.
[/definition]
The compact strip is the unit on which finitely many local commutator estimates can be patched together. A single normal-form estimate only moves regularity across one small coordinate tube, while a bicharacteristic segment may pass through several charts and symbol normalizations. The obstruction to a finite propagation statement is whether the curve can be covered by finitely many such tubes while avoiding the forcing set $WF(Pu)$.
[quotetheorem:8194]
[citeproof:8194]
Compactness is what turns the local estimate into a finite argument. On a non-compact bicharacteristic, the curve may leave every coordinate chart or accumulate near a region where the symbol ceases to have the required normal form, so a finite chain of commutator estimates need not exist. The assumption that the strip is disjoint from $WF(Pu)$ is also sharp: if a wave equation is forced by a delta source at an interior point of the strip, the source point interrupts the propagation of regularity across it. This compact-strip formulation is the practical bridge between the one-step commutator estimate and the maximal-bicharacteristic statement.
The compact-strip version is often the form used in calculations. The maximal-bicharacteristic theorem is obtained by applying it to every compact subinterval of a bicharacteristic lying away from $WF(Pu)$.
[remark: Creation and Blocking by the Inhomogeneous Term]
The set $WF(Pu)$ acts as a source set for the propagation theorem. Along a null bicharacteristic, regularity can be transported only through intervals avoiding $WF(Pu)$; when the curve meets $WF(Pu)$, the equation no longer separates singularities of $u$ from singularities already present in the forcing.
[/remark]
The role of $WF(Pu)$ is especially visible for fundamental solutions, where the right-hand side is singular at a source point and smooth elsewhere. The next example shows how propagation turns that localized source into a geometric wavefront along the light cone.
[example: Propagation from an Impulsive Source]
For $P=\partial_t^2-\Delta_x$ on $\mathbb R^{1+n}$, suppose $Pu=\delta_{(0,0)}$. The distribution $\delta_{(0,0)}$ is smooth on $\mathbb R^{1+n}\setminus\{(0,0)\}$, and at the source its wave front set consists of the whole punctured cotangent fiber,
\begin{align*}WF(\delta_{(0,0)})=\{(0,0;\tau,\xi):(\tau,\xi)\neq 0\}.\end{align*}
The principal symbol is
\begin{align*}p(t,x,\tau,\xi)=-\tau^2+|\xi|^2.\end{align*}
Thus, away from the source, *[Microlocal Hypoellipticity Away from the Characteristic Set](/theorems/8191)* removes all non-characteristic covectors from $WF(u)$, so any remaining wave front of $u$ must satisfy
\begin{align*}-\tau^2+|\xi|^2=0.\end{align*}
Equivalently,
\begin{align*}\tau^2=|\xi|^2.\end{align*}
Since $(\tau,\xi)\neq 0$, this gives the two characteristic sheets
\begin{align*}\tau=|\xi|\quad\text{or}\quad \tau=-|\xi|.\end{align*}
The Hamiltonian vector field is obtained from
\begin{align*}H_p=\partial_\tau p\,\partial_t+\sum_{i=1}^n \partial_{\xi_i}p\,\partial_{x_i}-\partial_t p\,\partial_\tau-\sum_{i=1}^n \partial_{x_i}p\,\partial_{\xi_i}.\end{align*}
Here
\begin{align*}\partial_\tau p=-2\tau,\quad \partial_{\xi_i}p=2\xi_i,\quad \partial_t p=0,\quad \partial_{x_i}p=0.\end{align*}
Therefore
\begin{align*}H_p=-2\tau\,\partial_t+\sum_{i=1}^n 2\xi_i\,\partial_{x_i}.\end{align*}
A null bicharacteristic through a source covector $(0,0;\tau_0,\xi_0)$ satisfying $-\tau_0^2+|\xi_0|^2=0$ solves
\begin{align*}\dot t=-2\tau_0,\quad \dot x_i=2(\xi_0)_i,\quad \dot\tau=0,\quad \dot\xi_i=0.\end{align*}
With $t(0)=0$ and $x(0)=0$, integration gives
\begin{align*}t(s)=-2\tau_0s,\quad x(s)=2s\xi_0,\quad \tau(s)=\tau_0,\quad \xi(s)=\xi_0.\end{align*}
For $s\neq 0$,
\begin{align*}|x(s)|=|2s\xi_0|=2|s|\,|\xi_0|.\end{align*}
On the characteristic set $|\xi_0|=|\tau_0|$, so
\begin{align*}|x(s)|=2|s|\,|\tau_0|=|-2\tau_0s|=|t(s)|.\end{align*}
Thus every propagated singularity from the source projects to the light cone $|x|=|t|$.
For the forward fundamental solution, we keep the branch with $t>0$, so the singular support away from the source is contained in
\begin{align*}\{(t,x):t>0,\ |x|=t\}.\end{align*}
Along this forward cone,
\begin{align*}\frac{x(s)}{t(s)}=\frac{2s\xi_0}{-2\tau_0s}=-\frac{\xi_0}{\tau_0}.\end{align*}
Hence
\begin{align*}\xi_0=-\tau_0\frac{x}{t}.\end{align*}
Since $t=|x|$ on the forward cone, this becomes
\begin{align*}(\tau_0,\xi_0)=\tau_0\left(1,-\frac{x}{|x|}\right).\end{align*}
But the cone is the level set of $F(t,x)=t-|x|$, and
\begin{align*}dF=dt-\sum_{i=1}^n \frac{x_i}{|x|}\,dx_i.\end{align*}
Thus the covectors carried by the bicharacteristics are precisely nonzero multiples of $d(t-|x|)$ along the forward cone. By *Duistermaat Hormander Propagation of Singularities*, away from $(0,0)$ the wave front of $u$ can only lie on these null bicharacteristics; in base space this gives the light cone, and in cotangent space it gives the conormal directions to that cone.
[/example]
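The conormality claim at the end of this example can be tested on sample data. The sketch below is a hedged illustration only: the helper names and the covector $\xi_0=(1,2,2)$ are assumptions chosen so that $|\xi_0|=3$. It follows the bicharacteristic from the source to a point with $t>0$ on each sheet and checks that $(\tau_0,\xi_0)=\tau_0\,d(t-|x|)$ there.

```python
import math

def source_ray_point(tau0, xi0, s):
    """Base point (t, x) at parameter s on the bicharacteristic through the origin."""
    return -2.0 * tau0 * s, [2.0 * s * c for c in xi0]

def forward_cone_conormal(x):
    """Components (dt-part, dx-part) of d(t - |x|) at a point with x != 0."""
    r = math.sqrt(sum(c * c for c in x))
    return 1.0, [-c / r for c in x]

def is_multiple_of_conormal(tau0, xi0, x, tol=1e-12):
    """Check that (tau0, xi0) equals tau0 * d(t - |x|) componentwise."""
    _, dx_part = forward_cone_conormal(x)
    return all(abs(c - tau0 * d) <= tol for c, d in zip(xi0, dx_part))

xi0 = [1.0, 2.0, 2.0]                        # hypothetical covector with |xi0| = 3
for tau0, s in ((-3.0, 0.5), (3.0, -0.5)):   # both sheets, s chosen so that t > 0
    t, x = source_ray_point(tau0, xi0, s)
    r = math.sqrt(sum(c * c for c in x))
    print(f"t={t}  |x|={r}  conormal={is_multiple_of_conormal(tau0, xi0, x)}")
```

Both sheets reach the forward cone, one for $s>0$ and the other for $s<0$, which is why the sign of $s$ is paired with the sign of $\tau_0$ in the loop.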
## Consequences for Hyperbolic Equations
The point of the propagation theorem is that it translates the geometry of the principal symbol into statements about solutions of PDE. For hyperbolic equations with smooth coefficients, the characteristic set is the bundle of null covectors for the principal part, and the bicharacteristics are the lifted geometric rays.
[definition: Hyperbolic Operator with Smooth Coefficients]
Let $X$ be a smooth manifold, let $t:X\to \mathbb R$ be a smooth time function, and let $U\subset X$ be open. A differential operator
\begin{align*}
P:C^\infty(U)\to C^\infty(U)
\end{align*}
of order $m$ with smooth coefficients, extended by duality as $P:\mathcal{D}'(U)\to\mathcal{D}'(U)$, is hyperbolic relative to $t$ on $U$ if its principal symbol
\begin{align*}
p_m:T^*U\setminus 0\to \mathbb R
\end{align*}
is real and, after writing a covector as $\tau\,dt_x+\eta$ with $\eta$ a covector annihilating the time direction and restricting to the cotangent bundle of the time slice through $x$, the polynomial function $\tau\mapsto p_m(x,\tau\,dt_x+\eta)$ from $\mathbb R$ to $\mathbb R$ has only real roots whenever $\eta\neq 0$.
[/definition]
Hyperbolicity alone permits repeated real roots, so it does not by itself give the separated characteristic sheets used by the real-principal-type theorem. When two roots merge, the characteristic set can fail to decompose into smooth real-principal-type sheets, and Hamiltonian propagation no longer has a single clean branch to follow.
For the propagation theorem, this is the obstruction that must be ruled out locally in phase space. A single equation $p_m=0$ should split into $m$ smooth characteristic branches over each nonzero spatial covector, with no branch crossing or multiplicity that would obscure which Hamilton vector field governs a singularity. The stronger condition below records exactly that separation property for the polynomial in the time-frequency variable: away from the zero spatial covector, all characteristic roots are real and distinct.
[definition: Strictly Hyperbolic Operator]
Let $P:C^\infty(U)\to C^\infty(U)$ be hyperbolic relative to $t$ on $U$. The operator $P$ is strictly hyperbolic relative to $t$ on $U$ if, for every $x\in U$ and every nonzero covector $\eta$ annihilating the time direction, the polynomial $\tau\mapsto p_m(x,\tau\,dt_x+\eta)$ has $m$ distinct real roots.
[/definition]
This definition connects the abstract real-principal-type theorem with the operators arising in wave propagation. Distinct characteristic roots split the characteristic set into separate smooth sheets, but singularities still have to be assigned to the Hamiltonian flow on those sheets rather than to an undifferentiated base-space wave. The microlocal question is whether, away from the forcing set $WF(Pu)$, each singular covector must travel along the corresponding null bicharacteristic of the principal symbol.
[quotetheorem:8195]
[citeproof:8195]
Strictness is the hypothesis that prevents sheets from merging. When characteristic roots coincide, the principal symbol may have multiple characteristics, and singularities can display mode conversion or weaker propagation behaviour not captured by the real-principal-type theorem; weakly hyperbolic models such as $P=\partial_t^2-t^2\partial_x^2$ show how a double characteristic at $t=0$ can invalidate the separate-sheet picture. Smooth coefficients are also part of the microlocal framework used here; nonsmooth coefficients can scatter singularities in ways that are not described by classical Hamiltonian flow. The theorem is therefore the clean interior propagation result for geometric optics, where singularities move along the null bicharacteristics of the smooth principal symbol. It is only a set-theoretic statement about $WF(u)$: it does not measure amplitudes, compare strengths of singularities, prove well-posedness of the Cauchy problem, or describe behaviour at multiple characteristics. Fourier integral parametrices later add the amplitude transport information under stronger construction hypotheses.
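The difference between strict and weak hyperbolicity can be seen by solving the characteristic polynomial in $\tau$ directly. The following minimal sketch assumes one space dimension and second-order symbols of the form $-\tau^2+q$; the helper names are hypothetical. It compares the wave operator, where $q=\xi^2$, with the weakly hyperbolic model $\partial_t^2-t^2\partial_x^2$, where $q=t^2\xi^2$.

```python
import math

def tau_roots(q):
    """Real roots of tau -> -tau^2 + q for a nonnegative spatial form value q."""
    if q < 0:
        raise ValueError("q must be nonnegative")
    r = math.sqrt(q)
    return (-r, r)

def has_distinct_roots(q):
    """True when the two characteristic roots in tau are separated."""
    lo, hi = tau_roots(q)
    return lo != hi

def wave_form(xi):
    """q for the wave operator d_t^2 - d_x^2 in one space dimension."""
    return xi * xi

def degenerate_form(t, xi):
    """q for the weakly hyperbolic model d_t^2 - t^2 d_x^2."""
    return t * t * xi * xi

print(has_distinct_roots(wave_form(2.0)))             # roots -2 and 2 are separated
print(has_distinct_roots(degenerate_form(1.0, 2.0)))  # separated away from t = 0
print(has_distinct_roots(degenerate_form(0.0, 2.0)))  # the two roots merge at t = 0
```

The last call is the double characteristic: for $\xi\neq 0$ the roots $\tau=\pm t\xi$ coincide exactly on $\{t=0\}$, so the separate-sheet description fails there.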
The hyperbolic theorem is microlocal, so it remembers the covector direction of every singularity. If one projects to the base manifold, different Hamiltonian branches over the same point can overlap, cancel, or enter the region from different sources. The base-space problem is therefore to bound the singular support by visible sources, boundary entry, and possible trapped bicharacteristics without pretending that the projection retains all covector information.
[quotetheorem:8196]
[citeproof:8196]
This estimate explains why microlocal analysis is stronger than singular-support analysis. The base projection records where singularities may be seen, while the full wave front set records which Hamiltonian branches carry them. The terms $F_K$ and $E_K$ cannot be replaced by $S_K$ and $\partial K$ inside the containment, since singularities created by an interior source or entering through the boundary may later occupy other interior points of $K$ along bicharacteristics. The term $T_K$ is the necessary escape alternative: if $Pu$ is smooth and $u$ is singular along an entire trapped or complete bicharacteristic lying over $K$, then neither an interior source nor the boundary explains the projected singular point. Conversely, the theorem does not claim that every point of $S_K$, $F_K$, $E_K$, or $T_K$ is singular; cancellations between covector branches can remove singular support after projection. This limitation is precisely why the course keeps the wave front set as the primary object until the final projection step.
[example: Smooth Coefficient Wave Equation on a Local Spacetime Patch]
Let
\begin{align*}P=\partial_t^2-\sum_{i,j=1}^n \partial_{x_i}(a_{ij}(t,x)\partial_{x_j})\end{align*}
where $a_{ij}=a_{ji}$ are smooth and the matrix $A(t,x)=(a_{ij}(t,x))$ is uniformly positive definite. The second-order part of
\begin{align*}-\sum_{i,j=1}^n \partial_{x_i}(a_{ij}\partial_{x_j})\end{align*}
is
\begin{align*}-\sum_{i,j=1}^n a_{ij}(t,x)\partial_{x_i}\partial_{x_j},\end{align*}
while the derivatives falling on $a_{ij}$ contribute only first-order terms. With the same principal-symbol convention used for the constant-coefficient wave operator, this gives
\begin{align*}p(t,x,\tau,\xi)= -\tau^2+\sum_{i,j=1}^n a_{ij}(t,x)\xi_i\xi_j.\end{align*}
Write
\begin{align*}Q(t,x,\xi)=\sum_{i,j=1}^n a_{ij}(t,x)\xi_i\xi_j.\end{align*}
Since $A(t,x)$ is positive definite, $Q(t,x,\xi)>0$ whenever $\xi\neq 0$. Thus $p=0$ is equivalent to
\begin{align*}\tau^2=Q(t,x,\xi),\end{align*}
so the characteristic set is the union of the two sheets
\begin{align*}\tau=Q(t,x,\xi)^{1/2}\end{align*}
and
\begin{align*}\tau=-Q(t,x,\xi)^{1/2}.\end{align*}
On either sheet, $\tau\neq 0$, hence
\begin{align*}\partial_\tau p=-2\tau\neq 0.\end{align*}
Therefore the two sheets are smooth and separated over every nonzero spatial covector $\xi$.
Using
\begin{align*}H_p=\partial_\tau p\,\partial_t+\sum_{k=1}^n \partial_{\xi_k}p\,\partial_{x_k}-\partial_t p\,\partial_\tau-\sum_{k=1}^n \partial_{x_k}p\,\partial_{\xi_k},\end{align*}
we compute
\begin{align*}\partial_\tau p=-2\tau,\end{align*}
and, by symmetry $a_{ij}=a_{ji}$,
\begin{align*}\partial_{\xi_k}p=\sum_{j=1}^n a_{kj}\xi_j+\sum_{i=1}^n a_{ik}\xi_i=2\sum_{j=1}^n a_{kj}\xi_j.\end{align*}
The coefficient derivatives are
\begin{align*}\partial_t p=\sum_{i,j=1}^n (\partial_t a_{ij})\xi_i\xi_j\end{align*}
and
\begin{align*}\partial_{x_k}p=\sum_{i,j=1}^n (\partial_{x_k}a_{ij})\xi_i\xi_j.\end{align*}
Hence a bicharacteristic $s\mapsto(t(s),x(s),\tau(s),\xi(s))$ satisfies
\begin{align*}\dot t=-2\tau,\qquad \dot x_k=2\sum_{j=1}^n a_{kj}(t,x)\xi_j,\qquad \dot\tau=-\sum_{i,j=1}^n(\partial_ta_{ij})(t,x)\xi_i\xi_j,\qquad \dot\xi_k=-\sum_{i,j=1}^n(\partial_{x_k}a_{ij})(t,x)\xi_i\xi_j.\end{align*}
Since $\tau\neq 0$ on the characteristic set, $t$ is a valid local parameter along each bicharacteristic. Dividing $\dot x_k$ by $\dot t$ gives
\begin{align*}\frac{dx_k}{dt}=\frac{2\sum_{j=1}^n a_{kj}(t,x)\xi_j}{-2\tau}=-\frac{\sum_{j=1}^n a_{kj}(t,x)\xi_j}{\tau}.\end{align*}
Thus the base projection follows the geometric ray system determined by the metric matrix $A(t,x)$, with the sign of $\tau$ selecting one of the two time-oriented characteristic families.
If $Pu$ is smooth in the patch, then $WF(Pu)$ is empty there. By *Propagation for Strictly Hyperbolic Equations*, every characteristic singularity of $u$ in the patch must lie on one of the null bicharacteristics above; by *Microlocal Hypoellipticity Away from the Characteristic Set*, no non-characteristic covector can remain in $WF(u)$. Thus singularities can move only along the projected geometric rays. Fourier integral parametrices add amplitude information along these rays, while the propagation theorem gives the underlying set-theoretic motion of the wave front set.
[/example]
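When the coefficients vary, the Hamilton equations of this example generally have to be integrated numerically. The sketch below is an illustration under explicit assumptions, none of which come from the course text: one space dimension, a time-independent isotropic coefficient $a(x)=c(x)^2$ with the made-up profile $c(x)=1+\tfrac12\sin x$, and a classical fourth-order Runge-Kutta step. It traces one null bicharacteristic and checks that $p$ stays zero and that the projected speed $|dx/dt|$ equals $c(x)$.

```python
import math

def c(x):
    """Hypothetical wave speed; a(x) = c(x)^2 is smooth and positive."""
    return 1.0 + 0.5 * math.sin(x)

def dc(x):
    """Derivative of the hypothetical wave speed."""
    return 0.5 * math.cos(x)

def symbol(state):
    """p(t, x, tau, xi) = -tau^2 + c(x)^2 xi^2."""
    _, x, tau, xi = state
    return -tau * tau + c(x) ** 2 * xi * xi

def hamilton(state):
    """(dt/ds, dx/ds, dtau/ds, dxi/ds) = (p_tau, p_xi, -p_t, -p_x)."""
    _, x, tau, xi = state
    return (-2.0 * tau,
            2.0 * c(x) ** 2 * xi,
            0.0,
            -2.0 * c(x) * dc(x) * xi * xi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h for the Hamilton system."""
    def shift(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))
    k1 = hamilton(state)
    k2 = hamilton(shift(k1, h / 2))
    k3 = hamilton(shift(k2, h / 2))
    k4 = hamilton(shift(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * cc + d)
                 for s, a, b, cc, d in zip(state, k1, k2, k3, k4))

def trace(state, h=1e-3, steps=2000):
    """Follow the bicharacteristic for the given number of steps."""
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

xi0 = 1.0
start = (0.0, 0.0, -c(0.0) * abs(xi0), xi0)   # sheet tau = -Q^{1/2}, so p(start) = 0
end = trace(start)
dt_ds, dx_ds, _, _ = hamilton(end)
print("p at end:", symbol(end))
print("|dx/dt|:", abs(dx_ds / dt_ds), " c(x):", c(end[1]))
```

Conservation of $p$ along the flow is the numerical counterpart of the statement that a bicharacteristic starting in $\operatorname{Char}(P)$ stays there; it also serves as a basic check on the integrator.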
The chapter completes the bridge from pseudodifferential detection to Hamiltonian propagation; Chapters 7 through 10 then turn that propagation geometry into the Fourier integral operator calculus. Propagation identifies the canonical curves generated by a principal symbol; the next part of the course packages those curves, and more general canonical transformations, into Fourier integral operators and parametrices. The treatment here is the interior real-principal-type theory used in the course; refinements involving subprincipal symbols, radial points, boundaries, and glancing rays require additional hypotheses and estimates beyond this chapter. The same flow viewpoint is also the starting point for wave trace and scattering applications, where singularities of kernels record closed or escaping bicharacteristics rather than only pointwise regularity of a single solution.
Having identified bicharacteristic flow as the mechanism of propagation, we now need analytic tools to construct the corresponding kernels. Oscillatory integrals and stationary phase provide the asymptotics that make Fourier integral operators precise and controllable.
# 7. Oscillatory Integrals and Stationary Phase
After Chapters 5 and 6 identified Hamiltonian propagation of singularities, this chapter supplies the oscillatory-integral and stationary-phase estimates needed to build Fourier integral operators. It assumes the distribution theory, Fourier-transform normalisation, symbol estimates, and pseudodifferential microlocalisation developed in the preceding chapters, and it adds the asymptotic analysis needed for phases with large frequencies. Oscillatory integrals are the analytic bridge between wave front sets and the geometric canonical relations that appear in Fourier integral operators. The preceding chapters used Fourier decay and pseudodifferential cutoffs to locate singularities; here we study integrals whose singular behaviour is produced by rapid oscillation rather than by pointwise blow-up. The main questions are how a phase function encodes geometry, how non-stationary regions become smoothing, and how stationary points produce precise asymptotic expansions.
## Phase Functions and Critical Geometry
The first problem is to separate the data in an oscillatory integral that affects singularities from the data that only changes lower-order amplitudes. An integral of the form
\begin{align*}
I(\lambda)=\int_{\mathbb R^N} e^{i\lambda \phi(y)}a(y)\,dy
\end{align*}
for large $\lambda$ is governed by the critical set of $\phi$, because away from that set repeated integration by parts gains powers of $\lambda^{-1}$. In microlocal applications the phase also depends on base variables, so we first record the class of phases whose oscillations carry a well-defined cotangent direction.
[definition: Phase Function]
Let $X$ be a smooth manifold and let $\Omega\subset X\times (\mathbb R^N\setminus 0)$ be an open conic subset in the fibre variable, meaning that $(x,\theta)\in\Omega$ and $t>0$ imply $(x,t\theta)\in\Omega$. A function $\phi:\Omega\to\mathbb R$ with $\phi\in C^\infty(\Omega)$ is a phase function if it is homogeneous of degree $1$ in the fibre variable,
\begin{align*}
\phi(x,t\theta)=t\phi(x,\theta)
\end{align*}
for all $(x,\theta)\in\Omega$ and $t>0$, and $d_{x,\theta}\phi$ is nonzero on $\Omega$.
[/definition]
The non-vanishing condition prevents the oscillatory factor from losing its geometric direction, while homogeneity makes the construction conic in frequency. To decide where the integral over $\theta$ can fail to be smoothing, we need the equations for stationarity only in the variables being integrated out.
[definition: Fibre Critical Set]
For a phase function $\phi\in C^\infty(\Omega)$, its fibre critical set is
\begin{align*}
C_\phi=\{(x,\theta)\in \Omega: \partial_{\theta_1}\phi(x,\theta)=\cdots=\partial_{\theta_N}\phi(x,\theta)=0\}.
\end{align*}
[/definition]
The fibre critical set is the place where the phase can contribute singularities in the base variable $x$. To use $C_\phi$ as a smooth parametrising object, the defining equations should cut it out with constant rank.
[definition: Nondegenerate Phase Function]
A phase function $\phi\in C^\infty(\Omega)$ is nondegenerate if the differentials
\begin{align*}
d_{x,\theta}(\partial_{\theta_1}\phi),\dots,d_{x,\theta}(\partial_{\theta_N}\phi)
\end{align*}
are linearly independent at every point of $C_\phi$.
[/definition]
This condition makes $C_\phi$ a smooth conic submanifold of codimension $N$, so it can be mapped into the cotangent bundle without carrying hidden singular directions. The next result is the geometric reason for using nondegenerate phases in Fourier integral operator theory: they parametrize conic Lagrangian submanifolds, which are the microlocal supports of the distributions to come.
[quotetheorem:8197]
[citeproof:8197]
Thus the phase is not only a formula inside an integral; it is a parametrisation of the geometric object along which singularities will live. The independence hypothesis in the definition is needed: if the equations $\partial_{\theta_i}\phi=0$ fail to have independent differentials, the fibre critical set may have a cusp or a change of dimension, and its image in $T^*X$ need not be a smooth Lagrangian. The theorem also does not say that every conic Lagrangian has a single global phase parametrisation; in practice the Lagrangian is covered by local phase charts. This local nature is why later invariance results must compare different phase functions that describe the same geometric object. A basic model already shows how the Hessian at a critical point controls the leading asymptotic coefficient.
[example: Nondegenerate Quadratic Phase]
Let $A$ be a real symmetric invertible $N\times N$ matrix and set $\phi(y)=\langle Ay,y\rangle/2$. Since $A=A^\top$,
\begin{align*}
\partial_{y_k}\phi(y)=\frac12\sum_{j=1}^N A_{kj}y_j+\frac12\sum_{i=1}^N A_{ik}y_i=(Ay)_k.
\end{align*}
Thus $\nabla\phi(y)=Ay$, so invertibility of $A$ gives the unique critical point $y=0$, and $D^2\phi(0)=A$.
Choose an orthogonal matrix $Q$ such that
\begin{align*}
Q A Q^\top=\operatorname{diag}(\epsilon_1\mu_1,\dots,\epsilon_N\mu_N),
\end{align*}
where $\mu_j>0$ and $\epsilon_j\in\{1,-1\}$. With $z=Qy$, the Jacobian is $dy=dz$, and
\begin{align*}
\langle Ay,y\rangle=\langle Q A Q^\top z,z\rangle=\sum_{j=1}^N \epsilon_j\mu_j z_j^2.
\end{align*}
Hence
\begin{align*}
I(\lambda)=\int_{\mathbb R^N} e^{i\lambda\sum_{j=1}^N\epsilon_j\mu_j z_j^2/2}a(Q^\top z)\,dz.
\end{align*}
For the leading term, replace $a(Q^\top z)$ by $a(0)$ at the critical point and use the scaling $x_j=\lambda^{1/2}\mu_j^{1/2}z_j$. Then
\begin{align*}
dz=\lambda^{-N/2}\left(\prod_{j=1}^N\mu_j^{-1/2}\right)dx=\lambda^{-N/2}|\det A|^{-1/2}\,dx.
\end{align*}
The one-dimensional Fresnel factors are
\begin{align*}
\int_{\mathbb R} e^{i\epsilon_j x_j^2/2}\,dx_j=e^{i\pi\epsilon_j/4}(2\pi)^{1/2},
\end{align*}
so their product is
\begin{align*}
\prod_{j=1}^N e^{i\pi\epsilon_j/4}(2\pi)^{1/2}=e^{i\pi(\epsilon_1+\cdots+\epsilon_N)/4}(2\pi)^{N/2}.
\end{align*}
Since $\epsilon_1+\cdots+\epsilon_N=\operatorname{sgn}(A)$, the leading term is
\begin{align*}
I(\lambda)\sim e^{i\pi\operatorname{sgn}(A)/4}(2\pi)^{N/2}\lambda^{-N/2}|\det A|^{-1/2}a(0).
\end{align*}
The lower-order coefficients come from writing the Taylor expansion of $a(Q^\top z)$ at $z=0$ and integrating each resulting monomial against the same diagonal quadratic oscillation; odd monomials vanish by symmetry, and each quadratic degree adds one further power of $\lambda^{-1}$.
[/example]
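The leading coefficient can be compared with an exact value in a special case. The sketch below is a hedged check rather than a general implementation. It assumes a diagonal matrix $A$ and the Gaussian amplitude $a(y)=e^{-|y|^2/2}$, which is Schwartz rather than compactly supported and is chosen only because the integral then factors into one-dimensional Gaussian integrals with a closed form. The function names are hypothetical.

```python
import cmath
import math

def exact_gaussian_integral(diag, lam):
    """Integral of exp(i*lam*<Ay,y>/2 - |y|^2/2) over R^N for A = diag(diag)."""
    value = 1.0 + 0.0j
    for d in diag:
        # int exp(-(1 - i*lam*d) y^2 / 2) dy = sqrt(2*pi / (1 - i*lam*d));
        # the principal branch is correct because Re(1 - i*lam*d) = 1 > 0.
        value *= cmath.sqrt(2.0 * math.pi / (1.0 - 1j * lam * d))
    return value

def leading_term(diag, lam):
    """e^{i*pi*sgn(A)/4} (2*pi/lam)^{N/2} |det A|^{-1/2} a(0), with a(0) = 1."""
    n = len(diag)
    signature = sum(1 if d > 0 else -1 for d in diag)
    abs_det = math.prod(abs(d) for d in diag)
    return (cmath.exp(1j * math.pi * signature / 4.0)
            * (2.0 * math.pi / lam) ** (n / 2.0) / math.sqrt(abs_det))

diag = (2.0, -3.0, 0.5)        # hypothetical eigenvalues: signature 1, |det A| = 3
for lam in (1e2, 1e3, 1e4):
    ratio = exact_gaussian_integral(diag, lam) / leading_term(diag, lam)
    print(lam, abs(ratio - 1.0))
```

The printed discrepancy shrinks roughly like $\lambda^{-1}$, which is the size of the first correction in the expansion.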
## Stationary Phase and Full Expansions
The second problem is quantitative: once the phase has a nondegenerate critical point, we want every coefficient in the large-parameter expansion, not only the leading power. Non-stationary phase says that regions without critical points contribute $O(\lambda^{-M})$ for every $M$, so the asymptotics are local near critical points.
[quotetheorem:7287]
[citeproof:7287]
The useful consequence is that all asymptotic coefficients are determined locally near the stationary set. The lower bound on $|\nabla\phi|$ on the support of the amplitude is essential: for $\phi(y)=y^2$ and an amplitude supported near $0$, each integration by parts divides by $\phi'(y)=2y$, so it cannot be carried out across the critical point. The theorem does not identify a leading term; it only says that the contribution of any region without stationary points is $O(\lambda^{-M})$ for every $M$. This is the localisation step that allows the isolated [stationary phase theorem](/theorems/8198) to focus entirely on a neighbourhood of each critical point. In the isolated nondegenerate case, the Morse lemma reduces the phase to its quadratic part, while the amplitude supplies a Taylor series.
[quotetheorem:8198]
[citeproof:8198]
This theorem is the main local calculation behind the symbolic calculus of Fourier integral operators. The uniqueness hypothesis is a localisation assumption rather than a global restriction: if several nondegenerate critical points lie in $\operatorname{supp}a$, the expansion is the sum of their separate contributions. The invertibility of the Hessian is the decisive analytic hypothesis; the Airy phase $y^3/3+ty$ at $t=0$ shows that a degenerate critical point can have the different decay rate $\lambda^{-1/3}$. The theorem also does not give a convergent series in general, but an asymptotic expansion whose finite truncations have controlled remainders. The next example records the standard one-critical-point computation that will reappear whenever a phase variable is eliminated.
[example: One Nondegenerate Critical Point]
Let $\phi(y)=y^2/2+y^3\psi(y)$ near $0$, with $\psi\in C^\infty(\mathbb R)$, and let $a\in C_c^\infty(\mathbb R)$ be supported where no critical point other than $0$ occurs. Since
\begin{align*}
\phi'(y)=y+3y^2\psi(y)+y^3\psi'(y)
\end{align*}
and
\begin{align*}
\phi''(0)=1,
\end{align*}
the point $0$ is a nondegenerate critical point.
Write
\begin{align*}
\phi(y)=\frac{y^2}{2}\bigl(1+2y\psi(y)\bigr).
\end{align*}
After shrinking the support of $a$ if necessary, $1+2y\psi(y)>0$. Define
\begin{align*}
z=y\bigl(1+2y\psi(y)\bigr)^{1/2}.
\end{align*}
Then $z(0)=0$ and $dz/dy|_{y=0}=1$, so this is a local change of variables with inverse $y=\kappa(z)$. Moreover,
\begin{align*}
\phi(y)=z^2/2.
\end{align*}
Thus
\begin{align*}
\int e^{i\lambda\phi(y)}a(y)\,dy=\int e^{i\lambda z^2/2}b(z)\,dz
\end{align*}
where
\begin{align*}
b(z)=a(\kappa(z))\kappa'(z).
\end{align*}
Expanding the change of variables $z(y)$ at $0$, with $p=\psi(0)$ and $q=\psi'(0)$, gives
\begin{align*}
z=y+py^2+\left(q-\frac{p^2}{2}\right)y^3+O(y^4).
\end{align*}
Solving $y=\kappa(z)$ term by term gives
\begin{align*}
\kappa(z)=z-pz^2+\left(\frac{5p^2}{2}-q\right)z^3+O(z^4).
\end{align*}
Therefore
\begin{align*}
\kappa'(z)=1-2pz+\left(\frac{15p^2}{2}-3q\right)z^2+O(z^3).
\end{align*}
Writing $a_j=a^{(j)}(0)$, the transformed amplitude satisfies
\begin{align*}
b''(0)=a_2-6pa_1+(15p^2-6q)a_0.
\end{align*}
For the quadratic phase $z^2/2$, the one-dimensional stationary phase expansion gives
\begin{align*}
\int e^{i\lambda z^2/2}b(z)\,dz\sim e^{i\pi/4}\left(\frac{2\pi}{\lambda}\right)^{1/2}\left(b(0)+\lambda^{-1}\frac{i}{2}b''(0)+\lambda^{-2}c_2+\cdots\right).
\end{align*}
Since $b(0)=a(0)$, this becomes
\begin{align*}
\int e^{i\lambda\phi(y)}a(y)\,dy\sim e^{i\pi/4}\left(\frac{2\pi}{\lambda}\right)^{1/2}\left(a(0)+\lambda^{-1}c_1+\lambda^{-2}c_2+\cdots\right),
\end{align*}
with
\begin{align*}
c_1=\frac{i}{2}\left(a''(0)-6\psi(0)a'(0)+\bigl(15\psi(0)^2-6\psi'(0)\bigr)a(0)\right).
\end{align*}
The later coefficients are obtained in the same way from higher Taylor coefficients of $b(z)=a(\kappa(z))\kappa'(z)$, so each $c_j$ depends only on finitely many derivatives of $a$ and $\psi$ at $0$.
[/example]
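The Taylor bookkeeping in the example above can be checked mechanically. The following is a minimal sketch in exact rational arithmetic; the helper names `mul`, `compose` and `check` are illustrative and not part of the course material, and the numerical values passed to `check` are arbitrary test data.

```python
from fractions import Fraction

ORDER = 4  # keep coefficients of degree 0..3; all series are modulo O(t^4)

def mul(f, g):
    """Product of two truncated power series given as coefficient lists."""
    out = [Fraction(0)] * ORDER
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < ORDER:
                out[i + j] += fi * gj
    return out

def compose(f, g):
    """f(g(t)) modulo O(t^ORDER); assumes g has zero constant term."""
    out = [Fraction(0)] * ORDER
    power = [Fraction(1)] + [Fraction(0)] * (ORDER - 1)  # g^0
    for fk in f:
        out = [o + fk * c for o, c in zip(out, power)]
        power = mul(power, g)
    return out

def check(p, q, a0, a1, a2):
    """Return (z(kappa(z)), b''(0) from series, b''(0) from the closed formula)."""
    p, q, a0, a1, a2 = map(Fraction, (p, q, a0, a1, a2))
    # z(y) = y + p y^2 + (q - p^2/2) y^3 + O(y^4)
    z_of_y = [Fraction(0), Fraction(1), p, q - p * p / 2]
    # kappa(z) = z - p z^2 + (5 p^2/2 - q) z^3 + O(z^4)
    kappa = [Fraction(0), Fraction(1), -p, Fraction(5, 2) * p * p - q]
    identity = compose(z_of_y, kappa)  # should be z modulo O(z^4)
    # kappa'(z) up to degree 2; the degree-3 slot is unknown and padded with 0,
    # which cannot affect the z^2 coefficient used below
    dkappa = [k * c for k, c in enumerate(kappa)][1:] + [Fraction(0)]
    # a(y) = a0 + a1 y + a2 y^2/2 + ...; higher terms do not enter b''(0)
    a = [a0, a1, a2 / 2, Fraction(0)]
    b = mul(compose(a, kappa), dkappa)
    b2_series = 2 * b[2]  # b''(0) = 2 * (coefficient of z^2)
    b2_formula = a2 - 6 * p * a1 + (15 * p * p - 6 * q) * a0
    return identity, b2_series, b2_formula
```

For instance, `check(2, 3, 5, 7, 11)` compares the two expressions for $b''(0)$ with $p=2$, $q=3$, $a_0=5$, $a_1=7$, $a_2=11$, and also confirms that $\kappa$ inverts $z(y)$ to third order.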
The nondegeneracy hypothesis is not a technical luxury. If the Hessian vanishes in a critical direction, the decay rate changes and the usual symbolic order bookkeeping no longer applies.
[example: Airy-Type Degeneration]
Consider
\begin{align*}
A(\lambda,t)=\int_{\mathbb R} e^{i\lambda(y^3/3+ty)}\chi(y)\,dy,
\end{align*}
where $\chi\in C_c^\infty(\mathbb R)$ equals $1$ on a neighbourhood of $0$. Write $\Phi_t(y)=y^3/3+ty$. Then
\begin{align*}
\Phi_t'(y)=y^2+t
\end{align*}
and
\begin{align*}
\Phi_t''(y)=2y.
\end{align*}
At $t=0$, the only critical point near $0$ is $y=0$, and it is degenerate because $\Phi_0'(0)=0$ and $\Phi_0''(0)=0$.
For $t=0$, the change of variables $y=\lambda^{-1/3}z$ gives $dy=\lambda^{-1/3}dz$ and
\begin{align*}
\lambda\Phi_0(y)=\lambda\frac{y^3}{3}=\lambda\frac{\lambda^{-1}z^3}{3}=\frac{z^3}{3}.
\end{align*}
Therefore
\begin{align*}
A(\lambda,0)=\lambda^{-1/3}\int_{\mathbb R} e^{iz^3/3}\chi(\lambda^{-1/3}z)\,dz.
\end{align*}
The natural prefactor is thus $\lambda^{-1/3}$, not the $\lambda^{-1/2}$ prefactor produced by a nondegenerate one-dimensional quadratic critical point.
For $t<0$, the critical equation is
\begin{align*}
y^2+t=0,
\end{align*}
so the two critical points are $y=\sqrt{-t}$ and $y=-\sqrt{-t}$. Their second derivatives are
\begin{align*}
\Phi_t''(\sqrt{-t})=2\sqrt{-t}
\end{align*}
and
\begin{align*}
\Phi_t''(-\sqrt{-t})=-2\sqrt{-t},
\end{align*}
which are both nonzero. For $t>0$, one has $y^2+t>0$ for every real $y$, so there are no real critical points. Thus the number and type of stationary points change at $t=0$, which is the caustic for this Airy-type family.
[/example]
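The rescaling used at $t=0$ extends to $t\ne 0$. The following is a sketch under two assumptions not fixed in the text above: the Airy function is normalised as $\operatorname{Ai}(s)=\frac{1}{2\pi}\int_{\mathbb R}e^{i(z^3/3+sz)}\,dz$, and $|t|$ is small enough that any real critical points lie where $\chi=1$. With $y=\lambda^{-1/3}z$,
\begin{align*}
\lambda\Phi_t(\lambda^{-1/3}z)=\frac{z^3}{3}+\lambda^{2/3}t\,z,
\end{align*}
so, discarding the nonstationary region where $\chi\ne 1$ (which should contribute only $O(\lambda^{-k})$ for every $k$),
\begin{align*}
A(\lambda,t)\approx 2\pi\lambda^{-1/3}\operatorname{Ai}(\lambda^{2/3}t),\qquad A(\lambda,0)\approx 2\pi\operatorname{Ai}(0)\,\lambda^{-1/3}=\frac{2\pi}{3^{2/3}\Gamma(2/3)}\,\lambda^{-1/3}.
\end{align*}
For fixed $t<0$, the oscillatory asymptotics $\operatorname{Ai}(-s)\sim\pi^{-1/2}s^{-1/4}\sin\bigl(\tfrac23 s^{3/2}+\tfrac\pi4\bigr)$ as $s\to+\infty$ give the size $\lambda^{-1/3}(\lambda^{2/3}|t|)^{-1/4}=|t|^{-1/4}\lambda^{-1/2}$, which recovers the nondegenerate rate. For fixed $t>0$, the exponential decay of $\operatorname{Ai}$ matches the absence of real critical points.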
## Clean Critical Sets and Excess
Many phases in Fourier integral operator theory have stationary sets that are manifolds rather than isolated points. The right replacement for nondegeneracy is that the stationary equations vanish cleanly along a critical manifold, so the Hessian is nondegenerate only in directions normal to that manifold.
[definition: Clean Critical Set]
Let $\phi\in C^\infty(U;\mathbb R)$ and let
\begin{align*}
C=\{y\in U:d\phi(y)=0\}.
\end{align*}
The critical set is clean if $C$ is a smooth submanifold of $U$ and, for every $y\in C$,
\begin{align*}
T_yC=\ker D^2\phi(y).
\end{align*}
[/definition]
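As a quick check of the definition, compare two one-variable phases, chosen here purely for illustration. For $\phi(y)=y^2/2$ and for $\phi(y)=y^3/3$ the critical set is $C=\{0\}$, a zero-dimensional manifold with $T_0C=\{0\}$, but
\begin{align*}
\phi(y)=\frac{y^2}{2}:\quad \ker D^2\phi(0)=\{0\}=T_0C,\qquad\qquad \phi(y)=\frac{y^3}{3}:\quad \ker D^2\phi(0)=\mathbb R\ne T_0C.
\end{align*}
So an isolated nondegenerate critical point is clean with $\dim C=0$, while the degenerate cubic critical point is not clean even though $C$ is a smooth manifold.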
Cleanliness says that the only Hessian degeneracy comes from moving along the critical manifold itself. Since those tangential variables remain after stationary phase has been applied in the normal directions, we need a number that records how many such variables survive.
[definition: Excess]
For a clean critical set $C\subset U\subset\mathbb R^N$, the excess is
\begin{align*}
e=\dim C.
\end{align*}
[/definition]
The word excess also appears for phase parametrisations of Lagrangians and canonical relations, where it measures the dimension of fibres left after imposing stationary equations. In the local stationary phase theorem below it is exactly the dimension of the clean critical manifold, and it shifts the power of $\lambda$.
[quotetheorem:8199]
[citeproof:8199]
This result is the mechanism behind the order shift in compositions of Fourier integral operators. The clean hypothesis rules out caustic behaviour: if the rank of the Hessian in normal directions jumps along the critical set, the power of $\lambda$ may change from point to point and a single symbolic expansion need not describe the integral. The componentwise statement is also necessary, since two connected components may have different critical values and therefore carry different oscillating factors $e^{i\lambda\phi_\ell}$. The theorem does not collapse the tangential variables; it leaves an integral over the clean critical manifold, which is precisely the source of the excess contribution. Each free critical parameter contributes a half-power loss relative to the isolated stationary case.
[example: Stationary Manifold]
Let $y=(u,v)\in\mathbb R^e\times\mathbb R^{N-e}$, write $m=N-e$, and set
\begin{align*}\phi(u,v)=|v|^2/2=\frac12\sum_{j=1}^{m}v_j^2.\end{align*}
For $1\le k\le e$ one has $\partial_{u_k}\phi(u,v)=0$, and for $1\le j\le m$ one has $\partial_{v_j}\phi(u,v)=v_j$. Hence $d\phi(u,v)=0$ exactly when $v=0$, so
\begin{align*}C=\{(u,0):u\in\mathbb R^e\}.\end{align*}
The Hessian in the $u$ directions is zero, the mixed second derivatives are zero, and
\begin{align*}\partial_{v_i}\partial_{v_j}\phi(u,v)=\delta_{ij}.\end{align*}
Thus $\ker D^2\phi(u,0)=\mathbb R^e\times\{0\}=T_{(u,0)}C$, and the normal Hessian in the $v$ variables is the identity on $\mathbb R^m$.
Apply the *Clean Stationary Phase Theorem* with excess $e=\dim C$. Since the normal Hessian is the identity, its determinant is $1$ and its signature is $m=N-e$. Therefore the leading normal stationary-phase factor is
\begin{align*}e^{i\pi m/4}(2\pi)^{m/2}\lambda^{-m/2}=e^{i\pi(N-e)/4}(2\pi)^{(N-e)/2}\lambda^{-(N-e)/2}.\end{align*}
The leading coefficient is obtained by restricting the amplitude to the critical manifold:
\begin{align*}A_0=\int_{\mathbb R^e}a(u,0)\,du.\end{align*}
Consequently,
\begin{align*}\int_{\mathbb R^e}\int_{\mathbb R^{N-e}} e^{i\lambda |v|^2/2}a(u,v)\,dv\,du = e^{i\pi(N-e)/4}(2\pi)^{(N-e)/2}\lambda^{-(N-e)/2}\int_{\mathbb R^e}a(u,0)\,du+O\bigl(\lambda^{-(N-e)/2-1}\bigr).\end{align*}
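More explicitly, because the phase is exactly quadratic in $v$, the later coefficients are obtained by Taylor expanding $a$ only in the normal variable. With the normal stationary-phase factor pulled out, the full expansion is $e^{i\pi m/4}(2\pi)^{m/2}\lambda^{-m/2}\sum_{j\ge 0}\lambda^{-j}A_j$, where, consistently with the formula for $A_0$,
\begin{align*}A_j=\frac{(i/2)^j}{j!}\int_{\mathbb R^e}(\Delta_v^j a)(u,0)\,du.\end{align*}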
Thus the $u$ variables are not integrated out by stationary phase; they remain as integration variables along the clean critical manifold, and only the $N-e$ normal variables produce the decay factor $\lambda^{-(N-e)/2}$.
[/example]
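The leading term and first correction in this example can be tested against a closed form. The sketch below assumes the Gaussian amplitude $a(u,v)=e^{-(|u|^2+|v|^2)/2}$, which is Schwartz rather than compactly supported and is chosen only because the integral is then explicit; the function names are illustrative.

```python
import cmath
import math

def exact(lam, e, m):
    """Closed form of the integral for a(u, v) = exp(-(|u|^2 + |v|^2) / 2).

    The u-integral is (2*pi)^(e/2); each v_j contributes the one-dimensional
    Gaussian integral sqrt(2*pi / (1 - i*lam)), principal square root.
    """
    one_dim = cmath.sqrt(2 * math.pi / (1 - 1j * lam))
    return (2 * math.pi) ** (e / 2) * one_dim ** m

def two_term(lam, e, m):
    """Leading term plus first correction from stationary phase in v only."""
    prefactor = cmath.exp(1j * math.pi * m / 4) * (2 * math.pi / lam) ** (m / 2)
    int_a = (2 * math.pi) ** (e / 2)            # integral of a(u, 0) du
    int_lap_a = -m * (2 * math.pi) ** (e / 2)   # integral of (Delta_v a)(u, 0) du
    return prefactor * (int_a + (1j / (2 * lam)) * int_lap_a)

def relative_error(lam, e, m):
    """Relative size of the remainder after two terms."""
    return abs(exact(lam, e, m) - two_term(lam, e, m)) / abs(exact(lam, e, m))
```

Doubling $\lambda$ should divide `relative_error` by roughly four, reflecting a remainder of relative size $\lambda^{-2}$, while the $e$ tangential variables only contribute the constant factor $(2\pi)^{e/2}$ and no decay.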
## Oscillatory Integrals as Distributions
The final problem is to turn oscillatory formulas into distributions and to read their wave front sets from the phase. This is where the chapter connects directly to the later definition of Fourier integral distributions associated to Lagrangian submanifolds.
[definition: Oscillatory Integral Distribution]
Let $X$ be a smooth manifold, let $\Omega\subset X\times\mathbb R^N_{0}$ be open and conic in the fibre variable, let $\phi$ be a phase function on $\Omega$, and let $a\in S^m(\Omega)$ be a classical symbol such that, after multiplication by any $\psi\in C_c^\infty(X)$, the product $a(x,\theta)\psi(x)$ has compact $x$-support in a single local coordinate representative. Choose $\chi\in C_c^\infty(\mathbb R^N)$ with $\chi=1$ on a neighbourhood of $0$. The oscillatory integral associated to $(\phi,a)$ is the distribution $u\in\mathcal D'(X)$ whose local action on $\psi\in C_c^\infty(X)$ is
\begin{align*}
u(\psi)=\lim_{\varepsilon\to 0^+}\int_X\int_{\Omega_x} e^{i\phi(x,\theta)}\chi(\varepsilon\theta)a(x,\theta)\psi(x)\,d\theta\,dx,
\end{align*}
where $\Omega_x=\{\theta\in\mathbb R^N_{0}:(x,\theta)\in\Omega\}$ and the limit is independent of the cutoff $\chi$.
[/definition]
Equivalently, $u$ is the continuous linear map $u:C_c^\infty(X)\to\mathbb C$ determined by this regularised formula in local coordinates. The cutoff limit is the standard oscillatory regularisation: large fibre variables are first truncated, non-stationary regions are controlled by integration by parts, and the resulting functional is then passed to the limit in $\mathcal D'(X)$. The compact-support condition is local in the base variable because testing against $\psi$ is what makes the $x$-integration a compactly supported coordinate calculation. The definition is arranged so that non-stationary phase supplies convergence after regularisation. Singularities can occur only where the fibre equations are stationary and the remaining $x$-covector is nonzero.
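A minimal worked instance of the regularisation, assuming the Fourier transform convention $\hat\psi(\theta)=\int e^{-ix\theta}\psi(x)\,dx$: take $X=\mathbb R$, $N=1$, $\phi(x,\theta)=x\theta$ and $a=1$. Then
\begin{align*}
u(\psi)=\lim_{\varepsilon\to 0^+}\int_{\mathbb R}\chi(\varepsilon\theta)\left(\int_{\mathbb R}e^{ix\theta}\psi(x)\,dx\right)d\theta=\lim_{\varepsilon\to 0^+}\int_{\mathbb R}\chi(\varepsilon\theta)\hat\psi(-\theta)\,d\theta=\int_{\mathbb R}\hat\psi(-\theta)\,d\theta=2\pi\psi(0),
\end{align*}
by dominated convergence, since $\hat\psi$ is rapidly decreasing. Thus $u=2\pi\delta_0$, independently of $\chi$. The fibre equation $\partial_\theta\phi=x=0$ is stationary only over $x=0$, where $d_x\phi=\theta\ne 0$, in agreement with $\operatorname{WF}(\delta_0)=\{(0,\xi):\xi\ne 0\}$.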
[quotetheorem:8200]
[citeproof:8200]
The estimate explains why oscillatory integrals are microlocal objects: changing the amplitude away from the stationary set changes the distribution by a smooth term. The inclusion can be strict, for example if the symbol vanishes to high order on part of $C_\phi$ or if leading contributions from different stationary points cancel. The nonzero covector condition matters because a stationary point with $d_x\phi=0$ may affect the smooth density but does not create a wave front direction. The theorem also gives only an upper bound on singular support in phase space, not a complete principal symbol calculation. To make the definition independent of choices, we also need to know that changing from one phase parametrisation of the same Lagrangian to another only transforms the symbol and does not create a new class of distributions.
Without such an invariance result, the same Lagrangian could lead to apparently different spaces of distributions depending on arbitrary auxiliary variables. For instance, adding a nondegenerate quadratic variable to a phase changes the written integral and shifts the raw amplitude order unless stationary phase is used to absorb the extra variable correctly. A nonlinear fibre change of variables can also insert a Jacobian and a Maslov-type signature factor, so equality of the geometric image alone is not enough unless the symbol transformation is controlled.
[quotetheorem:8201]
[citeproof:8201]
This theorem is what turns the previous local constructions into an intrinsic definition of Fourier integral distributions. The nondegeneracy and embeddedness assumptions keep the comparison inside the standard phase calculus. If a parametrisation has excess, there are leftover fibre-critical variables, so the order shift from clean stationary phase must be recorded before a symbol can be compared with a non-excess parametrisation. If the parametrising map has a self-intersection, two distinct critical points can represent the same point of $\Lambda$, so a single local symbol may split into several branches rather than transform by one local isomorphism. The result is modulo smooth terms, so it preserves microlocal singularities and principal data but does not claim equality of the written oscillatory integrals as ordinary functions. It also explains why half-densities enter naturally: the change of phase must transport both amplitudes and the measures used in the fibre variables. The most important examples come from submanifolds. Surface measure already produces a conormal distribution, and its Fourier transform is governed by stationary points of the height function.
[example: Fourier Transform of Surface Measure]
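Let $S\subset\mathbb R^n$ be a compact smooth hypersurface with nowhere vanishing Gaussian curvature, and let $d\sigma$ be its surface measure. Write $\rho=|\xi|$ and $\omega=\xi/\rho\in \mathbb S^{n-1}$, so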
\begin{align*}
\widehat{d\sigma}(\rho\omega)=\frac{1}{(2\pi)^{n/2}}\int_S e^{-i\rho x\cdot\omega}\,d\sigma(x).
\end{align*}
Near a point $p\in S$, choose local coordinates $u=(u_1,\dots,u_{n-1})$ and write the parametrisation as $x=x(u)$. In this chart the phase is
\begin{align*}
\Phi_\omega(u)=-x(u)\cdot\omega.
\end{align*}
For each coordinate direction,
\begin{align*}
\partial_{u_j}\Phi_\omega(u)=-\partial_{u_j}x(u)\cdot\omega.
\end{align*}
Thus $u$ is a critical point exactly when $\omega$ is orthogonal to every tangent vector $\partial_{u_j}x(u)$, which means that $\omega$ is parallel to a unit normal at $x(u)$.
At such a point choose the coordinates so that $x(0)=p$, $\partial_{u_1}x(0),\dots,\partial_{u_{n-1}}x(0)$ are an [orthonormal basis](/page/Orthonormal%20Basis) of $T_pS$, and the unit normal is $\nu(p)$. If $\omega=\varepsilon\nu(p)$ with $\varepsilon\in\{1,-1\}$, then
\begin{align*}
\partial_{u_i}\partial_{u_j}\Phi_\omega(0)=-\partial_{u_i}\partial_{u_j}x(0)\cdot\omega.
\end{align*}
Since $\omega=\varepsilon\nu(p)$, this becomes
\begin{align*}
\partial_{u_i}\partial_{u_j}\Phi_\omega(0)=-\varepsilon\,\partial_{u_i}\partial_{u_j}x(0)\cdot\nu(p).
\end{align*}
The matrix $\bigl(\partial_{u_i}\partial_{u_j}x(0)\cdot\nu(p)\bigr)$ is the second fundamental form in these coordinates, and its determinant is the Gaussian curvature up to the chosen orientation convention. Because the Gaussian curvature is nonzero, the Hessian $D^2\Phi_\omega(0)$ is invertible, so each stationary point is nondegenerate.
Away from these stationary points, $\nabla_u\Phi_\omega\ne 0$, so the contribution is rapidly decreasing by *Nonstationary Phase*. At each stationary point $p$ with normal parallel to $\omega$, the *Stationary Phase Theorem* in dimension $n-1$ gives a leading term
\begin{align*}
\frac{1}{(2\pi)^{n/2}}e^{-i\rho p\cdot\omega}e^{i\pi\operatorname{sgn}(D^2\Phi_{\omega,p})/4}\left(\frac{2\pi}{\rho}\right)^{(n-1)/2}\left|\det D^2\Phi_{\omega,p}\right|^{-1/2}.
\end{align*}
Summing over the finitely many points where the normal direction is $\omega$ or $-\omega$, one obtains
\begin{align*}
\widehat{d\sigma}(\rho\omega)=O\left(\rho^{-(n-1)/2}\right).
\end{align*}
Thus the decay rate is governed by the $n-1$ tangent variables on the hypersurface, and nonvanishing Gaussian curvature is precisely what makes the stationary phase calculation nondegenerate at the normal points.
[/example]
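For the unit sphere in $\mathbb R^3$ the stationary phase prediction can be compared with a direct computation. The sketch below assumes $S=\mathbb S^2$ and the normalisation $(2\pi)^{-n/2}$ used above; the function names are illustrative. The two stationary points are $p=\pm\omega$, with Hessians $\pm I$ in orthonormal tangent coordinates, so $|\det|=1$ and the signatures are $\pm 2$.

```python
import cmath
import math

def sphere_transform_quadrature(rho, steps=20000):
    """(2*pi)^(-3/2) times the integral of exp(-i*rho*x.omega) over the unit sphere.

    In terms of the height s = x.omega the surface measure is 2*pi*ds on
    [-1, 1]; the s-integral is approximated by the midpoint rule.
    """
    h = 2.0 / steps
    total = 0j
    for k in range(steps):
        s = -1.0 + (k + 0.5) * h
        total += cmath.exp(-1j * rho * s)
    return (2 * math.pi) ** -1.5 * 2 * math.pi * h * total

def sphere_transform_stationary(rho):
    """Sum of the leading stationary-phase terms at p = omega and p = -omega."""
    norm = (2 * math.pi) ** -1.5
    scale = 2 * math.pi / rho  # (2*pi/rho)^((n-1)/2) with n = 3, |det Hessian| = 1
    # p = +omega: Hessian +I, signature 2, critical value -rho
    plus = cmath.exp(-1j * rho) * cmath.exp(1j * math.pi * 2 / 4) * scale
    # p = -omega: Hessian -I, signature -2, critical value +rho
    minus = cmath.exp(1j * rho) * cmath.exp(-1j * math.pi * 2 / 4) * scale
    return norm * (plus + minus)

def sphere_transform_closed_form(rho):
    """Classical closed form (2*pi)^(-3/2) * 4*pi*sin(rho)/rho."""
    return (2 * math.pi) ** -1.5 * 4 * math.pi * math.sin(rho) / rho
```

In this special case the two leading terms already reproduce the closed form $4\pi\sin\rho/\rho$ (times the normalising constant), which makes the $\rho^{-(n-1)/2}=\rho^{-1}$ decay visible. For a general hypersurface one should expect lower-order corrections.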
This example foreshadows the general conormal construction. In later chapters, parametrices for hyperbolic equations will be built from oscillatory integrals whose phases solve eikonal equations and whose amplitudes solve transport equations; the present chapter supplies the stationary phase estimates and wave front control needed to make those constructions invariant.
The stationary phase calculus shows how oscillatory kernels concentrate on critical geometry, but it is the Lagrangian viewpoint that packages this geometry invariantly. The next chapter recasts those phase functions as intrinsic objects in cotangent space and uses them to define Lagrangian distributions.
# 8. Lagrangian Distributions and Canonical Geometry
Using the phase and stationary-phase tools of Chapter 7, this chapter turns the geometry behind Fourier integral operators into an intrinsic language. It assumes the earlier material on distributions, wave front sets, homogeneous phase functions, stationary phase, and the symbol calculus for pseudodifferential operators. The course goal is to replace coordinate-dependent oscillatory formulae by geometric objects on cotangent bundles: conic Lagrangians, their principal symbols, and the normal forms that make Fourier integral operators calculable. The central question is how an oscillatory integral knows a Lagrangian, how its leading coefficient becomes a principal symbol, and why conormal distributions are the first examples rather than a separate theory.
## Lagrangian Submanifolds in the Punctured Cotangent Bundle
The first problem is to identify the geometric support of high-frequency oscillation. In the pseudodifferential calculus the relevant set is often the diagonal in $T^*X \times T^*X$, but a general Fourier integral operator replaces the diagonal by a canonical relation. For distributions on a single manifold $X$, this leads to Lagrangian submanifolds of $T^*X \setminus 0$.
[definition: Canonical One-Form]
Let $X$ be a smooth $n$-manifold and let $\pi:T^*X \to X$ be the projection. The canonical one-form is the smooth section
\begin{align*}
\alpha \in \Gamma(T^*(T^*X)),
\end{align*}
defined by
\begin{align*}
\alpha_{(x,\xi)}(V) = \xi(d\pi_{(x,\xi)}V),
\end{align*}
where $(x,\xi) \in T^*X$ and $V \in T_{(x,\xi)}T^*X$.
[/definition]
The same pointwise formula also views $\alpha$ as the smooth fibrewise-linear map $\alpha:T(T^*X)\to \mathbb R$. This intrinsic formulation is what makes the later Lagrangian condition independent of a coordinate chart.
The [exterior derivative](/theorems/1525) of the canonical one-form gives the symplectic structure used throughout the chapter. In local coordinates $(x_1,\dots,x_n,\xi_1,\dots,\xi_n)$,
\begin{align*}
\alpha &= \sum_i \xi_i\,dx_i, & \omega &= d\alpha = \sum_i d\xi_i \wedge dx_i.
\end{align*}
This motivates the next definition, which singles out the conic submanifolds on which this symplectic form vanishes in the largest possible dimension.
[definition: Conic Lagrangian Submanifold]
Let $\Lambda \subset T^*X \setminus 0$ be a smooth submanifold. It is a conic Lagrangian submanifold if $\dim \Lambda = n$, $\omega|_\Lambda = 0$, and $(x,\xi) \in \Lambda$ implies $(x,t\xi) \in \Lambda$ for every $t>0$.
[/definition]
The conic condition records that microlocal singularities live in directions rather than in covectors with a preferred length. Without it, positive-frequency rescaling of an oscillatory integral would leave the proposed geometric support, so the object would not be stable under the homogeneous calculus. The Lagrangian condition says that $\Lambda$ is maximally compatible with the symplectic geometry; an isotropic submanifold of smaller dimension has too few parameters to carry the full stationary-phase leading data for distributions on $X$. A first test case is the graph of a differential, where the symplectic cancellation can be checked directly.
[example: Graph of an Exact One-Form]
Let $f \in C^\infty(X)$ and suppose $df_x \ne 0$ on an open set $U \subset X$. Define $i_f:U\to T^*U$ by $i_f(x)=(x,df_x)$, so
\begin{align*}
\Lambda_f=i_f(U)=\{(x,df_x):x\in U\}.
\end{align*}
Since $\pi\circ i_f=\operatorname{id}_U$, the map $i_f$ is an embedding, and therefore $\dim \Lambda_f=\dim U=n$.
We compute the restriction of the canonical one-form by pulling it back along $i_f$. For $v\in T_xU$,
\begin{align*}
(i_f^*\alpha)_x(v)=\alpha_{(x,df_x)}(di_f(v)).
\end{align*}
By the definition of $\alpha$,
\begin{align*}
\alpha_{(x,df_x)}(di_f(v))=df_x(d\pi_{(x,df_x)}di_f(v)).
\end{align*}
Because $\pi\circ i_f=\operatorname{id}_U$, differentiating gives $d\pi_{(x,df_x)}di_f(v)=v$, hence
\begin{align*}
(i_f^*\alpha)_x(v)=df_x(v).
\end{align*}
Thus $i_f^*\alpha=df$. Pulling back $\omega=d\alpha$ gives
\begin{align*}
i_f^*\omega=i_f^*(d\alpha)=d(i_f^*\alpha)=d(df).
\end{align*}
In local coordinates, if $df=\sum_i \partial_{x_i}f\,dx_i$, then
\begin{align*}
d(df)=\sum_{i,j}\partial_{x_j}\partial_{x_i}f\,dx_j\wedge dx_i=0,
\end{align*}
because the terms with indices $(i,j)$ and $(j,i)$ cancel by equality of mixed partial derivatives, while $dx_i\wedge dx_i=0$. Therefore $\omega|_{\Lambda_f}=0$, and $\Lambda_f$ is Lagrangian.
This is the local model for a phase with no auxiliary stationary variables: the covector is obtained by differentiating the phase in the base variables. The graph is never conic: it contains exactly one covector over each $x\in U$, and $df_x\ne 0$, so $(x,t\,df_x)\notin \Lambda_f$ for $t\ne 1$. Conic Lagrangians arise after introducing homogeneous fibre variables or imposing a level-set condition.
[/example]
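A concrete instance, chosen here for illustration: take $X=\mathbb R^n$, $U=\mathbb R^n\setminus\{0\}$ and $f(x)=|x|^2/2$. Then $df_x=\sum_i x_i\,dx_i\ne 0$ on $U$, and
\begin{align*}
\Lambda_f=\{(x,\xi)\in T^*U:\xi=x\},\qquad \omega|_{\Lambda_f}=\sum_i d\xi_i\wedge dx_i\Big|_{\xi=x}=\sum_i dx_i\wedge dx_i=0.
\end{align*}
The point $(x,tx)$ belongs to $\Lambda_f$ only for $t=1$, so this Lagrangian graph is not invariant under positive fibre dilations.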
This example shows why exact one-forms are basic generating data. General Lagrangians need not be global graphs, so the next issue is how to replace the single function $f$ by a phase with auxiliary variables.
## Homogeneous Lagrangians and Phase Parametrization
The local problem is the inverse problem for oscillatory integrals: given a conic Lagrangian $\Lambda$, find a phase function whose stationary set maps onto it. The auxiliary variables allow the projection $\Lambda \to X$ to have folds and other singularities while keeping the parametrization smooth upstairs.
[definition: Homogeneous Phase Function]
Let $U \subset X$ be open and let $\Theta \subset \mathbb R^N_0$ be a conic open set. A smooth function $\phi:U \times \Theta \to \mathbb R$ is a homogeneous phase function if $\phi(x,t\theta)=t\phi(x,\theta)$ for every $t>0$, and $d_{x,\theta}\phi$ does not vanish on the critical set
\begin{align*}
C_\phi = \{(x,\theta) \in U \times \Theta : \partial_\theta \phi(x,\theta)=0\}.
\end{align*}
[/definition]
The non-vanishing condition prevents the phase from degenerating into a smooth amplitude contribution. To parametrize a smooth Lagrangian, the critical equations must also cut out a smooth manifold of the expected dimension. This motivates the next definition, which is the rank condition needed for the stationary set to carry the desired geometry.
[definition: Nondegenerate Phase Function]
A homogeneous phase function $\phi:U \times \Theta \to \mathbb R$ is nondegenerate if the differentials $d(\partial_{\theta_1}\phi),\dots,d(\partial_{\theta_N}\phi)$ are linearly independent at every point of $C_\phi$.
[/definition]
For a nondegenerate phase, $C_\phi$ has dimension $n$ and carries the geometry of the stationary set. The stationary variables themselves are auxiliary, however; wave front sets and Fourier integral kernels live in $T^*U$ and only see the covector obtained by differentiating the phase in the base variables.
This creates a quotient problem: different parameter values may describe the same cotangent direction, while the oscillatory integral only records the resulting covector over $x$. The critical set by itself is therefore not the geometric object that controls singularities; it must be pushed into the punctured cotangent bundle by the map $(x,\theta)\mapsto (x,\partial_x\phi(x,\theta))$. The definition below fixes this image and the associated map, so that a phase function can be compared with the Lagrangian geometry it is meant to parametrize.
[definition: Lagrangian Parametrized by a Phase]
Let $\phi:U \times \Theta \to \mathbb R$ be a nondegenerate homogeneous phase function. The Lagrangian parametrized by $\phi$ is
\begin{align*}
\Lambda_\phi = \{(x,\partial_x\phi(x,\theta)) : (x,\theta) \in C_\phi\} \subset T^*U \setminus 0.
\end{align*}
The associated map is $j_\phi:C_\phi\to T^*U\setminus 0$, $j_\phi(x,\theta)=(x,\partial_x\phi(x,\theta))$.
[/definition]
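The simplest case to keep in mind, written out as a sketch, is the phase $\phi(x,\theta)=x\cdot\theta$ on $\mathbb R^n\times\mathbb R^n_0$, so $N=n$. Here
\begin{align*}
\partial_{\theta_j}\phi=x_j,\qquad C_\phi=\{0\}\times\mathbb R^n_0,\qquad d(\partial_{\theta_j}\phi)=dx_j,\qquad \partial_x\phi=\theta.
\end{align*}
The differentials $dx_1,\dots,dx_n$ are linearly independent, so $\phi$ is nondegenerate and $\dim C_\phi=n$. The map $j_\phi(0,\theta)=(0,\theta)$ is injective, and
\begin{align*}
\Lambda_\phi=\{(0,\theta):\theta\ne 0\}=T_0^*\mathbb R^n\setminus 0,
\end{align*}
which is conic, has dimension $n$, and satisfies $\omega|_{\Lambda_\phi}=0$ because every $dx_i$ vanishes on it.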
The nondegeneracy condition makes $j_\phi$ an immersed Lagrangian parametrization. When $j_\phi$ is injective after restricting to a neighbourhood, $\Lambda_\phi$ is regarded there as an embedded Lagrangian submanifold. This construction may have several parameter values mapping to the same covector before one restricts the parameter set. The central local question is whether every conic Lagrangian has such an embedded local chart after passing to a small enough neighbourhood. This motivates the following theorem, which is the local bridge from canonical geometry to oscillatory integral formulae.
[quotetheorem:8202]
[citeproof:8202]
The hypotheses are doing real work. If the submanifold is isotropic but has dimension smaller than $n$, then it cannot carry the full leading data of a distribution on $X$ in this calculus, because the stationary set has the wrong dimension after solving the critical equations. If the rank condition behind the chosen coordinates fails, the proposed stationary equations may stop cutting out a smooth critical set of the expected dimension; this is the local failure mode that nondegenerate phases are designed to avoid. If the conic condition is dropped, a phase homogeneous of degree one in the auxiliary variables cannot parametrize the object without changing the calculus, since positive fibre rescaling would leave the candidate Lagrangian. The theorem also does not produce a single preferred global phase: a global Lagrangian may require several phase functions, and different charts may be related by changes of variables and stabilization. These transition phenomena are exactly where Maslov factors enter the symbol theory. Before discussing symbols, it is useful to keep the conormal model visible.
[example: Conormal Bundle of a Hypersurface]
Let $Y \subset X$ be a smooth hypersurface, and work in an open set $U$ where $Y \cap U=\{x\in U:h(x)=0\}$ with $dh_x\ne 0$ on $Y \cap U$. Consider the homogeneous phase
\begin{align*}
\phi(x,\theta)=\theta h(x)
\end{align*}
on $U\times \mathbb R_0$. Its $\theta$-derivative is
\begin{align*}
\partial_\theta\phi(x,\theta)=h(x),
\end{align*}
so the critical set is
\begin{align*}
C_\phi=\{(x,\theta)\in U\times \mathbb R_0:h(x)=0\}=(Y\cap U)\times \mathbb R_0.
\end{align*}
For $(x,\theta)\in C_\phi$, the $x$-differential is computed by applying $d_x$ to the product $\theta h(x)$, with $\theta$ fixed:
\begin{align*}
\partial_x\phi(x,\theta)=\theta\,dh_x.
\end{align*}
Hence the Lagrangian parametrized by the phase is
\begin{align*}
\Lambda_\phi=\{(x,\theta\,dh_x):x\in Y\cap U,\ \theta\ne 0\}.
\end{align*}
Since $Y$ is a hypersurface and $dh_x\ne 0$, the annihilator of $T_xY=\ker dh_x$ is the one-dimensional span of $dh_x$. Thus every nonzero conormal covector at $x\in Y\cap U$ has the form $\theta\,dh_x$ with $\theta\ne 0$, and therefore
\begin{align*}
\Lambda_\phi=N^*Y\setminus 0
\end{align*}
over $U$.
The conic property is visible from the formula: if $(x,\theta\,dh_x)\in \Lambda_\phi$ and $t>0$, then
\begin{align*}
t(\theta\,dh_x)=(t\theta)\,dh_x,
\end{align*}
with $t\theta\ne 0$, so $(x,t\theta\,dh_x)\in \Lambda_\phi$. Thus the conormal bundle of a hypersurface is produced by the homogeneous phase $\theta h(x)$, and the single auxiliary variable $\theta$ records the nonzero normal frequency.
[/example]
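The same computation can be sketched in codimension $k$, under the assumption (not made in the example above) that $Y\cap U=\{x\in U:h_1(x)=\dots=h_k(x)=0\}$ with $dh_1,\dots,dh_k$ linearly independent on $Y\cap U$. Take $\phi(x,\theta)=\sum_{j=1}^k\theta_jh_j(x)$ on $U\times\mathbb R^k_0$. Then
\begin{align*}
\partial_{\theta_j}\phi=h_j(x),\qquad C_\phi=(Y\cap U)\times\mathbb R^k_0,\qquad \dim C_\phi=(n-k)+k=n.
\end{align*}
The differentials $d(\partial_{\theta_j}\phi)=dh_j$ are linearly independent on $C_\phi$, so the phase is nondegenerate, and
\begin{align*}
\partial_x\phi(x,\theta)=\sum_{j=1}^k\theta_j\,dh_j(x)\ne 0\quad\text{for }\theta\ne 0,\qquad \Lambda_\phi=\Bigl\{\Bigl(x,\sum_{j=1}^k\theta_j\,dh_j(x)\Bigr):x\in Y\cap U,\ \theta\ne 0\Bigr\}=N^*Y\setminus 0
\end{align*}
over $U$. For $k=1$ this reduces to the hypersurface phase $\theta h(x)$.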
This model explains why hypersurface singularities fit into the same formalism as general Fourier integral distributions. The phase does not describe propagation yet; for that, the Lagrangian itself moves under a Hamiltonian flow.
[example: Lagrangian Generated by Geodesic Flow]
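Let $p\in C^\infty(T^*X\setminus 0)$ be real and positively homogeneous of degree one in $\xi$, for instance $p(x,\xi)=|\xi|_g$ for a Riemannian metric $g$ on $X$, and let $\Lambda_0\subset T^*X\setminus 0$ be a conic Lagrangian submanifold. Let $F_t=\exp(tH_p)$ denote the Hamiltonian flow on the time interval where it is defined, and set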
\begin{align*}
\Lambda_t=F_t(\Lambda_0).
\end{align*}
We show that $\Lambda_t$ is again a conic Lagrangian submanifold of $T^*X\setminus 0$.
Since $F_t$ is a diffeomorphism onto its image, $\dim \Lambda_t=\dim \Lambda_0=n$. To check the Lagrangian condition, use the Hamiltonian identity $\iota_{H_p}\omega=dp$. Because $d\omega=0$ and $d(dp)=0$, Cartan's formula gives
\begin{align*}
\mathcal L_{H_p}\omega=d(\iota_{H_p}\omega)+\iota_{H_p}d\omega=d(dp)+0=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}F_t^*\omega=F_t^*(\mathcal L_{H_p}\omega)=0.
\end{align*}
At $t=0$, $F_0=\operatorname{id}$, so $F_0^*\omega=\omega$, and hence
\begin{align*}
F_t^*\omega=\omega.
\end{align*}
If $\lambda\in\Lambda_0$ and $u,v\in T_\lambda\Lambda_0$, then
\begin{align*}
\omega_{F_t(\lambda)}(dF_tu,dF_tv)=(F_t^*\omega)_\lambda(u,v)=\omega_\lambda(u,v)=0,
\end{align*}
because $\omega|_{\Lambda_0}=0$. Thus $\omega|_{\Lambda_t}=0$.
It remains to check conicity. Let $\rho_s:T^*X\setminus 0\to T^*X\setminus 0$ be positive fibre dilation, $\rho_s(x,\xi)=(x,s\xi)$ for $s>0$. Since $p(x,s\xi)=s p(x,\xi)$, we have $\rho_s^*p=s p$ and hence
\begin{align*}
\rho_s^*(dp)=d(\rho_s^*p)=d(sp)=s\,dp.
\end{align*}
The canonical one-form satisfies $\rho_s^*\alpha=s\alpha$, so
\begin{align*}
\rho_s^*\omega=\rho_s^*(d\alpha)=d(\rho_s^*\alpha)=d(s\alpha)=s\omega.
\end{align*}
For the pushed-forward vector field, pulling back its defining contraction gives
\begin{align*}
\rho_s^*(\iota_{(\rho_s)_*H_p}\omega)=\iota_{H_p}(\rho_s^*\omega)=\iota_{H_p}(s\omega)=s\,dp=\rho_s^*(dp).
\end{align*}
Since $\rho_s$ is a diffeomorphism, this implies $\iota_{(\rho_s)_*H_p}\omega=dp$, so $(\rho_s)_*H_p=H_p$. Therefore the flow commutes with dilation:
\begin{align*}
F_t(\rho_s\lambda)=\rho_s(F_t\lambda).
\end{align*}
If $\mu=F_t(\lambda)\in\Lambda_t$ and $s>0$, then $\rho_s\lambda\in\Lambda_0$ because $\Lambda_0$ is conic, and
\begin{align*}
\rho_s\mu=\rho_s(F_t\lambda)=F_t(\rho_s\lambda)\in\Lambda_t.
\end{align*}
Hence $\Lambda_t$ is conic. The geodesic Hamiltonian flow therefore transports the whole conic Lagrangian geometry of $\Lambda_0$, which is the microlocal mechanism behind propagation of singularities for wave-type equations.
[/example]
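A flat model makes the transported Lagrangian explicit. The following sketch assumes $X=\mathbb R^n$, $p(x,\xi)=|\xi|$, and the sign convention $\dot x=\partial_\xi p$, $\dot\xi=-\partial_x p$ for the flow (the opposite convention replaces $t$ by $-t$). Then
\begin{align*}
F_t(x,\xi)=\left(x+t\frac{\xi}{|\xi|},\,\xi\right).
\end{align*}
Starting from $\Lambda_0=T_0^*\mathbb R^n\setminus 0=\{(0,\xi):\xi\ne 0\}$, one finds for $t>0$
\begin{align*}
\Lambda_t=\left\{\left(t\frac{\xi}{|\xi|},\,\xi\right):\xi\ne 0\right\}=\{(x,sx):|x|=t,\ s>0\},
\end{align*}
which is the outward half of the conormal bundle of the sphere $\{|x|=t\}$. It is visibly conic, and on $\Lambda_t$ one has $\alpha=\xi\cdot dx=t|\xi|\,\hat\xi\cdot d\hat\xi=0$ with $\hat\xi=\xi/|\xi|$, because $\hat\xi\cdot d\hat\xi=\tfrac12 d|\hat\xi|^2=0$. Hence $\omega|_{\Lambda_t}=d(\alpha|_{\Lambda_t})=0$, as the general argument predicts.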
## Lagrangian Distributions and Principal Symbols
The analytic problem is to attach distributions to a Lagrangian in a way that is independent of the chosen phase. Locally, the definition is made by oscillatory integrals; globally, compatibility under changes of phase forces the principal symbol to live in a bundle involving half-densities and Maslov data.
[definition: Local Lagrangian Distribution]
Let $U \subset X$ be open, let $\dim X=n$, let $\Theta \subset \mathbb R^N_0$ be conic open, and let $\phi:U \times \Theta \to \mathbb R$ be a nondegenerate homogeneous phase function parametrizing $\Lambda \subset T^*U\setminus 0$. A distribution $u \in \mathcal D'(U)$ is locally a Lagrangian distribution of order $m$ associated with $\Lambda$ if, modulo $C^\infty(U)$, it can be written as an oscillatory integral
\begin{align*}
u(x)=\int_{\Theta} e^{i\phi(x,\theta)}a(x,\theta)\,d\theta,
\end{align*}
where $a:U\times \Theta \to \mathbb C$ is a classical symbol in the class
\begin{align*}
a \in S_{\mathrm{cl}}^{m+n/4-N/2}(U\times \Theta).
\end{align*}
For every compact $K\subset U$ and all multi-indices $\alpha,\beta$, the defining estimates are
\begin{align*}
|\partial_x^\alpha\partial_\theta^\beta a(x,\theta)| \le C_{K,\alpha,\beta}\langle \theta\rangle^{m+n/4-N/2-|\beta|}
\end{align*}
for $x\in K$ and $|\theta|\ge 1$, and $a$ has an asymptotic expansion in homogeneous terms of degrees $m+n/4-N/2-j$ in $\theta$.
[/definition]
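As a sanity check on the normalization in the definition above, consider two model oscillatory integrals with constant amplitudes. The identifications with delta distributions assume the Fourier inversion convention $\delta_0(y)=(2\pi)^{-N}\int_{\mathbb R^N}e^{iy\cdot\theta}\,d\theta$. For a defining function $h$ of a hypersurface, with $N=1$,
\begin{align*}
\delta(h(x))=\frac{1}{2\pi}\int_{\mathbb R}e^{i\theta h(x)}\,d\theta,\qquad 0=m+\frac n4-\frac12\ \Longrightarrow\ m=\frac12-\frac n4,
\end{align*}
and for the point mass at the origin of $\mathbb R^n$, with $N=n$,
\begin{align*}
\delta_0(x)=\frac{1}{(2\pi)^n}\int_{\mathbb R^n}e^{ix\cdot\theta}\,d\theta,\qquad 0=m+\frac n4-\frac n2\ \Longrightarrow\ m=\frac n4.
\end{align*}
In both cases the amplitude is homogeneous of degree $0$, and the resulting order has the form $k/2-n/4$ with $k$ the codimension of the submanifold carrying the delta distribution.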
The integral in the definition is interpreted as an oscillatory integral with a cutoff at large frequencies: insert $\chi(\theta/R)$ with $\chi\in C_c^\infty(\mathbb R^N)$ equal to $1$ near $0$, pass to the limit $R\to\infty$, and regard different choices of the amplitude in the region $|\theta|<1$ as changing only the smooth representative on $U$. This convention is harmless for microlocal questions because the singular part comes from the homogeneous high-frequency expansion of the amplitude.
This fixes the order convention used in the rest of the chapter: when the phase has $N$ auxiliary variables on an $n$-dimensional base, amplitude order $\mu$ corresponds to distribution order $\mu-n/4+N/2$. Lowering the leading homogeneous term of the amplitude by one lowers the Lagrangian distribution order by one. This motivates the principal symbol, which extracts precisely that leading homogeneous term in an invariant form.
[definition: Principal Symbol of a Lagrangian Distribution]
Let $X$ be a smooth manifold and let $\Lambda \subset T^*X\setminus 0$ be a smooth conic Lagrangian. The principal symbol map is
\begin{align*}
\sigma_m:I^m(X,\Lambda)\to S_{\mathrm{hom}}^m(\Lambda;\mathcal M_\Lambda \otimes \Omega_\Lambda^{1/2})/S_{\mathrm{hom}}^{m-1}(\Lambda;\mathcal M_\Lambda \otimes \Omega_\Lambda^{1/2}).
\end{align*}
For $u \in I^m(X,\Lambda)$ represented locally by a nondegenerate phase $\phi$ and a classical amplitude $a \sim a_0+a_1+\cdots$, the value
\begin{align*}
\sigma_m(u) \in S_{\mathrm{hom}}^m(\Lambda;\mathcal M_\Lambda \otimes \Omega_\Lambda^{1/2})/S_{\mathrm{hom}}^{m-1}(\Lambda;\mathcal M_\Lambda \otimes \Omega_\Lambda^{1/2}),
\end{align*}
is the class of the leading homogeneous amplitude $a_0$ with the stationary phase normalization, half-density factor, and Maslov factor for the chosen parametrization.
[/definition]
In this convention $\Omega_\Lambda^{1/2}$ is the real half-density bundle determined by the Liouville measure in a local phase chart, and $\mathcal M_\Lambda$ is the Maslov line whose transition functions are the stationary-phase signature factors from changes of nondegenerate phase. Other references may absorb part of this factor into the amplitude normalization; here the displayed order and target bundle are fixed by the exact sequence below.
The transformation law for equivalent phase functions makes this class independent of the chosen local parametrization. The definition is practical rather than decorative: it records exactly what survives after quotienting by distributions of lower order. The next theorem is the basic exact sequence used to solve transport equations and construct parametrices.
[quotetheorem:8203]
[citeproof:8203]
The exact sequence is the Lagrangian analogue of the principal symbol sequence for pseudodifferential operators. It says that the leading microlocal behaviour is geometric data on $\Lambda$, while lower-order corrections are invisible to the principal symbol. The smooth conic Lagrangian hypothesis is essential: at a crossing of two Lagrangian sheets, there is no single smooth half-density line over the intersection point, and a single quotient symbol space cannot encode the two independent leading coefficients without separating the branches. The classical-symbol hypothesis is also essential. For instance, an amplitude such as
\begin{align*}
a(\theta)=\langle \theta\rangle^\mu\left(1+\frac{\sin(\log \langle \theta\rangle)}{\log \langle \theta\rangle}\right)
\end{align*}
satisfies standard symbol estimates after localization away from $\theta=0$, but it has no homogeneous leading term of degree $\mu$ with a full classical expansion; the quotient symbol would therefore have no canonical numerator. The order convention fixes which amplitude degree is called order $m$: if a representation is stabilized by replacing $\phi(x,\theta)$ with $\phi(x,\theta)+q(\eta)$ for a non-singular quadratic form $q$ in $k$ extra variables, stationary phase in $\eta$ shifts the amplitude degree by $k/2$ and contributes determinant and signature factors. The shift $n/4-N/2$ is chosen so that this stabilized representation defines the same order and the same invariant symbol. The half-density and Maslov factors are not cosmetic; without them, the same distribution would acquire incompatible local symbols in overlapping phase charts. The theorem does not solve the transport equations by itself; it supplies the target in which the leading transport solution must live and identifies the lower-order freedom available for the next correction. The remaining bookkeeping issue is the phase ambiguity caused by changing parametrizations.
[remark: Maslov Factors in Computations]
In many PDE constructions the Maslov factor is tracked by choosing a consistent phase chart along the part of $\Lambda$ under consideration. When a caustic is crossed, the signature of the transverse Hessian changes, and stationary phase contributes a factor of $e^{i\pi k/4}$ for an integer $k$. The Maslov line packages these phase jumps so that the principal symbol is globally meaningful.
[/remark]
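The determinant, signature, and degree-shift factors mentioned above can be read off from the model Fresnel integral. As a sketch, assume the stabilizing quadratic form is written as $q(\eta)=\tfrac12\langle Q\eta,\eta\rangle$ with $Q$ a real symmetric nonsingular $k\times k$ matrix, and that it enters the phase with a positive scale $\lambda$; the symbols $Q$ and $\lambda$ are notation introduced here for illustration only. Then, as an oscillatory integral,
\begin{align*}
\int_{\mathbb R^k} e^{i\lambda\langle Q\eta,\eta\rangle/2}\,d\eta=\left(\frac{2\pi}{\lambda}\right)^{k/2}|\det Q|^{-1/2}e^{i\pi\operatorname{sgn}(Q)/4}.
\end{align*}
The power $\lambda^{-k/2}$ is the source of the shift by $k/2$ in the amplitude degree when $\lambda$ scales like a power of $|\theta|$, the factor $|\det Q|^{-1/2}$ feeds into the half-density normalization, and $e^{i\pi\operatorname{sgn}(Q)/4}$ is the Maslov phase. If one eigenvalue of $Q$ changes sign, the signature changes by $2$ and the phase factor changes by $e^{\pm i\pi/2}$.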
This viewpoint is especially useful for parametrices. The eikonal equation determines the Lagrangian, and transport equations determine successive homogeneous pieces of the symbol.
## Conormal Distributions as Lagrangian Distributions
The final problem is to reconcile the older class of conormal distributions with the new Lagrangian language. A conormal distribution has oscillation only in directions normal to a submanifold; the corresponding Lagrangian is the punctured conormal bundle.
[definition: Conormal Bundle]
Let $Y \subset X$ be an embedded submanifold. The conormal bundle of $Y$ in $X$ is
\begin{align*}
N^*Y = \{(y,\xi) \in T^*X : y \in Y,\ \xi(v)=0 \text{ for every } v \in T_yY\}.
\end{align*}
[/definition]
The bundle $N^*Y\setminus 0$ is conic and has dimension $n$. In local coordinates with $Y=\{x''=0\}$, its covectors have no $dx'$ component and arbitrary nonzero $dx''$ component. This motivates the local oscillatory definition, where only the normal variables participate in the phase.
[definition: Local Conormal Distribution]
Let $X$ be a smooth $n$-manifold, let $Y \subset X$ have codimension $k$, and choose local coordinates $x=(x',x'')\in U'\times U''\subset \mathbb R^{n-k}\times \mathbb R^k$ with $Y=\{x''=0\}$. A distribution $u\in \mathcal D'(U'\times U'')$ is a local conormal distribution to $Y$ of order $m$ if, modulo smooth terms, it is an oscillatory integral of the form
\begin{align*}
u(x',x'')=\int_{\mathbb R^k} e^{i x''\cdot \theta}a(x',x'',\theta)\,d\theta,
\end{align*}
where $a:U'\times U''\times \mathbb R^k \to \mathbb C$ is a classical symbol
\begin{align*}
a\in S_{\mathrm{cl}}^{m+n/4-k/2}(U'\times U''\times \mathbb R^k).
\end{align*}
For every compact $K\subset U'\times U''$ and all multi-indices $\alpha,\beta$, the defining estimates are
\begin{align*}
|\partial_x^\alpha\partial_\theta^\beta a(x,\theta)|\le C_{K,\alpha,\beta}\langle\theta\rangle^{m+n/4-k/2-|\beta|}
\end{align*}
for $x\in K$ and $|\theta|\ge 1$, and $a$ has an asymptotic expansion in homogeneous terms of degrees $m+n/4-k/2-j$ in $\theta$.
[/definition]
The phase $x''\cdot\theta$ is the local normal-frequency model. Its critical equations force $x''=0$, and the covector is $\theta\cdot dx''$, so the associated Lagrangian is $N^*Y\setminus 0$. This motivates the comparison theorem, which identifies the conormal calculus with the Lagrangian calculus for this specific Lagrangian.
[quotetheorem:8204]
[citeproof:8204]
The embedded-submanifold hypothesis is essential because the normal form $x''=0$ uses adapted coordinates and a vector bundle of normal covectors. If $Y$ has a crossing singularity, such as the union of the coordinate axes in $\mathbb R^2$, there is no single smooth conormal bundle over the crossing point, and the corresponding singularity is not described by one smooth Lagrangian $N^*Y\setminus 0$. The order normalization is also part of the theorem: for codimension $k$, a conormal amplitude of order $\mu$ represents a distribution of order $\mu-n/4+k/2$, which is the same rule obtained from the general Lagrangian definition with $N=k$. If that shift were changed only in the conormal definition, the model phase $x''\cdot\theta$ would no longer land in the same $I^m(X,N^*Y\setminus 0)$ class as the general Lagrangian definition with $N=k$, so the identification would fail even for the delta measure on a hypersurface. The theorem also does not say that every Lagrangian distribution is conormal to a submanifold of $X$; projections of general Lagrangians may have caustics, folds, or several sheets over the same base point. What it gives is the foundational normal form: conormal singularities are the local building blocks, pseudodifferential kernels are conormal to the diagonal in $X\times X$, and more general Fourier integral kernels are Lagrangian with respect to canonical relations.
This comparison also explains why the chapter belongs between wave front sets and parametrices. Propagation theorems move conormal initial singularities along Hamiltonian flow, pseudodifferential kernels appear as the diagonal conormal case, and caustics mark the points where a conormal-looking phase on the base must be replaced by a Lagrangian phase with auxiliary variables. The same language is therefore used in geometric optics, boundary value parametrices, and the construction of solution operators for hyperbolic equations.
[example: Delta Measure on a Hypersurface]
Let $Y=\{x_n=0\}\subset \mathbb R^n$, with coordinates $x=(x',x_n)$. We interpret
\begin{align*}
(2\pi)^{-1}\int_{\mathbb R} e^{i x_n\theta}\,d\theta
\end{align*}
as an oscillatory integral. To verify that it is $\delta(x_n)$, let $\psi\in C_c^\infty(\mathbb R^n)$ and set
\begin{align*}
g(t)=\int_{\mathbb R^{n-1}}\psi(x',t)\,dx'.
\end{align*}
Then $g\in C_c^\infty(\mathbb R)$, and Fourier inversion gives
\begin{align*}
(2\pi)^{-1}\int_{\mathbb R}\int_{\mathbb R} e^{it\theta}g(t)\,dt\,d\theta=g(0).
\end{align*}
Substituting the definition of $g$ yields
\begin{align*}
g(0)=\int_{\mathbb R^{n-1}}\psi(x',0)\,dx',
\end{align*}
which is exactly the action of $\delta(x_n)$ on $\psi$.
The phase is
\begin{align*}
\phi(x,\theta)=x_n\theta.
\end{align*}
Its $\theta$-derivative is
\begin{align*}
\partial_\theta\phi(x,\theta)=x_n,
\end{align*}
so the critical set in $\mathbb R^n\times(\mathbb R\setminus 0)$ is
\begin{align*}
C_\phi=\{(x',0,\theta):\theta\ne 0\}.
\end{align*}
For $(x',0,\theta)\in C_\phi$, the $x$-differential is
\begin{align*}
\partial_x\phi(x',0,\theta)=\theta\,dx_n.
\end{align*}
Therefore the parametrized Lagrangian is
\begin{align*}
\Lambda_\phi=\{(x',0,\theta\,dx_n):\theta\ne 0\}.
\end{align*}
Since
\begin{align*}
T_{(x',0)}Y=\{v\in\mathbb R^n:v_n=0\},
\end{align*}
a covector $\xi=\sum_{j=1}^n \xi_j\,dx_j$ annihilates $T_{(x',0)}Y$ exactly when $\xi_1=\cdots=\xi_{n-1}=0$, hence $\xi=\xi_n\,dx_n$. Thus
\begin{align*}
\Lambda_\phi=N^*Y\setminus 0.
\end{align*}
The amplitude is the constant symbol $(2\pi)^{-1}$, so this is the basic conormal oscillation with one nonzero normal frequency variable. It models singularity concentrated on a hypersurface, such as the leading local form of jumps and boundary layer terms in linear PDE.
[/example]
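As a check of the order convention on this example, regard the amplitude $(2\pi)^{-1}$ as a classical symbol of order $0$. The conormal rule with codimension $k=1$ then gives
\begin{align*}
m+\frac n4-\frac12=0,\qquad\text{so}\qquad m=\frac12-\frac n4,
\end{align*}
that is, $\delta(x_n)\in I^{1/2-n/4}(\mathbb R^n,N^*Y\setminus 0)$ in the convention of this chapter. Differentiating under the integral sign multiplies the amplitude by $i\theta$, so
\begin{align*}
\partial_{x_n}\delta(x_n)=(2\pi)^{-1}\int_{\mathbb R} e^{ix_n\theta}\,i\theta\,d\theta
\end{align*}
has amplitude order $1$ and distribution order $3/2-n/4$: raising the amplitude degree by one raises the order by one.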
The chapter therefore closes the circle between phase functions, symplectic geometry, and distribution classes. Lagrangians describe where oscillations live, symbols describe their leading strength, and conormal distributions provide the normal form from which the general theory is built.
With that language in place, Fourier integral operators can be described as quantizations of canonical relations, which is the operator-theoretic form of the same geometry.
# 9. Fourier Integral Operators and Canonical Relations
Building on wave front sets, pseudodifferential detection, propagation of singularities, and the Lagrangian distributions of Chapter 8, this chapter develops the operator calculus needed for variable-coefficient hyperbolic equations. The prerequisites are distribution theory, the local Fourier transform, the pseudodifferential calculus, and the symplectic geometry of cotangent bundles. This chapter introduces Fourier integral operators as the analytic objects whose kernels encode the same canonical relations that appeared geometrically in bicharacteristic flow. The guiding problem is to describe, with precise hypotheses, how an oscillatory kernel transports wave front set from $Y$ to $X$ through a relation in $T^*X \times T^*Y$.
## Kernels as Lagrangian Distributions
How should an operator remember the cotangent directions in both its input and output variables? The Schwartz kernel theorem turns a continuous operator $A:C_c^\infty(Y)\to \mathcal{D}'(X)$ into a distribution $K_A\in \mathcal{D}'(X\times Y)$, so the microlocal structure of $A$ is contained in the wave front set of $K_A$. For pseudodifferential operators this wave front set is contained in the conormal bundle of the diagonal, but Fourier integral operators allow more general Lagrangian geometry.
[definition: Twisted Wave Front Set of a Kernel]
Let $X$ and $Y$ be smooth manifolds and let $K\in \mathcal{D}'(X\times Y)$. The twisted wave front set of $K$ is
\begin{align*}
WF'(K)=\{(x,\xi;y,\eta)\in T^*X\times T^*Y:(x,y;\xi,-\eta)\in WF(K)\}.
\end{align*}
[/definition]
The twisted wave front set fixes the sign convention needed to read a kernel as a relation from input covectors to output covectors. This motivates the next definition: we need a class of distributions whose wave front sets are not arbitrary, but are organised by conic Lagrangian submanifolds. Such distributions are the local models for oscillatory kernels.
[definition: Lagrangian Distribution]
Let $M$ be a smooth manifold, let $\Lambda\subset T^*M\setminus 0$ be a conic Lagrangian submanifold, and fix the standard local convention in which an oscillatory integral with $N$ phase variables and amplitude in $S^\mu$ has Lagrangian order $\mu+N/2-\dim M/4$. A distribution $u\in \mathcal{D}'(M)$ is a Lagrangian distribution of order $m$ associated to $\Lambda$, written $u\in I^m(M,\Lambda)$, if microlocally near each point of $\Lambda$ it can be written as an oscillatory integral
\begin{align*}
u(z)=\int_{\mathbb R^N} e^{i\phi(z,\theta)}a(z,\theta)\,d\theta,
\end{align*}
where $\phi$ is a nondegenerate homogeneous phase function parametrising $\Lambda$ and $a\in S^\mu_{\mathrm{cl}}(U\times(\mathbb R^N\setminus 0))$ is a classical scalar symbol on the local coordinate patch $U\subset M$ and the phase variables, with $\mu=m-N/2+\dim M/4$.
[/definition]
Lagrangian distributions provide the kernel-level building blocks, but an operator needs its kernel Lagrangian to be interpreted as a relation between two cotangent bundles. This motivates the definition of an FIO: it packages a Lagrangian distribution on $X\times Y$ together with the twisting convention that converts it into a canonical relation $C\subset T^*X\times T^*Y$.
[definition: Fourier Integral Operator]
Let $X$ and $Y$ be smooth manifolds, and let $C\subset (T^*X\setminus 0)\times (T^*Y\setminus 0)$ be a conic canonical relation. A continuous operator $A:C_c^\infty(Y)\to \mathcal{D}'(X)$ is a Fourier integral operator of order $m$ associated to $C$, written $A\in I^m(X,Y;C)$, if its Schwartz kernel satisfies
\begin{align*}
K_A\in I^m(X\times Y,C'),
\end{align*}
in the Lagrangian-distribution convention fixed above, where $C'=\{(x,y;\xi,-\eta):(x,\xi;y,\eta)\in C\}\subset T^*(X\times Y)\setminus 0$.
[/definition]
This chapter uses a kernel-order convention: the superscript in $I^m(X,Y;C)$ is the Lagrangian order of the twisted kernel. Many books use half-density-normalised operator orders, differing from this by dimension shifts. Whenever a composition or symbol statement depends on the normalisation, the convention must be fixed before an order formula is read numerically.
This definition puts pseudodifferential operators and solution operators for hyperbolic equations in one class. Pseudodifferential operators correspond to the identity relation, while propagators for wave equations correspond to canonical transformations generated by the Hamiltonian flow.
[example: Pseudodifferential Operators as Fourier Integral Operators]
Let $A\in \Psi^m(X)$ be written in local coordinates with kernel
\begin{align*}
K_A(x,y)=\int e^{i(x-y)\cdot \xi}a(x,y,\xi)\,d\xi.
\end{align*}
For the phase $\phi(x,y,\xi)=(x-y)\cdot \xi$, the derivatives in the auxiliary variable $\xi$ are
\begin{align*}
\partial_{\xi_j}\phi(x,y,\xi)=x_j-y_j
\end{align*}
for each $j$; stationarity requires all of them to vanish, so the critical set is
\begin{align*}
\Sigma_\phi=\{(x,y,\xi):x=y,\ \xi\ne 0\}.
\end{align*}
On this critical set,
\begin{align*}
\partial_x\phi(x,y,\xi)=\xi
\end{align*}
and
\begin{align*}
\partial_y\phi(x,y,\xi)=-\xi.
\end{align*}
The twisting convention for kernels records the input covector as $-\partial_y\phi$, hence
\begin{align*}
(x,\partial_x\phi;y,-\partial_y\phi)=(x,\xi;y,\xi).
\end{align*}
Since $x=y$ on $\Sigma_\phi$, the generated canonical relation is
\begin{align*}
C=\{(x,\xi;x,\xi):\xi\ne 0\}.
\end{align*}
Thus a pseudodifferential operator is an FIO associated to the identity canonical relation, and its principal pseudodifferential symbol is represented in this local formula by the leading homogeneous part of $a(x,x,\xi)$ on the diagonal critical set.
[/example]
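The order bookkeeping for this example can be checked directly against the convention fixed in the definition of $I^m(M,\Lambda)$. As a sketch, assume $\dim X=n$ and that the amplitude $a$ is a classical symbol of order $m$, so that $M=X\times X$ has dimension $2n$ and the phase uses $N=n$ auxiliary variables. Then
\begin{align*}
\mu+\frac N2-\frac{\dim M}{4}=m+\frac n2-\frac{2n}{4}=m,
\end{align*}
so in the kernel-order convention of this chapter a classical pseudodifferential operator of order $m$ has a kernel of Lagrangian order $m$ on the twisted identity relation.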
## Canonical Relations from Phase Functions
Given an oscillatory kernel, how do we extract the geometric relation that describes its microlocal action? The answer is to look at stationary points in the auxiliary variables and then record the derivatives of the phase in the base variables. This process turns the analytic data of a phase function into a Lagrangian submanifold.
[definition: Nondegenerate Phase Function for a Kernel]
Let $X$ and $Y$ be smooth manifolds, and let $N\in \mathbb N$. A smooth function $\phi:X\times Y\times (\mathbb R^N\setminus 0)\to \mathbb R$ is a nondegenerate homogeneous phase function for a kernel if it is positively homogeneous of degree $1$ in $\theta$, the differentials $d(\partial_{\theta_j}\phi)$ are independent on the critical set
\begin{align*}
\Sigma_\phi=\{(x,y,\theta):\partial_\theta\phi(x,y,\theta)=0\},
\end{align*}
and $d_{x,y}\phi$ does not vanish on $\Sigma_\phi$.
[/definition]
The number $N$ is local data of the parametrisation; different coordinate patches or different parametrisations of the same Lagrangian may use different numbers of phase variables.
The independence condition says that the stationary equations cut out a smooth critical manifold. That is only the first half of the construction: we still need a relation in the two cotangent bundles, so the next definition records the covectors obtained by differentiating the same stationary phase in the visible variables.
[definition: Canonical Relation of a Phase]
Let $\phi$ be a nondegenerate homogeneous phase function for a kernel on $X\times Y$. The canonical relation generated by $\phi$ is
\begin{align*}
C_\phi=\{(x,\partial_x\phi(x,y,\theta);y,-\partial_y\phi(x,y,\theta)):(x,y,\theta)\in \Sigma_\phi\}.
\end{align*}
[/definition]
The minus sign is the same twisting convention introduced for kernels. In many examples $C_\phi$ is the graph of a homogeneous symplectomorphism, but the definition also allows many-to-one relations such as the Radon transform. The construction would be less useful if it only produced a set; the next theorem proves that the set has the Lagrangian structure required by symplectic microlocal analysis.
The key question is why stationary phase equations should automatically produce a canonical relation rather than an arbitrary conic subset. The answer comes from pulling back the canonical one-form to the critical set: the differential of the phase supplies a primitive, and the stationary equations remove the auxiliary directions.
[quotetheorem:8205]
[citeproof:8205]
This theorem lets us pass between oscillatory formulas and geometry, but its hypotheses are doing real work. For instance, on $\mathbb R_x$ with two auxiliary variables, the homogeneous phase $\phi(x,\theta_1,\theta_2)=x\theta_1$ has $\partial_{\theta_1}\phi=x$ and $\partial_{\theta_2}\phi=0$, so the differentials of the stationary equations are not independent; the critical set contains a free $\theta_2$ direction that does not affect the covector image, and it is not the clean parametrisation required by the theorem. If the nonzero-covector condition is dropped, $\phi(x,\theta)=x^2\theta$ has a critical point at $x=0$ whose visible covector $\partial_x\phi=2x\theta$ vanishes, so the construction lands on the zero section rather than in the conic cotangent bundle used for wave front sets. The result also does not say that every point of the relation contributes with nonzero amplitude; that is a separate symbol and ellipticity question taken up when mapping wave front sets. We first revisit a familiar operation, pullback by a diffeomorphism, because its relation is an honest graph and so gives the cleanest model for covector transport.
[example: Pullback by a Diffeomorphism]
Let $f:X\to Y$ be a diffeomorphism and define $A u=f^*u$, so $Au(x)=u(f(x))$. In local coordinates, after fixing the coordinate density on $Y$, the graph delta kernel is represented by
\begin{align*}
K_A(x,y)=\delta(y-f(x)).
\end{align*}
Indeed, for $u\in C_c^\infty(Y)$,
\begin{align*}
\int_Y \delta(y-f(x))u(y)\,dy=u(f(x)).
\end{align*}
Using the oscillatory representation of the delta distribution, we may write microlocally
\begin{align*}
K_A(x,y)=(2\pi)^{-n}\int_{\mathbb R^n}e^{i(f(x)-y)\cdot \eta}\,d\eta.
\end{align*}
For the phase $\phi(x,y,\eta)=(f(x)-y)\cdot \eta$, the derivatives in $\eta$ are
\begin{align*}
\partial_{\eta_j}\phi(x,y,\eta)=f_j(x)-y_j.
\end{align*}
Stationarity requires these to vanish, so the critical set is
\begin{align*}
\Sigma_\phi=\{(x,y,\eta):y=f(x),\ \eta\ne 0\}.
\end{align*}
On this critical set,
\begin{align*}
\partial_x\phi(x,y,\eta)=df_x^\top\eta.
\end{align*}
Also,
\begin{align*}
\partial_y\phi(x,y,\eta)=-\eta.
\end{align*}
The kernel twisting convention records the input covector as $-\partial_y\phi$, so
\begin{align*}
(x,\partial_x\phi;y,-\partial_y\phi)=(x,df_x^\top\eta;f(x),\eta).
\end{align*}
Therefore the canonical relation is
\begin{align*}
C_f=\{(x,df_x^\top\eta;f(x),\eta):\eta\in T^*_{f(x)}Y\setminus 0\}.
\end{align*}
Since $f$ is a diffeomorphism, $df_x:T_xX\to T_{f(x)}Y$ is an isomorphism and $df_x^\top:T^*_{f(x)}Y\to T_x^*X$ is an isomorphism, so nonzero input covectors are transported to nonzero output covectors. Thus pullback by $f$ transports wave front directions by the cotangent lift of $f$.
[/example]
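The computation in the pullback example can be sanity-checked numerically. The following is a minimal sketch in Python, not part of the formal development: the map `f`, its Jacobian, and the sample point are hypothetical choices made for illustration. It compares a finite-difference gradient in $x$ of the phase $\phi(x,y,\eta)=(f(x)-y)\cdot\eta$ with the cotangent lift $df_x^\top\eta$ at a point of the critical set $y=f(x)$.

```python
import math


def f(x):
    # Hypothetical diffeomorphism of R^2 (illustration only):
    # a shear in the first variable composed with a stretch in the second.
    return (x[0] + 0.3 * math.sin(x[1]), 2.0 * x[1])


def jacobian(x):
    # df_x, stored as rows [d f_i / d x_j].
    return ((1.0, 0.3 * math.cos(x[1])),
            (0.0, 2.0))


def cotangent_lift(x, eta):
    # df_x^T eta: the output covector attached to the input covector eta.
    J = jacobian(x)
    return (J[0][0] * eta[0] + J[1][0] * eta[1],
            J[0][1] * eta[0] + J[1][1] * eta[1])


def phase(x, y, eta):
    # phi(x, y, eta) = (f(x) - y) . eta
    fx = f(x)
    return (fx[0] - y[0]) * eta[0] + (fx[1] - y[1]) * eta[1]


def grad_x_phase(x, y, eta, h=1e-6):
    # Central finite differences for the x-gradient of the phase.
    grad = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((phase(xp, y, eta) - phase(xm, y, eta)) / (2 * h))
    return tuple(grad)


if __name__ == "__main__":
    x, eta = (0.4, -1.1), (1.5, -0.7)
    y = f(x)  # a point of the critical set y = f(x)
    print("finite difference:", grad_x_phase(x, y, eta))
    print("cotangent lift:   ", cotangent_lift(x, eta))
```

The two printed covectors should agree up to finite-difference error, which is the numerical counterpart of the identity $\partial_x\phi=df_x^\top\eta$ on the critical set.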
The diffeomorphism example has no branching: every input covector determines one output covector. Hyperbolic evolution gives the main PDE example of the same graph phenomenon, with the graph now generated by Hamiltonian flow rather than by an ordinary change of variables.
[example: Half-Wave Propagator]
Let $(M,g)$ be a compact Riemannian manifold and fix $t\in\mathbb R$. The half-wave operator
\begin{align*}
U(t)=e^{-it\sqrt{\Delta_g}}
\end{align*}
is defined on $L^2(M)$ by spectral calculus and extends to distributions by duality. Its principal phase-space Hamiltonian is
\begin{align*}
p(x,\theta)=|\theta|_g=(g^{ij}(x)\theta_i\theta_j)^{1/2}
\end{align*}
on $T^*M\setminus 0$.
In local coordinates, the Hamiltonian vector field of $p$ is determined by
\begin{align*}
H_p=\sum_i \frac{\partial p}{\partial \theta_i}\frac{\partial}{\partial x^i}-\sum_i \frac{\partial p}{\partial x^i}\frac{\partial}{\partial \theta_i}.
\end{align*}
Since
\begin{align*}
\frac{\partial p}{\partial \theta_i}=\frac{1}{2}(g^{ab}\theta_a\theta_b)^{-1/2}\cdot 2g^{ij}\theta_j=\frac{g^{ij}(x)\theta_j}{|\theta|_g},
\end{align*}
and
\begin{align*}
\frac{\partial p}{\partial x^i}=\frac{1}{2}(g^{ab}\theta_a\theta_b)^{-1/2}\cdot \partial_{x^i}g^{ab}(x)\theta_a\theta_b=\frac{\partial_{x^i}g^{ab}(x)\theta_a\theta_b}{2|\theta|_g},
\end{align*}
the lifted bicharacteristics $(x(s),\theta(s))=\Phi_s(y,\eta)$ satisfy
\begin{align*}
\dot x^i(s)=\frac{g^{ij}(x(s))\theta_j(s)}{|\theta(s)|_g}.
\end{align*}
Also,
\begin{align*}
\dot\theta_i(s)=-\frac{\partial_{x^i}g^{ab}(x(s))\theta_a(s)\theta_b(s)}{2|\theta(s)|_g}.
\end{align*}
Microlocally away from the zero section, the kernel of $U(t)$ is a Lagrangian distribution whose canonical relation is the graph of this Hamiltonian flow:
\begin{align*}
C_t=\{(x,\xi;y,\eta):(x,\xi)=\Phi_t(y,\eta)\}.
\end{align*}
Thus a singular covector $(y,\eta)$ in the input is transported to the output covector $\Phi_t(y,\eta)$; geometrically, this is motion along the cotangent lift of the unit-speed geodesic flow.
[/example]
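Hamilton's equations from this example can be integrated numerically to see the structure of $\Phi_t$. The following is a minimal sketch in Python under stated assumptions: the metric is a hypothetical conformally flat one on the $2$-torus, $g^{ij}(x)=c(x)^2\delta^{ij}$, so that $p(x,\theta)=c(x)|\theta|$, and the conformal factor `c`, the integrator, and the step count are illustrative choices rather than anything fixed by the text. With this metric the equations above reduce to $\dot x=c(x)\theta/|\theta|$ and $\dot\theta=-\nabla c(x)\,|\theta|$.

```python
import math


def c(x):
    # Hypothetical smooth positive conformal factor, periodic in both variables.
    return 1.0 + 0.5 * math.sin(x[0]) * math.cos(x[1])


def grad_c(x):
    # Gradient of the conformal factor c.
    return (0.5 * math.cos(x[0]) * math.cos(x[1]),
            -0.5 * math.sin(x[0]) * math.sin(x[1]))


def p(state):
    # Hamiltonian p(x, theta) = c(x) |theta| for g^{ij} = c^2 delta^{ij}.
    x1, x2, t1, t2 = state
    return c((x1, x2)) * math.hypot(t1, t2)


def hamilton_rhs(state):
    # (dx/ds, dtheta/ds) = (dp/dtheta, -dp/dx).
    x1, x2, t1, t2 = state
    norm = math.hypot(t1, t2)
    cx = c((x1, x2))
    g1, g2 = grad_c((x1, x2))
    return (cx * t1 / norm, cx * t2 / norm, -g1 * norm, -g2 * norm)


def rk4_step(state, h):
    # One classical Runge-Kutta step of size h (h may be negative).
    def shifted(k, s):
        return tuple(a + s * b for a, b in zip(state, k))

    k1 = hamilton_rhs(state)
    k2 = hamilton_rhs(shifted(k1, h / 2))
    k3 = hamilton_rhs(shifted(k2, h / 2))
    k4 = hamilton_rhs(shifted(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * cc + d)
                 for s, a, b, cc, d in zip(state, k1, k2, k3, k4))


def flow(state, t, steps=2000):
    # Approximation of Phi_t applied to state = (y1, y2, eta1, eta2).
    h = t / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


if __name__ == "__main__":
    s0 = (0.3, -0.8, 1.2, 0.5)
    s1 = flow(s0, 1.0)
    print("p before and after:", p(s0), p(s1))
    print("round trip:", flow(s1, -1.0))
```

Three features of the canonical relation $C_t$ are visible in such an experiment, up to integration error: $p$ is conserved along the flow, $\Phi_{-t}$ inverts $\Phi_t$, and scaling the input covector by $\lambda>0$ scales the output covector by $\lambda$ without moving the base point, which is the conic property of $C_t$.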
Graphs do not cover integral geometry, where one data point may encode a whole family of incident geometric objects. The Radon transform illustrates how a canonical relation can encode incidence rather than a single-valued transformation.
[example: Radon Transform]
In $\mathbb R^2$, the Radon transform is
\begin{align*}
Ru(s,\omega)=\int_{x\cdot \omega=s}u(x)\,d\mathcal H^1(x),\qquad (s,\omega)\in \mathbb R\times S^1.
\end{align*}
Choose a local angular coordinate $\alpha$ on $S^1$, so $\omega(\alpha)=(\cos\alpha,\sin\alpha)$ and $\omega_\perp(\alpha)=\partial_\alpha\omega(\alpha)=(-\sin\alpha,\cos\alpha)$. Using the delta constraint for the incidence relation $x\cdot\omega(\alpha)-s=0$, the kernel is represented microlocally by the phase
\begin{align*}
\phi(s,\alpha,x,\tau)=\tau(x\cdot \omega(\alpha)-s).
\end{align*}
The derivative in the auxiliary variable $\tau$ is
\begin{align*}
\partial_\tau\phi(s,\alpha,x,\tau)=x\cdot\omega(\alpha)-s.
\end{align*}
Stationarity requires it to vanish, so the critical set is
\begin{align*}
\Sigma_\phi=\{(s,\alpha,x,\tau):x\cdot\omega(\alpha)=s,\ \tau\ne 0\}.
\end{align*}
On this critical set, the output covector in the $(s,\alpha)$ variables is computed from
\begin{align*}
\partial_s\phi(s,\alpha,x,\tau)=-\tau.
\end{align*}
Also, since $\partial_\alpha\omega=\omega_\perp$,
\begin{align*}
\partial_\alpha\phi(s,\alpha,x,\tau)=\tau x\cdot\omega_\perp(\alpha).
\end{align*}
For the input variable $x$, the visible derivative is
\begin{align*}
\partial_x\phi(s,\alpha,x,\tau)=\tau\omega(\alpha).
\end{align*}
The kernel twisting convention records the input covector as $-\partial_x\phi$, so the generated canonical relation is
\begin{align*}
C_R=\{(s,\alpha;-\tau,\tau x\cdot\omega_\perp(\alpha);x,-\tau\omega(\alpha)):x\cdot\omega(\alpha)=s,\ \tau\ne 0\}.
\end{align*}
Equivalently, if $x=s\omega(\alpha)+r\omega_\perp(\alpha)$ on the line $x\cdot\omega(\alpha)=s$, then $x\cdot\omega_\perp(\alpha)=r$, and the same relation becomes
\begin{align*}
C_R=\{(s,\alpha;-\tau,\tau r;s\omega(\alpha)+r\omega_\perp(\alpha),-\tau\omega(\alpha)):r\in\mathbb R,\ \tau\ne 0\}.
\end{align*}
Thus a singular covector at $x$ contributes to Radon data precisely for those lines through $x$ whose normal direction is parallel to that covector. This is the microlocal mechanism behind tomography: object singularities are visible in the data along the incidence curves that record the matching normal direction.
[/example]
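For a concrete instance of the Radon relation, take the angular coordinate $\alpha=0$, so that $\omega=(1,0)$ and $\omega_\perp=(0,1)$. The line is then $\{x_1=s\}$, its points are $x=(s,r)$, and the relation contains
\begin{align*}
(s,0;-\tau,\tau r;(s,r),-\tau\,dx_1),\qquad r\in\mathbb R,\ \tau\ne 0.
\end{align*}
The input covector $-\tau\,dx_1$ is conormal to the vertical line $\{x_1=s\}$, while the output covector $-\tau\,ds+\tau r\,d\alpha$ records the position of the point along the line: the ratio of its $d\alpha$ and $ds$ components is $-r$.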
The previous examples still come from geometric submanifolds or flows. The Euclidean Fourier transform is often used as the basic linear symplectic model, but it must be interpreted with care in this local kernel calculus.
[example: Euclidean Fourier Transform as a Global Model]
For the Euclidean Fourier transform $\mathcal F:\mathcal{S}(\mathbb R^n_x)\to \mathcal{S}(\mathbb R^n_\xi)$,
\begin{align*}
\mathcal Fu(\xi)=\frac{1}{(2\pi)^{n/2}}\int_{\mathbb R^n}e^{-ix\cdot \xi}u(x)\,d\mathcal L^n(x).
\end{align*}
Its kernel on $\mathbb R^n_\xi\times \mathbb R^n_x$ is
\begin{align*}
K(\xi,x)=(2\pi)^{-n/2}e^{-ix\cdot \xi}.
\end{align*}
For all multi-indices $\alpha,\beta$, differentiating the exponential gives
\begin{align*}
\partial_\xi^\alpha\partial_x^\beta e^{-ix\cdot \xi}=p_{\alpha,\beta}(x,\xi)e^{-ix\cdot \xi}
\end{align*}
for a polynomial $p_{\alpha,\beta}$, so $K$ is $C^\infty$ as a distribution kernel. Hence
\begin{align*}
WF(K)=\varnothing.
\end{align*}
Thus the Euclidean Fourier transform is not a local wave-front-set FIO example in the same sense as a conormal kernel or a half-wave parametrix: there is no singular kernel wave front relation to read off locally.
The phase still records the familiar global phase-space transformation. If
\begin{align*}
\phi(\xi,x)=-x\cdot \xi,
\end{align*}
then
\begin{align*}
\partial_\xi\phi(\xi,x)=-x.
\end{align*}
Also,
\begin{align*}
\partial_x\phi(\xi,x)=-\xi.
\end{align*}
With the kernel twisting convention, the input covector is
\begin{align*}
-\partial_x\phi(\xi,x)=\xi.
\end{align*}
Therefore the phase associates the pair
\begin{align*}
(\xi,\partial_\xi\phi;x,-\partial_x\phi)=(\xi,-x;x,\xi).
\end{align*}
Writing the input covector as $\eta=\xi$, this is the linear transformation
\begin{align*}
(x,\eta)\longmapsto(\eta,-x).
\end{align*}
This is a global Schwartz-tempered or metaplectic phase-space statement, expressing the exchange of position and frequency variables; the wave front mapping theorem below should instead be applied to kernels with the stated local Lagrangian singularity structure.
[/example]
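That the linear map $(x,\eta)\mapsto(\eta,-x)$ from the example is a canonical transformation can be verified in one line. As a sketch, write $\zeta$ for the fibre coordinate dual to $\xi$ on the output side, a symbol introduced here only for this check, so that the map reads $(\xi,\zeta)=(\eta,-x)$. Then
\begin{align*}
\sum_j d\zeta_j\wedge d\xi_j=\sum_j d(-x_j)\wedge d\eta_j=\sum_j d\eta_j\wedge dx_j,
\end{align*}
which is the canonical symplectic form on $T^*\mathbb R^n_x$ in the coordinates $(x,\eta)$.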
## Mapping Wave Front Sets by Canonical Relations
Once an FIO is associated with a canonical relation, what does it do to singularities? The guiding principle is that singularities cannot appear away from the relation unless they come from singularities already present in the input or from singularities of the kernel. Proper support keeps the projection geometry controlled enough for the kernel pairing to define an operator on distributions.
[definition: Image of a Conic Set under a Canonical Relation]
Let $C\subset (T^*X\setminus 0)\times (T^*Y\setminus 0)$ be a canonical relation and let $\Gamma\subset T^*Y\setminus 0$ be a conic subset. The image of $\Gamma$ under $C$ is
\begin{align*}
C\circ \Gamma=\{(x,\xi):\text{there exists }(y,\eta)\in \Gamma\text{ with }(x,\xi;y,\eta)\in C\}.
\end{align*}
[/definition]
This notation treats a canonical relation as a relation between phase spaces rather than as a function. When $C$ is the graph of a canonical transformation, the definition reduces to the usual image of a subset. The next theorem is the analytic statement that this set-theoretic operation is exactly the upper bound for how FIOs can move wave front sets.
The main point to prove is exclusion: if an output covector is not related to any singular input covector, then the localized output must be smooth in that direction. The proof combines pseudodifferential cutoffs with nonstationary phase.
[quotetheorem:8206]
[citeproof:8206]
The theorem is an inclusion because relations may fold, amplitudes may vanish, and different branches may interfere. The ellipticity hypothesis is necessary even for the identity relation: if $A=0$ or if $A\in \Psi^0(Y)$ has principal symbol vanishing in a conic neighbourhood of $(y_0,\eta_0)$, then a distribution singular at $(y_0,\eta_0)$ can be sent to something microlocally smooth there. The local graph hypothesis is also necessary for the reverse implication: the operator $u\mapsto u(x)+u(-x)$ on $\mathbb R$ has two canonical branches, and an odd distribution can cancel at the output even though both branches are allowed by the set-theoretic relation. The no-pure-output condition rules out new singularities from smooth input; for example, the rank-one kernel $K(x,y)=\delta(x)$ defines an operator whose output has the singularity of $\delta(x)$ regardless of the wave front set of $u$. Proper support is a separate analytic hypothesis: without it, a kernel such as $K(x,y)=e^{ixy}$ over a noncompact intermediate variable is not controlled by compact pieces in the way required for distributional pushforward. The theorem therefore does not claim that every possible related singularity appears; ellipticity and the local graph condition are the extra assumptions that turn the inclusion into reversible microlocal propagation.
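The cancellation in the two-branch example can be made explicit. As a sketch, take $b\ne 0$ and the odd distribution $u=\delta_b-\delta_{-b}$ on $\mathbb R$; the point $b$ and this particular $u$ are choices made for illustration. The operator $Au(x)=u(x)+u(-x)$ has kernel $\delta(x-y)+\delta(x+y)$ and canonical relation
\begin{align*}
C=\{(x,\xi;x,\xi):\xi\ne 0\}\cup\{(x,\xi;-x,-\xi):\xi\ne 0\},
\end{align*}
so that
\begin{align*}
C\circ WF(u)=\{(\pm b,\xi):\xi\ne 0\},\qquad Au=(\delta_b-\delta_{-b})+(\delta_{-b}-\delta_b)=0.
\end{align*}
Thus $WF(Au)=\varnothing$ is strictly smaller than $C\circ WF(u)$, although each branch taken separately is the graph of a diffeomorphism.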
[example: Wave Front Set under Pullback]
For a diffeomorphism $f:X\to Y$ and $A=f^*$, the canonical relation of $f^*$ is
\begin{align*}
C_f=\{(x,df_x^\top\eta;f(x),\eta):\eta\in T^*_{f(x)}Y\setminus 0\}.
\end{align*}
Applying the *[Wave Front Mapping Theorem for Fourier Integral Operators](/theorems/8206)* to this relation gives
\begin{align*}
WF(f^*u)\subset \{(x,df_x^\top\eta):(f(x),\eta)\in WF(u)\}.
\end{align*}
For the reverse inclusion, put $g=f^{-1}$ and $v=f^*u$. Since $g^*v=g^*f^*u=u$, the same mapping theorem applied to $g^*$ gives
\begin{align*}
WF(u)=WF(g^*v)\subset \{(y,dg_y^\top\zeta):(g(y),\zeta)\in WF(v)\}.
\end{align*}
Take $(y,\eta)\in WF(u)$ and set $x=g(y)$, so $y=f(x)$. The displayed inclusion gives a covector $\zeta\in T_x^*X\setminus 0$ such that
\begin{align*}
(x,\zeta)\in WF(f^*u)
\end{align*}
and
\begin{align*}
\eta=dg_y^\top\zeta.
\end{align*}
Because $g=f^{-1}$, the differentials satisfy
\begin{align*}
dg_{f(x)}\circ df_x=\operatorname{id}_{T_xX}.
\end{align*}
Taking transposes gives
\begin{align*}
df_x^\top\circ dg_{f(x)}^\top=\operatorname{id}_{T_x^*X}.
\end{align*}
Since $y=f(x)$, applying $df_x^\top$ to $\eta=dg_y^\top\zeta$ yields
\begin{align*}
df_x^\top\eta=df_x^\top dg_{f(x)}^\top\zeta=\zeta.
\end{align*}
Thus $(x,df_x^\top\eta)\in WF(f^*u)$ whenever $(f(x),\eta)\in WF(u)$, so
\begin{align*}
\{(x,df_x^\top\eta):(f(x),\eta)\in WF(u)\}\subset WF(f^*u).
\end{align*}
Combining the two inclusions,
\begin{align*}
WF(f^*u)=\{(x,df_x^\top\eta):(f(x),\eta)\in WF(u)\}.
\end{align*}
Thus pullback by a diffeomorphism transports singular directions exactly by the cotangent lift.
[/example]
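A one-dimensional instance makes the formula concrete. As a sketch, take the hypothetical choices $f(x)=2x$ on $\mathbb R$ and $u=\delta(y-b)$ for a fixed $b\in\mathbb R$. Then
\begin{align*}
f^*u(x)=\delta(2x-b)=\tfrac12\,\delta\!\left(x-\tfrac b2\right),\qquad WF(u)=\{(b,\eta):\eta\ne 0\}.
\end{align*}
Since $df_x^\top\eta=2\eta$, the formula gives
\begin{align*}
\{(x,2\eta):(2x,\eta)\in WF(u)\}=\left\{\left(\tfrac b2,\xi\right):\xi\ne 0\right\},
\end{align*}
which is the wave front set of $\tfrac12\delta(x-b/2)$, as expected.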
Pullback shows the mapping theorem in a reversible setting. The wave equation gives the dynamic version: the canonical relation is the Hamiltonian flow, so the same inclusion becomes propagation along bicharacteristics.
[example: Propagation by the Half-Wave Group]
For the half-wave group $U(t)=e^{-it\sqrt{\Delta_g}}$ on a compact Riemannian manifold, the canonical relation away from the zero section is the graph
\begin{align*}
C_t=\{(\Phi_t(y,\eta);y,\eta):(y,\eta)\in T^*M\setminus 0\}.
\end{align*}
Applying the *Wave Front Mapping Theorem for Fourier Integral Operators* gives
\begin{align*}
WF(U(t)u)\subset C_t\circ WF(u).
\end{align*}
By the definition of image under a canonical relation,
\begin{align*}
C_t\circ WF(u)=\{\Phi_t(y,\eta):(y,\eta)\in WF(u)\}.
\end{align*}
Hence
\begin{align*}
WF(U(t)u)\subset \Phi_t(WF(u)).
\end{align*}
The reverse inclusion follows by applying the same argument to the inverse half-wave operator $U(-t)$, since
\begin{align*}
U(-t)U(t)u=u.
\end{align*}
The canonical relation of $U(-t)$ is the graph of $\Phi_{-t}$, so
\begin{align*}
WF(u)=WF(U(-t)U(t)u)\subset \Phi_{-t}(WF(U(t)u)).
\end{align*}
Applying $\Phi_t$ to both sides and using $\Phi_t\circ\Phi_{-t}=\operatorname{id}$ gives
\begin{align*}
\Phi_t(WF(u))\subset WF(U(t)u).
\end{align*}
Combining the two inclusions,
\begin{align*}
WF(U(t)u)=\Phi_t(WF(u)).
\end{align*}
Thus, away from the zero section and microlocally where the half-wave parametrix is elliptic, singular covectors propagate exactly along the Hamiltonian flow of $|\eta|_g$; caustics may require changing phase charts, but they do not change this reversible wave front relation.
[/example]
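The reversibility argument can be sketched in a flat model. The code below assumes a Euclidean metric in two dimensions, where the Hamiltonian flow of $|\eta|$ is $\Phi_t(y,\eta)=(y+t\eta/|\eta|,\eta)$. The function names and the sample wave front set are hypothetical, and the orientation of the flow depends on sign conventions.

```python
import math

def flow(t, point):
    # Flat-model Hamiltonian flow of |eta|: the covector is fixed and the
    # base point moves with unit speed in the direction eta/|eta|.
    y, eta = point
    norm = math.hypot(eta[0], eta[1])
    return (tuple(yi + t * ei / norm for yi, ei in zip(y, eta)), eta)

def propagate(t, wavefront):
    # Image of a finite sample of covectors under the graph relation C_t.
    return [flow(t, point) for point in wavefront]

def same_points(a, b, tol=1e-12):
    # Compare two lists of (position, covector) pairs coordinate by coordinate.
    return len(a) == len(b) and all(
        math.isclose(u, v, abs_tol=tol)
        for (ya, ea), (yb, eb) in zip(a, b)
        for u, v in zip(ya + ea, yb + eb)
    )

wf_u = [((0.0, 0.0), (3.0, 4.0)), ((1.0, 2.0), (0.0, -2.0))]
wf_later = propagate(2.5, wf_u)       # model for WF(U(t)u) with t = 2.5
wf_back = propagate(-2.5, wf_later)   # applying the inverse flow
```

Propagating forward and then backward returns the original sample, mirroring the identity $\Phi_{-t}\circ\Phi_t=\operatorname{id}$ used above.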
## Proper Support and Composition Setup
Composition is the point where canonical relations become a calculus rather than just a dictionary. The problem is that the formal composition of kernels involves integrating over an intermediate manifold, and this integral may fail to be proper or may create singular products. The correct hypotheses separate support control from clean intersection geometry.
[definition: Properly Supported Operator]
Let $A:C_c^\infty(Y)\to \mathcal{D}'(X)$ have Schwartz kernel $K_A\in \mathcal{D}'(X\times Y)$. The operator $A$ is properly supported if the two projections
\begin{align*}
\operatorname{supp}K_A\to X,\qquad \operatorname{supp}K_A\to Y
\end{align*}
are proper maps.
[/definition]
Proper support ensures that compactly supported input gives locally controlled output and that the transpose operator has the same support control. It is often achieved by inserting cutoffs, and microlocal statements are unchanged in the region where the cutoffs equal $1$.
For two FIOs $A:X\leftarrow Y$ and $B:Y\leftarrow Z$, the geometric composition of their canonical relations is the candidate relation for $AB$. Before stating the theorem, we need the set-theoretic composition.
[definition: Composition of Canonical Relations]
Let $C_1\subset (T^*X\setminus 0)\times (T^*Y\setminus 0)$ and $C_2\subset (T^*Y\setminus 0)\times (T^*Z\setminus 0)$ be canonical relations. Their composition is
\begin{align*}
C_1\circ C_2=\{(x,\xi;z,\zeta):\text{there exists }(y,\eta)\text{ with }(x,\xi;y,\eta)\in C_1,\ (y,\eta;z,\zeta)\in C_2\}.
\end{align*}
[/definition]
The same intermediate covector $(y,\eta)$ appears in both relations because the twist has already been built into the convention for kernels. Analytically, this matching is the stationary condition in the intermediate base variable. To convert this formal matching into an operator theorem, we need hypotheses ensuring that the stationary set is a clean critical manifold and that its image is again a canonical relation.
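The set-theoretic matching of the intermediate covector can be illustrated on finite relations. This is a toy sketch only: the function `compose` and the labelled sample points are hypothetical, and finite sets carry none of the smooth or symplectic structure of actual canonical relations.

```python
def compose(c1, c2):
    """Set-theoretic composition C1 o C2 of finite relations.

    Each relation is a set of (target_point, source_point) pairs, matching
    the ordering (x, xi; y, eta) used in the text.
    """
    sources_by_middle = {}
    for middle, source in c2:
        sources_by_middle.setdefault(middle, set()).add(source)
    return {
        (target, source)
        for target, middle in c1
        for source in sources_by_middle.get(middle, ())
    }

# Hypothetical sample data: each phase-space point is a (position, covector) pair.
c1 = {
    (("x0", 1), ("y0", 1)),
    (("x1", 1), ("y0", 1)),    # second branch over the same middle point
    (("x2", -1), ("y1", -1)),  # middle point not reached by c2
}
c2 = {
    (("y0", 1), ("z0", 1)),
    (("y2", 1), ("z1", 1)),    # middle point absent from c1
}
composed = compose(c1, c2)
```

Only pairs sharing the same middle point `("y0", 1)` survive, and the two branches of `c1` over that point both appear in the result.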
The composition theorem is the local model for building parametrices by multiplying FIOs. It also explains why order bookkeeping is convention-sensitive: stationary phase contributes excess terms, while the translation between kernel orders and operator orders contributes dimension shifts unless a half-density normalisation has been fixed.
The next theorem records the hypotheses under which this stationary-phase picture is stable. Each assumption has a distinct role: clean intersection controls the critical manifold, properness controls the intermediate integration, and constant rank ensures that the projected relation is a smooth Lagrangian rather than a variable-rank image.
[quotetheorem:8207]
[citeproof:8207]
This theorem contains the pseudodifferential composition formula as the special case where both relations are the identity relation. The clean-intersection hypothesis excludes stationary equations whose rank changes: the phase $\Phi(x,y,z,\theta)=\theta(y^2-xz)$ has a critical set whose fibre dimension jumps at $(x,z)=(0,0)$, so a single excess number cannot describe the resulting oscillatory order. Properness is needed even when the phase is nondegenerate: an integral over $y\in \mathbb R$ with kernel cutoffs drifting to $|y|\to\infty$ can have no locally finite composed kernel although each compactly truncated piece is an FIO. Constant rank is also necessary; the projection $(s,t)\mapsto (s^2,t)$ has rank $1$ at $s=0$ and rank $2$ elsewhere, so the image of a smooth fibre product can acquire a fold rather than an immersed Lagrangian structure. Such rank changes are related to caustic phenomena in particular parametrisations, but caustics do not by themselves mean that the underlying canonical relation has stopped being Lagrangian; often the remedy is to change phase charts or to work with several branches.
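The rank drop of the model projection $(s,t)\mapsto(s^2,t)$ can be checked directly. The helper names below are hypothetical and the sample values of $s$ and $t$ are arbitrary.

```python
def jacobian(s, t):
    # Jacobian of the model projection (s, t) -> (s**2, t) from the text.
    return [[2.0 * s, 0.0], [0.0, 1.0]]

def rank_2x2(m, tol=1e-12):
    # Rank of a 2x2 matrix via its determinant and entries.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) > tol:
        return 2
    if any(abs(entry) > tol for row in m for entry in row):
        return 1
    return 0

# Rank at a few sample values of s, with t fixed arbitrarily.
ranks = {s: rank_2x2(jacobian(s, 0.3)) for s in (-1.0, 0.0, 0.5)}
```

The rank is $2$ away from $s=0$ and $1$ on the fold $s=0$, which is the variable-rank behaviour that the constant rank hypothesis excludes.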
[example: Composition with a Diffeomorphism Pullback]
Let $f:X\to Y$ and $g:Y\to Z$ be diffeomorphisms. Then
\begin{align*}
f^*:C^\infty(Y)\to C^\infty(X),\qquad g^*:C^\infty(Z)\to C^\infty(Y),
\end{align*}
so the composable order is $f^*\circ g^*:C^\infty(Z)\to C^\infty(X)$. For $v\in C^\infty(Z)$ and $x\in X$,
\begin{align*}
(f^*\circ g^*)v(x)=f^*(g^*v)(x).
\end{align*}
By the definition of pullback,
\begin{align*}
f^*(g^*v)(x)=(g^*v)(f(x)).
\end{align*}
Applying the definition of pullback again gives
\begin{align*}
(g^*v)(f(x))=v(g(f(x))).
\end{align*}
Thus
\begin{align*}
(f^*\circ g^*)v(x)=v((g\circ f)(x))=((g\circ f)^*v)(x).
\end{align*}
The canonical relation of $f^*$ is
\begin{align*}
C_f=\{(x,df_x^\top\eta;f(x),\eta):\eta\in T^*_{f(x)}Y\setminus 0\}.
\end{align*}
Similarly,
\begin{align*}
C_g=\{(y,dg_y^\top\zeta;g(y),\zeta):\zeta\in T^*_{g(y)}Z\setminus 0\}.
\end{align*}
A point $(x,\xi;z,\zeta)$ lies in $C_f\circ C_g$ exactly when there are $y\in Y$ and $\eta\in T_y^*Y\setminus 0$ such that
\begin{align*}
(x,\xi;y,\eta)\in C_f
\end{align*}
and
\begin{align*}
(y,\eta;z,\zeta)\in C_g.
\end{align*}
The first membership gives
\begin{align*}
y=f(x).
\end{align*}
It also gives
\begin{align*}
\xi=df_x^\top\eta.
\end{align*}
The second membership gives
\begin{align*}
z=g(y).
\end{align*}
It also gives
\begin{align*}
\eta=dg_y^\top\zeta.
\end{align*}
Substituting $y=f(x)$ gives
\begin{align*}
z=g(f(x))=(g\circ f)(x).
\end{align*}
Substituting $\eta=dg_{f(x)}^\top\zeta$ into $\xi=df_x^\top\eta$ gives
\begin{align*}
\xi=df_x^\top dg_{f(x)}^\top\zeta.
\end{align*}
By the chain rule,
\begin{align*}
d(g\circ f)_x=dg_{f(x)}\circ df_x.
\end{align*}
Taking transposes reverses the order of composition, so
\begin{align*}
d(g\circ f)_x^\top=df_x^\top\circ dg_{f(x)}^\top.
\end{align*}
Therefore
\begin{align*}
\xi=d(g\circ f)_x^\top\zeta.
\end{align*}
Hence
\begin{align*}
C_f\circ C_g=\{(x,d(g\circ f)_x^\top\zeta;(g\circ f)(x),\zeta):\zeta\in T^*_{(g\circ f)(x)}Z\setminus 0\}.
\end{align*}
Since $df_x$, $dg_{f(x)}$, and $d(g\circ f)_x$ are isomorphisms, their transposes preserve nonzero covectors. Thus the FIO composition agrees with ordinary pullback composition, and the composed canonical relation is exactly the cotangent lift of $g\circ f$.
[/example]
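The transpose form of the chain rule can be tested numerically. The sketch below is a hedged illustration: the maps `f` and `g`, the finite-difference Jacobian, and the sample point and covector are all hypothetical choices.

```python
import math

def f(x):
    # Hypothetical diffeomorphism of R^2 (a shear).
    return (x[0] + x[1] ** 2, x[1])

def g(y):
    # Second hypothetical diffeomorphism of R^2.
    return (y[0], y[1] + math.sin(y[0]))

def composite(x):
    return g(f(x))

def jacobian(func, point, h=1e-6):
    # Central finite-difference Jacobian; entry [i][j] approximates d func_i / d x_j.
    columns = []
    for j in range(2):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        columns.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(2)])
    return [[columns[j][i] for j in range(2)] for i in range(2)]

def transpose_apply(jac, covector):
    return tuple(sum(jac[i][j] * covector[i] for i in range(2)) for j in range(2))

x = (0.4, -0.7)
zeta = (1.5, -2.0)  # covector at (g o f)(x)

# Direct lift through the composite map.
xi_direct = transpose_apply(jacobian(composite, x), zeta)
# Two-step lift: eta = dg^T zeta at f(x), then xi = df^T eta at x.
eta = transpose_apply(jacobian(g, f(x)), zeta)
xi_two_step = transpose_apply(jacobian(f, x), eta)
```

Up to finite-difference error the two covectors agree, which is the numerical counterpart of $d(g\circ f)_x^\top=df_x^\top\circ dg_{f(x)}^\top$.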
This final example returns the calculus to ordinary changes of variables, where clean composition is guaranteed by the graph structure. In general FIO calculus, the same principle survives with relations and clean intersections replacing maps and ordinary composition.
[remark: Why Canonical Relations Matter]
The kernel of an FIO is a distribution on $X\times Y$, but its useful information is not only the subset of $X\times Y$ where it is singular. The cotangent directions form a Lagrangian relation, and this relation controls wave front mapping, composition, parametrices, and propagation for hyperbolic equations. This is the conceptual shift from local operator formulas to symplectic geometry.
[/remark]
Fourier integral operators map singularities according to their canonical relations, but to use them as a calculus we must understand how they behave under composition and adjunction. The next chapter develops those structural rules and shows how Egorov's theorem transfers pseudodifferential data along the same symplectic dynamics.
# 10. Composition, Adjoints, and Egorov's Theorem
Chapter 9 introduced Fourier integral operators and their wave front mapping property; composition is the point at which they become a calculus rather than a collection of examples. This chapter assumes the earlier construction of FIOs from nondegenerate phase functions, the interpretation of their canonical relations, principal symbols as half-densities, and the stationary phase and clean stationary phase theorems. Earlier chapters attached an operator to a canonical relation and used that relation to describe the movement of wave front sets. This chapter asks when two such movements can be performed in succession, how the symbolic order changes, what happens under adjoints, and why conjugation by an elliptic Fourier integral operator transports pseudodifferential operators by the underlying canonical transformation.
## Transversal Composition of Canonical Relations
Suppose an operator $A$ carries singularities from $Y$ to $X$, and an operator $B$ carries singularities from $Z$ to $Y$. The problem is to decide whether $AB$ carries singularities from $Z$ to $X$ by the composed canonical relation, and what geometric hypotheses make the oscillatory integral defining $AB$ behave like another Fourier integral operator.
Let $X,Y,Z$ be smooth manifolds. If $C_1 \subset T^*X_0 \times T^*Y_0$ and $C_2 \subset T^*Y_0 \times T^*Z_0$ are canonical relations, where $T^*X_0 = T^*X \setminus 0$, the middle variables must match with the sign convention inherited from kernels. To state the next definition, we first record the set-theoretic propagation law obtained by eliminating the intermediate covector over $Y$.
[definition: Composed Canonical Relation]
Let $C_1 \subset T^*X_0 \times T^*Y_0$ and $C_2 \subset T^*Y_0 \times T^*Z_0$ be canonical relations. Their composed relation is
\begin{align*}
C_1 \circ C_2 = \{(x,\xi;z,\zeta) : \text{there exists } (y,\eta) \in T^*Y_0 \text{ with } (x,\xi;y,\eta) \in C_1, (y,\eta;z,\zeta) \in C_2\}.
\end{align*}
[/definition]
This definition records the intended propagation law, but it does not by itself imply that the set is a smooth conic Lagrangian submanifold. If the two relations meet tangentially in the middle cotangent factor, the eliminated set may have singularities, changing dimension at different points or failing to be a manifold at all. Even when the set-theoretic composition is smooth, a nonproper family of intermediate covectors can make the kernel integral fail to be properly supported. The next definition is needed to express the geometric nondegeneracy condition that will make stationary phase stable in the shared variables.
[definition: Transversal Composition]
The canonical relations $C_1 \subset T^*X_0 \times T^*Y_0$ and $C_2 \subset T^*Y_0 \times T^*Z_0$ compose transversally if
\begin{align*}
(C_1 \times C_2) \pitchfork (T^*X_0 \times \Delta(T^*Y_0) \times T^*Z_0)
\end{align*}
inside $T^*X_0 \times T^*Y_0 \times T^*Y_0 \times T^*Z_0$. In that case the fibre product
\begin{align*}
C_1 \times_{T^*Y_0} C_2 = (C_1 \times C_2) \cap (T^*X_0 \times \Delta(T^*Y_0) \times T^*Z_0)
\end{align*}
is a smooth manifold, and the projection
\begin{align*}
\pi_{XZ}:C_1 \times_{T^*Y_0} C_2 \to T^*X_0 \times T^*Z_0
\end{align*}
is an immersion.
[/definition]
The condition says that the two phase constraints meet with the expected codimension before the middle variables are removed. If the projection to $T^*X_0 \times T^*Z_0$ is only immersed with self-intersections, the correct conclusion is local on branches of the immersed relation rather than a single globally embedded canonical relation. The following theorem is the analytic result this condition is designed to support: under transversal composition, proper support, and an embedded composed relation on the microlocal region under discussion, the composition of operators remains in the FIO calculus.
[quotetheorem:8208]
[citeproof:8208]
The theorem is the microlocal version of composing changes of variables. Transversality is the hypothesis that prevents extra stationary directions from appearing; without it, the order may change and the phase may need clean rather than ordinary stationary phase. Properness is a support condition, not a symplectic condition: without it, a compactly supported input can pick up contributions from infinitely many intermediate points and the kernel need not define the expected properly supported operator. Embeddedness is also a genuine restriction, because an immersed composition with self-intersections must be treated branch by branch rather than as one smooth canonical relation. This explains why the graph case below is the simplest model: the intermediate covector is forced uniquely.
[example: Composition of Two Graph FIOs]
Let $\kappa_1:T^*Y_0 \to T^*X_0$ and $\kappa_2:T^*Z_0 \to T^*Y_0$ be homogeneous canonical transformations, and write
\begin{align*}
C_1=\{(\kappa_1(y,\eta);y,\eta):(y,\eta)\in T^*Y_0\}, \qquad C_2=\{(\kappa_2(z,\zeta);z,\zeta):(z,\zeta)\in T^*Z_0\}.
\end{align*}
For $(z,\zeta)\in T^*Z_0$, membership in the composed relation means that there is some $(y,\eta)\in T^*Y_0$ with
\begin{align*}
(x,\xi;y,\eta)=(\kappa_1(y,\eta);y,\eta)
\end{align*}
and
\begin{align*}
(y,\eta;z,\zeta)=(\kappa_2(z,\zeta);z,\zeta).
\end{align*}
The second equality forces the unique choice $(y,\eta)=\kappa_2(z,\zeta)$, and substituting this into the first equality gives
\begin{align*}
(x,\xi)=\kappa_1(y,\eta)=\kappa_1(\kappa_2(z,\zeta)).
\end{align*}
Hence
\begin{align*}
C_1\circ C_2=\{(\kappa_1(\kappa_2(z,\zeta));z,\zeta):(z,\zeta)\in T^*Z_0\}=\operatorname{graph}(\kappa_1\circ\kappa_2).
\end{align*}
The fibre of $C_1\times_{T^*Y_0}C_2$ over a fixed $(z,\zeta)$ contains exactly the one middle covector $\kappa_2(z,\zeta)$, so its dimension is $0$. In local phase coordinates, the kernel of $AB$ has phase $\phi_1(x,y,\theta)+\phi_2(y,z,\tau)$, and the critical equations in the eliminated variables are precisely the equations selecting this same unique middle covector. Because the two relations are graphs of diffeomorphisms, the matching equations have invertible linearization in the middle variables, so stationary phase uses an isolated nondegenerate critical point. By *Transversal FIO Composition*, properly supported operators $A\in I^{m_1}(X,Y;C_1)$ and $B\in I^{m_2}(Y,Z;C_2)$ therefore satisfy
\begin{align*}
AB\in I^{m_1+m_2}(X,Z;\operatorname{graph}(\kappa_1\circ\kappa_2)).
\end{align*}
Thus graph FIOs compose exactly like the underlying canonical transformations, with no excess contribution to the order.
[/example]
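The uniqueness of the middle covector can be seen in a small exact computation. The sketch below assumes one-dimensional cotangent lifts $\kappa_a(y,\eta)=(ay,\eta/a)$ of the scalings $y\mapsto ay$, which are homogeneous canonical transformations of the line. The names `kappa`, `graph`, `compose` and the sample points are hypothetical.

```python
from fractions import Fraction

def kappa(a):
    # Cotangent lift of y -> a*y on the line: (y, eta) -> (a*y, eta/a).
    a = Fraction(a)
    return lambda point: (a * point[0], point[1] / a)

def graph(transform, sources):
    # Graph relation {(kappa(p); p)} over a finite sample of source points.
    return {(transform(p), p) for p in sources}

def compose(c1, c2):
    # Set-theoretic composition, matching the shared middle point.
    by_middle = {}
    for middle, source in c2:
        by_middle.setdefault(middle, set()).add(source)
    return {(t, s) for t, middle in c1 for s in by_middle.get(middle, ())}

kappa1, kappa2 = kappa(3), kappa(Fraction(1, 2))
z_points = {(Fraction(z), Fraction(zeta)) for z in (-1, 0, 2) for zeta in (-1, 1, 5)}
y_points = {kappa2(p) for p in z_points}   # the middle points that actually occur

c1 = graph(kappa1, y_points)
c2 = graph(kappa2, z_points)
composed = compose(c1, c2)
expected = graph(lambda p: kappa1(kappa2(p)), z_points)
```

Exact rational arithmetic keeps the set comparison free of rounding issues. In this model the composed relation is the graph of $\kappa_{3/2}$, and each source point has exactly one partner in the composition.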
This graph case is the model for many applications, including propagation by a Hamilton flow. Its strength is also its limitation: it excludes integral transforms where several geometric objects connect the same incoming and outgoing covectors. More complicated integral transforms often have a family of intermediate points rather than a unique one, and treating such families as isolated critical points would give the wrong order. This leads to clean composition.
## Clean Composition and the Excess Formula for Orders
The transversal theorem is too restrictive for several natural transforms. The normal operator of the Radon transform, for instance, composes a relation with its transpose and leaves a residual family of geometric choices. The next question is how to keep a calculus when the middle variables form a smooth critical manifold rather than isolated critical points.
[definition: Clean Composition]
The canonical relations $C_1 \subset T^*X_0 \times T^*Y_0$ and $C_2 \subset T^*Y_0 \times T^*Z_0$ compose cleanly with excess $e$ if their fibre product
\begin{align*}
C_1 \times_{T^*Y_0} C_2
\end{align*}
is a smooth conic manifold, its tangent space is the fibre product of tangent spaces at every point, and the projection to $C_1 \circ C_2$ has fibres of constant dimension $e$.
[/definition]
Clean composition replaces unique stationary points by stationary manifolds. If the critical set is not clean, its dimension or tangent space can jump, and the output may be a more singular distribution not belonging to a single standard FIO class. If the critical set is clean but has positive-dimensional fibres, ordinary stationary phase would undercount the number of variables left to integrate. The following theorem is needed because the excess $e$ measures the number of stationary directions that remain after the composed canonical relation has been identified, and those directions change the order of the operator.
[quotetheorem:8209]
[citeproof:8209]
The additional $e/2$ is the analytic trace of the remaining family of stationary points. Cleanliness is what makes this correction uniform; if the excess changes from point to point, there is no single order shift describing the whole composition. Properness again prevents uncontrolled integration over the intermediate variables, while embeddedness keeps the final canonical relation from decomposing into overlapping branches. The theorem does not say that every nontransversal composition is harmless: it applies only when the failure of transversality is exactly a smooth clean excess. This is the main correction to remember when moving from graph-like propagation to integral geometry.
[example: Normal Operator of the Radon Transform]
Let
\begin{align*}
Rf(s,\omega)=\int_{x\cdot \omega=s} f(x)\,dH^{n-1}(x)
\end{align*}
for $(s,\omega)\in \mathbb R\times S^{n-1}$, with the adjoint normalization
\begin{align*}
R^*g(x)=\int_{S^{n-1}}g(x\cdot \omega,\omega)\,d\omega.
\end{align*}
Microlocally, the condition $x\cdot \omega=s$ says that the covector at $x$ is a nonzero multiple of $\omega$, so composing the Radon relation with its transpose forces the incoming and outgoing covectors in $T^*\mathbb R^n_0$ to agree. Thus the canonical relation of $R^*R$ is the diagonal away from the zero section.
Now compute the multiplier. Use the Fourier convention
\begin{align*}
\widehat f(\xi)=\int_{\mathbb R^n}e^{-ix\cdot \xi}f(x)\,dx,\quad f(x)=(2\pi)^{-n}\int_{\mathbb R^n}e^{ix\cdot \xi}\widehat f(\xi)\,d\xi.
\end{align*}
The one-dimensional Fourier transform of $Rf$ in $s$ is
\begin{align*}
\mathcal F_s(Rf)(\sigma,\omega)=\int_{\mathbb R}e^{-is\sigma}\int_{x\cdot \omega=s}f(x)\,dH^{n-1}(x)\,ds=\int_{\mathbb R^n}e^{-i\sigma x\cdot \omega}f(x)\,dx=\widehat f(\sigma\omega).
\end{align*}
Therefore
\begin{align*}
R^*Rf(x)=\int_{S^{n-1}}Rf(x\cdot \omega,\omega)\,d\omega=(2\pi)^{-1}\int_{S^{n-1}}\int_{\mathbb R}e^{ix\cdot(\sigma\omega)}\widehat f(\sigma\omega)\,d\sigma\,d\omega.
\end{align*}
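For any function $F$ on $\mathbb R^n\setminus\{0\}$ for which both sides converge, the change of variables $\xi=\sigma\omega$, which covers $\mathbb R^n\setminus\{0\}$ twice because $(\sigma,\omega)$ and $(-\sigma,-\omega)$ give the same $\xi$, yields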
\begin{align*}
\int_{S^{n-1}}\int_{\mathbb R}F(\sigma\omega)\,d\sigma\,d\omega=2\int_{\mathbb R^n}F(\xi)|\xi|^{-(n-1)}\,d\xi.
\end{align*}
Applying this with $F(\xi)=e^{ix\cdot \xi}\widehat f(\xi)$ gives
\begin{align*}
R^*Rf(x)=\pi^{-1}\int_{\mathbb R^n}e^{ix\cdot \xi}\widehat f(\xi)|\xi|^{-(n-1)}\,d\xi=(2\pi)^{-n}\int_{\mathbb R^n}e^{ix\cdot \xi}c_n|\xi|^{-(n-1)}\widehat f(\xi)\,d\xi,
\end{align*}
where $c_n=2(2\pi)^{n-1}$ for this normalization. Hence $R^*R$ is pseudodifferential with principal multiplier $c_n|\xi|^{-(n-1)}$, so its order is $-(n-1)$. This is exactly the order shift predicted by *Clean FIO Composition*: the clean family of hyperplanes through a covector contributes the excess correction, and in dimension $n=2$ the resulting normal operator has order $-1$.
[/example]
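The factor $2$ in the change of variables and the constant $c_n$ can be sanity-checked numerically in dimension $n=2$. The following sketch uses midpoint quadrature and hypothetical Gaussian integrands in place of $e^{ix\cdot\xi}\widehat f(\xi)$. It checks the identity only in this special case and under these discretisation choices.

```python
import math

def sphere_line_integral(F, n_theta=64, radius=8.0, n_sigma=1600):
    """Midpoint-rule value of int_{S^1} int_R F(sigma*omega) d sigma d omega."""
    d_theta = 2.0 * math.pi / n_theta
    d_sigma = 2.0 * radius / n_sigma
    total = 0.0
    for k in range(n_theta):
        cx, cy = math.cos(k * d_theta), math.sin(k * d_theta)
        for j in range(n_sigma):
            sigma = -radius + (j + 0.5) * d_sigma
            total += F(sigma * cx, sigma * cy)
    return total * d_theta * d_sigma

def twice_weighted_plane_integral(F, n_theta=64, radius=8.0, n_r=800):
    """Value of 2 * int_{R^2} F(xi) |xi|^{-1} d xi, computed in polar coordinates.

    In the plane d xi = r dr d theta, so the weight |xi|^{-1} cancels the factor r.
    """
    d_theta = 2.0 * math.pi / n_theta
    d_r = radius / n_r
    total = 0.0
    for k in range(n_theta):
        cx, cy = math.cos(k * d_theta), math.sin(k * d_theta)
        for j in range(n_r):
            r = (j + 0.5) * d_r
            total += F(r * cx, r * cy)
    return 2.0 * total * d_theta * d_r

def shifted_gaussian(a, b):
    # Hypothetical non-even test integrand, chosen only for illustration.
    return math.exp(-((a - 0.7) ** 2 + (b + 0.3) ** 2))

def c(n):
    # Constant from the text for this normalisation: c_n = 2 (2 pi)^(n-1).
    return 2.0 * (2.0 * math.pi) ** (n - 1)

lhs = sphere_line_integral(shifted_gaussian)
rhs = twice_weighted_plane_integral(shifted_gaussian)
```

The two quadratures agree because each direction $\omega$ on the circle is paired with $-\omega$, so the negative half of every line is counted as the positive half of the opposite line. For the radial Gaussian $e^{-|\xi|^2}$ both sides equal $2\pi\sqrt{\pi}$ in closed form.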
The Radon example shows why the normal operator can be elliptic even when the transform itself is not pseudodifferential. The canonical relation becomes diagonal only after composing with the adjoint, and the power of $|\xi|$ records how many geometric parameters were integrated out. This makes the adjoint operation more than a formal Hilbert-space construction: it is the mechanism that reverses the canonical relation so that normal operators can become pseudodifferential. The next section turns to the canonical relation of an adjoint FIO.
## Adjoints of Fourier Integral Operators
Adjoints reverse the direction in which an operator moves singularities. Since the distribution kernel of $A^*$ is obtained from the complex conjugate of the kernel of $A$ with the variables swapped, the expected canonical relation is the transpose relation. The problem is to confirm that this expectation is compatible with orders and symbols.
[definition: Transpose Canonical Relation]
For a canonical relation $C \subset T^*X_0 \times T^*Y_0$, its transpose is
\begin{align*}
C^t = \{(y,\eta;x,\xi) : (x,\xi;y,\eta) \in C\} \subset T^*Y_0 \times T^*X_0.
\end{align*}
[/definition]
The transpose reverses the source and target of the microlocal correspondence. For an operator kernel, taking the adjoint also swaps the two variables and conjugates the oscillatory factor, so the expected relation is $C^t$.
The remaining issue is analytic rather than notational: the formal transpose of the canonical relation must correspond to an actual adjoint operator in the FIO calculus. One must check that adjunction preserves the Fourier integral class and order, and that it sends the leading symbol to the expected conjugate symbol.
[quotetheorem:8210]
[citeproof:8210]
This result makes normal operators accessible to the FIO calculus: $A^*A$ is governed by $C^t\circ C$. The proper support assumption ensures that the Hilbert-space adjoint has a kernel with the expected support behaviour; without it, swapping variables can still be done formally but the resulting operator may not act on the intended test-function spaces. The order is unchanged because adjunction does not introduce a stationary phase elimination, while the symbol is conjugated because the adjoint reverses the complex phase and the density pairing. The theorem does not imply that $A^*A$ is pseudodifferential for an arbitrary relation $C$; that requires $C^t\circ C$ to reduce to the diagonal, at least microlocally. The next theorem asks what extra conclusion is available when that diagonal composition is paired with an invertible leading symbol.
[quotetheorem:8211]
[citeproof:8211]
Ellipticity is therefore a statement not only about nonvanishing amplitudes but also about reversible canonical geometry. The graph hypothesis is essential: if several points of $T^*Y_0$ map to the same point of $T^*X_0$, no microlocal inverse can separate their contributions by a single FIO on the transpose relation. The nonvanishing principal symbol is equally essential, because a zero of the symbol loses a leading component of the singularity even when the canonical transformation is invertible. The theorem is only microlocal, so it constructs an inverse near the chosen conic neighbourhood rather than a global inverse on all of $T^*Y_0$. This prepares the next result, where an elliptic FIO is used to transfer pseudodifferential operators from one cotangent space to another.
## Egorov's Theorem for Conjugation by Elliptic FIOs
A pseudodifferential operator has diagonal canonical relation, so it acts by testing or modifying singularities without changing their cotangent location. If $U$ is an elliptic FIO quantising a canonical transformation $\kappa$, then $U^{-1}PU$ should be pseudodifferential again, with principal symbol obtained by pulling the symbol of $P$ along $\kappa$. Egorov's theorem is the precise version of this invariance of the pseudodifferential calculus under canonical changes of variables.
[quotetheorem:8212]
[citeproof:8212]
The theorem says that pseudodifferential observables are covariant under canonical transformations. The ellipticity of $U$ is what allows the leading symbols of $U$ and $Q$ to cancel; without it, conjugation may lose information and need not return an operator with the expected principal symbol. The graph condition is also built into the conclusion: a general canonical relation can spread one covector into many, so conjugating a diagonal relation need not remain diagonal. Egorov is a principal-symbol statement at this level, and the lower-order terms depend on quantisation choices rather than only on the symplectic map. In applications to PDE, the canonical transformation is often Hamiltonian flow, and Egorov converts propagation of operators into transport of their symbols.
[example: Conjugating by the Wave Group]
Let $M$ be a compact Riemannian manifold and let $U(t)=e^{-it\sqrt{\Delta}}$. Microlocally on $T^*M_0$, the half-wave parametrix identifies $U(t)$ as an elliptic FIO with canonical relation $\operatorname{graph}(\kappa_t)$, where $\kappa_t$ is the homogeneous geodesic flow. Since $U(-t)U(t)=I$ and $\kappa_{-t}=\kappa_t^{-1}$, the canonical relation of $U(-t)PU(t)$ is
\begin{align*}
\operatorname{graph}(\kappa_{-t})\circ \Delta_{T^*M_0}\circ \operatorname{graph}(\kappa_t).
\end{align*}
For $(y,\eta)\in T^*M_0$, the first propagation sends it to $\kappa_t(y,\eta)$, the diagonal relation leaves that covector unchanged, and the final propagation sends it to $\kappa_{-t}(\kappa_t(y,\eta))=(y,\eta)$. Hence
\begin{align*}
\operatorname{graph}(\kappa_{-t})\circ \Delta_{T^*M_0}\circ \operatorname{graph}(\kappa_t)=\Delta_{T^*M_0}.
\end{align*}
If $P\in \Psi^r(M)$ has principal symbol $p$, then applying *Egorov Theorem* with $Q=U(-t)$ gives
\begin{align*}
U(-t)PU(t)\in \Psi^r(M).
\end{align*}
On principal symbols, $U(t)$ transports a covector $(y,\eta)$ to $\kappa_t(y,\eta)$, $P$ multiplies there by $p(\kappa_t(y,\eta))$, and $U(-t)$ transports back. The elliptic leading symbols of $U(t)$ and $U(-t)$ cancel because they are microlocal inverses, so
\begin{align*}
\sigma_r(U(-t)PU(t))(y,\eta)=p(\kappa_t(y,\eta)).
\end{align*}
Equivalently,
\begin{align*}
\sigma_r(U(-t)PU(t))=p\circ \kappa_t.
\end{align*}
Thus measuring $P$ after wave propagation is microlocally the same as measuring the observable transported along the geodesic flow.
[/example]
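A one-dimensional flat model makes the symbol transport explicit. The sketch below assumes the setting of the real line restricted to positive frequencies, where $e^{-it|D|}$ acts as the shift $u(x)\mapsto u(x-t)$ and $\kappa_t(y,\eta)=(y+t,\eta)$. It also takes $P$ to be multiplication by a function $a$. All names and sample functions are hypothetical.

```python
import math

def U(t):
    # Model propagator (assumption): the shift u(x) -> u(x - t).
    def apply(u):
        return lambda x: u(x - t)
    return apply

def multiply_by(a):
    # Order-zero model observable: multiplication by the function a.
    def apply(u):
        return lambda x: a(x) * u(x)
    return apply

def conjugate(t, a):
    # The conjugated operator U(-t) P U(t) with P = multiplication by a.
    def apply(u):
        return U(-t)(multiply_by(a)(U(t)(u)))
    return apply

a = math.sin                                        # symbol of the model observable
u = lambda x: math.exp(-x * x) * math.cos(5.0 * x)  # arbitrary sample input
t = 0.75

conjugated = conjugate(t, a)(u)
transported = lambda x: a(x + t) * u(x)             # multiplication by a o kappa_t
samples = [-1.2, -0.3, 0.0, 0.4, 2.0]
```

At each sample point the conjugated operator agrees with multiplication by $a(x+t)$, which is $p\circ\kappa_t$ in this model.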
This final example links the abstract calculus back to wave front propagation. Composition explains how singularities move through successive operators, adjoints make inverse and normal constructions available, and Egorov's theorem shows that pseudodifferential information is carried by the same canonical dynamics.
Once composition and adjoints are available, Fourier integral operators become a tool for solving PDE microlocally rather than just describing individual kernels. Parametrices then arise as approximate inverses built from the canonical flow determined by the hyperbolic principal symbol.
# 11. Parametrices for Hyperbolic Equations
Parametrices are the point where the symbolic and geometric parts of the course meet. Chapters 9 and 10 described canonical relations and Fourier integral operators as objects in their own right; here they become a method for solving hyperbolic Cauchy problems microlocally. The guiding question is how initial singularities determine oscillatory solutions, and how the characteristic Hamilton flow controls both the phase and the amplitude of those solutions.
## The Cauchy Problem and the Characteristic Variety
For an evolution equation, the first microlocal question is not whether a global solution exists, but which covectors are allowed to carry singularities. Hyperbolic equations single out a characteristic variety in the cotangent bundle, and the Cauchy problem asks how data on an initial hypersurface are lifted to that characteristic set.
Let $X$ be a smooth manifold, let $P\in \Psi^m(X)$ have real principal symbol $p\in S^m(T^*X\setminus 0)$, and let $\Sigma\subset T^*X\setminus 0$ be the zero set of $p$. The operator is microlocally elliptic away from $\Sigma$, so any nonsmooth behaviour of a solution to $Pu=f$ must occur over $\Sigma$ unless it is already forced by $f$.
[definition: Characteristic Variety]
Let $X$ be a smooth manifold and let $P:C_c^\infty(X)\to \mathcal D'(X)$ be a properly supported scalar pseudodifferential operator in $\Psi^m(X)$. Let $p:T^*X\setminus 0\to \mathbb C$ denote its homogeneous principal symbol. The characteristic variety of $P$ is
\begin{align*}
\operatorname{Char}(P) = \{(x,\xi)\in T^*X\setminus 0 : p(x,\xi)=0\}.
\end{align*}
[/definition]
This definition packages the obstruction to elliptic inversion. After localisation, such a $P$ also acts continuously as $P:H^s_{\mathrm{loc}}(X)\to H^{s-m}_{\mathrm{loc}}(X)$ for every $s\in \mathbb R$, so the same characteristic set controls Sobolev microlocal regularity. On the complement of $\operatorname{Char}(P)$, a microlocal parametrix for $P$ removes singularities, while on the characteristic variety the principal symbol vanishes and the transport of singularities becomes a Hamiltonian problem.
[example: Constant-Coefficient Wave Characteristics]
On $\mathbb R_t\times \mathbb R_x^n$, consider
\begin{align*}
P=\partial_t^2-\Delta_x=\partial_t^2-\sum_{j=1}^n\partial_{x_j}^2.
\end{align*}
Testing the top-order part on the oscillatory factor $e^{i(t\tau+x\cdot \xi)}$ gives
\begin{align*}
\partial_t^2 e^{i(t\tau+x\cdot \xi)}=-\tau^2 e^{i(t\tau+x\cdot \xi)}.
\end{align*}
For each spatial derivative,
\begin{align*}
-\partial_{x_j}^2 e^{i(t\tau+x\cdot \xi)}=\xi_j^2 e^{i(t\tau+x\cdot \xi)}.
\end{align*}
Adding the spatial terms therefore gives
\begin{align*}
-\Delta_x e^{i(t\tau+x\cdot \xi)}=\left(\sum_{j=1}^n \xi_j^2\right)e^{i(t\tau+x\cdot \xi)}=|\xi|^2e^{i(t\tau+x\cdot \xi)}.
\end{align*}
Thus the principal symbol is
\begin{align*}
p(t,x,\tau,\xi)=|\xi|^2-\tau^2.
\end{align*}
By the definition of the characteristic variety,
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi)\in T^*(\mathbb R^{1+n})\setminus 0:|\xi|^2-\tau^2=0\}.
\end{align*}
The equation $|\xi|^2-\tau^2=0$ factors as
\begin{align*}
|\xi|^2-\tau^2=(|\xi|-\tau)(|\xi|+\tau),
\end{align*}
so on the nonzero cotangent variables it is equivalent to $\tau=|\xi|$ or $\tau=-|\xi|$. Hence
\begin{align*}
\operatorname{Char}(P)=\{(t,x,\tau,\xi):\tau=|\xi|,\ \xi\ne 0\}\cup \{(t,x,\tau,\xi):\tau=-|\xi|,\ \xi\ne 0\}.
\end{align*}
The Hamilton equations for $p$ are
\begin{align*}
\dot t=-2\tau,\qquad \dot x=2\xi,\qquad \dot \tau=0,\qquad \dot \xi=0.
\end{align*}
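Thus $\tau$ and $\xi$ are constant along each bicharacteristic, and both $t$ and $x$ change linearly with the flow parameter. Eliminating the parameter gives $dx/dt=-\xi/\tau$, which equals $-\xi/|\xi|$ on the sheet $\tau=|\xi|$ and $\xi/|\xi|$ on the sheet $\tau=-|\xi|$. The two sheets $\tau=\pm|\xi|$ are therefore the two constant-coefficient half-wave branches, with singularities propagating along straight unit-speed rays in opposite spatial directions for a given $\xi$.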
[/example]
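The two sheets and their ray directions can be checked at a sample covector. The sketch below uses the principal symbol $p=|\xi|^2-\tau^2$ and the Hamilton equations from the example. The function names and the sample covector are hypothetical.

```python
import math

def p(tau, xi):
    # Principal symbol of the constant-coefficient wave operator.
    return sum(c * c for c in xi) - tau * tau

def hamilton_field(tau, xi):
    # (dt/ds, dx/ds, dtau/ds, dxi/ds) = (p_tau, p_xi, -p_t, -p_x).
    return (-2.0 * tau, tuple(2.0 * c for c in xi), 0.0, tuple(0.0 for _ in xi))

def ray_velocity(tau, xi):
    # dx/dt along a bicharacteristic, obtained by eliminating the parameter s.
    t_dot, x_dot, _, _ = hamilton_field(tau, xi)
    return tuple(c / t_dot for c in x_dot)

xi = (3.0, 4.0)
norm = math.hypot(*xi)

on_plus_sheet = p(norm, xi)     # tau = |xi|
on_minus_sheet = p(-norm, xi)   # tau = -|xi|
v_plus = ray_velocity(norm, xi)
v_minus = ray_velocity(-norm, xi)
```

Both sheets lie in the characteristic variety, and the two velocities are opposite unit vectors along $\xi$.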
The preceding example shows that a useful Cauchy parametrix requires more than the equation $p=0$: it requires the characteristic set to split into smooth real sheets over the spatial cotangent variables. The next definition isolates exactly the hypothesis under which the temporal roots give separate Hamilton flows that can be quantised branch by branch.
[definition: Strict Hyperbolicity]
Let $Y$ be a smooth manifold and let $I\subset \mathbb R$ be an open interval. Let
\begin{align*}
P:C^\infty(I\times Y)\longrightarrow C^\infty(I\times Y)
\end{align*}
be a scalar differential operator of order $m$ whose principal symbol
\begin{align*}
p:T^*(I\times Y)\setminus 0\longrightarrow \mathbb R
\end{align*}
is homogeneous of degree $m$ in $(\tau,\eta)$, where $(t,y,\tau,\eta)$ are cotangent coordinates. The operator $P$ is strictly hyperbolic with respect to $t$ on an open conic set $\Gamma\subset I\times (T^*Y\setminus 0)$ if, for every $(t,y,\eta)\in \Gamma$, the polynomial $\tau\mapsto p(t,y,\tau,\eta)$ has $m$ distinct real roots. Since the roots are simple, they depend smoothly on $(t,y,\eta)$ and define functions
\begin{align*}
\lambda_1,\dots,\lambda_m:\Gamma\longrightarrow \mathbb R.
\end{align*}
[/definition]
Strict hyperbolicity turns the characteristic variety into a union of smooth sheets. If the roots collide, the Vandermonde system that recovers the Cauchy data becomes singular and the scalar equation may contain glancing or multiple-characteristic behaviour that is not captured by a finite sum of independent half-wave branches. If the hypersurface $t=0$ is characteristic, the Cauchy data fail to determine the normal derivatives in the same way, so the construction has no well-posed starting surface.
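For second-order operators both situations can be checked by hand. The following is a sketch at the level of principal symbols only: it assumes $D_t=-i\partial_t$ and that each branch satisfies $D_tu_j=\lambda_ju_j$ to leading order, and the branch amplitudes $c_1,c_2$ and frequency-side data $\hat g_0,\hat g_1$ are notation introduced here for illustration. For the model wave symbol $p=|\eta|^2-\tau^2$ the temporal roots are
\begin{align*}
\lambda_1(\eta)=-|\eta|,\qquad \lambda_2(\eta)=|\eta|,\qquad \lambda_2-\lambda_1=2|\eta|\ne 0\quad\text{for }\eta\ne 0,
\end{align*}
so the operator is strictly hyperbolic on all of $I\times(T^*Y\setminus 0)$. Matching the data $u|_{t=0}=g_0$ and $D_tu|_{t=0}=g_1$ with one amplitude on each branch leads to the Vandermonde system
\begin{align*}
\begin{pmatrix}1&1\\ \lambda_1&\lambda_2\end{pmatrix}\begin{pmatrix}c_1\\ c_2\end{pmatrix}=\begin{pmatrix}\hat g_0\\ \hat g_1\end{pmatrix},\qquad \det\begin{pmatrix}1&1\\ \lambda_1&\lambda_2\end{pmatrix}=\lambda_2-\lambda_1,
\end{align*}
which is uniquely solvable exactly when the roots are distinct. By contrast, the symbol $p=(\tau-\eta_1)^2$ has the double root $\lambda_1=\lambda_2=\eta_1$, the determinant vanishes identically, and the two Cauchy data cannot be distributed over independent branches.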
These failure modes explain why the next theorem is formulated microlocally near one conic set of initial covectors. The theorem is needed to turn the geometric splitting into an actual solution operator: it says that, after localising the data and the forcing to compatible conic neighbourhoods, the Cauchy problem is represented modulo smoothing errors by a finite sum of branchwise Fourier integral operators.
[quotetheorem:8213]
[proofunderconstruction:8213]
This result is a construction theorem rather than a formula for one universal kernel. Proper support keeps the Schwartz kernels composable and prevents singularities from entering the microlocal patch from uncontrolled spatial infinity. The scalar and strict-root assumptions are not cosmetic: systems and multiple roots require diagonalisation or normal forms, and the branchwise Fourier integral representation may fail at glancing points. The theorem also does not assert global-in-time existence of one phase function; it gives a local microlocal parametrix whose charts can break down at caustics or when bicharacteristics leave the chosen conic neighbourhood. The rest of the chapter explains its two ingredients: the eikonal equation, which fixes the canonical relation, and the transport equations, which fix the symbolic weights along that relation.
## The Eikonal Equation and the Construction of the Phase
The next problem is to write the Lagrangian relation traced by bicharacteristics as an oscillatory integral. A phase function does this only if its critical set generates the correct canonical relation, and the condition that makes this happen is the eikonal equation.
For a first-order branch written in evolution form with spatial symbol $\lambda(t,y,\eta)$, the phase of a solution operator from $t=0$ to time $t$ is usually written in local coordinates as $\phi(t,y,z,\eta)$, where $z$ is the initial spatial point and $\eta$ is the initial covector. The phase must agree with the initial pairing at $t=0$ and must solve a Hamilton-Jacobi equation.
[definition: Eikonal Equation]
Let $Y$ be a smooth manifold, let $\Omega\subset I\times Y\times Y\times (\mathbb R^n\setminus 0)$ be a conic coordinate domain, and let
\begin{align*}
\lambda:I\times (T^*Y\setminus 0)\longrightarrow \mathbb R
\end{align*}
be a smooth real symbol homogeneous of degree $1$ in the covector variable. A smooth real phase
\begin{align*}
\phi:\Omega\longrightarrow \mathbb R
\end{align*}
solves the eikonal equation for the branch $\lambda$ if, for every $(t,y,z,\eta)\in \Omega$,
\begin{align*}
\partial_t\phi(t,y,z,\eta)+\lambda(t,y,\partial_y\phi(t,y,z,\eta))=0,
\end{align*}
and, on $\Omega\cap\{t=0\}$,
\begin{align*}
\phi(0,y,z,\eta)=(y-z)\cdot \eta.
\end{align*}
[/definition]
The equation is nonlinear in the phase, but the method of characteristics reduces it to ordinary differential equations along the Hamilton curves of $\lambda$. A real homogeneous Hamiltonian is essential here: if the branch is complex, the flow is no longer a real canonical transformation, and if homogeneity is lost the conic Fourier integral calculus no longer matches the scaling of wave front sets. Even with a real branch, the phase can fail as a single chart when the projection of the Lagrangian develops a caustic. Since the parametrix needs a nondegenerate phase rather than only a formal solution, the next result identifies the geometric hypothesis under which Hamilton flow produces a valid oscillatory-integral phase chart.
[quotetheorem:8214]
[citeproof:8214]
The nondegeneracy assumption is the local substitute for the informal phrase "before caustics." When it fails, a smooth Hamilton flow may still exist, but the same Lagrangian can fold over the base variables and require extra phase variables or a different chart. On the round sphere, for instance, geodesics starting near one pole focus at the antipodal point, so a single distance-type phase using only the initial covector cannot cover the conjugate point. If the Hamiltonian is complex, the flow no longer defines a real canonical graph; for example, replacing $|\eta|$ by $|\eta|+i$ gives exponential damping rather than a real Lagrangian propagation law. If homogeneity in $\eta$ is dropped, the conic scaling needed for wave front sets is lost, as happens for a massive dispersion relation
\begin{align*}
\lambda(\eta)=\sqrt{|\eta|^2+1}.
\end{align*}
The theorem does not give a global [generating function](/page/Generating%20Function), nor does it prevent conjugate points at later times. Its role is to produce the local oscillatory chart needed for the branchwise parametrix, after which the amplitude equations can be read along the same bicharacteristics.
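The link between the eikonal equation and the Hamilton flow can be made explicit in a short computation. The following sketch assumes a smooth solution $\phi$ and suppresses the parameters $(z,\eta)$; the curve $y(t)$ and the covector $\zeta(t)$ are notation introduced here. Set $\zeta(t)=\partial_y\phi(t,y(t))$ and let $y(t)$ solve $\dot y=\partial_\eta\lambda(t,y,\zeta)$. Then
\begin{align*}
\frac{d}{dt}\phi(t,y(t))=\partial_t\phi+\partial_y\phi\cdot\dot y=-\lambda(t,y,\zeta)+\zeta\cdot\partial_\eta\lambda(t,y,\zeta)=0,
\end{align*}
where the last equality is Euler's identity $\zeta\cdot\partial_\eta\lambda=\lambda$ for a symbol homogeneous of degree $1$. Differentiating the eikonal equation in $y$ and inserting the result into $\dot\zeta=\partial_t\partial_y\phi+(\partial_y^2\phi)\dot y$ gives
\begin{align*}
\dot\zeta=-\partial_y\lambda(t,y,\zeta).
\end{align*}
So $(y(t),\zeta(t))$ is a Hamilton curve of $\lambda$, and the phase is constant along it.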
[example: Euclidean Half-Wave Phase]
For the positive half-wave equation $(\partial_t+i|D_x|)u=0$ on $\mathbb R^n$, the branch symbol is $\lambda(\xi)=|\xi|$ for $\xi\ne 0$. Consider
\begin{align*}
\phi(t,x,y,\xi)=(x-y)\cdot \xi-t|\xi|.
\end{align*}
Differentiating in $t$ gives
\begin{align*}
\partial_t\phi(t,x,y,\xi)=-|\xi|.
\end{align*}
Differentiating in $x$ gives
\begin{align*}
\partial_x\phi(t,x,y,\xi)=\xi.
\end{align*}
Therefore
\begin{align*}
\partial_t\phi+|\partial_x\phi|=-|\xi|+|\xi|=0.
\end{align*}
At $t=0$ the same formula gives
\begin{align*}
\phi(0,x,y,\xi)=(x-y)\cdot \xi.
\end{align*}
Thus $\phi$ satisfies the eikonal equation with the required initial phase.
The critical equation in the phase variable $\xi$ is
\begin{align*}
\partial_\xi\phi(t,x,y,\xi)=x-y-t\frac{\xi}{|\xi|}=0,
\end{align*}
because $\partial_\xi|\xi|=\xi/|\xi|$ for $\xi\ne 0$. Hence the critical set is described by
\begin{align*}
x=y+t\frac{\xi}{|\xi|}.
\end{align*}
On this critical set, the output covector is
\begin{align*}
\partial_x\phi=\xi,
\end{align*}
and the input covector is
\begin{align*}
-\partial_y\phi=\xi.
\end{align*}
So the canonical relation sends $(y,\xi)$ to $(x,\xi)$ with $x=y+t\xi/|\xi|$.
For $q(t,x,\tau,\xi)=\tau+|\xi|$, Hamilton's equations are
\begin{align*}
\dot t=\partial_\tau q=1,\quad \dot x=\partial_\xi q=\frac{\xi}{|\xi|},\quad \dot\tau=-\partial_t q=0,\quad \dot\xi=-\partial_x q=0.
\end{align*}
Since $\dot t=1$, the flow parameter is $t$ itself, and integrating $\dot x=\xi/|\xi|$ with initial point $x(0)=y$ gives $x(t)=y+t\xi/|\xi|$. The phase therefore parametrises exactly the straight-line half-wave Hamilton flow.
[/example]
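In this constant-coefficient model the phase of the preceding example already gives the exact solution operator. As a sketch, assuming the Fourier convention $\hat g(\xi)=\int e^{-iy\cdot\xi}g(y)\,dy$ and Schwartz data $g$ (both are assumptions made here), the solution of $(\partial_t+i|D_x|)u=0$ with $u|_{t=0}=g$ is
\begin{align*}
u(t,x)=(2\pi)^{-n}\int e^{ix\cdot\xi-it|\xi|}\hat g(\xi)\,d\xi=(2\pi)^{-n}\iint e^{i\phi(t,x,y,\xi)}g(y)\,dy\,d\xi,
\end{align*}
with $\phi(t,x,y,\xi)=(x-y)\cdot\xi-t|\xi|$. Indeed, $\partial_t$ multiplies the integrand by $-i|\xi|$ and $i|D_x|$ multiplies it by $i|\xi|$, so the two contributions cancel. The amplitude is the constant $(2\pi)^{-n}$, and no amplitude correction is needed.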
On a curved manifold, the same construction replaces straight lines by geodesics. Local phases are still action functions, but the existence of a single phase chart is limited by conjugate points.
[example: Riemannian Wave Phase before Conjugate Points]
Let $(Y,g)$ be a Riemannian manifold and consider the positive half-wave branch with
\begin{align*}
\lambda(y,\eta)=|\eta|_g=\left(g^{ij}(y)\eta_i\eta_j\right)^{1/2}
\end{align*}
for $\eta\ne 0$. In local coordinates, Hamilton's equations for $\lambda$ are
\begin{align*}
\dot y^i=\frac{\partial \lambda}{\partial \eta_i}=\frac{g^{ij}(y)\eta_j}{|\eta|_g}.
\end{align*}
They also give
\begin{align*}
\dot\eta_i=-\frac{\partial \lambda}{\partial y^i}=-\frac{1}{2|\eta|_g}\frac{\partial g^{jk}}{\partial y^i}(y)\eta_j\eta_k.
\end{align*}
The base curve has unit speed, because
\begin{align*}
g_{ij}(y)\dot y^i\dot y^j=g_{ij}(y)\frac{g^{ik}(y)\eta_k}{|\eta|_g}\frac{g^{j\ell}(y)\eta_\ell}{|\eta|_g}=\frac{g^{k\ell}(y)\eta_k\eta_\ell}{|\eta|_g^2}=1.
\end{align*}
Thus the projection of the Hamilton flow is the unit-speed geodesic determined by the initial covector direction.
Fix $(z,\eta)\in T^*Y\setminus 0$ and write
\begin{align*}
v^i=\frac{g^{ij}(z)\eta_j}{|\eta|_g}.
\end{align*}
Then $|v|_g=1$, and before the first conjugate time the differential of $\exp_z$ at $tv$ is invertible, so $\exp_z$ is a local diffeomorphism near $tv$. Hence the endpoint
\begin{align*}
y(t)=\exp_z(tv)
\end{align*}
determines a single local geodesic branch. Along that branch,
\begin{align*}
d_g(y(t),z)=|t|
\end{align*}
for $t$ in the chosen normal neighbourhood, since the geodesic segment is length-minimising there and has speed $1$.
The associated phase may therefore be chosen as a local Hamilton-Jacobi generating function for this geodesic flow: it satisfies
\begin{align*}
\partial_t\phi(t,y,z,\eta)+|\partial_y\phi(t,y,z,\eta)|_g=0
\end{align*}
and
\begin{align*}
\phi(0,y,z,\eta)=(y-z)\cdot \eta
\end{align*}
in the coordinate chart. Its critical set in the phase variables records precisely the relation $y=\exp_z(t\,g^{-1}\eta/|\eta|_g)$, while the output covector is the geodesically transported covector along this unit-speed ray. The restriction to times before conjugate points is exactly the condition that this single phase chart still parametrises one smooth branch of the wave kernel.
[/example]
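Two quick consistency checks on the preceding example, written as a sketch in the same local coordinates. First, $\lambda$ is conserved along its own Hamilton flow:
\begin{align*}
\frac{d}{dt}\lambda(y,\eta)=\frac{\partial\lambda}{\partial y^i}\dot y^i+\frac{\partial\lambda}{\partial\eta_i}\dot\eta_i=\frac{\partial\lambda}{\partial y^i}\frac{\partial\lambda}{\partial\eta_i}-\frac{\partial\lambda}{\partial\eta_i}\frac{\partial\lambda}{\partial y^i}=0,
\end{align*}
so $|\eta|_g$ is constant along each ray. Second, in the flat case $g^{ij}=\delta^{ij}$ the equations reduce to $\dot y=\eta/|\eta|$ and $\dot\eta=0$, the exponential map is $\exp_z(tv)=z+tv$, and
\begin{align*}
y(t)=z+t\frac{\eta}{|\eta|},\qquad \phi(t,y,z,\eta)=(y-z)\cdot\eta-t|\eta|,
\end{align*}
which is the Euclidean half-wave phase.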
The phase determines where singularities may go, but it does not determine their strength. That information is encoded in the amplitude, whose leading term already contains geometric spreading along the Hamilton flow.
## Transport Equations for Amplitudes
Once a phase solves the eikonal equation, applying the differential operator to an oscillatory ansatz produces an expansion in homogeneous orders of the frequency variable. The eikonal equation cancels the top order term; the remaining equations recursively determine the amplitude.
Consider an oscillatory ansatz for a branch of a hyperbolic equation,
\begin{align*}
u(t,y)=\int e^{i\phi(t,y,z,\eta)}a(t,y,z,\eta)g(z)\,d\eta\,dz,
\end{align*}
where $a\sim \sum_{k\ge 0}a_{r-k}$ is a classical symbol. The amplitudes are chosen so that $Pu$ is smooth modulo the desired error order.
[definition: Transport Equation]
Let $P:C^\infty(I\times Y)\to C^\infty(I\times Y)$ be a scalar differential operator of order $m$, let $\phi:\Omega\to \mathbb R$ be a nondegenerate eikonal phase on a conic coordinate domain $\Omega$, and let $S^r_{\mathrm{cl}}(\Omega)$ denote classical symbols in the frequency variable. The transport operator at order $r-\ell$ is the linear map
\begin{align*}
T_\ell:S^{r-\ell}_{\mathrm{cl}}(\Omega)\longrightarrow S^{r-\ell+m-1}_{\mathrm{cl}}(\Omega)/S^{r-\ell+m-2}_{\mathrm{cl}}(\Omega)
\end{align*}
defined by taking the homogeneous component of degree $r-\ell+m-1$ in the expansion of $e^{-i\phi}P(e^{i\phi}a)$. The transport equations for an amplitude $a\sim\sum_{\ell\ge 0}a_{r-\ell}$ are the relations
\begin{align*}
T_0a_r=0,\qquad
T_\ell a_{r-\ell}=F_\ell(a_r,\dots,a_{r-\ell+1}),\qquad \ell\ge 1,
\end{align*}
where, for $\ell\ge 1$,
\begin{align*}
F_\ell:\prod_{j=0}^{\ell-1}S^{r-j}_{\mathrm{cl}}(\Omega)\longrightarrow S^{r-\ell+m-1}_{\mathrm{cl}}(\Omega)/S^{r-\ell+m-2}_{\mathrm{cl}}(\Omega)
\end{align*}
is the map determined by lower-order symbol terms and the requirement that the full expansion vanish to infinite order along the critical set.
[/definition]
The first transport equation is the most important one: it moves the principal amplitude along the Hamilton flow. Without a nondegenerate phase, these equations do not describe a symbol on a clean Lagrangian chart; without initial data on a transverse hypersurface, the first-order equations along rays may not have unique solutions. Lower-order terms also matter, because the subprincipal symbol enters the zeroth-order coefficient of the first transport equation, which is itself homogeneous in $a_r$. To complete the parametrix, the recursive equations must be solvable with initial values fixed by the Cauchy data, and the next theorem gives that solvability statement.
[quotetheorem:8215]
[citeproof:8215]
This theorem explains why parametrices depend on the subprincipal symbol even though the path of propagation is governed by the principal symbol. Transversality to the initial hypersurface is needed because the transport equations are solved as initial-value problems along rays; if a ray grazes the hypersurface, the assigned initial amplitude need not determine a unique transported density. Nondegeneracy of the phase is needed because amplitudes differing by functions vanishing on the critical set represent the same Lagrangian distribution, while degenerate parametrisations can introduce spurious symbol data. The theorem does not select a canonical normalisation by itself; the Cauchy matching or fundamental-solution normalisation supplies that extra information.
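For orientation, the leading transport operator can be written out in local coordinates. The following is a sketch under conventions stated here rather than taken from the theorem: $w=(t,y)$ denotes the spacetime variable with dual covector $\omega=(\tau,\eta)$, the operator is $P=p(w,D_w)$ with $D_w=-i\partial_w$, and the full symbol is $p=p_m+p_{m-1}+\cdots$ with $p_j$ homogeneous of degree $j$ in $\omega$. Expanding $e^{-i\phi}P(e^{i\phi}a)$ and using $p_m(w,\partial_w\phi)=0$, the terms of degree $r+m-1$ are
\begin{align*}
T_0a_r=\frac1i\sum_j \frac{\partial p_m}{\partial\omega_j}(w,\partial_w\phi)\,\frac{\partial a_r}{\partial w_j}+\left(\frac{1}{2i}\sum_{j,k}\frac{\partial^2p_m}{\partial\omega_j\partial\omega_k}(w,\partial_w\phi)\,\frac{\partial^2\phi}{\partial w_j\partial w_k}+p_{m-1}(w,\partial_w\phi)\right)a_r.
\end{align*}
The first sum differentiates $a_r$ along the projected Hamilton field of $p_m$. The bracket is a zeroth-order coefficient containing the Hessian of the phase, which measures the spreading of rays, and the lower-order symbol $p_{m-1}$, through which the subprincipal symbol enters.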
[example: Geometric Optics Approximation]
For the model scalar wave operator $P=\partial_t^2-\Delta_x$, write
\begin{align*}
u_h(t,x)=e^{i\phi(t,x)/h}a(t,x;h),\qquad a(t,x;h)=a_0(t,x)+h a_1(t,x)+\cdots .
\end{align*}
We compute the first two orders in $h$ by applying $h^2P$ to $e^{i\phi/h}a$. The time derivatives are
\begin{align*}
\partial_t(e^{i\phi/h}a)=e^{i\phi/h}\left(\frac{i}{h}\phi_t a+\partial_t a\right)
\end{align*}
and
\begin{align*}
\partial_t^2(e^{i\phi/h}a)=e^{i\phi/h}\left(-\frac{\phi_t^2}{h^2}a+\frac{2i}{h}\phi_t\partial_t a+\frac{i}{h}\phi_{tt}a+\partial_t^2a\right).
\end{align*}
For the spatial Laplacian,
\begin{align*}
\Delta_x(e^{i\phi/h}a)=e^{i\phi/h}\left(-\frac{|\nabla_x\phi|^2}{h^2}a+\frac{2i}{h}\nabla_x\phi\cdot\nabla_x a+\frac{i}{h}(\Delta_x\phi)a+\Delta_xa\right).
\end{align*}
Subtracting the Laplacian term from the time term gives
\begin{align*}
e^{-i\phi/h}h^2P(e^{i\phi/h}a)=\left(|\nabla_x\phi|^2-\phi_t^2\right)a+ih\left(2\phi_t\partial_ta-2\nabla_x\phi\cdot\nabla_xa+(\phi_{tt}-\Delta_x\phi)a\right)+h^2(\partial_t^2a-\Delta_xa).
\end{align*}
For $u_h$ to solve $Pu_h=0$ to leading order in $h$, the coefficient of $h^0$ in this expansion must vanish. If $a_0$ is not identically zero, this gives the eikonal equation
\begin{align*}
\phi_t^2=|\nabla_x\phi|^2.
\end{align*}
The coefficient of $h^1$ then gives the leading transport equation for $a_0$:
\begin{align*}
2\phi_t\partial_ta_0-2\nabla_x\phi\cdot\nabla_xa_0+(\phi_{tt}-\Delta_x\phi)a_0=0.
\end{align*}
The vector field $2\phi_t\partial_t-2\nabla_x\phi\cdot\nabla_x$ is tangent to the rays determined by the eikonal equation, while the scalar term $\phi_{tt}-\Delta_x\phi$ changes the size of $a_0$ when neighbouring rays spread out or focus. Thus the phase determines the travel direction of wave fronts, and the transport equation determines the leading intensity change along those rays.
[/example]
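The effect of the scalar term in the preceding example can be seen in a closed-form case. As a sketch, take the outgoing spherical phase $\phi(t,x)=|x|-t$ on $\{x\ne 0\}\subset\mathbb R^n$ and write $r=|x|$; this choice of phase and the profile $f$ below are illustrative assumptions. Then
\begin{align*}
\phi_t=-1,\qquad \nabla_x\phi=\frac{x}{r},\qquad \phi_{tt}=0,\qquad \Delta_x\phi=\frac{n-1}{r},
\end{align*}
so $\phi_t^2=|\nabla_x\phi|^2=1$, and since $\nabla_x\phi\cdot\nabla_x=\partial_r$ the leading transport equation becomes
\begin{align*}
-2\partial_ta_0-2\partial_ra_0-\frac{n-1}{r}a_0=0,\qquad\text{that is,}\qquad (\partial_t+\partial_r)a_0=-\frac{n-1}{2r}a_0.
\end{align*}
Its solutions are
\begin{align*}
a_0(t,x)=r^{-(n-1)/2}f\!\left(r-t,\frac{x}{r}\right)
\end{align*}
for an arbitrary smooth profile $f$. The factor $r^{-(n-1)/2}$ is the amplitude decay produced by rays spreading over spheres of area proportional to $r^{n-1}$. For a plane phase $\phi=x\cdot\omega-t$ with $|\omega|=1$ the scalar term vanishes and $a_0$ is simply transported, $a_0(t,x)=f(x-t\omega)$.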
Transport equations also give the local form of the Hadamard parametrix. The geometric optics example tracks a single high-frequency family and shows how amplitudes move along one ray family, but a fundamental solution has a stronger normalisation problem: it must reproduce the delta distribution on the diagonal while respecting causal support. That diagonal normalisation is not visible in the single-ray ansatz above, and it is the extra condition that turns transported amplitudes into a kernel that inverts the wave operator modulo a smooth remainder. The next theorem is needed to package the same eikonal and transport mechanism into future and past microlocal fundamental parametrices near the light cone.
Before stating the Hadamard form, the background geometry has to be fixed. A "normally hyperbolic operator" is not a condition on an arbitrary smooth manifold; it is tied to a Lorentzian metric and, for systems, to a vector bundle. Future and past choices also matter, because the same local symbolic construction can be normalised with different one-sided support conditions. The statement below is local and microlocal: it asserts the Hadamard parametrix form in a small causal neighbourhood, while actual global advanced or retarded fundamental solutions require global hypotheses such as global hyperbolicity.
[quotetheorem:8216]
[citeproof:8216]
This microlocal version is the bridge between the classical light-cone expansion and the Fourier integral operator calculus. The Lorentzian and time-orientation hypotheses are needed to distinguish future from past parametrices and to give meaning to the support conditions $J^\pm$. Geodesic convexity rules out multiple null geodesics between nearby points; without it, the kernel is a sum over several branches and may encounter caustics. The statement does not solve global existence on an arbitrary spacetime, since global hyperbolicity and boundary conditions are separate issues. What it does provide is the local model that the wave-kernel construction uses in each null branch.
## The Wave Kernel as a Sum of Lagrangian Distributions
The final problem is to assemble the local oscillatory pieces into a statement about the Schwartz kernel of the solution operator. The answer is that the wave kernel is not merely singular on the light cone; it is a Lagrangian distribution associated with the canonical relation generated by the bicharacteristic flow.
Let $U(t)$ denote a wave or half-wave propagator on a manifold $Y$. Its Schwartz kernel $K_U(t,y,z)$ is a distribution on $\mathbb R_t\times Y\times Y$, and its wave front set records the input covectors at $z$ and the output covectors at $y$.
[definition: Bicharacteristic Canonical Relation]
Let $X$ be a smooth manifold, let $p:T^*X\setminus 0\to \mathbb R$ be a real principal type symbol, and let $S_0,S_1\subset X$ be hypersurfaces that are noncharacteristic for $p$. Write $\iota_j:S_j\hookrightarrow X$ and let
\begin{align*}
\rho_j:p^{-1}(0)\cap (T^*X)|_{S_j}\longrightarrow T^*S_j\setminus 0
\end{align*}
be the restriction of covectors by $\rho_j(x,\xi)=(x,\iota_j^*\xi)$. Here $(T^*X)|_{S_j}$ means the cotangent bundle of $X$ restricted to base points in $S_j$, not the conormal bundle of $S_j$. Let $D\subset \mathbb R\times p^{-1}(0)$ be an open conic flow domain and let
\begin{align*}
\Phi:D\longrightarrow p^{-1}(0),\qquad \Phi(s,\alpha)=\Phi_s(\alpha),
\end{align*}
be the Hamilton flow of $H_p$ restricted to the characteristic set. For each $s\in \mathbb R$, write $D_s=\{\alpha\in p^{-1}(0):(s,\alpha)\in D\}$, so $\Phi_s:D_s\to p^{-1}(0)$ is a smooth homogeneous transformation. The bicharacteristic canonical relation from $S_0$ to $S_1$ is the conic relation
\begin{align*}
C_{S_1,S_0}
=\{(\rho_1(\Phi_s(\alpha)),\rho_0(\alpha))\in (T^*S_1\setminus 0)\times (T^*S_0\setminus 0): s\in\mathbb R,\ \alpha\in D_s\cap p^{-1}(0)\cap (T^*X)|_{S_0},\ \Phi_s(\alpha)\in (T^*X)|_{S_1}\}.
\end{align*}
[/definition]
When this relation is used as the canonical relation of a kernel on $S_1\times S_0$, the source covector is inserted with the standard opposite sign. This relation is the graph of a canonical transformation as long as the flow remains single-valued between the chosen hypersurfaces. At caustics or after global returns, it may be represented only locally by phase charts, and the wave kernel is then written as a finite or locally finite sum of Lagrangian distributions.
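As a sketch of the definition in the simplest case, take $X=\mathbb R^{1+n}$, $p=|\xi|^2-\tau^2$, $S_0=\{t=0\}$ and $S_1=\{t=T\}$ with $T\ne 0$; the time $T$ is notation introduced here. Both slices are noncharacteristic, since their conormal covectors $(\tau,0)$ with $\tau\ne 0$ satisfy $p=-\tau^2\ne 0$. The maps $\rho_j$ forget $\tau$, so each $(y,\xi)\in T^*S_0\setminus 0$ has exactly two characteristic lifts, $\tau=\pm|\xi|$. Along the flow $t(s)=-2\tau s$, $x(s)=y+2\xi s$, the slice $t=T$ is reached at $s=-T/(2\tau)$, where $x=y-T\xi/\tau$. Hence
\begin{align*}
C_{S_1,S_0}=\left\{\left(\left(y+T\tfrac{\xi}{|\xi|},\xi\right),(y,\xi)\right):\xi\ne 0\right\}\cup\left\{\left(\left(y-T\tfrac{\xi}{|\xi|},\xi\right),(y,\xi)\right):\xi\ne 0\right\},
\end{align*}
a union of two graphs of canonical transformations, one for each half-wave branch.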
The definition gives the geometric target for the kernel calculus, but a solution operator also carries amplitudes and branch multiplicities. The next theorem is needed to pass from the set-theoretic flow relation to the analytic statement that the Schwartz kernel of the propagator is, microlocally, a finite sum of Lagrangian distributions associated with those graphs.
[quotetheorem:8217]
[citeproof:8217]
This theorem is the operational form of propagation of singularities for the Cauchy problem. Proper support and the finite chart condition keep the kernel calculus local. If proper support is dropped on a noncompact manifold, a kernel may receive contributions from data escaping to spatial infinity, so composition with cutoffs need not remain controlled in the chosen patch. If infinitely many branches or returns accumulate in the same compact time interval, as can occur for recurrent geodesic flow after repeated returns to a coordinate neighbourhood, a finite oscillatory sum need not represent the operator in one chart system. Strict hyperbolicity is also essential: at a glancing or double root, such as a model symbol with two coincident temporal roots, the Vandermonde matching of Cauchy data degenerates and the propagator need not split into independent Hamilton graphs. The theorem describes the kernel modulo smoothing terms, so it controls wave front sets rather than giving a pointwise formula for all solutions. If the initial data are singular at a covector, the possible output singularities lie along the Hamilton flow from that covector, with multiplicity and order governed by the amplitude.
The kernel statement still leaves a basic support question open. A Fourier integral representation tells where singularities can travel microlocally, but by itself it does not rule out a smooth tail appearing instantly far away from the initial support. For retarded parametrices one needs a stronger domain-of-dependence conclusion: no influence should arrive outside the cone swept out by the characteristic Hamilton flow. The next result supplies exactly that finite-speed upgrade, combining the characteristic geometry with uniqueness so that propagation controls support, not only wave front sets.
[quotetheorem:8218]
[citeproof:8218]
The microlocal proof separates two ideas that are often merged in energy arguments: the geometry of possible influence comes from the characteristic flow, while uniqueness converts absence of incoming data into vanishing. The bounded-speed hypothesis is needed because an unbounded Hamilton velocity would give no finite cone in the chosen coordinates. The energy uniqueness hypothesis is also essential: absence of a wave front component rules out singular influence, but a smooth nonzero solution could still remain unless support is controlled by a domain-of-dependence theorem. This viewpoint also explains why lower-order terms can affect amplitudes but cannot enlarge the characteristic cone.
[example: Finite Speed for the Euclidean Wave Equation]
For $P=\partial_t^2-\Delta_x$ on $\mathbb R^{1+n}$, the principal symbol is
\begin{align*}
p(t,x,\tau,\xi)=|\xi|^2-\tau^2.
\end{align*}
Hamilton's equations are
\begin{align*}
\dot t=\partial_\tau p=-2\tau,\quad \dot x=\partial_\xi p=2\xi,\quad \dot\tau=-\partial_t p=0,\quad \dot\xi=-\partial_x p=0.
\end{align*}
Thus $\tau$ and $\xi$ are constant along the flow. On the characteristic set $|\xi|^2-\tau^2=0$, we have $\tau=|\xi|$ or $\tau=-|\xi|$ with $\xi\ne 0$. Since $\dot t=-2\tau$, using $t$ as the time variable gives
\begin{align*}
\frac{dx}{dt}=\frac{\dot x}{\dot t}=\frac{2\xi}{-2\tau}=-\frac{\xi}{\tau}.
\end{align*}
On the sheet $\tau=|\xi|$, this is $dx/dt=-\xi/|\xi|$; on the sheet $\tau=-|\xi|$, this is $dx/dt=\xi/|\xi|$. Therefore a bicharacteristic starting over $(0,y)$ reaches
\begin{align*}
x=y-t\frac{\xi}{|\xi|}
\end{align*}
on one sheet and
\begin{align*}
x=y+t\frac{\xi}{|\xi|}
\end{align*}
on the other.
In both cases,
\begin{align*}
|x-y|=\left|t\frac{\xi}{|\xi|}\right|=|t|\frac{|\xi|}{|\xi|}=|t|.
\end{align*}
So the base projection of the characteristic flow is contained in the Euclidean light cone relation $|x-y|=|t|$. Conversely, if $|x-y|=|t|$ and $t\ne 0$, choosing $\xi$ parallel to $x-y$ gives $x=y+t\xi/|\xi|$ or $x=y-t\xi/|\xi|$, depending on the sign of $t$ and the branch. Hence the wave kernel can have singularities only on the light cone, together with the initial diagonal at $t=0$.
The corresponding support statement is that if the Cauchy data are supported in $E\subset \mathbb R^n$, then points influenced at time $t$ must lie in
\begin{align*}
\{x\in\mathbb R^n:\operatorname{dist}(x,E)\le |t|\}.
\end{align*}
Thus the Euclidean wave equation has propagation speed $1$: singularities and support cannot move farther than distance $|t|$ in time $t$.
[/example]
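In one space dimension the support statement of the preceding example can be read off from d'Alembert's formula. As a sketch, for Cauchy data $u|_{t=0}=u_0$ and $\partial_tu|_{t=0}=u_1$ that are smooth with support in $E\subset\mathbb R$ (smoothness is assumed here only to keep the formula classical),
\begin{align*}
u(t,x)=\frac12u_0(x+t)+\frac12u_0(x-t)+\frac12\int_{x-t}^{x+t}u_1(s)\,ds.
\end{align*}
If $\operatorname{dist}(x,E)>|t|$, then $x\pm t\notin E$ and the interval between $x-t$ and $x+t$ does not meet $E$, so all three terms vanish and $u(t,x)=0$.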
The last structural result records the precise relationship between the singular support of the kernel and the geometry of the flow. It is weaker than the full wave front set statement but often easier to use when only the base variables matter.
[quotetheorem:8219]
[citeproof:8219]
The projection from wave front set to singular support loses covector information, so the theorem is intentionally weaker than propagation of singularities. Cancellation between branches can remove some base singularities, which is why the conclusion is an inclusion rather than an equality: in one-dimensional wave propagation, the two half-wave branches can combine so that a derivative of the kernel has a visible singularity where the undifferentiated kernel has a cancellation. The proper-support hypothesis prevents unrelated singularities at spatial infinity from entering the kernel under composition; on a noncompact manifold, an improperly supported kernel can move a compactly supported cutoff into a distribution whose singular behaviour depends on uncontrolled far-away data. Geodesic convexity and the finite-chart assumptions have a separate role: if several geodesic branches connect the same base points, the base projection may contain overlapping contributions with different covectors, and singular support alone cannot tell which branch produced them. This completes the parametrix construction for hyperbolic equations in the form needed later in applications: the wave kernel is built from a Lagrangian relation determined by the principal Hamilton flow and a classical amplitude determined by transport along that relation.
Parametrix construction completes the local microlocal picture for hyperbolic equations, but its value is clearest in concrete problems. The final chapter applies this machinery to wave propagation, tomography, inverse spectral questions, and scattering, where the abstract calculus becomes an analytic method for extracting geometric information.
# 12. Applications and Case Studies
## Propagation from Conormal Initial Data
This final chapter applies the microlocal calculus developed in Chapters 1 through 11 to concrete analytic problems: wave propagation, inverse spectral geometry, tomography, and scattering. The guiding question is always the same: given an operator or evolution equation, which covectors can carry singularities from the input to the output, and when can those singularities be detected or recovered? The prerequisites are the wave front set, conormal distributions, Hamiltonian flow of principal symbols, Fourier integral operators, and ellipticity.
The first problem is to predict what a wave equation does to an initial singularity that is already geometrically organised. A conormal distribution is singular along a submanifold and oscillates only in covectors normal to that submanifold, so it is the natural initial datum for a wave hitting or emitted from an interface.
[definition: Conormal Initial Datum]
Let $Y \subset \mathbb R^n$ be an embedded submanifold. A distribution $u_0 \in \mathcal D'(\mathbb R^n)$ is a conormal initial datum of order $m$ along $Y$ if $u_0\in I^m(\mathbb R^n,Y)$, meaning that in every coordinate chart in which $Y=\{x''=0\}$, with $x=(x',x'')\in\mathbb R^k\times\mathbb R^{n-k}$, $u_0$ is a finite sum of a smooth function and oscillatory integrals of the form
\begin{align*}
u_0(x)=\int_{\mathbb R^{n-k}} e^{i x''\cdot\theta}a(x',\theta)\,d\mathcal L^{n-k}(\theta),
\end{align*}
where, using the Lagrangian distribution order convention from Chapter 8, $a\in S^{m+n/4-(n-k)/2}_{\mathrm{cl}}$ in the variable $\theta$. A pair $(u_0,u_1)$ is conormal Cauchy data along $Y$ if each component belongs to some conormal class $I^{m_j}(\mathbb R^n,Y)$.
[/definition]
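Two standard cases help calibrate the order convention. The following is a sketch; the cutoff $\chi\in C_c^\infty(\mathbb R)$ with $\chi=1$ near $0$ is an auxiliary choice made here. For the delta function on $Y=\{x''=0\}$,
\begin{align*}
\delta(x'')=(2\pi)^{-(n-k)}\int_{\mathbb R^{n-k}}e^{ix''\cdot\theta}\,d\mathcal L^{n-k}(\theta),
\end{align*}
the amplitude is constant, hence of order $0$, so $m+n/4-(n-k)/2=0$ and the delta function on $Y$ lies in $I^{(n-k)/2-n/4}(\mathbb R^n,Y)$. For a point, $k=0$ and $\delta_0\in I^{n/4}(\mathbb R^n,\{0\})$. For the Heaviside function across a hyperplane, where $n-k=1$ and $x''\in\mathbb R$,
\begin{align*}
H(x'')=\frac{1}{2\pi}\int_{\mathbb R}e^{ix''\theta}\,\frac{1-\chi(\theta)}{i\theta}\,d\mathcal L^{1}(\theta)+(\text{smooth function}),
\end{align*}
the amplitude has order $-1$, so $m+n/4-1/2=-1$ and $H(x'')\in I^{-1/2-n/4}(\mathbb R^n,Y)$. A jump is thus one order more regular than a delta layer on the same hyperplane.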
The symbol estimates in the transverse frequency variable are the extra structure that distinguishes conormal distributions from arbitrary distributions whose wave front set happens to lie in the same conormal bundle. As a consequence of the definition,
\begin{align*}
WF(u_0) \subset N^*Y \setminus 0,
\end{align*}
where
\begin{align*}
N^*Y = \{(y,\xi) \in T^*\mathbb R^n : y \in Y,\ \xi(v)=0 \text{ for all } v \in T_yY\}.
\end{align*}
This definition packages both the geometric direction and the regularity pattern of the singularity: a jump across a hypersurface has symbol-controlled oscillation in the normal covariable, while a point source has every nonzero covector over the source point with a homogeneous amplitude. The next question is whether solving the wave equation creates arbitrary new singular directions or only transports these conormal directions along the characteristic geometry.
[quotetheorem:8220]
[citeproof:8220]
The theorem is an inclusion because cancellations can remove some singularities, but the geometry of possible propagation is fixed by the characteristic Hamilton flow. The compact-support hypothesis on the Cauchy data is not cosmetic: for data such as an infinite plane wave or a noncompact hyperplane jump, singularities already extend to spatial infinity, so global statements about the solution in $\mathcal D'(\mathbb R_t\times\mathbb R_x^n)$ no longer follow from a compactly supported parametrix without adding proper-support or finite-propagation cutoffs. The conormal hypothesis is what makes the initial singularities geometrically controlled; for instance, if a distribution has a crossing singularity at the origin with wave front directions in $N^*\{x_1=0\}\cup N^*\{x_2=0\}$, there is no single smooth hypersurface $Y$ whose conormal bundle describes the initial front, and the outgoing fronts split into two unrelated families. The characteristic lift condition is also essential: a covector over the Cauchy surface with $\tau^2\ne|\xi|^2$ is noncharacteristic for $\partial_t^2-\Delta_x$, so elliptic regularity in spacetime removes it instead of propagating it as a wave. The theorem does not assert that every transported covector remains singular; choosing Cauchy data whose two half-wave amplitudes cancel on one branch can erase that branch. The examples below show how the same statement specialises to a codimension-one jump, where only normal directions occur, and to a point source, where every spatial direction is present.
[example: Wave Fronts from a Hyperplane Jump]
This is a local model for the preceding theorem rather than an instance of its compact-support hypotheses. Let $u_0(x)=H(x_1)$ on $\mathbb R^n$ and $u_1=0$; after multiplying by a cutoff equal to $1$ near a chosen point of $\{x_1=0\}$, the same microlocal calculation applies near that point. Since $H(x_1)$ is smooth where $x_1\ne0$ and has a one-dimensional jump across $x_1=0$, its singular covectors are precisely the nonzero multiples of the normal covector $dx_1$:
\begin{align*}
WF(u_0)=\{(x,\lambda dx_1): x_1=0,\ \lambda \ne 0\}.
\end{align*}
For the wave symbol $p(t,x,\tau,\xi)=\tau^2-|\xi|^2$, the Hamilton vector field is
\begin{align*}
H_p=(\partial_\tau p)\partial_t-(\partial_t p)\partial_\tau+\sum_{j=1}^n(\partial_{\xi_j}p)\partial_{x_j}-\sum_{j=1}^n(\partial_{x_j}p)\partial_{\xi_j}=2\tau\partial_t-2\sum_{j=1}^n\xi_j\partial_{x_j}.
\end{align*}
Starting from a conormal covector over the initial slice,
\begin{align*}
(t(0),x(0),\tau(0),\xi(0))=(0,y,\tau,\lambda dx_1),\qquad y_1=0,\quad \lambda\ne0,
\end{align*}
the characteristic condition is
\begin{align*}
0=p(0,y,\tau,\lambda dx_1)=\tau^2-\lambda^2,
\end{align*}
so $\tau=\pm|\lambda|$. Hamilton's equations give
\begin{align*}
\dot t=2\tau,\qquad \dot x_1=-2\lambda,\qquad \dot x_j=0\text{ for }j\ge2,\qquad \dot\tau=0,\qquad \dot\xi=0.
\end{align*}
Thus
\begin{align*}
t(s)=2\tau s,\qquad x_1(s)=-2\lambda s,\qquad x_j(s)=y_j\text{ for }j\ge2.
\end{align*}
When $t(s)\ne0$,
\begin{align*}
\frac{x_1(s)}{t(s)}=\frac{-2\lambda s}{2\tau s}=-\frac{\lambda}{\tau}.
\end{align*}
Since $\tau=\pm|\lambda|$, this ratio is either $1$ or $-1$, and the projected fronts are exactly the two hyperplanes $x_1=t$ and $x_1=-t$.
In one space dimension, the same two fronts are visible from [d'Alembert's formula](/theorems/665). With $u_1=0$,
\begin{align*}
u(t,x)=\frac{1}{2}u_0(x+t)+\frac{1}{2}u_0(x-t)=\frac12H(x+t)+\frac12H(x-t).
\end{align*}
The term $H(x+t)$ jumps on $x=-t$, and the term $H(x-t)$ jumps on $x=t$, so the initial jump splits into two travelling conormal jumps rather than creating arbitrary new singular directions.
[/example]
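The ray computation and the one-dimensional d'Alembert formula from the example can be checked numerically. The following is a minimal sketch, not part of the course's formal development; the function names `free_wave_ray`, `heaviside`, and `dalembert_jump` are hypothetical helpers introduced only for this illustration.

```python
def free_wave_ray(y, tau, xi, s):
    """Projected bicharacteristic of p = tau^2 - |xi|^2 starting at (t, x) = (0, y).

    Hamilton's equations give t' = 2*tau and x' = -2*xi with tau and xi constant,
    so the ray is a straight line in (t, x).
    """
    t = 2.0 * tau * s
    x = [yj - 2.0 * xij * s for yj, xij in zip(y, xi)]
    return t, x


def heaviside(v):
    """Heaviside step with the convention H(0) = 0 (the value at 0 is immaterial)."""
    return 1.0 if v > 0 else 0.0


def dalembert_jump(t, x):
    """d'Alembert solution in one space dimension for u0 = H and u1 = 0."""
    return 0.5 * heaviside(x + t) + 0.5 * heaviside(x - t)


if __name__ == "__main__":
    lam = 3.0
    # The two characteristic lifts tau = +|lam| and tau = -|lam| of lam*dx_1.
    for tau in (abs(lam), -abs(lam)):
        t, x = free_wave_ray([0.0, 5.0], tau, [lam, 0.0], s=0.25)
        print("tau =", tau, " x_1/t =", x[0] / t, " x_2 =", x[1])
    # At time t = 1 the solution takes three values, separated by x = -1 and x = 1.
    print([dalembert_jump(1.0, v) for v in (-2.0, 0.0, 2.0)])
```

The ratio $x_1/t$ comes out as $-1$ for $\tau=|\lambda|$ and $+1$ for $\tau=-|\lambda|$, while the tangential coordinate stays fixed, in agreement with the two fronts $x_1=\pm t$.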
This example is rigid because the initial hypersurface has a single normal direction at each point, up to sign. The propagation theorem then has only two characteristic choices to follow, corresponding to the two half-waves. A point source is the opposite limiting case: the singular set is smaller in physical space, but its fibre in the wave front set contains every nonzero spatial covector, so the wave equation generates the full light cone rather than a pair of parallel fronts.
[example: Singularities Produced by a Point Source]
Let $u_0=\delta_0$ and $u_1=0$ on $\mathbb R^n$. The point mass is smooth away from the origin, and its wave front set is the full punctured cotangent fibre over $0$:
\begin{align*}
WF(\delta_0)=\{(0,\xi):\xi\ne0\}.
\end{align*}
Thus the initial conormal directions are not restricted to one normal line; every nonzero spatial covector $\xi$ is present.
For the wave symbol
\begin{align*}
p(t,x,\tau,\xi)=\tau^2-|\xi|^2,
\end{align*}
the derivatives are $\partial_\tau p=2\tau$, $\partial_{\xi_j}p=-2\xi_j$, $\partial_t p=0$, and $\partial_{x_j}p=0$. Hence the Hamilton vector field is
\begin{align*}
H_p=2\tau\partial_t-2\sum_{j=1}^n \xi_j\partial_{x_j}.
\end{align*}
Starting from an initial covector
\begin{align*}
(t(0),x(0),\tau(0),\xi(0))=(0,0,\tau,\xi),\qquad \xi\ne0,
\end{align*}
the characteristic condition is
\begin{align*}
0=p(0,0,\tau,\xi)=\tau^2-|\xi|^2.
\end{align*}
Therefore $\tau^2=|\xi|^2$, so $\tau=\pm|\xi|$. Hamilton's equations are
\begin{align*}
\dot t=2\tau,\quad \dot x=-2\xi,\quad \dot\tau=0,\quad \dot\xi=0.
\end{align*}
Since $\tau$ and $\xi$ are constant along the bicharacteristic, integration from $s=0$ gives
\begin{align*}
t(s)=2\tau s,\quad x(s)=-2\xi s.
\end{align*}
Taking Euclidean norms,
\begin{align*}
|x(s)|=|-2\xi s|=2|\xi|\,|s|.
\end{align*}
Also,
\begin{align*}
|t(s)|=|2\tau s|=2|\tau|\,|s|=2|\xi|\,|s|.
\end{align*}
Thus every projected bicharacteristic from the point source satisfies
\begin{align*}
|x(s)|=|t(s)|.
\end{align*}
Conversely, if $t\ne0$ and $|x|=|t|$, choose $\xi$ parallel to $-x$ and choose $\tau$ with sign matching $t$; then $x/t=-\xi/\tau$, so the same formula $x(s)=-2\xi s$, $t(s)=2\tau s$ reaches that point on the cone. The initial point singularity therefore propagates into the full spherical wave front $|x|=|t|$, because the fibre $WF(\delta_0)$ contains every nonzero spatial direction.
[/example]
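The converse step of the example, which produces a bicharacteristic through any prescribed point of the punctured cone, can be sketched as follows. This is an illustrative check under the normalisation $|\xi|=1$ (an assumption made here for convenience, harmless because the flow is homogeneous); `ray_to_cone_point` and `follow_ray` are hypothetical names.

```python
import math


def ray_to_cone_point(t, x):
    """Return (tau, xi, s) with s > 0 whose free-wave ray from (0, 0) hits (t, x).

    Assumes t != 0 and |x| = |t|. Takes xi = -x/|x| (unit, parallel to -x) and
    tau = sign(t), so that tau^2 = |xi|^2 = 1 and x/t = -xi/tau.
    """
    r = math.sqrt(sum(c * c for c in x))
    if t == 0 or not math.isclose(r, abs(t), rel_tol=1e-9):
        raise ValueError("point is not on the punctured light cone |x| = |t|")
    xi = [-c / r for c in x]
    tau = math.copysign(1.0, t)
    s = t / (2.0 * tau)  # equals |t|/2, hence positive
    return tau, xi, s


def follow_ray(tau, xi, s):
    """Evaluate t(s) = 2*tau*s and x(s) = -2*xi*s for the ray through the origin."""
    return 2.0 * tau * s, [-2.0 * c * s for c in xi]


if __name__ == "__main__":
    target_t, target_x = -5.0, [3.0, 4.0]
    tau, xi, s = ray_to_cone_point(target_t, target_x)
    print(follow_ray(tau, xi, s))  # should reproduce the target up to rounding
```

Points off the cone are rejected by the `ValueError`, which mirrors the fact that no null bicharacteristic from the origin projects onto them.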
These examples justify the phrase propagation of singularities: the singular set in physical space may spread, but its cotangent directions move by a deterministic symplectic rule. In variable-coefficient wave equations, the same statement replaces straight rays by null bicharacteristics of the metric principal symbol.
## Microlocal Structure of the Wave Trace
The next question is how much of a manifold's geodesic dynamics is audible from the spectrum of its Laplacian. A bare list of eigenvalues is too static for microlocal analysis: it records frequencies but does not display the times at which those frequencies interfere constructively. Passing from the list $\{\lambda_j^2\}$ to the time distribution $\sum_j e^{-it\lambda_j}$ converts spectral data into a trace of propagation, and this is what allows closed geodesics to appear as singular times. The bridge between eigenvalues and geometry is the trace of the wave group, whose singularities occur when geodesic flow has fixed points.
[definition: Wave Trace]
Let $(M,g)$ be a compact Riemannian manifold without boundary, equipped with the Riemannian density $dV_g$. Let
\begin{align*}
\Delta_g:H^2(M)\subset L^2(M,dV_g)\to L^2(M,dV_g)
\end{align*}
be the nonnegative self-adjoint Laplace-Beltrami operator. Let
\begin{align*}
U(t)=e^{-it\sqrt{\Delta_g}}:L^2(M,dV_g)\to L^2(M,dV_g)
\end{align*}
be the wave group. The wave trace is the distribution in $\mathcal D'(\mathbb R)$ obtained by taking the distributional trace of the Schwartz kernel of $U(t)$ along the diagonal:
\begin{align*}
\operatorname{Tr}\,U(t)=\operatorname{Tr}\, e^{-it\sqrt{\Delta_g}} = \sum_{j=0}^{\infty} e^{-it\lambda_j},
\end{align*}
where $\lambda_j\ge0$ and the numbers $\lambda_j^2$ are the eigenvalues of $\Delta_g$, repeated with multiplicity.
[/definition]
The functional calculus behind the definition comes from the spectral theorem: $\sqrt{\Delta_g}$ is a nonnegative self-adjoint operator, and $U(t)$ is a unitary group with distributional Schwartz kernel. The trace is not an ordinary function in general; it is a distribution whose singular support can be studied by restricting the wave kernel to the diagonal. Compactness is used here to make the spectrum discrete, so that the trace is a spectral sum rather than a continuous spectral density. The central problem is to identify which times survive this trace operation, and the answer is expressed in terms of closed geodesics.
[quotetheorem:8221]
[citeproof:8221]
The first sentence of the theorem is a one-way singular-support containment: if a nonzero time is singular, then some closed geodesic of length $|t_0|$ must exist. The sign of $t_0$ records the choice between the two half-wave propagators, while length is nonnegative. The converse requires the clean fixed-point hypothesis because otherwise different branches of the canonical relation may meet with excess degeneracy, and stationary phase no longer gives the stated conormal model. A useful model is an oscillatory integral whose phase has a degenerate critical point, such as $\int e^{it\theta^3}a(\theta)\,d\theta$: the singular behaviour is governed by cubic scaling rather than by the nondegenerate quadratic Hessian appearing in ordinary stationary phase. Non-clean fixed-point sets can produce analogous changes in the wave trace, including different orders and coefficients from those predicted by the clean calculus. The condition $t_0\ne0$ removes the universal singularity at the origin, which reflects the local Weyl law and the diagonal singularity of the wave kernel rather than periodic geodesic motion. Boundarylessness avoids extra reflected or diffracted rays; on a manifold with boundary, singularities may also come from broken geodesics reflecting at the boundary. Thus the principle is not a reconstruction theorem by itself, but it explains why the wave trace is sensitive to periodic geodesic dynamics and why spectral rigidity problems study whether the singularity expansion determines the metric near a closed orbit. A single nondegenerate loop already leaves a characteristic local signature.
[example: Geodesic Loop Contribution to the Wave Trace]
Suppose $\gamma$ is an isolated nondegenerate closed geodesic on a compact Riemannian manifold $(M,g)$, with length $L>0$. Nondegeneracy means that the linearised Poincaré return map $P_\gamma$ has no nonzero fixed vector in the symplectic normal space, equivalently
\begin{align*}
\det(I-P_\gamma)\ne0.
\end{align*}
Microlocally near $\gamma$, the wave trace contribution at $t=L$ is modeled by an oscillatory integral in one action variable $\rho>0$ and transverse variables $z\in\mathbb R^{2n-2}$, where $n=\dim M$:
\begin{align*}
\operatorname{Tr}U(t)_\gamma \sim \int_0^\infty\int_{\mathbb R^{2n-2}} e^{-i(t-L)\rho}e^{\frac{i\rho}{2}\langle Qz,z\rangle}a(\rho,z)\,d\mathcal L^{2n-2}(z)\,d\rho.
\end{align*}
The sign of the phase $-(t-L)\rho$ matches the convention $U(t)=e^{-it\sqrt{\Delta_g}}$, whose trace $\sum_j e^{-it\lambda_j}$ with $\lambda_j\ge0$ oscillates like $e^{-it\rho}$ with $\rho\ge0$. Here $Q$ is the transverse Hessian of the return phase at the closed orbit, and the isolated nondegeneracy condition is exactly the assertion that $Q$ is invertible; in canonical coordinates one has
\begin{align*}
|\det Q|=|\det(I-P_\gamma)|
\end{align*}
up to the fixed density normalization.
Apply stationary phase in the $z$ variables, with $\rho$ as the large parameter. Since $z=0$ is the only critical point and $Q$ is invertible, the quadratic part gives
\begin{align*}
\det(\rho Q)=\rho^{2n-2}\det Q.
\end{align*}
Taking square roots of absolute values gives
\begin{align*}
|\det(\rho Q)|^{-1/2}=\rho^{-(n-1)}|\det Q|^{-1/2}.
\end{align*}
Thus the leading transverse contribution, including the factor $(2\pi)^{(2n-2)/2}=(2\pi)^{n-1}$ from stationary phase in $2n-2$ variables, is
\begin{align*}
(2\pi)^{n-1}e^{i\pi\sigma(Q)/4}\rho^{-(n-1)}|\det Q|^{-1/2}a(\rho,0),
\end{align*}
where $\sigma(Q)$ is the signature of $Q$. The principal amplitude of the wave kernel along the closed orbit contributes the orbit-length factor $L$, so the leading coefficient has the form
\begin{align*}
c_\gamma=e^{i\pi\mu_\gamma/4}\frac{L}{|\det(I-P_\gamma)|^{1/2}},
\end{align*}
with $\mu_\gamma$ absorbing the Maslov and signature phases and with the constant $(2\pi)^{n-1}$ absorbed into the normalization of the amplitude. Therefore the singular part near $t=L$ has leading form
\begin{align*}
\operatorname{Tr}U(t)_\gamma \sim c_\gamma\int_0^\infty e^{-i(t-L)\rho}b(\rho)\,d\rho,
\end{align*}
where $b$ is a classical symbol with leading term $1$ in the chosen normalization. The isolated closed geodesic therefore creates a conormal spectral singularity at its length, and the size of its leading coefficient is governed by the orbit length and by $|\det(I-P_\gamma)|^{-1/2}$.
[/example]
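To see how the factor $L/|\det(I-P_\gamma)|^{1/2}$ behaves, the sketch below evaluates it for $n=2$ with a model return map whose eigenvalues are $e^{\pm L}$. This model is an assumption chosen for illustration (it is the familiar form on surfaces of constant curvature $-1$); the helper names are hypothetical. For this map $\det(I-P)=2-2\cosh L=-4\sinh^2(L/2)$, so the modulus should equal $L/(2\sinh(L/2))$.

```python
import math


def det_2x2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def loop_coefficient_modulus(length, poincare_map):
    """|c_gamma| = L / |det(I - P)|^{1/2} for a 2x2 linearised return map (n = 2)."""
    i_minus_p = [
        [1.0 - poincare_map[0][0], -poincare_map[0][1]],
        [-poincare_map[1][0], 1.0 - poincare_map[1][1]],
    ]
    d = det_2x2(i_minus_p)
    if d == 0.0:
        # A fixed vector of P means the orbit is degenerate; the formula does not apply.
        raise ValueError("degenerate orbit: det(I - P) = 0")
    return length / math.sqrt(abs(d))


def hyperbolic_return_map(length):
    """Model return map with eigenvalues exp(+length), exp(-length); det = 1."""
    c, s = math.cosh(length), math.sinh(length)
    return [[c, s], [s, c]]


if __name__ == "__main__":
    for length in (0.5, 1.0, 4.0):
        value = loop_coefficient_modulus(length, hyperbolic_return_map(length))
        print(length, value, length / (2.0 * math.sinh(length / 2.0)))
```

In this model the coefficient decays exponentially in $L$, so long unstable orbits contribute weakly, while the identity map (a degenerate orbit) is rejected because the nondegenerate formula has no meaning there.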
For a flat torus or a round sphere the fixed-point sets are not isolated, so the clean version of stationary phase rather than the nondegenerate version is the correct model. This is a recurring lesson: the singular time is geometric, while the order and coefficient of the singularity depend on how the fixed-point set sits in phase space.
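The circle $\mathbb R/2\pi\mathbb Z$ is the simplest case in which everything is explicit: the eigenvalues of the Laplacian are $k^2$ for $k\in\mathbb Z$, so $\lambda_k=|k|$, and the closed geodesics have lengths $2\pi|m|$. The sketch below is only a numerical illustration. It uses an Abel regularisation $e^{-\varepsilon|k|}$ (an assumption made here so that the series converges) and the geometric-series closed form $\sum_{k\in\mathbb Z}q^{|k|}=2/(1-q)-1$ with $q=e^{-(\varepsilon+it)}$; the function names are hypothetical.

```python
import cmath
import math


def circle_trace_closed_form(t, eps):
    """Abel-regularised wave trace sum_{k in Z} exp(-(eps + i t)|k|), eps > 0."""
    q = cmath.exp(-(eps + 1j * t))
    return 2.0 / (1.0 - q) - 1.0


def circle_trace_partial_sum(t, eps, k_max):
    """Direct partial sum over |k| <= k_max, pairing the terms k and -k."""
    total = 1.0 + 0.0j  # the k = 0 term
    for k in range(1, k_max + 1):
        total += 2.0 * cmath.exp(-(eps + 1j * t) * k)
    return total


if __name__ == "__main__":
    eps = 1e-3
    # Near a geodesic length the regularised trace is of size about 2/eps.
    print(abs(circle_trace_closed_form(2.0 * math.pi, eps)))
    # Away from the lengths it stays bounded as eps -> 0.
    print(abs(circle_trace_closed_form(math.pi, eps)))
```

As $\varepsilon\to0$ the closed form stays bounded for $t\notin2\pi\mathbb Z$ and blows up like $2/\varepsilon$ at $t\in2\pi\mathbb Z$, which is the numerical shadow of the singular times being exactly the lengths of closed geodesics.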
## Radon Transform and Visibility of Singularities
Integral geometry asks which singularities of an unknown object are detectable from its integrals over a family of submanifolds. The Radon transform is the model case: it maps a function on the plane to its line integrals, and its canonical relation tells us exactly which covectors are visible in the data.
[definition: Radon Transform in the Plane]
The plane Radon transform is the operator
\begin{align*}
R:C_c^{\infty}(\mathbb R^2)\to C^{\infty}(S^1\times\mathbb R)
\end{align*}
defined by
\begin{align*}
(Rf)(\omega,s)=\int_{x\cdot \omega=s} f(x)\,d\mathcal H^1(x)
\end{align*}
for $f\in C_c^{\infty}(\mathbb R^2)$ and $(\omega,s)\in S^1\times\mathbb R$.
[/definition]
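The definition can be tested numerically by parametrising the line $x\cdot\omega=s$ by arc length. The following is a minimal sketch under stated assumptions: the integral is truncated to a finite segment and approximated by the midpoint rule, and the two test functions (a Gaussian, which is not compactly supported, and the indicator of the unit disk, which is not smooth) are convenient stand-ins rather than elements of $C_c^\infty(\mathbb R^2)$. All names are hypothetical.

```python
import math


def radon_line_integral(f, theta, s, half_length=6.0, n=6000):
    """Midpoint-rule approximation of (Rf)(omega, s) with omega = (cos theta, sin theta).

    The line x . omega = s is parametrised as x = s*omega + u*omega_perp, where u is
    arc length, and truncated to |u| <= half_length (assumes f is negligible beyond).
    """
    w1, w2 = math.cos(theta), math.sin(theta)
    p1, p2 = -w2, w1  # unit vector along the line
    h = 2.0 * half_length / n
    total = 0.0
    for k in range(n):
        u = -half_length + (k + 0.5) * h
        total += f(s * w1 + u * p1, s * w2 + u * p2)
    return total * h


def gaussian(x1, x2):
    """exp(-|x|^2); its Radon transform is sqrt(pi) * exp(-s^2) for every omega."""
    return math.exp(-(x1 * x1 + x2 * x2))


def unit_disk(x1, x2):
    """Indicator of the open unit disk; its Radon transform is the chord length."""
    return 1.0 if x1 * x1 + x2 * x2 < 1.0 else 0.0


if __name__ == "__main__":
    print(radon_line_integral(gaussian, 0.7, 0.4), math.sqrt(math.pi) * math.exp(-0.16))
    print(radon_line_integral(unit_disk, 2.1, 0.5), 2.0 * math.sqrt(1.0 - 0.25))
```

The disk case is less accurate than the Gaussian case because the integrand jumps at the two ends of the chord, which limits the midpoint rule to an error of the order of the step size.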
The formula extends continuously to compactly supported distributions after pairing with test functions. To use the FIO mapping theorem, we need the canonical relation of $R$, because that relation translates an object covector into the corresponding data covector.
[quotetheorem:8222]
[citeproof:8222]
This relation says that line integrals detect covectors normal to the lines of integration. The sign $\xi=-\lambda\omega$ comes from the twisted kernel convention: the kernel has input covector $d_x\phi=\lambda\omega$, while the operator canonical relation uses its negative on the input side. Replacing the phase by $-\lambda(x\cdot\omega-s)$ reverses the parameter $\lambda$ and gives the same homogeneous set, so the sign convention has to be stated but does not change the visible unoriented line normal. The condition $\lambda\ne0$ excludes the smooth density contribution coming from zero frequency; if $\lambda=0$ were allowed, the relation would include the zero section, which is not part of a wave front relation. The incidence relation $x\cdot\omega=s$ is also necessary: a singularity at a point not lying on the measured line cannot affect that line integral microlocally, and inserting such a pair would predict data singularities where the kernel is smooth. The surface-measure convention fixes the principal density; using a weight that vanishes at a point of the incidence manifold gives the same geometric relation but destroys ellipticity there. The canonical relation alone gives a possible transport rule, not an inversion theorem: it does not say whether the principal symbol vanishes, whether the two parametrisations $(\omega,s)$ and $(-\omega,-s)$ are both being counted, or whether the measured set contains the relevant line. The next problem is ellipticity: when only an open set of lines is measured, which of those geometrically related covectors can be recovered rather than merely mapped into the data?
[quotetheorem:8223]
[citeproof:8223]
The theorem is the mathematical version of limited-angle tomography: missing angles create missing conormal directions. In medical CT language, a sharp tissue boundary is stably reconstructed only when the scanner samples lines with normals close to the boundary's conormal direction. Compact support of $f$ keeps the line integrals and the adjoint cutoff calculus proper; if $f$ is a non-decaying constant or an infinite straight edge without spatial cutoff, some Radon integrals diverge or acquire singular behaviour from infinity rather than from a finite covector of the object. The openness of $V$ matters because ellipticity is a microlocal condition on a neighbourhood, not just at a single measured line. If data are known only for one direction $\omega_0$ or on a curve in $S^1\times\mathbb R$, the cutoff $\chi$ cannot be elliptic on an open conic neighbourhood of the corresponding data covector, so a pseudodifferential parametrix for $R^*\chi R$ is unavailable. Ellipticity of the localized normal operator is the decisive analytic hypothesis: a weight or detector response that vanishes on the relevant line records no signal from that line even though the line belongs geometrically to the measured set, and the normal operator then loses its leading symbol at the target covector. If the available directions omit all normals close to a covector $\xi$, then no microlocal parametrix can recover that singular direction from the data. A concrete failure case is an edge whose normal direction lies in the missing angular wedge: reconstructions may show streak artefacts caused by the boundary of the data set, but the original conormal singularity is not stably recovered. The simplest discontinuity already shows how the direction of the jump controls visibility.
[example: Recovery of a Jump Singularity from Line Integrals]
Let $f=\mathbb{1}_{\{x_1>0\}}\psi$, with $\psi\in C_c^\infty(\mathbb R^2)$ nonzero near the origin. Near any point $(0,x_2)$ with $\psi(0,x_2)\ne0$, the only nonsmooth factor is $\mathbb{1}_{\{x_1>0\}}$, so the jump hypersurface is $Y=\{x_1=0\}$ and its conormal covectors are
\begin{align*}
N^*Y\setminus0=\{((0,x_2),\lambda dx_1):\lambda\ne0\}.
\end{align*}
Thus the jump contributes covectors $((0,x_2),\lambda dx_1)$ to $WF(f)$ wherever the cutoff does not remove the jump.
For the measured direction $\omega=e_1$, the line $x\cdot\omega=s$ is $x_1=s$, and the Radon transform is
\begin{align*}
(Rf)(e_1,s)=\int_{\mathbb R}\mathbb{1}_{\{s>0\}}\psi(s,x_2)\,d\mathcal L^1(x_2).
\end{align*}
Set
\begin{align*}
g(s)=\int_{\mathbb R}\psi(s,x_2)\,d\mathcal L^1(x_2).
\end{align*}
Since $\psi$ is compactly supported and smooth, differentiating under the integral shows that $g$ is smooth. Hence
\begin{align*}
(Rf)(e_1,s)=H(s)g(s).
\end{align*}
If $g(0)\ne0$ after the chosen localization along the line, this expression has a jump at $s=0$, so the data are singular at the line $x_1=0$.
Microlocally, the same conclusion is encoded by the visibility condition. For a jump covector $\xi=\lambda dx_1$, choose $\omega=e_1$ when $\lambda>0$ and $\omega=-e_1$ when $\lambda<0$. In both cases $\xi=|\lambda|\omega$, so $\xi$ is a nonzero multiple of the line normal $\omega$, while the incidence equation, which forces $s=0$ because $x=(0,x_2)$ and $\omega=\pm e_1$, is
\begin{align*}
x\cdot\omega=s.
\end{align*}
Thus a measured open set containing the relevant lines with normals near $\pm e_1$ makes these conormal covectors visible, so elliptic microlocal inversion recovers the jump modulo smoother terms by the *Visibility Theorem for the Radon Transform*. If the measured directions avoid a neighbourhood of $\pm e_1$, no such $\omega$ exists near the conormal direction $dx_1$, so the limited-angle normal operator is not elliptic at these covectors and the edge is not stably recovered.
[/example]
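The geometric half of the visibility condition reduces to an angle test. The sketch below assumes a simplified measurement model that is not taken from the theorem: all offsets $s$ are measured, and the measured normals $\omega=(\cos\theta,\sin\theta)$ fill an open angular interval $(\theta_{\min},\theta_{\max})$. Because $(\omega,s)$ and $(-\omega,-s)$ describe the same line, a covector $\xi$ is compared with $\pm\omega$. The function names are hypothetical, and the test says nothing about ellipticity of the weight.

```python
import math


def normal_angle_mod_pi(xi):
    """Angle in [0, pi) of the unoriented line normal spanned by the covector xi."""
    x1, x2 = xi
    if x1 == 0.0 and x2 == 0.0:
        raise ValueError("xi must be nonzero")
    return math.atan2(x2, x1) % math.pi


def is_visible(xi, theta_min, theta_max):
    """True if some measured omega with theta_min < theta < theta_max is parallel to xi.

    Looks for an integer k with theta_min < target + k*pi < theta_max.
    """
    target = normal_angle_mod_pi(xi)
    # Smallest k for which target + k*pi lies strictly above theta_min.
    k = math.floor((theta_min - target) / math.pi) + 1
    return target + k * math.pi < theta_max


if __name__ == "__main__":
    edge_covector = (1.0, 0.0)  # lambda * dx_1 with lambda > 0
    print(is_visible(edge_covector, -0.3, 0.3))  # normals near +e_1 are measured
    print(is_visible(edge_covector, 1.0, 2.0))   # normals near e_2 only
```

With normals measured near $\pm e_1$ the jump covector $\lambda\,dx_1$ passes the test for either sign of $\lambda$; with normals measured only near $e_2$ it fails, which is the limited-angle situation described in the example.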
This example also explains the distinction between singular support and wave front set in inverse problems. The location of an edge alone is not enough information; the measured family must see the covector normal to that edge.
## Scattering Intuition: Incoming and Outgoing Wave Fronts
The last case study changes the geometric setting from compact propagation to behaviour at large distance. Scattering theory asks how singularities enter from infinity, interact with an operator or geometry, and leave as outgoing singularities.
[definition: Incoming and Outgoing Wave Front]
Let $\Omega\subset\mathbb R^n$ be an exterior domain. Let $M=\Omega$ in the stationary Helmholtz case or $M=\mathbb R_t\times\Omega$ in the wave case. Let
\begin{align*}
P:C^\infty(M)\to C^\infty(M)
\end{align*}
be a real principal type differential operator, extended by duality to
\begin{align*}
P:\mathcal D'(M)\to\mathcal D'(M).
\end{align*}
Let $u\in\mathcal D'(M)$. An incoming wave front is a connected microlocal component of $WF(u)\cap\operatorname{Char}(P)$ whose bicharacteristics approach a fixed compact interaction region as the forward Hamilton parameter increases. An outgoing wave front is a connected microlocal component of $WF(u)\cap\operatorname{Char}(P)$ whose bicharacteristics leave that compact interaction region as the forward Hamilton parameter increases.
[/definition]
Incoming and outgoing behaviour is dynamical: it depends on a covector together with the Hamilton flow and the chosen time orientation. In Euclidean scattering, the distinction agrees with the sign of radial momentum at infinity.
[example: Radial Momentum for Free Waves]
For the free wave operator with principal symbol $p(t,x,\tau,\xi)=\tau^2-|\xi|^2$, the characteristic set is determined by
\begin{align*}
p(t,x,\tau,\xi)=0 \Longleftrightarrow \tau^2=|\xi|^2.
\end{align*}
The Hamilton vector field is
\begin{align*}
H_p=2\tau\partial_t-2\sum_{j=1}^n\xi_j\partial_{x_j},
\end{align*}
because $\partial_\tau p=2\tau$, $\partial_{\xi_j}p=-2\xi_j$, and $p$ is independent of $t$ and $x$.
Along a bicharacteristic $(t(s),x(s),\tau(s),\xi(s))$, Hamilton's equations are
\begin{align*}
\dot t=2\tau,\quad \dot x=-2\xi,\quad \dot\tau=0,\quad \dot\xi=0.
\end{align*}
Thus $\tau$ and $\xi$ are constant, and integration from $s=0$ gives
\begin{align*}
t(s)=t(0)+2\tau s,\quad x(s)=x(0)-2s\xi.
\end{align*}
For $x(s)\ne0$, let $r(s)=|x(s)|$. Since $r(s)^2=x(s)\cdot x(s)$, differentiating gives
\begin{align*}
2r(s)\dot r(s)=2x(s)\cdot \dot x(s).
\end{align*}
Substituting $\dot x(s)=-2\xi$ gives
\begin{align*}
\dot r(s)=\frac{x(s)\cdot(-2\xi)}{|x(s)|}=-2\frac{x(s)\cdot\xi}{|x(s)|}.
\end{align*}
Therefore the radial momentum
\begin{align*}
\rho_{\mathrm{rad}}(s)=\frac{x(s)\cdot\xi}{|x(s)|}
\end{align*}
controls whether the ray is moving toward or away from the interaction region: if $\rho_{\mathrm{rad}}(s)>0$, then $\dot r(s)<0$, so the ray is incoming along the forward Hamilton direction; if $\rho_{\mathrm{rad}}(s)<0$, then $\dot r(s)>0$, so the ray is outgoing. The case $\rho_{\mathrm{rad}}(s)=0$ is momentarily tangential to the sphere $|x|=r(s)$, so the radial test separates incoming from outgoing rays away from such glancing instants.
[/example]
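The sign rule $\dot r=-2\rho_{\mathrm{rad}}$ can be checked against a finite-difference derivative of $r(s)=|x(0)-2s\xi|$. This is a small illustrative sketch; the names `radial_momentum`, `classify`, and `radius_along_ray` are hypothetical, and "incoming" and "outgoing" refer to the forward Hamilton parameter $s$ as in the example, not to the physical time $t$.

```python
import math


def radial_momentum(x, xi):
    """rho_rad = (x . xi) / |x| for x != 0."""
    r = math.sqrt(sum(c * c for c in x))
    if r == 0.0:
        raise ValueError("radial momentum is undefined at x = 0")
    return sum(a * b for a, b in zip(x, xi)) / r


def classify(x, xi):
    """Label the ray through (x, xi) with respect to increasing Hamilton parameter s."""
    rho = radial_momentum(x, xi)
    if rho > 0:
        return "incoming"   # r'(s) = -2*rho < 0
    if rho < 0:
        return "outgoing"   # r'(s) = -2*rho > 0
    return "glancing"       # tangent to the sphere |x| = r at this instant


def radius_along_ray(x0, xi, s):
    """r(s) = |x0 - 2*s*xi| along the free bicharacteristic."""
    return math.sqrt(sum((a - 2.0 * s * b) ** 2 for a, b in zip(x0, xi)))


if __name__ == "__main__":
    x0, xi, h = (3.0, 1.0), (1.0, 0.5), 1e-6
    numeric = (radius_along_ray(x0, xi, h) - radius_along_ray(x0, xi, -h)) / (2.0 * h)
    print(classify(x0, xi), numeric, -2.0 * radial_momentum(x0, xi))
```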
The radial example gives useful vocabulary, but actual observations are made through operators: restrictions, boundary measurements, scattering matrices, or transforms. To decide whether such an observation really preserves microlocal information, we need a criterion that combines the geometry of its canonical relation with ellipticity of its principal symbol.
[quotetheorem:8224]
[citeproof:8224]
This theorem is the conceptual endpoint of the chapter, and each hypothesis rules out a specific obstruction. Proper support prevents singularities from escaping to infinity and then reappearing through a nonlocal failure of the mapping theorem; for example, an integral operator with kernel $e^{ix\xi}$ and no spatial cutoff can receive contributions from arbitrarily large $x$, so smoothness of the output near a fixed covector does not isolate a compact part of the input. Ellipticity is necessary because a Fourier integral operator with vanishing principal symbol on a branch may map a genuine input singularity to a smoother output: multiplication by a smooth factor vanishing on a hypersurface, followed by an elliptic FIO, gives this failure in local coordinates. The local graph assumption excludes caustics and folds; for a fold projection such as the model phase $y\theta+x\theta^2$, two source covectors can arrive at the same observed covector, and their leading oscillatory amplitudes may cancel. The theorem also does not say that every observed singularity is meaningful when the operator is not elliptic, nor does it separate two source branches if the canonical relation is many-to-one. Whether we call the measurement a wave trace, a Radon transform, or a scattering observation, the governing question is the same: which canonical relation carries the singularity, and is the operator elliptic on that branch?
[remark: Common Microlocal Pattern]
Each case study follows the same three-step method. First identify the operator as a pseudodifferential operator or Fourier integral operator. Next compute or interpret the relevant canonical relation. Finally use ellipticity, clean composition, or Hamilton propagation to decide which wave front directions survive in the output.
[/remark]
The applications in this chapter are deliberately varied, but they rely on a small common vocabulary: conormal distributions, bicharacteristics, canonical relations, ellipticity, and stationary phase. This is the practical strength of microlocal analysis: different analytic problems reduce to tracking covectors through symplectic geometry and testing whether the associated amplitudes vanish.
## Beyond This Course
The constructions in this course point in several directions. The first is the global theory of elliptic and hyperbolic equations on manifolds, where parametrices built from [Microlocal Analysis I: Pseudodifferential Operators](/page/Microlocal%20Analysis%20I%3A%20Pseudodifferential%20Operators) and Fourier integral operators become the basic language for regularity, propagation, and spectral asymptotics. The second is symplectic geometry: canonical relations, Lagrangian submanifolds, and Hamilton flows are not auxiliary pictures but the geometric objects that carry microlocal information. A third direction is inverse problems and integral geometry, where transforms such as the Radon transform are studied by identifying their canonical relations and asking which singularities are visible, invisible, or recoverable.
For related Androma topics, published companion pages include [Distribution](/page/Distribution), [Sobolev Space](/page/Sobolev%20Space), [Fourier Transform](/page/Fourier%20Transform), and [Semiclassical Analysis I: Symbols, Quantization, and Microlocal Foundations](/page/Semiclassical%20Analysis%20I%3A%20Symbols%2C%20Quantization%2C%20and%20Microlocal%20Foundations). The theorem [Stationary Phase Lemma](/theorems/645) supplies the local oscillatory-integral estimate behind many of the constructions above. Together these references give the analytic and geometric background needed to use wave front sets as a working tool rather than only as a definition.
## References
### Androma Notes
- [Distribution](/page/Distribution)
- [Sobolev Space](/page/Sobolev%20Space)
- [Fourier Transform](/page/Fourier%20Transform)
- [Microlocal Analysis I: Pseudodifferential Operators](/page/Microlocal%20Analysis%20I%3A%20Pseudodifferential%20Operators)
- [Semiclassical Analysis I: Symbols, Quantization, and Microlocal Foundations](/page/Semiclassical%20Analysis%20I%3A%20Symbols%2C%20Quantization%2C%20and%20Microlocal%20Foundations)
- [Stationary Phase Lemma](/theorems/645)
### External References
- L. Hörmander, *The Analysis of Linear Partial Differential Operators I-IV*, Springer.
- J. J. Duistermaat, *Fourier Integral Operators*, Birkhäuser.
- V. Guillemin and S. Sternberg, *Geometric Asymptotics*, American Mathematical Society.
- M. E. Taylor, *Pseudodifferential Operators*, Princeton University Press.
- F. Treves, *Introduction to Pseudodifferential and Fourier Integral Operators*, Plenum Press.
Contents
- Introduction
- Why Singular Support Is Not Enough
- Local Fourier Decay As A Smoothness Test
- The Microlocal Question
- Pseudodifferential Operators As Directional Cutoffs
- Propagation And Canonical Geometry
- Fourier Integral Operators As Transport Of Singularities
- Structure Of The Course
- 1. Directional Singularities and the Wave Front Set
- Singular Support and Directional Frequency
- Localized Fourier Decay in $\mathbb R^n$
- Coordinate Invariance and Manifolds
- Basic Operations
- 2. Pseudodifferential Detection of Singularities
- Microlocal Regularity by Elliptic Cutoffs
- Properly Supported Operators And The Global Characterization
- Microlocal Elliptic Regularity
- Equivalence With The Fourier Definition
- Sobolev Wave Front Sets
- 3. Operations on Distributions and Wave Front Calculus
- Tensor Products and Exterior Singularities
- Pullback and the Normal Set Condition
- Pushforward Under Proper Submersions
- Products and Hörmander Transversality
- The Calculus Viewpoint
- 4. Conormal Distributions and Model Singularities
- Singularities Normal to a Submanifold
- Oscillatory Integral Representation
- Order and Principal Symbol
- Model Examples and Basic Kernels
- 5. Real Principal Type and Bicharacteristics
- Principal Symbols, Characteristic Sets, and Hamilton Vector Fields
- Null Bicharacteristics in the Punctured Cotangent Bundle
- Poisson Brackets and Commutator Estimates
- Real Principal Type Operators and Solvability Along Bicharacteristics
- 6. Propagation of Singularities
- Real Principal Type and the Propagation Problem
- Commutators and the Local Propagation Step
- Forward and Backward Bicharacteristic Strips
- Consequences for Hyperbolic Equations
- 7. Oscillatory Integrals and Stationary Phase
- Phase Functions and Critical Geometry
- Stationary Phase and Full Expansions
- Clean Critical Sets and Excess
- Oscillatory Integrals as Distributions
- 8. Lagrangian Distributions and Canonical Geometry
- Lagrangian Submanifolds in the Punctured Cotangent Bundle
- Homogeneous Lagrangians and Phase Parametrization
- Lagrangian Distributions and Principal Symbols
- Conormal Distributions as Lagrangian Distributions
- 9. Fourier Integral Operators and Canonical Relations
- Kernels as Lagrangian Distributions
- Canonical Relations from Phase Functions
- Mapping Wave Front Sets by Canonical Relations
- Proper Support and Composition Setup
- 10. Composition, Adjoints, and Egorov's Theorem
- Transversal Composition of Canonical Relations
- Clean Composition and the Excess Formula for Orders
- Adjoints of Fourier Integral Operators
- Egorov's Theorem for Conjugation by Elliptic FIOs
- 11. Parametrices for Hyperbolic Equations
- The Cauchy Problem and the Characteristic Variety
- The Eikonal Equation and the Construction of the Phase
- Transport Equations for Amplitudes
- The Wave Kernel as a Sum of Lagrangian Distributions
- 12. Applications and Case Studies
- Propagation from Conormal Initial Data
- Microlocal Structure of the Wave Trace
- Radon Transform and Visibility of Singularities
- Scattering Intuition: Incoming and Outgoing Wave Fronts
- Beyond This Course
- References
- Androma Notes
- External References
Microlocal Analysis II: Wave Front Sets and Fourier Integral Operators
Created by admin on 6/20/2026 | Last updated on 6/20/2026