[motivation]
**From ODEs to SDEs.** Classical analysis revolves around ordinary differential equations of the form
\begin{align*}
\dot{x}(t) = F(x(t)).
\end{align*}
Many physical systems, however, are subject to random perturbations: a particle in a fluid, stock prices, neural firing rates. A natural way to model such a system is to add a random forcing term,
\begin{align*}
\dot{x}(t) = F(x(t)) + \eta(t),
\end{align*}
where $\eta$ is a random function representing noise. The question is: what properties should $\eta$ have?
If we are modeling physical noise — thermal fluctuations, quantum randomness, measurement error — we expect the noise at widely separated times to be essentially independent. Noise at time $t$ carries no memory of what happened at time $s$ when $|t - s|$ is large. The idealization of this observation is to demand that $\eta(t)$ and $\eta(s)$ are independent for every $t \neq s$. Such a process is called **white noise**.
**Why White Noise Is Not a Function.** The independence requirement places a severe constraint on $\eta$. If $\eta(t)$ and $\eta(s)$ were independent for every $t \neq s$ and $\eta$ were jointly measurable in time and in the random outcome, then for almost every $t$ the value $\eta(t)$ would be almost surely constant. Such an $\eta$ would carry no randomness at all, contradicting its role as noise. White noise therefore cannot be a genuine random function; it can be made sense of only as a random Schwartz distribution. This is the first fundamental obstacle the course must overcome.
**The Integral Formulation and Brownian Motion.** To understand the simplest case, set $F = 0$. The equation reduces to $\dot{x} = \eta$, or in integral form,
\begin{align*}
x(t) = x(0) + \int_0^t \eta(s)\, ds.
\end{align*}
For this integral to make sense, $\eta$ should at least be a signed measure. But white noise is not even that, so the integral cannot be interpreted classically.
We proceed nonetheless by examining what properties such an $x$ would have to satisfy. For any partition $0 = t_0 < t_1 < \cdots < t_n$, the increments
\begin{align*}
x(t_i) - x(t_{i-1}) = \int_{t_{i-1}}^{t_i} \eta(s)\, ds
\end{align*}
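should be independent (since the noise on disjoint time intervals is independent). If the noise is also stationary in time, the variance of an increment depends only on its length, and additivity of variance over independent increments then forces it to be proportional to $|t_i - t_{i-1}|$. Independent increments with variance proportional to the elapsed time are characteristic of Brownian motion, whose increments are in addition Gaussian and whose paths are continuous. Thus, the "integral of white noise" should be Brownian motion.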
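The increment properties above can be checked numerically. The following is a minimal sketch, not a construction of Brownian motion: it assumes Gaussian increments with variance equal to the step length (the Gaussian choice and the function names are illustrative assumptions, not taken from the text), sums them to obtain $x(t)$, and estimates the variance of $x(t)$ for $t = 1$ and $t = 2$.

```python
import math
import random


def simulate_increments(n_steps, dt, rng):
    """Draw n_steps independent N(0, dt) increments.

    Assumption: Gaussian increments with variance dt serve as a discrete
    stand-in for integrals of white noise over steps of length dt.
    """
    sigma = math.sqrt(dt)
    return [rng.gauss(0.0, sigma) for _ in range(n_steps)]


def sample_variance_of_endpoint(t, n_steps, n_paths, rng):
    """Estimate Var(x(t)) with x(0) = 0 from n_paths simulated paths."""
    dt = t / n_steps
    endpoints = [sum(simulate_increments(n_steps, dt, rng)) for _ in range(n_paths)]
    mean = sum(endpoints) / n_paths
    return sum((e - mean) ** 2 for e in endpoints) / (n_paths - 1)


rng = random.Random(0)  # fixed seed so the run is reproducible
var_1 = sample_variance_of_endpoint(1.0, n_steps=50, n_paths=4000, rng=rng)
var_2 = sample_variance_of_endpoint(2.0, n_steps=50, n_paths=4000, rng=rng)

# The estimates should be close to t itself, i.e. roughly 1 and 2.
print(f"Var x(1) ~ {var_1:.3f}, Var x(2) ~ {var_2:.3f}")
```

Doubling the time horizon should roughly double the estimated variance, in line with the linear scaling in $|t_i - t_{i-1}|$.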
**Why Work in Continuous Time?** One might ask: why not simply discretize time and work with finite sums? The answer parallels the relationship between sums and integrals. The Lebesgue integral requires significant foundational work to construct, but once built, it is a vastly more powerful tool: integrating $1/x^3$ requires no special tricks, whereas no closed form is known for $\sum_{n=1}^\infty 1/n^3$. Similarly, stochastic calculus in continuous time, once constructed, yields explicit computations and structural results (Itô's formula, the Lévy characterization, Girsanov's theorem) that have no easy discrete analogues.
Furthermore, many important continuous-time processes are naturally described as solutions to stochastic differential equations, just as trigonometric functions and Bessel functions are characterized as solutions to ordinary differential equations. The SDE viewpoint gives both a computational handle and structural insight.
**The Two Integrals: Itô and Stratonovich.** There are two principal approaches to defining stochastic integrals: the **Itô integral** and the **Stratonovich integral**. This course focuses on the Itô integral. One key advantage is that, under suitable integrability conditions, the Itô integral of an adapted process with respect to a martingale is again a martingale (in general it is a local martingale), a property that makes it the natural tool for probabilistic analysis. The Stratonovich integral, while more natural from the perspective of differential geometry because it obeys the ordinary chain rule, is less convenient for the probabilistic techniques that dominate this course.
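The difference between the two integrals can be seen on a single simulated path. The sketch below is illustrative only. It assumes the standard descriptions of the two integrals as limits of left-endpoint sums (Itô) and of trapezoidal sums (Stratonovich), which this section does not define, and all function names are hypothetical. It approximates $\int_0^T W\, dW$ both ways for a discretized Brownian path $W$.

```python
import math
import random


def brownian_path(n_steps, t_final, rng):
    """Discretized Brownian path on [0, t_final] built from N(0, dt) increments."""
    sigma = math.sqrt(t_final / n_steps)
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, sigma))
    return w


def left_point_sum(w):
    """Ito-type sum: integrand evaluated at the left endpoint of each step."""
    return sum(w[i - 1] * (w[i] - w[i - 1]) for i in range(1, len(w)))


def trapezoid_sum(w):
    """Stratonovich-type sum: integrand averaged over the two endpoints."""
    return sum(0.5 * (w[i - 1] + w[i]) * (w[i] - w[i - 1]) for i in range(1, len(w)))


rng = random.Random(1)  # fixed seed so the run is reproducible
T = 1.0
w = brownian_path(20000, T, rng)

ito = left_point_sum(w)
strat = trapezoid_sum(w)
# Sum of squared increments; for a fine partition this is close to T.
quad_var = sum((w[i] - w[i - 1]) ** 2 for i in range(1, len(w)))

print(f"left-point sum      = {ito:.4f}")
print(f"trapezoidal sum     = {strat:.4f}")
print(f"difference          = {strat - ito:.4f}")
print(f"half squared-incr.  = {0.5 * quad_var:.4f}")
```

The trapezoidal sum telescopes to $W_T^2/2$, as the ordinary chain rule would suggest, while the left-endpoint sum equals $(W_T^2 - \sum_i (\Delta W_i)^2)/2$. Since the sum of squared increments is close to $T$ for a fine partition, the two answers differ by roughly $T/2$, and this gap does not vanish as the partition is refined.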
[/motivation]