Free probability studies probability-like calculations in algebras where products need not commute. Its central independence notion is *free independence*: a rule controlling alternating centered products rather than a rule built from commuting random variables. This course starts from ordered words, expectation functionals, moments, and joint laws, while using standard tools from linear algebra, elementary Hilbert-space language, finite posets, and bounded operator models when they clarify the foundations.
The course builds systematically from foundations to computational machinery. Chapters 1-3 establish noncommutative probability spaces, ordered moments, joint laws, and the first examples showing why free independence differs from classical independence. Chapters 4-6 introduce *noncrossing partitions* as the combinatorial skeleton of freeness and develop *free cumulants*, the coordinates in which free independence becomes a vanishing condition. This trio of concepts, freeness, noncrossing partitions, and cumulants, forms the computational core of the notes.
The final chapters move from foundations to models and computations. Chapters 7-8 study *semicircular variables*, the free analogue of Gaussian random variables, and prove the *[free central limit theorem](/theorems/7145)*. Chapters 9-10 construct free families in explicit algebraic and Hilbert-space models, then turn the moment-cumulant formulas into a practical toolkit for computing examples. Analytic and matrix-theoretic motivations remain in the background here; the page itself builds the algebraic language needed before those later directions.
# Introduction
This opening chapter explains what the course means by free probability and why its first foundations are algebraic rather than analytic. Classical probability studies random variables through expectations of products, but the products commute; free probability keeps the expectation language while allowing products to depend on order. The first goal is to replace a probability space by a noncommutative probability space, then to identify the correct analogue of independence.
The course deliberately postpones analytic transform methods and random matrix limits. Those topics motivate the subject, but the first layer of the theory is built from states, moments, words, partitions, and cumulants. By Chapter 8, the free [central limit theorem](/theorems/521) will emerge from these foundations, playing the same structural role that the classical [central limit theorem](/theorems/1848) plays for classically independent sums.
## Why Noncommutative Probability Starts With Moments
What data should determine the law of a random object when multiplication is not commutative? In classical probability, a real [random variable](/page/Random%20Variable) $X$ is often studied through the numbers $\mathbb E[X^n]$, and several variables are studied through mixed moments such as $\mathbb E[X_1X_2X_1]$. If the variables commute, many words collapse to the same monomial; if they do not commute, the order of the letters becomes part of the data.
[definition: Noncommutative Polynomial]
Let $x_1, \dots, x_n$ be formal noncommuting variables. The algebra $\mathbb C\langle x_1, \dots, x_n\rangle$ is the complex [vector space](/page/Vector%20Space) with basis all words in the letters $x_1, \dots, x_n$, including the empty word, which serves as the unit. Multiplication of words is concatenation, extended bilinearly.
[/definition]
This definition records the first structural change from commutative probability: $x_1x_2$ and $x_2x_1$ are different monomials. The next issue is whether this order-dependence can be seen in a concrete finite model, because the course needs examples where noncommutative words are actual products rather than formal symbols.
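Before turning to matrices, the concatenation product can be sketched in a few lines. This is only an illustration under an assumed encoding that is not part of the notes: a polynomial is a Python dictionary from words (tuples of letter indices) to coefficients, and the helper name `nc_multiply` is hypothetical.

```python
from collections import defaultdict

def nc_multiply(p, q):
    """Multiply two noncommutative polynomials.

    A polynomial is a dict mapping a word (tuple of letter indices) to its
    coefficient; the empty tuple () is the empty word, i.e. the unit.
    This encoding is an assumption made for the sketch.
    """
    result = defaultdict(complex)
    for word_p, coeff_p in p.items():
        for word_q, coeff_q in q.items():
            # The product of two basis words is their concatenation.
            result[word_p + word_q] += coeff_p * coeff_q
    # Drop zero coefficients so that equal polynomials compare equal.
    return {w: c for w, c in result.items() if c != 0}

one = {(): 1}        # the unit (empty word)
x1 = {(1,): 1}       # the letter x_1
x2 = {(2,): 1}       # the letter x_2

x1x2 = nc_multiply(x1, x2)          # the word x_1 x_2
x2x1 = nc_multiply(x2, x1)          # the word x_2 x_1

# (x_1 + x_2)^2 keeps all four ordered words separately.
s = {(1,): 1, (2,): 1}
s_squared = nc_multiply(s, s)
```

In this encoding `x1x2` and `x2x1` are stored under the different keys `(1, 2)` and `(2, 1)`, so the square of $x_1+x_2$ has four terms rather than the three of the commutative binomial formula.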
[example: Two Ordered Products]
Let $A=e_{12}$ and $B=e_{21}$ in $M_2(\mathbb C)$, with $\operatorname{tr}_2(C)=\frac{1}{2}\operatorname{Tr}(C)$. For standard matrix units, $e_{ij}e_{kl}=\delta_{jk}e_{il}$, since the only possible nonzero entry of the product lies in position $(i,l)$ and occurs exactly when $j=k$. Therefore
\begin{align*}
AB=e_{12}e_{21}=\delta_{2,2}e_{11}=e_{11}.
\end{align*}
Similarly,
\begin{align*}
BA=e_{21}e_{12}=\delta_{1,1}e_{22}=e_{22}.
\end{align*}
The matrices are different: $e_{11}$ has diagonal entries $1,0$, while $e_{22}$ has diagonal entries $0,1$. Their normalized traces nevertheless agree, because
\begin{align*}
\operatorname{tr}_2(AB)=\operatorname{tr}_2(e_{11})=\frac{1}{2}\operatorname{Tr}(e_{11})=\frac{1}{2},
\end{align*}
and
\begin{align*}
\operatorname{tr}_2(BA)=\operatorname{tr}_2(e_{22})=\frac{1}{2}\operatorname{Tr}(e_{22})=\frac{1}{2}.
\end{align*}
Thus the words $AB$ and $BA$ represent different products even though the trace, which is invariant under cyclic rotation of a product, cannot distinguish them; the order of multiplication is already real before any later theory of mixed moments is introduced.
[/example]
The example also shows why traces are natural in this course. A trace does not erase all order information, but it allows cyclic rearrangements, so $\operatorname{tr}_2(ABC)=\operatorname{tr}_2(BCA)$. The next problem is to formulate the abstract expectation functional that will evaluate such words in every model at once.
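The matrix-unit computation and the cyclic property can be checked mechanically. The following is a minimal sketch using exact arithmetic; the third matrix `C` is an arbitrary choice made only for this illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalized_trace(X):
    """tr_N(X) = Tr(X) / N, kept exact with Fraction."""
    n = len(X)
    return Fraction(sum(X[i][i] for i in range(n)), n)

A = [[0, 1], [0, 0]]   # e_12
B = [[0, 0], [1, 0]]   # e_21
C = [[1, 2], [3, 4]]   # arbitrary third matrix (assumption for the example)

AB = matmul(A, B)      # equals e_11
BA = matmul(B, A)      # equals e_22

tr_ABC = normalized_trace(matmul(matmul(A, B), C))
tr_BCA = normalized_trace(matmul(matmul(B, C), A))   # cyclic rotation of ABC
tr_ACB = normalized_trace(matmul(matmul(A, C), B))   # not a cyclic rotation
```

With this choice of `C`, the two cyclic rotations have the same normalized trace, while the non-cyclic reordering `ACB` gives a different value, so the trace still remembers part of the order.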
## States as Expectations
What replaces the expectation operator when the random variables live in an algebra? The answer is a linear functional satisfying the same formal normalization and positivity conditions as integration against a probability measure. Positivity is essential because it encodes the inequality $\mathbb E[|X|^2]\ge 0$ in a form that still makes sense for abstract algebras with involution.
[definition: Star Algebra]
A unital complex algebra $\mathcal A$ is a $*$-algebra if it is equipped with a map $*:\mathcal A\to\mathcal A$, written $a\mapsto a^*$, such that, for all $a,b\in\mathcal A$ and $\lambda\in\mathbb C$,
\begin{align*}
(a+b)^* = a^*+b^*, \quad (\lambda a)^*=\overline{\lambda}a^*, \quad (ab)^*=b^*a^*, \quad (a^*)^*=a.
\end{align*}
[/definition]
The involution plays the role of complex conjugation or adjoint. Once it is present, positivity has a canonical algebraic form: elements of the shape $a^*a$ are the analogues of nonnegative random variables of the form $|X|^2$. This prepares the next definition, whose purpose is to make expectation an intrinsic part of the algebraic structure.
[definition: State]
Let $\mathcal A$ be a unital $*$-algebra. A state on $\mathcal A$ is a linear functional $\varphi:\mathcal A\to\mathbb C$ such that
\begin{align*}
\varphi(1_{\mathcal A}) &= 1, & \varphi(a^*a)&\ge 0 \quad \text{for all } a\in\mathcal A.
\end{align*}
[/definition]
A state is the noncommutative expectation, but an expectation alone is not yet a probability model. The algebra and the expectation must be bundled into a single ambient object before the course can speak about variables, moments, laws, and independence. The next definition creates that ambient object.
[definition: Noncommutative Probability Space]
A noncommutative probability space is a pair $(\mathcal A,\varphi)$, where $\mathcal A$ is a unital complex algebra and $\varphi:\mathcal A\to\mathbb C$ is a unital linear functional.
[/definition]
When $\mathcal A$ is a $*$-algebra and $\varphi$ is a state, the space has positivity. Many algebraic constructions need only a unital linear functional, while laws of self-adjoint variables and Hilbert-space realizations use the stronger positive setting. The next examples show that this abstraction contains ordinary probability and matrix models, so it is not merely formal notation.
[example: Classical Probability as a Commutative Model]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $\mathcal A=L^\infty(\Omega)$, with addition, scalar multiplication, and multiplication defined pointwise. Define $\varphi(f)=\mathbb E[f]$ for $f\in L^\infty(\Omega)$. For $f,g\in L^\infty(\Omega)$ and $\lambda\in\mathbb C$, linearity of the integral gives
\begin{align*}
\varphi(f+\lambda g)=\mathbb E[f+\lambda g]=\mathbb E[f]+\lambda\mathbb E[g]=\varphi(f)+\lambda\varphi(g).
\end{align*}
The unit of $\mathcal A$ is the constant function $1$, and since $\mathbb P(\Omega)=1$,
\begin{align*}
\varphi(1)=\mathbb E[1]=\int_\Omega 1\,d\mathbb P=\mathbb P(\Omega)=1.
\end{align*}
Thus $(L^\infty(\Omega),\varphi)$ is a noncommutative probability space in the algebraic sense. It is commutative because, for every $\omega\in\Omega$,
\begin{align*}
(fg)(\omega)=f(\omega)g(\omega)=g(\omega)f(\omega)=(gf)(\omega).
\end{align*}
So ordinary bounded random variables form the commutative model inside the noncommutative framework, with products interpreted as pointwise products of random variables.
[/example]
This example is not a separate theory inside the course; it is the reference model. Free probability changes independence while keeping the moment language that classical probability already uses. To see what genuinely changes, the course also keeps finite-dimensional noncommutative models in view.
[example: Matrix Probability Space]
Let $\mathcal A=M_N(\mathbb C)$ and define $\varphi(A)=\operatorname{tr}_N(A)=\frac{1}{N}\operatorname{Tr}(A)$. For $A,B\in M_N(\mathbb C)$ and $\lambda\in\mathbb C$, linearity of the usual trace gives
\begin{align*}
\varphi(A+\lambda B)=\frac{1}{N}\operatorname{Tr}(A+\lambda B)=\frac{1}{N}\operatorname{Tr}(A)+\lambda\frac{1}{N}\operatorname{Tr}(B)=\varphi(A)+\lambda\varphi(B).
\end{align*}
The unit is the identity matrix $I_N$, whose diagonal entries are all $1$, so
\begin{align*}
\varphi(I_N)=\frac{1}{N}\operatorname{Tr}(I_N)=\frac{1}{N}\sum_{i=1}^N 1=1.
\end{align*}
Thus $(M_N(\mathbb C),\operatorname{tr}_N)$ is a noncommutative probability space.
It is tracial because, if $A=(a_{ij})$ and $B=(b_{ij})$, then
\begin{align*}
\operatorname{Tr}(AB)=\sum_{i=1}^N(AB)_{ii}=\sum_{i=1}^N\sum_{j=1}^N a_{ij}b_{ji}.
\end{align*}
Also
\begin{align*}
\operatorname{Tr}(BA)=\sum_{i=1}^N(BA)_{ii}=\sum_{i=1}^N\sum_{j=1}^N b_{ij}a_{ji}.
\end{align*}
Renaming the indices $i$ and $j$ in the last double sum gives
\begin{align*}
\sum_{i=1}^N\sum_{j=1}^N b_{ij}a_{ji}=\sum_{i=1}^N\sum_{j=1}^N b_{ji}a_{ij}=\sum_{i=1}^N\sum_{j=1}^N a_{ij}b_{ji}.
\end{align*}
Therefore $\operatorname{tr}_N(AB)=\operatorname{tr}_N(BA)$. In this model, self-adjoint matrices $A=A^*$ play the role of real-valued random variables, and their $k$th moments are exactly the normalized traces
\begin{align*}
\varphi(A^k)=\operatorname{tr}_N(A^k)=\frac{1}{N}\operatorname{Tr}(A^k).
\end{align*}
So matrix probability spaces give finite-dimensional models where expectations are traces and laws are encoded by normalized traces of matrix words.
[/example]
Matrix algebras provide concrete finite models for the abstract definitions. Later analytic and random-matrix applications will explain why freeness appears naturally, but already in Chapters 1 and 2 matrices demonstrate why order-sensitive joint moments are unavoidable.
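As a small check of the statement that moments of a self-adjoint matrix are normalized traces of its powers, the sketch below compares $\operatorname{tr}_2(A^k)$ with the average of the $k$th powers of the eigenvalues. The matrix $A$ is an example chosen for this illustration; its eigenvalues $1$ and $3$ are the roots of $(2-t)^2-1=0$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalized_trace(X):
    """tr_N(X) = Tr(X) / N as an exact Fraction."""
    n = len(X)
    return Fraction(sum(X[i][i] for i in range(n)), n)

def moment(A, k):
    """phi(A^k) = tr_N(A^k), computed by repeated multiplication."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I_N
    for _ in range(k):
        power = matmul(power, A)
    return normalized_trace(power)

A = [[2, 1], [1, 2]]      # real symmetric, hence self-adjoint
eigenvalues = [1, 3]      # roots of (2 - t)^2 - 1 = 0

matrix_moments = [moment(A, k) for k in range(6)]
# Moments of the probability measure putting mass 1/2 on each eigenvalue.
spectral_moments = [Fraction(sum(t**k for t in eigenvalues), 2)
                    for k in range(6)]
```

The two lists agree, so in this example the law of $A$ with respect to $\operatorname{tr}_2$ is the uniform measure on its eigenvalues.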
## Laws and Joint Moments
How should we speak about the distribution of a noncommutative random variable? For one self-adjoint element, the sequence of moments resembles the classical moment sequence of a real random variable. For several noncommuting elements, the law must remember every word in the variables.
[definition: Joint Moment]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space and let $a_1,\dots,a_n\in\mathcal A$. For a word $i_1\dots i_k$ with letters in $\{1,\dots,n\}$, the corresponding joint moment is
\begin{align*}
\varphi(a_{i_1}a_{i_2}\cdots a_{i_k}).
\end{align*}
[/definition]
Joint moments are the coordinates of the law. Since the word $i_1\dots i_k$ has an order, the collection of all such numbers is much richer than the collection of commutative mixed moments. The next definition packages these coordinates as a single functional on noncommutative polynomials, which is the form used by later cumulant formulas.
[definition: Joint Law]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space and let $a_1,\dots,a_n\in\mathcal A$. The joint law of $(a_1,\dots,a_n)$ is the linear functional
\begin{align*}
\mu_{a_1,\dots,a_n}:\mathbb C\langle x_1,\dots,x_n\rangle\to\mathbb C, \qquad
\mu_{a_1,\dots,a_n}(p)=\varphi(p(a_1,\dots,a_n)).
\end{align*}
[/definition]
The joint law is the object that freeness will factorize. This raises an immediate warning: if the joint law is genuinely a functional on words, then the separate laws of the individual variables should not be expected to determine it. The next theorem records this failure and explains why mixed words are essential data.
[quotetheorem:7106]
[citeproof:7106]
This theorem is the warning that motivates the whole subject. Marginals describe individual variables, while freeness is a relation between subalgebras and is therefore encoded in mixed words. The result does not say that marginal moments are useless: for a single self-adjoint variable they are often the main data of its law. It says instead that, once two variables are placed in a common noncommutative algebra, their separate moment sequences do not determine how their products interleave. The finite-dimensional hypothesis in the proof is deliberately modest, because the failure already occurs for matrices and is not a pathology of infinite-dimensional analysis.
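One way to see the phenomenon concretely is the following sketch in $(M_2(\mathbb C),\operatorname{tr}_2)$. The matrices `Z` and `X` are chosen for this illustration and are not necessarily the ones used in the cited proof. The pair $(Z,Z)$ and the pair $(Z,X)$ have the same marginal moments, yet their mixed moments differ.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalized_trace(X):
    """tr_N(X) = Tr(X) / N as an exact Fraction."""
    n = len(X)
    return Fraction(sum(X[i][i] for i in range(n)), n)

def word_moment(word, matrices):
    """phi(a_{i_1} ... a_{i_k}) for a word given as a tuple of 0-based indices."""
    n = len(matrices[0])
    acc = [[int(i == j) for j in range(n)] for i in range(n)]  # empty word -> I
    for i in word:
        acc = matmul(acc, matrices[i])
    return normalized_trace(acc)

Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]

pair_commuting = (Z, Z)       # second variable equals the first
pair_noncommuting = (Z, X)    # second variable does not commute with the first

# Marginal moments of the second variable agree in both pairs.
marginals_first_pair = [word_moment((1,) * k, pair_commuting) for k in range(7)]
marginals_second_pair = [word_moment((1,) * k, pair_noncommuting) for k in range(7)]

# Mixed moments along the words x1 x2 and x1 x2 x1 x2 do not agree.
mixed_first_pair = [word_moment(w, pair_commuting) for w in [(0, 1), (0, 1, 0, 1)]]
mixed_second_pair = [word_moment(w, pair_noncommuting) for w in [(0, 1), (0, 1, 0, 1)]]
```

Both `Z` and `X` square to the identity and have trace zero, which is why their individual moment sequences coincide; only the mixed words separate the two pairs.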
## Independence and the Need for a New Rule
What should independence mean when variables do not commute? Classical independence is characterized by factorization of expectations across products of functions of distinct random variables. If the product order matters, this factorization cannot be imported without changing its form.
[definition: Classical Independence in Moment Form]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space. Subalgebras $\mathcal A_1,\dots,\mathcal A_n\subset L^\infty(\Omega)$ are classically independent if, for every choice of distinct indices $i_1,\dots,i_k\in\{1,\dots,n\}$ and every choice of $f_r\in\mathcal A_{i_r}$,
\begin{align*}
\mathbb E[f_1f_2\cdots f_k]=\prod_{r=1}^{k}\mathbb E[f_r].
\end{align*}
[/definition]
In the commutative setting, rearranging factors causes no change. In a noncommutative algebra, a rule for independence must specify how alternating centered products behave. This leads to freeness, whose defining condition is designed for words that move back and forth between different subalgebras.
[definition: Freeness]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space. Unital subalgebras $(\mathcal A_i)_{i\in I}$ are free if, for every $k\ge 1$, every choice of indices $i_1,\dots,i_k\in I$ with $i_r\ne i_{r+1}$ for $1\le r<k$, and every choice of $a_r\in\mathcal A_{i_r}$ satisfying $\varphi(a_r)=0$, one has
\begin{align*}
\varphi(a_1a_2\cdots a_k)=0.
\end{align*}
[/definition]
Freeness says that alternating centered products have zero expectation. This rule must be tested against the familiar first-order consequence of classical independence, because the new definition should still factor a simple product from two different subalgebras.
[example: First Consequence of Freeness]
Suppose $\mathcal A_1$ and $\mathcal A_2$ are free, with $a\in\mathcal A_1$ and $b\in\mathcal A_2$. Set
\begin{align*}
a_0=a-\varphi(a)1_{\mathcal A}
\end{align*}
and
\begin{align*}
b_0=b-\varphi(b)1_{\mathcal A}.
\end{align*}
Since $\varphi$ is unital and linear,
\begin{align*}
\varphi(a_0)=\varphi(a)-\varphi(a)\varphi(1_{\mathcal A})=\varphi(a)-\varphi(a)=0.
\end{align*}
Similarly,
\begin{align*}
\varphi(b_0)=\varphi(b)-\varphi(b)\varphi(1_{\mathcal A})=\varphi(b)-\varphi(b)=0.
\end{align*}
Now write $a=a_0+\varphi(a)1_{\mathcal A}$ and $b=b_0+\varphi(b)1_{\mathcal A}$. Multiplying out by distributivity gives
\begin{align*}
ab=a_0b_0+\varphi(b)a_0+\varphi(a)b_0+\varphi(a)\varphi(b)1_{\mathcal A}.
\end{align*}
Applying linearity of $\varphi$,
\begin{align*}
\varphi(ab)=\varphi(a_0b_0)+\varphi(b)\varphi(a_0)+\varphi(a)\varphi(b_0)+\varphi(a)\varphi(b)\varphi(1_{\mathcal A}).
\end{align*}
The elements $a_0\in\mathcal A_1$ and $b_0\in\mathcal A_2$ are centered, and the indices $1,2$ are different, so freeness gives $\varphi(a_0b_0)=0$. Therefore
\begin{align*}
\varphi(ab)=0+\varphi(b)\cdot 0+\varphi(a)\cdot 0+\varphi(a)\varphi(b)\cdot 1=\varphi(a)\varphi(b).
\end{align*}
Thus freeness reproduces the familiar first-order factorization rule for one element from each of two free subalgebras.
[/example]
This first calculation resembles classical independence, but higher alternating products do not reduce by ordinary commutative factorization. The difference becomes visible in words such as $abab$. The next question is how to organize all the surviving and vanishing terms without expanding each word by hand.
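The difference can be made concrete in a model that the notes have not constructed at this point, so the following is a sketch under an assumption: it uses the standard fact that, in the group algebra of the free group on two generators $u,v$ with the canonical trace (the coefficient of the identity), the subalgebras generated by $u$ and by $v$ are free. The encoding of group elements as reduced words and all helper names are hypothetical.

```python
def reduce_concat(g, h):
    """Multiply two reduced words in the free group.

    Letters are nonzero integers: 1 = u, -1 = u^{-1}, 2 = v, -2 = v^{-1}.
    """
    out = list(g)
    for letter in h:
        if out and out[-1] == -letter:
            out.pop()            # cancel x x^{-1}
        else:
            out.append(letter)
    return tuple(out)

def multiply(x, y):
    """Product in the group algebra; elements are dicts word -> coefficient."""
    result = {}
    for g, cg in x.items():
        for h, ch in y.items():
            w = reduce_concat(g, h)
            result[w] = result.get(w, 0) + cg * ch
    return result

def phi(x):
    """Canonical trace: the coefficient of the identity (the empty word)."""
    return x.get((), 0)

# a lies in the subalgebra generated by u, b in the one generated by v.
a = {(): 2, (1,): 1, (-1,): 1}     # a = 2 + u + u^{-1}
b = {(): 3, (2,): 1, (-2,): 2}     # b = 3 + v + 2 v^{-1}

ab = multiply(a, b)
abab = multiply(ab, ab)
aabb = multiply(multiply(a, a), multiply(b, b))

first_order = (phi(ab), phi(a) * phi(b))
alternating = phi(abab)
grouped = phi(aabb)

# Value predicted for free a, b by expanding into centered parts:
# phi(abab) = phi(a^2) phi(b)^2 + phi(a)^2 phi(b^2) - phi(a)^2 phi(b)^2.
phi_a2 = phi(multiply(a, a))
phi_b2 = phi(multiply(b, b))
predicted = phi_a2 * phi(b)**2 + phi(a)**2 * phi_b2 - phi(a)**2 * phi(b)**2
```

In this model the first-order rule $\varphi(ab)=\varphi(a)\varphi(b)$ holds, while $\varphi(abab)$ and $\varphi(a^2b^2)$ take different values, which is the kind of order dependence that commutative factorization cannot produce.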
## Combinatorics as the Organising Device
Which combinatorics governs free independence? Classical cumulants are indexed by all set partitions, and independence can be expressed by the vanishing of mixed cumulants. Free cumulants follow the same philosophy, but the relevant partitions are noncrossing partitions.
[definition: Noncrossing Partition]
Let $n\in\mathbb N$. A partition $\pi$ of $\{1,\dots,n\}$ is noncrossing if there do not exist $1\le p<q<r<s\le n$ such that $p,r$ lie in one block of $\pi$ and $q,s$ lie in another block of $\pi$.
[/definition]
Noncrossing partitions capture the planar structure behind free moment expansions. They are the combinatorial substitute for the lattice of all partitions in classical probability. The next example fixes the geometric meaning of the definition before the course uses these partitions algebraically.
[example: Crossing and Noncrossing Partitions]
For the set $\{1,2,3,4\}$, consider first the partition with blocks $\{1,2\}$ and $\{3,4\}$. A crossing would require indices $p<q<r<s$ such that $p,r$ lie in one block and $q,s$ lie in a different block. With four points, the only possible increasing choice using all points is $p=1$, $q=2$, $r=3$, and $s=4$. But $1$ and $3$ are not in the same block, and $2$ and $4$ are not in the same block. Therefore the partition $\{\{1,2\},\{3,4\}\}$ is noncrossing.
By contrast, for the partition with blocks $\{1,3\}$ and $\{2,4\}$, take
\begin{align*}
p=1,\quad q=2,\quad r=3,\quad s=4.
\end{align*}
Then $p<q<r<s$, with $p,r\in\{1,3\}$ and $q,s\in\{2,4\}$. This is exactly the forbidden pattern in the definition of a noncrossing partition, so $\{\{1,3\},\{2,4\}\}$ is crossing. Geometrically, the first partition connects adjacent pairs, while the second connects interleaving pairs, forcing the two chords to meet.
[/example]
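The definition can be turned directly into a test over all increasing quadruples. The sketch below does this and then counts noncrossing partitions of $\{1,\dots,n\}$ for small $n$ by brute force; the function names are hypothetical and the enumeration is meant for illustration, not efficiency.

```python
from itertools import combinations

def is_noncrossing(blocks):
    """Return True if the partition (a list of blocks of integers) is noncrossing."""
    block_of = {x: idx for idx, block in enumerate(blocks) for x in block}
    points = sorted(block_of)
    # combinations of a sorted list yields quadruples with p < q < r < s.
    for p, q, r, s in combinations(points, 4):
        same_pr = block_of[p] == block_of[r]
        same_qs = block_of[q] == block_of[s]
        if same_pr and same_qs and block_of[p] != block_of[q]:
            return False         # forbidden pattern found
    return True

def set_partitions(elements):
    """Yield every partition of a list, each as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it form a new block on its own.
        yield [[first]] + partition

adjacent_pairs = [[1, 2], [3, 4]]
interleaved_pairs = [[1, 3], [2, 4]]

all_counts = [sum(1 for _ in set_partitions(list(range(1, n + 1))))
              for n in range(1, 6)]
noncrossing_counts = [sum(1 for p in set_partitions(list(range(1, n + 1)))
                          if is_noncrossing(p))
                      for n in range(1, 6)]
```

For $n=4$ there are $15$ set partitions and $14$ noncrossing ones; the single crossing partition is $\{\{1,3\},\{2,4\}\}$. The noncrossing counts for $n=1,\dots,5$ are the Catalan numbers $1,2,5,14,42$.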
Because the course develops free cumulants from moments, noncrossing partitions will appear as computational objects rather than decoration. The main obstruction is that ordinary set partitions count too many terms for free independence: the crossing patterns that occur in classical cumulants do not survive the freeness relations. The moment-cumulant formula below records the replacement bookkeeping, where only noncrossing partitions contribute.
[quotetheorem:7107]
[citeproof:7107]
This theorem is stated here as a roadmap rather than as a tool to use immediately. Its role is to explain why the course must build both algebraic probability and noncrossing partition combinatorics before proving the free central limit theorem. The formula is not an asymptotic theorem and does not by itself assert any independence relation; it is a change of coordinates between moments and cumulants. The restriction to noncrossing partitions is the decisive difference from the classical cumulant formula, where all set partitions appear. The later vanishing of mixed free cumulants is an additional statement about freeness, not a consequence of the displayed expansion alone.
## The Free Central Limit Theorem as Destination
What replaces the Gaussian distribution in a theory governed by freeness? The answer is the semicircular law, whose moments are counted by noncrossing pairings. This is the endpoint of the first course because it confirms that freeness has its own central limit phenomenon.
[definition: Semicircular Law]
The standard semicircular law is the probability measure $\sigma$ on $\mathbb R$ with density
\begin{align*}
\frac{1}{2\pi}\sqrt{4-x^2}\,\mathbb{1}_{[-2,2]}(x)
\end{align*}
with respect to [Lebesgue measure](/page/Lebesgue%20Measure) $\mathcal L^1$ on $\mathbb R$.
[/definition]
The density is included only to identify the limiting distribution. The combinatorial proof will use its moments: the odd moments vanish by symmetry, and the moment of order $2k$ is the Catalan number $C_k=\frac{1}{k+1}\binom{2k}{k}$. The question is whether sums of freely independent centered variables force exactly those Catalan moments after normalization, just as classical independent sums force Gaussian moments. The theorem below is the destination: it identifies the semicircular law as the universal limit produced by freeness.
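The relation between the density and the Catalan numbers can be checked numerically. The following sketch approximates the moments of $\sigma$ with a midpoint rule; the step count and the tolerance one would accept are choices made for this illustration.

```python
import math

def semicircle_density(x):
    """Density of the standard semicircular law on [-2, 2]."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

def semicircle_moment(k, steps=100_000):
    """Midpoint-rule approximation of the k-th moment over [-2, 2]."""
    h = 4.0 / steps
    total = 0.0
    for i in range(steps):
        x = -2.0 + (i + 0.5) * h
        total += x**k * semicircle_density(x)
    return total * h

def catalan(n):
    """C_n = binom(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)

even_moments = [semicircle_moment(2 * k) for k in range(4)]
odd_moments = [semicircle_moment(2 * k + 1) for k in range(3)]
catalan_numbers = [catalan(k) for k in range(4)]
```

Up to discretization error, `even_moments` matches `catalan_numbers` and `odd_moments` is numerically zero.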
[quotetheorem:7108]
[proofunderconstruction:7108]
This result explains the title of the course. Free probability is not only probability with noncommuting variables; it is a parallel probabilistic calculus with its own independence, cumulants, and universal limit laws. The centering assumption removes the first cumulant; without it, the normalized sums would carry a growing deterministic drift rather than converge to the standard semicircular law. The unit variance fixes the scale of the limit, while a different variance would produce a rescaled semicircular law. Identical distribution supplies a common cumulant sequence for the diagonal terms, and freeness removes all mixed-index cumulants. The theorem is therefore a moment-convergence statement for scalar moments of the normalized sums, not a claim of operator-norm convergence or convergence inside an external measure space.
## How to Read the Course
The first lectures establish noncommutative probability spaces, tracial laws, and examples from classical probability and matrices. The middle lectures develop free independence, noncrossing partitions, and free cumulants. The final lectures use those tools to construct semicircular systems and prove the free central limit theorem.
The logical dependencies are cumulative. Definitions of states and joint laws are used in every later chapter, freeness is used to characterize mixed moments, and noncrossing partitions provide the bookkeeping needed for cumulants. Readers should treat the early algebraic definitions as computational infrastructure rather than as background terminology.
These computational tools form the algebraic foundation for probability without points. Chapter 1 begins by asking which aspects of probability depend fundamentally on a sample space and which require only expectations.
# 1. Noncommutative Probability Spaces
This chapter replaces the sample space of classical probability by an algebra of observables. The guiding question is: how much of probability theory depends on points, events, and measures, and how much only needs expectations of products? Free probability begins with the second viewpoint. A noncommutative probability space records random variables as algebra elements and records their laws through linear functionals called states.
The main theme is that order matters. In classical probability the products $XY$ and $YX$ agree, so a joint moment such as $\mathbb E[XYZ]$ has no memory of ordering. In the noncommutative setting, $abc$ and $acb$ may carry different information, and the joint law must remember every word in the variables and their adjoints.
## Algebraic Observables and States
Before introducing random variables, we need an algebraic substitute for the space of [measurable functions](/page/Measurable%20Functions). The product in this algebra is the operation of multiplying observables, and the unit represents the constant random variable $1$.
[definition: Unital Algebra]
A unital algebra over $\mathbb C$ is a complex vector space $\mathcal A$ equipped with a bilinear associative product $\mathcal A \times \mathcal A \to \mathcal A$, written $(a,b) \mapsto ab$, and an element $1_{\mathcal A} \in \mathcal A$ such that $1_{\mathcal A}a=a1_{\mathcal A}=a$ for every $a \in \mathcal A$.
[/definition]
The unit lets us evaluate constants and normalize expectations. To distinguish real observables from complex ones, and to compare this algebraic model with operators on [Hilbert space](/page/Hilbert%20Space), the algebra also needs an analogue of complex conjugation and adjoint operators.
[definition: Star Algebra]
A $*$-algebra is a unital algebra $\mathcal A$ over $\mathbb C$ equipped with a map $a \mapsto a^*$ from $\mathcal A$ to $\mathcal A$ such that, for all $a,b \in \mathcal A$ and $\lambda \in \mathbb C$, the identities $(a+b)^*=a^*+b^*$, $(\lambda a)^*=\overline{\lambda}a^*$, $(ab)^*=b^*a^*$, $(a^*)^*=a$, and $1_{\mathcal A}^*=1_{\mathcal A}$ hold.
[/definition]
The operation $a \mapsto a^*$ abstracts complex conjugation for functions and adjoints for operators. This raises the next question: which linear functionals deserve to be called expectations, rather than merely arbitrary linear maps on the algebra?
[definition: State]
Let $\mathcal A$ be a $*$-algebra. A state on $\mathcal A$ is a linear functional $\varphi: \mathcal A \to \mathbb C$ such that $\varphi(1_{\mathcal A})=1$ and $\varphi(a^*a)\ge 0$ for every $a \in \mathcal A$.
[/definition]
A state is the noncommutative replacement for expectation. Some parts of free probability only need a unital expectation functional, so we separate the general algebraic probability space from the more structured positive $*$-setting.
[definition: Noncommutative Probability Space]
A noncommutative probability space is a pair $(\mathcal A,\varphi)$ where $\mathcal A$ is a unital algebra over $\mathbb C$ and $\varphi: \mathcal A \to \mathbb C$ is a unital linear functional.
[/definition]
When $\mathcal A$ is a $*$-algebra and $\varphi$ is a state, we call $(\mathcal A,\varphi)$ a $*$-probability space. The next issue is whether the state can detect the size of every observable; this matters because zero-length vectors must be removed when a state is represented on a Hilbert space.
[definition: Faithful State]
Let $\mathcal A$ be a $*$-algebra. A state $\varphi: \mathcal A \to \mathbb C$ is faithful if, for every $a \in \mathcal A$,
\begin{align*}
\varphi(a^*a)=0 \implies a=0.
\end{align*}
[/definition]
Faithfulness says that no nonzero observable has zero quadratic size. This leaves another structural question: even if multiplication is not commutative, can expectation ignore cyclic rotations of products?
[definition: Tracial State]
Let $\mathcal A$ be a $*$-algebra. A state $\varphi: \mathcal A \to \mathbb C$ is tracial if, for all $a,b \in \mathcal A$,
\begin{align*}
\varphi(ab)=\varphi(ba).
\end{align*}
[/definition]
A trace does not make the algebra commutative, but it makes cyclic rotations invisible to expectation. The reference model is ordinary bounded random variables, where all products commute and the state is integration.
[example: Functions on a Probability Space]
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and set $\mathcal A=L^\infty(\Omega,\mathcal F,\mathbb P)$, with multiplication and addition defined pointwise, with unit the constant function $1$, and with involution $f^*=\overline f$. Throughout, $f,g\in \mathcal A$ and $\lambda\in\mathbb C$. Pointwise multiplication is commutative, since
\begin{align*}
(fg)(\omega)=f(\omega)g(\omega)=g(\omega)f(\omega)=(gf)(\omega)
\end{align*}
for almost every $\omega$, which suffices because elements of $L^\infty$ are only defined up to null sets. The involution satisfies
\begin{align*}
((fg)^*)(\omega)=\overline{f(\omega)g(\omega)}=\overline{g(\omega)}\,\overline{f(\omega)}=(g^*f^*)(\omega),
\end{align*}
and similarly $(f+g)^*=f^*+g^*$, $(\lambda f)^*=\overline\lambda f^*$, $(f^*)^*=f$, and $1^*=1$.
Define $\varphi(f)=\mathbb E[f]=\int_\Omega f\,d\mathbb P$. Linearity of the integral gives linearity of $\varphi$, and normalization follows from
\begin{align*}
\varphi(1)=\int_\Omega 1\,d\mathbb P=\mathbb P(\Omega)=1.
\end{align*}
For positivity, compute
\begin{align*}
f^*f=\overline f\,f=|f|^2,
\end{align*}
so
\begin{align*}
\varphi(f^*f)=\int_\Omega |f|^2\,d\mathbb P\ge 0
\end{align*}
because $|f|^2$ is a nonnegative measurable function. Thus $(\mathcal A,\varphi)$ is a commutative $*$-probability space.
After passing to almost-everywhere equivalence classes, the state is faithful. Indeed, if $\varphi(f^*f)=0$, then
\begin{align*}
\int_\Omega |f|^2\,d\mathbb P=0.
\end{align*}
Since $|f|^2\ge 0$, this forces $|f|^2=0$ almost everywhere, hence $f=0$ almost everywhere. This recovers the usual algebra of bounded random variables, with expectation as the state.
[/example]
This example recovers ordinary bounded random variables. The point of the framework is that no reference to points of $\Omega$ is needed once the algebra and expectation are known.
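A finite sample space makes the checks in the example explicit. In the sketch below, functions on a three-point space are lists of values and the state is a weighted sum; the probabilities and the functions `f`, `g` are arbitrary choices made for the illustration. One point is given probability zero to show why faithfulness requires passing to almost-everywhere equivalence classes.

```python
def expectation(f, probs):
    """phi(f) = sum over points of P({omega}) * f(omega)."""
    return sum(p * value for p, value in zip(probs, f))

def star(f):
    """Involution: pointwise complex conjugation."""
    return [complex(v).conjugate() for v in f]

def mul(f, g):
    """Pointwise product of two functions."""
    return [a * b for a, b in zip(f, g)]

probs = [0.5, 0.5, 0.0]            # the third point is a null set
f = [1 + 2j, -1j, 3]
g = [2, 1j, -1]

commutes = mul(f, g) == mul(g, f)
unit_value = expectation([1, 1, 1], probs)        # phi(1)
quadratic_size = expectation(mul(star(f), f), probs)   # phi(f^* f)

# The indicator of the null point is nonzero as a function,
# but the state cannot see it.
null_indicator = [0, 0, 1]
null_size = expectation(mul(star(null_indicator), null_indicator), probs)
```

Here `quadratic_size` is a nonnegative real number, while `null_size` is zero even though `null_indicator` is not the zero function, so the state is faithful only after functions that agree almost everywhere are identified.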
## Moment Functionals and Positivity
The basic data of a random variable are its moments. In a noncommutative algebra, the right question is not only what $\varphi(a^k)$ is, but what $\varphi$ does to every ordered product made from several variables.
[definition: Noncommutative Random Variable]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space. A noncommutative random variable is an element $a \in \mathcal A$. If $\mathcal A$ is a $*$-algebra, then $a$ is self-adjoint if $a^*=a$.
[/definition]
Self-adjoint elements play the role of real-valued random variables. This raises the next question for several variables: what should count as a joint moment when the order of multiplication is part of the data?
[definition: Joint Star Moment]
Let $(\mathcal A,\varphi)$ be a $*$-probability space, and let $a_1,\dots,a_n \in \mathcal A$. A joint $*$-moment of $(a_1,\dots,a_n)$ is a number of the form
\begin{align*}
\varphi(a_{i_1}^{\varepsilon_1}a_{i_2}^{\varepsilon_2}\cdots a_{i_k}^{\varepsilon_k}),
\end{align*}
where $k \ge 0$, each $i_j \in \{1,\dots,n\}$, and each $\varepsilon_j \in \{1,*\}$, with $a_i^1=a_i$.
[/definition]
The case $k=0$ is the empty product, whose value is $\varphi(1_{\mathcal A})=1$. Since these moments form an infinite family indexed by words, the next task is to package the family as a single law on a universal polynomial algebra.
[definition: Noncommutative Polynomial Law]
Let $\mathbb C\langle X_1,\dots,X_n,X_1^*,\dots,X_n^*\rangle$ denote the free $*$-algebra of noncommutative polynomials. The joint $*$-law of $a_1,\dots,a_n \in \mathcal A$ is the linear functional
\begin{align*}
\mu_{a_1,\dots,a_n}(P)=\varphi(P(a_1,\dots,a_n,a_1^*,\dots,a_n^*))
\end{align*}
for $P \in \mathbb C\langle X_1,\dots,X_n,X_1^*,\dots,X_n^*\rangle$.
[/definition]
This definition packages all ordered moments into one object. The next problem is to identify the first necessary test for a proposed polynomial functional to come from an actual state.
[quotetheorem:7109]
[citeproof:7109]
This theorem is the first moment-problem constraint. A proposed list of numbers can be a noncommutative law only if, for every finite list of words $w_1,\dots,w_m$, the Gram matrix with entries $\mu(w_i^*w_j)$ is positive semidefinite.
[example: Projection and Bernoulli Law]
Let $(\mathcal A,\varphi)$ be a $*$-probability space, and let $p\in\mathcal A$ satisfy $p=p^*=p^2$. Since $p^2=p$, repeated multiplication gives the powers of $p$: for $k=1$ this is clear, and if $p^k=p$ for some $k\ge 1$, then
\begin{align*}
p^{k+1}=p^kp=pp=p^2=p.
\end{align*}
Thus $p^k=p$ for every $k\ge 1$, and applying the linear functional $\varphi$ gives
\begin{align*}
\varphi(p^k)=\varphi(p).
\end{align*}
Write $q=\varphi(p)$. Positivity of the state gives
\begin{align*}
q=\varphi(p)=\varphi(p^2)=\varphi(p^*p)\ge 0.
\end{align*}
Also $1_{\mathcal A}-p$ is self-adjoint because $(1_{\mathcal A}-p)^*=1_{\mathcal A}-p^*=1_{\mathcal A}-p$, and
\begin{align*}
(1_{\mathcal A}-p)^*(1_{\mathcal A}-p)=(1_{\mathcal A}-p)(1_{\mathcal A}-p)=1_{\mathcal A}-p-p+p^2=1_{\mathcal A}-p.
\end{align*}
Therefore
\begin{align*}
0\le \varphi((1_{\mathcal A}-p)^*(1_{\mathcal A}-p))=\varphi(1_{\mathcal A}-p)=\varphi(1_{\mathcal A})-\varphi(p)=1-q,
\end{align*}
so $0\le q\le 1$.
If $X$ is a Bernoulli random variable with parameter $q$, then $X(\omega)\in\{0,1\}$ and $\mathbb P(X=1)=q$. For every $k\ge 1$, the pointwise identity $0^k=0$ and $1^k=1$ gives $X^k=X$, hence
\begin{align*}
\mathbb E[X^k]=\mathbb E[X]=0\cdot \mathbb P(X=0)+1\cdot \mathbb P(X=1)=q.
\end{align*}
Thus the scalar moments of the projection $p$ match the scalar moments of a Bernoulli random variable with parameter $q=\varphi(p)$. Projections are therefore the algebraic replacement for events.
[/example]
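The moment computation in this example can be checked numerically. The following is a minimal sketch, assuming the concrete space $(M_3(\mathbb C),\operatorname{tr}_3)$ and a particular projection `p` chosen only for illustration; the helper names `matmul` and `tr` are likewise hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tr(a):
    """Normalized trace Tr(a)/N, kept exact with Fraction."""
    return Fraction(sum(a[i][i] for i in range(len(a))), len(a))

h = Fraction(1, 2)
# Hypothetical projection in M_3(C): real symmetric with p*p = p.
p = [[h, h, 0], [h, h, 0], [0, 0, 1]]
q = tr(p)  # plays the role of the Bernoulli parameter

# Moments phi(p^k) for k = 1..5, computed by repeated multiplication.
moments, power = [], p
for k in range(1, 6):
    moments.append(tr(power))
    power = matmul(power, p)

# Moments E[X^k] of a Bernoulli(q) variable: 0^k * (1 - q) + 1^k * q.
bernoulli = [0**k * (1 - q) + 1**k * q for k in range(1, 6)]

print(q, moments == bernoulli)
```

Exact rational arithmetic is used so that the comparison of moments is an equality test rather than a floating-point approximation.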
The projection example shows how classical probability language survives algebraically: an event is replaced by an idempotent self-adjoint observable. We next record the standard representation theorem behind the phrase "observable".
[quotetheorem:7110]
[citeproof:7110]
The GNS construction explains why positivity is the correct axiom for states. It turns algebraic expectation into a Hilbert-space vector state, matching the operator viewpoint used throughout free probability. Conceptually, this means that the algebraic notation $\varphi(a)$ can be read as an [inner product](/page/Inner%20Product) $\langle \pi(a)\xi,\xi\rangle$ after passing to a representation. The construction is not introduced for its own sake here; it justifies treating elements of abstract algebras as observables while still retaining the Hilbert-space geometry needed for inequalities and operator models later in the course.
## Classical Probability as the Commutative Case
The framework should not replace classical probability by something unrelated. The right test is whether commutative algebras reproduce ordinary random variables and their joint distributions.
[quotetheorem:7111]
[proofunderconstruction:7111]
Thus ordinary joint moments are a special case of joint $*$-moments, but the hypotheses explain exactly why this is the commutative boundary of the theory rather than the whole theory. Boundedness keeps the multiplication operators inside a unital algebra of bounded operators, and commutativity of pointwise multiplication is what makes all words with the same multiset of letters collapse to the same classical monomial. If the variables are unbounded, or if one works with noncommuting operators, the same representation argument no longer automatically produces a bounded noncommutative probability space with the same word calculus. The theorem is therefore a compatibility result: it shows that the new language contains classical moments without changing them, while also identifying what classical probability forgets, namely the order information carried by noncommutative words. That order information is exactly what later definitions of joint law, freeness, and free cumulants are designed to preserve.
[example: Scalar Random Variables as Multiplication Operators]
Let $X,Y\in L^\infty(\Omega)$, and define $M_X:L^2(\Omega)\to L^2(\Omega)$ by $(M_Xf)(\omega)=X(\omega)f(\omega)$. The operator is bounded because, for $f\in L^2(\Omega)$,
\begin{align*}
\|M_Xf\|_2^2=\int_\Omega |X(\omega)f(\omega)|^2\,d\mathbb P(\omega)\le \|X\|_\infty^2\int_\Omega |f(\omega)|^2\,d\mathbb P(\omega)=\|X\|_\infty^2\|f\|_2^2.
\end{align*}
For $f\in L^2(\Omega)$, multiplication of operators matches pointwise multiplication of random variables:
\begin{align*}
((M_XM_Y)f)(\omega)=M_X(Yf)(\omega)=X(\omega)Y(\omega)f(\omega)=(XY)(\omega)f(\omega)=(M_{XY}f)(\omega).
\end{align*}
Since $X(\omega)Y(\omega)=Y(\omega)X(\omega)$ almost everywhere, $M_XM_Y=M_{XY}=M_{YX}=M_YM_X$.
Using the $L^2$ inner product linear in the first variable, the adjoint is multiplication by $\overline X$. Indeed, for $f,g\in L^2(\Omega)$,
\begin{align*}
(M_Xf,g)_{L^2}=\int_\Omega X(\omega)f(\omega)\overline{g(\omega)}\,d\mathbb P(\omega)=\int_\Omega f(\omega)\overline{\overline{X(\omega)}g(\omega)}\,d\mathbb P(\omega)=(f,M_{\overline X}g)_{L^2}.
\end{align*}
Thus $M_X^*=M_{\overline X}$. With the cyclic vector $1\in L^2(\Omega)$,
\begin{align*}
(M_X1,1)_{L^2}=\int_\Omega X(\omega)\cdot 1\cdot \overline{1}\,d\mathbb P(\omega)=\int_\Omega X\,d\mathbb P=\mathbb E[X].
\end{align*}
So classical scalar random variables are represented as commuting multiplication operators on $L^2(\Omega)$, and expectation is recovered as the vector state at $1$.
[/example]
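On a finite sample space the multiplication operators are diagonal matrices, and the identities of the example become finite sums. The sketch below assumes a three-point sample space with hypothetical weights `prob` and real-valued variables `X`, `Y` (so complex conjugation in the inner product is trivial); none of these choices come from the text.

```python
from fractions import Fraction

# Hypothetical finite probability space with three outcomes.
prob = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
X = [1, -2, 3]
Y = [2, 0, 1]
one = [1, 1, 1]  # the constant function 1, used as the cyclic vector

def mult_op(x):
    """Return the multiplication operator M_x acting on functions on Omega."""
    return lambda f: [xi * fi for xi, fi in zip(x, f)]

def inner(f, g):
    """Weighted inner product sum f*g*P (real data, so no conjugate needed)."""
    return sum(fi * gi * pi for fi, gi, pi in zip(f, g, prob))

def expectation(x):
    """Classical expectation E[x] on the finite sample space."""
    return sum(xi * pi for xi, pi in zip(x, prob))

M_X, M_Y = mult_op(X), mult_op(Y)
f = [5, -1, 4]  # arbitrary test function

commute = M_X(M_Y(f)) == M_Y(M_X(f))      # M_X M_Y = M_Y M_X
state_X = inner(M_X(one), one)            # vector state at 1 applied to M_X
state_XY = inner(M_X(M_Y(one)), one)      # mixed moment E[XY]
print(commute, state_X, state_XY)
```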
This operator model connects the classical and noncommutative pictures. The new phenomena begin when the operators representing variables no longer commute.
[remark: Commutative Collapse]
In a commutative algebra, the words $X_1X_2X_1$ and $X_1^2X_2$ represent the same element. In a noncommutative algebra, they may represent different elements and can have different expectations under the same state. The noncommutative law is therefore a functional on words, not only a measure on $\mathbb C^n$.
[/remark]
This distinction is why free probability cannot be reduced to classical probability on a larger sample space. Noncommuting variables encode joint information that is invisible to marginal distributions.
## Matrix Models and Tracial Moments
Finite matrices provide the first concrete noncommutative models. They are small enough for direct computation and rich enough to show the central role of traces.
[definition: Normalized Matrix Trace]
For $N\in\mathbb N$, the normalized trace on $M_N(\mathbb C)$ is the linear functional
\begin{align*}
\operatorname{tr}_N(A)=\frac{1}{N}\operatorname{Tr}(A),
\end{align*}
where $\operatorname{Tr}(A)=\sum_{i=1}^N A_{ii}$ is the usual matrix trace.
[/definition]
The normalization makes $\operatorname{tr}_N(I_N)=1$, so it has the same normalization as expectation. To use matrices as probability spaces, we must check not only normalization but also positivity, faithfulness, and the trace identity.
[quotetheorem:7112]
[citeproof:7112]
Matrix algebras show how a noncommutative space can still have a trace. The normalized trace is positive because $\operatorname{Tr}(A^*A)=\sum_{i,j}|A_{ij}|^2\ge 0$, faithful because this sum vanishes only when $A=0$, and tracial because the ordinary matrix trace is invariant under cyclic permutation. These properties make matrices the basic finite-dimensional testing ground for free probability: products may fail to commute, but expectations still have the cyclic symmetry needed for many moment arguments. We now isolate that symmetry in a form that no longer depends on matrices.
[quotetheorem:7113]
[citeproof:7113]
Cyclicity is weaker than commutativity. It forces $\varphi(abc)=\varphi(bca)=\varphi(cab)$, but it does not force $\varphi(abc)=\varphi(acb)$ in general.
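A quick numerical illustration of this gap, as a sketch with three integer matrices `a`, `b`, `c` picked arbitrarily for the purpose (they are not taken from the text):

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def word_trace(*factors):
    """Normalized trace of the ordered product of the given matrices."""
    prod = factors[0]
    for m in factors[1:]:
        prod = matmul(prod, m)
    n = len(prod)
    return Fraction(sum(prod[i][i] for i in range(n)), n)

# Hypothetical 2x2 matrices; any noncommuting triple would do.
a = [[1, 2], [0, 1]]
b = [[1, 0], [3, 1]]
c = [[2, 0], [0, 1]]

cyclic = [word_trace(a, b, c), word_trace(b, c, a), word_trace(c, a, b)]
swapped = word_trace(a, c, b)
print(cyclic, swapped)
```

The three cyclic rotations give the same value, while exchanging the last two letters changes it.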
[example: Noncommuting Matrices with the Same Marginals]
In $M_2(\mathbb C)$ with normalized trace, let $E_{ij}$ be the matrix unit with $1$ in entry $(i,j)$ and $0$ elsewhere. The matrix units multiply by the rule $E_{ij}E_{kl}=\delta_{jk}E_{il}$ and satisfy $E_{ij}^*=E_{ji}$. Set
\begin{align*}
A=E_{11}-E_{22},\qquad B=E_{12}+E_{21},\qquad C=iE_{12}-iE_{21}.
\end{align*}
Then
\begin{align*}
A^*=E_{11}-E_{22}=A,\qquad B^*=E_{21}+E_{12}=B,\qquad C^*=(-i)E_{21}+iE_{12}=C.
\end{align*}
Their normalized traces are all zero:
\begin{align*}
\operatorname{tr}_2(A)=\frac{1}{2}(1-1)=0,\qquad \operatorname{tr}_2(B)=0,\qquad \operatorname{tr}_2(C)=0.
\end{align*}
Each of $A,B,C$ squares to the identity. For $A$,
\begin{align*}
A^2=(E_{11}-E_{22})(E_{11}-E_{22})=E_{11}-0-0+E_{22}=I_2.
\end{align*}
For $B$,
\begin{align*}
B^2=(E_{12}+E_{21})(E_{12}+E_{21})=0+E_{11}+E_{22}+0=I_2.
\end{align*}
For $C$,
\begin{align*}
C^2=(iE_{12}-iE_{21})(iE_{12}-iE_{21})=0+E_{11}+E_{22}+0=I_2.
\end{align*}
Thus, for $X\in\{A,B,C\}$ and $m\ge 0$, $X^{2m}=I_2$ and $X^{2m+1}=X$. Therefore
\begin{align*}
\operatorname{tr}_2(X^{2m})=\operatorname{tr}_2(I_2)=1,\qquad \operatorname{tr}_2(X^{2m+1})=\operatorname{tr}_2(X)=0.
\end{align*}
So $A,B,C$ have the same one-variable scalar moments: odd moments vanish and even moments equal $1$.
The joint behavior is different because the matrices do not commute. Compute
\begin{align*}
AB=(E_{11}-E_{22})(E_{12}+E_{21})=E_{12}+0-0-E_{21}=E_{12}-E_{21}.
\end{align*}
Since
\begin{align*}
-iC=-i(iE_{12}-iE_{21})=E_{12}-E_{21},
\end{align*}
we have $AB=-iC$. On the other hand,
\begin{align*}
BA=(E_{12}+E_{21})(E_{11}-E_{22})=0-E_{12}+E_{21}-0=-E_{12}+E_{21}.
\end{align*}
Hence $AB\ne BA$. Similarly,
\begin{align*}
AC=(E_{11}-E_{22})(iE_{12}-iE_{21})=iE_{12}+iE_{21}=iB.
\end{align*}
Thus the variables have identical one-variable laws, but ordered mixed products such as $AB$ and $AC$ remember how the variables are placed in the word.
[/example]
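The relations in this example are easy to confirm by machine. The sketch below encodes $A$, $B$, $C$ as nested lists with Python complex numbers; the helper names are hypothetical. It also evaluates two three-letter moments, which follow from $AB=-iC$, $AC=iB$ and $B^2=C^2=I_2$, to show that the difference is visible at the level of scalars and not only of matrices.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(z, a):
    """Multiply every entry of the matrix a by the scalar z."""
    return [[z * x for x in row] for row in a]

def tr2(a):
    """Normalized trace on 2x2 matrices."""
    return (a[0][0] + a[1][1]) / 2

I2 = [[1, 0], [0, 1]]
A = [[1, 0], [0, -1]]       # E11 - E22
B = [[0, 1], [1, 0]]        # E12 + E21
C = [[0, 1j], [-1j, 0]]     # i E12 - i E21

same_marginals = all(matmul(X, X) == I2 and tr2(X) == 0 for X in (A, B, C))
AB, BA, AC = matmul(A, B), matmul(B, A), matmul(A, C)

# Three-letter moments: tr2(ABC) and tr2(ACB) differ even though every
# one-variable moment of A, B, C agrees.
t_abc = tr2(matmul(AB, C))
t_acb = tr2(matmul(AC, B))
print(same_marginals, AB == scale(-1j, C), AB != BA, t_abc, t_acb)
```

Two-letter moments cannot separate the orderings here, since $\operatorname{tr}_2(AB)=\operatorname{tr}_2(BA)$ by traciality; the first place where order shows up in the moments is at length three.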
The example separates marginal laws from joint laws. A joint noncommutative law is not determined by the individual laws of its variables; it depends on all mixed moments.
[example: Different Joint Laws with Matching One-Variable Laws]
Work in $(M_2(\mathbb C),\operatorname{tr}_2)$ with
\begin{align*}
A=E_{11}-E_{22},\qquad B=E_{12}+E_{21}.
\end{align*}
Both pairs $(A,B)$ and $(A,A)$ have the same first-coordinate marginal law because their first coordinate is $A$. Their second-coordinate marginal laws also agree. Indeed,
\begin{align*}
A^2=(E_{11}-E_{22})(E_{11}-E_{22})=E_{11}-0-0+E_{22}=I_2,
\end{align*}
and
\begin{align*}
B^2=(E_{12}+E_{21})(E_{12}+E_{21})=0+E_{11}+E_{22}+0=I_2.
\end{align*}
Also,
\begin{align*}
\operatorname{tr}_2(A)=\frac{1}{2}(1-1)=0,
\end{align*}
and
\begin{align*}
\operatorname{tr}_2(B)=\frac{1}{2}(0+0)=0.
\end{align*}
Since $A^2=B^2=I_2$, induction gives $A^{2m}=B^{2m}=I_2$ and $A^{2m+1}=A$, $B^{2m+1}=B$ for every $m\ge 0$. Therefore
\begin{align*}
\operatorname{tr}_2(A^{2m})=\operatorname{tr}_2(B^{2m})=\operatorname{tr}_2(I_2)=1,
\end{align*}
and
\begin{align*}
\operatorname{tr}_2(A^{2m+1})=\operatorname{tr}_2(A)=0=\operatorname{tr}_2(B)=\operatorname{tr}_2(B^{2m+1}).
\end{align*}
Thus the two second coordinates have the same one-variable moments.
The joint laws still differ. For the pair $(A,A)$,
\begin{align*}
AA=A^2=I_2,
\end{align*}
so
\begin{align*}
\operatorname{tr}_2(AA)=\operatorname{tr}_2(I_2)=1.
\end{align*}
For the pair $(A,B)$,
\begin{align*}
AB=(E_{11}-E_{22})(E_{12}+E_{21})=E_{12}+0-0-E_{21}=E_{12}-E_{21}.
\end{align*}
The diagonal entries of $E_{12}-E_{21}$ are both $0$, hence
\begin{align*}
\operatorname{tr}_2(AB)=\frac{1}{2}(0+0)=0.
\end{align*}
So the mixed word $X_1X_2$ separates the two pairs: matching one-variable laws do not determine the joint noncommutative law.
[/example]
This final example is the finite-dimensional warning that drives the rest of the course. Independence in free probability will be a rule for computing mixed moments from individual centered moments, but it must operate on words in noncommuting variables rather than on events in a sample space.
Once a noncommutative random variable is defined through its moments, the challenge becomes determining which moment sequences are realizable as laws; Chapter 2 solves this moment problem in the tracial setting, clarifying the structure of one-variable and joint laws.
# 2. Laws and Moment Problems in the Tracial Setting
This chapter turns the abstract data of a noncommutative probability space into the object that free probability studies: the law of one or several noncommutative random variables. In the commutative case, a real random variable is described by a probability measure on $\mathbb R$; in the noncommutative case, several variables are described by all mixed moments, because the order of multiplication matters. The tracial assumption adds a symmetry: moments are invariant under cyclic rotation of words, but not under arbitrary permutation of letters.
## Moment Sequences for One Self-Adjoint Variable
How much information is contained in the sequence $\varphi(a^n)$ when $a=a^*$? For a single self-adjoint variable, the noncommutative setting collapses back to an ordinary real moment problem: powers commute with each other, and positivity forces the moments to behave like moments of a probability measure on the real line.
[definition: Moment Sequence of a Self-Adjoint Variable]
Let $(\mathcal A, \varphi)$ be a noncommutative probability space and let $a \in \mathcal A$ satisfy $a=a^*$. The moment sequence of $a$ is the element $(m_n)_{n\ge 0}\in \mathbb C^{\mathbb N_0}$ defined by
\begin{align*}
m_n=\varphi(a^n), \qquad n\ge 0.
\end{align*}
[/definition]
Equivalently, the same data may be viewed as the function $m:\mathbb N_0\to \mathbb C$ with $m(n)=\varphi(a^n)$.
The condition $m_0=1$ comes from unitality of the state, but a normalized scalar sequence is not yet a valid law. The obstruction is that moment sequences must be compatible with positivity: for every polynomial $P$, the element $P(a)^*P(a)$ must have nonnegative expectation. The theorem below extracts the resulting Hankel positivity condition from the state axiom.
[quotetheorem:7114]
[citeproof:7114]
This theorem says that the infinite Hankel matrix $(m_{i+j})_{i,j\ge 0}$ is positive semidefinite on finitely supported vectors. The self-adjointness hypothesis is essential for this particular Hankel form: in the scalar algebra with $a=i$ and $\varphi$ the identity state, the sequence $m_n=i^n$ does not even make the displayed quadratic form real in general. The theorem does not say that every Hankel-positive sequence is realized by a bounded operator; for that, a support or growth condition is still needed. Its role here is to separate genuine moment data from arbitrary scalar sequences before we identify concrete laws in simple examples.
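The Hankel condition can be tested on small examples. The following is a rough sketch, assuming the quadratic form in question is $\sum_{i,j}\overline{c_i}c_jm_{i+j}$ and restricting to real coefficient vectors for simplicity; the function name `hankel_form` and the two test sequences are illustrative choices.

```python
from itertools import product

def hankel_form(m, c):
    """Evaluate sum_{i,j} c_i c_j m_{i+j} for a real coefficient vector c.

    The moment list m must have at least 2*len(c) - 1 entries.
    """
    n = len(c)
    return sum(c[i] * c[j] * m[i + j] for i in range(n) for j in range(n))

# Moments of the symmetric measure (delta_1 + delta_{-1}) / 2: 1, 0, 1, 0, 1.
sym = [1, 0, 1, 0, 1]
# A normalized sequence that is NOT a moment sequence: m_2 = -1.
bad = [1, 0, -1]

# Brute-force check of nonnegativity on a small integer grid of vectors.
sym_ok = all(hankel_form(sym, c) >= 0 for c in product(range(-2, 3), repeat=3))

# The vector c = (0, 1) corresponds to the polynomial P(t) = t.
bad_value = hankel_form(bad, [0, 1])
print(sym_ok, bad_value)
```

A grid search is of course not a proof of positive semidefiniteness; it only illustrates what the condition asks for, and how a single test vector can rule a sequence out.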
[example: Projection Law]
Let $p=p^2=p^*$ be a projection in a tracial noncommutative probability space $(\mathcal A,\varphi)$, and set $\lambda=\varphi(p)$. Since $p=p^*p$, positivity of the state gives
\begin{align*}
\lambda=\varphi(p)=\varphi(p^*p)\ge 0.
\end{align*}
Also $1-p$ is self-adjoint, and
\begin{align*}
(1-p)^*(1-p)=(1-p)(1-p)=1-2p+p^2=1-p.
\end{align*}
Applying positivity to $1-p$ gives
\begin{align*}
0\le \varphi((1-p)^*(1-p))=\varphi(1-p)=\varphi(1)-\varphi(p)=1-\lambda,
\end{align*}
so $0\le \lambda\le 1$.
For the moments, $m_0=\varphi(1)=1$. If $n\ge 1$, then $p^n=p$ by induction from $p^2=p$, hence
\begin{align*}
m_n=\varphi(p^n)=\varphi(p)=\lambda.
\end{align*}
Now let $\mu=(1-\lambda)\delta_0+\lambda\delta_1$. Its total mass is $(1-\lambda)+\lambda=1$, and the inequalities above make both weights nonnegative. Its zeroth moment is
\begin{align*}
\int 1\,d\mu=(1-\lambda)+\lambda=1,
\end{align*}
while for every $n\ge 1$,
\begin{align*}
\int t^n\,d\mu(t)=(1-\lambda)0^n+\lambda 1^n=\lambda.
\end{align*}
Thus the moment sequence of $p$ is exactly the moment sequence of $(1-\lambda)\delta_0+\lambda\delta_1$, so a projection has the same one-variable law as a classical $\{0,1\}$-valued random variable.
[/example]
The projection example shows that one-variable laws can look classical even inside a noncommutative algebra. A second basic self-adjoint variable is a symmetry, whose square is the identity.
[example: Symmetry Law]
Let $s=s^*=s^{-1}$ in a tracial noncommutative probability space, and put $\alpha=\varphi(s)$. Since $s^{-1}s=1$, multiplying the identity $s^{-1}=s$ on the right by $s$ gives
\begin{align*}
s^2=1.
\end{align*}
For $k\ge 0$,
\begin{align*}
s^{2k}=(s^2)^k=1^k=1,
\end{align*}
and
\begin{align*}
s^{2k+1}=s(s^2)^k=s1^k=s.
\end{align*}
Therefore
\begin{align*}
\varphi(s^{2k})=\varphi(1)=1
\end{align*}
and
\begin{align*}
\varphi(s^{2k+1})=\varphi(s)=\alpha.
\end{align*}
The positivity bounds on $\alpha$ come from the two positive elements $(1+s)^*(1+s)$ and $(1-s)^*(1-s)$. Since $s=s^*$ and $s^2=1$,
\begin{align*}
(1+s)^*(1+s)=(1+s)(1+s)=1+2s+s^2=2+2s,
\end{align*}
so positivity gives
\begin{align*}
0\le \varphi((1+s)^*(1+s))=\varphi(2+2s)=2+2\alpha.
\end{align*}
Similarly,
\begin{align*}
(1-s)^*(1-s)=(1-s)(1-s)=1-2s+s^2=2-2s,
\end{align*}
and hence
\begin{align*}
0\le \varphi((1-s)^*(1-s))=\varphi(2-2s)=2-2\alpha.
\end{align*}
Thus $-1\le \alpha\le 1$.
Now define
\begin{align*}
\mu=\frac{1+\alpha}{2}\delta_1+\frac{1-\alpha}{2}\delta_{-1}.
\end{align*}
The inequalities above make both weights nonnegative, and its total mass is
\begin{align*}
\frac{1+\alpha}{2}+\frac{1-\alpha}{2}=1.
\end{align*}
For even moments,
\begin{align*}
\int t^{2k}\,d\mu(t)=\frac{1+\alpha}{2}1^{2k}+\frac{1-\alpha}{2}(-1)^{2k}=\frac{1+\alpha}{2}+\frac{1-\alpha}{2}=1.
\end{align*}
For odd moments,
\begin{align*}
\int t^{2k+1}\,d\mu(t)=\frac{1+\alpha}{2}1^{2k+1}+\frac{1-\alpha}{2}(-1)^{2k+1}=\frac{1+\alpha}{2}-\frac{1-\alpha}{2}=\alpha.
\end{align*}
Thus a symmetry has the same one-variable moment sequence as a classical random variable supported on $\{-1,1\}$ with weights $(1-\alpha)/2$ at $-1$ and $(1+\alpha)/2$ at $1$.
[/example]
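As a concrete check of the symmetry law, the sketch below assumes the tracial space $(M_3(\mathbb C),\operatorname{tr}_3)$ and a permutation matrix `s` chosen only for illustration, then compares the operator moments with the moments of the two-point measure.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tr(a):
    """Normalized trace Tr(a)/N, kept exact with Fraction."""
    return Fraction(sum(a[i][i] for i in range(len(a))), len(a))

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical symmetry: the permutation matrix swapping the first two basis vectors.
s = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

alpha = tr(s)
w_plus, w_minus = (1 + alpha) / 2, (1 - alpha) / 2  # weights at +1 and -1

def operator_moment(n):
    """Compute tr(s^n) by repeated multiplication."""
    power = I3
    for _ in range(n):
        power = matmul(power, s)
    return tr(power)

def measure_moment(n):
    """n-th moment of w_plus * delta_1 + w_minus * delta_{-1}."""
    return w_plus * 1**n + w_minus * (-1)**n

match = all(operator_moment(n) == measure_moment(n) for n in range(8))
print(alpha, w_plus, w_minus, match)
```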
The preceding examples identify familiar compactly supported probability measures from their moments. To use moment sequences as laws rather than as partial invariants, we need a uniqueness theorem saying that compactly supported measures cannot share all moments unless they are the same measure.
[quotetheorem:7115]
[citeproof:7115]
The compact support assumption is the technical point that makes the moment sequence complete. It cannot simply be dropped: the classical Stieltjes moment problem contains moment-indeterminate families, including perturbations of the lognormal distribution, where distinct probability measures have the same moments of every order. This is the one-variable shadow of a broader theme from the theory of [measure spaces](/page/Measure%20Space): a linear functional on test functions becomes a measure only after positivity and continuity conditions are controlled. The theorem also does not identify the support radius from finitely many moments; it only says that once a common compact interval is known, all moments determine the measure. This is why the course treats bounded variables as the default model before moving to noncommutative joint laws.
## Joint Laws as Functionals on Noncommutative Polynomials
What replaces a probability measure on $\mathbb R^d$ when variables no longer commute? The right test objects are not ordinary polynomials in commuting coordinates, but words in noncommuting letters. A joint law assigns a scalar to every such word, and this scalar is the mixed moment of the corresponding variables.
[definition: Noncommutative Polynomial Algebra]
For $d\in \mathbb N$, the noncommutative polynomial algebra $\mathbb C\langle X_1,\dots,X_d\rangle$ is the unital complex algebra whose basis consists of all words in the letters $X_1,\dots,X_d$, including the empty word $1$.
[/definition]
Multiplication is concatenation of words, so $X_1X_2$ and $X_2X_1$ are different basis elements. The problem is that there is no single measure on $\mathbb R^d$ that can record the order-sensitive moments of noncommuting variables. Instead, the law must be a rule that evaluates every word in the formal letters by substituting the corresponding variables and applying the state.
[definition: Joint Law]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space and let $a_1,\dots,a_d\in \mathcal A$. The joint law of $(a_1,\dots,a_d)$ is the linear functional
\begin{align*}
\mu_{a_1,\dots,a_d}:\mathbb C\langle X_1,\dots,X_d\rangle \to \mathbb C
\end{align*}
defined on words by
\begin{align*}
\mu_{a_1,\dots,a_d}(X_{i_1}\cdots X_{i_n})=\varphi(a_{i_1}\cdots a_{i_n}),
\end{align*}
and by $\mu_{a_1,\dots,a_d}(1)=1$.
[/definition]
A joint law remembers every ordered mixed moment. This is stronger than knowing all marginal one-variable laws, because the marginal law of each $a_i$ contains only moments of powers of that single variable.
[example: Joint Law of Two Matrix Units]
In $M_2(\mathbb C)$, the standard matrix units satisfy $e_{ij}e_{kl}=\delta_{jk}e_{il}$, where $\delta_{jk}$ is the Kronecker delta. With $x=e_{12}$ and $y=e_{21}$, this gives
\begin{align*}
xy=e_{12}e_{21}=\delta_{2,2}e_{11}=e_{11}.
\end{align*}
Similarly,
\begin{align*}
yx=e_{21}e_{12}=\delta_{1,1}e_{22}=e_{22}.
\end{align*}
Since the normalized trace is $\operatorname{tr}_2(T)=\frac{1}{2}\operatorname{Tr}(T)$, and $\operatorname{Tr}(e_{11})=\operatorname{Tr}(e_{22})=1$, we get
\begin{align*}
\operatorname{tr}_2(xy)=\operatorname{tr}_2(e_{11})=\frac{1}{2}.
\end{align*}
Also,
\begin{align*}
\operatorname{tr}_2(yx)=\operatorname{tr}_2(e_{22})=\frac{1}{2}.
\end{align*}
The repeated letters vanish because
\begin{align*}
x^2=e_{12}e_{12}=\delta_{2,1}e_{12}=0.
\end{align*}
Likewise,
\begin{align*}
y^2=e_{21}e_{21}=\delta_{1,2}e_{21}=0.
\end{align*}
Thus any word containing $xx$ or $yy$ as adjacent letters is zero after multiplying that adjacent pair. Alternating words need not vanish: for example,
\begin{align*}
xyxy=(xy)(xy)=e_{11}e_{11}=\delta_{1,1}e_{11}=e_{11},
\end{align*}
so $\operatorname{tr}_2(xyxy)=1/2$. Similarly,
\begin{align*}
yxyx=(yx)(yx)=e_{22}e_{22}=\delta_{2,2}e_{22}=e_{22},
\end{align*}
so $\operatorname{tr}_2(yxyx)=1/2$. This shows concretely that the joint law records ordered words: the products are computed letter by letter, and the alternating pattern is the part that survives for this pair of matrix units.
[/example]
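The rule "substitute the variables into the word, multiply in order, apply the state" translates directly into code. Here is a minimal sketch for the pair of matrix units from the example, with words written as strings over the hypothetical letters `"x"` and `"y"`:

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tr2(a):
    """Normalized trace on 2x2 matrices, kept exact."""
    return Fraction(a[0][0] + a[1][1], 2)

letters = {
    "x": [[0, 1], [0, 0]],  # e12
    "y": [[0, 0], [1, 0]],  # e21
}

def joint_moment(word):
    """Evaluate the joint law on a word: multiply left to right, then trace."""
    prod = [[1, 0], [0, 1]]  # the empty word evaluates to the identity
    for ch in word:
        prod = matmul(prod, letters[ch])
    return tr2(prod)

for w in ["", "xy", "yx", "xx", "yy", "xyxy", "yxyx", "xxyy"]:
    print(repr(w), joint_moment(w))
```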
The matrix-unit example is not self-adjoint, but it captures the key point that words are evaluated in order. To formulate laws of self-adjoint variables intrinsically, we next need the involution on noncommutative polynomials that models taking adjoints after evaluation.
[definition: Star Operation on Noncommutative Polynomials]
The star operation on $\mathbb C\langle X_1,\dots,X_d\rangle$ with self-adjoint indeterminates is the involution
\begin{align*}
*:\mathbb C\langle X_1,\dots,X_d\rangle \to \mathbb C\langle X_1,\dots,X_d\rangle
\end{align*}
determined by $X_i^*=X_i$, conjugate-linearity, and
\begin{align*}
(X_{i_1}\cdots X_{i_n})^*=X_{i_n}\cdots X_{i_1}.
\end{align*}
[/definition]
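One way to experiment with this involution is to store a polynomial as a dictionary from words to coefficients. The encoding below is an assumption made for this sketch (a word is a tuple of letter indices, the empty tuple is the unit); it is not notation used elsewhere in the notes.

```python
def star(p):
    """Reverse each word and conjugate each coefficient (self-adjoint letters)."""
    return {tuple(reversed(w)): complex(c).conjugate() for w, c in p.items()}

def multiply(p, q):
    """Product of two noncommutative polynomials: concatenate words."""
    out = {}
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            w = w1 + w2
            out[w] = out.get(w, 0) + c1 * c2
    return {w: c for w, c in out.items() if c != 0}

# Hypothetical test polynomials in two letters X1, X2.
P = {(1, 2): 1j, (): 2}        # i*X1*X2 + 2
Q = {(2,): 1, (1, 1, 2): 3}    # X2 + 3*X1^2*X2

involutive = star(star(P)) == P
anti_multiplicative = star(multiply(P, Q)) == multiply(star(Q), star(P))
print(star(P), involutive, anti_multiplicative)
```

The second check, $(PQ)^*=Q^*P^*$, is the reason the word has to be reversed: conjugating coefficients alone would not be compatible with concatenation.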
This involution reverses words because it models the adjoint operation in an algebra. A proposed list of mixed moments cannot be arbitrary: if it is to come from a state, every formal square $P^*P$ must have nonnegative value. The theorem below gives the algebraic realization criterion obtained by imposing precisely this positivity condition.
[quotetheorem:7116]
[citeproof:7116]
The positivity criterion is the noncommutative analogue of positive semidefiniteness of moment matrices. Both hypotheses are forced: $\mu(1)=1$ records normalization, while positivity rules out the concrete unital assignment on $\mathbb C\langle X_1\rangle$ defined by $\mu(1)=1$, $\mu(X_1^2)=-1$, and $\mu(w)=0$ for every other nonempty word $w$. Taking $P=X_1$ gives $\mu(P^*P)=\mu(X_1^2)=-1$, so no positive state can realize this functional. The theorem does not impose traciality or boundedness, so it realizes algebraic laws that may not yet come from bounded operators or from traces. To pass from this algebraic realization to Hilbert-space constructions, we need quantitative control of the sesquilinear form $(P,Q)\mapsto \mu(P^*Q)$, and the next estimate supplies exactly that control.
[quotetheorem:432]
[citeproof:432]
Cauchy-Schwarz explains why the GNS construction naturally appears: the state defines an inner product after quotienting out the zero-length elements. Positivity is the indispensable hypothesis; a merely unital linear functional need not give any estimate of this kind. For instance, on $M_2(\mathbb C)$ the functional $\psi(T)=2T_{11}-T_{22}$ is complex-linear and satisfies $\psi(I)=1$, but $\psi(e_{22}^*e_{22})=-1$. Thus $\psi(b^*b)$ can be negative, so $\psi(a^*b)$ cannot be controlled by a positive seminorm built from $\psi(x^*x)$ in the Hilbert-space sense. The inequality does not say that multiplication operators are bounded, only that the state-induced inner product behaves like a Hilbert-space inner product on the quotient. This prepares the transition from formal moment functionals to tracial laws, where the same positivity interacts with cyclic symmetry.
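Both halves of this discussion can be probed numerically. The sketch below assumes the inequality has the usual form $|\varphi(a^*b)|^2\le\varphi(a^*a)\varphi(b^*b)$ and works with real $2\times 2$ matrices, so that the adjoint is the transpose and all state values are real; the helper `cs_gap` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def matmul(a, b):
    """Multiply two 2x2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(a):
    """Adjoint of a real matrix, i.e. its transpose."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def tr2(a):
    """Normalized trace: a positive state."""
    return Fraction(a[0][0] + a[1][1], 2)

def psi(a):
    """The unital but non-positive functional psi(T) = 2*T11 - T22."""
    return 2 * a[0][0] - a[1][1]

def cs_gap(state, a, b):
    """state(a*a) * state(b*b) - state(a*b)^2; nonnegative for a positive state."""
    return (state(matmul(adjoint(a), a)) * state(matmul(adjoint(b), b))
            - state(matmul(adjoint(a), b)) ** 2)

# All 2x2 matrices with entries in {-1, 0, 1}.
grid = [[[w, x], [y, z]] for w, x, y, z in product((-1, 0, 1), repeat=4)]
min_gap_trace = min(cs_gap(tr2, a, b) for a in grid for b in grid)

e11, e22 = [[1, 0], [0, 0]], [[0, 0], [0, 1]]
print(min_gap_trace, psi(matmul(adjoint(e22), e22)), cs_gap(psi, e22, e11))
```

For the trace the gap never goes negative on this grid, while for $\psi$ the pair $(e_{22},e_{11})$ already produces a negative value.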
## Tracial Laws and Cyclic Equivalence of Words
What extra identities appear when the state is tracial? The trace identity $\varphi(ab)=\varphi(ba)$ does not make variables commute, but it does make mixed moments invariant under cyclic rotation of the word being evaluated. Thus tracial laws live on words modulo cyclic equivalence, not on commutative monomials.
[definition: Tracial Joint Law]
Let $(\mathcal A,\varphi)$ be a tracial noncommutative probability space and let $a_1,\dots,a_d\in \mathcal A$. The tracial joint law of $(a_1,\dots,a_d)$ is the linear functional
\begin{align*}
\mu_{a_1,\dots,a_d}:\mathbb C\langle X_1,\dots,X_d\rangle\to\mathbb C
\end{align*}
defined by $\mu_{a_1,\dots,a_d}(1)=1$ and
\begin{align*}
\mu_{a_1,\dots,a_d}(X_{i_1}\cdots X_{i_n})=\varphi(a_{i_1}\cdots a_{i_n}),
\end{align*}
regarded together with the trace identity
\begin{align*}
\mu(PQ)=\mu(QP)
\end{align*}
for all $P,Q\in \mathbb C\langle X_1,\dots,X_d\rangle$.
[/definition]
This definition records a property rather than adding new data. To describe exactly which words a tracial law must identify, we next isolate cyclic equivalence as the word-level relation generated by moving an initial block to the end.
[definition: Cyclic Equivalence of Words]
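Two words $w=X_{i_1}\cdots X_{i_n}$ and $v=X_{j_1}\cdots X_{j_n}$ in $\mathbb C\langle X_1,\dots,X_d\rangle$ are cyclically equivalent if there is $k\in\{0,\dots,n-1\}$ such that $v=X_{i_{k+1}}\cdots X_{i_n}X_{i_1}\cdots X_{i_k}$, that is, if $v$ is obtained from $w$ by moving an initial block of letters to the end.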
[/definition]
Cyclic equivalence is weaker than equality in the commutative polynomial algebra. The question is exactly which word identifications are forced by the trace identity and which reorderings remain forbidden. The theorem below identifies cyclic rotations as the symmetry imposed by traciality, without collapsing the variables into commuting ones.
[quotetheorem:7117]
[citeproof:7117]
This theorem is often the only symmetry available. The tracial hypothesis is essential: on $M_2(\mathbb C)$, the nontracial state $\varphi(T)=T_{11}$ gives $\varphi(e_{12}e_{21})=1$ but $\varphi(e_{21}e_{12})=0$, so even a single cyclic rotation can change the moment. The theorem also does not permit arbitrary reordering of letters inside a word; it identifies $X_1X_2X_1X_2$ with $X_2X_1X_2X_1$, but not with $X_1X_1X_2X_2$ in general. The next example makes this distinction visible by comparing commuting variables with genuinely noncommuting matrices.
[example: Commuting and Noncommuting Pairs]
Let $x,y$ be commuting self-adjoint variables. If a word contains $m$ copies of $x$ and $n$ copies of $y$, then each adjacent occurrence $yx$ may be replaced by $xy$ because $xy=yx$; repeating this swap moves all copies of $x$ to the left and all copies of $y$ to the right. Hence that word evaluates to $x^m y^n$, and the joint law of the commuting pair is determined by the scalar moments $\varphi(x^m y^n)$.
For a concrete noncommuting self-adjoint pair, work in $M_2(\mathbb C)$ with normalized trace $\operatorname{tr}_2$ and set
\begin{align*}
a=e_{12}+e_{21}, \qquad b=e_{11}-e_{22}.
\end{align*}
Since $e_{ij}^*=e_{ji}$, we have $a^*=a$ and $b^*=b$. Using $e_{ij}e_{kl}=\delta_{jk}e_{il}$,
\begin{align*}
ab=(e_{12}+e_{21})(e_{11}-e_{22})=e_{12}e_{11}-e_{12}e_{22}+e_{21}e_{11}-e_{21}e_{22}=0-e_{12}+e_{21}-0=e_{21}-e_{12}.
\end{align*}
Thus
\begin{align*}
abab=(ab)(ab)=(e_{21}-e_{12})(e_{21}-e_{12})=e_{21}e_{21}-e_{21}e_{12}-e_{12}e_{21}+e_{12}e_{12}=0-e_{22}-e_{11}+0=-(e_{11}+e_{22}).
\end{align*}
Therefore
\begin{align*}
\operatorname{tr}_2(abab)=\operatorname{tr}_2(-(e_{11}+e_{22}))=-1.
\end{align*}
On the other hand,
\begin{align*}
a^2=(e_{12}+e_{21})(e_{12}+e_{21})=e_{12}e_{12}+e_{12}e_{21}+e_{21}e_{12}+e_{21}e_{21}=0+e_{11}+e_{22}+0=e_{11}+e_{22}.
\end{align*}
Also
\begin{align*}
b^2=(e_{11}-e_{22})(e_{11}-e_{22})=e_{11}e_{11}-e_{11}e_{22}-e_{22}e_{11}+e_{22}e_{22}=e_{11}-0-0+e_{22}=e_{11}+e_{22}.
\end{align*}
Hence
\begin{align*}
aabb=a^2b^2=(e_{11}+e_{22})(e_{11}+e_{22})=e_{11}+e_{22},
\end{align*}
so
\begin{align*}
\operatorname{tr}_2(aabb)=\operatorname{tr}_2(e_{11}+e_{22})=1.
\end{align*}
Thus the ordered moments $\operatorname{tr}_2(abab)$ and $\operatorname{tr}_2(aabb)$ are different, even though traciality still identifies cyclic rotations such as $abab$ and $baba$.
[/example]
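The same pair can be used to test cyclic invariance systematically. The sketch below enumerates every word of length at most six in the two matrices of the example and checks that all rotations of a word share one moment; the string encoding of words and the helper names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

def matmul(a, b):
    """Multiply two 2x2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

letters = {
    "a": [[0, 1], [1, 0]],   # e12 + e21
    "b": [[1, 0], [0, -1]],  # e11 - e22
}

def moment(word):
    """Normalized trace of the ordered product spelled out by the word."""
    prod = [[1, 0], [0, 1]]
    for ch in word:
        prod = matmul(prod, letters[ch])
    return Fraction(prod[0][0] + prod[1][1], 2)

def rotations(word):
    """All cyclic rotations of a word."""
    return [word[k:] + word[:k] for k in range(len(word))]

words = ["".join(t) for n in range(1, 7) for t in product("ab", repeat=n)]
cyclic_invariant = all(len({moment(r) for r in rotations(w)}) == 1 for w in words)
print(cyclic_invariant, moment("abab"), moment("baba"), moment("aabb"))
```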
The contrast in the example is a guiding principle for free probability. Freeness is not a statement about product measures on $\mathbb R^d$; it is a rule for evaluating alternating centered words in a noncommutative law.
## Compact Support and Bounded Operator Realization
When does an abstract positive law come from bounded operators rather than merely from algebraic variables? In one variable, compact support is the answer. In several variables, the matching condition is a family of growth bounds saying that multiplication by each coordinate variable should act as a bounded operator in the GNS representation.
[definition: Compactly Supported Noncommutative Law]
A positive unital functional $\mu:\mathbb C\langle X_1,\dots,X_d\rangle\to\mathbb C$ has compact support bounded by $R>0$ if for each $i\in\{1,\dots,d\}$ and every polynomial $P\in\mathbb C\langle X_1,\dots,X_d\rangle$,
\begin{align*}
\mu(P^*X_i^2P)\le R^2\mu(P^*P).
\end{align*}
[/definition]
This condition says that left multiplication by $X_i$ is bounded by $R$ for the seminorm induced by $\mu$. Positivity alone can build an algebraic GNS space, but it does not ensure that the coordinate variables act by bounded operators. The theorem below explains why the compact-support estimate is exactly the extra control needed to pass from formal laws to bounded self-adjoint operator models.
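To see the estimate in a familiar case, here is a one-variable sketch under an assumption not made in the definition: suppose $d=1$ and $\mu(P)=\int P(t)\,d\nu(t)$ for a probability measure $\nu$ supported in $[-R,R]$ (the symbol $\nu$ is introduced only for this illustration). Since $X_1^*=X_1$, the polynomial $P^*X_1^2P$ takes the value $t^2|P(t)|^2$ at a real point $t$, and therefore
\begin{align*}
\mu(P^*X_1^2P)=\int t^2|P(t)|^2\,d\nu(t)\le R^2\int |P(t)|^2\,d\nu(t)=R^2\mu(P^*P).
\end{align*}
In this case the defining inequality is just the pointwise bound $t^2\le R^2$ on the support of $\nu$.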
[quotetheorem:7118]
[citeproof:7118]
This theorem closes the circle between algebraic laws and operator models. Each hypothesis has a separate job: unitality normalizes the cyclic vector so that the law has total mass one, positivity turns algebraic expectations into Hilbert-space geometry through the GNS construction, and the compact-support estimate makes the coordinate multiplications bounded operators with norm at most $R$. The compact-support estimate is essential: in one variable, the Gaussian moment functional is positive and unital, but its even moments $(2k-1)!!$ eventually exceed $R^{2k}$ for every $R$, whereas a self-adjoint operator $a$ of norm at most $R$ satisfies $|\varphi(a^{2k})|\le R^{2k}$; hence no bounded operator model can have those moments. The theorem does not promise a finite-dimensional Hilbert space, nor does it add traciality unless the original functional already satisfies trace identities. Its conclusion is exactly what is needed for the operator-algebraic examples later in the course, where variables are bounded [self-adjoint operators](/page/Self-Adjoint%20Operators) and moments are computed from vector states or traces.
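The failure of the compact-support estimate for Gaussian moments can be located explicitly. Taking $P=X_1^{k-1}$ in the definition turns the estimate into $m_{2k}\le R^2m_{2k-2}$, and the sketch below finds the first $k$ at which the standard Gaussian moments $m_{2k}=(2k-1)!!$ violate it. The function names are hypothetical.

```python
def gaussian_even_moment(k):
    """Return m_{2k} = (2k-1)!! for the standard Gaussian (m_0 = 1)."""
    out = 1
    for j in range(1, k + 1):
        out *= 2 * j - 1
    return out

def first_violation(R):
    """Smallest k >= 1 with m_{2k} > R^2 * m_{2k-2}.

    This is where the compact-support estimate with P = X^(k-1) fails.
    """
    k = 1
    while gaussian_even_moment(k) <= R**2 * gaussian_even_moment(k - 1):
        k += 1
    return k

for R in (1, 2, 10):
    print(R, first_violation(R))
```

Because the ratio $m_{2k}/m_{2k-2}=2k-1$ is unbounded, the loop terminates for every $R$, which is the quantitative content of the remark about Gaussian moments.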
[remark: Algebraic Versus Operator-Valued Realization]
The positivity criterion realizes every positive unital law on the polynomial algebra as an algebraic noncommutative probability space. The bounded-operator realization requires the additional compact-support estimates. This distinction matters because many arguments in free probability are algebraic, while analytic arguments require boundedness or a specified operator topology. Later, freeness will be stated as a rule for mixed centered moments of such laws, free cumulants will reorganize the same moment data, and random matrix limits will be expressed as convergence of tracial joint laws.
[/remark]
With laws of noncommutative variables now characterized, Chapter 3 introduces the central notion for this course: freeness, which replaces classical independence by imposing a vanishing condition on centered mixed moments.
# 3. Classical Independence Versus Free Independence
This chapter introduces the independence notion that replaces tensor independence in a noncommutative probability space. In classical probability, independence is encoded by product measures and factorization of expectations; in free probability, the corresponding rule is instead a vanishing condition for alternating products of centered variables. The guiding question is how much mixed information is forced once the individual laws of several noncommuting variables are known.
We work throughout in a unital noncommutative probability space $(A,\varphi)$, using $A$ for the ambient algebra previously denoted $\mathcal A$; thus $A$ is a unital algebra over $\mathbb C$ and $\varphi:A\to\mathbb C$ is a unital linear functional. When $A$ is a $*$-algebra and $\varphi$ is positive or tracial, the same definitions apply, but the algebraic moment rules below do not require positivity.
## Tensor Independence and Product States
The first problem is to recall what classical independence says in algebraic language. If two probability spaces are combined classically, functions from the two spaces live in commuting tensor factors, and mixed expectations split across those factors. This gives a useful baseline for seeing what changes in the free setting.
[definition: Tensor Product Noncommutative Probability Space]
Let $(A_1,\varphi_1)$ and $(A_2,\varphi_2)$ be unital noncommutative probability spaces. Their [tensor product](/page/Tensor%20Product) noncommutative probability space is
\begin{align*}
(A_1\otimes A_2,\varphi_1\otimes\varphi_2),
\end{align*}
where $A_1\otimes A_2$ is the algebraic tensor product with unit $1_{A_1}\otimes 1_{A_2}$, and $\varphi_1\otimes\varphi_2$ is determined by
\begin{align*}
\varphi_1\otimes\varphi_2 &: A_1\otimes A_2\to\mathbb C, &
(\varphi_1\otimes\varphi_2)(a_1\otimes a_2)&=\varphi_1(a_1)\varphi_2(a_2).
\end{align*}
[/definition]
The tensor product construction gives an external model with two commuting copies. To use this inside a larger algebra, we need an internal condition saying that two subalgebras behave as those two tensor factors and that the ambient state restricts to the product state.
[definition: Tensor Independent Subalgebras]
Let $(A,\varphi)$ be a unital noncommutative probability space. Unital subalgebras $A_1,A_2\subset A$ are tensor independent if the multiplication map $A_1\otimes A_2\to A$ identifies the algebra generated by $A_1$ and $A_2$ with a tensor product and
\begin{align*}
\varphi(a_1a_2)=\varphi(a_1)\varphi(a_2)
\end{align*}
for all $a_1\in A_1$ and $a_2\in A_2$.
[/definition]
This definition hides the classical product-measure picture inside algebra. Without the tensor-product identification the displayed condition $\varphi(a_1a_2)=\varphi(a_1)\varphi(a_2)$ would control only words of length two, not longer mixed words such as $a_1b_1a_2b_2$. The next result records the familiar mixed-moment factorisation in the form we will compare with freeness.
[quotetheorem:7119]
[citeproof:7119]
The tensor-product identification is essential: if the two copies do not commute, the word $a_1b_1a_2b_2$ cannot be rearranged into an $A_1$-part and an $A_2$-part, so the proof has no way to form $(a_1a_2)\otimes(b_1b_2)$. The product-state hypothesis is also essential; on the same tensor product, a non-product linear functional can have $\varphi(a\otimes b)\neq \varphi(a\otimes 1)\varphi(1\otimes b)$ even for a two-letter word. The theorem does not say that all independence notions should factor mixed moments this way; rather, it identifies exactly the classical mechanism that freeness will abandon. Tensor independence therefore preserves the idea that different sources commute before expectations are taken, while free independence keeps the individual laws but replaces commutation and direct factorisation by a centering rule.
[example: Product State on Classical Random Variables]
Let $(\Omega_i,\mathcal F_i,\mathbb P_i)$, $i=1,2$, be probability spaces and let $X\in L^\infty(\Omega_1)$, $Y\in L^\infty(\Omega_2)$. Write $\widetilde X(\omega_1,\omega_2)=X(\omega_1)$ and $\widetilde Y(\omega_1,\omega_2)=Y(\omega_2)$ in $L^\infty(\Omega_1\times\Omega_2)$; boundedness ensures all powers below are integrable. For $m,n\ge 1$, the product is pointwise separated:
\begin{align*}
(\widetilde X^m\widetilde Y^n)(\omega_1,\omega_2)=\widetilde X(\omega_1,\omega_2)^m\widetilde Y(\omega_1,\omega_2)^n=X(\omega_1)^mY(\omega_2)^n.
\end{align*}
Therefore, by the defining product-integral rule for separated bounded functions,
\begin{align*}
\mathbb E[\widetilde X^m\widetilde Y^n]=\int_{\Omega_1\times\Omega_2}X(\omega_1)^mY(\omega_2)^n\,d(\mathbb P_1\otimes\mathbb P_2)=\left(\int_{\Omega_1}X^m\,d\mathbb P_1\right)\left(\int_{\Omega_2}Y^n\,d\mathbb P_2\right)=\mathbb E[X^m]\mathbb E[Y^n].
\end{align*}
The same calculation is the tensor-factor computation $\widetilde X=X\otimes 1$ and $\widetilde Y=1\otimes Y$, so all $X$-letters can be gathered into the first factor and all $Y$-letters into the second. If $X$ and $Y$ are replaced by noncommuting operators, the word $XYXY$ keeps its order: it equals $X(YX)Y$, while $X^2Y^2=XXYY$, and the passage from $X(YX)Y$ to $X(XY)Y$ requires $YX=XY$. Thus the classical factorisation works precisely because the two coordinate copies commute before the expectation is applied.
[/example]
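The contrast between commuting tensor copies and a single noncommuting copy can be checked in a small matrix model. The following sketch rests on assumptions not made in the text: it takes $A_1=A_2=M_2(\mathbb C)$ with the normalised trace as the state, so that the product state on $M_2\otimes M_2\cong M_4$ is again the normalised trace. The helper names `kron`, `tr`, `word` and the sample matrices `X`, `Y` are chosen only for this illustration.

```python
from fractions import Fraction
from functools import reduce

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker product, modelling the elementary tensor a (x) b."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def tr(a):
    """Normalised trace: a unital linear functional on n x n matrices."""
    return Fraction(sum(a[i][i] for i in range(len(a))), len(a))

def word(*letters):
    """Ordered product of the given matrices."""
    return reduce(matmul, letters)

I2 = [[1, 0], [0, 1]]
X = [[1, 2], [2, 0]]   # hypothetical sample matrices with XY != YX
Y = [[0, 1], [1, 3]]

X1 = kron(X, I2)       # X (x) 1 lives in the first tensor factor
Y1 = kron(I2, Y)       # 1 (x) Y lives in the second tensor factor

# The two tensor copies commute, so the mixed word factorises.
commute = matmul(X1, Y1) == matmul(Y1, X1)
tensor_moment = tr(word(X1, Y1, X1, Y1))
factorised = tr(word(X, X)) * tr(word(Y, Y))

# Inside a single copy of M_2 the letters keep their order.
ordered = tr(word(X, Y, X, Y))
rearranged = tr(word(X, X, Y, Y))
```

In this model `tensor_moment` equals `factorised`, while `ordered` and `rearranged` differ, which is the order dependence that the tensor-product rule cannot see.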
## Alternating Centred Products
The question now is how to define independence when the variables need not commute. Since a word such as $abab$ cannot be rearranged into an $a$-part and a $b$-part, the tensor-product rule no longer gives the right template. Free independence instead says that a mixed alternating word has zero expectation after each letter has been centred.
[definition: Centred Element]
Let $(A,\varphi)$ be a unital noncommutative probability space. An element $a\in A$ is centred if
\begin{align*}
\varphi(a)=0.
\end{align*}
For a general $a\in A$, its centred part is
\begin{align*}
a^\circ:=a-\varphi(a)1_A.
\end{align*}
[/definition]
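A minimal sketch of the centring operation, assuming the matrix model $(M_n(\mathbb C),\operatorname{tr})$ with normalised trace as the state; the function names `phi`, `centred_part`, `square` are hypothetical and used only here. It also records that the square of a centred element need not be centred.

```python
from fractions import Fraction

def phi(a):
    """Normalised trace on n x n matrices (assumed model of the state)."""
    n = len(a)
    return Fraction(sum(a[i][i] for i in range(n)), n)

def centred_part(a):
    """Return a - phi(a) * identity."""
    n, mean = len(a), phi(a)
    return [[a[i][j] - (mean if i == j else 0) for j in range(n)]
            for i in range(n)]

def square(a):
    """Matrix square a * a."""
    n = len(a)
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

a = [[1, 2], [3, 4]]
a_circ = centred_part(a)      # phi(a) = 5/2 is subtracted on the diagonal

z = [[1, 0], [0, -1]]         # already centred
z_squared = square(z)         # the identity matrix, which is not centred
```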
Centred elements are the algebraic version of mean-zero random variables, and the centred part separates scalar information from fluctuation. For noncommuting variables, factorisation of expectations is not the right independence rule because words retain their order. Freeness instead asks what remains after all scalar parts have been removed: alternating products of centred fluctuations from different subalgebras should have zero expectation.
[definition: Free Independence of Subalgebras]
Let $(A,\varphi)$ be a unital noncommutative probability space. A family $(A_i)_{i\in I}$ of unital subalgebras of $A$ is freely independent, or free, if whenever $n\ge 1$, $a_k\in A_{i_k}$, $\varphi(a_k)=0$, and adjacent indices are distinct,
\begin{align*}
i_1\neq i_2,\quad i_2\neq i_3,\quad \dots,\quad i_{n-1}\neq i_n,
\end{align*}
one has
\begin{align*}
\varphi(a_1a_2\cdots a_n)=0.
\end{align*}
[/definition]
The condition only asks for adjacent indices to differ, so the same algebra may reappear later in the word. This is why $A_1,A_2,A_1$ is a legal alternating pattern, while $A_1,A_1,A_2$ is not: adjacent letters from the same algebra must first be multiplied inside that algebra. The definition is deliberately stated only for centred letters; non-centred letters are handled by splitting off scalar parts.
[explanation: Centering Expansion]
Given a letter $a_k\in A_{i_k}$, write
\begin{align*}
a_k=\varphi(a_k)1_A+a_k^\circ.
\end{align*}
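Expanding a product by linearity expresses it as a sum of terms in which some letters are scalar and the remaining non-scalar letters are centred. Freeness kills every term whose remaining centred letters still form a nonempty alternating mixed word. In the other terms, removing scalar letters brings two letters from the same algebra next to each other; these are multiplied inside that algebra and the product is centred again, so the word becomes strictly shorter. The terms that survive to the end are products of moments internal to the individual subalgebras. Thus centering is not a separate theorem but the computational mechanism that turns the definition into mixed-moment calculations.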
[/explanation]
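As a worked instance of the expansion, consider the shortest word in which a scalar part brings two letters of the same algebra together. The symbols $a_1,a_2\in A_1$ and $b\in A_2$ are introduced here only for illustration, with $A_1,A_2$ free and no assumption on the expectations. Splitting the middle letter as $b=\varphi(b)1_A+b^\circ$ gives
\begin{align*}
\varphi(a_1ba_2)=\varphi(b)\varphi(a_1a_2)+\varphi(a_1b^\circ a_2).
\end{align*}
Expanding $a_1=\varphi(a_1)1_A+a_1^\circ$ and $a_2=\varphi(a_2)1_A+a_2^\circ$ in the second term gives
\begin{align*}
\varphi(a_1b^\circ a_2)=\varphi(a_1)\varphi(a_2)\varphi(b^\circ)+\varphi(a_1)\varphi(b^\circ a_2^\circ)+\varphi(a_2)\varphi(a_1^\circ b^\circ)+\varphi(a_1^\circ b^\circ a_2^\circ)=0,
\end{align*}
since the first summand contains $\varphi(b^\circ)=0$ and the other three are alternating centred words. Hence
\begin{align*}
\varphi(a_1ba_2)=\varphi(a_1a_2)\varphi(b).
\end{align*}
The surviving term is a product of moments internal to $A_1$ and $A_2$, and the letters $a_1,a_2$ have merged into the single letter $a_1a_2$.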
The point of the centering expansion is not only that certain moments vanish. It gives an algorithm for reducing mixed moments to moments internal to the separate subalgebras, provided the word can be decomposed into alternating centred pieces after scalar parts are removed. The next example is the smallest case where the rule acts directly, with no scalar terms surviving.
[example: First Alternating Vanishing]
Let $a,b\in A$ be centred elements such that the subalgebras $A_a=\operatorname{alg}(\{a\})$ and $A_b=\operatorname{alg}(\{b\})$ are free. Since $a\in A_a$, $b\in A_b$, $\varphi(a)=0$, and $\varphi(b)=0$, the four letters in
\begin{align*}
abab
\end{align*}
are centred and have colour pattern
\begin{align*}
A_a,\ A_b,\ A_a,\ A_b.
\end{align*}
Adjacent colours are distinct at every step: $A_a\neq A_b$, $A_b\neq A_a$, and $A_a\neq A_b$. Because $A_a$ and $A_b$ are free, the defining vanishing condition for alternating products of centred letters gives
\begin{align*}
\varphi(abab)=0.
\end{align*}
This is the first free analogue of a mixed centred classical moment, but it is an ordered statement: for noncommuting letters, the word $abab$ is not interchangeable with rearranged words such as $aabb$.
[/example]
## Freeness of Generated Algebras and Families of Variables
In applications we usually start with variables rather than subalgebras. The problem is to give a definition that depends on all polynomials in the variables, not merely on the variables themselves, because products such as $a^2ba$ involve $a^2$ as a single letter from the algebra generated by $a$.
[definition: Unital Algebra Generated by a Family]
Let $A$ be a unital algebra and let $S\subset A$. The unital subalgebra generated by $S$ is
\begin{align*}
\operatorname{alg}(S):=\bigcap\{B\subset A: B\text{ is a unital subalgebra and }S\subset B\}.
\end{align*}
[/definition]
For one variable $a$, this is the algebra of polynomials in $a$ and $1_A$; for several noncommuting variables, it consists of noncommutative polynomials in those variables. This motivates the variable-level definition of freeness, which tests the whole algebraic information carried by each variable or family.
[definition: Free Family of Variables]
Let $(A,\varphi)$ be a unital noncommutative probability space. A family of subsets $(S_i)_{i\in I}$ of $A$ is free if the unital subalgebras $(\operatorname{alg}(S_i))_{i\in I}$ are free. Elements $(a_i)_{i\in I}$ are free if the singleton-generated subalgebras $(\operatorname{alg}(\{a_i\}))_{i\in I}$ are free.
[/definition]
This convention prevents a common mistake: checking only words whose letters are the original variables is not enough. Read letter by letter, the expression $a^2ba$ is the word $a\cdot a\cdot b\cdot a$, whose first two letters come from the same algebra, so a test restricted to alternating words in the letters $a$ and $b$ says nothing about it. The correct reading treats $a^2$ as a single letter of the algebra generated by $a$, which makes the word alternating. Thus powers and polynomials must be admitted before applying the alternating-centred test.
[explanation: Why Generated Algebras Are Tested]
The phrase "the variables are free" is shorthand for a condition on all polynomial information carried by those variables. For a single element $a$, the relevant algebra is made of polynomials in $a$ and $1_A$; for a family $S_i$, it is the unital algebra $\operatorname{alg}(S_i)$. Therefore any polynomial expression in one family is treated as a single letter from that family when checking alternating centred products.
[/explanation]
The generated-algebra viewpoint matters even in short computations because centering is performed after forming the polynomial letter, not before. In the word $a^2ba$, the letter $a^2$ belongs to the same generated algebra as $a$, but its expectation need not vanish even when $a$ is centred. The next example shows how this extra centering step changes the computation.
[example: Computing a Moment with a Power]
Let $a,b\in A$ be centred free variables, so $\varphi(a)=\varphi(b)=0$, and set $\alpha_2=\varphi(a^2)$. The letter $a^2$ belongs to $\operatorname{alg}(\{a\})$, but it need not be centred, so first split it into its scalar and centred parts:
\begin{align*}
a^2=\alpha_2 1_A+(a^2-\alpha_2 1_A).
\end{align*}
The second summand is centred because $\varphi$ is unital:
\begin{align*}
\varphi(a^2-\alpha_2 1_A)=\varphi(a^2)-\alpha_2\varphi(1_A)=\alpha_2-\alpha_2=0.
\end{align*}
Substituting this decomposition into the word gives
\begin{align*}
\varphi(a^2ba)=\varphi((\alpha_2 1_A+(a^2-\alpha_2 1_A))ba).
\end{align*}
Expanding the product inside the expectation gives
\begin{align*}
\varphi(a^2ba)=\varphi(\alpha_2 ba+(a^2-\alpha_2 1_A)ba).
\end{align*}
By linearity and scalar homogeneity,
\begin{align*}
\varphi(a^2ba)=\alpha_2\varphi(ba)+\varphi((a^2-\alpha_2 1_A)ba).
\end{align*}
The word $ba$ is an alternating product of centred letters from $\operatorname{alg}(\{b\})$ and $\operatorname{alg}(\{a\})$, so freeness gives
\begin{align*}
\varphi(ba)=0.
\end{align*}
Also, $a^2-\alpha_2 1_A\in\operatorname{alg}(\{a\})$, $b\in\operatorname{alg}(\{b\})$, and $a\in\operatorname{alg}(\{a\})$ are centred, with alternating colour pattern $A_a,A_b,A_a$. Freeness therefore gives
\begin{align*}
\varphi((a^2-\alpha_2 1_A)ba)=0.
\end{align*}
Hence
\begin{align*}
\varphi(a^2ba)=\alpha_2\cdot 0+0=0.
\end{align*}
This computation shows why powers must be centred as whole polynomial letters before applying the alternating-word rule.
[/example]
## Recursive Mixed-Moment Computations
The central computational question is whether freeness gives all mixed moments once the separate laws are known. It does: a mixed word is expanded by centering one or more maximal blocks from the same algebra, and each expansion either vanishes by alternation or reduces to shorter mixed words multiplied by scalar marginal moments.
[quotetheorem:7120]
[citeproof:7120]
The hypothesis that the $A_i$ are free is what supplies the vanishing step in the recursion; without it, a centred alternating term may contain new mixed information. For example, in an arbitrary noncommutative probability space two centred variables can have $\varphi(ab)\neq 0$, so the marginal laws of $a$ and $b$ do not determine the mixed moment. The theorem also does not claim a tensor-product factorisation: the moment of $abab$ depends on the order of the letters, not only on the separate lists $a,a$ and $b,b$. Its role is algorithmic, and the same recursive viewpoint will be reorganised in the following chapters by noncrossing partitions and free cumulants.
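The recursion can be sketched in code. The following is a minimal illustration and not the proof's own bookkeeping: it assumes each subalgebra is generated by a single variable, represents a letter as a pair `(colour, coefficients)` holding the coefficient tuple of a polynomial in that variable, and takes the marginal laws as hypothetical functions `moments[colour](k)` returning $\varphi(x^k)$. For an alternating word it uses the freeness identity $\varphi((a_1-\varphi(a_1)1_A)\cdots(a_n-\varphi(a_n)1_A))=0$, expanded and solved for $\varphi(a_1\cdots a_n)$.

```python
from fractions import Fraction
from itertools import combinations

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient tuples (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return tuple(out)

def merge(word):
    """Multiply neighbouring letters of the same colour into a single letter."""
    merged = []
    for colour, poly in word:
        if merged and merged[-1][0] == colour:
            merged[-1] = (colour, poly_mul(merged[-1][1], poly))
        else:
            merged.append((colour, poly))
    return merged

def free_moment(word, moments):
    """Expectation of a word in free variables, computed from marginal moments."""
    word = merge(word)                      # now adjacent colours differ
    if not word:
        return 1                            # phi(1_A) = 1
    def marginal(colour, poly):
        return sum(c * moments[colour](k) for k, c in enumerate(poly))
    if len(word) == 1:
        return marginal(*word[0])
    n = len(word)
    means = [marginal(colour, poly) for colour, poly in word]
    total = 0
    # Replace the letters at positions S by their scalar parts; each such
    # term is a product of means times the moment of a strictly shorter word.
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            coeff = (-1) ** (r + 1)
            for k in S:
                coeff *= means[k]
            if coeff == 0:
                continue
            rest = [word[k] for k in range(n) if k not in S]
            total += coeff * free_moment(rest, moments)
    return total

# Usage: free projections with phi(p) = s and phi(q) = t (sample values).
s, t = Fraction(1, 2), Fraction(1, 3)
proj = {"p": lambda k: 1 if k == 0 else s, "q": lambda k: 1 if k == 0 else t}
x = (0, 1)                                  # the polynomial "x" itself
pqpq = free_moment([("p", x), ("q", x), ("p", x), ("q", x)], proj)
```

The enumeration over subsets is exponential in the word length, so this is a check for short words, not an efficient algorithm.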
[example: Computing a Longer Centred Word]
Let $a,b\in A$ be centred free variables, and set $\beta_2=\varphi(b^2)$. The letter $b^2$ lies in $\operatorname{alg}(\{b\})$, but it need not be centred, so write
\begin{align*}
b^2=\beta_2 1_A+(b^2-\beta_2 1_A).
\end{align*}
The second summand is centred because $\varphi$ is unital:
\begin{align*}
\varphi(b^2-\beta_2 1_A)=\varphi(b^2)-\beta_2\varphi(1_A)=\beta_2-\beta_2=0.
\end{align*}
Substituting this into the word gives
\begin{align*}
ab^2ab=a(\beta_2 1_A+(b^2-\beta_2 1_A))ab.
\end{align*}
Distributing the product and using that $\beta_2$ is a scalar,
\begin{align*}
a(\beta_2 1_A+(b^2-\beta_2 1_A))ab=\beta_2 a^2b+a(b^2-\beta_2 1_A)ab.
\end{align*}
Applying linearity and scalar homogeneity of $\varphi$,
\begin{align*}
\varphi(ab^2ab)=\beta_2\varphi(a^2b)+\varphi(a(b^2-\beta_2 1_A)ab).
\end{align*}
It remains to evaluate the two terms. Set $\alpha_2=\varphi(a^2)$ and decompose
\begin{align*}
a^2=\alpha_2 1_A+(a^2-\alpha_2 1_A).
\end{align*}
Since $\varphi(a^2-\alpha_2 1_A)=\alpha_2-\alpha_2=0$, the second summand is centred. Hence
\begin{align*}
\varphi(a^2b)=\varphi((\alpha_2 1_A+(a^2-\alpha_2 1_A))b).
\end{align*}
Expanding and using linearity,
\begin{align*}
\varphi(a^2b)=\alpha_2\varphi(b)+\varphi((a^2-\alpha_2 1_A)b).
\end{align*}
The first summand is $\alpha_2\cdot 0$ because $b$ is centred. The second summand is the expectation of an alternating product of centred letters from $\operatorname{alg}(\{a\})$ and $\operatorname{alg}(\{b\})$, so freeness gives
\begin{align*}
\varphi((a^2-\alpha_2 1_A)b)=0.
\end{align*}
Therefore
\begin{align*}
\varphi(a^2b)=\alpha_2\cdot 0+0=0.
\end{align*}
For the remaining term, the four letters
\begin{align*}
a,\quad b^2-\beta_2 1_A,\quad a,\quad b
\end{align*}
are centred and belong respectively to
\begin{align*}
\operatorname{alg}(\{a\}),\quad \operatorname{alg}(\{b\}),\quad \operatorname{alg}(\{a\}),\quad \operatorname{alg}(\{b\}).
\end{align*}
Their colours alternate, so freeness gives
\begin{align*}
\varphi(a(b^2-\beta_2 1_A)ab)=0.
\end{align*}
Combining the two evaluations,
\begin{align*}
\varphi(ab^2ab)=\beta_2\cdot 0+0=0.
\end{align*}
This example shows that the scalar part of $b^2$ creates a shorter mixed moment, and that shorter moment still vanishes only after centering the polynomial letter $a^2$.
[/example]
The moments computed above vanish because the words remain alternating after centering. Nonzero mixed moments appear when the centering expansion leaves terms in which every letter has been absorbed into scalar moments.
[example: Free Projections with Prescribed Traces]
Let $p,q\in A$ be free projections, so $p^2=p$ and $q^2=q$, with $\varphi(p)=s$ and $\varphi(q)=t$. Put
\begin{align*}
p^\circ=p-s1_A
\end{align*}
and
\begin{align*}
q^\circ=q-t1_A.
\end{align*}
These are centred because $\varphi(1_A)=1$, so
\begin{align*}
\varphi(p^\circ)=\varphi(p)-s\varphi(1_A)=s-s=0
\end{align*}
and
\begin{align*}
\varphi(q^\circ)=\varphi(q)-t\varphi(1_A)=t-t=0.
\end{align*}
For the two-letter moment,
\begin{align*}
pq=(s1_A+p^\circ)(t1_A+q^\circ)=st1_A+s q^\circ+t p^\circ+p^\circ q^\circ.
\end{align*}
Applying $\varphi$ and using linearity gives
\begin{align*}
\varphi(pq)=st+s\varphi(q^\circ)+t\varphi(p^\circ)+\varphi(p^\circ q^\circ).
\end{align*}
The middle two terms vanish because $p^\circ$ and $q^\circ$ are centred, and the last term vanishes by freeness because $p^\circ,q^\circ$ are alternating centred letters from $\operatorname{alg}(\{p\})$ and $\operatorname{alg}(\{q\})$. Hence
\begin{align*}
\varphi(pq)=st.
\end{align*}
Now compute the longer word $pqpq$. First expand only the two $q$-letters:
\begin{align*}
pqpq=p(t1_A+q^\circ)p(t1_A+q^\circ).
\end{align*}
Distributing the product gives
\begin{align*}
pqpq=t^2p^2+tp^2q^\circ+tpq^\circ p+pq^\circ p q^\circ.
\end{align*}
Since $p^2=p$,
\begin{align*}
\varphi(pqpq)=t^2\varphi(p)+t\varphi(pq^\circ)+t\varphi(pq^\circ p)+\varphi(pq^\circ p q^\circ).
\end{align*}
The first term is $t^2s$. For the second term, expand $p=s1_A+p^\circ$:
\begin{align*}
\varphi(pq^\circ)=\varphi((s1_A+p^\circ)q^\circ)=s\varphi(q^\circ)+\varphi(p^\circ q^\circ)=0+0=0.
\end{align*}
For the third term,
\begin{align*}
pq^\circ p=(s1_A+p^\circ)q^\circ(s1_A+p^\circ).
\end{align*}
Distributing gives
\begin{align*}
pq^\circ p=s^2q^\circ+s p^\circ q^\circ+s q^\circ p^\circ+p^\circ q^\circ p^\circ.
\end{align*}
After applying $\varphi$, every term is an alternating product of centred letters, or a single centred letter, so
\begin{align*}
\varphi(pq^\circ p)=0.
\end{align*}
For the final term,
\begin{align*}
pq^\circ p q^\circ=(s1_A+p^\circ)q^\circ(s1_A+p^\circ)q^\circ.
\end{align*}
Distributing gives
\begin{align*}
pq^\circ p q^\circ=s^2(q^\circ)^2+s p^\circ(q^\circ)^2+s q^\circ p^\circ q^\circ+p^\circ q^\circ p^\circ q^\circ.
\end{align*}
The last two terms vanish by freeness. For $p^\circ(q^\circ)^2$, decompose $(q^\circ)^2$ into its scalar and centred parts; since $p^\circ$ is centred and lies in the $p$-algebra, both resulting terms have expectation $0$. Thus
\begin{align*}
\varphi(pq^\circ p q^\circ)=s^2\varphi((q^\circ)^2).
\end{align*}
Using $q^2=q$,
\begin{align*}
(q^\circ)^2=(q-t1_A)^2=q^2-2tq+t^2 1_A=q-2tq+t^2 1_A.
\end{align*}
Therefore
\begin{align*}
\varphi((q^\circ)^2)=\varphi(q)-2t\varphi(q)+t^2\varphi(1_A)=t-2t^2+t^2=t-t^2.
\end{align*}
Combining the evaluated terms,
\begin{align*}
\varphi(pqpq)=t^2s+t\cdot 0+t\cdot 0+s^2(t-t^2).
\end{align*}
Hence
\begin{align*}
\varphi(pqpq)=st^2+s^2t-s^2t^2.
\end{align*}
The mixed moment is forced by the two traces $s$ and $t$, even though the projections need not commute.
[/example]
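The same expansion, carried out for arbitrary letters $a_1,a_2$ from one algebra and $b_1,b_2$ from a second algebra free from the first (symbols introduced here only for comparison), gives the standard four-letter formula
\begin{align*}
\varphi(a_1b_1a_2b_2)=\varphi(a_1a_2)\varphi(b_1)\varphi(b_2)+\varphi(a_1)\varphi(a_2)\varphi(b_1b_2)-\varphi(a_1)\varphi(a_2)\varphi(b_1)\varphi(b_2).
\end{align*}
Setting $a_1=a_2=p$ and $b_1=b_2=q$, and using $\varphi(p^2)=\varphi(p)=s$ and $\varphi(q^2)=\varphi(q)=t$, recovers
\begin{align*}
\varphi(pqpq)=s\,t^2+s^2\,t-s^2t^2,
\end{align*}
in agreement with the projection computation. As a sanity check, for $q=1_A$ one has $t=1$ and the formula returns $s=\varphi(p)$, as it must.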
## Pairwise Freeness and Joint Freeness
The last issue in this chapter is logical rather than computational. In classical probability, pairwise independence does not imply mutual independence; the same warning applies to freeness. The definition of joint freeness tests alternating words involving all subalgebras at once.
[definition: Pairwise Freeness]
A family $(A_i)_{i\in I}$ of unital subalgebras of $(A,\varphi)$ is pairwise free if for every pair of distinct indices $i,j\in I$, the two-subalgebra family $(A_i,A_j)$ is free.
[/definition]
Pairwise freeness only sees words using two colours at a time, while joint freeness also tests words with three or more colours. The obstruction is that a three-colour alternating moment can be invisible to every two-colour restriction. The theorem below records this failure of pairwise tests to certify full freeness.
[quotetheorem:7121]
[citeproof:7121]
The construction shows that pairwise freeness controls only the restrictions of the state to two-colour subalgebras. It leaves room for a three-colour mixed moment such as $\varphi(x_1x_2x_3)$ to be chosen independently in a purely algebraic noncommutative probability space. Positivity or traciality would impose extra consistency constraints, so the example is a warning about the definition in its algebraic form rather than a model of every operator-algebraic situation. The limitation is exactly why later cumulant criteria must test all mixed cumulants among the whole family, not only those involving two indices.
[example: Algebraic Three-Colour Obstruction]
In the construction from the proof, take $a_i=x_i\in A_i=\operatorname{alg}(\{x_i\})$ for $i=1,2,3$. Since the word $x_i$ involves only one symbol, the definition of $\varphi$ agrees with $\psi$ on this word, so
\begin{align*}
\varphi(a_i)=\varphi(x_i)=\psi(x_i)=0.
\end{align*}
Thus $a_1,a_2,a_3$ are centred.
Now fix two distinct indices $i\neq j$. Every word built from $x_i$ and $x_j$ involves at most two of the symbols $x_1,x_2,x_3$, so by construction
\begin{align*}
\varphi(w)=\psi(w)
\end{align*}
for every $w\in\mathbb C\langle x_i,x_j\rangle$. The state $\psi$ was chosen so that the one-variable algebras generated by the $x_i$ are free under $\psi$, hence the same alternating centred two-colour products have expectation $0$ under $\varphi$. Therefore each pair $A_i,A_j$ is free.
For the three-colour word, the letters $a_1,a_2,a_3$ are centred and belong respectively to $A_1,A_2,A_3$, with adjacent colours distinct:
\begin{align*}
1\neq 2,\qquad 2\neq 3.
\end{align*}
But the defining assignment of $\varphi$ gives
\begin{align*}
\varphi(a_1a_2a_3)=\varphi(x_1x_2x_3)=1.
\end{align*}
If $A_1,A_2,A_3$ were jointly free, this alternating centred three-colour moment would have to be $0$. Since it is $1$, the family is pairwise free but not jointly free.
[/example]
Free independence is therefore a condition on the full pattern of alternating centred products. It differs from tensor independence both algebraically and computationally: variables need not commute, mixed moments need not factor into products, yet all mixed moments are still determined by the individual laws once freeness is imposed. This principle is the entry point for Chapter 4 on noncrossing partitions and Chapter 5 on free cumulants.
Chapter 3 showed that freeness constrains the alternating structure of centered words; Chapter 4 makes this constraint explicit through the noncrossing partition lattice, the combinatorial framework that indexes and encodes these structures.
# 4. Noncrossing Partitions
This chapter introduces the combinatorial object that replaces ordinary set partitions in free probability, and with it the combinatorial side of the control that freeness exerts on alternating centered words: how moments are expanded, how cumulants are recovered, and why the relevant indexing objects are planar rather than arbitrary. The prerequisites are elementary set partitions, finite partially ordered sets, and the idea that a word $a_1\cdots a_n$ remembers the order of its letters. Classical independence is governed by all partitions of a finite set, while free independence is governed by the noncrossing ones, so we develop the order structure of noncrossing partitions, the Kreweras complement, Catalan enumeration, and the Möbius function that later enters moment-cumulant inversion.
## The Problem of Crossing Blocks
The moment-cumulant formula for classical probability sums over all set partitions because the variables commute and no planar order remains. In a noncommutative word $a_1\cdots a_n$, the positions $1,\dots,n$ have a cyclic and linear order, so partitions that force incompatible interlacings should be separated from partitions that respect the order of the word.
[definition: Partition Of A Finite Set]
Let $S$ be a finite set. A partition $\pi$ of $S$ is a finite collection of nonempty, pairwise disjoint subsets $V_1,\dots,V_k \subset S$ such that
\begin{align*} S = V_1 \cup \cdots \cup V_k. \end{align*}
The subsets $V_i$ are called the blocks of $\pi$.
[/definition]
We write $\mathcal P(n)$ for the set of all partitions of $[n] := \{1,\dots,n\}$. The two extreme partitions are $0_n := \{\{1\},\dots,\{n\}\}$ and $1_n := \{\{1,\dots,n\}\}$. To see why a restriction is needed, it helps to compare the first few partition sets before any freeness condition is imposed.
[example: Partitions Of Three Points]
A partition of $[3]=\{1,2,3\}$ is determined by which elements are placed in the same block. There is one partition into three singleton blocks,
\begin{align*} \{\{1\},\{2\},\{3\}\}. \end{align*}
There are three partitions with one two-element block and one singleton block, because the two-element block can be $\{1,2\}$, $\{1,3\}$, or $\{2,3\}$:
\begin{align*} \{\{1,2\},\{3\}\},\quad \{\{1,3\},\{2\}\},\quad \{\{2,3\},\{1\}\}. \end{align*}
There is one partition into a single block,
\begin{align*} \{\{1,2,3\}\}. \end{align*}
Thus $[3]$ has $1+3+1=5$ partitions.
None of these partitions has a crossing in the sense made precise in the definitions below: a crossing requires four distinct positions $p_1<q_1<p_2<q_2$, but $[3]$ contains only three positions, so no such quadruple exists. Hence every partition of $[3]$ lies in the set $NC(3)$ of noncrossing partitions, and the first possible difference between all partitions and noncrossing partitions occurs at $n=4$.
[/example]
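The count can be reproduced by brute force. A minimal sketch, assuming partitions are represented as lists of blocks and using the hypothetical helper name `set_partitions`: each partition of $[n]$ arises from a partition of $[n-1]$ by placing $n$ into an existing block or into a new singleton block.

```python
from collections import Counter

def set_partitions(n):
    """Yield every partition of {1, ..., n} as a list of blocks."""
    if n == 0:
        yield []                     # the unique partition of the empty set
        return
    for smaller in set_partitions(n - 1):
        # Place n into one of the existing blocks ...
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [n]] + smaller[i + 1:]
        # ... or into a new singleton block.
        yield smaller + [[n]]

partitions_3 = list(set_partitions(3))

# Group the partitions of [3] by their block sizes.
types_3 = Counter(tuple(sorted(len(V) for V in pi)) for pi in partitions_3)

# Number of partitions of [n] for n = 1, ..., 5.
counts = [sum(1 for _ in set_partitions(n)) for n in range(1, 6)]
```

Here `types_3` reproduces the split $1+3+1$ from the example, and `counts` lists the number of partitions of $[n]$ for small $n$.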
The example shows that small partitions do not yet expose the obstruction. To define the obstruction, place the points $1,\dots,n$ in increasing order on a line or around a circle and connect elements in the same block by internal arcs whenever possible. This motivates the next definition, which isolates the interlacing pattern that prevents a planar drawing.
[definition: Crossing Partition]
A partition $\pi \in \mathcal P(n)$ is crossing if there exist $1 \le p_1 < q_1 < p_2 < q_2 \le n$ such that $p_1,p_2$ lie in one block of $\pi$ and $q_1,q_2$ lie in another block of $\pi$.
[/definition]
Crossing partitions are exactly the partitions incompatible with planar nesting. The complementary class is therefore the class that can index planar expansions, and this is the class that appears in the [Speicher Moment-Cumulant Formula](/theorems/7128). This motivates the following definition of the planar part of $\mathcal P(n)$.
[definition: Noncrossing Partition]
A partition $\pi \in \mathcal P(n)$ is noncrossing if it is not crossing. The set of all noncrossing partitions of $[n]$ is denoted $NC(n)$.
[/definition]
The definition is combinatorial, but its force is geometric: the blocks can be drawn inside a disc with boundary points $1,\dots,n$ in order, and the convex hulls of distinct blocks are pairwise disjoint. The first case where this removes a partition is the four-point case.
[example: Noncrossing And Crossing Partitions Of Four Points]
For $n=4$, first separate the partitions of $[4]$ by block sizes. There is one partition of type $1+1+1+1$,
\begin{align*} \{\{1\},\{2\},\{3\},\{4\}\}. \end{align*}
There are $\binom{4}{2}=6$ partitions of type $2+1+1$, since the two-element block can be chosen in six ways. There are four partitions of type $3+1$, since the singleton can be chosen in four ways, and there is one partition of type $4$,
\begin{align*} \{\{1,2,3,4\}\}. \end{align*}
The only remaining block type is $2+2$. Its three partitions are
\begin{align*} \{\{1,2\},\{3,4\}\},\quad \{\{1,3\},\{2,4\}\},\quad \{\{1,4\},\{2,3\}\}. \end{align*}
The partition $\{\{1,3\},\{2,4\}\}$ is crossing because $1<2<3<4$, with $1,3$ in one block and $2,4$ in another block. The other two pairings are noncrossing: in $\{\{1,2\},\{3,4\}\}$ the two pairs are disjoint adjacent intervals, and in $\{\{1,4\},\{2,3\}\}$ the pair $\{2,3\}$ is nested inside the pair $\{1,4\}$.
No partition of $[4]$ with a singleton block can be crossing: a crossing requires two distinct blocks each contributing two elements, which already uses all four points. The one-block partition $\{\{1,2,3,4\}\}$ is not crossing either, since a crossing needs two blocks. Thus the only crossing partition of $[4]$ is $\{\{1,3\},\{2,4\}\}$. Since the total number of partitions is
\begin{align*} 1+6+3+4+1=15, \end{align*}
the number of noncrossing partitions is
\begin{align*} 15-1=14. \end{align*}
This is the first value of $n$ where $\mathcal P(n)$ and $NC(n)$ differ.
[/example]
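The crossing test is easy to automate. The sketch below assumes the list-of-blocks representation and the hypothetical helper names `set_partitions` and `is_crossing`; it checks the definition directly by searching for positions $p_1<q_1<p_2<q_2$ in two distinct blocks.

```python
from itertools import combinations

def set_partitions(n):
    """Yield every partition of {1, ..., n} as a list of blocks."""
    if n == 0:
        yield []
        return
    for smaller in set_partitions(n - 1):
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] + [n]] + smaller[i + 1:]
        yield smaller + [[n]]

def is_crossing(partition):
    """True if two distinct blocks interlace as p1 < q1 < p2 < q2."""
    for V, W in combinations(partition, 2):
        for p1, p2 in combinations(sorted(V), 2):
            for q1, q2 in combinations(sorted(W), 2):
                # Either block may supply the outer pair of the pattern.
                if p1 < q1 < p2 < q2 or q1 < p1 < q2 < p2:
                    return True
    return False

crossing_4 = [pi for pi in set_partitions(4) if is_crossing(pi)]

# Sizes of NC(n) for n = 1, ..., 6.
nc_counts = [sum(1 for pi in set_partitions(n) if not is_crossing(pi))
             for n in range(1, 7)]
```

For $n=4$ the list `crossing_4` contains only $\{\{1,3\},\{2,4\}\}$, and `nc_counts` begins $1,2,5,14$, matching the count in the example.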
The next structural question is whether noncrossing partitions behave well under the refinement operations needed for inversion formulas. This requires an order, and then meet and join operations.
## Refinement Order And Interval Blocks
A cumulant formula sums over partitions and then inverts that sum, so the relevant partition set must carry an order with usable intervals. The order is refinement: a partition is smaller when it remembers more distinctions between points.
[definition: Refinement Order]
Let $\pi,\sigma \in \mathcal P(n)$. We write $\pi \le \sigma$ if every block of $\pi$ is contained in a block of $\sigma$.
[/definition]
Thus $0_n \le \pi \le 1_n$ for every $\pi \in \mathcal P(n)$. The same order restricts to $NC(n)$, and this restriction is the order used throughout free cumulant theory. A concrete comparison fixes the direction of the order.
[example: Refinement In Four Points]
Let $\pi = \{\{1,2\},\{3\},\{4\}\}$ and $\sigma = \{\{1,2,4\},\{3\}\}$. To check $\pi \le \sigma$, inspect each block of $\pi$:
\begin{align*} \{1,2\}\subseteq \{1,2,4\}. \end{align*}
\begin{align*} \{3\}\subseteq \{3\}. \end{align*}
\begin{align*} \{4\}\subseteq \{1,2,4\}. \end{align*}
Thus every block of $\pi$ is contained in a block of $\sigma$, so $\pi \le \sigma$ by the definition of refinement order.
Both partitions are noncrossing. In $\pi$, only one block has two elements, namely $\{1,2\}$, so there cannot be two distinct blocks each contributing two positions $p_1,p_2$ and $q_1,q_2$. In $\sigma$, the block $\{3\}$ is a singleton, so again there is only one block with at least two elements. Hence neither partition contains a crossing pattern $p_1<q_1<p_2<q_2$.
[/example]
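The block-by-block check is a one-line test in code. A minimal sketch, assuming partitions are lists of blocks and using the hypothetical name `refines` for the relation $\pi\le\sigma$:

```python
def refines(pi, sigma):
    """True if every block of pi is contained in some block of sigma."""
    return all(any(set(V) <= set(W) for W in sigma) for V in pi)

pi = [[1, 2], [3], [4]]
sigma = [[1, 2, 4], [3]]

pi_below_sigma = refines(pi, sigma)     # the comparison from the example
sigma_below_pi = refines(sigma, pi)     # fails: {1, 2, 4} fits in no block of pi

# The extreme partitions bound everything in between.
zero_4 = [[1], [2], [3], [4]]
one_4 = [[1, 2, 3, 4]]
bounded = refines(zero_4, pi) and refines(sigma, one_4)
```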
Refinement tells us how partitions compare, but inductive arguments need a way to reduce a noncrossing partition to a smaller one. The reliable removable feature is not that every non-singleton block contains neighbouring labels: for instance $\{\{1,4\},\{2\},\{3\},\{5\},\{6\}\}\in NC(6)$ has no adjacent pair inside its non-singleton block. What noncrossing guarantees instead is an interval block, possibly a singleton, which can be cut away without disturbing the planar order of the remaining points.
[definition: Interval Block]
Let $\pi\in NC(n)$. A block $V$ of $\pi$ is an interval block if, after a cyclic rotation of $[n]$, it has the form
\begin{align*} V=\{p,p+1,\dots,q\}. \end{align*}
Singleton blocks are interval blocks.
[/definition]
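The cyclic condition in the definition can be tested without performing any rotation. The sketch below, with hypothetical helper names, uses the observation that a proper nonempty subset of $[n]$ is a cyclic interval exactly when it has a single point whose cyclic successor lies outside the subset.

```python
def is_interval_block(block, n):
    """True if `block` is an interval of {1, ..., n} after a cyclic rotation."""
    members = set(block)
    # Count the points of the block whose cyclic successor leaves the block.
    exits = sum(1 for v in members if (v % n) + 1 not in members)
    return exits <= 1        # zero exits only when the block is all of [n]

def interval_blocks(partition, n):
    """List the interval blocks of a partition of {1, ..., n}."""
    return [V for V in partition if is_interval_block(V, n)]

# {1, 4} wraps around the circle in [4], so it counts as an interval.
wraps = is_interval_block([1, 4], 4)

in_nc4 = interval_blocks([[1, 3], [2], [4]], 4)
in_nc6 = interval_blocks([[1, 4], [2], [3], [5], [6]], 6)
in_crossing = interval_blocks([[1, 3], [2, 4]], 4)
```

In the two noncrossing examples only the singletons are interval blocks, while the crossing partition $\{\{1,3\},\{2,4\}\}$ has none.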
Interval blocks are the correct recursive handle for arbitrary noncrossing partitions. An induction cannot rely on adjacent pairs, because non-singleton blocks need not contain adjacent labels. What noncrossing still guarantees is a removable boundary interval, which is the local feature supplied by the theorem below.
[quotetheorem:7122]
[proofunderconstruction:7122]
This theorem requires the noncrossing hypothesis: the crossing partition $\{\{1,3\},\{2,4\}\}$ has no interval block. It also does not say that every block is an interval: in $\{\{1,3\},\{2\},\{4\}\}$ the block $\{1,3\}$ is not an interval even after a cyclic rotation, since the points $2$ and $4$ separate $1$ from $3$ on both sides of the circle. The point is more precise: at least one block sits on the boundary in a way that can be removed without tearing the planar nesting of the remaining blocks. This proper-interval conclusion is the local deletion mechanism behind inductive proofs on $NC(n)$, including recursive counting and arguments that peel off a block before studying the smaller complementary intervals. Outside noncrossing partitions, a block can weave through the rest of the partition so that no such deletion is available, which is exactly why ordinary set partitions do not support the same planar recursions. For Möbius inversion, however, local deletion is not enough; we also need global order structure, namely best common refinements and best common coarsenings inside $NC(n)$. This motivates the meet and join terminology.
[definition: Meet And Join]
Let $P$ be a partially ordered set and let $x,y \in P$. The meet $x \wedge y$ is the greatest element of $P$ below both $x$ and $y$, when such an element exists. The join $x \vee y$ is the least element of $P$ above both $x$ and $y$, when such an element exists.
[/definition]
For all set partitions, the meet is obtained by intersecting blocks, while the join is the [transitive closure](/theorems/1493) of belonging to a common block. The issue is whether these best common bounds remain available after imposing noncrossing. This motivates the lattice theorem, which says that the planar restriction is stable enough for the later algebra.
[quotetheorem:7123]
[citeproof:7123]
The noncrossing hypothesis is essential for this statement as a theorem about $NC(n)$ rather than $\mathcal P(n)$: after planar partitions are selected, it is no longer automatic that a classical join remains planar. For instance, in $NC(4)$ the partitions $\pi=\{\{1,3\},\{2\},\{4\}\}$ and $\sigma=\{\{2,4\},\{1\},\{3\}\}$ are both noncrossing, but their ordinary join in $\mathcal P(4)$ is $\{\{1,3\},\{2,4\}\}$, which is crossing. Inside $NC(4)$ the least common upper bound must therefore coarsen further, namely to $1_4$. The theorem does not identify the join with the ordinary partition join; the noncrossing join may be coarser because it must stay inside $NC(n)$. Its main consequence for the course is that intervals $[\pi,\sigma] := \{\rho \in NC(n): \pi \le \rho \le \sigma\}$ are finite posets with enough structure for Mobius inversion. Before inversion, we need the enumeration and a symmetry of $NC(n)$ that has no classical analogue.
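The four-point join computation can be reproduced by exhaustive search. The sketch below is illustrative only and assumes partitions are lists of blocks; the helper names (`set_partitions`, `is_noncrossing`, `refines`, `nc_join`) are hypothetical, and the brute-force search is meant for small $n$.

```python
def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            # put `first` into an existing block
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # or let `first` start a new block
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def refines(pi, sigma):
    """pi <= sigma: every block of pi lies inside some block of sigma."""
    return all(any(set(V) <= set(W) for W in sigma) for V in pi)


def nc_join(pi, sigma, n):
    """Least upper bound of pi and sigma inside NC(n), by exhaustive search."""
    uppers = [rho for rho in set_partitions(list(range(1, n + 1)))
              if is_noncrossing(rho) and refines(pi, rho) and refines(sigma, rho)]
    # the join is the upper bound that refines every other upper bound
    least = [rho for rho in uppers if all(refines(rho, tau) for tau in uppers)]
    return sorted(sorted(V) for V in least[0])


pi = [[1, 3], [2], [4]]
sigma = [[2, 4], [1], [3]]
print(is_noncrossing([[1, 3], [2, 4]]))   # False: the ordinary join is crossing
print(nc_join(pi, sigma, 4))              # [[1, 2, 3, 4]]
```

The search confirms that the only noncrossing partition above both $\pi$ and $\sigma$ is $1_4$.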
## Kreweras Complement And Catalan Enumeration
The number of all set partitions is the Bell number, but free probability repeatedly produces Catalan numbers instead. The Catalan count comes from recursively cutting a noncrossing partition at the block containing $1$.
We use the convention that $NC(0)$ consists of the unique partition of the empty set. This convention is needed when a recursive decomposition produces an empty interval between consecutive elements of a block.
[quotetheorem:7124]
[citeproof:7124]
The noncrossing condition is what makes this recursive decomposition possible: the crossing pairing $\{\{1,3\},\{2,4\}\}$ cannot be decomposed into independent regions between the elements of the block containing $1$. The theorem counts partitions but does not classify them by block sizes or refinement intervals; those finer enumerations require Narayana numbers and interval decompositions. The formula gives $|NC(4)|=14$, matching the list above, and explains why Catalan objects appear throughout free probability: planar nesting, rather than arbitrary grouping, is the combinatorial structure of free independence. The first values make the departure from Bell numbers visible.
[example: Catalan Values]
Using the formula from *[Catalan Enumeration Of Noncrossing Partitions](/theorems/7124)*,
\begin{align*} |NC(n)|=\frac{1}{n+1}\binom{2n}{n}. \end{align*}
Thus
\begin{align*} |NC(0)|=\frac{1}{1}\binom{0}{0}=1. \end{align*}
\begin{align*} |NC(1)|=\frac{1}{2}\binom{2}{1}=\frac{2}{2}=1. \end{align*}
\begin{align*} |NC(2)|=\frac{1}{3}\binom{4}{2}=\frac{6}{3}=2. \end{align*}
\begin{align*} |NC(3)|=\frac{1}{4}\binom{6}{3}=\frac{20}{4}=5. \end{align*}
\begin{align*} |NC(4)|=\frac{1}{5}\binom{8}{4}=\frac{70}{5}=14. \end{align*}
For $n=3$, a crossing would require four distinct positions $p_1<q_1<p_2<q_2$ in $[3]$, but $[3]$ has only three elements. Hence every partition of $[3]$ is noncrossing, so the Catalan count $|NC(3)|=5$ agrees with the Bell count $|\mathcal P(3)|=5$.
For $n=4$, the total number of set partitions is
\begin{align*} 1+6+3+4+1=15, \end{align*}
corresponding respectively to block types $1+1+1+1$, $2+1+1$, $2+2$, $3+1$, and $4$. Among the three $2+2$ pairings,
\begin{align*} \{\{1,2\},\{3,4\}\},\quad \{\{1,3\},\{2,4\}\},\quad \{\{1,4\},\{2,3\}\}, \end{align*}
only $\{\{1,3\},\{2,4\}\}$ is crossing, because $1<2<3<4$ with $1,3$ in one block and $2,4$ in the other. Therefore
\begin{align*} |NC(4)|=15-1=14. \end{align*}
This is the first place where Catalan counting differs from Bell counting.
[/example]
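The comparison between Bell and Catalan counts can be checked by direct enumeration. This is a small sketch under the assumption that partitions are represented as lists of blocks; the function names are hypothetical and the enumeration is only practical for small $n$.

```python
from math import comb


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []          # the unique partition of the empty set
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def counts(n):
    """Return (number of set partitions of [n], number of noncrossing ones)."""
    parts = list(set_partitions(list(range(1, n + 1))))
    return len(parts), sum(1 for p in parts if is_noncrossing(p))


def catalan(n):
    return comb(2 * n, n) // (n + 1)


for n in range(6):
    bell, nc = counts(n)
    print(n, bell, nc, catalan(n))   # the last two columns agree for each n
```

The first two columns of counts agree up to $n=3$ and separate at $n=4$, where the enumeration finds $15$ set partitions and $14$ noncrossing ones.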
Counting tells us how many planar partitions exist, but later multiplicative formulas need a way to measure the unused planar space around a partition. Without such a complement, even a small calculation like $\pi=\{\{1,2\},\{3\}\}$ gives no canonical way to say which barred positions can still be joined without crossing the original arc. The Kreweras complement records exactly this remaining planar freedom. It is best understood by interlacing two copies of $[n]$ around a circle.
The key point is that the complement is not an arbitrary set-theoretic complement of blocks. It must remember the cyclic order of the original word: a barred block is allowed only when it can be drawn in the regions left by the unbarred partition. Thus the construction packages a geometric operation, filling the unused faces of a planar diagram, into a partition of another copy of $[n]$.
The following definition makes this remaining planar freedom canonical by asking for the coarsest compatible barred partition in the interlaced drawing. The word "coarsest" is important: if two barred points can be joined without crossing the unbarred diagram, then maximal planar freedom requires them to be joined. The resulting object depends on the interlaced cyclic order, so the definition fixes that order before taking the largest compatible barred partition.
[definition: Kreweras Complement]
For $n\ge1$, the Kreweras complement is the map
\begin{align*} K:NC(n)\to NC(n). \end{align*}
Given $\pi \in NC(n)$, place the points
\begin{align*} 1,\bar{1},2,\bar{2},\dots,n,\bar{n} \end{align*}
around a circle in this order. The partition $K(\pi)$ is obtained by taking the largest partition of $\{\bar{1},\dots,\bar{n}\}$, in refinement order, such that $\pi \cup K(\pi)$ is noncrossing on the interlaced set, and then identifying $\bar{i}$ with $i$.
[/definition]
The adjective largest refers to refinement order: $K(\pi)$ has blocks as large as possible while preserving noncrossing with $\pi$. This convention uses the displayed interlacing order; reversing the cyclic convention shifts the labelled answer, so examples must be computed relative to this order. The smallest cases show how the complement swaps occupied and available regions.
[example: Kreweras Complements For Three Points]
For $n=3$, take $\pi=\{\{1,2\},\{3\}\}$ and place the six points in the cyclic order
\begin{align*} 1,\bar{1},2,\bar{2},3,\bar{3}. \end{align*}
A barred block containing both $\bar{1}$ and $\bar{2}$ would cross the unbarred block $\{1,2\}$, because the four boundary points occur in the alternating order
\begin{align*} 1<\bar{1}<2<\bar{2}. \end{align*}
Likewise, a barred block containing both $\bar{1}$ and $\bar{3}$ would cross $\{1,2\}$, since
\begin{align*} 1<\bar{1}<2<\bar{3}. \end{align*}
Thus $\bar{1}$ must remain a singleton in any compatible barred partition.
The barred points $\bar{2}$ and $\bar{3}$ may be placed in the same block: the only unbarred block with two elements is $\{1,2\}$, and the endpoints do not alternate with $\bar{2},\bar{3}$ because the cyclic order is
\begin{align*} 1<2<\bar{2}<\bar{3} \end{align*}
after ignoring the intervening singleton points. Hence the largest compatible barred partition is
\begin{align*} \{\{\bar{1}\},\{\bar{2},\bar{3}\}\}. \end{align*}
Identifying $\bar{i}$ with $i$ gives
\begin{align*} K(\pi)=\{\{1\},\{2,3\}\}. \end{align*}
For $\pi=0_3$, all unbarred blocks are singletons, so no unbarred block can supply two endpoints of a crossing pattern; maximality therefore gives $K(0_3)=1_3$. For $\pi=1_3$, the unbarred block is $\{1,2,3\}$, and each possible barred pair alternates with two of its points:
\begin{align*} 1<\bar{1}<2<\bar{2}, \end{align*}
\begin{align*} 1<\bar{1}<2<\bar{3}, \end{align*}
\begin{align*} 2<\bar{2}<3<\bar{3}. \end{align*}
No two barred points can be joined, so $K(1_3)=0_3$.
[/example]
The three-point computation illustrates the rule at the first nontrivial size. At four points, a second non-singleton block makes the geometry more informative because the complement must fit around two existing arcs at once.
[example: Kreweras Complement For Four Points]
Let $\pi=\{\{1,2\},\{3,4\}\}\in NC(4)$, and place the eight points in the cyclic order
\begin{align*} 1,\bar{1},2,\bar{2},3,\bar{3},4,\bar{4}. \end{align*}
To compute $K(\pi)$, we must find the largest barred partition compatible with the two unbarred blocks $\{1,2\}$ and $\{3,4\}$.
First, $\bar{1}$ cannot be joined to any of $\bar{2},\bar{3},\bar{4}$. Indeed, the relevant cyclic orders contain the alternating patterns
\begin{align*} 1<\bar{1}<2<\bar{2}, \end{align*}
\begin{align*} 1<\bar{1}<2<\bar{3}, \end{align*}
and
\begin{align*} 1<\bar{1}<2<\bar{4}, \end{align*}
so each possible pair involving $\bar{1}$ crosses the unbarred block $\{1,2\}$.
Next, $\bar{2}$ cannot be joined to $\bar{3}$, because
\begin{align*} \bar{2}<3<\bar{3}<4 \end{align*}
is an alternating pattern with the unbarred block $\{3,4\}$. Similarly, $\bar{3}$ cannot be joined to $\bar{4}$, since
\begin{align*} 3<\bar{3}<4<\bar{4}. \end{align*}
The remaining possible nontrivial barred pair is $\{\bar{2},\bar{4}\}$. It does not cross $\{1,2\}$, because the cyclic order of the four relevant points is
\begin{align*} 1<2<\bar{2}<\bar{4}, \end{align*}
and it does not cross $\{3,4\}$, because the relevant cyclic order is
\begin{align*} \bar{2}<3<4<\bar{4}. \end{align*}
Thus the largest compatible barred partition is
\begin{align*} \{\{\bar{1}\},\{\bar{2},\bar{4}\},\{\bar{3}\}\}. \end{align*}
Identifying $\bar{i}$ with $i$ gives
\begin{align*} K(\pi)=\{\{1\},\{2,4\},\{3\}\}. \end{align*}
This example shows that Kreweras complementation can turn a two-block partition into a three-block partition by recording the remaining planar regions rather than preserving block count.
[/example]
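The complements computed in these examples can be verified directly from the definition. The sketch below is a brute-force illustration, not an efficient algorithm: it assumes the displayed interlacing convention, encodes label $i$ at position $2i-1$ and $\bar{i}$ at position $2i$, and uses hypothetical helper names.

```python
def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def kreweras(pi, n):
    """Coarsest barred partition whose union with pi is noncrossing.

    Positions 1, 2, ..., 2n encode the cyclic order 1, 1bar, 2, 2bar, ..., n, nbar.
    """
    unbarred = [[2 * i - 1 for i in V] for V in pi]
    best = None
    for barred in set_partitions(list(range(1, n + 1))):
        combined = unbarred + [[2 * i for i in V] for V in barred]
        # among compatible barred partitions, the largest has the fewest blocks
        if is_noncrossing(combined) and (best is None or len(barred) < len(best)):
            best = barred
    return sorted(sorted(V) for V in best)


print(kreweras([[1, 2], [3]], 3))        # [[1], [2, 3]]
print(kreweras([[1, 2], [3, 4]], 4))     # [[1], [2, 4], [3]]
```

Selecting the compatible partition with the fewest blocks is equivalent to selecting the largest one in refinement order, since every compatible partition refines the largest one. In these small cases the block counts of $\pi$ and $K(\pi)$ add up to $n+1$.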
The examples suggest two systematic properties: larger original blocks leave less complementary freedom, and the total number of original plus complementary blocks is fixed. This motivates the next theorem, which turns the complement from a collection of drawings into an order-reversing map with a precise block-count identity.
[quotetheorem:7125]
[proofunderconstruction:7125]
The refinement hypothesis is necessary for the order statement: without $\pi\le\sigma$, the complements need not be comparable. For example, in $NC(4)$ let
\begin{align*}
\pi &= \{\{1,2\},\{3\},\{4\}\}, &
\sigma &= \{\{1\},\{2,3\},\{4\}\}.
\end{align*}
Then $\pi$ and $\sigma$ are not comparable. Computing in the displayed interlacing convention gives
\begin{align*}
K(\pi) &= \{\{1\},\{2,3,4\}\}, &
K(\sigma) &= \{\{1,3,4\},\{2\}\}.
\end{align*}
Neither complement refines the other: the block $\{2,3,4\}$ of $K(\pi)$ is not contained in a block of $K(\sigma)$, while the block $\{1,3,4\}$ of $K(\sigma)$ is not contained in a block of $K(\pi)$. The theorem does not say that $K$ is an order-preserving symmetry; it reverses refinement and changes the number of blocks according to the displayed identity. This is exactly the bookkeeping later needed when products indexed by blocks of $\pi$ are paired with products indexed by the complementary regions. Similar planar-duality ideas also appear in polygon dissections, cluster combinatorics, and the Temperley-Lieb diagram calculus, where noncrossing configurations encode algebraic multiplication through planar regions. We now turn from counting partitions to inverting sums indexed by them, which is the algebraic mechanism behind free cumulants.
## Incidence Algebra And The Mobius Function
Moment-cumulant relations have the form of triangular sums over intervals in a finite poset. Without an inverse operation, a formula such as $G(y)=\sum_{x\le y}F(x)$ only computes $G$ from $F$ (moments from cumulants, in the application) and gives no explicit way to recover $F$ from $G$. The correct language for inverting such sums is the incidence algebra of that poset.
[definition: Incidence Algebra]
Let $P$ be a finite poset. The incidence algebra is
\begin{align*} I(P)=\{f:\{(x,y)\in P^2:x\le y\}\to\mathbb C\}. \end{align*}
Convolution is the map $I(P)\times I(P)\to I(P)$ defined by
\begin{align*} (f*g)(x,z) = \sum_{x \le y \le z} f(x,y)g(y,z). \end{align*}
[/definition]
The identity element is the delta function $\delta(x,y)$, equal to $1$ if $x=y$ and $0$ otherwise. To invert triangular sums, we need the function that represents summing over every element of an interval and its convolution inverse.
[definition: Zeta And Mobius Functions]
Let $P$ be a finite poset. The zeta function $\zeta \in I(P)$ is defined by
\begin{align*}
\zeta:\{(x,y)\in P^2:x\le y\}&\to\mathbb C, &
\zeta(x,y)&=1.
\end{align*}
The Mobius function $\mu \in I(P)$ is the convolution inverse of $\zeta$.
[/definition]
Equivalently, $\mu$ is determined recursively by $\mu(x,x)=1$ and, for $x<z$, by
\begin{align*} \sum_{x \le y \le z} \mu(x,y)=0. \end{align*}
This recursion makes Mobius inversion a finite triangular computation. Moment-cumulant formulas have the form of poset-indexed sums: moments are assembled by summing cumulant contributions over an interval of partitions. The obstruction is that the desired cumulant is hidden inside this triangular sum. Mobius inversion is the finite-poset mechanism that isolates the unknown coefficient from all lower contributions.
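The recursion for $\mu$ translates directly into a short program. The following sketch is illustrative: the poset of divisors of $12$ under divisibility is an assumption chosen for familiarity, not a poset used elsewhere in these notes, and the names `mobius_function`, `F`, `G` are hypothetical.

```python
from functools import lru_cache


def mobius_function(elements, leq):
    """Return mu(x, y) for a finite poset given by `elements` and the test `leq`."""

    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1
        if not leq(x, y):
            raise ValueError("mu(x, y) is only defined for x <= y")
        # the sum of mu(x, t) over x <= t <= y vanishes; solve for the term t = y
        return -sum(mu(x, t) for t in elements
                    if leq(x, t) and leq(t, y) and t != y)

    return mu


# Illustrative poset (an assumption of this sketch): divisors of 12.
divisors = [1, 2, 3, 4, 6, 12]


def divides(a, b):
    return b % a == 0


mu = mobius_function(divisors, divides)

# Triangular summation G(y) = sum_{x <= y} F(x), then inversion with mu.
F = {1: 5, 2: -1, 3: 7, 4: 0, 6: 2, 12: 3}          # arbitrary test data
G = {y: sum(F[x] for x in divisors if divides(x, y)) for y in divisors}
recovered = {y: sum(G[x] * mu(x, y) for x in divisors if divides(x, y))
             for y in divisors}

print(mu(1, 2), mu(1, 6), mu(1, 12))   # -1 1 0
print(recovered == F)                  # True
```

Each call to `mu(x, y)` only needs values `mu(x, t)` with `t` strictly below `y`, which is the triangular structure that makes the computation finite.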
[quotetheorem:7126]
[proofunderconstruction:7126]
The finiteness hypothesis matters: for infinite posets, the same convolution may involve infinite sums and would require convergence or local finiteness assumptions. For example, in the poset $\mathbb N\cup\{\infty\}$ ordered by $1\le2\le3\le\cdots\le\infty$, the interval $[1,\infty]$ is infinite, so the convolution sum defining $(\zeta*\zeta)(1,\infty)$ contains one term for every element of that interval. The theorem does not compute $\mu$; it only says that once $\mu$ is known, the triangular summation can be inverted. Mobius inversion therefore reduces the problem of recovering cumulants to knowing the Mobius function of the relevant poset. For $NC(n)$, the intervals have a product decomposition, and the Mobius function factors accordingly. Since the closed formula is expressed in Catalan numbers indexed by block sizes, this motivates isolating the Catalan notation used in the formula.
[definition: Catalan Number]
For $m\ge0$, the $m$th Catalan number is
\begin{align*} C_m := \frac{1}{m+1}\binom{2m}{m}. \end{align*}
[/definition]
The Catalan numbers counted all of $NC(m)$; the same numbers also measure the signed contribution of a block of size $m+1$ in the Mobius function. The next result is stated in the form used later for free cumulants, where blockwise factorization is the main computational feature.
[quotetheorem:7127]
This is used here as a statement-only combinatorial reference. The proof of the interval decomposition is not part of this course note; what matters for the sequel is the factorization pattern it gives for Mobius inversion.
The theorem depends on working in noncrossing partition lattices; for the lattice of all set partitions the analogous value is $\mu(0_n,1_n)=(-1)^{n-1}(n-1)!$, with factorials in place of Catalan numbers. The factors $NC(m_r)$ should be read as the independent planar regions that remain between the lower partition $\pi$ and the upper partition $\sigma$. Short computations show how the signs and Catalan coefficients enter.
[example: Simple Mobius Values]
For $n=2$, the interval $[0_2,1_2]$ consists of $0_2$ and $1_2$. The Mobius recursion on this interval gives
\begin{align*} \mu(0_2,0_2)+\mu(0_2,1_2)=0. \end{align*}
Since $\mu(0_2,0_2)=1$, this becomes
\begin{align*} 1+\mu(0_2,1_2)=0, \end{align*}
so
\begin{align*} \mu(0_2,1_2)=-1. \end{align*}
The block formula gives the same value: the single block of $1_2$ has size $2$, and
\begin{align*} C_1=\frac{1}{2}\binom{2}{1}=\frac{2}{2}=1. \end{align*}
Therefore
\begin{align*} (-1)^{2-1}C_1=(-1)^1\cdot 1=-1. \end{align*}
For $n=3$ and $1_3=\{\{1,2,3\}\}$, the single block has size $3$. Also
\begin{align*} C_2=\frac{1}{3}\binom{4}{2}=\frac{6}{3}=2. \end{align*}
Thus the formula gives
\begin{align*} \mu(0_3,1_3)=(-1)^{3-1}C_{3-1}=(-1)^2C_2=1\cdot 2=2. \end{align*}
If $\pi=\{\{1,2\},\{3\}\}$, then its blocks have sizes $2$ and $1$. For the block $\{1,2\}$ the factor is
\begin{align*} (-1)^{2-1}C_{2-1}=(-1)^1C_1=-1\cdot 1=-1. \end{align*}
For the singleton block $\{3\}$, we use
\begin{align*} C_0=\frac{1}{1}\binom{0}{0}=1, \end{align*}
so its factor is
\begin{align*} (-1)^{1-1}C_{1-1}=(-1)^0C_0=1\cdot 1=1. \end{align*}
Multiplying the two block factors gives
\begin{align*} \mu(0_3,\pi)=(-1)\cdot 1=-1. \end{align*}
These computations show that singleton blocks contribute a neutral factor, while each two-element block contributes a factor of $-1$.
[/example]
The preceding computations cover one block and a block together with a singleton. A four-point example shows the multiplicative nature of the formula when two non-singleton blocks are present.
[example: Mobius Value For A Four Point Partition]
Let $\pi=\{\{1,4\},\{2,3\}\}\in NC(4)$. Its two blocks both have size $2$, so the blockwise Mobius formula gives one factor for $\{1,4\}$ and one factor for $\{2,3\}$:
\begin{align*} \mu(0_4,\pi)=\bigl((-1)^{2-1}C_{2-1}\bigr)\bigl((-1)^{2-1}C_{2-1}\bigr). \end{align*}
Since
\begin{align*} C_1=\frac{1}{2}\binom{2}{1}=\frac{2}{2}=1, \end{align*}
each block contributes
\begin{align*} (-1)^{2-1}C_{2-1}=(-1)^1C_1=-1\cdot 1=-1. \end{align*}
Therefore
\begin{align*} \mu(0_4,\pi)=(-1)(-1)=1. \end{align*}
For $1_4=\{\{1,2,3,4\}\}$, there is one block, and its size is $4$. The same formula gives
\begin{align*} \mu(0_4,1_4)=(-1)^{4-1}C_{4-1}=(-1)^3C_3. \end{align*}
Now
\begin{align*} C_3=\frac{1}{4}\binom{6}{3}=\frac{20}{4}=5, \end{align*}
so
\begin{align*} \mu(0_4,1_4)=(-1)^3C_3=-1\cdot 5=-5. \end{align*}
These signs and Catalan coefficients are the constants that later appear when moments are inverted to obtain free cumulants.
[/example]
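The values in the two preceding examples can be cross-checked by running the Mobius recursion on $NC(n)$ and comparing with the blockwise Catalan product. This is a sketch for small $n$ with hypothetical helper names; partitions are stored as sorted tuples of sorted tuples so that they can be cached.

```python
from functools import lru_cache
from math import comb


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def canon(partition):
    """Hashable canonical form of a partition."""
    return tuple(sorted(tuple(sorted(V)) for V in partition))


def refines(pi, sigma):
    return all(any(set(V) <= set(W) for W in sigma) for V in pi)


def nc_mobius(n):
    """Return the list NC(n) and its Mobius function computed by recursion."""
    NC = [canon(p) for p in set_partitions(list(range(1, n + 1)))
          if is_noncrossing(p)]

    @lru_cache(maxsize=None)
    def mu(pi, sigma):
        if pi == sigma:
            return 1
        return -sum(mu(pi, rho) for rho in NC
                    if rho != sigma and refines(pi, rho) and refines(rho, sigma))

    return NC, mu


def block_formula(pi):
    """Product over blocks V of (-1)^(|V|-1) * C_(|V|-1)."""
    value = 1
    for V in pi:
        m = len(V) - 1
        value *= (-1) ** m * (comb(2 * m, m) // (m + 1))
    return value


NC4, mu4 = nc_mobius(4)
zero4 = canon([[1], [2], [3], [4]])
print(mu4(zero4, canon([[1, 4], [2, 3]])))    # 1
print(mu4(zero4, canon([[1, 2, 3, 4]])))      # -5
```

For every $\pi\in NC(4)$ the recursive value of $\mu(0_4,\pi)$ agrees with the blockwise product, which is the factorization pattern used in the examples.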
The chapter has built the finite combinatorial infrastructure needed for free cumulants: the lattice $NC(n)$, the Catalan count, the Kreweras complement, and the Mobius function. Chapter 5 uses $NC(n)$, in place of the lattice of all set partitions, as the index set of the moment-cumulant formula. It constructs free cumulants by Möbius inversion over this lattice and proves that the vanishing of mixed cumulants characterizes free independence.
# 5. Free Cumulants
Working inside a noncommutative probability space $(\mathcal A,\varphi)$, this chapter uses the noncrossing partition lattice from Chapter 4 to develop free cumulants. The course assumes familiarity with basic linear algebra, unital algebras, states, and ordinary moments. The guiding question is how independence should be encoded when random variables do not commute, and the answer is built through moment expansions, Möbius inversion, and alternating centered products.
Free cumulants are the coordinates in which free independence becomes a combinatorial vanishing condition. The previous chapter introduced noncrossing partitions as the replacement for all set partitions in a noncommutative setting. This chapter turns that lattice into a calculus: moments are expanded over noncrossing partitions, cumulants are recovered by Möbius inversion, and the first low-order identities show how centering and freeness are detected in computations.
## Moments Indexed by Noncrossing Partitions
The problem is to separate the information in a moment $\varphi(a_1\cdots a_n)$ into contributions coming from different patterns of grouping among the variables. In classical probability this is done by summing over all set partitions. Free probability uses the same idea, but only noncrossing partitions are compatible with the recursive interval structure of free independence.
[definition: Block Moment Functional]
Let $(\mathcal A, \varphi)$ be a noncommutative probability space, let $n \ge 1$, and fix $\pi \in NC(n)$ with blocks $V_1,\dots,V_r$. The block moment functional associated to $\pi$ is the map
\begin{align*}
\varphi_\pi: \mathcal A^n \to \mathbb C
\end{align*}
defined by
\begin{align*}
\varphi_\pi[a_1,\dots,a_n] = \prod_{j=1}^r \varphi\left(\prod_{i \in V_j}^{\nearrow} a_i\right),
\end{align*}
where $\prod^{\nearrow}$ means that the indices in the block are multiplied in increasing order.
[/definition]
This notation records the moments attached to each block while preserving the order of the variables inside a block. For example, if $\pi = \{\{1,3\},\{2\}\} \in NC(3)$, then $\varphi_\pi[a_1,a_2,a_3] = \varphi(a_1a_3)\varphi(a_2)$.
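A concrete model makes the notation tangible. The sketch below assumes, purely for illustration, that $\mathcal A$ is the algebra of $2\times 2$ matrices and $\varphi$ is the normalized trace; the matrices `a1`, `a2`, `a3` and the function names are hypothetical choices, not objects from the notes.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def phi(A):
    """Normalized trace on d x d matrices (the model assumed in this sketch)."""
    return Fraction(sum(A[i][i] for i in range(len(A))), len(A))


def block_moment(partition, variables):
    """phi_pi[a_1, ..., a_n]: one factor per block, indices multiplied in increasing order."""
    d = len(variables[0])
    value = Fraction(1)
    for block in partition:
        product = [[int(i == j) for j in range(d)] for i in range(d)]  # identity
        for index in sorted(block):
            product = matmul(product, variables[index - 1])
        value *= phi(product)
    return value


a1 = [[0, 1], [0, 0]]
a2 = [[1, 0], [0, 3]]
a3 = [[0, 0], [1, 0]]

# pi = {{1,3},{2}} gives phi(a1 a3) * phi(a2) = (1/2) * 2
print(block_moment([[1, 3], [2]], [a1, a2, a3]))    # 1
# the one-block partition gives the ordinary moment phi(a1 a2 a3)
print(block_moment([[1, 2, 3]], [a1, a2, a3]))      # 3/2
```

The two values differ, which shows that $\varphi_\pi$ depends on $\pi$ and not only on the list of variables.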
Block moments describe how a partition breaks a word into smaller moment factors. To isolate the new information at order $n$, we need functionals whose products over blocks rebuild the moment, with the one-block term carrying the genuinely $n$-fold contribution. This is the role of free cumulants.
[definition: Free Cumulant Functionals]
A family $(\kappa_n)_{n \ge 1}$ of multilinear maps $\kappa_n: \mathcal A^n \to \mathbb C$ is called the family of free cumulant functionals for $(\mathcal A,\varphi)$ if, for every $n \ge 1$ and every $a_1,\dots,a_n \in \mathcal A$,
\begin{align*}
\varphi(a_1\cdots a_n) = \sum_{\pi \in NC(n)} \kappa_\pi[a_1,\dots,a_n],
\end{align*}
where for $\pi = \{V_1,\dots,V_r\}$,
\begin{align*}
\kappa_\pi[a_1,\dots,a_n] = \prod_{j=1}^r \kappa_{|V_j|}\left(a_{i_1},\dots,a_{i_{|V_j|}}\right),
\end{align*}
and $V_j = \{i_1<\dots<i_{|V_j|}\}$.
[/definition]
The definition is recursive in practice: the partition with one block contains $\kappa_n(a_1,\dots,a_n)$, while all other partitions use cumulants of smaller order. We therefore need to know that these recursive requirements can actually be met for every order.
[quotetheorem:7128]
[proofunderconstruction:7128]
This formula says that moments and free cumulants contain the same information, but organised differently. The noncommutative probability space hypothesis is essential because the state $\varphi$ supplies the moments, while the noncrossing partition lattice supplies the recursive bookkeeping compatible with ordered products. The theorem does not assert positivity, boundedness, or any analytic convergence property for cumulants; it is an algebraic finite-order reconstruction. Its limitation is also its strength: at each fixed order it isolates the indecomposable noncrossing contribution, but distributional information still has to be read through the full sequence of all orders.
A concrete warning appears already in order four. If the crossing pairing $\{\{1,3\},\{2,4\}\}$ is added to the sum, a centered variable with second free cumulant $r_2 \ne 0$ would satisfy the modified identity $m_4=r_4+3r_2^2$ rather than the free identity $m_4=r_4+2r_2^2$. For a semicircular variable, where $r_4=0$ and $r_2=1$, the free formula gives $m_4=2$, while the crossing-enhanced formula gives $3$. Thus replacing $NC(4)$ by all partitions changes the law being described.
[example: Recovering Moments Through Order Four]
Let $a \in \mathcal A$ and write $m_n=\varphi(a^n)$ and $r_n=\kappa_n(a,\dots,a)$. Applying the *Speicher Moment-Cumulant Formula* to the word $(a,\dots,a)$ means that each block of size $j$ contributes a factor $r_j$, so $m_n$ is obtained by summing the block-size products over all partitions in $NC(n)$.
For $n=1$, the only noncrossing partition is $\{\{1\}\}$, hence
\begin{align*}
m_1 = r_1.
\end{align*}
For $n=2$, the noncrossing partitions are $\{\{1,2\}\}$ and $\{\{1\},\{2\}\}$, so
\begin{align*}
m_2 = r_2+r_1r_1.
\end{align*}
Thus
\begin{align*}
m_2 = r_2+r_1^2.
\end{align*}
For $n=3$, the one-block partition contributes $r_3$, the three partitions with block sizes $(2,1)$ are $\{\{1,2\},\{3\}\}$, $\{\{1,3\},\{2\}\}$, and $\{\{2,3\},\{1\}\}$, and the discrete partition contributes $r_1^3$. Therefore
\begin{align*}
m_3 = r_3+r_2r_1+r_2r_1+r_2r_1+r_1^3.
\end{align*}
Combining the three identical scalar products gives
\begin{align*}
m_3 = r_3+3r_2r_1+r_1^3.
\end{align*}
For $n=4$, group the noncrossing partitions by block sizes. The partition $\{\{1,2,3,4\}\}$ contributes $r_4$. The four partitions of type $(3,1)$ contribute $r_3r_1$ each. The two noncrossing partitions of type $(2,2)$ are $\{\{1,2\},\{3,4\}\}$ and $\{\{1,4\},\{2,3\}\}$, so they contribute $r_2^2$ each. The six partitions of type $(2,1,1)$ contribute $r_2r_1^2$ each. The discrete partition contributes $r_1^4$. Hence
\begin{align*}
m_4 = r_4+4r_3r_1+2r_2^2+6r_2r_1^2+r_1^4.
\end{align*}
The coefficient $2$ in front of $r_2^2$ records exactly the two noncrossing pair partitions of four points; the third pairing $\{\{1,3\},\{2,4\}\}$ is crossing, so it is not included in the free moment expansion.
[/example]
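The one-variable expansions above can be reproduced by summing block-size products over enumerated partitions. The following sketch uses hypothetical names and a dictionary `r` mapping block size to cumulant value; the flag `noncrossing_only` is an illustrative switch that includes or excludes crossing partitions.

```python
def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def moment_from_cumulants(n, r, noncrossing_only=True):
    """Sum over partitions of [n] of the product of r[|V|] over blocks V."""
    total = 0
    for p in set_partitions(list(range(1, n + 1))):
        if noncrossing_only and not is_noncrossing(p):
            continue
        term = 1
        for V in p:
            term *= r.get(len(V), 0)   # missing sizes are treated as zero
        total += term
    return total


semicircle = {2: 1}   # r_2 = 1 and all other cumulants zero
print([moment_from_cumulants(n, semicircle) for n in (2, 4, 6)])          # [1, 2, 5]
print(moment_from_cumulants(4, semicircle, noncrossing_only=False))       # 3
```

With $r_2=1$ and all other cumulants zero, the noncrossing sum gives $m_4=2$, while including the crossing pairing gives $3$, matching the warning stated before the example.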
The exclusion of the crossing pairing is the first visible difference between classical and free cumulants. It is small in low order, but it is exactly the difference that drives semicircular rather than Gaussian limiting laws.
## Möbius Inversion and Uniqueness
The next question is whether the recursive construction is merely a convenient definition or a canonical transformation. Since $NC(n)$ is a finite lattice, the answer is governed by Möbius inversion on the interval $[0_n,1_n]$.
[definition: Möbius Function of the Noncrossing Partition Lattice]
For $n \ge 1$, the Möbius function of $NC(n)$ is the map
\begin{align*}
\mu_{NC}: \{(\sigma,\pi) \in NC(n)^2 : \sigma \le \pi\} \to \mathbb Z
\end{align*}
determined by the recursion
\begin{align*}
\sum_{\sigma \le \rho \le \pi} \mu_{NC}(\sigma,\rho) = \begin{cases} 1, & \sigma = \pi, \\ 0, & \sigma < \pi. \end{cases}
\end{align*}
[/definition]
The Möbius function supplies the inverse coefficients for summation over the noncrossing partition lattice. The previous definition identifies those coefficients inside each interval $[\sigma,\pi]$, and now we need to apply them to the specific summation relation between $\varphi_\pi$ and $\kappa_\pi$. The theorem below is needed because it converts the recursive definition of cumulants into an explicit formula and proves uniqueness.
[quotetheorem:7129]
[proofunderconstruction:7129]
This theorem proves that free cumulants are unique. The condition $\pi \le \sigma$ is necessary because inversion takes place on intervals of the refinement order; partitions outside the interval do not contribute to the blockwise moment $\varphi_\sigma$. The formula is still finite and algebraic, so it does not by itself solve questions about convergence of cumulant series or boundedness of operators. What it does provide is the canonical bridge from observed moments to cumulants, which is why later computations can freely switch between recursive identities and explicit Möbius coefficients.
The interval condition cannot be dropped. In $NC(3)$, take $\sigma=\{\{1,2\},\{3\}\}$. The blockwise relation is $\varphi_\sigma=\kappa_\sigma+\kappa_{\{\{1\},\{2\},\{3\}\}}$, because only partitions refining $\sigma$ can split the block $\{1,2\}$ and the singleton $\{3\}$. If the partition $\{\{1,3\},\{2\}\}$ were inserted into the inversion for $\kappa_\sigma$, it would mix entries from different blocks of $\sigma$ and would not invert the displayed two-term relation. This is why Möbius coefficients are attached to the interval $[0_n,\sigma]$, not to unrelated partitions in $NC(n)$.
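For a single variable, the explicit inversion can be sketched as follows. The code assumes the standard form $r_n=\sum_{\pi\in NC(n)}\mu_{NC}(\pi,1_n)\prod_{V\in\pi}m_{|V|}$ of the inversion at $\sigma=1_n$; the helper names are hypothetical, `m` is a dictionary of moments indexed by order, and the search is only intended for small $n$.

```python
from fractions import Fraction
from functools import lru_cache


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def canon(partition):
    return tuple(sorted(tuple(sorted(V)) for V in partition))


def refines(pi, sigma):
    return all(any(set(V) <= set(W) for W in sigma) for V in pi)


def cumulant_by_inversion(n, m):
    """r_n as the Mobius-weighted sum of block moments over NC(n)."""
    NC = [canon(p) for p in set_partitions(list(range(1, n + 1)))
          if is_noncrossing(p)]
    top = canon([list(range(1, n + 1))])

    @lru_cache(maxsize=None)
    def mu(pi, sigma):
        if pi == sigma:
            return 1
        return -sum(mu(pi, rho) for rho in NC
                    if rho != sigma and refines(pi, rho) and refines(rho, sigma))

    total = 0
    for pi in NC:
        term = mu(pi, top)
        for V in pi:
            term *= m[len(V)]      # block moment of a single variable
        total += term
    return total


half = Fraction(1, 2)
m = {k: half for k in range(1, 5)}     # constant moment sequence with value 1/2
print([cumulant_by_inversion(n, m) for n in range(1, 5)])
```

For $n=3$ the weights are $1$ on $1_3$, $-1$ on each of the three two-block partitions, and $2$ on $0_3$, so the sum is $m_3-3m_1m_2+2m_1^3$.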
[example: Free Cumulants of a Scalar]
Let $x=\lambda 1_{\mathcal A}$ and write $r_n=\kappa_n(x,\dots,x)$. Since $\varphi(1_{\mathcal A})=1$ and $x^n=\lambda^n1_{\mathcal A}$, the moments are
\begin{align*}
m_n=\varphi(x^n)=\varphi(\lambda^n1_{\mathcal A})=\lambda^n\varphi(1_{\mathcal A})=\lambda^n.
\end{align*}
For $n=1$, the moment-cumulant identity gives
\begin{align*}
\lambda=m_1=r_1.
\end{align*}
For $n=2$,
\begin{align*}
\lambda^2=m_2=r_2+r_1^2.
\end{align*}
Substituting $r_1=\lambda$ gives
\begin{align*}
\lambda^2=r_2+\lambda^2,
\end{align*}
so
\begin{align*}
r_2=0.
\end{align*}
Now fix $n\ge 3$ and assume inductively that $r_j=0$ for every $2\le j<n$. In the moment-cumulant expansion of $m_n$, the one-block partition contributes $r_n$, the discrete partition contributes
\begin{align*}
r_1^n=\lambda^n,
\end{align*}
and every other noncrossing partition has at least one block whose size lies between $2$ and $n-1$, so its contribution contains a factor $r_j=0$. Hence the whole expansion reduces to
\begin{align*}
\lambda^n=m_n=r_n+\lambda^n.
\end{align*}
Subtracting $\lambda^n$ from both sides gives
\begin{align*}
r_n=0.
\end{align*}
Thus a scalar variable $\lambda 1_{\mathcal A}$ has first free cumulant $\lambda$ and all higher free cumulants equal to $0$, expressing that it has no non-scalar fluctuation.
[/example]
The scalar example is the free-probabilistic version of the fact that a constant has no fluctuation. Cumulants of order at least two measure non-scalar variation.
[example: Free Cumulants of a Projection]
Let $p=p^2 \in \mathcal A$ be a projection and set $\alpha=\varphi(p)$. Since $p^n=p$ for every $n\ge 1$, its moments are
\begin{align*}
m_n=\varphi(p^n)=\varphi(p)=\alpha.
\end{align*}
Write $r_n=\kappa_n(p,\dots,p)$. From the first identity,
\begin{align*}
\alpha=m_1=r_1.
\end{align*}
Thus $r_1=\alpha$.
For order two, the moment-cumulant identity gives
\begin{align*}
\alpha=m_2=r_2+r_1^2.
\end{align*}
Substituting $r_1=\alpha$ gives
\begin{align*}
\alpha=r_2+\alpha^2.
\end{align*}
Subtracting $\alpha^2$ from both sides gives
\begin{align*}
r_2=\alpha-\alpha^2.
\end{align*}
For order three,
\begin{align*}
\alpha=m_3=r_3+3r_2r_1+r_1^3.
\end{align*}
Using $r_1=\alpha$ and $r_2=\alpha-\alpha^2$,
\begin{align*}
3r_2r_1=3(\alpha-\alpha^2)\alpha=3\alpha^2-3\alpha^3.
\end{align*}
Also,
\begin{align*}
r_1^3=\alpha^3.
\end{align*}
Therefore
\begin{align*}
\alpha=r_3+3\alpha^2-3\alpha^3+\alpha^3.
\end{align*}
Combining the two cubic terms gives
\begin{align*}
\alpha=r_3+3\alpha^2-2\alpha^3.
\end{align*}
Hence
\begin{align*}
r_3=\alpha-3\alpha^2+2\alpha^3.
\end{align*}
For order four, the one-variable identity is
\begin{align*}
m_4=r_4+4r_3r_1+2r_2^2+6r_2r_1^2+r_1^4.
\end{align*}
Since $m_4=\alpha$, we substitute the previous cumulants term by term:
\begin{align*}
4r_3r_1=4(\alpha-3\alpha^2+2\alpha^3)\alpha=4\alpha^2-12\alpha^3+8\alpha^4.
\end{align*}
\begin{align*}
2r_2^2=2(\alpha-\alpha^2)^2=2(\alpha^2-2\alpha^3+\alpha^4)=2\alpha^2-4\alpha^3+2\alpha^4.
\end{align*}
\begin{align*}
6r_2r_1^2=6(\alpha-\alpha^2)\alpha^2=6\alpha^3-6\alpha^4.
\end{align*}
\begin{align*}
r_1^4=\alpha^4.
\end{align*}
Thus
\begin{align*}
\alpha=r_4+(4\alpha^2-12\alpha^3+8\alpha^4)+(2\alpha^2-4\alpha^3+2\alpha^4)+(6\alpha^3-6\alpha^4)+\alpha^4.
\end{align*}
The non-$r_4$ terms combine as
\begin{align*}
(4\alpha^2+2\alpha^2)+(-12\alpha^3-4\alpha^3+6\alpha^3)+(8\alpha^4+2\alpha^4-6\alpha^4+\alpha^4)=6\alpha^2-10\alpha^3+5\alpha^4.
\end{align*}
So
\begin{align*}
\alpha=r_4+6\alpha^2-10\alpha^3+5\alpha^4.
\end{align*}
Solving for $r_4$ gives
\begin{align*}
r_4=\alpha-6\alpha^2+10\alpha^3-5\alpha^4.
\end{align*}
Even though the moment sequence of a projection is constant, its free cumulants record nontrivial polynomial information in the scalar parameter $\alpha$.
[/example]
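The scalar and projection computations follow the same recursive pattern, which can be sketched in code. The sketch below solves the one-variable moment-cumulant identity order by order; the function names are hypothetical, and exact rational arithmetic via `Fraction` is a choice made here to avoid rounding.

```python
from fractions import Fraction


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        yield [[first]] + partial


def is_noncrossing(partition):
    """True when no a < b < c < d has a, c in one block and b, d in another."""
    for i, V in enumerate(partition):
        for j, W in enumerate(partition):
            if i == j:
                continue
            for a in V:
                for c in V:
                    if (a < c and any(a < b < c for b in W)
                            and any(d < a or d > c for d in W)):
                        return False
    return True


def free_cumulants(moments):
    """Given [m_1, ..., m_N], return {n: r_n} by solving the identity order by order."""
    r = {}
    for n in range(1, len(moments) + 1):
        lower = 0
        for p in set_partitions(list(range(1, n + 1))):
            if len(p) == 1 or not is_noncrossing(p):
                continue               # the one-block term carries the unknown r_n
            term = 1
            for V in p:
                term *= r[len(V)]      # only cumulants of order < n appear here
            lower += term
        r[n] = moments[n - 1] - lower
    return r


alpha = Fraction(1, 3)
print(free_cumulants([alpha] * 4))     # projection with phi(p) = 1/3
print(free_cumulants([3, 9, 27, 81]))  # scalar 3: {1: 3, 2: 0, 3: 0, 4: 0}
```

For the projection, the output can be compared with the polynomials $\alpha-\alpha^2$, $\alpha-3\alpha^2+2\alpha^3$, and $\alpha-6\alpha^2+10\alpha^3-5\alpha^4$ derived in the example.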
Projections already show that free cumulants are sensitive to the non-semicircular shape of a law: the moment sequence is as simple as possible, yet the cumulants of order three and higher are nontrivial polynomials in $\alpha$ rather than identically zero.
## Multilinear Cumulants of Several Variables
Moments of one variable describe a marginal law, but freeness is a statement about alternating products of several subalgebras. The multilinear form of cumulants is needed because it remembers which variable occupies each position in a word.
[definition: Mixed Free Cumulant]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space. Fix either a family of subalgebras $(\mathcal A_i)_{i \in I}$ of $\mathcal A$ or a family of variables $(x_i)_{i \in I}$ in $\mathcal A$. For $a_1,\dots,a_n \in \mathcal A$, the number $\kappa_n(a_1,\dots,a_n)$ is called a mixed free cumulant relative to the chosen family if its entries are drawn from at least two different members of that family.
[/definition]
The definition is not a new construction; it is terminology for the same multilinear functionals when the inputs come from several sources. The key point is that mixed cumulants are exactly what freeness eliminates.
[quotetheorem:7130]
[citeproof:7130]
This theorem is one of the main reasons cumulants are useful: it converts a recursive condition on alternating centered products into a local vanishing rule for multilinear functionals. The unital hypothesis matters because the centering step subtracts scalar multiples of $1_{\mathcal A}$ and must remain inside the same subalgebra. The statement is also relative to the chosen family $(\mathcal A_i)_{i \in I}$; a cumulant is mixed only after we specify which subalgebras the entries come from.
The centering and unitality requirements cannot be removed from the proof. If $a \in \mathcal A_1$ and $b \in \mathcal A_2$ are free but $\varphi(a)$ and $\varphi(b)$ are nonzero, then $\varphi(ab)=\varphi(a)\varphi(b)$ need not be $0$, so an uncentered alternating product is not expected to vanish. Likewise, if a chosen subalgebra is not unital, then $a-\varphi(a)1_{\mathcal A}$ may leave that subalgebra, so the scalar-subtraction step used to reduce to centered variables no longer stays inside the family being tested for freeness. Without the noncrossing interval structure, the converse argument would also pick up crossing contributions that freeness does not control.
The conclusion is also not a generic property of unrelated subalgebras. In the commutative probability space of two independent Rademacher random variables $X$ and $Y$, let $\mathcal A_1=\mathbb C[X]$ and $\mathcal A_2=\mathbb C[Y]$. These are distinct subalgebras. Since $X$ and $Y$ are centered and $X^2=Y^2=1$, the alternating centered moment is
\begin{align*}
\varphi(XYXY)=\varphi(X^2Y^2)=1.
\end{align*}
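In the order-four free moment-cumulant expansion, every term other than $\kappa_4(X,Y,X,Y)$ contains either a first cumulant, which vanishes by centering, or the mixed pair cumulant $\kappa_2(X,Y)=\kappa_2(Y,X)=\varphi(XY)=0$, so the expansion gives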
\begin{align*}
\kappa_4(X,Y,X,Y)=1.
\end{align*}
Thus distinct subalgebras that are not free can have nonzero mixed cumulants. The theorem says that the vanishing criterion is equivalent to freeness of the chosen family, not that mixed cumulants vanish merely because the inputs come from different named sources.
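For centered letters the order-four expansion collapses to the full moment minus the two noncrossing pair-pair terms, so the Rademacher computation can be replayed in a few lines. The sketch below is only an illustration under that centering assumption. The function names are hypothetical, and the classical fourth cumulant is included for contrast because it subtracts the crossing pairing as well.

```python
def rademacher_moment(word):
    """phi(word) for classically independent Rademacher X, Y (commuting, X^2 = Y^2 = 1)."""
    even_x = word.count("X") % 2 == 0
    even_y = word.count("Y") % 2 == 0
    return 1 if (even_x and even_y) else 0

def centered_kappa4(w, phi):
    """Free kappa_4 for four centered letters: subtract the two noncrossing pairings."""
    a1, a2, a3, a4 = w
    return phi(w) - phi(a1 + a2) * phi(a3 + a4) - phi(a1 + a4) * phi(a2 + a3)

def centered_classical_c4(w, phi):
    """Classical c_4 for four centered letters: also subtract the crossing pairing."""
    a1, a2, a3, a4 = w
    return centered_kappa4(w, phi) - phi(a1 + a3) * phi(a2 + a4)

free_value = centered_kappa4("XYXY", rademacher_moment)
classical_value = centered_classical_c4("XYXY", rademacher_moment)
print(free_value, classical_value)
```

Under these assumptions the free value is $1$ while the classical value is $0$, which matches the fact that $X$ and $Y$ are classically independent but not free.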
[example: A Second-Order Mixed Moment]
Suppose $a \in \mathcal A_1$ and $b \in \mathcal A_2$, where $\mathcal A_1$ and $\mathcal A_2$ are free. The order-two moment-cumulant identity gives
\begin{align*}
\varphi(ab)=\kappa_2(a,b)+\kappa_1(a)\kappa_1(b).
\end{align*}
Because $a$ and $b$ come from two different free subalgebras, $\kappa_2(a,b)$ is a mixed free cumulant, so the mixed-cumulant vanishing criterion gives
\begin{align*}
\kappa_2(a,b)=0.
\end{align*}
Also, the order-one identity gives
\begin{align*}
\kappa_1(a)=\varphi(a),
\end{align*}
and
\begin{align*}
\kappa_1(b)=\varphi(b).
\end{align*}
Substituting these three identities into the order-two expansion yields
\begin{align*}
\varphi(ab)=0+\varphi(a)\varphi(b).
\end{align*}
Hence
\begin{align*}
\varphi(ab)=\varphi(a)\varphi(b).
\end{align*}
This is the lowest-order point at which freeness resembles classical independence.
[/example]
At higher orders freeness diverges from tensor independence, because the noncrossing condition controls which products of marginal cumulants may appear in a mixed moment. The next example shows how the order of variables matters.
[example: Alternating Centered Product]
Let $a,c \in \mathcal A_1$ and $b,d \in \mathcal A_2$, where $\mathcal A_1$ and $\mathcal A_2$ are free, and assume $\varphi(a)=\varphi(b)=\varphi(c)=\varphi(d)=0$. We compute $\varphi(abcd)$ from the fourth moment-cumulant identity with $(a_1,a_2,a_3,a_4)=(a,b,c,d)$.
By the order-one identity, the centering assumptions give
\begin{align*}
\kappa_1(a)=\kappa_1(b)=\kappa_1(c)=\kappa_1(d)=0.
\end{align*}
The mixed cumulant vanishing criterion for free subalgebras gives
\begin{align*}
\kappa_4(a,b,c,d)=0.
\end{align*}
It also gives
\begin{align*}
\kappa_3(a,b,c)=\kappa_3(a,b,d)=\kappa_3(a,c,d)=\kappa_3(b,c,d)=0,
\end{align*}
because each triple contains entries from both $\mathcal A_1$ and $\mathcal A_2$. For the second-order terms appearing in the free order-four identity,
\begin{align*}
\kappa_2(a,b)=\kappa_2(a,d)=\kappa_2(b,c)=\kappa_2(c,d)=0,
\end{align*}
again because each displayed pair is mixed.
Therefore the fourth identity reduces term by term as follows, where the final $0$ collects the seven terms that contain at least two first cumulants:
\begin{align*}
\varphi(abcd)=0+0\cdot \kappa_1(d)+0\cdot \kappa_1(c)+0\cdot \kappa_1(b)+0\cdot \kappa_1(a)+0\cdot 0+0\cdot 0+0.
\end{align*}
Hence
\begin{align*}
\varphi(abcd)=0.
\end{align*}
The only pair product that could use the same-subalgebra pairs is $\kappa_2(a,c)\kappa_2(b,d)$, corresponding to $\{\{1,3\},\{2,4\}\}$, but that pairing is crossing and is not a term in the free noncrossing expansion. Thus the alternating centered product has zero moment, exactly as freeness requires.
[/example]
This computation also explains why the noncrossing restriction is forced by freeness. If crossing partitions were included, the pairing of the two $\mathcal A_1$ positions and the two $\mathcal A_2$ positions would create an unwanted contribution.
## Low-Order Identities and Centered Variables
For calculations, the most useful part of the theory is the list of low-order identities. They let us move between moments and cumulants without drawing every noncrossing partition each time.
[quotetheorem:7131]
[proofunderconstruction:7131]
The first four identities show that first cumulants appear as scalar correction terms in every order. Their usefulness depends on using the free cumulants attached to the same state $\varphi$ and on the explicit enumeration of $NC(n)$. For a concrete failure, take $\mathcal A=\mathbb C[x]$, let $\varphi(p)=p(0)$ and $\psi(p)=p(1)$, and set $a=x$. The first identity gives $\varphi(a)=0$ when the cumulant is computed from $\varphi$, but the first cumulant computed from $\psi$ is $\kappa_1^\psi(a)=1$; using cumulants from the wrong state already breaks the order-one formula.
The noncrossing hypothesis is just as restrictive. If one accidentally includes the crossing pairing at order four, the number of pair-pair terms changes from two to three and the formula becomes classical rather than free. For instance, a centered semicircular variable of variance $1$ has $r_2=1$ and $r_n=0$ for $n \ne 2$, so the identity gives $\varphi(a^4)=2r_2^2=2$. Adding the crossing pairing would force $\varphi(a^4)=3$, which is the fourth moment of a standard Gaussian rather than of a semicircular variable. The theorem is also only a low-order computational tool; beyond $n=4$ one must return to the general moment-cumulant formula.
For computations with freeness, we need a way to remove the scalar terms before expanding alternating products, and the next definition names the variables for which this first-order obstruction has already been removed.
[definition: Centered Noncommutative Random Variable]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space. An element $a \in \mathcal A$ is centered if
\begin{align*}
\varphi(a)=0.
\end{align*}
[/definition]
Centering removes the first cumulant, so every correction term containing $r_1$ drops out of the low-order identities. The following example records the consequences through order four.
[example: Cumulants of a Centered Variable]
Let $a \in \mathcal A$ be centered, and write $m_n=\varphi(a^n)$ and $r_n=\kappa_n(a,\dots,a)$. Since $a$ is centered, the first-order identity gives
\begin{align*}
r_1=\kappa_1(a)=\varphi(a)=0.
\end{align*}
For order two, the one-variable moment-cumulant identity is
\begin{align*}
m_2=r_2+r_1^2.
\end{align*}
Substituting $r_1=0$ gives
\begin{align*}
m_2=r_2+0^2.
\end{align*}
Since $0^2=0$, this becomes
\begin{align*}
m_2=r_2.
\end{align*}
For order three, the one-variable identity is
\begin{align*}
m_3=r_3+3r_2r_1+r_1^3.
\end{align*}
Substituting $r_1=0$ gives
\begin{align*}
m_3=r_3+3r_2\cdot 0+0^3.
\end{align*}
Both correction terms vanish, so
\begin{align*}
m_3=r_3.
\end{align*}
For order four, the one-variable identity is
\begin{align*}
m_4=r_4+4r_3r_1+2r_2^2+6r_2r_1^2+r_1^4.
\end{align*}
Substituting $r_1=0$ gives
\begin{align*}
m_4=r_4+4r_3\cdot 0+2r_2^2+6r_2\cdot 0^2+0^4.
\end{align*}
The terms containing $r_1$ vanish, hence
\begin{align*}
m_4=r_4+2r_2^2.
\end{align*}
Using $r_2=m_2$, this also gives
\begin{align*}
r_4=m_4-2m_2^2.
\end{align*}
Thus centering removes every first-cumulant correction: the second and third free cumulants equal the corresponding moments, while the fourth cumulant differs from $m_4$ by the two noncrossing pairings of four points.
[/example]
These centered identities prepare the ground for semicircular variables, whose free cumulants vanish except in order two. They also provide a diagnostic for comparing free and classical cumulants at low order.
[example: Classical and Free Cumulants at Low Order]
For one scalar-valued law, write $c_n$ for classical cumulants and $r_n$ for free cumulants. The first moment identity in both theories is
\begin{align*}
m_1=c_1,
\end{align*}
and
\begin{align*}
m_1=r_1.
\end{align*}
Hence
\begin{align*}
c_1=r_1=m_1.
\end{align*}
At order two, both classical and free expansions have the same two partitions:
\begin{align*}
m_2=c_2+c_1^2,
\end{align*}
and
\begin{align*}
m_2=r_2+r_1^2.
\end{align*}
Using $c_1=r_1=m_1$, these give
\begin{align*}
c_2=m_2-c_1^2=m_2-m_1^2,
\end{align*}
and
\begin{align*}
r_2=m_2-r_1^2=m_2-m_1^2.
\end{align*}
Thus $c_2=r_2$.
Now assume the variable is centered, so $m_1=0$. Then $c_1=r_1=0$. The order-three classical and free identities are
\begin{align*}
m_3=c_3+3c_2c_1+c_1^3,
\end{align*}
and
\begin{align*}
m_3=r_3+3r_2r_1+r_1^3.
\end{align*}
Substituting $c_1=r_1=0$ gives
\begin{align*}
m_3=c_3+3c_2\cdot 0+0^3=c_3,
\end{align*}
and
\begin{align*}
m_3=r_3+3r_2\cdot 0+0^3=r_3.
\end{align*}
So for centered variables,
\begin{align*}
c_3=r_3=m_3.
\end{align*}
At order four the coefficient of the pairing contribution changes. Classically, all three pairings of four points contribute, so for a centered variable
\begin{align*}
m_4=c_4+3c_2^2.
\end{align*}
Since $c_2=m_2$, this becomes
\begin{align*}
m_4=c_4+3m_2^2.
\end{align*}
Solving for $c_4$ gives
\begin{align*}
c_4=m_4-3m_2^2.
\end{align*}
In the free expansion, only the two noncrossing pairings contribute, so
\begin{align*}
m_4=r_4+2r_2^2.
\end{align*}
Since $r_2=m_2$, this becomes
\begin{align*}
m_4=r_4+2m_2^2.
\end{align*}
Solving for $r_4$ gives
\begin{align*}
r_4=m_4-2m_2^2.
\end{align*}
The missing third pairing is $\{\{1,3\},\{2,4\}\}$, which is crossing; this single excluded term is why classical and free cumulants first diverge in fourth order.
[/example]
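The coefficients $3$ and $2$ in the two order-four formulas are pairing counts, and they can be reproduced by brute force. The following Python sketch enumerates all pairings of $\{1,\dots,2n\}$ and filters out the crossing ones. The function names are hypothetical, and the enumeration is meant for small $n$ only.

```python
def pairings(points):
    """Yield every pairing of an even-length sorted list as a list of (i, j) with i < j."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def is_crossing(pairing):
    """True if some two pairs (a, c) and (b, d) interleave as a < b < c < d."""
    return any(a < b < c < d for (a, c) in pairing for (b, d) in pairing)

def count_pairings(n_points):
    """Return (number of all pairings, number of noncrossing pairings)."""
    all_pairings = list(pairings(list(range(1, n_points + 1))))
    noncrossing = [p for p in all_pairings if not is_crossing(p)]
    return len(all_pairings), len(noncrossing)

for n_points in (2, 4, 6, 8):
    print(n_points, count_pairings(n_points))
```

For four points the two counts are $3$ and $2$, which are exactly the coefficients of $c_2^2$ and $r_2^2$ in the centered fourth-order identities.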
The chapter's main message is that free cumulants are not extra data. They are the Möbius-inverted form of moments on the noncrossing partition lattice, and their mixed vanishing is precisely the combinatorial form of free independence.
Free cumulants are formally defined as Möbius inversions in Chapter 5; Chapter 6 reinterprets them as the natural coordinates for freeness itself, proving that free independence is precisely the condition that mixed cumulants vanish.
# 6. Freeness via Cumulants
Chapter 5 introduced free cumulants as the coefficients in the moment-cumulant relation over noncrossing partitions. This chapter explains why they are the right coordinates for freeness itself. The guiding principle is that free independence removes exactly the mixed cumulants, so mixed moments can be reconstructed from the cumulants of each individual algebra.
We now turn those definitions into a working criterion for freeness, then use the criterion to compute cumulants of sums, simple products, and mixed words after centering.
## Mixed Cumulants and the Criterion for Freeness
The defining condition for freeness is stated in terms of alternating centered products. For computations, that condition is awkward because it is not expressed directly in terms of the full moment expansion. The question is: which coefficients in the noncrossing partition expansion detect the obstruction to freeness?
[definition: Mixed Free Cumulant]
Let $(\mathcal A, \varphi)$ be a noncommutative probability space, and let $(\mathcal A_i)_{i \in I}$ be unital subalgebras of $\mathcal A$. For $n \ge 1$, let
\begin{align*}
\kappa_n : \mathcal A^n \to \mathbb C
\end{align*}
be the $n$-th free cumulant functional. The value $\kappa_n(a_1,\dots,a_n)$ is called mixed with respect to $(\mathcal A_i)_{i \in I}$ if there exist indices $i_1,\dots,i_n \in I$ such that $a_j \in \mathcal A_{i_j}$ for every $j$, and the tuple $(i_1,\dots,i_n)$ is not constant.
[/definition]
A mixed cumulant is therefore a cumulant whose arguments do not all come from the same algebra. This definition isolates the possible cross-talk between different subalgebras, but it does not yet say whether those coefficients are exactly the obstruction to free independence. The next theorem is needed to turn the alternating centered-word definition of freeness into a usable cumulant test.
[quotetheorem:7132]
[proofunderconstruction:7132]
This theorem replaces a moment condition involving all alternating centered words by a coefficientwise condition. The unital hypothesis matters because freeness is tested after subtracting scalar expectations, and the scalar unit must remain inside each algebra when we form $a-\varphi(a)1$. Without freeness there is no reason for mixed cumulants to vanish: if two indexed subalgebras coincide and contain a centred element $a$ with $\varphi(a^2)\ne0$, then the colour-labelled cumulant $\kappa_2(a,a)=\varphi(a^2)$ is mixed with respect to the two labels and is nonzero.
The theorem also does not say that every mixed moment is zero. Even for free centred variables, a word such as $abba$ may have a nonzero moment because noncrossing blocks can connect equal colours across the word. What vanishes is the coefficient attached to a block whose entries genuinely come from more than one algebra. This is the free analogue of the classical fact that independence is equivalent to the vanishing of mixed classical cumulants, with noncrossing partitions replacing all set partitions.
[example: Freeness Forces a Third Mixed Cumulant to Vanish]
Let $\mathcal A_1$ and $\mathcal A_2$ be free unital subalgebras, with $a \in \mathcal A_1$ and $b \in \mathcal A_2$. The cumulant $\kappa_3(a,b,a)$ is mixed because its colour pattern is $(1,2,1)$, so *[Speicher's Cumulant Criterion for Freeness](/theorems/7132)* gives
\begin{align*}
\kappa_3(a,b,a)=0.
\end{align*}
We can also see the coefficient disappear by comparing the moment expansion with the centered-word calculation. Put $\alpha=\varphi(a)$, $\beta=\varphi(b)$, $a^\circ=a-\alpha 1$, and $b^\circ=b-\beta 1$. Since $a^\circ,b^\circ,a^\circ$ are centered and alternate between the two free subalgebras,
\begin{align*}
\varphi(a^\circ b^\circ a^\circ)=0.
\end{align*}
Expanding the product gives
\begin{align*}
a^\circ b^\circ a^\circ=aba-\alpha ab-\beta a^2+\alpha\beta a-\alpha ba+\alpha^2 b+\alpha\beta a-\alpha^2\beta 1.
\end{align*}
Taking $\varphi$ and using $\varphi(ab)=\alpha\beta$ and $\varphi(ba)=\beta\alpha$, which follow from freeness applied to $a^\circ b^\circ$ and $b^\circ a^\circ$, yields
\begin{align*}
0=\varphi(aba)-\alpha^2\beta-\beta\varphi(a^2)+\alpha^2\beta-\alpha^2\beta+\alpha^2\beta+\alpha^2\beta-\alpha^2\beta.
\end{align*}
The scalar terms cancel, so
\begin{align*}
\varphi(aba)=\beta\varphi(a^2).
\end{align*}
Since $\beta=\kappa_1(b)$ and $\varphi(a^2)=\kappa_2(a,a)+\kappa_1(a)^2$, this becomes
\begin{align*}
\varphi(aba)=\kappa_1(b)\kappa_2(a,a)+\kappa_1(a)\kappa_1(b)\kappa_1(a).
\end{align*}
On the other hand, the noncrossing moment-cumulant expansion over $NC(3)$ is
\begin{align*}
\varphi(aba)=\kappa_3(a,b,a)+\kappa_2(a,b)\kappa_1(a)+\kappa_2(a,a)\kappa_1(b)+\kappa_1(a)\kappa_2(b,a)+\kappa_1(a)\kappa_1(b)\kappa_1(a).
\end{align*}
The two second-order mixed cumulants vanish by *Speicher's Cumulant Criterion for Freeness*:
\begin{align*}
\kappa_2(a,b)=0.
\end{align*}
Likewise,
\begin{align*}
\kappa_2(b,a)=0.
\end{align*}
Substituting these into the moment-cumulant expansion leaves
\begin{align*}
\varphi(aba)=\kappa_3(a,b,a)+\kappa_2(a,a)\kappa_1(b)+\kappa_1(a)\kappa_1(b)\kappa_1(a).
\end{align*}
Comparing with the centered-word value of $\varphi(aba)$ forces
\begin{align*}
\kappa_3(a,b,a)=0.
\end{align*}
The calculation shows that freeness removes the mixed coefficient while leaving the marginal $a$-cumulants and the scalar expectation of $b$ intact.
[/example]
The example shows the criterion at the level of one coefficient. For a longer mixed word, the obstruction is combinatorial: most noncrossing partitions either contain a block mixing different algebras, which freeness kills, or split into single-colour blocks that still contribute. A usable moment formula must therefore filter partitions by the colours of their blocks while preserving the original order of the word.
[quotetheorem:7133]
[citeproof:7133]
The phrase *separated positions* is important: a block may contain positions $1$ and $3$ even if position $2$ belongs to another algebra. This is why the word $abab$ has candidate same-colour blocks before the noncrossing condition is imposed.
Both hypotheses in the formula are doing work. If the families are not free, a partition with a genuinely mixed block can contribute; for centred $a,b$ in the same algebra with $\kappa_2(a,b)\ne0$, the moment $\varphi(ab)$ contains the mixed block $\{1,2\}$ and cannot be recovered from constant-colour blocks alone. Conversely, freeness removes mixed cumulant blocks but does not remove all mixed moments, because a partition may be mixed as a whole while each block has a single colour.
The limitation is that this is still a partition formula, not a closed-form numerical answer. Its value is recursive: once the marginal cumulants inside each subalgebra are known, the only remaining task is to enumerate the compatible noncrossing partitions for the given colour pattern.
## Additivity of Free Cumulants for Sums
Classical cumulants are useful because cumulants of independent sums add. Free cumulants satisfy the same formal rule for freely independent variables. The problem is to identify which terms survive after expanding a cumulant of $a+b$ by multilinearity.
[quotetheorem:7134]
[citeproof:7134]
This is the algebraic core of free additive convolution, whose analytic transform formulation is postponed beyond this foundational cumulant treatment. The freeness assumption is essential: if $b=a$, then $a$ and $b$ are not freely independent in general, and multilinearity gives $\kappa_n(a+b,\dots,a+b)=\kappa_n(2a,\dots,2a)=2^n\kappa_n(a,\dots,a)$ rather than $2\kappa_n(a,\dots,a)$; the two differ for $n\ge2$ whenever $\kappa_n(a,\dots,a)\ne0$. Thus additivity is a statement about freely independent summands, not about arbitrary decompositions of a variable.
The theorem is also a cumulant statement, not a moment statement. Moments of $a+b$ are recovered only after applying the moment-cumulant relation, so mixed products created by expanding $(a+b)^n$ are already encoded indirectly in the cumulants. The distribution of $a+b$ is determined by adding the free cumulant sequences of the distributions of $a$ and $b$, then recovering moments through the moment-cumulant relation.
[example: Cumulants of a Free Sum]
Suppose $a$ and $b$ are free and write $\alpha_n=\kappa_n(a,\dots,a)$ and $\beta_n=\kappa_n(b,\dots,b)$. For $s=a+b$, linearity of $\kappa_1$ gives
\begin{align*}
\kappa_1(s)=\kappa_1(a+b)=\kappa_1(a)+\kappa_1(b)=\alpha_1+\beta_1.
\end{align*}
For the second cumulant, multilinearity in the two slots gives
\begin{align*}
\kappa_2(s,s)=\kappa_2(a+b,a+b)=\kappa_2(a,a)+\kappa_2(a,b)+\kappa_2(b,a)+\kappa_2(b,b).
\end{align*}
The terms $\kappa_2(a,b)$ and $\kappa_2(b,a)$ are mixed cumulants, so they vanish by *Speicher's Cumulant Criterion for Freeness*. Hence
\begin{align*}
\kappa_2(s,s)=\kappa_2(a,a)+\kappa_2(b,b)=\alpha_2+\beta_2.
\end{align*}
For the third cumulant, multilinearity gives all eight choices of $a$ or $b$ in the three slots:
\begin{align*}
\kappa_3(s,s,s)=\kappa_3(a,a,a)+\kappa_3(a,a,b)+\kappa_3(a,b,a)+\kappa_3(a,b,b)+\kappa_3(b,a,a)+\kappa_3(b,a,b)+\kappa_3(b,b,a)+\kappa_3(b,b,b).
\end{align*}
Every term except $\kappa_3(a,a,a)$ and $\kappa_3(b,b,b)$ has arguments from both free subalgebras, so each is a mixed cumulant and vanishes by *Speicher's Cumulant Criterion for Freeness*. Therefore
\begin{align*}
\kappa_3(s,s,s)=\kappa_3(a,a,a)+\kappa_3(b,b,b)=\alpha_3+\beta_3.
\end{align*}
The calculation shows the mechanism behind additivity: multilinearity creates all mixed cumulants, and freeness removes exactly those mixed terms.
[/example]
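Additivity turns the computation of the moments of a free sum into three mechanical steps: convert each marginal moment sequence to free cumulants, add the cumulants, and convert back. The Python sketch below does this with a standard one-variable recursion obtained by conditioning on the block that contains the first point; that recursion is stated here without proof and is an assumption of the sketch, not a result quoted from these notes. The function names and the sample data (two free projections with expectations $1/2$ and $1/3$) are likewise illustrative.

```python
from fractions import Fraction

def _weight(moments, total, parts):
    """Sum of m_{i_1} ... m_{i_parts} over i_j >= 0 with i_1 + ... + i_parts = total."""
    table = [Fraction(1)] + [Fraction(0)] * total
    for _ in range(parts):
        table = [sum(table[t - i] * moments[i] for i in range(t + 1))
                 for t in range(total + 1)]
    return table[total]

def moments_to_free_cumulants(moments):
    """moments[n] = m_n with moments[0] = 1; returns r with r[n] the n-th free cumulant."""
    n_max = len(moments) - 1
    r = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        lower = sum(r[s] * _weight(moments, n - s, s) for s in range(1, n))
        r[n] = moments[n] - lower
    return r

def free_cumulants_to_moments(r):
    """Inverse direction: rebuild m_0, ..., m_N from r[1..N] (r[0] is ignored)."""
    n_max = len(r) - 1
    m = [Fraction(1)] + [Fraction(0)] * n_max
    for n in range(1, n_max + 1):
        m[n] = sum(r[s] * _weight(m, n - s, s) for s in range(1, n + 1))
    return m

# Sample data (an assumption for illustration): free projections p and q.
p_moments = [Fraction(1)] + [Fraction(1, 2)] * 3
q_moments = [Fraction(1)] + [Fraction(1, 3)] * 3
r_p = moments_to_free_cumulants(p_moments)
r_q = moments_to_free_cumulants(q_moments)
r_sum = [x + y for x, y in zip(r_p, r_q)]  # additivity of free cumulants
sum_moments = free_cumulants_to_moments(r_sum)
print(sum_moments[1:])
```

The second and third moments produced this way can be compared with a direct expansion of $(p+q)^2$ and $(p+q)^3$ using $\varphi(ab)=\varphi(a)\varphi(b)$ and $\varphi(aba)=\varphi(a^2)\varphi(b)$ for free $a$ and $b$.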
Additivity is especially efficient when the variables are centered. Centering affects only the scalar part of a variable, but a cumulant computation may contain many slots, so we need a precise statement about which orders change under scalar subtraction. That statement justifies replacing variables by centered parts in later mixed-moment reductions.
[quotetheorem:7135]
[citeproof:7135]
This observation lets us move between raw variables and centered variables without changing the higher-order part of the cumulant sequence. The scalar-unit hypothesis is the precise reason this works: subtracting $\lambda 1$ only changes the deterministic first-order part. Replacing $a$ by $a-c$ for a non-scalar element $c$ would not have the same effect; for example, $\kappa_2(a-c,a-c)$ contains the additional terms $\kappa_2(c,c)$ and $-\kappa_2(a,c)-\kappa_2(c,a)$.
The theorem therefore does not say that arbitrary perturbations preserve higher cumulants, nor that centering removes all mixed moments. It only says that scalar recentering leaves the fluctuation cumulants of orders at least two unchanged, which is the tool needed when freeness is applied to centred alternating words.
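The invariance under scalar recentering can be tested numerically at orders two and three. The sketch below computes the moments of $a-\lambda 1$ by the binomial theorem, which applies because the scalar commutes with $a$, and then compares cumulants before and after the shift. It uses the moment-only forms $r_2=m_2-m_1^2$ and $r_3=m_3-3m_1m_2+2m_1^3$, obtained by substituting $r_1=m_1$ and $r_2=m_2-m_1^2$ into the order-three identity. The function names and the sample moment sequence are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def shifted_moments(moments, lam):
    """Moments of a - lam*1, via phi((a - lam)^n) = sum_k C(n, k) (-lam)^(n-k) m_k."""
    return [sum(comb(n, k) * (-lam) ** (n - k) * moments[k] for k in range(n + 1))
            for n in range(len(moments))]

def second_and_third_cumulants(moments):
    """Return (r_2, r_3) from m_1, m_2, m_3 using the moment-only formulas."""
    m1, m2, m3 = moments[1], moments[2], moments[3]
    return m2 - m1**2, m3 - 3 * m1 * m2 + 2 * m1**3

# Sample data: a projection with alpha = 1/3, so m_1 = m_2 = m_3 = 1/3.
moments = [Fraction(1), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
centered = shifted_moments(moments, moments[1])  # subtract lam = phi(a)
print(centered[1],
      second_and_third_cumulants(moments) == second_and_third_cumulants(centered))
```

The first moment of the shifted sequence is $0$, while the second and third cumulants agree with those of the original variable.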
## Products of Free Variables and Partition Refinements
Sums are governed by multilinearity together with the vanishing of mixed cumulants. Products are subtler because a cumulant of products expands by replacing each product slot with an interval of variables and summing over partitions that connect those intervals in controlled ways. In this first course we only need simple cases, but they already show the role of refinement.
[definition: Interval Partition Associated to Products]
Let $m_1,\dots,m_r \in \mathbb N$ and set $N=m_1+\cdots+m_r$. The interval partition associated to product slots of lengths $m_1,\dots,m_r$ is
\begin{align*}
\rho = \{\{1,\dots,m_1\},\{m_1+1,\dots,m_1+m_2\},\dots,\{N-m_r+1,\dots,N\}\}.
\end{align*}
[/definition]
The partition $\rho$ records which consecutive variables were multiplied before taking the cumulant. Once those product slots have been expanded, we need to separate partitions that merely describe moments inside the slots from partitions that genuinely connect the slots. The next theorem gives the two-product case, which is the basic refinement calculation used later.
[quotetheorem:7136]
[citeproof:7136]
The condition $\pi \vee \rho=1_4$ is what prevents purely internal partitions from being counted in a cumulant between product slots. For instance, the partition $\{\{1,2\},\{3,4\}\}$ contributes to $\varphi(x_1x_2)\varphi(y_1y_2)$ but has join $\rho$, not $1_4$, so it cancels in $\kappa_2(x_1x_2,y_1y_2)$. A partition that connects the two intervals survives exactly when its join with $\rho$ is the one-block partition.
This theorem is only the two-product case. Longer product cumulants require the same idea with more intervals and a corresponding join condition, so the formula becomes combinatorially heavier even before any freeness assumptions are imposed. When the variables come from free subalgebras, the same mixed-block elimination applies after expanding products. This turns product computations into a constrained noncrossing-partition count.
[example: Deriving the Moment of an Alternating Pair Word]
Let $a$ and $b$ be free, centered variables, so $\kappa_1(a)=\varphi(a)=0$ and $\kappa_1(b)=\varphi(b)=0$. By *[Moment Formula for Free Families](/theorems/7133)*, the moment $\varphi(abab)$ is the sum of $\kappa_\pi(a,b,a,b)$ over noncrossing partitions $\pi \in NC(4)$ whose blocks are monochromatic for the colour pattern $(a,b,a,b)$.
Any contributing partition with a singleton block has value $0$, because the singleton block contributes either $\kappa_1(a)=0$ or $\kappa_1(b)=0$. Thus every surviving block must have size at least $2$. Since the $a$-positions are exactly $\{1,3\}$ and the $b$-positions are exactly $\{2,4\}$, the only monochromatic partition of $\{1,2,3,4\}$ with no singleton blocks is
\begin{align*}
\{\{1,3\},\{2,4\}\}.
\end{align*}
This partition is crossing, because $1<2<3<4$ with $1,3$ in one block and $2,4$ in another block. Therefore it is not in $NC(4)$, so the colour-compatible noncrossing sum has no nonzero terms:
\begin{align*}
\varphi(abab)=0.
\end{align*}
For comparison, the word $abba$ has colour pattern $(a,b,b,a)$. The partition
\begin{align*}
\{\{1,4\},\{2,3\}\}
\end{align*}
is noncrossing and monochromatic, so its contribution is
\begin{align*}
\kappa_2(a,a)\kappa_2(b,b).
\end{align*}
Thus the zero in $\varphi(abab)$ comes from the noncrossing condition, not merely from centering or from matching equal colours.
[/example]
This example is the first place where freeness visibly differs from a naive rule that pairs equal colours regardless of order. Noncrossing geometry matters: the placement of the letters in the word determines whether marginal cumulants can combine.
## Centering Tricks and Reduction of Mixed Moments
The definition of freeness is strongest when variables are centered, but most variables are not. The practical problem is to reduce an arbitrary mixed moment to centered alternating pieces and marginal moments. This section records the basic centering moves used throughout computations.
[definition: Centered Part]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space and let $a \in \mathcal A$. The centered part of $a$ is
\begin{align*}
a^\circ := a-\varphi(a)1.
\end{align*}
[/definition]
The centered part satisfies $\varphi(a^\circ)=0$, so it is the component of $a$ seen by the alternating-word definition of freeness. Expanding $a=a^\circ+\varphi(a)1$ reduces mixed moments to combinations with fewer non-scalar letters, but the final answer should be expressible without repeatedly choosing a centering expansion. The following theorem packages this reduction as a statement about marginal cumulants.
[quotetheorem:7137]
[citeproof:7137]
This theorem is computational rather than structural: it tells us that free joint laws are determined by marginal laws. It is the last purely combinatorial step before transform methods enter the course. Once every mixed moment has been reduced to marginal cumulants, transforms can package those marginal cumulant sequences into analytic objects, so later computations no longer require listing colour-compatible partitions word by word.
The freeness assumption cannot be dropped. If centred variables $a$ and $b$ lie in a non-free pair of subalgebras with $\kappa_2(a,b)\ne0$, then $\varphi(ab)=\kappa_2(a,b)$ contains information that is not a marginal cumulant of either algebra.
The constant-colour restriction is also essential. Allowing mixed blocks would insert coefficients such as $\kappa_2(a,b)$ into the formula, which are precisely joint cumulants rather than marginal data. The noncommutative order of a word still matters, because the set of admissible noncrossing partitions depends on the colour pattern.
[example: Projection Free from a Centered Variable]
Let $p=p^2$ be a projection and let $x$ be centered, with the unital algebras generated by $p$ and $x$ free. Write $\varphi(p)=t$ and $\kappa_2(x,x)=\sigma^2$. We compute $\varphi(p x p x)$ from *Moment Formula for Free Families*: only noncrossing partitions of $\{1,2,3,4\}$ whose blocks are monochromatic for the colour pattern $(p,x,p,x)$ can contribute.
Since $x$ is centered,
\begin{align*}
\kappa_1(x)=\varphi(x)=0.
\end{align*}
Therefore any contributing partition cannot have $\{2\}$ or $\{4\}$ as a singleton block, because such a block would contribute the factor $\kappa_1(x)=0$. The only way to partition the two $x$-positions without singleton $x$-blocks is to put them in one block:
\begin{align*}
\{2,4\}.
\end{align*}
For the two $p$-positions, the monochromatic choices are either two singleton blocks $\{1\}$ and $\{3\}$, or the single block $\{1,3\}$. The partition
\begin{align*}
\{\{1,3\},\{2,4\}\}
\end{align*}
is crossing, because $1<2<3<4$ with $1,3$ in one block and $2,4$ in the other. Hence it is not allowed in $NC(4)$. The only nonzero admissible noncrossing contribution is
\begin{align*}
\{\{1\},\{2,4\},\{3\}\}.
\end{align*}
Thus
\begin{align*}
\varphi(p x p x)=\kappa_1(p)\kappa_2(x,x)\kappa_1(p).
\end{align*}
Using $\kappa_1(p)=\varphi(p)=t$ and $\kappa_2(x,x)=\sigma^2$, this becomes
\begin{align*}
\varphi(p x p x)=t\sigma^2 t=t^2\sigma^2.
\end{align*}
The computation shows that the projection contributes only through its first cumulant here, while centering forces the two $x$-positions to be joined by the second cumulant.
[/example]
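The bookkeeping in the last two examples can be automated for short words. The following Python sketch sums products of marginal cumulants over noncrossing partitions whose blocks are monochromatic, which is the recipe used above for $\varphi(abab)$ and $\varphi(pxpx)$. It is a brute-force illustration rather than an efficient method: the function names, the dictionary layout `cumulants[letter][k]`, and the sample values of $t$ and $\sigma^2$ are all assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations

def set_partitions(points):
    """Yield all set partitions of a sorted list; each block is a sorted tuple."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for partition in set_partitions(rest):
        yield [(first,)] + partition
        for i, block in enumerate(partition):
            yield partition[:i] + [(first,) + block] + partition[i + 1:]

def is_noncrossing(partition):
    """False if a < b < c < d exist with a, c in one block and b, d in another."""
    for block1 in partition:
        for block2 in partition:
            if block1 == block2:
                continue
            for a, c in combinations(block1, 2):
                for b, d in combinations(block2, 2):
                    if a < b < c < d:
                        return False
    return True

def free_word_moment(word, cumulants):
    """Sum over noncrossing partitions of the word whose blocks use a single letter.

    cumulants[letter][k] is the k-th marginal free cumulant of that letter.
    """
    total = 0
    for partition in set_partitions(list(range(len(word)))):
        if not is_noncrossing(partition):
            continue
        if any(len({word[i] for i in block}) > 1 for block in partition):
            continue  # a mixed block contributes zero under freeness
        term = 1
        for block in partition:
            term *= cumulants[word[block[0]]][len(block)]
        total += term
    return total

# Sample values (assumptions): phi(p) = t = 1/3 and kappa_2(x, x) = sigma^2 = 2.
t, sigma2 = Fraction(1, 3), Fraction(2)
cumulants = {"p": {1: t, 2: t - t**2}, "x": {1: 0, 2: sigma2}}
print(free_word_moment("pxpx", cumulants) == t**2 * sigma2)
```

Only cumulants up to the largest possible block size for each letter need to be supplied, which is why orders one and two suffice for a word in which each letter appears twice.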
The same method handles longer alternating words, but the bookkeeping grows quickly. In later chapters, transforms and special distributions package these recursive cumulant computations into more efficient analytic tools.
The cumulant calculus from Chapters 5 and 6 applies to all free systems; Chapter 7 specializes to semicircular variables, the free analogue of Gaussian distributions, where all cumulants beyond the second vanish.
# 7. Semicircular Variables and Free Wick Formula
Semicircular variables are the free-probabilistic analogue of centered Gaussian random variables. Chapters 5 and 6 introduced free cumulants as the coordinates in which free independence becomes a vanishing condition and free sums become additive; this chapter uses that language to isolate the centered law whose only nonzero cumulant is the variance. The same law also arises from the Catalan numbers, because noncrossing pairings replace the full set of pairings in the classical Wick formula.
We work in a noncommutative probability space $(\mathcal A, \varphi)$, usually assumed tracial when joint word computations are discussed. Variables called semicircular are self-adjoint unless a statement says otherwise. The main goal is to pass between three descriptions: moment sequences, cumulants, and a Wick-type expansion over noncrossing pairings.
## Semicircular Law by Moments and Catalan Numbers
Which probability law should play the role of the standard normal distribution when crossings are forbidden? In free probability, the answer is encoded by the Catalan numbers: even moments count noncrossing pairings, and odd moments vanish by symmetry.
[definition: Catalan Numbers]
The Catalan numbers are the integers $(C_n)_{n \ge 0}$ defined by
\begin{align*}
C_n = \frac{1}{n+1}\binom{2n}{n}.
\end{align*}
[/definition]
The same numbers count noncrossing pairings of $\{1,\dots,2n\}$. To turn that counting sequence into a probability law, we need a concrete measure whose moments are these numbers; that measure is the semicircle law.
[definition: Standard Semicircle Law]
The standard semicircle law is the probability measure $\mu_{sc}$ on $\mathbb R$ with density
\begin{align*}
\frac{1}{2\pi}\sqrt{4-x^2}\,\mathbb{1}_{[-2,2]}(x)
\end{align*}
with respect to Lebesgue measure.
[/definition]
The support $[-2,2]$ is the normalization for variance $1$. A measure by itself is not an element of a noncommutative probability space, so it cannot be inserted into products or cumulants. To use the semicircle law inside the algebra, we specify what it means for a self-adjoint element to have exactly those scalar moments.
[definition: Standard Semicircular Variable]
A self-adjoint element $s \in \mathcal A$ is a standard semicircular variable if
\begin{align*}
\varphi(s^n)=\int_{-2}^{2} x^n\frac{1}{2\pi}\sqrt{4-x^2}\,d\mathcal L^1(x)
\end{align*}
for every $n \ge 0$, where $\mathcal L^1$ denotes one-dimensional Lebesgue measure on $\mathbb R$.
[/definition]
The density definition is explicit but inconvenient for free-probability computations, where moments are compared with noncrossing partition counts. The useful question is whether the integral moments can be read directly from Catalan numbers, so that semicircular variables can be recognized without redoing an integral each time.
[quotetheorem:7138]
[citeproof:7138]
This theorem depends on the exact normalization of the density: if the support were scaled to $[-1,1]$ with the corresponding semicircle density, the second moment would be $1/4$ rather than $1$. The symmetry hypothesis is also doing real work, since adding a nonzero mean would make the odd moments nonzero and would no longer describe the centred free Gaussian. The theorem does not say that every sequence with Catalan even terms has been realized inside a chosen noncommutative probability space; it identifies the law once a self-adjoint variable with this distribution is present. The recurrence at the end is the bridge to the next section, where the same Catalan numbers reappear from noncrossing pairings in the Speicher Moment-Cumulant Formula.
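As a numerical sanity check of the moment formula, the integral moments of the density can be approximated directly. This sketch assumes the substitution $x=2\cos\theta$ and a midpoint rule; the helper name `semicircle_moment` and the node count are illustrative choices, not something fixed by the text.

```python
import math
from math import comb


def semicircle_moment(n: int, nodes: int = 2000) -> float:
    """Approximate the n-th moment of sqrt(4 - x^2) / (2 pi) on [-2, 2].

    With x = 2 cos(theta) the integral becomes
    (2 / pi) * integral_0^pi (2 cos(theta))^n * sin(theta)^2 d(theta),
    which is evaluated here with the midpoint rule.
    """
    h = math.pi / nodes
    total = 0.0
    for k in range(nodes):
        theta = (k + 0.5) * h
        total += (2.0 * math.cos(theta)) ** n * math.sin(theta) ** 2
    return (2.0 / math.pi) * total * h


for n in range(0, 9):
    # Catalan value for even n, zero for odd n.
    expected = comb(n, n // 2) // (n // 2 + 1) if n % 2 == 0 else 0
    print(n, round(semicircle_moment(n), 6), expected)
```

Rescaling to support $[-1,1]$ divides the $n$-th moment by $2^n$, which is the source of the value $1/4$ for the second moment mentioned above.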
[example: First Semicircular Moments]
Let $s$ be a standard semicircular variable. By the Catalan moment formula, $\varphi(s^{2n})=C_n$, where $C_n=\frac{1}{n+1}\binom{2n}{n}$. Thus
\begin{align*}
m_2=\varphi(s^2)=C_1=\frac{1}{2}\binom{2}{1}=1.
\end{align*}
\begin{align*}
m_4=\varphi(s^4)=C_2=\frac{1}{3}\binom{4}{2}=2.
\end{align*}
\begin{align*}
m_6=\varphi(s^6)=C_3=\frac{1}{4}\binom{6}{3}=5.
\end{align*}
\begin{align*}
m_8=\varphi(s^8)=C_4=\frac{1}{5}\binom{8}{4}=14.
\end{align*}
These values fix the normalization. If $X$ has the standard semicircle law on $[-2,2]$, then $X/2$ has the corresponding rescaled law on $[-1,1]$, and its second moment is
\begin{align*}
\varphi((X/2)^2)=\frac{1}{4}\varphi(X^2)=\frac{1}{4}.
\end{align*}
So the $[-1,1]$ normalization cannot be the standard variance-one normalization used here. The fourth moment also separates the semicircle law from the standard Gaussian law: for four points, the pairings are $\{\{1,2\},\{3,4\}\}$, $\{\{1,3\},\{2,4\}\}$, and $\{\{1,4\},\{2,3\}\}$, so the classical Gaussian fourth moment is $1+1+1=3$; in the free semicircular case the crossing pairing $\{\{1,3\},\{2,4\}\}$ is excluded, leaving $1+1=2$.
[/example]
The Gaussian comparison is important: both laws have mean $0$ and variance $1$, but their higher moments differ because the combinatorics of independence differs. Classical independence sums over all set partitions through classical cumulants, while free independence sums over noncrossing partitions through free cumulants.
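The pairing counts behind the two moment sequences can be enumerated by brute force for small sizes. The sketch below lists every pairing of $\{1,\dots,2n\}$ and filters out the crossing ones using the condition $a<b<c<d$; the function names are hypothetical.

```python
def all_pairings(points):
    """Yield every pairing of an even-length sorted list as a list of pairs (a, b), a < b."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail


def is_crossing(pairing):
    """True if two blocks {a, c} and {b, d} satisfy a < b < c < d."""
    return any(a < b < c < d for (a, c) in pairing for (b, d) in pairing)


for n in range(1, 5):
    pairings = list(all_pairings(list(range(1, 2 * n + 1))))
    noncrossing = [p for p in pairings if not is_crossing(p)]
    # Columns: number of points, all pairings (Gaussian moment), noncrossing pairings (semicircular moment).
    print(2 * n, len(pairings), len(noncrossing))
```

For four points the enumeration returns three pairings, of which exactly one is crossing, matching the Gaussian value $3$ and the semicircular value $2$.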
## Characterization by Free Cumulants
The moment formula raises a structural question: which free cumulants produce the Catalan moments? Since noncrossing partitions control the moment-cumulant relation, a law whose only nonzero free cumulant has order $2$ will sum exactly over noncrossing pairings.
[definition: Semicircular Variable of Variance Sigma Squared]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space and let $\sigma^2 \ge 0$. A self-adjoint element $s \in \mathcal A$ is semicircular of variance $\sigma^2$ if its free cumulants satisfy $\kappa_1(s)=0$, $\kappa_2(s,s)=\sigma^2$, and
\begin{align*}
\kappa_n(s,\dots,s)=0 \quad \text{for all } n \ne 2.
\end{align*}
[/definition]
The variance-zero case gives the constant zero variable. For $\sigma^2>0$, the variable $s/\sigma$ is standard semicircular, so the parameter changes only the scale. The next theorem confirms that this cumulant definition is not a new object, but the same semicircular law already described by density and moments. It is the main dictionary entry between the analytic and combinatorial views.
[quotetheorem:7139]
[citeproof:7139]
The condition $\kappa_1(s)=0$ is the centering condition; if $s$ were replaced by $s+\lambda 1$ with $\lambda \ne 0$, the first cumulant and all odd moments would change. The condition that all cumulants of order other than $2$ vanish is also essential: a centered variable with $\kappa_2=1$ and $\kappa_4\ne 0$ has fourth moment $2+\kappa_4$, so matching only the variance would not force the semicircle law. The theorem is a one-variable characterization and does not by itself assert freeness from any other algebra elements. Its role is to identify the free Gaussian principle: classical Gaussians have only their second classical cumulant, while semicircular variables have only their second free cumulant.
[example: Fourth Moment from Cumulants]
Suppose $s$ has free cumulants $\kappa_2(s,s)=1$ and $\kappa_n(s,\dots,s)=0$ for every $n\ne 2$. We compute the fourth moment from the *Speicher Moment-Cumulant Formula*:
\begin{align*}
\varphi(s^4)=\sum_{\pi\in NC(4)}\kappa_\pi[s,s,s,s].
\end{align*}
A partition $\pi\in NC(4)$ contributes only if every block has size $2$, because any singleton block contributes $\kappa_1(s)=0$, any block of size $3$ contributes $\kappa_3(s,s,s)=0$, and the one-block partition contributes $\kappa_4(s,s,s,s)=0$. Thus the only contributing partitions are the two noncrossing pairings
$\{\{1,2\},\{3,4\}\}$ and $\{\{1,4\},\{2,3\}\}$.
For $\pi=\{\{1,2\},\{3,4\}\}$, multiplicativity of cumulants over blocks gives
\begin{align*}
\kappa_\pi[s,s,s,s]=\kappa_2(s,s)\kappa_2(s,s)=1\cdot 1=1.
\end{align*}
For $\pi=\{\{1,4\},\{2,3\}\}$, the same block product gives
\begin{align*}
\kappa_\pi[s,s,s,s]=\kappa_2(s,s)\kappa_2(s,s)=1\cdot 1=1.
\end{align*}
Therefore
\begin{align*}
\varphi(s^4)=1+1=2.
\end{align*}
The remaining pairing $\{\{1,3\},\{2,4\}\}$ would be a classical Wick contraction, but it is crossing, so it is not an element of $NC(4)$ and contributes nothing in the free cumulant expansion.
[/example]
This computation is the smallest place where the free and classical Gaussian calculations separate. At order two there is no crossing issue; at order four, the missing crossing pairing changes the answer from $3$ to $2$.
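The same bookkeeping can be automated for one variable. The sketch below uses the standard one-variable recursion obtained by grouping noncrossing partitions according to the block containing $1$, namely $m_n=\sum_{s=1}^{n}\kappa_s\,[z^{n-s}]M(z)^s$ with $M(z)=\sum_i m_i z^i$. The function names and the dictionary encoding of cumulants are assumptions made for illustration.

```python
from fractions import Fraction


def poly_mul(p, q, degree):
    """Multiply two coefficient lists, discarding terms above `degree`."""
    out = [Fraction(0)] * (degree + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= degree:
                out[i + j] += a * b
    return out


def moments_from_free_cumulants(kappa, n_max):
    """Return [m_0, ..., m_{n_max}] from a dict {order: free cumulant}.

    Orders missing from `kappa` are treated as zero.
    """
    m = [Fraction(1)]
    for n in range(1, n_max + 1):
        total = Fraction(0)
        power = [Fraction(1)]  # coefficients of M(z)^0
        for s in range(1, n + 1):
            power = poly_mul(power, m, n - 1)  # M(z)^s up to degree n - 1
            total += kappa.get(s, 0) * power[n - s]
        m.append(total)
    return m


semicircle = moments_from_free_cumulants({2: 1}, 8)
print([int(x) for x in semicircle])

# A nonzero fourth cumulant shifts the fourth moment away from 2.
perturbed = moments_from_free_cumulants({2: 1, 4: Fraction(1, 3)}, 4)
print(perturbed[4])
```

With only $\kappa_2=1$ the recursion reproduces the Catalan moments; adding a fourth cumulant changes the fourth moment to $2+\kappa_4$.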
## Semicircular Families with Covariance Matrix
A single semicircular variable has only one second cumulant, its variance. For several variables, the second cumulants form a covariance matrix, and the question becomes which matrices can occur and how they determine joint moments.
[definition: Semicircular Family]
Let $(\mathcal A,\varphi)$ be a noncommutative probability space. A family $(s_i)_{i \in I}$ of self-adjoint elements is a semicircular family if all its joint free cumulants vanish except possibly those of order $2$:
\begin{align*}
\kappa_n(s_{i_1},\dots,s_{i_n})=0 \quad \text{for all } n \ne 2.
\end{align*}
[/definition]
This definition reduces the multivariable law to the list of second cumulants, but a finite list of variables needs a compact way to store that data. The natural bookkeeping device is a covariance matrix, because its entries will be exactly the pair contractions used in Wick computations.
[definition: Covariance Matrix of a Finite Semicircular Family]
Let $(s_1,\dots,s_d)$ be a finite semicircular family. Its covariance matrix is the matrix $C \in \mathbb R^{d \times d}$ defined by
\begin{align*}
C_{ij}=\kappa_2(s_i,s_j)=\varphi(s_i s_j).
\end{align*}
[/definition]
After the covariance matrix has been named, the next issue is whether every matrix of numbers can arise this way. Not every symmetric array is compatible with positivity. For instance, the symmetric $2 \times 2$ matrix with diagonal entries $1$ and off-diagonal entries $2$ has quadratic form value $-2$ on $(1,-1)$, so it cannot be the covariance matrix of self-adjoint variables in a positive state. Positivity of the state imposes exactly this kind of quadratic-form condition, and the following theorem identifies the necessary condition.
[quotetheorem:7140]
[citeproof:7140]
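The quadratic-form test is easy to run on small matrices. The following sketch evaluates $\sum_{i,j} x_i C_{ij} x_j$ and applies the principal-minor criterion for real symmetric $2\times 2$ matrices; both helper names are hypothetical and the criterion is only stated for the $2\times 2$ case.

```python
def quadratic_form(C, x):
    """Evaluate sum_{i,j} x_i * C[i][j] * x_j."""
    size = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(size) for j in range(size))


def is_psd_2x2(C):
    """A real symmetric 2x2 matrix is positive semidefinite exactly when
    both diagonal entries and the determinant are nonnegative."""
    (a, b), (c, d) = C
    return b == c and a >= 0 and d >= 0 and a * d - b * c >= 0


bad = [[1, 2], [2, 1]]
good = [[1, 0], [0, 4]]
print(quadratic_form(bad, (1, -1)), is_psd_2x2(bad))
print(quadratic_form(good, (1, -1)), is_psd_2x2(good))
```

For a covariance matrix of self-adjoint variables in a positive state, the quadratic form at real $x$ equals $\varphi\bigl((x_1s_1+x_2s_2)^2\bigr)$, which is why a negative value is impossible.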
The positivity of the state is indispensable here: without $\varphi(x^*x)\ge 0$, the quadratic form argument has no force. The symmetric $2 \times 2$ matrix with diagonal entries $1$ and off-diagonal entries $2$ would not be ruled out despite having a negative value on $(1,-1)$. Self-adjointness and real coefficients keep $x^*x=x^2$ in the displayed computation; with non-self-adjoint variables, covariance has to be organized using mixed expressions such as $\varphi(x^*x)$ instead. The theorem is only a necessary condition, so it does not yet construct variables with a prescribed positive semidefinite matrix. The next theorem supplies the converse construction and shows that no additional covariance obstruction remains for finite semicircular families.
[quotetheorem:7141]
[citeproof:7141]
Positive semidefiniteness is necessary because of the previous theorem and sufficient because the Gram factorization supplies the Hilbert-space vectors used in this construction. Symmetry is also necessary in the real self-adjoint setting: if $C_{ij}\ne C_{ji}$, it cannot equal $\varphi(s_i s_j)$ in a tracial state with self-adjoint variables. The theorem does not assert that the constructed algebra is unique; different Hilbert-space realizations can give isomorphic joint distributions while living on different operator algebras. This construction is the Fock-space model behind the free central limit theorem and is the multivariable source of the Wick rule proved next.
[example: Two Variable Semicircular Families]
For diagonal covariance entries $C_{11}=1$, $C_{22}=4$, and $C_{12}=C_{21}=0$, take $s_1$ standard semicircular and let $t$ be standard semicircular and free from $s_1$. Set $s_2=2t$. Since $s_1$ is standard semicircular, $\varphi(s_1)=0$ and $\varphi(s_1^2)=1$; since $t$ is standard semicircular, $\varphi(t)=0$ and $\varphi(t^2)=1$. Freeness gives $\varphi(s_1t)=\varphi(s_1)\varphi(t)=0\cdot 0=0$, so
\begin{align*}
\varphi(s_1s_2)=\varphi(s_1\cdot 2t)=2\varphi(s_1t)=2\cdot 0=0.
\end{align*}
Also,
\begin{align*}
\varphi(s_2^2)=\varphi((2t)^2)=\varphi(4t^2)=4\varphi(t^2)=4\cdot 1=4.
\end{align*}
Thus the covariance entries are exactly $C_{11}=1$, $C_{12}=C_{21}=0$, and $C_{22}=4$.
For non-diagonal covariance entries $C_{11}=C_{22}=1$ and $C_{12}=C_{21}=\rho$ with $-1\le \rho\le 1$, choose unit vectors $v_1,v_2$ in a real Hilbert space such that $(v_1,v_2)_H=\rho$. This is possible precisely because the two-by-two covariance data are positive semidefinite in this range. In the Fock-space construction from *[Existence of Semicircular Families with Prescribed Covariance](/theorems/7141)*, set $s_i=s(v_i)$. The construction gives
\begin{align*}
\varphi(s_i s_j)=\kappa_2(s_i,s_j)=(v_i,v_j)_H.
\end{align*}
Therefore
\begin{align*}
\varphi(s_1^2)=(v_1,v_1)_H=1.
\end{align*}
\begin{align*}
\varphi(s_2^2)=(v_2,v_2)_H=1.
\end{align*}
\begin{align*}
\varphi(s_1s_2)=(v_1,v_2)_H=\rho.
\end{align*}
So $(s_1,s_2)$ is a semicircular family with the prescribed covariance entries. When $\rho\ne 0$, the mixed second cumulant $\kappa_2(s_1,s_2)=\rho$ is nonzero, so the variables are correlated and therefore not free, since mixed free cumulants of free variables vanish.
[/example]
The condition $\rho \in [-1,1]$ is exactly the positive semidefinite condition for the displayed $2 \times 2$ covariance data. Thus covariance positivity is not a technical side condition; it is the complete compatibility condition for second moments.
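One explicit choice of the unit vectors is $v_1=(1,0)$ and $v_2=(\rho,\sqrt{1-\rho^2})$ in $\mathbb R^2$. The sketch below checks the Gram matrix numerically; this particular factorization and the helper names are assumptions for illustration, since any pair of unit vectors with inner product $\rho$ would do.

```python
import math


def gram_vectors(rho: float):
    """Return unit vectors v1, v2 in R^2 with inner product rho (requires |rho| <= 1)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("no unit vectors have inner product outside [-1, 1]")
    v1 = (1.0, 0.0)
    v2 = (rho, math.sqrt(1.0 - rho * rho))
    return v1, v2


def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


for rho in (-1.0, -0.3, 0.0, 0.5, 1.0):
    v1, v2 = gram_vectors(rho)
    print(rho, inner(v1, v1), inner(v2, v2), inner(v1, v2))
```

Outside the range $[-1,1]$ the square root is undefined, which is the computational shadow of the failure of positive semidefiniteness.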
## Free Wick Formula Using Noncrossing Pairings
How do we compute a mixed moment such as $\varphi(s_1s_2s_1s_2)$ without expanding cumulants from scratch every time? The [free Wick formula](/theorems/7142) gives the answer: sum over noncrossing pairings, and multiply the covariance entries dictated by the pairs.
[definition: Noncrossing Pairing]
A noncrossing pairing of $\{1,\dots,2n\}$ is a partition $\pi$ of $\{1,\dots,2n\}$ such that every block has size $2$ and no two blocks $\{a,c\}$ and $\{b,d\}$ satisfy $a<b<c<d$. The set of noncrossing pairings of $\{1,\dots,2n\}$ is denoted $NC_2(2n)$.
[/definition]
Pairings encode the possible second-cumulant contractions. The noncrossing condition is the free-probability replacement for the unrestricted pairings in the classical Wick formula. The next result packages the entire moment-cumulant expansion for semicircular families into a single rule, so later computations can be done by drawing planar pairings rather than summing over all noncrossing partitions.
[quotetheorem:7142]
[citeproof:7142]
The semicircular-family hypothesis is essential: if a fourth free cumulant is present, then fourth moments acquire an additional term not represented by pairings. The use of noncrossing pairings is also essential; the crossing pairing of four points is exactly the term that appears in the classical Gaussian Wick formula but is absent in the free formula. The theorem does not say that the variables are free from each other; freeness of distinct variables corresponds to the special case where mixed second cumulants vanish in the appropriate blocks. The practical payoff is that later mixed moment calculations reduce to planar matching with covariance labels rather than repeated moment-cumulant inversion.
[example: Computing a Mixed Fourth Moment]
Let $(s_1,s_2)$ be a semicircular family with covariance matrix $C=(C_{ij})$, and set $i_1=1$, $i_2=2$, $i_3=1$, and $i_4=2$. By the *Free Wick Formula*, the fourth moment is the sum over noncrossing pairings of $\{1,2,3,4\}$:
\begin{align*}
\varphi(s_1s_2s_1s_2)=\sum_{\pi\in NC_2(4)}\prod_{\{p,q\}\in\pi}C_{i_p i_q}.
\end{align*}
The noncrossing pairings are $\{\{1,2\},\{3,4\}\}$ and $\{\{1,4\},\{2,3\}\}$; the pairing $\{\{1,3\},\{2,4\}\}$ is crossing because $1<2<3<4$.
For $\pi=\{\{1,2\},\{3,4\}\}$, the block product is
\begin{align*}
C_{i_1 i_2}C_{i_3 i_4}=C_{12}C_{12}=C_{12}^2.
\end{align*}
For $\pi=\{\{1,4\},\{2,3\}\}$, the block product is
\begin{align*}
C_{i_1 i_4}C_{i_2 i_3}=C_{12}C_{21}.
\end{align*}
Therefore
\begin{align*}
\varphi(s_1s_2s_1s_2)=C_{12}^2+C_{12}C_{21}.
\end{align*}
If the state is tracial and $s_1,s_2$ are self-adjoint, then
\begin{align*}
C_{21}=\varphi(s_2s_1)=\varphi(s_1s_2)=C_{12}.
\end{align*}
Substituting this into the fourth-moment formula gives
\begin{align*}
\varphi(s_1s_2s_1s_2)=C_{12}^2+C_{12}^2=2C_{12}^2.
\end{align*}
Thus the mixed fourth moment is controlled entirely by the off-diagonal covariance entry, with two planar contractions contributing.
[/example]
This example also shows why freeness is stronger than being jointly semicircular. If $C_{12}=0$, the mixed fourth moment above vanishes; if $C_{12}\ne 0$, the variables are correlated through their second cumulant.
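The free Wick formula translates directly into a short program. The sketch below generates noncrossing pairings recursively and multiplies the covariance entries along each pair; indices are 0-based, and the names `noncrossing_pairings` and `wick_moment` are illustrative rather than notation from the course.

```python
from fractions import Fraction


def noncrossing_pairings(points):
    """Yield the noncrossing pairings of a sorted tuple of points.

    The first point is matched with a partner that leaves an even number
    of points between them; inside and outside are paired independently.
    """
    if not points:
        yield []
        return
    first = points[0]
    for k in range(1, len(points), 2):
        inside, outside = points[1:k], points[k + 1:]
        for left in noncrossing_pairings(inside):
            for right in noncrossing_pairings(outside):
                yield [(first, points[k])] + left + right


def wick_moment(word, C):
    """phi(s_{i_1} ... s_{i_m}) for a semicircular family with covariance C.

    `word` lists the indices i_1, ..., i_m; words of odd length give 0.
    """
    if len(word) % 2:
        return 0
    total = 0
    for pairing in noncrossing_pairings(tuple(range(len(word)))):
        term = 1
        for p, q in pairing:
            term *= C[word[p]][word[q]]
        total += term
    return total


rho = Fraction(1, 2)
C = [[1, rho], [rho, 1]]
print(wick_moment([0, 1, 0, 1], C))  # compare with 2 * rho**2
print(wick_moment([0, 0, 1, 1], C))  # compare with 1 + rho**2
```

Using exact fractions keeps the comparison with the hand computation free of rounding issues.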
[example: Standard Fourth Moment Revisited]
Take $i_1=i_2=i_3=i_4=1$ and $C_{11}=1$. By the *Free Wick Formula*,
\begin{align*}
\varphi(s_1^4)=\sum_{\pi\in NC_2(4)}\prod_{\{p,q\}\in\pi} C_{i_p i_q}.
\end{align*}
The noncrossing pairings of $\{1,2,3,4\}$ are $\{\{1,2\},\{3,4\}\}$ and $\{\{1,4\},\{2,3\}\}$; the remaining pairing $\{\{1,3\},\{2,4\}\}$ is crossing because $1<2<3<4$. For $\pi=\{\{1,2\},\{3,4\}\}$, the product is
\begin{align*}
C_{i_1 i_2}C_{i_3 i_4}=C_{11}C_{11}=1\cdot 1=1.
\end{align*}
For $\pi=\{\{1,4\},\{2,3\}\}$, the product is
\begin{align*}
C_{i_1 i_4}C_{i_2 i_3}=C_{11}C_{11}=1\cdot 1=1.
\end{align*}
Therefore
\begin{align*}
\varphi(s_1^4)=1+1=2.
\end{align*}
The same argument applies in every even degree for one standard semicircular variable: every covariance factor is $C_{11}=1$, so the moment is just the number of noncrossing pairings. Since the Catalan numbers count noncrossing pairings of $\{1,\dots,2n\}$,
\begin{align*}
|NC_2(6)|=C_3=\frac{1}{4}\binom{6}{3}=\frac{20}{4}=5.
\end{align*}
Also,
\begin{align*}
|NC_2(8)|=C_4=\frac{1}{5}\binom{8}{4}=\frac{70}{5}=14.
\end{align*}
Thus the fourth, sixth, and eighth moments are $2$, $5$, and $14$, recovering the Catalan values from the beginning of the chapter.
[/example]
The chapter has now closed the circle between analysis and combinatorics. The density of the semicircle law gives Catalan moments, the free cumulant characterization explains why those moments arise, covariance matrices organize the multivariable case, and the free Wick formula supplies the practical rule for mixed moments.
Semicircular variables show how free cumulants organize moments into explicit, computable form; Chapter 8 extends this to prove a free central limit theorem, where sums of freely independent identically distributed variables converge to a semicircular law.
# 8. The Free Central Limit Theorem
Chapters 4 through 7 built the combinatorial machinery of noncrossing partitions, free cumulants, and semicircular variables. This chapter uses that machinery to prove the free analogue of the central limit theorem: normalized sums of freely independent, identically distributed variables converge in law to a semicircular variable. The proof is shorter than the classical Fourier-transform proof because free cumulants linearize free convolution and their scaling under normalization singles out the second cumulant.
The chapter also treats the multivariate version. There the limit is not a list of independent Gaussian variables, but a semicircular system whose covariance matrix records the limiting second-order mixed moments.
## Identically Distributed Free Variables and Normalization
The basic question is what hypotheses should replace independence and finite variance in the noncommutative setting. We work in a tracial noncommutative probability space $(\mathcal A, \varphi)$, and convergence in distribution means convergence of all scalar moments of noncommutative polynomials.
[definition: Identically Distributed Noncommutative Random Variables]
Let $(\mathcal A, \varphi)$ be a noncommutative probability space. A sequence $(a_k)_{k \ge 1}$ of noncommutative random variables in $\mathcal A$ is identically distributed if
\begin{align*}
\varphi(a_k^m) = \varphi(a_1^m)
\end{align*}
for all $k \ge 1$ and all $m \ge 1$.
[/definition]
Identical distribution ensures that every summand has the same moment sequence and therefore the same free cumulants. The next obstruction is drift: without centering, the first cumulant of a normalized sum grows like $\sqrt n$, so the central limit normalization cannot produce a finite centered limit.
[definition: Centered Variance-One Family]
Let $(\mathcal A, \varphi)$ be a tracial noncommutative probability space. A family $(a_k)_{k \ge 1}$ is centered and variance-one if $\varphi(a_k)=0$ and $\varphi(a_k^2)=1$ for every $k \ge 1$.
[/definition]
Centering removes the first cumulant, and the variance-one assumption fixes the scale of the limiting law. We therefore need notation for the particular expression whose cumulants will be studied throughout the chapter: the sum divided by the square root of the number of summands.
[definition: Normalized Free Sum]
Let $(a_k)_{k \ge 1}$ be a sequence in a noncommutative probability space. For $n \ge 1$, define
\begin{align*}
S_n := \frac{a_1 + \cdots + a_n}{\sqrt n}.
\end{align*}
[/definition]
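If the $a_k$ are freely independent, centered, and variance-one, then $S_n$ is again centered and variance-one, because freeness gives $\varphi(a_ja_k)=\varphi(a_j)\varphi(a_k)=0$ for $j\ne k$ and only the $n$ diagonal terms survive in $\varphi(S_n^2)$. The point of the central limit theorem is that all higher moments of $S_n$ lose dependence on the original law and converge to those of the semicircular law.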
[example: Normalized Sums of Centered Projections]
Let $p_1,p_2,\dots$ be freely independent projections with $\varphi(p_k)=\lambda$, where $0<\lambda<1$, and define
\begin{align*}
a_k := \frac{p_k-\lambda 1}{\sqrt{\lambda(1-\lambda)}}.
\end{align*}
Since $p_k$ is a projection, $p_k^2=p_k$, and since $\varphi(1)=1$, its centered normalization has mean
\begin{align*}
\varphi(a_k)=\frac{\varphi(p_k)-\lambda\varphi(1)}{\sqrt{\lambda(1-\lambda)}}=\frac{\lambda-\lambda}{\sqrt{\lambda(1-\lambda)}}=0.
\end{align*}
For the second moment, first expand the square:
\begin{align*}
(p_k-\lambda 1)^2=p_k^2-2\lambda p_k+\lambda^2 1.
\end{align*}
Using $p_k^2=p_k$, this becomes
\begin{align*}
(p_k-\lambda 1)^2=(1-2\lambda)p_k+\lambda^2 1.
\end{align*}
Therefore
\begin{align*}
\varphi\bigl((p_k-\lambda 1)^2\bigr)=(1-2\lambda)\varphi(p_k)+\lambda^2\varphi(1).
\end{align*}
Substituting $\varphi(p_k)=\lambda$ and $\varphi(1)=1$ gives
\begin{align*}
\varphi\bigl((p_k-\lambda 1)^2\bigr)=(1-2\lambda)\lambda+\lambda^2=\lambda-\lambda^2=\lambda(1-\lambda).
\end{align*}
Hence
\begin{align*}
\varphi(a_k^2)=\frac{\varphi\bigl((p_k-\lambda 1)^2\bigr)}{\lambda(1-\lambda)}=\frac{\lambda(1-\lambda)}{\lambda(1-\lambda)}=1.
\end{align*}
The variables $a_k$ are freely independent because each $a_k$ lies in the unital algebra generated by $p_k$, and they are identically distributed because the projections have the same distribution. Thus the normalized sums
\begin{align*}
S_n=\frac{1}{\sqrt n}\sum_{k=1}^n a_k
\end{align*}
satisfy the centered variance-one hypotheses of the *Free Central Limit Theorem*. Their limiting law is therefore the standard semicircular law; the parameter $\lambda$ affects the original two-point variables, but it disappears after centering, variance normalization, and free central limit scaling.
[/example]
This example already shows the universality phenomenon: the original variables may have a two-point distribution, but free summation smooths the limiting moment sequence into the Catalan pattern of the semicircle.
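The centering and variance computation can be checked exactly for several values of $\lambda$. Because $p^2=p$, every polynomial $f$ satisfies $f(p)=f(0)1+(f(1)-f(0))p$, so $\varphi(f(p))=\lambda f(1)+(1-\lambda)f(0)$. The sketch below uses that identity; the function name is hypothetical.

```python
from fractions import Fraction


def centered_projection_moment(lam: Fraction, k: int) -> Fraction:
    """phi((p - lam 1)^k) for a projection p with phi(p) = lam."""
    return lam * (1 - lam) ** k + (1 - lam) * (-lam) ** k


for lam in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 10)):
    mean = centered_projection_moment(lam, 1)
    variance = centered_projection_moment(lam, 2)
    # Fourth moment of a_k = (p - lam 1) / sqrt(lam (1 - lam)).
    fourth = centered_projection_moment(lam, 4) / variance ** 2
    print(lam, mean, variance, fourth)
```

The normalized fourth moment of a single $a_k$ still depends on $\lambda$; that dependence is what the central limit scaling removes.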
## Cumulant Scaling Under Sums
The central calculation is to understand what happens to free cumulants under the operation $a_1+\cdots+a_n$ followed by scaling by $n^{-1/2}$. The additivity of free cumulants for freely independent variables is precisely the reason the proof is linear.
[quotetheorem:7143]
[citeproof:7143]
The formula identifies the entire mechanism of the theorem. Freeness is the hypothesis that removes mixed-index cumulants; without it, terms such as $\kappa_m(a_1,a_2,a_1,\dots)$ could survive the summation and the diagonal count would no longer describe the limit. A concrete failure occurs if all summands are the same centered variance-one variable, $a_k=a$; then $S_n=\sqrt n\,a$, so the second moment is $n\varphi(a^2)$ rather than a bounded quantity. Identical distribution is also doing real work: it lets the $n$ diagonal terms combine into one common cumulant rather than an average whose behaviour would require a separate hypothesis. For instance, if the $a_k$ are free and centered but $\varphi(a_k^2)=k$, then
\begin{align*}
\varphi(S_n^2)=n^{-1}\sum_{k=1}^n k=\frac{n+1}{2},
\end{align*}
so the variance does not approach a finite prescribed constant. The result does not yet prove convergence of moments, because it only tracks cumulants at a fixed order; it prepares the next step by showing that centering kills the growing first cumulant, variance normalization fixes the second cumulant, and every cumulant of order $m\ge 3$ is multiplied by $n^{1-m/2}$, which tends to $0$.
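The two quantitative statements in this discussion can be tabulated in a few lines. The sketch assumes free centered summands, so that only diagonal second cumulants contribute to $\varphi(S_n^2)$; the helper names are illustrative.

```python
from fractions import Fraction


def variance_of_normalized_sum(variances):
    """phi(S_n^2) = (1/n) * sum_k phi(a_k^2) for free centered summands."""
    n = len(variances)
    return Fraction(sum(variances), n)


def scaling_exponent(m: int) -> Fraction:
    """Exponent of n in kappa_m(S_n, ..., S_n) = n^(1 - m/2) * kappa_m(a_1, ..., a_1)."""
    return 1 - Fraction(m, 2)


for n in (1, 2, 5, 10):
    growing = variance_of_normalized_sum(range(1, n + 1))  # phi(a_k^2) = k
    constant = variance_of_normalized_sum([1] * n)         # phi(a_k^2) = 1
    print(n, growing, constant)

print([scaling_exponent(m) for m in range(1, 7)])
```

The exponent is positive for $m=1$, zero for $m=2$, and negative for every $m\ge 3$, which is the entire asymptotic mechanism.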
[example: Classical and Free Cumulant Comparison]
Let $X_1,\dots,X_n$ be independent, identically distributed classical random variables, and write $T_n=(X_1+\cdots+X_n)/\sqrt n$. Ordinary cumulants are homogeneous of degree $m$, so
\begin{align*}
K_m(T_n)=K_m\left(n^{-1/2}(X_1+\cdots+X_n)\right)=n^{-m/2}K_m(X_1+\cdots+X_n).
\end{align*}
Since ordinary cumulants add over independent sums,
\begin{align*}
K_m(X_1+\cdots+X_n)=K_m(X_1)+\cdots+K_m(X_n).
\end{align*}
Identical distribution gives $K_m(X_i)=K_m(X_1)$ for every $i$, hence
\begin{align*}
K_m(X_1)+\cdots+K_m(X_n)=nK_m(X_1).
\end{align*}
Combining the three displayed equalities gives
\begin{align*}
K_m(T_n)=n^{-m/2}\,nK_m(X_1)=n^{1-m/2}K_m(X_1).
\end{align*}
For a centered variance-one common law, this means $K_1(T_n)=0$, $K_2(T_n)=1$, and for $m\ge 3$ the factor $n^{1-m/2}$ tends to $0$. The free calculation is formally the same: for freely independent identically distributed variables $a_1,\dots,a_n$ and $S_n=n^{-1/2}(a_1+\cdots+a_n)$, multilinearity expands $\kappa_m(S_n,\dots,S_n)$ into $n^{-m/2}$ times the sum of the cumulants $\kappa_m(a_{i_1},\dots,a_{i_m})$ over all index tuples $(i_1,\dots,i_m)$; freeness kills every term in which at least two distinct indices appear, leaving exactly the $n$ diagonal terms $i_1=\cdots=i_m$. Therefore
\begin{align*}
\kappa_m(S_n,\dots,S_n)=n^{-m/2}\,n\,\kappa_m(a_1,\dots,a_1)=n^{1-m/2}\kappa_m(a_1,\dots,a_1).
\end{align*}
Thus both theories keep only the second cumulant under central-limit scaling. The difference is combinatorial: classical moments are reconstructed from all set partitions and give the Gaussian law, while free moments are reconstructed from noncrossing partitions and give the semicircular law.
[/example]
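A concrete instance makes the parallel visible. Take symmetric $\pm 1$ variables, for which $m_2=m_4=1$. The classical fourth cumulant is $K_4=m_4-3m_2^2=-2$ and the free fourth cumulant is $\kappa_4=m_4-2m_2^2=-1$. The sketch below brute-forces the classical fourth moment over all sign patterns and evaluates the free one from the scaling formula; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def classical_fourth_moment(n: int) -> Fraction:
    """E[T_n^4] for T_n = (X_1 + ... + X_n) / sqrt(n) with independent fair signs,
    computed by averaging over all 2^n sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n * n ** 2)


def free_fourth_moment(n: int) -> Fraction:
    """phi(S_n^4) = 2 + kappa_4(a_1) / n for free symmetric signs,
    where kappa_4(a_1) = m_4 - 2 * m_2^2 = -1."""
    kappa4 = Fraction(1) - 2
    return 2 + kappa4 / n


for n in range(1, 9):
    print(n, classical_fourth_moment(n), free_fourth_moment(n))
```

Both columns start at $1$ for $n=1$ and then separate, the classical one heading to $3$ and the free one to $2$.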
This comparison isolates the difference between the two theories. The normalization and cumulant scaling are formally parallel, while the moment-cumulant formula uses all set partitions classically and noncrossing partitions freely. We next record the asymptotic consequence in the form needed for moment convergence.
[quotetheorem:7144]
[citeproof:7144]
The preceding result is the whole asymptotic input, but its hypotheses mark the possible failure modes. If the variables are not centered, then $\kappa_1(S_n)=n^{1/2}\kappa_1(a_1)$ diverges unless $\kappa_1(a_1)=0$, so no centered finite limiting law can arise under this normalization. If the second cumulant is not normalized to $1$, the same argument gives a semicircular limit with that variance rather than the standard one. The theorem also does not say that the original higher cumulants are small; it says their contribution is suppressed by the normalization. To convert this cumulant information into a statement about moments, we return to the Speicher Moment-Cumulant Formula.
## Statement and Proof of the Free Central Limit Theorem
The limiting law should be the unique law whose only nonzero free cumulant is the second one. Earlier chapters identified this law as the semicircular law. We now prove that every centered variance-one freely independent identically distributed sequence has this limit.
[quotetheorem:7145]
[citeproof:7145]
This result reveals why noncrossing pairings are the free Gaussian Wick contractions. It is a moment-convergence theorem: it states convergence of $\varphi(p(S_n))$ for each noncommutative polynomial $p$ in one variable, and it does not by itself assert norm convergence, almost sure convergence, or convergence of spectra. Centering is needed because if each $a_k$ has mean $\mu\neq 0$, then $\varphi(S_n)=\sqrt n\,\mu$, so the normalized sums cannot converge to a centered semicircular law. Freeness is equally structural: if $a_k=a$ for all $k$ with $\varphi(a)=0$ and $\varphi(a^2)=1$, then $S_n=\sqrt n\,a$ and $\varphi(S_n^2)=n$, so even the second moment escapes the semicircular scale. Variance-one fixes the scale of the limiting semicircle. In the classical theorem, all pair partitions contribute to Gaussian moments; in the free theorem, only noncrossing pair partitions contribute to semicircular moments, and that is the forward link to the concrete moment computations below.
[example: Limiting Fourth Moment]
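Let $a_1,a_2,\dots$ be centered, variance-one, freely independent, identically distributed variables, and let $S_n=n^{-1/2}(a_1+\cdots+a_n)$. By the *[Free Cumulant Scaling for Normalized Sums](/theorems/7143)*, the cumulants of $S_n$ of orders $1$, $2$, and $4$ satisfy the following identities; the third cumulant is not needed, because any noncrossing partition of four points with a block of size $3$ also has a singleton block.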
\begin{align*}
\kappa_1(S_n)=0
\end{align*}
\begin{align*}
\kappa_2(S_n,S_n)=1
\end{align*}
\begin{align*}
\kappa_4(S_n,S_n,S_n,S_n)=n^{-1}\kappa_4(a_1,a_1,a_1,a_1).
\end{align*}
The *Speicher Moment-Cumulant Formula* writes $\varphi(S_n^4)$ as a sum over noncrossing partitions of $\{1,2,3,4\}$. Every partition with a singleton block contributes $0$ because $\kappa_1(S_n)=0$, so the only surviving partitions are the two noncrossing pairings and the one-block partition. Hence
\begin{align*}
\varphi(S_n^4)=\kappa_2(S_n,S_n)\kappa_2(S_n,S_n)+\kappa_2(S_n,S_n)\kappa_2(S_n,S_n)+\kappa_4(S_n,S_n,S_n,S_n).
\end{align*}
Substituting the cumulant values gives
\begin{align*}
\varphi(S_n^4)=1\cdot 1+1\cdot 1+n^{-1}\kappa_4(a_1,a_1,a_1,a_1).
\end{align*}
Therefore
\begin{align*}
\varphi(S_n^4)=2+n^{-1}\kappa_4(a_1,a_1,a_1,a_1).
\end{align*}
Since the fourth cumulant of $a_1$ is fixed while $n^{-1}\to 0$,
\begin{align*}
\lim_{n\to\infty}\varphi(S_n^4)=2.
\end{align*}
The number $2$ is the Catalan number $C_2$, counting the two noncrossing pairings of four points; the finite-$n$ correction is exactly the scaled fourth cumulant of the original variables.
[/example]
The fourth moment is the first place where the free limit differs from the classical standard Gaussian, whose fourth moment is $3$. The missing crossing pairing accounts for the difference.
[example: Limiting Sixth Moment]
Let $a_1,a_2,\dots$ be centered, variance-one, freely independent, identically distributed variables, and set $S_n=n^{-1/2}(a_1+\cdots+a_n)$. The Speicher Moment-Cumulant Formula gives
\begin{align*}
\varphi(S_n^6)=\sum_{\pi\in NC(6)}\prod_{B\in\pi}\kappa_{|B|}(S_n,\dots,S_n).
\end{align*}
Since $\kappa_1(S_n)=0$, every partition with a singleton block contributes $0$. Since $\kappa_2(S_n,S_n)=1$, each noncrossing pairing contributes a product of three second cumulants:
\begin{align*}
\kappa_2(S_n,S_n)\kappa_2(S_n,S_n)\kappa_2(S_n,S_n)=1\cdot 1\cdot 1=1.
\end{align*}
It remains to count the noncrossing pairings and check that the other partitions vanish in the limit. The number of noncrossing pairings of six points is the Catalan number
\begin{align*}
C_3=\frac{1}{3+1}\binom{6}{3}.
\end{align*}
Here
\begin{align*}
\binom{6}{3}=\frac{6\cdot 5\cdot 4}{3\cdot 2\cdot 1}=20,
\end{align*}
so
\begin{align*}
C_3=\frac{1}{4}\cdot 20=5.
\end{align*}
Now take any noncrossing partition of $\{1,\dots,6\}$ with no singleton blocks that is not a pairing. Then at least one block has size $r\ge 3$. By cumulant scaling,
\begin{align*}
\kappa_r(S_n,\dots,S_n)=n^{1-r/2}\kappa_r(a_1,\dots,a_1).
\end{align*}
For $r\ge 3$ the exponent satisfies $1-r/2<0$, so this factor tends to $0$ as $n\to\infty$, while the cumulant $\kappa_r(a_1,\dots,a_1)$ is fixed. Therefore every non-pairing contribution tends to $0$.
Thus the limiting sixth moment is the sum of the five surviving pairing contributions:
\begin{align*}
\lim_{n\to\infty}\varphi(S_n^6)=5\cdot 1=5.
\end{align*}
The number $5$ is the Catalan count of noncrossing pairings of six points, so the sixth moment matches the standard semicircular law rather than the classical Gaussian moment.
[/example]
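The finite-$n$ corrections can be computed exactly for a specific law. The sketch below takes free symmetric $\pm 1$ variables (moments $1,0,1,0,1,\dots$), inverts the one-variable moment-cumulant recursion $m_n=\sum_{s}\kappa_s\,[z^{n-s}]M(z)^s$ to get free cumulants, rescales them by $n^{1-m/2}$, and rebuilds the moments of $S_n$. All function names are illustrative, and the rescaling helper assumes a symmetric law so that odd cumulants vanish.

```python
from fractions import Fraction


def poly_mul(p, q, degree):
    """Multiply two coefficient lists, discarding terms above `degree`."""
    out = [Fraction(0)] * (degree + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= degree:
                out[i + j] += a * b
    return out


def power_coefficients(m, n):
    """Return {s: [z^(n-s)] M(z)^s} for s = 1..n, using only m[0..n-1]."""
    coeffs, power = {}, [Fraction(1)]
    for s in range(1, n + 1):
        power = poly_mul(power, m, n - 1)
        coeffs[s] = power[n - s]
    return coeffs


def free_cumulants(moments):
    """[m_0, ..., m_N] -> [0, kappa_1, ..., kappa_N] (index 0 is a placeholder)."""
    kappa = [Fraction(0)]
    for n in range(1, len(moments)):
        c = power_coefficients(moments, n)
        kappa.append(moments[n] - sum(kappa[s] * c[s] for s in range(1, n)))
    return kappa


def moments_from_cumulants(kappa):
    """[0, kappa_1, ..., kappa_N] -> [m_0, ..., m_N]."""
    m = [Fraction(1)]
    for n in range(1, len(kappa)):
        c = power_coefficients(m, n)
        m.append(sum(kappa[s] * c[s] for s in range(1, n + 1)))
    return m


def scaled_cumulants(kappa, n):
    """Cumulants of S_n: multiply kappa_m by n^(1 - m/2) (symmetric laws only)."""
    out = [Fraction(0)]
    for order in range(1, len(kappa)):
        if order % 2:
            assert kappa[order] == 0, "odd cumulants must vanish for this helper"
            out.append(Fraction(0))
        else:
            out.append(kappa[order] / n ** (order // 2 - 1))
    return out


sign_moments = [Fraction(x) for x in (1, 0, 1, 0, 1, 0, 1)]
kappa = free_cumulants(sign_moments)
for n in (1, 10, 100, 1000):
    m = moments_from_cumulants(scaled_cumulants(kappa, n))
    print(n, m[4], m[6])
```

For this law the sixth moment of $S_n$ works out to $5+6\kappa_4/n+\kappa_6/n^2$ with $\kappa_4=-1$ and $\kappa_6=2$, so the approach to the Catalan value $5$ is visible term by term.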
These computations give concrete checks on the theorem. They also show that the semicircular distribution is not a cosmetic renaming of the Gaussian distribution: the combinatorics of noncrossing partitions changes the numerical moment sequence.
[remark: Bounded Moments and Universality]
The proof only uses the existence of the finitely many moments needed for each fixed moment calculation. If the variables are uniformly bounded, or if the common law has all moments, then the moment convergence statement holds in every order. The limiting moment sequence depends only on centering and variance, not on the higher cumulants of the original variables.
[/remark]
This universality is the free analogue of the classical principle that many different microscopic laws have the same normalized macroscopic limit. It also explains why semicircular variables describe the large-$N$ behaviour of random matrices: when one computes the normalized trace moments of a Wigner matrix, the large-$N$ index sums are dominated by the same noncrossing pairings that survive in the free cumulant proof. Thus the Catalan numbers appearing in the theorem are not decorative; they are the operational bridge between free sums and the semicircle law from random matrix theory.
## Multivariate Free Central Limit Theorem and Semicircular Systems
The final question is what happens when each summand is a vector of noncommutative variables rather than a single variable. The limit should remember second-order correlations among coordinates, while higher mixed cumulants should vanish after the same normalization.
[definition: Semicircular System with Covariance Matrix]
Let $(\mathcal B,\psi)$ be a tracial noncommutative probability space, and let $C=(c_{ij})_{1\le i,j\le d}$ be a real symmetric positive semidefinite matrix. A $d$-tuple $(s_1,\dots,s_d)$ of self-adjoint variables is a semicircular system with covariance matrix $C$ if its free cumulants satisfy $\kappa_1(s_i)=0$, $\kappa_2(s_i,s_j)=c_{ij}$, and $\kappa_m(s_{i_1},\dots,s_{i_m})=0$ for every $m\ge 3$ and all admissible indices.
[/definition]
When $C=I_d$, the variables form a freely independent standard semicircular family. For general $C$, the entries need not be freely independent; the covariance matrix gives the second cumulants that survive the limiting procedure.
[quotetheorem:7146]
[citeproof:7146]
The multivariate theorem says that free Gaussian families are characterized by covariance data and vanishing higher free cumulants. Freeness is now required at the level of the tuple-generated subalgebras, not coordinate by coordinate; coordinates inside one tuple may be correlated, and those correlations are exactly the entries of $C$. The hypotheses have concrete failure modes. If each tuple has nonzero mean vector $m=(m_1,\dots,m_d)$, then $\varphi(S_{n,i})=\sqrt n\,m_i$, so any coordinate with $m_i\neq 0$ escapes the centered limit. If the tuple copies are not free, the diagonal-counting argument can fail even in the second moment: taking $(a_{k,1},a_{k,2})=(a,b)$ for every $k$, with $\varphi(a^2)=1$ and $\varphi(ab)\neq 0$, gives $S_{n,1}=\sqrt n\,a$ and $S_{n,2}=\sqrt n\,b$, so $\varphi(S_{n,1}^2)=n$ and $\varphi(S_{n,1}S_{n,2})=n\varphi(ab)$ instead of fixed covariance entries. If the copies were not identically distributed, the limiting covariance would have to be supplied as a limit of averaged second moments. The theorem also does not say that the limiting coordinates are freely independent unless $C$ is diagonal. To compute individual mixed moments of the limit, we need to translate this cumulant description back into a word-by-word pairing formula.
[quotetheorem:7147]
[proofunderconstruction:7147]
This is the free version of Wick's formula. The covariance matrix hypothesis supplies every allowed pair contraction, while the semicircular-system hypothesis removes all blocks of size other than two from the moment-cumulant expansion. Its practical role is to turn a mixed word into a finite planar matching problem: label the positions, list the noncrossing pairings, and multiply the covariance entries attached to each pair. That rule is the bridge between the abstract cumulant definition of a semicircular system and the concrete moment computations used in random matrix limits, free central limit calculations, and examples such as $stst$ below.
The vanishing of higher free cumulants is necessary: if $\kappa_4(x,x,x,x)\neq 0$, then the fourth moment contains the additional contribution of the one-block partition $\{1,2,3,4\}$, so pairings alone do not determine $\varphi(x^4)$. The formula also does not assert classical Gaussianity, norm convergence, or independence of the coordinates; it is a moment formula inside the noncommutative probability space. Its most visible difference from the classical Wick rule is that crossing pairings are excluded. For centered classical Gaussian variables $X,Y$ with covariance $\mathbb E[XY]=\rho$ and variances $1$, the word pattern $XYXY$ receives three pair contractions and has moment $1+2\rho^2$; the free semicircular word $stst$ below keeps only the two noncrossing contractions. This restriction is the computational signature of freeness and is the point to watch in mixed-word examples.
[example: Correlated Two-Variable Limit]
Let $(a_k,b_k)$ be freely independent copies of a centered self-adjoint pair with $\varphi(a_1^2)=1$, $\varphi(b_1^2)=1$, and $\varphi(a_1b_1)=\rho$. By traciality, $\varphi(b_1a_1)=\varphi(a_1b_1)=\rho$, so the limiting semicircular pair $(s,t)$ has covariance entries $c_{11}=1$, $c_{22}=1$, and $c_{12}=c_{21}=\rho$. The *[Multivariate Free Central Limit Theorem](/theorems/7146)* gives joint convergence of the normalized sums $(S_{n,a},S_{n,b})$ to $(s,t)$, and the *[Wick Formula for Semicircular Systems](/theorems/7147)* computes $\psi(stst)$ by summing over noncrossing pairings of the four positions carrying the letters $s,t,s,t$.
The noncrossing pairing $\{\{1,2\},\{3,4\}\}$ contributes the product of the covariance entries attached to the two pairs:
\begin{align*}
c_{12}c_{12}=\rho\cdot \rho=\rho^2.
\end{align*}
The noncrossing pairing $\{\{1,4\},\{2,3\}\}$ contributes
\begin{align*}
c_{12}c_{21}=\rho\cdot \rho=\rho^2.
\end{align*}
The remaining pairing $\{\{1,3\},\{2,4\}\}$ would pair the two $s$ positions and the two $t$ positions, so its covariance product would be
\begin{align*}
c_{11}c_{22}=1\cdot 1=1.
\end{align*}
However, this pairing is crossing: since $1<2<3<4$, the pair $\{1,3\}$ interleaves with the pair $\{2,4\}$, so it is not included in the free Wick sum. Therefore
\begin{align*}
\psi(stst)=\rho^2+\rho^2=2\rho^2.
\end{align*}
The mixed fourth moment keeps exactly the two noncrossing covariance contractions, which is why the free answer differs from the classical Gaussian Wick computation by excluding the crossing contraction.
[/example]
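The same bookkeeping can be automated for any mixed word. The sketch below assumes the free Wick rule in the form used in the example (sum over noncrossing pairings of products of covariance entries). The function names `noncrossing_pairings` and `free_wick`, the encoding of $s,t$ as indices $0,1$, and the value $\rho=1/2$ are illustrative choices.

```python
from fractions import Fraction

def noncrossing_pairings(positions):
    """Yield the noncrossing pairings of an ordered tuple of positions."""
    if not positions:
        yield []
        return
    first = positions[0]
    # The partner of `first` must enclose an even number of points.
    for k in range(1, len(positions), 2):
        inside, outside = positions[1:k], positions[k + 1:]
        for p_in in noncrossing_pairings(inside):
            for p_out in noncrossing_pairings(outside):
                yield [(first, positions[k])] + p_in + p_out

def free_wick(word, cov):
    """Sum over noncrossing pairings of the products of cov[word[i]][word[j]]."""
    total = 0
    for pairing in noncrossing_pairings(tuple(range(len(word)))):
        term = 1
        for i, j in pairing:
            term *= cov[word[i]][word[j]]
        total += term
    return total

rho = Fraction(1, 2)                      # illustrative correlation
cov = [[Fraction(1), rho], [rho, Fraction(1)]]
S, T = 0, 1                               # indices of the letters s and t

print(free_wick([S, T, S, T], cov))       # 2 * rho**2 = 1/2
print(free_wick([S, S, S, S], cov))       # Catalan number C_2 = 2
```

Words of odd length have no pairings at all, so the function returns $0$ for them, in line with the vanishing of odd moments of a centered semicircular system.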
The [covariance matrix is positive semidefinite](/theorems/3999) because it is obtained from second moments of self-adjoint linear combinations. Indeed, for $\alpha=(\alpha_1,\dots,\alpha_d)\in\mathbb R^d$,
\begin{align*}
\sum_{i,j=1}^d \alpha_i c_{ij}\alpha_j =\varphi\left(\left(\sum_{i=1}^d\alpha_i a_{1,i}\right)^2\right)\ge 0.
\end{align*}
Thus the limiting covariance data are exactly the matrices that can occur for semicircular systems.
[remark: Universality of the Semicircular System]
For each fixed word length, the proof depends only on the corresponding finitely many mixed moments of the original tuple. Under bounded-moment hypotheses in all orders, the full joint distribution of normalized sums converges. The higher mixed cumulants of the original tuple influence finite-$n$ corrections, but not the limiting semicircular system.
[/remark]
The free central limit theorem closes the first part of the course. Starting from noncommutative probability spaces and freeness, the noncrossing cumulant calculus produces a [universal limit theorem](/theorems/2489) whose limit law is semicircular rather than Gaussian. Later analytic tools, such as the $R$-transform and [Cauchy transform](/page/Cauchy%20Transform), give another language for the same phenomenon, but the combinatorial proof here already contains the core mechanism.
The free central limit theorem shows that semicircular distributions arise universally from free families; Chapter 9 asks the constructive question: which models, such as free products and Fock spaces, actually produce free families?
# 9. Standard Constructions and Models
Chapters 3 through 8 gave the combinatorial meaning of freeness through alternating centered moments, noncrossing cumulants, and the semicircular central limit theorem. This chapter explains where free families come from. The guiding question is constructive: given several noncommutative probability spaces, how can we place copies of them inside a larger space so that their original laws are preserved and the copies become freely independent?
The answer has two complementary forms. First, there is an algebraic free product construction, which is universal for freely adjoining algebras with prescribed states. Second, there is a Hilbert-space model on full Fock space, where creation and annihilation operators produce concrete semicircular variables and make the vanishing of mixed centered moments visible.
## Free Products of Noncommutative Probability Spaces
Suppose we are given several noncommutative probability spaces $(A_i,\tau_i)_{i \in I}$. We want a single probability space containing a copy of every $A_i$, preserving each state $\tau_i$, while imposing no relations between different copies except those forced by the unit and by freeness. The construction uses the algebraic free product of unital algebras.
[definition: Algebraic Free Product]
Let $(A_i)_{i \in I}$ be unital algebras over $\mathbb C$. The algebraic free product with amalgamated unit, denoted $*_{i \in I} A_i$, is the unital algebra equipped with unital homomorphisms $\iota_i:A_i \to *_{j \in I}A_j$ such that for every unital algebra $B$ and every family of unital homomorphisms $\varphi_i:A_i \to B$, there is a unique unital homomorphism $\varphi:*_{i \in I}A_i \to B$ satisfying $\varphi \circ \iota_i=\varphi_i$ for all $i \in I$.
[/definition]
This definition identifies the free product by its mapping property. The mapping property is powerful, but by itself it does not tell us how to compute moments: a product such as $a_1b_1a_2$ still contains scalar pieces hidden inside each letter. If those scalar pieces are not separated first, the vanishing rule for freeness cannot even be applied correctly. For computations, it is therefore better to have a normal form for elements, and that normal form requires separating each input algebra into scalars and centered elements.
[definition: Centered Subspace]
Let $(A,\tau)$ be a noncommutative probability space. The centered subspace of $A$ is
\begin{align*}
A^\circ := \{a \in A : \tau(a)=0\}.
\end{align*}
[/definition]
Every element $a \in A_i$ decomposes as $a=\tau_i(a)1+a^\circ$ with $a^\circ \in A_i^\circ$. After this decomposition, scalar parts are no longer genuine letters and adjacent centered letters from the same algebra should be multiplied before freeness is tested.
The obstruction is that a formal product may look mixed while still containing pieces that belong to a single input algebra or scalar parts that should be extracted. The words relevant to freeness are precisely those for which this simplification has already been done: every letter is centered, and consecutive letters come from different algebras.
[definition: Reduced Word]
A reduced word in the family $(A_i,\tau_i)_{i \in I}$ is a product
\begin{align*}
a_1a_2\cdots a_n,
\end{align*}
where $n\ge 1$, each $a_k \in A_{i_k}^\circ$, and $i_k \ne i_{k+1}$ for $1\le k<n$.
[/definition]
Reduced words encode the algebraic content of freeness: adjacent factors from the same algebra should first be multiplied inside that algebra and recentered, while alternating centered factors remain irreducible. The missing ingredient is a scalar-extraction rule on the whole free product. Without such a rule, the algebraic free product only supplies formal words; it does not say which mixed moments should survive. This leads to the free product state.
[definition: Free Product State]
Let $(A_i,\tau_i)_{i \in I}$ be noncommutative probability spaces. The free product state is the linear functional
\begin{align*}
\tau=*_{i \in I}\tau_i:A=*_{i \in I}A_i \to \mathbb C
\end{align*}
determined by $\tau(1)=1$ and by
\begin{align*}
\tau(a_1a_2\cdots a_n)=0
\end{align*}
for every reduced word $a_1a_2\cdots a_n$.
[/definition]
The definition is short, but it is the engine of the construction. It says that once each input law is fixed, all alternating centered mixed moments are forced to vanish.
[quotetheorem:7148]
[citeproof:7148]
The theorem is the free-probability analogue of forming a product probability space, but the role of tensor independence is replaced by freeness. The state-preservation hypothesis is essential: without it, the embedded copy of $A_i$ could have a different marginal law, so the construction would not solve the original extension problem. The alternating-centering hypothesis in the definition of freeness is also essential; merely asking for pairwise factorisation would not determine moments such as $\tau(a_1b_1a_2b_2)$. What the theorem does not by itself guarantee is positivity of the algebraic state in every possible algebraic setting; that analytic issue is usually handled through the Hilbert-space model below.
[example: Two Free Copies of a Projection Algebra]
Let $A_1=A_2=\mathbb C[p]/(p^2-p)$, and let $\tau_i(p)=t$ with $t\in[0,1]$. In the free product space put $p_1=\iota_1(p)$ and $p_2=\iota_2(p)$. Since $\iota_i$ is a unital homomorphism,
\begin{align*}
p_i^2=\iota_i(p)^2=\iota_i(p^2)=\iota_i(p)=p_i,
\end{align*}
so each $p_i$ is a projection, and state preservation gives $\tau(p_i)=\tau_i(p)=t$.
Define $q_i=p_i-t1$. Then
\begin{align*}
\tau(q_i)=\tau(p_i)-t\tau(1)=t-t=0.
\end{align*}
Thus $q_1q_2q_1q_2$ is an alternating product of centered letters from the two free copies, so
\begin{align*}
\tau(q_1q_2q_1q_2)=0.
\end{align*}
For example, to compute $\tau(p_1p_2p_1)$, write $p_i=q_i+t1$ and expand in the noncommutative order:
\begin{align*}
p_1p_2p_1=(q_1+t1)(q_2+t1)(q_1+t1).
\end{align*}
Multiplying the first two factors gives
\begin{align*}
(q_1+t1)(q_2+t1)=q_1q_2+tq_1+tq_2+t^2 1.
\end{align*}
Multiplying by $q_1+t1$ on the right gives
\begin{align*}
p_1p_2p_1=q_1q_2q_1+tq_1q_2+tq_1^2+tq_2q_1+2t^2q_1+t^2q_2+t^3 1.
\end{align*}
Apart from the scalar term $t^3 1$, the only term that is not a centered reduced word is $tq_1^2$, and the projection relation gives
\begin{align*}
q_1^2=(p_1-t1)^2=p_1-2tp_1+t^2 1.
\end{align*}
Since $p_1=q_1+t1$, this becomes
\begin{align*}
q_1^2=(1-2t)q_1+t(1-t)1.
\end{align*}
Taking $\tau$ term by term, the reduced words $q_1q_2q_1$, $q_1q_2$, and $q_2q_1$ have expectation $0$, and $\tau(q_1)=\tau(q_2)=0$. Hence
\begin{align*}
\tau(p_1p_2p_1)=t\tau(q_1^2)+t^3=t\cdot t(1-t)+t^3=t^2.
\end{align*}
This computation shows the rule in practice: mixed moments are found by expanding into scalar parts and centered reduced words, then keeping only the scalar contribution under the free product state.
[/example]
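The single-algebra step of this example, rewriting $q_1^2$ as a scalar plus a centered part, can be checked with exact arithmetic. This is a small sketch in which elements $x1+yp$ of $\mathbb C[p]/(p^2-p)$ are stored as pairs $(x,y)$. The helper names `mul` and `tau` and the value $t=1/3$ are illustrative assumptions.

```python
from fractions import Fraction

t = Fraction(1, 3)  # illustrative value of tau(p); any t in [0, 1] could be used

def mul(u, v):
    """Multiply x + y p and x2 + y2 p using p^2 = p; elements are pairs (x, y)."""
    (x, y), (x2, y2) = u, v
    return (x * x2, x * y2 + y * x2 + y * y2)

def tau(u):
    """State with tau(1) = 1 and tau(p) = t."""
    x, y = u
    return x + y * t

q = (-t, Fraction(1))                 # q = p - t 1
q_squared = mul(q, q)

# Decomposition from the example: q^2 = (1 - 2t) q + t (1 - t) 1
claimed = ((1 - 2 * t) * q[0] + t * (1 - t), (1 - 2 * t) * q[1])

print(q_squared == claimed)           # the two expressions agree
print(tau(q), tau(q_squared))         # centered letter, and tau(q^2) = t (1 - t)
print(t * tau(q_squared) + t**3)      # value of tau(p1 p2 p1), equal to t^2
```

Only the final line uses freeness, through the vanishing of the reduced words; everything before it takes place inside one copy of the projection algebra.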
The projection example shows how the construction separates marginal laws from joint laws. The individual distributions of $p_1$ and $p_2$ are fixed before the free product is formed, while mixed moments are determined by the alternating-centering rule.
## Reduced Words and Moment Computations
The main computational problem in a free product is to evaluate a word whose letters come from several algebras. A word is rarely reduced at first sight, because adjacent letters may belong to the same algebra and individual letters may have nonzero expectation. The procedure is to multiply adjacent letters from the same copy, split each resulting letter into scalar and centered parts, and discard the reduced parts under the free product state.
[quotetheorem:7149]
[proofunderconstruction:7149]
The theorem is often used in reverse: to compute a mixed moment, rewrite the word as a scalar multiple of the unit plus a linear combination of reduced words. Each hypothesis has a visible role. If a letter is not centered, scalar parts may survive; if two neighbouring letters come from the same subalgebra, they must first be multiplied inside that subalgebra; and if the subalgebras are not free, alternating centered moments may be nonzero. The result does not say that all mixed moments vanish, only that the reduced centered ones do, so only the scalar component survives under the free product state.
[example: Alternating Centered Word]
Let $a\in A_1$ and $b\in A_2$ with $\tau(a)=0$ and $\tau(b)=0$, and suppose that $A_1$ and $A_2$ are freely independent. The word $abab$ has letters in the order
\begin{align*}
A_1,\ A_2,\ A_1,\ A_2.
\end{align*}
Neighbouring letters therefore come from different free subalgebras, and each letter is centered. By *[Reduced Word Normal Form and State Evaluation](/theorems/7149)*, the alternating centered product has zero expectation:
\begin{align*}
\tau(abab)=0.
\end{align*}
This is the simplest kind of free-moment computation: once the word is already alternating and centered, no scalar part remains to contribute to the state.
[/example]
A non-centered mixed word requires more work. The next example illustrates the standard recentering method that appears repeatedly in calculations with free variables.
[example: Recentering a Mixed Moment]
Let $a\in A_1$ and $b\in A_2$, where $A_1$ and $A_2$ are free. Put $\alpha=\tau(a)$, $\beta=\tau(b)$, $a^\circ=a-\alpha1$, and $b^\circ=b-\beta1$. Then $\tau(a^\circ)=\tau(a)-\alpha\tau(1)=\alpha-\alpha=0$, and similarly $\tau(b^\circ)=\beta-\beta=0$.
Since $a=\alpha1+a^\circ$ and $b=\beta1+b^\circ$, we expand in the original order:
\begin{align*}
ab=(\alpha1+a^\circ)(\beta1+b^\circ).
\end{align*}
Multiplying the two factors gives
\begin{align*}
ab=\alpha\beta1+\alpha b^\circ+\beta a^\circ+a^\circ b^\circ.
\end{align*}
Applying linearity of $\tau$ gives
\begin{align*}
\tau(ab)=\alpha\beta\tau(1)+\alpha\tau(b^\circ)+\beta\tau(a^\circ)+\tau(a^\circ b^\circ).
\end{align*}
The first term is $\alpha\beta$ because $\tau(1)=1$. The next two terms vanish because $a^\circ$ and $b^\circ$ are centered:
\begin{align*}
\alpha\tau(b^\circ)=\alpha\cdot0=0.
\end{align*}
\begin{align*}
\beta\tau(a^\circ)=\beta\cdot0=0.
\end{align*}
Finally, $a^\circ\in A_1$, $b^\circ\in A_2$, both are centered, and the two letters come from different free subalgebras, so by *Reduced Word Normal Form and State Evaluation*,
\begin{align*}
\tau(a^\circ b^\circ)=0.
\end{align*}
Therefore
\begin{align*}
\tau(ab)=\alpha\beta.
\end{align*}
Thus first mixed moments factor under freeness, even though higher mixed moments generally require recentering every letter and do not follow the classical tensor-independence pattern.
[/example]
This method turns the abstract definition into an algorithm. For longer words, adjacent letters from the same algebra are first multiplied before centering; otherwise the word may be incorrectly treated as alternating.
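One way to organize the recentering algorithm is as a recursion on word length. The sketch below is an illustration under stated assumptions rather than a canonical implementation: each free subalgebra is assumed to be generated by a single variable with known moments, a letter is a pair `(name, power)`, and the names `merge`, `free_moment`, and the sample moment tables are hypothetical. For an alternating word, expanding $\tau\big(\prod_j (a_j-\tau(a_j)1)\big)=0$ expresses $\tau(a_1\cdots a_n)$ through moments of strictly shorter words.

```python
from fractions import Fraction
from itertools import combinations

def merge(word):
    """Multiply adjacent letters from the same algebra: x^k x^l = x^(k+l)."""
    out = []
    for name, power in word:
        if out and out[-1][0] == name:
            out[-1] = (name, out[-1][1] + power)
        else:
            out.append((name, power))
    return tuple(out)

def free_moment(word, moments):
    """tau of a word of letters (name, power) in free variables.

    moments[name][k] is the k-th moment of the generator called `name`.
    """
    word = merge(word)
    n = len(word)
    if n == 0:
        return Fraction(1)
    if n == 1:
        name, power = word[0]
        return Fraction(moments[name][power])
    # The word is alternating, so tau(prod_j (a_j - m_j)) = 0.
    # Expanding, the term that replaces the letters in `removed` by -m_j
    # is a scalar times a shorter word; solve for the full word.
    total = Fraction(0)
    for size in range(1, n + 1):
        for removed in combinations(range(n), size):
            coeff = Fraction(1)
            for j in removed:
                name, power = word[j]
                coeff *= -Fraction(moments[name][power])
            rest = [word[j] for j in range(n) if j not in removed]
            total += coeff * free_moment(rest, moments)
    return -total

moments = {
    "a": {1: Fraction(2), 2: Fraction(5)},    # tau(a) = 2, tau(a^2) = 5
    "b": {1: Fraction(3), 2: Fraction(10)},   # tau(b) = 3, tau(b^2) = 10
}
print(free_moment([("a", 1), ("b", 1)], moments))                      # 6
print(free_moment([("a", 1), ("b", 1), ("a", 1), ("b", 1)], moments))  # 49
```

The second value agrees with $\tau(a^2)\tau(b)^2+\tau(a)^2\tau(b^2)-\tau(a)^2\tau(b)^2=45+40-36$, which is not the product of marginal moments that tensor independence would give. The recursion visits every subset of letters, so it is only meant for short words.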
[remark: Freeness Is Not Pairwise Factorisation]
Free independence gives $\tau(ab)=\tau(a)\tau(b)$ when $a$ and $b$ lie in two free subalgebras, but it does not say that every mixed moment factors into a product of marginal moments. For instance, $\tau(abab)$ depends on lower moments of $a$ and $b$ after recentering, and its value differs from what tensor independence would impose in general.
[/remark]
The algebraic construction is now complete enough for formal moment calculations. To obtain concrete positive models, and to build the central semicircular variables of the subject, we pass to full Fock space.
## Creation and Annihilation Operators on Full Fock Space
The next construction answers a different question: can we realize free independence by actual operators on a Hilbert space? The full Fock space over a Hilbert space $H$ is designed so that concatenation of tensors records reduced words, while the vacuum vector extracts the scalar part.
[definition: Full Fock Space]
Let $H$ be a complex Hilbert space. The full Fock space over $H$ is
\begin{align*}
\mathcal F(H):=\mathbb C\Omega \oplus \bigoplus_{n=1}^{\infty} H^{\otimes n},
\end{align*}
where $\Omega$ is a distinguished unit vector called the vacuum vector.
[/definition]
Here $\mathcal L(\mathcal F(H))$ denotes the algebra of bounded linear operators $\mathcal F(H)\to \mathcal F(H)$.
The summand $H^{\otimes n}$ represents words of length $n$. A word space without operators is only a bookkeeping device: it records tensors, but it cannot yet model random variables or their products. To turn this word space into a probability model, we need bounded operators that add letters to tensors in a controlled way; this motivates the left creation operator.
[definition: Left Creation Operator]
For $f\in H$, the left creation operator is the [bounded linear operator](/page/Bounded%20Linear%20Operator)
\begin{align*}
\ell(f):\mathcal F(H)\to \mathcal F(H)
\end{align*}
defined by $\ell(f)\Omega=f$ and by
\begin{align*}
\ell(f)(h_1\otimes\cdots\otimes h_n)=f\otimes h_1\otimes\cdots\otimes h_n.
\end{align*}
[/definition]
Creation operators increase tensor length by one. Moment computations also need the reverse operation, because a vacuum expectation can be nonzero only when later operators cancel tensor factors all the way back to the vacuum line. The obstruction is to identify exactly which tensor factor is removed and what scalar contraction is produced by the Hilbert-space inner product.
[quotetheorem:7150]
[citeproof:7150]
The formula depends on the boundedness of $\ell(f)$ and on the Hilbert-space inner product convention; changing the convention conjugates the scalar in the displayed formula. The convention issue is not cosmetic. If $H=\mathbb C$, $f=1$, and $h_1=i$, then the linear-in-the-first convention gives $(h_1,f)_H=i$, while the opposite scalar $(f,h_1)_H=-i$ would fail the adjoint identity when tested against the vacuum vector. The formula also says only how a left annihilation operator removes the first tensor factor, not how arbitrary products of creations and annihilations simplify. The next relation extracts the basic contraction rule behind Fock-space moment computations.
[quotetheorem:7151]
[citeproof:7151]
The relation requires the annihilation operator to stand immediately to the left of the creation operator; products in the opposite order, such as $\ell(f)\ell(g)^*$, do not collapse to scalars in the same way. For instance, on the vacuum vector $\ell(f)\ell(g)^*\Omega=0$, while on a one-particle vector $h$ it gives $(h,g)_H f$, a rank-one action rather than scalar multiplication by a fixed number. The order and convention in the scalar are also forced: in $H=\mathbb C$ with $f=1$ and $g=i$, the composite $\ell(1)^*\ell(i)$ multiplies every tensor by $(i,1)_H=i$, not by $(1,i)_H=-i$. Its role is local: it supplies the elementary cancellation used inside longer moment computations. The contraction relation explains how tensor words collapse, but a probability space also needs a state. Since the vacuum vector represents the empty word, the natural scalar extraction map is the vector state at the vacuum; this motivates vacuum expectation.
[definition: Vacuum Expectation]
Let $B\subset \mathcal L(\mathcal F(H))$ be a unital algebra of bounded operators. The vacuum expectation on $B$ is the state
\begin{align*}
\tau:B&\to \mathbb C, & \tau(T)&=(T\Omega,\Omega)_{\mathcal F(H)}.
\end{align*}
[/definition]
Vacuum expectation is the Hilbert-space analogue of taking the scalar component in the free product. If an operator word sends $\Omega$ into a positive tensor-length subspace, its vacuum expectation is zero.
[example: Zero Vacuum Expectation for a Reduced Operator Word]
Let $e_1,e_2\in H$ be orthonormal and put $s_i=\ell(e_i)+\ell(e_i)^*$. For each $i$, the annihilation formula gives $\ell(e_i)^*\Omega=0$, so
\begin{align*}
s_i\Omega=\ell(e_i)\Omega+\ell(e_i)^*\Omega=e_i+0=e_i.
\end{align*}
Since the vacuum line $\mathbb C\Omega$ is orthogonal to the one-particle space $H$, we get
\begin{align*}
\tau(s_i)=(s_i\Omega,\Omega)=(e_i,\Omega)=0.
\end{align*}
We compute the mixed word by applying operators to $\Omega$ from right to left. First,
\begin{align*}
s_1\Omega=e_1.
\end{align*}
Next,
\begin{align*}
s_2e_1=\ell(e_2)e_1+\ell(e_2)^*e_1=e_2\otimes e_1+(e_1,e_2)_H\Omega.
\end{align*}
Because $e_1$ and $e_2$ are orthogonal, $(e_1,e_2)_H=0$, hence
\begin{align*}
s_2e_1=e_2\otimes e_1.
\end{align*}
Now apply $s_1$:
\begin{align*}
s_1(e_2\otimes e_1)=\ell(e_1)(e_2\otimes e_1)+\ell(e_1)^*(e_2\otimes e_1).
\end{align*}
The creation term is
\begin{align*}
\ell(e_1)(e_2\otimes e_1)=e_1\otimes e_2\otimes e_1.
\end{align*}
The annihilation term is
\begin{align*}
\ell(e_1)^*(e_2\otimes e_1)=(e_2,e_1)_H e_1=0.
\end{align*}
Therefore
\begin{align*}
s_1s_2s_1\Omega=e_1\otimes e_2\otimes e_1.
\end{align*}
This vector lies in $H^{\otimes 3}$, which is orthogonal to the vacuum line $\mathbb C\Omega$, so
\begin{align*}
\tau(s_1s_2s_1)=(s_1s_2s_1\Omega,\Omega)=(e_1\otimes e_2\otimes e_1,\Omega)=0.
\end{align*}
The computation shows explicitly that the middle colour $e_2$ prevents the annihilation operators from contracting the word all the way back to the vacuum.
[/example]
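The right-to-left computation in this example is mechanical enough to simulate. The following sketch assumes an orthonormal family $e_1,e_2,\dots$ and stores a vector of $\mathcal F(H)$ as a dictionary from tuples of colours to coefficients, with the empty tuple standing for $\Omega$. The function names are hypothetical and only words in the operators $s_i=\ell(e_i)+\ell(e_i)^*$ are handled.

```python
from collections import defaultdict

def create(i, vec):
    """Left creation l(e_i): prepend colour i to every basis tensor."""
    out = defaultdict(int)
    for tensor, coeff in vec.items():
        out[(i,) + tensor] += coeff
    return out

def annihilate(i, vec):
    """Adjoint l(e_i)^*: strip a leading colour i; other tensors and the vacuum go to 0."""
    out = defaultdict(int)
    for tensor, coeff in vec.items():
        if tensor and tensor[0] == i:
            out[tensor[1:]] += coeff
    return out

def semicircular(i, vec):
    """Apply s_i = l(e_i) + l(e_i)^* to vec."""
    out = defaultdict(int)
    for part in (create(i, vec), annihilate(i, vec)):
        for tensor, coeff in part.items():
            out[tensor] += coeff
    return out

def vacuum_expectation(word):
    """tau(s_{w_1} ... s_{w_n}): apply the operators to Omega from right to left."""
    vec = {(): 1}  # the vacuum Omega is the empty tensor
    for i in reversed(word):
        vec = semicircular(i, vec)
    return vec.get((), 0)  # basis tensors are orthonormal, so this is (vec, Omega)

print(vacuum_expectation([1, 2, 1]))     # 0, as in the example
print(vacuum_expectation([1, 2, 1, 2]))  # 0: the only colour-matching pairing crosses
print(vacuum_expectation([1, 2, 2, 1]))  # 1: one noncrossing colour-matching pairing
```

Reading off the coefficient of the empty tuple is exactly the vacuum expectation, because every tensor of positive length is orthogonal to $\Omega$.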
This example is the first hint that orthogonal directions in $H$ behave as free random variables. The general issue is whether the observed cancellation persists for every alternating centered word, not just for one product of three semicircular operators.
To turn the example into a usable construction, we need a statement that applies to whole subalgebras generated on orthogonal sectors of Fock space. Orthogonality should force mismatched colours to prevent a full return to the vacuum, which is exactly the reduced-word vanishing needed for freeness.
[quotetheorem:7152]
[citeproof:7152]
This result explains why full Fock space is the natural Hilbert-space model for freeness. Orthogonality of the subspaces is essential: if $H_1=H_2$, the generators are not separated by colour and mixed centered moments need not vanish. Centering is also essential, because constants in the algebras always contribute vacuum components. The theorem does not classify all free families; it supplies a concrete positive model whose reduced-word mechanism mirrors the algebraic free product.
## Semicircular Variables from Fock Space
The final construction in this chapter produces the free analogue of Gaussian random variables. In classical probability, the moments of a centered Gaussian variable are computed by summing over all pairings. In free probability, the corresponding variables are semicircular, but creation plus annihilation deserves that name only after its moments have been computed and shown to be Catalan rather than Gaussian. Full Fock space realizes this computation directly.
[definition: Fock Semicircular Operator]
Let $H$ be a complex Hilbert space and let $f\in H$. The Fock semicircular operator associated to $f$ is the bounded operator
\begin{align*}
s(f):\mathcal F(H)&\to \mathcal F(H), & s(f)&=\ell(f)+\ell(f)^*.
\end{align*}
[/definition]
For every $f\in H$, the operator $s(f)$ is self-adjoint and centered under the vacuum expectation; when $f$ is a unit vector it also has variance $1$. The next theorem is needed to identify its law, because a self-adjoint operator is semicircular only after its full moment sequence has the Catalan form. The creation-annihilation rules reduce that moment computation to counting admissible returns to the vacuum.
[quotetheorem:7153]
[citeproof:7153]
The unit-vector hypothesis fixes the variance. If $f=0$, then $s(f)=0$ and $\tau(s(f)^2)=0$, contradicting the Catalan value $C_1=1$; more generally, each complete contraction contributes $\|f\|_H^2$, so the even moments are scaled by powers of $\|f\|_H^2$. Self-adjointness comes from using the symmetric combination $\ell(f)+\ell(f)^*$; a creation operator alone would not have a real probability law in the self-adjoint sense, since $\ell(f)$ is not self-adjoint when $f\ne 0$. The theorem identifies the operator with the centered semicircular law of variance $1$, but it is only a one-variable statement. Freeness of several such operators requires the orthogonality result from the previous section.
[example: Low Moments of a Fock Semicircular Operator]
Let $\|f\|_H=1$ and let $s=\ell(f)+\ell(f)^*$. By the annihilation formula, $\ell(f)^*\Omega=0$, while $\ell(f)\Omega=f$, so
\begin{align*}
s\Omega=\ell(f)\Omega+\ell(f)^*\Omega=f+0=f.
\end{align*}
Since $f\in H$ lies in the one-particle space and $\Omega$ lies in the vacuum line, these summands are orthogonal in $\mathcal F(H)$; hence
\begin{align*}
\tau(s)=(s\Omega,\Omega)=(f,\Omega)=0.
\end{align*}
For the second moment, apply $s$ once more:
\begin{align*}
s^2\Omega=s(f)=\ell(f)f+\ell(f)^*f.
\end{align*}
The creation term is
\begin{align*}
\ell(f)f=f\otimes f.
\end{align*}
The annihilation term is
\begin{align*}
\ell(f)^*f=(f,f)_H\Omega=\|f\|_H^2\Omega=\Omega.
\end{align*}
Therefore
\begin{align*}
s^2\Omega=f\otimes f+\Omega.
\end{align*}
Taking the vacuum inner product gives
\begin{align*}
\tau(s^2)=(s^2\Omega,\Omega)=(f\otimes f,\Omega)+(\Omega,\Omega)=0+1=1.
\end{align*}
For the fourth moment, continue the same calculation. First,
\begin{align*}
s^3\Omega=s(f\otimes f+\Omega)=s(f\otimes f)+s\Omega.
\end{align*}
Now
\begin{align*}
s(f\otimes f)=\ell(f)(f\otimes f)+\ell(f)^*(f\otimes f).
\end{align*}
The two terms are
\begin{align*}
\ell(f)(f\otimes f)=f\otimes f\otimes f.
\end{align*}
\begin{align*}
\ell(f)^*(f\otimes f)=(f,f)_H f=f.
\end{align*}
Since $s\Omega=f$, we get
\begin{align*}
s^3\Omega=f\otimes f\otimes f+2f.
\end{align*}
Applying $s$ one more time,
\begin{align*}
s^4\Omega=s(f\otimes f\otimes f)+2s(f).
\end{align*}
For the three-particle term,
\begin{align*}
s(f\otimes f\otimes f)=\ell(f)(f\otimes f\otimes f)+\ell(f)^*(f\otimes f\otimes f).
\end{align*}
Thus
\begin{align*}
\ell(f)(f\otimes f\otimes f)=f\otimes f\otimes f\otimes f.
\end{align*}
\begin{align*}
\ell(f)^*(f\otimes f\otimes f)=(f,f)_H f\otimes f=f\otimes f.
\end{align*}
Also, from the second-moment computation,
\begin{align*}
s(f)=f\otimes f+\Omega.
\end{align*}
Combining the terms,
\begin{align*}
s^4\Omega=f\otimes f\otimes f\otimes f+f\otimes f+2f\otimes f+2\Omega.
\end{align*}
Hence
\begin{align*}
s^4\Omega=f\otimes f\otimes f\otimes f+3f\otimes f+2\Omega.
\end{align*}
Only the vacuum component contributes to the vacuum expectation, because $H^{\otimes 2}$ and $H^{\otimes 4}$ are orthogonal to $\mathbb C\Omega$. Therefore
\begin{align*}
\tau(s^4)=(s^4\Omega,\Omega)=0+0+2(\Omega,\Omega)=2.
\end{align*}
The first moments are therefore $\tau(s)=0$, $\tau(s^2)=1=C_1$, and $\tau(s^4)=2=C_2$: the odd moment vanishes and the even moments match the Catalan numbers, showing the first non-Gaussian feature of the semicircular law.
[/example]
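The ladder structure of this calculation extends to higher moments without further ideas. The sketch below works in the one-dimensional sector spanned by $\Omega, f, f\otimes f,\dots$ and tracks only the coefficient of each tensor power. The function name `fock_moments` and the parameter `norm_sq` (standing for $\|f\|_H^2$) are hypothetical.

```python
from fractions import Fraction
from math import comb

def fock_moments(max_order, norm_sq=1):
    """Vacuum moments tau(s^m) for m = 0..max_order, where s = l(f) + l(f)^*.

    coeffs[n] is the coefficient of the n-fold tensor power of f; n = 0 is Omega.
    """
    coeffs = [Fraction(1)] + [Fraction(0)] * (max_order + 1)  # start at Omega
    moments = [coeffs[0]]
    for _ in range(max_order):
        new = [Fraction(0)] * len(coeffs)
        for n, c in enumerate(coeffs):
            if c == 0:
                continue
            if n + 1 < len(new):
                new[n + 1] += c             # creation raises the tensor power
            if n >= 1:
                new[n - 1] += norm_sq * c   # annihilation contracts with (f, f)
        coeffs = new
        moments.append(coeffs[0])           # only the Omega component survives
    return moments

catalan = [comb(2 * k, k) // (k + 1) for k in range(5)]
print([int(m) for m in fock_moments(8)])    # [1, 0, 1, 0, 2, 0, 5, 0, 14]
print(catalan)                              # [1, 1, 2, 5, 14]
print(int(fock_moments(4, norm_sq=2)[4]))   # 8 = C_2 * 2**2
```

The last line illustrates the scaling by powers of $\|f\|_H^2$ when $f$ is not a unit vector.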
The fourth moment already distinguishes the semicircular law from the standard Gaussian law, whose fourth moment is $3$. The missing crossing pairing is the first visible sign that free probability replaces ordinary pairings by noncrossing pairings. Since orthogonal directions in Fock space were already shown to generate free algebras, the remaining step is to combine the single-variable moment computation with the freeness theorem.
[quotetheorem:7154]
[citeproof:7154]
This theorem supplies the standard model used throughout free probability. Orthonormality is essential for the stated normalisation and freeness: unit length gives variance $1$, while orthogonality separates colours so mixed centered moments vanish. The theorem does not say that every semicircular family is literally this family on a full Fock space, only that this construction realizes the joint law. It proves that freely independent variables with semicircular laws are not merely formal objects: they are concrete bounded operators on a Hilbert space with a canonical state.
[remark: Algebraic and Fock Models]
The algebraic free product gives the universal language for adjoining freely independent copies with prescribed laws. The Fock-space construction gives a positive operator model for the most important examples, especially semicircular systems. Later chapters use both viewpoints: free products support abstract moment calculations, while Fock space provides the canonical model for free Gaussian behaviour.
[/remark]
Chapter 9 confirms that free families exist in concrete models; Chapter 10 converts the moment-cumulant formulas into practical computational algorithms, turning the combinatorial theory from earlier chapters into an explicit toolkit for calculations.
# 10. Foundational Computation Toolkit
This chapter turns the combinatorics of free cumulants into a practical computation scheme. Chapters 5 and 6 established the moment-cumulant formula over noncrossing partitions and the vanishing of mixed free cumulants under freeness. The goal now is computational: given enough cumulants, compute moments; given moments, recover cumulants; and reduce tracial word moments before doing any partition enumeration.
## Computing Moments and Cumulants Recursively
The first problem is how to use the moment-cumulant formula without listing every noncrossing partition from scratch. Direct summation works for small orders, but the number of noncrossing partitions grows quickly, so repeated computations need a recursive structure. Recording only scalar moments such as $\varphi(x^n)$ loses the order information in mixed words, while ignoring blockwise cumulants leaves no way to tell which parts of a partition contribution come from lower-order data.
[definition: Word Moment]
Let $(A,\varphi)$ be a noncommutative probability space. For $n\ge 1$, the $n$-th word moment map is the function
\begin{align*}
M_n:A^n&\to \mathbb C, & M_n(x_1,\dots,x_n)&=\varphi(x_1\cdots x_n).
\end{align*}
[/definition]
A word moment records the ordered word rather than only the multiset of letters. To compute these quantities from freeness data, we need the multilinear functionals that separate contributions by noncrossing blocks. We write $NC(n)$ for the lattice of noncrossing partitions of $\{1,\dots,n\}$, and $1_n\in NC(n)$ for the maximal partition with a single block $\{1,\dots,n\}$.
[definition: Free Cumulant Functional]
For a noncommutative probability space $(A,\varphi)$, the free cumulant functionals $(\kappa_n)_{n\ge 1}$ are the multilinear maps $\kappa_n:A^n\to \mathbb C$ determined by
\begin{align*}
\varphi(x_1\cdots x_n)=\sum_{\pi\in NC(n)} \kappa_\pi[x_1,\dots,x_n],
\end{align*}
where, for $\pi=\{V_1,\dots,V_r\}$,
\begin{align*}
\kappa_\pi[x_1,\dots,x_n]=\prod_{V\in\pi} \kappa_{|V|}(x_{i_1},\dots,x_{i_{|V|}})
\end{align*}
with $V=\{i_1<\cdots<i_{|V|}\}$.
[/definition]
The definition gives a triangular relation between moments and cumulants: the full one-block term contains the new cumulant, while every other partition uses smaller blocks. The computational obstruction is that the defining equation presents moments as sums, whereas in practice the moments are known and the cumulants must be recovered. Since every proper noncrossing partition uses lower-order cumulants, the one-block term can be isolated order by order.
[quotetheorem:7155]
[citeproof:7155]
The hypotheses specify both the combinatorial formula being inverted and the lower-order data that have already been fixed. If the noncrossing setup is replaced by all set partitions, the fourth one-variable relation has three pairings instead of the two noncrossing pairings; with $r_1=0$, $r_2=1$, and all higher cumulants zero, the all-partition formula gives $m_4=3$, while the noncrossing formula gives $m_4=2$. Inverting the all-partition formula recovers classical cumulants rather than free cumulants, so that recursion is not the one in the theorem above. The order-by-order assumption is also necessary: the third-order identity
\begin{align*}
m_3=r_3+3r_2r_1+r_1^3
\end{align*}
does not determine $r_3$ from $m_3$ alone until $r_1$ and $r_2$ are known. Thus the theorem is triangular, not closed at a single order in isolation. At order $n=1$ there is no lower-order correction and $\kappa_1(x)=\varphi(x)$, while at order $n=2$ the correction $\kappa_1(x_1)\kappa_1(x_2)$ already shows why centering changes the computation. The theorem does not reduce the number of noncrossing partitions by itself; it only says that once those lower block contributions are known, the new cumulant is isolated by subtracting them. This is why the next step is to organize the partition sum recursively rather than enumerate $NC(n)$ afresh each time.
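The contrast between all set partitions and noncrossing partitions can be checked by brute force at small orders. The following is a minimal sketch, not part of the theorem: the helper names `set_partitions`, `blocks_cross`, `is_noncrossing`, and `partitions_of` are hypothetical, and the enumeration is only practical for small $n$.

```python
def set_partitions(elements):
    """Yield every set partition of the sorted list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Either add `first` to an existing block ...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ... or let it form a block of its own.
        yield [[first]] + partition


def blocks_cross(V, W):
    """True when some a < c in V has elements of W both strictly inside and outside [a, c]."""
    for a in V:
        for c in V:
            if a < c:
                inside = any(a < w < c for w in W)
                outside = any(w < a or w > c for w in W)
                if inside and outside:
                    return True
    return False


def is_noncrossing(partition):
    return not any(
        blocks_cross(V, W)
        for i, V in enumerate(partition)
        for W in partition[i + 1:]
    )


def partitions_of(n, noncrossing):
    """All set partitions of {1, ..., n}, optionally restricted to noncrossing ones."""
    everything = set_partitions(list(range(1, n + 1)))
    return [p for p in everything if not noncrossing or is_noncrossing(p)]


def is_pairing(partition):
    return all(len(block) == 2 for block in partition)


# With r_1 = 0, r_2 = 1 and all higher cumulants zero, m_4 counts pairings.
classical_m4 = sum(1 for p in partitions_of(4, noncrossing=False) if is_pairing(p))
free_m4 = sum(1 for p in partitions_of(4, noncrossing=True) if is_pairing(p))
nc_counts = [len(partitions_of(n, noncrossing=True)) for n in range(1, 6)]
```

Here `classical_m4` counts the three pairings of four points and `free_m4` the two noncrossing ones, while `nc_counts` lists the sizes of $NC(1),\dots,NC(5)$.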
For single-variable computations this recursion produces a triangular table. Write $m_n=\varphi(x^n)$ and $r_n=\kappa_n(x,\dots,x)$.
[example: Cumulants Through Order Five]
For one variable $x\in A$, keep the notation $m_n=\varphi(x^n)$ and $r_n=\kappa_n(x,\dots,x)$. Since every block of a noncrossing partition contributes one scalar cumulant and all entries are equal to $x$, the coefficient of a monomial is the number of noncrossing partitions with the corresponding block sizes. At every order the one-block partition and the all-singleton partition each occur exactly once. Through order $5$, the remaining counts are: at order $3$, three partitions of type $(2,1)$; at order $4$, four of type $(3,1)$, two of type $(2,2)$, six of type $(2,1,1)$; and at order $5$, five of type $(4,1)$, five of type $(3,2)$, ten of type $(3,1,1)$, ten of type $(2,2,1)$, and ten of type $(2,1,1,1)$. Hence the moment-cumulant relations are
\begin{align*}
m_1=r_1.
\end{align*}
\begin{align*}
m_2=r_2+r_1^2.
\end{align*}
\begin{align*}
m_3=r_3+3r_2r_1+r_1^3.
\end{align*}
\begin{align*}
m_4=r_4+4r_3r_1+2r_2^2+6r_2r_1^2+r_1^4.
\end{align*}
\begin{align*}
m_5=r_5+5r_4r_1+5r_3r_2+10r_3r_1^2+10r_2^2r_1+10r_2r_1^3+r_1^5.
\end{align*}
The triangular recovery starts with
\begin{align*}
r_1=m_1.
\end{align*}
From $m_2=r_2+r_1^2$, subtracting $r_1^2$ gives
\begin{align*}
r_2=m_2-r_1^2.
\end{align*}
Substituting $r_1=m_1$ gives
\begin{align*}
r_2=m_2-m_1^2.
\end{align*}
From $m_3=r_3+3r_2r_1+r_1^3$, subtracting the lower-order terms gives
\begin{align*}
r_3=m_3-3r_2r_1-r_1^3.
\end{align*}
Now substitute $r_1=m_1$ and $r_2=m_2-m_1^2$:
\begin{align*}
r_3=m_3-3(m_2-m_1^2)m_1-m_1^3.
\end{align*}
Distributing the factor $-3m_1$ gives
\begin{align*}
r_3=m_3-3m_2m_1+3m_1^3-m_1^3.
\end{align*}
Combining $3m_1^3-m_1^3=2m_1^3$, we obtain
\begin{align*}
r_3=m_3-3m_2m_1+2m_1^3.
\end{align*}
At order $4$, isolate the one-block cumulant:
\begin{align*}
r_4=m_4-4r_3r_1-2r_2^2-6r_2r_1^2-r_1^4.
\end{align*}
Using the already computed values,
\begin{align*}
4r_3r_1=4(m_3-3m_2m_1+2m_1^3)m_1=4m_3m_1-12m_2m_1^2+8m_1^4.
\end{align*}
\begin{align*}
2r_2^2=2(m_2-m_1^2)^2=2(m_2^2-2m_2m_1^2+m_1^4)=2m_2^2-4m_2m_1^2+2m_1^4.
\end{align*}
\begin{align*}
6r_2r_1^2=6(m_2-m_1^2)m_1^2=6m_2m_1^2-6m_1^4.
\end{align*}
\begin{align*}
r_1^4=m_1^4.
\end{align*}
Adding these four correction terms and collecting like monomials gives
\begin{align*}
4r_3r_1+2r_2^2+6r_2r_1^2+r_1^4=4m_3m_1+2m_2^2+(-12-4+6)m_2m_1^2+(8+2-6+1)m_1^4.
\end{align*}
Thus
\begin{align*}
4r_3r_1+2r_2^2+6r_2r_1^2+r_1^4=4m_3m_1+2m_2^2-10m_2m_1^2+5m_1^4.
\end{align*}
Substituting into the formula for $r_4$ yields
\begin{align*}
r_4=m_4-4m_3m_1-2m_2^2+10m_2m_1^2-5m_1^4.
\end{align*}
At order $5$, isolate $r_5$:
\begin{align*}
r_5=m_5-5r_4r_1-5r_3r_2-10r_3r_1^2-10r_2^2r_1-10r_2r_1^3-r_1^5.
\end{align*}
The six correction terms are expanded as follows:
\begin{align*}
5r_4r_1=5(m_4-4m_3m_1-2m_2^2+10m_2m_1^2-5m_1^4)m_1.
\end{align*}
\begin{align*}
5r_4r_1=5m_4m_1-20m_3m_1^2-10m_2^2m_1+50m_2m_1^3-25m_1^5.
\end{align*}
\begin{align*}
5r_3r_2=5(m_3-3m_2m_1+2m_1^3)(m_2-m_1^2).
\end{align*}
\begin{align*}
5r_3r_2=5(m_3m_2-m_3m_1^2-3m_2^2m_1+3m_2m_1^3+2m_2m_1^3-2m_1^5).
\end{align*}
\begin{align*}
5r_3r_2=5m_3m_2-5m_3m_1^2-15m_2^2m_1+25m_2m_1^3-10m_1^5.
\end{align*}
\begin{align*}
10r_3r_1^2=10(m_3-3m_2m_1+2m_1^3)m_1^2.
\end{align*}
\begin{align*}
10r_3r_1^2=10m_3m_1^2-30m_2m_1^3+20m_1^5.
\end{align*}
\begin{align*}
10r_2^2r_1=10(m_2-m_1^2)^2m_1.
\end{align*}
\begin{align*}
10r_2^2r_1=10(m_2^2-2m_2m_1^2+m_1^4)m_1.
\end{align*}
\begin{align*}
10r_2^2r_1=10m_2^2m_1-20m_2m_1^3+10m_1^5.
\end{align*}
\begin{align*}
10r_2r_1^3=10(m_2-m_1^2)m_1^3=10m_2m_1^3-10m_1^5.
\end{align*}
\begin{align*}
r_1^5=m_1^5.
\end{align*}
Collecting these correction terms by monomial gives the coefficients
\begin{align*}
m_4m_1:\;5.
\end{align*}
\begin{align*}
m_3m_2:\;5.
\end{align*}
\begin{align*}
m_3m_1^2:\;-20-5+10=-15.
\end{align*}
\begin{align*}
m_2^2m_1:\;-10-15+10=-15.
\end{align*}
\begin{align*}
m_2m_1^3:\;50+25-30-20+10=35.
\end{align*}
\begin{align*}
m_1^5:\;-25-10+20+10-10+1=-14.
\end{align*}
Therefore
\begin{align*}
5r_4r_1+5r_3r_2+10r_3r_1^2+10r_2^2r_1+10r_2r_1^3+r_1^5=5m_4m_1+5m_3m_2-15m_3m_1^2-15m_2^2m_1+35m_2m_1^3-14m_1^5.
\end{align*}
Substituting this correction into the isolated formula for $r_5$ gives
\begin{align*}
r_5=m_5-5m_4m_1-5m_3m_2+15m_3m_1^2+15m_2^2m_1-35m_2m_1^3+14m_1^5.
\end{align*}
Thus each recovered $r_n$ is obtained by isolating the unique one-block cumulant in the order-$n$ relation and replacing every lower cumulant by the values already recovered from smaller orders.
[/example]
This table shows the shape of the recursion: each new cumulant appears once and linearly. In computations involving several variables, the same triangular pattern holds, but the order of the entries inside each cumulant must be preserved.
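The triangular recovery can be automated for one variable. The sketch below is one possible implementation, not the method fixed by the theorem: it uses the standard one-variable form of the recursion, in which the block containing the first position has size $s$ and leaves $s$ gaps whose lengths sum to $n-s$, so that $m_n=\sum_{s=1}^n r_s\sum_{i_1+\cdots+i_s=n-s} m_{i_1}\cdots m_{i_s}$ with $m_0=1$. The function names `gap_sum` and `cumulants_from_moments` are hypothetical.

```python
def gap_sum(m, s, t):
    """Sum of m[i_1] * ... * m[i_s] over all i_j >= 0 with i_1 + ... + i_s = t.

    Computed as the coefficient of z^t in (sum_k m[k] z^k)^s, truncated at degree t.
    """
    coeffs = [1] + [0] * t  # the constant polynomial 1
    for _ in range(s):
        new = [0] * (t + 1)
        for i, c in enumerate(coeffs):
            if c:
                for j in range(t + 1 - i):
                    new[i + j] += c * m[j]
        coeffs = new
    return coeffs[t]


def cumulants_from_moments(moments):
    """Given [m_1, ..., m_N], return the free cumulants [r_1, ..., r_N]."""
    m = [1] + list(moments)  # m[0] = 1 is the moment of the empty word
    r = [0]                  # r[0] is unused padding so that r[s] is the s-th cumulant
    for n in range(1, len(m)):
        # Contributions in which the first block has size s < n use only lower data.
        lower = sum(r[s] * gap_sum(m, s, n - s) for s in range(1, n))
        r.append(m[n] - lower)  # the one-block term is what remains
    return r[1:]


# Moments 0, 1, 0, 2, 0, 5 (the Catalan pattern of a standard semicircular variable).
semicircular_cumulants = cumulants_from_moments([0, 1, 0, 2, 0, 5])
# A non-centered test case, to compare with the closed formulas through order five.
generic_cumulants = cumulants_from_moments([1, 2, 3, 5, 8])
```

For the moments $1,2,3,5,8$ the closed formulas of the example give $r_1=1$, $r_2=1$, $r_3=-1$, $r_4=0$, $r_5=2$, which is what the recursion returns. Integer or `fractions.Fraction` inputs keep the arithmetic exact.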
## Block Recursion Over Noncrossing Partitions
The next problem is to evaluate sums over $NC(n)$ without generating all partitions first. Noncrossing partitions decompose around endpoint blocks, and this feature lets us reduce a word to smaller intervals.
[definition: Interval Block]
Let $\pi\in NC(n)$. A block $V\in\pi$ is an interval block if $V=\{p,p+1,\dots,q\}$ for some $1\le p\le q\le n$.
[/definition]
Interval blocks are the computational handles in $NC(n)$, but an endpoint block gives an even more systematic recursion because it splits the complement into independent intervals. The point is to avoid listing every noncrossing partition of a long word: once the block containing the first position is chosen, noncrossingness forces the remaining positions to break into smaller subwords whose moment contributions can be computed recursively.
[quotetheorem:7156]
[proofunderconstruction:7156]
The noncrossing hypothesis is essential here. For an arbitrary set partition of $\{1,2,3,4\}$, take the endpoint block $V=\{1,3\}$ and the remaining block $\{2,4\}$. The complement of $V$ consists of the intervals $\{2\}$ and $\{4\}$, but the block $\{2,4\}$ connects them, so its contribution is $\kappa_2(x_2,x_4)$ rather than a product of independent interval moments $M[2,2]M[4,4]$. This is the concrete word-level failure that noncrossingness prevents. The endpoint requirement is also part of the mechanism: choosing a block not anchored at the first point does not decompose the whole word $x_1\cdots x_n$ into the displayed left-to-right interval recursion. A boundary case is $V=\{1\}$, where the first factor is $\kappa_1(x_1)$ and the only nonempty interval is the tail $\{2,\dots,n\}$; if $x_1$ is centered, this entire branch vanishes. The formula still does not compute cumulant values by itself, since those must come from marginal data or earlier recursive recovery. What it supplies is the bookkeeping structure, and a dynamic program stores moment values of subwords to avoid recomputing the same interval contributions.
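The bookkeeping behind the endpoint decomposition can be illustrated without any cumulant values. In the sketch below, `gaps` lists the intervals left over once the block containing position $1$ is removed, and `count_nc` counts noncrossing partitions by choosing that block and recursing on each gap. Both names are hypothetical. This is a counting check under the assumption that each noncrossing partition corresponds to exactly one choice of first block together with a noncrossing partition of every gap; it is not an efficient enumeration, since it still ranges over all subsets containing $1$.

```python
from functools import lru_cache
from itertools import combinations


def gaps(block, n):
    """Nonempty intervals (p, q), inclusive, left after removing `block` from 1..n.

    `block` is a sorted tuple of positions whose first entry is 1.
    """
    bounds = tuple(block) + (n + 1,)
    return [(a + 1, b - 1) for a, b in zip(bounds, bounds[1:]) if a + 1 <= b - 1]


@lru_cache(maxsize=None)
def count_nc(n):
    """Number of noncrossing partitions of {1, ..., n}, by endpoint-block recursion."""
    if n == 0:
        return 1  # the empty interval carries exactly one (empty) partition
    total = 0
    for k in range(n):
        for rest in combinations(range(2, n + 1), k):
            product = 1
            for p, q in gaps((1,) + rest, n):
                product *= count_nc(q - p + 1)
            total += product
    return total


nc_sizes = [count_nc(n) for n in range(1, 7)]
```

For a six-letter word, `gaps((1, 3), 6)` returns the intervals $[2,2]$ and $[4,6]$, and `gaps((1, 5), 6)` returns $[2,4]$ and $[6,6]$. The list `nc_sizes` reproduces the Catalan numbers $1,2,5,14,42,132$.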
[example: Recursive Evaluation Of An Alternating Word]
Let $a,b\in A$ be centered free variables, and write the word as $x_1x_2x_3x_4x_5x_6=ababab$. We compute $M[1,6]=\varphi(ababab)$ by endpoint-block recursion, where $M[p,q]=\varphi(x_p\cdots x_q)$ and $M[p,q]=1$ for an empty interval. By freeness, every cumulant block containing both $a$ and $b$ vanishes, so the endpoint block containing position $1$ can only use the $a$-positions $1,3,5$.
Thus the possible endpoint blocks are $\{1\}$, $\{1,3\}$, $\{1,5\}$, and $\{1,3,5\}$. Applying *[Endpoint Block Decomposition](/theorems/7156)* to these four choices gives
\begin{align*}
\varphi(ababab)=\kappa_1(a)M[2,6]+\kappa_2(a,a)M[2,2]M[4,6]+\kappa_2(a,a)M[2,4]M[6,6]+\kappa_3(a,a,a)M[2,2]M[4,4]M[6,6].
\end{align*}
Since $a$ is centered,
\begin{align*}
\kappa_1(a)=\varphi(a)=0.
\end{align*}
Since $b$ is centered, each one-letter $b$ interval has moment zero:
\begin{align*}
M[2,2]=\varphi(b)=0.
\end{align*}
\begin{align*}
M[4,4]=\varphi(b)=0.
\end{align*}
\begin{align*}
M[6,6]=\varphi(b)=0.
\end{align*}
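Every term of the expansion therefore already contains a vanishing factor: $\kappa_1(a)$ in the first, $M[2,2]$ in the second and fourth, and $M[6,6]$ in the third. The multi-letter intervals $M[2,6]$ and $M[4,6]$ need not be evaluated. To illustrate how a multi-letter interval is handled, we also evaluate $M[2,4]=\varphi(bab)$ directly. Its colour-respecting noncrossing partitions are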
\begin{align*}
\{\{2\},\{3\},\{4\}\}.
\end{align*}
and
\begin{align*}
\{\{2,4\},\{3\}\}.
\end{align*}
The first partition contributes
\begin{align*}
\kappa_1(b)\kappa_1(a)\kappa_1(b)=0\cdot 0\cdot 0=0.
\end{align*}
The second partition contributes
\begin{align*}
\kappa_2(b,b)\kappa_1(a)=\kappa_2(b,b)\cdot 0=0.
\end{align*}
Therefore
\begin{align*}
M[2,4]=\varphi(bab)=0+0=0.
\end{align*}
Substituting these values into the endpoint expansion gives
\begin{align*}
\varphi(ababab)=0\cdot M[2,6]+\kappa_2(a,a)\cdot 0\cdot M[4,6]+\kappa_2(a,a)\cdot 0\cdot 0+\kappa_3(a,a,a)\cdot 0\cdot 0\cdot 0.
\end{align*}
Each summand is zero:
\begin{align*}
0\cdot M[2,6]=0.
\end{align*}
\begin{align*}
\kappa_2(a,a)\cdot 0\cdot M[4,6]=0.
\end{align*}
\begin{align*}
\kappa_2(a,a)\cdot 0\cdot 0=0.
\end{align*}
\begin{align*}
\kappa_3(a,a,a)\cdot 0\cdot 0\cdot 0=0.
\end{align*}
Hence
\begin{align*}
\varphi(ababab)=0+0+0+0=0.
\end{align*}
The vanishing occurs because, in the alternating centered word, every colour-respecting endpoint branch contains either the factor $\kappa_1(a)$ or a centered one-letter interval; for less alternating words, the surviving branches are evaluated by replacing each monochromatic block with the corresponding marginal cumulant.
[/example]
The filtering rule in the example is the central computational simplification: freeness removes all mixed-colour blocks before any arithmetic begins. The next theorem is needed to justify why this filtering, together with the marginal cumulants, determines every mixed moment rather than only the displayed example.
[quotetheorem:7157]
[citeproof:7157]
This theorem is the computational form of freeness, and the freeness hypothesis is doing real work. Marginal cumulants alone do not determine a joint law in general. For a concrete scalar example, let $\varepsilon$ be a Rademacher random variable with $\mathbb P(\varepsilon=1)=\mathbb P(\varepsilon=-1)=1/2$. The pairs $(X,Y)=(\varepsilon,\varepsilon)$ and $(X',Y')=(\varepsilon,-\varepsilon)$ have the same marginal distribution in each coordinate, but
\begin{align*}
\mathbb E[XY]=1,\qquad \mathbb E[X'Y']=-1.
\end{align*}
The same obstruction appears in noncommutative notation: knowing the separate laws of $a$ and $b$ does not determine $\varphi(ab)$, $\varphi(aba)$, or higher mixed words unless a joint relation such as freeness is imposed. Classical independence is also not a substitute, because it concerns commuting random variables or tensor-product factorisation of classical events rather than the vanishing of mixed free cumulants in noncommuting words. Freeness supplies exactly the missing instruction: discard every cumulant block that contains more than one colour, then evaluate the surviving blocks from marginal data.
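The filtering rule combines with endpoint-block recursion into a small dynamic program. The sketch below is an illustration under stated assumptions: distinct letters stand for mutually free variables, each letter's marginal free cumulants are supplied as a dictionary from order to value (missing orders are treated as zero), and the function name `free_word_moment` is hypothetical. It enumerates every monochromatic block containing the first position of a subword, so it is intended for short words only.

```python
from functools import lru_cache
from itertools import combinations


def free_word_moment(word, cumulants):
    """Moment of `word` when distinct letters denote mutually free variables.

    `cumulants[letter][k]` is the k-th free cumulant of that letter.
    """

    def kappa(letter, order):
        return cumulants.get(letter, {}).get(order, 0)

    @lru_cache(maxsize=None)
    def M(p, q):
        """Moment of the subword word[p:q] (0-based, half-open); empty gives 1."""
        if p >= q:
            return 1
        letter = word[p]
        # Freeness: the block containing position p may only use the same letter.
        same = [i for i in range(p + 1, q) if word[i] == letter]
        total = 0
        for k in range(len(same) + 1):
            for rest in combinations(same, k):
                block = (p,) + rest
                c = kappa(letter, len(block))
                if c == 0:
                    continue  # this branch vanishes, skip its gaps entirely
                term = c
                bounds = block + (q,)
                for a, b in zip(bounds, bounds[1:]):
                    term *= M(a + 1, b)  # moment of the gap after position a
                total += term
        return total

    return M(0, len(word))


centered = {"a": {2: 1, 3: 7}, "b": {2: 4}}
alternating = free_word_moment("ababab", centered)
grouped = free_word_moment("aabb", centered)
```

With centered letters, `alternating` is $0$, while `grouped` equals $\kappa_2(a,a)\kappa_2(b,b)=4$. With nonzero means, for instance $\kappa_1(a)=1$, $\kappa_2(a,a)=2$, $\kappa_1(b)=3$, $\kappa_2(b,b)=5$, the word $abab$ evaluates to $\kappa_2(a,a)\kappa_1(b)^2+\kappa_1(a)^2\kappa_2(b,b)+\kappa_1(a)^2\kappa_1(b)^2=32$.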
## Tracial Simplifications And Cyclic Words
The next question is how much a word can be simplified before applying cumulants. In a tracial noncommutative probability space, moments do not become commutative, but cyclic rotations of a word have the same value.
[definition: Cyclic Equivalence Of Words]
Let $X$ be an alphabet and fix $n\ge 1$. On the set $X^n$ of words of length $n$, two words $w=x_1\cdots x_n$ and $w'=y_1\cdots y_n$ are cyclically equivalent if there exists $0\le k<n$ such that
\begin{align*}
w'=x_{k+1}x_{k+2}\cdots x_nx_1\cdots x_k.
\end{align*}
[/definition]
Cyclic equivalence is weaker than commutation: it moves the start point of the word but does not allow arbitrary swaps. In a tracial computation, the useful simplification is not to reorder letters freely but to choose a better starting point for the same cyclic word.
To use this relation in moment calculations, one needs a bridge from the combinatorial rotation of letters to the analytic identity for the state. The relevant question is exactly which rotations are harmless under the trace property, and why this does not license arbitrary rearrangement of the word.
[quotetheorem:7158]
[citeproof:7158]
The tracial hypothesis is essential: in a nontracial state, $\varphi(uv)$ and $\varphi(vu)$ may differ, so even the words $ab$ and $ba$ need not have the same moment. The conclusion also stops at cyclic rotation; it does not permit arbitrary permutation of letters, so words such as $abab$ and $aabb$ remain genuinely different. Computationally, cyclic reduction is therefore a representative-selection step rather than a commutativisation step. It should happen before partition enumeration, because a convenient representative can place a repeated letter at the beginning or expose singleton blocks that vanish after centering.
[example: Cyclic Simplification Of A Tracial Word Moment]
Let $(A,\varphi)$ be tracial, and consider the word $a^2bab$, whose letters are $a,a,b,a,b$. We rotate the final $b$ to the front by writing the word as a product $uv$ with $u=a^2ba$ and $v=b$:
\begin{align*}
\varphi(a^2bab)=\varphi((a^2ba)b).
\end{align*}
The trace identity $\varphi(uv)=\varphi(vu)$ gives
\begin{align*}
\varphi((a^2ba)b)=\varphi(b(a^2ba)).
\end{align*}
Multiplying inside the algebra without changing the order of the remaining letters,
\begin{align*}
b(a^2ba)=ba^2ba.
\end{align*}
Therefore
\begin{align*}
\varphi(a^2bab)=\varphi(ba^2ba).
\end{align*}
If $a$ and $b$ are centered, then
\begin{align*}
\kappa_1(a)=\varphi(a)=0.
\end{align*}
and
\begin{align*}
\kappa_1(b)=\varphi(b)=0.
\end{align*}
Now suppose in addition that $a$ and $b$ are free, and expand $\varphi(ba^2ba)$ by the Speicher Moment-Cumulant Formula over noncrossing partitions; by freeness, only colour-respecting partitions contribute. If a colour-respecting partition has a singleton block at an $a$-position, its cumulant product contains the factor $\kappa_1(a)=0$, so the whole product is $0$. If it has a singleton block at a $b$-position, its cumulant product contains the factor $\kappa_1(b)=0$, so the whole product is again $0$.
Cyclic reduction has not changed the moment value; it has only replaced $a^2bab$ by the equal cyclic representative $ba^2ba$, where singleton centered blocks can be detected before enumerating the remaining noncrossing partitions.
[/example]
Cyclic reduction does not identify $abab$ with $aabb$. The first word is alternating, while the second has adjacent equal letters; their moments can differ in a tracial free product.
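Choosing a cyclic representative is easy to mechanize. The sketch below picks the lexicographically smallest rotation as a canonical representative; that particular choice and the function names are assumptions made for illustration, since any fixed rule for selecting one rotation would serve the same purpose. Words are taken to have length $n\ge 1$, as in the definition.

```python
def rotations(word):
    """All cyclic rotations of a nonempty word, starting with the word itself."""
    return [word[k:] + word[:k] for k in range(len(word))]


def cyclic_representative(word):
    """Lexicographically smallest rotation, used as a canonical representative."""
    return min(rotations(word))


def cyclically_equivalent(w1, w2):
    """True when w2 is a rotation of w1; letters are never swapped."""
    return len(w1) == len(w2) and w2 in rotations(w1)


same_class = cyclically_equivalent("aabab", "baaba")
different_class = cyclically_equivalent("abab", "aabb")
```

Here `same_class` is `True`, matching the rotation of $a^2bab$ into $ba^2ba$, while `different_class` is `False`: no rotation turns $abab$ into $aabb$.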
## Order Sensitivity And Units
The final computational problem is to avoid importing commutative habits into noncommutative calculations. Most mistakes in hand computations come from reordering letters, forgetting that constants are scalar multiples of the unit, or applying freeness to products that are not alternating and centered.
[remark: Noncommuting Products]
In a noncommutative probability space, $ab$ and $ba$ are different elements in general. A tracial state gives $\varphi(ab)=\varphi(ba)$, but it does not imply $ab=ba$, nor does it imply $\varphi(abc)=\varphi(acb)$.
[/remark]
This distinction matters when forming cumulants: the cumulant $\kappa_3(a,b,a)$ is not the same datum as $\kappa_3(a,a,b)$ unless a separate symmetry argument applies. A second frequent source of errors is forgetting that centering introduces the algebra unit.
[remark: Missing Units]
The unital condition in the definition of freeness is part of the computational setup. Constants are scalar multiples of $1_A$, and centering a variable means replacing $x$ by $x-\varphi(x)1_A$. Forgetting the unit leads to incorrect expansions of products such as $(a-\varphi(a)1_A)b(a-\varphi(a)1_A)$.
[/remark]
Units also control cumulants involving constants, so they are not harmless decoration in a formula. When centered variables are expanded as $x-\varphi(x)1_A$, many terms contain an explicit unit entry. The obstruction is to know whether such a unit can form a genuinely higher-order cumulant block, or whether it merely collapses the expression to lower-order moment data.
[quotetheorem:7159]
[citeproof:7159]
The restriction $n\ge 2$ is necessary, because $\kappa_1(1_A)=\varphi(1_A)=1$, not $0$. The unital and normalized setup is also essential. Without an algebra unit there is no distinguished entry $1_A$ to delete from a word, and if a linear functional on a unital algebra were not normalized, the base computation would give
\begin{align*}
\kappa_2(x,1_A)=\varphi(x)-\varphi(x)\varphi(1_A),
\end{align*}
which need not vanish when $\varphi(1_A)\ne 1$. Thus the result is not saying that constants disappear in every algebraic setting; it says that in a normalized noncommutative probability space, the unit cannot participate in a genuinely higher-order cumulant block. This is exactly what is needed when centered variables are expanded as $x-\varphi(x)1_A$: terms containing scalar multiples of the unit collapse to lower-order moment data instead of creating new mixed cumulants. The worked example on centered alternating words then combines centering with freeness and noncrossingness, and it also shows why crossing pairings from classical Wick-style intuition cannot be imported into free computations.
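The unit rule can be checked in a small concrete model. The sketch below is an illustration under assumptions: it takes $2\times 2$ integer matrices with the normalized trace as the state, and it computes multilinear free cumulants by subtracting, from the word moment, every endpoint-block term whose first block is not the full block. The names `I2`, `matmul`, `phi`, `word_moment`, and `kappa` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

I2 = ((1, 0), (0, 1))  # the unit of the 2x2 matrix algebra


def matmul(x, y):
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def phi(x):
    """Normalized trace, so phi(I2) = 1."""
    return Fraction(x[0][0] + x[1][1], 2)


def word_moment(xs):
    """phi of the ordered product of the tuple xs; the empty word gives 1."""
    product = I2
    for x in xs:
        product = matmul(product, x)
    return phi(product)


def kappa(xs):
    """Free cumulant kappa_n(xs[0], ..., xs[n-1]) for a nonempty tuple xs."""
    n = len(xs)
    total = word_moment(xs)
    for k in range(n - 1):  # first blocks of size 1, ..., n - 1
        for rest in combinations(range(1, n), k):
            block = (0,) + rest
            term = kappa(tuple(xs[i] for i in block))
            bounds = block + (n,)
            for a, b in zip(bounds, bounds[1:]):
                term *= word_moment(xs[a + 1:b])  # moment of the gap after a
            total -= term
    return total


x = ((1, 2), (3, 4))
y = ((0, 1), (1, 5))
unit_first_order = kappa((I2,))
unit_in_second_order = kappa((x, I2))
unit_in_third_order = kappa((x, I2, y))
```

In this model `unit_first_order` is $1$, while the cumulants of order at least $2$ that contain the unit vanish. By contrast $\kappa_2(x,y)=\varphi(xy)-\varphi(x)\varphi(y)=25/4$ for these two matrices, so the vanishing is caused by the unit entry and not by the model.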
[example: Centered Alternating Fourth Moment]
Let $a,b\in A$ be centered and free, so
\begin{align*}
\kappa_1(a)=\varphi(a)=0.
\end{align*}
and
\begin{align*}
\kappa_1(b)=\varphi(b)=0.
\end{align*}
For the word $abab$, the positions of $a$ are $\{1,3\}$ and the positions of $b$ are $\{2,4\}$. By freeness, a cumulant block containing both $a$ and $b$ contributes $0$, so only monochromatic blocks can survive in the moment-cumulant expansion.
The colour-respecting partitions of $\{1,2,3,4\}$ are the all-singleton partition, the partition with block $\{1,3\}$ and singleton blocks $\{2\},\{4\}$, the partition with block $\{2,4\}$ and singleton blocks $\{1\},\{3\}$, and the pairing $\{\{1,3\},\{2,4\}\}$. The last one is crossing, because $1<2<3<4$ with $1,3$ in one block and $2,4$ in the other, so it is not in $NC(4)$. Therefore the only colour-respecting noncrossing contributions are
\begin{align*}
\kappa_1(a)\kappa_1(b)\kappa_1(a)\kappa_1(b)=0\cdot 0\cdot 0\cdot 0=0.
\end{align*}
\begin{align*}
\kappa_2(a,a)\kappa_1(b)\kappa_1(b)=\kappa_2(a,a)\cdot 0\cdot 0=0.
\end{align*}
\begin{align*}
\kappa_2(b,b)\kappa_1(a)\kappa_1(a)=\kappa_2(b,b)\cdot 0\cdot 0=0.
\end{align*}
Thus the moment-cumulant sum has no nonzero term, and
\begin{align*}
\varphi(abab)=0+0+0=0.
\end{align*}
For the word $a^2bab^2$, the six positions are
\begin{align*}
1:a,\quad 2:a,\quad 3:b,\quad 4:a,\quad 5:b,\quad 6:b.
\end{align*}
A surviving block must be contained either in the $a$-positions $\{1,2,4\}$ or in the $b$-positions $\{3,5,6\}$, again by freeness. Since $a$ and $b$ are centered, no surviving partition may contain a singleton block. The three $a$-positions cannot be partitioned without singletons except as the single block $\{1,2,4\}$, and the three $b$-positions cannot be partitioned without singletons except as the single block $\{3,5,6\}$. Hence the only possible colour-respecting partition without centered singleton blocks is
\begin{align*}
\{\{1,2,4\},\{3,5,6\}\}.
\end{align*}
This partition is crossing, because
\begin{align*}
2<3<4<5
\end{align*}
with $2,4$ in the block $\{1,2,4\}$ and $3,5$ in the block $\{3,5,6\}$. Therefore it is not in $NC(6)$. Every colour-respecting noncrossing partition of the word $a,a,b,a,b,b$ has a centered singleton block, so every term in the moment-cumulant expansion is $0$. Consequently
\begin{align*}
\varphi(a^2bab^2)=0.
\end{align*}
Both vanishings come from the same obstruction: freeness allows only monochromatic cumulant blocks, centering kills singleton blocks, and noncrossingness forbids the only block pattern that could otherwise cover the interleaved colours without singletons.
[/example]
The computation toolkit can now be summarized as a sequence. First, center variables when freeness hypotheses are stated for centered alternating products. Second, use tracial cyclicity to choose a convenient cyclic representative when the state is tracial. Third, discard mixed cumulant blocks by freeness and discard singleton centered blocks. Finally, evaluate the remaining noncrossing partitions by endpoint-block recursion.
## Connections and Further Reading
These notes isolate the algebraic and combinatorial foundations of free probability: noncommutative probability spaces, joint laws, freeness, noncrossing partitions, Möbius inversion, free cumulants, and first limit theorems. A natural continuation can keep the same first-principles style while asking three questions already visible here: how later transform methods compute free convolution, how moment limit theorems are organized by cumulants, and how positivity assumptions make moment functionals behave like measures. The last question connects the course to the theory of [measure spaces](/page/Measure%20Space), but measure-theoretic tools are only background motivation on this page, not hidden hypotheses.
Two internal themes are worth keeping visible when moving beyond this foundation. First, every analytic extension must still respect the algebraic data introduced here: the unit, the state, joint laws, and positivity. Second, the combinatorics of $NC(n)$ is not only a bookkeeping device for moments; it is the mechanism behind free convolution and the contrast between tensor independence and free independence. Follow-up notes can therefore develop transform methods, semicircular examples, and free Poisson examples without rebuilding the formal language introduced here.
## References
- Alexandru Nica and Roland Speicher, *Lectures on the Combinatorics of Free Probability*.
- Dan Voiculescu, Ken Dykema, and Alexandru Nica, *Free Random Variables*.
- James A. Mingo and Roland Speicher, *Free Probability and Random Matrices*, foundational chapters.
- Uffe Haagerup and Steen Thorbjørnsen, lecture notes on free probability, for supplementary operator examples.
Contents
- Introduction
- Why Noncommutative Probability Starts With Moments
- States as Expectations
- Laws and Joint Moments
- Independence and the Need for a New Rule
- Combinatorics as the Organising Device
- The Free Central Limit Theorem as Destination
- How to Read the Course
- 1. Noncommutative Probability Spaces
- Algebraic Observables and States
- Moment Functionals and Positivity
- Classical Probability as the Commutative Case
- Matrix Models and Tracial Moments
- 2. Laws and Moment Problems in the Tracial Setting
- Moment Sequences for One Self-Adjoint Variable
- Joint Laws as Functionals on Noncommutative Polynomials
- Tracial Laws and Cyclic Equivalence of Words
- Compact Support and Bounded Operator Realization
- 3. Classical Independence Versus Free Independence
- Tensor Independence and Product States
- Alternating Centred Products
- Freeness of Generated Algebras and Families of Variables
- Recursive Mixed-Moment Computations
- Pairwise Freeness and Joint Freeness
- 4. Noncrossing Partitions
- The Problem of Crossing Blocks
- Refinement Order And Interval Blocks
- Kreweras Complement And Catalan Enumeration
- Incidence Algebra And The Mobius Function
- 5. Free Cumulants
- Moments Indexed by Noncrossing Partitions
- Möbius Inversion and Uniqueness
- Multilinear Cumulants of Several Variables
- Low-Order Identities and Centered Variables
- 6. Freeness via Cumulants
- Mixed Cumulants and the Criterion for Freeness
- Additivity of Free Cumulants for Sums
- Products of Free Variables and Partition Refinements
- Centering Tricks and Reduction of Mixed Moments
- 7. Semicircular Variables and Free Wick Formula
- Semicircular Law by Moments and Catalan Numbers
- Characterization by Free Cumulants
- Semicircular Families with Covariance Matrix
- Free Wick Formula Using Noncrossing Pairings
- 8. The Free Central Limit Theorem
- Identically Distributed Free Variables and Normalization
- Cumulant Scaling Under Sums
- Statement and Proof of the Free Central Limit Theorem
- Multivariate Free Central Limit Theorem and Semicircular Systems
- 9. Standard Constructions and Models
- Free Products of Noncommutative Probability Spaces
- Reduced Words and Moment Computations
- Creation and Annihilation Operators on Full Fock Space
- Semicircular Variables from Fock Space
- 10. Foundational Computation Toolkit
- Computing Moments and Cumulants Recursively
- Block Recursion Over Noncrossing Partitions
- Tracial Simplifications And Cyclic Words
- Order Sensitivity And Units
- Connections and Further Reading
- References
Free Probability I: Noncommutative Foundations
Created by admin on 6/15/2026 | Last updated on 6/15/2026