[proofplan]
The proof rests on a separation lemma: given distinct signatures $h_1, \dots, h_n \in \mathcal{S}_p$, there exists a linear functional $f \in T((V))^*$ with $f(h_1) = 1$ and $f(h_i) = 0$ for $i = 2, \dots, n$. To construct $f$, we first separate $h_1$ from each $h_i$ ($i \ge 2$) at a finite truncation level. Distinctness in $T((V))$ implies that the truncations $\pi_{\le N} h_1$ and $\pi_{\le N} h_i$ are distinct for some finite $N$. They are in fact linearly independent, because signatures share the level-$0$ value $1$, so two truncated signatures that agree up to a scalar are equal. The Hahn-Banach theorem (in finite dimensions, elementary linear algebra) then produces a separating functional $l_i$ at the truncated level, which we lift to $f_i = l_i \circ \pi_{\le N}$ on $T((V))$. The shuffle identity converts the product $\prod_{j=2}^n f_j(h_i)$ into a single pairing $(\shuffle_{j=2}^n f_j)(h_i)$, and we set $f := \shuffle_{j=2}^n f_j$. Linear independence then follows. Applying $f$ to $\sum_i \lambda_i h_i = 0$ gives $\lambda_1 = 0$, and repeating the construction with each $h_k$ in the role of $h_1$ yields $\lambda_k = 0$ for every $k$.
[/proofplan]
[step:Reduce to constructing a separating functional for $h_1$ against $h_2, \dots, h_n$]
Suppose $\sum_{i=1}^n \lambda_i h_i = 0$ in $T((V))$ for some scalars $\lambda_1, \dots, \lambda_n \in \mathbb{R}$. We aim to show $\lambda_i = 0$ for every $i$. Since the indices play interchangeable roles, it suffices to show $\lambda_1 = 0$; the same argument with the indices permuted then yields $\lambda_2 = \dots = \lambda_n = 0$.
To show $\lambda_1 = 0$, we exhibit a linear functional $f \in T((V))^*$ such that $f(h_1) = 1$ and $f(h_i) = 0$ for $i = 2, \dots, n$. Then applying $f$ to the relation $\sum_i \lambda_i h_i = 0$:
\begin{align*}
0 = f\!\left(\sum_{i=1}^n \lambda_i h_i\right) = \sum_{i=1}^n \lambda_i f(h_i) = \lambda_1 \cdot 1 + \sum_{i=2}^n \lambda_i \cdot 0 = \lambda_1,
\end{align*}
so $\lambda_1 = 0$. Repeating the construction with each $h_k$ in place of $h_1$ yields $\lambda_k = 0$ for all $k$. The remainder of the proof builds such an $f$.
[guided]
The standard recipe for proving that $h_1, \dots, h_n$ are linearly independent is to exhibit a **dual family** (a dual basis for their span). This means a linear functional $f^{(k)}$ for each $k$, with $f^{(k)}(h_k) = 1$ and $f^{(k)}(h_i) = 0$ for $i \ne k$, so that it picks out $h_k$ and kills the others.
Suppose $\sum_{i=1}^n \lambda_i h_i = 0$ with $\lambda_i \in \mathbb{R}$. We want $\lambda_i = 0$ for all $i$. By symmetry (the indices $i = 1, \dots, n$ play interchangeable roles in the hypothesis), it suffices to show $\lambda_1 = 0$ — and then by relabelling $h_k$ as the "first" element, the same argument gives $\lambda_k = 0$.
The strategy: construct $f \in T((V))^*$ with $f(h_1) = 1$ and $f(h_i) = 0$ for $i = 2, \dots, n$. If we have such an $f$, we apply it to the relation:
\begin{align*}
0 = f\!\left(\sum_{i=1}^n \lambda_i h_i\right) = \sum_{i=1}^n \lambda_i \, f(h_i) = \lambda_1 \cdot 1 + 0 = \lambda_1.
\end{align*}
So the entire proof reduces to **constructing $f$**. The construction has three ingredients:

(i) Two distinct elements of $T((V))$ already differ at some finite level, so $h_1$ can be separated from each $h_i$ inside a truncation $T_{\le N}(V)$. This is combined with the normalisation $h^{(0)} = 1$ shared by all signatures.

(ii) The Hahn-Banach theorem in finite dimensions.

(iii) The **shuffle identity**, which combines the pairwise separations into a single functional by multiplying their values.

The structure of signatures enters through the normalisation in (i) and through (iii).
[/guided]
[/step]
[step:Truncate at a finite level to separate $h_1$ from each $h_i$ in finite-dimensional space]
For each $i \in \{2, \dots, n\}$, we have $h_1 \ne h_i$ in $T((V))$, so the two differ in at least one level. Hence there exists a level $N_i \in \mathbb{N}$ such that $\pi_{\le N_i}(h_1) \ne \pi_{\le N_i}(h_i)$ in $T_{\le N_i}(V) = \bigoplus_{k=0}^{N_i} V^{\otimes k}$. Take
\begin{align*}
N := \max_{i \in \{2, \dots, n\}} N_i,
\end{align*}
so that
\begin{align*}
g_j := \pi_{\le N}(h_j), \qquad j = 1, \dots, n,
\end{align*}
gives $g_1 \ne g_i$ for every $i \ge 2$. Indeed, since $N_i \le N$, the levels $0, \dots, N_i$ at which $h_1$ and $h_i$ differ are among those retained by $\pi_{\le N}$. (We use $\pi_{\le N} := \bigoplus_{k=0}^N \pi_k$, the projection onto levels $0$ through $N$. It is linear from $T((V))$ to $T_{\le N}(V) := \bigoplus_{k=0}^N V^{\otimes k}$, a space that is finite-dimensional when $V$ is.)
We now show that, for each $i \ge 2$, the elements $g_1$ and $g_i$ are linearly independent in $T_{\le N}(V)$. Every signature $h \in \mathcal{S}_p$ has $h^{(0)} = 1$, since the zeroth iterated integral of any path is the constant $1$ by definition of the signature. So $g_1^{(0)} = h_1^{(0)} = 1$ and $g_i^{(0)} = h_i^{(0)} = 1$. In particular $g_i \ne 0$, so a linear dependence between $g_1$ and $g_i$ would force $g_1 = \mu g_i$ for some scalar $\mu \in \mathbb{R}$. Suppose for contradiction that this holds. Comparing level-$0$ components,
\begin{align*}
1 = \mu \cdot 1, \qquad \text{hence } \mu = 1.
\end{align*}
But then $g_1 = g_i$, contradicting the choice of $N$. Therefore $\{g_1, g_i\}$ is linearly independent in $T_{\le N}(V)$.
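A minimal sketch of the truncation step follows. It assumes, for illustration only, piecewise-linear paths in $\mathbb{R}^2$ (these have bounded variation, so they fall in the case $p = 1$) and exact rational arithmetic. A truncated signature is stored as a dict from words (tuples of coordinate indices, with `()` for level $0$) to coefficients. It is built segment by segment from the tensor exponential $\exp(\Delta)$ of each increment, combined via Chen's identity. The helpers `signature` and `first_differing_level` are hypothetical names, not part of any library.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def signature(points, N):
    """Truncated signature {word: coefficient} (words of length <= N) of the
    piecewise-linear path through `points`; a straight segment with increment
    delta has signature exp(delta), and segments are joined by Chen's identity."""
    d = len(points[0])
    words = [w for k in range(N + 1) for w in product(range(d), repeat=k)]
    sig = {w: Fraction(int(len(w) == 0)) for w in words}  # 1 at level 0, else 0
    for p, q in zip(points, points[1:]):
        delta = [Fraction(b) - Fraction(a) for a, b in zip(p, q)]
        seg = {}
        for w in words:
            c = Fraction(1, factorial(len(w)))
            for i in w:
                c *= delta[i]
            seg[w] = c
        # Chen: the coefficient of w in sig (x) seg sums over splittings w = w[:j] w[j:].
        sig = {w: sum(sig[w[:j]] * seg[w[j:]] for j in range(len(w) + 1)) for w in words}
    return sig

def first_differing_level(s, t):
    """Smallest level at which two truncated signatures differ (None if they agree)."""
    diff = [len(w) for w in s if s[w] != t[w]]
    return min(diff) if diff else None

paths = [
    [(0, 0), (1, 0), (1, 1)],   # h_1
    [(0, 0), (0, 1), (1, 1)],   # h_2: same increment as h_1, opposite Levy area
    [(0, 0), (2, 1)],           # h_3: different increment
]
sigs = [signature(p, 3) for p in paths]
assert all(s[()] == 1 for s in sigs)        # level 0 is always 1
levels = [first_differing_level(sigs[0], s) for s in sigs[1:]]   # N_2, N_3
N = max(levels)
print(levels, N)   # [2, 1] 2
```

Here $N_2 = 2$, because the first two paths share their increment and differ only in the level-$2$ area terms. Also $N_3 = 1$, so the uniform level is $N = 2$.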
[guided]
We start with the hypothesis: $h_1, \dots, h_n$ are **distinct** elements of $\mathcal{S}_p \subseteq T((V))$. Distinctness in $T((V))$ means: for every pair $i \ne j$, there exists at least one level $k$ where $h_i^{(k)} \ne h_j^{(k)}$.
For our construction, we focus on separating $h_1$ from $h_i$ for $i = 2, \dots, n$. For each such $i$, distinctness gives a level $N_i$ where $h_1$ and $h_i$ differ, namely $\pi_{N_i}(h_1) \ne \pi_{N_i}(h_i)$. Consequently the truncations to levels $\le N_i$, $\pi_{\le N_i}(h_j) := \bigoplus_{k=0}^{N_i} \pi_k(h_j)$, also differ:
\begin{align*}
\pi_{\le N_i}(h_1) \ne \pi_{\le N_i}(h_i) \quad \text{in } T_{\le N_i}(V).
\end{align*}
Take a **uniform** truncation level $N := \max_{i \ge 2} N_i$. Then for every $i = 2, \dots, n$:
\begin{align*}
g_1 := \pi_{\le N}(h_1) \ne \pi_{\le N}(h_i) =: g_i \quad \text{in } T_{\le N}(V).
\end{align*}
When $V$ is finite-dimensional, this is a finite-dimensional space, of dimension $1 + \dim V + \dim V^{\otimes 2} + \cdots + \dim V^{\otimes N}$. We assume $\dim V < \infty$ from now on. This assumption makes two things available: the elementary form of Hahn-Banach used below, and the identification of functionals on $T_{\le N}(V)$ with elements of $T(V^*)$ used in Step 4. For infinite-dimensional $V$ the argument can be adapted using a suitable Banach space structure, but the simplest statement is in the finite-dimensional setting.
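For concreteness, with $d := \dim V$ finite, the dimension count is a geometric sum:
\begin{align*}
\dim T_{\le N}(V) = \sum_{k=0}^{N} d^k = \frac{d^{N+1} - 1}{d - 1} \quad (d \ge 2), \qquad \text{e.g. } 1 + 2 + 4 + 8 = 15 \text{ for } d = 2,\ N = 3.
\end{align*}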
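**Why we now check linear independence of $g_1$ and $g_i$:** Distinctness is not enough for the separation we want. A functional $l$ with $l(g_1) = 1$ and $l(g_i) = 0$ exists only if $g_1$ does not lie on the line spanned by $g_i$. Equivalently, some hyperplane through the origin (the kernel of $l$) must contain $g_i$ but not $g_1$. Since $g_i \ne 0$ (its level-$0$ component is $1$, as shown below), this is exactly linear independence of $g_1$ and $g_i$.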
**Why are $g_1, g_i$ linearly independent given they are distinct?** This is the crucial input from the structure of signatures: every signature has the **same level-$0$ value** $1$. Indeed, the level-$0$ component of the signature is $S(x)_{[a, b]}^{(0)} = 1$ by definition (the iterated integral over a degenerate $0$-simplex equals $1$ by convention). So $g_1^{(0)} = g_i^{(0)} = 1$.
Since $g_i^{(0)} = 1$, we have $g_i \ne 0$, so any linear dependence between $g_1$ and $g_i$ takes the form $g_1 = \mu g_i$ for some scalar $\mu$. Suppose for contradiction that this holds. At level $0$: $1 = g_1^{(0)} = \mu g_i^{(0)} = \mu \cdot 1$, so $\mu = 1$. Substituting back gives $g_1 = g_i$, which contradicts our choice of $N$ (taken precisely so that $g_1 \ne g_i$).
So $\{g_1, g_i\}$ is linearly independent — and we can now invoke Hahn-Banach.
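The level-$0$ argument can be checked mechanically. The sketch below uses pure standard-library Python and a hypothetical helper `independent_pair`. It tests linear independence of two vectors through their $2 \times 2$ minors. It then confirms on a small grid that distinct vectors with first coordinate $1$ (standing in for level $0$) are always independent, and shows that without this normalisation distinct vectors can be proportional.

```python
from fractions import Fraction
from itertools import combinations, product

def independent_pair(u, v):
    """u, v are linearly independent iff some 2x2 minor u[a]v[b] - u[b]v[a] is nonzero."""
    return any(u[a] * v[b] - u[b] * v[a] != 0
               for a, b in combinations(range(len(u)), 2))

# Distinct vectors with first (level-0) entry 1 are independent: the minor at
# coordinates (0, b) equals v[b] - u[b], which is nonzero where they differ.
normalised = [(1,) + rest for rest in product((-1, 0, Fraction(1, 2), 1), repeat=3)]
assert all(independent_pair(u, v) for u, v in combinations(normalised, 2))

# Without the normalisation, distinctness is not enough:
assert not independent_pair((1, 2, 3), (2, 4, 6))
```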
[/guided]
[/step]
[step:Construct the finite-level separating functionals via Hahn-Banach]
For each $i \in \{2, \dots, n\}$, the pair $\{g_1, g_i\} \subset T_{\le N}(V)$ is linearly independent. Apply the [Hahn-Banach theorem](/theorems/???) (in its algebraic-extension form for finite-dimensional spaces, which is elementary): there exists a linear functional $l_i \in T_{\le N}(V)^*$ such that
\begin{align*}
l_i(g_1) = 1 \quad \text{and} \quad l_i(g_i) = 0.
\end{align*}
Concretely, since $g_1$ is not in the span of $g_i$, we can complete $\{g_1, g_i\}$ to a basis $\{g_1, g_i, e_3, \dots, e_d\}$ of $T_{\le N}(V)$ and define $l_i$ on this basis by $l_i(g_1) = 1$, $l_i(g_i) = 0$, $l_i(e_k) = 0$ for $k \ge 3$, then extend by linearity.
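The basis-completion construction can be sketched directly over $\mathbb{Q}$. This is an illustrative implementation under our own choices. Standard basis vectors are used to complete $\{u, v\}$, and `rank`, `solve` and `separating_functional` are hypothetical helper names. The functional is returned as its coefficient vector with respect to the dual of the standard basis. The example vectors $u = (1, 1, \tfrac12)$ and $v = (1, 2, 2)$ are the level-$\le 2$ truncated signatures of one-dimensional paths with increments $1$ and $2$.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q (Gaussian elimination on a copy)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def solve(A, b):
    """Solve the square, invertible system A x = b over Q (Gauss-Jordan)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

def separating_functional(u, v):
    """Coefficients of l with l(u) = 1, l(v) = 0: complete {u, v} to a basis with
    standard basis vectors, then set l = 1 on u and 0 on every other basis vector."""
    if rank([u, v]) < 2:
        raise ValueError("u and v must be linearly independent")
    d = len(u)
    basis = [list(u), list(v)]
    for k in range(d):
        e = [int(j == k) for j in range(d)]
        if rank(basis + [e]) > len(basis):   # keep e only if it enlarges the span
            basis.append(e)
    # l(b) = sum_k l[k] * b[k]; impose l(basis[0]) = 1 and l(basis[j]) = 0 for j >= 1.
    return solve(basis, [1] + [0] * (d - 1))

l = separating_functional([1, 1, Fraction(1, 2)], [1, 2, 2])
assert l == [0, 2, -2]
```

A different choice of completing vectors generally gives a different $l_i$. The construction only pins down the values on $g_1$ and $g_i$.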
Lift $l_i$ to a functional on $T((V))$:
\begin{align*}
f_i := l_i \circ \pi_{\le N}: T((V)) &\to \mathbb{R} \\
h &\mapsto l_i(\pi_{\le N}(h)).
\end{align*}
The map $f_i$ is a composition of linear maps, hence linear; so $f_i \in T((V))^*$. By construction:
\begin{align*}
f_i(h_1) = l_i(\pi_{\le N}(h_1)) = l_i(g_1) = 1, \qquad f_i(h_i) = l_i(\pi_{\le N}(h_i)) = l_i(g_i) = 0.
\end{align*}
We do not control $f_i(h_j)$ for $j \notin \{1, i\}$.
[guided]
We now manufacture a linear functional that separates $g_1$ from each $g_i$ ($i \ge 2$). The Hahn-Banach theorem is the right tool, but in the finite-dimensional setting we can state and verify the conclusion directly.
**Hahn-Banach (finite-dimensional, algebraic form):** Let $W$ be a finite-dimensional vector space, and let $u, v \in W$ be linearly independent. Then there exists a linear functional $l \in W^*$ with $l(u) = 1$ and $l(v) = 0$.
**Proof (included for completeness):** Since $\{u, v\}$ is linearly independent, extend it to a basis $\{u, v, e_3, \dots, e_d\}$ of $W$. Define $l$ by setting $l(u) = 1$, $l(v) = 0$, and $l(e_k) = 0$ for $k = 3, \dots, d$, then extend by linearity. This is well-defined because the values are specified on a basis, and it produces an element of $W^*$.
**Application to our setting.** Apply this to $W = T_{\le N}(V)$, $u = g_1$, $v = g_i$ — both are in $T_{\le N}(V)$, and they are linearly independent by Step 2. So we obtain $l_i \in T_{\le N}(V)^*$ with $l_i(g_1) = 1$ and $l_i(g_i) = 0$.
**Lifting to $T((V))^*$:** The functional $l_i$ is defined on the truncation. To extend to $T((V))$, simply compose with the truncation projection:
\begin{align*}
f_i: T((V)) &\to \mathbb{R} \\
h &\mapsto l_i(\pi_{\le N}(h)).
\end{align*}
This is linear (composition of linear maps), and $f_i \in T((V))^*$.
**Verification:**
\begin{align*}
f_i(h_1) = l_i(\pi_{\le N}(h_1)) = l_i(g_1) = 1, \qquad f_i(h_i) = l_i(\pi_{\le N}(h_i)) = l_i(g_i) = 0.
\end{align*}
**What we do not control:** $f_i(h_j)$ for $j \notin \{1, i\}$ is not constrained by the construction. So the family $\{f_2, \dots, f_n\}$ separates $h_1$ from each $h_i$ pairwise, but no single $f_i$ separates $h_1$ from all the others simultaneously. To remedy this, we will combine them via the **shuffle product** in the next step.
[/guided]
[/step]
[step:Combine the $f_i$ via the shuffle product, using the shuffle identity to compute the pairing]
Define
\begin{align*}
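f := f_2 \shuffle f_3 \shuffle \cdots \shuffle f_n \in T(V^*) \subseteq T((V))^*,
\end{align*}
where:

- each $f_j = l_j \circ \pi_{\le N}$ depends only on levels $0, \dots, N$, so it is identified with an element of $\bigoplus_{k=0}^N (V^*)^{\otimes k} \subseteq T(V^*)$ (this uses $\dim V < \infty$);
- $T(V^*)$ is viewed inside $T((V))^*$ as the functionals that depend on finitely many levels;
- $\shuffle$ is the shuffle product on $T(V^*)$, which is associative and commutative.

We claim: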
\begin{align*}
f(h_i) = \prod_{j=2}^n f_j(h_i), \qquad i = 1, \dots, n.
\end{align*}
This is a direct application of the [Shuffle Identity](/theorems/2499), iterated $n - 2$ times (once for each factor beyond the first two): for $h \in \mathcal{S}_p$,
\begin{align*}
f_2(h) f_3(h) = (f_2 \shuffle f_3)(h),
\end{align*}
and inductively, for any number of factors,
\begin{align*}
f_2(h) f_3(h) \cdots f_n(h) = (f_2 \shuffle f_3 \shuffle \cdots \shuffle f_n)(h) = f(h).
\end{align*}
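The shuffle identity applies for two reasons. First, each $h_i \in \mathcal{S}_p$ is the signature of a path in $C_p([a, b], V)$ with $p \in [1, 2)$, which is the regime covered by the Shuffle Identity theorem. Second, each $f_j$ depends on only finitely many levels and so lies in $T(V^*)$.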
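A minimal sketch of the shuffle product, with a numerical check of the shuffle identity, follows. It assumes (for illustration only) a piecewise-linear path in $\mathbb{R}^2$, which has bounded variation and so falls in the case $p = 1$. Functionals in $T(V^*)$ and truncated signatures are both stored as dicts from words (tuples of coordinate indices) to exact `Fraction` coefficients. The helper names and the sample functionals are hypothetical. The word shuffle uses the recursion $(a u') \shuffle (b v') = a\,(u' \shuffle b v') + b\,(a u' \shuffle v')$.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def signature(points, N):
    """Truncated signature {word: coeff} of a piecewise-linear path (exp of each
    increment, joined with Chen's identity)."""
    d = len(points[0])
    words = [w for k in range(N + 1) for w in product(range(d), repeat=k)]
    sig = {w: Fraction(int(len(w) == 0)) for w in words}
    for p, q in zip(points, points[1:]):
        delta = [Fraction(b) - Fraction(a) for a, b in zip(p, q)]
        seg = {}
        for w in words:
            c = Fraction(1, factorial(len(w)))
            for i in w:
                c *= delta[i]
            seg[w] = c
        sig = {w: sum(sig[w[:j]] * seg[w[j:]] for j in range(len(w) + 1)) for w in words}
    return sig

def shuffle_words(u, v):
    """Shuffle product of two words, as {word: multiplicity}."""
    if not u or not v:
        return {u + v: 1}
    out = {}
    for w, m in shuffle_words(u[1:], v).items():     # first letter taken from u
        out[(u[0],) + w] = out.get((u[0],) + w, 0) + m
    for w, m in shuffle_words(u, v[1:]).items():     # first letter taken from v
        out[(v[0],) + w] = out.get((v[0],) + w, 0) + m
    return out

def shuffle(f, g):
    """Bilinear extension of the word shuffle to finite linear combinations."""
    out = {}
    for u, a in f.items():
        for v, b in g.items():
            for w, m in shuffle_words(u, v).items():
                out[w] = out.get(w, 0) + a * b * m
    return out

def pair(f, sig):
    """<f, sig> for a functional f given as {word: coefficient}."""
    return sum(c * sig[w] for w, c in f.items())

path = [(0, 0), (1, 0), (1, 1), (3, -1)]
f = {(0,): 1, (0, 1): 2}
g = {(): 3, (1, 1): -1}
S = signature(path, 4)        # level 4 = deg f + deg g covers every word of f sh g
assert pair(f, S) * pair(g, S) == pair(shuffle(f, g), S)   # exact equality
```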
[guided]
We have, for each $i \in \{2, \dots, n\}$, a functional $f_i \in T((V))^*$ that satisfies $f_i(h_1) = 1$ and $f_i(h_i) = 0$. We need a single functional $f$ with $f(h_1) = 1$ and $f(h_j) = 0$ for **all** $j = 2, \dots, n$.
**Idea: take a product.** If we could form a product $f_2 \cdot f_3 \cdots f_n$ such that the product evaluates as $\prod_{j=2}^n f_j(h_i)$, we would be done — for $i = 1$, every factor gives $1$, so the product is $1$; for $i \ge 2$, the factor $f_i$ gives $0$, so the product is $0$.
**The shuffle product is the right product.** Pointwise multiplication of functionals does not produce a linear functional in general. But the **shuffle identity** ([Shuffle Identity](/theorems/2499)) tells us:
\begin{align*}
f(h) g(h) = (f \shuffle g)(h) \quad \text{for all } h \in \mathcal{S}_p,
\end{align*}
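when $f, g \in T(V^*)$, viewed as functionals on $T((V))$ that depend on finitely many levels, and $\shuffle$ is the shuffle product on $T(V^*)$. Each $f_i = l_i \circ \pi_{\le N}$ is of this form, since it only reads levels $0, \dots, N$.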
**Verification of the shuffle identity hypotheses:** [Shuffle Identity](/theorems/2499) applies when $h$ is a signature of a path in $C_p([a, b], V)$ for $p \in [1, 2)$. By assumption $\mathcal{S}_p$ consists exactly of signatures of such paths — the subscript $p$ indicates the regularity class — so each $h_i \in \mathcal{S}_p$ qualifies, and we may invoke the theorem.
**Iteration.** Applying the shuffle identity repeatedly:
\begin{align*}
f_2(h) f_3(h) f_4(h) \cdots f_n(h) &= (f_2 \shuffle f_3)(h) \cdot f_4(h) \cdots f_n(h) \\
&= ((f_2 \shuffle f_3) \shuffle f_4)(h) \cdot f_5(h) \cdots f_n(h) \\
&\quad \vdots \\
&= (f_2 \shuffle f_3 \shuffle \cdots \shuffle f_n)(h).
\end{align*}
The shuffle product is associative (and also commutative), so the iterated shuffle is unambiguous and does not depend on the order of the factors. Define
\begin{align*}
f := f_2 \shuffle f_3 \shuffle \cdots \shuffle f_n \in T(V^*) \subseteq T((V))^*.
\end{align*}
Then for each $h_i$ ($i = 1, \dots, n$):
\begin{align*}
f(h_i) = \prod_{j=2}^n f_j(h_i).
\end{align*}
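Associativity and commutativity can be spot-checked on concrete elements of $T(V^*)$. The sketch below works over a two-letter alphabet, with hypothetical coefficients and helper names. It also illustrates that word lengths add under shuffling, so $f = f_2 \shuffle \cdots \shuffle f_n$ reads levels up to $(n-1)N$ even though each $f_j$ reads only levels $\le N$.

```python
from fractions import Fraction
from functools import reduce

def shuffle_words(u, v):
    """Shuffle product of two words, as {word: multiplicity}."""
    if not u or not v:
        return {u + v: 1}
    out = {}
    for w, m in shuffle_words(u[1:], v).items():
        out[(u[0],) + w] = out.get((u[0],) + w, 0) + m
    for w, m in shuffle_words(u, v[1:]).items():
        out[(v[0],) + w] = out.get((v[0],) + w, 0) + m
    return out

def shuffle(f, g):
    """Bilinear shuffle of finite linear combinations of words; zero terms dropped."""
    out = {}
    for u, a in f.items():
        for v, b in g.items():
            for w, m in shuffle_words(u, v).items():
                out[w] = out.get(w, 0) + a * b * m
    return {w: c for w, c in out.items() if c != 0}

def degree(f):
    """Length of the longest word with a nonzero coefficient."""
    return max(len(w) for w in f)

# Three hypothetical functionals of degree <= 2 over the alphabet {0, 1}.
f2 = {(): 1, (0,): 2, (0, 1): -1}
f3 = {(1,): Fraction(1, 2), (1, 1): 3}
f4 = {(): -2, (1, 0): 1}

left = shuffle(shuffle(f2, f3), f4)
right = shuffle(f2, shuffle(f3, f4))
f = reduce(shuffle, [f2, f3, f4])     # f_2 sh f_3 sh f_4, bracketed from the left
print(left == right, shuffle(f2, f3) == shuffle(f3, f2), degree(f))   # True True 6
```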
**The shuffle product is essential** here. The pointwise product $h \mapsto \prod_j f_j(h)$ is not linear on $T((V))$, so it cannot serve directly as the separating functional. The shuffle identity shows that on signatures it coincides with the genuinely linear functional $f_2 \shuffle \cdots \shuffle f_n$, an element of the tensor algebra $T(V^*)$ of the dual.
[/guided]
[/step]
[step:Evaluate $f(h_i)$ to verify $f$ separates $h_1$ from $h_2, \dots, h_n$, and conclude]
Compute $f(h_i)$ for each $i$ using the formula from Step 4:
\begin{align*}
f(h_1) = \prod_{j=2}^n f_j(h_1) = \prod_{j=2}^n 1 = 1,
\end{align*}
since $f_j(h_1) = 1$ for every $j \ge 2$ by Step 3. For $i \ge 2$, the factor with $j = i$ gives $f_i(h_i) = 0$, so the product collapses:
\begin{align*}
f(h_i) = \prod_{j=2}^n f_j(h_i) = f_2(h_i) \cdots f_{i-1}(h_i) \cdot \underbrace{f_i(h_i)}_{=0} \cdot f_{i+1}(h_i) \cdots f_n(h_i) = 0.
\end{align*}
Therefore $f \in T((V))^*$ satisfies $f(h_1) = 1$ and $f(h_i) = 0$ for every $i = 2, \dots, n$, fulfilling the requirement of Step 1.
Applying $f$ to the relation $\sum_{i=1}^n \lambda_i h_i = 0$,
\begin{align*}
0 = f\!\left(\sum_{i=1}^n \lambda_i h_i\right) = \sum_{i=1}^n \lambda_i f(h_i) = \lambda_1 \cdot 1 + \sum_{i=2}^n \lambda_i \cdot 0 = \lambda_1,
\end{align*}
so $\lambda_1 = 0$. By the symmetry argument in Step 1 (relabelling $h_k$ as the "first" element and constructing the corresponding separating functional $f^{(k)} := \shuffle_{j \ne k} f_j^{(k)}$ with $f_j^{(k)}(h_k) = 1$ and $f_j^{(k)}(h_j) = 0$), we obtain $\lambda_k = 0$ for every $k \in \{1, \dots, n\}$. Hence $\{h_1, \dots, h_n\}$ is linearly independent in $T((V))$, completing the proof.
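The following is an end-to-end sketch of the whole construction, under illustrative assumptions: piecewise-linear paths in $\mathbb{R}^2$, exact rational arithmetic, and tensors stored as dicts from words to coefficients. All helper names are hypothetical. For brevity, the pairwise separator is not obtained by completing to a basis. Instead it is the inner product with a combination $a g_1 + b g_i$ solving a $2 \times 2$ Gram system. This is a different construction, but it gives the same two prescribed values $1$ and $0$.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import factorial

def signature(points, N):
    """Truncated signature {word: coeff} of a piecewise-linear path (Chen's identity)."""
    d = len(points[0])
    words = [w for k in range(N + 1) for w in product(range(d), repeat=k)]
    sig = {w: Fraction(int(len(w) == 0)) for w in words}
    for p, q in zip(points, points[1:]):
        delta = [Fraction(b) - Fraction(a) for a, b in zip(p, q)]
        seg = {}
        for w in words:
            c = Fraction(1, factorial(len(w)))
            for i in w:
                c *= delta[i]
            seg[w] = c
        sig = {w: sum(sig[w[:j]] * seg[w[j:]] for j in range(len(w) + 1)) for w in words}
    return sig

def shuffle_words(u, v):
    """Shuffle product of two words, as {word: multiplicity}."""
    if not u or not v:
        return {u + v: 1}
    out = {}
    for w, m in shuffle_words(u[1:], v).items():
        out[(u[0],) + w] = out.get((u[0],) + w, 0) + m
    for w, m in shuffle_words(u, v[1:]).items():
        out[(v[0],) + w] = out.get((v[0],) + w, 0) + m
    return out

def shuffle(f, g):
    out = {}
    for u, a in f.items():
        for v, b in g.items():
            for w, m in shuffle_words(u, v).items():
                out[w] = out.get(w, 0) + a * b * m
    return out

def pair(f, sig):
    return sum(c * sig[w] for w, c in f.items())

def separate(g1, gi):
    """l with l(g1) = 1, l(gi) = 0, as the inner product with a*g1 + b*gi
    (2x2 Gram system; det != 0 exactly when g1 and gi are independent)."""
    uu = sum(x * x for x in g1.values())
    uv = sum(g1[w] * gi[w] for w in g1)
    vv = sum(x * x for x in gi.values())
    det = uu * vv - uv * uv
    a, b = vv / det, -uv / det
    return {w: a * g1[w] + b * gi[w] for w in g1}

def biorthogonal(paths, N):
    """f^(k) = shuffle over j != k of the lifted pairwise separators of h_k from h_j."""
    trunc = [signature(p, N) for p in paths]
    return [reduce(shuffle, [separate(trunc[k], trunc[j])
                             for j in range(len(paths)) if j != k])
            for k in range(len(paths))]

paths = [[(0, 0), (1, 0), (1, 1)], [(0, 0), (0, 1), (1, 1)], [(0, 0), (2, 1)]]
N = 2                                   # level 2 already separates every pair here
fs = biorthogonal(paths, N)
full = [signature(p, (len(paths) - 1) * N) for p in paths]  # shuffled words reach (n-1)N
M = [[pair(f, s) for s in full] for f in fs]     # expected: the identity matrix
lams = [3, -1, 2]
combo = {w: sum(l * s[w] for l, s in zip(lams, full)) for w in full[0]}
recovered = [pair(f, combo) for f in fs]         # f^(k) applied to sum_i lam_i h_i
```

The matrix of values $f^{(k)}(h_j)$ is the identity. Applying $f^{(k)}$ to $\sum_i \lambda_i h_i$ returns $\lambda_k$, as in Step 1. The full signatures are truncated at level $(n-1)N$ because that is the length of the longest shuffled words.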
[guided]
We finish by computing $f(h_i)$ explicitly.
**Case $i = 1$.** By Step 3, $f_j(h_1) = 1$ for every $j = 2, \dots, n$. By the formula from Step 4:
\begin{align*}
f(h_1) = \prod_{j=2}^n f_j(h_1) = 1 \cdot 1 \cdots 1 = 1.
\end{align*}
**Case $i \in \{2, \dots, n\}$.** The product over $j = 2, \dots, n$ contains the factor $f_i(h_i) = 0$ (by Step 3, $f_i$ was constructed precisely to kill $h_i$). A single zero factor in a product of real numbers makes the product zero, regardless of the other factors:
\begin{align*}
f(h_i) = \prod_{j=2}^n f_j(h_i) = \cdots \cdot f_i(h_i) \cdot \cdots = \cdots \cdot 0 \cdot \cdots = 0.
\end{align*}
So $f$ has exactly the properties required: $f(h_1) = 1$ and $f(h_i) = 0$ for all $i \ge 2$.
**Apply $f$ to $\sum_i \lambda_i h_i = 0$:**
\begin{align*}
0 = f\!\left(\sum_{i=1}^n \lambda_i h_i\right) = \sum_{i=1}^n \lambda_i f(h_i) = \lambda_1 \cdot 1 + \lambda_2 \cdot 0 + \cdots + \lambda_n \cdot 0 = \lambda_1.
\end{align*}
Hence $\lambda_1 = 0$.
**Symmetry: getting $\lambda_k = 0$ for general $k$.** The whole construction was symmetric in the choice of "first" index. To prove $\lambda_k = 0$ for arbitrary $k$, repeat the construction with $h_k$ playing the role of $h_1$:
- For each $j \ne k$, find $N^{(k)}_j$ such that $\pi_{\le N^{(k)}_j}(h_k) \ne \pi_{\le N^{(k)}_j}(h_j)$, take the maximum $N^{(k)} := \max_{j \ne k} N^{(k)}_j$.
- Apply Hahn-Banach to obtain $l^{(k)}_j \in T_{\le N^{(k)}}(V)^*$ with $l^{(k)}_j(\pi_{\le N^{(k)}}(h_k)) = 1$ and $l^{(k)}_j(\pi_{\le N^{(k)}}(h_j)) = 0$. The two truncations are linearly independent by the level-$0$ argument of Step 2.
- Lift to $f^{(k)}_j := l^{(k)}_j \circ \pi_{\le N^{(k)}}$.
- Form $f^{(k)} := \shuffle_{j \ne k} f^{(k)}_j$.
- By the shuffle identity, $f^{(k)}(h_k) = 1$ and $f^{(k)}(h_j) = 0$ for $j \ne k$.
Applying $f^{(k)}$ to the relation:
\begin{align*}
0 = f^{(k)}\!\left(\sum_i \lambda_i h_i\right) = \lambda_k \cdot 1 = \lambda_k.
\end{align*}
This holds for every $k = 1, \dots, n$. Hence all $\lambda_i = 0$, and $\{h_1, \dots, h_n\}$ is linearly independent in $T((V))$.
The proof closes. The fundamental input was the **shuffle identity**. It bridges the multiplicative requirement on the separating functional and the algebraic structure of $T(V^*)$, which lets us combine pairwise separations into a global one.

Without it, separating $h_1$ from $h_2, \dots, h_n$ simultaneously by Hahn-Banach alone would require a level $N$ at which $\pi_{\le N}(h_1)$ lies outside the span of $\pi_{\le N}(h_2), \dots, \pi_{\le N}(h_n)$. Pairwise linear independence of the truncations does not give this, since with three or more elements, pairwise independent vectors can be jointly dependent. Establishing it directly is essentially the statement being proved. The shuffle product manufactures the required functional from the easier pairwise case, at the cost of reading levels up to $(n-1)N$.
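A small example in $V = \mathbb{R}$, chosen here for illustration, shows both phenomena. For a one-dimensional path with increment $x$, the level-$k$ component of the signature is $x^k / k!$. Take paths with increments $1, 2, 3$, so $h_j^{(k)} = j^k / k!$.

At level $N = 1$ the truncations $g_j = (1, j)$ are pairwise linearly independent but jointly dependent: $g_1 - 2 g_2 + g_3 = 0$. Hence no functional on $T_{\le 1}(\mathbb{R})$ separates $h_1$ from $h_2$ and $h_3$ at once.

Write $\mathbf{e}_w$ for the element of $T(\mathbb{R}^*)$ dual to the word $w$. The pairwise separators are $f_2 = 2\,\mathbf{e}_\varnothing - \mathbf{e}_1$ and $f_3 = \tfrac32\,\mathbf{e}_\varnothing - \tfrac12\,\mathbf{e}_1$. Their off-diagonal values are uncontrolled: $f_2(h_3) = -1$ and $f_3(h_2) = \tfrac12$. Using $\mathbf{e}_1 \shuffle \mathbf{e}_1 = 2\,\mathbf{e}_{11}$:
\begin{align*}
f = f_2 \shuffle f_3 &= 3\,\mathbf{e}_\varnothing - \tfrac52\,\mathbf{e}_1 + \mathbf{e}_{11}, \\
f(h_j) &= 3 - \tfrac52\, j + \tfrac12\, j^2 = (2 - j)\left(\tfrac32 - \tfrac12\, j\right), \qquad f(h_1) = 1,\ f(h_2) = f(h_3) = 0.
\end{align*}
The level-$2$ term $\mathbf{e}_{11}$ is what lets $f$ escape the dependence among the level-$\le 1$ truncations.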
[/guided]
[/step]