[proofplan]
Choose polynomial-time many-one reductions $f$ from $A$ to $B$ and $g$ from $B$ to $C$. The candidate reduction from $A$ to $C$ is their composition $h = g \circ f$. The membership equivalence follows by applying the two reduction equivalences in sequence, and the polynomial-time bound follows because the output length of a polynomial-time transducer is polynomially bounded and a composition of polynomial bounds is again polynomial.
[/proofplan]
[step:Choose reductions witnessing the two hypotheses]
Since $A \le_p B$, there exists a polynomial-time computable map
\begin{align*}
f: \rho^* \to \tau^*
\end{align*}
such that, for every word $x \in \rho^*$,
\begin{align*}
x \in A \iff f(x) \in B.
\end{align*}
Since $B \le_p C$, there exists a polynomial-time computable map
\begin{align*}
g: \tau^* \to \upsilon^*
\end{align*}
such that, for every word $y \in \tau^*$,
\begin{align*}
y \in B \iff g(y) \in C.
\end{align*}
[/step]
[step:Define the composed reduction]
Define the map
\begin{align*}
h: \rho^* \to \upsilon^*
\end{align*}
by
\begin{align*}
h(x) = g(f(x))
\end{align*}
for every $x \in \rho^*$.
For every $x \in \rho^*$, the defining property of $f$ gives
\begin{align*}
x \in A \iff f(x) \in B.
\end{align*}
Applying the defining property of $g$ to the word $y = f(x) \in \tau^*$ gives
\begin{align*}
f(x) \in B \iff g(f(x)) \in C.
\end{align*}
Since $h(x) = g(f(x))$, transitivity of logical equivalence yields
\begin{align*}
x \in A \iff h(x) \in C.
\end{align*}
[/step]
[step:Verify that the composed reduction is computable in polynomial time]
Let $M_f$ be a deterministic transducer computing $f$, and let $M_g$ be a deterministic transducer computing $g$. Choose polynomials $p: \mathbb{N} \to \mathbb{N}$ and $q: \mathbb{N} \to \mathbb{N}$ such that $M_f$ halts on every input $x \in \rho^*$ within $p(|x|)$ steps and $M_g$ halts on every input $y \in \tau^*$ within $q(|y|)$ steps.
Because a transducer can write at most one output symbol per computation step, the length of the output of $M_f$ is bounded by its running time. Thus, for every $x \in \rho^*$,
\begin{align*}
|f(x)| \le p(|x|).
\end{align*}
An algorithm for $h$ first runs $M_f$ on input $x$, stores the word $f(x)$, and then runs $M_g$ on input $f(x)$. Its running time is bounded by
\begin{align*}
p(|x|) + q(|f(x)|).
\end{align*}
Using the length bound $|f(x)| \le p(|x|)$ and replacing $q$ if necessary by a nondecreasing polynomial upper bound, this is at most
\begin{align*}
p(|x|) + q(p(|x|)).
\end{align*}
The function $n \mapsto p(n) + q(p(n))$ is a polynomial in $n$, so $h$ is polynomial-time computable.
[guided]
We must check not only that $h(x) = g(f(x))$ has the correct membership behaviour, but also that it can actually be computed within polynomial time in the original input length $|x|$. Let $M_f$ be a deterministic transducer computing $f$, and let $M_g$ be a deterministic transducer computing $g$. Since $f$ is polynomial-time computable, there is a polynomial $p: \mathbb{N} \to \mathbb{N}$ such that $M_f$ halts on every input $x \in \rho^*$ within $p(|x|)$ steps. Since $g$ is polynomial-time computable, there is a polynomial $q: \mathbb{N} \to \mathbb{N}$ such that $M_g$ halts on every input $y \in \tau^*$ within $q(|y|)$ steps.
The only point requiring attention is that the running time of $M_g$ is measured in terms of the length of its own input, namely $|f(x)|$, not directly in terms of $|x|$. We therefore need a polynomial bound on $|f(x)|$ in terms of $|x|$. Because $M_f$ can write at most one output symbol in each computation step, its output length cannot exceed its number of computation steps. Hence, for every input $x \in \rho^*$,
\begin{align*}
|f(x)| \le p(|x|).
\end{align*}
Now compute $h(x)$ by first computing $f(x)$ with $M_f$ and then computing $g(f(x))$ with $M_g$. The first stage takes at most $p(|x|)$ steps. The second stage takes at most $q(|f(x)|)$ steps. Therefore the total running time is bounded by
\begin{align*}
p(|x|) + q(|f(x)|).
\end{align*}
Using $|f(x)| \le p(|x|)$, and taking $q$ to be a nondecreasing polynomial upper bound for the running time of $M_g$, we obtain
\begin{align*}
p(|x|) + q(|f(x)|) \le p(|x|) + q(p(|x|)).
\end{align*}
The expression $p(n) + q(p(n))$ is a polynomial in $n$, because the composition of two polynomials is a polynomial and the sum of two polynomials is a polynomial. Thus the algorithm computing $h$ runs in polynomial time in $|x|$.
[/guided]
[/step]
[step:Conclude that $h$ witnesses $A \le_p C$]
The map $h: \rho^* \to \upsilon^*$ is polynomial-time computable, and for every $x \in \rho^*$ it satisfies
\begin{align*}
x \in A \iff h(x) \in C.
\end{align*}
Therefore $h$ is a polynomial-time many-one reduction from $A$ to $C$. Hence $A \le_p C$.
[/step]