[proofplan]
We first write the Bernoulli likelihood using the logistic parametrisation
\begin{align*}
p_i(\beta)=\frac{e^{x_i^\top\beta}}{1+e^{x_i^\top\beta}}.
\end{align*}
Taking logarithms converts the product likelihood into a sum and gives the stated expression for $\ell$. We then differentiate $\ell$ as a function of $\beta \in \mathbb{R}^d$: the first derivative gives the score, and differentiating the score once more gives the observed Hessian.
[/proofplan]
[step:Substitute the logistic probabilities into the Bernoulli likelihood]
For each $i \in \{1,\dots,n\}$, define the linear predictor map
\begin{align*}
\eta_i:\mathbb{R}^d &\to \mathbb{R} \\
\beta &\mapsto x_i^\top \beta
\end{align*}
and the fitted probability map
\begin{align*}
p_i:\mathbb{R}^d &\to (0,1) \\
\beta &\mapsto \frac{e^{\eta_i(\beta)}}{1+e^{\eta_i(\beta)}}.
\end{align*}
Then
\begin{align*}
1-p_i(\beta)
&=1-\frac{e^{\eta_i(\beta)}}{1+e^{\eta_i(\beta)}}\\
&=\frac{1}{1+e^{\eta_i(\beta)}}.
\end{align*}
Since $Y_i \in \{0,1\}$ and $Y_1,\dots,Y_n$ are conditionally independent Bernoulli random variables with success probabilities $p_i(\beta)$, the likelihood function $L:\mathbb{R}^d \to (0,\infty)$ is
\begin{align*}
L(\beta)
&=\prod_{i=1}^n p_i(\beta)^{Y_i}\bigl(1-p_i(\beta)\bigr)^{1-Y_i}.
\end{align*}
Substituting the two displayed formulas for $p_i(\beta)$ and $1-p_i(\beta)$ gives
\begin{align*}
L(\beta)
&=\prod_{i=1}^n
\left(\frac{e^{\eta_i(\beta)}}{1+e^{\eta_i(\beta)}}\right)^{Y_i}
\left(\frac{1}{1+e^{\eta_i(\beta)}}\right)^{1-Y_i}\\
&=\prod_{i=1}^n
\frac{e^{Y_i\eta_i(\beta)}}{1+e^{\eta_i(\beta)}}.
\end{align*}
[/step]
[step:Take logarithms to obtain the log-likelihood]
Define the log-likelihood function $\ell:\mathbb{R}^d \to \mathbb{R}$ by $\ell(\beta)=\log L(\beta)$. Since each factor in $L(\beta)$ is positive, the logarithm is well-defined. Using the product rule for logarithms and the identity $\log(e^a)=a$ for $a \in \mathbb{R}$, we obtain
\begin{align*}
\ell(\beta)
&=\sum_{i=1}^n
\log\left(\frac{e^{Y_i\eta_i(\beta)}}{1+e^{\eta_i(\beta)}}\right)\\
&=\sum_{i=1}^n
\left\{Y_i\eta_i(\beta)-\log(1+e^{\eta_i(\beta)})\right\}\\
&=\sum_{i=1}^n
\left\{Y_i x_i^\top\beta-\log(1+e^{x_i^\top\beta})\right\}.
\end{align*}
This is the stated expression for the log-likelihood.
[/step]
[step:Differentiate the log-likelihood to compute the score]
Fix $\beta \in \mathbb{R}^d$ and $h \in \mathbb{R}^d$. The total derivative of $\eta_i$ at $\beta$ is the [linear map](/page/Linear%20Map)
\begin{align*}
D(\eta_i)_\beta:\mathbb{R}^d &\to \mathbb{R} \\
h &\mapsto x_i^\top h.
\end{align*}
By the chain rule,
\begin{align*}
D\left(\log(1+e^{\eta_i})\right)_\beta(h)
&=\frac{e^{\eta_i(\beta)}}{1+e^{\eta_i(\beta)}}\,D(\eta_i)_\beta(h)\\
&=p_i(\beta)x_i^\top h.
\end{align*}
Therefore
\begin{align*}
D\ell_\beta(h)
&=\sum_{i=1}^n\left\{Y_i x_i^\top h-p_i(\beta)x_i^\top h\right\}\\
&=\sum_{i=1}^n\bigl(Y_i-p_i(\beta)\bigr)x_i^\top h\\
&=\left(\sum_{i=1}^n x_i\bigl(Y_i-p_i(\beta)\bigr)\right)^\top h.
\end{align*}
By the defining relation between the gradient and the Euclidean inner product on $\mathbb{R}^d$, the score vector $U(\beta)=\nabla \ell(\beta)$ is
\begin{align*}
U(\beta)=\sum_{i=1}^n x_i\bigl(Y_i-p_i(\beta)\bigr).
\end{align*}
[/step]
[step:Differentiate the score to compute the observed Hessian]
Fix $\beta \in \mathbb{R}^d$ and $h \in \mathbb{R}^d$. First compute the derivative of $p_i$. Since
\begin{align*}
p_i(\beta)=\frac{e^{\eta_i(\beta)}}{1+e^{\eta_i(\beta)}},
\end{align*}
the one-variable derivative of the map
\begin{align*}
q:\mathbb{R} &\to (0,1) \\
t &\mapsto \frac{e^t}{1+e^t}
\end{align*}
is
\begin{align*}
q'(t)
&=\frac{e^t(1+e^t)-e^t e^t}{(1+e^t)^2}\\
&=\frac{e^t}{(1+e^t)^2}.
\end{align*}
Thus, by the chain rule,
\begin{align*}
D(p_i)_\beta(h)
&=\frac{e^{\eta_i(\beta)}}{(1+e^{\eta_i(\beta)})^2}x_i^\top h\\
&=p_i(\beta)\bigl(1-p_i(\beta)\bigr)x_i^\top h.
\end{align*}
Using the score formula from the previous step and treating $Y_i$ and $x_i$ as fixed observed quantities, we obtain
\begin{align*}
D U_\beta(h)
&=-\sum_{i=1}^n x_i\,D(p_i)_\beta(h)\\
&=-\sum_{i=1}^n p_i(\beta)\bigl(1-p_i(\beta)\bigr)x_i(x_i^\top h)\\
&=\left(-\sum_{i=1}^n p_i(\beta)\bigl(1-p_i(\beta)\bigr)x_i x_i^\top\right)h.
\end{align*}
Hence the Jacobian matrix of $U$ at $\beta$, equivalently the observed Hessian matrix of $\ell$ at $\beta$, is
\begin{align*}
H(\beta)=J U_\beta
=-\sum_{i=1}^n p_i(\beta)\bigl(1-p_i(\beta)\bigr)x_i x_i^\top.
\end{align*}
This proves the stated formulas for the log-likelihood, score vector, and observed Hessian.
[/step]