\section{Bunched light source and experimental setup}
\label{appendix:setup:detailed}

The experimental setup in this work uses light from a distributed-feedback (DFB) laser diode of wavelength 780\,nm, coupled into 780-HP fiber to project into a single optical mode. This light is sent into an unbalanced fiber-based Mach-Zehnder interferometer with 400\,m delay fiber. This corresponds to a $\Delta=2\,\mu\text{s}$ delay, which is longer than the coherence time $\tau_{\text{c}}=180\,\text{ns}$.

Assuming that the power and polarization in both arms match, the output field from one of the interferometer ports is given by
\begin{equation*}
	E(t)=\frac{1}{2}[E_0(t)+E_0(t+\Delta)],
\end{equation*}
where $E_0(t)$ is the electric field of the laser light. The overall second-order coherence function thus gives
\begin{align*}
g^{(2)}(\tau) = &\>\frac{1}{4}\left[g_0^{(2)}(\tau+\Delta) + g_0^{(2)}(\tau-\Delta)\right]\\
&+\frac{1}{2}\left[g_0^{(2)}(\tau) + \vert g_0^{(1)}(\tau)\vert^2\right]\\
= &\>1 + \frac{1}{2}\exp\left(-\frac{2|\tau|}{\tau_c}\right),
\end{align*}
where $g_0^{(2)}(\tau)=1$ is the second-order coherence function of the laser light and $g_0^{(1)}(\tau)=\exp\left(-|\tau|/\tau_c\right)$ is the first-order coherence function of the laser light, assuming a Lorentzian spectral profile. The peak at $g^{(2)}(\tau=0)$ is a manifestation of the photon bunching effect that saturates at 1.5 in our setup. Our measured peak value of $1.42(1)$ is attributed to the fact that the laser is not fully coherent~\cite{yeo2024crossg2} and other experimental imperfections. A more in-depth explanation and discussion about the source can be found in \cite{yeoNearlosslessMethodGenerating2026}.

A second BS is used to further split the light into two symmetrical channels
for downstream detection using fiber-pigtailed active-quenched Si avalanche
photodiodes (Excelitas SPCM-800-10-FC). Attenuation in each channel is
achieved by cascading fiber beamsplitters as well as fiber decoupling at the
mating sleeves, and is measured to be relatively stable over multiple days.

Time tagging is subsequently performed by timestamp devices (S-Fifteen
Instruments TDC2) with a 1-$\sigma$ timing jitter of 20\,ps.
The clock to each timestamp device is supplied by external 10\,MHz crystal oscillators without any temperature stabilization.


The actual frequency offset between the two clocks was measured concurrently with the clock synchronization experiment. The $10$\,MHz signal was duplicated and combined in a frequency mixer (Mini-Circuits ZFM-2+) followed by a low-pass filter (Mini-Circuits SLP-2.5+, $2.5$\,MHz cut-off). The mixed signal was then sampled by an oscilloscope at a rate of $2.5$\,ksamples/s over $10$\,s, which performed a $2^{14}$-bin FFT (nominal resolution of approximately $0.15$\,Hz) with a Hann window.

The frequency offset error is then calculated by measuring the difference between the served frequency offset (from the frequency estimation step in the clock synchronization) and the measured frequency offset with linear interpolation. The histogram of offset errors is fitted using a Gaussian probability density function, obtaining a standard deviation of $3.2$\,ppb as seen in Fig.\,\ref{result:freq:accuracy}. % 3.23506471

\section{Derivation of max-order distribution}
\label{appendix:derivation}

The FFT for peak finding according to Eqn.\,\ref{eq:convolution} is performed with $N$ bins and accidental coincidences in each of these bins follow the Poisson distribution $\mathcal{P}$ with mean $\lambda{}$. The maximum value across all bins increases with the number of bins. The distribution of the max-order statistic (i.e., the maximum of all bins) with respect to the number of coincidences $x$ and number of bins $N$ of mean value $\lambda{}$ is denoted $f_{(N)}(x|\lambda)$.

Since each bin is assumed to be independent and identically distributed, the corresponding max-order cumulative distribution function (CDF) can be expressed as the product of the CDF of individual bins,
\begin{align*}
F_{(N)}(x|\lambda{}) = \underbrace{F(x|\lambda{})\times\ldots{}\times{}F(x|\lambda{})}_{N\ \text{times}} = \left[F(x|\lambda{})\right]^N.
\end{align*}

We therefore express the max-order probability mass function (PMF) in terms of the Poisson PMF and CDF of a single bin,
\begin{align*}
  f_{(N)}(x|\lambda)
  &= F_{(N)}(x|\lambda{}) - F_{(N)}(x-1|\lambda{}) \\
  &= \left[F(x|\lambda{})\right]^N - \left[F(x-1|\lambda{})\right]^N \\
  &= \left[F(x|\lambda{})\right]^N - \left[F(x|\lambda{}) - f(x|\lambda{})\right]^N.
\end{align*}

Notably, this form is already tenable for direct computation without further simplification, even though computing the difference of $n^{\text{th}}$-powers generally causes catastrophic cancellation due to floating-point rounding errors.
The max-order Poisson PMF distribution width heuristically scales roughly with $\sqrt{\lambda{}}$, which requires the PMF to be accurate to at least $1/\sqrt{\lambda{}}$, e.g., $\lambda{}\le 10^{4}$ needs at least $\sim{}10^{-2}$ accuracy.
Since $F(x|\lambda{}) \in [0,1]$ and the fact that exponentiation can be easily applied for large $N$ using numerical techniques (such as exponentiation-by-squaring), the rounding errors can be minimized to near the floating-point precision, e.g., $\sim{}10^{-16}$ for 64-bit floats.

For larger $\lambda{} > 10^{4}$ (i.e., high accidental coincidence rates per time bin), the normal approximation for each bin remains appropriate,
\begin{align}
f(x) \sim{} \mathcal{P}(x|\lambda{}) \approx{} \mathcal{N}(x|\mu=\lambda{},\sigma=\sqrt{\lambda{}}),
\label{eq:normalapprox}
\end{align}
and the corresponding max-order probability distribution function follows a more tractable form for numerical computation,
\begin{align*}
  f_{(N)}(x|\mu,\sigma)
  &= \frac{d}{dx} F_{(N)}(x|\mu,\sigma) \\
  &= \frac{d}{dx} \left[F(x|\mu,\sigma)\right]^N \\
  &= N f(x|\mu,\sigma) \left[F(x|\mu,\sigma)\right]^{N-1}.
\end{align*}
This in fact corresponds to the first-order term for the discrete case after a binomial expansion,
\begin{align*}
  f_{(N)}&(x|\lambda) \\
  &= \left[F(x|\lambda{})\right]^N - \left[F(x|\lambda{}) - f(x|\lambda{})\right]^N \\
  &= \left[F(x|\lambda{})\right]^N - \left[\left[F(x|\lambda{})\right]^N \right. \\
  &\quad \left. - N f(x|\lambda{})\left[F(x|\lambda{})\right]^{N-1} + \ldots\right] \\
  &= N f(x|\lambda{})\left[F(x|\lambda{})\right]^{N-1} - \mathcal{O}\left(f^2F^{N-2}\right),
\end{align*}
noting that $f(x|\lambda)/F(x|\lambda) \ll 1$ at large $\lambda$.


\section{Comparison of peak finding probabilities with Poisson / normal approximation}
\label{appendix:poissonprobability}

\begin{figure*}
  % \includegraphics[width=\textwidth]{figures/peakfinding/20260218_peakfinding_comparison.eps}
  \includegraphics[width=\textwidth]{figure5.eps}
  \caption{\label{finding:gaussian}
    \textbf{(a)} Success probabilities of finding the correct peak position by solving Eqn.\,\ref{eqn:peakfinding}, given singles detection rate $s_1=s_2=100$\,kcounts/s, coincidence rate $c=650$\,counts/s, and bin overlap of $\nu=0.5$, and frequency offset error of $\Delta{}u=50$\,ppb (precompensation step size of 100\,ppb).
    \textbf{(b)} Corresponding probabilities using the normal approximation for observed accidental coincidences $X$ in Eqn.\,\ref{eqn:peakfinding}, with the same parameters. The probabilities are strongly dependent on $N$ and minimally with $\delta t$, similar to the estimations provided using the statistical significance framework.
  }
\end{figure*}

To understand the effect of normal approximation on peak finding success probabilities, it is useful to first understand the use of the max-order distribution.

Previous models derive a statistical significance $S$ representing the number of standard deviations $\sigma_X$ the coincidence peak value deviates from the mean number of accidental coincidences $X$ (of a \textit{single} time bin).
Ignoring any corrections for frequency difference, the significance shares the form
\begin{align}
S = \frac{c_e T}{\sigma_{X}} = \frac{c_e T}{\sqrt{s_1s_2 T \delta t}} = c_e \sqrt{\frac{N}{s_1 s_2}}
\label{eqn:significance}
\end{align}
using notation consistent with this paper (corresponding to Eqn.\,8 of \cite{ho2009clock} and Eqn.\,A3 of \cite{spiessClockSynchronizationCorrelated2023}). Notably, this equation only depends on the number of time bins $N$ and has no dependence on time bin width $\delta t$.

The search for the true coincidence peak is more concerned with distinguishing the true coincidence peak from the \textit{next highest} peak arising from noise, and less to do with the average fluctuations of a single time bin.
Since the number of accidental coincidences $X$ in each time bin is stochastic, the maximum observed noise peak across all time bins intuitively increases with increasing number of observations (i.e., number of bins $N$).
This maximum is distributed according to the order statistics $X_{(N)}$ of all time bins.

In contrast, the derivation in Eqn.\,\ref{eqn:significance} implicitly assumes that the noise peak is given only by the accidental coincidences $X$ from a single time bin. The use of statistical significance is thus more accurately described as an indication of the \textit{peak misidentification rate}: the probability that the true coincidence peak is misidentified as noise, given that the location of said peak is \textit{already known}. Our derivation of Eqn.\,\ref{eqn:peakfinding} in the main text, which accounts for $X_{(N)}$, yields the actual peak identification rate.

Due to the positive excess kurtosis (i.e., ``fatter tail'') of the Poisson distribution relative to the normal distribution, the corresponding max-order distribution $f_{(N)}(x|\lambda)$ is skewed more towards higher peak values.
At low coincidence rates per time bin ($\lambda \lesssim 10^4$) occurring at small $N$ and $\delta t$ values, the normal approximation $f_{(N)}(x|\mu,\sigma)$ underestimates the noise peak value, and thus overestimates the peak finding success probabilities.

The success probabilities for the Poisson model and the normal approximation model are shown in Fig.\,\ref{finding:gaussian} by numerically solving Eqn.\,\ref{eqn:peakfinding} using Monte Carlo simulations of $>10^7$ trials per $(N,\delta t)$ pair.
The normal approximation indeed overestimates the peak finding probabilities at small time bin width $\delta t$. We use $N = 2^q$ ($q\in\mathbb{Z}^+$) for its logarithmic scale.

\vspace{-1em}
\section{Derivation of peak tracking equation and parameters}
\label{appendix:math:tracking}

We perform active frequency compensation by estimating the clock frequency offset from the set of timestamps.
The timing difference $\tau_i := t_i - t'_i$, between a timestamp pair $\{t_i,t'_i\}$ from both parties, is continuously served by searching for photon pair detection events within a prescribed coincidence tracking window.

The frequency offset $\Delta{}u$ is given by the ratio between measured elapsed time $\Delta{}t_i := t_i - t_{i-1}$ with respect to some reference elapsed time, in this case the elapsed time measured by the other peer $\Delta{}t'_i$,

\begin{align*}
\frac{\Delta{}t_i}{\Delta{}t'_i} = 1 + \Delta{}u.
\end{align*}
Rewriting in terms of the measured successive timing difference, we can estimate the frequency offset by measuring the rate of change in the timing difference, i.e.,
\begin{align*}
\Delta{}u_i = \frac{\tau_i - \tau_{i-1}}{\Delta{}t'_i}.
\end{align*}

Active frequency compensation is therefore achieved by performing a timing correction for each timestamp $t_i$ using the estimated frequency offset,
\begin{align*}
t_i &\rightarrow{} \Delta{}t_i ( 1 + \Delta{}u) + t_{i-1} = t_i + \Delta{}t_i \Delta{}u,
\end{align*}
where the overall frequency offset $\Delta{}u$ accumulated is also given by
$\Delta{}u = \prod_0^i{\left(1 + \Delta{}u_i\right)} - 1$.

Peak tracking with thermal light is intrinsically noisy due to the signal being dominated by accidental coincidences.
Hence, we use an exponential moving average filter given by,
\begin{align*}
\tau{}'_i = \alpha\tau_{i} + (1 - \alpha)\tau'_{i-1},\quad{} \tau'_0 = \tau_0,
\end{align*}
which behaves like a low-pass filter to smooth the signal. The coefficient $\alpha$ represents the relative weight of timing offset $\tau_i$ with respect to the accumulated average $\tau'_{i-1}$, and can be adjusted depending on the observed signal rate.

Given a unit step impulse, the time it takes to reach $1 - 1/e$ of the signal is associated with a time constant $\beta$ related to the coefficient $\alpha$,
\begin{align*}
\alpha = 1 - \exp{\left(-\frac{\overline{\Delta{}t}}{\beta}\right)} \approx{} \frac{\overline{\Delta{}t}}{\beta},\qquad{} \beta \gg \overline{\Delta{}t},
\end{align*}
after Taylor expansion of the exponential, where $\overline{\Delta{}t}$ is the average separation time between consecutive timestamp events. The singles rate $s$ in our experiment is 200\,kcounts/s, so $\overline{\Delta{}t} = 1/s = 5$\,\textmu{}s. We set the time constant $\beta = 50$\,ms, which is 4 orders of magnitude longer than $\overline{\Delta{}t}$ for averaging.
