Overview
What pysinger does
pysinger performs Bayesian posterior sampling of Ancestral Recombination Graphs (ARGs) under the Sequentially Markov Coalescent (SMC). Given a phased VCF file and population-genetic parameters (\(N_e\), \(\mu\), \(r\)), it:
1. Builds an initial ARG by iteratively threading each haplotype through the growing graph using two coupled Hidden Markov Models.
2. Refines the ARG via Metropolis–Hastings MCMC that proposes local re-threading moves.
3. Exports the inferred ARG as a tskit.TreeSequence for downstream population-genetic analysis.
The SINGER algorithm in one paragraph
SINGER threads one haplotype at a time into a partially built ARG. For each new haplotype, a Branch Sequence Propagator (BSP) runs a forward HMM along the genome to decide which branch of each marginal tree the new lineage should join. Conditioned on that branch sequence, a Time Sequence Propagator (TSP) runs a second forward HMM to decide when the lineage joins, that is, at what coalescence time. Both HMMs use mutation data as emission evidence. Once every haplotype has been threaded, the MCMC stage repeats one move: pick a lineage at random, remove it, re-thread it with the BSP and TSP, and accept or reject the result with a Metropolis acceptance test.
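The forward recursion that both propagators rely on can be illustrated with a generic toy HMM. This is a minimal sketch under stated assumptions, not pysinger's BSP or TSP: the function name `forward`, the two hidden states (standing in for two candidate branches), and every probability below are hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def forward(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> List[List[float]]:
    """Return normalized forward probabilities, one row per site.

    init[i]     : prior probability of state i at the first site
    trans[i][j] : probability of moving from state i to state j between sites
    emit[i][o]  : probability of observing symbol o while in state i
    """
    n_states = len(init)
    rows: List[List[float]] = []
    prev = list(init)
    for t, o in enumerate(obs):
        if t == 0:
            pred = prev
        else:
            # Propagate: sum over every state the chain could have come from.
            pred = [
                sum(prev[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Weight by the emission evidence, then renormalize to avoid underflow.
        unnorm = [pred[j] * emit[j][o] for j in range(n_states)]
        total = sum(unnorm)
        prev = [u / total for u in unnorm]
        rows.append(prev)
    return rows


# Toy setup (all numbers are illustrative assumptions).
# Hidden states: two candidate branches. Observation 0 = allele matches,
# observation 1 = allele mismatches (evidence of a mutation).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]   # switching states plays the role of recombination
emit = [[0.95, 0.05], [0.5, 0.5]]  # branch 0 rarely mismatches
posterior = forward(init, trans, emit, obs=[0, 0, 1, 0])
```

Each row of `posterior` is the filtered distribution over states at one site given the observations so far. A sampler would typically follow the forward pass with a backward sampling pass to draw a full state path.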
Pipeline
VCF file
    │
    ▼
Sampler.load_vcf()          ← parse phased genotypes into Node objects
    │
    ▼
Sampler.iterative_start()   ← thread haplotypes 1-by-1 (BSP + TSP)
    │
    ▼
Sampler.internal_sample()   ← MCMC re-threading with Metropolis–Hastings
    │
    ▼
arg_to_tskit()              ← export to tskit.TreeSequence
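The accept/reject logic behind the MCMC stage can be sketched with a generic Metropolis sampler on a toy target. This is an illustration only: the three integer states stand in for ARG configurations, the proposal is a simple symmetric step on a ring rather than a re-threading move, and the name `metropolis_step` and the target weights are hypothetical. The acceptance ratio pysinger actually uses is not reproduced here.

```python
import math
import random
from collections import Counter
from typing import Callable


def metropolis_step(
    state: int,
    log_target: Callable[[int], float],
    n_states: int,
    rng: random.Random,
) -> int:
    """One Metropolis step with a symmetric +/-1 proposal on a ring of states."""
    proposal = (state + rng.choice((-1, 1))) % n_states
    # The proposal is symmetric, so the Hastings correction cancels and only
    # the ratio of target densities remains.
    log_ratio = log_target(proposal) - log_target(state)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal  # accept the move
    return state         # reject and keep the current state


# Toy target (illustrative assumption): unnormalized weights 1 : 2 : 7.
weights = [1.0, 2.0, 7.0]


def log_target(s: int) -> float:
    return math.log(weights[s])


rng = random.Random(0)
state = 0
counts: Counter = Counter()
n_steps = 50_000
for _ in range(n_steps):
    state = metropolis_step(state, log_target, len(weights), rng)
    counts[state] += 1

# Empirical visit frequencies should approach 0.1, 0.2 and 0.7.
freqs = [counts[s] / n_steps for s in range(len(weights))]
```

Rejected proposals still count as a visit to the current state, which is what makes the long-run frequencies match the target.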
Package map
| Module | Role |
|---|---|
| | Top-level orchestrator |
| | Core data structures: |
| | Forward HMMs: |
| | |
| | VCF reader, tskit writer |
| | Piecewise-constant recombination/mutation rate maps |
| | Fitch parsimony for ancestral state reconstruction |
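Fitch parsimony, the last role listed in the table, can be shown on a four-leaf tree. This is a textbook sketch, not the package's implementation: the function name `fitch`, the dictionary-based tree encoding, and the tie-breaking rule (take the smallest state when the parent's state is not available) are assumptions made for this example.

```python
from typing import Dict, Optional, Set, Tuple

# Binary tree encoded as: internal node -> (left child, right child).
Tree = Dict[str, Tuple[str, str]]


def fitch(
    tree: Tree, root: str, leaf_states: Dict[str, str]
) -> Tuple[Dict[str, str], int]:
    """Return (state assigned to every node, minimum number of state changes)."""
    sets: Dict[str, Set[str]] = {}
    changes = 0

    def up(node: str) -> Set[str]:
        """Bottom-up pass: intersect child sets, or take the union and count a change."""
        nonlocal changes
        if node not in tree:  # leaf
            sets[node] = {leaf_states[node]}
            return sets[node]
        left, right = tree[node]
        a, b = up(left), up(right)
        common = a & b
        if common:
            sets[node] = common
        else:
            sets[node] = a | b
            changes += 1
        return sets[node]

    assigned: Dict[str, str] = {}

    def down(node: str, parent_state: Optional[str]) -> None:
        """Top-down pass: keep the parent's state whenever the node's set allows it."""
        if parent_state is not None and parent_state in sets[node]:
            state = parent_state
        else:
            state = min(sets[node])  # deterministic tie-break (an assumption)
        assigned[node] = state
        if node in tree:
            for child in tree[node]:
                down(child, state)

    up(root)
    down(root, None)
    return assigned, changes


# Four haplotypes at one site; only C carries the derived allele "1".
tree = {"root": ("n1", "n2"), "n1": ("A", "B"), "n2": ("C", "D")}
leaf_states = {"A": "0", "B": "0", "C": "1", "D": "0"}
states, n_changes = fitch(tree, "root", leaf_states)
```

Here the reconstruction assigns the ancestral allele "0" to every internal node and explains the data with a single change on the branch leading to C.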