API reference

Full auto-generated API documentation for all public classes and functions.

Sampler

Sampler — top-level MCMC sampler for pysinger.

Mirrors Sampler.cpp / Sampler.hpp.

Typical usage:

sampler = Sampler(Ne=10000, recomb_rate=1e-8, mut_rate=1e-8)
sampler.load_vcf("data/chr1.vcf", start=0, end=1_000_000)
sampler.iterative_start()
sampler.internal_sample(num_iters=100, spacing=1)
class pysinger.sampler.Sampler(Ne: float = 1.0, recomb_rate: float = 0.0, mut_rate: float = 0.0, recomb_map: RateMap | None = None, mut_map: RateMap | None = None)[source]

Bayesian ARG sampler.

Parameters:
  • Ne (float) – Haploid effective population size (used as a time-scaling factor).

  • recomb_rate (float) – Per-base-pair per-generation recombination rate. Scaled internally by Ne to get the coalescent-unit rate.

  • mut_rate (float) – Per-base-pair per-generation mutation rate. Scaled by Ne.

  • recomb_map (RateMap or None) – Variable-rate maps. If provided, take precedence over the scalar rates for rho/theta computation.

  • mut_map (RateMap or None) – Variable-rate maps. If provided, take precedence over the scalar rates for rho/theta computation.

internal_sample(num_iters: int, spacing: int = 1) None[source]

Run num_iters MCMC iterations.

Each iteration proposes at least spacing * sequence_length bp of re-threading moves.

Mirrors Sampler::internal_sample.

iterative_start(max_retries: int = 5) None[source]

Thread all sample nodes one by one to build an initial ARG.

Mirrors Sampler::iterative_start.

If a threading step fails (e.g. HMM underflow on long sequences), the entire build is retried with a fresh RNG state up to max_retries times.

load_vcf(vcf_file: str, start: float = 0.0, end: float = inf, haploid: bool = False) None[source]

Load genotype data from a VCF file.

Parameters:
  • vcf_file – Path to the .vcf file.

  • start – Genomic region (half-open).

  • end – Genomic region (half-open).

  • haploid – If True, treat each column as one haplotype (no phased parsing).

set_precision(c: float, q: float) None[source]
set_seed(seed: int) None[source]

Data structures

Node — a vertex in the ARG carrying allele state information.

Mirrors Node.cpp / Node.hpp. The C++ version uses a std::map with a cached iterator; here we use sortedcontainers.SortedDict and rely on its O(log n) bisect operations, which are fast enough for a readable PoC.

class pysinger.data.node.Node(time: float, index: int = 0, mutation_sites: sortedcontainers.SortedDict = <factory>)[source]

A node in the Ancestral Recombination Graph.

Parameters:
  • time (float) – Coalescence time (in units of 2Ne generations). Leaf nodes have time == 0; the global root sentinel has time == inf.

  • index (int) – Integer label. Sample nodes receive 0-based indices; internal nodes receive larger indices; the root sentinel has index == -1.

  • mutation_sites (sortedcontainers.SortedDict) – Maps genomic position → allele state (1 = derived). A sentinel entry at position -1 with state 0 is kept so look-ups before the first real position always succeed.

add_mutation(pos: float) None[source]

Record that this node carries the derived allele at pos.

get_state(pos: float) float[source]

Return the allele state at pos (exact match only).

Mirrors C++ Node::get_state: returns the stored value if there is an exact key at pos, otherwise 0. This means mutations are point events — a mutation at position 100 does NOT carry its state forward to position 101 or beyond.

index: int = 0
mutation_sites: sortedcontainers.SortedDict
time: float
write_state(pos: float, state: float) None[source]

Overwrite the allele state; state==0 removes the entry.

Branch — a directed edge between two nodes, hashable by node identity.

Mirrors Branch.cpp / Branch.hpp.

In C++ each Branch holds shared_ptr<Node> lower_node and upper_node and compares them by pointer identity. Here we do the same using Python object identity (id()). The class is immutable after construction so it can safely serve as a dict key or set member.

class pysinger.data.branch.Branch(lower_node: 'Node' | None = None, upper_node: 'Node' | None = None)[source]

Directed edge (lower_node, upper_node) in the ARG.

Invariant: lower_node.time <= upper_node.time (except for the null branch where both are None).

Hashability: based on Python object identity of the two node pointers, exactly as the C++ branch_hash struct.

is_null() bool[source]

True iff this is the null (empty) branch.

property length: float

Branch length in time units. Returns inf for branches to root.

lower_node
upper_node

Tree — a marginal coalescent tree at a single genomic position.

Mirrors Tree.cpp / Tree.hpp.

The Tree is represented by two dicts:

parents: child_node -> parent_node children: parent_node -> set of child_nodes

The topology is updated forward/backward along the genome by applying Recombination records.

class pysinger.data.tree.Tree[source]

Marginal coalescent tree at one genomic position.

parents
Type:

child → parent mapping (all non-root nodes present as keys).

children
Type:

parent → set of children (only internal nodes as keys).

add(added_branch: Branch, joining_branch: Branch, cut_node: 'Node' | None) None[source]

Insert added_branch joining at joining_branch.

Mirrors Tree::add.

backward_update(r: Recombination) None[source]

Undo recombination r moving left along the genome.

copy() Tree[source]

Shallow-copy the topology dicts (nodes are shared).

delete_branch(branch: Branch) None[source]

Remove branch from the tree.

find_joining_branch(removed_branch: Branch) Branch[source]

Return Branch(sibling, grandparent) for the removed branch.

Mirrors Tree::find_joining_branch.

find_sibling(n: Node) 'Node' | None[source]

Return the sibling of n (the other child of n’s parent).

forward_update(r: Recombination) None[source]

Apply recombination r moving right along the genome.

get_branches() Set['Branch'][source]

Return all branches in the tree as a set of Branch objects.

insert_branch(branch: Branch) None[source]

Add branch to the tree.

internal_backward_update(r: Recombination, cut_time: float) None[source]

backward_update restricted to branches with upper_node.time > cut_time.

internal_forward_update(r: Recombination, cut_time: float) None[source]

forward_update restricted to branches with upper_node.time > cut_time.

length() float[source]

Total branch length (excluding root sentinel branches).

remove(branch: Branch, cut_node: Node) None[source]

Remove branch from the tree and replace with cut lineage.

Mirrors Tree::remove. After this call the tree has cut_node as a leaf connected to branch.lower_node, and the sibling is directly connected to branch.upper_node’s parent.

Recombination — the ARG topology change at a single recombination breakpoint.

Mirrors Recombination.cpp / Recombination.hpp.

A Recombination stores the set of branches deleted and inserted at a genomic position pos. From the deleted/inserted sets it derives the key named branches (source, target, merging, recombined, transfer branches) and the associated times (start_time). These are used by the BSP/TSP transfer steps and by the MCMC threader.

class pysinger.data.recombination.Recombination(deleted_branches: Set | None = None, inserted_branches: Set | None = None)[source]

Topology change record for one recombination breakpoint.

pos
Type:

Genomic position of the breakpoint (set externally).

deleted_branches
Type:

Branches that exist before (left of) pos.

inserted_branches
Type:

Branches that exist after (right of) pos.

Derived attributes (computed by find_recomb_info)
source_branch
Type:

The lineage that recombines.

target_branch
Type:

The lineage it joins at coalescence.

inserted_node
Type:

New internal node created at the coalescence.

deleted_node
Type:

Old internal node removed by the topology change.

merging_branch
Type:

The branch that takes over after the removed node.

recombined_branch
Type:

Part of the source lineage below start_time.

start_time
Type:

Height at which recombination begins.

lower/upper_transfer_branch
Type:

For BSP interval transfer.

source_sister_branch, source_parent_branch
Type:

auxiliary.

add(prev_added_branch: Branch, next_added_branch: Branch, prev_joining_branch: Branch, next_joining_branch: Branch, cut_node: 'Node' | None = None) None[source]

Update this recombination when a new lineage is threaded in.

Mirrors Recombination::add.

affect(b: Branch) bool[source]

True iff b is in deleted_branches.

create(b: Branch) bool[source]

True iff b is in inserted_branches.

remove(prev_removed_branch: Branch, next_removed_branch: Branch, prev_split_branch: Branch, next_split_branch: Branch, cut_node: 'Node' | None = None) None[source]

Update this recombination when the surrounding topology is pruned.

Mirrors Recombination::remove (both overloads).

set_pos(x: float) None[source]
trace_backward(t: float, curr_branch: Branch) Branch[source]

Return the branch that maps to curr_branch before pos.

Mirrors Recombination::trace_backward.

trace_forward(t: float, curr_branch: Branch) Branch[source]

Return the branch at time t that curr_branch maps to after pos.

Mirrors Recombination::trace_forward.

Interval and IntervalInfo — time-height intervals on ARG branches.

Mirrors Interval.cpp / Interval.hpp.

An Interval represents a window [lb, ub] in coalescent time on a specific Branch. The BSP stores a list of current Intervals as its forward-pass state space; the TSP does the same.

IntervalInfo is a lightweight, hashable summary used as a dict key inside BSP.transfer() to accumulate transfer weights from multiple source intervals into a single target interval.

class pysinger.data.interval.Interval(branch: Branch, lb: float, ub: float, start_pos: int)[source]

A (branch, lb, ub) time interval with associated forward-HMM state.

branch
Type:

The ARG branch this interval lives on.

lb
Type:

Lower time bound.

ub
Type:

Upper time bound.

start_pos

created (used during traceback).

Type:

Index into the coordinate grid where this interval was

weight
Type:

Coalescent probability mass in [lb, ub] (set by BSP).

time
Type:

Representative time point in [lb, ub] (median or median-CDF).

source_weights
Type:

Weights of source intervals (BSP transfer).

source_intervals
Type:

Corresponding source Interval objects.

node
Type:

Optional Node pointer (used by TSP for point-mass intervals).

assign_time(t: float) None[source]
assign_weight(w: float) None[source]
branch
fill_time() None[source]

Set self.time to the exponential-median of [lb, ub].

Mirrors Interval::fill_time() in the C++ code.

full(cut_time: float) bool[source]

True iff this interval spans the entire branch above cut_time.

lb
node: 'Node' | None
source_intervals: List['Interval']
source_weights: List[float]
start_pos
time: float
ub
weight: float
class pysinger.data.interval.IntervalInfo(branch: Branch, lb: float, ub: float, seed_pos: int = 0)[source]

Lightweight key for BSP.transfer() accumulation maps.

Mirrors C++ Interval_info — a (branch, lb, ub) triple with a seed_pos tiebreaker. Must be hashable so it can serve as a dict key.

branch
lb
seed_pos
ub

ARG — the Ancestral Recombination Graph, the central data structure.

Mirrors ARG.cpp / ARG.hpp.

An ARG is encoded as a sorted map of Recombination records keyed by genomic position. A marginal tree at position x is obtained by replaying all records from position 0 up to (and including) x.

The two main MCMC operations are:
remove(cut_point) — extract a single lineage from the ARG, leaving

the remaining haplotypes connected.

add(joining, added) — thread the lineage back in at new positions.

class pysinger.data.arg.ARG(Ne: float = 1.0, sequence_length: float = 1.0)[source]

Ancestral Recombination Graph.

Ne
Type:

Effective population size (diploid).

sequence_length
Type:

Length of the genomic region (base pairs).

root
Type:

Sentinel root node (time=inf, index=-1).

sample_nodes
Type:

Set of leaf (sample) nodes.

node_set
Type:

All non-root nodes.

recombinations

sentinels at 0 and INT_MAX.

Type:

SortedDict[pos → Recombination]. Always has

mutation_sites
Type:

Sorted set of positions carrying derived alleles.

mutation_branches
Type:

pos → set of Branches carrying the mutation.

coordinates
Type:

Grid of genomic positions (HMM bins).

rhos
Type:

Scaled recombination rate per bin.

thetas
Type:

Scaled mutation rate per bin.

MCMC working variables (reset by clear_remove_info)
removed_branches
Type:

pos → Branch that was removed.

joining_branches
Type:

pos → Branch that “jumped over” the cut.

cut_tree
Type:

Marginal tree at the cut position.

cut_pos, cut_node
Type:

Position and sentinel node for the cut.

start, end
Type:

Genomic extent of the current MCMC window.

add(new_joining_branches: sortedcontainers.SortedDict, added_branches: sortedcontainers.SortedDict) None[source]

Thread the removed lineage back in at new positions.

Mirrors ARG::add.

add_node(n: Node) None[source]
add_sample(n: Node) None[source]

Register sample node n and update mutation_sites.

approx_sample_recombinations() None[source]

Sample start_times and finalize derived fields for all recombinations.

Mirrors ARG::approx_sample_recombinations / RSP_smc::approx_sample_recombination. Sets source_branch, start_time, then calls _find_target_branch and _find_recomb_info so that merging_branch (used by BSP) is valid.

build_singleton_arg(n: Node) None[source]

Build an ARG containing a single sample node n.

clear_remove_info() None[source]

Reset MCMC working state.

compute_rhos_thetas(r: float, m: float) None[source]

Compute per-bin scaled recombination/mutation rates.

Mirrors ARG::compute_rhos_thetas(double r, double m).

count_flipping() int[source]
count_incompatibility() int[source]
discretize(bin_size: float) None[source]

Build coordinate grid, placing breakpoints at recombinations.

Mirrors ARG::discretize.

get_arg_length() float[source]

Total ARG length (sum of branch_length * genomic_span).

get_check_points() Set[float][source]

Return the set of recombination positions that need sanity checks.

get_index(x: float) int[source]

Return index i such that coordinates[i] <= x < coordinates[i+1].

get_query_node_at(x: float) Node | None[source]

Return the query node (lower_node of the removed branch at x).

get_tree_at(x: float) Tree[source]

Return the marginal tree at position x.

Replays all Recombination records with pos <= x.

num_unmapped() int[source]
remove(cut_point: Tuple[float, Branch, float]) None[source]

Remove a lineage from the ARG.

cut_point = (pos, center_branch, cut_time).

After this call: - self.removed_branches maps positions to the removed branch. - self.joining_branches maps positions to the “joining” branch. - self.start / self.end delimit the affected genomic region. - self.start_tree is the marginal tree at start (minus the cut).

sample_internal_cut() Tuple[float, Branch, float][source]

Sample a random (pos, branch, cut_time) for the next MCMC step.

Mirrors ARG::sample_internal_cut.

HMM

CoalescentCalculator — piecewise-exponential coalescent CDF/quantile.

Mirrors Coalescent_calculator.cpp / Coalescent_calculator.hpp.

Given a set of branches spanning a range of times, computes the probability that a new lineage coalesces in any time interval [lb, ub]. The coalescent rate at time t is the number of lineages currently alive at t (i.e., spanning t).

The CDF is piecewise-exponential: within each interval the rate is constant, so the survival function is exp(−rate * Δt).

class pysinger.hmm.coalescent.CoalescentCalculator(cut_time: float)[source]

Piecewise-exponential coalescent CDF for a set of branches.

Usage:

cc = CoalescentCalculator(cut_time=0.0)
cc.compute(set_of_branches)
p = cc.weight(lb, ub)   # probability of coalescence in [lb, ub]
t = cc.time(lb, ub)     # representative time in [lb, ub]
compute(branches: Set['Branch']) None[source]

Recompute the CDF from branches.

Mirrors Coalescent_calculator::compute.

prob(x: float) float[source]

Return cumulative coalescence probability at time x.

Mirrors Coalescent_calculator::prob.

quantile(p: float) float[source]

Return the time t such that prob(t) == p.

Mirrors Coalescent_calculator::quantile.

time(lb: float, ub: float) float[source]

Return a representative coalescence time in [lb, ub].

Uses the exponential median; falls back to the midpoint when the interval is tiny. Mirrors Coalescent_calculator::time.

weight(lb: float, ub: float) float[source]

Return the probability of coalescence in [lb, ub].

Mirrors Coalescent_calculator::weight.

Emission models for the BSP/TSP forward HMM.

Mirrors Binary_emission.cpp, Polar_emission.cpp, and Emission.hpp.

The abstract base class Emission defines two methods:

null_emit(branch, time, theta, query_node) — no mutation in bin mut_emit(branch, time, theta, bin_size, mut_set, query_node) — mutations in bin

BinaryEmission — symmetric infinite-sites model (C++ Binary_emission). PolarEmission — polarised (ancestral/derived) model (C++ Polar_emission).

class pysinger.hmm.emission.BinaryEmission[source]

Symmetric infinite-sites emission model.

Mirrors Binary_emission.cpp.

The emission probability for a branch (lower, upper) and query node with representative time t on the branch is:

P(data) ∝ exp(-θ * l_lower) * (θ/bin_size)^s_lower
  • exp(-θ * l_upper) * (θ/bin_size)^s_upper

  • exp(-θ * l_query) * (θ/bin_size)^s_query

divided by the “old” probability (without the threaded lineage) to get the ratio used in the HMM.

emit(branch: Branch, time: float, theta: float, bin_size: float, emissions: list, node: Node) float[source]

Emit using pre-computed diff counts (used by TSP).

mut_emit(branch: Branch, time: float, theta: float, bin_size: float, mut_set: Set[float], node: Node) float[source]

Return the emission probability when mut_set mutations fall in the bin.

null_emit(branch: Branch, time: float, theta: float, node: Node) float[source]

Return the emission probability when no mutations fall in the bin.

class pysinger.hmm.emission.Emission[source]

Abstract emission model for the HMM.

emit(branch: Branch, time: float, theta: float, bin_size: float, emissions: list, node: Node) float[source]

Emit using pre-computed diff counts (used by TSP).

abstractmethod mut_emit(branch: Branch, time: float, theta: float, bin_size: float, mut_set: Set[float], node: Node) float[source]

Return the emission probability when mut_set mutations fall in the bin.

abstractmethod null_emit(branch: Branch, time: float, theta: float, node: Node) float[source]

Return the emission probability when no mutations fall in the bin.

class pysinger.hmm.emission.PolarEmission(penalty: float = 1.0, ancestral_prob: float = 0.5)[source]

Polarised emission model for ancestral/derived alleles.

Mirrors Polar_emission.cpp.

Additional parameters:

penalty: Cost of derived alleles in the query node. ancestral_prob: Prior probability that allele is ancestral at root.

emit(branch: Branch, time: float, theta: float, bin_size: float, emissions: list, node: Node) float[source]

Emit using pre-computed diff counts (used by TSP).

mut_emit(branch: Branch, time: float, theta: float, bin_size: float, mut_set: Set[float], node: Node) float[source]

Return the emission probability when mut_set mutations fall in the bin.

null_emit(branch: Branch, time: float, theta: float, node: Node) float[source]

Return the emission probability when no mutations fall in the bin.

BSP — Branch Sequence Propagator, the forward HMM for branch threading.

Mirrors BSP_smc.cpp / BSP_smc.hpp.

The BSP computes the HMM forward probabilities over a set of Interval objects (each representing a (branch, time) cell). At each genomic position the HMM:

  1. Optionally applies a Recombination transfer (moves probability mass from old intervals to new ones according to the topology change).

  2. Advances by one bin (forward step), mixing staying-in-place with recombining to a new time.

  3. Multiplies by the emission probability (null or with mutations).

After the forward pass, sample_joining_branches() performs a traceback to return a map of pos → Branch for threading.

class pysinger.hmm.bsp.BSP[source]

Branch Sequence Propagator — forward HMM for the branch dimension.

State space: a list of Interval objects. Forward probabilities: forward_probs[step][interval_idx].

Mirrors BSP_smc in the C++ code.

avg_num_states() float[source]

Average number of states (intervals) per position.

forward(rho: float) None[source]

Advance forward by one bin with recombination rate rho.

Mirrors BSP_smc::forward.

mut_emit(theta: float, bin_size: float, mut_set: Set[float], query_node: 'Node' | None) None[source]

Apply mutation emission.

Mirrors BSP_smc::mut_emit.

null_emit(theta: float, query_node: 'Node' | None) None[source]

Apply null-emission (no mutations in bin).

Mirrors BSP_smc::null_emit.

reserve_memory(length: int) None[source]
sample_joining_branches(start_index: int, coordinates: List[float]) sortedcontainers.SortedDict[source]

Traceback to sample a joining-branch map.

Mirrors BSP_smc::sample_joining_branches. Returns SortedDict[pos → Branch].

sanity_check(r: Recombination) None[source]

Zero out invalid point-mass intervals at recombination nodes.

Mirrors BSP_smc::sanity_check.

set_check_points(p: Set[float]) None[source]
set_cutoff(x: float) None[source]
set_emission(e: Emission) None[source]
set_rng(rng: numpy.random.Generator) None[source]
start(branches: Set[Branch], t: float) None[source]

Initialise the forward pass at the left boundary.

branches is the set of branches in the starting tree. t is the cut time (lineage starts at time t).

Mirrors BSP_smc::start.

transfer(r: Recombination) None[source]

Apply topology change r to the interval state space.

Mirrors BSP_smc::transfer.

TSP — Time Sequence Propagator, the forward HMM for coalescence times.

Mirrors TSP_smc.cpp / TSP_smc.hpp.

The TSP operates on a single Branch and samples a representative coalescence time in each genomic bin. It uses a PSMC-style transition kernel (psmc_prob) rather than the coalescent-CDF-based kernel of BSP.

Key public API

start(branch, t) — initialise at the left boundary forward(rho) — advance one bin transfer(r, prev_branch, next_branch)— apply topology change recombine(prev_branch, next_branch) — re-sample at a recombination null_emit(theta, query_node) — apply no-mutation emission mut_emit(theta, bin_size, mut_set, query_node) — apply mutation emission sample_joining_nodes(start_index, coordinates) — traceback → Dict[pos, Node]

class pysinger.hmm.tsp.TSP[source]

Time Sequence Propagator — forward HMM for the time dimension.

State space: a list of Interval objects (finely gridded over a Branch). Forward probabilities: forward_probs[step][interval_idx].

Mirrors TSP_smc in the C++ code.

forward(rho: float) None[source]

Advance one bin.

Mirrors TSP_smc::forward.

mut_emit(theta: float, bin_size: float, mut_set: Set[float], query_node: Node) None[source]

Apply mutation emission.

Mirrors TSP_smc::mut_emit.

null_emit(theta: float, query_node: Node) None[source]

Apply null emission (no mutations).

Mirrors TSP_smc::null_emit.

recombine(prev_branch: Branch, next_branch: Branch) None[source]

Re-sample interval distribution at a recombination.

Mirrors TSP_smc::recombine.

reserve_memory(length: int) None[source]
sample_joining_nodes(start_index: int, coordinates: List[float]) Dict[float, Node | None][source]

Traceback to sample a joining-node map.

Returns Dict[pos → Node]. Mirrors TSP_smc::sample_joining_nodes.

set_check_points(p: Set[float]) None[source]
set_emission(e: Emission) None[source]
set_gap(q: float) None[source]
set_rng(rng: numpy.random.Generator) None[source]
start(branch: Branch, t: float) None[source]

Initialise forward pass at the left boundary.

Mirrors TSP_smc::start.

transfer(r: Recombination, prev_branch: Branch, next_branch: Branch) None[source]

Apply topology change r.

Mirrors TSP_smc::transfer.

MCMC

Threader — MCMC move: remove a lineage and re-thread it.

Mirrors Threader_smc.cpp / Threader_smc.hpp.

Two public entry points:

thread(arg, node) — add a new leaf node (initial threading) internal_rethread(arg, cut) — MCMC move with Metropolis acceptance

class pysinger.mcmc.threader.Threader(cutoff: float = 0.0, gap: float = 0.02)[source]

BSP + TSP threader for adding/rethreading a lineage in an ARG.

Parameters:
  • cutoff (float) – BSP state-space pruning threshold (bsp_c in C++).

  • gap (float) – TSP time grid quantile gap (tsp_q in C++).

internal_rethread(arg: ARG, cut_point: Tuple[float, Branch, float]) None[source]

MCMC move: remove and re-thread a lineage segment.

Proposes a new threading, accepts with Metropolis ratio. Mirrors Threader_smc::internal_rethread.

set_rng(rng: numpy.random.Generator) None[source]
thread(arg: ARG, node: Node) None[source]

Add node as a new leaf and thread its lineage through arg.

Mirrors Threader_smc::thread.

I/O

VCF reader — load phased or haploid VCF into Node objects.

Mirrors Sampler::naive_read_vcf / naive_read_vcf_haploid in Sampler.cpp.

Returns a list of Node objects with mutation sites set, one node per haplotype (2 per individual for diploid, 1 per individual for haploid).

pysinger.io.vcf_reader.read_vcf_haploid(vcf_file: str, start_pos: float = 0.0, end_pos: float = inf) Tuple[List[Node], float][source]

Read a haploid VCF and return (nodes, sequence_length).

One Node per individual (column after FORMAT).

Mirrors Sampler::naive_read_vcf_haploid.

pysinger.io.vcf_reader.read_vcf_phased(vcf_file: str, start_pos: float = 0.0, end_pos: float = inf) Tuple[List[Node], float][source]

Read a phased VCF and return (nodes, sequence_length).

One Node per haplotype (2 × n_individuals). Mutations are stored as positions relative to start_pos.

Mirrors Sampler::naive_read_vcf.

Parameters:
  • vcf_file – Path to the .vcf file.

  • start_pos – Genomic region to load (half-open [start_pos, end_pos)).

  • end_pos – Genomic region to load (half-open [start_pos, end_pos)).

Returns:

  • nodes (List[Node]) – Haplotype nodes (index 0 .. 2n-1).

  • sequence_length (float) – end_pos - start_pos.

tskit writer — convert a pysinger ARG to a tskit TreeSequence.

Mirrors the conceptual output of SINGER’s write() method, but produces a tskit.TreeSequence that can be analysed with the tskit/msprime ecosystem.

The conversion:
  1. Add all non-root nodes (time = node.time * 2*Ne) as tskit individuals.

  2. Walk the ARG recombinations to build a list of (left, right, parent, child) edges spanning each genomic interval.

  3. Sort edges (required by tskit) and call tables.sort() + tables.tree_sequence().

pysinger.io.tskit_writer.arg_to_tskit(arg: ARG, Ne: float = 1.0)[source]

Convert arg to a tskit.TreeSequence.

Parameters:
  • arg (ARG) – The pysinger ARG object.

  • Ne (float) – Effective population size used to convert coalescent time units to generations (t_generations = t_coalescent * 2 * Ne).

Returns:

ts

Return type:

tskit.TreeSequence

Rates

RateMap — piecewise-constant rate map (recombination or mutation).

Mirrors Rate_map.cpp / Rate_map.hpp.

class pysinger.rates.rate_map.RateMap[source]

Piecewise-constant genetic rate map.

Loaded from a file with lines: left right rate

Provides cumulative distance lookup for converting genomic positions to rate-weighted distances.

cumulative_distance(x: float) float[source]
load_map(filename: str) None[source]

Load a rate map from a whitespace-delimited file.

mean_rate() float[source]
segment_distance(x: float, y: float) float[source]

Reconstruction

FitchReconstruction — Fitch parsimony ancestral state reconstruction.

Mirrors Fitch_reconstruction.cpp / Fitch_reconstruction.hpp.

Given a Tree, performs a two-pass Fitch parsimony reconstruction at each genomic position to assign ancestral states to internal nodes.

Pass 1 (pruning / up-pass): bottom-up, assigns a state to each node based on the intersection (or union when intersection is empty) of its children. We encode ambiguity as 0.5.

Pass 2 (peeling / down-pass): top-down, resolves ambiguous states by propagating the parent’s definite state downward.

class pysinger.reconstruction.fitch.FitchReconstruction(tree: Tree)[source]

Two-pass Fitch parsimony reconstruction on a Tree.

Usage:

fr = FitchReconstruction(tree)
fr.reconstruct(pos)      # assigns states for position *pos*
fr.update(recombination) # advance tree topology
reconstruct(pos: float) None[source]

Perform Fitch reconstruction at genomic position pos.

Writes the inferred ancestral states back into each node via node.write_state(pos, state).

Mirrors Fitch_reconstruction::reconstruct.

update(r: Recombination) None[source]

Advance the topology by applying recombination r.

Mirrors Fitch_reconstruction::update.