Sampling Partial Genealogies Using Sequential Importance Sampling

Dongmeng Liu, Jinko Graham

Department of Statistics & Actuarial Science, Simon Fraser University

A gene genealogy traces the ancestry of segments of DNA sequence back in time to their common ancestor. We cannot observe the genealogies but the DNA sequence data give us information which can be used to sample from the posterior distribution of the underlying genealogies. However, a full genealogy can be so large that it greatly decreases the efficiency of existing sampling techniques. Partial genealogies trace the ancestry to a fixed point back in time only, and can dramatically improve the efficiency of some commonly-used sampling methods. We introduce an algorithm for sampling the partial genealogies of a set of DNA sequences from their posterior distribution. Our algorithm uses sequential importance sampling (SIS) and accommodates coalescence, mutation and recombination events in the ancestral history of the sequences. SIS methods are computationally intensive and have become popular as an alternative to MCMC methods for inference in population genetics.

Keywords: coalescence, ancestry, partial genealogies, population genetics, sequential importance sampling