A general method for calculating likelihoods under the coalescent process
Analysis of genomic data requires an efficient way to calculate likelihoods across very large numbers of loci. We describe a general method for finding the distribution of genealogies: we allow migration between demes, splitting of demes [as in the isolation-with-migration (IM) model], and recombination between linked loci. These processes are described by a set of linear recursions for the generating function of branch lengths. Under the infinite-sites model, the probability of any configuration of mutations can be found by differentiating this generating function. Such calculations are feasible for small numbers of sampled genomes: as an example, we show how the generating function can be derived explicitly for three genes under the two-deme IM model. This derivation is done automatically, using Mathematica. Given data from a large number of unlinked and nonrecombining blocks of sequence, these results can be used to find maximum-likelihood estimates of model parameters by tabulating the probabilities of all relevant mutational configurations and then multiplying across loci. The feasibility of the method is demonstrated by applying it to simulated data and to a data set previously analyzed by Wang and Hey (2010) consisting of 26,141 loci sampled from Drosophila simulans and D. melanogaster. Our results suggest that such likelihood calculations are scalable to genomic data as long as the numbers of sampled individuals and mutations per sequence block are small.
189
3
977 - 987
977 - 987
Genetics Society of America