Inferring and understanding changes in effective population size over time is a major challenge for population genetics. Here we investigate some theoretical properties of random mating populations with varying size over time. In particular, we present an exact solution to compute the population size as a function of time, Ne(t), based on distributions of coalescent times of samples of any size. This result reduces the problem of population size inference to a problem of estimating coalescent time distributions. To illustrate the analytic results, we design a heuristic methods using a tree-inference algorithm and investigate simulated and empirical population-genetic data. We investigate the effects of a range of conditions associated with empirical data, for instance number of loci, sample size, mutation rate, and cryptic recombination. We show that our approach performs well with genomic data (10,000 loci) and that increasing the sample size from 2 to 10 greatly improves the inference of Ne(t) whereas further increase in sample size results in modest improvements, even under a scenario of exponential growth. We also investigate the impact of recombination and characterize the potential biases in inference of Ne(t). The approach can handle large sample sizes and the computations are fast. We apply our method to human genomes from four populations and reconstruct population size profiles that are coherent with previous finds, including the Out-of-Africa bottleneck. Additionally, we uncover a potential difference in population size between African and non-African populations as early as 400 thousand years ago. In summary, we provide an analytic relationship between distributions of coalescent times and Ne(t), which can be incorporated into powerful approaches for inferring past population sizes from population-genomic data.
- Received November 30, 2015.
- Accepted July 20, 2016.
- Copyright © 2016, The Genetics Society of America