Genetics. Published Articles Ahead of Print: January 21, 2007, Copyright © 2007
doi:10.1534/genetics.106.061317


A more recent version of this article appeared on April 1, 2007.


REGULAR RESEARCH PAPERS

Inference of population structure under a Dirichlet process prior

1 University of California, Berkeley
2 University of California, San Diego

* To whom correspondence should be addressed. E-mail: johnh{at}biomail.ucsd.edu.

Submitted on May 25, 2006
Revised on August 25, 2006
Accepted on December 24, 2006


Abstract

Inferring population structure from genetic data sampled from some number of individuals is a formidable statistical problem. One widely used approach considers the number of populations to be fixed and calculates the posterior probability of assigning individuals to each population. More recently, the assignment of individuals to populations and the number of populations have both been considered random variables that follow a Dirichlet process prior. We examined the statistical behavior of assignment of individuals to populations under a Dirichlet process prior. First, we examined a best-case scenario, in which all of the assumptions of the Dirichlet process prior were satisfied, by generating data under a Dirichlet process prior. Second, we examined the performance of the method when the genetic data were generated under a population genetics model with symmetric migration between populations. We find that the method can be sensitive to the prior assumptions when the number of loci sampled is small, but that inferences are more robust to the prior on the number of populations when the number of sampled loci is large. Inferences on the number of populations are more accurate when θ = 4N_eµ is large and when the migration rate is low. We examined the accuracy of population assignment using a distance on partitions. Finally, we discuss several methods for summarizing the results of a Bayesian Markov chain Monte Carlo analysis of population structure. We develop the notion of the mean population partition, which is the partition of individuals to populations that minimizes the squared partition distance to the partitions sampled by the Markov chain Monte Carlo algorithm.
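
The ideas of a Dirichlet process prior on assignments, a distance on partitions, and a mean partition can be illustrated with a minimal Python sketch. It is not the implementation used in the article, and it rests on several assumptions: the prior is simulated with the Chinese restaurant process representation, with a concentration parameter called `alpha` here; the partition distance is taken to be the minimum number of individuals that must be removed for two partitions to become identical; and the mean partition is searched for only among the sampled partitions rather than among all possible partitions. All function names are hypothetical.

```python
import random
from itertools import permutations


def sample_crp_partition(n, alpha, rng):
    """Draw population labels for n individuals from a Chinese restaurant process.

    alpha (> 0) is the assumed concentration parameter of the prior.
    """
    labels = []
    counts = []  # counts[k] = individuals already assigned to population k
    for _ in range(n):
        # An existing population k has weight counts[k]; a new one has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new population
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def partition_distance(a, b):
    """Minimum number of individuals to remove so that partitions a and b agree.

    a and b are label lists of equal length. The search over block matchings
    is brute force, so this is only practical for a handful of populations.
    """
    blocks_a, blocks_b = sorted(set(a)), sorted(set(b))
    if len(blocks_a) > len(blocks_b):
        return partition_distance(b, a)  # the distance is symmetric
    # overlap[(x, y)] = individuals in block x of a and block y of b
    overlap = {(x, y): 0 for x in blocks_a for y in blocks_b}
    for x, y in zip(a, b):
        overlap[(x, y)] += 1
    # Best one-to-one matching of blocks of a to blocks of b.
    best = max(
        sum(overlap[(x, y)] for x, y in zip(blocks_a, perm))
        for perm in permutations(blocks_b, len(blocks_a))
    )
    return len(a) - best


def mean_partition(samples):
    """Sampled partition minimizing the summed squared distance to all samples.

    Restricting candidates to the samples themselves is a simplification.
    """
    return min(
        samples,
        key=lambda cand: sum(partition_distance(cand, s) ** 2 for s in samples),
    )


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_crp_partition(8, 1.0, rng) for _ in range(20)]
    print(mean_partition(draws))
```

Because the distance is computed after matching blocks, it does not depend on how populations are labeled: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` are at distance zero. For more than a few populations, the brute-force matching would need to be replaced by an assignment-problem solver.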

Key Words: Bayesian, Dirichlet process prior, Markov chain Monte Carlo, population structure



