Originally published as Genetics Published Articles Ahead of Print on April 10, 2009.

Genetics, Vol. 182, 575-593, June 2009, Copyright © 2009
doi:10.1534/genetics.108.100222

mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations

* Machine Learning Department and † School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15215

1 Corresponding author: 5000 Forbes Ave., School of Computer Science, Pittsburgh, PA 15215. Email: epxing@cs.cmu.edu

Traditional methods for analyzing population structure, such as the Structure program, ignore allele mutations between the ancestral and current alleles of genetic markers, which can dramatically affect the accuracy of structural estimates for current populations. Studying these mutations can also reveal additional information about population evolution, such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that jointly addresses structure inference and mutation estimation through a hierarchical Bayesian framework, together with a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. We present a comparison of the structural maps of world populations estimated by mStruct and Structure, and we report potentially interesting mutation patterns in world populations estimated by mStruct.