Inference of Population Structure Using Multilocus Genotype Data: Linked Loci and Correlated Allele Frequencies
Daniel Falush (a), Matthew Stephens (b), and Jonathan K. Pritchard (c)
a Department of Molecular Biology, Max-Planck Institut für Infektionsbiologie, 10117 Berlin, Germany
b Department of Statistics, University of Washington, Seattle, Washington 98195
c Department of Human Genetics, University of Chicago, Chicago, Illinois 60637
Corresponding author: Daniel Falush, Schumann Strasse 21/22, 10117 Berlin, Germany. E-mail: falush{at}mpiib-berlin.mpg.de
Communicating editor: M. K. UYENOYAMA
| ABSTRACT |
|---|
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.
THE study of admixed populations arises in many contexts in population genetics: for example, in the study of hybrid zones (![]()
In this article, we develop methods for studying the ancestry of both individuals and specific loci within admixed populations. Much of the previous work on population admixture has aimed to estimate average admixture proportions in an entire population (e.g., ![]()
We consider a situation in which we have multilocus genotype data from a sample of individuals collected from a population with (possibly) unknown structure. ![]()
$\sum_k q_k = 1$). Both of those models assume that all the markers are unlinked and provide independent information on an individual's ancestry. In this article we introduce a third model, the "linkage model," which extends the admixture model to account for the correlations between linked markers that arise as the result of admixture ("admixture linkage disequilibrium"; ![]()
We also discuss a new prior model for the allele frequencies within each population, which can be used in conjunction with any of the three ancestry models. This model, while still relatively simple, is more accurate in many situations and sometimes allows much more information to be extracted from the data. These and a number of other extensions to the original model described by Pritchard et al. have been implemented in a computer program, structure version 2.0, available at http://pritch.bsd.uchicago.edu.
| SUMMARY OF OLD AND NEW MODELS |
|---|
Consider a sample of N individuals, each genotyped at L loci. ![]()
In the no-admixture model, each individual comes from one of the K populations. We let z(i) denote the population of origin of individual i and Z denote the vector (z(1) ... z(N)). Each of the K populations is characterized by a set of allele frequencies at each locus. Let pklj refer to the frequency of allele j at locus l in population k, and let P denote the full multidimensional vector of allele frequencies for all k, l, and j. A key modeling assumption is that there is linkage equilibrium and Hardy-Weinberg equilibrium (HWE) within populations. Hence, the likelihood of the genotype of individual i, conditional on its population-of-origin z(i), is simply a product of the frequencies of its alleles in that population.
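The likelihood described in this paragraph can be illustrated with a minimal Python sketch. The nested-list layout `P[k][l][j]`, the function name, and the toy frequencies are assumptions made for illustration, not part of structure; constant factors for unordered heterozygous genotypes are ignored.

```python
import math

def log_likelihood_no_admixture(genotype, freqs_k):
    """Log-likelihood of one individual's genotype given its population of origin.

    genotype: list over loci of tuples of allele indices (one per allele copy).
    freqs_k:  list over loci of allele-frequency lists for population k.
    Under HWE and linkage equilibrium this is a sum of log allele frequencies.
    """
    total = 0.0
    for locus, alleles in enumerate(genotype):
        for allele in alleles:
            total += math.log(freqs_k[locus][allele])
    return total

# Hypothetical toy data: K = 2 populations, L = 2 loci, 2 alleles per locus.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # population 0
    [[0.3, 0.7], [0.6, 0.4]],  # population 1
]
genotype = [(0, 0), (1, 1)]    # one diploid individual

loglik = [log_likelihood_no_admixture(genotype, P[k]) for k in range(len(P))]
```

With these toy values the genotype is far more probable under population 0 than under population 1, which is the comparison the clustering step relies on.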
An obvious limitation of the no-admixture model is that in practice individuals may have recent ancestors in more than one population. To model this, Pritchard et al. introduced an admixture model, in which each individual is assumed to have inherited some proportion of its ancestry from each population. Let $q_k^{(i)}$ denote the proportion of individual i's genome that is derived from population k (where $\sum_k q_k^{(i)} = 1$), and let Q be the multidimensional vector of ancestry proportions for all the members of the sample. It is now possible for the different allele copies in an individual to come from different populations. (We use the term "allele copy" to refer to an allele carried at a particular locus by a particular individual.) To reflect this, the vector Z now records the population of origin of every allele copy in each individual, with $z_l^{(i,a)}$ denoting the origin of the ath allele copy at locus l in individual i. Conditional on Q, the origins of the allele copies are assumed to be independent, with

$$\Pr(z_l^{(i,a)} = k \mid Q) = q_k^{(i)}. \tag{1}$$
This admixture model also assumes linkage equilibrium and HWE within populations.
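As a rough illustration of the admixture model, the sketch below draws the origin of each allele copy independently from an individual's ancestry vector q and computes the marginal probability of observing a given allele. The function names, the `P[k][l][j]` layout, and the toy values are hypothetical.

```python
import random

def allele_copy_prob(q, P, locus, allele):
    """Probability of an allele at a locus, averaging over the unknown origin z."""
    return sum(q_k * P[k][locus][allele] for k, q_k in enumerate(q))

def draw_origins(q, n_loci, ploidy, rng):
    """Draw z independently for every allele copy, each with Pr(z = k) = q[k]."""
    pops = list(range(len(q)))
    return [rng.choices(pops, weights=q, k=ploidy) for _ in range(n_loci)]

# Hypothetical toy values: K = 2 populations, one locus with two alleles.
P = [[[0.9, 0.1]], [[0.3, 0.7]]]
q = [0.75, 0.25]

prob_allele0 = allele_copy_prob(q, P, locus=0, allele=0)  # 0.75*0.9 + 0.25*0.3
origins = draw_origins(q, n_loci=5, ploidy=2, rng=random.Random(1))
```

Because every allele copy is drawn independently, nothing in this sketch depends on map position; that is the feature the linkage model later changes.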
Inference is performed in a Bayesian framework, which offers a number of practical advantages in this context. Among these, it allows a straightforward assessment of the statistical uncertainty in each estimate of interest. It also allows us to make use of any prior information that we might have regarding population membership for some members of the sample. See ![]()
The Bayesian approach requires priors for P and Q. Following ![]()
, independently for each k. Some modifications of this prior are described below. The admixture proportions $q^{(i)}$ for individual i were also modeled as draws from a symmetric Dirichlet distribution, in this case with a hyperparameter $\alpha$. The assumption of symmetry in the prior for the q's corresponds intuitively to an assumption that the K populations contribute roughly equal amounts of genetic material to the sample. To better model situations where this is not the case, the updated implementation of structure allows different values of $\alpha$ to be estimated for each population (so $\alpha$ becomes a vector of K values, with $\alpha_k$ representing the relative contribution of population k to the genetic material in the sample). Otherwise the prior for q is unchanged. Alternative models for q are considered by ![]()
In practice we may not know either the allele frequencies P or the populations of origin Z in advance. Pritchard et al. described a Markov chain Monte Carlo (MCMC) scheme that estimates these jointly. This procedure clusters individuals into populations and estimates the probability of membership (or, for the admixture model, the proportion of membership) in each population for each individual.
A number of related population genetic methods have been described, including ![]()
The linkage model:
A deficiency of the admixture model is that by assuming that the z's within each individual are independent, it ignores the correlations in ancestry that one would expect to see along each chromosome. In this context, it is helpful to distinguish between three sources of linkage disequilibrium (LD). The first source is variation in ancestry (q) among the sampled individuals. Variation in q leads to correlations among markers across the genome, even if they are unlinked, because individuals with a large component of ancestry in population k have an excess of alleles that are common in k. We call this LD "mixture LD." The second source is correlations in ancestry along each chromosome, which cause additional LD between linked markers. We visualize this LD as occurring because each chromosome is composed of a set of "chunks" that are derived, as unbroken units, from one or another of the ancestral populations. In our terminology, this second source is "admixture LD." The third source is "background LD" within populations, which usually decays on a much shorter scale (tens of kilobases in humans). The admixture model in ![]()
To make inference computationally tractable, we use a simple model that incorporates the notion of discrete chromosomal chunks inherited from ancestral populations. Whereas in the "admixture" model of ![]()
Formally, the above assumptions translate into replacing the admixture model assumption that the z's along each chromosome are independent with the assumption that z's along each chromosome are dependent, forming a Markov chain. Specifically, for haploid data, independently for each individual i,
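$$\Pr(z_1^{(i)} = k \mid r, Q) = q_k^{(i)} \tag{2}$$

and

$$\Pr(z_{l+1}^{(i)} = k' \mid z_l^{(i)} = k, r, Q) = \begin{cases} \exp(-d_l r) + (1 - \exp(-d_l r))\, q_{k'}^{(i)} & \text{if } k' = k, \\ (1 - \exp(-d_l r))\, q_{k'}^{(i)} & \text{otherwise,} \end{cases} \tag{3}$$

where $d_l$ denotes the genetic distance from locus l to locus l + 1, assumed known. For diploid (or polyploid) data, independently for each individual i, the z's along each of i's two (or more) chromosomes form independent Markov chains satisfying Equation 2 and Equation 3.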
Note that the linkage model includes the admixture model as a limiting case: as r tends to infinity in (3), all loci become independent, returning us to the original admixture model (Equation 1). Note also that we assume that r is the same for all individuals, although this assumption could be relaxed at the cost of an increase in the number of parameters.
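The Markov chain on ancestry and its limiting behavior can be sketched in a few lines of Python. This assumes the transition form described above (no chunk boundary with probability exp(-d r), otherwise a fresh draw from q); the function names are hypothetical and this is not the structure implementation.

```python
import math
import random

def transition_probs(k_prev, q, d, r):
    """Distribution of z at the next locus given z = k_prev at the current one.

    With probability exp(-d*r) no chunk boundary falls between the loci and the
    origin is unchanged; otherwise the origin is redrawn from q.
    """
    stay = math.exp(-d * r)
    return [stay * (1.0 if k == k_prev else 0.0) + (1.0 - stay) * q_k
            for k, q_k in enumerate(q)]

def simulate_chain(q, distances, r, rng):
    """Simulate origins along one chromosome; distances[l] separates loci l and l+1."""
    pops = list(range(len(q)))
    z = [rng.choices(pops, weights=q)[0]]
    for d in distances:
        z.append(rng.choices(pops, weights=transition_probs(z[-1], q, d, r))[0])
    return z

q = [0.2, 0.3, 0.5]
limit = transition_probs(0, q, d=0.5, r=1e6)   # very large r: next origin is a draw from q
tight = transition_probs(0, q, d=0.0, r=10.0)  # zero distance: origin cannot change
```

The two calls at the end show the extremes: as r grows the chain forgets its previous state and the admixture model is recovered, while at zero distance adjacent loci always share an origin.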
Interpretation of the linkage model:
To provide some motivation for the linkage model, consider the following idealized scenario. Suppose that our sample comes from a diploid population that experienced a single "admixture event" followed by t2 generations of subsequent random mating within postadmixture populations. In the generation of the admixture event, individuals are formed by mating of individuals between two or more ancestral populations. These individuals inherit their DNA intact (i.e., without intervening recombination) from the ancestral populations. In the subsequent generation, the boundaries delineating these intact chunks will correspond to crossover events in a single meiosis and so (assuming no interference) will form a Poisson process of rate 1 per morgan. Chromosomes in each subsequent generation will inherit chunks of DNA from chromosomes in the previous generation in a similar manner, and it follows from standard results on the superposition of Poisson processes that, in chromosomes in the current generation, the boundaries between the chunks of DNA inherited intact since the admixture event will form a Poisson process of rate t2 per morgan.
This reasoning provides some justification for the form of the transition rates in Equation 3. However, it falls short of providing a complete justification for all assumptions of the linkage model and in particular for the assumption that the ancestral populations of origin of the chunks are independent draws from some (individual-specific) vector q. Furthermore, in real populations, biological details such as crossover interference and gene conversion events (or transformation in bacteria) will cause deviations from the assumed model. Nevertheless, the linkage model captures, in a parsimonious and computationally convenient way, the correlations in ancestry between linked loci that we would expect to see in admixed individuals from real populations.
The discussion above also suggests an interpretation of the parameter r in terms of the number of generations since admixture first occurred. Specifically, if the genetic distances dl between adjacent markers are measured in morgans, then r can be interpreted as an estimate of t2, the number of generations since the admixture event (although inevitable deviations from the model assumptions mean that it would be wise to treat any such estimate with a degree of caution). Similarly, if the genetic distances are measured in centimorgans, then r can be interpreted as an estimate of t2/100, because a breakpoint rate of t2 per morgan is a rate of t2/100 per centimorgan. In some situations the genetic distances between loci may not be known, but a proxy such as physical distance may be available. If the physical distance between loci, measured in nucleotides, is used in place of the genetic distance for dl, then r can instead be interpreted as an estimate of the product of t2 and the recombination rate (expected number of crossovers per base pair per meiosis). If there is no information on map positions, then the linkage model is not applicable.
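The unit conversions in this paragraph amount to simple arithmetic, sketched here as a hypothetical helper. The function name and the `unit` strings are illustrative assumptions; the value 0.098 per centimorgan is the posterior mean reported for the African-American data later in the article.

```python
def generations_from_r(r, unit, recomb_rate_per_bp=None):
    """Convert an estimate of r into generations since admixture (t2).

    unit is the unit in which the inter-locus distances d_l were supplied:
      "morgan"      -> r estimates t2 directly
      "centimorgan" -> r estimates t2 / 100, so t2 = 100 * r
      "bp"          -> r estimates t2 * (crossovers per bp per meiosis)
    """
    if unit == "morgan":
        return r
    if unit == "centimorgan":
        return 100.0 * r
    if unit == "bp":
        if recomb_rate_per_bp is None:
            raise ValueError("a recombination rate is needed for physical distances")
        return r / recomb_rate_per_bp
    raise ValueError(f"unknown unit: {unit}")

t_estimate = generations_from_r(0.098, "centimorgan")  # about 9.8 generations
```

As the text cautions, any such figure is only as good as the single-event model behind it.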
For many data sets, we will have little prior knowledge concerning the time since admixture (and perhaps also the recombination rate). We have therefore implemented a uniform prior for log r. The bounds of the prior should generally be set to include all biologically plausible values of r, which may range over several orders of magnitude (partly explaining the attraction of working with log r).
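A uniform prior on log r is equivalent to a density proportional to 1/r between the chosen bounds. The sketch below shows one way to sample from and evaluate such a prior; the function names and the bounds are assumptions for illustration, not the defaults of structure.

```python
import math
import random

def sample_log_uniform(r_min, r_max, rng):
    """Draw r so that log(r) is uniform on [log(r_min), log(r_max)]."""
    return math.exp(rng.uniform(math.log(r_min), math.log(r_max)))

def log_prior_density(r, r_min, r_max):
    """Log density of r under the log-uniform prior: 1 / (r * log(r_max / r_min))."""
    if not (r_min <= r <= r_max):
        return float("-inf")
    return -math.log(r) - math.log(math.log(r_max / r_min))

rng = random.Random(0)
draws = [sample_log_uniform(1e-3, 1e3, rng) for _ in range(1000)]
```

Under this prior each order of magnitude between the bounds receives equal weight, which is why it suits a parameter whose plausible values span several orders of magnitude.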
Computations for the linkage model:
Because in practice the z's for each chromosome are not observed, the Markov model for the z's used by the linkage model (Equation 2 and Equation 3) results in a hidden Markov model (HMM) for the observed genotype data. Standard HMM methods (see ![]()
Although the linkage model was developed with computational tractability in mind, it is nevertheless more computationally intensive than the admixture model. This can make the linkage model less convenient for particularly large or complicated data sets. For the African-American data set described below (626 diploid individuals and 252 loci) and K = 2 populations, a run consisting of 10,000 burn-in iterations followed by 50,000 further iterations took 3 hr using the admixture model, 7 hr for the linkage model if it was (incorrectly) assumed that the data were fully phased, and 11 hr for the linkage model assuming (correctly) that the data were unphased (calculations were performed on a DEC Alpha of 2001 vintage). Performance differentials will increase for larger K: the computation scales linearly with K for the admixture model and for the linkage model with phased data, but scales with K^2 for the linkage model with unphased or partially phased data.
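The hidden Markov structure can be made concrete with a generic forward-backward pass for a single haploid (or phased) chromosome, holding P, q, and r fixed. This is a textbook sketch under those assumptions, with hypothetical names and toy values; it is not the update scheme used inside structure, and it is written for clarity rather than speed.

```python
import math

def posterior_origins(alleles, P, q, distances, r):
    """Posterior Pr(z_l = k | all alleles on the chromosome) for each locus l.

    alleles[l] is the observed allele index at locus l, P[k][l][j] the allele
    frequencies, q the ancestry vector, distances[l] the map distance l -> l+1.
    """
    K, L = len(q), len(alleles)
    emit = [[P[k][l][alleles[l]] for k in range(K)] for l in range(L)]

    def step(k_prev, k_next, d):
        stay = math.exp(-d * r)
        return (stay if k_prev == k_next else 0.0) + (1.0 - stay) * q[k_next]

    def normalise(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass, rescaled at every locus to avoid numerical underflow.
    fwd = [normalise([q[k] * emit[0][k] for k in range(K)])]
    for l in range(1, L):
        fwd.append(normalise([
            emit[l][k] * sum(fwd[l - 1][j] * step(j, k, distances[l - 1]) for j in range(K))
            for k in range(K)]))

    # Backward pass, also rescaled.
    bwd = [[1.0] * K for _ in range(L)]
    for l in range(L - 2, -1, -1):
        bwd[l] = normalise([
            sum(step(k, j, distances[l]) * emit[l + 1][j] * bwd[l + 1][j] for j in range(K))
            for k in range(K)])

    return [normalise([fwd[l][k] * bwd[l][k] for k in range(K)]) for l in range(L)]

# Hypothetical toy data: three tightly linked loci, all carrying an allele common in population 0.
P = [[[0.9, 0.1]] * 3, [[0.1, 0.9]] * 3]
post = posterior_origins([0, 0, 0], P, q=[0.5, 0.5], distances=[0.01, 0.01], r=10.0)
```

In the toy example each locus alone would give probability 0.9 for population 0; borrowing information from its linked neighbors pushes the posterior higher. For unphased diploid data the hidden state would be a pair of origins per locus, which is consistent with the K^2 scaling noted above.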
Models of allele frequencies:
As described above, ![]()
The new model is based on ideas in ![]()
The new model for correlated allele frequencies that we describe here is based on the same implicit assumptions as the model of ![]()
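$$p_{Al\cdot} \sim \mathcal{D}(\lambda, \lambda, \ldots, \lambda) \tag{4}$$

independently for each l. Here, $\lambda$ may be fixed or estimated within the MCMC scheme. Conditional on $P_A$, the frequencies in each population k have a prior distribution

$$p_{kl\cdot} \sim \mathcal{D}\!\left(p_{Al1}\frac{1 - F_k}{F_k},\; p_{Al2}\frac{1 - F_k}{F_k},\; \ldots,\; p_{AlJ_l}\frac{1 - F_k}{F_k}\right) \tag{5}$$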
independently for each k and l. The size of Fk tells us about the effective population size of population k during the time since divergence, with large values of Fk indicating a smaller effective population size (![]()
We refer to this new model for correlated allele frequencies as the "F" model. The name is chosen to reflect the fact that there are close connections between the model and the classical measure of correlations between populations, Wright's $F_{ST}$. In one standard formulation, $F_{ST} = \mathrm{Var}(p_k) / [\bar{p}(1 - \bar{p})]$, where $p_k$ is the frequency of an allele in population k, and $\bar{p}$ is the overall frequency of that allele across all subpopulations. In our model, $F_k$ plays a role like that of $F_{ST}$ in the classical model, except that we use a generalized model with different drift rates for each population. Using a different value of F for each population, rather than a single common value for all populations, introduces a considerable amount of extra flexibility into the model at the expense of only a few additional parameters.
The prior distribution that we have implemented for F assumes that the Fk are a priori independent, with a density proportional to a gamma distribution truncated at 1 (so that Pr[0 < Fk < 1] = 1). Depending on the parameters of the distribution, the prior can be "harsh," putting most of its weight on low values of F, or "permissive," not discriminating strongly against any value of Fk. A harsh prior on low values of F corresponds to strong prior information that the allele frequencies in the different populations are similar to one another, and this seems generally to give the best performance in detecting subtle admixture in problems that are difficult for the independent frequencies model. However, if the values of Fk are being used to make evolutionary inferences, a permissive prior is more appropriate. In the Appendix, we present Metropolis-Hastings updates for PA, Pk, and Fk.
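The effect of Fk in the F model can be explored with a small simulation. The sketch assumes the population frequencies at a locus are Dirichlet distributed with parameters equal to the ancestral frequencies multiplied by (1 - Fk)/Fk; the function names and toy values are hypothetical, and the Dirichlet draw is built from standard-library gamma variates.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]

def sample_population_freqs(p_ancestral, F, rng):
    """Draw allele frequencies for one population that drifted from p_ancestral at rate F."""
    scale = (1.0 - F) / F
    return sample_dirichlet([p * scale for p in p_ancestral], rng)

def drift_variance(p_ancestral, F, n_draws, seed):
    """Mean squared deviation of the first allele's frequency from its ancestral value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        p = sample_population_freqs(p_ancestral, F, rng)
        total += (p[0] - p_ancestral[0]) ** 2
    return total / n_draws

low_drift = drift_variance([0.5, 0.5], F=0.01, n_draws=5000, seed=1)
high_drift = drift_variance([0.5, 0.5], F=0.30, n_draws=5000, seed=1)
```

Under this parameterization the variance of a population frequency around its ancestral value p is F p(1 - p), which is the sense in which Fk behaves like an FST-type quantity: larger values mean more drift.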
| MODEL RESULTS USING SIMULATED DATA SETS |
|---|
To assess the uses and limitations of the new structure features, we have performed Wright-Fisher simulations, based on the seven demographic scenarios (I–VII) shown in Fig 1. Mutation parameters differ among the simulations and are specified separately for each one. Under each scenario, the goal is first to identify the current populations and second to reconstruct elements of their history: for example, the amount of genetic drift, the degree of admixture, and the time since admixture occurred.
Differentiating between closely related populations (the F model):
One advantage of the new F model is that it can sometimes detect population subdivision that is invisible to structure when the gene frequencies of the populations are modeled without correlations. An example is shown in Fig 2, where a single random-mating population splits into two (scenario II). Eight generations after the split, the uncorrelated model is unable to distinguish between the two populations, while the F model distinguishes them quite accurately, with the exception of a few individuals that are not assigned with high probability to either population. After 16 generations of separate evolution, the uncorrelated model becomes able to distinguish reasonably accurately between the two populations and the F model provides little improvement in clustering. Further simulations (not shown) indicate that the F model is less likely to improve performance when the number of loci is small; rather, it can allow accurate clustering of individuals from extremely closely related populations when large numbers of markers are used (e.g., ![]()
Estimation of K:
Simulations presented by ![]()
For some data sets, higher estimates of K obtained using the F model may reflect deviations from random assortment that are not caused by genuine population subdivision. Table 1A shows model likelihoods estimated for a single panmictic population (scenario I). Whether or not the F model is used, the highest value of P(X|K) is given by K = 1. In Table 1B the evolutionary parameters are identical but there is a 50% selfing rate. In this case, the F model gives higher probabilities for K = 2, while the original model continues to give the highest model likelihood for K = 1. Other situations that might cause additional populations to be inferred by structure (with or without the F model) include a significant frequency of inbreeding, cryptic relatedness within the sample, or the presence of null alleles.
Inference of demographic history:
The F model can also be used to estimate the amount of genetic drift undergone by the different populations under study. In Fig 3A, estimates of F are shown for a population that trifurcated (scenario III). For a substantial time period after the trifurcation, the estimated values of F are approximately proportional to the time since the split and inversely proportional to the population sizes. When the values of F start to exceed approximately 0.2, F no longer increases linearly but the ranking of the values of F continues to reflect the relative degrees of drift that the populations have undergone.
The use of the F model to estimate drift is subject to a caveat, which is that contrary to the model assumption, drift may not have occurred independently in each population. For example, Fig 3B shows results based on scenario IV, in which a single population divides into two and one of the populations subsequently subdivides. The structure algorithm interprets the similarity of the two subpopulations as evidence that their gene frequencies are close to those of the ancestor of all three populations and estimates lower values of F for them than for their common ancestor prior to subdivision.
In principle it should be possible to generalize the model to allow for the possibility of hierarchical subdivision, but we do not attempt this here. Rather, we suggest testing for deviations from the model by estimating values of F while excluding one or more of the populations in turn. If the assumption that all of the populations evolved independently from a single common ancestral population is correct, then this should leave the F values estimated for the other populations approximately unchanged. If the F values decrease, then this suggests that one of the excluded populations diverged first, so that the remaining populations share a more recent common ancestor than shared by the whole sample considered together. If F values of one or more of the populations increase, then it may indicate that the original F values were artificially reduced by the presence of closely related subpopulations in the sample. Other diagnostics are discussed by ![]()
Inference in admixed populations (the linkage model):
Inference of demographic history becomes more difficult if admixture has occurred subsequent to population divergence (e.g., ![]()
Linkage information can help to resolve the ambiguity. Informally, admixed individuals contain chromosomal chunks that derive from one population or another. Using closely linked markers, the linkage model aims to detect the chromosomal chunks and can potentially reconstruct the ancestral populations accurately even if no pure members exist.
To explore the properties of the new method, we have performed extensive simulations. We consider individuals genotyped at L* loci on each of C chromosomes (i.e., typed at a total of CL* loci). The loci are equidistant, with a recombination rate R per generation between adjacent genotyped sites. The genetic map is assumed known. We analyzed the simulated data using the uncorrelated model for allele frequencies.
Estimation of allele frequencies:
One measure of whether structure is performing well is if it can accurately estimate the population allele frequencies in the ancestral populations. To visualize this, we have constructed neighbor-joining trees based on the posterior mean allele frequencies. When the allele frequencies are accurately estimated, the branch tips lie close to the large black dots (which represent the "correct" frequencies).
We focus on scenario VI, "unidirectional" admixture. In three out of the four cases shown (Fig 4, A–C), structure is highly consistent in its inference of gene frequencies for ancestral population 1, reflecting the continuous presence of pure individuals in the sample. The accuracy of this inference provides a baseline from which to judge the performance of structure in disentangling the gene frequencies of ancestral population 2, which ceases to have any pure descendants a few generations after admixture.
In the first simulated example (Table 2A, Fig 4A), as the number of generations after admixture (t2) increases, the admixture model becomes increasingly biased, underestimating the divergence between the populations (shown by the intermediate position of the inferred populations in the gene frequency tree) and underestimating the amount of admixture (H2 in Table 2A). In contrast, the linkage model estimates gene frequencies and the degree of admixture accurately for many generations after the admixture event.
The performance of the admixture model is improved by increasing the number of chromosomal regions studied (Table 2B, Fig 4B) but the linkage model continues to prolong the number of generations after admixture for which accurate ancestry estimates can be obtained.
In another example (Table 2C, Fig 4C), the admixture model shows the opposite bias for a number of generations, overestimating rather than underestimating admixture and the degree of divergence between ancestral populations. The linkage model again uses linkage information to resolve the ambiguity and it performs well up to eight generations after the admixture event. However, in this example, the marker density is low enough that in later generations, the linkage information is lost, and the admixture and linkage models produce almost identical (and similarly biased) results.
By contrast, Fig 4D (see also Table 2D) shows the situation where the marker density is very high. In this case, background LD is substantial, leading the linkage model to consistently overestimate the divergence between the two populations. Further, substantial admixture is estimated for population 1, which is in fact pure. The admixture model actually does rather better in the populations it infers, but a few generations after the admixture event it also produces misleading results. These results illustrate the problems that can arise when there is substantial background LD, a point to which we return in the DISCUSSION.
Estimating the time since admixture:
In addition to improving estimates of the degree of admixture, the linkage model also provides an indication of the time since admixture. For examples A–C the value of r (Table 2) provides good estimates of the number of generations since admixture, except immediately after the admixture event (when little admixture has occurred, so that there is not yet much admixture LD and the posterior for r is uninformative) and >100 generations after admixture, when the number of generations is underestimated. The time of admixture may be considerably overestimated by r if there is substantial background LD in the sample (Table 2D).
Population-of-origin assignments for chromosomal regions:
A further advantage of the linkage model is that if the marker density is high enough, it can provide accurate population-of-origin assignments for chromosomal regions, as required in applications such as admixture mapping. For example, Fig 5 shows population-of-origin assignments for the two allele copies at each locus of a single diploid individual. The Markov structure of the data is clearly evident from the structure output, in that nearby loci typically have similar assignment probabilities. When the data are phased, individual loci are often assigned with very high probability to the correct ancestral population, especially in the middle of a large chunk inherited from one population. At boundaries between chunks, the assignment probabilities typically change rapidly, giving a good indication of the position of the recombination event that brought the chunks from different populations together.
For unphased diploids, the data contain somewhat less information about the population of origin of individual allele copies, particularly in regions of the genome where the two homologous chromosomes are spanned by chunks inherited from different ancestral populations. In these regions, neighboring loci do not provide information concerning which of the two allele copies at a particular locus comes from one population and which comes from the other. For many problems (such as in admixture mapping) we are mainly interested in inferring the number of allele copies from each population, and this information can be extracted from the data, given sufficiently dense markers (as in Fig 5).
Coverage properties:
A final advantage of the linkage model is that it gives more accurate estimates of the statistical uncertainty of admixture proportions. This property is illustrated in Fig 6, which shows 90% credibility regions for q for a sample of individuals from two populations. The two populations partially admixed with each other in an admixture event (scenario VII) 32 generations before the sample was taken. After 32 generations of random mating within each postadmixture population, the ancestry coefficients of each individual are almost identical (differing by <0.001) and are shown by the red horizontal lines in the figure. Ancestry estimates were made by both the admixture and linkage models, for markers at a variety of genetic distances. Tightly linked markers give nonindependent information and are therefore less informative about the value of q for each individual than are the same number of unlinked markers, leading to higher variation in estimates between individuals. The admixture model does not take these correlations into account. Consequently, the sizes of the estimated credibility regions are approximately independent of the actual degree of linkage and are much too narrow for tightly linked markers. Under the linkage model, the credibility regions for q become wider as linkage between markers increases and continue to reflect the true degree of statistical uncertainty even for tightly linked markers.
| APPLICATIONS TO DATA |
|---|
Recombination between distinct populations of Helicobacter pylori:
The bacterium Helicobacter pylori colonizes the human stomach lining. When multiple strains infect the same stomach, they recombine rapidly through the import of fragments of DNA that are typically a few hundred base pairs in length (![]()
Fig 7 shows results for a typical isolate from South Africa. South Africa contained isolates from hpAfrica1, hpAfrica2, and hpEurope populations, reflecting the ethnic diversity of the region. The particular isolate we consider here was assigned to the Africa1 (blue) population by the no-admixture model. The top plot (Fig 7A) shows the posterior assignment probabilities for each individual nucleotide based purely on the estimated population allele frequencies (i.e., not using information from q). The plot shows that most sites provide little information about ancestry, with assignment probabilities to all four populations being approximately 0.25. The remaining nucleotides are mostly assigned with high probability to Africa1 or, less frequently, Africa2 (red). The Africa2 nucleotides appear to come in runs, suggesting import of specific DNA fragments into a bacterium from the Africa1 population. This conclusion was confirmed by further exploratory analysis (Fig 7B). For each population, the sum of the log of the assignment probabilities was computed within a 100-nucleotide moving window. For most of the sequence, the value of the sum was positive for the Africa1 population (indicating higher probabilities than those under random assignment) and negative for the other three populations. However, in three stretches the sum for the Africa2 population gives positive values, suggesting DNA import into those regions.
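The moving-window summary used in the exploratory analysis can be sketched as follows. One assumption is made explicit here: each log probability is taken relative to the random-assignment value of 0.25, which is one way to obtain sums that are positive when a population is favored and negative otherwise. The function name and that normalization are illustrative, not taken from the article.

```python
import math

def windowed_log_scores(probs, window=100, baseline=0.25):
    """Sliding-window sums of log(p / baseline) along a sequence.

    probs[i] is the assignment probability of site i to one population (must be > 0).
    Returns one score per window start; positive scores mean the window favors
    that population more than random assignment would.
    """
    logs = [math.log(p / baseline) for p in probs]
    if len(logs) < window:
        return []
    current = sum(logs[:window])
    scores = [current]
    for i in range(window, len(logs)):
        current += logs[i] - logs[i - window]  # slide the window by one site
        scores.append(current)
    return scores

# Hypothetical toy sequence: three sites favoring the population, then one against it.
scores = windowed_log_scores([0.5, 0.5, 0.5, 0.01], window=3)
```

Runs of consecutive positive scores for a population other than the one the isolate is assigned to are the kind of pattern that would flag a candidate imported fragment.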
The linkage model provides a formal method to make population-of-origin assignments that take the linkage relationships into account (Fig 7C). Nucleotides in the three regions identified by the exploratory analysis were assigned to the Africa2 population with probabilities close to 1.0, providing statistical support for the conclusion that there have been (at least) three imports of Africa2 DNA into these fragments. This example shows that given highly differentiated populations and enough informative sites, it can be possible in practice to make accurate population-of-origin assignments for individual loci. Further, because of the large amount of information provided by linkage, it is also possible to reconstruct ancestral populations in the absence of pure individuals, using the linkage model. See ![]()
Admixture LD in African-Americans:
We used the new linkage model to study the extent of admixture linkage disequilibrium in a Chicago-based population of African-Americans. Previous work on African-Americans has shown significant levels of European admixture in the range of approximately 5–25%, with substantial variation across studies and across study populations (summarized by ![]()
The data set that we used consists of 247 microsatellites genotyped in samples of unrelated individuals including 210 African-Americans (from Maywood, Illinois), 158 European-Americans (from Michigan), and 308 Nigerians (Yoruba; ![]()
When run with K = 2, both the admixture and linkage models gave very similar ancestry estimates and both suggested that Nigerians and American whites were almost pure representatives of the respective preadmixture populations, with average estimated admixture rates of 1.4% for both populations. The African-Americans were substantially admixed, having a mean of 17.8% European ancestry (the range of point estimates for individuals' values of q was 2-59%, using the admixture model). Similar results were obtained if we used the USEPOPINFO = 1 option to specify the population of origin of the American whites and Nigerians. Our estimate of 17.8% European ancestry is very similar to the estimate of 18.8% obtained by ![]()
The posterior distribution for the parameter r under the linkage model is shown in Fig 8A. The posterior mean of r was 0.098 chromosome chunk breakpoints per centimorgan, with a 90% credible region of 0.07-0.13. Under the simplifying assumption that the African-American population was created by a single hypothetical admixture event (scenario V), this event is estimated to have taken place 7-13 generations ago. This is consistent with what one might expect, on the basis of the history of African-Americans, who were mostly exported from Africa during the late eighteenth century (![]()
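The step from r to a time estimate is simple arithmetic, sketched here under the single-event assumption. The sketch further assumes that breakpoints accumulate at one per Morgan per generation, so that r in breakpoints per centimorgan times 100 gives generations; the function name is hypothetical.

```python
CM_PER_MORGAN = 100.0

def generations_since_admixture(r_per_cm):
    """Convert r (breakpoints per centimorgan) to generations.

    Assumption: a single admixture event, with one expected breakpoint
    per Morgan per generation since that event.
    """
    return r_per_cm * CM_PER_MORGAN

# Posterior mean and the ends of the 90% credible region quoted in the text.
for label, r in [("lower", 0.07), ("mean", 0.098), ("upper", 0.13)]:
    print(label, round(generations_since_admixture(r), 1))
```

Applied to the ends of the credible region, this reproduces the range of 7 to 13 generations given in the text.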
We repeated our analysis using information provided by map distances from the recently published deCODE map (![]()
The posterior distribution of r (Fig 8A) clearly excludes large values of r, indicating that we are detecting a significant signal of admixture LD. Recall that as r gets large the linkage model becomes equivalent to the admixture model, so the fact that the posterior for r excludes large values shows that the linkage model provides a better fit to the data than does the admixture model in this case. For comparison, Fig 8B shows the posterior for r for the same data and map distances, but with the order of the loci randomized. The posterior for r has considerable support all the way up to the maximum value of r permitted by the prior and would clearly have extended to still larger values had the prior allowed this. Three further randomizations produced similar results, supporting the effectiveness of the posterior for r in summarizing the extent of admixture LD.
Although we detected a definite signal of admixture LD in our sample, most of the LD present in the African-Americans is actually due to variation in q: i.e., "mixture LD" in the terminology we introduced earlier. To differentiate between mixture and admixture LD, we examined the correlations of ancestry estimates (from the admixture model) between adjacent loci. The first measure that we used (Fig 10A) measures the correlation between the estimated probability of African ancestry (averaged over the two allele copies at each locus) for pairs of neighboring loci. The correlations were positive on average (mean 0.041 with standard error 0.009), albeit with a great deal of variation between different locus pairs. These correlations reflect the total LD in the sample. The second measure (Fig 10B) shows the correlations that remain when variation in q among African-Americans is accounted for. For each individual, at each locus, we computed a "residual" by subtracting the individual's estimated q(i) from the estimated probability of African ancestry (averaged over the two allele copies at each locus). The figure shows the correlations of these residuals. The correlations are slightly but not significantly negative on average (mean -0.001 with standard error 0.007), implying that most of the LD in the sample can be accounted for by variation in q (i.e., mixture LD). Further, a regression of the correlations with the genetic distance between the loci does not have a significant slope under either measure, presumably because the trend has been obscured by the high degree of variation in correlation values at each genetic distance. The fact that the linkage model obtains plausible estimates of r and rejects large values of r indicates that the linkage model extracts much more information from the data than the pairwise comparisons do.
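The two correlation measures can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the analysis code: `anc` is a hypothetical matrix of estimated probabilities of African ancestry (individuals by loci, already averaged over the two allele copies), `q` holds each individual's estimated African ancestry proportion, and the loci are assumed to be in map order so that adjacent columns are neighbors.

```python
import numpy as np

def adjacent_locus_correlations(anc, q=None):
    """Pearson correlations of ancestry estimates for neighboring loci.

    anc: array (n_individuals, n_loci) of estimated ancestry probabilities.
    q:   optional length-n_individuals vector; when given, each individual's
         q is subtracted first, so the correlations are between residuals.
    Returns an array of n_loci - 1 correlations.
    """
    anc = np.asarray(anc, dtype=float)
    if q is not None:
        anc = anc - np.asarray(q, dtype=float)[:, None]
    out = np.empty(anc.shape[1] - 1)
    for l in range(anc.shape[1] - 1):
        out[l] = np.corrcoef(anc[:, l], anc[:, l + 1])[0, 1]
    return out
```

Calling the function without `q` corresponds to the first measure (total LD); passing `q` corresponds to the second, in which correlation caused only by variation in q among individuals is removed.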
Our results also highlight an important feature of human genetic data, which is that there is a great deal of noise in raw LD estimates for individual locus pairs, even when the admixture involves populations that, by human standards, are relatively highly differentiated. Thus, our description of admixture LD in this population would be enhanced by using a denser set of markers, and for applications such as admixture mapping where one needs to estimate the population of origin for the sampled chromosomes, a denser marker set would be critical. In admixed populations where loci have been chosen specifically to have large frequency differences between the putative parental populations (![]()
Genetic drift in Drosophila melanogaster:
To illustrate some of the possible ways of using the F model in historical inference, we have reanalyzed the data set of ![]()
We started the analysis without making any assumption about geographical clustering. Using the no-admixture and F models, K = 3 gave the highest model likelihood. The three inferred populations correlated well with the land masses Israel, Tasmania, and Australia, although many flies were not clearly allocated to one population and 50 Australian flies, 2 Tasmanian flies, and 7 Israeli flies were assigned to their home population with <50% probability. These inconsistencies are due to limited statistical power rather than to identifiable admixture events because an analysis under the migration model with the USEPOPINFO option (![]()
Where did the Tasmanian flies come from? We estimated F values when analyzing Tasmanian and Israeli flies together (Fig 11C) and Tasmanian and Australian flies together (Fig 11D). The value of F inferred for the Australian population was close to zero, much lower than that for the Israeli population, while a high value of F was estimated for the Tasmanian population in both cases. This analysis suggests that Tasmanian and Australian flies share a more recent common ancestor with each other than with the Israeli flies. Further, the amount of drift that the Australian flies have undergone since splitting with the Tasmanian flies is very low, implying that Tasmania was colonized from Australia and underwent a bottleneck in the process. A possible technical objection to our analysis is that flies were sampled in several locations in Australia and that this might somehow account for the particularly low estimated value of F. We tested for this possibility using the five sampling locations in Australia that had >25 genotypes. We ran structure separately for each one, using all of the Tasmanian genotypes in every case. The analysis gave a consistently high value of F for the Tasmanian population (0.100-0.125) and a low value for the Australian population (0.004-0.023), lower than that for the Israeli population in the equivalent analysis (0.039). Our inference therefore appears to be robust to the exact combination of populations chosen for analysis.
The approach we have taken is similar to that used by ![]()
| DISCUSSION |
|---|
We have presented two major modifications of the structure approach, namely the linkage model and the F model. Both can significantly improve the technical quality of the inference, giving better clustering, more realistic confidence limits, and more accurate admixture estimates. For some data sets, these improvements are critical, allowing the detection of population substructure or admixture that was invisible using the earlier algorithm. The new models are also an important step in making structure into a tool for performing detailed historical inference.
The linkage model allows structure to analyze data sets containing markers with admixture LD between them, significantly expanding the range of data sets for which it is an appropriate tool. For data sets with weak admixture LD (e.g., the African-American example), the linkage model gives similar clustering and ancestry estimates to the admixture model, but also estimates a chunk size parameter r that provides information on the average rate of decay of admixture LD in the sample. The rate of decay reflects the amount of time that has elapsed since populations admixed. For very informative data sets (e.g., the H. pylori data), the linkage model also allows accurate assignment of chromosomal chunks to ancestral populations. One consequence is that ancestral populations can be reconstructed even if admixture has been so extensive that no pure individuals remain in the sample. In highly informative data sets, it might also be useful to relax the assumption of a single value for r. Possible ways forward include using a different r for each individual or geographical location and/or allowing the expected chunk size to depend on the population of origin of the chunk. In this way, it might be possible to extract additional information about the timing of admixture between different subpopulations.
Although the linkage model takes into account the correlations between markers that occur due to admixture, structure will always need data from several unlinked or weakly linked genetic regions from each individual to make meaningful inferences. We strongly recommend against using the model for a data set consisting of human Y chromosome or mitochondrial haplotypes, for example. As well as having markers from several genomic regions, it is also important that none of the markers be too strongly linked. Background linkage disequilibrium arises through genetic drift within populations, and in some scenarios this can lead structure to produce misleading results.
Background LD causes problems when particular allele combinations are overrepresented in two or more of the populations before admixture takes place. Such LD can arise through genetic drift before the populations separate (i.e., during the burn-in phase in scenarios V, VI, and VII in Fig 1). If the markers are tightly linked, then this LD can persist throughout the period of divergence and subsequent admixture. Suppose that for this reason, allele combinations ab and AB are overrepresented in both of the populations, compared to Ab and aB. The linkage model tries to attribute this LD to admixture and hence tends to overestimate the frequency of alleles a and b in one population and the frequency of alleles A and B in the other. In this manner, structure can both overestimate the divergence between ancestral populations and infer spurious admixture. Because background LD decays over short distances, this also leads to overestimation of the time since admixture.
In designing a data set to be used by the linkage model, it is therefore desirable to ensure that the markers are sufficiently closely linked to allow for admixture LD, yet sufficiently far apart that there is not substantial background LD between them. Historical information about the likely time of admixture, combined with knowledge of intermarker recombination rates, can be used to help select an appropriate marker spacing.
An alternative, post hoc, approach is to rely on inspection of structure output. One observation that we have made from simulations (based on a variety of demographic scenarios; data not shown) is that when background LD is a problem, structure infers that all individuals from both populations are admixed. Genuine admixture, by contrast, is often asymmetrical, affecting some populations much more than others. For example, if a predefined subpopulation in the sample has substantial ancestry from one structure population without corresponding ancestry from a second, then this is an indication that background LD is not a serious bias. Further when K > 2, background LD causes admixture to be inferred preferentially between the most closely related structure populations; thus if admixture is inferred between distantly related populations, this result is unlikely to result from background LD.
In future, it should be possible to specifically model the background LD present in the data. Full coalescent approaches to this type of problem are computationally daunting, and so models that simplify the structure of the data may be suitable. One approach would be to use a Markov formulation of haplotype structure, perhaps like that used by ![]()
This article also introduced the new F model for correlated allele frequencies. This update improves clustering for some data sets where populations are very weakly differentiated and also allows inference of the pattern of drift. We used the model to show that Tasmania was almost certainly colonized by Australian D. melanogaster and that a significant bottleneck occurred in the process. A weakness of the current model is that it assumes that all populations have evolved by independent drift from a single ancestral source population, which may not always be a good approximation. This framework could potentially be generalized to incorporate, and differentiate between, more complicated scenarios, with populations splitting from each other sequentially.
The new implementation of structure (version 2) also contains other new options that can be useful for some data sets. In the earlier version, the gene frequencies P and admixture proportions Q were drawn from symmetric Dirichlet distributions with parameters λ and α, respectively. λ was normally fixed at 1.0, which corresponds to a uniform prior on allele frequencies. For some types of markers (e.g., sequence polymorphisms and SNPs, depending on the process of ascertainment), the frequency spectrum may be skewed toward rare alleles. In this situation, the data are better modeled by smaller values of λ (smaller values of λ place more prior weight on configurations where all but one of the alleles at a locus are rare). Therefore, we have implemented a Metropolis-Hastings update for λ. We have found that for data where most alleles are rare, updating λ can lead to more accurate estimates of P. We now also allow different values of α for each population. This generalized prior for Q allows for the fact that in practice not all populations are equally represented in the sample, which may lead to more accurate ancestry estimates, particularly in situations of highly asymmetric admixture. The no-admixture model assumes that the prior probability that each individual is drawn from any one of the K populations is 1/K. It would also be possible to generalize this prior to allow for differences in frequency between populations, but this option has not yet been implemented.
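The effect of the Dirichlet parameter on the allele frequency prior can be checked numerically. The sketch below compares the log density of a symmetric Dirichlet at a skewed frequency vector (one common allele, three rare ones) and at an even one. The example vectors and the function name are illustrative choices, not values taken from any data set.

```python
from math import lgamma, log

def symmetric_dirichlet_logpdf(p, lam):
    """Log density of a symmetric Dirichlet(lam, ..., lam) at frequencies p."""
    j = len(p)
    return (lgamma(j * lam) - j * lgamma(lam)
            + (lam - 1.0) * sum(log(x) for x in p))

skewed = [0.94, 0.02, 0.02, 0.02]   # all but one allele rare
even = [0.25, 0.25, 0.25, 0.25]

for lam in (1.0, 0.3):
    diff = (symmetric_dirichlet_logpdf(skewed, lam)
            - symmetric_dirichlet_logpdf(even, lam))
    print(lam, diff)
```

With the parameter equal to 1.0 the two configurations have the same prior density (the uniform case), whereas with a value below 1.0 the skewed configuration is favored.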
The large number of model options that have now been implemented might lead to concerns about overparameterization. However, provided that there are a reasonable number of loci (>10), the number of model parameters added by each model option is small in comparison with both the number of elements in the data set and the total number of parameters estimated by structure (which include all the elements of P and Q). Indeed, although the F model may appear to increase the number of parameters to be estimated by the model, in some sense it decreases the effective number of parameters, by introducing correlations among the allele frequencies in the different populations.
Nevertheless, there may not be enough information in a particular data set to estimate all of the parameters, in which case a simpler model would be more appropriate. The linkage model adds only a single parameter, r, but for some data sets little or no admixture LD may be present (indicated by posterior support for large values of r), in which case the admixture model should provide essentially the same answers (and be faster to run). It is also possible that certain parameter combinations could lead to problems. For example, our experience with simulated data suggests that it is sometimes difficult to estimate both λ and F jointly (perhaps because estimated values for λ are sensitive to the estimates for the ancestral allele frequencies PA, which in turn are sensitive to estimates of F). In such cases we have found it useful to first estimate λ using the old independent model for population allele frequencies and then to fix λ at its estimated value while using the F model for correlated allele frequencies.
In many cases, symptoms that might be attributed to overparameterization in fact provide useful indicators of genuine biological uncertainty. For example, the model used for gene frequencies sometimes has a large effect on the estimates of admixture proportions that are obtained. In Fig 2 the estimates of admixture between the populations (in fact, 0) are much more accurate under the F model than under the uncorrelated model. However, neither model makes accurate estimates for the proportion of European ancestry of the African-Americans if Nigerians are excluded from the sample. The uncorrelated model overestimates admixture (28%) while the F model underestimates it (3%). For both of these data sets, the large difference in admixture estimates between models provides an indication that there is limited information on the true degree of admixture. In contrast, when Nigerians are added to the latter data set, both models give very similar estimates (18 and 17%) for the European ancestry of African-Americans, reflecting the additional information about African gene frequencies provided by the unadmixed individuals.
Despite the many complexities in the demography of species, their genetic structure can often be approximated by a division into discrete (or semidiscrete) populations. This approximation greatly simplifies statistical inference, allowing assignment of individuals to populations, with modest computational requirements even for large data sets. Here, we have extended the approach to gain greater information on the history of the populations themselves. In effect, we are approximating a complex history by a narrative of population fissions and fusions. The resulting algorithms remain computationally tractable and can potentially help resolve the major events in the history of the species under study.
| ACKNOWLEDGMENTS |
|---|
We thank Aravinda Chakravarti, Richard Cooper, Nicholas Schork, and Alan Weder for kindly allowing us to use the African-American data set (and Richard Cooper in particular for help in interpreting the data); Christian Schlötterer for the D. melanogaster data and comments on the manuscript; Sebastian Suerbaum and Mark Achtman for permission to use the H. pylori data; Peter Donnelly and David Balding for helpful discussions of FST; Martin Vingron for use of a computer cluster at the Max Planck Institute for Molecular Genetics; Giovanni Montana for comments; and William Wen for help with data manipulation. Two anonymous reviewers read the manuscript carefully and made many useful suggestions. The work was supported by grants from the National Science Foundation (BIR-9807747 to M.S.), Burroughs Wellcome Fund (J.K.P.), and the National Heart, Lung and Blood Institute (54512 and 54485 for collection of the African-American data).
Manuscript received October 8, 2002; Accepted for publication March 28, 2003.
| APPENDIX |
|---|
Here we provide the computational details for the new MCMC updates. Recall that our goal is to sample from the joint posterior distribution of
![]() |
(A1) |
MCMC methods provide an approach for doing this. We start by making arbitrary initial choices for each parameter and then propose updates that change a subset of these, conditional on the other parameters and the data. One full iteration of our Markov chain proposes changes to each parameter. This algorithm results in a Markov chain whose stationary distribution is the joint posterior distribution of interest. For background on these methods in the present context see ![]()
. Here, we describe updates for r, F, and λ, as well as modified updates for Q and Z under the linkage model, and for P under the F model.
MCMC update for the F model:
The update for pkl· is similar to that for the original (independent frequencies) model (Equation A6, ![]()
j.
The update for the "ancestral" allele frequencies PA is more complicated. The algorithm that we have implemented is as follows. Start with initial guesses for PA (e.g., based on the overall sample frequencies). Then, at each step of the Markov chain, propose the following update once for each locus. Select at random two alleles m and n, such that 1 ≤ m < n ≤ Jl, where Jl is the total number of alleles observed at locus l. Simulate a value h from a normal distribution with mean 0 and some small (fixed) standard deviation (we used 0.05) and propose changing the allele frequencies pAlm and pAln to pAlm + h and pAln - h, respectively. Reject the proposal if either of the proposed allele frequencies is outside (0, 1). Otherwise, accept the updated allele frequencies according to the appropriate Metropolis-Hastings probability, that is, the minimum of 1 and
![]() |
(A2) |
where Pr(PA) is the prior probability of PA, assumed to be symmetric Dirichlet with parameter λ. Expression (A2) can be rewritten as
![]() |
(A3) |
where fk = (1 - Fk)/Fk.
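The update for PA at a single locus can be sketched as follows. This is a hedged illustration, not the structure source code. It assumes that the allele frequencies in population k are Dirichlet distributed with parameters fk times the ancestral frequencies and that PA has a symmetric Dirichlet prior; the function names, the perturbation name `h`, and the argument layout are hypothetical.

```python
import math
import random

def log_dirichlet_pdf(x, alpha):
    """Log density of a Dirichlet(alpha) distribution at x."""
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))

def log_target(pa, p_pops, F, lam):
    """log Pr(PA) + sum over k of log Pr(p_k | PA, F_k), for one locus."""
    total = log_dirichlet_pdf(pa, [lam] * len(pa))
    for p_k, F_k in zip(p_pops, F):
        f_k = (1.0 - F_k) / F_k
        total += log_dirichlet_pdf(p_k, [f_k * a for a in pa])
    return total

def update_pa_locus(pa, p_pops, F, lam=1.0, sd=0.05, rng=random):
    """One Metropolis-Hastings proposal for the ancestral frequencies."""
    m, n = sorted(rng.sample(range(len(pa)), 2))   # two distinct alleles
    h = rng.gauss(0.0, sd)                         # symmetric perturbation
    prop = list(pa)
    prop[m] += h
    prop[n] -= h                                   # keeps the sum equal to 1
    if not (0.0 < prop[m] < 1.0 and 0.0 < prop[n] < 1.0):
        return pa                                  # reject out-of-range values
    log_ratio = log_target(prop, p_pops, F, lam) - log_target(pa, p_pops, F, lam)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return prop
    return pa
```

Because the proposal moves mass between exactly two alleles, only the terms involving those two alleles change, which is what allows the acceptance ratio to be written in a simplified form.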
Finally, to update each value of Fk, we start from an initial value F(0)k (the prior mean, say) and then update it as follows. Conditional on the current value, Fk, we propose a new value, F'k, from a normal distribution with mean Fk and some fixed standard deviation (0.05, say). If the new value F'k is outside the interval (0, 1), we reject the proposal. Otherwise, we accept it with the Metropolis-Hastings probability, namely, the minimum of 1 and
![]() |
(A4) |
where Pr(Fk) is the prior probability of Fk, assumed to be proportional to a gamma distribution with mean µ and variance σ² (see main text). Expression (A4) can be rewritten as
![]() |
(A5) |
where f' and f are (1 - F'k)/F'k and (1 - Fk)/Fk, respectively. If we assume a single value F for all populations, the update is much the same, except that the posterior ratio is given by a product over both k and l.
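A minimal sketch of the update for a single Fk follows, under the same Dirichlet assumption for the population allele frequencies (parameters (1 - Fk)/Fk times the ancestral frequencies). The default values for the prior mean and standard deviation are placeholders, and all function names are hypothetical.

```python
import math
import random

def log_dirichlet_pdf(x, alpha):
    """Log density of a Dirichlet(alpha) distribution at x."""
    return (math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))

def log_gamma_prior(F_k, mean, sd):
    """Gamma log density, up to a constant, with the given mean and sd."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return (shape - 1.0) * math.log(F_k) - rate * F_k

def log_posterior_F(F_k, p_k_loci, pa_loci, mean, sd):
    """Log prior plus log Pr(p_kl | PA, F_k) summed over loci."""
    f_k = (1.0 - F_k) / F_k
    total = log_gamma_prior(F_k, mean, sd)
    for p_kl, pa_l in zip(p_k_loci, pa_loci):
        total += log_dirichlet_pdf(p_kl, [f_k * a for a in pa_l])
    return total

def update_F(F_k, p_k_loci, pa_loci, mean=0.01, sd=0.05, step=0.05, rng=random):
    """One random-walk Metropolis-Hastings proposal for F_k."""
    prop = rng.gauss(F_k, step)
    if not 0.0 < prop < 1.0:
        return F_k                      # proposals outside (0, 1) are rejected
    log_ratio = (log_posterior_F(prop, p_k_loci, pa_loci, mean, sd)
                 - log_posterior_F(F_k, p_k_loci, pa_loci, mean, sd))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return prop
    return F_k
```

For a single F shared by all populations, the same sketch would apply with the sum in `log_posterior_F` extended over populations as well as loci.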
Full description of the linkage model, including unphased data:
The following describes an MCMC scheme for simulating from a Markov chain with stationary distribution Pr(P, Z, r, Q|X, K).
1. Sample from Pr(Z|P, r, Q, X).
2. Sample from Pr(P|Z, r, Q, X) = Pr(P|Z, X).
3. Update r by Metropolis-Hastings update.
4. Update Q by Metropolis-Hastings update.
Step 2 is exactly the same as in ![]()
![]() |
(A6) |
for k = 1, ... , K, and
![]() |
(A7) |
This allows us to compute βlk for k = 1, ... , K, and l = 1, ... , L (the "forward" part of the algorithm). This computation can be made linear in K rather than quadratic, following a simple rearrangement. After substituting Equation 3 for Pr(zl+1 = k'|zl = k), we get
![]() |
(A8) |
Note that the sum in brackets is independent of k' and therefore needs to be calculated only once for each l.
Having done this, the "backward" part allows us to simulate Z, starting from zL, using Gibbs sampling based on
![]() |
(A9) |
and
![]() |
(A10) |
where Pr(zl+1|zl = k, r, Q) is given by (3).
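The haploid forward and backward passes can be sketched as follows. This is an illustrative reconstruction, not the structure implementation. It assumes that the transition probability of Equation 3 takes the form exp(-d r) for staying in the same chunk plus (1 - exp(-d r)) q_k' for starting a new chunk drawn from population k'; `emit`, `d`, and the function name are hypothetical, and the forward quantities are rescaled at each locus for numerical stability.

```python
import numpy as np

def sample_z_haploid(emit, q, d, r, rng):
    """Forward recursion followed by backward sampling of z for one haploid.

    emit: (L, K) array, emit[l, k] = Pr(x_l | z_l = k, P).
    q:    length-K ancestry proportions of the individual.
    d:    length L-1 map distances between adjacent loci.
    r:    breakpoint rate per unit of map distance.
    Assumed transition: Pr(z_{l+1}=k' | z_l=k) =
        exp(-d r) * [k' == k] + (1 - exp(-d r)) * q[k'].
    """
    emit = np.asarray(emit, dtype=float)
    q = np.asarray(q, dtype=float)
    L, K = emit.shape
    stay = np.exp(-np.asarray(d, dtype=float) * r)
    beta = np.empty((L, K))
    beta[0] = q * emit[0]
    beta[0] /= beta[0].sum()
    for l in range(1, L):
        total = beta[l - 1].sum()   # the bracketed sum, computed once per locus
        beta[l] = emit[l] * (stay[l - 1] * beta[l - 1]
                             + (1.0 - stay[l - 1]) * q * total)
        beta[l] /= beta[l].sum()
    z = np.empty(L, dtype=int)
    z[-1] = rng.choice(K, p=beta[-1])
    for l in range(L - 2, -1, -1):
        # Probability of moving from each k at locus l to the sampled z[l+1].
        trans = (1.0 - stay[l]) * q[z[l + 1]] + stay[l] * (np.arange(K) == z[l + 1])
        w = beta[l] * trans
        z[l] = rng.choice(K, p=w / w.sum())
    return z, beta
```

Each forward step costs O(K) because the sum over the previous locus does not depend on the destination population, which is the rearrangement described above.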
For unphased or partially phased diploids, we have implemented two algorithms. The two algorithms can incorporate different types of phase information and produce equivalent results for unphased data. The first algorithm is based on a Markov formulation for phase information. For each pair of adjacent linked loci we have an estimate of the probability bl that the first alleles of loci l and l + 1 are on the same chromosome. Information that adjacent allele copies were inherited together from the same (unspecified) parent is often available from sib-pair pedigree data. For unphased data the order of the allele copies is random so that bl = 0.5 for all loci. We define
![]() |
(A11) |
where superscript (1) refers to the first allele copy and (2) refers to the second allele copy at each locus. Denote the quantity in the right-hand side of Equation 3 by Pk'k. In the forward part of the algorithm we calculate
![]() |
(A12) |
for k1 = 1, ... , K; k2 = 1, ... , K; and
![]() |
(A13) |
In the backward part we jointly simulate z1 and z2 using
![]() |
(A14) |
and
![]() |
(A15) |
The model for maternal and paternal phase information is similar; here
![]() |
(A16) |
where superscripts m and p refer to the maternal and paternal allele copies at each locus,
![]() |
(A17) |
and
![]() |
(A18) |
where Ml is the probability that allele 1 at locus l is inherited maternally, assumed known. In the backward step, we simulate the population of origin of maternal and paternal chromosomes.
![]() |
(A19) |
and
![]() |
(A20) |
We then simulate the population of origin for the two allele copies at each locus, conditional on maternal and paternal assignments, using
![]() |
(A21) |
For both of these diploid models, a rearrangement analogous to that described for the haploid case leads to the computation in the forward part being proportional to K² rather than to K⁴.
For each of these models, the updates for r and Q use the random-walk Metropolis algorithm. In haploids, for example (with the individual superscript i reinstated),
![]() |
(A22) |
r is updated using a Metropolis-Hastings step, by comparing this sum for proposed and current values of r. We have implemented a uniform prior for log10(r); the proposed value of log10(r) differs from the current value according to a normal distribution with mean of 0.05.
Last, we perform step 4, the Metropolis-Hastings update for Q, as follows. For each individual i, we simulate a proposal value q(i)* from the prior distribution, which is D(α1, α2, ... , αK). The proposed value q(i)* is accepted with probability equal to the minimum of 1 and the ratio of the likelihoods of the proposed and current values.
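Because the proposal is drawn from the prior, the prior and proposal terms cancel in the Metropolis-Hastings ratio and only the likelihood ratio remains. A minimal sketch follows, assuming a user-supplied function `log_lik(q)` that returns the log likelihood of one individual's data given q (for example, from a forward recursion); `update_q` and `log_lik` are hypothetical names.

```python
import numpy as np

def update_q(q_cur, alpha, log_lik, rng):
    """Independence-sampler update for one individual's ancestry vector.

    q_cur:   current ancestry proportions (length K).
    alpha:   Dirichlet prior parameters (length K).
    log_lik: callable returning the log likelihood for a given q (assumed).
    """
    q_prop = rng.dirichlet(alpha)                 # proposal from the prior
    log_ratio = log_lik(q_prop) - log_lik(q_cur)  # prior terms cancel
    if log_ratio >= 0.0 or rng.random() < np.exp(log_ratio):
        return q_prop
    return q_cur
```

Proposing from the prior avoids tuning a step size, at the cost of low acceptance rates when the likelihood is much more concentrated than the prior.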
MCMC updates for λ and α:
We begin by placing independent uniform priors in the range (0, 10) on λk for each population k. We start the Markov chain at arbitrary initial values and then update each λk as follows. Conditional on the current value of λk, we propose a new value λ'k from a normal distribution with mean λk and some standard deviation (0.3, say). If λ'k is outside (0, 10), we reject the new proposal; otherwise, we accept it according to the standard Metropolis-Hastings ratio. That is, we accept it with a probability that is the minimum of 1 and
![]() |
(A23) |
The latter expression is the ratio of the density function for the Dirichlet distribution given the proposed and current values of λk, respectively, multiplied across all loci. If we assume that the same value of λ holds for all K populations, then the analogous update ratio is
![]() |
(A24) |
The full update used for α is closely analogous to that for λ, except that the products are over individuals, not loci, and over ancestry coefficients, q, not allele frequencies, p.
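The update for the Dirichlet parameter of the allele frequencies in one population can be sketched as follows. This is a hedged illustration: with a uniform prior on (0, 10) and a symmetric normal proposal, the acceptance ratio reduces to the ratio of symmetric Dirichlet densities multiplied across loci. The function names and the list-of-lists layout for the allele frequencies are hypothetical.

```python
import math
import random

def log_sym_dirichlet(p, lam):
    """Log density of a symmetric Dirichlet(lam, ..., lam) at frequencies p."""
    j = len(p)
    return (math.lgamma(j * lam) - j * math.lgamma(lam)
            + (lam - 1.0) * sum(math.log(x) for x in p))

def update_lambda(lam, freqs_by_locus, sd=0.3, upper=10.0, rng=random):
    """One random-walk Metropolis-Hastings proposal for the parameter.

    freqs_by_locus: list of allele frequency vectors, one per locus,
    for the population whose parameter is being updated.
    """
    prop = rng.gauss(lam, sd)
    if not 0.0 < prop < upper:
        return lam                     # outside the support of the prior
    log_ratio = sum(log_sym_dirichlet(p, prop) - log_sym_dirichlet(p, lam)
                    for p in freqs_by_locus)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return prop
    return lam
```

The same sketch would serve for the admixture parameter by passing the individuals' ancestry vectors in place of the per-locus allele frequencies.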
| LITERATURE CITED |
|---|
AGIS, M. and C. SCHLÖTTERER, 2001 Microsatellite variation in natural Drosophila melanogaster populations from New South Wales (Australia) and Tasmania. Mol. Ecol. 10:1197-1205.[Medline]
ANDERSON, E. C., 2001 Monte Carlo methods for inference in population genetic models. Ph.D. Thesis, University of Washington, Seattle.
ANDERSON, E. C. and E. A. THOMPSON, 2002 A model-based method for identifying species hybrids using multilocus genetic data. Genetics 160:1217-1229.
BALDING, D. J. and R. A. NICHOLS, 1997 Significant genetic correlations among Caucasians at forensic DNA loci. Heredity 78:583-589.
BARTON, N. H. and G. M. HEWITT, 1989 Adaptation, speciation and hybrid zones. Nature 341:497-503.[Medline]
BEAUMONT, M., D. GOTTELLI, E. M. BARRATT, A. C. KITCHENER, and M. J. DANIELS et al., 2001 Genetic diversity and introgression in the Scottish wildcat. Mol. Ecol. 10:319-336.[Medline]
BERTORELLE, G. and L. EXCOFFIER, 1998 Inferring admixture proportions from molecular data. Mol. Biol. Evol. 15:1298-1311.[Abstract]
BROMAN, K. W., J. C. MURRAY, V. C. SHEFFIELD, R. L. WHITE, and J. L. WEBER, 1998 Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63:861-869.[Medline]
CHAKRABORTY, R. and K. M. WEISS, 1988 Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc. Natl. Acad. Sci. USA 85:9119-9123.
CHIKHI, L., M. W. BRUFORD, and M. A. BEAUMONT, 2001 Estimation of admixture proportions: a likelihood-based approach using Markov chain Monte Carlo. Genetics 158:1347-1362.
COOPER, R. S., A. LUKE, X. ZHU, D. KAN, and A. ADEYEMO et al., 2002 A genome-wide scan among Nigerians linking blood pressure to regions on chromosomes 2, 3, and 19. Hypertension 40:629-633.
DALY, M. J., J. D. RIOUX, S. F. SCHAFFNER, T. J. HUDSON, and E. S. LANDER, 2001 High-resolution haplotype structure in the human genome. Nat. Genet. 29:229-232.[Medline]
DAWSON, K. J. and K. BELKHIR, 2001 A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet. Res. 78:59-77.[Medline]
EROSHEVA, E. A., 2002 Grade of membership and latent structure models with application to disability survey data. Ph.D. Thesis, Department of Statistics, Carnegie Mellon University, Pittsburgh.
EXCOFFIER, L., 2001 Analysis of population subdivision, pp. 271-307 in Handbook of Statistical Genetics, edited by D. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.
FALUSH, D., C. KRAFT, N. S. TAYLOR, P. CORREA, and J. G. FOX et al., 2001 Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and minimal age. Proc. Natl. Acad. Sci. USA 98:15056-15061.
FALUSH, D., T. WIRTH, B. LINZ, J. K. PRITCHARD, and M. STEPHENS et al., 2003 Traces of human migrations in Helicobacter pylori populations. Science 299:1582-1585.
GILKS, W. R., S. RICHARDSON and D. J. SPIEGELHALTER (Editors), 1996 Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
GUGLIELMINO, C. R., A. PIAZZA, P. MENOZZI, and L. L. CAVALLI-SFORZA, 1990 Uralic genes in Europe. Am. J. Phys. Anthropol. 83:57-68.[Medline]
KNOWLER, W. C., R. C. WILLIAMS, D. J. PETTITT, and A. G. STEINBERG, 1988 Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am. J. Hum. Genet. 43:520-526.[Medline]
KONG, A., D. F. GUDBJARTSSON, J. SAINZ, G. M. JONSDOTTIR, and S. A. GUDJONSSON et al., 2002 A high-resolution recombination map of the human genome. Nat. Genet. 31:241-247.[Medline]
KUMAR, S., K. TAMURA, I. B. JAKOBSEN, and M. NEI, 2001 MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
LONG, J. C., 1991 The genetic structure of admixed population. Genetics 127:417-428.[Abstract]
MARCHINI, J. L. and L. R. CARDON, 2002 Discussion on statistical modelling and genetic data. J. R. Stat. Soc. B 64:740-741.
MCKEIGUE, P. M., 1998 Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations, by conditioning on parental admixture. Am. J. Hum. Genet. 63:241-251.[Medline]
MCKEIGUE, P. M., J. R. CARPENTER, E. J. PARRA, and M. D. SHRIVER, 2000 Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann. Hum. Genet. 64:171-186.[Medline]
NEI, M. and W.-H. LI, 1979 Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76:5269-5273.
NICHOLSON, G., A. V. SMITH, F. JÓNSSON, O. GÚSTAFSSON, and K. STEFANSSON et al., 2002 Assessing population differentiation and isolation from single nucleotide polymorphism data. J. R. Stat. Soc. B 64:695-715.
PARRA, E. J., A. MARCINI, J. AKEY, J. MARTINSON, and M. A. BATZER et al., 1998 Estimating African American admixture proportions by use of population-specific alleles. Am. J. Hum. Genet. 63:1839-1851.
PFAFF, C. L., E. J. PARRA, C. BONILLA, K. HIESTER, and P. M. MCKEIGUE et al., 2001 Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium. Am. J. Hum. Genet. 68:198-207.
PRITCHARD, J. K., M. STEPHENS, and P. DONNELLY, 2000 Inference of population structure using multilocus genotype data. Genetics 155:945-959.
RABINER, L. R., 1989 A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2):257-286.
RIESEBERG, L. H., J. WHITTON, and K. GARDNER, 1999 Hybrid zones and the genetic architecture of a barrier to gene flow between two sunflower species. Genetics 152:713-727.
ROSENBERG, N. A., J. K. PRITCHARD, J. L. WEBER, H. M. CANN, and K. K. KIDD et al., 2002 Genetic structure of human populations. Science 298:2381-2385.
SATTEN, G. A., W. D. FLANDERS, and Q. YANG, 2001 Accounting for unmeasured population structure in case-control studies of genetic association using a novel latent-class model. Am. J. Hum. Genet. 68:466-477.
SILLANPÄÄ, M. J., R. KILPIKARI, S. RIPATTI, P. ONKAMO, and P. UIMARI, 2001 Bayesian association mapping for quantitative traits in a mixture of two populations. Genet. Epidemiol. 21:692-699.
SITES, J. W., N. H. BARTON, and K. M. REED, 1995 The genetic-structure of a hybrid zone between 2 chromosome races of the Sceloporus grammicus complex (Sauria, Phrynosomatidae) in central Mexico. Evolution 49:9-36.
STEPHENS, J. C., D. BRISCOE, and S. J. O'BRIEN, 1994 Mapping admixture linkage disequilibrium in human populations: limits and guidelines. Am. J. Hum. Genet. 55:809-824.
STEPHENS, M., N. J. SMITH, and P. DONNELLY, 2001 A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978-989.
THIEL, B. A., A. CHAKRAVARTI, R. S. COOPER, A. LUKE, and S. LEWIS et al., 2003 A genome wide linkage analysis investigating the determinants of blood pressure in Caucasians and African Americans. Am. J. Hypertens. 16:151-153.
THOMPSON, E. A., 1973 The Icelandic admixture problem. Ann. Hum. Genet. 37:69-80.
THORNSBERRY, J. M., M. M. GOODMAN, J. DOEBLEY, S. KRESOVICH, and D. NIELSEN et al., 2001 Dwarf8 polymorphisms associate with variation in flowering time. Nat. Genet. 28:286-289.
WRIGHT, S., 1951 The genetical structure of populations. Ann. Eugen. 15:323-354.
where $p_{klj}$ is the frequency of allele $j$ at locus $l$ in population $k$.