Inference of Population Structure Using Multilocus Genotype Data
Jonathan K. Pritchard, Matthew Stephens, and Peter Donnelly
Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom
Corresponding author: Jonathan K. Pritchard, Department of Statistics, University of Oxford, 1 S. Parks Rd., Oxford OX1 3TG, United Kingdom. E-mail: pritch{at}stats.ox.ac.uk
Communicating editor: M. K. UYENOYAMA
| ABSTRACT |
|---|
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci (e.g., seven microsatellite loci in an example using genotype data from an endangered bird species). The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
In applications of population genetics, it is often useful to classify individuals in a sample into populations. In one scenario, the investigator begins with a sample of individuals and wants to say something about the properties of populations. For example, in studies of human evolution, the population is often considered to be the unit of interest, and a great deal of work has focused on learning about the evolutionary relationships of modern populations (e.g., CAVALLI-SFORZA et al. 1994). In a second scenario, the investigator begins with a set of predefined populations and wishes to classify individuals of unknown origin. This type of problem arises in many contexts.
In both situations described above, a crucial first step is to define a set of populations. The definition of populations is typically subjective, based, for example, on linguistic, cultural, or physical characters, as well as the geographic location of sampled individuals. This subjective approach is usually a sensible way of incorporating diverse types of information. However, it may be difficult to know whether a given assignment of individuals to populations based on these subjective criteria represents a natural assignment in genetic terms, and it would be useful to be able to confirm that subjective classifications are consistent with genetic information and hence appropriate for studying the questions of interest. Further, there are situations where one is interested in "cryptic" population structure, i.e., population structure that is difficult to detect using visible characters, but may be significant in genetic terms. For example, when association mapping is used to find disease genes, the presence of undetected population structure can lead to spurious associations and thus invalidate standard tests.
Our approach is reminiscent of that taken by ![]()
In the next section we provide a brief description of clustering methods in general and describe some advantages of the model-based approach we take. The details of the models and algorithms used are given in MODELS AND METHODS. We illustrate our method with several examples in APPLICATIONS TO DATA, both on simulated data and on sets of genotype data from an endangered bird species and from humans. INCORPORATING POPULATION INFORMATION describes how our method can be extended to incorporate geographic information into the inference process. This may be useful for testing whether particular individuals are migrants or to assist in classifying individuals of unknown origin.
| BACKGROUND ON CLUSTERING METHODS |
|---|
Consider a situation where we have genetic data from a sample of individuals, each of whom is assumed to have originated from a single unknown population (no admixture). Suppose we wish to cluster together individuals who are genetically similar, identify distinct clusters, and perhaps see how these clusters relate to geographical or phenotypic data on the individuals. There are broadly two types of clustering methods we might use:
- Distance-based methods. These proceed by calculating a pairwise distance matrix, whose entries give the distance (suitably defined) between every pair of individuals. This matrix may then be represented using some convenient graphical representation (such as a tree or a multidimensional scaling plot) and clusters may be identified by eye.
- Model-based methods. These proceed by assuming that observations from each cluster are random draws from some parametric model. Inference for the parameters corresponding to each cluster is then done jointly with inference for the cluster membership of each individual, using standard statistical methods (for example, maximum-likelihood or Bayesian methods).
Distance-based methods are usually easy to apply and are often visually appealing. In the genetics literature, it has been common to adapt distance-based phylogenetic algorithms, such as neighbor-joining, to clustering multilocus genotype data.
The first challenge when applying model-based methods is to specify a suitable model for observations from each cluster. To make our discussion more concrete we introduce very briefly some of our model and notation here; a fuller treatment is given later. Assume that each cluster (population) is modeled by a characteristic set of allele frequencies. Let X denote the genotypes of the sampled individuals, Z denote the (unknown) populations of origin of the individuals, and P denote the (unknown) allele frequencies in all populations. (Note that X, Z, and P actually represent multidimensional vectors.) Our main modeling assumptions are Hardy-Weinberg equilibrium within populations and complete linkage equilibrium between loci within populations. Under these assumptions each allele at each locus in each genotype is an independent draw from the appropriate frequency distribution, and this completely specifies the probability distribution Pr(X|Z, P) (given later in Equation 2). Loosely speaking, the idea here is that the model accounts for the presence of Hardy-Weinberg or linkage disequilibrium by introducing population structure and attempts to find population groupings that (as far as possible) are not in disequilibrium. While inference may depend heavily on these modeling assumptions, we feel that it is easier to assess the validity of explicit modeling assumptions than to compare the relative merits of more abstract quantities such as distance measures and graphical representations. In situations where these assumptions are deemed unreasonable then alternative models should be built.
Having specified our model, we must decide how to perform inference for the quantities of interest (Z and P). Here, we have chosen to adopt a Bayesian approach, by specifying models (priors) Pr(Z) and Pr(P), for both Z and P. The Bayesian approach provides a coherent framework for incorporating the inherent uncertainty of parameter estimates into the inference procedure and for evaluating the strength of evidence for the inferred clustering. It also eases the incorporation of various sorts of prior information that may be available, such as information about the geographic sampling location of individuals.
Having observed the genotypes, X, our knowledge about Z and P is then given by the posterior distribution
$$\Pr(Z, P \mid X) \propto \Pr(Z)\,\Pr(P)\,\Pr(X \mid Z, P) \tag{1}$$
While it is not usually possible to compute this distribution exactly, it is possible to obtain an approximate sample (Z(1), P(1)), (Z(2), P(2)), ..., (Z(M), P(M)) from Pr(Z, P|X) using Markov chain Monte Carlo (MCMC) methods described below.
| MODELS AND METHODS |
|---|
We now provide a more detailed description of our modeling assumptions and the algorithms used to perform inference, beginning with the simpler case where each individual is assumed to have originated in a single population (no admixture).
The model without admixture:
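Suppose we genotype N diploid individuals at L loci. In the case without admixture, each individual is assumed to originate in one of K populations, each with its own characteristic set of allele frequencies. Let the vector X denote the observed genotypes, Z the (unknown) populations of origin of the individuals, and P the (unknown) allele frequencies in the populations. These vectors consist of the following elements,
- $(x_l^{(i,1)}, x_l^{(i,2)})$ = genotype of the ith individual at the lth locus, where i = 1, 2, ..., N and l = 1, 2, ..., L;
- $z^{(i)}$ = population from which individual i originated;
- $p_{klj}$ = frequency of allele j at locus l in population k, where k = 1, 2, ..., K and j = 1, 2, ..., $J_l$,

where $J_l$ is the number of distinct alleles observed at locus l, and these alleles are labeled 1, 2, ..., $J_l$.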
Given the population of origin of each individual, the genotypes are assumed to be generated by drawing alleles independently from the appropriate population frequency distributions,
$$\Pr(x_l^{(i,a)} = j \mid Z, P) = p_{z^{(i)} l j} \tag{2}$$
independently for each $x_l^{(i,a)}$. (Note that $p_{z^{(i)} l j}$ is the frequency of allele j at locus l in the population of origin of individual i.)
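As a minimal sketch of what this sampling model implies, the log of Pr(X|Z, P) is a sum of log allele frequencies, one term per allele copy, each looked up in the population of origin of the individual carrying it. The nested-list data layout and the name `log_likelihood` are assumptions made for this illustration and are not taken from the authors' software.

```python
import math
from typing import List, Tuple

# Assumed layout for this sketch (not the authors' data format):
#   genotypes[i][l] = (allele_a, allele_b), alleles coded 0 .. J_l - 1
#   z[i]            = population of origin of individual i, coded 0 .. K - 1
#   p[k][l][j]      = frequency of allele j at locus l in population k
Genotypes = List[List[Tuple[int, int]]]
Frequencies = List[List[List[float]]]


def log_likelihood(genotypes: Genotypes, z: List[int], p: Frequencies) -> float:
    """Return log Pr(X | Z, P), treating every allele copy as an independent draw."""
    total = 0.0
    for i, loci in enumerate(genotypes):
        k = z[i]  # population of origin of individual i
        for l, alleles in enumerate(loci):
            for j in alleles:  # the two allele copies at locus l
                total += math.log(p[k][l][j])
    return total
```

Comparing the value returned for two candidate assignments Z shows directly which one the genotypes favor for a fixed P.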
Assume that before observing the genotypes we have no information about the population of origin of each individual and that the probability that individual i originated in population k is the same for all k,
$$\Pr(z^{(i)} = k) = 1/K \tag{3}$$
independently for all individuals. (In cases where some populations may be more heavily represented in the sample than others, this assumption is inappropriate; it would be straightforward to extend our model to deal with such situations.)
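Following earlier suggestions in the literature, we use the Dirichlet distribution to model the allele frequencies at each locus within each population. The Dirichlet distribution $\mathcal{D}(\lambda_1, \lambda_2, \ldots, \lambda_J)$ is a distribution on allele frequencies $p = (p_1, p_2, \ldots, p_J)$ with the property that these frequencies sum to 1. We use this distribution to specify the probability of a particular set of allele frequencies $p_{kl\cdot}$ for population k at locus l,
$$p_{kl\cdot} \sim \mathcal{D}(\lambda_1, \lambda_2, \ldots, \lambda_{J_l}) \tag{4}$$
independently for each k, l. The expected frequency of allele j is proportional to $\lambda_j$, and the variance of this frequency decreases as the sum of the $\lambda_j$ increases. We take $\lambda_1 = \lambda_2 = \cdots = \lambda_{J_l} = 1.0$, which gives a uniform distribution on the allele frequencies; alternatives are discussed in the DISCUSSION.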
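The two properties of the prior stated here follow from the standard moments of the Dirichlet distribution. As a brief worked check, write $\lambda_1, \ldots, \lambda_J$ for the Dirichlet parameters and $\lambda_0$ for their sum (this notation is assumed here for illustration):

```latex
% Standard Dirichlet moments, with \lambda_0 = \sum_{j=1}^{J} \lambda_j
E[p_j] = \frac{\lambda_j}{\lambda_0},
\qquad
\operatorname{Var}[p_j] = \frac{\lambda_j(\lambda_0 - \lambda_j)}{\lambda_0^{2}(\lambda_0 + 1)}.

% Special case used in the text: all parameters equal to 1, so \lambda_0 = J
E[p_j] = \frac{1}{J},
\qquad
\operatorname{Var}[p_j] = \frac{J - 1}{J^{2}(J + 1)}.
```

The mean is proportional to the parameter for allele j, and for fixed proportions the variance shrinks as the sum of the parameters grows, which matches the description in the text.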
MCMC algorithm (without admixture):
Equations 2, 3, and 4 define the quantities Pr(X|Z, P), Pr(Z), and Pr(P), respectively. By setting $\theta = (\theta_1, \theta_2) = (Z, P)$ and letting $\pi(Z, P) = \Pr(Z, P \mid X)$ we can use the approach outlined in Algorithm A1 to construct a Markov chain with stationary distribution Pr(Z, P|X) as follows:
ALGORITHM 1: Starting with initial values Z(0) for Z (by drawing Z(0) at random using (3) for example), iterate the following steps for m = 1, 2, ...
- Step 1. Sample P(m) from Pr(P|X, Z(m-1)).
- Step 2. Sample Z(m) from Pr(Z|X, P(m)).
Informally, step 1 corresponds to estimating the allele frequencies for each population assuming that the population of origin of each individual is known; step 2 corresponds to estimating the population of origin of each individual, assuming that the population allele frequencies are known. For sufficiently large m and c, (Z(m), P(m)), (Z(m+c), P(m+c)), (Z(m+2c), P(m+2c)), ... will be approximately independent random samples from Pr(Z, P|X). The distributions required to perform each step are given in the Appendix.
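The two alternating steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the Appendix is not reproduced here, so the conditional distributions below are the standard conjugate forms implied by the model (a Dirichlet with parameters equal to the prior value plus the allele counts for step 1, and assignment probabilities proportional to the genotype likelihood for step 2). The function names, the data layout, and the parameter `lam` for the prior value are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Assumed layout for this sketch: genotypes[i][l] = (allele_a, allele_b),
# with alleles at locus l coded 0 .. n_alleles[l] - 1 and populations 0 .. K - 1.
Genotypes = List[List[Tuple[int, int]]]


def sample_dirichlet(params: List[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def sample_p(genotypes, z, K, n_alleles, rng, lam=1.0):
    """Step 1: draw P given X and Z (Dirichlet with parameters lam + allele counts)."""
    L = len(n_alleles)
    counts = [[[0] * n_alleles[l] for l in range(L)] for _ in range(K)]
    for i, loci in enumerate(genotypes):
        for l, alleles in enumerate(loci):
            for j in alleles:
                counts[z[i]][l][j] += 1
    return [[sample_dirichlet([lam + c for c in counts[k][l]], rng)
             for l in range(L)] for k in range(K)]


def sample_z(genotypes, p, K, rng):
    """Step 2: draw Z given X and P, with Pr(z_i = k) proportional to Pr(x_i | p_k)."""
    z = []
    for loci in genotypes:
        log_w = [sum(math.log(p[k][l][j])
                     for l, alleles in enumerate(loci) for j in alleles)
                 for k in range(K)]
        top = max(log_w)  # subtract the maximum for numerical stability
        weights = [math.exp(w - top) for w in log_w]
        z.append(rng.choices(range(K), weights=weights)[0])
    return z


def gibbs_no_admixture(genotypes, K, n_alleles, n_iter, burn_in, thin, seed=0):
    """Alternate steps 1 and 2; keep every `thin`-th draw after `burn_in`."""
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in genotypes]  # Z(0): uniform over populations
    kept = []
    for m in range(1, n_iter + 1):
        p = sample_p(genotypes, z, K, n_alleles, rng)
        z = sample_z(genotypes, p, K, rng)
        if m > burn_in and (m - burn_in) % thin == 0:
            kept.append((list(z), p))
    return kept
```

Here `burn_in` and `thin` play the roles of m and c in the text: draws before `burn_in` are discarded, and only every `thin`-th draw afterward is kept so that the retained samples are closer to independent.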
The model with admixture:
We now expand our model to allow for admixed individuals by introducing a vector Q to denote the admixture proportions for each individual. The elements of Q are
- $q_k^{(i)}$ = proportion of individual i's genome that originated from population k.
It is also necessary to modify the vector Z to replace the assumption that each individual i originated in some unknown population $z^{(i)}$ with the assumption that each observed allele copy $x_l^{(i,a)}$ originated in some unknown population $z_l^{(i,a)}$:
- $z_l^{(i,a)}$ = population of origin of allele copy $x_l^{(i,a)}$.
We use the term "allele copy" to refer to an allele carried at a particular locus by a particular individual.
Our primary interest now lies in estimating Q. We proceed in a manner similar to the case without admixture, beginning by specifying a probability model for (X, Z, P, Q). Analogues of (2) and (3) are
$$\Pr(x_l^{(i,a)} = j \mid Z, P, Q) = p_{z_l^{(i,a)} l j} \tag{5}$$
and
$$\Pr(z_l^{(i,a)} = k \mid P, Q) = q_k^{(i)} \tag{6}$$
with (4) being used to model P as before. To complete our model we need to specify a distribution for Q, which in general will depend on the type and amount of admixture we expect to see. Here we model the admixture proportions $q^{(i)} = (q_1^{(i)}, \ldots, q_K^{(i)})$ of individual i using the Dirichlet distribution
$$q^{(i)} \sim \mathcal{D}(\alpha, \alpha, \ldots, \alpha) \tag{7}$$
independently for each individual. For large values of $\alpha$ ($\gg 1$), this models each individual as having allele copies originating from all K populations in equal proportions. For very small values of $\alpha$ ($\ll 1$), it models each individual as originating mostly from a single population, with each population being equally likely. As $\alpha \to 0$ this model becomes the same as our model without admixture (although the implementation of the MCMC algorithm is somewhat different). We allow $\alpha$ to range from 0.0 to 10.0 and attempt to learn about $\alpha$ from the data (specifically we put a uniform prior on $\alpha \in [0, 10]$ and use a Metropolis-Hastings update step to integrate out our uncertainty in $\alpha$). This model may be considered suitable for situations where little is known about admixture; alternatives are discussed in the DISCUSSION.
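To see how the single parameter of this symmetric Dirichlet prior controls the amount of admixture, one can draw admixture vectors for a small and a large parameter value and look at the average size of the largest component. The sketch below is only an illustration; the function names and the choice of summary statistic are assumptions of this example, and the parameter is called `alpha` here.

```python
import random
from typing import List


def sample_admixture(alpha: float, K: int, rng: random.Random) -> List[float]:
    """Draw one admixture vector from a symmetric Dirichlet with parameter alpha."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_component(alpha: float, K: int,
                           n_draws: int = 2000, seed: int = 1) -> float:
    """Average, over prior draws, of the largest ancestry proportion."""
    rng = random.Random(seed)
    largest = [max(sample_admixture(alpha, K, rng)) for _ in range(n_draws)]
    return sum(largest) / n_draws


# A small parameter concentrates each individual in one population
# (largest component near 1); a large parameter spreads ancestry
# almost evenly (largest component near 1/K).
nearly_unadmixed = mean_largest_component(alpha=0.05, K=3)
evenly_admixed = mean_largest_component(alpha=10.0, K=3)
```

Values of the parameter between these extremes give priors that allow a mixture of nearly pure and substantially admixed individuals.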
MCMC algorithm (with admixture):
The following algorithm may be used to sample from Pr(Z, P, Q|X).
ALGORITHM 2: Starting with initial values Z(0) for Z (by drawing Z(0) at random using (3) for example), iterate the following steps for m = 1, 2, ...
- Step 1. Sample P(m), Q(m) from Pr(P, Q|X, Z(m-1)).
- Step 2. Sample Z(m) from Pr(Z|X, P(m), Q(m)).
- Step 3. Update $\alpha$ using a Metropolis-Hastings step.
Informally, step 1 corresponds to estimating the allele frequencies for each population and the admixture proportions of each individual, assuming that the population of origin of each allele copy in each individual is known; step 2 corresponds to estimating the population of origin of each allele copy, assuming that the population allele frequencies and the admixture proportions are known. As before, for sufficiently large m and c, (Z(m), P(m), Q(m)), (Z(m+c), P(m+c), Q(m+c)), (Z(m+2c), P(m+2c), Q(m+2c)), ... will be approximately independent random samples from Pr(Z, P, Q|X). The distributions required to perform each step are given in the Appendix.
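Step 2 can be illustrated for a single allele copy. Under the admixture model, an allele copy of individual i comes from population k with probability equal to that individual's admixture proportion for k, and then takes allele j with the frequency of j in population k. Bayes' rule therefore gives a probability of origin proportional to the product of those two quantities. The sketch below assumes this form (the Appendix is not reproduced here), and the function name and data layout are hypothetical.

```python
from typing import List


def allele_copy_origin_probs(q_i: List[float],
                             p: List[List[List[float]]],
                             locus: int,
                             allele: int) -> List[float]:
    """Probability that one allele copy came from each population.

    q_i[k]     = admixture proportion of the individual for population k
    p[k][l][j] = frequency of allele j at locus l in population k
    """
    # Unnormalized weight for population k: prior proportion times allele frequency.
    weights = [q_i[k] * p[k][locus][allele] for k in range(len(q_i))]
    total = sum(weights)
    return [w / total for w in weights]
```

For an individual with equal ancestry from two populations, an allele that is common in the first population and rare in the second is assigned to the first with correspondingly high probability.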
Inference:
Inference for Z, P, and Q:
We now discuss how the MCMC output can be used to perform inference on Z, P, and Q. For simplicity, we focus our attention on Q; inference for Z or P is similar.
Having obtained a sample Q(1), ... , Q(M) (using suitably large burn-in m and thinning interval c) from the posterior distribution of Q = (q1, ... , qN) given X using the MCMC method, it is desirable to summarize the information contained, perhaps by a point estimate of Q. A seemingly obvious estimate is the posterior mean
$$E(q_i \mid X) \approx \frac{1}{M} \sum_{m=1}^{M} q_i^{(m)} \tag{8}$$
However, the symmetry of our model implies that the posterior mean of qi is (1/K,1/K, ... , 1/K) for all i, whatever the value of X. For example, suppose that there are just two populations and 10 individuals and that the genotypes of these individuals contain strong information that the first 5 are in one population and the second 5 are in the other population. Then either
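$$q_1 = \cdots = q_5 = (1, 0), \quad q_6 = \cdots = q_{10} = (0, 1) \tag{9}$$
or
$$q_1 = \cdots = q_5 = (0, 1), \quad q_6 = \cdots = q_{10} = (1, 0) \tag{10}$$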
with these two "symmetric modes" being equally likely, leading to the expectation of any given qi being (0.5, 0.5). This is essentially a problem of nonidentifiability caused by the symmetry of the model.
In general, if there are K populations then there will be K! sets of symmetric modes. Typically, MCMC schemes find it rather difficult to move between such modes, and the algorithms we describe will usually explore only one of the symmetric modes, even when run for a very large number of iterations. Fortunately this does not bother us greatly, since from the point of view of clustering all the symmetric modes are the same [compare the clusterings corresponding to (9) and (10)]. If our sampler explores only one symmetric mode then the sample means (8) will be very poor estimates of the posterior means for the qi, but will be much better estimates of the modes of the qi, which in this case turn out to be a much better summary of the information in the data. Ironically then, the poor mixing of the MCMC sampler between the symmetric modes gives the asymptotically useless estimator (8) some practical value. Where the MCMC sampler succeeds in moving between symmetric modes, or where it is desired to combine results from samples obtained using different starting points (which may involve combining results corresponding to different modes), more sophisticated methods may be required.
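The effect of the symmetric modes on the sample mean can be reproduced with the two-population, ten-individual example from the text. The sketch below averages idealized draws of Q, first from a sampler that stays in one mode and then from one that visits both modes equally often. The draws are constructed by hand for illustration and are not MCMC output.

```python
from typing import List

QSample = List[List[float]]  # one draw of Q: sample[i] is the vector q_i


def sample_mean(samples: List[QSample]) -> QSample:
    """Average the draws of Q component by component."""
    M = len(samples)
    N = len(samples[0])
    K = len(samples[0][0])
    return [[sum(s[i][k] for s in samples) / M for k in range(K)]
            for i in range(N)]


# The two symmetric modes: individuals 1-5 in one cluster, 6-10 in the other,
# with the cluster labels swapped between the modes.
mode_a = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
mode_b = [[0.0, 1.0]] * 5 + [[1.0, 0.0]] * 5

stuck_in_one_mode = sample_mean([mode_a] * 100)
visits_both_modes = sample_mean([mode_a] * 50 + [mode_b] * 50)
```

With the sampler confined to one mode the averages recover the clustering, whereas equal time in both modes gives (0.5, 0.5) for every individual and hides the structure entirely.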
Inference for the number of populations:
The problem of inferring the number of clusters, K, present in a data set is notoriously difficult. In the Bayesian paradigm the way to proceed is theoretically straightforward: place a prior distribution on K and base inference for K on the posterior distribution
$$\Pr(K \mid X) \propto \Pr(X \mid K)\,\Pr(K) \tag{11}$$
However, this posterior distribution can be peculiarly dependent on the modeling assumptions made, even where the posterior distributions of other quantities (Q, Z, and P, say) are relatively robust to these assumptions. Moreover, there are typically severe computational challenges in estimating Pr(X|K). We therefore describe an alternative approach, which is motivated by approximating (11) in an ad hoc and computationally convenient way.
Arguments given in the Appendix (Inference on K, the number of populations) suggest estimating Pr(X|K) using
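$$-2 \log \Pr(X \mid K) \approx \hat{\mu} + \hat{\sigma}^2/4 \tag{12}$$
where
$$\hat{\mu} = \frac{1}{M} \sum_{m=1}^{M} -2 \log \Pr(X \mid Z^{(m)}, P^{(m)}, Q^{(m)}) \tag{13}$$
and
$$\hat{\sigma}^2 = \frac{1}{M} \sum_{m=1}^{M} \left(-2 \log \Pr(X \mid Z^{(m)}, P^{(m)}, Q^{(m)}) - \hat{\mu}\right)^2 \tag{14}$$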
We use (12) to estimate Pr(X|K) for each K and substitute these estimates into (11) to approximate the posterior distribution Pr(K|X).
In fact, the assumptions underlying (12) are dubious at best, and we do not claim (or believe) that our procedure provides a quantitatively accurate estimate of the posterior distribution of K. We see it merely as an ad hoc guide to which models are most consistent with the data, with the main justification being that it seems to give sensible answers in practice (see next section for examples). Notwithstanding this, for convenience we continue to refer to "estimating" Pr(K|X) and Pr(X|K).
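The ad hoc procedure can be sketched as follows. The sketch assumes that the estimator takes the form -2 log Pr(X|K) ≈ (mean of the sampled deviances) + (variance of the sampled deviances)/4, where the deviance of a draw is -2 times its log likelihood; this form is an assumption of the example and should be checked against the Appendix. The function names, the dictionary layout, and the optional prior argument are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional


def estimate_log_evidence(log_liks: List[float]) -> float:
    """Estimate log Pr(X | K) from the log likelihoods recorded along one chain."""
    M = len(log_liks)
    deviances = [-2.0 * ll for ll in log_liks]
    mu = sum(deviances) / M                              # mean deviance
    sigma2 = sum((d - mu) ** 2 for d in deviances) / M   # deviance variance
    return -(mu + sigma2 / 4.0) / 2.0


def posterior_over_k(log_evidence: Dict[int, float],
                     log_prior: Optional[Dict[int, float]] = None) -> Dict[int, float]:
    """Combine per-K evidence estimates with a prior on K (uniform by default)."""
    ks = sorted(log_evidence)
    if log_prior is None:
        log_prior = {k: 0.0 for k in ks}
    scores = [log_evidence[k] + log_prior[k] for k in ks]
    top = max(scores)  # subtract the maximum before exponentiating
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}
```

Working on the log scale and subtracting the largest score before exponentiating avoids underflow, which matters because log evidence values for different K typically differ by tens or hundreds of units. As the text cautions, the resulting numbers are rough guides to which models are consistent with the data and not accurate posterior probabilities.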
| APPLICATIONS TO DATA |
|---|
We now illustrate the performance of our method on both simulated data and real data (from an endangered bird species and from humans). The analyses make use of the methods described in The model with admixture.
Simulated data:
To test the performance of the clustering method in cases where the "answers" are known, we simulated data from three population models, using standard coalescent techniques.
- Model 1: A single random-mating population of constant size.
- Model 2: Two random-mating populations of constant effective population size 2N. These were assumed to have split from a single ancestral population, also of size 2N at a time N generations in the past, with no subsequent migration.
- Model 3: Admixture of populations. Two discrete populations of equal size, related as in model 2, were fused to produce a single random-mating population. Samples were collected after two generations of random mating in the merged population. Thus, individuals have i grandparents from population 1, and 4 - i grandparents from population 2 with probability $\binom{4}{i}/16$, where $i \in \{0, 1, \ldots, 4\}$. All loci were simulated independently.
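The grandparent distribution under model 3 can be tabulated directly. Because the two source populations are of equal size and the merged population mates at random, each of an individual's four grandparents is equally likely to come from either source, so the number from population 1 follows a Binomial(4, 1/2) distribution. The function name below is hypothetical and the sketch is only a worked check of this arithmetic.

```python
from math import comb
from typing import Dict


def grandparent_probs() -> Dict[int, float]:
    """Pr(i of 4 grandparents come from population 1) under Binomial(4, 1/2)."""
    return {i: comb(4, i) / 16 for i in range(5)}


probs = grandparent_probs()
# Individuals with 0 or 4 grandparents from population 1 are unadmixed;
# everyone else has mixed ancestry.
fraction_admixed = 1.0 - probs[0] - probs[4]
```

The admixed fraction computed this way is 7/8, so only one individual in eight is expected to have all four grandparents from the same source population.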
We present results from analyzing data sets simulated under each model. Data set 1 was simulated under model 1, with 5 microsatellite loci. Data sets 2A and 2B were simulated under model 2, with 5 and 15 microsatellite loci, respectively. Data set 3 was simulated under model 3, with 60 loci (preliminary analyses with fewer loci showed this to be a much harder problem than models 1 and 2). Microsatellite mutation was modeled by a simple stepwise mutation process, with the mutation parameter 4Nµ set at 16.0 per locus (i.e., the expected variance in repeat scores within populations was 8.0). We did not make use of the assumed mutation model in analyzing the simulated data.
Our analysis consists of two phases. First, we consider the issue of model choice, i.e., how many populations are most appropriate for interpreting the data. Then, we examine the clustering of individuals for the inferred number of populations.
Choice of K for simulated data:
For each model, we ran a series of independent runs of the Gibbs sampler for each value of K (the number of populations) between 1 and 5. The results presented are based on runs of $10^6$ iterations or more, following a burn-in period of at least 30,000 iterations. To choose the length of the burn-in period, we printed out log(Pr(X|P(m), Q(m))), and several other summary statistics during the course of a series of trial runs, to estimate how long it took to reach (approximate) stationarity. To check for possible problems with mixing, we compared the estimates of Pr(X|K) and other summary statistics obtained over several independent runs of the Gibbs sampler, starting from different initial points. In general, substantial differences between runs can indicate that either the runs should be longer to obtain more accurate estimates or that independent runs are getting stuck in different modes in the parameter space. (Here, we consider the K! modes that arise from the nonidentifiability of the K populations to be equivalent, since they arise from permuting the K population labels.)
We found that in most cases we obtained consistent estimates of Pr(X|K) across independent runs. However, when analyzing data set 2A with K = 3, the Gibbs sampler found two different modes. This data set actually contains two populations, and when K is set to 3, one of the populations expands to fill two of the three clusters. It is somewhat arbitrary which of the two populations expands to fill the extra cluster: this leads to two modes of slightly different heights. The Gibbs sampler did not manage to move between the two modes in any of our runs.
In Table 1 we report estimates of the posterior probabilities of values of K, assuming a uniform prior on K between 1 and 5, obtained as described in Inference for the number of populations. We repeat the warning given there that these numbers should be regarded as rough guides to which models are consistent with the data, rather than accurate estimates of the posterior probabilities. In the case where we found two modes (data set 2A, K = 3), we present results based on the mode that gave the higher estimate of Pr(X|K).
With all four simulated data sets we were able to correctly infer whether or not there was population structure (K = 1 for data set 1 and K > 1 otherwise). In the case of data set 2A, which consisted of just 5 loci, there is not a clear estimate of K, as the posterior probability is consistent with both the correct value, K = 2, and also with K = 3 or 4. However, when the number of loci was increased to 15 (data set 2B), virtually all of the posterior probability was on the correct number of populations, K = 2.
Data set 3 was simulated under a more complicated model, where most individuals have mixed ancestry. In this case, the population was formed by admixture of two populations, so the "true" clustering is with K = 2, and Q estimating the number of grandparents from each of the two original populations, for each individual. Intuitively it seems that another plausible clustering would be with K = 5, individuals being assigned to clusters according to how many grandparents they have from each population. In biological terms, the solution with K = 2 is more natural and is indeed the inferred value of K for this data set using our ad hoc guide [the estimated value of Pr(X|K) was higher for K = 5 than for K = 3, 4, or 6, but much lower than for K = 2]. However, this raises an important point: the inferred value of K may not always have a clear biological interpretation (an issue that we return to in the DISCUSSION).
Clustering of simulated data:
Having considered the problem of estimating the number of populations, we now examine the performance of the clustering algorithm in assigning particular individuals to the appropriate populations. In the case where the populations are discrete, the clustering performs very well (Fig 1), even with just 5 loci (data set 2A), and essentially perfectly with 15 loci (data set 2B).
The case with admixture (Fig 2) appears to be more difficult, even using many more loci. However, the clustering algorithm did manage to identify the population structure appropriately and estimated the ancestry of individuals with reasonable accuracy. Part of the reason that this problem is difficult is that it is hard to estimate the original allele frequencies (before admixture) when almost all the individuals (7/8) are admixed. A more fundamental problem is that it is difficult to get accurate estimates of q(i) for particular individuals because (as can be seen from the y-axis of Fig 2) for any given individual, the variance of how many of its alleles are actually derived from each population can be substantial (for intermediate q). This property means that even if the allele frequencies were known, it would still be necessary to use a considerable number of loci to get accurate estimates of q for admixed individuals.
Data from the Taita thrush:
We now present results from applying our method to genotype data from an endangered bird species, the Taita thrush, Turdus helleri. Individuals were sampled at four locations in southeast Kenya [Chawia (17 individuals), Ngangao (54), Mbololo (80), and Yale (4)]. Each individual was genotyped at seven microsatellite loci.
This data set is a useful test for our clustering method, because the geographic samples are likely to represent distinct populations. These locations represent fragments of indigenous cloud forest, separated from each other by human settlements and cultivated areas. Yale, which is a very small fragment, is quite close to Ngangao. Extensive data on ringed and radio-tagged birds over a 3-year period indicate low migration rates.
As discussed in BACKGROUND ON CLUSTERING METHODS, it is currently common to use distance-based clustering methods to visualize genotype data of this kind. To permit a comparison between that type of approach and our own method, we begin by showing a neighbor-joining tree of the bird data (Fig 3). Inspection of the tree reveals that the Chawia and Mbololo individuals represent (somewhat) distinct clusters. Several individuals (marked by asterisks) appear to be classified with other groups. The four Yale individuals appear to fall within the Ngangao group [a view that is supported by summary statistics of divergence showing the Yale and Ngangao to be very closely related (Table 2)].
The tree illustrates several shortcomings of distance-based clustering methods. First, it would not be possible (in this case) to identify the appropriate clusters if the labels were missing. Second, since the tree does not use a formal probability model, it is difficult to ask statistical questions about features of the tree, for example: Are the individuals marked with asterisks actually migrants, or are they simply misclassified by chance? Is there evidence of population structure within the Ngangao group (which appears from the tree to be quite diverse)?
We now apply our clustering method to these data.
Choice of K, for Taita thrush data:
To choose an appropriate value of K for modeling the data, we ran a series of independent runs of the Gibbs sampler at a range of values of K. After running numerous medium-length runs to investigate the behavior of the Gibbs sampler (using the diagnostics described in Choice of K for simulated data), we again chose to use a burn-in period of 30,000 iterations and to collect data for $10^6$ iterations. We ran three to five independent simulations of this length for each K between 1 and 5 and found that the independent runs produced highly consistent results. At K = 5, a run of $10^6$ steps takes ~70 min on our desktop machine.
Using the approach described in Inference for the number of populations, we estimated Pr(X|K) for K = 1, 2, ... , 5 and corresponding values of Pr(K|X) for a uniform prior on K = 1, 2, ... , 5. (In fact, this data set contains a lot of information about K, so that inference is relatively robust to choice of prior on K, and other priors, such as taking Pr(K) proportional to Poisson(1) for K > 0, would give virtually indistinguishable results.) From the estimates of Pr(K|X), shown in the last column of Table 3, it is clear that the models with K = 1 or 2 are completely insufficient to model the data and that the model with K = 3 is substantially better than models with larger K. Given these results, we now focus our subsequent analysis on the model with three populations.
Clustering results for Taita thrush data:
Fig 4 shows a plot of the clustering results for the individuals in the sample, assuming that there are three populations (as inferred above). We did not use (and indeed, did not know) the sampling locations of individuals when we obtained these results. Our clustering algorithm seems to have performed very well, with just a few individuals (labeled 1-4) falling somewhat outside the obvious clusters. All of the points in the extreme corners (some of which may be difficult to resolve on the picture) are correctly assigned. The four Yale individuals were assigned to the Ngangao cluster, consistent with the neighbor-joining tree and the $(\delta\mu)^2$ distances. We return to this data set in INCORPORATING POPULATION INFORMATION to consider the question of whether the individuals that seem not to cluster tightly with others sampled from the same location are the product of migration.
Application to human data:
The next data set, taken from ![]()
Application of our MCMC scheme with K = 2 indicates the presence of two very distinct clusters, corresponding to the Africans and Europeans in the sample (Fig 5). The model with K = 2 has vastly higher posterior probability than the model with K = 1.
Additional runs of the MCMC scheme with the models K = 3, 4, and 5 suggest that those models may be somewhat better than K = 2. This may reflect the presence of population structure within the continental groupings, although in this case the additional populations do not form discrete clusters and so are difficult to interpret.
Again it is interesting to contrast our clustering results with the neighbor-joining tree of these data (Fig 6). While our method finds it quite easy to separate the two continental groups into the correct clusters, it would not be possible to use the neighbor-joining tree to detect distinct clusters if the labels were not present. The data set of Jorde also contains a set of individuals of Asian origin (which are more closely related to Europeans than are Africans). Neither the neighbor-joining method nor our method differentiates between the Europeans and Asians with great accuracy using this data set.
| INCORPORATING POPULATION INFORMATION |
|---|
The results presented so far have focused on testing how well our method works. We now turn our attention to some further applications of this method.
Our clustering results (Fig 4) confirm that the three main geographic groupings in the thrush data set (Chawia, Mbololo, and Ngangao) represent three genetically distinct populations. The geographic labels correspond very closely to the genetic clustering in all but a handful of cases (individuals 1-4 in Fig 4). Individual 2 is also identified as a possible outlier on the neighbor-joining tree (Fig 3). Given this, it is natural to ask whether these apparent outliers are immigrants (or descendants of recent immigrants) from other populations. For example, given the genetic data, how probable is it that individual 1 is actually an immigrant from Chawia?
To answer this sort of question, we need to extend our algorithm to incorporate the geographic labels. By doing this, we break the symmetry of the labels, and we can ask specifically whether a particular individual is a migrant from Chawia (say). In essence our approach (described more formally in the next section) is to assume that each individual originated, with high probability, in the geographical region in which it was sampled, but to allow some small probability that it is an immigrant (or has immigrant ancestry). Note that this model is also suitable for situations in which individuals are classified according to some characteristic other than sampling location (physical appearance, for example). "Immigrants" in this situation would be individuals whose genetic makeup suggests they were misclassified. Thus, while we speak of "immigrants" and "immigrant ancestry," in some contexts these terms may relate to something other than changes in physical location.
Provided that geographic labels usually correspond to population membership, using the geographic information will clearly improve our accuracy at assigning individuals to clusters; it will also improve our estimates of P, thus also giving us greater precision in assignment of individuals who do not have geographic information. However, in practice we suggest that before making use of such information, users of our method should first cluster the data without using the geographic labels, to check that the genetically defined clusters do in fact agree with geographic labels. We return to this issue in the DISCUSSION.
Model with prior population information:
To incorporate geographic information, we use the following model. Our primary goal is to identify individuals who are immigrants, or who have recent immigrant ancestry, in the last G generations, say, where G = 0 is the present generation. [In practice there will only be substantial power to detect immigration for small G.]
First, we code each of the geographic locations by a (unique) integer between 1 and K, where K would usually be set equal to the number of locations. Using this coding, let g(i) represent the geographic sampling location of individual i. Now, let $\nu$ be the probability that an individual is an immigrant to population g(i) or has an immigrant ancestor in the last G generations. Otherwise, with probability $1 - \nu$, the individual is considered to be purely from population g(i). While in principle one could place a prior on $\nu$ and learn about it from the data as part of the MCMC scheme, in our current implementation the user must specify a fixed value for $\nu$; we give some guidelines in the next section.
Assuming that migration is rare, we can use the approximation that each individual has at most one immigrant ancestor in the last G generations (where G is suitably small). Then, assuming a constant migration rate, the probability of an immigrant ancestor in generation t ($0 \le t \le G$) is proportional to $2^t$, where t = 0 indicates that the individual migrated in the present generation. Thus, we set the prior on $q^{(i)}$ to be

$$q^{(i)}_{g(i)} = 1$$ (15)

with probability $1 - \nu$ and

$$q^{(i)}_{g(i)} = 1 - 2^{-t}, \qquad q^{(i)}_{j} = 2^{-t}$$ (16)

for each $j \ne g(i)$ with probability

$$\frac{2^{t}\,\nu}{(2^{G+1} - 1)(K - 1)}$$ (17)

where $t \in \{0, \ldots, G\}$. As before, $q^{(i)}_l \ge 0$ for $l \in \{1, \ldots, K\}$, and $\sum_{l} q^{(i)}_l = 1$.
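The prior just described can be enumerated explicitly, which may help when checking that its weights sum to one. The sketch below is only an illustration, not the authors' implementation: it assumes that Equation 15 puts mass $1 - \nu$ on pure ancestry from g(i) and that Equation 17 gives weight $2^t \nu / ((2^{G+1} - 1)(K - 1))$ to an immigrant ancestor from population j at generation t. The function name `migrant_prior` and the 0-based location coding are hypothetical choices.

```python
from fractions import Fraction


def migrant_prior(g, K, G, nu):
    """Enumerate the prior on q for an individual sampled at location g.

    Locations are coded 0..K-1 here (the text uses 1..K). Returns a list of
    (q, probability) pairs, where q is a tuple of K ancestry proportions.
    """
    nu = Fraction(nu)
    # Equation 15: no immigrant ancestry in the last G generations.
    pure = tuple(Fraction(int(k == g)) for k in range(K))
    prior = [(pure, 1 - nu)]
    # Equations 16 and 17 (assumed form): one immigrant ancestor from
    # population j, t generations back, with weight proportional to 2**t.
    norm = (2 ** (G + 1) - 1) * (K - 1)
    for j in range(K):
        if j == g:
            continue
        for t in range(G + 1):
            share = Fraction(1, 2 ** t)
            q = [Fraction(0)] * K
            q[g] = 1 - share
            q[j] = share
            prior.append((tuple(q), nu * 2 ** t / norm))
    return prior


prior = migrant_prior(g=0, K=3, G=2, nu="0.05")
total = sum(p for _, p in prior)  # the prior weights should sum to one
```

Exact fractions are used so that the normalization can be verified without rounding error; with K = 3 and G = 2 there are 1 + 2 x 3 = 7 candidate values of q.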
Again, we can sample from Pr(Q|X) using Algorithm 2. In this case, however, since there are a small number of possible values of q(i), we update q(i) by sampling directly from the posterior probability of q(i)|X,P, rather than conditional on Z.
Note that in this framework, it is easy to include individuals for whom there is no geographic information by using the same prior and update steps as before (Equation 7 and Equation A10).
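Because q(i) can take only a small number of values under this prior, its posterior given X and P can be computed by direct enumeration. The following is a minimal sketch under stated assumptions: each allele copy is taken to have likelihood $\sum_k q_k p_{klx}$ (that is, the population of origin Z is summed out), and the names `posterior_over_q`, `genotype`, and the nested-list layout of `P` are hypothetical.

```python
import math


def posterior_over_q(genotype, P, prior):
    """Posterior probability of each candidate q for one individual.

    genotype: list over loci of (allele, allele) index pairs.
    P[k][l][j]: frequency of allele j at locus l in population k.
    prior: list of (q, prior_probability) pairs with positive probabilities.
    """
    log_w = []
    for q, pr in prior:
        lw = math.log(pr)
        for l, alleles in enumerate(genotype):
            for a in alleles:
                # Likelihood of one allele copy, summing over its origin.
                s = sum(q[k] * P[k][l][a] for k in range(len(q)))
                lw += math.log(s) if s > 0 else float("-inf")
        log_w.append(lw)
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]  # subtract the max for stability
    total = sum(w)
    return [x / total for x in w]


# Toy example: K = 2 populations, two biallelic loci, G = 0, nu = 0.05.
P = [[[0.9, 0.1], [0.8, 0.2]],   # population 0
     [[0.1, 0.9], [0.2, 0.8]]]   # population 1
prior = [((1.0, 0.0), 0.95), ((0.0, 1.0), 0.05)]
post = posterior_over_q([(1, 1), (1, 1)], P, prior)
```

In this toy case the individual was sampled in population 0 but carries alleles typical of population 1, so most of the posterior mass moves to the migrant configuration despite the small prior weight.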
Testing for migrants in the Taita thrush data:
To apply our method, we must first specify a value for $\nu$. In this case, based on mark-release-recapture data from these populations (GALBUSERA et al. 2000), $\nu$ is likely to be small. We performed analyses for $\nu$ = 0.05 and $\nu$ = 0.1; a summary of the results is shown in Table 4. Individuals 2 and 3 have moderate posterior probabilities of having migrant ancestry, but these probabilities are perhaps smaller than might be expected from examining Fig 4. This is due to a combination of the low prior probability for migration (from the mark-release-recapture data) and, perhaps more importantly, the fact that there is a limited amount of information in seven loci, so that the uncertainty associated with the position of the points marked 1, 2, 3, and 4 in Fig 4 may be quite large. A more definite conclusion could be obtained by typing more loci.
It is interesting to note that our conclusions here differ from those obtained on this data set using the package IMMANC (RANNALA and MOUNTAIN 1997).
We anticipate that our method might also be applied in situations where there is little data to help make an informed choice of $\nu$. In such situations we suggest analyzing the data using several different values of $\nu$, to see whether the conclusions are robust to choice of $\nu$. The range of sensible values for $\nu$ will depend on the context, but typically we suggest values in the range 0.001-0.1 might be appropriate. Sensitivity to choice of $\nu$ indicates that the amount of information in the data is insufficient to draw strong conclusions.
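A sensitivity check of this kind can be illustrated with a deliberately simplified case. The sketch below assumes only two hypotheses for one individual (resident versus first-generation migrant, i.e., K = 2 and G = 0) and treats the two likelihood values as given; the function name and the likelihood ratio of 20 are hypothetical.

```python
def migrant_posterior(nu, lik_resident, lik_migrant):
    """Posterior probability of migrant status under a two-point prior."""
    numerator = nu * lik_migrant
    return numerator / (numerator + (1.0 - nu) * lik_resident)


# Posterior for a genotype that is 20 times more likely under the migrant
# hypothesis, evaluated across the suggested range of nu.
nus = (0.001, 0.01, 0.05, 0.1)
posteriors = [migrant_posterior(nu, 1.0, 20.0) for nu in nus]
for nu, p in zip(nus, posteriors):
    print(f"nu = {nu}: Pr(migrant | data) = {p:.3f}")
```

If the printed posteriors change substantially across the grid, as they do for a modest likelihood ratio like this one, the data alone carry too little information to settle the question.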
| DISCUSSION |
|---|
We have described a method for using multilocus genotype data to learn about population structure and assign individuals (probabilistically) to populations. Our method also provides a novel approach to testing for the presence of population structure (K > 1).
Our examples demonstrate that the method can accurately cluster individuals into their appropriate populations, even using only a modest number of loci. In practice, the accuracy of the assignments depends on a number of factors, including the number of individuals (which affects the accuracy of the estimate for P), the number of loci (which affects the accuracy of the estimate for Q), the amount of admixture, and the extent of allele-frequency differences among populations.
We anticipate that our method will be useful for identifying populations and assigning individuals in situations where there is little information about population structure. It should also be useful in problems where cryptic population structure is a concern, as a way of identifying subpopulations. Even in situations where there is nongenetic information that can be used to define populations, it may be useful to use the approach developed here to ensure that populations defined on an extrinsic basis reflect the underlying genetic structure.
As described in INCORPORATING POPULATION INFORMATION, we have also developed a framework that makes it possible to combine genetic information with prior information about the geographic sampling location of individuals. Besides being used to detect migrants, this could also be used in situations where there is strong prior population information for some individuals, but not for others. For example, in hybrid zones it may be possible to identify some individuals who do not have mixed ancestry and then to estimate q for the rest (M. BEAUMONT, D. GOTELLI, E. M. BARETT, A. C. KITCHENER, M. J. DANIELS, J. K. PRITCHARD and M. W. BRUFORD, unpublished results). The advantage of using a clustering approach in such cases is that it makes the method more robust to the presence of misclassified individuals and should be more accurate than if only preclassified individuals are used to estimate allele frequencies.
Another type of application where the geographic information might be of value is in evolutionary studies of population relationships. Such analyses frequently make use of summary statistics based on population allele frequencies [e.g., $F_{ST}$ and $(\delta\mu)^2$]. In situations where the population allele frequencies might be affected by recent immigration or where population classifications are unclear, such summary statistics could be calculated directly from the population allele frequencies P estimated by the Gibbs sampler.
There are several ways in which the basic model that we have described here might be modified to produce better performance in particular cases. For example, in MODELS AND METHODS and APPLICATIONS TO DATA we assumed relatively noninformative priors for q. However, in some situations, there might be quite a bit of information about likely values of q, and the estimation procedure could be improved by using that information. For example, in estimating admixture proportions for African Americans, it would be possible to improve the estimation procedure by making use of existing information about the extent of European admixture (e.g., PARRA et al. 1998).
A second way in which the basic model can be modified involves changing the way in which the allele frequencies P are estimated. Throughout this article, we have assumed that the allele frequencies in different populations are uncorrelated with one another. This is a convenient approximation for populations that are not extremely closely related and, as we have seen, can produce accurate clustering. However, loosely speaking, the model of uncorrelated allele frequencies says that we do not normally expect to see populations with very similar allele frequencies. This property has the result that the clustering algorithm may tend to merge subpopulations that share similar frequencies. An alternative, which we have implemented in our software package, is to permit allele frequencies to be correlated across populations (Appendix Model with correlated allele frequencies). In a series of additional simulations, we have found that this allows us to perform accurate assignments of individuals in very closely related populations, though possibly at the cost of making us likely to overestimate K.
Our basic model might also be modified to allow for linkage among marker loci. Normally, we would not expect to see linkage disequilibrium within subpopulations, except between markers that are extremely close together. This means that in situations where there is little admixture, our assumption of independence among loci will be quite accurate. However, we might expect to see strong correlations among linked loci when there is recent admixture. This occurs because an individual who is admixed will inherit large chromosomal segments from one population or another. Thus, when the map order of marker loci is known, it should be possible to improve the accuracy of the estimation for such individuals by modeling the inheritance of these segments.
In this article we have devoted considerable attention to the problem of inferring K. This is an important practical problem from the standpoint of model choice. We need to have some way of deciding which clustering model is most appropriate for interpreting the data. However, we stress that care should be taken in the interpretation of the inferred value of K. To begin with, due to the very high dimensionality of the parameter space, we found it difficult to obtain reliable estimates of Pr(X | K) and have chosen to use a fairly ad hoc procedure that we have found gives sensible results in practice. Second, it has been observed that in Bayesian model-based clustering, the posterior distribution of K tends to be quite dependent on the priors and modeling assumptions, even though estimates of the other parameters (e.g., P and Q here) may be reasonably robust.
There are also biological reasons to be careful interpreting K. The population model that we have adopted here is obviously an idealization. We anticipate that it will be flexible enough to permit appropriate clustering for a wide range of population structures. However, as we pointed out in our discussion of data set 3 (Choice of K for simulated data), clusters may not necessarily correspond to "real" populations. As another example, imagine a species that lives on a continuous plane, but has low dispersal rates, so that allele frequencies vary continuously across the plane. If we sample at K distinct locations, we might infer the presence of K clusters, but the inferred number K is not biologically interesting, as it was determined purely by the sampling scheme. All that can usefully be said in such a situation is that the migration rates between the sampling locations are not high enough to make the population act as a single unstructured population.
In summary, we find that the method described here can produce highly accurate clustering and sensible choices of K, both for simulated data and for real data from humans and from the Taita thrush. In the latter example, we find it particularly encouraging that using a relatively small number of loci (seven) we can detect a very strong signal of population structure and assign individuals appropriately.
The algorithms described in this article have been implemented in a computer software package structure, which is available at http://www.stats.ox.ac.uk/~pritch/home.html.
| ACKNOWLEDGMENTS |
|---|
We thank Peter Galbusera and Lynn Jorde for allowing us to use their data, Augie Kong for a helpful discussion, Daniel Falush for suggesting comparison with neighbor-joining trees, Steve Brooks and Trevor Sweeting for helpful discussions on inferring K, and Eric Anderson for his extensive comments on an earlier version of the manuscript. This work was supported by National Institutes of Health grant GM19634 and by a Hitchings-Elion fellowship from Burroughs-Wellcome Fund to J.K.P., by a grant from the University of Oxford and a Wellcome Trust Fellowship (057416) to M.S., and by grants GR/M14197 and 43/MMI09788, from the Engineering and Physical Sciences Research Council and Biotechnology and Biological Sciences Research Council, respectively, to P.D. The work was initiated while the authors were resident at the Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.
Manuscript received September 23, 1999; Accepted for publication February 18, 2000.
| APPENDIX |
|---|
MCMC methods and Gibbs sampling:
MCMC methods are extremely useful for obtaining (approximate) samples from a probability distribution, $\pi(\theta)$, say, which cannot be simulated from directly [in our case $\theta = (Z, P, Q)$ and $\pi(\theta) = \Pr(Z, P, Q \mid X)$]. The idea is to construct a Markov chain $\theta^{(0)}, \theta^{(1)}, \theta^{(2)}, \ldots$ with stationary distribution $\pi(\theta)$. This is often surprisingly straightforward using standard methods devised for this purpose, such as the Metropolis-Hastings algorithm (e.g., CHIB and GREENBERG 1995). If the Markov chain $\theta^{(0)}, \theta^{(1)}, \theta^{(2)}, \ldots$ has stationary distribution $\pi(\theta)$, then $\theta^{(m)}$ will be approximately distributed as $\pi(\theta)$ provided m is sufficiently large. This can be formalized and shown to be true provided the Markov chain satisfies certain technical conditions (ergodicity) that hold for the Markov chains considered in this article. Furthermore, for sufficiently large c, $\theta^{(m)}, \theta^{(m+c)}, \theta^{(m+2c)}, \ldots$ will be reasonably independent samples from $\pi(\theta)$. The value of m used is often referred to as the burn-in period of the chain; c is often referred to as the thinning interval.
In general it is very difficult to know how large m and c should be. The values required to obtain reliable results depend heavily on the amount of correlation between successive states of the Markov chain. If successive states are relatively uncorrelated (that is, if the chain moves quickly between reasonably different values of $\theta$), then the chain is said to mix well, and relatively small values of m and c will suffice. Conversely, if the chain mixes badly (sometimes known as being sticky, as the chain will tend to get stuck moving among very similar values of $\theta$), then very large values of m and c will be required, possibly rendering the method impracticable. One strategy for investigating whether m and c are sufficiently large, and the strategy we adopt here, is to simulate several realizations of the Markov chain, each starting from a different value of $\theta^{(0)}$. If m and c are sufficiently large, then the results obtained should be independent of $\theta^{(0)}$ and should therefore be similar for the different runs. Substantial differences among the results obtained for the different runs indicate that m and c are too small. It is then necessary either to increase m and c or (if this makes the method computationally infeasible) to construct a Markov chain with better mixing properties. In the examples presented in this article we have chosen c = 1.
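The multiple-run check described above can be sketched on a toy problem. The example below is not the clustering sampler: it substitutes a random-walk Metropolis chain targeting a standard normal distribution, starts it from three dispersed values, discards a burn-in of m steps, and compares the posterior means across runs. All function names, starting values, and tuning constants are hypothetical.

```python
import math
import random


def metropolis_chain(start, n_steps, rng, step=1.0):
    """Random-walk Metropolis chain with stationary distribution N(0, 1)."""
    theta, out = start, []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(proposal) / pi(theta)).
        if math.log(1.0 - rng.random()) < 0.5 * (theta ** 2 - proposal ** 2):
            theta = proposal
        out.append(theta)
    return out


def run_means(starts, burn_in, n_keep, thin=1, seed=0):
    """Posterior mean from each run after burn-in (m) and thinning (c)."""
    means = []
    for r, start in enumerate(starts):
        rng = random.Random(seed + r)
        chain = metropolis_chain(start, burn_in + n_keep * thin, rng)
        kept = chain[burn_in::thin]
        means.append(sum(kept) / len(kept))
    return means


means = run_means(starts=[-10.0, 0.0, 10.0], burn_in=1000, n_keep=20000)
spread = max(means) - min(means)  # a large spread suggests m or c is too small
```

Rerunning with a very short burn-in (say `burn_in=0, n_keep=50`) should make the spread visibly larger, which is the symptom the text warns about.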
Gibbs sampling is a method of constructing a Markov chain with stationary distribution $\pi(\theta)$, which has proved particularly useful for clustering problems. Suppose that $\theta$ may be partitioned into $\theta = (\theta_1, \ldots, \theta_r)$, and that although it is not possible to simulate from $\pi(\theta)$ directly, it is possible to simulate a random value of $\theta_i$ directly from the full conditional distribution $\pi(\theta_i \mid \theta_1, \theta_2, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_r)$ for i = 1, 2, ... , r. Then the following algorithm may be used to simulate a Markov chain with stationary distribution $\pi(\theta)$:
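ALGORITHM A1: Starting with initial values $\theta^{(0)} = (\theta^{(0)}_1, \ldots, \theta^{(0)}_r)$, iterate the following steps for m = 1, 2, ...
- Step 1. Sample $\theta^{(m)}_1$ from $\pi(\theta_1 \mid \theta^{(m-1)}_2, \theta^{(m-1)}_3, \ldots, \theta^{(m-1)}_r)$.
- Step 2. Sample $\theta^{(m)}_2$ from $\pi(\theta_2 \mid \theta^{(m)}_1, \theta^{(m-1)}_3, \ldots, \theta^{(m-1)}_r)$.
- Step r. Sample $\theta^{(m)}_r$ from $\pi(\theta_r \mid \theta^{(m)}_1, \theta^{(m)}_2, \ldots, \theta^{(m)}_{r-1})$.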
It is easy to show that if $\theta^{(m-1)} \sim \pi(\theta)$, then $\theta^{(m)} \sim \pi(\theta)$, and so $\pi(\theta)$ is the stationary distribution of this Markov chain.
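As a concrete instance of Algorithm A1 with r = 2, consider a toy target that is not part of the clustering model: a bivariate normal with zero means, unit variances, and correlation rho. Its full conditionals are known in closed form, $\theta_1 \mid \theta_2 \sim N(\rho\theta_2, 1 - \rho^2)$ and symmetrically for $\theta_2$, so each step is a single normal draw. The function name and settings below are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, start=(0.0, 0.0), seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    t1, t2 = start
    samples = []
    for _ in range(n_iter):
        t1 = rng.gauss(rho * t2, sd)  # Step 1: uses theta_2 from the previous sweep
        t2 = rng.gauss(rho * t1, sd)  # Step 2: uses the theta_1 just drawn
        samples.append((t1, t2))
    return samples


draws = gibbs_bivariate_normal(rho=0.5, n_iter=20000)[1000:]  # burn-in m = 1000
mean1 = sum(a for a, _ in draws) / len(draws)
# Both margins have mean 0 and variance 1, so E[theta_1 * theta_2] = rho.
cross_moment = sum(a * b for a, b in draws) / len(draws)
```

After burn-in, the sample mean of the first component should be close to 0 and the cross moment close to rho, up to Monte Carlo error.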
Inference on K, the number of populations
We now provide further details regarding our approach to choosing K (see Inference for the number of populations).
The simplest way of estimating Pr(X|K) is the so-called harmonic mean estimator

$$\frac{1}{\Pr(X \mid K)} \approx \frac{1}{M} \sum_{m=1}^{M} \frac{1}{\Pr(X \mid Z^{(m)}, P^{(m)}, Q^{(m)})}$$ (A1)

This estimator is notoriously unstable, often having infinite variance, and is thus of little use in practice. One theoretically attractive alternative involves estimating Pr(P, Q|X) for some P, Q (CHIB 1995). Define the Bayesian deviance as

$$D(Z, P, Q) = -2 \log \Pr(X \mid Z, P, Q)$$ (A2)

The conditional mean and variance of D given X are easily estimated using

$$\hat{\mu} = \frac{1}{M} \sum_{m=1}^{M} D(Z^{(m)}, P^{(m)}, Q^{(m)})$$ (A3)

and

$$\hat{\sigma}^2 = \frac{1}{M} \sum_{m=1}^{M} \left( D(Z^{(m)}, P^{(m)}, Q^{(m)}) - \hat{\mu} \right)^2$$ (A4)

If we make the (admittedly dubious) assumption that the conditional distribution of D given X is normal, then it follows from (A1) that

$$-2 \log \Pr(X \mid K) \approx \hat{\mu} + \hat{\sigma}^2 / 4$$ (A5)

(Replacing the assumption of normality with the assumption of being gamma-distributed may be more asymptotically justifiable and gives similar results.) We then use this to estimate the posterior distribution of K from (11). An alternative interpretation of this method is that model selection is based on penalizing the mean of the Bayesian deviance by a quarter of its variance (cf. SPIEGELHALTER et al. 1999).
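The deviance-based estimate can be computed from the log-likelihood values recorded along the chain. The sketch below assumes that the Bayesian deviance is D = -2 log Pr(X | Z, P, Q), that its mean is penalized by a quarter of its variance as stated above, and, as a labeled assumption, that the prior on K is uniform over the candidate values when converting to posterior probabilities. The function names and input values are hypothetical.

```python
import math


def log_marginal_from_deviance(log_liks):
    """Estimate log Pr(X | K) from sampled values of log Pr(X | Z, P, Q)."""
    dev = [-2.0 * ll for ll in log_liks]          # Bayesian deviance per sample
    M = len(dev)
    mu = sum(dev) / M                             # estimated mean of D given X
    var = sum((d - mu) ** 2 for d in dev) / M     # estimated variance of D given X
    return -0.5 * (mu + var / 4.0)                # -2 log Pr(X|K) ~ mu + var / 4


def posterior_over_K(log_marginals):
    """Posterior over K from {K: log Pr(X|K)}, assuming a uniform prior on K."""
    top = max(log_marginals.values())
    w = {K: math.exp(v - top) for K, v in log_marginals.items()}
    total = sum(w.values())
    return {K: x / total for K, x in w.items()}


estimate = log_marginal_from_deviance([-100.0, -102.0, -98.0])
post_K = posterior_over_K({1: -110.0, 2: -100.0})
```

In terms of the log-likelihood itself, the estimate equals its sample mean minus half its sample variance, so chains with highly variable likelihoods are penalized.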
Details of the MCMC algorithms
Algorithm A2:
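Step 1 may be performed by simulating $p_{kl\cdot}$ independently for each (k, l), from

$$p_{kl\cdot} \mid X, Z \sim \mathcal{D}(\lambda_1 + n_{kl1}, \ldots, \lambda_{J_l} + n_{klJ_l})$$ (A6)

where

$$n_{klj} = \#\{(i, a) : x^{(i,a)}_l = j \text{ and } z^{(i)} = k\}$$ (A7)

is the number of copies of allele j at locus l observed in individuals assigned (by Z) to population k.
Step 2 may be performed by simulating $z^{(i)}$, independently for each i, from

$$\Pr(z^{(i)} = k \mid X, P) = \frac{\Pr(x^{(i)} \mid P, z^{(i)} = k)}{\sum_{k'=1}^{K} \Pr(x^{(i)} \mid P, z^{(i)} = k')}$$ (A8)

where $\Pr(x^{(i)} \mid P, z^{(i)} = k) = \prod_{l=1}^{L} p_{kl x^{(i,1)}_l}\, p_{kl x^{(i,2)}_l}$.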
Note that Equation A8 makes an implicit assumption that an equal fraction of the sample is drawn from each population. Alternatively, it might be natural to introduce an additional parameter for the fraction of the sample drawn from each population.
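The two steps above can be sketched as one sweep of a Gibbs sampler for the model without admixture. This is an illustrative sketch rather than the code in the structure package: it assumes that Equation A6 is a Dirichlet with parameters $\lambda_j + n_{klj}$, that all $\lambda_j$ equal a common value `lam`, that Equation A8 weights each population by the genotype likelihood alone, and that there are no missing genotypes. The data layout and function names are hypothetical.

```python
import math
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sweep_no_admixture(X, Z, K, n_alleles, rng, lam=1.0):
    """One sweep. X[i][l] = (allele, allele); Z[i] in 0..K-1; n_alleles[l] = J_l."""
    L = len(n_alleles)
    # Step 1: count allele copies n_klj among individuals assigned to k,
    # then draw p_kl. from Dirichlet(lam + n_kl1, ..., lam + n_klJ).
    P = []
    for k in range(K):
        P.append([])
        for l in range(L):
            counts = [0] * n_alleles[l]
            for i, geno in enumerate(X):
                if Z[i] == k:
                    for a in geno[l]:
                        counts[a] += 1
            P[k].append(dirichlet([lam + c for c in counts], rng))
    # Step 2: draw z(i) with probability proportional to Pr(x(i) | P, z(i) = k).
    new_Z = []
    for geno in X:
        log_w = [sum(math.log(P[k][l][a]) for l in range(L) for a in geno[l])
                 for k in range(K)]
        top = max(log_w)
        weights = [math.exp(v - top) for v in log_w]
        new_Z.append(rng.choices(range(K), weights=weights)[0])
    return P, new_Z


# Toy data: two groups of 10 individuals fixed for different alleles at 5 loci.
X = [[(0, 0)] * 5 for _ in range(10)] + [[(1, 1)] * 5 for _ in range(10)]
rng = random.Random(0)
Z = [rng.randrange(2) for _ in X]
for _ in range(200):
    P, Z = sweep_no_admixture(X, Z, K=2, n_alleles=[2] * 5, rng=rng)
```

On data this clearly structured the chain should settle quickly on an assignment that separates the two groups; which group receives which label is arbitrary.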
Algorithm A3:
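Step 1 may be performed by updating P and Q independently. Updating P is achieved as before, using Equation A6 but where the definition (A7) of $n_{klj}$ is modified in the obvious way to

$$n_{klj} = \#\{(i, a) : x^{(i,a)}_l = j \text{ and } z^{(i,a)}_l = k\}$$ (A9)

Updating Q involves simulating from

$$q^{(i)} \mid X, Z \sim \mathcal{D}(\alpha + m^{(i)}_1, \ldots, \alpha + m^{(i)}_K)$$ (A10)

where $m^{(i)}_k$ is the number of allele copies in individual i that originated (according to Z) in population k:

$$m^{(i)}_k = \#\{(l, a) : z^{(i,a)}_l = k\}$$ (A11)

Step 2 may be performed by simulating $z^{(i,a)}_l$, independently for each i, a, l, from

$$\Pr(z^{(i,a)}_l = k \mid X, P, Q) = \frac{q^{(i)}_k \Pr(x^{(i,a)}_l \mid P, z^{(i,a)}_l = k)}{\sum_{k'=1}^{K} q^{(i)}_{k'} \Pr(x^{(i,a)}_l \mid P, z^{(i,a)}_l = k')}$$ (A12)

where $\Pr(x^{(i,a)}_l \mid P, z^{(i,a)}_l = k) = p_{kl x^{(i,a)}_l}$.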
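Steps 1 and 2 for the admixture model can be sketched in the same style, now tracking a population of origin for every allele copy. The sketch assumes Dirichlet updates of the form $\lambda + n_{klj}$ for P and $\alpha + m^{(i)}_k$ for Q, and that each allele copy's origin is drawn with probability proportional to $q^{(i)}_k p_{klx}$; the update of $\alpha$ (Step 3) is omitted and $\alpha$ is held fixed. The data layout and function names are hypothetical, and missing genotypes are not handled.

```python
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sweep_admixture(X, Z, K, n_alleles, rng, alpha=1.0, lam=1.0):
    """One sweep. X[i][l] = (allele, allele); Z[i][l] = [pop, pop] per allele copy."""
    L, N = len(n_alleles), len(X)
    # Step 1a: allele counts by population of origin, then draw P.
    counts = [[[0] * n_alleles[l] for l in range(L)] for _ in range(K)]
    for i in range(N):
        for l in range(L):
            for a in range(2):
                counts[Z[i][l][a]][l][X[i][l][a]] += 1
    P = [[dirichlet([lam + c for c in counts[k][l]], rng) for l in range(L)]
         for k in range(K)]
    # Step 1b: m_k(i) = allele copies of individual i assigned to k, then draw q(i).
    Q = []
    for i in range(N):
        m = [0] * K
        for l in range(L):
            for a in range(2):
                m[Z[i][l][a]] += 1
        Q.append(dirichlet([alpha + c for c in m], rng))
    # Step 2: redraw the origin of each allele copy, weight q_k * p_klx.
    new_Z = [[[rng.choices(range(K),
                           weights=[Q[i][k] * P[k][l][X[i][l][a]] for k in range(K)])[0]
               for a in range(2)]
              for l in range(L)]
             for i in range(N)]
    return P, Q, new_Z


X = [[(0, 0)] * 5 for _ in range(10)] + [[(1, 1)] * 5 for _ in range(10)]
rng = random.Random(0)
K = 2
Z = [[[rng.randrange(K) for _ in range(2)] for _ in range(5)] for _ in X]
for _ in range(20):
    P, Q, Z = sweep_admixture(X, Z, K, n_alleles=[2] * 5, rng=rng)
```

Each row of `Q` is one individual's estimated ancestry vector for the current sweep; averaging these rows over sweeps after burn-in would give a posterior mean estimate of q(i).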
Step 3 may be performed by simulating a proposal $\alpha'$ from a normal distribution with mean $\alpha$ and some variance $\sigma^2_\alpha$. The proposal is automatically rejected if $\alpha' \le 0$, and otherwise it is accepted with the appropriate Metropolis-Hastings probability.
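A sketch of such an update is given below. It assumes that each q(i) has a symmetric Dirichlet prior with parameter $\alpha$, and, as a further labeled assumption not stated in this section, that $\alpha$ has a uniform prior on (0, `alpha_max`]. Because the normal proposal is symmetric, the acceptance ratio reduces to the ratio of the Dirichlet densities of Q at the proposed and current values. The function names, proposal standard deviation, and bound are hypothetical.

```python
import math
import random


def log_dirichlet_sym(q, alpha):
    """Log density of a symmetric Dirichlet(alpha, ..., alpha) at q."""
    K = len(q)
    return (math.lgamma(K * alpha) - K * math.lgamma(alpha)
            + (alpha - 1.0) * sum(math.log(x) for x in q))


def update_alpha(alpha, Q, rng, sd=0.05, alpha_max=10.0):
    """One Metropolis-Hastings update of alpha given the current Q."""
    proposal = rng.gauss(alpha, sd)
    if proposal <= 0.0 or proposal > alpha_max:
        return alpha  # outside the assumed prior support: rejected automatically
    log_ratio = sum(log_dirichlet_sym(q, proposal) - log_dirichlet_sym(q, alpha)
                    for q in Q)
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal
    return alpha


# Toy example: 50 individuals with almost no admixture should favor small alpha.
rng = random.Random(3)
Q = [[0.99, 0.01]] * 50
alpha = 1.0
for _ in range(5000):
    alpha = update_alpha(alpha, Q, rng)
```

In a full sampler this update would be interleaved with the draws of P, Q, and Z rather than run against a fixed Q as in the toy loop.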
Model with correlated allele frequencies
For very closely related populations it is natural to assume that allele frequencies are correlated across populations. For completeness, we describe a model that is implemented in the program structure, allowing allele-frequency correlations.
Recall that we model allele frequencies by $p_{kl\cdot} \sim \mathcal{D}(\lambda_1, \lambda_2, \ldots, \lambda_{J_l})$. For all the results presented in this article, we took $\lambda_1 = \lambda_2 = \cdots = \lambda_{J_l} = 1.0$, which gives a uniform distribution on allele frequencies, where $J_l$ is the number of alleles at locus l. To model closely related populations, we consider an alternative model, where

$$p_{kl\cdot} \sim \mathcal{D}\left(f^{(l)} J_l \mu^{(l)}_1, f^{(l)} J_l \mu^{(l)}_2, \ldots, f^{(l)} J_l \mu^{(l)}_{J_l}\right)$$ (A13)

Here, $\mu^{(l)}_i$ is the mean sample frequency of allele i at locus l, and $f^{(l)} > 0$ determines the strength of the correlations across populations at locus l. When $f^{(l)}$ is large, the allele frequencies in all populations tend to be similar to the mean allele frequencies in the sample. In our implementation of this model, we placed a gamma prior on each $f^{(l)}$ and used a Metropolis-Hastings update step. The proposal $f^{(l)\prime}$ was chosen from a normal with mean $f^{(l)}$ and some variance $\sigma^2_f$. It was automatically rejected if $f^{(l)\prime} \le 0$.
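The effect of f on the prior can be illustrated numerically. The sketch below assumes that the prior for the allele frequencies at a locus is Dirichlet with parameters $f J \mu_1, \ldots, f J \mu_J$, so that f = 1 with equal mean frequencies recovers the uniform prior; this parameterization is an assumption of the example, and the function names are hypothetical. All mean frequencies must be positive.

```python
import random


def correlated_dirichlet_params(mu, f):
    """Dirichlet parameters f * J * mu_j for one locus with J alleles."""
    J = len(mu)
    return [f * J * m for m in mu]


def draw_frequencies(mu, f, rng):
    """Draw one population's allele frequencies at a locus under this prior."""
    gammas = [rng.gammavariate(a, 1.0) for a in correlated_dirichlet_params(mu, f)]
    total = sum(gammas)
    return [g / total for g in gammas]


def mean_abs_deviation(mu, f, rng, n_draws=200):
    """Average distance between drawn frequencies and the sample means."""
    dev = 0.0
    for _ in range(n_draws):
        p = draw_frequencies(mu, f, rng)
        dev += sum(abs(a - b) for a, b in zip(p, mu)) / len(mu)
    return dev / n_draws


rng = random.Random(7)
mu = [0.25, 0.25, 0.25, 0.25]
loose = mean_abs_deviation(mu, f=1.0, rng=rng)    # weak correlation
tight = mean_abs_deviation(mu, f=100.0, rng=rng)  # strong correlation
```

Under this parameterization, larger f concentrates each population's frequencies around the sample means, which is the behavior described above for closely related populations.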
There are several possible alternative models to considering a factor f for each locus. One would be to consider a factor for each population, and another would be to give each type of locus (e.g., SNPs and dinucleotide and trinucleotide repeats) a shared value of f.
| LITERATURE CITED |
|---|
BALDING, D. J. and R. A. NICHOLS, 1994 DNA profile match probability calculations: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci. Int. 64:125-140.
BALDING, D. J. and R. A. NICHOLS, 1995 A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96:3-12.
BOWCOCK, A. M., A. RUIZ-LINARES, J. TOMFOHRDE, E. MINCH, and J. KIDD et al., 1994 High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455-457.
CAVALLI-SFORZA, L. L., P. MENOZZI and A. PIAZZA, 1994 The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.
CHIB, S., 1995 Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90:1313-1321.
CHIB, S. and E. GREENBERG, 1995 Understanding the Metropolis-Hastings algorithm. Am. Stat. 49:327-335.
DAVIES, N., F. X. VILLABLANCA, and G. K. RODERICK, 1999 Determining the source of individuals: multilocus genotyping in nonequilibrium population genetics. TREE 14:17-21.
DICICCIO, T., R. KASS, A. RAFTERY, and L. WASSERMAN, 1997 Computing Bayes factors by posterior simulation and asymptotic approximations. J. Am. Stat. Assoc. 92:903-915.
EWENS, W. J. and R. S. SPIELMAN, 1995 The transmission/disequilibrium test: history, subdivision, and admixture. Am. J. Hum. Genet. 57:455-464.
FELSENSTEIN, J., 1993 PHYLIP (phylogeny inference package) version 3.5c. Technical report, Department of Genetics, University of Washington, Seattle.
FOREMAN, L., A. SMITH, and I. EVETT, 1997 Bayesian analysis of DNA profiling data in forensic identification applications. J. R. Stat. Soc. A 160:429-469.
GALBUSERA, P., L. LENS, E. WAIYAKI, T. SCHENCK, and E. MATTYSEN, 2000 Effective population size and gene flow in the globally, critically endangered Taita thrush, Turdus helleri. Conserv. Genet. in press.
GILKS, W. R., S. RICHARDSON and D. J. SPIEGELHALTER, 1996a Introducing Markov chain Monte Carlo, pp. 1-19 in Markov Chain Monte Carlo in Practice, edited by W. R. GILKS, S. RICHARDSON and D. J. SPIEGELHALTER. Chapman & Hall, London.
GILKS, W. R., S. RICHARDSON and D. J. SPIEGELHALTER (Editors), 1996b Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
GOLDSTEIN, D. B. and D. POLLOCK, 1997 Launching microsatellites: a review of mutation processes and methods of phylogenetic inference. J. Hered. 88:335-342.
GREEN, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732.
HUDSON, R. R., 1990 Gene genealogies and the coalescent process, pp. 1-44 in Oxford Surveys in Evolutionary Biology, Vol. 7, edited by D. FUTUYMA and J. ANTONOVICS. Oxford University Press, Oxford.
JORDE, L. B., M. J. BAMSHAD, W. S. WATKINS, R. ZENGER, and A. E. FRALEY et al., 1995 Origins and affinities of modern humans: a comparison of mitochondrial and nuclear genetic data. Am. J. Hum. Genet. 57:523-538.
MOUNTAIN, J. L. and L. L. CAVALLI-SFORZA, 1997 Multilocus genotypes, a tree of individuals, and human evolutionary history. Am. J. Hum. Genet. 61:705-718.
PAETKAU, D., W. CALVERT, I. STIRLING, and C. STROBECK, 1995 Microsatellite analysis of population structure in Canadian polar bears. Mol. Ecol. 4:347-354.
PARRA, E. J., A. MARCINI, J. AKEY, J. MARTINSON, and M. A. BATZER et al., 1998 Estimating African American admixture proportions by use of population-specific alleles. Am. J. Hum. Genet. 63:1839-1851.
PRITCHARD, J. K. and N. A. ROSENBERG, 1999 Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65:220-228.
RAFTERY, A. E., 1996 Hypothesis testing and model selection, pp. 163-188 in Markov Chain Monte Carlo in Practice, edited by W. R. GILKS, S. RICHARDSON and D. J. SPIEGELHALTER. Chapman & Hall, London.
RANNALA, B. and J. L. MOUNTAIN, 1997 Detecting immigration by using multilocus genotypes. Proc. Natl. Acad. Sci. USA 94:9197-9201.
RICHARDSON, S. and P. J. GREEN, 1997 On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. Ser. B 59:731-792.
ROEDER, K., M. ESCOBAR, J. B. KADANE, and I. BALAZS, 1998 Measuring heterogeneity in forensic databases using hierarchical Bayes models. Biometrika 85:269-287.
SMOUSE, P. E., R. S. WAPLES, and J. A. TWOREK, 1990 A genetic mixture analysis for use with incomplete source population-data. Can. J. Fish. Aquat. Sci. 47:620-634.
SPIEGELHALTER, D. J., N. G. BEST and B. P. CARLIN, 1999 Bayesian deviance, the effective number of parameters, and the comparison of arbitrarily complex models. Available from http://www.mrc-bsu.cam.ac.uk/publications/preslid.shtml.
STEPHENS, M., 2000a Bayesian analysis of mixtures with an unknown number of components - an alternative to reversible jump methods. Ann. Stat. in press.
STEPHENS, M., 2000b Dealing with label-switching in mixture models. J. R. Stat. Soc. Ser. B in press.
||||
![]() |
L. G. Carvajal-Carmona, S. Spain, The CORGI Consortium, D. Kerr, R. Houlston, J.-B. Cazier, and I. Tomlinson Common variation at the adiponectin locus is not associated with colorectal cancer risk in the UK Hum. Mol. Genet., May 15, 2009; 18(10): 1889 - 1892. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Tavaud-Pirra, P. Sartre, R. Nelson, S. Santoni, N. Texier, and P. Roumet Genetic Diversity in a Soybean Collection Crop Sci., May 11, 2009; 49(3): 895 - 902. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Peccoud, A. Ollivier, M. Plantegenest, and J.-C. Simon A continuum of genetic divergence from sympatric host races to species in the pea aphid complex PNAS, May 5, 2009; 106(18): 7495 - 7500. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. M. Weiss and J. C. Long Non-Darwinian estimation: My ancestors, my genes' ancestors Genome Res., May 1, 2009; 19(5): 703 - 710. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. M. Rezende, E. Tarazona-Santos, A. D A. Couto, C. J. F. Fontes, J. M. De Souza, L. H. Carvalho, and C. F. A. Brito Analysis of Genetic Variability of Plasmodium vivax Isolates from Different Brazilian Amazon Areas Using Tandem Repeats Am J Trop Med Hyg, May 1, 2009; 80(5): 729 - 733. [Abstract] [Full Text] [PDF] |
||||
![]() |
S.-F. Lei, L.-J. Tan, X.-G. Liu, L. Wang, H. Yan, Y.-F. Guo, Y.-Z. Liu, D.-H. Xiong, J. Li, T.-L. Yang, et al. Genome-wide association study identifies two novel loci containing FLNB and SBF2 genes underlying stature variation Hum. Mol. Genet., May 1, 2009; 18(9): 1661 - 1669. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Auton, K. Bryc, A. R. Boyko, K. E. Lohmueller, J. Novembre, A. Reynolds, A. Indap, M. H. Wright, J. D. Degenhardt, R. N. Gutenkunst, et al. Global distribution of genomic diversity underscores rich complex history of continental human populations Genome Res., May 1, 2009; 19(5): 795 - 803. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. P. McEvoy, G. W. Montgomery, A. F. McRae, S. Ripatti, M. Perola, T. D. Spector, L. Cherkas, K. R. Ahmadi, D. Boomsma, G. Willemsen, et al. Geographical structure and differential natural selection among North European populations Genome Res., May 1, 2009; 19(5): 804 - 814. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Nielsen, M. J. Hubisz, I. Hellmann, D. Torgerson, A. M. Andres, A. Albrechtsen, R. Gutenkunst, M. D. Adams, M. Cargill, A. Boyko, et al. Darwinian and demographic forces affecting human protein coding genes Genome Res., May 1, 2009; 19(5): 838 - 849. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Kyndt, A. E. Assogbadjo, O. J. Hardy, R. Glele Kakai, B. Sinsin, P. Van Damme, and G. Gheysen Spatial genetic structuring of baobab (Adansonia digitata, Malvaceae) in the traditional agroforestry systems of West Africa Am. J. Botany, May 1, 2009; 96(5): 950 - 957. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Wang, L. Zhang, and M. Matz Microsatellite Characterization and Marker Development from Public EST and WGS Databases in the Reef-Building Coral Acropora millepora (Cnidaria, Anthozoa, Scleractinia) J. Hered., May 1, 2009; 100(3): 329 - 337. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Manicacci, L. Camus-Kulandaivelu, M. Fourmann, C. Arar, S. Barrault, A. Rousselet, N. Feminias, L. Consoli, L. Frances, V. Mechin, et al. Epistatic Interactions between Opaque2 Transcriptional Activator and Its Target Gene CyPPDK1 Control Kernel Trait Variation in Maize Plant Physiology, May 1, 2009; 150(1): 506 - 520. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. E. Wood and T. Nakazato Investigating species boundaries in the Giliopsis group of Ipomopsis (Polemoniaceae): Strong discordance among molecular and morphological markers Am. J. Botany, April 1, 2009; 96(4): 853 - 861. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. V. Breton, H. Vora, M. T. Salam, T. Islam, M. Wenten, W. J. Gauderman, D. Van Den Berg, K. Berhane, J. M. Peters, and F. D. Gilliland Variation in the GST mu Locus and Tobacco Smoke Exposure as Determinants of Childhood Lung Function Am. J. Respir. Crit. Care Med., April 1, 2009; 179(7): 601 - 607. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Yan, H.-J. Chu, H.-C. Wang, J.-Q. Li, and T. Sang Population genetic structure of two Medicago species shaped by distinct life form, mating system and seed dispersal Ann. Bot., April 1, 2009; 103(6): 825 - 834. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. G. Byars, Y. Parsons, and A. A. Hoffmann Effect of altitude on the genetic structure of an Alpine grass, Poa hiemata Ann. Bot., April 1, 2009; 103(6): 885 - 899. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Ross-Ibarra, M. Tenaillon, and B. S. Gaut Historical Divergence and Gene Flow in the Genus Zea Genetics, April 1, 2009; 181(4): 1399 - 1413. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. G. McCracken, M. Bulgarella, K. P. Johnson, M. K. Kuhner, J. Trucco, T. H. Valqui, R. E. Wilson, and J. L. Peters Gene Flow in the Face of Countervailing Selection: Adaptation to High-Altitude Hypoxia in the {beta}A Hemoglobin Subunit of Yellow-Billed Pintails in the Andes Mol. Biol. Evol., April 1, 2009; 26(4): 815 - 827. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. P. Foxe, T. Slotte, E. A. Stahl, B. Neuffer, H. Hurka, and S. I. Wright Recent speciation associated with the evolution of selfing in Capsella PNAS, March 31, 2009; 106(13): 5241 - 5245. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Brede, C. Sandrock, D. Straile, P. Spaak, T. Jankowski, B. Streit, and K. Schwenk The impact of human-made ecological changes on the genetic architecture of Daphnia species PNAS, March 24, 2009; 106(12): 4758 - 4763. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. H. Sneller, D. E. Mather, and S. Crepieux Analytical Approaches and Population Types for Finding and Utilizing QTL in Complex Plant Populations Crop Sci., March 17, 2009; 49(2): 363 - 380. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Kwak, J. A. Kami, and P. Gepts The Putative Mesoamerican Domestication Center of Phaseolus vulgaris Is Located in the Lerma-Santiago Basin of Mexico Crop Sci., March 17, 2009; 49(2): 554 - 563. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. S. Johnson, F. L. Bekele, S. J. Brown, Q. Song, D. Zhang, L. W. Meinhardt, and R. J. Schnell Population Structure and Genetic Diversity of the Trinitario Cacao (Theobroma cacao L.) from Trinidad and Tobago Crop Sci., March 17, 2009; 49(2): 564 - 572. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Han, X. Kim-Howard, H. Deshmukh, Y. Kamatani, P. Viswanathan, J. M. Guthridge, K. Thomas, K. M. Kaufman, J. Ojwang, A. Rojas-Villarraga, et al. Evaluation of imputation-based association in and around the integrin-{alpha}-M (ITGAM) gene and replication of robust association between a non-synonymous functional variant within ITGAM and systemic lupus erythematosus (SLE) Hum. Mol. Genet., March 15, 2009; 18(6): 1171 - 1180. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. W. Breinholt, R. Van Buren, O. R. Kopp, and C. L. Stephen Population genetic structure of an endangered Utah endemic, Astragalus ampullarioides (Fabaceae) Am. J. Botany, March 1, 2009; 96(3): 661 - 667. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. M. Richards, G. M. Volk, P. A. Reeves, A. A. Reilley, A. D. Henk, P. L. Forsline, and H. S. Aldwinckle Selection of Stratified Core Sets Representing Wild Apple (Malus sieversii) J. Amer. Soc. Hort. Sci., March 1, 2009; 134(2): 228 - 235. [Abstract] [Full Text] [PDF] |
||||
![]() |
J.-L. Jannink, H. Iwata, P. R. Bhat, S. Chao, P. Wenzl, and G. J. Muehlbauer Marker Imputation in Barley Association Studies The Plant Genome, March 1, 2009; 2(1): 11 - 22. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. C. Murray, W. L. Rooney, M. T. Hamblin, S. E. Mitchell, and S. Kresovich Sweet Sorghum Genetic Diversity and Association Mapping for Brix and Height The Plant Genome, March 1, 2009; 2(1): 48 - 62. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Yu, Z. Zhang, C. Zhu, D. A. Tabanao, G. Pressoir, M. R. Tuinstra, S. Kresovich, R. J. Todhunter, and E. S. Buckler Simulation Appraisal of the Adequacy of Number of Background Markers for Relationship Estimation in Association Mapping The Plant Genome, March 1, 2009; 2(1): 63 - 77. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. E. Janes, T. Ezaz, J. A. Marshall Graves, and S. V. Edwards Recombination and Nucleotide Diversity in the Sex Chromosomal Pseudoautosomal Region of the Emu, Dromaius novaehollandiae J. Hered., March 1, 2009; 100(2): 125 - 136. [Abstract] [Full Text] [PDF] |
||||
![]() |
I. Simko Development of EST-SSR Markers for the Study of Population Structure in Lettuce (Lactuca sativa L.) J. Hered., March 1, 2009; 100(2): 256 - 262. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. I. Schmidt, K. J. Hundertmark, R. T. Bowyer, and K. G. McCracken Population Structure and Genetic Diversity of Moose in Alaska J. Hered., March 1, 2009; 100(2): 170 - 180. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Bonhomme, S. Cuartero, A. Blancher, and B. Crouau-roy Assessing Natural Introgression in 2 Biomedical Model Species, the Rhesus Macaque (Macaca mulatta) and the Long-Tailed Macaque (Macaca fascicularis) J. Hered., March 1, 2009; 100(2): 158 - 169. [Abstract] [Full Text] [PDF] |
||||
![]() |
B.-H. Song, A. J. Windsor, K. J. Schmid, S. Ramos-Onsins, M. E. Schranz, A. J. Heidel, and T. Mitchell-Olds Multilocus Patterns of Nucleotide Diversity, Population Structure and Linkage Disequilibrium in Boechera stricta, a Wild Relative of Arabidopsis Genetics, March 1, 2009; 181(3): 1021 - 1033. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Pajerowska-Mukhtar, B. Stich, U. Achenbach, A. Ballvora, J. Lubeck, J. Strahwald, E. Tacke, H.-R. Hofferbert, E. Ilarionova, D. Bellin, et al. Single Nucleotide Polymorphisms in the Allene Oxide Synthase 2 Gene Are Associated With Field Resistance to Late Blight in Populations of Tetraploid Potato Cultivars Genetics, March 1, 2009; 181(3): 1115 - 1127. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. L. Caicedo, C. Richards, I. M. Ehrenreich, and M. D. Purugganan Complex Rearrangements Lead to Novel Chimeric Gene Fusion Polymorphisms at the Arabidopsis thaliana MAF2-5 Flowering Time Gene Cluster Mol. Biol. Evol., March 1, 2009; 26(3): 699 - 711. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Wheeldon and B. N White Genetic analysis of historic western Great Lakes region wolf samples reveals early Canis lupus/lycaon hybridization Biol Lett, February 23, 2009; 5(1): 101 - 104. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Bui Thi Ngoc, C. Verniere, P. Jarne, S. Brisse, F. Guerin, S. Boutry, L. Gagnevin, and O. Pruvost From Local Surveys to Global Surveillance: Three High-Throughput Genotyping Methods for Epidemiological Monitoring of Xanthomonas citri pv. citri Pathotypes Appl. Envir. Microbiol., February 15, 2009; 75(4): 1173 - 1184. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. M. Gatti, A. A. Shabalin, T.-C. Lam, F. A. Wright, I. Rusyn, and A. B. Nobel FastMap: Fast eQTL mapping in homozygous populations Bioinformatics, February 15, 2009; 25(4): 482 - 489. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Li and D. V. Conti Detecting Gene-Environment Interactions Using a Combined Case-Only and Case-Control Approach Am. J. Epidemiol., February 15, 2009; 169(4): 497 - 504. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. K. MacLeod, D. C. M. Liewald, M. M. McGilchrist, A. D. Morris, S. M. Kerr, and D. J. Porteous Some principles and practices of genetic biobanking studies Eur. Respir. J., February 1, 2009; 33(2): 419 - 425. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. J. Yee, K. Im, A. S. Wahed, T. Bugawan, J. Li, S. L. Rhodes, H. Erlich, H. R. Rosen, T. J. Liang, H. Yang, et al. Polymorphism in the Human Major Histocompatibility Complex and Early Viral Decline during Treatment of Chronic Hepatitis C Antimicrob. Agents Chemother., February 1, 2009; 53(2): 615 - 621. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. L. Simpson, R. Lemmens, K. Miskiewicz, W. J. Broom, V. K. Hansen, P. W.J. van Vught, J. E. Landers, P. Sapp, L. Van Den Bosch, J. Knight, et al. Variants of the elongator protein 3 (ELP3) gene are associated with motor neuron degeneration Hum. Mol. Genet., February 1, 2009; 18(3): 472 - 481. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. G. Holliday, D. R. Nyholt, S. Tirupati, S. John, P. Ramachandran, M. Ramamurti, A. J. Ramadoss, A. Jeyagurunathan, S. Kottiswaran, H. J. Smith, et al. Strong Evidence for a Novel Schizophrenia Risk Locus on Chromosome 1p31.1 in Homogeneous Pedigrees From Tamil Nadu, India Am J Psychiatry, February 1, 2009; 166(2): 206 - 215. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. S. Anand, C. Xie, G. Pare, A. Montpetit, S. Rangarajan, M. J. McQueen, H. J. Cordell, B. Keavney, S. Yusuf, T. J. Hudson, et al. Genetic Variants Associated With Myocardial Infarction Risk Factors in Over 8000 Individuals From Five Ethnic Groups: The INTERHEART Genetics Study Circ Cardiovasc Genet, February 1, 2009; 2(1): 16 - 25. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. C. Lee, N.-c. Y. You, Y. Song, Y.-H. Hsu, J. Manson, L. Nathan, L. Tinker, and S. Liu Relation of Genetic Variation in the Gene Coding for C-Reactive Protein with Its Plasma Protein Concentrations: Findings from the Women's Health Initiative Observational Cohort Clin. Chem., February 1, 2009; 55(2): 351 - 360. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. N. Balakrishnan and S. V. Edwards Nucleotide Variation, Linkage Disequilibrium and Founder-Facilitated Speciation in Wild Populations of the Zebra Finch (Taeniopygia guttata) Genetics, February 1, 2009; 181(2): 645 - 660. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. D. Fallin and A. Matteini Genetic Epidemiology in Aging Research J Gerontol A Biol Sci Med Sci, January 23, 2009; (2009) gln021v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Chen, P. L. Morrell, V. E. T. M. Ashworth, M. de la Cruz, and M. T. Clegg Tracing the Geographic Origins of Major Avocado Cultivars J. Hered., January 1, 2009; 100(1): 56 - 65. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. K. Schultz, J. D. Baker, R. J. Toonen, and B. W. Bowen Extremely Low Genetic Diversity in the Endangered Hawaiian Monk Seal (Monachus schauinslandi) J. Hered., January 1, 2009; 100(1): 25 - 33. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. A. Gonzalez-Perez, P. A. Sosa, E. Rivero, E. A. Gonzalez-Gonzalez, and A. Naranjo Molecular markers reveal no genetic differentiation between Myrica rivas-martinezii and M. faya (Myricaceae) Ann. Bot., January 1, 2009; 103(1): 79 - 86. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Barbara, G. Martinelli, C. Palma-Silva, M. F. Fay, S. Mayo, and C. Lexer Genetic relationships and variation in reproductive strategies in four closely related bromeliads adapted to neotropical 'inselbergs': Alcantarea glaziouana, A. regina, A. geniculata and A. imperialis (Bromeliaceae) Ann. Bot., January 1, 2009; 103(1): 65 - 77. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Budel, T. Padukkavidana, B. P. Liu, Z. Feng, F. Hu, S. Johnson, J. Lauren, J. H. Park, A. W. McGee, J. Liao, et al. Genetic Variants of Nogo-66 Receptor with Possible Association to Schizophrenia Block Myelin Inhibition of Axon Growth J. Neurosci., December 3, 2008; 28(49): 13161 - 13172. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. L. Stevens, L. J. Bierut, J. T. Talbot, J. C. Wang, J. Sun, A. L. Hinrichs, M. J. Thun, A. Goate, and E. E. Calle Nicotinic Receptor Gene Variants Influence Susceptibility to Heavy Smoking Cancer Epidemiol. Biomarkers Prev., December 1, 2008; 17(12): 3517 - 3525. [Abstract] [Full Text] [PDF] |
||||