Genetics, Vol. 158, 1811-1823, August 2001, Copyright © 2001

Interpretation of Variation Across Marker Loci as Evidence of Selection

Renaud Vitalis (a,b,c), Kevin Dawson (a,d), and Pierre Boursot (a)
(a) Laboratoire Génome, Populations et Interactions, Université Montpellier II, 34095 Montpellier Cedex 05, France
(b) Laboratoire Génétique et Environnement, Institut des Sciences de l'Évolution de Montpellier, Université Montpellier II, 34095 Montpellier Cedex 05, France
(c) Station Biologique de la Tour du Valat, 13200 Arles, France
(d) I.A.C.R. Long Ashton Research Station, Department of Agricultural Science, University of Bristol, Bristol BS41 9AF, United Kingdom

Corresponding author: Renaud Vitalis, Laboratoire Génétique et Environnement, C.C. 065, Institut des Sciences de l'Évolution de Montpellier, Université Montpellier II, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France. E-mail: vitalis@isem.univ-montp2.fr

Communicating editor: J. HEY


*  ABSTRACT

Population structure and history have similar effects on the genetic diversity at all neutral loci. However, some marker loci may also have been strongly influenced by natural selection. Selection shapes genetic diversity in a locus-specific manner. If we could identify those loci that have responded to selection during the divergence of populations, then we may obtain better estimates of the parameters of population history by excluding these loci. Previous attempts were made to identify outlier loci from the distribution of sample statistics under neutral models of population structure and history. Unfortunately these methods depend on assumptions about population structure and history that usually cannot be verified. In this article, we define new population-specific parameters of population divergence and construct sample statistics that are estimators of these parameters. We then use the joint distribution of these estimators to identify outlier loci that may be subject to selection. We found that outlier loci are easier to recognize when this joint distribution is conditioned on the total number of allelic states represented in the pooled sample at each locus. This is so because the conditional distribution is less sensitive to the values of nuisance parameters.


PRESUMED neutral polymorphic loci are commonly used in making inferences about patterns of differentiation within or among populations of the same or closely related species. For this purpose, genetic distances (see, e.g., NEI 1972) or WRIGHT's (1951) F-statistics are estimated from allele-frequency data. Under particular models of population structure, these parameters are related to demographic or historical parameters, such as the effective population size, the rate of migration between populations, or the time since the populations diverged from their common ancestral population.

However, misinterpretations can occur if one cannot clearly distinguish the patterns generated by random genetic drift from those generated by natural selection. The problem is that selective processes can also affect neutral loci. A locus that is neutral will respond to selection whenever it is in linkage disequilibrium (statistical association among allelic states at different loci) with other loci that are subject to selection. Such associations may arise by chance in small populations (HILL and ROBERTSON 1966, 1968; OHTA and KIMURA 1969). For example, stabilizing or balancing selection operating at a locus tends to maintain an elevated level of variation at closely linked neutral loci (STROBECK 1983; HUDSON and KAPLAN 1988). Selection acting on any locus has an effect on loosely linked loci, which resembles a reduction of effective population size (ROBERTSON 1961; BARTON 1995, 1998). Local adaptation tends to increase population differentiation at loci where selection acts, and very high $F_{ST}$ values may be found at closely linked neutral loci (CHARLESWORTH et al. 1997). The substitution of advantageous mutations at a locus may also reduce neutral variation at linked loci (MAYNARD SMITH and HAIGH 1974; KAPLAN et al. 1989; BARTON 1995). Similarly, "background selection," caused by selection against deleterious mutations (CHARLESWORTH et al. 1993; BARTON 1995), results in a reduced effective population size for neutral genes in the region of the chromosome where this selection is acting. Background selection may also increase the apparent population differentiation (CHARLESWORTH et al. 1997).

Therefore, it is of prime interest to identify loci that are responding to selection to exclude them from the genetic analysis of population structure or history. It was recognized early on by CAVALLI-SFORZA (1966) that any form of selection will affect some regions of the genome more than others, whereas population history, demography, migration, and the mating system will affect the whole genome in the same way. Accordingly, LEWONTIN and KRAKAUER (1973) proposed two tests of selective neutrality. Both tests are based on the sampling distribution of a statistic $\hat{F}$, the standardized variance of gene frequency, which is an estimator of the parameter $F_{ST}$. Their first test is a goodness-of-fit test comparing the observed distribution of $\hat{F}$ estimates (one estimate from each locus) to a $\chi^2$ distribution with (n - 1) d.f., where n is the number of populations sampled. The second test is based on the comparison of the observed variance of $\hat{F}$ (across loci), denoted $s_F^2$, with the theoretical variance approximated as

$$\sigma^2 = \frac{k \bar{F}^2}{n - 1}, \tag{1}$$

where $\bar{F}$ is the mean value of $\hat{F}$ averaged across loci, and k is a constant that, according to LEWONTIN and KRAKAUER (1973), should not exceed 2 whatever the underlying distribution of allelic frequency. The ratio $s_F^2/\sigma^2$ should be distributed approximately as a $\chi^2$/d.f., the number of degrees of freedom being determined by the number of biallelic loci.
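The second Lewontin-Krakauer comparison can be sketched in a few lines of Python. This is only an illustration of Equation 1 under stated assumptions: the function name `lewontin_krakauer_ratio` and the input values are hypothetical, and the observed variance is taken as the usual sample variance across loci.

```python
from statistics import mean, variance

def lewontin_krakauer_ratio(f_values, n_pops, k=2.0):
    """Ratio of the observed variance of per-locus F estimates to the
    theoretical variance k * F_bar**2 / (n - 1) of Equation 1."""
    f_bar = mean(f_values)            # mean estimate across loci
    s2_f = variance(f_values)         # observed (sample) variance across loci
    sigma2 = k * f_bar ** 2 / (n_pops - 1)
    return s2_f / sigma2

# Hypothetical per-locus estimates for n = 3 sampled populations.
ratio = lewontin_krakauer_ratio([0.1, 0.2, 0.3], n_pops=3)
```

A ratio well above 1 would point to more heterogeneity among loci than the approximation allows, which is the situation the test was designed to flag.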

However, since populations of the same species share, to a certain extent, a common history and since populations are connected through the dispersal of individuals, $\hat{F}$ values will be correlated across loci. For example, the geographic and historical relationships between populations may have a hierarchical structure if populations have been derived from a common ancestral population by a sequence of successive splits. This is the pattern to be expected following the fragmentation of a species range. The effect of such a population history is always to increase the expected variance of $\hat{F}$ (ROBERTSON 1975a,b). Moreover, even simple models of divergence by drift (NEI and CHAKRAVARTI 1977), island models (NEI et al. 1977), or stepping-stone models of dispersal (NEI and MARUYAMA 1975) inflate the expected variance, making LEWONTIN and KRAKAUER's (1973) test unreliable in most cases (LEWONTIN and KRAKAUER 1975).

More recently, BOWCOCK et al. (1991) studied the worldwide human genetic differentiation based on DNA polymorphism. Simulating a reasonably well supported evolutionary scenario of divergence, they evaluated the theoretical distribution of $F_{ST}$ conditional on initial gene frequencies. Among 100 nuclear RFLP markers a number of genes exhibited lower or, more often, higher variation than expected under neutrality. In an important article, BEAUMONT and NICHOLS (1996) proposed a method based on the analysis of the expected distribution of $F_{ST}$ conditional on heterozygosity rather than allele frequency. The conditional distribution, constructed under an island model of population structure, is remarkably robust to a wide range of alternative models (colonization, stepping-stone). Interestingly, departures from equilibrium do not alter the expected distribution much whenever $F_{ST}$ is < 0.5. Yet, unequal numbers of immigrants per generation over the whole population generated some discrepancies with the symmetric island model for heterozygosities in the range [0.1, 0.5] (see Figure 3d in BEAUMONT and NICHOLS 1996).

Thus, their approach might be flawed whenever the true population history consists of repeated branching events or when the connectivity of populations is uneven. However, we cannot infer patterns of migration or historical branching and test for the homogeneity of the markers with the same data. This is what FELSENSTEIN (1982) described as the "infinitely many parameters" problem. A solution to this problem is to restrict attention to simple but realistic scenarios that may apply to any pair of populations (ROBERTSON 1975b; TSAKAS and KRIMBAS 1976). This reduces the number of parameters in the model. Here, we develop a model of population divergence. We define population-specific parameters as functions of probabilities of identity for pairs of genes taken within or among populations. These parameters are simply related to the ratio of divergence time over effective population size. We construct simple estimators of these population-specific parameters. We then examine the expected joint distribution of these estimators under a wide range of neutral scenarios of divergence. This suggests a new method to assess the homogeneity of response of genetic markers to the historical processes, using empirical data. Finally, we apply our new method to a data set of allozyme loci from Drosophila simulans populations and compare our results to those obtained by using BEAUMONT and NICHOLS' (1996) method.


*  THE MODEL

We consider two haploid populations of constant sizes $N_1$ and $N_2$, which completely separated $\tau$ generations ago from a single population of stationary size $N_0$. By complete separation, we mean that the populations did not exchange any migrants between the time of the split and the present. We do not assume that the common ancestral population was at equilibrium when it split. Instead, we allow the ancestral population to have gone through a bottleneck $\tau_0$ generations before present (with $\tau_0 > \tau$). Before this, the ancestral population was at mutation-drift equilibrium, with constant size $N_e$. Generations do not overlap. New mutations arise at a rate $\mu$ and follow the infinite allele model (IAM). This model of population divergence is illustrated in Fig 1.



Figure 1. A gene genealogy under our model for n = 10 genes sampled in each population. In this example, the parameter values are $N_1 = N_2 = 100$, $N_0 = 500$, $N_e = 1000$, $\tau = 50$, $\tau_0 = 150$, and $\mu = 10^{-3}$.

Let $Q_{w,i}$ be the probability that two genes sampled at random within population i are identical by descent (IBD) and $Q_a$ be the probability that a gene sampled at random from population 1 is IBD to a gene sampled at random from population 2. IBD probabilities are defined as the probabilities that two genes have not mutated since their most recent common ancestor (MALECOT 1975). The probability that a pair of genes are IBD is equal to the probability that these genes are identical in state (IIS) whenever the mutation process follows the IAM.

More generally, let $Q_h$ denote the IBD probability of any pair of genes: $h = (w, i)$ when two genes are sampled within population i, or $h = a$ when one gene is sampled from each population. It is possible to give an expression for $Q_h$ as a function of the coalescence time (SLATKIN 1991). Under a continuous time approximation

$$Q_h = \int_0^{\infty} \gamma^t c_h(t)\,dt \tag{2}$$

(HUDSON 1990), where $c_h(t)$ is the probability of coalescence at time t for a pair of genes of type h, and $\gamma = (1 - \mu)^2$. The waiting time for a coalescent event in a population of size $N_i$ has an exponential distribution with mean $N_i$. The IBD probability for a pair of genes in population i reduces to

$$Q_{w,i} = \int_0^{\tau} \gamma^t \frac{1}{N_i} e^{-t/N_i}\,dt + (1 - C_i)\,Q_0, \tag{3}$$

where $Q_0$ is the IBD probability for two genes sampled at random from the common ancestral population at time $\tau$ (just before the split) and $(1 - C_i) = \gamma^{\tau} e^{-\tau/N_i}$ is the probability that the two genes neither coalesce nor mutate in the ith population in the time interval $0 < t \le \tau$. The first term on the right-hand side of Equation 3 is the probability that the two genes coalesce in the time period $0 < t \le \tau$ and are IBD. Following Equation 2, the IBD probability for a pair of genes sampled at random from the common ancestral population just before the split at time $\tau$ is given by

$$Q_0 = \int_{\tau}^{\tau_0} \gamma^{t-\tau} \frac{1}{N_0} e^{-(t-\tau)/N_0}\,dt + (1 - C_0) \int_{\tau_0}^{\infty} \gamma^{t-\tau_0} \frac{1}{N_e} e^{-(t-\tau_0)/N_e}\,dt, \tag{4}$$

where $(1 - C_0) = \gamma^{\tau_0-\tau} e^{-(\tau_0-\tau)/N_0}$ is the probability that the two genes neither coalesce nor mutate in the time interval $\tau < t \le \tau_0$. The first term on the right-hand side of Equation 4 averages over the coalescent events occurring during the population bottleneck. During this time interval ($\tau < t \le \tau_0$) the waiting time for a coalescent event is exponentially distributed with mean $N_0$. The last term in Equation 4 averages over coalescent events occurring in the ancestral population at mutation-drift equilibrium. The integral in this last term represents the IBD probability for two randomly sampled genes in a stationary population of size $N_e$, which is $1/(1 + \theta)$, with $\theta = 2N_e\mu$. Solving the integrals in the low-mutation limit (where $\gamma^t \approx e^{-2\mu t}$), we find that the solution of Equation 3 is

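$$Q_{w,i} = \frac{1 - e^{-(1+\theta_i)T_i}}{1 + \theta_i} + e^{-(1+\theta_i)T_i}\,Q_0, \tag{5}$$

where $\theta_i = 2N_i\mu$ and $T_i = \tau/N_i$. The value of $Q_0$ is given by the solution of Equation 4,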

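$$Q_0 = \frac{1 - e^{-(1+\theta_0)T_0}}{1 + \theta_0} + \frac{e^{-(1+\theta_0)T_0}}{1 + \theta}, \tag{6}$$

where $\theta_0 = 2N_0\mu$ and $T_0 = (\tau_0 - \tau)/N_0$. The probability for a gene in population 1 to be IBD with a gene in population 2 is given by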

$$Q_a = \gamma^{\tau} Q_0 \approx e^{-2\mu\tau}\,Q_0. \tag{7}$$

Obviously, two such genes cannot coalesce during the $\tau$ generations between the moment of divergence and the present. They are IBD only if their respective ancestors are IBD when populations 1 and 2 diverge and, furthermore, if they do not undergo mutation during the divergence. Now, it is useful to consider the parameter
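As a numerical illustration, the closed-form identity probabilities in the low-mutation limit can be evaluated directly. The sketch below assumes the forms $Q_0 = [1 - e^{-(1+\theta_0)T_0}]/(1+\theta_0) + e^{-(1+\theta_0)T_0}/(1+\theta)$, $Q_{w,i} = [1 - e^{-(1+\theta_i)T_i}]/(1+\theta_i) + e^{-(1+\theta_i)T_i} Q_0$, and $Q_a \approx e^{-2\mu\tau} Q_0$, which follow from the definitions in this section; the function names are hypothetical, and the parameter values are those quoted in the Figure 1 legend.

```python
import math

def q_ancestral(theta, theta0, t0):
    """IBD probability just before the split (bottleneck of scaled length t0)."""
    decay = math.exp(-(1.0 + theta0) * t0)   # no coalescence, no mutation
    return (1.0 - decay) / (1.0 + theta0) + decay / (1.0 + theta)

def q_within(theta_i, t_i, q0):
    """IBD probability for two genes sampled within daughter population i."""
    decay = math.exp(-(1.0 + theta_i) * t_i)
    return (1.0 - decay) / (1.0 + theta_i) + decay * q0

def q_among(mu, tau, q0):
    """IBD probability for one gene from each daughter population."""
    return math.exp(-2.0 * mu * tau) * q0

# Parameter values from the Figure 1 legend (haploid populations).
N1, N0, Ne, tau, tau0, mu = 100, 500, 1000, 50, 150, 1e-3
q0 = q_ancestral(theta=2 * Ne * mu, theta0=2 * N0 * mu, t0=(tau0 - tau) / N0)
qw1 = q_within(theta_i=2 * N1 * mu, t_i=tau / N1, q0=q0)
qa = q_among(mu, tau, q0)
```

With these values the within-population identity exceeds the ancestral identity, which in turn exceeds the between-population identity, as expected when drift has acted in the daughter population since the split.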

$$F_i = \frac{Q_{w,i} - Q_a}{1 - Q_a}. \tag{8}$$

It is worth noting that the weighted sum of $F_i$ over the two populations gives the intraclass correlation for the probability of identity by descent for genes within populations relative to genes between populations. This is of particular interest, because the properties of the intraclass correlations for the probability of identity in state ("IIS correlations"; COCKERHAM and WEIR 1987) can be deduced from the properties of the corresponding intraclass IBD correlations in the low-mutation limit (ROUSSET 1996). Indeed, such ratios of identity probabilities of the form of Equation 8 give the same low-mutation limit, whether one considers the infinite allele model or other mutation models (ROUSSET 1996, 1997).

If we neglect new mutations arising during the divergence process, $Q_a$ reduces to $Q_0$ and $Q_{w,i} = C_i(1 - Q_0) + Q_0$. Thus

$$F_i \approx C_i = 1 - e^{-T_i}. \tag{9}$$

Note that Equation 9 gives a well-known result when both daughter populations are assumed to have the same size N, so that $F_1 = F_2 = F \approx 1 - e^{-\tau/N}$ (see, e.g., REYNOLDS et al. 1983). Hereafter, the parameter $T_i$ is referred to as the "branch length" of population i. An important result is that, in the low-mutation limit, the new parameters $F_1$ and $F_2$ do not depend on the "nuisance parameters" $\theta$ or $T_0$. This suggests that a simple moment-based estimator $\hat{T}_i$ of branch length can be derived as

$$\hat{T}_i = -\ln(1 - \hat{F}_i), \tag{10}$$

where $\hat{F}_i$ is an estimator of $F_i$ (see Appendix for details).
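The relation between $F_i$ and the branch length can be checked with a short calculation. The sketch below assumes $F_i = (Q_{w,i} - Q_a)/(1 - Q_a)$, $F_i \approx 1 - e^{-T_i}$, and the inverse $\hat{T}_i = -\ln(1 - \hat{F}_i)$; the function names are hypothetical. The discrete-generation variant $1 - (1 - 1/N)^{\tau}$ is an assumption of this sketch, added only to show how close the two calculations are.

```python
import math

def f_parameter(q_within, q_among):
    """Ratio of identity probabilities defining F_i."""
    return (q_within - q_among) / (1.0 - q_among)

def branch_length(f_hat):
    """Moment-based branch length estimate from an estimate of F_i."""
    return -math.log(1.0 - f_hat)

# Worked example with tau = 50 generations and N = 500, so T = 0.1.
tau, n_size = 50, 500
f_continuous = 1.0 - math.exp(-tau / n_size)        # about 0.0952
f_discrete = 1.0 - (1.0 - 1.0 / n_size) ** tau      # about 0.0953
t_back = branch_length(f_continuous)                # recovers T = 0.1
```

The two values differ only in the fourth decimal place, so either can serve as the reference value when comparing estimates with the parameter.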


*  PROPERTIES

Simulation procedure:
For each set of parameter values, a sequence of artificial data sets was generated using standard coalescent simulations, as described by, e.g., HUDSON (1990). The simulations were performed as follows (see Fig 1 for an illustrated example of one simulated genealogy). For each population, the genealogy of a sample of $n_i$ genes is generated for a period of time ranging from present to $\tau$ generations in the past. During this period, all the coalescent events are separated by exponentially distributed time intervals, with means $N_1/\binom{n_1}{2}$ in population 1 and $N_2/\binom{n_2}{2}$ in population 2, where $n_i$ is the number of lineages currently remaining in population i (see Equation 3). At time $\tau$, the number $n_0$ of lineages that remain represents the ancestors of all the genes sampled in populations 1 and 2. The genealogy of these lineages is generated for the time period [$\tau$, $\tau_0$], and all the coalescence events are separated by exponentially distributed time intervals, with mean $N_0/\binom{n_0}{2}$ (see the first term in the right-hand side of Equation 4). At time $\tau_0$, the lineages that remain are the ancestors of all the genes sampled in populations 1 and 2. The genealogy of these $n_e$ genes is generated for the period [$\tau_0$, $+\infty$), with all coalescent events separated by exponentially distributed time intervals with mean $N_e/\binom{n_e}{2}$ (see the second term in the right-hand side of Equation 4). Once the complete genealogy is obtained, the mutation events are superimposed on the coalescent tree of lineages. In the results that follow, each artificial data set consisted of two (haploid) samples of size n = 100, one from population 1 and the other from population 2.
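The procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' program: all names are hypothetical, and instead of storing the tree and superimposing mutations afterward, the sketch places mutations interval by interval as the genealogy is built. Under the IAM this should give the same allelic states, because only the most recent mutation on a gene's ancestral line determines its state. Allele 0 denotes the unmutated ancestral state.

```python
import itertools
import math
import random

def _mutate(lineages, dt, mu, alleles, new_id, rng):
    """Give a new allele to the still-unlabeled descendants of each lineage
    that mutates at least once during an interval of dt generations."""
    p_mut = 1.0 - math.exp(-mu * dt)
    for lineage in lineages:
        if lineage and rng.random() < p_mut:
            allele = next(new_id)
            for sample in lineage:
                alleles[sample] = allele
            lineage.clear()          # descendants are now labeled

def _run_phase(lineages, size, duration, mu, alleles, new_id, rng):
    """Coalesce lineages in a haploid population of given size for `duration`."""
    remaining = duration
    while remaining > 0:
        k = len(lineages)
        # Waiting time has mean size / C(k, 2) when k > 1.
        wait = rng.expovariate(k * (k - 1) / 2 / size) if k > 1 else math.inf
        coalesce = wait < remaining
        dt = wait if coalesce else remaining
        if math.isinf(dt):           # one lineage left in the last phase
            break
        _mutate(lineages, dt, mu, alleles, new_id, rng)
        remaining -= dt
        if coalesce:
            a, b = rng.sample(range(k), 2)
            merged = lineages[a] | lineages[b]
            for idx in sorted((a, b), reverse=True):
                del lineages[idx]
            lineages.append(merged)

def simulate_locus(n1, n2, size1, size2, size0, size_e, tau, tau0, mu, rng):
    """Return the allelic states of the n1 and n2 sampled genes at one locus."""
    alleles, new_id = {}, itertools.count(1)
    pop1 = [{s} for s in range(n1)]
    pop2 = [{s} for s in range(n1, n1 + n2)]
    _run_phase(pop1, size1, tau, mu, alleles, new_id, rng)
    _run_phase(pop2, size2, tau, mu, alleles, new_id, rng)
    ancestors = pop1 + pop2
    _run_phase(ancestors, size0, tau0 - tau, mu, alleles, new_id, rng)
    _run_phase(ancestors, size_e, math.inf, mu, alleles, new_id, rng)
    states = [alleles.get(s, 0) for s in range(n1 + n2)]
    return states[:n1], states[n1:]

# Example call with the Figure 1 parameter values.
sample1, sample2 = simulate_locus(10, 10, 100, 100, 500, 1000, 50, 150,
                                  1e-3, random.Random(1))
```

Repeating the call many times for one set of parameter values yields the artificial data sets from which a joint distribution of estimates can be built.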

Simulation results:
By calculating the estimators $\hat{F}_1$ and $\hat{F}_2$ for each of these artificial data sets, it was possible to obtain a close approximation to the expected distribution of these estimators (see Appendix for details). Fig 2 shows this expected joint distribution of $\hat{F}_1$ and $\hat{F}_2$ for various combinations of the nuisance parameters $\theta$ and $T_0$. In this case, the "true" branch lengths were $T_1 = T_2 = 0.1$ (hence $F_1 = F_2 \approx 0.0953$). The expected value of the estimator $\hat{F}_1$ (respectively $\hat{F}_2$) was always close to the value of the parameter $F_1$ (respectively $F_2$). One can show that, by construction, the points ($\hat{F}_1$, $\hat{F}_2$) lie within the upper-right triangle with vertices (1, 1), (-1, 1), and (1, -1). The joint distribution of these two statistics has a negative correlation. Most importantly, it is clear from this figure that the joint distribution of $\hat{F}_1$ and $\hat{F}_2$ depends strongly on the nuisance parameters, even though their expectations remain close to the true values of $F_1$ and $F_2$.



Figure 2. Expected distribution of pairs of $\hat{F}_1$ and $\hat{F}_2$ estimates for wide ranges of values of the nuisance parameters $\theta = 2N_e\mu$ and $T_0$. $T_i = \tau/N_i$ is 0.10 for both daughter populations (with $\tau = 50$ and $N_1 = N_2 = 500$), giving an expected value $F_i \approx 0.0953$, as indicated by the dotted lines. For all parameter sets, $\mu = 10^{-4}$ and $N_0 = 1000$. One hundred individuals are sampled in each daughter population. The light gray area defines a region in which 95% of the simulated points are expected to lie (see Appendix for details).

It can be seen that, for smaller values of $T_0$, the joint distribution becomes tighter as $\theta$ increases. On the other hand, for larger values of $\theta$, the distribution is found to widen as $T_0$ increases. In both cases, it is the level of variation that remains before divergence that is crucial in shaping the joint distribution. With small $\theta$ and large $T_0$, the lineages coalesce rapidly before the divergence, and the number of distinct mutations (allelic states) that can be maintained is small. In this case, the variance of the estimates of population branch lengths is large, as illustrated by the wide joint distribution of $\hat{F}_1$ and $\hat{F}_2$. Therefore, the joint distribution of $\hat{F}_1$ and $\hat{F}_2$ is not ideal for investigating the homogeneity of response of a set of molecular markers to the genealogical processes. Indeed, other factors such as heterogeneous mutation rates across loci may be invoked to explain disparities of branch length estimates among markers. Fortunately, this problem can be overcome by considering the joint distribution of $\hat{F}_1$ and $\hat{F}_2$, conditional upon the total number k of allelic states in the pooled sample at each locus. Fig 3 shows the estimated joint distribution for $T_1 = T_2 = 0.1$ (hence $F_1 = F_2 \approx 0.0953$), conditioned on k = 4. The combinations of nuisance parameter values are the same as in Fig 2.



Figure 3. Expected distribution of pairs of $\hat{F}_1$ and $\hat{F}_2$ estimates conditioned on a number of alleles in the sample equal to four. As in Fig 2, wide ranges of values were used for the nuisance parameters. The dotted lines indicate the expected values for $F_1$ and $F_2$.

The expected joint conditional distribution appears to be almost independent of the nuisance parameters. So, given the estimated values of the parameters $F_1$ and $F_2$, and given the number of alleles in the sample, one can obtain the conditional joint distribution, and then a high probability region that should contain 95% of the observed pairs of $\hat{F}_i$ values. This result provides the justification for using the conditional distributions to analyze the homogeneity in the patterns of genetic differentiation revealed by a (large) set of markers.


*  APPLICATIONS

In this section, we present a methodology for identifying outlier loci by a pairwise analysis of populations. For each pair of populations (i, j), we suggest the following protocol:

  1. For all loci, the statistics $\hat{F}_i$ and $\hat{F}_j$ are computed (see Appendix).

  2. The parameters $F_i$ and $F_j$ are estimated as the averages among loci, weighted by the heterozygosities for populations i and j, respectively (see Appendix). This corresponds to the weighting of loci suggested by WEIR and COCKERHAM (1984) for the multilocus estimator of $F_{ST}$.

  3. The expected joint distribution of $\hat{F}_i$ and $\hat{F}_j$ is generated by performing 10,000 coalescent simulations for a given set of nuisance parameter values. This is repeated using a wide range of values for the nuisance parameters. In the D. simulans data set discussed below, all the pairwise combinations for $\theta$ and $T_0$ were performed, with $\theta$ = 1, 5, or 10 and $T_0$ = 0.01, 0.1, or 1. Thus, a total of 90,000 coalescent simulations were performed in this example. The simulated sample sizes are chosen to be representative of those actually realized in the real data set.

  4. For each expected joint distribution of $\hat{F}_i$ and $\hat{F}_j$, we construct all the distributions, conditional on the number of allelic states k in the pooled sample, for k = 2, 3, ... (the pooled sample is the sample obtained by pooling the samples from populations i and j). Remember, there is one expected distribution for each set of nuisance parameter values. For each conditional distribution, we identify the "high probability" or "high density" region, in the range of the points ($\hat{F}_i$, $\hat{F}_j$), where 95% of the data are expected to lie (see Appendix for the construction of this high probability region).

  5. For each value of the number of allelic states in the pooled sample, we superimpose a scatter plot of the observed data points (pairs of $\hat{F}_i$ and $\hat{F}_j$ values) over an outline of the 95% high probability region to identify outlier loci.
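Steps 1 and 4 of this protocol need, for each locus, a pair of estimates and the number k of allelic states in the pooled sample. The sketch below uses plain plug-in allele frequencies in the ratio $(Q_{w,i} - Q_a)/(1 - Q_a)$. This is an assumption made for illustration: the article's estimators are defined in its Appendix and may include sample-size corrections. All function names and the example data are hypothetical.

```python
from collections import Counter

def locus_estimates(sample_i, sample_j):
    """Return (F_i, F_j, k) for one locus from two lists of allele labels,
    or None when both samples are fixed for the same allele."""
    count_i, count_j = Counter(sample_i), Counter(sample_j)
    alleles = set(count_i) | set(count_j)
    k = len(alleles)                              # states in the pooled sample
    p_i = {a: count_i[a] / len(sample_i) for a in alleles}
    p_j = {a: count_j[a] / len(sample_j) for a in alleles}
    q_i = sum(p * p for p in p_i.values())        # identity within population i
    q_j = sum(p * p for p in p_j.values())        # identity within population j
    q_a = sum(p_i[a] * p_j[a] for a in alleles)   # identity between populations
    if q_a == 1.0:
        return None                               # ratio undefined
    return (q_i - q_a) / (1.0 - q_a), (q_j - q_a) / (1.0 - q_a), k

def group_by_allele_number(loci):
    """Group per-locus estimates by k so each group can be compared with the
    simulated distribution conditioned on the same k."""
    groups = {}
    for name, (sample_i, sample_j) in loci.items():
        estimates = locus_estimates(sample_i, sample_j)
        if estimates is not None:
            f_i, f_j, k = estimates
            groups.setdefault(k, []).append((name, f_i, f_j))
    return groups

# Made-up locus: population j is fixed for allele "A".
example = locus_estimates(["A"] * 5 + ["B"] * 5, ["A"] * 10)
```

With plug-in frequencies the sum of the two estimates is never negative, so the points stay in the upper-right triangle of the plane; a locus fixed in one population, like the made-up example, lands on the border of that range.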

D. simulans data set:
We applied this method to a D. simulans data set, described in SINGH et al. (1987) and CHOUDHARY et al. (1992). The raw data set was kindly provided by R. S. Singh and R. A. Morton. Among 111 allozyme loci, 43 were found to be polymorphic in the five populations studied in Europe and Africa. The samples consisted of isofemale lines maintained in the laboratory. The haploid sample sizes ranged from n = 26 to n = 55. Fig 4 shows the analysis performed on a particular pair of populations (France and Tunisia). The multilocus estimates of the parameters $F_1$ (French population) and $F_2$ (Tunisian population) were 0.0064 and 0.0617, respectively. The expected distributions with these averaged values, conditioned on the number of alleles in the pooled sample, are plotted with the actual single-locus pairwise ($\hat{F}_1$, $\hat{F}_2$) estimates.



Figure 4. $\hat{F}_1$ and $\hat{F}_2$ values estimated from 43 loci in Drosophila simulans for the pairwise comparison of the populations from France (n = 55) and Tunisia (n = 52). n is the number of isofemale lines typed for each enzymatic system (haploid sample size). Each locus is represented with a solid dot. The averaged values are $\hat{F}_1$ = 0.0064 and $\hat{F}_2$ = 0.0617 as indicated by the dotted lines. Thin solid lines enclose a region in which 95% of the simulated data points are expected to lie. Four distributions are shown, conditioned on the number of allelic states k in the whole sample: (A) expected distribution of pairwise $\hat{F}_i$ estimates conditioned on k = 2; (B) with k = 3; (C) with k = 4; and (D) with k = 5. Solid arrows indicate outlier loci. The loci coding for glutamate pyruvate transaminase (GPT) and carbonic anhydrase-3 (Ca-3) are shown, respectively, in C and D.

In the great majority of cases, the points fall within the 95% confidence region. With 43 loci we would expect two (0.05 × 43 ≈ 2) to lie outside the region by chance. But considering the joint distributions for loci with three or more alleles, we found 4 loci that clearly lie outside. Caution is required in the case of loci that lie on the borders of the possible range (Fig 4B). These correspond to loci that have an allele fixed in one population. Slight variations in the nuisance parameters can increase or decrease the relative proportion of loci that may fix one allele in a population. Indeed, we found some conditions under which the 95% envelope contained these 2 loci. This problem can remain even when we condition on the observed number of alleles. On the other hand, 2 other loci (coding for glutamate pyruvate transaminase and carbonic anhydrase-3) are clear outliers of the expected distributions (Fig 4, C and D). In all pairwise comparisons that included the French population, these two loci fell either outside or on the edges of the 95% high probability region.

In all the pairs that included the population from Congo, two loci coding respectively for the larval protein-10 (Pt-10) and the phosphoglucomutase (PGM) were found to lie outside or on the limit of the 95% high probability region (Fig 5). The locus coding for the larval protein-10 systematically gives a longer estimated branch length for this African population than do all other loci, while it gives similar branch lengths to other loci for the other populations. This suggests that genetic variation was severely reduced by a factor other than genetic drift in this African population. The locus coding for phosphoglucomutase gives a longer branch length estimate than the other loci in three cases (Fig 5, A–C) and a shorter one in one case (Fig 5D). The locus coding for phosphoglucomutase was also found to lie outside the limit of the 95% high probability region in all the pairs that included the population from Seychelle Island (Fig 6). To strengthen our presumption that these loci were outside the limit allowed by a neutral model, we checked whether these loci also lie outside the limit of the 99% high probability region. The same results were obtained. For these loci, we did not find any plausible neutral scenario of divergence by drift that could provide such a scatter of points. We thus conclude that natural selection may have acted on these loci or on closely linked regions within the genome.



Figure 5. $\hat{F}_1$ and $\hat{F}_2$ values estimated from 43 loci in Drosophila simulans for all the pairwise comparisons involving the population from the Congo (n = 45). (A) Expected distribution for the populations from France (n = 55) and Congo. (B) Tunisia (n = 52) vs. Congo. (C) Congo vs. Cape Town, South Africa (n = 32). (D) Congo vs. Seychelle Island (n = 26). All distributions are conditioned on k = 4. Each locus is represented with a solid dot. Dotted lines give the expected values for $\hat{F}_1$ and $\hat{F}_2$. For each expected conditional distribution, solid arrows indicate the loci coding for the larval protein-10 (Pt-10) and phosphoglucomutase (PGM).



Figure 6. $\hat{F}_1$ and $\hat{F}_2$ values estimated from 43 loci in Drosophila simulans for all the pairwise comparisons involving the population from the Seychelle Island (n = 26). (A) Expected distribution for the populations from France (n = 55) and Seychelle Island. (B) Tunisia (n = 52) vs. Seychelle Island. (C) Congo (n = 45) vs. Seychelle Island. (D) Cape Town, South Africa (n = 32) vs. Seychelle Island. Distributions in A and C are conditioned on k = 4 and distributions in B and D are conditioned on k = 3. Each locus is represented with a solid dot. Dotted lines give the expected values for $\hat{F}_1$ and $\hat{F}_2$. For each expected conditional distribution, solid arrows indicate the locus coding for phosphoglucomutase (PGM).

We are more cautious about claiming that the loci coding for glutamate pyruvate transaminase and carbonic anhydrase-3 were or are subject to selection. These loci are clear outliers in some pairwise comparisons involving the French population but fall just within the limits of the confidence region in other comparisons. Moreover, when considering 99% confidence regions instead of 95% confidence regions, some loci were no longer detected as outliers but rather as lying on the edges of the confidence limit. The locus coding for isocitrate dehydrogenase-1 was found to be an outlier in three (out of four) pairs that included the population from Seychelle Island. Overall, six more loci were detected as outliers in single pairwise comparisons only. Therefore, we should be very cautious about considering those latter loci as being under selection. Indeed, if a locus has responded to selection in one particular contemporary population since it became isolated, then we expect this locus to show up as an outlier in all (or most) comparisons involving this population. This pattern is exactly what we found for the two loci coding for larval protein-10 and phosphoglucomutase in the Congo and Seychelle Island populations.

Evaluating the robustness of this method to the assumptions of the model:
In the data set discussed above, it is likely that the populations of D. simulans have exchanged migrants after divergence. More generally, one can wonder whether complete isolation and divergence by random drift accurately describe natural situations. An alternative approach would be to develop a new model of population divergence that allows migration after separation. But if we want to make inferences about a more realistic (and hence a more complex) model of divergence, then we need to distinguish between the pattern of genetic differentiation that results from (i) recent separation followed by very little migration and that which results from (ii) ancient separation followed by a moderate amount of migration. This is a difficult task, which would require more powerful methods for inferring parameter values (e.g., maximum likelihood; see NIELSEN and SLATKIN 2000) that would be much more time consuming. Further, note that NIELSEN and SLATKIN (2000) assume that the mutation rate is zero.

So, we are interested in testing whether our method (which assumes evolution in complete isolation after divergence) is undermined when applied to pairs of populations that still exchange genes after divergence. It should be borne in mind that gene flow, like genetic drift, affects the whole genome in the same way. We generated artificial data sets under neutral models of population divergence, including high mutation rates and moderate levels of migration between populations. We used a modified version of the algorithm described by HUDSON (1990), which accounts for symmetric migration between populations. For the period of time ranging from present to $\tau$ generations in the past, considering populations 1 and 2 together, the waiting time to the next event (coalescence or migration) is drawn from an exponential distribution with mean $N_1 N_2/D$, where $D = N_2\binom{n_1}{2} + N_1\binom{n_2}{2} + m(n_1 + n_2)N_1 N_2$, $n_1$ and $n_2$ are the numbers of lineages currently present in populations 1 and 2, and m is the backward migration rate (NORDBORG 2001). Conditionally on the occurrence of one event, two genes coalesce in population 1 (respectively population 2) with probability $N_2\binom{n_1}{2}/D$ (respectively $N_1\binom{n_2}{2}/D$) or one gene migrates from population 2 to population 1 (respectively from population 1 to population 2) with probability $m\,n_1 N_1 N_2/D$ (respectively $m\,n_2 N_1 N_2/D$; see STROBECK 1987; TAKAHATA 1988; NORDBORG 2001). Then, for the period [$\tau$, $+\infty$), the coalescent process was generated as previously described (see also Fig 1).
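The competing-events step of this two-population algorithm can be sketched as follows. The sketch works with event rates rather than with the combined fractions: the total rate is the sum of the two coalescence rates, $\binom{n_i}{2}/N_i$, and the two migration rates, $m\,n_i$, and each event is chosen with probability proportional to its rate. Function and event names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import random
from math import comb

def event_rates(n1, n2, size1, size2, m):
    """Per-generation rates of the four possible next events, looking backward."""
    return {
        "coalesce_in_1": comb(n1, 2) / size1,
        "coalesce_in_2": comb(n2, 2) / size2,
        # Backward in time a lineage of population 1 moves to population 2,
        # i.e., forward in time a gene migrated from population 2 to 1.
        "lineage_1_moves_to_2": m * n1,
        "lineage_2_moves_to_1": m * n2,
    }

def next_event(n1, n2, size1, size2, m, rng):
    """Draw the waiting time (in generations) and the type of the next event.
    Assumes at least one rate is positive."""
    rates = event_rates(n1, n2, size1, size2, m)
    total = sum(rates.values())
    wait = rng.expovariate(total)                 # mean 1 / total
    event = rng.choices(list(rates), weights=list(rates.values()))[0]
    return wait, event

# Example: two lineages in each population of size 100, with m = 0.01.
wait, event = next_event(2, 2, 100, 100, 0.01, random.Random(3))
```

The mean waiting time, one over the total rate, can be rewritten as the product of the two population sizes divided by the sum of the corresponding weighted terms, which is the form used in the text.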

For each set of parameters, we generated 20 data sets composed of two samples (n1 = n2 = 50) of 50 loci each. The parameter values are given in Table 1. For each data set, we applied our method as described above. We generated joint distributions, conditional on the number of alleles, according to the actual numbers of alleles in each sample. For all sets of parameters, we grouped loci with eight or more alleles in a single class. The number of joint conditional distributions generated per artificial data set (i.e., the number of classes for different numbers of alleles) ranged from three to seven. For each data set, over all the joint conditional distributions taken together, we expected to detect 0.05 × 50 = 2.5 outlier loci just by chance. We performed Wilcoxon's signed-rank tests (see, e.g., MENDENHALL et al. 1990) to determine whether the distribution of the number of detected outlier loci was shifted to the right of 2.5 (one-tailed test).
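As an illustration of the test just described, the following Python sketch computes a one-tailed Wilcoxon signed-rank test of the number of detected outliers against the chance expectation of 2.5. It is an assumption-laden sketch rather than the procedure actually run for Table 1: the function name is hypothetical, the P value uses the large-sample normal approximation with a tie correction (no continuity correction), and the outlier counts are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def wilcoxon_signed_rank_greater(values, mu0):
    """One-tailed signed-rank test of H1: location > mu0 (normal approximation)."""
    diffs = [v - mu0 for v in values if v != mu0]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all observations equal mu0")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    start = 0
    while start < n:
        end = start
        # extend the block of tied absolute differences
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        mid_rank = (start + end) / 2 + 1  # average rank shared by the tied block
        for pos in range(start, end + 1):
            ranks[order[pos]] = mid_rank
        size = end - start + 1
        tie_term += size ** 3 - size
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    return w_plus, 1.0 - NormalDist().cdf(z)

# Hypothetical outlier counts for 20 simulated data sets of 50 loci each.
expected_by_chance = 0.05 * 50
counts = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2, 4, 2, 3, 1, 2, 3, 2, 3, 2, 2]
w_plus, p_value = wilcoxon_signed_rank_greater(counts, expected_by_chance)
```

A large P value means there is no evidence that more loci are flagged than expected by chance. Because outlier counts are integers, ties among the absolute differences are the rule, which is why the variance carries a tie correction.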


 
Table 1. Results from applications to various divergence scenarios

Table 1 shows the total observed number of outlier loci (mean and median over 20 independent simulated data sets) detected for a range of nuisance parameter values (low and high mutation rates, short or long divergence by random drift, with or without migration). In no case could we reject the null hypothesis that the expected number of outlier loci detected by our method was equal to 2.5 (against the alternative hypothesis that the expected number of outliers was >2.5). Thus, our approach is conservative in the sense that the 95% confidence region contains at least 95% of the loci generated by a truly neutral model. At the level of 5% we do not (falsely) detect any more than 5% of outlier loci in a sample of neutral markers (type I error).

Comparison with Beaumont and Nichols' (1996) method:
We also applied BEAUMONT and NICHOLS' (1996) procedure to the D. simulans data set. Based on a preliminary examination of the data, three loci (coding for α-fucosidase, dipeptidase-1, and mannose phosphate isomerase) were found to lie outside the 95% confidence region of the conditional joint distribution of FST and mean heterozygosity. The percentiles were determined as described in BEAUMONT and NICHOLS (1996). Surprisingly, none of these three loci were detected as outliers using our method. There may be several reasons for this.

We suspect that, in the present case, the inclusion of a very distant insular population (Seychelle Island) may bias their analysis. Indeed, populations that are heterogeneous with respect to their demographic parameters (effective population sizes and migration rates) were shown to strongly affect their method (BEAUMONT and NICHOLS 1996). Isolation (low migration rates) together with population bottlenecks can introduce a further bias. Consider as an extreme case the fixation of a private allele at some locus in one population. This may be unexpected for a polymorphic locus in a mutation-migration-drift equilibrium model, unless there is a strong asymmetry, with some populations being smaller and receiving fewer immigrants than others. However, this is not unexpected for a model of separation and isolation with population bottlenecks. This may boost the FST estimate at some locus and thus exclude it from the 95% high probability region. So, isolated populations should probably be excluded from BEAUMONT and NICHOLS' (1996) analysis.

Moreover, in general, the loci that were outliers in our analysis gave small values of (global) FST. But from the shape of the joint distribution of FST and heterozygosity, it seems that BEAUMONT and NICHOLS' (1996) analysis is likely to detect outlier loci that exhibit unusually large FST values. However, a process that would cause an apparent decrease of genetic variation at one locus in a single local population, without leading to a decrease of the variation over all populations, would not be detected in BEAUMONT and NICHOLS' (1996) procedure. In other words, if selection acts on one locus at a local scale, pairwise comparisons of populations are more likely to be efficient for detecting outlier loci.


*  DISCUSSION

Using population-specific estimators of branch lengths:
Conventional pairwise genetic distances or pairwise measures of population differentiation are based on the assumption that the sizes of populations are equal and constant through time or that dispersal, if any, is symmetric. For example, the pairwise FST parameter is defined as a ratio of identity probabilities within and among populations. But the within-population term is taken as an average over the pair of populations. Thus, the definition of the parameter implicitly assumes that both populations share the same demographic parameters. WEIR and COCKERHAM's (1984) estimator θ of FST is constructed to have low bias and variance, assuming that the populations are independent replicates of the same stochastic process. This means that populations are supposed to have the same size and that they do not exchange migrants. Without these assumptions, θ would be a complex function of unequal (within-population) identity probabilities.

In contrast, the Fi parameters defined here make sense even when the populations are of unequal size. The only assumption we make is that once the two populations have separated, they remain completely isolated. From the estimates of the Fi's for a pair of populations, we can infer the branch lengths. The ratio of these branch length estimates is inversely proportional to the ratio of effective population sizes. Thus, these estimates may be seen as measures of the intensity of genetic drift that has occurred since population divergence. The main drawback to this approach is that when estimates of IIS probabilities are smaller within populations than among them (i.e., $\hat{Q}_{w,i} < \hat{Q}_a$), $\hat{F}_i$ becomes negative, and the moment-based estimator of branch length fails. Although this can arise just by chance for some loci, averaging estimates over loci reduces the problem.

Provided that we obtain good estimates of branch lengths for a pair of populations (which requires the pooling of information from many independent loci), we may be able to evaluate the consistency of locus-specific estimates. Indeed, the joint distribution of branch length estimates, conditioned on the number of alleles in the pooled sample, depends only weakly on nuisance parameters of the simple model of divergence by drift. In particular, this conditional distribution is not sensitive to departures from mutation-drift equilibrium before isolation or to differences in mutation rates.

Detection of selection acting on genetic markers:
We saw from the analysis of the D. simulans data set that the great majority of loci always fall in the confidence region of the conditional pairwise distributions of branch length estimates, while some loci do not. Overall, we identified two loci that were probably subject to selection in the population from Congo, one of which was also probably subject to selection in the population from Seychelle Island. We concluded that the distribution of variability at these loci may have been shaped by forces other than mutation and drift. Furthermore, we identified two other loci that either lie on the edges or fall just outside the high probability region of the expected conditional distribution in the French population, although we are more cautious about these latter loci. It is noteworthy that our estimate of the density of the Fi parameters (see Appendix) is discontinuous, because of the discrete nature of the data (the allele counts). This is particularly true when the number of alleles on which the distribution is conditioned is small (for a given set of parameters, the lower the number of allelic states, the more discontinuous the null distribution; see Figure 4). Using the discrete distribution is clearly preferable to using some (unnecessary) continuous approximation to it. Moreover, whenever the null distribution is based on the same number of allelic states and the same number of genes as in the sample, there is no tendency for loci to show up as outliers just because of the discrete nature of the distribution (i.e., a locus cannot, by construction, show up between arc-shaped areas located at the edge of some distributions). Yet, when an apparent outlier lies very close to the 95% high probability region, it is highly advisable to check whether this locus also lies outside the 99% high probability region.

The main criticisms of LEWONTIN and KRAKAUER's (1973) attempts to interpret across-loci heterogeneity of FST values arose from their failure to consider allele frequencies as random variables, whose distribution depends on the underlying model of population structure and history. Indeed, uneven patterns of dispersal among populations (NEI and MARUYAMA 1975) or sequences of population splits within the species (ROBERTSON 1975a,b) may strongly undermine the approach. LEWONTIN and KRAKAUER (1975) acknowledged that their tests might be limited to situations where the true population structure did not depart too much from the island model.

However, conditioning the distribution of FST on the heterozygosity (BEAUMONT and NICHOLS 1996) or on gene frequency for biallelic loci (BOWCOCK et al. 1991) was shown to give surprisingly robust results, in the sense that strong departures from the model assumptions do not alter the distribution very much. The strongest effect on the joint expected distribution of FST and heterozygosity occurs when populations are heterogeneous with respect to their demographic parameters (BEAUMONT and NICHOLS 1996), for example, when populations are founded by very different numbers of individuals or when populations are arranged in an irregular stepping-stone lattice. However, BEAUMONT and NICHOLS (1996) considered a large number d of subpopulations in the metapopulation (d = 100) and this parameter strongly influences the expected heterozygosity [$H_e \approx 4Nd\mu/(1 + 4Nd\mu)$, for diploids]. In addition, at a local scale, FST is only weakly influenced by the total population size Nd (ROUSSET 2001). The number of populations has a stronger role than acknowledged by BEAUMONT and NICHOLS (1996) in determining whether mutation has an effect on FST or not. It was shown that, considering smaller numbers of populations, FST estimates may be reduced by mutation, especially with a stepwise mutation model (see FLINT et al. 1999). With d = 100 islands, the sets of parameters used in BEAUMONT and NICHOLS (1996) did not account for any case where mutation may depress FST.

As already suggested by TSAKAS and KRIMBAS (1976), restricting LEWONTIN and KRAKAUER's (1973) approach to pairs of populations removes all kinds of dependence on the unknown population structure. Indeed, whatever their history, two populations ultimately descend from a single ancestral one in the past. Still, nuisance parameters may broaden the joint distribution of pairwise Fi's (Figure 2). However, conditioning on the number of alleles (Figure 3) also gives distributions that are robust enough to variations in the values of nuisance parameters. It is obvious that, for each analysis of a pair of populations, we deliberately discard the information brought by other populations, which may decrease the power of the method (TSAKAS and KRIMBAS 1976). But we believe that this enables us to explain a wider range of patterns than any symmetrical model, such as the island model. In this respect, our approach is conservative. Moreover, we found that low or moderate gene flow did not undermine our approach, in the sense that the probability of falsely detecting a neutral locus as an outlier (type I error) is no more than 5% (Table 1). We compared the performance of our method to that of BEAUMONT and NICHOLS (1996), using the empirical data from SINGH et al. (1987) and CHOUDHARY et al. (1992). We further tested whether our method would falsely reject neutral loci (type I error) any more than expected, under a wide range of nuisance parameter values (see Table 1). In particular, since the method assumes that the mutations arising after divergence can be neglected, we checked that high mutation rates do not weaken the approach.

We found that patterns such as those identified in, e.g., the Tunisia vs. Congo data set as evidence of selection can be produced by "neutral models," where the coalescent process occurs independently at each locus. Indeed, similar scatters of points could be obtained whenever the divergence parameters of populations 1 and 2 vary across loci, having particularly high values at certain loci (results not shown). Models of this type provide a rough approximation to models of unlinked neutral loci, some of which were strongly influenced by selection (remembering that the effect of selection resembles a reduction in the effective population size experienced by these loci, as described by ROBERTSON 1961; BARTON 1995, 1998). So, it is certainly plausible that the patterns that we identified in, e.g., the Tunisia vs. Congo data set were produced by selection. A thorough investigation of the conditions under which our method fails to identify selected loci (type II error) would be desirable. However, this is not feasible, as the range of models that incorporate selection is very large.

An important task for the future is to consider a more general neutral model of the divergence of two populations, where gene flow may continue after the moment of "separation." It is also desirable to extend this approach to more elaborate neutral models, incorporating recombination. More sophisticated estimators of the divergence parameters (branch lengths) would then be required. We assumed that the mutation process follows the IAM and we allowed a wide range of possible mutation rates. In the IAM, genes that are identical in state are also identical by descent. This may not be the case with other mutation models, such as the K-allele or stepwise mutation processes, which can produce IIS genes that are not IBD (homoplasy). The IAM is probably an adequate model for allozyme data. It is certainly less appropriate for potentially more variable markers, such as microsatellites. Recent studies revealed that the processes of mutation of microsatellite markers may be more complex than previously thought and may vary greatly among loci (ESTOUP and ANGERS 1998). Furthermore, the effect of homoplasy on measures of population subdivision is not simple (ROUSSET 1996). Therefore, further studies should be conducted to test the application of our method across different classes of nuclear markers that differ in processes of mutation. Clearly, if a whole class of marker loci, which are known to have a very distinct mutation process, are identified as outliers by our analysis, then this class of markers should be interpreted with caution.

If we could identify those marker loci that responded to selection during the process of divergence, then we may be able to obtain improved estimates of the parameters of population structure and history by excluding these loci (ROSS et al. 1999). Our method differs from previous ones in allowing selection to be detected in particular populations and in some pairwise comparisons but not others. This opens up the possibility that markers may be discarded only in the analysis of those populations where there is evidence that they have responded to selection. It is also of interest to use this approach to screen the genome for regions that have responded to strong selection in the recent past. If populations have diverged phenotypically and if this has been caused by selection, then it may even be possible to identify candidate regions for the quantitative trait loci underlying this adaptive divergence.


*  ACKNOWLEDGMENTS

We are very grateful to R. S. Singh and R. A. Morton for providing the Drosophila simulans data set. We thank I. Olivieri for helpful comments on a previous draft of this manuscript and S. Billiard for valuable discussions about the structured coalescent. We are grateful to two anonymous reviewers for their constructive comments. This work was funded by contract no. BIO4-CT96-1189 of the Commission of the European Communities (DG XII) to P.B., and R.V. was also partially funded by the Fondation Sansouire. This is publication no. 2001-045 of the Institut des Sciences de l'Évolution de Montpellier.

Manuscript received October 23, 2000; Accepted for publication May 14, 2001.


*  APPENDIX

Parameter estimation:
For any given allele u, we use the indicator variable $x_{iju}$ to describe the state of the jth gene in the ith population, with $i \in \{1, 2\}$: $x_{iju} = 1$ if the allelic type is u, and $x_{iju} = 0$ otherwise. Let $p_{iu}$ be the frequency of allele u in the ith population. Then $p_{iu} = \mathcal{E}(x_{iju} \mid p)$, where $\mathcal{E}(\cdot \mid p)$ denotes the expectation, conditional on the array p of all the allele frequencies. Considering the second moments of the random variable $x_{iju}$, it follows that $\mathcal{E}(x_{iju}^2 \mid p) = p_{iu}$ and, since individuals are sampled independently from the ith population, $\mathcal{E}(x_{iju}x_{ij'u} \mid p) = p_{iu}^2$ for $j' \neq j$. Then, summing over all alleles gives the probability for two genes in population i to be identical in state (IIS),

$$Q_{w,i} = \mathcal{E}\left(\sum_{u=1}^{k} p_{iu}^2\right), \tag{A1}$$

where $\mathcal{E}$ now denotes the expectation over the distribution of allele frequencies p and k is the number of alleles in the population. The IIS probability for two genes respectively taken in populations 1 and 2 is given by

$$Q_a = \mathcal{E}\left(\sum_{u=1}^{k} p_{1u}\,p_{2u}\right). \tag{A2}$$

An unbiased estimator of the frequency of allele u among $n_i$ sampled individuals from the ith population is simply given by $\hat{p}_{iu} = (1/n_i)\sum_{j=1}^{n_i} x_{iju}$. Expanding the square of this expression, and then taking expectation, gives $\mathcal{E}(\hat{p}_{iu}^2 \mid p) = p_{iu}/n_i + (n_i - 1)\,p_{iu}^2/n_i$. Therefore,

$$\hat{Q}_{w,i} = \frac{n_i \sum_{u=1}^{k} \hat{p}_{iu}^2 - 1}{n_i - 1} \tag{A3}$$

is an unbiased estimator of the probability for two genes in population i to be identical in state, with k being the number of alleles in the sample. Similarly

$$\hat{Q}_a = \sum_{u=1}^{k} \hat{p}_{1u}\,\hat{p}_{2u} \tag{A4}$$

is an unbiased estimator of the IIS probability of two genes taken in the ancestral population, before divergence. Approximating the expectation of a ratio by the ratio of expectations, an estimator of Fi is given by

$$\hat{F}_i = \frac{\hat{Q}_{w,i} - \hat{Q}_a}{1 - \hat{Q}_a}. \tag{A5}$$

When combining the information brought by all alleles at more than one locus, a multilocus estimator is defined as the ratio of the sum of locus-specific numerators over the sum of locus-specific denominators (see, e.g., WEIR and COCKERHAM 1984). It is worth noting that, when daughter population sizes are equal, this simple way to estimate parameters (i.e., equating the Q's to the $\hat{Q}$'s in Equation 8 to get $\hat{F}_i$) directly yields Cockerham's estimators (COCKERHAM 1973; WEIR and COCKERHAM 1984), developed with the methods of analysis of variance (see ROUSSET 2001 for a thorough demonstration of the equivalence between estimator formulas based on analyses of variance and expressions in terms of frequency of identical genes). Our estimator differs from previous ones (e.g., REYNOLDS et al. 1983) in allowing separate parameters Fi's for each population.
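The estimation steps of this section can be illustrated with a short Python sketch working from allele counts. It rests on assumptions that should be checked against the published equations: the within-population IIS estimator is taken in its usual unbiased form, $\sum_u c_{iu}(c_{iu} - 1)/[n_i(n_i - 1)]$ for allele counts $c_{iu}$, the between-population estimator is $\sum_u \hat{p}_{1u}\hat{p}_{2u}$, and the estimator of Fi is taken as (within minus between) over (one minus between), which agrees with the sign condition given in the Discussion. All function names are hypothetical.

```python
def iis_within(counts):
    """Unbiased estimate of the IIS probability within one population."""
    n = sum(counts)
    # Equivalent to (n * sum(p_hat**2) - 1) / (n - 1), with p_hat = c / n.
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def iis_between(counts1, counts2):
    """Estimate of the IIS probability for one gene from each population."""
    n1, n2 = sum(counts1), sum(counts2)
    return sum(a * b for a, b in zip(counts1, counts2)) / (n1 * n2)

def f_terms(counts1, counts2):
    """Locus-specific numerators for populations 1 and 2, and the shared denominator."""
    q_a = iis_between(counts1, counts2)
    return iis_within(counts1) - q_a, iis_within(counts2) - q_a, 1.0 - q_a

def multilocus_f(loci):
    """Ratio of summed numerators to summed denominators over loci."""
    num1 = num2 = den = 0.0
    for counts1, counts2 in loci:
        a, b, d = f_terms(counts1, counts2)
        num1, num2, den = num1 + a, num2 + b, den + d
    return num1 / den, num2 / den

# Hypothetical locus: allele counts in samples of 50 genes from each population.
f1, f2 = multilocus_f([([50, 0], [25, 25])])
```

In this made-up example population 1 is fixed for one allele, so its estimate is large, whereas the estimate for population 2 is slightly negative because the within-population IIS estimate falls below the between-population one. This is the situation in which the moment-based branch length estimator fails for a single locus.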

Estimation of the density of Fi parameters:
For each set of parameter values, coalescent simulations were performed, thus generating "artificial data sets." Each artificial data set yields a pair of estimates $\hat{F}_1$ and $\hat{F}_2$. An approximation to the expected joint distribution was obtained as follows. First, a two-dimensional histogram was constructed. Recall that the points ($\hat{F}_1$, $\hat{F}_2$) are constrained to lie within the upper-right triangle of a square with vertices (-1, -1), (1, -1), (-1, 1), and (1, 1). The whole square region was covered by a two-dimensional array (or mesh) of 100 × 100 square cells. Each cell thus has sides of length 0.02. Each observation ($\hat{F}_1$, $\hat{F}_2$) was binned in the appropriate cell. The cell counts were divided by the total number of observations to obtain a discrete probability distribution over the two-dimensional array. This discrete distribution is a close approximation to the expected joint distribution of the estimators ($\hat{F}_1$, $\hat{F}_2$). The q-level "high probability region" (q = 95% or any other value) is constructed as follows. The cells are sorted in order of decreasing probability. Then, starting from the cells with the highest associated probabilities, cells are sequentially added to the confidence region until the cumulative probability of the whole set of cells obtained is equal to (or just exceeds) the chosen q-value.

From this procedure, we obtain for each simulation a region within which a proportion q of the data lies. Note that this confidence region is not necessarily continuous. Constructing the high probability region using the discrete distribution is clearly preferable to using some (unnecessary) continuous approximation to it.
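The construction of the high probability region lends itself to a compact illustration. The Python sketch below bins simulated pairs of estimates on a 100 × 100 mesh over the square from -1 to 1, sorts cells by decreasing count, and accumulates them until a proportion q of the observations is covered. The function names and the handling of boundary values are assumptions of this sketch, not details taken from the original program.

```python
from collections import Counter

def cell_of(f1, f2, cells=100, lo=-1.0, hi=1.0):
    """Index of the mesh cell containing the point (f1, f2)."""
    width = (hi - lo) / cells  # 0.02 for the default mesh

    def index(x):
        # Clamp so that points on the upper boundary fall in the last cell.
        return max(0, min(int((x - lo) / width), cells - 1))

    return index(f1), index(f2)

def high_probability_region(points, q=0.95, cells=100):
    """Set of cells, taken in order of decreasing count, covering a proportion q."""
    counts = Counter(cell_of(f1, f2, cells) for f1, f2 in points)
    total = sum(counts.values())
    region, covered = set(), 0
    for cell, count in counts.most_common():
        if covered >= q * total:
            break
        region.add(cell)
        covered += count
    return region

def is_outlier(point, region, cells=100):
    """A locus is flagged when its cell lies outside the region."""
    return cell_of(point[0], point[1], cells) not in region
```

Working with raw cell counts rather than normalized probabilities gives the same ordering and avoids accumulating rounding error in the cumulative sum. Because the region is a set of cells, it need not form a single connected area.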


*  LITERATURE CITED

BARTON, N. H., 1995 Linkage and the limits to natural selection. Genetics 140:821-841.

BARTON, N. H., 1998 The effect of hitch-hiking on neutral genealogies. Genet. Res. 72:123-133.

BEAUMONT, M. A. and R. A. NICHOLS, 1996 Evaluating loci for use in the genetic analysis of population structure. Proc. R. Soc. Lond. Ser. B 263:1619-1626.

BOWCOCK, A. M., J. R. KIDD, J. L. MOUNTAIN, J. M. HEBERT, and L. CAROTENUTO et al., 1991 Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc. Natl. Acad. Sci. USA 88:839-843.

CAVALLI-SFORZA, L. L., 1966 Population structure and human evolution. Proc. R. Soc. Lond. Ser. B 164:362-379.

CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

CHARLESWORTH, B., M. NORDBORG, and D. CHARLESWORTH, 1997 The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70:155-174.

CHOUDHARY, M., M. B. COULTHART, and R. S. SINGH, 1992 A comprehensive study of genic variation in natural populations of Drosophila melanogaster. VI. Patterns and processes of genic divergence between D. melanogaster and its sibling species, D. simulans. Genetics 130:843-853.

COCKERHAM, C. C., 1973 Analyses of gene frequencies. Genetics 74:679-700.

COCKERHAM, C. C. and B. S. WEIR, 1987 Correlations, descent measures: drift with migration and mutation. Proc. Natl. Acad. Sci. USA 84:8512-8514.

ESTOUP, A. and B. ANGERS, 1998 Microsatellites and minisatellites for molecular ecology: theoretical and empirical considerations, pp. 55-86 in Advances in Molecular Ecology, edited by G. R. CARVALHO. IOS Press, Amsterdam.

FELSENSTEIN, J., 1982 How can we infer geography and history from gene frequencies? J. Theor. Biol. 96:9-20.

FLINT, J., J. BOND, D. C. REES, A. J. BOYCE, and J. M. ROBERTS-THOMSON et al., 1999 Minisatellite mutational processes reduce FST estimates. Hum. Genet. 105:567-576.

HILL, W. G. and A. ROBERTSON, 1966 The effect of linkage on limits to artificial selection. Genet. Res. 8:269-294.

HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.

HUDSON, R. R. and N. L. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120:831-840.

KAPLAN, N. L., R. R. HUDSON, and C. H. LANGLEY, 1989 The hitchhiking effect revisited. Genetics 123:887-899.

LEWONTIN, R. C. and J. KRAKAUER, 1973 Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphism. Genetics 74:175-195.

LEWONTIN, R. C. and J. KRAKAUER, 1975 Testing the heterogeneity of F values. Genetics 80:397-398.

MALÉCOT, G., 1975 Heterozygosity and relationship in regularly subdivided populations. Theor. Popul. Biol. 8:212-241.

MAYNARD SMITH, J. and J. HAIGH, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.

MENDENHALL, W. M., D. D. WACKERLY, and R. L. SCHEAFFER, 1990 Mathematical Statistics with Applications. PWS-KENT Publishing Company, Boston.

NEI, M., 1972 Genetic distance between populations. Am. Nat. 106:283-292.

NEI, M. and A. CHAKRAVARTI, 1977 Drift variance of FST and GST statistics obtained from a finite number of isolated populations. Theor. Popul. Biol. 11:307-325.

NEI, M. and T. MARUYAMA, 1975 Lewontin-Krakauer test for neutral genes. Genetics 80:395.

NEI, M., A. CHAKRAVARTI, and Y. TATENO, 1977 Mean and variance of FST in a finite number of incompletely isolated populations. Theor. Popul. Biol. 11:291-306.

NIELSEN, R. and M. SLATKIN, 2000 Likelihood analysis of ongoing gene flow and historical association. Evolution 54:44-50.

NORDBORG, M., 2001 Coalescent theory, pp. 179-212 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP, and C. CANNINGS. John Wiley & Sons, Chichester, UK.

OHTA, T. and M. KIMURA, 1969 Linkage disequilibrium due to random genetic drift. Genet. Res. 13:47-55.

REYNOLDS, J., B. S. WEIR, and C. C. COCKERHAM, 1983 Estimation of the coancestry coefficient: basis for a short term genetic distance. Genetics 105:767-779.

ROBERTSON, A., 1961 Inbreeding in artificial selection programmes. Genet. Res. 2:189-194.

ROBERTSON, A., 1975a Gene frequency distribution as a test of selective neutrality. Genetics 81:775-785.

ROBERTSON, A., 1975b Remarks on the Lewontin-Krakauer test. Genetics 80:396.

ROSS, K. G., D. D. SHOEMAKER, M. J. B. KRIEGER, J. DEHEER, and L. KELLER, 1999 Assessing genetic structure with multiple classes of molecular markers: a case study involving the introduced fire ant Solenopsis invicta. Mol. Biol. Evol. 16:525-543.

ROUSSET, F., 1996 Equilibrium values of measures of population subdivision for stepwise mutation processes. Genetics 142:1357-1362.

ROUSSET, F., 1997 Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics 145:1219-1228.

ROUSSET, F., 2001 Inferences from spatial population genetics, pp. 179-212 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP, and C. CANNINGS. John Wiley & Sons, Chichester, UK.

SINGH, R. S., M. CHOUDHARY, and J. R. DAVID, 1987 Contrasting patterns of geographic variation in the cosmopolitan sibling species Drosophila melanogaster and D. simulans. Biochem. Genet. 25:27-40.

SLATKIN, M., 1991 Inbreeding coefficients and coalescence times. Genet. Res. 58:167-175.

STROBECK, C., 1983 Expected linkage disequilibrium for a neutral locus linked to a chromosomal arrangement. Genetics 103:545-555.

STROBECK, C., 1987 Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149-153.

TAKAHATA, N., 1988 The coalescent in two partially isolated diffusion populations. Genet. Res. 52:213-222.

TSAKAS, S. and C. B. KRIMBAS, 1976 Testing the heterogeneity of F values: a suggestion and a correction. Genetics 84:399-401.

WEIR, B. S. and C. C. COCKERHAM, 1984 Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.

WRIGHT, S., 1951 The genetical structure of populations. Ann. Eugen. 15:323-354.



