Abstract
We investigate the usefulness of analyses of population differentiation between different ecological types, such as host races of parasites or source and sink habitats. To that aim, we formulate a model of population structure involving two classes of subpopulations found in sympatry. Extensions of previous results for Wright's F-statistics in island and isolation-by-distance models of dispersal are given. It is then shown that sources and sinks cannot in general be distinguished by F-statistics or by their gene diversities. The excess differentiation between two partially isolated classes with respect to differentiation within classes is shown to decrease with distance, and for a wide range of parameter values it should be difficult to detect. In the same circumstances little differentiation will be observed in “hierarchical” analyses between pools of samples from each habitat, and differences between levels of differentiation within each habitat will only reflect differences between levels of gene diversity within each habitat. Exceptions will indicate strong isolation between the different classes or habitat-related divergent selection.
COMPARISONS of population structure between host races of parasites or other behavioral ecotypes have been conducted to evaluate the potential of such ecological differentiation as a factor of genetic isolation and sympatric speciation (e.g., Feder et al. 1990; Ross and Shoemaker 1993; Duffy 1996; Gibbs et al. 1996). Similar comparisons have also been conducted to evaluate directional “gene flow” from one environment to the other, or to test for “source-sink” functioning (e.g., Dias et al. 1996; Hamilton 1997; Stanton et al. 1997). Source habitats are habitats in which individuals contribute more to the future population than the average individual does, and sink habitats are habitats where the reverse occurs (see Dias 1996, for review).
Attempts to estimate gene flow between the different ecotypes are often based on “hierarchical” analyses of differentiation measured by Wright's F-statistics between pools of samples from each habitat (Wright 1969), or other forms of comparisons of F-statistics. However, it is not clear what information is brought by such comparisons. Although models of population structure involving different types of individuals or subpopulations have been discussed by several authors, one of the few theoretical works explicitly addressing the biological interpretation of differentiation between two distinct groups of subpopulations is the hierarchical island model of Slatkin and Voelm (1991). In this article, we formulate a model of population differentiation between sympatric subpopulations of two different types. The analysis of this model makes it possible to identify the demographic parameters that determine the patterns of differentiation observable by comparisons of gene diversities or of F-statistics. Then it is possible to assess whether sources and sinks can be distinguished by their levels of gene diversity or by the relative levels of population differentiation between sources, between sinks, and between sources and sinks. It is also possible to assess whether partial genetic isolation between different ecological types can be detected by hierarchical analyses.
The comparisons of population structure for different classes of individuals are often complicated by isolation by distance (e.g., Zink and Barrowclough 1984; Duffy 1996; Hamilton 1997; Stanton et al. 1997). Even if migration occurs preferentially within each class rather than between them, it might be expected that neighboring subpopulations from different classes are less differentiated than more distant subpopulations in the same class. Thus, it is not obvious whether the habitat-related variation in differentiation at a given geographical distance can be interpreted independently of distance. For this reason isolation by distance will be considered in the models. For simplicity, we first present results for island models of dispersal, then consider how they generalize to models of isolation by distance.
THE MODEL
We consider a population on a discrete space, evolving at discrete time intervals. Subpopulations are found at n_{x} × n_{y} positions on a two-dimensional torus. If n_{y} = 1, this is the one-dimensional model on a circle. Two different classes of haploid individuals are located at each node of the lattice: they form two haploid subpopulations of size 2N_{1} and 2N_{2} (see Figure 1). Thus, the two types of subpopulations are assumed to be homogeneously distributed on the same geographical area. The total population size is 2(N_{1} + N_{2})n_{x}n_{y}. The model assumes discrete generations. From time t to t + 1, the order of events is reproduction, mutation, and migration. These events are described in the reverse order, going backward in time.
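As a minimal sketch of the lattice described above, the following Python helpers enumerate the nodes of an n_{x} × n_{y} torus, compute the total population size, and wrap a displacement around the torus. The function names are illustrative and do not come from the article, and taking coordinates modulo the lattice dimensions is an assumption of this sketch.

```python
from itertools import product


def lattice_nodes(nx, ny=1):
    """All node coordinates of an nx-by-ny torus (ny = 1 gives a circle)."""
    return list(product(range(nx), range(ny)))


def total_population_size(N1, N2, nx, ny=1):
    """Total size: one subpopulation of each class (2*N1 and 2*N2) at every node."""
    return 2 * (N1 + N2) * nx * ny


def wrapped_displacement(r_from, r_to, nx, ny=1):
    """Displacement between two nodes, each coordinate taken modulo the lattice size."""
    return ((r_to[0] - r_from[0]) % nx, (r_to[1] - r_from[1]) % ny)
```

On a circle of four nodes, for instance, moving from node 3 to node 0 is a displacement of one step, because the lattice has no edges.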
Migration: The position or movement of a gene on a lattice may be described by a pair of coordinates (r_{1}, r_{2}). Let q_{k,r′i,r*} be the probability that a gene found in a class-k individual in position r′ had an ancestor or was itself in a class-i individual in position r* before reproduction and migration. We assume that migration is homogeneous in space for given values of i and k; i.e., the probability q_{k,r′i,r*} depends only on values of the movement, r ≡ r′ – r*, and can be written q_{ki}(r). Let
The source-sink functioning was formalized by Pulliam (1988). In his definition, sources are compartments that show no net change in population size but send more emigrants than they receive immigrants, the reverse holding for sinks. Habitat 1 is a source when v_{21}N_{2} > v_{12}N_{1}, i.e., when v_{21}/v_{12} > N_{1}/N_{2}. There is source-sink functioning when (and only when) v_{21}/N_{1} and v_{12}/N_{2} differ. This corresponds to the case of “nonconservative” migration defined by Nagylaki (1982).
The distinction between sources and sinks can also be expressed in terms of reproductive values (e.g., Fisher 1958; Taylor 1990). The class reproductive values e_{1} and e_{2} are defined as the probabilities that the ancestral lineage of some gene was in class 1 or 2 in a distant past (obviously, e_{2} = 1 – e_{1}). They can be computed as components of a left eigenvector associated with the dominant eigenvalue of the matrix of class-transition rates (e.g., Taylor 1990, Equation 8). Likewise the individual reproductive values of the two classes are defined as the probabilities that the ancestral lineage of some gene was in a given class 1 or class 2 individual in a distant past. They can be computed as elements of a left eigenvector of a demographic “projection” matrix (e.g., Leslie 1945; Caswell 1989; Taylor 1990, Equation 7). In the present model they are obtained from the matrix of class-transition rates v_{ki}. One has e_{1} = v_{21}/(v_{21} + v_{12}), and the individual reproductive values are e_{1}/N_{1} and e_{2}/N_{2}, which are proportional to v_{21}/N_{1} and v_{12}/N_{2}. Hence from the above definition there is source-sink functioning when (and only when) the individual reproductive values differ.
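The relationship between the source criterion and the reproductive values can be checked numerically. The sketch below uses the closed form e_{1} = v_{21}/(v_{21} + v_{12}) given above. The function names are illustrative, and the 2 × 2 backward transition matrix mentioned in the comments (rows indexed by the current class, columns by the ancestral class) is this sketch's reading of the class-transition rates v_{ki}.

```python
def class_reproductive_values(v12, v21):
    """Class reproductive values (e1, e2), with e1 = v21 / (v21 + v12)."""
    e1 = v21 / (v21 + v12)
    return e1, 1.0 - e1


def individual_reproductive_values(v12, v21, N1, N2):
    """Individual reproductive values e1/N1 and e2/N2."""
    e1, e2 = class_reproductive_values(v12, v21)
    return e1 / N1, e2 / N2


def habitat_1_is_source(v12, v21, N1, N2):
    """Source criterion from the text: habitat 1 is a source when v21*N2 > v12*N1."""
    return v21 * N2 > v12 * N1


# Example values (hypothetical, chosen only for illustration).
v12, v21, N1, N2 = 0.01, 0.03, 100, 50
e1, e2 = class_reproductive_values(v12, v21)

# (e1, e2) is a left eigenvector, with eigenvalue 1, of the assumed backward
# matrix [[1 - v12, v12], [v21, 1 - v21]]: its first component is unchanged.
first_component_after_one_step = e1 * (1.0 - v12) + e2 * v21

i1, i2 = individual_reproductive_values(v12, v21, N1, N2)
print(habitat_1_is_source(v12, v21, N1, N2), i1 > i2)
```

With these values habitat 1 satisfies the source criterion and also has the higher individual reproductive value, as the equivalence stated above requires.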
Dispersal is assumed independent and identically distributed in each dimension. Let z ≡ (x, y), and for any function f(r) consider the Fourier series
Let
Mutation: We consider the infinite allele model. See Rousset (1996) for a discussion of the relationship between F-statistics for this model and for more general mutation models.
We assume that mutation acts independently on each gene lineage and that the mutation rate u is identical in both types of subpopulations. Let
Reproduction: We assume that the probability that two gametes observed in a class-i subpopulation before migration are copies of the same gene from an individual in this subpopulation is 1/(2N_{i}) < 1. In other words, the probability of coalescence is 1/(2N_{i}). Thus
Probabilities of identity are obtained from their generating functions by Fourier inversion techniques (e.g., Gasquet et al. 1998). For a finite lattice of size n_{x} × n_{y}, the inverse Fourier transform of some function f is
RESULTS
This section presents analytical approximations for gene diversities and differentiation between pairs of subpopulations under the model presented above. These approximations are derived in the context of a more general model in Rousset (1999). Their accuracy will be assessed through exact numerical computations, using Equations 9 and 10. Mathematica (Wolfram 1991) has been used in all computations.
Infinite island model: Consider n populations on a circle. Let the dispersal rate
In the one-class infinite island model, the usual measure of population structure, F_{ST}(r) ≡ (Q_{11;0} – Q_{11;r})/(1 – Q_{11;r}), equals Q_{11;0} under the infinite allele model because the probabilities of identity between different subpopulations, Q_{11;r}, are then zero. We have a similar result in the two-class model, if we consider the parameters
In an island model with small values of dispersal and transition rates, we expect that Q_{ii;0} ≈ 1/(1 + 4N_{i}M_{i}), where M_{i} is the probability that a parent was in another subpopulation (Beaumont and Nichols 1996). As in the one-class model (Hudson 1990), Q_{ii;0} is approximately the probability that, going backward in time, there is coalescence of two gene lineages within a class-i subpopulation before one of their ancestral lineages emigrates from this subpopulation (i.e., immigrates, going forward in time). For example, with small values of dispersal m_{22} and transition rate v_{21},
In general, some detailed knowledge of the demography of the population would be required to estimate w* and F_{w*} because their definitions depend on the reproductive values e_{i}, in contrast to the parameters defined for pairs of subpopulations (Equation 11), which can be estimated using data from pairs of subpopulations. Parameters involving all subpopulations in the population are not considered, in part because their estimation requires, in principle, having subsamples from all subpopulations. However, “hierarchical” analyses based on the comparison of probabilities of identity within and between pools of subsamples (Wright 1969) are discussed. An analysis based on estimating F_{w*} is not a hierarchical analysis because F_{w*} is defined from the average of the probabilities Q_{w*;0} within each subpopulation rather than from probabilities of identity computed after pooling genes from different subpopulations.
Since F_{ii;r} = Q_{ii;0} in the infinite island, infinite allele model, the comparison of the relative values of differentiation in each class is equivalent to the comparison of gene diversities. Intuitively, the probability of identity will be higher in the class with a smaller subpopulation size or in the class with “lower” dispersal, and the differences between the two classes will be stronger the less they exchange migrants. Higher transition rates have the obvious effect of homogenizing the gene diversities.
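This intuition follows from the competing-events argument behind Q_{ii;0} ≈ 1/(1 + 4N_{i}M_{i}): going backward in time, two lineages either coalesce within their subpopulation or one of them leaves it first. The sketch below is a simplified illustration, not the article's derivation. It ignores mutation and assumes that in each generation both lineages stay with probability (1 – M)^2 and, if they stay, coalesce with probability 1/(2N). The function names are hypothetical.

```python
def identity_exact(N, M):
    """Probability that two lineages coalesce before either one leaves the
    subpopulation, under the per-generation scheme assumed in this sketch."""
    c = 1.0 / (2 * N)          # coalescence probability when both lineages stay
    stay = (1.0 - M) ** 2      # probability that neither lineage leaves
    # Geometric series: coalescence occurs in the first generation that is
    # not a "stay without coalescing" generation.
    return stay * c / (1.0 - stay * (1.0 - c))


def identity_approx(N, M):
    """Small-rate approximation 1 / (1 + 4 N M) quoted in the text."""
    return 1.0 / (1.0 + 4 * N * M)


# Smaller subpopulations or lower migrant fractions give higher identity.
for N, M in [(50, 0.01), (25, 0.01), (50, 0.005)]:
    print(N, M, round(identity_exact(N, M), 4), round(identity_approx(N, M), 4))
```

For small M the two functions agree closely, and both increase when N or M decreases, which is the pattern described in the paragraph above.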
The individual effects of differences in subpopulation sizes or transition rates can be investigated in the symmetric dispersal model, in which all conditional dispersal distributions (i.e., all m_{ij}'s) are identical (Figure 2 and appendix). Note that the assumption of symmetric dispersal does not constrain the class-transition rates to particular values. In this model, a large sink may have a higher gene diversity than a small source (e.g., top left of Figure 2), and in some cases even a small sink may have a higher gene diversity than a large source (area below the previous one, where 1 is source, N_{2} < N_{1}, and Q_{11;0} > Q_{22;0}).
When the two classes differ by a single parameter value (N_{1} ≠ N_{2} or v_{21} ≠ v_{12}), the consequences are relatively easily predicted (Figure 2). If N_{2}v_{21} = N_{1}v_{12}, there is no source and sink and Q_{11;0} < Q_{22;0} for v_{12} < v_{21}. Then the class that receives a higher fraction of its genes from the other class has a lower gene diversity (Equation A4). The fact that gene diversities differ in that case suffices to show that sources and sinks cannot be distinguished simply by their genetic diversities. This is easily seen when all m's and v's are small, in which case the relative magnitude of the gene diversities will depend on the relative magnitude of N_{2}(v_{22}m_{22} + v_{21}) vs. N_{1}(v_{11}m_{11} + v_{12}) (this follows from the argument that led to Equation 12). By contrast, which class is source and which is sink will depend on the relative magnitude of N_{2}v_{21} vs. N_{1}v_{12}. This is independent of the dispersal rates from a subpopulation into another subpopulation of the same class, m_{11} and m_{22}.
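The independence of source-sink status from within-class dispersal can be illustrated with the small-rate island approximation. The sketch below takes M_{i} = v_{ii}m_{ii} + v_{ij} with v_{ii} = 1 – v_{ij}, which is this sketch's reading of the expressions N_{2}(v_{22}m_{22} + v_{21}) and N_{1}(v_{11}m_{11} + v_{12}) above. The function names and parameter values are hypothetical.

```python
def migrant_fraction(m_within, v_out):
    """Assumed M_i = v_ii * m_ii + v_ij, with v_ii = 1 - v_ij."""
    return (1.0 - v_out) * m_within + v_out


def identity_within(N, m_within, v_out):
    """Small-rate approximation Q_ii;0 ~ 1 / (1 + 4 N_i M_i)."""
    return 1.0 / (1.0 + 4 * N * migrant_fraction(m_within, v_out))


def habitat_1_is_source(v12, v21, N1, N2):
    """Habitat 1 is a source when N2*v21 > N1*v12; m11 and m22 play no role."""
    return N2 * v21 > N1 * v12


N1 = N2 = 50
v12, v21 = 0.005, 0.01
m22 = 0.02

q22 = identity_within(N2, m22, v21)
for m11 in (0.001, 0.05):
    q11 = identity_within(N1, m11, v12)
    # Source status is the same for both values of m11,
    # but the ordering of Q_11;0 and Q_22;0 is reversed.
    print(m11, habitat_1_is_source(v12, v21, N1, N2), q11 > q22)
```

Habitat 1 is the source in both runs, yet it has the higher probability of identity (lower gene diversity) when m_{11} is small and the lower one when m_{11} is large.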
Isolation by distance: In the isolation-by-distance models, we assume that each axial dispersal distribution has a finite second moment (or average squared dispersal distance)
We determine the increase of differentiation with distance, and we consider how the results of the infinite island model extend to models of isolation by distance. In particular we investigate whether differences in gene diversities determine differences between levels of within-class differentiations as they do in the infinite island model, and whether between-class differentiation is an average of within-class differentiations. We also investigate to what extent the result of the infinite island model, F_{w*} ≈ 1/(1 + 4N_{e}m_{e}), approximates the value of differentiation between adjacent populations under isolation by distance, as it does in one-class models with small migration rates (e.g., Kimura and Maruyama 1971).
In the one-class, one-dimensional isolation-by-distance model
In two dimensions
We continue with the one-dimensional model for exposition, with results very similar to those detailed below being obtained in the two-dimensional model. Numerical examples are given for both cases.
Some aspects of differentiation that might in principle be used to investigate the divergence between the two habitats, such as the relative strength of within-habitat and between-habitat differentiations, are a priori dependent on the geographical distance between populations, and their properties may not be summarized in terms of the demographic parameters independently of distance. Then, measures whose values are either less dependent on geographical distance, or have a known relationship to it, should be considered.
Consider first the following measures of population structure,
We now seek parameters that quantify the differences between a_{11}, a_{12}, and a_{22} but the value of which does not depend on the distance-dependent term
Within-class differentiation: For the comparison of differentiations within each class, a parameter whose value does not depend on the term
Under the island model with small dispersal and transition rates, if F_{22} ≫ F_{11} ≈ 0, then D = R – 1 ≈ –F_{22}, hence
Between-class differentiation: For the comparison of between-class differentiation relative to within-class differentiation, we consider Z(r) ≡ (1 + R)a_{12}(r) – a_{11}(r) – Ra_{22}(r). In essence this is a comparison of between-class differentiation a_{12}(r) to the average within-class differentiation (a_{11}(r) + a_{22}(r))/2, corrected by the ratio R to make it approximately independent of distance. When r → ∞,
Gene diversities: Generally, the class with the higher within-class differentiation is the one with the lower gene diversity (Figure 3), although some exceptions could occur at short distances. Therefore, the relative levels of within-class differentiation convey little additional information relative to that given by R, i.e., by the relative values of gene diversities in each class. As in the island model, we should understand how the different demographic parameters affect the relative values of gene diversities in each class. For the symmetric dispersal model, where all dispersal distributions are identical, the qualitative results noted before and illustrated by Figure 2 for the relationship between subpopulation sizes, transition rates, and gene diversities in the symmetric island model, hold true in general (see appendix). The individual effect of differences in dispersal distributions cannot be easily summarized. Even in the one-class model the important parameters of the dispersal distribution determining gene diversity have no simple interpretation (Rousset 1997).
Increase of differentiation with distance: In the one-dimensional isolation-by-distance model, the increase of a_{ijklw}* with distance is
Hierarchical analyses: In the framework of Wright's F-statistics, such analyses compare the probability of identity within pools of subsamples, averaged over the different classes, to the probability of identity between pools. When an increasing number of subsamples are pooled, each within-pool probability approaches an average of the Q_{ii;r}'s, and the probability of identity between pools is close to Q_{12;r}, for diverse values of r > 0. The hierarchical analysis generally conducted in experimental studies compares the genetic variation within pools (1 – Q_{1/2;r} if equal weight is given to both classes) to the genetic variation between pools (1 – Q_{12;r}). One parameter describing this comparison is
Numerical examples: In Table 1, the numerical accuracy of the different analytical approximations is investigated. The following points may be seen from this table. For the island model, the number of migrants into subpopulations averaged over the different classes, M ≡ 2N_{1}(v_{12} + v_{11}m_{11}) + 2N_{2}(v_{21} + v_{22}m_{22}), is seen to be a very inaccurate descriptor of differentiation in comparison with approximation (14) [Table 1, cases h–k; in this table, the value of F_{w*}/(1 – F_{w*}) is compared to 1/(4N_{e}m_{e}) and to 1/(2M)]. This is in agreement with previous results (Gaggiotti 1996).
The island model also suggests an approximation for the measure D of differences between genetic differentiations within each class (Equation 28). This approximation is correct within a factor of two in examples where N_{2} ≪ N_{1} (N_{2} = 10, cases a, f–i, and k in Table 1), including some cases not assuming an island model of dispersal.
In isolationbydistance models, in principle D(r) approaches R – 1 more slowly when v_{21} + v_{12} decreases, and some information could be obtained from the rate of approach. However, even under the simple case of symmetric dispersal, this approach is a complex function of the demographic parameters. In the numerical examples we investigated whether this approximation is accurate at distances larger than
We have seen that the between-class differentiation approaches an average of within-class differentiations as distance increases. This is quantified by the approach of Z(r) to 0, which is also slower when v_{21} + v_{12} decreases. Figure 3 shows that the excess differentiation between classes with respect to this average may be difficult to detect, even for transition rates ≈1/50. Differentiation may be stronger between neighboring sources and sinks than between neighboring sinks or neighboring sources, yet this will not be the case for more distant pairs of subpopulations.
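The measure Z(r) ≡ (1 + R)a_{12}(r) – a_{11}(r) – Ra_{22}(r) can be read as (1 + R) times the gap between a_{12}(r) and an R-weighted average of the within-class values. The sketch below makes that algebra explicit. It treats a_{11}, a_{12}, a_{22}, and R as given numbers (their defining equations are not reproduced here), and the function names and example values are hypothetical.

```python
def weighted_within_class_average(a11, a22, R):
    """R-weighted average of within-class differentiations: (a11 + R*a22) / (1 + R)."""
    return (a11 + R * a22) / (1.0 + R)


def excess_between_class(a11, a12, a22, R):
    """Z = (1 + R)*a12 - a11 - R*a22, as defined in the text."""
    return (1.0 + R) * a12 - a11 - R * a22


a11, a22, R = 0.02, 0.06, 0.5
average = weighted_within_class_average(a11, a22, R)

# Z vanishes exactly when a12 equals the weighted average, and is positive
# when between-class differentiation exceeds it.
print(excess_between_class(a11, average, a22, R))
print(excess_between_class(a11, 0.05, a22, R))
```

A value of Z close to zero therefore means that between-class differentiation is indistinguishable from the weighted within-class average, which is the situation described above for distant pairs of subpopulations.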
Finally the increase of differentiation with distance has been approximated by the values of the derivatives discussed above (Equation 30 for the one-dimensional model), which are asymptotic values at large distances. The lower v_{21} + v_{12} is, the slower the asymptotic values are approached as distance increases. At short distances, the increase of a within-class differentiation may depend mainly on local processes rather than on the effective parameters. As for D, in the numerical examples we investigated whether the asymptotic values are accurate at the distance
DISCUSSION
The analysis of the models presented here has shown that genetic differentiation is approximately described by “effective” parameters N_{e}, σ^{2}_{e}, and m_{e} and by the relative values of gene diversities in the two types of subpopulations. N_{e} may be understood as the average rate of coalescence of pairs of ancestral lineages (Nordborg 1997; Nagylaki 1998; Rousset 1999), and m_{e} and σ^{2}_{e} may be understood as average dispersal rates of ancestral lineages. The other aspects of the results have implications both for inferences about source-sink functioning and for hierarchical analyses of differentiation.
The relative values of gene diversities in each habitat have no simple relationship with the reproductive value of individuals in each habitat, because the gene diversities depend among other factors on the migration rates between subpopulations of the same types, while the reproductive values of each type are independent of these migration rates. Therefore, source and sink habitats cannot be distinguished by comparison of their gene diversities.
Further, these models show that the differentiation between two habitats is generally intermediate between the genetic differentiations within each habitat, and that genetic differentiation will be generally higher in the habitat with the lower gene diversity than in the habitat with the higher gene diversity. It follows that source and sink habitats cannot be distinguished by analyses of population structure by F-statistics.
The models also show that a partial isolation between the two classes will often not result in an excess differentiation between them and will not be detected by a hierarchical analysis. Some complications may in principle result from isolation by distance: if the two different classes show a strong but incomplete isolation, between-class differentiation may exceed within-class differentiation at short distances, but this excess differentiation will disappear at larger distances. In practice, no excess differentiation between classes may be detectable at distances only a few times the effective dispersal rate
On the other hand, divergent selection in the two classes on the loci considered, or on closely linked ones, will result in between-class differentiation being higher at all distances than the average of within-class differentiations at all distances. Examples of populations made of two types of subpopulations in the same geographical area and showing such a pattern of differentiation include host races of the fruit fly Rhagoletis pomonella, which show a substantial excess differentiation between races at some loci (Feder et al. 1990) in spite of large transition rates (6% in Feder et al. 1994). There is independent evidence that selection is responsible for the patterns of differentiation at these loci (Feder et al. 1997). Other examples may be found in Ross and Shoemaker (1993) or Chevillon et al. (1998). Because the reverse observation (that between-class differentiation should be intermediate between within-class differentiations) is a robust expectation of the neutral model for a wide range of parameter values, the comparison of within- and between-class differentiations may be an efficient way to detect habitat-related divergent selection at some loci under investigation or at linked loci.
Acknowledgments
I thank M. Lascoux, Y. Michalakis, and O. Ronce for comments on various versions of this manuscript. The reviewers of a previous manuscript and J. Hey also provided some very helpful comments. This work was supported by the Service Commun de Biosystématique de Montpellier and grants from the Région Languedoc-Roussillon (963223) and Centre National de la Recherche Scientifique (ACC SV3 9503037). This is paper 98119 of the Institut des Sciences de l'Évolution.
APPENDIX
Let λ_{1}, λ_{2}, and λ_{3} be the eigenvalues of A. Consider the spectral representation of P (e.g., Karlin and Taylor 1975): P = XDX^{–1}, where D is a diagonal matrix with elements
If all dispersal distributions are identical with characteristic function ψ, then it is found that for λ ≡ (1 – v_{21} – v_{12})ψ^{2}, d_{j} = γλ^{j–1}/(1 – γλ^{j–1}), and
From Equations 5 and 9, the ratio of gene diversities is
In the symmetric dispersal model Z(r) simplifies to
Footnotes

Communicating editor: J. Hey
 Received April 28, 1998.
 Accepted September 30, 1998.
 Copyright © 1999 by the Genetics Society of America