Abstract
The interplay between population structure and natural selection is an area of great interest. It is known that certain types of population subdivision do not alter fixation probabilities of selected alleles under genic, frequency-independent selection. In the presence of dominance for fitness or frequency-dependent selection these same types of subdivision can have large effects on fixation probabilities. For example, the barrier to fixation of a fitter allele due to underdominance is reduced by subdivision. Analytic results presented here relate a subdivided population that conforms to a finite island model to an approximately equivalent panmictic population. The size of this equivalent population is different from (larger than) the actual size of the subdivided population. Selection parameters are also different in the hypothetical equivalent population. As expected, the degree of dominance is lower in the equivalent population. The results are not limited to dominance but cover any form of polynomial frequency dependence.
NATURAL populations are likely to be characterized by some kind of population structure. The population-genetic and evolutionary consequences of such structure have been investigated since the beginnings of population genetics (Wright 1931, 1939, 1943). Much work has centered on the amount of polymorphism maintained in a subdivided population (Slatkin 1977; Maruyama and Kimura 1980; Nagylaki 1998) and on the distribution of allele frequencies (Maruyama 1972a,b,c). A closely related topic is the effective size of a subdivided population (Wright 1939; Maruyama 1970a; Slatkin 1981, 1991; Takahata 1991; Nei and Takahata 1993; Santiago and Caballero 1995; Whitlock and Barton 1997; Wang and Caballero 1999).
One area of interest is the interaction of population structure with selection. Maruyama (1970b, 1974) has shown that fixation probabilities are unaffected by population subdivision under simple genic selection and fairly general conditions of migration patterns. Under a finite island model of subdivision with a large number of demes, the trajectory of allele frequency over time is approximately the same as that in a panmictic population with a different size and different selection coefficient (Cherry and Wakeley 2003).
If the assumption of genic selection is relaxed, the problem becomes more difficult. Subdivision affects fixation probabilities when there is dominance for fitness or when the relative fitnesses of genotypes depend on their frequencies. An area of particular interest has been the case of underdominance (heterozygote disadvantage), which can serve as a barrier to the fixation of a fitter genotype. Subdivision reduces this barrier, and this effect has been interpreted as a simple case of Wright’s “shifting balance” theory (Lande 1985), a theory that has been a topic of debate for many decades. Formally equivalent to dominance is a form of locally frequency-dependent selection. For example, selection that favors an allele when it is locally common, but disfavors it when it is rare, is similar to underdominance, and underdominance may be considered a case of positive frequency dependence. Frequency dependence may have any form; it need not be restricted to the linear case that is formally equivalent to dominance.
Analytic results, including expressions for fixation probabilities, have been obtained for selection with dominance in the low-migration limit (Slatkin 1981; Lande 1985). Simulations have provided results for intermediate cases, where the migration rate is neither very low nor sufficiently high to make subdivision irrelevant (Slatkin 1981; Spirito et al. 1993). Here I present analytic results that are not restricted to the weak-migration limit for a finite island model of subdivision. These results relate the subdivided population to a hypothetical equivalent panmictic population that differs from the actual population in both its size and its selection parameters. The selection parameter h, which is a measure of dominance for fitness (or degree of frequency dependence), is in effect moved toward ½ (additive fitness or no frequency dependence) by subdivision. This result can be generalized to more complicated forms of frequency dependence. The existence of an equivalent panmictic population allows application of established diffusion results to the subdivided population. Quantities such as fixation probabilities and expected times to fixation can then be calculated.
MODELS AND RESULTS
Consider a finite island model of population structure, in which a finite number of demes (“islands”) exchange migrants with one another. The population consists of D demes, each containing N haploid or N/2 diploid individuals. Each generation involves migration, selection, and genetic drift. The order in which these occur makes little difference when selection coefficients and migration rates are small compared to unity, so this order need not be specified. The migration rate (the expected fraction of genes that come from outside the deme in any generation) is given by m, and selection operates in a manner to be specified below.
Three processes, selection, drift, and migration, alter the frequency of an allele in a deme each generation. Migration is symmetric in the island model, so it does not affect the overall allele frequency x (it does not affect the mean, and its effect on the variance is negligible). The effects of selection and drift are approximately additive. The mean change in x results from selection, whereas the variance is the result of genetic drift.
Dominance: Cherry and Wakeley (2003) analyzed the case of genic selection in an island model of subdivision. This analysis can be extended to cases where there is dominance for fitness, including over- and underdominance. These cases are equivalent, in terms of their usual diffusion approximations, to a certain form of frequency-dependent selection. Diffusion results for dominance in a panmictic population are well established. I show that, under certain conditions, the diffusion for a subdivided population is equivalent to that for some panmictic population. As in the case of genic selection, this equivalent panmictic population differs from the subdivided population not only in size, but also in fitness parameters.
Suppose that the fitnesses of genotypes aa, Aa, and AA are 1, 1 + 2hs, and 1 + 2s. The fitness difference between the two alleles is 2hs when paired with an a allele and 2s - 2hs when paired with an A allele. Thus the mean selective difference between the two alleles ŝ, which might be called the marginal selection coefficient (by analogy to the marginal fitness), depends on the allele frequency x and is given by ŝ(x) = 2hs(1 - x) + (2s - 2hs)x = k0 + k1x, where k0 = 2hs and k1 = 2s - 4hs.
Now consider a subdivided population. We are interested in the mean and variance of the change in overall allele frequency from one generation to the next, M_Δx̄ and V_Δx̄. Expressions for these as functions of x̄ will allow use of diffusion approximations. To obtain these we need to consider the probability distribution of the allele frequency in the ith deme, xi. If drift within a deme is strong compared to selection, the population as a whole in effect serves in the short term as a source population for any subpopulation, with constant allele frequency x̄. Under these conditions the distribution of within-deme allele frequency is a beta distribution whose probability density function is f(xi) = {Γ(a + b)/[Γ(a)Γ(b)]} xi^(a - 1) (1 - xi)^(b - 1), with a = 2Nmx̄ and b = 2Nm(1 - x̄).
The beta-distribution approximation is valid when selection is weak compared to drift in a subpopulation, in the sense that |ŝ| ⪡ 1/N. This condition must hold for all allele frequencies. In the present case, ŝ(xi) = 2hs(1 - xi) + (2s - 2hs)xi takes on its most extreme values at allele frequencies of zero or one, where it equals 2hs and 2(1 - h)s, respectively. Thus the constraint on the strength of selection becomes |2Nhs| ⪡ 1 and |2N(1 - h)s| ⪡ 1.
The stochastic change in allele frequency in a deme comes from the binomial sampling of alleles. Thus the variance of the change in allele frequency in the ith deme is ∼(1/N)xi(1 - xi). Using expressions for the first two moments of the beta distribution we can show that the variance of the change in population-wide frequency x̄ is given by V_Δx̄ = [2Nm/(2Nm + 1)] x̄(1 - x̄)/(ND) = x̄(1 - x̄)/Ne, where Ne = ND[1 + 1/(2Nm)].
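The mean change, on the other hand, is affected by the more complex selection scheme. For the ith deme, the change in allele frequency xi due to selection is approximately ŝ(xi)xi(1 - xi) = (k0 + k1xi)xi(1 - xi). We are interested in the mean of this quantity. Using the approximation that the xi are beta distributed with a = 2Nmx̄ and b = 2Nm(1 - x̄), along with expressions for the first three moments of a beta distribution, we can show that the mean change in x̄ is given by M_Δx̄ = [2Nm/(2Nm + 1)] x̄(1 - x̄) [k0 + k1(2Nmx̄ + 1)/(2Nm + 2)]. This has the panmictic form ŝe(x̄)x̄(1 - x̄) with ŝe(x̄) = k0e + k1e x̄, where k0e = [2Nm/(2Nm + 1)][k0 + k1/(2Nm + 2)] and k1e = [2Nm/(2Nm + 1)][Nm/(Nm + 1)]k1. In terms of s and h, the effective values are se = [2Nm/(2Nm + 1)]s and he = ½ + [Nm/(Nm + 1)](h - ½).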
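The beta-moment calculation can be checked numerically. The following Python sketch is an illustration only and is not code from the original study: the function names, the sample size, and the Monte Carlo check itself are assumptions introduced here. It evaluates the closed-form mean and variance of the change in the overall allele frequency implied by the beta moments for ŝ(x) = k0 + k1x, and compares them with averages over simulated within-deme frequencies.

```python
import numpy as np


def closed_form_moments(xbar, k0, k1, N, m, D):
    """Mean and variance of the change in overall frequency (beta approximation)."""
    alpha = 2.0 * N * m                      # a + b of the beta distribution
    one_minus_f = alpha / (alpha + 1.0)      # 2Nm / (2Nm + 1)
    mean = one_minus_f * xbar * (1 - xbar) * (
        k0 + k1 * (alpha * xbar + 1) / (alpha + 2)
    )
    var = one_minus_f * xbar * (1 - xbar) / (N * D)
    return mean, var


def monte_carlo_moments(xbar, k0, k1, N, m, D, n_samples=1_000_000, seed=0):
    """Same two quantities, estimated by averaging over beta-distributed x_i."""
    rng = np.random.default_rng(seed)
    alpha = 2.0 * N * m
    x = rng.beta(alpha * xbar, alpha * (1 - xbar), size=n_samples)
    mean = np.mean((k0 + k1 * x) * x * (1 - x))   # E[s_hat(x) x (1 - x)]
    var = np.mean(x * (1 - x)) / (N * D)          # E[x (1 - x)] / (N D)
    return mean, var


if __name__ == "__main__":
    # h = 3, s = 1e-4 gives k0 = 2hs = 6e-4 and k1 = 2s - 4hs = -1e-3.
    args = (0.3, 6e-4, -1e-3, 100, 0.01, 100)
    print("closed form:", closed_form_moments(*args))
    print("monte carlo:", monte_carlo_moments(*args))
```

The two estimates should agree to within Monte Carlo error as long as the beta approximation itself is taken as given; the check says nothing about how well that approximation holds in a real population.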
Alternative parameterizations: There are several natural parameterizations of the selection model used above. The parameterization involving s and h cannot be applied to symmetric over- or underdominance: s would have to be zero for symmetry, while hs would have to be nonzero for over- or underdominance. One parameterization that can represent the symmetric cases is that involving k0 and k1. A more commonly used notation for dominance gives the fitnesses of aa, Aa, and AA as 1, 1 + s1, and 1 + s2. Let s1e and s2e be the effective values of s1 and s2. Using the fact that s1 = 2hs and s2 = 2s, we obtain s1e = [2Nm/(2Nm + 1)](2Nms1 + s2)/(2Nm + 2) and s2e = [2Nm/(2Nm + 1)]s2.
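In the symmetric case s2 = 0. If we let s′ = s1 then both homozygotes have fitness 1 and the heterozygote has fitness 1 + s′. If s′e is the effective value of s′, then setting s2 = 0 in the expression for s1e gives s′e = [2Nm/(2Nm + 1)][Nm/(Nm + 1)]s′.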
More complicated forms of selection: The analysis given above applies when the frequency dependence of the selective difference between the alleles has the form ŝ(x) = k0 + k1x. This case includes an arbitrary degree of dominance when the relative fitnesses of the diploid genotypes do not depend on their frequencies. This is not the only possible form of frequency dependence; ŝ(x) can in principle have any form whatsoever. Can we obtain diffusion approximations for more complicated forms of ŝ(x) when the population is subdivided, again assuming that |Nŝ(x)| is always small compared to 1?
The results obtained above are consequences of the forms of the first three moments of the beta distribution. The useful properties of these moments extend to the higher-order moments. These properties permit the analysis of cases where the frequency dependence is described by any polynomial, i.e., where ŝ(x) = P(x) = k0 + k1x + k2x^2 + ... + knx^n. We see that the diffusion for a subdivided population with this form of selection is equivalent to that for a panmictic population with a different size and with ŝe(x) = Pe(x) = k0e + k1e x + k2e x^2 + ... + kne x^n. Simple dominance is a special case of this, with the degree of the polynomials P and Pe equal to 1.
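The mapping from P to Pe can be sketched in code. The Python below is a minimal illustration under the beta approximation, not the author's implementation; the function names are hypothetical. It uses the standard beta moment E[x^(i+1)(1 - x)] with a = 2Nmx̄ and b = 2Nm(1 - x̄), which factors as x̄(1 - x̄)[2Nm/(2Nm + 1)] times the product over r = 1, ..., i of (2Nmx̄ + r)/(2Nm + r + 1), so each coefficient ki contributes a polynomial of degree i in x̄. The effective size follows from the variance under the same approximation.

```python
import numpy as np


def effective_coefficients(k, N, m):
    """Map [k0, k1, ..., kn] to [k0e, k1e, ..., kne] (ascending powers of x)."""
    alpha = 2.0 * N * m
    one_minus_f = alpha / (alpha + 1.0)          # 2Nm / (2Nm + 1)
    k = np.asarray(k, dtype=float)
    k_eff = np.zeros(len(k))
    factor = np.array([1.0])                     # running product, degree i
    for i, ki in enumerate(k):
        if i > 0:
            # Multiply by (alpha * x + i) / (alpha + i + 1).
            step = np.array([i, alpha]) / (alpha + i + 1.0)
            factor = np.convolve(factor, step)
        k_eff[: len(factor)] += ki * factor
    return one_minus_f * k_eff


def effective_population_size(N, D, m):
    """Ne implied by the variance under the beta approximation."""
    return N * D * (1.0 + 1.0 / (2.0 * N * m))


if __name__ == "__main__":
    s, h = 1e-4, 3.0
    k = [2 * h * s, 2 * s - 4 * h * s]           # linear case: k0, k1
    k0e, k1e = effective_coefficients(k, N=100, m=0.01)
    se = k0e + k1e / 2                           # since k0 + k1/2 = s
    print("se =", se, "he =", k0e / (2 * se))
    print("Ne =", effective_population_size(100, 100, 0.01))
```

For a first-degree polynomial this reduces to the dominance case, which gives a quick consistency check: the deviation of the effective h from ½ is scaled by Nm/(Nm + 1).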
Recall that we are interested in the expected value of ŝ(x)x(1 - x) when x has a beta distribution with parameters a = 2Nmx̄ and b = 2Nm(1 - x̄). For polynomial ŝ(x), ŝ(x)x(1 - x) is the sum of terms of the form ki x^(i+1)(1 - x). For any beta distribution with parameters a and b, E[x^(i+1)(1 - x)] = a(a + 1)...(a + i)b/[(a + b)(a + b + 1)...(a + b + i + 1)].
Figure 1. Predicted and observed fixation probabilities as functions of h and m. Predicted values of fixation probabilities (curves), relative to that for a neutral allele, are compared to values estimated by simulations (points) for a range of values of the dominance parameter h. In all cases the number of demes (D) is 100, the deme size (N) is 100, and s = 10^-4. The allele was initially present in a single copy. Results are presented for three migration rates: m = 0.001 (solid triangles and curve), m = 0.003 (open circles and curve), and m = 0.01 (solid diamonds and curve).
COMPUTER SIMULATIONS
To test the approximations used above, I have run computer simulations and compared the results to theoretical predictions. In these simulations frequency-dependent selection acts on a haploid population. Most of the simulations are for cases of linear frequency dependence and can also be interpreted in terms of diploidy with dominance. I utilize the parameterization involving s and h for these cases.
In these simulations the state of the population is represented by an array of D integers, each corresponding to a deme. Each integer indicates the number of copies of allele A in the deme and hence ranges from 0 to N. Each generation the new value for each deme is drawn from a binomial distribution. The index parameter n of this binomial (number of “trials”) is equal to N. The probability parameter p (probability of “success”) is determined by the current allele frequency in the deme xi, the population-wide mean allele frequency x̄, the migration rate m, and the selection parameters s and h. Let p̃ = (1 - m)xi + mx̄. This would be the expected allele frequency in the ith deme in the next generation if there were no selection. The (marginal) selection coefficient is given by ŝ(p̃). Therefore p = (1 + ŝ)p̃/(1 + ŝp̃).
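The simulation scheme described above can be sketched as follows. This is an illustrative Python reconstruction from the description, not the original simulation code: the function names, the placement of the initial copies in the first deme, the random number generator, and the generation cap are all assumptions made here.

```python
import numpy as np


def marginal_s(x, s, h):
    """Marginal selection coefficient for the s, h parameterization."""
    return 2 * h * s * (1 - x) + (2 * s - 2 * h * s) * x


def simulate_once(N, D, m, s, h, rng, initial_copies=1,
                  max_generations=10_000_000):
    """Run one replicate; return True if allele A fixes, False if it is lost."""
    counts = np.zeros(D, dtype=np.int64)      # copies of A in each deme
    counts[0] = initial_copies                # assumption: all start in deme 0
    total = N * D
    for _ in range(max_generations):
        copies = counts.sum()
        if copies == 0:
            return False
        if copies == total:
            return True
        x = counts / N                              # within-deme frequencies
        p_tilde = (1 - m) * x + m * x.mean()        # after migration
        s_hat = marginal_s(p_tilde, s, h)           # selection evaluated at p_tilde
        p = (1 + s_hat) * p_tilde / (1 + s_hat * p_tilde)
        counts = rng.binomial(N, p)                 # drift: binomial sampling
    raise RuntimeError("no fixation or loss within max_generations")


def estimate_fixation_probability(n_runs, N, D, m, s, h, seed=0):
    """Fraction of replicates in which allele A reaches fixation."""
    rng = np.random.default_rng(seed)
    fixed = sum(simulate_once(N, D, m, s, h, rng) for _ in range(n_runs))
    return fixed / n_runs


if __name__ == "__main__":
    # Small, fast example; the parameter values are chosen only for speed.
    print(estimate_fixation_probability(2000, N=10, D=5, m=0.05, s=0.01, h=0.5))
```

Fixation probabilities for a single initial copy are small, so a large number of replicates is needed for a precise estimate; the replicate counts used in the original study are not stated in this section.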
Predictions of fixation probabilities follow from combination of the theory presented here with classical diffusion results. Kimura (1957, Equation 5.4) gave an expression for fixation probability in a panmictic Wright-Fisher population with an arbitrary form of frequency dependence. Replacement of the parameters in this expression with the effective values derived here (Equations 7 and 8), and numerical evaluation of the resulting expression, yields the desired numerical predictions. Analogous use of results of Kimura and Ohta (1969, Equation 12) yields predictions of mean times to fixation.
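The numerical evaluation can be sketched in Python. The code below uses the standard diffusion expression for fixation probability, u(p) = ∫0^p G(x)dx / ∫0^1 G(x)dx with G(x) = exp[-∫0^x 2M(y)/V(y)dy]; it is assumed here, not verified against the cited equation, that this is the form being evaluated. With M = ŝe(x)x(1 - x) and V = x(1 - x)/Ne, the exponent becomes -2Ne∫0^x ŝe(y)dy. The function name, grid size, and trapezoidal integration are choices made for this illustration.

```python
import numpy as np


def fixation_probability(p0, k_eff, Ne, grid=20001):
    """Diffusion fixation probability for polynomial s_e(x).

    k_eff: effective coefficients [k0e, k1e, ...] in ascending powers of x.
    Ne:    size of the equivalent panmictic (haploid) population.
    """
    x = np.linspace(0.0, 1.0, grid)
    # Antiderivative of the polynomial s_e, evaluated on the grid.
    S = sum(c * x ** (i + 1) / (i + 1) for i, c in enumerate(k_eff))
    exponent = -2.0 * Ne * S
    G = np.exp(exponent - exponent.max())     # shift avoids overflow; ratio unchanged
    # Cumulative trapezoidal integral of G from 0 to each grid point.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (G[1:] + G[:-1]) * np.diff(x))))
    return np.interp(p0, x, cum) / cum[-1]


if __name__ == "__main__":
    # Genic selection check against the closed form (1 - e^{-2 Ne s p}) / (1 - e^{-2 Ne s}).
    Ne, s, p = 1000.0, 1e-3, 0.1
    print(fixation_probability(p, [s], Ne))
    print((1 - np.exp(-2 * Ne * s * p)) / (1 - np.exp(-2 * Ne * s)))
```

For a single initial copy, p0 would be 1/(ND), and dividing the result by p0 gives the fixation probability relative to that of a neutral allele.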
Table 1. Predicted and observed fixation probabilities for N = D = 100 and s = 3 × 10^-4 or s = 10^-3
Figure 1 compares theoretical predictions of fixation probabilities to the results of simulations for N = 100, D = 100, s = 10^-4, and various values of m and h. The predictions are all very close to the observed values: all predictions are within 3.6% of the simulation results, and a majority are within 1%. The plot illustrates that the theory captures the effects of both the degree of dominance and the migration rate on the probability of fixation. Furthermore, all the observed mean fixation times are within a few percent of the theoretical predictions (maximum deviation 2.5%; data not shown), indicating that the diffusion is a good description of the trajectory of allele frequency as well as the probability of ultimate fixation.
Table 2. Fixation probabilities for various numbers of demes
Table 1 presents results for larger values of s. For s = 3 × 10^-4, the predictions are still close to the observed values, differing by at most 7.4%. For s = 10^-3, some of the predictions differ significantly from the simulation results for the smaller migration rate, especially at the extremes of over- and underdominance. This is to be expected because the weak selection assumption (Equations 2 and 3) is not met: for h = -4, 2N(1 - h)s = 1, and for h = 3, 2Nhs = 0.6, neither of which is very small in magnitude compared to 1. Even at these extremes, however, the predictions come within 16% of the observed values. For the less extreme values of h the theoretical predictions are quite good.
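The weak-selection check applied in this paragraph is simple arithmetic and can be sketched as follows. The function names are hypothetical, and the numerical threshold in `weak_selection_ok` is an arbitrary illustrative choice, since the text only requires the scaled coefficients to be small in magnitude compared to 1.

```python
def scaled_selection_extremes(N, s, h):
    """Return (2Nhs, 2N(1 - h)s), i.e. 2N*s_hat/2 scaled at x = 0 and x = 1."""
    return 2 * N * h * s, 2 * N * (1 - h) * s


def weak_selection_ok(N, s, h, threshold=0.1):
    """True if both scaled coefficients are below an assumed threshold."""
    at_zero, at_one = scaled_selection_extremes(N, s, h)
    return max(abs(at_zero), abs(at_one)) < threshold


if __name__ == "__main__":
    for h in (-4, 3):
        print(h, scaled_selection_extremes(100, 1e-3, h),
              weak_selection_ok(100, 1e-3, h))
```

With N = 100 and s = 10^-3, the larger of the two magnitudes is 1 for h = -4 and 0.6 for h = 3, which is why the approximation is expected to degrade for these parameter values.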
Table 2 shows some results for different numbers of demes (D). With 30 demes, the predictions are again quite good: all of them are within 10% of the simulation results. With as few as 10 demes the assumption of a large number of demes is seriously violated. Although many of the predictions are close to the observations, some differ from them by as much as 32%.
Figure 2. Fixation probabilities as functions of initial allele frequency. Theoretical curves for various values of h and m are compared to simulation results (points), with N = 100, D = 100, and s = 10^-4. (a) h = 3 and m = 0.01 (open squares and curve), or h = 3 and m = 0.003 (solid triangles and curve), or h = -4 and m = 0.003 (open circles and curve), or h = -4 and m = 0.01 (solid diamonds and curve). (b) h = -15 and m = 0.001 (solid triangles and curve), m = 0.003 (open circles and curve), or m = 0.01 (solid diamonds and curve).
In the simulation results presented so far the allele was initially present in a single copy. Figure 2 shows results for a range of initial allele frequencies. Figure 2a shows results for overdominance (h = 3) and underdominance (h = -4) with different migration rates. All of the points (simulation results) fall along the theoretical curves. All of the predicted fixation probabilities are within 3% of the simulation estimates, and the probabilities of loss also agree within 3%. Figure 2b shows results for a case of strong underdominance (h = -15). Results for such extreme underdominance were not given for an allele starting at a single copy because some fixation probabilities would be so low that they would be difficult to estimate by simulation. For higher initial allele frequencies the fixation probability is much larger and can be measured more easily. Figure 2b illustrates that the theory correctly predicts the reduced probability of fixation of a rare allele at high migration rates and the increased fixation probabilities at lower m. The predicted fixation probabilities are all within 6% of the estimates from simulation. Furthermore, all of the predicted probabilities of loss are within 5% of the simulation results.
Figure 3. Fixation probabilities for quadratic frequency dependence. Theoretical predictions (curves) are compared to simulation results (points) over a range of values of selection parameters. In all cases k0 = 0, k2 = 0.002 - k1, N = 100, D = 100, and the allele was initially present in a single copy. Results are presented for three migration rates: m = 0.001 (solid triangles and curve), m = 0.003 (open circles and curve), and m = 0.01 (solid diamonds and curve).
The theory presented here covers any frequency dependence described by a polynomial. In the simulations discussed above this was a first-degree polynomial. Figure 3 compares predictions and results for cases of quadratic frequency dependence. The agreement of the predictions with the results is excellent: all of the predictions are within a few percent of the simulation results (the largest difference is 3.5%).
DISCUSSION
The theory presented here relates a subdivided population with frequency-dependent selection (including the case of dominance) to an equivalent panmictic population characterized by different parameters. Subdivision alters both the size of the equivalent panmictic population (the effective population size Ne) and the effective values of all of the parameters describing selection. In the case of dominance, the dominance parameter h is in effect moved toward ½ by subdivision; i.e., fitness is made effectively closer to additive. For frequency dependence described by any polynomial in allele frequency, the effective values of all of the polynomial coefficients are altered by subdivision.
The theoretical treatment assumed that selection was weak in one sense, but allowed that it was strong in another sense. The requirement is that selection is weak compared to drift in a subpopulation, i.e., that the product of deme size (N) and (marginal) selection coefficient is always small in magnitude compared to unity. This allows selection to be quite strong with respect to drift in the population as a whole, so that selection may have a large effect on the fate of an allele without the assumptions being violated.
Computer simulations confirm that the theory closely predicts fixation probabilities so long as the parameters meet the stated conditions, namely that D is large and |Nŝ(x)| ⪡ 1 for all x between zero and one. As expected, when the parameter values violate these conditions the predictions are less reliable. Nonetheless, when the conditions are moderately violated, for example, when selection and within-deme drift are of comparable strength, the predictions are still quite good.
A special case covered by the theoretical results is that of dominance for fitness in the absence of other sources of frequency dependence. It was shown that subdivision in effect decreases the degree of dominance, as measured by the deviation of the dominance parameter h from ½, by a factor that depends only on the deme size N and the migration rate m. Specifically, subdivision changes the effective deviation of h from ½, he - ½, by a factor of Nm/(Nm + 1). The direction of this effect is in accord with the well-established fact that subdivision decreases the effect of dominance on fixation probabilities (Wright 1940, 1941; Slatkin 1981). It should be noted that this effect is not simply a matter of a reduction in the fraction of heterozygotes. If this were the sole source of the effect, one would expect a factor of 1 - Fst = 2Nm/(2Nm + 1) reduction in the effect of dominance. If the same degree of inbreeding were achieved by, for example, brother-sister mating, while selection operated globally, then 1 - F would indeed be the factor by which the effect of dominance was modified. The model analyzed here differs from that case in that competition is local to each subpopulation.
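The two factors contrasted in this paragraph can be compared directly. The short Python sketch below is illustrative only; the function names are hypothetical, and the parameter values are those used in the simulations (N = 100 with m = 0.001, 0.003, and 0.01).

```python
def dominance_factor(N, m):
    """Factor by which subdivision scales (h - 1/2): Nm / (Nm + 1)."""
    Nm = N * m
    return Nm / (Nm + 1.0)


def one_minus_fst(N, m):
    """1 - Fst for the island model as used in the text: 2Nm / (2Nm + 1)."""
    Nm = N * m
    return 2.0 * Nm / (2.0 * Nm + 1.0)


def effective_h(h, N, m):
    """Effective dominance parameter he = 1/2 + [Nm/(Nm + 1)](h - 1/2)."""
    return 0.5 + dominance_factor(N, m) * (h - 0.5)


if __name__ == "__main__":
    for m in (0.001, 0.003, 0.01):
        print(m, dominance_factor(100, m), one_minus_fst(100, m),
              effective_h(3.0, 100, m), effective_h(-4.0, 100, m))
```

For every migration rate the dominance factor is smaller than 1 - Fst, which illustrates the point that the reduction in the effect of dominance is larger than the reduction in heterozygote frequency alone would suggest.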
The diffusion approximation derived here for a finite island model completely describes the trajectory of allele frequency over time in the presence of dominance or frequency-dependent selection. The results followed from the moments of the distribution of within-deme allele frequencies. This is a beta distribution for the island model. Under other models of subdivision this distribution may have a different form, but the moments of the distribution nonetheless characterize the population. These moments could be derived theoretically or measured empirically. So long as Fst is independent of allele frequency and ordinary genetic drift is the only stochastic force operating, these moments can be used to relate the subdivided population to an equivalent panmictic population even in the presence of frequency-dependent selection.
Acknowledgments
I thank Christina Muirhead and Jon Wilkins for comments on the manuscript. John Wakeley encouraged me to work on this topic. This work was supported by National Science Foundation grant DEB-9815367 to John Wakeley.
Footnotes
- Communicating editor: N. Takahata
- Received October 21, 2002.
- Accepted January 2, 2003.
- Copyright © 2003 by the Genetics Society of America