Genetics, Vol. 165, 2153-2166, December 2003, Copyright © 2003

Selection and Drift in Subdivided Populations: A Straightforward Method for Deriving Diffusion Approximations and Applications Involving Dominance, Selfing and Local Extinctions

Denis Rozea,b and François Rousseta
a Institut des Sciences de l'Evolution, Université Montpellier II, 34095 Montpellier, France
b Centre d'Etudes sur le Polymorphisme des Microorganismes, Institut de Recherche pour le Développement, 34394 Montpellier, France

Corresponding author: Denis Roze, UMR CNRS-IRD 9926, Institut de Recherche pour le Développement, 911 avenue Agropolis, BP 64501, 34394 Montpellier Cedex 5, France., roze{at}mpl.ird.fr (E-mail)

Communicating editor: L. EXCOFFIER


*  ABSTRACT
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Population structure affects the relative influence of selection and drift on the change in allele frequencies. Several models have been proposed recently, using diffusion approximations to calculate fixation probabilities, fixation times, and equilibrium properties of subdivided populations. We propose here a simple method to construct diffusion approximations in structured populations; it relies on general expressions for the expectation and variance in allele frequency change over one generation, in terms of partial derivatives of a "fitness function" and probabilities of genetic identity evaluated in a neutral model. In the limit of a very large number of demes, these probabilities can be expressed as functions of average allele frequencies in the metapopulation, provided that coalescence occurs on two different timescales, which is the case in the island model. We then use the method to derive expressions for the probability of fixation of new mutations, as a function of their dominance coefficient, the rate of partial selfing, and the rate of deme extinction. We obtain more precise approximations than those derived by recent work, in particular (but not only) when deme sizes are small. Comparisons with simulations show that the method gives good results as long as migration is stronger than selection.


THE geographical distribution of a population has many consequences on its evolution. The nature of the habitat (fragmented or continuous) and the dispersal patterns of individuals across space affect the relative importance of the different processes changing allele frequencies (selection, genetic drift ...) and therefore have an effect on several genetical features of populations, such as the amount of genetic diversity and the strength of inbreeding depression. Spatial structure may also affect the timescale of evolution; indeed, the rate of adaptation of a population depends critically on the probability that new advantageous mutations will go to fixation and on the time length of the fixation process.

Different models have been used to calculate the probability of fixation of an advantageous mutation in a spatially structured population, dealing with different types of population structure, and different modes of selection. For some simple models, MARUYAMA 1970 Down found that in a population subdivided into demes of constant size that never go extinct, spatial structure has no influence on fixation probabilities: a new mutation has the same probability of fixation as if the population was not fragmented. This conclusion has been supported by more recent work (CHERRY and WAKELEY 2003 Down). SLATKIN 1981 Down used a low-migration limit to investigate the joint effects of dominance and spatial structure on fixation probabilities of advantageous mutations. In his model, the fixation of an allele within a deme occurs much faster than its transmission between demes, so that fixation is assumed to take place independently in each deme (see also TAKAHATA 1991 Down). He showed that the probability of fixation of recessive mutations is increased in this low-migration limit, compared to the case of a panmictic population; indeed, population structure increases the probability that an advantageous homozygous genotype is created before the mutation is lost. However, the probability of fixation of dominant mutations is decreased. BARTON 1993 Down worked on another model to study the effect of catastrophic extinctions of demes and showed that extinctions can substantially reduce the fixation probability of beneficial alleles. More recently, several authors used diffusion approximations to derive fixation probabilities and fixation times in metapopulations. CHERRY 2003A Down, CHERRY 2003B Down, CHERRY and WAKELEY 2003 Down, and WAKELEY 2003 Down have shown that diffusion methods can be used in island models of population structure and studied the joint effects of dominance, population structure, and local extinctions on fixation probabilities and times of mutant alleles. WHITLOCK 2003 Down has presented another approach, which he also applies to the stepping-stone model of population structure.

In this article we provide a different method to construct diffusion approximations, which is in several ways more general than previous ones. The method consists of expressing the mean of allele frequency change over one generation in terms of partial derivatives of a "fitness function" and probabilities of genetic identity computed in a neutral model. We make no assumption on the particular form of the fitness function; indeed, the method can be applied to many different selection models (including multilevel selection). We then take the example, considered by previous authors, of selection through effects on fecundity (different genotypes producing different numbers of gametes) and consider the effects of dominance, self-fertilization, and local extinctions on the probability of fixation of new mutations. We confirm some previous approximations; however, our method gives more precise results when migration is strong, when deme size is small, and also when extinctions occur. As discussed later, the diffusion method gives satisfying results as long as migration rates are not too small; with dominance, Slatkin's method performs better when migration is very low.


*  LIFE CYCLE
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Throughout this article, we use an island model of population structure, with possible local extinctions. We assume that the population is subdivided into n demes. At the beginning of the life cycle, each deme contains N adult individuals and has a probability e of becoming extinct. In the surviving demes, individuals produce a very large number of gametes and die; in practice we treat the number of gametes produced as if it were infinite. Gamete fusion may precede or follow migration (zygotic or gametic migration); we first do not make any assumption about the breeding system, but later we consider the cases of random union of gametes and of partial self-fertilization at a rate {alpha}. Migrants produced by a deme can reach any other deme with the same probability. The migration rate m is defined as the proportion of migrant zygotes (or gametes) present in any surviving deme once migration has occurred ("backward" migration rate). We assume that migrants arriving in extinct demes do not survive, each extinct deme being recolonized later by a finite number k of individuals (SLATKIN 1977 Down). These recolonizers come from "propagules" that formed in each surviving deme after the first round of dispersal. Again, recolonization can be zygotic (if it occurs after fertilization) or gametic (if it occurs before; in this case each extinct deme is recolonized by 2k gametes). We also define a parameter {phi} measuring the probability that two colonizers come from the same deme (this is specified later). We assume that colonizers reproduce immediately, producing a very large number of zygotes. Finally, in all demes N individuals are sampled randomly among all the zygotes to form the new adult generation; exactly N adults are always in each deme, recolonized or not. Thus, recolonizers experience two rounds of migration and reproduction within one "generation." This way of representing the dynamics of recolonization is obviously simplistic, but may be seen as a first step in understanding the consequences of extinction and recolonization (SLATKIN 1977 Down; WADE and MCCAULEY 1988 Down; WHITLOCK and BARTON 1997 Down; PANNELL and CHARLESWORTH 1999 Down); more complex models may be tractable within the framework presented here (ROUSSET 2004 Down, Chaps. 10 and 11).


*  THE DIFFUSION APPROXIMATION
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Diffusion theory has provided good approximations for probabilities of fixation of mutant alleles (KIMURA 1964 Down). In this section we briefly review the methodology in the case of a panmictic population containing two alleles a and A. We call p the frequency of A and {delta}p the change in frequency of A over a short time interval {delta}t. We first suppose that time and frequency are continuous and define

(1)

where E[x] is the expectation of x. To use the diffusion method we need the following limits:

(3)


(4)

If these limits are finite, and if the higher-order moments of {delta}p are negligible, the probability of fixation of A when present in frequency p, u(p), is given by the differential equation

(5)

(KARLIN and TAYLOR 1981 Down) with solution

(6)

where

(7)

To apply this continuous model to the case of a population with discrete generations, one states that time is measured in units of N generations, where N is the population size, and that the short time interval {delta}t corresponds to one generation. Then {delta}t = 1/N, and {delta}p and {delta}x are the limits as N tends to infinity of N x M{delta}p and N x V{delta}p, respectively (KARLIN and TAYLOR 1981 Down).

In the simple case of genic selection where A has a selective advantage s over a in an ideal haploid Wright-Fisher population of size N, we have

(8)


(9)

with q = 1 - p (EWENS 1979 Down; KARLIN and TAYLOR 1981 Down). For {delta}p and {delta}x to be finite, we need to assume that s tends to zero as N tends to infinity and that the product Ns has a finite limit, say {sigma}. Then Equation 6 and Equation 7 are valid, with {delta}p = {sigma}pq and {delta}x = pq. These expressions of {delta}p and {delta}x lead to the classical result (e.g., EWENS 1979 Down)

(10)

where N is assumed to be large and s small. Despite all the approximations done, this formula proves very accurate for values of s as high as 0.1, even when N is as small as 10 (e.g., CARR and NASSAR 1970 Down; GALE 1990 Down).

In the following, we consider an island model of population structure and look for an expression for the probability of fixation of a mutant allele A in a population initially fixed for allele a; p is now the frequency of A in the whole population, while the vector p = (p1, ... , pn) gives the frequencies of A in different demes. We suppose that the number of demes (n) is large and define {delta}p and {delta}p as the limits as n tends to infinity of the products n x M{delta}p and n x V{delta}p, where M{delta}p and V{delta}p are still the first two moments of the change of A over one generation. Using a simple argument, we show that these limits can be expressed as functions of the average frequency of A in the population (p). A rigorous demonstration that the process does converge to a diffusion (including the proof that higher-order moments vanish in the limit as n tends to infinity) would follow the general argument of ETHIER and NAGYLAKI 1980 Down for Markov chains with two timescales. WAKELEY 2003 Down explains how their equations apply to island models of structured populations. We do not repeat such arguments here, but we offer an alternative presentation of the same general idea.


*  EXPECTED CHANGE IN ALLELE FREQUENCY
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

We use the direct fitness method (TAYLOR and FRANK 1996 Down; ROUSSET and BILLIARD 2000 Down) to express M{delta}p as a function of probabilities of identity and partial derivatives of a fitness function. The parameters and variables of the model are summarized in Table 1 and Table 2. In all the following, the subscript ij refers to the jth adult individual of the ith deme. The fitness Wij of this individual is defined as the expected number of its gametes that will participate to the next adult generation. Fitness may depend on the phenotype of the individual, on the mean phenotype of the individuals present in its deme, and on the mean phenotype in the whole population. We call zaa the phenotype of homozygous aa individuals and assume that the phenotype of AA individuals differs from zaa by an amount s: zAA = zaa + s. We write the phenotype of heterozygous individuals zAa = zaa + sh, h measuring the dominance of A. Defining pij as the frequency of A in individual ij (pij = 0, 1/2, or 1), pij1 and pij2 as the frequencies of A at each of its two homologous genes, and zij as its phenotype, we have the relation

(11)


 
View this table:
In this window
In a new window

 
Table 1. Parameters of the model


 
View this table:
In this window
In a new window

 
Table 2. Variables of the model

The expected change in frequency of A over one generation is

(12)

To the first order in s, this is

(13)

as Wij depends on zij, zi, and z. Equation 11 and Equation 13 give

(14)

where the overbar means the average among demes and individuals over the whole population.

{delta}p has been defined as the limit of the product n x M{delta}p as n goes to infinity. For this limit to be finite, we have to assume in principle that s is of order 1/n, so that the product ns has a finite limit as n tends to infinity (for the same reason, s had to be of order 1/N in the panmictic case—see previous section). Therefore our method supposes a high number of demes (n high) and weak selection (s small), although simulations will show that these are not major constraints. As s is of order 1/n, the terms in o(s) of M{delta}p vanish when we take the limit of n x M{delta}p as n goes to infinity.

The different averages in Equation 14 can be seen as probabilities of identity: for example, is the probability of obtaining two A alleles after sampling two genes with replacement from the same deme. In the diffusion limit (n tends to infinity, s tends to zero, ns finite), these probabilities converge to values that depend only on p, the overall frequency of A. Again, we take the example of , the probability that two genes sampled with replacement from the same deme are both A. Consider the two ancestral lineages of these genes backward in time: either these lineages stay and coalesce in the same deme or one of them (or both) migrates to another deme before coalescence occurs; in the first case, the expected coalescence time does not depend on the number of demes n, while in the latter case, coalescence time becomes infinitely long as n tends to infinity. This has been described as a separation of timescales (e.g., WAKELEY 2003 and references therein), because lineages in different demes coalesce at a rate inversely related to n, while genes within demes can coalesce within a few generations, at a rate depending on deme size and migration (roughly, on m + 1/N in the island model), but essentially independent of n. Therefore in the limit as n tends to infinity, at time t independent of n in the past we can consider that the two lineages have either stayed in the same deme and coalesced (with a probability that we call rR1) or have migrated to different demes and have not coalesced (with probability 1 - rR1). Thus the probability that our two genes are A is equal to rR1pt + (1 - rR1)p2t, where pt is frequency of A in the whole population at time t. However, in the limit as n tends to infinity and s tends to zero, the frequency of A in the population does not change over the finite number t of generations, thus pt = p, the present frequency of A. Therefore

(15)

This expression, as well as all further expressions involving O(1/n) terms, is an average over all parental populations with the same value of p. These parental populations may differ in the frequency of demes with different copies of the A allele, so that for each possible parental population may depart from the above average. But the magnitude of these differences vanishes as the number of demes increases, which allows us to use diffusion equations (see WAKELEY 2003 Down, Equation 33, and also ETHIER and NAGYLAKI 1980 Down, Equation 1.5). In the infinite island model, one recovers the classical expression for inbreeding coefficients (e.g., CROW and KIMURA 1970 Down). Here rR1 is equivalent to Wright's FST (HUDSON 1998 Down; ROUSSET 2002 Down).

The same reasoning is used to calculate the limit of , which is the probability that two genes sampled with replacement in the same individual are both A:

(16)

Here rR0 is the probability of coalescence for two genes sampled with replacement from the same individual, in a population with an infinite number of demes.

Denote rD1 the probability of coalescence of two genes sampled from the same deme without replacement, and rD0 the probability of coalescence of the homologous genes of an individual (still in the limit as n tends to infinity). We have the relations

(17)

and

(18)

rD0 can be shown to be equivalent to Wright's FIT in the infinite island model. As is the probability that the two homologous genes of an individual are both A, we have

(19)

Last, we need the limit as n tends to infinity of , which measures the probability that the two homologous genes of an individual and a third gene sampled from the same deme are all A. We call aR the limit as n tends to infinity of the probability that the three lineages coalesce before one of them moves to another deme, bR the limit of the probability that only two of them stay in the same deme and coalesce, and cR = 1 - aR - bR the limit of the probability that two lineages move to different demes before any coalescence event has occurred. We have

(20)

Denote aD, bD, and cD the same probabilities when the third gene is sampled in a different individual. We have the relations

(21)

and

(22)

Because adult population size remains constant after regulation, the partial derivatives of the fitness function sum to zero (ROUSSET and BILLIARD 2000 Down); therefore we can express M{delta}p as a function of the two first partial derivatives only. After replacing the average products of allele frequencies by the expressions calculated above, we obtain

(23)


*  VARIANCE IN ALLELE FREQUENCY CHANGE
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

To use the diffusion method, we also need the limit as n tends to infinity of n x V{delta}p, where V{delta}p = E [({delta}p)2|p] and {delta}p is the change in frequency of A over one generation. We have

(24)

where p' = p + {delta}p and Var0 is the variance in the neutral case (s = 0). We look for an expression for Var0[p'] to the first order in 1/n. We have

(25)

We show in Appendix A that these variance and covariance terms can be expressed, to first order in 1/n, as functions of the frequency of A in the whole population (p) and an effective population size Ne depending on the various parameters of the model (but not on allele frequencies). By definition, Ne is the variance effective population size (EWENS 1982 Down).

When e = 0 (no extinction), covariances in the change in allele frequencies in different demes equal zero (because sampling occurs independently in each deme). Equation A5 of Appendix A lead to

(26)

which is of the form pq/2Ne, with

(27)

a result known under the form nN/(1 - FST)(1 + FIS), where and (see WANG and CABALLERO 1999 Down for a more general result). In the case of gametic migration, monoecious individuals, and random fertilization, we have , giving

(28)

Extinctions add an additional complication because they introduce correlations in the change in allele frequency in different demes (because the frequency of A in the migrant pool depends on which demes did survive). We show in Appendix A that when e > 0, the variance effective population size is

(29)

This is also the eigenvalue effective size (EWENS 1979 Down, EWENS 1982 Down), derived by several independent arguments in ROUSSET 2003 Down.


*  PROBABILITIES OF COALESCENCE
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

The expressions for M{delta}p and V{delta}p derived in the previous sections involve variables representing various probabilities of coalescence in a neutral model, evaluated in the limit as n tends to infinity. These variables can be calculated by writing recurrence equations (which will depend on the life cycle considered) and calculating values at equilibrium. These recurrence equations are given in Appendix B for three different cases: (i) monoecious individuals with random fertilization, gametic migration, and no extinction; (ii) monoecious individuals with rate {alpha} of self-fertilization, zygotic migration, and no extinction; and (iii) monoecious individuals with random fertilization, gametic migration, extinction rate e, and gametic recolonization. Here we give only approximate solutions assuming that migration rate is small and deme size large, so that N = O(1/m), and neglecting terms in O(m).

Case i: random fertilization, no extinction:
Approximating Equations B5–B9 for small m and large N, one obtains (with M = Nm)

(30)


(31)

These approximations are also valid for zygotic migration and dioecious individuals (not shown). Although, as found by WHITLOCK 2002 Down, these approximate expressions satisfy the relation , this is no longer the case when the exact expressions (given in Appendix B) are used.

Case ii: partial self-fertilization, no extinction:
Here we assume that self-fertilization occurs at a rate {alpha} and that fertilization precedes migration. Random fertilization corresponds to {alpha} = 1/N. We obtain the approximations

(32)


(33)

Case iii: random fertilization, extinctions:
We assume here that both migration and recolonization are gametic. In the LIFE CYCLE section, we introduced the parameters k (number of colonizers) and {phi} (probability of common origin). More precisely, {phi} measures the probability that two genes sampled randomly among the 2k gametes colonizing an extinct deme were in the same deme before migration. We also need the parameters {psi}1 and {psi}2, defined as the probabilities that three genes sampled randomly among the 2k colonizers were in the same deme before migration ({psi}1) or in two different demes ({psi}2). Following SLATKIN 1977 Down, we distinguish two modes of recolonization: the "migrant pool," where the 2k gametes recolonizing a deme are taken randomly in the whole population after migration (in this case {phi} = {psi}1 = {psi}2 = 0), and the "propagule pool," where the 2k gametes were all in the same deme after migration [in this case {phi} = (1 - m)2, {psi}1 = (1 - m)3, and {psi}2 = 3m(1 - m)2]. When the extinction rate e is O(m) we obtain

(34)

This approximation follows from the recursion (B20) in Appendix B and was also obtained by WHITLOCK and MCCAULEY 1990 Down. We could not obtain any simple approximation for aD, bD, and cD.


*  EXAMPLES OF FITNESS FUNCTIONS
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

We consider two cases of individual selection: "soft" and "hard" selection. The first case corresponds to a regulation of the number of gametes (or zygotes) produced by the different surviving demes just before migration, while in the second case there is no such regulation. However, in both cases there is a phase of population regulation, as N adults are sampled in each deme at the end of every generation. This definition of hard selection is the same as in BARTON 1993 Down, but different from in WHITLOCK 2002 Down, which explains some discrepancies between our results and results in WHITLOCK 2003 Down. In Whitlock's hard selection model, each deme contributes to the next adult generation in proportion to the number of juveniles produced in the deme before migration; although this definition sounds simple, we do not know any realistic life cycle maintaining a constant population size in this case.

In both cases, we fix zaa to zero and assume that an individual of phenotype zij produces a number of gametes proportional to 1 + zij; the numbers of gametes produced by Aa and AA individuals, relative to aa individuals, are then 1 + sh and 1 + s, respectively.

We note at this point that we need only the limit as n tends to infinity of the fitness function, as terms of order 1/n and higher will vanish when we will calculate {delta}p—see Equation 23.

Soft selection:
In this case, we assume that the number of gametes produced in each deme is regulated before migration to a constant value that is the same for all demes; we suppose that this value is still very large. Recall that the fitness of individual ij is the expected number of its gametes that will participate to the next adult generation. We obtain

(35)

Individual ij can participate to the next generation only if its deme survives (probability 1 - e). The first term in the parentheses is the frequency of gametes produced by individual ij present in deme i after migration; the second term is the frequency of gametes produced by ij in any other nonextinct deme, after migration, times the number of other nonextinct demes; and the third term is the frequency of gametes produced by ij in the pool of recolonizers, times the number of extinct demes. Simplifying Equation 35 leads to

(36)

giving {partial}Wij/{partial}zij = 2, {partial}Wij/{partial}zi = -2 (derivatives have to be evaluated for s = 0, which means here zij, zi, z = 0).

Hard selection:
Here we assume that the number of gametes or juveniles produced in a deme before migration is not regulated: a deme fixed for A produces 1 + s times the number of gametes (or juveniles) produced in a deme fixed for a. Here the fitness function is given by

(37)

Again, the first term in the parentheses represents the proportion of gametes produced by individual ij present in deme i after migration (provided that deme i did not go extinct); the second term is the average proportion of gametes produced by ij in the other nonextinct demes, after migration, times the number of other nonextinct demes; and the third term is the proportion of gametes produced by ij in the pool of recolonizers, times the number of extinct demes (we assume that surviving demes contribute to the pool of colonizers in proportion of the total number of gametes present in the deme). Equation 37 leads to

(38)

giving {partial}Wij/{partial}zij = 2, {partial}Wij/{partial}zi = -2(1 - e)(1 - m)2.


*  RESULTS
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Additivity:
In the additive case (h = 1/2) and when no extinction occurs (e = 0), simple results are obtained. Under hard selection and gametic migration one arrives at

(39)

while under soft selection and gametic migration,

(40)

The N - 1 factor is consistent with the fact that soft selection means no selection when N = 1. In the case of zygotic migration, N has to be replaced by N (2N)/(2N - 1) in Equation 39, while N - 1 has to be replaced by N (2N - 2)/(2N - 1) in Equation 40. These differences are significant only for small values of N.

Therefore, in the additive case with no extinction, probabilities of fixation depend neither on population structure nor on inbreeding, as found by MARUYAMA 1970 Down, except for effects of order 1/N that have been ignored in previous analyses. Previous simulation studies have also missed them, often because large values of N were assumed. When extinctions occur (e > 0), the expression of G is more complex and depends on population structure. For example, under soft selection and haploid migration, we find

(41)

Effects of dominance:
When h != 1/2, the function G is a complicated function of the parameters. Here we present some results relative to case i (monoecious individuals with random fertilization, gametic migration, and no extinction). If we assume that m is small and N large, so that we can use approximations 30 and 31, we obtain

(42)

which has to be integrated numerically to obtain u(p), the probability of fixation of A. For advantageous mutations (s > 0), a crude approximation can be obtained by neglecting terms in x2 in the expression of G, giving

(43)

We found that this approximation often gives a good idea of the probability of fixation of a rare advantageous allele, except when both the migration rate is large and h is close to zero.

Fig 1 shows the probability of fixation of a rare advantageous allele as a function of its dominance coefficient, for different values of the migration rate. Here we used the exact expressions of the probabilities of identity for case i (given in Appendix B); however, Equation 43 gives very close results (not shown). We used the soft selection model, but under hard selection the results are qualitatively and quantitatively very similar for large N. Fig 1 illustrates the fact that population structure increases the probability of fixation of recessive advantageous mutants, while it decreases the probability of fixation of dominant advantageous mutants. It also shows that the diffusion approximation gives poor results for small values of m; in the last case SLATKIN's (1981) low-migration approximation performs better (this is discussed later).



View larger version (12K):
In this window
In a new window
Download PPT slide
 
Figure 1. Probability of fixation of an advantageous mutation (s = 0.01) when present as a single copy, as a function of its dominance coefficient (h). Migration is gametic, individuals are monoecious with random fertilization, and no extinction occurs (e = 0). Number of demes, n = 200; deme size, N = 100. Results from the diffusion model are: dotted line, migration rate m = 0.1; dotted/dashed line, m = 0.01; dashed line, m = 0.001. Simulation results are: solid circles, m = 0.1; open circles, m = 0.01; squares, m = 0.001. The solid line corresponds to SLATKIN's (1981) low migration limit.

Effects of selfing:
In case ii (monoecious individuals with rate {alpha} of self-fertilization, zygotic migration, and no extinction), approximations (32) and (33) lead to

(44)

which has to be integrated numerically. Neglecting the terms in x2, we obtain a crude approximation for advantageous mutations:

(45)

Fig 2 shows the effect of partial self-fertilization on the probability of fixation of a recessive advantageous mutation (h = 0). Again, we used the exact expressions of the probabilities of identity and integrated numerically the G function, but Equation 45 gives very close results. We still observe a discrepancy between analytical and simulation results at small values of m.



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 2. Probability of fixation of a recessive advantageous mutation (s = 0.01, h = 0), as a function of the rate of self-fertilization ({alpha}). Individuals are monoecious and migration is zygotic. Number of demes, n = 200; deme size, N = 100; no extinction occurs (e = 0). Results from the diffusion model are: dotted line, migration rate m = 0.01; solid line, migration rate m = 0.05. Simulation results are: open circles, m = 0.01; solid circles, m = 0.05.

Effects of the extinction rate:
Fig 3 shows some results concerning case iii (monoecious individuals with random fertilization, gametic migration, extinction rate e, and gametic recolonization), taking the example of an advantageous underdominant mutant. Underdominance corresponds to the case where heterozygotes are disadvantaged (h < 0); it can result from chromosomal rearrangements (LANDE 1985 Down) and possibly from mutations affecting transcriptional regulation during the development (GIBSON 1996 Down). In a large panmictic population, such mutations have a very low probability of fixation (e.g., KIMURA 1962 Down). Using a low-migration limit, LANDE 1985 Down showed that local extinctions can increase the probability of fixation of underdominant mutations, in an island model of population structure. This effect has also been observed by simulation in a two-dimensional stepping-stone model (MICHALAKIS and OLIVIERI 1993 Down).



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 3. Probability of fixation of an underdominant advantageous mutation (s = 0.02, h = -0.1) relative to a neutral mutation, as a function of the extinction rate (e). Individuals are monoecious with random fertilization, and migration and recolonization are gametic; number of demes, n = 200; deme size, N = 100; migration rate, m = 0.1. A corresponds to the "migrant pool" model and B to the "propagule pool" model. Results from the diffusion model are: solid line, number of colonizers k = 2; dashed line, k = 10; dotted line, k = 100. Simulation results are: solid circles, k = 2; open circles, k = 10; squares, k = 100.

Fig 3 represents the probability of fixation of an underdominant mutation, relative to a neutral mutation, as a function of the extinction rate e, and for different values of k, the number of colonizers. Fig 3A corresponds to the migrant pool model of colonization, while Fig 3B corresponds to the propagule pool model. We used the soft selection model, but again results are very similar with hard selection (not shown). The effect of the rate of extinction on the fixation probability of the mutant is nonmonotonic. This may result from two opposite forces: extinctions increase the probability that an advantageous AA genotype will appear before allele A is lost, in particular when the number of colonizers is small, but also decrease the chance of fixation of A once AA genotypes have been created (by increasing random drift, extinctions reduce the probability of fixation of advantageous alleles).


*  DISCUSSION
*TOP
*ABSTRACT
*LIFE CYCLE
*THE DIFFUSION APPROXIMATION
*EXPECTED CHANGE IN ALLELE...
*VARIANCE IN ALLELE FREQUENCY...
*PROBABILITIES OF COALESCENCE
*EXAMPLES OF FITNESS FUNCTIONS
*RESULTS
*DISCUSSION
*APPENDIX A
*APPENDIX B
*LITERATURE CITED

Fig 1 and Fig 2 show that the analytical model gives good predictions when the migration rate is not too small, while for small values of m predictions are not as good (except when h = 1/2 and e = 0, since in that case we obtain an expression independent on m for the probability of fixation). This comes from the fact that we calculated the probabilities of coalescence before migration (r0, r1, a, b, c) in the neutral case (s = 0); as s increases, these probabilities become more and more different from the neutral values, and this difference depends on the migration rate. As previously noted, what matters in the argument based on a separation of timescales is that changes in allele frequencies within demes, due to drift and migration, are fast relative to selection (see also WAKELEY 2003 Down), hence that s is small relative to m + 1/N. In practice, the validity of the diffusion approximation depends on weak selection (small s) and not-too-small migration (roughly, m > s). This can also be expressed in terms of our coalescent argument: for relatively high values of m, two lineages from the same deme either separate to different demes (in the past) or coalesce rapidly within the deme, and during this short time selection does not have much effect. For small values of the migration rate, however, the mean within-deme coalescence time becomes longer, and the effect of selection during that time cannot be neglected any more.

Although the validity of the diffusion approximation depends on a large number of demes (large n) and weak selection (small s), these two conditions are in fact not very restrictive: Table 3 shows that our solution gives accurate results for values of n as low as 10 and values of s as high as 0.1. When the exact expressions of the probabilities of identity are used, no other assumption needs to be made; for example, deme size (N) and rate of extinction (e) can be arbitrarily large or small. This is a difference between the approach used in CHERRY and WAKELEY 2003 Down, CHERRY 2003A Down, and WHITLOCK 2003 Down and our approach. Indeed, these authors calculate various quantities that are similar to the probabilities of identity in Equation 14 by using the fact that the distribution of allele frequencies among demes can be approximated by a ß-distribution when N is large and m is small. From our coalescent argument, we obtain the first moments of the distribution of allele frequencies among demes without having to assume that N is large or m small. In fact, CHERRY 2003A Down and WHITLOCK's (2003) approximations for the G function are the same as our Equation 42, which assumes small m and large N. We found that Equation 42 gives satisfying results as long as deme size is not too small; for small values of N, however, using the exact expressions of r0, r1, a, and c gives more precise results. For example, with n = 50, N = 15, and m = 0.3, Equation 42 predicts that the probability of fixation of a deleterious allele with s = -0.005 and h = 0 (relative to a neutral allele) is 0.0072, while using the exact expressions of r1, a, and c gives a value of 0.01128, and simulations give 0.01121 (for soft selection). Although assuming either hard or soft selection does not make much difference when N is large and m small (indeed, Equation 42 holds in both cases), this is not the case when N is small, as can be shown by using expressions of the G function that do not make any assumption on N and m—obtained from equations B5–B9 and 36 and 38. This is illustrated by Fig 4, which shows the probability of fixation of a recessive advantageous allele obtained from these expressions and from Equation 42, for low values of N. These examples show that using exact values of the probabilities of coalescence leads to more precise approximations; moreover, exact values of r0, r1, a, and c can be calculated without too much difficulty for many different life cycles, while it is not always easy to obtain approximations for the distribution of allele frequencies among demes (for example, when extinctions occur).



View larger version (10K):
In this window
In a new window
Download PPT slide
 
Figure 4. Probability of fixation of an advantageous recessive mutation (h = 0, s = 0.02), relative to a neutral mutation, as a function of deme size (N). Migration is gametic, individuals are monoecious with random fertilization, and no extinction occurs (e = 0). Number of demes, n = 100; migration rate, m = 0.3. Dashed line, soft selection; dotted line, hard selection; solid line, approximation (42) for N large and m small.


 
View this table:
In this window
In a new window

 
Table 3. Accuracy of the diffusion results

BARTON 1993 Down obtained approximations for the probability of fixation of a single mutant in a model with extinction and additive selection, assuming that deme size is large. We observed that his approximation for the case {phi} = 0 (Equation 11b in BARTON 1993 Down) gives similar results as our solution when migration and extinction rates are small (not shown); however, in our model recolonization does not occur exactly as in BARTON 1993 Down; therefore, one must be cautious when comparing our results.

An important consequence of population subdivision is an increase of the fixation probability of recessive advantageous mutations. Fig 5 shows this effect for a population of size 10,000; the different curves correspond to different modes of subdivision (10 demes of 1000 individuals, 100 demes of 100 individuals, and 1000 demes of 10 individuals). One can see that the increase in fixation probability of the recessive mutation is substantial only with "strong" population structure (small migration rate and/or small deme size); a moderate structure such as N = 100, m = 0.1 has virtually no effect on the fixation probability of recessive alleles.



View larger version (11K):
In this window
In a new window
Download PPT slide
 
Figure 5. Effect of population structure on the probability of fixation of a recessive advantageous mutation (s = 0.01, h = 0). x-axis, migration rate; y-axis, probability of fixation of the mutation when present as a single copy. Migration is gametic and fertilization random; extinctions do not occur (e = 0). Solid line, n = 1000, N = 10; dashed line, n = 100, N = 100; dotted line, n = 10, N = 1000.

We have demonstrated a simple method to construct diffusion approximations for populations subdivided according to an island model. It is based on the separation of timescales and on the additional fact that for the different life cycles, expressions for M{delta}p can be deduced mechanically from the fitness function with minimal risk of error by the "direct fitness" method, where the fitness function itself is a direct expression of the life cycle considered. This method has been useful for analyzing selection on various traits under different population structures (e.g., TAYLOR and FRANK 1996 Down; ROUSSET and GANDON 2002 Down). In this context, fitness has to be defined as the expected number of "successful" gametes of an individual, i.e., those that effectively participate to the gene pool of the next adult generation. Many different uses of the word "fitness" can be found in the literature; it often means "relative fecundity" or "relative survival." Defining a fitness function as we do proves to be useful since it leads to general expressions like Equation 23.

The direct fitness method used here has also proven convenient to study kin selection (TAYLOR and FRANK 1996 Down; ROUSSET and BILLIARD 2000 Down). A substantial part of kin selection theory has been devoted to the evolution of altruistic traits; in some cases, it can be shown that such traits always increase in frequency if a condition of the form -c + rb > 0 holds, where c and b are the cost and benefit of the altruistic act, and r the "relatedness" between interactants, assumed to be independent of the frequency of the allele coding for altruism (HAMILTON 1964 Down). Whether such a condition exists is not always obvious, and the present application of the direct fitness formalism may clarify such problems, since from Equation 23 one easily obtains conditions under which an allele increases in frequency in a structured population. For example, when h = 1/2, Equation 23 shows that M{delta}p is always positive provided that

(46)

This is the correct form of Hamilton's rule that one obtains by modern inclusive fitness techniques; it shows that -c, b, and r have to be defined as {partial}Wij/{partial}zij, {partial}Wij/{partial}zi, and 2rR1/(1 + rD0), respectively. When h != 1/2 we can also define a relatedness coefficient from Equation 23, but now this coefficient depends on p, the frequency of A. Conversely, our model can be used to include the effects of genetic drift in kin selection models and derive fixation probabilities and fixation times of alleles coding for altruistic behaviors.

Many other applications are outside the scope of this article, but are worth mentioning. For example, one can compute the distribution of allele frequency of recessive deleterious alleles in a metapopulation and consider the implication of different demographic regimes on the purge of such alleles. Average times to fixation or loss of new alleles are easily obtained once the expressions of M{delta}p and V{delta}p are known (CROW and KIMURA 1970 Down); although in this article we focused on fixation probabilities, fixation times are equally important in determining the rate of evolution of populations. CHERRY and WAKELEY 2003 Down, CHERRY 2003B Down, and WHITLOCK 2003 Down showed that the diffusion method provides good approximations for fixation times in structured populations. In general, a lower rate of migration slows down fixation, while extinction/recolonization can speed it up.

One may wonder about the generality of the present approach with respect to the type of population structure. Although in this article we stayed in the framework of the island model, the methodology presented here can be used with other spatial models, as long as coalescence occurs at two different timescales: a fast one during which selection can be neglected and a slow one that becomes infinitely long as population size tends to infinity. This is the case, for example, for cytoplasmic genes: coalescence is fast if lineages are sampled within the same organism and slow if they are sampled in different organisms. The two scales can also be family vs. population (e.g., CABALLERO and HILL 1992 Down). In these examples many questions arise about conflicting levels of selection; multilevel selection is easily incorporated into this framework by redefining the fitness function.


*  ACKNOWLEDGMENTS

We thank Joshua Cherry, John Wakeley, and Michael Whitlock for providing articles in press and Joshua Cherry, Montgomery Slatkin, John Wakeley, Michael Whitlock, and an anonymous reviewer for helpful comments on the manuscript. This is paper ISEM 2003–79.

Manuscript received February 11, 2003; Accepted for publication May 19, 2003.


*  APPENDIX A
*TOP
*ABSTR