# Perfect Simulation From Nonneutral Population Genetic Models: Variable Population Size and Population Subdivision

- Paul Fearnhead
^{1}

- 1
*Author e-mail:*p.fearnhead{at}lancs.ac.uk

## Abstract

We show how the idea of monotone coupling from the past can produce simple algorithms for simulating samples at a nonneutral locus under a range of demographic models. We specifically consider a biallelic locus and either a general variable population size mode or a general migration model for population subdivision. We investigate the effect of demography on the efficacy of selection and the effect of selection on genetic divergence between populations.

WHILE simulating from neutral population genetics models is straightforward using the coalescent (Kingman 1982), even for complex demographic models (see, *e.g.*, Donnelly and Tavaré 1995; Hudson 2002), simulation from nonneutral models is much more difficult. There are currently numerous approaches to simulation for nonneutral models. These can be split into two cases, simulating from the stationary distribution of the population or simulating from the population at a specific time. Examples of the latter include simulating samples a specific time after the fixation of a beneficial allele (*e.g.*, Przeworski 2003). In this article we focus on the former.

For nonneutral models that assume parent-independent mutation, constant population size, and random mating, the stationary distribution of allele frequencies is known and can be simulated from directly, using rejection sampling (Donnelly *et al.* 2001) or numerical integration methods (Fearnhead and Meligkotsidou 2004; Joyce and Genz 2006). For such models it is also possible to simulate the genealogy of a sample and linked neutral variation. This is possible by simulating and conditioning on the frequency of the nonneutral allele in the history of the population (Nordborg 2001; Nordborg and Innan 2003; Coop and Griffiths 2004; Spencer and Coop 2004) or by simulating from the conditional distribution of the ancestral selection graph given the allelic type of the sample (Slade 2000; Stephens and Donnelly 2003; Fearnhead 2006).

For models that do not assume parent-independent mutations, it is possible to use coupling from the past (CFTP) (Propp and Wilson 1996; Kendall 2005) to simulate samples from the ancestral selection graph (Fearnhead 2001). Here we extend this idea. First we introduce a simpler implementation of CFTP, which is obtained by having a state space of unordered rather than ordered samples. This means that we characterize a sample by the number of alleles of each type, rather than by the allelic type of each of an ordered set of chromosomes. This has two advantages, first that as we work on a smaller dimensional space, coupling should be quicker. Second, for the models we consider, by working with unordered samples we obtain a monotonicity of the sample space, which makes it more straightforward to detect when we have obtained a sample from the stationary distribution of interest and thus provides a much simpler algorithm to implement.

Our second extension is to allow for a variety of demographic models, including arbitrary variable population size models and models with multiple subpopulations (demes). As far as we are aware, the method we propose is the only current method for simulating from such multiple-deme coalescent models in the presence of selection. Our method is computationally efficient, with, for example, 1000 samples of size 100 being simulated in <40 sec for a 10-deme stepping-stone model under a variety of selection models (see results).

## METHODS

#### Monotone coupling from the past:

Consider an ergodic Markov chain *X _{t}* with state space {0, 1,…,

*K*}. CFTP (Propp and Wilson 1996) gives a method for simulating from the stationary distribution of

*X*. The idea is based on the fact that if we simulated the Markov chain from time −∞ to time 0, then regardless of the initial value of the chain,

_{t}*X*

_{−∞}, we would have that

*X*

_{0}is a draw from the stationary distribution of the chain. The idea of CFTP is that it enables us to perform such simulation in finite computing time.

To do this we first introduce some extra simulation, in that we consider simulating the value of *X _{t}*

_{+1}for all possible values of

*X*. To simplify notation we define a function

_{t}*F*(·) that specifies all these transitions. So if we are told that

_{t}*X*=

_{t}*x*, then we have

*X*

_{t}_{+1}=

*F*(

_{t}*x*). Note that the function

*F*is a realization of a random variable; the stochasticity of the Markov chain is now encompassed in the randomness of this function, but given a set of realizations

_{t}*F*

_{−T},

*F*

_{−T+1},…,

*F*

_{−1}we have a deterministic relationship between

*X*

_{−T}and

*X*

_{0}.

We now have an idealized simulation algorithm:

For

*t*= −1, −2,…,−∞ simulate the value of*X*_{t}_{+1}for all possible values of*X*and hence the function_{t}*F*(·)._{t}Arbitrarily specify

*X*_{−∞}; and for*t*= −∞,…,−1 recursively apply*x*_{t}_{+1}=*F*(_{t}*x*) to obtain_{t}*x*_{0}, a draw from the stationary distribution of the Markov chain.

This algorithm is obviously impracticable, as it involves infinite computing time. However, we can perform this simulation in finite computing time by doing this simulation a bit at a time. This gives the following CFTP algorithm:

Arbitrarily choose a negative integer

*T*_{0}; set*T*=*T*_{0}and*S*= 0.For

*t*=*S*− 1,*S*− 2,…,*T*, simulate*F*._{t}For each possible starting value

*x*= 0, 1,…,*K*, set*X*=_{T}*x*and recursively apply*x*_{t}_{+1}=*F*(_{t}*x*) to obtain_{t}*x*_{0}. If the value of*x*_{0}is identical for all starting values of*X*, then output_{t}*x*_{0}, a draw from the stationary distribution of the Markov chain; otherwise set*S*=*T*,*T*= 2*T*and return to b.

The idea here is that we imagine that we are doing the idealized simulation algorithm above. However, we first simulate the dynamics of the chain from time *T*_{0} to time 0, then the extra dynamics from time 2*T*_{0} to time *T*_{0}, then the extra dynamics from time 4*T*_{0} to time 2*T*_{0}, and so on (step b). The *coupling* condition in step c enables us to determine with certainty what the value of *X*_{0} would be if we had continued to simulate the dynamics of the chain back to time −∞ (as in the idealized algorithm). For example, imagine that the condition in step c is satisfied after simulating back to time 4*T*_{0}; this condition says that regardless of the value of , the value of *X*_{0} will be the same (*x*_{0}). Thus we do not need to know the realization of the chain from time −∞ to time 4*T*_{0}, as regardless of this we know that continuing the realization on to time 0 will produce *X*_{0} = *x*_{0}. Thus the value we output in step c is the same as the value we would obtain in step b of the idealized simulation algorithm and is thus a draw from the stationary distribution of the Markov chain.

Any choice for how to decrease *T* when coupling does not occur would produce a valid algorithm, but it has been argued that doubling *T* each time is optimum (by its relationship with a binary search; see, *e.g.*, Kendall 2005). Note that the validity of the approach requires that we do not resimulate any of the dynamics of the Markov chain if coupling does not occur in step c. Furthermore, the algorithm requires only that the functions *F _{t}*(·) are simulated in such a way that marginally the dynamics of the Markov chain are correct, and it is possible to have any amount of dependence in terms of the value of

*X*

_{t}_{+1}for different values of

*X*. In some cases it is possible to utilize this to obtain a more efficient way of determining whether coupling occurs in step c. If we can find a distribution on

_{t}*F*that marginally has the dynamics of the Markov chain and that satisfies the monotonicty condition that

_{t}*x*≥

*y*⇒

*F*(

*x*) ≥

*F*(

*y*), then we can replace step c with:

c. For both

*x*= 0 and*x*=*K*, set*X*=_{T}*x*and recursively apply*x*_{t}_{+1}=*F*(_{t}*x*) to obtain_{t}*x*_{0}. If the value of*x*_{0}is identical for both starting values of*X*then output_{t}*x*_{0}, a draw from the stationary distribution of the Markov chain; otherwise set*S*=*T*,*T*= 2*T*and return to b.

We call the resulting algorithm monotone CFTP. The only difference is that we run the Markov chain only forward in time for the minimal and maximal elements of the state space. The monotonicity ensures that if these two realizations couple, then all realizations from other starting points (which lie between them) will also couple.

While in this example we have assumed a totally ordered state space, monotone CFTP will still apply if we have only a partial order: if starting the chain at each of the maximal and minimal elements of our state space produces the same *x*_{0}-value, then so must all other starting values of the chain (which lie between the minimal and maximal elements). We use this extension to partial orderings in the multideme models we consider below.

An example of the CFTP algorithms is given in Figure 1.

#### Models:

We consider coalescent models that include selection, variable population size, and population structure and focus on a single biallelic locus. Details of how selection is incorporated within a coalescent-type process can be found in work on the Ancestral Selection Graph (Krone and Neuhauser 1997; Neuhauser and Krone 1997) or the Ancestral Influence Graph (Donnelly and Kurtz 1999). For background on incorporating population growth or population structure see, for example, Donnelly and Tavaré (1995) and Hudson (1990) and references therein.

The coalescent model we consider is obtained in the large population size limit to a range of forward population-genetics models. We briefly describe how the parameters are defined in the case of the Wright–Fisher model. First define an effective population size of *N*_{0} diploid individuals or equivalently 2*N*_{0} chromosomes. We measure time in units of 2*N*_{0} generations and as is common define time in terms of time before the present. We consider a population consisting of *D* demes, with random mating within each deme. At time *t* in the past, the population size of deme *d* is *N _{d}*(

*t*) =

*N*

_{0}λ

_{d}(

*t*) diploid individuals. For the case

*D*>1 we further define a migration matrix by

*M*= 4

_{ij}*N*

_{0}

*m*for

_{ij}*i*≠

*j*, where

*m*is the proportion of the population of deme

_{ij}*i*in a generation that have migrated there from deme

*j*in the previous generation. We define .

We denote the alleles at our locus by 1 and 2. As with any biallelic model, mathematically we can describe the mutation process in terms of parent-independent mutations. We let θ = 4*N*_{0}*u*, where *u* is the probability of mutation per chromosome per generation, and let ν = (ν_{1}, ν_{2}) be the probability distribution of the mutant allele. (Note that this model allows for “silent” mutations, which do not change the allele at the locus; thus the effective mutation rate of allele 2 to allele 1 is θν_{1} and of allele 1 to allele 2 is θν_{2}.)

Finally we define the selection process. We assume that the fitness of an *ij* genotype is 1 + *s _{ij}* for

*i*,

*j*= 1, 2, with

*s*≥ 0 and

_{ij}*s*

_{12}=

*s*

_{21}. We can thus define selection rates σ*

_{ij}= 4

*N*

_{0}

*s*. We choose a different parameterization of these selection rates where σ*

_{ij}_{11}= σ

_{11}, σ*

_{12}= σ

_{g}+ σ

_{12}, and σ*

_{22}= 2σ

_{g}+ σ

_{22}. This is an overparameterization of the selection process, and any choice of σ

_{g}≥ 0 and σ

_{ij}≥ 0 for

*i*,

*j*= 1, 2 that gives the correct values of σ*

_{ij}could be used. We then define σ = max{σ

_{11}, σ

_{12}, σ

_{22}}.

The choices of defining the mutation process in terms of a parent-independent mutation process, and the parameterization of selection rates, have been made to make the simulation algorithm detailed below as efficient as possible. In particular, the parameterization of the selection rates is in terms of a genic component (given by the σ_{g} terms) and a nongenic component. The efficiency of the simulation algorithm is increased by choosing σ_{g} to be as large as possible subject to the constraints on the positivity of the other selection rates. (To do this we should label the alleles so that σ*_{22} ≥ σ*_{11}.)

For ease of presentation we have assumed that the selection rates are constant through time and across demes, although it is straightforward to generalize to the case where these rates vary. The coalescent process for our above model can be described in terms of a backward process for the history of our sample (which is independent of the alleles in the sample) and conditional on this backward process some forward dynamics for the allelic types of the branches. If we simulated the backward process until time *t* = ∞, then for any initial choice for the population frequency of the alleles at that time, when we simulated the process forward we would obtain a sample at the present time that is drawn from the stationary distribution of the population. We now describe these backward and forward processes and how we can apply monotone CFTP.

#### Backward process:

The backward process is a continuous-time Markov chain, whose state **N(t) = (N _{1}(t)**,…,

**N**is the number of branches in the underlying coalscent-type process at time

_{D}(t))*t*in the past that come from each deme. The transitions correspond to different possible events in this coalescent-type process. To ease notation, let the state at current time

*t*be (

*n*

_{1},

*n*

_{2},…,

*n*); then the rates of the possible events, and the corresponding transitions, are:

_{D}Coalescence: deme

*d*occurs at rate*n*(_{d}*n*− 1)/(2λ_{d}_{d}(*t*)), with transition*n*=_{d}*n*− 1._{d}Migration: deme

*d*to deme*i*occurs at rate*n*/2, with transition_{d}M_{di}*n*=_{d}*n*− 1 and_{d}*n*=_{i}*n*+ 1._{i}Mutation: deme

*d*occurs at rate*n*θ/2, with no change to state._{d}Genic selection: deme

*d*occurs at rate*n*σ_{d}_{g}/2, with transition*n*=_{d}*n*+ 1._{d}Diploid selection: deme

*d*occurs at rate*n*σ/2, with transition_{d}*n*=_{d}*n*+ 2._{d}

(When describing the transition we have listed only the elements of the state that change.) The initial value of the state, **N(0)**, is given by the number of chromosomes sampled from each deme.

#### Forward dynamics and monotone CFTP:

Assume that we have simulated and stored *T* events in the backward process above. We now consider the forward in time dynamics; however, to apply CFTP we need to make these forward dynamics deterministic, and thus for each event that we simulate in the backward process we also simulate and store a realization of an independent continuous uniform [0, 1] random variable. These realizations will then determine the specific forward transition at each event.

Forward in time the state of our system is given by the number of each type of allele in each deme in our ancestral process. As we have stored the number of branches in each deme at each event, we can define this state solely in terms of the number of type 1 alleles in each deme, which we denote by . Our forward-in-time dynamics are determined by specifying an initial value for the state **n**^{(1)}. Then for the *T*th event, *T* − 1th event,…,first event in turn we update this state as follows.

Consider the *j*th event. Let *u* denote the realization of the uniform random variable associated with this event. Assume that the current state is **n**^{(1)} and the number of branches in each deme is **n** = (*n*_{1},…,*n _{D}*). Then the dynamics depend on the type of the

*j*th event as follows:

Coalescence, deme

*d*: if , then let .Migration, deme

*d*to deme*i*: if , then let and .Mutation, deme

*d*: if , then let ; if , then let .Genic selection, deme

*d*: if , then let .Diploid selection, deme

*d*: if(1)then let ; otherwise, if(2)then let .

We have described the dynamics purely in terms of changes in **n**^{(1)}. The changes in the number of branches in each deme are given by the reverse of the dynamics of the backward process. These dynamics come from the different possible events in the coalescent process that would affect **n**^{(1)}. For i this is a coalescent event to a branch of allele 1, for ii a migration of an allele 1 branch from population *i* to *d*, for iii a mutation of an allele 2 branch to allele 1 or vice versa, for iv a selection event at which an allele 1 branch is nonancestral (which occurs unless both incoming and continuing branches have allele 2), and for v a selection event at which both nonancestral branches have allele 1 or only one has it. (See appendix a for the calculation of the probabilities in this case.)

We can now use CFTP to simulate a sample from the stationary distribution of the population. The dynamics specified above satisfy a monotonicity condition (see appendix b) if 2σ_{21} ≤ σ + σ_{11} + σ_{22}. (This can always be achieved by, if necessary, choosing σ>max{σ_{11}, σ_{12}, σ_{22}}; for example, if σ_{11} = σ_{22} = 0 as in heterozygote advantage then we choose σ = 2σ_{12}.) The partial ordering of this monotonicity is that (*n*_{1},…,*n _{D}*) ≥ (

*n*′

_{1},…,

*n*′

_{D}) if and only if

*n*≥

_{d}*n*′

_{d}for

*d*= 1,…,

*D*.

Thus the monotone CFTP algorithm described above can be applied, whereby to check coupling we need only run the process forward in time from two values of the state: **n**^{(1)} = (*n*_{1},…,*n _{D}*), all branches in all demes carry allele 1, and

**n**

^{(1)}= (0,…,0), no branches in any deme carry allele 1. If we obtain the same sample from each of these initial conditions, then that sample is drawn from the population at stationarity. If not we have to simulate the backward process further into the past and repeat until coupling occurs.

Programs implementing CFTP both for single-population variable population size and for multiple-deme constant-population-size models were written in a combination of R and C. These are available from http://www.maths.lancs.ac.uk/∼fearnhea.

#### Verifying the CFTP results:

To check the validity of the programs implementing this monotone CFTP algorithm we ran the programs under two special cases. First we considered the neutral case and compared the results of our program with those obtained by the program ms (Hudson 2002). Note that this program defines time in units of *N*_{0} generations, rather than the 2*N*_{0} used here. It assumes an infinite-sites mutation model—this output can be converted into a two-allele model with ν = (0.5, 0.5) by (i) setting the mutation rate at θ/2 and (ii) assuming that each mutation changes the allele of the chromosome so that the allele of a chromosome is determined by whether it has an odd or an even number of mutations. For the constant population size and a single panmictic population (*D* = 1) we compared results with those based on simulating from the known stationary distribution of the population frequency of allele 1, using rejection sampling (see Donnelly *et al.* 2001).

## RESULTS

#### Single-population model:

First we used the CFTP algorithm to simulate from a series of single-population models. There are many demographic models that have been suggested or inferred for human or other populations (*e.g.*, Wakeley *et al.* 2001; Wall *et al.* 2002; Marth *et al.* 2004; Schaffner *et al.* 2005). We considered four different scenarios for the variation in population size and four different selection models. In each case we set *N*_{0} = 10,000 diploid individuals, θ = 1, and ν = (0.9, 0.1) (based loosely on appropriate mutation models for disease genes; see Pritchard 2001). The four population size scenarios are:

Constant: a constant population size.

Growth: an exponentially growing population with λ(

*t*) = exp(−0.7*t*) (based on an inferred model for β-globin, see Harding*et al.*1997).Bottleneck: a population bottleneck from

*t*= 0.15 to*t*= 0.175. During the bottleneck the effective population size is 1000, and prior to it it is 5000 (based on a model from Marth*et al.*2004).Complex: a more complicated scenario based loosely on the population size of a non-African population in the model of Schaffner

*et al.*(2005). It includes recent exponential growth and a bottleneck.

Plots of λ(*t*) = *N*(*t*)/*N*_{0} for each of these models are given in Figure 2.

The selection models are defined in terms of the selection rates σ* = (σ*_{11}, σ*_{12}, σ*_{22}). Our four selection models are (i) neutral σ* = (0, 0, 0), (ii) genic σ* = (0, 5, 10), (iii) heterozygote advantage σ* = (0, 10, 0), and (iv) heterozygote overdominance σ* = (0, 20, 10). For each combination of population size scenario and selection model we simulated 10,000 samples. The CPU cost varied across the 16 pairs we considered. In the constant-population-size case, on a 3.4-GHz laptop, it took 0.3, 1, 15, and 25 sec to simulate 1000 samples for the four selection models, respectively. CPU times were reduced by around a factor of 2 for the growth and bottleneck scenarios and were increased by around a factor of 1.5 for the complex scenario.

Histograms of the frequency of allele 1 in a sample of size 50 (conditional on the allele segregating) are given in Figure 3. These histograms agree with other simulation results for the cases that include constant population size (see *Verifying the CFTP results*). The different population-size scenarios have little effect on the frequency spectrum in the neutral case, but quite a notable effect for each of other selection scenarios. The effect is most pronounced for the heterozygote advantage case where the growth and bottleneck scenarios have substantially reduced any effect of selection.

#### Two-deme model:

We now consider a model based on two demes each containing a constant-sized panmictic population with migration between the demes. For simplicity we assume that each deme has the same population size and an identical migration rate from deme 1 to deme 2 and vice versa. We assume a total sample of size 100, with 50 chromosomes sampled from each deme. We assume ν = (0.5, 0.5) and present results for θ = 0.1. We obtain very similar results for smaller values of θ except that the probability of the allele segregating in the population is reduced.

We present results for four selection models and four migration rates. Again we summarize the selection models in terms of σ* = (σ*_{11}, σ*_{12}, σ*_{22}). Our four selection models are (i) neutral σ* = (0, 0, 0), (ii) genic σ* = (0, 5, 10), (iii) heterozygote advantage σ* = (0, 10, 0), and (iv) recessive σ* = (0, 10, 10). The four migration rates (the values of *M*_{12} = *M*_{21}) are 10, 2, 0.5, and 0.1.

We simulated 10,000 samples for each of the 16 combinations of selection model and migration rate. We verified the results for the neutral model with other simulation methods (see *Verifying the CFTP results*). The CPU costs of the simulation varied little with migration and were ∼0.4, 0.7, 10, and 4 sec per 1000 samples for the four selection models, respectively (CPU times for a 3.4-GHz laptop). The frequency spectrum for samples of size 50 within a single deme, conditional on the allele segregating, are shown in Figure 4. The amount of migration appears to have little effect except in the case of heterozygote advantage, when smaller migration rates appear to reduce the effect of selection.

We also studied the effect of selection on the degree of divergence in gene frequency between the demes. Table 1 gives the mean *F*_{st}-values for samples under a SNP ascertainment model that requires two randomly chosen chromosomes (across both demes) to segregate. If the observed frequencies of type 1 alleles in the demes were *p*_{1} and *p*_{2}, respectively, and *p* = (*p*_{1} + *p*_{2})/2, then the *F*_{st}-value for that sample was (*p*_{1} − *p*)^{2}/(*p*(1 − *p*)) (see, *e.g*., Section 2.3 of Nicholson *et al.* 2002). The weight given to a sample is proportional to *p*(1 − *p*) and gives more importance to samples whose minor allele frequency is close to 0.5.

As noted by Hughes *et al.* (2005), the effect of selection is to reduce the amount of variation in allele frequencies, and hence *F*_{st}-values, across demes. The effect is most pronounced as migration rates are decreased. (Measuring the variation in allele frequencies using the *d*-statistic of Nei 1987 produced similar results.)

We further tested the effect of different selection regimes within the two demes. For simplicity we assumed neutrality within deme 1 and the three selection models above within deme 2. The mean *F*_{st}-values in this case are given in Table 2. As expected, *F*_{st}-values increased as compared to the case of the same selection regime within each population, although only in the genic selection case are *F*_{st}-values consistently greater than in the completely neutral case.

#### Stepping-stone model:

We now consider a linear (circular) stepping-stone model . We assume 10 ordered demes, each of equal population size. Migration events are possible between neighboring demes, with *M _{i}*

_{,i+1}=

*M*

_{i}_{+1,i}= 1 for

*i*= 1,…,9 and

*M*

_{1,10}=

*M*

_{10,1}= 1. We consider the same mutation model and each of the four selection models used in the two-deme case and simulate samples of size 100, with 50 chromosomes sampled from deme 1 and 50 from deme 1 +

*c*for

*c*= 1, 2, 3, and 4. We are interested in the degree of correlation in allele frequencies for differing degrees of physical separation of the demes (

*c*) and differing selection models.

Again we simulated 10,000 samples for each combination of selection model and *c*. CPU costs were affected only slightly by *c* and were 0.5, 1.4, 40, and 20 sec per 1000 samples for the neutral, genic, heterozygous advantage, and recessive selection models, respectively (on a 3.4-GHz laptop). We calculated *F*_{st}-values in the same way as for the two-deme case (above), once again under a SNP ascertainment scheme that requires two randomly chosen chromosomes from our sample of size 100 to be carrying different alleles.

Results are given in Table 3. Again the effect of selection is to reduce the amount of variation in allele frequencies between demes. Most strikingly, for these selection models the amount of variation in allele frequencies depends little on the spatial separation of the demes, which is not the case for the neutral case. We further investigated this effect by simulating samples under the genic model for a range of smaller selection values: σ_{g} = 1, 2, 3, and 4 (see Table 4). The effect that *F*_{st}-values depend little on spatial separation is notable for σ_{g} ≥ 2.

## DISCUSSION

We have presented a method for simulating samples from the stationary distribution of a class of nonneutral population-genetic models. The method is simple compared to other approaches based on CFTP (Fearnhead 2001) due to the monotonicity inherent in the problems we consider. Thus simulating samples requires only (i) to simulate the number of branches within the ancestral selection graph back in time and (ii) to simulate the number of these branches (within each deme) that carry allele 1 forward in time for two initial configurations: all branches initially carrying allele 1 and all branches initially carrying allele 2. The resulting algorithm has been shown to be computationally efficient and enables simulation of a large number of samples from complex selection and demographic models within practicable CPU time. The main limitation on the simulation algorithm is the mutation rate θ, with the CPU cost appearing to increase proportional to 1/θ for small θ. Thus simulation for very small values of the mutation rate can be prohibitive.

The monotonicity characteristic of our models occurs because we consider only a single selective locus that carries two different alleles. For more general models, monotonicity will not necessarily apply, but simulating unordered samples as done here should still be easier and more efficient than simulating ordered samples as in Fearnhead (2001).

The examples we considered assumed either a single population and variable population size or a constant population size and multiple demes. There are alternative approaches for analyzing and simulating under the former class of models: in some cases direct simulation is possible by simulating the ancestral selection graph back in time until there is a constant population size and then simulating the alleles on the branches at that time from the known stationary distribution of the constant population size model. Alternatively, recent work by Evans *et al.* (2006) calculates the frequency spectrum under variable-population-size models and an infinite-sites mutation model, and the method used with SelSim (Spencer and Coop 2004) could easily be adapted to this situation.

However, simulating samples from nonneutral models with multiple demes is much more challenging, and we know of no other current coalescent-based simulation methods in this case. Some methods exist for special cases; for example, diffusion approximations for the island model in the limit as the number of demes tends to infinity (Cherry 2003; Cherry and Wakeley 2003).

## APPENDIX A: DIPLOID SELECTION DYNAMICS

Fix a deme and let *n* be the number of branches located in that deme and *n*^{(1)} be the number that carries allele 1. The dynamics at a diploid selection event are as follows (see Neuhauser and Krone 1997): (i) choose three branches at random, call the first the incoming branch, the second the continuing branch, and the third the checking branch; (ii) denote by *ij* the genotype given by the incoming and checking branch; and (iii) with probability σ_{ij}/σ the incoming branch is parental—otherwise the continuing branch is parental.

First consider the probability of the nonancestral branches at a diploid selection event both carrying allele 1. This corresponds to the first condition on *u* given by (1). This occurs if either (a) all three branches chosen in i carry allele 1 or (b) two of these branches carry allele 1 and they are nonancestral. The probability of a is *n*^{(1)}(*n*^{(1)} − 1)(*n*^{(1)} − 2)/(*n*(*n* − 1)(*n* − 2)). The probabilitiy of b iswhere the first term is the probability of choosing two branches carrying allele 1 and one carrying allele 2, the second term is the probability of the branch carrying allele 2 being the incoming branch and being ancestral, and the final term is the probability of the branch carrying allele 2 being the continuing branch and being ancestral.

Combining these probabilities gives the expression on the right-hand side of (1).

The right-hand side of (2) is one minus the probability of the two nonancestral branches carrying allele 2 and is calculated in an identical manner. The probability of *u* satisfying (2) but not (1) is thus the required probability of the precisely one nonancestral branch carrying allele 1.

## APPENDIX B: MONOTONICITY

For notational simplicity we drop the (1) superscript for the state. Consider two values for the states **n** and **n′** that satisfy **n** ≥ **n′**. To demonstrate monotonicity we need to show that for any possible event and value of *u* this ordering of the states is preserved.

Monotonicity at coalescence, mutation, and genic selection events holds trivially. Assume that one of these events occurs to a branch in deme *d*. Either *n _{d}* =

*n*′

_{d}, in which case the dynamics at this event are identical for both states, or

*n*≥

_{d}*n*′

_{d}+ 1, in which case the ordering is preserved as the transition changes the

*n*- and

_{d}*n*′

_{d}-values by either 0 or 1.

Monotonicity also follows for migration events. Consider a migration from deme *i* to deme *d*. We need consider only *n _{i}* ≠

*n*′

_{i}, as otherwise the dynamics at this event are identical for both states. However, in this case the ordering is preserved in deme

*i*by the same argument as above and is also preserved in deme

*d*as

*n*≥

_{d}*n*′

_{d}and the dynamics mean that whenever

*n*′

_{d}increases by one then so does

*n*.

_{d}Finally, consider diploid selection in deme *d*. By the same arguments as above monotonicity trivially holds if *n _{d}* =

*n*′

_{d}or

*n*≥

_{d}*n*′

_{d}+ 2, so we focus on

*n*=

_{d}*n*′

_{d}+ 1. The key point is to check that it is never possible for

*n*′

_{d}to be unchanged at this event at the same time as

*n*is being decreased by 2. Again, for ease of exposition, we slightly change notation and denote the number of branches in deme

_{d}*d*by

*n*. For such a transition to occur we would need(B1)for the

*n*to be decreased by 2 (see Equation 1) and(B2)for

_{d}*n*′

_{d}to be unchanged (using Equation 2 and

*n*′

_{d}=

*n*− 1). Now let

_{d}*a*= (σ − σ

_{11}+ σ

_{12})/(3σ) and

*b*= (σ − σ

_{22}+ σ

_{12})/(3σ). Inequalities (B1) and (B2) can simultaneously hold for the same

*u*if and only if

Thus for monotonicity we need to show that this cannot occur and thus that for all *n* and *n _{d}*

Now consider the right-hand side; this can be rewritten as

Now the first, second, fourth, and sixth terms of this expression sum to *n*(*n* − 1)(*n* − 2), so we get that the inequality we required simplifies to

Now this inequality trivially holds for *n _{d}* = 1 and for

*n*=

*n*. For larger values of

_{d}*n*and

_{d}*n*−

*n*the term in the brackets is decreasing with both

_{d}*n*and

_{d}*n*−

*n*(as both

_{d}*a*< 1 and

*b*< 1). So we need only show that the inequality holds for

*n*= 2 and

_{d}*n*−

*n*= 1. The bracket term in this case is 2(

_{d}*a*+

*b*− 1); and the inequality

*a*+

*b*− 1 < 0 holds if and only if 2σ

_{21}≤ σ + σ

_{11}+ σ

_{22}.

## Acknowledgments

Motivation for this work came from the International Centre for Mathematical Sciences workshop on Mathematical Population Genetics, March 2006. This research was supported by Engineering and Physical Sciences Research grant no. C531558.

## Footnotes

Communicating editor: M. Nordborg

- Received May 12, 2006.
- Accepted August 1, 2006.

- Copyright © 2006 by the Genetics Society of America