## Abstract

We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we have also made the assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.

KIMURA (1983) strongly promoted the idea that the abundant genetic variation seen in nearly every species studied must be neutral. Ohta and Kimura (1971) suggested a less strict version of this in which polymorphisms are explained by the constant input of nearly neutral mutations, *i.e.*, variants with selective advantages or disadvantages smaller than the reciprocal of the population size. After a great deal of debate and many analyses, summarized in Golding (1994), there are now few adherents to the strict neutral mutation hypothesis. Molecular techniques currently allow huge numbers of polymorphisms to be assayed with relative ease, and the resulting genomic data can be used to estimate the strength of selection associated with genetic differences (Sawyer and Hartl 1992; Bustamante *et al.* 2002). Recent statistical inferences made from such data emphasize the importance of selection and show that selective advantages or disadvantages on the order of the reciprocal of the population size may be fairly common (Sawyer *et al.* 2003).

The methods used in these works to estimate selection parameters from intraspecific genomic data rely on predictions of sample allele frequencies derived from forward-time diffusion theory. They do not use the ancestral genetic process known as the coalescent, which was introduced in the early 1980s by Kingman (1982a)(b), Hudson (1983), and Tajima (1983). The reason for this is clear: it is much simpler to predict the frequencies of selected alleles using diffusion theory than using the backward-time approach of the coalescent. For neutral loci, the coalescent provides a simple and useful description of the genetic ancestry of a sample of genetic data—the genealogy for short—from a large well-mixed, or panmictic, population of constant size through time. Due to the close connection between genealogies and genetic data, and to the ease with which genealogies can be simulated, a growing set of computational tools use the coalescent to make inferences about population history from DNA sequences; see Stephens (2001) and Tavaré (2004) for reviews.

The coalescent is robust to many kinds of deviations from its underlying assumptions (Kingman 1982b; Möhle 1998c), but it does not hold when members of the sample, or the ancestral lineages of the sample, are not exchangeable. In fact, the relative simplicity of the neutral coalescent process flows directly from the exchangeability of the lineages. Exchangeability, which results in the model from the assumptions of neutrality and panmixia, means that the statistical properties of a sample do not depend on how the sampled items are labeled (Kingman 1982b; Aldous 1985). Possible “labels” include the geographic locations where the samples were collected or the allelic states of the samples if these are known. When samples are not exchangeable, a related but more complicated model called the stuctured coalescent exists (Slatkin 1987; Strobeck 1987; Notohara 1990; Wilkinson-Herbots 1998), and computational methods of inference are being developed for this model as well (Nath and Griffiths 1996; Beerli and Felsenstein 1999, 2001; Bahlo and Griffiths 2000; De Iorio and Griffiths 2004).

Genealogies in the presence of strong selection, *i.e.*, with selective advantages or disadvantages much greater than the reciprocal of the population size, can be modeled using the structured coalescent with the subdivisions being the alternative alleles (Kaplan* et al*. 1988, 1989). For weak selection, Krone and Neuhauser (1997) and Neuhauser and Krone (1997) showed that the genealogy of a sample taken at random with respect to allelic variation is described by an exchangeable ancestral process. The ancestral selection graph (ASG) is a branching-coalescing random graph within which the simple, bifurcating genealogy of a sample is embedded. The elegant exchangeability of the ASG comes with a cost: it lends itself only to very computationally intensive techniques. In addition, there appears to be very little about genealogies in the ASG that differs greatly from genealogies under neutrality (Krone and Neuhauser 1997).

For these reasons, and also to be able to control for sampling, recent extensions to the ASG have modeled genetic ancestry conditional on the frequencies of selected alleles in the sample (Slade 2000a,b; Stephens and Donnelly 2003; Barton* et al*. 2004). This leads to dramatic improvements in computation time while preserving some degree of exchangeability among the sampled lineages (Slade 2000a). The conditional simulation of selected genealogies by Slade (2000a)(b) enables the timing structure of common ancestry to be quantified. In contrast to the unconditional ASG, genealogies conditional upon allele frequencies in the sample may be either larger or smaller on average than genealogies under neutrality (Slade 2000a); see also the review of Slade and Neuhauser (2003). The extent of these differences depends strongly on the initial sample configuration of alleles, particularly for small mutation rates. Under neutrality, samples in which there are approximately equal numbers of each allele have longer times to common ancestry than more unbalanced samples. However, the selective effect on genealogies for samples in which there are approximately equal numbers of each allele is greater than that of more unbalanced samples. We show that restricted migration can cause unbalanced samples to be converted to balanced samples, via the instantaneous sample-size adjustment mentioned above, and thus the time to common ancestry under selection is reduced.

Our purpose is to consider the ASG in the presence of population subdivision and migration. Population structure is evident in many genetic data sets but modeling it presents difficulties (Slatkin 1985; Avise 2000; Hey and Machado 2003), creating impediments both to inference and to understanding. Therefore, it is valuable to identify cases in which simple models may be applied even though the underlying dynamics appear complicated. We present a rigorous derivation of the ASG with very many demes (subpopulations) in an island model of migration (Wright 1931) and show that it possesses a simple structure akin to that under neutrality (Wakeley 1998). Five parameters collapse into just three (one each for migration, selection, and mutation) in the limit as the number of demes tends to infinity. Interestingly, the population rates of selection and mutation scale differently (see below) in the presence of population subdivision. This was found previously in forward-time analyses (Wakeley 2003; Wakeley and Takahashi 2004), but the result is more intuitive in the present case. We also describe a newly enhanced process for the simulation of genealogies conditional on the frequencies of alleles in the sample. We show using conditional simulations that population subdivision and migration among very many demes accentuate the action of selection on the genealogy.

## THEORY

The structured ancestral selection graph is the ancestral process for a class of forward-time models. Following Krone and Neuhauser (1997) we begin by describing a continuous-time model in which reproduction occurs according to the Moran (1958)(1962) model, but where the population is subdivided into demes (subpopulations) connected by migration. There are *D* demes, and in a general model these might have sizes *N _{i}*,

*i*= 1, … ,

*D*. We assume that each of the lineages in the population can be in one of two possible allelic states,

*A*

_{1}and

*A*

_{2}, with relative rates of reproduction λ

_{1}and λ

_{2}, respectively. However, as in Krone and Neuhauser (1997), we begin with a description in which the states of lineages are not specified and so each lineage undergoes two kinds of reproduction (birth/death) events. Type 2 birth/death events occur with rate λ

_{2}− λ

_{1}per lineage. These represent the action of selection since they are realized only if the lineage that reproduces is of the favored allelic type at the time of the event. Type δ birth/death events occur with rate λ

_{1}per lineage. The majority of birth/death events will be of this type, and they can be thought of as neutral in the sense that they are realized regardless of the allelic type of the parental lineage. Note that at this point, without specifiying the allelic types, lineages are exchangeable within demes but are not exchangeable between demes. Note also that, while this is a model of directional selection, it is possible to include frequency dependence (Neuhauser 1999) or general diploid selection (Slade 2000a).

Subdivision is mediated by a collection of *D* × *D* migration probabilities, *m _{ij}*, for

*i*,

*j*= 1, … ,

*D*. When a lineage in deme

*j*reproduces, either by a type δ or by a type 2 event, the individual its offspring will replace (

*i.e.*, the individual picked to die) is chosen uniformly at random from among the

*N*lineages in deme

_{i}*i*with probability

*m*. With probability 1 −

_{ij}*m*, where , the individual to die is chosen uniformly at random from within the same deme as the reproducing individual. As usual in Moran-type models, the individual chosen to reproduce might also be the one chosen to die. Although the model does allow “migration” back to the same deme (with probability

*m*) the effect of this becomes negligible in the large-

_{ii}*D*limit we consider below.

The general model described above forms the basis for a structured ancestral selection graph. This would be obtained via the usual large-*N* limit, with *D* finite, in the same way that the structured coalescent is obtained in the neutral case (Notohara 1990; Wilkinson-Herbots 1998). That is, we would let and assume that *c _{i}* =

*N*/

_{i}*N*,

*M*=

_{ij}*N*, σ = λ

_{i}m_{ij}_{2}− λ

_{1}, and θ =

*Nu*, where

*u*is the neutral mutation rate, are all finite as

*N*→ ∞, and time is measured in units of

*N*generations, where a generation is defined to be λ

_{1}

^{−1}time units. Note that a crucial assumption of the structured coalescent is that migration occurs only between the

*D*demes enumerated in the initial sample, and the ancestry is therefore always restricted to those particular demes.

We do not pursue this large-*N* limit here, but instead consider a limiting ancestral process that approximates the behavior of a population divided into very many demes and in which selective differences are small. Thus, we assume that the sample size is small relative to the number of demes in the population, rather than to the deme size. Historically, migration can take ancestors to any deme within the population. This follows recent coalescent work under the assumption of neutrality (Wakeley 1998) and work on the forward-time diffusion approximation for allele frequencies in the presence of selection (Wakeley 2003; Wakeley and Takahashi 2004). We set λ_{2} = (1 + *s _{D}*)λ

_{1}, and we use λ

_{1}

*s*in place of λ

_{D}_{2}− λ

_{1}for the rate of type 2 birth/death events below. This corresponds to a model in which there are two alleles,

*A*

_{1}and

*A*

_{2}, and allele

*A*

_{2}has fitness equal to 1 +

*s*while

_{D}*A*

_{1}has fitness equal to 1. We put a subscript on the selection coefficient in recognition of the fact that the limit we seek has

*Ds*(and

*Du*) finite as

*D*→ ∞. Following Lessard and Wakeley (2004), the large-

*N*, large-

*D*limit can also be studied. Similar limit results here require more stringent conditions: roughly, that

*D*goes to infinity faster than

*N*(see appendix A).

For simplicity, we assume that migration occurs according to Wright's (1931) island model, but adapted for Moran-type reproduction as in Wakeley and Takahashi (2004). That is, we assume that *m _{ij}* =

*m*/

*D*for all

*i*and

*j*, and that

*N*=

_{i}*N*for all

*i*. Thus, we have a model in which there are

*D*demes, each of size

*N*, and each of which receives a migrant from the total population with probability

*m*at each birth/death event. As in the unstructured ancestral selection graph, it is helpful to consider a graphical representation, which Krone and Neuhauser (1997) refer to as the

*percolation diagram.*This is shown in Figure 1 and is the structured analog of Figure 1 in Krone and Neuhauser (1997). Following a single lineage, either forward or backward in time, 2-arrows are encountered at rate λ

_{1}

*s*and δ-arrows are encountered at rate λ

_{D}_{1}. Each arrow has a probability

*m*of connecting to a lineage sampled uniformly at random from the entire population and probability 1 −

*m*of connecting to a lineage sampled uniformly at random from the same deme. Thus, there are four kinds of branches and these appear repeatedly in the history of every lineage with rates λ

_{1}(1 −

*m*), λ

_{1}

*m*, λ

_{1}

*s*(1 −

_{D}*m*), and λ

_{1}

*s*. Note that λ

_{D}m_{1}is simply a scaling factor that specifies the units in which time will be measured; λ

_{1}= 1 corresponds to the usual notion of measuring time in generations.

The *dual* or ancestral process is obtained by following lineages back in time, *i.e.*, up the graph in Figure 1. Because 2-arrows are realized only if the parental allele is of the favored type, both lineages must be followed in the ancestral process. A single genealogy is obtained, in the unconditional ancestral process, by tracing lineages back to the ultimate ancestor, assigning its type from the equilibrium distribution of allele frequencies (see below), and then following lineages forward in time, with mutation, and trimming off branches appropriately (Krone and Neuhauser 1997; Neuhauser and Krone 1997). Rather than this unconditional ancestral process, we consider the genealogy of a sample containing known numbers of different alleles. This leads to more efficient simulation of selected genealogies. Using the conditional approach, it is possible to label ancestral lineages as either *virtual* or *real*, and it is necessary to trace lineages back only to the most recent common ancestor of the sample, *i.e*., among the *real* lineages, rather than all the way back to the ultimate ancestor (Slade 2000a). There is also a minimal representation of the ancestral process that reduces the number of possible events required at each ancestral transition (Slade and Neuhauser 2003). We describe how this is done in the present, structured model in methodology.

The essence of the large-*D* result here, as in the neutral case, is that the rates at which events occur that affect the history of a sample from the population are very different when every lineage is in a separate deme than when at least one deme contains more than one lineage. The dependence on *D* is such that a *separation-of-timescales* result applies, as, for example, in Möhle (1998a)(b). Because of this, the history of a sample can be divided into two phases (Wakeley 1999). The first is a short *scattering* phase in which coalescent events occur between samples from the same deme and migration events in which lineages move to unoccupied demes (*i.e.*, demes that do not contain any ancestral lineages). The scattering phase occurs on a fast timescale and becomes an instantaneous sample size adjustment in the limit as *D* tends to infinity. The scattering phase ends when all remaining lineages are in different demes. At this point, the *collecting* phase begins in which pairs of lineages come together into a single deme and coalesce, eventually finding a common ancestor of the entire sample. The migration events that could be ignored in the scattering phase, *i.e.*, those in which lineages move to occupied demes, are now essential since a coalescent event can occur only between a pair of lineages if they are in the same deme. We use the scattering/collecting terminology even though we have incorporated selection.

### Forward-time analysis:

To simulate genealogies in both the conditional and the unconditional ASG, it is necessary to know the equilibrium distribution of the frequencies of *A*_{1} and *A*_{2} in the population. In fact, the ASG is a model specifically for a sample from such an equilibrium population. There will be no equilibrium without mutation, and here we follow Krone and Neuhauser (1997) in assuming that there is a probability of mutation *u _{D}* per birth/death event. We note that asymmetric mutation can also be accommodated (Slade 2000a). The forward-time dynamics of allele frequencies in a subdivided population may be complicated, but in the case of a large number of demes they are nearly as simple as in an unstructured population (Wakeley 2003). In a model very similar to the one we consider here, Wakeley and Takahashi (2004) showed that the frequencies of alleles in the total population change according to a diffusion process that is identical to the diffusion process in an unstructured population, only with a different timescale. Further, as these frequencies change (slowly) the collection of demes closely tracks an equilibrium distribution of allele frequencies.

Let the random variable *X*(*t*) be the frequency of allele *A*_{1} at time *t*, and let *x* represent a particular value of *X*(*t*). Because we consider the limit *D* → ∞ while *Ds _{D}* and

*Du*remain finite, mutation and selection do not appear in the equilibrium 1(Wakeley and Takahashi 2004), which here is the fraction of demes in the population that contain

_{D}*k*copies of allele

*A*

_{1}and in which

*a*

_{(}

_{k}_{)}=

*a*(

*a*+ 1) · · · (

*a*+

*k*− 1) is an ascending factorial. Clearly, the equilibrium vector ν does depend on the frequency of allele

*A*

_{1}in the total population, and this will change over time. However, Theorem 3.3 of Ethier and Nagylaki (1980) can be used to show that the collection of demes is always sufficiently close to ν that changes in

*X*depend only on the ensemble properties and Var

_{ν}[

*K*] =

*x*(1 −

*x*)

*N*

^{2}/(

*Nm*+ 1 −

*m*) and that the diffusion of

*X*is identical to the usual unstructured diffusion except that the timescale is increased by the factor 1 + (1 −

*m*)/(

*Nm*) (Wakeley and Takahashi 2004).

By considering overall limiting rates of type δ and type 2 birth/death events—which are λ_{1}*ND* and λ_{1}*s _{D}ND* (1 −

*x*), respectively—and conditioning on the number of copies of allele

*A*

_{1}in the deme where the reproduction event occurs, it can be shown that 2 We let 3and we further assume that 4Then, the diffusion process for the frequency of allele

*A*

_{1}has drift parameter

*a*(

*x*) = −(σ/2)

*x*(1 −

*x*) + (θ/2)(1 − 2

*x*) and diffusion parameter

*b*(

*x*) =

*x*(1 −

*x*). This process has a unique stationary distribution 5where

*B*is a normalizing constant; that is,

*h*(

*x*) satisfies 6(Wright 1949; Kimura 1955). With reference to Figure 1, the two-part equilibrium (1) and (5), which predicts the distribution of

*A*

_{1}among demes and across the total population, is what would be observed by assigning

*A*

_{1}'s and

*A*

_{2}'s to the lineages and then running the process forward a very long time.

### The collecting phase:

Consider the case in which *n* lineages are in, or are sampled from, *n* different demes. By tracing the lineages back in time without knowing their allelic types, we find a simple ancestral process in the limit *D* → ∞ and generate an intuitive understanding for the different scaling of mutation *vs.* selection in (4) and in Wakeley (2003) and Wakeley and Takahashi (2004). The four kinds of arrows, or reproduction events, discussed above will be encountered by these *n* lineages and will sometimes move them to occupied demes so that they might coalesce. We can recognize nine possible events for such a sample, and these are listed in Table 1. For example, the fourth kind of event is that a δ-arrow takes one of the *n* lineages to one of the other *n* − 1 occupied demes but does not connect to the resident ancestral lineage. The result is that now the *n* lineages, while still distinct, reside in *n* − 1 demes. Thus, at the next event the two lineages that reside together in one deme can coalesce. The interpretations of the other events in Table 1 are equally straightforward. Note that the events are categorized according to their effect on the ancestral lineages and also by their probabilities of occurring with reference to the limit *D* → ∞ (while *Ds _{D}* remains finite). As in the unstructured ancestral selection graph, when a 2-arrow is encountered, both paths are followed and the lineage splits into two lineages. We are interested in the rates of branching and coalescence as we follow the history of the sample back in time.

It is appropriate that we have ignored mutations above and in Table 1 because here we are dealing with the unconditional ancestral process. Mutations will occur with probability *u _{D}*, 0 ≤

*u*≤ 1, at both types of arrows or reproduction events. Along a single lineage, they occur with rate λ

_{D}_{2}

*u*=

_{D}*ND*(1 +

*s*)

_{D}*u*[1 + (1 −

_{D}*m*)/(

*Nm*)]/2, which becomes θ/2 in the limit

*D*→ ∞. As in Krone and Neuhauser (1997) and Neuhauser and Krone (1997), we superimpose this mutation process on the unconditional ancestral process. In the conditional ancestral process we consider below, it is necessary to treat mutation, coalescence, and branching simultaneously.

When *D* is large, the first two events in Table 1 will account for the vast majority of events. These events have rates of *O*(1), which here means that they have a finite and nonzero limit as *D* → ∞ for a given λ_{1}. It is easy to see that this is true because the rates of these events depend on *D* only through λ_{1}. These events are lineage switches within and between demes, but ones that preserve the basic sample structure of *n* lineages in *n* different demes. They do not change the rates at which events occur and Table 1 continues to apply. Note that with other kinds of (non-island) population structure, these events would change the state of the sample by moving lineages among different types of demes (Wakeley and Aliacar 2001). Here, they can be ignored due to the symmetry of the island model. The next most numerous events will be events 3–7, and these have rates of *O*(1/*D*), which means that these rates will approach zero as *D* → ∞ for a given λ_{1}, but that their rate of approach to zero will be inversely proportional to *D*. Again, the limiting process we seek is for *Ds _{D}* finite as

*D*→ ∞, so that

*s*is of

_{D}*O*(1/

*D*). Four of these events—3, 4, 6, and 7—do fundamentally alter the state of the sample. These are migration events to occupied demes, both with and without coalescence, and branching events, both with and without a migration event to an unoccupied deme.

Krone and Neuhauser (1997) and Neuhauser and Krone (1997) use the term *collision* to denote the event that an ancestral lineage splits and immediately coalesces with another ancestral lineage. In the unstructured ancestral selection graph, all collisions become negligible in the limit (*N* → ∞ and *D* = 1), and it is not necessary to distinguish between different kinds of collisions. Here, because *N* is finite, some collisions will occur with rates comparable to regular coalescent events. Event 5 in Table 1 is of this sort and has a rate of *O*(1/*D*). However, event 5 is a collision in which the lineage splits and then immediately coalesces with itself. Both forward and backward in time, this has no effect. The descendant lineage has the same parent regardless of whether the parental allele is *A*_{1} and the 2-arrow is not followed or the parental allele is *A*_{2} and the 2-arrow is followed. We refer to this event as a self-collision. Only self-collisions have the potential to occur with rates comparable to regular coalescent events. Other kinds of collisions, for example, event 8 in Table 1, which includes some self-collisions, have rates of *O*(1/*D*^{2}). The number of occurrences of all events with rates of this magnitude is shown later, in appendix A, to be negligible in the limiting (*D* → ∞) process.

Now, consider what happens in the limiting process. By substituting the value of λ_{1} from Equation 3 into the entries of Table 1, the limiting rates become 7891011121314where and σ = *NDs _{D}*. Again, the first two types of events do not change the state of the sample, so it is not problematic that their rates become infinite. It simply means that the ancestral lineages will encounter many birth/death events before anything of relevance happens to the sample. Event 3 is a coalescent event in which the number of lineages decreases from

*n*to

*n*− 1 and these

*n*− 1 lineages are all in different demes. Event 7 is a branching event, in which case the sample goes from

*n*to

*n*+ 1 lineages, and a migration event leaves all

*n*+ 1 lineages in different demes. In event 4 the number of lineages remains

*n*, but now two of them are in the same deme, while in event 6 the number of lineages increases to

*n*+ 1, and again two are in the same deme. To summarize, the first step that matters in the limiting (

*D*→ ∞) process for

*n*lineages in

*n*different demes can be a coalescent event or a branching event, but in either case the remaining lineages are all in different demes; alternatively it can be a migration event or a branching event that results in two lineages residing together in the same deme while the rest are in separate demes.

We note that Equations 13 and 14, correspond to events that would disrupt the dual process or greatly complicate the branching structure, respectively. The zero rates with which we have described them above refer to their instantaneous rates in the limiting process, but do not fully reveal that even over the entire graph, until the ultimate ancestor is reached, these events will occur with probability zero in the limit as *D* → ∞. Proof of this that allows us to omit these events without loss of generality is deferred to appendix A.

With the same level of detail as in Table 1, 27 different events can be distinguished for a sample of *n* lineages in *n* − 1 demes, *i.e.*, where a single pair resides in one deme. However, it is unnecessary to distinguish all of these, and Table 2 shows them grouped into just four kinds of events. The important difference from Table 1 is that events with rates of *O*(1) now have the potential to affect the state of the sample. The *O*(1) events in Table 2 are a coalescent event between the two lineages that share a deme and a migration event that sends one of these two lineages to an unoccupied deme. The other events that can change the sample are of *O*(1/*D*) or smaller, and these are branching events and migration events to occupied demes. These fast events would be problematic if they were to actually occur in the limiting process, but whenever a pair of lineages resides in the same deme events with rates of *O*(1) will dominate. These events, 1 and 2 in Table 2, end with all remaining lineages in different demes. This guarantees that there will never be more than one multiply occupied deme and, that when there is, it will contain just two lineages (appendix A). That is, in the limiting process, either a *V*_{1} or a *V*_{2} event always preempts a *V*_{4} event during the state where precisely one deme contains two lineages. Once in the resulting state, Table 1 applies and the highest rate events that affect the sample configuration are of *O*(1/*D*).

Thus, in a relatively short time, a sample of *n* lineages in *n* − 1 demes will be converted to a sample of *k* lineages in *k* demes. The value of *k* depends on whether a coalescent event or a migration event occurs. If it is a coalescent event, then *k* = *n* − 1, and if it is a migration event, then *k* = *n*. On the timescale above, with λ_{1} = *ND*[1 + (1 − *m*)/(*Nm*)]/2, these events become instantaneous because their rates approach infinity in the limit *D* → ∞. The result is an instantaneous adjustment that has two possible outcomes, with probabilities 15and 16This will act as a filter on the four-rate Poisson process described by Equations 9–12. Starting with the sample of *n* lineages in *n* different demes, whenever a migration event to an occupied deme occurs without immediate coalescence (event 4), there is a chance *P*_{2}(mig) that the sample will be returned to its original state. The rest of the time, *i.e.*, with probability *P*_{2}(coal), event 4 is converted to event 3, which is a coalescent event. Whenever a branching event occurs where both parents stay in the same deme (event 6), there is a chance *P*_{2}(coal) that it is converted into self-collision, so that the sample returns to its original state, and a chance *P*_{2}(mig) that it is converted into an observable branching event (event 7).

It is straightforward to show that a Poisson process filtered in this way is equivalent to a Poisson process with an adjusted rate; for example, see Wakeley (1999). Thus, the probabilities *P*_{2}(mig) and *P*_{2}(coal) narrow the relevant number of *O*(1/*D*) events to just two. Using Equations 9–12, and simplifying, the limiting rates of coalescence and branching become 17and 18respectively. Therefore, the rates of coalescence and branching in the limiting (*D* → ∞) ancestral process become identical to the rates of coalescence and branching in a panmictic ancestral selection graph, but where time is measured in units of λ_{1} generations. From Equations 3 and 15 we can write λ_{1} = *ND*/(2*P*_{2}(mig)). Since we assume that 0 < *m* ≤ 1 and *N* ≥ 1, we have 0 < *P*_{2}(mig) ≤ 1 and λ_{1} ≥ *ND*/2. With a migration rate of *m* = 1, the present model reduces to a panmictic model with timescale λ_{1} = *ND*/2. Otherwise, the effect of island-model subdivision is to lengthen the timescale of the ancestral (and the forward-time) process.

Note that the processes of selection and mutation respond differently to subdivision. Mutation scales with λ_{1}, while the selection parameter σ scales simply with *ND*/2. In other words, there are different effective population sizes for selection and for mutation. The effective size for selection is smaller and is equal to *P*_{2}(mig) = *Nm*/(*Nm* + 1 − *m*) times the effective size for mutation. The reason for this is that, when a branching event creates a new lineage in the ancestral process, it has a *P*_{2}(coal) = (1 − *m*)/(*Nm* + 1 − *m*) chance of being erased by a coalescent event, so only *P*_{2}(mig) of them are observable. In the present model, this cancels, exactly, the factor 1/*P*_{2}(mig) by which the number of branching events that occur in the limiting process is increased relative to the number under panmixia. For the same reason, there is also a difference between the scalings for recombination, which splits lineages, and mutation in similarly structured populations (Nordborg 2000; Lessard and Wakeley 2004).

### The scattering phase:

The results obtained above apply to the history of a sample of *n* lineages in *n* different demes. They follow from the fact that coalescent events within multiply occupied demes and migration events from multiply occupied demes to unoccupied demes occur with rates that are *D* times larger than the rates of branching events and migration events to occupied demes. Because of this, whenever multiple samples are taken from one or more demes, there will be a brief scattering phase and this will leave the remaining lineages in different demes (*cf.* Table 2) so that the collecting-phase results above apply. The number of lineages that remain at the end of the scattering phase, and that then enter the collecting phase, will depend on the number of coalescent events that occur within the multiply sampled deme or demes. In the limit as *D* → ∞, the duration of the scattering phase becomes negligible so that it can be treated as instantaneous when time is measured in proportion to *D* generations.

Because we have assumed that *s _{D}* is

*O*(1/

*D*), whereas the rates of within-deme coalescence and migration to occupied demes depend only on

*N*,

*m*, and the sample size(s), the scattering phase here is the same as that in a population in which all genetic variation is selectively equivalent. As in Wakeley (1998), the scattering phase for a sample of size

*n*all from a single deme is equivalent to a series of

*n*Bernoulli trials with probabilities of success 19for

*k*= 1, 2, … ,

*n*, and in which ℳ =

*Nm*/(1 −

*m*). Note that

*k*= 2 yields

*P*

_{2}(mig) given in Equation 15. Success is defined to be the migration of one of the lineages to an unoccupied deme, and each of these increases by one the number of lineages that will enter the collecting phase. If we use

*n*′ to denote the number of lineages that will enter the collecting phase, then 20is its probability function (Wakeley 1998), in which the |

*S*

_{n}

^{}| are unsigned Stirling numbers of the first kind and ℳ

_{(}

_{n}_{)}= ℳ(ℳ + 1) · · · (ℳ +

*n*− 1). Finally, the scattering phase occurs independently among demes, so that we can multiply the probabilities (20) over samples from different demes.

## METHODOLOGY

We have shown that the ancestry of a sample from a subdivided population, with equal deme sizes and island-type migration, and in which two alleles are subject to selection and mutation, includes an instantaneous scattering phase and then enters a collecting phase, which is equivalent to the unstructured ASG with mutation and selection parameters scaled appropriately. This means that if *Y* is any measurement on the sample, such as the time to common ancestry or the total length of the genealogy, we can compute properties of *Y* by conditioning on the outcome of the scattering phase. For example, if *E*[*Y*|**n**] is the expected value of *Y* for the sample **n** = (*n*_{1}, … , *n _{d}*) of size from

*d*different demes, then 21where

*P*[

**n**′|

**n**] is the product of probabilities like (20) over demes. Because the scattering phase always ends with each remaining lineage in a separate deme,

*E*[

*Y*|

**n**′] is the expected value of

*Y*for a sample of size |

**n**′| in our rescaled, unstructured, collecting-phase ASG. Below, we use the framework of Equation 21 both analytically and in simulations, where our results are also consistent with keeping track of allelic configurations during the scattering phase.

A compression of the Markov process that describes the conditional evolution of the unstructured ASG is found. This continuous-time Markov jump chain is an extension of the conditional ancestral selection graph of Slade (2000a) and is derived with new clarity. The result is a maximal compression of the ancestral selection graph in which the timing properties of the graph are retained. This continues, and improves upon, a similar enhancement of the conditional graph by Slade (2000b) that reduces its branching rate. This is an alternative version of the minimal representation of the conditional graph as discussed in Slade and Neuhauser (2003). Without the timing structure of the graph in place, results that describe some probabilistic properties of a single ancestral lineage within the ASG are obtained by Fearnhead (2002). The details of our general derivation are deferred to appendix B.

The simulations presented in results are performed using the corresponding adaptation of a computational Monte Carlo method as in Slade (2000a)(b). The purpose of such a simulation scheme is the calculation of an approximate probability distribution over the realizations of the genealogy. For a particular genealogy then, a weight is attributed to its associated time to the most recent common ancestor (*T*_{MRCA}), and the weighted average is evaluated upon completion of a large number of repetitions. The improved efficiency in having attained the maximal compression of the genealogical Markov chain is substantial; running time until convergence that was previously required is now at least fully halved. The feasibility of at least doubling the branching and mutation rates is also achieved. The performance gets better still when calculating statistics of the *T*_{MRCA} distribution, as it would also do for properties of subtrees of the genealogy. Our compression reduces the number of possible transitions required to describe the conditional ASG at any event, from 10 to only 6. Thus, the size of the state space of a realization of the conditional graph decreases exponentially. Hence, variance of any corresponding simulation technique is reduced. Depending on the parameter combinations, to produce the composite PDFs in results with a Pentium 3.06GHz Xeon processor, computing time ranged from a few hours to several days.

## RESULTS

From Equation 21 we can see that simple estimators of the migration parameter ℳ are as usefull when weak selection is present as they are when all variation is neutral. For example, let *q*_{0} and *q*_{1} be the probabilities of identity in state for samples of size two taken from the same deme and from two different demes, respectively. Taking the scattering phase into account, we have 22and if *q̂*_{0} and *q̂*_{1} are estimates of *q*_{0} and *q*_{1} from some data, then 23is an estimate of ℳ. Note that (23) has the same form as the various estimators of gene flow based on *F*_{ST} or pairwise sequence differences under the assumption of selective neutrality; see, for example, Hudson *et al.* (1992).

Neither the unstructured ASG nor our slightly more complicated structured ASG lends itself to analytical computations much more involved than the above. From Equation 21 it is clear how analysis of our structured ASG is related to analysis of the unstructured graph. So, for example, it is possible to average such expressions as Krone and Neuhauser's (1997) Equation 3.5 for the expected time to the ultimate ancestor of the sample over *n* = |**n**′| at the start of the collecting phase. Instead of this, we have used simulations that include the scattering phase and the maximal compression of the ASG described in appendix B to compute the probability density function of the time to the most recent common ancestor of a sample (*T*_{MRCA}).

In unstructured nonneutral genealogies with small mutation rates certain sample configurations, among a sample of size *n*, have reduced mean coalescent times when compared to their neutral counterparts, and other sample configurations have enlarged coalescent times (Slade 2000a; see also Slade and Neuhauser 2003). The mixtures of reduced sample configurations at the start of the collecting phase in the structured model result in novel distributions of the *T*_{MRCA} under selection. This mixture is the same under neutrality as it is with selection. The effect of the scattering phase under neutrality is governed by the level of mutation in the collecting phase. Although the sample size is reduced, it can easily correspond to an increased *T*_{MRCA} because the adjusted sample configuration requires longer to assimilate its ancestors (particularly for low mutation rates).

In the unstructured ASG as the mutation rate decreases the conditional genealogy becomes more sensitive to the presence of selection (Slade 2000a). However, the sample configurations that yield the greatest selective effect are also the samples least likely to be obtained, having extremely small sampling probabilities. The sampling distribution is increasingly U-shaped as θ decreases, and under selection, samples dominated by the favored allele are more likely than those dominated by the unfavored allele. Introducing population subdivision makes the most sensitive samples accessible because the scattering phase delivers adjusted samples to the collecting phase. Low migration rates yield sample configurations at the start of the collecting phase that consist of close to *d A*_{1} alleles and *d A*_{2} alleles, provided all of the *d* demes initially sampled contained at least one of each allele. The highest sensitivity to selection therefore is achieved when the population is structured between very many demes connected by an exceedingly low migration rate. Note that this effect is independent of whether the initial sample contains mostly the favored or the unfavored allele.

To convert a coalescent time unit into years, it is necessary to calculate the product of the (estimated) effective population size and number of years per reproductive generation. When the migration rate ℳ ≪ 1 we compare the structured model only since according to the factor calculated in theory, 1 + ℳ^{−1}, a coalescent time unit in the unstructured ASG represents incomparably fewer numbers of generations. On the other hand, when ℳ ≫ 1 there is a meaningful comparison between results with and without very many demes. When ℳ = 0.01 in the structured genealogies (both neutral and selected) every generation that elapses according to coalescent time translates as 100 additional generations in an unstructured genealogy. Population subdivision magnifies the extent of changes to coalescence times and thus immediately accentuates the effect of selection on genealogical time depth.

The interaction between migration and selection as it affects the distribution of the conditional *T*_{MRCA} is shown in Figures 2 and 3. The mutation rate is set at three levels, θ = 0.01, 0.1, and 1. Low mutation rates are appropriate since we consider alleles corresponding to amino acid replacements at a hypothetical nucleotide site within a single locus. (We note that θ = 0.001 yields a distribution of the *T*_{MRCA} under neutrality that is extremely similar to that obtained for θ = 0.01.) The selection rate in our three cases is chosen in accordance with the weighted average predicted to be operating at nucleotide sites by Sawyer *et. al* (2003), namely σ = 5 and 7.5. The migration rate is set to two levels, ℳ = 0.01 in Figure 2 and ℳ = 10 in Figure 3.

The probability density function (PDF) for the low migration rate and low mutation rate case is presented in Figure 2A. A substantial change in the shape of the PDF is clear under selection. Increasing the mutation rate, the next lowest migration rate case is shown Figure 2B. The extent of the effect of selection appears to diminish at our highest level of the mutation rate in Figure 2C. The parameters (mean and standard deviation) of these distributions are compared in Table 3. In the last case just mentioned, we have a particularly novel result as the mean starts to increase again, *i.e.*, toward neutrality, with the selection strength above some threshold. In both Figure 2 and Table 3 it is assumed that samples of size 12 were taken from each of four demes, and that each of the four samples included one copy of allele *A*_{1} and 11 copies of allele *A*_{2}, or (1, 11) for short.

In contrast, when the migration rate is high the sample size adjustment is moderate, and the initial sample configurations show very similar responses to selection. The PDFs shown in Figure 3, A and B, correspond to different initial samples within each deme of being either all (1, 11) of *A*_{1} and *A*_{2} alleles or vice versa all (11, 1), respectively. The effect on the shape of the PDFs can be discerned; however, in Table 4 comparisons of the parameters of the distributions under panmixia are also shown. With a migration rate as high as ℳ = 10, the effects of many-demes population structure in Table 4, for both neutral and selected genealogies, can still be distinguished. In the neutral case (Table 4, left column), even with the sample size adjustment the mean *T*_{MRCA} is considerably higher than it is for panmixia. Comparing the two rows of Table 4, we can see that this also reflects the 10% lengthening of the genealogies expected when ℳ = 10. The difference in the selective effect between the two completely different initial samples is only minor.

## DISCUSSION

We have obtained a simple structured ancestral selection graph in the limit as the number of demes tends to infinity and with island-type migration among demes of equal size. The result is understood as an approximation to the ancestral process for samples in the presence of weak selection from populations composed of a large number of demes. Convergence to the simpler process occurs due to the difference in timescales between migration and drift within demes on the one hand and selection and mutation across the entire population on the other. Under neutrality, results of this sort can be proven using the technique developed by Möhle (1998a)(b) for calculating the transition rates of (ancestral) Markov processes with a separation of timescales. For ancestries under weak selection, the framework of birth-death processes is more appropriate because the addition of new lineages at type 2 branching events increases the dimensionality that would be required in Möhle's analysis. We have shown that one can truncate the system by ignoring the transitions of very small rates to obtain a closed approximated system; we then found the correct transition rates in the limiting process were provided by the corresponding calculation.

Our analysis does not provide an estimate of the error incurred in applying the limiting model to real (*i.e.*, finite) populations. One way to address this is using simulations, such as those reported in Lessard and Wakeley (2004). In that article, simulations demonstrated the rapid convergence of the distribution of the total length of the gene genealogy of a sample of size two to that predicted by the limiting (*D* → ∞) ancestral recombination graph with island model migration among demes. Populations composed of 100 or more demes were closely approximated by the limiting model.

It is clear that any quantity of interest can be computed using the framework of Equation 21, that is, by conditioning on the outcome of the scattering phase. In fact, in the case of weak selection and mutation considered here, there is a lot of parity with the strictly neutral model (Wakeley 1998). In other words, migration structures selected variation within and between demes in a way that is identical to the way it structures neutral variation. This can be seen as a consequence of the fact that the model applies to 0 < *m* ≤ 1 while *s _{D}* → 0 as

*D*→ ∞, so that migration is stronger than selection within demes. However, using the improved algorithm for simulation of selected genealogies conditional on the frequencies of

*A*

_{1}and

*A*

_{2}in the sample, we have also shown a strong effect of migration on the

*T*

_{MRCA}of some samples. Namely, limited migration can convert even very unbalanced samples into balanced ones,

*e.g.*, (1, 11) → (1, 1), and place the collecting-phase ancestral sample into the range of samples for which the

*T*

_{MRCA}is reduced by selection. This happens because there are no mutations during the scattering phase, while coalescent events between like alleles can occur. However, we note that although unbalanced samples, like (1, 11), within demes are more likely to be obtained than balanced samples, like (6, 6), when the migration rate is small it will be even more typical for single-deme samples to be fixed for one allele or the other. This can be seen in Equation 1, which converges to ν

_{0}= 1 −

*x*and ν

*=*

_{N}*x*in the limit

*m*→ 0.

We note also that if we had not assumed the mutation rate to be small (*u _{D}* → 0 as

*D*→ ∞), we would have obtained a neutral limiting graph. When the mutation rate in the ancestral selection graph is infinite the branching-coalescing structure collapses to a neutral coalescent process (Neuhauser and Krone 1997). Allowing mutation in the scattering phase of the structured ancestral selection graph also yields a neutral (coalescent) ancestral process, since this would render an infinite mutation rate in the collecting phase. That is, if the mutation probability μ is

*O*(1) instead of

*O*(1/

*D*) then mutation is a fast timescale event and will occur in the scattering phase amid the rapid migration and coalescent events also of

*O*(1). Whenever the process switches over to the slower transitions, such as those in Table 1 that are

*O*(1/

*D*), mutation still occurs at a fast rate and θ → ∞, as

*D*→ ∞.

We would like to clarify the case of zero mutation rate that leads to a neutral coalescent in the ASG. When θ = 0, ancestors are always of one type only, but in the dual process 2-arrows occur at a faster rate than δ-arrows. If all ancestors are *A*_{1} only the slower-rate δ-arrows are needed to describe the ancestral process since 2-arrows are never followed, and this yields the neutral coalescent. On the other hand, having all *A*_{2} ancestors both the δ-arrows and the faster-rate 2-arrows participate in the ancestral process. However, the outcome of 2-arrows is determined in this case; they will always be followed and no uncertainty arises in the path taken by the dual process of a sample. No virtual ancestors are needed and when a branching event occurs in the Markov process corresponding to the conditional ASG, it represents a so-called null transition, as discussed in appendix B. Recalculating the Markov process without these transitions yields the same neutral coalescent as required.

## APPENDIX A

We first consider a process that omits *U*_{8}, *U*_{9}, and *V*_{4}; a branching-coalescing graph results similar to the one described in theory but without any problematic events. We show that in our original process, each of these three events will occur with probability zero, by the *T*_{UA}, in the limit as *D* → ∞.

Analogously to that of Krone and Neuhauser (1997), for the structured ancestral selection graph with many demes collisions and such can be ignored. Due to the similarity of the techniques employed to this end we present an outline of the proof only.

Couple the original limiting ancestral process with another limiting ancestral process that has a higher branching rate σ > σ/2. This produces an exact copy of all events in the original process, except for the addition of branching events that result from the higher rate.

Let *M _{n}* denote the maximum number of ancestors that can be present in the graph. Equation 2.4 in Theorem 3 of Griffiths (1991) allows the following calculation, the details of which are apparently new, A1where ⌊

*x*⌋ is the integer part of

*x*and ε > 1 is any constant. The probability above is largest when σ > 1, and we need only consider that case in what follows, without loss of generality. Applying Stirling's formula, we then have the following upper bound on the probability in question, A2as

*n*→ ∞. One may then deduce the expression given by Griffiths (1991) that

*M*/

_{n}*n*→ 1, in probability, as

*n*→ ∞. Furthermore, we know that throughout the graph

*M*is bounded above by some constant

_{n}*K*dependent on

_{n}*n*, but independent of

*ND*and time.

Theorem 4.7.1 in Karlin and Taylor (1975) yields an upper bound on the expected total time until the ultimate ancestor (*T*_{UA}). That is, some constant *K _{n}*, independent of time and

*ND*, exists such that

*E*(

*T*

_{UA}) <

*K*. The combination of events {

_{n}*U*

_{8},

*U*

_{9}} occurs with rate λ

_{1}

*s*(

_{D}nm*n*/

*D*) in the original ancestral process. Fix a constant 0 < α <

^{1}/

_{5}. At most (

*ND*)

^{α}units of time elapse before the ultimate ancestor is found. The number of ancestors in the graph is perpetually bounded above by (

*ND*)

^{α}prior to finding the ultimate ancestor. Therefore, in the dominating coupled limiting process, the probability of {

*U*

_{8},

*U*

_{9}} occurring prior to the ultimate ancestor being reached cannot be greater than A3the right side of which tends to zero as

*D*→ ∞, since α <

^{1}/

_{5}.

The other event that could spoil the integrity of the ancestral process with an instantaneous proliferation of ancestors is *V*_{4}. Events *U*_{4} and *U*_{6} in Table 1 lead to the collection of events in Table 2, {*V*_{1}, *V*_{2}, *V*_{3}, *V*_{4}}. Note that since the former occur with a rate *O*(1/*D*), instead of *O*(1), there is a bound on the total number of entries into the set of *V*_{·} events that does not depend on *D*. The event {(*U*_{4} ∪ *U*_{6}) ∩ *V*_{4})} occurs with rate A4As in the previous paragraph, in the limiting process up until the ultimate ancestor is found, the probability that an event such as this ever occurs cannot be greater than the following: A5One may verify that the expression above does indeed tend to zero as *D* → ∞, since α < ^{1}/_{5}.

For a finite *N* there is no need to restrict the migration probability *m* for our zero limit results to hold. For the large-*N* limits, we assume a very small *m* such that *Nm* → *M* as *N* → ∞. Let us also replace *s _{D}* by

*s*= σ/(

_{ND}*ND*) in (A5). Three limits are separated into two types for us to describe. A limit with respect to

*both N*and

*D*that converges to zero can be obtained by setting

*N*=

*D*

^{β}, where β < (1 − 5α)/(5α). When passing both to the limit, we could instead allow

*N*=

*D*and fix α <

^{1}/

_{10}. After the substitutions just referred to, the limit is then taken with respect to the single variable represented in the equation,

*D*. There are two

*iterated limits*where we first pass to the limit with respect to

*D*and then with respect to

*N*, and vice versa. In the former we easily have convergence to zero, as required. In the latter,

*D*is held constant while

*N*passes to infinity and we find that (A5) would diverge. This is a consequence of our notation and we address this problem below.

We have already proven a bound on the maximum number of ancestors and on the *E*(*T*_{UA}) and that this bound depends on sample size but is independent of population size *ND* and time. [See (A2) and Karlin and Taylor's theorem.] We have expressed this bound relative to *N* and *D* by writing (*ND*)^{α}. Thus when *D* is passed to the limit first (or together with *N*) this all works out according to (A3) and (A5), and the limits do indeed converge to zero. That one of the iterated limits corresponding to (A5) diverges, namely as *N* → ∞ before *D*, is technically consistent but violates the earlier proven bounds. This failure of notation indicates that the technique cannot be applied to the iterated limit of *N* before *D*. In that case an expression of the form (*K _{n}*)

^{5}/

*D*describes its growth as

*N*and then

*D*are passed to the limit. Thus, we have now ensured eventual convergence to zero of all limits. We have shown that, over the entire graph until the

*T*

_{UA}, the collision probabilities are bounded above by a term

*O*(1/

*D*). These probabilities then converge to zero in all cases as

*D*→ ∞.

## APPENDIX B

A realization of the conditional graph can include the exact ancestral relationships that describe the ancestors involved at each event and the waiting times between events. An arbitrary but fixed order is required among the ancestors to identify which particular ancestors participate in any event. Combinatorial factors used subsequently to remove this order among the ancestors are valid provided that the transition probabilities and boundary conditions are translated accordingly (Slade 2000a). Our formulation here uses an unordered allelic configuration and an ordered decomposition into real and virtual ancestors.

As the conditional graph evolves backward in time, given an initial sample configuration, branching events add ancestors to the graph, but these are not actual ancestors of the sample. These *virtual* ancestors are contained within the graph for the probabilistic purpose of representing selective evolution and do not belong to the genealogy; the remaining actual ancestors are *real* ancestors of the genealogy. Define the pair **r** = (*r*_{1}, *r*_{2}) as the number of real ancestors of types *A*_{1} and *A*_{2}, respectively. Define also the pair **v** = (*v*_{1}, *v*_{2}) as the number of virtual ancestors of each allelic type. Then, *r*_{1} + *v*_{1} = *n*_{1} and *r*_{2} + *v*_{2} = *n*_{2}. Let **e*** _{i}* be the

*i*th unit vector.

The continuous-time Markov jump chain identified in Slade (2000b) makes transitions as B1where *n* = *n*_{1} + *n*_{2}, we have factored out *n*/2 from the above rates, and *i*, *j* = 1, 2 for *i* ≠ *j*. Now the transitions just described occur with probabilities given by their relative rates, directly obtained as quotients with the sum of the rates in the denominator. The waiting time between transitions is distributed exponentially with a rate parameter + *n*θ/2 + *n*σ/2, when there are *n* ancestors (in total) in the graph.

Note a *null* transition that leaves the ancestral configuration unchanged can occur. Events such as these can be removed with the use of generating functions. Removing an event in the (embedded) Markov chain above does not, however, filter the entire waiting time by the corresponding factor. The Markov chain contains dependencies among the set of possible transitions that can occur at any particular time. These dependencies correspond to the ancestral process and in particular to the percolation diagram as in Figure 1. Each member of the population produces two types of arrows, as events of independent Poisson processes. That is, these arrows occur independently of other members of the population and between both types of arrow. The null transition referred to above shows that certain 2-arrows are ignored and can be omitted from our consideration of the ancestral process. In fact, when the ancestor in the dual process from which the 2-arrow emanates is known to possess the *A*_{2} allelic type, it is in effect ignored. (Note that looking forward in time this is then the ancestor hit by the 2-arrow.) That is, no branching event corresponding to that 2-arrow need occur. This omission is of no consequence for coalescence or mutation rates, since coalescence corresponds to δ-arrows, and mutation occurs on an arrow with probability zero. The adjusted rate of branching for the new waiting-time distribution is *n*σ/2 × *n*_{1}/*n*, the rate of branching in the original graph multiplied by the proportion of *A*_{1} ancestors in the graph at the time the branching event occurs. The (unconditional) rate of a generic branching event in the new Markov chain is unchanged since we have simply omitted the null transition from the original Markov chain.

Consider also a mutation event from a virtual *A*_{1} to a virtual *A*_{2} ancestor; a recursive term is given in Slade and Neuhauser (2003). However, without the unordering factor, the expression is B2where *q*(**r**, **v**) is the stationary sampling probability of the marginal allelic configuration (*r*_{1} + *v*_{1}, *r*_{2} + *v*_{2}). This equation shows that the event just described can be separated into two distinct probabilistic components, a coalescent-type event and another null transition type of event. Operating in reverse to that which removed the null branching event, add a term *v*_{1}θ/2 to the mutation rate in the waiting-time distribution. This introduces another null mutation transition to the Markov chain that would occur with this same rate. Thus, in the Markov chain that results, a null transition occurs with a sum total rate of zero.

Therefore, the minimal continuous-time Markov jump chain that describes the unstructured ASG, and the collecting phase here, makes transitions as B3where again we have factored out *n*/2 from the above rates, and *i*, *j* = 1, 2 for *i* ≠ *j*. These transitions occur with probabilities given by their relative rates. Note that the only virtual ancestors that ever exist in this formulation of the graph are *A*_{1}. The waiting time between transitions in (B3) is now exponential .

## Acknowledgments

We thank both referees for their reports. P.F.S. thanks the Wakeley Lab and the Department of Organismic and Evolutionary Biology at Harvard for their hospitality during a visit. J.W. was supported by a Presidential Early Career Award for Scientists and Engineers from the National Science Foundation (DEB-0133760). P.F.S. was also supported by a travel fund from the Faculty of Science at the University of Sydney. This is research publication no. 0009 from Sydney University Bioinformatics and Technology Centre.

## Footnotes

Communicating editor: D. Charlesworth

- Received June 21, 2004.
- Accepted October 28, 2004.

- Genetics Society of America