## Abstract

Models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new environment by multiple mutations of similar phenotypic effect that arise in parallel, at the same locus or different loci. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally, leading to a “soft” sweep in the population. Here we study various models of parallel mutation in a continuous, geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele due to limited dispersal can allow other selected alleles to arise and start to spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically and so initially form a geographic patchwork, a random tessellation, which could be mistaken for a signal of local adaptation. This spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs are captured by a single compound parameter, a characteristic length, which reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. While our knowledge of these parameters is poor, we argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. Thus, we predict that as more data become available, many more examples of intraspecies parallel adaptation will be uncovered.

THERE are many dramatic examples of convergent evolution across distantly related species, where a phenotype independently evolves via parallel changes at orthologous genetic loci (Wood *et al.* 2005b; Arendt and Reznick 2008), indicating that adaptation can be strongly shaped by pleiotropic constraints (Haldane 1932; Orr 2005; Stern and Orgogozo 2008; Kopp 2009). There are also a growing number of examples of the parallel evolution of a phenotype *within* a species due to independent mutations at the same gene (Arendt and Reznick 2008) (which are sometimes referred to as genetically redundant). Some of the best-studied examples come from the repeated evolution of resistance to insecticides within several insect species (Ffrench Constant *et al*. 2000) and the resistance of malaria to antimalarial drugs (Anderson and Roper 2005; Pearce *et al.* 2009). Another example is the loss of pigmentation in *Drosophlia santomea* through at least three independent mutations at a *cis*-regulatory element (Jeong *et al.* 2008), while the evolution of pigmentation within vertebrate species provides further examples (Protas *et al.* 2006; Gross *et al.* 2009; Kingsley *et al.* 2009). There are also a number of examples of parallel evolution within our own species (Novembre and Di Rienzo 2009). For example, various G6PD mutations have spread in parallel in response to malaria (Tishkoff *et al.* 2001; Louicharoen *et al.* 2009), and lactase persistence has evolved independently in at least three different pastoral populations (Tishkoff *et al.* 2007; Enattah *et al.* 2008). A particularly impressive example in humans is offered by the sickle cell allele at the β-globin gene that confers malaria resistance, where multiple changes have putatively occurred at a single base pair (see Flint *et al.* 1998, for discussion). In each of these examples, multiple, independent mutations have led to the same or a functionally equivalent adaptive phenotype, although they differ in the degree to which the functional consequences and equivalences of the different mutations have been explored. Such repeated adaptive evolution via similar changes within a species, which we term *parallel adaptation*, may therefore be common. As we also address repeated evolution of a similar phenotype via changes at different genetic loci, this could more broadly be termed “convergent adaptation” (Arendt and Reznick 2008).

In many of these examples the selection pressure is patchy and rates of gene flow are low, increasing the chance of parallel adaptation. However, parallel adaptation can occur even in a panmictic population. For example, adaptation may occur from multiple independent copies of the selected allele present in standing variation at mutation–selection balance within the population (Orr and Betancourt 2001; Hermisson and Pennings 2005). Even when there is no standing variation for a trait in a panmictic population, a selected allele could arise independently several times during the course of a selective sweep, if mutation is sufficiently fast relative to the spread of the selected allele. This idea was formalized by Pennings and Hermisson (2006a,b), who showed that such soft sweeps may be expected when the population scaled mutation rate (the product of the effective population size and mutation rate) toward the adaptive allele is >1. Thus, repeated mutation may be quite common for species with large populations or where the mutation target is large, *e.g.*, knocking out of a gene. Pennings and Hermisson (2006a) showed that the number of independently arisen selected alleles in a sample has approximately a Ewens distribution, and properties of neutral variation at a closely linked site can be derived from this (Pennings and Hermisson 2006b). Such a selective sweep has been termed a *soft sweep*, as the population can adapt without the dramatic reduction in diversity at linked selected sites that is usually associated with a full sweep (Maynard Smith and Haigh 1974); see Pennings and Hermisson (2006a,b), Hermisson and Pfaffelhuber (2008), and Pritchard *et al.* (2010) for discussion and Schlenke and Begun (2005) or Jeong *et al.* (2008) for potential examples.

Clearly, if parallel mutations can occur during adaptation in a large panmictic population, then limited dispersal should further increase the chance of parallel adaptation, as other mutations can arise and spread during the time it takes one to move across the species range. Intuitively, a low rate of dispersal and a large mutational target should increase the chance of parallel adaptation (as in Coop *et al.* 2009; Novembre and Di Rienzo 2009), but it is unclear exactly how other dispersal, population, and mutational parameters play into the probability of parallel adaptation. However, in the absence of a formal model, many simple questions remain: Does parallel adaptation occur only in species with strong population structure? Weak selection pressures lead to slowly spreading mutations. Is parallel adaptation more likely in this case? This leaves us unable to understand the likelihood of parallel adaptation in particular examples (such as Flint *et al.* 1998) and more generally its role in geographic patterns of adaptation (such as Coop *et al.* 2009).

Here we study parallel adaptation in a homogeneous, geographically spread population. We focus on the case where a population is exposed to a novel selection regime throughout a homogeneous species range, and the population is initially entirely devoid of standing variation for the trait, assumptions that favor the fixation of only a single new allele in the population. We use simple approximations to derive theoretical results for the properties of parallel adaptation in a continuous spatial population with strong migration for a range of dispersal distributions (also called dispersal kernels, including fat-tailed examples). We are able to describe fairly completely the resulting patterns and show that they are well captured by a single compound parameter combining the rate of mutation and the speed at which the mutation spreads. For an introduction to the patterns of genetic diversity that can be expected from such geographic structure at both neutral and selected loci, see Lenormand (2002), Charlesworth *et al.* (2003), and Novembre and Di Rienzo (2009).

We show that when population sizes are sufficiently large and dispersal distances are small compared to the species range, parallel adaptation within a species is likely to be common, and quantify this relationship. Furthermore, we describe how separately arisen mutations will—at least for some time—leave behind a spatial pattern reflecting their separate origins.

The structure of this article is as follows: In methods we introduce and analyze our model of a continuous population, first in the classical context and then in a more general context that allows for accelerating waves (arising from fat-tailed dispersal distributions). In *Simulations* we present the results of some simulations of the continuous process, intended to assess the robustness of our results to deviations from the assumptions. In *Biological parameters and the characteristic length* and *Applications* we present and discuss the theoretical results in a few biologically reasonable contexts, providing numerical results to illustrate how the different parameters play into the probability of parallel adaptation. In the discussion we discuss consequences and extensions. Some mathematical arguments are postponed until the appendixes.

#### Modeling assumptions:

Here we describe the assumptions behind our model and give some background, before introducing in methods the model we analyze. First, we assume each mutation under consideration confers a selective advantage such that, upon appearing in the population, it quickly rises locally to some equilibrium frequency. Second, there is significant spatial structure; namely, migration is weak enough that the selected trait reaches an equilibrium frequency locally before spreading to the entire population. Third, the parallel mutations are distinguishable and confer the same selective benefit. Fourth, these mutations are neutral relative to each other, in the sense that in a population at equilibrium frequency (*e.g.*, fixation) for any collection of these mutations, the dynamics of their relative proportions occur on a longer timescale than their dynamics in the original background (examples are given below). We call this last assumption *allelic exclusion*, since it implies that areas fixed for one adaptive allele will not be rapidly overtaken by another.

Under these assumptions, a newly arisen advantageous mutation, if it is initially successful, will spread through the population in a more-or-less wavelike manner (more on this later). If another allele conferring the same advantage arises in a location the first has not yet reached, then the two waves spread toward each other and will at some point collide. What happens when they collide will generally depend on the details of their epistatic interaction or, if they occur at a single site, on their dominance interaction. However, by our assumption of allelic exclusion, the dynamics are slower than the spread of the selected alleles. This allows us to neglect the slower mixing of types and genetic drift that will happen in this phase, instead focusing on the first process by which independently arisen alleles partition the population.

In Figure 1 we show a cartoon to illustrate our model, and in Figures 5 and 6 we show the results of a simulation (described in *Simulations*).

##### Allelic exclusion:

The allelic exclusion assumption is fundamental to our approach. It will hold, for instance, if there is a single advantageous mutation, and we treat each time it arises independently as a distinct allele, identifiable by examination of linked neutral variation. It will also hold if mutations at multiple sites within a gene are genetically redundant, such as loss-of-function mutations, and no additional selective benefit is conferred by having a mutation at more than one site (though this may be an approximation, since even loss-of-function changes within the same gene may differ in their characteristics, as in Rosenblum *et al.* 2010).

Another important consequence of allelic exclusion is that a mutation occurring in a location where the advantageous allele already exists in large numbers is unlikely to persist or achieve high frequency—indeed, if the interaction is neutral and 999 other individuals already exist in the same location with the selected trait, then a new mutation will contribute on average only 0.001 of the future population and has high probability of being lost from the population by drift. This fact allows us to ignore all new mutations that occur after any selected allele has risen in that location to a nonzero frequency. In particular, the shape of the wave front will not be important, only how its leading edge spreads. Below, for convenience we often talk about the probability or rate of local *fixation*, but it follows from this observation that we need require only that the allele escapes loss from the population by drift and that some intermediate equilibrium frequency is reached, as would occur in the case of overdominance.

##### Selection:

We also assume that the advantageous, derived alleles have a reproductive advantage of (1 + *s*) relative to the ancestral type. In practice, in a diploid model with dominance or epistasis, or in the presence of density dependence, we require that both the manner in which a new mutation escapes drift and the way that it subsequently spreads through the population be well approximated by the simple haploid (or additive) model. Roughly speaking, this holds if the growth and spread of the allele are driven by growth where the allele is at very low frequency (and primarily occurring in heterozygotes). This implies that the probability a new mutation escapes drift is well approximated by 2*s* divided by the variance in offspring number [which is quite robust to the details of spatial structure (Maruyama 1970, 1974)] and that per-capita growth is fastest when at low frequency. In the usual formulation of diploid systems (Aronson and Weinberger 1978), this is satisfied if the fitness advantage of the homozygote is no more than twice the fitness advantage of the heterozygote. In other cases, *e.g.*, an Allee effect, the behavior can be quite different; see Stokes (1976).

#### Background on the wave of advance:

We model the spread of a selected allele by making use of existing work on traveling waves, a link first established independently by Fisher (1937) and by Kolmogorov, Petrovskii, and Piscunov (KPP) (Kolmogorov *et al*. 1937). We introduce and review the wave of advance literature here, as much of the subsequent development has occurred in fields other than population genetics. Suppose that individuals produce a random number of offspring with mean *r* and that offspring disperse a random distance with standard deviation σ, and let *p*(*t*, *x*) be the expected proportion of mutants at time *t* and location *x*. Suppose also that the selection coefficient *s* is small and the advantage is additive and that the population density ρ is fairly large. Both articles argued that if the dispersal distance is Gaussian, or if σ is small (so that the “long-time” dispersal distribution is Gaussian), then barring the appearance of new mutations, the time evolution of *p* is well described by the reaction–diffusion equation now known as the Fisher–KPP equation,(1)where *d* is the dimension of the species range. They furthermore showed in *d* = 1 that a “wave of advance” occurs as the solution to this equation and that for initial conditions where the allele is only polymorphic within a spatially bounded region, the solution moved asymptotically with speed . Kolmogorov *et al.* (1937) also covered the more general case in which *p*(*x*, *t*)(1 − *p*(*x*, *t*)) is replaced by *F*(*p*(*x*, *t*)) for an appropriate function *F*, which gives the density-dependent growth rate of the selected type, subject to certain conditions.

For many other choices of dispersal distribution and growth function *F* the advancing front of a new type also approaches a constant wave shape that advances at constant speed through time—a “traveling wave” solution, but with a speed not given by the same formula. Then the frequency of individuals of the selected type at *x* at time *t* can be expressed *p*(*x*, *t*) = *h*(*x* − *νt*), where *h*(·) gives the shape of the wave and ν is its speed. These traveling wave solutions have been studied for the Fisher–KPP equation for a range of appropriate *F* (Aronson and Weinberger 1975); the speed can often be found more easily than the wave shape (Hadeler and Rothe 1975). Radially symmetric solutions also exist, in which the new type travels outward from an initial origin; the behavior of such radially spreading waves depends on initial conditions, but will asymptotically move with the same constant speed and fixed wave shape as in one dimension.

Since the introduction of the Fisher–KPP equation, traveling wave solutions to reaction–diffusion equations have been studied in the ecological literature as a model of invading species (Skellam 1951; Kot *et al.* 1996), as well as in a range of other fields. See Aronson and Weinberger (1978) for some classical theory, general discussion, and context or Volpert *et al.* (1994) for a more extensive reference. Related models, using integrodifference or integrodifferential equations have been used by various authors to include various important biological factors such as age structure and fluctuating environments (Neubert and Caswell 2000; Neubert *et al.* 2000; Kot and Neubert 2008); see Hastings *et al.* (2005) or Zhao (2009) for a review. Density regulation is often discussed in these models, but important behaviors can usually be determined by a linearization, on the basis of how the new type grows when rare. Common to these models is the existence of traveling wave solutions, whose forms and speeds are often known only implicitly; most natural models of the spread of a new selected type can be translated into one of these frameworks. There is also a fruitful connection of these Fisher–KPP models to branching random walks that is beyond the scope of this article; see McKean (1975), Biggins (1979, 1995), and Kot *et al.* (2004). A similar model, the contact process, has also been widely studied in the probabilistic literature; see Bramson *et al.* (1989).

The qualitative behavior of the spread of an organism or an allele in a population can depend on the organism's dispersal kernel, defined as the probability density of the distance between mother and child's birth locations (see Shigesada and Kawasaki 1997 or Cousens *et al.* 2008 for discussion). Most mathematical models of invasions assume that the dispersal kernel has tails bounded by an exponential and obtain a constant wave speed. In some species this is appropriate, while in others, rare, long-distance migration events are important (Shigesada and Kawasaki 1997). In such organisms, dispersal may be better modeled by a kernel that is not bounded by an exponential (*i.e.*, a “fat-tailed” kernel), although there is generally insufficient evidence so far (Cousens *et al.* 2008, Chap. 5). Mollison (1972) showed that in a certain model, if the kernel is fat tailed, the range occupied by the expanding type will be patchy and will grow faster than linearly: the spread accelerates and eventually moves faster than any constant-speed traveling wave. Moreover, Lewis and Pacala (2000) established a link between leptokurtic kernels (kernels whose kurtosis exceeds that of the standard Gaussian) and patchy invasion dynamics. Leptokurtic but exponentially bounded kernels can lead to waves that initially accelerate but settle to a constant speed. We shall see that the important behavior of the model is not determined by the asymptotic, long-time speed of the wave, but rather by its behavior at intermediate times. Therefore, kernels that have similar short-time behavior but different long-time behavior can give rise to similar dynamics on the scale we are interested in. Consideration of other wave behaviors leads to a more general model, which we study in *The general case*.

The models reviewed above are haploid models; traveling waves in diploid models have been much less studied. Aronson and Weinberger (1975) show that in the diploid analog to Equation 1, if the difference in selection coefficient is small, then allele frequency dynamics are approximately governed by (1). If local populations are in Hardy–Weinberg equilibrium, then more general results apply, demonstrating the existence of traveling waves (Weinberger 1982; Zhao 2009). If dispersal occurs over a distance comparable to the width of the wave, then this will no longer be the case, and while recently developed general theory (Zhao 2009) might be applied, the existence and characterization of traveling waves in other diploid models is to our knowledge an open question. However, we certainly expect the behavior to be wavelike, and since our theory takes wave behavior as an input, we have no qualms about using our model to discuss the diploid organisms in *Biological parameters and the characteristic length*.

## METHODS

Consider a population with continuous spatial distribution. After the change initiating the novel selection regime (which occurs at *t* = 0), selected mutations can arise at random throughout the species range and, if they escape loss due to genetic drift, start to expand radially out from their origination point. Under our assumption of mutational exclusion, a new mutation can arise and spread only in areas not already reached by another selected allele.

We first examine the case where the wave travels with constant velocity, and then in *The general case* we study the case where the speed of spread of the selected allele is not constant through time, proving our results in this more general setting. It turns out that the properties of the final pattern of types can be conveniently summarized by a single compound parameter, a *characteristic length*. In the constant-speed case, models with different parameters will result in patterns with identical properties when the geographic distance is scaled in units of this characteristic length. This is similar to the use of effective population size in models of genetic drift, for which populations with very different sizes have identical rates of drift when time is scaled by the effective population size. This characteristic length can be defined as the distance traveled by an unobstructed spreading wave before it is expected that one other successful mutation would have arisen within the area so far enclosed.

Any such expanding waves of selected alleles in real populations are subject to stochastic fluctuations. Initially, we ignore this point, but in *Stochastic waves* we show that if stochasticity is taken into account, first-order properties such as the mean number of types depend only on the speed of the *mean* wave, in a way we make explicit.

#### The mutational process:

Imagine a large, haploid population with ρ individuals per unit area distributed uniformly over some range *U*. The spatial range *U* may be one- or two-dimensional, but must be connected (*i.e*., not composed of disjoint pieces). We allow the number of offspring produced by each individual to be random, with mean *r* and variance ξ^{2}, and suppose that each offspring of a nonmutant carries some beneficial mutation with probability μ. Mutants have an additive advantage of *s* relative to nonmutants and additional mutations have no effect; to fix things, suppose that nonmutants reproduce at rate 1 (so time is scaled in units of generations), while mutants reproduce at rate 1 + *s*, and that each individual reproduces independently of the others and of the state of the population. We frequently make use of the fact that the probability of local fixation of a single new mutant, since *s* is assumed to be small, is well approximated by 2*s*/ξ^{2}.

The set of times and locations (*t*_{i}, *x*_{i}) at which new mutants appear is a random set of points in space–time. Under our assumptions, it is well approximated by a Poisson point process in [0, ∞) × *U* with constant intensity ρμ*r*, the mutation rate per unit area per generation. (Recall that a Poisson point process with constant intensity is a random collection of points with the property that the number of points in any set has a Poisson distribution with mean equal to the area of the set multiplied by the intensity, and the numbers of points in any two nonintersecting sets are independent of each other.) By the “thinning” property of Poisson processes, the points of origin of new mutations that will be successful if no other mutant type has already colonized the location is also a Poisson point process in [0, ∞) × *U* with constant intensity λ = (ρμ*r*)(2*s*/ξ^{2}) mutations per unit area per generation.

#### Constant wave speed:

As reviewed earlier, if ρ is large enough that stochastic effects are small, the selected type will spread as an expanding wave with constant wave speed . The resulting picture is of waves spreading radially out from the site of each successful mutation, until they collide with each other. Rather than formally prove the convergence of some discrete model, we take this natural continuous model as our starting point for analysis: successful mutations arise as the Poisson process described in *The mutational process* and instantly begin to spread radially at speed *ν*.

This natural model of parallel adaptation has sources of new waves (new successful mutations) arising as a Poisson process in space and time and each wave expanding outward until encountering another wave, where they come to a halt (see Figure 1). This model has been studied before—again, first by Kolmogorov (1937)—as a model of crystallization, in which nucleation sites form at random points in time and space and initiate the radial growth of new crystals. More generally, the speed of the wave and the intensity of the Poisson process may not be constant, in which case the final tessellation of space determined by the crystals is known as the Kolmogorov–Johnson–Mehl–Avrami tessellation (Fanfoni and Tomellini 1998), The properties of these crystal shapes have been extensively studied in the case of constant-speed waves by Møller (1992, 1995) and others (Gilbert 1962; Bollobás and Riordan 2008).

To make this model precise, suppose that mutant type *i* arose at location *x*_{i} and time *t*_{i}. At each later time *t*, let *A*_{i}(*t*) denote the area that type *i* has spread to by time *t* that was not reached first by any other type *j*. This is formally defined by(2)where ‖*x* − *y*‖ is the Euclidean distance from *x* to *y*. The dynamics of the collection of areas *A*_{i}, for origins (*t*_{i}, *x*_{i}) distributed as a homogeneous Poisson process, is the (Kolmogorov–)Johnson–Mehl process.

One objection to this picture is that the waves do not really begin to spread instantly at speed *ν*; they first need to reach equilibrium (*e.g.*, fixation) locally, which takes a random time of order log(ρσ* ^{d}*)/

*s*generations, and converge to the equilibrium wave speed only after the effect of initial conditions dies out. Translating the points of a Poisson process by independent random amounts produces another Poisson process, but this one will no longer be homogeneous in time. However, if the standard deviation of the time to local fixation is small relative to λ, this is has little effect. Another objection is that the shape of the wave itself is stochastic, especially at the beginning. We ignore this detail and return to it in

*Stochastic waves*. The simulations in Figure 7, and others for a range of parameters (not shown), show that the waves quickly begin to spread at a constant speed, after a random delay with relatively low variance, reassuring us that our assumptions are reasonable.

In principle, any property of the model can now be found through calculations with Poisson processes, although for general species ranges *U* the formulas are complicated. All results in this section can be derived more or less explicitly by giving a population genetics interpretation to results in Møller (1992); they also follow from our results in *The general case*, where the proofs are given, which generalize the Johnson–Mehl model in a different direction than previously done by Møller (1992).

It turns out that all properties can be summarized fairly simply, especially in this constant-speed case. Recall that λ = 2*s*ρμ*r*/ξ^{2} is the spatial density of successful mutations per generation and is the speed of the wave, and let ω(*d*) be the area of the sphere of radius 1 in *d* dimensions, so ω(1) = 2 and ω(2) = π. We define the *characteristic length* χ, which will be useful in a moment, as(3)Fix some habitat shape *U* in *d* dimensions (*d* will be 1 or 2) with diameter (the maximum distance between any two points in *U*) equal to 1, and let *U*(*a*) be a habitat with the same shape, scaled by the factor *a* (and hence diameter *a*). Since the process with parameters (ν, λ, *U*(*a*)) can be realized by rescaling space by 1/*a* and time by λ*a ^{d}* in the process with parameters (ν/(

*a*

^{d}^{+1}λ), 1,

*U*(1)), the distribution of the final configuration of types within

*U*(

*a*) [specifically, {

*A*

_{i}(∞)}

_{i}] is a function of the single number(4)The quantity (

*a*/χ)

^{d}^{+1}is, up to a constant, the expected number of other mutations to arise in an area of diameter

*a*in the time it takes the wave to cover that area. Furthermore, χ is the distance traveled by an unobstructed spreading wave before it is expected that one other successful mutation would have arisen within the area enclosed so far. [This last interpretation is the reason for the appearance of ω(

*d*).]

A critical characteristic of parallel adaptation is the density of unique mutations per unit area. This mean density of types ν(*d*) can be calculated exactly and clearly displays the role of χ. We define ν(*d*) as the mean number of successful types arising in a region divided by the area of that region, assuming the species range is the infinite range ℝ* ^{d}* to avoid edge effects. (This turns out to be independent of the region used.) In two dimensions, we get from Equation 9 (after the integrals worked out in appendix b,

*Stretched exponential*) that(5)where Γ is the gamma function, and while in one dimension, ν(1) = χ

^{−1}2

^{−5/2}. In general, the mean number of successful types in a

*d*-dimensional region of total area

*A*will be

*A*/χ

*, up to a constant depending only on the shape of the region.*

^{d}The mean area occupied by a successful mutation in *d* dimensions is 1/ν(*d*). In *d* = 2 there does not seem to be a nice expression for the variance of this area, but numerical computation and simulation indicate that the distribution is not highly skewed—the areas occupied by different mutations are comparable (see, *e.g.*, Figures 5 and 6). Møller (1992) also computes many other quantities in this constant-speed setting, such as the mean density of interfaces between adjacent types or the mean number of neighbors of a given mutation—we do not consider these here.

A related quantity is the distance from a sample point to the origin of the mutation that eventually covers it, which relates to the total number of others sharing a sampled type. The mean and variance (and all other moments) of this can be computed in a similar way to Equation B4 in appendix b, *Stretched exponential*.

Besides the spatial scale, the other most important datum is the timescale of adaptation. Although the final distribution of types depends only on the characteristic length, the speed at which adaptation spreads through the population depends differently on the parameters. A convenient summary of this time is τ, defined as the time before a chosen point is hit by an expanding wave. [For the point *x*, this is τ = inf{*t* > 0 : *x* ∈ ∪_{i}*A*_{i}(*t*)}.] In the infinite *d*-dimensional range ℝ* ^{d}*, the time τ has a distribution such that τ

^{d}^{+1}is an exponential,so the expected time after selection begins until a chosen location has been reached by an adaptive mutation is(6)Note that the crucial quantity here in, say, two dimensions, iswhich depends much more strongly on the selection coefficient

*s*than does the characteristic length.

#### The general case:

The results of the previous section can be applied to any model having waves of advance with a constant wave speed, substituting this speed for ν. However, real dispersing populations often show important effects of rare, long-distance dispersal events (such as Coyne *et al.* 1982; Clark 1998). This implies that the dispersal distribution should in some sense be fat tailed, but since data on rare events are hard to obtain, there is no consensus on what distributions best model real dispersal (Kot *et al.* 1996). Since the behavior of the wave depends on the shape of the dispersal kernel—waves from fatter-tailed kernels may tend to accelerate initially—it is important to explore wave behaviors other than the simple constant-speed case (Kot *et al.* 1996). This is more technically demanding, but we find that the spatial properties can again be meaningfully described by a single characteristic length.

As mentioned before, Mollison showed that if the dispersal kernel is not bounded by a decaying exponential (which we now refer to as fat tailed), the range occupied by the expanding type expands at an ever-increasing speed (Mollison 1972). Although the resulting behavior is termed an accelerating wave (Kot *et al.* 1996), there is no “traveling wave solution” in the same precise sense as before; thus, we imagine that the frequency *p*(*x*, *t*) is radially symmetric and decreasing as ‖*x*‖ increases for large enough *t* and ‖*x*‖ and that for some fixed low frequency, ε, the distance *f*(*t*) at which the wave front first rises above a frequency ε [*i.e.*, *f*(*t*) = sup {‖*x*‖ > 0 : *p*(*x*, *t*) > ε}] is reasonably well behaved. In practice, we assume the existence of such an *f*(*t*), and refer to this as the leading edge, and proceed from there. Note that it is not necessary to know the entire wave shape, only the speed of its leading edge, because of our assumption of mutation exclusion. Note also that *f* depends on ε, although in many cases the dependence introduces only a constant; see *Applications* for examples.

This leads to the following more general model. As before, the population inhabits a region *U* with uniform population density ρ, and successful mutations occur at the points of a rate λ = 2*s*ρμ*r*/ξ^{2} Poisson point process in [0, ∞) × *U* (time and space). Any mutation that occurs in an unoccupied area begins to occupy the area around it, occupying by time *t* any previously unoccupied points no more than distance *f*(*t*) away, where *f* is an increasing function with *f*(0) = 0 and *f*(*t*) → ∞ as *t* → ∞. The function *f*(*t*), the *wave expansion profile*, is the radius of an uninterrupted wave after time *t*, and a point at distance *r* from the origin of a mutation will be reached after time units.

However, this model is not quite completely specified: unlike in the constant-speed case, the collision of two waves traveling at different speeds presents a delicate situation. Even laying aside questions of how the waves interact (does their interface grow more quickly?), the resulting patterns are not described by the simple analog of Equation 2, essentially because if waves are accelerating, then one mutation may surround another. The problem is depicted in Figure 2. In defining the regions *A*_{i} associated with each type *i* in the original way, we say that each mutation will occupy by time *t* any previously unoccupied points no more than distance *f*(*t*) away. However, as is shown in the last time slice of Figure 2, this requires the path leading to the occupation of certain points to *pass through* regions already occupied by another type. In the constant-speed case, it was always clear which points were reached first by each mutation (for a proof, see Møller 1992). A natural definition of the regions *A*_{i} that jibes with the “wave-like” intuition would be that at time *t*, region *i* expands outward (perpendicularly to its boundary) at rate , if it is not blocked by another region. This definition would allocate the gray region in Figure 2 to the (blue) mutation that arose second. Unfortunately, this model seems significantly more difficult to analyze, due to possible interactions between many mutational origins. Both definitions are likely equally good approximations to the true dynamics, which are inherently stochastic, especially in the accelerating wave case (Mollison 1972). Furthermore, if allowing mutations to interfere with each other only slows the waves down, the original definition will be more conservative than another that allows interference, in that it results in strictly fewer independent mutational origins.

With this in mind, we stick with the simple analog of Equation 2, defining formally for *t* > *t*_{i}(7)This preserves the same intuition as before, with the modification that waves may now move through each other invisibly to claim an area on the other side of a distinct region, which is unrealistic, but results in an underestimate of the number of independent parallel mutations.

Although this more general model has also been studied in the context of crystallization, where it is generally known as “Kolmogorov–Johnson–Mehl–Avrami” dynamics (Fanfoni and Tomellini 1998), the focus is on phase transitions and how the proportion of occupied space increases with time (governed by the Avrami equation). The statistics of the number and shape of the regions in this more general setting seem to have so far been unaddressed.

Any quantity we might be interested in follows in principle from a calculation with Poisson point processes—see Kingman (1993) or Daley and Vere-Jones (2003) for background. Figure 3 depicts the general procedure; it will be useful in visualizing the following argument. For instance, fix a time *t*_{0} and a point *x*_{0} that is at least distance *f*(*t*_{0}) from the boundary. The location *x*_{0} is covered by a mutant type if at some time previously, a mutation arose nearby enough that it could have reached that location in the time elapsed since it arose. If the mutation arose *t* time units in the past, it must be within distance *f*(*t*). Therefore, to see if a point *x*_{0} has *not* yet been covered at time *t*_{0}, we need only look back through time to see if, at each time *t* units in the past, the corresponding circle with radius *f*(*t*) is empty of new mutations. A trajectory of radially expanding circles moving back through time sweeps out a cone in space–time (whose sides are not straight if the speed is not constant); we denote by *h*(*t*) the area (in space–time) of such a cone of height *t*, defined by *h*(*t*) = |{(*s*, *y*) : *s* ≥ 0 and ‖*x* − *y*‖ ≤ *f*(*t* − *s*)}|. The area of this cone multiplied by the population density represents the total number of individuals a mutation could have arisen in. Since successful mutations form a Poisson point process in space and time, we know that the number of successful mutations in such a cone is Poisson distributed with mean λ*h*(*t*).

This simple fact gives us the distribution of the time until some (any) adapted type arrives at a given location. Denote by τ the time that *x*_{0} is first reached by an adaptation, shown in Figure 3. [Formally, τ = inf {*t* > 0 : *x*_{0} ∈ ∪_{i} *A*_{i}(*t*)}.] Then τ > *t* if the cone with base at (*x*, *t*) is empty of successful mutations, and soand so if the species range *U* is infinite (we mean ℝ* ^{d}*), the expected time until

*x*

_{0}is adapted is

The entire process is parameterized by three things: the wave expansion profile *f*(*t*), the mutation intensity λ, and the region *U*. However, by changing variables as before, we can remove λ and the scaling of *U*: the process characterized by (*f*(*t*), λ, *U*) is equivalent up to a linear scaling of time and space, to the process characterized by (*f*(*t*/λ*x ^{d}*)/

*x*, 1,

*U*/

*x*). (

*U*/

*x*is the region

*U*rescaled by the numeric factor

*x*.) In the constant-speed case

*f*(

*t*) = ν

*t*, then we are left with a single parameter,

*f*(

*t*/λ

*x*)/

^{d}*x*=

*x*

^{−(d+1)}ν

*t*/λ = (χ/

*x*)

^{d}^{+1}ω(

*d*)

*t*, where χ = (ν/(λω(

*d*)))

^{1/(d+1)}is the characteristic length of Equation 3, and recall that ω(

*d*) is the area of the sphere with radius 1 in

*d*dimensions. This suggests defining the characteristic length χ more generally to satisfy the equation(8)This has a natural interpretation. An unobstructed mutation in

*t*units of time will cover an area with radius

*f*(

*t*). The expected number of other mutations that occur in that area over that time period is λω(

*d*)

*tf*(

*t*)

*. After the wave has traveled distance χ, this expected number is λω(*

^{d}*d*)

*f*

^{−1}(χ)χ

*= 1. Thus, χ is the distance traveled by an unobstructed spreading wave before it is expected that one other successful mutation would have arisen within the area enclosed so far.*

^{d}Regardless of the wave expansion profile, if the range is large relative to χ, there will be many mutations with high probability; and conversely, a range small relative to χ will have few mutations with high probability. Indeed, by the time a single wave has traveled distance χ, in any other circle of radius χ there is a good chance that another mutation has already occurred, so that the chance a single mutation manages to fix everywhere before another arises is small.

##### Global properties:

Now we derive formulas for a few other quantities of interest: the mean density of successful mutations, the expected area covered by a typical mutation, and properties of the distance from a chosen point to its mutational origin. We provide these results and their derivations to give more intuition for the Poisson point process and space–time cones. Møller treated only the constant-speed case but allowed inhomogeneity in time; it seems that the case of a general wave expansion profile *f* does not appear in the literature. Less mathematically inclined readers may at this point skip to *Some fat-tailed kernels*, or even to *Simulations*, without much loss of continuity.

We assume that the region *U* is ℝ* ^{d}*. In general, the shape of the region will have some effect on these quantities, but if the region is large in all directions relative to χ, then the effect will be small. Also, without loss of generality, we can now rescale time so that the mutational flux λ = 1 (which also affects the wave expansion profile

*f*). When we compute numerical examples, we need to remember that time is in units of 1/λ generations.

We make much use of the volume of the “cone” in space–time swept out by the expanding mutation over *t* time units, defined to be and depicted in Figure 3. We also want to know the volume of the cone with radius *r* at the base, which (abusing notation a bit) we define by .

First consider ν, the mean number of successful mutations per unit area. Let *g*(*t*, *x*) be the probability that a successful mutation arose at location *x* and time *t* and that no other mutation reached *x* earlier. The mean number of successful mutations originating within a region *W* is the integral of *g*(*t*, *x*) over [0, ∞) × *W*. The probability that some mutation arose in a small region of size ε about (*t*, *x*) is approximately ε; and the probability that no other mutation had already reached that point is exp(−*h*(*t*)). Since this does not depend on *x*, the mean number of successful mutations originating in *W* is equal to the area of *W* multiplied by ν, which we now know to be(9)A subregion *W* of total area |*W* | will have on average ν|*W* | successful mutations arising within it (although note that mutations not arising within *W* could invade).

Now consider the area finally occupied by a “typical” successful mutation, which we denote by *A*. (Technically, with distribution given by the Palm measure.) Since the total area of a region divided by the number of mutations in the region, which converges to E[*A*] as the size of the region increases, can also be seen to converge to 1/ν, we also now know that .

Finally, fix some point *x*_{0}. Let *X* be the distance from *x*_{0} to the origin of the mutation that eventually encloses it, let τ be the time until *x*_{0} is first covered, and let *R* = *f*(τ) (depicted in Figure 3). As in the constant-speed case, the probability that *x*_{0} has not yet been reached by time *t* is exp(−*h*(*t*)), and so the distribution of τ is given by ℙ{τ > *t*} = exp(−*h*(*t*)). Then to have *X* = *x* and *R* = *r*, we need a mutation to have arisen at the appropriate time somewhere in the ring of radius *x* about *x*_{0} and no other mutation to have already arisen in the cone whose point is at (τ, *x*_{0}). The cone is empty with probability exp(−*h*(*r*)), and since the ring of radius *x*_{0} and width *dx* have area *d*ω(*d*)*x ^{d}*

^{−1}

*dx*(and the number of points in it is Poisson distributed), the contribution to the joint probability density of

*X*and

*R*is

*d*ω(

*d*)

*x*

^{d}^{−1}(note that each

*d*appearing in this last expression denotes the dimension). This joint density is then(10)Integrating over possible values of

*R*, we get a density for

*X*:(11)Changing the order of integration, the moments of

*X*can be written(12)

#### Stochastic waves:

We have treated the wave expansion as deterministic, but in real biological systems, the true dynamics will be stochastic. Established theory for Poisson processes allows us to at least write down analogous expressions in the general, stochastic case. For example, if the selection coefficient varies over a sufficiently small range that mutational exclusion approximately holds, we could model each type as having a randomly chosen speed. Alternatively, the shape itself could be random, with long-distance migrants causing discrete patches to appear outside of the main spread of the wave in a stochastic manner. Happily, it turns out that if we make the same definition as at the start of *The general case* to avoid the “delicate situation,” then the equations for ν, the distribution of τ, and 𝔼[*A*] all still hold, after replacing *h*(*t*) by the mean volume swept out over *t* time units, as we prove in appendix a.

#### Some fat-tailed kernels:

Dispersal kernels that lead to accelerating waves are thought to be important for the spread of real organisms (Shigesada and Kawasaki 1997). A consequence of the existence of a characteristic length is that the long-time behavior of a wave has little effect—most waves spread no farther than a small multiple of χ, and hence different *f* that agree over the scale of χ will result in similar patterns. Since even exponentially bounded kernels can lead to waves that accelerate for some time before reaching the asymptotically constant speed, it is of interest to look at different wave expansion profiles, regardless of what we believe about the tail of the dispersal kernel.

Kot *et al.* (1996) divide the class of fat-tailed distributions into those distributions with finite moments of all orders and those without. Since our interest is in the resulting patterns of diversity, and it is not our intention to analyze the precise behavior of the wave under different models of dispersal, in appendix b we follow Fourier transform methods of Kot *et al.* (1996) to somewhat heuristically obtain expressions for the wave expansion profile *f*(·) for two families of fat-tailed distributions without moment-generating functions: the stretched exponentials (with moments), and the Lévy symmetric stable distributions (without moments). We then apply the theory of *The general case*. These results are used in *Applications* to compare the three families at some biologically relevant values.

All kernels have a scaling parameter, denoted by σ, which is the scale on which distance is measured. To then compare different kernels, we need to standardize them to each other; however, we cannot match standard deviations because for some of these kernels, the standard deviation is not defined. We have chosen to standardize so that the interquartile ranges (IQR, the difference between the 75th and the 25th quantiles, equivalent here to the *median absolute deviation*) match those of the standard Gaussian. See Figure 4 for a depiction of the resulting kernels.

## RESULTS

#### Simulations:

To test the robustness of the theory to deviations from the assumptions, we implemented simulations in R (www.r-project.org). In the simulations, the population is a rectangular grid of demes with *N* haploid individuals in each. Each generation, each individual independently produces either one or no offspring; the probability she produces an offspring is *r* if she is of the ancestral type, and it is *r*(1 + *s*) if she is of any mutant type. Each offspring is a new, as yet unseen type with probability μ; otherwise it is the same type as the parent. Next, all individuals have the chance to migrate, which they do independently of each other with probability *m* to a nearest-neighbor deme. We also used a truncated power-law dispersal kernel, which gave similar results, which we do not display. Those that migrate outside the population are lost. Finally, the demes are resampled down to size *N*. As in the Wright–Fisher model, death does not occur explicitly, but only during the resampling phase. The history of a typical run on a one-dimensional grid is shown in Figure 5 with time in the vertical direction, and several time slices of a typical two-dimensional simulation are shown in Figure 6.

The simulations use discrete generations and discretized space, in contrast to our theory, and so provide better support for the approximations we use. The discrete model is, however, in the domain of attraction of the classical wave of advance—if we rescale time so each generation lasts ε, rescale space so the grid spacing is , and make selection weak, setting *s* = *s*′ε, then in the limit as ε → 0 and *N* → ∞ in the appropriate manner, we expect the frequency of mutant types to satisfy the Fisher–KPP equation (1) with *s* = *s*′ and σ^{2} = *m* 2^{−(d−1)}.

We first used simulations to investigate how well the wave behavior predicted by the Fisher–KPP equation approximates the wave speed of this discrete model. Deviations in the wave speed and the wavelike form of the spread are likely to be the largest sources of error in our approximations. The results from a representative set of simulations are shown in Figure 7, performed on a linear grid of 1000 populations, with *s* = 0.1, *N* = 1000, *r* = 0.4, nearest-neighbor dispersal, μ = 0, and an initial seed of 20 mutant individuals at the origin. For local dispersal the wave speed was constant, after a short transient period of random length, as the trajectories on the left of Figure 7 show. The speed found was one to two times faster than predicted, likely because our simulations have discrete generations (*r* was typically between and ) and discrete demes, rather than the continuous space and time of the Fisher–KPP equation. However, the wave speed does depend linearly on *m* and as predicted across different dispersal distributions (results not shown), which we view as the important verification.

To test our prediction that the mean radius of the regions occupied by distinct mutations was captured by our compound parameter, the characteristic length, we also compared the size of regions occupied by each type in simulations to the theoretical predictions. We ran 818 simulations on linear grids of 1000 demes with nearest-neighbor migration at 45 different combinations of the following parameter values chosen to obtain a good spread of characteristic lengths: migration probability *m* ∈ (0.025, 0.05, 0.1, 0.2), reproduction rate *r* ∈ (0.1, 0.2, 0.05), local population size *N* ∈ (200, 600, 1000), and selection coefficient *s* ∈ (0.08, 0.16, 0.24, 0.32, 0.4). Simulations were stopped at the first time the ancestral type went extinct, and the proportions of the total population that were of each type were computed. In a one-dimensional range, we expect the mean area occupied by a type (and hence, proportion of the total population, since all simulations had the same number of demes) to increase linearly with the characteristic length, with a slope depending on the details of the model (which we did not attempt to compute). The resulting distributions of proportions are shown as a function of characteristic length in Figure 8 (open boxplots), showing a good linear fit of the mean area to the characteristic length (solid line), confirming that our theoretical predictions fit the simulated model well.

##### Limits of the model:

In our view one of the more serious approximations we make is that once the allele is introduced to a deme it quickly reaches its equilibrium frequency locally. This allows us to assume that the time delay between when the selected allele arises and when it begins to spread as a wave of known speed is short and relatively constant across alleles. This approximation is important because it underlies our assumption of allelic exclusion, *i.e.*, that in areas reached by a spreading mutation, subsequent mutations do not arise and escape drift fast enough to also spread. In both cases “fast” must be quicker than the timescale on which mutations arise and escape drift locally. To demonstrate the shortcomings of this approximation, we ran an additional 90 simulations on the same linear grid with a Gaussian dispersal kernel whose SD is one-tenth the range size (100 demes), chosen to violate this assumption. The results are shown in Figure 8 (shaded boxplots). The local population size *N* was set to 10,000, to make the characteristic lengths comparable, and the other parameters were a subset of those chosen above [*m* = 0.05, *r* = 0.1, and *s* ∈ (0.16, 0.24, 0.32, 0.40)]. The observed proportion of the range occupied per type is far lower than would be predicted from our characteristic length, which is expected if mutational exclusion does not hold. Indeed, for these examples the width of the wave [which is (Fisher 1937; Kolmogorov *et al.* 1937)] is around one-third the characteristic length, indicating that a significant number of successful mutations will arise in a location where another type has already begun to occupy. In fact, we see in Figure 8 that for these simulations the mean number of types does not appear to depend on *s*, as expected in the panmictic model of Pennings and Hermisson (2006a). If the width of the wave was much smaller than the characteristic length, this would not be a problem. We return to the question of when our approximations hold in the discussion.

#### Biological parameters and the characteristic length:

The best summary of the probability and spatial scale of parallel adaptation in our model is the characteristic length. The simple form of the characteristic length, especially in the classical Fisher–KPP case, allows us to find how the various parameters affect the spatial scale and probability of parallel adaptation. For example, doubling the local effective population density has a similar effect to halving the standard deviation of the dispersal distance. Intuitively, this reflects the fact that halving the standard deviation of the dispersal kernel doubles the time it takes a mutation to spread a given distance, which in terms of the chance of parallel adaptation is equivalent to doubling the population density. Note, however, that these two changes are not equivalent in terms of the time it takes adaptive alleles to spread throughout the entire range—doubling the density will decrease the time until adaptation, while halving dispersal distances will increase the time, as seen in Equation 6. Furthermore, it is intuitively clear—and this can be made rigorous—that in a region large relative to the characteristic length, many parallel mutations are very likely.

With this in mind, we here compute characteristic lengths at some representative parameter ranges and in some specific situations, to give a sense of under what conditions parallel adaptation is likely. Consider a population spread over an area the size of Europe, here defined as the area west of the Ural mountains, which has an area of ∼10^{7} km^{2} (about a third the area of Africa) and is ∼4000 km across in the longest direction. Therefore, if χ < 4000 km, multiple mutations are likely. We take two population densities, ρ = 2 and ρ = 0.002, chosen respectively to reflect the human population density 1000 BCE (McEved and Jones 1978), and the long-term ancestral effective population size of humans, *N*_{e} = 10,000, spread out over the area of Europe. We vary the mutation rate between 10^{−8} and 10^{−4} mutations per generation, representing a mutational target of between 1 and 10,000 bp (the typical mutation rate in humans is ∼10^{−8} per base per generation). We keep the selection coefficient *s* = 0.01 fixed, since the results are fairly insensitive to changes in *s*. We vary the typical dispersal distance σ between 1 and 100 km [in line with human dispersal estimates (Wijsman and Cavalli-Sforza 1984)]. Finally, we compare several different dispersal kernels—rescaled so that each shares a common interquartile range—to see how fatness of tails affects the results (the stretched exponential, with α = , matches that used by Novembre *et al.* 2005).

From Figure 9, we see that the expected degree of parallel adaptation depends strongly on the parameters. At the lower population density (black lines), it is not until mutation rate is on the order of hundreds or thousands of base pairs (depending on σ) that independent origins are likely; at smaller mutation rates parallel adaptation is highly unlikely. At the higher population density (red lines), independent origins are likely even if the mutational target is only a few base pairs; at larger mutation rates the characteristic length falls to only tens or hundreds of kilometers, indicating a ubiquity of mutational origins. This is to be expected, because with ρ = 2, an area the size of Europe would have 10^{7} individuals, and thus a mutation rate of 10^{−4} per generation would produce 1000 mutant alleles in a single generation. Of course, a characteristic length that is on the order of the dispersal distance means that the model described here—a smooth circular outward spread of alleles—is no longer a good approximation to the true dynamics.

It is also clear from Figure 9 that over this range of parameters the choice of dispersal kernel does not affect the number of mutations as strongly as do other parameters. The characteristic length does vary between kernels, but not generally enough to change the conclusion that parallel adaptation is likely or unlikely. At certain parameter combinations the difference is large, but not over much of the biologically realistic range we have examined.

It is also useful to look at the expected amount of time that the population will take to adapt, which we display in Figure 10 over the same set of parameters. The first observation is that when parallel evolution is likely, it will also take a reasonably small amount of time—under the conditions where adaptation will take an exceptionally long time, most of the time is spent waiting for a single successful mutation. The different dispersal kernels, however, lead to more disparate times to adaptation. Of particular note is the Cauchy kernel, which if dispersal distance is small far outpaces the spread by other dispersal distributions. As with small characteristic lengths, very small times to adaptation should not be taken too seriously, indicating only that the spread happens quickly.

#### Applications:

##### Why are there so few recent Eurasian-wide sweeps in Humans?

The majority of recently arisen selected alleles, as identified from, *e.g.*, haplotype patterns, seem to be geographically restricted (Voight *et al.* 2006; Wang *et al.* 2006; Coop *et al.* 2009; Pickrell *et al.* 2009), occupying broad geographic areas such as Europe or East Asia or Western Eurasia. On the basis of this observation, Coop *et al.* (2009) and Pickrell *et al.* (2009) argued that few selected alleles have recently swept to fixation across Eurasia. There are at least three possible explanations for this pattern: (1) there has not been sufficient time for these alleles to spread; (2) the selection pressures are at a local scale, not Eurasia-wide; and (3) the selection pressures are shared across Eurasia and different populations have adapted in parallel. These explanations are not mutually exclusive, and to distinguish between them we need a much more in-depth knowledge of the phenotypes underlying the putative selective sweeps. However, the theory developed here can shed light on whether the last hypothesis is plausible.

We use a subset of the parameters in the previous section: a Gaussian kernel with σ = 100 km, a stretched exponential kernel with α = , and stable distributions with several values of α. As before, the non-Gaussian kernels are parameterized to have the same interquartile range as the Gaussian kernel with the same σ. Table 1 displays our computed characteristic lengths across the above combinations of parameters, at mutation rates of both 10^{−8} per generation (a single unique base pair change) and 10^{−5} per generation (a 1000-bp target, roughly the number of coding bases in a gene). The different dispersal kernels give roughly similar characteristic lengths, suggesting that these numbers are relatively robust to the choice of kernel.

Multiple mutational origins are likely if the characteristic length is shorter than the physical dimensions of the region. Eurasia measures >8000 km across, and so Table 1 suggests that multiple origins at a single base pair are very unlikely at the lower population density. On the other hand, if the mutational target is large, then multiple origins are likely at low densities, while at high densities independent origins are ubiquitous. The complementary cases of (ρ = 2, μ = 10^{−8}) and (ρ = 0.002, μ = 10^{−5}) give identical characteristic lengths of ∼3000 km, although the timescale on which the mutations spread differs. Thus for these two parameter combinations we can expect a few mutations to dominate within continents and for multiple mutations to be common in a population spread across an area the size of Eurasia. Obviously these calculations are very crude, as population densities vary through space and time, and dispersal across continents is not simply a function of geographic distance and individual dispersal. Nevertheless, these calculations suggest that it is plausible that for adaptive traits with reasonable mutational targets (*e.g.*, a change anywhere within a gene or pathway) even low population densities can lead to parallel adaptation across an area the size of Eurasia, and higher densities almost certainly will.

We note that as human population densities have increased dramatically over time, so too has the probability of parallel adaptation. It is interesting therefore to note that a number of recent human adaptations (*e.g.*, sickle cell alleles) involve repeated changes at very small mutational targets in relatively small geographic areas, while older adaptations from single changes (*e.g.*, skin pigmentation) are more broadly spread.

##### Parallel adaptation of the sickle cell allele:

The sickle cell allele HbS at the β-globin gene in humans provides a particularly interesting case of putative parallel adaptation. The HbS allele (β6 Glu → Val) has been driven to intermediate frequencies by selection within the past 10,000 years due to increased resistance to malaria of heterozygotes for the allele (Haldane 1949; Allison 1954; Currat *et al.* 2002; Kwiatkowski 2005). The HbS allele is present on at least four major distinct haplotypes in Africa, each at intermediate frequency within a different geographic region; the haplotypes are named after the population sample where they were first discovered (Central African Republic, Senegal, Benin, and Cameroon). This is consistent with multiple origins of this single-base-pair change. Note that a distinct, malaria resistance allele, HbC (β6 Glu → Lys), has also arisen in Africa at the same codon as the HbS allele (Trabuchet *et al.* 1991; Agarwal *et al.* 2000; Wood *et al.* 2005a), increasing our confidence that the mutational input was high enough to allow multiple types to arise. However, Flint *et al.* (1998) thought the hypothesis of multiple new mutations arising at a single base pair was extremely unlikely and proposed that it was more likely that gene conversion had spread a single mutation across multiple haplotypes.

The theory we have developed can be used to assess the plausibility of the multiple mutational origins of the sickle cell allele, by exhibiting parameter combinations that yield characteristic lengths consistent with the separation of the sample locations. [Recall that the wave of advance, and thus also our model, works in the case of heterozygote advantage (Aronson and Weinberger 1975).] The different HbS haplotypes co-occur within a few thousand kilometers of each other (see Table 5 of Flint *et al.* 1998) (noting that these locations are unlikely to reflect the geographic mutational origins, and mutations will have been spread by large population movements). As the HbS changes occur at a single base pair, the mutation rate would have been ∼10^{−8}, and we take an *s* = 0.05 (as in Currat *et al.* 2002). If human dispersal at that time was well approximated by a Gaussian kernel with σ = 100 km, then a characteristic length of ∼1000 km would require an effective density of individuals of ρ ∼ 25 km^{−2}, while if σ = 10 km, then we would require only ρ ∼ 2.5 km^{−2}. This latter set of parameters does not seem unrealistic, considering our knowledge of population density and dispersal parameters, so our model suggests that the hypothesis of multiple origins is not unreasonable.

## DISCUSSION

We have presented theoretical results on the prevalence of parallel mutations during the sweep of an adapted allele across the geographic range of a species. Parallel adaptation can occur in a species adapting to a shared selection pressure simply because selected alleles may spread slowly enough to allow other mutations to arise and spread elsewhere in the species range. The distribution of the number of unique types is given implicitly by a certain Poisson process, which we summarized by computing the mean values of several important quantities. Many features of the continuous model can be captured by a characteristic length, a compound parameter that combines the dispersal parameter, the mutation rate, and the population density, and our simulations confirm that this is a very useful predictor of model behavior. This characteristic length can be obtained under a wide variety of dispersal kernels, as long as the speed at which selected alleles spread under those kernels can be computed even if that speed is not constant. The regions occupied by distinct types have dimensions on the order of the characteristic length, so if the species range is at least as large as the characteristic length, then parallel adaptation is likely. The expected number of parallel mutations is a simple function that, as intuition would predict, decreases with dispersal rate and increases with mutation rate. Somewhat counterintuitively, the results are relatively insensitive to the strength of selection, as selection both hastens the spread of an allele and conversely increases the chances that a new mutation escapes drift.

#### Does parallel adaptation require strong population structure?

Our results confirm the intuitive idea that species with low levels of dispersal (and hence strong geographic genetic structure) may adapt to global selective challenges by parallel adaptation at separated geographic locations. However, the likelihood of parallel adaptation depends on dispersal strength *relative to population density* (ρ/σ), and so geographic parallel adaptation may be an important factor even in species that appear panmictic at neutral markers. The observation of relatively little neutral geographic genetic structure merely implies that genetic drift in subpopulations is slow compared to migration, which increases with both dispersal distance and local effective population density. (Neutral population structure is determined by Wright's neighborhood size, proportional to σ^{2}ρ; see Charlesworth *et al.* 2003.) Increasing dispersal distance (σ) reduces the chance of parallel adaptation, while increasing population density (ρ) increases the chance of parallel adaptation. Therefore, an absence of strong geographic structure cannot rule out the possibility of geographic parallel adaptation. In this context it may be helpful to consider the global effective population size in a geographically spread population, a common estimator of which is the average coalescent time of pairs of sequences sampled across the species range. Such estimators increase both with lower dispersal distances and with higher local effective population densities (Charlesworth *et al.* 2003). Thus, species with larger global effective population size are more likely to adapt by parallel mutation whether they are truly panmictic (as shown by Pennings and Hermisson 2006a) or selection is dispersal limited, as discussed here.

#### Signals in patterns of diversity:

While we have discussed our model results in terms of parallel adaptive changes at the same genetic locus, our results do not rely on an assumption of complete linkage between the selected alleles. The base pairs at which these changes occur can be contiguous, partially linked, or completely unlinked; we merely require that the mutations are selectively equivalent. Thus our results apply equally well to selected alleles of similar phenotypic effect that have arisen in parallel at different genetic loci, *e.g.*, mutations at different genes in the same pathway. There are a number of potential cases of parallel adaptation within a species where adaptive changes at different genes have produced similar phenotypes in different populations (Hoekstra and Nachman 2003; Nachman *et al.* 2003; Hoekstra *et al.* 2006; Steiner *et al.* 2009). For example, there is evidence for differences in the genetic basis of the adaptive response to shared selection pressures in European and East Asian human populations (Lao *et al.* 2007; Norton *et al.* 2007; Edwards *et al.* 2010).

If parallel adaptation has occurred, there are at least two potential signatures in patterns of genetic variation. The first is that, immediately after the sweep, the independent copies of the selected allele will form a spatial patchwork with patches of size ∼χ. Within each patch, a different selected allele will predominate. Each of these mutations may have arisen on a different haplotype, especially if neutral genetic variation varies across the species range. This linked genetic variation will be swept up to high frequency locally and so also form a patchwork, with different haplotypes common within each patch. These patches may maintain sharp boundaries in allele frequencies between them for some time and so may resemble local adaptation, despite the fact that the causal selection pressures are homogeneous (see Coop *et al.* 2009, for discussion). This similarity may be further compounded since the boundaries between selected types will tend to occur at geographic barriers to migration, as selected alleles will temporarily be slowed there (Piálek and Barton 1997).

Over time these spatial patterns will be erased through the mixing action of migration. This spatial mixing by migrants will occur in a manner analogous to heat flow, on a timescale given by the diffusive parameter σ, the standard deviation of the distance between the birth locations of parent and child. In a geographic region of linear dimensions *R*, the patterns will become erased in a time of order *R*^{2}/σ^{2} generations. If the alleles have strongly deleterious epistatic (or dominance) interactions, then their geographic mixing will be more complicated and perhaps slowed. A potential example of this is offered by Williams *et al.* (2005) and Penman *et al.* (2009) who discuss the role of epistasis in preventing the geographic mixing of different malaria resistance alleles in humans. Indeed, Kondrashov (2003) has suggested that the relatively slow spread of selected alleles across a species range might allow Dobzhansky–Muller epistatic incompatibilities to arise in parapatry, potentially leading to speciation.

The second, possibly longer-lasting, pattern is the partition of alleles and their linked haplotypic variation. If genetic drift is slower than spatial mixing, then the initial partition of alleles is spread out uniformly over space long before any alleles disappear or fix through genetic drift. After mixing by migration, the resulting patterns would be characterized by a region of reduced variation and longer linkage disequilibrium surrounding each selected locus. If the independent selected changes occur at the same gene, this will resemble a soft sweep in a single panmictic population as described by Pennings and Hermisson (2006a,b). Indeed, it is possible that some of the characterized putative soft sweeps (*e.g.*, Schlenke and Begun 2005) arose in this manner. If the changes occur at different genes, there will be a set of partial sweeps at the different loci. Such patterns could potentially explain the apparent excess of partial sweeps, compared to full sweeps, seen in human populations (Coop *et al.* 2009; Pritchard *et al.* 2010). Thus parallel mutation would allow a population to maintain a higher level of heterozygosity at the selected loci than would sweeps from a single mutational origin. A related argument has been made by Goldstein and Holsinger (1992), who discussed the role of genetically redundant mutations and isolation by distance in the maintenance of genetic variation in quantitative genetics models (see also Lande 1991 and Kelly 2006).

#### Exclusion and spatial structure:

Two of our main assumptions—mutational exclusion and our resolution of the “delicate situation” in *The general case* section—lead to an underestimate of the number of independent, parallel adaptations, while others—such as deterministic spread of the waves—affect only the variance of the number of parallel adaptations. It is also useful to compare with existing results on panmictic populations. As described in *Simulations*, simulations using long-distance dispersal that resulted in a nearly panmictic model produced a much higher number of parallel adaptations than predicted from our theory. This is in line with the results of Pennings and Hermisson (2006a), who showed that *within* a single large randomly mating population, a high rate of introduction of selectively equivalent mutations can allow multiple mutations to escape low frequency before the first to arise fixes in the population. Extrapolating from their results, we see that if population density is high enough in an area where a spreading mutation has begun to establish, then other mutations could arise and concurrently spread, undermining our assumption of allelic exclusion. The higher migration is relative to reproductive rate, the closer a model is to the panmictic model of Pennings and Hermisson (2006a). In general, since it neglects spatial structure, we expect the panmictic model to underestimate the true degree of parallel adaptation as well, so if the panmictic model predicts more parallel adaptations than does our model, we expect the truth to be closer to the panmictic predictions. Future work could relax our assumption that the mutations quickly reach equilibrium, allowing model predictions more accurate than either model. In any case, our results provide an underestimate of the prevalence of parallel adaptation, but with very widely dispersing species the results of Pennings and Hermisson (2006a) may be more appropriate.

#### Selective equivalence:

Throughout this article we have assumed that variation in the selection coefficient will be much smaller than the strength of selection (as follows from our focus on *parallel* adaptation). However, it is unclear how often the *strict* selective equivalence holds in practice. Mutations that have a convergent effect on a phenotype of interest may differ in their pleiotropic effects, and even identical changes at the same base pair may have somewhat different effects due to linked variation.

However, the characteristic length depends only weakly on *s*, so the effect of small differences in selection coefficient should be minimal. Further, our results in the *Stochastic waves* section suggest that if there are only small differences, it is reasonable to use the mean selection coefficient, and that the mutations will initially form a patchwork with the average size of a patch given by this characteristic length. The timescale over which the patchwork persists will be affected by the differences in selection coefficient. Suppose for simplicity that each newly arising type chooses one of only two distinct selection coefficients, either *s*_{1} (a “weak” mutation) or *s*_{2} > *s*_{1} (a “strong” mutation) that interact additively. The original patchwork is erased as the stronger mutations push into, or arise within and overtake, areas already occupied by the weak mutations. They do this at speed , so the timescale over which the original tessellation is erased is of order . The patterns in diversity resulting from multiple types with different selection coefficients will depend on the linkage of the loci underlying the different types. In some cases (*e.g.*, full linkage), the stronger allele may push the other out of the population as it spreads; while in other cases (*e.g.*, no linkage) the stronger allele will spread throughout the population but not disrupt the spatial pattern of the weaker alleles.

#### Outlook:

Our results demonstrate that if dispersal is indeed a limiting factor in the spread of selected alleles, then in large geographically spread populations, parallel adaptation will be common. As yet there are relatively few firm examples of parallel adaptive mutations within species, but we believe that this simply reflects the fact that we are just beginning to identify and understand mutations that contribute to adaptive phenotypes. It is notable that many of the cases of geographic parallel adaptation come from humans and the other species where phenotypes that have reasonably well-characterized genetic bases have been carefully studied (*e.g.*, pigmentation or drug and insecticide resistance). This suggests that further work on other species and phenotypes will uncover many more examples.

Genes or pathways that harbor different mutations that have swept in nonoverlapping parts of the species range will represent good candidates for geographic parallel mutation. One difficulty in interpreting these candidates will come in understanding whether they are approximately genetically redundant with respect to the phenotypes that selection has acted on or whether they dominate in different portions of the species range because they represent locally adapted alleles. The former explanation may be appropriate as a null hypothesis, as it requires fewer differences in selection pressures across the range and requires only that populations are geographically separated.

Aside from identifying candidates for geographic parallel mutation, a productive line of research is to understand whether population densities and dispersal patterns are conducive to their occurrence. The spatial density of individuals within a population is likely to fluctuate dramatically over time, so the long-term effective population size for the species is likely to be a very poor estimate for the rate at which selected mutations arise, especially in populations that have experienced recent rapid growth. The move toward genome-wide population resequencing data will allow the recent effective population size to be estimated from the rare alleles, and the spatial spread of these rare alleles will be informative about recent dispersal parameters (*e.g.*, Novembre and Slatkin 2009).

As yet we know relatively little about the full impact of long-distance dispersal, a situation that will hopefully be improved by the increasing spatial and genomic resolution of population genetic studies (*e.g.*, Novembre *et al.* 2008; Auton *et al.* 2009), along with the methods to accurately identify subtle signals of gene flow in such data sets (*e.g.*, Price *et al.* 2009). In many species rare, extremely long-distance migrants occur, which can have strong effects on the speed and patchiness with which the wave advances (Kot *et al.* 1996; Clark 1998; Lewis and Pacala 2000). While in numerical examples we did not see a strong effect on the likely amount of parallel mutation, this conclusion does not extend to all parameter values, and it is difficult to compare parameters across different dispersal distributions. If migration is not spatially restricted (*e.g.*, the fully connected “island” model), then we expect the dynamics to be significantly different. There are a number of examples of very rapid spread of selected alleles [*e.g.*, in malaria (Wootton *et al.* 2002; Roper *et al*. 2004; Anderson and Roper 2005) and of vertically transmitted parasites (Kidwell 1983; Turelli and Hoffmann 1991)], perhaps suggesting that the extent to which limited dispersal allows parallel adaptation is still a very open question and likely to vary between species.

## APPENDIX A: STOCHASTIC WAVES

Here we continue where the *Stochastic waves* section left off, to demonstrate when, and in what sense, expressions for the “mean wave” suffice, even if the true behavior is stochastic.

We need to know the probability that an isolated mutation arising at *x* will first cover a point at *y* after time *t*, which we define to be *q*(*t*, *r*), with *r* = ‖*x* − *y*‖, the distance from *x* to *y*. We assume that this quantity depends only on the distance between *x* and *y*, regardless of the direction. This does not require that waves are circular, only that there are no “preferred directions.” We also assume that each wave's stochasticity is independent of the Poisson process of locations and each other.

The mean time until a point *x*_{0} is reached by the adaptive type can be found in a manner entirely analogous to the deterministic case. Before, all waves were the same, so we needed only to keep track of their origins; now mutations can be thought of as having different “types,” chosen independently and randomly from a type space 𝒱, so that each point now consists of a triple (*X*_{i}, *T*_{i}, *V*_{i}), recording the location *X*_{i} and time *T*_{i} of the mutational origin and what type of wave *V*_{i} it will have, respectively. The types *V*_{i} are chosen independently from some common distribution. For a simple example, if only the speed is random, then 𝒱 is the set of positive real numbers, *V*_{i} simply records the speed of the wave, and *q*(*t*, *r*) = ℙ{*V*_{i} > *r*/*t* }.

Just as before, the number of points in any region of time, space, and type space is Poisson distributed, with mean proportional to its measure. We now measure “area” in the third coordinate using the distribution of *V*, so that the number of mutations that occur in a spatial region of area *A*, over a time interval of length *t*, and with type lying in some set 𝒰 is Poisson distributed with mean . The probability that the mutant type has not reached *x* by time *t* is the exponential of the total measure of *possible* mutants that would have reached (*x*, *t*). This measure can be written as(A1)so that *h*(*t*) is also the *mean* volume in space–time swept out by an expanding, unobstructed mutation over *t* time units. Just as before,(A2)Furthermore, following the same reasoning as for Equation 9, the mean density of types is given by(A3)

Since *h*(*t*) is the mean volume swept out over *t* time units, the equations for τ and ν are the same as in the deterministic case, except the path of the wave has been replaced by the path of the *mean* wave, in this sense. In other words, in computing the mean density of types, we may replace the path of the wave by its mean path.

## APPENDIX B: WAVE SPEEDS FOR SOME FAT-TAILED KERNELS

Here we apply the theory of *The general case* to the *stable* and the *stretched exponential* families of dispersal kernels. We do not dwell on the justification for expressions of wave expansion profiles of different dispersal kernels, preferring instead to take this as input into the theory, but for the interested (or suspicious) reader, here we outline how the expressions are arrived at, which follows Kot *et al.* (1996), with the minor addition of Equation B5. In each case, an expression for the wave speed is arrived at as follows. Suppose that there is a population of size *N* at every point and that mutant organisms have first a dispersal stage, where they disperse a distance according to a distribution *k*(·), and then undergo density-dependent growth, with a population of size *n*(*x*) at location *x* growing to *F*(*n*(*x*)). The ensuing discrete-time integrodifference analog to the Fisher–KPP equation studied, *e.g.*, in Kot *et al.* (1996) is

As is common in the study of such nonlinear equations, we then linearize the equation by assuming that the spread of the wave depends only on the behavior when the mutant type is rare, replacing *F*(*n*) with *nF*′(0), and writing *F*′(0) = (1 + *s*). See Kot *et al.* (1996) or Mollison (1972) for discussion of this assumption. Then the number of mutants *n* satisfiesSince the Fourier transform takes convolutions to products, if we write and for the Fourier transforms of *n* and *k*, respectively, then(B1)One might hope to then obtain information about the spread of the wave from this expression or even by explicitly inverting it. We assume that the speed in two dimensions is the same as in one dimension, which can be easily proved at least in some sense for many of these models.

#### Stretched exponential:

A family of dispersal kernels with moments but without moment-generating functions is the family sometimes called “stretched exponentials,” whose probability density function isfor 0 < α < 1. If α = 1 and *d* = 1 this is the Laplace distribution; more generally the distribution is sometimes also called the “error distribution.” Here α controls the decay of the tails and *c*_{α} is a positive constant depending on α chosen so that the IQR matches that of the standard Gaussian. This distribution with has been used by Novembre *et al.* (2005) for human dispersal, and Kot *et al.* (1996) showed that it gave the best fit from a few choices to dispersal data for *D. pseudoobscura*. We introduce the scaling parameter σ by replacing *k*(*x*) with *k*(*x*/σ)/σ, as usual.

Following the reasoning above, Kot *et al.* (1996) argued that the stretched exponential dispersal gives a wave that accelerates with a power of *t*:(B2)Because of this form, properties of the constant-speed case can be derived from the case α = 1, after substituting *v* for σ*s*/*c*_{α}. The characteristic length for this family of distributions has the form(B3)where is a constant. Note that this is independent of *s*, unlike the constant-speed case.

In the remainder of this section we neglect the demographic parameters *r* and ξ^{2}, assuming, for instance, that the offspring distribution is Poisson with mean and variance equal to 1. It is fairly straightforward to factor them in, but for simplicity we omit them. The variable *r* may appear, used as a radius; it should not cause confusion.

Using the fact thatwhere Γ is the gamma function, we can compute thatand apply the results of *The general case*. Therefore, the mean area occupied by a typical mutation is

Recall also that the mean time until a sample point is occupied satisfieswhich gives an idea of the timescale the process happens on. The factor of λ appears because of our time scaling.

Also, the moments of the distance *X* of a sample point to the origin of its mutation have a nice form:(B4)

#### Stable distributions:

For another example of fat-tailed dispersal distributions, we take the well-known family of symmetric stable distributions, which are parameterized by a scaling exponent 0 < α < 2. Stable distributions arise naturally in the generalization of the central limit theorem and as scaling limits for random walks with step distributions whose tails have power-law decay like *x*^{α} (so-called “Lévy walks”) and so are a natural choice for a dispersal distribution. For recent discussion of modeling real dispersal with such distributions, see Edwards *et al.* (2007) and Reynolds (2008). The best-known example is the Cauchy distribution (α = 1). The case α = 2 corresponds in some senses to the Gaussian distribution, but what follows does not apply in that case because Equation B5 for the tail behavior does not hold.

Denote by *k*_{α}(*x*) the density function of the α-stable distribution, again normalized to have the same IQR as the standard Gaussian. To incorporate a scaling parameter analogous to the standard deviation, we use the dispersal kernel *k*_{α}(*x*/σ)/σ.

In the case of a stable dispersal distribution, the Fourier transform (B1) can be explicitly inverted. If we begin with *n*_{0} mutants at the origin (as a delta distribution), then the spread is itself given by the dispersal kernel,so if *x*_{t} > 0 is such that *n*_{t}(*x*_{t}) = ε*n*_{0}, thenThe density of the stable distribution for general α is not known in general, but its tail behavior is. For large *x*,(B5)and so for large *t* and small ε,giving *f*(*t*) = σ(*t*(1 + *s*)* ^{t}*α sin(πα/2)Γ(α)/ε)

^{1/(1+α)}. Note that the asymptotic speed is no longer independent of ε—positions farther out the wave front move faster. For our numerical purposes, we take ε = 0.2.

An analytic expression for the characteristic length can be written using the Lambert *W* - function, but it is more efficient to solve (8) numerically. The integrals giving other properties of the process are difficult to do explicitly, but can be done numerically as well.

## Acknowledgments

We thank Dave Begun, Chuck Langley, John Novembre, Sebastian Schreiber, Monty Slatkin, and Michael Turelli for helpful discussions. The manuscript benefited from comments on an earlier draft by Chuck Langley, John Novembre, and Michael Turelli, as well as the editor and two anonymous reviewers. This work was supported by a Sloan Fellowship to G.C., and P.R. was supported by funds from University of California (Davis, CA) to G.C. and S. Schreiber.

## Footnotes

Communicating editor: J. Wakeley

- Received June 4, 2010.
- Accepted July 15, 2010.

- Copyright © 2010 by the Genetics Society of America