The Genealogical Consequences of Fecundity Variance Polymorphism
Jesse E. Taylor

## Abstract

The genealogical consequences of within-generation fecundity variance polymorphism are studied using coalescent processes structured by genetic backgrounds. I show that these processes have three distinctive features. The first is that the coalescent rates within backgrounds are not jointly proportional to the infinitesimal variance, but instead depend only on the frequencies and traits of genotypes containing each allele. Second, the coalescent processes at unlinked loci are correlated with the genealogy at the selected locus; i.e., fecundity variance polymorphism has a genomewide impact on genealogies. Third, in diploid models, there are infinitely many combinations of fecundity distributions that have the same diffusion approximation but distinct coalescent processes; i.e., in this class of models, ancestral processes and allele frequency dynamics are not in one-to-one correspondence. Similar properties are expected to hold in models that allow for heritable variation in other traits that affect the coalescent effective population size, such as sex ratio or fecundity and survival schedules.

THE population genetics of within-generation fecundity variance has been studied from two perspectives. Beginning with Wright (1938), several authors have investigated the relationship between the effective size of a panmictic population with seasonal reproduction and the variance of the number of offspring born to each adult within a season (Crow and Denniston 1988; Nunney 1993, 1996; Waples 2002; Hedrick 2005; Engen et al. 2007). Although the precise form of this relationship depends on other biological factors such as the mating system and the manner in which population regulation operates, each of these studies shows that the effective population size is a decreasing function of fecundity variance. Furthermore, provided that the variance and the coalescent effective population sizes coincide (Ewens 1982; Nordborg and Krone 2002; Sjodin et al. 2005), these results imply that both the rate at which neutral allele frequencies fluctuate from generation to generation and the rate at which lineages coalesce will be positively correlated with within-generation fecundity variance. For example, it has been suggested that the shallow genealogies that have been documented in many marine organisms are a consequence of the high variance of reproductive success in the recruitment sweepstakes operating in these species (Hedgecock 1994; Árnason 2004; Eldon and Wakeley 2006).

These results hold in models in which all individuals have the same within-generation (or within-season) fecundity variance. However, the evolutionary genetics of populations that are polymorphic for alleles that influence demographic traits have also been investigated. The first results of this kind were derived by Gillespie (1974, 1975, 1977), who used diffusion theory to show that natural selection can act directly on within-generation fecundity variance in a haploid population with nonoverlapping generations. By studying a simple model of a population composed of two genotypes, say A1 and A2, Gillespie (1974) showed that the fluctuations in the frequency of allele A1 can be approximated by a diffusion process with the following drift and variance coefficients,$Math$where p is the frequency of A1, N is the number of adults, and 1 + μi and $Math$ are the mean and the variance, respectively, of the number of offspring produced by an individual of type Ai. Most discussions of this class of models have focused on the fitness consequences of differences in fecundity variance, which are summarized by the drift coefficient, m(p), of the diffusion approximation. There are two main conclusions. The first is that because m(p) is an increasing function of the difference $Math$$Math$, selection can favor alleles that reduce within-generation fecundity variance even if these have lower mean fecundity. Such variance–mean trade-offs can be interpreted as a kind of bet hedging and could explain the evolution of certain risk-spreading traits such as insect oviposition onto multiple host plants (Root and Kareiva 1986) or multiple mating by females (Sarhan and Kokko 2007). On the other hand, because the strength of selection on fecundity variance is inversely proportional to population size, selection for mean–variance trade-offs will usually be dominated by changes in mean fecundity. For this reason, it has been suggested that within-generation bet hedging will be favored only in very small populations (Seger and Brockman 1987; Hopper et al. 2003), although recent theoretical studies have shown that bet hedging can evolve under less restrictive conditions in subdivided populations (Shpak 2005; Lehmann and Balloux 2007; Shpak and Proulx 2007).

Less consideration has been given to the diffusion coefficient, v(p), which differs from the familiar quadratic term, p(1 − p), of the Wright–Fisher diffusion. Because the variance effective population size of a monomorphic population depends on the fecundity variance, it is not surprising that v(p) has an additional dependence on the frequency of A1 whenever the two alleles have different offspring variances. However, as noted by Gillespie (1974), the relationship between allele frequency fluctuations and the allelic composition of the population is counterintuitive. For example, when p is close to 1, so that the population is composed mainly of A1-type individuals, the rate of allele frequency fluctuations is dominated by the variance of the A2 genotype. In particular, if we define the variance effective population size by the expression Np(1 − p)/v(p) (Ewens 1982), then not only is this quantity frequency dependent, but also it depends on the life history traits of the missing genotype whenever the population is fixed for one of the two alleles. In contrast, the coalescent effective population size of a monomorphic population depends only on the offspring distribution of the fixed allele. The discrepancy between these two quantities raises the following question: namely, How does fecundity variance polymorphism affect the statistical properties of the genealogy of a random sample of individuals?

The answer to this question is of interest for several reasons. First, although the effects of selection on genealogies have received considerable attention (Przeworski et al. 1999; Williamson and Orive 2002; Barton and Etheridge 2004), little is known about the genealogical consequences of variation in traits that alter the coalescent rate. Extrapolating from models in which the effective population size varies under the control of external factors, we might expect the coalescent process in a model with fecundity variance polymorphism to be a stochastic time change of Kingman's coalescent. However, the results derived in the next section show that this intuition is usually wrong. The second motivation is more practical. Even if changes in fecundity variance are usually controlled by selection on other traits, the existence of interspecific differences in fecundity variance suggests that there must be periods when populations are polymorphic for alleles that alter the fecundity variance. In these instances, it might be possible to use sequence data to identify the loci responsible for these changes, but to do so will require the development of methods that exploit patterns that are unique to models in which the effective population size depends on the genetic composition of the population. For example, whereas the effects of genetic hitchhiking are usually restricted to linked sites (Maynard Smith and Haigh 1974; Kim and Stephan 2002; Przeworski 2002; Przeworski et al. 2005), we will see later that selective sweeps by mutations that affect fecundity variance would have a genomewide impact on polymorphism.

Kingman (1982a,b) showed that the genealogy of a sample of individuals from a panmictic, neutrally evolving population of constant size can be described by a simple stochastic process known as the coalescent (or Kingman's coalescent). One of the most important properties of Kingman's coalescent is that it is a Markov process, a fact that is heavily exploited in mathematical analyses and that also allows for efficient simulations of genealogies. Unfortunately, this property generally does not hold in populations composed of nonexchangeable individuals. For example, if there are selective differences between individuals, then although the genealogy of a sample of individuals can still be regarded as a stochastic process, selective interactions between individuals cause this process to also depend on the history of nonancestral lineages. The key to overcoming this difficulty is to embed the genealogical process in a larger process that does satisfy the Markov property. This can be done in two ways. One approach is to embed the coalescent tree within a graphical process called the ancestral selection graph (Krone and Neuhauser 1997; Neuhauser and Krone 1997; Donnelly and Kurtz 1999) in which lineages can either branch, giving rise to pairs of potential ancestors, or coalesce. The intuition behind this construction is that the effects of selection on the genealogy can be accounted for by keeping track of a pool of potential ancestors that includes lineages that have failed to persist due to being outcompeted by individuals of higher fitness. Because the branching rates are linear in the number of lineages, while the coalescence rates are quadratic, this process is certain to reach an ultimate ancestor in finite time. The process can be stopped at this time, and both the ancestral and the genotypic status of individual branches can be resolved by assigning random mutations to the graph and then traversing it from the root to the leaves.

The second approach is due to Kaplan et al. (1988), who showed that the genealogical history of a sample of genes under selection can be represented by a structured coalescent process. Here we think of the population as being subdivided into several demes, or genetic backgrounds, consisting of individuals that share the same genotype at the selected locus. Because individuals with the same genotype are exchangeable, the rate of coalescence within a background depends only on the size of the background and the number of ancestral lineages sharing that genotype. In addition, mutations at the selected site will move lineages between backgrounds. To obtain a Markov process, we need to keep track of two kinds of information: (i) the types of the ancestral lineages and (ii) the frequencies of the alleles segregating at the selected locus. Fortunately, because one-dimensional diffusion processes are reversible with respect to their stationary distributions (i.e., the detailed balance conditions are satisfied), the ancestral process of allele frequencies at a locus segregating two alleles has the same law as the forward process. Subsequently, Hudson and Kaplan (1988) showed that the genealogy at a linked neutral locus can be described by a structured coalescent defined in terms of the genetic backgrounds at the selected locus; in this case, recombination between the selected and neutral loci can also move lineages between backgrounds.

The objective of this article is to extend the structured coalescent to population genetic models in which within-generation fecundity variance is genotype dependent. (The genealogical consequences of polymorphism affecting between-generation fecundity variance will be described in a separate article.) In these models, exchangeability is violated not only by selective differences between individuals, but also by differences in life history traits that affect coalescent rates and allele frequency fluctuations. Nonetheless, because lineages are exchangeable within backgrounds, the coalescence and substitution rates can still be calculated conditional on the types of the lineages and the genetic composition of the population. In the next two sections, I derive structured coalescent processes that describe the genealogy at a neutral marker locus that is linked to a second locus (the “selected locus”) that affects fecundity variance. This is first done for a haploid model and then extended to a diploid model in which there may be both sex- and genotype-specific differences in fecundity variance. Results for both models are summarized in Table 2.

This work shows that coalescent processes in populations with fecundity variance polymorphism differ from the structured coalescent in a monomorphic population in three ways. One difference is that in populations with fecundity variance polymorphism, the coalescent rates in the different genetic backgrounds are not inversely proportional to the variance effective population size. Instead, coalescence within each allelic background depends only on the frequencies and fecundity distributions of genotypes containing that allele. The second difference is that the genealogies at the marker and selected loci are correlated even when these loci are unlinked; i.e., fecundity variance polymorphism has a genomewide impact on genealogies and genetic variation. This follows from the calculations leading up to Equation 28, which show that the genealogical process at an unlinked marker locus can be represented as a stochastic time change of Kingman's coalescent dependent on the ancestral process of allele frequencies at the selected locus. The third and most surprising difference is that the correspondence between ancestral processes and allele frequency processes is many-to-one in diploid models with fecundity variance polymorphism. In fact, there are infinitely many combinations of genotype-dependent fecundity distributions (satisfying Equation 24) that have the same diffusion approximation but different genealogical processes. These results are illustrated numerically using simulations of the structured coalescent under directional and balancing selection. Finally, I examine the scope of the theory and some possible applications in the discussion.

In this section I consider a modification of the “culling without replacement” model of a haploid population containing N adult individuals with nonoverlapping generations (Gillespie 1975). Table 1 gives a summary of the relevant parameters and some related expressions that appear in the structured coalescent process corresponding to this model. As in the original formulation, I assume that there is a selected locus, segregating two alleles A1 and A2, which can influence individual fecundity, and I write p(t) and q(t) for the frequencies of A1 and A2, respectively, among the adults alive in generation t. I also assume that there is a neutral marker locus that is linked to the selected locus, but that has no impact on any aspect of demography. Each generation consists of the following sequence of events: reproduction, mutation, and recombination in the offspring, death of the adults, and density-dependent culling of the offspring pool. The aim is to derive a structured coalescent process, valid when N tends to infinity, which characterizes the genealogy of a sample of individuals at the marker locus.

View this table:
TABLE 1

Parameters and related expressions in the haploid model

To describe reproduction, suppose that the individuals in each of the two genetic backgrounds have been assigned an arbitrary order, and let ηi(j) denote the number of offspring born to the jth adult with genotype Ai. I assume that the random variables ηi(j) are independent and that the numbers of offspring born to adults with the same genotype at the selected locus are identically distributed. It is understood that the offspring variables ηi(j) are indexed by t and dependent on N, although these indexes are usually left implicit.

Mutation and recombination occur as follows. At birth, each offspring of an A1 parent mutates to A2 with probability μ2/N, while each offspring of an A2 parent mutates to A1 with probability μ1/N. As usual, I assume that mutations occur independently to different offspring and that nonmutant offspring inherit the genotype (at the selected locus) of their parent. I also assume that recombination between the selected locus and the marker locus occurs with probability r/N in each offspring and that each recombinant individual inherits its marker locus from an adult chosen uniformly at random from the population. Following reproduction, mutation, and recombination, the adults of the present generation all die, and the next generation is formed by sampling N of the offspring uniformly at random and without replacement.

#### Diffusion approximation:

To obtain a diffusion approximation for the frequency p(t), I need to make several assumptions about the offspring distributions, η1 and η2. First, to preclude the possibility that fewer than N offspring are born in any generation, I assume that each parent gives birth to at least one offspring; i.e., ℙ{η1 ≥ 1} = ℙ{η2 ≥ 1} = 1. Although the results in this section can be derived under the less restrictive condition that, on average, adults give birth to more than one offspring, the stronger condition will spare us some technical difficulties. The second assumption is that the means and the variances of the offspring distributions depend on N in the following way:$Math$In other words, whereas no restriction will be imposed on the difference between the variances of the offspring distributions, selection on mean fecundity is required to be weak, i.e., of order O(N−1), so that selection and genetic drift operate on comparable timescales. The implications of this assumption are examined in the discussion.

Under these conditions, Gillespie (1975) showed that if time is measured in units of N generations, then as N tends to infinity, the frequency process converges in distribution to a diffusion process with drift and variance coefficients$Math$(1)where the constants Vi = σi/R2 are the standardized variances of the offspring distributions and q = 1 − p. Standard theory for one-dimensional diffusions (Ewens 2004) tells us that this process has a unique stationary distribution on [0, 1] whenever both mutation rates are positive and that the density of this distribution is$Math$(2)where γi = 1 − 1/R + Vi, θi = 2μii, δ = s1s2 + V2V1, and C is a normalizing constant. Moreover, because one-dimensional diffusions are reversible with respect to their stationary distributions (Griffiths 2003), we know that the ancestral process of allele frequencies in a population at equilibrium is governed by the same diffusion approximation as the forward-in-time process.

#### Coalescent process:

Suppose that we have sampled n individuals at random from the population and let ni denote the number of individuals in our sample with genotype Ai. Throughout this section, I assume that the population is in equilibrium and that the frequency process is defined arbitrarily far into the past. To identify the structured coalescent process corresponding to this model, we need to calculate the rates at which coalescence, mutation, and recombination change the number of ancestral lineages belonging to each genetic background. Because the diffusion approximation determined by (1) is the limit of a sequence of discrete-time Markov chains with time rescaled by a factor of N, these rates can be calculated by multiplying the probability that a given event occurs in one generation (looking backward in time) by N and then taking the limit as N tends to infinity. In particular, this means that terms that are of order O(N−2) can be neglected in the following calculations. [To be rigorous, we should show that convergence is uniform in (p, n1, n2), but here I proceed heuristically, noting that a formal derivation can be given using the approach described in Barton et al. 2004.]

I first consider coalescence of A1-type lineages. Suppose that p and p′ denote the frequency of A1 in the current and the preceding generation, respectively. Let us calculate the probability, conditional on p, that two A1-type lineages coalesce without mutation or recombination. This is equal to the probability, conditional on p, that two individuals sampled uniformly at random from the population are both descended without mutation or recombination from the same A1-type parent, divided by the probability, also conditional on p, that two individuals sampled uniformly at random from the population are both of type A1. To calculate this ratio, let Y denote the total number of offspring born in the preceding generation,$Math$let $Math$ denote the number of nonrecombinant, nonmutant offspring born to the jth adult with genotype Ai, and note that the conditional probability, given Y and $Math$, that two individuals sampled uniformly at random from the population are both nonmutant, nonrecombinant offspring of this individual is $Math$($Math$ − 1)/(Y(Y – 1)). Consequently,$Math$(3)The last expression is obtained by conditioning on p′ and then using the exchangeability of the conditioned random variables $Math$.

The next task is to evaluate the limit, as N tends to infinity, of the conditional expectation in the last line of (3). This calculation depends on four observations. First, because the conditional distribution of p′ given p is concentrated around p when N is large, i.e., the frequency is unlikely to change very much over one generation in a large population, we can replace p′ by p inside the expectation and then pull this term outside. Second, the concentration of p′ around p also implies that the conditional expectation of the random variable $Math$ given p tends to the unconditioned expectation as N tends to infinity. In effect, knowing p allows us to predict p′ very accurately, but provides very little information about either Y/N or $Math$(1) when N is large. Third, since Y is a sum of independent random variables, each having mean R + O(N−1) and variance bounded by σ1 + σ2, the strong law of large numbers implies that Y/N converges almost surely to R. Finally, because the random variable $Math$(1) is conditionally binomial with parameters η1(1) and (1 − r/N)(1 − μ2/N) = 1 − O(1/N), we can replace $Math$(1) by η1(1) in the unconditioned expectation. Taken together, these observations imply that$Math$(4)Since the probability that two lineages coalesce in the same generation that they are affected by either mutation or recombination is O(N−2), we can calculate the asymptotic coalescent rate for two A1 lineages by combining (3) and (4). This gives$Math$

Similar calculations show that the probability of either a multiple merger or a simultaneous merger is asymptotically negligible as N tends to infinity with n1 fixed. Consequently, only binary mergers occur in the diffusion limit, and if there are n1 lineages of type A1 and if the frequency of A1 is p, then the asymptotic rate of a binary merger within this background is $Math$(5)

Coalescence rates of A2-type lineages can be calculated in a similar manner. If there are n2 lineages in the A2 background and the frequency of A2 is q, then the rate of pairwise mergers in the A2 background is$Math$(6)

My next aim is to calculate the rates at which lineages migrate between backgrounds through mutation at the selected locus. As above, let p and p′ denote the frequency of A1 in the present and preceding generations, and consider the probability, conditional on p, that a single lineage of type A1 is the nonrecombinant offspring of a parent with genotype A2. This is equal to the probability, conditional on p, that an individual sampled uniformly at random from the population is a nonrecombinant mutant offspring of an A2-type parent, divided by the probability, conditional on p, that an individual sampled uniformly at random from the population is of type A1,$Math$(7)where $Math$ denotes the number of mutant, nonrecombinant offspring born to the jth individual with genotype Ai and q′ = 1 − p′.

The limit of this expression as N tends to infinity can be evaluated using the same approach that was used to evaluate the coalescent rates. First, note that by the exchangeability of the terms inside the expectation and the fact that the random variable $Math$(1) is conditionally binomially distributed with parameters η2(1) and (μ1/N)(1 − r/N), (7) is equal to$Math$Then, by using the fact that q′ is concentrated around q = 1 − p when N is large, it can be shown that$Math$Next, because the probability that a particular offspring is both mutant and recombinant is of order O(N−2), this last result shows that the rate at which a single lineage migrates from the A1 background into the A2 background is equal to$Math$Finally, because the probability that two or more of the n1 A1-type lineages mutate in the same generation is also asymptotically negligible, it follows that the rate at which one of these lineages mutates into the A2 background is$Math$(8)A similar approach can be used to show that the rate at which one of the n2 A2-type lineages mutates into the A1 background is$Math$(9)

The final task is to determine the rates at which lineages migrate between backgrounds because of recombination between the selected and marker loci. For example, consider the probability that a nonmutant individual with genotype A1 inherits its marker locus from a parent with genotype A2. This is equal to the probability, conditional on p, that an individual sampled uniformly at random from the population inherits the selected locus from an A1-type parent and inherits the marker locus from an A2-type parent, divided by the probability, conditional on p, that the sampled individual has genotype A1. If $Math$ denotes the number of nonmutant, recombinant offspring of the jth parent with genotype A1 that inherit the marker locus from an individual with genotype A2, then this ratio can be written as$Math$Exploiting exchangeability once more, as well as the fact that $Math$(1) is conditionally binomial with parameters η1(1) and (rq′/N)(1 − μ2/N), this last expression is equal to$Math$In this case, the limit of the conditional expectation in this expression is just pq. Then, since the probability that the ancestral lineages are affected either by multiple recombination events or by both recombination and mutation in the same generation is of order O(N−2), it follows that the rate at which one of n1 A1-type lineages recombines into the A2 background is$Math$(10)while the rate at which one of n2 A2-type lineages recombines into the A1 background is$Math$(11)

The preceding calculations have identified the asymptotic rates at which coalescence (Equations 5 and 6), mutation (Equations 8 and 9), and recombination (Equations 10 and 11) occur within the two genetic backgrounds defined by the selected locus. Each of these rates depends on the configuration of the ancestral lineages, (n1(t), n2(t)), at time t in the past, as well as on the ancestral allele frequency, p(t), which is governed by the diffusion with the variance and drift coefficients given in (2). Furthermore, because the lineages are exchangeable within backgrounds, these calculations fully determine the structured coalescent process corresponding to the haploid model. For example, when a coalescence event occurs within the A1 background, each pair of lineages in that background is equally likely to coalesce. For subsequent reference, these results are summarized in Table 2.

View this table:
TABLE 2

Transition rates in the structured coalescent process

If both alleles have the same fecundity variance, say V1 = V2V, then, as expected, the structured coalescent process for this model is identical, up to a scalar time change, to the coalescent processes derived by Kaplan et al. (1988) for the Wright–Fisher model and Barton et al. (2004) for the Moran model. [The processes differ by a time change because of the discrepancy between the census population size, N, and the effective population size, N/(1 − 1/R + V), of the model studied in this article.] However, if V1V2, then we see that genetic variation in fecundity variance affects the coalescent process at a linked locus in two ways. First, for fixed p, the rate of coalescence within each background depends specifically on the mean and the variance of the offspring distribution of the allele defining that background; e.g., the instantaneous rate of coalescence of two A1 lineages depends on V1 but not on V2. It follows that genetic variation in fecundity variance does not simply alter the coalescent process at a linked site by a frequency-dependent change of timescale, but rather changes the coalescent effective size of each background individually. On the other hand, because the fluctuations of the background sizes are governed by a diffusion process that depends on the fecundity distributions of both alleles, the overall structure of the genealogy within either background is influenced by the life history traits of both alleles. In particular, directional selection on within-generation fecundity variance will tend to keep the background with higher fecundity variance at low frequencies unless there is compensating selection on mean fecundity.

My next objective is to characterize the effect of fecundity variance on the coalescent process of a randomly mating population in which both the marker locus and the selected locus are diploid and autosomal. Because there are often sex-dependent differences in selection and fecundity (Clutton-Brock 1988; Hedrick 2007; Hammer et al. 2008), I consider a model in which there are two sexes or mating types (“males” and “females”) and let γ ∈ (0, 1) denote the proportion of adults that are female. As in the previous section, suppose that the population contains N adult individuals (and therefore 2N copies of each locus) and that generations are nonoverlapping, with each adult producing a random number of gametes and then dying at the end of each generation. I again assume that recruitment follows the culling-without-replacement model: the next generation is formed by randomly sampling N maternal and N paternal gametes without replacement and then grouping these into N random pairs, each containing one maternal and one paternal gamete. Finally, the sex of each individual is determined by dividing the population into two random subsets, of sizes ⌊Nγ⌋ and N − ⌊Nγ⌋ respectively, comprising the female and male members of the population. (⌊x⌋ denotes the greatest integer less than x.) A more realistic model of sex determination is discussed below. Table 3 provides a summary of the parameters relevant to this model.

View this table:
TABLE 3

Parameters and related expressions in the diploid model

Although p and q still denote the allelic frequencies of A1 and A2 among the adult members of the population, I also need to introduce variables $Math$ to denote the numbers of adults with genotype AiAj and sex Δ ∈ {f, m}. By convention, the heterozygous genotype will always be written as A1A2; e.g., there is no distinction between heterozygotes with maternally or paternally inherited copies of A1, and I simply write $Math$ for the number of heterozygotes with sex Δ. To describe reproduction, suppose that an arbitrary order is assigned to the individuals in each sex × genotype class, and let $Math$(k) denote the total number of gametes produced by the kth individual with genotype AiAj and sex Δ. These random variables are assumed to be identically distributed within each sex × genotype class, as well as independent both within and between such classes.

I make the following assumptions about meiosis, mutation, and recombination. First, I assume that each gamete is the product of a fair and independent meiosis; i.e., each gamete is equally likely to inherit the maternal copy of the parent's selected (or marker) locus as it is to inherit the paternal copy. Second, if a gamete is descended from a parental allele of type A1 (resp. A2), then with probability μ2/2N (resp. μ1/2N), the copy transmitted in the gamete will be mutant, and otherwise the transmitted copy will be identical to that of the parental allele. Third, recombination between the selected locus and the marker locus will occur in each meiosis with probability r/2N. If a recombination occurs, then the selected locus and the marker locus will be inherited from different chromosomes; e.g., the selected locus will be inherited from the parent's paternal chromosome, while the marker locus will be inherited from the parent's maternal chromosome. Otherwise, the gamete will inherit both loci from the same parental chromosome.

#### Diffusion approximation:

To guarantee the existence of a diffusion approximation for the frequency of allele A1, certain conditions must be imposed on the gamete distributions. Here I assume that the means and variances of the gamete distributions can be written in the form$Math$i.e., there can be fixed differences in the variances of different sex × genotype classes, but the differences of the means within each sex must be of order O(N−1). Note, however, that the expected numbers of gametes produced by the two sexes can be very different; i.e., the difference R(m)R(f) need not vanish as N tends to infinity. Second, to ensure that at least N maternal and N paternal gametes are produced in every generation, I assume that the gamete distributions satisfy the following lower bounds:$Math$As was noted in the previous section, both the diffusion approximation and the structured coalescent can be derived under much weaker conditions on the gamete distributions. For example, the inequalities γR(f) > 1 and (1 − γ)R(m) > 1 would suffice. In addition, the model could also be changed so that the sex of each adult is determined independently, with each adult developing as a female with probability γ and as a male with probability 1 − γ. Although these conditions do not guarantee that N maternal or paternal gametes will be produced in every generation, we can still derive the structured coalescent by showing that the probability that too few gametes are produced in any particular generation is exponentially small in N.

One complication presented by the diploid model is that the process of allele frequencies, (p(t): t ≥ 0), is not Markovian when N is finite. Instead, the distribution of p(t + 1) depends not only on p(t), but also on the sizes of the six sex × genotype classes in the previous generation, which in turn depend on p(t − 1). Nonetheless, it can be shown that there is still a diffusion approximation for p(t) when time is measured in units of 2N generations and N tends to infinity. The key observation is that, because mating and sex determination are both random, in large populations the normalized sizes of the sex × genotype classes, $Math$(t) = $Math$(t)/N, will usually be very close to the equilibrium values that depend only on the allele frequency p(t) = p,$Math$(12)i.e., the sex ratio will be close to γ, while the genotype frequencies in the two sexes will be nearly equal and close to Hardy–Weinberg equilibrium. Furthermore, even if by chance the $Math$(t) do deviate from these values by a large amount in one generation, equilibrium will be rapidly restored by random mating and sex determination.

In view of this last property, our model can be said to have two timescales: a slow timescale on which the allele frequency p(t) changes and a fast timescale on which the variables $Math$(t) fluctuate about their equilibrium values. Mathematical techniques for deriving diffusion approximations for Markov processes with two timescales are described in Ethier and Nagylaki (1980). In essence, their theory allows us to identify the infinitesimal drift and variance coefficients of the diffusion approximation by first calculating the expectation and variance of the change in the slow variables as functions of both the slow and fast variables and then substituting the equilibrium values of the fast variables into the resulting expression. In addition, we must show that the fast variables relax sufficiently rapidly to equilibrium from all initial values. See Ethier and Nagylaki (1980) for a detailed analysis of a diploid, two-sex extension of the Wright–Fisher model. A similar analysis can be used to identify the diffusion approximation for the diploid culling-without-replacement model, but here I have omitted the very lengthy calculations required for the convergence proof and simply state the results. However, before doing so, it is useful to define several constants that characterize the sex averages of the demographic parameters:$Math$(13)The infinitesimal drift and variance coefficients of the diffusion approximation for the diploid culling-without-replacement model can then be written as follows:$Math$(14)

As expected, if all three genotypes have the same fecundity variance, then $Math$ and this process reduces to the usual Wright–Fisher diffusion. More surprising is the fact that this is also true whenever the fecundity variance of the heterozygote is exactly intermediate between the fecundity variances of the two homozygotes, $Math$. However, in all other cases, the diffusion approximation defined by (14) has an infinitesimal variance that is quartic in p and therefore differs both from the Wright–Fisher diffusion and from the haploid diffusion shown in (1). In particular, the diffusion approximations for the haploid and diploid models studied in this article will coincide only when both reduce to a Wright–Fisher diffusion. This was first observed by Gillespie (1974), who derived a diffusion approximation similar to (14) for a diploid branching process with genetic control of fecundity variance.

Provided that both mutation rates are positive, the diffusion approximation for the diploid model has a unique stationary distribution with a density on [0, 1]. However, before we give an explicit formula for this density, it is convenient to rewrite the generator of this process in a more compact form,$Math$(15)where the values of the coefficients v1, v2, d1, and d2 are determined by (14). Although v1 is always positive, the coefficient v2 can be either positive or negative subject to the constraint v2 ≥ −4v1, which is always satisfied in this model [see the constants defined above (14)]. Then, if we define θi = 2μi/v1 and α = (|v2|/(4v1 + v2))1/2 and we let C denote a normalizing coefficient, the density of the stationary distribution is given by one of the following three expressions,$Math$(16)depending on whether v2 ∈ (−4v1, 0), v2 = 0, or v2 > 0, respectively.

#### Coalescent process:

To identify the structured coalescent process corresponding to the diploid model, it is again necessary to calculate the coalescent, mutation, and recombination rates of ancestral lineages as functions of the allele frequencies and the numbers of lineages belonging to each genetic background. I continue to assume that the population is in equilibrium, so that the process of ancestral frequencies is governed by the same diffusion as the forward process. In addition, I assume that the population size is so large that we can equate the “fast” variables, $Math$(t), to the equilibrium values shown in (12). As in the derivation of the diffusion, this approximation can be justified using results in Ethier and Nagylaki (1980).

Suppose that p and p′ denote the frequency of the allele A1 in the present and the previous generation, respectively. To calculate the coalescent rate of A1-type lineages, I again need to calculate the probability, conditional on p, that two nonmutant, nonrecombinant chromosomes with genotype A1 at the selected locus are identical by descent at the marker locus. As in the haploid model, this is equal to the probability, conditional on p, that two chromosomes, sampled uniformly at random and without replacement, were carried by nonmutant, nonrecombinant gametes produced by the same A1-type chromosome in some adult alive in the previous generation, divided by the probability, also conditional on p, that two chromosomes sampled at random are of type A1.

One new feature of the diploid model that must be addressed here is the fact that coalescence of two A1 lineages can happen in several ways, depending on the sex × genotype class of the parent and on which chromosome is transmitted (maternal or paternal) if the parent is an A1A1 homozygote. To account for these different possibilities, let $Math$(j) and $Math$(j) denote the numbers of nonmutant, nonrecombinant gametes derived from the maternal or the paternal chromosome, respectively, of the jth individual with genotype A1A1 and sex Δ, and observe that, conditional on the total number of gametes produced by this individual, each of these random variables is binomially distributed with parameters $Math$(j) and $Math$(1 − μ1/2N) (1 − r/2N). Similarly, let $Math$(j) denote the number of nonmutant, nonrecombinant gametes of type A1 produced by the jth individual with genotype A1A2 and sex Δ, and observe that this random variable also is conditionally binomial, with parameters $Math$(j) and $Math$(1 − μ1/2N) (1 − r/2N). In each of these cases, the factor of $Math$ in the binomial probability reflects the assumption that meiosis is fair; i.e., a gamete produced by a homozygote is equally likely to be of paternal or maternal origin, while a gamete produced by a heterozygote is equally likely to carry an A1 or an A2 allele before mutation.

If we let Y(f) and Y(m) denote the total numbers of gametes produced by the female and male members of the previous generation, then we can decompose the coalescent probability into a sum over the six possible sources of the parental A1 chromosome:$Math$(17)For example, the first sum in the expectation counts the number of ways that two lineages can be descended from the maternal copy of the A1 chromosome in a female A1A1 homozygote. This expression is divided by Y(f)(Y(f) − 1), which counts the number of ways that two gametes (of any type) can be sampled from the pool of gametes contributed by female parents, and is multiplied by a factor of $Math$(N − 1)/(2N − 1), which is the probability that the two sampled chromosomes were both contributed by female parents. (Recall that each sex contributes N chromosomes to the next generation irrespective of the sex ratio.)

To calculate the pairwise coalescence rate, I need to evaluate the limit of the product $Math$, where the probability is that shown in (17) and I have multiplied by a factor of 2N to be consistent with the time rescaling leading to the diffusion approximation. First, multiply and divide each term inside the expectation by N2, bringing one factor of 1/N outside of the expectation and leaving one in front of each sum. Then, use the exchangeability of the gamete numbers to replace each sum by the product of the number of terms in the sum and any one of the summands (e.g., the first):$Math$(18)Next, we can use the strong law of large numbers to replace each of the terms (N/Y(f))2 and (N/Y(m))2 by the constants 1/(γR(f))2 and 1/((1 − γ)R(m))2, respectively. Likewise, by appealing to the assumption that the fast variables are close to the equilibrium values shown in (12), we can replace $Math$ = $Math$/N by γp2, $Math$ by γ2pq, $Math$ by (1 − γ)p2, and $Math$ by (1 − γ)2pq. Here we have also exploited the fact that the allele frequencies are approximately constant over a single generation, p′ ≈ p and q′ ≈ q. Combining these observations with the exchangeability of the random variables $Math$(1) and $Math$(1), the expression shown in (18) can be rewritten as$Math$(19)As in the analysis of the haploid model, it can be shown that the limiting values of the conditional expectations in (19) do not depend on p when N tends to infinity, so that it suffices to evaluate the unconditioned expectation. To complete these calculations, observe that if Z is a nonnegative integer-valued random variable with mean m and variance v, and if $Math$ is conditionally binomial with parameters Z and $Math$, then $Math$. Since $Math$(1) is conditionally binomially distributed with parameters $Math$(1) and $Math$(1 − μ1/2N)(1 − r/2N) ≈ $Math$, the asymptotic rate of coalescence of two A1 lineages in the absence of mutation and recombination is$Math$

Finally, since more complicated possibilities such as multiple merger coalescence events or simultaneous mutation and coalescence events can be neglected when N is large, it follows that the pairwise coalescence rate in the A1 background when there are n1 ancestral lineages of that type is$Math$(20)Likewise, if there are n2 ancestral lineages of type A2, then the pairwise coalescence rate in that background is$Math$(21)

The rates at which lineages move between backgrounds can be calculated in a similar manner. Although we still need to consider the sex and genotype of the parent contributing the mutant or recombinant gamete, the derivations are analogous to those described in the preceding section and so I omit the details. If the frequency of A1 is p and if the sample configuration is (n1, n2), then the rates at which lineages mutate out of either the A1 or the A2 background are$Math$(22)respectively, while the rates at which lineages recombine out of these two backgrounds are$Math$(23)Taken together with the diffusion approximation (14), these results determine the structured coalescent process corresponding to the diploid model and are summarized in Table 2, along with the corresponding results for the haploid model.

Comparing the structured coalescent processes for the haploid and diploid models, we see that these processes differ in two ways. First, as we noted earlier, the fluctuations of the background sizes are governed by diffusion processes that differ except in the special case where both processes are Wright–Fisher diffusions. Second, whereas the instantaneous coalescent rates in the haploid model are inversely proportional to the background sizes, the coalescent rates in the diploid model have a more complicated dependence on p that reflects the different diploid genotypic backgrounds that individual lineages can belong to. For example, the coalescent rate of A1-type lineages in the diploid model can be explained by noting that each such lineage will be paired either with another A1 chromosome with probability p, in which case the standardized variance is V11, or with an A2 chromosome with probability q, in which case the standardized variance is V12. Because the type of chromosome that a particular lineage is paired with in a polymorphic population changes frequently under random mating, the coalescent rate is simply the arithmetic mean of the coalescent rates in the two genotypic backgrounds, weighted by the frequencies with which these occur.

#### Degenerate relationships between coalescents and diffusions:

One unexpected property of the diploid culling-without-replacement model is that there are infinitely many combinations of genotype-dependent fecundity distributions that have the same diffusion approximation (14) but distinct coalescent processes. Indeed, because the diffusion approximation is uniquely determined by the coefficients μ1, μ2, v1, v2, d1, and d2 given in (14), it follows that all sets of fecundity distributions that satisfy the following four conditions will have the same diffusion approximation:$Math$(24a)$Math$(24b)$Math$(24c)$Math$(24d)Assuming that the values of R and γ are fixed, (24) imposes four constraints on six unknowns (sij, Vij, i, j = 1, 2). This leaves 2 d.f., one of which can be accounted for by the fact that we can add an arbitrary constant to all three selection coefficients, sij, without changing either the diffusion approximation or the structured coalescent (see Table 2). The remaining degree of freedom is more interesting. Note that although conditions (24a) and (24b) determine the values of V12 and $Math$ uniquely, the entire set of conditions can be satisfied for any $Math$, provided that we also take $Math$ and then choose the selection coefficients so that (24c) and (24d) are satisfied. However, because the coalescent rates in Table 2 depend on V11 and V22 individually, there is a distinct structured coalescent for each set of fecundity parameters satisfying (24) with a different value of V11. As expected, sets having larger values of V11 have faster coalescence in the A1 background and slower coalescence in the A2 background.

It can also be shown that the marginal distribution of sample genealogies varies among the multiple solutions to (24). (By the sample genealogy, I mean the coalescent tree without information about the types of the branches.) For example, this can be verified by calculating the probability that two lineages ancestral to a random sample of size 2 from a stationary population coalesce in a short period of time, dt. Suppose that the frequency of A1 in the population at the time of sampling is p and let q = 1 − p. Then, with probabilities p2, 2pq, and q2, the sample will contain two A1 individuals, an A1 and an A2 individual, or two A2 individuals, respectively. Conditional on these events, Equations 20 and 21 show that the probabilities that the two lineages coalesce during the time interval [0, dt] are$Math$respectively. Consequently, conditioning on p only, the probability of coalescence during this short interval is$Math$Finally, to remove the conditioning on p, we need to average this probability over the stationary distribution shown in (16). Let m11, 2m12, and m22 denote the expected values of the quantities p2, 2pq, and q2 with respect to this distribution, and note that these expectations depend on V11 and V22 only through the diffusion approximation. Then, the unconditioned probability that the two lineages coalesce in time [0, dt] is$Math$(25)Since, in general, m11m22, note that the probability shown in (25) differs among sets of fecundity distributions that have the same diffusion approximation but different values of V11 and V22. This shows that the distribution of sample genealogies is not uniquely determined by the diffusion. Numerical examples illustrating the nonunique relationship between the diffusion approximation and the coalescent process are given in the simulations section.

My objective in this section is to describe the coalescent process at a neutral marker locus that is unlinked to the selected locus. I work with the same model that was studied in the preceding section except that it is assumed that the recombination fraction between the marker locus and the selected locus is $Math$ for all N; i.e., each gamete is as likely to inherit copies of both loci from the same parental chromosome as it is to inherit the selected locus from one parental chromosome and the marker locus from the other. In particular, because the dynamics of the selected locus are unchanged, the drift and variance coefficients shown in (14) still define the diffusion approximation for the frequency of A1.

One consequence of free recombination between the marker locus and the selected locus is that, in a large population, migration of lineages between genetic backgrounds occurs much more rapidly than either coalescence or substantial changes in the allele frequencies, both of which require order N generations to occur. Consider coalescence first. As in the preceding section, the pairwise coalescent rates can be calculated by considering the different ways in which pairs of chromosomes can be identical by descent at the marker locus. For example, to calculate the probability that two A1-type chromosomes are identical by descent at the marker locus and have an A1A1 homozygote as a parent, let $Math$(j) and $Math$(j) denote the numbers of nonmutant gametes that inherit a copy of the marker locus from either the maternal or the paternal chromosome, respectively, of the jth individual with genotype A1A1 and gender Δ. Then, because meiosis is fair, each of these variables is conditionally binomially distributed with parameters $Math$(j) and $Math$(1 − μ2/2N), and so the probability in question is$Math$(26)Furthermore, if there are n1 lineages in background A1, then the total probability (ignoring terms of order N−2) that two of these are identical by descent at the marker locus and have a parent with genotype A1A1 is just $Math$ times the expression on the right-hand side of (26). The probabilities of identity by descent for the other cases can be calculated in a similar fashion and are summarized in Table 4. The only novel feature encountered in these calculations is that the probability that coalescence and recombination occur in the same generation is nonnegligible, so that coalescence can occur between lineages that belong to different backgrounds.

View this table:
TABLE 4

Probabilities of identity-by-descent at an unlinked locus

Next consider changes in the sample configuration (n1, n2) that involve recombination or mutation but not coalescence. If $Math$(j) denotes the number of recombinant, nonmutant A1-type gametes produced by the jth individual with genotype A1A2 and gender Δ, then $Math$(j) is conditionally binomially distributed with parameters $Math$(j) and $Math$(1 − μ2/2N), and the probability that an A1 lineage recombines into the A2 background over one generation is$Math$Similarly, the probability that an A2 lineage recombines into the A1 background is p + O(N−1). Because different gametes recombine independently, it follows that the transition probabilities for the sample configuration are$Math$(27)i.e., the sample configuration in each generation is approximately drawn independently from the binomial distribution with parameters n = n1 + n2 and p. Furthermore, because the mutation rates are assumed to be of order O(N−1), the expression shown in (27) is also the transition probability for changes in the sample configuration caused either by recombination or by mutation; i.e., we can absorb the additional probability due to changes involving mutation into the O(N−1) remainder.

Because the sample configuration changes much more rapidly than the number of lineages, the limiting form of the genealogical process at an unlinked marker locus in an infinite population will not be that of a structured coalescent. In fact, it is evident that the processes$Math$do not even have a limit as N tends to infinity. On the other hand, if we let n(t) = n1(t) + n2(t) denote the total number of ancestral lineages extant at time t, then it can be shown that the processes$Math$do converge to a limit, Y(t) = (p(t), n(t)), by exploiting the separation-of-timescales results that were discussed in the previous section. This relies on two observations: (i) the fast recombination events leave the variables p(t) and n(t) unchanged; and (ii) when N is large, the distribution of the fast variables rapidly approaches an equilibrium that is the binomial distribution with parameters n and p. Consequently, the theory presented in Ethier and Nagylaki (1980) allows us to calculate the coalescent rates of the limiting process by averaging the sum of the configuration-dependent rates shown in Table 4, all of which involve events in which n decreases by one, over the stationary distribution of the sample configuration:$Math$In contrast, changes in the allele frequencies do not depend on the sample configuration and so the evolution of p(t) in the infinite population limit is the diffusion process corresponding to the generator G defined in (15). Putting these results together, it follows that Y(t) is a Markov process with generator$Math$(28)Furthermore, because the lineages at the unlinked marker locus are exchangeable in the infinite population limit, the full coalescent process can be constructed from Y(t) by stipulating that a randomly chosen pair of lineages coalesces whenever n(t) decreases by one.

Equation 28 shows that the coalescent rate at the unlinked marker locus is just the average of the coalescent rates in each of the three genotypic classes, weighted by the Hardy–Weinberg frequencies p2, 2pq, and q2. (The constant term, $Math$ can be partitioned among the three classes according to their frequencies.) Of course, if all three genotypes have the same fecundity variance, then the coalescent rate in (28) is equal to a constant times $Math$. In this case, the processes p(t) and n(t) decouple, and the coalescent process at the unlinked marker locus is just a scalar time change of Kingman's coalescent. However, if the fecundity variance is genotype dependent, then the coalescent process at an unlinked marker locus is instead a stochastic time change of Kingman's coalescent, and the time change depends on the ancestral frequency process at the selected locus. Thus, segregation of alleles affecting fecundity variance will have a genomewide impact on genealogies.

## SIMULATIONS

#### Directional selection:

The structured coalescent can be used to study the statistical properties of sample genealogies in two ways. One approach, introduced in Kaplan et al. (1988) and greatly elaborated by Barton and Etheridge (2004), relies on numerically solving the Kolmogorov backward equation satisfied by expectations of functionals of the process (e.g., the mean time to the most recent common ancestor). However, because these equations are singular, numerical solutions are exceptionally sensitive to round-off error, particularly when the sample size is larger than 5. The second approach, which largely circumvents the problem posed by the singularities but requires lengthier computations, is to characterize the distribution of sample genealogies using Monte Carlo simulations of the structured coalescent. This is the approach taken here, using the results derived in the previous three sections. Details of the implementation are given in the appendix, and the code used to run these simulations is available from the author on request.

Let us consider the haploid model first, using the diffusion approximation defined by Equation 1 and the coalescent and substitution rates given in Table 2. Figure 1 shows how the distributions of the time to the most recent common ancestor (TMRCA) and the total length of the genealogy depend on the fecundity variance of allele A1 when the offspring distribution of A2 is held constant. The sample size was n = 50, and the remaining parameter values were R = 2, V2 = 1, μ1 = μ2 = 0.02, and r = 0 (thus the marker locus and the selected locus are completely linked) while the selection coefficients were chosen according to two different scenarios. The results shown in Figure 1, A and B, were obtained from simulations in which both selection coefficients, s1 and s2, were set to zero. In this case, selection still acts on fecundity variance [see the drift coefficient in (1)], and Figure 1 shows that the expected values of the TMRCA and tree length distributions are unimodal functions that sharply decrease as V1 increases from 0 to V2 = 1 and then slowly increase as V1 increases above 2. For example, the mean coalescent time decreases from 3.56 to 1.31 as V1 increases from 0 to 1, reaches a minimum of 1.16 when V1 = 2, and then increases to 1.23 when V1 = 5. Two factors are responsible for this pattern. On the one hand, as V1 increases, so does the rate of coalescence within the A1 background, which tends to reduce the depth and length of sample genealogies. On the other hand, because there is selection against the allele with higher fecundity variance, the statistical properties of the sample genealogy are disproportionately influenced by the demographic traits of the allele having the lower fecundity variance. The fact that the mean tree depth and tree length increase when V1 is sufficiently greater than V2 indicates that the selective advantage of A2 ultimately has a stronger impact on the genealogy than does the increased rate of coalescence within the A1 background. For comparison, in Figure 1, C and D, I have also plotted results obtained from simulations in which the selection coefficients have been chosen to exactly compensate for the selective disadvantage of the allele with higher fecundity variance (i.e., s1 = V1V2, s2 = 0). In this case, there is no overall selection against either allele, and so the expected values of the TMRCA and tree length are smaller than the corresponding values in Figure 1, A and B, and decrease monotonically with V1.

Figure 1.—

The effect of fecundity variance polymorphism in a haploid population on the distribution of the TMRCA and total length of the genealogy of a sample of size 50. Solid curves show the mean, while the dotted and dashed line curves show the 5, 25, 50, and 95% quantiles. (A and B) Selected scenario. (C and D) Neutral scenario (see text). Estimates are based on 100,000 simulations per data point. R = 2, V2 = 1, μ1 = μ2 = 0.02, r = 0.

The structured coalescent can also be used to efficiently simulate genealogies conditional on the genetic composition of the sample. For example, Figure 2 shows how the mean time to the most recent common ancestor (MTMRCA) of either the entire sample or of just the A1- or A2-type individuals in the sample varies as a function of the number of copies of A1. These results are based on simulations conducted using the same parameter values as in Figure 1, for values of V1 = 0, 0.5, 1, or 2, and under the assumption that selection acts only on fecundity variance (s1 = s2 = 0). Three observations are notable. First, the presence of even a minority of individuals carrying the allele with lower fecundity variance can substantially inflate the expected depth of the entire tree. For example, if there are n1 = 10 A1-type alleles in a sample of size 50, the mean TMRCA increases from 1.56 when V1 = 1 to 2.80 when V1 = 0. Second, the subsamples containing only the A1- or A2-type individuals tend to reach their most recent common ancestors at very different rates: thus, when n1 = n2 = 25, the expected time to the most recent common ancestor of the A1-type individuals is 2.56 but only 0.67 for the A2-type individuals. Third, the expected time to the most recent common ancestor of the A2-type individuals in the sample is only weakly influenced by the value of V1 when V2 is held constant. Indeed, unless the sample size is large or either the mutation rate or the recombination rate is of comparable magnitude to the coalescent rate, most subsamples of genetically homogeneous individuals coalesce to a common ancestor without mutating into the other background.

Figure 2.—

The effect of fecundity variance polymorphism in a haploid population on the mean TMRCA of either the entire sample or of just the A1- or A2-type individuals conditional on the number of A1 genes in the sample. Curves showing the conditional mean TMRCA for the A1- or A2-type individuals are increasing or decreasing, respectively. Parameters are as in Figure 1, A and B (selected scenario).

Simulations of the structured coalescent were also used to examine how fecundity variance polymorphism influences genealogies in a diploid population. Recall that the diffusion approximation for this model is defined by Equation 14, while the transition rates for the structured coalescent can be found in Table 2. Figure 3 shows how the distributions of the TMRCA and the total length of the genealogy of a sample of size 50 vary as functions of V11 when the fecundity variances of the three diploid genotypes are chosen so that the sex-averaged quantities $Math$ and V12 are both equal to 1. Once again, the marker locus and the selected locus were taken to be completely linked, while the sex-averaged selection coefficients, s11, s12, and s22, were either set to zero (selected scenario: Figure 3, A and B) or chosen so that there is no overall selection on either allele (neutral scenario: Figure 3, C and D). I have also assumed that the sex ratio is 1:1 (γ = 0.5). Consider the selected scenario first. Figure 3, A and B, shows that both the expected values and the quantiles of the TMRCA and the tree length distributions of the genealogies are convex functions of V11 that are minimized when all three fecundity variances are equal: V11 = V12 = V22 = 1. As in the haploid model, selection for genotypes with lower fecundity variance is partly responsible for this pattern. For example, in the selected scenario, A2A2 homozygotes will generally be overrepresented in the population whenever V11 > V12 = 1 > V22, which will reduce the overall rate of coalescence. However, the results plotted in Figure 3, C and D, show that selection alone does not explain this pattern, since the expected values of the TMRCA and the total tree length under the neutral scenario also are unimodal functions of V11, albeit with less curvature than in the selected scenario. An alternative explanation is provided by the inequality between the arithmetic and the geometric mean. This is most evident in the weak mutation limit, $Math$, in which case the population is equally likely to be fixed for A1 or A2 alleles. For simplicity, assume that R = 1. Since the mean TMRCA conditional on fixation of A1 or A2 is proportional to V11−1 or V22−1, respectively, it follows that the mean TMRCA (unconditional on the sample configuration) is proportional to$Math$with equality holding only when $Math$.

Figure 3.—

The effect of fecundity variance polymorphism in a diploid population on the distribution of the TMRCA and total length of the genealogy of a sample of size 50. Solid curves show the mean, while the enclosing curves show the 5, 25, 50, and 95% quantiles. (A and B) Selected scenario. (C and D) Neutral scenario (see text). Estimates are based on 100,000 simulations per data point. γ = 0.5, R = 2, $Formula$, μ1 = μ2 = 0.02, r = 0.

The results shown in Figure 3, C and D, also provide an example of the degeneracy of the relationship between genealogies and diffusions that was discussed in the last section. Indeed, whereas the diffusion approximation for the neutral scenario does not change as V11 is increased from 0 to 2 with $Math$ held constant, the simulations show that the distribution of sample genealogies depends very strongly on this parameter. For example, one striking feature of these results is that the lower and upper quantiles of the TMRCA and the tree length distributions actually diverge as V11 deviates from V12; i.e., the variance of these statistics is an increasing function of the difference |V11V22|. On its own, this observation is not surprising: in the absence of selective differences between genotypes, the variance in the TMRCA would be expected to be elevated in a population that contains multiple alleles that have different fecundity variances. What is surprising is the fact that the variances of these statistics can differ so greatly under conditions that leave the fluctuations of allele frequencies at the “selected” locus unchanged. It is also noteworthy that although the diffusion approximation for the neutral scenario shown in Figure 3, C and D, is just the neutral Wright–Fisher diffusion corresponding to the generator$Math$the corresponding genealogical processes are not described by Kingman's coalescent except in the case V11 = V12 = V22. This can be deduced from the fact that the mean-to-quantile ratios of the statistics plotted in Figure 3 are not invariant as functions of V11.

#### Balancing selection:

Previous studies (Hudson and Kaplan 1988; Slatkin 2000; Takahata 2000; Navarro and Barton 2002) have shown that balancing selection can increase the mean time required for a sample of genes to coalesce to a common ancestor, particularly if mutation rates are small. This is because balancing selection can maintain multiple genetic backgrounds at high frequencies, which increases the likelihood that a sample of individuals is polymorphic and decreases the rates at which lineages belonging to different backgrounds move between these backgrounds. Although various processes can give rise to frequency-dependent selection, one context in which balancing selection can occur at a diploid locus is when the heterozygote is more fit than either homozygote. In fact, in the diploid model considered in this article, heterozygote advantage can arise in two different ways: either because the heterozygote has higher mean fecundity than the two homozygotes (s12 > s11, s22) or because it has lower fecundity variance (V12 < V11, V22). To illustrate the effects of differences in fecundity variance on the genealogy at a locus subject to balancing selection, Figure 4 shows how the distribution of the TMRCA of a sample of size 50 varies with the fecundity variance of the heterozygote, V12, when the fecundity variances of the two homozygotes have been set to 1. Results are shown for three sets of simulations in which the selection coefficients of the homozygous genotypes, s11 and s22, have been set to zero, while the selection coefficient of the heterozygote, s12, has been set to 0, 5, or 10. Note that the marker and the selected locus are completely linked in these examples.

Figure 4.—

The effect of fecundity variance polymorphism and balancing selection on the distribution of the TMRCA of a sample of size 50. Solid curves show the mean, while the enclosing curves show the 5, 25, 50, and 95% quantiles. Estimates are based on 100,000 simulations per data point. γ = 0.5, R = 2, $Formula$, s11 = s22 = 0, μ1 = μ2 = 0.02, r = 0.

As expected, Figure 4 shows that in all three cases, the TMRCA for the sample decreases as the fecundity variance of the heterozygote increases. This effect is most pronounced as V12 increases from 0 to V11 = V22 = 1 and when the selection coefficient s12 is large. In fact, increasing the value of V12 has three distinct consequences, each of which individually tends to reduce the depth of genealogies. First, the coalescent rate in either allelic background is an increasing function of V12 (Table 2). Second, as V12 increases, the selective advantage of the heterozygote is reduced by selection for lower fecundity variance. This, in turn, reduces the strength of balancing selection on the locus, making it less likely that the sample includes lineages from both genetic backgrounds. In fact, when s12 = 0, the locus is actually subject to disruptive selection whenever V12 > 1, which reduces polymorphism below neutral levels. Third, increasing the value of V12 also increases the magnitude of genetic drift in the population, which in turn causes the genetic backgrounds to fluctuate more strongly away from the frequencies maintained by balancing selection. This too reduces polymorphism at the selected locus. Interactions between these effects may also explain why the coalescent time is more sensitive to the fecundity variance of the heterozygote when V12 is small and s12 is large. Thus, when balancing selection is strong and both backgrounds are maintained at high frequencies, the coalescent rates in both backgrounds have a strong dependence on V12. However, as the strength of balancing selection decreases in relation to genetic drift, either because the selection coefficient s12 is smaller or because the selective advantage of the heterozygote is offset by its higher fecundity variance, the heterozygosity of the population will also decrease and one of the two alleles will typically be much more common than the other. In this case, the rate of coalescence within the more abundant genetic background will be much less sensitive to the fecundity variance of the heterozygote, so that further increases in V12 will have a much smaller impact on the distribution of genealogies.

Figure 5 shows how fecundity variance polymorphism and balancing selection affect the mean TMRCA when there is partial linkage between the marker locus and the selected locus. Results are shown for three sets of simulations that differ only in the selection coefficient and fecundity variance of the heterozygote. For comparison, a horizontal line is also plotted showing the mean coalescent time at the marker locus when there are no differences in the selection coefficients or fecundity variances of the three diploid genotypes (s11 = s12 = s22 = 0, V11 = V12 = V22 = 1). As expected, in all three cases, the coalescent time at the marker locus decreases as the recombination rate between the marker locus and the selected locus increases. While this effect is very weak when only the fecundity variance of the heterozygote differs from that of the two homozygotes (Figure 5, thick line: s12 = 0, V12 = 0), the depths of genealogies at sites that are tightly linked to the selected locus are substantially inflated under the combined influence of heterozygote advantage and reduced fecundity variance (Figure 5, dashed line: s12 = 5, V12 = 0). A similar pattern is seen when heterozygote advantage is combined with increased fecundity variance (Figure 5, dotted line: s12 = 20, V12 = 2), although in this case the coalescent times at sites that are loosely linked to the selected locus are actually smaller than would be expected in a population fixed for either allele. Because lineages at such sites can rapidly recombine between the allelic backgrounds, the principle effect of strong balancing selection is to increase the likelihood that lineages will coalesce within heterozygotes, which themselves have elevated coalescent rates.

Figure 5.—

The effect of fecundity variance polymorphism and balancing selection on the mean TMRCA of a sample of size 10 at a linked marker locus. Dashed curve, V12 = 0, s12 = 5; solid curve, V12 = 0, s12 = 0; dotted curve, V12 = 2, s12 = 20; horizontal line, V12 = 1, s12 = 0 (no fecundity variance polymorphism). Estimates are based on 100,000 simulations per data point. γ = 0.5, R = 2, $Formula$, s11 = s22 = 0, μ1 = μ2 = 0.02.

One notable feature of Figure 5 is that, as the recombination rate increases, the coalescent times do not approach the “neutral” value represented by the horizontal line. This effect is consistent with the results derived in the previous section, which show that the coalescent process at an unlinked site is a stochastic time change of Kingman's coalescent depending on the frequency dynamics at the selected locus (28). In particular, this means that selection acting on fecundity variance polymorphism will have a genomewide impact on genealogies and variation. This is illustrated in Figure 6, which shows how the mean TMRCA for a sample of size 10 at an unlinked locus varies with the selection coefficient of the heterozygote. These estimates were obtained by simulating the unstructured coalescent process defined by Equation 28, with the scaled fecundity variance (V12) of the heterozygote equal to 0, 0.5, 1, 2, or 3, when V11 = V22 = 1. As expected, the selective advantage of the heterozygote has no impact on the coalescent time at an unlinked locus when the heterozygote has the same fecundity variance as the homozygotes (Figure 6, horizontal line: V12 = 1). However, in the other four cases, Figure 6 shows that the mean TMRCA can either increase or decrease with the value of s12 depending on whether the fecundity variance of the heterozygote is less than or greater than that of the two homozygotes. Figure 6 also shows that the effect of increasing s12 is saturating; i.e., the expected coalescent time at the unlinked locus approaches a finite, positive limit as s12 tends to infinity. Indeed, in the limit of very large s12, the allele frequency dynamics at the selected locus are frozen at the fixed value p = $Math$, while the genealogical process at the unlinked marker locus tends to Kingman's coalescent run at a constant rate$Math$In this case, the mean TMRCA for a sample of size 10 is simply 2(1 − $Math$)/J, which agrees well with the times estimated from simulations with s12 = 40. (Note, however, that different limits for the coalescent rate can be obtained if all three selection coefficients are increased so that p tends to a different fixed point.) In contrast, when s12 is negative, the coalescent times at the unlinked locus are only weakly dependent on V12. Indeed, with heterozygote disadvantage, disruptive selection acts on the locus segregating the fecundity variance polymorphism, so that the frequency of A1 is usually concentrated near p = 0 or p = 1 and only rarely makes rapid transitions between these states. In this case, there are few heterozygotes in the population and thus the fecundity variance of this genotype has little impact on the coalescent rate at any site in the genome.

Figure 6.—

The effect of fecundity variance polymorphism and balancing selection on the mean TMRCA of a sample of size 10 at an unlinked marker locus. The horizontal line shows the case with no fecundity variance polymorphism. Estimates are based on 100,000 simulations per data point. γ = 0.5, R = 2, $Formula$, s11 = s22 = 0, μ1 = μ2 = 0.02.

## DISCUSSION

The results derived in this article show that structured coalescent processes in populations that are genetically polymorphic for within-generation fecundity variance differ from the coalescent in a genetically homogeneous population in two ways. One difference is that the diffusion process governing the background frequencies is, in general, not the familiar Wright–Fisher diffusion (Gillespie 1974, 1975). Not only is there selection in favor of reduced fecundity variance, but also, because the infinitesimal variance, v(p), is typically a cubic or quartic function of p, it follows that the variance effective population size, Np(1 − p)/v(p), is frequency dependent. [Recall that the variance effective population size is the size of an ideal population governed by the Wright–Fisher model with the same variance in neutral allele frequency fluctuations as the nonidealized population (Ewens 1982).] The second difference is that the coalescent rates within the different allelic backgrounds are not jointly inversely proportional to the variance effective population size. Instead, the rate at which lineages coalesce within each background depends only on the demographic traits of genotypes containing that allele. Thus, in a haploid population, the coalescent rate within the A1 background depends only on the fecundity variance of the A1 allele, while in a diploid population, the coalescent rate within this background depends only on the fecundity variances of the A1A1 homozygote and the A1A2 heterozygote.

These observations shed some light on a paradox noted by Gillespie (1974), which concerns the interpretation of effective population size in models with fecundity variance polymorphism. Referring back to the diffusion approximations (1) and (14), note that the variance effective population size depends disproportionately on the rarer allele. For example, in the haploid model, the variance effective population size is N/(1 − 1/R + pV2 + (1 − p)V1), which tends to N/(1 − 1/R + V2) as p approaches 1 and N/(1 − 1/R + V1) as p approaches 0. In other words, when an allele is lost from the population, the variance effective population size depends only on the absent allele. Although this observation may seem counterintuitive at first, two points should be kept in mind. One is that the coefficient of variation of the total number of offspring produced by individuals with a given genotype will be proportionately smaller when there are many such individuals than when there are few. This explains the qualitative relationship between the infinitesimal variance of the diffusion approximation and the fecundity variances of the two alleles. The second point is that the variance effective population size is only a relative measure of the speed of the fluctuations of the allele frequencies. When the population is monomorphic, there are no such fluctuations and the variance effective population size is uninformative about the population.

The effect of fecundity variance polymorphism on coalescent rates is more straightforward. The form that the structured coalescent takes suggests that we can associate with each allelic background a coalescent effective population size (Nordborg and Krone 2002; Sjodin et al. 2005) that depends on the frequency of that background and on the demographic traits of genotypes that contain that allele. Such a definition is justified by the fact that the coalescent process within each background is a stochastic time change of Kingman's coalescent, augmented by the possibility of migration of lineages between backgrounds. For example, in the haploid model, the coalescent effective population sizes of the A1 and A2 backgrounds could be defined to be Np/(1 − 1/R + V1) and Nq/(1 − 1/R + V2). Like the variance effective population size, these quantities are frequency dependent and fluctuate in time, but unlike the variance effective population size, the coalescent effective population sizes are meaningful even when a single allele is fixed in the population. In particular, the coalescent effective population size of an allele that is fixed in the population depends only on the demographic traits of that allele.

The disparity between the variance and the coalescent effective population sizes is also behind the most surprising result in this article: namely, that, in diploid populations, there are infinitely many combinations of fecundity distributions that have the same diffusion approximation but different coalescent processes. The formal reason that this correspondence is many-to-one is that the infinitesimal variance of the diffusion process is less informative about the fecundity variances of the three diploid genotypes than are the two background-specific coalescent rates in the structured coalescent (compare Equation 14 with Equations 20 and 21). Note that this is not the case in haploid models, where the fecundity variance of each allele can be individually deduced from either the infinitesimal variance coefficient or the two coalescent rates. It should also be noted that the degenerate relationship between ancestral processes and allele frequency dynamics in diploid populations is probably valid only in the diffusion limit. Indeed, information about the fecundity distributions is lost in passing to this limit, which would probably discriminate between finite population models that share the same diffusion approximation (see, for example, Proulx 2000). Nonetheless, it is still remarkable that the diffusion approximation itself does not uniquely determine the corresponding coalescent process, particularly in those cases where the diffusion process is related to a particular coalescent process via moment duality (Krone and Neuhauser 1997).

Surprisingly, it can be shown that the degenerate relationship between coalescents and diffusions remains even when the substitution process to the common ancestor is known. Using the approach described in Taylor (2007), the common ancestor processes for the models studied here can be represented as bivariate Markov processes that record the type of the common ancestor and the genetic composition of the population. In a two-allele model, this process can be denoted (z(t), p(t)), where p(t) specifies the frequency of A1 and z(t) = i if the genotype of the common ancestor is Ai. Suppose that the generator of the diffusion approximation for the model is$Math$where v(p) and m(p) are the infinitesimal variance and drift coefficients, respectively, and assume that the coalescence, mutation, and recombination rates for the structured coalescent process are those shown in Table 2. Then, the generator of the common ancestor process for this model is given by the expression$Math$(29)where the last term of each equation specifies the substitution rate to the common ancestor. The function h(p) that appears in these terms is the conditional probability that the common ancestor has genotype A1 given that the frequency of this allele is p and is the unique solution of the equation$Math$(30)subject to the conditions h(0) = 0 and h(1) = 1. Although a closed-form expression for h(p) is known only in a few special cases (Taylor 2007), Equation 30 can be solved numerically and the results used to study the influence of fecundity variance differences on substitution rates.

Note that, because the common ancestor process given in (29) depends only on the diffusion process and the mutation rates, it follows that there are infinitely many diploid models with distinct coalescent processes that nevertheless have the same substitution process along the common ancestor lineage. This observation has the following consequence. Because divergence between distantly related species is largely determined by substitutions to the common ancestor lineage in each population, these results suggest that even combined data on divergence and polymorphism at a site segregating alleles affecting fecundity are not always sufficient to distinguish between models that have different fecundity distributions, at least if diffusion-based analyses are employed. Indeed, as can be seen in Figure 3, even neutral patterns of divergence and polymorphism at the selected locus can be consistent with scenarios in which the segregating alleles change both the mean and the variance of the fecundity distribution. On the other hand, the fact that these scenarios have different coalescent processes suggests that inference could be done using multilocus sequence data, provided that the marker loci are neutral.

It is interesting to compare these observations with a recent result found by Myers et al. (2008). These authors show that the demographic history of a neutrally evolving population with variable population size cannot be uniquely reconstructed from its frequency spectrum. In fact, given any particular demographic history and the corresponding frequency spectrum, they show that there are infinitely many distinct demographic histories that share the same frequency spectrum. Similarly, different combinations of fecundity distributions that have the same diffusion approximation will necessarily share the same frequency spectrum. The difference between these two cases is that, at least in the diffusion limits, there is a one-to-one correspondence between the allele frequency dynamics and the genealogical structure of a population when variation in the effective population size is independent of genotype, but not when the effective population size fluctuates because of changes in the genetic composition of the population.

#### Scope and applications:

Having described some of the genealogical consequences of genetic polymorphism for fecundity variance, an important question concerns the relevance of these results to real populations. Recall that when we derived the diffusion approximations studied in this article, we made the assumption that the fecundity variances of the different genotypes were independent of population size, but that the differences in the mean fecundity were of order N−1. This choice of scalings was motivated by two related concerns. First, it guarantees that selection and genetic drift operate on comparable timescales, so that the diffusion approximations are proper weak limits for the corresponding sequences of finite population models. Second, it also reflects the fact that while the intensity of selection on mean fecundity is independent of population size, the intensity of selection on within-generation fecundity variance is instead inversely proportional to population size. However, as several authors have noted (Seger and Brockman 1987; Hopper et al. 2003), this second observation also suggests that the models introduced by Gillespie (1974) are unlikely to explain patterns of within-generation fecundity variance except possibly in very small populations. The difficulty is that selection for reduced fecundity variance is likely to drive a new mutation to fixation only if that mutation has a very small impact on mean fecundity.

Although these conditions seem exceptionally restrictive, there are several scenarios in which we might expect genetic polymorphism for fecundity variance to have a detectable impact on the genealogical relationships in a population. One such scenario is when two or more alleles differing in fecundity variance are maintained in the population by balancing selection. In this case, the genealogies of population samples would still be influenced by differences in fecundity variance even though the intensity of selection acting directly on this particular trait may be negligible relative to the intensity of selection maintaining the polymorphism. In the example that was studied in the last section (see Figures 4 and 5), balancing selection was a consequence of overdominance in mean fecundity: we assumed that the heterozygote had higher mean fecundity than either homozygote. The canonical example of overdominance in human populations is provided by the sickle cell mutation segregating at the hemoglobin locus (Currat et al. 2002). Although selection acts directly on survival in this case, fecundity variance polymorphism would be expected to result from differences in survival prior to and during reproductive maturity. Alternatively, balanced polymorphisms involving alleles that affect fecundity variance can be maintained by mechanisms such as assortative mating or frequency-dependent selection (Sinervo and Lively 1996; Gigord et al. 2001; Sinervo et al. 2007). For example, in their study of a balanced polymorphism maintained by frequency-dependent selection in the side-blotched lizard Uta stansburiana, Calsbeek et al. (2002) found that there is a significant difference in the fecundity variances of alternative male mating strategies. As was shown in the last section, these differences will have a genomewide impact on genetic variation.

A second scenario in which fecundity variance polymorphism may be important, at least transiently, is when a new mutation affecting both the mean and the variance of the offspring distribution is fixed in a population. For example, suppose that the mean fecundity of the mutant and wild-type alleles differs by an amount that is large relative to 1/N, but small relative to R. If the mean fecundity of the mutant is less than that of the wild-type allele, then it will usually be quickly lost from the population. On the other hand, if the mutant has higher mean fecundity, then it may be rapidly fixed in the population, resulting in a selective sweep. While the fecundity variances of the original and mutant alleles might play only a minor role in determining whether the new allele sweeps to fixation, differences in these variances will alter the genealogical signature of any sweep that does occur. In this case, the structured coalescent can be used to study these effects by modifying the diffusion process governing the frequencies of the genetic backgrounds to reflect the occurrence of the selective sweep (Kaplan et al. 1989). This can be done in two steps (Griffiths 2003; Coop and Griffiths 2004). First, the diffusion process must be conditioned so that frequency of the mutant allele (which we take to be A1) rises from 0 to 1. Then the conditioned process (which is also a diffusion) must be time reversed with respect to its speed measure.

Apart from these changes to the diffusion governing the frequencies of the genetic backgrounds, the coalescence, mutation, and recombination rates of the structured coalescent are the same as those summarized in Table 2. Thus, as with the equilibrium models studied in this article, genetic polymorphism in fecundity variance at the site of a selective sweep will affect the genealogy at a linked neutral locus both through the dynamics of the genetic backgrounds and through the coalescence rates within these backgrounds. Note, however, that if the mutation is very strongly selected, then the sample path of the sweep will be only very weakly affected by the fecundity variances of the relevant genotypes. In this case, the principal effect of fecundity variance polymorphism will be to change the rates at which coalescence occurs in the two backgrounds. However, because these rates will influence the number of lineages that escape the selective sweep via recombination, as well as the genealogy of the recombinant lineages below the sweep, the genealogical consequences of a change in fecundity variance might be evident in the pattern of polymorphism at linked sites.

One intriguing feature of selective sweeps by mutations that alter fecundity variance is that these will have a genomewide effect on genealogies and polymorphism. As was shown earlier, the genealogy at a neutrally evolving unlinked site is given by a stochastic time change of Kingman's coalescent, in which the time change depends the frequencies of the backgrounds at the selected locus. Again, the only change that needs to be made to the equilibrium theory is to modify the diffusion to reflect the occurrence of the sweep. In particular, if the new mutation is very strongly selected, then the coalescent at an unlinked site could be approximated by a time change of Kingman's coalescent with an instantaneous change in the rate of coalescence that occurs at the time of the sweep. However, apart from the effects of the sweep on linked variation, it seems unlikely that population genetic data could be used to distinguish such an event from a rapid change in the census population size.

The preceding discussion assumes that the diffusion processes described by (1) and (14) are good approximations to the frequency dynamics in the finite population models. However, if the segregating alleles cause large differences in mean fecundity, then the expected changes in allele frequencies from generation to generation may be rather poorly described by the infinitesimal drift coefficients of these approximations. In particular, Proulx (2000) has shown that when there is strong selection on mean fecundity, then selection on fecundity variance is frequency dependent, even in a haploid population. While a formal diffusion limit does not exist in this case, approximate drift and variance coefficients can still be calculated (see Equations 5 and 9 in Proulx 2000), and simulations suggest that a diffusion process parameterized with these coefficients is a more accurate approximation to the finite population models than the weak selection limits derived by Gillespie (1975). This approach could also be applied to the selective sweep scenario described above, although simulations would be needed to justify the approximation in the absence of convergence of the underlying processes.

Several recent studies have shown that population subdivision can increase the strength of selection acting directly on fecundity variance, especially in populations in which the migration rate is very low or in which density regulation occurs before dispersal (Proulx 2000; Shpak 2005; Lehmann and Balloux 2007; Shpak and Proulx 2007). In fact, if either of these conditions is satisfied, then the intensity of selection for reduced fecundity variance is inversely proportional to the size of individual demes rather than to the size of the entire metapopulation. Because many species exist as metapopulations, these results suggest that adaptive evolution of fecundity variance may be more common than has been generally believed. For example, Sarhan and Kokko (2007) propose that population subdivision has facilitated the evolution of multiple mating by female Glanville Fritillary butterflies as a strategy for reducing the risk of mating with incompatible males. Although I have focused on panmictic models in this article, the approach taken here could also be applied to island models of the type studied by Shpak and Proulx (2007). Provided that the number of individuals living within a single deme is much smaller than the number of demes, there is a separation of timescales in these models between events occurring within demes and events involving the entire metapopulation (Roze and Rousset 2003; Wakeley and Takahashi 2004). This has two important consequences. First, it can be exploited to derive a diffusion approximation for the global frequency of an allele—this is the approach taken by Shpak and Proulx (2007), for example. Second, the fact that lineages disperse between demes much more rapidly than they coalesce means that, on the coalescent timescale, the island model effectively behaves like a panmictic population. In particular, several authors have shown that the genealogy of a random sample of individuals from a neutral island model can be described by a scalar time change of Kingman's coalescent (Wakeley and Aliacar 2001; Nordborg and Krone 2002; Taylor and Véber 2009), while Slade and Wakeley (2005) have derived an ancestral selection graph for the infinitely-many-demes limit of the island model with selection. Although rigorous results are lacking at present, it should also be possible to derive coalescent processes for island models with selection and fecundity variance polymorphism by exploiting the existence of a diffusion approximation for the global allele frequencies.

An important question that deserves further investigation concerns the extent to which the results described in this article (e.g., genomewide correlations of genealogies) might apply to other models in which the variance effective population size is frequency dependent. One scenario in which we might expect many of the properties of the coalescent processes studied here to still hold is when there is heritable variation in other traits that influence the coalescent effective population size. For example, sex ratio is one such trait (Nunney 1993, 1996; Engen et al. 2007), and indeed heritable variation in sex ratio due to meiotic drive and cytoplasmic incompatibility has been documented in a number of species (Werren and Beukboom 1998; Jaenike 2001). Other life history traits, such as survival and fecundity schedules, also have this property, and Shpak (2007) has recently derived diffusion approximations for models that allow these traits to vary in age-structured populations. Dispersal polymorphisms provide a third example, since the effective population size of the infinitely-many-demes limit for the island model depends on the migration rate (Wakeley 2003; Wakeley and Takahashi 2004). While explicit studies of these models are needed, the fact that these traits are determinants of the coalescent effective population size suggests that the corresponding coalescent processes will have properties similar to those found in populations with fecundity variance polymorphism.

## APPENDIX

A simulation of the structured coalescent involves two steps: choice of the initial condition of the process, followed by simulation of the sample path until all lineages have coalesced to their most recent common ancestor. How the initial condition is chosen depends on whether or not we know the genetic composition of the sample. To simulate a genealogy for a random sample of size n, we can first sample the frequency p from the stationary distribution (2) or (16) of the relevant diffusion process and then sample n1 from a binomial distribution with parameters n and p. Alternatively, to simulate the genealogy of a sample containing n1 individuals of type A1 and n2 individuals of type A2, we need to sample the initial frequency p from the distribution, $Math$, where π(p)dp is the stationary distribution of the diffusion and C′ is a normalizing constant. In both cases, the density of the sampling distribution for p can be written in the form Cpa−1(1 − p)b−1h(p), where h(p) is a bounded continuous function. To generate samples from this distribution, I first drew samples from a Beta distribution with parameters (a, b) and then used rejection sampling to draw subsamples with the correct distribution. The envelope function for the rejection sampling was taken to be h(p)/hmax, where hmax = max{h(p): p ∈ [0, 1]}.

Simulation of the structured coalescent was carried out in the following way. Let a(p) and b(p) denote the infinitesimal variance and drift coefficients of the diffusion approximation and suppose that p(t) = p. Unfortunately, because the function $Math$ is not Lipschitz continuous on [0, 1] for the models studied in this article, the Euler–Maruyama scheme is inappropriate. Instead, I used a generalization of the Wright–Fisher model with adaptive time discretization. To generate the next position of the diffusion, I chose the time interval to be of duration$Math$with N = 1000, and I then sampled Np(t + δt) from the binomial distribution with parameters N and p + b(pt, using the fast algorithm bnldev described in Press et al. (1992). It can be shown that, as N tends to infinity, this process converges in distribution to the desired diffusion process. The accuracy of this approximation was confirmed by comparing the time averages of the mean, the variance, and the third moment of the simulated sample paths with the stationary averages of these statistics calculated using the appropriate stationary distribution.

Changes to the sample configuration were simulated using the following procedure. I first simulated six independent mean one exponentially distributed random variables, Z1, ⋯ , Z6, one corresponding to each possible transition (see Table 2). In addition, I also introduced six “clocks,” T1(t), ⋯ , T6(t), recording the effective amount of time elapsed for each possible transition since the last event. The effective time is defined to be the integral of the rate of the event in question along the sample path of the process from the time of the last event up until the present time. For example, if the last event happened at time t = 0, then in the haploid model, the effective time for coalescence of two lineages in the A1 background is$Math$The timing and type of the next transition is then determined by the clock that first exceeds the corresponding exponential random variable. (Where necessary, each interval [t, t + δt] was itself subdivided, with linear interpolation of the diffusion sample path, so that only one transition event occurred per subinterval.) Following each such transition, the clocks are set to zero and six new independent exponential random variables are simulated. The accuracy of the code implementing the structured coalescent was tested by comparing the mean of the TMRCA and the total tree length for simulations of the neutral structured coalescent with the analytical values calculated using Kingman's coalescent.

## Acknowledgments

I thank Simon Myers and Alison Etheridge for helpful discussion and Stephen Proulx and one anonymous reviewer for insightful comments on this article. This work was supported by Engineering and Physical Sciences Research Council grant EP/E010989/1.

## Footnotes

• Communicating editor: J. Wakeley