## Abstract

The effective population size (*N*_{e}) quantifies the effectiveness of genetic drift in finite populations. When generations overlap, theoretical expectations for *N*_{e} typically assume that the sampling of offspring genotypes from a given individual is independent among successive breeding events, even though this is not true in many species, including humans. To explore the effects on *N*_{e} of nonindependent mate pairing across breeding events, we simulated the genetic drift of mitochondrial DNA, autosomal DNA, and sex chromosome DNA under three mating systems. Nonindependent mate pairing across breeding seasons has no effect when all adults mate pair for life, a small or moderate effect when females reuse stored sperm, and a large effect when there is intense male–male competition for reproduction in a harem social system. If adult females reproduce at a constant rate irrespective of the type of mate pairing, the general effect of nonindependent mate pairing is to decrease *N*_{e} for paternally inherited components of the genome. These findings have significant implications for the relative *N*_{e} values of different genomic regions, and hence for the expected levels of DNA sequence diversity in these regions.

THE effective population size (*N*_{e}) is a fundamental parameter of population genetics, which quantifies the effect of genetic drift, the stochastic change in allele frequencies over time in a population of finite size (Wright 1931). The magnitude of *N*_{e} affects both the level of genetic variability within a population and the efficiency with which populations retain mildly beneficial mutations and purge mildly deleterious ones. This influences a myriad of genetic phenomena, such as the level of DNA sequence polymorphism, the rate of substitution of nonsynonymous and functional noncoding nucleotide positions, the abundance of transposable elements, levels of variation, and the rate of evolution of gene expression, the persistence of duplicate genes, and genome size and organization (Lynch 2007; Charlesworth 2009). There are a variety of definitions of *N*_{e}; here we use the definition in terms of the mean coalescence time of a pair of neutral alleles, which is given by 2*N*_{e} (Charlesworth 2009). This definition has the useful feature that the expected pairwise nucleotide site diversity under the widely used infinite sites model is equal to 4*N*_{e}*μ*, where *μ* is the neutral mutation rate (Kimura 1971).

As a result of differences in their ploidy level and mode of inheritance, autosomal DNA (aDNA), the X chromosome (xDNA), the Y chromosome (yDNA), and maternally transmitted organelle DNA such as mitochondrial DNA (mtDNA) generally have a different *N*_{e} values. Under certain conditions, such as constant population size, discrete generations, a Poisson distribution of reproductive success, and a sex ratio equal to one, the relative *N*_{e} values of these genomic regions (*N*_{e-a}, *N*_{e-x}, *N*_{e-mt}, and *N*_{e-y}) are expected to be 4:3:1:1 (Charlesworth 2009). This is because aDNA is biparentally inherited and diploid; xDNA is biparentally inherited, diploid in females, and haploid in males (with female heterogamety, the reverse applies to the Z chromosome), and yDNA (the W chromosome, with female heterogamety) and mtDNA are both effectively uniparentally inherited and haploid in most species.

However, several characteristics of natural populations, such as unequal numbers of males and females and nonrandom variation in reproductive success, can affect the value of *N*_{e}, even for populations with discrete generations (reviewed in Caballero 1994; Hedrick 2007; Charlesworth 2009). In addition, natural selection at sites linked to neutral markers also has the potential to increase *N*_{e} (under balancing selection) (Charlesworth 2006) or to decrease *N*_{e} (with background selection or selective sweeps) (Hudson and Kaplan 1988; Charlesworth *et al.* 1993). Because the nature of natural selection varies in different genomic regions, especially in relation to the rate of recombination, *N*_{e} may also vary among unlinked regions with the same ploidy and mode of inheritance, for example, different portions of an autosomal chromosome (Gossmann *et al.* 2011). In natural populations, these factors can skew the relative *N*_{e} values away from the 4:3:1:1 expectation. Even when the effects of natural selection and “nonideal” demography are ignored, the 4:3:1:1 relation still has a large variance when applied to individual loci (Hudson and Turelli 2003).

When generations overlap, an additional source of possible deviations from these idealized relations arises from variation among individuals in survival and reproductive success among breeding seasons (Felsenstein 1971; Hill 1972, 1979; Johnson 1977) and from sex differences in demographic parameters and stochastic changes in population size (Engen *et al.* 2007). In contrast, the effect of a high variance in reproductive success caused by male–male competition is lessened when generations overlap for many breeding seasons (Nunney 1993; Charlesworth 2001). Conversely, overlapping generations with nonindependent mate pairing across breeding seasons could increase the variance in reproductive success. For example, in humans, paternity is correlated with paternal confidence in paternity (Anderson 2006), and married individuals tend to repeatedly produce offspring with each other more frequently than expected by chance. Nonindependent mate pairing among breeding seasons occurs in many other species as well—for example, long-term pair bonding in prairie voles (DeVries *et al.* 1995), harems in gorillas (Gatti *et al.* 2004), and sperm storage in fruit flies (Neubaum and Wolfner 1999).

Current theoretical models that allow calculation of the effective population size with overlapping generations and age structure make several simplifying assumptions, notably constant sizes of each age class, sufficiently large numbers of individuals in each age class that second-order terms in their reciprocals can be neglected, and independent sampling of offspring genotypes from the same individual reproducing at different times (Hill 1972; Nunney 1991, 1993; Caballero 1994; Charlesworth 1994, 2001). The latter assumption in particular makes it difficult to provide accurate expressions for species such as humans and *Drosophila*, which reproduce nonindependently because of long-term pair bonds and sperm storage, respectively (Charlesworth 2001).

The goal of this study is therefore to explore the consequences of nonindependence of reproductive events across time in different social systems and with different age structures, using simulations of genetic drift in two types of age-structured populations, under different scenarios of independent and nonindependent mate pairing among breeding events. We have explored how these scenarios affect the relative values of *N*_{e-a}, *N*_{e-x}, *N*_{e-mt}, and *N*_{e-y} using the infinite alleles model of mutation (Kimura and Crow 1964), with particular emphasis on comparisons of similar mating systems that differ in the extent of nonindependence among breeding events.

## Theory and Methods

### Age structure

A flowchart illustrating the steps involved in these simulations is shown in Figure 1, with further details of each model provided below. We performed simulations of the two types of age-structured populations depicted in Figure 2. The first “long” age structure model had a large number of age-classes (*n* = 200), intended to approximate the life history of organisms such as *Drosophila* or primates that reproduce quasi-continuously. The second “short” age structure model had only five age classes and applies to organisms that have only a few opportunities to reproduce within an individual’s lifespan. The following assumptions are made: a constant population size, an equal number of males and females, reproduction and death over discrete time intervals, a demographic model that includes nonreproductive juveniles, age-independent adult mortality and fecundity, and no sex differences in age-specific survival probabilities (and hence age distributions) or mutation rate. A fixed age distribution was assigned to the population consistent with regulation by density-dependent fecundity or mortality (Charlesworth 1994, Chap. 1). Mating was at random with respect to genotype and age of parent. For details, see Supporting Information, File S1, File S2, and File S3.

### Mating systems

The mating systems explored here are divided into those in which females pair with males that are chosen randomly with replacement from the set of adult males and those in which females pair randomly with males chosen randomly without replacement. When mate pairing occurs without replacement, each male reproduces with a maximum of one female per breeding season for monogamy models (or with one harem of females per breeding season for harem models, provided that the male holds a harem). When mate pairing occurs with replacement, all males have an equal probability of mating with each female (provided she is not using stored sperm) but the same male may by chance mate with multiple females in a given breeding season.

For mate pairing without replacement, we explored a system with either mate pairing between male–female pairs (“monogamy” models) or with males and groups of females (“harem” models). We considered two versions of the monogamy and harem models. In the first version, following Nunney (1991, 1993), mate pairing occurs independently in each breeding season. We call these models the “seasonal monogamy” and “seasonal harem” models. In the second version, mate pairing occurs for life and is thus nonindependent among breeding seasons. We call these models the “monogamy for life” and “harem for life” models. Details are given in File S1, File S2, and File S3.

In the seasonal harem and the harem for life models, we first focused on a system in which all adult females are mate paired in harems composed of nine females and one male. Because the number of males and females are equal, eight out of every nine adult males cannot reproduce in each breeding season in both cases. In the seasonal harem model, each adult male gets a chance to be in a harem every breeding season, because the male that occupies each harem is chosen randomly each breeding season. In the long age structure simulations, most males do not successfully reproduce each breeding season, so that the effect of harems on *N*_{e} is modest in the seasonal harem model. However, this effect is more pronounced in the short age structure. These effects are quantified below in the *Theoretical expectations* section. In the harem for life model, the same males occupy harems for multiple breeding seasons. Half siblings can be produced each breeding season if offspring from multiple females within the same harem survive. Nonreproductive adult males get the chance to reproduce in subsequent breeding seasons only if they are randomly chosen for joining a harem where the male died after the previous breeding season. In this way, in the harem for life model harems are expected to have a more pronounced effect on the variance in reproductive success among males in both age structures as compared to the seasonal harem model. As discussed below, we also modeled other harem sizes for the seasonal harem and the harem for life models.

For mate pairing with replacement, we performed simulations that are similar to the “lottery polygyny” mating system explored by Nunney (1993), which we refer to as the “storage” models. In lottery polygyny, all females mate each breeding season with a male drawn randomly with replacement. Drawing males with replacement generates an additional variance in male reproductive success beyond the Poisson expectation; this effect is quantified below in the *Theoretical expectations* section. In the storage models, Poisson distributions of numbers of offspring of each sex are produced by a female with a randomly selected male; the progeny are thus full siblings. Half siblings can be produced each generation if the same male is by chance paired with two or more females. A subset of the resulting offspring from these pairings is chosen for survival, based on the number of newborns in the youngest age cohort dictated by density-dependent equilibrium (as in the monogamy and harem models).

To examine nonindependent mate pairing across breeding seasons in the models with replacement (the storage models), we simulated sperm storage (Neubaum and Wolfner 1999). Virgin females initially mate with a male chosen randomly with replacement; in subsequent breeding seasons there is a fixed probability of reproduction with sperm stored from this mating, with the decision to use or not use the stored sperm made by drawing a random number. In the event that stored sperm is not used, females are paired again with a male selected randomly with replacement. We assumed that females store sperm for their entire life and that stored sperm is exclusively derived from the most recent copulation.

Simulations were performed with a 0% probability of using stored sperm (the “no-storage” model, analogous to lottery polygyny), a 50% probability of using stored sperm (the “half-storage” model), and a 100% probability of using stored sperm (the “all-storage” model). The all-storage model is therefore identical to the monogamy for life model with the exceptions that (1) the initial pairing of males with each female is performed with replacement and (2) in the monogamy for life model, a female receives a new partner if her male dies, whereas in the storage models, a female continues to use stored sperm after her male dies. The all-storage model (and also the other storage models—no storage and half storage) therefore have an additional variance in male reproductive success compared to the monogamy models, which stems from mate pairing with replacement. In the all-storage model, and to some degree the half-storage model, this additional variance is partially offset by a lower variance in partner number per female, because in the monogamy for life model some females outsurvive their partners and pair again with another male. We expect the magnitude of this offset to be small, because only half of the females outlive their partners, and because in these cases the disparity in life spans is generally small.

Of course, many assumptions of these models are not met by species in nature. For example, the assumption of a constant population size, including a male to female sex ratio of 1:1 in each age cohort, is unrealistic. Overall, our focus was to evaluate the effect of nonindependent mate pairing on the relative *N*_{e} values for different parts of the genome. In doing so, however, we were unable to emulate various details of the breeding systems of real species, and we also violate to some degree various key theoretical assumptions that are commonly made in deriving formulae for *N*_{e}.

### Estimating N_{e} from simulations of genetic systems

We estimated *N*_{e-a}, *N _{e-x}*,

*N*

_{e-mt}, and

*N*

_{e-y}by conducting individual-based Monte Carlo simulations that incorporated the above demographic assumptions into single-locus genetic models of each of these modes of transmission. Effective population sizes were estimated from genetic diversity under the infinite alleles model of neutral mutation (Kimura and Crow 1964), under which each new mutation at a locus represents a new allele. Forward simulations were performed in which mutations occurred at a rate

*μ*per time interval, such that

*μ*is the probability that a gamete produced by a parent at time

*t*contains an allele that was not present in the zygote that formed the parent in question (Charlesworth 2001, Equation A1). Plots of

*H*/(1 −

*H*), where

*H*is the heterozygosity (the probability that two random alleles are different is state)

*vs.*time were used to assess when the simulations had progressed for a sufficient number of breeding seasons for the system to reach mutation–drift equilibrium (at which point the heterozygosity reaches an equilibrium (

*H*

_{eq}). Under the infinite-alleles model,

*H*

_{eq}is determined by the scaled mutation rate parameter

*θ*= 4

*N*

_{e}

*μ*, giving the (nonlinear) relation

*θ*=

*H*

_{eq}/(1 –

*H*

_{eq}) (Kimura and Crow 1964).

*H*

_{eq}was estimated from the mean heterozygosity across all simulations; this mean was then used to calculate

*θ*. Simulations were performed on the sharcnet computer cluster (http://www.sharcnet.ca) and scripts that perform these simulations are available by request.

### Theoretical expectations

Theoretical expectations for *N*_{e} for aDNA, xDNA, mtDNA, and yDNA under the assumption of independent mate pairing among breeding seasons are given by Equations 4, 8, 11, 12, and 13 of Charlesworth (2001). Equations 4, 8, and 11 provide expressions for *N*_{e} that take age structure into account, assuming Poisson distributions of per capita offspring number for individuals breeding at a given time. Equations 12 and 13 include parameters for adult males (Δ*V*_{m}) and adult females (Δ*V*_{f}) that represent additional variances in offspring number over Poisson expectations, standardized by the squared mean offspring number per breeding season per adult of each sex (note that this differs from the usage in Vicoso and Charlesworth (2009), where unstandardized variances were discussed). For all models considered here, Δ*V*_{f} is equal to zero because adult females always have an equal opportunity of reproducing. For the monogamy models, Δ*V*_{m} is also equal to zero, because all adult males also have an opportunity to reproduce that is equivalent to each other and to the adult females.

For the no-storage model with lottery polygyny, based on random sampling of males each breeding season, the argument of Vicoso and Charlesworth (2009, Appendix, Equation A5) can be used, generalizing it to assume a mean number of offspring per individual of *x* instead of 2. This gives the result that Δ*V*_{m} is approximately equal to the binomial variance in offspring number, divided by the squared mean offspring number per adult male per breeding season, *i.e.*,*x* is the mean number of offspring per male adult per breeding season and *p* is the probability that a female selects a given adult male for a mating event (with a large number of breeding males, *p* ≪ 1). This formula for Δ*V*_{m} accommodates the additional variance in male reproductive success due to mate pairing with replacement (*cf*. Nunney 1993; Vicoso and Charlesworth 2009).

For the seasonal harem model, since a male that is not in a harem has no offspring, Δ*V*_{m} is given by*p*_{1} is the probability that an adult male is in a harem, *x*_{1} is the expected number of offspring per harem male, and the mean number of offspring per male is *x* = *p*_{1}*x*_{1}. Numerical values for Δ*V*_{m} from these equations for the models analyzed here are given in File S1, File S2, and File S3.

### Comparing simulations with the theoretical expectation and with each other

Ninety-five percent confidence intervals for *N*_{e} estimated from the simulations were generated by adding or subtracting 1.96 times the standard error from the mean *H*_{eq} and then converting these values to *N*_{e} according to the relation *N*_{e} = *H*_{eq}/[4*μ*(1− *H*_{eq})]. These 95% confidence intervals were essentially the same as those calculated with an alternative approach in which we recalculated the mean from resampling subsets of the data (jackknifing) 1000 times, including each simulation in each iteration with 50% probability (data not shown). The theoretical variance of *H*_{eq} for a Wright–Fisher population, which is equal to 2*θ*/((2+*θ*)(3+*θ*)(1+*θ*)^{2} (Nei 1987, p. 369) was also compared to the simulated variance of *H*_{eq}. Ninety-five percent confidence intervals for the ratios *N*_{e-x}:*N*_{e-a}, *N*_{e-mt}:*N*_{e-a}, and *N*_{e-y}:*N*_{e-a} for each model were also estimated from the resampled data subsets.

We compared the theoretical *θ* to the simulated *θ* using two tailed *z*-tests (test 1). The null hypothesis for test 1 is that the mean of the simulated *θ* is not different from the theoretical *θ*, calculated as described above. An approximation for the variance of *θ* was obtained from the theoretical variance of *H*_{eq} (as defined above) using the Delta method—*i.e.*, by multiplying the variance of *H*_{eq} by the square of the derivative of *H*/(1 − *H*), which is (1 − *H*_{eq})^{−4}. This approximation for the variance of *θ* was then used to calculate the standard error of *θ* for the *z*-test. We also performed the *z*-test using the standard error of *θ* estimated from the simulated variance of *H*_{eq} instead of the theoretical variance of *H*_{eq}, with similar results.

We compared the relative sizes of *N*_{e-a}, *N*_{e-x}, *N*_{e-mt}, and *N*_{e-y} from each of the simulations to the theoretical expectations under the null hypothesis that the relative sizes of *N*_{e-a}, *N*_{e-x}, *N*_{e-mt}, and *N*_{e-y} are not different from the theoretical expectations discussed above (test 2). A two-sided normal deviate test was used to test whether the difference between the *θ* value for xDNA, multiplied by the theoretical value of the ratio *N*_{e-a}/*N*_{e-x}, and the *θ* value for aDNA, was significantly different from zero. For example, for the monogamy models, 1.333 times the simulated *θ* value for xDNA was compared to 1 times the simulated *θ* value for aDNA. A significant departure of this difference from zero was inferred if the difference of the means was >1.96 standard errors, using the following expression for the variance of this difference*R*_{xa} is the theoretical value of the ratio *N*_{e-a}/*N*_{e-x}, and Var(*θ*_{x}) and Var(*θ*_{a}) refer to the variances in *θ* of xDNA and *θ* of aDNA, respectively, which are approximated from the variance of *H*_{eq} as described above. Test 2 was also used to compare simulated scaled *θ* values for mtDNA to simulated *θ* values for aDNA and to compare simulated scaled *θ* values for yDNA to simulated *θ* values for aDNA.

We also compared the simulations with nonindependent mate pairing across breeding seasons (the monogamy for life, harems for life, and the half-storage or all-storage models) to the corresponding simulations with independent mate pairing across breeding seasons (the seasonal monogamy, seasonal harem, and the no-storage models, respectively) for each type of locus (aDNA, xDNA, mtDNA, and yDNA; test 3). The null hypothesis of test 3 is that, for a given locus type, the mean *θ* from the simulations with independent mate pairing across breeding seasons is not different from the mean *θ* from the simulations without independent mate pairing. The statistical approach was similar to test 2 in that we used a two-sided normal deviate test to test whether the difference between the *θ* value from the model with independent mate pairing among breeding seasons and the *θ* value from the model with nonindependent mate pairing among breeding seasons was significantly different from zero.

A premise of tests 1, 2, and 3 is that the statistics being compared have an approximately Gaussian distribution. While this is not precisely the case, this assumption is approximately met because of the large number of replicate simulations used to generate the mean *θ* values (4000). This assumption is justified in File S3.

## Results

### Theoretical expectations

Table 1 and Table 2 show theoretical expectations for the *N*_{e} values for the case of independent mate pairing across breeding events, based on Equations 1 and 2 and the relevant formulas in Charlesworth (2001), and using the numerical values for demographic parameters provided in File S1 and File S2. The theoretical values of *N*_{e-m}/*N*_{e-a}, *N*_{e-x}/*N*_{e-a}, and *N*_{e-y}/*N*_{e-a} are also presented in Table 1 and Table 2 and summarized in Figure 3. The theoretical value for *N*_{e-mt} for each of the age structures is the same for all models, because the variance in female reproductive success is the same in all models, although the expectations differ substantially for each of the age structures. In general, overlapping generations reduce *N*_{e} relative to the census size for adults (*N*_{c}) (Nunney 1991, 1993; Charlesworth 2001) and as a result, the theoretical expectations for *N*_{e} are substantially lower for the long age structure (Table 1) than for the short age structure (Table 2) for each of the models. In our simulations, *N*_{c} for the long and short age structures is equal to 1044 individuals and 1062 adults, respectively.

### Simulations with independent mate pairing across breeding seasons

Simulations of mitochondrial DNA can be considered as “controls” for comparison across simulations within each age structure. In no case did these *N*_{e-mt} simulations deviate significantly from the theoretical value; as expected, the simulated *N*_{e-mt} values for each age structure were similar, irrespective of the model (Table 1 and Table 2). *N*_{e} estimates from the simulations with independent mate pairing among breeding seasons were never individually significantly lower than the theoretical expectations (test 1, Table 1 and Table 2). Unexpectedly, on three occasions simulations with independent mate pairing across breeding seasons were individually significantly higher than the theoretical expectation (test1; Table 1 and Table 2). These included *N*_{e-y} values from the no-storage model in the long and short age structure simulations and also *N*_{e-a} values from the no-storage model in the short age structure simulations. However, none of these was significant after Bonferroni correction for four locus types tested per simulation (Rice 1989), and it is clear from inspection of plots of *θ vs.* time that these values are atypically high at the generation we choose to study (400,00 for the long age structure and 200,000 for the short age structure; see black arrows in Figure S1).

The variance in heterozygosity among replicate simulations was generally similar to the theoretical variance: for all three models with independent mate pairing, the variance in heterozygosity of the simulations was <30% higher or lower than the theoretical variance for each locus type. Exceptions to this were yDNA in the no-storage and in the seasonal monogamy models in the short age structure simulations, which had simulated variances that were >35% higher than the theoretical expectations. Overall, it appears that the simulations with independent reproduction across time intervals agreed well with the theoretical predictions of the behavior of populations with overlapping generations.

In the seasonal monogamy model, males have the same distribution of reproductive success as females, so that the difference between *N*_{c} and the theoretical value of *N*_{e} reflects the presence of juveniles and mortality between successive age classes, which together cause a departure from the discrete-generation equivalent *N*. In the seasonal monogamy model, the theoretical expectation for *N*_{e-a}/2*N*_{c} for autosomes is 0.349 and 0.548, respectively, for the long and short age structures (Table 1 and Table 2). The theoretical values of the relative *N*_{e} of different parts of the genome are the same as for the discrete generation equivalent (that is, 4: 3: 1: 1 for *N*_{e-a}: *N*_{e-x}: *N*_{e-mt}: *N*_{e-y}), as found previously (Charlesworth 2001).

In the seasonal harem models, males have unequal access to reproduction depending on whether or not they are in a harem. This is expected to decrease the *N*_{e} of paternally inherited portions of the genome. However, in the long age structure this difference is small with a harem size of nine females, with independent mate pairing among breeding seasons (Table 1). This is because male membership in harems is reassigned each breeding season and because relatively few offspring are produced each breeding season in our model. Thus, because most males do not reproduce in each breeding season in the long age structure, the harem system did not substantially increase the variance in male reproductive success within a given breeding season, as long as males are reassigned to harems each breeding season. The theoretical value of *N*_{e-a}/2*N*_{c} for the seasonal harem model is equal to 0.336 and 0.212, respectively, for the long and short age structures.

Differences in *N*_{e}/*N*_{c} between the seasonal monogamy and the seasonal harem models illustrate a more prominent role of the harem social system in a population where on average most adults breed during their lifetime (the short age structure), as compared to a population where on average most adults do not breed during their lifetime (the long age structure). Because of the different paternal contributions, the theoretical expectations for *N*_{e-mt} /*N*_{e-a} and *N*_{e-x}/*N*_{e-a} in the seasonal harem model are higher than 1/4 and 3/4, respectively, but lower than 1/4 for *N*_{e-y}/*N*_{e-a} (Table 1 and Table 2). In the storage model, mate pairing occurs with replacement, and an additional variance in male reproductive success is generated by resampling males each breeding season. Compared to the monogamy models, the expectation for *N*_{e} for the no-storage model is again lower for paternally inherited portions of the genome, but not substantially so for either age structure (Table 1 and Table 2). For the no-storage model, *N*_{e-a}/2*N*_{c} is equal to 0.334 and 0.479, respectively, for the long and short age structures.

### Simulations with nonindependent mate pairing across breeding events

The theoretical expectations discussed above assume independent mate pairing across successive breeding events involving the same individual. We anticipated that nonindependent mate pairing among breeding seasons in our models would increase the variance in male reproductive success. We therefore expected a reduction in *N*_{e} for wholly or partially paternally inherited portions of the genome (xDNA, aDNA, and yDNA) in simulations with nonindependent mate pairing across breeding seasons compared to these theoretical expectations and compared to the simulations with independent mate pairing across breeding seasons. Theoretical expressions for *N*_{e} when mate pairing is not independent across breeding seasons are not available in these cases; for this reason, we relied on simulations to quantify the effect of this nonindependence.

The seasonal monogamy and monogamy for life models differ in that the first model has independent mate pairing among breeding seasons. However, these models are identical in terms of the variances in female and male reproductive success in a given breeding season, which are close to the Poisson values. As expected, there was no significant difference for any locus type between the theoretical *N*_{e} values for the seasonal monogamy model and the simulated values for the monogamy for life model (*P* > 0.05, test 1, Table 1 and Table 2). The relative *N*_{e} values from the monogamy for life simulations were also not significantly different from the theoretical expectations for the seasonal monogamy model (*P* > 0.05, test 2; Table 1 and Table 2); the same applies to the difference between the simulated *N*_{e} values from the monogamy for life model and the seasonal monogamy models (*P* > 0.05, test 3; Table 1 and Table 2). Thus, nonindependent mate pairing among breeding seasons does not greatly affect *N*_{e} in the monogamy model, provided that all adults are paired for reproduction in each breeding season. This model requires an equal number of males and females to make possible the 1:1 matching of adults of each sex in every breeding season—a condition rarely met in natural populations.

To explore a more realistic scenario where males have a higher variance in reproductive success than females, we performed simulations where one male reproduces with a harem of nine females. Because there is an equal number of males and females, eight out of every nine males do not reproduce each breeding season. In both age structures, the harem for life model has significantly lower *N*_{e} values than the theoretical expectation under independence across seasons and the results from simulations with independent mate pairing across breeding seasons in all portions of the genome with paternal inheritance (*N*_{e-x}, *N*_{e-a}, and *N*_{e-y}, *P* < 0.05, tests 1 and 3; Table 1 and Table 2). As expected, simulated values of *N*_{e-mt} from the harem for life model were not significantly different from the theoretical expectation or simulations with independent mate pairing across breeding seasons (*P* > 0.05, tests 1 and 3, Table 1 and Table 2). In the harem for life model, the effect of nonindependent mate pairing on *N*_{e} of paternally inherited portions of the genome is more severe for parts of the genome with a higher proportion of paternal inheritance. Nonindependent mate pairing in the harem for life model thus decreases *N*_{e-a} compared to *N*_{e-mt} and *N*_{e-x}, so that *N*_{e-mt}/*N*_{e-a} and *N*_{e-x}/*N*_{e-a} are higher, but decreases *N*_{e-y} compared to *N*_{e-a}, so that *N*_{e-y}/*N*_{e-a} is lower (Figure 3).

As expected, *N*_{e-x}/*N*_{e-a} was significantly higher than the theoretical expectation with independent mate pairing across breeding seasons for the long age structure. There was no significant difference between these ratios for the short age structure and, contrary to expectations, *N*_{e-x}/*N*_{e-a} from the harem for life simulations for the short age structure was almost significantly lower (Figure 3). However, this unexpected result was caused by an atypically high *N*_{e-a} and atypically low *N*_{e-x} that were sampled at the 200,000th simulated breeding season in these simulations (see red arrows in Figure S1). Consistent with expectations, *N*_{e-mt}/*N*_{e-a} from the long age structure was higher and *N*_{e-y}/*N*_{e-a} from the short age structure was lower than the theoretical expectation with independent mate pairing across breeding seasons (test 2; Table 1 and Table 2), although the significance of these disparities differed between the age structures.

A significant effect of nonindependent mate pairing among breeding seasons was also recovered from simulations involving sperm storage, although the effect was much smaller than that caused by nonindependent mate pairing among breeding seasons in the harem for life model. We considered three scenarios for sperm storage: the no-storage model where females do not store sperm, the half-storage model where females use stored sperm half of the time, and the all-storage model where females always use stored sperm derived from their first reproduction. As expected, the effect of sperm storage on *N*_{e} was most pronounced for *N*_{e-y}. When the half-storage model was compared to the no-storage model, *N*_{e} was not significantly different for any locus type for either age structure (*P* > 0.05, test 3; Table 1 and Table 2). But when the all-storage model was compared to the no-storage model, *N*_{e-y} was significantly lower for the long age structure model and *N*_{e-x}, *N*_{e-a}, and *N*_{e-y} were significantly lower with the short age structure (*P* < 0.05, tests 1 and 3; Table 1 and Table 2). The relative *N*_{e} of the different loci in the half-storage model was not significantly different from the theoretical expectations for either age structure (*P* > 0.05, test 2; Table 1 and Table 2). The relative *N*_{e} values of the different loci in the all-storage model were also not significantly different from the theoretical expectations (*P* > 0.05, test 2; Table 1), except for *N*_{e-}* _{y}*/

*N*

_{e-a}, which was significantly lower than the theoretical expectation with the long age structure (

*P*< 0.05, test 2; Table 1).

Using the long age structure model, we also performed simulations with harem sizes equal to 2, 3, or 18 females, with or without independent mate pairing among breeding seasons. As expected, *N*_{e} for paternally inherited portions of the genome decreases with increasing harem size. For all of these harem sizes, *N*_{e} for paternally inherited portions of the genome was significantly smaller for the harem for life model than the theoretical expectation with independent mate pairing among breeding seasons (test 1, *P* < 0.05; Table S1) and than the seasonal harem (test 3, *P* < 0.05; Table S1). As expected, for harem sizes of 2, 3, or 18 there was no significant difference between the simulated *N*_{e-mt} and the theoretical expectation (test 1, *P* > 0.05) or between the *N*_{e-mt} simulated in the seasonal harem and harem for life models (test 3, *P* > 0.05). When the *N*_{e-x}/*N*_{e-a}, *N*_{e-mt}/*N*_{e-a}, and *N*_{e-y}/*N*_{e-a} ratios from the seasonal harem model and the harem for life models were compared with harem sizes of 2, 3, or 18, the only ratio that was significantly different from the theoretical expectation was *N*_{e-mt}/*N*_{e-a} for harem sizes of 3 or 18 (test 2, *P* < 0.05; Table S1). Nonetheless, it is clear that nonindependent mate pairing across breeding seasons increases *N*_{e-x}/*N*_{e-a} when the simulations from multiple harem sizes are considered collectively (Figure 4). Based on simulations with harem sizes of 2, 3, 9, and 18 and the long age structure, a permutation test (File S3) indicates that the fit of the jackknifed harem for life *N*_{e-x}/*N*_{e-a} ratios to the theoretical expectation is significantly worse than the fit of the jackknifed seasonal harem *N*_{e-x}/*N*_{e-a} ratios (*P* < 0.0001).

## Discussion

The standard models of genetic drift in age-structured populations (Felsenstein 1971; Hill 1972, 1979; Johnson 1977; Charlesworth 2001; Engen *et al.* 2007) assume that reproductive events involving the same individual in different breeding seasons occur independently. This assumption is violated by many species in nature. To explore the effect of nonindependent mate pairing among breeding seasons, we simulated genetic drift under three different mating systems: (1) couples pair randomly without replacement—the monogamy models, (2) multi-female groups pair with one male sampled without replacement—the harem models, and (3) females pair with males chosen with replacement, and potentially store sperm (the storage models, also known as lottery polygyny), in two types of population age structures that differed greatly in their numbers of age classes.

Although we generally did not find significant differences between theoretical and simulated *N*_{e} values in the models with independent mate pairing among breeding seasons (see Table 1 and Table 2), there are in fact reasons to expect modest differences that we did not have the statistical power to detect. In particular, the models use approximations that ignore terms of the order of the squares of the reciprocals of the sizes of the reproductive age classes, relative to first-order terms (Felsenstein 1971; Johnson 1977; Emigh and Pollak 1979; Charlesworth 2001). The small populations used in these simulations could thus deviate noticeably from the theoretical predictions for this reason alone. It appears from Table 1 and Table 2, however, that this and other sources of departure from the assumptions used to obtain the theoretical results have only minor effects, given the generally fairly good agreement between the theoretical expectations and the simulation results. However, the theoretically expected high level of stochastic variability for heterozygosity (Nei 1987, p. 369) means that only quite large deviations could be detected, as can be seen from the confidence intervals shown in Table 1, Table 2, Table S1, and in Figure 3 and Figure 4. Greatly increasing the number of replicates would increase our ability to detect deviations from the prediction, but simulations with large numbers of age classes are inherently time consuming.

In contrast, comparisons between the simulation results with independence and those with nonindependent mate pairing among breeding seasons demonstrated a significant effect of nonindependent mate pairing on genetic drift, except in the monogamy model; here, all adults are continuously mate paired for life irrespective of whether mate pairing was independent across breeding seasons. The effect of nonindependent mate pairing among breeding seasons in the harem and storage models was to decrease the *N*_{e} of portions of the genome that are at least in part transmitted paternally. In the harem for life model, this effect was significant for *N*_{e-x}, *N*_{e-a}, and *N*_{e-y}. The relative magnitude was generally largest for *N*_{e-y}, intermediate for *N*_{e-a}, and the smallest for *N*_{e-x}, but the ratios of these values did not always differ significantly from the expectations with independent mate pairing among breeding seasons. The effect of nonindependent mate pairing among breeding seasons was similar, but of smaller magnitude, in the all-storage model, with *N*_{e-x}, *N*_{e-a}, and *N*_{e-y} substantially affected in the all-storage simulations with the short age structure.

In the long age structure, we suspect that the nonsignificant effect of nonindependent mate pairing for xDNA and aDNA in the all-storage model is due to the relatively small magnitude of this effect and that a significant effect could be detected if statistical power were increased with more simulations. The same is probably true for xDNA, aDNA, and yDNA in the half-storage model for both age structures. This raises the question of how many more replicates would be needed to reduce the standard error enough to detect a significant difference between the simulations with nonindependent mate pairing among breeding seasons and the theoretical expectation that assumes independent mate pairing among breeding seasons. Using the theoretical expectations for the variance of *θ* from the approximations discussed above, we explored this question with respect to test 2 with the long age structure, which tests whether the ratio of *θ* for xDNA to aDNA differs from the null expectation of 0.759 in the harem and storage models. In general, as expected, for a given number of replicates (or independent nucleotide sites in a real data set), the statistical power to detect departure from expectation increases with *θ* and with the magnitude of the departure from the null expectation. When *θ* for aDNA is 0.03, 4000 independent replicates provide a >95% probability (at the *P* = 5% level) of rejecting the null hypothesis that the ratio of *θ* of xDNA to *θ* of aDNA is 0.759, when it is actually 0.85 or higher. However, when *θ* of aDNA is 0.001 (which is closer to the situation in humans), ∼54,000 replicates are needed to achieve a >95% probability of rejecting the null hypothesis that the ratio of *θ* of xDNA to *θ* of aDNA is 0.759, when it is actually 0.95 or higher. This shows that data on a large number of SNPs are needed to test for biologically plausible differences in variability among different components of the genome.

### The effect of nonindependent mate pairing on N_{e-x}/N_{e-a}

Natural selection, mutation, and demographic effects vary between the sex chromosomes, mtDNA, and aDNA, and this influences their *N*_{e} and level of polymorphism. The relative *N*_{e} of different parts of the genome can offer insight into intricacies of species social systems and natural selection (Ellegren 2009; Charlesworth 2012). After controlling for variation in mutation rate, a paucity of polymorphism on xDNA relative to aDNA could, for example, suggest natural selection on slightly deleterious hemizygous X-linked mutations in males, whereas an excess of polymorphism on xDNA is consistent with a female biased sex adult ratio or an excess variance in male reproductive success (*e.g.*, Charlesworth 2001; Evans *et al.* 2010). Population size changes can also skew relative *N*_{e} values of different components of the genome; for example, *N*_{e-x}/*N*_{e-a} may become ephemerally higher after population expansion, but lower after population contraction (Pool and Nielsen 2007).

The harem models studied here provide a useful tool with which to explore the effect of another variable that can influence *N*_{e-x} /*N*_{e-a}—namely, variance in male reproductive success that stems from nonindependent mate pairing across breeding seasons. This could arise, for example, in species with intense male–male competition, where the same males tend to repeatedly win reproductive access over multiple breeding seasons. With the harem models, *N*_{e-x}/*N*_{e-a} reaches a maximum of 9/8 when only one male reproduces with all females, and a minimum of 9/16 when only one female reproduces with all males (Figure 4; Caballero 1995; Charlesworth 2001), the same as in the discrete generation case (Wright 1931; Caballero 1995). To characterize the effect of nonindependent mate pairing across generations on *N*e-x/*N*e-a, we performed simulations with various harem sizes (either 2, 3, 9, or 18 females per harem) using the seasonal harem model and the harem for life models. These simulations indicate that a relatively modest amount of nonindependent mate pairing across breeding seasons could potentially account for estimates of *N*_{e-x}/*N*_{e-a} that are significantly >0.759 (Figure 4). In humans, for example, there appears to be a significant excess of polymorphism for xDNA compared to aDNA in regions far from genes (Hammer *et al.* 2010). Of note, however, is the high variance of this ratio, even when averaged over 4000 independent sites (see above).

## Conclusions

This study demonstrates that, in some circumstances, nonindependent mate pairing across breeding seasons can significant decrease the *N*_{e} of wholly or partially paternally inherited portions of the genome, including yDNA, aDNA, and xDNA. In our simulations, the magnitude of this effect is a function of the degree of paternal inheritance and is generally, therefore, largest for yDNA, intermediate for aDNA, and smallest for xDNA. Because this differential effect depends on the mode of inheritance, nonindependent mate pairing across breeding seasons increases *N*_{e-x}/*N*_{e-a} when all females have purely random variation in reproductive success. Population age structure and lifetime fecundity influenced the effect of nonindependent mate pairing. For example, a statistical signature of nonindependent mate pairing among breeding seasons was more readily detected in the short age structure, in which individuals had a higher lifetime fecundity, than with the long age structure. The simulations performed here were biologically unrealistic in the sense that multiple haploid genomes (mtDNA and yDNA), each with independent genealogical histories, were generated. In reality of course, these genomes have only one genealogy for a given population, and yDNA and mtDNA each offer only one realization of a highly variable coalescent process (Hudson and Turelli 2003). Additionally, because of linkage, yDNA and mtDNA are subject to background and hitchhiking effects, which tend to lower their *N*_{e} below neutral expectations. Furthermore, our simulations were performed under the assumption of neutrality—an assumption that may frequently be violated in data sets from real genomes. For these reasons, it is potentially challenging to use molecular polymorphism data from natural populations to distinguish nonindependent mate pairing across breeding seasons from alternative demographic scenarios that also increase *N*_{e-x}/*N*_{e-a} but that have different expected affects on *N*_{e-y} and *N*_{e-mt}, such as a skewed adult sex ratio, or recent population growth.

## Acknowledgments

We thank the Institute for Evolutionary Biology at the University of Edinburgh for hosting B.J.E. during his sabbatical and Jonathan Dushoff and Bill Hill for helpful discussion. We also thank Lindi Wahl and two anonymous reviewers for helpful suggestions.

## Footnotes

*Communicating editor: L. M. Wahl*

- Received September 29, 2012.
- Accepted November 20, 2012.

- Copyright © 2013 by the Genetics Society of America