Abstract
We describe a method for comparing nuclear and organelle population differentiation (F_{ST}) in seed plants to test the hypothesis that pollen and seed gene flow rates are equal. Wright’s infinite island model is used, with arbitrary levels of self-fertilization and biparental organelle inheritance. The comparison can also be applied to gene flow in animals. Since effective population sizes are smaller for organelle genomes than for nuclear genomes and organelles are often uniparentally inherited, organelle F_{ST} is expected to be higher at equilibrium than nuclear F_{ST} even if pollen and seed gene flow rates are equal. To reject the null hypothesis of equal seed and pollen gene flow rates, nuclear and organelle F_{ST}’s must differ significantly from their expected values under this hypothesis. Finite island model simulations indicate that infinite island model expectations are not greatly biased by finite numbers of populations (≥100 subpopulations). The power to distinguish dissimilar rates of pollen and seed gene flow depends on confidence intervals for fixation index estimates, which shrink as more subpopulations and loci are sampled. Using data from the tropical tree Corythophora alta, we rejected the null hypothesis that seed and pollen gene flow rates are equal but could not reject the alternative hypothesis that pollen gene flow is 200 times greater than seed gene flow.
Gene flow in natural populations can take place in two ways: through the movement of gametes and through the movement of individuals such as progeny or adults. In seed plants, adults are usually immobile, while gene flow results from the dispersal of both pollen and seeds. In diploid species, seeds carry two nuclear alleles while pollen grains carry one nuclear allele. As a result, pollen and seed gene flows are expected to have different effects on deme size and overall levels of population structure (Crawford 1984; Ennos 1994). In particular, seed dispersal will contribute twice as much as pollen dispersal to the size of genetic neighborhoods when rates of pollen and seed gene flow are equal. Thus, seed gene flow can be a more important determinant of deme size and subdivision among populations than pollen gene flow.
The vast majority of direct measures of plant gene flow have focused on the pollen component of gene flow (e.g., Devlin and Ellstrand 1990; Nason and Hamrick 1997; see also McCauley 1995). Such pollen-only measures of gene flow may produce biased estimates of plant genetic neighborhood sizes in two ways. First, if gene flow through seed is ignored entirely, downwardly biased estimates of gene flow will (of course) result. Second, if gene flow through seed is not ignored, but rather assumed to proceed at a rate equal to that through pollen, either upward or downward bias may result. Comparing rates of pollen and seed gene flow is therefore necessary for a complete understanding of the dispersal mechanisms that cause population structure. Measures of the fixation of alleles among populations (F_{ST} and G_{ST}) should be useful tools for making such comparisons. These measures are commonly used to estimate N_{e}m, the product of the effective population size and the migration rate, under the assumptions of Wright’s infinite island model of population structure (Wright 1931; see also Slatkin 1985; Nagylaki 1998). Likewise, they are natural choices for independently estimating the effects of different components of gene flow on overall levels of population subdivision.
Organelle genomes provide a mechanism to estimate levels of gene flow independently in pollen and seeds, due to uniparental inheritance (McCauley 1995). In angiosperms, chloroplast and mitochondrial genomes are most often maternally inherited, while in gymnosperms both genomes are predominantly paternally inherited (Corriveau and Coleman 1988; Reboud and Zeyl 1994; Mogensen 1996), with the notable exception of pines (Wagner 1992). Of particular utility is maternal organelle inheritance, which provides a record of seed-mediated gene flow independent of pollen-mediated gene flow. Substantial data are becoming available on patterns of maternally inherited organelle genetic variation among plant populations (McCauley 1994; Powell et al. 1995; Dumolin-Lapégue et al. 1998; Hamilton 1999; Caron et al. 2000; Desplanque et al. 2000; Dutech et al. 2000; Muloko-Ntoutoume et al. 2000; Sperisen et al. 2001) that can be used to infer rates of seed-mediated gene flow.
To correctly interpret these data, it is necessary to account for differences in the effective population sizes of nuclear and organelle genomes. The effective population size of organelle genomes is less than that of diploid nuclear genomes because organelle genomes are haploid and are often uniparentally inherited (Birky et al. 1983). Because of this smaller effective population size, the organelle genome is expected to experience higher rates of genetic drift than do nuclear loci (Takahata and Palumbi 1985; Birky et al. 1989). This increased rate of genetic drift will increase the level of population structure for an organelle genome relative to that for the nuclear genome within a species. The extent of population differentiation for organelle and nuclear genomes will also depend on the rates of gene flow for gametes and individuals. In plants, this means that population differentiation of nuclear alleles and organelle genomes depends on pollen and seed dispersal rates as well as on the outcrossing rate and the mode of inheritance of organelles. It is well established that organelle genomes are expected to show higher equilibrium levels of population structure than do nuclear loci in island and stepping-stone models of drift and migration (Petit et al. 1993; Ennos 1994; Hu and Ennos 1999). Despite these clear expectations for organelle and nuclear population structure, however, few methods currently compare levels of gene flow for different components of gene dispersal (such as seed and pollen) inferred from measures of population differentiation in different genomes.
Our goal in this article is to provide a framework to test specific hypotheses about rates of pollen and seed gene flow in the island model using estimates of fixation indices from natural populations. In particular, we attempt to consolidate the theory of population structure for nuclear and organelle genomes influenced by drift and migration in Wright’s (1931) infinite island model. We combine and extend previous results to simultaneously include the effects of arbitrary levels of organelle genome effective population size, self-fertilization, and patterns of inheritance of organelles on expected nuclear and organelle population structures near equilibrium. We then describe a method for comparing empirical estimates of nuclear and organelle fixation indices to test the null hypothesis that rates of seed and pollen gene flow are identical under the island model. We investigate the power of the hypothesis test using simulation and show that the sampling variability of the comparison is fairly straightforward to estimate and interpret. Our comparison therefore provides a useful alternative to existing estimators of the ratio of nuclear and organelle gene flow rates established by Ennos (1994), which may be of limited use in practice since their sampling variances are difficult to interpret.
Since actual populations are not infinite in number, we examined expectations for nuclear and organelle population structure in a finite island model using simulations. These simulations explore the power of our hypothesis test to distinguish fixation index estimates from nuclear and organelle data. Specifically, they indicate the dependence of power on the total number of subpopulations in the island model, the number of subpopulations sampled, and the number of nuclear loci sampled. These results suggest sampling designs that will increase the chance of rejecting the null hypothesis of identical rates of pollen and seed gene flow when the true values are different. Although we employ the perspective of pollen- and seed-mediated gene flow in plants, the results presented here are generally applicable to systems of nuclear diploid and haploid (sex) chromosomes or nuclear and organelle gene flow in animals as well.
Throughout this article we assume no heteroplasmy of organelle genomes in gametes. Although the rate of loss of organelle genetic diversity within individuals depends on several factors, Chesser (1998) showed for uniparental inheritance that if the rate of loss of individual organelle genetic variation is greater than the rate of loss of population organelle variation, then population processes should dominate the dynamics of genetic variation. Heteroplasmy may also be maintained by selection (Clark 1988), which violates the conditions of neutrality required by any fixation index analysis of population structure that attempts to infer rates of gene flow. Additionally, heteroplasmy (e.g., Ludwig et al. 2000) and doubly uniparental inheritance (e.g., Kondo et al. 1990; Saavedra et al. 1997) of mitochondrial DNA have been shown in a variety of animal species but are not commonly reported for plant organelle genomes, although exceptions exist (Mogensen 1996). In what follows, the terms migration, dispersal, and gene flow are considered synonymous since we are concerned only with the successful establishment of alleles or haplotypes that have moved from one population to others, not with the movement of all gametes or seeds. We also assume that mutation rates are small relative to migration rates, so that the short-term (hundreds of generations) dynamics of genetic diversity within and among populations are dominated by gene flow and genetic drift.
INFINITE ISLAND MODEL
Let us consider the effective population size (N_{e}) of selectively neutral alleles in diploid nuclear genomes and selectively neutral haplotypes in organelle genomes. In a population of N_{e} individuals there are 2N_{e} nuclear allele copies. Cytoplasmic genomes are inherited from the single haplotype present in one parent, assuming complete homoplasmy and strict uniparental transmission. In hermaphrodites there are N_{e} haplotype copies, since all individuals are capable of transmitting an organelle genome to progeny because each individual can act as an effective maternal or paternal parent. In a dioecious species, only one sex or one-half of the population (assuming a 1:1 breeding sex ratio) is able to transmit an organelle genome to progeny, giving N_{e}/2 haplotype copies. Thus, the effective population sizes of nuclear and organelle genomes will differ due to their mode of inheritance. Under these assumptions, the effective population size of an organelle genome is expected to be one-half that of the nuclear genome for hermaphrodites and one-quarter that of the nuclear genome for dioecious organisms (Birky et al. 1983). The inverses of these effective population sizes are the probabilities of autozygosity for nuclear alleles or organelle haplotypes [e.g., 1/(2N_{e}), 1/N_{e}, and 2/N_{e}] under an infinite alleles model (Kimura and Crow 1964). These expectations can be readily modified to take into account factors such as unequal breeding sex ratios under dioecy or unequal probabilities of reproduction as an effective maternal or paternal parent for hermaphrodites.
We now review Wright’s (1931, 1951; see Crow and Kimura 1970; Crow 1986) well-known infinite island model, which assumes equilibrium between drift and migration for an effectively infinite set of subpopulations, each of size N_{e}, with a fraction m of each subpopulation’s gene copies exchanged at random with the rest of the population every generation. Using this model we can approximately relate the degree of differentiation among subpopulations to a function of the effective population size and the amount of migration. This familiar approximation is based on the increase in autozygosity in finite subpopulations that also exchange alleles with the total population (migration fraction m),
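The display equations that followed this sentence are not present in this copy of the text. As a hedged sketch only, the standard island-model approximation F_{ST} ≈ 1/(cN_{e}m + 1), with c = 4 for diploid nuclear alleles, c = 2 for uniparentally inherited organelles in hermaphrodites, and c = 1 in dioecious species, agrees with the autozygosity probabilities given above and with the numerical expectations quoted later in the article. The function name and the coefficient table are illustrative assumptions, not the authors' notation.

```python
from fractions import Fraction

# Assumed coefficients c in F_ST ~ 1/(c*Ne*m + 1). They follow from the
# autozygosity probabilities 1/(2Ne), 1/Ne and 2/Ne quoted in the text.
COEFFICIENT = {
    "nuclear": 4,                  # diploid, biparentally inherited
    "organelle_hermaphrodite": 2,  # haploid, every individual can transmit
    "organelle_dioecious": 1,      # haploid, only one sex transmits
}


def equilibrium_fst(genome, ne_m):
    """Approximate equilibrium F_ST for a total migration of Ne*m (assumed form)."""
    return 1 / (COEFFICIENT[genome] * Fraction(ne_m) + 1)


if __name__ == "__main__":
    for genome in COEFFICIENT:
        print(genome, float(equilibrium_fst(genome, 1)))
```

Exact fractions are used so that the relative sizes of the three expectations can be compared without rounding.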
Expressions (3–5) predict that at equilibrium organelle loci should exhibit more fixation among populations than nuclear loci given an identical rate of migration (Petit et al. 1993; Ennos 1994; Hu and Ennos 1997, 1999). This increase in fixation among populations is due to the increase in genetic drift caused by the lower effective size of haploid genomes and their mode of transmission. At equilibrium with strict uniparental inheritance of organelles and identical total migration rates, in the limit as N_{e}m becomes large we have (approximately) F_{ST(n)} = F_{ST(oh)}/2 = F_{ST(od)}/4. However, as we describe in the next section (which addresses arbitrary values of N_{e}m), the total migration rates for nuclear and organelle alleles are not necessarily identical since modes of dispersal may differ for these genomes.
Components of gene flow in seed plants: In seed plants, population structure due to gene migration is determined by the dispersal of nuclear alleles and organelle haplotypes in pollen and seeds (Crawford 1984; Petit et al. 1993; Ennos 1994). These components of gene migration are diagrammed in Figure 1 for the case of strict uniparental inheritance of organelle genomes, with the possibility of outcrossing at rate t and selfing at rate 1 - t. With strict outcrossing, nuclear alleles and paternally inherited organelle haplotypes disperse through both pollen and seed, while maternally inherited organelle haplotypes disperse only through seed. In contrast, with complete self-fertilization, all genomes are dispersed only through seeds. A mixed mating system will therefore reduce the average dispersal of nuclear alleles and paternally inherited organelle haplotypes via pollen (compared to complete outcrossing) by a proportion equal to the selfing rate, 1 - t. Note that selfing does not alter the dispersal of strictly maternally inherited organelle haplotypes, since the sire makes no contribution by definition. Summing the individual components of gene migration (m) for nuclear, paternal, and maternal transmission thus leads to the following expressions for the nuclear, paternal organelle, and maternal organelle migration rates:
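The three expressions announced here are missing from this copy. The sketch below encodes one reading of the verbal description: pollen contributes only in outcross matings (weight t), it carries half of a seed's nuclear alleles, and maternal organelles travel only in seed. This reading reproduces the totals of (3/2)m and (5/4)m used in the worked examples later in the section, but the function and key names are assumptions made for illustration.

```python
from fractions import Fraction


def migration_components(m_pollen, m_seed, t):
    """Total migration rates under strict uniparental inheritance (assumed form).

    t is the outcrossing rate; pollen moves genes only in outcross matings.
    """
    m_pollen, m_seed, t = Fraction(m_pollen), Fraction(m_seed), Fraction(t)
    return {
        # pollen carries one of the two nuclear alleles found in a seed
        "nuclear": t * m_pollen / 2 + m_seed,
        # paternal organelles move first in pollen and then in seed
        "paternal": t * m_pollen + m_seed,
        # maternal organelles move only in seed
        "maternal": m_seed,
    }


if __name__ == "__main__":
    print(migration_components(1, 1, 1))
    print(migration_components(1, 1, Fraction(1, 2)))
```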
Organelle genomes may also be transmitted to zygotes by either a maternal or a paternal parent in a uniparental fashion. If α is the rate of maternal transmission and β the rate of paternal transmission for an organelle genome, then α + β = 1 (Takahata and Palumbi 1985; Birky et al. 1989). The components of pollen and seed gene migration with biparental organelle inheritance are diagrammed in Figure 2. With biparental transmission the selfing rate does influence the dispersal of organelle haplotypes (Petit et al. 1993). Summing the migration terms in Figure 2, and noting that the α(1 - t)m_{seed} and β(1 - t)m_{seed} terms sum to (1 - t)m_{seed}, yields an expression for the components of gene migration for an organelle haplotype with biparental transmission:
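The resulting expression is missing from this copy, and Figure 2 is not reproduced here. As a hedged sketch, the code below assumes four migration terms (outcrossed or selfed, maternally or paternally transmitted) and checks that they collapse to m_{seed} + βt·m_{pollen}. The two selfed terms are the ones named in the text; the two outcrossed terms are an assumption chosen to match the value of (21/20)m used in the second worked example.

```python
from fractions import Fraction


def organelle_migration(m_pollen, m_seed, t, beta):
    """Total organelle migration with paternal transmission rate beta (assumed form)."""
    m_pollen, m_seed = Fraction(m_pollen), Fraction(m_seed)
    t, beta = Fraction(t), Fraction(beta)
    alpha = 1 - beta
    terms = [
        alpha * t * m_seed,               # outcrossed, maternally transmitted
        beta * t * (m_pollen + m_seed),   # outcrossed, paternally transmitted
        alpha * (1 - t) * m_seed,         # selfed, maternally transmitted
        beta * (1 - t) * m_seed,          # selfed, paternally transmitted
    ]
    total = sum(terms)
    # The four terms reduce to m_seed + beta * t * m_pollen.
    assert total == m_seed + beta * t * m_pollen
    return total


if __name__ == "__main__":
    print(organelle_migration(1, 1, Fraction(1, 2), Fraction(1, 10)))
```

Setting beta to 0 recovers strict maternal inheritance, for which only seed migration remains.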
Null and alternative hypotheses for rates of pollen and seed gene flow: As described above, our goal is to establish an analytical framework in which to compare different degrees of population differentiation (measured as F_{ST}) for nuclear and organelle genomes and to infer possible differences in migration rates of pollen and seed. Such comparisons require clear null hypotheses. In particular, we desire a test of the null hypothesis that pollen and seed gene flow rates are identical. Since equilibrium F_{ST}’s for nuclear and organelle genomes are expected to differ due to differences in the effective size of each genome (Equations 3, 4, 5), showing that F_{ST} estimates are significantly different does not demonstrate that migration rates of pollen and seed differ. To establish a testable null hypothesis, it is necessary to account for both the effective size of allele populations for different genomes and the contributions of seed and pollen dispersal to fixation among populations at equilibrium.
The most basic null hypothesis is that seed and pollen gene flow rates are equal. If this is the case, then the m_{seed} and m_{pollen} terms in the expressions for components of gene flow (Equations 6, 7, 8, 9) are equal so that the subscripts can be dropped. This provides a means of establishing null hypotheses for differences in F_{ST} estimated for different genomes. Specifically, suppose that an organism has genome 1 and genome 2, whose F_{ST} values approximately satisfy F_{ST1} = 1/(a_{1}N_{e}m + 1) and F_{ST2} = 1/(a_{2}N_{e}m + 1). Solving for N_{e}m in terms of F_{ST1} and inserting the result into the formula for F_{ST2} yields the equation
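The equation that followed is not present in this copy. The algebra described in the sentence can be carried out directly from the two stated formulas; the result below is a reconstruction from those formulas rather than a verbatim copy of the published equation.

```latex
\begin{align*}
N_e m &= \frac{1}{a_1}\left(\frac{1}{F_{ST1}} - 1\right)
       = \frac{1 - F_{ST1}}{a_1 F_{ST1}}, \\
F_{ST2} &= \frac{1}{a_2 N_e m + 1}
         = \frac{a_1 F_{ST1}}{a_2\,(1 - F_{ST1}) + a_1 F_{ST1}}
         = \frac{a_1 F_{ST1}}{a_2 + (a_1 - a_2)\,F_{ST1}}.
\end{align*}
```

When a_{1} = a_{2} the expression reduces to F_{ST2} = F_{ST1}, as it should for two genomes with the same effective size and the same total migration rate.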
First, consider an outcrossing (t = 1) hermaphrodite with strict maternal organelle transmission (α = 1). The components of gene migration are then m_{nuclear} = (1/2)m + m or (3/2)m for nuclear alleles (from Equation 6) and m_{maternal} = m for organelle alleles (from Equation 8). If we then substitute these migration values into our equilibrium expectations for fixation among populations we obtain for nuclear alleles (Equation 3)
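The substituted expressions are missing from this copy. A minimal sketch of the substitution follows, assuming the coefficient forms 1/(4N_{e}m_{nuclear} + 1) for nuclear alleles and 1/(2N_{e}m_{maternal} + 1) for a hermaphrodite's organelle haplotypes; the function name is hypothetical.

```python
from fractions import Fraction


def null_fst_outcrossing_hermaphrodite(ne_m):
    """Expected (nuclear, organelle) F_ST when m_pollen = m_seed = m, t = 1, alpha = 1."""
    ne_m = Fraction(ne_m)
    nuclear_migration = Fraction(3, 2) * ne_m   # Ne * ((1/2)m + m)
    maternal_migration = ne_m                   # Ne * m, seed only
    fst_nuclear = 1 / (4 * nuclear_migration + 1)      # 1/(6*Ne*m + 1)
    fst_organelle = 1 / (2 * maternal_migration + 1)   # 1/(2*Ne*m + 1)
    return fst_nuclear, fst_organelle


if __name__ == "__main__":
    print(null_fst_outcrossing_hermaphrodite(1))
```

With N_{e}m = 1 this gives 1/7 for nuclear loci and 1/3 for organelle loci, the infinite island expectations quoted for the simulations later in the article.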
As a second example, again assuming that pollen and seed dispersal rates are equal, we derive the appropriate null hypotheses for the case when organelle genomes have some degree of biparental inheritance. Consider a dioecious species with a mixed mating system (t = 0.5) and organelle transmission that is 90% maternal (α = 0.9) and 10% paternal (β = 0.1). Equation 9 is used to determine the components of organelle allele migration in pollen and seed. In this case m_{organelle} = (21/20)m [note that this migration rate is greater than that for strict maternal inheritance by a difference of (1/2)βm because of dispersal of β organelle genomes in pollen, but only one-half of the paternally transmitted organelles actually disperse in pollen since the selfing rate is 1 - t, or 1/2]. Next we observe that total migration of nuclear alleles is given by Equation 6. In this case m_{nuclear} = (5/4)m, slightly less than would be the case under complete outcrossing. This is reasonable since selfing increases migration in seed and reduces migration in pollen.
Substituting the total migration rate for organelles into Equation 5 gives
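The expression that followed is missing from this copy. The sketch below performs the substitution for the second example under two assumptions: the dioecious organelle expectation has the form 1/(N_{e}m_{organelle} + 1), and the nuclear expectation has the form 1/(4N_{e}m_{nuclear} + 1). The default parameter values are those of the example; the function name is hypothetical.

```python
from fractions import Fraction


def null_fst_mixed_dioecious(ne_m, t=Fraction(1, 2), beta=Fraction(1, 10)):
    """Expected (nuclear, organelle) F_ST when m_pollen = m_seed = m (assumed forms)."""
    ne_m, t, beta = Fraction(ne_m), Fraction(t), Fraction(beta)
    organelle_migration = (1 + beta * t) * ne_m   # (21/20)*Ne*m for the defaults
    nuclear_migration = (1 + t / 2) * ne_m        # (5/4)*Ne*m for the defaults
    fst_organelle = 1 / (organelle_migration + 1)
    fst_nuclear = 1 / (4 * nuclear_migration + 1)  # 1/(5*Ne*m + 1) for the defaults
    return fst_nuclear, fst_organelle


if __name__ == "__main__":
    nuclear, organelle = null_fst_mixed_dioecious(1)
    print(float(nuclear), float(organelle))
```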
In general, when seed and pollen dispersal rates are equal, the total migration rate of organelle alleles for all levels of biparental transmission will be at least m. This is because all alleles must be dispersed in seed with any level of outcrossing. The total migration of organelle alleles will thus be increased by the degree of paternal transmission for outcross matings, since dispersal in these matings is through both pollen and seed.
The examples presented above take the perspective of establishing null hypotheses for relative levels of nuclear and organelle subdivision when pollen and seed migration rates are equal. However, it is a simple extension to incorporate into a null hypothesis any available data on rates of pollen and seed gene flow. If, for example, pollen and seed migration rates as well as the mating system have been estimated, it will be possible to formulate a null hypothesis for the degree of fixation among populations for organelle and nuclear alleles at drift-migration equilibrium. It is also possible to use observed levels of fixation among populations for organelle and nuclear alleles to estimate rates of pollen and seed migration that would be expected to produce such a pattern. For all of these cases, however, it is important to bear in mind that the hypotheses generated are dependent on the assumptions of Wright’s island model, namely that populations are near equilibrium and that drift and migration are the dominant processes influencing population subdivision.
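As an illustration of working backward from observed fixation indices, the sketch below solves for N_{e}m_{seed} and N_{e}m_{pollen} in a hermaphrodite with strict maternal organelle inheritance. It assumes the forms F_{ST(o)} ≈ 1/(2N_{e}m_{seed} + 1) and F_{ST(n)} ≈ 1/(4N_{e}(t·m_{pollen}/2 + m_{seed}) + 1); the function name and argument names are hypothetical.

```python
def seed_and_pollen_ne_m(fst_nuclear, fst_organelle, t=1.0):
    """Back out (Ne*m_seed, Ne*m_pollen) from two fixation indices (assumed forms).

    Applies to a hermaphrodite with maternal organelles and outcrossing rate t > 0.
    """
    ne_m_seed = (1.0 / fst_organelle - 1.0) / 2.0
    # Total nuclear migration, Ne * (t * m_pollen / 2 + m_seed)
    nuclear_total = (1.0 / fst_nuclear - 1.0) / 4.0
    ne_m_pollen = 2.0 * (nuclear_total - ne_m_seed) / t
    return ne_m_seed, ne_m_pollen


if __name__ == "__main__":
    print(seed_and_pollen_ne_m(1 / 7, 1 / 3))
```

A negative value for the pollen component signals that the two indices are inconsistent with the model, for example because of sampling error or departure from equilibrium, so point estimates of this kind are best read alongside their confidence intervals.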
At this point we switch from considering only theoretical expectations, which are exact, to potential comparisons of F_{ST(n)} and F_{ST(o)} under various regimes of pollen and seed gene flow based on empirical estimates of fixation indices influenced by sampling variance.
Methods for comparing nuclear and organelle gene flow: Before presenting an application and power analysis of the null hypotheses we have developed in the sections above, we pause to compare our approach to that of Ennos (1994; see also Hu and Ennos 1997). Using Ennos’s symbols, let x be the proportion of pollen from within a population that sires ovules within the same population and 1 - x the proportion of pollen due to gene flow. Similarly, let y be the proportion of seeds from within a population that establish within the same population and 1 - y the proportion due to gene flow. The ratio (1 - x)/(1 - y) then provides an estimate of the relative contributions of seed and pollen gene flow for hermaphrodites with strict maternal inheritance of organelles. To estimate this ratio from fixation indices, Ennos (1994) showed that
Ennos’s statistic is a point estimate of the ratio of pollen to seed gene flow estimated from nuclear and organelle fixation indices. To test hypotheses about rates of seed and pollen gene flow or to compare ratio estimates from empirical data, it is necessary to estimate the sampling variance of the statistic(s) employed. Because Ennos’s statistic is a ratio and has a denominator with a strongly nonlinear relationship to the organelle fixation index, its sampling variance is problematic to estimate and interpret. A simple calculation indicates the reason for this difficulty: If x and y are known to within error bounds Δx and Δy, then the error Δf of the linear approximation to (1 - x)/(1 - y) satisfies, to leading order, the equation
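The equation itself is missing from this copy. Standard first-order error propagation for the ratio gives the bound below, offered as a plausible reconstruction under the assumption that x < 1 and y < 1; the published expression may be written differently.

```latex
\[
f(x, y) = \frac{1 - x}{1 - y}, \qquad
|\Delta f| \;\le\;
\left|\frac{\partial f}{\partial x}\right| |\Delta x|
+ \left|\frac{\partial f}{\partial y}\right| |\Delta y|
= \frac{|\Delta x|}{1 - y} + \frac{(1 - x)\,|\Delta y|}{(1 - y)^{2}}
\quad \text{(to first order).}
\]
```

The second term grows as 1/(1 - y)^{2}, so when seed gene flow is rare (y close to 1) even a small error in y produces a large error in the ratio.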
We stress that whether one compares two F_{ST} estimates by calculating their ratio, by comparing confidence intervals (as we do below), or by any other means, it is the error in estimating each individually that ultimately determines the power of comparison. We have chosen to create separate confidence intervals for F_{ST(n)} and F_{ST(o)} on the grounds that this approach makes it easier to interpret the sources of estimation error, no additional estimation error is introduced by linear approximation of a markedly nonlinear function, and the construction of null hypotheses is more obvious. In some cases, it could be convenient to compute a single statistic that summarizes the comparison between the two F_{ST} estimates. One could use Ennos’s statistic for this purpose, although for the reasons given above, we argue that a difference of scaled F_{ST} estimates is preferable. This is because the sampling variance of the difference is simply the sum of the individual sampling variances and therefore is more straightforward to interpret than the sampling variance of a ratio. It seems to us, however, that use of any single statistic has the potential to obscure patterns clearly discernible in the separate F_{ST} estimates.
FINITE ISLAND MODEL SIMULATIONS
Using simulations of drift and gene flow in a finite island model, we estimated nuclear and organelle fixation indices and their 95% confidence intervals across a range of total numbers of subpopulations and numbers of subpopulations sampled. The goal of these simulations was to examine the power to distinguish differences in the values of fixation indices for nuclear and organelle loci in the finite island model under equal and unequal rates of pollen and seed gene flow.
We conducted simulations of drift and migration for monoecious individuals with arbitrary rates of paternal gamete (pollen) and progeny (seed) migration, outcrossing, rates of maternal and paternal transmission of organelles, effective size of subpopulations, and total numbers of subpopulations. Nuclear genotypes were diploid, all loci had two alleles, and generations were nonoverlapping. All subpopulations were equal and constant in size and mating was random within subpopulations. Allele frequency change caused by genetic drift was modeled with Kimura’s (1980) “pseudo-sampling variable” (PSV) method. Migration was based on individual zygote or gamete movement where the product of the migration rate and the effective size (N_{e}m) determined the number of gametes or progeny leaving or entering each population every generation. When N_{e}m was not a whole number, the remainder determined the chances of an individual migration event in each generation. Genotype-specific immigration rates were determined by the average genotype frequency among all subpopulations, equivalent to assuming that all subpopulations have equal rates of gene flow with all other subpopulations. In each generation the order of events was genetic drift, gamete dispersal, mating, and progeny dispersal.
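The handling of non-integer N_{e}m described above can be sketched as follows. This is not the authors' Matlab code; it is a minimal Python illustration of one way to implement the rule, in which the whole part of N_{e}m always migrates and the remainder is treated as the probability of one additional migrant. The function name is hypothetical.

```python
import math
import random


def migrants_this_generation(ne_m, rng=random):
    """Number of whole migrants exchanged by one subpopulation in one generation.

    The integer part of Ne*m always migrates; the fractional remainder is the
    probability of one additional migrant, so the long-run mean equals Ne*m.
    """
    whole = math.floor(ne_m)
    remainder = ne_m - whole
    return whole + (1 if rng.random() < remainder else 0)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [migrants_this_generation(0.25, rng) for _ in range(10_000)]
    print(sum(draws) / len(draws))  # fluctuates around 0.25
```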
The fixation indices (Θ and G_{ST}) were calculated in our simulations by two methods: analysis of variance (Weir 1996) and gene diversity (Nei 1973, 1986; Pons and Petit 1995). These methods are expected to provide very similar estimates from the same data for loci with two alleles (Chakraborty and Danker-Hopfe 1991). Confidence intervals were estimated by 1000 bootstrap replicates over sampled populations with replacement (Efron and Tibshirani 1993; see also Weir 1996). Simulations were written in Matlab (6.0.0.88, Release 12) and the source code is available at http://www.georgetown.edu/faculty/hamiltm1/.
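The bootstrap over populations can be sketched in a few lines. The example below is an illustrative Python version, not the authors' Matlab program: it uses an uncorrected gene-diversity G_{ST} for a single two-allele locus with equally weighted subpopulations and a simple percentile interval. The sample-size corrections of Nei (1986) and Pons and Petit (1995) are deliberately omitted, and returning 0 when the resampled populations show no variation is a convention adopted here.

```python
import random


def gst_two_alleles(freqs):
    """Uncorrected G_ST from subpopulation frequencies of one allele (two-allele locus)."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    if h_total == 0.0:  # no variation among the sampled populations
        return 0.0
    # max() guards against tiny negative values caused by rounding
    return max(0.0, (h_total - h_within) / h_total)


def bootstrap_ci(freqs, replicates=1000, level=0.95, seed=None):
    """Percentile interval obtained by resampling populations with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        gst_two_alleles([rng.choice(freqs) for _ in freqs])
        for _ in range(replicates)
    )
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * (replicates - 1))]
    upper = stats[int((1.0 - tail) * (replicates - 1))]
    return lower, upper


if __name__ == "__main__":
    sampled = [0.10, 0.35, 0.50, 0.70, 0.90, 0.25, 0.60, 0.45, 0.80, 0.15]
    print(gst_two_alleles(sampled), bootstrap_ci(sampled, seed=1))
```

Because whole populations are the resampling unit, the width of the interval reflects the number of populations sampled rather than the number of individuals per population.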
Before presenting the simulation results in detail, we note that two types of sampling variance may cause allele frequencies in finite numbers of subpopulations to evolve differently than expected under the infinite island model (Nei et al. 1977; Takahata 1983; Varvio et al. 1986). First, finite numbers of subpopulations cause additional genetic drift, since the allele frequencies in the migrant pool are drawn from a finite sample. Second, in most cases when actual measurements of genotype frequencies are taken, a small number of populations are sampled relative to the total (finite) number of subpopulations. Thus, empirical sampling of populations introduces further variance into estimates of fixation indices. The combination of these two types of sampling variance will dictate the confidence intervals associated with estimates of population subdivision, as illustrated in our simulations. In the simulations presented here, fixation indices (Θ and G_{ST}) were estimated using all individuals (i.e., a complete census) in randomly sampled subpopulations.
Although the use of a complete census might obscure a potentially significant source of sampling error (i.e., error due to within-population sampling of individuals), in fact it is well known in the sample survey literature (e.g., Kish 1965) that sampling variance is determined predominantly by the sampling unit that is most limited. We have chosen to focus on the common situation in which the number of populations sampled is much smaller than the number of individuals sampled from each population. In this case, both basic theory and simulations show that within-population sampling contributes relatively little to overall sampling error, so that complete-census simulations can be expected to provide a realistic picture of the expected estimation error even for data collected from within-population samples.
SIMULATION RESULTS
Although not completely realistic in all aspects, the finite island model is a better approximation of actual organisms than the infinite island model since biological populations are clearly not infinite in number. The key question addressed in our simulations is, “Can the finite island model be used to predict levels of population structure under drift and migration such that it is possible to distinguish cases where rates of pollen and seed gene flow are roughly equal or where rates are very different?” To answer this question we ran simulations with a variety of parameter values to estimate F_{ST(n)} and F_{ST(o)}. We first discuss simulation results for equal rates of pollen and seed gene flow.
The simulation results allowed us to compare levels of population subdivision in the finite and infinite island models. In general, the level of population subdivision for both the nuclear and organelle genomes in the finite island model closely matched expectations from the infinite island model. However, the finite case does differ in two major respects from the infinite case. First, fixation index estimates showed more random variation across generations with decreasing total numbers of populations. This result is expected since a finite number of subpopulations causes random variance in total population allele frequency, which can be thought of as total population genetic drift, and increases as the total number of subpopulations decreases (Nei et al. 1977). Second, fixation indices have fairly wide confidence intervals when estimated from a random sample of populations that is small, as is the case in empirical studies (see Weir 1996).
Figure 3 shows simulation results for 10 populations sampled from a total of 200 populations with equal and moderate rates of pollen and seed gene flow (N_{e}m = 4.0). Simulation results in Figure 4 are for the same conditions except that migration rates are very low (N_{e}m = 0.02). In both cases, random variation in fixation index estimates across generations from the small total number of subpopulations is evident and the 95% confidence intervals for the nuclear and organelle estimates broadly overlap. In these examples the organelle genome showed higher levels of population subdivision (as expected due to the smaller effective size of a uniparentally inherited, haploid genome), although the nuclear and organelle fixation indices are statistically indistinguishable at the 95% confidence level.
These examples are representative of our simulations for up to 10,000 total subpopulations under equal rates of pollen and seed gene flow. Sampling more subpopulations decreased the width of confidence intervals, while increasing the number of subpopulations in the finite island model reduced generation-to-generation variation in fixation index estimates. As expected, the two methods of fixation index estimation (Θ or G_{ST}) gave very similar results with two alleles, so only results for Θ are presented.
We also explored organelle and nuclear population differentiation for 100, 200, 500, and 1000 total subpopulations in the finite island model with sampling sizes of 5, 10, 25, and 50 subpopulations. Simulation results for mean Θ and the 2.5 and 97.5 percentiles from 1000 independent subpopulation samples for each set of conditions are given in Table 1.
These simulations highlight two patterns in the finite island model. First, mean estimates of Θ were relatively insensitive to variation in the total number of populations in the island model. In Table 1 the infinite island model expectations are F_{ST} = 1/7 for nuclear loci and F_{ST} = 1/3 for organelle loci. All levels of total subpopulations in the finite island model produced mean estimates of Θ that were near infinite island model expectations. This demonstrates that infinite island model expectations for levels of organelle and nuclear population differentiation should not be greatly biased by finite numbers of populations if there are at least 100 populations experiencing gene flow. In practice, the sampling variance associated with empirical estimates of fixation indices will likely be much greater than systematic bias in fixation indices due to finite numbers of subpopulations. For example, compare the 95% confidence intervals for fixation indices in Figures 3 and 4 with the difference between the infinite island model expected and finite island model simulation mean of Θ (Table 1).
The second pattern apparent in Table 1 is the relationship between the number of populations sampled to estimate fixation indices and the distribution of those estimates. The confidence intervals for nuclear and organelle estimates of Θ were independent of the total number of island subpopulations, but clearly widened as the number of sampled subpopulations decreased. It may, therefore, be difficult to statistically distinguish levels of nuclear and organelle population differentiation when small numbers of populations are sampled. For example, the 95% confidence intervals for organelle and nuclear Θ overlapped when 5 or 10 subpopulations were sampled across all levels of total subpopulations. However, when 25 or 50 subpopulations were sampled the distributions of organelle and nuclear Θ were nonoverlapping in all cases.
Sampling multiple loci reduced the width of fixation index confidence intervals. Table 2 shows simulation results for mean Θ and the 2.5 and 97.5 percentiles from 1000 independent samples of 10 subpopulations for 1, 5, and 10 nuclear loci drawn from 100, 200, 500, and 1000 total subpopulations. In Table 2, as in Table 1, the infinite island model expectations are F_{ST} = 1/7 for nuclear loci. The confidence intervals for nuclear estimates of Θ were relatively independent of the total number of subpopulations, but decreased as the number of loci sampled increased.
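The effect of adding loci can be sketched in the same spirit. The example below is an illustration under simplifying assumptions, not a reproduction of Table 2: loci are haploid, biallelic, and independent; the multilocus estimate is a ratio of sums of Wright's variance components rather than Θ; and all names and parameter values are hypothetical.

```python
import numpy as np

def simulate_loci(n_loci=10, n_demes=100, n_e=50, m=0.02, generations=200, seed=1):
    """Independent haploid biallelic loci in a finite island model.
    Returns an array of allele frequencies with shape (n_loci, n_demes)."""
    rng = np.random.default_rng(seed)
    p = np.full((n_loci, n_demes), 0.5)
    for _ in range(generations):
        # Migration toward each locus's global mean, then binomial drift.
        p_mig = (1 - m) * p + m * p.mean(axis=1, keepdims=True)
        p = rng.binomial(n_e, p_mig) / n_e
    return p

def multilocus_fst(p):
    """Ratio-of-sums F_ST across loci (rows) for the demes given (columns)."""
    p_bar = p.mean(axis=1)
    num = p.var(axis=1).sum()
    den = (p_bar * (1 - p_bar)).sum()
    return num / den if den > 0 else np.nan

def interval_width(p, n_loci, k=10, n_rep=1000, seed=2):
    """Width of the 2.5-97.5 percentile interval when k demes and n_loci loci are sampled."""
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(n_rep):
        demes = rng.choice(p.shape[1], size=k, replace=False)
        loci = rng.choice(p.shape[0], size=n_loci, replace=False)
        est.append(multilocus_fst(p[np.ix_(loci, demes)]))
    lo, hi = np.nanpercentile(est, [2.5, 97.5])
    return hi - lo

p = simulate_loci()
widths = {n: interval_width(p, n) for n in (1, 5, 10)}
for n, w in widths.items():
    print(f"{n:2d} loci: interval width = {w:.3f}")
```

With 10 demes sampled in every replicate, the interval based on 10 loci is expected to be narrower than the single-locus interval.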
It will be increasingly difficult to distinguish nuclear and organelle fixation estimates under equal rates of pollen and seed gene flow as N_{e}m approaches zero (or as F_{ST} approaches one). For any level of N_{e}m, however, sampling more subpopulations and loci will reduce the sampling variance of fixation index estimates and give greater power to distinguish nuclear and organelle population differentiation. Finally, we note that, using simulation, it should also be possible to provide a power analysis of any empirical comparison of nuclear and organelle fixation indices. Thus, failure to reject the null hypothesis of equal rates of pollen and seed gene flow can be interpreted in the context of the number of subpopulations and loci sampled.
AN EXAMPLE
As an empirical example, we tested the hypothesis that seed and pollen gene flow are equal using organelle haplotype and nuclear genotype data from the hermaphroditic tropical tree Corythophora alta. Hamilton (1999) collected chloroplast haplotype data for 162 individuals in seven populations and estimated Θ at 0.962 with a 95% confidence interval of 0.835–1.0 (here Θ and the confidence interval based on 1000 bootstrap population samples with replacement were estimated with a Matlab program as in the simulations). Data for one nuclear microsatellite locus from these same individuals and populations (locus CTC 403 no. 12, n = 147; M. B. Hamilton, unpublished data) provided an estimate of R_{ST} = 0.059 with a 95% confidence interval of 0.009–0.182 (estimated with RSTCALC 2.2, Goodman 1997; see Slatkin 1995). On the basis of these fixation indices, organelle N_{e}m = 0.0198 and nuclear N_{e}m = 4.0.
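The conversion from fixation index to N_{e}m can be checked with a short sketch. It assumes the standard infinite island model approximations F_{ST} ≈ 1/(1 + 4N_{e}m) for a diploid nuclear locus and F_{ST} ≈ 1/(1 + 2N_{e}m) for a haploid, uniparentally inherited organelle locus. These forms are an assumption here, adopted because they reproduce the values reported above; the function names are illustrative.

```python
def nem_nuclear(fst):
    """N_e*m implied by a nuclear fixation index, assuming F_ST = 1/(1 + 4*N_e*m)."""
    return (1.0 / fst - 1.0) / 4.0

def nem_organelle(fst):
    """N_e*m implied by an organelle fixation index, assuming F_ST = 1/(1 + 2*N_e*m)."""
    return (1.0 / fst - 1.0) / 2.0

# Point estimates reported for Corythophora alta
print(round(nem_organelle(0.962), 4))  # chloroplast Theta = 0.962
print(round(nem_nuclear(0.059), 1))    # nuclear R_ST = 0.059
```

Because F_{ST} enters through its reciprocal, small changes in a fixation index near zero or near one translate into large changes in the implied N_{e}m, so these point values should be read together with the confidence intervals.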
Empirical estimates of the outcrossing rate and degree of chloroplast maternal transmission are not yet available, so we assumed complete outcrossing and complete maternal organelle transmission (the implications of these assumptions are considered below). Under these assumptions in the infinite island model with equal rates of pollen and seed gene flow, applying Equation 10 to the observed 95% confidence interval (0.009, 0.182) for F_{ST(n)} yields an expected 95% confidence interval (C.I.) for F_{ST(oh)} of (0.027, 0.400) [with an expected point estimate of F_{ST(oh)} = 0.158 or 2.68 times the nuclear fixation estimate]. This expected interval does not overlap the observed chloroplast confidence interval of 0.835–1.0. Likewise, applying Equation 10 to the observed 95% confidence interval (0.835, 1.0) for F_{ST(oh)} yields an expected 95% C.I. for F_{ST(n)} of (0.628, 1.0) [with an expected point estimate of F_{ST(n)} = 0.869]. Again, this expected interval does not overlap the observed nuclear confidence interval of (0.009, 0.182).
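The scaling between nuclear and organelle intervals can be sketched as follows. The relation used, (1/F_{ST(n)} − 1) = 3(1/F_{ST(oh)} − 1), is an assumption rather than a transcription of Equation 10. It follows if nuclear alleles migrate at rate m_{s} + m_{p}/2 and maternally inherited organelles at rate m_{s}, with m_{p} = m_{s}, complete outcrossing, and equal N_{e}. It is consistent with the 1/7 and 1/3 expectations in Table 1 and reproduces the interval endpoints given above.

```python
def expected_organelle_fst(fst_n, ratio=3.0):
    """Organelle F_ST expected from a nuclear F_ST under equal pollen and seed
    gene flow (assumed relation: 1/F_n - 1 = ratio * (1/F_o - 1))."""
    return 1.0 / (1.0 + (1.0 / fst_n - 1.0) / ratio)

def expected_nuclear_fst(fst_o, ratio=3.0):
    """Inverse mapping: nuclear F_ST expected from an organelle F_ST."""
    return 1.0 / (1.0 + ratio * (1.0 / fst_o - 1.0))

# Observed nuclear interval and point estimate mapped to organelle expectations
for f in (0.009, 0.059, 0.182):
    print(f, round(expected_organelle_fst(f), 3))

# Observed organelle interval endpoints mapped to nuclear expectations
for f in (0.835, 1.0):
    print(f, round(expected_nuclear_fst(f), 3))
```

The `ratio` argument is exposed only so that other assumptions about the relative contributions of pollen and seed migration can be explored; the value 3 applies to the specific case described in the lead-in.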
Finite island model simulations provide an additional point of view on these results. Simulation results for a hermaphrodite with equal rates of pollen and seed gene flow, complete outcrossing, and strict maternal organelle transmission are shown in Figure 3 for N_{e}m = 4.0 and in Figure 4 for N_{e}m = 0.02. These simulations establish a null hypothesis for nuclear and organelle population structure in the finite island model with equal rates of pollen and seed gene flow. Clearly, the C. alta data do not fit these predictions. In particular, the simulations predict that the 95% confidence intervals for nuclear and organelle fixation indices are likely to overlap considerably, even though the organelle genome point estimates are often greater than the nuclear point estimates as expected. In the actual data, the 95% confidence intervals do not overlap and the nuclear R_{ST} is near zero, while the chloroplast F_{ST} is not significantly different from one. Further, as discussed in the preceding paragraph, the 95% confidence intervals for nuclear and organelle fixation indices still do not overlap when either is scaled by Equation 10 under the assumption of equal rates of pollen and seed gene flow.
Having rejected at the ≥90% level the null hypothesis that pollen and seed gene flow rates are equal, we used the finite island model simulations to explore expectations under various alternative hypotheses for rates of seed and pollen gene flow. Simulations were run using equal effective population sizes for both genomes but unequal pollen and seed migration rates. The gene flow rates for pollen and seed were based on migration values that would produce the observed levels of nuclear and chloroplast N_{e}m, given that the values of N_{e} for the two genomes must be equal.
The results of one simulation are given in Figure 5, which shows nuclear and organelle fixation among populations when pollen gene flow is 200 times greater than seed gene flow (m_{p} = 0.08, m_{s} = 0.0004, N_{e} = 50). With asymmetric rates of pollen and seed gene flow, the simulations predict that organelle F_{ST} reaches and remains at one due to low levels of gene flow in seeds, while the nuclear locus shows modest levels of population subdivision due to a higher rate of gene flow in pollen (Figure 5). The simulation results closely match the observed levels of population subdivision in C. alta. Indeed, the confidence intervals for the simulation and observed fixation index estimates broadly overlap for both the organelle and the nuclear genomes. Thus, the observed data are consistent with the expectation of asymmetric levels of pollen and seed gene flow in the finite island model.
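The parameter values quoted for Figure 5 can be related to the observed N_{e}m values with simple arithmetic. The sketch below assumes that organelle N_{e}m is N_{e}m_{s}, that nuclear N_{e}m is approximated by N_{e}m_{p} (an assumption made here because it matches the reported 4.0), and that the standard island model approximations 1/(1 + 2N_{e}m) and 1/(1 + 4N_{e}m) apply.

```python
n_e = 50        # effective size, taken as equal for both genomes
m_p = 0.08      # pollen migration rate
m_s = 0.0004    # seed migration rate

ratio = m_p / m_s                 # pollen : seed gene flow
nem_organelle = n_e * m_s         # organelle gene flow occurs through seed only
nem_nuclear = n_e * m_p           # assumption: nuclear gene flow dominated by pollen

# Equilibrium expectations under the assumed island model approximations
fst_organelle = 1.0 / (1.0 + 2.0 * nem_organelle)
fst_nuclear = 1.0 / (1.0 + 4.0 * nem_nuclear)

print(f"m_p/m_s = {ratio:.0f}")
print(f"organelle N_e*m = {nem_organelle:.2f}, expected F_ST = {fst_organelle:.3f}")
print(f"nuclear   N_e*m = {nem_nuclear:.1f}, expected F_ST = {fst_nuclear:.3f}")
```

Under these assumptions the expected values fall close to the observed point estimates of 0.962 for the chloroplast and 0.059 for the nuclear locus.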
In this example, we do not have estimates of either the outcrossing rate or the rate of maternal chloroplast transmission in C. alta. However, departures from strict maternal transmission of organelles would tend to decrease the equilibrium level of population structure in the chloroplast genome due to organelle gene flow in pollen. Self-fertilization with strict maternal inheritance would tend to increase nuclear population structure or, with some degree of biparental organelle inheritance, both nuclear and organelle population structure. Because the simulation fixation index confidence intervals are large, we cannot distinguish complete outcrossing from a small degree of selfing or strict maternal inheritance from limited paternal organelle transmission (results not shown).
A complete test of the hypothesis that C. alta nuclear and chloroplast population structure is consistent with asymmetric seed and pollen gene flow rates will require estimates of the outcrossing rate and mode of organelle transmission. However, a substantial departure from strict outcrossing and maternal transmission would be required to qualitatively modify the predicted levels of population subdivision under these asymmetric rates of gene flow, given the large confidence intervals about both the empirical estimates and the simulation expectations.
DISCUSSION
We have presented a general framework for deriving expectations for levels of population differentiation for nuclear and organelle genomes in the island model, taking into account the potentially complex components of gene flow in seed plants. This framework allows comparison of levels of population differentiation for different genomes and tests of the null hypothesis that rates of pollen and seed gene flow are equal. Such comparisons will help to elucidate the mechanisms that contribute to plant deme size and population structure and will be increasingly practical as joint estimates of nuclear and organelle population differentiation are collected in natural populations.
In the finite island model with limited migration, fixation indices are expected to increase toward the level predicted by the infinite island model and then, after many generations have elapsed, to decline toward zero (Nei et al. 1977; Varvio et al. 1986). Finite numbers of populations cause additional genetic drift, since the total population is finite and global allele frequencies eventually reach fixation or loss. Without the input of genetic variation by mutation the entire system of finite populations will approach fixation for the same allele. However, on relatively short timescales the finite island model maintains genetic variation even without mutation. Our results show that for a broad range of effective sizes and migration rates, fixation indices in the finite island model with 100 or more populations should be approximately equal to expectations from the infinite island model over at least several hundred generations.
We presented simulation results for a small number of total subpopulations to demonstrate that the infinite and finite island models provide very similar expectations for population differentiation. Even in the case of 5 populations sampled from 100 total subpopulations, the simulations indicate that we will be able to distinguish grossly unequal rates of pollen and seed gene flow. Our test will not be sensitive to minor differences in gene flow rates where small numbers of populations are sampled. Indeed, the stochastic nature of the drift process causes substantial random variation in allele frequencies among populations that results in widely variable estimates of the fixation index when a small number of populations are sampled. In sum, the results show that the power of empirical comparisons is largely dependent on the number of subpopulations sampled. Comparisons of nuclear and organelle fixation indices will be increasingly able to distinguish different rates of pollen and seed gene flow by sampling more populations, since this will reduce fixation index confidence intervals.
Another way to narrow confidence intervals for fixation index estimates, and thereby aid comparisons of nuclear and organelle gene flow, is to sample multiple loci. Multilocus data are easily obtainable for nuclear fixation index estimates since unlinked loci can be utilized. However, organelle genomes are single-linkage units, so that observed haplotypes constitute a single locus even if sampled from multiple regions of an organelle genome. Two-locus estimates of the fixation index for organelles may possibly be obtained in plants by sampling both chloroplast and mitochondrial genomes, if organelles have contrasting patterns of uniparental inheritance (e.g., Latta et al. 2001). Even if both organelles are inherited in an identical fashion leading to complete gametic disequilibrium (Schnabel and Asmussen 1989), chloroplast and mitochondrial genomes experience mutation independently and have the potential for different effective organelle numbers, leading to different rates of genetic drift (see Chesser 1998). The evolutionary dynamics of joint organelle genetic polymorphism under independent mutation and drift but with common gene flow (common uniparental inheritance) deserve future modeling. For animals (which have only one organelle genome), in contrast, two-locus organelle fixation index estimates are impossible to obtain.
When comparing nuclear and organelle fixation estimates, it is important to consider what might cause departures from the null hypothesis of equal rates of pollen and seed gene flow. The genetic model that allows us to equate a fixation index with a simple function of the effective population size and migration rate assumes that equilibrium between drift and migration has nearly been reached. As a result, samples drawn from populations not at equilibrium may cause the null hypothesis to be erroneously rejected. In other words, if equilibrium is violated, it is possible that a significant departure from the null hypothesis could be observed when the rates of pollen and seed gene flow are actually equal. However, G_{ST} has been shown to reach equilibrium rapidly under assumptions similar to those here (Crow and Aoki 1984; Petit et al. 1993), and in the simulations presented here Θ and G_{ST} reached approximate equilibrium in <100 generations. Since the effective size of organelle loci is less than that of nuclear loci, we expect the approach to fixation or loss by drift to be faster for organelle loci. Therefore, organelle loci should approach equilibrium more rapidly than nuclear loci if rates of pollen and seed gene flow are approximately equal. For populations where genetic structure is decreasing, Petit et al. (1993) showed that G_{ST} for paternally inherited loci may actually be less than G_{ST} for nuclear loci during approach to equilibrium. In their example, equilibrium organelle G_{ST} is higher than nuclear G_{ST}, but nonequilibrium conditions cause the organelle locus to approach its equilibrium level of subdivision more rapidly. Even under nonequilibrium conditions, we generally expect levels of population subdivision to be higher for organelle than for nuclear loci when population structure is increasing. However, the example of Petit et al. (1993) shows that levels of population subdivision could be lower for organelle loci during the approach to equilibrium if population structure is decreasing and organelle and nuclear migration rates are equivalent.
The island model assumes equal levels of gene flow among all populations, which may not be appropriate for many species. In particular, the island model does not incorporate isolation by distance, which is observed in many natural populations (Wright 1943, 1951; Slatkin 1985). To compare expected levels of nuclear and organelle subdivision under specific hypotheses for rates of pollen and seed gene flow, it may be possible to establish expectations for nuclear and organelle population differentiation under other models of migration (e.g., Hu and Ennos 1999).
Although it is sometimes stated that populations are not expected to be near a state of drift-migration equilibrium (e.g., Whitlock and McCauley 1999), there are very few direct data to support any general claim of either approximate equilibrium or nonequilibrium genetic structure in natural populations. In specific species there can be an a priori expectation that equilibrium should not be attained due to factors such as recent range expansion (e.g., Petit et al. 1997). The test proposed here could provide a means to examine fit-to-equilibrium population structure expectations if estimates of model parameters (rates of gene flow, mating system, and organelle transmission) are available. Given that the confidence intervals for fixation indices in the finite island model with realistic levels of population and locus sampling will be large, however, it may be difficult to reject the hypothesis of approximate equilibrium. Failure to reject the island model does not argue that the species being studied actually evolves under the exact conditions of the island model, only that the island model provides a reasonable approximation of genetic population structure within the limits of measurement variance. Equilibrium conditions and the island model itself are potentially testable under the framework described here.
Acknowledgments
We thank J. Braverman for discussion and an anonymous National Science Foundation (NSF) proposal reviewer for insight into the sampling variance of Ennos’s ratio estimator. The final manuscript was improved by the comments of two anonymous reviewers. This research was supported by NSF grant DEB-9983014 to M.B.H.
Footnotes

Communicating editor: D. Charlesworth
 Received December 18, 2001.
 Accepted September 5, 2002.
 Copyright © 2002 by the Genetics Society of America