Corresponding author: Maria E. Orive, Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045-2106. E-mail: orive@ukans.edu
Communicating editor: A. G. CLARK
| ABSTRACT |
|---|
A new maximum-likelihood method is developed for estimating unidirectional pollen and seed flow in mixed-mating plant populations from counts of joint nuclear-cytoplasmic genotypes. Data may include multiple unlinked nuclear markers with a single maternally or paternally inherited cytoplasmic marker, or with two cytoplasmic markers inherited through opposite parents, as in many conifer species. Migration rate estimates are based on fitting the equilibrium genotype frequencies under continent-island models of plant gene flow to the data. Detailed analysis of their equilibrium structures indicates when each of the three nuclear-cytoplasmic systems allows gene flow estimation and shows that, in general, it is easier to estimate seed than pollen migration. Three-locus nuclear-dicytoplasmic data only increase the conditions allowing seed migration estimates; however, the additional dicytonuclear disequilibria allow more accurate estimates of both forms of gene flow. Estimates and their confidence limits for simulated data sets confirm that two-locus data with paternal cytoplasmic inheritance provide better estimates than those with maternal inheritance, while three-locus dicytonuclear data with three modes of inheritance generally provide the most reliable estimates for both types of gene flow. Similar results are obtained for hybrid zones receiving pollen and seed flow from two source populations. An estimation program is available upon request.
The juxtaposition of biparental and uniparental inheritance gives joint nuclear-cytoplasmic data special utility in decomposing plant gene flow and estimating the rates of pollen and seed migration. Moreover, plants offer two major cytoplasmic genomes, mitochondrial (mt) and chloroplast (cp) DNA, and three different combinations of uniparental inheritance patterns for this purpose. In most plant species, mtDNA and cpDNA are both inherited maternally, but in some species the two organelles are both inherited paternally or through opposite parents.
We consider three classes of joint nuclear-cytoplasmic data. The first is two-locus cytonuclear data with a biparentally inherited nuclear marker and a maternally inherited cytoplasmic marker. This is the most common, but may be the least powerful, form of nuclear-cytoplasmic data for estimating pollen flow, since pollen then carries only the nuclear marker. The second is two-locus cytonuclear data with a paternally inherited cytoplasmic marker; although less common, this type of data can be much more informative, since pollen flow will now be reflected in both the nuclear and cytoplasmic markers. The third and final class is three-locus, nuclear-mitochondrial-chloroplast data combining biparental nuclear inheritance with both forms of uniparental cytoplasmic inheritance. Such dicytonuclear data with three distinct modes of inheritance are currently available from conifer species in the family Pinaceae, which inherit their mitochondria maternally and chloroplasts paternally.
The theoretical foundation for using these three types of data for the estimation of plant gene flow has been laid out in a series of continent-island migration models for mixed-mating populations. These have fully delimited the effects of unidirectional pollen and seed flow upon standard two-locus cytonuclear systems with a single maternally or paternally inherited cytoplasmic marker.
Although cytonuclear disequilibria are not necessary for estimation per se, permanent nonrandom associations nonetheless increase the chances that the equilibrium state will depend on, and thus allow estimates of, the rates of pollen and seed migration.
For both two-locus systems, the factors necessary for permanent cytonuclear associations can arise in many biological situations. Disequilibria will be present in migrant pollen or seeds, for example, when suitable selection or other nonrandomizing forces act on the source population(s), as well as when gene flow is contributed by multiple, genetically distinct sources, as might be expected in hybrid zones and other areas of admixture. Similarly, allele frequency differences between the two forms of gene flow can be caused by the presence of distinct sources for migrant pollen and seeds, by multiple pollen and seed sources whose relative contributions vary with the form of gene flow, or by selection or other evolutionary forces acting during the life cycle of the source population(s).
Finally, dicytonuclear systems with opposite uniparental inheritance of the two cytoplasmic markers (![]()
Here we develop a formal maximum-likelihood procedure for estimating both types of plant gene flow based on this three-part theoretical framework, using the general dicytonuclear migration model developed in a companion article.
We begin by deriving the general conditions under which each system can be used in this way for estimation, illustrating these through consideration of some important special cases. In addition, we use simulated data to test the relative utility of these three types of data for decomposing plant gene flow into pollen and seed migration. Finally, since cytonuclear and dicytonuclear data also offer a valuable tool for studying gene flow into a hybrid zone and other areas of admixture, we consider the estimation of pollen and seed flow into a hybrid zone from two source populations.
| CONDITIONS FOR GENE FLOW ESTIMATION |
|---|
We wish to determine the conditions under which the cytonuclear and dicytonuclear models of unidirectional pollen and seed migration developed previously can be used to estimate gene flow. Throughout, separate symbols denote the equilibrium nuclear allele frequency in the resident population and the nuclear allele frequencies in migrant pollen and seeds, respectively. The frequencies of the diallelic cytonuclear combinations carried by migrant pollen are denoted as in Table 4.
All three migration models apply to mixed-mating populations with nonoverlapping generations, in which self-fertilization occurs at rate s and random outcrossing at rate 1 - s. Each generation a fraction M of the outcrossed pollen is assumed to be derived from the source population(s), where 0 ≤ M < 1; the overall fraction of migrant pollen per generation is thus the product, M(1 - s). Similarly, each generation a fraction m of the total seed pool is assumed to be derived from the source population(s) and the remaining fraction 1 - m from the resident population, where 0 ≤ m < 1. The migration rates and mating system, as well as the genetic composition of the source(s), are assumed constant over time; this corresponds to a continent-island model with unidirectional migration (Fig 2). In addition, we assume that the markers are unaffected by selection, mutation, or random genetic drift in the resident population.
General considerations:
A basic consideration for any estimation procedure is that the number of parameters to be estimated not exceed the degrees of freedom (i.e., number of independent classes) in the data. This means that at most 5 parameters can be estimated from the 6 joint genotypes in two-locus cytonuclear data (Table 1), and at most 11 from the 12 joint genotypes in three-locus nuclear-mitochondrial-chloroplast (n-mtDNA-cpDNA) data (Table 2). Tests for goodness-of-fit between observed and expected frequencies will require a further degree of freedom, leaving at most 4 parameters that may be estimated from each two-locus system and 10 from the full, three-locus system. In each system, additional degrees of freedom and estimation power are possible by assaying multiple, unlinked nuclear markers.
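The parameter-count bookkeeping above can be sketched in a few lines of Python. This is only an illustration, not part of the authors' estimation program; the function name `max_estimable_parameters` is hypothetical, and the class counts assume one diallelic nuclear marker (three genotypes) crossed with two cytotypes per diallelic cytoplasmic marker.

```python
def max_estimable_parameters(n_cytoplasmic_markers, goodness_of_fit=False):
    """Upper bound on estimable parameters for one diallelic nuclear marker.

    Hypothetical helper: joint classes = 3 nuclear genotypes (AA, Aa, aa)
    times 2 cytotypes for each diallelic cytoplasmic marker.
    """
    n_classes = 3 * 2 ** n_cytoplasmic_markers
    degrees_of_freedom = n_classes - 1      # number of independent classes
    if goodness_of_fit:
        degrees_of_freedom -= 1             # one reserved for the fit test
    return degrees_of_freedom


print(max_estimable_parameters(1))                        # two-locus data
print(max_estimable_parameters(2))                        # three-locus data
print(max_estimable_parameters(1, goodness_of_fit=True))  # two-locus, with test
print(max_estimable_parameters(2, goodness_of_fit=True))  # three-locus, with test
```

The four printed bounds correspond to the 5, 11, 4, and 10 parameters quoted in the text.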
The estimation method developed here additionally requires that the equilibrium state of the relevant cytonuclear or dicytonuclear migration model depend on the parameter(s) being estimated. This requires that the parameter(s) independently affect the equilibrium value of at least one variable out of the set of independent variables used to describe the system. Sufficient sets of equilibrium values for each of the three possible types of data are provided by the theory developed previously. For a two-locus n-mtDNA system with maternal inheritance, a sufficient set includes the final frequencies of the nuclear alleles and nuclear heterozygotes, the final mitochondrial frequency, and the final allelic and genotypic cytonuclear disequilibria giving associations between mtDNA cytotypes and nuclear alleles (A/M) and heterozygotes (Aa/M). The corresponding set of variables for a two-locus n-cpDNA system with paternal inheritance includes the final frequencies of the nuclear alleles and nuclear heterozygotes, the equilibrium chloroplast frequency, and the final allelic (A/C) and heterozygote (Aa/C) cytonuclear disequilibria between the nuclear and chloroplast markers. The formulas for these are given in Appendix A as (A1-A8).
Last, for a full three-locus nuclear-dicytoplasmic system, a sufficient set of 11 variables includes the 8 variables listed above for the two cytonuclear systems, together with the final cytoplasmic disequilibrium between the two cytoplasmic markers (M/C) and the joint allelic and genotypic disequilibria for the three markers, measuring associations between the joint mitochondrial-chloroplast cytotypes and the nuclear alleles (A/MC) and heterozygotes (Aa/MC) [(A9-A11) in Appendix A]. An alternate parameterization of the three-locus system replaces the joint disequilibria (A/MC and Aa/MC) with the final three-way allelic (A/M/C) and heterozygote (Aa/M/C) disequilibria [(A12-A13) in Appendix A], which measure the associations among the three markers (nuclear, mitochondrial, and chloroplast) after taking into account all of the possible two-way associations between them (nuclear-mitochondrial, nuclear-chloroplast, and mitochondrial-chloroplast).
General estimation conditions:
Examination of these equilibrium formulas readily gives the general conditions under which each of the three systems can be used to estimate the rate of seed (m) and pollen (M) migration. These are given in Table 5 for the general case of mixed-mating populations. The necessary conditions include (i) nuclear polymorphism in migrant seeds (a nuclear allele frequency strictly between 0 and 1), (ii) unequal nuclear or chloroplast allele frequencies in migrant pollen and seeds, and (iii) the existence of nonzero two-locus disequilibria between chloroplast and nuclear alleles (A/C) in migrant pollen or seeds, or between chloroplast and mitochondrial alleles (M/C) in migrant seeds. Allele frequency differences between the two types of migrant pools (ii) could be caused by distinct sources for migrant pollen and seeds, or by selection or other evolutionary forces acting during the life cycle of the source population. Such intermigrant differences could also be caused by having unequal pollen and seed dispersal from multiple sources (e.g., source 1 may contribute a greater fraction of migrant pollen than migrant seeds, while source 2 contributes a greater fraction of migrant seeds than pollen). Migrant pollen or seeds would be expected to carry associations between chloroplast and nuclear or mitochondrial alleles (iii) when selection or other evolutionary forces act on the source population, as well as when the pollen or seeds are contributed by multiple, genetically distinct sources, as might be expected in hybrid zones and other areas of admixture.
The results given in Table 5 show that it is harder to estimate pollen flow than seed flow; whenever the data allow estimation of pollen migration, the rate of seed migration can also be estimated. This is not surprising, given that only two of the three markers experience movement via pollen migration, while all three move during seed migration. Moreover, an inspection of the conditions in Table 5 reveals that there are at least three situations when only seed migration can be estimated. Two of these arise when there are equal nuclear allele frequencies in the two migrant pools. In such cases, the rate of pollen flow cannot be estimated using cytonuclear data with maternal cytoplasmic inheritance (n-mtDNA), while estimation of the seed migration rate requires only that the migrant seeds be polymorphic at the nuclear marker (a nuclear allele frequency strictly between 0 and 1). The latter condition also allows estimates of the seed migration rate from cytonuclear data with paternal cytoplasmic inheritance (n-cpDNA) and n-mtDNA-cpDNA data with both forms of cytoplasmic inheritance in cases where none of the three systems allows estimation of the pollen migration rate (i.e., no intermigrant allele frequency differences in the nuclear or paternally inherited marker, or allelic cytonuclear disequilibria for the paternally inherited marker in migrant pollen and seeds). Finally, nonrandom associations between the two cytoplasmic markers (a nonzero M/C disequilibrium) allow seed migration estimates from n-mtDNA-cpDNA data in the absence of all four conditions that allow pollen migration estimates.
Three-locus data allow estimation of the gene flow parameters whenever data can be used from at least one of the two-locus cytonuclear systems and also sometimes when neither of these allows estimation. However, dicytonuclear data with three modes of inheritance increase only the conditions for estimating the seed migration rate; their conditions for estimating the pollen migration rate (M) are the same as for two-locus data with paternal inheritance (n-cpDNA). Three-locus data may nevertheless increase the power to detect the two forms of gene flow and/or give more accurate estimates because the rates of pollen and seed flow enter into more of the terms that determine the final state of the dicytonuclear system via cytonuclear and dicytonuclear disequilibria in migrant seeds (Appendix A). These migrant associations are not listed explicitly in the estimation conditions given in Table 5, since they require that migrant seeds be polymorphic at the nuclear marker, which is itself sufficient for estimation of the rate of seed migration.
Estimation conditions for special cases:
To further explore the conditions under which the two gene flow rates may be estimated, we consider the important special cases of the general continent-island model of unidirectional pollen and seed migration presented in a companion article: (1) seed migration alone (0 < m < 1, M = 0), (2) pollen migration alone (0 < M < 1, m = 0), (3) complete random mating (s = 0), (4) complete self-fertilization with seed migration (s = 1, 0 < m < 1), (5) equal nuclear allele frequencies in migrant pollen and seeds, (6) equal frequencies of the paternally inherited cytoplasmic marker in the two migrant pools, (7) equivalent migrant pools (equal nuclear allele frequencies, equal frequencies of the paternally inherited marker, and equal A/C disequilibria in migrant pollen and seeds), and (8) no migrant disequilibria (all migrant pollen and seed disequilibria equal to 0). The equilibrium structure in these eight simpler cases provides valuable insight into the conditions under which the equilibria for the dicytonuclear system and the two two-locus cytonuclear systems lose their dependence on, and their utility for estimating, either of the two migration rates.
Seed migration alone (0 < m < 1, M = 0):
The equilibria for all three systems (n-mtDNA, n-cpDNA, and n-mtDNA-cpDNA) depend on, and allow the estimation of, the seed migration rate when this is the sole form of gene flow, as long as the migrant seeds are polymorphic for the nuclear marker. Data from the n-mtDNA-cpDNA system can be used in the additional case where migrant seeds carry nonrandom associations between the two cytoplasmic markers (a nonzero M/C disequilibrium).
Pollen migration alone (0 < M < 1, m = 0):
The absence of seed migration places significant constraints on the estimation of the pollen migration rate, M. The equilibrium for cytonuclear systems with maternal inheritance (n-mtDNA) is independent of M and thus cannot be used to estimate the rate of pollen migration if this is the only form of gene flow. The n-cpDNA and n-mtDNA-cpDNA systems with a paternally inherited cytoplasmic marker do allow estimation of the pollen migration rate, provided that the migrant pollen carry nonrandom cytonuclear allelic associations (a nonzero A/C disequilibrium).
Complete random mating (s = 0): Complete random mating places no additional constraints on estimation for the rates of pollen and seed flow beyond those given in Table 5 for the general case of mixed-mating populations.
Complete self-fertilization with seed migration (s = 1, 0 < m < 1):
The opposite extreme of complete selfing is distinctive since the lack of outcrossing means such populations are also closed to pollen flow. Both types of two-locus cytonuclear systems allow estimation of the seed migration rate (m) in purely selfing populations as long as the migrant seed pool includes either heterozygous seeds or cytonuclear disequilibria for that two-locus system (a nonzero N/* disequilibrium for N = A, AA, Aa, or aa and * = M or C). Dicytonuclear data may also be used in such cases, as well as when migrant seeds carry joint or three-way disequilibria (a nonzero N/* disequilibrium for N = A, AA, Aa, or aa and * = MC or M/C).
Equal nuclear allele frequencies in migrant pollen and seeds:
Although we have noted a number of situations that can produce unequal frequencies in the two forms of gene flow, in other cases they will be the same. When this holds for the nuclear marker, the residents' nuclear allele frequency reaches the common migrant value. As a result, the cytonuclear system with maternal cytoplasmic inheritance (n-mtDNA) loses its dependence on the pollen migration rate and thus cannot be used to estimate this type of gene flow.
Equal frequencies of the paternally inherited cytoplasmic marker in the two migrant pools:
If the paternally inherited marker has equal frequencies in the two migrant pools, its frequency in the resident population approaches the common migrant value. As with equal nuclear allele frequencies, this reduces the power for estimating gene flow rates for both the n-cpDNA and n-mtDNA-cpDNA systems with a paternally inherited marker by eliminating many of the intermigrant factors that generate permanent disequilibria.
Equivalent migrant pools (equal nuclear allele frequencies, equal frequencies of the paternally inherited marker, and equal A/C disequilibria in migrant pollen and seeds):
If the two migrant pools are equivalent, cytonuclear data with maternal inheritance (n-mtDNA) are restricted to estimating the rate of seed migration (and then only if migrant seeds are polymorphic for the nuclear marker, as is true whenever the two migrant pools have equal nuclear allele frequencies). The two- and three-locus systems with a paternally inherited cytoplasmic marker (n-cpDNA and n-mtDNA-cpDNA) still allow estimation of both types of gene flow as long as the migrant pollen and seeds carry nonrandom associations between nuclear and paternally transmitted cytoplasmic alleles (nonzero A/C disequilibria).
No migrant disequilibria (all migrant pollen and seed disequilibria equal to 0):
With no migrant disequilibria, it should be possible to estimate both types of gene flow from any of the three systems as long as the two migrant pools differ in their nuclear allele frequencies (see special case 5 above). Further, the cytonuclear system with paternal transmission (n-cpDNA) and the full three-locus system (n-mtDNA-cpDNA) can also provide both estimates if migrant pollen and seeds have distinct frequencies of the paternally inherited marker.
| ESTIMATING GENE FLOW |
|---|
Here we present a new method to estimate rates of unidirectional pollen and seed migration using joint cytonuclear or dicytonuclear frequencies. We focus specifically on the case of uniparentally inherited cytoplasmic markers where, in the dicytonuclear case, these are inherited through opposite parents. This method uses data in the form of counts in adults of joint n-mtDNA, n-cpDNA, or n-mtDNA-cpDNA genotypes, where mtDNA here represents a maternally inherited marker and cpDNA a paternally inherited marker. We first outline the general approach for the simplest cases with a single diallelic marker from each genome and then indicate two initial extensions to data involving multiple unlinked nuclear markers and/or multiallelic markers. The utility of this method is illustrated with simulated data, which also allows us to compare estimates from dicytonuclear data with opposite cytoplasmic inheritance with estimates obtained from two-locus cytonuclear data with a single maternally or paternally inherited cytoplasmic marker. A program implementing this estimation procedure is available from the authors upon request.
Maximum-likelihood estimation:
We use maximum likelihood to estimate the rates of pollen and seed migration together with the selfing rate. The expected joint genotype frequencies at equilibrium are calculated from specified migrant seed frequencies and migrant pollen frequencies (Table 4), using the equilibrium formulas in (A1-A13) for the general case in conjunction with the decompositions shown in Table 1 and Table 2. We then jointly estimate the three parameters M, m, and s by finding their values that maximize the log-likelihood function for the observed joint genotypic counts, using the "simulated annealing" minimization routine amebsa from Numerical Recipes in C, Ed. 2.
The full log-likelihood function for dicytonuclear data is given by

$$\ln L = \sum_{g} N_{g} \ln \hat{U}_{g} \qquad (1)$$

(ignoring the constant multinomial coefficient), where the sum is taken over the 12 joint genotypes g; for example, $N_{AA/M/C}$ gives the observed count of adults with the AA/M/C joint genotype, and $\hat{U}_{11}$ gives the equilibrium frequency of that joint genotype for the specified parameter values under our model. If the migrant frequencies from the source population are unknown, the program may be modified to estimate them simultaneously. In this case, the migrant frequencies become parameters for the likelihood function, in addition to the migration and selfing rate parameters M, m, and s. The 95% confidence limits are found by bootstrapping the data set a user-specified number of times and dropping the lowest 2.5% and highest 2.5% of the bootstrap values of each estimate. The program gives the lower and upper bounds on the confidence interval as well as the length of the interval, obtained by subtracting the lower from the upper bound.
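The likelihood and bootstrap steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the expected frequencies are taken as given (the equilibrium formulas of Appendix A are not reproduced), and no optimizer is included, so any maximizer could be substituted for the amebsa routine.

```python
import math
import random


def log_likelihood(counts, expected):
    """Multinomial log-likelihood with the constant coefficient omitted."""
    total = 0.0
    for n, u in zip(counts, expected):
        if n > 0:
            if u <= 0.0:
                return float("-inf")   # observed class has zero expected frequency
            total += n * math.log(u)
    return total


def resample_counts(counts, rng):
    """One bootstrap replicate: resample individuals with replacement."""
    n_total = sum(counts)
    draws = rng.choices(range(len(counts)), weights=counts, k=n_total)
    resampled = [0] * len(counts)
    for index in draws:
        resampled[index] += 1
    return resampled


def percentile_interval(estimates, level=0.95):
    """Drop the lowest and highest tails; return (lower, upper, length)."""
    ordered = sorted(estimates)
    tail = (1.0 - level) / 2.0
    n_drop = int(tail * len(ordered) + 1e-9)   # e.g. 5 of 200 in each tail
    lower = ordered[n_drop]
    upper = ordered[len(ordered) - n_drop - 1]
    return lower, upper, upper - lower
```

In use, each bootstrap replicate from `resample_counts` would be passed to the chosen maximizer, and the resulting estimates of M, m, and s would each be summarized with `percentile_interval`.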
Multiple nuclear markers:
An extension for estimating rates of seed and pollen migration from data with multiple unlinked nuclear markers is straightforward. In this case, the overall likelihood is calculated as a product of the likelihoods for each of the nuclear markers considered separately. If, for nuclear marker x, $N_{ij,k,l;x}$ gives the observed count of a joint genotype and $\hat{U}_{ij,k,l;x}$ gives the expected frequency of that joint genotype under our model, the log-likelihood function is given by

$$\ln L = \sum_{x=1}^{n} \sum_{ij,k,l} N_{ij,k,l;x} \ln \hat{U}_{ij,k,l;x} \qquad (2)$$
where the outer sum is over each nuclear marker, x = 1, ... , n. Estimates for the parameters are obtained by maximizing this composite function via the optimization routine described above.
Consideration of linked nuclear markers requires simultaneous consideration of the various recombination frequencies.
Multiallelic data:
An extension to multiple alleles greatly increases the number and complexity of associations considered by this model. As a first step to addressing multiallelic data, such data may be converted to diallelic form by grouping one allele vs. all others at a locus. This approach was chosen for the estimation program developed here, because it is the simplest and allows the user to specify the grouping for the locus, rather than arbitrarily averaging over all possible groupings, which could be difficult to interpret. Further theory must be developed to allow a more comprehensive multiallelic analysis.
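The grouping of one allele vs. all others can be illustrated with a short Python sketch. The function name, the allele names, and the A/a labels are hypothetical choices for illustration; the authors' program may represent genotypes differently.

```python
def collapse_to_diallelic(genotype, focal_allele):
    """Recode a diploid multiallelic genotype as AA, Aa, or aa.

    genotype: pair of allele names, e.g. ("B1", "B3").
    focal_allele: the user-chosen allele recoded as A; all others become a.
    """
    recoded = ["A" if allele == focal_allele else "a" for allele in genotype]
    # Sorting puts the uppercase label first, so heterozygotes read "Aa".
    return "".join(sorted(recoded))


print(collapse_to_diallelic(("B1", "B1"), "B1"))
print(collapse_to_diallelic(("B3", "B1"), "B1"))
print(collapse_to_diallelic(("B2", "B3"), "B1"))
```

Because the user names the focal allele, the same data can be recoded under different groupings and each analyzed separately.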
Results from simulated data:
We tested this method with simulated diallelic data sets containing a single nuclear marker; counts of the 12 possible joint three-locus genotypes (NAA/M/C ... , Naa/m/c) in Table 2 were generated as random samples from the equilibrium genotypic distributions under specified migration and selfing rates and specified migrant pollen and seed frequencies. Due to the many possible combinations of parameter values, the examples chosen are not meant to be exhaustive, but instead illustrate some of the factors that affect gene flow estimation using this method. To assess the relative utility of the three possible types of data, we also extracted counts for the two-locus genotypes in each two-locus cytonuclear system. For each run, all three data sets were bootstrapped 200 times to construct 95% confidence intervals for each of the three estimated parameters (M, m, and s).
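The sampling and extraction steps can be sketched in Python as below. This is an assumption-laden illustration rather than the authors' simulation code: the function names are hypothetical, and the equilibrium frequencies must be supplied by the caller because the formulas of Appendix A are not reproduced here.

```python
import random
from collections import Counter

NUCLEAR = ("AA", "Aa", "aa")
MITO = ("M", "m")
CHLORO = ("C", "c")
# The 12 joint three-locus genotypes, e.g. ("AA", "M", "C").
GENOTYPES = [(n, k, c) for n in NUCLEAR for k in MITO for c in CHLORO]


def simulate_counts(frequencies, sample_size, seed=None):
    """Draw a random sample of adults from given joint genotype frequencies."""
    rng = random.Random(seed)
    weights = [frequencies[g] for g in GENOTYPES]
    draws = rng.choices(GENOTYPES, weights=weights, k=sample_size)
    tallied = Counter(draws)
    return {g: tallied.get(g, 0) for g in GENOTYPES}


def two_locus_counts(counts, cytoplasm):
    """Collapse three-locus counts to n-mtDNA ("mt") or n-cpDNA ("cp") counts."""
    index = 1 if cytoplasm == "mt" else 2
    collapsed = Counter()
    for genotype, n in counts.items():
        collapsed[(genotype[0], genotype[index])] += n
    return dict(collapsed)


# Placeholder frequencies only (uniform), not an equilibrium from the model.
uniform = {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}
sample = simulate_counts(uniform, sample_size=300, seed=1)
print(two_locus_counts(sample, "mt"))
print(two_locus_counts(sample, "cp"))
```

Extracting both two-locus data sets from the same three-locus sample keeps the comparison among the three systems on identical individuals.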
We consider two different populations (designated A and B) whose migrant compositions differ in the conditions that allow estimation of the two rates of gene flow (Table 5). For population A, the nuclear allele frequency is equal in migrant pollen and seeds (0.7 in both), while the frequency of the paternally inherited marker differs in the two migrant pools (1.0 in migrant pollen, 0.7 in migrant seeds), allowing us to examine the effect of having only one type of intermigrant frequency difference on estimating the two migration rates. The resident population receives migrant seeds of two types, AA/M/C and aa/m/c (at frequencies 0.7 and 0.3, respectively), which produces cytoplasmic as well as all possible allelic and homozygote disequilibria in the migrant seeds: the M/C, A/*, and AA/* disequilibria equal 0.21, the aa/* disequilibria equal -0.21, and the Aa/* and Aa/M/C disequilibria equal 0, where * indicates M, C, or MC; the three-way A/M/C and AA/M/C disequilibria equal -0.084, and the aa/M/C disequilibrium equals 0.084. The migrant pollen are also of two types, A/C and a/C (at frequencies 0.7 and 0.3, respectively), but carry no nonrandom associations (the A/C disequilibrium is 0). Population B receives the same types of migrant seeds as population A and thus has the same disequilibria in the migrant seed pool. The migrant pollen also still carry no allelic cytonuclear disequilibrium, but are now monomorphic for both the nuclear and paternally transmitted cytoplasmic markers (a single pollen type at frequency 1.0). As a result, population B has intermigrant frequency differences for both these markers (a nuclear allele frequency of 0 in migrant pollen and 0.7 in migrant seeds; a paternally inherited marker frequency of 1.0 in migrant pollen and 0.7 in migrant seeds), allowing more ways to generate permanent disequilibria and estimate gene flow than population A.
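The migrant seed disequilibria quoted for population A can be checked numerically. The sketch below assumes the usual covariance-style definitions (a two-way disequilibrium is a joint frequency minus the product of the marginal frequencies, and the three-way allelic disequilibrium removes all two-way terms); the article's own definitions are in its Appendix A and the companion article, and the function and key names here are hypothetical.

```python
A_SHARE = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}   # fraction of A alleles per genotype


def migrant_seed_disequilibria(pool):
    """pool maps (nuclear genotype, mt cytotype, cp cytotype) to a frequency."""
    p = sum(f * A_SHARE[n] for (n, _, _), f in pool.items())
    x_m = sum(f for (_, k, _), f in pool.items() if k == "M")
    x_c = sum(f for (_, _, c), f in pool.items() if c == "C")
    f_mc = sum(f for (_, k, c), f in pool.items() if k == "M" and c == "C")
    f_am = sum(f * A_SHARE[n] for (n, k, _), f in pool.items() if k == "M")
    f_ac = sum(f * A_SHARE[n] for (n, _, c), f in pool.items() if c == "C")
    f_amc = sum(f * A_SHARE[n] for (n, k, c), f in pool.items()
                if k == "M" and c == "C")
    d_mc = f_mc - x_m * x_c                    # cytoplasmic, M/C
    d_am = f_am - p * x_m                      # allelic, A/M
    d_ac = f_ac - p * x_c                      # allelic, A/C
    d_a_mc = f_amc - p * f_mc                  # joint allelic, A/MC
    d_a_m_c = (f_amc - p * d_mc - x_m * d_ac   # three-way allelic, A/M/C
               - x_c * d_am - p * x_m * x_c)
    return {"M/C": d_mc, "A/M": d_am, "A/C": d_ac,
            "A/MC": d_a_mc, "A/M/C": d_a_m_c}


population_a_seeds = {("AA", "M", "C"): 0.7, ("aa", "m", "c"): 0.3}
for name, value in migrant_seed_disequilibria(population_a_seeds).items():
    print(name, round(value, 3))
```

Under these assumed definitions the two-way and joint values come out at 0.21 and the three-way allelic value at -0.084, in agreement with the figures given for population A.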
These two situations, with disequilibria in migrant seeds but not in migrant pollen, could arise if a population receives migrant seeds from multiple sources, but migrant pollen from only one source, distinct from the sources for migrant seeds. For instance, a central population might receive animal-dispersed seeds from two nearby populations but only receive wind-dispersed pollen in one prevailing direction from a more distant population. The gene flow estimates in the numerical examples below would then represent composite estimates of the overall rates of pollen and seed flow from all sources. The hybrid zone model in the subsequent section outlines how to estimate the separate contributions from each source.
For each migrant composition, we examined various combinations of the seed migration (m = 0, 0.05, 0.1), pollen migration (M = 0, 0.05, 0.1, 0.2), and selfing (s = 0.1, 0.5, 0.9) rates. We estimated these parameters from three simulated data sets with representative sample sizes of N = 100, 300, and 500. The results for m and M are given in Tables C1 and C2. Each estimate is reported as its deviation from the true parameter value, normalized by that true value; when the parameter value is zero, the estimate itself is given instead. Also presented in Tables C1 and C2 are the lengths of the 95% confidence intervals for the estimates of m (Im) and M (IM), each of which has a maximum possible length of 1.0.
When only the frequency of the paternally inherited cytoplasmic marker differs between the two migrant pools (population A, Table C1), the n-mtDNA data, with a maternally inherited cytoplasmic marker, cannot estimate the pollen migration rate (Table 5) because the equilibrium state for this cytonuclear system is then independent of the pollen migration rate. Among the estimates reported for this population, the normalized deviation for m is below 0.5 for 83.3% of estimates and Im < 0.23 for all estimates; the normalized deviation for M is below 0.5 for 75% of estimates and IM < 0.4 for 71% of estimates.
All three systems consistently give good estimates when the allele frequencies of both the nuclear and the paternally inherited cytoplasmic marker differ in the migrant pools (population B, Table C2), so that permanent disequilibria are generated by intermigrant admixture effects. Such differences in allele frequencies would be expected, for example, if multiple, genetically distinct sources each contributed differentially to the migrant pollen and seed pools. When there is no seed migration (m = 0), none of the three systems can estimate the pollen migration rate in our population B examples because the absence of allelic cytonuclear disequilibrium in migrant pollen (a zero A/C disequilibrium) makes the equilibrium state independent of the pollen migration rate (see the special case m = 0 above). In other cases with no seed migration, where migrant pollen carry cytonuclear allelic disequilibrium (a nonzero A/C disequilibrium), estimation of the pollen migration rate is possible from two- and three-locus cytonuclear data containing a paternally inherited marker (n-cpDNA and n-mtDNA-cpDNA). This, however, requires knowledge of the initial frequency for the maternally inherited marker, $X^{(0)}_M$, which is now the marker's expected frequency since its value is not affected by pollen migration alone (Appendix A); if necessary, this may be treated as an unknown parameter and jointly estimated along with M and s.
In general, increased sample sizes appear to improve the accuracy of the estimates and decrease the size of the confidence intervals for both migrant compositions (A and B), although there is a great deal of variability among different simulated data sets. High selfing rates (s = 0.9) generally worsen estimates of pollen migration, although again there is a great deal of variation across runs. These poor estimates may be because high selfing reduces the total fraction of migrant pollen, M(1 - s), so that such pollen contribute less to the genetic composition of the population at equilibrium. The equilibrium genotype frequencies are thus less useful for estimating the pollen migration rate under high selfing.
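The effect of selfing on the overall migrant pollen fraction, M(1 - s), is easy to tabulate. The short Python sketch below uses a hypothetical function name and evaluates the product at M = 0.1 for the three selfing rates used in the simulations.

```python
def total_migrant_pollen_fraction(pollen_migration_rate, selfing_rate):
    """Overall fraction of migrant pollen per generation, M(1 - s)."""
    return pollen_migration_rate * (1.0 - selfing_rate)


for s in (0.1, 0.5, 0.9):
    fraction = total_migrant_pollen_fraction(0.1, s)
    print(f"s = {s}: M(1 - s) = {fraction:.3f}")
```

At s = 0.9 the overall fraction is one-ninth of its value at s = 0.1, which is consistent with the weaker pollen migration estimates seen under high selfing.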
Fig 3 and Fig 4 give results from two particular test runs for these same migrant compositions (A and B). In each case, the estimates of the seed migration rate (m) and the pollen migration rate (M) for the three systems (n-mtDNA, n-cpDNA, and n-mtDNA-cpDNA) are given for the three sample sizes: N = 100, 300, and 500 (estimates of selfing rate not shown). The dashed lines give the actual values of the migration rates, the solid boxes give the estimates, and the open boxes indicate the upper and lower bounds for the 95% confidence limits.
Fig 3 gives an example from population A (equal nuclear allele frequencies but unequal frequencies of the paternally inherited marker in the two migrant pools) with a selfing rate of s = 0.1. Estimates of seed migration rates are close to the true value and confidence intervals are small for all three systems (Im at most 0.154), with the exception of n-cpDNA for N = 100 (Im = 0.227). For this example, the n-mtDNA system could not estimate the pollen migration rate because, as noted in the special case of equal nuclear allele frequencies above, the equilibrium state for cytonuclear systems with a maternally inherited cytoplasmic marker does not depend on the pollen migration rate with equal nuclear frequencies in the migrant pools. The n-mtDNA-cpDNA system gave smaller confidence intervals for all of the estimates.
An example from population B (unequal nuclear allele frequencies and unequal frequencies of the paternally inherited marker in the two migrant pools) where the selfing rate is s = 0.5 is given in Fig 4. All three systems give good estimates for the seed migration rate (normalized deviation at most 0.335), and both n-mtDNA and n-mtDNA-cpDNA give consistently small confidence intervals for this estimate (Im between 0.035 and 0.125). For the pollen migration rate, only the n-mtDNA system with maternal cytoplasmic inheritance and the n-mtDNA-cpDNA system with three forms of inheritance gave good results for N = 100 and 500 (normalized deviation at most 0.345), with the n-mtDNA-cpDNA system giving slightly smaller confidence intervals for all of the estimates. All three systems consistently overestimated the pollen flow rate for the run at N = 300.
| HYBRID ZONE MODEL |
|---|
We next consider explicitly the estimation of unidirectional migration rates from multiple sources, such as would be found in a hybrid zone or other areas of admixture. Here, we estimate the migration rates for pollen and seed from each source separately, in contrast to the previous framework that allowed only composite estimates for the total amount of each form of gene flow. The migration model for this application is depicted in Fig 5. We assume that each generation a fixed fraction M1 of outcrossed pollen in the hybrid population is derived from source population 1 (species 1), and a fraction M2 is derived from source population 2 (species 2), where both sources have constant genetic compositions. Similarly, a fixed fraction m1 of the seeds migrate from source population 1 and a fixed fraction m2 from source population 2. The total pollen and seed migration rates are then M = M1 + M2 and m = m1 + m2, respectively, with the remaining fractions contributed by the resident population. The general model described in ![]()
Composition of total migrant pools:
Frequencies in each total migrant pool are now the weighted averages of the corresponding frequencies in the two source populations. For example, the nuclear allele frequency in migrant pollen will be

$$\bar{P} = \frac{M_1 \bar{P}^{(1)} + M_2 \bar{P}^{(2)}}{M_1 + M_2},$$

where $\bar{P}^{(i)}$ is the nuclear allele frequency in migrant pollen from source population i. The disequilibria in the total migrant pools will be the result of admixture between the contributions of the two sources and can be calculated using (15)-(18) of the companion article (![]()

where, once again, the superscript indicates the source population. We can use this new model to jointly estimate the four migration rates (M1, M2, m1, m2) in hybrid zones in much the same way we used the original model to estimate the two (composite) migration rates (M, m).
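The weighted-average construction of the total migrant pools can be illustrated numerically. The following is a minimal sketch, not the authors' estimation program: the function names `pooled_frequency` and `pooled_disequilibrium` are hypothetical, and the disequilibrium calculation uses the standard two-locus admixture identity (within-source associations plus the covariance of the source frequencies), which is assumed here rather than taken from the companion article's formulas.

```python
def _weights(rates):
    """Convert per-source migration rates into mixing proportions."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one source must contribute migrants")
    return [r / total for r in rates]


def pooled_frequency(rates, freqs):
    """Allele frequency in the total migrant pool (weighted average)."""
    return sum(w * f for w, f in zip(_weights(rates), freqs))


def pooled_disequilibrium(rates, freqs_a, freqs_c, diseqs):
    """Allelic association between two loci in the pooled migrants.

    Assumed admixture identity: the weighted mean of the within-source
    disequilibria plus the weighted covariance of the source frequencies.
    """
    w = _weights(rates)
    p_a = sum(wi * f for wi, f in zip(w, freqs_a))
    p_c = sum(wi * f for wi, f in zip(w, freqs_c))
    within = sum(wi * d for wi, d in zip(w, diseqs))
    between = sum(
        wi * (fa - p_a) * (fc - p_c)
        for wi, fa, fc in zip(w, freqs_a, freqs_c)
    )
    return within + between


# Pollen migration rates from sources 1 and 2 (M1 = 0.1, M2 = 0.2).
pollen_rates = [0.1, 0.2]
# Hypothetical source frequencies: diagnostic markers with no
# association within either source.
print(pooled_frequency(pollen_rates, [1.0, 0.0]))
print(pooled_disequilibrium(pollen_rates, [1.0, 0.0], [1.0, 0.0], [0.0, 0.0]))
```

With diagnostic markers the pooled frequency reduces to the share of migrants coming from source 1, and all of the pooled disequilibrium comes from the between-source term.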
For our numerical test of the estimation procedure for hybrid zones, we focus on the case where the two source populations show fixed differences at all three loci. This corresponds to having diagnostic nuclear, mitochondrial, and chloroplast markers for the two source populations, as might be found in a hybrid zone where two genetically diverged taxa come into contact. Such a situation is presumably the optimal case for estimation; the utility in other cases can be determined via simulations in the same way as we have done here. We assume that source population 1 is fixed for AA/M/C and source population 2 is fixed for aa/m/c. The migration rates from the two sources then uniquely determine the frequencies in migrant seeds,
![]() |
(3) |
and in migrant pollen (Table 4),
![]() |
(4) |
as well as their disequilibria,
![]() |
(5) |
![]() |
(6) |
![]() |
(7) |
![]() |
(8) |
where * indicates M, C, or MC. Note that this shows that the signs of the three-way disequilibria can serve as useful indicators of asymmetry in seed migration from two genetically distinct sources, since (6) will be positive if m2 > m1, negative if m1 > m2, and zero only when m1 = m2.
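The sign rule for the three-way disequilibria can be checked with a small calculation. The sketch below is an illustration under stated assumptions and is not a reconstruction of (6): it treats the migrant seed pool as a mixture of two sources carrying fully diagnostic markers and computes the third-order central moment of the shared allele indicator, which has the same sign pattern as that described in the text. The function name `three_way_association` is hypothetical.

```python
def three_way_association(rate1, rate2):
    """Third-order allelic association in a two-source mixture.

    Assumes diagnostic markers at all three loci, so the three allele
    indicators coincide in every migrant and the association equals the
    third central moment of a Bernoulli(w1) variable.
    """
    total = rate1 + rate2
    if total <= 0:
        raise ValueError("at least one source must contribute migrants")
    w1 = rate1 / total  # share of migrants from source 1
    w2 = rate2 / total  # share of migrants from source 2
    return w1 * w2 * (w2 - w1)


# Seed migration rates (m1, m2) chosen to show each sign.
for m1, m2 in [(0.05, 0.01), (0.01, 0.05), (0.1, 0.1)]:
    print(m1, m2, three_way_association(m1, m2))
```

The value is negative when m1 > m2, positive when m2 > m1, and zero when the two sources contribute equally, so only asymmetric seed migration leaves a three-way signal.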
Results from simulated data:
Several different combinations of migration and selfing rates (M1, M2, m1, m2, s) were tested, selected from the same range used in the general case. The numerical results are given in TABLE ADD. As for the original model, due to the many possible combinations of parameter values, the examples given in TABLE ADD are not meant to be exhaustive but are illustrative of some of the factors that impact this estimation method. The cytonuclear system with a maternally inherited cytoplasmic marker (n-mtDNA) consistently had trouble estimating pollen migration rates, generally giving the largest confidence intervals for M1 and M2 and often having very poor estimates as well. With equal rates of pollen and seed migration (m1 = m2 = M1 = M2 = 0.1), the cytonuclear system with a paternally inherited cytoplasmic marker (n-cpDNA) was also often poor at estimating the pollen migration rate, and, for low or intermediate selfing rates, had larger confidence intervals for estimates of seed migration rates than either the n-mtDNA or n-mtDNA-cpDNA systems. Note that, for this case, the migrant pools are equivalent, with equal allele frequencies for the nuclear and paternally inherited cytoplasmic markers (all equal to 0.5 in both migrant seeds and migrant pollen) so that there are no simple intermigrant effects.
In the examples where only one source population contributes migrant pollen (M1 = 0.0), the n-cpDNA system with paternal cytoplasmic inheritance tended to estimate the seed migration rate from the other source population as zero instead of the true value of 0.01 (<m2> = 0.0, a relative error of 1.0). This may be because, with diagnostic markers and only one source of migrant pollen, the migrant pollen pool is fixed at both the nuclear and paternally inherited loci (both allele frequencies equal 0) and has no allelic cytonuclear disequilibrium (the A/C disequilibrium equals 0). In contrast, the three-locus system with both modes of cytoplasmic inheritance was usually quite successful at estimating all four migration parameters, and, in cases where one of the two-locus systems failed to estimate migration rates (relative error above 0.5 for a parameter z), the n-mtDNA-cpDNA system generally succeeded.
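The error values quoted in parentheses behave like a relative error: an estimate of zero for a true rate of 0.01 gives exactly 1.0. The sketch below makes that reading explicit. It is an inference from the reported numbers, not the paper's stated definition, and the names `relative_error` and `estimation_failed` are hypothetical; the 0.5 cutoff is the failure threshold mentioned in the text.

```python
def relative_error(estimate, true_value):
    """Absolute error scaled by the true parameter value (assumed measure)."""
    if true_value == 0:
        # Undefined when the true migration rate is zero (e.g. M1 = 0.0).
        raise ValueError("relative error is undefined for a true rate of zero")
    return abs(estimate - true_value) / abs(true_value)


def estimation_failed(estimate, true_value, threshold=0.5):
    """Flag an estimate whose relative error exceeds the threshold."""
    return relative_error(estimate, true_value) > threshold


# Seed migration rate estimated as zero when the true value is 0.01.
print(relative_error(0.0, 0.01))
print(estimation_failed(0.0, 0.01))
```

Under this reading, any estimate of zero for a nonzero rate counts as a failure regardless of how small the true rate is.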
With smaller sample sizes (N = 100), there were occasionally runs where none of the three systems could estimate a parameter. For example, for m1 = 0, m2 = 0.05, M1 = 0.1, M2 = 0, s = 0.9, and N = 100, by chance the simulated data set did not include any heterozygous individuals (NAa/m/C = NAa/m/c = 0) although the equilibrium heterozygote frequency was 0.0388 for that parameter set. All three systems estimated the seed migration rate to be zero in this case (<m2> = 0.0, a relative error of 1.0). This is indicative of the problem that all iterative maximum-likelihood methods encounter when there are missing observations, as can occur with small sample sizes (![]()
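How often a sample would contain no heterozygotes at all can be gauged with a short calculation. This sketch assumes that individuals are drawn independently from the equilibrium genotype frequencies (multinomial sampling), which is an assumption about how the simulated data sets were generated; the function name `prob_category_absent` is hypothetical.

```python
def prob_category_absent(freq, sample_size):
    """Probability that a genotype class is missing from a random sample.

    Assumes independent draws, each hitting the class with probability freq.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be a probability")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - freq) ** sample_size


# Equilibrium heterozygote frequency quoted in the text.
heterozygote_freq = 0.0388
for n in (100, 300, 500):
    print(n, prob_category_absent(heterozygote_freq, n))
```

Under these assumptions the chance of observing no heterozygotes is roughly 2% at N = 100 and becomes negligible at N = 300 and N = 500, consistent with such runs being occasional and confined to the smallest sample size.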
Fig 6, Fig 7, Fig 8, and Fig 9 give examples from two particular test runs. As before, the estimates of the seed migration rates (Fig 6 and Fig 8, m1 and m2) and the pollen migration rates (Fig 7 and Fig 9, M1 and M2) for the three systems (n-mtDNA, n-cpDNA, and n-mtDNA-cpDNA) are given for the three sample sizes: N = 100, 300, and 500. Again, the dashed lines give the actual values for each of the migration rates, the solid boxes give the estimates, and the open boxes indicate the upper and lower bounds for the 95% confidence limits.
In the example given by Fig 6 and Fig 7, all four migration rates differed (m1 = 0.05, m2 = 0.01, M1 = 0.1, M2 = 0.2) and the selfing rate was s = 0.5. Increasing sample size generally decreased the size of the confidence intervals for all three systems, except for the pollen migration rates for N = 500. The three-locus system (n-mtDNA-cpDNA) gave consistently smaller confidence intervals for pollen migration rates than the other two systems. The n-cpDNA system with paternal inheritance gave very poor estimates for the smaller of the seed migration rates (Fig 6, m2), while the n-mtDNA system with maternal inheritance always performed poorly in estimating pollen migration (Fig 7). In contrast, the n-mtDNA-cpDNA system gave reasonable estimates for all four migration rates.
For the example shown in Fig 8 and Fig 9, all four migration rates were the same (m1 = m2 = M1 = M2 = 0.1) and the selfing rate was s = 0.1. Again, increasing sample size decreased the confidence intervals for most of the estimates, although the effect was not as great as in the previous example. The cytonuclear system with paternal inheritance (n-cpDNA) gave large confidence intervals and often poor estimates for all four migration rates. The cytonuclear system with maternal inheritance (n-mtDNA) again gave very poor estimates for the pollen migration rates (Fig 9). All three systems underestimated the seed migration rates and overestimated the pollen migration rates for the N = 300 run, indicating that a particular data set may poorly reflect the "true" genotypic frequencies for a population, leading to inaccurate estimates. Once again, the n-mtDNA-cpDNA system performed best overall.
| DISCUSSION |
|---|
The juxtaposition of biparental and uniparental inheritance in the same individual makes joint cytonuclear data particularly useful for decomposing plant gene flow and estimating the pollen (haploid) and seed (diploid) components. Here we have used previously developed continent-island models of unidirectional migration (![]()
We have focused here on censusing adults, both for convenience and since, in general, assaying three markers from adult tissues will be easier than doing so from seeds, especially for species whose seeds are small. However, seeds from conifers may be particularly easy to assay, especially for nuclear allozymes (![]()