Abstract
A simple, exact formula is derived for the expected number of heterozygous sites per individual at equilibrium in a subdivided population. The model of infinitely many neutral sites is posited; the linkage map is arbitrary. The monoecious, diploid population is subdivided into a finite number of panmictic colonies that exchange gametes. The backward migration matrix is arbitrary, but time independent and ergodic (i.e., irreducible and aperiodic). With suitable weighting, the expected number of heterozygous sites is 4N_{e}u, where N_{e} denotes the migration effective population number and u designates the total mutation rate per gene (or DNA sequence). For diploid migration, this formula is a good approximation if N_{e} ⪢ 1.
One of the most important measures of genetic variability at the molecular level is the expected number of heterozygous nucleotide sites per individual,
Since natural populations are frequently subdivided, considerable effort has been devoted to extending (1) to subdivided populations. Li (1976) proved that (1) holds for the island model without recombination if N_{e} = N_{T}, the total population number. Slatkin (1987) generalized Li's result by demonstrating that (1) holds if (i) there is no recombination; (ii) the backward migration matrix M is symmetric and ergodic; (iii)
In this note, we prove for gametic dispersion that, for suitably weighted calculation of
GAMETIC DISPERSION
Generations are discrete and nonoverlapping; the monoecious, diploid population is subdivided into a finite number of panmictic colonies that exchange gametes in a fixed pattern. We apply the model of infinitely many neutral sites with an arbitrary linkage map to a gene or DNA sequence. Thus, we posit that the mutation rate per site is so low that mutation occurs at each site at most once and then only at monomorphic sites. This approximation requires that the proportion of polymorphic sites be much less than one. Let u denote the total mutation rate per gene.
At the beginning of the life cycle, every one of the N_{i} adults in deme i produces the same very large number of gametes, which then disperse independently. Complete random union of gametes follows. Therefore, a proportion 1/N_{i} of the zygotes whose gametes originate in deme i are produced by self-fertilization. Mutation is next, and finally population regulation returns the number of individuals in deme i to N_{i}. Thus, random genetic drift operates through population regulation.
Before deriving our results, we introduce some essential concepts and parameters.
Let m_{ij} designate the probability that a gamete in deme i after dispersion was produced in deme j. In the absence of selection, it is reasonable to assume that the backward migration matrix M = (m_{ij}) is constant
(Nagylaki 1992, p. 135). We posit also that M is ergodic, i.e., irreducible and aperiodic (Gantmacher 1959, pp. 50, 80, 88). Irreducibility guarantees that the descendants of individuals in each deme are able eventually to reach every other deme. Aperiodicity precludes pathological cyclic behavior. Given irreducibility, the biologically trivial condition that individuals have positive probability of remaining in some deme, i.e., that m_{ii} > 0 for some i, suffices for aperiodicity (Feller 1968, p. 426). Of course, M must be stochastic: Σ_{j} m_{ij} = 1 for every i. (2)
Let N_{T} and κ_{i} represent the total population number and the proportion of adults in deme i, respectively: N_{T} = Σ_{i} N_{i}, κ_{i} = N_{i}/N_{T}.
The components of the last equation in (4) are
Conservative migration patterns are those that do not change the subpopulation numbers; in this case, and only in this case, we have ν = κ (Nagylaki 1980). Conservative migration has many simple intuitive properties that do not always hold for arbitrary migration (Nagylaki 1980, 1982, 1983, 1985, 1986, 1992, pp. 135–136, 151; Nordborg 1997). In our model, the subpopulation numbers N_{i} refer to adults. However, since the number of gametes in each deme before dispersion is proportional to N_{i}, it is also true that the gametic numbers are unchanged by conservative migration, and only by conservative migration.
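The conditions imposed on M above, and the criterion ν = κ for conservative migration, are easy to check numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, ν is taken to be the stationary distribution of M (the solution of ν^T M = ν^T with components summing to one), and aperiodicity is tested only through the sufficient condition m_{ii} > 0 for some i.

```python
def is_stochastic(M, tol=1e-12):
    """Rows of the backward migration matrix are probability vectors."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in M)


def is_irreducible(M):
    """Every deme can be reached from every other deme along positive m_ij."""
    n = len(M)
    for start in range(n):
        seen = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if M[i][j] > 0.0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(seen) < n:
            return False
    return True


def satisfies_sufficient_conditions(M):
    """Stochastic, irreducible, and m_ii > 0 for some i (sufficient for ergodicity)."""
    has_positive_diagonal = any(M[i][i] > 0.0 for i in range(len(M)))
    return is_stochastic(M) and is_irreducible(M) and has_positive_diagonal


def stationary_distribution(M, iters=10000):
    """Approximate nu with nu^T M = nu^T by repeated left multiplication."""
    n = len(M)
    nu = [1.0 / n] * n
    for _ in range(iters):
        nu = [sum(nu[i] * M[i][j] for i in range(n)) for j in range(n)]
    return nu


def is_conservative(M, N, tol=1e-9):
    """Migration is conservative exactly when nu equals kappa."""
    total = sum(N)
    kappa = [size / total for size in N]
    nu = stationary_distribution(M)
    return all(abs(a - b) <= tol for a, b in zip(nu, kappa))


# Illustrative three-deme island pattern (values chosen for the example only).
island = [[0.90, 0.05, 0.05],
          [0.05, 0.90, 0.05],
          [0.05, 0.05, 0.90]]
print(satisfies_sufficient_conditions(island))
print(is_conservative(island, [100, 100, 100]))
print(is_conservative(island, [100, 200, 300]))
```

The repeated multiplication converges only because M is ergodic; for a reducible M, such as the identity matrix, the stationary distribution is not unique and the sketch should not be used.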
In our results, the vectors κ and ν enter combined in the migration effective population number N_{e}, defined by (Nagylaki 1980, 1982, 1983, 1994)
There is a simple, intuitive interpretation of N_{e} (cf. Nordborg 1997). In many cases, 1/N_{e} can be defined as the probability that two randomly chosen gametes in distinct individuals are descended from the same parent (Crow and Denniston 1988; Caballero and Hill 1992; Nagylaki 1992, pp. 243–247, 1995). For gametes in demes i and j, this probability is
We are now prepared to deduce our results.
Let T_{ij} denote the mean coalescence time (in generations) of two distinct, homologous nucleotides chosen at random from adults just before gametogenesis, one from deme i and one from deme j. At equilibrium, considering ancestry and coalescence in the preceding generation yields directly
Define the global and local means (cf. Nagylaki 1982)
In the strong-migration limit, N_{i} → ∞ for every i with κ and M fixed. Then T_{ij} ~ 2N_{e} for every i and j (Notohara 1993), where the notation means that T_{ij}/(2N_{e}) → 1. This demonstrates independently the asymptotic validity of the exact formula (8) whenever migration dominates random drift.
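Both the exact weighted mean and the strong-migration limit can be illustrated numerically. The sketch below rests on assumptions that are labeled in the code, because the displayed equations are not reproduced in this text: the recursion is taken to be T_{ij} = 1 + Σ_{k,l} m_{ik} m_{jl} (1 − δ_{kl}/(2N_{k})) T_{kl}, which follows the verbal description of gametic dispersion with random union of gametes; the local mean is taken to weight T_{ii} by ν_{i}^2/κ_{i}; and N_{e} is taken to be N_{T}/Σ_{i}(ν_{i}^2/κ_{i}) (Nagylaki 1980). All function names and parameter values are hypothetical.

```python
def stationary_distribution(M, iters=10000):
    """Approximate nu with nu^T M = nu^T (M assumed ergodic)."""
    n = len(M)
    nu = [1.0 / n] * n
    for _ in range(iters):
        nu = [sum(nu[i] * M[i][j] for i in range(n)) for j in range(n)]
    return nu


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def coalescence_times(M, N):
    """Solve the ASSUMED recursion
    T_ij = 1 + sum_kl m_ik m_jl (1 - delta_kl / (2 N_k)) T_kl."""
    n = len(N)
    size = n * n
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            row = i * n + j
            A[row][row] += 1.0
            for k in range(n):
                for l in range(n):
                    w = M[i][k] * M[j][l]
                    if k == l:
                        # Two gametes produced in deme k carry copies of the
                        # same parental gene with probability 1/(2 N_k).
                        w *= 1.0 - 1.0 / (2.0 * N[k])
                    A[row][k * n + l] -= w
    t = solve_linear(A, [1.0] * size)
    return [[t[i * n + j] for j in range(n)] for i in range(n)]


def local_mean_and_2Ne(M, N):
    """Weighted intrademe mean (ASSUMED weights nu_i^2/kappa_i) and 2 N_e."""
    total = sum(N)
    kappa = [size / total for size in N]
    nu = stationary_distribution(M)
    weights = [v * v / k for v, k in zip(nu, kappa)]
    T = coalescence_times(M, N)
    mean = sum(w * T[i][i] for i, w in enumerate(weights)) / sum(weights)
    return mean, 2.0 * total / sum(weights), T


M = [[0.9, 0.1], [0.3, 0.7]]                      # illustrative values only
mean_small, two_Ne_small, _ = local_mean_and_2Ne(M, [20, 60])
_, two_Ne_large, T_large = local_mean_and_2Ne(M, [20000, 60000])
ratios_large = [t / two_Ne_large for row in T_large for t in row]
print(mean_small, two_Ne_small)
print(ratios_large)
```

Under these assumptions, the weighted intrademe mean equals 2N_{e} even for small demes, whereas the individual ratios T_{ij}/(2N_{e}) approach one only as the deme sizes grow with M held fixed.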
We discuss special cases of (8) after presenting a different proof of (9).
An alternative proof of (9): Since (8) and (9) are geographical-invariance relations, the following instructive approach is natural. Suppose the model of infinitely many alleles (Malécot 1946, 1948, 1951; Wright 1948; Kimura and Crow 1964) applies to each site: every nucleotide at site s mutates to new nucleotides at rate u_{s}. (We soon let u_{s} → 0, so the fact that there are really only four nucleotides will not matter.)
Let
Conservative migration: If migration is conservative, then N_{e} = N_{T}, so (9) reduces to a result established by Strobeck (1987) for weak evolutionary forces. In this case, ν = κ, and hence the averages in (8) and (9) simplify to weighting by the demic proportions:
Examples of conservative migration are random outbreeding and site homing (Nagylaki 1992, pp. 136, 149, and refs. therein), the island model (Nagylaki 1983, 1986, and refs. therein), and the circular stepping-stone model (Nagylaki 1983, 1986, and refs. therein). The choice m_{ij} = κ_{j} corresponds to panmixia in the entire population.
Note that (16) is independent of the migration pattern, provided the latter is conservative. This raises the following apparent paradox. If there is no migration, then T_{ii} = 2N_{i}, whence
From the formal point of view, note that if there is no migration, then M is the identity matrix. Therefore, contrary to our assumption of ergodicity, M is reducible and ν is undefined. Thus, (16) does not apply.
A more illuminating explanation is that, as the migration rates tend to zero, so does the probability of descent from a different deme, but the mean interdeme coalescence times (T_{ij} for i ≠ j) diverge and make a finite, positive contribution to the mean intrademe coalescence times T_{ii}. This behavior is exemplified by the island model with migration rate m: Li's (1976) solution shows that T_{ij} = O(m^{−1}) for i ≠ j and that the interdeme contribution is O(1) as m → 0. For two islands, Nath and Griffiths (1993) demonstrate that the distribution of the intrademe coalescence time converges to the single-deme distribution, but the mean intrademe coalescence time does not converge to the single-deme mean.
Doubly stochastic backward migration matrix: Here we assume, in addition to (2), that Σ_{i} m_{ij} = 1 for every j.
A natural subclass of doubly stochastic M is homogeneous M: in this case, m_{ij} = m_{i−j}, which depends only on displacement, rather than on both the initial and final positions. Examples are the island and circular stepping-stone models, but, as observed above, these migration patterns are also conservative (Nagylaki 1992, p. 136).
Symmetric M is another subclass of doubly stochastic M. In this case, the formula
Two demes: Parametrizing M as
Migration is conservative if ν = κ, which is equivalent to N_{1}m_{1} = N_{2}m_{2}. This condition means that the same number of individuals migrate from deme 1 to deme 2 as vice versa.
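For two demes, the quantities above have closed forms that can be evaluated directly. The sketch below assumes the parametrization M = [[1 − m_{1}, m_{1}], [m_{2}, 1 − m_{2}]], which is not reproduced in this text but is consistent with the stated condition N_{1}m_{1} = N_{2}m_{2}; it also assumes N_{e} = N_{T}/Σ_{i}(ν_{i}^2/κ_{i}) (Nagylaki 1980). The function name and the numerical values are hypothetical.

```python
def two_deme_summary(N1, N2, m1, m2):
    """Stationary distribution, demic proportions, N_e, and conservativeness
    for the ASSUMED parametrization M = [[1 - m1, m1], [m2, 1 - m2]]."""
    # Left eigenvector of M for eigenvalue 1, normalized to sum to one.
    nu = (m2 / (m1 + m2), m1 / (m1 + m2))
    total = N1 + N2
    kappa = (N1 / total, N2 / total)
    # ASSUMED form of the migration effective population number.
    Ne = total / (nu[0] ** 2 / kappa[0] + nu[1] ** 2 / kappa[1])
    # nu = kappa is equivalent to N1 * m1 = N2 * m2.
    flow_12, flow_21 = N1 * m1, N2 * m2
    conservative = abs(flow_12 - flow_21) <= 1e-12 * max(flow_12, flow_21)
    return nu, kappa, Ne, conservative


# Equal flows in both directions (100 * 0.03 = 300 * 0.01): conservative.
print(two_deme_summary(100, 300, 0.03, 0.01))
# Equal migration rates but unequal deme sizes: not conservative.
print(two_deme_summary(100, 300, 0.01, 0.01))
```

In the second call, ν = (1/2, 1/2) while κ = (1/4, 3/4), so the assumed formula gives N_{e} = 400/(1 + 1/3) = 300, which is less than N_{T} = 400.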
DIPLOID MIGRATION
In this section, we provide support for the robustness of (8) and (9) by proving that (8) is a good approximation for diploid migration if N_{e} ⪢ 1. We derive exact results for conservative migration and weak- and strong-migration approximations for the general case.
We modify the model in the preceding section so that selfing is excluded and zygotes (rather than gametes) disperse, still before population regulation.
Let S_{i} designate the mean coalescence time of two distinct, homologous nucleotides chosen at random just before gametogenesis from an adult in deme i. Let T_{ij} signify the mean coalescence time of two homologous nucleotides chosen from distinct adults just before gametogenesis, one from deme i and one from deme j. A moment's reflection shows that at equilibrium,
We retain (7) and define
Conservative migration: Since the number of zygotes in each deme before migration is proportional to the number of adults, it is conservative migration, and only conservative migration, that leaves the zygotic numbers invariant. If migration is conservative, then
One can also deduce (29) by the alternative approach presented in the preceding section: approximate Equations (19b) and (20) of Nagylaki (1985) for weak mutation and sum over sites.
Weak migration: Let m represent the largest total migration rate:
Observe that (34) and (35) agree with (28) and (29), respectively, for weak conservative migration.
Strong migration: If N_{i} ⪢ 1 but m is not much less than 1, then we can assume that N_{i} → ∞ for every i with the backward migration matrix M fixed. In this limit, the result (8) indicates that we must have the asymptotic formulas
This result agrees with (8) and (9), as expected from the strong-migration limit for diploid migration in the model of infinitely many alleles (Nagylaki 1983).
DISCUSSION
We have demonstrated that, for gametic dispersion and suitable averaging, the mean intrademe coalescence time is
In contrast, the rate of gene substitution, K, is completely independent of population structure; unlike
We close this note with some remarks on effective population number. Consult Caballero (1994) and Nagylaki (1995) for additional discussion and references.
First, it must be kept in mind that the introduction of an effective population number generally does not reduce exactly a complicated model to a simpler or ideal one. At most, such reduction occurs approximately or only for certain functionals of the evolutionary process. For example, the variance effective population number
Second, although effective population numbers have usually been defined in terms of some property of the evolutionary process, they are theoretically instructive and useful only if they can be evaluated as parameters, rather than random variables that depend on that process. This has been accomplished under a wide range of assumptions for both
Third, a particular effective population number is useful only if it can be evaluated without analysis of the evolutionary process or if it predicts more than one property of that process. Again,
The migration effective population number N_{e} defined by (5) has all the desirable properties discussed above. Its evaluation from (5) is simple, explicit, and independent of the genetic model: N_{e} satisfies N_{e} ≤ N_{T} and depends only on the vector κ of demic proportions and on the unique stationary distribution ν of the Markov chain generated by the constant, ergodic backward migration matrix M. Of course, no effective population number can reduce a model of a subdivided population to that of a panmictic one. However, the above and earlier analyses (Nagylaki 1980, 1982, 1983, 1994) show that N_{e} replaces N_{T} in the strong-migration limit and in certain aspects of geographical invariance.
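To illustrate how directly N_{e} can be evaluated from κ and ν alone, the following minimal sketch computes it for an arbitrary ergodic M. The displayed form of (5) is not reproduced in this text; the sketch assumes N_{e} = N_{T}/Σ_{i}(ν_{i}^2/κ_{i}) (Nagylaki 1980), which agrees with the properties stated above: N_{e} ≤ N_{T}, with equality when ν = κ. The function names and example values are hypothetical.

```python
def stationary_distribution(M, iters=10000):
    """Approximate nu with nu^T M = nu^T (M assumed ergodic)."""
    n = len(M)
    nu = [1.0 / n] * n
    for _ in range(iters):
        nu = [sum(nu[i] * M[i][j] for i in range(n)) for j in range(n)]
    return nu


def migration_effective_number(M, N):
    """ASSUMED form of (5): N_e = N_T / sum_i(nu_i^2 / kappa_i)."""
    total = sum(N)
    kappa = [size / total for size in N]
    nu = stationary_distribution(M)
    return total / sum(v * v / k for v, k in zip(nu, kappa))


# A symmetric (hence doubly stochastic) pattern, so nu is uniform.
M = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

# Unequal demes: nu = (1/3, 1/3, 1/3) differs from kappa = (1/6, 1/3, 1/2),
# and the assumed formula gives N_e = 300 * 9/11 = 2700/11 < N_T = 300.
print(migration_effective_number(M, [50, 100, 150]))

# Equal demes: nu = kappa, so migration is conservative and N_e = N_T.
print(migration_effective_number(M, [100, 100, 100]))
```

No property of the genetic model enters the calculation; only the deme sizes and the migration matrix are required.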
Finally, it should be noted that our definition of N_{e} differs from that of the various recently introduced effective population numbers for subdivided populations (Nei and Takahata 1993; Wang 1997a,b; Whitlock and Barton 1997), which are defined in terms of the behavior of the evolutionary process.
Acknowledgments
I am very grateful to Magnus Nordborg for stimulating me to carry out this study by asking whether previous results were special cases of a general formula and for helpful comments on the manuscript. I thank Robert Griffiths for a discussion that led to the intuitive interpretation of the migration effective population number. This work was supported by National Science Foundation grant DEB-9706912.
Footnotes

Communicating editor: M. Slatkin
Received December 5, 1997.
Accepted March 16, 1998.
 Copyright © 1998 by the Genetics Society of America