The Expected Number of Heterozygous Sites in a Subdivided Population
Thomas Nagylaki
Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637
Corresponding author: Thomas Nagylaki, Department of Ecology and Evolution, University of Chicago, 1101 E. 57th St., Chicago, IL 60637.
Communicating editor: M. SLATKIN
| ABSTRACT |
|---|
A simple, exact formula is derived for the expected number of heterozygous sites per individual at equilibrium in a subdivided population. The model of infinitely many neutral sites is posited; the linkage map is arbitrary. The monoecious, diploid population is subdivided into a finite number of panmictic colonies that exchange gametes. The backward migration matrix is arbitrary, but time independent and ergodic (i.e., irreducible and aperiodic). With suitable weighting, the expected number of heterozygous sites is $4N_e u$, where $N_e$ denotes the migration effective population number and $u$ designates the total mutation rate per gene (or DNA sequence). For diploid migration, this formula is a good approximation if $N_e \gg 1$.
ONE of the most important measures of genetic variability at the molecular level is the expected number of heterozygous nucleotide sites per individual, $\bar{d}_0$. For a panmictic population at equilibrium and without selection,

$$\bar{d}_0 = 4N_e u. \tag{1}$$
Since natural populations are frequently subdivided, considerable effort has been devoted to extending (1) to subdivided populations. [...] $\bar{d}_0$ is calculated by weighting each deme by the reciprocal of the number of individuals in it; and (iv) $N_e = n\tilde{N}$, where $n$ signifies the number of demes and $\tilde{N}$ denotes the harmonic mean of the subpopulation numbers.
In this note, we prove for gametic dispersion that, for suitably weighted calculation of $\bar{d}_0$, Equation 1 holds for every linkage map if $M$ is arbitrary, but ergodic and time independent, and if $N_e$ designates the migration effective population number. For diploid migration, Equation 1 is a good approximation if $N_e \gg 1$.
| GAMETIC DISPERSION |
|---|
Generations are discrete and nonoverlapping; the monoecious, diploid population is subdivided into a finite number of panmictic colonies that exchange gametes in a fixed pattern. We apply the model of infinitely many neutral sites with an arbitrary linkage map to a gene or DNA sequence. Thus, we posit that the mutation rate per site is so low that mutation occurs at each site at most once and then only at monomorphic sites. This approximation requires that the proportion of polymorphic sites be much less than one. Let u denote the total mutation rate per gene.
At the beginning of the life cycle, every one of the Ni adults in deme i produces the same very large number of gametes, which then disperse independently. Complete random union of gametes follows. Therefore, a proportion 1/Ni of the zygotes whose gametes originate in deme i are produced by self-fertilization. Mutation is next, and finally population regulation returns the number of individuals in deme i to Ni. Thus, random genetic drift operates through population regulation.
Before deriving our results, we introduce some essential concepts and parameters.
Let $m_{ij}$ designate the probability that a gamete in deme $i$ after dispersion was produced in deme $j$. In the absence of selection, it is reasonable to assume that the backward migration matrix $M = (m_{ij})$ is constant. It is stochastic:

$$m_{ij} \ge 0, \qquad \sum_j m_{ij} = 1. \tag{2}$$
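Let $N_T$ and $\kappa_i$ represent the total population number and the proportion of adults in deme $i$, respectively:

$$N_T = \sum_i N_i, \qquad \kappa_i = \frac{N_i}{N_T}, \tag{3a}$$

$$\kappa_i > 0, \qquad \sum_i \kappa_i = 1. \tag{3b}$$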
By the ergodicity of the nonnegative stochastic matrix $M$, the eigenvalue 1 of $M$ is simple and exceeds all other eigenvalues in absolute value; we may choose the left eigenvector $\nu$ corresponding to this unit eigenvalue to have only positive components. The conditions

$$\nu_i > 0, \qquad \sum_i \nu_i = 1, \qquad \nu^T M = \nu^T \tag{4}$$

determine $\nu$ uniquely. Note that $\nu$ is the unique stationary distribution of the Markov chain with transition matrix $M$.
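The components of the last equation in (4) are

$$\nu_j = \sum_i \nu_i m_{ij}.$$

With the aid of (2), we can rewrite this as

$$\nu_j \sum_{i \ne j} m_{ji} = \sum_{i \ne j} \nu_i m_{ij},$$

which shows that $\nu$ depends only on the relative migration rates. More precisely, if we replace $m_{ij}$ by $c\,m_{ij}$ for every $i$ and $j$ such that $i \ne j$, where the constant $c$ is independent of $i$ and $j$ and the restriction $m_{ii} \ge 0$ is respected, then $\nu$ is unaltered.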
Conservative migration patterns are those that do not change the subpopulation numbers; in this case, and only in this case, we have $\nu = \kappa$.
In our results, the vectors $\kappa$ and $\nu$ enter combined in the migration effective population number $N_e$, defined by

$$N_e = \beta N_T, \qquad \beta = \left( \sum_i \frac{\nu_i^2}{\kappa_i} \right)^{-1}. \tag{5}$$
We have $\beta \le 1$ and hence $N_e \le N_T$, with equality if and only if migration is conservative.
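As a concrete illustration of (4) and (5), the following Python sketch computes the stationary distribution and the migration effective population number with exact rational arithmetic. It assumes that (5) has the form $N_e = \beta N_T$ with $\beta = (\sum_i \nu_i^2/\kappa_i)^{-1}$; the function names and the two-deme example values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly (Gauss-Jordan on Fractions)."""
    n = len(b)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [x * inv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def stationary(M):
    """Left eigenvector nu of an ergodic stochastic M, normalized to sum to 1."""
    n = len(M)
    # n - 1 components of nu^T (M - I) = 0, plus the normalization sum(nu) = 1.
    A = [[M[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])


def migration_effective_number(M, N):
    """Return (N_e, nu, kappa, beta) for backward matrix M and deme sizes N."""
    N_T = sum(N)
    kappa = [Fraction(Ni, N_T) for Ni in N]
    nu = stationary(M)
    beta = 1 / sum(v * v / k for v, k in zip(nu, kappa))
    return beta * N_T, nu, kappa, beta


# Hypothetical example: two demes with unequal sizes and asymmetric migration.
M = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(3, 10), Fraction(7, 10)]]
N = [100, 50]
N_e, nu, kappa, beta = migration_effective_number(M, N)
print("nu =", nu, " kappa =", kappa, " beta =", beta, " N_e =", N_e)
```

Because $\nu \ne \kappa$ in this example, migration is not conservative and the computed $N_e$ falls below $N_T = 150$.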
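There is a simple, intuitive interpretation of $N_e$. The probability that two distinct nucleotides, one from deme $i$ and one from deme $j$, descend from the same nucleotide in the preceding generation is $\sum_k m_{ik} m_{jk}/(2N_k)$. Averaging this with respect to the stationary distribution $\nu$ and using (4), (3a), and (5) yield

$$\sum_{i,j} \nu_i \nu_j \sum_k \frac{m_{ik} m_{jk}}{2N_k} = \sum_k \frac{\nu_k^2}{2N_k} = \frac{1}{2N_e}.$$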
We are now prepared to deduce our results.
Let $T_{ij}$ denote the mean coalescence time (in generations) of two distinct, homologous nucleotides chosen at random from adults just before gametogenesis, one from deme $i$ and one from deme $j$. At equilibrium, considering ancestry and coalescence in the preceding generation yields directly

$$T_{ij} = 1 + \sum_{k,l} m_{ik} m_{jl} T_{kl} - \sum_k \frac{m_{ik} m_{jk}}{2N_k} T_{kk}. \tag{6}$$
Clearly, (6) applies also to a model with 2Ni haploid individuals in deme i.
Define the global and local means

$$\bar{T} = \sum_{i,j} \nu_i \nu_j T_{ij}, \qquad \bar{T}_0 = \beta \sum_i \frac{\nu_i^2}{\kappa_i} T_{ii}. \tag{7}$$
Averaging (6) according to (7) and appealing to (4) and (3a) yield

$$\bar{T}_0 = 2N_e. \tag{8}$$
Therefore, summing mutations over sites shows that the expected number of nucleotide differences is

$$\bar{d}_0 = 2u\bar{T}_0 = 4N_e u. \tag{9}$$
Thus, with the weighting (7), the exact effect of population subdivision in (8) and (9) is to replace the actual total population number NT by the migration effective population number Ne.
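The invariance (8) can be checked numerically. The sketch below assumes that the recursion (6) has the form $T_{ij} = 1 + \sum_{k,l} m_{ik} m_{jl} T_{kl} - \sum_k m_{ik} m_{jk} T_{kk}/(2N_k)$ and that the local mean in (7) is $\bar{T}_0 = \beta \sum_i (\nu_i^2/\kappa_i) T_{ii}$; it solves the linear system exactly and compares $\bar{T}_0$ with $2N_e$. The three-deme matrix, the deme sizes, and all function names are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly (Gauss-Jordan on Fractions)."""
    n = len(b)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [x * inv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def stationary(M):
    """Left eigenvector nu of an ergodic stochastic M, normalized to sum to 1."""
    n = len(M)
    A = [[M[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])


def coalescence_times(M, N):
    """Solve the assumed form of (6) for all mean coalescence times T_ij."""
    n = len(N)
    idx = lambda i, j: i * n + j
    A = [[Fraction(0)] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            A[idx(i, j)][idx(i, j)] += 1
            for k in range(n):
                for l in range(n):
                    # Lineages traced to the same deme fail to coalesce
                    # with probability 1 - 1/(2 N_k).
                    c = 1 - Fraction(1, 2 * N[k]) if k == l else 1
                    A[idx(i, j)][idx(k, l)] -= M[i][k] * M[j][l] * c
    T = solve(A, [1] * (n * n))
    return [[T[idx(i, j)] for j in range(n)] for i in range(n)]


# Hypothetical nonconservative example with three demes.
F = Fraction
M = [[F(7, 10), F(2, 10), F(1, 10)],
     [F(1, 10), F(8, 10), F(1, 10)],
     [F(3, 10), F(3, 10), F(4, 10)]]
N = [4, 6, 10]

N_T = sum(N)
kappa = [F(Ni, N_T) for Ni in N]
nu = stationary(M)
beta = 1 / sum(v * v / k for v, k in zip(nu, kappa))
N_e = beta * N_T
T = coalescence_times(M, N)
T0_bar = beta * sum(nu[i] ** 2 / kappa[i] * T[i][i] for i in range(len(N)))
print("T0_bar =", T0_bar, " 2 N_e =", 2 * N_e)
```

Exact rational arithmetic is used so that the two printed values can be compared without rounding error.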
In the strong-migration limit, $N_i \to \infty$ for every $i$ with $\kappa$ and $M$ fixed. Then $T_{ij} \sim 2N_e$ for every $i$ and $j$. This demonstrates independently the asymptotic validity of the exact Equation 8 whenever migration dominates random drift.
We discuss special cases of (8) after presenting a different proof of (9).
An alternative proof of (9):
Since (8) and (9) are geographical-invariance relations, the following instructive approach is natural. Suppose the model of infinitely many alleles applies at each site $s$, with mutation rate $u_s$. (We shall let $u_s \to 0$, so the fact that there are really only four nucleotides will not matter.)
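Let $f^{(s)}_{ij}$ denote the probability that two distinct nucleotides at site $s$ chosen at random from adults just before gametogenesis, one from deme $i$ and one from deme $j$, are the same. Define the weighted means

$$\bar{f}^{(s)} = \sum_{i,j} \nu_i \nu_j f^{(s)}_{ij}, \qquad \bar{f}^{(s)}_0 = \beta \sum_i \frac{\nu_i^2}{\kappa_i} f^{(s)}_{ii}, \tag{10}$$

$$\bar{h}^{(s)}_0 = 1 - \bar{f}^{(s)}_0; \tag{11}$$

$\bar{h}^{(s)}_0$ is the weighted average nucleotide heterozygosity. At equilibrium, these satisfy the geographical-invariance relation

$$\bar{h}^{(s)}_0 = 2N_e \left[ (1 - u_s)^{-2} - 1 \right] \bar{f}^{(s)}. \tag{12}$$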
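To obtain the expected number of heterozygous sites in the model of infinitely many sites, we must let $u_s \to 0$ and sum over $s$ in (12). Evidently, $\bar{f}^{(s)} \to 1$ as $u_s \to 0$, so (12) reduces to

$$\bar{h}^{(s)}_0 \sim 4N_e u_s. \tag{13}$$

With

$$u = \sum_s u_s, \qquad \bar{d}_0 = \sum_s \bar{h}^{(s)}_0, \tag{14}$$

summing (13) over $s$ gives

$$\bar{d}_0 = 4N_e u, \tag{15}$$

in agreement with (9).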
Conservative migration:
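If migration is conservative, then $N_e = N_T$, so (9) reduces to a previously established result. In this case, $\nu = \kappa$, and hence the averages in (8) and (9) simplify to weighting by the demic proportions:

$$\bar{T}_0 = \sum_i \kappa_i T_{ii} = 2N_T, \qquad \bar{d}_0 = 4N_T u. \tag{16}$$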
Examples of conservative migration are random outbreeding and site homing [...]; $m_{ij} = \kappa_j$ corresponds to panmixia in the entire population.
Note that (16) is independent of the migration pattern, provided the latter is conservative. This raises the following apparent paradox. If there is no migration, then $T_{ii} = 2N_i$, whence

$$\sum_i \kappa_i T_{ii} = 2N_T \sum_i \kappa_i^2 < 2N_T \tag{17}$$

for more than one deme, in apparent conflict with (16).
From the formal point of view, note that if there is no migration, then $M$ is the identity matrix. Therefore, contrary to our assumption of ergodicity, $M$ is reducible and $\nu$ is undefined. Thus, (16) does not apply.
A more illuminating explanation is that, as the migration rates tend to zero, so does the probability of descent from a different deme, but the mean interdeme coalescence times ($T_{ij}$ for $i \ne j$) diverge and make a finite, positive contribution to the mean intrademe coalescence times $T_{ii}$. This behavior is exemplified by the island model with migration rate $m$: LI's (1976) solution shows that $T_{ij} = O(m^{-1})$ for $i \ne j$ and that the interdeme contribution is $O(1)$ as $m \to 0$.
Doubly stochastic backward migration matrix:
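Here we assume, in addition to (2), that

$$\sum_i m_{ij} = 1. \tag{18}$$

Then the stationary distribution is uniform,

$$\nu_i = \frac{1}{n}, \tag{19}$$

where $n$ is the number of demes, and (5) and (7) become

$$N_e = n\tilde{N}, \qquad \tilde{N} = n \left( \sum_i \frac{1}{N_i} \right)^{-1}, \tag{20a}$$

$$\bar{T}_0 = \frac{\tilde{N}}{n} \sum_i \frac{T_{ii}}{N_i}. \tag{20b}$$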
A natural subclass of doubly stochastic $M$ is homogeneous $M$: in this case, $m_{ij} = m_{i-j}$, which depends only on displacement, rather than on both the initial and final positions. Examples are the island and circular stepping-stone models, but, as observed above, these migration patterns are also conservative.
Symmetric $M$ is another subclass of doubly stochastic $M$. In this case, the formula $\bar{d}_0 = 4n\tilde{N}u$ was derived previously.
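The doubly stochastic case lends itself to a short numerical check. The Python sketch below assumes the definition $N_e = N_T / \sum_i (\nu_i^2/\kappa_i)$ and verifies, for a hypothetical symmetric circular stepping-stone matrix with unequal deme sizes, that the uniform distribution is stationary and that $N_e$ equals $n$ times the harmonic mean of the $N_i$. All names and numbers are illustrative.

```python
from fractions import Fraction

F = Fraction

# Hypothetical symmetric (hence doubly stochastic) backward matrix on a ring
# of four demes: each deme draws 1/5 of its gametes from each neighbor.
M = [[F(3, 5), F(1, 5), F(0), F(1, 5)],
     [F(1, 5), F(3, 5), F(1, 5), F(0)],
     [F(0), F(1, 5), F(3, 5), F(1, 5)],
     [F(1, 5), F(0), F(1, 5), F(3, 5)]]
N = [10, 20, 40, 80]  # unequal deme sizes, so migration is not conservative
n = len(N)
N_T = sum(N)

# Rows and columns both sum to 1.
rows_ok = all(sum(row) == 1 for row in M)
cols_ok = all(sum(M[i][j] for i in range(n)) == 1 for j in range(n))

# The uniform vector is then stationary: nu^T M = nu^T.
nu = [F(1, n)] * n
nu_M = [sum(nu[i] * M[i][j] for i in range(n)) for j in range(n)]

# Migration effective number from the assumed form of (5).
kappa = [F(Ni, N_T) for Ni in N]
beta = 1 / sum(v * v / k for v, k in zip(nu, kappa))
N_e = beta * N_T

# Harmonic mean of the deme sizes.
N_tilde = n / sum(F(1, Ni) for Ni in N)

print("N_e =", N_e, " n * N_tilde =", n * N_tilde, " N_T =", N_T)
```

Here $N_e = 256/3 \approx 85.3$, well below $N_T = 150$, because the harmonic mean is dominated by the small demes.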
Two demes:
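Parametrizing $M$ as

$$M = \begin{pmatrix} 1 - m_1 & m_1 \\ m_2 & 1 - m_2 \end{pmatrix}, \tag{21}$$

we find from (4) that

$$\nu = \frac{1}{m_1 + m_2} \begin{pmatrix} m_2 \\ m_1 \end{pmatrix}, \tag{22}$$

so (5) and (9) give

$$\bar{d}_0 = 4N_e u, \qquad N_e = \frac{N_1 N_2 (m_1 + m_2)^2}{N_1 m_1^2 + N_2 m_2^2}. \tag{23}$$

We can confirm (23) by calculating $\bar{d}_0$ from NOTOHARA's (1990, p. 69) formulas for $T_{ij}$.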
Migration is conservative if $\nu = \kappa$, which is equivalent to $N_1 m_1 = N_2 m_2$. This condition means that the same number of individuals migrate from deme 1 to deme 2 as vice versa.
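For two demes, the effective number can be evaluated in closed form. The sketch below assumes the parametrization $m_{12} = m_1$, $m_{21} = m_2$ and the resulting expression $N_e = N_1 N_2 (m_1 + m_2)^2 / (N_1 m_1^2 + N_2 m_2^2)$, which follows from $\nu = (m_2, m_1)/(m_1 + m_2)$; the function name and the numerical values are hypothetical.

```python
from fractions import Fraction


def two_deme_effective_number(N1, N2, m1, m2):
    """Migration effective number for two demes (assumed closed form)."""
    return N1 * N2 * (m1 + m2) ** 2 / (N1 * m1 ** 2 + N2 * m2 ** 2)


N1, N2 = 100, 50

# Nonconservative migration: N1 * m1 != N2 * m2, so N_e < N_T.
Ne_general = two_deme_effective_number(N1, N2, Fraction(1, 10), Fraction(3, 10))

# Conservative migration: N1 * m1 == N2 * m2, so N_e == N_T.
Ne_conservative = two_deme_effective_number(N1, N2, Fraction(1, 10), Fraction(1, 5))

print("nonconservative N_e =", Ne_general)
print("conservative    N_e =", Ne_conservative, "(N_T =", N1 + N2, ")")
```

The conservative case recovers the total population number exactly, which illustrates that the reduction below $N_T$ arises only when the migrant numbers in the two directions differ.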
| DIPLOID MIGRATION |
|---|
In this section, we provide support for the robustness of (8) and (9) by proving that (8) is a good approximation for diploid migration if $N_e \gg 1$. We derive exact results for conservative migration and weak- and strong-migration approximations for the general case.
We modify the model in the preceding section so that selfing is excluded and zygotes (rather than gametes) disperse, still before population regulation.
Let $S_i$ designate the mean coalescence time of two distinct, homologous nucleotides chosen at random just before gametogenesis from an adult in deme $i$. Let $T_{ij}$ signify the mean coalescence time of two homologous nucleotides chosen from distinct adults just before gametogenesis, one from deme $i$ and one from deme $j$. A moment's reflection shows that at equilibrium,

$$S_i = 1 + \sum_k m_{ik} T_{kk}, \tag{24a}$$

$$T_{ij} = 1 + \sum_{k,l} m_{ik} m_{jl} T_{kl} + \sum_k \frac{m_{ik} m_{jk}}{N_k} \left( \tfrac{1}{2} S_k - T_{kk} \right). \tag{24b}$$
We retain (7) and define

$$\bar{S}_0 = \beta \sum_i \frac{\nu_i^2}{\kappa_i} S_i. \tag{25}$$

Averaging (24b) and using (4), (3a), (5), and (25), we obtain the invariance formula

$$2\bar{T}_0 - \bar{S}_0 = 2N_e. \tag{26}$$
Conservative migration:
Since the number of zygotes in each deme before migration is proportional to the number of adults, it is conservative migration, and only conservative migration, that leaves the zygotic numbers invariant. If migration is conservative, then $\bar{S}_0$ and $\bar{T}_0$ are averaged with respect to $\kappa$, as in (16). Therefore, averaging (24a) yields

$$\bar{S}_0 = 1 + \bar{T}_0. \tag{27}$$

Recalling that $N_e = N_T$ and solving (26) and (27) simultaneously, we find

$$\bar{S}_0 = 2(N_T + 1), \qquad \bar{T}_0 = 2N_T + 1. \tag{28}$$
Summing mutations over sites, we find that the expected numbers of nucleotide differences between two homologous genes in the same individual and in different individuals in the same deme are, respectively,

$$2u\bar{S}_0 = 4(N_T + 1)u, \tag{29a}$$

$$2u\bar{T}_0 = 2(2N_T + 1)u. \tag{29b}$$

If $N_T \gg 1$, then (28) and (29) are very close to (8) and (9), respectively.
One can also deduce (29) by the alternative approach presented in the preceding section.
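The exact results for conservative diploid migration can be illustrated numerically. The sketch below assumes that the recursions (24) take the form $S_i = 1 + \sum_k m_{ik} T_{kk}$ and $T_{ij} = 1 + \sum_{k,l} m_{ik} m_{jl} T_{kl} + \sum_k (m_{ik} m_{jk}/N_k)(S_k/2 - T_{kk})$, solves them exactly for a hypothetical conservative two-deme pattern ($N_1 m_1 = N_2 m_2$), and averages with respect to $\kappa$. Function names and parameter values are illustrative.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly (Gauss-Jordan on Fractions)."""
    n = len(b)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [x * inv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def diploid_coalescence_times(M, N):
    """Solve the assumed form of (24) for S_i and T_ij (zygote dispersal)."""
    n = len(N)
    size = n + n * n               # unknowns: S_0..S_{n-1}, then all T_ij
    t = lambda i, j: n + i * n + j
    A = [[Fraction(0)] * size for _ in range(size)]
    for i in range(n):             # S_i - sum_k m_ik T_kk = 1
        A[i][i] += 1
        for k in range(n):
            A[i][t(k, k)] -= M[i][k]
    for i in range(n):
        for j in range(n):
            row = A[t(i, j)]
            row[t(i, j)] += 1
            for k in range(n):
                for l in range(n):
                    row[t(k, l)] -= M[i][k] * M[j][l]
                w = M[i][k] * M[j][k] / N[k]
                row[k] -= w / 2    # coefficient of S_k
                row[t(k, k)] += w  # correction to the coefficient of T_kk
    x = solve(A, [1] * size)
    S = x[:n]
    T = [[x[t(i, j)] for j in range(n)] for i in range(n)]
    return S, T


# Hypothetical conservative pattern: N1 * m1 = 4 * (1/10) = 2 * (1/5) = N2 * m2.
F = Fraction
M = [[F(9, 10), F(1, 10)],
     [F(1, 5), F(4, 5)]]
N = [4, 2]
N_T = sum(N)
kappa = [F(Ni, N_T) for Ni in N]

S, T = diploid_coalescence_times(M, N)
S0_bar = sum(kappa[i] * S[i] for i in range(len(N)))
T0_bar = sum(kappa[i] * T[i][i] for i in range(len(N)))
print("S0_bar =", S0_bar, " T0_bar =", T0_bar)
```

With $N_T = 6$, the averages come out as $\bar{T}_0 = 2N_T + 1 = 13$ and $\bar{S}_0 = 2(N_T + 1) = 14$ under these assumed recursions.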
Weak migration:
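Let $m$ represent the largest total migration rate:

$$m = \max_i \sum_{j \ne i} m_{ij}. \tag{30}$$

We derive an approximation for $m \ll 1$ and arbitrary subpopulation numbers $N_i$. From (2) and (30) we see that

$$m_{ij} = \delta_{ij} + O(m) \tag{31}$$

as $m \to 0$, where $\delta_{ij}$ denotes the Kronecker delta ($\delta_{ii} = 1$, and $\delta_{ij} = 0$ if $i \ne j$). Substituting (31) into (24a) gives

$$S_i \approx 1 + T_{ii}, \tag{32}$$

and averaging (32) according to (7) and (25) yields

$$\bar{S}_0 \approx 1 + \bar{T}_0. \tag{33}$$

The joint solution of (26) and (33) is our weak-migration approximation:

$$\bar{S}_0 \approx 2(N_e + 1), \tag{34a}$$

$$\bar{T}_0 \approx 2N_e + 1. \tag{34b}$$

Summing mutations over sites, we obtain

$$2u\bar{S}_0 \approx 4(N_e + 1)u, \tag{35a}$$

$$2u\bar{T}_0 \approx 2(2N_e + 1)u \tag{35b}$$

as $m \to 0$. Observe that (34) and (35) agree with (28) and (29), respectively, for weak conservative migration.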
Strong migration:
If $N_i \gg 1$ but $m \not\ll 1$, then we can assume that $N_i \to \infty$ for every $i$ with the backward migration matrix $M$ fixed. In this limit, the result (8) indicates that we must have the asymptotic formulas

$$S_i \sim N_e S^*_i, \qquad T_{ij} \sim N_e T^*_{ij} \tag{36}$$

as $N_e \to \infty$, where $S^*_i$ and $T^*_{ij}$ are independent of $N_e$. The leading terms in (24b) yield

$$T^*_{ij} = \sum_{k,l} m_{ik} m_{jl} T^*_{kl}. \tag{37}$$

By the ergodicity of $M$, the nonnegative, stochastic Kronecker-product matrix $M \otimes M$ in (37) has a simple maximal eigenvalue 1, and the corresponding right eigenvector has equal components. Therefore, $T^*_{ij} = T^*$, independent of $i$ and $j$, and hence (24a) implies that $S^*_i = T^*$. From (26) we get

$$S_i \sim 2N_e, \qquad T_{ij} \sim 2N_e \tag{38}$$

as $N_e \to \infty$ with $M$ fixed.
This result agrees with (8) and (9), as expected from the strong-migration limit for diploid migration in the model of infinitely many alleles.
| DISCUSSION |
|---|
We have demonstrated that, for gametic dispersion and suitable averaging, the mean intrademe coalescence time is $\bar{T}_0 = 2N_e$, and the expected number of heterozygous nucleotide sites is $\bar{d}_0 = 4N_e u$, where $N_e$ and $u$ denote the migration effective population number (5) and the mutation rate per gene, respectively. If $N_e \gg 1$, these formulas are good approximations for diploid migration. Thus, for the simple functionals $\bar{T}_0$ and $\bar{d}_0$, population subdivision can be taken into account by replacing the actual total population number $N_T$ by $N_e$. This reduction generally fails for higher moments of $T_0$ and $d_0$ and for measures of genetic variability other than $\bar{d}_0$, such as the expected number of segregating sites.
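In contrast, the rate of gene substitution, $K$, is completely independent of population structure; unlike $\bar{T}_0$ and $\bar{d}_0$, it does not depend even on the stationary distribution $\nu$ and the deme proportions $\kappa$. To see this, note first that the fixation probability of a nucleotide with initial frequency $p_i$ in deme $i$ is its weighted average initial frequency,

$$P = \sum_i \nu_i p_i. \tag{39}$$

For a single mutant in deme $j$, we have $p_j = 1/(2N_j)$ and $p_i = 0$ for $i \ne j$; so $P = \nu_j/(2N_j)$. Since a mutant appears in deme $j$ with probability $\kappa_j$, the unconditional fixation probability is

$$\sum_j \kappa_j \frac{\nu_j}{2N_j} = \frac{1}{2N_T}. \tag{40}$$

Since $2N_T u$ mutants appear per generation, we conclude that

$$K = u. \tag{41}$$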
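The independence of $K$ from population structure can be seen in a few lines of Python. The sketch assumes the fixation probability $\nu_j/(2N_j)$ for a single mutant in deme $j$ and $2N_T u$ new mutants per generation; the function name, the deme sizes, and the two trial stationary distributions are hypothetical.

```python
from fractions import Fraction


def substitution_rate(nu, N, u):
    """Rate of gene substitution K for stationary distribution nu, deme sizes N."""
    N_T = sum(N)
    kappa = [Fraction(Ni, N_T) for Ni in N]
    # Fixation probability of one new mutant, averaged over its deme of origin.
    p_fix = sum(k * v / (2 * Ni) for k, v, Ni in zip(kappa, nu, N))
    # 2 * N_T * u new mutants arise per generation.
    return 2 * N_T * u * p_fix


u = Fraction(1, 10**6)
N = [100, 50, 25]

# Two quite different hypothetical stationary distributions give the same K.
K_a = substitution_rate([Fraction(1, 3)] * 3, N, u)
K_b = substitution_rate([Fraction(7, 10), Fraction(2, 10), Fraction(1, 10)], N, u)
print("K_a =", K_a, " K_b =", K_b, " u =", u)
```

The cancellation occurs because $\kappa_j/N_j = 1/N_T$ for every deme, so only the normalization $\sum_j \nu_j = 1$ matters.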
We close this note with some remarks on effective population number.
First, it must be kept in mind that the introduction of an effective population number generally does not reduce exactly a complicated model to a simpler or ideal one. At most, such reduction occurs approximately or only for certain functionals of the evolutionary process. For example, the variance effective population number $N_e^{(v)}$ for a panmictic dioecious population is a parameter (rather than a random variable) only in the diffusion approximation, and it is only in this approximation that it reduces a dioecious model to a monoecious one.
Second, although effective population numbers have usually been defined in terms of some property of the evolutionary process, they are theoretically instructive and useful only if they can be evaluated as parameters, rather than random variables that depend on that process. This has been accomplished under a wide range of assumptions for both $N_e^{(v)}$ and the inbreeding effective population number $N_e^{(i)}$.
Third, a particular effective population number is useful only if it can be evaluated without analysis of the evolutionary process or if it predicts more than one property of that process. Again, $N_e^{(v)}$ and $N_e^{(i)}$ satisfy this criterion.
The migration effective population number $N_e$ defined by (5) has all the desirable properties discussed above. Its evaluation from (5) is simple, explicit, and independent of the genetic model: $N_e$ satisfies $N_e \le N_T$ and depends only on the vector $\kappa$ of demic proportions and on the unique stationary distribution $\nu$ of the Markov chain generated by the constant, ergodic backward migration matrix $M$. Of course, no effective population number can reduce a model of a subdivided population to that of a panmictic one. However, the above and earlier analyses [...]
Finally, it should be noted that our definition of $N_e$ differs from that of the various recently introduced effective population numbers for subdivided populations.
| ACKNOWLEDGMENTS |
|---|
I am very grateful to MAGNUS NORDBORG for stimulating me to carry out this study by asking whether previous results were special cases of a general formula and for helpful comments on the manuscript. I thank ROBERT GRIFFITHS for a discussion that led to the intuitive interpretation of the migration effective population number. This work was supported by National Science Foundation grant DEB-9706912.
Manuscript received December 5, 1997; Accepted for publication March 16, 1998.
| LITERATURE CITED |
|---|
CABALLERO, A., 1994 Developments in the prediction of effective population size. Heredity 73:657-679.
CABALLERO, A. and W. C. HILL, 1992 A note on the inbreeding effective population size. Evolution 46:1969-1972.
CROW, J. F. and C. DENNISTON, 1988 Inbreeding and variance effective population numbers. Evolution 42:482-495.
EWENS, W. J., 1974 A note on the sampling theory for infinite alleles and infinite sites models. Theoret. Pop. Biol. 6:143-148.
EWENS, W. J., 1979 Mathematical Population Genetics. Springer-Verlag, Berlin.
EWENS, W. J., 1982 On the concept of effective population size. Theoret. Pop. Biol. 21:373-378.
FELLER, W., 1968 An Introduction to Probability Theory and Its Applications, Vol. I, Ed. 3. Wiley, New York.
GANTMACHER, F. R., 1959 The Theory of Matrices, Vol. II. Chelsea, New York.
GRIFFITHS, R. C., 1981 The number of heterozygous loci between two randomly chosen completely linked sequences of loci in two subdivided population models. J. Math. Biol. 12:251-261.
HERBOTS, H. M., 1994 Stochastic models in population genetics: genealogy and genetic differentiation in structured populations. Dissertation, University of London.
HEY, J., 1991 A multi-dimensional coalescent process applied to multi-allelic selection models and migration models. Theoret. Pop. Biol. 39:30-48.
KIMURA, M., 1968 Evolutionary rate at the molecular level. Nature 217:624-626.
KIMURA, M., 1969 The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61:893-903
KIMURA, M. and J. F. CROW, 1964 The number of alleles that can be maintained in a finite population. Genetics 49:725-738
LI, W.-H., 1976 Distribution of nucleotide differences between two randomly chosen cistrons in a subdivided population: the finite island model. Theoret. Pop. Biol. 10:303-308.
MALÉCOT, G., 1946 La consanguinité dans une population limitée. C.R. Acad. Sci. Paris 22:841-843.
MALÉCOT, G., 1948 Les mathématiques de l'hérédité. Masson, Paris.
MALÉCOT, G., 1951 Un traitement stochastique des problèmes linéaires (mutation, linkage, migration) en Génétique de Population. Ann. Univ. Lyon Sci. Sec. A 14:79-117.
NAGYLAKI, T., 1980 The strong-migration limit in geographically structured populations. J. Math. Biol. 9:101-114.
NAGYLAKI, T., 1982 Geographical invariance in population genetics. J. Theoret. Biol. 99:159-172.
NAGYLAKI, T., 1983 The robustness of neutral models of geographical variation. Theoret. Pop. Biol. 24:268-294.
NAGYLAKI, T., 1985 Homozygosity, effective number of alleles, and interdeme differentiation in subdivided populations. Proc. Natl. Acad. Sci. USA 82:8611-8613
NAGYLAKI, T., 1986 Neutral models of geographical variation, pp. 216-237 in Stochastic Spatial Processes, edited by P. TAUTU. Springer, Berlin.
NAGYLAKI, T., 1992 Introduction to Theoretical Population Genetics. Springer-Verlag, Berlin.
NAGYLAKI, T., 1994 Geographical variation in a quantitative character. Genetics 136:361-381.
NAGYLAKI, T., 1995 The inbreeding effective population number in dioecious populations. Genetics 139:473-485.
NATH, H. B. and R. C. GRIFFITHS, 1993 The coalescent in two colonies with symmetric migration. J. Math. Biol. 31:841-852.
NEI, M. and N. TAKAHATA, 1993 Effective population size, genetic diversity, and coalescence time in subdivided populations. J. Mol. Evol. 37:240-244.
NORDBORG, M., 1997 Structured coalescent processes on different time scales. Genetics 146:1501-1514.
NOTOHARA, M., 1990 The coalescent and the genealogical process in geographically structured population. J. Math. Biol. 29:59-75.
NOTOHARA, M., 1993 The strong-migration limit for the genealogical process in geographically structured populations. J. Math. Biol. 31:115-122.
NOTOHARA, M., 1997 The number of segregating sites in a sample of DNA sequences from a geographically structured population. J. Math. Biol. 36:188-200.
SLATKIN, M., 1987 The average number of sites separating DNA sequences drawn from a subdivided population. Theoret. Pop. Biol. 32:42-49.
STROBECK, C., 1987 Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149-153
TAJIMA, F., 1989 DNA polymorphism in a subdivided population: the expected number of segregating sites in the two-subpopulation model. Genetics 123:229-240
TAKAHATA, N., 1988 The coalescent in two partially isolated diffusion populations. Genet. Res. 52:213-222.
WANG, J., 1997a Effective size and F-statistics of subdivided populations. I. Monoecious species with partial selfing. Genetics 146:1453-1463.
WANG, J., 1997b Effective size and F-statistics of subdivided populations. II. Dioecious species. Genetics 146:1465-1474.
WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theoret. Pop. Biol. 7:256-276.
WHITLOCK, M. C. and N. H. BARTON, 1997 The effective size of a subdivided population. Genetics 146:427-441.
WRIGHT, S., 1948 Genetics of populations, pp. 111-112 in Encyclopaedia Britannica, Vol. 10, Ed. 14. Encyclopaedia Britannica, Chicago.