The Effect of Population History on the Lengths of Ancestral Chromosome Segments
Nicola H. Chapman(a) and Elizabeth A. Thompson(b)
(a) Division of Medical Genetics, University of Washington, Seattle, Washington 98195
(b) Department of Statistics, University of Washington, Seattle, Washington 98195
Corresponding author: Elizabeth A. Thompson, University of Washington, Box 354322, Seattle, WA 98195. E-mail: thompson{at}stat.washington.edu
Communicating editor: M. VEUILLE
| ABSTRACT |
|---|
An isolated population is a group of individuals who are descended from a founding population who lived some time ago. If the founding individuals are assumed to be noninbred and unrelated, a chromosome sampled from the population can be represented as a mosaic of segments of the original ancestral types. A population in which chromosomes are made up of a few long segments will exhibit linkage disequilibrium due to founder effect over longer distances than a population in which the chromosomes are made up of many short segments. We study the length of intact ancestral segments by obtaining the expected number of junctions (points where DNA of two distinct ancestral types meet) in a chromosome. Assuming random mating, we study analytically the effects of population age, growth patterns, and internal structure on the expected number of junctions in a chromosome. We demonstrate that the type of growth a population has experienced can influence the expected number of junctions, as can population subdivision. These effects are substantial only when population sizes are very small. We also develop an approximation to the variance of the number of junctions and show that the variance is large.
An isolated population is one that is descended from a small group of individuals (founders) and in which population growth is due almost exclusively to births within the population rather than to immigration from outside. Interest in the genetics of isolated populations has recently been revived among human geneticists because of suggestions that such populations may be useful for disequilibrium-based mapping of susceptibility loci for complex diseases. In particular, it is hoped that diseases for which there are several susceptibility loci in large outbred populations may be more homogeneous in small isolated populations. In addition, small recently founded populations may exhibit linkage disequilibrium over longer genetic distances than large outbred populations.
Isolated populations are fundamentally different from the large outbred populations that are usually assumed in the theoretical study of linkage disequilibrium, and they may differ from one another in several aspects of their history. Populations are founded at different times by founder groups of different sizes, experience different growth patterns, and may have varying levels of internal subdivision.
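We study the effects of population history on the number of junctions existing in a chromosome sampled from an isolated population. A junction is a point on the chromosome where DNA from two distinct ancestral chromosomes meet. A chromosome of length $L$ morgans that carries $J$ junctions consists of $J + 1$ contiguous ancestral segments, whose average length is $L/(J + 1)$. Because $1/(J + 1)$ is convex in $J$, Jensen's inequality gives

$$E\left[\frac{L}{J + 1}\right] \ge \frac{L}{E[J] + 1}. \qquad (1)$$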
Thus by obtaining the expected number of junctions in a length of chromosome, we obtain the expected number of contiguous segments and therefore a lower bound on their expected length.
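The lower bound rests on two facts: $J$ junctions cut a chromosome of length $L$ into $J + 1$ segments of average length $L/(J + 1)$, and because $1/(J + 1)$ is convex in $J$, Jensen's inequality gives $E[L/(J + 1)] \ge L/(E[J] + 1)$. The short Python sketch below checks this numerically. Purely for illustration it assumes $J$ is Poisson with mean $\lambda$, for which $E[L/(J + 1)] = L(1 - e^{-\lambda})/\lambda$ in closed form; the bound itself needs no distributional assumption. The helper names (`poisson_sample`, `segment_length_bound`, `mean_segment_length_mc`) are hypothetical.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw a Poisson(mean) variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def segment_length_bound(L, expected_junctions):
    """Lower bound L / (E[J] + 1) on the expected average segment length."""
    return L / (expected_junctions + 1)


def mean_segment_length_mc(L, mean_j, reps, rng):
    """Monte Carlo estimate of E[L / (J + 1)], assuming J ~ Poisson(mean_j) for illustration."""
    total = 0.0
    for _ in range(reps):
        total += L / (poisson_sample(mean_j, rng) + 1)
    return total / reps


rng = random.Random(2002)
L, mean_j = 1.0, 10.0  # a 1-morgan chromosome carrying 10 junctions on average
bound = segment_length_bound(L, mean_j)
estimate = mean_segment_length_mc(L, mean_j, 50_000, rng)
exact = L * (1 - math.exp(-mean_j)) / mean_j  # closed form of E[L/(J+1)] for Poisson J
print(f"lower bound {bound:.4f} <= E[L/(J+1)] ~ {estimate:.4f} (exact {exact:.4f})")
```

The gap between the bound and the true expectation grows with the variance of $J$, which is one reason the variance of the junction count matters.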
A junction is formed when a crossover occurs between two chromosomes, at a point where they are not descendants of the same ancestral chromosome. That is, the chromosomes are not identical by descent (IBD) at that point. Once a junction is formed, it is transmitted as is any other gene (according to the laws of Mendelian inheritance). Since IBD is defined relative to some ancestral population, and since junctions require non-IBD to be formed, junctions are also defined relative to some ancestral population. In this article, junctions are defined relative to the founding generation; that is, this generation is assumed to consist of noninbred, unrelated individuals.
Some analogous questions regarding the lengths, number, and ancestral origins of chromosome segments have recently been considered by other authors.
By contrast, in this article we consider IBD relative to a founder population at some defined time point in the past and the shorter-term effects of population structure. We study the formation and transmission of junctions in random-mating subdivisions of a monoecious population with discrete generations. We assume that during gamete formation, crossover events along the chromosome happen according to a Poisson process, which has rate one per morgan. This implies that the number of crossover events in a chromosome of length L has a Poisson distribution with mean L. The age of the population is assumed known, as is the size of the population at each generation. In subdivided populations, the generation of the split(s) and the sizes of the subpopulations are assumed known. We first present some theoretical results, including an expression for the expected number of junctions per morgan existing on a chromosome randomly sampled from a particular generation and two approximations to the variance of this quantity. We then apply these results to some example populations to illustrate the effects of population size, type of growth, and subdivision.
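The crossover model assumed here is easy to simulate: with rate one per morgan, the gaps between successive crossovers are independent exponential variables with mean 1 M, so the number of crossovers on a chromosome of length $L$ is Poisson with mean $L$. A minimal Python sketch (the function name is hypothetical):

```python
import random


def crossover_positions(L, rng):
    """Crossover points on a chromosome of length L morgans.

    Modelled as a Poisson process with rate 1 per morgan: gaps between
    successive crossovers are exponential with mean 1 morgan.
    """
    positions = []
    x = rng.expovariate(1.0)
    while x < L:
        positions.append(x)
        x += rng.expovariate(1.0)
    return positions


rng = random.Random(7)
L = 1.5
samples = [crossover_positions(L, rng) for _ in range(20_000)]
counts = [len(s) for s in samples]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean count {mean:.3f}, variance {var:.3f}; both should be near L = {L}")
```

Sampling exponential gaps avoids drawing the Poisson count first and then placing points uniformly, although the two constructions give the same process.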
| THEORETICAL DEVELOPMENT |
|---|
Mean number of junctions
Let $J_t$ be the number of junctions present on a chromosome of length $L$, sampled at random from a population at generation $t$. Let $\mathbf{n} = \{n_0, n_1, \ldots, n_{t-1}\}$, where $n_j$ denotes the number of junctions formed in meioses from generation $j$. Finally, let $I_t(k, j) = 1$ if the $k$th junction formed in meioses from generation $j$ is present on the chromosome selected at time $t$, and let $I_t(k, j) = 0$ otherwise. Then, as a function of $\mathbf{n}$,

$$J_t = \sum_{j=0}^{t-1} \sum_{k=1}^{n_j} I_t(k, j).$$
Taking the expectation conditional on $\mathbf{n}$,

$$E[J_t \mid \mathbf{n}] = \sum_{j=0}^{t-1} \sum_{k=1}^{n_j} E[I_t(k, j)].$$
Now $E[I_t(k, j)]$ is equal to the probability that junction $k$ from generation $j$ is present on the selected chromosome. Let $l$ denote the locus where junction $k$ formed, and consider the population at generation $j + 1$. One can think of locus $l$ as having two alleles: one is junction $k$, and the other is not junction $k$. The frequency of $k$ in generation $j + 1$ is exactly $1/(2N_{j+1})$, where $N_{j+1}$ is the population size in generation $j + 1$ and is assumed known for all $j$. In a random-mating population, each of the $2N_{j+1}$ genes at locus $l$ in generation $j + 1$ is equally likely to be the ancestor of locus $l$ in the randomly selected chromosome. Therefore $E[I_t(k, j)]$, the probability that junction $k$ from generation $j$ is present on the selected chromosome, is equal to $1/(2N_{j+1})$, and thus

$$E[J_t \mid \mathbf{n}] = \sum_{j=0}^{t-1} \frac{n_j}{2N_{j+1}}.$$
Taking the expectation again,

$$E[J_t] = \sum_{j=0}^{t-1} \frac{E[n_j]}{2N_{j+1}}, \qquad (2)$$

and so we require $E[n_j]$.
Calculation of $E[n_j]$:
Let $H_j(p)$ denote the proportion of the chromosome that is non-IBD in individual $p$ of generation $j$. Then

$$H_j(p) = \frac{1}{L} \int_0^L \delta_p(x)\, dx,$$

where $L$ denotes the length of the chromosome in morgans, and $\delta_p(x) = 1$ if the two haplotypes of individual $p$ are non-IBD at point $x$ on the chromosome, $\delta_p(x) = 0$ otherwise. Thus

$$E[H_j(p)] = \frac{1}{L} \int_0^L E[\delta_p(x)]\, dx = h_j, \qquad (3)$$

where $h_j$ is the probability of non-IBD at a particular locus between the two haplotypes of an individual in generation $j$. Now $n_j = \sum_{m=1}^{2N_{j+1}} X_j(m)$, where $X_j(m)$ denotes the number of junctions formed in meiosis $m$ from generation $j$. Since crossovers happen along the chromosome according to a Poisson process with rate one per morgan, conditional on $H_j(p_m)$, $X_j(m)$ has a Poisson distribution with mean $H_j(p_m)L$, where $p_m$ denotes the parent of meiosis $m$. Therefore

$$E[X_j(m)] = E\big[E[X_j(m) \mid H_j(p_m)]\big] = L\, E[H_j(p_m)] = L h_j,$$

since the parent is simply a randomly chosen individual from generation $j$, and by Equation 3. Then

$$E[n_j] = 2N_{j+1} h_j L. \qquad (4)$$
Substituting Equation 4 into Equation 2, we obtain

$$E[J_t] = L \sum_{j=0}^{t-1} h_j. \qquad (5)$$
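For the random-mating population considered here, $h_0 = 1$ and $h_{j+1} = \left(1 - \frac{1}{2N_j}\right) h_j$, so that $h_j = \prod_{i=0}^{j-1} \left(1 - \frac{1}{2N_i}\right)$.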
Equation 5 demonstrates that population history affects the expected number of junctions in a chromosome through the probability of non-IBD in each generation. This implies that in a large population where hj remains close to one over many generations, the number of generations since the founding of the population is the most important factor in determining the expected number of junctions and therefore the lower bound on the expected length of intact ancestral segments. Growth patterns that result in small population sizes over long periods of time will result in the accumulation of IBD, and as a result fewer junctions will be expected in chromosomes from such populations. Similarly, chromosomes from populations in which there is extensive subdivision will be expected to carry fewer junctions and therefore have longer intact ancestral segments.
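Equation 5 is straightforward to evaluate once the $h_j$ are known. The Python sketch below assumes the standard monoecious random-mating recursion $h_0 = 1$, $h_{j+1} = (1 - 1/(2N_j))h_j$, which gives $h_t = (1 - (2N)^{-1})^t$ at constant size as quoted later in this article; the function name and argument layout are our own.

```python
def expected_junctions(sizes, t, L=1.0):
    """E[J_t] = L * (h_0 + ... + h_{t-1}), i.e. Equation 5.

    sizes[j] is the population size N_j of generation j (len(sizes) >= t).
    Assumes h_0 = 1 and h_{j+1} = (1 - 1/(2 N_j)) * h_j.
    """
    h, total = 1.0, 0.0
    for j in range(t):
        total += h                         # accumulate h_j
        h *= 1.0 - 1.0 / (2.0 * sizes[j])  # advance to h_{j+1}
    return L * total


t = 100
for N in (20, 100, 10**6):
    print(N, round(expected_junctions([N] * t, t), 2))
```

For very large $N$ the result approaches $tL$, the infinite-population value, while small constant sizes cap it well below that.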
Variance of the number of junctions
Recall that ni denotes the total number of junctions formed in all meioses from generation i.
Poisson approximation: We first consider a variance approximation on the basis of some simplifying assumptions. Specifically, suppose that
1. $n_i$ has a Poisson distribution with mean $2N_{i+1} h_i L$.
2. $n_i$ is independent of $n_j$ for all $i \ne j$.
3. The presence of any one junction in the sampled chromosome from generation $t$ is independent of the presence of any other junction in that chromosome. That is, Pr(junction $k$ formed in a meiosis from generation $i$ exists in the chromosome sampled at generation $t$ | junction $l$ formed in a meiosis from generation $j$ exists in the chromosome sampled at generation $t$) = Pr(junction $k$ formed in a meiosis from generation $i$ exists in the chromosome sampled at generation $t$), for any $k$, $l$, $i$, and $j$, where $k \ne l$ if $i = j$.
Let $J_t(i)$ denote the number of junctions formed in generation $i$ that exist in the randomly sampled chromosome from generation $t$. Then assumption 1, together with the fact that the probability that a junction formed in a meiosis from generation $i$ exists in the chromosome sampled at generation $t$ equals $1/(2N_{i+1})$, implies that $J_t(i)$ has a Poisson distribution with mean $h_i L$, for $0 \le i \le t - 1$. Furthermore, assumptions 2 and 3 imply that $J_t(i)$ is independent of $J_t(j)$ for $i \ne j$. Therefore

$$\operatorname{Var}(J_t) = \sum_{i=0}^{t-1} \operatorname{Var}(J_t(i)) = \sum_{i=0}^{t-1} h_i L = E[J_t].$$

For the Poisson distribution the variance is equal to the mean, so this variance can be calculated using Equation 5.
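Assumptions 1 and 3 together amount to Poisson thinning: a Poisson($2N_{i+1}h_iL$) number of junctions, each surviving to the sampled chromosome independently with probability $1/(2N_{i+1})$, leaves a Poisson($h_iL$) number. The quick Python simulation below (hypothetical helper names, illustrative parameter values) confirms that the mean and variance then agree.

```python
import math
import random


def poisson_sample(mean, rng):
    """Poisson(mean) variate by Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def surviving_junctions(N_next, h, L, rng):
    """J_t(i) under assumptions 1 and 3: thin a Poisson(2 N h L) count by 1/(2N)."""
    formed = poisson_sample(2 * N_next * h * L, rng)  # n_i, assumption 1
    keep = 1.0 / (2 * N_next)                         # survival probability of each junction
    return sum(1 for _ in range(formed) if rng.random() < keep)  # independent, assumption 3


rng = random.Random(3)
N_next, h, L = 20, 0.8, 1.0
draws = [surviving_junctions(N_next, h, L, rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean {mean:.3f}, variance {var:.3f}, h*L = {h * L}")
```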
The above assumptions do not generally hold. Assumption 1 would hold if all of the individuals in generation i had the same proportion hi of their genome non-IBD. In fact, this proportion varies across members of generation i and is equal to hi only in expectation. This extra variability leads to extra-Poisson variation in the distribution of ni. Assumption 2 does not hold, since, for example, knowing that ni is very small relative to the number of meioses implies that the population is likely close to fixation, and therefore subsequent nj (j > i) must also be small. Junctions formed close to one another in the same meiosis are likely to be inherited together, and therefore assumption 3 is not generally true. The violation of these assumptions implies that the true variance is likely higher than that predicted by the Poisson approximation. The assumptions are probably closer to the truth in larger populations.
Simulations:
The performance of the Poisson approximation to the variance was investigated by simulation. Chromosome data were simulated for random-mating populations of constant size (N = 20 or N = 50) over 150 generations. Individual chromosomes were represented by a linked list of segments, where adjacent segments were of distinct ancestral types. Each individual in generation i + 1 was produced by randomly choosing (with replacement) two parents from generation i. A gamete from each of these parents was generated by simulating the locations of crossovers according to a Poisson process and constructing the gamete out of the appropriate segments of parental chromosomes.
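The simulation described above can be sketched in a few dozen lines of Python. This is a minimal re-implementation under our own assumptions, not the authors' code: chromosomes are Python lists of (start, founder label) pairs rather than linked lists, the population has constant size N, and all function names are hypothetical. A crossover creates a junction only where the two parental chromosomes carry different founder labels; `merge` discards the boundary otherwise.

```python
import random


def crossover_positions(L, rng):
    """Crossover points on [0, L): a Poisson process with rate 1 per morgan."""
    xs, x = [], rng.expovariate(1.0)
    while x < L:
        xs.append(x)
        x += rng.expovariate(1.0)
    return xs


def pieces(chrom, a, b, L):
    """(start, label) pieces of chrom that overlap the interval [a, b)."""
    out = []
    for idx, (start, label) in enumerate(chrom):
        end = chrom[idx + 1][0] if idx + 1 < len(chrom) else L
        if start < b and end > a:
            out.append((max(start, a), label))
    return out


def merge(segments):
    """Join adjacent pieces of the same ancestral type (no junction between them)."""
    merged = []
    for start, label in segments:
        if merged and merged[-1][1] == label:
            continue  # crossover fell where the parental chromosomes were IBD
        merged.append((start, label))
    return merged


def meiosis(parent, L, rng):
    """One gamete from a parent, given as a pair of chromosomes."""
    bounds = [0.0] + crossover_positions(L, rng) + [L]
    src = rng.randrange(2)  # which parental chromosome the gamete starts on
    segments = []
    for a, b in zip(bounds, bounds[1:]):
        segments.extend(pieces(parent[src], a, b, L))
        src = 1 - src  # switch chromosomes at each crossover
    return merge(segments)


def simulate_junctions(N, t, L, rng):
    """Junction counts on all 2N chromosomes after t generations at constant size N."""
    # Generation 0: 2N distinct (unrelated, noninbred) founder chromosomes.
    pop = [([(0.0, 2 * i)], [(0.0, 2 * i + 1)]) for i in range(N)]
    for _ in range(t):
        # Each offspring draws two parents with replacement (selfing allowed).
        pop = [
            (meiosis(pop[rng.randrange(N)], L, rng),
             meiosis(pop[rng.randrange(N)], L, rng))
            for _ in range(N)
        ]
    # A chromosome with k segments carries k - 1 junctions.
    return [len(chrom) - 1 for ind in pop for chrom in ind]


rng = random.Random(42)
N, t, L = 20, 10, 1.0
counts = []
for _ in range(200):
    counts.extend(simulate_junctions(N, t, L, rng))
sim_mean = sum(counts) / len(counts)
theory = L * sum((1 - 1 / (2 * N)) ** j for j in range(t))  # Equation 5 for constant N
print(f"simulated mean {sim_mean:.2f} vs L * sum(h_j) = {theory:.2f}")
```

Averaging over replicate populations gives a mean close to $L\sum_j h_j$ with $h_j = (1 - 1/(2N))^j$; the spread of `counts` is what the variance approximations try to capture.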
Fig 2 shows the variance of the number of junctions per morgan in populations of constant size $N = 20$ or $N = 50$, estimated by simulation and by the Poisson approximation. The Poisson variance underestimates the true variance, especially in later generations and in the smaller population. Since the mean and variance of a true Poisson random variable are equal, Table 1 shows the ratio of the estimated variance ($\hat{\sigma}^2$, based on 10,000 simulations) to the theoretical mean ($\mu$) as a function of $N$ and $t$. Comparing the populations at times where $t/N = 1$ ($N = 20, t = 20$ and $N = 50, t = 50$), we see that the mean underestimates the variance by approximately the same amount: 6 or 8%. Similarly, comparing populations where $t/N = 2.5$ ($N = 20, t = 50$ and $N = 50, t = 125$), the mean underestimates the variance by 28 or 29%. This suggests that for a constant-sized population, $t/N$ approximately determines the adequacy of the Poisson approximation to the variance. For values of $t/N > 1$, the Poisson approximation underestimates the true variance. The importance of the quantity $t/N$ is not surprising, since for a population of constant size, $h_t = (1 - (2N)^{-1})^t \approx \exp(-t/(2N))$. Larger values of $t/N$ correspond to increasing amounts of IBD in the population, and in these situations assumptions 1 to 3 may be further from the truth.
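For constant size the sum in Equation 5 is geometric: with $h_j = (1 - (2N)^{-1})^j$, $E[J_t] = L\sum_{j<t} h_j = 2NL(1 - h_t)$, so $N$ and $t$ enter the non-IBD probability essentially through $t/N$, via $h_t \approx \exp(-t/(2N))$. The same formula shows that the expected number of junctions in a constant-size population never exceeds $2NL$. The Python sketch below (function names hypothetical) evaluates these quantities at the $(N, t)$ pairs used in the comparison above.

```python
import math


def h_exact(N, t):
    """Non-IBD probability after t generations at constant size N."""
    return (1.0 - 1.0 / (2 * N)) ** t


def h_approx(N, t):
    """The approximation exp(-t / (2N)), which depends on N and t only through t/N."""
    return math.exp(-t / (2 * N))


def expected_junctions_constant(N, t, L=1.0):
    """Equation 5 at constant size: L * sum_{j<t} h_j = 2 N L (1 - h_t)."""
    return 2 * N * L * (1.0 - h_exact(N, t))


for N, t in [(20, 20), (50, 50), (20, 50), (50, 125)]:
    print(f"N={N:3d} t={t:3d} t/N={t / N:.1f}  h_t={h_exact(N, t):.4f}  "
          f"approx={h_approx(N, t):.4f}  E[J_t]={expected_junctions_constant(N, t):.2f}")
```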
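Relaxing assumptions 1 and 2:
We now develop a second variance approximation, which does not require assumptions 1 and 2. Consider the calculation of $E[J_t^2]$. As a function of $\mathbf{n}$,

$$J_t^2 = \sum_{j=0}^{t-1} \sum_{k=1}^{n_j} I_t(k, j) + \sum_{j=0}^{t-1} \sum_{k \ne l} I_t(k, j)\, I_t(l, j) + \sum_{i \ne j} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} I_t(k, i)\, I_t(l, j).$$

The first term in $J_t^2$ is a sum over all junctions (using $I_t(k, j)^2 = I_t(k, j)$). The second term is a sum over pairs of distinct junctions formed in the same generation, and the third term is a sum over pairs of junctions formed in different generations. Applying conditional expectation,

$$E[J_t^2 \mid \mathbf{n}] = \sum_{j} \sum_{k=1}^{n_j} E[I_t(k, j)] + \sum_{j} \sum_{k \ne l} E[I_t(k, j)\, I_t(l, j)] + \sum_{i \ne j} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} E[I_t(k, i)\, I_t(l, j)].$$

We argued previously that $E[I_t(k, j)] = 1/(2N_{j+1})$. By assumption 3,

$$E[I_t(k, j)\, I_t(l, j)] = \frac{1}{(2N_{j+1})^2}, \quad k \ne l,$$

and

$$E[I_t(k, i)\, I_t(l, j)] = \frac{1}{4N_{i+1}N_{j+1}}, \quad i \ne j.$$

Then

$$E[J_t^2 \mid \mathbf{n}] = \sum_{j} \frac{n_j}{2N_{j+1}} + \sum_{j} \frac{n_j(n_j - 1)}{(2N_{j+1})^2} + \sum_{i \ne j} \frac{n_i n_j}{4N_{i+1}N_{j+1}},$$

and so

$$E[J_t^2] = \sum_{j} \frac{E[n_j]}{2N_{j+1}} + \sum_{j} \frac{E[n_j^2] - E[n_j]}{(2N_{j+1})^2} + \sum_{i \ne j} \frac{E[n_i n_j]}{4N_{i+1}N_{j+1}}.$$

Therefore

$$\operatorname{Var}(J_t) = \sum_{j} \frac{E[n_j]}{2N_{j+1}} + \sum_{j} \frac{E[n_j^2] - E[n_j]}{(2N_{j+1})^2} + \sum_{i \ne j} \frac{E[n_i n_j]}{4N_{i+1}N_{j+1}} - \Big(L \sum_{j} h_j\Big)^2, \qquad (6)$$

where all sums over $i$ and $j$ run from 0 to $t - 1$.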
Expressions for $E[n_j^2]$ and $E[n_i n_j]$ are developed in the Appendix (Equations A8 and A9), and $E[n_j]$ is given in Equation 4. The expectations in Equation 6 depend on the chromosome length and the population sizes over time, through the single-locus non-IBD probabilities ($h_j$, $j = 0, \ldots, t - 1$) and the two-locus non-IBD probabilities for loci compared on two, three, or four chromosomes (as functions of the recombination fraction between the loci, for $j = 0, \ldots, t - 1$), which are described in the Appendix.
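Equation 6 combines three sums (single junctions; pairs from the same generation, weighted by $1/(2N_{j+1})^2$; pairs from different generations, weighted by $1/(4N_{i+1}N_{j+1})$) and subtracts the squared mean $(L\sum_j h_j)^2$ from Equation 5. The Python sketch below assembles it from supplied moments of the $n_j$; all names are hypothetical. As a sanity check, if the $n_j$ are independent Poisson variables with mean $2N_{j+1}h_jL$ (assumptions 1 and 2), the extra terms cancel and the result collapses to the Poisson variance $L\sum_j h_j$.

```python
def variance_eq6(N_next, En, En2, Enn, h, L):
    """Assemble Var(J_t) from the moments of n_0..n_{t-1} (structure of Equation 6).

    N_next[j] = N_{j+1}; En[j] = E[n_j]; En2[j] = E[n_j^2];
    Enn[i][j] = E[n_i n_j] for i != j; h[j] = h_j.
    """
    t = len(En)
    single = sum(En[j] / (2 * N_next[j]) for j in range(t))
    same_gen = sum((En2[j] - En[j]) / (2 * N_next[j]) ** 2 for j in range(t))
    cross_gen = sum(
        Enn[i][j] / (4 * N_next[i] * N_next[j])
        for i in range(t) for j in range(t) if i != j
    )
    mean = L * sum(h)  # Equation 5
    return single + same_gen + cross_gen - mean ** 2


# Check: with independent Poisson n_j (assumptions 1 and 2) the result equals the mean.
t, N, L = 30, 20, 1.0
h = [(1 - 1 / (2 * N)) ** j for j in range(t)]
N_next = [N] * t
En = [2 * N * hj * L for hj in h]
En2 = [m * m + m for m in En]  # Poisson: E[n^2] = mu^2 + mu
Enn = [[En[i] * En[j] for j in range(t)] for i in range(t)]
v = variance_eq6(N_next, En, En2, Enn, h, L)
print(v, L * sum(h))
```

Any extra-Poisson variation in the $n_j$ (larger $E[n_j^2]$) or positive correlation between generations (larger $E[n_in_j]$) pushes the result above the Poisson value.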
Fig 2 shows the variance of the number of junctions per morgan in populations of constant size $N = 20$ or $N = 50$. The variance is estimated by simulation (10,000 iterations), by Equation 6, and by the Poisson approximation. For both populations, Equation 6 is much better than the Poisson-based variance approximation, particularly in later generations. It is interesting to note that in both examples the Poisson approximation begins to fail at approximately the $N$th generation, which is where the non-IBD proportion has been reduced to approximately 60% (at constant size, $h_N = (1 - (2N)^{-1})^N \approx e^{-1/2} \approx 0.61$). For earlier generations, the two variance approximations are almost indistinguishable, and they are very close to the simulated variance (see Fig 3). This suggests that for young populations, or for older but larger populations, the Poisson variance approximation may be adequate. The Poisson approximation has the practical advantage over Equation 6 of being much easier to calculate.
| APPLICATION TO GROWING POPULATIONS WITH AND WITHOUT SUBDIVISION |
|---|
To demonstrate the potential effects of different types of population growth on expected junction number, and therefore on intact segment length, we consider an example: a population that has grown to 100 times its initial size over a period of 100 generations. This example reflects the age of the modern Finnish population. We consider five growth scenarios:
- Linear growth: expansion by a constant number of individuals each generation.
- Exponential growth: expansion by a constant percentage each generation. A 100-fold increase over 100 generations corresponds to a growth rate of 4.72% per generation.
- Exponential growth with internal subdivision: population bifurcates whenever a population size of 2N0 is reached (first division at t = 15, subsequently every 15 generations).
- Exponential growth with internal subdivision: population bifurcates whenever a population size of 4N0 is reached (first division at t = 30, subsequently every 15 generations).
- Exponential growth with internal subdivision: population bifurcates whenever a population size of 8N0 is reached (first division at t = 45, subsequently every 15 generations).
For a given value of $N_0$, all scenarios have the same total size at generation 100. All exponential growth scenarios have the same total number of individuals at all generations; the difference is in the extent of internal subdivision. Fig 4 shows the total population sizes over time for the population with $N_0 = 20$.
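The five scenarios can be explored approximately with Equation 5 (using the recursion $h_0 = 1$, $h_{j+1} = (1 - 1/(2N_j))h_j$, which we assume here) and the bound $L/(E[J] + 1)$ from Equation 1. This Python sketch rests on simplifying assumptions of our own: sizes are kept real-valued rather than rounded to integers, each split produces two equal halves, and the sampled lineage always sits in a subpopulation of size (total size)/$2^{\text{splits}}$. The function names are hypothetical, and the printed numbers are not meant to reproduce Table 2 exactly. At the quoted 4.72% rate, $1.0472^{100} \approx 100.7$, so the final size is very close to $100N_0$.

```python
def linear_sizes(N0, t=100, fold=100):
    """Linear growth: a constant increment each generation, reaching fold * N0 at t."""
    step = (fold - 1) * N0 / t
    return [N0 + step * g for g in range(t + 1)]


def exponential_sizes(N0, t=100, rate=0.0472):
    """Exponential growth at the 4.72% per generation quoted in the text."""
    return [N0 * (1 + rate) ** g for g in range(t + 1)]


def subdivided_sizes(N0, first_split, interval=15, t=100, rate=0.0472):
    """Size of the subpopulation holding the sampled lineage: each split halves it."""
    sizes = []
    for g, total in enumerate(exponential_sizes(N0, t, rate)):
        splits = 0 if g < first_split else 1 + (g - first_split) // interval
        sizes.append(total / 2 ** splits)
    return sizes


def expected_junctions(sizes, t, L=1.0):
    """Equation 5, assuming h_0 = 1 and h_{j+1} = (1 - 1/(2 N_j)) h_j."""
    h, total = 1.0, 0.0
    for j in range(t):
        total += h
        h *= 1.0 - 1.0 / (2.0 * sizes[j])
    return L * total


N0, t, L = 20, 100, 1.0
scenarios = {
    "linear": linear_sizes(N0),
    "exponential": exponential_sizes(N0),
    "split at 8N0": subdivided_sizes(N0, first_split=45),
    "split at 4N0": subdivided_sizes(N0, first_split=30),
    "split at 2N0": subdivided_sizes(N0, first_split=15),
}
results = {name: expected_junctions(sizes, t, L) for name, sizes in scenarios.items()}
for name, ej in results.items():
    # Equation 1: L / (E[J] + 1) bounds the expected segment length from below (morgans).
    print(f"{name:13s} E[J_100] = {ej:6.2f}   bound = {L / (ej + 1):.4f} M")
```

Changing `N0` to 100 or 500 shows the same ordering with much smaller differences, in line with the larger-population comparison in the text.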
Table 2 shows the expected number of junctions in a chromosome selected from generation 100 (using Equation 5) and the corresponding lower bound on the expected length of intact ancestral segments (using Equation 1), for each of the five growth scenarios with $N_0 = 20$. In these populations, the type of growth has a pronounced effect on the expected number of junctions. Substantially more junctions are expected in the linearly growing population than in any of the exponentially growing populations. This is because the linearly growing population increases its size rapidly enough in the early generations that little IBD accumulates. In contrast, all the exponentially growing populations remain small for a long period of time, during which IBD accumulates within the population, and thus fewer junctions are formed. For the same reason, increasing amounts of subdivision within the exponentially growing populations result in substantially fewer junctions being formed. Intact ancestral segments in the unsubdivided exponentially growing population are expected to be approximately 50% longer than in the linearly growing population. In the most subdivided exponential population, ancestral segments are expected to be twice as long as in the linearly growing population and almost 50% longer than those in the unsubdivided exponentially growing population. Thus different patterns of population growth can have a dramatic effect on the expected number of junctions in a chromosome and therefore on the length of ancestral segments.
Table 3 shows the expected number of junctions on a chromosome of length 1 M from generation 100, for each of the five growth scenarios and the larger founding population sizes. For the larger populations (N0 = 100 and N0 = 500) the expected number of junctions in the linearly growing population is close to 100, which is what one would expect in an infinitely large population where IBD does not accumulate. This reflects the fact that little IBD accumulates in these populations because they start relatively large and grow quickly. The number of junctions expected in the exponentially growing populations is reduced relative to the linearly growing populations and further reduced in the subdivided populations. While these trends are the same as those observed in the smallest populations (N0 = 20, see Table 2), the magnitude of the effects is much smaller. For example, when N0 = 500, only 3% more junctions are expected in the linearly growing population than in the most subdivided exponentially growing population.
It is also important to consider the variability of the number of junctions in a chromosome. Table 4 shows the variance of the number of junctions in a chromosome of length 1 M from generation 100, estimated by simulation, Equation 6, and the Poisson approximation. Simulation-based estimates are available only for populations with N0 = 20 and the subdivided populations with N0 = 100, since simulation of the larger populations is too computationally demanding. For the populations with N0 = 20, the Poisson approximation badly underestimates the variance. The approximation based on Equation 6 is much better, but still an underestimate. When N0 = 100, both approximations are closer to the simulated values, and Equation 6 is still better. For the populations with N0 = 500, the variance approximations are virtually identical, and we hypothesize that the variance is well estimated by either approximation for populations this large. The variance is always greater than or equal to the mean.
| DISCUSSION |
|---|
The theoretical development shows that the most important factor in determining the expected number of junctions in a chromosome, and therefore a lower bound for the average length of intact ancestral segments, is the time since founding of the population. In generation $t$ of an infinitely large random-mating population, we expect $t$ junctions per morgan in a chromosome. In finite populations the expectation is less than $t$, but the difference is substantial only if the historical population sizes have been small enough to result in the accumulation of IBD and therefore the production of fewer junctions. Similarly, different growth patterns and levels of subdivision affect the expected number of junctions substantially only if population sizes are very small. Even when this is the case, the variance of the number of junctions in a chromosome is large, so the actual number of junctions in a chromosome may differ substantially from that expected on the basis of known population history and structure.
These results allow us to predict that disequilibrium may persist over longer distances in smaller, more recently founded populations. Whether or not it does depends on the patterns of junction formation in many meioses, which we cannot observe. Studies of the extent of disequilibrium across the genome of an isolated population are therefore desirable. Only then can the utility of a large-scale disequilibrium mapping study be assessed.
| ACKNOWLEDGMENTS |
|---|
We are grateful to a referee for drawing our attention to related work.
Manuscript received February 20, 2002; Accepted for publication June 10, 2002.
| APPENDIX |
|---|
CALCULATION OF SECOND-ORDER MOMENTS OF Hi(p) AND ni
To calculate the second-order moments of ni, we require the second-order moments of Hi(p), the proportion of the chromosome that is non-IBD in individual p of generation i.
Second-order moments of $H_i(p)$:
To calculate second-order moments of $H_i(p)$, we consider some two-locus gene nonidentity measures, which are distinguished according to the number of chromosomes (two, three, or four) on which the two loci are being compared (see Fig A1). The column vector of these three two-locus non-IBD probabilities at generation $i + 1$ is obtained from the vector at generation $i$ by multiplication by a transition matrix that depends on the recombination fraction between the loci and on the size $N_i$ of the population at generation $i$. Therefore the two-locus non-IBD probabilities at generation $i$ depend on the population sizes up to and including generation $i - 1$ and on the recombination fraction.
Calculation of $E[H_i(p)^2]$:
Consider $E[H_i(p)^2]$, the expected value of the square of the non-IBD proportion in an individual in generation $i$.
(A1)
In this equation, the recombination fraction between two loci a distance $s$ morgans apart appears as a function of $s$, and line 4 is obtained from line 3 by a change of variables $s = |x - y|$ and integration. The two-locus term at generation $i$ is too complicated to evaluate exactly.
Calculation of $E[H_i(p) \cdot H_i(p')]$:
Consider the product of the non-IBD proportions of two distinct individuals $p$ and $p'$ in the $i$th generation.
(A2)
Calculation of $E[H_i(p) \cdot H_j(p')]$:
Finally, we examine the product of the non-IBD proportions of two individuals: $p$ from the $i$th generation and $p'$ from the $j$th generation. We assume that $i < j$. Then
(A3)
where $a_i$ and $a'_i$ denote the genes at locus $x$ in person $p$ of generation $i$, $b_j$ and $b'_j$ denote the genes at locus $y$ in person $p'$ of generation $j$, and $\not\equiv$ indicates non-IBD. To have $b_j \not\equiv b'_j$, $b_j$ and $b'_j$ must be descended from different individuals in generation $j - 1$. This implies that
(A4)
where $b_{j-1}$ and $b'_{j-1}$ denote the ancestors at generation $j - 1$ of $b_j$ and $b'_j$, respectively. Applying (A4) iteratively, we obtain
(A5)
where $b$ and $b'$ denote genes at locus $y$ on distinct chromosomes in generation $i + 1$. The probability on the right-hand side of Equation A5 depends on the relationship between the chromosomes carrying $a_i$, $a'_i$, and the ancestors $b_i$ and $b'_i$ of $b_{i+1}$ and $b'_{i+1}$. Table A1 shows the possible configurations of $b_i$ and $b'_i$, the probability of each configuration (calculated using the random-mating model), and the desired probability $\Pr(a_i \not\equiv a'_i;\ b_{i+1} \not\equiv b'_{i+1})$ conditional on that configuration.
The probability required in Equation A3 is then obtained by summing over the possible configurations and substituting that quantity into Equation A5. Therefore
(A6)
Substituting Equation A6 into Equation A3, we find
(A7)
Second-order moments of $n_i$:
To calculate $E[n_i^2]$ and $E[n_i n_j]$, we use the formula $n_i = \sum_{m=1}^{2N_{i+1}} X_i(m)$, where $X_i(m)$ denotes the number of junctions formed in meiosis $m$ from generation $i$. Since crossovers happen along the chromosome according to a Poisson process with rate one per morgan, given $H_i(p_m)$, $X_i(m)$ has a Poisson distribution with mean $H_i(p_m)L$, where $p_m$ denotes the parent of meiosis $m$ and $L$ denotes the length of the chromosome in morgans. Therefore

$$E[X_i(m)] = L h_i, \qquad E[X_i(m)^2] = E\big[H_i(p_m)L + H_i(p_m)^2 L^2\big] = L h_i + L^2 E[H_i(p)^2],$$

since the parent is simply a randomly chosen individual from generation $i$.
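These moments follow from the mixed-Poisson structure: if $X \mid H \sim \text{Poisson}(HL)$, then $E[X] = L\,E[H]$ and $E[X^2] = L\,E[H] + L^2 E[H^2]$. The Monte Carlo sketch below checks these identities in Python with a hypothetical Uniform(0, 1) distribution for the parent's non-IBD proportion $H$; the true distribution of $H_i(p)$ depends on population history and is not uniform in general.

```python
import math
import random


def poisson_sample(mean, rng):
    """Poisson(mean) variate by Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(11)
L, reps = 2.0, 100_000
xs = []
for _ in range(reps):
    H = rng.random()  # hypothetical: parent's non-IBD proportion ~ Uniform(0, 1)
    xs.append(poisson_sample(H * L, rng))  # X | H ~ Poisson(H L)
m1 = sum(xs) / reps
m2 = sum(x * x for x in xs) / reps
EH, EH2 = 0.5, 1.0 / 3.0  # E[H] and E[H^2] for Uniform(0, 1)
print(f"E[X]   ~ {m1:.3f} vs L E[H] = {L * EH:.3f}")
print(f"E[X^2] ~ {m2:.3f} vs L E[H] + L^2 E[H^2] = {L * EH + L**2 * EH2:.3f}")
```

The $L^2E[H^2]$ term is where variability of the non-IBD proportion across parents inflates the junction count beyond Poisson variation.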
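To calculate $E[n_i^2]$, we also consider $E[X_i(m)X_i(m')]$, the expected value of the product of the numbers of junctions formed in two different meioses from the same generation:

$$E[X_i(m)X_i(m')] = E\big[E[X_i(m) \mid H_i(p_m)]\, E[X_i(m') \mid H_i(p_{m'})]\big] = L^2 E[H_i(p_m) H_i(p_{m'})],$$

since, conditional on the proportion non-IBD in each of the parents, the numbers of junctions formed in each meiosis are independent. With probability $1/N_i$, both meioses are from the same parent; otherwise they are from distinct individuals in the $i$th generation. Therefore

$$E[X_i(m)X_i(m')] = L^2 \left[\frac{1}{N_i} E[H_i(p)^2] + \left(1 - \frac{1}{N_i}\right) E[H_i(p) \cdot H_i(p')]\right],$$

by Equation A1 and Equation A2. Then

$$E[n_i^2] = 2N_{i+1}\, E[X_i(m)^2] + 2N_{i+1}(2N_{i+1} - 1)\, E[X_i(m)X_i(m')]. \qquad (A8)$$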
Calculation of $E[n_i n_j]$ requires that we first obtain $E[X_i(m)X_j(m')]$, the expected value of the product of the numbers of junctions formed in two meioses occurring in two different generations. Now,

$$E[X_i(m)X_j(m')] = L^2 E[H_i(p) \cdot H_j(p')]$$

for $i < j$, by Equation A7. Then

$$E[n_i n_j] = 4N_{i+1}N_{j+1}\, E[X_i(m)X_j(m')]. \qquad (A9)$$
| LITERATURE CITED |
|---|
BENEDICT, R., 1989 The Chrysanthemum and the Sword. Houghton Mifflin, Boston.
CHAPMAN, N. H., 2001 Genome descent in isolated populations. Ph.D. Thesis, University of Washington, Seattle, WA.
CHAPMAN, N. H., and E. A. THOMPSON, 2001 Linkage disequilibrium mapping: the role of population history, size and structure, pp. 413-437 in Advances in Genetics, Vol. 42. Academic Press, San Diego.
CHAPMAN, N. H., and E. M. WIJSMAN, 1998 Genome screens using linkage disequilibrium tests: optimal marker characteristics and feasibility. Am. J. Hum. Genet. 63:1872-1885.
CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.
DERRIDA, B., and B. JUNG-MULLER, 1999 The genealogical tree of a chromosome. J. Stat. Phys. 94:277-298.
FISHER, R. A., 1949 The Theory of Inbreeding. Oliver and Boyd, Edinburgh.
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.
LONJOU, C., A. COLLINS, and N. E. MORTON, 1999 Allelic association between marker loci. Proc. Natl. Acad. Sci. USA 96:1621-1626.
NEVANLINNA, H. R., 1972 The Finnish population structure: a genetic and genealogical study. Hereditas 71:195-236.
WEIR, B. S., P. J. AVERY, and W. G. HILL, 1980 Effect of mating structure on variation in inbreeding. Theor. Popul. Biol. 18:396-429.
WIUF, C., and J. HEIN, 1997 On the number of ancestors to a DNA sequence. Genetics 147:1459-1468.