Abstract
An isolated population is a group of individuals who are descended from a founding population who lived some time ago. If the founding individuals are assumed to be noninbred and unrelated, a chromosome sampled from the population can be represented as a mosaic of segments of the original ancestral types. A population in which chromosomes are made up of a few long segments will exhibit linkage disequilibrium due to founder effect over longer distances than a population in which the chromosomes are made up of many short segments. We study the length of intact ancestral segments by obtaining the expected number of junctions (points where DNA of two distinct ancestral types meet) in a chromosome. Assuming random mating, we study analytically the effects of population age, growth patterns, and internal structure on the expected number of junctions in a chromosome. We demonstrate that the type of growth a population has experienced can influence the expected number of junctions, as can population subdivision. These effects are substantial only when population sizes are very small. We also develop an approximation to the variance of the number of junctions and show that the variance is large.
An isolated population is one that is descended from a small group of individuals (founders) and in which population growth is due almost exclusively to births within the population, rather than immigration from outside. Interest in the genetics of isolated populations has recently been revived among human geneticists, because of suggestions that such populations may be useful for disequilibrium-based mapping of susceptibility loci for complex disease. In particular, it is hoped that diseases for which there are several susceptibility loci in large outbred populations may be more homogeneous in small isolated populations. In addition, small recently founded populations may exhibit linkage disequilibrium over longer genetic distances than large outbred populations (Chapman and Wijsman 1998; Kruglyak 1999).
Isolated populations are fundamentally different from the large outbred populations that are usually assumed in the theoretical study of linkage disequilibrium and may differ from one another in several aspects of their history. Populations are founded at different times by founder groups of different sizes, experience different growth patterns, and may have varying levels of internal subdivision. Chapman and Thompson (2001) give a brief survey of the variety of histories and structures seen in human populations. It is important to understand the potential effects of these aspects of a population’s history on disequilibrium, both to assess the utility of disequilibrium-based studies and to interpret the results of such studies.
Lonjou et al. (1999) presented observed disequilibria in two regions of the genome for a wide variety of human populations. In general, levels of disequilibrium in isolated populations were only slightly higher than in outbred populations. However, the pairs of loci they considered were very tightly linked (<0.2 cM apart) and therefore this result may simply reflect the large number of generations required to break down such associations. In this article, we address how the extent of linkage disequilibrium is affected by population history, rather than considering the magnitude of disequilibrium between two loci a particular distance apart.
We study the effects of population history on the number of junctions existing in a chromosome sampled from an isolated population. A junction is a point on the chromosome where DNA from two distinct ancestral chromosomes meets (Fisher 1949). Figure 1 shows examples of two chromosomes that might have been sampled from an isolated population. Different shadings represent different ancestral types. The top chromosome contains two junctions, and the chromosome is therefore made up of three segments. The bottom chromosome contains eight junctions and is made up of nine segments. A quantity of interest is the average length of contiguous ancestral segments remaining in the generation under study. If the chromosomes have broken into many short pieces relative to the founder population, disequilibrium due to founder effect will stretch over only short distances. Conversely, if the chromosomes are composed of a small number of large pieces, relative to the founder generation, disequilibrium will stretch over longer distances. If there are J junctions in a chromosome of length L, there are J + 1 ancestral segments with average length L/(J + 1), and by Jensen’s inequality, E[L/(J + 1)] ≥ L/(E[J] + 1). (1)
The expected number of junctions therefore yields a lower bound on the expected length of intact ancestral segments.
A junction is formed when a crossover occurs between two chromosomes, at a point where they are not descendants of the same ancestral chromosome. That is, the chromosomes are not identical by descent (IBD) at that point. Once a junction is formed, it is transmitted as is any other gene (according to the laws of Mendelian inheritance). Since IBD is defined relative to some ancestral population, and since junctions require non-IBD to be formed, junctions are also defined relative to some ancestral population. In this article, junctions are defined relative to the founding generation; that is, this generation is assumed to consist of noninbred, unrelated individuals.
Some analogous questions regarding the lengths, number, and ancestral origins of chromosome segments have recently been considered by Wiuf and Hein (1997) and Derrida and Jung-Muller (1999). Wiuf and Hein (1997) consider, as do we, the moments of the number of ancestral chromosome segments, while Derrida and Jung-Muller (1999) focus on the number of distinct ancestors contributing to a current chromosome. The primary difference from this article is that these authors have considered the long-term equilibrium between the process of recombination and the IBD process modeled via the coalescent ancestry of chromosomes. Recombination increases the number of contributing ancestors, whereas coancestry decreases this number.
By contrast, in this article we consider IBD relative to a founder population at some defined time point in the past and the shorter-term effects of population structure. We study the formation and transmission of junctions in random-mating subdivisions of a monoecious population with discrete generations. We assume that during gamete formation, crossover events along the chromosome happen according to a Poisson process, which has rate one per morgan. This implies that the number of crossover events in a chromosome of length L has a Poisson distribution with mean L. The age of the population is assumed known, as is the size of the population at each generation. In subdivided populations, the generation of the split(s) and the sizes of the subpopulations are assumed known. We first present some theoretical results, including an expression for the expected number of junctions per morgan existing on a chromosome randomly sampled from a particular generation and two approximations to the variance of this quantity. We then apply these results to some example populations to illustrate the effects of population size, type of growth, and subdivision.
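The crossover model described above can be illustrated with a short sketch. The function name `crossover_positions` and the seed are illustrative choices, not taken from the article; the only modeling assumption is the one stated in the text, a Poisson process of rate one per morgan, which is generated here by accumulating exponential gaps.

```python
import random

def crossover_positions(length_morgans, rng):
    """Crossover locations for one meiosis on a chromosome of the given length.

    Gaps between successive points of a rate-1 Poisson process are
    independent Exponential(1) variables, so we accumulate such gaps
    until we run off the end of the chromosome.
    """
    positions = []
    position = rng.expovariate(1.0)
    while position < length_morgans:
        positions.append(position)
        position += rng.expovariate(1.0)
    return positions

# Empirical check: the number of crossovers on a 2-morgan chromosome
# should average close to 2.
rng = random.Random(1)
counts = [len(crossover_positions(2.0, rng)) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
```

The mean of `counts` should be close to the chromosome length in morgans, in line with the Poisson distribution with mean L stated above.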
THEORETICAL DEVELOPMENT
Mean number of junctions
Let J_{t} be the number of junctions present on a chromosome of length L, sampled at random from a population at generation t. Let n = {n_{0}, n_{1}, ..., n_{t}}, where n_{j} denotes the number of junctions formed in meioses from generation j. Finally, let I_{t}(k, j) = 1 if the kth junction formed in meioses from generation j is present on the chromosome selected at time t, and let I_{t}(k, j) = 0 otherwise. Then as a function of n, J_{t} = Σ_{j=0}^{t−1} Σ_{k=1}^{n_{j}} I_{t}(k, j).
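Calculation of E[n_{j}]: Let H_{j}(p) denote the proportion of the chromosome that is non-IBD in individual p of generation j, and let h_{j} = E[H_{j}(p)]. Then, since crossovers occur at rate one per morgan and a crossover forms a junction only where the two parental chromosomes are non-IBD, each of the 2N_{j+1} meioses from generation j forms h_{j}L junctions in expectation, so that E[n_{j}] = 2N_{j+1}h_{j}L. Because each such junction is present on the sampled chromosome with probability 1/(2N_{j+1}), E[J_{t}] = L Σ_{j=0}^{t−1} h_{j}. (5)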
Equation 5 demonstrates that population history affects the expected number of junctions in a chromosome through the probability of nonIBD in each generation. This implies that in a large population where h_{j} remains close to one over many generations, the number of generations since the founding of the population is the most important factor in determining the expected number of junctions and therefore the lower bound on the expected length of intact ancestral segments. Growth patterns that result in small population sizes over long periods of time will result in the accumulation of IBD, and as a result fewer junctions will be expected in chromosomes from such populations. Similarly, chromosomes from populations in which there is extensive subdivision will be expected to carry fewer junctions and therefore have longer intact ancestral segments.
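A minimal numerical sketch of this dependence on population history follows. It rests on two assumptions that should be read as illustrative rather than as the article's exact formulation: that the expected number of junctions is L times the sum of the non-IBD probabilities h_{0}, ..., h_{t−1} (the form implied by the Poisson means given in the variance section), and that under random mating h_{0} = 1 and h_{j+1} = (1 − 1/(2N_{j}))h_{j}. The function names are hypothetical.

```python
def non_ibd_probabilities(sizes):
    """Return [h_0, h_1, ..., h_t] for population sizes [N_0, ..., N_{t-1}].

    Assumes founders are noninbred and unrelated (h_0 = 1) and that two
    genes in an individual of generation j+1 are copies of the same
    gene of generation j with probability 1/(2*N_j).
    """
    h = [1.0]
    for n in sizes:
        h.append(h[-1] * (1.0 - 1.0 / (2.0 * n)))
    return h

def expected_junctions(sizes, length_morgans=1.0):
    """Assumed form of the expectation: L * (h_0 + ... + h_{t-1})."""
    t = len(sizes)
    return length_morgans * sum(non_ibd_probabilities(sizes)[:t])

# A very large population accumulates essentially no IBD, so about t
# junctions per morgan are expected after t generations.
large = expected_junctions([10**9] * 100)
# A small constant population accumulates IBD and forms fewer junctions.
small = expected_junctions([20] * 100)
```

Under these assumptions `large` is essentially 100 while `small` is far lower, which matches the qualitative statement that small historical population sizes reduce the expected number of junctions.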
Variance of the number of junctions
Recall that n_{i} denotes the total number of junctions formed in all meioses from generation i.
Poisson approximation: We first consider a variance approximation on the basis of some simplifying assumptions. Specifically, suppose that
1. n_{i} has a Poisson distribution with mean 2N_{i+1}h_{i}L.
2. n_{i} is independent of n_{j} for all i ≠ j.
3. The presence of any one junction in the sampled chromosome from generation t is independent of the presence of any other junction in that chromosome. That is, Pr(junction k formed in a meiosis from generation i exists in the chromosome sampled at generation t | junction l formed in a meiosis from generation j exists in the chromosome sampled at generation t) = Pr(junction k formed in a meiosis from generation i exists in the chromosome sampled at generation t), for any k, l, i, and j, where k ≠ l if i = j.
Let J_{t}(i) denote the number of junctions formed in generation i that exist in the randomly sampled chromosome from generation t. Then assumption 1, together with the fact that the probability that a junction formed in a meiosis from generation i exists in the chromosome sampled at generation t equals 1/(2N_{i+1}), implies that J_{t}(i) has a Poisson distribution with mean h_{i}L, for 0 ≤ i ≤ t − 1. Furthermore, assumptions 2 and 3 imply that J_{t}(i) is independent of J_{t}(j), for i ≠ j. Therefore J_{t}, the sum of the J_{t}(i), is itself approximately Poisson, and its variance is approximated by its mean: Var(J_{t}) ≈ L Σ_{i=0}^{t−1} h_{i}.
The above assumptions do not generally hold. Assumption 1 would hold if all of the individuals in generation i had the same proportion h_{i} of their genome non-IBD. In fact, this proportion varies across members of generation i and is equal to h_{i} only in expectation. This extra variability leads to extra-Poisson variation in the distribution of n_{i}. Assumption 2 does not hold, since, for example, knowing that n_{i} is very small relative to the number of meioses implies that the population is likely close to fixation, and therefore subsequent n_{j} (j > i) must also be small. Junctions formed close to one another in the same meiosis are likely to be inherited together, and therefore assumption 3 is not generally true. The violation of these assumptions implies that the true variance is likely higher than that predicted by the Poisson approximation. The assumptions are probably closer to the truth in larger populations.
Simulations: The performance of the Poisson approximation to the variance was investigated by simulation. Chromosome data were simulated for random-mating populations of constant size (N = 20 or N = 50) over 150 generations. Individual chromosomes were represented by a linked list of segments, where adjacent segments were of distinct ancestral types. Each individual in generation i + 1 was produced by randomly choosing (with replacement) two parents from generation i. A gamete from each of these parents was generated by simulating the locations of crossovers according to a Poisson process and constructing the gamete out of the appropriate segments of parental chromosomes. More details can be found in Chapman (2001). In each simulation, a chromosome was randomly selected for the generation of interest, and the number of junctions existing in that chromosome was recorded. Variance estimates are based on 10,000 simulations.
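The simulation scheme just described can be sketched in a few lines. This is not the authors' program (for which see Chapman 2001): it uses Python lists of (start position, ancestral type) pairs in place of linked lists, runs far fewer replicates and generations so that it finishes quickly, and all function names and the seed are hypothetical.

```python
import random

def crossovers(length, rng):
    """Crossover points from a rate-1-per-morgan Poisson process."""
    points, x = [], rng.expovariate(1.0)
    while x < length:
        points.append(x)
        x += rng.expovariate(1.0)
    return points

def piece(chrom, lo, hi, length):
    """Segments of `chrom` that overlap the interval [lo, hi)."""
    out = []
    for i, (start, kind) in enumerate(chrom):
        end = chrom[i + 1][0] if i + 1 < len(chrom) else length
        if end > lo and start < hi:
            out.append((max(start, lo), kind))
    return out

def make_gamete(parent, length, rng):
    """Build a gamete by switching parental strands at each crossover."""
    strand = rng.randrange(2)
    bounds = [0.0] + crossovers(length, rng) + [length]
    gamete = []
    for lo, hi in zip(bounds, bounds[1:]):
        for start, kind in piece(parent[strand], lo, hi, length):
            # Merge with the previous segment when the ancestral type is
            # unchanged, so adjacent segments are always of distinct types.
            if not gamete or gamete[-1][1] != kind:
                gamete.append((start, kind))
        strand = 1 - strand
    return gamete

def sample_junction_count(n, generations, length, rng):
    """Junctions on one chromosome sampled after `generations` generations."""
    # Each founder carries two chromosomes of unique ancestral types.
    pop = [([(0.0, 2 * i)], [(0.0, 2 * i + 1)]) for i in range(n)]
    for _ in range(generations):
        # Two parents are drawn with replacement for every offspring.
        pop = [(make_gamete(rng.choice(pop), length, rng),
                make_gamete(rng.choice(pop), length, rng))
               for _ in range(n)]
    chromosome = rng.choice(rng.choice(pop))
    return len(chromosome) - 1

rng = random.Random(2002)
counts = [sample_junction_count(20, 10, 1.0, rng) for _ in range(300)]
mean_junctions = sum(counts) / len(counts)
variance = sum((c - mean_junctions) ** 2 for c in counts) / (len(counts) - 1)
```

Comparing `variance` with `mean_junctions` over many replicates gives a simulation-based check of the Poisson approximation, for which the two would be equal.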
Figure 2 shows the variance of the number of junctions per morgan in populations of constant size either N = 20 or N = 50, estimated by simulation and by the Poisson approximation. The Poisson variance is an underestimate of the true variance, especially for older generations, and the smaller population. Since for a true Poisson random variable, mean and variance are equal, Table 1 shows the ratio of the estimated variance (
Relaxing assumptions 1 and 2: We now develop a second variance approximation, which does not require assumptions 1 and 2. Consider the calculation of
The first term in
We argued previously that E[I_{t}(k, j)] = 1/(2N_{j+1}). By assumption 3,
Figure 2 shows the variance of the number of junctions per morgan in populations of constant size either N = 20 or N = 50. The variance is estimated by simulation (10,000 iterations), Equation 6, and the Poisson approximation. For both populations, Equation 6 is much better than the Poisson-based variance approximation, particularly for later generations. It is interesting to note that in both examples, the Poisson approximation begins to fail at approximately the Nth generation, which is where the non-IBD proportion has been reduced to ∼60%. For generations earlier than this, the two variance approximations are almost indistinguishable, and they are very close to the simulated variance (see Figure 3). This suggests that for young populations or older, larger populations, the Poisson variance approximation may be adequate. The Poisson approximation to the variance has an advantage over Equation 6, because it is so much easier to calculate.
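The figure of roughly 60% at generation N can be checked by a short calculation. It assumes, as an illustration rather than a statement taken from the text, that in a random-mating population of constant size N the non-IBD probability starts at h_{0} = 1 and is multiplied by (1 − 1/(2N)) each generation.

```latex
h_N = \left(1 - \frac{1}{2N}\right)^{N} \approx e^{-1/2} \approx 0.61,
\qquad
N = 20:\ (0.975)^{20} \approx 0.603,
\qquad
N = 50:\ (0.99)^{50} \approx 0.605 .
```

Under this assumption the value at generation N is nearly independent of N, which is consistent with the approximation beginning to fail at about the Nth generation in both examples.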
APPLICATION TO GROWING POPULATIONS WITH AND WITHOUT SUBDIVISION
To demonstrate the potential effects of different types of population growth on expected junction number and therefore intact segment length, we consider an example. Consider a population that has grown to 100 times its initial size, over a period of 100 generations. This example reflects the age of modern Finnish (Nevanlinna 1972) and Japanese (Benedict 1989) populations. We consider initial population sizes (N_{0}) of 20, 100, and 500 individuals, and for each we consider five growth scenarios:
Linear growth: expansion by a constant number of individuals each generation.
Exponential growth: expansion by a constant percentage each generation. A 100-fold increase over 100 generations corresponds to a growth rate of 4.72% per generation.
Exponential growth with internal subdivision: population bifurcates whenever a population size of 2N_{0} is reached (first division at t = 15, subsequently every 15 generations).
Exponential growth with internal subdivision: population bifurcates whenever a population size of 4N_{0} is reached (first division at t = 30, subsequently every 15 generations).
Exponential growth with internal subdivision: population bifurcates whenever a population size of 8N_{0} is reached (first division at t = 45, subsequently every 15 generations).
For a given value of N_{0}, all scenarios have the same total size at generation 100. All exponential growth scenarios have the same total number of individuals at all generations—the difference is in the extent of internal subdivision. Figure 4 shows the total population sizes over time for the population with N_{0} = 20.
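The scenarios can be sketched numerically as follows. Several simplifications here are assumptions made for illustration only: population sizes are treated as real numbers rather than integers, each bifurcation is assumed to halve every subpopulation evenly, and the expected number of junctions is computed as L times the sum of non-IBD probabilities with h_{j+1} = (1 − 1/(2N_{j}))h_{j}, where N_{j} is the size of the random-mating unit. Function names are hypothetical.

```python
def linear_sizes(n0, factor=100, generations=100):
    """Sizes N_0..N_t when a constant number is added each generation."""
    step = n0 * (factor - 1) / generations
    return [n0 + step * t for t in range(generations + 1)]

def exponential_sizes(n0, factor=100, generations=100):
    """Sizes N_0..N_t when a constant percentage is added each generation."""
    rate = factor ** (1.0 / generations)
    return [n0 * rate ** t for t in range(generations + 1)]

def subpopulation_sizes(total_sizes, first_split, interval=15):
    """Size of one subpopulation, assuming even halving at each split."""
    sizes = []
    for t, total in enumerate(total_sizes):
        splits = 0 if t < first_split else 1 + (t - first_split) // interval
        sizes.append(total / 2 ** splits)
    return sizes

def expected_junctions(sizes_before_t, length_morgans=1.0):
    """Assumed expectation: L * (h_0 + ... + h_{t-1})."""
    total, h = 0.0, 1.0
    for n in sizes_before_t:
        total += h
        h *= 1.0 - 1.0 / (2.0 * n)
    return length_morgans * total

n0 = 20
lin = linear_sizes(n0)
expo = exponential_sizes(n0)
growth_rate = expo[1] / expo[0] - 1.0          # about 4.7% per generation
scenarios = {
    "linear": expected_junctions(lin[:-1]),
    "exponential": expected_junctions(expo[:-1]),
    "split at 2N0": expected_junctions(subpopulation_sizes(expo, 15)[:-1]),
    "split at 4N0": expected_junctions(subpopulation_sizes(expo, 30)[:-1]),
    "split at 8N0": expected_junctions(subpopulation_sizes(expo, 45)[:-1]),
}
```

The values in `scenarios` are sketch outputs under the stated assumptions and are not a substitute for the tabulated results; they are intended only to show how the ordering of the scenarios arises from the size of the random-mating unit in early generations.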
Table 2 shows the expected number of junctions in a chromosome selected from generation 100 (using Equation 5) and the corresponding lower bound on the expected length of intact ancestral segments (using Equation 1), for each of the five growth scenarios with N_{0} = 20. In these populations, the type of growth has a pronounced effect on the expected number of junctions. Substantially more junctions are expected in the linearly growing population than in any of the exponentially growing populations. This is because the linearly growing population increases its size rapidly enough in the early generations that little IBD is accumulated. In contrast, all the exponentially growing populations remain small for a long period of time, during which IBD accumulates within the population. Thus fewer junctions are formed. For the same reason, increasing amounts of subdivision within the exponentially growing populations result in substantially fewer junctions being formed. Intact ancestral segments in the unsubdivided exponentially growing population are expected to be ∼50% longer than in the linearly growing population. In the most subdivided exponential population, ancestral segments are expected to be twice as long as in the linearly growing population and almost 50% longer than those in the unsubdivided exponentially growing population. Thus different patterns of population growth can have a dramatic effect on the expected number of junctions in a chromosome and therefore on the length of ancestral segments.
Table 3 shows the expected number of junctions on a chromosome of length 1 M from generation 100, for each of the five growth scenarios and the larger founding population sizes. For the larger populations (N_{0} = 100 and N_{0} = 500) the expected number of junctions in the linearly growing population is close to 100, which is what one would expect in an infinitely large population where IBD does not accumulate. This reflects the fact that little IBD accumulates in these populations because they start relatively large and grow quickly. The number of junctions expected in the exponentially growing populations is reduced relative to the linearly growing populations and further reduced in the subdivided populations. While these trends are the same as those observed in the smallest populations (N_{0} = 20, see Table 2), the magnitude of the effects is much smaller. For example, when N_{0} = 500, only 3% more junctions are expected in the linearly growing population than in the most subdivided exponentially growing population.
It is also important to consider the variability of the number of junctions in a chromosome. Table 4 shows the variance of the number of junctions in a chromosome of length 1 M from generation 100, estimated by simulation, Equation 6, and the Poisson approximation. Simulation-based estimates are available only for populations with N_{0} = 20 and the subdivided populations with N_{0} = 100, since simulation of the larger populations is too computationally demanding. For the populations with N_{0} = 20, the Poisson approximation badly underestimates the variance. The approximation based on Equation 6 is much better, but still an underestimate. When N_{0} = 100, both approximations are closer to the simulated values, and Equation 6 is still better. For the populations with N_{0} = 500, the variance approximations are virtually identical, and we hypothesize that the variance is well estimated by either approximation for populations this large. The variance is always greater than or equal to the mean.
DISCUSSION
The theoretical development shows that the most important factor in determining the expected number of junctions in a chromosome, and therefore a lower bound for the average length of intact ancestral segments, is the time since founding of the population. In generation t of an infinitely large random-mating population, we expect t junctions per morgan in a chromosome. In finite populations, the expectation is <t, but the difference is substantial only if the historical population sizes have been small enough to result in the accumulation of IBD and therefore the production of fewer junctions. Similarly, different growth patterns and levels of subdivision affect the expected number of junctions in a substantial way only if population sizes are very small. Even when this is the case, the variance of the number of junctions in a chromosome is large, and so the existing number of junctions in a chromosome may differ substantially from that expected on the basis of known population history and structure.
These results allow us to predict that disequilibrium may persist over longer distances in smaller, more recently founded populations. Whether or not it does depends on the patterns of junction formation in many meioses, which we cannot observe. Studies of the extent of disequilibrium across the genome of an isolated population are therefore desirable. Only then can the utility of a large-scale disequilibrium mapping study be assessed.
APPENDIX: CALCULATION OF SECOND-ORDER MOMENTS OF H_{i}(p) AND n_{i}
To calculate the second-order moments of n_{i}, we require the second-order moments of H_{i}(p), the proportion of the chromosome that is non-IBD in individual p of generation i.
Second-order moments of H_{i}(p): To calculate second-order moments of H_{i}(p), we consider some two-locus gene nonidentity measures described by Weir et al. (1980) and illustrated in Figure A1. Generally, we are interested in the probability that genes a and a′ at locus x are non-IBD, and genes b and b′ at locus y are also non-IBD. This probability is denoted Θ, Γ, or Δ according to the number of chromosomes in which the loci are being compared (see Figure A1). Weir et al. (1980) consider the evolution of these probabilities over time for populations reproducing according to various schemes of random mating with discrete generations. Let ν_{i} = (Θ_{i}, Γ_{i}, Δ_{i})^{T} denote the column vector of two-locus non-IBD probabilities at generation i. Weir et al. (1980) show that ν_{i+1} = Ω · ν_{i}, where Ω is a transition matrix that depends on the recombination fraction θ between the loci and the size (N_{i}) of the population at generation i. Therefore ν_{i} depends on the population sizes up to and including generation i − 1 and the recombination fraction θ. We denote the probabilities of interest by Θ_{i}(θ), Γ_{i}(θ), and Δ_{i}(θ).
Calculation of E[H_{i}(p)^{2}]: Consider E[H_{i}(p)^{2}], the expected value of the square of the non-IBD proportion in an individual in generation i.
Calculation of E[H_{i}(p) · H_{i}(p′)]: Consider the product of the non-IBD proportions of two distinct individuals p and p′ in the ith generation.
Calculation of E[H_{i}(p) · H_{j}(p′)]: Finally, we examine the product of the non-IBD proportions of two individuals: p from the ith generation, and p′ from the jth generation. We assume that i < j. Then
The probability required in Equation A3 is then obtained by summing over the possible configurations and substituting that quantity into Equation A5. Therefore
Second-order moments of n_{i}: To calculate
To calculate
Acknowledgments
We are grateful to a referee for drawing our attention to the related work of Wiuf and Hein (1997) and Derrida and Jung-Muller (1999). This work was supported in part by the Burroughs Wellcome Fund for the Program in Mathematical and Molecular Biology.
Footnotes

Communicating editor: M. Veuille
 Received February 20, 2002.
 Accepted June 10, 2002.
 Copyright © 2002 by the Genetics Society of America