## Abstract

Admixed populations have been used for inferring migrations, detecting natural selection, and finding disease genes. These applications often use a simple statistical model of admixture rather than a modeling perspective that incorporates a more realistic history of the admixture process. Here, we develop a general model of admixture that mechanistically accounts for complex historical admixture processes. We consider two source populations contributing to the ancestry of a hybrid population, potentially with variable contributions across generations. For a random individual in the hybrid population at a given point in time, we study the fraction of genetic admixture originating from a specific one of the source populations by computing its moments as functions of time and of introgression parameters. We show that very different admixture processes can produce identical mean admixture proportions, but that such processes produce different values for the variance of the admixture proportion. When introgression parameters from each source population are constant over time, the long-term limit of the expectation of the admixture proportion depends only on the ratio of the introgression parameters. The variance of admixture decreases quickly over time after the source populations stop contributing to the hybrid population, but remains substantial when the contributions are ongoing. Our approach will facilitate the understanding of admixture mechanisms, illustrating how the moments of the distribution of admixture proportions can be informative about the historical admixture processes contributing to the genetic diversity of hybrid populations.

Exchanges of genes between two or more mutually isolated populations can result in new admixed or hybrid populations. For nearly 80 years, statistical models have been used to estimate the proportions of the genetic ancestry of an admixed population that are derived from the various parental source populations (Bernstein 1931; Roberts and Hiorns 1965; Long and Smouse 1983; Long 1991; Chakraborty *et al.* 1992; Bertorelle and Excoffier 1998; Pritchard *et al.* 2000; Chikhi *et al.* 2001; Wang 2003; Tang *et al.* 2005) and, more recently, to determine the probable ancestral origins of chromosomal segments within individual genomes (Ungerer *et al.* 1998; Falush *et al.* 2003; Hoggart *et al.* 2004; Patterson *et al.* 2004; Baird 2006; Tang *et al.* 2006; Sankararaman *et al.* 2008; Bercovici and Geiger 2009; Price *et al.* 2009). Admixed human populations have been employed in assessing patterns of migration and genetic structure (Parra *et al.* 2001; Seldin *et al.* 2007; Wang *et al.* 2008; Silva-Zolezzi *et al.* 2009), detecting natural selection (Workman *et al.* 1963; Cavalli-Sforza and Bodmer 1971; Chakraborty and Weiss 1988; Tang *et al.* 2007; Oleksyk *et al.* 2010; Lohmueller *et al.* 2011), and identifying phenotypically important genes through admixture-mapping strategies (McKeigue 1998, 2005; Halder and Shriver 2003; Reich and Patterson 2005; Smith and O’Brien 2005; Buerkle and Lexer 2008; Seldin *et al.* 2011).

Many recent methods consider admixed populations as statistical combinations of the source populations, treating allele frequencies in a hybrid population as linear combinations of allele frequencies in the source groups. While this perspective is informative in diverse applications for describing the current structure of admixed populations, it does not mechanistically account for the inherent complexity of admixture processes. In the case of humans, throughout history, previously isolated populations have come into contact through colonization waves, forced displacements, and population migrations. Moreover, admixture processes have often been influenced by sociocultural rules on intermarriage in contexts of ethnic conflict or discrimination, slavery, and clan or caste systems. Such complex histories of social behaviors have produced a variety of patterns of genetic variation in different admixed groups (Parra *et al.* 1998, 2003; Bonilla *et al.* 2005; Bedoya *et al.* 2006; Chaix *et al.* 2007; Wang *et al.* 2008; Halder *et al.* 2009; Tishkoff *et al.* 2009; Verdu *et al.* 2009; Bryc *et al.* 2010).

Mechanistic perspectives that seek to describe the history of admixture processes through time rather than estimating admixture proportions from the source populations in descriptive statistical models have been part of some recent studies of admixture (Briscoe *et al.* 1994; Stephens *et al.* 1994; Pfaff *et al.* 2001), and they have been used to make theoretical predictions of admixture proportions as well as of Wright’s fixation index *F*_{ST} and statistics measuring linkage disequilibrium (Chakraborty and Weiss 1988; Long 1991; Guo *et al.* 2005). Most of these approaches have relied on models with a relatively simple dynamic considering a single admixture event between populations, rather than on models that investigate a more complex history of admixture processes.

Ewens and Spielman (1995) proposed a mechanistic admixture model that incorporated multiple admixture events involving multiple source populations. This model has been used primarily to evaluate the influence of population subdivision and admixture on the performance of the transmission-disequilibrium test (Ewens and Spielman 1995) and to examine linkage disequilibrium statistics (Guo *et al.* 2005). However, complex mechanistic models have not been used to directly determine the influence of admixture histories on the admixture patterns of hybrid populations.

In this article, expanding on the models of Ewens and Spielman (1995) and Guo *et al.* (2005), we develop a general mechanistic model of a historical admixture process. We first introduce the model, the most general form of which considers *m* source populations that contribute to the ancestry of a hybrid population. We treat the fraction of genetic admixture in the hybrid population originating from a specific source population as a random variable, whose distribution we study over time in the *m* = 2 case. We next examine the expectation, variance, and higher moments of the admixture fraction as functions of time and of the introgression parameters, and we consider in detail a special case in which admixture is constant across generations. Finally, we conclude with a discussion of the implications of the work for empirical studies of admixture.

## The Model

We describe a version of our mechanistic admixture model in which the number of source populations is two. The generalization to *m* source populations is straightforward, and we provide it in supporting information, File S1, Figure S1, and Table S1.

Define population *H* (“hybrid”) as a population consisting of immigrant individuals from two isolated source populations, *S*_{1} and *S*_{2}, and hybrid individuals who have ancestors from both *S*_{1} and *S*_{2}. The hybrid population can be viewed as having a separate location or status from *S*_{1} and *S*_{2}, so that individuals within *H* can interbreed with each other and with new immigrants that come from the source populations.

We let *s*_{1,*g*}, *s*_{2,*g*}, and *h*_{*g*} be the fractional contributions of populations *S*_{1}, *S*_{2}, and *H* to the hybrid population *H* at generation *g* + 1. That is, for a randomly chosen individual in *H* at generation *g* + 1, the probabilities that a randomly chosen parent of the individual derives from populations *S*_{1}, *S*_{2}, and *H* are *s*_{1,*g*}, *s*_{2,*g*}, and *h*_{*g*}, respectively. These probabilities can differ in different generations, but for all *g* ≥ 0, the parameters *s*_{1,*g*}, *s*_{2,*g*}, and *h*_{*g*} have values that are ≥0 and ≤1, such that *s*_{1,*g*} + *s*_{2,*g*} + *h*_{*g*} = 1. At generation 0, the hybrid population is not yet formed. Therefore, *h*_{0} = 0 and *s*_{1,0} + *s*_{2,0} = 1. Hence, considering the period through generation *g*, in addition to *g* itself, this model has 2*g* − 1 independent parameters: one introgression proportion in the first generation and two introgression proportions in each of the next *g* − 1 generations. A diagram of the model appears in Figure 1.

### Admixture fractions for a random individual in the hybrid population

We focus on a key quantity in admixed populations, namely the fraction of admixture from one of the source populations for a random individual in *H* at a randomly chosen locus. This fraction represents the proportion of the genome of a randomly chosen individual in *H* that ultimately traces to a specific source.

We indicate the possible sources for the (unordered) parents of an individual in *H* by *S*_{1}*S*_{1}, *S*_{1}*S*_{2}, *S*_{1}*H*, *S*_{2}*H*, *HH*, and *S*_{2}*S*_{2}. An individual in generation *g* ≥ 1 has one of several possible types of parents, each with some probability dependent on the parameters *s*_{1,*g*−1}, *s*_{2,*g*−1}, and *h*_{*g*−1} (Table 1). If the parents have different ancestries, we do not distinguish the order of the two parents, so that, for example, “*S*_{1}*H*” does not convey which specific parent is from population *S*_{1} and which is from *H*.
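The parental-pair probabilities can be sketched in a few lines. This is an illustration only: the function name `parent_pair_probabilities` is hypothetical, and the code assumes that the two parents of an individual are drawn independently, each from *S*_{1}, *S*_{2}, or *H* with probabilities *s*_{1}, *s*_{2}, and *h*, so that unordered mixed pairs receive a factor of 2.

```python
def parent_pair_probabilities(s1, s2, h):
    """Probabilities of the six unordered parental source pairs.

    Assumes (for illustration) that the two parents are sampled
    independently with source probabilities s1, s2, and h.
    """
    if min(s1, s2, h) < 0 or abs(s1 + s2 + h - 1.0) > 1e-12:
        raise ValueError("s1, s2, h must be nonnegative and sum to 1")
    return {
        "S1S1": s1 * s1,
        "S1S2": 2 * s1 * s2,   # unordered pair: either parent may be the S1 one
        "S1H": 2 * s1 * h,
        "S2H": 2 * s2 * h,
        "HH": h * h,
        "S2S2": s2 * s2,
    }


# Example with hypothetical values s1 = 0.1, s2 = 0.2, h = 0.7.
probs = parent_pair_probabilities(0.1, 0.2, 0.7)
```

The six probabilities sum to (*s*_{1} + *s*_{2} + *h*)^2 = 1, which is a convenient sanity check on any parameter set.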

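Let *Y* be a random variable indicating the source populations of the parents of a random individual in *H*. Let *H*_{1,*g*} be the admixture fraction from source population *S*_{1} for a random individual in population *H* at a random locus at generation *g*. Because at generation 0, the hybrid population is not yet formed, *h*_{0} = 0, and *H*_{1,0} is not defined. Using Table 1, we can write a recursion relation to calculate *H*_{1,*g*} for all *g* ≥ 1. For the first generation (*g* = 1), we have

$$
H_{1,1} = \begin{cases}
1 & \text{if } Y = S_1S_1, \text{ with probability } s_{1,0}^2 \\
\tfrac{1}{2} & \text{if } Y = S_1S_2, \text{ with probability } 2s_{1,0}s_{2,0} \\
0 & \text{if } Y = S_2S_2, \text{ with probability } s_{2,0}^2.
\end{cases} \tag{1}
$$

For all subsequent generations (*g* ≥ 2), we have

$$
H_{1,g} = \begin{cases}
1 & \text{if } Y = S_1S_1, \text{ with probability } s_{1,g-1}^2 \\
\tfrac{1}{2}\left(1 + H_{1,g-1}\right) & \text{if } Y = S_1H, \text{ with probability } 2s_{1,g-1}h_{g-1} \\
\tfrac{1}{2} & \text{if } Y = S_1S_2, \text{ with probability } 2s_{1,g-1}s_{2,g-1} \\
\tfrac{1}{2}\left(H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}\right) & \text{if } Y = HH, \text{ with probability } h_{g-1}^2 \\
\tfrac{1}{2}H_{1,g-1} & \text{if } Y = S_2H, \text{ with probability } 2s_{2,g-1}h_{g-1} \\
0 & \text{if } Y = S_2S_2, \text{ with probability } s_{2,g-1}^2.
\end{cases} \tag{2}
$$

Here, $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are fractions of ancestry from source population *S*_{1} for the two parents of a hybrid individual at generation *g* with *Y* = *HH*. We use the superscripts (1) and (2) only to indicate that $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are separate independent and identically distributed (IID) random variables, so that if an individual in population *H* at generation *g* has two parents from *H*, the admixture fraction is distributed as the mean of the admixture fractions for two IID random individuals from *H* in the previous generation.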
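The recursion for the random variable lends itself to direct simulation. The sketch below is not part of the original analysis: it assumes that each parent is drawn independently from *S*_{1} (ancestry 1), *S*_{2} (ancestry 0), or *H* (an independent draw from the previous generation), and that the offspring's admixture fraction is the mean of the two parental values. The name `sample_h1` and the list-based parameter layout (`s1[i]` holding *s*_{1,*i*}) are hypothetical.

```python
import random


def sample_h1(g, s1, s2, rng):
    """Draw one value of H_{1,g} by walking up the pedigree.

    s1[i] and s2[i] are the contributions of S1 and S2 at generation i;
    the contribution of H is 1 - s1[i] - s2[i], and must be 0 at i = 0.
    """
    def parent_fraction():
        u = rng.random()
        if u < s1[g - 1]:
            return 1.0                      # parent from S1
        if g == 1 or u < s1[g - 1] + s2[g - 1]:
            return 0.0                      # parent from S2 (no H parents at g = 1)
        return sample_h1(g - 1, s1, s2, rng)  # parent from H, previous generation

    return 0.5 * (parent_fraction() + parent_fraction())


# Hypothetical scenario: symmetric founding, then s1 = 0.1, s2 = 0.2, h = 0.7.
rng = random.Random(0)
s1 = [0.5, 0.1, 0.1]
s2 = [0.5, 0.2, 0.2]
draws = [sample_h1(3, s1, s2, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
```

Every simulated value at generation 3 is a multiple of 1/8, and the sample mean should lie close to the expectation obtained from the mean recursion (0.415 for these parameter values).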

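Equations 1 and 2 allow us to analyze the behavior of the admixture fraction from a source population for a random individual in the hybrid population, as a function of the time *g* and the parameters *s*_{1,*i*}, *s*_{2,*i*}, and *h*_{*i*} for *i* = 1, 2, … , *g* − 1. Under our model, the set of possible values of *H*_{1,*g*} is $Q_g = \{0, 1/2^g, 2/2^g, \ldots, (2^g - 1)/2^g, 1\}$. Using Equations 1 and 2, we can show that for a value *q* in the set *Q*_{*g*}, the probability *P*(*H*_{1,*g*} = *q*) that a random individual in the hybrid population at generation *g* has admixture fraction *q* can be computed using the following recursion relation (*Appendix*). For the first generation (*g* = 1), we have $Q_1 = \{0, \tfrac{1}{2}, 1\}$ and

$$
P(H_{1,1} = q) = \begin{cases}
s_{2,0}^2 & \text{if } q = 0 \\
2s_{1,0}s_{2,0} & \text{if } q = \tfrac{1}{2} \\
s_{1,0}^2 & \text{if } q = 1.
\end{cases} \tag{3}
$$

For all subsequent generations (*g* ≥ 2), for *q* in *Q*_{*g*},

$$
\begin{aligned}
P(H_{1,g} = q) = {} & I_g(q) + 2s_{1,g-1}h_{g-1}\,P(H_{1,g-1} = 2q - 1) + 2s_{2,g-1}h_{g-1}\,P(H_{1,g-1} = 2q) \\
& + h_{g-1}^2 \sum_{r \in Q_{g-1}} P(H_{1,g-1} = r)\,P(H_{1,g-1} = 2q - r),
\end{aligned} \tag{4}
$$

where the function *I*_{*g*} is defined for all values of *q* in *Q*_{*g*} and equals

$$
I_g(q) = \begin{cases}
s_{1,g-1}^2 & \text{if } q = 1 \\
2s_{1,g-1}s_{2,g-1} & \text{if } q = \tfrac{1}{2} \\
s_{2,g-1}^2 & \text{if } q = 0 \\
0 & \text{otherwise}.
\end{cases} \tag{5}
$$

*P*(*H*_{1,*g*} = *q*) is zero when *q* is not in *Q*_{*g*}.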
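The distribution of the admixture fraction can also be computed exactly for small *g*. The sketch below does not transcribe the recursion term by term. Instead, it uses a reformulation assumed to be equivalent: the ancestry of a single parent is a mixture of point masses at 1 (from *S*_{1}) and 0 (from *S*_{2}) with the previous generation's distribution (from *H*), and the offspring distribution is the distribution of the mean of two independent parental draws. The function name and the list-based parameters are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def admixture_distribution(g, s1, s2):
    """Return {q: P(H_{1,g} = q)} with q stored as exact Fractions.

    s1[i], s2[i] are the contributions of S1 and S2 at generation i;
    the contribution of H is 1 - s1[i] - s2[i] (zero at i = 0).
    """
    prev = {}  # distribution of H_{1,gen-1}; empty before the founding
    for gen in range(1, g + 1):
        a, b = s1[gen - 1], s2[gen - 1]
        h = 1 - a - b
        # Distribution of the S1-ancestry of ONE parent.
        parent = defaultdict(float)
        parent[Fraction(1)] += a
        parent[Fraction(0)] += b
        for q, p in prev.items():
            parent[q] += h * p
        # Offspring value is the mean of two independent parental values.
        child = defaultdict(float)
        for q1, p1 in parent.items():
            for q2, p2 in parent.items():
                child[(q1 + q2) / 2] += p1 * p2
        prev = dict(child)
    return prev


# Symmetric founding with no later contributions, two generations.
dist = admixture_distribution(2, [0.5, 0.0], [0.5, 0.0])
mean = sum(float(q) * p for q, p in dist.items())
var = sum((float(q) - mean) ** 2 * p for q, p in dist.items())
```

The quadratic loop over parental values is adequate for the handful of generations typically plotted, since the support has only 2^*g* + 1 points.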

We can use Equations 3–5 to examine the evolution of the distribution of *H*_{1,*g*} across generations. For five scenarios in which the admixture process is constant after the founding of population *H* (*s*_{1,*g*} = *s*_{1} and *s*_{2,*g*} = *s*_{2} for all *g* ≥ 1), Figure 2 plots the complete set of values of *P*(*H*_{1,*g*}) for the first six generations.

In Figure 2A, we consider a scenario in which the hybrid population *H* is founded with equal contributions from source populations *S*_{1} and *S*_{2} (*s*_{1,0} = *s*_{2,0} = 1/2), which do not subsequently contribute to *H* (*s*_{1,*g*} = *s*_{2,*g*} = 0 for all *g* ≥ 1). We can see that the probability *P*(*H*_{1,*g*}) for a random individual in *H* to exhibit a given fraction of admixture from *S*_{1} is distributed symmetrically around 1/2 at each generation, with a single mode at *H*_{1,*g*} = 1/2 for each of the first six generations. This pattern arises from the fact that after a symmetric founding event, in the absence of immigration, no new input enters the admixed population from either source, and the distribution remains symmetric.

Figure 2B considers an admixture process with the same starting conditions as in the previous case (*s*_{1,0} = *s*_{2,0} = 1/2), in which the subsequent contributions from the source populations *S*_{1} and *S*_{2} to the hybrid population *H* are symmetric and constant across generations as before, but nonzero (*s*_{1,*g*} = *s*_{2,*g*} ≠ 0 for all *g* ≥ 1). In this case, because at each generation, the two source populations make equal contributions, the distribution of *H*_{1,*g*} continues to be symmetric around 1/2. Instead of being unimodal as in the previous case, however, it is now multimodal. This multimodality arises from the fact that in a scenario with continuing gene flow from the sources, new modes arise as the new immigrants mate with individuals whose admixture fractions lie near preexisting modes.

In Figure 2C, we consider an admixture process with a symmetric founding of population *H* as before (*s*_{1,0} = *s*_{2,0} = 1/2), in which the subsequent contributions from the source populations *S*_{1} and *S*_{2} are nonzero and constant across generations, but with *S*_{2} contributing more than *S*_{1} at each generation (0 ≠ *s*_{1,*g*} < *s*_{2,*g*} for all *g* ≥ 1). In this case, the distribution of *H*_{1,*g*} is no longer symmetric around 1/2. Instead, it is shifted toward smaller values after the founding of the hybrid population *H*. This pattern arises from the fact that in a scenario with continuing gene flow in which at each generation, many more individuals immigrate into *H* from *S*_{2} than from *S*_{1}, matings between new immigrants and admixed individuals are more likely to occur with immigrants from *S*_{2} than with immigrants from *S*_{1}. Thus, after the symmetric founding of population *H*, the probability of randomly drawing an individual in *H* with a high fraction of admixture from population *S*_{1} is lower than the probability of drawing an individual with a low fraction of admixture from *S*_{1}.

Figure 2D considers an admixture process in which population *S*_{1} contributes more than population *S*_{2} to the founding of population *H* (*s*_{1,0} > *s*_{2,0} ≠ 0), but with the same subsequent constant admixture process as in Figure 2C (0 ≠ *s*_{1,*g*} < *s*_{2,*g*} for all *g* ≥ 1). In this case, the distribution of *H*_{1,*g*} is no longer symmetric around 1/2 at generation 1, but is shifted toward higher values of the admixture fraction from source population *S*_{1}. Nevertheless, as in Figure 2C, the distribution of *H*_{1,*g*} shifts toward zero in the subsequent generations. As in Figure 2C, in each generation, admixed individuals in population *H* are more likely to mate with new immigrants from *S*_{2} than with new immigrants from *S*_{1}.

Finally, in Figure 2E, we consider a process in which the source population *S*_{1} contributes more than population *S*_{2} to the hybrid population not only in the founding of population *H* (*s*_{1,0} > *s*_{2,0} ≠ 0) but also in each subsequent generation (*s*_{1,*g*} > *s*_{2,*g*} ≠ 0 for all *g* ≥ 1). In this case, the distribution of *H*_{1,*g*} is shifted toward high values of the admixture fraction from population *S*_{1}. Unlike in Figure 2, C and D, in Figure 2E, an individual in population *H* is more likely to mate with a new immigrant from *S*_{1} than with a new immigrant from *S*_{2} at each generation following the founding of population *H*. Thus, unlike in Figure 2, C and D, generation after generation, the probability of randomly drawing an individual in population *H* with a high fraction of admixture from *S*_{1} is higher than that of drawing an individual with a low fraction of admixture from *S*_{1}.

This collection of scenarios illustrates three main points. First, if contributions to the admixed population occur only in the first generation, then the long-term level of admixture continues to reflect the initial conditions. Second, the same starting conditions can lead to quite different long-term patterns, depending on the subsequent contributions to the hybrid population. Third, with constant contributions at each generation, the starting conditions influence the speed with which the distribution of admixture tends toward its long-term distribution, but do not predict the qualitative form of this distribution.

### Moments of the admixture fraction for a random individual in the hybrid population

Analysis of the moments of the distribution of admixture as a function of time *g* can provide a way of understanding features of the distribution and its determinants in the historical admixture process itself. We can utilize the recursion in Equations 1 and 2 to obtain recursions for the expectation, variance, and higher moments of *H*_{1,*g*} as functions of *g* and *s*_{1,*i*}, *s*_{2,*i*}, and *h*_{*i*}, for *i* = 1, 2, … , *g* − 1. We first obtain a recursion for the expectation *E*[*H*_{1,*g*}]. Next, we generalize the method used for finding the expectation, and we obtain a recursion for the *k*th moment, $E[H_{1,g}^k]$. Using the case of *k* = 2, we obtain a recursion for the variance *V*[*H*_{1,*g*}].

#### Expectation of *H*_{1,*g*}

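Using the law of total expectation, we can obtain an expression for the expectation *E*[*H*_{1,*g*}] as a function of conditional expectations for different possible pairs of parents *Y* for a random individual in population *H* at generation *g*:

$$
E[H_{1,g}] = \sum_{y} P(Y = y)\,E[H_{1,g} \mid Y = y], \tag{6}
$$

where the sum is taken over the parental pairs $y \in \{S_1S_1, S_1H, S_1S_2, HH, S_2H, S_2S_2\}$. For the first generation, because parents cannot derive from population *H*, we have

$$
E[H_{1,1}] = \sum_{y \in \{S_1S_1,\, S_1S_2,\, S_2S_2\}} P(Y = y)\,E[H_{1,1} \mid Y = y]. \tag{7}
$$

Using Equations 1 and 2,

$$
E[H_{1,1}] = s_{1,0}^2 + s_{1,0}s_{2,0}, \tag{8}
$$

and for all subsequent generations (*g* ≥ 2),

$$
\begin{aligned}
E[H_{1,g}] = {} & s_{1,g-1}^2 + 2s_{1,g-1}h_{g-1}\,E\!\left[\frac{1 + H_{1,g-1}}{2}\right] + s_{1,g-1}s_{2,g-1} \\
& + h_{g-1}^2\,E\!\left[\frac{H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}}{2}\right] + 2s_{2,g-1}h_{g-1}\,E\!\left[\frac{H_{1,g-1}}{2}\right].
\end{aligned} \tag{9}
$$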

Recalling that for all *g* ≥ 0, *s*_{1,*g*} + *s*_{2,*g*} + *h*_{*g*} = 1, *h*_{0} = 0, and for all *g* ≥ 2, $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are IID random variables, we can simplify the recursion expression. For *g* = 1,

$$
E[H_{1,1}] = s_{1,0}, \tag{10}
$$

and for all subsequent generations (*g* ≥ 2), we have

$$
E[H_{1,g}] = s_{1,g-1} + h_{g-1}\,E[H_{1,g-1}]. \tag{11}
$$

This result demonstrates that for a random individual in the hybrid population *H*, the expectation of the admixture fraction from population *S*_{1} in one generation is a linear function of the corresponding expectation in the previous generation.
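The linear recursion for the expectation is simple to iterate numerically. The following is a minimal sketch, assuming the recursion takes the form *E*[*H*_{1,1}] = *s*_{1,0} and *E*[*H*_{1,*g*}] = *s*_{1,*g*−1} + *h*_{*g*−1}*E*[*H*_{1,*g*−1}] for *g* ≥ 2. The function name and the list-based parameter layout are hypothetical choices made for this example.

```python
def expected_admixture(g, s1, s2):
    """E[H_{1,g}] by iterating the first-order linear recursion.

    s1[i], s2[i] hold the contributions of S1 and S2 at generation i
    (i = 0 is the founding generation, where s1[0] + s2[0] = 1).
    """
    e = s1[0]                              # first generation
    for gen in range(2, g + 1):
        h = 1 - s1[gen - 1] - s2[gen - 1]  # contribution of H to itself
        e = s1[gen - 1] + h * e
    return e


# Hypothetical time-varying history over four generations.
e4 = expected_admixture(4, [0.5, 0.1, 0.0, 0.3], [0.5, 0.2, 0.0, 0.1])
```

Because the parameters enter generation by generation, the same function covers both constant and time-varying admixture histories.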

#### Moments of *H*_{1,*g*}

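Using a similar computation to that employed in obtaining the recursion for the expected admixture, we can write recursions for higher moments of the admixture fraction *H*_{1,*g*}. For the first generation (*g* = 1), we have for *k* ≥ 1,

$$
H_{1,1}^k = \begin{cases}
1 & \text{if } Y = S_1S_1 \\
\left(\tfrac{1}{2}\right)^k & \text{if } Y = S_1S_2 \\
0 & \text{if } Y = S_2S_2.
\end{cases} \tag{12}
$$

For all *g* ≥ 2, we have

$$
H_{1,g}^k = \begin{cases}
1 & \text{if } Y = S_1S_1 \\
\left(\frac{1 + H_{1,g-1}}{2}\right)^k & \text{if } Y = S_1H \\
\left(\tfrac{1}{2}\right)^k & \text{if } Y = S_1S_2 \\
\left(\frac{H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}}{2}\right)^k & \text{if } Y = HH \\
\left(\frac{H_{1,g-1}}{2}\right)^k & \text{if } Y = S_2H \\
0 & \text{if } Y = S_2S_2,
\end{cases} \tag{13}
$$

where $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ represent IID random variables for the fractions of ancestry from source population *S*_{1} for two hybrid individuals in generation *g* − 1.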

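Using the law of total expectation, for *k* ≥ 1, we have for the first generation (*g* = 1)

$$
E[H_{1,1}^k] = s_{1,0}^2 + 2s_{1,0}s_{2,0}\left(\tfrac{1}{2}\right)^k. \tag{14}
$$

For *g* ≥ 2, we have

$$
\begin{aligned}
E[H_{1,g}^k] = {} & s_{1,g-1}^2 + 2s_{1,g-1}h_{g-1}\,E\!\left[\left(\frac{1 + H_{1,g-1}}{2}\right)^k\right] + 2s_{1,g-1}s_{2,g-1}\left(\tfrac{1}{2}\right)^k \\
& + h_{g-1}^2\,E\!\left[\left(\frac{H_{1,g-1}^{(1)} + H_{1,g-1}^{(2)}}{2}\right)^k\right] + 2s_{2,g-1}h_{g-1}\,E\!\left[\left(\frac{H_{1,g-1}}{2}\right)^k\right].
\end{aligned} \tag{15}
$$

Recalling that for all *g* ≥ 0, *s*_{1,*g*} + *s*_{2,*g*} + *h*_{*g*} = 1, *h*_{0} = 0, and for all *g* ≥ 2, $H_{1,g-1}^{(1)}$ and $H_{1,g-1}^{(2)}$ are IID random variables, we can use the binomial theorem to obtain a simplified recursion for the moments of *H*_{1,*g*}. For the first generation, we have

$$
E[H_{1,1}^k] = s_{1,0}^2 + \frac{s_{1,0}s_{2,0}}{2^{k-1}}. \tag{16}
$$

For *g* ≥ 2,

$$
\begin{aligned}
E[H_{1,g}^k] = {} & s_{1,g-1}^2 + \frac{s_{1,g-1}s_{2,g-1}}{2^{k-1}} + \frac{s_{1,g-1}h_{g-1}}{2^{k-1}} \sum_{i=0}^{k} \binom{k}{i} E[H_{1,g-1}^i] + \frac{s_{2,g-1}h_{g-1}}{2^{k-1}}\,E[H_{1,g-1}^k] \\
& + \frac{h_{g-1}^2}{2^{k}} \sum_{i=0}^{k} \binom{k}{i} E[H_{1,g-1}^i]\,E[H_{1,g-1}^{k-i}].
\end{aligned} \tag{17}
$$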
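The moment recursion can be evaluated for all orders up to a chosen *k* at once, because the *k*th moment at generation *g* depends only on moments of order at most *k* at generation *g* − 1. The sketch below is an illustration under stated assumptions: it expands the conditional moments with the binomial theorem, uses independence of the two *H* parents to factor the *HH* term, and stores parameters in lists (`s1[i]` for *s*_{1,*i*}). The function name `moments_up_to` is hypothetical.

```python
from math import comb


def moments_up_to(k_max, g, s1, s2):
    """Return [E[H^0], E[H^1], ..., E[H^k_max]] for H = H_{1,g}."""
    a0, b0 = s1[0], s2[0]
    # First generation: only S1S1, S1S2, S2S2 parental pairs are possible.
    m = [1.0] + [a0 * a0 + a0 * b0 / 2 ** (k - 1) for k in range(1, k_max + 1)]
    for gen in range(2, g + 1):
        a, b = s1[gen - 1], s2[gen - 1]
        h = 1 - a - b
        new = [1.0]
        for k in range(1, k_max + 1):
            # E[(1 + H)^k] via the binomial theorem.
            one_plus = sum(comb(k, i) * m[i] for i in range(k + 1))
            # E[(H1 + H2)^k] for IID H1, H2.
            two_h = sum(comb(k, i) * m[i] * m[k - i] for i in range(k + 1))
            new.append(
                a * a
                + a * b / 2 ** (k - 1)
                + a * h / 2 ** (k - 1) * one_plus
                + b * h / 2 ** (k - 1) * m[k]
                + h * h / 2 ** k * two_h
            )
        m = new
    return m


# Hypothetical scenario: symmetric founding, then s1 = 0.1, s2 = 0.2.
m = moments_up_to(3, 3, [0.5, 0.1, 0.1], [0.5, 0.2, 0.2])
variance = m[2] - m[1] ** 2
```

Setting `k_max = 2` is enough for the variance; higher orders give access to skewness and other shape summaries of the admixture distribution.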

Note that by simplifying Equation 16 with *k* = 1, we obtain for the first generation

$$
E[H_{1,1}] = s_{1,0}^2 + s_{1,0}s_{2,0} = s_{1,0}, \tag{18}
$$

which matches Equation 10. Simplifying Equation 17 with *k* = 1 using the fact that *s*_{1,*g*} + *s*_{2,*g*} + *h*_{*g*} = 1 for all *g* ≥ 0, for all subsequent generations (*g* ≥ 2), we obtain

$$
E[H_{1,g}] = s_{1,g-1} + h_{g-1}\,E[H_{1,g-1}], \tag{19}
$$

which matches Equation 11.

#### Variance of *H*_{1,*g*}

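When *k* = 2, Equations 16 and 17 provide a recursion relation for the second moment of *H*_{1,*g*}. For the first generation, because *s*_{1,0} + *s*_{2,0} = 1, we have

$$
E[H_{1,1}^2] = \frac{s_{1,0}(1 + s_{1,0})}{2}. \tag{20}
$$

For subsequent generations (*g* ≥ 2), because *s*_{1,*g*} + *s*_{2,*g*} + *h*_{*g*} = 1 for all *g* ≥ 0, we obtain

$$
E[H_{1,g}^2] = \frac{s_{1,g-1}(1 + s_{1,g-1})}{2} + s_{1,g-1}h_{g-1}\,E[H_{1,g-1}] + \frac{h_{g-1}}{2}\,E[H_{1,g-1}^2] + \frac{h_{g-1}^2}{2}\left(E[H_{1,g-1}]\right)^2. \tag{21}
$$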

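With the relationship $V[H_{1,g}] = E[H_{1,g}^2] - \left(E[H_{1,g}]\right)^2$, and using Equations 10, 11, 20, and 21, we obtain a recursion for the variance of *H*_{1,*g*}. For the first generation (*g* = 1), we have

$$
V[H_{1,1}] = \frac{s_{1,0}(1 - s_{1,0})}{2}, \tag{22}
$$

and for *g* ≥ 2,

$$
V[H_{1,g}] = \frac{s_{1,g-1}(1 - s_{1,g-1})}{2} - s_{1,g-1}h_{g-1}\,E[H_{1,g-1}] + \frac{h_{g-1}(1 - h_{g-1})}{2}\left(E[H_{1,g-1}]\right)^2 + \frac{h_{g-1}}{2}\,V[H_{1,g-1}]. \tag{23}
$$

This recursion for the variance of the admixture fraction utilizes the variance in the previous generation, along with the expectation in the previous generation and its square.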
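Because the variance recursion involves the expectation from the previous generation, the two quantities are most naturally iterated together. The sketch below assumes the recursions have the forms written in its comments (the expectation is linear in the previous expectation; the variance combines the previous variance, the previous expectation, and its square). The function name `mean_and_variance` and the list-based parameters are hypothetical.

```python
def mean_and_variance(g, s1, s2):
    """Return (E[H_{1,g}], V[H_{1,g}]) by joint iteration.

    Assumed recursions, with a = s_{1,g-1} and h = h_{g-1}:
      E_g = a + h * E_{g-1}
      V_g = a(1 - a)/2 - a*h*E_{g-1} + h(1 - h)/2 * E_{g-1}^2 + (h/2) * V_{g-1}
    """
    e = s1[0]
    v = s1[0] * (1 - s1[0]) / 2
    for gen in range(2, g + 1):
        a = s1[gen - 1]
        h = 1 - a - s2[gen - 1]
        # Update the variance first: it needs the previous expectation.
        v = a * (1 - a) / 2 - a * h * e + h * (1 - h) / 2 * e * e + h / 2 * v
        e = a + h * e
    return e, v


# Hypothetical scenario: symmetric founding, then s1 = 0.1, s2 = 0.2.
e3, v3 = mean_and_variance(3, [0.5, 0.1, 0.1], [0.5, 0.2, 0.2])
```

Updating the variance before the expectation inside the loop matters, since the variance at generation *g* uses the expectation at generation *g* − 1.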

## Special Case: Constant Admixture after the Founding of the Hybrid Population

Using our recursions for the moments of the admixture fraction *H*_{1,*g*}, we can examine particular cases in which *s*_{1,*g*}, *s*_{2,*g*}, and *h*_{*g*} are specified. Here we consider a special case that reflects a constant process in which admixture occurs in the same way from one generation to the next after the founding of the hybrid population. In this section, we specify that for all *g* ≥ 1, all introgression parameters are constant in time after the founding of population *H* (*s*_{1,*g*} = *s*_{1}, *s*_{2,*g*} = *s*_{2}, and *h*_{*g*} = *h* for all *g* ≥ 1). We first consider a case in which no admixture from source populations *S*_{1} and *S*_{2} occurs after the founding of the hybrid population.

### A single admixture event

Suppose that source populations *S*_{1} and *S*_{2} do not contribute to the hybrid population after its founding (*s*_{1} = *s*_{2} = 0, and *h* = 1). As before, because at generation 0 the hybrid population is not yet formed, we specify that *h*_{0} = 0 and *s*_{1,0} + *s*_{2,0} = 1, with *s*_{1,0} and *s*_{2,0} both taking values in (0, 1).

#### Expectation of *H*_{1,*g*}

Under this scenario, we can simplify Equations 10 and 11 for the expected admixture from population *S*_{1}. Because *s*_{1} = *s*_{2} = 0 and *h* = 1, for all *g* ≥ 1, we have

$$
E[H_{1,g}] = s_{1,0}. \tag{24}
$$

When admixture occurs only in the initial generation, the expected admixture fraction for a random individual in the hybrid population at any generation depends only on the initial contribution from source population *S*_{1}.

#### Variance of *H*_{1,*g*}

Using Equations 22 and 23, *V*[*H*_{1,*g*}] follows the recursion relation of a geometric sequence with ratio 1/2 and initial value *V*[*H*_{1,1}] = *s*_{1,0}(1 − *s*_{1,0})/2. Therefore, for *g* ≥ 1,

$$
V[H_{1,g}] = \frac{s_{1,0}(1 - s_{1,0})}{2^{g}}. \tag{25}
$$

The variance decreases monotonically as a function of *g* and is smaller when the initial contribution *s*_{1,0} from source population *S*_{1} is farther away from 1/2.
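The geometric decay can be checked in one line of algebra. As a sketch, assume the general variance recursion has the form shown on the first line below. Setting *s*_{1} = 0 and *h* = 1 removes every term except the one involving the previous variance, and a numerical example with the hypothetical founding value *s*_{1,0} = 0.3 follows.

$$
\begin{aligned}
V[H_{1,g}] &= \frac{s_{1}(1 - s_{1})}{2} - s_{1}h\,E[H_{1,g-1}] + \frac{h(1 - h)}{2}\left(E[H_{1,g-1}]\right)^2 + \frac{h}{2}\,V[H_{1,g-1}] \\
&= \tfrac{1}{2}\,V[H_{1,g-1}] \qquad (s_1 = 0,\ h = 1), \\
V[H_{1,1}] &= \frac{0.3 \times 0.7}{2} = 0.105, \quad V[H_{1,2}] = 0.0525, \quad V[H_{1,3}] = 0.02625 = \frac{0.3 \times 0.7}{2^{3}}.
\end{aligned}
$$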

The scenario in Figure 2A, in which *s*_{1,0} = *s*_{2,0} = 1/2 and *s*_{1,*g*} = *s*_{2,*g*} = 0 for all *g* ≥ 1, provides an example of the setting considered here. In Figure 2A, the distribution of the admixture fraction for a random individual in *H* becomes increasingly concentrated near 1/2 as time progresses. As predicted by Equation 24, the mean admixture is constant over time with a value of 1/2. As predicted by Equation 25, the variance decreases over time; it eventually approaches zero, so that the admixture fraction for a random individual approaches the mean. This phenomenon can be attributed to the fact that except during the founding event, each mating in the population involves two individuals from the hybrid population itself; no new source of admixture draws the admixture fraction toward extreme values of 0 or 1. Thus, with admixture values equal to the mean of those of their parents, offspring individuals are likely to have intermediate admixture within the unit interval.

It is noteworthy that if admixture occurs in a single event, then Equations 24 and 25 provide a basis for estimating the time of the event from the observed mean and variance of admixture. Given mean *M* and variance *V* (with *V* ≠ 0), Equations 24 and 25 yield

$$
g = \log_2\!\left(\frac{M(1 - M)}{V}\right). \tag{26}
$$

It can be seen from Equation 26 that for a fixed mean, a smaller variance indicates a larger value of *g* and therefore a longer time since admixture, and for a fixed variance, a smaller value of *M*(1 − *M*) indicates a shorter time since admixture.
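As an illustration of this estimator, the sketch below computes the implied number of generations from a mean and variance, assuming a single admixture event and the relation *V* = *M*(1 − *M*)/2^*g*. The function name is hypothetical, and the input values in the example are invented for demonstration rather than taken from data.

```python
import math


def generations_since_admixture(mean, variance):
    """Time since a single admixture event implied by V = M(1 - M) / 2**g.

    The result is generally not an integer; it is the value of g that
    would reproduce the supplied mean and variance exactly.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return math.log2(mean * (1.0 - mean) / variance)


# Invented example: mean 0.5 and variance 0.25 / 2**5 correspond to g = 5.
g_hat = generations_since_admixture(0.5, 0.25 / 2 ** 5)
```

The estimate rests entirely on the single-event assumption; with continuing contributions from the sources, the same mean and variance would be compatible with other histories.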

### Nonzero combined contribution from the source populations at each generation

In this section, we consider values of *s*_{1} and *s*_{2} in [0, 1] and values of *h* in (0, 1). As before, because at generation 0 the hybrid population is not yet formed, *h*_{0} = 0 and *s*_{1,0} + *s*_{2,0} = 1. This set of assumptions corresponds to a process with a nonzero combined contribution of populations *S*_{1} and *S*_{2} to *H* in each generation (*s*_{1} + *s*_{2} ≠ 0 because *h* ≠ 1), although we do allow one or the other contribution to be zero (*s*_{1} = 0 and *s*_{2} ≠ 0 or *s*_{1} ≠ 0 and *s*_{2} = 0). The contribution of population *H* to itself in each generation is nonzero (*h* ≠ 0).

#### Expectation of *H*_{1,*g*}

Applying Equations 10 and 11, the recursion relation for *E*[*H*_{1,*g*}] can be simplified. For the first generation (*g* = 1), we have

$$
E[H_{1,1}] = s_{1,0}. \tag{27}
$$

For all subsequent generations (*g* ≥ 2),

$$
E[H_{1,g}] = s_{1} + h\,E[H_{1,g-1}]. \tag{28}
$$

This equation is a nonhomogeneous first-order recurrence of the form

$$
E[H_{1,g}] = \lambda\,E[H_{1,g-1}] + \psi(g), \tag{29}
$$

with initial condition *E*[*H*_{1,1}] = *s*_{1,0}, where ψ(*g*) = *s*_{1} and λ = *h*. Because we consider an admixture process that is constant from one generation to the next and we assume *h* ≠ 0 and *h* ≠ 1, we can apply Theorem 3.1.2 of Cull *et al.* (2005) to Equation 29 to obtain the unique solution for *E*[*H*_{1,*g*}]:

$$
E[H_{1,g}] = s_{1,0}\,h^{g-1} + s_{1}\,\frac{1 - h^{g-1}}{1 - h}. \tag{30}
$$
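A quick numerical comparison can confirm that a closed form and the underlying recursion agree. The sketch below assumes the recursion *E*[*H*_{1,*g*}] = *s*_{1} + *hE*[*H*_{1,*g*−1}] with *E*[*H*_{1,1}] = *s*_{1,0}, and the closed form *s*_{1,0}*h*^{*g*−1} + *s*_{1}(1 − *h*^{*g*−1})/(1 − *h*), which is the standard solution of a first-order linear recurrence with constant coefficients. Function names are hypothetical.

```python
def expectation_closed_form(g, s1_0, s1, s2):
    """Closed-form E[H_{1,g}] under constant admixture (requires 0 < h < 1)."""
    h = 1 - s1 - s2
    return s1_0 * h ** (g - 1) + s1 * (1 - h ** (g - 1)) / (1 - h)


def expectation_by_recursion(g, s1_0, s1, s2):
    """E[H_{1,g}] by iterating E_g = s1 + h * E_{g-1} from E_1 = s1_0."""
    h = 1 - s1 - s2
    e = s1_0
    for _ in range(2, g + 1):
        e = s1 + h * e
    return e


# Largest discrepancy over the first 20 generations for one hypothetical scenario.
max_gap = max(
    abs(expectation_closed_form(g, 0.5, 0.1, 0.2)
        - expectation_by_recursion(g, 0.5, 0.1, 0.2))
    for g in range(1, 21)
)
```

The first term of the closed form carries the memory of the founding event and decays as *h*^{*g*−1}; the second term accumulates the constant input from *S*_{1}.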

Figures 3 and 4 illustrate the expected admixture fraction as a function of *g* under constant admixture, as determined in Equation 30. In Figure 3, we can see that in three admixture scenarios with different parameter values for the founding of the hybrid population *H*, but with identical introgression parameters constant in the subsequent generations, the expected admixture fraction from the source population *S*_{1} approaches the same long-term limit. Moreover, in Figure 4, considering three scenarios with identical founding parameter values (*s*_{1,0} and *s*_{2,0}), but different values for the introgression parameters *s*_{1} and *s*_{2} in the subsequent generations with identical ratios, *s*_{1}/*s*_{2}, the expected admixture fraction also approaches the same long-term limit.

Using Equation 30 and the relation *s*_{1} + *s*_{2} + *h* = 1 with *h* ∈ (0, 1), we can compute the long-term limit of *E*[*H*_{1,*g*}] as *g* → ∞:

$$
\lim_{g \to \infty} E[H_{1,g}] = \frac{s_{1}}{1 - h} = \frac{s_{1}}{s_{1} + s_{2}}. \tag{31}
$$

Equation 31 demonstrates that the starting conditions (*s*_{1,0} and *s*_{2,0}) for the founding of the hybrid population *H* do not influence the long-term limiting expectation, as observed in Figure 3. The limiting expected admixture in Equation 31 can be rewritten as 1 − 1/(1 + *s*_{1}/*s*_{2}), showing that the limiting expectation is determined only by the ratio of the constant contributions from populations *S*_{1} and *S*_{2}, as observed in Figure 4.

Using Equation 31, we can plot the long-term limit of the expected admixture fraction from source population *S*_{1} as a function of the introgression parameters *s*_{1} and *s*_{2} (Figure 5). When the admixture process is constant over time, for a given value of *s*_{2}, the long-term expectation of the admixture fraction from the source population *S*_{1} increases monotonically with *s*_{1}. Because the long-term limit depends only on the ratio *s*_{1}/*s*_{2}, different introgression proportions as well as different founding scenarios for population *H* can lead, in the long term, to the same expected admixture fractions in *H*.
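The dependence of the limit on the ratio *s*_{1}/*s*_{2} alone can be illustrated numerically. The sketch below assumes the limiting expectation equals *s*_{1}/(*s*_{1} + *s*_{2}) and compares it with a long iteration of the recursion *E*[*H*_{1,*g*}] = *s*_{1} + *hE*[*H*_{1,*g*−1}]. The three scenarios are invented, chosen to share the ratio *s*_{1}/*s*_{2} = 1/2 while differing in founding values and in the magnitude of the contributions.

```python
def limiting_expectation(s1, s2):
    """Long-term expected admixture, assumed to equal s1 / (s1 + s2)."""
    return s1 / (s1 + s2)


def expectation_after(g, s1_0, s1, s2):
    """E[H_{1,g}] by iterating E_g = s1 + h * E_{g-1} from E_1 = s1_0."""
    h = 1 - s1 - s2
    e = s1_0
    for _ in range(2, g + 1):
        e = s1 + h * e
    return e


# (founding s1_0, constant s1, constant s2), all with s1 / s2 = 1/2.
scenarios = [(0.9, 0.05, 0.10), (0.5, 0.10, 0.20), (0.1, 0.20, 0.40)]
limits = [limiting_expectation(s1, s2) for _, s1, s2 in scenarios]
after_500 = [expectation_after(500, s0, s1, s2) for s0, s1, s2 in scenarios]
```

All three scenarios converge to 1/3, although the scenario with the smallest contributions (and hence the largest *h*) approaches the limit most slowly.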

#### Variance of *H*_{1,*g*}

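When the admixture process is constant across generations, we can employ the same methods used for obtaining the expectation of *H*_{1,*g*} to obtain a solution for $E[H_{1,g}^2]$. In this case, for the first generation (*g* = 1), Equation 20 gives

$$
E[H_{1,1}^2] = \frac{s_{1,0}(1 + s_{1,0})}{2}. \tag{32}
$$

For *g* ≥ 2, Equation 21 gives

$$
E[H_{1,g}^2] = \frac{s_{1}(1 + s_{1})}{2} + s_{1}h\,E[H_{1,g-1}] + \frac{h}{2}\,E[H_{1,g-1}^2] + \frac{h^2}{2}\left(E[H_{1,g-1}]\right)^2. \tag{33}
$$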

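As was true in the case of Equation 28, this equation is a nonhomogeneous first-order recurrence with the form

$$
E[H_{1,g}^2] = \lambda\,E[H_{1,g-1}^2] + \psi(g). \tag{34}
$$

Here, the initial condition is $E[H_{1,1}^2] = s_{1,0}(1 + s_{1,0})/2$, λ = *h*/2, and for all *g* ≥ 2,

$$
\psi(g) = \frac{s_{1}(1 + s_{1})}{2} + s_{1}h\,E[H_{1,g-1}] + \frac{h^2}{2}\left(E[H_{1,g-1}]\right)^2. \tag{35}
$$

Using Equation 30, we can simplify Equation 35 for all *g* ≥ 2, to obtain

$$
\psi(g) = \frac{s_{1}(1 + s_{1})}{2} + s_{1}h\left[\frac{s_{1}}{1 - h} + \left(s_{1,0} - \frac{s_{1}}{1 - h}\right)h^{g-2}\right] + \frac{h^2}{2}\left[\frac{s_{1}}{1 - h} + \left(s_{1,0} - \frac{s_{1}}{1 - h}\right)h^{g-2}\right]^2. \tag{36}
$$

Because *h* ≠ 0 and *h* ≠ 1, Theorem 3.1.2 of Cull *et al.* (2005) applies in the same way as in the computation of *E*[*H*_{1,*g*}], producing a unique solution for $E[H_{1,g}^2]$:

$$
E[H_{1,g}^2] = \left(\frac{h}{2}\right)^{g-1} E[H_{1,1}^2] + \sum_{i=2}^{g} \left(\frac{h}{2}\right)^{g-i} \psi(i). \tag{37}
$$

Decomposing the summation and summing separate geometric series, we obtain

$$
E[H_{1,g}^2] = A_1 + A_2\,h^{g-1} + A_3\left(\frac{h}{2}\right)^{g-1} + A_4\left(\frac{h}{2}\right)^{g-1} \sum_{i=1}^{g-1} (2h)^i, \tag{38}
$$

where

$$
A_1 = \frac{s_{1}\left[s_{1} + (1 - h)^2\right]}{(1 - h)^2 (2 - h)}, \tag{39}
$$

$$
A_2 = \frac{2s_{1}}{1 - h}\left(s_{1,0} - \frac{s_{1}}{1 - h}\right), \tag{40}
$$

$$
A_3 = \frac{s_{1,0}(1 + s_{1,0})}{2} - A_1 - A_2, \tag{41}
$$

$$
A_4 = \frac{1}{2}\left(s_{1,0} - \frac{s_{1}}{1 - h}\right)^2. \tag{42}
$$

With the relationship $V[H_{1,g}] = E[H_{1,g}^2] - \left(E[H_{1,g}]\right)^2$, and using Equations 30 and 38, we obtain the variance of *H*_{1,*g*}:

$$
V[H_{1,g}] = \frac{s_{1}(1 - s_{1} - h)}{(1 - h)(2 - h)} + A_3\left(\frac{h}{2}\right)^{g-1} - 2A_4\,h^{2g-2} + A_4\left(\frac{h}{2}\right)^{g-1} \sum_{i=1}^{g-1} (2h)^i. \tag{43}
$$

We can simplify Equation 43 to obtain expressions for *V*[*H*_{1,*g*}] without the summation $\sum_{i=1}^{g-1} (2h)^i$. For all values of *h* in (0, 1) with *h* ≠ 1/2, by summing the geometric series from Equation 43,

$$
V[H_{1,g}] = \frac{s_{1}(1 - s_{1} - h)}{(1 - h)(2 - h)} + (A_3 + A_5)\left(\frac{h}{2}\right)^{g-1} - (2A_4 + A_5)\,h^{2g-2}, \tag{44}
$$

where *A*_{5} = 2*hA*_{4}/(1 − 2*h*). For *h* = 1/2, Equation 43 gives

$$
V[H_{1,g}] = \frac{2s_{1}(1 - 2s_{1})}{3} + \frac{A_3 + (g - 3)A_4}{4^{g-1}}. \tag{45}
$$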

Figures 6 and 7 illustrate the variance of the admixture fraction under the special case of constant admixture, computing Equation 43 for different sets of values of the introgression parameters. Figure 6 shows that in three scenarios with different founding parameter values (*s*_{1,0} and *s*_{2,0}), because the admixture process is constant over time and identical among the scenarios, the variance of the admixture fraction from one of the source populations approaches the same long-term limit. In Figure 7, considering two admixture scenarios with identical founding events but opposite constant admixture processes, the variance of the admixture fraction also approaches the same limit.

In Figures 6 and 7, we can see that for some sets of values of *s*_{1,0}, *s*_{1}, and *s*_{2}, the variance of the admixture fraction from one of the source populations increases monotonically from the beginning of the admixture process until it reaches a maximal value and then decreases monotonically to its long-term limit. In these cases, at the beginning of the admixture process, the source populations introduce considerable variance to the distribution of the admixture fraction for a random individual in *H*. After a certain amount of time, the proportion of matings that involve members of the hybrid population *H* with similar admixture fractions increases, reducing the proportion of matings that generate offspring admixture fractions at opposite extremes. Additional matings then occur among individuals with similar admixture, ultimately decreasing the variance of the admixture fraction until *V*[*H*_{1,*g*}] approaches its long-term limit.

Because *h* ≠ 0 and *h* ≠ 1, we can compute the long-term limit of *V*[*H*_{1,*g*}] as *g* → ∞ using Equation 43. We obtain, for all values of *h* in (0, 1),

$$
\lim_{g \to \infty} V[H_{1,g}] = \frac{s_{1}(1 - s_{1} - h)}{(1 - h)(2 - h)}. \tag{46}
$$

The starting conditions do not influence the long-term limit, as observed in Figure 6.

Recalling that *s*_{1} + *s*_{2} + *h* = 1, an alternative representation for Equation 46 is

$$
\lim_{g \to \infty} V[H_{1,g}] = \frac{s_{1}s_{2}}{(s_{1} + s_{2})(1 + s_{1} + s_{2})}. \tag{47}
$$

It is possible to see from Equation 47 that if *s*_{1} + *s*_{2} is fixed, then the limiting variance is greater when both source populations contribute similarly to the hybrid population (*s*_{1} ≈ *s*_{2}) than when one source population contributes more than the other (*s*_{1} ≫ *s*_{2} or *s*_{2} ≫ *s*_{1}). Additionally, for a fixed ratio *s*_{1}/*s*_{2}, the variance is greater when the combined contribution from both source populations, *s*_{1} + *s*_{2}, is greater. This result is sensible, as continuing contributions from the source populations generate individuals with admixture fractions at opposite extremes, thereby increasing the variance of admixture fractions.
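The limiting variance can be checked against a long iteration of the variance recursion. The sketch below assumes the limit equals *s*_{1}*s*_{2}/[(*s*_{1} + *s*_{2})(1 + *s*_{1} + *s*_{2})] and assumes the constant-admixture recursions written in the code comments. Function names and parameter values are hypothetical.

```python
def limiting_variance(s1, s2):
    """Long-term variance, assumed to be s1*s2 / ((s1 + s2) * (1 + s1 + s2))."""
    total = s1 + s2
    return s1 * s2 / (total * (1 + total))


def variance_after(g, s1_0, s1, s2):
    """V[H_{1,g}] under constant admixture by joint iteration.

    Assumed recursions, with h = 1 - s1 - s2:
      E_g = s1 + h * E_{g-1}
      V_g = s1(1 - s1)/2 - s1*h*E_{g-1} + h(1 - h)/2 * E_{g-1}^2 + (h/2) * V_{g-1}
    """
    h = 1 - s1 - s2
    e = s1_0
    v = s1_0 * (1 - s1_0) / 2
    for _ in range(2, g + 1):
        v = s1 * (1 - s1) / 2 - s1 * h * e + h * (1 - h) / 2 * e * e + h / 2 * v
        e = s1 + h * e
    return v


# Two founding values, same constant process: both approach the same limit.
lim = limiting_variance(0.1, 0.2)
v_a = variance_after(500, 0.9, 0.1, 0.2)
v_b = variance_after(500, 0.1, 0.1, 0.2)
```

Swapping the arguments of `limiting_variance` leaves the result unchanged, which mirrors the symmetry of the limit in *s*_{1} and *s*_{2}.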

Using Equation 47, we can plot the long-term limit of the variance of admixture proportions as a function of *s*_{1} and *s*_{2} (Figure 8). Figure 8 illustrates that the long-term limit of *V*[*H*_{1,*g*}] is greatest when *s*_{1} = *s*_{2} and *s*_{1} + *s*_{2} ≈ 1 (and thus *h* ≈ 0). This scenario corresponds to an admixture process in which the admixed individuals in population *H* contribute little to the next generation, and the population *H* is largely founded anew at each generation from the source populations *S*_{1} and *S*_{2}, with identical proportions.

When *s*_{1} + *s*_{2} → 0, with *s*_{1}/*s*_{2} held constant, *h* → 1 and Equation 47 gives

$$\lim_{s_1 + s_2 \to 0} \, \lim_{g \to \infty} V[H_{1,g}] = 0. \tag{48}$$

This scenario corresponds to an admixture process in which populations *S*_{1} and *S*_{2} found the hybrid population *H* at the first generation and contribute little in subsequent generations. It tends toward the special case in which the source populations do not further contribute to the hybrid population after the founding event. The result in Equation 48 is consistent with the corresponding limit of Equation 25 for the case of no continuing admixture.

Considering our results on the expectation and variance of the admixture fraction together, although different admixture proportions that are constant and nonzero across generations can lead in the long term to the same expected fraction of admixture, such parameter values can produce different variances. The long-term limiting expectation and variance do not depend on the conditions of the founding event of the hybrid population *H*; they depend only on the subsequent constant admixture process.

## Discussion

Our study provides a theoretical framework for analyzing complex admixture processes that involve dynamic contributions of mutually isolated source populations to the ancestry of a hybrid population. Using our mechanistic approach, we have analytically derived recursions for the expectation, variance, and higher moments of the admixture fractions in a hybrid population. In the special case of constant admixture, we have solved the recursions and analyzed the behavior of the expectation and variance.

An important observable quantity that can be estimated in modern admixed populations and used for understanding historical aspects of the admixture process is the mean admixture fraction from a source population. For a hybrid population, this quantity provides a simple summary of its overall level of admixture. However, when a hybrid population is founded in a single admixture event, we have found that the mean admixture fraction is constant across generations and is therefore uninformative about the time since the founding of the hybrid population. When the source populations contribute in a constant manner to the hybrid population after the founding event, very different admixture processes can produce identical expected admixture fractions in the long term.

The behavior of the variance of the admixture fraction is more complex than that of the expectation. First, the variance is not constant in time, and therefore it does contain information about the time since the founding event. Second, the limiting variance can differ quite substantially for processes with the same limiting expectation, with the limit depending on the magnitude of the ongoing contributions from the source populations. Third, a low variance is characteristic of an admixture process that occurred as a single event, whereas higher variance occurs when admixture is ongoing. These results suggest that other easily measured quantities, such as the variance and higher moments of the admixture fraction, are likely to be informative, together with the mean, in statistical procedures for estimating the parameters of the historical admixture model that gives rise to a hybrid population.

Numerous statistical methods have been developed to estimate the admixture proportions from given source populations in hybrid populations using, for instance, maximum likelihood (Wang 2003; Tang *et al.* 2005; Alexander *et al.* 2009), least squares (Roberts and Hiorns 1965; Long and Smouse 1983), coalescence times (Bertorelle and Excoffier 1998), Bayesian approaches (Pritchard *et al.* 2000; Corander *et al.* 2003; Patterson *et al.* 2004), and principal components analysis (Paschou *et al.* 2007; McVean 2009; Bryc *et al.* 2010). Although many of these methods do estimate a composite parameter representing the time since initial admixture, they generally do not use a full mechanistic approach and have largely not tried to reconstruct the history of the admixture process.

Our model incorporates a general variation over time in the relative contributions of the source populations to the hybrid population. Owing to the potentially large number of parameters in a general case with arbitrary changes in admixture with time, it is unclear when the full history of admixture will be identifiable from genetic data. Indeed, as we have focused on the mean and variance of admixture in special cases of constant admixture processes, it is also uncertain how much information will be available for estimation from higher moments in a complex case with more parameters. However, our model is flexible enough to accommodate reductions in the number of parameters through assumptions of constant admixture over periods of many generations or over the entire history of the model. It is thus likely that identifiability can be achieved at least in some cases.

The initial theoretical framework that we have developed can be expanded to account for additional aspects of the admixture process for hybrid populations. For instance, in File S1, we extend the approach to consider *m* potential source populations, deriving general expressions for the moments of the random fraction of admixture originating from any specific one of the *m* source populations. However, we have not modeled sex-specific contributions from the source populations or assortative mating between hybrid individuals on the basis of their admixture fractions (Risch *et al.* 2009). Further, while we have considered the distribution of the admixture fraction across individuals in a hybrid population, we have studied admixture only pointwise in the genome, and we have not investigated variation in admixture across the genome of a random individual. The distribution of the length of chromosomal segments ultimately tracing to a particular source population, and other variables that could potentially be examined in a recombination-based model, could provide a useful additional set of quantities to consider beyond those available in our current formulation.

Finally, we have not accounted for genetic drift in the founding populations over the course of the admixture process, a phenomenon that can confound the accurate estimation of admixture proportions (Long 1991). In the future, all of these factors can be incorporated by extending our initial mechanistic admixture model. The various extensions will make it possible to draw more information from genetic data to shed light on the complex mechanisms underlying observed genetic variation in hybrid individuals and populations.

## Acknowledgments

The authors thank Ethan Jewett and Mike DeGiorgio for useful discussions and comments. This work was supported in part by U.S. National Institutes of Health grants R01 GM081441 and R01 HG005855, by U.S. National Science Foundation grant BCS-1147534, and by the Burroughs Wellcome Fund.

## Appendix

Here, we obtain the recursion relation for the distribution of the admixture fraction from source population *S*_{1} in the hybrid population *H* at generation *g* (Equations 3–5). Using the definitions of random variables *Y* and *H*_{1,*g*}, we can obtain an expression for *P*(*H*_{1,*g*} = *q*), where *q* lies in [0, 1], conditional on the different possible pairs of parents *Y* for a random individual in population *H* at generation *g*:

$$P(H_{1,g} = q) = \sum_{y} P(H_{1,g} = q \mid Y = y) \, P(Y = y). \tag{A1}$$

Using Equations 1 and 2, we can evaluate the conditional probabilities *P*(*H*_{1,*g*} = *q* | *Y* = *y*) in Equation A1 in terms of the unconditional probabilities *P*(*H*_{1,*g*−1}) in the previous generation, where for all *g* ≥ 1, *P*(*H*_{1,*g*} = *q*) = 0 when *q* is not in *Q*_{*g*}. For *q* in *Q*_{*g*}, we have

(A2) (A3) (A4) (A5) (A6) (A7)

By inserting Equations A2–A7 into Equation A1, we obtain the following recursion relation for *P*(*H*_{1,*g*} = *q*). For the first generation (*g* = 1), we have and

(A8)

For all subsequent generations (*g* ≥ 2),

(A9)

For any value of *q* in *Q*_{*g*} where *g* ≥ 1, we can use Table 1 to evaluate Equations A8 and A9 in terms of parameters *s*_{1,*g*−1}, *s*_{2,*g*−1}, and *h*_{*g*−1}. Simplifying Equations A8 and A9 we obtain Equations 3–5.
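The logic of the recursion (condition on the parental pair, then express each term through the distribution in the previous generation) can be sketched computationally. The code below is an illustrative reconstruction, not the expressions in Equations A8 and A9: it assumes that the two parents are drawn independently from *S*_{1}, *H*, or *S*_{2} with probabilities *s*_{1,*g*−1}, *h*_{*g*−1}, and *s*_{2,*g*−1}, that an offspring's admixture fraction is the average of its parents' fractions, and that the founding generation has no contribution from *H*. All function names are hypothetical, and exact rational arithmetic is used so that the probabilities can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def parent_distribution(s1, s2, prev):
    """Distribution of a single parent's admixture fraction from S1.

    A parent from S1 has fraction 1, a parent from S2 has fraction 0, and a
    parent from H has a fraction drawn from the previous generation's
    distribution `prev` (a dict mapping fraction -> probability).
    """
    h = 1 - s1 - s2
    dist = defaultdict(Fraction)
    dist[Fraction(1)] += s1
    dist[Fraction(0)] += s2
    for q, p in prev.items():
        dist[q] += h * p
    return {q: p for q, p in dist.items() if p != 0}


def next_generation(s1, s2, prev):
    """One step of the recursion: average two independently drawn parents."""
    parent = parent_distribution(s1, s2, prev)
    child = defaultdict(Fraction)
    for qa, pa in parent.items():
        for qb, pb in parent.items():
            child[(qa + qb) / 2] += pa * pb
    return dict(child)


def admixture_distribution(contributions):
    """Distribution of the admixture fraction after len(contributions) generations.

    `contributions` lists (s1, s2) for each generation; the first pair is the
    founding event and is assumed to satisfy s1 + s2 = 1 (no hybrid parents).
    """
    dist = {}
    for s1, s2 in contributions:
        dist = next_generation(Fraction(s1), Fraction(s2), dist)
    return dist


if __name__ == "__main__":
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    dist = admixture_distribution([(half, half), (quarter, quarter)])
    for q in sorted(dist):
        print(q, dist[q])
```

Passing `Fraction` values (rather than floats) keeps every probability exact, which makes it straightforward to confirm that the probabilities sum to 1 and to compute moments directly from the resulting distribution.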

## Footnotes

*Communicating editor: Y. S. Song*

- Received July 14, 2011.
- Accepted September 23, 2011.

- Copyright © 2011 by the Genetics Society of America

Available freely online through the author-supported open access option.