Probability of Identity by Descent in Metapopulations
- * Department of Mathematics, Uppsala University, S-751 06 Uppsala, Sweden
- † Department of Genetics, Uppsala University, S-750 07 Uppsala, Sweden
- Corresponding author: Martin Lascoux, Department of Genetics, Box 7003, S-750 07 Uppsala, Sweden. E-mail: martin.lascoux{at}genetik.uu.se
Abstract
Equilibrium probabilities of identity by descent (IBD) are calculated for pairs of genes within individuals, for genes between individuals within subpopulations, and for genes between subpopulations, in metapopulation models with fixed or varying colony sizes. A continuous-time analog of the Moran model is used in both cases. For fixed colony size, both propagule and migrant pool models are considered. The varying-size model is based on a birth-death-immigration (BDI) process, to which migration between colonies is added. Wright's F statistics are calculated and compared with previous results. Adding between-island migration to the BDI model can have an important effect on the equilibrium probabilities of IBD and on Wright's index.
Wright's island model considers “a population subdivided into random breeding islands with populations of size N of which the proportion m consists of immigrants that may be considered a random sample of the total species” (Wright 1969). The model also assumes that (1) the populations have discrete nonoverlapping generations, (2) individual populations do not fluctuate in size, and (3) migration is uniform over all islands. Under those assumptions and at steady state, Wright showed that FST, the correlation between two gametes drawn at random within a subpopulation relative to the total population, can be written as a simple function of Nm, namely

$$F_{ST} \approx \frac{1}{1 + 4Nm}.$$
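Wright's F statistics, FIS and FST, where the subscripts I, S, and T refer to individuals, subpopulations, and the total population, are expressed in terms of the probabilities of identity by descent (IBD) θ1 (two genes within an individual), θ2 (genes in different individuals of the same subpopulation), and θ3 (genes in different subpopulations) as

$$F_{IS} = \frac{\theta_1 - \theta_2}{1 - \theta_2}, \qquad F_{ST} = \frac{\theta_2 - \theta_3}{1 - \theta_3}.$$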
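As a minimal sketch, assuming the standard expressions FIS = (θ1 – θ2)/(1 – θ2), FST = (θ2 – θ3)/(1 – θ3) and Wright's island approximation FST ≈ 1/(1 + 4Nm), the indices can be evaluated numerically as below. The helper names `f_statistics` and `wright_island_fst` are our own, hypothetical choices; we also return the standard companion index FIT = (θ1 – θ3)/(1 – θ3), which satisfies 1 – FIT = (1 – FIS)(1 – FST).

```python
def f_statistics(theta1: float, theta2: float, theta3: float) -> dict:
    """Wright's F statistics from the three IBD probabilities.

    theta1: two genes within one individual
    theta2: genes in different individuals of the same subpopulation
    theta3: genes in different subpopulations
    """
    if theta2 >= 1.0 or theta3 >= 1.0:
        raise ValueError("theta2 and theta3 must be strictly below 1")
    return {
        "F_IS": (theta1 - theta2) / (1.0 - theta2),
        "F_ST": (theta2 - theta3) / (1.0 - theta3),
        "F_IT": (theta1 - theta3) / (1.0 - theta3),  # companion index
    }


def wright_island_fst(N: float, m: float) -> float:
    """Wright's island-model approximation F_ST ~ 1 / (1 + 4 N m)."""
    return 1.0 / (1.0 + 4.0 * N * m)


fs = f_statistics(0.3, 0.2, 0.1)
# The three indices are linked by 1 - F_IT = (1 - F_IS)(1 - F_ST).
print(fs, (1 - fs["F_IS"]) * (1 - fs["F_ST"]), 1 - fs["F_IT"])
print(wright_island_fst(N=50, m=0.01))  # Nm = 0.5 gives F_ST = 1/3
```

The θ values in the usage line are arbitrary illustrations, not results from the models below.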
In the present article we extend both models. In the first part a continuous-time version of the Maruyama and Tachida model is given and extended to a migrant pool extinction model. Detailed arguments are provided for deriving the systems of ordinary differential equations satisfied by the IBD probabilities, starting out from the basic mutation/reproduction balance and successively adding migration and extinction mechanisms. In the second part of the article, migration between islands and selfing are added to a diploid version of Rannala and Hartigan's model. An approximation for Wright's index at population genetic equilibrium is found. It is shown in particular that when there is migration among islands as well as immigration from a large source with a fixed probability of IBD (the situation modeled by Rannala and Hartigan), the death rate may have an important effect on probabilities of IBD and FST.
GENERALITIES AND SCOPE
Our purpose is to model and study genetic diversity at the individual, colony, and total population levels. The distribution of genetic diversity in the population is affected by five factors: (1) mutation, (2) reproduction, (3) extinction and subsequent replacement of colonies, (4) migration between colonies and/or from an external source, and (5) population size.
We assume that mutation follows an infinite-allele model (Kimura and Crow 1964), in which every mutation creates an allele that does not currently exist in the population. Under this model, two alleles that are identical in state are also identical by descent. Two types of reproduction systems are considered: asexual reproduction and sexual reproduction. Sexual reproduction includes both random and mixed mating (part of the population mating at random, the other part selfing). An extinct colony can be replaced by a given number of individuals originating from one of the existing colonies (propagule model; Slatkin 1977) or by a given number of individuals randomly sampled over the colonies (migrant pool model; Slatkin 1977). In either case, the original colony size is restored instantly. Alternatively, colonies can be gradually recolonized, allowing varying population sizes. Finally, colonies can exchange individuals (seeds) or single gametes (pollen).
We consider two models of basically different character: a fixed-size model, in which the colonies are equally large and of given size, and a variable-size model, in which the sizes of the subpopulations are given by birth-and-death processes. Next we introduce the parameters and the precise dynamics of both models.
Model parameters: In both models there are n colonies. For the fixed-size model we let N denote the number of diploid individuals in each colony and k be the number of colonists. The distribution of genetic material evolves according to the nonnegative Poisson intensities
- u, mutation of an allele into a new type per gene,
- 1/2, reproduction/compensatory death per individual,
- λ, extinction/recolonization per colony,
- ms, seed migration between colonies per individual.
In addition, two probabilities enter as parameters, namely,
- s, selfing probability,
- mp, pollen migration probability between colonies during mating.
The corresponding set of parameters for the variable-size model consists of the Poisson intensities
- u, mutation of an allele into a new type per gene,
- 1/2, reproduction per individual,
- δ, death per individual,
- β, immigration of a new individual from the mainland per colony,
- ms, seed migration between colonies per individual,
and the selfing probability s, 0 ≤ s ≤ 1. The reproduction intensity of 1/2 is retained to conform with the usual normalization. With this choice, a given individual is the result of a reproduction event during (t, t + h] with probability h/2 + o(h); hence, for two alleles carried by different individuals at time t + h, the probability that one of them is the result of reproduction during (t, t + h] is h + o(h).
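To see where h/2 + o(h) and h + o(h) come from, here is a small sketch that treats each individual's reproduction as an independent Poisson clock of intensity 1/2 (an illustrative simplification). The probability that at least one of c such clocks rings in an interval of length h is 1 – exp(–ch/2), and dividing by the first-order term shows the ratio tending to 1 as h shrinks. The helper name `prob_some_clock_rings` is hypothetical.

```python
import math

def prob_some_clock_rings(rate: float, clocks: int, h: float) -> float:
    """P(at least one of `clocks` independent Poisson clocks of intensity `rate`
    rings in an interval of length h) = 1 - exp(-rate * clocks * h)."""
    return -math.expm1(-rate * clocks * h)  # expm1 avoids cancellation for small h

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    one = prob_some_clock_rings(0.5, 1, h)  # one individual: h/2 + o(h)
    two = prob_some_clock_rings(0.5, 2, h)  # either of two individuals: h + o(h)
    print(f"h={h:g}  one/(h/2)={one / (h / 2):.6f}  two/h={two / h:.6f}")
```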
Mutation acts in the same way in both models. It is the basic cause of reduction in identity by state of the genes in the population. The other mechanisms require more precise descriptions.
Reproduction: The main difference between the fixed- and variable-size models is that in the first the Moran model mechanism of automatic compensating deaths following the formation of every new individual applies (Moran 1958), whereas in the second a birth event more realistically increases the size of the population by one unit. In the variable-size case the assumption δ > 1/2 guarantees that the population does not grow without bound, and the assumption β > 0 prevents permanent extinction.
To define reproduction in the fixed-size model we distinguish first of all whether reproduction occurs by random mating or asexually, and treat these cases separately. In both cases we assume the Moran model mechanism of automatic compensating deaths.
The asexual case is simpler. Here, when an individual's Poisson clock signals that it is going to reproduce, a genetic copy of this diploid individual is produced, followed by the death of one of the N original individuals. Note that the individual that dies may be the very one that triggered the reproduction event.
The more complicated sexual reproduction case also involves the selfing parameter s. Consider one of the N individuals and suppose its Poisson clock goes off. We interpret this as an instruction first to copy one of its own genes, which forms the first gene of the pair representing the new individual. The second gene is chosen in one of two ways: with probability s the reproducing individual again copies one of its own genes, and with probability 1 – s it copies one of the 2N existing genes sampled uniformly in the subpopulation. Because this uniform draw hits one of the parent's own two genes with probability 2/(2N) = 1/N, the actual selfing probability is s + (1 – s)/N rather than s. Again the formation of the new individual is instantly followed by a compensatory death.
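The effective selfing probability s + (1 – s)/N can be checked directly. The sketch below computes it exactly with `fractions.Fraction` and also estimates it by simulating the two-step choice of the second gene. Labelling the parent's genes 0 and 1 among the 2N colony genes is our own illustrative convention; only the parent-versus-other distinction matters. Function names are hypothetical.

```python
import random
from fractions import Fraction

def effective_selfing_exact(s, N: int) -> Fraction:
    """s (deliberate selfing) plus (1 - s) times the chance 2/(2N) that the
    uniformly drawn second gene is one of the parent's own two genes."""
    return s + (1 - s) * Fraction(2, 2 * N)

def effective_selfing_mc(s: float, N: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate; colony genes are 0..2N-1 and the parent owns 0 and 1."""
    hits = 0
    for _ in range(trials):
        if rng.random() < s:
            hits += 1                      # second gene copied from the parent itself
        elif rng.randrange(2 * N) in (0, 1):
            hits += 1                      # uniform draw happened to hit the parent
    return hits / trials

print(effective_selfing_exact(Fraction(1, 10), 20))  # 1/10 + (9/10)(1/20) = 29/200
print(effective_selfing_mc(0.1, 20, 200_000, random.Random(1)))
```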
In the variable-size model, for simplicity we restrict the analysis to sexual reproduction only.
Extinction/recolonization: As already stated, within the fixed-size model we cover the two schemes introduced by Slatkin (1977). The first mechanism is propagule recolonization: upon extinction of a colony, k individuals are picked at random in one of the n existing colonies to replace it. The second is the migrant pool, in which the replacement colony is instead put together by sampling k individuals uniformly over the whole population. Cases intermediate between these two extremes are also covered by considering the probability of common origin of two colonists (Whitlock and McCauley 1990). In all cases, the k individuals then reproduce to restore the colony to its full size N.
For the variable-size model one can say that deaths and immigration replace extinction/recolonization. Even if colonies may go extinct from time to time, influx from an external source is guaranteed. The effect of β is that external individuals enter into each colony independently, and an additional assumption will then be made regarding the genetic relationship between existing and immigrating gametes.
Migration: Each individual carries the seed migration intensity ms; when its Poisson clock rings, the individual selects one of the existing individuals in the population uniformly at random and exchanges position with it. For example, in the fixed-size model this yields a probability 1/(nN) that nothing at all happens, namely when the individual selects itself. Pollen migration, on the other hand, is a refinement of the random mating mechanism. We let mp denote the probability that one of the genes of a newly created individual is sampled at random from the full existing population rather than only from its own colony.
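Assuming, as described above, that the partner is drawn uniformly among all nN individuals of the fixed-size model (the migrating individual included), the partner types have simple exact probabilities. The hypothetical helper below returns them; the cross-colony probability (n – 1)/n = 1 – 1/n is the factor that appears in the seed migration intensity 2ms(1 – 1/n) for a pair of genes in the same colony.

```python
from fractions import Fraction

def seed_migration_outcomes(n: int, N: int) -> dict:
    """Partner types when one individual's seed-migration clock rings and it
    picks a partner uniformly among all n*N individuals (itself included)."""
    total = n * N
    return {
        "self": Fraction(1, total),                    # nothing happens
        "same colony": Fraction(N - 1, total),         # swap within the colony
        "other colony": Fraction((n - 1) * N, total),  # genuine migration
    }

probs = seed_migration_outcomes(n=5, N=10)
assert sum(probs.values()) == 1
print(probs["self"], probs["other colony"])  # 1/50 and 4/5 = 1 - 1/n
```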
Now we introduce more formally the IBD probabilities that we wish to study in this work. Pick two different alleles, A and B, randomly in the population. We consider the probabilities θk, k = 1, 2, 3, that A and B are identical by descent given their hierarchical “distance” k,
- θ1, if A and B belong to the same individual,
- θ2, if A and B belong to different individuals in the same colony,
- θ3, if A and B are taken from different colonies in the population.
FIXED-POPULATION SIZE MODEL
Mutation and reproduction: We consider the IBD probabilities θk(t), t ≥ 0, k = 1, 2, 3, as functions of time, and investigate to what extent they change during a small interval of time (t, t + h], h > 0. Pick two alleles at hierarchical distance k from each other and observe them at time t + h.
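First suppose that only mutation is in effect. Because the probability that one of the two selected genes mutates during (t, t + h] is 2uh + o(h),

$$\theta_k(t+h) = (1 - 2uh)\,\theta_k(t) + o(h), \quad\text{so that}\quad \theta_k'(t) = -2u\,\theta_k(t), \qquad \theta_k(t) = \theta_k(0)\,e^{-2ut} \to 0. \tag{2}$$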
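Under mutation alone, θk(t + h) = (1 – 2uh)θk(t) + o(h), so θk(t) = θk(0)e^{–2ut}. As a small numerical sketch (parameter values are arbitrary illustrations), iterating the one-step update with a small step reproduces this exponential decay.

```python
import math

def euler_mutation_only(theta0: float, u: float, t_end: float, steps: int) -> float:
    """Iterate theta(t + h) = (1 - 2 u h) theta(t), the mutation-only update."""
    h = t_end / steps
    theta = theta0
    for _ in range(steps):
        theta *= 1.0 - 2.0 * u * h
    return theta

u, theta0, t_end = 1e-3, 0.8, 500.0
approx = euler_mutation_only(theta0, u, t_end, steps=100_000)
exact = theta0 * math.exp(-2.0 * u * t_end)   # 0.8 * e^{-1}
print(approx, exact)
```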
Now we include reproduction, which stabilizes the genetic structure in the population and counteracts the diversifying effect of mutation through genetic drift toward fixation. The balance of mutation and genetic drift then leads to stationary probabilities θ1 and θ2. However, there is no genetic drift over colony borders, and hence for k = 3 the only solution is still θ3 = 0.
Asexual reproduction: Assume that all reproduction is asexual. Then the genetic relationship between the two genes of a single individual does not change if that individual was just formed as a copy of a previously existing individual. Mutation, on the other hand, does affect their relationship, forcing the IBD probability to vanish asymptotically as in (2). Hence the only stationary solution for θ1(t) is θ1 = 0.
For θ2(t), note that with probability h + o(h) one of the chosen genes belongs to an individual that was produced by asexual reproduction during (t, t + h]. Then one of the two individuals involved, call it A, existed at time t, and the other, call it B, is a copy of an individual picked at random among those existing at time t. With probability 1/N, B is a copy of A, in which case the two chosen genes effectively belong to the same individual: we have selected one gene in A and one gene in its copy B. With probability 1/2 these are copies of the same gene, and otherwise they are the two different genes of A, which at time t are IBD with probability θ1(t); thus our two genes are IBD with probability 1/2 + θ1(t)/2. With probability 1 – 1/N their IBD probability remains θ2(t). Hence,

$$\theta_2'(t) = \frac{1}{N}\left(\frac{1 + \theta_1(t)}{2} - \theta_2(t)\right) - 2u\,\theta_2(t),$$

and with θ1 = 0 the stationary solution is θ2 = 1/(2 + 4Nu).
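As a sketch, assuming the equation θ2′ = (1/N)[(1 + θ1)/2 – θ2] – 2uθ2 read off from the argument above (our reconstruction, not an equation quoted from the original), with θ1 = 0 in the asexual case, a forward-Euler integration converges to the closed form 1/(2 + 4Nu). With θ3 = 0 (no migration yet), the same value gives FIS = –θ2/(1 – θ2) and FST = θ2. The values N = 50 and u = 10^–3 are arbitrary.

```python
def theta2_rhs(theta2: float, N: int, u: float, theta1: float = 0.0) -> float:
    """Right-hand side of d(theta2)/dt = (1/N)((1 + theta1)/2 - theta2) - 2u theta2."""
    return ((1.0 + theta1) / 2.0 - theta2) / N - 2.0 * u * theta2

def integrate_theta2(N: int, u: float, theta2: float = 0.0,
                     h: float = 0.1, t_end: float = 20_000.0) -> float:
    """Forward-Euler integration with theta1 held at its asexual value 0."""
    for _ in range(int(round(t_end / h))):
        theta2 += h * theta2_rhs(theta2, N, u)
    return theta2

N, u = 50, 1e-3
closed_form = 1.0 / (2.0 + 4.0 * N * u)    # stationary theta2 when theta1 = 0
numeric = integrate_theta2(N, u)
f_is = (0.0 - numeric) / (1.0 - numeric)   # theta1 = 0
f_st = numeric                             # theta3 = 0 without migration
print(numeric, closed_form, f_is, f_st)
```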
Sexual reproduction: We start by deriving an equation for θ2(t). Consider at time t + h two alleles, a and b, from two different individuals in the same colony, and suppose one of them, a say, belongs to an individual that was created sexually during (t, t + h]. With probability 1/N, allele a was copied from the individual to which b belongs, in which case the two are IBD with probability 1/2 + θ1(t)/2. With probability 1 – 1/N, allele a was copied from some other individual, and their IBD probability remains θ2(t). Some reflection shows that these relations hold whether or not the individual selfed. Hence,

$$\theta_2'(t) = \frac{1}{N}\left(\frac{1 + \theta_1(t)}{2} - \theta_2(t)\right) - 2u\,\theta_2(t).$$
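The θ1 equation for the sexual case does not appear in this section. As a hedged sketch, we reconstructed one ourselves from the reproduction rules described under GENERALITIES AND SCOPE (first gene copied from the parent; second gene copied from the parent with probability s, otherwise drawn uniformly among the 2N colony genes). Together with the θ2 relation derived above, this gives θ1′ = (1/2)[s(1 + θ1)/2 + (1 – s)((1 + θ1)/(2N) + (1 – 1/N)θ2) – θ1] – 2uθ1 and θ2′ = (1/N)[(1 + θ1)/2 – θ2] – 2uθ2. Treat the θ1 equation as our assumption, not the authors' formula. The hypothetical function `sexual_equilibrium` sets both right-hand sides to zero and solves the resulting 2 × 2 linear system by Cramer's rule.

```python
def sexual_equilibrium(N: int, u: float, s: float) -> tuple:
    """Stationary (theta1, theta2) for one colony under the reconstructed
    sexual Moran equations (no migration, no extinction)."""
    # d(theta1)/dt = 0 written as a11*theta1 + a12*theta2 = b1
    a11 = 0.5 * (s / 2 + (1 - s) / (2 * N) - 1) - 2 * u
    a12 = 0.5 * (1 - s) * (1 - 1 / N)
    b1 = -0.5 * (s / 2 + (1 - s) / (2 * N))
    # d(theta2)/dt = 0 written as a21*theta1 + a22*theta2 = b2
    a21 = 1 / (2 * N)
    a22 = -(1 / N + 2 * u)
    b2 = -1 / (2 * N)
    det = a11 * a22 - a12 * a21
    theta1 = (b1 * a22 - a12 * b2) / det
    theta2 = (a11 * b2 - a21 * b1) / det
    return theta1, theta2

print(sexual_equilibrium(N=30, u=1e-3, s=0.2))
```

Two convenient sanity checks: for u = 0 both probabilities equal 1, and for s = 1 the first equation alone gives θ1 = 1/(1 + 8u).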
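Turning to Wright's indices, we obtain for the asexual case described previously, where θ1 = θ3 = 0 and θ2 = 1/(2 + 4Nu) at equilibrium,

$$F_{IS} = -\frac{1}{1 + 4Nu}, \qquad F_{ST} = \frac{1}{2 + 4Nu}.$$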
Adding migration and extinction: The next step toward enriching the applicability of the model, and toward taking into account the colony structure that has so far not entered the derivations, is to add migration and extinction.

Seed migration: To include seed migration we must consider the equations for θ2(t) and θ3(t), noting that migration of individuals does not affect θ1(t). Consider first a pair of genes selected in the same colony. The intensity of a migration event in (t, t + h] as a result of which the two genes were, in fact, in different colonies at time t is 2ms(1 – 1/n). Hence

$$\theta_2'(t) = \frac{1}{N}\left(\frac{1 + \theta_1(t)}{2} - \theta_2(t)\right) - 2u\,\theta_2(t) + 2m_s\left(1 - \frac{1}{n}\right)\big(\theta_3(t) - \theta_2(t)\big).$$
Pollen migration: Introduce the total migration parameter
To complete the analysis of the migration model, it remains to do the algebraic manipulations leading to the following expressions for Wright's indices:
Extinction/recolonization: The recolonization event consists of two steps that are supposed to occur instantly in case of extinction. First k (1 ≤ k ≤ N) founder individuals, or colonists, are selected from a single colony (propagule model) or at random over the n colonies (migrant pool model). Then N – k additional individuals are produced by sexual reproduction from the k colonists. To avoid unnecessarily complicated expressions we restrict attention to the case mp = 0, ignoring pollen migration.
The simplest case is the propagule model with k = N colonizers, in which an existing colony is copied and reinstalled in full whenever a colony goes extinct. Clearly, extinction and the subsequent reinstatement of a colony have no effect on θ1 or θ2. Hence we derive the stationary solution for θ3(t), taking into account the possibility that one of the two colonies containing the pair of genes selected for the determination of θ3(t + h) goes extinct during (t, t + h], which happens with probability 2λh + o(h). The probability is 1/n that the replacement colony is copied from the colony to which the other gene of the pair belongs, and the probability is 1/(nN) that the replacement gene is even copied from the same individual as the other gene. Thus
Next we turn to the general propagule and migrant pool schemes with k colonizers. Equation 11 for θ1(t) and Equation 21 for θ3(t) with k = N remain unchanged, whereas Equation 12 for θ2(t) changes. To find the modified equation for θ2(t), suppose we pick at time t + h two alleles from two different individuals in a given colony. With probability λh + o(h), the colony suffers extinction and is recolonized by k founders, all within (t, t + h]. Let αk denote the probability that two such alleles originate from the same founder.
Thus, temporarily switching off mutation, reproduction, and migration and considering only extinction/recolonization,
The corresponding steady-state solutions (22) change into
It remains to calculate the probabilities αk, a task that can be rephrased as follows. Suppose we start with k individuals marked 1, ..., k and select from them, by random sampling with replacement, 2(N – k) allele copies, forming N – k new individuals. This scheme yields a colony of 2N genes classified into k categories, each represented by at least one gene. Now select at random two different individuals, and in each of them a gene. We want to find the probability αk that both genes belong to the same category (originate from the same founder). If both chosen individuals belong to the original group of k founders, which happens with probability k(k – 1)/(N(N – 1)), the two genes necessarily carry different marks. Otherwise, the probability is 1/k that they are marked with the same number. Hence,

$$\alpha_k = \frac{1}{k}\left(1 - \frac{k(k-1)}{N(N-1)}\right).$$
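The sampling scheme above translates directly into a simulation. As a sketch, the formula implied by the argument, αk = (1/k)[1 – k(k – 1)/(N(N – 1))], can be checked against a Monte Carlo estimate. Individuals 0, ..., k – 1 are the founders (both genes carry their own mark), and each gene of a later individual carries a mark drawn uniformly from the k founders. Function names and the values k = 3, N = 10 are our own illustrative choices.

```python
import random
from fractions import Fraction

def alpha_exact(k: int, N: int) -> Fraction:
    """P(genes from two distinct random individuals share a founder mark)."""
    both_founders = Fraction(k * (k - 1), N * (N - 1))
    return (1 - both_founders) * Fraction(1, k)

def alpha_mc(k: int, N: int, trials: int, rng: random.Random) -> float:
    hits = 0
    for _ in range(trials):
        a, b = rng.sample(range(N), 2)          # two distinct individuals
        la = a if a < k else rng.randrange(k)   # founder mark of a gene in a
        lb = b if b < k else rng.randrange(k)   # founder mark of a gene in b
        hits += la == lb
    return hits / trials

print(alpha_exact(3, 10))                             # 14/45
print(alpha_mc(3, 10, 200_000, random.Random(7)))
```

The limiting cases behave as expected: αN = 0, since with k = N no new individuals are formed, and α1 = 1.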
The exact expressions for Wright's indices are somewhat unwieldy in this case, so we defer the further discussion to the final section where we apply some approximations and state a result.
IBD PROBABILITY IN VARYING POPULATION SIZE
An obvious drawback with the previous model is the restriction to fixed colony sizes N throughout the population. In population genetics modeling it has been natural to apply simple birth-death-immigration (BDI) models to widen the scope toward varying population size. In particular, there is a rich probabilistic structure in haploid populations based on the demographic decomposition into the number of families of allele types with a given number of representatives in the population (Tavaré 1989). Rannala and Hartigan (1995) explore some of these ideas for computing IBD probabilities in a metapopulation run by birth, death, and immigration. Rannala (1996) considers the corresponding sampling distribution of alleles under the same model and derives an expression for FST, which then coincides with θ2 in our notation.
We take a different approach and extend the analysis of Rannala and Hartigan to include, under equilibrium conditions, selfing and migration between colonies. Our study may also be seen as a further alternative to the extinction/recolonization scheme in the fixed-size model.
A birth-death-immigration model: We start afresh from the situation that led to Equation 15, keeping seed migration but replacing the size-preserving Moran mechanism by death and immigration, as explained in GENERALITIES AND SCOPE.
Within this setting, δ introduces the potential for local population extinctions and β the potential for local recolonizations. Of several possible schemes for the probability of IBD between a newly immigrated gene and an existing one, we assume that all immigrants are taken from a pool of individuals such that a newly immigrated gene and an existing gene are identical by descent with a fixed probability f.
We assume that only seed migration is in effect. Hence individuals are exchanged between colonies without changing the total population size. Let Npop(t) denote the total number of individuals in the population at time t.
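The conditional jump probabilities at a time t such that Npop(t) = r are given by

$$P\big(N^{\mathrm{pop}}(t+h) = r + 1 \mid N^{\mathrm{pop}}(t) = r\big) = \left(\frac{r}{2} + n\beta\right)h + o(h),$$
$$P\big(N^{\mathrm{pop}}(t+h) = r - 1 \mid N^{\mathrm{pop}}(t) = r\big) = r\delta h + o(h).$$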
Next we study the sizes of the single colonies. We refer to their stationary distributions by means of random variables (N1, ..., Nn). By symmetry we immediately have E(Nj) = E(Npop)/n, j = 1, ..., n.
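When Npop = r, births occur at rate r/2, deaths at rate rδ, and external immigration at total rate nβ (one β per colony), while seed migration leaves the total unchanged. As a sketch under these rates, the stationary distribution of the total size can be computed by detailed balance for a birth-death chain, truncated at a large size, and its mean compared with the standard linear BDI result nβ/(δ – 1/2), valid for δ > 1/2. Dividing by n gives E(Nj) by symmetry. The helper name `bdi_stationary` and the parameter values are hypothetical illustrations.

```python
def bdi_stationary(birth: float, death: float, immig: float, r_max: int) -> list:
    """Stationary law of a linear birth-death-immigration chain, truncated at r_max.
    Up-rate at size r: birth*r + immig; down-rate: death*r. Detailed balance gives
    pi(r + 1) / pi(r) = (birth*r + immig) / (death*(r + 1))."""
    if death <= birth:
        raise ValueError("need death > birth (here delta > 1/2) for a stationary law")
    weights = [1.0]
    for r in range(r_max):
        weights.append(weights[-1] * (birth * r + immig) / (death * (r + 1)))
    total = sum(weights)
    return [w / total for w in weights]

n, beta, delta = 10, 0.2, 0.75                  # illustrative values only
pi = bdi_stationary(birth=0.5, death=delta, immig=n * beta, r_max=400)
mean_pop = sum(r * p for r, p in enumerate(pi))
mean_theory = n * beta / (delta - 0.5)          # standard BDI stationary mean
mean_colony = mean_pop / n                      # E(Nj) = E(Npop)/n by symmetry
print(mean_pop, mean_theory, mean_colony)
```

This sketch says nothing about the joint law of (N1, ..., Nn), which is dependent when ms > 0.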
To this end, we note that in the present framework migration corresponds to the death of an individual in one colony followed instantly by immigration into a randomly chosen colony. Hence, focusing on the size Nj(t) of a specific colony, the total external immigration intensity at time t is given by
As an approximation for the colony size we have thus found a slightly different BDI process. Its conditional jump probabilities given that Nj(t) = r are
It is interesting to compare the variance for the components Nj with the variance of their sum Npop in the dependent case m > 0. We obtain
Probabilities of IBD in the varying-population-size model: Because in the present model some colonies, or even the whole population, may with positive probability be extinct from time to time, the definition of the IBD probabilities θk is not as straightforward as in the previous constant-size model. To define θ2 there must be at least one colony with two individuals, and to define θ3 at least two nonempty colonies. When these conditions fail, it is reasonable to assign corresponding weights so that θ2 = 0 and θ3 = 0, respectively. However, we have, for example,
Now we go over the various events that may occur in (t, t + h]. First of all, with intensity
Next, with intensity
Finally, with intensity r/2 a birth occurs in (t, t + h], so that again the size changes from r to r + 1. When a pair of genes A and B is selected at time t + h, the probability is 2/(r + 1) that one of them belongs to the newly reproduced individual. Suppose A does. Then there are two possibilities. With probability 1/r gene A was produced by random sampling from the individual that gene B belongs to, in which case the probability of IBD is 1/2 + θ1(t)/2. With probability 1 – 1/r the relationship between A and B is given simply by θ2(t).
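The birth-event bookkeeping in a colony of size r can be written out directly. The hypothetical helpers below check by enumeration that a uniformly chosen pair among the r + 1 individuals includes the newborn with probability 2/(r + 1), and compute the resulting conditional IBD probability of the pair using the case split 1/r versus 1 – 1/r stated above. This covers only the birth branch, not the death, immigration, or migration branches of the full equation.

```python
from itertools import combinations

def pair_includes_newborn(r: int) -> float:
    """Fraction of unordered pairs among r + 1 individuals (labelled 0..r,
    the newborn being r) that contain the newborn; equals 2 / (r + 1)."""
    pairs = list(combinations(range(r + 1), 2))
    return sum(r in pair for pair in pairs) / len(pairs)

def theta2_after_birth(r: int, theta1: float, theta2: float) -> float:
    """IBD probability of a random pair in a colony that just grew from r to
    r + 1 by a birth, following the case split in the text."""
    p_newborn = 2 / (r + 1)
    with_newborn = (1 / r) * (0.5 + theta1 / 2) + (1 - 1 / r) * theta2
    return p_newborn * with_newborn + (1 - p_newborn) * theta2

print(pair_includes_newborn(7))          # 0.25 = 2/8
print(theta2_after_birth(7, 0.2, 0.4))
```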
Summing up,
Turning to θ3, suppose two colonies i and j have been chosen. We must determine the effect of immigration into either of them during (t, t + h]. Suppose colony i of size r receives an external immigrant (or colony j of size q). Then with probability 1/(r + 1) the immigrant carries a gene of the selected pair, whose IBD probability then becomes f. Similarly, if the immigrant is a migrant from j, then again with probability 1/(r + 1) the IBD probability of the selected pair changes, this time into θ2. The intensity of the first event is β and that of the second is approximately mE(Nj)/n. Hence
At this point we observe that the steady-state probabilities corresponding to the balance equations we have derived will depend on r and q unless u = 0. In the present model the external immigration parameter β has an effect and purpose similar to that of mutation, namely, to guarantee a fresh import of genetic material and enhance genetic diversity. The difference is that mutation acts on the individual level, whereas immigration occurs in colonies regardless of their sizes. In what follows we therefore neglect mutation and set u = 0. Moreover, because θ′1(t) = 0 implies
In conclusion we have found, under the simplifying assumptions of mutation being negligible and the typical size of a colony being large, that the conditional IBD probabilities at equilibrium for given colony sizes are expressed by the relations (30). Again relying on, e.g., (29), we let expression (30) represent the unconditional IBD probabilities for the variable-size model. Thus Wright's genetic indices for the BDI model turn out to be
DISCUSSION
Relation of fixed-size model to previous work: We compare the expressions derived so far with those previously obtained by Maruyama and Tachida (1992) and Whitlock and McCauley (1990). To recover the results of Maruyama and Tachida (1992) from (20) and (23), or the combination of them, we observe that if u, λ, and m are small compared to N, then
We take these findings as a basis for claiming that the continuous time approach has some advantages over the discrete analog for investigating the detailed influence on Wright's indices of the various random mechanisms that change population genetic diversity.
We continue with some additional remarks on the approximate analysis discussed above. It is interesting that the equilibrium relation (14) is still valid for the approximate solutions, that is,
For the case of extinction/recolonization with k founding individuals, we apply the approximation (34) and obtain from (25)
The remaining case
Finally, it is instructive to look at the expressions for FST as N → ∞ and k is fixed. In the migrant pool case φ = 1/n,
Discussion of the variable size model: Writing Ncol for the size of a typical colony we have equivalently to (31)
In the special case ms = 0, for which
Our model as well as that of Rannala and Hartigan assume (i) the presence of a continuous flow of immigrants from a mainland in which allele frequencies remain constant and (ii) the absence of spatial structure. Clearly, in many biological situations these are strong assumptions. Therefore a natural next step is to study probabilities of IBD in a metapopulation where colony sizes are regulated by a birth-death process, without external immigration, and in which the migration rate among colonies depends on their spatial location. The demography of similar systems has been studied (e.g., Pollett 1998) but, to our knowledge, their genetic properties remain unexplored.
Acknowledgments
We thank Dr. Steve Krone and two anonymous referees for useful comments on the manuscript.
Footnotes
- Communicating editor: W. Stephan
- Received November 29, 1998.
- Accepted April 8, 1999.
- Copyright © 1999 by the Genetics Society of America