Abstract
In 1991, Barton and Turelli developed recursions to describe the evolution of multilocus systems under arbitrary forms of selection. This article generalizes their approach to allow for arbitrary modes of inheritance, including diploidy, polyploidy, sex linkage, cytoplasmic inheritance, and genomic imprinting. The framework is also extended to allow for other deterministic evolutionary forces, including migration and mutation. Exact recursions that fully describe the state of the population are presented; these are implemented in a computer algebra package (available on the Web at http://helios.bto.ed.ac.uk/evolgen). Despite the generality of our framework, it can describe evolutionary dynamics exactly by just two equations. These recursions can be further simplified using a “quasi-linkage equilibrium” (QLE) approximation. We illustrate the methods by finding the effect of natural selection, sexual selection, mutation, and migration on the genetic composition of a population.
EVOLUTION involves simultaneous changes at many genetic loci. Modeling these changes is difficult because associations between alleles at different loci (“linkage disequilibria”) cause the effects of selection acting on one locus to spill over onto other loci, generating indirect selection (see Ewens 1979, p. 195). These indirect effects influence the evolution of genes that are themselves under direct selection, altering the course of adaptation (e.g., Hill and Robertson 1966; Barton 1983). Moreover, indirect selection determines the fate of modifier genes that have important effects even if they themselves are free of direct selection. Examples of such modifiers include female mating preference genes (Fisher 1952), modifiers of recombination (Otto and Michalakis 1998), and modifiers of the mutation rate (Dawson 1999; Sniegowski et al. 2000).
The most obvious approach to modeling multilocus systems is simply to follow the frequencies of all possible genotypes. There are three basic drawbacks here. First, the number of genotypes grows exponentially with the number of loci, rapidly overwhelming both analytical and simulation approaches when there are even a modest number of genes. Second, the quantities that are often of most interest, such as allele frequencies and mean phenotypes, are obscured by working with genotypes. Third, approximations for the dynamic equations appear more naturally when we work with quantities other than genotype frequencies.
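The first drawback can be made concrete with a short count. The sketch below is illustrative only: it assumes biallelic loci by default, counts haploid genotypes as all allele combinations, and counts diploid genotypes as unordered pairs of haplotypes (so parental origin is ignored). The function names are hypothetical.

```python
def haploid_genotypes(n_loci: int, alleles_per_locus: int = 2) -> int:
    """Number of distinct haplotypes for n_loci loci."""
    return alleles_per_locus ** n_loci


def diploid_genotypes(n_loci: int, alleles_per_locus: int = 2) -> int:
    """Number of unordered pairs of haplotypes (parental origin ignored)."""
    h = haploid_genotypes(n_loci, alleles_per_locus)
    return h * (h + 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, haploid_genotypes(n), diploid_genotypes(n))
```

Even at 10 loci the diploid count exceeds half a million under these assumptions, which is why following every genotype frequency quickly becomes impractical.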
Several approximate approaches have been developed to deal with these problems.
- The infinitesimal model assumes that very many genes influence the phenotype, such that each allele has an infinitesimal effect (Fisher 1918; Bulmer 1980; Turelli and Barton 1994). In this limit, the genetic variance contributed by allelic variation at each locus is constant, and evolutionary change is due solely to changes in associations among loci. This is an accurate and general approximation for short-term change under strong selection, but cannot describe changes in allele frequencies over the longer term.
- The hypergeometric model (Kondrashov 1984; Barton 1992; Doebeli 1996) also assumes that loci are unlinked and have equal effects, but allows the number of loci to be finite. However, the stability of solutions to this model is limited to certain selection regimes (Shpak and Kondrashov 1999; Barton and Shpak 2000), which do not include many scenarios of evolutionary interest, such as stabilizing selection.
- Another method introduced by Fisher (1953) models a population by following the inheritance of “junctions” between chromosome regions with different ancestries. This approach is well suited to describing the ancestry of samples of neutral genomes (Hudson 1990) and models of hybridization, in which selection can be approximated as acting on the proportions of genetic material derived from different source populations (Baird 1995). It is again intractable over long timescales, however, since the number of junctions increases geometrically.
- Price (1970) gave an exact and completely general equation, in which the average change in a trait is precisely equal to its covariance with relative fitness plus the change due to transmission. This takes the classical approach of quantitative genetics, by following only the phenotype and disregarding the (usually unknown) genetics that underlies it.
- Several independent developments extend Price's approach by following the mean, variance, and higher moments of the phenotypic distribution (e.g., Barton and Turelli 1987; Bürger 1991; Shapiro et al. 1994). Each moment depends on higher moments, however, and so approximations are required to give a closed set of dynamical equations. Such approximations are accurate under restricted circumstances only. In contrast to thermodynamics, where molecular motions can be averaged out over macroscopic scales, genetic details do influence phenotypic evolution.
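Price's equation from the list above can be checked numerically. The sketch below assumes the standard statement of the equation, in which the change in the mean trait equals the covariance of the trait with relative fitness plus the fitness-weighted mean change during transmission. The parental trait values, offspring means, and fitnesses are invented purely for illustration, and the function names are hypothetical.

```python
from fractions import Fraction as F


def weighted_mean(values, weights):
    """Mean of values under the given (unnormalized) weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def price_terms(z, z_offspring, fitness):
    """Return (covariance term, transmission term) of Price's equation.

    z           : trait value of each equally frequent parent
    z_offspring : mean trait value among each parent's offspring
    fitness     : absolute fitness (offspring number) of each parent
    """
    ones = [1] * len(z)
    w_bar = weighted_mean(fitness, ones)
    w_rel = [w / w_bar for w in fitness]  # relative fitness, mean 1
    z_bar = weighted_mean(z, ones)
    # Cov(w, z) = E[w z] - E[w] E[z], and E[w] = 1 for relative fitness.
    cov = weighted_mean([w * zi for w, zi in zip(w_rel, z)], ones) - z_bar
    transmission = weighted_mean(
        [w * (zo - zi) for w, zi, zo in zip(w_rel, z, z_offspring)], ones
    )
    return cov, transmission


def direct_change(z, z_offspring, fitness):
    """Change in the mean trait computed directly from the offspring."""
    return weighted_mean(z_offspring, fitness) - weighted_mean(z, [1] * len(z))


# Illustrative values only.
z = [F(1), F(2), F(3), F(4)]
z_offspring = [F(3, 2), F(2), F(5, 2), F(7, 2)]
fitness = [F(1), F(2), F(2), F(3)]

cov, transmission = price_terms(z, z_offspring, fitness)
```

With exact rational arithmetic the two terms sum to the directly computed change, whatever values are chosen.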
Barton and Turelli (1991, hereafter BT91) developed the quantitative genetic approach to provide a complete description of multilocus systems, with no restrictions on the relation between genotype and phenotype. This developed from work by Barton (1983, 1986), Barton and Turelli (1987), and Turelli and Barton (1990); it was paralleled by independent work of Christiansen (1987) and Bürger (1991). Barton and Turelli's (1991) approach contains three key elements. First, it gives a general representation of populations with multiple alleles and multiple loci. Second, it derives exact recursions for the effects of selection and recombination on allele frequencies and the associations between alleles at different loci. Third, it finds a “quasi-linkage equilibrium” approximation (QLE) that allows the recursion equations to be greatly simplified under some conditions.
Although the notation and framework of the BT91 approach are general in many respects, the methods it develops are restricted to certain forms of inheritance. Their notation can describe autosomal genes in randomly mating diploids and in nonrandomly mating haploids. It cannot, however, accommodate such complications as nonrandom mating in diploids, polyploidy, sex linkage, genome imprinting, and cytoplasmic inheritance. The main aim of this article is to show how the BT91 approach can be generalized to include all forms of inheritance. We also show how migration and mutation can be described in the same framework.
This article begins by presenting a notation that is sufficiently flexible to accommodate a variety of evolutionary forces and modes of inheritance. Next, we derive general recursion equations that describe how the genetic state of a population changes over the course of a generation. The following section shows how the selection and transmission coefficients that appear in the recursions are calculated for any particular situation. We then present the QLE approximation for the recursions, which greatly simplifies the equations. The QLE approach is illustrated with examples in the following section.
The notation and recursions set out in this article have been implemented in a set of Mathematica packages (Wolfram 1999) that are available on the Web at http://helios.bto.ed.ac.uk/evolgen. These packages use the general notation, in essentially the same form as in this article, and apply this notation to define functions appropriate for selection and recombination in diploids. Appendix D gives examples that show how the recursions can be computed automatically to give algebraic expressions for genetic changes in an arbitrary set of loci.
A GENERAL NOTATION FOR MULTILOCUS EVOLUTION
Here we lay out a notation that is sufficiently flexible to account for the different modes of inheritance and evolutionary forces that motivate the model. The section starts by introducing the concepts on which the notation relies, shows how the notation can describe genotypes and populations, and then describes the relation between phenotypes and genotypes.
Contexts and Positions, Selection, and Transmission: It is useful to begin by defining the word “gene” to mean a particular copy of a nonrecombining sequence at some locus in some individual. Thus two different genes may or may not reside at the same locus, and if they do they may or may not be in the same allelic state. A gene at a given locus can be found in any of a number of situations: It might be carried by a male or a female, it might have been inherited from a mother or a father, or it might reside in one deme or another. We refer to this collection of qualities as the gene's context. Context is a key concept in our notation and it is important in two ways. First, it determines how evolutionary forces act on the gene. A selection coefficient, for example, may depend on whether the gene is carried by a male or a female. If there is genomic imprinting, then the selection coefficient will also depend on the sex of the individual from which the gene was inherited. Second, a gene's context affects how it is transmitted. Consider two autosomal loci in a diploid individual. If there is no recombination between the loci during meiosis, the resulting gamete will carry copies of the genes that both descended from the individual's mother or father; with recombination, the gamete will carry one gene from the individual's mother and one from the father.
The information we need to specify a gene's context varies between models. In a model of a spatially structured population, the context will include geographical information. Likewise, in a life history model the context specifies the life stage of the individual carrying the gene. The context will include the sex of the individual carrying a gene in a model with two sexes, but not in a model of a hermaphroditic population.
Loci are referred to by lowercase italic letters. The context is written in a series of subscripts whose elements carry the relevant information. In this article we use the convention that for diploid populations with two sexes, the first subscript of the context gives the sex of the individual carrying the gene, which we call its “sex of carrier.” The second subscript gives the sex of the parent from which it was inherited, its “sex of origin.” Sexes are denoted by “m” for male and “f” for female. For example, genes at a diploid locus i that are carried by females and that descended from a male (the female's father) are referred to as $i_{\mathrm{fm}}$. There are then four possible contexts for genes at this locus. Contexts in hermaphrodites would not include the sex of carrier (since all individuals are the same sex), but would include the sex of origin to denote whether the gene was transmitted through an egg or a sperm. One would account for more than two sexes (as when modeling a plant population with tristyly) by simply allowing the subscripts to take more than two possible values. Subscripts can be added to denote other information, such as the deme or the family in which a gene resides.
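The subscript convention can be sketched in a few lines. The representation below, a context as a two-letter string and a locus-context pair as a tuple, is a hypothetical encoding chosen for illustration and is not the one used in the Mathematica packages.

```python
from itertools import product

SEXES = ("m", "f")  # "m" for male, "f" for female


def contexts():
    """All contexts for a diploid, two-sex model: sex of carrier, then sex of origin."""
    return [carrier + origin for carrier, origin in product(SEXES, SEXES)]


def locus_context_pairs(loci):
    """Every (locus, context) pair for the given loci."""
    return [(locus, ctx) for locus in loci for ctx in contexts()]


if __name__ == "__main__":
    print(contexts())                       # the four contexts at one locus
    print(len(locus_context_pairs("ijk")))  # four contexts for each of three loci
```

Allowing more than two sexes, or adding a deme subscript, would only change the tuples passed to `product` in this sketch.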
We use the term position to refer to a particular locus in a particular context. An example of a position is the place in the genomes of females where genes inherited from males (their fathers) at locus i reside. Positions, like loci, are defined independently of the allelic states of the genes that reside there. With n diploid loci in a dioecious population, there are 4n positions that genes occupy (n loci × two sexes of carrier × two sexes of origin). Open-faced lowercase letters refer to single positions, for example
Two fundamental kinds of events that occur during the course of a generation are selection and transmission. “Selection” accounts for variation in the contribution of different genotypes to the next stage in the life cycle. Fitnesses are assigned to either individuals or groups, depending on the form of selection. The simplest case is viability selection, in which case fitnesses are assigned to individuals. For sexual selection and assortative mating, we account for the relative contributions of mated pairs of male and female genotypes; here fitnesses are assigned to all possible kinds of pairs (BT91). Thus according to our use of the term, with nonrandom mating there can be selection even when all individual genotypes have equal survival, mating success, and fecundity. Group selection is described by assigning fitnesses to all possible combinations of genotypes that could comprise a group.
By “transmission” we mean an event that changes the context of a gene. A simple example is meiosis followed by syngamy: A gene that was carried by a female becomes a gene that was inherited from a female. The rules of transmission depend both on the gene's context and on the mode of inheritance obeyed by its locus (autosomal, Y-linked, cytoplasmic, etc.).
Describing genotypes and populations: The genotype of an individual at position
These conventions can be generalized if the situation demands it. When a position has pleiotropic effects on a set of k traits, the variable
In general, an individual is represented by a vector X containing the values of his/her indicator variables for every position in the genome. It is useful below to extend this vector to include positions from more than one individual.
The genetic state of a population can be completely described by a set of statistical moments that we call associations. These include associations among genes within a haploid genome, which are conventionally referred to as linkage disequilibria. We use the more general term, however, since we are concerned with associations among arbitrary sets of positions, which may or may not be linked and may or may not be in a population that is at equilibrium. Indeed, we need to consider associations between positions that are in different individuals. The relation between different measures of linkage disequilibrium is summarized in the discussion.
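As a minimal sketch of what an association measures, the example below assumes the BT91-style definition: the association among a set of positions is the expected product, over those positions, of the deviation of each indicator variable from its reference value. The genotype frequencies are invented for illustration and the function names are hypothetical. With the reference values set to the allele frequencies, the two-position association within a haploid genome coincides with the conventional linkage disequilibrium.

```python
from fractions import Fraction as F
from math import prod


def expectation(freqs, func):
    """Expected value of func(genotype) over the genotype frequencies."""
    return sum(f * func(x) for x, f in freqs.items())


def association(freqs, positions, ref):
    """Expected product of (X_i - reference_i) over the chosen positions."""
    return expectation(freqs, lambda x: prod(x[i] - ref[i] for i in positions))


# Haploid two-locus genotype frequencies, keyed by indicator values (X_0, X_1).
freqs = {
    (1, 1): F(4, 10),
    (1, 0): F(1, 10),
    (0, 1): F(2, 10),
    (0, 0): F(3, 10),
}

# Reference values chosen as the current allele frequencies.
p = [expectation(freqs, lambda x, i=i: x[i]) for i in range(2)]

D_01 = association(freqs, (0, 1), p)
classic_D = freqs[(1, 1)] * freqs[(0, 0)] - freqs[(1, 0)] * freqs[(0, 1)]
```

The same `association` function applies unchanged to larger sets of positions, including positions drawn from different individuals, provided the genotype keys are extended accordingly.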
Our notation allows multilocus moments to be defined in a variety of ways. The key quantities that determine the moments are a set of reference values. There is one reference value for each position, and the reference value for position
Summary of notation
Next we define the product of all the ζ's in the set of positions
Figure. A model of a dioecious population with four autosomal loci. Open circles are genes inherited from a female; solid circles are genes from a male. Three of the associations between the 16 positions are shown.
The associations have particularly simple interpretations when the reference points are chosen to be the current allele frequencies (
Equations 1, 2, 3 provide a recipe for translating genotypic frequencies into a set of reference values
Summations of the kind seen in the second term of Equation 4 make frequent appearances in this article. The sum includes one term for each possible subset
The notation allows for more than two alleles per locus. It does become more complicated in that event, however, because the extra degrees of freedom require us to account for associations with repeated positions. With three alleles, for example, the allele frequencies at a position are described by the two variables
Two basic equations for simplifying associations for biallelic loci are useful later. From Equations 1, 2, 3
The relation between phenotypes and genotypes: This notation is sufficiently flexible to allow for any relation between phenotypes and genotypes. Let Z be the value of a phenotypic character in an individual. This value can be written in general as a function of the individual's genotype,
Equation 6 can describe any kind of genetic dominance, epistasis, sex differences in expression, genomic imprinting, etc. The relation between genotype and phenotype is determined by the choice of the coefficients
The b coefficients can become quite numerous. For example, with just two biallelic loci each with four possible contexts (say, two sexes of carrier and two sexes of origin), there are $2^8 - 1 = 255$ possible b coefficients: 8 corresponding to single positions, 28 corresponding to pairs of positions, etc. The number of coefficients drops dramatically, however, in many cases. With no epistasis, genotype-phenotype relations can be fully described with only 8 distinct coefficients, while with a completely additive model (no dominance and no epistasis), only 4 distinct coefficients are needed.
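The count of b coefficients can be reproduced by enumerating the nonempty subsets of positions. The sketch below assumes one coefficient per nonempty subset of the eight positions (two loci, each in four contexts); the tuple encoding of a position is hypothetical.

```python
from itertools import combinations

# Two loci, each in four contexts (sex of carrier, then sex of origin).
positions = [
    (locus, carrier + origin)
    for locus in ("i", "j")
    for carrier in "mf"
    for origin in "mf"
]


def subsets_by_size(items):
    """Map each subset size k >= 1 to the list of subsets of that size."""
    return {k: list(combinations(items, k)) for k in range(1, len(items) + 1)}


counts = {k: len(v) for k, v in subsets_by_size(positions).items()}
total = sum(counts.values())

if __name__ == "__main__":
    print(counts)
    print(total)
```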
EVOLUTION BY SELECTION AND TRANSMISSION
Here we use the notation proposed above and results from BT91 to find how the genetic composition of a population changes over the course of a generation. We first show how selection and transmission change a population and then end with some statistical bookkeeping.
Selection: Fitness is just another phenotypic character. Consequently, Equation 6 applies when the trait under consideration is fitness. BT91 showed that this insight is useful because the b coefficients then take on special significance: They can be used to calculate how the genetic state of the population changes.
We noted earlier that fitness (that is, relative reproductive output) can depend on just the genotype of the individual (as with viability selection), on the genotype of a mated pair (as with any form of nonrandom mating), or on the genotypes of a larger group of individuals (as with kin or group selection). To describe the effects of selection, we need to consider together the genomes of the selection group, by which we mean the set of individuals that interact to determine their mutual fitness. With simple viability selection and random mating, the selection group is a single individual. Often it is useful to define the selection group to be a male and female mated pair, which allows for nonrandom mating as well as viability selection. The selection group can be expanded to more than two individuals to accommodate group selection.
We denote the set of all positions in a selection group as
The absolute fitness of a selection group is defined to be the ratio of its frequency after selection to its frequency before. When a selection group consists of more than one individual, its “frequency” before selection is equal to the product of the frequencies of the genotypes of those individuals. Take, for example, a selection group consisting of a mated pair of male and female genotypes. The frequency before selection can often be taken as the product of the frequencies of the respective male and female genotypes, since the premating “groups” are equivalent to randomly chosen pairs of males and females. The frequency of the selection group after selection is the frequency with which those genotypes are found together among all mated pairs (weighting the pairs by their relative fecundities, if they differ). This representation of selection can account for viability selection within each sex as well as nonrandom mating and fecundity selection.
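The mated-pair case can be sketched as follows. The example assumes a single locus with two hypothetical genotypes, "A" and "a", in each sex, and an invented table of mated-pair frequencies showing positive assortment. Fitness is computed as the ratio of the pair's frequency among mated pairs to the product of the male and female genotype frequencies, following the definition in the text.

```python
from fractions import Fraction as F

# Genotype frequencies among males and females before mating (illustrative).
male = {"A": F(1, 2), "a": F(1, 2)}
female = {"A": F(1, 2), "a": F(1, 2)}

# Frequencies of (male, female) genotype pairs among mated pairs (illustrative).
mated = {
    ("A", "A"): F(3, 8),
    ("A", "a"): F(1, 8),
    ("a", "A"): F(1, 8),
    ("a", "a"): F(3, 8),
}


def pair_fitness(male, female, mated):
    """Fitness of each pair: frequency after selection over frequency before."""
    return {(m, f): mated[(m, f)] / (male[m] * female[f]) for (m, f) in mated}


def marginal(mated, index):
    """Genotype frequencies of one sex (0 = male, 1 = female) among mated pairs."""
    out = {}
    for pair, freq in mated.items():
        out[pair[index]] = out.get(pair[index], 0) + freq
    return out


W = pair_fitness(male, female, mated)
```

In this example the genotype frequencies within each sex are the same among mated pairs as before mating, yet the pair fitnesses differ from 1, so there is selection on pairs without any difference in individual mating success.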
The genotype of the selection group is described by the vector X, which includes the allelic state for every position in every individual in the group. Denoting the frequencies of the group's genotype before and after selection as f(X) and f′(X), the group's absolute fitness as W(X), and the population's mean fitness as W̄, we see from Equation 6 that we can always write the expected relative fitness of the genotype of a selection group in the form
The coefficients
Selection coefficients have simple interpretations. With biallelic loci, the coefficient
Two points about Equation 7 are worth keeping in mind. First, if the phenotype contains an environmental component, the relative fitness w(X) is understood to mean the relative fitness averaged over that environmental variation. Second, the selection coefficients depend on how the selection group is defined. For example, with random mating the selection group can be defined as a single individual, and no selection coefficients that include both sexes of carrier appear. But if the selection group is defined as a mated pair, the fitness function of Equation 7 generates selection coefficients with both sexes of carrier even under random mating. This discrepancy is not a problem, though, since the alternative definitions of the selection group will produce the same results so long as the definition that is chosen is used consistently throughout the calculations.
Given any set of assumptions about how genotypes (or phenotypes) affect lifetime fitness, Equation 7 can be used to calculate the corresponding selection coefficients. Appendix A presents a simple example of two loci under epistatic viability selection. When several selection events occur over the course of a generation, the job is made easier by calculating coefficients for each event in isolation and then combining them. For example, suppose that fitness is the product of viability through two stages of the life cycle, each represented by Equation 7 but with coefficients
Given the selection coefficients, we can determine the state of a population following the selection event. BT91 showed how the new allele frequencies and associations are given by
Equation 9, which gives the new moments in terms of the old reference values, can be used to calculate changes in allele frequencies caused by selection. If we choose the reference values to be the allele frequencies before selection (
Equation 10 gives an exact expression for the change in allele frequency at position
Transmission: “Transmission” refers to an event that changes the contexts of genes. Obvious examples are meiosis, where a gene carried by a diploid individual becomes a gene in a haploid individual (the gamete), and fertilization, where the reverse transition happens. Migration can also be considered as a form of transmission, since genes change their context as they move from one location to another. The effects of transmission on the state of the population are determined by the transmission coefficients. The transmission coefficient
To clarify the meaning of these coefficients, consider autosomal loci in a haploid population with two sexes. The transmission coefficient $t_{i_m \leftarrow i_f}$ is the probability that a gene at locus i in a male was inherited from a gene at locus i in a female and is therefore ½. The transmission coefficient $t_{\{i_m, j_m\} \leftarrow \{i_f, j_f\}}$ is the probability that the genes at loci i and j in a male were both inherited from a female (the mother), which is $(1 - r_{ij})/2$, where $r_{ij}$ is the recombination rate between loci i and j.
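The haploid example can be tabulated for every combination of parental origins. The sketch below assumes ordinary Mendelian transmission with recombination rate r between loci i and j, so that both genes come from the same parent with probability (1 - r)/2 each and from different parents with probability r/2 each. The function names and the dictionary layout are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def two_locus_transmission(r):
    """Probability of each (origin of gene at i, origin of gene at j) in an offspring.

    Origins are "f" (mother) or "m" (father); r is the recombination rate.
    """
    coeffs = {}
    for origin_i, origin_j in product("fm", repeat=2):
        same_parent = origin_i == origin_j
        coeffs[(origin_i, origin_j)] = (1 - r) / 2 if same_parent else r / 2
    return coeffs


def single_locus_transmission(r, origin):
    """Marginal probability that the gene at locus i came from the given parent."""
    coeffs = two_locus_transmission(r)
    return sum(v for (origin_i, _), v in coeffs.items() if origin_i == origin)


coeffs = two_locus_transmission(F(1, 10))
```

Summing the two-locus coefficients over the origin of the gene at locus j recovers the single-locus value of ½, independent of r.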
There are three constraints on transmission coefficients. First, transmission coefficients are zero unless each position in set
Transmission coefficients often involve recombination between groups of more than two loci. We use $r_A$ to denote the probability that recombination occurs somewhere in the set of loci A; that is, the alleles at those loci passed to a gamete are a mixture of those inherited from the individual's mother and father. Table 2 gives the transmission coefficients for several cases of interest, including autosomes, X-linked loci, and cytoplasmic factors.
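One way to obtain a value for the probability of recombination within a set of loci is sketched below. It rests on two assumptions that the text does not make: the loci in the set lie in a known linear order, and recombination events in adjacent intervals are independent (no interference). Under those assumptions a gamete is unmixed only if no interval is recombinant. The function name is hypothetical.

```python
from fractions import Fraction as F
from math import prod


def recombination_in_set(adjacent_rates):
    """Probability that a gamete carries a mixture of maternal and paternal alleles.

    adjacent_rates: recombination rates between neighboring loci in the set,
    listed in map order. Assumes no interference between intervals.
    """
    return 1 - prod(1 - r for r in adjacent_rates)


if __name__ == "__main__":
    # Three unlinked loci: two intervals, each with rate 1/2.
    print(recombination_in_set([F(1, 2), F(1, 2)]))
    # Three linked loci with illustrative rates.
    print(recombination_in_set([F(1, 10), F(1, 20)]))
```

For a pair of loci the function returns the pairwise recombination rate itself, and for a single locus (no intervals) it returns zero.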
Many models assume that there is no genetic variation for the rules of transmission. In that case, the effect of transmission on the moments (the allele frequencies and associations) that describe a population is simple. Equation 3 implies that the moments after transmission,
Table 2. Examples of transmission coefficients under meiosis and syngamy
Equation 12 can be easily generalized to allow different genotypes to follow different transmission rules. Examples include cases where there is meiotic drive or genetic variation in recombination rates. As in Equation 6, we write the transmission coefficients as a polynomial function of genotype,
Changing reference values: As described earlier, our system of describing a population is defined relative to a set of reference values. The investigator is free to leave these fixed or to change them as often as desired. It is often convenient, however, to change the reference values once per generation. By updating the reference values to the current allele frequencies, the associations have simple interpretations, and we can calculate the per-generation changes in allele frequencies. Moreover, updating only once per generation avoids a proliferation of algebra, involving reference values at intermediate stages that eventually cancel. Sometimes it is convenient to update the reference values at the zygote stage. Alternatively, it may be easiest to update them before transmission, since under normal meiosis (no meiotic drive, etc.) allele frequencies are unchanged and the change in associations caused by transmission often takes a simple form when the associations have already been centered. If we are interested only in finding the evolutionary equilibrium, changing reference values from one generation to the next is not an issue.
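The bookkeeping involved in changing reference values can be sketched numerically. The example assumes that an association is the expected product of deviations of indicator variables from their reference values; the recentering identity used here is derived from that assumed definition by expanding the product, and it is not a transcription of the article's own equation. The genotype frequencies are invented and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations, product
from math import prod


def association(freqs, positions, ref):
    """Expected product of (X_i - ref_i) over the chosen positions."""
    return sum(f * prod(x[i] - ref[i] for i in positions) for x, f in freqs.items())


def subsets(items):
    """All subsets of items, including the empty set."""
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def recenter(freqs, positions, old_ref, new_ref):
    """Association under new_ref, built only from associations under old_ref.

    Uses X - new = (X - old) + (old - new) and expands the product over positions.
    """
    total = 0
    for kept in subsets(positions):
        dropped = [i for i in positions if i not in kept]
        shift = prod(old_ref[i] - new_ref[i] for i in dropped)
        total += association(freqs, kept, old_ref) * shift
    return total


# Three biallelic positions with illustrative genotype frequencies.
genotypes = list(product((0, 1), repeat=3))
freqs = {g: F(w, 36) for g, w in zip(genotypes, range(1, 9))}

old_ref = [F(1, 2)] * 3
new_ref = [sum(f * x[i] for x, f in freqs.items()) for i in range(3)]
```

The number of terms in the expansion doubles with each added position, which is one reason to recenter only once per generation.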
Changing the reference values changes the associations D, because the latter are defined in terms of the former. Denote the associations before and after the change as
The previous section notes that the transmission equation (12) does not hold when different positions at the same locus have different reference values (that is,
With the exception of random drift, all of genetic evolution can be concisely represented by two equations: Equation 9 for the effects of selection and other deterministic forces and Equation 12 or 16 for the effects of transmission. These can be supplemented by Equation 15, which does the bookkeeping needed to ensure that the associations have a simple interpretation. Other deterministic forces, like mutation and migration, can also be described by these equations. Mutation can be represented as a form of frequency-dependent selection and migration as a form of transmission (since genes change their contexts when they move). It is easier to find the effects of mutation directly, however, which we do below. First, we develop an approximation that greatly simplifies the equations for selection and transmission.
THE QLE APPROXIMATION
The recursions derived above can be used to calculate the exact dynamics for a wide range of multilocus population genetic models. Although this approach may give more insight than directly following genotype frequencies, it will not necessarily be any more tractable. That is because exact results require following the dynamics of the same number of variables, regardless of whether they are genotype frequencies or moments (that is, allele frequencies and associations). One of the great appeals of the moment-based approach introduced by BT91 is that in some situations expressions for the associations can be greatly simplified by approximation. In this section, we derive approximate expressions for the associations and changes in allele frequencies when the population is in a state of QLE. The concept was introduced by Kimura (1965) and greatly generalized by Nagylaki (1993) and Nagylaki et al. (1999); a concise summary of those results is given in Bürger (2000, p. 82).
The first fundamental assumption we must make is that all the associations D are of order a, by which we mean that they are not larger than a constant factor times the largest of the a's. BT91 show that this condition is met when the forces that generate associations within a sex (epistasis, migration, etc.) are weak relative to recombination and when nonrandom mating is not strong. An intuitive justification is that the associations are produced by evolutionary forces that are of order a (see Equation 9) and will not accumulate to values that are much larger than that if the forces breaking them down (recombination, segregation, and mutation) are sufficiently strong. The second assumption needed for the QLE approximation is that all the selection coefficients a are ≪ 1. BT91 show that when these two conditions hold, a population rapidly settles into a state where the allele frequencies are changing slowly, and the associations are close to the equilibrium values they would reach if the allele frequencies were in fact stationary (see also Nagylaki 1993). We can then neglect terms involving higher powers of the a's and also higher powers of the D's (because they are of order a). Furthermore, the effects of a series of events of selection, migration, and mutation can be added together, provided they are each of order a (Kirkpatrick and Servedio 1999).
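The claim that associations stay of the order of the selection coefficients can be illustrated with a small simulation. The sketch below uses the textbook two-locus, two-allele haploid model (selection on genotype frequencies followed by recombination), not the general recursions of this article. The parameter values are invented for illustration, and the function names are hypothetical.

```python
def linkage_disequilibrium(x):
    """D for frequencies ordered as (AB, Ab, aB, ab)."""
    x_AB, x_Ab, x_aB, x_ab = x
    return x_AB * x_ab - x_Ab * x_aB


def next_generation(x, w, r):
    """One generation: selection with fitnesses w, then recombination at rate r."""
    w_bar = sum(xi * wi for xi, wi in zip(x, w))
    x_sel = [xi * wi / w_bar for xi, wi in zip(x, w)]
    d = linkage_disequilibrium(x_sel)
    signs = (-1, 1, 1, -1)  # recombination removes AB and ab, adds Ab and aB
    return [xi + s * r * d for xi, s in zip(x_sel, signs)]


def simulate(p, q, s_a, s_b, eps, r, generations):
    """Start at linkage equilibrium and record D at the end of each generation."""
    x = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    w = [1 + s_a + s_b + eps, 1 + s_a, 1 + s_b, 1.0]  # eps is the epistatic term
    history = []
    for _ in range(generations):
        x = next_generation(x, w, r)
        history.append(linkage_disequilibrium(x))
    return history


# Weak selection and epistasis with free recombination (illustrative values).
history = simulate(p=0.3, q=0.3, s_a=0.02, s_b=0.02, eps=0.01, r=0.5, generations=20)
```

With these values the disequilibrium settles within a few generations at a small positive level, well below the epistatic coefficient, and then tracks the slowly changing allele frequencies. Lowering r toward the size of the selection coefficients lets D grow and is the regime where the approximation is expected to fail.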
Approximations for the associations: We assume that there are two alleles at each locus, which simplifies the analysis. The approach can be extended to multiple alleles following the leads of BT91. The main results developed below are illustrated with a simple two-locus example in Appendix A.
Consider a life cycle in which we define the reference values to be the allele frequencies at the zygote stage. A series of selection events occur during the course of the generation. The generation ends with transmission, creating the zygotes for the next generation. We seek to derive an approximation for the dynamics of allele frequencies that is accurate up to (and including) terms of order $a^2$, which we denote $O(a^2)$. From Equation 10, we see that this approximation requires in turn that we find an approximation for the associations D that is accurate to order a. To do that we find the values that give an equilibrium for the recursion equations for the D that are accurate to order a; those solutions are our QLE approximations for the associations. The results apply not just to selection but to other deterministic forces that generate associations (such as migration) so long as they are weak relative to recombination and segregation. To simplify the derivations, we assume that there is no genetic variation in the transmission coefficients, an assumption that could be relaxed (see Barton 1995). However, we must assume that the transmission coefficients are sufficiently large that forces of order a do not eventually generate strong associations. (This requires that the largest of absolute values of the selection coefficients a is much smaller than the smallest of the absolute values of the eigenvalues of the matrix of transmission coefficients
To begin deriving an approximate recursion for the D, Equation 9 gives the cumulative effect of selection and other deterministic forces on the associations between positions in set
The first step of Equation 17 follows from Equation 9 because the D are of order a and therefore the term
The effect of changing reference values can also be simplified. Equation 10 shows that the change in the allele frequencies is of order a. If we define the reference values to be the allele frequencies, then the quantities (
Equation 19 is the main result of this section. It gives the solutions for the associations implicitly: The QLE value
The next two sections illustrate how to do this by carrying out the calculations for autosomal genes in dioecious haploids and for autosomal, sex-linked, and cytoplasmic genes in diploids.
Autosomal genes in haploids: The QLE approximation for autosomal inheritance in a haploid population with two sexes was found by BT91. This section rederives their result to illustrate the new notation and how to use Equation 19 to find a QLE approximation.
The context for each gene now contains only its sex of carrier. That is because an individual carries only one gene at each locus, rather than the two that must be distinguished in the case of diploids. The comments following Equation 19 imply that for this case the first sum on its right side reduces to
Putting those facts together gives the QLE approximation
Autosomal, sex-linked, and cytoplasmic genes in diploids: Now consider autosomal genes in a diploid population with two sexes. The context for a gene now includes both its sex of carrier and sex of origin. We allow for nonrandom mating and sex differences in selection and recombination. To simplify the calculation, however, we assume that there is no genetic variation in recombination rates and no genomic imprinting (that is, an allele's sex of origin does not affect its expression). The approach outlined here can be directly extended to allow for more than two sexes, as might be appropriate to describe a population with partial selfing.
Careful consideration of Equation 19 shows that the associations fall into three cases. Case 1 comprises associations among a set of positions
Case 2 comprises the associations among a set of positions that all have the same sex of carrier, but of which some have a male and others a female sex of origin. This kind of association exists for some sets of positions (for example, autosomal), but not others (for example, sets with only cytoplasmic loci). Where such associations exist, Equation 19 gives the QLE approximation
Case 3, the last category of association, is when all positions in
When all the genes in set A are autosomal, all four of the possible case 3 associations exist. Solving Equation 19 then gives
Transmission coefficients for case 3 associations at QLE
When A is a set of either all X-linked genes or a mixture of X-linked and autosomal genes, D̃Amm does not exist. Solving Equation 19 for the remaining three kinds of associations in case 3 gives
When A is a mixture of cytoplasmic and nuclear genes or only a set of cytoplasmic genes, then
A similar situation occurs with Y-linkage. When A is a set of Y-linked genes or a mixture of Y-linked and autosomal genes, the only kind of case 3 association that exists is
We stop our inventory of the case 3 associations at this point. Some modes of transmission are not discussed above, for example, when male, female, and hermaphroditic individuals occur in the same population. Associations for those cases can nonetheless be calculated from Equation 19 using the same method.
Changes in allele frequencies at QLE: When a population is in quasi-linkage equilibrium, changes in allele frequencies can be approximated by simple expressions. The exact expression for allele frequency change is given by Equation 10. The QLE approximation for
MUTATION AND MIGRATION
Mutation and migration are two other deterministic forces that change the genetic composition of a population. This section shows how they change allele frequencies and the associations between loci.
Mutation: While the effects of mutation on allele frequencies have been understood since Haldane (1927), its effects on associations among loci have not been fully worked out. Indeed, it seems to us that the general case, in which there is an arbitrary matrix of mutation rates between alternative alleles and an arbitrary representation of allelic state, does not lead to simple expressions. Bürger (2000, p. 190) gives expressions for the effects of “random walk” mutation, in which the change in allelic effect
Here, we give the general result for two alleles at each locus, with allelic state taking the values 0 and 1; see Appendix B for details. Denote the mutation rate at position
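The two-allele case can be checked numerically by brute force. The following is a minimal sketch, not the derivation of Appendix B: it assumes that mutation acts independently at each locus, and the names `mu` (rate of mutation from allele 0 to allele 1) and `nu` (rate from 1 to 0) are hypothetical labels introduced only for this illustration. Under those assumptions, elementary probability gives an allele frequency after mutation of p(1 - nu) + (1 - p)mu and a pairwise association that shrinks by the factor (1 - mu_i - nu_i)(1 - mu_j - nu_j); the code compares both predictions with an explicit enumeration of haplotypes.

```python
from itertools import product

def association(freqs):
    """Return (p_i, p_j, D_ij) for a dict mapping (x_i, x_j) -> frequency."""
    p_i = sum(f for (xi, _), f in freqs.items() if xi == 1)
    p_j = sum(f for (_, xj), f in freqs.items() if xj == 1)
    return p_i, p_j, freqs[(1, 1)] - p_i * p_j

def mutate(freqs, mu, nu):
    """Apply independent mutation at two loci.

    mu[k] is the 0 -> 1 rate and nu[k] the 1 -> 0 rate at locus k
    (hypothetical names, assumed for this sketch).
    """
    def transition(old, new, k):
        if old == 0:
            return mu[k] if new == 1 else 1.0 - mu[k]
        return nu[k] if new == 0 else 1.0 - nu[k]

    out = {g: 0.0 for g in product((0, 1), repeat=2)}
    for (xi, xj), f in freqs.items():
        for (yi, yj) in out:
            out[(yi, yj)] += f * transition(xi, yi, 0) * transition(xj, yj, 1)
    return out

# Illustrative haplotype frequencies with a positive association.
freqs = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.40}
mu = (1e-3, 2e-3)
nu = (5e-4, 1e-3)

p_i, p_j, D = association(freqs)
p_i_new, p_j_new, D_new = association(mutate(freqs, mu, nu))

# Predictions under the independence assumption.
predicted_p_i = p_i * (1 - nu[0]) + (1 - p_i) * mu[0]
predicted_D = D * (1 - mu[0] - nu[0]) * (1 - mu[1] - nu[1])

print(p_i_new, predicted_p_i)
print(D_new, predicted_D)
```

The enumeration and the closed-form predictions should agree to rounding error, which illustrates that mutation of this kind erodes existing associations without creating new ones.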
Migration: The effects of migration on single loci were found by Wright (1931), and later workers have understood that migration can generate associations between loci. General results for the effects of migration on associations, however, have apparently not been worked out previously.
The change in the frequency of allele 1 at position
We illustrate the use of this result with two special cases that may be of general interest. The first is the association between pairs of positions generated by migration. The association between positions
A second situation that may also be of general interest is when the associations among the residents and migrants are initially zero. Then Equation 30 gives
The exact results can be used to find simple approximations for the effects of migration on the associations. Equation 30 shows that the change caused by migration is
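The special case in which residents and migrants each start with zero associations can be illustrated with a short numerical check. This is a sketch under stated assumptions rather than an implementation of Equation 30: `m` is taken to be the fraction of the post-migration population made up of migrants, two biallelic loci are followed, and all function names are hypothetical. Mixing two populations that are each at linkage equilibrium then produces a pairwise association equal to m(1 - m) times the product of the allele frequency differences between migrants and residents.

```python
def le_population(p_i, p_j):
    """Haplotype frequencies at linkage equilibrium for allele-1 frequencies p_i, p_j."""
    return {
        (xi, xj): (p_i if xi else 1 - p_i) * (p_j if xj else 1 - p_j)
        for xi in (0, 1)
        for xj in (0, 1)
    }

def mix(resident, migrant, m):
    """Population after migration; m is the migrant fraction (an assumed definition)."""
    return {g: (1 - m) * resident[g] + m * migrant[g] for g in resident}

def pairwise_association(freqs):
    """Covariance of allelic states (0/1) at the two loci."""
    p_i = freqs[(1, 0)] + freqs[(1, 1)]
    p_j = freqs[(0, 1)] + freqs[(1, 1)]
    return freqs[(1, 1)] - p_i * p_j

# Illustrative allele frequencies in residents (R) and migrants (M).
pR = (0.2, 0.3)
pM = (0.7, 0.9)
m = 0.1

mixed = mix(le_population(*pR), le_population(*pM), m)
D_after = pairwise_association(mixed)
D_predicted = m * (1 - m) * (pM[0] - pR[0]) * (pM[1] - pR[1])

print(D_after, D_predicted)
```

The association is generated purely by admixture: neither source population has any association of its own, and the result vanishes if the populations have the same allele frequency at either locus.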
APPLICATIONS
This section uses the machinery described above to develop results for the effects of natural and sexual selection. The aim is both to illustrate how these methods work and to develop some results that are biologically interesting in their own right.
In the first application, we find the selection coefficients generated by natural selection acting on an additive polygenic trait and use those results to study how it evolves under autosomal inheritance. Next, we find approximations for the genetic correlation between a female mating preference and a male display trait produced by sexual selection. Then, we see how the mode of inheritance affects this correlation by deriving results for haploid autosomal, diploid autosomal, and diploid X-linked genes.
Quadratic stabilizing selection on an additive polygenic trait: Many problems in evolutionary biology involve evolution of traits controlled by multiple genes of approximately additive effect. In this section we derive exact expressions for the selection coefficients on single positions and sets of positions that result from quadratic stabilizing selection. These can be used to calculate the evolutionary changes in the mean, variance, and higher moments of the trait. The same methods can be used to calculate selection coefficients more generally, for an arbitrary form of selection acting on any number of additive genes.
Consider a trait controlled by a set of genes with additive effects under stabilizing selection. The calculations illustrate a general strategy for calculating selection coefficients: Write an explicit model for the phenotype, write the fitness function as a polynomial, substitute the expression for the phenotype into that fitness function, equate the result with Equation 7, and finally pick out the coefficients of the fitness function that correspond to the a's. This example is very similar to one in BT91 (p. 244). It introduces readers who are not familiar with that article to the approach and shows those who are how the new notation works.
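The strategy just listed can be sketched numerically. The example below is an illustration under several assumptions that are not taken from the text: the population is haploid with three positions, there is no environmental variation, fitness has the form W = 1 - s(z - theta)^2, and Equation 7 is taken to express relative fitness as 1 plus the sum over sets U of a_U(zeta_U - D_U), where zeta_i is the allelic state minus its frequency. Expanding the quadratic and using zeta_i^2 = p_i q_i + (1 - 2p_i) zeta_i gives candidate coefficients for single positions and pairs; the code checks that they reproduce the relative fitness of every genotype.

```python
from itertools import product
import random

random.seed(1)
n = 3                           # number of positions (haploid, for brevity)
b = [0.5, 1.0, 1.5]             # additive effects (illustrative values)
s, theta, z0 = 0.05, 1.2, 0.0   # selection strength, optimum, baseline phenotype

# Arbitrary genotype frequencies, so associations are generally nonzero.
genotypes = list(product((0, 1), repeat=n))
raw = [random.random() for _ in genotypes]
freq = {g: w / sum(raw) for g, w in zip(genotypes, raw)}

def phenotype(g):
    return z0 + sum(bi * xi for bi, xi in zip(b, g))

def fitness(g):
    # Assumed quadratic fitness function.
    return 1.0 - s * (phenotype(g) - theta) ** 2

p = [sum(f * g[i] for g, f in freq.items()) for i in range(n)]
zbar = sum(f * phenotype(g) for g, f in freq.items())
wbar = sum(f * fitness(g) for g, f in freq.items())

def zeta(g, i):
    return g[i] - p[i]

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
D = {(i, j): sum(f * zeta(g, i) * zeta(g, j) for g, f in freq.items())
     for i, j in pairs}

# Coefficients picked out of the expanded polynomial.
a1 = [-s * b[i] * (2 * (zbar - theta) + b[i] * (1 - 2 * p[i])) / wbar
      for i in range(n)]
a2 = {(i, j): -2 * s * b[i] * b[j] / wbar for i, j in pairs}

def relative_fitness_from_a(g):
    total = 1.0 + sum(a1[i] * zeta(g, i) for i in range(n))
    total += sum(a2[i, j] * (zeta(g, i) * zeta(g, j) - D[i, j]) for i, j in pairs)
    return total

max_error = max(abs(fitness(g) / wbar - relative_fitness_from_a(g))
                for g in genotypes)
print(max_error)
```

Because fitness is quadratic in the phenotype, no coefficients beyond pairs of positions are needed in this sketch; a higher-order polynomial would require coefficients for larger sets.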
The model for the phenotype of an individual comes from Equation 6, which simplifies under our assumption that genes have additive effects,
Our model for fitness is the quadratic function
Substituting Equation 36 into Equation 37 and averaging over the environmental variation gives
The selection coefficient acting jointly on positions
To complete the analysis of this model we find how the population evolves. To determine the state of the population in the following generation, we need to make some assumptions about inheritance, that is, the rules of transmission. To keep things simple, we assume that
The change in allele frequencies caused by selection is found using Equation 10 with some help from Equation 5,
Since transmission does not change allele frequencies, the overall change in the mean of the trait from the start of one generation to the next can be found by summing
Now consider the change in the association
Sexual selection by female choice: We mentioned earlier that the multilocus machinery developed here can be used to study the genetic consequences of nonrandom mating. This section shows how to calculate a QLE approximation for the genetic correlation (or covariance) between a female mating preference and male display trait that is generated by sexual selection. This quantity is important to many theories about sexual selection (Kirkpatrick and Ryan 1991). Previous work has calculated the covariance expected under autosomal inheritance in diploids (Lande 1981; Barton and Turelli 1991) and haploids (Kirkpatrick 1982; Kirkpatrick and Barton 1997). Here we extend the earlier results by finding the covariance when some genes are sex linked. In addition to the biological interest in the result, the derivations illustrate how nonrandom mating and nonautosomal inheritance are modeled in our framework.
Consider a pair of characters, one expressed in females and the other in males, that together affect the probability that a male and a female will mate. We refer to the first as a “preference” and the second as the “male trait.” In fact, the preference need not be a behavioral phenotype: It can be any character that affects mating probabilities. The value of a female's preference phenotype is denoted P and that of the male trait is T. A set of genes
We begin by calculating the selection coefficients, which are independent of the inheritance rules. We then use them to find QLE approximations for the preference-trait covariance under three types of inheritance: haploid autosomal, diploid autosomal, and diploid X-linked.
Selection coefficients: Derivation of the selection coefficients follows Kirkpatrick and Barton (1997). The phenotypic distribution of the preference among females at birth, fP(·), has mean P̄ and variance
We now make two kinds of approximations. Assume that the preference and trait are not evolving rapidly, so that P̄* ≈ P̄, T̄* ≈ T̄,
Now substitute the expressions for P and T written in the form of Equation 36 into Equation 47. The selection coefficient for a set of loci
This result shows that the force of sexual selection that unites a female preference gene with a male trait gene is simply proportional to ρPT, the phenotypic correlation between the preference and trait among mated pairs. It is also proportional to the size of each gene's effect on the phenotype relative to the character's phenotypic standard deviation. Selection coefficients for all other sets of positions are 0.
These selection coefficients are valid regardless of how the genes affecting the preference and the male trait are inherited. In the following three sections we find the genetic correlation generated by this type of selection when the loci are haploid autosomal, diploid autosomal, and diploid X-linked. The examples show how the QLE approximation accommodates different modes of inheritance.
Haploid autosomal inheritance: We begin by calculating the genetic correlation in males between the female preference and male trait. The definition of the additive genetic correlation in zygotes is
Diploid autosomal inheritance: In diploids, there are two positions at the preference locus and two at the trait locus, corresponding to copies of those genes inherited from mothers and fathers. The genetic covariance between preference and trait in males is therefore
To find those associations, we use Equations 21 and 22. The function F(·) that appears there is calculated using (23):
X-linked inheritance: To illustrate how our methods extend to other forms of inheritance, consider next a case in which genetic variation in a female mating preference is X-linked while variation in the male trait is autosomal. We find that the genetic covariance and correlation in males differ from those obtained when both characters are autosomally inherited.
The preference-trait covariance in males is
The associations that appear in Equation 57 are given by Equations 21 and 24. The values for the function F(·) that appears in (24) are now
Putting these facts together shows that in male zygotes the genetic covariance and correlation between the mating preference and male trait are
DISCUSSION
We have set out a general notation that describes arbitrary modes of selection and genetic transmission. The key components are the representation of genotype frequencies in terms of means and higher moments of the distribution of allelic states, of selection as a polynomial function of genotype, and of genetic transmission as the movement of genes between different contexts. The first two components are already well developed, particularly in models of additive quantitative traits. The main contribution of this article is to combine them with a generalized representation of transmission. The calculations can be automated, as described in Appendix D.
How does a general multilocus notation help us to better understand evolution? A set of equations for changes in genotype frequencies can be derived automatically for arbitrary models, but will in all but the simplest cases be impenetrably complicated. The value of an algebraic expression written in terms of multilocus moments or cumulants is that it allows one to identify and interpret the key processes responsible for evolutionary change. For example, Equations 30–35 show that migration builds up associations among loci in proportion to the product of the allele frequency differences. A second example comes from the analysis of sexual selection. The analysis shows that the genetic correlation between a female preference and a male trait is directly proportional to the phenotypic correlation between the preference and trait in mating pairs. Further, the genetic correlation depends on the way in which the preference and trait genes are inherited (Equations 53 and 60).
Defining a model in a standard notation can reveal similarities among apparently different mechanisms. For example, if the indirect selection on a modifier of recombination is expressed in terms of selection coefficients (aU), it can be shown to depend on the effects of recombination on the mean and variance of log(fitness), regardless of the causes of fitness variation (Barton 1995). An unambiguous notation may also clarify conceptual issues. We believe that models of group selection using our notation may clarify definitions of fitness and of “levels of selection.”
Models of multiple loci are most fruitful when combined with appropriate approximations. The best developed is the QLE approximation, which assumes that processes such as epistasis and migration that generate associations among loci are weak, relative to those that break them down, such as recombination, segregation, and mutation. This leads to simple expressions for associations of all orders and is likely to be accurate for most sexually reproducing populations. Several workers have explored this approach, using different measures of association (linkage disequilibrium). Table 4 summarizes these measures and their corresponding versions of the QLE approximation. (For a more detailed treatment, see Bürger 2000, pp. 82 and 183–190.)
The different measures can be divided into two classes. Most measures are defined for each genotype, usually as a difference between its actual frequency and the frequency expected at linkage equilibrium. In contrast, we define associations as moments of the distribution of allelic states; these moments are defined for each set of positions (
Table 4: Alternative measures of association (linkage disequilibrium)
The multilocus notations used by Christiansen (1999) and Bürger (2000) are closest to that used here. The main difference is that we deal with sets of genes in context, or positions, which allows us to avoid restrictive assumptions such as autosomal diploid inheritance, random mating, and equal transmission rates in males and females. The relation between the notations can be illustrated by comparing expressions for associations among loci at QLE. Christiansen's (1999) Equation 7.19 for the associations among a set of loci M in a gamete is
Bürger's (2000, p. 188) expression for cumulant associations at QLE in generation t is
Although the QLE results developed in this article are consistent internally and with independent derivations, we have not rigorously shown that this quasi-equilibrium is unique or that the population will always converge to it when the assumptions are met. Nagylaki (1993; see also Nagylaki et al. 1999) showed that autosomal loci in a random mating diploid population under weak selection converge to a QLE. In this article we have relaxed his assumptions to allow for nonrandom mating, other forms of inheritance, and other evolutionary forces (migration and mutation). Since the effects of migration and mutation are equivalent to forms of frequency-dependent selection, Nagylaki's results should also apply when they act so long as the effective selection coefficients that they generate are small. The consequences of nonrandom mating and variations in inheritance are more difficult to account for. We expect convergence to our QLE values whenever all the a's are sufficiently small and all eigenvalues of the matrix of transmission coefficients
The QLE approximation developed here can be used to study a variety of interesting models. Selection on the genetic system (recombination, selfing, and mutation rate, for example) can be studied by assuming a modifier allele of small effect. Even if the system as a whole is under strong selection, associations involving the modifier will be weak and can therefore be modeled by a set of linear equations (Barton 1995). The infinitesimal model is based on a different kind of approximation, in which the limit of a large number of loci, each with infinitesimal effect, is taken. Turelli and Barton (1994) give a heuristic argument that extends this model to allow for linkage and epistasis. However, as Bürger (2000, p. 189) points out, this extension remains to be proven rigorously.
The biggest lacuna in the framework presented here is the lack of a model for random genetic drift. Its effects can be included by accounting for the variation in allele frequencies and associations caused by random sampling. The effects of drift on allele frequencies and pairwise associations have been well studied (Ewens 1979), but exact results and approximations for the joint probability distribution of the higher-order associations remain to be worked out. A basic implementation of random drift is included in the Mathematica packages.
For the future, there is considerable scope for applying the methods set out in this article to bring together analyses of particular models and to explore better ways to approximate general multilocus systems.
Acknowledgments
We thank Ophélie Ronce, Maria Servedio, Stuart Thomas, and two careful reviewers for their comments on the manuscript. We are grateful for support from the National Science Foundation (grant DEB-9973221), the Biotechnology and Biological Sciences Research Council (postgraduate studentship no. 97/B1/G/03163 to NB/TJ), the Wellcome Trust (International Prize Travelling Research Fellowship no. 061530 to T.J.), the Scottish International Education Trust (travel grant to T.J.), and the Darwin Trust for funding.
APPENDIX A: A TWO-LOCUS EXAMPLE
The goal is to develop a simple example that illustrates the notation and calculations involved in selection, transmission, and the QLE approximation. We look at the effects of viability selection acting on two diploid autosomal loci in a random mating hermaphroditic population.
Consider a simple selection scheme in which allele 1 at locus i changes relative fitness by si and allele 1 at locus j by sj. Interactions between the loci also affect fitness. More specifically, assume that each interaction between an allele 1 at locus i and an allele 1 at locus j changes relative fitness by eij. These assumptions lead to the fitnesses for diploid genotypes shown in Table A1.
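The fitness scheme just described can be written out as a short sketch. Two points are assumptions made for illustration, since Table A1 itself is not reproduced here: the effects are taken to combine additively on a baseline fitness of 1, and every pairing of an allele 1 at locus i with an allele 1 at locus j within an individual (whether the two genes came from the same parent or from different parents) is counted as one interaction. The argument names follow the contexts defined in this appendix (for example, `x_im` is the allelic state of the gene at locus i inherited via sperm); the function name is hypothetical.

```python
from itertools import product

def fitness(x_if, x_im, x_jf, x_jm, s_i, s_j, e_ij):
    """Relative fitness of a diploid genotype (assumed additive scheme).

    Each x is 0 or 1: the allelic state at locus i or j of the gene
    inherited via an egg (f) or a sperm (m).
    """
    n_i = x_if + x_im          # copies of allele 1 at locus i
    n_j = x_jf + x_jm          # copies of allele 1 at locus j
    # n_i * n_j counts every i-j pairing of 1 alleles, in cis or in trans.
    return 1.0 + s_i * n_i + s_j * n_j + e_ij * n_i * n_j

s_i, s_j, e_ij = 0.02, 0.01, 0.005   # illustrative values

# Collapse the 16 ordered genotypes into a 3 x 3 table keyed by allele counts.
table = {}
for x_if, x_im, x_jf, x_jm in product((0, 1), repeat=4):
    key = (x_if + x_im, x_jf + x_jm)
    table[key] = fitness(x_if, x_im, x_jf, x_jm, s_i, s_j, e_ij)

for n_i in range(3):
    print([round(table[(n_i, n_j)], 4) for n_j in range(3)])
```

Written this way, fitness is already a polynomial in the four allelic states, which is the form needed before the selection coefficients are extracted.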
Because mating is random, we can define the selection group as a single individual. Since the population is hermaphroditic, the contexts for the two loci need only give the sex of origin for each gene. For example, im stands for a gene that was inherited via a sperm.
Table A1: Fitnesses for the two-locus example
To calculate the selection coefficients (the a's), first write the fitnesses that appear in Table A1 in the form of a polynomial in the X's:
The change in the frequency of allele 1 at locus i is calculated using Equation 10. Because the population is hermaphroditic, allele frequencies and associations in genes inherited via sperm will be equal to those inherited via eggs: pif = pim ≡ pi and Difjf = Dimjm ≡ Dij. After one generation of random mating, associations involving more than one sex of origin (e.g., Difjm) are zero. The exact change in allele frequency will then be
That result can be written entirely in terms of the selection coefficients and allele frequencies once the population reaches QLE. At that point, Equation 22 shows that the association between the loci is
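The idea behind the QLE approximation in this example can be illustrated by iterating the exact two-locus recursion and watching the association settle. The sketch below does not reproduce Equation 22. It assumes the additive fitness scheme 1 + s_i n_i + s_j n_j + e_ij n_i n_j (with n the number of copies of allele 1), random union of gametes, viability selection on zygotes, and recombination at rate `r`; all function names and parameter values are hypothetical choices made for the illustration.

```python
from itertools import product

HAPLOTYPES = list(product((0, 1), repeat=2))   # (state at locus i, state at locus j)

def fitness(g1, g2, s_i, s_j, e_ij):
    """Assumed additive fitness scheme for the diploid formed by gametes g1 and g2."""
    n_i = g1[0] + g2[0]
    n_j = g1[1] + g2[1]
    return 1.0 + s_i * n_i + s_j * n_j + e_ij * n_i * n_j

def next_generation(x, r, s_i, s_j, e_ij):
    """Random mating, viability selection, then recombination at rate r."""
    new = {h: 0.0 for h in HAPLOTYPES}
    wbar = 0.0
    for g1, g2 in product(HAPLOTYPES, repeat=2):
        weight = x[g1] * x[g2] * fitness(g1, g2, s_i, s_j, e_ij)
        wbar += weight
        new[g1] += weight * (1 - r) / 2           # nonrecombinant gametes
        new[g2] += weight * (1 - r) / 2
        new[(g1[0], g2[1])] += weight * r / 2     # recombinant gametes
        new[(g2[0], g1[1])] += weight * r / 2
    return {h: f / wbar for h, f in new.items()}

def association(x):
    """Association between the loci among gametes."""
    p_i = x[(1, 0)] + x[(1, 1)]
    p_j = x[(0, 1)] + x[(1, 1)]
    return x[(1, 1)] - p_i * p_j

# Start at linkage equilibrium with both allele frequencies at 0.5.
x = {h: 0.25 for h in HAPLOTYPES}
r, s_i, s_j, e_ij = 0.5, 0.0, 0.0, 0.01

history = []
for _ in range(20):
    x = next_generation(x, r, s_i, s_j, e_ij)
    history.append(association(x))

print(history[:3], history[-1])
```

With weak epistasis and free recombination, the association should move within a few generations to a small value of the same sign as the epistasis and thereafter change only slowly as allele frequencies change, which is the behavior the QLE approximation exploits.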
APPENDIX B: MUTATION
The centered associations following mutation can be written as
APPENDIX C: MIGRATION
We calculate the centered associations following a migration event. The result is easiest to interpret when it is expressed in terms of the centered association among the residents and migrants before the event. We therefore define the reference values in the resident population,
Just after migration, the (uncentered) associations are given by Equation 3,
The next step is to express the
To find the central moments after migration, we set the new reference points to the allele frequencies after migration:
APPENDIX D: MATHEMATICA EXAMPLES
Here, we give some examples that outline how Mathematica (Wolfram 1999) can be used to find algebraic expressions for the changes in associations due to various evolutionary processes. Software that extends Mathematica to do the calculations described in this article is available on the Web at http://helios.bto.ed.ac.uk/evolgen. The notation used by that software is essentially the same as that in the text, but there are a few differences that need to be explained. The examples below show the user's input to Mathematica in boldface type and the program's output in regular type.
Stages in the life cycle: Different stages in the life cycle are denoted by the first element in the context, rather than by primes. Thus, the association between two positions in male gametes would be written
The first step is to define the contexts for each stage. For example, this defines gamete (
Recombination: Expressions for the mean and for the associations at one stage are expressed in terms of variables at the previous stage by applying various rules. For example, the mean contribution of a gene in the new gamete pool is
Stabilizing selection: Stabilizing selection is represented by defining fitness as a function of genotype. It is convenient to define the trait and fitness separately for each sex, even though they are in fact the same. The expressions can be simplified later:
The effect of selection on the mean contribution of a position is
The complete life cycle: This set of rules defines the change over the whole life cycle:
The new mean depends on selection coefficients up to fourth order:
Other processes: Non-Mendelian inheritance is handled by interpreting recombination rates appropriately. For example, a cytoplasmically inherited gene is certain to be inherited from the mother:
Migration is implemented in a similar way to recombination, as a set of transmission rules. For example, this is the cross-genome association between loci j and k among juveniles,
Footnotes
- Communicating editor: J. B. Walsh
- Received October 23, 2001.
- Accepted May 1, 2002.
- Copyright © 2002 by the Genetics Society of America