General Models of Multilocus Evolution
Mark Kirkpatrick^a, Toby Johnson^b,c, and Nick Barton^b
a Section of Integrative Biology, University of Texas, Austin, Texas 78712,
b Institute of Cell, Animal and Population Biology, University of Edinburgh, Scotland EH9 3JT, United Kingdom
c Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
Corresponding author: Mark Kirkpatrick, University of Texas, Austin, Texas 78712. E-mail: kirkp{at}mail.utexas.edu
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
In 1991, Barton and Turelli developed recursions to describe the evolution of multilocus systems under arbitrary forms of selection. This article generalizes their approach to allow for arbitrary modes of inheritance, including diploidy, polyploidy, sex linkage, cytoplasmic inheritance, and genomic imprinting. The framework is also extended to allow for other deterministic evolutionary forces, including migration and mutation. Exact recursions that fully describe the state of the population are presented; these are implemented in a computer algebra package (available on the Web at http://helios.bto.ed.ac.uk/evolgen). Despite the generality of our framework, it can describe evolutionary dynamics exactly by just two equations. These recursions can be further simplified using a "quasi-linkage equilibrium" (QLE) approximation. We illustrate the methods by finding the effect of natural selection, sexual selection, mutation, and migration on the genetic composition of a population.
EVOLUTION involves simultaneous changes at many genetic loci. Modeling these changes is difficult because associations between alleles at different loci ("linkage disequilibria") cause the effects of selection acting on one locus to spill over onto other loci, generating indirect selection.
The most obvious approach to modeling multilocus systems is simply to follow the frequencies of all possible genotypes. There are three basic drawbacks here. First, the number of genotypes grows exponentially with the number of loci, rapidly overwhelming both analytical and simulation approaches when there are even a modest number of genes. Second, the quantities that are often of most interest, such as allele frequencies and mean phenotypes, are obscured by working with genotypes. Third, approximations for the dynamic equations appear more naturally when we work with quantities other than genotype frequencies.
Several approximate approaches have been developed to deal with these problems.
- The infinitesimal model assumes that very many genes influence the phenotype, such that each allele has an infinitesimal effect (FISHER 1918; BULMER 1980; TURELLI and BARTON 1994). In this limit, the genetic variance contributed by allelic variation at each locus is constant, and evolutionary change is due solely to changes in associations among loci. This is an accurate and general approximation for short-term change under strong selection, but cannot describe changes in allele frequencies over the longer term.
- The hypergeometric model (KONDRASHOV 1984; BARTON 1992; DOEBELI 1996) also assumes that loci are unlinked and have equal effects, but allows the number of loci to be finite. However, the stability of solutions to this model is limited to certain selection regimes (SHPAK and KONDRASHOV 1999; BARTON and SHPAK 2000), which do not include many scenarios of evolutionary interest, such as stabilizing selection.
- Another method, introduced by FISHER (1953), models a population by following the inheritance of "junctions" between chromosome regions with different ancestries. This approach is well suited to describing the ancestry of samples of neutral genomes (HUDSON 1990) and models of hybridization, in which selection can be approximated as acting on the proportions of genetic material derived from different source populations (BAIRD 1995). It is again intractable over long timescales, however, since the number of junctions increases geometrically.
- PRICE (1970) gave an exact and completely general equation, in which the average change in a trait is precisely equal to its covariance with relative fitness plus the change due to transmission. This takes the classical approach of quantitative genetics, by following only the phenotype and disregarding the (usually unknown) genetics that underlies it.
- Several independent developments extend Price's approach by following the mean, variance, and higher moments of the phenotypic distribution (e.g., BARTON and TURELLI 1987; BÜRGER 1991; SHAPIRO et al. 1994). Each moment depends on higher moments, however, and so approximations are required to give a closed set of dynamical equations. Such approximations are accurate under restricted circumstances only. In contrast to thermodynamics, where molecular motions can be averaged out over macroscopic scales, genetic details do influence phenotypic evolution.
Although the notation and framework of the approach of Barton and Turelli (1991; hereafter BT91) are general in many respects, the methods it develops are restricted to certain forms of inheritance. Their notation can describe autosomal genes in randomly mating diploids and in nonrandomly mating haploids. It cannot, however, accommodate such complications as nonrandom mating in diploids, polyploidy, sex linkage, genomic imprinting, and cytoplasmic inheritance. The main aim of this article is to show how the BT91 approach can be generalized to include all forms of inheritance. We also show how migration and mutation can be described in the same framework.
This article begins by presenting a notation that is sufficiently flexible to accommodate a variety of evolutionary forces and modes of inheritance. Next, we derive general recursion equations that describe how the genetic state of a population changes over the course of a generation. The following section shows how the selection and transmission coefficients that appear in the recursions are calculated for any particular situation. We then present the QLE approximation for the recursions, which greatly simplifies the equations. The QLE approach is illustrated with examples in the following section.
The notation and recursions set out in this article have been implemented in a set of Mathematica packages (available on the Web at http://helios.bto.ed.ac.uk/evolgen).
| A GENERAL NOTATION FOR MULTILOCUS EVOLUTION |
|---|
Here we lay out a notation that is sufficiently flexible to account for the different modes of inheritance and evolutionary forces that motivate the model. The section starts by introducing the concepts on which the notation relies, shows how the notation can describe genotypes and populations, and then describes the relation between phenotypes and genotypes.
Contexts and Positions, Selection, and Transmission:
It is useful to begin by defining the word "gene" to mean a particular copy of a nonrecombining sequence at some locus in some individual. Thus two different genes may or may not reside at the same locus, and if they do they may or may not be in the same allelic state. A gene at a given locus can be found in any of a number of situations: It might be carried by a male or a female, it might have been inherited from a mother or a father, or it might reside in one deme or another. We refer to this collection of qualities as the gene's context. Context is a key concept in our notation and it is important in two ways. First, it determines how evolutionary forces act on the gene. A selection coefficient, for example, may depend on whether the gene is carried by a male or a female. If there is genomic imprinting, then the selection coefficient will also depend on the sex of the individual from which the gene was inherited. Second, a gene's context affects how it is transmitted. Consider two autosomal loci in a diploid individual. If there is no recombination between the loci during meiosis, the resulting gamete will carry copies of the genes that both descended from the individual's mother or father; with recombination, the gamete will carry one gene from the individual's mother and one from the father.
The information we need to specify a gene's context varies between models. In a model of a spatially structured population, the context will include geographical information. Likewise, in a life history model the context specifies the life stage of the individual carrying it. The context will include the sex of the individual carrying a gene in a model with two sexes, but not in a model of a hermaphroditic population.
Loci are referred to by lowercase italic letters. The context is written in a series of subscripts whose elements carry the relevant information. In this article we use the convention that for diploid populations with two sexes, the first subscript of the context gives the sex of the individual carrying the gene, which we call its "sex of carrier." The second subscript gives the sex of the parent from which it was inherited, its "sex of origin." Sexes are denoted by "m" for male and "f" for female. For example, genes at a diploid locus i that are carried by females and that descended from a male (the female's father) are referred to as $i_{fm}$. There are then four possible contexts for genes at this locus. Contexts in hermaphrodites would not include the sex of carrier (since all individuals are the same sex), but would include the sex of origin to denote whether the gene was transmitted through an egg or a sperm. One would account for more than two sexes (as when modeling a plant population with tristyly) by simply allowing the subscripts to take more than two possible values. Subscripts can be added to denote other information, such as the deme or the family in which a gene resides.
We use the term position to refer to a particular locus in a particular context. An example of a position is the place in the genomes of females where genes inherited from males (their fathers) at locus i reside. Positions, like loci, are defined independently of the allelic states of the genes that reside there. With n diploid loci in a dioecious population, there are 4n positions that genes occupy (n loci × two sexes of carrier × two sexes of origin). Open-faced lowercase letters refer to single positions, for example $\mathbb{i} = i_{fm}$. Open-faced uppercase letters refer to sets of positions, e.g., $\mathbb{A} = \{\mathbb{i}, \mathbb{j}\}$. A genome, denoted $\mathbb{G}$, is the set of all positions in an individual. With a single diploid locus i, for example, the genome for a male is $\mathbb{G} = \{i_{mm}, i_{mf}\}$, and for a female it is $\mathbb{G} = \{i_{fm}, i_{ff}\}$. The notation is summarized in Table 1.
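The bookkeeping of positions can be made concrete with a few lines of Python. This is only an illustrative sketch: the tuple encoding `(locus, sex_of_carrier, sex_of_origin)` and the function names `positions` and `genome` are assumptions made here, not part of the article's notation or its Mathematica packages.

```python
from itertools import product

SEXES = ("m", "f")  # "m" for male, "f" for female, as in the text

def positions(loci, sexes=SEXES):
    """All positions for diploid loci in a dioecious population.

    Each position is encoded (hypothetically) as a tuple
    (locus, sex_of_carrier, sex_of_origin).
    """
    return list(product(loci, sexes, sexes))

def genome(loci, carrier, sexes=SEXES):
    """The positions found in one individual whose sex is `carrier`."""
    return [(locus, carrier, origin) for locus in loci for origin in sexes]

all_positions = positions(["i", "j"])   # n = 2 loci gives 4n = 8 positions
male_genome = genome(["i"], "m")        # corresponds to {i_mm, i_mf}
female_genome = genome(["i"], "f")      # corresponds to {i_fm, i_ff}
```

Adding further context information (a deme label, say) would amount to adding another element to each tuple.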
Two fundamental kinds of events that occur during the course of a generation are selection and transmission. "Selection" accounts for variation in the contribution of different genotypes to the next stage in the life cycle. Fitnesses are assigned to either individuals or groups, depending on the form of selection. The simplest case is viability selection, in which case fitnesses are assigned to individuals. For sexual selection and assortative mating, we account for the relative contributions of mated pairs of male and female genotypes; here fitnesses are assigned to all possible kinds of pairs (BT91). Thus according to our use of the term, with nonrandom mating there can be selection even when all individual genotypes have equal survival, mating success, and fecundity. Group selection is described by assigning fitnesses to all possible combinations of genotypes that could comprise a group.
By "transmission" we mean an event that changes the context of a gene. A simple example is meiosis followed by syngamy: A gene that was carried by a female becomes a gene that was inherited from a female. The rules of transmission depend both on the gene's context and on the mode of inheritance obeyed by its locus (autosomal, Y-linked, cytoplasmic, etc.).
Describing genotypes and populations:
The genotype of an individual at position $\mathbb{i}$ is represented by the indicator variable $X_{\mathbb{i}}$. With just two alleles per locus, $X_{\mathbb{i}}$ can take two values, which it is convenient to set at 0 or 1; for this special case, the frequency of allele 1 at position $\mathbb{i}$ is written $p_{\mathbb{i}}$ and the frequency of allele 0 as $q_{\mathbb{i}} = 1 - p_{\mathbb{i}}$. A fact that is useful later is that under these conventions, the expected value of $X_{\mathbb{i}}$ (averaging over all individuals in the population) is equal to $p_{\mathbb{i}}$. When there are more than two alleles, we can choose any distinct values to distinguish the alleles. If we are considering alleles that have additive effects on a quantitative trait, it is convenient to set the values equal to their effects so that the expectation of $X_{\mathbb{i}}$ equals the position's contribution to the mean value of the trait.
These conventions can be generalized if the situation demands it. When a position has pleiotropic effects on a set of k traits, the variable $X_{\mathbb{i}}$ becomes a vector of length k. In multiallelic models, vectors can also be used as an alternative to the scalar-value convention described in the last paragraph. We can define the indicator to be a vector of length equal to the number of alleles, all entries of which are zero except for the one corresponding to the allelic state. (This approach might be used in a model of genes with additive effects if, for example, alleles with the same effect mutate to other alleles at different rates.) Other conventions are also possible: For example, each position could be represented by a vector of length two, with the first giving the allelic effect and the second a label for the allele.
In general, an individual is represented by a vector X containing the values of his/her indicator variables for every position in the genome. It is useful below to extend this vector to include positions from more than one individual.
The genetic state of a population can be completely described by a set of statistical moments that we call associations. These include associations among genes within a haploid genome, which are conventionally referred to as linkage disequilibria. We use the more general term, however, since we are concerned with associations among arbitrary sets of positions, which may or may not be linked and may or may not be in a population that is at equilibrium. Indeed, we need to consider associations between positions that are in different individuals. The relation between different measures of linkage disequilibrium is summarized in the DISCUSSION.
Our notation allows multilocus moments to be defined in a variety of ways. The key quantities that determine the moments are a set of reference values. There is one reference value for each position, and the reference value for position $\mathbb{i}$ is denoted $\wp_{\mathbb{i}}$. (Note the distinction between an italic $p_{\mathbb{i}}$, which denotes an allele frequency, and a curly $\wp_{\mathbb{i}}$, which denotes a reference value.) To describe a population, we first make a change of variables, such that the allelic state of an individual gene at position $\mathbb{i}$ is measured relative to the reference value for that position, $\wp_{\mathbb{i}}$:

$$\zeta_{\mathbb{i}} = X_{\mathbb{i}} - \wp_{\mathbb{i}}. \tag{1}$$

Choice of the reference values is up to the investigator. Typically it is useful to define $\wp_{\mathbb{i}}$ as the expected value of $X_{\mathbb{i}}$ among zygotes. The reference value then has a particularly simple meaning for biallelic models. Defined that way, if there are no differences in allele frequencies between positions at a locus (e.g., between males and females), the reference value is equal to the frequency of allele 1 at that locus ($\wp_{\mathbb{i}} = \wp_i = p_i$). The new indicator variable $\zeta_{\mathbb{i}}$ then takes the values $1 - p_i$ and $-p_i$.
Next we define the product of all the $\zeta$'s in the set of positions $\mathbb{A}$:

$$\zeta_{\mathbb{A}} = \prod_{\mathbb{i} \in \mathbb{A}} \zeta_{\mathbb{i}}. \tag{2}$$

The symbol $\prod_{\mathbb{i} \in \mathbb{A}}$ indicates that the product includes one term for each element in the set $\mathbb{A}$. If we choose an individual at random from the population, then $\zeta_{\mathbb{A}}$ is a random variable. The association between the alleles at the positions in set $\mathbb{A}$ is defined as the expectation of $\zeta_{\mathbb{A}}$ taken over the whole population,

$$D_{\mathbb{A}} = E_X[\zeta_{\mathbb{A}}], \tag{3}$$

where $E_X[\cdot]$ denotes an expectation over the distribution of genotype frequencies. The $D_{\mathbb{A}}$ are therefore moments, that is, measures of statistical association. As an example of the notation, the association between alleles at loci i and j in a diploid male, one inherited from his female parent (mother) and the other from his male parent (father), is written $D_{i_{mf} j_{mm}}$. Products over empty sets are defined to be 1, so that $D_{\emptyset} = 1$. The D's are the same as BT91's C's. Figure 1 illustrates the notation. [We assume here and below that the indicators $X_{\mathbb{i}}$ have scalar values (0 or 1, say). If they are vectors, then the pairwise $D_{\{\mathbb{i}, \mathbb{j}\}}$ are matrices, and nth-order associations are tensors of rank n.]
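The definitions in Equations 1-3 can be sketched numerically. The example below is a minimal illustration that assumes a haploid population with two biallelic loci; the genotype frequencies are made up, positions are indexed 0 (locus i) and 1 (locus j), and the function names are hypothetical.

```python
from math import prod

def allele_frequencies(freqs, n_positions):
    """Expected value of each indicator X over the genotype distribution."""
    return [sum(f * x[k] for x, f in freqs.items()) for k in range(n_positions)]

def association(freqs, ref, positions):
    """D_A = E[ prod_{i in A} (X_i - ref_i) ]; an empty product equals 1."""
    return sum(f * prod(x[k] - ref[k] for k in positions)
               for x, f in freqs.items())

# Hypothetical genotype frequencies, keyed by (X_i, X_j)
freqs = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

p = allele_frequencies(freqs, 2)          # reference values set to allele frequencies
D_empty = association(freqs, p, ())       # association over the empty set
D_i = association(freqs, p, (0,))         # first-order moment at locus i
D_ij = association(freqs, p, (0, 1))      # pairwise association
covariance = freqs[(1, 1)] - p[0] * p[1]  # classical two-locus disequilibrium
```

With the reference values centered on the allele frequencies, `D_i` vanishes (up to rounding) and `D_ij` coincides with the covariance between the allelic states at the two loci.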
The associations have particularly simple interpretations when the reference values are chosen to be the current allele frequencies ($\wp_{\mathbb{i}} = p_{\mathbb{i}}$). Moments for single positions vanish: $D_{\mathbb{i}} = 0$. Associations between pairs of positions are equal to the covariance in the allelic state of genes at those positions. Departures from Hardy-Weinberg proportions are measured by D's involving pairs of positions at the same locus that have the same sex of carrier but different sexes of origin; for example, $D_{i_{mf} i_{mm}}$ for locus i in males. When there are no sex differences in allele frequencies, $D_{i_{mf} j_{mf}}$ is equal to the conventional measure of pairwise linkage disequilibrium (also called gametic phase disequilibrium) between loci i and j. Moments involving more than two loci are measures of higher-order associations in the population. Following the standard terminology for statistical moments, we say that the associations are "centered" when the reference values are set equal to the current allele frequencies.
Equations 1-3 provide a recipe for translating genotypic frequencies into a set of reference values $\wp$ and associations D that completely describe the genetic state of a population. The reverse translation is of course also possible. For example, with biallelic loci and the reference values defined to be equal to the current allele frequencies ($\wp_{\mathbb{i}} = p_{\mathbb{i}}$), the frequency of genotype X is

$$f(X) = \prod_{\mathbb{i} \in \mathbb{G}} p_{\mathbb{i}}^{X_{\mathbb{i}}} q_{\mathbb{i}}^{1 - X_{\mathbb{i}}} + {\sum_{\mathbb{U} \subseteq \mathbb{G}}}^{*} (-1)^{|\mathbb{U}| - \sum_{\mathbb{i} \in \mathbb{U}} X_{\mathbb{i}}} D_{\mathbb{U}} \prod_{\mathbb{i} \in \mathbb{G} \backslash \mathbb{U}} p_{\mathbb{i}}^{X_{\mathbb{i}}} q_{\mathbb{i}}^{1 - X_{\mathbb{i}}}, \tag{4}$$

where $|\mathbb{U}|$ means the number of positions in set $\mathbb{U}$, and $\mathbb{G} \backslash \mathbb{U}$ stands for the positions in set $\mathbb{G}$ that are left after those in set $\mathbb{U}$ are taken away. The first term in Equation 4, a product that includes one term for each position in the genome, gives the genotype frequency that would be found in the absence of any associations. The second term accounts for the effects of the associations. $\mathbb{G}$ is the set of all positions in a genome whose sex of carrier is the same as that of X. Expressions more general than Equation 4 that allow for multiple alleles and arbitrary definitions for the reference values can be derived using results that are developed below.
Summations of the kind seen in the second term of Equation 4 make frequent appearances in this article. The sum includes one term for each possible subset $\mathbb{U}$ of positions in the set $\mathbb{G}$, including $\mathbb{G}$ itself. When an asterisk appears, as in Equation 4, the sum does not include a term in which $\mathbb{U}$ equals the empty set, $\emptyset$. Thus, if $\mathbb{G}$ consists of the two positions $\mathbb{i}$ and $\mathbb{j}$, the sum in Equation 4 will have three terms as $\mathbb{U}$ takes on the values $\{\mathbb{i}\}$, $\{\mathbb{j}\}$, and $\{\mathbb{i}, \mathbb{j}\}$. When the summation symbol is not followed by an asterisk, the sum does include a term in which $\mathbb{U} = \emptyset$.
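The reverse translation from centered associations back to genotype frequencies, together with the starred-sum convention, can be checked with a short round trip. The sketch below is an assumption-laden illustration: it uses a haploid genome of three biallelic positions with made-up frequencies, and it rebuilds each frequency as the no-association product plus a sum over nonempty subsets U of D_U, times a factor (2X - 1) for each position in U, times the marginal allele frequencies of the remaining positions. That expression follows from the definitions of the centered indicators and may differ in layout from the article's Equation 4.

```python
from itertools import combinations, product
from math import prod

def subsets(items, include_empty=True):
    """All subsets as tuples; include_empty=False mimics the starred sum."""
    start = 0 if include_empty else 1
    return [c for k in range(start, len(items) + 1)
            for c in combinations(items, k)]

def association(freqs, ref, A):
    """D_A = E[ prod_{i in A} (X_i - ref_i) ]."""
    return sum(f * prod(x[k] - ref[k] for k in A) for x, f in freqs.items())

def genotype_frequency(x, p, D):
    """Rebuild f(x) from allele frequencies p and centered associations D."""
    G = tuple(range(len(p)))

    def marginal(k):
        return p[k] if x[k] == 1 else 1 - p[k]

    total = prod(marginal(k) for k in G)              # no-association term
    for U in subsets(G, include_empty=False):         # starred sum
        sign = prod(2 * x[k] - 1 for k in U)          # +1 for allele 1, -1 for allele 0
        rest = prod(marginal(k) for k in G if k not in U)
        total += sign * D[U] * rest
    return total

# Hypothetical three-locus haploid population
weights = [4, 1, 2, 3, 1, 5, 2, 2]
genotypes = list(product((0, 1), repeat=3))
freqs = {x: w / sum(weights) for x, w in zip(genotypes, weights)}

p = [association(freqs, [0, 0, 0], (k,)) for k in range(3)]   # E[X_k]
D = {U: association(freqs, p, U) for U in subsets((0, 1, 2), include_empty=False)}
rebuilt = {x: genotype_frequency(x, p, D) for x in genotypes}
```

The seven nonempty subsets of three positions, plus the three allele frequencies implicit in the reference values, carry the same information as the eight genotype frequencies, which is why `rebuilt` matches `freqs`.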
The notation allows for more than two alleles per locus. It does become more complicated in that event, however, because the extra degrees of freedom require us to account for associations with repeated positions. With three alleles, for example, the allele frequencies at a position are described by the two variables $D_{\mathbb{i}}$ and $D_{\mathbb{ii}}$. However, when there are only two alleles per locus, associations containing repeated positions can be expressed in terms of associations with no repeated positions. (This article focuses mainly on biallelic loci, which is perhaps not a severe restriction as loci can be defined as single-nucleotide sites. Readers who are interested in loci with multiple alleles should consult BT91, or the documentation with the Mathematica packages, for more details about those models.)
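Two basic equations for simplifying associations for biallelic loci are useful later. From Equations 1-3,

$$\zeta_{\mathbb{ii}} = \wp_{\mathbb{i}} (1 - \wp_{\mathbb{i}}) + (1 - 2\wp_{\mathbb{i}}) \zeta_{\mathbb{i}}, \qquad D_{\mathbb{A}\mathbb{ii}} = \wp_{\mathbb{i}} (1 - \wp_{\mathbb{i}}) D_{\mathbb{A}} + (1 - 2\wp_{\mathbb{i}}) D_{\mathbb{A}\mathbb{i}} \tag{5}$$

(see BT91, Equation 5). Here and throughout, expressions of the form $\mathbb{AU}$ stand for $\mathbb{A} \cup \mathbb{U}$, the union of sets $\mathbb{A}$ and $\mathbb{U}$; thus $D_{\mathbb{AU}} = D_{\mathbb{A} \cup \mathbb{U}}$, etc.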
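The reduction of repeated positions at biallelic loci can be verified numerically. The sketch below assumes the identity in the form D_{Aii} = ℘_i (1 - ℘_i) D_A + (1 - 2℘_i) D_{Ai}, which follows from X² = X when X takes only the values 0 and 1; the genotype frequencies, the deliberately uncentered reference values, and the function names are all hypothetical.

```python
from math import prod

def association(freqs, ref, A):
    """D_A = E[ prod_{i in A} (X_i - ref_i) ]; A may repeat a position."""
    return sum(f * prod(x[k] - ref[k] for k in A) for x, f in freqs.items())

# Hypothetical two-locus haploid genotype frequencies, keyed by (X_i, X_j)
freqs = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
ref = [0.45, 0.35]   # arbitrary reference values, not the allele frequencies

def reduce_repeat(A, i):
    """Express D_{A,i,i} through D_A and D_{A,i} for a biallelic position i."""
    r = ref[i]
    return (r * (1 - r) * association(freqs, ref, A)
            + (1 - 2 * r) * association(freqs, ref, A + (i,)))

direct = association(freqs, ref, (1, 0, 0))   # D with position 0 repeated
reduced = reduce_repeat((1,), 0)              # same quantity without the repeat
```

Because the identity holds for each genotype separately, it holds for any reference values, centered or not.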
The relation between phenotypes and genotypes:
This notation is sufficiently flexible to allow for any relation between phenotypes and genotypes. Let Z be the value of a phenotypic character in an individual. This value can be written in general as a function of the individual's genotype,

$$Z = \bar{Z} + \sum_{\mathbb{A} \subseteq \mathbb{G}} b_{\mathbb{A}} (\zeta_{\mathbb{A}} - D_{\mathbb{A}}) + e_Z, \tag{6}$$

where $\bar{Z}$ is the trait mean in the population and $e_Z$ is a random environmental component that is independent of genotype and that has mean 0. The $\zeta_{\mathbb{A}}$ that appear on the right-hand side are calculated from the genotype vector X on the left using Equations 1 and 2. (Note that the term in the sum corresponding to the null set $\mathbb{A} = \emptyset$ makes no contribution because $\zeta_{\emptyset} = D_{\emptyset} = 1$.)
Equation 6 can describe any kind of genetic dominance, epistasis, sex differences in expression, genomic imprinting, etc. The relation between genotype and phenotype is determined by the choice of the coefficients $b_{\mathbb{A}}$. If there is gene-by-environment interaction, the b becomes a function of the state of the environment; it may be convenient to include that environmental state as a component of the context. Equation 6 also applies with multiple alleles, provided that the set $\mathbb{G}$ includes the appropriate number of repeated elements. For example, suppose that there are three alleles at locus i and two alleles at locus j. For haploid genotypes, the set $\mathbb{G}$ is then defined as $\{\mathbb{i}, \mathbb{i}, \mathbb{j}\}$. The coefficients $b_{\mathbb{i}}$, $b_{\mathbb{ii}}$, $b_{\mathbb{j}}$, $b_{\mathbb{ij}}$, and $b_{\mathbb{iij}}$ are then required to account for the 5 d.f., and the set $\mathbb{A}$ in the summation of Equation 6 ranges over all five distinct nonempty subsets of $\mathbb{G}$.
The b coefficients can become quite numerous. For example, with just two biallelic loci each with four possible contexts (say, two sexes of carrier and two sexes of origin), there are eight positions and hence $2^8 - 1 = 255$ possible b coefficients: 8 corresponding to single positions, 28 corresponding to pairs of positions, etc. The number of coefficients drops dramatically, however, in many cases. With no epistasis, genotype-phenotype relations can be fully described with only 8 distinct coefficients, while with a completely additive model (no dominance and no epistasis), only 4 distinct coefficients are needed.
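The count of possible coefficients is simply the number of nonempty subsets of the positions, which can be tallied by order. A minimal sketch (the variable names are chosen here for illustration):

```python
from math import comb

n_loci = 2
contexts_per_locus = 4                        # two sexes of carrier x two sexes of origin
n_positions = n_loci * contexts_per_locus     # 8 positions in total

# One coefficient per nonempty subset of positions (biallelic loci, no repeats)
total_coefficients = 2 ** n_positions - 1

# Number of coefficients involving exactly k positions
by_order = {k: comb(n_positions, k) for k in range(1, n_positions + 1)}
```

Here `by_order[1]` and `by_order[2]` give the 8 single-position and 28 pairwise coefficients mentioned in the text, and the entries of `by_order` add up to `total_coefficients`.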
| EVOLUTION BY SELECTION AND TRANSMISSION |
|---|
Here we use the notation proposed above and results from BT91 to find how the genetic composition of a population changes over the course of a generation. We first show how selection and transmission change a population and then end with some statistical bookkeeping.
Selection:
Fitness is just another phenotypic character. Consequently, Equation 6 applies when the trait under consideration is fitness. BT91 showed that this insight is useful because the b coefficients then take on special significance: They can be used to calculate how the genetic state of the population changes.
We noted earlier that fitness (that is, relative reproductive output) can depend on just the genotype of the individual (as with viability selection), on the genotype of a mated pair (as with any form of nonrandom mating), or on the genotypes of a larger group of individuals (as with kin or group selection). To describe the effects of selection, we need to consider together the genomes of the selection group, by which we mean the set of individuals that interact to determine their mutual fitness. With simple viability selection and random mating, the selection group is a single individual. Often it is useful to define the selection group to be a male and female mated pair, which allows for nonrandom mating as well as viability selection. The selection group can be expanded to more than two individuals to accommodate group selection.
We denote the set of all positions in a selection group as $\mathbb{W}$. For example, take a selection group consisting of a male and female in a mated pair. With a single biallelic diploid locus i, the selection group is $\mathbb{W} = \{i_{mm}, i_{mf}, i_{fm}, i_{ff}\}$. With three alleles, the selection group is the same, but with each element appearing twice.
The absolute fitness of a selection group is defined to be the ratio of its frequency after selection to its frequency before. When a selection group consists of more than one individual, its "frequency" before selection is equal to the product of the frequencies of the genotypes of those individuals. Take, for example, a selection group consisting of a mated pair of male and female genotypes. The frequency before selection can often be taken as the product of the frequencies of the respective male and female genotypes, since the premating "groups" are equivalent to randomly chosen pairs of males and females. The frequency of the selection group after selection is the frequency with which those genotypes are found together among all mated pairs (weighting the pairs by their relative fecundities, if they differ). This representation of selection can account for viability selection within each sex as well as nonrandom mating and fecundity selection.
The genotype of the selection group is described by the vector X, which includes the allelic state for every position in every individual in the group. Denoting the frequencies of the group's genotype before and after selection as f(X) and f'(X), the group's absolute fitness as W(X), and the population's mean fitness as $\bar{W}$, we see from Equation 6 that we can always write the expected relative fitness of the genotype of a selection group in the form

$$w(X) = \frac{f'(X)}{f(X)} = \frac{W(X)}{\bar{W}} = 1 + \sum_{\mathbb{A} \subseteq \mathbb{W}} a_{\mathbb{A}} (\zeta_{\mathbb{A}} - D_{\mathbb{A}}) \tag{7}$$

(see BT91, Equation 6).
The coefficients $a_{\mathbb{A}}$ defined by Equation 7 are called selection coefficients. The coefficient $a_{\mathbb{A}}$ represents the force of selection acting on the positions in set $\mathbb{A}$. These coefficients can account for any form of selection within individuals (including dominance, epistasis, and genomic imprinting) and any form of nonrandom mating. Note that selection coefficients defined this way typically depend on allele frequencies, associations, and reference values, even if the fitnesses of genotypes are constant (BT91). If phenotypes include environmental (nongenetic) components, then the frequencies f() and f'() represent expectations averaging over those components (see APPLICATIONS). Note that the selection coefficients defined by Equation 7 differ from those defined in BT91. Although similar in form, the fitness functions are not the same, and so selection coefficients from our system and that of BT91 cannot be interchanged.
Selection coefficients have simple interpretations. With biallelic loci, the coefficient $a_{\mathbb{i}}$ measures the force of direct selection acting on position $\mathbb{i}$ to increase the frequency of allele 1. Selection coefficients with multiple subscripts indicate that those positions have nonadditive effects on fitness. For example, dominance at a locus i in diploid males is measured by $a_{i_{mf} i_{mm}}$. This coefficient measures the force of selection favoring allele 1 at locus i when it appears in two copies, one inherited from a female (the individual's mother) and the other from a male (the father). Nonadditive fitness interactions between loci are represented by selection coefficients that have multiple positions with the same sex of carrier. The selection coefficient $a_{i_{ff} j_{ff}}$, for example, measures the departure from additivity for the alleles at loci i and j that are carried by females and were inherited from females. The effects of nonrandom mating appear in selection coefficients that include both male and female sexes of carrier. When there are more than two alleles per locus, there are selection coefficients that have the same position repeated. [The notation can accommodate a continuum-of-alleles model where there are an infinite number of alleles per locus, provided that fitness can be approximated by a polynomial function (see BT91). It may not be possible, however, to obtain a good approximation to a continuum-of-alleles model using a finite set of moments.]
Two points about Equation 7 are worth keeping in mind. If the phenotype contains an environmental component, the relative fitness w(X) is understood to mean the relative fitness averaged over that environmental variation. Second, the selection coefficients depend on how the selection group is defined. For example, with random mating the selection group can be defined as a single individual, and no selection coefficients that include both sexes of carrier appear. But if the selection group is defined as a mated pair, the fitness function of Equation 7 generates selection coefficients with both sexes of carrier even under random mating. This discrepancy is not a problem, though, since the alternative definitions of the selection group will produce the same results so long as the definition that is chosen is used consistently throughout the calculations.
Given any set of assumptions about how genotypes (or phenotypes) affect lifetime fitness, Equation 7 can be used to calculate the corresponding selection coefficients. Appendix A presents a simple example of two loci under epistatic viability selection. When several selection events occur over the course of a generation, the job is made easier by calculating coefficients for each event in isolation and then combining them. For example, suppose that fitness is the product of viability through two stages of the life cycle, each represented by Equation 7 but with coefficients $b_{\mathbb{A}}$ and $c_{\mathbb{A}}$. For biallelic loci, the overall selection coefficient is
|
(8) |
[To prove Equation 8, write the overall fitness as the product of the fitnesses for the two stages, expand each of the w's using Equation 7, use Equation 5 to eliminate products of $\zeta$'s, and finally match the corresponding coefficients of the $\zeta$'s on the right and left sides.] In the event of weak selection, the situation can be simplified further by approximation: If the selection coefficients b and c are of order s, then $a_{\mathbb{A}} \approx b_{\mathbb{A}} + c_{\mathbb{A}}$ to leading order in s.
Given the selection coefficients, we can determine the state of a population following the selection event. BT91 showed how the new allele frequencies and associations are given by

$$D'_{\mathbb{A}} = D_{\mathbb{A}} + \sum_{\mathbb{U} \subseteq \mathbb{W}} a_{\mathbb{U}} (D_{\mathbb{AU}} - D_{\mathbb{A}} D_{\mathbb{U}}). \tag{9}$$

This is our main result for the effects of selection. We see that the change in the associations for positions in set $\mathbb{A}$, represented here by the second term on the right, is equal to a sum of all the selection coefficients acting on sets of positions in the population, weighted by the association between those positions and the ones in set $\mathbb{A}$.
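The selection recursion can be checked against a direct calculation on genotype frequencies. The sketch below makes several assumptions: a haploid population with two biallelic loci, a selection group of one individual, made-up selection coefficients, relative fitness of the form w(X) = 1 + Σ a_U (ζ_U - D_U), and the recursion written as D'_A = D_A + Σ a_U (D_AU - D_A D_U) with the new moments measured against the old reference values. Function and variable names are hypothetical.

```python
from math import prod

def moment(freqs, ref, A):
    """E[ prod_{i in A} (X_i - ref_i) ]; repeated positions are allowed."""
    return sum(f * prod(x[k] - ref[k] for k in A) for x, f in freqs.items())

# Hypothetical genotype frequencies before selection, keyed by (X_i, X_j)
freqs = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
ref = [sum(f * x[k] for x, f in freqs.items()) for k in range(2)]  # allele frequencies

# Hypothetical selection coefficients on locus i, locus j, and their interaction
a = {(0,): 0.10, (1,): -0.05, (0, 1): 0.20}

def relative_fitness(x):
    """w(x) = 1 + sum_U a_U (zeta_U(x) - D_U)."""
    return 1 + sum(a_U * (prod(x[k] - ref[k] for k in U) - moment(freqs, ref, U))
                   for U, a_U in a.items())

# Route 1: apply selection to the genotype frequencies, then take moments
after = {x: f * relative_fitness(x) for x, f in freqs.items()}

def direct(A):
    return moment(after, ref, A)

# Route 2: update the moments themselves
def recursion(A):
    D_A = moment(freqs, ref, A)
    return D_A + sum(a_U * (moment(freqs, ref, A + U) - D_A * moment(freqs, ref, U))
                     for U, a_U in a.items())

sets_to_check = [(0,), (1,), (0, 1)]
```

Since the reference values here are the allele frequencies before selection, `direct((0,))` is also the change in allele frequency at locus i caused by selection.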
Equation 9, which gives the new moments in terms of the old reference values, can be used to calculate changes in allele frequencies caused by selection. If we choose the reference values to be the allele frequencies before selection ($\wp_{\mathbb{i}} = p_{\mathbb{i}}$), then the change in allele frequency at position $\mathbb{i}$ is equal to $D'_{\mathbb{i}}$. With two alleles per locus, Equation 9 gives
|
(10) |
where
|
(11) |
On the right side of Equation 10, the first term represents selection acting directly on alleles at position $\mathbb{i}$. The second term represents the effects of indirect selection: the force of selection acting on other positions that is transmitted to position $\mathbb{i}$ through the associations.
Equation 10 gives an exact expression for the change in allele frequency at position $\mathbb{i}$ caused by selection. If all positions at locus i are equivalent, then this is equal to the change in allele frequency at that locus. If not, the overall change at locus i is found by averaging $\Delta p_{\mathbb{i}}$ over all the positions at that locus. (Note, however, that the average allele frequency is not sufficient to fully describe the population.)
Transmission:
"Transmission" refers to an event that changes the contexts of genes. Obvious examples are meiosis, where a gene carried by a diploid individual becomes a gene in a haploid individual (the gamete), and fertilization, where the reverse transition happens. Migration can also be considered as a form of transmission, since genes change their context as they move from one location to another. The effects of transmission on the state of the population are determined by the transmission coefficients. The transmission coefficient t

is defined simply as the probability that the positions in set
were inherited from positions in set
. (Note that this is generally not the same as the probability that the positions in set
are transmitted to set
.)
To clarify the meaning of these coefficients, consider autosomal loci in a haploid population with two sexes. The transmission coefficient t_{im←if} is the probability that a gene at locus i in a male was inherited from a gene at locus i in a female and is therefore 1/2. The transmission coefficient t_{{im,jm}←{if,jf}} is the probability that the genes at loci i and j in a male were both inherited from a female (the mother), which is (1 - r_ij)/2, where r_ij is the recombination rate between loci i and j.
There are three constraints on transmission coefficients. First, transmission coefficients are zero unless each position in set
has a corresponding position in set
from which it descended. This implies that sets
and
must be equal when the context information is stripped from all of their positions; that is, t

= 0 if A ≠ B. (For example,
because i ≠ j; a gene at locus i cannot be descended from a gene at locus j.) Second, the coefficients representing transmission to any given set
must sum to 1. (In the notation introduced below,
.) A third constraint on the coefficients applies when transmission represents recombination, segregation, and/or syngamy. Then the sex of origin for each position in set
must equal the sex of carrier for the corresponding position in set
, since that is the sex of the parent from which a gene in set
descended.
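As an illustration of the second constraint, the sketch below tabulates the two-locus transmission coefficients for the dioecious haploid example and checks that they sum to 1. The function name and the dictionary layout are hypothetical, and the r/2 entries for the recombinant sources are an assumption that follows from the (1 - r)/2 value given in the text for the nonrecombinant case.

```python
def two_locus_transmission(r):
    """Coefficients for the target set {im, jm}, keyed by source set.

    Assumes autosomal loci in dioecious haploids with recombination rate r.
    """
    return {
        ("if", "jf"): (1 - r) / 2,  # both genes inherited from the mother
        ("im", "jm"): (1 - r) / 2,  # both genes inherited from the father
        ("if", "jm"): r / 2,        # locus i from the mother, locus j from the father
        ("im", "jf"): r / 2,        # locus i from the father, locus j from the mother
    }


coeffs = two_locus_transmission(0.1)
total = sum(coeffs.values())  # should equal 1 for any r in [0, 1/2]
```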
Transmission coefficients often involve recombination between groups of more than two loci. We use rA to denote the probability that recombination occurs somewhere in the set of loci A; that is, the alleles at those loci passed to a gamete are a mixture of those inherited from the individual's mother and father. Table 2 gives the transmission coefficients for several cases of interest, including autosomes, X-linked loci, and cytoplasmic factors.
|
Many models assume that there is no genetic variation for the rules of transmission. In that case, the effect of transmission on the moments (the allele frequencies and associations) that describe a population is simple. Equation 3 implies that the moments after transmission, D''
, are then just a linear combination of the moments before, D'
. When the reference values are chosen to be equal for positions at each locus, the effect of transmission is particularly simple:
|
(12) |
This is our main result for the effects of transmission. The summation is over all sets of positions
that could become set
following transmission. The notation "
: U = A" means that
and
must be equal when the context information is stripped from them, that is, when U = A. (Taking the example of dioecious diploids, with
= ifm, the sum in Equation 12 has four terms as
takes the values iff, ifm, imf, and imm.) This requirement follows from the first constraint on transmission described above. Equation 12 needs modification if different positions at the same locus have different reference values, as discussed in Changing reference values below. The two-locus example presented in Appendix A shows how transmission coefficients are used in calculating changes in allele frequencies.
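The linear form of the transmission step can be sketched in a few lines. This is an illustrative reading of the text, not a transcription of Equation 12: it assumes that the association after transmission is the sum, over source sets, of the transmission coefficient times the association before transmission, and it uses the dioecious haploid two-locus case with hypothetical numbers and equal reference values in the two sexes.

```python
def transmit(t, d_before):
    """Association after transmission as a weighted sum of associations before."""
    return sum(t[source] * d_before[source] for source in t)


r = 0.1  # assumed recombination rate between loci i and j
t = {
    ("if", "jf"): (1 - r) / 2,
    ("im", "jm"): (1 - r) / 2,
    ("if", "jm"): r / 2,
    ("im", "jf"): r / 2,
}
# Hypothetical associations before transmission; the between-sex
# associations are set to zero, as under random mating.
d_before = {
    ("if", "jf"): 0.02,
    ("im", "jm"): 0.04,
    ("if", "jm"): 0.0,
    ("im", "jf"): 0.0,
}
d_after = transmit(t, d_before)
```

With these numbers the result is (1 - r) times the mean of the within-sex associations, the familiar decay of linkage disequilibrium by a factor (1 - r) per generation.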
Equation 12 can be easily generalized to allow different genotypes to follow different transmission rules. Examples include cases where there is meiotic drive or genetic variation in recombination rates. As in Equation 6, we write the transmission coefficients as a polynomial function of genotype,
|
(13) |
where
is the set of all positions that influence transmission. The transmission coefficient 


is the mean probability that genes at positions
were inherited from positions
, averaged over all genotypes. The coefficient
t

|
represents the effect of the set of positions
on the transmission coefficient t

, in the same way that the selection coefficient a
represents the effects of the set
on fitness. To find the effects of transmission on the associations, substitute (13) into (12) and then average over all genotypes:
|
(14) |
(cf. ![]()
Changing reference values:
As described earlier, our system of describing a population is defined relative to a set of reference values. The investigator is free to leave these fixed or to change them as often as desired. It is often convenient, however, to change the reference values once per generation. By updating the reference values to the current allele frequencies, the associations have simple interpretations, and we can calculate the per-generation changes in allele frequencies. Moreover, updating only once per generation avoids a proliferation of algebra involving reference values at intermediate stages that eventually cancel. Sometimes it is convenient to update the reference values at the zygote stage. Alternatively, it may be easiest to update them before transmission, since under normal meiosis (no meiotic drive, etc.) allele frequencies are unchanged and the change in associations caused by transmission often takes a simple form when the associations have already been centered. If we are interested only in finding the evolutionary equilibrium, changing reference values from one generation to the next is not an issue.
Changing the reference values changes the associations D, because the latter are defined in terms of the former. Denote the associations before and after the change as D''
and D'''
, respectively, and the reference values before and after as
''
and
'''
. (If the reference values have not been changed since the start of the generation, then 
= 
.) The associations after the reference values change are found using Equations 1, 2, and 3:
|
(15) |
Because the sum in (15) is not asterisked, it includes the term corresponding to
=
, which is D''
. This last expression is the main result for changing reference values.
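The bookkeeping for a change of reference values can be sketched as follows. This is a hedged reconstruction from the surrounding text, not a verified copy of Equation 15: it assumes that an association is the expected product of deviations from the reference values, so that the new association is a sum over subsets of the old associations, each multiplied by the differences (old reference minus new reference) for the positions left out of the subset. The function names and the example population are hypothetical, and the sketch handles only sets of distinct positions.

```python
from itertools import combinations
from math import prod

# Hypothetical haploid population: genotype (X_i, X_j) -> frequency.
genotype_freq = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.30}
index = {"i": 0, "j": 1}


def moment(positions, ref):
    """Expected product of deviations (X - reference) over the given positions."""
    return sum(
        f * prod(g[index[x]] - ref[x] for x in positions)
        for g, f in genotype_freq.items()
    )


def shift_reference(assoc, positions, old_ref, new_ref):
    """Association under new_ref from the associations defined under old_ref."""
    total = 0.0
    for k in range(len(positions) + 1):
        for subset in combinations(positions, k):
            rest = [x for x in positions if x not in subset]
            total += assoc[frozenset(subset)] * prod(
                old_ref[x] - new_ref[x] for x in rest
            )
    return total


old_ref = {"i": 0.45, "j": 0.35}   # assumed reference values before the update
new_ref = {"i": 0.50, "j": 0.40}   # current allele frequencies of this population
subsets = [(), ("i",), ("j",), ("i", "j")]
assoc = {frozenset(s): moment(s, old_ref) for s in subsets}  # empty set gives 1

d_new = shift_reference(assoc, ("i", "j"), old_ref, new_ref)
d_direct = moment(("i", "j"), new_ref)  # same quantity computed from scratch
```

The term with the full subset contributes the old association unchanged, which matches the remark that the sum includes the term for the set itself.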
The previous section notes that the transmission Equation 12 does not hold when different positions at the same locus have different reference values (that is, 

for some i = j). In that case, the associations between positions before transmission must be adjusted to the reference values for the positions that those genes will occupy after transmission,
|
(16) |
where 
* is the reference value for the position in set
that corresponds to
in set
.
With the exception of random drift, all of genetic evolution can be concisely represented by two equations: Equation 9 for the effects of selection and other deterministic forces and Equation 12 or Equation 16 for the effects of transmission. These can be supplemented by Equation 15, which does the bookkeeping needed to ensure that the associations have a simple interpretation. Other deterministic forces, like mutation and migration, can also be described by these equations: mutation can be represented as a form of frequency-dependent selection and migration as a form of transmission (since genes change their contexts when they move). It is easier, however, to find the effects of mutation directly, which we do below. First, we develop an approximation that greatly simplifies the equations for selection and transmission.
| THE QLE APPROXIMATION |
|---|
The recursions derived above can be used to calculate the exact dynamics for a wide range of multilocus population genetic models. Although this approach may give more insight than directly following genotype frequencies, it will not necessarily be any more tractable. That is because exact results require following the dynamics of the same number of variables, regardless of whether they are genotype frequencies or moments (that is, allele frequencies and associations). One of the great appeals of the moment-based approach introduced by BT91 is that in some situations expressions for the associations can be greatly simplified by approximation. In this section, we derive approximate expressions for the associations and changes in allele frequencies when the population is in a state of QLE. The concept was introduced by ![]()
The first fundamental assumption we must make is that all the associations D are of order a, by which we mean that they are not larger than a constant factor times the largest of the a's. BT91 showed that this condition is met when the forces that generate associations within a sex (epistasis, migration, etc.) are weak relative to recombination and when nonrandom mating is not strong. An intuitive justification is that the associations are produced by evolutionary forces that are of order a (see Equation 9) and will not accumulate to values that are much larger than that if the forces breaking them down (recombination, segregation, and mutation) are sufficiently strong. The second assumption needed for the QLE approximation is that all the selection coefficients a are much smaller than 1. BT91 showed that when these two conditions hold, a population rapidly settles into a state where the allele frequencies are changing slowly, and the associations are close to the equilibrium values they would reach if the allele frequencies were in fact stationary (see also ![]()
Approximations for the associations:
We assume that there are two alleles at each locus, which simplifies the analysis. The approach can be extended to multiple alleles following the leads of BT91. The main results developed below are illustrated with a simple two-locus example in TABLE A11.
Consider a life cycle in which we define the reference values to be the allele frequencies at the zygote stage. A series of selection events occur during the course of the generation. The generation ends with transmission, creating the zygotes for the next generation. We seek to derive an approximation for the dynamics of allele frequencies that is accurate up to (and including) terms of order a², which we denote O(a²). From Equation 10, we see that this approximation requires in turn that we find an approximation for the associations D that is accurate to order a. To do that we find the values that give an equilibrium for the recursion equations for the D that are accurate to order a; those solutions are our QLE approximations for the associations. The results apply not just to selection but to other deterministic forces that generate associations (such as migration) so long as they are weak relative to recombination and segregation. To simplify the derivations, we assume that there is no genetic variation in the transmission coefficients, an assumption that could be relaxed (see ![]()


.)
To begin deriving an approximate recursion for the D, Equation 9 gives the cumulative effect of selection and other deterministic forces on the associations between positions in set
,
|
(17) |
where
is a set of distinct positions and pq
is defined by Equation 11. The asterisked
* indicates that the sum does not include the term with
=
; it has been separated out to give the first term, D
.
The first step of Equation 17 follows from Equation 9 because the D are of order a and therefore the term a
D
D
in Equation 9 is of order a³ and so can be neglected. The second step follows because the term a
D
in the first line is of order a² except when
=
, in which event the reduction formula Equation 5 gives us
.
The effect of changing reference values can also be simplified. Equation 10 shows that the change in the allele frequencies is of order a. If we define the reference values to be the allele frequencies, then the quantities (
''
-
'''
) that appear in the product in Equation 15 are of order a. That equation therefore reduces simply to
, meaning that the effect of updating the reference values can be neglected. Assume that differences between positions at the same locus are O(a), which holds under normal sexual inheritance. With help from Equation 12 and Equation 17 we then get the full recursion for the associations over an entire generation:
|
(18) |
On setting
, we get a QLE approximation for the associations that is accurate to O(a):
|
(19) |
The first sum on the right is zero if the positions in
include more than one sex of origin. That is because if
includes more than one sex of origin, then
in the first sum would have to include more than one sex of carrier. But D
is of order a² if the positions in
include both sexes of carrier, since it represents associations between alleles in two randomly chosen zygotes.
Equation 19 is the main result of this section. It gives the solutions for the associations implicitly: The QLE value 
on the left side depends on the QLE values for the other associations, which appear as 
on the right side. The relationship is linear (because of the linear form of the transmission Equation 12), and so the solution can always be found using standard matrix algebra. Thus the
can be calculated directly given a set of transmission rules that specify the t's, a set of allele frequencies from which we can calculate pq
, and a set of selection coefficients a. We have implicitly assumed here that the selection coefficients are constant in time, but the approach can be generalized to changing environments (see BT91, Appendix B; ![]()
t). If, however, the environment changes on a time scale that is slow relative to the rate at which associations are changed by transmission, then the results given above apply.
The next two sections illustrate how to do this by carrying out the calculations for autosomal genes in dioecious haploids and for autosomal, sex-linked, and cytoplasmic genes in diploids.
Autosomal genes in haploids:
The QLE approximation for autosomal inheritance in a haploid population with two sexes was found by BT91. This section rederives their result to illustrate the new notation and how to use Equation 19 to find a QLE approximation.
The context for each gene now contains only its sex of carrier. That is because an individual carries only one gene at each locus, rather than the two that must be distinguished in the case of diploids. The comments following Equation 19 imply that for this case the first sum on its right side reduces to (t

f
f + t

m
m), where
f stands for set
with the sexes of carrier for all its positions converted to f and similarly for
m. The associations among a set of autosomal loci are equal in male and female zygotes, so
, and further those quantities must be equal to 
on the left side of Equation 19 because all the positions in
must have the same sex of carrier. With no sex differences in recombination we have
, where (1 - r
) is the probability that the loci in set
are not broken apart by recombination, and the factor of 1/2 accounts for the probability that genes in set
were inherited from a given parent.
Putting those facts together gives the QLE approximation
|
(20) |
This is equivalent to BT91's Equation 25. Some superficial differences are caused by three changes in notation. First, their result is expressed in terms of recombination rates rather than transmission rates. Second, BT91 separately defined within-male, within-female, and between-sex (nonrandom mating) selection coefficients; all of these are included in the sum on the right side of Equation 20. Last, they counted separately selection coefficients with different permutations of the same set of positions, which generates the combinatorial terms in their expression.
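The linear structure of the QLE condition can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration and is not claimed to reproduce Equation 19 or Equation 20 term by term: it supposes that, over one generation, the within-sex associations in female and male carriers are first incremented by a forcing term (standing in for the sum of selection coefficients times pq) and then mixed by transmission with weight (1 - r)/2 from each sex. The numerical values of `r` and `forcing` are hypothetical.

```python
import numpy as np

r = 0.2                                # assumed recombination rate for the set
forcing = np.array([0.004, 0.010])     # hypothetical forcing in (female, male) carriers

# Transmission matrix: each sex inherits an intact set from either parent
# with probability (1 - r)/2.
T = np.full((2, 2), (1 - r) / 2)

# QLE condition d = T (d + forcing), rearranged as (I - T) d = T forcing.
d_hat = np.linalg.solve(np.eye(2) - T, T @ forcing)

# Closed form for this toy model: (1 - r)/r times the mean forcing.
expected = (1 - r) / r * forcing.mean()

# Iterating the recursion from zero converges to the same values.
d = np.zeros(2)
for _ in range(200):
    d = T @ (d + forcing)
```

In this toy model the male and female associations come out equal, in line with the statement that associations among autosomal loci are the same in male and female zygotes, and the factor (1 - r)/r shows how tight linkage inflates the equilibrium associations.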
Autosomal, sex-linked, and cytoplasmic genes in diploids:
Now consider autosomal genes in a diploid population with two sexes. The context for a gene now includes both its sex of carrier and sex of origin. We allow for nonrandom mating and sex differences in selection and recombination. To simplify the calculation, however, we assume that there is no genetic variation in recombination rates and no genomic imprinting (that is, an allele's sex of origin does not affect its expression). The approach outlined here can be directly extended to allow for more than two sexes, as might be appropriate to describe a population with partial selfing.
Careful consideration of Equation 19 shows that the associations fall into three cases. Case 1 comprises associations among a set of positions
that include both sexes of carrier. The QLE approximation for these associations is simply 
= 0. That is because these represent associations between genes in two or more randomly chosen zygotes.
Case 2 comprises the associations among a set of positions that all have the same sex of carrier, but of which some have a male and others a female sex of origin. This kind of association exists for some sets of positions (for example, autosomal), but not others (for example, sets with only cytoplasmic loci). If they do exist, Equation 19 gives the QLE approximation
|
(21) |
These associations come from nonrandom mating in the previous generation: Associations between genes with different sexes of origin within an individual appear when there are correlations between the genotypes of mating males and females in the previous generation. These associations, which include Hardy-Weinberg disequilibria (an excess or deficit of heterozygotes), are zero under random mating because then the selection coefficients for positions with both sexes of carrier are of order a². The transmission coefficient t

can be translated into recombination rates according to the way that the genes in set A are inherited, as discussed above in the section on transmission.
Case 3, the last category of association, is when all positions in
have the same sex of carrier and all have the same sex of origin. Here D
represents an association among genes within a single individual that were inherited from the same parent. The QLE approximations for this case depend on how the genes in set
are inherited. They can be calculated by first writing out Equation 19 for the associations that do exist, given the mode of inheritance, out of the four possible cases
Aff,
Afm,
Amf, and
Amm, where, for example, Afm means that all positions in set A have a female sex of carrier and a male sex of origin. Inspection of the transmission coefficients reveals that these expressions do not depend on any associations that do not exist (e.g.,
Afm does not depend on
Amm when all genes in set
are X-linked, because
). Last, solve the resulting equations. That procedure leads to the following results for autosomal, X-linked, Y-linked, and cytoplasmic genes. The transmission coefficients used in the calculations are shown in Table 3.
|
When all the genes in set A are autosomal, all four of the possible case 3 associations exist. Solving Equation 19 then gives
|
(22) |
where
|
(23) |
Here rxA is the recombination rate for set A in sex x, where x can take the values m and f, and
x stands for the opposite sex of x (for example, if x = f then
x = m). The term F(·) results from nonrandom mating, which creates associations between alleles inherited from different parents in the next generation. These alleles are brought together in single gametes by recombination, producing associations within the same gametic genome (i.e., linkage disequilibria) two generations later. The summation in (23) is over all the different ways that the set of loci A can be partitioned into two nonnull sets S and T. With A = {i, j, k}, for example, the sum includes six terms: S = {i} and T = {j, k}, S = {j, k} and T = {i}, S = {j} and T = {i, k}, etc. F(·), which appears in results below, vanishes under random mating. The selection coefficients