Effective Size and Polymorphism of Linked Neutral Loci in Populations Under Directional Selection
Enrique Santiago (a) and Armando Caballero (b)
a Departamento Biología Funcional, Universidad de Oviedo, 33071 Oviedo, Spain
b Departamento Bioquímica, Genética e Inmunología, Universidad de Vigo, 36200 Vigo, Spain
Corresponding author: Enrique Santiago, Departamento de Biología Funcional, Universidad de Oviedo, 33071 Oviedo, Spain. E-mail: esr{at}sauron.quimica.uniovi.es
Communicating editor: B. S. WEIR
| ABSTRACT |
|---|
The general theory of the effective size (Ne) for populations under directional selection is extended to cover linkage. Ne is a function of the association between neutral and selected genes generated by finite sampling. This association is reduced by three factors: the recombination rate, the reduction of genetic variance due to drift, and the reduction of genetic variance of the selected genes due to selection. If the genetic size of the genome (L in Morgans) is not extremely small, the equation for Ne is
$$N_e = N \exp\left[-\frac{C^2}{(1-Z)L}\right],$$
where N is the number of reproductive individuals, C 2 is the genetic variance for fitness scaled by the squared mean fitness, (1 - Z) = Vm/C2 is the rate of reduction of genetic variation per generation and Vm is the mutational input of genetic variation for fitness. The above predictive equation of Ne is valid for the infinitesimal model and for a model of detrimental mutations. The principles of the theory are also applicable to favorable mutation models if there is a continuous flux of advantageous mutations. The predictions are tested by simulation, and the connection with previous results is found and discussed. The reduction of effective size associated with a neutral mutation is progressive over generations until the asymptotic value (the above expression) is reached after a number of generations. The magnitude of the drift process is, therefore, smaller for recent neutral mutations than for old ones. This produces equilibrium values of average heterozygosity and proportion of segregating sites that cannot be formally predicted from the asymptotic Ne, but both parameters can still be predicted by following the drift along the lineage of genes. The spectrum of gene frequencies in a given generation can also be predicted by considering the overlapping of distributions corresponding to mutations that arose in different generations and with different associated effective sizes.
DIRECTIONAL selection generates differences in the reproductive success of individuals, increasing the variance of change in gene frequency and reducing the genetic diversity of neutral alleles. A population, then, behaves for these parameters like an ideal unselected population of size Ne, the effective population size (![]()
![]()
When differences in fitness are inherited, the effective population size cannot be predicted from the variance of progeny number at a given generation. The drift process is amplified over generations because the random association that originated in a given generation between neutral and selected genes remains in descendants for a number of generations until it is eliminated by segregation and recombination. This problem was first addressed by ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Here, we develop a prediction of the effective population size under linkage, extending the argument of ![]()
![]()
,
where N is the number of reproductive individuals, C² is the genetic variance of fitness of individuals (measured relative to the mean fitness), and Q is the sum of a series of relative terms: the first is the change of one unit in neutral gene frequency caused by new associations created in a given generation, and the rest are the fractions of this change that remain in the following generations. For example, for unlinked genes and weak selection, Q ≈ 1 + 1/2 + 1/4 + 1/8 + ... = 2 (![]()
![]()
Thus, the term Q²C² is the variance of the long-term selective values, and with no linkage and weak selection it approximates 4C². This does not hold under linkage, but the argument can be rebuilt to consider the decline of the association between a neutral gene and selected genes on chromosomes. The problem of the reduction of the effective size under linkage then reduces to finding the appropriate value of Q², and the same argument used by ![]()
Initially, to make predictions independently of gene frequencies and effects, an infinitesimal model (an infinite number of genes of small effect) is considered, but predictive equations are also valid for the background selection model of ![]()
![]()
![]()
![]()
| DERIVATION OF EXPRESSIONS |
|---|
The general model:
We consider a monoecious diploid population with random mating and a constant number of reproductive individuals, N. Every individual in the population is made up of two haploid homologous complements with
chromosomes l Morgans (M) long each. (Table 1 shows the most common notation used in this article.) Each complement is referred to as a "gamete" (i.e., there are 2N gametes in the population). It is assumed that there is no genetic correlation between gametes in parents. The mapping function of ![]()
, is assumed to relate the recombination fraction r and the genetic distance x in Morgans. A large number n of loci uniformly distributed on the chromosomes determines fitness. Allelic effects can be different for different loci, but gene action is additive within loci; that is, the fitness value of the heterozygote is the average value of the corresponding homozygotes. This latter assumption, however, can be removed for some models (see below). Gene effects are multiplicative between loci. This genetic system is at mutation-selection-drift equilibrium with a mean fitness of one; i.e., each parent has two descendants on average, and the genetic variance for fitness of individuals is C 2, which is assumed to be small.
As the effects of loci are multiplicative, the relationship between variance of individuals and the contributions of the n selective loci to variation is $C^2 = \prod_{j=1}^{n} (1 + c_j^2) - 1$, where $c_j^2$ is the square of the coefficient of variation contributed by locus j, i.e., the variance for fitness of the locus, scaled such that the average fitness is 1.
Consider a neutral locus in the middle of a chromosome. We assume that the neutral alleles at this locus are initially produced by mutation, but this is not a necessary assumption of the model. Due to the finite size of the population, the sampling process generates random associations between the neutral alleles and selected loci. The expected change in gene frequency of the neutral allele (S) is the covariance between the frequency of the allele in gametes (p) and the selective value (f) of individuals carrying the gametes, S = cov (p, f) (see ![]()
Let pi be the frequency of an allele of the neutral locus in gamete i (pi can be 0 or 1). For the moment, we consider one single selected locus j with additive effects of alleles. Locus j is at a genetic distance of x M from the neutral locus. Let us consider a copy of the neutral allele present in a given individual, and let f'j be the selective value contributed by the selected allele present in the same gamete as the neutral allele and f''j the selective value contributed by the homologous selected allele in the other gamete. Under random mating, the expected change in gene frequency (i.e., covariance) of the copy of the neutral allele in the first generation (S1) can be partitioned into the change due to the selected allele in the same gamete as the neutral gene (S'1) and the change due to the homologous selected allele in the other gamete in the same individual (S''1),
$$S_1 = S'_1 + S''_1.$$
A fraction of the random associations generated in the first generation will remain in the following generations even if the population is expanded to an infinite size after the first generation. The expected value of the remaining covariances in the second generation depends on two factors: the change in expressed genetic variance of the selected locus and the recombination rate between the selected and the neutral loci. The first factor affects both partial covariances (S'1 and S''1) in an identical way: every generation, the genetic variation of the selected locus is assumed to be reduced by selection and drift by a constant proportion (1 - Z). Thus, both covariances are reduced to a proportion Z. In contrast, the decline because of recombination is different for the two partial covariances. The association between the neutral allele and the selective value of the same gamete is maintained with probability 1 - r (i.e., if they do not recombine). Therefore the expected partial covariance that remains in generation 2 is
$$S'_2 = S'_1 (1 - r) Z.$$
The effect of recombination on the other partial covariance (between the neutral allele and the selected gene in the other gamete) is opposite to the previous one. Recombination incorporates the selected allele of the homologous gamete into the gamete carrying the neutral gene with a probability r. Therefore, the remaining covariance in generation 2 is
$$S''_2 = S''_1 r Z.$$
In the following generations, both covariances are reduced in the same proportion (1 - r)Z per generation, the condition for the maintenance of the association being that the selected allele remains in the same gamete as the neutral gene, i.e.,
$$S'_3 = S'_2 (1 - r) Z, \qquad S''_3 = S''_2 (1 - r) Z,$$
and so on. The sum of all these covariances from generation 1 to infinity is the total change in gene frequency over generations due to the association newly created between the neutral allele and the selected locus j in the initial generation. New associations are created between the neutral gene and the selected locus in successive generations until an asymptotic stage is reached. From ![]()
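$$Q'_j = \frac{1}{S'_1}\sum_{i=1}^{\infty} S'_i = \frac{1}{1 - (1 - r)Z} \qquad (1a)$$
$$Q''_j = \frac{1}{S''_1}\sum_{i=1}^{\infty} S''_i = 1 + \frac{rZ}{1 - (1 - r)Z} \qquad (1b)$$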
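The accumulation of covariances described above can be checked numerically. The following is a minimal sketch, not the authors' code: it sums the same-gamete series 1 + (1 - r)Z + (1 - r)²Z² + ... term by term and compares it with the geometric closed form. The function names are illustrative, and Haldane's mapping function is assumed for converting map distance into a recombination fraction, because the mapping function cited in the text is not legible in this copy.

```python
import math


def haldane_r(x):
    """Recombination fraction for a map distance x in Morgans.

    Haldane's mapping function is an assumption made for this sketch.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def q_same_gamete(r, z):
    """Closed form of 1 + (1-r)Z + (1-r)^2 Z^2 + ...

    Cumulative effect of the selected allele that starts on the same
    gamete as the neutral allele.
    """
    return 1.0 / (1.0 - (1.0 - r) * z)


def q_other_gamete(r, z):
    """Closed form of 1 + rZ [1 + (1-r)Z + (1-r)^2 Z^2 + ...].

    Cumulative effect of the selected allele that starts on the
    homologous gamete and must first recombine onto the neutral one.
    """
    return 1.0 + r * z / (1.0 - (1.0 - r) * z)


def q_same_gamete_partial(r, z, generations):
    """Sum of the first `generations` terms of the same-gamete series."""
    return sum(((1.0 - r) * z) ** i for i in range(generations))
```

With r = 1/2 both closed forms give 2/(2 - Z), and with r = 0 the other-gamete term stays at 1, which are the limiting values quoted in the text.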
[Note that in the derivation of Equation 17 of ![]()
![]()
![]()
. The variance of the cumulative selective values due to locus j is Q2j c2j , and this can be again partitioned into the variance due to the selected allele originally located in the gamete with the neutral gene, and the variance due to the selected allele in the other gamete,
If j were the only locus with effect on fitness in the genome, the effective population size would be
![]() |
(2) |
From Equation 1a and Equation 1b we note that Q'j and Q''j take the value 2/(2 - Z) ≈ 2 when the neutral gene and the selected locus are located in different chromosomes (i.e., x = ∞), the population size is large, and selection does not change the genetic variance very quickly (i.e., Z ≈ 1). Therefore, with no linkage, Q'² = Q''² ≈ 4, and Equation 2 yields
![]() |
(3) |
(![]()
![]()
![]()
![]()
instead of Equation 3, but this is not correct as we discuss.
Now consider the n selected loci with different contribution to the variance for fitness. With multiplicative effects among them, the total variance of the cumulative selective values for all the selective loci in the genome is
Therefore, the asymptotic value of Ne is
![]() |
(4a) |
If c2j and Q2j values are uncorrelated (i.e., independence between Z and c 2 values), Equation 4a reduces to
![]() |
(4b) |
where $Q'^2 = \sum_{j=1}^{n} Q'^2_j / n$ and $Q''^2 = \sum_{j=1}^{n} Q''^2_j / n$ are averages over the n selected loci.
For large populations Q''j is nearly 2 when linkage is not very tight (Equation 1b), and it asymptotically tends to 1 as x tends to 0. Thus, the average Q''² ranges from 1 (complete linkage) to 4 (no linkage). Q'² approximates 1/r² (Equation 1a) in large populations under weak selection (i.e., Z ≈ 1), so it may take values much larger than 4 for tight linkage (r ≪ 1/2). Thus, Q''² can usually be neglected relative to Q'², and Equation 4b can be reduced to
$$N_e \approx N \exp\left(-\frac{Q'^2 C^2}{2}\right) \qquad (5)$$
- 1]/
). The other component is due to loci in the chromosome carrying the neutral gene (with probability 1/
). As the neutral gene is assumed to be located in the middle of the chromosome, the second component can be obtained by integration over one-half of the chromosome length. Thus, using Equation 1a and the above probabilities,
Numerical analysis (data not shown) indicates that the relevant parameter is the product L =
l, that is, the genetic size of the genome. Variations in the distribution of the sizes of the chromosomes do not make much difference if the size of the whole genome is constant. Thus, the first term in the above equation can be dropped by setting
= 1 and substituting l by L:
![]() |
(6) |
The following approximations to the above expression can be made:
![]() |
(7) |
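Equations 6 and 7 are not legible in this copy, so the sketch below only illustrates the integration described in the text: averaging Q'j² over selected loci spread uniformly along a single chromosome of length L with the neutral locus at its midpoint. It assumes Haldane's mapping function and the geometric-series form Q'j = 1/[1 - (1 - r)Z]. The comparison value 2/[(1 - Z)L] is an inferred large-L approximation, chosen because it is consistent with the exponential and classical expressions for Ne given later in the text; it is not quoted from the article, and the function names are illustrative.

```python
import math


def q_prime(x, z):
    """Q'_j for a selected locus at map distance x (Morgans) from the neutral locus.

    Assumes Haldane's mapping function r = (1 - exp(-2x)) / 2.
    """
    r = 0.5 * (1.0 - math.exp(-2.0 * x))
    return 1.0 / (1.0 - (1.0 - r) * z)


def mean_q_prime_sq(genome_length, z, steps=20_000):
    """Midpoint-rule average of Q'_j^2 for x uniform on [0, L/2].

    The neutral locus sits in the middle of one chromosome of length L,
    so by symmetry only half of the chromosome needs to be integrated.
    """
    half = genome_length / 2.0
    h = half / steps
    total = sum(q_prime((k + 0.5) * h, z) ** 2 for k in range(steps))
    return total / steps


def approx_mean_q_prime_sq(genome_length, z):
    """Inferred large-L approximation 2 / ((1 - Z) L); an assumption of this sketch."""
    return 2.0 / ((1.0 - z) * genome_length)
```

For Z = 0.98 and L = 1.25 the numerical average and the approximation differ by a modest fraction, and the gap widens as L becomes small relative to 1 - Z.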
Then, a general expression for Ne with linkage can be obtained by substituting Q'2 from Equation 6 into Equation 5. For L > 0.2 or so, using the approximation (7), Equation 5 can be simplified to
$$N_e \approx N \exp\left[-\frac{C^2}{(1 - Z)L}\right] \qquad (8)$$
If selection is weak and linkage is not very tight, i.e., the exponent is smaller than 1 or so, Equation 8 can be expressed in a form closer to the classical equations for the effective population size, $N_e \approx N/[1 + C^2/((1 - Z)L)]$.
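To see when the classical form is an adequate substitute, the two expressions can be compared directly. This sketch assumes that Equation 8 has the exponential form N exp[-C²/((1 - Z)L)], which is the form implied by the expression in Vm given with Equation 9; the function names are illustrative.

```python
import math


def ne_exponential(n, c2, z, genome_length):
    """Ne from the exponential form N * exp(-C^2 / ((1 - Z) L))."""
    return n * math.exp(-c2 / ((1.0 - z) * genome_length))


def ne_classical(n, c2, z, genome_length):
    """Ne from the classical-looking form N / (1 + C^2 / ((1 - Z) L))."""
    return n / (1.0 + c2 / ((1.0 - z) * genome_length))


def relative_gap(n, c2, z, genome_length):
    """Relative difference between the two predictions."""
    e = ne_exponential(n, c2, z, genome_length)
    c = ne_classical(n, c2, z, genome_length)
    return (c - e) / e
```

Because exp(x) ≥ 1 + x, the classical form never predicts a smaller Ne than the exponential one, and the two agree closely only while the exponent is well below 1.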
Application to particular genetic systems:
The above equations for predicting the effective population size are a function of the proportional reduction of the genetic variation (1 - Z) at selected loci. Two processes are involved in the dynamics of the genetic variation of loci: selection and drift. It is generally assumed that drift eliminates variation at a constant rate 1/2Ne. The change in genetic variance due to selection depends on the genetic system. For some models, this change is constant. Particularly, phenotypic selection on an infinitesimal model (![]()
![]()
![]()
At equilibrium, the mutational input per generation (Vm) equals the loss of variation by drift and selection. Therefore, the proportion of the expressed genetic variance C 2 that is lost by drift and selection per generation is Vm/C 2. The remaining fraction of the expressed variation, which is expected to be maintained after one generation of selection, is
$$Z = 1 - \frac{V_m}{C^2} \qquad (9)$$
This term can be substituted in the previous equations to obtain the appropriate Q'² and Ne values. For example, the predictive Equation 8 becomes $N_e \approx N \exp[-(C^2)^2/(L V_m)]$.
Infinitesimal model:
All the previous equations apply under the infinitesimal model. Favorable and deleterious mutation models reduce to the infinitesimal model if effects are very small. If the population is small, selection is not very strong, and linkage is not very tight, the equilibrium variance can be approximated by C 2 = 2NeVm (![]()
![]()
)
1 -
.
Deleterious mutations model:
This is equivalent to the background selection model of ![]()
![]()
![]()
qi, as the deleterious allele frequency will generally be small, and the expected gene frequency in the next generation is $q_{i+1} \approx q_i - q_i(1 - q_i)t \approx q_i(1 - t)$. Therefore, the proportional change in the genetic variance of the selected locus due to selection is $q_{i+1}/q_i \approx 1 - t$. This is the factor by which genetic variance is changed by selection for this model. This result can also be obtained directly from Equation 9, noting that under mutation-selection balance C² = Ut (![]()
![]() |
(10) |
Equation 10 can also be obtained from the formula by ![]()
![]()
. Substituting into Equation 9 we get Equation 10. Thus, the appropriate value of Q'j can be obtained from Equation 1a,
![]() |
(11) |
The value of Q'2 is given by Equation 6; if the genome size is not extremely small (L > 0.2), using (7) and (10) we get Q'2
, and Equation 8 becomes
![]() |
(12) |
When the effect of drift is negligible (i.e., $t \gg 1/2N_e$), then $C^2 = Ut$ and $N_e = N \exp(-U/L)$, which agrees with the approximation of N. H. BARTON (unpublished results; see p. 671 of ![]()
![]()
![]()
Without recombination (L = 0) and $t \gg 1/2N_e$, Equation 6 reduces to $Q'^2 = 1/t^2$. Substituting this and $C^2 = Ut$ into Equation 5, $N_e \approx N \exp[-U/(2t)]$. This equation is identical to the expression of ![]()
![]()
![]()
=
0 exp(
), where
0 represents the expected heterozygosity if there is no selection.
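The two limiting cases of the deleterious mutations model can be worked through with a few lines of code. This is a sketch under stated assumptions rather than the article's own expressions: drift is neglected (1 - Z = t), the variance for fitness is C² = Ut, Equation 5 is taken to have the form Ne ≈ N exp(-Q'²C²/2), and for a genome that is not very small Q'² is taken as 2/[(1 - Z)L]. The function names are illustrative.

```python
import math


def ne_ratio_from_q(q_prime_sq, c2):
    """Ne / N assuming the form exp(-Q'^2 C^2 / 2) for Equation 5."""
    return math.exp(-q_prime_sq * c2 / 2.0)


def ne_ratio_no_recombination(u, t):
    """Ne / N with L = 0 and negligible drift: Q'^2 = 1 / t^2, C^2 = U t."""
    return ne_ratio_from_q(1.0 / t ** 2, u * t)


def ne_ratio_large_genome(u, t, genome_length):
    """Ne / N with negligible drift and an assumed Q'^2 = 2 / (t L), C^2 = U t."""
    return ne_ratio_from_q(2.0 / (t * genome_length), u * t)
```

Under these assumptions the selection coefficient t cancels in the large-genome case, so the prediction depends only on U/L, whereas without recombination it depends on U/2t.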
Favorable mutations model:
Assume that the selective values of the three genotypes at any selective locus j are 1, 1 + t, and 1 + 2t, respectively. The predictive equations previously shown do not hold if favorable mutations are not effectively neutral (i.e., t > 1/2Ne, after ![]()
q1 + q1(1 - q1)t and the proportional change in genetic variance due to selection is q2(1 - q2)/[q1(1 - q1)]. Drift also reduces the genetic variance by a factor 1 - 1/2Ne; therefore, $Z_1 = [q_2(1 - q_2)/q_1(1 - q_1)](1 - 1/2N_e)$. Here we consider only the effect of the gamete associated with the neutral gene (i.e., we neglect Q''j), obtaining Q'j with the same argument leading to Equation 1a. If the frequency of recombination between the neutral gene and the selected locus j is r, and the covariance between them in the first generation is S'1, the expected changes in gene frequency in the following generations are
and so on. Zi is the proportional change in genetic variance from the initial generation to generation i due to selection and drift. Therefore, the value of Q'j given an initial frequency q1 for the selected locus when the neutral gene appears is (see Equation 1a)
where i represents the successive generations and $q_i$ are the sequential frequencies of the selective allele in the consecutive generations, obtained as $q_i \approx q_{i-1} + q_{i-1}(1 - q_{i-1})t$. Now, the neutral mutation may appear when the selected allele has any gene frequency (q1) in the range 0 to 1. Thus, the appropriate value of Q'j is the weighted mean of the Q'j(q1) values over all possible initial frequencies q1 in the range 0 to 1, the weight for each initial frequency being the product of its proportional contribution to the observed genetic variance and the probability of the selected allele being at that frequency. In a deterministic mutation-selection model, the probability of having a particular frequency, q, is proportional to 1/[q(1 - q)] (![]()
![]() |
(13) |
The appropriate Q'2 value of the neutral locus is the average value of the Q'2j values corresponding to all the selective loci j in the genome. This average value has to be substituted into Equation 5. Therefore, although it seems difficult to reach a simple algebraic equation to predict Ne for the model of favorable mutations, the principles previously shown can be applied to find numerical approximations.
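The numerical approach for favorable mutations can be sketched for a single initial frequency. The code below is an illustration, not the authors' procedure: it follows the deterministic recursion for the selected allele, takes Zi as the ratio of q(1 - q) after i generations to its initial value multiplied by (1 - 1/2Ne)^i, and accumulates the series 1 + Σ (1 - r)^i Zi. The weighting over initial frequencies and the averaging over loci described in the text are left out, and all names are hypothetical.

```python
def q_prime_favorable(q1, t, r, ne, max_generations=100_000, tol=1e-12):
    """Cumulative term Q'_j(q1) for a favorable allele starting at frequency q1.

    q1 must lie strictly between 0 and 1. The selected allele follows
    q_i = q_{i-1} + q_{i-1} (1 - q_{i-1}) t, and the association decays
    through recombination (1 - r), drift (1 - 1/2Ne) and the change in
    q (1 - q) relative to its initial value.
    """
    initial_var = q1 * (1.0 - q1)
    q = q1
    total = 1.0    # first-generation term
    linkage = 1.0  # running value of (1 - r)^i
    drift = 1.0    # running value of (1 - 1/2Ne)^i
    for _ in range(max_generations):
        q = q + q * (1.0 - q) * t
        linkage *= 1.0 - r
        drift *= 1.0 - 1.0 / (2.0 * ne)
        term = linkage * drift * q * (1.0 - q) / initial_var
        total += term
        if term < tol:
            break
    return total
```

With t = 0 the allele frequency does not move and the sum collapses to the geometric form 1/[1 - (1 - r)(1 - 1/2Ne)], which gives a simple check on the loop.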
| ASYMPTOTIC EFFECTIVE SIZE, HETEROZYGOSITY, AND POLYMORPHISM |
|---|
The parameter Ne that we have derived is the asymptotic effective population size. If a neutral allele appears in the population at a given generation by mutation, the drift process will be initially weak on it, but random associations with selected genes will accumulate over generations making drift increase until an asymptotic value is reached. In the first generation, the magnitude of the drift process on the neutral allele can be quantified by the variance in allele frequency in the first generation. Although this refers only to the particular neutral genes that appeared one generation ago, we refer to it as the effective population size in the first generation,
![]() |
(14) |
where $Q'_1$ is the average of all the $Q'_{j1} = (1/S'_1)\sum_{i=1}^{1} S'_i = 1$, so that $Q'_1 = 1$. The value of $Q''_1$ is also 1. As stated before, the asymptotic value of Q'' is less than or equal to 2, and after a few generations it will be much smaller than Q' under linkage, so we can ignore it in Equation 14 and henceforth. The magnitude of the drift process on the neutral allele from generation 1 to 2 is analogously quantified by the variance in allele frequency from generation 1 to 2, which we refer to as $N_{e,2}$. This is calculated using $Q'_2$, the average of all the $Q'_{j2}$ (see the accumulation of terms stated before), obtained as $Q'_{j2} = (1/S'_1)\sum_{i=1}^{2} S'_i = 1 + (1 - r)Z$, and
![]() |
(15) |
From generation 2 to 3, $N_{e,3}$ can be calculated using $Q'_3$, which is the average of all the $Q'_{j3}$, obtained as $Q'_{j3} = (1/S'_1)\sum_{i=1}^{3} S'_i = 1 + (1 - r)Z + (1 - r)^2 Z^2$, and so on up to infinity, $Q'_{j,\infty} = Q'_j$, when the asymptotic effective population size, $N_{e,\infty} = N_e$ (equations in the previous sections), is reached.
There is no simple solution for the Q' terms for consecutive generations (except for infinite generations; i.e., Equation 6). Therefore, numerical methods have to be applied to estimate the values of the partial effective sizes in consecutive generations. If genetic variance for fitness is not large, in the first generation Ne,1 is close to the census size N of the population. In the following generations the effective size drops toward its asymptotic value (see Figure 1). For a new neutral mutation, the decay of genetic variance is 1/2Ne,1 in the first generation, 1/2Ne,2 in the second generation, and so on. A consequence of this cumulative effect of drift on new mutations is that there is not a simple formula to connect asymptotic population size, heterozygosity, and proportion of segregating sites for neutral alleles, as we address next.
Heterozygosity:
Under the infinite sites model, the heterozygosity contributed by a new mutation (i.e., with frequency 1/2N) is $2(1/2N)(1 - 1/2N) \approx 1/N$. Then, with a mutation rate µ per locus and generation, the number of new mutations per generation is 2Nµ, and the input of heterozygosity per generation is about 2µ. The neutral variability generated by these mutations decreases at an increasing rate, which is a function of the consecutive values of Ne,i, so the remaining proportion after $\tau$ generations is $R_\tau = \prod_{i=1}^{\tau}(1 - 1/2N_{e,i})$. Therefore, the expected heterozygosity at equilibrium ($H$) is the sum of the contributions by mutations during all the previous generations,
![]() |
(16) |
If there is no selection, or selection acts on a noninherited trait, there is a single value of Ne for the consecutive generations. Thus, the proportion remaining after $\tau$ generations is $R_\tau = (1 - 1/2N_e)^\tau$, and substituting this into Equation 16 gives the equilibrium heterozygosity $H = 4N_e\mu$, as expected (CROW and KIMURA 1970, p. 323). Furthermore, when selection is on an inherited trait and the selective effects are large, the consecutive values of Ne,i decay very quickly, reaching values close to the asymptotic effective size, Ne, in a few generations. Under this condition, heterozygosity is again well approximated by $H = 4N_e\mu$. Otherwise, this equation underestimates heterozygosity because the effective size associated with a mutation is larger than the asymptotic Ne for a long period of time. This is illustrated in Figure 1, which shows the expected heterozygosity for consecutive generations of a neutral allele starting with a single copy in the initial generation. It is observed that the heterozygosity has a rate of reduction lower than that of Ne,i. However, the degree of disassociation between heterozygosity and the asymptotic Ne is much smaller than that between the proportion of segregating sites and the asymptotic Ne, as we explain next.
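The summation over past generations is easy to carry out numerically once a sequence of partial effective sizes is available. The sketch below assumes that Equation 16 amounts to 2µ times the sum, over all past generations starting with the current one, of the proportion of variability that remains; this reading is inferred from the text, and the function and argument names are illustrative. The first few partial sizes are supplied explicitly and every later generation is assumed to have the asymptotic Ne.

```python
def equilibrium_heterozygosity(mu, partial_ne, asymptotic_ne):
    """Equilibrium heterozygosity from a sequence of partial effective sizes.

    partial_ne lists Ne,1, Ne,2, ... for the first generations after a
    mutation arises; all later generations use asymptotic_ne.
    """
    remaining = 1.0  # proportion remaining for mutations of the current generation
    total = 1.0      # running sum of remaining proportions
    for ne_i in partial_ne:
        remaining *= 1.0 - 1.0 / (2.0 * ne_i)
        total += remaining
    # Closed-form geometric tail: remaining * sum_{k>=1} (1 - 1/2Ne)^k.
    total += remaining * (2.0 * asymptotic_ne - 1.0)
    return 2.0 * mu * total
```

When every partial size equals the asymptotic one, the function returns 4Neµ; when the early sizes are larger, as expected under selection on linked loci, it returns a larger value, in line with the statement that 4Neµ underestimates heterozygosity.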
Proportion of segregating sites:
Under an infinite sites model, the proportion of segregating sites increases by 2Nµ, the number of new mutations per generation. The equilibrium proportion of segregating sites, s, can be obtained by calculating the probability that mutants appearing in previous generations are still segregating in the current one. Looking backward in time, the remaining fraction of the segregating sites produced $\tau$ generations ago is a function of the magnitude of the drift process until the current generation. As we have seen, this magnitude is represented by the partial $N_{e,i}$ values from generation 1 to generation $\tau$, and it can be summarized by the harmonic mean $N_{e,H\tau}$ of these $\tau$ values, i.e., $1/N_{e,H\tau} = (1/\tau)\sum_{i=1}^{\tau}(1/N_{e,i})$. Thus, the probability of segregation in the current generation of mutations that appeared $\tau$ generations ago ($P_\tau$) can be approximated by
![]() |
(17) |
(![]()
in the long term, say for $\tau > N_{e,H\tau}$. Therefore, in practice we utilize this equation until the difference for two consecutive generations, $P_\tau - P_{\tau+1}$, is smaller than the expected asymptotic rate of decay $1/2N_e$. After that, the recursive equation $P_{\tau+1} = P_\tau(1 - 1/2N_e)$ is used. The proportion of segregating sites s can be computed as the sum of the remaining contributions from all the previous generations,
![]() |
(18) |
The proportion of segregating sites is generally much more dependent on N than on Ne because only a small proportion of new mutations segregate for a long period. For example, if there is no selection, the s value for the whole population is approximately 4Nµ ln 2N (see ![]()
=
, which is a relatively large proportion. For example, for a population of N = 100, about 10% of the segregating sites are lost by drift every generation. This is also illustrated in Figure 1. The probability of segregation of an initially single-copy neutral allele has most of its reduction in the initial generations. Given this large rate of loss of polymorphic loci per generation, it is clear that the proportion of segregating sites is very dependent on the mutations arising few generations ago and, therefore, the Ne,i values of the initial generations have much influence. Because these initial Ne,i values are closer to the census size N than to the asymptotic effective size, Ne, the proportion of segregating sites in the whole population is only slightly dependent on Ne. On the contrary, the rate of loss of heterozygosity per generation is relatively small (1/2N with no selection). For example, for a population of size N = 100, it is only 0.5% per generation. Therefore, the heterozygosity is more dependent on the asymptotic effective population size, Ne. The above arguments indicate that if the asymptotic Ne is much smaller than the census size N, heterozygosity will be more affected by selection than the proportion of segregating sites because the latter depends strongly on N. This dependence of the asymptotic reduction of s,
heterozygosity, and Ne,i on population size predicted under a model of deleterious mutations is shown in Figure 2. The larger the population size, the stronger the selection, as the mutant effects are assumed to be constant (t = 0.01). Reductions of Ne,i and heterozygosity are close and tend to be equal with large N (strong selection), as previously noted by ![]()
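The contrast between the two rates of loss quoted for N = 100 can be reproduced with a short calculation. This sketch assumes a neutral population at equilibrium, so that the fraction of segregating sites lost per generation equals the input 2Nµ divided by the standing value 4Nµ ln 2N; that ratio is an inference from the figures in the text, not an expression quoted from the article, and the function names are illustrative.

```python
import math


def segregating_site_loss_rate(n):
    """Fraction of segregating sites lost per generation without selection.

    Assumes equilibrium: input 2*N*mu balances loss from a standing
    proportion of about 4*N*mu*ln(2N), giving 1 / (2 ln 2N).
    """
    return 1.0 / (2.0 * math.log(2.0 * n))


def heterozygosity_loss_rate(n):
    """Fraction of heterozygosity lost per generation without selection, 1/2N."""
    return 1.0 / (2.0 * n)
```

For N = 100 the first function gives a little under 10% and the second gives 0.5%, so polymorphism is dominated by recent mutations while heterozygosity integrates over a much longer history.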
Allele frequency spectrum:
The application of the classic theory of Ne provides methods to predict the spectrum of frequencies of neutral genes a number of generations after their appearance (e.g., ![]()
of the partial $N_{e,i}$ values for the $\tau$ generations is used as the constant effective population size for those $\tau$ generations. The distribution of neutral gene frequencies in the population can then be computed as a combination of distributions for neutral mutations that appeared in the current generation, one generation ago, two generations ago, etc., up to infinity. An illustration of this is given in the next section.
| EVALUATION OF RESULTS |
|---|
The above predictions and equations were checked by Monte Carlo simulations. Random mating populations with N diploid individuals were simulated. The selective system was controlled by n loci evenly distributed in linear chromosomes. A further n neutral loci were placed alternating with the selected loci. The population was initially run for thousands of generations so that the selective system could reach mutation-selection-drift equilibrium. Thereafter, two different sets of runs were carried out according to the objective. In the simulations used to evaluate Ne, alleles from each neutral locus were initially set at frequency 0.5. The population was then simulated for 100–300 generations until the asymptotic effective size was clearly reached. Fifty additional generations were run. At least 200 independent replicates of this process were simulated. The variance (Vari) of the frequency of the neutral genes was computed for each generation i over loci and replicates. The effective population size at a given generation i was computed as Ne,i =
. The observed asymptotic Ne value was computed as the average of the Ne,i values of the 50 additional generations. A different set of simulations was run to evaluate the heterozygosity and the segregation of polymorphic loci. In this case, the neutral genes were introduced as mutants, and the population was run until the equilibrium heterozygosity and polymorphism was reached. The selective value of an individual was calculated as (1 + t)k for the model of favorable mutations and (1 - t)k for the model of detrimental mutations, where k is the number of mutants carried by the individual. Every generation the mean fitness of the population was set to 1, and the variance of relative fitnesses of individuals (C 2) was computed.
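The idea of estimating an effective size from the variance of neutral allele frequencies over replicates can be illustrated in the simplest possible setting. The sketch below is not the simulation program used in the article: it has no selected loci and no linkage, it runs a single generation of Wright-Fisher sampling, and it uses the standard variance effective size p(1 - p)/(2 Var(Δp)), which may differ from the expression the authors applied. All names and default values are illustrative.

```python
import random


def next_frequency(p, n, rng):
    """One generation of neutral sampling of 2N gametes from frequency p."""
    copies = sum(1 for _ in range(2 * n) if rng.random() < p)
    return copies / (2 * n)


def estimate_ne(n, p0=0.5, replicates=2000, seed=1):
    """Variance effective size from the one-generation change in frequency.

    Each replicate starts at frequency p0; the sample variance of the
    change over replicates is converted to Ne = p0 (1 - p0) / (2 Var).
    """
    rng = random.Random(seed)
    changes = [next_frequency(p0, n, rng) - p0 for _ in range(replicates)]
    mean = sum(changes) / replicates
    var = sum((d - mean) ** 2 for d in changes) / (replicates - 1)
    return p0 * (1.0 - p0) / (2.0 * var)
```

In this neutral case the estimate should fall close to the census size N; adding inherited fitness differences at linked loci is what pulls the estimate below N over successive generations.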
Table 2 shows some simulations of asymptotic values of Ne, heterozygosity (H), and s. Predictions were generally close to simulations. As was explained before, the effective size is progressively reduced over generations until the asymptotic value is reached. A comparison with simulations is made in Figure 3. Predictions of the equilibrium heterozygosity and proportion of segregating sites in Table 2 were made from these values of Ne,i in consecutive generations as explained above. As expected, the absolute reduction of Ne is generally greater than the reduction of heterozygosity and polymorphism (cf. Figure 2) because H and s depend not only on the asymptotic Ne but also on nonasymptotic values, particularly s. A tendency of convergence between the ratios H/H0 and s/s0 with increasing population size is predicted, as noted by ![]()
Predictions are also accurate when mutations of unequal effects are considered. For example, simulations from ![]()
= 0.70 and ß = 0.032 give an average
/
0 of 0.67. The prediction from Equation 4a is 0.65, and that from the approximation (4b) (assuming no correlation between C2i and Q'2i ) is 0.67, suggesting that the shape of the distribution of effects is not very important for the effective population size.
Finally, Figure 4 represents an example of the agreement between observed and expected allele frequency spectra. The expected frequency distribution, under selection for the whole population of mutations originated $\tau$ generations ago, was obtained by using transition matrix methods. The partial $N_{e,i}$ values for generations 1 to $\tau$ were predicted, and the harmonic mean $N_{e,H\tau}$ of these was used as the constant effective size of mutations originated $\tau$ generations ago. Predictions (top line) were made by accumulating all the expected distributions for neutral mutations originated in all the previous generations and in the current one. Simulations (boxes) were very close to these predictions. The bottom line shows the distribution that would have been predicted under a pure neutral model without selection. This was calculated assuming the constant effective population size that explains the observed level of heterozygosity in the population.
| DISCUSSION |
|---|
The fundamental concept in our analysis is that the parameter Ne, which summarizes the magnitude of the drift process in a genomic region or in the whole genome, is a function of the rate of reduction of the covariance between the neutral genes and the selected system. This reduction depends on three factors: the genetic size of the genome (i.e., the recombination rate), the change of variance of the selected loci due to selection, and the reduction of variance due to drift. At equilibrium, the total rate of reduction is
$1 - Z = 1/2N_e + t$ for models in which this rate is independent of the gene frequencies (i.e., infinitesimal model or deleterious mutations model). When the effects of the selected loci on fitness are large in relation to Ne, say $t \gg 1/2N_e$, the relative influence of genetic drift is small and predictions become independent of Ne. In this case, there is full agreement with equations from ![]()
![]()
![]()
Our predictions of Ne can be made in terms of compound parameters, such as the variance for fitness, C 2, and the new input of mutational variance, Vm, but not necessarily on mutation rates and mutational effects of spontaneous mutations, whose magnitudes are in a current debate (e.g., ![]()
![]()
![]()
= 0.02, C2 = 0.01, and L = 1.25 into Equation 8 and Equation 9, we obtain Ne = 0.67N, which is a considerable reduction in effective size due to inherited differences in viability alone.
A key requirement for our model is a continuous flux of genetic variation for fitness in all chromosome regions. Mutation introduces new variation at neutral sites while selection reduces the genetic variability. This requirement is far removed from the strong selective sweep model assumed by ![]()
![]()
/
0) is 0.031, 0.084, and 0.246 when a single selective locus is segregating all the time, one-third of the time, or one-twelfth of the time, respectively. The corresponding predictions obtained with our method are 0.028, 0.033, and 0.041, respectively. Thus, the latter two cases, which include periods of recovery of polymorphism, deviate from the assumptions of our model, and the predictions become progressively less accurate.
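The size of the discrepancy can be made explicit with a few lines of arithmetic. The sketch below simply takes the three pairs of values quoted in the text and reports the ratio of our prediction to the reference value; the variable names are illustrative only.

```python
# Values quoted in the text for a single selective locus segregating
# all the time, one-third of the time, and one-twelfth of the time.
labels = ["all the time", "one-third", "one-twelfth"]
reference = [0.031, 0.084, 0.246]   # values under the sweep-type scenarios
predicted = [0.028, 0.033, 0.041]   # values obtained with the present method

# Ratio of prediction to reference; 1.0 would mean perfect agreement.
ratios = [p / r for p, r in zip(predicted, reference)]

for label, ratio in zip(labels, ratios):
    print(f"{label:>12}: predicted / reference = {ratio:.2f}")
```

The ratio is close to 1 only when the selective locus segregates continuously, which is the situation the model assumes.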
Our derivation follows the arguments of ![]()
![]()
, is obtained, as deduced by ![]()
![]()
![]()
![]()
. ![]()
![]()
![]()
![]()
The reason for the confusion is clear from the derivation in this article. ![]()
![]()
1), Q'
, and from Equation 5, Ne =
, which agrees with BARTON's expression. If now r = 0.5, the above expression yields Ne =
. However, neglecting Q''2 is allowed only for moderate or strong linkage because only for r
0.5 is Q'2
Q''2 . The intuitive explanation is that with tight linkage, the fitness associated with the neutral allele depends mostly on that of the gamete carrying it and Q'2
Q''2 . However, for very loose linkage or no linkage, the fitness of the homologous gamete is also important: Q'2
Q''2
4 , and Ne =
.
To reduce the complexity of the derivation, we have assumed that the recombination rate is constant across the chromosome and that the neutral gene is located in the middle of the chromosome. An equivalent derivation can also be developed for a neutral gene at any location. The location of the neutral gene makes little difference unless the gene lies close to a chromosome tip. In this region, the effect of drift is smaller because closely linked selected genes can only (or mainly) be located on one side of the neutral gene, reducing the random associations with selected genes to as little as one-half. As the regions at both tips are very small, their weight in the average Ne value for the whole genome is negligible, and the result for the central location is a very good approximation to the average Ne. This effect indicates that Ne is mainly determined by the strength of selection acting on the region closely linked to the neutral locus. An equivalent conclusion has been reached by ![]()
Regional variations in the frequency of recombination are often observed (see ![]()
![]()
![]()
The reduction of effective size associated with a neutral gene is progressive: The magnitude of the drift process is smaller for new neutral mutations than for old ones, and this process accumulates on neutral genes over generations until an asymptotic value is reached. The consequence is that heterozygosity will always be larger than that expected if all the neutral genes in the population had a constant effective size equal to the asymptotic value (Ne) and, therefore, cannot be formally predicted by the simple expression 4Neµ. The magnitude of the underprediction depends on how quickly the asymptotic Ne value is reached (see Figure 3). For a given genome size or recombination rate, this depends on the rate of reduction of the genetic variance. Under the assumptions of the infinitesimal model, if selection is weak, the reduction of the variance in the selected system will be mainly due to drift, the reduction of effective size will be slow, and the difference between the real heterozygosity and that expected from the asymptotic Ne will be largest. As the effect t of selected genes becomes larger, the rate of reduction increases and the asymptotic Ne is reached earlier. In a model of deleterious mutations of large effect (the background selection model), heterozygosity tends to be almost equal to 4Neµ because mutations reach their asymptotic value of Ne in a few generations. ![]()
/
0), which is identical to our prediction of the asymptotic Ne/N when drift is not considered. This prediction becomes inexact as the population size or the effect of selected genes decreases (![]()
![]()
![]()
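The effect of a progressive reduction of Ne on equilibrium heterozygosity can be illustrated numerically. The following is a minimal sketch, not the derivation used in this article: it assumes that each generation contributes 2µ of new heterozygosity, that this contribution then decays by a factor (1 - 1/2Ne,i) in the i-th generation of the mutation's life, and that Ne,i approaches its asymptote geometrically. The function names, the geometric trajectory, and all parameter values are hypothetical.

```python
def hypothetical_partial_ne(i, n, ne_asymptotic, decay):
    """Assumed Ne,i trajectory: starts below N and approaches the
    asymptotic Ne geometrically (illustration only)."""
    return ne_asymptotic + (n - ne_asymptotic) * decay ** i


def equilibrium_heterozygosity(mu, n, ne_asymptotic, decay, max_age=40000):
    """Sum, over mutation ages, of the new heterozygosity (2*mu per
    generation) that is still present after drift along the lineage."""
    total = 0.0
    surviving = 1.0  # fraction of heterozygosity left for this age class
    for age in range(max_age):
        total += 2.0 * mu * surviving
        ne_i = hypothetical_partial_ne(age + 1, n, ne_asymptotic, decay)
        surviving *= 1.0 - 1.0 / (2.0 * ne_i)
    return total


mu, n, ne_asym = 1e-6, 1000, 670
h_progressive = equilibrium_heterozygosity(mu, n, ne_asym, decay=0.99)
h_asymptotic = 4 * ne_asym * mu   # simple prediction from the asymptotic Ne
h_neutral = 4 * n * mu            # prediction ignoring selection

print(f"4*Ne*mu (asymptotic Ne): {h_asymptotic:.6e}")
print(f"progressive Ne:          {h_progressive:.6e}")
print(f"4*N*mu (no selection):   {h_neutral:.6e}")
```

Under these assumptions the accumulated heterozygosity lies between the two simple predictions, and the excess over 4Neµ shrinks as the assumed trajectory reaches its asymptote faster (smaller `decay`).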
The progressive reduction of the effective size associated with mutations can also explain the apparent disconnection between heterozygosity and number of segregating sites under selection, which is the basis of statistical tests of neutrality (e.g., ![]()
![]()
The spectrum of gene frequencies can be approximated from the evolution of Ne associated with mutations over generations. For mutations originated
generations before the current generation, the magnitude of the drift process can be summarized by the harmonic mean (Ne,H
) of the Ne,i values from generation i = 1 to
. The remaining proportion of heterozygosity can be predicted by (1 - 1/2Ne,H
)
. Analogously, the proportion of segregating sites can be approximated from Ne,H
using the general theory of the effective population size (Equation 17 and Equation 18). In other words, the spectrum of gene frequencies for mutations originated
generations ago is approximately that expected under a neutral model using the appropriate Ne,H
. Deviations from the pure neutral spectrum arise when the contributions of all previous generations are accumulated. Different spectra corresponding to different Ne,H
values of previous generations (from
= 1 to
) are superimposed, one over the others, building a general spectrum that cannot be explained by a single Ne value under a neutral model (see Figure 4).
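The use of a harmonic mean to summarize drift along a lineage can be checked with a short numerical sketch. The code below assumes a purely illustrative trajectory of partial Ne,i values (geometric approach to an asymptote; this is not the article's predictive equation) and assumes that the exponent in the expression for the remaining heterozygosity is the number of generations elapsed. It compares that expression with the exact generation-by-generation product of (1 - 1/2Ne,i).

```python
def harmonic_mean(values):
    """Harmonic mean of a sequence of positive numbers."""
    return len(values) / sum(1.0 / v for v in values)


def hypothetical_partial_ne(n, ne_asymptotic, decay, generations):
    """Illustrative Ne,i values for i = 1..generations (an assumption):
    they start below N and approach the asymptotic Ne geometrically."""
    return [ne_asymptotic + (n - ne_asymptotic) * decay ** i
            for i in range(1, generations + 1)]


def remaining_heterozygosity_harmonic(ne_values):
    """(1 - 1/(2*Ne,H))**t, with t the number of generations."""
    ne_h = harmonic_mean(ne_values)
    return (1.0 - 1.0 / (2.0 * ne_h)) ** len(ne_values)


def remaining_heterozygosity_exact(ne_values):
    """Product over generations of (1 - 1/(2*Ne,i))."""
    remaining = 1.0
    for ne in ne_values:
        remaining *= 1.0 - 1.0 / (2.0 * ne)
    return remaining


ne_values = hypothetical_partial_ne(n=1000, ne_asymptotic=670,
                                    decay=0.9, generations=50)
ne_h = harmonic_mean(ne_values)
approx = remaining_heterozygosity_harmonic(ne_values)
exact = remaining_heterozygosity_exact(ne_values)

print(f"harmonic mean Ne,H: {ne_h:.1f}")
print(f"remaining heterozygosity (harmonic mean): {approx:.6f}")
print(f"remaining heterozygosity (exact product): {exact:.6f}")
```

For population sizes of this order the two quantities are nearly indistinguishable, which is why a single Ne,H can stand in for the whole sequence of Ne,i values of a given age class.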
When statistical tests are applied to compare predicted and observed spectra of gene frequencies, the finite size of the samples can make deviations from the neutral model difficult to detect. Observations in natural populations of Drosophila show reduced diversity in regions with low recombination rates (![]()
![]()
![]()
![]()
![]()
![]()
Finally, some remarks concerning artificial selection can be made. In the general theory of quantitative traits, linkage is usually ignored because farm species generally have several chromosomes, suggesting that the assumption of free recombination is close to reality. Additionally, linkage makes the analytical model more cumbersome: Additive models are complicated by the generation of negative covariances between genes affecting fitness (![]()
![]()
, its application to a model in which parents are selected individuals and the genetic values of both "gametes" are negatively correlated is not straightforward. Further insight into these models is necessary to assess the impact of linkage in artificial selection programs.
| ACKNOWLEDGMENTS |
|---|
We thank B. CHARLESWORTH, W. G. HILL, and N. BARTON for helpful comments. This work was supported by grant PB95-0909-C02-02 from Ministerio de Educación y Cultura (Spain) to E.S.























