- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Tsitrone, A.
- Articles by David, P.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Tsitrone, A.
- Articles by David, P.
Heterosis, Marker Mutational Processes and Population Inbreeding History
Anne Tsitronea, François Roussetb, and Patrice Davidaa Centre d'Ecologie Fonctionnelle et Evolutive, Centre National de la Recherche Scientifique, 34293 Montpellier Cedex 5, France
b Institut des Sciences de l'Evolution de Montpellier, Université Montpellier II, 34095 Montpellier, France
Corresponding author: Anne Tsitrone, CEFE-CNRS, 1919 Rte. de Mende, 34293 Montpellier Cedex 5, France., tsitrone{at}cefe.cnrs-mop.fr (E-mail)
Communicating editor: D. CHARLESWORTH
| ABSTRACT |
|---|
Genotype-fitness correlations (GFC) have previously been studied using allozyme markers and have often focused on short-term processes such as recent inbreeding. Thus, models of GFC usually neglect marker mutation and only use heterozygosity as a genotypic index. Recently, GFC have also been reported (i) with DNA markers such as microsatellites, characterized by high mutation rates and specific mutational processes and (ii) using new individual genotypic indices assumed to be more precise than heterozygosity. The aim of this article is to evaluate the theoretical impact of marker mutation on GFC. We model GFC due to short-term processes generated by the current breeding system (partial selfing) and to long-term processes generated by past population history (hybridization). Various mutation rates and mutation models corresponding to different kinds of molecular markers are considered. Heterozygosity is compared to other genotypic indices designed for specific marker types. Highly mutable markers (such as microsatellites) are particularly suitable for the detection of GFC that evolve in relation to short-term processes, whereas GFC due to long-term processes are best observed with intermediate mutation rates. Irrespective of the marker type and population scenario, heterozygosity usually provides higher correlations than other genotypic indices under most biologically plausible conditions.
THE existence of correlations between individual genotypes at marker loci and fitness-related traits has caused much debate among evolutionary biologists. Such correlations were initially used as an argument in favor of selection acting on the maintenance of allozyme polymorphisms in the controversy that has historically opposed selectionists and neutralists (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
The observation of significant HFC with noncoding DNA markers makes it clear that at least some of the correlations are not due to direct effects of the marker genes on the phenotype. Associative overdominance refers to any kind of HFC not due to a direct effect of marker genes, but to genetic associations between the markers and fitness genes (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Previous models of associative overdominance implicitly neglect marker mutation, because they focus on short-term inbreeding and on markers with low mutation rates (e.g, allozymes). Now that highly mutable markers (i.e., microsatellites) and long-term scenarios are included in HFC studies, a theoretical assessment of the importance of mutation is needed. This approach was implicitly followed by ![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
In this article we assess the significance of these arguments using a theoretical approach. The influence of marker mutation on genotype-fitness correlations (GFC) due to inbreeding is investigated by comparing (i) different mutation models and mutation rates, (ii) different population inbreeding histories involving various time scales, and (iii) different genotypic indices related to the mutational processes of the marker.
| THEORY |
|---|
The model:
The rationale of the model is as follows: Population history can generate variance in inbreeding among individuals, depending on individual pedigrees. The population can thus be partitioned into "inbreeding classes." First, inbreeding levels are associated with genotypes at neutral markers. The latter can be obtained from identity-in-state (IIS) relationships among marker alleles for a given marker mutation model and population scenario. Second, in the presence of inbreeding depression, the inbreeding level determines the value of the fitness trait. Thus, individual phenotype and genotype correlate through individual variation in the inbreeding level. For each mutation model, genotypic index, and population scenario assumed, the correlation coefficient
(X, W) between X (a given index) and the fitness trait W is derived. As
![]() |
(1) |
three moments must be computed: the covariance of the fitness trait and X, cov(W, X), and the variances
2(W) and
2(X). Below, we describe how analytical expressions for these moments can be obtained. These computations rely on IIS probabilities that depend on the mutation model assumed. Mutation models for the marker genes and corresponding indices are described in the next section.
Mutation models and individual genotypic indices:
In a second step we develop theoretical models that approximate mutational processes at DNA markers, such as microsatellites, RFLP markers, or neutral DNA sequences. Only the first two categories of markers have been used in GFC studies, but the third kind might also be used in the future.
Stepwise mutation model:
Under a strict stepwise mutation model (SMM; ![]()
![]()
![]()
![]()
K-alleles model:
Under a model first formulated by ![]()
![]()
![]()
![]()
![]()
Infinite-alleles and infinite-sites models:
Under the infinite-alleles model (IAM; ![]()
![]()
The partial selfing model and single-locus heterozygosity-fitness correlation:
We investigate a simple scenario in which HFC relates to the actual mating system of the population. This scenario examplifies short-term inbreeding. Consider a large population of size n at inbreeding equilibrium, with freely recombining loci and one marker locus with a given mutation model. A proportion S of offspring is produced by selfing at each generation, whereas a proportion 1 - S comes from outcrossing events. Each individual is characterized by the number of generations of selfing in its pedigree, k, starting from the most recent outcrossing event (0
k
). The population is thus partitioned into inbreeding classes Ck, 0
k
, each consisting of individuals having the same inbreeding level (i.e., the same k). Note that, under this model, there is an infinite number of inbreeding classes, although in practice only classes with low k are largely represented. In the absence of selection, inbreeding class k has frequency Pr(Ck) = (1 - S)Sk. Selection against homozygous genomes reduces the frequency of classes with high k. The effect of selection thus resembles a reduction in the selfing rate S to a lower value Ssel that can be computed as explained in DAVID 1999 (Appendix A). In practice, ![]()
Let us define the probabilities of IIS within a population: Q0 for a pair of genes drawn from the same individual and Q1 for genes from different individuals. In what follows, moments are expressed as functions of Q0 and Q1 that are derived in Appendix A under each mutation model. The inbreeding coefficient in class k, fk, is defined by
![]() |
(2) |
where Q|k is the probability of identity of genes in an individual of class Ck. Under a general inbreeding load model assuming additive effects of fitness genes (![]()
![]() |
(3) |
where W0 is the value of the fitness trait for outbred individuals (i.e., belonging to C0); ß is the inbreeding load (i.e., the fitness reduction among completely inbred individuals as compared to outbred individuals); and
is a random effect with mean 0 and variance
2
, assumed to be independent of k. Moments of the fitness trait W are, based on Equation 3,
![]() |
(4a) |
![]() |
(4b) |
where E(f) and
2(f) are the mean and variance of fk over all k-values.
Moments of heterozygosity H are simply expressed as functions of probabilities of IIS:
![]() |
(5a) |
![]() |
(5b) |
Within a given inbreeding class Ck, heterozygosity is not correlated with the fitness trait so that, conditioning on the inbreeding class k,
![]() |
(6a) |
where E(H|Ck) = 1 - Q|k, and E(W|Ck) = W0 - ßfk. First- and second-order moments of fk and Q|k are needed to compute Equation 4aEquation 4b. Neglecting mutation since the most recent outcrossing event, 1 - Q|k
, so that fk = 1 - (
)k. This yields

and

where FIS is the classical F-statistic (![]()
![]()
![]() |
(6b) |
Note that cov(H, W) is null for S = 0 and S = 1, since there is then no variance in the inbreeding level among individuals. Equation 4b, Equation 5b, and Equation 6b can now be used to derive the correlation coefficient between heterozygosity and the fitness trait from (1). The maximum correlation coefficient (
max), assuming no within-inbreeding class variance for the fitness trait (
2
= 0 in Equation 3), is
![]() |
(7a) |
2max does not depend on the inbreeding load ß, but only on the marker genetic diversity 1 - Q1 (see Appendix A for an analytical expression of Q1 under each mutation model) and on the selfing rate S. In the presence of within-pedigree variance for the fitness trait (
2
> 0),
2 is
![]() |
(7b) |
Extension of the partial selfing model to other genotypic indices:
We first focus on the squared difference in repeat units for a microsatellite marker under the stepwise mutation model of mutation. Cov(d2, W) is derived as in (6a), replacing H by d2. Moments of d2 are obtained as detailed in Appendix A (Equation A7). This yields

(8) and thus

(9) and
2 is as in (7b).
We then focus on another genotypic index: the number of nucleotide differences for a neutral sequence under the infinite-sites model of mutation. The correlation coefficient between p and W under the ISM is computed similarly to the correlation coefficient between d2 and W. Moments of p are derived in Appendix A (Equation A5) and Cov(p, W) is as in (8). We obtain
![]() |
(10) |
and
2 is as in (7b).
The population admixture model (hybridization):
We next consider a simple situation where the HFC is related to long-term processes. We assume a large, random mating population of size N at mutation-selection equilibrium, which splits into two randomly mating finite subpopulations of size n << N. Let the two subpopulations diverge for a long time
(assumed to be equal to N generations for the sake of simplicity) without any gene flow between them and then merge into a single, infinite, panmictic population. Given enough divergence time, different deleterious mutations inherited from the ancestral population will reach different frequencies in each subpopulation. We investigate the genotype-phenotype relationship in the resulting mixed population. Each individual is characterized by the probability x that the two alleles it has at a given locus originate from the same subpopulation, depending on the number g of panmictic generations following the admixture. The quantity 1 - x can be interpreted as the individual "degree of outbreeding," i.e., the "disparity between the genome of the two parents" (![]()
Just after the admixture (g = 1) the correlation is due to the coexistence of two inbreeding classes, Cw and Cb, with equal probabilities 1/2. Cw (w stands for "within") is composed of individuals with both parents originating from the same subpopulation (x = 1) while Cb individuals (b stands for "between") have one parent in each of the two subpopulations (x = 0). Cw individuals are expected to have lower fitness than Cb individuals as they are more likely to be homozygous for their deleterious mutations. They also have more similar alleles at a neutral marker locus than Cb individuals.
Each generation after the contact, new inbreeding classes Cx appear as a consequence of recombination. At generation g after the contact, assuming free recombination for the sake of simplicity,
![]() |
(11a) |
![]() |
(11b) |
Let Qw and Qb be the probability of identity-in-state of genes from Cw and from Cb, respectively. fx, the inbreeding coefficient of class x, is defined as
![]() |
(12) |
Neglecting mutation since admixture, Q|x can be expressed as xQw + (1 - x) Qb so that
![]() |
(13) |
where FST is the classical F-statistic (![]()
![]()
![]() |
(14) |
Here W0 is the value of the fitness trait for Cb individuals (i.e., with x = 0) and the inbreeding load ß, the difference in fitness for Cb and Cw individuals, measures heterosis between the two subpopulations before the mixture. The random effect
with mean 0 and variance
2
is assumed independent of x. We now derive the correlation coefficient between heterozygosity and the fitness trait. Moments of the fitness trait W are derived on the basis of (11) and (14):
![]() |
(15a) |
![]() |
(15b) |
Moments of heterozygosity H are simply expressed as functions of probabilities of IIS:
![]() |
(16a) |
![]() |
(16b) |
The covariance term is calculated using the partition into inbreeding classes,

(17a) which yields
![]() |
(17b) |
The maximum correlation coefficient is
![]() |
(18a) |
At the first generation after the mixture (g = 1),
![]() |
(18b) |
The coefficient
2max depends only on the ISS probabilities Qb and Qw (derived in Appendix B) but not on the inbreeding load, just as in the partial selfing model. In the presence of within-pedigree variance for the fitness trait,
2 is as in (7b), replacing
2(fk) by
2(x).
The model can be generalized to other cases, as for the partial selfing model. A summary of the analytical expressions for the squared correlation coefficients obtained under both population models is provided in Table 1.
|
Numerical parameters for the models:
Analytical results derived in the previous sections were explored numerically using Mathematica 3.0 programs (![]()
Mutation rates: To analyze the impact of mutation on heterozygosity-fitness correlations with respect to different mutation models, a large range of mutation rates is investigated, from 10-9 to 10-2 per locus and per generation. Mutation rates ranging from 10-6 to 10-2 are investigated to compare d2 and H under the SMM, as this is the accepted range for a microsatellite locus. Assuming a substitution rate equal to 10-9 per site per generation, total mutation rates ranging from 10-9 to 10-6 correspond to sequences of realistic length (few to several hundreds of base pairs). These mutation rates were thus considered when comparing p and H under the ISM.
Population parameters:
Under the partial selfing model the standard situation modeled is a large population (n = 103 individuals) with intermediate selfing rate (S = 0.4). The effect of a change in the population size (from 103 to 106) and the selfing rate (from 0 to 1) is analyzed. Under the admixture model the standard assumptions are small subpopulations (n = 102) descended from a large ancestor population (N = 104) after a long divergence time (
= 104 generations). The effect of the subpopulation size (from 102 to 104) and the joint influence of the ancestor population size and the divergence time in generations (assumed to be equal for the sake of simplicity and ranging from 104 to 106) are investigated. The inbreeding load does not need to be estimated here as we focus on maximum correlation coefficients, which do not depend on this parameter (see Table 1).
| RESULTS |
|---|
We first focus on the effects of the mutation models and mutation rates on the heterozygosity-fitness correlation for both population scenarios, evaluating the effect of population parameters. We then analyze the influence of genotypic indices on the genotype-fitness correlation obtained. Finally the form of the relation between the genotype and the fitness (the expected value of the fitness trait as a function of d2) is discussed under the SMM.
Impact of marker mutation on the heterozygosity-fitness correlation:
Under the partial selfing model, the maximal correlation coefficient increases with the marker mutation rate for all mutation models studied (Fig 1A). This is not surprising since this correlation is an increasing function of marker gene diversity [(1 - Q1) in Equation 7a], which itself increases with the mutation rate irrespective of the mutation model (see Appendix A). For low mutation rates (<10-5), the correlation coefficient is never higher than a few percent, in agreement with most experimental data (![]()
|
A different picture is obtained under the admixture model, as the maximum correlation coefficient exhibits a maximum for an intermediate mutation rate (Fig 1B). This "optimal mutation rate" is of the order of 1/
(the divergence time; Fig 1B and data not shown). The correlation coefficient obtained under the admixture model may be much higher than that obtained under the partial selfing model. Under the partial selfing model the correlation is limited by the fact that all genotypes, be they homozygotes or heterozygotes, are represented in all inbreeding classes although the frequency of heterozygotes is halved in each generation of selfing. In contrast, under the admixture model, parameter values can be found such that all homozygotes are in class Cw and all heterozygotes are in class Cb, which allows the maximum correlation (in the absence of environmental variance in fitness) to approach unity. However, assuming freely recombining loci, this maximum correlation is halved each generation following the admixture (Equation 18aEquation 18b). Furthermore, in natural populations, within-inbreeding class variance for the fitness trait may weaken the real correlation [Equation 7b, replacing
2(fk) by
2(x)]. Mutation models rank in the same order as those under the partial selfing model with regard to the correlation coefficient, except with very high mutation rates (>10-3) for which the SMM provides a stronger correlation than the IAM. Ultimately, for both population models studied, marker mutation influences HFC primarily through its effect on marker diversities: (1 - Q1) for the partial selfing model (Equation 7a) and (1 - Qw) and (1 - Qb) for the admixture model (Equation 18aEquation 18b).
Impact of population parameters on the heterozygosity-fitness correlation:
Under the partial selfing model, the correlation coefficient increases with the population size and the selfing rate, until S is very close to 1 (Fig 2), whatever the mutation rate and model (data not shown). ![]()
![]()
![]()
![]()

> 0 in Equation 4b) the correlation is weakened and becomes an increasing function of the inbreeding load (Equation 7b).
|
In the admixture model, the correlation coefficient decreases with the subpopulation size, particularly when the mutation rate is high (Table 2). The correlation increases when the ancestral population size and the divergence time increase simultaneously, particularly for low mutation rates. The effect of the inbreeding load ß is similar to that in the partial selfing scenario (Equation 7b).
|
Impact of the genotypic index on the correlation with the fitness trait:
The squared difference in repeat units vs. heterozygosity under the stepwise mutation model:
Under the partial selfing model, for low mutation rates, d2 and heterozygosity H are equally correlated with fitness (Fig 3A). Increasing the mutation rate strongly increases the correlation with heterozygosity but not with d2 (Fig 3A), irrespective of population size or selfing rate (data not shown). This can be explained by the distribution of the expected value of the fitness trait conditioned on the value of d2. Using (C3) and (C4) in Appendix C we found that all heterozygotes (nonnull d2 values) correspond numerically to the same expected fitness (e.g., W = 0.89 with W0 = 1, ß = 1, u = 10-4, and other parameters as in Fig 3A), which is higher than that of homozygotes (d2 = 0) (e.g., W = 0.72 with W0 = 1, ß = 1, u = 10-4, and other parameters as in Fig 3A). Therefore, heterozygosity is as informative as d2, and it provides a stronger correlation due to its lower variance.
|
Under the admixture model, however, the correlation between heterozygosity and fitness is weakened for high mutation rates (Fig 1B and Fig 3B) whereas the correlation with d2 is almost insensitive to the mutation rate (Fig 3B). With a small subpopulation size (n = 100) and mutation rate (u = 10-4), the correlation between the fitness trait and d2 is weaker than that with heterozygosity, as in the partial selfing model (data not shown). In contrast, when the subpopulation size and the mutation rate are large enough (so that nu >> 1) d2 provides a better correlation than H (Fig 3B). Again, this can be explained by the distribution of the expected value of the fitness trait conditional on d2, derived according to Appendix C (Fig 4). Indeed in the first case (nu << 1), d2 has little value for predicting fitness once it is >1. However, when nu >> 1, the expected value of fitness increases progressively with the value of d2, so that in this case d2 is far more informative than H.
|
The number of nucleotide differences vs. heterozygosity under the infinite-sites model: Under the partial selfing model, given the low mutation rate assumed for a neutral sequence, the same correlation with fitness is obtained whether heterozygosity H or the number of nucleotide differences p is considered (Fig 5A). Only very high mutation rates (>10-4), corresponding to unreasonably long sequences, would introduce a difference, with a higher correlation with H than with p (data not shown).
|
Under the population mixture model, identical correlation coefficients are obtained with p and H except with large subpopulation size (>103) and large mutation rates (>10-6; data not shown). In this case p is slightly less correlated to fitness than H (Fig 5B). Again, only very high mutation rates (>10-4) would result in a higher correlation with p than with H (data not shown). Unsurprisingly, these results are qualitatively similar to that obtained in the comparison of d2 and H under the SMM with equivalent mutation rates.
The expected value of the fitness trait as a function of d2:
The expected shape of the genotype-phenotype relationship is not linear (see Fig 4), with fitness plateauing for high d2-values. This indicates that linear regression models previously used in empirical studies are not appropriate. We therefore suggest using a nonlinear regression based on the logistic equation
![]() |
(19) |
where j is the square root of the value taken by d2, W(
) is the maximum value of W, asymptotically reached for very high d2 values, W(0) is the value associated with d2 = 0, and r measures the rate at which the fitness trait value saturates as a function of d2. This can be rewritten:
![]() |
(20) |
For the partial selfing model,
(d2) can be interpreted as a microsatellite-based indirect measure of inbreeding depression. For the admixture model it is interpreted as an indirect measure of heterosis between the two subpopulations before the mixture. Numerical data obtained using Appendix C were fitted to this nonlinear model, for a large range of parameters defined in Table 3 and Table 4 (data not shown). This model explains 95100% of the variance in fitness obtained for both population models studied (neglecting within-inbreeding class variance for the fitness trait). It thus provides a satisfactory description of the genotype-phenotype relationship on the basis of three parameters. Under the partial selfing model,
(d2) increases with the marker mutation rate, the selfing rate, and the population size (Table 3). Under the mixture model,
(d2) increases with the marker mutation rate and when the divergence time and the population size increase simultaneously. It decreases with the subpopulation size (Table 4).
|
|
| DISCUSSION |
|---|
A unified framework for studying genotype-fitness relationships:
To our knowledge, mutational processes at marker loci have not previously been incorporated into analytical models of associative overdominance, although they are recognized as relevant for empirical issues (![]()
![]()
![]()
![]()
The fitness model we used (Equation 3 and Equation 14) was first proposed by ![]()
![]()
![]()
![]()
![]()
Variance for the fitness trait may exist within an inbreeding class, particularly when recombination is limited, due to the segregation of chunks of chromosomes rather than independent loci. In natural populations, environmental effects may also increase this variance. If this is large enough compared to genetical effects, HFC would be weakened and also would depend on the inbreeding load, which should then be estimated. Empirically estimating within- and between-inbreeding class variance for the fitness trait might be difficult in natural populations, but would be particularly useful to assess the validity of our model.
Choosing appropriate marker genes:
Microsatellites have recently been used as a tool to infer fitness differences due to variation in the level of inbreeding between individuals (![]()
We have shown that HFC due to partial selfing increases with the marker mutation rate (and thus with marker variability), making microsatellites good markers in such a situation. Partial selfing is an example of short-term inbreeding generated by the current mating system. One generation of outcrossing reduces individual inbreeding to zero, so that only selfing since the most recent outcrossing event (i.e., a few generations) needs to be taken into account. However, short-term inbreeding can also be due to recent demographic episodes such as bottlenecks. A sudden reduction in population size produces random inbreeding among individuals, i.e., mating between kin due to chance, as opposed to systematic inbreeding due to the mating system (![]()
![]()
![]()
However, when the origin of HFC is long-term inbreeding (e.g., admixture of anciently diverged small subpopulations), the correlation increases up to a certain mutation rate but then decreases (Fig 1B). Using highly mutable markers can thus lead to confusing results under a long-term inbreeding scenario. The explanation is that even highly "inbred" genotypes can have heterozygous marker loci if marker mutation rates are sufficiently high. Marker heterozygosity is therefore not a good index of individual fitness. There is thus an "optimal mutation rate," which decreases as the timescale of the population inbreeding history increases. In conclusion, one should use molecular markers whose mutational process scales roughly with the assumed cause of inbreeding. However, empirical studies provide only imprecise estimates of mutation rates and a temporal scale of inbreeding. In practice we suggest only that empirical studies should avoid using markers that mutate at an obviously inappropriate rate.
Mutation models of marker genes play little role in HFC, except when the mutation rate is high. In the infinite-alleles model there is no homoplasy, whereas the K-alleles model is characterized by increasing homoplasy with decreasing K-values. The comparison between IAM and KAM, and among various K-values, suggests that homoplasy is a limiting factor for HFC, irrespective of the population scenario. Homoplasy represents a loss of information about identity-by-descent, making marker genotypes less closely related to the inbreeding level of individuals. As the homoplasy generated by the stepwise mutation model is different from that produced by a KAM, the results obtained for these two mutation models cannot be directly compared.
Choosing an appropriate genotypic index for microsatellite data:
It has been suggested recently that d2 may be a more appropriate index than heterozygosity when analyzing fitness consequences of inbreeding (![]()
![]()
![]()
In a red deer population from the Isle of Rum, ![]()
![]()
100 generations, with large subpopulation size, parameters that appear realistic for such a population, always lead to better predictive power of H as compared to d2 (not shown). Note that the fact that there are only two classes of inbreeding and fitness in our admixture scenario just after the contact does not necessarily imply that a binary variable such as H will perform better than a quantitative variable such as d2. If the proportion of high fitness individuals (class Cb) within a class of d2 increases progressively with d2 (Fig 4, with u = 10-2) rather than in a stepwise way (in most cases), d2 performs better than H.
One can suggest another explanation for the stronger correlation with fitness obtained using d2 as compared to H. First, we cannot exclude the possibility that a more sophisticated demographic scenario may generate this result under less restrictive conditions than the admixture scenario formulated here, although we do not have a precise idea of what such a scenario should be. Alternatively, the results of ![]()
![]()
![]()
![]()
![]()
It seems generally unlikely that d2 will provide a higher correlation with fitness than H. For low mutation rates, heterozygosity and d2 are equivalent under both our population models, and for high mutation rates, heterozygosity always provides a higher correlation than d2 with fitness under the partial selfing model. Under the admixture model only, d2 is more correlated with a fitness trait than heterozygosity under restrictive conditions: large mutation rate, divergence time, and subpopulation size. These conditions are expected to be associated with a relatively low inbreeding load (ß) in the mixed population, because there is a small probability of fixation, or of sufficient change in frequency, of a deleterious mutation in a large subpopulation. The inbreeding load does not influence the maximum correlation (Equation 18a) but affects the actual correlation, taking into account within-inbreeding class variance for the fitness trait. Given that sources of variation in fitness other than inbreeding must exist in natural populations (generating 
in Equation 4b and Equation 15b), this small inbreeding load weakens the actual correlation (see Equation 7b), thus reducing the chance of detecting it.
Finally, a conclusion of our analysis is that there seems to be little theoretical reason to use d2 instead of H when analyzing correlations between microsatellite genotype and fitness, even for long-term inbreeding scenarios. When doing so, however, one should choose an appropriate regression model to analyze genotype-fitness correlations (Equation 19 and Equation 20) rather than the linear model used in previous studies.
| ACKNOWLEDGMENTS |
|---|
We thank Nicolas Bierne, Sylvain Glémin, Philippe Jarne, Sally Otto, Josephine Pemberton, John Slate, and John Thompson for stimulating discussions and helpful comments on a previous draft of the manuscript. This work was supported by funds from the Centre National de la Recherche Scientifique (CNRS; Jeune Equipe program) to P. Jarne.
Manuscript received May 23, 2001; Accepted for publication October 1, 2001.
| APPENDIX A |
|---|
IDENTITY-IN-STATE IN THE PARTIAL SELFING MODEL
We derive probabilities of identity-in-state within and between individuals under the infinite sites, the stepwise mutation, and the K-alleles mutational models (see ![]()
IAM and ISM:
Recursions for the IIS probabilities under the IAM or the ISM are
![]() |
(A1) |
with
= (1 - u)2. Note that in the ISM or the IAM, IIS and identity-by-descent (IBD) are equivalent. Q0 and Q1 are the equilibrium values of the system above:
![]() |
(A2) |
Under the ISM the generating function
0 (resp.
1) of the probability pj,0 (resp. pj,1) that two randomly chosen alleles within an individual (resp. between individuals) differ by j nucleotide sites is defined as
0(z) =
+
j=0pj,0zj (resp.
1(z) =
+
j=0pj,1zj). The effect of mutation is to change both generating functions
(z) by a factor: r(z) = (1 - u + uz)2. We also introduce
0|k, which is the value of
0 conditional on inbreeding class k.
The recursions on
are then mathematically similar to those for IBD (![]()
![]() |
(A3) |
0 and
1 are the equilibrium points of the system above, and neglecting mutation since the last outcrossing event
0|k is easily derived:
![]() |
(A4) |
By differentiating these generating functions one obtains the various moments of p:
![]() |
(A5) |
SMM:
To derive Q0 and Q1 under the SMM it is useful to introduce
0 (resp.
1), the generating function of the probability pj,0 (resp. pj,1), that two randomly chosen alleles within an individual (resp. between individuals) differ by j steps. Because steps can take either nonnegative or negative values,
0(z) =
+
j=-
pj,0zj (resp.
1(z) =
+
j=-
pj,1zj). We also define
0|k, the value of
0 conditional on class k. The effect of mutation is to change the generating functions by a factor r(z) = (1 - u +
(
))2, and
0 and
1 are calculated as in (A4) using this new r(z). By definition, we have p0,0 = Q0 and p0,1 = Q1. Furthermore, p-j,0 = p+j,0 =
Pr(d2 = j2) for all j
0. Therefore,

and neglecting mutation since the last outcrossing event,

We use the inverse Fourier transform in j of a given function f:

Q0 and Q1 can computed as
![]() |
(A6) |
Lj(f) are numerically calculated with Mathematica 3.0 (![]()
By differentiating the generating functions
one obtains the various moments of d2:
![]() |
(A7) |
KAM:
The recursions for IIS probabilities are
![]() |
(A8) |
in which
= (1 - u) +
, Q'j,t =
'Qj,t +
, and
' = (
)2.
The equilibrium values of the system are
![]() |
(A9) |
| APPENDIX B |
|---|
IDENTITY-IN-STATE IN THE MIXTURE MODEL
We derive probabilities of identity-in-state under the ISM, SMM, and KAM (see ![]()
IAM and ISM:
The probability of identity (IIS and IBD are confounded here) for two alleles randomly drawn from the ancestral random mating population of size N at inbreeding equilibrium is Qa
with
= (1 - u)2. Let two subpopulations from the ancestral population diverge without gene flow for
generations. Just before the admixture, the probabilities of identity of two randomly chosen alleles are Qw(
) = Qeq + (
(
))
(Qa - Qeq) within a subpopulation and Qb(
) = 
Qa between subpopulations, with Qeq =
. After the admixture, neglecting mutations since the admixture event and considering an infinite mixed population for the sake of simplicity,
![]() |
(B1) |
Under the ISM we obtain analogous expressions for generating functions, as we did in APPENDIX A. Let
w (resp.
b) be the generating function of the probability pj,w (resp. pj,b) that two randomly chosen alleles that originated from the same subpopulation (respectively from the two subpopulations) differ by j nucleotide substitutions [
w(z) =
+
j=0pj,wzj]. Recursions on
(z) are mathematically identical to those for IBD, taking into account mutation through the factor r(z) = (1 - u + uz)2. Therefore, just before the admixture we obtain

within subpopulations and
b,
(z) = (r(z))
a(z) between subpopulations, with








































