Abstract
Numerous studies have relied on microsatellite DNA data to assess the relationships among populations in a phylogenetic framework, converting microsatellite allelic composition of populations into evolutionary distances. Among other coefficients, (δμ)^{2} and R_{st} are often employed because they make use of the differences in allele sizes on the basis of the stepwise mutation model. While it has been recognized that some microsatellites can yield disproportionate interpopulation distance estimates, no formal investigation has been conducted to evaluate to what extent such loci could affect the topology of the corresponding dendrograms. Here we show that single loci, displaying extremely large amongpopulation variance, can greatly bias the topology of the phylogenetic tree, using data from European grayling (Thymallus thymallus, Salmonidae) populations. Importantly, we also demonstrate that the inclusion of a single disproportionate locus will lead to an overestimation of the stability of trees assessed using bootstrapping. To avoid this bias, we introduce a simple statistical test for detecting loci with significantly disproportionate variance prior to phylogenetic analyses and further show that exclusion of offending loci eliminates the false increase in phylogram stability.
NUCLEAR microsatellite DNA loci are increasingly employed to assess evolutionary relationships among populations (Goldstein and Schlötterer 1999). Because of their high variability, microsatellites can allow for the discrimination of populations where other methods (e.g., DNA sequencing) have failed to detect any polymorphism (Bowcocket al. 1994; Angers and Bernatchez 1997; Brünneret al. 1998). Another attractive feature of microsatellites is that the phylogeny of populations can be retraced from a large number of independent loci, whereas, e.g., with mitochondrial DNA, conclusions rely essentially on only one locus. Microsatellites evolve predominantly according to the stepwise mutation model (SMM; Ohta and Kimura 1973; Schlötterer 2000; but see also Ballouxet al. 2000), according to which every allele has an equal probability to mutate up or down by a single repeat unit (Shriveret al. 1993; Valdeset al. 1993). Consequently, numerous genetic distance indices have been developed to make use of the evolutionary information contained within differences in repeat numbers among alleles, treating them as quantitative traits (e.g., Goldstein et al. 1995a,b; Shriveret al. 1995; Slatkin 1995). Goldstein et al. (1995b) have suggested the use of a coefficient known as (δμ)^{2}, which relies solely upon the differences in mean allele sizes between a pair of populations,
However, computer simulation studies have also pointed out that (δμ)^{2} distances have an inherently high variance, suggesting that hundreds of loci may be required to attain stable estimates (Zhivotovsky and Feldman 1995). Furthermore, a recent study of four human populations indicated that (δμ)^{2} distances can be extremely sensitive to the influence of a very small number of loci, even when >200 loci are utilized; indeed, it was shown that almost onehalf of the average interpopulation distance could be attributed to only 2 out of 213 loci (Cooperet al. 1999). Therefore, the contribution of each microsatellite to the overall (δμ)^{2} distance estimates (hereafter referred to as the “contribution of a locus”) can vary tremendously, to a point where a small number of the assessed loci can dictate the value of the (δμ)^{2} distances. Despite the fact that large variance of the (δμ)^{2} distance has been recognized, there has been no formal attempt to ascertain the effect this variance has on the topology of phylogenetic trees derived from the distance matrix. In fact, there is good reason to expect that inequalities among loci could create a bias in the topological validation of trees, as assessed with resampling procedures (such as bootstrapping or jackknifing; see Lapointe 1998 for a review). Indeed, the main underlying assumptions of bootstrapping are that characters (here loci) are independently and identically distributed (IID; West and Faith 1990; Carpenter 1992). Identical distribution requires that each locus “must obey one common stochastic model of evolution” (Sanderson 1995), an assumption that is likely to be violated with microsatellite loci that are characterized typically by heterogeneous mutation rates (Weber and Wong 1993; Primmeret al. 1996; Di Rienzoet al. 1998; Harret al. 1998) and possible differences in range constraints (Garzaet al. 1995; Lehmannet al. 1996). In a case where a small number of loci could disproportionately affect the overall (δμ)^{2} distance matrix, resampling (e.g., using the bootstrap) of microsatellites having a small contribution to distances may have little or no effect on the determination of the topology of a tree. For that reason, it could be expected that unequal contribution of loci to the interpopulation distances should upwardly bias the stability estimates (bootstrap support) of (δμ)^{2} trees.
To evaluate this potential bias, we have used an innovative randomization method, based on the permutation of alleles within a real microsatellite data set. This approach was inspired by the family of permutation tail probability tests (PTP; Faith and Cranston 1991). In this general framework, statistics pertaining to the topological stability of a phylogenetic tree were computed from real data and then compared to those attained with randomized data, i.e., following the permutation of alleles among individuals or loci. This procedure intends to establish whether the topology of the tree derived from the real data is more stable than that of a tree built from random data. Any pattern of topological stability emerging from the permuted (randomized) data set would reveal the existence of a bias in the analyses and further allows qualitative evaluation of the extent of this bias.
In this study, we explored the effect of unequal contribution of loci in resolving population evolutionary relationships, using the (δμ)^{2} genetic distance. This has been examined using data from 17 microsatellite loci obtained from widely distributed natural populations of European grayling, Thymallus thymallus (Salmonidae). We show that a single locus that exhibits strikingly large interpopulation variance can completely dominate the calculation of the genetic distances among populations, introducing a substantial bias in the topology of the corresponding phylogram. Using a permutational approach, we show that the inclusion of disproportionately variable loci falsely increases the similarity of trees between resamples and, consequently, overestimates the bootstrap support values of the (δμ)^{2} phylogenetic tree. The extension of these analyses to a second index of interpopulation genetic distances, the socalled R_{st} (Slatkin 1995), revealed that such problems are not limited to the (δμ)^{2} index, but are to be expected with most SMMbased distance coefficients. These findings imply that caution is warranted when applying (δμ)^{2} or R_{st} distances on microsatellite loci that display heterogeneous levels of diversity.
METHODOLOGY
Data: Seventeen microsatellites were employed to genotype 594 T. thymallus individuals sampled from 17 populations across Europe (genotyping details in Koskinen and Primmer 2001). All specimens were caught from the wild between 1994 and 1999 from areas considered to be mostly unaffected by stocking of European grayling. Linkage equilibrium tests of the loci and HardyWeinberg equilibrium tests of the loci and populations did not reveal any significant deviations potentially violating the assumptions of genetic distance estimation and (δμ)^{2} phylogram construction (Koskinenet al. 2002). The microsatellitebased evolutionary relationships of the European grayling populations are relatively clear: Phylogenetic trees based on CavalliSforza and Edwards's (1967) chord distance (D_{CE}) and Nei et al.'s (1983) D_{A} distance are congruent with mitochondrial DNAbased results (Koskinenet al. 2002). Yet, the (δμ)^{2} distances yielded by each locus were extremely heterogeneous and motivated the investigation of unequal contributions of loci on phylogenetic tree topology.
Testing the unequal contributions of different loci: Following Equation 1, a locus displaying large differences in mean allele sizes among populations will contribute more to the overall (δμ)^{2} distance than a locus exhibiting small size differences. Therefore, we assessed the expected overall contribution of each locus i by calculating its corresponding variance of mean allele size among populations, hereafter referred to as the variance of mean sizes (μ Var_{i}),
To evaluate whether a locus had a significantly larger contribution than the other loci to the (δμ)^{2} distance matrix, we introduce a new statistic comparing the contribution of a locus i to the average contribution of the remaining l loci:
While the (δμ)^{2} index is the focal point of this study, disparities among loci contributions are also expected to influence other SMMbased distance measures. Handling repeat unit numbers as a quantitative trait can indeed induce variations in the relative importance of differing loci, which would not be the case for indices based on allele frequencies that always sum up to one (e.g., CavalliSforza and Edwards 1967; Neiet al. 1983). For comparative purposes, we extended our analyses to a second distance coefficient derived for microsatellite markers, namely R_{st} (Slatkin 1995). The two measures are related to a certain point, as allele size variance estimators (within and among populations) unavoidably rely on the sum of squared differences. However, while (δμ)^{2} considers only the average allele size differences among populations, R_{st} also takes into account the withinpopulation variance, which might compensate for the influence of disproportionately variable loci.
Phylogenetic analyses and topological comparisons: All trees analyzed in this study were recovered using the following procedure: First, interpopulation genetic distance matrices were obtained by summing the (δμ)^{2} values across loci or by calculating the R_{st} variance ratio (computations carried out with MsatBootstrap 1.1, available from: http://www.helsinki.fi/~primmer, under publications and data). Then, the corresponding trees were recovered with the FitchMargoliash algorithm [computations performed in FITCH from the PHYLIP package, version 3.752 (P = 2, G, and no negative branches allowed); Felsenstein 1995].
The topological similarity between trees was evaluated with the partition metric (P_{m}; Robinson and Foulds 1981). For a given pair of trees, this index counts the number of clusters occurring in the first or the second tree, but not in both (i.e., the number of topological differences between a pair of trees; Penny and Hendy 1985). The P_{m} values were standardized with the maximum number of topological differences between two trees (i.e., 2n − 6; Steel and Penny 1993), and the 1− complement of this measure was recorded [hereafter referred to as the topological similarity index (TSI); see also Landryet al. 1996]. Consequently, topologically identical trees have a TSI of 1, whereas completely different trees have a TSI of 0.
Contribution of individual loci in the determination of the topology of a phylogram: Two strategies were employed to evaluate the contribution of single loci to the structure of a phylogenetic tree. First, we compared the tree that is based on the complete data set (17 loci) to trees obtained when each single locus was excluded in turn, i.e., 17 trees based on 16 loci each (using TSI; see above). The raison d'être of this procedure was to assess whether tree topology would be altered more by the removal of a locus with a larger μ Var than it would when a locus with a smaller μ Var was excluded from the data set. Second, the topology of the tree derived from each single locus was compared to the tree based on all loci, i.e., 17 trees based on 1 locus each (using TSI; see above). The rationale of this second procedure was that a tree based on a locus contributing more to the overall interpopulation distances should be more similar to the tree based on all 17 loci than a tree obtained from a locus with a smaller contribution to the distance matrix.
Topology and bootstrap support of phylograms as a function of the number of assessed loci: Computer simulation studies have revealed that increasing the number of microsatellites ought to decrease the variance of (δμ)^{2} (Goldsteinet al. 1995b). Accordingly, increasing the number of microsatellites should stabilize the structure of the corresponding phylogram (Goldsteinet al. 1995b; Takezaki and Nei 1996; Jinet al. 2000). To investigate this, we applied two distinct analyses. First, for every given fixed number of loci (from 2 to 17), 100 bootstrap replicates were generated from the 17locus data set, and their matching (δμ)^{2} or R_{st} distance matrices were calculated (computations carried out in MsatBootstrap 1.1). Then, the FITCH tree corresponding to each matrix was recovered as described above. Subsequently, the topology of each replicated tree was compared to the topology of the tree based on all 17 loci (using TSI), and the mean TSI value across each set of 100 replicates was recorded as an index of general topological similarity between trees based on a subset of loci and the tree based on all 17 microsatellites.
Second, the topological stability (i.e., bootstrap support) of trees built from either (δμ)^{2} or R_{st} matrices was evaluated as a function of the number of loci, using the analytical design described above. For every set of 100 phylograms previously obtained, a majorityrule consensus tree was calculated (computations in CONSENSE in PHYLIP; Felsenstein 1995). Then, the mean nodesupport value of the consensus tree was computed and further used as an indicator of topological stability. Large values of average node support characterize trees with clusters that are well supported by the data (i.e., stable trees) whereas small values indicate that some clusters are poorly supported (i.e., unstable trees).
Data randomization procedures for evaluating contribution of loci to interpopulation distances and subsequent tree topology: Two related permutation models were employed to randomize the data:
Single permutation: To eliminate any evolutionary signal that could be related to the differences in the number of repeats between populations, alleles scored at a given locus were randomly assigned to individuals. Under this permutational hypothesis, mean allele sizes are expected to be equal in all populations, leading to expected interpopulation distances approaching zero; any differences in variance between loci are, however, retained.
Double permutation: In this scheme, the single permutation was followed by a second shuffling that permuted alleles among loci within each individual. Thereby any given allele was assigned to a randomly chosen individual and to a randomly chosen locus. In addition to bringing expected interpopulation distances near zero, this procedure aimed at equalizing the variance of allele sizes among loci. All permutation procedures were repeated 10 times, and the same analyses of convergence and stability were repeated for each permuted data set.
RESULTS
The (δμ)^{2} genetic interpopulation distances yielded by each locus were found to be extremely variable, maximum values ranging from 4.0 to 169.0 squared repeat units (ru^{2}), with a mean estimate of 27.6 ru^{2} (Table 1). Marked variations in average (δμ)^{2} distances were also observed among loci, ranging from 1.0 ru^{2} (BFRO016) to 194.8 ru^{2} (One2). Thus, it was clear that the influence of individual microsatellites on the overall (δμ)^{2} distance matrix greatly differed among loci. Indeed, only 2 of the 17 loci (namely One2 and BFRO012) accounted for 63.6% of the total average interpopulation distance (Table 1). Accordingly, these two microsatellites exhibited much larger allele size variances (μ Var) than did the remaining 15 loci (Table 1). Rigorous statistical testing revealed that the contribution of the One2 locus was significantly larger than the average contribution of other loci, an assertion that did not hold true for any other locus after Bonferroni correction (see Table 1).
Consequently, the importance of each locus in the determination of the structure of the evolutionary relationships among populations was found to differ strikingly among loci. Single removal of strategic loci (e.g., One2 or BFRO013) resulted in dramatic alterations to the topology of the tree (Figure 1). On the other hand, the majority of the other loci could be individually excluded without observing any major change in the topology. In fact, nine loci (53%) could be individually removed without causing a single topological modification (Figure 1). From a reverse angle, the singlelocus trees obtained from either One2 or BFRO013 showed the highest similarity with the tree based on complete data set (Figure 1), providing additional evidence that these loci were predominant in the determination of the general structure of the phylogenetic tree.
Analyses of the topological structure of (δμ)^{2} phylograms as a function of an increasing number of loci also indicated the drastic effects that diverging contributions of the microsatellites can have on the recovery of evolutionary relationships (Figure 2). As expected, increasing the number of resampled loci led consistently to a tree that was increasingly more similar to the one based on all 17 loci. Interestingly, however, the rate of increase in the TSI estimates based on real data did not differ from the one obtained with the singlepermuted data (Figure 2A). On the other hand, the pattern observed with the doublepermutation procedure showed that the complete randomization of alleles (i.e., among individuals and loci) removed any false evolutionary pattern more successfully than the single permutation did (Figure 2A). Interestingly, the same conclusions can be drawn from the trees derived from the variancebased index, R_{st}, despite the fact that this distance should account for withinpopulation variance (Figure 2C).
Another striking conclusion of this study was that the average bootstrap support values of (δμ)^{2} phylograms based on randomized data (single permutation) were comparable to the bootstrap values observed with real data (Figure 2B). Even in the absence of any evolutionary signal (singly permuted data), bootstrap support values averaging up to 50% were recorded, with the relationships of some nodes being supported by bootstrap values as high as 79%. However, the bootstrap support values of the trees derived from randomized data following the doublepermutation procedure were considerably decreased (Figure 2B). Results for the R_{st} exhibited a similar pattern (Figure 2D). As for TSI analyses, the R_{st} trees based on >14 loci appeared to some extent more stable than those from the singly permuted data.
To further demonstrate the effects of a dominant locus, the analyses were rerun after removal of the locus displaying a contribution significantly larger than all others (i.e., One2); this partial data set was also submitted to the single permutation procedure [(δμ)^{2} only]. Following this, the topological convergence (i.e., increase in the TSI; Figure 2A) and stability levels (bootstrap support; Figure 2B) of trees observed were found to be comparable to those obtained with the double permutation of the complete data set.
DISCUSSION
Collectively, these results, based on data from 17 populations spanning the natural range of European grayling, reinforce earlier findings from four human populations (Cooperet al. 1999) that some loci can have a markedly larger influence than others in the determination of (δμ)^{2} genetic distances when several loci are combined (see also Goodman 1997). More importantly, the results presented here demonstrate for the first time that loci contributing disproportionately to the distance matrix can also strongly bias subsequent phylogenetic tree construction and internal validation procedures. To diagnose this problem within a data set before undertaking any phylogenetic analysis, we have formulated a simple test based on permutations aimed at detecting unequal locus contributions, providing a robust framework to address this problem and identify outliers.
The contribution of a locus (differences in size among alleles) is not necessarily related to its phylogenetic informativeness (Angers and Bernatchez 1997; Estoup and Angers 1998), due to factors including size homoplasy (Estoupet al. 1995; Angers and Bernatchez 1997), variation in mutation rates within (Primmeret al. 1998; Schlöttereret al. 1998; Crozieret al. 1999) and between (Brinkmannet al. 1998; Kayseret al. 2000) loci, and allele size constraints (Garzaet al. 1995; Lehmannet al. 1996). Another fundamental problem is that incompatibility of phylogenetic information contained in different microsatellite loci is common. Nevertheless, validation procedures (such as bootstrapping) are aimed at revealing the degree of incompatibility within the data, assessing the degree of confidence of evolutionary trees derived from multiple loci. However, our results illustrate that inequalities among loci can falsely increase the topological congruence (TSI) and stability (bootstrap support) of corresponding phylogenetic trees. Therefore, standard resampling procedures cannot accurately assess the reliability of a tree in the presence of important inequalities among contribution of loci, neutralizing an important instrument to identify the possible phylogenetic incongruence within the data. This also implies that it could be very difficult to recover the true evolutionary relationships among populations in a case where one dominating locus (i.e., with disproportionate contribution) would lead to a phylogenetic tree that is erroneous.
This is evidenced in the European grayling data set, where singlepermuted (randomized) data that produced trees of similar “quality” to the trees based on the original data indicated that the increase in TSI estimates of the nonpermuted microsatellite data with increasing locus number was not a reflection of a meaningful evolutionary pattern. Given that alleles were permuted within each locus separately so that the locusspecific allele size variances were retained, the most likely explanation is that the observed TSI estimates were governed by differences in the magnitude of (δμ)^{2} or R_{st} distances among loci. Indeed, equalizing the variance among loci with the double permutation procedure was enough to remove most of the spurious increase of the TSI in the randomized data, confirming that the increase of TSI within the singly permuted data can be explained solely by differences of mean allele size variance among loci.
Analyses of the bootstrap support values of phylogenetic trees indicated a comparable trend. The similar increase of the average bootstrap values observed in the singlepermuted, compared to the original, data substantiates the idea that the reliability of the topology of a phylogenetic tree obtained with (δμ)^{2} can be governed by factors that are not related to any evolutionary pattern. The most likely explanation for this finding is that the topological stability was artificially increased due to the effects of larger distances of loci displaying higher variance of mean allele sizes. In support of this, equalizing the variance using the double permutation procedure confirmed that the patterns observed with the single permutation of the complete data (17 loci) are attributable exclusively to differences in variance among loci. It is worth noting that the same analyses, when applied to nonSMMbased distance coefficients [D_{A} (Neiet al. 1983) and D_{CE} (CavalliSforza and Edwards 1967)], did not reveal any noteworthy increase in the topological stability of phylogenetic trees derived from the singlepermuted data (either TSI or bootstrap support; M. T. Koskinen, H. Hirvonen, P.A. Landry and C. R. Primmer, unpublished results). Therefore, the surprising stability of trees derived from randomized data does not result from a bias caused by the permutation procedure.
Several studies reported that trees based on (δμ)^{2} exhibited lower bootstrap values than did trees based on other distance measures (such as CavalliSforza and Edwards 1967 or Neiet al. 1983; Goldsteinet al. 1995b; Takezaki and Nei 1996; Jinet al. 2000; M. T. Koskinen, H. Hirvonen, P.A. Landry and C. R. Primmer, unpublished results). It has been advocated that the poor stability of (δμ)^{2} trees resulted from the high sampling variance associated with (δμ)^{2} distances and that increasing the number of loci should increase the stability of such trees (Goldsteinet al. 1995b). While this appeared to be the case with our data, deeper analysis revealed that the increase of stability could be explained solely by the large inequalities among loci to the (δμ)^{2} distances. Findings of this study reinforce that one should be cautious when utilizing the (δμ)^{2} coefficient to reconstruct the evolutionary history of populations, especially when the amongpopulation variance of loci is heteroscedastic. The effects of inequalities among loci are likely to be more important in relatively small data sets, because the variance of (δμ)^{2} is expected to decrease when increasing the number of loci (Goldsteinet al. 1995b). This consequence might be of primary importance for studies of wild species in which, on average, only six microsatellites are currently utilized in the estimation of evolutionary relationships (M. T. Koskinen, H. Hirvonen, P.A. Landry and C. R. Primmer, unpublished results). These findings also suggest that caution is warranted when using socalled “hybrid trees,” in which the tree topology is determined using nonSMM distance measure, and interpopulation (δμ)^{2} distances are applied to the tree (Angers and Bernatchez 1997; Estoup and Angers 1998). Interpopulation distances based on (δμ)^{2} can be erroneously influenced by a small number of loci, implying that even a posteriori distance adjustments could in practice be very sensitive to highly disproportionate loci.
Data sets comprising large numbers of loci can indeed display reduced (δμ)^{2} variance; nevertheless, it was shown that a very small number of loci (i.e., 2 out of 213) can contribute to almost onehalf of the overall distances (Cooperet al. 1999). Thus, increasing the number of loci does not necessarily provide a solution to the influence of disproportionate loci. Cooper et al. (1999) have also proposed several corrections to the data prior to calculating the (δμ)^{2} distances to normalize the outlying loci, but with limited improvements. The main difficulty is to make an adjustment that will not modify the property of linearity with time of the (δμ)^{2} index. For example, allele sizes could be standardized before analyses, ascertaining that each locus would then have an equal weight in the distance calculations (Goodman 1997). While this procedure is warranted for R_{st} calculations, it is not advisable for (δμ)^{2} because large distances will be more constrained than small ones by the standardization, giving up the linearity with divergence time.
Given that differences in locus contributions arise in part because of different range constraints and heterogeneous mutations rates among loci (Estoup and Angers 1998), a distance measure accounting for these two parameters should produce more accurate estimates of the distances. Some corrections involving these parameters were indeed intended to account for different mutations rates and size constraints among loci (e.g., D_{GLS}; Pollocket al. 1998), but the use of these coefficients requires reasonably accurate estimates of evolutionary parameters of each microsatellite, which at present cannot be reliably achieved (Estoup and Angers 1998; Cooperet al. 1999; Ellegren 2000). Therefore, in cases where outlying loci can be identified using the test reported here, the simplest and most straightforward option for eliminating any false phylogram stability (bootstrap support) and convergence toward a final solution (TSI) is to remove the offending loci from subsequent analyses (e.g., Zhivotovskyet al. 2000).
CONCLUSION
The problem of unequal contribution of microsatellites combined with the use of an SMMbased distance coefficient [(δμ)^{2} or R_{st}] should be considered when assessing the evolutionary relationships among populations and especially when utilizing validation procedures based on resampling. Results presented here suggest that the use of (δμ)^{2} or R_{st} should be restricted to arrays of loci displaying comparable amounts of variance, to minimize the influence of exceptionally highly variable loci. Thus, in establishing the phylogenetic relationships among populations with (δμ)^{2}, all microsatellites are considered equal, but it clearly appears that some are more equal than others … (to paraphrase Orwell 1945).
Acknowledgments
The authors are grateful to F.J. Lapointe for stimulating discussions in the early stages of this study and to J. N. Painter, D. L. Johnson, and three anonymous reviewers for helpful suggestions to improve this manuscript. Researchers providing the T. thymallus samples from across Europe are also much appreciated. This work was supported by a National Sciences and Engineering Research Council of Canada postdoctoral fellowship awarded to P.A. Landry and by the Biological Interactions Graduate School, the University of Helsinki, and the Academy of Finland (project no. 172964 and Centre of Excellence Program 20002005, grant no. 44887).
Footnotes

Communicating editor: M. W. Feldman
 Received September 6, 2001.
 Accepted May 6, 2002.
 Copyright © 2002 by the Genetics Society of America