Genetics, Vol. 161, 1339-1347, July 2002, Copyright © 2002

Deriving Evolutionary Relationships Among Populations Using Microsatellites and (δµ)²: All Loci Are Equal, but Some Are More Equal Than Others ...

Pierre-Alexandre Landry^a, Mikko T. Koskinen^b, and Craig R. Primmer^b
^a Metapopulation Research Group, Division of Population Biology, Department of Ecology and Systematics, FIN-00014, University of Helsinki, Helsinki, Finland
^b Division of Population Biology, Department of Ecology and Systematics, FIN-00014, University of Helsinki, Helsinki, Finland

Corresponding author: Pierre-Alexandre Landry, Division of Population Biology, Department of Ecology and Systematics, PO Box 65 (Viikinkaari 1), FIN-00014, University of Helsinki, Helsinki, Finland. E-mail: alexandre.landry@helsinki.fi

Communicating editor: M. W. FELDMAN


*  ABSTRACT

Numerous studies have relied on microsatellite DNA data to assess the relationships among populations in a phylogenetic framework, converting microsatellite allelic composition of populations into evolutionary distances. Among other coefficients, (δµ)² and Rst are often employed because they make use of the differences in allele sizes on the basis of the stepwise mutation model. While it has been recognized that some microsatellites can yield disproportionate interpopulation distance estimates, no formal investigation has been conducted to evaluate to what extent such loci could affect the topology of the corresponding dendrograms. Here we show that single loci, displaying extremely large among-population variance, can greatly bias the topology of the phylogenetic tree, using data from European grayling (Thymallus thymallus, Salmonidae) populations. Importantly, we also demonstrate that the inclusion of a single disproportionate locus will lead to an overestimation of the stability of trees assessed using bootstrapping. To avoid this bias, we introduce a simple statistical test for detecting loci with significantly disproportionate variance prior to phylogenetic analyses and further show that exclusion of offending loci eliminates the false increase in phylogram stability.


NUCLEAR microsatellite DNA loci are increasingly employed to assess evolutionary relationships among populations (GOLDSTEIN and SCHLOTTERER 1999). Because of their high variability, microsatellites can allow for the discrimination of populations where other methods (e.g., DNA sequencing) have failed to detect any polymorphism (BOWCOCK et al. 1994; ANGERS and BERNATCHEZ 1997; BRUNNER et al. 1998). Another attractive feature of microsatellites is that the phylogeny of populations can be retraced from a large number of independent loci, whereas, e.g., with mitochondrial DNA, conclusions rely essentially on only one locus. Microsatellites evolve predominantly according to the stepwise mutation model (SMM; OHTA and KIMURA 1973; SCHLOTTERER 2000; but see also BALLOUX et al. 2000), according to which every allele has an equal probability of mutating up or down by a single repeat unit (SHRIVER et al. 1993; VALDES et al. 1993). Consequently, numerous genetic distance indices have been developed to make use of the evolutionary information contained within differences in repeat numbers among alleles, treating them as quantitative traits (e.g., GOLDSTEIN et al. 1995A, 1995B; SHRIVER et al. 1995; SLATKIN 1995). GOLDSTEIN et al. (1995B) have suggested the use of a coefficient known as (δµ)², which relies solely upon the differences in mean allele sizes between a pair of populations,

$$(\delta\mu)^2 = \frac{1}{r}\sum_{j=1}^{r} (m_{xj} - m_{yj})^2 \qquad (1)$$

where m_x and m_y are the mean allele sizes (in repeat units) for a given locus j in populations x and y, respectively, and r represents the number of loci. This SMM-based genetic distance measure was developed specifically to accommodate circumstances in which populations have been isolated for long time periods, i.e., when mutations might account for a marked proportion of the interpopulation microsatellite variation (GOLDSTEIN et al. 1995B). Indeed, computer simulation studies have demonstrated that (δµ)² remains linear with time (GOLDSTEIN et al. 1995B), even when the strict SMM assumption of solely single-step mutations is violated (KIMMEL et al. 1996). Moreover, (δµ)² has been shown to be relatively robust to fluctuations in population sizes (TAKEZAKI and NEI 1996). In theory, this distance index is expected to provide unbiased estimates of divergence among populations that have been maintained in mutation-drift equilibrium during their evolution, have had constant sizes, and were subjected to no gene flow (ZHIVOTOVSKY 2001). Owing to these desirable properties, (δµ)² has been widely used in evolutionary studies of humans (e.g., GOLDSTEIN et al. 1995B; PEREZ-LEZAUN et al. 1997; CALAFELL et al. 1998; JIN et al. 2000), domestic animals (e.g., RITZ et al. 2000), and also numerous wild species (e.g., ANGERS and BERNATCHEZ 1997; VALSECCHI et al. 1997; GOODMAN 1998; GOLDSTEIN et al. 1999; TESSIER and BERNATCHEZ 2000).
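As an illustration only, the distance of Equation 1 can be sketched in a few lines of Python. The sketch assumes the usual form of the coefficient, in which the squared differences in mean allele size are averaged over the r loci; the function name, the data layout (one list of allele sizes per locus), and the toy allele sizes are hypothetical and are not taken from the grayling data.

```python
from statistics import fmean


def delta_mu_squared(pop_x, pop_y):
    """Return ((delta mu)^2, per-locus squared differences) for two populations.

    pop_x, pop_y: one entry per locus; each entry is a list of allele sizes
    (in repeat units) scored at that locus. Layout is an assumption.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations must be scored at the same loci")
    per_locus = [(fmean(ax) - fmean(ay)) ** 2 for ax, ay in zip(pop_x, pop_y)]
    # Assumed form of Equation 1: average of the per-locus terms over r loci.
    return sum(per_locus) / len(per_locus), per_locus


# Toy data (invented): two loci, four alleles sampled per population.
pop_a = [[10, 10, 12, 12], [20, 21, 21, 22]]
pop_b = [[14, 14, 16, 16], [21, 21, 21, 21]]
distance, contributions = delta_mu_squared(pop_a, pop_b)
```

In this toy case the first locus supplies the whole distance and the second supplies none, which is the kind of unequal contribution among loci that the remainder of the article examines.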

However, computer simulation studies have also pointed out that (δµ)² distances have an inherently high variance, suggesting that hundreds of loci may be required to attain stable estimates (ZHIVOTOVSKY and FELDMAN 1995). Furthermore, a recent study of four human populations indicated that (δµ)² distances can be extremely sensitive to the influence of a very small number of loci, even when >200 loci are utilized; indeed, it was shown that almost one-half of the average interpopulation distance could be attributed to only 2 out of 213 loci (COOPER et al. 1999). Therefore, the contribution of each microsatellite to the overall (δµ)² distance estimates (hereafter referred to as the "contribution of a locus") can vary tremendously, to a point where a small number of the assessed loci can dictate the value of the (δµ)² distances. Although the large variance of the (δµ)² distance has been recognized, there has been no formal attempt to ascertain the effect this variance has on the topology of phylogenetic trees derived from the distance matrix. In fact, there is good reason to expect that inequalities among loci could create a bias in the topological validation of trees, as assessed with resampling procedures (such as bootstrapping or jackknifing; see LAPOINTE 1998 for a review). Indeed, the main underlying assumptions of bootstrapping are that characters (here loci) are independently and identically distributed (IID; WEST and FAITH 1990; CARPENTER 1992). Identical distribution requires that each locus "must obey one common stochastic model of evolution" (SANDERSON 1995), an assumption that is likely to be violated with microsatellite loci that are characterized typically by heterogeneous mutation rates (WEBER and WONG 1993; PRIMMER et al. 1996; DI RIENZO et al. 1998; HARR et al. 1998) and possible differences in range constraints (GARZA et al. 1995; LEHMANN et al. 1996). In a case where a small number of loci could disproportionately affect the overall (δµ)² distance matrix, resampling (e.g., using the bootstrap) of microsatellites having a small contribution to distances may have little or no effect on the determination of the topology of a tree. For that reason, it could be expected that unequal contribution of loci to the interpopulation distances should upwardly bias the stability estimates (bootstrap support) of (δµ)² trees.

To evaluate this potential bias, we have used an innovative randomization method, based on the permutation of alleles within a real microsatellite data set. This approach was inspired by the family of permutation tail probability tests (PTP; FAITH and CRANSTON 1991). In this general framework, statistics pertaining to the topological stability of a phylogenetic tree were computed from real data and then compared to those attained with randomized data, i.e., following the permutation of alleles among individuals or loci. This procedure is intended to establish whether the topology of the tree derived from the real data is more stable than that of a tree built from random data. Any pattern of topological stability emerging from the permuted (randomized) data set would reveal the existence of a bias in the analyses and further allows qualitative evaluation of the extent of this bias.

In this study, we explored the effect of unequal contribution of loci in resolving population evolutionary relationships, using the (δµ)² genetic distance. This has been examined using data from 17 microsatellite loci obtained from widely distributed natural populations of European grayling, Thymallus thymallus (Salmonidae). We show that a single locus that exhibits strikingly large interpopulation variance can completely dominate the calculation of the genetic distances among populations, introducing a substantial bias in the topology of the corresponding phylogram. Using a permutational approach, we show that the inclusion of disproportionately variable loci falsely increases the similarity of trees between resamples and, consequently, overestimates the bootstrap support values of the (δµ)² phylogenetic tree. The extension of these analyses to a second index of interpopulation genetic distances, the so-called Rst (SLATKIN 1995), revealed that such problems are not limited to the (δµ)² index, but are to be expected with most SMM-based distance coefficients. These findings imply that caution is warranted when applying (δµ)² or Rst distances to microsatellite loci that display heterogeneous levels of diversity.


*  METHODOLOGY

Data:
Seventeen microsatellites were employed to genotype 594 T. thymallus individuals sampled from 17 populations across Europe (genotyping details in KOSKINEN and PRIMMER 2001). All specimens were caught from the wild between 1994 and 1999 from areas considered to be mostly unaffected by stocking of European grayling. Linkage equilibrium tests of the loci and Hardy-Weinberg equilibrium tests of the loci and populations did not reveal any significant deviations potentially violating the assumptions of genetic distance estimation and (δµ)² phylogram construction (KOSKINEN et al. 2002). The microsatellite-based evolutionary relationships of the European grayling populations are relatively clear: Phylogenetic trees based on CAVALLI-SFORZA and EDWARDS's (1967) chord distance (DCE) and NEI et al.'s (1983) DA distance are congruent with mitochondrial DNA-based results (KOSKINEN et al. 2002). Yet, the (δµ)² distances yielded by each locus were extremely heterogeneous and motivated the investigation of unequal contributions of loci on phylogenetic tree topology.

Testing the unequal contributions of different loci:
Following Equation 1, a locus displaying large differences in mean allele sizes among populations will contribute more to the overall (δµ)² distance than a locus exhibiting small size differences. Therefore, we assessed the expected overall contribution of each locus i by calculating its corresponding variance of mean allele size among populations, hereafter referred to as the variance of mean sizes (µ Var_i),

$$\mu\,\mathrm{Var}_i = \frac{1}{k-1}\sum_{j=1}^{k} n_j\,(m_j - \mu)^2 \qquad (2)$$

where k is the number of populations, n_j and m_j are the sample size and the mean allele size of the jth population, respectively, and µ is the mean allele size of locus i across all populations. This equation is analogous to the interpopulation variance in an analysis of variance framework (see SOKAL and ROHLF 1995). Let us draw attention to the term n_j (m_j − µ)², which in fact represents the share of each population j in the contribution of locus i; this is relevant for testing the inequalities among loci (see below).
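The variance of mean sizes can be sketched as follows. This is a minimal illustration that assumes the among-group mean square of a one-way analysis of variance (divisor k − 1) and a grand mean pooled over all sampled alleles; both choices, along with the function name and the toy data, are assumptions made for the example.

```python
def variance_of_mean_sizes(pops):
    """Return (mu Var, per-population shares) for a single locus.

    pops: one list of allele sizes (repeat units) per population.
    """
    k = len(pops)
    if k < 2:
        raise ValueError("at least two populations are required")
    n_total = sum(len(p) for p in pops)
    # Assumed: grand mean pooled over every sampled allele at this locus.
    grand_mean = sum(sum(p) for p in pops) / n_total
    # Share of population j: n_j * (m_j - mu)^2.
    shares = [len(p) * (sum(p) / len(p) - grand_mean) ** 2 for p in pops]
    return sum(shares) / (k - 1), shares


# Toy locus (invented): three populations, four alleles each.
locus = [[10, 10, 10, 10], [12, 12, 12, 12], [14, 14, 14, 14]]
mu_var, shares = variance_of_mean_sizes(locus)
```

The per-population shares are returned alongside the variance because the permutation test described in the text operates on these shares rather than on the raw alleles.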

To evaluate whether a locus had a significantly larger contribution than the other loci to the (δµ)² distance matrix, we introduce a new statistic comparing the contribution of a locus i to the average contribution of the remaining l loci:

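$$F_{\mathrm{ctr}_i} = \frac{\mu\,\mathrm{Var}_i}{\frac{1}{l}\sum_{h \neq i} \mu\,\mathrm{Var}_h} \qquad (3)$$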

Under the null hypothesis, the contribution of a single locus should not differ from the average contribution of others, and therefore Fctri will fluctuate around unity. The statistical significance of this ratio can be assessed by computing the null distribution of this statistic using a permutation procedure. Here, loci are assumed to be independent from one another, but the same presumption cannot hold true for populations that are linked by phylogenetic relationships. Under these conditions, a relevant permutation scheme should permute data among loci but not among populations. The probability associated with the null hypothesis was then obtained by permuting the shares of populations pertaining to single-locus contributions (see above). Share values were randomly reassigned to loci, the value of Fctri was reestimated, and the probability of the null hypothesis was obtained by computing the proportion of permuted cases for which the statistic is equal to or larger than the original value. This test is unilateral by design; i.e., only loci displaying contributions larger than others will be declared disproportionate. Significance levels must then be adjusted for multiple testing (e.g., Bonferroni correction). The program to test for unequal contribution (AnimalFarm, ver. 1.0) is available from http://www.helsinki.fi/~primmer (under publications and data).
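The logic of this permutation test can be sketched as below. This is not the AnimalFarm program; it is a minimal reading of the description above, in which the statistic is taken as the contribution of locus i divided by the mean contribution of the other loci, and each population's shares are shuffled among loci but never among populations. The function names, the number of permutations, the seed, and the toy share matrix are all hypothetical.

```python
import random


def fctr(shares, i):
    """Contribution of locus i relative to the mean of the remaining loci.

    shares[h][j] is the share n_j * (m_j - mu)^2 of population j at locus h.
    The common divisor (k - 1) cancels in the ratio, so row sums suffice.
    """
    totals = [sum(row) for row in shares]
    others = [t for h, t in enumerate(totals) if h != i]
    return totals[i] / (sum(others) / len(others))


def permutation_p_value(shares, i, n_perm=999, seed=0):
    """One-sided p value: proportion of permuted statistics >= observed."""
    rng = random.Random(seed)
    observed = fctr(shares, i)
    n_loci, n_pops = len(shares), len(shares[0])
    hits = 0
    for _ in range(n_perm):
        columns = []
        for j in range(n_pops):
            # Reassign the shares of population j at random among loci.
            col = [shares[h][j] for h in range(n_loci)]
            rng.shuffle(col)
            columns.append(col)
        permuted = [[columns[j][h] for j in range(n_pops)] for h in range(n_loci)]
        if fctr(permuted, i) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy shares (invented): locus 0 dominates, loci 1-4 are comparable.
toy = [[100, 90, 110, 95], [1, 2, 1, 2], [2, 1, 2, 1], [1, 1, 2, 2], [2, 2, 1, 1]]
f_big, p_big = permutation_p_value(toy, 0)
f_small, p_small = permutation_p_value(toy, 1)
```

With several loci tested in turn, the resulting p values would still need the multiple-testing adjustment mentioned in the text.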

While the (δµ)² index is the focal point of this study, disparities among loci contributions are also expected to influence other SMM-based distance measures. Handling repeat unit numbers as a quantitative trait can indeed induce variations in the relative importance of differing loci, which would not be the case for indices based on allele frequencies that always sum up to one (e.g., CAVALLI-SFORZA and EDWARDS 1967; NEI et al. 1983). For comparative purposes, we extended our analyses to a second distance coefficient derived for microsatellite markers, namely Rst (SLATKIN 1995). The two measures are related to a certain point, as allele size variance estimators (within and among populations) unavoidably rely on the sum of squared differences. However, while (δµ)² considers only the average allele size differences among populations, Rst also takes into account the within-population variance, which might compensate for the influence of disproportionately variable loci.

Phylogenetic analyses and topological comparisons:
All trees analyzed in this study were recovered using the following procedure: First, interpopulation genetic distance matrices were obtained by summing the (δµ)² values across loci or by calculating the Rst variance ratio (computations carried out with MsatBootstrap 1.1, available from: http://www.helsinki.fi/~primmer, under publications and data). Then, the corresponding trees were recovered with the Fitch-Margoliash algorithm [computations performed in FITCH from the PHYLIP package, version 3.752 (P = 2, G, and no negative branches allowed); FELSENSTEIN 1995].

The topological similarity between trees was evaluated with the partition metric (Pm; ROBINSON and FOULDS 1981). For a given pair of trees, this index counts the number of clusters occurring in the first or the second tree, but not in both (i.e., the number of topological differences between a pair of trees; PENNY and HENDY 1985). The Pm values were standardized with the maximum number of topological differences between two trees (i.e., 2n − 6, where n is the number of populations in the tree; STEEL and PENNY 1993), and the complement to 1 of this measure was recorded [hereafter referred to as the topological similarity index (TSI); see also LANDRY et al. 1996]. Consequently, topologically identical trees have a TSI of 1, whereas completely different trees have a TSI of 0.
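A minimal sketch of the TSI follows, assuming each unrooted tree has already been reduced to its set of nontrivial bipartitions (splits). The canonical form used here, which keeps the side of each split that excludes an arbitrary reference population, is an implementation choice for the example and not something specified in the text; the function names and the five-population toy trees are likewise hypothetical.

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes the reference taxon."""
    side, taxa = frozenset(side), frozenset(taxa)
    reference = min(taxa)
    return taxa - side if reference in side else side


def tsi(splits_a, splits_b, n_taxa):
    """Topological similarity index: 1 - Pm / (2n - 6)."""
    if n_taxa < 4:
        raise ValueError("the index needs at least four taxa")
    # Partition metric: splits present in one tree but not in the other.
    pm = len(set(splits_a) ^ set(splits_b))
    return 1 - pm / (2 * n_taxa - 6)


# Toy trees (invented) on populations A-E.
taxa = "ABCDE"
tree_1 = {canonical_split("AB", taxa), canonical_split("DE", taxa)}
tree_2 = {canonical_split("AC", taxa), canonical_split("DE", taxa)}
similarity = tsi(tree_1, tree_2, len(taxa))
identical = tsi(tree_1, tree_1, len(taxa))
```

The two toy trees share the (D,E) cluster and differ in the other, so two of a possible four differences are counted.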

Contribution of individual loci in the determination of the topology of a phylogram:
Two strategies were employed to evaluate the contribution of single loci to the structure of a phylogenetic tree. First, we compared the tree that is based on the complete data set (17 loci) to trees obtained when each single locus was excluded in turn, i.e., 17 trees based on 16 loci each (using TSI; see above). The raison d'être of this procedure was to assess whether tree topology would be altered more by the removal of a locus with a larger µ Var than it would when a locus with a smaller µ Var was excluded from the data set. Second, the topology of the tree derived from each single locus was compared to the tree based on all loci, i.e., 17 trees based on 1 locus each (using TSI; see above). The rationale of this second procedure was that a tree based on a locus contributing more to the overall interpopulation distances should be more similar to the tree based on all 17 loci than a tree obtained from a locus with a smaller contribution to the distance matrix.

Topology and bootstrap support of phylograms as a function of the number of assessed loci:
Computer simulation studies have revealed that increasing the number of microsatellites ought to decrease the variance of (δµ)² (GOLDSTEIN et al. 1995B). Accordingly, increasing the number of microsatellites should stabilize the structure of the corresponding phylogram (GOLDSTEIN et al. 1995B; TAKEZAKI and NEI 1996; JIN et al. 2000). To investigate this, we applied two distinct analyses. First, for every given fixed number of loci (from 2 to 17), 100 bootstrap replicates were generated from the 17-locus data set, and their matching (δµ)² or Rst distance matrices were calculated (computations carried out in MsatBootstrap 1.1). Then, the FITCH tree corresponding to each matrix was recovered as described above. Subsequently, the topology of each replicated tree was compared to the topology of the tree based on all 17 loci (using TSI), and the mean TSI value across each set of 100 replicates was recorded as an index of general topological similarity between trees based on a subset of loci and the tree based on all 17 microsatellites.
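The resampling step can be sketched as follows. This is not MsatBootstrap; it assumes that a bootstrap replicate of a given size is formed by drawing loci with replacement and summing their single-locus (δµ)² matrices, and it leaves tree building to an external program. The function name, seed, and toy matrices are hypothetical.

```python
import random


def bootstrap_distance_matrix(per_locus, n_loci, rng):
    """Sum the single-locus matrices of n_loci loci drawn with replacement.

    per_locus: list of square distance matrices, one per locus.
    Returns the indices of the drawn loci and the summed matrix.
    """
    drawn = [rng.randrange(len(per_locus)) for _ in range(n_loci)]
    size = len(per_locus[0])
    total = [[0.0] * size for _ in range(size)]
    for idx in drawn:
        for a in range(size):
            for b in range(size):
                total[a][b] += per_locus[idx][a][b]
    return drawn, total


# Toy single-locus matrices (invented) for three populations;
# the second locus yields far larger distances than the other two.
per_locus = [
    [[0, 1, 4], [1, 0, 9], [4, 9, 0]],
    [[0, 100, 25], [100, 0, 225], [25, 225, 0]],
    [[0, 0, 1], [0, 0, 1], [1, 1, 0]],
]
rng = random.Random(42)
drawn, total = bootstrap_distance_matrix(per_locus, 5, rng)
```

Whenever the second toy locus is drawn at least once, its entries swamp those of the other loci in the summed matrix, which is the mechanism the analyses in this section are designed to expose.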

Second, the topological stability (i.e., bootstrap support) of trees built from either (δµ)² or Rst matrices was evaluated as a function of the number of loci, using the analytical design described above. For every set of 100 phylograms previously obtained, a majority-rule consensus tree was calculated (computations in CONSENSE in PHYLIP; FELSENSTEIN 1995). Then, the mean node-support value of the consensus tree was computed and further used as an indicator of topological stability. Large values of average node support characterize trees with clusters that are well supported by the data (i.e., stable trees) whereas small values indicate that some clusters are poorly supported (i.e., unstable trees).
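The mean node support can be illustrated with a short sketch that counts how often each split occurs among the replicate trees. It assumes a strict majority rule (a split is retained only if it occurs in more than half of the replicates) and does not reproduce CONSENSE; the function name and the four toy replicates are hypothetical.

```python
from collections import Counter


def mean_majority_support(replicate_splits):
    """Return (supported splits with frequencies, mean support).

    replicate_splits: one set of canonical splits per bootstrap tree.
    """
    n = len(replicate_splits)
    counts = Counter(s for tree in replicate_splits for s in set(tree))
    # Assumed strict majority rule: keep splits found in > 50% of trees.
    support = {s: c / n for s, c in counts.items() if c / n > 0.5}
    mean = sum(support.values()) / len(support) if support else 0.0
    return support, mean


# Toy replicates (invented) on populations A-E; each split is written as
# the side that excludes population A.
fs = frozenset
replicates = [
    {fs("DE"), fs("CDE")},
    {fs("DE"), fs("CDE")},
    {fs("DE"), fs("BDE")},
    {fs("CD"), fs("CDE")},
]
support, mean_support = mean_majority_support(replicates)
```

Splits that fall below the majority threshold are left out of the average, so the mean describes only the nodes that appear in the consensus tree.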

Data randomization procedures for evaluating contribution of loci to interpopulation distances and subsequent tree topology:
Two related permutation models were employed to randomize the data:

  1. Single permutation: To eliminate any evolutionary signal that could be related to the differences in the number of repeats between populations, alleles scored at a given locus were randomly assigned to individuals. Under this permutational hypothesis, mean allele sizes are expected to be equal in all populations, leading to expected interpopulation distances approaching zero; any differences in variance between loci are, however, retained.

  2. Double permutation: In this scheme, the single permutation was followed by a second shuffling that permuted alleles among loci within each individual. Thereby any given allele was assigned to a randomly chosen individual and to a randomly chosen locus. In addition to bringing expected interpopulation distances near zero, this procedure aimed at equalizing the variance of allele sizes among loci. All permutation procedures were repeated 10 times, and the same analyses of convergence and stability were repeated for each permuted data set.
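The two permutation models can be sketched as below, assuming diploid genotypes stored as one pair of allele sizes per locus and no missing data. The data layout, function names, seed, and toy genotypes are assumptions made for the illustration and do not come from the programs used in the study.

```python
import random


def single_permutation(genotypes, rng):
    """Shuffle the alleles of each locus among all individuals.

    genotypes[i][l] is the (allele, allele) pair of individual i at locus l.
    """
    n_ind, n_loci = len(genotypes), len(genotypes[0])
    out = [[None] * n_loci for _ in range(n_ind)]
    for locus in range(n_loci):
        pool = [a for ind in genotypes for a in ind[locus]]
        rng.shuffle(pool)
        for i in range(n_ind):
            out[i][locus] = (pool[2 * i], pool[2 * i + 1])
    return out


def double_permutation(genotypes, rng):
    """Single permutation, then shuffle alleles among loci within individuals."""
    out = []
    for ind in single_permutation(genotypes, rng):
        pool = [a for pair in ind for a in pair]
        rng.shuffle(pool)
        out.append([(pool[2 * l], pool[2 * l + 1]) for l in range(len(ind))])
    return out


# Toy genotypes (invented): four individuals scored at two loci.
genotypes = [
    [(10, 12), (30, 31)],
    [(10, 10), (30, 30)],
    [(14, 16), (33, 35)],
    [(16, 16), (35, 35)],
]
rng = random.Random(1)
single = single_permutation(genotypes, rng)
double = double_permutation(genotypes, rng)
```

The single permutation keeps every allele at its own locus, so locus-specific size variance survives; the double permutation keeps only the overall pool of alleles, which is what equalizes the variance among loci.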


*  RESULTS

The (δµ)² genetic interpopulation distances yielded by each locus were found to be extremely variable, maximum values ranging from 4.0 to 169.0 squared repeat units (ru²), with a mean estimate of 27.6 ru² (Table 1). Marked variations in average (δµ)² distances were also observed among loci, ranging from 1.0 ru² (BFRO016) to 194.8 ru² (One2). Thus, it was clear that the influence of individual microsatellites on the overall (δµ)² distance matrix greatly differed among loci. Indeed, only 2 of the 17 loci (namely One2 and BFRO012) accounted for 63.6% of the total average interpopulation distance (Table 1). Accordingly, these two microsatellites exhibited much larger allele size variances (µ Var) than did the remaining 15 loci (Table 1). Rigorous statistical testing revealed that the contribution of the One2 locus was significantly larger than the average contribution of other loci, an assertion that did not hold true for any other locus after Bonferroni correction (see Table 1).


 
Table 1. Diversity characteristics of the 17 loci analyzed, shown in decreasing order of µ Var

Consequently, the importance of each locus in the determination of the structure of the evolutionary relationships among populations was found to differ strikingly among loci. Single removal of strategic loci (e.g., One2 or BFRO013) resulted in dramatic alterations to the topology of the tree (Fig 1). On the other hand, the majority of the other loci could be individually excluded without observing any major change in the topology. In fact, nine loci (53%) could be individually removed without causing a single topological modification (Fig 1). From a reverse angle, the single-locus trees obtained from either One2 or BFRO013 showed the highest similarity with the tree based on complete data set (Fig 1), providing additional evidence that these loci were predominant in the determination of the general structure of the phylogenetic tree.



Figure 1. Association between standard deviation of mean allele size and the influence of each locus on the topology of the (δµ)² phylogenetic tree. Topological comparisons (as measured by TSI) of trees based on sets of 16 loci (obtained by excluding each particular locus) with the tree based on the complete data of 17 loci are indicated by open circles, whereas topological comparisons of each single-locus tree with the 17-locus tree are represented by solid circles.

Analyses of the topological structure of (δµ)² phylograms as a function of an increasing number of loci also indicated the drastic effects that diverging contributions of the microsatellites can have on the recovery of evolutionary relationships (Fig 2). As expected, increasing the number of resampled loci led consistently to a tree that was increasingly more similar to the one based on all 17 loci. Interestingly, however, the rate of increase in the TSI estimates based on real data did not differ from the one obtained with the single-permuted data (Fig 2A). On the other hand, the pattern observed with the double-permutation procedure showed that the complete randomization of alleles (i.e., among individuals and loci) removed any false evolutionary pattern more successfully than the single permutation did (Fig 2A). Interestingly, the same conclusions can be drawn from the trees derived from the variance-based index, Rst, despite the fact that this distance should account for within-population variance (Fig 2C).



Figure 2. Topological similarity index (TSI, A and C) and bootstrap support (B and D) of resampled phylograms as a function of the number of loci for two distance indices, (δµ)² (A and B) and Rst (C and D). Original data (solid circles) and single permutation data (open triangles) show similar trends, while double permutation data (open squares) show a much slower increase in both analyses (see text for permutation methodology). Intervals illustrate the range of values obtained with the replicated permuted data. Results from excluding the most divergent locus are indicated by crosses (original data) and asterisks (single permutation). Some symbols are slightly displaced to the right for clarity.

Another striking conclusion of this study was that the average bootstrap support values of (δµ)² phylograms based on randomized data (single permutation) were comparable to the bootstrap values observed with real data (Fig 2B). Even in the absence of any evolutionary signal (singly permuted data), bootstrap support values averaging up to 50% were recorded, with the relationships of some nodes being supported by bootstrap values as high as 79%. However, the bootstrap support values of the trees derived from randomized data following the double-permutation procedure were considerably decreased (Fig 2B). Results for the Rst exhibited a similar pattern (Fig 2D). As for TSI analyses, the Rst trees based on >14 loci appeared to some extent more stable than those from the singly permuted data.

To further demonstrate the effects of a dominant locus, the analyses were rerun after removal of the locus displaying a contribution significantly larger than all others (i.e., One2); this partial data set was also submitted to the single permutation procedure [(δµ)² only]. Following this, the topological convergence (i.e., increase in the TSI; Fig 2A) and stability levels (bootstrap support; Fig 2B) of trees observed were found to be comparable to those obtained with the double permutation of the complete data set.


*  DISCUSSION

Collectively, these results, based on data from 17 populations spanning the natural range of European grayling, reinforce earlier findings from four human populations (COOPER et al. 1999) that some loci can have a markedly larger influence than others in the determination of (δµ)² genetic distances when several loci are combined (see also GOODMAN 1997). More importantly, the results presented here demonstrate for the first time that loci contributing disproportionately to the distance matrix can also strongly bias subsequent phylogenetic tree construction and internal validation procedures. To diagnose this problem within a data set before undertaking any phylogenetic analysis, we have formulated a simple test based on permutations aimed at detecting unequal locus contributions, providing a robust framework to address this problem and identify outliers.

The contribution of a locus (differences in size among alleles) is not necessarily related to its phylogenetic informativeness (ANGERS and BERNATCHEZ 1997; ESTOUP and ANGERS 1998), due to factors including size homoplasy (ESTOUP et al. 1995; ANGERS and BERNATCHEZ 1997), variation in mutation rates within (PRIMMER et al. 1998; SCHLOTTERER et al. 1998; CROZIER et al. 1999) and between (BRINKMANN et al. 1998; KAYSER et al. 2000) loci, and allele size constraints (GARZA et al. 1995; LEHMANN et al. 1996). Another fundamental problem is that incompatibility of phylogenetic information contained in different microsatellite loci is common. Nevertheless, validation procedures (such as bootstrapping) are aimed at revealing the degree of incompatibility within the data, assessing the degree of confidence of evolutionary trees derived from multiple loci. However, our results illustrate that inequalities among loci can falsely increase the topological congruence (TSI) and stability (bootstrap support) of corresponding phylogenetic trees. Therefore, standard resampling procedures cannot accurately assess the reliability of a tree in the presence of important inequalities among contribution of loci, neutralizing an important instrument to identify the possible phylogenetic incongruence within the data. This also implies that it could be very difficult to recover the true evolutionary relationships among populations in a case where one dominating locus (i.e., with disproportionate contribution) would lead to a phylogenetic tree that is erroneous.

This is evidenced in the European grayling data set, where single-permuted (randomized) data that produced trees of similar "quality" to the trees based on the original data indicated that the increase in TSI estimates of the nonpermuted microsatellite data with increasing locus number was not a reflection of a meaningful evolutionary pattern. Given that alleles were permuted within each locus separately so that the locus-specific allele size variances were retained, the most likely explanation is that the observed TSI estimates were governed by differences in the magnitude of (δµ)² or Rst distances among loci. Indeed, equalizing the variance among loci with the double permutation procedure was enough to remove most of the spurious increase of the TSI in the randomized data, confirming that the increase of TSI within the singly permuted data can be explained solely by differences of mean allele size variance among loci.

Analyses of the bootstrap support values of phylogenetic trees indicated a comparable trend. The similar increase of the average bootstrap values observed in the single-permuted, compared to the original, data substantiates the idea that the reliability of the topology of a phylogenetic tree obtained with (δµ)² can be governed by factors that are not related to any evolutionary pattern. The most likely explanation for this finding is that the topological stability was artificially increased due to the effects of larger distances of loci displaying higher variance of mean allele sizes. In support of this, equalizing the variance using the double permutation procedure confirmed that the patterns observed with the single permutation of the complete data (17 loci) are attributable exclusively to differences in variance among loci. It is worth noting that the same analyses, when applied to non-SMM-based distance coefficients [DA (NEI et al. 1983) and DCE (CAVALLI-SFORZA and EDWARDS 1967)], did not reveal any noteworthy increase in the topological stability of phylogenetic trees derived from the single-permuted data (either TSI or bootstrap support; M. T. KOSKINEN, H. HIRVONEN, P.-A. LANDRY and C. R. PRIMMER, unpublished results). Therefore, the surprising stability of trees derived from randomized data does not result from a bias caused by the permutation procedure.

Several studies reported that trees based on (δµ)² exhibited lower bootstrap values than did trees based on other distance measures (such as those of CAVALLI-SFORZA and EDWARDS 1967 or NEI et al. 1983; GOLDSTEIN et al. 1995b; TAKEZAKI and NEI 1996; JIN et al. 2000; M. T. KOSKINEN, H. HIRVONEN, P.-A. LANDRY and C. R. PRIMMER, unpublished results). It has been argued that the poor stability of (δµ)² trees results from the high sampling variance associated with (δµ)² distances and that increasing the number of loci should increase the stability of such trees (GOLDSTEIN et al. 1995b). While this appeared to be the case with our data, deeper analysis revealed that the increase in stability could be explained solely by the large inequalities among the contributions of individual loci to the (δµ)² distances. The findings of this study reinforce that one should be cautious when using the (δµ)² coefficient to reconstruct the evolutionary history of populations, especially when the among-population variance of loci is heteroscedastic. The effects of inequalities among loci are likely to be more important in relatively small data sets, because the variance of (δµ)² is expected to decrease as the number of loci increases (GOLDSTEIN et al. 1995b). This consequence might be of primary importance for studies of wild species in which, on average, only six microsatellites are currently used in the estimation of evolutionary relationships (M. T. KOSKINEN, H. HIRVONEN, P.-A. LANDRY and C. R. PRIMMER, unpublished results). These findings also suggest that caution is warranted when using so-called "hybrid trees," in which the tree topology is determined using a non-SMM distance measure and interpopulation (δµ)² distances are then applied to the tree (ANGERS and BERNATCHEZ 1997; ESTOUP and ANGERS 1998). Interpopulation distances based on (δµ)² can be erroneously influenced by a small number of loci, implying that even a posteriori distance adjustments could in practice be very sensitive to highly disproportionate loci.
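
To make the notion of a disproportionate locus concrete, the following minimal Python sketch computes (δµ)² as the squared difference in mean allele size averaged over loci, together with each locus's share of the summed squared differences. The function names and the toy allele sizes are hypothetical illustrations only and are not the grayling data or the software used in this study.

```python
from statistics import mean

def locus_sq_diffs(pop_a, pop_b):
    """Per-locus squared difference in mean allele size between two populations.

    pop_a, pop_b: dict mapping locus name -> list of allele sizes (repeat units).
    """
    return {locus: (mean(pop_a[locus]) - mean(pop_b[locus])) ** 2
            for locus in pop_a}

def delta_mu_sq(pop_a, pop_b):
    """(delta mu)^2: the per-locus squared mean differences averaged over loci."""
    diffs = locus_sq_diffs(pop_a, pop_b)
    return sum(diffs.values()) / len(diffs)

def locus_shares(pop_a, pop_b):
    """Fraction of the summed squared differences contributed by each locus.

    Assumes at least one locus differs in mean allele size (nonzero total).
    """
    diffs = locus_sq_diffs(pop_a, pop_b)
    total = sum(diffs.values())
    return {locus: d / total for locus, d in diffs.items()}

# Hypothetical toy data: three loci, four sampled alleles per population.
toy_a = {"L1": [10, 11, 12, 11], "L2": [20, 21, 20, 21], "L3": [15, 15, 16, 16]}
toy_b = {"L1": [11, 12, 12, 13], "L2": [21, 21, 22, 22], "L3": [25, 26, 25, 26]}

distance = delta_mu_sq(toy_a, toy_b)
shares = locus_shares(toy_a, toy_b)
print(distance)
for locus, share in shares.items():
    print(locus, round(share, 3))
```

In this toy example one of the three loci accounts for nearly all of the distance, which is the kind of imbalance discussed above.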

Data sets comprising large numbers of loci can indeed display reduced (δµ)² variance; nevertheless, it has been shown that a very small number of loci (i.e., 2 out of 213) can contribute almost one-half of the overall distance (COOPER et al. 1999). Thus, increasing the number of loci does not necessarily solve the problem of disproportionate loci. COOPER et al. (1999) also proposed several corrections to the data prior to calculating the (δµ)² distances to normalize the outlying loci, but with limited improvements. The main difficulty is to make an adjustment that does not alter the linearity of the (δµ)² index with time. For example, allele sizes could be standardized before analyses, ensuring that each locus has an equal weight in the distance calculations (GOODMAN 1997). While this procedure is warranted for Rst calculations, it is not advisable for (δµ)² because large distances will be more constrained than small ones by the standardization, sacrificing the linearity with divergence time.
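
The loss of linearity can be illustrated with a small Python sketch. It assumes one particular standardization, namely rescaling the pooled allele sizes at a locus to mean 0 and variance 1, which may differ in detail from the procedure of GOODMAN (1997); the function name and the toy allele sizes are hypothetical.

```python
from statistics import mean, pstdev

def standardized_sq_diff(pop_a, pop_b):
    """Squared mean difference at one locus after z-scoring pooled allele sizes.

    Assumption: standardization uses the pooled mean and the pooled
    (population) standard deviation of both samples combined.
    """
    pooled = pop_a + pop_b
    m, s = mean(pooled), pstdev(pooled)
    if s == 0:
        raise ValueError("locus is monomorphic in the pooled sample")
    za = [(x - m) / s for x in pop_a]
    zb = [(x - m) / s for x in pop_b]
    return (mean(za) - mean(zb)) ** 2

# Hypothetical locus: the second population is the first shifted by `shift`
# repeat units, so the raw squared mean difference is shift ** 2.
base = [9, 10, 10, 11]
results = {}
for shift in (1, 2, 4, 8, 16, 32):
    shifted = [x + shift for x in base]
    results[shift] = standardized_sq_diff(base, shifted)
    print(shift, shift ** 2, round(results[shift], 3))
```

For two equal-sized samples with within-population variance w and mean difference d, the standardized squared difference under this assumption is d²/(w + d²/4), which approaches 4 as d grows. The raw squared difference keeps increasing while the standardized value levels off, so large distances are compressed more than small ones.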

Given that differences in locus contributions arise in part because of different range constraints and heterogeneous mutation rates among loci (ESTOUP and ANGERS 1998), a distance measure accounting for these two parameters should produce more accurate distance estimates. Some distance measures were indeed designed to account for different mutation rates and size constraints among loci (e.g., DGLS; POLLOCK et al. 1998), but their use requires reasonably accurate estimates of the evolutionary parameters of each microsatellite, which at present cannot be reliably obtained (ESTOUP and ANGERS 1998; COOPER et al. 1999; ELLEGREN 2000). Therefore, in cases where outlying loci can be identified using the test reported here, the simplest and most straightforward option for eliminating any false phylogram stability (bootstrap support) and convergence toward a final solution (TSI) is to remove the offending loci from subsequent analyses (e.g., ZHIVOTOVSKY et al. 2000).
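
As a minimal sketch of this exclusion step, the Python function below recomputes (δµ)² after dropping a set of flagged loci. The flagged set is taken as given (the statistical test itself is not reproduced here), and the function name, locus names, and allele sizes are hypothetical.

```python
from statistics import mean

def delta_mu_sq_excluding(pop_a, pop_b, exclude=()):
    """(delta mu)^2 averaged over all loci not listed in `exclude`.

    pop_a, pop_b: dict mapping locus name -> list of allele sizes.
    exclude: loci already flagged as disproportionate by a separate test.
    """
    kept = [locus for locus in pop_a if locus not in exclude]
    if not kept:
        raise ValueError("no loci left after exclusion")
    sq_diffs = [(mean(pop_a[l]) - mean(pop_b[l])) ** 2 for l in kept]
    return sum(sq_diffs) / len(kept)

# Hypothetical toy data in which locus "L3" dominates the distance.
pop_x = {"L1": [10, 11, 12, 11], "L2": [20, 21, 20, 21], "L3": [15, 15, 16, 16]}
pop_y = {"L1": [11, 12, 12, 13], "L2": [21, 21, 22, 22], "L3": [25, 26, 25, 26]}

all_loci = delta_mu_sq_excluding(pop_x, pop_y)
without_l3 = delta_mu_sq_excluding(pop_x, pop_y, exclude={"L3"})
print(all_loci, without_l3)
```

In practice the same reduced set of loci would be used for every pair of populations before the tree and its bootstrap support are recomputed.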


*  CONCLUSION

The problem of unequal contributions of microsatellites, combined with the use of an SMM-based distance coefficient [(δµ)² or Rst], should be considered when assessing the evolutionary relationships among populations, especially when using validation procedures based on resampling. The results presented here suggest that the use of (δµ)² or Rst should be restricted to arrays of loci displaying comparable amounts of variance, to minimize the influence of exceptionally variable loci. Thus, in establishing the phylogenetic relationships among populations with (δµ)², all microsatellites are considered equal, but it clearly appears that some are more equal than others ... (to paraphrase ORWELL 1945).


*  ACKNOWLEDGMENTS

The authors are grateful to F.-J. Lapointe for stimulating discussions in the early stages of this study and to J. N. Painter, D. L. Johnson, and three anonymous reviewers for helpful suggestions to improve this manuscript. We also thank the researchers who provided the T. thymallus samples from across Europe. This work was supported by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship awarded to P.-A. Landry and by the Biological Interactions Graduate School, the University of Helsinki, and the Academy of Finland (project no. 172964 and Centre of Excellence Program 2000-2005, grant no. 44887).

Manuscript received September 6, 2001; Accepted for publication May 6, 2002.


*  LITERATURE CITED

ANGERS, B. and L. BERNATCHEZ, 1997  Complex evolution of a salmonid microsatellite locus and its consequences in inferring allelic divergence from size information. Mol. Biol. Evol. 14:230-238.

BALLOUX, F., H. BRÜNNER, N. LUGON-MOULIN, J. HAUSSER, and J. GOUDET, 2000  Microsatellites can be misleading: an empirical and simulation study. Evolution 54:1414-1422.

BOWCOCK, A. M., A. RUIZ-LINARES, J. TOMFOHRDE, E. MINCH, and J. R. KIDD et al., 1994  High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455-457.

BRINKMANN, B., M. KLINTSCHAR, F. NEUHUBER, J. HUHNE, and B. ROLF, 1998  Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet. 62:1408-1415.

BRÜNNER, P. C., M. R. DOUGLAS, and L. BERNATCHEZ, 1998  Microsatellite and mitochondrial DNA assessment of population structure and stocking effects in Arctic charr Salvelinus alpinus (Teleostei: Salmonidae) from central Alpine lakes. Mol. Ecol. 7:209-223.

CALAFELL, F., A. SHUSTER, W. C. SPEED, J. R. KIDD, and K. K. KIDD, 1998  Short tandem repeat polymorphism evolution in humans. Eur. J. Hum. Genet. 6:38-49.

CARPENTER, J. M., 1992  Random cladistics. Cladistics 8:147-153.

CAVALLI-SFORZA, L. L. and A. W. F. EDWARDS, 1967  Phylogenetic analysis: models and estimation procedures. Am. J. Hum. Genet. 19:233-257.

COOPER, G., W. AMOS, R. BELLAMY, M. R. SIDDIQUI, and A. FRODSHAM et al., 1999  An empirical exploration of the (δµ)² genetic distance for 213 human microsatellite markers. Am. J. Hum. Genet. 65:1125-1133.

CROZIER, R. H., B. KAUFMANN, M. E. CAREW, and Y. C. CROZIER, 1999  Mutability of microsatellites developed for the ant Camponotus consobrinus. Mol. Ecol. 8:271-276.

DI RIENZO, A., P. DONNELLY, C. TOOMAJIAN, B. SISK, and A. HILL et al., 1998  Heterogeneity of microsatellite mutations within and between loci and implications for human demographic histories. Genetics 148:1269-1284.

ELLEGREN, H., 2000  Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet. 16:551-558.

ESTOUP, A., and B. ANGERS, 1998 Microsatellites and minisatellites for molecular ecology: theoretical and empirical considerations, pp. 55–86 in Advances in Molecular Ecology, edited by G. R. CARVALHO. IOS Press, Amsterdam.

ESTOUP, A., C. TAILLIEZ, J. M. CORNUET, and M. SOLIGNAC, 1995  Size homoplasy and mutational processes of interrupted microsatellites in two bee species, Apis mellifera and Bombus terrestris (Apidae). Mol. Biol. Evol. 12:1074-1084.

FAITH, D. P. and P. S. CRANSTON, 1991  Could a cladogram this short have arisen by chance alone? On permutation tests for cladistic structure. Cladistics 7:1-28.

FELSENSTEIN, J., 1995 Phylip (Phylogeny Inference Package) Version 3.572. Department of Genetics, University of Washington, Seattle.

GARZA, J. C., M. SLATKIN, and N. B. FREIMER, 1995  Microsatellite allele frequencies in humans and chimpanzees, with implications for constraints on allele size. Mol. Biol. Evol. 12:594-603.

GOLDSTEIN, D. B., and C. SCHLÖTTERER (Editors), 1999 Microsatellites: Evolution and Applications. Oxford University Press, Oxford.

GOLDSTEIN, D. B., A. RUIZ-LINARES, L. L. CAVALLI-SFORZA, and M. W. FELDMAN, 1995a  Genetic absolute dating based on microsatellites and the origin of modern humans. Proc. Natl. Acad. Sci. USA 92:6723-6727.

GOLDSTEIN, D. B., A. RUIZ-LINARES, L. L. CAVALLI-SFORZA, and M. W. FELDMAN, 1995b  An evaluation of genetic distances for use with microsatellite loci. Genetics 139:463-471.

GOLDSTEIN, D. B., G. W. ROEMER, D. A. SMITH, D. E. REICH, and A. BERGMAN et al., 1999  The use of microsatellite variation to infer population structure and demographic history in a natural model system. Genetics 151:797-801.

GOODMAN, S. J., 1997  Rst Calc: a collection of computer programs for calculating unbiased estimates of genetic differentiation and determining their significance for microsatellite data. Mol. Ecol. 6:881-885.

GOODMAN, S. J., 1998  Patterns of extensive genetic differentiation and variation among European harbor seals (Phoca vitulina vitulina) revealed using microsatellite DNA polymorphisms. Mol. Biol. Evol. 15:104-118.

HARR, B., B. ZANGERL, G. BREM, and C. SCHLÖTTERER, 1998  Conservation of locus specific microsatellite variability across species: a comparison of two Drosophila sibling species D. melanogaster and D. simulans. Mol. Biol. Evol. 15:176-184.

JIN, L., M. L. BASKETT, L. L. CAVALLI-SFORZA, L. A. ZHIVOTOVSKY, and M. W. FELDMAN et al., 2000  Microsatellite evolution in modern humans: a comparison of two data sets from the same populations. Ann. Hum. Genet. 64:117-134.

KAYSER, M., L. ROEWER, M. HEDMAN, L. HENKE, and J. HENKE et al., 2000  Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs. Am. J. Hum. Genet. 66:1580-1588.

KIMMEL, M., R. CHAKRABORTY, D. N. STIVERS, and R. DEKA, 1996  Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143:549-555.

KOSKINEN, M. T. and C. R. PRIMMER, 2001  High throughput analysis of 17 microsatellite loci in grayling (Thymallus spp. Salmonidae). Conserv. Genet. 2:173-177.

KOSKINEN, M. T., E. RANTA, J. PIIRONEN, A. VESELOV, and S. TITOV et al., 2000  Genetic lineages and postglacial colonization of grayling (Thymallus thymallus, Salmonidae) in Europe, as revealed by mitochondrial DNA analyses. Mol. Ecol. 9:1609-1624.

KOSKINEN, M. T., J. NILSSON, A. JE. VESELOV, A. G. POTUTKIN, E. RANTA, and C. R. PRIMMER, 2002  Microsatellite data resolve phylogeographic patterns in European grayling, Thymallus thymallus, Salmonidae. Heredity 88:391-401.

LANDRY, P.-A., F.-J. LAPOINTE, and J. A. W. KIRSCH, 1996  Estimating phylogenies from lacunose distance matrices: additive is superior to ultrametric estimation. Mol. Biol. Evol. 13:818-823.

LAPOINTE, F.-J., 1998 How to validate phylogenetic trees: a stepwise procedure, pp. 71–88 in Data Science, Classification and Related Methods, edited by C. HAYASHI, N. OHSUMI, K. YAJIMA, Y. TANAKA, H.-H. BOCK et al. Springer, Tokyo.

LEHMANN, T., W. A. HAWLEY, and F. H. COLLINS, 1996  An evaluation of evolutionary constraints on microsatellite loci using null alleles. Genetics 144:1155-1163.

NEI, M., F. TAJIMA, and Y. TATENO, 1983  Accuracy of estimated phylogenetic trees from molecular data. J. Mol. Evol. 19:153-170.

OHTA, T. and M. KIMURA, 1973  The model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a genetic population. Genet. Res. 22:201-204.

ORWELL, G., 1945 Animal Farm. Secker & Warburg, London.

PENNY, D. and M. D. HENDY, 1985  The use of tree comparison metrics. Syst. Zool. 34:75-82.

PÉREZ-LEZAUN, A., F. CALAFELL, E. MATEU, D. COMAS, and R. RUIZ-PACHECO et al., 1997  Microsatellite variation and the differentiation of modern humans. Hum. Genet. 1:1-7.

POLLOCK, D. D., A. BERGMAN, M. W. FELDMAN, and D. B. GOLDSTEIN, 1998  Microsatellite behavior with range constraints: parameter estimation and improved distances for use in phylogenetic reconstruction. Theor. Popul. Biol. 53:256-271.

PRIMMER, C. R., H. ELLEGREN, N. SAINO, and A. P. MØLLER, 1996  Directional evolution in germline microsatellite mutations. Nat. Genet. 13:391-393.

PRIMMER, C. R., N. SAINO, A. P. MØLLER, and H. ELLEGREN, 1998  Unravelling the processes of microsatellite evolution through analysis of germline mutations in barn swallows Hirundo rustica. Mol. Biol. Evol. 15:1047-1054.

RITZ, L. R., M.-L. GLOWATZKI-MULLIS, D. E. MACHUGH, and C. GAILLARD, 2000  Phylogenetic analysis of the tribe Bovini using microsatellites. Anim. Genet. 31:178-185.

ROBINSON, D. F. and L. R. FOULDS, 1981  Comparison of phylogenetic trees. Math. Biosci. 53:131-147.

SANDERSON, M. J., 1995  Objections to bootstrapping phylogenies: a critique. Syst. Biol. 44:299-320.

SCHLÖTTERER, C., 2000  Evolutionary dynamics of microsatellite DNA. Chromosoma 109:365-371.

SCHLÖTTERER, C., R. RITTER, B. HARR, and G. BREM, 1998  High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates. Mol. Biol. Evol. 15:1269-1274.

SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWINKLE, 1993  VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134:983-993.

SHRIVER, M. D., L. JIN, E. BOERWINKLE, R. DEKA, and R. E. FERRELL et al., 1995  A novel measure of genetic distance for highly polymorphic tandem repeat loci. Mol. Biol. Evol. 12:914-920.

SLATKIN, M., 1995  A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457-462.

SOKAL, R. R., and F. J. ROHLF, 1995 Biometry—The Principles and Practices of Statistics in Biological Research, Ed. 3. W. H. Freeman, New York.

STEEL, M. A. and D. PENNY, 1993  Distributions of tree comparison metrics—some new results. Syst. Biol. 42:126-141.

TAKEZAKI, N. and M. NEI, 1996  Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389-399.

TESSIER, N. and L. BERNATCHEZ, 2000  A genetic assessment of single versus double origin of landlocked Atlantic salmon (Salmo salar) from Lake Saint-Jean, Québec, Canada. Can. J. Fish. Aquat. Sci. 57:797-804.

VALDES, A. M., M. SLATKIN, and N. B. FREIMER, 1993  Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:739-749.

VALSECCHI, E., P. PALSBØLL, P. HALE, D. GLOCKNER-FERRARI, and M. FERRARI et al., 1997  Microsatellite genetic distances between oceanic populations of the humpback whale (Megaptera novaeangliae). Mol. Biol. Evol. 14:335-362.

WEBER, J. L. and C. WONG, 1993  Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128.

WEST, J. G. and D. P. FAITH, 1990  Data, methods and assumptions in phylogenetic inferences. Aust. Syst. Bot. 3:9-20.

ZHIVOTOVSKY, L. A., 2001  Estimating divergence time with the use of microsatellite genetic distances: impacts of population growth and gene flow. Mol. Biol. Evol. 18:700-709.

ZHIVOTOVSKY, L. A. and M. W. FELDMAN, 1995  Microsatellite variability and genetic distances. Proc. Natl. Acad. Sci. USA 92:11549-11552.

ZHIVOTOVSKY, L. A., L. BENNETT, A. M. BOWCOCK, and M. W. FELDMAN, 2000  Human population expansion and microsatellite variation. Mol. Biol. Evol. 17:757-767.



