Genetics, Vol. 151, 285-296, January 1999, Copyright © 1999

Multiple Levels of Single-Strand Slippage at Cetacean Tri- and Tetranucleotide Repeat Microsatellite Loci

Per J. Palsbøll (a,b), Martine Bérubé (b), and Hanne Jørgensen (b)
(a) Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525
(b) Department of Population Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark

Corresponding author: Per J. Palsbøll, School of Biological Sciences, University of Wales, Deiniol Rd., Bangor, Gwynedd LL57 2UW, Wales. E-mail: p.palsboll@bangor.ac.uk

Communicating editor: S. YOKOYAMA


*  ABSTRACT

Between three and six tri- and tetranucleotide repeat microsatellite loci were analyzed in 3720 samples collected from four different species of baleen whales. Ten of the 18 species/locus combinations had imperfect allele arrays, i.e., some alleles differed in length by other than simple integer multiples of the basic repeat length. The estimate of the average number of alleles and heterozygosity was higher at loci with imperfect allele arrays relative to those with perfect allele arrays. Nucleotide sequences of 23 different alleles at one tetranucleotide repeat microsatellite locus in fin whales, Balaenoptera physalus, and humpback whales, Megaptera novaeangliae, revealed sequence changes including perfect repeats only, multiple repeats, and partial repeats. The relative rate of the latter two categories of mutation was estimated at 0.024 of the mutation rate involving perfect repeats only. It is hypothesized that single-strand slippage of partial repeats may provide a mechanism for counteracting the continuous expansion of microsatellite loci, which is the logical consequence of recent reports demonstrating directional mutations. Partial-repeat mutations introduce imperfections in the repeat array, which subsequently could reduce the rate of single-strand slippage. Limited computer simulations confirmed this predicted effect of partial-repeat mutations.


ANALYSES of microsatellite loci are now commonplace in evolutionary and genetic studies of natural populations. Microsatellite loci are nucleotide sequences of one to five nucleotides arranged in tandem (TAUTZ 1989; WEBER and MAY 1989), with mutation rates as high as 10^-5 to 10^-2 (WEBER and WONG 1993; TALBOT et al. 1995; AMOS and RUBINSZTEIN 1996A; PRIMMER et al. 1996). The allelic states at microsatellite loci are usually scored from their molecular weight, and the subsequent data analysis relies on a mutational mechanism of single-strand slippage during replication (LEVINSON and GUTMAN 1987A, LEVINSON and GUTMAN 1987B), mainly of single repeats (SCHLOTTERER and TAUTZ 1992; MAHTANI and WILLARD 1993; WEBER and WONG 1993; TALBOT et al. 1995; AMOS and RUBINSZTEIN 1996A; PRIMMER et al. 1996; but see also GRIMALDI and CROUAU-ROY 1997; ORTI et al. 1997). This stepwise mode of mutation combined with the high mutation rates violates the assumptions of the commonly used infinite allele/site models. This, in turn, has necessitated development of novel measures of genetic divergence specifically for the analysis of microsatellite data (e.g., GOLDSTEIN et al. 1995A, GOLDSTEIN et al. 1995B; SHRIVER et al. 1995; SLATKIN 1995; KIMMEL and CHAKRABORTY 1996). Although most investigations of these novel statistics presented so far have been based on a simple symmetrical stepwise mutation model, the proposed statistics also accommodate more complicated distributions of changes in repeat numbers, including asymmetrical and multirepeat mutations (KIMMEL and CHAKRABORTY 1996; KIMMEL et al. 1996).

Several reports have presented analyses of microsatellite data that demonstrated deviations from the null expectations of the simple symmetrical, stepwise mutation model. Likely explanations for the observed deviations are constraints on the number of repeats (GARZA et al. 1995), presence of multirepeat mutations (DI RIENZO et al. 1994; AMOS and RUBINSZTEIN 1996A; PRIMMER et al. 1996), and/or directional mutation toward more repeats (ELLEGREN et al. 1995; RUBINSZTEIN et al. 1995; AMOS and RUBINSZTEIN 1996A; PRIMMER et al. 1996).

A serious obstacle to additional insight into the mode of evolution at microsatellite loci is the fact that the only phylogenetic signal contained in the repeat array itself is the number of repeats. Hence, investigations of the mode of evolution at microsatellite loci have mainly relied on indirect analyses of deviations from the null expectations, either by estimating the probability of the observed data under specific evolutionary models (e.g., SHRIVER et al. 1993; DI RIENZO et al. 1994; NIELSEN 1997), by including sequence data from other linked loci (e.g., JIN et al. 1996; ORTI et al. 1997), or by direct identification of germ-line mutations (e.g., AMOS and RUBINSZTEIN 1996B; PRIMMER et al. 1996). An alternative approach, which has been pursued by several authors, is the analysis of loci with interrupted or compound microsatellite repeat arrays (e.g., ESTOUP et al. 1995; GARZA et al. 1995; GARZA and FREIMER 1996; MESSIER et al. 1996; ANGERS and BERNATCHEZ 1997). These studies showed that the nucleotide sequence of the microsatellite array at such loci often provides additional evolutionary data not obtainable from the molecular weight alone and that, indeed, the difference in molecular weight may not be a reliable indicator of evolutionary distance. However, because one of the major advantages of microsatellite analyses over traditional sequence analyses in population assays is that alleles are scored by the molecular weight, such an additional sequence analysis will require a substantial increase in effort.

Here we present the results from a study of cetacean tri- and tetranucleotide repeat microsatellite loci where some alleles differ in length by other than simple multiples of the basic repeat length. These microsatellite loci differ from the interrupted and compound microsatellite loci presented previously by the fact that alleles at a locus can be divided into groups that represent different evolutionary lineages from the molecular weight alone. Sequencing of alleles at one locus in two species revealed a complex pattern of single-strand slippage on several levels that involved not only single repeats, but also multiple and partial repeats. We estimated the rate of mutations that included such imperfect or partial repeats at ~2.4% of the rate of mutations involving only perfect repeats (most of which are presumably single-step mutations).


*  MATERIALS AND METHODS AND RESULTS

Sample collection:
A total of 3720 tissue samples were analyzed. The majority were obtained from free-ranging whales as skin biopsies (PALSBOLL et al. 1991) or sloughed skin (CLAPHAM et al. 1993), and a few were obtained during whaling operations (coastal subsistence hunting and premoratorium commercial whaling operations). Samples from fin whales, Balaenoptera physalus, were collected in the North Atlantic, the Mediterranean Sea, and the Sea of Cortez in the North Pacific Ocean (BERUBE et al. 1998). Minke whale, B. acutorostrata, blue whale, B. musculus, and humpback whale, Megaptera novaeangliae, samples (PALSBOLL et al. 1997A) were collected only in the North Atlantic (Table 1). Samples were preserved either by freezing at -20° to -80°, conservation in saturated NaCl with 20% DMSO (AMOS and HOELZEL 1991), or both.


 

 
Table 1. Number and origin of samples

Genotyping of microsatellite loci:
Total-cell DNA was extracted following standard procedures of cell lysis by addition of 1% SDS, overnight digestion with proteinase K, multiple extractions with phenol/chloroform, and finally ethanol precipitation (MANIATIS et al. 1982). Three to six tri- and tetranucleotide repeat microsatellite loci were analyzed in each sample as described (PALSBOLL et al. 1997B; Table 1 and Table 3). In addition, the first 289–302 nucleotides of the mitochondrial control region were sequenced and the sex was determined for each sample following the procedures outlined in PALSBOLL et al. 1995 and BERUBE and PALSBOLL 1996A, BERUBE and PALSBOLL 1996B. Any two samples with identical genotypes at all analyzed microsatellite loci, identical mitochondrial control region sequences, and the same sex were inferred to be duplicate samples from the same individual whale (in both the fin whale and blue whale one dinucleotide repeat microsatellite locus was analyzed as well). Using these criteria, the 3720 samples were inferred to originate from a total of 2975 individual whales (Table 1).


 
View this table:
In this window
In a new window

 
Table 2. Allele lengths and frequencies at locus GATA028 in the Gulf of St. Lawrence fin whale sample


 
View this table:
In this window
In a new window

 
Table 3. Number of alleles and subarrays per locus

Number of alleles and allele-length distributions:
Two kinds of intraspecific allele-length distributions were observed in the analyzed samples: "perfect allele arrays," in which the length of all alleles differed by simple integer multiples of the basic repeat length, and "imperfect allele arrays," where some alleles differed in length by other than simple integer multiples of the basic repeat length (see Table 2 for an example). The alleles at each imperfect allele array could be further subdivided into "subarrays," each containing alleles that differed in length only by simple multiples of the basic repeat length (Table 2).
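The partition into subarrays can be sketched in a few lines of Python. This is only an illustration of the definition above, not the scoring procedure used in the study: the function names and the example allele lengths are hypothetical, and alleles are grouped simply by their length modulo the basic repeat length.

```python
from collections import defaultdict


def split_into_subarrays(allele_lengths, repeat_length):
    """Group distinct allele lengths (in nucleotides) so that lengths within a
    group differ only by whole multiples of the basic repeat length."""
    groups = defaultdict(list)
    for length in sorted(set(allele_lengths)):
        groups[length % repeat_length].append(length)
    return list(groups.values())


def is_perfect_array(allele_lengths, repeat_length):
    """A perfect allele array consists of a single subarray."""
    return len(split_into_subarrays(allele_lengths, repeat_length)) == 1


# Hypothetical allele lengths at a tetranucleotide locus (not data from Table 2).
lengths = [180, 184, 188, 185, 189, 192, 186]
subarrays = split_into_subarrays(lengths, 4)
print(subarrays)
```

With these made-up lengths the locus would count as an imperfect allele array with three subarrays.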

Of the 18 species/loci combinations analyzed, 10 had imperfect and 8 perfect allele arrays (Table 3). Within and among species, we observed a higher number of alleles at loci with imperfect allele arrays relative to loci with perfect allele arrays. We observed an average of 8.2 (range: 6–11) and 14.6 (range: 8–27) alleles at loci with perfect and imperfect allele arrays, respectively, and between two and four subarrays at loci with imperfect allele arrays. Not surprisingly (given the difference in the number of alleles), we estimated a higher degree of heterozygosity (H) at loci with imperfect allele arrays as well (Table 7).


 
View this table:
In this window
In a new window

 
Table 4. Nucleotide sequences of the microsatellite array for alleles detected at locus GATA028 in fin and humpback whales


 
View this table:
In this window
In a new window

 
Table 5. Estimated values of R̂ from simulations under a model of constant population size


 
View this table:
In this window
In a new window

 
Table 6. Estimates of R̂ under a model of population expansion


 
View this table:
In this window
In a new window

 
Table 7. Observed values of θ̂[S] and heterozygosity (H) for selected populations

To test whether the observed number of alleles and heterozygosity at loci with perfect allele arrays were indeed significantly lower than at loci with imperfect allele arrays, we ranked the observed number of alleles (Table 3) or estimated heterozygosity (Table 7) within each species. The test statistic (S_OBS) was calculated as the sum, across all species, of the ranks assigned to the loci with perfect allele arrays.

The probability of S_OBS was estimated from 10,000 permutations. For each permutation and each species, the observed ranks were randomly reassigned to the analyzed loci, and the sum of the ranks (S_SIM) assigned to loci with perfect allele arrays was calculated. The probability of S_OBS was estimated as the proportion of permutations where S_SIM was equal to or smaller than S_OBS. The tests did not include the data from B. musculus, as only loci with perfect allele arrays were observed in this species.
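The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function name, the fixed random seed, and the example ranks are assumptions, and the real ranks would come from Table 3 or Table 7.

```python
import random


def rank_sum_permutation_test(ranks_by_species, perfect_by_species,
                              n_perm=10000, seed=1):
    """Return (S_OBS, estimated probability) for the rank-sum statistic.

    ranks_by_species: one list of within-species ranks per species.
    perfect_by_species: matching lists of booleans, True where the locus
    has a perfect allele array.
    """
    rng = random.Random(seed)
    pairs = list(zip(ranks_by_species, perfect_by_species))
    s_obs = sum(r for ranks, flags in pairs
                for r, perfect in zip(ranks, flags) if perfect)
    hits = 0
    for _ in range(n_perm):
        s_sim = 0.0
        for ranks, flags in pairs:
            shuffled = list(ranks)
            rng.shuffle(shuffled)  # reassign ranks to loci within the species
            s_sim += sum(r for r, perfect in zip(shuffled, flags) if perfect)
        if s_sim <= s_obs:
            hits += 1
    return s_obs, hits / n_perm


# Hypothetical ranks for two species (rank 1 = fewest alleles).
example_ranks = [[1, 2, 3, 4], [1, 2, 3]]
example_perfect = [[True, True, False, False], [True, False, False]]
s_obs, p_value = rank_sum_permutation_test(example_ranks, example_perfect)
print(s_obs, p_value)
```

In this toy case the perfect loci hold the lowest possible ranks, so the exact probability is (1/6) × (1/3) = 1/18; the permutation estimate should fall close to that value.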

The probability of the observed ranking regarding the number of alleles (Table 3) and estimated heterozygosity (Table 7) at loci with perfect and imperfect allele arrays was estimated at 0.0086 (S_OBS = 10.5) and 0.073 (S_OBS = 10.5), respectively. This result implied that a significantly higher number of alleles was observed at loci with imperfect allele arrays. The degree of heterozygosity was similarly higher, but not significantly so, at loci with imperfect allele arrays.

Sequence analysis of locus GATA028 alleles:
To gain further insight into the kind of changes at the sequence level that generated the imperfect allele arrays, we sequenced individual alleles of different lengths at locus GATA028 in fin and humpback whale samples.

For the fin whale, one copy of each allele length detected among the 358 individual whales analyzed was sequenced (a total of 19 alleles). The alleles were preferably isolated and sequenced in homozygous individuals. Alleles not detected in a homozygous state were amplified and sequenced in the heterozygous individual, where we observed the largest difference in allele lengths. In practice, this meant that the sequenced alleles were sampled from several different and quite divergent populations, such as the Sea of Cortez, the Mediterranean Sea, and the Gulf of St. Lawrence. In the humpback whale, only two alleles of each subarray were sequenced.

Individual alleles were sequenced directly from asymmetrically amplified PCR products after an initial symmetrical amplification (GYLLENSTEN and ERLICH 1988). For alleles found only in heterozygous individuals, the symmetrical amplification products were separated before the subsequent asymmetrical amplification by electrophoresis through 4% NuSieve low-melting agarose, and the relevant band was excised and dissolved in distilled water. Symmetrical and asymmetrical amplifications were performed under conditions similar to those used during the population analyses, except that both oligonucleotide primers (the same as used for the population analyses) were added in 1 µM concentrations. For the asymmetrical amplifications, the concentration of the limiting oligonucleotide primer was reduced to 0.01 µM. Symmetrical and asymmetrical amplifications were performed in 10- and 50-µl volumes, respectively.

The limiting oligonucleotide primer used for the asymmetrical amplification was used as a sequencing primer following the manufacturer's instructions (Sequenase Version 2.0; United States Biochemical, Cleveland). The sequence reaction products were separated by electrophoresis, as described for the population analyses, and visualized by overnight autoradiography.

Nucleotide composition of locus GATA028 alleles:
The alleles at locus GATA028 sequenced in the fin whale could be divided into four categories, each corresponding to one of the four subarrays identified in the population analyses. The sequenced alleles of the subarray denoted 1 (Table 4) all contained a duplicated, 15-nucleotide repeat at the 3' end of the microsatellite array, each copy composed of one imperfect (GTA) repeat followed by three perfect (GATA) repeats. The remaining three subarrays (denoted 0, 2, and 3; Table 4) also contained the 15-nucleotide repeat, but in these alleles, it was repeated three times. Of these last three subarrays, two contained imperfect repeats (a TA or a GAT repeat, subarrays 2 and 3, respectively; Table 4) within what was a perfect array of GATA repeats in the third subarray (subarray 0; Table 4).

The sequences of GATA028 alleles in the humpback whale could be subdivided into two categories, each corresponding to one of the two observed subarrays. Alleles belonging to the subarray denoted 0 (Table 4) contained a 15-nucleotide repeat sequence at the 3' end that was identical to the one found in the fin whales, although not repeated. Alleles of the subarray denoted 3 (Table 4) in the humpback whale did not contain the 15-nucleotide repeat sequence, but rather they contained a duplicated 11-nucleotide repeat sequence consisting of one imperfect (GTA) repeat followed by two perfect (GATA) repeats (Table 4).
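The length arithmetic behind these sequence motifs can be checked with a short Python sketch. The motif strings follow the description above; the helper name is hypothetical, and the idea that the subarray labels track the length offset modulo four is an assumption of this sketch that the text does not state explicitly.

```python
PERFECT = "GATA"   # basic tetranucleotide repeat
IMPERFECT = "GTA"  # imperfect repeat that leads each compound unit

fin_unit = IMPERFECT + PERFECT * 3        # 15-nucleotide unit
humpback_unit = IMPERFECT + PERFECT * 2   # 11-nucleotide unit


def offset_mod_repeat(length_change, repeat_length=len(PERFECT)):
    """Length change in nucleotides, reduced modulo the basic repeat length.
    A non-zero result means the new allele cannot be reached from the old one
    by gaining or losing whole GATA repeats."""
    return length_change % repeat_length


print(len(fin_unit), len(humpback_unit))
print(offset_mod_repeat(-len(fin_unit)))             # one 15-nt unit fewer
print(offset_mod_repeat(len("TA") - len(PERFECT)))   # TA in place of GATA
print(offset_mod_repeat(len("GAT") - len(PERFECT)))  # GAT in place of GATA
```

Each of these changes gives a non-zero offset, which is why alleles carrying them fall outside the perfect allele array and can be told apart by length alone.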

The nucleotide sequences of alleles at locus GATA028 revealed that the main mutational mechanism within each subarray at loci with imperfect allele arrays probably was (as anticipated for microsatellite loci) single-strand slippage of perfect GATA repeats. However, the mutations responsible for the transitions between subarrays were imperfect mutations, i.e., not simple loss or gain of single, perfect repeats. Two kinds of imperfect mutations were observed: gain or loss (presumably by single-strand slippage) of multiple repeats, of which one was an imperfect repeat (e.g., the 15- or 11-nucleotide repeat sequences in the fin and humpback whale, respectively), or single-strand slippage involving partial repeats. Alternatively, the latter kind of imperfect mutations could also result from a deletion of one or two nucleotides not generated by single-strand slippage.

The new allele generated from such an imperfect mutation may differ in length from the parental allele by other than a simple integer multiple of the basic repeat length, as observed in the present study. Hence, the new allele, as well as its descendant alleles generated by single-strand slippage of perfect repeats, will form a lineage (or subarray) that is readily distinguishable from other alleles by molecular weight alone. The occurrence of imperfect mutations thus explained why we observed an elevated number of alleles at loci with imperfect allele arrays. In the absence of imperfect mutations, many mutations will yield allele lengths that already are present in the population and, thus, do not add to the overall number of discernible alleles.

Relative rate of imperfect to perfect mutations:
The fact that we observed imperfect allele arrays at 10 of 18 species/locus combinations indicated that imperfect mutations were relatively frequent. To obtain an estimate of the frequency of imperfect mutations from the combined data sets of all four species, we estimated the frequency of imperfect mutations as the relative rate (R) of imperfect to perfect mutations. For simplicity, we assumed that all perfect mutations were stepwise mutations. R was defined as

$$R = \frac{\mu_I}{\mu_S} = \frac{\theta_I}{\theta_S}$$

and estimated as

(1)
where θ̂[I]i and θ̂[S]i are the estimates of the composite parameters θ[I] and θ[S], respectively, at the ith locus. The parameters θ[I] and θ[S] equal 4Neµ[I] and 4Neµ[S], where Ne denotes the effective population size, and µ[I] and µ[S] denote the mutation rate of imperfect and single-step mutations, respectively. The term µ[S] is equal to the mutation rate under a symmetrical single-step model. Under other and less simple mutation models (e.g., asymmetrical and multistep mutations), the term µ[S] is equal to the product of the mutation rate and the variance of the symmetrized distribution of changes in allele size (see KIMMEL and CHAKRABORTY 1996; KIMMEL et al. 1996). Hence, in principle, the estimations below are valid for other and more complicated stepwise mutation models than the simple symmetrical single-step mutation model.

Estimation of R: Depending on the rate and nature of the imperfect mutations, the parameter θ[I] can be estimated under either a single-step mutation model and/or an infinite allele model. The parameter θ[S], however, is most appropriately estimated under a stepwise model.

Estimation of θ[S] at a single locus: We estimated θ[S] at each locus under the simplest possible stepwise mutation model, namely gain or loss of only a single repeat, each with an equal probability. Under such a strict single-step mutation model and assuming equilibrium conditions, θ[S] can be estimated from the sample variance in repeat number per chromosome at the locus, i.e.,

$$\hat{\theta}_S = \frac{2}{n-1}\sum_{i=1}^{n}\left(j_i - \bar{j}\right)^2 \qquad (2)$$

where n is the number of chromosomes sampled, j_i is the number of repeats detected at the ith copy, and j̄ is the mean number of repeats for all sampled chromosomes (MORAN 1975; VALDES et al. 1993).
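As a worked illustration of the variance-based estimate, the Python sketch below assumes that the estimator is twice the unbiased sample variance in repeat number. That is the usual expectation under a symmetrical single-step model when θ[S] = 4Neµ[S], but the exact form is an assumption here. The function name and the repeat counts are hypothetical.

```python
def theta_s_from_repeat_counts(repeat_counts):
    """Estimate theta_S as twice the unbiased sample variance in repeat
    number across sampled chromosomes (assumed form of the estimator)."""
    n = len(repeat_counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    mean = sum(repeat_counts) / n
    sum_sq = sum((j - mean) ** 2 for j in repeat_counts)
    return 2.0 * sum_sq / (n - 1)


# Hypothetical repeat counts for eight sampled chromosomes.
counts = [10, 10, 11, 11, 12, 12, 13, 13]
theta_s_hat = theta_s_from_repeat_counts(counts)
print(round(theta_s_hat, 3))
```

For these counts the mean is 11.5 and the sum of squared deviations is 10, so the estimate is 2 × 10/7, or about 2.86.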

It is straightforward to estimate θ[S] in this manner at loci with perfect allele arrays, as the variance can be estimated directly from the relative difference in allele lengths divided by the repeat length. However, for loci with imperfect allele arrays, we cannot deduce the relative difference in the number of repeats between alleles from different subarrays unless the nucleotide composition of alleles at each subarray is known (which was the case for only two species/locus combinations in this study). An overall estimate of the parameter θ[S] could be obtained by simply adding the contribution from each subarray, i.e.,

$$\hat{\theta}_S = \sum_{j}\hat{\theta}_{S,j} \qquad (3)$$

where θ̂[S]j is the estimate of θ[S] for the jth subarray obtained as described by Equation 2. This approach is similar to that suggested by HUDSON and KAPLAN 1986, who found that the expected number of segregating sites in a nested subsample (based on allelic class) was approximately equal to the population frequency of the subsample times the expected number of segregating sites in the entire sample. Hence, if E(θ̂[S]j) = θ[S]·x_j, where x_j is the population frequency of the jth subarray, it follows that E(θ̂[S]) ≈ θ[S].
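The summation over subarrays can be sketched in Python as below. Several choices here are assumptions rather than details given in the text: each subarray is treated as its own sample, the per-subarray estimate is taken as twice the unbiased sample variance in repeat number, subarrays with fewer than two chromosomes contribute zero, and chromosomes are assigned to subarrays by allele length modulo the repeat length. All names and lengths are hypothetical.

```python
from collections import defaultdict


def theta_s_single(repeat_counts):
    """Twice the unbiased sample variance; zero when fewer than two copies."""
    n = len(repeat_counts)
    if n < 2:
        return 0.0
    mean = sum(repeat_counts) / n
    return 2.0 * sum((j - mean) ** 2 for j in repeat_counts) / (n - 1)


def theta_s_from_subarrays(allele_lengths, repeat_length):
    """Sum the per-subarray estimates over all subarrays at one locus."""
    counts_by_subarray = defaultdict(list)
    for length in allele_lengths:
        # Within a subarray, lengths differ by whole repeats, so integer
        # division gives repeat numbers up to a constant that cancels out.
        counts_by_subarray[length % repeat_length].append(length // repeat_length)
    return sum(theta_s_single(c) for c in counts_by_subarray.values())


# Hypothetical allele lengths (nucleotides), one entry per sampled chromosome.
sample_lengths = [180, 180, 184, 184, 188, 185, 185, 189]
theta_s_total = theta_s_from_subarrays(sample_lengths, 4)
print(round(theta_s_total, 4))
```

Here the two subarrays contribute 1.4 and about 0.667, giving a summed estimate of about 2.067.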

Estimation of θ[I] at a single locus: The parameter θ[I] was estimated as θ̂[I] from the heterozygosity in the sample using the bias correction suggested by CHAKRABORTY and WEISS 1991 as the solution to the equation

(4)
where t = θ̂[I] and H = 1 − Σ_i x_i², where x_i is the frequency of the ith allele.
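The heterozygosity term can be computed directly from allele counts, as in the Python sketch below. The second function uses the plain infinite-allele expectation H = θ/(1 + θ), solved for θ; it deliberately omits the bias correction of CHAKRABORTY and WEISS 1991 that the estimate in the text relies on, so it should be read only as an uncorrected approximation. Function names and the example sample are hypothetical.

```python
from collections import Counter


def heterozygosity(alleles):
    """H = 1 - sum of squared allele frequencies in the sample."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def theta_iam_uncorrected(h):
    """Solve H = theta / (1 + theta) for theta (no bias correction)."""
    if not 0.0 <= h < 1.0:
        raise ValueError("heterozygosity must lie in [0, 1)")
    return h / (1.0 - h)


# Hypothetical sample of ten chromosomes carrying three discernible alleles.
sample = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
h_obs = heterozygosity(sample)
theta_i_hat = theta_iam_uncorrected(h_obs)
print(round(h_obs, 2), round(theta_i_hat, 3))
```

For allele frequencies of 0.5, 0.3, and 0.2, H is 0.62 and the uncorrected estimate is 0.62/0.38, or about 1.63.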

Evaluating the estimation of R: To evaluate whether Equation 1 indeed provided an unbiased estimate of R, coalescence simulations were performed as described by HUDSON 1990. During these simulations, two kinds of mutations were allowed: single-step gain or loss of repeats, each with a mutation rate of θ[S]/2, as well as less frequent mutations, at a rate of θ[I], each generating a new discernible allele, i.e., corresponding to imperfect mutations. For each combination of θ[I] and θ[S], we conducted 1000 simulations, each with six loci and 200 chromosomes, where θ[I], θ[S], and R were estimated as θ̂[I], θ̂[S], and R̂ in the manner described above (Equation 4, Equation 3, and Equation 1, respectively).
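A simulation of this kind can be sketched with a basic Kingman coalescent in Python. This is a minimal illustration and not the program used in the study. It assumes time is measured in units of 2Ne generations and that both mutation classes fall on each lineage at rate θ/2 per unit time, which may not match the exact rate convention quoted above. All function names are hypothetical.

```python
import random


def poisson_count(rng, rate, length):
    """Number of events of a Poisson process with the given rate on a branch."""
    if rate <= 0.0:
        return 0
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate_locus(n, theta_s, theta_i, rng):
    """Return one (repeat offset, lineage label) pair per sampled chromosome.
    Chromosomes with the same pair carry the same discernible allele."""
    # Backward pass: merge random pairs of lineages until one ancestor remains.
    node_time = [0.0] * n
    children = {}
    active = list(range(n))
    t = 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2.0)
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        node_time.append(t)
        parent = len(node_time) - 1
        children[parent] = (a, b)
        active.append(parent)
    # Forward pass: drop mutations on each branch from the root to the tips.
    root = active[0]
    states = {root: (0, 0)}
    next_label = 1
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            branch = node_time[node] - node_time[child]
            repeats, label = states[node]
            for _ in range(poisson_count(rng, theta_s / 2.0, branch)):
                repeats += rng.choice((-1, 1))  # single-step gain or loss
            if poisson_count(rng, theta_i / 2.0, branch) > 0:
                label = next_label              # imperfect mutation: new lineage
                next_label += 1
            states[child] = (repeats, label)
            stack.append(child)
    return [states[i] for i in range(n)]


rng = random.Random(42)
sample_states = simulate_locus(200, 10.0, 0.2, rng)
print(len(set(sample_states)), "discernible alleles among", len(sample_states))
```

Each simulated sample could then be passed to estimators of θ[S] and θ[I] to examine bias in R̂, repeating over many replicates and loci as the text describes.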

Our simulations revealed that R was consistently overestimated over a wide range of parameter values (see Table 5) when θ[S] was estimated from subarrays (Equation 3). The degree of bias, however, was ~40% and did not appear to be affected by the value of θ[I], θ[S], or R. The bias was mainly caused by underestimation of θ[S] when estimated from subarrays (Equation 3).

The severity of the bias introduced by the estimation of R in the above manner from the subarrays should be evaluated in terms of the overall variance in the estimation of R. As is evident from Table 5, the variance of R̂ is quite considerable and exceeds by far the bias introduced by the estimation from subarrays for the values of θ̂[I], θ̂[S], and R̂ observed in this study (Table 7).

Effects of population expansion on the estimation of R: Although R is the ratio of the same parameter (4Neµ) for two different kinds of mutations, the estimates of θ[I] and θ[S] are obtained from two different aspects of the data. Chakraborty and Kimmel have recently shown (CHAKRABORTY et al. 1997; KIMMEL et al. 1998) that these two aspects respond differently to temporal changes in Ne, and, thus, during and after changes in Ne our estimate of R may not reflect the ratio µ[I]/µ[S] alone.

Analyses of mitochondrial control region sequences in the samples included in this study using the program Fluctuate in the Lamarc computer package (KUHNER et al. 1998) indicated that several of the populations included in this study probably have growth rates that deviate significantly from zero (P. J. PALSBØLL, unpublished data; data and results not shown). We investigated the possible effect of such changes in Ne on our estimates of R by coalescence simulations. The simulations were conducted under a model of exponential growth in the manner described by SLATKIN and HUDSON 1991, with two kinds of mutations corresponding to either an infinite allele or a stepwise mutation model and equivalent to θ[I] and θ[S], respectively. Simulations were performed with parameter values of θ[I] and θ[S] ranging from 0.001 to 100, and α (rNe, where r is the growth rate) ranging from 5 to 5000 (Table 6). A total of 1000 simulations were undertaken per combination of θ[I], θ[S], and α, each with six loci and 200 chromosomes. The estimate of R was obtained for each simulation using Equations 1–4.

Although the simulations revealed that θ̂[S] and θ̂[I] (Equation 3 and Equation 4) underestimated θ[S] and θ[I] during population growth, the bias of the estimate of R itself was relatively modest (Table 6). The simulations that yielded mean values of θ̂[S], θ̂[I], and R̂ observed during this study indicated that R (on average) was underestimated by ~20% (Table 6).

Observed estimates of R: Using Equations 1–4, we estimated R, the relative mutation rate of the imperfect to single-step mutations, at all loci with imperfect allele arrays.

As explained above, the estimations rely on population equilibrium conditions, i.e., constant population size, no recombination, and that the sampled chromosomes are from a single, panmictic population with no migration. It is not possible with the current knowledge to assess if all these assumptions are met for all the species and populations included in this study. However, to minimize possible violations of the assumptions, we confined our estimation of R to populations that are currently believed to constitute part of a single panmictic population (although migration most likely does occur). The analyzed populations were West Greenland minke whales (n = 69), western North Atlantic blue whales (n = 89), Gulf of St. Lawrence fin whales (n = 97), and West Indian humpback whales (1992 only, n = 596). Additional estimations were also obtained from the Sea of Cortez (n = 51) and Mediterranean Sea (n = 58) fin whale populations. As mentioned above, an analysis of the mitochondrial control region sequences using the program Fluctuate in the Lamarc computer package (KUHNER et al. 1998) based upon the population samples used in this study yielded estimates of growth rates in blue and humpback whale populations that did not deviate significantly from zero (data not shown). For the remaining populations, we estimated positive growth rates that deviated significantly from zero. The only exception was the Mediterranean Sea fin whale population, where we estimated a negative growth rate that deviated significantly from zero.

Estimates of θ[S] for individual loci ranged from 0 to 430, with most values in the range of 5–30 (Table 7). Extreme values outside this range (e.g., B. musculus, locus GATA098, θ̂[S] = 430; Table 7) did have allele frequency distributions that deviated significantly from the null expectations under a single-step mutation model (for further discussion see NIELSEN and PALSBOLL 1999).

The estimates of R obtained at loci with imperfect allele arrays ranged from 0 to 0.065 (Table 8), with an overall mean of 0.024. The highest estimates of R were observed in the minke whale (B. acutorostrata; Table 8) and fin whale (B. physalus; Table 8), where analysis of mitochondrial control region sequences indicated population growth (data not shown). Hence, it appears (as our simulations suggested) that the population expansions have not greatly influenced our estimate of R. Our results imply that (on average) ~2.4% of the mutations at these tri- and tetranucleotide repeat microsatellite loci were imperfect mutations, i.e., mutations other than simple gain or loss of perfect repeats (Table 8).


 
View this table:
In this window
In a new window

 
Table 8. Intraspecific estimates of R

Separate estimates from other fin whale populations in the Mediterranean Sea and the Sea of Cortez yielded similar estimates of R (Table 9).


 
View this table:
In this window
In a new window

 
Table 9. Estimates of R from additional fin whale populations (B. physalus) at loci GATA028, GATA053, and GGAA520

Because the nucleotide sequence of each allele length detected at locus GATA028 was known in the fin and humpback whales, we were able to estimate θ[S] directly from the variance in repeat number (Equation 2) after exclusion of the imperfect mutations responsible for the generation of subarrays (Table 4). During this estimation, we assumed that all copies of equal length had a nucleotide sequence similar to that of the sequenced allele (Table 4).

The direct estimation at locus GATA028 in three fin whale populations and one humpback whale population (Table 10) yielded estimates of θ[S] that were approximately half of the estimates obtained by our indirect approach (Equation 3). In all four cases, the resulting estimate of R was at least twice the estimate obtained by the indirect approach (Equation 3). Given the large variance in the estimation of θ[S] itself from the number of repeats (Equation 2) and the fact that some of the populations share a recent common ancestry and, thus, do not constitute independent observations, no generalizations can be drawn from these relatively few observations.


 
View this table:
In this window
In a new window

 
Table 10. Estimates of θ[S] and R at locus GATA028 directly from the sequence data

The values in Table 8 suggested a positive correlation between θ[I] and θ[S]. The existence of such a correlation was assessed by using the same approach that was used when testing whether the observed number of alleles and heterozygosity was higher at loci with imperfect allele arrays compared to loci with perfect allele arrays (see above). The loci within each species (Table 8) were ranked according to θ̂[S] and subsequently partitioned into loci with perfect or imperfect allele arrays. The probability of the observed sum of the ranks for the loci with perfect allele arrays (S_OBS = 9.0) was estimated from 10,000 Monte Carlo simulations at 0.025, which implies there was a positive correlation between θ[I] and θ[S].


*  DISCUSSION

Multiple levels of single-strand slippage at microsatellite arrays:
The findings of this study suggest that single-strand slippage mutations at microsatellite loci involve not only single-step mutations, but also relatively high frequencies of multi- as well as partial-repeat mutations. The frequency of the two latter categories of mutations was estimated at a mean of 2.4% of the rate of single-step mutations. The estimate was obtained from several loci and across four different species. The multistep mutations detected in this study included an imperfect repeat, and, thus, were contingent on a previous imperfect mutation, i.e., partial-repeat slippage. Of the four imperfect mutations detected from the sequences at locus GATA028, two involved multiple repeats. Hence, our study yielded an approximate rate of multirepeat and partial-repeat mutations of roughly 1.2% each. As our study only detected multirepeat mutations that included an imperfect repeat, this rate is most likely an underestimate of the overall rate of multirepeat mutations. The occurrence of imperfect mutations was not confined to a single species, locus, or population, but was detected across several species and loci, arguing that imperfect mutations are relatively common phenomena.

The imperfect mutations, which we interpreted as partial-repeat slippage, could also be indels not generated by single-strand slippage. However, single-strand slippage appears to be the most likely mutational mechanism for generating the imperfect mutations observed at locus GATA028 for the following reasons:

  • The nucleotide sequences of the alleles at locus GATA028 contained as many nucleotides from the flanking regions as from the microsatellite array (data not presented); however, neither indels nor any nucleotide substitutions were observed in the flanking regions.

  • All the inferred partial-repeat changes were located within a stretch of perfect repeats where single-strand slippage is presumably the main mutational mechanism.

  • The apparent positive correlation of {theta}[I] with {theta}[S].

  • The two imperfect repeats generated from these mutations consisted of partial GATA repeats (GAT or TA).

The sequence data presented by ESTOUP et al. (1995) and ANGERS and BERNATCHEZ (1997) also suggest partial-repeat slippage mutations, although at an interspecific level.

As suggested for minisatellites (MONCKTON et al. 1994), mutation in the repeat array could also be influenced by elements in the flanking regions; however, the present data do not allow such a possibility to be tested.

Constraints on allele size as a result of partial-repeat mutations:
While multirepeat mutations have been reported earlier, partial-repeat mutations within the microsatellite array are not commonly reported. Imperfections in the repeat array of an allele appear to reduce the rate of single-strand slippage mutations (WEBER 1990) or to halt such mutations completely (JIN et al. 1996). Hence, imperfections in the repeat array may, in part, provide a mechanism that would counteract the expansion in overall allele length caused by a mutational bias toward a gain of repeats, as reported recently by AMOS and RUBINSZTEIN (1996b) and PRIMMER et al. (1996).

A number of diseases, e.g., Huntington's disease (DUYAO et al. 1993), have been shown to be caused by a rapid increase in the number of repeats at specific microsatellite loci, and, thus, selection could also hinder expansion of microsatellite loci. However, as many microsatellite loci are situated in noncoding DNA sequences, selection does not appear to be the sole mechanism preventing a continuous expansion.

Partial-repeat mutations may partly counteract continuous expansion of the repeat number at neutral microsatellite loci by generating imperfections in the microsatellite array. We tested the effects of partial-repeat mutations on the overall number of repeats by simulations. We assumed a biased (toward gain of repeats) single-step mutation model with an equal probability of a partial-repeat mutation per repeat in the microsatellite array. The occurrence of a partial-repeat mutation in a microsatellite array changed the rate of single-step mutations from {theta}[S] to zero. The prediction of such a model is that alleles with a high number of repeats on average are more prone to partial-repeat mutations than alleles with fewer repeats, which in turn will reduce the rate of single-strand slippage (in this case to zero). The proposed mechanism is consistent with the observation that some loci contain alleles with a large number of perfect repeats (RICO et al. 1994), or that some species are fixed for alleles with an imperfect repeat array.

A limited number of simulations, under the model proposed above using forward simulations with multinomial resampling of alleles over discrete generations and constant population size, did indeed confirm the predictions of the model (Figure 1). The presence of partial-repeat mutations reduced the increase in mean allele length relative to the absence of partial-repeat mutations. The number of simulations conducted was very limited and assumed that a partial-repeat mutation completely halted the rate of single-step mutations, which our own data indicate is not necessarily the case. A more thorough assessment is warranted over a wide range of parameter values before any firm conclusions can be drawn. However, this result indicates that a relatively minor extension of the main mutational mechanism at microsatellite loci could provide an explanation for the absence of continuous expansion of microsatellite loci, which is a logical consequence of the empirical data suggesting a mutational bias toward gain of repeats at microsatellite loci.



Figure 1. Estimates of mean allele length with or without partial-repeat mutations. Estimates of mean number of repeats per allele after 2000–10,000 generations in a population of 1000 chromosomes and an initial allele size of 10 repeats at generation 0. A total of 100 simulations were conducted per estimate. The single-step mutation rate ({theta}[S]) was set at 10^-4, and the probability of a gain was set at 0.7. (○) A probability of 0.005 of a partial-repeat mutation per repeat, which reduced {theta}[S] to zero. (●) Simulations under similar conditions, but with no partial-repeat mutations.
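A minimal sketch of a forward simulation of this kind is given below. It is not the program used for Figure 1. It assumes that the per-repeat probability of a partial-repeat mutation applies per chromosome per generation, that a partial-repeat mutation leaves the repeat count unchanged and only switches off further slippage, and that alleles cannot shrink below one repeat; none of these details is specified in the text. The function name `simulate_mean_repeats` and its parameter names are hypothetical.

```python
import random

def simulate_mean_repeats(n_chrom=1000, generations=2000, init_repeats=10,
                          mu_s=1e-4, p_gain=0.7, p_partial=0.0, seed=None):
    """Return the mean number of repeats per allele after `generations`."""
    rng = random.Random(seed)
    # Each chromosome is (number of repeats, array interrupted?).
    pop = [(init_repeats, False)] * n_chrom
    for _ in range(generations):
        # Multinomial resampling: draw the next generation with replacement.
        parents = rng.choices(pop, k=n_chrom)
        pop = []
        for repeats, interrupted in parents:
            if not interrupted:
                # Chance that at least one of the repeats mutates partially.
                p_any_partial = 1.0 - (1.0 - p_partial) ** repeats
                if p_partial > 0 and rng.random() < p_any_partial:
                    interrupted = True  # slippage rate drops to zero
                elif rng.random() < mu_s:
                    # Biased single-step mutation: gain with p_gain, else loss.
                    step = 1 if rng.random() < p_gain else -1
                    repeats = max(1, repeats + step)
            pop.append((repeats, interrupted))
    return sum(repeats for repeats, _ in pop) / n_chrom
```

Calling the function once with `p_partial=0.0` and once with `p_partial=0.005`, and averaging over replicate runs, gives the kind of comparison shown by the two symbol series in Figure 1. In pure Python a run with the Figure 1 parameter values takes on the order of seconds per replicate, so smaller populations are convenient for exploration.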

Consequence for detection and estimation of divergence:
Our study revealed that approximately half of the imperfect mutations were multirepeat changes. This estimate is likely to be an underestimate because of the approach used in this study (see above). AMOS and RUBINSZTEIN (1996b) as well as PRIMMER et al. (1996) identified germ-line mutations, each at single microsatellite loci, and detected multirepeat mutations at a frequency of 0 and 18%, respectively. DI RIENZO et al. (1994) observed allele distributions at 8 out of 10 dinucleotide repeat microsatellite loci that were consistent with the occurrence of multirepeat mutations when compared to the null expectations under a strict single-step mutation model. NIELSEN and PALSBØLL (1999) estimated the frequency of multirepeat mutations at 9 microsatellite loci with perfect arrays in different baleen whale populations using a maximum likelihood procedure (NIELSEN 1997). They found significant deviations from the null expectations under a strict single-step mutation model, consistent with multirepeat mutations, at 2 loci. The estimates of the frequency of multirepeat mutations most compatible with the observed data were 0.05 and 0.29, respectively.

The results from the above-mentioned studies as well as the present study indicate that multirepeat mutations occur at a high proportion of loci. Multirepeat mutations will change the sample mean and increase the sample variance by several repeat units in a single mutational event. In the present study, we observed two instances where one subarray was completely absent from one or several population samples (locus GATA028 and locus GGAA520, B. physalus, data not shown), which, of course, will affect the linear relationship between the microsatellite-specific statistics and divergence time (GOLDSTEIN et al. 1995a,b; SLATKIN 1995). The increase in variance of the microsatellite-specific statistics caused by multirepeat mutations may have a considerable impact on the accuracy of studies of natural populations, which are typically based on analyses of a relatively modest number of loci (ZHIVOTOVSKY and FELDMAN 1995), and may explain why some population genetic studies find a poor correlation between geographic and interpopulation genetic distances (e.g., VALSECCHI et al. 1997).

The partial-repeat mutations detected in the current study have an impact on the accuracy of divergence estimates obtained from statistics based on the number of alleles, such as Weir's {theta} (WEIR 1990). As our results have shown, the number of alleles is correlated with the number of subarrays and, thus, partial-repeat mutations will increase the variance of such statistics.
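The relationship between allele lengths and subarrays can be illustrated with a short sketch. It assumes that a subarray is the set of alleles whose lengths differ from one another by integer multiples of the repeat length, so that alleles are grouped by length modulo the repeat unit; the function names and the allele lengths in the example are hypothetical.

```python
from collections import defaultdict

def split_into_subarrays(allele_lengths, repeat_unit):
    """Group distinct allele lengths (bp) by their remainder modulo repeat_unit."""
    groups = defaultdict(list)
    for length in sorted(set(allele_lengths)):
        groups[length % repeat_unit].append(length)
    return dict(groups)

def has_imperfect_array(allele_lengths, repeat_unit):
    """True if some alleles differ by other than whole repeat units."""
    return len(split_into_subarrays(allele_lengths, repeat_unit)) > 1

# Hypothetical tetranucleotide locus with alleles of these lengths (bp).
lengths = [180, 184, 188, 190, 194]
subarrays = split_into_subarrays(lengths, 4)
# subarrays == {0: [180, 184, 188], 2: [190, 194]}
```

Under this definition, each additional subarray opens a new series of possible allele lengths, which is one way to see why the number of alleles would rise with the number of subarrays.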

The results from this and other studies (see above) show that the sequence changes observed at microsatellite loci do not follow a simple pattern, which presumably increases the variance of the current statistics proposed for estimating divergence from microsatellite data. Most studies of natural populations rely on the analysis of a relatively modest number of microsatellite loci, and, thus, the increase in variance is of concern and needs to be addressed. It may be that microsatellite loci with imperfect allele arrays, such as those described in the present study, constitute a useful class of loci, which possesses the high rate of mutation that is characteristic of microsatellite loci, but with an elevated number of alleles relative to perfect loci.


*  ACKNOWLEDGMENTS

We thank the following institutions for donating samples: Allied Whale, Center for Coastal Studies, Cetacean Research Group at Memorial University, Department of Animal Biology at Barcelona University, Department of Marine Biology at University of Baja California, Fisheries Research Institute at Tromsø University, Greenland Natural Resources Institute, the Marine Research Institutes in Iceland and Norway, Mingan Island Cetacean Study, Inc., and Tethys. The majority of the humpback whale samples were collected during the international collaborative project YoNAH (Years of the North Atlantic Humpback whale). In addition, we thank T. H. Andersen, T. P. Feddersen, C. Færch-Jensen, A. H. Larsen, K. B. Pedersen, D. Poulsen, P. Raahauge, R. Sponer, and E. Widén for technical assistance. This work was greatly improved by the valuable comments and suggestions from R. R. Hudson. M. Slatkin also provided useful comments on earlier drafts. We also owe thanks to one anonymous reviewer, who pointed out the possible effect of population growth on our estimates, and to R. Nielsen for advice. R. R. Hudson and P. Arctander are thanked for their support. This project was in part funded by the Commission for Scientific Research in Greenland, the European Union Biotechnology Program (grant to P. Arctander), the Greenland Home Rule, the International Whaling Commission, the Natural Science Research Council (Denmark), World Wildlife Foundation (Denmark), and the Åge V. Jensen Charity Foundation.

Manuscript received July 2, 1998; Accepted for publication September 21, 1998.


*  LITERATURE CITED

AMOS, W. and A. R. HOELZEL, 1991 Long-term preservation of whale skin for DNA analysis. Rep. Int. Whaling Comm. Spec. Issue 13:99-104.

AMOS, W. and D. C. RUBINSZTEIN, 1996a Microsatellites are subject to directional evolution. Nat. Genet. 12:13-14.

AMOS, W. and D. C. RUBINSZTEIN, 1996b Microsatellites show mutational bias and heterozygote instability. Nat. Genet. 13:390-391.

ANGERS, B. and L. BERNATCHEZ, 1997 Complex evolution of a salmonid microsatellite locus and its consequences in inferring allelic divergence from size information. Mol. Biol. Evol. 14:230-238.

BÉRUBÉ, M. and P. PALSBØLL, 1996a Erratum of identification of sex in cetaceans by multiplexing with three ZFX and ZFY specific primers. Mol. Ecol. 5:602.

BÉRUBÉ, M. and P. J. PALSBØLL, 1996b Identification of sex in cetaceans by multiplexing with three ZFX and ZFY specific primers. Mol. Ecol. 5:283-287.

BÉRUBÉ, M., A. AGUILAR, D. DENDANTO, F. LARSEN, and G. NOTARBARTOLO-DI-SCIARA et al., 1998 Population genetic structure of North Atlantic, Mediterranean Sea and Sea of Cortez fin whales, Balaenoptera physalus (Linnaeus, 1758): analysis of mitochondrial and nuclear loci. Mol. Ecol. 7:585-600.

CHAKRABORTY, R. and K. M. WEISS, 1991 Genetic variation of the mitochondrial DNA genome in American Indians is at mutation-drift equilibrium. Am. J. Phys. Anthropol. 86:497-506.

CHAKRABORTY, R., M. KIMMEL, D. N. STIVERS, L. J. DAVISON, and R. DEKA, 1997 Relative mutation rate at di-, tri-, and tetranucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94:1041-1046.

CLAPHAM, P. J., P. J. PALSBØLL, and D. K. MATTILA, 1993 High-energy behaviors in humpback whales as a source of sloughed skin for molecular analysis. Mar. Mamm. Sci. 9:213-220.

DI RIENZO, A., A. C. PETERSON, J. C. GARZA, A. M. VALDES, and M. SLATKIN et al., 1994 Mutational processes of simple-sequence repeat loci in human populations. Proc. Natl. Acad. Sci. USA 91:3166-3170.

DUYAO, M., C. AMBROSE, R. MYERS, A. NOVELLETTO, and F. PERSICHETTI et al., 1993 Trinucleotide repeat length instability and age of onset in Huntington's disease. Nat. Genet. 4:387-392.

ELLEGREN, H., C. R. PRIMMER, and B. C. SHELDON, 1995 Microsatellite `evolution': directional bias? Nat. Genet. 11:360-362.

ESTOUP, A., C. TAILLIEZ, J.-M. CORNUET, and M. SOLIGNAC, 1995 Size homoplasy and mutational processes of interrupted microsatellites in two bee species, Apis mellifera and Bombus terrestris (Apidae). Mol. Biol. Evol. 12:1074-1084.

GARZA, J. C. and N. B. FREIMER, 1996 Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Res. 6:211-217.

GARZA, J. C., M. SLATKIN, and N. B. FREIMER, 1995 Microsatellite allele frequencies in humans and chimpanzees, with implications for constraints on allele size. Mol. Biol. Evol. 12:594-603.

GOLDSTEIN, D. B., A. RUIZ LINARES, L. L. CAVALLI-SFORZA, and M. W. FELDMAN, 1995a An evaluation of genetic distances for use with microsatellite loci. Genetics 139:463-471.

GOLDSTEIN, D. B., A. RUIZ LINARES, L. L. CAVALLI-SFORZA, and M. W. FELDMAN, 1995b Genetic absolute dating based upon microsatellites and the origin of modern humans. Proc. Natl. Acad. Sci. USA 92:6723-6727.

GRIMALDI, M.-C. and B. CROUAU-ROY, 1997 Microsatellite allelic homoplasy due to variable flanking sequences. J. Mol. Evol. 44:336-340.

GYLLENSTEN, U. B. and H. A. ERLICH, 1988 Generation of single-stranded DNA by the polymerase chain reaction and its application to direct sequencing of the HLA-DQA locus. Proc. Natl. Acad. Sci. USA 85:7652-7656.

HUDSON, R. R., 1990 Gene genealogies and the coalescent process, pp. 1–44 in Oxford Surveys in Evolutionary Biology, edited by D. J. FUTUYMA and J. ANTONOVICS. Oxford University Press, Oxford.

HUDSON, R. R. and N. L. KAPLAN, 1986 On the divergence of alleles in nested subsamples from finite populations. Genetics 113:1057-1076.

JIN, L., C. MACAUBAS, J. HALLMAYER, A. KIMURA, and E. MIGNOT, 1996 Mutation rate varies among alleles at a microsatellite locus: phylogenetic evidence. Proc. Natl. Acad. Sci. USA 93:15285-15288.

KIMMEL, M. and R. CHAKRABORTY, 1996 Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50:345-367.

KIMMEL, M., R. CHAKRABORTY, D. N. STIVERS, and R. DEKA, 1996 Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143:549-555.

KIMMEL, M., R. CHAKRABORTY, J. P. KING, M. BAMSHAD, and W. S. WATKINS et al., 1998 Signatures of population expansion in microsatellite repeat data. Genetics 148:1921-1930.

KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN, 1998 Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:429-434.

LEVINSON, G. and G. A. GUTMAN, 1987a High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Res. 15:5323-5339.

LEVINSON, G. and G. A. GUTMAN, 1987b Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.

MAHTANI, M. M. and H. F. WILLARD, 1993 A polymorphic X-linked tetranucleotide repeat locus displaying a high rate of new mutation: implications for mechanisms of mutation at short tandem repeat loci. Hum. Mol. Genet. 2:431-437.

MANIATIS, T., E. F. FRITSCH, and J. SAMBROOK, 1982 Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

MESSIER, W., S.-H. LI, and C.-B. STEWART, 1996 The birth of microsatellites. Nature 381:483.

MONCKTON, D. G., R. NEUMANN, T. GURAM, N. FRETWELL, and K. TAMAKI et al., 1994 Minisatellite mutation rate variation associated with a flanking DNA sequence polymorphism. Nat. Genet. 8:162-170.

MORAN, P. A. P., 1975 Wandering distribution and the electrophoretic profile. Theor. Popul. Biol. 8:318-330.

NIELSEN, R., 1997 A likelihood approach to population samples of microsatellite alleles. Genetics 146:711-716.

NIELSEN, R. and P. J. PALSBØLL, 1999 Single-locus tests of microsatellite evolution: multi-step mutations and constraints on allele size. Mol. Phylogenet. Evol. in press.

ORTI, G., D. E. PEARSE, and J. C. AVISE, 1997 Phylogenetic assessment of length variation at a microsatellite locus. Proc. Natl. Acad. Sci. USA 94:10745-10749.

PALSBØLL, P. J., F. LARSEN, and E. SIGURD HANSEN, 1991 Sampling of skin biopsies from free-ranging large cetaceans in West Greenland: development of new biopsy tips and bolt designs. Rep. Int. Whaling Comm. Spec. Issue 13:71-79.

PALSBØLL, P. J., P. J. CLAPHAM, D. K. MATTILA, F. LARSEN, and R. SEARS et al., 1995 Distribution of mtDNA haplotypes in North Atlantic humpback whales: the influence of behaviour on population structure. Marine Ecol. Prog. Ser. 116:1-10.

PALSBØLL, P. J., J. ALLEN, M. BÉRUBÉ, P. J. CLAPHAM, and T. P. FEDDERSEN et al., 1997a Genetic tagging of humpback whales. Nature 388:676-679.

PALSBØLL, P. J., M. BÉRUBÉ, A. H. LARSEN, and H. JØRGENSEN, 1997b Primers for the amplification of tri- and tetramer microsatellite loci in cetaceans. Mol. Ecol. 6:893-895.

PRIMMER, C. R., N. SAINO, and A. P. MØLLER, 1996 Directional evolution in germline microsatellite mutations. Nat. Genet. 13:391-393.

RICO, C., I. RICO, and G. HEWITT, 1994 An optimized method for isolating and sequencing large (CA/GT)n (n > 40) microsatellites from genomic DNA. Mol. Ecol. 3:181-182.

RUBINSZTEIN, D. C., W. AMOS, J. LEGGO, S. GOODBURN, and S. JAIN et al., 1995 Microsatellite evolution—evidence for directionality and variation in rate between species. Nat. Genet. 10:337-343.

SCHLÖTTERER, C. and D. TAUTZ, 1992 Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 20:211-215.

SHRIVER, M. D., L. JIN, R. CHAKRABORTY, and E. BOERWINKLE, 1993 VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134:983-993.

SHRIVER, M. D., L. JIN, E. BOERWINKLE, R. DEKA, and R. E. FERRELL et al., 1995 A novel measure of genetic distance for highly polymorphic tandem repeat loci. Mol. Biol. Evol. 12:914-920.

SLATKIN, M., 1995 A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457-462.

SLATKIN, M. and R. R. HUDSON, 1991 Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129:555-562.

TALBOT, C. C., D. AVRAMOPOULOS, S. GERKEN, A. CHAKRAVARTI, and J. A. ARMOUR et al., 1995 The tetranucleotide repeat polymorphism D21S1245 demonstrates hypermutability in germline and somatic cells. Hum. Mol. Genet. 4:1193-1199.

TAUTZ, D., 1989 Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17:6463-6471.

VALDES, A. M., M. SLATKIN, and N. B. FREIMER, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749.

VALSECCHI, E., P. PALSBØLL, P. HALE, D. GLOCKNER-FERRARI, and M. FERRARI et al., 1997 Microsatellite genetic distances between oceanic populations of the humpback whale (Megaptera novaeangliae). Mol. Biol. Evol. 14:355-362.

WEBER, J. L., 1990 Informativeness of human (dC-dA)n · (dG-dT)n polymorphisms. Genomics 7:524-530.

WEBER, J. L. and P. E. MAY, 1989 Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44:388-396.

WEBER, J. L. and C. WONG, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128.

WEIR, B. S., 1990 Genetic Data Analysis. Methods for Discrete Population Genetic Data. Sinauer Associates, Sunderland, MA.

ZHIVOTOVSKY, L. A. and M. W. FELDMAN, 1995 Microsatellite variability and genetic distances. Proc. Natl. Acad. Sci. USA 92:11549-11552.



