Although it is clear that postreplicative DNA mismatch repair (MMR) plays a critical role in maintaining genomic stability in nearly all forms of life surveyed, much remains to be understood about the genome-wide impact of MMR on spontaneous mutation processes and the extent to which MMR-deficient mutation patterns vary among species. We analyzed spontaneous mutation processes across multiple genomic regions using two sets of mismatch repair-deficient (msh-2 and msh-6) Caenorhabditis elegans mutation-accumulation (MA) lines and compared our observations to mutation spectra in a set of wild-type (WT), repair-proficient C. elegans MA lines. Across most sequences surveyed in the MMR-deficient MA lines, mutation rates were ∼100-fold higher than rates in the WT MA lines, although homopolymeric nucleotide-run (HP) loci composed of A:T base pairs mutated at an ∼500-fold greater rate. In contrast to yeast and humans where mutation spectra vary substantially with respect to different specific MMR-deficient genotypes, mutation rates and patterns were overall highly similar between the msh-2 and msh-6 C. elegans MA lines. This, along with the apparent absence of a Saccharomyces cerevisiae MSH3 ortholog in the C. elegans genome, suggests that C. elegans MMR surveillance is carried out by a single Msh-2/Msh-6 heterodimer.
GENOME stability is continually challenged by a diverse array of mutagenic forces that include errors during DNA replication, environmental factors such as UV radiation, and endogenous mutagens such as oxygen free radicals generated during oxidative metabolism (Lindahl 1993). Multiple DNA repair pathways have evolved to minimize the mutagenic consequences of DNA damage and erroneous DNA replication. Most of the major DNA repair pathways have been detected in all three domains of life, suggesting ancient origins (Eisen and Hanawalt 1999).
The mismatch repair (MMR) pathway corrects a wide range of base-base mismatches (some involving damaged bases) and small loop-outs in DNA molecules and has been extensively studied in multiple systems using a variety of genetic, biochemical, and biophysical approaches (reviewed in Harfe and Jinks-Robertson 2000). In eukaryotic MMR, heterodimeric complexes involving homologs of the Escherichia coli MutS protein [named MutS Homologs (Msh) 1-7] mediate error surveillance and recognition. In Saccharomyces cerevisiae, Msh2/Msh6 protein heterodimers recognize and repair nuclear base-base mismatches and small (1–2 bp) insertion-deletion (indel) loops, whereas Msh2/Msh3 heterodimers correct a range of small and larger loop-outs, but do not recognize most base-base mismatches (Alani 1996; Habraken et al. 1996). In humans, hMsh2/hMsh6 and hMsh2/hMsh3 heterodimers display partially overlapping damage recognition spectra similar to that observed in S. cerevisiae (Acharya et al. 1996). Msh1 proteins are involved in maintaining mitochondrial genome stability in S. cerevisiae (Chi and Kolodner 1994), but msh1 orthologs have not been detected in any metazoan genomes surveyed thus far (Eisen 1998; unpublished genome database searches) and it is often assumed that MMR is absent from mitochondria in metazoans. Msh4 and Msh5 have roles in meiotic recombination in S. cerevisiae, Caenorhabditis elegans, and humans, with no apparent MMR-related functions (Zalevsky et al. 1999).
Despite the remarkable overall congruence in the MMR machinery and error recognition mechanisms between S. cerevisiae and humans, msh3 orthologs are not detected in the genomes of C. elegans or Drosophila melanogaster (Eisen 1998; unpublished genome database searches). Furthermore, although the Schizosaccharomyces pombe genome encodes an msh3 ortholog, S. pombe mutants deficient for this gene do not display defects in MMR but rather have reduced recombination frequencies (Tornier et al. 2001). The overall consequences of lacking an msh3 ortholog involved in MMR-mediated error recognition, however, are unclear. The Msh2/Msh6 dimer in species lacking an msh3 ortholog may recognize and repair a spectrum of errors comparable to that repaired by both Msh2/Msh3 and Msh2/Msh6 dimers in species such as S. cerevisiae and humans. Alternatively, the Msh2/Msh6 dimer in msh3 ortholog-deficient species may recognize a range of mismatches and small indel loops comparable to that of the Msh2/Msh6 dimer in msh3 ortholog-proficient species. It also cannot be ruled out that Msh-2 and/or Msh-6 homodimers may be involved in C. elegans MMR-mediated error surveillance.
Mutation spectra in MMR-deficient backgrounds have been investigated in multiple eukaryotic species including S. cerevisiae (reviewed in Harfe and Jinks-Robertson 2000), S. pombe (Tornier et al. 2001), C. elegans (Degtyareva et al. 2002; Tijsterman et al. 2002), D. melanogaster (Harr et al. 2002), mice (Andrew et al. 2000), and humans (Malkhosyan et al. 1996; Ohzeki et al. 1997; Tauchi et al. 2000; Mark et al. 2002). Although these studies have provided important fundamental insights into MMR-deficient mutation processes, they have been limited to observations at one or a few reporter loci and/or have focused exclusively on mutations at known hotspot repetitive sequences, such as microsatellites. Furthermore, a general lack of direct and unbiased estimates of baseline (repair-proficient) spontaneous mutation spectra in almost all eukaryotic species has limited our ability to interpret mutation rates and patterns in DNA repair-deficient backgrounds. An accurate and comprehensive understanding of MMR's contributions to maintaining genome stability requires a broad-based analysis of MMR-deficient mutation spectra at multiple diverse genetic loci and in a system where baseline spontaneous mutation processes are well understood.
This study provides a direct and robust analysis of mutation rates and patterns in two MMR-deficient (msh-2 and msh-6) strains of C. elegans. Mutation spectra were surveyed across multiple nuclear loci and one mitochondrial locus in the msh-2 and msh-6 C. elegans mutation-accumulation (MA) lines to provide insights into the roles of MMR in maintaining eukaryotic genome stability. Mutational estimates from a set of long-term, wild-type (WT) C. elegans MA lines (Denver et al. 2004a,b) provide a unique mutational baseline for interpreting MMR-deficient mutation processes.
MATERIALS AND METHODS
Base strains and mutation-accumulation procedures:
The msh-2 strain of C. elegans, provided as a gift from Thomas D. Petes at the University of North Carolina-Chapel Hill, contains a Tc1 transposon insertion in the seventh exon of the msh-2 gene and has been characterized as MMR-defective (Degtyareva et al. 2002). The msh-6 C. elegans strain, provided as a gift from Ronald H. Plasterk at the Hubrecht Laboratory, is missing the entire fifth and part of the sixth exon and is also MMR defective (Tijsterman et al. 2002). Before initiating MA experiments, each MMR knockout strain was backcrossed to N2 genomes six times so that the msh-2 and msh-6 experiments would be carried out on highly similar genetic backgrounds, other than the specific defective MMR genes of interest.
Fifty MA lines were initiated for each of the backcrossed msh-2 and msh-6 strains of C. elegans. Following standard MA procedures for C. elegans (Vassilieva et al. 2000), each MA line was propagated across an average of 18 generations in a benign environment (NGM agar plates seeded with the OP50 strain of E. coli as a food source, 20°) as single, randomly selected hermaphrodites picked at the L4 larval stage. This treatment resulted in an effective population size equal to one for each MA line throughout the experiment and ensured that all but the most deleterious mutations accumulated over time in an effectively neutral fashion. Sets of backups, maintained for each MA line (at 10°) for the preceding generation, were used in the event of sterile or dead worms. MA lines were declared extinct if all three consecutive attempts to transfer worms from the backup plate resulted in nonviable worms. Five msh-2 MA lines and one msh-6 line went extinct through the course of the MA experiment. For comparison, 26 of 100 original WT MA lines were extinct after an average of 214 generations (Denver et al. 2000).
Mutation detection and confirmation:
Mutations were detected in the MMR-deficient C. elegans MA lines by polymerase chain reaction (PCR)-amplifying regions of the genome followed by direct DNA sequencing of the PCR products. The majority of loci sequenced were randomly distributed across C. elegans chromosomes by designing PCR primer pairs around chromosomal positions selected by a random number generator (Denver et al. 2004b); a subset of loci, however, were directly targeted to specific homopolymer (HP) runs to evaluate their mutational properties (Denver et al. 2004a). PCRs were performed using a large amount of genomic DNA (∼25,000 diploid genomes per reaction) and 2 units Taq DNA polymerase (Eppendorf) to eliminate artifacts associated with initial amplification from small amounts of genomic DNA. PCR products were purified by solid phase reversible immobilization (Elkin et al. 2001), cycle sequenced, and analyzed on ABI3700 and ABI3730 DNA sequencers (Applied Biosystems) at the Indiana Molecular Biology Institute.
DNA sequence text files from MMR-deficient MA lines, backcrossed MMR-deficient progenitor lines, and N2 (WT) were batch aligned using Clustal W (Higgins et al. 1994) to identify putative mutations in the MMR-deficient MA lines. Putative MA line-specific mutations identified in the alignments were then visually scrutinized on the electropherogram data to eliminate base-caller errors and other sequencing artifacts. All putative mutations showing unclear or ambiguous sequence data were resequenced. Putative mutations supported by clean, unambiguous electropherogram data were then evaluated on the opposite strand (sequencing reaction in opposite direction), using internal primers where necessary. Only those mutations supported by reliable electropherogram data on both strands of directly sequenced PCR products were considered for this study.
Calculation of mutation rates:
Complex sequence mutation rates were calculated using the equation $\mu = m/(LnT)$, where $\mu$ is the mutation rate (per nucleotide site per generation), $m$ is the number of observed mutations, $L$ is the number of MA lines, $n$ is the number of nucleotide sites, and $T$ is the time in generations. The standard errors of the mean (SEM) for complex sequence mutation rates were calculated using the equation $\mathrm{SEM} = [\mu/(LnT)]^{1/2}$. For HP loci, mutation rates were calculated using the equation $\mu = m/(LhT)$, where $h$ is the number of HP loci, and SEM values were calculated using the equation $\mathrm{SEM} = [\mu/(LhT)]^{1/2}$. SEM values for mutation rates are shown in parentheses throughout the text and tables.
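As a minimal sketch, the rate and SEM equations above can be written as two small Python functions. The function names are hypothetical, and the example inputs are assumptions chosen for illustration only: the example uses 17 mutations, 45 MA lines, 18 generations, and the full 20,469 bp of surveyed nuclear sequence as an approximate value of n, which may not match the exact number of complex sequence sites used for the published estimates.

```python
import math

def mutation_rate(m, n_lines, n_units, generations):
    """mu = m / (L * n * T); n_units is n (nucleotide sites) or h (HP loci)."""
    return m / (n_lines * n_units * generations)

def mutation_rate_sem(mu, n_lines, n_units, generations):
    """SEM = sqrt(mu / (L * n * T)), following the definition in the text."""
    return math.sqrt(mu / (n_lines * n_units * generations))

# Illustrative inputs only (assumed, not the exact values behind Table 2).
mu = mutation_rate(m=17, n_lines=45, n_units=20469, generations=18)
sem = mutation_rate_sem(mu, n_lines=45, n_units=20469, generations=18)
print(f"mu = {mu:.2e} per site per generation, SEM = {sem:.2e}")
```

The same two functions apply to HP loci by passing the number of HP loci (h) as n_units, which yields a rate per HP locus per generation.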
RESULTS AND DISCUSSION
We sequenced 20,469 bp of nuclear DNA, distributed across 24 PCR product loci, and 929 bp of mitochondrial DNA (one PCR locus) from each of 45 msh-2 and 49 msh-6 C. elegans MA lines (see supplementary Table 1 at http://www.genetics.org/supplemental/). The majority of nuclear loci (21/24 total) were also assayed in the WT C. elegans MA lines (Denver et al. 2004b). Each MMR-deficient MA line was propagated across an average of 18 generations as single, randomly selected hermaphrodites to allow for the accumulation of all but the most deleterious mutations (Vassilieva et al. 2000). Both sets of MMR-deficient MA lines displayed severe fitness declines by the end of the mutation-accumulation phase (our unpublished data), consistent with a previous analysis of fitness declines in a separate set of msh-2 C. elegans MA lines (Estes et al. 2004). We investigated MMR-deficient C. elegans mutation processes first at HP loci ≥ 8 bp in length that are known indel hotspots, then across the remaining complex sequences. We compared the rates and spectra of mutation in the MMR-deficient MA lines reported here to those observed in a long-term WT set of C. elegans MA lines (Denver et al. 2004a,b) to gain unique insights into relationships between MMR and spontaneous mutation processes in C. elegans.
Mutation rates and spectra at homopolymer loci:
We first considered mutation rates and spectra at HP runs (defined here as any mononucleotide run ≥ 8 bp in length) as these simple sequences are known indel mutational hotspots in MMR-deficient backgrounds in yeast, humans, and C. elegans (Harfe and Jinks-Robertson 2000; Tijsterman et al. 2002). Fifteen nuclear HP loci (eight A:T HPs and seven G:C HPs) were assayed in the MMR-deficient MA lines in addition to a single mitochondrial (A:T)$_{11}$ run. We did not focus on other microsatellites (di- or trinucleotide, for instance) as previous studies have surveyed C. elegans MMR-deficient mutation processes at these types of loci (Degtyareva et al. 2002; Tijsterman et al. 2002), and HPs are a much more dominant component of the C. elegans genome than are other microsatellite types (Denver et al. 2004a).
We detected 149 mutations, dominated by single-nucleotide indels, at nuclear HP loci in the MMR-deficient C. elegans MA lines (Table 1). The majority of HP mutations were observed at G:C HP loci (115/149 total observed, as compared to 69.5/149 expected on the basis of an even distribution of mutations across the 15 assayed HP loci), and two-nucleotide indel mutations were exclusively observed at G:C HP runs. The mutation rates for G:C HPs were highly similar and not significantly different for the msh-2 and msh-6 MA lines (Table 2). Larger G:C HPs, 12–16 bp in length, mutated at approximately a twofold greater rate [$\mu = 1.3\ (\pm 0.1) \times 10^{-2}$ mutations per HP per generation, calculated for combined msh-2 and msh-6 mutations] than that of smaller, 8–11 bp, G:C HPs [$\mu = 4.7\ (\pm 1.0) \times 10^{-3}$ mutations per HP per generation, again calculated for both msh-2 and msh-6 data]. Mutation rates at A:T HPs were also highly similar between the msh-2 and msh-6 MA lines (Table 2). Larger A:T HPs, 12–16 bp in length, mutated at a higher rate [$\mu = 6.2\ (\pm 1.4) \times 10^{-3}$ mutations per HP per generation, calculated for combined msh-2 and msh-6 mutations] than that of smaller, 8–11 bp, HP loci [$\mu = 1.2\ (\pm 0.4) \times 10^{-3}$ mutations per HP per generation, also calculated for combined msh-2 and msh-6 mutations]. Overall, mutation rates at A:T HPs were lower (three- to fourfold) than rates observed at G:C HPs in the MMR-deficient MA lines, consistent with observations in S. cerevisiae (Tran et al. 1997; Gragg et al. 2002). All mutations observed at HP loci were indels, the majority of which were single-base pair indels (five 2-bp indels were observed). Deletions were more prevalent than insertions in the MMR-deficient MA lines at both A:T and G:C HPs (Table 1).
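The expected count quoted above (69.5 of 149 mutations at G:C HPs) follows from spreading the mutations evenly over the 15 assayed HP loci, 7 of which are G:C runs. A minimal sketch of that arithmetic is shown here; the function name and dictionary layout are hypothetical choices made for illustration.

```python
def expected_even_counts(total_mutations, loci_per_class):
    """Expected mutations per class if every locus is equally likely to mutate."""
    total_loci = sum(loci_per_class.values())
    return {name: total_mutations * k / total_loci
            for name, k in loci_per_class.items()}

# Eight A:T and seven G:C nuclear HP loci; 149 HP mutations in total (from the text).
expected = expected_even_counts(149, {"A:T": 8, "G:C": 7})
observed = {"A:T": 149 - 115, "G:C": 115}

for name in ("A:T", "G:C"):
    ratio = observed[name] / expected[name]
    print(f"{name}: observed {observed[name]}, expected {expected[name]:.1f}, "
          f"observed/expected = {ratio:.2f}")
```

This only reproduces the even-distribution expectation; it does not account for differences in HP length among loci.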
No mutations were observed at the mitochondrial (A:T)$_{11}$ HP run. This is notable as 4.5 (±1.4) mutations were expected among the 94 MMR-deficient MA lines, under the assumption that nuclear and mitochondrial A:T HPs mutate at comparable rates in MMR-deficient backgrounds. Alternatively, under a null expectation that mitochondrial HP loci mutate at the same rate in WT (Denver et al. 2000) and MMR-deficient backgrounds, 0.5 (±0.2) mutations would be expected at the (A:T)$_{11}$ HP. The number of mutations observed at this mitochondrial HP locus in the MMR-deficient MA lines (zero) was much closer to the latter expectation, suggesting that msh-2 and msh-6 do not function in mitochondrial MMR in C. elegans.
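One way to see why zero observed mutations favors the second expectation is to ask how likely a count of zero is under each. The sketch below assumes that mutation counts at this locus are Poisson distributed, which is an assumption made here for illustration; the text does not state that this calculation was performed, and the function name is hypothetical.

```python
import math

def prob_zero_events(expected_count):
    """Poisson probability of observing zero events given an expected count."""
    return math.exp(-expected_count)

# Expected counts taken from the text: 4.5 (nuclear-like rate) and 0.5 (WT-like rate).
p_if_nuclear_like = prob_zero_events(4.5)
p_if_wt_like = prob_zero_events(0.5)

print(f"P(0 mutations | expect 4.5) = {p_if_nuclear_like:.3f}")
print(f"P(0 mutations | expect 0.5) = {p_if_wt_like:.3f}")
```

Under these assumptions, observing no mutations would occur roughly 1% of the time if 4.5 were expected and roughly 61% of the time if 0.5 were expected. The calculation ignores the uncertainty (SEM) attached to each expected value.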
The remarkable similarity between the msh-2 and msh-6 MA lines in terms of overall nuclear HP mutation rate and pattern suggests that a single Msh-2/Msh-6 heterodimeric complex mediates MMR surveillance for postreplicative HP loop-outs in C. elegans. This is supported by the apparent absence of an msh3 ortholog in the C. elegans genome (Eisen 1998) and indications from yeast two-hybrid experiments that Msh-2 and Msh-6 may interact exclusively with one another (Boulton et al. 2002). The mutational similarities between msh-2 and msh-6 C. elegans reported here contrast with observations in S. cerevisiae where HP mutation patterns observed in msh2, msh3, and msh6 strains differ markedly with respect to one another (Gragg et al. 2002). Compared to G:C HP mutation rates in the WT set of C. elegans MA lines (Denver et al. 2004a), a ∼100-fold elevated rate was observed for G:C HPs in the MMR-deficient MA lines. The mutation rate disparity between the MMR-deficient and WT MA lines was more pronounced for A:T HPs where the rate was elevated ∼500-fold in the MMR-deficient MA lines. This observation suggests that the C. elegans MMR surveillance machinery may have evolved a greater ability to recognize and repair loop-outs specific to A:T HPs as compared to G:C HPs. The selective pressure to maintain stability specifically at A:T HP runs may be related to the extreme dominance of A:T HPs in the C. elegans genome (146,224 A:T HPs ≥ 8 bp were detected in the genome) as compared to G:C HPs (only 2,401 G:C HPs ≥ 8 bp were detected) (Denver et al. 2004a).
Mutation rates and spectra in complex sequence:
Although mutations at known hotspot repetitive loci (such as HPs and microsatellites) in MMR-deficient backgrounds provide important insights into the MMR process, the vast majority of the C. elegans genome is composed of more complex sequences where the impacts of MMR on mutation processes are less clear. Across >20 kb of nuclear DNA surveyed from each MMR-deficient MA line, we detected 17 nuclear complex sequence mutations (not at HP loci, defined previously) in the msh-2 MA lines and 19 nuclear complex sequence mutations in the msh-6 MA lines (Table 3), resulting in strikingly similar total mutation rates for the msh-2 and msh-6 MA lines (Table 2) and further suggesting that MMR surveillance in C. elegans is carried out by a single Msh-2/Msh-6 heterodimer. No complex sequence mutations were observed at the mitochondrial locus. Highly similar mutation rates specific for base substitutions and indels were observed for the msh-2 and msh-6 MA lines (Table 2).
The distributions of mutations across msh-2 and msh-6 MA lines were very close to Poisson expectations (for msh-2, 30.8, 11.7, 2.2, and 0.3 instances of MA lines with 0, 1, 2, and 3 mutations were expected, respectively, and 31, 12, 1, and 1 lines were observed with 0, 1, 2, and 3 mutations, respectively; for msh-6, 33.2, 12.9, 2.5, and 0.3 instances of MA lines with 0, 1, 2, and 3 mutations were expected, respectively, and 33, 14, 1, and 1 MA lines were observed with 0, 1, 2, and 3 mutations, respectively). For the msh-2 MA lines, the distribution of mutations across the 24 assayed nuclear PCR product loci was also very close to Poisson expectations (11.8, 8.4, 3.0, and 0.7 loci with 0, 1, 2, and 3 mutations were expected, respectively, and 11, 8, 4, and 1 loci with 0, 1, 2, and 3 mutations were observed, respectively). For the msh-6 MA lines, however, seven mutations were observed at a single locus (ZK337; see Table 3), resulting in a significant deviation from Poisson expectations (P < 0.005) for the distribution of mutations across PCR loci (10.9, 8.6, 3.4, 0.9, and $4.2 \times 10^{-4}$ loci with 0, 1, 2, 3, and 7 mutations were expected, respectively, and 14, 7, 1, 1, and 1 loci with 0, 1, 2, 3, and 7 mutations were observed, respectively).
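The Poisson expectations listed above can be sketched as follows, assuming the Poisson mean is estimated as the total number of complex sequence mutations divided by the number of MA lines (or loci). That estimator, the function name, and the use of 17 (msh-2) and 19 (msh-6) mutations as totals are assumptions made for this illustration.

```python
import math

def poisson_expected_counts(total_mutations, n_units, ks):
    """Expected number of units (lines or loci) carrying k mutations, for each k in ks."""
    lam = total_mutations / n_units  # assumed estimator of the Poisson mean
    return [n_units * math.exp(-lam) * lam ** k / math.factorial(k) for k in ks]

msh2_lines = poisson_expected_counts(17, 45, [0, 1, 2, 3])
msh6_lines = poisson_expected_counts(19, 49, [0, 1, 2, 3])
msh6_loci = poisson_expected_counts(19, 24, [0, 1, 2, 3, 7])

print("msh-2 lines:", [round(x, 1) for x in msh2_lines])
print("msh-6 lines:", [round(x, 1) for x in msh6_lines])
print("msh-6 loci: ", [f"{x:.2g}" for x in msh6_loci])
```

With these inputs the computed values agree with the expectations quoted in the text to within rounding, including the very small expected number of loci carrying seven mutations.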
The complex sequence mutation rates observed in MMR-deficient backgrounds were ∼100-fold higher than complex sequence mutation rates in the WT MA lines (Denver et al. 2004b). This observation is in contrast to suggestions that the relatively high mutation rates observed in the WT C. elegans MA lines (as compared to previous lower mutation rate estimates for C. elegans based on indirect phenotypic assays) may be due to mutation-induced stress responses where mismatch repair is inactivated (Rosenberg and Hastings 2004). If the high mutation rate observed in the WT MA lines was due to stress-induced MMR-deficiency, we would expect roughly similar mutation rates between the WT and MMR-deficient C. elegans MA lines rather than the observed ∼100-fold disparity.
Similar mutation patterns were also observed between the msh-2 and msh-6 C. elegans MA lines. Among base substitutions, eight transitions and three transversions were observed in the msh-2 MA lines; seven transitions and two transversions were observed in the msh-6 MA lines (Table 3). These transition biases were also similar to those observed in the WT C. elegans MA lines (Denver et al. 2004b). Six complex sequence indel mutations were observed in the msh-2 MA lines: three single-bp insertions and three single-bp deletions. Ten complex sequence indels were observed for the msh-6 MA lines: seven single-base pair insertions, one 2-bp insertion, one single-base pair deletion, and one 8-bp deletion. Seven of the msh-6 indels, all insertions, were observed at a single locus (ZK337). Six of the 16 total complex sequence indels in the MMR-deficient MA lines (both msh-2 and msh-6) were in short mononucleotide run stretches below our cutoff threshold for HP loci (≥8 bp). Whereas complex sequence base substitutions were more frequent than indels in MMR-deficient backgrounds, complex sequence indels occurred at a higher rate than did base substitutions in the WT MA lines (Denver et al. 2004b), suggesting that the C. elegans MMR machinery may be more efficient at eliminating base substitutions than indels in complex sequence.
The overall similarity in complex sequence mutation spectra observed between the msh-2 and msh-6 C. elegans MA lines contrasts with mutation patterns observed in MMR-deficient S. cerevisiae strains at the Canr locus (Marsischky et al. 1996) that vary with respect to differing specific genotypes (Table 4). Similarly, mutation spectra differed in MMR-deficient human colon carcinoma cell lines at the hprt locus depending on the specific MMR-deficient background (Malkhosyan et al. 1996; Ohzeki et al. 1997; Tauchi et al. 2000). The distinctive similarity between msh-2 and msh-6 C. elegans mutation spectra, as compared to yeast and humans, may be due to the apparent absence of an msh3 homolog and the corresponding presence of a single major MMR error surveillance complex (the Msh-2/Msh-6 heterodimer) in C. elegans.
An exception to the trend of mutational similarity between msh-2 and msh-6 C. elegans was the ZK337 locus where seven insertion mutations were observed exclusively in the msh-6 MA lines (Table 3); no mutations were found at this locus in the msh-2 MA lines. This observation deviated significantly (P < 0.05) from expectations based on an even distribution of ZK337 mutations across msh-2 and msh-6 MA lines. This msh-6-specific mutational hotspot locus was also distinct from other assayed nuclear loci as all seven observed mutations (occurring at four distinct positions within the locus; Table 2) were insertions, whereas an approximately equal number of insertions and deletions were detected at the other 23 loci surveyed in the MMR-deficient MA lines. This finding suggests that Msh-6 may be involved in mutation deterrence in an Msh-2-independent fashion that is specific to this region (and perhaps other regions) of the C. elegans genome. Further studies are required to understand the significance of the unusual mutation patterns at this single locus.
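The text does not state which test produced the P < 0.05 value for the ZK337 locus. As a hedged illustration of one way such a probability could be obtained, the sketch below assumes that the seven mutations arose independently and that each was equally likely to occur in any of the 94 MA lines (45 msh-2 and 49 msh-6). Both the test and the function name are assumptions, not the authors' stated method.

```python
def prob_all_in_one_group(n_events, n_group, n_total):
    """Probability that all n_events fall in a group holding n_group of n_total lines."""
    return (n_group / n_total) ** n_events

# Seven ZK337 mutations; 49 msh-6 lines and 45 msh-2 lines (94 in total).
p_all_msh6 = prob_all_in_one_group(7, 49, 94)
p_all_msh2 = prob_all_in_one_group(7, 45, 94)

# Two-sided probability: all seven mutations confined to either genotype.
p_two_sided = p_all_msh6 + p_all_msh2
print(f"one-sided P = {p_all_msh6:.4f}, two-sided P = {p_two_sided:.4f}")
```

Under these assumptions both the one-sided and the two-sided probabilities fall below 0.05, in line with the significance level reported in the text.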
In this study we provide important insights into the relationships between MMR and spontaneous mutation processes in C. elegans. We find that, for the most part, mutation rates and spectra in two distinct MMR-deficient C. elegans backgrounds (msh-2 and msh-6) are highly similar to one another (the ZK337 locus being an exception), but differ when compared to rates and patterns of spontaneous mutation in a WT strain of C. elegans. The nuclear MMR-deficient mutation spectra reported here for C. elegans also differ from those observed in other eukaryotes that encode an msh3 ortholog (S. cerevisiae and humans). A broad-based understanding of the evolution of MMR pathways and their contributions to maintaining genome stability across eukaryotic phylogeny will require comparable surveys of MMR-deficient mutation spectra in species such as D. melanogaster, which also appears to lack an msh3 ortholog, and Arabidopsis thaliana, which encodes AtMsh2, AtMsh3, AtMsh6, and AtMsh7, carrying out MMR surveillance with three different error surveillance complexes (Culligan and Hays 2000).
We thank T. D. Petes at the University of North Carolina, Chapel Hill and R. H. Plasterk at the Hubrecht Laboratory for providing the msh-2 and msh-6 C. elegans strains, respectively. We thank Lawrence Washington at the Indiana Molecular Biology Institute for DNA sequencing. We thank two anonymous reviewers for helpful comments. Funding for this work was provided by National Institutes of Health grant R01 GM36827 to M.L. and W.K.T., and fellowship F32 GM66652 to D.R.D.
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY863110, AY863111, AY863112, AY863113, AY863114, AY863115, AY863116, AY863117, AY863118, AY863119, AY863120, AY863121, AY863122, AY863123, AY863124, AY863125, AY863126, AY863127, AY863128, AY863129, AY863130, AY863131, AY863132, AY863133, AY863134, AY863135, AY863136, AY863137, AY863138, AY863139, AY863140, AY863141, AY863142, AY863143.
Communicating editor: K. Kemphues
- Received November 12, 2004.
- Accepted January 20, 2005.
- Genetics Society of America