Abstract
Hereditary nonpolyposis colorectal cancer (HNPCC) is associated with defects in DNA mismatch repair. Mutations in either hMSH2 or hMLH1 underlie the majority of HNPCC cases. Approximately 25% of annotated hMSH2 disease alleles are missense mutations, resulting in a single change out of 934 amino acids. We engineered 54 missense mutations in the cognate positions in yeast MSH2 and tested for function. Of the human alleles, 55% conferred strong defects, 8% displayed intermediate defects, and 38% showed no defects in mismatch repair assays. Fifty percent of the defective alleles resulted in decreased steady-state levels of the variant Msh2 protein, and 49% of the Msh2 variants lost crucial protein–protein interactions. Finally, nine positions are predicted to influence the mismatch recognition complex ATPase activity. In summary, the missense mutations leading to loss of mismatch repair defined important structure–function relationships and the molecular analysis revealed the nature of the deficiency for Msh2 variants expressed in the tumors. Of medical relevance are 15 human alleles annotated as pathogenic in public databases that conferred no obvious defects in mismatch repair assays. This analysis underscores the importance of functional characterization of missense alleles to ensure that they are the causative factor for disease.
Colorectal cancer is the second leading cause of cancer mortality in the United States (Jemal et al. 2005). In 2006, the American Cancer Society statistics revealed that colorectal cancer was newly diagnosed in ∼153,000 Americans and caused ∼56,000 deaths. Approximately 2–7% of these cases are the consequence of an inherited form of the disease, hereditary nonpolyposis colorectal cancer (HNPCC) (Peltomaki 2003). HNPCC is a dominant and highly penetrant disease in that individuals inheriting a single defective locus have an 80–90% chance of developing cancer before 50 years of age (Lynch and Lynch 2000). The hereditary nature of HNPCC was first documented in 1913 (Warthin 1913). Eighty years later, in an exciting convergence of medical and basic science research, investigators proved that HNPCC results from defects in genes encoding DNA mismatch repair proteins (Fishel et al. 1993; Leach et al. 1993; Peltomaki et al. 1993).
DNA mismatch repair is an ancient and well-conserved mechanism that significantly contributes to the accurate preservation of genetic material (Schofield and Hsieh 2003; Kunkel and Erie 2005). DNA mismatches form primarily during replication of the genome (Kunkel and Erie 2005), recombination between nonidentical sequences (Surtees et al. 2004), or exposure to DNA-damaging agents (Stojic et al. 2004). Postreplicative mismatch repair includes recognition and binding of a mismatch in the DNA helix followed by incision and degradation of the error-containing strand. Once the error is removed, a new DNA strand with perfect base pairing is synthesized. By eliminating single-base mismatches and insertion/deletion loops that arise because of DNA polymerase error and slippage, mismatch repair enhances the fidelity of DNA replication ∼1000-fold (Schofield and Hsieh 2003; Kunkel and Erie 2005). Without an intact mismatch repair system numerous mutations accumulate, some of which are deleterious, including ones that eventually lead to cancer in higher organisms (Jacob and Praz 2002; de la Chapelle 2004). In addition to an increased rate of single base pair mutations, repetitive DNA elements, such as mono-, di-, and trinucleotide repeat sequences, are highly unstable in mismatch repair-defective bacteria (Levinson and Gutman 1987), yeast (Strand et al. 1993), mouse (Reitmair et al. 1996), and human cells (Aaltonen et al. 1993; Ionov et al. 1993; Parsons et al. 1993; Thibodeau et al. 1993). This phenomenon, referred to as microsatellite instability, is a hallmark of mismatch repair-defective tumors.
Genome sequencing data has uncovered DNA mismatch repair homologs in highly divergent phylogenetic branches, including eubacteria, archaebacteria, and eukaryotes (Wheeler et al. 2000). The first mismatch repair components to be characterized included bacterial MutS, MutL, and MutH (Schofield and Hsieh 2003). Eukaryotic homologs of MutS (MutS homolog or MSH genes) and MutL (MutL homolog or MLH and some postmeiotic segregation or PMS genes) function similarly to their bacterial counterparts but with important differences.
The first step in DNA mismatch repair is identification of mispaired bases in the DNA helix. While MutS acts as a homodimer or homo-oligomer to bind mispaired DNA in prokaryotes (Su and Modrich 1986; Jiricny et al. 1988; Su et al. 1988; Bjornson et al. 2000), the equivalent task is performed in eukaryotes by two separate heterodimers known as MutSα, consisting of Msh2 and Msh6 (Iaccarino et al. 1996), and MutSβ, composed of Msh2 and Msh3 (Habraken et al. 1996; Palombo et al. 1996). The distinct heterodimers possess differing mismatch recognition properties; MutSα binds primarily to single base pair mismatches and small insertion/deletion loops, whereas MutSβ binds to larger insertion/deletion loops up to 16 nucleotides in length (Acharya et al. 1996; Marsischky et al. 1996).
After mismatch recognition, the bacterial MutL homodimer complexes with MutS and initiates subsequent repair events (Au et al. 1992; Hall and Matson 1999; Acharya et al. 2003; Junop et al. 2003). The major MutL eukaryotic equivalent is MutLα, a heterodimer, composed of Mlh1 and Pms1 (called Pms2 in humans) (Prolla et al. 1994; Pang et al. 1997). A crucial step in mismatch repair requires that the error-containing strand is specifically targeted for repair. The MutH endonuclease confers the strand specificity in certain prokaryotes, where only the newly synthesized, unmethylated strand is cleaved for degradation (Welsh et al. 1987; Acharya et al. 2003). A mismatch repair-specific endonuclease activity has not been identified in eukaryotes; however, in vitro, a nicked DNA strand will serve as the initiation site for strand degradation (Iams et al. 2002; Dzantiev et al. 2004). Downstream mismatch repair events include displacement and excision of the error-containing strand followed by new synthesis (Lahue et al. 1989). These steps require a helicase (Yamaguchi et al. 1998; Mechanic et al. 2000; Yang et al. 2004) and multiple, redundant exonucleases (Burdett et al. 2001; Genschel et al. 2002) for complete excision in prokaryotes. No specific eukaryotic helicase has been implicated; furthermore, in a reconstituted system with purified components a helicase is not essential (Constantin et al. 2005; Zhang et al. 2005). Only a single exonuclease, Exo1 (Tran et al. 2004), has been verified as playing a direct role in eukaryotic mismatch repair. Because cleavage of the error-containing strand may be on either side of the mismatch, a central feature of excision is that it be directed toward the mismatch. The presence of Pol30/PCNA (Johnson et al. 1996; Umar et al. 1996) and replication factor C facilitate the directed excision toward the mismatch in eukaryotes (Dzantiev et al. 2004). 
After excision of the error-containing strand, single-stranded DNA-binding proteins (Lahue et al. 1989), DNA polymerase, and DNA ligase (Tomkinson and Mackey 1998) are needed to synthesize the error-free DNA duplex.
Characterization of key components in homologous systems allowed for the accelerated identification of mismatch repair genes in humans and ultimately for the discovery of an array of clinically significant mutations (Peltomaki and Vasen 2004). Included among the lesions are missense alterations, comprising nearly one-quarter of the known mismatch repair mutations in the human databases (e.g., HGMD and InSiGHT). These missense mutations result in variant mismatch repair proteins with a single amino acid change. A primary objective of this work was to determine whether clinically identified missense mutations result in dysfunctional proteins (pathogenic) or in no obvious defect (benign) in mismatch repair assays. Distinguishing a pathogenic from a benign polymorphism is imperative for genetic counseling and clinical surveillance in HNPCC families. In addition, molecular characterization of pathogenic variant mismatch repair proteins contributed to the understanding of the basic mechanism of mismatch repair and by extension to oncogenesis in HNPCC tumors.
MATERIALS AND METHODS
Microbial and molecular manipulations:
Microbial manipulations were conducted according to previously published procedures (Ausubel et al. 1994; Burke et al. 2000). Bacterial strains used were CJ236, MV1190 (Bio-Rad Laboratories, Hercules, CA), and XL2-Blue (Stratagene, La Jolla, CA). Molecular methods were carried out using standard protocols (Ausubel et al. 1994). Plasmid DNA extractions were accomplished using the QIAGEN procedure (QIAGEN, Valencia, CA). Princeton Syn/Seq Facility or GENEWIZ (South Plainfield, NJ) performed all nucleotide-sequencing reactions.
Plasmids:
The salient features of plasmids are listed in supplemental Table 1 at http://www.genetics.org/supplemental/. Primers used in the construction of plasmids are listed in supplemental Table 2. pMSH2 was created to express a hemagglutinin (HA) epitope-tagged MSH2 gene from the endogenous promoter. pMSH2 was formed by fusing a SpeI–BglII fragment from pEAM42, a BglII–ClaI fragment from pEAE113, and a ClaI–SpeI digested pRS413 (Sikorski and Hieter 1989) vector fragment. pEAE113 and pEAM42 were generously provided by Eric Alani (Cornell University). MSH2∷HA on pMSH2 was sequenced and shown to complement the mismatch repair defect of a msh2Δ strain.
The pGBD-MSH2 construct allows for constitutive expression of an in-frame fusion between MSH2∷HA and the GAL4 DNA binding domain (GBD). A GBD-MSH2 fusion construct lacking the HA epitope (pAG55) was made by PCR amplification of the MSH2 reading frame from yeast genomic DNA followed by in vivo recombination (Oldenburg et al. 1997). MSH2 fragments were amplified using primers 1 + 2 for the 5′ fusion fragment and primers 3 + 4 for the 3′ fragment. The vector, pGBD-C2 (James et al. 1996), was digested with BamHI and PstI. The linearized vector and the PCR-amplified MSH2 fragments were introduced into yeast to form pAG55. A 760-bp epitope-containing SpeI–PstI fragment from pMSH2 was used to replace the corresponding untagged C-terminal coding region of MSH2 (580 bp) encoded in pAG55 according to standard ligation procedures (Sambrook et al. 1989), resulting in pGBD-MSH2. The fusion junction and the MSH2∷HA coding region were verified by nucleotide sequencing.
The GAL4 activating domain (GAD) fusion constructs allow for constitutive expression of an in-frame fusion between the ORFs for the functional partners of MSH2 and the GAD coding region. The functional partners include MSH6, MSH3, MLH1, PMS1, EXO1, and POL30. The fusion constructs were made by PCR amplification of the partner ORFs from yeast genomic DNA and fusion constructs were formed by in vivo recombination techniques. The primers used are listed in supplemental Table 2 at http://www.genetics.org/supplemental/ and were as follows: 5 + 6 and 7 + 8 for MSH3; 9 + 10 and 11 + 12 for MSH6; 13 + 14 and 15 + 16 for EXO1; 17 + 18 and 19 + 20 for MLH1; 21 + 22 and 23 + 24 for PMS1; and 25 + 26 for POL30. The vector, pGAD-C2 (James et al. 1996), was digested with BamHI and PstI. The linearized vector and the PCR-amplified fragments for each partner ORF were introduced into yeast to form the GAD-fusions. All constructs were extracted from yeast and confirmed by restriction endonuclease digestion.
The pGAL-MSH2 construct contains the MSH2∷HA coding region downstream of the galactose-inducible GAL10 promoter. The GAL10 promoter (PGAL10) was amplified from pMR438 using primers 29 + 30. MSH2 was amplified from pMSH2 DNA using primers 27 + 2 (for the 5′ coding region) and 3 + 28 (for the 3′ coding region). BamHI-linearized pRS423 vector (Sikorski and Hieter 1989) and PCR-amplified fragments containing the PGAL10 and MSH2 were introduced into yeast to form pGAL-MSH2 by homologous recombination. The pGAL-MSH2 construct was verified by restriction endonuclease digestion and by dideoxy nucleotide sequencing of PGAL10 and MSH2∷HA coding sequence.
Missense mutations were introduced into the msh2 coding region of pMSH2 and pGAL-MSH2 by oligonucleotide-directed mutagenesis (Kunkel 1985). Mutagenic oligonucleotides (Princeton Syn/Seq Facility) listed in supplemental Table 3 at http://www.genetics.org/supplemental/ were phosphorylated and used to prime synthesis of the complementary strand of uracil-containing DNA template in an in vitro mutagenesis reaction (Ausubel et al. 1994). Whenever possible, the codon to generate the missense mutation was selected to either create or destroy a restriction endonuclease site within MSH2 (supplemental Table 3). The missense mutations were verified by nucleotide sequencing. Missense mutations were introduced into pGBD-MSH2 using either in vivo recombination or recombinant DNA technology. All GBD-MSH2 missense allele-containing constructs were confirmed by restriction endonuclease digestion or nucleotide sequencing.
Strain construction:
All yeast strains used in this study were derived from W303 and confirmed to be wild type at the RAD5 locus by amplifying the region using colony PCR and diagnostic MnlI digestions with primers 32 and 33. Construction of AGY75, the msh2Δ reporter strain, was as follows. The URA3 auxotrophic marker at the msh2 locus (msh2∷URA3) in the strain LYS505 (Lorraine Symington, Columbia University) was changed to LEU2 using pUL9 (Cross 1997) to generate AGY28 (MATa ade2-1 leu2-3,112 trp1-1 ura3-1 can1-100 his3-11,15 msh2∷LEU2). AGY28 was crossed to a CAN1 strain (Rodney Rothstein, Columbia University) to isolate AGY70 (MATa CAN1 ade2-1 trp1-1 ura3-1 leu2-3,112 his3-11,15 msh2Δ∷LEU2). A dinucleotide instability reporter encoded on pSH44 (Henderson and Petes 1992) was introduced into AGY70 to form AGY75.
Mismatch repair assays:
The mutagenized plasmids (containing the msh2 missense alleles), the pMSH2 unmutagenized plasmid (containing wild-type MSH2), and the pRS413 vector control (no MSH2) were used to transform the yeast reporter strain AGY75. Colonies from each of the transformations were tested for DNA mismatch repair in qualitative and quantitative assays following previously detailed procedures (Lea and Coulson 1949; Henderson and Petes 1992; Reenan and Kolodner 1992).
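The quantitative assays cite Lea and Coulson (1949), whose fluctuation analysis is commonly applied through the method of the median. The sketch below illustrates that estimator only; the text does not state which Lea and Coulson estimator was used, so the choice of the median method, the function names, and the division of m by the number of cells per culture to obtain a rate are all assumptions made for illustration.

```python
import math
import statistics


def lea_coulson_m(median_mutants):
    """Solve r/m - ln(m) - 1.24 = 0 for m, where r is the median number
    of mutants per culture (Lea-Coulson method of the median)."""
    if median_mutants <= 0:
        raise ValueError("the median estimator needs a median above zero")

    def f(m):
        return median_mutants / m - math.log(m) - 1.24

    # f is strictly decreasing in m; f(lo) > 0 and f(hi) < 0 bracket the root.
    lo, hi = 1e-12, median_mutants + 1.0
    for _ in range(200):  # fixed number of bisection steps
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mutation_rate(mutant_counts, cells_per_culture):
    """Estimated mutations per cell (assumption: rate = m / cells per culture)."""
    r = statistics.median(mutant_counts)
    return lea_coulson_m(r) / cells_per_culture
```

Given drug-resistant colony counts from a set of parallel cultures and the number of cells per culture, `mutation_rate` returns one estimate per strain, which can then be compared with the wild-type and msh2Δ controls.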
Immunoblotting:
Approximately 3 × 10⁷ cells of each strain were used to prepare protein extracts (Ohashi et al. 1982; Burke et al. 2000). Samples were fractionated using a 7% resolving gel (Ausubel et al. 1994) and detection of Msh2-HA was conducted according to the ECL immunoblotting procedure (Amersham Biosciences, Piscataway, NJ). The primary antibody used was mouse 12CA5 monoclonal antibody specific for the HA epitope (Princeton Monoclonal Facility). The secondary antibody was α-mouse-horseradish peroxidase (HRP)-conjugated secondary antibody (Amersham Biosciences). Both antibodies were used at a 1:2500 dilution. After visualization of Msh2-HA, the membrane was reprobed with rabbit α-Kar2p polyclonal (our laboratory, 1:50,000 dilution) and α-rabbit IgG-HRP (Amersham Biosciences, 1:2500 dilution) antibodies to assay for equal protein concentrations and loadings of the samples.
Yeast two-hybrid assays:
pGBD-C2, pGBD-MSH2, and the plasmids containing the fusions between GBD and the msh2 alleles were introduced into PJ69-4A (James et al. 1996) and GAD fusion constructs were introduced into PJ69-4α (James et al. 1996). Crosses were conducted to form diploid yeast strains for 24 hr at 30° and replica printed to selective plates. For semiquantitative assays, cultures were grown in liquid selective medium to saturation. Fivefold serial dilutions were performed in microtiter dishes and spotted onto selective plates.
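For reference, the relative cell densities in a fivefold dilution series can be tabulated as in the following sketch. The starting density and the number of dilution steps are hypothetical placeholders, since neither is specified in the text.

```python
def serial_dilution(start_density, fold=5, steps=5):
    """Relative densities for a dilution series; the first entry is undiluted."""
    return [start_density / fold**i for i in range(steps)]


# Hypothetical example: five spots starting from an arbitrary density of 625 units.
spots = serial_dilution(625.0, fold=5, steps=5)
```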
Screen to identify amino acids in Msh2 important for MutSα and MutSβ subunit interactions:
Hydroxylamine mutagenesis (Burke et al. 2000) of pGBD-MSH2 was conducted to screen for mutations conferring MutSα and/or MutSβ subunit binding defects. PJ69-4A (James et al. 1996) was transformed with hydroxylamine mutagenized pGBD-MSH2. The transformants were mated to confluent lawns of PJ69-4α (James et al. 1996) harboring pGAD-MSH6 or pGAD-MSH3. The resulting diploids were tested for the loss of two-hybrid interactions as described above. Roughly 10,000 colonies were screened. Plasmids that appeared to confer a defect in the binding of MutSα/β subunits were extracted from yeast and retested for subunit interaction upon transformation back into the PJ69-4A strain. Plasmids resulting in defective two-hybrid interactions were sequenced. Identified missense mutations were introduced into the wild-type gene on pMSH2 and tested for functionality as described above.
RESULTS
Functional characterization of hMSH2 missense mutations:
Since the discovery of the linkage between defects in hMSH2 and HNPCC, clinical isolates have yielded an array of mutations that potentially contribute to the disease. For many of the mutations (e.g., those resulting in deletions or truncations) the nature of the genetic lesion causes an obvious loss of function; however, for the ∼25% of missense mutations scattered throughout the coding region that cause a single amino acid change in the 934 amino acid protein, linkage or functional analysis is required before confidently assigning a pathogenic designation. Because DNA mismatch repair is highly conserved from bacteria to humans, the missense mutations may be easily characterized in the genetically facile yeast, Saccharomyces cerevisiae. In addition to analyzing 40 hMSH2 alleles, we examined 11 missense alleles characterized originally in bacterial mutS (Wu and Marinus 1994) and 3 yeast missense mutations identified in our laboratory. The findings presented below and the work of others confirm that yeast is an ideal organism for testing the function of human mutations (Jeyaprakash et al. 1996; Shimodaira et al. 1998; Drotschmann et al. 1999; Shcherbakova and Kunkel 1999; Ellison et al. 2001; Hoffmann et al. 2003; Clodfelter et al. 2005).
The mutations were introduced into the yeast MSH2 coding sequence contained on a stable, low-copy-number plasmid, pMSH2, using site-directed mutagenesis (Kunkel 1985). A yeast codon usage table (Cherry et al. 1997) was consulted when planning the mutagenesis to prevent decreased expression because of placement of a rare codon. The mutations were confirmed by restriction endonuclease digestion and by nucleotide sequence analysis. Plasmids with the MSH2 alleles and control plasmids were introduced into a yeast reporter strain lacking the MSH2 gene (msh2Δ) to assay DNA mismatch repair.
Qualitative and quantitative mismatch repair assays were performed in vivo. In these assays, an increased incidence of survivors on drug-containing medium is indicative of defects in DNA mismatch repair. Specifically, elevated rates of resistance to 5-FOA reflect the failure to repair polymerase slippage at the dinucleotide tract fused upstream of the URA3 gene on a resident reporter construct (Henderson and Petes 1992), whereas an increased rate of resistance to canavanine represents a failure to repair single base pair mismatches or single nucleotide insertions/deletions in the CAN1 coding sequence (Tishkoff et al. 1997). An example of a qualitative mismatch repair assay is shown in Figure 1A. The wild-type strain is sensitive to the drugs, in contrast to the msh2 null strain, in which drug-resistant mutants appear at high frequency. By comparison with the controls, a judgment as to whether the msh2 alleles result in defects in DNA mismatch repair was made for each msh2 missense strain. The results of quantitative (Table 1 for representative alleles) and qualitative (Table 2 for all alleles) mismatch repair assays revealed that 33 of the msh2 missense allele strains displayed a pronounced defect when compared to the controls. Three exhibited intermediate loss of activity and 18 displayed wild-type function.
Mismatch repair assays of MSH2 missense alleles. (A) Qualitative functional assays. The phenotypes of the strains expressing the msh2 missense substitutions are compared to the wild type (WT) and msh2 null (Δ) controls to determine the mismatch repair efficiencies in standard canavanine (CAN) (Reenan and Kolodner 1992) and 5-fluoroorotic acid monohydrate (FOA) resistance plate assays (Henderson and Petes 1992). The missense strains are listed according to the amino acid change; e.g., L521P denotes that at position 521 the leucine codon was changed to a proline codon. Uniform growth on medium lacking histidine and tryptophan (–HIS –TRP) serves as a replica-printing control. (B and C) The frequency of missense mutations resulting in pseudo-wild-type (B) or in mismatch repair-defective (C) phenotypes. The codon number scale on the x-axes is the same for B, C, and D. The frequency scales on the y-axes of B and C differ. (D) Conservation among MutS homolog 2 proteins. A CLUSTAL W (version 1.82) multiple sequence alignment of human, mouse, worm, fly, mold, and yeast MSH2 was used in conjunction with the Evolutionary phylogenetic SHADOWing of closely related species (Ovcharenko et al. 2004) to generate a plot showing the degree of conservation (substitutions per amino acids) across the codon positions. The graph is plotted to emphasize the conserved regions of MSH2.
Mutation frequencies of representative alleles
Mismatch repair phenotype of missense MSH2 alleles
The “pseudo-wild-type” alleles include 3 missense alleles with bacterial equivalents and 15 with clinical origins (Table 2). To ensure that subtle defects in DNA mismatch repair were not overlooked, all the pseudo-wild-type and intermediate alleles were assayed using sensitive quantitative methods (Table 3). Using the dinucleotide instability assay as a more sensitive indicator of defects in DNA mismatch repair, we were able to determine that within the wild-type class, seven had subtle defects that were not apparent on the qualitative plates (Table 3, +/− designation).
Mutation frequencies of pseudo-wild-type and intermediate alleles
Overall, our results are in agreement with other reports of MSH2 missense alleles in yeast (Polaczek et al. 1998; Drotschmann et al. 1999; Ellison et al. 2001; Sia et al. 2001; Clodfelter et al. 2005). One exception is the reported failure of msh2-G317D to fully complement a msh2 deletion strain (Drotschmann et al. 1999); however, an independent study agrees with our finding that it is a pseudo-wild-type allele (Ellison et al. 2001). The discrepancy may be attributable to strain background differences.
We examined the positions of the classes of alleles with respect to highly conserved regions of MutS homolog 2 proteins (Figure 1D). The pseudo-wild-type alleles fell primarily in the less conserved amino terminal coding region (Figure 1, B and D), whereas mutations that resulted in altered function were concentrated in highly conserved coding regions (Figure 1, C and D).
Thirty-six substitutions had a significant effect on mismatch repair function. We reasoned that the molecular characterization of these pathogenic variant mismatch repair proteins would contribute to the understanding of the basic mechanism of mismatch repair and help to define important structure–function relationships. The entire set of altered Msh2 proteins, including the pseudo-wild-type variants, was tested in molecular assays detailed below.
One-half of the defective Msh2 variant proteins have decreased steady-state levels:
Changing a single amino acid may have a dramatic effect on protein folding. In addition, misfolded variants are often targeted for degradation (Goldberg 2003). Thus, changing a single amino acid in a protein may have a significant effect on steady-state levels. To test for potential protein instability, we examined the levels of Msh2 and Msh2-variant proteins using standard immunoblotting methods (see Figure 2A for representative alleles). After normalization, we calculated for each variant the steady-state level as a percentage of wild-type Msh2 (Table 2). A variant was considered to have a significant defect if levels were <40% of wild-type Msh2.
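The normalization and the 40% cutoff can be illustrated with a short sketch. The exact normalization formula is not given in the text; the version below assumes that each Msh2-HA band intensity is divided by the Kar2p loading-control intensity from the same lane before comparison with wild type. The function names and any intensities passed to them are hypothetical.

```python
def percent_of_wild_type(variant_band, variant_loading, wt_band, wt_loading):
    """Steady-state level of a variant as a percentage of wild-type Msh2.

    Assumption: each band is first divided by its own loading-control
    signal (for example Kar2p) to correct for unequal loading.
    """
    variant_norm = variant_band / variant_loading
    wt_norm = wt_band / wt_loading
    return 100.0 * variant_norm / wt_norm


def has_stability_defect(percent, cutoff=40.0):
    """Apply the <40% of wild-type criterion stated in the text."""
    return percent < cutoff
```

A variant whose normalized signal is 30% of wild type would be scored as a stability allele, whereas one at exactly 40% would not, because the criterion is a strict inequality.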
Stability alleles defined. (A) Sample immunoblot to detect decreased steady-state levels of Msh2. Protein extracts were fractionated on a 7% separating gel and after blotting, the detection of Msh2 expressing the HA epitope was accomplished as described in the materials and methods. The positive control included the wild-type Msh2-HA protein extract from a msh2Δ strain expressing MSH2 from pMSH2 (WT). The no-epitope control extracts were from a msh2Δ strain harboring the pRS413 vector (Δ). The msh2 alleles were expressed from a plasmid identical to pMSH2 except for the relevant encoded missense mutation. The denoted change is marked above the corresponding lanes of the gel. The nomenclature for the substitution is the same as was described in the Figure 1 legend. After visualization of Msh2∷HAp (Msh2), the membrane was reprobed with rabbit α-Kar2p polyclonal and α-rabbit IgG HRP antibodies (loading). (B) Representative example of overexpression assay. Overexpression was achieved by placing the inducible GAL10 (GAL) promoter upstream of the MSH2 missense alleles; this promoter is repressed during cultivation in glucose (off) but confers elevated expression in the presence of galactose and the absence of glucose (overexpressed). In addition, the GAL-MSH2 fusions were on plasmid DNA replicated via the high-copy 2μ origin of replication. The mismatch repair phenotypes of msh2Δ strains overexpressing the msh2 missense substitutions (example shown is C195R, GAL-msh2-C195R) were compared to those overexpressing the wild-type MSH2 gene (GAL-MSH2) and to those with no overexpression (vector). The mismatch repair efficiencies were determined qualitatively using the 5-fluoroorotic acid monohydrate (FOA) dinucleotide instability plate assays (Henderson and Petes 1992). (C) Stability residues cluster to four regions on the basis of MutS crystal structure modeling.
The leftmost image is of the MutS homodimer structure with the relevant positions in black and enhanced with shading. The five structural domains are highlighted as follows: domain I (blue), domain II (green), domain III (yellow), domain IV (orange), and domain V (red). The four stability clusters are enlarged and labeled 1–4 in the numbered images on the right. The relevant amino acid with the codon number is given for yeast with the corresponding Thermus aquaticus, Taq, MutS residue in parentheses below. Images were generated by manipulating 1ewq.pdb (crystal structure Taq MutS homodimer complexed with a heteroduplex DNA molecule at 2.2 Å resolution) (Obmolova et al. 2000) using the Swiss PDB Viewer (version 3.7) (Guex et al. 1999) and POV Ray Tracer program (version 3.6). (1) The connection between domains I and II is shown with E194 and C195 highlighted. (2) The C345, T347, G350, A618, and D621 stability residues are clustered within a 5-Å radius of one another in a region in the central core of the protein in domain III (yellow). The linking helix connecting the DNA-binding domain (orange) with the ATPase domain (red) is emphasized. (3) The highly conserved stability residues P640, C716, R371, and G711 localize to a region where a surface-exposed helix containing R371 (domain III) makes contact with the ATPase-containing domain (domain V) to stabilize the region. (4) Three stability residues (L521, D524, and R542) are located near the DNA mismatch recognition site (domain IV) that becomes stabilized upon DNA binding and is adjacent to the connecting helix (Obmolova et al. 2000).
Seventeen msh2 missense alleles, representing 14 amino acid positions, resulted in decreased steady-state levels of Msh2 (Table 4). We postulated that many of the Msh2 variants with decreased amounts might simply produce too little protein for efficient mismatch repair. To test this possibility, mutant alleles were overexpressed in an msh2Δ strain (see Figure 2B for an example). We tested 60% of the stability alleles and found that all were able to restore mismatch repair when overexpressed, suggesting that the defects in these mutants arise from decreased protein levels.
Profile of alterations causing decreased steady-state levels of Msh2
Overall, ∼40% of the substitutions involve prolines and glycines, amino acids that often influence protein folding (Andreotti 2003; Krieger et al. 2005) and/or distort α-helical structures (Duclohier 2004). In addition, ∼40% of the substitutions involve replacement of, or substitution with, an arginine residue. Arginine is a large amino acid with a propensity to be surface exposed (Chothia 1975, 1976; Pacios 2001) (Table 4).
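Tallies of this kind follow directly from the substitution nomenclature (e.g., L521P for leucine 521 changed to proline). The sketch below parses such labels and reports the fraction that involve a given set of residues as either the original or the replacement amino acid. The helper names are hypothetical, and the labels in the usage line are simply substitutions named elsewhere in the text; they are not the set of stability alleles in Table 4.

```python
import re

# One-letter wild-type residue, codon number, one-letter variant residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'L521P' into ('L', 521, 'P')."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a missense label: {label!r}")
    wild_type, position, variant = match.groups()
    return wild_type, int(position), variant


def involves(label, residues):
    """True if the original or the replacement residue is in `residues`."""
    wild_type, _, variant = parse_substitution(label)
    return wild_type in residues or variant in residues


def fraction_involving(labels, residues):
    """Fraction of labels that replace, or substitute with, the given residues."""
    return sum(involves(label, residues) for label in labels) / len(labels)


# Illustrative labels taken from the text, not the Table 4 allele set.
example = ["L521P", "C195R", "P640L", "H658Y", "G693D"]
pro_gly_fraction = fraction_involving(example, {"P", "G"})
```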
To better understand how the alterations might alter stability of Msh2, we mapped the positions onto the three-dimensional structure using Thermus aquaticus, Taq MutS as the model (Lamers et al. 2000; Obmolova et al. 2000). MutS has five major domains (I–V shown in Figure 2C) (Obmolova et al. 2000). The stability residues cluster to four discrete regions that stabilize interdomain interactions (Figure 2C). Three are in the central core of the protein and one is near the DNA-binding domain. Collectively, ∼70% of the positions identified in this work as being important for stability were predicted by Obmolova et al. (2000) to be important for either structural integrity or interdomain interactions (Table 4). For two of the stability clusters in the core, amino acids from widely separated but conserved portions of the primary sequence come together in close proximity to stabilize the protein (Table 4 and Figure 2C). In addition, the central core cluster appears to be crucial for positioning and stabilizing an important helix thought to transmit the DNA-binding signal (domain IV) to the ATPase domain (domain V) via the central core (domain III) (Obmolova et al. 2000).
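Proximity between residues in a structure file such as 1ewq.pdb can also be checked numerically. The following sketch reads C-alpha coordinates from PDB-format ATOM records and tests whether two residues lie within a distance cutoff. It is an illustration only: the structural images in this work were produced with the Swiss PDB Viewer, the use of C-alpha atoms rather than closest side-chain atoms is a simplifying assumption, and the records generated by `fake_ca_record` carry made-up coordinates that do not come from any real structure.

```python
import math


def parse_ca_atoms(pdb_lines):
    """Return {(chain, residue_number): (x, y, z)} for C-alpha ATOM records."""
    coords = {}
    for line in pdb_lines:
        # Fixed-column PDB layout: atom name in columns 13-16, chain in 22,
        # residue number in 23-26, x/y/z in 31-38, 39-46, 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def within(coords, residue_a, residue_b, cutoff):
    """True if the two C-alpha atoms are no farther apart than `cutoff` (in Å)."""
    return math.dist(coords[residue_a], coords[residue_b]) <= cutoff


def fake_ca_record(serial, resname, chain, number, x, y, z):
    """Build a synthetic C-alpha ATOM record (hypothetical data for demonstration)."""
    return (f"ATOM  {serial:>5}  CA  {resname} {chain}{number:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical two-residue example, one residue on each chain of a dimer.
demo = parse_ca_atoms([
    fake_ca_record(1, "SER", "A", 1, 0.0, 0.0, 0.0),
    fake_ca_record(2, "HIS", "B", 2, 3.0, 0.0, 0.0),
])
```

With a real file, the lines would come from reading the PDB file, and the residue keys would use the Taq MutS numbering given in parentheses in the figure legends.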
In summary, the deleterious substitutions altering steady-state levels of Msh2 cluster to four distinct regions on the putative structure. These “stability” clusters support the significance of certain key elements discussed previously as contributing to the structural integrity of the MutS protein (Obmolova et al. 2000). In particular, they cluster to regions of the structure in which interdomain interactions are found. The data presented here strongly support the model that the portion of the protein linking the DNA-binding domain with the ATPase domain is crucial for functioning, specifically altering stability and complex formation (Junop et al. 2001).
Clustered substitutions lead to decreased mismatch repair subunit formation:
Mismatch repair is dependent on the formation of higher-order protein complexes. Because amino acid substitutions have the potential to interfere with crucial protein–protein interactions, the altered Msh2 proteins were tested for their ability to interact with other subunits of the mismatch repair complex by employing the yeast two-hybrid assay (James et al. 1996). A positive interaction in the yeast two-hybrid assay results in the expression of a reporter gene, in this case allowing for growth on medium lacking histidine. Using the unmutagenized GBD-Msh2 fusion protein as a positive control and the GBD protein as the negative control, the ability of each fusion variant to interact with members of the mismatch repair machinery was determined. The functional partners tested were Msh3, Msh6, Mlh1, Pms1, Pol30, and Exo1. Representative results from a two-hybrid experiment are shown in Figure 3A.
Mutations altering the mismatch repair complex. (A) Representative two-hybrid assay to detect mismatch repair subunit formation. The MATa yeast two-hybrid reporter strain was transformed with pGBD-C2 (negative control, C2), pGBD-MSH2 (positive control, Msh2), pGBD-msh2-P640L (Msh2P640L), pGBD-msh2-H658Y (Msh2H658Y), pGBD-msh2-G693D (Msh2G693D), pGBD-msh2-G688D (Msh2G688D), pGBD-msh2-T743I (Msh2T743I), pGBD-msh2-G692S (Msh2G692S), or pGBD-msh2-S742L (Msh2S742L). Transformants were mated with (denoted with an ×) individual MATα yeast two-hybrid reporter strains harboring pGAD-MSH6 (Msh6), pGAD-MSH3 (Msh3), pGAD-MLH1 (Mlh1), pGAD-PMS1 (Pms1), pGAD-EXO1 (Exo1), or pGAD-POL30 (Pol30). Diploid cultures were serially diluted and spotted onto agar plates and allowed to grow for 2 days at 30°. Medium lacking leucine and tryptophan (–LEU –TRP) selects for diploids harboring both pGAD and pGBD constructs (Control). Growth on selective medium also lacking histidine (–LEU –TRP –HIS) indicates a two-hybrid interaction (Interaction). (B) Sample immunoblot to detect decreased levels of GBD-Msh2 fusion proteins. The positive control included the wild-type GBD-Msh2-HA protein extract from a strain expressing the fusion from pGBD-MSH2 (Msh2). The no-epitope control extracts were from a strain harboring the pGBD-C2 vector (C2). The msh2 alleles were expressed from a plasmid identical to pGBD-MSH2 except for the relevant encoded missense mutation. The denoted change is marked in superscript. The nomenclature for the substitution is the same as was described in the Figure 1 legend. After visualization of GBD-Msh2∷HAp (GBD-Msh2), the membrane was reprobed with rabbit α-Kar2p polyclonal and α-rabbit IgG HRP antibodies (loading). (C–F) Clustering of amino acid substitution sites resulting exclusively in subunit formation defects. The relevant yeast amino acid with the codon number is given with the corresponding Taq MutS residue in parentheses below. 
The subunit positions are colored in black and enhanced with shading. The DNA-contacting subunit of the homodimer (the Msh6- or Msh3-like subunit of Taq MutS) is shown in paler colors. For example, domain I is dark blue for the Msh2-like subunit and pale blue for the Msh6/3p-like subunit, whereas domain V is red for the Msh2-like subunit and pink for the Msh6/3p-like subunit. Images were generated as described in the previous figure legend. (C) Domain V subunit interface. In domain V (red for the Msh2-like subunit), the residue corresponding to yeast S762 is S669 (black) in Taq MutS and is within 4 Å of H696 (gray) in the other dimer partner (pink). (D) DNA-binding region subunit interface. Domain I is a region where the two dimer partners come into close proximity upon DNA binding. C67 (R76 in Taq MutS, black) on the Msh2-like subunit (dark blue) is within 4 Å of Taq MutS V57 (gray) in the dimer partner (light blue). (E) Surface-exposed residue L457. The image is of the Taq MutS homodimer structure with L457 (Taq MutS L372) surface residue in black. (F) Putative domain V connector region and putative contact region with Mlh1p (MutL). H658 and M707 residues are highlighted in black.
The negative control (GBD alone) failed to produce a positive interaction with any of the partners. Conversely, the positive control (GBD-Msh2) produced an interaction with all of the partners tested (Figure 3A). Msh2 is known to directly contact Msh6 and Msh3 (Habraken et al. 1996; Iaccarino et al. 1996; Palombo et al. 1996); however, the positive two-hybrid interaction with other proteins of the mismatch repair complex may reflect bridging interactions formed in the yeast nucleus where the endogenous proteins localize. Because of potential bridging interactions, the two-hybrid analysis presented here was used primarily as a tool to probe the stability of the higher order complex and, except for the known partners (Msh6, Msh3, and potentially Mlh1/Pms1), not to map specific protein–protein contact residues.
In addition to examining the human and Escherichia coli missense mutations, we conducted a mutagenesis screen to find alterations leading to loss of subunit interactions. The GBD-MSH2 fusion plasmid was mutagenized in vitro with hydroxylamine. Transformants containing the mutagenized plasmids were mated to strains expressing either the GAD-MSH6 or GAD-MSH3 fusions. Three amino acid substitutions (C67Y, G711D, and S762Y) were identified that reproducibly caused a loss of interactions. In addition, the substitutions were created in the MSH2 gene and all were determined to confer a defect in DNA mismatch repair (Table 2). One identified alteration at codon 711 in yeast was at the corresponding position of a human missense alteration (Isidro et al. 2000). The G711 amino acid is defined as a stability/subunit residue because of the dramatic effect of either the yeast (G > D) or human alteration (G > R) on steady-state levels of the protein (discussed above; Table 4).
The two-hybrid analysis of the entire set of missense alterations revealed that 39%, or 21 substitutions, representing 17 amino acid positions, caused a loss of subunit interactions. For all but three substitutions (discussed below) the loss of interactions was found to occur with the entire array of functional partners. The simple explanation for this is that any alteration preventing MutS heterodimer formation (i.e., binding with Msh6 or Msh3) would block subsequent ternary interactions.
The “subunit” variants that resulted in loss of interactions fell into two main categories: the true subunit variants, or those with no significant change in steady-state levels (6 alleles, or 29%; Table 5), and the stability/subunit variants, or those with decreased levels (15 alleles, or 71%). The majority of variant proteins with decreased steady-state levels (Table 2) also failed to interact using the two-hybrid assay; however, the existence of Msh2R542P and Msh2L521P variants that exhibit decreased protein levels, but still show positive interactions, suggests that the two-hybrid system remains sensitive even when protein levels are decreased in the steady-state assays. Sufficient levels are presumably achieved as a consequence of the robust ADH1 promoter and the high-copy 2μ origin of replication. To test this directly, the protein levels of the variant fusions and controls were examined to determine whether the failure to detect an interaction was because the GBD-Msh2 variant proteins were below a critical threshold. Only GBD-Msh2 variant hybrid proteins that failed to interact but were found at levels comparable to the wild-type version of the hybrid protein were designated as true subunit variants (representatives from each class are shown in Figure 3B).
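The proportions quoted above can be checked with simple arithmetic. The following Python sketch is illustrative only: it assumes that the 39% figure is taken relative to the 54 engineered missense alleles, and the constant and function names are hypothetical labels rather than part of the original analysis.

```python
# Counts quoted in the text. The denominator of 54 is an assumption
# (the number of engineered missense alleles).
TOTAL_ALLELES = 54
INTERACTION_LOSS = 21    # substitutions that lost two-hybrid interactions
TRUE_SUBUNIT = 6         # interaction lost, steady-state levels unchanged
STABILITY_SUBUNIT = 15   # interaction lost, steady-state levels decreased


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    # The two subclasses should account for every interaction-loss allele.
    assert TRUE_SUBUNIT + STABILITY_SUBUNIT == INTERACTION_LOSS
    print(percent(INTERACTION_LOSS, TOTAL_ALLELES))
    print(percent(TRUE_SUBUNIT, INTERACTION_LOSS))
    print(percent(STABILITY_SUBUNIT, INTERACTION_LOSS))
```

Under these assumptions the three values agree with the 39%, 29%, and 71% reported in the text.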
Alterations causing decreased mismatch repair subunit formation
These subunit positions were mapped onto the structure of Taq MutS. Two substitutions (S762Y and C67Y) are found on the dimerization interface (Figure 3, C and D). S762 is in domain V near the ATPase region (Figure 3C) and C67 is in domain I near the DNA-binding region (Figure 3D). In each case a larger tyrosine residue replaces the amino acid and presumably destabilizes the dimerization interface. One of the subunit positions (L457) is surface exposed (Figure 3E), consistent with the hypothesis that this amino acid may contact proteins in the higher order mismatch repair complex. In addition, three substitutions at two amino acid positions (yeast M707I, H658Y/R) displayed detectable interactions with a MutS heterodimer partner but not with members of the ternary complex (H658Y example in Figure 3A). The two amino acids cluster near a potential contact region on the solvent-exposed surface (Figure 3F).
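The distinction drawn above between loss of interaction with all partners and loss of ternary interactions alone can be expressed as a small decision rule. The Python sketch below is a simplified illustration, not the scoring procedure used in this study: the grouping of Exo1 and Pol30 with Mlh1 and Pms1 as "ternary" partners, the function name, and the example readout are all assumptions made for the illustration.

```python
from typing import Mapping

# Partner groupings assumed for this sketch.
HETERODIMER_PARTNERS = {"Msh3", "Msh6"}
TERNARY_PARTNERS = {"Mlh1", "Pms1", "Exo1", "Pol30"}
ALL_PARTNERS = HETERODIMER_PARTNERS | TERNARY_PARTNERS


def interaction_profile(growth: Mapping[str, bool]) -> str:
    """Summarize a two-hybrid growth pattern on -LEU -TRP -HIS medium.

    `growth` maps each partner to True if the diploid grew (interaction
    detected) and False if it did not.
    """
    missing = ALL_PARTNERS - set(growth)
    if missing:
        raise ValueError(f"no result recorded for: {sorted(missing)}")

    if all(growth[p] for p in ALL_PARTNERS):
        return "all interactions retained"
    if not any(growth[p] for p in ALL_PARTNERS):
        return "loss with all partners"

    heterodimer_ok = any(growth[p] for p in HETERODIMER_PARTNERS)
    ternary_ok = any(growth[p] for p in TERNARY_PARTNERS)
    if heterodimer_ok and not ternary_ok:
        return "heterodimer retained, ternary interactions lost"
    return "mixed pattern"


if __name__ == "__main__":
    # Hypothetical readout for illustration; not data from Figure 3A.
    example = {"Msh3": False, "Msh6": True, "Mlh1": False,
               "Pms1": False, "Exo1": False, "Pol30": False}
    print(interaction_profile(example))
```

A pattern of the kind described for the M707I and H658Y/R substitutions would fall into the "heterodimer retained, ternary interactions lost" branch, whereas a substitution that prevents heterodimer formation would be expected to give "loss with all partners".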
In summary, the subunit alleles fell into two classes: those that have decreased levels and those that have normal levels of Msh2. The substitutions that altered steady-state levels of the protein and complex formation presumably result in significant structural changes that destabilize the protein and prevent formation of the higher order complex. The true subunit variants are defined as having wild-type levels of protein that fail to interact with the other mismatch repair components. The altered amino acids reside either along the heterodimer interface or in regions that may alter ternary complex formation. Three subunit substitutions at two amino acid positions specifically blocked interactions with Mlh1 and Pms1 without altering heterodimer formation. The phenotype of substitutions in this region lends credibility to the previous hypothesis that the region where domains converge and which is structurally altered upon binding of ADP is also the site of MutL binding (Obmolova et al. 2000; Junop et al. 2001). We extend the model to suggest that the analogous region on Msh2 is important for Mlh1/Pms1 binding.
Substitutions likely to alter ATPase functioning:
The molecular tests described above helped to classify the missense substitutions into functional groups. The final class consists of those substitutions likely to impact the ATPase functioning of the MutS heterodimer. These putative ATPase alleles (Figure 4) expressed wild-type levels of variant proteins that interacted with the mismatch repair partners. Using the crystal structure of Taq MutS complexed with ADP (Junop et al. 2001) as the reference, we mapped the corresponding positions on the three-dimensional structure (Figure 4A). The amino acid residues cluster to a conserved domain known to function in ATPase activity (Junop et al. 2001). A subset of these substitutions change Walker Box residues found in all ATPases (highlighted in black in Figure 4); however, the remaining substitutions are also given the ATPase classification on the basis of their predicted spatial proximity and, in some cases, their verified role in ATPase activity (Studamire et al. 1998; Junop et al. 2001) (highlighted in gray in Figure 4). The ATPase residues line the conserved dimer interface spanning the region between the heterodimers' ATPase active sites (Figure 4A) beginning with the conserved ATPase residues and ending with residue S742 (S637 in Taq MutS) that is within 10 Å of the ADP molecule from the Msh6/3 subunit. In summary, these residues are likely to form part of the composite ATPase site spanning the dimer interface (Junop et al. 2001) on the basis of phenotypic characterizations and the location of the relevant amino acid on the predicted structure.
The composite ATPase positions. (A) Mapping the position onto the composite ATPase domain. The ATPase domains of the Taq MutS homodimer are shown (red for the Msh2-like subunit; pink for the Msh6/3p-like subunit). The relevant yeast amino acid with the codon number is given with the corresponding Taq MutS residue in parentheses below. The ADP molecules are shown in blue. The positions targeted for alteration corresponding to the conserved consensus ATPase active site are highlighted in black. The putative ATPase positions are highlighted in gray. The Msh6/3p-like dimer partner is shown in pink. Images were made as described in the Figure 3 legend, except that the PDB file was 1fw6.pdb, the crystal structure of a Taq MutS–DNA–ADP ternary complex (Junop et al. 2001). (B) Consensus sites for ATPases with the corresponding positions analyzed in this work. Boldface residues indicate the consensus for all ATPases. Underlined residues have a corresponding missense variant analyzed in this article. The DELGRG sequence conservation encompassing the Walker B box is for MutS homologs.
DISCUSSION
Summary:
The data presented in this article characterize an extensive set of missense mutations in the mismatch repair gene, MSH2. The majority of the mutations analyzed were originally identified in patients with HNPCC syndrome. We introduced 54 missense mutations into the cognate positions in yeast MSH2, tested for mismatch repair, and characterized the variant proteins at a molecular level. We were able to classify the missense strains into several functional categories. These classifications are named according to the defining molecular features and include pseudo-wild type, stability, stability/subunit, subunit, and ATPase (Table 2). For all except the pseudo-wild-type variants, a single amino acid change significantly altered Msh2 functioning. The deleterious substitutions clustered in regions of high conservation among MutS homologs. In addition, the defective variants allowed for the mapping of four discrete regions where predicted structural domains interact as critical for Msh2 stability. Finally, regions for protein–protein interactions of the mismatch repair complex were identified and warrant further investigation as to whether they constitute direct contact sites. In summary, the molecular characterization of pathogenic variant mismatch repair proteins allowed for a more complete understanding of the cellular and molecular defects of clinically derived variant Msh2 proteins.
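The functional categories listed above follow from three experimental readouts: mismatch repair function, steady-state protein level, and two-hybrid interaction. The Python sketch below restates that logic as a decision rule. It is a simplification under stated assumptions: intermediate mismatch repair defects are treated as defective, and the ATPase class is treated as the residual category, whereas the classification in this article also relied on the position of the residue in the Taq MutS structure. The function name and boolean inputs are hypothetical.

```python
def classify_variant(mmr_defective: bool,
                     levels_decreased: bool,
                     interactions_lost: bool) -> str:
    """Assign a functional category from three experimental readouts.

    mmr_defective     -- the allele confers a mismatch repair defect
    levels_decreased  -- steady-state Msh2 variant levels are reduced
    interactions_lost -- two-hybrid interactions with partners are lost
    """
    if not mmr_defective:
        return "pseudo-wild type"
    if levels_decreased and interactions_lost:
        return "stability/subunit"
    if levels_decreased:
        return "stability"
    if interactions_lost:
        return "subunit"
    # Assumption: defective variants with normal levels and intact
    # interactions are treated here as putative ATPase variants.
    return "ATPase (putative)"


if __name__ == "__main__":
    # Normal levels but lost interactions: a true subunit variant.
    print(classify_variant(True, False, True))
```

Ordering the tests this way means that a variant with both decreased levels and lost interactions is never labeled as a true subunit variant, which mirrors the requirement that true subunit variants be present at levels comparable to wild type.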
Decreased levels may constitute the major cause of pathogenicity of missense variants:
Our data show that the lack of adequate levels of variant Msh2 protein is the most common reason for failure of DNA mismatch repair among the defective human missense variants. In addition, overexpression restored mismatch repair function in all cases tested. Interestingly, scrutiny of the clinical data shows that when immunohistochemistry on tumors was performed, three variants that we find to have decreased levels (Msh2C195R, Msh2A818V, and Msh2R371S) likewise had no detectable protein in the tumor that encodes the hMsh2 missense variant protein (Leung et al. 1998; Furukawa et al. 2002; Scartozzi et al. 2002). This correlation underscores the legitimacy of using yeast to study human variants.
The decreased stability of the majority of the variants could conceivably help influence therapeutics in the future. Since tumors lacking Msh2 are often resistant to the DNA damage-induced cell death caused by chemotherapeutic regimens (Fedier and Fink 2004), the restoration of Msh2 levels in the tumors would render the tumor susceptible to cell death upon exposure to the drugs. An analogous therapeutic rationale was employed in tumor cells in which hMLH1 expression was significantly repressed because of promoter hypermethylation (Maier et al. 2005). Treatment of cell lines derived from these tumors with a DNA methyltransferase inhibitor resulted in demethylation of the hMLH1 promoter and consequent restoration of mismatch repair activity (Herman et al. 1998).
Intermediate mismatch repair defect and late-onset tumors:
One missense variant, M688I, was isolated in a family fulfilling many criteria for HNPCC but failing to meet the early onset condition (Banno et al. 2003). In this family, afflicted siblings were in their seventies and eighties when they developed tumors with microsatellite instability (HNPCC tumors usually appear before the age of 50). It is of interest that this allele in yeast (M707I) displayed an intermediate mismatch repair phenotype (see Table 1 for rate). Our data for this allele suggest that the decreased rate may account for the late onset of the disease in this family.
The reliability of information in publicly accessible databases:
Ascertaining the difference between a pathogenic and a benign polymorphism is critical for accurate genetic counseling and clinical surveillance in HNPCC families. Of immediate medical relevance are 15 human alleles annotated as pathogenic in publicly accessible databases that conferred no obvious defects in DNA mismatch repair assays. Although it is formally possible that the yeast gene does not serve as an adequate model for human function or that the missense mutations might be deleterious only in certain genetic contexts (Heck et al. 2006), it is most likely that the designation as pathogenic for these missense mutations is inaccurate. First, these pseudo-wild-type substitutions are found in regions throughout the coding sequence but tend to be in regions of low conservation among Msh2 homologs, raising doubt about their impact on the function of the protein. Second, an analysis of the data from the medical literature suggests that most of these pseudo-wild-type alleles were miscategorized. In certain cases, the pathogenic designation was clearly in error; e.g., the human alleles G322D and L390F were proven to be common polymorphisms (Konishi et al. 1996; Liu et al. 1998), yet the pathogenic classification persists in the databases. In addition, human alleles D167H and N127S were found in combination with a severe defect in Mlh1 function, which would account for the cancer predisposition (Moslein et al. 1996; Samowitz et al. 2001; Scartozzi et al. 2002). Several human alleles (Y98C, A714V, Q824E, and A870G) were found in healthy members of suspected HNPCC families (Kim et al. 2001), suggesting that there is no linkage between the missense allele and cancer. For many alleles, including T44M, S323C, S323Y, E562V, and E886G, the authors provide no definitive designation of pathogenicity because of limited or lacking linkage data and control groups (Akiyama et al. 1997; Beck et al. 1997; Kim et al. 2001; Bisgaard et al. 2002). Finally, for Q61P and I145M the evidence for pathogenicity is unpublished. Because of the difficulty of proving a missense allele to be pathogenic, we strongly encourage functional characterization of missense alleles to determine whether they are likely to be the causative factor for cancer in HNPCC families and recommend adherence to a more stringent set of criteria before assigning a pathogenic designation in publicly accessible databases.
Acknowledgments
We are grateful to all of the Princeton Molecular Biology students who have taken part in an ongoing project-based laboratory course that inspired this work. Many thanks to Lorraine Symington, Eric Alani, Thomas Petes, and Rodney Rothstein for generously supplying strains and plasmids and Fred Hughson for helpful comments on the manuscript. This research was supported by Princeton University, the Department of Molecular Biology at Princeton University, Howard Hughes Medical Institute, the New Jersey Commission on Cancer, and National Institutes of Health grant GM037739 awarded to M.D. Rose.
Footnotes
- Received January 17, 2007.
- Accepted July 27, 2007.
- Copyright © 2007 by the Genetics Society of America