Abstract
PCR amplification was previously used to identify a cluster of resistance gene analogues (RGAs) on soybean linkage group J. Resistance to powdery mildew (Rmd-c), Phytophthora stem and root rot (Rps2), and an ineffective nodulation gene (Rj2) map within this cluster. BAC fingerprinting and RGA-specific primers were used to develop a contig of BAC clones spanning this region in cultivar “Williams 82” [rps2, Rmd (adult onset), rj2]. Two cDNAs with homology to the TIR/NBD/LRR family of R-genes have also been mapped to opposite ends of a BAC in the contig Gm_Isb001_091F11 (BAC 91F11). Sequence analyses of BAC 91F11 identified 16 different resistance-like gene (RLG) sequences with homology to the TIR/NBD/LRR family of disease resistance genes. Four of these RLGs represent two potentially novel classes of disease resistance genes: TIR/NBD domains fused inframe to a putative defense-related protein (NtPRp27-like) and TIR domains fused inframe to soybean calmodulin Ca2+-binding domains. RT-PCR analyses using gene-specific primers allowed us to monitor the expression of individual genes in different tissues and developmental stages. Three genes appeared to be constitutively expressed, while three were differentially expressed. Analyses of the R-genes within this BAC suggest that R-gene evolution in soybean is a complex and dynamic process.
In the last several years, many different disease resistance genes (R-genes) have been cloned from a variety of plant species. To date, five different structural classes of R-genes have been identified (Hammond-Kosack and Jones 1997; Dangl and Jones 2001). The largest two classes contain nucleotide-binding domains (NBDs) and leucine-rich repeats (LRRs). The NBD is thought to be involved in signal transduction cascades through phosphorylation/dephosphorylation events with either ATP or GTP (Dangl and Jones 2001). The R-gene NBD domain has been expanded to include homology to eukaryotic cell death effectors (NBD-ARC; Dangl and Jones 2001). The LRR is a conserved domain thought to be involved in ligand binding and pathogen recognition (Hammond-Kosack and Jones 1997). Plant R-genes within this group have structural similarity to animal Nod proteins involved in innate immunity (Dangl and Jones 2001). The two NBD/LRR classes are divided by the presence of either a Toll/Interleukin-1 cytoplasmic receptor (TIR) domain or a coiled-coil (CC) domain at the amino terminus. The TIR is an ancient signaling domain induced by environmental stress or pathogen attack that has been identified in mammals, insects, and plants (O’Neill and Greene 1998). Unlike the other classes of plant disease resistance genes, NBD/LRR genes appear to function exclusively in resistance responses.
Complex clusters of R-genes are common in plant genomes. The Xa21 gene family in rice contains seven homologs within a 230-kb region (Song et al. 1997). The seven members of the I2 family in tomato span 90 kb (Simons et al. 1998). The Arabidopsis RPW8 locus is composed of five homologs within 13 kb (Xiao et al. 2001). In barley, the Mla locus is composed of three distinct CC/NBS/LRR gene families within a 240-kb region (Wei et al. 1999, 2002). Resistance gene analogues (RGAs) have also been used to demonstrate clustering of R-genes in a variety of species including soybean (Kanazin et al. 1996), potato (Leister et al. 1996), lettuce (Shen et al. 1998), and Arabidopsis (Aarts et al. 1998). While clustering of R-genes does occur frequently, complete sequencing of the Arabidopsis genome has revealed 46 single-gene loci and ∼40 gene clusters of two or more (Dangl and Jones 2001).
Determining how sequence differences between paralogs result in altered specificities has been essential in examining the evolution of R-genes. By examining three haplotypes of the Cf-2/Cf-5 family in tomato, Dixon et al. (1998) were able to demonstrate that variation in LRR copy number and recombination play a role in generating diversity. In addition, the solvent-exposed regions of the LRR in many R-genes are hypervariable and correlate with novel specificities (Anderson et al. 1997; Parniske et al. 1997; Botella et al. 1998; Dixon et al. 1998; Meyers et al. 1998; Warren et al. 1998; Ellis et al. 1999). Grant et al. (1995) hypothesized that different regions of the LRR were responsible for recognizing unrelated strains of Pseudomonas syringae. In the case of the rice blast resistance gene Pi-ta, a single amino acid change within the LRR resulted in susceptibility to rice blast (Bryan et al. 2000). In the P locus of flax, six amino acid changes within the β-strand/β-turn motif of the LRR were responsible for the different specificities of P and P2 (Dodds et al. 2001). The TIR has also been implicated in determining pathogen specificity. Sequencing of 13 alleles in the L locus of flax has revealed that 2 alleles, differing only in the TIR, have different specificities to isolates of the flax rust pan (Ellis et al. 1999).
Domain-swapping experiments have also been used to identify regions in R-genes required for specificity. Luck et al. (2000) used domain swapping at the L locus of flax to demonstrate that changes in the TIR and portions of the NBD can alter disease resistance specificities. In addition, combining a functional TIR from one gene with a functional LRR from another could lead to nonfunctional R-genes, suggesting that the TIR and LRR must be able to act in concert with one another. van der Hoorn et al. (2001) used domain swapping to identify regions responsible for pathogen specificity in the tomato R-genes Cf-4 and Cf-9. Specificity determinants for Cf-4 were swapped into Cf-9 to demonstrate that it was now able to recognize AVR4. Hwang et al. (2000) used domain swapping to examine differences between the homologs of the Mi gene in tomato. Mi confers resistance to the nematode Meloidogyne incognita while Mi-1.1 does not. Reciprocal LRR swapping and expression in a hairy root transformation assay associated two phenotypes with the swapped genes. The functional Mi gene with nonfunctional LRRs was unable to confer resistance. However, the nonfunctional Mi-1 with Mi LRRs exhibited a lethal phenotype consistent with constitutive defense gene expression. These results suggest that novel combinations of R-genes will have interesting and potentially useful pathogen specificities.
One area of disease resistance research that has remained relatively unexplored is the expression of disease resistance genes. Using the BLASTX algorithm, Meyers et al. (1999) identified 95 NBD sequences out of 750,000 expressed sequence tags (ESTs) from wheat, maize, rice, and soybean available in the DuPont database. Analysis of 183,000 ESTs from the soybean EST database found 41 sequences with homology to the TIR. This means that ∼1 in 5000 ESTs corresponds to a TIR-containing R-gene. The complete coding sequence of soybean cDNA clone LM6, a TIR/NBD/LRR R-gene homolog (Graham et al. 2000, 2002), was used to screen the public soybean EST database using the BLASTN and TBLASTX algorithms. Of the 208,198 ESTs, 103 showed homology to LM6, or ∼1 in 2000.
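The EST frequencies quoted above are simple ratios of database size to number of hits. The short sketch below reproduces that arithmetic from the counts given in the text; the function name `one_in` is a hypothetical helper introduced only for illustration, and the printed values are approximations in the same sense as the rounded figures in the paragraph.

```python
def one_in(total_ests: int, hits: int) -> float:
    """Return N such that roughly 1 in N ESTs matches the query."""
    if hits <= 0:
        raise ValueError("hits must be a positive count")
    return total_ests / hits


# Counts taken from the text above.
tir_rate = one_in(183_000, 41)    # ESTs with homology to the TIR
lm6_rate = one_in(208_198, 103)   # ESTs with homology to cDNA LM6

print(f"TIR homologs: about 1 in {tir_rate:.0f} ESTs")
print(f"LM6 homologs: about 1 in {lm6_rate:.0f} ESTs")
```

Both ratios fall in the low thousands, which is consistent with the rounded figures of roughly 1 in 5000 and 1 in 2000 given in the text.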
Given these low transcript levels, expression of few genes has been examined in detail. Wang et al. (1999) used RNA gel blot analyses to demonstrate that the Pib gene for rice blast resistance was induced by different light and temperature conditions. Further analyses of the four members of the Pib gene family demonstrated that they were induced by environmental conditions favoring infection and by chemical signals that trigger secondary defense responses (Wang et al. 2001). Cooley et al. (2000) examined the expression of the HRT gene in Arabidopsis by reverse transcriptase (RT)-PCR and were unable to detect induction of the gene with pathogen inoculation. Yoshimura et al. (1998) used RT-PCR to demonstrate the induction of the bacterial blight resistance gene Xa1 upon pathogen inoculation. Using the I-2 promoter fused to the β-glucuronidase reporter gene, Mes et al. (2000) were able to detect expression of the reporter gene in fruits, leaves, stems, and roots.
In soybean, a cluster of RGAs had been mapped to a region on soybean linkage group J (Kanazin et al. 1996). The RGAs were located within a cluster of previously mapped genes: powdery mildew resistance (Rmd-c), Phytophthora stem and root rot (Rps2), and the ineffective nodulation gene Rj2. Using a combination of RGA-specific primers and bacterial artificial chromosome (BAC) fingerprinting, a contig of BACs was developed for this region in soybean cultivar “Williams 82” [rps2, Rmd (adult onset), rj2; Marek and Shoemaker 1997]. Two TIR/NBD/LRR cDNAs that mapped on opposite ends of a core BAC in the contig Gm_Isb001_091_F11 (BAC 91F11) were identified (Graham et al. 2000).
We report here the complete sequence of soybean BAC 91F11 from cultivar Williams 82. We have identified 16 different R-gene sequences within this BAC, including four genes from two potentially novel classes of disease resistance genes. The first class is composed of two genes with homology to the TIR and NBD domains fused inframe to an NtPRp27-like gene that is a putative defense-related protein believed to be involved in downstream defense responses. The second class is composed of two genes with complete TIR signatures fused inframe to soybean calmodulin Ca2+-binding domains. RT-PCR revealed that three of the genes within this BAC were constitutively expressed, while three appeared to be differentially expressed. Sequence analysis of regions outside of the R-genes has also revealed important information about the origins of these genes and the mechanisms governing their evolution.
MATERIALS AND METHODS
BAC 91F11 sequencing: BAC 91F11 was identified from the Iowa State University Williams 82 soybean BAC library. BAC 91F11 subclones were generated using three different methodologies from BAC DNA prepared using QIAGEN (Valencia, CA) tip-100s. Subclones were prepared from low-melt agarose-purified BAC insert DNA digested with Sau3AI and ligated into the BamHI site of HK-phosphatase (Epicentre Technologies, Madison, WI) treated pGEM3 Z+ (Promega, Madison, WI). Because many of the initial subclones were chimeric, subclones were also generated from Tsp509I partially digested BAC DNA treated with HK-phosphatase, size-selected on a low-melt agarose gel, and ligated into the EcoRI site of pBSKII+ (Stratagene, La Jolla, CA). While this technique did not produce detectable chimeric subclones, many gaps remained in the BAC sequence after analysis of a number of clones statistically determined to provide full sequence coverage. All but two of these remaining sequence gaps were filled using subclones generated from nebulized, phosphatase-treated and end-polished BAC DNA (Invitrogen, La Jolla, CA) cloned into the EcoRV site of pBSKII+. Subclones >1500 bases were sequenced by the Iowa State University DNA Synthesis and Sequencing Facility. Subclone sequences were assembled using Sequencher software (Gene Codes Corporation, Ann Arbor, MI). Once a core of contig sequences was assembled, only additional informative subclones were completely sequenced. The two final gaps were closed by PCR using primers designed from adjacent subclone sequences. The sequence of BAC 91F11 has been given GenBank accession no. AF541963. Sequence similarities were determined using the BLASTX algorithm (Altschul et al. 1997) against the GenBank nonredundant database. Sequence similarities to ESTs were determined using BLASTN (Altschul et al. 1997) to search the GenBank EST database. R-gene sequences were aligned using Megalign software (DNASTAR, Madison, WI).
Sequence analysis of BAC 91F11 resistance genes: The location of introns was predicted on the basis of sequences of cDNAs LM6 and MG13 (Graham et al. 2000), by matching the sequences to corresponding ESTs if available, and also by using the NetPlantGene intron prediction program (Hebsgaard et al. 1996). The sequences of R-genes 6, 9, 13, and 14 and fragments A and B were examined in further detail with the Diogenes (http://web.ahc.umn.edu/diogenes) and PANAL (Silverstein et al. 2000; http://mgd.ahc.umn.edu/panal) sequence analysis packages. Sequence analyses of the 91F11 R-genes were performed using Genetics Computer Group software (GCG, Madison, WI). Exon sequences were combined using the Assemble program. Gene sequences were kept whole and later divided into functional domains for analysis. These domains included the TIR, NBD, and LRR domains. In addition, the intervening region (IR) between the NBD and LRR was analyzed. Sequence alignments were generated using the Pileup program and sequence distances were determined using the Diverge program. Two-by-two contingency tables were used to determine the significance of amino acid substitution rates. Phylogenetic trees were constructed using PAUPsearch and PAUPdisplay with the default settings. One hundred bootstrap analyses were performed within PAUPsearch to provide confidence levels for phylogenetic trees. Restriction fragment sites were identified using Map.
Development of resistance gene-specific primers: As genomic sequences were obtained from BAC 91F11, they were arranged into contigs made up of overlapping sequences. These sequences were examined for disease resistance motifs using the BLASTX algorithm. Sixteen different genes or gene fragments were identified, ranging in nucleotide identity from 71 to 99% (see results). Alignment of the genes using Lasergene software (DNASTAR) was used to identify regions from which gene-specific primers could be designed for use in RT-PCR. Oligo 6.0 (Molecular Biology Insights, Cascade, CO) was used for designing primers for 12 of the genes (Table 1). Gene-specific primers could not be designed for R-genes 6 and 9 and R-gene fragments A and B due to the small size and high nucleotide identity of the R-gene portions.
Controls for testing resistance gene primer specificity: Each primer pair was tested by PCR against subclones representing all other predicted resistance genes in the BAC. PCR was performed using a PTC 200 DNA Engine thermocycler from MJ Research (Watertown, MA). PCR reactions were 20 μl in volume and contained 1× BRL PCR buffer, 2.0 mM MgCl2, 200 μM each dNTP, 0.2 μM each primer, 1 μl template DNA, and 0.5 units Taq Enzyme (Invitrogen, Carlsbad, CA). PCR cycling conditions were 94° for 2 min, 35 cycles of 94° for 1 min, anneal for 30 sec, 72° for 1 min, followed by 72° for 2 min. Annealing temperatures were altered until a specific primer pair would amplify only the subclone corresponding to the specific gene. To confirm that no other products were amplified, a Southern blot of PCR products derived from each primer pair against all subclones was made and probed with the PCR product of the specific primer pair. If the primers were specific, only the subclone corresponding to the gene-specific primer pair would show a hybridization signal. A second Southern blot was used to demonstrate that the gene-specific PCR product could hybridize to all the subclones. Together, the two Southern blots demonstrated that the gene-specific primers were specific to an individual R-gene within the BAC. To test whether the primers were specific relative to the rest of the R-genes in the genome, the gene-specific primers were tested against the Williams 82 BAC library. The primers were considered specific if they amplified only BACs that overlapped with BAC 91F11. Once the gene-specific primers were selected, they were retested against the subclones representing all of the genes using the reagents for RT-PCR.
mRNA isolation and reverse transcriptase polymerase chain reaction: Six mRNA samples were isolated from a range of organs and stages of development in the soybean cultivar Williams 82. Plants from which samples were collected at 9 days after planting (DAP) and 14 DAP were grown in a growth chamber. The samples of fully expanded leaves were taken from greenhouse-grown plants 150 DAP. For other leaf samples, all the leaves were harvested from the plants. Flower and pod samples were taken from field-grown plants at 62 and 80 DAP, respectively. For each sample, tissue was taken from at least six plants, combined, and ground in liquid nitrogen. Two independent mRNA isolations were performed on the ground tissue. mRNA was isolated from 2 g of selected tissues using the Micro-FastTrack 2.0 kit (Invitrogen). Approximately 0.1 μg of mRNA was used for first-strand cDNA synthesis using the Advantage RT-for-PCR kit (CLONTECH, Palo Alto, CA) following the manufacturer’s recommended conditions. An oligo(dT)18 primer was used for the first-strand synthesis. cDNAs were amplified using the Advantage cDNA polymerase mix (CLONTECH). Amplification reactions had a final concentration of 1× cDNA PCR reaction buffer, 0.2 mM dNTPs, and 0.5× cDNA polymerase mix in a total volume of 20 μl. Amplification conditions were those determined for the gene-specific primers above. RT-PCR products were run out on a 1% agarose, 1% TAE, ethidium bromide gel.
Controls for the RT-PCR reactions included a “minus” reverse transcriptase RT-PCR reaction to test each mRNA sample for genomic DNA contamination. cDNA synthesis and RT-PCR conditions were as described for tissue samples except no reverse transcriptase enzyme was added to the cDNA synthesis reaction. In addition, an RT-PCR reaction using water as the template for cDNA synthesis was included to check for reagent contamination. Each set of synthesized cDNAs, including the controls, was amplified using primers designed from a soybean tubulin EST taken from the Public Soybean EST Project as a positive control. The sequences of the primers were Tub56 U, 5′ CAA TTG GAG CGC ATC AAT G 3′ and Tub56 L, 5′ ATA CAC TCA TCA GCA TTC TC 3′. All RT-PCR reactions were repeated to verify results.
Table 1. Sequence of BAC-91F11 resistance gene-specific primers used for RT-PCR
It was impossible to design gene-specific primers at the same location in each of the genes due to the high nucleotide identity shared between genes. Differences in the primers, amplification lengths, and primer annealing sites can significantly affect first-strand cDNA synthesis and PCR efficiency. Therefore, we made no attempt to quantify expression of the genes. Our assay is a yes/no assay to determine if the gene products could be detected in a particular tissue.
Analysis of R-genes with NtPRp27-like sequences: To determine if genes homologous to genes 13 and 14 existed elsewhere in the genome, primers were designed to span the resistance gene and NtPRp27-like protein fusion point. Primer NBD13/14 (5′ GGC CTT CCA CGG GCT TT 3′) was designed from within the GLPLA domain of the NBD. Primer NtPRp27-R13/14 (5′ TGC AAT ACC TCC ART TAA TC 3′) was designed from within a conserved domain in the NtPRp27-like sequence. The primers were used to screen the Williams 82 BAC library using PCR conditions described previously for the R-gene-specific primers and an annealing temperature of 52°.
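Primer NtPRp27-R13/14 contains the degenerate base R, which in standard IUPAC nucleotide notation stands for A or G, so the oligonucleotide is a mixture of two sequences. The sketch below expands a degenerate primer into its component sequences. The function name `expand_degenerate` is a hypothetical helper introduced for illustration, not part of any software used in this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every unambiguous sequence encoded by a degenerate primer."""
    seq = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in seq]  # KeyError on non-IUPAC symbols
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as printed in the text (spaces are codon-style grouping).
nbd_primer = expand_degenerate("GGC CTT CCA CGG GCT TT")
ntprp27_primer = expand_degenerate("TGC AAT ACC TCC ART TAA TC")

for variant in ntprp27_primer:
    print(variant)
```

The NBD13/14 primer has no ambiguity codes and expands to a single 17-mer, whereas the NtPRp27-R13/14 primer expands to two 20-mers that differ only at the R position.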
RESULTS
Sequence analysis of BAC 91F11: The subclones from BAC 91F11 were assembled into three large contigs of 12,608, 42,051, and 61,392 nucleotides. Average sequence redundancy was 4.8-fold, excluding subclones for which only end sequence was obtained. BAC-end sequences from other BACs in the Williams 82 linkage group J contig (Graham et al. 2000) were used to orient the subclone contigs relative to each other within the BAC. The two final gaps of 38 and 683 bases were closed by PCR amplification giving a final length of 118,773 bases for BAC 91F11 (Figure 1).
Using the BLASTX and TBLASTX algorithms (Altschul et al. 1997), we determined similarities for sequences within the BAC by screening the GenBank nonredundant database in 1000- and 2000-bp intervals (March, 2002; Figure 1). Only sequences with an expected value <10^-20 were accepted. Sequence homologies included 16 resistance gene homologs, a leucine zipper protein, a hypothetical protein from Arabidopsis, four sequences with highest homology to the Ca2+-binding domains of a soybean calmodulin gene, three sequences with homology to a NtPRp27-like protein, and four regions with homology to retroelements (Figure 1).
Analysis of BAC 91F11 retroelements: The retroelements from BAC 91F11 fall into three groups (Figure 1). Two of the retroelements are Gypsy/Ty3-related retroelements while the other two sequences show similarity to a Ta11-like non-LTR retroelement and an L1-like non-LTR retroelement. Further analyses were made of the two Gypsy/Ty3 retroelements. The element located between R-genes 13 and 14 has long terminal repeats of 390 bases. Three base differences were observed between the LTRs. The 4401-bp open reading frame is disrupted by three stop codons. Insertion of the retroelement resulted in duplication of the target sequence GAAAG. The second Gypsy/Ty3 element, next to R-gene 4, has identical LTRs, 370 bp in length. The open reading frame is 4524 bp in length and appears to be intact. Insertion of this retroelement resulted in a target site duplication of 5 bases, TGGGG. The LTRs of the two retroelements show no significant nucleotide identity with each other. Within the open reading frame the retroelements share ∼50% nucleotide identity.
Figure 1. Sequence assembly of BAC 91F11. The ruler provides an estimated distance in nucleotides. The contig assembly was confirmed by BAC-end sequences from Gm_ISb001_036_O14, Gm_ISb001_068_K07, Gm_ISb001_050_L04, Gm_ISb001_101_H11, Gm_ISb001_042_M07, Gm_ISb001_049_B23, Gm_ISb001_025_B01, Gm_ISb001_010_C02, and Gm_ISb001_068_J10 in Williams 82 linkage group J contig (Graham et al. 2000). The location of the BAC-end sequence is shown on the ruler. BLASTX (Altschul et al. 1997) was used to compare sequences against the GenBank nonredundant database (March, 2002). The numbers and letters above the R-gene sequences are reflected in the text and in the primer names. R-genes 2 and 8 correspond to previously described cDNAs MG13 and LM6 (Graham et al. 2000). Arrowheads below the genes indicate the orientation of the predicted open reading frame. BLASTN (Altschul et al. 1997) was used to identify EST sequences with ≥99% nucleotide similarity to BAC 91F11 sequences (March, 2002). GenBank accession numbers of the ESTs and their positions are given under the ruler in italics.
Structure of BAC 91F11 resistance genes: Using the GCG software package, we analyzed the structures of the R-genes located within BAC 91F11 (Figures 1 and 2). All 16 R-genes are oriented in the same direction and have an average nucleotide identity of 86% within the predicted exons. The structures of the genes are shown in Figure 2. Genes 1, 3, 4, 5, 7, 8, 10, 11, and 12 appear to be full-length genes with similarity to the TIR/NBD/LRR family of disease resistance genes. Each of these genes contains 10 LRRs except for gene 12, which is missing the 3 terminal LRRs and 544 bases (relative to the consensus) of the 3′ untranslated region. All of the genes, except genes 7 and 11, contain complete open reading frames. Gene 7 contains three frameshifts resulting in stop codons. Gene 11 has a single frameshift resulting in a stop codon. The locations of these frameshifts are shown in Figure 2. The intron positions within the genes are conserved although the sizes of the introns vary.
In addition to the full-length genes, we identified seven truncated genes, four of which may belong to two classes of novel plant defense genes. R-gene fragments A and B, which extend 187 and 243 bases past the start site, respectively, encode the amino terminus of the TIR. R-gene 2 encodes a complete TIR domain, a truncated NBD domain, and a single LRR (cDNA MG13, Graham et al. 2000). R-genes 6 and 9 encode complete TIR domains with an average nucleotide identity of 83% with the other R-genes in the cluster. Beyond the TIR, R-genes 6 and 9 show no further similarity to R-genes. Two sequences with similarity to the Ca2+-binding domains of soybean calmodulin gene ScaM-4 (GenBank accession no. L01433) lie downstream of the TIR domains of genes 6 and 9 (Figures 1 and 2). Ca1 is ∼81 bases long and overlaps with the third Ca2+-binding domain of ScaM-4. Ca2 is 133 bases long and shows homology with the fourth Ca2+-binding domain of ScaM-4. Using the BLASTN algorithm we were able to identify seven ESTs in GenBank dbEST corresponding to R-genes 6 and 9 (BG154262, BG652535, BI425148, BI787128, AW164239, BI971986, and BI892930). Four of the ESTs revealed that the calmodulin fragments were included in the transcripts of R-genes 6 and 9. The DIOGENES open reading frame prediction program supported this conformation. In both genes, a 111-bp intron separates the TIR from the Ca1 domain. The Ca1 domain is separated from Ca2 by a 945-bp intron in R-gene 6 and a 965-bp intron in R-gene 9 (Figure 2). Along their entire length, R-genes 6 and 9 share 99% nucleotide identity. Together, the TIR, Ca1, and Ca2 domains encode a protein 264 amino acids in length.
Figure 2. Structure of BAC 91F11 R-gene sequences. The name of each R-gene appears to the left. Genes are grouped by their structural similarities. Introns (gray) are not drawn to scale. Intron sizes are shown above their location. Gene 7 has three frameshifts, resulting in stop codons, indicated by black dots. Gene 11 has a single frameshift resulting in a stop codon.
Using the BLASTX, BLASTN, and TBLASTX algorithms, we were unable to identify genes from other plant species homologous to R-genes 6 and 9 in the GenBank nonredundant, EST, or GSS databases. Analysis of the ESTs corresponding to genes 6 and 9 revealed that they were expressed in a variety of soybean cultivars (Williams, Williams 82, Bragg, Harosoy, progeny from a recombinant inbred line from Minsoy × Noir, and Corolla, a meristematic mutant) and tissues (germinating shoots, floral meristems, etiolated hypocotyls, hypocotyls infected with P. sojae, and roots following mock infection or flooding treatments). It is also interesting to note that ScaM-4 is one of two calmodulin isoforms induced by fungal elicitors or pathogen infection (Heo et al. 1999). TIR domains fused inframe to Ca2+-binding domains may represent a novel class of disease resistance proteins.
The last two truncated R-genes, genes 13 and 14, may represent a second novel class of resistance genes. These two genes contain TIR and NBD domains (Figure 2); however, they are missing motif 5 of the NBD-ARC domain (van der Biezen and Jones 1998). Beyond the 3′ end of the NBD (from bp position 1477), the two genes show no similarity to resistance gene sequences. However, analysis with the DIOGENES software revealed that the open reading frames extended 643 bases past the NBD (quadratic discriminant score of 77.5). Using the BLASTX algorithm, we found that the extended open reading frames of genes 13 and 14 had 66% amino acid identity to the predicted amino acid sequence of an mRNA for a protein of unknown function that, on the basis of expression analyses, is thought to be defense related (NtPRp27; Okushima et al. 2000). Although R-genes 13 and 14 share homology to NtPRp27, they appear to lack a secretion signal.
Genes 13 and 14 share 91.4% nucleotide identity from the start of the R-gene homology through the end of the NtPRp27-like sequence homology. Detailed analyses of the two genes show that the fusion occurred at the same breakpoint, suggesting that one of the genes is a duplicate of the other. In addition, sequences 1050 bp past the 3′ end of NtPRp27-like protein have a nucleotide identity of 91.0%. At the end of this region, gene 13 has a 243-bp repeat of the 5′ portion of the NtPRp27-like protein (Figure 1).
We searched the GenBank nonredundant database, dbEST, and dbGSS for other R-gene sequences fused to NtPRp27-like sequences. In addition, the databases were searched to identify all NtPRp27 homologs. These homologs were then screened for any R-gene motifs. In both cases, we were unable to identify R-genes fused to NtPRp27-like genes in any other species. Comparison of the amino acid sequences of NtPRp27 homologs from tobacco, wheat, barley, and Arabidopsis with the homologous region of genes 13 and 14 shows strong amino acid conservation (Figure 3A). Phylogenetic analyses of the NtPRp27-like portions of R-genes 13 and 14 showed greatest similarity to NtPRp27 and two putative genes from Arabidopsis (GenBank accession nos. GB AAD25570.1 and GB AAD25577.1; Figure 3B).
To determine if genes homologous to genes 13 and 14 existed elsewhere in the soybean genome, primers spanning the R-gene/defense-related protein fusion were used to screen the Williams 82 BAC library (Figure 3A). The primers identified only those BACs overlapping with BAC 91F11. The primers also detected similar bands in the following soybean lines: BSR 101, PI 1487-654, Wm79, Wm1, L91-8765, Altoona, Clark, A81-356033 (Glycine max), and PI 468916 (G. soja; data not shown).
Figure 3. Comparison of the NtPRp27-like protein portion of R-genes 13 and 14 with homologous sequences from other species. (A) Amino acid alignment of the NtPRp27 homologs. Sequences were aligned using the GCG program PileUp. The sequences for the NtPRp27-like portion of genes 13 and 14 begin immediately where R-gene homology ends. The sequences have the following GenBank accession nos.: pBH6-17 (GB T06205), HP (hypothetical protein Triticum aestivum; GB ADD46133.1), WCI-5 (GB T06278), HP1 (hypothetical protein Arabidopsis thaliana; GB AAD25570.1), HP2 (hypothetical protein A. thaliana; GB AAD25577.1), and NtPRp27 (GB BAA81904.1). Conserved residues have a black background. Similar residues have a gray background. (B) Phylogenetic relationships of NtPRp27-like proteins from a variety of plant species. The phylogenetic tree was built using the default options in the GCG programs PAUPsearch and PAUPdisplay.
Evolutionary analysis of BAC 91F11 R-genes: The GCG program Diverge was used to examine sequence differences among the BAC 91F11 R-genes. The R-genes were divided into structural domains including the TIR, NBD, LRR, and also the IR between the NBD and LRR. By examining the ratio of nonsynonymous substitutions (Ka) to synonymous substitutions (Ks), we examined the types of evolutionary forces acting upon the genes. Within the TIR, NBD, IR, and LRR regions, average Ka/Ks values for all pairwise comparisons were 0.830, 0.548, 0.785, and 1.104, respectively. The use of two-by-two contingency tables identified gene comparisons in which the Ka/Ks ratio was significantly different from neutral selection (Ka/Ks = 1; Table 2). Within the IR and the NBD region many of the pairwise comparisons were significantly less than one, suggesting conservative selective pressure. Pairwise comparisons including R-genes 7 and 11 often resulted in Ka/Ks ratios significantly greater than one. These results, along with the frameshifts and stop codons, support their roles as pseudogenes.
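To illustrate how a two-by-two contingency table can indicate whether a Ka/Ks ratio departs from neutrality, the sketch below applies a Pearson chi-square test (1 d.f.) to counts of changed and unchanged nonsynonymous and synonymous sites. This is an assumption-laden illustration rather than the procedure implemented in the GCG Diverge program: the exact layout of the tables used here is not specified in the text, the ratio is computed from raw proportions without any multiple-hit correction, and the function name and example counts are hypothetical.

```python
import math


def ka_ks_contingency(nonsyn_changes: int, nonsyn_sites: int,
                      syn_changes: int, syn_sites: int):
    """Return (ratio, chi2, p) for a 2x2 table of changed vs. unchanged sites.

    Rows are nonsynonymous and synonymous sites; columns are changed and
    unchanged. The ratio uses uncorrected proportions of changed sites.
    """
    a, b = nonsyn_changes, nonsyn_sites - nonsyn_changes
    c, d = syn_changes, syn_sites - syn_changes
    if min(a, b, c, d) < 0 or syn_changes == 0:
        raise ValueError("counts must be non-negative and syn_changes > 0")
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    ratio = (a / nonsyn_sites) / (c / syn_sites)
    return ratio, chi2, p


# Hypothetical counts for one pairwise comparison (not data from this study).
ratio, chi2, p = ka_ks_contingency(30, 600, 40, 200)
print(f"ratio = {ratio:.2f}, chi2 = {chi2:.2f}, p = {p:.2e}")
```

Under these assumptions, a ratio well below one with a small p value would be read as conservative selection, and a ratio well above one with a small p value as diversifying selection, mirroring the interpretation applied to the NBD, IR, and pseudogene comparisons above.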
Table 2. Comparisons of nonsynonymous and synonymous amino acid substitution rates (Ka/Ks) among the BAC 91F11 R-genes
Sequence analyses of other R-gene clusters have suggested that the solvent-exposed amino acids of the β-strand/β-turn motif of the LRR are involved in determining pathogen specificity. The 10 LRRs of genes 1, 3, 4, 5, 7, 8, 10, and 11 and the 7 LRRs of gene 12 were analyzed by separating the β-strand/β-turn motif from the remainder of the LRR. The Ka/Ks average of all pairwise comparisons in the β-strand/β-turn was 1.93 while the remainder of the LRR had a Ka/Ks value of 0.724. Within the β-strand/β-turn domain, many of the pairwise comparisons had Ka/Ks ratios significantly greater than one, suggesting divergent evolution (Table 2). Amino acid alignment of the LRRs demonstrates the hypervariability of the β-strand/β-turn motif specifically in the fifth, sixth, and seventh LRRs (Figure 4).
The alignment of the 16 R-genes was used to construct phylogenetic trees of the IR and the TIR, NBD, and LRR regions within R-genes (Figure 5). For genes 6, 9, 13, and 14, only the R-gene portions of the genes were included. Phylogenetic analyses were performed using the heuristic tree search and parsimony options in PAUPsearch and PAUPdisplay. Genes in close proximity on the BAC do not appear to cluster within or between phylogenetic trees. The only exceptions are genes 13 and 14, which are grouped together. Additionally, bootstrap analysis rarely supports branches within phylogenetic trees. This suggests that domain shuffling may have occurred between the genes.
Figure 4. Amino acid alignment of eight highly conserved LRRs of BAC 91F11 R-genes. The number following the gene name denotes the LRR number. Alignment of the LRR region was performed using the GCG program PileUp. Conserved residues have a black background. Most of the 91F11 LRRs have the conserved core consensus sequence LxxLxxLxxxx CxxL, where x can be any amino acid. The β-strand/β-turn domain of the LRR is boxed to demonstrate the hypervariability of the amino acids in this region. Tildes in the sequence represent gaps introduced to maximize the alignment.
Assessing significant phylogenetic relationships among R-genes in this cluster has been difficult due to the high nucleotide homology shared among genes, the number of R-genes present, and the duplication of the genes themselves. To further understand the relationships among genes, we examined the sequences 2000 bp upstream and downstream from the start and stop codons of all 16 R-genes (Figures 6 and 7). This revealed a significant amount of information on the origin of the genes within the cluster. Subsets of restriction sites were used to demonstrate regions of sequence similarity. Immediately evident in our analyses were two large duplications involving R-genes on BAC 91F11 (Figure 6). The first duplication is 5419 bases in length and results in the duplication of R-genes 6 and 9 (Figure 6A). The second duplication involves the regions surrounding R-genes 13 and 14 (Figure 6B). This 5531-base duplication includes 1942 bases upstream of the start site, the R-genes themselves, and 1001 bases downstream of the stop codon.
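Extraction of the flanking sequences can be sketched as follows. The sketch assumes 0-based, half-open gene coordinates on the BAC sequence and simply truncates a flank that runs past either end of the BAC, as happens for genes located at the BAC ends. The function names and the coordinate convention are assumptions and not part of the original analysis.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_regions(bac_seq, start, end, strand="+", flank=2000):
    """Return (upstream, downstream) flanks of the gene at bac_seq[start:end].

    Coordinates are 0-based and half-open. Flanks are truncated at the
    ends of the BAC rather than padded.
    """
    left = bac_seq[max(0, start - flank):start]
    right = bac_seq[end:end + flank]
    if strand == "+":
        return left, right
    # On the minus strand, upstream lies to the right in BAC coordinates.
    return reverse_complement(right), reverse_complement(left)
```

Because a truncated flank is returned shorter than requested, its length can be checked to flag genes whose neighboring sequence is not fully contained in the BAC.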
At their 5′ end, all of the R-genes share 75-82 bases of nucleotide similarity immediately preceding the start site. Upstream of this point, however, the genes fall into three distinguishable classes (Figure 7A). Fragment A and R-genes 1, 3, 4, 7, 12, and 13 share no additional DNA similarity with the other R-genes. R-genes 8, 10, 11, and 14 make up a second group. R-genes 5, 6, and 9 are related to this group for the first 1000 bases prior to the start site, but then fall into a separate group. Comparing the groups with their physical position on the BAC reveals no clear pattern.
Analysis of the 2000 bp 3′ to the stop codon also revealed three distinct groups (Figure 7B). R-genes 6 and 9 are similar to each other, but share no similarity to the rest of the R-genes. R-genes 13 and 14 are also similar to each other but show no similarity with the rest of the genes. This is not surprising given their unique fused structure. The remaining R-genes fall into one major group that can also be subdivided. Again, these sequence-based groupings do not correspond to physical positions within the BAC.
In many cases, the alignments reveal evidence of recombination. Looking at the 5′ upstream sequence, R-gene 7 most resembles R-gene 12. However, looking at the 3′ downstream sequence, R-genes 7 and 12 fall within different groups. The 5′ sequences of R-genes 10 and 11 are most similar; yet the 3′ sequence reveals they belong in different groups as well. Perhaps the most striking example is the organization of R-genes 13 and 14. Gene 14 appears to be composed of the 5′ sequences from both main groups and includes R-gene fragment B.
Figure 5. Phylogenetic analyses of the TIR, NBD, IR, and LRR regions of the BAC 91F11 R-genes. Analyses were performed using the GCG programs PAUPsearch and PAUPdisplay. One hundred bootstrap replicates were performed for each of the trees. Branches that are supported by >50 bootstrap replicates are shown.
Gene-specific RT-PCR analyses of BAC 91F11 R-genes: We designed 12 gene-specific primer pairs that differentiated 12 genes from all other R-genes on BAC 91F11 (Table 1). We were unable to design primers that would differentiate the R-gene portions of genes 6 and 9 and R-gene fragments A and B because of their small size and high nucleotide identity. Controls verified that the gene-specific primers differentiated between subclones representing all 16 genes and that the primers were specific to this R-gene cluster on linkage group J.
On the basis of these results, we were able to perform RT-PCR to monitor the expression of 12 of the genes on BAC 91F11. Libraries were made from six different soybean tissues. The tissues chosen represented different organs of the soybean as well as different developmental stages. Negative controls confirmed that the mRNA samples were free of genomic DNA contamination.
RT-PCR results demonstrate that six of the BAC 91F11 R-genes are expressed (Figure 8, B and C). Genes 2, 10, and 14 appear to be constitutively expressed in all of the tissue samples. Gene 2 corresponds to cDNA MG13 (Graham et al. 2000), which had been isolated from soybean roots infected with Heterodera glycines. Gene 14 is one of the two R-genes fused to an NtPRp27-like protein. Genes 8, 11, and 12 appear to be differentially expressed. Gene 8, which corresponds to cDNA LM6 (Graham et al. 2000), was detected only in the aboveground portion of the plant at 9 DAP. cDNA LM6 had originally been isolated from soybean epicotyls (Graham et al. 2000). Gene 11 was detected in the belowground portion of the plants at 9 DAP. Gene 12 was detected in the aboveground portion of the plant at 9 DAP and in mature leaves at 150 DAP.
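The distinction between constitutive and differential expression used above can be stated as a simple rule on a presence/absence matrix: a gene detected in every tissue sample is called constitutive, a gene detected in only some samples is called differential, and a gene detected in none is called not detected. The sketch below encodes that rule. The function name and the input layout are assumptions, and any matrix passed to it would need to be taken from the RT-PCR results themselves.

```python
def classify_expression(detected):
    """Classify genes from RT-PCR presence/absence calls.

    `detected` maps gene name -> {tissue name: True/False}.
    """
    classes = {}
    for gene, by_tissue in detected.items():
        hits = sum(bool(call) for call in by_tissue.values())
        if hits == 0:
            classes[gene] = "not detected"
        elif hits == len(by_tissue):
            classes[gene] = "constitutive"
        else:
            classes[gene] = "differential"
    return classes
```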
Figure 6. Duplicated regions within BAC 91F11. Sequences with no GenBank homology are shown in white. A thin line represents a gap in the duplication. Intron positions are shown in gray. Introns are not drawn to scale. The R-gene portions of the sequences are shown in green. (A) A 5419-bp duplication giving rise to a second R-gene fused inframe to two calmodulin fragments (Ca1 and Ca2). The duplicated regions are separated by >30,000 bases (see Figure 1). (B) A 5531-bp duplication giving rise to a second R-gene fused inframe to an NtPRp27-like protein homology (NLPH). The duplicated region is separated by the insertion of a retroelement, ∼5 kb (see Figure 1).
The BLASTN algorithm was used to search for sequences with similarity to expressed soybean genes from dbEST (GenBank; March, 2002). We considered only sequences with fewer than two base differences relative to the BAC sequence. Twenty different ESTs were identified from a variety of plant tissues and environmental conditions (Figure 1). ESTs were identified for R-genes 2, 4, 6, 7, 8, 9, and 12. The corresponding EST libraries included a variety of tissues and developmental stages as well as different pathogen-infected tissues.
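A filter of this kind can be sketched for BLAST tabular output, assuming the standard 12-column layout (query, subject, percent identity, alignment length, mismatches, gap openings, and so on). The sketch counts mismatches plus gap openings as differences, which is an approximation because a gap opening may span more than one base, and it does not check that the alignment covers the whole EST. The function name and the threshold parameter are assumptions.

```python
import csv
import io


def near_exact_hits(tabular_text, max_differences=1):
    """Return (query, subject, differences) for hits with few differences.

    Expects tab-separated BLAST output with mismatches in column 5 and
    gap openings in column 6 (1-based). Comment lines starting with '#'
    are skipped.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        differences = int(row[4]) + int(row[5])
        if differences <= max_differences:
            hits.append((row[0], row[1], differences))
    return hits
```

The default of one allowed difference corresponds to the criterion of fewer than two base differences.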
DISCUSSION
R-gene structure and evolution: BAC 91F11 contains 16 different R-gene sequences with homology to the TIR/NBD/LRR family of disease resistance genes. Analysis of the synonymous and nonsynonymous amino acid substitution rates between the genes has revealed several interesting features. First, the Ka/Ks ratios in the IR, TIR, and NBD regions are <1.0. This suggests that these regions are under purifying selection. Second, the solvent-exposed β-strand/β-turn domain of the LRR has an average Ka/Ks value of 1.93, while the other regions of the LRR have an average Ka/Ks value of 0.724. These values would suggest that the β-strand/β-turn motif of the LRR is undergoing divergent selection.
Figure 7. Analyses of R-gene organization in BAC 91F11. The sequences upstream and downstream of the R-genes were examined to determine if clustering of genes could be detected. A subset of restriction fragment sites (vertical lines) is provided to demonstrate clustering of sequences. Different colored blocks represent nonhomologous sequences. Thin lines represent gaps introduced to maximize the alignment. Alignment of a specific sequence ends when it no longer shows DNA similarity to any other sequence. (A) Alignment of sequences 2000 bases upstream of the start codon. The upstream sequence for R-gene 2 is not shown because it is not included in the BAC. The upstream sequence for R-gene fragment A is not shown because it showed no DNA similarity with the rest of the sequences. (B) Alignment of sequences 2000 bases downstream of the stop codon. The sequences of R-gene 2 and R-gene fragment A are not shown because they show no DNA similarity to the rest of the sequences. The downstream sequence of R-gene 8 (*) is truncated since it is not present on the BAC.
Previously, we used PCR amplification of R-gene-like sequences in the BAC 91F11 cluster to estimate amino acid substitution rates in this region. The results indicated that the TIRs of R-genes in this cluster were under divergent selection (Graham et al. 2002). One explanation for the difference from the present results is that the amplification was performed on a BAC 80 kb larger than 91F11 (Gm_ISb001_034_P07, 200 kb) that could have contained additional diverse R-genes. A second explanation is that PCR amplification of R-gene fragments made the identification of pseudogenes difficult. Including pseudogenes in our analyses could have altered Ka/Ks ratios.
As cloned R-gene sequences have been analyzed, intragenic unequal crossing over has been shown to play an important role in changing pathogen specificity. Decreases or increases in LRR number are associated with altered specificity in the L and M loci in flax (Anderson et al. 1997; Ellis et al. 1999; Luck et al. 2000), the Cf-5 locus in tomato (Dixon et al. 1998), and the RPP5 locus in Arabidopsis (Noël et al. 1999). All of the TIR/NBD/LRR R-genes on BAC 91F11 contain 10 LRRs except for gene 12, which contains 7. There is no evidence of expansion or contraction of the LRRs, suggesting that the LRR domains of the BAC 91F11 R-genes are evolving mainly by the accumulation of point mutations within the β-strand/β-turn of the LRR, not by unequal crossing over.
Figure 8. Sample RT-PCR reactions of the 91F11 gene-specific primer pairs. (A) Control reaction to test for genomic DNA contamination of mRNA samples. Primers used for the control were developed from a soybean tubulin EST. (B) RT-PCR reactions using the 91F11 gene-specific primers R2, R8, R12, and R1. (C) RT-PCR results for each of the 91F11 R-genes tested. The y-axis lists each of the gene-specific primer pairs. The x-axis lists the tissue sources used for RT-PCR. Expressed genes are labeled with a plus sign. Undetected genes are labeled with a minus sign.
Analyses of the sequences 2000 bases upstream and downstream of the R-genes suggest that recombination is important in generating R-gene diversity. Upstream of the translational start site, the 16 R-genes share only ∼80 bases of sequence similarity. Beyond this point, the genes break into three distinct groups. Three main groups could also be identified in the sequences downstream of the stop codon. While distinct groups of genes are present, they are not physically separated on the BAC; instead, their order along the BAC shows no clear pattern. In addition, there is clear evidence of recombination between distinct groups.
By analyzing the fusion junctions and the adjacent sequences of genes 13 and 14, it is apparent that one of the sequences arose through direct duplication of the other. To determine if the duplication was due to the insertion of the intervening retroelement, we examined the sequences of both genes 13 and 14 and of the retroelement separating them. Genes 13 and 14 share >90% nucleotide identity and contain complete open reading frames. The LTRs of the retroelement share >99.2% nucleotide identity. The high conservation found within the LTRs suggests that the insertion of the retroelement was relatively recent and occurred after the duplication of the two genes. Therefore, the duplicated genes are probably the result of intragenic recombination. In addition, a recombination or deletion event was probably involved in fusing the TIR/NBD domains to the defense-related protein.
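The identity values used in this argument can be computed from a pairwise alignment with a short function such as the one below. The two LTRs of a retroelement are identical at the time of insertion and diverge afterward, so a high identity between them is consistent with a recent insertion. The sketch assumes the two sequences are already aligned and skips any column containing a gap. How gaps were treated in the original identity calculations is not stated, so that choice and the function name are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are not scored in this sketch
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```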
Recombination or deletion was also apparently involved in the formation of R-genes 6 and 9. Again, unequal recombination or deletion would be necessary to fuse the R-gene TIR to the calmodulin-like Ca2+-binding domains and then to duplicate the novel gene. Surprisingly, while the genes share >96% nucleotide identity, they are physically separated by >30 kb. This suggests that a mechanism must be present for separating newly duplicated genes. In the Mla locus of barley, R-genes have been shuffled by several rounds of duplication and inversion and by the insertion of nested transposons (Wei et al. 2002). These changes have significantly expanded the Mla locus.
Analysis of BAC 91F11 has allowed us to build a model to describe the evolution of this cluster of genes (Figure 9). Initially, the R-genes on BAC 91F11 were most likely arranged as two separate clusters (Figure 9A). The head-to-tail orientation of all the R-genes within the BAC suggests that within the clusters, tandem duplications resulted in new R-genes (Hulbert et al. 2001; Figure 9A). Eventually, an unequal recombination event led to mixing of the two clusters into one. Newly duplicated R-genes could then be shuffled within the cluster (Figure 9, B and C). Unequal recombination then led to the R-gene/defense-related protein fusion to give gene 13, which was followed by duplication to give gene 14 (Figure 9, C and D). Eventually, R-gene fragment B was inserted into the promoter region of gene 14 (Figure 9E). This fragment was then duplicated and shuffled to give R-gene fragment A (Figure 9F). At this point a Gypsy/Ty3 retroelement was also inserted. R-gene 6 was then fused to the calmodulin fragment and duplicated to give R-gene 9 (Figure 9, G and H). R-gene 9 was then shuffled within the cluster. This was followed by the relatively recent insertion of another Gypsy/Ty3 retroelement (Figure 9, H and I).
Figure 9. Model of BAC 91F11 R-gene evolution. This proposed model represents key events giving rise to the existing R-gene cluster. Other intragenic recombination events have occurred but are not shown. The order of events was determined by comparing nucleotide identities within genes and surrounding areas. Nucleotide identities and restriction fragment patterns (Figure 7) were used to determine relationships between genes. (A) The structure of the original cluster. The origins of R-gene 2 cannot be determined since it is not completely included in the BAC. The calmodulin and NtPRp27-like protein fragments most likely originated from intact genes. These genes may not have been present in the original cluster but are shown at the earliest step to simplify the model. (B-I) Steps giving rise to the current cluster. See discussion for further details.
The degree to which genes are shuffled could be dictated by the insertion and excision of retroelements. Genes 13 and 14, which are 90% identical, are most likely the result of a tandem duplication. However, unlike many of the genes in this cluster, they have not been physically separated. The only thing separating them is a retroelement (Figure 9G). In contrast, genes 6 and 9 are still 99% identical but have been separated by >30,000 bases. Analyses of the LTRs of the retroelements within this BAC suggest that retroelement insertion occurred after the duplication of the genes (Figure 9, E-I). Insertion of retroelements near R-genes may prevent the mispairing that would eventually lead to recombination and gene shuffling. In the Mla locus of barley, successive rounds of duplication have been followed by inversions and nested retrotransposon insertions (Wei et al. 2002). The Mla gene family, which is separated by a nested cluster of retrotransposons, has suppressed recombination rates. In addition to retroelements, the fusion of R-genes to novel genes could also affect shuffling and recombination rates. Traditionally, recombination between closely related R-gene sequences was thought to lead to R-gene homogenization (Hulbert et al. 2001). The presence of fused sequences may limit the rates of recombination and may introduce novel sequence motifs.
Novel disease resistance signatures: We have identified two R-genes (13 and 14) with high similarity to the TIR and NBD domains of disease resistance genes. In addition, these genes have a third domain with similarity to the defense-related proteins NtPRp27 (Okushima et al. 2000) from tobacco and WCI-5 (Görlach et al. 1996) from wheat. NtPRp27 is constitutively expressed in tobacco roots but can be induced by infection with tobacco mosaic virus, wounding, and drought and by the application of ethylene, methyl jasmonate, salicylic acid, and abscisic acid. Under stress conditions, the mRNA accumulation patterns mirror those of pathogenesis-related protein 1 (PR-1; Okushima et al. 2000). In wheat, the expression of WCI-5 was induced by benzothiadiazole and infection with Blumeria graminis f. sp. tritici, the causative agent of powdery mildew in wheat. Benzothiadiazole is a salicylic acid analogue, which induces systemic acquired resistance. With each of these treatments, the expression of WCI-5 was coordinated with that of PR-1 (Görlach et al. 1996). These data suggest that both NtPRp27 and WCI-5 are involved in downstream pathogen resistance responses. We examined the GenBank nonredundant database, dbEST, and dbGSS and have been unable to find other R-genes fused to sequences with similarity to these defense-related proteins.
Unlike NBD/LRR genes, homologs with similarity to NtPRp27 and WCI-5 are rare. Okushima et al. (2000) estimated that there were three NtPRp27 homologs in the tobacco genome. Using BLASTX (Altschul et al. 1997), we were able to identify six Arabidopsis NtPRp27/WCI-5 homologs in the GenBank nonredundant database. Five of the homologs are on Arabidopsis chromosome 2, three of which are clustered together. The sixth homolog is located on Arabidopsis chromosome 1. The three homologs identified in rice were also clustered. Examination of soybean EST sequences suggests that there may be four homologs in the soybean genome (data not shown). However, using primers that spanned the R-gene/defense-related protein junction, we were unable to identify other homologs. Failure to identify other homologs of the R-gene/defense-related protein fusion may mean that genes 13 and 14 are the only members of this class, or it may mean that other homologs have undergone sequence divergence, eliminating primer site recognition.
Another interesting point to consider is the expression of genes 13 and 14. In the case of gene 14, the defense-related protein is now fused to a gene whose expression is constitutive in all of the tissues examined. NtPRp27 and WCI-5, which have high similarity to the defense-related portion of genes 13 and 14, are the only two other defense-related proteins for which expression data have been examined in detail thus far. For each of these genes pathogen-induced expression occurs 2 or 9 days, respectively, after infection (Görlach et al. 1996; Okushima et al. 2000). In addition, NtPRp27 is constitutively expressed in roots (Okushima et al. 2000). Fusion of the NtPRp27-like protein to a disease resistance gene (13 and 14) places the gene under the regulation of a novel promoter, thus creating novel expression profiles.
Given that NtPRp27 is a secreted protein constitutively expressed in roots and induced by pathogen infection, Okushima et al. (2000) hypothesized that NtPRp27 may be associated with antimicrobial defense responses. Within the linkage group J cluster of disease resistance genes is the ineffective nodulation gene, Rj2. The reasons behind the inability to nodulate are unknown, but it is interesting to speculate that symbiosis does not occur because the potential symbiont is instead recognized as a pathogen. As in disease resistance, nodulation involves an initial recognition event between the plant and the symbiont. Perhaps homologs of genes 13 and 14, which share homology to both disease resistance genes and NtPRp27, are somehow involved in this response.
In addition to the TIR/NBD/defense-related genes, we identified a second novel class of disease resistance genes: TIR domains fused inframe to two Ca2+-binding domains with homology to the Ca2+-binding domains of soybean calmodulin ScaM-4. Ca2+ domains or EF-Hands have been identified in a large variety of proteins and are generally found as pairs (Lewit-Bentley and Réty 2000). The EF-Hand is a helix-loop-helix structure that undergoes a conformational change upon binding of Ca2+ (Yap et al. 1999). The conformational change from a closed to open position allows proteins with EF-Hands to interact with other proteins, leading to their activation or inactivation. ScaM-4 is a calmodulin isoform with four EF-Hands and is believed to be involved in the activation of plant defense responses (Heo et al. 1999). Ca2+ signaling is thought to be one of the first steps in the signal transduction pathway leading to the activation of defense responses (Nurnberger and Scheel 2001). How the Ca2+ signal regulates the expression of defense responses is not understood. One possible mechanism is through binding of Ca2+ to calmodulin. ScaM-4 expression can be induced by fungal elicitors or pathogen infection (Heo et al. 1999). Constitutive expression of ScaM-4 in transgenic tobacco plants leads to the induction of systemic acquired resistance-associated genes and enhanced, widespread resistance to bacterial, fungal, and viral pathogens. Recently, Kim et al. (2002) have demonstrated the importance of calmodulin in regulating pathogen defense responses. The barley Mlo gene family encodes a seven-transmembrane protein. Mutations within Mlo lead to broad-spectrum resistance to powdery mildew. This suggests a role for Mlo in negatively regulating defense responses. Kim et al. have shown that a rice homolog of Mlo is able to bind the soybean calmodulin homolog ScaM-1 in vitro. In vivo, the loss of the calmodulin-binding domain in Mlo halved its ability to suppress a defense response against powdery mildew. Since R-genes 6 and 9 encode transcripts with a complete TIR domain and the two terminal Ca2+ domains of ScaM-4, they may be involved in plant defense signaling.
Analyses of the alternative splice products of the tobacco mosaic virus resistance gene N suggest that the TIR/NBD domains of genes 13 and 14 and the TIR domain of genes 6 and 9 could still act in signal transduction pathways. Alternative splicing of resistance genes has been demonstrated only in the NBD/LRR family of disease resistance genes. In addition to N, the flax rust resistance genes M (Anderson et al. 1997) and L6 (Lawrence et al. 1995) and the Arabidopsis P. syringae resistance gene RPS4 (Gassmann et al. 1999) also produce alternate transcripts. Alternative splicing in these genes removes most, if not all, of the LRR. Dinesh-Kumar and Baker (2000) used modified N genes to demonstrate the importance of the alternative transcript in tobacco mosaic virus resistance. Transgenic plants with modified N genes, which could form only one of the two transcripts, could not mount a complete defense response. This suggests that both the full-length and the alternate products are required for complete resistance. Since genes 6, 9, 13, and 14 still contain signaling domains, they may still be involved in the disease resistance signal transduction cascade.
Initially, the presence of truncated R-gene domains within the cluster suggested that these genes were no longer functional. However, the genes with truncated domains show structural similarity to the MyD88 family of genes (Akira et al. 2001). MyD88 encodes a TIR domain and death domain but does not encode an LRR. MyD88 acts as an adaptor molecule, through its TIR, for the interaction of an interleukin-1 receptor with an interleukin-1 associated protein kinase. Similarly, the RPW8 disease resistance gene from Arabidopsis shares limited homology to the amino terminus of an NBS/LRR protein (Xiao et al. 2001). RPW8 lacks LRRs but is thought to function in broad-spectrum powdery mildew resistance through interaction or recruitment of NBS/LRR genes. These data suggest that truncated R-genes could also act as adaptor molecules in the defense response.
In Arabidopsis, almost 20% of the identified NBD/LRR genes are missing most, if not all, of the LRR (33 truncated R-genes and 166 total NBD/LRR genes; Richly et al. 2002). It is possible that at least some of these genes have been misannotated and additional unique domains have not been identified. In the case of R-genes 6 and 9, only the identification of corresponding ESTs allowed us to determine their true structure. Of the 33 truncated Arabidopsis R-genes, 16 have ESTs associated with them (Richly et al. 2002). Complete sequencing of these ESTs could reveal surprising structural features within the NBD/LRR family of disease resistance genes.
R-gene expression: RT-PCR was used to monitor the expression of 12 of the 16 R-genes in roots, shoots, flowers, pods, mature leaves, and young leaves. Of the 12 genes examined, expression of 6 genes was detectable. Genes 2, 10, and 14 were constitutively expressed in all tissues. Genes 8, 11, and 12 were differentially expressed. By analyzing dbEST at GenBank, we were able to identify over 20 ESTs corresponding to BAC 91F11 R-genes. ESTs confirmed the expression of 2 genes (genes 4 and 7) for which we did not detect RT-PCR activity and 2 genes (6 and 9) for which we could not design RT-PCR primers. Many of the ESTs originated from plant tissues harvested under different environmental conditions or after pathogen inoculation. This suggests that at least some of the BAC 91F11 R-genes could be induced by stress. Differences between RT-PCR and the EST results are most likely due to differences in tissues examined.
By monitoring the expression of the R-genes we hoped to identify possible candidates for adult-onset powdery mildew resistance (Rmd). In the greenhouse, the primary leaves and first and second trifoliates are susceptible to powdery mildew infection while subsequent trifoliates are resistant (Lohnes and Bernard 1992). Century et al. (1999) examined the developmental control of Xa21-mediated disease resistance in rice and found that expression of the Xa21 gene was independent of plant developmental stage, suggesting that Xa21 is regulated post-transcriptionally. In contrast, we found that gene 12 is the only gene that was expressed in mature leaves but not in younger leaves. Therefore, it may be a candidate for adult-onset powdery mildew resistance.
In this article we reported the sequencing of a 118.8-kb core BAC in a soybean linkage group J contig known to span a cluster of disease resistance genes. Sequence analysis has revealed the presence of several different types of genes. These include 16 genes with similarity to disease resistance genes and retroelements, as well as domains from calmodulin genes, leucine zipper proteins, and defense-related proteins. More than half of the R-genes appear to be full length; the remainder encode truncated genes. To date, truncated TIR/NBD genes have been described only in Arabidopsis. In contrast to Arabidopsis, the truncated soybean genes may represent two novel classes of disease resistance genes because the TIR domains are fused inframe to additional protein domains: TIR/NBD domains fused to a putative defense-related protein (NtPRp27) and TIR domains fused to calmodulin EF (Ca2+-binding) domains. Both the NtPRp27-like protein and the calmodulin EF domains are putatively involved in pathogenesis and defense responses in other species. By analyzing the R-gene sequences within this BAC we have begun to make advances in understanding the mechanisms generating novel disease resistance specificities in soybean.
Acknowledgments
Thanks go to Dan Voytas and David Wright at Iowa State University for their help in the evaluation of the retrotransposon sequences. Thanks go to Roger Wise at Iowa State University for critical evaluation of the manuscript. Names are necessary to report factually on the available data; however, the USDA neither guarantees nor warrants the standard of the product, and the use of the name by the USDA implies no approval of the product to the exclusion of others that may also be suitable. This article is a contribution of the Corn Insect and Crop Genetics Research Unit (USDA-ARS, Midwest Area) and project no. 3236 of the Iowa Agriculture and Home Economics Experiment Station (Ames, IA).
Footnotes
- Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession no. AF541963.
- Communicating editor: J. A. Birchler
- Received June 12, 2002.
- Accepted September 24, 2002.
- Copyright © 2002 by the Genetics Society of America