Comparative genomics provides a powerful tool for the identification of genes that encode traits shared between crop plants and model organisms. Pathogen resistance conferred by plant R genes of the nucleotide-binding–leucine-rich-repeat (NB–LRR) class is one such trait with great agricultural importance that occupies a critical position in understanding fundamental processes of pathogen detection and coevolution. The proposed rapid rearrangement of R genes in genome evolution would make comparative approaches tenuous. Here, we test the hypothesis that orthology is predictive of R-gene genomic location in the Solanaceae using the pepper R gene Bs2. Homologs of Bs2 were compared in terms of sequence and gene and protein architecture. Comparative mapping demonstrated that Bs2 shared macrosynteny with R genes that best fit criteria determined to be its orthologs. Analysis of the genomic sequence encompassing solanaceous R genes revealed the magnitude of transposon insertions and local duplications that resulted in the expansion of the Bs2 intron to 27 kb and the frequently detected duplications of the 5′-end of R genes. However, these duplications did not impact protein expression or function in transient assays. Taken together, our results support a conservation of synteny for NB–LRR genes and further show that their distribution in the genome has been consistent with global rearrangements.
R genes have a central role in plant disease resistance to mediate pathogen detection and response (Martin et al. 2003; Glazebrook 2005). Although R genes are only one of the components required for these responses, they are consistently identified as a critical determinant for qualitative and quantitative resistance (Fluhr 2001; Wisser et al. 2006). The structure, mechanism of action, and evolution of this gene family are still being elucidated and are critical issues for a more efficient deployment of disease resistances in agricultural crops (McDowell and Simon 2006; Takken et al. 2006; Friedman and Baker 2007; van Ooijen et al. 2007).
Comparative studies of sequence similarity between plant R proteins and proteins of innate immunity in animals have made important contributions toward understanding R-protein structure, the role of individual protein domains, and the mechanism by which R proteins identify and respond to foreign proteins (Nurnberger et al. 2004; Takken et al. 2006; Rairdan and Moffett 2007). Both share a central nucleotide-binding (NB) site and a region of homology termed the “ARC” domain (collectively referred to as the NB–ARC) (van der Biezen and Jones 1998; Rairdan and Moffett 2007). The plant counterparts have a highly variable leucine-rich-repeat (LRR) domain at the C terminus. At the N terminus, they either have a domain with homology to the Toll and interleukin-1 receptors (TIR) or lack this feature and instead possess a domain that may include a coiled-coil motif. Due to uncertainty regarding the presence of a coiled-coil motif, the latter class of NB–LRRs is often referred to as non-TIR proteins. The LRR domains tend to be under diversifying selection to adapt to continually changing pathogen proteins (Meyers et al. 1998b; Michelmore and Meyers 1998; Mondragon-Palomino et al. 2002). Other conserved patterns have been identified in the N terminus of non-TIR proteins, most notably, an EDxxD motif that mediates an intramolecular interaction (Rairdan et al. 2008). The interaction with cellular factors is mediated by the N-terminal domains of NB–LRR proteins, although domain-swapping experiments between closely related NB–LRR proteins have shown that recognition specificity is determined by the LRR domains (Rairdan and Moffett 2007; van Ooijen et al. 2007).
The clustering of R genes has provided both insight into their ability to evolve rapidly and challenges to their identification and cloning. R genes often occur in clusters of tandem duplications that can span several megabases and include a multitude of copies of functional R genes, pseudogenes, and other genes within the clusters (Meyers et al. 1998a; Kuang et al. 2004; Smith et al. 2004). Of the various modes of evolution ascribed to these clusters, sequence exchange between R genes within the cluster by unequal crossing over or illegitimate recombination is especially noteworthy (Michelmore and Meyers 1998; Ellis et al. 2000; Hulbert et al. 2001; McDowell and Simon 2006; Friedman and Baker 2007; Wicker et al. 2007). Under stress conditions, transposon activation, recombination activation, and chromatin modifications related to small RNAs may be induced (Levy et al. 2004; Friedman and Baker 2007; Yi and Richards 2007).
Two distinct models for the genomewide arrangement and distribution of NB–LRR genes and these clusters have been proposed. The first predicts rapid rearrangement of R-gene distribution during genome evolution, yielding poor conservation of R-gene locations (Leister et al. 1998; Richly et al. 2002; Meyers et al. 2003). Indeed, in monocots, extensive loss of genomewide R-gene colinearity has been attributed to frequent R-gene duplication and ectopic transposition (Gale and Devos 1998; Paterson et al. 2003). In contrast, the second model supports genomewide conservation of R-gene distribution maintained during speciation. According to this model, most duplication and recombination of R-gene sequences should occur within restricted chromosomal regions, yielding clusters of closely related R-gene sequences. The resulting orthology relationships (homologs related by speciation, not duplication) are complex due to “fractionation” (repeated cycles of duplication, deletion, and recombination) but can, as we have previously shown, be reconstructed (Grube et al. 2000b). Analysis of R genes using the complete Arabidopsis thaliana genome sequence supports this model and accounts for the consensus of NB–LRR sequences (Baumgarten et al. 2003). Resistance to a particular pathogen type is not conserved, and highly similar NB–LRR proteins may confer resistance to very different pathogens (Grube et al. 2000b).
Bs2 encodes a non-TIR NB–LRR protein identified in Capsicum chacoense that confers resistance to the bacterium Xanthomonas campestris pv. vesicatoria. This R gene has greatest sequence identity to Rx and Gpa2 in potato, which confer resistance to a virus and nematode, respectively (Bendahmane et al. 1999; Tai et al. 1999b; van der Vossen et al. 2000). Despite the difference in the pathogens recognized by these genes, they are distinguishable from all other known R genes by marked sequence and structural features. In this study, we demonstrate that these three R genes are derived from syntenic regions in solanaceous genomes as predicted by our model of conservation of synteny. In performing these comparisons, we explore conserved amino acid patterns associated with proteins of the non-TIR family and the local genomic context of R genes of the Solanaceae. Finally, advances in the development of the Solanaceae as a system for comparative genomics highlight a role for chromosomal rearrangements in R-gene distribution throughout plant genomes.
MATERIALS AND METHODS
Capsicum genotypes used in this study were Capsicum annuum NuMex R Naky (R Naky), Early CalWonder 300 (ECW), Early CalWonder-123R (ECW-123R) (provided by Robert Stall), Yolo Wonder (YW), Perennial (A. Palloix, INRA, Montfavet, France), Capsicum chinense PI159234 and C. chacoense PI439414 (U.S. Department of Agriculture Agricultural Research Station Southern Regional PI Station, Griffin, GA), and an F2 population of 75 individuals derived from the cross R Naky × PI159234 (Livingstone et al. 1999). A tomato mapping population of 88 F2 individuals originating from a cross between Solanum pennellii and Solanum lycopersicum was provided by S. Tanksley.
R-gene sequence analysis:
NB–LRR sequences were obtained from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov) in December 2004 using the Bs2 protein sequence (AAF09256) as a query in BLASTP and are detailed in Table 1. Later searches established that since the original survey no proteins in the Bs2/Gpa2/Rx clade have been described with a characterized role in disease resistance.
Input sequences for dendrogram construction consisted of 452 amino acids of the NB–ARC and flanking regions of R proteins aligned using DIALIGN (Morgenstern et al. 1998; Kumar et al. 2001). The aligned sequences commenced seven amino acids before the GMG motif and extended 10 amino acids past the MHD motif of this region. The high divergence at the nucleotide level did not permit recombination detection. A neighbor-joining dendrogram was constructed using MEGA 2.1 (Kumar et al. 2001). The p-distance model was employed with pairwise deletion gap handling. Ten thousand bootstrap replications were generated to examine the robustness of data trends.
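The distance measure described above can be illustrated with a minimal Python sketch. It is not the MEGA implementation; the function names, the gap characters, and the toy sequences are assumptions made for illustration only. The p-distance is the proportion of compared sites that differ, and pairwise deletion means that a column is skipped only for the pair of sequences in which one member carries a gap.

```python
from itertools import combinations

# Assumption: gaps in the alignment are written as "-" or ".".
GAP_CHARS = {"-", "."}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap are ignored for this
    pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: skip this column for this pair
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return differences / compared


def distance_matrix(alignment):
    """Symmetric p-distance matrix for a dict of name -> aligned sequence."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for first, second in combinations(names, 2):
        d = p_distance(alignment[first], alignment[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix
```

A matrix of this kind is the input to neighbor-joining; the tree-building and bootstrap steps themselves are not reproduced here.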
Coiled-coil domain prediction:
To predict coiled-coils, deduced R-protein sequences were analyzed using the COILS (Lupas et al. 1991) and Marcoil (Delorenzi and Speed 2002) programs. When analyzing the data set with COILS, the 14- and 21-amino-acid window sizes were used with the most encompassing matrix, MTIDK. For Marcoil, three matrices were used: 9FAM, MTK, and MTIDK. The outputs were graphed as the coils score along the length of the protein, and results were divided into three categories based on descriptive criteria. Regions that were predicted by both algorithms to contain coiled-coils with likelihood ≥40% were classified as “strong.” Regions that were predicted by both algorithms to contain coiled-coils with likelihood between 10% and 40% or that were predicted by only one algorithm to contain coiled-coils with likelihood >85% were subjectively classified as “weak.” Other regions were assumed to not harbor a coiled-coil motif.
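The three-way classification can be written as a small decision rule. The sketch below is one reading of the criteria and makes two assumptions that the text does not settle: likelihoods are expressed as percentages from 0 to 100, and a region scored ≥40% by one program but only 10–40% by the other is treated as "weak." The function and constant names are hypothetical.

```python
STRONG_MIN = 40.0   # both programs at or above this value -> "strong"
WEAK_MIN = 10.0     # both programs at or above this value -> at least "weak"
SINGLE_MIN = 85.0   # one program alone above this value -> "weak"


def classify_region(coils_pct, marcoil_pct):
    """Classify a region from COILS and Marcoil likelihoods (percent)."""
    lower = min(coils_pct, marcoil_pct)
    higher = max(coils_pct, marcoil_pct)
    if lower >= STRONG_MIN:
        return "strong"
    if lower >= WEAK_MIN:
        # Assumption: mixed cases (one score >= 40, the other 10-40)
        # are grouped with the "weak" predictions.
        return "weak"
    if higher > SINGLE_MIN:
        return "weak"  # predicted by only one program, but confidently
    return "none"
```

For example, classify_region(5, 90) returns "weak" because only one program supports the region, whereas classify_region(5, 60) returns "none."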
Hydrophobic domain prediction:
Sequences were analyzed using the Kyte–Doolittle hydrophobicity plot in the Lasergene program, Protean (DNAStar, Madison, WI). A sliding window of nine amino acids, the ideal window size for finding hydrophobic domains in globular proteins (Kyte and Doolittle 1982), was used. A moving average trendline with a period of 9 was plotted over the data to assist visualization. Protein regions scoring above a stringent threshold of 2.1 units above the grand average hydropathy for each protein were considered to be hydrophobic.
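For readers without access to Protean, the calculation can be approximated with the short Python sketch below. It uses the published Kyte–Doolittle hydropathy scale, a nine-residue sliding window, and a cutoff of 2.1 units above the protein's grand average hydropathy. It is not the Protean implementation; the function names are hypothetical, and treating the window score as a simple unweighted mean is an assumption.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window along the sequence.

    Residues outside the 20 standard amino acids raise KeyError.
    """
    scores = [KD_SCALE[res] for res in sequence.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(sequence, window=9, margin=2.1):
    """Start positions (0-based) of windows scoring more than `margin`
    units above the grand average hydropathy of the whole protein."""
    scores = [KD_SCALE[res] for res in sequence.upper()]
    grand_average = sum(scores) / len(scores)
    profile = hydropathy_profile(sequence, window)
    return [i for i, value in enumerate(profile)
            if value > grand_average + margin]
```

Because the cutoff is relative to each protein's own average, a uniformly hydrophobic protein yields no flagged windows; only regions that stand out from the rest of the sequence are reported.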
The C-terminal LRR domain consists of a variable number of leucine-rich repeats. The pattern LXXLXXLXXLXLXX(N/C/T)(X)XLXXIPXX was originally reported as the consensus sequence for these repeats (Jones and Jones 1997). The underlined portion of the consensus sequence matched the examined protein sequences best. For consistency, we reevaluated the LRR descriptions of all R proteins in our data set and manually reannotated Pi-Ta, Dm3, RP3, and RPG1b LRRs.
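As a starting point for manual annotation, the full consensus can be expressed as a regular expression. The Python sketch below is only an illustration: it encodes the complete pattern given above, with X as any residue and (X) as an optional residue, whereas real repeats are degenerate and the annotation described here relied on a best-matching portion of the consensus and on manual curation. The function name and the toy sequence in the usage note are hypothetical.

```python
import re

# LXXLXXLXXLXLXX(N/C/T)(X)XLXXIPXX written as a regular expression:
# "." stands for X, "[NCT]" for (N/C/T), and ".?" for the optional (X).
LRR_CONSENSUS = re.compile(r"L..L..L..L.L..[NCT].?.L..IP..")


def find_lrr_candidates(protein):
    """Return (start, matched_text) for non-overlapping strict matches."""
    return [(m.start(), m.group(0))
            for m in LRR_CONSENSUS.finditer(protein.upper())]
```

Applied to the artificial sequence "MMLAALAALAALALAANAALAAIPAAMM", the function reports one candidate beginning at position 2. A strict pattern of this kind is expected to miss many genuine repeats, which is why relaxed, position-by-position scoring and manual inspection remain necessary.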
Analysis of duplicated genome sequences:
The DotPlot program in Lasergene's Megalign was used to compare various DNA sequences. The Bs2 YAC (AY702979) was aligned against itself, using a minimum similarity of 65% and a window size of 50, and the Rx/Gpa2 contig (AF265664) alignment used 65% similarity and a 75-base window. The solanaceous R genes were aligned against their respective genomic sequence to find local duplications (Mi 1.2, U81378; RB, AY303171; R1, EF514212; Tm22, AF536201). Pairwise percentage similarity of duplications was calculated using Megalign's ClustalV. Regions that were repeated one or more times within the Bs2 BAC were assigned putative identifications using BLASTX on default settings. Transposon identification was performed using CENSOR (Kohany et al. 2006).
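The principle of a windowed dot plot can be shown with a brief Python sketch. It is a generic illustration, not the Megalign algorithm: every window of one sequence is compared, without gaps, to every window of the other, and a dot is recorded when the fraction of identical positions reaches the minimum similarity. The function name and default values mirror the 50-base window and 65% threshold used for the Bs2 YAC but are otherwise assumptions.

```python
def dot_plot(seq_x, seq_y, window=50, min_similarity=0.65):
    """Return (i, j) start positions of window pairs whose ungapped
    identity is at least `min_similarity`."""
    hits = []
    last_x = len(seq_x) - window + 1
    last_y = len(seq_y) - window + 1
    for i in range(max(last_x, 0)):
        window_x = seq_x[i:i + window]
        for j in range(max(last_y, 0)):
            window_y = seq_y[j:j + window]
            matches = sum(a == b for a, b in zip(window_x, window_y))
            if matches / window >= min_similarity:
                hits.append((i, j))
    return hits
```

In a self-comparison, the main diagonal (i equal to j) is always present, and duplicated segments appear as additional runs of hits parallel to it. This naive version scales with the product of the two sequence lengths and the window size, so it is suitable only for short test sequences, not for a full YAC.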
Localization of Bs2, Gpa2, Me, and Mech loci on a Capsicum linkage map:
DNA markers and genes corresponding to resistance gene loci were integrated into the Capsicum linkage map of Livingstone et al. (1999) by the previously described method (Blum et al. 2003). PCR-based markers and RFLP probes were prepared as described below.
To determine the position of Bs2 in the pepper linkage map, two Bs2-linked markers, A2 and S19, were used. These map 0 and 7 cM from Bs2, respectively (Tai et al. 1999b). To localize the A2 marker in our linkage map, A2 fragments were amplified from genomic DNA of ECW-123R using A2 STS PCR primers according to Tai et al. (1999a). The resulting 528-bp A2 fragment was used as a probe for RFLP hybridizations. To localize the codominant SCAR marker S19 in our linkage map, S19 primers (Tai et al. 1999a) were used.
PCR primers (Integrated DNA Technologies, Coralville, IA) were used to amplify a 435-bp fragment from potato Gpa2 BAC clone 111 (van der Vossen et al. 2000), provided by J. Bakker, for subsequent use as an RFLP probe to map Gpa2 in pepper, as described above. This probe corresponded to nucleotides 398–833 of the coding region of Gpa2 (GenBank AF195939). In addition, two RFLP markers, GP34 (provided by C. Gephardt) and tomato clone CD19, were mapped to more accurately determine the location of the Gpa2 gene.
Me and Mech loci:
Previously, RAPD marker Q04_0.3 was mapped in pepper 10.6 cM away from the nematode resistance locus Me3 (Lefebvre et al. 1997; Djian-Caporalino et al. 2001). Previous mapping also revealed that a second nematode resistance locus, Me4, maps 10 ± 4 cM away from Me3 (Djian-Caporalino et al. 2001), and it was subsequently found that Me1, Me7, Mech1, and Mech2 could be inferred to localize to a region spanning ∼17 cM telomeric to Q04_0.3 and ∼10 cM centromeric to Q04_0.3. We mapped RAPD marker Q04_0.3 (OpQ04.300) using previously described methods (Lefebvre et al. 1997) in our segregating population. The map location of marker OpQ04.300 was used to infer the probable map locations of Me and Mech genes.
Mapping the Bs2 gene in a tomato linkage map:
A 500-bp DNA fragment of the Bs2 gene was amplified from genomic DNA of C. chacoense (PI439414) using the primers Bs2 L1 and Bs2 R1 (Tai et al. 1999a). Amplification products were cloned and sequenced at the Cornell University Life Sciences Core Laboratory Center and used as an RFLP probe. Polymorphic bands were mapped in tomato using population filters provided by S. Tanksley.
Rx tagged with four HA epitope tags was constructed in the pB1 binary vector containing the Rx promoter and 3′ sequence (Rx:4HA) as described (Bendahmane et al. 2002; Peart et al. 2002b). The NBLet sequence was deleted by overlapping PCR to create Rx:4HAΔNBLet. Binary vectors were transformed into the Agrobacterium tumefaciens strain C58C1 carrying the virulence plasmid pCH32. Agroinfiltration was performed as previously described (Bendahmane et al. 2000; Peart et al. 2002a). GFP fluorescence was evaluated 5 days later using a hand-held UV lamp. Protein extraction and immunoblotting were performed essentially as described by Rairdan and Moffett (2006).
Primary sequence relationships:
NB–LRR proteins homologous to Bs2 were collected using the full-length Bs2 protein sequence (AAF09256) in a search using BLASTP. Proteins were identified from both monocot and dicot plants and were mostly non-TIR–NB–LRR R proteins; TIR–NB–LRR matches to Bs2 scored at or above e = 10^−19. All matches at or below this threshold were checked manually to determine if they had an experimentally established resistance function, thereby eliminating probable pseudogenes. These criteria produced a set of 35 previously characterized non-TIR NB–LRR proteins from both monocot and dicot plants (Table 1).
Amino acid sequence relationships of the NB–ARC region are a common criterion used to compare R proteins (Cannon et al. 2002). Aligned NB–ARC amino acid sequences were trimmed to the same length, and a sequence similarity diagram was generated (Figure 1A). Because recombination and sequence exchange drive the evolution of many R genes, we employed a neighbor-joining method for sequence analysis. Although it is not the most sophisticated method, neighbor-joining does not assume a continuum of sequence divergence, an assumption required for parsimony and other models of phylogeny reconstruction (Doyle and Gaut 2000). While recombination detection algorithms are being developed for nucleotide alignments, the divergence of our data set limited us to amino acid level comparisons. Figure 1A is therefore not our only measure of orthology, but it is critical in the organization of sequences for the following analyses.
From these comparisons of primary sequence, Rx and Gpa2 emerged as the R proteins most closely related to Bs2. The high bootstrap values supporting this clade provide a high confidence for this grouping, which reflects the sum of random mutation and recombination among these homologs. While a second Rx paralog, Rx2, has been identified and mapped to potato chromosome V (Bendahmane et al. 2000), it has more recently been shown that all sequences highly similar to Rx/Gpa2 in two different diploid potatoes reside within the Rx/Gpa2 cluster (Bakker et al. 2003). This suggests that the presence of Rx2 on chromosome V might represent a recent translocation event that is not widely conserved (Bakker et al. 2003).
Predicted structural relationships:
The effect of fractionation on phylogeny prompted us to seek other evidence of relationship among NB–LRRs. The N terminal, NB–ARC, and LRR domains of R proteins are further divided into subdomains and motifs. The methods and criteria for annotating these features vary between reports, so to compare domains of Bs2 with those of other R proteins, we revisited the domain prediction for all R proteins in this study (Table 1) to fill in missing information and to apply a consistent set of criteria to all sequences. Our analyses focused on key features of the N terminus, NB–ARC, and LRR domains (Figure 1B).
All of the proteins analyzed in this study are referred to as non-TIR R proteins, and the N termini are often reported to contain coiled-coil or leucine zipper domains. The protein sequences were reevaluated for coiled-coils using the programs COILS and Marcoil and a common set of criteria. The COILS program is commonly used for R-protein evaluation and employs a sliding window to evaluate the probability that a stretch of amino acids forms a coiled-coil (Lupas et al. 1991; Pan et al. 2000b). The program Marcoil uses a hidden Markov model, which can be advantageous for recognizing shorter coiled-coils such as those believed to be found in R proteins (Delorenzi and Speed 2002). Often the highest scores were at the N terminus as expected, but this domain was spuriously predicted elsewhere in the protein as well. For example, a typical false positive was found in the polyglutamate repeat in the LRR of Hero, which cannot physically form a coiled-coil (Ernst et al. 2002; Gruber et al. 2006). We do not attempt to distinguish between regular coiled-coils and the leucine zipper subclass, but note that many leucine zippers reported in the R-gene literature were not predicted to be coiled-coils, even though a requisite pattern of leucine residues was present. In general, the Marcoil and COILS programs were in agreement with few exceptions. However, the 14-amino-acid window of COILS gave many apparent false positives relative to the 21-amino-acid window.
Figure 2A, panels A–D, shows the predicted coiled-coils for HRT, Bs2, Rx, and Gpa2, with strong predictions in dark blue and weak predictions in light blue, and illustrates some of the previous discrepancies about coiled-coils in the N termini of these proteins. HRT and Gpa2 had been previously reported as having coiled-coils in the N terminus, while Rx was described as possessing a putative coiled-coil with a less conserved consensus, and Bs2 was noted to lack a coiled-coil domain (Bendahmane et al. 1999; Cooley et al. 2000; Tai and Staskawicz 2000; van der Vossen et al. 2000). On the basis of our updated analyses, however, Bs2 is more likely to possess a coiled-coil than is Gpa2. While a distinction may have been made previously between the coiled-coil nature of the N termini of Rx and Gpa2, side-by-side comparison reveals that no substantive difference was detected by current algorithms. In the absence of experimental data demonstrating the existence of a coiled-coil structure in these R proteins, we suggest they should be more conservatively classified simply as non-TIR R proteins.
A motif that is fairly conserved in the N-terminal domain of most characterized non-TIR–NB–LRR proteins is the EDxxD motif, which is found adjacent to potential coiled-coils and forward of the NB–ARC (Bai et al. 2002; Rairdan et al. 2008). The region WVxxIRELAYDIEDIVDxY was aligned among all of the R genes in our study and grouped according to clades identified in Figure 1A (also see Figure 2B). In general, the groupings produced by the NB–ARC alignment are mirrored in this motif. Previously, synapomorphies within the NB–ARC were found to correlate with the presence or the absence of a TIR N-terminal domain; this result revealed common patterns that can be found within the non-TIR N-terminal domain on the basis of NB–ARC relationships (Pan et al. 2000b). Exceptions to this trend are seen within the EDxxD portion of Bs2 and the divergence of Dm3, RPS2, and RPS5. The use of the DiAlign algorithm (Morgenstern et al. 1998) allowed the motif of these latter sequences to be aligned on the basis of similarity outside the EDxxD motif, but as noted previously, this clade seems to lack the most conserved portion of the motif (Rairdan et al. 2008). Given this data set, a slightly modified consensus for this region was observed (Figure 2B). Considering amino acid properties, a general pattern is suggested and described in the Figure 2 legend.
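Locating candidate EDxxD motifs in a set of N-terminal sequences is straightforward to automate. The Python sketch below is an illustration only: it reports every occurrence of E-D-any-any-D, including overlapping ones, and does not score the flanking conserved residues that were used for the alignment described above. The function name is hypothetical, and the example sequence in the usage note is the consensus region with its variable positions arbitrarily filled with alanine.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
EDXXD = re.compile(r"(?=(ED..D))")


def find_edxxd(protein):
    """Return (start, five_residue_match) for each EDxxD occurrence."""
    return [(m.start(), m.group(1))
            for m in EDXXD.finditer(protein.upper())]
```

For the artificial sequence "WVAAIRELAYDIEDIVDAY", find_edxxd returns [(12, "EDIVD")]. In practice, hits would still need to be checked for position relative to the NB–ARC, because a five-residue pattern of this kind can occur by chance elsewhere in a protein.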
The structural annotation was revised for two other R-protein regions. A hydrophobic region within the NB–ARC (GxP or GLPL) has been shown to be important for R-gene function (Rairdan and Moffett 2006), but also several other regions have been noted as being hydrophobic in first reports of R-gene isolations. Since criteria used by authors vary, we again applied common criteria for prediction and annotation of hydrophobic domains across the R proteins examined. A Kyte–Doolittle plot was used to analyze hydrophobicity (Figure 1B; indicated in purple) (Kyte and Doolittle 1982). LRRs are not necessarily contiguous, which further complicates their delineation. In our analyses, two types of interruptions were found: (1) gaps in Rp1-D and the polyglutamate repeat in Hero and (2) superimposition of alternate domains, as predicted by other methods, on the LRR pattern. LRR domains are shown in red in Figure 1B, and our reevaluation was useful in delimiting the ends of these domains as structural features in our analysis.
Sequence relationships of the noncoding regions near and within R genes:
We interpret the intron position within R genes (Bai et al. 2002; Meyers et al. 2003) as an indicator of orthology relationships. Introns and exons, both within the coding region and in the 5′- and 3′-UTR, are shown in Figure 1B. Visual comparison of the placement of noncoding regions further demonstrates the striking similarity between closely related R genes. We were intrigued by the extreme 27-kb length of the Bs2 intron. Dot plots aligning the Bs2 YAC with itself as well as BLAST and CENSOR searches (Kohany et al. 2006) were employed to investigate this phenomenon (supporting information, Figure S1). The Bs2 intron contains six major duplicated elements (Figure 3 and Table S1). Portions of the intron were found to bear similarity to the internal regions of two Gypsy-type LTR retrotransposons, Ogre (Macas and Neumann 2007) and GYPOT1 (Jurka and Shankar 2006a). There were several interruptions in the partial alignment with the Ogre element. The region with similarity to GYPOT1 is flanked by direct repeats (Figure 3A-e). Much of the intronic region is also duplicated in the surrounding regions. Duplicated portions of the Ogre element (Figure 3A-f) and the repeats flanking GYPOT1 (Figure 3A-d) indicate local movement of large retroelements as does another region with similarity to a Copia LTR retrotransposon (Kohany and Jurka 2007). Smaller portions of the intron are also duplicated: Figure 3A-g harbors an Alien element (Pozueta-Romero et al. 1996) and Figure 3A-d is found separately within the intron. Nonautonomous Alien DNA transposons were distributed rather evenly across the YAC, but other elements of the same class, Sonata (Jurka 2006a,b; Jurka and Shankar 2006b), were much more numerous and tended to cluster. Non-LTR retroelements (Yoshioka et al. 1993) were also observed in the vicinity of Bs2 as well as an additional hAT DNA transposon (Jurka and Kohany 2006). Other duplicated fragments were observed but bore no similarity to transposons.
In general, there is an erosion and truncation of transposable-element-related sequences consistent with multiple insertions followed by sequence drift. The sequence encompassing the GYPOT1 element is also found in the flanking regions of other solanaceous R genes, specifically Rx and Gpa2. Interestingly, most BLAST hits to this particular transposon-related sequence were associated with R-gene clusters in various plants. An abundance of similar retrotransposons has been reported in tomato (Datema et al. 2008) and has generally been associated with genome expansion.
Another notable duplicated feature is the presence of a fragment of the Bs2 gene, specifically a portion of the 5′-end of the gene repeated past the 3′-end of the functional gene. Other solanaceous R genes were tested for similar truncated NB–LRRs, or “NBLets” (Figure 3), because of a similar report for Tm22 (Lanfermeijer et al. 2003). Only Rx and RGC3, a pseudogene near Rx, share the same type of trailing 5′ gene fragment as Bs2; the absence of a NBLet for Gpa2 may be due to the loss of its terminal exon as compared to Rx. NBLets were also found for Mi 1.2, R1, RB, and Tm22, but these were 5′ to the coding sequence. Tandem duplications have been implicated in affecting gene expression/activity through the generation of small interfering RNAs. This phenomenon has been observed specifically in R-gene clusters (Yi and Richards 2007). To determine whether the duplicated region of Rx plays a functional role in Rx-mediated resistance to PVX, protein levels and resistance responses were compared in transient agro-expression assays between Rx genomic constructs with and without this duplication. A 300-bp region encompassing the 3′ duplication was deleted from a binary vector containing the Rx promoter, the Rx coding region fused to four HA epitope tags, and 3′ sequences. Rx constructs were coexpressed with a GFP-tagged version of PVX, such that in this assay, Rx efficacy is inversely correlated with the amount of GFP fluorescence observed (Rairdan and Moffett 2006). The deletion of the Rx NBLet had no notable effect on the ability of Rx to confer resistance to PVX or on the level of Rx protein expressed (Figure 4) in this system. These results rule out the possibility that the Rx NBLet expresses a protein fragment required for Rx function and suggest that the NBLet does not alter Rx protein expression levels.
Macrosyntenic relationships of Bs2, fractionated orthologs, and paralogs:
Bs2 belongs to a large gene family in Capsicum as demonstrated by numerous bands in DNA blot analysis (Tai et al. 1999b). Therefore, it was not practical to directly map the Bs2 gene by RFLP or identify paralogs. PCR approaches using portions of the Bs2 gene have been employed, but could potentially identify paralogs located nearby or duplicated elsewhere in the genome. To test our hypothesis and to identify the position of Bs2 in the pepper genome unequivocally, we used DNA markers tightly linked to Bs2 (Tai et al. 1999a; Tai and Staskawicz 2000).
A2 is a marker that is tightly linked to the Bs2 gene (0.0 cM) and resides on the YAC clone containing Bs2 (Tai et al. 1999b; Tai and Staskawicz 2000). We cloned the A2 genomic DNA fragment from the C. annuum cultivar ECW-123R (Bs2/Bs2) and used it as an RFLP probe. Two dominant polymorphic bands were detected on survey filters for our comparative mapping population (Livingstone et al. 1999; Grube et al. 2000b). The 1-kb band mapped between markers TG263 and CT121 on the lower arm of pepper chromosome 9 (P9; Figure 5A and Figure 6), while the 2-kb band was assigned to P1.
To determine which of these loci corresponded to the Bs2 gene, another linked marker, S19, was used. S19 is a codominant SCAR marker located 7 cM from Bs2 and A2 (Tai et al. 1999a). The same pair of polymorphic bands identified during the cloning of Bs2, 220 bp in ECW and R Naky, 240 bp in ECW-123R and PI159234, were amplified in our mapping parents (Figure 5B). S19 was mapped to P9, ∼6 cM below TG263 and centromeric with respect to the position for the 1-kb band of A2 (Figure 6), demonstrating that Bs2 resides on P9 in a region of the pepper genome that is orthologous to the top arm of potato XII that includes the fractionated orthologs Rx and Gpa2. This is consistent with results from others who have located several PCR markers with similarity to Bs2 on this chromosome (Ogundiwin et al. 2005; Sugita et al. 2006; Djian-Caporalino et al. 2007).
To further test our hypothesis of shared synteny between fractionated R-gene orthologs, RFLP probes corresponding to Gpa2 were mapped in pepper and Bs2 probes were mapped in tomato. The probe derived from the 5′-end of Gpa2 hybridized to an average of 11 bands on pepper genomic survey blots. The prominent polymorphic bands were mapped, and all localized to a region on P9 each 3 cM from marker TG263 (Figure 5C and Figure 6). While Bs2 is a member of a large gene family in pepper, it produces few bands on tomato genomic DNA survey blots (Figure 5D and Tai et al. 1999b). While one of the major polymorphic bands mapped to tomato chromosome 2, the other mapped to tomato chromosome 12, between CT100 and CT129, which tightly flank Rx and Gpa2 in potato and broadly flank Bs2 and Gpa2 homologs in pepper (Figure 6).
Tomato and potato are collinear throughout this arm of the chromosome, differing by only a whole-arm inversion. Pepper is collinear with both tomato and potato in this region except CT100 is centromeric in both pepper and tomato, but near the telomere in potato. This deviation signifies the breakpoint between the inversions. Further, this breakpoint, between CT100 and CT129 in potato compared with pepper, is the location of the R-gene cluster, providing a plausible explanation for the dispersal of R genes to the ends of this inverted region (TG180/Mi 3 and CT100/Lv in tomato Figure 6). This hypothesis predicts that other R genes similar to Rx/Gpa2 are localized near CT129 in pepper, the other end of the inversion breakpoint. The marker OpQ04.300 is linked to nematode resistances and shared the same polymorphism between our mapping parents as observed in previous studies, allowing us to also demonstrate the presence of Me1, Me3, Me4, Me7, Mech1, and Mech2 in this region on a comparative map that aligns with other genera with multiple single-copy linkages (Figure 5E and Figure 6) (Livingstone et al. 1999; Djian-Caporalino et al. 2001, 2007).
Critical breakpoints in this updated alignment occur at telomeric/centromeric regions or near R genes. For the Rx/Gpa2 cluster, a chromosome translocation/inversion breakpoint dispersed the R-gene homologs and associated markers in other genera relative to potato. Others have observed the genomic distribution of R genes as a somewhat random phenomenon (Leister et al. 1998; Pan et al. 2000a; Richly et al. 2002), but it has been since shown in Arabidopsis that R-gene locations are consistent with the rearrangements of their chromosomal context (Baumgarten et al. 2003). In the Solanaceae, 22 genome rearrangements distinguish tomato and pepper (Livingstone et al. 1999). The analysis of Grube et al. (2000b) and subsequent R-gene discovery in tomato, pepper, and potato described here were combined to examine the association of R genes with chromosome breakpoints (Figure 7). In every case, we could associate at least one source of resistance with every breakpoint. Despite the limitation of only including NB–LRRs near breakpoints that have a known phenotype, the sequence relationship between Hero and Prf is seen to be reflected in their genomic relationship. As shown by Baumgarten et al. (2003), this sequence relationship is not expected for every R gene that can be aligned in the genome because clusters are heterogeneous. This hypothesis can be further tested in a comparative system when the completed tomato genome sequence allows these comparisons to be made at a higher resolution across the Solanaceae.
The ability to determine orthology is critical in the application of comparative genomics to questions of R-gene evolution, function, and discovery. Here we investigate the homology relationships of Bs2, a major gene in Capsicum for resistance to bacterial spot, and other non-TIR NB–LRRs. From analyses of sequence, gene architecture, predicted protein structure, and macrosynteny, Bs2 is a fractionated ortholog of members of the Rx/Gpa2 locus. In contrast to monocots, recent reports in dicots illustrate fractionated synteny, and microcolinearity can be found across genera for R genes and other tandemly duplicated genes (Ballvora et al. 2007; Schlueter et al. 2008). Approaches to understanding and utilizing R-gene macrosynteny in the Solanaceae are certainly viable. The cloning of the late blight resistance gene R3a from potato based on I2 in tomato illustrates the potential of these comparative approaches (Huang et al. 2005). Extensions of our model provide for the a priori localization of a cloned R-gene sequence in one species on the basis of the genomic location of its fractionated ortholog in a model species, and for a selection criterion for candidate sequences where resistances align in comparative maps. Comparative maps are critical for understanding and identifying rearrangement breakpoints that fragment these relationships.
The clustering of R genes and the complex recombination of paralogs within clusters pose a special challenge in studies of their evolution. This recombination results in varying levels of sequence exchange through a form of in vivo DNA shuffling that generates diversity as well as gain and loss of R genes (Michelmore and Meyers 1998; Song et al. 2002). We qualify this claim of orthology and acknowledge the “fractionation” of the evolutionary history of R genes (recombination, duplication, and deletion) that also does not allow for the creation of a true phylogeny by conventional methods. These limitations lead to our application of additional criteria that can be used to support a common history. It has been noted that for some regions of the R genes, sequence similarity is not sufficient to allow the pairing required for conventional recombination (Wicker et al. 2007). The frequently reported association of transposable-element-derived sequences within R-gene clusters provides the requisite conserved sequences for recombination to occur, and the presence of transposons elsewhere in the genome would provide a means for interchromosomal sequence exchange (Meyers et al. 2003).
The relationship of R-gene clusters and chromosome rearrangement warrants further investigation. The breakpoint of the translocation that differentiates pepper chromosome 9, potato chromosome XII, and tomato chromosome 12 is apparently in or near an R-gene cluster (Livingstone et al. 1999; Grube et al. 2000b). This phenomenon is repeated throughout comparative maps of the Solanaceae and is summarized in Figure 7. Every chromosomal rearrangement breakpoint is associated with one or more R genes or mapped resistances. Previously, it has been shown that genomic rearrangement and duplication are significant sources of R-gene dispersal and duplication, which is further complicated by ancestral polyploids (Baumgarten et al. 2003; Ameline-Torregrosa et al. 2008). Increased sequence information will provide resolution of the precise relationship of R-gene clusters, embedded transposons, and chromosome breakpoints that may be detected in comparisons between related genera. These processes also may explain the dramatic expansion observed in some R-gene clusters (Meyers et al. 1998a). The translocation breakpoint proposed within the Rx/Gpa2 cluster would result in these sequences being dispersed to the centromeric and telomeric regions of the lower arm of P9 (Figure 6). Two other modes of expansion were witnessed in dot plots of the genomic sequences of these R-gene regions that would further disperse R genes in the genome (Figure S1 and Figure S2). This hypothesis can be further tested in a comparative system when genome sequence allows these comparisons to be made at a higher resolution across the Solanaceae.
The clustering of these duplicated sequences also leads to unique regulatory and functional properties (Friedman and Baker 2007; Tam et al. 2008). Small RNAs have been described as coordinately regulating these R-gene clusters at a post-transcriptional level (Yi and Richards 2007). We speculated that NBLets adjacent to functional R genes may also have a role in this regulatory process. Our test of the effect on expression and function of Rx with and without an adjacent NBLet did not detect any differences, indicating that the Rx mRNA expression, stability, and translation are unaffected by the NBLet per se. It remains possible, however, that the NBLet may affect local chromatin structure in its endogenous context, which in turn could affect R-gene expression levels (Friedman and Baker 2007). NBLets are not exclusive to the non-TIR NB–LRR class of R genes; they have been noted in the TIR class as well (Graham et al. 2002). Xa21, a member of a third class of R genes, has been shown to have a highly conserved 5′ domain that is important for mediating recombination between genetically linked paralogs (Song et al. 1997).
A subset of R genes, including Bs2 and Rx, provide durable resistances, but as others are overcome, there is a need for new sources of resistance and to reconsider approaches to using these genes in crop improvement (McDowell and Woffenden 2003; Lecoq et al. 2004). According to our model, the easily obtained NB–ARC domain sequences generated by large-scale sequencing from a new source of resistance can be grouped by sequence similarity, and these groups will reflect corresponding R-gene clusters on comparative maps of reference plants. This strategy can be tested in application to the cloning of Lv and Mi-3 in tomato using homology to Bs2 and Rx/Gpa2 as references (Figure 6). The importance of the genetic background of a plant is linked to the complexity of R-gene clustering and mechanism and poses different challenges to plant breeders. The introgression of a new resistance gene will occur at regions of the genome that may already contain resistance genes (Michelmore 2003). So-called “jackpot” cultivars can be seen as a source of cassettes of resistances and contain clusters of many tightly linked resistances (Grube et al. 2000a). However, merging selected genes of these clusters is a much more daunting prospect.
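As a rough illustration of the grouping step in this strategy, the following Python sketch clusters pre-aligned NB–ARC fragments by pairwise identity using single-linkage grouping. It is not the procedure used in this study: the function names, the 0.8 identity threshold, and the toy sequences and their labels are all hypothetical, and a real analysis would start from a proper multiple alignment of sequenced fragments and reference R genes.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def group_by_similarity(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage grouping: two sequences share a group when they are
    connected by a chain of pairs with identity >= threshold."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Sort for a deterministic order of groups.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy data: invented 10-residue strings, not real NB-ARC sequences.
toy = {
    "ref_cluster1": "GGVGKTTLAQ",
    "new_frag_a": "GGVGKTTLAR",  # differs from ref_cluster1 at one position
    "ref_cluster2": "MPLSDEEWKL",
    "new_frag_b": "MPLSDEDWKL",  # differs from ref_cluster2 at one position
}

for group in group_by_similarity(toy, threshold=0.8):
    print(sorted(group))
```

Under the model, each resulting group that contains a mapped reference sequence would nominate the corresponding R-gene cluster on the comparative map as the candidate location for the new fragments in that group. Single linkage is used here only for brevity; any standard clustering of a distance matrix would serve the same purpose.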
A major barrier to understanding R-gene similarities and function is the lack of structural information. Successes in this area have so far capitalized on regions of shared similarity and homology modeling in the NB and ARC domains and, to some degree, the highly divergent LRR, but the N-terminal domain of the non-TIR class lacks this benefit (McHale et al. 2006; Takken et al. 2006; Chattopadhyaya and Pal 2008). While some motifs have been found in these variable regions, the major feature ascribed to these proteins, a coiled-coil domain, has never been demonstrated, only computationally predicted. Since oligomerization of NB–LRR-associated coiled-coil domains has not been reported, and cellular proteins that interact with this domain show no common structures, it would seem that the predicted coiled-coil structure either has a role in protein conformation or is simply an artifact of the prediction programs (Deyoung and Innes 2006).
The convoluted history of R-gene diversity is being explicated with increasing resolution. Comparative studies are one tool to investigate shared aspects of R genes, but often reveal striking differences that are fundamental to their evolution and mode of action; R proteins at once must be highly adaptive to changing pathogens, yet retain sufficient similarity to interface with host proteins and signal transduction networks. Elucidating the mechanisms of genome-level processes that have operated in different lineages is a key step both in reaching translational goals and in determining the factors that govern the evolution of this gene family.
We thank B. Staskawicz and R. Freedman for providing the Capsicum YAC clone sequence, C. Gephardt for providing us with the GP34 potato clone, J. Bakker for the potato Gpa2 clone, and R. Stahl for providing us with C. annuum ECW123 seed. We are grateful to Greg Rairdan for generating initial Rx constructs and to M. Sacco for coining the term NBLet. We also thank R. Grube, B. Baker, A. Bent, and J. Rouppe van der Voort for helpful conversations regarding the mapping of R genes in the Solanaceae and K. Perez for critical review of the manuscript. This work was supported in part by the National Science Foundation (NSF; DBI-0218166 and IOB-0343327 to M.J. and IOS-0744652 to P.M.). M.M. was supported by a Barbara McClintock Award (Robert Rabson), the Olin Fellowship, the College of Agriculture and Life Sciences (University of Wisconsin, Madison) Dean's fund, and a gift from Kalsec. S.M.C. was supported by an NSF Graduate Research Fellowship. B.-C.K. received support from U.S. Department of Agriculture Initiative for Future Agricultural and Food Systems Award no. 2001-52100-113347 and NSF Plant Genome Award no. 0218166.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.101022/DC1.
1 Present address: Center for Human Genome Variation, Duke University, Durham, NC 27708.
2 Present address: Département de Biologie, Université de Sherbrooke, 2500 Blvd. de l'Université, Sherbrooke QC J1K 2R1, Canada.
Communicating editor: V. Sundaresan
- Received January 27, 2009.
- Accepted April 29, 2009.
- Copyright © 2009 by the Genetics Society of America