Abstract
We have used the naturally occurring plant-parasite system of Arabidopsis thaliana and its common parasite Peronospora parasitica (downy mildew) to study the evolution of resistance specificity in the host population. DNA sequence of the resistance gene, RPP13, from 24 accessions, including 20 from the United Kingdom, revealed amino acid sequence diversity higher than that of any protein coding gene reported so far in A. thaliana. A significant excess of amino acid polymorphism segregating within this species is localized within the leucine-rich repeat (LRR) domain of RPP13. These results indicate that single alleles of the gene have not swept through the population, but instead, a diverse collection of alleles have been maintained. Transgenic complementation experiments demonstrate functional differences among alleles in their resistance to various pathogen isolates, suggesting that the extreme amino acid polymorphism in RPP13 is maintained through continual reciprocal selection between host and pathogen.
Although resistance (R) genes can be abundant and highly variable within a given plant species, little is known about the origin and maintenance of variation of R genes in natural plant populations (reviewed in Michelmore and Meyers 1998; Hulbert et al. 2001). Haldane (1949) hypothesized that coevolution between host and parasite could lead to the maintenance of variation within both organisms. Genetic models incorporating either overdominance (heterozygote advantage) or negative frequency-dependent selection (in which the fitness of an individual is negatively correlated with the frequency of its genotype within the population) corroborate Haldane's assertion that such dynamics could lead to either stable balanced polymorphisms or cycling of alleles in the host and parasite populations (reviewed in May and Anderson 1983). Alternatively, strong positive selection may lead to the fixation of a single allelic type at an R-gene locus within a host population. Depending on the strength of selection, the geographic distribution of the pathogen, and the host demography, the same allele may sweep to fixation in a number of host populations. When pathogens are variable at a geographic or temporal scale, polymorphism may be maintained at the species level, even in the face of strong directional selection within host populations. Analyses of sequence variation at R-gene loci in natural plant populations are necessary to distinguish between these different evolutionary scenarios.
Studies of allelic polymorphism at R-gene loci in Arabidopsis thaliana have been limited to a few previous examples. At both the RPM1 and the RPS5 loci, functional alleles and a null allele segregate between populations of A. thaliana (Grant et al. 1995, 1998; Tian et al. 2002). While the ratio of intraspecific nonsynonymous polymorphism to synonymous polymorphism within the coding regions of these two genes is much less than one (Bergelson et al. 2001), the linked intergenic regions of the resistant and susceptible (null-allele possessing) individuals show a large number of fixed differences. The authors interpreted this as evidence for the long-term maintenance of the resistant and susceptible haplotypes (Stahl et al. 1999; Tian et al. 2002).
The level of nucleotide polymorphism has also been determined at the RPS2 R-gene locus in A. thaliana (Caicedo et al. 1999; Mauricio et al. 2003). Similar to the RPM1 and RPS5 loci, two different haplotype classes were found at the RPS2 locus. Alleles of one haplotype were present in plants that were susceptible or only mildly resistant to Pseudomonas syringae pv. tomato (Pst) expressing avrRpt2 (Mauricio et al. 2003). Alleles of the other haplotype class were present in resistant and partially resistant individuals as well as a susceptible lab-induced mutant and one susceptible plant with a nonsense mutation in the RPS2 gene. Thus, the haplotype differentiation corresponded roughly with the observed phenotypic variation in the plants. These studies are consistent with the maintenance of variation between accessions or populations in self-compatible species, most likely through some form of negative frequency-dependent selection or geographic differentiation in pathogen selective pressures.
The RPP13 gene in A. thaliana controls resistance to the oomycete pathogen, Peronospora parasitica, and encodes a protein containing a coiled-coil domain, a nucleotide-binding site (NBS), and a leucine-rich repeat region (LRR; Bittner-Eddy et al. 1999, 2000). RPP13 is unique among characterized NBS:LRR R genes in that it retains full function in rar1, ndr1, eds1, pad4, npr1, and double eds1, ndr1 mutant plants (Bittner-Eddy and Beynon 2001). In contrast to the other R-gene loci that have been the focus of population genetic analyses, multiple recognition specificities to different pathogen genotypes are encoded by differentiated alleles at this single locus (Bittner-Eddy et al. 2000; P. Bittner-Eddy, unpublished data). This resistance gene is located on the bottom arm of chromosome 3 (At3g46530) and two RPP13 paralogs have been identified in the Arabidopsis Col-0 genome. These paralogs are located on the third chromosome ∼73 kb from the RPP13 gene. They share 65 and 60% amino acid identity to the RPP13 allele from the Columbia ecotype (GenBank accession nos. At3g46730 and At3g46710). The functions of these two distantly related paralogs are unknown.
In this study, we investigate 24 accessions of A. thaliana collected from 20 populations in the United Kingdom and 4 populations from elsewhere in northern Europe to determine whether the pattern of allelic variation at the R gene, RPP13, is consistent with a history of either balancing or directional selection. We observe extreme amino acid polymorphism in the LRR region of the protein. This level of variation is greater than that of 17 other loci in A. thaliana, suggesting a history of balancing selection. Furthermore, the A. thaliana individuals show different levels of resistance to three naturally occurring pathogen isolates, suggesting that multiple, functionally differentiated alleles have been maintained within A. thaliana through reciprocal plant-pathogen coevolution.
MATERIALS AND METHODS
Isolation and sequencing of alleles: Alleles of RPP13 were isolated from single individuals from 24 different populations of A. thaliana (Table 1; Figure 1). Hybridization data indicated that each individual of A. thaliana studied contained a single RPP13 gene (P. Bittner-Eddy, unpublished data). Four of the A. thaliana individuals were standard laboratory accessions from northern Europe (Nd-1, Ws-2, Col-5, and Rld-2), while the other individuals were collected from 20 natural populations across the United Kingdom. The methods for DNA isolation and PCR amplification from A. thaliana were as described in Bittner-Eddy et al. (2000). Primers specific to flanking noncoding sequence, coupled with internal primers, were used to generate three overlapping segments encompassing the entire RPP13 gene. The PCR products from several amplification reactions were pooled and sequenced directly.
Table 1. Accessions for the RPP13 study and reactions to three P. parasitica isolates (Maks9, Emco5, and Wela3)
Orthologous and paralogous sequences of RPP13 were also obtained from A. arenosa and A. lyrata, both described as sister species to A. thaliana (Price et al. 1994). The A. arenosa individual was from a population in Soubey, Switzerland, and the two A. lyrata individuals were from populations in Saugatuck, Michigan, and Mayodan, North Carolina, respectively. DNA was extracted from these individuals using a modified CTAB method (Doyle and Doyle 1987). In anticipation of sequence divergence between A. thaliana and these two species in potential primer binding sites flanking the RPP13 locus, primers were instead designed in the peripheral coding regions of the gene. Pfu proofreading polymerase (Stratagene, La Jolla, CA) was used to minimize PCR artifacts. The PCR products were cloned into the TOPO TA cloning vector (Invitrogen, Carlsbad, CA) and multiple clones were sequenced to confirm the sequences. Sequences were edited by eye and assembled in Sequencher (Gene Codes, Ann Arbor, MI).
Data analyses: The amino acid sequences were predicted from the nucleotide sequences using MacClade (Maddison and Maddison 2000). Clustal X (Thompson et al. 1997) was used to align the predicted protein sequences. Minor adjustments to optimize this alignment were made by eye. Maximum parsimony, neighbor-joining, and bootstrapping analyses were completed using PAUP*4.0b8 (Swofford 1999). DnaSP version 3.51 was used for intra- and interspecific genetic analyses and coalescent simulations (Rozas and Rozas 1999). The sliding window analysis was conducted as in Aguadé et al. (1992) on the basis of the estimation of silent and replacement substitutions proposed by Nei and Gojobori (1986). The sequence data for the interlocus comparisons were obtained from GenBank and the literature (Innan et al. 1996; Kawabe et al. 1997, 2000; Henikoff and Comai 1998; Purugganan and Suddith 1998, 1999; Kawabe and Miyashita 1999; Stahl et al. 1999; Kamiya et al. 2000; Kuittinen and Aguadé 2000; Aguadé 2001; Hauser et al. 2001; Kliebenstein et al. 2001; Mauricio et al. 2003). Sawyer's Geneconv method was used to determine whether some regions of a pair of sequences had more consecutive identical polymorphic sites than expected by chance (Sawyer 1999). This test assumes that mutations are neutral and independently distributed and that there has been no history of recombination between sequences. Permutation of the sequences was used to assign P values to the observed shared fragments and to evaluate their statistical significance.
Figure 1. Map of England showing origin of plants used in this study. The collection site of the two pathogen isolates, Maks9 and Emco5, is indicated by an asterisk.
Phenotypic analyses: The reactions of all 24 A. thaliana accessions to three different P. parasitica isolates were determined. These isolates, Maks9, Emco5, and Wela3, were collected from naturally infected A. thaliana plants from Maidstone (United Kingdom), East Malling (United Kingdom), and Weiningen (Switzerland), respectively (Koch and Slusarenko 1990; Bittner-Eddy et al. 1999). The methods of inoculation and phenotyping were as described previously (Holub et al. 1994).
RESULTS
Intraspecific variation at RPP13: The length of the complete alignment of the 24 alleles was 2652 nucleotides. The predicted proteins encoded by individual alleles were between 820 and 843 amino acids in length. All alleles had the same overall domain structure and there were no obviously truncated genes (Bittner-Eddy et al. 2000). The alleles showed numerous indel and nucleotide polymorphisms. In this 2.6-kb region, there were 32 indel polymorphisms and 365 nucleotide polymorphisms (see supplemental Figure 1 at http://www.genetics.org/supplemental/). One-half of all the indel polymorphisms occurred in the LRR region of the gene. Within the LRR, the indels were located most frequently in the putative β-α-connecting loop of the LRR, also described as the third subdomain in the repeat (Bittner-Eddy et al. 2000).
The RPP13 gene shows the greatest nucleotide polymorphism (π= 0.043; θ= 0.040) of any gene surveyed to date from A. thaliana (Table 2; Figure 2). The average value of θ from A. thaliana across a sample of 17 other genes taken from the literature is 0.0085 (ranging from 0.0026 to 0.0206). Assuming even the highest of these θ-values (θ= 0.0206 for ChiA; Kawabe et al. 1997), coalescent simulations with no recombination (the most conservative test) indicate that observing values of π or θ as high as those for the RPP13 locus is unlikely (P = 0.04 and P = 0.021, respectively). The converse is also true. Assuming a genome-wide value of θ equal to that of RPP13 (θ= 0.04), even the highest levels of polymorphism in this sample from published surveys are improbable (θ≤ 0.0206, P = 0.026; π≤ 0.0109, P = 0.003).
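To make the two summary statistics quoted above concrete, the following Python sketch computes per-site nucleotide diversity (π, the mean number of pairwise differences per site) and Watterson's θ (segregating sites scaled by the harmonic number of the sample size) from an ungapped alignment. It is only an illustration of the standard definitions, not the DnaSP procedure used in this study; the function names and the toy alignment are hypothetical, and alignment gaps are assumed to be absent.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def watterson_theta(seqs):
    """Per-site theta: S / (a1 * L), with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a1 * len(seqs[0]))


# Hypothetical toy alignment (four sequences, ten sites, no gaps).
toy_alignment = [
    "AAAAAAAAAA",
    "AAAAAAAAAT",
    "AAAAACAAAT",
    "AAAAACAAAA",
]
pi = nucleotide_diversity(toy_alignment)
theta = watterson_theta(toy_alignment)
```

Under neutrality and constant population size the two estimators are expected to agree, which is why both are reported side by side in the text.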
Not only is the synonymous and nonsynonymous polymorphism high across the entire gene (πsyn = 0.049; πnon = 0.041) compared to other genes in A. thaliana, but also the ratio of πnon/πsyn = 1.5 in the LRR. There is a significant excess of nonsynonymous polymorphisms per nonsynonymous site relative to synonymous polymorphisms per synonymous site (χ2 = 3.92, P = 0.048) in the LRR, suggesting balancing selection favoring amino acid variation in the LRR. In the non-LRR region, πnon/πsyn = 0.44 and there is a significant excess of synonymous polymorphisms per synonymous site (χ2 = 13.6, P = 0.0002). This suggests purifying selection against amino acid variants outside of the LRR.
Table 2. Patterns of nucleotide polymorphism and divergence at the RPP13 locus of A. thaliana
While the number of segregating sites is 324, the estimated minimum number of mutations is 403, indicating that multiple hits have occurred at some positions. In some cases, three or more different amino acid residues were observed at a single codon position. In the first two-thirds of the gene, the region encoding the coiled-coil domain and the nucleotide-binding site, 71/542 (13%) of the codons exhibited nonsynonymous polymorphisms. Three or more amino acids were encoded at 6/71 of these polymorphic codons (8.4%). In contrast, 124/282 (43%) of the codons in the LRR exhibited nonsynonymous polymorphisms. Not only was the level of polymorphism higher, but also 55% of the polymorphic codons encoded three or more amino acids and more than one-quarter encoded four or more amino acids. These codons with greater than two amino acids segregating are concentrated in the junctions between the β-strand, β-turn motif and the connecting β-α-loop of the individual LRRs (see supplemental Figure 1 at http://www.genetics.org/supplemental/).
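The gap between segregating sites and the minimum number of mutations arises because a site with k distinct states requires at least k - 1 mutations. A minimal Python sketch of that bookkeeping follows; the function name and the toy sequences are hypothetical, and the alignment is assumed to contain no gaps.

```python
def sites_and_min_mutations(seqs):
    """Return (segregating sites, minimum mutations) for an ungapped alignment.

    A column with k distinct states counts as one segregating site but
    needs at least k - 1 mutations to explain it.
    """
    n_sites = 0
    n_mutations = 0
    for column in zip(*seqs):
        n_states = len(set(column))
        if n_states > 1:
            n_sites += 1
            n_mutations += n_states - 1
    return n_sites, n_mutations


# Hypothetical example: the middle column carries three states.
toy = ["AAA", "ATA", "ACG"]
n_sites, n_mutations = sites_and_min_mutations(toy)
```

Whenever the second count exceeds the first, at least one position has been hit more than once, which is the inference drawn in the text.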
Figure 2. The ratio of πnon to πsyn across 18 genes of A. thaliana. The sequence data for the interlocus comparisons were obtained from GenBank and the literature (Hanfstingl et al. 1994; Kawabe et al. 1997, 2000; Henikoff and Comai 1998; Purugganan and Suddith 1998, 1999; Caicedo et al. 1999; Kawabe and Miyashita 1999; Stahl et al. 1999; Kamiya et al. 2000; Kuittinen and Aguadé 2000; Aguadé 2001; Hauser et al. 2001; Kliebenstein et al. 2001).
Frequency spectrum of variation: The Tajima's D (Tajima 1989) value was negative, but did not differ significantly from zero. Partitioning the gene into two regions (exclusively the LRR vs. the rest of the gene) to test if the pattern of evolution differed between these two functionally differentiated parts did not result in a significant deviation from zero (Table 2). The frequency of singletons (polymorphic sites where the rarest variant is present only once in the sample) in this sample is 27%. An excess of singleton mutations relative to the neutral expectation has been observed at multiple A. thaliana loci including Adh1, AP3, CAL, CHI, ChiA, and PI (Innan et al. 1996; Kawabe et al. 1997; Purugganan and Suddith 1998, 1999; Kuittinen and Aguadé 2000). To examine the difference between RPP13 and samples from other loci, the frequency spectra of polymorphism were compared across 14 loci (see supplemental Figure 2 at http://www.genetics.org/supplemental/). In this comparison, over half of the non-RPP13 genes show a high proportion of singletons (>50%). Furthermore, singletons make up a large proportion (60%) of the polymorphic amino acids at these other loci, while singletons make up a smaller proportion (26%) of the polymorphic amino acids at the RPP13 locus.
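For readers unfamiliar with the statistic, the sketch below computes Tajima's D from the sample size, the number of segregating sites, and the mean number of pairwise differences, following the formula in Tajima (1989). It is an illustrative Python sketch rather than the DnaSP routine used here; the function and argument names are hypothetical, and the inputs in the usage line are illustrative values, not RPP13 data.

```python
from math import sqrt


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D for n sequences, S segregating sites, and mean pairwise
    differences k (total per sequence, not per site)."""
    if n < 3 or num_segregating < 1:
        raise ValueError("need at least 3 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    # Numerator: difference between the two estimators of theta.
    numerator = mean_pairwise_diff - s / a1
    return numerator / sqrt(e1 * s + e2 * s * (s - 1))


# Illustrative inputs only: 10 sequences, 16 segregating sites.
d_example = tajimas_d(n=10, num_segregating=16, mean_pairwise_diff=3.888889)
```

D is negative when the pairwise-difference estimate falls below the estimate based on segregating sites, as happens when rare variants such as singletons are in excess.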
Figure 3. Neighbor-joining tree of alleles of RPP13, ortholog from A. arenosa, and paralogs from A. thaliana. A neighbor-joining tree was inferred on the basis of the nucleotide sequences of the RPP13 alleles from A. thaliana and A. arenosa using PAUP*4.0b8 (Swofford 1999). The tree was rooted using the sequence of two paralogs from the accession Columbia (GenBank accession nos. At3g46730 and At3g46710). The HKY85 substitution model was used. Bootstrap proportions >50 (from 100 bootstrap replicates) are indicated on the branches. Shaded area highlights the cluster of A. thaliana RPP13 alleles.
Haplotype structure: Parsimony and neighbor-joining trees were inferred on the basis of the nucleotide sequences of the RPP13 alleles from A. thaliana and A. arenosa (Figure 3; see also supplemental Figure 3 at http://www.genetics.org/supplemental/). High bootstrap values support the monophyly of the clade composed of the RPP13 alleles from A. thaliana, as well as the larger clade containing the RPP13 alleles from A. thaliana plus the ortholog from A. arenosa. Nineteen different RPP13 alleles were detected in the 24 accessions. One allele was found in four accessions (i.e., Hil-1, Duc-1, Leg-1, and Lha-1) and two alleles were found in two accessions (Ci-1 and Ti; Crl-1 and Bra-1). While multiple clades within A. thaliana were well supported in both analyses, the alleles in clade A and clade B are the most differentiated; alleles from these two clades show 66 fixed differences distributed across the entire RPP13 coding region. Clade A shows a low level of within-clade variation (π= 0.0019). Among the five alleles in this clade, 14 of 15 total polymorphisms are singletons and nearly all of these result in an amino acid difference. Variation in clade B is much greater (π= 0.04) and the overall proportion of singletons is much lower within this haplotype (25.7%). Within clade B, there is some evidence for further divisions among alleles. The three pairs of alleles (Nd-1 and Frd-1, Ws-2 and Coc-1, and God-1 and Edi-2) each emerge in both neighbor-joining and parsimony analyses with high bootstrap support. These three allele pairs are diverged relative to all others in their sequences, despite evidence for recombination at this locus (see below). Another cluster of seven alleles (Rld-2, Poo-1, Sna-1, Ci-1, Ty-0, Pet-1, and Asp-1) also share distinct substitutions with one another relative to all other alleles. However, a straightforward haplotype tree could not be constructed due to recombination.
Evidence for recombination: Although estimates of outcrossing rates in A. thaliana are very low (Abbott and Gomes 1989), there was abundant evidence for recombination at the RPP13 locus. Given such high levels of polymorphism at RPP13, there was even greater power to detect recombinants at this locus than at other loci in A. thaliana. The four-gamete test (Hudson and Kaplan 1985) indicated that a minimum of 38 recombination events are necessary to explain the pattern of variation at this locus (Figure 4). The program Geneconv (Sawyer 1999) was used to infer recombination or gene conversion events in the genealogy of the alleles. Over 50 putative gene conversion/recombination tracks were identified in this sample, reinforcing the observation of extensive recombination at this locus (Figure 4). In some places the two analyses identified the same endpoints of recombinational events, while in other regions, the two analyses were not in agreement (Figure 4).
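The logic of the four-gamete test can be sketched briefly: if all four combinations of alleles at two biallelic sites are present in the sample, then, assuming no recurrent mutation, at least one recombination event must have occurred between them. The Python sketch below flags such incompatible site pairs and then counts the largest set of non-overlapping incompatible intervals as a lower bound on recombination events, in the spirit of Hudson and Kaplan (1985). It is an illustration under stated assumptions, not the program used for Figure 4; function names are hypothetical, and sites with more than two states are simply skipped.

```python
from itertools import combinations


def has_four_gametes(seqs, i, j):
    """True if all four two-site haplotypes occur at biallelic sites i and j."""
    return len({(s[i], s[j]) for s in seqs}) == 4


def min_recombination_events(seqs):
    """Lower bound on recombination events from the four-gamete test."""
    length = len(seqs[0])
    # Keep only biallelic segregating sites (an assumption of this sketch).
    sites = [i for i in range(length) if len({s[i] for s in seqs}) == 2]
    intervals = [
        (i, j) for i, j in combinations(sites, 2)
        if has_four_gametes(seqs, i, j)
    ]
    # Greedy selection by right endpoint yields the maximum number of
    # disjoint intervals; intervals sharing only an endpoint are disjoint.
    intervals.sort(key=lambda interval: interval[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count


# Hypothetical haplotypes: sites 0-1, 1-2, and 2-3 are each incompatible.
toy_haplotypes = ["AAAA", "ATAT", "TATA", "TTTT"]
rmin = min_recombination_events(toy_haplotypes)
```

Because the bound counts only events that leave a detectable four-gamete signature, it rises with the density of polymorphic sites, consistent with the greater power noted for RPP13.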
Figure 4. Length and position of putative gene conversion/recombination tracks ordered by decreasing P value. The P values are Karlin-Altschul P values, based on the BLAST method for finding sequence matches in DNA or protein databases (Altschul et al. 1990) and are Bonferroni corrected. The names of the alleles involved in these gene conversion/recombination events are separated by a slash. Three sets of alleles are identical in sequence and are separated by a comma. The tick marks along the x-axis are the midpoints of the 38 minimum recombination intervals based on Hudson's four-gamete test. The structure of the RPP13 gene is indicated at the top of the figure.
Only one putative recombinant between haplotype clades A and B was detected. This allele, Ksk-2, shared 82 unique polymorphisms and only two differences with Sco-1 between sites 1 and 1286; whereas Ksk-2, Lha-1, Leg-1, Duc-1, and Hil-1 shared 300 polymorphisms and no differences between sites 1243 and 2652. The Ksk-2 allele may have originated fairly recently because only one event is needed to infer the origin of this allele and the potential donor sequences found among other alleles in this study. The recent origin of this allele is further supported by the high sequence identity shared between each “recombinant” portion of the Ksk-2 allele with the inferred donor sequences.
The program Geneconv was also used to evaluate whether recombination or gene conversion events involving alleles of RPP13 and RPP13 paralogs had occurred. No evidence of conversion or recombination was found between these alleles and the two paralog sequences available from the Columbia ecotype.
Interspecific comparisons: Joint analyses of within- and between-species divergence can increase the statistical power to detect deviations from neutral evolution (Akashi 1999). To identify orthologous sequences from close relatives of A. thaliana, sequences were PCR amplified with RPP13-specific A. thaliana primers. Phylogenetic analyses revealed that one of these amplified sequences, Aren1 from A. arenosa, was sister to the clade of RPP13 alleles from A. thaliana (Figure 3; see also supplemental Figure 3 at http://www.genetics.org/supplemental/). Many other sequences amplified from A. arenosa and A. lyrata cluster with either one of the two previously described paralogs of A. thaliana (e.g., Lyr3, Aren3, and Aren4). A number of studies use A. lyrata for interspecific sequence comparisons to A. thaliana (e.g., Wright et al. 2002; Barrier et al. 2003). However, none of our amplification products from A. lyrata appeared to be orthologous to RPP13. Two sequences, Lyr1 and Lyr2, possess the greatest nucleotide identity to RPP13 alleles among the collection of A. lyrata sequences. However, frameshift mutations within both of these sequences prevent their use in population genetic analyses (i.e., calculations of Ks and Ka).
Figure 5. Sliding window analyses: (a) average number of differences per site between RPP13 alleles within A. thaliana and (b) between A. thaliana and A. arenosa. Dashed lines are synonymous variation, while solid lines are nonsynonymous. The sliding window analysis was conducted as in Aguadé et al. (1992) on the basis of the estimation of silent and replacement substitutions proposed by Nei and Gojobori (1986). Values are midpoints of 36-bp windows. The positions of the coiled coil (CC), the nucleotide-binding and ARC domains (NB-ARC), and the leucine-rich repeats (LRR) are indicated below the plots.
Average divergence at synonymous sites between Aren1 and the RPP13 alleles from A. thaliana is 0.15. This value is close to the estimated divergence at synonymous sites across several loci between A. thaliana and A. lyrata: Wright et al. (2002) report an average Ks = 0.126 (range 0.027–0.23) for 24 loci; Barrier et al. (2003) report an average Ks = 0.119 (range 0.0–0.55) for 304 expressed sequence tags. However, nonsynonymous divergence (Ka) between A. thaliana and A. arenosa at RPP13 (Ka = 0.089) exceeds the estimates reported for the A. thaliana/A. lyrata comparisons: Ka = 0.0211 (range 0.0–0.092) from Wright et al. (2002) and Ka = 0.025 (range 0.0–0.16) from Barrier et al. (2003). Amino acid divergence reaches 16% in the LRR, at least six times greater than the average amino acid divergence in the A. thaliana/A. lyrata gene comparisons.
Distribution of variation across the gene: Sliding window analyses were used to characterize the pattern of polymorphism and divergence across the RPP13 gene (Figure 5). The pattern of nonsynonymous and synonymous polymorphism and divergence differs across the gene with nonsynonymous polymorphism and divergence peaking in the LRR. For the interspecific comparison of RPP13 in A. thaliana and A. arenosa, the level of synonymous divergence (Ks) exceeded nonsynonymous divergence (Ka) in the first two-thirds of the gene (Figure 5). The Ka/Ks ratio of 0.389 in this region is comparable to πnon/πsyn = 0.44 for the intraspecific comparison. The pattern of sequence divergence also mirrors polymorphism in the LRR region, showing greater amino acid divergence relative to silent divergence. Ka/Ks is ∼1 in the LRR and exceeds 1 at the junctions between β-strand, β-turn motif and the connecting β-α-loop of the individual LRRs. Both the interspecific and the intraspecific comparisons indicate that RPP13 has a rate of amino acid evolution higher than that of any gene studied in A. thaliana, especially in the LRR.
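The mechanics of a sliding window summary can be sketched as follows: a per-site statistic is averaged within a fixed-width window, the value is assigned to the window midpoint, and the window is advanced along the alignment. The Python sketch below uses the 36-bp width given in the Figure 5 legend; the step size of 1 and the function name are assumptions, and the input is a generic list of per-site values rather than Nei-Gojobori estimates.

```python
def sliding_window(per_site_values, width=36, step=1):
    """Return (midpoint, mean) pairs for windows moved along the sequence.

    Midpoints are 0-based positions; the step size is an assumption.
    """
    results = []
    last_start = len(per_site_values) - width
    for start in range(0, last_start + 1, step):
        window = per_site_values[start:start + width]
        midpoint = start + width / 2
        results.append((midpoint, sum(window) / width))
    return results


# Hypothetical profile: an invariant stretch followed by a variable one.
profile = [0.0] * 36 + [1.0] * 36
windows = sliding_window(profile)
```

Plotting nonsynonymous and synonymous values on the same axis in this way makes regional contrasts, such as the peak in the LRR, visible at a glance.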
The McDonald-Kreitman test was also used to determine if the level of nonsynonymous polymorphism observed at the RPP13 locus exceeded that expected under neutrality. Under neutrality, the levels of intraspecific polymorphism and interspecific divergence are expected to be correlated (McDonald and Kreitman 1991). At the RPP13 locus, we detected a significant departure from the null expectation (i.e., the ratios of polymorphism and divergence differed between synonymous and nonsynonymous mutations, P < 0.0005; Table 3). Since studies of other R genes revealed a pattern of diversifying selection acting on the LRR region and functional studies indicate that pathogen-recognition specificity is encoded by this region, we tested whether this region alone was responsible for the deviation from the neutral expectation. Indeed, an analysis of the first half of the gene, which does not contain the LRR, does not show a deviation from neutrality, while the analysis of the LRR region alone does (Table 3). These results indicate an excess of amino acid polymorphisms segregating in the LRR region relative to the neutral expectation.
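The structure of the McDonald-Kreitman test is a 2 x 2 contingency table: nonsynonymous and synonymous changes are each classified as polymorphic within species or fixed between species, and the table is tested for independence. The Python sketch below is a minimal illustration using a two-sided Fisher's exact test, which is an assumption because the text does not state which test statistic was applied. The counts shown are hypothetical and are not the values in Table 3; the neutrality index is included only as a convenient summary.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def mcdonald_kreitman(pn, ps, dn, ds):
    """Return (neutrality index, P value); ps and dn must be nonzero.

    pn, ps: nonsynonymous and synonymous polymorphisms within species.
    dn, ds: nonsynonymous and synonymous fixed differences between species.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(pn, ps, dn, ds)


# Hypothetical counts chosen to show an excess of amino acid polymorphism.
ni, p_value = mcdonald_kreitman(pn=40, ps=20, dn=10, ds=20)
```

A neutrality index above 1 together with a small P value corresponds to the pattern described for the LRR: more amino acid polymorphism than the divergence data would predict.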
Correlation with resistance phenotype: The resistance responses to three isolates of P. parasitica, Maks9, Emco5, and Wela3, were determined for the 24 accessions of A. thaliana (Table 1). The genetic basis of resistance to these isolates of P. parasitica was investigated in greater depth by transgenic complementation (Table 1; Bittner-Eddy et al. 2000). The Nd-RPP13 allele encodes specific recognition to both the Maks9 and the Emco5 isolates of P. parasitica, while the Col and Rld alleles do not encode specific resistance to these two isolates (Bittner-Eddy et al. 2000). Furthermore, the Rld-RPP13 gene encodes resistance to a third isolate, Wela3, while Col and Nd alleles do not encode resistance to this isolate (Bittner-Eddy et al. 2000). Recently a third allele, Frd, has been transformed into the susceptible Col ecotype; this allele encodes resistance to Emco5, but not to Maks9 (P. Bittner-Eddy, unpublished data). Although the RPP13 allele isolated from the Col individual does not confer resistance to any of the three P. parasitica isolates used in this study, the Col-RPP13 allele does not contain any frameshift mutations nor are there any other indications that this allele is nonfunctional. Downstream genes in the RPP13 pathway of Col are not compromised, as this genotype showed isolate-specific resistance when transformed with each of the Rld, Frd, or Nd alleles.
Table 3. Summary of McDonald-Kreitman tests
The Nd, Frd, Rld, and Col alleles differ from each other by a large number of amino acids, so identifying residues that may be involved in pathogen recognition was not possible through simple pairwise comparisons. However, all of the accessions in this study were phenotyped for resistance to Maks9 and Emco5. Since resistance due specifically to the RPP13 gene has been demonstrated only for the Nd and Frd alleles, resistance in other plants may not be encoded by the RPP13 locus. The Rld accession is resistant to Maks9 and Emco5 but this resistance maps to other position(s) in the genome (Bittner-Eddy et al. 2000). Furthermore, some accessions had the same RPP13 allele but differed in their reaction to P. parasitica (i.e., Bra and Crl). Therefore the only informative comparison to identify amino acid differences that are associated with resistance or susceptibility was between the alleles from the susceptible accessions and those that have been shown by transformation to confer resistance.
Over half of the accessions studied were susceptible to one of the two isolates. The subset of 10 accessions that were resistant to the Maks9 isolate was different from the subset of 11 accessions that were resistant to Emco5. Nine were susceptible to both isolates. The “susceptible” alleles to each pathogen isolate are found in all the major clusters of the gene tree, indicating that a large diversity in amino acid sequence was found among alleles from susceptible plants; i.e., susceptibility alleles are not more similar in sequence to one another than they are to alleles from resistant plants.
Alleles of RPP13 from the Nd and Frd accessions conferred resistance to Emco5. There are 13 amino acid positions in which Nd and Frd have the same amino acid, but all of the susceptible alleles have a different amino acid (see supplemental Table 1 at http://www.genetics.org/supplemental/). Twelve of 13 of these polymorphisms are located in the LRR. Since the amino acid residues associated with resistance to Emco5 reside predominantly in the LRR, and the LRR regions of the Nd and Frd alleles are substantially differentiated from other alleles in the sample, it is likely that at least some of the recognition determinants of Emco5 are localized to the LRR region. The Nd allele has been demonstrated to confer resistance to Maks9 (Bittner-Eddy et al. 2000). A comparison of the Nd sequence to the alleles from accessions susceptible to Maks9 revealed that five amino acid residues were unique to Nd (see supplemental Table 2 at http://www.genetics.org/supplemental/). These five amino acids are located between sites 167 and 313 of the amino acid alignment and were restricted to the NBS. One difference at site 167 is found within the conserved motif 1 as described in van der Biezen and Jones (1998). None of the other four differences are located in conserved regions or regions for which functions have been ascribed such as kinase domains of the NBS.
DISCUSSION
A. thaliana is naturally infected by P. parasitica and genetic material of both organisms used in this study was collected from natural populations in northern Europe, predominantly in the United Kingdom. The resistance gene, RPP13, shows both extreme sequence diversity and functional diversity in pathogen recognition. The pattern of sequence variation at RPP13 suggests a coevolutionary interaction between host and parasite that is still very active. The presence of extreme polymorphism at this locus is consistent with the prediction that genes involved in pathogen recognition and defense should show elevated levels of polymorphism (Haldane 1949).
Such extreme intraspecific amino acid polymorphism has not been described at other R-gene loci in A. thaliana (Bergelson et al. 2001). The RPS5 and RPM1 loci segregate only two haplotypes each: one haplotype with a functional version of the gene and one having a null (deleted) allele. Purifying selection resulting in the conservation of the amino acid sequences of RPS5 and RPM1 alleles may be responsible for the low level of amino acid polymorphism at these loci. A third well-described locus, RPS2, also does not show the levels of nonsynonymous and synonymous polymorphism found at RPP13 (Caicedo et al. 1999; Mauricio et al. 2003). Unlike the RPP13 locus, only one pathogen-recognition specificity has so far been described at the RPS2 locus. The disparity in amino acid polymorphism found between these loci may be related to the maintenance of multiple recognition specificities at the RPP13 locus due to selection by naturally occurring pathogens. P. parasitica commonly infects A. thaliana in the wild, and variation in host resistance and pathogen virulence has been shown (Holub et al. 1994). Although alleles of RPS2 from A. thaliana encode recognition to the Pst pathogen, little is known about the natural infection of A. thaliana by Pst, a tomato pathogen.
The pattern of sequence variation and segregation of multiple functionally distinct alleles at the RPP13 locus most closely resembles the observations of the allelic variation at the L locus in flax. Thirteen alleles of the L locus have been described and each confers a different rust-resistance specificity (Ellis et al. 1999). The levels of both silent and amino acid polymorphism are high at this locus: πnon is 0.017 in the non-LRR region and reaches 0.051 in the LRR region. As observed at the RPP13 locus, πnon exceeds πsyn in the LRR region, but not in the regions excluding the LRR. However, the sample of alleles from the L locus is not random; these alleles were specifically selected because they conferred different rust-resistance specificities. Our Arabidopsis sample was derived from naturally occurring populations from across Europe and was not selected on the basis of a priori phenotypic observations. In light of the random sampling undertaken in our study, the polymorphism at the RPP13 locus is perhaps even more extraordinary because individuals with divergent phenotypes were not explicitly selected for analysis.
The observation that sequence variation is highest in the LRR portion of the gene is consistent with other studies of R genes encoding LRR domains (e.g., Mondragon-Palomino et al. 2002) and the McDonald-Kreitman test indicates that this region has more amino acid polymorphism than expected under neutrality. The LRR regions of resistance proteins may be involved in determining specificity of gene-for-gene interactions found in plants (Staskawicz et al. 1995). This has been supported by domain swaps and mutational analyses of R genes, although physical interaction between a pathogen avr protein and the LRR region of a resistance protein has been demonstrated only between the AvrPita molecule expressed by the fungus M. grisea and the Pi-ta resistance protein from rice (Jia et al. 2000). In addition, studies on the L locus demonstrated that other domains may also be involved in determining specificity (Ellis et al. 1999; Luck et al. 2000). However, in all R-gene studies, protein sequence variation in the LRRs of R genes is correlated with different pathogen-recognition specificities (Wang et al. 1998; Bryan et al. 2000; Hwang et al. 2000; Banerjee et al. 2001; Dodds et al. 2001; Van der Hoorn et al. 2001; Wulff et al. 2001). At the RPP13 locus, the excess amino acid polymorphism relative to silent polymorphism is consistent with the hypothesis that this region is experiencing diversifying selection. Furthermore, 12 of 13 amino acid differences between two alleles that confer recognition to the P. parasitica isolate Emco5 and the alleles from susceptible individuals occurred in the LRR portion of the protein. This observation is consistent with the LRR playing a central role in pathogen recognition.
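The logic of the McDonald-Kreitman test mentioned above can be sketched as a 2 × 2 contingency test that compares replacement and silent changes that are fixed between species with those that are polymorphic within a species. The sketch below uses a two-sided Fisher's exact test written with the standard library only. The function names and all counts are hypothetical and chosen purely for illustration; they are not the counts obtained for RPP13.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values above 1 indicate excess replacement polymorphism."""
    return (pn / ps) / (dn / ds)


# Hypothetical counts (assumptions for illustration, not from this study).
fixed_replacement, fixed_silent = 5, 20   # Dn, Ds: differences fixed between species
poly_replacement, poly_silent = 30, 15    # Pn, Ps: polymorphisms within the species

mk_p_value = fisher_exact_two_sided(
    fixed_replacement, fixed_silent, poly_replacement, poly_silent
)
mk_ni = neutrality_index(fixed_replacement, fixed_silent, poly_replacement, poly_silent)
```

Under neutrality the ratio of replacement to silent changes is expected to be similar for fixed differences and for polymorphisms, so a small P value combined with a neutrality index above 1 corresponds to the kind of excess amino acid polymorphism described in the text.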
Could the amount of amino acid polymorphism observed at this locus instead be due to relaxed selection pressure? Two factors could result in relaxed selection:
- The pathogen is not a consistent selective agent; i.e., allelic variation accumulates during episodes when this host is not exposed to the pathogen.
- Host demographic factors, such as the predominantly selfing nature of the species and population dynamics dominated by rounds of colonization and extinction, result in a reduction in effective population size, which has been shown to affect the efficacy of selection.
At other loci, the prevalence of amino acid replacements occurring as singletons has been interpreted as evidence that selection against slightly deleterious mutations has not been as effective in A. thaliana as in other organisms (Sawyer et al. 1987; Purugganan and Suddith 1999). However, several lines of evidence argue against the accumulation of amino acid polymorphisms due to strictly neutral processes. If the gene were evolving neutrally (perhaps because the pathogen was not a consistent selective agent) we might expect that some individuals would lose the gene through deletion, or the gene might accumulate frameshift or nonsense mutations. To the contrary, none of the 24 randomly sampled individuals possessed null alleles. All genes were full length (none encode obviously truncated proteins) and none have frameshift mutations.
Amino acid variation is associated with functional differentiation; that is, alleles that differ in amino acid sequence encode recognition to different pathogen isolates, indicating that at least some of the amino acid differences in these alleles contribute to functional differentiation. In the case of resistance to the pathogen Emco5, 12 of the 13 amino acid residues shared among resistant alleles were found in the LRR, a region shown to affect pathogen recognition in other R genes. It would be unlikely to observe such an association between protein function and protein sequence if the gene were evolving neutrally.
Multiple analyses suggest the nonneutral evolution of the LRR; the amino acid polymorphism in this region exceeds that of the neutral expectation. While not all of the segregating variation necessarily has functional consequences, it is likely that at least some of these predominantly nonconservative amino acid changes concentrated in the putatively exposed residues affect pathogen recognition. Experiments involving domain swaps between alleles and site-directed mutational analyses will help to resolve precisely which of the many amino acid differences are functionally important.
All of these lines of evidence point to the selective maintenance of sequence variation at this locus, driven by a variable pathogen species. Furthermore, given the demography and mating system of A. thaliana, the allelic polymorphism has most likely been maintained through negative frequency-dependent selection and not overdominance. The long-term maintenance of many differentiated alleles is clearly inconsistent with recurrent selective sweeps operating at this locus over large geographic scales. The presence of several recombinant RPP13 alleles indicates that heterozygotic individuals must have been present multiple times in the past. This indicates some, albeit potentially infrequent, outcrossing and segregation of differentiated alleles that affect disease resistance within A. thaliana populations. The characterization of allelic variation at the RPP13 locus and observation of recombinant alleles provide the necessary materials for future investigations of the role of recombination in generating novel recognition specificities in a natural host-parasite interaction.
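As a rough illustration of how negative frequency-dependent selection can maintain several alleles, the following toy model follows allele frequencies in a haploid (fully selfing) host in which the fitness of each allele declines linearly with its own frequency. The model, the fitness function w = 1 − s·p, and all parameter values are assumptions made for illustration; they are not estimated from the RPP13 data and ignore drift, recombination, and any lag in the pathogen's response (a lag can produce cycling rather than a stable equilibrium).

```python
def next_generation(freqs, s):
    """One generation of selection where each allele's fitness is 1 - s * its frequency."""
    fitness = [1.0 - s * p for p in freqs]
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    # Standard haploid selection recursion: p' = p * w / mean fitness.
    return [p * w / mean_fitness for p, w in zip(freqs, fitness)]


def simulate(freqs, s, generations):
    """Iterate the recursion and return the final allele frequencies."""
    for _ in range(generations):
        freqs = next_generation(freqs, s)
    return freqs


# Hypothetical starting point: one common allele and two rare alleles.
start_freqs = [0.90, 0.05, 0.05]
final_freqs = simulate(start_freqs, s=0.5, generations=200)
```

In this sketch the common allele loses its advantage as it spreads, so the frequencies move toward equality rather than toward fixation of a single allele, in contrast to the outcome expected under a selective sweep.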
Acknowledgments
We thank H. Akashi, J. Parsch, and two anonymous reviewers for their helpful comments. We are grateful to A. Kawabe, K. Olsen, and E. Stahl for sharing their alignments of some of the genes used for the interlocus comparisons. This work was supported by grants from the U.S. National Science Foundation (to L.E.R., R.W.M., and C.H.L.) and the Biotechnology and Biological Sciences Research Council (to P.B.-E., E.B.H., and J.L.B.).
Footnotes
- Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY487208–AY487228 and AY487230–AY487236.
- Communicating editor: O. Savolainen
- Received May 22, 2003.
- Accepted December 17, 2003.
- Copyright © 2004 by the Genetics Society of America