Abstract
Pathogen resistance is an ecologically important phenotype increasingly well understood at the molecular genetic level. In this article, we examine levels of avrRpt2-dependent resistance and Rps2 locus DNA sequence variability in a worldwide sample of 27 accessions of Arabidopsis thaliana. The rooted parsimony tree of Rps2 sequences drawn from a diverse set of ecotypes includes a deep bifurcation separating major resistance and susceptibility clades of alleles. We find evidence for selection maintaining these alleles and identify the N-terminal part of the leucine-rich repeat region as a probable target of selection. Additional protein variants are found within the two major clades and correlate well with measurable differences among ecotypes in resistance to the avirulence gene avrRpt2 of the pathogen Pseudomonas syringae. Long-lived polymorphisms have been observed for other resistance genes of A. thaliana; the Rps2 data suggest that the long-term maintenance of phenotypic variation in resistance genes may be a general phenomenon and are consistent with diversifying selection acting in concert with selection to maintain variation.
Plants are attacked by a multitude of pathogens and can respond to a subset of these attacks by mounting an induced defense response (Burdon 1987). The initial step in the induction of a defense response involves a genetic interaction between a specific allele of a disease resistance (R) gene and a complementary pathogen avirulence (avr) gene, the so-called gene-for-gene interaction (Flor 1956, 1971; Staskawicz et al. 1995). In Arabidopsis thaliana, the Rps2 resistance gene confers resistance to strains of the pathogen Pseudomonas syringae that carry the avirulence gene avrRpt2 (Dong et al. 1991; Whalen et al. 1991; Kunkel et al. 1993; Yu et al. 1993; Bent et al. 1994; Mindrinos et al. 1994). Recently, P. syringae strains have been found to infect A. thaliana in natural populations (Jakob et al. 2002).
The RPS2 protein contains a nucleotide-binding site (NBS) and a leucine-rich repeat (LRR) region, two characteristics of a large family of plant R genes (e.g., Salmeron et al. 1996; Thomas et al. 1997; McDowell et al. 1998; Meyers et al. 1998; Ellis et al. 1999; Noel et al. 1999; Bittner-Eddy et al. 2000; Luck et al. 2000). The LRR region is thought to function in pathogen recognition and thereby determine resistance specificity (Jones and Jones 1997; Leister and Katagiri 2000; Tao et al. 2000; Axtell et al. 2001). Within the LRR, solvent-exposed amino acid residues framed by conserved aliphatic residues are predicted to make direct contacts with the avirulence gene product or avr gene-dependent factor(s) (Jones and Jones 1997). Evolutionary analyses point to the framed, solvent-exposed residues as exhibiting very fast substitution rates due to positive Darwinian selection (Parniske et al. 1997; Meyers et al. 1998; Bittner-Eddy et al. 2000; Bergelson et al. 2001; Mondragón-Palomino et al. 2002), consistent with their direct role in pathogen (i.e., avirulence gene) recognition. Other regions may also determine recognition (Ellis et al. 1999; Luck et al. 2000), however, and R gene-mediated resistance levels can also depend on other host factors (Banerjee et al. 2001).
Disease resistance genes are often polymorphic for resistance and susceptibility alleles (Kunkel 1996; Stahl et al. 1999; Ellis et al. 2000; Bergelson et al. 2001; Holub 2001). Caicedo et al. (1999) examined patterns of polymorphism among eight independent alleles of Rps2 and found evidence of two divergent classes. Statistical tests of the data failed to detect evidence for natural selection, but several features of the data nonetheless led the authors to suggest that selection might be important at Rps2. First, the locus contained a high level of nucleotide polymorphism, with almost half of the polymorphisms resulting in amino acid changes. Second, the unrooted gene tree structure included one long branch separating a susceptibility allele (present in accessions Wu-0 and Zu-0-1) from a cluster of more closely related resistance and susceptibility alleles, a structure consistent with balancing selection maintaining Rps2 polymorphism. Finally, the tree indicated a preponderance of amino acid changes between more closely related alleles, suggesting that diversifying selection may have generated Rps2 sequence variation.
Here we extend the results of Caicedo et al. (1999) by carrying out statistical tests of selective neutrality and balancing selection at the Rps2 locus with a larger sample of A. thaliana accessions and a sequence from the closely related congener, A. lyrata. We relate quantitative resistance phenotypes to the evolutionary history of the alleles and identify RPS2 mutations that may confer phenotypic variation. We also test for associations of Rps2 sequence variation and the geographic origin of alleles. The data are discussed in reference to the evolutionary processes thought to underlie plant disease resistance polymorphism.
MATERIALS AND METHODS
Plant materials: Twenty-seven accessions of A. thaliana were chosen to create a worldwide sample for Rps2 sequencing representing the major geographic regions in the species' distribution (Table 1). Twelve of these accessions were taken from the collections of J. Bergelson and R. Mauricio. Fifteen were selected at random from the accessions held at the Arabidopsis Biological Resource Center (ABRC), with the constraint that no single country be overrepresented. These accessions were obtained from the ABRC, and seeds from single individuals were harvested to create single-seed stocks for producing the plant materials used in the study. Two individuals of A. lyrata from Indiana (collected by R. Mauricio and D. Jacobson) were used to determine a consensus sequence of the locus for this species.
Phenotype assessment: Resistance phenotypes to the P. syringae avirulence gene avrRpt2 were determined in all but six of the sequenced accessions, as well as in the Columbia accession and the rps2-101C mutant (in a Columbia background). Plants were grown from seed in Promix soil with a 12-hr day length at 20°C. When the plants were 3–4 weeks old, one entire new leaf was infiltrated with P. syringae pv. tomato strain DC3000 at an optical density (OD) of 0.0002 using a blunt 1-cc syringe. The pathogen strain used in these infections contained one of two plasmids, pLAFR3 or pLABL18. The pLABL18 plasmid is identical to pLAFR3 but contains an additional 3.6-kb fragment carrying the avrRpt2 gene (Whalen et al. 1991). Three days postinfection, bacterial levels were measured by grinding standard hole-punch-size leaf punches in 10 mM MgSO4 and plating dilutions on King's medium B with 40 mg/ml tetracycline. Five to eight replicate plants of each accession were infected with each bacterial strain per experiment. Phenotyping was replicated in at least two experiments for each accession tested, except for Po-1 and Mt-0, which were tested in only one experiment.
Plants were identified as resistant or susceptible by comparing the growth (colony-forming units per leaf punch, log-transformed) of the two pathogen strains in an analysis of variance (ANOVA) that included experimental day for accessions evaluated on multiple days. Accessions in which the pathogen strain with pLABL18 grew significantly less than the strain with pLAFR3 (Table 1, column 3) were designated “resistant.” Other accessions were designated “susceptible.”
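The resistant/susceptible call can be sketched in Python. This is a minimal illustration, not the authors' analysis: it swaps the ANOVA for a stratified permutation test that shuffles strain labels only within each experimental day, and the colony counts in the usage example are hypothetical. The test statistic, mean log10 growth of the pLAFR3 strain minus that of the pLABL18 strain, is analogous to the log-ratio resistance measure reported in Table 1.

```python
import math
import random
from statistics import mean


def log_cfu(counts):
    """Log10-transform colony-forming units (CFU) per leaf punch."""
    return [math.log10(c) for c in counts]


def growth_difference(data):
    """Mean log10 CFU of the pLAFR3 strain minus that of the pLABL18 strain.

    `data` maps each experimental day to {"pLAFR3": [...], "pLABL18": [...]}
    holding log10 CFU values. The difference is taken within each day and then
    averaged over days, so day-to-day shifts in overall growth cancel out.
    """
    return mean(mean(d["pLAFR3"]) - mean(d["pLABL18"]) for d in data.values())


def stratified_permutation_p(data, n_perm=5000, seed=1):
    """One-sided P value for 'the avrRpt2 strain (pLABL18) grows less'.

    Strain labels are shuffled within each day only; this stands in for an
    ANOVA with a day term and is not the original analysis.
    """
    rng = random.Random(seed)
    observed = growth_difference(data)
    hits = 0
    for _ in range(n_perm):
        shuffled = {}
        for day, d in data.items():
            pooled = d["pLAFR3"] + d["pLABL18"]  # new list; inputs untouched
            rng.shuffle(pooled)
            k = len(d["pLAFR3"])
            shuffled[day] = {"pLAFR3": pooled[:k], "pLABL18": pooled[k:]}
        if growth_difference(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical CFU per leaf punch for one accession on two days.
example = {
    "day1": {"pLAFR3": log_cfu([2e6, 3e6, 1.5e6, 2.5e6, 4e6]),
             "pLABL18": log_cfu([2e4, 5e4, 3e4, 1e4, 4e4])},
    "day2": {"pLAFR3": log_cfu([1e6, 2e6, 3e6, 1.2e6, 2.2e6]),
             "pLABL18": log_cfu([6e4, 2e4, 3e4, 5e4, 1.5e4])},
}
p_value = stratified_permutation_p(example)
call = "resistant" if p_value < 0.05 else "susceptible"
```

Because labels are shuffled only within a day, an accession that simply supported more bacterial growth on one day than on another is not mistaken for a strain effect, which is the role the day term plays in the ANOVA.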
For those accessions designated resistant, resistance was quantified by comparing pathogen growth in each focal accession with growth in the Columbia accession. Resistance relative to Columbia (Table 1, column 4) was calculated by dividing the difference in growth between the strains carrying pLAFR3 and pLABL18 in the focal accession by the same difference in Columbia, as assessed on the same experimental days. Gaps in the distribution of relative resistance values were used to categorize accessions by degree of resistance. Accessions in the group with lower resistance than Columbia were labeled “mildly resistant” (mR), and those in the group with higher resistance than Columbia were labeled “strongly resistant” (sR). To determine the significance level for the degree of resistance relative to Columbia, we evaluated the significance of the interaction between accession (Columbia vs. the focal accession) and pathogen strain (containing pLAFR3 or pLABL18) using ANOVA.
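The relative-resistance ratio can be written out as a short Python sketch. It assumes each accession's data are lists of log10 CFU per leaf punch for each strain, measured on the same experimental days as Columbia; the numbers in the usage lines are hypothetical.

```python
from statistics import mean


def resistance(log_cfu_plafr3, log_cfu_plabl18):
    """Growth of the strain without avrRpt2 minus growth of the strain with it
    (log10 CFU per leaf punch); larger values mean stronger avrRpt2-dependent
    restriction of the pathogen."""
    return mean(log_cfu_plafr3) - mean(log_cfu_plabl18)


def relative_resistance(focal, columbia):
    """Resistance of a focal accession divided by that of Columbia.

    `focal` and `columbia` are (pLAFR3 values, pLABL18 values) pairs measured
    on the same experimental days. Values < 1 mean weaker resistance than
    Columbia; values > 1 mean stronger resistance.
    """
    return resistance(*focal) / resistance(*columbia)


# Hypothetical log10 CFU values.
focal = ([6.4, 6.6], [4.9, 5.1])      # difference of 1.5 log units
columbia = ([6.4, 6.6], [3.9, 4.1])   # difference of 2.5 log units
rr = relative_resistance(focal, columbia)
```

By construction Columbia scores exactly 1, so the mR and sR labels correspond to groups lying below and above that reference value.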
DNA sequence determination: For each accession, DNA was extracted from young rosette leaves using protocols described previously (Bergelson et al. 1998). The region encompassing Rps2 was amplified in three overlapping amplicons, using primers GTTAGTTGGGTGGCGGGAGAG and GGCACAACCGAAACAACTGAGG, AACGGAGACTAAAACAGCCC and GACATGCATCTTCACC, and GTGGATCCATGCTAGTCACATTGAAGTTC and GACCTTTTTATTCCTTTTTCCG, in standard PCR protocols. Both strands were sequenced throughout the region using internal primers (sequences available from the authors) and ABI (Applied Biosystems, Foster City, CA) dye terminator sequencing chemistry. Sequences for each accession were compiled and aligned using Sequencher 3.0 (Genecodes, Ann Arbor, MI). A single consensus sequence for A. lyrata was generated from partial sequences of the two A. lyrata individuals. A small number of sites in our A. lyrata sequences were polymorphic; in each case one of the two alleles carried the base found in A. thaliana, and we assigned the A. thaliana base to the consensus A. lyrata sequence for analyses. Multiple large insertions and deletions between the A. thaliana and A. lyrata sequences in the 5′ and 3′ noncoding regions substantially decreased the number of sites at which between-species comparisons could be made. As a consequence, some polymorphism analyses were conducted without considering the outgroup sequence.
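The rule used to build the A. lyrata consensus can be stated as a small Python function. This is a sketch under two assumptions not spelled out in the text: the three sequences are already aligned to equal length, and missing data in the partial sequences are coded as "N" (a hypothetical convention), in which case the other individual's base is used.

```python
MISSING = "N"  # assumed code for missing data in the partial sequences


def lyrata_consensus(ind1, ind2, thaliana):
    """Site-by-site consensus of two aligned A. lyrata sequences.

    Where the two individuals differ, the base shared with the aligned
    A. thaliana sequence is used; where one individual has missing data,
    the other individual's base is used.
    """
    if not len(ind1) == len(ind2) == len(thaliana):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for a, b, t in zip(ind1, ind2, thaliana):
        if a == b or b == MISSING:
            consensus.append(a)
        elif a == MISSING:
            consensus.append(b)
        elif t in (a, b):
            consensus.append(t)
        else:
            # The text reports that this case never arose.
            raise ValueError(f"neither lyrata base {a}/{b} matches A. thaliana {t}")
    return "".join(consensus)


example = lyrata_consensus("ACGTN", "ATGTA", "ATGAA")
```

Note that this rule can only remove apparent divergence at polymorphic A. lyrata sites, so it errs toward conservative estimates of between-species differences.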
We found differences between our sequences from accessions Wu-0 and Zu-0 and those reported by Caicedo et al. (1999) for the same accessions. In particular, Caicedo et al. (1999) did not detect mutations at positions 1279, 2554, and 3085 (Figure 1), and they found that variants at positions 3396 and 3502 were shared by Wu-0 and Zu-0 (Figure 1). Variation within accessions has been noted (Breyne et al. 1999; Caicedo et al. 1999) and may reflect ecotype seed collection from multiple (or heterozygous) individuals in the field.
Population genetic analyses: Silent (noncoding and synonymous) and amino acid replacement (nonsynonymous) polymorphism and divergence (Jukes-Cantor corrected) were calculated using DnaSP (Rozas and Rozas 1997). Genealogies were estimated by parsimony using PAUP (Swofford 1996), with 500 bootstrap replicates. Standard tests of the neutral mutation model in a panmictic population used coalescent simulations with a fixed number of segregating sites (Hudson 1993), with programs available from R. Hudson (http://home.uchicago.edu/~rhudson1). Analyses testing for heterogeneity of polymorphism to divergence ratios were conducted using DNASlider (McDonald 1998); sliding window average G-statistics were analyzed for scaled population recombination rates from RSlider = 0 to 100 (RSlider = 4NerL, where Ne is the effective population size, r is the recombination rate per base pair per generation, and L is the length of the analyzed region), with the most conservative P values obtained for RSlider = 7 for the entire sequenced region and RSlider = 6 for the coding sequence. Linkage disequilibrium was tested for in 2 × 2 contingency tables by Fisher's exact tests, using shareware available from W. Engels. Differentiation among “populations” (groups of sequences defined by phenotype or geographic origin) was calculated as FST = 1 − πW/πT, where πW is the average pairwise nucleotide difference within populations and πT is that in the total sample (Hudson et al. 1992b), and was tested by resampling with sequences permuted across groups (following Hudson et al. 1992a; Holsinger and Mason-Gamer 1996; Bergelson et al. 1998), using programs written by E. A. Stahl.
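The FST estimator and its permutation test can be sketched as follows. This is an illustrative Python version, not E. A. Stahl's programs. It assumes πW is the mean over all pairs of sequences drawn from the same group (one of several ways to average within-group diversity), treats every aligned column as a comparable site, and uses toy sequences.

```python
import random
from itertools import combinations


def n_diffs(s1, s2):
    """Number of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))


def fst(groups):
    """FST = 1 - piW / piT from lists of aligned sequences, one list per group."""
    within = [n_diffs(a, b) for g in groups for a, b in combinations(g, 2)]
    pooled = [s for g in groups for s in g]
    total = [n_diffs(a, b) for a, b in combinations(pooled, 2)]
    if not within or sum(total) == 0:
        raise ValueError("need within-group pairs and some variation")
    pi_w = sum(within) / len(within)
    pi_t = sum(total) / len(total)
    return 1 - pi_w / pi_t


def fst_permutation_test(groups, n_perm=2000, seed=1):
    """Return (observed FST, P), where P is the fraction of permutations with
    FST at least as large as observed. Sequences are reshuffled across groups
    while group sizes are held fixed."""
    rng = random.Random(seed)
    observed = fst(groups)
    sizes = [len(g) for g in groups]
    pooled = [s for g in groups for s in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two hypothetical groups of three aligned sequences.
groups = [["AAAA", "AAAA", "AAAT"], ["TTTT", "TTTA", "TTTT"]]
f, p = fst_permutation_test(groups)
```

With only six sequences there are just ten distinct ways to split them into two groups of three, so even this strongly differentiated toy example cannot reach P < 0.05; the permutation P value is only as fine-grained as the number of distinct relabelings allows.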
We analyzed a coalescent model with selection and recombination as described in Tian et al. (2002). Selection was assumed to maintain two alleles at fixed frequencies of 0.81 and 0.19, acting at the beginning of the LRR (site 2654 in our alignment). The scaled per-base-pair recombination rate R = 4Ne(1 − s)r = 0.00057 uses published estimates of effective population size (Ne) and selfing rate (s) for A. thaliana (see Tian et al. 2002) and a recombination rate per meiosis (r) estimated from regression of genetic on physical positions of markers near Rps2 (2.71 cM/Mb, r² = 0.96; data from the Arabidopsis Genome Resource, http://ukcrop.net/agr, markers mi475, SEP2B, m600, PG11, DD1, mi123, RLK5, mi232, prha, g8300, and mi431). The scaled mutation rate between selected alleles (0.0125) was adjusted to fit the observed data near the selected site.
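The arithmetic connecting the map density to R can be checked in a few lines of Python. This is a sketch: it converts 2.71 cM/Mb to a per-base-pair recombination fraction (treating 1 cM as 1% recombination per meiosis, which holds over short distances) and back-calculates the product Ne(1 − s) that the reported R implies. The separate Ne and s estimates of Tian et al. (2002) are not reproduced here.

```python
# Values taken from the text.
CM_PER_MB = 2.71    # genetic map density near Rps2 (cM per Mb)
R_SCALED = 0.00057  # R = 4 Ne (1 - s) r, per base pair

# Assumption: 1 cM corresponds to a 0.01 probability of recombination per meiosis.
r_per_bp = CM_PER_MB * 0.01 / 1_000_000  # recombination per bp per meiosis

# Back-calculate the product Ne * (1 - s) implied by R; a consistency check,
# not a value reported in the text.
ne_times_1_minus_s = R_SCALED / (4 * r_per_bp)

print(f"r per bp per meiosis: {r_per_bp:.3g}")
print(f"implied Ne(1 - s): {ne_times_1_minus_s:.0f}")
```

With these inputs r is roughly 2.7 × 10⁻⁸ per base pair, so R = 0.00057 corresponds to Ne(1 − s) on the order of 5 × 10³.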
RESULTS
avrRpt2-dependent resistance phenotypes: For each of 21 accessions, we compared the growth of P. syringae strain DC3000 with avrRpt2 and DC3000 without avrRpt2. If an accession is resistant, the growth of the strain with avrRpt2 should be significantly less than the growth of the strain without avrRpt2. The log of growth of the pathogen without avrRpt2 minus that of the pathogen with avrRpt2 is listed in Table 1; this measure of resistance is unitless since it is equivalent to the log of the ratio of growth for the two pathogen strains. The results of our ANOVAs indicate that 17 of the 21 accessions tested were resistant. Accessions BG-4, Po-1, Zu-0, and Knox-2 and the Columbia rps2 mutant showed no indication of resistance. Statistical designations of resistance and susceptibility were consistent with observed disease symptoms.
We determined whether resistant accessions inhibited bacterial growth of DC3000 with avrRpt2 to different extents by comparing bacterial growth in each line relative to this same measure in a common paired control line, Columbia. Relative resistance values (Table 1) ranged from 0.285 to 1.61. Gaps in the distribution of relative resistance values, between 0.67 and 0.945 and between 1.14 and 1.39, allowed us to group alleles into three operational subclasses of resistance, mild (mR), intermediate (R), and strong (sR). We used relative resistance values rather than ANOVA P values (Table 1) to categorize accessions because the power to detect differences from Columbia varied among accessions. The mR group included accessions AB-7, GR-6, Wu-0, Yo-0, and Cvi-0, and the sR group included Pog-0, RLD-1, Co-1, and Tsu-0.
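Grouping by gaps can be sketched in Python as splitting the sorted relative-resistance values at their widest gaps. This is an illustration under stated assumptions: the number of groups (three) is fixed in advance, the middle group is taken to be the one containing Columbia (relative resistance 1 by definition), and the values below are hypothetical, not those of Table 1.

```python
def split_at_largest_gaps(values, n_groups):
    """Sort values and split them into n_groups at the (n_groups - 1) widest
    gaps between consecutive values."""
    xs = sorted(values)
    if not 1 <= n_groups <= len(xs):
        raise ValueError("n_groups must be between 1 and the number of values")
    # Indices i of the gaps (xs[i], xs[i + 1]), widest first.
    widest = sorted(range(len(xs) - 1), key=lambda i: xs[i + 1] - xs[i], reverse=True)
    cuts = sorted(widest[: n_groups - 1])
    groups, start = [], 0
    for i in cuts:
        groups.append(xs[start:i + 1])
        start = i + 1
    groups.append(xs[start:])
    return groups


# Hypothetical relative-resistance values (Columbia = 1.0 by definition).
values = [0.29, 0.45, 0.60, 0.67, 0.95, 1.00, 1.08, 1.14, 1.39, 1.61]
labelled = dict(zip(["mR", "R", "sR"], split_at_largest_gaps(values, 3)))
```

Using gaps rather than significance tests against Columbia mirrors the reasoning in the text: the grouping depends only on the effect sizes, not on how much statistical power happened to be available for each accession.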
Low growth of DC3000 without avrRpt2 in Pu-8 suggested partial resistance to the DC3000 background; additional resistance in the presence of avrRpt2 indicated that Pu-8 is resistant, but we were unable to measure its relative resistance. Also, Wu-0 has been reported previously as susceptible (Caicedo et al. 1999), although it exhibits growth and symptoms consistent with an intermediate phenotype (Kunkel et al. 1993; this study). It is possible that Caicedo et al. (1999) studied a different genotype within Wu-0 (see materials and methods).
Molecular variation at Rps2: We surveyed DNA sequence variability in 27 accessions from throughout the species range, including the accessions whose resistance phenotypes we determined (Table 1), and from the closely related species A. lyrata. The sequenced region spans 4248 base pairs (bp) in A. thaliana accession Columbia (GenBank accession no. AL049483), from 1003 bp upstream of the Rps2 start codon to 521 bp downstream of its stop codon. Our survey yielded a 4461-bp alignment including the A. lyrata sequence, with 3755 sites at which polymorphism and divergence were ascertained (Table 2).
The data including the outgroup sequence revealed a total of 197 nucleotide differences fixed between A. lyrata and all A. thaliana sequences and 58 single nucleotide polymorphisms distinguishing 18 haplotypes in the 27 A. thaliana alleles (Figure 1; Table 2). Within the Rps2 coding sequence, we detected 55 nonsynonymous (amino acid changing) differences between species and 20 nonsynonymous polymorphisms. The Rps2 coding sequence reading frame is intact in all individuals, despite two one-codon insertions in A. lyrata relative to A. thaliana at Columbia residues 741 and 771 (both in the LRR region) and a four-codon deletion at 877 (near the RPS2 C terminus). We also introduced one-codon insertion/deletions (indels; in both A. lyrata and A. thaliana) at Columbia residues 86 and 737, where the two species differ at all three nucleotide positions; these three-base differences were not included in polymorphism analysis. We found numerous indels between species in noncoding regions and five indel polymorphisms all outside of the coding sequence. A homonucleotide run at 821 varied between two A. lyrata individuals, but in A. thaliana no microsatellites were detected. No heterozygous sites were detected in A. thaliana individuals. Overall levels of polymorphism and divergence at Rps2 (Table 2) fall within the range seen at other loci in A. thaliana and A. lyrata (Kawabe and Miyashita 1999; Purugganan and Suddith 1999; Aguadé 2001).
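The Jukes-Cantor correction applied to divergence (Materials and Methods) converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3), which accounts for multiple substitutions at the same site. A minimal Python version, with an illustrative value of p rather than a value from Table 2:

```python
import math


def proportion_different(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3), substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


d = jukes_cantor(0.05)  # illustrative p only
```

At divergence levels like those between A. thaliana and A. lyrata the correction is modest: for p = 0.05, d is only a few percent larger than p.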
In Table 3, levels of polymorphism and divergence in the LRR region are presented. Within this region, the β-pleated sheet structural motif consensus sequence (Jones and Jones 1997) allows framed solvent-exposed amino acid residues, specific candidates for positive selection, to be analyzed and compared with conserved structural residues and nonconserved residues between frames. Significantly greater Ka than Ks between R gene paralogs at framed exposed residues (Meyers et al. 1998; Bergelson et al. 2001) has provided strong evidence for positive selection on plant R genes. In contrast, synonymous and nonsynonymous divergence reveals no evidence for positive selection on Rps2 (framed exposed residues, Ka = 0.033, Ks = 0.12); functional constraint is evident for all categories of LRR region amino acid residues (Table 3). Contingency tables comparing synonymous and replacement polymorphism and divergence (McDonald and Kreitman 1991) also provide no indication of selectively driven protein evolution (Table 3).
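The McDonald-Kreitman contingency test mentioned above compares the ratio of replacement to synonymous changes within species (polymorphism) with the same ratio between species (fixed differences). A minimal Python sketch, with the two-sided Fisher's exact test written out using `math.comb`; the counts in the usage line are hypothetical and are not the Table 3 values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mcdonald_kreitman(dn, ds, pn, ps):
    """McDonald-Kreitman test on nonsynonymous (n) and synonymous (s) counts of
    fixed differences (D) and polymorphisms (P).

    Returns the neutrality index NI = (Pn/Ps) / (Dn/Ds) and the Fisher P value.
    NI > 1 suggests an excess of amino acid polymorphism; NI < 1 suggests an
    excess of amino acid divergence, as expected under adaptive substitution.
    """
    ni = (pn / ps) / (dn / ds)
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, for illustration only.
ni, p = mcdonald_kreitman(dn=30, ds=60, pn=10, ps=22)
```

A nonsignificant table with NI near 1, as in this toy example, is the pattern that would indicate no selectively driven protein evolution.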
Table 1. Arabidopsis thaliana accessions and their avrRpt2-dependent resistance phenotypes
Evidence for balancing selection at Rps2: Figure 2 shows a parsimony tree inferred from silent and nonsynonymous polymorphism and divergence, with accession name and avrRpt2-dependent resistance phenotype shown for each allele. The Rps2 gene tree reveals the presence of two highly supported major clades. This haplotype structure is evident for synonymous as well as amino acid replacement polymorphisms, but only for polymorphisms falling in the middle of the coding sequence. Tests for nonrandom associations between all pairs of nonsingleton polymorphisms (Figure 3) reveal that linkage disequilibrium is clustered within a central segment of the Rps2 coding sequence. Indeed, only polymorphisms in this segment show significant linkage disequilibrium after correction for multiple tests of association. Outside of this central segment of the Rps2 coding sequence, the data reveal little haplotype structure (Figures 1 and 3).
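The pairwise linkage disequilibrium tests summarized in Figure 3 can be sketched in Python. The sketch assumes each nonsingleton site is biallelic, builds the 2 × 2 table of two-site haplotype counts, applies a two-sided Fisher's exact test (written out with `math.comb` rather than the W. Engels shareware used in the study), and flags pairs below a Bonferroni threshold α/(number of tests). The toy alignment is hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-7)))


def nonsingleton_biallelic_sites(seqs):
    """Alignment columns with exactly two alleles, each seen at least twice."""
    sites = []
    for k in range(len(seqs[0])):
        column = [s[k] for s in seqs]
        alleles = set(column)
        if len(alleles) == 2 and min(column.count(x) for x in alleles) >= 2:
            sites.append(k)
    return sites


def ld_tests(seqs, alpha=0.05):
    """Fisher's exact test of association for every pair of nonsingleton sites.

    Returns (site_i, site_j, P, significant) tuples; `significant` uses the
    Bonferroni threshold alpha / number_of_tests.
    """
    sites = nonsingleton_biallelic_sites(seqs)
    pairs = list(combinations(sites, 2))
    if not pairs:
        return []
    threshold = alpha / len(pairs)
    results = []
    for i, j in pairs:
        ai, bi = sorted({s[i] for s in seqs})
        aj = sorted({s[j] for s in seqs})[0]
        # 2 x 2 table of two-site haplotype counts.
        a = sum(s[i] == ai and s[j] == aj for s in seqs)
        b = sum(s[i] == ai and s[j] != aj for s in seqs)
        c = sum(s[i] == bi and s[j] == aj for s in seqs)
        d = sum(s[i] == bi and s[j] != aj for s in seqs)
        p = fisher_exact_two_sided(a, b, c, d)
        results.append((i, j, p, p < threshold))
    return results


# Toy alignment of six sequences (hypothetical).
toy = ["AAAA", "AAAA", "AAAT", "TTTA", "TTTT", "TTTT"]
results = ld_tests(toy)
```

In this toy alignment even perfect association between two sites yields only P = 0.1, which illustrates why limited sample size, together with the Bonferroni correction, leaves only the strongest associations significant.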
Sliding window analysis of nucleotide diversity between the two major clades (Figure 4) shows a peak of silent polymorphism in the center of the coding sequence—the 300 bp 5′ of the region encoding the RPS2 LRR region and in the 5′ half of the region encoding the RPS2 LRR region itself (hereafter referred to as the 5′ LRR region)—corresponding to the region containing the cluster of polymorphic sites in linkage disequilibrium. Peak nucleotide diversity between the two major clades reaches πb = 0.086 in the Rps2 5′ LRR region, a value approaching estimates of silent divergence between species. Clustering of silent polymorphism within this segment of the Rps2 coding sequence results in significant heterogeneity in the ratio of polymorphism to divergence across the sequenced region (sliding window average G: entire region, P ≤ 0.004; coding sequence, P ≤ 0.0014; McDonald 1998). Variation at Rps2, therefore, is not compatible with an equilibrium model of selective neutrality in a panmictic population.
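The between-clade diversity πb in each window can be sketched as the average number of differences per site over all pairs that take one sequence from each clade. This Python sketch uses the window settings given for Figure 4 (150 silent sites, 10-site steps) and assumes the silent (noncoding or synonymous) alignment columns have already been identified; the toy sequences in the test are hypothetical.

```python
def between_clade_pi(clade_a, clade_b, positions):
    """Average pairwise differences per site between two clades (pi_b),
    over the given alignment positions."""
    total = sum(x[k] != y[k] for x in clade_a for y in clade_b for k in positions)
    return total / (len(clade_a) * len(clade_b) * len(positions))


def sliding_window_pi_b(clade_a, clade_b, silent_positions, width=150, step=10):
    """(window centre, pi_b) for windows of `width` silent sites moved by `step`.

    `silent_positions` lists the alignment columns classified as silent;
    that classification is assumed to be done beforehand.
    """
    out = []
    for start in range(0, len(silent_positions) - width + 1, step):
        window = silent_positions[start:start + width]
        out.append((window[width // 2], between_clade_pi(clade_a, clade_b, window)))
    return out
```

Because the window is defined in silent sites rather than alignment positions, windows spanning coding sequence cover more alignment columns than windows in noncoding regions.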
Table 2. Levels of variability across the Rps2 locus and among RPS2 functional domains
Table 3. Polymorphism and divergence within the RPS2 leucine-rich repeat (LRR) region
Figure 1. Rps2 region polymorphic sites. Shown are positions in the alignment and the bases of A. lyrata (consensus) and A. thaliana sequences. Periods represent ancestral bases inferred from A. lyrata, and bases indicate derived polymorphic mutations. Amino acid replacement mutations are indicated relative to the wild-type Columbia accession (GenBank accession no. AL049483), which is identical to Bla-2, C2-1, and Gott-20. The region encoding the RPS2 leucine-rich repeat (LRR) region is indicated by the line above the amino acid replacement mutations, with the thicker line indicating its 5′ half.
Figure 2. Phylogeny of Rps2 sequences based on parsimony analysis of silent, synonymous, and amino acid replacement variability. Accession names are indicated for each Rps2 sequence with avrRpt2-dependent phenotype in boldface type (R, resistant; mR, mildly resistant; sR, strongly resistant; S, susceptible). The tree shown is one of three most parsimonious trees (length 265, consistency index 0.974) that differ only in the resistance (upper) clade. Numbers of mutations are shown above branches, with proportional branch length. Bootstrap values >90% are shown below branches (boldface italics).
Figure 3. Linkage disequilibrium between polymorphisms in the Rps2 region. The Rps2 region diagram shows the coding sequence (box) with RPS2 functional regions (LZ, leucine zipper; NBS, nucleotide-binding site; LRR, leucine-rich repeat). On the horizontal lines below, singleton polymorphisms (small hash marks) and nonsingleton polymorphisms (sample frequency two or greater, large hash marks) are indicated for silent/synonymous polymorphisms (top line) and amino acid replacement polymorphisms in the coding sequence (bottom line). In the triangle at bottom, Fisher's exact test P values for each pair of nonsingleton polymorphic sites are indicated by shading: P > 10⁻² (white), 10⁻³ < P < 10⁻² (stippled), 10⁻⁴ < P < 10⁻³ (shaded stipple), and P < 10⁻⁴ (black). Only P values < 10⁻⁴ (black) remain significant after Bonferroni correction.
Alleles from resistant and susceptible accessions are not scattered throughout the Rps2 gene tree, but are grouped together; therefore, we refer to the two major clades as the resistance (R) clade and susceptibility (S) clade. We tested for a significant association between Rps2 sequence variation and avrRpt2-dependent resistance variation by analyzing differentiation (an FST estimator based on nucleotide diversities; Hudson et al. 1992b) between phenotypes. Overall differentiation between phenotypes is highly significant (S, mR, R, and sR; FST = 0.52, P < 0.001). Pairwise comparisons between phenotypes reveal significant differentiation between S and each of R, mR, and sR (FST ≥ 0.47, P ≤ 0.019), marginally significant differentiation between R and mR phenotypes (FST = 0.12, P = 0.09), and no significant differentiation for other comparisons between resistant phenotypes (FST < 0.005, P > 0.3). Thus, sequence variation at Rps2 correlates with avrRpt2-dependent disease resistance, suggestive of causal links between the two (see discussion).
Figure 4. Sliding window analysis of silent (noncoding and synonymous) divergence between resistance and susceptibility clades of Rps2 alleles. Average numbers of pairwise differences per site within the window are shown with a solid line. Predicted levels under a coalescent model with selection and recombination (dashed line) assume that selection acts at the beginning of the LRR region (2654) and maintains Rps2 polymorphism at frequency 22/27 = 0.81, with independently estimated recombination rate 0.00057 and fitted mutation rate between selected alleles 0.0125 (see materials and methods). Expected levels under neutrality (dotted line) are calculated as divergence in the window times the ratio of averages across the region of polymorphism (π) and divergence, multiplied by the expected time to the most recent common ancestor for sample size 27 relative to the expected average pairwise coalescence time, 2(1 − 1/27) ≈ 1.93. The window is 150 silent sites wide, slid by 10-site increments. Beneath the sliding window plot the corresponding functional regions of RPS2 are shown, with amino acid differences between the clades indicated by asterisks (*).
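The neutral expectation in the sliding window analysis rests on a standard coalescent result: for a sample of n sequences, the expected time to the most recent common ancestor is 2(1 − 1/n) in units of 2N generations, while the expected coalescence time for a random pair is 1 in the same units. Under neutrality, sequences on opposite sides of the root therefore differ by about 2(1 − 1/n) times the average pairwise diversity. A minimal Python sketch of the calculation, with hypothetical input values:

```python
def tmrca_to_pairwise_ratio(n):
    """E[T_MRCA] / E[T_2] under the standard neutral coalescent for n sequences.

    E[T_MRCA] = 2(1 - 1/n) and E[T_2] = 1, both in units of 2N generations.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    return 2 * (1 - 1 / n)


def neutral_between_clade_expectation(window_divergence, pi_region,
                                      divergence_region, n=27):
    """Expected between-clade diversity in a window if the locus is neutral:
    window divergence scaled by the regional polymorphism/divergence ratio,
    then by the depth of the root relative to an average pair."""
    return (window_divergence * (pi_region / divergence_region)
            * tmrca_to_pairwise_ratio(n))


expected = neutral_between_clade_expectation(0.10, 0.008, 0.10)  # hypothetical inputs
```

Scaling by local divergence lets the neutral curve track variation in mutation rate along the region, so a peak that rises above it cannot be explained by a mutational hotspot alone.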
Geographic differentiation: In contrast to avrRpt2-dependent resistance, accessions from the same geographic region are scattered throughout the Rps2 gene tree (Figure 2). We categorized accessions into five regions, (1) Eastern Europe, Asia, and Africa; (2) Central and Northern Europe; (3) Western and Southern Europe; (4) Eastern North America; and (5) Western North America, on the basis of the recent expansion of A. thaliana from Western Asia and Eastern Europe to its current worldwide distribution (Price et al. 1994; Sharbel et al. 2000). Rps2 sequence variation reveals no differentiation among regions (overall FST = 0.043, P = 0.3; for all pairs of regions FST ≤ 0.14, P ≥ 0.15). In addition, Rps2 variation does not differentiate North America from other continents (FST = 0.043, P > 0.5), revealing no evidence for a founder effect in the colonization of the Western hemisphere by A. thaliana. These results are typical of studies of a single or few loci and moderate sample sizes in A. thaliana (Innan et al. 1996; Bergelson et al. 1998; Kawabe and Miyashita 1999).
DISCUSSION
Previously, Caicedo et al. (1999) found a high level of polymorphism at the Rps2 locus and two highly divergent alleles, suggestive of balancing selection, but a statistical test (Tajima's D) could not reject selective neutrality. Here we find statistical evidence in support of the selection hypothesis and tentatively identify the Rps2 5′ LRR region as the target of selection. An Rps2 sequence from the sister species A. lyrata and a larger sample of alleles allowed us to detect a clustering of polymorphism relative to divergence exceeding that expected under selective neutrality in a panmictic population. This result rules out the possibility that the region of high polymorphism is simply a mutational hotspot, since mutation rate heterogeneity would elevate both polymorphism and divergence.
Our statistical confirmation of a peak of polymorphism should not be taken, in and of itself, as a strong refutation of neutral evolution. For example, Innan et al. (1996) identified a short, highly diverged stretch in exon 4 of Adh and similarly diverged stretches in three adjacent sequence “blocks.” While the authors argue in favor of balancing selection acting on exon 4 (owing to amino acid replacement differences between the alleles), they raise the possibility that population structure and history produced the “dimorphism” seen throughout the locus.
Biallelic variation has also been found at several other loci in A. thaliana (Kawabe et al. 1997; Kawabe and Miyashita 1999; Stahl et al. 1999; Purugganan and Suddith 1999; Aguadé 2001; Hauser et al. 2001; Tian et al. 2002), adding to the appeal of a population structure hypothesis. We nonetheless favor balancing selection as an explanation for Rps2 variation, because several features of our data distinguish them from those of other studies that found biallelic variation but favored a population structure hypothesis. As indicated in the sliding window analysis (Figure 4), most of the variation is present in the coding segment of the Rps2 gene and overlaps with the functional domain of the protein implicated in pathogen recognition. Seven amino acid replacement changes separate the R and S clades, four of them in the LRR region; that these differences between Rps2 allelic classes could be functional is consistent with a role for selection. Furthermore, accessions' Rps2 allelic classes correspond closely with their resistance phenotypes. Since selection can act only if functionally distinct alleles exist, a correspondence between phenotype and genotype provides additional evidence in support of balancing selection. Others have also pointed to the importance of possible functional differences distinguishing diverged alleles. For example, Hauser et al. (2001) found two divergent alleles across part of the region in their analysis of polymorphism in Glabrous1 (Gl1), a candidate gene for leaf trichome density variation; they argued against selection because the divergence was not in the coding region of the gene and variation in trichome density did not correlate with Gl1 sequence variation. Kawabe et al. (2000) found divergent alleles of the cytosolic phosphoglucose isomerase gene (PgiC) and favored balancing selection because the alleles produced distinct allozymes (but note that phenotypic properties of the allozymes were not investigated).
Balancing selection is expected to lead to a signature in which neutral variation accumulates between the alleles surrounding the site(s) under selection. This signature is a simple manifestation of the genealogical correlation of tightly linked sites: as a balanced polymorphism becomes old, so too do the genealogical ages of sites tightly linked to it. In Hudson and Kaplan's (1988) coalescent treatment of balanced polymorphism, the physical scale of neutral polymorphism linked to the site under selection is, to a first approximation, determined by a balance between the origination of new neutral mutations (governed by the scaled neutral mutation rate, 4Neu, where Ne is the effective population size and u is the neutral mutation rate per site per generation) and the decay of the linkage disequilibrium between these mutations and the site under selection (governed by the scaled recombination rate, 4Ner, where r is the per generation recombination rate between adjacent sites). Even for a highly self-fertilizing species, balancing selection can be expected to produce a relatively sharp peak of neutral polymorphism linked to a site under selection (Nordborg et al. 1996; Nordborg 1997). Based on available genetic and population genetic estimates of mutation and recombination rates in A. thaliana (Tian et al. 2002; see materials and methods and Figure 4), the peak of polymorphism seen at Rps2 is compatible with theoretical predictions for a balanced polymorphism at the 5′ end of the region of the gene that encodes the RPS2 LRR region (Figure 4).
We note that the balancing selection analysis is based on a constant-size panmictic population model and does not take into account departures from this model in the demographic history of A. thaliana. Nevertheless, given that the peak of polymorphism is restricted to within the Rps2 coding sequence, that polymorphisms within the peak are not in linkage disequilibrium with polymorphisms outside it, and that significant linkage disequilibrium is rarely observed between loci in A. thaliana (Innan et al. 1997; Nordborg et al. 2002), we tentatively identify the region of the peak, which includes the N-terminal half of the RPS2 LRR region, as the probable target of natural selection.
In our balancing selection analysis, the best-fit scaled mutation rate between allelic classes was 0.0125 (see materials and methods), i.e., on the order of one-hundredth the rate of neutral coalescence (1/Ne). Higher mutation rates between the selected alleles would lead to more recent common ancestry between them and, if large enough, might not produce an observable peak of polymorphism even with balancing selection. Many kinds of mutations can cause loss of function; therefore the rate of origination of new susceptibility alleles might be expected to be quite high. An ancient balanced polymorphism between a resistance and a susceptibility allele would imply that selection favors one susceptibility allele over others and that the rate of origination of this particular susceptibility allele is low. Alternative resistance alleles, on the other hand, might be expected to have a low rate of origination. Thus, the observation of a signature of selection between the two major Rps2 clades is consistent with the hypothesis that both major allelic classes of Rps2 contain functional resistance alleles. Indeed, Banerjee et al. (2001) showed that the susceptibility allele of Po-1 is partially functional against avrRpt2 when in the Col-0 genetic background. Note that the designation of resistance or susceptibility in this study is based only on the ability to recognize one specific avirulence gene, avrRpt2. We propose that the alleles represented by the Rps2 resistance and susceptibility clades encode distinct specificities against natural pathogens in wild populations. The recent finding of infection by P. syringae in natural populations of A. thaliana (Jakob et al. 2002) makes this a realistic possibility.
Figure 5. The RPS2 LRR region, with polymorphic mutations. The amino acid sequence taken from Jones and Jones (1997) is shown, and the codon number of the rightmost residue in each row is shown on the right. Residues matching the LRR consensus (at bottom) are shown in boldface type, and the vertical lines bracket the structural motif frame. Residues that differ between the resistance and susceptibility clades are indicated in red, and residues that differ in association with phenotypic changes within the clades are indicated in blue.
Rps2 exhibits marked sequence variability in association with phenotypic variation. Of the nine phenotypic changes that would be inferred by simply mapping phenotypes onto the Rps2 gene tree, seven are associated with amino acid polymorphisms, six of them with polymorphisms in the LRR region (Figures 1 and 2). Polymorphisms that distinguish the R and S clades are found upstream of the LRR region (not shown) and in nonconserved residues between LRR frames (Figure 5); these changes could confer phenotypic variation that is maintained by natural selection (Ellis et al. 1999; Luck et al. 2000). Polymorphisms associated with other phenotypic changes on the tree include framed solvent-exposed residues and conserved residues between frames (Figure 5). While we cannot rule out the possibility that changes at other loci contribute to phenotypic variation in these accessions, we suggest that these amino acid polymorphisms are good candidates for further study of RPS2 function (Axtell et al. 2001). Moreover, beyond conferring phenotypic variation that is maintained by selection, the hypervariability of amino acid residues N-terminal to the LRR region and in the N-terminal half of the LRR region may be consistent with diversifying selection on Rps2.
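The text does not specify how phenotypes were mapped onto the gene tree. One standard way to count the minimum number of changes is Fitch parsimony for an unordered character. The sketch below assumes this method, a rooted, fully bifurcating tree, and phenotypes coded as discrete categories (for example "R" and "S", or binned resistance levels). The accession names and phenotypes are hypothetical and are not data from this study.

```python
from typing import Union

# A rooted binary tree: a leaf is an accession name, an internal node is a (left, right) pair.
Tree = Union[str, tuple["Tree", "Tree"]]


def fitch(tree: Tree, phenotype: dict[str, str]) -> tuple[set[str], int]:
    """Fitch small-parsimony (bottom-up) pass for one unordered character.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes required on that subtree.
    """
    if isinstance(tree, str):
        return {phenotype[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, phenotype)
    right_states, right_changes = fitch(right, phenotype)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: take the union and count one change below this node.
    return left_states | right_states, left_changes + right_changes + 1


if __name__ == "__main__":
    # Hypothetical accessions and phenotypes, for illustration only.
    toy_tree = (("acc1", "acc2"), ("acc3", ("acc4", "acc5")))
    toy_pheno = {"acc1": "R", "acc2": "R", "acc3": "S", "acc4": "S", "acc5": "R"}
    states, changes = fitch(toy_tree, toy_pheno)
    print(sorted(states), changes)
```

This bottom-up pass gives only the minimum number of changes. Assigning each change to a specific branch, and hence to specific amino acid polymorphisms, requires a second top-down pass. When several most-parsimonious reconstructions exist, that assignment may also not be unique.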
Previous studies have found evidence for rapid adaptive substitution at solvent-exposed residues of the LRR region among R gene paralogs (Meyers et al. 1998; Bergelson et al. 2001; Holub 2001; Mondragón-Palomino et al. 2002). In contrast, previous studies have not found evidence for positive selection at two Arabidopsis R genes that exhibit signatures of balancing selection, Rpm1 (Stahl et al. 1999) and Rps5 (Tian et al. 2002). The possibility that the major alleles of Rps2 represent a functional balanced polymorphism suggests that the maintenance of variation by natural selection may be a general feature of R gene evolutionary dynamics. At Rps2, we do not find evidence for adaptive protein evolution between species, but we do observe marked amino acid variability that could be consistent with diversifying selection. It remains to be seen whether even faster-evolving R genes can also support balanced polymorphisms, as exemplified by the genes of the mammalian major histocompatibility complex (Hughes and Nei 1988) and plant self-incompatibility loci (Clark 1993).
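This section does not say which test was used to look for adaptive protein evolution between species. The McDonald-Kreitman test is one widely used option, and the sketch below illustrates how its 2x2 table is summarized. We do not assume it is the test applied here. The test compares nonsynonymous and synonymous changes that are fixed between species against those that are polymorphic within a species. The counts must come from a separate alignment and site classification, which is not shown. The function names are hypothetical.

```python
from math import comb


def _hypergeom_pmf(a: int, row1: int, col1: int, total: int) -> float:
    """P(top-left cell = a) for a 2x2 table with fixed margins (hypergeometric)."""
    return comb(col1, a) * comb(total - col1, row1 - a) / comb(total, row1)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict[str, float]:
    """McDonald-Kreitman summary for the 2x2 table

                     nonsynonymous  synonymous
        fixed             dn            ds
        polymorphic       pn            ps

    Returns the neutrality index NI = (pn/ps) / (dn/ds), alpha = 1 - NI,
    and a two-sided Fisher exact p-value.
    """
    if min(dn, ds, pn, ps) < 0:
        raise ValueError("counts must be non-negative")
    total = dn + ds + pn + ps
    if total == 0:
        raise ValueError("empty table")
    row1, col1 = dn + ds, dn + pn
    # NI is undefined without nonsynonymous fixations or synonymous polymorphisms.
    ni = (pn * ds) / (ps * dn) if ps * dn > 0 else float("nan")
    p_obs = _hypergeom_pmf(dn, row1, col1, total)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    # Two-sided p: sum over all tables with the same margins that are at least as
    # improbable as the observed one (a small tolerance guards against float ties).
    p_value = sum(
        p for p in (_hypergeom_pmf(a, row1, col1, total) for a in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return {"NI": ni, "alpha": 1 - ni, "p_value": min(1.0, p_value)}


if __name__ == "__main__":
    # Hypothetical counts, not data from this study.
    print(mk_test(dn=10, ds=20, pn=5, ps=20))
```

A significant NI < 1 (alpha > 0) points to an excess of nonsynonymous fixations, the usual signature of adaptive divergence. An NI > 1 indicates an excess of amino acid polymorphism within species. That direction can arise when amino acid variation is maintained within species, as proposed here for Rps2, but also when many amino acid polymorphisms are slightly deleterious.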
Acknowledgments
We thank A. Berry, I. Cetl, M. Nachman, G. Robellen, O. Savolainen, and J. Winterer for collecting A. thaliana seeds, as well as the Arabidopsis Biological Resource Center at Ohio State University for providing seeds of A. thaliana accessions. F. Ausubel provided seed of rps2-101C. We acknowledge the assistance of M. Aguadé and the reviewers who provided careful and helpful reviews. This work was funded by a Sloan Foundation/National Science Foundation Fellowship in Molecular Evolution and University of Georgia Faculty Research grant to R.M., a Sloan Foundation/Department of Energy Fellowship in Computational Molecular Biology to E.A.S., and a Packard Fellowship and National Institutes of Health awards GM-57994 and GM-62504 to J.B.
Footnotes
- Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AF487796–
- Communicating editor: M. Aguadé
- Received August 7, 2002.
- Accepted November 11, 2002.
- Copyright © 2003 by the Genetics Society of America