Natural Variation in the Pto Disease Resistance Gene Within Species of Wild Tomato (Lycopersicon). II. Population Genetics of Pto
Laura E. Rose, Richard W. Michelmore, Charles H. Langley


Disease resistance to the bacterial pathogen Pseudomonas syringae pv. tomato (Pst) in the host species Lycopersicon esculentum, the cultivated tomato, and the closely related L. pimpinellifolium is triggered by the physical interaction between the protein products of the host resistance (R) gene Pto and the pathogen avirulence genes AvrPto and AvrPtoB. Sequence variation at the Pto locus was surveyed in natural populations of seven species of Lycopersicon to test hypotheses of host–parasite coevolution and functional adaptation of the Pto gene. Pto shows significantly higher nonsynonymous polymorphism than 14 other non-R-gene loci in the same samples of Lycopersicon species, while showing no difference in synonymous polymorphism, suggesting that the maintenance of amino acid polymorphism at this locus is mediated by pathogen selection. Also, a larger proportion of ancestral variation is maintained at Pto as compared to these non-R-gene loci. The frequency spectrum of amino acid polymorphisms known to negatively affect Pto function is skewed toward low frequency compared to amino acid polymorphisms that do not affect function or silent polymorphisms. Therefore, the evolution of Pto appears to be influenced by a mixture of both purifying and balancing selection.

In the past decade, there has been a burst of interest among population geneticists in the evolutionary dynamics of disease resistance in plants. Perhaps two influential reasons for this are that, in many plant species, specific pathogen resistance is controlled by genes of large effect and individuals within species show allelic segregation at these loci. These observations hold true for both cultivated and “natural” (noncrop) species, including Arabidopsis, and are not simply an artifact of modern breeding practices. Another reason that population geneticists have begun to focus on this trait is that we can study the origin and evolution of resistance as a proxy for understanding how individuals adapt to their environment. Studying adaptation through the lens of disease resistance has an added element of complexity, however, in that the selective agent, i.e., the pathogen, coevolves with the host. This potential for coevolution sets pathogens apart from other sources of natural selection, such as abiotic stress.

The study of host–pathogen coevolution has a long history, rich in modeling. In particular, the apparently “simple” nature of the genetic basis of resistance in plants has meant that mathematical models of these evolutionary processes are tractable and may incorporate a reasonable amount of biological reality. The models have helped us to predict the dynamics that we may expect at the loci controlling the interactions between host and pathogens. Technological advances have now made it possible to test specific predictions of these models. Over 40 resistance genes have been cloned and sequenced in the past 2 decades and methods are available for inexpensive sequence analysis of large numbers of individuals. Additionally, the pathogen effectors (e.g., avirulence genes) that are required for the activation of disease resistance responses have also been cloned. The structures are available for some of these molecules and molecular biologists and biochemists are actively trying to pinpoint precisely how these R-genes function and to determine their molecular partners in plant cells (Wulf et al. 2004; Zhu et al. 2004; Janjusevic et al. 2006; McHale et al. 2006).

According to population genetic models, balancing selection, which leads to the maintenance of allelic variation at R-genes, can occur as a result of the dynamic coevolutionary process between host and pathogen (reviewed in May and Anderson 1983). Depending on the model's parameters, the R-gene can come to a stable balanced polymorphism, oscillate in frequency (also considered a balanced polymorphism), or go to fixation as a result of positive selection (i.e., sweep through a population). The two forms of balanced polymorphism leave slightly different signatures at the molecular level that can theoretically be distinguished from other forms of natural selection. A range of tests have been developed, which can be used to detect whether the R-gene has historically been affected by balancing selection, including HKA, Tajima's D, McDonald–Kreitman, and tests involving coalescent simulations (Hudson et al. 1987; Tajima 1989; McDonald and Kreitman 1991). Recently, Bakker et al. (2006) completed a comprehensive study of 27 resistance genes across 96 accessions of Arabidopsis thaliana. Signatures of balancing selection were found at several of these R-genes; however, additional studies of other host species, with a range of mating systems and life histories, are needed before we can establish which evolutionary history predominates at R-gene loci (de Meaux and Mitchell-Olds 2003; Tiffin and Moeller 2006). We have chosen to concentrate on the well-characterized interaction between wild tomatoes and Pseudomonas syringae. These host species display a range of mating systems and serve as a complementary model system for understanding host–parasite coevolution.

There are currently at least nine described species of wild tomato (clade Lycopersicon). Phylogenetic studies have shown that the Lycopersicon clade is monophyletic and embedded within one of the largest plant genera, Solanum (Peralta and Spooner 2001). Subsequently, the tomato species have been renamed and these new names are described in Peralta and Spooner (2001). For consistency with our previous work on these species, we have used the older names here. Only a single tomato species is cultivated and, in this study, we focus explicitly on the noncultivated tomato species. These species are native to the western coast of South America and range in mating from selfing to obligate outcrossing. All species are regular diploids and are genetically well characterized. Our focal species (in order of outcrossing rates from lowest to highest) are Lycopersicon parviflorum, L. chmielewskii, L. pimpinellifolium, L. hirsutum, L. pennellii, L. chilense, and L. peruvianum. Five of these species were the focus of recent population genetic work on understanding how recombination is related to diversity within species as a function of the mating system and on determining how differentiated these species are from one another at the molecular level (Baudry et al. 2001; Roselius et al. 2005; Städler et al. 2005). This extensive population genetic work used exactly the same plants analyzed in this study and provided a set of 14 reference loci that were used for comparisons with R-gene sequences in these same species.

Tomatoes can become infected by a bacterial pathogen, P. syringae pv. tomato (Pst), the causative agent of bacterial speck disease. As suggested by its common name, Pst-infected leaf tissue develops black specks surrounded by chlorotic halos. In cultivated tomatoes, infection by this pathogen can lead to yield losses, as both fruit and leaves are attacked. Different races of this pathovar are found worldwide and these races differ in the expression of well-characterized avirulence factors (Sakar et al. 2006). For example, two avirulence factors found in race 0, but not in race 1, are AvrPto and AvrPtoB. Both of these avirulence proteins are specifically recognized by a tomato resistance protein, Pto (Scofield et al. 1996; Tang et al. 1996; Kim et al. 2002).

Disease resistance conferred by the Pto gene was introgressed into the cultivated tomato species, L. esculentum, from its sister species, L. pimpinellifolium (Pilowsky and Zutra 1982). In the early 1990s, this gene was one of the first R-genes to be cloned and sequenced (Martin et al. 1993). This gene encodes a serine–threonine protein kinase and is 963 bp long with no introns. Pto belongs to a small family of six genes clustered in a 60-kb region of chromosome 5 of tomato (Martin et al. 1993; GenBank accession nos. AF220602 and AF220603). Pto paralogs do not encode recognition of the AvrPto and AvrPtoB avirulence factors, but most of these paralogs are expressed and are functional protein kinases (Chang et al. 2002; Kim et al. 2002).

Pto binds AvrPto and AvrPtoB in yeast two-hybrid assays and in planta (Scofield et al. 1996; Tang et al. 1996; Kim et al. 2002; Mucyn et al. 2006). The recognition of these avirulence proteins by Pto in the plant cell activates the disease resistance pathway. Mutant analyses of all three proteins have shown that mutations that disrupt binding ability result in susceptibility, i.e., a loss of pathogen recognition, and no activation of the resistance response (Scofield et al. 1996; Tang et al. 1996; Kim et al. 2002). The activation segment, which lies within the catalytic cleft between the small N-terminal lobe and the larger C-terminal lobe of Pto, has been investigated thoroughly by site-directed mutagenesis and domain swaps (Scofield et al. 1996; Frederick et al. 1998; Rathjen et al. 1999; Wu et al. 2004). Substitutions of negatively charged amino acids at several positions in the P+1 loop of this region lead to a constitutive activation of a disease resistance response, the hypersensitive response (HR) (Rathjen et al. 1999; Wu et al. 2004). We were particularly interested in determining how this region evolved in these tomato species, as it plays a critical role in pathogen recognition and downstream signaling. Recently, we investigated the functional variation within and between the seven species of Lycopersicon listed above in terms of their levels of resistance to two isogenic strains of Pst (Rose et al. 2005). The two strains of Pst differed only in the presence of the AvrPto gene. We sequenced the Pto alleles from these Lycopersicon individuals and tested a subset of them in a susceptible host using Agrobacterium-mediated transformation. Here we describe population genetic analyses of Pto alleles from six species of Lycopersicon. We evaluate the distribution of nucleotide sequence polymorphism, taking into account the results from structure–function analyses of Pto, in comparison to 14 reference loci.


Plant materials:

Populations of each species of Lycopersicon were sampled across their ranges (Rose et al. 2005, Figure 1). Individuals of seven species of Lycopersicon were grown from seed collected from natural populations in Ecuador, Peru, and Chile (Table 1). These seeds were collected by Charles Rick and colleagues and stored at the Tomato Genetics Resource Center (TGRC) at the University of California at Davis. Seeds from additional populations not available from the TGRC were obtained from the U.S. Department of Agriculture, Agricultural Research Service Plant Genetic Resources Unit in Geneva, New York (specifically, accessions PI129157, PI134417, PI134418, PI251305, PI126444, PI128654, and PI128659). For the samples LA3653 (L. chmielewskii), LA1583 (L. pimpinellifolium), and the outbreeding species, field-collected seed from South America was used. For the inbreeding species (excluding accessions LA3653 and LA1583), selfed seed, available from the TGRC, was used. In total, 58 different individuals representing a total of 31 accessions of seven wild tomato species and one accession of Solanum ochranthum were studied. Seeds were soaked in a 50% bleach solution for 30 min and incubated on Anchor germination paper (St. Paul) at 22° with 24 hr fluorescent light. Seedlings were transferred to soil 2 weeks after germination and grown under greenhouse conditions at Davis, California.

Preparation of genomic DNA and sequencing:

DNA was isolated using a CTAB method (Doyle and Doyle 1987) from 2 g of leaf tissue collected from each plant. The DNA was resuspended in 300–1000 μl TE, depending on yield. Alleles of Pto from each species were amplified by PCR using Pfu polymerase (Stratagene, La Jolla, CA). For most reactions, the primers SSP17 (GGTCACCATGGGAAGCAAGTATTC) and JCP32 (GGCTCTAGATTAAATAACAGACTCTTGGAG) were used. These primers overlap the start and stop codon of the Pto gene and amplify not only Pto, but also Pth3 and Pth5, due to the similarity of these genes at their 3′- and 5′-ends. The standard PCR protocol was 94° for 5 min, 25× (94° for 30 sec, 50°–60° for 30 sec, 72° for 90 sec) followed by 72° for 10 min. Products were gel purified using QIAGEN (Valencia, CA) Gene Clean or Prep-A-Gene (Bio-Rad, Hercules, CA) kits. These products were cloned into the pCR-Blunt vector (Invitrogen, Carlsbad, CA). Multiple clones were sequenced from all 58 individuals. Sequencing was performed using an ABI 377 automated DNA sequencer. In the process of sequencing alleles of Pto, Pth3, and Pth5 from different species, we determined that BstXI specifically digests alleles of Pth3 and Pth5, but not Pto. Therefore, we adopted this diagnostic digest to enrich for Pto alleles and select the clones for subsequent sequencing. Multiple independent clones were sequenced for each individual and a minimum of two clones were sequenced per allele of Pto. Additional cloning and sequencing was used to clarify any ambiguous positions.

Phylogenetic analysis was used to delineate which sequences belonged to the Pto clade, as one indication of orthology. Phylogenetic analyses were completed using PAUP (Swofford 1999). The phylogenetic relationships between these sequences were determined using maximum parsimony and neighbor joining and these methods yielded similar topologies. The trees were rooted as in Vleeshouwers et al. (2001), made possible by the inclusion of sequences of Pto homologs from different species of Solanum. Sequences of the region containing the entire Pto gene family (∼60 kb) were available for two species: L. esculentum (GenBank accession no. AF220603) and L. pimpinellifolium (accession no. AF220602). The open reading frames of all Pto gene family members from these two species were aligned with the sequences generated in this study. The gene family members from L. esculentum and L. pimpinellifolium were used as anchors and to name the clades in the phylogenetic tree. In total, >225 Pto orthologs and paralogs were analyzed (L. E. Rose, unpublished results). A well-supported “Pto clade” was identified (supplemental Figure 1), and the sequences belonging to this clade were subjected to further analyses.

DNA sequence analyses:

The standard summary statistics, including π, Tajima's D, Fu and Li's D, and Fu's F test statistics, were calculated using DnaSP v. 3.51 (Rozas and Rozas 1999). For comparisons among loci, the sequences of alleles of 14 other loci (CT66, CT93, CT99, CT114, CT143, CT148, CT166, CT179, CT189, CT198, CT208, CT251, CT268, and sucr) were obtained from E. Baudry, K. Roselius, and T. Städler (sequences are directly available from GenBank: accession nos. AY941323–AY941771 and DQ104647–DQ104695). These reference genes had been amplified from the same individuals of populations of L. pimpinellifolium (LA1583), L. chmielewskii (LA3653), L. hirsutum (LA1775), L. chilense (LA2884), and L. peruvianum (LA2744) as used in this study. A total of 8–10 alleles from each population were sequenced from these 14 reference loci. These genes are single-copy cDNA markers previously developed and mapped in Tanksley et al. (1992). A summary of their predicted gene products is found in Table 1 of Roselius et al. (2005).
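As an illustration of two of the summary statistics named above, the following minimal Python sketch computes the average pairwise difference per site (π) and Tajima's D (Tajima 1989) from a set of aligned sequences. It is not the DnaSP implementation: the function names and the toy alignment are assumptions made for this example, and the sketch assumes equal-length sequences with no gaps or ambiguous bases.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one state (S)."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D; returns NaN when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment (not data from this study).
toy = ["ACGT", "ACGA", "ACTA", "ACTA"]
pi_per_site = mean_pairwise_differences(toy) / len(toy[0])
```

Dividing k by the alignment length gives π per site, the scale on which the values in this study are reported.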

Coalescent simulations:

Coalescent simulations were conducted to examine whether the pattern of substitution at synonymous and nonsynonymous sites at Pto in L. peruvianum and L. chilense differed from the 14 other genes from these same individuals. For synonymous sites, we used the arithmetic mean of π of the 14 non-R-genes as our estimate of θ for our simulations. A total of 1000 simulations were executed in DnaSP and subsequently we determined whether the value of π observed at Pto fell within the 95% confidence interval of the simulations based on θ estimated from the 14 non-R-genes. For these simulations, we assumed no recombination, the most conservative assumption. The same approach was also used to test if π at nonsynonymous sites (πnon) was different for Pto vs. the arithmetic mean across these 14 non-R-genes. Additionally, an even more conservative test comparing πnon at Pto to πnon of the gene showing the highest level of nonsynonymous variation was also conducted.
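The logic of this test can be sketched in a few lines of Python. The sketch below is not the DnaSP routine. It is a minimal neutral coalescent without recombination, written under the assumption that θ is supplied per locus (per-site θ multiplied by the number of sites) and that mutations follow the infinite-sites model. The function names, the seed, and the interval construction are choices made for this illustration.

```python
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting rate-1 arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_pi(n, theta, rng):
    """One coalescent replicate for n alleles; returns pi for the whole locus."""
    sizes = [1] * n  # sampled alleles descending from each current lineage
    diffs = 0
    while len(sizes) > 1:
        k = len(sizes)
        # Waiting time to the next coalescence, in units of 2N generations.
        t = rng.expovariate(k * (k - 1) / 2)
        for size in sizes:
            # Mutations on this branch segment are Poisson(theta / 2 * t);
            # each one separates `size` alleles from the other n - size.
            diffs += poisson(rng, theta / 2 * t) * size * (n - size)
        i, j = rng.sample(range(k), 2)
        merged = sizes[i] + sizes[j]
        sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
        sizes.append(merged)
    return diffs / (n * (n - 1) / 2)


def pi_interval(n, theta, reps=1000, seed=0):
    """Central 95% interval of simulated pi under the neutral model."""
    rng = random.Random(seed)
    sims = sorted(simulate_pi(n, theta, rng) for _ in range(reps))
    return sims[int(0.025 * reps)], sims[int(0.975 * reps) - 1]
```

An observed π that falls above the upper bound returned by `pi_interval` would be read as an excess of variation relative to the reference loci, which is the comparison described above.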

Test for ancestral polymorphism in L. peruvianum:

We tested whether the proportion of fixed differences between species that are still segregating in L. peruvianum was significantly higher at Pto than at the 14 other loci. Alleles of Pto and 14 other non-R-genes were sequenced from the same 20 individuals (5 individuals of the four species L. chilense, L. chmielewskii, L. pimpinellifolium, and L. peruvianum, representing 10 alleles/species/gene). For each gene, an alignment of the alleles was made, including all four species. Fixed differences among the three species (L. chilense, L. chmielewskii, and L. pimpinellifolium) were tallied and the type of substitution (synonymous or nonsynonymous) was recorded. After the number of fixed differences was calculated, we determined whether these sites were polymorphic in L. peruvianum. To test if the proportion of fixed differences that are polymorphic in L. peruvianum was significantly higher at Pto vs. the other 14 genes, a bootstrap method was used. The 10,000 bootstrap replicates were created by sampling, from the data set of the 14 genes, the number of positions from Pto that showed fixed differences between these three species. A P-value was computed by determining the proportion of the bootstrap replicates that had values greater than or equal to our observed value.
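The bootstrap described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes that each fixed difference in the reference data set is coded as 1 if it is polymorphic in L. peruvianum and 0 otherwise, and that sites are drawn with replacement. The function name and the toy counts are hypothetical.

```python
import random


def bootstrap_p(ref_flags, n_focal, observed, reps=10000, seed=0):
    """Proportion of replicates with at least `observed` polymorphic sites.

    ref_flags: 0/1 flags for the fixed differences at the reference loci.
    n_focal:   number of fixed differences at the focal gene.
    observed:  number of those that are polymorphic at the focal gene.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        draw = rng.choices(ref_flags, k=n_focal)  # sample with replacement
        if sum(draw) >= observed:
            hits += 1
    return hits / reps


# Hypothetical toy input: 40 of 100 reference sites are polymorphic,
# and 13 of 20 focal-gene sites are polymorphic.
toy_flags = [1] * 40 + [0] * 60
toy_p = bootstrap_p(toy_flags, n_focal=20, observed=13, reps=2000, seed=1)
```

A small value of `toy_p` would indicate that the focal gene retains more segregating ancestral variants than expected from the reference loci.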

Modification of the Wakeley–Hey speciation-by-isolation model:

We modified the Wakeley–Hey (WH) model (Wakeley and Hey 1997) to test for differences among loci in the distribution of “fspp,” i.e., the fixed differences (f), shared ancestral polymorphism (s), and private polymorphisms (p1, p2) between closely related species. Specifically, we were interested in whether the distribution of variation of “fspp” differed at Pto as compared to 14 other non-R-genes from L. peruvianum and L. chilense (sequences available from GenBank accessions AY941323–AY941771 and DQ104647–DQ104695). We chose to focus on these two species in particular because both are self-incompatible. This enabled us to avoid violating the assumption of random mating implicit in many population genetic tests. The available sample sizes for L. hirsutum, another self-incompatible species in this clade, were too small to be used in these tests. These tests focused specifically on individuals from a single population of each species, namely LA2884 for L. chilense and LA2744 for L. peruvianum.

We considered three data partitions: (1) synonymous sites only, (2) nonsynonymous sites only, and (3) synonymous plus nonsynonymous sites. This allowed us to determine whether the distribution of variation was dependent on whether or not the substitution led to an amino acid difference. First we tested whether the distribution of fspp at any of the 14 non-R-genes differed significantly from that at the others. From the data set of 14 genes, one gene was removed in turn. The parameter values of θ ancestral (θa), θ species 1 (θ1), θ species 2 (θ2), and τ were estimated from the remaining 13 genes using the WH program. To determine if the observed distribution of fspp at the removed gene was different from that predicted on the basis of coalescent simulations, we used the estimated parameter values of θa, θ1, θ2, and τ from the 13 other genes. We conditioned on observing the same sum total of variant sites, i.e., Σ(f + s + p1 + p2), as observed at the removed gene. For each data set with one gene removed, we ran 10,000 coalescent simulations using an appropriately modified version of Hudson's “ms” program. For each replicate, the numbers of f, s, p1, and p2 were recorded. From these 10,000 replicates, a mean for each type of variant site (f, s, p1, p2) was calculated. A χ²-like test statistic was calculated from each replicate (i.e., Σ[(Obs − Exp)²/Exp]) over each category of variant site (fspp), where “Obs” is the observed number of f, s, p1, or p2 and “Exp” is the mean of f, s, p1, or p2 across the 10,000 replicates. This test statistic was recorded for each replicate and these values were sorted in ascending order. To determine whether the actual observed distribution of fspp differed among genes, the same test statistic described above was calculated for the gene that had been removed. Then we compared this value of the test statistic from the removed gene to the distribution of the test statistic from the 10,000 replicates based on the 13-gene set (i.e., “one gene removed” set). A value falling among the largest 500 of the 10,000 simulated values corresponded to a significant difference at the 5% level in the distribution of these variant types between the removed gene and the 13 others. After we determined that the 14 non-R-genes did not differ in their patterns of distribution of fspp for synonymous, nonsynonymous, and all sites, we used the same procedure to test whether Pto differed in its distribution of fspp, compared to the 14 non-R-genes.
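The final comparison step can be sketched in Python as below. The sketch takes the simulated (f, s, p1, p2) counts as given. It does not run Hudson's ms or the WH program. The function names are hypothetical, and skipping categories whose simulated mean is zero is an assumption made here to avoid division by zero.

```python
def chi2_like(counts, expected):
    """Sum of (Obs - Exp)^2 / Exp over the four fspp categories."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


def fspp_test(observed, replicates):
    """Compare an observed (f, s, p1, p2) tuple with simulated replicates.

    Returns the observed statistic and the proportion of replicates whose
    statistic is at least as large (an empirical P-value).
    """
    n = len(replicates)
    expected = [sum(r[i] for r in replicates) / n for i in range(4)]
    null = sorted(chi2_like(r, expected) for r in replicates)
    stat = chi2_like(observed, expected)
    p_value = sum(1 for x in null if x >= stat) / n
    return stat, p_value


# Hypothetical toy replicates; a real analysis would use 10,000 simulated tuples.
toy_reps = [(2, 3, 5, 4), (1, 4, 6, 3), (3, 2, 4, 5), (2, 3, 5, 4)]
toy_stat, toy_p = fspp_test((10, 0, 0, 4), toy_reps)
```

With 10,000 replicates, an empirical P-value below 0.05 corresponds to the observed statistic falling among the largest 500 simulated values.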

Frequency spectrum of nonsynonymous substitutions:

In our previous study (Rose et al. 2005), we determined whether 16 different Pto alleles from these wild species could confer avrPto recognition in planta and result in the activation of a disease resistance response (HR). These alleles from six different species were chosen to represent the range of amino acid variation observed at this locus among these species. The alleles were chosen independently of the resistance response of the host and derived from individuals either resistant (15 individuals) or susceptible (4 individuals) to Pst expressing avrPto. Sequences identical to the alleles hir183, hir137, and parv80 were amplified from additional conspecific individuals, accounting for the larger number of resistance phenotypes reported (19) vs. the number of alleles tested (16). Transient expression of these alleles in susceptible plants revealed that 11 alleles recognized AvrPto and activated the HR (classified as A+ alleles), while 5 did not (classified as A− alleles). To determine how natural selection had shaped the pattern of substitution at different positions in the Pto protein as a function of the effects of these substitutions, we evaluated the frequency spectrum of substitutions segregating in these two classes of alleles. The larger set of 48 Pto alleles from this study (pseudogenes omitted) was resampled to generate samples of 11 and 5, corresponding to the number of alleles in the A+ and A− classes, respectively. The actual number of singletons observed in the A+ and A− classes was compared to the distributions generated by this resampling procedure.
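A minimal Python sketch of this resampling procedure follows. It assumes that each allele in the full set has already been scored for the number of singletons it carries and that subsamples are drawn without replacement. The function name and the toy counts are hypothetical and are not the data from this study.

```python
import random


def singleton_resampling_p(singletons_per_allele, k, observed, reps=10000, seed=0):
    """Proportion of random k-allele subsamples carrying >= `observed` singletons.

    singletons_per_allele: one integer per allele in the full sample, giving
    the number of singleton substitutions that allele carries.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        subsample = rng.sample(singletons_per_allele, k)  # without replacement
        if sum(subsample) >= observed:
            hits += 1
    return hits / reps


# Hypothetical toy input: 48 alleles, 9 singletons spread over 6 of them.
toy_counts = [3, 2, 1, 1, 1, 1] + [0] * 42
toy_p = singleton_resampling_p(toy_counts, k=5, observed=6, reps=2000, seed=1)
```

Calling the function once with k = 5 and once with k = 11 mirrors the two sample sizes used for the A− and A+ classes.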


Intraspecific sequence polymorphism of Pto alleles:

We identified 60 sequences as alleles of Pto and these were subjected to population genetic analyses (Table 1). For 7 of 58 individuals, no Pto alleles were identified, although multiple clones from each were sequenced, resulting in the identification of alleles of Pto paralogs, confirming that the Pto gene family does exist in these individuals. Sequences considered to be alleles of Pto showed levels of synonymous site polymorphism and divergence consistent with estimates from single-copy genes in these same individuals (Tables 2 and 3; Baudry et al. 2001; Roselius et al. 2005), bolstering the inference that these sequences identified as Pto alleles are orthologous, rather than paralogous. Furthermore, no more than two different Pto alleles were found in any individual although many clones per individual were sequenced, consistent with the presumption that Pto is present as a single, rather than duplicated, gene in these individuals.

Table 1. Origin of individuals and Pto alleles

Table 2. Average pairwise differences per site within population and within species

Table 3. Polymorphism in L. peruvianum and L. chilense across loci

Within-species polymorphism was the lowest for the selfing species L. parviflorum, L. chmielewskii, and L. pimpinellifolium (Table 2). Of the three self-incompatible species, L. hirsutum had the lowest polymorphism (π = 0.005) and L. peruvianum had the highest (π = 0.017). Three test statistics (Tajima's D, Fu and Li's D, and Fu's F) were calculated for the self-incompatible species to determine if the levels of polymorphism and frequency spectrum within species conformed to expectations under the neutral theory (Tajima 1989; Fu and Li 1993; Fu 1997). These tests failed to detect significant deviations from neutrality (supplemental Table 1).

High levels of amino acid polymorphism at Pto:

The ratio of πnon to πsyn within the outcrossing species ranged from 0.45 to 1.00 (Table 2). This ratio was not calculated for the self-compatible species because of the low number of segregating sites observed in these species. For comparative purposes, the levels of πnon and πsyn were calculated for 14 additional loci from individuals of L. chilense (population LA2884) and L. peruvianum (population LA2744) (Figure 1, Table 3). These 14 loci showed much smaller ratios of πnon to πsyn. We used coalescent simulations to test if the value of π observed at nonsynonymous and synonymous sites fell within the 95% confidence interval of simulations in which θ was estimated from the average π across 14 non-R-genes in these species (Hudson 1990). These coalescent simulations indicated that Pto shows excess variation in both species, specifically at nonsynonymous sites (P < 0.001), while at synonymous sites the observed level of variation at Pto is within the 95% confidence interval on the basis of θ across these 14 other genes (Table 4). In fact, Pto in L. chilense shows a significantly higher level of variation, as captured in the summary statistic π, than the gene showing the highest level of nonsynonymous variation from our set of non-R-genes, namely CT148 in this species.

Table 4. Results of coalescent simulations

Figure 1. Average pairwise differences (π) at nonsynonymous and synonymous sites in exons of 15 genes in populations of (A) L. peruvianum, LA2744, and (B) L. chilense, LA2884. Genes CT143 and CT189 showed no polymorphism in L. chilense population LA2884 at nonsynonymous or synonymous sites. Arithmetic mean of π nonsynonymous (solid line) and π synonymous (shaded line) is based on 14 genes excluding Pto.

Interspecific sequence divergence of Pto within the Lycopersicon clade:

The sequence divergence between species was compared to the sequence polymorphism within species to evaluate the extent of shared polymorphism at this locus (Table 5). The level of divergence among alleles within the clade Lycopersicon (average pairwise difference per site = 0.02) was nearly the same as the level of polymorphism found in a single L. peruvianum population (π = 0.018 for LA3355). There were very few fixed differences between L. peruvianum and the other species at the Pto locus, indicating that substantial shared polymorphism exists between L. peruvianum and the other species (Table 5); also, a high proportion of fixed differences among species are polymorphic within L. peruvianum. L. peruvianum is known to exhibit much of the variation detected in other species of Lycopersicon. Baudry et al. (2001) reported that ∼40% of the sites with fixed differences among L. chilense, L. chmielewskii, L. hirsutum, and L. pimpinellifolium exhibited the same variants segregating within L. peruvianum. With the larger data set of 14 genes available now for these same populations, we tested whether the proportion of fixed differences between species that are still segregating in L. peruvianum is significantly higher at Pto compared to the 14 loci as might be expected under a scenario of balancing selection at Pto. A total of 143 sites across the 14 non-R-genes were found to be fixed between at least one pair of the three species L. chilense, L. chmielewskii, and L. pimpinellifolium, while 25 substitutions at Pto were fixed between at least one pair of the three species (Table 6). For the 14-gene data set, 46.85% (67/143) of the fixed differences were polymorphic within L. peruvianum. At Pto, this proportion was 60% (15/25). Bootstrapping the data set from the 14 non-R-genes revealed that a value >15 was found in only 6.51% of the bootstrap replicates. This suggests that the proportion of sites that are fixed between species and still segregating in L. peruvianum is higher at Pto than at the 14 other genes. Another interesting observation is that over half of these variants fixed between species but segregating in L. peruvianum encode amino acid differences (53.3%). At the other 14 loci, this proportion is 23.88%. Using the same bootstrap method as described above, this proportion of segregating variants encoding amino acid differences is statistically higher at Pto than at the other loci (P-value = 0.0195).

Table 5. The average pairwise nucleotide differences per site between alleles

Table 6. Fixed differences between species segregating in L. peruvianum population LA2744

WH analyses:

We modified the Wakeley–Hey model (Wakeley and Hey 1997) to test for differences among loci in the distribution of “fspp,” i.e., fixed differences (f), shared ancestral polymorphism (s), and private polymorphisms (p1, p2) between closely related species. Specifically, we were interested in whether the distribution of variation of fspp differed at Pto as compared to 14 other non-R-genes from L. peruvianum and L. chilense. If balancing selection were operating at Pto in these species, we would expect to see elevated shared polymorphism and a reduction of fixed differences between species. If directional selection were operating at Pto, we might see the opposite pattern: a reduction of shared polymorphism, many fixed differences among species, and a reduction of within-species polymorphism at Pto as compared to those other loci. First, we tested if the 14 non-R-genes differed in their distribution of fspp. We did not find any significant differences among these genes (supplemental Table 2). Likewise for Pto, the P-values were all >5%, no matter which data partition was considered (supplemental Table 2). In conclusion, we found no evidence that Pto (or any other gene tested) shows significant differences in the distribution of fspp among these species.

Skew in frequency spectrum of nonsynonymous substitutions:

Results from functional assays described in Rose et al. (2005) were used to classify substitutions at the Pto locus in terms of functional consequences of recognition and signaling ability. Of the 16 tested alleles, 11 were functional (denoted the A+ class) and 5 were not (A− class). The sequences of the 16 functionally tested Pto alleles were compared with a larger sample of alleles from these species (48 in total). In the sample of functionally tested alleles, 41 nonsynonymous substitutions were observed, 9 of which were found only once in the entire sample of 48 alleles (i.e., appeared as singletons). The A− class harbored a disproportionate number of these nonsynonymous singletons (6 of 9). The frequency skew in nonsynonymous substitutions among A− alleles raised the possibility that these polymorphisms represent deleterious substitutions and that these alleles were being kept in low frequency in these species by natural selection. These A− alleles may also accumulate nonsynonymous substitutions at a higher rate because they are no longer functional and are evolving neutrally.

On the other hand, considering amino acid polymorphisms in the A+ class, since these polymorphisms did not affect AvrPto recognition or signaling and are effectively neutral with respect to this phenotype, the frequency spectrum of these nonsynonymous changes would not be expected to be negatively skewed. We used a nonparametric resampling approach to ask: when any five alleles are sampled at random without replacement, what proportion of the time are six or more nonsynonymous singletons found? In only 2.83% of the replicate data sets were six or more nonsynonymous singletons observed (supplemental Table 3). Replicate data sets of 11 alleles (the A+ sample size) revealed that the probability of observing three or more nonsynonymous singletons (as observed in the A+ sample) is well within the range expected (P-value = 0.9349). For synonymous singletons, neither the 5-allele sample nor the 11-allele sample showed any deviation from the expected, indicating that the observed number of synonymous singletons is not disproportionately high for the A+ or A− classes (supplemental Table 3).


Comparisons of intraspecific variation between Pto and 14 other loci from these species established how the pattern of evolution differs at Pto as compared to a set of non-R-genes in these species. Three main conclusions emerged: (1) Pto shows significantly higher intraspecific amino acid variation than observed at other loci; (2) many of the substitutions in Pto, which are fixed between species, are still segregating in L. peruvianum; and (3) amino acid substitutions associated with loss of function are skewed toward low frequency, while amino acid substitutions not associated with loss of function are not. Therefore, Pto is not evolving in a similar manner as the other loci analyzed in these species. However, the distribution of variation at Pto does not fit the predictions of a straightforward balanced polymorphism as reported for some other resistance loci (Caicedo et al. 1999; Stahl et al. 1999; Tian et al. 2002; Mauricio et al. 2003; Rose et al. 2004; reviewed in Meyers et al. 2005; Tiffin and Moeller 2006). While elevated amino acid polymorphism relative to silent polymorphism is expected under a scenario of balanced polymorphism, amino acid polymorphism alone does not provide unequivocal support for this hypothesis. If balancing selection were operating, the ratio of replacement to silent polymorphism should be significantly elevated relative to the ratio of replacement to silent divergence inferred using an outgroup. In the case of Pto, unfortunately, our outgroup sequence from S. ochranthum encoded a pseudogene, precluding the use of conventional tests of neutrality (i.e., McDonald–Kreitman and HKA tests). Therefore, other methods were considered to evaluate the variation and to test alternative hypotheses.

One possible explanation for the elevated amino acid polymorphism and detection of pseudogenes would be relaxed constraint. While this is certainly a viable hypothesis, additional aspects of the data should be taken into account. For instance, if Pto were evolving under relaxed constraint, we would predict that the frequency spectrum for amino acid substitutions that disrupt function should be the same as that of substitutions that do not disrupt function. However, substitutions limited to those alleles that fail to signal AvrPto recognition were at low frequency in these species, while substitutions that are not implicated in knocking out AvrPto recognition were found at higher frequency. In the former case, purifying selection may be operating to remove or keep deleterious substitutions at low frequency, while, in the latter case, the amino acid variation observed could be selectively neutral or possibly maintained by natural selection.

Amino acid sequence conservation at Pto:

Other indications that purifying selection is operating at this locus come from jointly considering the natural variation at Pto and structure–function relationships of Pto and other kinases. The catalytic core of kinases consists of ∼300 residues encompassing 12 uninterrupted subdomains folded into a bilobal structure (Hanks and Hunter 1995). Of the ∼300 residues, 12 are essentially invariant among kinases, and these 12 sites are also invariant in the collection of 48 Pto alleles described here. Furthermore, functional studies of the Pto protein using site-directed and random mutagenesis have specifically characterized the functional consequences of variation at 66 positions (20%) of the protein (Salmeron et al. 1994; Scofield et al. 1996; Frederick et al. 1998; Rathjen et al. 1999; Sessa et al. 2000; Xiao et al. 2003; Wu et al. 2004; Bernal et al. 2005; de Vries et al. 2006). The vast majority (>86%) of the positions that knock out Avr recognition and/or signaling are invariant among the alleles collected from these natural populations. Only 6 of 43 sites demonstrated through mutagenesis to knock out or severely alter Pto function are variable in our set of alleles. Two of these polymorphic positions (sites 185 and 313) are singleton polymorphisms restricted to two different A− alleles (hir46 and chi487, respectively), consistent with previous functional results.

The other four positions (sites 201, 202, 205, and 208) that are polymorphic among our alleles were restricted to the P+1 loop of the protein (positions 201–210). This region of protein kinases often serves as the ligand-binding site and is in close proximity to the T-loop, where regulatory phosphorylation takes place (Hanks and Hunter 1995). To investigate the sensitivity of this region to the presence of negatively charged molecules (as a mimic of phosphorylation), Rathjen et al. (1999) systematically altered the positions in this region to create a negatively charged amino acid (aspartic acid) to address how perturbations in this P+1 loop affected Pto function (Rathjen et al. 1999; Wu et al. 2004). Nearly all acidic substitutions in the P+1 loop led to gain-of-function phenotypes; i.e., these altered proteins constitutively activated the hypersensitive disease response, even in the absence of a pathogen ligand. Residues in the P+1 loop are normally hydrophobic, polar, or positively charged, but never negatively charged. A substitution of an aspartic acid is a drastic change and none of our polymorphisms led to such great differences in polarity. At three of the polymorphic positions (201, 205, and 208), the segregating variants were all the same polarity (i.e., all hydrophobic). At position 202, a lysine (positively charged, hydrophilic) and a glutamine (polar, hydrophilic) segregated. In all cases, these polymorphisms are found among alleles that are functional in activating an AvrPto-dependent HR (A+ class) and, therefore, these particular variants are unlikely to be subject to negative (purifying) selection. Therefore, although Pto shows high levels of amino acid polymorphism, there is no evidence of an excess of amino acid variation at sites that disrupt protein function. The polymorphism that is associated with loss of function and that is present in these alleles is at low frequency in these species.

Conservation and polymorphism:

The fact that the majority of plants (75%) are resistant to Pst and, of the ones tested, most carried a functional Pto gene implied that this trait is selectively favored and maintained in many natural populations (Rose et al. 2005). Which scenario may best explain this mixture of evidence for purifying selection and maintenance of polymorphism observed at Pto? Simple deterministic models of host–parasite dynamics show that polymorphism can be maintained at host R-genes and pathogen Avr genes, provided that virulence and resistance carry fitness costs (Barrett 1988; Seger 1988; Leonard 1994). If the host is continuously exposed to the pathogen, differentiated host alleles can be maintained over long periods of time and prolong the age of these alleles relative to the neutral expectation. This can result in an accumulation of linked neutral variation around the R-gene locus. Within the individual allelic classes, purifying selection operates to maintain the specificity and function of these alleles. In a few cases, the distribution of variation within and between resistance and susceptible haplotypes has matched many of these predictions (Stahl et al. 1999; Tian et al. 2002).
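The role of costs in such deterministic models can be sketched with a deliberately simplified haploid gene-for-gene recursion. This is an illustrative toy, not the specific model of any of the cited studies. All symbols are assumptions introduced here: `p` is the frequency of the resistant host allele, `q` the frequency of the virulent pathogen genotype, `s` the host fitness loss from disease, `t` the pathogen fitness loss when an avirulent strain meets a resistant host, and `c_r` and `c_v` the costs of resistance and virulence.

```python
def fitnesses(p, q, s, t, c_r, c_v):
    """Fitness of each host and pathogen type under the toy assumptions."""
    w_res = 1.0 - c_r - s * q   # resistant host: diseased only by virulent strains
    w_sus = 1.0 - s             # susceptible host: diseased by every strain
    w_vir = 1.0 - c_v           # virulent strain: pays the cost on all hosts
    w_avr = 1.0 - t * p         # avirulent strain: blocked on resistant hosts
    return w_res, w_sus, w_vir, w_avr


def step(p, q, s, t, c_r, c_v):
    """One generation of selection on host and pathogen frequencies."""
    w_res, w_sus, w_vir, w_avr = fitnesses(p, q, s, t, c_r, c_v)
    w_host = p * w_res + (1.0 - p) * w_sus
    w_path = q * w_vir + (1.0 - q) * w_avr
    return p * w_res / w_host, q * w_vir / w_path


def interior_equilibrium(s, t, c_r, c_v):
    """Return (p*, q*) where both types coexist, or None if no such point.

    Setting w_res = w_sus gives q* = 1 - c_r / s;
    setting w_vir = w_avr gives p* = c_v / t.
    """
    if s <= 0.0 or t <= 0.0:
        return None
    p_star = c_v / t
    q_star = 1.0 - c_r / s
    if 0.0 < p_star < 1.0 and 0.0 < q_star < 1.0:
        return p_star, q_star
    return None
```

In this sketch, `interior_equilibrium` returns a point only when both costs are positive (and smaller than `s` and `t`, respectively). Setting either cost to zero pushes the equilibrium to a boundary where one allele is lost. The sketch says nothing about whether the interior point is stable, which depends on model details beyond this illustration.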


Recent theoretical work has revealed that, even without costs of resistance or virulence, polymorphism for resistance can be maintained. Incorporating the spatial scale of dispersal of hosts and pathogens in a metapopulational framework, Thrall and Burdon (2002) showed that variation in host and pathogen populations is a stable outcome even without assuming costs. Likewise, Salathe et al. (2005) found that polymorphism can be maintained without costs if the assumption of infinite population sizes in the host and pathogen is relaxed. Applied to the Pto system, periods when Pto experiences selection by pathogens expressing AvrPto or AvrPtoB could alternate with periods when the pathogen population is dominated by genotypes lacking these Avr genes, relaxing constraint on Pto. During these intervals, neutral or deleterious substitutions could accumulate. Each round of relaxed selection could spin off novel null alleles and pseudogenes. Reinvasion of pathogens expressing functional Avr genes would favor those individuals with intact Pto molecules, thereby explaining the paradoxical signatures of both purifying selection and relaxed constraint.

Incorporating costs of virulence or resistance into these models stabilizes these dynamics and can lead to the maintenance of even greater genotypic diversity in host and parasite (Salathe et al. 2005). In particular, among all four possible combinations of costs, namely (1) no costs, (2) cost of resistance, (3) cost of virulence, or (4) costs of both resistance and virulence, the greatest diversity in host R-genes is found when virulence is costly. In the Pto system, strains lacking AvrPto and AvrPtoB are able to infect hosts with and without the Pto resistance gene. These strains are considered “virulent.” However, there are many indications that this virulence is costly; that is, the pathogen pays a fitness penalty for losing a particular avirulence factor. For example, in Pto-minus plants, Pst expressing AvrPto grows 10 times as much as the isogenic strain lacking AvrPto (Chang et al. 2000). Hosts infected with Pst+AvrPto show more visible disease symptoms, including speck formation, enhanced to severe necrosis, and leaf dehydration. AvrPto can suppress the cell death response to nonhost pathogens (Kang et al. 2004) and suppresses cell-wall-based defense in Arabidopsis (Hauck et al. 2003). Likewise, AvrPtoB, which encodes an E3 ubiquitin ligase, is capable of suppressing programmed cell death in Nicotiana benthamiana (Abramovitch et al. 2003; Jamir et al. 2004; Lin et al. 2006). AvrPtoB is a homolog of a gene encoding another well-characterized virulence factor, VirPphA, and when AvrPtoB is expressed in a disabled strain of P.s. pv. phaseolicola, it can restore water-soaking ability to this pathogen (Jackson et al. 2002). Recently, a double-mutant Pst strain lacking both AvrPto and AvrPtoB was generated (Lin and Martin 2005). This strain shows an even slower growth rate in susceptible hosts than either of the single-mutant Pst strains, indicating that AvrPto and AvrPtoB contribute additively to the virulence activity of Pst. As modeling has demonstrated, such a fitness cost of virulence (i.e., the loss of particular Avr genes) may help explain the maintenance of host polymorphism at Pto.

Maintenance of variation—the effect of dual or multiple specificities:

A cost of virulence may underlie the maintenance of functional and nonfunctional forms of Pto, but how do we explain the presence of diversity among functional alleles? One possibility is that this diversity is neutral with respect to Pto function. Non-R-genes in these species show very low levels of amino acid polymorphism and tests of neutrality reveal that purifying selection is the prevailing selective pressure operating at these loci (Roselius et al. 2005). Assuming that the underlying neutral mutation rate is the same across these loci, we would conclude that a larger proportion of the Pto protein can tolerate amino acid variation than can the non-R-genes. However, Pto is a small, compact protein of 321 amino acids. This functionally well-characterized R-gene has dual specificity, interacting with at least two different avirulence proteins, AvrPto and AvrPtoB. This dual specificity of Pto may give us some insight into the maintenance of amino acid differentiation among Pto alleles. For the most part, these two Avr proteins appear to bind and activate Pto in a similar way (Kim et al. 2002; Bernal et al. 2005). However, we recently used a DNA-shuffling approach to generate chimeric Pto proteins and tested these chimeric molecules for AvrPto and AvrPtoB recognition ability (Bernal et al. 2005). A small number of the tested chimeras showed differential recognition specificity: 12 chimeras bound AvrPto, but not AvrPtoB, while 5 interacted with AvrPtoB, but not AvrPto. Amino acid positions that were critical for this differential recognition specificity were identified, including Ser76, Gly78, and Leu213. As particular amino acid substitutions in Pto can differentiate the recognition of these two avirulence proteins, it is possible that the amino acid variation among Pto alleles could be maintained by quantitative or qualitative differences in specificity or binding ability for these or additional avirulence proteins.
“Multiple recognition” may be viewed as a consequence of the host adapting to recognize different pathogen effector molecules or, alternatively, of the pathogen targeting specific host proteins. Ultimately, both evolutionary scenarios may lead to the maintenance of Pto alleles differing in effector binding ability and could contribute to the maintenance of high levels of amino acid variation at Pto as compared to other loci in these individuals.


We thank H. Akashi, J. Baines, W. Gilliland, J. Hermisson, M. Koch, J. Parsch, P. Pennings, R. Ree, W. Stephan, and two anonymous reviewers for valuable input and critique. The plant seeds were provided by the C. M. Rick Tomato Genetics Resource Center (TGRC) and the USDA Plant Genetic Resources Unit. This work was supported by a National Science Foundation (NSF) Dissertation Improvement Grant (9902342) and a Jastro Shields Graduate Research Award to L.E.R., and an NSF Cooperative Agreement (BIR-8920216) to the Center for Engineering Plants Resistant Against Pathogens.


  • Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under the accession nos. DQ019170–DQ019221.

  • Communicating editor: D. Weigel

  • Received July 16, 2006.
  • Accepted December 8, 2006.

