Abstract
The Arabidopsis thaliana CLAVATA2 (CLV2) gene encodes a leucine-rich repeat protein that regulates the development of the shoot meristem. The levels and patterns of nucleotide variation were assessed for CLV2 and 10 flanking genes that together span a 40-kb region of chromosome I. A total of 296 out of 7959 sequenced nucleotide sites were polymorphic. The mean levels of sequence diversity of the contiguous genes in this region are approximately twofold higher than those of other typical Arabidopsis nuclear loci. There is, however, wide variation in the levels and patterns of sequence variation among the 11 linked genes in this region, and adjacent genes appear to be subject to contrasting evolutionary forces. CLV2 has the highest levels of nucleotide variation in this region, a significant excess of intermediate frequency polymorphisms, and significant levels of intragenic linkage disequilibrium. Most alleles at CLV2 are found in one of three haplotype groups of moderate (>15%) frequency. These features suggest that CLV2 may harbor a balanced polymorphism.
BALANCED polymorphisms are maintained in populations by selective forces acting on alternative alleles of a locus (Richman 2000; Tian et al. 2002). Various forms of balancing selection as well as local adaptation can lead to the persistence of allelic variants of a gene in a species. Molecular population genetic analyses have identified several examples of balanced polymorphisms in eukaryotic genes, including the Adh locus in Drosophila melanogaster (Kreitman and Hudson 1991), the self-incompatibility locus in various plant species (Richman 2000; Uyenoyama 2000), and the Rpm1 disease-resistance gene in Arabidopsis thaliana (Stahl et al. 1999). In balanced polymorphisms, selection is expected to maintain a region of enhanced variability of neutral polymorphisms surrounding a selected site, resulting in correlated gene genealogies among linked loci (Nordborg et al. 1996; Tian et al. 2002). The window of increased variation in outcrossing species, however, can be fairly narrow as recombination breaks apart correlations among linked sites surrounding a target of balancing selection (Nordborg et al. 1996). In general, the scale of elevated variation in species such as D. melanogaster is <1 kb; variation around the Fast/Slow Adh polymorphism, for example, is enhanced in a region of ∼200 bp (Kreitman and Hudson 1991).
In selfing species, the width of the genomic region of enhanced variation scales with the inverse of the population recombination parameter C = 4Ner′, where Ne is the effective population size and r′ is the selfing-reduced effective recombination rate (Charlesworth et al. 1997). Since r′ in selfing species is generally lower than the recombination rate in outcrossing species, the window of enhanced variation surrounding a balanced polymorphism should be wider in selfers than in outcrossers. Indeed, a very low recombination rate can result in balanced polymorphisms encompassing large tracts of linked sites in the genome. Thus, in selfing species, selection for a balanced polymorphism can affect the genetic diversity and evolutionary dynamics of both adjacent and distant genes.
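The dependence of C on the selfing rate can be illustrated with a short calculation. The sketch below is not taken from this study: it assumes the commonly used approximations F = s/(2 - s) for the equilibrium inbreeding coefficient at selfing rate s and r′ = r(1 - F) for the effective recombination rate. The function names and the example values of Ne and r are hypothetical.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient, assuming F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie between 0 and 1")
    return selfing_rate / (2.0 - selfing_rate)


def effective_recombination_rate(r, selfing_rate):
    """Selfing-reduced recombination rate, assuming r' = r * (1 - F)."""
    return r * (1.0 - inbreeding_coefficient(selfing_rate))


def population_recombination_parameter(ne, r, selfing_rate):
    """C = 4 * Ne * r', following the definition of C given in the text."""
    return 4.0 * ne * effective_recombination_rate(r, selfing_rate)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    ne, r = 1e5, 1e-8
    for s in (0.0, 0.5, 0.99):
        print(s, population_recombination_parameter(ne, r, s))
```

Under these assumptions, a selfing rate of 0.99 reduces r′ to about 2% of r, so the window of enhanced variation, which scales with 1/C, would be roughly 50 times wider than in an outcrosser with the same Ne and r.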
A. thaliana provides an excellent opportunity to empirically assess the genomic impact of balanced polymorphisms in a predominantly selfing plant species. Outcrossing rates in this weedy plant species are estimated to be as low as 1% (Abbot and Gomez 1989). Species-wide surveys of nucleotide variation reveal a low level of recombination within nuclear loci (Kawabe et al. 1997; Miyashita et al. 1998; Nordborg et al. 2002). This low effective recombination rate may lead to strong correlations among nucleotide polymorphisms over long distances in the genome. In A. thaliana, linkage disequilibrium among polymorphic nucleotide sites is observed both within and among genes, and disequilibrium tracts can extend up to ∼250 kb (Nordborg et al. 2002). In contrast, linkage disequilibrium generally decays within several hundred basepairs in D. melanogaster (Long et al. 1998) and within 1.5 kb in the outcrossing plant Zea mays (Remington et al. 2001; Tenaillon et al. 2001).
Low effective recombination and long-range linkage disequilibrium in A. thaliana suggest that the region of enhanced variation associated with a balanced polymorphism could extend over several linked genes. This linkage may affect the rate and efficacy of selection on alternate alleles. Recent studies, however, contradict this prediction; the effects of balanced polymorphisms in the TFL1 gene (Olsen et al. 2002), the RPS5 disease-resistance locus (Tian et al. 2002), and the enzyme locus PgiC (Kawabe et al. 2000) appear to be quite localized. To clarify how far the effects of selection can extend in the A. thaliana genome, we have undertaken a systematic investigation of a putative balanced polymorphism in the CLAVATA2 (CLV2) gene.
CLV2 is a meristem regulatory gene located near 89 cM on chromosome I. Loss-of-function mutations at CLV2 result in the accumulation of undifferentiated cells in vegetative, inflorescence, and floral meristems. The enlargement of these shoot meristems contributes to the formation of extra flowers and floral organs (Kayes and Clark 1998). The 720-amino-acid (aa) protein encoded by CLV2 includes a signal peptide, a putative extracellular domain with ∼20 leucine-rich repeats (LRR), a transmembrane region, and a short cytoplasmic domain (Jeong et al. 1999). Although CLV2 is structurally similar to the Cf family of disease-resistance proteins from tomato, the complexes formed by CLV2 and the Cf proteins are quite different (Rivas et al. 2002). CLV2 appears to be necessary for protein accumulation of CLV1, an LRR receptor kinase (Jeong et al. 1999). These two proteins are hypothesized to form a disulfide-linked heterodimer in the plasma membrane, although there is not yet direct evidence for this interaction. When bound by a multimeric ligand that includes the CLV3 protein (Trotochaud et al. 2000), the activated CLV1-CLV2 complex triggers a signal transduction cascade that ultimately represses the WUSCHEL gene, a transcription factor gene that promotes shoot meristem growth (Trotochaud et al. 1999; Brand et al. 2000).
The initial isolation of CLV2 showed that this gene harbors a large amount of nucleotide diversity (Jeong et al. 1999). This elevated variation might be caused by the maintenance of a balanced polymorphism in CLV2, by correlated effects of selection at neighboring loci, or by a high rate of mutation in this portion of the genome. Here we report a molecular population genetic analysis of CLV2 and 10 adjacent genes that are found in a 40-kb region of the genome. Consistent with the low effective recombination in A. thaliana, linkage disequilibrium persists not only within genes in the CLV2 region, but also between loci separated by as much as 25 kb. Genes in this region also show the elevated silent site nucleotide variation associated with the effects of balancing selection. The level of variation is highest at CLV2; however, there is a nearly 10-fold range in the levels of nucleotide diversity among the neighboring loci. Moreover, the allelic distribution of nucleotide variation varies markedly in this region. Although the CLV2 gene has a significant excess of intermediate frequency polymorphisms and significant intragenic linkage disequilibrium (consistent with balancing selection), most nearby loci have an excess of rare polymorphisms. These results indicate that adjacent genes may have differing patterns and levels of nucleotide variation, suggesting that they are subject to contrasting evolutionary forces.
MATERIALS AND METHODS
Isolation and sequencing of alleles: A. thaliana ecotypes were obtained from single-seed propagated material provided by the Arabidopsis Biological Resource Center (ABRC; see Table 1). The Lisse-2 seed stock was from the population collection of P. H. Williams maintained at ABRC. A. lyrata seed from a Karhumaki, Russia, population was provided by O. Savolainen and Helmi Kuittinen.
Genomic DNA was isolated from young leaves of 9-20 A. thaliana ecotypes and one A. lyrata accession using the plant DNeasy mini kit (QIAGEN, Chatsworth, CA). PCR primers for 11 genes in this region were designed from the Col-0 genomic sequence [bacterial artificial chromosome (BAC) T8F5, GenBank accession no. AC004512] using Primer3 (Rozen and Skaletsky 1998); all primers were located in predicted exons. Primers were chosen without regard to predicted functional domains, but are biased toward the 5′ end of coding sequences. The sequenced genes (see Table 2) and the PCR primers used in the amplification reactions are described in Supplementary Text I at http://www.genetics.org/supplemental/. PCR of A. thaliana samples was performed with Taq DNA polymerase (Eppendorf, Madison, WI) using protocols designed for direct sequencing. PCR of A. lyrata samples was performed with the error-correcting Pwo polymerase (Roche) using the manufacturer’s amplification protocol. The error rate of this error-correcting polymerase is <1 in 7000 bp (M. Purugganan, unpublished observations).
DNA fragments were purified using the QIAquick PCR purification kit or the QIAquick gel extraction kit (QIAGEN). A. thaliana samples were sequenced directly via cycle sequencing with BigDye terminators (Applied Biosystems) using the primers described in Supplementary Text I at http://www.genetics.org/supplemental/. Several singleton polymorphisms were confirmed with reamplification and sequencing. Amplified A. lyrata products were cloned into pCR4Blunt-TOPO vector using the Zero Blunt TOPO PCR cloning kit (Invitrogen). Plasmid miniprep DNA was isolated using the QIAprep miniprep kit (QIAGEN), and sequenced twice via cycle sequencing from both directions. DNA sequencing was conducted with a Prism 3700 96-capillary automated sequencer (Applied Biosystems). The PHRED and PHRAP functions (Ewing and Green 1998; Ewing et al. 1998) of BioLign 2.0.7 (Tom Hall, North Carolina State University) were used to call bases and to create contigs; low-quality sequence was trimmed from contigs. GenBank accession numbers for these genes are AF528566-AF528713.
Molecular population genetic data analysis: Sequences used in this study were visually aligned against the A. thaliana GenBank sequence for the Col-0 accession (no. AC004512). The variable length portions of microsatellites were excluded from the analysis. The A. lyrata ortholog was used as the outgroup. Interspecific divergence distances were estimated from silent sites with the Kimura two-parameter model using MEGA2.1 (Kumar et al. 2001). Polymorphism analyses were conducted using DnaSP 3.51 (Rozas and Rozas 1999). Levels of nucleotide diversity per site were estimated as π (Tajima 1983) and θW (Watterson 1975). The Tajima (1989) and Fu and Li (1993) tests for selection were conducted; Fu and Li’s test was performed both with (D) and without (D*) the A. lyrata outgroup sequence. Significance of Tajima’s and Fu and Li’s test statistics was determined in coalescent simulations with 10,000 runs using the number of segregating sites under a model of no recombination. Linkage disequilibrium between informative sites within and between genes was estimated as r² (Hill and Robertson 1968) with significance determined by Fisher’s exact tests. Levels of intragenic disequilibrium were also quantified by the ZnS statistic (Kelly 1997) with deviation from neutral-equilibrium expectations determined by coalescent simulations with 10,000 runs using the recombination parameter estimated from the data.
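The logic of a coalescent significance test conditioned on the number of segregating sites, with no recombination, can be sketched as follows. This is an illustrative reimplementation under the infinite-sites model, not the procedure implemented in DnaSP. All function names are hypothetical, and only an upper-tail probability for Tajima's D is computed.

```python
import math
import random


def tajimas_d_from_summary(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


def simulate_branches(n, rng):
    """One random coalescent tree as (branch_length, descendant_tips) pairs."""
    active = [(1, 0.0)] * n  # (tips below the lineage, time the lineage began)
    now = 0.0
    branches = []
    while len(active) > 1:
        k = len(active)
        now += rng.expovariate(k * (k - 1) / 2.0)  # time to the next coalescence
        i, j = rng.sample(range(k), 2)  # the two lineages that coalesce
        merged_tips = 0
        for idx in (i, j):
            tips, born = active[idx]
            branches.append((now - born, tips))
            merged_tips += tips
        active = [lin for idx, lin in enumerate(active) if idx not in (i, j)]
        active.append((merged_tips, now))
    return branches


def simulate_d(n, seg_sites, rng):
    """Drop exactly S mutations on a tree, with probability proportional to branch length."""
    branches = simulate_branches(n, rng)
    weights = [length for length, _ in branches]
    hits = rng.choices(branches, weights=weights, k=seg_sites)
    n_pairs = n * (n - 1) / 2.0
    # A site carried by `tips` of n sequences differs in tips * (n - tips) pairs.
    mean_diff = sum(tips * (n - tips) for _, tips in hits) / n_pairs
    return tajimas_d_from_summary(n, seg_sites, mean_diff)


def upper_tail_p(n, seg_sites, observed_d, runs=10000, seed=1):
    """Fraction of simulated D values at least as large as the observed D."""
    rng = random.Random(seed)
    exceed = sum(simulate_d(n, seg_sites, rng) >= observed_d for _ in range(runs))
    return exceed / runs
```

Because every simulated sample shares a single genealogy, the variance of D is larger than it would be with recombination, which is why the no-recombination model is the conservative choice for a selfing species. The sketch requires n ≥ 4 and at least one segregating site.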
The Hudson-Kreitman-Aguadé (HKA) two-locus test (Hudson et al. 1987) was conducted using silent site changes from a program available from Jody Hey (Rutgers University). The Adh locus was chosen as the reference neutral locus in these tests (Innan et al. 1996; Miyashita et al. 1998). Some studies suggest that this gene may harbor a balanced polymorphism (Miyashita 2001), which may indicate that using this gene as a reference locus is conservative when testing for the hypothesis of balancing selection. Among several genes, however, the pattern of variation at Adh is one that is most consistent with neutral-equilibrium expectations under a metapopulation model (Innan and Stephan 2000).
Previously published A. thaliana sequences of the following genes, which were available at the time of this study, were used in comparisons of nucleotide diversity: Adh (Innan et al. 1996; Miyashita et al. 1998); AP1, LFY, and TFL1 (Olsen et al. 2002); AP3 and PI (Purugganan and Suddith 1999); CAL (Purugganan and Suddith 1998); CHI (Kuittinen and Aguadé 2000); ChiA (Kawabe et al. 1997); ChiB (Kawabe and Miyashita 1999); F3H and FAH1 (Aguadé 2001); PgiC (Kawabe et al. 2000); and RPS2 (Caicedo et al. 1999). The same set of genes, with the exception of ChiB and RPS2, was used in interspecific divergence comparisons.
RESULTS
Nucleotide variation among linked genes in a 40-kb region of Arabidopsis chromosome I: Fragments of 11 adjacent genes on chromosome I were sequenced from 9 to 12 A. thaliana accessions sampled primarily from Eurasia (Tables 1 and 2). The sequenced regions spanned exons and (when present) introns within the coding region of each gene; fragments ranged from 277 to 939 bp, with a mean length of 724 bp/gene. Of the 7959 nucleotide sites sequenced for this study, 296 sites segregated for single nucleotide polymorphisms. Twenty-eight indel polymorphisms, ranging from 1 bp to 3.9 kb, were also observed in these sequences. Four indels, two in the serpin and two in the ARI/RING-like gene, are associated with simple sequence or microsatellite repeats in introns. Seven indels occur in coding regions. Tables of polymorphic sites are given in Supplementary Figures S1-S7 at http://www.genetics.org/supplemental/.
Polymorphisms in the UBQ13, the MATH domain gene, and the serpin suggest that these loci may be pseudogenes. All sampled UBQ13 alleles contain a partial ubiquitin repeat followed by three or four complete repeats. We were unable to locate the rest of the repeat in the upstream genomic sequence of the Col-0 accession. The internal repeats appear to have undergone substantial recombination; because homology among these repeats was difficult to determine, analyses were restricted to the 5′ flanking region, the partial repeat, and the first and last complete repeats. One allele of UBQ13 codes for a premature stop codon, while two alleles contain 3- or 12-bp deletions in coding sequence. The Col-0 allele contains a 3.9-kb insertion of mitochondrial DNA (Sun and Callis 1993) that was not observed in any other accession. One allele of the MATH domain gene codes for a premature stop codon, while another allele has a frameshift mutation. The putative serpin gene has multiple lesions, including three alleles with premature stop codons, three with frameshift mutations, and five with a 39-bp deletion in the coding region. The large number of potentially deleterious polymorphisms in the UBQ13 and serpin genes suggests that they are recent pseudogenes. It is unclear whether the MATH domain gene, which is expressed in Col-0, segregates for rare deleterious alleles or is an incipient pseudogene. In estimating levels of silent site nucleotide diversity for these loci, we have aligned these genes according to their predicted coding potential.
Table 1. A. thaliana accessions
Table 2. Genes in the A. thaliana CLV2 genomic region
Estimates of silent nucleotide diversity and divergence in the CLV2 region: Nucleotide diversity at silent sites (third position of codons and noncoding regions) for these 11 genes was estimated from the average number of pairwise differences (π; Tajima 1983) and from the number of segregating sites (θW; Watterson 1975). Focusing on silent sites permits comparisons of sequences with different proportions of coding to noncoding sequences. In addition, the amount of silent site diversity provides information about the action of selection at linked sites. Among the genes in the CLV2 region, levels of π span nearly one order of magnitude, from 0.0063 to 0.0579 (see Table 3A and Figure 1). Levels of θW show a comparable range, from 0.0075 to 0.0489 (Table 3A). CLV2 exhibits the greatest silent site diversity, with the highest π and the second highest θW. The high value of θW observed at the putative antigen receptor can be attributed to a single allele from the Ita-0 accession that accounts for 7 of the 10 segregating sites in this gene. The lowest levels of π and θW were observed 3 kb upstream of CLV2 in the MATH domain gene and 17 kb downstream in the ARI/RING-like gene.
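The two diversity estimators can be illustrated with a minimal sketch. It assumes an ungapped alignment of equal-length sequences with no missing data, and it does not separate silent from replacement sites as the analysis reported here does. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by alignment length."""
    n, length = len(seqs), len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2.0
    return total_diffs / n_pairs / length


def watterson_theta(seqs):
    """Per-site theta_W: S / a1 divided by length, with a1 = sum of 1/i for i = 1..n-1."""
    n, length = len(seqs), len(seqs[0])
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / length


if __name__ == "__main__":
    toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACCTACGT"]
    print(nucleotide_diversity(toy_alignment), watterson_theta(toy_alignment))
```

Because π weights each polymorphism by its frequency while θW only counts segregating sites, a single divergent allele that contributes many singletons raises θW more than π.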
Overall, the CLV2 region exhibits elevated levels of silent site nucleotide diversity compared to other nuclear genes in A. thaliana. The mean values of π and θW for the 11 genes in the CLV2 region are 0.0219 ± 0.005 and 0.0241 ± 0.004, respectively. These mean diversity levels are considerably greater than those observed among 14 previously studied A. thaliana genes. For these other loci (see materials and methods), the mean values of silent site π and θW are 0.009 ± 0.001 and 0.012 ± 0.002, twofold lower than those of the genes in the CLV2 region (see Figure 2). In contrast, the 11 genes in the CLV2 region display only slightly higher levels of nucleotide divergence between A. thaliana and the closely related species A. lyrata (Table 3A). The mean level of silent site sequence divergence, K2P, between these two species for 12 previously studied A. thaliana genes is 0.123 ± 0.007 substitutions/site. The mean nucleotide divergence level for the 11 linked genes in the CLV2 region is 0.138 ± 0.010 substitutions/site.
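For reference, the Kimura two-parameter distance can be computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below is an illustration, not the MEGA implementation: it skips any column containing a character other than A, C, G, or T, and it does not restrict the comparison to silent sites. The function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine-purine or pyrimidine-pyrimidine change
        else:
            transversions += 1  # purine-pyrimidine change
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1.0 - 2.0 * p - q) - 0.25 * math.log(1.0 - 2.0 * q)
```

The correction matters little at the divergence levels reported here (roughly 0.12 to 0.14 substitutions/site) but grows as sequences approach saturation.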
The HKA test (Hudson et al. 1987) detects differences in nucleotide variation levels between two loci when corrected for mutation rate variation. This test was applied to the genes in the CLV2 region, with the Adh gene serving as the reference locus. The observed numbers of silent intraspecific polymorphisms and interspecific differences for Adh are 30 and 124.12, respectively. The numbers of silent site within-species polymorphisms and between-species differences for each gene at the CLV2 region are indicated in Table 3. The HKA tests reveal that three genes have significant increases in nucleotide variation levels (Table 4A). The three genes that display significant deviation from the neutral-equilibrium model based on the HKA test are CLV2 (P < 0.01), AtNAP11 (P < 0.03), and the antigen receptor gene (P < 0.04). The non-neutral evolution at these loci is associated with an excess of intraspecific variation for each gene as compared to the neutral Adh locus.
Selective forces among linked genes in the CLV2 region: The frequency distribution of polymorphisms provides information on the relative roles of neutral drift vs. selection at specific loci. The skewness of frequency distributions for nucleotide polymorphisms in the sample or along branches in the gene genealogy can be evaluated with the Tajima (Tajima 1989) or Fu and Li (Fu and Li 1993) tests for selection, respectively. Since A. thaliana may have experienced a recent population expansion, these two tests should be interpreted with caution when inferring selection. However, they may still provide information on the extent and direction of deviations in molecular diversity patterns from predictions of the neutral-equilibrium model, as well as permit comparison of relative patterns of nucleotide variation between genes. To take into account the selfing nature of A. thaliana, the significance of these test statistics was assessed by coalescent simulations under a stringent model of no recombination.
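The direction of Tajima's D can be made concrete with a small example. The sketch below computes D from an ungapped alignment using the standard constants of Tajima (1989). It is an illustration rather than the DnaSP implementation, it assumes each segregating site is a single mutation, and the function name and toy alignments are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for an ungapped alignment; returns None when no site segregates."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed for a nonzero variance")
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    mean_diff = total_diffs / (n * (n - 1) / 2.0)  # k, mean pairwise differences
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_diff - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    intermediate = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACCTACGT"]  # variants at 2/4
    singletons = ["AAAA", "CAAA", "ACAA", "AACA"]  # every variant seen once
    print(tajimas_d(intermediate), tajimas_d(singletons))
```

In the first toy alignment both variants are at intermediate frequency and D is positive; in the second every variant is a singleton and D is negative, matching the interpretation used in the text.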
Among the 11 genes in the CLV2 region, 8 have negative values of Tajima’s D and Fu and Li’s D and D*, indicating an excess of low-frequency polymorphisms within these loci (Table 4A). The trend toward excess low-frequency polymorphism for most of the genes at the CLV2 region is similar to that observed for many other Arabidopsis nuclear genes (Purugganan and Suddith 1999; Innan and Stephan 2000; Kuittinen and Aguadé 2000). This pattern of variation may reflect the inbreeding associated with this selfing plant and/or rapid post-Pleistocene range expansion of this species (Sharbel et al. 2000). However, only 2 genes, the putative serpin (Fu and Li D = -1.6981, P < 0.05) and the putative antigen receptor (Tajima’s D = -1.9246, P < 0.05; Fu and Li D* = -2.2497, P < 0.01), show significantly negative values of at least one test statistic. In the latter case, this significant excess in low-frequency polymorphisms is largely due to the presence of a single divergent haplotype from the Moroccan Ita-0 ecotype.
Table 3. Features of sequence variation in the A. thaliana CLV2 genomic region
Figure 1. Levels of silent site nucleotide diversity at CLV2 and flanking loci. The dashed line shows the mean level of nucleotide diversity (π) in previously studied genes of A. thaliana. Sequenced and unsequenced exons are shown in thick solid or shaded bars, respectively. The line connecting UBQ13 fragments spans the mitochondrial DNA insertion observed in Col-0. Arrows indicate each gene’s orientation in the chromosome.
In contrast, both CLV2 and the TIR domain gene have consistently positive values of the Tajima and Fu and Li test statistics (Table 4A), but only the TIR domain gene was significantly positive (Fu and Li D* = +1.2984, P < 0.05; D = +1.6783, P < 0.01). Loci with significant positive values of these test statistics have rarely been observed in previous studies of A. thaliana. Positive values of these test statistics are associated with an excess of intermediate-frequency polymorphisms. These data suggest that both of these genes may be evolving nonneutrally in a pattern consistent with balancing selection, but the power of these tests is limited at such small sample sizes.
Since our results indicated that the CLV2 gene has the highest level of polymorphism among the 11 linked genes, we examined variation at this gene and its three closest neighbors in greater detail. We sequenced additional accessions at these loci to increase the number of sampled alleles to 19-21. The results from this expanded data set are consistent with the patterns observed with the smaller data set. The levels of nucleotide variation, the directions of the Tajima’s D and Fu and Li’s D* and D test statistics, and the results of the HKA tests against Adh are all comparable across the two data sets (Tables 3 and 4). The only difference is that with larger sample sizes, the value of Tajima’s D is now significant for CLV2 (D = +1.752, P < 0.05). This finding is consistent with previous analyses that indicated that augmenting sample sizes for sequenced alleles increases the power to detect significant deviations from the neutral-equilibrium model (Simonsen et al. 1995).
Figure 2. Comparison of silent site nucleotide diversity between genes in the CLV2 region and other A. thaliana genes. Previously studied genes (open circles) and genes at the CLV2 region (solid circles) are ranked by π.
The positive value of Tajima’s D in CLV2 is associated with the presence of at least three distinct haplotype groups (I, II, and IV in Figure 3). These three haplogroups are found at moderate frequency, with the rarest haplogroup at ∼15% frequency. Also, one haplotype (III in Figure 3) may have arisen from a recombination event between alleles belonging to groups II and IV. Alternatively, haplotype III, obtained from the Ita-0 accession, may represent an additional allelic class; this accession also bears more divergent alleles of several other loci in this region.
Intragenic and intergenic linkage disequilibrium at the CLV2 region: Linkage disequilibrium, the nonrandom association of allelic polymorphisms, was surveyed for nucleotide polymorphisms both within and between genes in the CLV2 region. The amount of linkage disequilibrium was estimated using the r² statistic (Hill and Robertson 1968) for nonsingleton sites, and the significance of pairwise disequilibrium comparisons was assessed with Fisher’s exact test. In the smaller sampling of 9-12 accessions, 61% of intragenic comparisons are significant (a total of 1423 pairs of sites and a range of 6-379 per gene). The proportion of significant disequilibrium values for pairwise comparisons ranges from 10% for AGL37 to 95% for CYP96A3 and the TIR domain gene.
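A pairwise comparison of this kind can be sketched for two biallelic sites scored in homozygous (effectively haploid) accessions. The code below is an illustration with hypothetical function names: r² is computed from haplotype and allele frequencies, and the two-sided Fisher's exact P value sums the hypergeometric probabilities of all 2 × 2 tables that are no more likely than the observed one.

```python
from math import comb


def haplotype_table(site_a, site_b):
    """2 x 2 haplotype counts, using the first accession's alleles as reference."""
    ref_a, ref_b = site_a[0], site_b[0]
    table = [[0, 0], [0, 0]]
    for x, y in zip(site_a, site_b):
        table[0 if x == ref_a else 1][0 if y == ref_b else 1] += 1
    return table


def r_squared(site_a, site_b):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)) for two segregating biallelic sites."""
    (a, b), (c, d) = haplotype_table(site_a, site_b)
    n = a + b + c + d
    p_a = (a + b) / n  # frequency of the reference allele at the first site
    p_b = (a + c) / n  # frequency of the reference allele at the second site
    disequilibrium = a / n - p_a * p_b
    return disequilibrium ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact P value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
```

With samples of 9-12 accessions, only strongly associated pairs of sites with intermediate allele frequencies can reach significance, which is one reason singleton sites are excluded from such comparisons.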
Table 4. Selection tests at genes in the A. thaliana CLV2 genomic region
Larger sample sizes increase the power of detecting significant linkage disequilibrium, and this is demonstrated for four genes (the MATH domain gene, CLV2, the serpin pseudogene, and the TIR domain gene), which were examined in the expanded sample set of 19-21 ecotypes. The proportion of significant comparisons ranges from ∼1 to ∼30% of pairwise comparisons (Table 5). The levels of intragenic linkage disequilibrium can also be estimated using the ZnS statistic (Kelly 1997). The levels of ZnS are significantly higher than expected under neutrality for the CLV2 gene (P < 0.012), the serpin pseudogene (P < 0.04), and the TIR domain gene (P < 0.008), as assessed by coalescent simulations that take into account the population recombination parameter estimated from the data.
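ZnS is the average of r² over all pairs of segregating sites in a sample (Kelly 1997). The sketch below is illustrative only and uses hypothetical function names. It assumes an ungapped alignment of homozygous accessions, keeps only biallelic segregating columns, and does not perform the coalescent simulations needed to assess significance.

```python
from itertools import combinations


def pair_r_squared(column_a, column_b):
    """r^2 between two biallelic columns, using the first sequence's alleles as reference."""
    n = len(column_a)
    ref_a, ref_b = column_a[0], column_b[0]
    p_a = sum(x == ref_a for x in column_a) / n
    p_b = sum(y == ref_b for y in column_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(column_a, column_b)) / n
    disequilibrium = p_ab - p_a * p_b
    return disequilibrium ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


def zns(seqs):
    """Mean r^2 over all pairs of biallelic segregating sites; None if fewer than two."""
    columns = [col for col in zip(*seqs) if len(set(col)) == 2]
    if len(columns) < 2:
        return None
    values = [pair_r_squared(a, b) for a, b in combinations(columns, 2)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Sites 1 and 2 are perfectly associated; site 3 is independent of both.
    print(zns(["ACA", "ACT", "TGA", "TGT"]))
```

A sample dominated by a few divergent haplotype groups yields many site pairs with r² near 1, so ZnS is high, which is the pattern expected when alternative allele classes are maintained at a locus.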
The extent of disequilibrium between genes is evident in plots of r² as a function of physical distance. Across the entire 40-kb region, strong linkage disequilibrium (r² = 1) is observed even at distances of ∼25 kb in the smaller data set (Figure 4A). Using data from the expanded sample set, strong levels of intergenic disequilibrium are also evident among CLV2 and its nearest neighbors (Figure 4B). The distance plot shows strong linkage disequilibrium up to ∼6 kb associated with correlations among CLV2, the serpin pseudogene, and the TIR domain locus.
Amino acid replacements at CLV2: In our sample of CLV2 alleles, 22 of the 54 nucleotide polymorphisms code for amino acid replacements (Figure 3A); 20 of the substitutions occur in the LRRs, while two are in the cysteine-pair region preceding the LRRs (Figure 5). Proteins in the four allele classes differ by 7-15 amino acids. Although the majority of these replacements are fairly conservative, two to five of the differences between allele classes are due to radical substitutions (Figure 3B). The amino acid substitutions observed in our data set probably encompass much of the variation present within the species. Comparisons of the full-length CLV2 sequence from the Col-0 (class I), Ws-0 (class II), and Ler-0 (class IV) ecotypes reveal only two additional amino acid replacements, one in the 18th LRR and one in the cysteine-pair region following the LRRs (Jeong et al. 1999).
DISCUSSION
Contrasting patterns of sequence variation across the A. thaliana CLV2 region: Molecular population genetic analyses of the A. thaliana CLV2 region indicate that levels and patterns of nucleotide diversity can vary even among contiguous, closely linked genes. For example, although CLV2 has the highest level of nucleotide variation in this region (π = 0.0558), the MATH domain gene has the lowest (π = 0.0060), a nearly 10-fold reduction in diversity between adjacent genes. Similar patterns of differing nucleotide diversity levels among linked genes have also been observed in a 400-kb region around the FRI gene (Hagenblad and Nordborg 2002), in a 170-kb region around the MAM1 gene (Haubold et al. 2002), and in a 20-kb region around RPS5 (Tian et al. 2002). The variation in estimates of sequence polymorphism even among contiguous genes also suggests that surveys of nucleotide diversity may require more extensive sampling in a given genomic region to arrive at better estimates of region-specific polymorphism levels.
Figure 3. Polymorphisms in the CLV2 gene. (A) Table of nucleotide polymorphisms in 21 A. thaliana accessions. Positions of polymorphic sites are indicated at the top. All alleles are compared to the Col-0 reference sequence. Brackets denote the four allelic classes observed. For sites containing nonsynonymous substitutions, the amino acid polymorphisms are shown beneath the nucleotide polymorphisms; the first line shows the Col-0 residue, while subsequent lines show replacement residues. (B) Numbers of amino acid replacement polymorphisms within and among allele classes. Total numbers of replacements are above the diagonal. Replacements within an allele class are on the diagonal. Radical replacements are below the diagonal.
There also appear to be dramatic changes in the patterns of nucleotide variation observed among neighboring loci in the CLV2 region. Both CLV2 and the TIR domain locus, for example, have positive levels of Tajima’s D, consistent with an excess of intermediate frequency polymorphisms in the sampled alleles. These two loci, however, are surrounded by and interspersed with genes that display negative levels of Tajima’s D, indicating an excess of low-frequency polymorphisms for these linked loci. These results suggest that levels and patterns of variation are remarkably gene-specific even among closely linked A. thaliana nuclear genes.
Table 5. Intragenic linkage disequilibrium at A. thaliana CLV2 and nearest genes
Linkage disequilibrium levels appear to be extensive across the CLV2 region. In this 40-kb region, disequilibrium is observed both intra- and intergenically, and strong disequilibrium can extend to ∼25 kb. There is also evidence for correlation of allele genealogies among some of the linked genes (K. A. Shepard, unpublished observations). This correlation in gene genealogies, however, is not observed between genes that are farther apart and can also disappear between adjacent loci. The CLV2 gene and the MATH domain locus immediately upstream, for example, display weaker correlation in genealogies among the sampled alleles (K. A. Shepard, unpublished observations).
Figure 4. Linkage disequilibrium in the CLV2 genomic region. All site comparisons separated by <1 kb are intragenic; the remainder are intergenic. (A) Linkage disequilibrium across all 11 genes in the CLV2 region determined from 8 accessions. (B) Linkage disequilibrium among the MATH domain, CLV2, serpin, and TIR domain genes determined from 19 accessions.
Several of the genes in the CLV2 region appear to contain two or more distinct haplotype groups (see, for example, Figure S1 in Supplementary Information at http://www.genetics.org/supplemental/). The presence of two distinct allele groups, commonly referred to as allelic dimorphism, has been observed in previous studies of A. thaliana (Kawabe et al. 1997; Purugganan and Suddith 1998). For most A. thaliana nuclear genes, allelic dimorphism appears to be readily accounted for by a model of neutral evolution with no recombination and may represent the remnants of ancestral population structure (Kuittinen and Aguadé 2000; Aguadé 2001). In a few instances, however, the elevated nucleotide variation associated with these highly divergent alleles is more compatible with balancing selection at a locus (Stahl et al. 1999; Olsen et al. 2002; Tian et al. 2002).
Figure 5. Predicted amino acid replacements encoded by CLV2 alleles. Replacement changes based on predicted protein sequences for representative alleles within haplotype classes as designated in Figure 3. IIa is the An-2 allele. IIb corresponds to the Chi-1 and the Ws-0 alleles. Lyr is the A. lyrata ortholog. Radical amino acid substitutions are indicated in boldface type. The numbers for each LRR are shown on top in italics, and the amino acid positions for each replacement are also numbered, as designated by Jeong et al. (1999). Replacements in the β-strand/β-turn region as predicted by the conserved sequence motif xxLxLxx are designated as “β.” “α” denotes replacements in possible α-helical regions of the Col-0 accession as predicted by SSpro2 (Baldi et al. 1999).
The long-range decay of linkage disequilibrium is expected in A. thaliana, a predominantly selfing species with a reduced effective recombination rate. Unlike in D. melanogaster or Z. mays, where disequilibrium decays in scales of ∼1 kb, linkage disequilibrium in A. thaliana can persist up to 250 kb (Nordborg et al. 2002). Given reduced recombination in A. thaliana, balanced polymorphisms may be expected to display high levels of variation and maintenance of alternate haplotypes over longer genomic scales in A. thaliana (Nordborg et al. 1996), comparable to the persistence of disequilibrium in this selfing species (Nordborg et al. 2002).
Evidence for balancing selection in the CLV2 genomic region: The reported high level of polymorphism at the CLV2 meristem regulatory gene (Jeong et al. 1999) first suggested the possibility that alleles at this locus or a nearby linked gene may be maintained as a balanced polymorphism in A. thaliana. A survey of the levels and patterns of variation among 11 linked genes centered on CLV2 was undertaken to dissect the evolutionary forces acting on this 40-kb genomic region. Three aspects of the levels and patterns of nucleotide diversity at CLV2 are noteworthy. First, the level of silent site nucleotide diversity at this developmental gene is about fivefold higher than those of typical A. thaliana nuclear genes; this is one of the highest levels of variation thus far reported in this species. The level of variation at CLV2 is also significantly higher than that of the reference neutral gene Adh (HKA test, P < 0.01). Second, Tajima’s D is significantly positive for this gene (P < 0.01), which indicates an excess of intermediate-frequency polymorphisms. Third, the level of intragenic linkage disequilibrium at this locus is significantly higher than that predicted by a neutral-equilibrium model under limited recombination (ZnS statistic, P < 0.012).
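To make the second point concrete, the following is a minimal sketch of how Tajima’s D is commonly computed from a set of aligned sequences, using the standard textbook formulation. It is not the software used in this study, and the function name and the toy alignments are hypothetical. D compares the mean number of pairwise differences with the number of segregating sites scaled by a harmonic sum; when two haplotype groups are both at intermediate frequency, pairwise differences are inflated relative to the number of segregating sites and D becomes positive.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (illustrative sketch only)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for a nonzero variance term")

    # S: number of alignment columns with more than one state
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    ) / n_pairs

    # Constants that depend only on the sample size
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Hypothetical toy alignments, not data from this study.
two_haplogroups = ["AAAA"] * 3 + ["TTTT"] * 3                    # intermediate frequencies
rare_variants = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]  # singletons only

print(tajimas_d(two_haplogroups))  # positive
print(tajimas_d(rare_variants))    # negative
```

The sketch returns only the statistic. Assessing significance, as in the tests reported here, requires comparison with a null distribution, which this example does not attempt.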
Three alternative scenarios may explain this pattern of diversity at the CLV2 gene. One possibility is a duplication at this locus, which could explain the distinct haplogroups, high variation, and intragenic linkage disequilibrium. There is no evidence, however, for a recent duplication of CLV2 or any of the genes flanking it in the Arabidopsis genome. Moreover, we find no evidence of duplication heterozygosity in different A. thaliana ecotypes (K. A. Shepard, unpublished observation). A second scenario is that contemporary or ancestral geographical subdivision produced the observed pattern. Detailed analysis of A. thaliana ecotypes using genome-wide markers, however, does not reveal any strong geographical subdivision within this species (Sharbel et al. 2000). Molecular population genetic analyses of various genes do reveal the sporadic presence of allelic dimorphism compatible with ancestral subdivision (Kawabe et al. 1997; Miyashita et al. 1998). The levels of nucleotide variation at these loci, however, do not show marked elevation, nor do they display significant positive levels of either Tajima’s or Fu and Li’s statistics. These observations suggest that diversity at these genes, but not at CLV2, is compatible with neutral evolution under no recombination (Aguadé 2001). The third alternative compatible with the observed levels and patterns of nucleotide variation at CLV2 is that this gene harbors a balanced polymorphism. Similar patterns have been noted in other loci that unequivocally harbor balanced polymorphisms, including the Rpm1 (Stahl et al. 1999) and RPS5 (Tian et al. 2002) disease-resistance genes. It should be noted that a balanced polymorphism at CLV2 is not necessarily incompatible with ancestral geographical subdivision. The CLV2 haplogroups, for example, may have originated from locally adapted, geographically distinct ancestral populations (Charlesworth et al. 1997) and may be currently maintained by local selection on alternate alleles despite the widespread post-Pleistocene dispersal of this species.
The only other gene in this region that shows some evidence for balancing selection is the TIR domain gene located ∼4 kb downstream of CLV2. This locus has significantly positive Fu and Li and ZnS disequilibrium test statistics; unlike CLV2, however, this gene does not show significantly high intraspecific nucleotide variation compared to Adh (HKA test, P < 0.7). The pattern at the TIR domain gene may simply result from linkage with a balanced polymorphism at CLV2, as is suggested by the allele groups shared among these loci (see Figures 3 and S5 at http://www.genetics.org/supplemental/). Alternatively, balancing selection may be acting independently on the TIR domain gene. The sequence of this gene is similar to the TIR portion of the RPS4 disease-resistance gene (Gassmann et al. 1999), but it lacks the nucleotide binding site and LRRs characteristic of proteins encoded by RPS4 and other TIR-containing disease-resistance genes in plants. If balancing selection is acting directly on this gene, and not as a correlated effect from putative balanced polymorphisms at CLV2, it may be associated with as yet uncharacterized disease-resistance functions at this locus.
While levels of nucleotide variation are predicted to be highest immediately surrounding a balanced polymorphism, an elevated level of variation may also be expected in a more extended genomic region of a predominantly selfing species. This predicted pattern is reflected in the high level of nucleotide variation among the 11 linked genes in the CLV2 genomic region. Estimates of variation for loci in the CLV2 region are about twofold higher than those for a set of 14 other A. thaliana genes. There is no accompanying increase in nucleotide divergence estimates for these genes between A. thaliana and A. lyrata, compared to previously studied loci. This suggests that the increase in intraspecific nucleotide variation in this region is not the result of an increase in the neutral mutation rate.
Our results, however, indicate that while a wide window of enhanced neutral variation surrounds the putative balanced polymorphisms in CLV2, significant effects of selection on levels and patterns of sequence diversity appear confined to genic scales. The localized nature of the effects of balanced polymorphisms in the predominantly selfing A. thaliana is paradoxical, although it has been observed at several loci. In the RPS5 disease-resistance locus, significantly enhanced variation is observed surrounding the sequence junction that harbors the RPS5 balanced indel polymorphism, but is not observed at adjacent loci within ∼10 kb (Tian et al. 2002). Similarly, a balanced polymorphism at the TFL1 inflorescence architecture gene is confined to the 1-kb promoter region, and increased diversity is not observed in either the TFL1 coding region or the upstream rps28 gene (Olsen et al. 2002). Finally, a replacement polymorphism associated with a Fast/Slow allozyme polymorphism at the PgiC locus is intragenically localized, spanning a region of only five exons and intervening introns (Kawabe et al. 2000). These results are consistent with our observations in the CLV2 region that significant retained effects of balancing selection on levels and patterns of sequence diversity may be focused at specific genes and not at nearby linked loci.
The CLV2 gene and, to some extent, the TIR domain locus are the only two genes that display departures from neutral-equilibrium predictions by several criteria: (i) significantly elevated levels of nucleotide variation, (ii) intermediate-frequency polymorphisms, and (iii) intragenic linkage disequilibrium. The other genes in the CLV2 region may also have been affected by selection at or near these loci, but do not retain consistent signatures of balancing or positive selective forces. This may reflect, in part, the relatively low power of some of the tests for selection (Simonsen et al. 1995). Loci may, for example, harbor balanced polymorphisms in which the frequencies of the allele classes are not sufficiently high to yield a significantly positive value of Tajima’s D.
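As an illustration of criterion (iii), the sketch below computes a ZnS-type summary of intragenic linkage disequilibrium, taken here as the average squared allelic correlation (r²) over all pairs of biallelic segregating sites. This is a simplified reading of the statistic under stated assumptions, not the implementation used for the tests reported above; the function name and the toy alignments are hypothetical.

```python
from itertools import combinations


def zns(seqs):
    """Mean r^2 over all pairs of biallelic segregating sites (illustrative sketch)."""
    n = len(seqs)

    # Recode each biallelic column as 0/1, where 0 is the state of the first sequence
    sites = []
    for col in zip(*seqs):
        if len(set(col)) == 2:
            sites.append([0 if base == col[0] else 1 for base in col])
    if len(sites) < 2:
        raise ValueError("need at least two biallelic segregating sites")

    total = 0.0
    n_pairs = 0
    for site_a, site_b in combinations(sites, 2):
        p_a = sum(site_a) / n                                   # allele frequency at site A
        p_b = sum(site_b) / n                                   # allele frequency at site B
        p_ab = sum(x & y for x, y in zip(site_a, site_b)) / n   # joint haplotype frequency
        d = p_ab - p_a * p_b                                    # disequilibrium coefficient
        total += d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
        n_pairs += 1
    return total / n_pairs


# Hypothetical toy alignments, not data from this study.
two_haplogroups = ["AAAA"] * 3 + ["TTTT"] * 3
rare_variants = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]

print(zns(two_haplogroups))  # 1.0: every pair of sites is in complete disequilibrium
print(zns(rare_variants))    # much lower: singletons on different sequences
```

In the first toy sample every site distinguishes the same two haplotype groups, so the statistic reaches its maximum; this is the kind of structure that distinct haplogroups at a locus would be expected to produce.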
Functional consequences of the putative balanced polymorphism at CLV2: The functional consequences of natural allelic differentiation at CLV2 remain unclear. The putatively balanced alleles at CLV2 are associated with a large number of replacement polymorphisms, with 7-15 amino acid changes differentiating different allele groups. The distribution of amino acid replacements within LRRs suggests that some of these substitutions could affect the function of the CLV2 protein. Extracellular plant LRRs are characterized by the consensus amino acid sequence LxxL{xxLxLxx}NxLx GxI-PxxLGx, where L may also be isoleucine, valine, or phenylalanine. Plant-specific LRRs have not yet been crystallized; however, structural analyses of nonplant proteins predict that each LRR consists of a β-strand and an α-helix joined by loops. The alternating β-strands and α-helices yield a horseshoe-shaped structure in which parallel β-strands form a binding pocket for protein-protein interactions. The xxLxLxx motif forms a β-strand/β-turn with buried leucine residues and solvent-exposed variable residues (Kobe and Kajava 2001).
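The xxLxLxx motif described above can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch, assuming one-letter amino acid codes and allowing L, I, V, or F at the two conserved positions as stated in the text. The function name and the example peptide are hypothetical, and a motif this short will also match many non-LRR sequences, so hits are candidates only.

```python
import re

# x x [LIVF] x [LIVF] x x, wrapped in a lookahead so overlapping hits are reported
BETA_MOTIF = re.compile(r"(?=(..[LIVF].[LIVF]..))")


def find_beta_motifs(protein):
    """Return (1-based start position, matched 7-mer) for each candidate xxLxLxx motif."""
    return [(m.start() + 1, m.group(1)) for m in BETA_MOTIF.finditer(protein)]


# Hypothetical peptide, not the CLV2 sequence.
print(find_beta_motifs("MKTSLRVLDLSGNA"))
# [(3, 'TSLRVLD'), (6, 'RVLDLSG')]
```

Within each reported 7-mer, positions 3 and 5 are the conserved hydrophobic residues, and the remaining five positions correspond to the variable x residues of the motif.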
In our CLV2 data set, three amino acid substitutions occur in the solvent-exposed residues in the β-strand/β-turn (Figure 5). Two of these mutations (Thr125 ⟷ Ile and Arg148 ⟷ Gly) are radical substitutions, while the third (Ile244 ⟷ Val) is conservative. Recent studies of cytoplasmic LRR proteins that confer disease resistance in plants have highlighted the functional importance of variation in solvent-exposed LRR residues. Evidence for diversifying selection on these residues has been observed in comparisons of paralogous disease-resistance genes within several species (Parniske et al. 1997; McDowell et al. 1998; Meyers et al. 1998; Wang et al. 1998; Bittner-Eddy et al. 2000; Dodds et al. 2001). In the CLV2 LRRs, we did not find support for diversifying selection as measured by Ka/Ks, the ratio of nonsynonymous to synonymous nucleotide substitution rates (data not shown). However, analysis of the P2 and p-B genes of flax indicates that less dramatic variation can also alter protein function. The predicted P2 and p-B proteins, which confer recognition of different rust strains, differ by only six solvent-exposed residues (Dodds et al. 2001). These results suggest that the variation we observe in the CLV2 β-strand/β-turn might have functional consequences.
The majority of amino acid replacements in CLV2 are located in the interstrand regions of the LRRs. Of these 14 replacements, only 2 are predicted to reside in helical motifs, suggesting that the remainder are found in loops (Figure 5). Although the structure-function relationships in the interstrand regions are less understood, residues in loop regions can clearly affect LRR protein function. Studies of natural variation at RPS2, an A. thaliana disease-resistance gene, have shown that six amino acid differences between the Col-0 (resistant) and Po-1 (susceptible) alleles are sufficient to alter pathogen recognition (Banerjee et al. 2001). Two of these mutations are found in both resistant and susceptible alleles in other ecotypes (Caicedo et al. 1999), suggesting that they are not specificity determinants. The remaining four mutations, which are located in interstrand regions of the RPS2 protein, indicate that residues outside the β-strand/β-turn can lead to functional diversification among alleles. This result is not surprising, as structural analyses have shown that ligand binding to other types of LRR proteins often involves contacts in the loops as well as in the β-strands (reviewed by Kobe and Kajava 2001).
If, indeed, some of these replacement substitutions are maintained as balanced polymorphisms, the mechanism of selection is puzzling in light of what little is known about CLV2’s role in plant development. Although there is compelling genetic evidence that the proteins encoded by the three CLAVATA genes act together to regulate shoot meristem growth, the exact constituents of and binding relationships among the receptor and ligand multimers are unclear. Of the three characterized CLAVATA genes, clv2 mutant alleles show the weakest shoot meristem phenotypes (Kayes and Clark 1998). The mild clv2 phenotype may indicate that this gene is not a crucial regulator of meristem function in Arabidopsis; however, mutations in the fasciated ear2 gene, a putative CLV2 ortholog, have dramatic effects on maize inflorescence morphology (Taguchi-Shiobara et al. 2001). Moreover, unlike the meristem-specific phenotypes of clv1 and clv3 mutants, clv2 plants show pleiotropic effects on pedicel, stamen, and gynoecium development (Kayes and Clark 1998). Finally, in contrast to the narrow, meristematic expression domains of CLV1 and CLV3, the broad expression pattern of CLV2 in the shoot (Jeong et al. 1999) suggests that CLV2 may interact with additional proteins in other parts of the plant.
We therefore propose two hypotheses that might explain the putative balancing selection on the CLV2 locus. First, CLV2 might act as a modulator of shoot meristem growth, with different alleles enhancing or reducing the strength of signaling through the CLAVATA complex. This modulation might be accomplished by variation in the accumulation of CLV1 protein in the plasma membrane or by alterations in the affinity of the complex for the multimeric CLV3 ligand. Such modulation could have direct effects on fitness-related traits such as flower number. Alternatively, balancing selection may act on pleiotropic functions of CLV2 that involve currently unidentified binding partners. Characterizing phenotypic, ecologically relevant variation associated with alleles at CLV2 will strengthen the argument of balancing selection at this locus.
Acknowledgments
The authors thank Brandon Gaut, Ken Olsen, Mark Ungerer, Montserrat Aguadé, two anonymous reviewers, and members of the Purugganan laboratory for helpful comments, Outi Savolainen and Helmi Kuittinen for providing A. lyrata seed, and Juergen Kroymann for providing preprints of relevant manuscripts. The authors are also grateful to the NCSU Phytotron for providing growth facilities and the NCSU Genome Research Laboratory for sequencing facilities. This work was funded by a grant from the National Science Foundation Integrated Research Challenges in Environmental Biology program to M.D.P., J. Schmitt, and T.F.C. Mackay, and an Alfred P. Sloan Foundation Young Investigator Award to M.D.P.
Footnotes
- Sequence data from this article have been deposited with the EMBL/GenBank Data libraries under accession nos. AF528566-AF528713.
- Communicating editor: M. Aguadé
- Received July 16, 2002.
- Accepted November 20, 2002.
- Copyright © 2003 by the Genetics Society of America