Genetics, Vol. 166, 1419-1436, March 2004, Copyright © 2004

Functional Divergence in Tandemly Duplicated Arabidopsis thaliana Trypsin Inhibitor Genes

M. J. Clauss and T. Mitchell-Olds
Department of Genetics and Evolution, Max Planck Institute of Chemical Ecology, 07745 Jena, Germany

Corresponding author: M. J. Clauss, Max Planck Institute of Chemical Ecology, Beutenberg Campus, Hans Knöll Str. 8, 07745 Jena, Germany. E-mail: clauss@ice.mpg.de

Communicating editor: J. BERGELSON


*  ABSTRACT

In multigene families, variation among loci and alleles can contribute to trait evolution. We explored patterns of functional and genetic variation in six duplicated Arabidopsis thaliana trypsin inhibitor (ATTI) loci. We demonstrate significant variation in constitutive and herbivore-induced transcription among ATTI loci that show, on average, 65% sequence divergence. Significant variation in ATTI expression was also found between two molecularly defined haplotype classes. Population genetic analyses for 17 accessions of A. thaliana showed that six ATTI loci arranged in tandem within 10 kb varied 10-fold in nucleotide diversity, from 0.0009 to 0.0110, and identified a minimum of six recombination events throughout the tandem array. We observed a significant peak in nucleotide and indel polymorphism spanning ATTI loci in the interior of the array, due primarily to divergence between the two haplotype classes. Significant deviation from the neutral equilibrium model for individual genes was interpreted within the context of intergene linkage disequilibrium and correlated patterns of functional differentiation. In contrast to the outcrosser Arabidopsis lyrata for which recombination is observed even within ATTI loci, our data suggest that response to selection was slowed in the inbreeding, annual A. thaliana because of interference among functionally divergent ATTI loci.


Natural selection and neutral evolutionary processes can shape functionally important genetic variation at individual genes. In multigene families, we are further challenged to distinguish the contribution of variation among loci and among alleles to functional diversification (OHNO 1970; FORCE et al. 1999; WALSH 2003). After gene duplication, both the ecological function and the evolutionary fate of loci can diverge (e.g., ZHANG et al. 2002A; FERRARI et al. 2003). Duplicated loci that are retained for more than a few million years eventually experience strong purifying selection (LYNCH and CONERY 2003) and can undergo adaptive divergence via modification of the ancestral gene function and the acquisition of novel function (FORCE et al. 1999; LYNCH and FORCE 2000; LYNCH et al. 2001). For genes involved in biotic interactions where functional diversity is selectively favored (e.g., plant-pathogen and plant-herbivore interactions, mate recognition), elevated gene copy retention associated with subfunctionalization and neofunctionalization may impart a selective advantage to the persistent multigene families observed in many plant genomes (CHARLESWORTH et al. 2000; RASK et al. 2000; BERGELSON et al. 2001). We are only now beginning to explore concerted functional evolution among duplicated loci and alleles at individual loci. Gene birth, death, and diversification dynamics are expected to interact in a complex manner with allelic diversity evolving in response to frequency-dependent selection, balancing selection, or positive selection in a coevolutionary "arms race" (ELLIS et al. 1995; MEYERS et al. 1998; DUDA and PALUMBI 2000; BERGELSON et al. 2001; TIAN et al. 2002; VAN DER HOORN et al. 2002; ZHANG et al. 2002B).

Evolutionary inferences based on patterns in molecular sequence data have been the topic of a large number of recent studies (COMERON and KREITMAN 2002; FAY et al. 2002; FORD 2002; KIM and STEPHAN 2002, 2003; NAVARRO and BARTON 2002; TIAN et al. 2002; NORDBORG and INNAN 2003). Because tandemly duplicated loci share not only a functional and evolutionary origin, but also a common chromosomal context, identifying the signatures of selection at diverging loci can be confounded by positional nonindependence. Interference due to conflicting selection on linked sites is expected in asexual genomes (MIRALLES et al. 1999), but may also shape functional evolution in sexual genomes with local reductions in recombination (HILL and ROBERTSON 1966; KIM and STEPHAN 2000, 2003; COMERON and KREITMAN 2002). In Drosophila, where linkage disequilibrium (LD) typically decays within 1 kb (LONG et al. 1998), local reductions in recombination have been suggested to limit adaptive evolution at linked sites (KIRBY and STEPHAN 1996; BETANCOURT and PRESGRAVES 2002). For the worldwide population of the inbreeding annual plant Arabidopsis thaliana, significant LD is found within individual loci (HANFSTINGL et al. 1994; KAWABE et al. 1997; KAWABE and MIYASHITA 1999; AGUADE 2001; HAUSER et al. 2001) and on a genomic scale up to 250 kb (NORDBORG et al. 2002). This potential nonindependence must be taken into consideration in functional genetic analyses for closely linked loci in A. thaliana, in particular for tandemly duplicated gene families.

Proteinase inhibitors (PIs) are widespread and highly diverse in the plant kingdom. The physiological function of plant proteinase inhibitors includes protection against the proteolytic enzymes of herbivores and pathogens, as well as the regulation of endogenous storage proteinases during seed dormancy and reserve protein mobilization (GREEN and RYAN 1972; RYAN 1990; PAUTOT et al. 1991; KOIWA et al. 1997; DE LEO et al. 1998; HARUTA et al. 2001; GLAWE et al. 2003; TELANG et al. 2003). A single plant may have proteinase inhibitors from several different functional classes (LASKOWSKI and KATO 1980; RYAN 1990). On the basis of structural and biochemical properties, a novel class of serine trypsin inhibitors (TIs) encoded by a small gene family has been identified in the Brassicaceae (MENEGATTI et al. 1992; CECILIANI et al. 1994; RUOPPOLO et al. 2000; ZHAO et al. 2002). The specificity of the inhibited protease is determined in part by a single amino acid residue at the P1 position of the reactive site loop (LASKOWSKI and KATO 1980; ASCENZI et al. 1999). Experimental randomization of P3-P3' residues in the reactive site loop of the Sinapis alba TI (MTI2) and subsequent selection by phage display for elevated trypsin inhibitory activity demonstrated that the wild-type MTI2 reactive site has optimal conformation for trypsin inhibition with very few degrees of freedom (CECI et al. 2003). Trypsin inhibitory function has been maintained in orthologous TI loci in the Brassicaceae for >20 million years (MY), as suggested by a conserved reactive site loop (P3-P3' APRIF/YP) in Brassica, Sinapis, and Arabidopsis (KOCH et al. 2001; ZHAO et al. 2002).

Functional investigation of Sinapis trypsin inhibitors has demonstrated expression in immature seeds and in wounded leaves (CECI et al. 1995; DE LEO et al. 2001A). The MTI2 inhibitor interacts in a highly specific manner with insect gut proteinases, is induced upon feeding, and can be an effective defense against insect herbivores (DE LEO et al. 1998, 2001B; CECI et al. 2003). However, in response to a diet enriched in one TI, insects can preferentially express digestive proteinases insensitive to the dominant inhibitor and thereby render this plant defense less effective (BROADWAY 1995, 1997; JONGSMA et al. 1995; JONGSMA and BOLTER 1997; DE LEO et al. 1998; VOLPICELLA et al. 2003). The fact that insect trypsin proteinases are encoded by members of large gene families (GU et al. 2002) may contribute to this flexible counterstrategy. Thus, the diversity and expression of plant PIs may be shaped by an antagonistic coevolutionary dynamic favoring protein diversification as predicted by the arms-race model (VAN VALEN 1973; BROADWAY 1996; JONGSMA and BOLTER 1997; DE LEO et al. 1998).

The loci encoding A. thaliana trypsin inhibitors (ATTIs) are members of a multigene family. Six ATTI loci are arranged in tandem within 10 kb on chromosome II and appear to have undergone duplication subsequent to a genome duplication event 24–40 million years ago (BLANC et al. 2003). Here, we test whether nucleotide polymorphism evolves independently among members of the ATTI tandem array in the selfing annual A. thaliana, with reference to the closely related outcrossing species, A. lyrata. We observed significant heterogeneity in the pattern of polymorphism among gene family members, including a peak of diversity flanking the presence/absence polymorphism for the ATTI5 locus. We then analyzed ATTI gene expression in response to herbivory and tested for functional diversification both among loci and among alleles. Assessing the role of natural selection in shaping the significant functional diversification observed for members of this gene family was complicated by contrasting population genetic signatures for linked loci in a region of low recombination.


*  MATERIALS AND METHODS

A. thaliana trypsin inhibitors:
Six ATTI loci have been identified in the Columbia accession of A. thaliana on the basis of sequence homology to mustard trypsin inhibitor (MTI2; MENEGATTI et al. 1992) and rapeseed trypsin inhibitor (RTI3; CECILIANI et al. 1994). ATTI1 is the ortholog to MTI2 and RTI3 (with on average 75% similarity) and has been structurally analyzed (ZHAO et al. 2002). In our study ATTI1, ATTI2, ATTI3, ATTI4, and ATTI6 occur in tandem on chromosome II and refer to At2g43510, At2g43520, At2g43530, At2g43535, and At2g43550, respectively. ATTI4 is currently annotated in GenBank only as an expressed protein, not as a trypsin inhibitor (AC002335; March 11, 2002). ATTI5 refers to a locus between ATTI4 and ATTI6 that was identified in several accessions other than Columbia on the basis of sequence homology (Fig 1). A locus with no homology to ATTI and of unknown function is located between ATTI5 and ATTI6 (At2g43540). A seventh locus in this gene family, ATTI7, is located on chromosome I (At1g47540). We focus here only on the ATTI1–6 loci located within the tandem array on chromosome II.



 
Figure 1. (A) Amino acid sequence of six A. thaliana trypsin inhibitor (ATTI) loci located on chromosome II. The signal peptide is indicated by light shading and the reactive site loop by dark shading, with P1 and P1' indicated by dots. (B) Percentage pairwise sequence identity among ATTI coding sequences for one representative A. thaliana accession (Fe-1a).

The ATTI1 locus encodes an 89-amino-acid precursor with 27 amino acids in the amino terminus not represented in the mature protein (ZHAO et al. 2002). The 27-amino-acid N terminus of the precursor polypeptide most likely represents a signal peptide for secretion into the endoplasmic reticulum (SignalP v1.1). For ATTI4 and ATTI5, a 5-amino-acid carboxy terminus was identified on the basis of sequence homology to MTI2 (VOLPICELLA et al. 2000). The position and length of a single intron are conserved among all ATTI loci.

Herbivory experiment and expression analysis:
Seeds of seven A. thaliana accessions chosen randomly from two sequence-defined haplotype classes (A: Gö-0, Ler-0, Nd-1, and Rsch-0; B: Cvi-1, Fe-1a, and Wei-0; see below for haplotype classification) were sown onto a sterilized potting soil/vermiculite mix (3:1), vernalized at 4° for 1 week, placed in a randomized design into a short-day growth chamber, and reduced to a single individual per pot after germination. Four and 6 weeks after germination, six randomly selected plants per accession were placed into a separate growth chamber for 9 hr; three plants were subjected to the herbivory treatment and three served as controls. The herbivory treatment consisted of one third-instar Plutella xylostella (Lepidoptera) larva per plant. The larvae originated from line G88 (Cornell University) and were reared on an artificial diet (SHELTON et al. 1991), from which they were removed 16 hr prior to transfer to plants. Herbivore-induced transcription of homologous trypsin inhibitors is maximal after 9 hr of feeding in a related species of Brassicaceae (BAUKE 2002). After the experiment, up to six basal leaves were harvested from all plants and immediately placed into liquid nitrogen. All leaves harvested from herbivore-induced plants had feeding scars and thus had the potential for local as well as systemic induction.

For each of 12 plants per accession (two ages, two herbivore treatments, and three replicates), RNA was extracted from ~100 mg of leaf tissue using a standard protocol employing cell lysis with TRIZOL reagent (GIBCO BRL, Gaithersburg, MD) and RNA purification with phenol-chloroform and ethanol precipitation. Approximately 1 µg of total RNA was used for cDNA synthesis, as described by FROHMAN et al. (1988). For reverse transcription polymerase chain reaction (RT-PCR), the amount of total cDNA added for each sample was adjusted so that the RT-PCR product within the linear phase for a housekeeping gene was ~5 ng/µl (RAN; Ras-related nuclear small GTP-binding protein; At5g55190). RAN primers were 5' ACCAGCAAACCGTGGATTACC and 3' CCACAAAGTGAAGATTAGCGTCC (57°; see RT-PCR conditions below). To assess trypsin inhibitor expression, one master mix was made for RT-PCR conducted with primers for all six ATTI loci and RAN. A standard PCR protocol was used with the following cycling scheme: 2 min at 94°; 30 cycles of 30 sec at 94°, 20 sec at primer-specific annealing temperature, and 40 sec at 72°; and a final extension at 72° for 10 min. Locus-specific primers were designed to span the intron, such that PCR products derived from the cDNA could be differentiated from genomic contaminants (primer sequences available upon request). The PCR products were visualized on a 2% agarose gel in a standardized manner. Band intensity provides a semiquantitative measure of the transcription level (ImageQuant 5.1; Molecular Dynamics, Sunnyvale, CA). Prior to analysis of transcription level, each RT-PCR band was standardized to the intensity of the 400-bp band of 2 µl of the Low Mass Ladder (Invitrogen, San Diego) loaded in every ninth well to control for gel effects. For the ATTI RT-PCR products, we standardized bands by the intensity of the RAN band for the given individual and by the PCR efficiency of the primers for that ATTI locus amplifying genomic DNA.
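The band standardization described above can be sketched in Python. This is only one illustrative reading of the procedure, not the authors' actual workflow: the function name, argument names, and example intensities are hypothetical, and each standardization step is assumed to be a simple ratio.

```python
def standardize_atti_band(atti_band, atti_ladder, ran_band, ran_ladder,
                          primer_efficiency):
    """Return a relative ATTI transcript level from raw band intensities.

    All names are hypothetical; each step is assumed to be a simple ratio.
    """
    # Step 1: correct each band for gel effects using the 400-bp ladder band
    # from the same gel.
    atti_gel = atti_band / atti_ladder
    ran_gel = ran_band / ran_ladder
    # Step 2: scale by the housekeeping gene (RAN) of the same individual.
    relative = atti_gel / ran_gel
    # Step 3: correct for locus-specific primer efficiency on genomic DNA.
    return relative / primer_efficiency


# Made-up intensities, for illustration only.
level = standardize_atti_band(atti_band=500.0, atti_ladder=2000.0,
                              ran_band=1000.0, ran_ladder=2500.0,
                              primer_efficiency=0.8)
```

Dividing by the ladder first keeps bands from different gels comparable before the RAN and primer-efficiency corrections are applied.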

Statistical analysis of transcription:
An analysis of variance was performed on the log-transformed standardized transcript level and, because transcription was sometimes zero, 100 was added to each standardized data point prior to log transformation (PROC GLM; version 8, SAS Institute). Haplotype class, herbivory, locus, age, and accession nested within haplotype class were fixed effects (e.g., TEMPLETON et al. 1993). The full model including all interaction effects was analyzed. The locus × herbivory two-way interaction effect was further analyzed using linear contrasts within the context of the ANOVA to determine which loci deviated significantly from the average herbivore-induction response.
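The offset-and-log step can be illustrated with a minimal Python sketch. The logarithm base is not stated in the text, so base 10 is an assumption here, and the function name and example values are hypothetical.

```python
import math


def log_transform(level, offset=100.0):
    """Log-transform a standardized transcript level after adding an offset.

    The offset of 100 follows the text; base 10 is an assumption.
    """
    return math.log10(level + offset)


# A transcript level of zero remains defined after the offset is added.
transformed = [log_transform(x) for x in (0.0, 900.0)]
```

Adding the constant before taking logs keeps samples with no detectable transcript in the analysis instead of producing an undefined value.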

ATTI tandem array—amplification and sequencing:
A single individual was sampled from 17 accessions of A. thaliana (L.) Heynh. (Brassicaceae) originating from Eurasia, North Africa, and North America: Col-0, USA; Cvi-1, Cape Verde Islands; Di-0, France; Fe-1a, Germany; Gö-0, Germany; Ita-0, Morocco; Kas-1, India; Le-0, The Netherlands; Ler-0, Germany; Nd-1, Germany; Nok-0, The Netherlands; Rsch-0, Russia; Sah-0, Spain; Ta-0, Czech Republic; Wei-0, Switzerland; Wil-2, Lithuania; and Ws-0, Belarus. All A. thaliana seeds were obtained from the Nottingham Arabidopsis Stock Center. After harvest, young leaf material from each accession was placed in liquid nitrogen and DNA extraction from 0.1 g of tissue followed the protocol for the Nucleon PhytoPure plant DNA extraction kit (Amersham, Arlington Heights, IL). Primers for PCR amplification were placed in the exons of genes flanking the ATTI array in Columbia (AC002335; arrayF, 5' GGACGGGTCGTTTCAGCTG, and arrayR, 5' GACGTGAGCTTAGAGTTCATAC; 58°). PCR of up to 10 kb of chromosome II was conducted using ELONGase Enzyme mix (GIBCO BRL) with hotstart and an 8-min extension time. PCR products were gel purified and cloned using pCR-Blunt II TOPO vector (Invitrogen). Three clones per accession were sequenced on a 3700 ABI capillary sequencer using primers spaced approximately every 400 bp and designed from the Columbia accession (available upon request). For Ita-0, the ATTI region was amplified in two overlapping segments due to the presence of a large insertion. The ATTI array was amplified from one individual of A. lyrata ssp. petraea from Plech in Bavaria, Germany (49° 54' 99'' N; 11° 30' 64'' E; same as KOCH, Pfaffenhofen b. Neuhaus, Bavaria, Germany, leg. KOCH; KOCH et al. 2001; CLAUSS and MITCHELL-OLDS 2003) using the primers 5' CCAATCGGTTTGGTCCTAAAG and 3' CATTCATTGAAGAACATCACATTG (55°). Three clones from A. lyrata were sequenced using the above primers from A. thaliana as well as additional species-specific primers where necessary.

Sequence analysis:
Sequences were assembled with SeqMan 5.0 (DNASTAR) and all variable sites were checked manually during the construction of a consensus sequence from three clones for each A. thaliana accession and one A. lyrata individual. Only one allele per accession was included in the population genetic analysis of A. thaliana because individuals are derived from multiple generations of inbreeding in the laboratory. For the outgroup A. lyrata, we sequenced one of two alleles. All sequences were aligned with MegAlign 5.03 set to default gap penalty parameters (DNASTAR). Pairwise BLAST searches confirmed insertion/deletion (indel) breakpoints in regions of high polymorphism and repetitive nucleotide sequence. Coding and noncoding regions were inferred according to homology with MTI2, GenBank annotation, and sequenced cDNA clones in A. thaliana (M. J. CLAUSS, unpublished data). The DnaSP program version 3.84 (ROZAS and ROZAS 1999) was used for both intra- and interspecific analyses of nucleotide polymorphism. Nucleotide diversity, π, was estimated according to NEI (1987); θ according to WATTERSON (1975); and nucleotide divergence, K, between A. thaliana and A. lyrata according to NEI (1987). Linkage disequilibrium between variants at different polymorphic sites (HILL and ROBERTSON 1968); HUDSON and KAPLAN's (1985) estimate of the minimum number of recombination events; the recombination parameter, R, per gene (HUDSON 1987); and Wall's B (WALL 1999) were also calculated. Insertions and deletions are not considered in these algorithms. We assessed the spatial distribution of indels in the ATTI region by summing gapped sites in 17 nonoverlapping segments that were 400 bp in length without gaps.
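For readers unfamiliar with the two diversity estimators named above, the following Python sketch computes per-site π (average pairwise differences) and Watterson's θ from a small alignment, skipping gapped columns as the text describes. It is a simplified illustration rather than a reproduction of DnaSP; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def ungapped_columns(seqs):
    """Return alignment columns (as tuples) that contain no gap character."""
    return [col for col in zip(*seqs) if "-" not in col]


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per ungapped site (pi)."""
    cols = ungapped_columns(seqs)
    n = len(seqs)
    n_pairs = n * (n - 1) // 2
    diffs = sum(1 for col in cols for a, b in combinations(col, 2) if a != b)
    return diffs / n_pairs / len(cols)


def watterson_theta(seqs):
    """Segregating sites divided by the harmonic number a1, per ungapped site."""
    cols = ungapped_columns(seqs)
    n = len(seqs)
    s = sum(1 for col in cols if len(set(col)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    return s / a1 / len(cols)


# Toy alignment: 4 sequences, 6 columns, one of which contains a gap.
toy = ["ACGT-A", "ACGTTA", "ATGTTA", "ATGATA"]
pi = nucleotide_diversity(toy)
theta = watterson_theta(toy)
```

In the toy alignment, five columns are ungapped and two of them are polymorphic, so π is (7/6)/5 and θ is (2/a1)/5 with a1 = 1 + 1/2 + 1/3.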

Sequences were tested for departure from equilibrium-neutral expectations using statistics from TAJIMA (1989), FAY and WU (2000), MCDONALD and KREITMAN (1991), and HUDSON et al. (1987). For Fay and Wu's H, coalescent simulations based on a neutral infinite-sites model assuming a large constant population size were employed to identify significant departure from neutral expectations. Input parameters for coalescent simulations were: (1) the observed θ per gene (estimated as the average number of nucleotide differences), (2) the sample size, (3) the estimated recombination parameter per gene (R), and (4) the estimated value of Fay and Wu's H. Homogeneity of diversity and divergence in silent sites along the tandem array was tested using a multilocus Hudson, Kreitman, and Aguadé (HKA) test for eight nonoverlapping, artificially defined loci each encompassing ~500 nongapped positions (multilocus HKA; distributed by Jody Hey through http://www.lifesci.rutgers.edu/heylab/).
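As an illustration of the first of these statistics, Tajima's D contrasts the average number of pairwise differences (k) with the number of segregating sites (S) scaled by a1. The Python sketch below follows the standard published formula; the function name and the example inputs are hypothetical, and returning 0.0 when S = 0 is a convention chosen here, not something taken from the text.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, and mean pairwise differences k."""
    if s == 0:
        return 0.0  # convention chosen here; D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two estimators of theta.
    # Denominator: square root of the estimated variance of that difference.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy inputs: 4 sequences, 2 segregating sites.
d_intermediate = tajimas_d(n=4, s=2, k=7.0 / 6.0)  # intermediate-frequency variants
d_singletons = tajimas_d(n=4, s=2, k=1.0)          # both variants are singletons
```

Variants at intermediate frequency inflate k relative to S/a1 and push D above zero, whereas an excess of rare variants pushes it below zero.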


*  RESULTS

Heterogeneous ATTI expression:
Semiquantitative RT-PCR estimating mRNA levels of ATTI loci in rosette leaves for seven accessions of A. thaliana demonstrated that five of the six ATTI loci were transcribed. Although positive controls for three primer pairs with genomic DNA resulted in ATTI5 amplification, we never observed ATTI5 bands in RT-PCR analyses for any accession or tissue type (including basal leaves, flowers, fruits, and roots; results not shown). Consequently, the ATTI5 locus was excluded from subsequent analyses. The ATTI5 locus was identified here for the first time on the basis of sequence similarity and position within the tandem array from the accessions Cvi-1, Fe-1a, Nd-1, and Wei-0 (see below). The ATTI4 locus, presently not annotated in the Col-0 sequence as a member of the trypsin inhibitor gene family, was transcribed in all accessions tested.

There was differential regulation among ATTI loci, among haplotype classes, and in response to herbivory, as seen by the significant main effects in the ANOVA of log-transformed transcript levels (Table 1). Gene regulation differed among loci with ATTI3 and ATTI6 having the highest transcript levels (Table 1; Fig 2A). The haplotype classes differed significantly in transcription; overall, haplotype A (HA) had higher transcript levels (Table 1; Fig 2B). Only ATTI6 showed the opposite response, with haplotype B (HB) having more transcripts (Fig 2B). Accessions were assigned to haplotype class on the basis of the presence (HB) or absence (HA) of the ATTI5 region, as well as >150 nucleotide polymorphisms in linkage disequilibrium throughout the array (see below). The effect of accession nested within haplotype class explained no additional variation in transcript levels (Table 1). The herbivory treatment accounted for the largest source of variation in the model, with herbivore-induced plants having, on average, 4.5 times greater transcript levels than uninduced control plants (Table 1; Fig 2C). Loci were induced differently by herbivory, such that transcription plasticity was greater than average for ATTI3 and lower than average for ATTI2 (Table 1, linear contrasts; Fig 2C). Plant age did not account for significant variation in expression. However, the two-way interaction of locus and age was significant, indicating the potential for age-specific regulation among loci (Table 1; Fig 2A). The remaining two-way as well as all three- and four-way interaction terms were not significant and, therefore, were removed from the model for clarity.



 
Figure 2. Transcript levels of six ATTI loci for seven accessions of A. thaliana showing the effect of (A) locus and age (4 weeks, solid bars; 6 weeks, shaded bars), (B) locus and haplotype class, and (C) locus and herbivory. In B, the interaction between locus and haplotype class is graphically represented as expression of haplotype A minus the expression of haplotype B. Plotted values are least-square means from the full factorial ANOVA with untransformed data.


 

 
Table 1. Analysis of variance of ATTI transcript level in A. thaliana

Nucleotide variation in the ATTI array:
The alignment of 17 accessions of A. thaliana for the ATTI tandem array located on chromosome II spanned 9019 bp (from 3' of the preceding to 5' of the succeeding gene). An analysis based on 7016 nongapped sites identified 239 polymorphic sites (including 59 singletons), 16 haplotypes, nucleotide diversity estimates of π = 0.0107 and θ = 0.0102, and a minimum of six recombination events. Unless otherwise stated, analyses and references to position are based on a 10,352-bp global alignment of 17 A. thaliana accessions with A. lyrata as the outgroup.
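The minimum number of recombination events rests on the four-gamete test: under an infinite-sites model, two biallelic sites that show all four allele combinations must be separated by at least one recombination event. The Python sketch below is a simplified reading of that approach, assuming biallelic sites coded as characters and a greedy selection of non-overlapping intervals; it is not the DnaSP implementation, and the function names and toy haplotypes are hypothetical.

```python
from itertools import combinations


def four_gamete_intervals(haplotypes):
    """Return site-index pairs (i, j) whose sites show all four gametes."""
    n_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # assumes biallelic sites
            intervals.append((i, j))
    return intervals


def min_recombination_events(haplotypes):
    """Count the largest set of non-overlapping four-gamete intervals."""
    intervals = sorted(four_gamete_intervals(haplotypes), key=lambda iv: iv[1])
    count, last_right = 0, -1
    for left, right in intervals:
        # Intervals that share only an endpoint count as non-overlapping.
        if left >= last_right:
            count += 1
            last_right = right
    return count


# Toy haplotypes: every pair of adjacent sites shows all four gametes.
rm_toy = min_recombination_events(["000", "011", "101", "110"])
rm_none = min_recombination_events(["00", "01", "11"])
```

Because overlapping intervals can be explained by a single event, this count is a lower bound on the true number of recombination events.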

Under neutrality, nucleotide sequences exhibiting high divergence among species are also predicted to evolve at high rates within species. In the ATTI region, however, diversity and divergence were uncorrelated (rSpearman = 0.137; P = 0.996; n = 17 nonoverlapping 300-bp windows), and a multilocus HKA test rejected homogeneity of polymorphism and divergence (eight nonoverlapping loci of ~500 bp; χ² = 24.71; P < 0.001). In relation to divergence, polymorphism was disproportionately lower in the 5' and 3' ends of the region and higher flanking ATTI4, in the middle of the array (Fig 3).



 
Figure 3. The ratio of silent-site diversity (π) in A. thaliana to silent-site divergence (K) from A. lyrata ssp. petraea for nonoverlapping 400-bp windows. Approximate locations of ATTI loci are indicated. Loci in brackets fall within the indicated windows in A. thaliana, but were not included in the calculation because they were deleted from A. lyrata ssp. petraea. Average π/K = 0.12, indicated by horizontal line.

Significant linkage disequilibrium was evident throughout the entire tandem array (55% of comparisons by Fisher's exact test significant at P < 0.001; Fig 4), giving rise to two haplotype clades (Fig 5). In addition to the six recombination events identified via the HUDSON and KAPLAN (1985) algorithm applied to nucleotide polymorphisms for all accessions, visual inspection of indel and nucleotide polymorphisms together suggested two novel recombination events between positions 4470–4607 and 10009–10107 (Fig 4). Distribution of these eight recombination events was not uniform; events were concentrated in the 5' and 3' ends of the array (a posteriori Fisher's exact test of the number of recombination events in the first and last quarters of polymorphic sites combined vs. middle 50%; one-tail P = 0.038; Fig 4). Three of eight recombination events resulted in a switch of the downstream haplotype category for that accession (interhaplotype recombination; Fig 4). The common haplotype (HA) was represented by the Col-0 sequence. The less frequent haplotype (HB), typified by the presence of the ATTI5 locus (see below), was found in 4 of 17 accessions in this study (frequency of HB is 0.24); HB was found at a frequency of 0.17 in a sample of 80 randomly chosen accessions of A. thaliana (results not shown).
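The pairwise test of association between polymorphic sites can be sketched as a two-sided Fisher's exact test on a 2 × 2 table of allele combinations. The Python example below is a minimal illustration using only the standard library, not the DnaSP procedure; the function name and the toy counts are hypothetical. As a side check, the 15,576 comparisons reported in the Figure 4 legend equal the number of pairs among 177 sites, although that site count is an inference from the arithmetic and is not stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Two sites in complete association among 17 accessions (13 vs. 4 split).
p_linked = fisher_exact_two_sided(13, 0, 0, 4)
# Number of pairwise comparisons among 177 sites (inferred site count).
n_pairs = comb(177, 2)
```

With 17 accessions, two sites that each separate the same 4 accessions from the other 13 give P = 1/2380, which falls below the 0.001 threshold used in Figure 4.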



 
Figure 4. Polymorphic sites among 17 accessions of A. thaliana showing significant linkage disequilibrium (Fisher's exact test; P < 0.001 for 8608 of 15,576 pairwise comparisons) across 10 kb on chromosome II. Locations of ATTI loci are indicated and shaded where polymorphic sites exist within the coding sequence. Although not included in the analysis due to indels, the approximate position of ATTI5 is also given. An unknown gene is located between 8958 and 9467. X identifies the approximate upper position of the inferred recombination events from (a) the HUDSON and KAPLAN (1985) algorithm based on nucleotide polymorphisms and (b) visual inspection of both nucleotide and indel polymorphisms, as well as (c) the interhaplotype recombination events.



 
Figure 5. Neighbor-joining analysis with 1000 bootstrap replicates showing nodes >50% for 3411 silent-site polymorphisms in ATTI array among the 13 noninterhaplotype recombining accessions of A. thaliana using one outgroup sequence from A. lyrata ssp. petraea.

Within the ATTI array, a peak in species-wide diversity centered on ATTI5 was the result of a large number of fixed differences among haplotypes (Fig 6A; π > 0.04 in one 400-bp window; n = 17 accessions). Diversity within each haplotype was similarly low throughout the array (HA π = 0.0032; HB π = 0.0027) and showed no underlying peaks in polymorphism near ATTI5 (Fig 6A). Using two accessions that are representative of HA and HB (Col-0 and Fe-1a, respectively), we polarized each polymorphism in the array using the A. lyrata sequence as the outgroup. Equal numbers of derived nucleotide substitutions were found in the two lineages (62 each in Col-0 and Fe-1a). Even within the largest nonrecombining region in the array interior (Fig 4), derived and ancestral substitutions were evenly distributed among haplotypes (Table 2).



 
Figure 6. Patterns of nucleotide polymorphism among 17 accessions of A. thaliana spanning 10 kb on chromosome II. Position is given from global alignment with A. lyrata ssp. petraea, and the locations of loci in the ATTI tandem array are indicated. All windows are 400 bp with a 100-bp step, shaded double arrows represent regions of indels, and asterisks indicate regions of significant deviation from neutrality for the respective test (P < 0.01). (A) Diversity (π total) for all accessions (solid line; n = 17), haplotype A accessions not involved in interhaplotype recombination (dotted line; n = 10), and haplotype B accessions not involved in interhaplotype recombination (dashed line; n = 3). (B) Tajima's D (n = 17) and (C) Fay and Wu's H (n = 17 accessions and A. lyrata ssp. petraea).


 

 
Table 2. Distribution of derived indel and nucleotide polymorphisms in two haplotype classes of A. thaliana regions flanking the ATTI5 deletion

We tested whether the species-wide frequency spectrum of polymorphisms deviated from expectations under neutral equilibrium population dynamics. A sliding window analysis of the ATTI region demonstrated that, on average, Tajima's D was not different from zero (D = 0.22; P > 0.10). However, two 400-bp segments did deviate significantly from zero: D = –2.08 (P < 0.05) and D = 2.26 (P < 0.05), corresponding to the 5' regions of ATTI2 and ATTI6, respectively (Fig 6B; see Single-locus analysis below). A sliding window analysis of Fay and Wu's H demonstrated that, within the context of an overall excess of high-frequency-derived polymorphisms (H = –36.06; P < 0.001), distinct deviations from neutrality were associated with ATTI2 and ATTI4 (Fig 6C).

Insertion/deletion variation in the ATTI array:
The pattern of insertions and deletions contains valuable population genetic information and simultaneously can be the source of hidden biases in the analysis of nucleotide diversity. Alignment algorithms employing a high or low gap penalty can result in elevated or reduced diversity parameters within the resulting alignments, respectively. Indels were found throughout the ATTI array, with an indel peak roughly coinciding with the peak of polymorphism (rSpearman = 0.48; P = 0.049; n = 17 nonoverlapping 400-bp windows; Fig 7). This positive correlation is not expected if high diversity were simply an alignment artifact. When we polarized 230 gapped sites in accessions representative of haplotypes A and B relative to the A. lyrata outgroup (Col-0 and Fe-1a, respectively), we observed approximately equal numbers of insertion and deletion events (20 and 23, respectively), although 2.7 times more sites were deleted (163) than were inserted (67). Derived indels were distributed evenly among haplotype clades surrounding the ATTI5 deletion (Table 2).



 
Figure 7. Polymorphism among 17 accessions of A. thaliana over ~10 kb calculated for nonoverlapping windows 400 bp in length excluding gaps for (top) nucleotide polymorphisms and (bottom) insertion/deletion polymorphisms (log plus one). Correlation between nucleotide and indel polymorphisms was rSpearman = 0.48, P = 0.049 for all data and rSpearman = 0.45, P = 0.069 after removal of the two largest indels. Approximate locations of ATTI loci are indicated by arrows in an alignment of only A. thaliana sequences. ATTI5 was not included in the calculations because it is located within an indel. Unk is At2g43540 (see MATERIALS AND METHODS).

In the Ita-0 accession, we identified a 4857-bp indel that appears to be a class II (DNA) transposable element insertion as evidenced by the presence of an 11-bp terminal inverted repeat, flanking 5-bp target site duplication, and sequence similarity to A. thaliana MuDR-like elements. The element was located within the 5' untranslated region (UTR) of ATTI4 (–85 bp of ATG, with no discernible effect on transcription; M. J. CLAUSS, unpublished data) and was lacking in the remaining 16 accessions as well as in the A. lyrata outgroup. The second largest indel was a 1505-bp deletion observed in 13 of 17 accessions, which includes the previously undescribed ATTI5 locus (Fig 1). The ATTI5+ allele was 43% diverged from the most similar locus; hence ATTI5 probably represents the recent deletion of an old gene family member. Consistent with this hypothesis, diversity within the ATTI5 region (π = 0.0034 for positions 7273–8777) was not reduced relative to average diversity across the entire tandem array (π = 0.0027 for positions 197–10,548) when only the four accessions with ATTI5+ (haplotype B; Fig 6A) were considered. Comparison with the outgroup A. lyrata was not informative because ATTI5, as well as ATTI3, was deleted in all alleles surveyed thus far (M. J. CLAUSS, unpublished data). Although the ATTI5 deletion sites in A. thaliana and A. lyrata were within 100 bp of each other, homology of the deletion was uncertain because of ambiguous alignment in this highly repetitive and polymorphic region.

Single-locus analysis:
The six ATTI loci differ in their level of diversity within A. thaliana, the frequency distribution of polymorphisms, and among-species divergence (Table 3, Table 4, and Table 5; Fig 7). The consensus sequence of all ATTI loci was in frame, with no pseudogene signatures. Variation in sequence diversity among the five expressed ATTI loci was positively correlated with variability in transcript levels among accessions (Fig 8). A GA microsatellite was located 3–13 bp 5' to the ATG in each TI locus; only for the nontranscribed ATTI5 locus was it large and variable (up to 25 repeats), and there was a single missense replacement substitution in one accession (Cvi-1). As expected for functional genes, average diversity at ATTI loci (π = 0.0057) was lower than diversity across the ~10 kb spanning the array (πtotal = 0.0107 or πsilent = 0.0135; Table 3 and Table 4). Within this chromosomal context, diversity for ATTI1 and ATTI2 was extremely low (π = 0.0009), whereas π for ATTI4 was high (0.0110), approaching that of silent-site diversity. Differences in the extent of haplotype structure were responsible for some of this diversity: ATTI1 and ATTI2 had only singleton polymorphisms, whereas the two dominant haplotypes could be identified in the remaining loci (e.g., Wall's B for ATTI4 and ATTI6 was 0.54 and 0.44, respectively; Table 4). Among functional regions, exon 1, coding for most of the signal peptide, had the greatest average diversity, even higher than that of the 5'-UTR (Table 3). Significant deviation in the frequency spectra from equilibrium-neutral expectations was observed for the ATTI2 locus (including flanking region; Tajima's D = –2.02; P < 0.002) and ATTI6 (coding sequence only; D = 2.20; P < 0.01; Table 4). After a sequential Bonferroni correction (HOLM 1979) is applied to all 16 tests of site frequency spectra in Table 4, these results remain significant at the P = 0.05 level.
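The sequential Bonferroni procedure of Holm can be sketched as follows. This is a generic illustration of the step-down rule, not the authors' code; the function name and the P values in the example are hypothetical and do not come from Table 4.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects the null."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is compared with alpha / (m - k + 1)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P values are retained
    return reject


# Hypothetical P values for four tests
example_p = [0.04, 0.001, 0.5, 0.003]
print(holm_reject(example_p))
```

The smallest P value is held to the full Bonferroni threshold (alpha divided by the number of tests), and each subsequent one to a progressively less stringent threshold, which makes the procedure less conservative than a single-step Bonferroni correction.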

Figure 8. Correlation between sequence diversity in the coding sequence and the coefficient of variation in expression among seven accessions of A. thaliana for members of the ATTI gene family (r² = 0.41).

Table 3. Diversity (π) among 17 accessions of A. thaliana and divergence (K) to A. lyrata ssp. petraea for six ATTI loci

Table 4. Analysis of site frequency spectra for 17 accessions of A. thaliana

Table 5. Rates of synonymous (s) and replacement (a) substitution within A. thaliana for 17 accessions and between A. thaliana and A. lyrata ssp. petraea

Although average ATTI divergence between A. thaliana and A. lyrata was similar to divergence across the entire chromosomal region (K = 0.100), K ranged from 0.0661 to 0.1671 among loci (Table 3). Among coding positions (~270 bp per gene) in the four ATTI loci for which an outgroup comparison was possible, divergence and diversity were positively correlated (rSpearman = 0.9487; P = 0.05; n = 4; Table 3). Although the significant HKA test reported above rejected homogeneity for the entire 10-kb region, HKA was nonsignificant when only the coding regions were compared (results not shown). Among functional regions, the lowest divergence, on average, was seen for coding sequences (8%), whereas the 5'-UTR was on average 17% divergent (Table 3). High-frequency-derived polymorphisms were in significant excess for ATTI2 (Fay and Wu's H; Table 4).
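For readers unfamiliar with Fay and Wu's H, the statistic contrasts two estimators of diversity that weight derived-allele frequencies differently. The sketch below assumes that each segregating site has already been polarized against an outgroup and is summarized by its derived-allele count; the function name and the example counts are hypothetical.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    derived_counts: number of sampled sequences carrying the derived allele,
    one entry per segregating site; n: number of sequences sampled.
    """
    if any(not 0 < i < n for i in derived_counts):
        raise ValueError("each derived count must lie strictly between 0 and n")
    pairs = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / pairs
    theta_h = sum(2 * i * i for i in derived_counts) / pairs
    return theta_pi - theta_h


# Hypothetical counts in a sample of 17 sequences
print(fay_wu_h([16, 16, 15], 17))  # mostly high-frequency derived alleles
print(fay_wu_h([1, 1, 2], 17))     # mostly low-frequency derived alleles
```

Sites where the derived allele is at high frequency contribute strongly to theta_H and push H below zero, which is the signature referred to above for ATTI2.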

A comparison of synonymous (Ks) and amino acid replacement (Ka) substitutions can provide evidence of the evolutionary rate of functionally important changes on three levels in gene families: intraspecific diversity within loci, species divergence within loci, and divergence among loci. For ATTI, we found lower replacement than synonymous rates of evolution at each level. First, within A. thaliana, πa/πs varied from 0.29 (ATTI1) to 0.76 (ATTI6) along the array, suggesting different levels of purifying selection among loci (Table 5). Second, among species, the Ka/Ks ratio was lower than πa/πs within A. thaliana (Table 5). It is notable that Ka/Ks for ATTI6 was the lowest observed (0.14). Third, comparisons among all pairs of ATTI loci for one representative accession (Fe-1a) also illustrated overall selective constraint in trypsin inhibitor evolution (Ka/Ks << 1): average Ka/Ks ratios for all comparisons including ATTI1 through ATTI6 were 0.33, 0.40, 0.44, 0.43, 0.61, and 0.21, respectively. Thus, ATTI5 was under the least selective constraint (0.61), whereas amino acid evolution appears most constrained for ATTI1 and ATTI6.
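The logic of a Ka/Ks comparison can be illustrated with a simple calculation. The sketch below assumes that the numbers of synonymous and replacement differences and sites between two sequences have already been counted, and applies a Jukes-Cantor correction to each proportion, as in the Nei-Gojobori approach; the estimation method actually used for Table 5 is not stated in this section, and all input values are hypothetical.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (Ka, Ks, Ka/Ks) from counted differences and sites."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka, ks, ka / ks


# Hypothetical counts for a ~270-bp coding region
ka, ks, ratio = ka_ks(syn_diffs=10, syn_sites=70, nonsyn_diffs=5, nonsyn_sites=200)
print(ka, ks, ratio)
```

A ratio well below 1, as in this toy example, indicates that replacement changes accumulate more slowly than synonymous changes, which is the pattern interpreted above as selective constraint.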

Of particular interest for understanding changes in inhibitory function are replacement changes in the reactive site loop (Fig 1). While there were synonymous changes between the P3 to P3' positions in some loci, no within-species replacement polymorphisms were observed within this critical region. The most common residue at the P1 position for ATTI, arginine, was replaced by lysine in ATTI4 and ATTI6 (Fig 1).


*  DISCUSSION

We found a significant association between gene expression and haplotype class for loci of the A. thaliana trypsin inhibitor (ATTI) tandem gene family. Membership in two molecularly defined haplotype classes explained a significant proportion of the variation in transcription, whereas the contribution of accession nested within haplotype was not significant (Table 1). Haplotype classes differed up to sixfold in transcript level (Fig 2B). Within A. thaliana, linkage disequilibrium generally extends over <250 kb; hence a significant association between haplotype and function suggests cis-regulation at or near the tandemly arrayed ATTI loci, whereas unlinked trans-acting regulation is less likely (HAUBOLD et al. 2002; NORDBORG et al. 2002; SHEPARD and PURUGGANAN 2003).

An association between naturally occurring sequence haplotypes and function has been observed in A. thaliana for several qualitative phenotypes relating to pathogen resistance (STAHL et al. 1999; TIAN et al. 2002; MAURICIO et al. 2003). However, in an analysis of quantitative variation for a plant-herbivore defense trait, HAUSER et al. (2001) did not detect a significant relationship between nucleotide polymorphism and trichome density. Identifying significant functional differences among divergent alleles is critical for efforts to connect nucleotide sequence variation to ecologically important characteristics undergoing natural selection. In this study of transcription of herbivore-induced proteinase inhibitor genes, we identified significant associations between sequence haplotypes and quantitative functional variation in just one component of a complex phenotype. Additional components such as protein synthesis, inhibitory activity, and the fitness consequences of alternative alleles remain to be assessed. Population genetic analyses of polymorphisms can also provide evidence of the adaptive significance of functional variation. Below, we describe the chromosomal landscape of polymorphism surrounding the ATTI tandem array and consider the evidence for locus-specific functional evolution.

ATTI loci in the chromosomal landscape:
Tandemly duplicated genes located in close proximity to one another not only share a demographic and functional history, but also exist within a shared chromosomal landscape. For 10,352 aligned nucleotide positions, we observed: (1) average levels of overall polymorphism within A. thaliana and divergence to A. lyrata, (2) two distinct haplotype classes in A. thaliana with similar nucleotide diversity within each class, (3) significant linkage disequilibrium, (4) abnormally low levels of diversity associated with ATTI1 and ATTI2, and (5) a peak of polymorphism spanning the presence/absence polymorphism of the previously undescribed ATTI5 locus. Significant heterogeneity of polymorphism relative to divergence allowed us to reject the hypothesis that the genomic region encompassing the ATTI tandem array has evolved according to a homogeneous neutral model (Fig 3; multilocus HKA; P < 0.001).

The co-occurrence of a peak of polymorphism at one functional locus and distinct haplotype classes has previously been reported in several studies of A. thaliana molecular evolution and may indicate balancing or frequency-dependent selection (STAHL et al. 1999; TIAN et al. 2002, 2003; MAURICIO et al. 2003; but see AGUADE 2001). The peak of polymorphism documented for ATTI differs from previous reports in several ways.

First, the polymorphism peak in the ATTI tandem array spans >1.2 kb of nongapped sequence, including three genes, their promoter regions, and intergenic sequences (Fig 3 and Fig 6). The genes are part of a tandem array of duplicated loci that are functionally differentiated and thus represent several linked targets of selection (e.g., PARSCH et al. 2001; BAINES et al. 2002).

Second, the peak of nucleotide polymorphism in the ATTI array centers on a presence/absence polymorphism for an old, apparently nonfunctional, gene copy (ATTI5). Whereas evidence in favor of an indel polymorphism maintained by selection is strengthened by the presence of at least one functional allele (e.g., RPM1; TIAN et al. 2003), our analysis indicated that both the complete gene deletion and the ATTI5+ allele were nonfunctional. The lack of transcription, elevated Ka/Ks ratio, and a large polymorphic microsatellite immediately 5' of the ATG all suggest that ATTI5+ is a pseudogene and, hence, not a candidate for an extant balanced polymorphism. However, loss-of-function in ATTI5+ appears to be relatively young, and thus this locus may have influenced past evolution in the array. Comparative data from A. lyrata do not shed light on this process, because ATTI5 was also deleted in A. lyrata and the deletion site for both species is in the same repetitive, polymorphic region for which sequence alignment was ambiguous. The high level of divergence between ATTI5 and all other ATTI genes (>43%) suggests that the indel represents the deletion of an old ATTI5 allele, rather than a recent insertion via duplication from an existing TI locus. Recent gene conversion is also unlikely because (a) the most similar TI, ATTI4, was 43% divergent from ATTI5+ and (b) ancestral and derived nucleotide polymorphisms were evenly distributed among ATTI5+ and ATTI5 haplotypes in the nonrecombining region flanking the deletion (Table 2).

Third, although the peak of polymorphism reflects fixed differences among two haplotype classes spanning several ATTI loci, the allele frequency spectrum does not provide statistical support for a balanced polymorphism. The worldwide frequency of the ancestral haplotype B (estimated by the frequency of the ATTI5+ allele) was 0.17, and nucleotide polymorphisms in linkage disequilibrium with ATTI5 did not deviate significantly from expectations under a neutral equilibrium model (Fig 6B). Diversity across the sampled 10 kb was similar for both ATTI haplotype classes (HA π = 0.0032 and HB π = 0.0027; or if accessions with interhaplotype recombination are excluded, HA π = 0.0017 and HB π = 0.0015), suggesting coexisting allele classes with segregating variation of similar age. The even distribution of derived sites among haplotypes also argues strongly against introgression of a divergent allele (Table 2). One possible explanation for this pattern is frequency-dependent selection for the maintenance of long-lived and divergent clades as proposed by STAHL et al. (1999) for the RPM1 pathogen-defense gene. Although our peak spans an unlikely target of current selection, a simulation study by NORDBORG and INNAN (2003) argues that the peak of polymorphism need not always be centered on the site of selection (see discussion of ATTI6 below). However, two dominant haplotype classes are also a likely outcome of a neutral coalescence process without recombination (e.g., as seen within loci in inbreeding populations; HUDSON 1990; AGUADE 2001; CHARLESWORTH 2003).

Functional evolution in closely linked ATTI loci:
ATTI loci exhibit substantial sequence divergence (65%), differences in reactive site residues determining functional specificity, and variation in constitutive as well as induced transcription (Fig 1 and Fig 2). In A. thaliana there was an association of nucleotide polymorphism with ATTI transcript level as demonstrated by significant haplotype and locus × haplotype effects on transcription (Table 1; Fig 2) and by the positive correlation between sequence diversity and variability in transcription among loci (Fig 8). This functional and molecular diversity, together with multiple recombination events (Fig 4), suggests the potential for independent adaptive evolution among tandem duplicates in A. thaliana. However, the coding sequences of ATTI loci are in close physical linkage (neighboring genes are separated on average by only 716 bp), and we estimated significant linkage disequilibrium, particularly among polymorphisms in the interior of the tandem array (Fig 4). Below, we interpret the patterns of polymorphism and divergence in functionally differentiated TI loci within the constraints of their genetic backgrounds.

Unique patterns of sequence diversity and expression for the first two TI loci indicate an evolutionary history distinct from the remaining gene family members. ATTI1 and ATTI2 exhibit extremely low levels of intraspecific polymorphism in comparison to other A. thaliana loci (Table 3 and Table 4; Fig 2A; KAWABE et al. 1997; KUITTINEN and AGUADE 2000; SAVOLAINEN et al. 2000; AGUADE 2001; HAUSER et al. 2001; OLSEN et al. 2002; TIAN et al. 2002; WRIGHT et al. 2003; RAMOS-ONSINS et al. 2004), average divergence from A. lyrata (Table 5; SAVOLAINEN et al. 2000; AGUADE 2001; RAMOS-ONSINS et al. 2004), and low transcript levels (Fig 2A). Both loci appear to be under strong purifying selection; the nonsynonymous substitution rate was much less than the rate of synonymous changes (Table 5). Conservation of the amino acid sequence in the reactive site loop between ATTI1, ATTI2, and ATTI1 orthologs in Sinapis (MTI2) and Brassica (RTI3) also indicates functional constraint (ZHAO et al. 2002). Phage display experiments with MTI2 have demonstrated that the P3-P3' APRIFP reactive loop shared by ATTI1 and ATTI2 and, in particular, the arginine at the P1 position result in maximal inhibition of trypsin (CECI et al. 2003). Nonetheless, a low π/K ratio (Fig 3) and an excess of high-frequency-derived polymorphisms concentrated in the signal peptide and 5'-UTR of ATTI2 are consistent with recent positive selection (Fig 6, B and C; Table 4; CLAUSS and MITCHELL-OLDS 2003). An evolutionary trajectory independent from the rest of the tandem array was facilitated by several recombination events that reduced the correlation between polymorphisms in ATTI1 and ATTI2 vs. downstream TI loci (Fig 4).

ATTI3, ATTI4, and ATTI5 are located in a region with excess species-wide polymorphism, excess indels, and almost complete linkage disequilibrium (Fig 4, Fig 6, and Fig 7). Nonetheless, ATTI3 and ATTI4 were functional and show evidence of selective constraint (πa/πs < 1; Table 5). ATTI3 has the common trypsin inhibitor P1 reactive site residue (arginine) and was the most highly transcribed ATTI locus in both control and herbivore-induced treatments (Table 1; Fig 2C). ATTI4 has lysine at the P1 position and showed a low-to-intermediate transcription profile (Fig 2C). Lysine at the P1 position also results in trypsin inhibition (POLTICELLI et al. 1999) and is found in unrelated trypsin inhibitors (LING et al. 1993). Thus, the change in reactive site at ATTI4 from arginine to lysine is more likely to reflect modification of trypsin inhibition than an entirely novel function. We hypothesize that modification of the reactive site in ATTI4 occurred within the last 5 MY (after divergence of A. thaliana and A. lyrata) because the ATTI4 ortholog in A. lyrata has the putatively ancestral arginine at the P1 position. ATTI4 also differs from the remaining functional ATTI array members in having a C-terminal peptide that may affect regulation of inhibitory activity toward endogenous proteases during sequestration (Fig 1A; VOLPICELLA et al. 2000; DE LEO et al. 2001A). Population genetic tests of ongoing evolution by natural selection in ATTI4 were confounded by haplotype structure and linkage disequilibrium within the tandem array. Although Fay and Wu's H reached its nadir in the 3' region of ATTI4 (Fig 6C; Table 4), the statistic was also significantly negative for the entire nonrecombining region spanning ATTI3-ATTI5 (H = –24.7; P < 0.001; n = 1462 aligned nongapped positions).
Functional analyses also pointed to positive correlations across loci: transcription of both functional TI loci in this linkage group differed significantly among haplotypes (Fig 2B). At this juncture, we can only comment that HA for ATTI4 had the greatest proportional increase in expression over HB observed for all loci (result not shown) and that this difference was associated with a high local concentration of derived polymorphisms (eight of nine polymorphisms derived in HA). One possible explanation for this pattern is ongoing positive selection for the high-expression ATTI4 HA allele and hitchhiking at linked loci. Further functional experiments and population genetic analyses of a larger sample of accessions, including more naturally occurring recombinants, are needed to disentangle these correlated functional and evolutionary patterns.

The function of ATTI6 in A. thaliana and A. lyrata appears to be modified in comparison to upstream TI loci. ATTI6 has lysine in place of arginine at the P1 position (as in ATTI4) and two additional amino acid substitutions in the P3-P3' reactive site loop in comparison to ATTI1 and ATTI2. The potential for independent functional evolution of ATTI6 in A. thaliana was suggested by three recombination events within and flanking this locus (Fig 4). ATTI6 is functionally unique among ATTI loci in that transcript levels for haplotype B exceeded those of haplotype A (Fig 2B). An interhaplotype recombination event located ~880 bp 5' of ATTI6 was associated with a switch from haplotype A to B downstream and a concordant dramatic switch in expression. This result identifies candidate polymorphisms for regulatory control of haplotype-specific ATTI6 expression and is consistent with promoter-deletion experiments that have identified the 520 bp 5' of MTI2 essential for gene expression (DE LEO et al. 2001A). In addition to the distinctive pattern of expression for ATTI6, this locus also had a population genetic signature unique among ATTI loci. A significant excess of intermediate frequency polymorphisms was found in ATTI6 (Tajima's D = 2.20; P < 0.01; Table 4). While we must be cautious of type I errors, we feel justified in further exploring the change in allele frequency distribution at this location in the array (Fig 6B) because of the highly significant change in expression pattern at this locus (Fig 2B). Several possible processes may give rise to an intermediate frequency polymorphism, including balancing selection at the ATTI6 locus, directional selection at ATTI6 constrained by selection at linked sites (e.g., interference or traffic), or nonequilibrium population dynamics.
We reject population admixture or other forms of population structure as the primary cause of intermediate frequency polymorphisms and elevated πa/πs observed at ATTI6, because demographic processes are expected to be pervasive throughout the genome and affect all site categories (Table 3 and Table 4). The role of balancing selection or frequency-dependent selection favoring alternate alleles at ATTI6 must be further explored via ecologically informed functional studies (e.g., TIAN et al. 2003).

Interference or traffic occurs when variants are positively selected but fail to go to fixation because of conflicts among multiple linked segregating sites (KIRBY and STEPHAN 1996; KIM and STEPHAN 2003). If increased transcript levels are selectively favored, a balanced polymorphism at ATTI6 could reflect interference between positive selection for haplotype B at ATTI6 and positive selection for haplotype A upstream in the ATTI3-ATTI5 linkage group (e.g., BAINES et al. 2002). Population subdivision and low outcrossing rates in A. thaliana reduce the probability of a recombination event coupling two putatively selected alleles separated by only 2100 bp (NORDBORG et al. 2002). If positive selection to fix the rare beneficial recombinant is less effective in A. thaliana due to reduced Ne and other correlated characteristics of this inbreeding ruderal (CHARLESWORTH 2003), the selective conflict and, thus, the alternate alleles can have long residence times in the global population. It is also possible that past interference with positive selection for haplotype A alleles at ATTI2 or ATTI1 may have contributed to the patterns of nucleotide and functional polymorphism at downstream loci. Our data are consistent with the hypothesis that interference plays an important role in limiting adaptive evolution in densely packed tandem genes for this highly selfing species. In the population genetic context of the closely related outcrossing species, A. lyrata, effective recombination appears to be widespread even within ATTI loci (CLAUSS and MITCHELL-OLDS 2003). Thus, constraints to functional evolution in A. thaliana may well be ephemeral and need not necessarily inhibit the long-term evolutionary potential of tandemly duplicated loci.

Population genetics of defense:
Patterns of nucleotide polymorphism in plant defense-related genes can be divided into three categories. First, we find examples that are consistent with selective sweeps, as predicted by models of a coevolutionary arms race between plants and their enemies (BITTNER-EDDY et al. 2000; BERGELSON et al. 2001; MONDRAGON-PALOMINO et al. 2002; ZHANG et al. 2002B; CLAUSS and MITCHELL-OLDS 2003; this study). Second, and contrary to the predictions emerging from the classic arms race model, several studies demonstrate balanced polymorphisms of ancient alleles, which have been interpreted in view of balancing selection or "trench warfare" (STAHL et al. 1999; TIAN et al. 2002; KROYMANN et al. 2003; MAURICIO et al. 2003). Third, many loci display patterns that are complex and difficult to explain under simple evolutionary models of neutrality or selection (KAWABE et al. 1997; TIFFIN and GAUT 2001; ZHANG et al. 2002B; this study).

In gene-for-gene pathogen resistance, selection appears to result overwhelmingly in rapid adaptive evolution among both alleles and paralogs (BERGELSON et al. 2001). However, in more diffuse interactions involving nonhost pathogen resistance or insect herbivory, selection may not follow classical gene-for-gene or race-specific predictions (ELLIS et al. 2000; COLLINS et al. 2003). In the ATTI gene family we observed purifying selection for five of six functionally differentiated loci. These divergent TI loci may function as "fixed heterozygotes," providing an effective and ecologically flexible defense against trypsin proteinases of the diverse specialist and generalist herbivores and pathogens that attack Arabidopsis. An effective horizontal strategy for maintaining functional diversity through s