| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Genetics, Vol. 176, 2357-2369, August 2007, Copyright © 2007
doi:10.1534/genetics.107.072231
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
1 Corresponding author: Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Ashworth Lab, King's Bldgs., W. Mains Rd., Edinburgh EH9 3JT, United Kingdom.
E-mail: deborah.charlesworth{at}ed.ac.uk
| ABSTRACT |
|---|
|
|
|---|
0.11 cM was estimated over a completely sequenced region whose physical distance is 250 kb (i.e.,
2.3 Mb/cM; TOMITA et al. 2004); this was estimated to be about one-tenth of the recombination rate of the immediately flanking region. To test whether recombination is detectable in the A. lyrata S-locus region, we have been using population genetic approaches. We previously studied DNA sequence diversity of four loci located in the genome region containing the self-incompatibility loci and closely linked to the S loci (KAWABE et al. 2006), but not involved in the incompatibility reaction. In the Icelandic population surveyed, two loci, B80 and B120, had extremely high diversity; since our tests did not detect any evidence for balancing selection at these loci, we concluded that the high diversity was due to linkage disequilibrium (LD) with the S loci (KAMAU and CHARLESWORTH 2005), as predicted for sites close to a locus under balancing selection (HUDSON and KAPLAN 1988; CHARLESWORTH et al. 1997). The boundary of the region over which the influence of balancing selection at the A. lyrata S loci extends was not defined. Two other nearby loci had lower diversity, but the difference was not statistically significant, so data from further loci linked to the S loci are needed.
Ideally, the diversity data should be analyzed in relation to the theory of how the balanced polymorphism affects linked sites at different distances away. Diversity is expected to be higher, the higher the number of alleles that are maintained (TAKAHATA and SATTA 1998; NAVARRO and BARTON 2002). Because we initially surveyed only one population, it was also possible that the high diversity observed is due to linkage disequilibrium within this population (WAKELEY and ALIACAR 2001) and might not necessarily imply very infrequent recombination with the S loci. For instance, some of our sampled individuals might recently have immigrated from a genetically different population. This can be tested using data from reference loci, together with data from further populations. The ideal sample for testing for LD is to include one individual from each of multiple different populations, to minimize LD due to recent common ancestry within populations (WAKELEY and ALIACAR 2001).
Here, we extend the survey of diversity at the B80 and B120 loci to eight more Icelandic populations (of which seven proved to have S haplotypes that allowed comparisons between populations), plus additional samples from other European populations, and we add four more loci at increasing distances from the S-locus region. If the flanking loci have high nucleotide diversity because of linkage disequilibrium with the S loci, we expect to detect associations between the flanking locus sequences and the S-locus sequences of different incompatibility alleles; i.e., we expect S haplotypes carrying a given sequence at an S locus to be characterized by defined alleles at the flanking loci across independent families and populations. Our tests give evidence for associations at the flanking loci closest to the S locus, but not at the more distant ones, since the former had much lower diversity within S haplotypes (defined by their SRK sequences), compared with between haplotypes, whereas the latter had similar diversity regardless of haplotype.
Using approximate expressions for diversity within and between S haplotypes as functions of the genetic recombination rate, in an explicit population genetics model, we show how it is possible to estimate the allele numbers and turnover times of S alleles and to use these to estimate the rates of crossing over between the S loci and the flanking loci at different probable physical distances away. This approach of using diversity values (which are directly related to LD near a site under balancing selection; CHARLESWORTH et al. 1997) should be preferable to estimating recombination using standard analyses of linkage disequilibrium; therefore we did not attempt to infer the phase of our haplotypes and quantify linkage disequilibrium. Our data suggest that recombination is unusually infrequent in the region close to the S loci, but do not suggest a very extensive region of suppressed crossing over. Despite some uncertainty in our estimates, the approach is the first simple method for estimating recombination in the S-locus region, and the results are consistent with direct estimates of recombination by genetic mapping (KAWABE et al. 2006). Genetic mapping cannot, however, exclude some recombination within the region.
| MATERIALS AND METHODS |
|---|
|
|
|---|
|
|
All sequencing was performed on an ABI 377 automatic sequencing machine using Dyenamic (Amersham Biosciences). Sequences were verified manually using Sequencher version 4.2.2 (Gene Codes, Ann Arbor, MI). At least five clones were sequenced from each plant: the base assigned at each position for each clone was that found in all or most of the sequences. The GenBank accession numbers for the new sequences obtained are as follows: locus B80, EF599769–EF599795; B120, EF599796–EF599820; S2, EF599821–EF599861; S4, EF599862–EF599882; S8, EF599883–EF599905; and S12, EF599906–EF599946.
Associations between loci:
To test for associations, we used plants in which at least one of the SRK alleles was known from previous work, and we defined haplotypes according to the SRK allele carried, determined as follows. For as many haplotypes as possible, we established the phase between the alleles at each S-linked locus and the individual's known A. lyrata S alleles, using full-sib families made by crossing plants from different Icelandic populations, in which the SRK haplotypes for at least one of the parents, and several progeny, had been established in a previous study (BECHSGAARD et al. 2004). We sequenced portions of the loci of interest in both parents of the families and at least two offspring, to distinguish the parental alleles; at least three clones per allele in all samples were sequenced to verify the sequences of the four alleles. We could then infer the linkage phase of the parental haplotypes for the set of linked loci studied, as described in HAGENBLAD et al. (2006). When only one allele was identified, the individual was treated as a homozygote in our analyses.
Polymorphism and divergence analysis:
For each of the six loci, sequences from our natural population samples (including the sequences of alleles identified from the families) were aligned using Clustal X v. 1.81 using the default conditions, and further modifications to the alignment were done manually in BioEdit. Intron–exon boundaries were determined after alignment with the cDNA sequences of the A. thaliana orthologs of all genes. Nucleotide diversity among the A. lyrata alleles and divergence from their A. thaliana orthologs were estimated with the software DNAsp v. 4.0 (ROZAS and ROZAS 1999), using NEI and GOJOBORI's (1986) method. Diversity values per synonymous or nonsynonymous site were estimated for the coding regions of the six loci, as well for the intron sites when present, using similar samples of plants (PCR amplification was unsuccessful for one or two individuals at each of the loci).
HKA tests were used to compare polymorphism within A. lyrata with divergence from the orthologous A. thaliana sequences. Multilocus tests were conducted using the HKA program distributed by J. Hey (http://lifesci.rutgers.edu/heylab). Nested HKA tests within a maximum-likelihood framework were implemented in MLHKA (WRIGHT and CHARLESWORTH 2004) for an A. lyrata data set that included the 6 flanking loci and 12 other loci studied in European populations, all on the same arm of the A. lyrata chromosome 7 as the S loci (KUITTINEN et al. 2004; KAWABE et al. 2006). We refer to these as reference loci. The general model that assumes no selection at any of the loci was compared to one in which selection was assumed to be acting at one or more loci. Loci with no variants within A. lyrata were assigned a low diversity.
Except for locus S8, for which we observed only two haplotypes, neighbor-joining trees were estimated on the basis of pairwise divergence of all sites (with pairwise deletion and Jukes–Cantor correction), using MEGA version 3 (KUMAR et al. 2004). The significance of clusters was assessed by bootstrapping (1000 permutations).
Estimating recombination for flanking genes:
Below, we derive the expected coalescence times for alleles from the same functional allelic class and alleles from different classes, respectively Twithin and Tbetween, as functions of three parameters, the number of selectively maintained alleles, n, the turnover rate of these alleles (i.e., the rate at which new functional S alleles arise), c, and the recombination rate, r. Under the infinite-sites model, and assuming no mutation rate differences, these are proportional to the expected nucleotide diversity values in samples of the sequences of the two respective kinds (HUDSON 1990). The estimates make use of
, the estimated reference locus diversity; i.e., we compare our results with the predicted relative values from the equations: Twithin/2Ne =
within/
and Tbetween/2Ne =
between/
. We used observed values of silent- or synonymous-site diversity measures from samples of these two kinds based on sequence data from the SRK locus kinase domain (CHARLESWORTH et al. 2003b), and from the other loci in the S-locus region, to estimate the quantities in the equations. The reference loci for the
-estimate were 12 loci from the same chromosome arm as the S loci, the A. lyrata chromosome 7 (KAWABE et al. 2006); these loci were surveyed for sequence diversity in natural populations of A. lyrata (A. KAWABE, A. FORREST, S. I. WRIGHT and D. CHARLESWORTH, unpublished data) and have diversity similar to that of other loci in the species (WRIGHT et al. 2003, 2006).
The SRK locus data allow us to estimate the two quantities describing the selected locus itself (the S locus in the present study), n and 2Nec, assuming that sites within the SRK kinase domain do not recombine with the selected sites in the S domain (i.e., r = 0). Estimating c requires knowing the Ne value. We estimated Ne using nucleotide diversity estimates from reference loci, since
estimates 4Neµ, where µ is the neutral mutation rate. We used two different µ-estimates based on synonymous- or silent-site divergence from A. thaliana orthologs, one value at the high end of the likely range and a more moderate one (WRIGHT et al. 2002).
Given these estimates, we then applied the equation for a neutral site recombining with a self-incompatibility locus (Equation A3 in the APPENDIX) to synonymous- or silent-site diversity data from loci at different physical distances from the S locus, to estimate the recombination rates, r (in crossovers). These estimates were then converted into values per megabase, using physical distance estimates. To estimate
within, we used S haplotypes, defined as haplotypes with the same SRK sequence, whose sequences at a flanking locus were determined from two or more plants; the mean over all such haplotypes was used in the calculations, similarly to estimating within-deme diversity for a subdivided population (for example, see HUDSON et al. 1992); we used the unweighted mean, since the sample size for most haplotypes was two (although larger numbers were studied for S1 haplotypes, as explained above).
between for each flanking locus was estimated as the mean of the nucleotide diversity values between different S haplotypes; we estimated this by subtracting the
within values either from the diversity in the whole sample or conservatively from the diversity estimated using one sequence randomly chosen from each S haplotype from the Mt. Esja population. The results were very similar, and results from the first method are used below.
These diversity estimates were obtained using DNAsp (ROZAS and ROZAS 1999). The two diversity values were used to describe subdivision into allelic classes with respect to the SRK locus, using estimates of the proportion of variability that is between classes, analogous to FST values for quantifying how much variability is between, as opposed to within, demes in a subdivided population (the FAT statistic of CHARLESWORTH et al. 1997; see also TAKAHATA and SATTA 1998). The FAT values estimate a quantity
(CHARLESWORTH et al. 1997) that is closely similar to the LD measure
2 (MCVEAN 2002). To estimate FAT values, we treated alleles from different S haplotypes, defined by their SRK alleles (see above), as "populations" and used the measure KST that takes the sequence differences of alleles into account (HUDSON et al. 1992), and we tested the significance of the "subdivision" using K* (HUDSON et al. 1992), by permutation tests in DNAsp; we refer to this below as KAT. To compare with subdivision between the populations sampled, we also estimate KST values between the populations. In both the between-population and the between-haplotype analyses, we included indel variants in the subdivision estimates.
Although the principle of our approach is based on the existence of LD, LD estimation is not required. The approach relies purely on diversity values; however, it is necessary to have reliable information about the phase of variants in the flanking loci. We did not attempt to analyze haplotypes in which phase was inferred from unphased sequences, because the frequency of heterozygotes at the SRK locus and the flanking genes (see below) is very high and there is therefore little information from which to infer the phases of variants; such inferences will thus be very unreliable.
| RESULTS |
|---|
|
|
|---|
The case in which diversity is observed at sites that recombine with the selected locus is given in the APPENDIX, and Figure 2 illustrates some results from the model. The diversity between haplotypes carrying different functional alleles at the selected site (
between) can be extremely high relative to that at reference loci not close to the selected locus, as noted by TAKAHATA and SATTA (1998). A crucial result is that
within and
between for sequences linked to a locus under balancing selection are expected to be similar unless recombination is extremely infrequent. FAT (the proportion of variability that is between classes or its value estimated as KAT) will thus be close to zero if recombination occurs, and very high values suggest low recombination.
|
![]() | (1a) |
![]() | (1b) |
Observed polymorphism at flanking loci:
The loci studied are shown in Table 1. The two loci closest to the S loci, B80 and B120, were included in a previous diversity survey of a sample from a single natural population of A. lyrata from Iceland (Mt. Esja, also referred to as 99R, KAMAU and CHARLESWORTH 2005), and the sequences obtained previously were included in the present analyses. Diversity has not previously been studied at the four other loci, S2, S4, S8, and S12, which are more distant from the S loci. The physical distances in A. thaliana from the ortholog of the SRK locus are shown in Table 1, and in that species the pollen incompatibility gene (SCR) is close to SRK. The corresponding physical distances in A. lyrata are known only for two haplotypes (Sa and Sb of KUSABA et al. 2001), respectively equivalent to S13 and S20 of SCHIERUP et al. (2001), neither of which has been found in European populations (CHARLESWORTH et al. 2003a; MABLE et al. 2003; BECHSGAARD et al. 2004). Since these two well-studied haplotypes differ considerably in gene arrangement and intergene distances, the distances in other haplotypes, including those studied here, may also differ. However, the flanking gene orders are the same in haplotypes Sa and Sb and in A. thaliana (KUSABA et al. 2001). The higher total DNA content of A. lyrata (BENNETT et al. 2003) suggests that using A. thaliana distances should be conservative in evaluating the extent of recombination suppression in the S-locus region.
Pooling the sequences from all populations, regardless of their S haplotypes, nucleotide diversity (
) is highest for the two closest S-flanking genes, B80 and B120, and lower for the distant flanking loci (Table 2). B80 and B120 were heterozygous in all individuals. No plants were heterozygous at the S8 locus, which has very low diversity; we observed only two haplotypes at this locus. Haplotype numbers were higher for S2, S4, and S12, and heterozygotes were common (Table 2).
|
The diversity differences do not result from differences in the mutation rates of the loci, since silent-site divergence between A. lyrata and A. thaliana ranges from 0.09 to 0.17, within the normal range for these two species. Raw silent-site divergence varies, but few genes sequenced in both species have Ks > 0.2 or < 0.05 (WRIGHT et al. 2002; BARRIER et al. 2003; RAMOS-ONSINS et al. 2004); the mean Ks and Ka for these species are 0.119 ± 0.004 and 0.025 ± 0.002, respectively, based on 304 ESTs from A. lyrata and their A. thaliana putative orthologs (BARRIER et al. 2003). All six S-flanking loci had low Ka/Ks values (Table 2), indicating that they are functional genes in both species.
Associations between S alleles and loci flanking the S-locus region:
Using full-sib families to establish the phase of the haplotypes (see MATERIALS AND METHODS), we identified haplotypes from different populations carrying a number of different SRK alleles. To denote the sequences at flanking loci, we use a notation that gives the locus in question, with a subscript giving the SRK allele (e.g., B80S1). Of the B80 sequences from known S haplotypes, five sets have identical sequences, although they originated from different populations (S12, S14, S22, and S25); S6, S9, and S18 alleles were each studied from only one population, and the S18 alleles (all from population 5) were identical in sequence. However, although most of the B80S1 alleles cluster together, one is very different, and the B80S15 and B80S27 sequences were also variable (not shown). These results are similar to those previously found for B80 (KAMAU and CHARLESWORTH 2005) and for another nearby locus, Aly8 (HAGENBLAD et al. 2006).
For the B120 locus, diversity is again high (Table 2), but associations with SRK are less clear than for B80, and B120 alleles from several S haplotypes are found in disparate parts of the tree, as was previously found within the Mt. Esja population (KAMAU and CHARLESWORTH 2005). Again, S6 and S9 haplotypes were studied only from the Mt. Esja population, and again the B120S6 and B120S9 sequences included a few variants. The B120S1 alleles are found in three groups, with very different sequences, and B120S15, B120S16, B120S25, and B120S27 alleles are also scattered across the tree, suggesting some recombination between the S locus and the B120 locus. Some of these results could arise if a plant is heterozygous for two haplotypes, but only one SRK allele was detected, which is not the one from which our B120 allele was sequenced, so that a haplotype is then misclassified; thus our results are likely to underestimate associations with SRK alleles. Nevertheless, as is seen below, there is clear evidence of associations.
A. lyrata S alleles can be divided into four dominance levels, and alleles within the same dominance class are most similar in sequence (PRIGODA et al. 2005). A flanking locus that is in linkage disequilibrium with SRK will have the same evolutionary history as that of the S alleles and might thus be associated with the dominance classes. Our sequences of alleles at the B80 and B120 loci do not, however, show clustering that is congruent with these allele classes. Finally, consistent with their low diversity, the gene trees for the four distant loci show no evident associations with the S haplotypes or with S-allele dominance classes.
Diversity within and between different S haplotypes or natural populations:
To quantify the proportion of variability that is between S-allele classes, we estimated nucleotide diversity within and between the haplotype classes defined by their SRK sequences and used these to calculate the measure KAT explained above, which is analogous to KST for quantifying how much variability is between, as opposed to within, demes in a subdivided population.
The results for silent sites are shown in Table 3 and Figure 3. Consistent with its high diversity (KAMAU and CHARLESWORTH 2005), the B80 gene has a high FAT value (estimated as KAT, see MATERIALS AND METHODS), which is highly significant using the K* test (P < 0.0001), suggesting strong isolation between the B80 sequences in haplotypes with different SRK alleles and high similarity among the sequences when the SRK allele is the same. The B120 sequences also show significant associations with SRK alleles (P < 0.0001), as does Aly8, the A. lyrata ortholog of the A. thaliana ARK3 gene, which is also in the region flanking the S loci (HAGENBLAD et al. 2006). For the close flanking loci, the high diversity is thus evidently due to high proportions of sites differing between S haplotypes. For the four more distant flanking loci, samples of different compositions with respect to the S haplotype yield similar diversity, and KAT values are low (Table 3) and not significantly different from zero (P = 0.21 for S2, 0.51 for S4, and 0.9 for S12).
|
|
Estimates of SRK allele numbers and turnover rates and recombination rates with flanking loci:
We used the equations in the APPENDIX to analyze jointly diversity results from SRK and from flanking loci, including the four new genes whose sequence diversity was described above and also Aly8. We first estimated the number of functional classes of alleles, n, on the basis of polymorphisms in the kinase domain of the SRK gene. This locus has very high diversity, which cannot be estimated accurately, but silent-site diversity is at least 0.6, whereas within allelic classes it is much lower (estimated
within = 0.00052, on the basis of the data reported in CHARLESWORTH et al. 2003a). The n estimates vary, depending on the set of reference loci used, since the diversity estimates differ significantly between different A. lyrata chromosomes (A. KAWABE, A. FORREST, S. I. WRIGHT and D. CHARLESWORTH, unpublished data). The lowest estimated n value is 8, on the basis of a silent-site diversity of 0.0097 for 12 loci from the chromosome arm AL7 on which the S loci are located (these loci have unusually low diversity), and the highest is 39, on the basis of the mean synonymous-site diversity value of 0.027 from AL7 plus 24 AL1 loci. The higher the number of alleles maintained at the selected locus, the less recombination is required to maintain diversity at linked loci (see Equation 3). Since we wish to test whether recombination may be restricted in the S-locus region, it is conservative to assume large n. We therefore used values of 25 and 50 in our calculations.
For the turnover rate of S alleles, c, we require an estimate of 4Ne, and we estimated this using either of two mutation rates (see MATERIALS AND METHODS) and using either silent or synonymous sites; there are thus four c estimates for each value of the reference locus diversity. All values are similar, and all are very low, so we do not show the values here. The highest value, 1.14 x 10–6, assumes the low mutation rate and the high reference locus diversity above. We discuss below the inaccuracy caused by applying a model of a gametophytic system to a species with a sporophytic incompatibility system.
Given these estimates, we then applied the equation for a site recombining with a self-incompatibility locus (Equation A3) to synonymous- or silent-site diversity data from loci at different physical distances from the S locus, to estimate the recombination rates. The highest estimates (based on the high mutation rate) are shown in Table 3. The estimates are low for the three loci close to SRK. For the more distant loci, the values of
within are similar to the diversity from this set of sequences estimated without taking account of the haplotype's SRK sequence (
total); thus
between is either small or negative, suggesting that recombination is frequent enough at this distance to break down linkage disequilibrium. For two loci,
within >
between, while the S8 gene has only a single (nonsynonymous) SNP variant. The results therefore suggest sufficient recombination that the balancing selection at the S loci has not led to reduced diversity within allelic classes; thus the recombination rate cannot be estimated.
| DISCUSSION |
|---|
|
|
|---|
Neither the allele number estimates nor the estimates of recombination rates are highly accurate. The equations we have used are expected to be accurate only for gametophytic self-incompatibility and do not take into account the differences in allele frequencies that occur with sporophytic self-incompatibility (SCHIERUP et al. 1998; UYENOYAMA 2000). A minor inaccuracy arises because the model ignores the effects of allelic turnover, which involves a selective sweep when a new S allele arises and spreads in a population, so that the diversity within this haplotype will initially be low. This was modeled by TAKAHATA and SATTA (1998) and TAKEBAYASHI et al. (2004), but the effect is generally small, particularly when the turnover rate is low, as seems to be the case for the SRK alleles (see above).
More serious difficulties come from unequal S-allele frequencies. Using the n-island model of migration to model the n-allele case of self-incompatibility loci is appropriate only if allelic classes are interchangeable. This should be correct for alleles in a gametophytic incompatibility system, unless there is some "extra effect" of selection acting on different alleles (LAWRENCE and FRANKLIN-TONG 1994), but it is not true for sporophytic incompatibility, because dominance means that certain allele classes are expected to be consistently more frequent than others (reviewed in SCHIERUP 1998; UYENOYAMA 2000). A. lyrata populations have large numbers of alleles, in at least four dominance classes (MABLE et al. 2004; SCHIERUP et al. 2006), and the predicted higher frequencies are indeed estimated for the more recessive alleles (SCHIERUP et al. 2006). We also cannot assume that no extra effect of selection acts on A. lyrata alleles, since segregation anomalies have been observed for these alleles in plants from our study populations (BECHSGAARD et al. 2004).
With unequal allele frequencies, the probability that a neutral variant recombines onto a given allelic type from a different one, r', is affected, since, rather than being exclusively homozygotes (at a frequency determined strictly by the number of alleles), or else heterozygous for two different S-allele classes, plants can carry two members of the same high-frequency allelic class, and between-class recombination cannot then occur. For our species' self-incompatibility system, a single n value is therefore incorrect for the equations in the APPENDIX that take account of the chance that an allele is present heterozygous with one of a different allelic class with which it could recombine (see Equation A3 and the expression for r'). With a large number of alleles, however, most individuals will be heterozygous, and so the effective recombination rate between different alleles in our equations is close to the true recombination rate, r. If there are many alleles, r' is approximately equal to the product of r and the sum of the squared allele frequencies, which is greater than r/n, implying that the true value of r is less than nr' in Equation A6, so that our method overestimates r. As we are testing the possibility of low recombination in the S-locus region, this is conservative, and we can thus use this approach to obtain rough parameter estimates for A. lyrata, which has large numbers of alleles, in at least four dominance classes (MABLE et al. 2004; SCHIERUP et al. 2006).
Even for gametophytic incompatibility systems, and even when there is no recombination, the equation for Twithin is only approximate (VEKEMANS and SLATKIN 1994), because the effective size of allelic classes in a finite population is affected by fluctuations in the number of copies of an allele over the generations, and it is not the arithmetic mean, but the harmonic mean number that determines this effective size (VEKEMANS and SLATKIN 1994).
Although our estimates are therefore rough, they clearly indicate a very low turnover rate. The effect of fluctuations in an allele's number of copies is to lower Twithin, which will lead to overestimation of the number of alleles. An effect in the opposite direction, leading to an overestimate of n by our approach, arises due to the differences in allele longevity expected in sporophytic incompatibility systems. There is thus no single coalescence time, and no single turnover rate, for all alleles. Diversity may evolve within old S alleles, due to mutation as well as to recombination between different S haplotypes, and such variants lead to an inference that recombination is occurring, but in reality this reflects the age of these particular alleles, not the recombination rate.
The turnover rate and allele number estimates assume no recombination, and both yield plausible values. The per locus mutation rate to new functional S alleles must be lower than the turnover rate. Given that two loci are involved in self-incompatibility, and that suitable changes must occur in both to generate a new functional S allele (SCHOPFER et al. 1999; CHARLESWORTH 2000; TAKAYAMA et al. 2000; KUSABA et al. 2001; CHOOKAJORN et al. 2003; CHARLESWORTH et al. 2005), it has long been realized that this mutation rate should be very low; thus the very low value estimated here seems plausible.
The highest estimated recombination rate in Table 3 corresponds to a value of almost 18 Mb/cM, a very large physical distance per map unit. The average physical distance per centimorgan is
1 Mb in maize (DOONER 1996), barley (KÜNZEL et al. 2000), Medicago truncatula (CHOI et al. 2004), and Allium (KHRUSTALEVA et al. 2005). In A. lyrata AL7, the estimated value is 205 kb/cM (KAWABE et al. 2006). KING et al. (2002) review the wide range of values estimated for different regions of the same species' genome for the few plants where data are available. In chromosome arms (i.e., not including centromeric regions, which have much lower rates of crossing over), the highest values are
550 kb/cM in A. thaliana chromosome IV (DROUAUD et al. 2005),
1 Mb in rice (ZHANG and WING 1997), and 22 Mb in wheat (GILL et al. 1996), and in poplar, with an average of
200 kb/cM, a region with 25 times less recombination was found near a rust resistance gene (STIRLING et al. 2001).
The main factor causing our r estimates to be misleading will be an incorrect n estimate. Our estimates of S-allele numbers based on silent or on synonymous sites are lower than the numbers of SRK alleles directly observed (and sequenced) in A. lyrata (CHARLESWORTH et al. 2003a; BECHSGAARD et al. 2004; MABLE et al. 2004). Even assuming n = 50, however, only doubles the estimated recombination rates in Table 3. Our estimates are also probably conservative because we used physical distances for A. thaliana, whose genome is smaller than that of A. lyrata. Unless the A. lyrata physical distances are considerably smaller than those in A. thaliana, our results suggest that there is a region of suppressed crossing over, which may not extend as far as the distant flanking genes we have studied, since none of these four loci has high diversity and two of them yielded similar estimated diversity within S haplotypes and between different haplotypes. Since diversity estimates have high variance, and our sample of haplotypes whose phase could be established is small, the extent of the region remains uncertain. Balancing selection at the S loci may be affecting the polymorphism of a large set of loci in this region. Although the apparently nonrecombining region is thus probably small, the homologous region in A. thaliana (i.e., between the same loci that delimit the mapped region in A. lyrata) contains >200 genes.
Other systems with long-term balancing selection:
The approach used here should be applicable to other systems with long-term balancing selection, including MHC and the honeybee sex-determining system (HASSELMANN and BEYE 2004; CHO et al. 2006), and it should be possible to estimate the numbers of functionally different alleles. A difficulty, however, is that the sequences that have been determined for these systems are not assigned to different functional classes of alleles. Such assignment is feasible for the honeybee sex-determining system, though it is laborious, but for MHC alleles it is very difficult, because the functions of these loci are unknown, so that alleles are classified and named according to sequence similarity; thus one cannot estimate diversity within and between classes, other than dividing the sequences arbitrarily. For instance, a study of HLA-DPB1 sequences in human populations (BERGSTRÖM et al. 1998) found much lower nucleotide diversity within 13 such "lineages" than between them. The within-lineage diversity estimates range from 0.0007 for introns 1 and 2 and 0.0006 for nonantigen recognition sites in exon 2 (both
100 times less than the values between the lineages) to 0.087 for sites encoding the amino acids in exon 2 involved in antigen recognition (more than half the between-lineage estimate of 0.139). The exon 2 antigen recognition sites are thought to be directly involved in the protein's function, and there is evidence that they are under balancing selection (HUGHES et al. 1990), so these are sites in the gene corresponding to r = 0, or close to zero, and they should thus have the highest ratio of diversity between vs. within lineages. This suggests that the lineages may not correspond to functionally distinct alleles. Another possibility is that different allelic types can recombine without losing their functional distinctiveness, but this cannot account for the low ratio for the nonantigen recognition sites (non-ARS) in exon 2. Although much more data are needed for this kind of estimate, we used the equation appropriate for MHC (see APPENDIX); the ARS yield a surprisingly high estimate of 97 functional allelic classes, while the non-ARS in exon 2 yield a value of 8, and an implausibly high estimated r value of 19.6 cM/Mb.
The "subdivision theory" developed here and by TAKAHATA and SATTA (1998) should nevertheless help toward understanding the high diversity in MHC regions in a quantitative manner, by making it possible to use recombination rate data to ask whether high diversity in any given region can be explained by linkage to regions under balancing selection or whether it requires selection acting within the region (GRIMSLEY et al. 1998). It has been suggested that the higher than usual polymorphism of loci in the region of the genome surrounding the MHC loci is due to "hitchhiking" by the selected loci (SHIINA et al. 2006). A better characterization of the situation would be in terms of the subdivision due to balancing selection at loci in the region, since the term hitchhiking refers to a situation in which allele frequencies are being altered by directional selection. If there are enough functionally different alleles (high n in the model), and low enough recombination, this may even be capable of accounting for the functional variants in the 5' cis-regulatory region of the MHC-DQA1 gene,
4 kb from the gene's coding region (LOISEL et al. 2006).
Conclusions:
Since diversity data have high variances, accurate estimates using this approach will require large samples of alleles of many functional classes. Such samples are not easy to obtain for self-incompatibility, even though functional classes can, in principle, be determined, and our recombination rate estimates are clearly rough (and were impossible for some of the S-flanking loci, due to higher diversity estimates within haplotypes with the same SRK alleles than between them). They will be even more difficult to obtain for MHC loci. In both cases, however, the theory shows clearly that sets of very similar sequences may represent alleles of the same functional class, and this may help determine the number of such classes, given estimates of the recombination rate, even without knowing the actual nature of the function.
| APPENDIX: DERIVATION OF EXPRESSIONS FOR NUCLEOTIDE DIVERSITIES AT SITES CLOSELY LINKED TO A LOCUS UNDER BALANCING SELECTION |
|---|
|
|
|---|
Using the approach of NAGYLAKI (1998), Equation 6, the fundamental expression for Twithin is
![]() | (A1) |
![]() | (A2) |
In addition, n is the number of alleles at the selected locus, and c is the "turnover rate," i.e., the probability that an allele in a given generation originated from a different allele. The first two terms in the fundamental equation for Twithin describe coalescences in the preceding generation within a new allele and within existing alleles, respectively. The remaining terms represent cases when no coalescence has occurred in the previous generation, so that there is a delay of one generation in the time to coalescence. The third term describes the case when turnover has not occurred, nor has there been recombination. The final term takes account of recombination with other allelic classes.
In the terms on the right-hand side of Equation A1, r is modified according to the number of different functional types of alleles with which recombination of a given allele type can occur, as indicated by a tilde. Assuming equal frequencies of all functional types of allele, if the form of selection is such that homozygous genotypes can occur, as in MHC systems, the frequency with which a given functional type of allele encounters an allele of a different class (i.e., is heterozygous for two functional classes or types of allele) is (n – 1)/n. The chance that, in such a heterozygote, a neutral variant switches to a given allele by recombination from another allele is r. Thus the per generation probability of such a switching event is
![]() | (A3) |
For determining Tbetween, we have
![]() | (A4) |
These expressions simplify to the following, using the same notation as defined above:
![]() | (A5) |
![]() | (A6) |
Finally, for the case of a site recombining with a gametophytic self-incompatibility locus, we have, from Equation A2,
![]() | (A7) |
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
| LITERATURE CITED |
|---|
|
|
|---|
BARRIER, M., C. D. BUSTAMANTE, J. YU and M. D. PURUGGANAN, 2003 Selection on rapidly evolving proteins in the Arabidopsis genome. Genetics 163: 723–733.
BECHSGAARD, J., 2002 Population genetic dynamics of homomorphic self-incompatibility systems: evidence of different selection pressures on the different self-incompatibility alleles above that of frequency-dependent selection. Ph.D. Thesis, University of Aarhus, Aarhus, Denmark.
BECHSGAARD, J., T. BATAILLON and M. H. SCHIERUP, 2004 Uneven segregation of sporophytic self-incompatibility alleles in Arabidopsis lyrata. J. Evol. Biol. 17: 554–561.[CrossRef][Medline]
BENNETT, M. D., I. J. LEITCH, H. J. PRICE and J. S. JOHNSTON, 2003 Comparisons with Caenorhabditis (
100 Mb) and Drosophila (
175 Mb) using flow cytometry show genome size in Arabidopsis to be
157 Mb and thus similar to 25% larger than the Arabidopsis genome initiative estimate of
125 Mb. Ann. Bot. 91: 547–557.
BERGSTRÖM, T. F., A. JOSEFSSON, H. ERLICH and U. GYLLENSTEN, 1998 Recent origin of HLA-DPB1 alleles and implications for human evolution. Nat. Genet. 18: 237–242.[CrossRef][Medline]
CASSELMAN, A. L., J. VREBALOV, J. A. CONNER, A. SINGHAL, J. GIOVANNI et al., 2000 Determining the physical limits of the Brassica S-locus by recombinational analysis. Plant Cell 12: 23–24.
CHARLESWORTH, B., M. NORDBORG and D. CHARLESWORTH, 1997 The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided inbreeding and outcrossing populations. Genet. Res. 70: 155–174.[CrossRef][Medline]
CHARLESWORTH, B., D. CHARLESWORTH and N. H. BARTON, 2003 The effects of genetic and geographic structure on neutral variation. Annu. Rev. Ecol. Evol. Syst. 34: 99–125.[CrossRef]
CHARLESWORTH, D., 2000 How can two-gene models of self-incompatibility generate new specificities? A comment on: production of an S RNase with dual specificity suggests a novel hypothesis for the generation of new S alleles. Plant Cell 12: 309–310.
CHARLESWORTH, D., C. BARTOLOMÉ, M. H. SCHIERUP and B. K. MABLE, 2003a Haplotype structure of the stigmatic self-incompatibility gene in natural populations of Arabidopsis lyrata. Mol. Biol. Evol. 20: 1741–1753.
CHARLESWORTH, D., B. K. MABLE, M. H. SCHIERUP, C. BARTOLOMÉ and P. AWADALLA, 2003b Diversity and linkage of genes in the self-incompatibility gene family in Arabidopsis lyrata. Genetics 164: 1519–1535.
CHARLESWORTH, D., X. VEKEMANS, V. CASTRIC and S. GLÉMIN, 2005 Plant self-incompatibility systems: a molecular evolutionary perspective. New Phytol. 168: 61–69.[CrossRef][Medline]
CHO, S., Z. Y. HUANG, D. R. GREEN, D. R. SMITH and J. ZHANG, 2006 Evolution of the complementary sex-determination gene of honey bees: balancing selection and trans-species polymorphisms. Genome Res. 16: 1366–1375.
CHOI, H.-K., D. KIM, T. UHM, E. LIMPENS, H. LIM et al., 2004 A sequence-based genetic map of Medicago truncatula and comparison of marker colinearity with M. sativa. Genetics 166: 1463–1502.
CHOOKAJORN, T., A. KACHROO, D. R. RIPOLL, A. G. CLARK and J. B. NASRALLAH, 2003 Specificity determinants and diversification of the Brassica self-incompatibility pollen ligand. Proc. Natl. Acad. Sci. USA 101: 911–917.[CrossRef][Medline]
DOONER, H. K., 1996 Genetic fine structure of the bronze locus in maize. Genetics 113: 1021–1036.
DROUAUD, J., C. CAMILLERI, P.-Y. BOURGUIGNON, A. CANAGUIER, A. BÉRARD et al., 2005 Variation in crossing-over rates across chromosome 4 of Arabidopsis thaliana reveals the presence of meiotic recombination "hot spots". Genome Res. 16: 106–114.[CrossRef][Medline]
GILL, K. S., B. S. GILL, T. R. ENDO and E. V. BOYKO, 1996 Identification and high-density mapping of gene-rich regions in chromosome 5 of wheat. Genetics 143: 1001–1012.[Abstract]
GRIMSLEY, C., K. A. MATHER and C. OBER, 1998 HLA-H: a pseudogene with increased variation due to balancing selection at neighboring loci. Mol. Biol. Evol. 15: 1581–1588.[Abstract]
HAGENBLAD, J., J. BECHSGAARD and D. CHARLESWORTH, 2006 Linkage disequilibrium between incompatibility locus region genes in the plant Arabidopsis lyrata. Genetics 173: 1057–1073.
HASSELMANN, M., and M. BEYE, 2004 Signatures of selection among sex-determining alleles of the honey bee. Proc. Natl. Acad. Sci. USA 101: 4888–4893.
HUDSON, R. R., 1990 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7: 1–45.
HUDSON, R. R., and N. L. KAPLAN, 1988 The coalescent process in models with selection and recombination. Genetics 120: 831–840.
HUDSON, R. R., D. D. BOOS and N. L. KAPLAN, 1992 A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9: 138–151.[Abstract]
HUGHES, A. L., T. OTA and M. NEI, 1990 Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol. Biol. Evol. 76: 515–524.
KAMAU, E., and D. CHARLESWORTH, 2005 Balancing selection and low recombination affects diversity near the self-incompatibility loci of the plant Arabidopsis lyrata. Curr. Biol. 15: 1773–1778.[CrossRef][Medline]
KAWABE, A., B. HANSSON, A. FORREST, J. HAGENBLAD and D. CHARLESWORTH, 2006 Comparative gene mapping in Arabidopsis lyrata chromosomes 6 and 7 and A. thaliana chromosome IV: evolutionary history, rearrangements and local recombination rates. Genet. Res. 88: 45–56.[CrossRef][Medline]
KHRUSTALEVA, L. I., P. E. DEMELO, A. W. VANHEUSDEN and C. KIK, 2005 The integration of recombination and physical maps in a large-genome monocot using haploid genome analysis in a trihybrid Allium population. Genetics 169: 1673–1685.
KING, J., I. P. ARMSTEAD, I. S. DONNISON, H. M. THOMAS, R. N. JONES et al., 2002 Physical and genetic mapping in the grasses Lolium perenne and Festuca pratensis. Genetics 161: 315–324.
KUITTINEN, H., A. A. DE HAAN, C. VOGL, S. OIKARINEN, J. LEPPÄLÄ et al., 2004 Comparing the linkage maps of the close relatives Arabidopsis lyrata and Arabidopsis thaliana. Genetics 168: 1575–1584.
KUMAR, S., K. TAMURA and M. NEI, 2004 MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief. Bioinform. 5: 150–163.
KÜNZEL, G., L. KORZUN and A. MEISTER, 2000 Cytologically integrated physical restriction fragment length polymorphism maps for the barley genome based on translocation breakpoints. Genetics 154: 397–412.
KUSABA, M., K. DWYER, J. HENDERSHOT, J. VREBALOV, J. B. NASRALLAH et al., 2001 Self-incompatibility in the genus Arabidopsis: characterization of the S locus in the outcrossing A. lyrata and its autogamous relative, A. thaliana. Plant Cell 13: 627–643.
LAWRENCE, M. J., and V. E. FRANKLIN-TONG, 1994 The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. IX. Evidence of an extra effect of selection acting on the S-locus. Heredity 72: 353–364.
LOISEL, D. A., M.