Abstract
Surveys in Drosophila have consistently found reduced levels of DNA sequence polymorphism in genomic regions experiencing low crossing-over per physical length, while these same regions exhibit normal amounts of interspecific divergence. Here we show that for 36 loci across the genomes of eight Lycopersicon species, naturally occurring DNA polymorphism (scaled by locus-specific divergence between species) is positively correlated with the density of crossing-over per physical length. Large between-species differences in the amount of DNA sequence polymorphism reflect breeding systems: selfing species show much less within-species polymorphism than outcrossing species. The strongest association of expected heterozygosity with crossing-over is found in species with intermediate levels of average nucleotide diversity. All of these observations appear to be in qualitative agreement with the hitchhiking effects caused by the fixation of advantageous mutations and/or “background selection” against deleterious mutations.
The genus Lycopersicon consists of nine species, of which the only cultivated species is L. esculentum (tomato), represented in the wild by var. cerasiforme (Rick 1983). Lycopersicon species are crossable with one another in all combinations, though with varying degrees of difficulty (Soost 1958). The karyotypes of the 12 chromosome pairs are very similar, with little or no structural difference among species (Rick 1983).
Despite this uniformity of karyotypes and the small number of species, Lycopersicon encompasses a great diversity of mating systems. L. cheesmanii, endemic to the Galapagos Islands, has an autogamous mating system, which is typical of many other endemic flowering plants of the archipelago (Rick 1966). Another species exhibiting virtually complete autogamy is L. parviflorum (Rick 1983). Self-fertilization prevails among natural populations as well as cultivated varieties of L. esculentum. In contrast, L. pimpinellifolium shows regional differences in relative levels of outcrossing vs. selfing: autogamy predominates in peripheral populations of southern Peru and Ecuador, while allogamy prevails in the central parts of the species distribution (Rick 1983). L. chmielewskii is another species with a facultative mating system; it has a limited distribution and has not been studied as extensively as other Lycopersicon species. The remaining species (L. chilense, L. hirsutum, L. pennellii, and L. peruvianum) differ from these two facultative outcrossers by the presence of a self-incompatibility system (Rick 1987). Self-incompatibility occurs in these four species to varying degrees and is probably most widespread in L. chilense, L. pennellii, and L. peruvianum (marginal populations of L. pennellii and L. peruvianum are self-compatible). The self-incompatibility system in Lycopersicon is gametophytic and controlled by a single, multiallelic S locus (Tanksley and Loaiza-Figueroa 1985).
Genetic linkage maps have been established in tomato since the beginning of classical genetics (Jones 1911; Butler 1952). Because of the low level of genetic variation among cultivars of L. esculentum, the current map was constructed using an F2 population of the interspecific cross L. esculentum × L. pennellii. It contains more than 1000 markers distributed over 1276 cM (Tanksley et al. 1992; Pillen et al. 1996; Fulton et al. 1997), and the centromeres have been localized on these maps. In addition, a quantitative cytogenetic map of the distribution of recombination nodules (RNs) is available for comparison with the linkage map (Sherman and Stack 1995). This cytogenetic map, based on spreads of chromosomal synaptonemal complexes (SCs), describes the frequency and distribution of RNs at a resolution of 0.1 μm for each of the 12 chromosomes of L. esculentum. The distribution of RNs is thought to reflect the distributions of subsequent chiasmata and crossovers.
Our major goals in this study are to investigate the relationship between crossing-over and the level of DNA polymorphism in Lycopersicon, using information from these sources, and to analyze the impact of mating system on DNA polymorphism. This work was stimulated by surveys of DNA polymorphism in natural populations of Drosophila, which consistently show that genetic variation is lower for loci in regions where crossing-over per physical length is relatively infrequent (Aguadé et al. 1989; Stephan and Langley 1989), while the same regions exhibit normal amounts of interspecific divergence (Begun and Aquadro 1991; Berry et al. 1991). Our interest in Lycopersicon was motivated by the interspecific variation in outcrossing rates and the associated differences in patterns of allozyme variation (Rick et al. 1979; Rick and Tanksley 1981), and by the clear evidence for large differences among chromosomal regions in the level of crossing-over per physical length (Sherman and Stack 1995, and references therein). We approach these goals in three steps: (1) We align the RN-cytogenetic maps and linkage maps to estimate the local density of crossing-over per physical length. (2) We reanalyze Miller and Tanksley’s (1990) RFLP data obtained from eight Lycopersicon species (L. chilense is absent) and 41 loci distributed across all 12 chromosomes. (3) We conduct a four-cutter survey of DNA sequence variation at the sucrose accumulator gene (sucr) (Chetelat et al. 1995) and the cytosolic superoxide dismutase gene, Sod-2 (Perl-Treves et al. 1990), using a sample from a L. peruvianum population; sucr is located in the centromeric region of chromosome 3, in a region of reduced crossing-over per physical length, whereas Sod-2 is on the long arm of chromosome 1, in a region of normal crossing-over (Figure 1).
MATERIALS AND METHODS
Construction of a crossing-over per physical length map: We construct a map to estimate the density of crossing-over per physical length based on the quantitative cytogenetic map for the cultivated tomato, L. esculentum (Sherman and Stack 1995), which gives the frequency of RNs in each 0.1-μm segment of the SCs of the 12 chromosomes of L. esculentum (>400 observed SCs per chromosome). We apply the “lowess procedure” (Chambers et al. 1983; weighting parameter 5%) to smooth local variation along the chromosomes, thereby emphasizing the regional characteristics of the map (for instance, extended segments of low or high recombination rates) over local fluctuations, much of which reflects the finite sampling of the original observations (Sherman and Stack 1995). In a second step, we align the updated genetic maps (Pillen et al. 1996; Fulton et al. 1997) with these smoothed RN maps in a linear fashion, such that the centromere and telomere of each chromosome arm on the cytogenetic map correspond to the ends of that arm on the genetic map. Where the genetic location of the centromere spans several adjacent intervals (Fulton et al. 1997), the centromere is assumed to lie at the midpoint of these intervals. The density of RNs per micrometer (RN/μm) at each mapped locus can then be assigned by interpolation.
RFLP data source and analysis: Thirty-six loci of the data set of Miller and Tanksley (1990) that could be localized unambiguously on the recent genetic linkage map (Pillen et al. 1996) were used in this analysis. The raw data were given as sets of restriction fragment lengths for each locus, each plant, and each restriction enzyme (Miller 1989). These RFLP data (Southern blots of digests with five six-cutter restriction enzymes) were obtained from a total of 156 plants representing nine taxa [eight species and one sample from an isolated population identified as L. peruvianum var. humifusum, LA2150; following Miller and Tanksley (1990), LA2150 is considered a separate taxon]. As noted above, the nine taxa can be partitioned into three groups based on their mating systems (Rick 1987): self-compatible and typically self-fertilizing (L. cheesmanii, esculentum, and parviflorum), self-compatible with intermediate levels of outcrossing (L. chmielewskii and pimpinellifolium), and typically self-incompatible and consequently outcrossing (L. hirsutum, pennellii, LA2150, and peruvianum).
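The two-step construction above (smoothing, then linear alignment and interpolation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: `lowess_smooth` is a single-pass local linear fit with tricube weights (Cleveland's robustness iterations, part of the full lowess procedure, are omitted), `frac=0.05` stands in for the 5% weighting parameter, and the helper names `arm_genetic_to_sc` and `interpolate` are hypothetical.

```python
import math
from bisect import bisect_left


def lowess_smooth(x, y, frac=0.05):
    """Single-pass local linear regression with tricube weights.

    Each point is refitted from its ceil(frac * n) nearest neighbours.
    Cleveland's robustness reweighting iterations are omitted (assumption).
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest neighbour of x0.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        # Degenerate neighbourhood (effectively one point): use the weighted mean.
        slope = sxy / sxx if sxx > 1e-12 * sw else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


def arm_genetic_to_sc(g, g_tel, g_cen, x_tel, x_cen):
    """Hypothetical helper: map genetic position g (cM) on one arm linearly
    onto the SC so that g_tel -> x_tel and g_cen -> x_cen (in um)."""
    return x_tel + (g - g_tel) / (g_cen - g_tel) * (x_cen - x_tel)


def interpolate(xs, ys, x0):
    """Piecewise-linear interpolation of ys (xs sorted ascending) at x0,
    clamped to the end values outside the range of xs."""
    i = bisect_left(xs, x0)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    t = (x0 - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])
```

To assign R to a locus, one would smooth the RN frequencies along an SC, convert the locus's genetic position to an SC coordinate with `arm_genetic_to_sc` (using the telomere and centromere positions of that arm on both maps), and read off the smoothed value with `interpolate`. Piecewise-linear interpolation is simply the most direct choice consistent with the linear alignment described in the text.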
We estimate genetic variation within each taxon s (expected number of pairwise differences per nucleotide site,
Recognizing that systematic differences among loci in the levels of variation exist because of differences in probe size and inherent mutation rate, we estimate interspecific divergence, dˆpl, for each locus l over three apparently independent evolutionary paths p: esculentum to pimpinellifolium, hirsutum to pennellii, and cheesmanii to peruvianum (Miller and Tanksley 1990), using the same method, i.e., on the basis of the proportion of shared fragments,
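Within-species variation and divergence are both estimated from the proportion of shared fragments. As an illustration only, the sketch below implements one standard fragment-sharing estimator, the iterative form associated with Nei and Li, in which the proportion of shared fragments F = 2m_xy/(m_x + m_y) is converted to nucleotide differences per site d through G = [F(3 − 2G)]^(1/4) and d = −(2/r) ln G, with r = 6 for six-cutter enzymes. Whether the authors used exactly this estimator is an assumption, treating fragments as a multiset of lengths (so that comigrating but nonhomologous bands count as shared) is a simplification, and the function names are hypothetical.

```python
import math
from collections import Counter


def shared_fragment_fraction(frags_x, frags_y):
    """F = 2 m_xy / (m_x + m_y), comparing fragments as multisets of lengths
    (a simplification: comigrating fragments are treated as shared)."""
    cx, cy = Counter(frags_x), Counter(frags_y)
    m_xy = sum((cx & cy).values())  # number of fragments common to both
    return 2.0 * m_xy / (len(frags_x) + len(frags_y))


def nei_li_distance(F, r=6, tol=1e-12, max_iter=1000):
    """Nucleotide differences per site from F (assumed Nei-Li form):
    iterate G = [F (3 - 2G)]^(1/4) from G = F^(1/4), then d = -(2/r) ln G."""
    if not 0.0 < F <= 1.0:
        raise ValueError("F must lie in (0, 1]")
    G = F ** 0.25
    for _ in range(max_iter):
        G_new = (F * (3.0 - 2.0 * G)) ** 0.25
        converged = abs(G_new - G) < tol
        G = G_new
        if converged:
            break
    return -2.0 / r * math.log(G)
```

For within-species variation, the same conversion would be applied to pairs of plants within a taxon and averaged; for divergence, to pairs drawn from the two taxa on each evolutionary path.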
Analysis of covariance: The model of analysis of covariance for the crossing-over per physical length and species effects on genetic variation is
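A minimal sketch of an analysis of covariance with species effects and a crossing-over covariate is shown below, assuming separate intercepts for each species s and a single slope b on R shared by all species, i.e., y_sl = a_s + b·R_l + e_sl, fitted by ordinary least squares. The response y_sl would be the rescaled average number of differences per site; weighting, the test of slope heterogeneity, and the significance tests of the published model are not reproduced, and `common_slope_ancova` is a hypothetical name.

```python
def common_slope_ancova(data):
    """Least-squares fit of y_sl = a_s + b * R_l (one intercept per species,
    one slope shared by all species).

    data: dict mapping species -> list of (R, y) pairs.
    Returns (b, {species: a_s}).
    """
    sxx = sxy = 0.0
    means = {}
    for s, pairs in data.items():
        rbar = sum(r for r, _ in pairs) / len(pairs)
        ybar = sum(y for _, y in pairs) / len(pairs)
        means[s] = (rbar, ybar)
        # Pool within-species (centred) sums of squares and cross-products.
        for r, y in pairs:
            sxx += (r - rbar) ** 2
            sxy += (r - rbar) * (y - ybar)
    b = sxy / sxx
    intercepts = {s: ybar - b * rbar for s, (rbar, ybar) in means.items()}
    return b, intercepts
```

Centering each species on its own means is what makes the slope common: differences in overall diversity between species (for example, between selfers and outcrossers) are absorbed by the intercepts a_s rather than by b.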
Experimental procedures: The determination of four-cutter restriction site variation at the sucrose accumulator locus (sucr) and the cytosolic Cu/Zn superoxide dismutase locus (Sod-2) is based on a survey of five plants from the 1995 maintenance L. peruvianum population of the Tomato Genetics Resource Center of the University of California at Davis (accession LA2744). The founding seeds of this accession were originally collected at Sobraya (Azapa), Tarapacá, Chile in 1986. The plants used in this study (kindly provided by C. M. Rick) were from the second generation of mass sib-pollination of a greenhouse population (48 individuals). The protocol for the preparation of genomic DNA is adapted from Chetelat et al. (1995). PCR primers were designed from the published L. esculentum sequences (accession numbers Z12027 and X87372 for sucr and Sod-2, respectively); they are placed in coding sequences and spaced approximately every 500 bp. Longer PCR fragments were also examined, but the interpretation of their banding patterns was ambiguous, particularly when restriction site heterozygosities occurred. The PCR reaction mixture (Long et al. 1998) included tricine buffer and Taq Extender (Stratagene, La Jolla, CA). The PCR products are cut directly in the PCR buffer with eight four-cutter enzymes (AluI, CfoI, DdeI, HaeIII, HinfI, HpaII, RsaI, ScrFI) and run on a gel made of Synergel (Research Product International Corp.). The gels are stained with ethidium bromide and photographed.
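The four-cutter survey relies on short recognition sequences in the PCR products. The sketch below scans a DNA sequence for the sites of the eight enzymes listed above and reports sites that differ among haplotypes. It assumes the haplotypes are already on a common coordinate system (no indels), and the function names are hypothetical; the recognition sequences are the standard ones for these enzymes. All eight motifs are palindromic, so scanning one strand is sufficient.

```python
import re

# Recognition sequences (5'->3', N = any base) of the eight enzymes used.
ENZYMES = {
    "AluI": "AGCT", "CfoI": "GCGC", "DdeI": "CTNAG", "HaeIII": "GGCC",
    "HinfI": "GANTC", "HpaII": "CCGG", "RsaI": "GTAC", "ScrFI": "CCNGG",
}


def site_positions(seq, motif):
    """0-based start positions of every (possibly overlapping) match of motif."""
    # A lookahead lets overlapping sites (e.g. GCGCGC) all be reported.
    pattern = "(?=" + motif.replace("N", "[ACGT]") + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def restriction_map(seq):
    """Map each enzyme name to the positions of its sites in seq."""
    return {name: site_positions(seq, motif) for name, motif in ENZYMES.items()}


def polymorphic_sites(maps):
    """(enzyme, position) sites present in some but not all haplotypes."""
    sets = [{(e, p) for e, ps in mp.items() for p in ps} for mp in maps]
    return sorted(set().union(*sets) - set.intersection(*sets))
```

In this representation a restriction site polymorphism is simply a site that is present in some haplotypes and absent in others; fixed differences between species could be found the same way by comparing maps across species.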
RESULTS
Maps of the density of RN/μm and linkage: To analyze the effects of crossing-over on DNA polymorphism in Lycopersicon, we constructed a combined physical and recombination map using a quantitative RN-cytogenetic map of L. esculentum (Sherman and Stack 1995) and the high-resolution linkage map from the cross of L. esculentum and L. pennellii (Pillen et al. 1996; Fulton et al. 1997). Figure 1 shows the estimated density of recombination nodules, R, along the SC for chromosomes 1 and 3. RNs are thought to be the earliest cytogenetic manifestations of the process yielding meiotic chromosome exchange (Carpenter 1979a,b). As has been recognized for decades (reviewed in Sherman and Stack 1995), all euchromatic chromosome arms show an extended centromere-proximal region in which the frequency of exchange per physical length is severely reduced relative to that in the distal portions. Crossing-over is suppressed in the centric heterochromatin and in the regions immediately adjacent to the telomeres. This map is also applicable to other Lycopersicon species, because little or no structural variation among their karyotypes is observed (Khush and Rick 1968) and the genetic maps are comparable among species (van Ooijen et al. 1994).
Results of analysis of covariance: Shown in Table 1 are the 36 loci from the survey of Miller and Tanksley (1990) and estimates of average numbers of differences per site within species for each locus and each of the eight species. One locus, TG12, lacked sufficient data in several species. Five additional loci in the original study of Miller and Tanksley were excluded because of paucity of observations or ambiguity in interpretation of the original observations, e.g., multiple loci per probe. Table 1 also shows the rescaling factors, rad∂l and ard∂l. Finally, Table 1 shows the estimates of
Table 2 presents the results of the analyses of covariance
Restriction site variation at sucr and Sod-2 in L. peruvianum: The positive correlation between crossing-over per physical length and DNA polymorphism is supported by an RFLP analysis of two gene regions in a survey of a L. peruvianum population. The sucr gene is located in the centromeric region of chromosome 3. Based on its position [genetic position = 55.6 (Chetelat et al. 1995)] on the genetic map (Pillen et al. 1996; Fulton et al. 1997), we estimate a rate of crossing-over of 0.00 RN/μm (Figure 1). Variation is surveyed in a region of ∼3750 bp (of the 4-kb sucr transcriptional unit). Our four-cutter method (eight enzymes) identifies a total of 64 restriction sites over this length of DNA, 3 of which are polymorphic within the L. peruvianum sample. One of the 3 polymorphic sites is a replacement polymorphism. Six fixed differences are found between L. esculentum and L. peruvianum. The relatively low number of observed restriction sites is largely due to the AT-rich composition of the introns, which make up 51.3% of the total sequence. To obtain estimates for the standard nucleotide diversity statistics π (Nei 1987) and θ (Watterson 1975), we estimate the number, L̂, of silent sites surveyed as
Figure 1. Maps of the densities of recombination nodules, R (RN/μm), from Sherman and Stack (1995). The gray lines are their original data and the black lines are the smoothed estimates. These smoothed maps are aligned with the genetic maps of each chromosome arm so that the R values for each of the surveyed loci can be interpolated. (a) The map for chromosome 1. The positions of various loci are indicated, as is the position of the kinetochore/centromere (K-C). (b) A similar map for chromosome 3. Also shown on these maps are the positions of the two loci surveyed for four-cutter restriction map variation in L. peruvianum, Sod-2 and sucr.
Table 1. Summary of our analyses of Miller and Tanksley’s (1990) RFLP data
Table 2. Analysis of covariance of the average number of differences per site within species, rescaled by the relative average divergence, rad∂l
The Sod-2 gene is located on the long arm of chromosome 1. On the basis of its position [genetic position = 45.8 (Pillen et al. 1996)] on the genetic map, we estimated a recombination rate of 0.137 RN/μm (see Figure 1). Our method allows the survey of variation in a region of roughly 3300 bp of the 3.5-kb Sod-2 transcriptional unit. Because of the high AT content of the introns, which make up 86.7% of the total of exon and intron sequences of Sod-2, we identified only 34 restriction sites, 8 of which were polymorphic within and among the surveyed L. peruvianum lines. Five fixed differences are found between L. esculentum and L. peruvianum. Assuming that all restriction site polymorphisms in exons are at synonymous positions, the estimates of nucleotide diversity are (with L̂ = 210)
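The nucleotide diversity statistics referred to above can be sketched as follows, assuming that each restriction-site difference counts as one nucleotide difference among the L̂ silent sites surveyed, so that π is the average number of pairwise site differences divided by L̂, and θ is Watterson's estimator S/(a_n L̂) with a_n = Σ_{i=1}^{n−1} 1/i. The sample size n (number of sampled chromosomes) and the 0/1 encoding of site presence are assumptions for illustration, the function names are hypothetical, and the estimator for L̂ itself is not reproduced.

```python
def watterson_theta_per_site(S, n, L):
    """Watterson's estimator per site: S / (a_n * L),
    with a_n = sum_{i=1}^{n-1} 1/i for a sample of n chromosomes."""
    a_n = sum(1.0 / i for i in range(1, n))
    return S / (a_n * L)


def pi_per_site(haplotypes, L):
    """Average number of pairwise differences per site.

    haplotypes: equal-length strings over the polymorphic restriction
    sites, '1' = site present, '0' = site absent (assumed encoding).
    """
    n = len(haplotypes)
    total = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(haplotypes[i], haplotypes[j]))
            pairs += 1
    return total / pairs / L
```

Because θ depends only on the number of segregating sites while π weights each site by its allele frequencies, comparing the two gives a rough check on whether the polymorphic sites are dominated by rare variants.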
DISCUSSION
Recombination map: By overlaying a high-resolution physical map (Sherman and Stack 1995) and an updated set of linkage data (Pillen et al. 1996; Fulton et al. 1997), we are able to construct a map for L. esculentum that gives the rate of crossing-over in units of RN/μm. This map captures many of the properties of genetic linkage in Lycopersicon that have been reported since the dawn of classical genetics (Jones 1911). However, given the experimental errors of both the physical and the linkage maps and the assumptions of our construction, it must still be considered quantitatively crude. Another potential problem is that the maps may differ between species. For instance, a comparison of an intraspecific L. peruvianum linkage map with the L. esculentum map based on L. esculentum × L. pennellii crosses revealed, on average, a 10% increase in chromosome length for the intraspecific map (van Ooijen et al. 1994). In agreement with earlier reports (Sherman and Stack 1995 and references therein), our map shows that crossing-over per physical length is suppressed over a substantial fraction of the euchromatic regions of each chromosome, in particular in the regions proximal to the centromeres and telomeres.
Species effects on levels of polymorphism: As expected from allozyme studies (Rick 1983) and from the observations by Miller and Tanksley (1990) in their original publication of these RFLP data, the selfing species show much lower average levels of variation than those with high degrees of outcrossing. Analysis of covariance reveals that there are highly significant differences in levels of variation between species (see Table 2). Figure 2a depicts the observed distribution of
Recombination and species effects on levels of variation: Both the analysis of covariance and our survey of the sucr and Sod-2 genes in L. peruvianum support the hypothesis that DNA polymorphism correlates with rates of crossing-over per physical length. Thus, this effect, which has been observed in several Drosophila species [including D. ananassae (Stephan and Langley 1989), D. melanogaster (Aguadé et al. 1989; Begun and Aquadro 1992), D. simulans (Begun and Aquadro 1991; Berry et al. 1991), D. mauritiana, and D. sechellia (Hilton et al. 1994)] and in mice (Nachman 1997), has now been confirmed in a distantly related taxon, Lycopersicon. Very recently, levels of RFLP variation were also measured in selfing and outcrossing species of Aegilops (Dvorák et al. 1998), and an association was found between allelic diversity and location in genomic regions of low crossing-over. However, that study made no attempt to correct levels of polymorphism in Aegilops for locus-specific rates of divergence or to measure variation in terms of nucleotide diversity.
Figure 2b shows the normalized distribution of
In a separate analysis and despite the lack of support for heterogeneity among species, we examined the slopes of the regression of
Population genetics theory: Perhaps the simplest explanation for a correlation between levels of crossing-over per physical length and levels of polymorphism would be that recombination itself contributes directly by increasing the input of new fragment lengths. This hypothesis would also predict a correlation of divergence with crossing-over per physical length. We examined this relationship for both measures of divergence and R (Table 1). In neither case is there any suggestion of a positive association between divergence and crossing-over.
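The check for an association between divergence and crossing-over can be illustrated with a correlation coefficient and a permutation test. The text does not state which statistic was used; the sketch below assumes a Pearson correlation between per-locus divergence and R with a one-sided permutation p-value for a positive association (a fixed seed makes it reproducible). The function names are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient (raises if either variable is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_positive(x, y, n_perm=9999, seed=1):
    """Observed r and one-sided permutation p-value for r > 0:
    the fraction of shuffles of y giving r at least as large as observed."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if pearson_r(x, y_perm) >= observed:
            hits += 1
    # Count the observed arrangement itself so that p is never zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A permutation test avoids assuming normality of the divergence estimates, which with a few dozen loci and skewed R values is a convenient, if not essential, precaution.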
Figure 2. The distribution of the estimated average numbers of differences per site within species. (a) The estimates, rescaled by rad∂l, are plotted against the species average. The columns of points are, from left to right, L. parviflorum, cheesmanii, esculentum, chmielewskii, LA2150, pimpinellifolium, pennellii, hirsutum, and peruvianum. (b) The rescaled estimates, corrected for the species average (over loci), are plotted against R, the estimated density of RN/μm. The line depicts the overall slope estimate from the analysis of covariance.
Two models have been proposed to explain the reduction of DNA sequence polymorphism in regions of low rates of crossing-over: the selective sweep model (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Stephan et al. 1992) and the background selection model (Charlesworth et al. 1993; Hudson and Kaplan 1995; Charlesworth 1996). The first model assumes the hitchhiking of neutral (or nearly neutral) variants on chromosomes bearing rare, strongly selected, favorable mutations at closely linked loci that go rapidly to fixation. The second model involves the loss of neutral or nearly neutral variants as a result of the steady elimination of linked deleterious mutations from the population. Qualitatively, both models can explain the observed positive correlation between crossing-over per physical length and DNA sequence diversity within species. The large difference (greater than twofold) in
Figure 3. The estimated values of Cs (in units of corrected nucleotide differences per site per RN/μm) from the parametric model analysis of covariance are plotted against the estimated species average (in units of corrected nucleotide differences per site). The open bars represent the selfers (in order, L. parviflorum, cheesmanii, and esculentum), the shaded bars the self-compatible species with intermediate levels of outcrossing (L. chmielewskii and pimpinellifolium), and the solid bars the species with self-incompatibility alleles (LA2150, L. pennellii, hirsutum, and peruvianum). The slope of the increase in average number of differences per site with increasing R is low for species with a low average, while the two species with intermediate averages (L. pennellii and hirsutum) show the strongest response to crossing-over per physical length. On the right is plotted the slope for the outcrossing species L. peruvianum, which has the highest overall level of variation but a shallow slope with increasing R, more typical of selfing species. A least-squares fit of Cs to a quadratic model in the species average yields a good fit (r² = 0.78, P < 0.01; both the linear and quadratic coefficients are significantly different from zero), which supports the suggested nonlinearity of the relationship.
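The quadratic fit of Cs on the species average described in the caption can be sketched as an ordinary least-squares regression solved through the normal equations. Whether the published fit was weighted is not stated, so an unweighted fit is an assumption here; the function names are hypothetical, and the sketch returns the coefficients and r² only, without the significance tests.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def quadratic_fit(x, y):
    """Unweighted least-squares fit of y = b0 + b1*x + b2*x**2.
    Returns (b0, b1, b2, r_squared)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]        # sums of x^0..x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    A = [[s[i + j] for j in range(3)] for i in range(3)]    # normal equations
    b0, b1, b2 = solve_linear(A, t)
    ybar = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi + b2 * xi ** 2)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, b2, 1.0 - ss_res / ss_tot
```

With only nine species the normal equations are tiny, so Gaussian elimination with partial pivoting is adequate; for larger or poorly scaled problems a QR-based solver would be numerically safer.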
The apparently nonlinear relationship between
Acknowledgments
We thank C. Rick for plant material and advice, R. Chetelat for DNAs and protocols, and A. Long for statistical advice. C. Aquadro, B. Charlesworth, and M. Nordborg made helpful suggestions that improved the presentation of this paper. This research was supported in part by National Science Foundation grants DEB-9407226 to W.S. and DEB-9509548 to C.H.L.
Footnotes
- Communicating editor: G. B. Golding
- Received April 2, 1998.
- Accepted July 21, 1998.
- Copyright © 1998 by the Genetics Society of America