The common ancestor of the self-fertilizing nematodes Caenorhabditis elegans and C. briggsae must have reproduced by obligate outcrossing, like most species in this genus. However, we have only a limited understanding about how genetic variation is patterned in such male–female (gonochoristic) Caenorhabditis species. Here, we report results from surveying nucleotide variation of six nuclear loci in a broad geographic sample of wild isolates of the gonochoristic C. remanei. We find high levels of diversity in this species, with silent-site diversity averaging 4.7%, implying an effective population size close to 1 million. Additionally, the pattern of polymorphisms reveals little evidence for population structure or deviation from neutral expectations, suggesting that the sampled C. remanei populations approximate panmixis and demographic equilibrium. Combined with the observation that linkage disequilibrium between pairs of polymorphic sites decays rapidly with distance, this suggests that C. remanei will provide an excellent system for identifying the genetic targets of natural selection from deviant patterns of polymorphism and linkage disequilibrium. The patterns revealed in this obligately outcrossing species may provide a useful model of the evolutionary circumstances in C. elegans' gonochoristic progenitor. This will be especially important if self-fertilization evolved recently in C. elegans history, because most of the evolutionary time separating C. elegans from its known relatives would have occurred in a state of obligate outcrossing.
The combined actions of the population processes of mutation, drift, recombination, demography, and natural selection mold the genomic landscape of genetic variation, leaving signatures that can be detected from patterns of polymorphism in natural populations. To understand the relative importance of these forces in shaping genome evolution, it is crucial to characterize natural genetic variation within and between species. The genus Caenorhabditis, a group of bactivorous nematodes, is rapidly being developed as a model for population genetics, motivated by the genetic model Caenorhabditis elegans and by genome sequencing efforts in related members of this group (Stein et al. 2003; Félix 2004). Here, we explore how evolutionary forces shape patterns of nucleotide diversity in natural samples of the obligately outbreeding species C. remanei.
Most nematode species, including C. remanei, have a gonochoristic (male–female) breeding system. However, establishment of the androdioecious (male–hermaphrodite) C. elegans as a biological model has resulted in a research emphasis on the two self-fertile species in this genus (C. elegans and C. briggsae), which are thought to have evolved independently from gonochoristic ancestors (Kiontke et al. 2004). Among known species of Caenorhabditis, C. remanei is related most closely to the androdioecious C. briggsae and gonochoristic C. sp. 5 (JU727) (Braendle and Félix 2006; Kiontke and Sudhaus 2006), sharing a common ancestor many millions of years ago (Coghlan and Wolfe 2002; Stein et al. 2003; Félix 2004; Cutter et al. 2006). C. elegans, as well as the gonochoristic C. sp. 4 (CB5161) and C. japonica, are more distant relatives of C. remanei, although all have very similar morphology (Fitch and Emmons 1995; Kiontke et al. 2004). Despite the morphological parallels among species, they differ greatly at the sequence level (Stein et al. 2003; Cho et al. 2004) and no fertile interspecific hybrids between pairwise crosses of C. elegans, C. briggsae, and C. remanei have been observed (Baird et al. 1992; Baird and Yen 2000; Hill and L'Hernault 2001; Baird 2002).
As Caenorhabditis develops further as a model genus for studying comparative evolution, it becomes more urgent to acquire a robust understanding of the gonochoristic species that share common ancestors with the progenitors of these self-fertile species. The consequences of an outcrossing ancestry will be particularly important if hermaphroditism arose recently in the history of C. elegans and/or C. briggsae (Cutter et al. 2006), because most of the evolutionary time separating them from their relatives would have occurred in a gonochoristic state. For example, indications of sexual selection, sex differences in longevity, and selection for codon usage bias in a recent gonochoristic ancestor may persist in selfing descendant species, provided that the origin of selfing is sufficiently recent (McCulloch and Gems 2003; Cutter and Ward 2005; Cutter 2006). Caenorhabditis sex determination genes, which might have been involved in breeding system changes in these species, appear to evolve quickly (Clifford et al. 2000; Haag et al. 2002; Stothard et al. 2002; Nayak et al. 2005). Furthermore, female C. remanei continue to strongly attract the males of other species (Chasnov and Chow 2002) and heterospecific major sperm protein can trigger ovulation (Hill and L'Hernault 2001; Miller et al. 2001). However, sufficient time has elapsed in the selfing species to have led to reduced attractiveness of hermaphrodites to males, reduced mating vigor of males, and smaller male sperm relative to gonochoristic species (LaMunyon and Ward 1999; Hill and L'Hernault 2001; Chasnov and Chow 2002), either through relaxed selection or through selection disfavoring these traits.
C. remanei has been isolated from localities in Europe, North America, and Asia and is found frequently in phoretic associations with invertebrates (principally in the “enduring” dauer larval stage; Baird 1999; Kiontke and Sudhaus 2006; M.-A. Félix, personal communication). Previous work in C. remanei has demonstrated that specieswide levels of diversity are substantially greater than those in its self-fertile relatives (Graustein et al. 2002; Jovelin et al. 2003; Haag and Ackerman 2005). However, it is not yet clear how different regions of the genome vary in their levels of nucleotide diversity, how the genetic variation is distributed across the species' geographic range, or the distance to which linkage disequilibrium extends between pairs of polymorphic sites in the genome. The answers to these questions depend on the extent of demographic processes affecting the whole genome, such as population subdivision and growth, as well as the locus-specific action of natural selection and its interaction with recombination rates and variation in mutation rate among chromosomal regions.
In this study, we evaluate these issues in a sampling of C. remanei by surveying nucleotide diversity at six nuclear loci. The pattern of polymorphisms reveals little evidence for population structure or deviation from neutral expectations, indicating that C. remanei has a large effective population size and that these samples may approximate panmixis and demographic equilibrium. In combination with the rapid decay of linkage disequilibrium with distance, this suggests that screens for deviations from neutral patterns of polymorphism may prove particularly successful in identifying targets of natural selection in this species.
MATERIALS AND METHODS
We studied 34 isolates of C. remanei, including 6 obtained from the Caenorhabditis Genetics Center and 1 (JU825) gifted by M.-A. Félix (Table 1). Several of the strains had been inbred for numerous generations (SB146, PB4641, VT733, CR1014, CR1415, and CR2124), whereas the others are each simple population propagations from the progeny of a single isolated female individual. Most of the strains derive from collections of isopod wood lice under logs in the Wright State University woods from 1998 and 2000. C. remanei in Ohio were found in association with the wood lice species Trachelipus rathkii, Porcellio scaber, Armadillidium nasatum, Oxychilus alliarius, Oniscus asellus, and Cylisticus convexus (Baird 1999). Three individuals were isolated from the same firefly larva from the Achilles Hill site.
A NaOH digestion protocol (Floyd et al. 2002) was used to isolate DNA from single male individuals for each of the 34 strains of C. remanei. We selected for resequencing in these strains the putative orthologs of the six genes from C. elegans chromosomes II and X that were assessed for polymorphism in Cutter (2006) (Table 2). However, the C. remanei regions sequenced differ from those analyzed in C. elegans (Cutter 2006). Specifically, because of the high diversity in C. remanei, in the present study we primarily sequenced exonic DNA, rather than long introns. These fragments cover roughly a quarter of the coding sequence of the corresponding genes. We use an arbitrary internal naming scheme to refer to these gene fragments (e.g., “p15”). Some individuals were excluded from the analysis of a given locus, due either to failure to amplify or to heterozygosity in sequence traces; only the subset of 27 individuals with complete data at all loci were used for interlocus analyses (Table 1). Both strands were sequenced directly from PCR products, following cleanup with exonuclease I and shrimp alkaline phosphatase, on an ABI 3730 automated sequencer by the University of Edinburgh Sequencing Facility. Sequence data from this article have been deposited in GenBank under the following accession numbers: p15, DQ897401–DQ897433; p17, DQ897434–DQ897465; p18, DQ897466–DQ897499; p21, DQ897500–DQ897532; p22, DQ897533–DQ897565; and p24, DQ897370–DQ897400.
Sequencher v. 4.0 and BioEdit were used for sequence alignment and manual editing to confirm sequence quality and to remove primer sequences. Calculations of diversity from pairwise differences (π) and from the number of segregating sites (θ) (Watterson 1975; Nei and Li 1979), linkage disequilibrium, recombination, and tests of neutrality were performed on each locus using DnaSP v.4.10.04 (Rozas et al. 2003), LDhat (Fearnhead and Donnelly 2001; McVean et al. 2002), and SplitsTree4 (Bruen et al. 2006; Huson and Bryant 2006). We distinguish between diversity at silent sites (synonymous positions, πs and θs; noncoding intronic positions, πnc and θnc; synonymous and noncoding positions, πsi and θsi) and at nonsynonymous sites (πa and θa). Tests of neutrality, e.g., Tajima's D (Tajima 1989b), were conducted with silent sites, although values are similar when all sites are considered together. We also performed coalescent simulations with recombination in DnaSP to infer the neutral distributions of test statistics, conditioning on the observed number of segregating sites in the sample. Although we report the standard Fst measure of differentiation between populations (Hudson et al. 1992b), we also tested for population structure with Ks* in DnaSP because this statistic performs well for small sample sizes coupled with high diversity (Hudson et al. 1992a; Hudson 2000), as in our data. Note that Fst is based on frequencies of polymorphic sites, assuming independence, whereas Kst* uses information about pairwise differences between sequences. Approximate confidence intervals for Fst and Kst* were obtained for each locus by resampling random haplotypes with replacement from each putative subpopulation and calculating Fst and Kst* with a modified version of Hudson's (2002) samplestats program (kindly provided by E. Stahl, personal communication) or permtest (Hudson 2000) to derive the distribution of these statistics. 
Only sites with single-nucleotide polymorphisms were used in this procedure. Structure 2.0 was also used to evaluate population subdivision (Pritchard et al. 2000). We implemented Structure's linkage model with correlated allele frequencies (given λ = 0.862, which was estimated by Structure for K = 1 and fixed at this value for other K) and assigned a genetic distance of 10⁻⁷ to adjacent sites within a locus and free recombination among loci (burn-in = 50,000; run length = 100,000; three replicates for each 1 ≤ K ≤ 5). Analyses of interlocus linkage disequilibrium and population subdivision with Structure were limited to the 27 individuals with complete data for all loci and excluded indels.
The expected decay of linkage disequilibrium (r2) with distance was modeled for each locus according to Equation 3 of Weir and Hill (1986),
$$E(r^2) = \left[\frac{10+\Gamma}{(2+\Gamma)(11+\Gamma)}\right]\left[1+\frac{(3+\Gamma)(12+12\Gamma+\Gamma^2)}{n(2+\Gamma)(11+\Gamma)}\right],$$
where n is the sample size and Γ is the product of the population recombination parameter (ρ = 4Ner) and distance (d). Values of the population recombination parameter inferred from LDhat and from least-squares fitting of the above equation to data resulted in similar estimates (not shown). We also fit this formula to the sample of 230 SNPs from 11 strains of C. elegans (Koch et al. 2000), except that recombination distances (centimorgans) were used and “ρ” was inferred by least-squares regression. For this analysis, pairs of sites from different chromosomes were assigned a distance of 50 cM.
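As an illustration of the curve-fitting step, the following minimal Python sketch evaluates a widely used form of the expected r2 as a function of Γ = ρd and sample size n, and estimates ρ by least squares. The function names, the grid search, and the default grid are assumptions made for this example; they are not the procedure implemented in LDhat or DnaSP.

```python
def expected_r2(rho, distance, n):
    """Expected r^2 between two sites separated by `distance`.

    gamma = rho * distance is the scaled recombination between the sites;
    n is the number of sampled sequences.
    """
    g = rho * distance
    base = (10.0 + g) / ((2.0 + g) * (11.0 + g))
    correction = 1.0 + ((3.0 + g) * (12.0 + 12.0 * g + g * g)) / (
        n * (2.0 + g) * (11.0 + g)
    )
    return base * correction


def fit_rho(distances, r2_values, n, grid=None):
    """Least-squares estimate of rho by grid search (illustrative only)."""
    if grid is None:
        # Hypothetical search range: 0.0001 to 0.2 per bp.
        grid = [i / 10000.0 for i in range(1, 2001)]

    def sse(rho):
        return sum(
            (obs - expected_r2(rho, d, n)) ** 2
            for d, obs in zip(distances, r2_values)
        )

    return min(grid, key=sse)


def distance_to_threshold(rho, n, threshold=0.05, max_distance=100000):
    """Smallest integer distance at which expected r^2 falls to `threshold`.

    Returns None if the threshold is not reached; the curve approaches
    roughly 1/n at large distances, so small samples may never reach it.
    """
    for d in range(1, max_distance + 1):
        if expected_r2(rho, d, n) <= threshold:
            return d
    return None
```

With ρ = 0.0346 per bp and n = 34, `distance_to_threshold` returns a value between 1 and 2 kb, in line with the extrapolation reported in the Results. In practice a numerical optimizer would replace the grid search.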
We calculated a measure of codon bias, the frequency of optimal codons, Fop, with CodonW (J. Peden, http://www.sourceforge.net/codonw), given the C. elegans codon usage table (Ikemura 1985; Stenico et al. 1994). We computed Fop using only the portions of the coding regions that were sequenced for polymorphism analysis, from which we report the maximum value observed among the sampled sequences. Likewise, we used the codeml program in Paml (Yang 1997) to compute Nei–Gojobori estimates of synonymous (dS) and nonsynonymous (dN) substitution rates of the resequenced portions of the coding regions, using SB146 relative to C. briggsae orthologs (http://www.wormbase.org). We aligned the C. remanei and C. briggsae sequences using canonical peptide translations with Clustal's default settings in BioEdit, with manual adjustment as necessary. Neighbor-joining trees and neighbor networks were constructed with SplitsTree4 (Huson and Bryant 2006).
Patterns of nucleotide polymorphism:
Across six loci comprising 3950 bp of sequence in 34 C. remanei individuals, we identified very high levels of nucleotide polymorphism (Table 3). Silent-site diversity (πsi) for the six loci ranged from 1.5 to 11.8%, with an average of 5.7% (1252 bp in total, 761 bp of which are coding), and diversity at nonsynonymous sites (πa) averaged 0.33% (2611 bp in total), ranging from 0 to 0.64% (Table 3). Two loci (p22, which is probably X-linked, as explained below, and the putatively autosomal p24) were particularly polymorphic, with silent-site diversities of ∼10% (Table 3). For these loci, individual pairs of haplotypes can be up to Ks = 25% divergent at silent sites. This high diversity is not an artifact due to alternative gene duplicates being sequenced, as evidenced by the extensive recombination detected within these polymorphic loci, the clear decline of linkage disequilibrium with distance, the lack of clearly divergent sets of haplotypes, and their unimodal distributions of pairwise allelic divergence. Exclusion of these two loci lowers the average diversity of the remaining four loci to πsi = 3.1%. Diversity at synonymous sites does not differ significantly from that at noncoding sites (P > 0.5; Table 3). Sixteen polymorphic sites across four loci exhibited more than two segregating bases, and insertion/deletion (indel) polymorphisms were present in five of the six loci (including an in-frame 3-bp coding region indel polymorphism in p17). Among the 252 polymorphic sites with two allelic states, there are 158 transitions (ts) and 94 transversions (tv), yielding a ts/tv ratio of 1.68. We also note that strain PB4641, the strain used for full-genome sequencing, contains a C to T transition in the p17 locus that generates a predicted premature stop codon (TAA).
Residual heterozygosity in the isofemale lines was detected in some male individuals for the three loci expected to be autosomal on the basis of the locations of their orthologs in C. elegans (p15, p24, and p17). The three orthologs of C. elegans X-linked genes (p18, p22, and p21), however, never exhibited residual heterozygosity in male C. remanei, consistent with X-linkage in this species as well. This is also supporting evidence that these are single-copy genes. We observed no difference between the levels of genetic variation of the three putatively autosomal loci (p15, p24, and p17) relative to the three putatively X-linked loci, p18, p22, and p21 (the mean for both sets of loci is πsi = 0.057, so a 4/3 Ne X chromosome correction results in a trend of greater diversity on putative X-linked loci), and no correlation with polymorphism levels in orthologous genes in C. elegans or C. briggsae (Cutter 2006; Cutter et al. 2006).
We tested for associations between the diversity values and several genic features. Synonymous-site diversity correlates positively with G + C content at fourfold degenerate sites (GC3s; Spearman's ρ = 0.84, P = 0.036) and with the population recombination parameter estimates, ρ = 4Ner, see below (Spearman's ρ = 0.83, P = 0.042), but not with other features of the genes such as codon usage bias (Fop) or divergence between C. briggsae and C. remanei (dN, dS or dN/dS). Identical associations were observed for silent-site diversity (πsi × GC3s Spearman's ρ = 0.93, P = 0.008; πsi × ρ4Ner Spearman's ρ = 0.89, P = 0.019), and polymorphism at nonsynonymous sites was not significantly correlated with nonsynonymous divergence (πa × dN/dS Spearman's ρ = 0.77, P = 0.072). Diversity at nonsynonymous sites did not correlate with synonymous-site diversity (P = 0.7).
We evaluated the C. remanei samples for evidence of population structure using three approaches, recognizing that the available samples are imperfect for testing the full scale and extent of structure in the species as a whole. First, we used the data from the three sampling locations with three or more individuals [Wright State University (WSU), Achilles Hill, and Gloucester; Table 1] to compute standard measures of interpopulation genetic variation (Fst, Kst*) for each locus, tested for significance with Ks* (Hudson et al. 1992a,b). Using this approach, we identified evidence of structure only for locus p24, an effect that appears to be driven by the lack of diversity among the samples from Gloucester (Table 4). For all other loci, Fst-values were very close to zero (Table 4). A resampling approach to infer confidence intervals on Fst (Table 4) allows us to exclude the presence of strong population structure like that observed in other Caenorhabditis species (Barrière and Félix 2005; Haber et al. 2005; Sivasundar and Hey 2005; Cutter 2006; Cutter et al. 2006). We obtained similar results with the Snn nearest-neighbor population structure statistic (Hudson 2000), again with only locus p24 showing significant evidence of structure after Bonferroni multiple-tests correction (Snn = 0.88, P < 0.001). Furthermore, overall levels of diversity were similar whether all samples were analyzed together or separately for the three sampling localities or for the collection of singleton samples each from different localities (Table 4).
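The logic of an Fst estimate with resampled confidence intervals can be sketched in a few lines of Python. This is a simplified illustration that assumes the sequence-based estimator Fst = 1 − Hw/Hb (mean pairwise differences within vs. between populations) and a percentile bootstrap over haplotypes; the function names and defaults are hypothetical, and the sketch does not reproduce the samplestats or permtest programs.

```python
import random
from itertools import combinations


def n_diff(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mean_within(pop):
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise differences between sequences of two populations."""
    total = sum(n_diff(a, b) for a in pop1 for b in pop2)
    return total / (len(pop1) * len(pop2))


def sequence_fst(pops):
    """Fst = 1 - Hw/Hb, averaging over populations and population pairs."""
    hw = sum(mean_within(p) for p in pops) / len(pops)
    pop_pairs = list(combinations(pops, 2))
    hb = sum(mean_between(p, q) for p, q in pop_pairs) / len(pop_pairs)
    return 1.0 - hw / hb


def bootstrap_fst(pops, reps=1000, seed=1):
    """Approximate 95% interval from resampling haplotypes with replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        resampled = [[rng.choice(p) for _ in p] for p in pops]
        try:
            values.append(sequence_fst(resampled))
        except ZeroDivisionError:
            continue  # skip replicates with no between-population differences
    values.sort()
    return values[int(0.025 * len(values))], values[int(0.975 * len(values)) - 1]
```

Each population is passed as a list of equal-length aligned strings, with at least two sequences per population. With very small samples this estimator can be negative, which is one reason interval estimates are more informative than point values here.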
In a second approach to identifying population subdivision, we used the program Structure to attempt to infer the maximum-likelihood number of subpopulations (K) from the pattern of polymorphism in the sample (Pritchard et al. 2000). This approach yielded higher log-likelihood values for K increasing from 1 to 5. However, the program assigns roughly equal proportions of each subpopulation to each individual for all values of K that we explored (i.e., ∼1/K of each individual is assigned as being derived from each putative subpopulation; Figure 1). This suggests little, or no, population subdivision. Although it should be noted that this program was not intended to be used with sequence data (i.e., from closely linked polymorphic sites), and this use can lead to overestimation of K, we used a model allowing recombination to make these data fit the assumptions required by Structure as closely as possible. Finally, we represented the relationships between the sampled individuals graphically using neighbor networks (Huson and Bryant 2006), which qualitatively illustrate the lack of any clear association between sequences and sampling locality (Figure 2; supplemental Figure 1 at http://www.genetics.org/supplemental/). Overall, these data provide very little evidence of consistent population structure, indicating that these samples approximate panmixia.
Recombination and linkage disequilibrium:
Evidence for recombination is prevalent both within and among loci, with significant test results for intralocus recombination for five of the six loci (Table 5). Using a four-gamete test to infer the minimum number of recombination events in the sample for each locus (Hudson and Kaplan 1985), at least two recombination events are detected in the two loci with the least evidence of recombination (Table 5). We also calculated the population recombination parameter (ρ = 4Ner; effective population size Ne and recombination rate r) for each locus using the maximum-likelihood method implemented in LDhat (Fearnhead and Donnelly 2001), yielding an average value of ρ = 0.0346 (Table 5). Consistent with crossing over in these regions, pairwise linkage disequilibrium (LD) decays significantly over the ∼600 bp of sequence in four of the six loci (Figure 3), with significant LD present between only 0.4% of sites (after multiple-tests correction). Within loci, r2-values average 0.208 (Table 5). No significant LD is detected between sites in different loci, and the average r2 between loci equals 0.050. To illustrate the expected decay of LD with distance given the inferred recombination parameter (ρ), we overlay Weir and Hill's (1986) Equation 3 on the plots of pairwise r2 and distance values for each locus (Figure 3). Extrapolating this decline in linkage disequilibrium with distance to the point at which the interlocus value is reached (r2 = 0.05) suggests that linkage equilibrium should occur after ∼1–2 kb on average (Table 5). In a reanalysis of published sequence variation in 3 kb of the Cr-fem-3 gene (Haag and Ackerman 2005), linkage disequilibrium also decays significantly with distance (Spearman's ρ = 0.30, P < 0.0001), reaching background levels after 534 bp (given the estimate of the recombination parameter ρ from LDhat of 0.0325, n = 11). For comparison, linkage disequilibrium in C. elegans for a genomewide sample of 230 SNPs from 11 strains (Koch et al. 2000) decays on a scale of tens of centimorgans (Figure 4), which implies LD over many megabases given an average recombination rate of ∼3 cM/Mb (Barnes et al. 1995).
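The four-gamete criterion mentioned above can be illustrated with a short Python sketch. It assumes aligned haplotypes given as equal-length strings, considers only biallelic sites, and counts a lower bound on recombination events as the largest set of non-overlapping site intervals that each show all four gametes. The function names are hypothetical, and this is a simplified illustration in the spirit of the Hudson and Kaplan (1985) approach rather than the DnaSP implementation.

```python
from itertools import combinations


def biallelic_sites(seqs):
    """Indices of alignment columns with exactly two states."""
    return [k for k in range(len(seqs[0])) if len({s[k] for s in seqs}) == 2]


def four_gametes(seqs, i, j):
    """True if all four two-site haplotypes occur at biallelic sites i and j."""
    return len({(s[i], s[j]) for s in seqs}) == 4


def min_recombination_events(seqs):
    """Lower bound on the number of recombination events in the sample."""
    sites = biallelic_sites(seqs)
    intervals = [
        (i, j) for i, j in combinations(sites, 2) if four_gametes(seqs, i, j)
    ]
    # Greedy selection by right endpoint yields the largest set of
    # non-overlapping intervals; intervals may share an endpoint because
    # the inferred event lies between sites.
    intervals.sort(key=lambda ij: ij[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count
```

Under an infinite-sites model without recurrent mutation, any pair of sites carrying all four gametes requires at least one recombination event between them, which is why non-overlapping incompatible intervals give a minimum count.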
Tests of neutrality:
Given the findings of limited, if any, population structure and extensive recombination, it is also informative to consider the potential for demography and selection to perturb equilibrium neutral patterns of polymorphism at the surveyed loci. Genetic diversity measures based on pairwise differences (π) or the number of segregating sites (θ) are expected to yield approximately equal values at equilibrium under a standard neutral model (Watterson 1975; Nei and Li 1979; Tajima 1983). Departures from this neutral expectation are quantified by statistics such as Tajima's D, for which values different from zero may reflect the action of selection or nonneutral demographic processes (Tajima 1989a). For all six loci sampled here, no significant departures from the neutral expectation were observed by either Tajima's D or Fu and Li's D*, assuming no recombination (Table 6). However, the distribution of D and D* calculated from coalescent simulations with free recombination in DnaSP indicated a departure from neutrality for p22 alone (D, P = 0.001; D*, P = 0.001), as well as when calculated using observed levels of recombination for D* (P = 0.027). The average value of D of 0.196 across loci is slightly positive, and D* is similar, as also reported for two other nuclear loci in this species (Graustein et al. 2002). This could be a signature of a recent bottleneck or population contraction; however, data from many more loci are necessary to determine whether this is a general pattern in the genome. Tests of neutrality that are based on divergence, e.g., HKA tests (Hudson et al. 1987), unfortunately cannot be applied here because of saturated synonymous-site divergence relative to the nearest known congener (Table 6). However, the relatively high amino acid divergence and low codon usage bias of p18 (Table 6) suggest that this locus likely experiences much weaker purifying selection than the other loci.
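To make the comparison of π and θ concrete, the sketch below computes mean pairwise differences, the number of segregating sites, and Tajima's D from a list of aligned sequences using the standard constants from Tajima (1989). It is a minimal illustration with hypothetical function names; it treats any column with more than one state as a single segregating site and does not handle gaps or missing data, unlike DnaSP.

```python
from itertools import combinations
from math import sqrt


def pairwise_pi(seqs):
    """Mean number of pairwise differences per sequence pair (not per site)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one state."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)


def tajimas_d(seqs):
    """Tajima's D; returns None when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1.0) / (3.0 * (n - 1.0))
    b2 = 2.0 * (n * n + n + 3.0) / (9.0 * n * (n - 1.0))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator, per sequence
    return (pairwise_pi(seqs) - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
```

A sample dominated by singletons gives π below θ and a negative D, whereas an excess of intermediate-frequency variants gives a positive D, the direction of the slight average deviation reported here.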
Effective population size:
Given values of the population parameters ρ and θ and the assumption of demographic equilibrium, it is possible to calculate corresponding estimates of the effective population size (Ne) because ρ = 4Ner and θ = 4Neμ (where r is the recombination rate and μ is the mutation rate; Li 1997). From the average πsi used as an estimate of θ, and applying the C. elegans neutral mutation rate of 9.0 × 10⁻⁹/bp to C. remanei (Denver et al. 2004; Keightley and Charlesworth 2005), we obtain a diversity-based estimate of Ne in C. remanei of ∼1.6 × 10⁶ (per locus range: 4.2 × 10⁵–3.3 × 10⁶). It is not yet known whether the mutation rate estimated in C. elegans applies in other species, but patterns of divergence suggest roughly equivalent mutation rates among Caenorhabditis species (Cutter and Payseur 2003), although C. briggsae may have a slightly higher rate than C. elegans (Baer et al. 2005). To make a similar calculation on the basis of ρ, a measure of recombination rate (r) is required. C. remanei has six chromosomes, each of which likely experiences only a single crossover per meiosis (Hillers and Villeneuve 2003), and its total genome size is slightly larger than those of C. elegans and C. briggsae (∼130 Mb; J. S. Johnston and R. Waterston, personal communication), suggesting an average recombination rate of ∼2.3 cM/Mb (6 × 50 cM/130 Mb). Using this genomic average recombination rate as an estimate of r, the observed mean ρ of 0.0346/bp implies an effective size of Ne ∼ 3.7 × 10⁵ (per locus range: 4.0 × 10⁴–1.5 × 10⁶). This calculation based on ρ depends on the appropriateness of our estimate of the average genomic recombination rate, which may deviate from the local rate of recombination for any given locus. Consequently, if the six loci tend to reside in regions of lower than average recombination, the ρ-based Ne calculation will be an underestimate. 
If real, the difference in Ne inferred from θ and ρ could also indicate excess linkage disequilibrium, which could result from departures from demographic equilibrium or from natural selection acting at or near these loci (Andolfatto and Przeworski 2000; Andolfatto and Wall 2003). Nevertheless, it is plausible from both of these calculations that C. remanei has an effective population size on the order of 1 million.
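The arithmetic behind the two Ne estimates can be retraced with the values quoted in the text. The short Python sketch below simply rearranges θ = 4Neμ and ρ = 4Ner; the variable and function names are chosen for this example, and the inputs are the averages reported above (πsi = 0.057 for the six loci, ρ = 0.0346 per bp).

```python
MU = 9.0e-9          # C. elegans neutral mutation rate per bp per generation
THETA = 0.057        # average silent-site diversity of the six loci
RHO = 0.0346         # average population recombination parameter per bp

# Six chromosomes at 50 cM each over a ~130-Mb genome, in cM/Mb.
R_CM_PER_MB = 6 * 50 / 130.0
# Convert cM/Mb to recombination fraction per bp (1 cM = 0.01; 1 Mb = 1e6 bp).
R_PER_BP = R_CM_PER_MB * 0.01 / 1.0e6


def ne_from_theta(theta, mu):
    """Effective size implied by theta = 4 * Ne * mu."""
    return theta / (4.0 * mu)


def ne_from_rho(rho, r):
    """Effective size implied by rho = 4 * Ne * r."""
    return rho / (4.0 * r)


if __name__ == "__main__":
    print("r (cM/Mb): %.2f" % R_CM_PER_MB)
    print("Ne from theta: %.2e" % ne_from_theta(THETA, MU))
    print("Ne from rho:   %.2e" % ne_from_rho(RHO, R_PER_BP))
```

The diversity-based value comes out near 1.6 million and the recombination-based value near 0.37 million, so the two estimates differ by roughly fourfold under these assumed rates.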
Patterns of nucleotide polymorphism in C. remanei:
Natural population samples of C. remanei are characterized by very high silent-site nucleotide diversity, some loci having πsi as high as 10%. This does not appear to be an artifact of population structure, as intralocality measures of diversity paint the same picture and there is little evidence for population structure. The range of diversity among the six loci surveyed here encompasses the values previously reported for four other nuclear loci studied in different samples from this species (Table 3; Graustein et al. 2002; Jovelin et al. 2003; Haag and Ackerman 2005). Considering all 10 loci together yields an average πsi of 4.7%, implying that nearly 1 in 20 silent sites differs between any two randomly selected C. remanei sequences.
In stark contrast to C. remanei, the self-fertilizing congeners C. elegans and C. briggsae exhibit ∼20-fold less diversity in orthologous loci (Graustein et al. 2002; Jovelin et al. 2003; Cutter 2006; Cutter et al. 2006). This difference in diversity between outcrossing and selfing Caenorhabditis is consistent with the prediction of lower genetic variation within inbreeding populations, although it is not entirely clear which aspects of inbreeding might have led to such a dramatic reduction in diversity in selfing Caenorhabditis (Graustein et al. 2002; Sivasundar and Hey 2005; Cutter 2006; Cutter et al. 2006). Loci from some other species of nematodes also demonstrate diversity at nuclear loci as high as the ∼10% seen in C. remanei, such as β-tubulin in Haemonchus contortus (Beech et al. 1994) and ITS1-rDNA of Longidorus biformis (Ye et al. 2004), although most reports indicate levels of silent-site diversity of ≤2% (Anderson et al. 1998; Cutter et al. 2006). Similarly high levels of nucleotide polymorphism have also been reported in several species of mycophagous Drosophila (Dyer and Jaenike 2004; Shoemaker et al. 2004), suggesting that intraspecific per-nucleotide diversities of 5–10% may not be unusual in a variety of taxa, in spite of the predominant view of low-to-moderate levels of genetic variation within genetic model organisms: Drosophila melanogaster πsi ∼ 1.6% (Andolfatto 2001), Arabidopsis thaliana θsi ∼ 1.6%, maize θsi ∼ 1.5% (Wright and Gaut 2005), and human πsi ∼ 0.08% (Zhang 2000; Yu et al. 2001).
As in other obligately outbreeding species, linkage disequilibrium between pairs of polymorphic sites in C. remanei decays rapidly with distance (Long et al. 1998; Remington et al. 2001; Tenaillon et al. 2001). For most of the loci examined here, linkage disequilibrium declines significantly over just a few hundred base pairs at a rate that suggests linkage equilibrium will generally be reached at distances >1–2 kb. D. melanogaster and maize show similar rapidly decaying linkage disequilibrium (Long et al. 1998; Remington et al. 2001; Tenaillon et al. 2001), whereas 50–60 kb is required for its decay in humans and in A. thaliana (Reich et al. 2001; Nordborg et al. 2005). The self-fertile wild barley, Hordeum vulgare, exhibits LD to a comparable extent as maize (Morrell et al. 2005), whereas in the self-fertile C. elegans, linkage disequilibrium decays at a scale of megabases and even extends between pairs of sites on different chromosomes (Figure 4), which is consistent with very low outcrossing rates (BarriÈre and FÉlix 2005; Haber et al. 2005; Cutter 2006). The precise relationship between LD and distance for individual loci, however, is defined by the combined effects of local rates of recombination and mutation, selection regimes, and stochasticity inherent to the coalescent process. For example, the correlation that we observe between diversity levels (π or θ) and estimates of the recombination parameter (ρ) in these data could indicate a reduced efficacy of natural selection in lower recombination regions due to background selection or genetic hitchhiking (Maynard Smith and Haigh 1974; Charlesworth et al. 1993; Wiehe and Stephan 1993; Hudson and Kaplan 1995). Alternatively, mutation or biased gene conversion rates correlated with recombination could result in greater diversity in high recombination regions (Marais and Duret 2001; Marais 2003). 
The additional observation of a GC-content × ρ correlation, without an association between ρ and codon usage bias, suggests the operation of biased gene conversion, although further data are required to adequately test this.
We also note that values of ρ/θ in C. remanei are somewhat lower than expected from inferences of genome size, map length, and mutation rate (C. remanei r ∼ 2.3 × 10⁻⁸ M/bp/generation, C. elegans μ ∼ 9 × 10⁻⁹ mutations/bp/generation; Denver et al. 2004; Keightley and Charlesworth 2005; J. S. Johnston and R. Waterston, personal communication). At equilibrium, ρ = 4Ner and θ = 4Neμ, so ρ/θ provides an index of the average number of recombination events per mutation event (r/μ). Here, we observe mean ρ/θ = 0.5 and expect r/μ ∼ 2.5. A more formal analysis of these issues with additional data is warranted; low values of ρ/θ, reflecting excess linkage disequilibrium due to historical demographic perturbations, have also been described for D. melanogaster (Andolfatto and Przeworski 2000; Andolfatto and Wall 2003; Haddrill et al. 2005).
We found no difference in levels of diversity between putatively autosomal and X-linked loci. Comparable levels of nucleotide diversity at autosomal and X-linked loci have also been reported for several species of Drosophila (Andolfatto 2001; Dyer and Jaenike 2004), despite the potentially reduced effective population size of the X chromosome (by virtue of its hemizygosity in males). Some populations of D. melanogaster and D. simulans, however, appear to have elevated (African) or reduced (non-African) levels of diversity on the X chromosome (Begun and Whitley 2000; Andolfatto 2001; Kauer et al. 2002; Glinka et al. 2003; Thornton and Andolfatto 2006); thus, future evaluation of diversity for C. remanei population samples from different continents might provide additional insights into the processes patterning genetic variation.
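To make the expectation for X-linked loci concrete, the sketch below applies the standard textbook formulas for the effective size of autosomes and of the X chromosome given the numbers of breeding females and males. These formulas and the example census numbers are assumptions brought in for illustration; they are not part of the analyses reported here.

```python
def ne_autosome(n_females, n_males):
    """Effective size for autosomal loci with separate sexes."""
    return 4.0 * n_females * n_males / (n_females + n_males)


def ne_x(n_females, n_males):
    """Effective size for X-linked loci (males carry a single X)."""
    return 9.0 * n_females * n_males / (4.0 * n_males + 2.0 * n_females)


def x_to_autosome_ratio(n_females, n_males):
    """Expected ratio of X-linked to autosomal effective size."""
    return ne_x(n_females, n_males) / ne_autosome(n_females, n_males)


# Hypothetical breeding numbers, chosen only to show the behavior.
equal_sexes = x_to_autosome_ratio(1000, 1000)    # 0.75 with an even sex ratio
female_biased = x_to_autosome_ratio(3000, 1000)  # rises toward 1 with more females
```

With an even sex ratio, neutral diversity on the X is expected to be three-quarters of that on autosomes, so similar diversity across the two classes of loci implies some departure from these simple assumptions, such as a skewed breeding sex ratio or differences in mutation rate or selection.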
Locus p24, the putative ortholog of the C. elegans “undescribed” gene ZK430.1, has the most extreme values for several population statistics in this sample of loci, including the highest ρ, θsi, and Fst and lowest r². Is this locus or a linked locus subject to selection, or is the genomic region subject to unusual characteristics? Tests of the frequency spectrum (e.g., Tajima's D) do not suggest nonneutrality, and the coding sequence exhibits only an average rate of protein evolution (dN). However, dS, a measure of the mutation rate from interspecific comparisons, is also the most extreme value among the loci in this sample. Thus, p24 may simply reside in a region of both high mutation and high recombination.
Consistency with panmixia and demographic equilibrium:
The obligately outcrossing C. remanei stands in striking contrast to C. elegans and C. briggsae in showing little evidence of population structure (Barrière and Félix 2005; Haber et al. 2005; Sivasundar and Hey 2005; Cutter 2006; Cutter et al. 2006). The results of analyses of population subdivision and tests of neutrality point to the conclusion that these samples roughly conform to the predictions of a panmictic population at demographic equilibrium. However, it is important to recognize that (1) more extensive population subsampling will improve our understanding of how genetic variation is distributed across space and (2) the analysis of more loci is required to infer the degree to which population growth or contraction may have affected diversity patterns. For example, should larger samples of loci confirm a genomewide trend of a deficit of rare alleles (e.g., positive Tajima's D) and a consistently larger estimate of Ne inferred from diversity than from linkage disequilibrium, then this could suggest recent population contraction in the history of these, primarily American, C. remanei samples (Tajima 1989a; Andolfatto and Wall 2003). Population sampling from disparate regions (Europe, Asia) will also aid in our understanding of the historical processes that shape genome evolution in this species. The extensive recombination and large effective population size inferred for C. remanei, on the order of Ne ∼ 10⁶, imply that selection will operate efficiently on traits subject to even very weak selection. Consequently, scans of the genome for deviant patterns of polymorphism and linkage disequilibrium hold great promise for identifying regions that may be subject to natural selection in the wild, without requiring extensive corrections for the confounding effects of complex demographic scenarios. Furthermore, should the high molecular diversity correspond to substantial phenotypic variation, C. remanei may provide an exceptionally good system for dissecting the genetic basis of quantitative traits based on natural variation (Jovelin et al. 2003).
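The order of magnitude of Ne, and what it implies for weak selection, can be checked with a short calculation. The sketch assumes equilibrium (θ = 4Neμ), uses the average silent-site diversity of 4.7% reported in the abstract together with the C. elegans mutation rate quoted earlier, and applies the common rule of thumb that selection outweighs drift once Ne × s exceeds roughly 1. The function name and the rule-of-thumb threshold are illustrative assumptions rather than results of this study.

```python
def ne_from_theta(theta, mu):
    """Effective population size implied by theta = 4*Ne*mu at equilibrium."""
    return theta / (4.0 * mu)


theta_silent = 0.047   # average silent-site diversity (4.7%)
mu = 9e-9              # C. elegans mutation rate per bp per generation (proxy)

ne = ne_from_theta(theta_silent, mu)   # about 1.3 million

# Smallest selection coefficient for which Ne * s reaches about 1, i.e. the
# approximate point below which drift dominates over selection.
s_threshold = 1.0 / ne                 # on the order of 1e-6
```

Under these assumptions, fitness effects as small as about one part in a million would still be visible to selection, which is the sense in which selection is expected to be efficient in this species.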
General implications for Caenorhabditis:
Obligate outcrossing defines the ancestral mode of reproduction among Caenorhabditis nematodes, from which self-fertilization has evolved independently in at least the two lineages leading to C. elegans and C. briggsae (Kiontke et al. 2004). The molecular mechanisms controlling germline differentiation and sex determination appear to have diverged dramatically among these species (Clifford et al. 2000; Haag et al. 2002; Stothard et al. 2002; Nayak et al. 2005). However, the duration of selfing in the C. elegans and C. briggsae lineages remains an open question: extant intraspecific diversity would permit an origin as recently as a few hundred thousand years ago despite a much more ancient common ancestor of known species of Caenorhabditis (Coghlan and Wolfe 2002; Stein et al. 2003; Cutter 2006; Cutter et al. 2006). Regardless of the dating of the origin of self-fertilization, much of the evolutionary histories of these species would have occurred in an obligately outcrossing ancestor. If C. remanei adequately reflects many of the characteristics common to obligately outcrossing Caenorhabditis species, then it will provide a useful model for the evolutionary circumstances surrounding the gonochoristic progenitors of C. elegans and C. briggsae. This should prove particularly insightful for understanding the role of natural selection in shaping such traits as longevity, sex-limited behaviors and chemosensation, codon usage bias, and transitions in the mode of reproduction.
We thank M.-A. Félix for providing strain JU825, E. Haag and M. Palopoli for sharing sequence data files with us, and E. Stahl for help with samplestats. We also gratefully acknowledge the comments of D. Begun, K. Dyer, P. Haddrill, and two anonymous reviewers on previous drafts of the manuscript and the suggestions of B. Charlesworth and W. G. Hill relating to several of the analyses in this work. The Caenorhabditis Genetics Center kindly provided some strains that were used in this study. This work was funded by the National Science Foundation with International Research Fellowship Program grant no. 0401897 to A.D.C.
Communicating editor: D. Begun
- Received June 12, 2006.
- Accepted August 10, 2006.
- Copyright © 2006 by the Genetics Society of America