Abstract
Understanding nucleotide variation in natural populations has been a subject of great interest for decades. However, many taxonomic groups, especially those with atypical life history attributes remain unstudied, and Drosophila is the only arthropod genus for which DNA polymorphism data are presently abundant. As a result of the recent release of the complete genome sequence and a wide variety of new genomic resources, the Daphnia system is quickly becoming a promising new avenue for expanding our knowledge of nucleotide variation in natural populations. Here, we examine nucleotide variation in six protein-coding loci for Daphnia pulex and its congeners with particular emphasis on D. pulicaria, the closest extant relative of D. pulex. Levels of synonymous intraspecific variation, πs, averaged 0.0136 for species in the Daphnia genus, and are slightly lower than most prior estimates in invertebrates. Tests of neutrality indicated that segregating variation conforms to neutral model expectations for the loci that we examined in most species, while Ka/Ks ratios revealed strong purifying selection. Using a full maximum-likelihood coalescent-based method, the ratio of the recombination rate to the mutation rate (c/u), averaged 0.5255 for species of the Daphnia genus. Lastly, a divergence population-genetics approach was used to investigate gene flow and divergence between D. pulex and D. pulicaria.
A thorough understanding of molecular variation in natural populations is fundamental to many important evolutionary issues. Thus, the characterization of molecular variation in natural populations and the discernment of the underlying forces that shape this variation have been a central focus of evolutionary geneticists for decades (e.g., Kreitman 1983; Moriyama and Powell 1996; Schmid et al. 2005). The ability to collect vast amounts of DNA sequence data has improved greatly in recent years, and this has expanded our understanding of the mechanisms that maintain genetic variability—especially in model systems. However, many animal groups with atypical life history attributes remain unstudied, and Drosophila is the only arthropod genus for which DNA polymorphism data are presently abundant. Undoubtedly, a more complete understanding of nucleotide variation in natural populations can be obtained by examining additional taxonomic groups.
The waterflea Daphnia (Cladocera, Anomopoda) has been central to thousands of studies in many diverse areas of biology (reviewed in Peters and de Bernardi 1987). Recently, Daphnia has been recognized as a new genetic model system. As a result, the scope of Daphnia research has expanded to include gene expression, mapping, and comparative genomic studies (e.g., Watanabe et al. 2005; Cristescu et al. 2006; Omilian et al. 2006; Soetaert et al. 2006). Yet, comparatively little is known about daphniid population genetics at nuclear protein-coding loci. Past studies of genetic variation in daphniid populations mainly employed allozymes, microsatellites, and mitochondrial DNA (e.g., Hebert 1974; Mort and Wolf 1986; Crease et al. 1990; Crease et al. 1997; Weider and Hobaek 1997; Palsson 2000), and few population-genetics analyses of nuclear DNA sequences are published (Little et al. 2004; Ishida and Taylor 2007). A better understanding of Daphnia population-genetics parameters will facilitate the interpretation of the new genomic data that are derived from this organism, especially data that are interpreted in an evolutionary context.
In this study, we treat Daphnia pulex as the focal species and also report on smaller surveys of five of its congeners: D. pulicaria, D. obtusa, D. ambigua, D. magna, and D. mendotae. The latter four species are distantly related to D. pulex, being separated by millions of years of divergence (Colbourne and Hebert 1996). However, the relationship between D. pulex and D. pulicaria is less clear. Previous studies on the basis of allozymes and mitochondrial DNA have regarded D. pulicaria as a sister species to D. pulex (Hebert et al. 1993; Colbourne and Hebert 1996). However, the species barrier between North American D. pulex and D. pulicaria is tenuous; the two are nearly morphologically indistinguishable and hybridize readily in nature. Although D. pulex–D. pulicaria hybrids reproduce via obligate parthenogenesis in the wild (e.g., Hebert et al. 1993; Hebert and Finston 2001), they are capable of producing progeny via sexual reproduction in the lab (Heier and Dudycha 2009). The most obvious difference between D. pulex and D. pulicaria is their habitat: D. pulex usually reside in ephemeral fishless ponds, whereas D. pulicaria primarily inhabit lakes.
Thus, D. pulex and D. pulicaria appear to be in the process of ecological speciation, with differences between pond and lake habitats being the source of divergent selection. Divergence population genetics can be used to shed light on the divergence process between these two species. This approach is based on the premise that two nascent species share polymorphisms due to their recent divergence from a common ancestor, and as time progresses, shared polymorphisms are lost, and new mutations arise in each species. When multiple loci are examined, the variance in levels of divergence among loci is used to assess gene flow (e.g., Wang et al. 1997; Machado et al. 2002; Hey 2006). If some loci show large divergences while others do not, it may be postulated that gene flow is still occurring at some loci. Divergence in the face of gene flow implicates selection as a driving force for the divergence (Maynard-Smith 1966; Endler 1977; Rice and Hostert 1993).
Here, we investigate the evolutionary processes shaping divergence between D. pulex and D. pulicaria by examining DNA sequence polymorphism data in six nuclear protein-coding loci. In particular, we ask whether an “isolation” speciation model, in which an ancestral population splits with no subsequent gene flow between descendent populations (Wakeley and Hey 1997), or the “isolation with migration” model (Wakeley and Hey 1998) is a better fit to the data. In doing so, multiple demographic parameters are estimated that characterize the divergence between D. pulex and D. pulicaria. This article also quantifies specieswide levels of nuclear variation in several other Daphnia species, and we report for all species included in this study the following basic population-genetics information: diversity estimates, tests of neutrality, Ka/Ks ratios, and the recombination rate (c/u).
MATERIALS AND METHODS
Taxonomic sampling:
Species delimitation and nomenclature within the D. pulex complex is problematic (Mergeay et al. 2008). Thus, we present two salient issues that warrant attention for this study. The D. pulex species name is used to denote two different lineages, one that inhabits Europe (sensu Leydig 1860), and the other is primarily found in North America (sensu Hebert 1995). In the same vein, two different lineages are named D. pulicaria; European D. pulicaria (sensu Alonso 1996) are distinct from North American D. pulicaria populations (sensu Forbes 1893). Here, we refer only to North American D. pulex and D. pulicaria, and restrict our conclusions to these lineages.
Six loci were examined in D. pulex, D. pulicaria, and D. obtusa. Due to the difficulty of obtaining PCR products, two loci were analyzed for three additional species that are much more distant in their taxonomic relationship to D. pulex: D. ambigua, D. mendotae, and D. magna. For D. pulex, two individuals per collection site were sampled (36 alleles, supporting information, Table S1). Obligate parthenogens that are known to occur in D. pulex were not included in this study. For all other species, one individual from each of nine collection sites was sampled (18 alleles per species, see Table S1 for exceptions). With the exception of D. pulex, each individual sampled within a species was from a unique collection site, making it impracticable to address questions at the level of the deme. This specieswide sampling strategy from multiple demes may increase the effective population size relative to that of a single deme (Wright 1943). With multiple subdivided demes, the migration rate among demes can influence inferences regarding divergence time, relative population sizes, and phylogenetic relationships (Wakeley 2000), and the appropriateness of our sampling approach depends on how well Daphnia fit the assumptions of an island model of migration with many demes. Moreover, Daphnia are cyclic parthenogens, and asexual periods of reproduction may influence adherence to the standard coalescent, potentially influencing our analyses. Although it is beyond the scope of the present article to deal formally with this issue, we expect that the influence of selection on linked sites will be greater in a cyclically parthenogenetic population than a purely sexual population, thereby increasing the among-individual variance in reproductive success and reducing effective population size.
Because taxonomic differentiation on the basis of morphology is notoriously difficult in Daphnia, we verified the taxonomic identification of all individuals with sequence data from the 12S rDNA gene (Colbourne and Hebert 1996). Allozyme analysis for the lactate dehydrogenase (LDH) locus was used to differentiate between D. pulex and D. pulicaria (Table S1), following the conventional notion that D. pulicaria is homozygous for the F allele and D. pulex is homozygous for the S allele (Hebert et al. 1989, 1993).
It has been proposed that certain populations of D. pulex endemic to Oregon are a separate species called D. arenata (Hebert 1995). Phylogenies on the basis of mitochondrial DNA sequences show that D. pulex is paraphyletic with respect to these Oregon populations (Colbourne et al. 1998; Lynch et al. 2008), but nuclear data reveal support for D. arenata (Omilian et al. 2008). This study includes two populations from Oregon (denoted LOG and GI); one population (LOG) consists of D. arenata, while the GI population is likely to be of hybrid origin (Omilian et al. 2008). For our population-genetics analyses, we found that the D. arenata individuals were sufficiently divergent to be analyzed separately from D. pulex. The GI individuals were excluded from most analyses.
Locus information:
Loci were chosen on the basis of the presence of conserved regions for primer design in sequences present in cDNA libraries from D. pulex and D. magna (cDNA libraries provided by John Colbourne and Hajime Watanabe, see Table S2 for primer sequences). These loci appeared to be single-copy genes based on BLAST searches to the D. pulex genome (v.1.0), and data obtained from cloning did not reveal evidence for more than two alleles per individual at any locus. We examined sequences from the following loci: ATP synthase epsilon chain (atp-ep), a calcium-binding protein with an EF-hand (cbp-EF), cleavage stimulation factor (cstf), glyceraldehyde-3-phosphate dehydrogenase (g3pdh), a rab GTPase(rab4), and a protein with translation initiation factor and subtilase activity (tif).
PCR amplification, DNA sequencing, and cloning:
Specific PCR information is described in Omilian et al. (2008). Briefly, Taq polymerase with proofreading capability (Clontech) was used, and PCR products were sequenced in both directions. Because Daphnia were from natural populations, direct sequencing of PCR products revealed many heterozygous loci, that is, two overlapping peaks were observed for at least one site of the DNA sequence electropherogram. Heterozygous sites were detected with CodonCode Aligner software v1.4.3 set at the highest sensitivity and then verified by eye. Approximately 50% of our amplicons had multiple heterozygous sites and were cloned with the Invitrogen TOPO TA kit to determine the gametic phase. The QIAprep Spin miniprep kit (QIAGEN) was used for plasmid purification, and a T7 primer was used to sequence the cloned inserts. Four to 12 cloned fragments were sequenced, and the results from cloned PCR products were compared to the direct sequences of PCR products to ensure that polymorphisms were not the result of PCR or cloning-induced errors (Cronn et al. 2002).
Sequence analysis and statistical tests:
Polymorphism patterns:
Sequences were aligned with MEGA version 3.1 (Kumar et al. 2004), using the ClustalW algorithm (Thompson et al. 1994) and then manually corrected. Population-genetics parameters and tests of neutrality were calculated with DnaSP version 4.0 (Rozas et al. 2003) unless noted otherwise. Two measures of nucleotide diversity were estimated: π, the average of pairwise differences among DNA sequences (Tajima 1983), obtained using the Jukes–Cantor correction (Jukes and Cantor 1969; Lynch and Crease 1990); and θ, based on the total number of segregating mutations in the sample (Watterson 1975; Tajima 1996). Under neutrality, π and θ are expected to be equal and estimators of 4Neu, where Ne is the genetic effective population size and u is the mutation rate per site per generation. Both π and θ were estimated for the following categories of nucleotide sites/regions: total (πT, θT), nonsynonymous (πn, θn), synonymous (πs, θs), intron (πi, θi), and UTR (πU, θU).
Recombination:
A full-likelihood coalescence-based approach in LAMARC 2.0 (Kuhner 2006) was used to estimate the recombination rate, rLAMARC = c/μ, where c is the recombination rate per site per generation and μ is the neutral mutation rate per site per generation. LAMARC uses a finite-sites model and operates under the following assumptions: (1) recombination frequency is not affected by sequence divergence, (2) all recombination is homologous, (3) gene conversion and interference do not occur, (4) recombination events are selectively neutral, and (5) the recombination rate is constant across the region. We excluded loci with 10 or fewer variable sites because LAMARC requires sufficient numbers of variable sites to get reliable estimates of the parameters.
Because it was not possible to implement the best fitting model (determined by Modeltest v3.7; Posada and Crandall 1998) for most loci in LAMARC, we used the Felsenstein '84 (F84) model with empirical base frequencies (Kishino and Hasegawa 1989; Felsenstein 1993); for most loci, the F84 model is a closer fit than the other model offered by LAMARC and is computationally faster. Two categories of relative mutation rate were assigned, accounting for substitution rate differences between nonsynonymous sites and all other sites. Our sampling strategy included 20 initial chains of 1000 and 2 final chains of 50,000 genealogies with 1000 genealogies discarded per chain. Adaptive heating was used to improve the search of parameter space. The entire analysis was replicated three times and the results were combined (Geyer 1991).
Tests of neutrality and Ka/Ks ratios:
Two tests of neutrality were conducted: the Hudson–Kreitman–Aguade test (HKA, Hudson et al. 1987) and Tajima's D test (Tajima 1989). We used a multilocus version of the HKA test (Hey's HKA program; http://lifesci.rutgers.edu/∼heylab). Test statistics for both HKA and Tajima's D tests were compared with distributions generated from 10,000 coalescent simulations to determine significance. Values for Ka and Ks were calculated in MEGA using the Kumar method, which is a modification of the Pamilo–Bianchi–Li (Pamilo and Bianchi 1993; Li 1993) and Comeron (1995) methods; this method accounts for some problematic degeneracy assignments. For both Ka and Ks we subtracted the mean within-species diversity (averaged between the two species being compared) from raw estimates of divergence to get the net divergence between species. Standard errors for Ka/Ks ratios were calculated from equation A1.19b from Lynch and Walsh (1998).
Phylogenetic analyses:
We used MrBayes 3.1.2 (Huelsenbeck and Ronquist 2001) to elucidate the genealogical relationships of D. pulex, D. pulicaria, and D. arenata alleles with Bayesian inference. The six loci were concatenated in an alignment of 3160 bp for a total evidence analysis. The tree was rooted with D. obtusa, which is an uncontroversial outgroup. Large indel mutations that were likely to be the result of one insertion event, rather than several independent events, were weighted as one event (e.g., a novel intron insertion in some individuals).
We conducted an analysis without partitioning and an analysis with the sequence data partitioned by one noncoding and three codon positions (first, second, and third codon positions). Model selection for each partition was made according to the Akaike information criterion in Modeltest v3.7 (Posada and Crandall 1998). For MrBayes, default prior settings were used except for the ratepr parameter that was set to variable for the partitioned analyses, so that partitions could evolve at different rates. Branch lengths and topology were shared among partitions, but the substitution rate matrix, state frequency, and shape parameter of the γ-distribution were unlinked, allowing for separate parameter estimates. Two independent and simultaneous Markov chain Monte Carlo (MCMC) analyses of 15 heated and one cold chain were run for 6 million generations sampling from the chain every 100 generations. After determining chain convergence (average standard deviation of split frequencies <0.01), we discarded the initial 25% of trees as a burn-in. A 50% majority-rule consensus topology with posterior probability (PP) values for each node was constructed from the post-burn-in trees.
Genealogical relationships were further examined with median-joining haplotype networks using the program Network version 4.5 (Bandelt et al. 1999; http://www.fluxus-engineering.com). This method accounts for the coexistence of ancestral and descendent haplotypes, multifurcations, and reticulate relationships (Posada and Crandall 2001). We used the MP post-processing option, which removes all superfluous median vectors and links that are not contained in the shortest trees of the network.
Divergence population genetics:
A Bayesian methodology employed by the Isolation with Migration (IMa) software (Hey and Nielsen 2007) was used to generate posterior probability distributions for six parameters relevant to the divergence between D. pulex and D. pulicaria. In turn, these parameters were used to estimate the time since divergence, migration rates, and effective population sizes for D. pulex, D. pulicaria, and their ancestral population. Assuming five generations per year in Daphnia, we applied the mutation rate 2.9 × 10−8 mutations/site/year (single-nucleotide mutation rate from Haag-Liautard et al. 2007). Although this mutation rate is from a distant relative (Drosophila), it remains the most rigorously determined arthropod nuclear rate that is available. Because there is error associated with mutation-rate estimation (outlined in Haag-Liautard et al. 2007) and the application of a mutation rate from a distant relative, we also present the scaled IMa parameters.
Once appropriate prior distributions were identified (Won and Hey 2005), we used multiple runs with identical conditions except the random number seed to determine convergence. We applied the HKY mutation model (Hasegawa et al. 1985) to accommodate multiple substitutions at a single nucleotide site. Five chains with heating were used for each run. Runs of 60,000,000 steps and a burn-in period of 100,000 steps provided for effective sample sizes of at least 95.
The isolation with migration methodology is contingent upon several assumptions. First, it is assumed that variation within the dataset is selectively neutral. Second, the sampled populations, D. pulex and D. pulicaria, are assumed to be more closely related to each other than other populations, and gene flow with unsampled populations is assumed to be nonexistent. Lastly, there should be no recombination within loci and free recombination between loci (Hey and Nielsen 2004). The assumption of selective neutrality is likely to hold as neutrality tests indicated that sequences for D. pulex and D. pulicaria conform to neutral model expectations (see results). Phylogenetic analyses from previous studies reveal mixed results regarding the closest relative to D. pulex; studies on the basis of mitochondrial DNA usually reconstruct D. arenata as the closest relative to D. pulex (Colbourne et al. 1998; Mergeay et al. 2008), whereas nuclear DNA reveals that D. pulex and D. pulicaria are sister groups (Omilian et al. 2008; this study). Thus, we conducted three separate IMa analyses in which the following were compared: (1) D. pulex and D. pulicaria, (2) D. pulex and D. arenata, and (3) D. arenata and D. pulicaria. Because the assumption of no intralocus recombination is violated in our data set, we also assessed migration with a full-likelihood coalescence-based approach in LAMARC that accounts for recombination.
RESULTS
Altogether, we sequenced >280 kb representing six nuclear protein-coding loci. The aligned length for the loci ranged from 443 to 615 bp with a mean length of 519 bp. All amplified fragments contained at least 1 intron, for a total of 10 introns, with the exception of two populations (LOG and GI, both from Oregon), which were polymorphic for an intron insertion; these populations had 11 introns total. Average intron length per species ranged in size from 57 bp (D. ambigua) to 162 bp (D. pulicaria) with an average length of 82.7 bp for the Daphnia genus (see Table S3). Four loci contained UTR sequence (either 5′ or 3′) in the amplicon; in all four cases we did not get sequence data for the complete UTR.
Polymorphism patterns, tests of neutrality, and Ka/Ks ratios:
Levels of DNA polymorphism varied in Daphnia according to functional site/region, locus, and species (Table 1). Nucleotide diversity was far lower for nonsynonymous than synonymous and intron sites (P ≤ 0.0001, Figure 1). Diversity in UTRs was threefold lower than at synonymous sites or introns (P ≤ 0.002, Figure 1). Synonymous-site diversity averaged across all loci was highest in D. ambigua (πs = 0.0310), lowest in D. arenata (πs = 0.0017), and the focal species D. pulex and D. pulicaria were intermediate with values of 0.0188 and 0.0159, respectively (Table 2). Relative to the mutation rate, the overall recombination rate per site per generation (c/μ) ranged from 0.2383 to 0.7611 for Daphnia species and averaged 0.5255 for the Daphnia genus (Table 3).
Basic polymorphism statistics calculated for each species
Estimates of π and θ for different functional regions averaged across all loci examined in the Daphnia genus. Error bars are standard errors of the mean of all loci. The following categories of nucleotide sites were examined: total (T), synonymous (s), nonsynonymous (n), intron (i), and UTR (U).
Estimates of nucleotide diversity averaged across all loci for each species of Daphnia included in this study
Full-likelihood estimates of the relative recombination rate (rLAMARC = c/μ, where c is the recombination rate per site per generation and μ is the neutral mutation rate per site per generation) for Daphnia species
We did not observe a significant departure from neutrality for any locus with the HKA test (Table 4). A significant negative mean Tajima's D was observed only for D. obtusa (Table 5). Synonymous (Ks) and nonsynonymous (Ka) divergences among D. pulex and its congeners are shown in Table 6; Ks values were higher than Ka values for all loci, and Ka/Ks ratios ranged from −0.0120 to 0.0501. D. pulex and D. pulicaria were not significantly different at silent or replacement sites (Table 6).
HKA test results
Tajima's D values for comparisons of π and θ
Average nucleotide divergence for protein-coding genes between D. pulex and five of its congeners
Evolutionary relationships between D. pulex and D. pulicaria:
In MrBayes, the lowest average standard deviation of split frequencies was obtained with the unpartitioned data set that employed the GTR + I + Γ model of nucleotide evolution; the 50% majority-rule consensus Bayesian topology is presented from this analysis (Figure 2). The consensus tree revealed that D. pulex consists of two major monophyletic groups that are paraphyletic with respect to D. pulicaria. We find that D. pulicaria is a strongly supported monophyletic group (PP = 0.99), and D. pulex and D. pulicaria are more closely related to each other than either is to D. arenata, which is also a distinct monophyletic group (PP = 1). However, support was low for the sister relationship of D. pulex and D. pulicaria to the exclusion of D. arenata (PP = 0.65). The GI population is unique in that one of the sampled individuals groups with D. pulicaria and the other groups with D. pulex.
The 50% majority-rule consensus Bayesian genealogy from the total evidence analysis of six nuclear protein-coding loci (3160 bp) in Daphnia pulex (solid), D. pulicaria (open), and D. arenata (shaded). Numbers at nodes are posterior probabilities (PP) and are not shown if they are <0.85. The tree is rooted with the outgroup species, D. obtusa (hatched). Labels consist of the population name, followed by the individual number (if more than one individual was sampled), and the allele number (either 1 or 2). D. pulex is paraphyletic with respect to D. pulicaria, which is a monophyletic group (PP = 0.99). D. arenata is also monophyletic (PP = 1.0) and is a sister group to D. pulex + D. pulicaria; support for this branch is low (PP = 0.65).
Because a bifurcating genealogy neglects recombination within a species, we also constructed a network. Network results were consistent with those obtained from the Bayesian analysis (Figure S1).
Divergence population genetics:
For D. pulex and D. pulicaria, multiple IMa runs with identical conditions except the starting seed revealed clear marginal posterior probability distributions of the parameters, with unimodal peaks for all curves; results are presented from a single representative run (Figure 3, Table 7). For gene flow from D. pulicaria to D. pulex, the highest probability of the posterior distribution is near zero, and the curve drops to zero at higher migration rates (Figure 3). For gene flow from D. pulex to D. pulicaria, a model of no gene flow can be rejected, but migration is low (Figure 3, Table 7). D. pulex and D. pulicaria are estimated to have diverged ∼82,000 years ago (95% HPD interval: 58,000–126,000). Population size for D. pulex is significantly larger than the ancestral population, suggesting population expansion; this is consistent with the negative mean value for Tajima's D observed in this species (Tables 5 and 7).
The marginal posterior probability distributions for effective population size, effective migration rate, and divergence time for D. pulex and D. pulicaria. Assuming five generations per year in Daphnia, model parameters were converted to demographic quantities using the mutation rate estimate 2.9 × 10−8 mutations/site/year [rate from Haag-Liautard et al. (2007), assuming five generations per year in Daphnia].
Scaled model parameters and demographic quantities from IMa analyses of nuclear protein-coding loci in D. pulex and D. pulicaria
Migration analyes in LAMARC, which accounts for recombination within nuclear loci, determined that migration estimates were roughly an order of magnitude higher than those obtained with IMa (Table 7). The effective number of gene migrants (2Nem) from D. pulicaria to D. pulex, per generation was 1.71 (95% support interval: 1.34–2.14), and for D. pulex to D. pulicaria, 1.67 (95% support interval: 1.28–2.10).
Despite several varied attempts, IMa runs for D. pulex and D. arenata, and D. pulicaria and D. arenata, did not have a narrow unimodal peak for the divergence-time parameter. Instead, the highest likelihood appears to have an infinitely wide range of parameter values. Thus, all parameter estimates from these runs may be invalid and are not presented. This is likely due to the low levels of variation observed in the D. arenata sequences.
DISCUSSION
Patterns of nucleotide polymorphism:
Averaged across species, daphniid nucleotide diversity (πs = 0.0136, θs = 0.0141) is lower than the average of other invertebrates (πs = 0.0265, Lynch 2006). In reference to two prominent arthropod species, synonymous diversity in Daphnia is similar to Drosophila melanogaster (πs = 0.0158, Andolfatto 2001) and lower than Anopheles gambiae (πs = 0.0235, Besansky et al. 2003; Mukabayire et al. 2001). In Daphnia, there is considerable heterogeneity in polymorphism levels for the various sites/regions and loci spanning a range from 0.0000 to 0.0694 (Table 1, Figure 1). Levels of diversity for nonsynonymous polymorphisms are far lower than for synonymous polymorphisms, reflecting purifying selection against amino acid substitutions. Mean πi and πs are significantly higher than πU (P < 0.0001 and P = 0.002, respectively), suggesting that UTRs are selectively constrained in Daphnia. These results indicate that synonymous, intron, and UTR sites should not be grouped together under the umbrella of silent sites in Daphnia and are consistent with studies that document lower levels of polymorphism in UTR regions than synonymous sites (e.g., Andolfatto 2005).
Levels of polymorphism were heterogeneous among the different species examined (Tables 1 and 2). Nucleotide diversity is notably low for D. mendotae and D. arenata, being nearly an order of magnitude lower than the other examined species for some functional sites/regions. These low diversity estimates may be due to sampling effects as only two loci were examined for D. mendotae and only four individuals were available for D. arenata. Alternatively, low estimates of nucleotide diversity could be indicative of small effective population size or lower mutation rates in these species (reviewed in Wright and Gaut 2005).
The population-genetic consequences of cyclical parthenogenesis are largely unexplored empirically. Daphnia are cyclical parthenogens that experience phases of clonal reproduction, so it is of interest if they exhibit low rates of recombination. Low recombination may cause greater vulnerability to hitchhiking effects as recombination separates independently arising mutations. The scaled recombination rate, c/μ, can be used to address this question, as recombinational activity across groups can be compared (e.g., Hudson 1987; Haddrill et al. 2005). In general, c/μ estimates are similar among Daphnia species, ranging from 0.2383 to 0.7611 (Table 3). These estimates are likely to be a lower bound on the recombination rate in Daphnia given that gene conversion may be an important factor for closely linked sites (Andolfatto and Nordborg 1998), and LAMARC does not account for gene conversion. Nevertheless, despite having phases of clonal reproduction, Daphnia appear to have recombination rates that are similar to other organisms for which estimates are available (Lynch 2007), save Drosophila, which have higher relative recombination rates than most animals (Haddrill et al. 2005; Lynch 2007).
A longstanding concern of population geneticists is quantifying how natural selection influences levels of genetic variation. The evaluation of intraspecific polymorphism and interspecific divergence can elucidate whether neutral, selective, and/or demographic forces are acting—especially when multiple loci are examined. Neutrality tests (HKA and Tajima's D) suggest that all species of Daphnia examined in this study conform to neutral model expectations, with the exception of D. obtusa, which had a significantly negative mean value for Tajima's D (Tables 4 and 5). The ratios of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) were low for all Daphnia species, suggesting strong purifying selection (Table 6). This observation is potentially a consequence of our focus on functionally conserved genes, to maximize PCR success across a wide range of Daphnia species.
Species barrier between D. pulex and D. pulicaria:
The observation of virtually no divergence at synonymous sites (average Ks = 0.0034, standard error = 0.0038) between D. pulex and D. pulicaria suggests that the species status assigned to D. pulicaria may be inappropriate. Furthermore, previous work has shown that D. pulicaria is often paraphyletic with respect to D. pulex (e.g., Crease et al. 1989; Lehman et al. 1995; Crease et al. 1997; Dudycha 2004). Yet, for some loci/types of data, D. pulicaria does have a unique genotypic constellation to the exclusion of D. pulex (e.g., Crease et al. 1997; Colbourne et al. 1998; Černý and Hebert 1999). Our total evidence analysis of six nuclear protein-coding loci reveals that D. pulicaria and D. arenata are monophyletic groups (PP = 0.99, PP = 1, respectively; Figure 2), supporting the notion that D. pulex, D. pulicaria, and D. arenata are distinct species. Many studies report improved resolution and greater node support when data from multiple loci are combined (e.g., Remsen and Desalle 1998), perhaps explaining the paraphyly for D. pulex and D. pulicaria in earlier studies that is not observed here.
We used a divergence population-genetics approach to address whether a simple isolation model is appropriate for D. pulex and D. pulicaria. Both IMa and LAMARC analyses reveal that gene flow does occur between D. pulex and D. pulicaria, thereby ruling out a strict isolation speciation model. One migrant gene copy per generation, on average, may prevent substantial divergence at a locus (Wright 1931). Although the point estimates obtained with IMa are clearly below this, the upper bound of the 90% highest posterior density intervals exceed one, as do the migration estimates obtained with LAMARC. Despite the fact that LAMARC and IMa differ in their underlying assumptions, the confidence intervals for the estimates obtained with the different programs do overlap, with estimates obtained in LAMARC being higher (Table 7).
In contrast to IMa, LAMARC assumes migration equilibrium and accounts for intralocus recombination. Loci with unacknowledged recombination are expected to have longer gene trees, on average, than nonrecombining loci (Won and Hey 2005). Here, inferred migration events are spread over a longer time period, perhaps explaining the lower gene flow estimates obtained with the IMa program. Other studies that have used IMa with the largest nonrecombining block of sequence data vs. the full locus (with recombination) have revealed that migration estimates are not significantly affected (although in one case they did decrease); however, estimates of population size are substantially inflated when ignoring recombination (Bull et al. 2006; Strasburg and Rieseberg 2008). Thus, estimates of population size for D. pulex and D. pulicaria that were obtained in IMa should be regarded as an upper bound.
Numerous studies have outlined physiological and life history differences between North American D. pulex and D. pulicaria (e.g., Brandlova et al. 1972; Tessier and Consolatti 1991; Dudycha 2004; Weider et al. 2004). In some parts of their distribution, D. pulex and D. pulicaria switch to sexual reproduction in response to different photoperiodic cues, resulting in a prereproductive isolating barrier (Deng 1997), and they usually occupy different but geographically overlapping habitats. D. pulex typically resides in vernal ponds with leaf litter whereas D. pulicaria occupies lakes that contain fish (e.g., Hebert et al. 1993; Hebert 1995; Hebert and Finston 2001). Despite the above observations, hybrids between D. pulex and D. pulicaria do commonly occur, but are found to be obligately asexual in nature (e.g., Hebert and Crease 1983; Hebert et al. 1993; Hebert and Finston 2001). Laboratory studies, however, have revealed that these hybrids are sexual and easily backcrossed (Heier and Dudycha 2009). Our divergence population genetics approach, on the basis of nuclear loci, is consistent with the above observations—D. pulex and D. pulicaria have undergone genetic differentiation as a result of different ecological pressures, but still exchange genes at a level that may be high enough to prevent reproductive isolation.
Acknowledgments
Special thanks to John Colbourne and Mike Pfrender for assembling databases that were used for primer design. We also thank Teri Crease, Dee Denver, Jeff Dudycha, Eric Rynes, Doug Scofield, Lucian Smith, and Derek Taylor for helpful advice. Lawrence Washington, Yelena Radivojac, and Brian Molter provided excellent technical assistance. Desiree Allen, John Colbourne, Sandy Connelly, Koen De Gelas, Jeff Dudycha, John Havel, Karen Looper, Mike Pfrender, Sarah Schaack, and Emily Williams provided Daphnia specimens. Part of this work was carried out using the Computational Biology Service Unit at Cornell University, which is partially funded by Microsoft Corporation. This work was supported by a National Science Foundation IGERT fellowship (to A.R.O.), National Science Foundation grants DEB-0196450 and EF-0328516, and National Institutes of Health grant R01-GM36827 (to M.L.).
Footnotes
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. EU918429–EU918560.
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.108.099549/DC1.
Communicating editor: D. Begun
- Received December 9, 2008.
- Accepted March 15, 2009.
- Copyright © 2009 by the Genetics Society of America