Abstract
Selection is predicted to drive diversification within species and lead to local adaptation, but understanding the mechanistic details underlying this process and thus the genetic basis of adaptive evolution requires the mapping of genotype to phenotype. Venom is complex and involves many genes, but the specialization of the venom gland toward toxin production allows specific transcripts to be correlated with specific toxic proteins, establishing a direct link from genotype to phenotype. To determine the extent of expression variation and identify the processes driving patterns of phenotypic diversity, we constructed genotype-phenotype maps and compared range-wide toxin-protein expression variation for two species of snake with nearly identical ranges: the eastern diamondback rattlesnake (Crotalus adamanteus) and the eastern coral snake (Micrurus fulvius). We detected significant expression variation in C. adamanteus, identified the specific loci associated with population differentiation, and found that loci expressed at all levels contributed to this divergence. Contrary to expectations, we found no expression variation in M. fulvius, suggesting that M. fulvius populations are not locally adapted. Our results not only linked expression variation at specific loci to divergence in a polygenic, complex trait but also have extensive conservation and biomedical implications. C. adamanteus is currently a candidate for federal listing under the Endangered Species Act, and the loss of any major population would result in the irrevocable loss of a unique venom phenotype. The lack of variation in M. fulvius has significant biomedical application because our data will assist in the development of effective antivenom for this species.
NATURAL selection can be a powerful force driving rapid diversification within species and is predicted to lead to local adaptation through the increase in frequency of mutations in gene-regulatory or protein-coding regions (Stern 2000; Hoekstra and Coyne 2007; Carroll 2008; Muller 2007). Expression variation at single loci has produced adaptive phenotypic divergence in the beaks of Darwin’s finches (Abzhanov et al. 2004), coat color in mice (Manceau et al. 2011), and mimicry in butterflies (Reed et al. 2011). Most traits, however, are products of poorly characterized developmental pathways involving many loci. Many of the fundamental features of evolving systems, such as evolvability, epistasis, pleiotropy, and basic variational properties (Rokyta et al. 2008, 2009, 2011b; Wager 2008; Chou et al. 2011; Woods et al. 2011; Hill and Zhang 2012), result from the relationship between genotype and phenotype (Stadler et al. 2001; Hansen 2006), but the ability to study this relationship directly in polygenic traits is rare. Therefore, linking gene-regulatory changes to adaptive evolution in polygenic, complex phenotypes remains a challenge (Romero et al. 2012; Savolainen et al. 2013).
Snake venoms are complex cocktails of ∼40–100 proteinaceous toxins (Boldrini-França et al. 2010; Calvete et al. 2010; Pavlicev et al. 2011; Rokyta et al. 2011a, 2012, 2013; Durban et al. 2013; Margres et al. 2013) that collectively function in prey capture, digestion, and defense. Although most quantitative traits are the products of developmental pathways where gene-regulatory changes may have effects mediated through complex interaction networks, toxin expression variation directly changes the phenotype because relative amounts of venom components determine, in part, the venom’s efficacy. The specialization of the venom gland toward toxin production allows the use of transcriptomics to identify a large number of loci that contribute to the venom phenotype (Rokyta et al. 2011a, 2012, 2013; Margres et al. 2013), and proteomic techniques can verify the secretion of the proteins produced by these loci (Margres et al. 2014). Correlating specific transcripts with specific toxic proteins in the context of whole venom establishes a direct link from genotype to phenotype, allowing us to link expression variation at specific loci to variation in a complex trait. Expression is typically measured at the mRNA level (Rokyta et al. 2011a, 2012, 2013; Margres et al. 2013). Venom, however, is a secretion, and protein expression can be measured directly by reversed-phase HPLC (RP-HPLC), permitting accurate quantification of population-level phenotypic variation.
Venomous snakes and their prey represent a powerful system for studying the process of local adaptation (Biardi et al. 2005). Snake venoms function solely following injection into another organism, and prey provide geographically variable selective pressures through their own genetic variation and variation in local species compositions and abundances. Selection can vary among interacting populations as a result of genotype-by-genotype-by-environment interactions, producing phenotypic variation because locally beneficial traits are not expected to become fixed at the species level (Thompson 2005). Although intraspecific variation in venom composition has been documented previously (Chippaux et al. 1991; Daltry et al. 1996; Creer et al. 2003; Mackessy 2010; Sunagar et al. 2014), comparisons have been based on clinical symptoms of human patients (Chippaux et al. 1991), suffered from small sample sizes (Straight et al. 1991; Sunagar et al. 2014), and/or employed qualitative assessments of venom appearance or protein presence or absence (Straight et al. 1991; Daltry et al. 1996; Creer et al. 2003; Boldrini-França et al. 2010; Mackessy 2010). These comparisons effectively characterized venom differences at the protein-family level, but paralogs can have vastly different functions, and expression variation may be locus specific rather than gene-family specific. Therefore, the construction of a genotype-phenotype map would improve the resolution of such comparisons by characterizing venom at the locus level (Margres et al. 2014).
To determine whether snake venoms exhibit geographic variation as predicted for a locally adapted trait and identify which loci contributed to phenotypic divergence, we used our genotype-phenotype map approach and proteomically compared toxin expression across multiple populations for two species with nearly identical ranges in the southeastern United States: the eastern diamondback rattlesnake (Crotalus adamanteus) and the eastern coral snake (Micrurus fulvius). Comparative studies of codistributed species can improve our understanding of the processes driving patterns of phenotypic diversity, and comparing trait distributions for independent species across the same landscape will show whether selection or other factors, such as drift, are responsible for the extent of the observed phenotypic variation (Gomulkiewicz et al. 2007). C. adamanteus is the largest species of rattlesnake and consumes small mammals, with rats, squirrels, and rabbits comprising most of its diet (Klauber 1997). C. adamanteus is historically native to seven states in the southeastern United States but recently has been extirpated from Louisiana, is endangered in North Carolina (Palmer and Braswell 1995), and is currently under consideration for listing as threatened under the Endangered Species Act (United States Fish and Wildlife Service 2012). Identifying phenotypically distinct populations for this species of conservation concern is critical to effective species management. M. fulvius is also native to the forests of the southeastern coastal plain and is primarily ophiophagous, preying on small snakes and other squamates (Jackson and Franz 1981). M. fulvius envenomations can be lethal, and antivenom is currently unavailable in the United States (Norris et al. 2009). Intraspecific venom variation can limit the efficacy of antivenoms (Chippaux et al. 1991; Casewell et al. 2014; Sunagar et al. 2014), and the extent of variation within M. fulvius is currently unknown. We sought to characterize the variation within this species, assisting in the development of an effective antivenom.
Materials and Methods
Transcriptome assembly and analysis
The raw paired-end reads passing the Illumina quality filter were merged if their 3′ ends overlapped, as described previously (Rokyta et al. 2012, 2013; Margres et al. 2013). To eliminate reads corresponding to extremely high-abundance transcripts, we used the Extender program (Rokyta et al. 2012) with 1,000 merged reads as seeds to attempt to generate complete transcripts using only the merged reads. Extension of seeds required an overlap of 100 nucleotides, Phred scores of at least 30 at each position in the extending read, and an exact match in the overlapping region. We performed a reference-based assembly against the 3,031 nontoxins previously annotated for C. horridus (Transcriptome Shotgun Assembly accession number GAAZ01000000; Rokyta et al. 2013) with NGen version 11 using both the merged and unmerged reads and a minimum match percentage of 93% for C. adamanteus and 85% for M. fulvius. Consensus sequences were retained if they had at least 10x coverage over the entire coding sequence; regions outside the coding sequence with less than 10x coverage were removed. Toxin sequences were clustered into groups with <1% nucleotide divergence in their coding sequences, and duplicate nontoxin sequences were eliminated following alignment of the final transcripts with NGen. We used one representative from each toxin cluster and all the unique nontoxins to filter the corresponding reads in a reference-based transcriptome assembly in NGen with a minimum match percentage of 98% using only the merged reads. The unfiltered reads then were used in a de novo transcriptome assembly in NGen with the default minimum match percentage of 93%, retaining only contigs comprised of at least 100 reads.
To increase our chances of identifying all toxin sequences, we performed four additional de novo assemblies for each species. Three assemblies were performed with NGen with a minimum match percentage of 98% using 1 million, 5 million, and 10 million reads. We opted for high stringency for these assemblies to attempt to differentiate among paralogs and varied the number of reads because we found that some high-abundance transcripts were difficult to assemble with too many reads (i.e., >1 million), apparently because of low levels of unspliced transcripts. The fourth additional de novo assembly used the Extender program as earlier on 1,000 new random reads, and we only used this assembly to identify snake venom metalloproteinases, which were difficult for other methods to assemble. After combining the results of all the preceding assemblies and eliminating duplicates, we performed one final round of read filtering of the merged reads, followed by a de novo assembly of the unfiltered reads as earlier, keeping only contigs comprising >1,000 reads. This step was included to ensure that we missed no toxin sequences with appreciable expression.
To identify and eliminate potentially chimeric sequences in our toxin databases for both species, we screened for evidence of recombination within each toxin family with GARD (Pond et al. 2006). We used the general reversible model of sequence evolution and gamma-beta rates. If we found a signal for recombination resulting in significantly different tree topologies for different regions of the alignments based on KH tests, we performed a reference-based assembly with NGen version 11 with a minimum match percentage of 98% and the autotrim parameter set to false, using the toxin coding sequences as references and 10 million merged reads. Such high-stringency alignments facilitate the identification of chimeric sequences by producing either multimodal or extremely uneven coverage distributions, particularly in combination with our long, merged reads. Suspect sequences were confirmed to be chimeras of other sequences in our toxin database before removal.
Sequences were identified by means of nucleotide 6-frame translation-protein (blastx) searches against the NCBI nonredundant (nr) protein database with a minimum E-value of 10−4 and retaining only 10 hits. De novo assembled transcripts were retained and annotated only if they had complete protein-coding sequences. Putative toxins were identified by searching their blastx match descriptions for toxin-related keywords, as described previously (Rokyta et al. 2012, 2013; Margres et al. 2013). The final set of unique transcripts for each species was generated by combining the results from all assemblies and eliminating duplicates by means of an NGen assembly and a second, more stringent assembly in SeqMan Pro. Final transcript abundances were estimated by means of a reference-based transcriptome assembly with NGen with a minimum match percentage of 95% using only the coding sequences of transcripts. Signal peptides for toxins were identified via SignalP analyses (Bendtsen et al. 2004). Putative toxins were named with a toxin class abbreviation, a number indicating cluster identity, and a lowercase letter indicating the particular member of a cluster.
Mass spectrometry analysis
Chromatographic separation and tandem mass spectrometry of the 21 RP-HPLC peaks previously identified in C. adamanteus were performed as described previously (Margres et al. 2014; Figure 1A). Four additional peaks were identified following RP-HPLC analysis of all C. adamanteus venom samples. These peaks (1b, 15b, 20b, and 20c) were isolated from a venom sample collected from KW1127, an adult female (127-cm snout-vent length, 136-cm total length) from Ca1 (Figure 2), as described earlier (Figure 1B). Twenty-two RP-HPLC peaks were collected from 400 μg of M. fulvius venom on an analytical column (Figure 1C), as described previously (Margres et al. 2014). These peaks were isolated from a venom sample collected from the adult female (62-cm snout-vent length, 68.5-cm total length; Mf2 in Figure 2) used in the transcriptome analysis of Margres et al. (2013).
RP-HPLC profiles of C. adamanteus and M. fulvius venom samples. (A) Margres et al. (2014) identified 21 major peaks in the RP-HPLC analysis of 100 μg of venom from a juvenile female C. adamanteus (72.4-cm snout-vent length, 77.5-cm total length) from population Ca2 in Figure 2. (B) We identified 25 major peaks in the RP-HPLC analysis of 100 μg of venom from an adult female C. adamanteus (127-cm snout-vent length, 136-cm total length) from population Ca1 in Figure 2. Peaks 1b, 15b, 20b, and 20c were collected for mass spectrometry analysis. (C) We identified 22 major peaks in the RP-HPLC analysis of 25 μg of venom from an adult female M. fulvius (62.0-cm snout-vent length, 68.5-cm total length) from population Mf2 used in the transcriptomic analysis of Margres et al. (2013).
Large-scale sampling of C. adamanteus and M. fulvius. We collected venom samples from 65 adult C. adamanteus from seven putative populations and 49 adult M. fulvius from eight putative populations. AR, Apalachicola River; Ca, C. adamanteus; Mf, M. fulvius; SMR, Saint Mary’s River; SR, Suwannee River.
An externally calibrated Thermo Scientific LTQ Orbitrap Velos nLC-ESI-LIT was used for tandem mass spectrometry on the four additional C. adamanteus peaks (1b, 15b, 20b, and 20c) and 22 M. fulvius peaks. A 2-cm, 100-μm inside diameter trap column was followed by a 10-cm analytical column of 75 μm inside diameter (SC001 and SC200 Easy Columns, C-18 AQ packing, Thermo Scientific, Waltham, MA). Separation was carried out using EASY-nLC II with a continuous vented column configuration. A 2-μl (200-ng) sample was aspirated into a 20-μl loop and loaded onto the trap. The flow rate was set to 300 nl/min for separation on the analytical column, and a 1-h linear gradient from 0 to 45% acetonitrile in 0.1% formic acid was run. The LC effluent was directly nanosprayed into the mass spectrometer, which was operated in a data-dependent mode under direct control of the Xcalibur software. Ten data-dependent collisional-induced-dissociation (CID) MS/MS scans per full scan were performed. Each sample was run in triplicate.
Tandem mass spectra were extracted by Proteome Discoverer version 1.4.0.288. Charge-state deconvolution and deisotoping were not performed. Sequest version 1.4.0.288 (Thermo Fisher Scientific, San Jose, CA) was used to search the transcriptome databases containing 4,186 entries for C. adamanteus and 1,352 entries for M. fulvius with signal peptides removed and assuming the digestion enzyme trypsin, allowing one missed digestion site. Oxidation of methionine (variable) and carbamidomethylation of cysteine (variable) were specified as potential modifications. For C. adamanteus, the 21 RP-HPLC peaks identified in Margres et al. (2014) were analyzed with a fragment-ion mass tolerance of 0.80 Da and a parent-ion tolerance of 2.0 Da. The four new C. adamanteus peaks identified in this study and the 22 RP-HPLC peaks identified in M. fulvius were analyzed using Sequest and X! Tandem version Cyclone 2010.12.01.1 (The GPM, thegpm.org) using the transcriptome databases as described earlier. Because of the increased sensitivity of the Orbitrap, a fragment-ion mass tolerance of 0.50 Da and a parent-ion tolerance of 10.0 ppm were specified. Glutamic acid and glutamine conversions to pyroglutamate and deamination of N-termini were specified in X! Tandem as variable modifications. Scaffold version 4.3.2 (Proteome Software, Portland, OR) was used to validate MS/MS-based peptide and protein identifications for both species. Peptide identifications were accepted if they could be established at greater than 95% probability by the Scaffold local FDR algorithm, contained at least two identified peptides, possessed unique peptide evidence, and had a minimum of 20% sequence coverage. To identify the major toxins within each peak, only proteins with >4% of the total spectral matches within each peak were reported.
Reversed-phase high-performance liquid chromatography
RP-HPLC was performed on a Beckman System Gold HPLC (Beckman Coulter, Fullerton, CA) equipped with Beckman 32 Karat Software version 8.0 for peak quantification, as described by Margres et al. (2014), for 100 μg of total protein for all C. adamanteus samples. For all M. fulvius samples, approximately 25 μg of total protein was injected onto a Jupiter C18 narrow-bore column, 250 × 2 mm (Phenomenex, Torrance, CA), using the standard solvent system, as described previously (Margres et al. 2014). After 5 min at 5% B, a linear gradient to 45% B was run over 120 min at a flow rate of 0.2 ml/min. Peaks were quantified as described by Margres et al. (2014).
Compositional analysis
Twenty-five RP-HPLC peaks per C. adamanteus venom sample and 22 RP-HPLC peaks per M. fulvius venom sample were quantified as described earlier (Figure 1). Briefly, relative amounts of individual peaks were determined by measuring the area under each peak relative to the total area of all identified peaks (Margres et al. 2014). According to the Lambert-Beer law, this relative amount corresponds to the percentage of total peptide bonds in the sample (McNaught and Wilkinson 1997), and this measure has been shown to be a useful proxy for the relative amount of a specific protein by weight (Gibbs et al. 2009). However, this process generates compositional data, with each peak representing an individual component of the whole venom. Compositional data sets are subject to constant-sum constraints and are inherently biased toward negative correlations among components (Aitchison 1986). We used the robCompositions package (Templ et al. 2011) in R to transform the data prior to statistical analysis using either the centered logratio (clr) or isometric logratio (ilr) transformations (Egozcue et al. 2003; Filzmoser et al. 2009). The clr transformation takes the data from the simplex to real space, and because it retains the individual identities of the peaks, this transformation was used for visualization purposes (Figure 4 and Figure 5). The clr-transformed data, however, still suffer from a sum constraint because the components must add to zero. Therefore, we used the ilr transformation for all statistical analyses testing for differences among venom samples because this transformation does not suffer from the same sum constraint. The ilr transformation projects vectors in the simplex space into the real space using an orthonormal basis for the simplex and maintains all metric properties in the process. Zero values did occur during peak quantification, and we could not transform the data using logratios because we could not take the log of zero. Therefore, we treated zeros as trace values. Trace values represent a peak that was not completely absent but rather was not quantifiable as a result of the accuracy of the measurement process, and traces need to be positive values that are less than the smallest recorded value (i.e., 0.01% in our data set; Aitchison 1986).
We used the adonis function from the vegan package (Oksanen et al. 2013) in R and Euclidean distances to perform a permutational or nonparametric MANOVA (McArdle and Anderson 2001) on the ilr-transformed data to test for significant expression variation. The ilr transformation takes Aitchison distances in the simplex space to Euclidean distances in real space, enabling the application of standard multivariate statistical approaches (Egozcue et al. 2003; Aitchison and Egozcue 2005). Nonparametric tests do not assume that the data are normally distributed, as do traditional parametric tests (McArdle and Anderson 2001), and P-values were calculated under 100,000 permutations. To determine the sensitivity of the full nonparametric MANOVA analysis to the inclusion of trace values, as discussed earlier, we replaced all zeros in both data sets with three different values prior to normalization to percentages, transformation, and statistical analysis: 10−3, 10−4, and 10−6. Analysis of all three trace-value data sets resulted in nearly identical results. We performed all remaining statistical analyses for both species using a trace value of 10−4.
If the nonparametric MANOVA returned a significant result, we used the sample function in R to randomly permute population designations to identify which populations demonstrated significant expression variation. We used the ilr-transformed data and performed all pairwise comparisons in the context of the full model. First, the test statistic was obtained through a traditional parametric MANOVA in R. Population assignments then were permuted for the two populations of interest over 100,000 iterations, and a MANOVA was conducted following each permutation to create a null distribution of the test statistic. The test statistic calculated using the original unpermuted data then was compared with the null distribution. If the test statistic was within the one-tailed 95% C.I., the populations were not considered to exhibit significant expression variation and were combined into a single population. If the test statistic was outside the 95% C.I., the populations were considered distinct and retained.
If variation was detected, we performed a linear discriminant function analysis using the lda function in R on the ilr-transformed data to assess group membership placement probabilities across populations. The lda function tests for differences in the multivariate means, enabling group discrimination using the variables that capture the largest amount of variation in the data.
Sample collection
We collected venom samples from 65 adult C. adamanteus from seven putative populations and 49 adult M. fulvius from eight putative populations (Figure 2). Samples were collected across all months and seasons. C. adamanteus reaches sexual maturity at approximately 102-cm snout-vent length (Waldron et al. 2013), and M. fulvius becomes sexually mature at approximately 50-cm snout-vent length (Jackson and Franz 1981). All specimens used in our analyses met or exceeded these measurements. We limited our analyses to adults to avoid the potentially confounding effects of ontogenetic toxin expression variation, which has been documented previously (e.g., Mackessy 1988; Calvete et al. 2010; Durban et al. 2013). Individuals were grouped a priori into populations based on various biogeographic barriers (i.e., major river drainages) and/or geographic distance. Because our randomization tests would combine populations with similar expression patterns but would not split one population with different expression patterns into multiple populations, we attempted to divide our samples into as many populations as possible while maintaining reasonable sample sizes (i.e., ≥4). Population designations were as follows: Ca1/Mf1, Florida panhandle west of the Apalachicola River; Ca2/Mf2, Florida panhandle and southern Georgia east of the Apalachicola River and west of the Suwannee River; Ca3, Brooksville Ridge of the Florida peninsula and one sample due east on the Atlantic Coastal Ridge; Mf3, northern peninsular Florida near Osceola National Forest; Ca4/Mf8, Everglades National Park; Mf4, northern peninsular Florida near Gainesville/Ocala; Ca5, Little St. George Island; Mf5, southern region of the Brooksville Ridge of peninsular Florida; Ca6, Caladesi Island; Mf6, east coast of central Florida; Ca7, Sapelo Island, Georgia; and Mf7, southern peninsular Florida near Lake Okeechobee (Figure 2).
Results and Discussion
Constructing the genotype-phenotype map
Because the original venom-gland transcriptome assembly for C. adamanteus described by Rokyta et al. (2012) was found to contain a few chimeric toxin sequences, we performed a complete reanalysis with higher stringency and steps to eliminate chimeric sequences. The original data (NCBI Sequence Read Archive accession number SRA050594) consisted of 95,643,958 pairs of 101-nucleotide reads generated on an Illumina HiSeq 2000 with average Phred scores of 32. We merged 68,987,031 of these pairs on the basis of their 3′ overlaps to give a set of reads with an average Phred score of 43 and an average length of 143 nucleotides. Transcriptome assembly and annotation resulted in 4,119 unique nontoxin transcripts with full-length coding sequences and 76 unique putative toxin transcripts (Figure 3, A and B). These 76 toxin transcripts were combined into 44 clusters with <1% nucleotide divergence in their coding sequences. Such clustering facilitates the estimation of transcript abundances and provides a better estimate of the number of toxin sequences.
Venom-gland transcriptomes of C. adamanteus and M. fulvius. (A) A total of 4,119 non-toxin-encoding and 76 toxin-encoding transcripts were identified for C. adamanteus. Toxins were grouped into 44 clusters based on <1% nucleotide divergence. The inset shows a magnification of the top 200 transcripts, the vast majority of which code for toxins. (B) The pie chart shows that the C. adamanteus venom gland is biased toward toxin production, and the histogram shows expression levels of individual toxin clusters, color coded by toxin class. Toxins detected proteomically are indicated with asterisks. (C) A total of 1,270 non-toxin-encoding and 110 toxin-encoding transcripts were identified for M. fulvius. Toxins were grouped into 44 clusters, and most of the highly expressed transcripts coded for toxins. (D) The pie chart shows that the M. fulvius venom gland is extremely biased toward toxin production, and the histogram shows expression levels of individual toxin clusters, color coded by toxin class. Toxins detected proteomically are indicated with asterisks. BPP, bradykinin-potentiating and C-type natriuretic peptides; CTL, C-type lectin; CFVII, coagulation factor VII; CRISP, cysteine-rich secretory protein; HYAL, hyaluronidase; KUN, Kunitz-type protease inhibitor; KunKun, dual-domain Kunitz-type protease inhibitor; LAAO, l-amino acid oxidase; LCN, long-chain neurotoxin; MYO, myotoxin-A; NGF, nerve growth factor; NP, natriuretic peptide; NUC, nucleotidase; PDE, phosphodiesterase; PLA2, phospholipase A2; SVMPII, snake venom metalloproteinase type II; SVMPIII, snake venom metalloproteinase type III; SVSP, snake venom serine proteinase; 3FTx, three-finger toxin; TruncHYAL, truncated hyaluronidase; VEGF, vascular endothelial growth factor; VF, venom factor.
The original proteomic analysis for C. adamanteus described by Margres et al. (2014) identified 40 toxic proteins that grouped into 20 unique clusters based on shared peptide evidence. Following reanalysis of the venom-gland transcriptome, we identified unique proteomic evidence for 21 of the 44 toxin clusters identified, including 15 of the 20 most highly expressed transcripts (Table 1, Supporting Information, Table S1, Figure 3B). These 21 clusters represented 10 of the 17 toxin gene families identified in the transcriptome, and we uniquely identified 22 individual transcripts (i.e., a single allelic variant) in the 25 RP-HPLC peaks.
Reassembly of the venom-gland transcriptome for M. fulvius (Margres et al. 2013) yielded 1,270 unique nontoxin and 110 unique toxin transcripts. These toxin transcripts were combined into 44 clusters with <1% nucleotide divergence for analysis (Figure 3, C and D). The raw data (NCBI Sequence Read Archive accession number SRA062772) consisted of 79,573,048 pairs of 101-nucleotide reads generated on an Illumina HiSeq 2000 with an average Phred score of 32. We merged 59,414,816 of these pairs on the basis of their 3′ overlaps to give composite reads with an average Phred score of 44 and an average length of 137 nucleotides. Only 1,269 of the 1,270 total unique nontoxins were included in Figure 3C because a single extremely low-abundance transcript was removed during abundance estimation because none of the 10 million reads used for estimating abundance mapped to this transcript.
We identified unique proteomic evidence for 29 of the 44 toxin clusters identified in the venom-gland transcriptome of M. fulvius, including 17 of the 20 most highly expressed transcripts. These 29 clusters represented 11 of the 15 toxin gene families identified in the transcriptome, and we uniquely identified 38 individual transcripts (i.e., 9 allelic variants; Table S2) in the 22 RP-HPLC peaks (Figure 1C). Peptide reports for both species are listed in Table S3.
Our proteomic analyses often failed to detect proteins corresponding to the lower-abundance transcripts for both species (Figure 3, B and D), most likely reflecting a detection threshold and the large range of expression levels for toxin genes (i.e., the signal for low-abundance proteins may be overwhelmed by that for high-abundance proteins). Additionally, certain protein classes expressed at appreciable levels in the venom gland may not encode true toxin proteins. These putative toxins were identified through homology with known toxins and may not encode true toxin proteins that are secreted into the venom. Venom is produced in the venom gland, but the venom gland is associated with a smaller gland downstream, the accessory gland. The function of the accessory gland is still uncertain, and Vonk et al. (2013) sequenced the venom-gland transcriptome and the accessory-gland transcriptome of the king cobra (Ophiophagus hannah). They found that most toxins are expressed at high levels only in the venom gland and at relatively low levels in the accessory gland. Lectins, however, were an exception, being much more highly expressed in the accessory gland than in the venom gland (Vonk et al. 2013). We identified 12 clusters of C-type lectins in the venom-gland transcriptome of C. adamanteus but detected only proteomic evidence for four of these proteins. C-type lectins were much less abundant in M. fulvius. We identified two clusters of C-type lectins in the M. fulvius transcriptome and verified one of the two transcript products proteomically. Vonk et al. (2013) argued that lectins may be deactivated in the venom gland. Although we found proteomic evidence for some of these proteins in the venoms of both species, certain lectin transcripts may not be true toxins or at least not significant components of the venoms.
Not all venoms vary geographically
We conducted nonparametric MANOVAs and, consistent with predictions for a locally adapted trait, detected significant expression variation among C. adamanteus populations (P = 0.002). Despite significant variation in C. adamanteus, we found no significant expression variation in M. fulvius at any spatial scale (0.100 ≤ P ≤ 0.190), indicating that M. fulvius may not be engaged in a coevolutionary arms race with specific prey taxa, may have experienced a species-wide selective sweep (i.e., the rapid range-wide fixation of a specific phenotype as a result of selection), or has potentially undergone a recent range expansion. Intraspecific venom variation had been described previously as a ubiquitous phenomenon within snakes that would limit the efficacy of antivenoms (Chippaux et al. 1991; Casewell et al. 2014; Sunagar et al. 2014). However, our results demonstrate that intraspecific venom variation is not a valid a priori assumption, and bites from different geographic locations may not always result in different pathologies and degrees of neutralization by antivenom. Raw, untransformed data for both species are given in Table S4.
Pervasive geographic expression variation in C. adamanteus
To determine the extent of geographic expression variation and to identify which populations demonstrated significant expression variation in C. adamanteus, we performed pairwise randomization tests and determined that five of the seven a priori C. adamanteus populations had distinct patterns of expression. The three northernmost populations (Ca1, Ca2, and Ca7; Figure 2) were combined into a single population, but all other adult populations showed unique expression patterns (Figure 4). We updated population designations and conducted the nonparametric MANOVA with five populations (P = 0.001). We used a linear discriminant function analysis to assess group membership placement probabilities for the five adult populations, and the analysis accurately placed 79.31% of Ca1/2/7 individuals (formerly populations Ca1, Ca2, and Ca7; n = 29), 56.25% of Ca3 individuals (n = 16), 25.00% of Ca4 individuals (n = 4), 72.73% of Ca5 individuals (n = 11), and 60.00% of Ca6 individuals (n = 5). The lower percentages of the group membership placement probabilities (i.e., <70%) may have been a function of unequal prior probabilities owing to large differences in sample size.
Statistical analyses identified extensive geographic expression variation in C. adamanteus. Following randomization tests, Ca1, Ca2, and Ca7 from Figure 2 were combined into a single population (represented by the white bar), but all other populations possessed unique expression patterns. The mean centered log ratio expression levels for individual peaks (excluding peak 6) for each C. adamanteus population are shown. Peak 6 was included in all analyses but was excluded from the figure for aesthetic purposes. Every individual in Ca6 lacked this protein, and taking the log of the extremely small trace value resulted in a large negative number. Colors correspond to Figure 2. Clr, centered log ratio.
We compared the highest and lowest mean clr expression levels for individual peaks among each population and identified the most variable (maximum difference >2) and conserved (maximum difference <1) loci. Although these values are arbitrary, a maximum difference >2 indicates that the highest expression of the gene in one population is more than seven times greater than the lowest expression of the gene in another population relative to the geometric mean. Conversely, a maximum difference <1 indicates that the highest expression of the gene in one population is approximately 2.7 times greater than the lowest expression of the gene in another population relative to the geometric mean. Eight peaks (2, 4, 5, 6, 9, 12, 17, and 20a) representing seven loci and the only two unidentified proteins (peaks 6 and 9) were extremely variable and had a difference >2, three peaks (7, 10, and 15a) corresponding to five loci were conserved and had a difference <1, and 14 peaks representing 10 loci had a difference between 1 and 2.
The genotype-phenotype map allowed us to identify the specific loci and mechanism driving venom divergence among adult C. adamanteus. For example, myotoxin (peak 2) and one of the two unidentified proteins (peak 6) were expressed at their highest levels in Ca1/2/7 and Ca3 (Figure 4), the northernmost populations, and were completely absent in Ca6, a southern population on Caladesi Island (Figure 2). This expression dichotomy had been documented previously in myotoxin (Straight et al. 1991), and we demonstrated that parallel regulatory changes affecting other genes are also occurring along this latitudinal gradient. Although expressing myotoxin at a relatively high level, Ca3 downregulated seven other loci (snake venom type II metalloproteinase 2; snake venom type III metalloproteinase 2; l-amino acid oxidase; C-type lectins 10, 11, and 13; and snake venom serine proteinase 4, peak 20b) in comparison with all other populations. Snake venom serine proteinase 4 (peak 12) and snake venom type II metalloproteinase 2 and snake venom type III metalloproteinase 2 (peak 20a) were downregulated in Ca4. Snake venom serine proteinases 4 and 7, nucleotidase, and hyaluronidase (peak 17) were expressed at relatively low levels in Ca5, while snake venom type II metalloproteinase 1 (peak 21) was significantly upregulated in comparison to all other populations. The variation in expression of these loci between populations was significant and affected this highly polygenic, complex phenotype in a tractable manner (Fraser 2013), providing candidate genes for which to test the functional and evolutionary significance of this variation (e.g., promoter evolution).
Casewell et al. (2014) recently suggested that venom variation can be influenced not only by regulatory changes but also by postgenomic effects, including post-translational modifications to specific toxins. We detected the same toxic protein, snake venom type III metalloproteinase 2, in multiple RP-HPLC fractions (Table 1, Table S1), suggesting several different protein products following proteolysis (Casewell et al. 2014). This toxin was the only protein identified in peaks 4 and 5 (Table 1, Table S1). Peak 4 was expressed at its highest level in Ca4 and lowest level in Ca6, the two southernmost populations (Figure 2), while the opposite was found in peak 5. This result demonstrates that post-translational modifications of a single gene product also can contribute to rapid venom divergence.
The pervasive expression variation we identified not only demonstrates the importance of gene regulation in the evolution of venoms but also has significant conservation implications. C. adamanteus, as a result of human persecution and habitat fragmentation, is currently a candidate for federal listing as threatened under the Endangered Species Act (United States Fish and Wildlife Service 2012). Traditional conservation genetics aims to identify and conserve genetically distinct populations based on neutral markers. Neutral markers, however, can leak across strong selective boundaries, and if selection is driving venom divergence, we might expect divergence in this phenotype to precede neutral divergence among populations. All five distinct populations identified in this study should be involved, in conjunction with sequence data and population genetics estimates, with the identification and designation of evolutionarily significant units because the loss of any of these populations would result in the loss of a unique venom phenotype.
Expression variation in both high- and low-abundance proteins
Gene expression is a metabolically costly process, and selection will drive expression to an optimal level at which its cost is balanced by its fitness benefit (Gout et al. 2010). Proteins expressed at low levels are predicted to exhibit patterns of rapid adaptive evolution and influence complex trait variation more than highly expressed proteins because highly expressed proteins are thought to evolve under strong purifying selection and have relatively conserved expression levels (Gout et al. 2010). Therefore, abundant toxins should exhibit less variation in expression as these generic proteins may perform nonspecific prey functions, while less abundant proteins may be prey specific, thus explaining their presence or absence (Gibbs et al. 2009). To determine whether variation among adult C. adamanteus was restricted to low-abundance proteins and highly expressed loci were conserved, we divided the 25 RP-HPLC peaks into low-abundance (12 peaks) and high-abundance (13 peaks) data sets based on the clr mean for each peak and conducted a nonparametric MANOVA. We detected significant expression variation in both low-abundance (P < 10−3) and high-abundance (P < 10−3) data sets, indicating that loci expressed at all levels contributed to geographic divergence among adult C. adamanteus. Our results contradict theoretical predictions that highly expressed genes have more conserved expression patterns and evolve more slowly than genes expressed at lower levels (Gout et al. 2010), potentially as a result of the tissue-specific expression of toxin genes. The modularity of the venom system may increase the evolvability of this complex phenotype, allowing for extreme levels of expression variation in both highly and weakly expressed loci.
Conserved expression patterns in M. fulvius
Despite the lack of significant expression variation in M. fulvius, we also compared the highest and lowest mean clr expression levels for individual peaks among each population and identified the most variable (maximum difference >2) and conserved (maximum difference <1) loci, as described earlier. Mf1 (Figure 2) was excluded from this analysis because of its extremely small sample size (n = 2). Consistent with the lack of significant variation identified in our previous analysis, 11 peaks (1, 4b, 5, 6a, 6d, 7, 11, 12, 15a, 15b, and 16) representing the protein products of 22 loci were conserved and had a difference <1, 10 peaks (2a, 2b, 3, 4a, 6b, 6c, 8, 9, 10, and 13) representing 18 loci had a difference between 1 and 2, and only a single peak (peak 14) was variable and had a difference >2 (Figure 5). Peak 14 contained the protein products of five loci (phospholipases A2-2, -3, and -7 and three-finger toxins-9 and -10). Phospholipases A2-2 and -3 were the two most highly expressed transcripts in the venom-gland transcriptome, and all five loci are in the top 13 most highly expressed toxins. However, all these proteins are present in multiple peaks, again suggesting that the variation identified may be restricted to specific post-translationally modified protein products or complexes (Bohlen et al. 2011; Casewell et al. 2014).
Lack of geographic expression variation in M. fulvius. Most peaks and their corresponding loci were conserved across all populations, with peak 14 (5 loci) being the only peak that exhibited extensive variation (i.e., maximum difference >2). The mean centered log ratio expression levels for individual peaks for each M. fulvius population (excluding Mf1) are shown. Mf1 was excluded because of its small sample size (n = 2). Colors correspond to Figure 2. Clr, centered log ratio.
As discussed earlier, M. fulvius envenomations can be lethal, and antivenom is currently unavailable in the United States (Norris et al. 2009). Intraspecific venom variation has been shown to limit the efficacy of antivenoms (Chippaux et al. 1991; Casewell et al. 2014; Sunagar et al. 2014), but our results demonstrate that intraspecific venom variation is not as pervasive as previously believed. The paucity of expression variation in M. fulvius suggests that a single anitvenom would effectively neutralize bites from different geographic locations, providing critical information for the development of an effective antivenom.
Extremes of toxin expression variation in sympatric species
Comparative studies of codistributed species can improve our understanding of the processes driving patterns of phenotypic diversity. We compared phenotypic distributions for independent species across the same landscape to determine whether selection or other factors were responsible for the observed patterns (Gomulkiewicz et al. 2007). A direct link between the expression level of a locus and fitness has been documented in other organisms (Gout et al. 2010). This relationship should be even more pronounced in venoms because expression variation directly changes the phenotype (i.e., relative amounts of venom components should determine, at least in part, the venom’s efficacy). We identified extensive expression variation in C. adamanteus and found no variation in M. fulvius. These incongruent patterns suggest that phenotypic diversification in C. adamanteus was driven by selection, potentially resulting from dietary differences or local coevolutionary interactions with prey (Biardi et al. 2005), because locally beneficial traits are not expected to become fixed at the species level owing to genotype-by-genotype-by-environment interactions. The monomorphic venom phenotype in M. fulvius suggests that a recent range expansion and/or a species-wide selective sweep rather than reciprocal selection from specific prey taxa and/or dietary differences dictated the evolution of this complex trait.
Conclusion
We showed that regulatory changes at specific loci, potentially in response to coevolutionary interactions, can lead to rapid divergent phenotypic evolution in a polygenic complex trait. The lack of expression variation in M. fulvius demonstrated that intraspecific venom variation is not as pervasive as previously believed, provided critical information for the development of an effective and currently unavailable antivenom, and highlighted the significance of the venom differences identified in C. adamanteus, a species currently under consideration for listing as threatened under the Endangered Species Act (United States Fish and Wildlife Service 2012). We identified distinct populations of C. adamanteus based on an ecologically critical trait for this species, and the conservation of these populations will preserve unique venom phenotypes.
Acknowledgments
The authors thank Karalyn Aronow, Richard Bartlett, Jeffrey Fobb, Nathanael Herrera, Pierson Hill, Chip Leavine, Charles Looney, Jacob Loyacano, John Malone, Moses Michelson, Michael Rochford, Joe Pfaller, Jordan Sirosky, Jim Mendenhall, and Lora Smith and Jennifer Howze at the Joseph W. Jones Ecological Research Center at Ichuaway for help in acquiring venom samples. The authors also thank Megan Lamb, Danielle Jones, and Jennifer Wanat with the Florida Department of Environmental Protection and the Apalachicola River National Estuarine Research Reserve, Bradley Smith and Shelley Stiaes of the United States Fish and Wildlife Service and St. Vincent National Wildlife Refuge, Dorset Hurley and the Sapelo Island National Estuarine Research Reserve, Brett Williams and the Department of the Air Force, and Peter Krulder and Carl Calhoun at Caladesi Island State Park for access to field sites. We thank the College of Medicine Translational Science Laboratory at Florida State University for their assistance in collecting and analyzing the mass spectrometry data and Brian Inouye for his recommendations regarding statistical analyses. We also thank the laboratory of Lisle Gibbs at Ohio State University for providing comments. Samples were collected under the following permits: Florida Fish and Wildlife Conservation Commission (FWC) LSSC-13-00004 and LSSC-09-0399, Eglin Air Force Base 96 SFS/SFOIRP, Everglades National Park EVER-2012-SCI-0053, Florida Department of Environmental Protection permit number 04101310, and St. Vincent National Wildlife Refuge permit number 41650-2012-08. Sample collection was approved by the Florida State University Institutional Animal Care and Use Committee (IACUC) under protocol 0924. Annotated transcriptome sequences were submitted to the GenBank Transcriptome Shotgun Assembly database under accession number GBEX01000000 for C. adamanteus and GBEW01000000 for M. fulvius. This work was supported by the National Science Foundation (DEB-1145987 to D.R.R.).
Footnotes
Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.114.172437/-/DC1.
Communicating editor: J. J. Bull
- Received September 18, 2014.
- Accepted November 4, 2014.
- Copyright © 2015 by the Genetics Society of America