Nematodes compose an abundant and diverse invertebrate phylum with members inhabiting nearly every ecological niche. Panagrellus redivivus (the “microworm”) is a free-living nematode frequently used to understand the evolution of developmental and behavioral processes given its phylogenetic distance to Caenorhabditis elegans. Here we report the de novo sequencing of the genome, transcriptome, and small RNAs of P. redivivus. Using a combination of automated gene finders and RNA-seq data, we predict 24,249 genes and 32,676 transcripts. Small RNA analysis revealed 248 microRNA (miRNA) hairpins, of which 63 had orthologs in other species. Fourteen miRNA clusters containing 42 miRNA precursors were found. The RNA interference, dauer development, and programmed cell death pathways are largely conserved. Analysis of protein family domain abundance revealed that P. redivivus has experienced a striking expansion of BTB domain-containing proteins and an unprecedented expansion of the cullin scaffold family of proteins involved in multi-subunit ubiquitin ligases, suggesting proteolytic plasticity and/or tighter regulation of protein turnover. The eukaryotic release factor protein family has also been dramatically expanded and suggests an ongoing evolutionary arms race with viruses and transposons. The P. redivivus genome provides a resource to advance our understanding of nematode evolution and biology and to further elucidate the genomic architecture leading to free-living lineages, taking advantage of the many fascinating features of this worm revealed by comparative studies.
NEMATODES are highly prolific organisms that originated during the Precambrian or Cambrian explosion over 500 million years ago and have subsequently evolved exquisite adaptations, allowing them to inhabit nearly all ecological niches (Malakhov and Hope 1994; Ayala and Rzhetsky 1998; Blaxter et al. 1998; Rodriguez-Trelles et al. 2002). Most nematodes are adapted to “free-living” lifestyles (i.e., nonparasitic and not associated with plants or animals, or only transiently associated as in phoresy) in terrestrial, freshwater, and marine environments, while others have parasitic lifestyles (Malakhov and Hope 1994). Free-living nematodes such as Caenorhabditis elegans have proven to be invaluable models for elucidating developmental and behavioral processes, leading to major discoveries including the genetic pathways underlying programmed cell death and the discovery of microRNA (miRNAs) and RNA interference, among others (Ambros and Horvitz 1984; Yuan et al. 1993; Fire et al. 1998). There is a huge repertoire of culturable free-living species for comparative studies, potentially making it difficult to decide which to prioritize for sequencing (Blaxter et al. 1998).
The free-living nematode Panagrellus redivivus has been used as a model system since the days of Linnaeus and is an established free-living comparative taxon to C. elegans (Sternberg and Horvitz 1981, 1982; Srinivasan et al. 2008). Fascinating differences in cell lineages and in behavior have been observed between the two (Sternberg and Horvitz 1981, 1982; Sulston et al. 1983; Srinivasan et al. 2008). For example, P. redivivus exhibits several distinguishing morphological characteristics: it is a gonochoristic (male–female) species requiring both sexes for reproduction and is ovoviviparous eggs hatch in utero and the young larvae are subsequently released through the vulva (Figure 1B). The larvae undergo four molts post hatch. Males have 9 chromosomes while females have 10 (Hechler 1970; Sternberg and Horvitz 1981). P. redivivus adults average 2 mm in size, twice as long as C. elegans adults (Hechler 1970; Sternberg and Horvitz 1982).
Historically, free-living nematodes have served as useful models for understanding basic biology such as organ development and signal transduction (Sternberg and Horvitz 1981, 1982; Srinivasan et al. 2008). In addition to comparative developmental studies, P. redivivus has been used in aquatic and soil toxicity studies, revealing interesting insights into the effects of pollutants and toxins on reproduction, movement, and feeding (Ager et al. 1984; Debus and Niemann 1994; Hempel et al. 1995; Boyd and Williams 2003; Niu et al. 2010). Small metabolites isolated from several fungal species have been successfully tested for their nematocidal activity using P. redivivus (Li et al. 2005; Huang et al. 2009; da Cruz et al. 2011). P. redivivus has been used to isolate male and female sex pheromones (Choe et al. 2012). It has also been used as a model for studying infection using human bacterial pathogens (Laws et al. 2005). Hence, P. redivivus has been used as a model system extensively in many diverse fields of biology in addition to being a free-living comparative taxon with C. elegans, making it a standout among free-living nematode sequencing candidates.
A molecular phylogenetic approach based on small subunit ribosomal DNA suggests the presence of 12 monophyletic clades in Nematoda (Figure 1A) (Holterman et al. 2006; van Megan et al. 2009). According to this phylogeny, P. redivivus belongs to clade 10, whereas C. elegans belongs to clade 9 (Figure 1). Sequencing efforts have focused primarily on the crown clades of Chromadoria with >13 sequenced genomes. All of the sequenced free-living nematode genomes currently available are restricted to clade 9 and are within the Caenorhabditis genus (Dillman et al. 2012). Other than the caenorhabditids, nematode sequencing efforts have prioritized either plant or animal parasites—including some of the most devastating agricultural and human pathogens such as plant parasites within Meloidogyne and the human parasites Brugia malayi and Trichinella spiralis (Ghedin et al. 2007; Opperman et al. 2008; Mitreva et al. 2011), which cause elephantiasis and trichinosis, respectively. P. redivivus represents the first noncaenorhabditid free-living nematode to be sequenced. Although little is known about its natural ecology, published literature suggests that P. redivivus has been isolated from a variety of environments, including felt beer hall mats, insect frass, slime from tree wounds, rotting fruit, insects, and wheat paste (Ferris 2009; Félix and Duveau 2012). These are acidic and nutrient-rich environments and have considerable overlap with the nutrient-rich natural habitats of C. elegans, which has also been isolated from rotting/decaying matter, especially rotting fruit (Kiontke and Sudhaus 2006; Félix and Duveau 2012). Given this ecological overlap, it is interesting to consider the architecture of free-living nematode genomes and how they might adapt to their respective niches. The phylogenetic position of P. redivivus and its ecological overlap with C. elegans make it an excellent species for studying the evolution of development, behavior, and adaptation (Figure 1A) (Blaxter et al. 1998; Holterman et al. 2006).
Here we describe the de novo assembly and characterization of a draft genome, transcriptome, and the complement of small RNAs of P. redivivus.
Materials and Methods
Strain culturing and maintenance of P. redivivus
For genomic and transcriptomic analysis, we used the P. redivivus strain PS2298/MT8872 (Sternberg and Horvitz 1981) originally obtained from D. J. Hooper (Rothamsted Experimental Station, Harpenden, Hertsfordshire, England). This strain was raised at 20° using standard methods.
Isolation of DNA and RNA
P. redivivus worms were grown on 5–10, 10-cm nutrient agar dishes containing Escherichia coli OP50 plates until near starvation. The worms were rinsed and collected with M9 buffer and washed multiple times to remove any E. coli. After the last wash in M9, the worms were suspended in M9 for 15–30 min. The worms were then snap-frozen in liquid nitrogen in ∼100-μl aliquots and stored at −80°. Worms were thawed and refrozen two to three times to break the cuticle before extracting either genomic DNA or bulk RNA. Genomic DNA was extracted using two rounds of proteinase K digestion followed by phenol-chloroform extraction. The genomic DNA was then treated with RNase A for digestion of any RNAs present in the sample. Bulk RNA was extracted using the Qiagen RNeasy mini kit.
Genomic and RNA-Seq library construction
A genomic library (library ID 12193) was constructed using Illumina Paired End DNA Sample Preparation Kit according to the manufacturer’s instructions. Briefly, 3 μg of genomic DNA were fragmented using nebulization. The fragments were end-repaired, 3′ adenylated, and ligated to Illumina’s paired-end adaptors. The ligation products were size-selected on an agarose gel to yield fragments of ∼350 bp. These fragments were then PCR-amplified to produce the finished library. Mate pair, a.k.a. “jumping” library (library ID 13185), was prepared using Illumina Mate Pair Library Preparation kit v2. Briefly, 7.5 μg of genomic DNA was fragmented using HydroShear device (Genomic Instrumentation Services) to generate fragments of ∼2.2 kb. Following end repair and biotinylation, the 2.2-kb fragment was gel-purified and circularized. Circular DNA was fragmented using Bioruptor NGS (Diagenode), and biotin-containing fragments were isolated using Dynabeads (Invitrogen). The fragments were end-repaired, 3′ adenylated, and ligated to NEBNext Multiplex Adaptors (NEB). The ligation products were PCR-amplified and size-selected using AMPure XP beads (Beckman Coulter) to generate the finished library of ∼450 bp in length. The RNA-Seq mixed-stage, poly(A)-selected library was created from 10 μg of total RNA using a standard unstranded protocol (Mortazavi et al. 2008, 2010). Libraries were quantified using a Qubit fluorometer (Invitrogen), and size distributions were verified using an Agilent Bioanalyzer and the High Sensitivity DNA Kit. Genomic and RNA-seq libraries were sequenced on Illumina Genome Analyzer IIx sequencer in paired-end mode with the read length of 76 nt. The jumping library was sequenced on Illumina HiSeq2000 in paired-end mode with the read length of 100 nt.
Genome assembly and annotation
The genomic libraries were built, sequenced, assembled, filtered, and repeat-masked as previously described (Mortazavi et al. 2010) using Velvet 1.2.07 and RepeatModeler 1.0.5, RepeatMasker 3.0.3, recon 1.70, and RepeatScout 1.0.5. The mixed-stage transcriptome was sequenced as previously described (Mortazavi et al. 2010) and assembled into complementary DNAs (cDNAs) using Oases 0.2.6 (Schulz et al. 2012). RNA-seq reads were submitted to the Sequence Read Archive under accession no. GSM1076725. This Whole Genome Shotgun project has been deposited at DNA Data Bank of Japan/ EMBL/GenBank under accession no. AOMH00000000. The version described in this article is the first version, AOMH01000000.
Assembled cDNAs were mapped onto the genome with blat and used as hints for gene finding using Augustus 2.6 with C. elegans settings (Stanke et al. 2008). Separately, RNA-seq reads were mapped onto the genome using TopHat 1.4 (Trapnell et al. 2009), assembled into transcripts using Cufflinks 2.02 (Trapnell et al. 2010). Candidate single nucleotide variations (SNVs) in the genome and transcriptome mapped reads were called using the samtools 0.1.13 (Li et al. 2009) pileup and varFilter options (Supporting Information, Figure S1). Candidate SNVs in the transcriptome that fell within 5 bp of exon junctions were filtered out as likely splicing artifacts.
Generation of the small RNA library
Small RNAs were isolated from mixed cultures of P. redivivus using the miRVana kit (Ambion) according to the manufacturer’s instructions. A small RNA library was then produced from the isolated RNAs using NEBNext small RNA sample prep Set 1 (New England Biolabs). The library was then size-selected on a 6% PAGE gel with the cut band corresponding to ∼90–120 bp. Library quality and size were confirmed prior to sequencing on a Bioanalyzer (Agilent).
For additional methods involving analyses of sequence information of the genome, see Supporting Information.
Results and Discussion
P. redivivus genomic assembly and transcript annotation
We sequenced 34 million, 350-bp fragments and 52 million, 2200-bp fragments of genomic P. redivivus DNA using paired-end 75-bp reads and 100-bp reads, respectively, and assembled them as described in Materials and Methods. After filtering out E. coli genomic DNA, the P. redivivus 64.4-Mb assembly had an N50 of 262.4 kb, with a maximum scaffold size of 2318 kb. The assembly has a GC content of 44.25%, and 7.01% of the genome was repeat-masked (Table 1). The Caenorhabditis genomes have a lower average GC content of 37%, which makes the genome of P. redivivus more similar to the necromenic nematode Pristionchus pacificus (43% GC) with respect to GC content.
We collected RNA from ∼100,000 mixed-stage worms and sequenced 35 million, 200-bp cDNA fragments using paired-end 75-bp reads that were assembled into 18,298 distinct cDNAs with an N50 of 2.0 kb (Table S1) that were mapped onto the genome to assist the Augustus gene finder. Augustus identified 26,372 transcripts in 24,249 genes with 78,945 splice junctions. To extend the Augustus predictions of protein-coding genes, we used TopHat and Cufflinks (Trapnell et al. 2010) to map the RNA-seq data set onto the assembled genome. Cufflinks assembled 32,676 transcripts in 24,178 genes. Augustus predicted 32.9% of the consolidated gene models, whereas 19.3% of the models came from Cufflinks only (Figure 2A). Novel splice isoforms represented the bulk (11.5%) of the new transcripts identified by Cufflinks, while novel intergenic transcripts accounted for only 1.5% (Figure 2B). A survey of expression levels revealed that the novel splice isoform and non-Augustus gene models were highly expressed (Figure 2C). Thus, de novo protein-coding gene prediction (Augustus) and de novo transcript assembly on the genome (Cufflinks) are complementary methods that can be combined to obtain a more complete annotation of genes. We estimate that this draft of the P. redivivus genome is ∼98.2% complete, based on protein-clustering analysis of the P. redivivus proteome with the C. elegans Core Eukaryotic Genes Mapping Approach (CEGMA) protein set (Parra et al. 2007) (Figure 2D).
Analysis of small RNAs
A small RNA library was prepared by first isolating <200 nucleotide RNAs from whole mixed-stage animals by column chromatography. 5′ RNA and 3′ DNA adaptors were added using T4 RNA ligase I and T4 RNA ligase II (New England Biolabs), respectively. Both enzymes require 5′ phosphate present in the donor molecule and 3′ hydroxyl (OH) group in the acceptor for activity. The library was amplified and size selected for ∼90–120 bp fragments corresponding to inserts ∼20–50 bp in size. We sequenced 24 million reads from a small RNA library generated from mixed-stage animals (Table S2). We identified 248 miRNA precursors with at least 10 reads that support the presence of a mature miRNA derived from the hairpin precursor (Table 2 and File S2). For 218 hairpins, both 5′ and 3′ mature miRNAs were present with at least one read, and for 116 hairpins both were supported with at least 10 reads. In 157 miRNA genes, the dominantly expressed mature miRNA is located in the 3′ arm of the hairpin, a phenomenon that has also been observed in other nematodes (de Wit et al. 2009). In a few cases, there were two miRNAs expressed from the same miRNA loci, one from the plus strand and the other from the minus strand, suggesting the existence of antisense miRNA transcription (Ruby et al. 2007). We considered miRNA hairpins located within 500 bp from each other to be clustered; thus we found 14 miRNA clusters each containing two to seven miRNAs (Figure S2 and Figure S3C). In total, 42 miRNAs were located in these clusters and were likely derived from multicistronic precursors. Seventeen miRNAs came from multiple loci (Table 2).
Using conservation of both mature miRNA and its hairpin sequence as criteria, we found orthologs for P. redivivus miRNAs in humans (46 of the 1527 miRBase miRNAs), Drosophila (31/240), C. elegans (46/223), Caenorhabditis briggsae (28/140), Caenorhabditis remanei (29/109), P. pacificus (20/124), B. malayi (20/32), and Ascaris suum (50/97) (Table S3). Among these were the well-studied and broadly conserved miRNAs let-7, miR-1, and miR-124 and the first miRNA identified, lin-4 (Lee et al. 1993). Altogether, 63 P. redivivus miRNAs have at least one ortholog among the species studied. Hierarchical clustering was used to visualize the distribution and conservation of these miRNAs, separating those highly conserved from miRNAs with only one or two orthologs (Figure 3). The most highly expressed miRNA was prd-21808-8719-5p (34%), for which we found no orthologs, whereas the second, prd-miR-51-5p, was conserved in six species (C. elegans, C. remanei, B. malayi, A. suum, D. melanogaster, and Homo sapiens). In addition, prd5043_2650-3p and prd17878_7454-5p were conserved only in A. suum. In all, 10 of the 20 most abundant miRNAs from P. redivivus had an ortholog in C. elegans (Figure 4). lin-4 (4.7% of all miRNA reads) and miR-1 (2.1%) were also among the 20 most abundantly expressed miRNAs in the data set (Figure 4).
In addition to miRNAs, we also found evidence for the presence of endogenous small interfering RNAs (siRNAs) through identification of a cluster of nonhairpin-derived small RNAs. These consisted of reads that we tentatively identify as belonging to 21U, 22G, and 26G classes. The cluster in contig Pred1187 spanning nucleotides 15–486, consisted of 132 21U RNA reads (U first nucleotide, 21 nucleotides in length), 94 22G RNA reads (G first nucleotide, 22 nucleotides in length), and 78 26G RNA reads (G first nucleotide, 26 nucleotides in length) complementary to a 1-kb region of the predicted gene pred1_g624, a 221-amino-acid protein with two transmembrane domains and no obvious orthologs in other species. These RNAs were spaced at varying distances in both 5′ and 3′ directions (Figure S3A). We were not able to find large clusters of 21U-RNAs, similar to those described in C. elegans (15,722 unique 21U-RNA species expressed over a 200-kb region) (Ruby et al. 2007; Batista et al. 2008). C. elegans 21U-RNA species are expressed in the germline, are bound by Argonaut subfamily piwi-related protein PRG-1, and are thought to be the nematode equivalent of Piwi-interacting RNAs (piRNAs) found in Drosophila and humans. We were also unable to identify PRG-1 or PRG-2 orthologs in the transcriptome study (Table S4), suggesting that this class of small RNA may not be utilized in P. redivivus. By contrast, we did find evidence for the 22G-RNA class of small RNAs. We observed 18 clusters of 22G-RNA, which were defined by at least 20 22G-RNA reads (criteria for a cluster were multiple reads with <10 copies each, length 21–23 nt, and spaced by at most 200 nucleotides). Sixteen of these clusters were located on the opposite strand of the target gene. Within these clusters were only 22G-RNAs (G first nucleotide, 21–23 nucleotides in length). An example of a 22G cluster from P. redivivus is shown in Figure S2. The 22G-RNA class is further divided into two subclasses, one bound by CSR-1 and required for holocentric chromosome segregation (Claycomb et al. 2009), and the other subclass bound by worm-specific AGOs and playing an important role in transposon, pseudogene, cryptic locus, and protein-coding gene silencing (Gu et al. 2009). The P. redivivus genome has five CSR-1 orthologs (Table S4), suggesting that the CSR-1 pathway may be more elaborated in P. redivivus compared to C. elegans.
We analyzed protein orthology to explore the architecture of the P. redivivus proteome. We compared 24,249 P. redivivus proteins to the proteomes of seven other nematode species and an insect outgroup: P. redivivus, C. elegans, P. pacificus, Meloidogyne hapla, Bursaphelenchus xylophilus, Brugia malayi, A. suum, T. spiralis, and the parasitoid wasp Nasonia vitripennis (C. elegans Genome Sequencing Consortium 1998; Ghedin et al. 2007; Dieterich et al. 2008; Opperman et al. 2008; Werren et al. 2010; Jex et al. 2011; Kikuchi et al. 2011; Mitreva et al. 2011) (Table 3 and File S1). An important caveat of such orthology analyses is that the accuracy of the results relies on the quality and completeness of the proteomes used. We found a total of 9156 orthology clusters that included 17,415 P. redivivus proteins; 281 of these were found exclusively in nematodes. A total of 1,664 orthology clusters included at least one protein from each of the nine taxa that we analyzed (N:N), 521 of which were strictly conserved at a 1:1 ratio across all taxa (Table 3). These highly conserved proteins provide a candidate list of additional potential phylogenetic markers that could increase the signal-to-noise ratio in future phylum-wide phylogenetic analyses (Holterman et al. 2006; van Megan et al. 2009). P. redivivus had 6834 orphan proteins that did not cluster with any examined proteins, suggesting that they are uniquely derived in P. redivivus or that they are sufficiently divergent from their orthologs so as to not be recognizably related by sequence similarity alone. We find it remarkable that, despite using eight nematode proteomes, representing only 4 of the 12 clades (Figure 1), we still find that >20% of the protein-coding genes in each species are orphans, with little-to-no sequence homology with other proteins in the analysis (Table S5). This suggests that a tremendous diversity of proteins underlies the superficial similarity of nematode morphology and that many novel proteins may yet remain to be discovered with additional genome sequencing of these wonderfully adaptable worms.
Signaling and regulatory pathways in P. redivivus
Organisms often display remarkable plasticity during their life cycle and are capable of adapting to different conditions by sensing their environment and physiological status. Behavioral and metabolic changes are the most common forms of plasticity in response to environmental changes (Fielenbach and Antebi 2008). Both these processes are rapid responses to the environment and help the organism maintain homeostasis. Since the nematode P. redivivus is free-living and appears to inhabit nutrient-rich environments, we examined whether changes in components of signaling or developmental pathways reflect its adaptation to such a lifestyle. We also screened the assembled genome for the conservation of important biological pathways including the dauer, cell-death, and RNA interference (RNAi) pathways.
Dauer formation pathway
One of the most extensively studied molecular pathways in C. elegans is the dauer formation pathway (Fielenbach and Antebi 2008). The dauer diapause represents a long-lived life stage, which is a developmental response to stressful environmental conditions such as low availability of food and high population density. Detailed molecular and genetic analyses in C. elegans have revealed how the worm senses its environment and reacts to changing environmental conditions by activating conserved signaling pathways to initiate entry into the dauer stage (Fielenbach and Antebi 2008; Schaedel et al. 2012).
Several researchers have suggested that P. redivivus does not form dauers (Hechler 1970; Stock and Nadler 2006). This observation is surprising as the genome of P. redivivus encodes nearly all the major components of the dauer pathway, with 21 of the 25 proteins that we examined being conserved (Table 4). Howerver, Panagrellus isolates do form dauers in nature (Félix and Duveau 2012), and it is possible that this species or this laboratory strain lost the ability to form dauers under standard laboraotory conditions. Several of the pathway components apparently absent in P. redivivus appear to be specific to the caenorhabditids.
Cell death pathway
Cell death is a critical process during development because animals need to eliminate unwanted cells in a regulated and timely manner (Horvitz 2003). In C. elegans, programmed cell death is believed to be molecularly initiated by the activation of a core cell-death pathway, consisting of egl-1, ced-9, ced-4, and ced-3 (Metzstein et al. 1998). Cell death is highly abundant during P. redivivus germline and somatic development (Sternberg and Horvitz 1981, 1982). In addition, P. redivivus development undergoes specific cell deaths that are evolutionarily derived (e.g., the female gonadal posterior distal tip cell). Given these observations, we examined whether core components of the cell-death pathway are present in P. redivivus. Most known effectors are conserved; however, CED-9 appears to be absent (Table S6). We assume that this is a result of the draft nature of the genome or possibly the result of divergent sequence rather than the actual absence of this gene. CED-9 is central in the regulation and prevention of cell death in many species and is highly conserved from C. elegans to humans (Metzstein et al. 1998).
The RNAi pathway has become a valuable experimental tool to perturb individual or groups of genes to uncover their specific function(s), although its application and reliability across different nematodes has been inconsistent (Urwin et al. 2002; Viney and Thompson 2008; Dalzell et al. 2010, 2011). An obvious potential explanation for this is a conspicuous lack of certain RNAi effectors in some nematode species. However, we note that other factors, such as culturing conditions, rather than the disparity of RNAi effectors across species may better explain RNAi competencies among nematodes (Dalzell et al. 2011). We found that many RNAi effector genes are conserved in P. redivivus (Table S4 and Table S7). We found 56 RNAi effector proteins that cluster with known effectors in C. elegans, including at least 16 Argonaute-like proteins, in the P. redivivus genome (Table S4 and Table S7), suggesting that P. redivivus has more of the known RNAi pathway conserved with C. elegans than any other noncaenorhabditid nematode that has been sequenced; however, this is due in part to P. redivivus expansions in certain orthologous clusters such as CSR-1- and EKL-1-like proteins (Dalzell et al. 2011). Despite the high number of orthologous effectors in C. elegans and P. redivivus, 21 of the 73 RNAi effectors that we examined appear to be specific to the C. elegans lineage, having no apparent orthologs in any of the taxa that we analyzed (Table S4 and Table S7). We are hopeful that P. redivivus will be susceptible to experimental RNAi, given the apparent conservation of so many effectors that need to be tested.
A novel small RNA pathway required for the production and/or function of germline small RNA(s) in C. elegans includes four regulator genes (csr-1, drh-3, ego-1, and ekl-1) (Rocheleau et al. 2008). We found that three of these genes have expanded to small families in the P. redivivus lineage, with five paralogs of CSR-1, three paralogs of DRH-3, and six paralogs of EKL-1. All three genes are required for RNAi in the C. elegans germline and share a chromosome segregation-defective and embryonic-lethal phenotype (Grishok et al. 2001, 2005; Kim et al. 2005; Robert et al. 2005; Duchaine et al. 2006). An unusual group of retrotransposons, named PAT elements, was previously identified in P. redivivus. These retrotransposons have contributed to a higher spontaneous mutation rate in P. redivivus compared to C. elegans (Link et al. 1987; de Chastonay et al. 1992). Although we could not precisely determine the number of PAT element copies in the P. redivivus genome, an HMMER Pfam analysis indicates the presence of at least 65 copies of reverse transcriptase and 23 copies of integrase, suggesting at least 23–65 retroelements (Table S8) (Finn et al. 2011). The apparent expansion of csr-1, drh-3, and ekl-1 and the abundance of retrotransposons in the genome suggest pronounced regulation of transposons in the germline of P. redivivus.
Protein family domain abundance
An analysis of domain abundance of various protein families provides an unbiased approach to exploring the vast sea of genomic data and reveals striking differences between the free-living nematodes C. elegans and P. redivivus (Figure 5; Figure S5). The C. elegans genome is greatly enriched in F-box, F-box-associated, FTH, and C-type lectin domains, among others, when compared to the P. redivivus genome. By contrast, the P. redivivus genome is enriched in BTB/POZ, BTB and C-terminal Kelch, trypsin, reverse transcriptase, and integrase core domains, among others (Figure 5). Both F-box and BTB domains are structural motifs that mediate protein–protein interactions, and proteins containing these domains are associated with signal transduction, cell-cycle regulation, and other cellular functions (Craig and Tyers 1999; Kipreos and Pagano 2000; Pintard et al. 2003; Stogios et al. 2005). The few members of these protein families that have been well-studied function as adaptors that determine the substrate specificity of E3 ubiquitin ligases, targeting substrates for proteolysis (Bai et al. 1996; Craig and Tyers 1999; Gagne et al. 2002; Furukawa et al. 2003; Pintard et al. 2003). Both F-box and BTB proteins are thought to play an important role in nematode immunity, with certain substrate-binding motifs having undergone heavy positive selection to target bacterial and viral peptides in the ever-escalating host–pathogen arms race (Dawkins and Krebs 1979; Thomas 2006). A detailed examination of the family of BTB domain-containing proteins in the P. redivivus genome revealed that there are large lineage-specific clades that appear to be rapidly evolving, suggestive of their involvement in immune responses (Tables 5 and 6; Figure 6). In addition, the presence of smaller, conserved orthology clusters of BTB/POZ and BTB/C-terminal Kelch proteins suggests that these likely target endogenous proteins, possibly for degradation in an E3 ligase proteolysis pathway (Figure 6; Figure S4) (Petroski and Deshaies 2005). This pattern of expansion and conservation of P. redivivus BTB domain-containing proteins, with many seemingly fast-evolving lineage-specific clusters, is consistent with observations from the C. elegans genome that F-box and BTB domain-containing proteins likely function in immunity and proteolysis (Thomas 2006).
The extent of variation in the number of F-box and BTB domains between P redivivus and C. elegans is striking. We pursued this observation further by evaluating the prevalence of F-box and BTB proteins across many nematodes and found that the free-living nematodes are outliers, having far more of either of these protein domains than any other nematodes, including the necromenic nematode P. pacificus as well as B. xylophilus, a plant-parasitic member of clade 10 along with P. redivivus (Figure 1 and Table 5). We also note that the trend across nematodes seems to favor BTBs over F-box proteins, with the exception of C. elegans, which has far more F-box proteins. Expanding this analysis across eukaryotes reveals that metazoans generally have more BTB proteins than F-box proteins, with plants and C. elegans being the exceptions (Table 5 and Table 6).
Due to the dramatic disparity of F-box domains between the free-living nematodes P. redivivus and C. elegans, we investigated the conservation of F-box domain-containing proteins across nematodes (Table 5 and Table S9). We found few F-box domain-containing proteins broadly conserved in nematodes and insects (Table S9) (Jin et al. 2004). We have yet to find the highly conserved SEL-10 (CDC4), known for its role in Skp, Cullin, F-box containing complex (or SCF complex)-mediated proteolysis in our genome or transciptome.
We suggest that the apparent evolution of F-box and BTB proteins in P. redivivus could be a response to viral susceptibility. Both BTB and F-box domain-containing proteins are traditionally known for their roles as the substrate-specifying subunits of multisubunit cullin-RING ubiquitin ligases (CRLs) (Feldman et al. 1997; Michel and Xiong 1998; Pintard et al. 2003; Petroski and Deshaies 2005; Sarikas et al. 2011). These ligases are modular and are responsible for targeting a wide variety of substrates for proteolysis by ubiquitylating them. These complexes are assembled on a cullin scaffold, which tethers a RING protein to a substrate-specifying subunit, usually through an adaptor protein as in the case of the canonical SCFCdc4 CRL. There are a variety of different CRLs, each associating with a specific cullin protein and possessing different specificity, depending on the adaptor and/or substrate-specifying subunit used (Petroski and Deshaies 2005). For example, C. elegans is known to have seven CRLs with ubiquitin-ligase activity, each built on a distinguishing cullin scaffold (cul-1 through cul-6 and apc-2) (Sarikas et al. 2011). CUL-1 CRLs use F-box proteins for substrate specificity, while those with a CUL-3 cullin scaffold use BTB proteins for substrate specificity. CRL machinery is widely exploited by viruses as a method of immune evasion, with most known examples targeting aspects of the CUL-1 CRL (Barry and Fruh 2006).
In support of this hypothesis, we observed an unprecedented expansion of cullin proteins in the P. redivivus genome (Table 5 and Figure 7). We explored this expansion by constructing a gene tree of all cullin homology domain-containing proteins across sequenced nematode genomes (Figure 7). Because of the dramatic expansion of BTB proteins, we expected an accompanying expansion of the CUL-3 family in P. redivivus, but paradoxically found an expansion of CUL-1-like proteins, which are known to use F-box proteins for substrate specificity in other metazoans (Petroski and Deshaies 2005; Sarikas et al. 2011). We also identified a number of novel cullin proteins, with little similarity to any described families, including several that appear to have arisen due to recent tandem duplications (Figure 7). The apparent absence of any CUL-2 ortholog and the abundance of novel cullin proteins suggest a surprising amount of regulatory and proteolytic plasticity in P. redivivus, which may be shaped by the stressful demands of the free-living lifestyle, immunological or otherwise. It is not uncommon for components of the ubiquitin system to be adapted to expand the immune system (e.g., Han et al. 2011; Yewdell 2005). Although beyond the scope of this current work, our data suggest that exploring the role of these cullins and the function of the CRL complexes that they form would increase our understanding of the adaptative changes that P. redivivus has undergone to cope with the stresses of its ecological niche.
Genomes of different organisms encode families of chemoreceptors, the size and diversity of which can reflect the niche inhabited by the organism (Thomas and Robertson 2008). These proteins mediate the first step in the transduction of chemical and other types of stimuli such as taste and pheromone signals. We screened the G-protein-coupled receptor (GPCR) repertoire in the genome of P. redivivus to better understand how the evolution of this large family of receptors might reflect its life history and how it compares to C. elegans. Not unexpectedly, both C. elegans and P. redivivus have an abundance of serpentine family domains (Srh, Str, Srd, etc.) belonging to the GPCR superfamily (Table 7). We found that, although the P. redivivus genome possesses a variety of GPCR proteins, it is far less abundant than what we found in C. elegans (Table 7). We observed that P. redivivus had 1075 serpentine domains compared to the 3259 domains of C. elegans. It has been suggested that C. elegans requires an abundance of chemoreceptors to navigate and interpret the nutrient-rich environments in which it lives (Robertson and Thomas 2006).
The number of serpentine GPCR domains in P. redivivus is similar to that of its clade mate, the migratory endoparasitic B. xylophilus. We did find that, of the nematodes that we analyzed, the animal parasites (A. suum, B. malayi, and T. spiralis) have far fewer GPCR domains compared to their free-living counterparts. Our data suggest that, in the specialized environments that these worms inhabit inside their hosts, they do not need a large repertoire of receptors, whereas the free-living nematodes, and nematodes that spend more foraging time in complex soil environments (e.g., P. pacificus and B. xylophilus), require a larger set so that they can better navigate and interpret their environment (Table 7).
ATP-binding cassette (ABC) transporters provide a means for a wide variety of substrates to be actively transported across membranes, hydrolyzing ATP in the process (Davidson et al. 2008; Sundaram et al. 2008). C. elegans has 61 ABC transporters, representing ∼0.3% of its protein-coding gene repertoire (Sheps et al. 2004). We found 94 putative ABC transporters in the P. redivivus genome, representing ∼0.4% of the total number of genes that we report in this draft genome (Table S10 and File S1). We found P. redivivus orthologs for 52 of the 61 C. elegans ABC transporters, indicating a high level of conservation (Table S10). In addition to having lineage-specific ABC transporters, we see expansions of hmt-1-like and pgp-like ABC transporters (Table S10). Unsurprisingly, both of these families of ABC transporters are involved in tolerance of heavy metals and other toxins. hmt-1 functions in heavy metal tolerance, mitigating the toxic effects of arsenic, cadmium, and copper on C. elegans, while pgp-5 is involved in resistance to heavy metals and bacterial toxins (Kurz et al. 2007; Schwartz et al. 2010). Expansions in these particular families of ABC transporters may explain the high level of copper tolerance reported in P. redivivus, which has been shown to have higher tolerance to copper than C. elegans or P. pacificus (Boyd and Williams 2003).
Eukaryotic release factor
The expanded protein domain families in P. redivivus compared to C. elegans include the eukaryotic release factor domains 1, 2, and 3. These three domains are found in the eukaryotic release factor 1 protein (eRF1), which is highly conserved from yeast to humans and plays a key role in translational termination (Frolova et al. 1994). eRF1 recognizes termination codons contained in messenger RNA (mRNA) and competes with suppressor transfer RNA(s) for the ribosomal A site (Drugeon et al. 1997). Most animals have 2–3 eRF1 orthologs while P. redivivus seems to have a striking expansion of 15 (Tables 5 and 6). While one of these is a putative ortholog of ETF-1 in C. elegans, the rest are quite diverse and appear to be scattered throughout the genome (Figure S6). How would a nematode or any other animal make use of an expansion of eRF1-like proteins and what might that reveal about the life history or natural ecology of P. redivivus?
Suppression of translational termination is a common strategy of animal and plant viruses and is necessary for the replication of some viruses (ten Dam et al. 1990). The expansion of eRF1 proteins in P. redivivus could represent an enhanced arsenal against viral assault, providing evidence of an historical or ongoing arms race between P. redivivus and viral antagonists. eRF1 levels are important for translational termination such that overexpressing eRF1 reduces readthrough (Drugeon et al. 1997; Le Goff et al. 1997) and depleting eRF1 increases readthrough (Stansfield et al. 1996). It is known that targeted depletion of eRF1 is a strategy employed by some viruses, such as the murine leukemia virus, whose reverse transcriptase interacts with eRF1 to increase translational readthrough, leading to efficient replication of the virus (Orlova et al. 2003). Perhaps additional copies of eRF1 ensure peptide chain termination, preventing or decreasing translational readthrough and/or ribosomal frameshifting and thus conferring resistance or immunity to some viruses. Little is known about the breadth and diversity of viruses that infect nematodes, especially noncaenorhabditid nematodes. There are no reports regarding viral infection of P. redivivus; however, P. redivivus is known to have a relatively high load of unusual retrotransposons, designated as PAT retroid elements and thought to be distantly related to the Gypsy family of retrotransposons (Link et al. 1987; de Chastonay et al. 1992). While most retroid elements produce GAG and Pol genes by translational readthrough or ribosomal framshifting from the same mRNA transcript, PAT elements are thought to generate separate transcripts for GAG and Pol genes, respectively (Link et al. 1987; de Chastonay et al. 1992). This implies that they have evolved to regulate GAG and Pol ratios at the transcriptional level, bypassing the need for translational readthrough or ribosomal frameshifting. We speculate that this could represent a vivid example of an evolutionary arms race, with P. redivivus evolving an expanded repertoire of eRF1 genes to ensure translational termination while PAT elements, the only known active retroelements in P. redivivus, have shifted to generate their genes in discrete transcripts, thus overcoming the host genome’s defenses, although this remains to be explored.
The annotated draft genome and transcriptome of P. redivivus provides a powerful resource in evolutionary and ecological comparative genomics. As it is the first free-living genome outside of the Caenorhabditis family to be sequenced, the genome highlights features that are common with the C. elegans genome. This may reflect common constraints and adaptations resulting from the free-living lifestyle. Free-living worms live in a complex and dynamic environment and must be able to generate appropriate responses to different stimuli and protect themselves against exogenous threats such as predators and pathogens, while still managing to find food and mates (Figure 8). Our analyses suggest some common genomic and transcriptomic features between the P. redivivus and C. elegans genomes. These include a large complement of GPCRs for interpreting and navigating nutrient-rich environments and an expansion of immune-related proteins to combat the abundant pathogens found in such environments. We also observe unexpected novelties, such as an unprecedented expansion of cullin scaffold proteins in P. redivivus and an unprecedented expansion of eRF1 orthologs. Some of the genomic features that we have described such as expansions in certain ABC transporters and eRF1 proteins may explain previous observations regarding toxin tolerance and the unusual PAT retroelements present in the P. redivivus genome (Link et al. 1987; de Chastonay et al. 1992; Boyd and Williams 2003) . These findings potentially encourage the development of new avenues in nematode research (Figure 8).
Comparative nematode genomics has come a long way since the release of the first whole nematode genome sequence (C. elegans Genome Sequencing Consortium 1998). Many additional nematode genomes have been sequenced, and the continuing drop in cost will ensure that even more will be sequenced (Stein et al. 2003; Ghedin et al. 2007; Abad et al. 2008; Dieterich et al. 2008; Mortazavi et al. 2010; Jex et al. 2011; Kikuchi et al. 2011; Mitreva et al. 2011; Sommer and Streit 2011). In addition, sequencing the genomes of nematode pests is providing researchers an avenue for identifying pharmacological targets that could be useful in the development of novel drugs against these parasitic nematodes (e.g., Jex et al. 2011). Comparison of genes involved in parasitism across various nematode clades expands our knowledge of the role played by processes such as horizontal gene transfer in the evolution of parasitism by nematodes (Mayer et al. 2011). Although there are 13 nematode genome sequences available, with many more in preparation, sequencing efforts have focused primarily on the crown clades of Chromadoria, heavily covering clade 9, and many of these projects have focused (appropriately) on parasites; however, we believe that our understanding of development, gene regulation, and niche partitioning among nematodes, as well as parasitism, will be greatly enhanced by studying the free-living ancestors from which parasites evolved (Dillman et al. 2012). This comparative analysis highlights some of the potential selective pressures on free-living nematodes and the adaptations that allow them to thrive in the natural world.
We thank Hillel Schwartz, Margaret C. Ho, Alexei Aravin, Claire C. de la Cova, Iva Greenwald, Kay Hofmann, and Raymond Deshaies for discussions. We thank Alicia Rogers and Zane Goodwin for their patience and help with some of the computations. This work was supported by a National Institutes of Health U.S. Public Health Service Training Grant (T32GM07616) to A.R.D. and by the Howard Hughes Medical Institute (with which P.W.S. is an Investigator). Sequencing was carried out at the Millard and Muriel Jacobs Genome Facility at the California Institute of Technology.
Communicating editor: O. Hobert
- Received December 15, 2012.
- Accepted January 8, 2013.
- Copyright © 2013 by the Genetics Society of America
Available freely online through the author-supported open access option.