Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F1 offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F1 dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing.
Teleost fish provide numerous medical models. Some are induced mutant models, as in zebrafish and medaka (e.g., Moore et al. 2006; Schartl et al. 2010). Others are evolutionary mutant models, in which naturally occurring mutations lead to adaptive phenotypes that mimic human disease (Albertson et al. 2009), as in cichlids (craniofacial anomalies), platyfish (melanoma), mollies (premature puberty), cavefish (retinal degeneration), and icefish (osteopenia and anemia) (Eastman and Devries 1981; Streelman et al. 2003; Meierjohann and Schartl 2006; Near et al. 2006; Jeffery 2009; Valenzano et al. 2009; Albertson et al. 2010; Lampert et al. 2010; Zhang et al. 2010b). Teleost genomes differ from mammalian genomes, however, by a whole-genome duplication event, the teleost genome duplication (TGD) (Figure 1) (Amores et al. 1998; Postlethwait et al. 1998; Taylor et al. 2003; Jaillon et al. 2004). While the TGD can facilitate the dissection of ancestral gene functions due to the partitioning of ancestral subfunctions in the course of evolution (Force et al. 1999; Postlethwait et al. 2004), it can also obfuscate correlations between teleost disease models and their human counterparts because of the difficulty of ortholog assignment after lineage-specific loss of duplicated genes and the asymmetric evolution of gene duplicates. Genomic resources from a ray-fin (Actinopterygian) fish that diverged from teleosts before the TGD (Figure 1) would facilitate the connectivity of teleost and mammalian genomes. Unfortunately, candidate species for this role, including polypterus, paddlefish, sturgeons, bowfin, and gar (Blacklidge and Bidwell 1993; Inoue et al. 2003; Hardie and Hebert 2004), have virtually no genome resources and have life history traits inconvenient for the construction of large-scale genetic maps.
Teleost genomes possess substantially rearranged chromosomes with respect to mammalian chromosomes (Postlethwait et al. 1998; Nakatani et al. 2007), and although it has been suggested that chromosome rearrangements accelerated after the TGD, this idea is controversial (Comai 2005; Semon and Wolfe 2007; Hufton et al. 2008). Comparative analysis of a fish genome occupying a lineage that diverged from teleosts shortly before the TGD would test whether chromosome rearrangements detected in teleosts arose before or after the TGD.
Analysis of a half dozen genes in spotted gar (Lepisosteus oculatus), a large, air-breathing ray-finned North American fish, suggested that its lineage might have diverged from the teleost lineage before the TGD (Hoegg et al. 2004; Crow et al. 2006). If gar did diverge before the TGD, then gar would provide a genomic intermediary between teleost medical models and the human genome. Simple hormone injections can induce wild-caught gar to spawn in the laboratory and each female can produce on average more than 6000 fertilized eggs (Smith 2006). Gar embryos are suitable for in situ hybridization studies and develop in the lab to hatching and beyond. Spotted gar, however, like many plant and animal species of ecological, evolutionary, agricultural, pharmacological, behavioral, and medical interest, has inconvenient life history traits (males mature when 1 yr old and females when 2 yr old; Smith 2006) and gar lacks genome resources (just six nuclear gene sequences in GenBank). Furthermore, current meiotic mapping methods produce only a few hundred markers at great expense, usually do not provide protein-coding loci, and require multigenerational pedigrees (Kucuktas et al. 2009; Sanetra et al. 2009; Tripathi et al. 2009; Chintamanani et al. 2010; Li et al. 2010; Okada et al. 2010). While next generation sequencing methodologies have been effectively used for resequencing genomes of canonized model organisms with sequenced genomes (Hobert 2010; Zuryn et al. 2010) and for analyzing populations in genome-wide association studies (Hohenlohe et al. 2010), the short sequences these methods generate have been thought to be less suitable for species that lack a reference genome.
To address these problems, we devised novel strategies that utilize next-generation sequencing and Stacks software, which converts short-read sequences into called genotypes (Catchen et al. 2011), to economically create genetic maps containing many thousands of mapped markers. Because the gar’s 1- to 2-yr generation time erects a disincentive to the construction of a traditional F2 or backcross mapping panel, we capitalized on polymorphisms naturally present in the genome of a single male and a single female spotted gar taken directly from Louisiana bayous to develop a genetic map directly by genotyping their F1 progeny. We used as markers polymorphisms that were present in restriction-associated DNA (RAD)-tag sequences adjacent to restriction enzyme cut sites (Miller et al. 2007a,b; Baird et al. 2008). We associated mapped RAD-tag markers to gar coding genes by constructing a reference transcriptome from one embryo and the head of one larva. These datasets allowed the identification of nearly 1000 mapped markers as representatives of protein-coding genes. We used coding markers on the gar map to test the hypothesis that the gar lineage diverged from the teleost lineage before the TGD and to challenge the hypothesis that most rearrangements in teleost genomes occurred after the teleost genome duplication.
The mapping strategy developed here is directly applicable to numerous nonmodel organisms, and its application should spur genomic research on previously intractable species. Furthermore, the great number of genetic markers mapped can help order the thousands of unordered contigs that arise in next generation genome sequencing projects.
Materials and Methods
A single female and a single male adult spotted gar (L. oculatus) were collected in Louisiana (Bayous Chevreuil and Gheens, respectively) and maintained in a recirculating system at 23–25° on a 14-h-light/10-h-dark photoperiod. Injecting fish with Ovaprim at 0.5 ml/kg body weight induced spawning within 48 hr. This mating produced thousands of progeny, of which we collected 500 for DNA extractions and selected 94 of these to genotype for our F1 map cross and two others to use for transcriptomics. Progeny were maintained at 23–25° until 14 days postfertilization (dpf), when they were killed by MS-222 overdose and stored in 100% EtOH at −20°. Parents were killed by concussion, and brain, liver, blood, gonads, kidney, and muscle samples were collected for transcriptomics. Tissue samples were stored in 100% EtOH at −20° until isolation of genomic DNA (DNeasy Blood and Tissue Kit, Qiagen, see supporting information, File S1 for details). Local university animal care committees approved euthanasia and all other animal procedures.
Genomic DNA was purified from parents and progeny and digested with high-fidelity SbfI (New England Biolabs), which has an 8-bp, GC-rich recognition site (CCTGCAGG) and cuts ∼25,000 to 30,000 times in teleost genomes. Samples were ligated to adapters carrying five-nucleotide (nt) barcodes, each differing from every other barcode by at least two nucleotides for unambiguous assignment. RAD-tag libraries were created as described (Miller et al. 2007b; Baird et al. 2008; Hohenlohe et al. 2010) (see File S1 for further details). A total of 50 ng of pooled, size-selected DNA was PCR amplified for 12 cycles and the 200- to 500-bp fraction was gel purified. RAD-tag libraries were sequenced on an Illumina Genome Analyzer IIx with 80-nt single-end reads, loading equal amounts of barcoded DNA from 10 progeny on each lane.
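The requirement that barcodes differ at two or more positions means that a single sequencing error cannot convert one barcode into another. As an illustration only, the following Python sketch checks that property for a barcode set; the barcode sequences and function names are hypothetical and are not the ones used in this study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical 5-nt barcodes, invented for illustration only.
example_barcodes = ["AACCA", "CGATC", "TCGAT", "GGTTG"]

# A set is usable under the stated rule if every pair differs at >= 2 positions.
barcodes_ok = min_pairwise_distance(example_barcodes) >= 2
```

A set that fails this check would allow a read with one miscalled barcode base to be assigned to the wrong individual.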
Reads of low quality or with ambiguous barcodes were discarded. Retained reads were sorted into loci and genotyped using Stacks software we wrote specifically to analyze these map cross data (Catchen et al. 2011). Stacks is freely downloadable (http://creskolab.uoregon.edu/stacks/). The likelihood-based SNP calling algorithm (Hohenlohe et al. 2010) implemented in Stacks evaluates each nucleotide position in every RAD tag of all individuals, thereby differentiating true SNPs from sequencing errors. Some RAD-tag genotypes contained a single SNP, but others had alleles that differed at multiple SNPs, and these were scored as haplotypes.
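The text does not spell out the likelihood model. One way to frame such a call is a multinomial read-count model in which each diploid genotype predicts the read counts expected at a site. The Python sketch below assumes that form and a fixed per-base error rate (`err`, an arbitrary illustrative value); it should be read as a simplified illustration, not as the Stacks implementation.

```python
import math
from itertools import combinations

NUCS = "ACGT"


def genotype_log_likelihoods(counts, err=0.01):
    """Log-likelihood of each diploid genotype given read counts at one site.

    counts: dict mapping 'A', 'C', 'G', 'T' to read counts.
    err: assumed per-base sequencing error rate (fixed here for simplicity).
    The multinomial coefficient is identical for all genotypes and is omitted.
    """
    total = sum(counts.get(n, 0) for n in NUCS)
    ll = {}
    for n in NUCS:  # four homozygous genotypes
        k = counts.get(n, 0)
        ll[(n, n)] = (k * math.log(1 - 3 * err / 4)
                      + (total - k) * math.log(err / 4))
    for a, b in combinations(NUCS, 2):  # six heterozygous genotypes
        k = counts.get(a, 0) + counts.get(b, 0)
        ll[(a, b)] = (k * math.log(0.5 - err / 4)
                      + (total - k) * math.log(err / 4))
    return ll


def call_genotype(counts, err=0.01):
    """Return the genotype with the highest log-likelihood."""
    ll = genotype_log_likelihoods(counts, err)
    return max(ll, key=ll.get)
```

Under these assumptions, a site with 20 reads of A is called homozygous, whereas a site with 10 A reads, 9 G reads, and 1 C read is called an A/G heterozygote, with the lone C read attributed to error. A production caller would also test whether the best genotype is significantly better than the runner-up before accepting it.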
Markers heterozygous in just one parent were mapped as a pseudo-testcross (Grattapaglia and Sederoff 1994) and markers heterozygous in both parents were mapped as an F2 family. Markers segregated in four different patterns. Type lm × ll (segregating 1:1) was heterozygous in the male parent and homozygous in the female parent; nn × np (1:1) was homozygous in the male and heterozygous in the female; hk × hk (1:2:1) was heterozygous in both parents with two shared alleles; and ef × eg (1:1:1:1) was heterozygous in both parents with two sex-specific alleles and one shared allele.
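The four segregation patterns follow directly from the parental genotypes. A minimal Python sketch of that classification is given here; the function name and the tuple representation of genotypes are assumptions made for illustration, and the type labels simply mirror those used in the text.

```python
def segregation_type(male, female):
    """Classify a marker from parental genotypes (each a 2-tuple of alleles).

    Returns (type label, expected progeny ratio), or None when the marker is
    uninformative or falls outside the four patterns described in the text.
    """
    male_het = male[0] != male[1]
    female_het = female[0] != female[1]
    if male_het and not female_het:
        return ("lmxll", "1:1")        # heterozygous in the male only
    if female_het and not male_het:
        return ("nnxnp", "1:1")        # heterozygous in the female only
    if male_het and female_het:
        shared = set(male) & set(female)
        if len(shared) == 2:
            return ("hkxhk", "1:2:1")  # both parents carry the same two alleles
        if len(shared) == 1:
            return ("efxeg", "1:1:1:1")  # one shared allele, two parent-specific
        return None                    # four distinct alleles: not covered here
    return None                        # both parents homozygous: no segregation
```

For example, `segregation_type(("A", "G"), ("A", "A"))` yields the male-informative type, which contributes only to the male map, whereas the two doubly heterozygous types tie the male and female maps together.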
Linkage analysis was performed for markers present in at least 85 of 94 individuals (50 out of 94 for protein-coding markers) with JoinMap 4.0 (Wageningen, The Netherlands). Markers were identified as paternal or maternal, which enabled the construction of male-specific and female-specific linkage maps. Linkage between markers, recombination rate, and map distances were calculated using the Kosambi mapping function and the maximum likelihood mapping algorithm in JoinMap. Markers were grouped with an initial LOD threshold of 14.0 but many unlinked markers and small linkage groups were added using the strong crosslink feature of JoinMap at a minimum LOD of 10.0. Markers with significant segregation distortion or unlinked at LOD <10 were excluded. The graphical genotypes feature of JoinMap identified doubtful double recombinants, which were reevaluated by inspection of stacked sequences and corrected as needed; for example, some genotypes that the software called homozygotes were clearly heterozygotes with minor allele reads below threshold. Corrected genotypes were loaded into JoinMap and linkage analysis was repeated until no suspicious genotypes were identified.
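The Kosambi mapping function converts a recombination fraction r into map distance as d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. The short Python sketch below implements this standard formula and its inverse; the function names are illustrative and do not correspond to anything in JoinMap.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # 0.25 Morgans * 100 cM/Morgan = 25
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(cm: float) -> float:
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2 * cm / 100.0)
```

For small r the function is nearly linear (r = 0.10 corresponds to about 10.1 cM), and it diverges as r approaches 0.5, reflecting the fact that unlinked markers carry no distance information.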
The consensus map was calculated using JoinMap’s population type CP (cross pollinator, or full-sib family), the Kosambi mapping function, and the regression mapping algorithm. Because JoinMap could not process >5500 markers, we selected all markers with comparative genomic information and an arbitrary set of the 8406 total linked markers to sum to 5466 markers. To reduce computational time, the largest linkage groups (1–8) were analyzed by excluding markers with identical genotypes, but smaller linkage groups were analyzed using all markers.
Total RNA was isolated from two F1 map cross progeny that were not used for genotyping—one entire stage-30 (7 days postfertilization, near hatching) gar embryo and the head tissue of a stage-33 (12 dpf, yolk nearly exhausted) gar larva (Long and Ballard 2001)—using the RiboPure Total RNA Isolation kit (Ambion). mRNA was isolated using the MicroPolyA Purist kit (Ambion). A total of 500 ng of mRNA was reverse transcribed with SuperScript III Reverse Transcriptase (Invitrogen) and random hexamer primers (Invitrogen). Second strand cDNA was synthesized with random primers and 15 units of Klenow DNA polymerase exo-minus (Epicentre). Double stranded cDNA was sheared in a Bioruptor (Diagenode) for 30 cycles (30 sec on, 60 sec off). Sheared DNA was end repaired with the End-It DNA repair kit (Epicentre) and dA overhangs were added with Klenow DNA polymerase exo-minus. Adapters were ligated overnight and 100 ng was PCR amplified for 12 cycles with Phusion DNA polymerase (New England Biolabs). Each sample was sequenced on a single lane of an Illumina GAIIX sequencer (see File S1 for further details).
The restriction enzyme SbfI cuts frequently in coding sequences; for example, the sequenced zebrafish genome has 26,948 SbfI recognition sites (giving 53,896 RAD tags), with 6010 (11%) of those occurring in protein-coding genes; for stickleback, the figure is even higher, at 16%. To improve identification of coding sequences located near each RAD tag, we constructed paired-end contigs by randomly shearing SbfI-digested DNAs to obtain fragments of different length, all of which had a RAD tag at one end, and subjected them to paired-end sequencing (see File S1 for further details). Sequence from the first end defined the RAD-tag marker, while sequence from the paired ends, which occurred at many different distances from each tag due to random shearing, provided a few hundred base pairs of contiguous sequence located a few hundred base pairs from each enzyme cutting site (Figure S2A) (see also Etter et al. 2011).
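Because a RAD tag extends in each direction from every cut site, the number of tags is twice the number of recognition sites. The following Python sketch counts SbfI sites in a sequence and reproduces the zebrafish arithmetic quoted above; the function names and the toy sequence used in testing are illustrative assumptions.

```python
# Recognition sequence given in the text; it is its own reverse complement,
# so scanning one strand finds every site.
SBFI_SITE = "CCTGCAGG"


def count_sites(seq: str, site: str = SBFI_SITE) -> int:
    """Count occurrences of a recognition site in a DNA sequence."""
    seq = seq.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def expected_rad_tags(n_sites: int) -> int:
    """Each cut site yields one tag on either side."""
    return 2 * n_sites


# Figures quoted in the text for zebrafish.
zebrafish_tags = expected_rad_tags(26948)            # 53,896 tags
coding_percent = round(100 * 6010 / zebrafish_tags)  # about 11%
```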
BLASTx searches of zebrafish, stickleback, and human genomes (Ensembl v56), using an e-value cutoff of 1e-5 allowed the association of gar paired-end sequences and RNA-seq contigs to highly similar annotated sequences in teleosts. The annotation of each coding sequence that aligned on the gar map was manually verified. We constructed Oxford grids (Edwards 1991) by lining up all 974 gar coding markers in their genomic order along each gar chromosome displayed in numerical order on the horizontal axis and then plotting the position of the human ortholog on the human karyotype displayed along the vertical axis.
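In tabular form, an Oxford grid amounts to a tally of orthologous marker pairs by chromosome in the two species. The Python sketch below builds such a tally; the ortholog pairs are invented placeholders, not data from this study, and the function name is an assumption.

```python
from collections import Counter


def oxford_grid(ortholog_pairs):
    """Tally orthologous marker pairs by (gar linkage group, human chromosome)."""
    return Counter((gar_lg, hsa_chr) for gar_lg, hsa_chr in ortholog_pairs)


# Hypothetical ortholog assignments, for illustration only.
example_pairs = [
    ("Loc4", "Hsa4"), ("Loc4", "Hsa4"), ("Loc4", "Hsa4"),
    ("Loc19", "Hsa19"), ("Loc19", "Hsa19"),
    ("Loc10", "Hsa17"), ("Loc10", "Hsa1"),
]
grid = oxford_grid(example_pairs)
```

Cells with high counts mark conserved syntenies; a chromosome whose orthologs fall into a single cell is largely intact between the two species.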
To quantify syntenic divergence, we selected a group of 588 zebrafish genes, 573 stickleback genes, and 486 human genes with mapped gar orthologs. We counted the number of chromosomes in human or gar that contained orthologs of genes on each teleost chromosome and the number of chromosomes in human that contained orthologs of genes on each gar chromosome, normalized to the number of chromosomes in each species (zebrafish, 25; stickleback, 21; gar, 29; and human, 23) and evolutionary divergence times (Figure 1) (Inoue et al. 2005), and plotted results on a distance tree. To avoid genome duplication effects, we compared genomes unidirectionally from teleost to gar and human.
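The counting step can be sketched as follows. The exact normalization is not given in the text, so the Python example below makes a simple assumption: the mean number of target chromosomes hit per source chromosome is divided by the total number of chromosomes in the target species. Normalization by divergence time is omitted, and the ortholog pairs are hypothetical.

```python
from collections import defaultdict


def mean_dispersion(ortholog_pairs, n_target_chromosomes):
    """Mean number of target chromosomes carrying orthologs of genes on each
    source chromosome, scaled by the target species' chromosome number.

    ortholog_pairs: iterable of (source chromosome, target chromosome).
    The scaling used here is an assumed, simplified normalization.
    """
    targets = defaultdict(set)
    for source_chr, target_chr in ortholog_pairs:
        targets[source_chr].add(target_chr)
    if not targets:
        raise ValueError("no ortholog pairs supplied")
    mean_count = sum(len(t) for t in targets.values()) / len(targets)
    return mean_count / n_target_chromosomes


# Hypothetical zebrafish-to-gar assignments; gar has 29 linkage groups.
example_pairs = [
    ("Dre3", "Loc10"), ("Dre3", "Loc10"),
    ("Dre12", "Loc10"), ("Dre12", "Loc2"),
]
score = mean_dispersion(example_pairs, n_target_chromosomes=29)
```

Larger values indicate that genes sharing a chromosome in the source species are scattered over more chromosomes in the target species, that is, greater syntenic divergence.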
Constructing the gar map
To test, on a genome-wide scale, whether the spotted gar lineage diverged from the teleost lineage before the TGD (Hoegg et al. 2004; Crow et al. 2006) and to challenge the controversial hypothesis that chromosome rearrangements accelerated after the TGD (Comai 2005; Semon and Wolfe 2007; Hufton et al. 2008), we netted from nature a male and a female spotted gar and mated them by in vitro fertilization. We isolated genomic DNA from parents and from 94 of their 2-week-old offspring to form our F1 mapping panel. Markers heterozygous in the female provided a female meiotic map and markers heterozygous in the male parent provided a male map, while markers segregating in both parents were mapped as an F2 family and allowed the construction of a combined map (Figure 2, A and B). Each parental library provided 5 million sequences to identify the universe of RAD tags present in the cross. In all, gar had ∼33,000 SbfI restriction cut sites, which provided ∼65,000 total tags because tags extend in each direction from each cut site. Each of the 94 F1 progeny provided ∼1 million sequences to ascertain their genotypes (Table S1).
To automate genotype calling, we utilized Stacks software (Catchen et al. 2011). In brief, Stacks assembled RAD tags into stacks of identical sequence (Figure 2C), compared stacks pairwise (Figure 2D), and merged stacks into loci, defined as stacks of average sequencing depth that differ by fewer than three nt (Figure 2E). Stacks distinguished sequencing errors from polymorphisms using a maximum likelihood framework (Hohenlohe et al. 2010), compared loci of offspring and parents, called genotypes, provided a web interface for interrogating sequences and genotypes, and exported genotypes into JoinMap mapping software (Van Ooijen 2006). Of ∼65,000 total tags, 15,076 were polymorphic, and of those, 8406 tags mapped to the male map, the female map, or both. Polymorphic markers that did not appear in the final maps failed due to sequence ambiguities that decreased the number of map cross progeny scored for those markers below criterion (85 individuals). Because the number of RAD-tag markers greatly exceeded the limits of JoinMap, we arbitrarily selected 4551 markers plus all protein-coding loci for the final map.
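The stack-building and merging steps can be illustrated with a toy example. The Python sketch below collapses identical reads into stacks and then merges stacks that differ by fewer than three nucleotides; it uses a naive all-against-all comparison with single-linkage merging and omits the sequencing-depth criterion, so it is an assumption-laden simplification rather than the algorithm implemented in Stacks.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_loci(reads, max_diff=2):
    """Group reads into putative loci.

    Identical reads form a stack; stacks differing by at most max_diff
    nucleotides (i.e., fewer than three) are merged into one locus.
    Returns a list of dicts mapping each allele sequence to its read depth.
    """
    stacks = Counter(reads)
    seqs = sorted(stacks)
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(seqs):
        for b in seqs[i + 1:]:
            if len(a) == len(b) and hamming(a, b) <= max_diff:
                parent[find(a)] = find(b)

    loci = {}
    for s in seqs:
        loci.setdefault(find(s), {})[s] = stacks[s]
    return list(loci.values())


# Toy reads: two alleles of one locus plus an unrelated locus.
toy_reads = ["ACGTACGT"] * 5 + ["ACGTACGA"] * 4 + ["TTTTCCCC"] * 6
toy_loci = build_loci(toy_reads)
```

In this toy case the two sequences that differ at a single position merge into one locus with two alleles, which is the configuration that yields a scorable polymorphic marker.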
JoinMap assigned 5466 markers to 29 linkage groups, similar in number to the 28 chromosomes of the closely related longnose gar (L. osseus) (Rab et al. 1999). Total map length was 1988 cM. Loc1, the longest L. oculatus linkage group, had 598 markers in 84 cM; Loc29, the shortest linkage group, had 44 markers in 62 cM; and Loc15 had 145 markers in 52 cM with 34 markers in protein-coding regions (Figure 2G). Figure S1 shows the complete map. A total of 656 markers were polymorphic in both the male and the female and hence were shared between the male and the female maps; these markers showed the same order and location in both maps, thus verifying map validity.
Coding sequences are required for comparative genomics, and although some of the mapped RAD-tag sequences represented coding regions [121 of 8406 RAD tags (1.4%) when compared to human], we utilized two strategies to increase the number of mapped coding sequences (Figure S2).
First, we randomly sheared SbfI-digested DNAs to obtain fragments of different length with a RAD tag at one end and subjected them to paired-end sequencing. Sequencing from the first end defined the RAD-tag marker and hence genomic location, while sequencing from the paired end, which occurred at many different distances from each tag due to random shearing, provided a few hundred base pairs of sequence located a few hundred base pairs from each enzyme cut site (Figure S2A). This procedure greatly increased the length of sequence associated with each RAD tag and increased the chance of finding sequence in coding regions.
The second step to increase mapped coding genes was to conduct a small gar transcriptomics project by RNA-seq (Pan et al. 2008). Our gar reference transcriptome comes from mRNA isolated from one mature embryo and from the head of another individual larva a few days older, both from the F1 map cross population and thus they share polymorphisms with mapped coding RAD tags. For each sample, one lane of sequencing gave 62.5 million raw paired-end reads 60 nt long with an insert length of 350 bp. Quality filtering and trimming left 38.5 million paired-end reads and 6.7 million single-end reads. Velvet (Zerbino and Birney 2008) assembled 18.3 million of these reads into the final transcriptome assembly using a k-mer coverage estimate of 38x (based on a comparison to the known zebrafish transcriptome) and with a coverage cutoff of four. Figure S3 plots the number of contigs vs. contig length for this transcriptome. BLASTn searches against the gar reference transcriptome, using as query mapped RAD tags and their several hundred base pairs of paired-end contigs, identified an additional 691 mapped markers with near perfect identity to EST contigs. A BLASTn search of all 65,000 RAD tags, whether polymorphic or not, against our gar reference transcriptome identified 9086 (14%) that hit an EST, according to the criteria that the alignment spanned at least 70% of the query, hit the EST contig with an e-value of 1e-20 or less, and had a top BLAST hit with a raw score that was at least an order of magnitude better than the second best hit; these criteria accommodate exon/intron boundaries and allow for a few mismatches due to sequencing error or polymorphism.
Of RAD tags that aligned to the transcriptome, 1327 (14.6%) contained polymorphisms, and of those, 945 were placed on the final map; the rest (382) were excluded because they were associated with RAD tags that were not scored in the minimum number of progeny that we had arbitrarily set as necessary to appear on the map.
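The three criteria used to decide whether a RAD tag hit an EST contig can be expressed as a simple filter. The Python sketch below assumes each BLAST hit is represented as a dictionary with hypothetical keys `align_len`, `evalue`, and `score`, and reads "an order of magnitude better" literally as a tenfold difference in raw score; both choices are assumptions made for illustration.

```python
def passes_est_criteria(hits, query_len, min_cov=0.70,
                        max_evalue=1e-20, score_ratio=10.0):
    """Apply the three hit criteria to the BLAST hits of one query.

    hits: list of dicts with keys 'align_len', 'evalue', 'score'
          (a hypothetical record layout, not a BLAST output format).
    query_len: length of the query sequence in nucleotides.
    """
    if not hits:
        return False
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    top = ranked[0]
    if top["align_len"] / query_len < min_cov:
        return False  # alignment spans too little of the query
    if top["evalue"] > max_evalue:
        return False  # hit not significant enough
    if len(ranked) > 1 and top["score"] < score_ratio * ranked[1]["score"]:
        return False  # top hit not clearly better than the runner-up
    return True


# Counts reported in the text.
aligned_tags = 9086
polymorphic_aligned = 1327
mapped = 945
excluded = polymorphic_aligned - mapped                               # 382
polymorphic_percent = round(100 * polymorphic_aligned / aligned_tags, 1)  # 14.6
```

Requiring a clear gap between the best and second-best hits guards against assigning a tag to the wrong member of a gene family.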
BLASTx searches connected mapped markers, paired-end sequences, and ESTs to annotated genes in the sequenced genomes of human (Homo sapiens), stickleback (Gasterosteus aculeatus), and zebrafish (Danio rerio). We chose stickleback because its genome is particularly well assembled and zebrafish because it represents a lineage that diverged basal to other sequenced teleosts. Manual curation associated 891 and 900 mapped markers to zebrafish and stickleback orthologs (or co-orthologs) and 777 markers to human orthologs; in total, the analysis provided 974 gar markers with a putative ortholog in at least one of the three other genomes (Figure 2F).
The assignment of mapped markers to coding genes provided an additional test of map validity. A total of 58 coding genes contained more than one map marker each. Importantly, in each case all markers within the same gene mapped to the same bin or neighboring bins, as expected if the map were accurate.
With nearly 1000 coding markers distributed over the genome, we could perform comparative genomic analyses. We constructed Oxford grids (Edwards 1991) that lined up all 974 gar markers in their genomic order along the horizontal axis and plotted the position of each human ortholog on the human genome displayed along the vertical axis (Figure 3A). Results showed, first, that each segment of a human chromosome tends to be represented on just one gar chromosome, rather than on two chromosomes as in teleosts (Postlethwait et al. 1998; Woods et al. 2000; Jaillon et al. 2004). For example, the short arm of H. sapiens chromosome 5 (Hsa5p) is orthologous to the upper (right) part of Loc6, while the proximal part of the long arm of human chromosome 5 (Hsa5q) is orthologous to the proximal (left) part of Loc2, and the distal tip of Hsa5q is orthologous to the distal tip of Loc9 (Figure 3, A and B). Likewise, all orthologs of Hsa4 genes were found on Loc4 (Figure 3A). Although inversions in the gar or human lineages or both have rearranged regions orthologous to Hsa17, analysis shows that large regions are represented on a single gar chromosome (Figure 3C). Reciprocally, the upper (right) portion of Loc10 is orthologous only to Hsa17 and the lower part to Hsa1 or Hsa19 (Figure 3A). Likewise, the human orthologs of mapped Loc19 genes are located only on Hsa19q (Figure 3A). In contrast, each portion of a human chromosome tends to fall on two teleost chromosomes (Amores et al. 1998; Taylor et al. 2003; Jaillon et al. 2004; Kasahara et al. 2007).
Second, as with human chromosomes, each portion of a gar chromosome is in general orthologous to parts of two different teleost chromosomes. For example, gar genes that mapped to the left part of Loc10 have orthologs distributed broadly over two stickleback chromosomes, LGVIII and LGIII, while genes located on the right portion of Loc10 have stickleback orthologs distributed along LGXI and LGV+LGIX (Figure 4A). In addition, the Loc10 set of Hsa17 orthologs occurs on two zebrafish linkage groups that were previously shown to be paralogous, Dre3 and Dre12 (Postlethwait et al. 1998) (Figure 4B). This includes the gar orthologs of human genes GRIN2C and SDK2, each of which has one co-ortholog on zebrafish Dre3 and the other co-ortholog on Dre12 (Figure 4, B and C). Likewise, Loc19 is co-orthologous to zebrafish chromosomes Dre2 and Dre22 (Figure 4). This evidence shows on a genome-wide scale that gar and teleost lineages diverged before the TGD.
Teleost genome rearrangements
Data concerning the order of coding markers on the gar genetic map provide an opportunity to distinguish competing hypotheses for the explanation of the origin of teleost genome rearrangements. Under one hypothesis, chromosome rearrangements accelerated in the teleost lineage after the TGD (Comai 2005; Semon and Wolfe 2007); alternatively, most rearrangements had already occurred in the human or ray-fin lineage or both before the divergence of gar and teleost lineages and are thus mostly independent of genome duplication (Hufton et al. 2008). The first hypothesis predicts that gar and human genomes would be about equally rearranged with respect to teleost genomes. In contrast, the second hypothesis predicts that the architecture of teleost genomes would be more similar to the gar genome than to the human genome. We quantified syntenic divergence and normalized to evolutionary divergence times. Results showed that human and gar genomes clustered together separated from the two teleost genomes by a long branch (Figure 5). This finding shows that syntenic rearrangements accelerated after the divergence of gar and teleost lineages but before the divergence of stickleback and zebrafish lineages; this result is predicted by the hypothesis that the TGD facilitated the fixation of chromosome translocations in the teleost lineage.
We present here a strategy that provides to nonmodel species a rapid and economical method for constructing dense, coding-rich meiotic maps from the offspring of individual wild-caught parents. We used this approach to rapidly produce the largest published meiotic map for any fish, to our knowledge, containing >8000 total markers and nearly 1000 protein-coding markers. Importantly, this map was achieved for orders of magnitude less expense than other genome projects. The new map showed—on a genome-wide scale—that gar and teleost lineages diverged before the teleost genome duplication. Phylogenetic analyses suggest that sturgeons (Acipenser sp.), gar (Lepisosteus sp.), and bowfin (Amia calva) occupy a clade of ancient ray-fin fish that diverged from the teleost lineage after the divergence of the bichir (Polypterus sp.) lineage (Hoegg et al. 2004; Inoue et al. 2005; Crow et al. 2006; Katsu et al. 2008; Salaneck et al. 2008). Among species occupying this pre-TGD clade, spotted gar appears to be the most suitable for studies of development, genomics, and physiology. Spotted gar are locally plentiful in North America from Louisiana to Ontario, are amenable to in vitro fertilizations in the laboratory (in contrast to bowfin), do not have multiple rounds of genome duplication as do many species of sturgeons and paddlefish (Birstein and Desalle 1998), and have relatively small genomes compared to species such as bichir, a basally diverging ray-fin fish (Hardie and Hebert 2004). Furthermore, gar can be raised to adulthood in the laboratory, they produce thousands of eggs in a single spawn, their embryos are suitable for in situ hybridization analyses of gene expression, and their large size provides substantial material for biochemical and physiological analyses.
We conclude that spotted gar is the species of choice to serve as an experimentally accessible outgroup to teleosts to help infer ancestral, preduplicated functions of genes duplicated in teleosts. For example, if one member of a pair of teleost gene duplicates has a function that is not found in humans and other tetrapods, such as melanocyte adaptation to background coloration (Braasch and Postlethwait 2011; Zhang et al. 2010a), it is unclear whether that function was ancestral in bony fish and was lost in the human lineage or was lacking in the common ancestor of all bony fish and was newly evolved in the teleost lineage, perhaps as a neofunctionalization event (Force et al. 1999) after the TGD. Investigation of function in gar embryos could help resolve these types of questions.
Comparative genomic analyses conducted with the gar map revealed substantial conservation of synteny between human and gar. This conservation is consistent with the model that the TGD accelerated the loss of ancestral syntenies, and thus the data fail to rule out the hypothesis that whole-genome duplication plays a role in promoting syntenic rearrangements. Several theoretical considerations suggest possible mechanisms by which genome duplication could accelerate chromosomal rearrangements. The duplication of homologous coding elements would provide more substrates for illegitimate recombination between homeologous (paralogous) chromosomes and thereby stimulate the chromosome translocations that disturb conserved syntenies (Comai 2005). In addition, natural selection would favor rearranged karyotypes that reduce meiotic pairing between homeologous chromosomes because reduced pairing of homeologs would decrease the rate of aneuploid offspring, thereby improving fitness and thus tending to preserve these rearrangements during evolution (Comai 2005).
The mapping strategy we develop here is potentially broadly applicable to nonmodel species, capitalizing on the power of massively parallel sequencing to reveal genetic polymorphisms and to accelerate transcriptome analysis. For this RAD-tag strategy to work well, (1) a clutch of 20 or preferably more individuals should be available from a single female taken from nature to make a female map or from several females fertilized by the same male to make a male map (see Figure S4), (2) the species should show sufficient heterozygosity to provide polymorphisms in RAD-tag sequences (note that increasing reads from 80 to 100 or 150 nt would increase the rate of capturing polymorphisms), (3) its genome should provide appropriate numbers of RAD tags (using a restriction enzyme that recognizes a 6-bp site would increase the number of tags, which would be useful for a small genome, and using an enzyme that recognizes a 10-bp sequence would decrease the number of tags, which would be useful for a large genome or for situations in which fewer markers were required to answer questions), and (4) sufficient material should be available for RNA-seq to help identify coding sequences contained in RAD tags. While species like kiwis (Apteryx australis) and elephants that produce a single offspring each season would not be so favorable for this approach, it should work for thousands of other species of interest for physiology and evolution. RAD-tag meiotic mapping provides an inexpensive and rapid way for individual research laboratories to query genomes at nodes in the tree of life that interest them and to map naturally occurring genetic variants.
Although the first linkage maps appeared nearly a century ago (Sturtevant 1913), meiotic maps remain a critical tool for understanding the mechanisms of development, physiology, and evolution. In fact, in the age of whole-genome sequencing, dense, sequence-based genetic maps assume an even more important role to efficiently order the often thousands of unassembled genomic contigs resulting from genome sequencing projects (Lewin et al. 2009). For many species, meiotic maps using our strategy would be more rapid and far less costly than the construction of tiled, fingerprinted BACs and do not require the specialized skill sets and materials needed for the development of radiation hybrid panels. The density of our meiotic map—an average of six polymorphic markers per megabase, or nearly one per BAC clone—would help assemble the thousands of contigs typically produced in today’s genome sequencing projects. RAD-tag mapping, coupled with RNA-seq and low-coverage whole-genome sequencing by next-generation technologies, can liberate nonmodel organisms from the prison of genomic ignorance.
We thank W. Cresko, E. Johnson, I. Braasch, J. Smith, S. Bassham, and T. Titus for helpful conversations. This work was supported by National Institutes of Health (NIH)/National Center for Research Resources grant R01RR020833 to J.H.P., a Golden Ranch Plantation grant to A.F. and Q.F., an NIH National Research Service Award Ruth L. Kirschstein postdoctoral fellowship 1F32GM095213-01 to J.C., and National Science Foundation grant DEB-1025212 to W. Cresko.
Available freely online through the author-supported open access option.
Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.111.127324/-/DC1.
Data in public repositories: SRA026509.1
IRB (IACUC) number: 08-13RA
- Received January 29, 2011.
- Accepted May 16, 2011.
- Copyright © 2011 by the Genetics Society of America