The sequencing of the 12 genomes of members of the genus Drosophila was taken as an opportunity to reevaluate the genetic and physical maps for 11 of the species, in part to aid in the mapping of assembled scaffolds. Here, we present an overview of the importance of cytogenetic maps to Drosophila biology and to the concepts of chromosomal evolution. Physical and genetic markers were used to anchor the genome assembly scaffolds to the polytene chromosomal maps for each species. In addition, a computational approach was used to anchor smaller scaffolds on the basis of the analysis of syntenic blocks. We present the chromosomal map data from each of the 11 sequenced non-Drosophila melanogaster species as a series of sections. Each section reviews the history of the polytene chromosome maps for each species, presents the new polytene chromosome maps, and anchors the genomic scaffolds to the cytological maps using genetic and physical markers. The mapping data agree with Muller's idea that the majority of Drosophila genes are syntenic. Despite the conservation of genes within homologous chromosome arms across species, the karyotypes of these species have changed through the fusion of chromosomal arms followed by subsequent rearrangement events.
One of the primary strengths of the genus Drosophila as a model system has been the relative ease of generating detailed cytogenetic maps. Indeed, the first definitive mapping of genes to chromosomes was performed in Drosophila melanogaster (Bridges 1916). The subsequent discovery of polytene chromosomes in the salivary glands in this same species (Painter 1934) and their codification into fine-structure genetic/cytogenetic maps represents perhaps one of the first forays into “genomics.” Polytene maps (Bridges 1935; Lefevre 1976) provided an important genetic tool for mapping genes, for detecting genetic diversity within populations, and for inferring phylogenies among related species (Dobzhansky and Sturtevant 1938; Judd et al. 1972; Ashburner and Lemeunier 1976; Lemeunier and Ashburner 1976). Sturtevant and Tan (1937) laid the groundwork for comparative genomics when they established that genes within the chromosomal arms are conserved or syntenic among species. In an insightful melding of the gene mapping and evolutionary studies, H. J. Muller (1940) proposed that the genomes of Drosophila species were subdivided into a set of homologous elements represented by chromosome arms. What Muller (1940) noted, which was subsequently elaborated on by Sturtevant and Novitski (1941), was that the presumed homologs of identified mutant alleles within a chromosome arm of D. melanogaster were also confined to a single arm in other species within the genus where mapping data were available. Using D. melanogaster as a reference, Muller proposed that each of the five major chromosome arms plus the dot chromosome be given a letter designation (A–F) and that this nomenclature be used to identify equivalent linkage groups within the genus.
The ancestral organization of the Muller elements found in the subgenus Drosophila is six acrocentric rods (Powell 1997), but a variety of rearrangement events have altered the organization of the Muller elements within the Sophophora subgenus (Figure 1). There is a pericentric inversion in D. ananassae on the X or A element that converts the normally acrocentric X into a metacentric chromosome. A further remarkable karyotypic change can also be seen on Muller element F of D. ananassae. The F element that is normally a small, dot-like chromosome is a large metacentric that is equivalent in size to the X or A element. The Sophophora species have also accumulated a variety of chromosomal fusions (Figure 1). The large autosomes of D. melanogaster are products of centromeric fusions between the B and C elements (symbolized as Muller B·C or chromosome 2L·2R) and D·E elements (3L·3R). The autosomes of D. willistoni also resulted from fusions of autosomal Muller elements (B·C and F·E; Figure 1) (Papaceit and Juan 1998). The metacentric X of D. willistoni, D. pseudoobscura, and D. persimilis was generated via a fusion of A·D elements (the X and autosomal 3L arms of D. melanogaster) that changed the transmission mode of an autosomal element to sex-linked inheritance. The absence of the fusion between A and D in D. subobscura, a close relative of D. pseudoobscura and D. persimilis, suggests that the obscura and willistoni group fusions are not homologous.
There is not, however, a perfect one-to-one correspondence between the Muller element and chromosome arm among all Drosophila species. Elements have rearranged following chromosome fusions. The specifics of these rearrangements are shown in Figure 1 and some of the highlights are presented here. In D. erecta and D. yakuba, there is a shared pericentric inversion at the base of the B·C element (2L·2R in D. melanogaster). In D. pseudoobscura and D. persimilis, an apparent pericentric inversion has moved genes from the Muller A to the Muller D element (Segarra et al. 1995). Comparative genomic analysis of the sequences involved in these rearrangements may provide valuable clues about the mechanisms driving the reorganization of these genomes.
The wide-ranging conservation of gene content within Muller elements is useful for assigning and organizing genomic scaffolds to specific chromosomal arms, while hybridization of DNA markers to polytene chromosomes is necessary to confirm the placement of assembled sequence on the chromosomes. While the polytene maps provide a valuable tool for orienting large scaffolds (>1 Mb), genome assemblies may yield many small scaffolds that are difficult to map in this manner. Thus, computational methods that take advantage of conservation of gene order among species have been developed to aid in the mapping of smaller scaffolds (Bhutkar et al. 2006). In contrast to the strong syntenic conservation in Drosophila, the order of genes along chromosome arms is poorly conserved due to the accumulation of inversions that shuffle gene order (Segarra and Aguadé 1992; Ranz et al. 2001). This shuffling of gene order provides strong inference that two scaffolds can also be joined when the genes at the terminal ends are adjacent in a conserved block of genes in the genomes of other species (Bhutkar et al. 2008).
The cytogenetic maps of the 11 Drosophila species provide a useful medium for presenting the mapped genome scaffolds using a web-based format such as GBrowse available through FlyBase Consortium (1999). The available cytogenetic maps of the 11 species, however, vary in quality and nomenclature. The D. pseudoobscura and D. persimilis chromosomes depicted in photographic maps contain several bends, making them less than ideal for web-based presentation of the data (Kastritsis and Crumpacker 1966; Moore and Taylor 1986). Two nomenclature problems exist in how the cytogenetic maps were divided into sections and subsections. The gold standard used in D. melanogaster divides the cytogenetic map into numbered sections and each section was subdivided into lettered subsections (Bridges 1935). In the non-D. melanogaster species, chromosomal maps have been divided into major sections, but not all sections have been divided into subsections. In all but one species, D. mojavensis (Wasserman 1992), numbered sections and lettered subsections were used. Lack of concordance among the cytogenetic maps indicated a need for revision of the maps and standardization of nomenclature where possible. The completion of genomic sequencing and whole-genome assembly of 11 Drosophila species provides an opportunity to update, organize, and synthesize the genetic and physical maps of each species and to standardize the quality, presentation, and nomenclature of cytogenetic maps from each. Assembled scaffolds were aligned to the polytene chromosomes of the newly sequenced species by evaluation of existing and newly obtained map data as well as computational comparisons among species, especially in relation to the organization of genes in the genome of D. melanogaster.
Mapping the assembly scaffolds in precise order and orientation may not seem important to Drosophilists interested in the molecular biology of their favorite gene. However, well-supported maps of assembly scaffolds will allow one to address questions about genome rearrangement, a virtual black box in evolutionary biology. Many different genome-level questions are especially suitable for investigation using species in the genus Drosophila.
First, what is the molecular basis of differences among species, including their inability to produce viable and fertile progeny? Genome sequences of closely related species pairs, such as D. pseudoobscura and D. persimilis, and D. simulans and D. sechellia, will facilitate comprehensive genomic analyses of functional differences between genomes, whereas each of the other species represents a reference for generating genomewide mapping resources for studies within each group.
Second, what is the mechanistic basis for the origin of distinct features of the sex chromosomes and how are selfish versions such as the sex ratio variety of D. pseudoobscura and D. persimilis born? The X chromosome in the obscura and willistoni groups has independently acquired an entirely new arm. These genome sequences provide a substrate for determining changes associated with the transformation from autosome into X chromosome and for defining unique DNA signatures of this sex chromosome (Gallach et al. 2007; Sturgill et al. 2007). A close relative of D. virilis has similarly acquired a new arm of the X through a centromere fusion, and comparative genomic studies of this derived X chromosome in D. americana will provide insight into unique forms of selection that contribute to the initial divergence between sex chromosomes (McAllister and Evans 2006; Evans et al. 2007).
Third, how do new inversions originate? A dichotomy currently exists between inversions that appear to have arisen through ectopic exchange at repetitive sequences and those lacking any evidence of shared repeats at the breakpoints (Ranz et al. 2007). The difference may indicate distinct pathways of origin, or alternatively, the mechanism of origin may be obscured by time. Analyses of the structure at the breakpoints of inversions relative to their age will distinguish between immediate causal mechanisms of origin and subsequent changes within these regions of the genome.
Fourth, what is the molecular basis of gene arrangement polymorphism? The species selected for sequencing and their close relatives exhibit a diverse array of chromosomal rearrangements consisting of both paracentric and pericentric inversions in addition to centromeric fusions between chromosomal elements (Hsu 1952; Stone et al. 1960; Vieira et al. 1997a). These naturally occurring rearrangements are excellent substrates to determine the factors that influence change in chromosomal structure (Vieira et al. 2001; McAllister 2002, 2003). Using the positions of known chromosomal rearrangements and comparative map data, these genome sequences provide a guide for inferring the positions of genes relative to other genome arrangements within these or closely related species (Vieira et al. 2006; Evans et al. 2007).
Fifth, using D. subobscura and D. pseudoobscura as models, what is the basis for gene arrangement polymorphism on some chromosome arms and not others? These questions are now more tractable, given the complete genome sequence available for two members of the obscura species group. These species still have their limitations in that few balancer and mutant strains exist. In addition, transformation systems to introduce mutations and shuttle genes into different strains are limited, but are likely to be developed in the future.
Here, we present the scaffold maps for 11 species of Drosophila. The data in this article are presented in a series of sections. The materials and methods section describes the bioinformatic methods that were used to aid in the scaffold joining. The results are divided into a section based on the computational analysis and a section for each species that presents (1) the history of the polytene maps; (2) the mapping data used to anchor the scaffolds; (3) the new polytene chromosomal maps; and (4) the problems that were discovered in the assembled sequences. The discussion presents new insights that emerged from the mapping of genome scaffolds as well as potential pitfalls of the whole-genome shotgun approach.
MATERIALS AND METHODS
Strains, polytene map preparation, and genome sequence:
The genome strains for the Drosophila species sequenced for the Comparative Analysis Freeze 1 (CAF1) are described at the Drosophila Species Stock Center website (see http://www.flybase.org for the current link). Details on the preparation of polytene maps and the strategy for anchoring scaffolds to the polytene chromosomes can be found in the supplemental Materials and Methods section. The CAF1 assembly scaffolds were mapped to the polytene chromosomes.
Computational support for anchoring scaffolds:
Syntenic analysis of genome assembly data served as the basis of computational predictions for anchoring scaffolds to specific chromosome arms. CAF1-assembled sequences (http://rana.lbl.gov/drosophila/caf1.html) of all 11 genomes were analyzed using Synpipe (Bhutkar et al. 2006), a computational tool for gene homology and synteny inference. Synpipe uses an annotated peptide set from a reference species, in this case the D. melanogaster Release 4.3 annotation (FlyBase Consortium 1999), and infers an initial set of homology assignments in a candidate genome assembly. A tBLASTn-based approach (Altschul et al. 1997) was utilized for this. Following this, synteny chains consisting of neighboring genes (in the same order as in the reference species) are inferred. Initial homology assignments are then refined with the objective of maximizing the size of syntenic chains in the presence of paralogs. Merging adjacent synteny chains to allow for localized gene order scrambling (presumably due to micro-inversions on the chromosome) results in expanded synteny chains. This is based on various user-defined thresholds for allowable scrambling, with a default of up to 10 genes whose order can be locally scrambled. This approach accommodates contig and scaffold gaps in the candidate assembly by tagging homologs that might lie in unsequenced assembly gaps, on the edges of scaffolds or contigs, or on small assembly fragments. It also provides adjacent gene-pair information between species, which has been shown to be concordant with phylogenetic relationships (Bhutkar et al. 2007a). The output of this algorithm, including gene-pair information, provides a data set that can be used for comparative analysis of syntenic blocks and boundaries to infer chromosomal rearrangements between species (Bhutkar et al. 2008), to refine multi-species alignments and orthologous gene calls, to identify probable assembly errors, and to anchor scaffolds along chromosome arms in the proper place and orientation.
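The chain-building and chain-merging steps described above can be illustrated with a minimal Python sketch. This is not Synpipe's code. The function names, the representation of each gene by its integer rank in the reference gene order, and the reading of the scrambling threshold as a maximum distance in reference rank between adjacent chains are all assumptions made for illustration; homology inference and paralog refinement are not modeled.

```python
def build_chains(ranks):
    """Split reference ranks (listed in candidate-scaffold order) into chains.

    A chain is a run of genes that are immediate neighbours in the reference
    (rank step of +1 or -1) and that keep a single direction.
    """
    chains = []
    for rank in ranks:
        if chains:
            last = chains[-1]
            step = rank - last[-1]
            same_direction = len(last) == 1 or step == last[-1] - last[-2]
            if abs(step) == 1 and same_direction:
                last.append(rank)
                continue
        chains.append([rank])
    return chains


def merge_chains(chains, max_scramble=10):
    """Merge adjacent chains separated by at most max_scramble reference ranks.

    max_scramble=10 mirrors the default quoted in the text; treating it as a
    rank distance is an assumption of this sketch.
    """
    merged = [list(chains[0])] if chains else []
    for chain in chains[1:]:
        prev = merged[-1]
        # Distance between the two rank ranges (negative if they overlap).
        gap = max(min(chain) - max(prev), min(prev) - max(chain))
        if gap <= max_scramble:
            prev.extend(chain)
        else:
            merged.append(list(chain))
    return merged


ranks = [1, 2, 3, 5, 4, 6, 7, 40, 41, 42]
chains = build_chains(ranks)
expanded = merge_chains(chains)
```

In this toy input, the locally inverted pair (5, 4) breaks the first chain, and merging restores one expanded chain for ranks 1 to 7 while the distant block 40 to 42 stays separate.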
As has been noted, in the genus Drosophila orthologous genes are localized on the same Muller elements across species (Sturtevant and Tan 1937; Sturtevant and Novitski 1941; Richards et al. 2005), with the exception of a relatively few cases of gene movement across arms (Gonzalez et al. 2004; Schulze et al. 2006; Bhutkar et al. 2007b). The assignment of homologs across scaffolds then lends itself to mapping scaffolds to specific Muller elements on the basis of the location of the majority of genes within the genome of D. melanogaster. Once scaffolds have been assigned to Muller elements in this manner, the order and orientation of various assembly scaffolds along chromosome arms can be predicted using orthologous markers placed on the edges of scaffolds in the context of synteny relationships across species. To predict the adjacency of two scaffolds (a “scaffold join”), the following cases were analyzed: (a) conserved synteny, (b) conserved synteny with assembly gaps, (c) rearrangement supported by adjacent species, (d) rearrangement supported by an inferred ancestral arrangement, and (e) species-specific rearrangement with trace-back. A visual representation of these five inferences is shown in Figure 2.
If one of these cases supported the adjacency requirement for two scaffolds assigned to the same Muller element, a “scaffold join” was predicted. On the basis of the edges that were joined, the scaffolds were mutually oriented in a superscaffold with respect to their assembly orientation. Superscaffold edges were further considered for potential joins with other scaffolds or superscaffolds. This recursive procedure continues until no further predictions can be made on the basis of syntenic block data. Various levels of confidence can be assigned to these predictions on the basis of the level of evidence. For example, joins made in cases where straightforward synteny via sequential gene order was used are assigned a higher confidence than joins made using species-specific breakpoints. At the completion of this process, scaffolds are bucketed either into blocks of superscaffolds, ideally one per chromosome arm with orientation information, or into a bucket of unconnected scaffolds that have no evidence to support their joins.
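The iterative joining procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the method's implementation: each scaffold is reduced to the orthologs at its two edges, all five evidence cases are collapsed into a single precomputed set of supported gene adjacencies (`adjacent_pairs`), and confidence levels are not modeled. All names are hypothetical.

```python
from itertools import combinations


def _flip(superscaffold):
    """Reverse a superscaffold and invert the orientation of each member."""
    return [(name, "-" if strand == "+" else "+")
            for name, strand in reversed(superscaffold)]


def _edges(superscaffold, scaffolds):
    """Return the (left, right) edge genes of an oriented superscaffold."""
    first, first_strand = superscaffold[0]
    last, last_strand = superscaffold[-1]
    left = scaffolds[first][0 if first_strand == "+" else 1]
    right = scaffolds[last][1 if last_strand == "+" else 0]
    return left, right


def _find_join(supers, scaffolds, adjacent_pairs):
    """Find one pair of superscaffolds whose facing edges are supported."""
    for i, j in combinations(range(len(supers)), 2):
        for a in (supers[i], _flip(supers[i])):
            for b in (supers[j], _flip(supers[j])):
                pair = frozenset((_edges(a, scaffolds)[1], _edges(b, scaffolds)[0]))
                if pair in adjacent_pairs:
                    return i, j, a + b
    return None


def join_scaffolds(scaffolds, adjacent_pairs):
    """Join scaffolds until no supported adjacency remains.

    scaffolds: dict of name -> (left_edge_gene, right_edge_gene)
    adjacent_pairs: set of frozenset({gene_a, gene_b}) supported as neighbours
    Returns a list of superscaffolds, each a list of (name, orientation).
    """
    supers = [[(name, "+")] for name in scaffolds]
    while True:
        hit = _find_join(supers, scaffolds, adjacent_pairs)
        if hit is None:
            return supers
        i, j, merged = hit
        supers[i] = merged
        del supers[j]


scaffolds = {"s1": ("g1", "g5"), "s2": ("g9", "g6"), "s3": ("g20", "g25")}
result = join_scaffolds(scaffolds, {frozenset(("g5", "g6"))})
```

Here `s2` is flipped so that its `g6` edge faces the `g5` edge of `s1`, and `s3` remains in the set of unconnected scaffolds because no evidence supports a join.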
Assignment of scaffolds to Muller elements and placement in ordered and oriented superscaffold blocks were inferred for each of the 11 species. Table 1 lists the number and lengths of scaffolds assigned to each Muller element of each species. Table 1 also shows this information for scaffolds whose location and orientation were verified with physical and genetic mapping data. Syntenic block information was also used to assign assembly scaffolds to heterochromatic regions of the genome based on majority hits from D. melanogaster heterochromatic genes. Table 2 summarizes the scaffold assignment data for euchromatic, heterochromatic, and unmapped scaffolds. An important observation is that the amount of DNA and genes in the scaffolds assigned to a Muller element is similar among the 11 species (109.7–153.1 Mb), but the unassigned fraction is more variable among the species (14.7–31.7). The unassigned scaffolds are expected to be heterochromatic and 90% of these scaffolds are <100 kb in length (see supplemental text). The results of inferring scaffold joins to form superscaffolds (scaffold names, GenBank identifiers, scaffold order, and orientation information) for each species are included as supplemental Tables 2–14. The summary of superscaffold order for the 11 Drosophila species based on the computational- and marker-based analyses is shown in supplemental Table 15. Figure 2 shows an example of scaffold order inferred along one chromosome arm, the kind of evidence used to predict the order and orientation, and how these computational predictions complemented the experimental analysis process.
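The majority rule used to assign a scaffold to a Muller element can be sketched in a few lines of Python. The function name and the returned support fraction are assumptions for illustration. The arm-to-element table follows the D. melanogaster correspondence (X, A; 2L, B; 2R, C; 3L, D; 3R, E; 4, F).

```python
from collections import Counter

# D. melanogaster chromosome arm -> Muller element.
ARM_TO_MULLER = {"X": "A", "2L": "B", "2R": "C", "3L": "D", "3R": "E", "4": "F"}


def assign_muller_element(ortholog_arms):
    """Assign a scaffold to the element that holds most of its orthologs.

    ortholog_arms: D. melanogaster arm of each ortholog found on the scaffold.
    Returns (element, fraction of orthologs supporting it), or (None, 0.0)
    when no ortholog maps to a known arm. Ties go to the element seen first.
    """
    counts = Counter(ARM_TO_MULLER[arm] for arm in ortholog_arms
                     if arm in ARM_TO_MULLER)
    if not counts:
        return None, 0.0
    element, n = counts.most_common(1)[0]
    return element, n / sum(counts.values())


element, support = assign_muller_element(["2L"] * 8 + ["3R"] * 2)
```

Reporting the support fraction alongside the call makes it easy to flag scaffolds with a weak majority for closer inspection.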
Inferring the order and orientation of scaffolds allowed the creation of cross-species synteny maps for full-length Muller element comparisons. These maps are based on the annotated protein set of a reference species. One set of synteny relationships was derived using Release 4.3 of the well-annotated D. melanogaster peptide set (FlyBase Consortium 1999). The prediction of scaffold order information and consensus annotation sets (Drosophila 12 Genomes Consortium 2007) allowed the use of other species as reference sets, now that their Muller element-wide gene order could be predicted. Orthology and syntenic block information was derived using Synpipe while allowing for assembly gaps and localized scrambling. Localized gene-order scrambling, or microsynteny, was permitted within specified thresholds. Gene duplication was not addressed in this analysis and the best syntenic placement was specified as the ortholog for a given gene from the reference species. In accordance with the phylogeny, synteny maps using either D. melanogaster or D. virilis as reference sets show longer conserved blocks in more closely related species than in more distantly related species (Bhutkar et al. 2008).
D. melanogaster subgroup species maps
The sequenced species most closely related to D. melanogaster (D. simulans, D. sechellia, D. erecta, and D. yakuba) belong to the melanogaster group. All five are nearly identical morphologically and have karyotypes similar to D. melanogaster, consisting of an acrocentric rod X chromosome, two large metacentric autosomes (2L·2R and 3L·3R), and a small dot fourth chromosome. Males have a “J-shaped,” entirely heterochromatic Y chromosome. In polytene chromosome preparations, there are five long banded euchromatic elements corresponding to the X and each of the major autosomal arms. The centromeric portion of each chromosome is embedded in the chromocenter, which contains the pericentric heterochromatin. Relative to the banded euchromatin, this heterochromatic compartment of the genome is underreplicated. The fourth chromosome can be seen as a small, banded nub usually associated with the chromocenter. The heterochromatic Y is not banded and lies entirely within the chromocentral region of the nucleus. Each of the six banded euchromatic arms corresponds to one of the Muller elements that are used to define the syntenic relationships among the species of the genus (X, A; 2L, B; 2R, C; 3L, D; 3R, E; 4, F) (Muller 1940; Sturtevant and Novitski 1941).
Perhaps one of the earliest forays into the genomics of the genus was made possible by the development of the polytene chromosome maps of D. melanogaster by Bridges (1935). He proposed an alphanumeric nomenclature by dividing each arm into 20 numbered units and each of these into the six subunits A–F and by assigning a number designation for each of the bands therein. This codification of the chromosome map allowed the designation of chromosomal aberration breakpoints and the localizations of genes associated with these breaks. While these maps were useful and the standard for many years, they were augmented by the production of mosaic photographic maps by Lefevre (1976) and further refined by the electron microscope maps of Sorsa (1988a,b,c).
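Because the Bridges nomenclature numbers sections consecutively across arms, a cytological position can be converted to a chromosome arm and Muller element by a simple lookup. The Python sketch below assumes the section ranges used in the subsection headings of this article (X, 1–20; 2L, 21–40; 2R, 41–60; 3L, 61–80; 3R, 81–100; 4, 101–102). The function name and the parsing of band labels are illustrative assumptions.

```python
import re

# (arm, Muller element, first section, last section)
ARM_SECTIONS = [
    ("X", "A", 1, 20), ("2L", "B", 21, 40), ("2R", "C", 41, 60),
    ("3L", "D", 61, 80), ("3R", "E", 81, 100), ("4", "F", 101, 102),
]


def locate_band(band):
    """Map a Bridges-style label such as '84F1' to (arm, Muller element).

    Only the leading section number is used; the lettered subsection and
    band number are ignored.
    """
    match = re.match(r"\s*(\d+)", band)
    if match is None:
        raise ValueError(f"no section number in {band!r}")
    section = int(match.group(1))
    for arm, element, first, last in ARM_SECTIONS:
        if first <= section <= last:
            return arm, element
    raise ValueError(f"section {section} is outside 1-102")


breakpoints = [locate_band("84F1"), locate_band("93F6-7")]
```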
The four sequenced melanogaster group species are so similar to D. melanogaster that it has been possible to use the aforementioned maps in mapping efforts in these species. Indeed, the mapping of several genes in D. simulans by in situ hybridization has used the Bridges (1935) D. melanogaster nomenclature to designate the location of cloned sequences from D. simulans (FlyBase Consortium 1999 and references cited therein). Moreover, the mapping of inverted sequences that serve to distinguish the gene order in the four species has utilized the Bridges (1935) maps and nomenclature. In some cases, the position of the paracentric inversion breakpoints can be observed directly by virtue of the fact that viable, albeit sterile, hybrid progeny can be produced with interspecific mating, e.g., between D. melanogaster and D. simulans (Horton 1939; Lemeunier and Ashburner 1976). In cases where these hybrids cannot be generated, the similarity in the chromosome-banding patterns has allowed the mapping of inversion breakpoints by side-by-side comparisons of photographic maps of the different species (Lemeunier and Ashburner 1976, 1984). The traditional use of the Bridges' (1935) maps and nomenclature in the comparative cytology of these species dictated the use of the combined Bridges ideogram and Lefevre photographic maps for the alignment of the scaffolds to the chromosomal maps. Additionally, the large number of clear orthologies among the species with D. melanogaster, the conserved synteny and the accurate assignment of genes to the polytene map in D. melanogaster, coupled with the similarities in karyotype and banding pattern, allowed a fairly robust alignment of the assembled scaffolds with the chromosomal maps. This was even the case for D. simulans and D. sechellia, which were sequenced to lower coverage than the other species and thus had more problematic and fragmented assemblies, respectively. 
Indeed, the excellent alignment of scaffold to chromosome enabled the discovery of apparent assembly errors that will have to be addressed in future analyses on these species.
Chromosome maps—D. simulans:
The alignment of D. melanogaster and D. simulans has revealed some problems with the mosaic D. simulans assembly (Begun et al. 2007) and these will be pointed out below in a presentation of each arm/element alignment. There are also 665 small scaffolds that we have not attempted to align to the chromosomes. However, because these contain D. melanogaster orthologs, we have provided a tentative linkage call on the basis of a possible syntenic conservation between D. simulans and D. melanogaster. These assignments can be found in the supplemental tables.
Muller element A (chromosome X, sections 1–20):
As noted above, there is a single large scaffold associated with the X chromosome (Figure 3). This scaffold is 17.0 Mb in length and contains 1360 called orthologs. Using these and their D. melanogaster cytological assignments, it is possible to align the scaffold from a site near the telomere (1A) to the base (20F) of the X. As a cross-check on this alignment, we extracted from FlyBase all of the D. simulans genes that had been cytologically localized. Within this group, those that were also found in the orthology calls for the X were identified and their cytology correlated with their position in the scaffold. This process resulted in 11 additional points of alignment that were all consistent with the computational analysis (supplemental Table 16).
Muller element B (chromosome 2L, sections 21–40):
There is a single large scaffold of 22.0 Mb assigned to B/2L with 2160 called orthologs. Correlation of the molecular and inferred cytological maps allows the alignment of the scaffold from near the telomere (21A) to the base (40F) of this arm (Figure 3). Again, the cytologically localized genes were extracted from FlyBase and those represented in the ortholog set were identified. Sixteen of these allowed further confirmation of the scaffold alignment (Figure 3, supplemental Table 16).
Muller element C (chromosome 2R, sections 41–60):
A single large scaffold of 19.6 Mb is associated with C/2R. This scaffold contains 2328 called orthologs. Alignment of the molecular and cytological maps shows that the scaffold extends from the base (41E) to the telomere (60F) of this element (Figure 3). The alignment also demonstrates a problem in the assembly. At a position approximately in the middle of the scaffold, there is an ∼350-kb sequence, which by cytological mapping should reside near the telomere in 3L/D (solid boxes in Figure 3). Because hybrids of D. simulans and D. melanogaster have been examined cytologically, a transposition of this size would have been noted. Thus, it is likely that the insertion of these sequences into the 2R/C scaffold represents an assembly error. Eleven cytologically localized D. melanogaster orthologs were identified and examined. These again are consistent with the computational alignment (Figure 3, supplemental Table 16).
Muller element D (chromosome 3L, sections 61–80):
A single large scaffold of 22.6 Mb is associated with D/3L. Using the 2289 called orthologs and their positions in the D. melanogaster genome it is possible to align the scaffold from near the telomere (61A) to the base (80F) of this arm (Figure 3). We again extracted from FlyBase the genes that had been previously localized to the D/3L arm and found those that had called orthologs. The locations of this group of 19 were consistent with and confirmed the computational alignment (Figure 3, supplemental Table 16). The clear gap in the shaded box above the chromosome in Figure 3 indicates the position of the material assembled into the C/2R scaffold.
Muller element E (chromosome 3R, sections 81–100):
A single large 27.5-Mb scaffold is associated with E/3R, which contains 2921 called orthologs. The alignment of this scaffold's molecular map and the associated D. melanogaster cytological map positions the scaffold from near the base (81F) at one end to near the telomere (100E) at the other (Figure 3). Cytological examination of hybrids between D. simulans and D. melanogaster has shown that these two species differ in this element by a large paracentric inversion with breakpoints in 84F1 and 93F6-7 (Horton 1939; Lemeunier and Ashburner 1976; Ranz et al. 2007). The computational alignment of scaffold to chromosome clearly reveals the presence of the inversion and confirms the cytologically determined breakpoints (Figure 3). The previously localized genes contained within the called ortholog set retrieved from FlyBase again were consistent with and confirmed the computational alignment (Figure 3, supplemental Table 16).
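One simple way an inversion shows up in such an alignment is as a run of orthologs whose reference order is reversed relative to the flanking genes. The Python sketch below illustrates this idea only; it is an assumption of this example, not the procedure used for Figure 3. Genes are represented by their integer rank in the D. melanogaster order, and `min_len` is a hypothetical noise filter.

```python
def longest_reversed_run(ranks, min_len=3):
    """Find the longest run of strictly decreasing reference ranks.

    ranks: reference rank of each ortholog, listed in scaffold order, with the
           scaffold oriented so that collinear regions increase.
    Returns (start_index, end_index) of the run, or None if no run reaches
    min_len genes.
    """
    best = None
    start = 0
    for i in range(1, len(ranks) + 1):
        # A decreasing run ends at the scaffold end or when ranks stop falling.
        if i == len(ranks) or ranks[i] >= ranks[i - 1]:
            length = i - start
            if length >= min_len and (best is None or length > best[1] - best[0] + 1):
                best = (start, i - 1)
            start = i
    return best


ranks = [1, 2, 3, 8, 7, 6, 5, 4, 9, 10]
segment = longest_reversed_run(ranks)
```

The genes at the two ends of the returned segment flank the candidate breakpoints, which can then be compared with the cytologically determined positions.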
Muller element F (chromosome 4, sections 101–102):
The small fourth chromosome has a single scaffold of 1.0 Mb mapped to it (Figure 3). There are 59 called orthologs and these allow an alignment that extends from near the base (102A) to the telomere (102F) of the chromosome. There are apparent problems with the assembly of this scaffold in that two genes, Dyrk3 and Mitf, which map near the telomere in D. melanogaster and in the other three melanogaster group species, are placed in the proximal third of the D. simulans assembly. There are also fewer orthology calls in this species for this chromosome as compared to the other melanogaster group species. Inspection of the regions where these should be found indicates that there are several gaps in the assembly. This fact and the problem noted above with respect to the C/2R and D/3L assemblies indicate that the assemblies in D. simulans should be used with caution.
Chromosome maps—D. sechellia:
Due to the low coverage, the D. sechellia assembly is the most fragmented of the melanogaster group species. Additionally, there are a large number of short scaffolds that contain only one or two orthologs of D. melanogaster genes. A list of these can be found in supplemental Table 17. A comparison of those lists reveals that many of these short scaffolds are apparent duplicates of small portions of the longer ones that we were able to align with the chromosomes.
Muller element A (chromosome X, sections 1–20):
The X chromosome can be aligned to 22 scaffolds ranging in size from 139 kb to 3.3 Mb. The smallest number of called orthologs in a scaffold is 7 and the largest 329. The total coverage of the aligned scaffolds is 20.9 Mb containing 1936 called orthologs. The aligned scaffolds cover the X from near the telomere (1A) to near the base (20C/D) of the chromosome with at least five significant gaps (Figure 4). Due to the undersampling of the X relative to the autosomes, this chromosome shows the highest fragmentation. Despite these problems, the alignments are good and the gene order within each scaffold is similar to that seen in D. melanogaster. There are, however, clear problems with the assemblies revealed by this and other chromosome/scaffold alignments. Scaffold 4 aligns in part with a region just proximal to the telomere of the X (3D-6E). This scaffold is, however, clearly chimeric in that another large and contiguous portion of the scaffold aligns perfectly with a region near the telomere of E/3R (supplemental Table 17). On the basis of the close similarity of the banding patterns of D. melanogaster and D. sechellia (Lemeunier and Ashburner 1984), a transposition of material between the X and 3R appears highly unlikely and the association of sequences from these two elements in this scaffold is most likely an assembly error. This is not the only case of scaffold chimerism in this species (see below).
Muller element B (chromosome 2L, sections 21–40):
The B/2L chromosome arm can be aligned with 10 scaffolds, which comprise 22.2 Mb of sequence and contain a total of 2210 called orthologs. The scaffolds range in size from 53 kb to 4.7 Mb. The lowest ortholog count is 5 (scaffold 63) and the highest 809 (scaffold 3). The observation that makes the alignment odd is that scaffolds 3 and 5 apparently do not represent a contiguous sequence in the B/2L element (Figure 4, supplemental Table 17). Moreover, scaffold 5 is chimeric and contains sequences from this element as well as from C/2R and E/3R (Figure 4, supplemental Table 17). Interestingly, the three chimeric portions are internally contiguous and are not intermingled. There are also two gaps in the coverage at both the distal and proximal ends of the arm. One possible explanation for the observed contiguity problems is that there have been transpositions or inversions of large blocks of material. The size of these blocks makes it likely that they would have been observed cytologically, but this is not the case. The most likely explanation is that they represent assembly errors associated with low coverage.
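The distinction drawn above between a chimeric scaffold whose portions are internally contiguous and one whose portions are intermingled can be illustrated with a short sketch. It assumes that each called ortholog on a scaffold has already been assigned to a Muller element and that the calls are listed in scaffold order. The function names and the toy calls are hypothetical and are not taken from supplemental Table 17.

```python
from itertools import groupby


def element_blocks(element_calls):
    """Collapse per-ortholog Muller element calls (in scaffold order) into runs."""
    return [(element, len(list(run))) for element, run in groupby(element_calls)]


def classify_scaffold(element_calls):
    """Label a scaffold by how its ortholog calls are distributed among elements."""
    elements = [element for element, _ in element_blocks(element_calls)]
    if len(set(elements)) <= 1:
        return "single element"
    # Each element forms exactly one run: contiguous blocks, not intermingled.
    if len(elements) == len(set(elements)):
        return "chimeric, contiguous blocks"
    return "chimeric, intermingled"


# Hypothetical toy scaffold; the calls are invented for illustration only.
toy_calls = ["B"] * 4 + ["C"] * 3 + ["E"] * 2
print(element_blocks(toy_calls))     # [('B', 4), ('C', 3), ('E', 2)]
print(classify_scaffold(toy_calls))  # chimeric, contiguous blocks
```

A scaffold of the first chimeric kind is what one would expect if large blocks from different arms were joined during assembly, whereas intermingled calls would point to a different kind of problem.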
Muller element C (chromosome 2R, sections 41–60):
The C/2R arm can be aligned to five scaffolds comprising 19.6 Mb of sequence and containing 2448 called orthologs. The scaffolds range in size from 168 kb to 14.2 Mb. Scaffold 59 has the lowest ortholog count at 7, while scaffold 1 has the highest at 1748. As noted, a contiguous fragment of scaffold 5 (189 kb; 141 ortholog calls) aligns with this arm (Figure 4, supplemental Table 17).
Muller element D (chromosome 3L, sections 61–80):
Arm D/3L can be aligned with seven scaffolds, which comprise 22.8 Mb of sequence and contain 2240 called orthologs (Figure 4). The scaffolds range in size from 100 kb (scaffold 89) to 9.9 Mb (scaffold 0). Scaffold 89 has the smallest number of called orthologs (4) while scaffold 0 contains the highest (1095). In addition to being the largest scaffold, scaffold 0 is, like scaffolds 4 and 5, a chimera. Approximately half of scaffold 0 is aligned to D/3L while the other half is associated with E/3R. Again the two halves are represented by contiguous sequences that align well with the D. melanogaster cytological map (Figure 4, supplemental Table 17).
Muller element E (chromosome 3R, sections 81–100):
The E/3R element can be aligned to 15 scaffolds comprising 27.7 Mb of sequence and containing 3045 called orthologs. Scaffold 166 is the smallest at 38 kb and, not surprisingly, contains the smallest number of called orthologs, 5. The largest is the other half of scaffold 0 at 11.2 Mb, which also contains the highest number of called orthologs at 1345. As noted earlier, there are also portions of two other chimeric scaffolds (4 and 5) that align with this arm (Figure 4, supplemental Table 17). As in the case of D. simulans, D. sechellia differs from D. melanogaster by virtue of a large paracentric inversion on E/3R (Lemeunier and Ashburner 1984). The reported breakpoints are similar to those of the D. simulans inversion (Horton 1939; Lemeunier and Ashburner 1976). Scaffolds 0 and 6 of D. sechellia show clear evidence of this inversion at the sequence level and also indicate that the breaks are similar if not identical to the D. simulans inversion (Ranz et al. 2007) (Figure 4, supplemental Table 17).
Muller element F (chromosome 4, sections 101–102):
The dot chromosome F/4 can be aligned with three scaffolds comprising 1.2 Mb of sequence and containing in aggregate 72 called orthologs (Figure 4, supplemental Table 17). Scaffold 30 is the longest (666 kb) and contains the highest number of called orthologs (41). Scaffold 52 is the shortest (194 kb) and contains the smallest number of called orthologs (12) (supplemental Table 17).
Chromosome maps—D. erecta:
The coverage of this species is high and consequently the assemblies appear to be better than those of the two species covered above. The large contiguous scaffolds that essentially include entire arms have also allowed the ready identification of the inversion breakpoints and complexes that serve to differentiate D. erecta from D. melanogaster (Lemeunier and Ashburner 1976). Interestingly, the molecularly defined breakpoints are in excellent agreement with the previously reported cytological determinations, offering a testament to the accuracy of those results (Lemeunier and Ashburner 1976). D. erecta has a pericentric inversion at the base of the B·C element (2L·2R in D. melanogaster). The 2L and 2R arms are referred to as the Muller elements B/C· and ·B/C, respectively, where the dot represents the position of the centromere in the rearranged arms. Similar nomenclature is used for the D. yakuba pericentric inversion.
Muller element A (chromosome X, sections 1–20):
There are two large scaffolds that cover the majority of this arm. The more distal of these is 4644, which extends from the telomere (1A) through section 3A. It contains 2.5 Mb of sequence and 257 called orthologs. The second scaffold 4690 begins in section 3B and extends to the base (20E/F) of the A/X element. It contains 18.8 Mb of sequence and 1704 called orthologs. Thus the total coverage of this arm is 21.3 Mb of sequence and 1961 called orthologs (Figure 5, supplemental Table 18). As noted in supplemental Table 18, several inversions differentiate the polytene chromosome-banding pattern when comparing D. erecta and D. melanogaster. The molecular sequence of scaffold 4690 identifies six inversion breakpoints and three overlapping inversions (Figure 5, supplemental Table 18). The molecularly identified breakpoints agree almost entirely with those reported from direct cytological analysis (Lemeunier and Ashburner 1976). The only point of noncongruence is a very small section (7D-7E) that would have been very difficult to discern in normal cytological preparations or with the methodology used to define the extent of the inversions.
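The relationship between inversions and breakpoints in a scaffold alignment can be illustrated with a minimal sketch. This is not the procedure used to build the maps reported here. It assumes that every ortholog on a scaffold carries a consecutive integer rank reflecting its position in the D. melanogaster gene order, and the function names are hypothetical. A breakpoint is then any pair of adjacent genes on the scaffold that are not neighbors in the reference order.

```python
def invert(order, start, end):
    """Return a copy of order with positions start..end (inclusive) reversed."""
    return order[:start] + order[start:end + 1][::-1] + order[end + 1:]


def breakpoint_positions(order):
    """Indices i where order[i] and order[i + 1] are not reference neighbors."""
    return [i for i in range(len(order) - 1)
            if abs(order[i + 1] - order[i]) != 1]


reference = list(range(1, 10))  # toy reference order: genes 1..9

# A single inversion produces two breakpoints.
one_inversion = invert(reference, 3, 6)
print(one_inversion)                        # [1, 2, 3, 7, 6, 5, 4, 8, 9]
print(breakpoint_positions(one_inversion))  # [2, 6]

# Two overlapping inversions with distinct endpoints produce four.
two_overlapping = invert(invert(reference, 2, 5), 4, 7)
print(two_overlapping)                        # [1, 2, 6, 5, 8, 7, 3, 4, 9]
print(breakpoint_positions(two_overlapping))  # [1, 3, 5, 7]
```

By the same counting, three overlapping inversions with no shared endpoints yield six breakpoints, which matches the numbers reported for scaffold 4690.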
Muller element B/C· (chromosome 2L, sections 21–41):
The single large scaffold 4929, encompassing 26.6 Mb of sequence and containing 2332 called orthologs, covers essentially the entirety of B-C·/2L (Figure 5, supplemental Table 18). A discussion of this scaffold/arm association as well as that of ·B-C/2R must include the fact that this species possesses a pericentric inversion that reassociates genetic material between the two distinct Muller elements in D. melanogaster. Superimposed on this pericentric inversion is a group of overlapping paracentric inversions. The positions and extent of the overlapping inversions are shown in Figure 5 by a series of brackets above the chromosome. The molecular map determined from scaffold 4929 identifies seven inversion breakpoints in this arm (Figure 5, supplemental Table 18). Again, as in the case of the A/X element, the agreement between the positions of the molecularly and cytologically defined breakpoints is excellent (Lemeunier and Ashburner 1976). The only discrepancy is again a very small segment (26B4-26B2) that would have been difficult to detect.
Muller element ·B/C (chromosome 2R, sections 40–60):
There is a single large scaffold 4845 encompassing 22.6 Mb of sequence and containing 2338 called orthologs that extends for the length of ·B-C/2R (Figure 5, supplemental Table 18). Here again, the presence of the other portion of the pericentric inversion reassociates sequences that are found in different Muller elements in D. melanogaster. The molecular map derived from scaffold 4845 identifies five inversion breakpoints, including the paracentric inversions superimposed on the pericentric (Figure 5, supplemental Table 18). The brackets above the chromosome in Figure 5 show the positions and extent of the inversions. As in the arms discussed above, the cytological and molecular maps are in excellent agreement. The exception is a cytologically small fragment (35F-36B).
Muller element D (chromosome 3L, sections 61–80):
The single large scaffold 4784 encompassing 25.8 Mb of sequence and containing 2448 called orthologs extends for the length of D/3L (Figure 5, supplemental Table 18). The molecular map derived from this scaffold identifies seven paracentric inversion breakpoints. These are associated with four inversions. The distal pair of inversions is overlapping while one of the proximal pair is within the larger, and this pair apparently shares a similar proximal breakpoint (Figure 5). The brackets above the chromosome in Figure 5 show the position and sizes of the inversions. A comparison of the cytologically determined breakpoints with those found here in the molecular map shows that again the cytological determination was extremely accurate with, in this case, no missed small segments (Lemeunier and Ashburner 1976).
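The count of seven breakpoints for four inversions follows from the shared proximal breakpoint: four inversions have eight endpoints, and one shared endpoint reduces the number of distinct breakpoints to seven. A small sketch makes the arithmetic explicit. The coordinates are hypothetical, in arbitrary units increasing from the distal to the proximal end, and do not correspond to the mapped breakpoints.

```python
def distinct_breakpoints(inversions):
    """Return the sorted set of endpoints over all (distal, proximal) inversions."""
    return sorted({position for inversion in inversions for position in inversion})


# Hypothetical coordinates chosen only to mirror the arrangement described:
toy_inversions = [
    (5, 20),   # distal pair: two overlapping inversions
    (12, 30),
    (40, 90),  # proximal pair: the larger inversion
    (60, 90),  # included inversion sharing the proximal breakpoint
]

endpoints = 2 * len(toy_inversions)
breaks = distinct_breakpoints(toy_inversions)
print(endpoints, len(breaks))  # 8 7
```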
Muller element E (chromosome 3R, sections 81–100):
There are two scaffolds that can be aligned to the E/3R element. The most proximal of these, 4770, comprises 17.7 Mb of sequence and contains 1929 called orthologs. The more distal scaffold 4820 comprises 10.5 Mb of sequence and contains 1137 called orthologs. Thus a total of 28.2 Mb of sequence and 3066 called orthologs can be aligned to E/3R (Figure 5, supplemental Table 18). At the molecular level, there are four identified paracentric inversion breakpoints. These are in general agreement with the previous cytologically determined breaks (Lemeunier and Ashburner 1976). The brackets above the chromosomes in Figure 5 show the positions and extent of the inverted regions. Unfortunately, there is an apparent small gap in the 4770 scaffold that is near the distal end of the smaller included inversion, and the join between the 4770 and 4820 is near the distal end of the larger inversion. This prevents a more precise mapping of the distal ends of both inversions. It appears, however, that the larger of the two inversions is similar, if not identical, to the E/3R inversion seen in D. simulans and D. sechellia.
Muller element F (chromosome 4, sections 101–102):
The single scaffold 4512 encompasses nearly the entire banded portion of chromosome F/4 (Figure 5, supplemental Table 18). This scaffold covers 1.3 Mb of sequence and contains 77 called orthologs. A comparison of this scaffold with the gene content and order of D. melanogaster indicates that this element is essentially entirely conserved in gene content and order.
Chromosome maps—D. yakuba:
The assembly of the D. yakuba sequence appears to be reasonably good, judging by the few disagreements with the cytological map, and has allowed the assignment of a single large scaffold to each of the six Muller elements. As with D. sechellia, several of the small scaffolds that we did not align contain duplications of material found in other small scaffolds or in the large scaffolds aligned to the arms. Assuming that these are not true duplications within the genome, they would appear to be fragments that, for reasons beyond the scope of this report, were not incorporated into the assembly and that could be used to evaluate limitations in the assembly process used for this genome.
Muller element A (chromosome X, sections 1–20):
A single large scaffold comprising 21.8 Mb of sequence and containing 1946 called orthologs can be aligned with the A/X element. This scaffold extends from the telomere at 1A to the base of the chromosome at 20F, covering essentially the entire arm (Figure 6, supplemental Table 19). The molecular map of the scaffold indicates that there are 12 breakpoints that define the endpoints of six overlapping paracentric inversions (Figure 6, supplemental Table 19). The position of the breakpoints and the extent of the inversions are indicated by the brackets above the chromosomes in Figure 6. In an earlier cytological mapping of the inversion breakpoints in this element, Lemeunier and Ashburner (1976) noted that the number of rearrangements made it difficult to precisely map inversion endpoints. They were, however, able to discern some and these are shown above the chromosome map in Figure 6. The data derived from this alignment of the molecular map have allowed further refinement of the earlier map and provide a more complete picture of the inversion complex that distinguishes the gene order in this species relative to D. melanogaster.
Muller element B/C· (chromosome 2L, sections 21–41):
A single large scaffold comprising 22.3 Mb of sequence and containing 2342 called orthologs can be aligned to B-C·/2L. The distal end of the scaffold lies at 21A near the telomere and the proximal end at 41F (Figure 6, supplemental Table 19). Thus the scaffold encompasses almost the entire arm. As is the case for D. erecta, the Muller B and C elements are reassociated by a pericentric inversion, creating a mix of genetic material from both arms. Superimposed on this is a set of overlapping paracentric inversions. The molecular map of the scaffold identifies nine inversion breakpoints in B-C·/2L (supplemental Table 19). Eight of these are associated with four paracentric inversions. The positions of the breakpoints and the extent of the inversions are shown by the brackets above the chromosomes in Figure 6. As was true for the A/X element, the number and extent of the inversion complexes confounded the cytological analysis of this arm. Despite this fact, several of the cytologically determined breaks are in good agreement with the molecular map, and this new ordering should be considered a refinement of that earlier effort (Lemeunier and Ashburner 1976).
Muller element ·B/C (chromosome 2R, sections 40–60):
The ·B-C/2R arm can be aligned with a single large scaffold comprising 21.1 Mb of sequence and containing 2372 called orthologs (Figure 6, supplemental Table 19). The scaffold extends from 40F at the base of the arm to 60F at the telomere and thus encompasses essentially the entire arm. The molecular map identifies 13 breakpoints associated with both the peri- and paracentric inversions. Six of these are associated with a set of three nested and overlapping paracentric inversions at the distal end of the arm that are exclusive of the pericentric breaks that serve to reassociate the genetic material between the two Muller elements (Figure 6, supplemental Table 19). Again, the earlier cytological analysis was hampered by the extent of rearrangement in 2L·2R. Nonetheless, several of the breaks identified there are in agreement with those seen here. Again, the map provided here should be regarded as a refinement of the earlier cytologically derived map (Lemeunier and Ashburner 1976).
Muller element D (chromosome 3L, sections 61–80):
The D/3L element can be aligned with a single large scaffold comprising 24.2 Mb of sequence and containing 2352 called orthologs. The distal end maps to 61A and the proximal end to 80F, essentially encompassing the entire arm (Figure 6, supplemental Table 19). The molecular map identifies nine breakpoints that are associated with five paracentric inversions. The two most distal inversions are nested and exclusive of the proximal three (Figure 6). The latter are overlapping and the distal breakpoint of two of the group appears to be shared or in close proximity. The positions of the breakpoints and the extent of the inverted segments are indicated by the brackets above the chromosome in Figure 6. The fact that the inversion set in D/3L is less complex than that seen in the A/X and B-C·B-C/2L·2R chromosomes made the cytological interpretation of the breakpoints more straightforward (Lemeunier and Ashburner 1976) and a comparison of the molecularly and cytologically determined breaks shows that they are in excellent agreement (Figure 6, supplemental Table 19).
Muller element E (chromosome 3R, sections 81–100):
Arm E/3R can be aligned with a single large scaffold comprising 28.8 Mb and containing 3054 called orthologs. The proximal end of the scaffold maps to 81F and the distal end to 100E, thus encompassing almost the entire arm (Figure 6, supplemental Table 19). The molecular map identifies 12 breakpoints that are associated with eight overlapping/nested paracentric inversions proximally and a single distal inversion that is not associated with the others. The number of breaks and inversions is discordant because there are several shared or proximate breakpoints in the more proximal set of inversions. The positions of the breakpoints and the extent of the inverted segments are indicated by the brackets above the chromosome in Figure 6. There is also an apparent small gap in the sequence of the scaffold indicated by a shaded bar above the chromosome map. Despite the complexity of the rearrangements in this arm, the cytologically determined breakpoints and those found in the molecular map are in good agreement (Lemeunier and Ashburner 1976). The only exceptions to this are a group of very small chromosome intervals that would have been very difficult to discern in cytological preparations or by the methodology used to determine the breakpoints (Figure 6, supplemental Table 19).
Muller element F (chromosome 4, sections 101–102):
A single scaffold aligns with the F/4 dot chromosome arm (Figure 6). It comprises 1.4 Mb of sequence and contains 77 called orthologs (supplemental Table 19). The proximal end of the scaffold lies at the base of the arm at 101F and the distal end at 102F near the telomere, essentially covering the entire arm (Figure 6). There is no large rearrangement of the genetic material at the molecular level and the chromosome appears to be largely conserved in genetic content and order in comparison to D. melanogaster.
D. ananassae maps
First described by Doleschall (1858), D. ananassae belongs to the melanogaster group (although not the melanogaster subgroup) of the subgenus Sophophora and is one of eight cosmopolitan species (Patterson and Stone 1952; Tobari 1993a). The mitotic chromosomes of D. ananassae were first described by Metz (1916). Following this initial investigation, Kaufmann (1936), Kikkawa (1938), and Tobari et al. (1993) established the karyotype as three pairs of V-shaped chromosomes, a V-shaped X, and a rod or J-shaped Y in the male complement. The number of arms seen in polytene chromosome preparations was first reported as six by Kikkawa (1935). Kikkawa (1936) further determined that the shortest arms of unequal length are XL and XR while the other four represent the left and right arms of the second and third chromosomes. The largely heterochromatic fourth chromosome is embedded in the chromocenter and is generally not seen as a banded euchromatic element.
Most Drosophila species carry the nucleolus organizers (NOs) on the sex chromosomes (Ashburner et al. 2005), but a unique Y-4 linkage of the NOs has been reported in D. ananassae (Kaufmann 1937; Kikkawa 1938; Roy et al. 2006; Shibusawa et al. 2007). Possibly related to this NO localization is the fact that, in the primary spermatocyte, the X, Y, and fourth chromosomes form a tangled multivalent (Hinton and Downs 1975; Matsuda et al. 1983; Goni et al. 2006). Another unique character of the species is that spontaneous crossing over occurs in males (Kikkawa 1937; Moriwaki 1937), albeit at much lower frequencies than in females. Subsequent to this discovery, Hinton (1970), Hinton and Downs (1975), Moriwaki's group (Moriwaki et al. 1970), and Matsuda et al. (1993) have all studied the cytogenetic basis of this phenomenon. Additionally, parthenogenetic females have been isolated from South Pacific island (Futch 1972) and Papua New Guinea (Matsuda and Tobari 1999) populations.
Early linkage maps of the species were constructed by Kikkawa (1938) and Moriwaki (1938, 1940). Unfortunately, almost all of the mutant stocks listed on their maps were lost. Therefore, after World War II, Moriwaki and colleagues and Hinton reisolated mutants and reconstructed the linkage maps (Hinton 1983; reviewed by Matsuda et al. 1993; Moriwaki and Tobari 1975; Hinton 1991; unpublished data in Tobari 1993b). On the basis of these linkage maps, several genetic factors (Enhancers, Suppressors, and Modifiers) controlling male crossing over have been mapped to the chromosomes (Hinton 1970). In addition, genes related to parthenogenesis (Matsuda and Tobari 2004) and affecting mating behavior have also been mapped on the second chromosome (Doi et al. 2001; Yamada et al. 2002). Hinton's genetic map (see Tobari 1993b) is generally used as the standard for linkage analyses; however, there are many extant unmapped mutants.
Various populations of D. ananassae carry many chromosomal rearrangements, including translocations, pericentric inversions, and paracentric inversions (Dobzhansky and Dreyfus 1943; Freire-Maia 1961; Futch 1966), and several reference maps of the six arms of the polytene chromosomes have been prepared. Seecof (Stone et al. 1957) drew a cytological map of a “standard” arrangement. These authors subdivided the polytene chromosomes into 161 divisions, with the enumeration starting at the distal (telomeric) end of each arm. This map was utilized by Futch (1966) to describe a variety of chromosome rearrangements found in South Pacific island populations. Subsequently, Moriwaki and Ito (1969) composed photomaps that were used to describe the puffing patterns and Hinton and Downs (1975) produced a new ideogram map for use in their cytogenetic analyses. More recently, we have prepared revised photographic maps (Tobari et al. 1993), which are based on those of Moriwaki and Ito (1969) to aid in the determination of inversion breakpoints. In these maps, the chromosomes are divided into 100 numerical sections, and each of these is further subdivided into subsections denoted by the letters A, B, C, and D. Sections are numbered from the distal end of XL to the distal end of 3R: XL (1–13), XR (14–20), 2L (21–44), 2R (45–63), 3L (64–81), and 3R (82–99). Section 100 is assigned to the fourth chromosome, although no fourth chromosome bands have yet been identified and we cannot eliminate the possibility of the existence of euchromatic bands on this chromosome. The Muller element equivalences for the six chromosome arms of D. ananassae and its relatives are (Muller element = chromosome arm) A = XL and XR, B = 3R, C = 3L, D = 2R, E = 2L, and F = 4.
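The section numbering and Muller element equivalences given above for the revised photographic maps can be written as a small lookup, shown here only as a convenience sketch. The section ranges and equivalences are those stated in the text; the dictionary and function names are hypothetical.

```python
# Section ranges of the revised D. ananassae photographic map (from the text).
ARM_SECTIONS = {
    "XL": (1, 13), "XR": (14, 20),
    "2L": (21, 44), "2R": (45, 63),
    "3L": (64, 81), "3R": (82, 99),
    "4": (100, 100),
}

# Muller element for each chromosome arm (from the text).
ARM_TO_MULLER = {
    "XL": "A", "XR": "A", "3R": "B", "3L": "C",
    "2R": "D", "2L": "E", "4": "F",
}


def locate_section(section):
    """Return (chromosome arm, Muller element) for a numbered map section."""
    for arm, (first, last) in ARM_SECTIONS.items():
        if first <= section <= last:
            return arm, ARM_TO_MULLER[arm]
    raise ValueError(f"section {section} is outside the 1-100 map range")


print(locate_section(75))   # ('3L', 'C')
print(locate_section(14))   # ('XR', 'A')
print(locate_section(100))  # ('4', 'F')
```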
Using the revised photographic map, Tomimura et al. (1993) described chromosomal polymorphisms of D. ananassae and related species from 30 populations covering the species range. They found five pericentric inversions, one translocation, and 52 paracentric inversions. Among the 52 paracentric inversions, 32 were previously undescribed and three of the cosmopolitan inversions, In(2L)A, In(3L)A, and In(3R)A, were found in almost all the localities covering the species range. Many overlapping inversions are found in populations of D. ananassae and its relatives, making it possible to deduce the phylogenetic relationships of species belonging to the D. ananassae complex, including D. pallidosa, papuensis-like, pallidosa-like, and Taxon K.
Further mapping efforts are related to an unusual Optic morphology (Om) hypermutability system, which has been used to genetically map lesions at 22 loci (Hinton 1984). It was subsequently discovered that these Om mutations were caused by tom insertions (Shrimpton et al. 1986). Using the tom element, 18 of the 22 Om loci have been mapped cytologically (Tanda et al. 1989; Matsubayashi et al. 1991, 1992; Awasaki et al. 1994; Juni et al. 1996), allowing an alignment of the genetic and polytene maps.
Chromosome maps—D. ananassae:
The D. ananassae strain AABBg1 was used for genomic sequencing instead of the strain used to develop the polytene map (Moriwaki and Ito 1969; Tobari et al. 1993). The AABBg1 strain was used because it was the most highly inbred strain available at the time that genome sequencing began. The polytene chromosome map strain (Moriwaki and Ito 1969; Tobari et al. 1993) has been used extensively to physically map D. ananassae genes using in situ hybridization. The karyotypes of the polytene map strain and AABBg1 have the same standard arrangement on all chromosomes except for Muller C (chromosome 3L), where the polytene strain has the standard arrangement and AABBg1 has the terminal inversion In(3L)A with breakpoints at 75B and 81A.
Muller element A (chromosome XL, telomere to centromere, sections 1–13, and chromosome XR, centromere to telomere, sections 14–20):
A total of 18 scaffolds map to the two arms of the D. ananassae X chromosome, accounting for 37.0 Mb (Figure 7, supplemental Table 20). A total of 48 loci have been mapped via in situ hybridization in the present experiment (see supplemental Table 20). Five scaffolds map to the euchromatic portion of XL, accounting for 19.4 Mb, while 8 scaffolds map to XR, accounting for 12.0 Mb. There is a gap in the region between scaffold 13335 and scaffold 12929, which corresponds to 8C–9B. Probes derived from sequence in scaffolds 13265, 13333, 13111, 12905 and 12048 hybridized to the chromocenter; however, conserved linkage (CL) analysis places these scaffolds on Muller A between scaffolds 13417 and 13137. Although Stephan (1989) mapped the fw gene to 14A by in situ hybridization using a probe of the fw gene of D. melanogaster, in this mapping effort, we could not distinguish between the chromocenter and 14A of XR. Although there are inconsistencies in the order between the molecular and cytological positions within scaffold 13334, this scaffold may be aligned to the proximal region of XR.
Muller element B (chromosome 3R, centromere to telomere, sections 82–99):
Five scaffolds cover almost the entire B element and comprise a total of 21.9 Mb of sequence (Figure 7, supplemental Table 20). A total of 20 loci have been mapped on this chromosome arm, 2 of them by the tom transposable element (Matsubayashi et al. 1992). Scaffold 12422 is adjacent to scaffold 12913 on the basis of the conserved linkage analysis, but this scaffold may be in the centromeric heterochromatin. We did not use sequences of this scaffold for our in situ hybridization analysis to verify this provisional localization.
Muller element C (chromosome 3L, telomere to centromere, sections 64–81):
A single scaffold (13266) comprising 19.9 Mb of sequence aligned to almost the entire C element (Figure 7, supplemental Table 20). A total of 17 loci have been mapped on this chromosome arm, 3 of them by the tom transposable element (Matsubayashi et al. 1992). Because Muller C in the genome strain (AABBg1) differs from the polytene map by a single inversion, In(3L)A (75B; 81A), it is necessary to divide the scaffold into two parts at the breakpoint, 75B, to align the scaffold with the polytene chromosome. We have not determined the precise sites of the inversion breakpoints at 75B and 81A; however, the length of the inverted segment and the positions of the breakpoints are in good agreement between the scaffold and polytene maps.
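The division of the scaffold at the inversion breakpoint can be pictured with a simplified sketch. It treats map positions at whole-section resolution, which is an assumption made only for illustration since the actual breakpoints lie within sections (75B and 81A), and the function name is hypothetical. The portion of the scaffold distal to the breakpoint is kept as is, and the inverted portion is reversed to recover the order of the standard polytene map.

```python
def restore_standard_order(genome_order, breakpoint_index):
    """Split at the breakpoint and reverse the inverted (proximal) part."""
    distal = genome_order[:breakpoint_index]
    inverted = genome_order[breakpoint_index:]
    return distal + inverted[::-1]


# Toy order of 3L sections along the genome-strain scaffold: sections 64-74
# are collinear with the standard map, sections 75-81 appear reversed.
genome_strain_order = list(range(64, 75)) + list(range(81, 74, -1))

split_at = genome_strain_order.index(81)  # first position of the inverted part
standard = restore_standard_order(genome_strain_order, split_at)
print(standard == list(range(64, 82)))  # True
```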
Muller element D (chromosome 2R, centromere to telomere, sections 45–63):
Only one scaffold, scaffold 13337, aligned with almost the entire D element, and it accounts for 23.3 Mb of the chromosome (Figure 7, supplemental Table 20). A total of 18 loci have been mapped on this chromosome arm, 3 of them using the tom transposable element (Matsubayashi et al. 1992). The centromeric region 45A-B remains unknown.
Muller element E (chromosome 2L, telomere to centromere, sections 21–44):
Four scaffolds cover almost the entire E element from the telomere to centromere for a total length of 33.2 Mb (Figure 7, supplemental Table 20). A total of 34 loci have been mapped on the chromosome. An unusual alignment was found in parts of scaffolds 13250, 13333, and 13043, which contain regions homologous to element A of D. melanogaster. The homologous positions on element E lie at 40B, but these could not be distinguished. Further analysis will be necessary to resolve whether these sequence anomalies are due to transposition or to problems with the assembly.
Muller element F (chromosome 4, section 100):
Sixteen scaffolds, totaling 17.8 Mb in length, contain F element orthologs (Table 1, supplemental Table 20). We do not find polytene bands associated with the fourth chromosome, which corresponds to Muller's F element. Mitotic preparations show that it is as long as the Y chromosome and is also entirely heterochromatic and likely resides entirely within the chromocenter in polytene preparations. These data are consistent with the increased size of the F element on the basis of cytology. Unfortunately, we could not determine the location of the corresponding scaffolds. We are currently applying FISH on mitotic chromosomes to locate these scaffolds.
As a preliminary analysis to determine whether repetitive DNA has contributed to the expansion of Muller F, we tested the 17.8-Mb scaffold sequences with RepeatMasker using the D. melanogaster settings for known repeats. The scaffolds from element F of D. ananassae contain an average of 32.5% interspersed repeats, composed of 27.1% retroelements and 5.4% DNA-based repeats. This is similar to the scaffolds assigned to the centromeric regions of the D. virilis genome.
D. pseudoobscura and D. persimilis maps
D. pseudoobscura and D. persimilis are Nearctic members of the obscura group of the genus Drosophila. The ancestral configuration of the six Muller elements in the obscura group is six acrocentric rods (Powell 1997), as seen in the Palaearctic obscura group species D. subobscura (Krimbas 1992). D. pseudoobscura and D. persimilis have evolved from this ancestral condition to have five chromosomes: X, 2, 3, 4, and 5. The metacentric X chromosome is composed of the left and right arms XL and XR that resulted from the fusion of Muller's A and D, respectively. The Y chromosome of D. pseudoobscura is composed of genes from Muller's element D, suggesting that the Y is the degenerate copy of the ancestral autosome (Carvalho and Clark 2005). Chromosomes 2, 3, 4, and 5 are acrocentric rods. The Muller element equivalences for the six chromosomal arms of the obscura group species (Muller element = chromosome arm) are A = XL, B = 4, C = 3, D = XR, E = 2, and F = 5.
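Because the Muller elements provide a common coordinate system, an arm in one species can be translated to its homologous arm in another by passing through the element letter. The sketch below uses only the equivalences stated in this report (the D. melanogaster assignments from the section headings, and the D. ananassae and obscura group assignments from the text); the table layout and function name are hypothetical. The D. pseudoobscura entries apply equally to D. persimilis.

```python
# Muller element -> chromosome arm, as stated in the text for each species.
MULLER_TO_ARM = {
    "D. melanogaster": {"A": "X", "B": "2L", "C": "2R",
                        "D": "3L", "E": "3R", "F": "4"},
    "D. ananassae": {"A": "XL and XR", "B": "3R", "C": "3L",
                     "D": "2R", "E": "2L", "F": "4"},
    "D. pseudoobscura": {"A": "XL", "B": "4", "C": "3",
                         "D": "XR", "E": "2", "F": "5"},
}


def homologous_arm(arm, source, target):
    """Translate an arm name from the source species to the target species."""
    for element, source_arm in MULLER_TO_ARM[source].items():
        if source_arm == arm:
            return MULLER_TO_ARM[target][element]
    raise KeyError(f"{arm!r} is not a listed arm of {source}")


print(homologous_arm("3L", "D. melanogaster", "D. pseudoobscura"))  # XR
print(homologous_arm("2L", "D. ananassae", "D. melanogaster"))      # 3R
```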
The D. pseudoobscura salivary chromosomal maps were originally developed by Tan (1935, 1937). The salivary chromosomes were divided into 100 sections across the six chromosomal arms in the following order: XL (1–17), XR (18–42), 2 (43–62), 3 (63–81), 4 (82–99), and 5 (100). The original section assignments were based on camera lucida drawings of the chromosomes and not on photomicrographs. Sections 38–42 were clearly delineated for the tip of chromosome XR; however, sections 18–37 were not because the drawings were based on polytene chromosomes from D. pseudoobscura/D. miranda hybrids (Dobzhansky and Tan 1936). Only chromosome 3 had its 19 sections subdivided into lettered subsections (Dobzhansky and Sturtevant 1938). Kastritsis and Crumpacker (1966) published photomicrographs of complete D. pseudoobscura salivary chromosomes; however, not all sections were labeled in their figures. Moore and Taylor (1986) published photomicrographs of complete D. persimilis salivary chromosomes with all sections indicated. In addition, Moore and Taylor included photomicrographs of the chromosomes of D. pseudoobscura–D. persimilis hybrids, allowing one to precisely map the breakpoints of the inversion differences between the two species.
The salivary chromosomal maps for the genome strains of D. pseudoobscura and D. persimilis are quite similar except for four inversion differences. Four of the six chromosomal arms have gene arrangement differences within or between the two species (Dobzhansky 1944). While these arrangement differences add complexity to the comparison of the D. pseudoobscura and D. persimilis genomic sequences, they act as valuable physical markers for orienting the genome scaffolds to the salivary chromosomes. Two chromosomes have fixed differences between the two species. Chromosome XL differs by a single fixed inversion that is a diagnostic character that distinguishes the two species (Anderson et al. 1977). Chromosome 2 also has a fixed inversion difference.
Two chromosomes are segregating for gene arrangement polymorphisms in natural populations of the two species (Dobzhansky 1944). Chromosome XR is segregating for two different gene arrangements in each species. D. pseudoobscura has two arrangements called standard and sex ratio that differ by three nonoverlapping inversions (Sturtevant and Dobzhansky 1936). The sex ratio chromosome is a meiotic drive element causing males that carry the chromosome to sire ∼95% daughters. D. persimilis populations also segregate for standard and sex ratio chromosomes; however, the two gene arrangements differ by a single inversion. The sex ratio chromosome of D. persimilis appears homosequential with the standard arrangement of D. pseudoobscura. The genome strains of D. pseudoobscura and D. persimilis carry the standard arrangements of XR. Thus, their XR maps differ by a single inversion difference.
In contrast to the other chromosomes, chromosome 3 has >30 different gene arrangements that segregate in populations of D. pseudoobscura and D. persimilis, making it a central focus of study in these species for >70 years (Dobzhansky and Sturtevant 1938; Dobzhansky 1944; Anderson et al. 1991). The D. persimilis genome strain carries the standard arrangement while the D. pseudoobscura genome strain carries the Arrowhead arrangement (Richards et al. 2005), which was derived from the standard arrangement by a single inversion (Dobzhansky 1944).
The Freeze 1 assembly of D. pseudoobscura used 755 scaffolds to create 16 supercontigs or ultrascaffolds that were anchored to five of the six chromosomal arms (Richards et al. 2005). Chromosomes 2 and 3 (Muller E and C) each had a single supercontig assigned to them, while chromosome arms XL, XR, and 4 (Muller A, D, and B) had 4, 5, and 5 supercontigs, respectively. At the time of the first annotation of D. pseudoobscura, no scaffold assignments were made for chromosome 5, the dot (Muller F). By virtue of the new genomes including the D. persimilis genome in the CAF1 assemblies, we are now able to assign scaffolds to the D. pseudoobscura Muller F.
We now present updated salivary chromosome maps of the D. pseudoobscura and D. persimilis genome strains. The maps present the boundaries of the sections and also now include lettered subsections. We have integrated all physical and genetic map data that allow one to orient the sequence scaffolds from the CAF1 assemblies of the two species. We present the data for D. pseudoobscura first because this species has the vast majority of mapping data available and indicate changes from the original assembly and annotation (Richards et al. 2005). The D. persimilis data are presented second because the whole-genome shotgun had 3× coverage and the larger number of scaffolds was oriented to the physical map using the D. pseudoobscura assembly.
Chromosome maps—D. pseudoobscura:
Muller element A (chromosome XL, sections 1–17):
Three scaffolds map to Muller A for a total of 20.3 Mb of the D. pseudoobscura genome (Figure 8, supplemental Table 21). The original assembly and annotation of D. pseudoobscura mapped four supercontigs to XL because the genes within the scaffold mapped to Muller A in D. melanogaster: ChXL_group1a, ChXL_group1e, ChXL_group3a, and ChXL_group3b. On the basis of the present analysis, two of the XL supercontigs, ChXL_group1a and ChXL_group3a, need to be split and remapped. Segarra and Aguadé (1992) and Segarra et al. (1995) used in situ hybridization to map phosphogluconate dehydrogenase (Pgd) and zeste (z) from Muller A to the base of XR in D. pseudoobscura. ChXL_group1a and ChXL_group3a were split into four and two sequences, respectively, to accommodate these results, and one segment from each scaffold was moved to XR. In addition, this move required that ChXL_group3b also be mapped to the base of XR. The junctions of the rejoined sequences on the two arms of the X are supported by conserved linkage in at least one other species.
The movement of genes from Muller A to Muller D is supported by additional evidence. First, the size of the XL and XR arms is not equivalent in D. pseudoobscura. Each euchromatic portion of Muller A and D in D. melanogaster has 25 Mb of DNA for a total of 50 Mb of sequence. The relative length of the two arms in D. pseudoobscura determined from measurements of salivary chromosomes shows that XL and XR are 41.2 and 58.8% of the complete X chromosome. Thus, XL and XR would be expected to have 20.6 and 29.4 Mb, respectively, assuming that DNA content on the two arms is conserved between D. melanogaster and D. pseudoobscura. The mapped scaffolds on XL and XR are 20.3 and 30.5 Mb, consistent with the relative proportions of the two polytenized chromosomal arms. Second, the Freeze 1 assembly of D. pseudoobscura had a scaffold (Contig6811_Contig7852) that included a junction between Muller A and D genes. The CAF1 assembly of D. persimilis also has a scaffold (scaffold_12) with a similar A/D junction. Although the D. persimilis assembly used the backbone of D. pseudoobscura to assist the assembly, a fresh assembly of D. pseudoobscura was generated for the backbone sequence. In both cases, the original assemblies support a join of scaffolds from Muller A and D within the XR chromosome arm.
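The expected arm sizes in the first argument follow directly from the figures quoted above. The following Python sketch reproduces that arithmetic under the same assumption the text makes (conserved DNA content of Muller A plus D between the two species); the function and variable names are illustrative only.

```python
# Expected DNA content of XL and XR, assuming (as in the text) that the
# combined euchromatic content of Muller A and D is conserved between
# D. melanogaster and D. pseudoobscura.


def expected_arm_sizes(total_mb, fractions):
    """Split total_mb among arms in proportion to their relative polytene lengths."""
    return {arm: round(total_mb * frac, 1) for arm, frac in fractions.items()}


# Values taken from the text.
TOTAL_A_PLUS_D_MB = 25 + 25                   # Muller A + Muller D in D. melanogaster
RELATIVE_LENGTH = {"XL": 0.412, "XR": 0.588}  # measured on salivary chromosomes
MAPPED_MB = {"XL": 20.3, "XR": 30.5}          # scaffolds mapped to each arm

expected = expected_arm_sizes(TOTAL_A_PLUS_D_MB, RELATIVE_LENGTH)

if __name__ == "__main__":
    for arm in ("XL", "XR"):
        diff = MAPPED_MB[arm] - expected[arm]
        print(f"{arm}: expected {expected[arm]} Mb, "
              f"mapped {MAPPED_MB[arm]} Mb, difference {diff:+.1f} Mb")
```

The comparison is only as good as the conservation assumption, so small differences between expected and mapped sizes should not be over-interpreted.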
The distances to the breakpoints for the fixed inversion difference between D. pseudoobscura and D. persimilis are found in supplemental Table 21 and Table 22. The length of the inverted region (6.8 Mb in D. pseudoobscura and 7.3 Mb in D. persimilis) is proportional to the size of the inversion inferred from salivary chromosomes (Moore and Taylor 1986). The inferred inversion lengths are similar, but not identical, which may reflect differences in the quality of the assemblies between the two species. D. persimilis had only 3× coverage, while D. pseudoobscura had 8× coverage.
Muller element B (chromosome 4, sections 82–99):
Eleven scaffolds map to Muller B in D. pseudoobscura for a total of 27.5 Mb of the genome (Figure 8, supplemental Table 21). The original assembly mapped five scaffolds to chromosome 4: Ch4_group1, Ch4_group2, Ch4_group3, Ch4_group4, and Ch4_group5 (Richards et al. 2005). The exact orientation and location of these scaffolds were not completely determined at the time of that publication. Papaceit et al. (2006) have increased the density of markers on the physical map of Muller B in D. pseudoobscura using five known loci and 19 anonymous DNA probes. No sequence information was available for these anonymous probes; however, these DNAs were hybridized to D. pseudoobscura, D. subobscura, and D. melanogaster, providing valuable clues about where these clones are found in the D. pseudoobscura sequence. The sequences for the anonymous probes in D. pseudoobscura were inferred from the region of hybridization observed in D. melanogaster. We identified the cytological coordinates for all D. melanogaster genes using the map conversion table in FlyBase (FlyBase Consortium 1999). From these data, we mapped the D. pseudoobscura orthologs of the D. melanogaster genes located in the region that hybridized to each of the anonymous probes. For instance, the anonymous probe P67 hybridized to region 21C of D. melanogaster, and 20 genes mapped within the approximate coordinates of 21C. The D. pseudoobscura orthologs of these 20 genes are linked within a single conserved linkage block found in a single sequence scaffold, Ch4_group3. P67 maps to 94B, near the known gene marker Mhc, which is also located in scaffold Ch4_group3 and maps to cytological location 95B. In some cases, the block of genes from D. melanogaster mapped to two or more conserved linkage blocks within the scaffolds of D. pseudoobscura. In these cases, each region was considered as the potential location for the anonymous hybridization probe.
The locations of known probes and unambiguously mapped anonymous probes were used to triangulate the locations of anonymous probes. Once an anonymous probe was mapped to a single sequence region, all the other locations were ruled out. These additional anonymous physical markers were able to place the five sequence scaffolds for Muller B in D. pseudoobscura.
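The probe-placement logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure actually used: the function names, the dictionary inputs, and all gene and scaffold labels in the toy data other than P67, 21C, and Ch4_group3 are hypothetical, and the "triangulation" step is reduced to checking which candidate scaffold also carries an already placed neighboring marker.

```python
from collections import Counter


def candidate_scaffolds(probe_region, mel_gene_region, ortholog_scaffold):
    """Count, per scaffold, the orthologs of D. melanogaster genes in probe_region."""
    hits = Counter()
    for gene, region in mel_gene_region.items():
        if region == probe_region and gene in ortholog_scaffold:
            hits[ortholog_scaffold[gene]] += 1
    return hits


def place_probe(candidates, neighbor_scaffolds):
    """Return the single supported scaffold for a probe, or None if unresolved.

    A probe with one candidate is placed directly. With several candidates,
    the one that also holds an already placed neighboring marker is kept.
    """
    if len(candidates) == 1:
        return next(iter(candidates))
    supported = [s for s in candidates if s in neighbor_scaffolds]
    return supported[0] if len(supported) == 1 else None


# Toy data (hypothetical gene names; regions R1 and scaffolds sX/sY are invented).
mel_gene_region = {"geneA": "21C", "geneB": "21C", "geneC": "21C",
                   "geneD": "R1", "geneE": "R1"}
ortholog_scaffold = {"geneA": "Ch4_group3", "geneB": "Ch4_group3",
                     "geneC": "Ch4_group3", "geneD": "sX", "geneE": "sY"}

p67 = candidate_scaffolds("21C", mel_gene_region, ortholog_scaffold)
ambiguous = candidate_scaffolds("R1", mel_gene_region, ortholog_scaffold)

if __name__ == "__main__":
    print(place_probe(p67, set()))         # single candidate, placed directly
    print(place_probe(ambiguous, {"sY"}))  # resolved by a neighboring marker
    print(place_probe(ambiguous, set()))   # left unresolved
```

Returning `None` for unresolved probes keeps ambiguous placements visible rather than forcing a choice, which mirrors the text's treatment of each region as a potential location until other markers rule it out.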
Three scaffolds, Ch4_group1, Ch4_group2, and Ch4_group4, had to be split and reoriented on the basis of the physical map data and conservation of linkage relationships among the different Drosophila species. In addition, eight more scaffolds were mapped to Muller B by virtue of the Synpipe analysis (Bhutkar et al. 2006).
Muller element C (chromosome 3 Arrowhead arrangement, sections 63–81):
A single supercontig (Ch3) was generated in the original D. pseudoobscura assembly that accounted for 19.8 Mb of the genome (Figure 8, supplemental Table 21). Four new scaffolds were integrated on the physical map, three at the centromeric end and one inserted in the middle of chromosome 3 (supplemental Table 15). These changes were made on the basis of conserved linkage information among the 11 Drosophila species.
The distances to the breakpoints for the standard-to-Arrowhead inversion event between D. pseudoobscura and D. persimilis are found in supplemental Table 21 and Table 22. The size of the inverted region (5.9 Mb in D. pseudoobscura and 5.9 Mb in D. persimilis) is proportional to the size of the inversion inferred from salivary chromosomes (Dobzhansky and Sturtevant 1938).
Muller element A/D (chromosome XR, sections 18–42):
Seven scaffolds map to Muller A/D in D. pseudoobscura for a total of 30.5 Mb of the genome (Figure 8, supplemental Table 21). The original assembly mapped five scaffolds to Muller A/D in D. pseudoobscura: ChXR_group6, ChXR_group9, ChXR_group8, ChXR_group3a, and ChXR_group5 (Richards et al. 2005). Scaffold ChXR_group9 was merged with ChXR_group8 because the two scaffolds were adjacent; however, the orientation of half of the original ChXR_group8 appears to be reversed. Three lines of evidence are inconsistent with the original assembly: the configuration of the two inversion breakpoints between D. pseudoobscura and D. persimilis, analysis of D. persimilis original sequence traces with associated mate pairs, and linkage mapping data (Ortiz-Barrientos et al. 2006). Additionally, three scaffold segments that were incorrectly assigned to XL were added to XR (see above): ChXL_group1a, ChXL_group3a, and ChXL_group3b.
The distances to the breakpoints for the fixed inversion between D. pseudoobscura and D. persimilis are found in supplemental Table 21 and Table 22. The size of the inverted region (13.2 Mb in D. pseudoobscura and 12.2 Mb in D. persimilis) is proportional to the size of the inversion inferred from salivary chromosomes (Moore and Taylor 1986).
Analysis of a 25,989-bp nucleotide sequence within the A/D junction region reveals a 1039-bp repeat sequence that generates 11,457 hits in a BLASTN (Altschul et al. 1997) search of D. pseudoobscura genomic scaffolds. These BLAST hits are overrepresented in Muller F scaffolds as well as in the set of unplaced scaffolds, two scaffold sets likely to be enriched for heterochromatic DNA. This sequence was also found in D. persimilis in roughly the same location.
Muller element E (chromosome 2, sections 43–62):
A single supercontig (Ch2) was generated in the original D. pseudoobscura assembly that accounted for 30.8 Mb of the genome (Figure 8, supplemental Table 21). The distances to the breakpoints for the fixed inversion difference between D. pseudoobscura and D. persimilis are found in supplemental Table 21 and Table 22. The size of the inverted region (7.6 Mb in D. pseudoobscura and 7.7 Mb in D. persimilis) is proportional to the size of the inversion inferred from salivary chromosomes (Moore and Taylor 1986).
Muller element F (chromosome 5, section 100):
A total of 25 scaffolds were assigned to Muller F in D. pseudoobscura for a total of 1.2 Mb of the genome (Figure 8, supplemental Table 21). Five additional scaffolds can be assigned on the basis of comparison to the D. persimilis assembly. The scaffold map for Muller F was not well defined in the original assembly and annotation of D. pseudoobscura (Richards et al. 2005), largely because few genetic and physical markers exist for the dot chromosome. The availability of sequence information from the other species of Drosophila provides valuable clues about the order of the Muller F scaffolds using the Synpipe analysis (Bhutkar et al. 2006). The genes of the dot assemble into one supercontig in D. virilis, D. mojavensis, and D. willistoni and into four supercontigs in D. persimilis. These data allowed the scaffolds to be ordered but not oriented in D. pseudoobscura on the basis of conserved linkage analysis.
Chromosome maps—D. persimilis:
The coverage of the D. persimilis genome was 3×, which led to a more fragmented genome assembly. The D. persimilis assembly was assisted with a new ARACHNE assembly of the D. pseudoobscura genome that differed from the ATLAS assembly done by the Human Genome Sequencing Center at Baylor College of Medicine. This process introduced some errors in the D. persimilis assembly of Muller C scaffolds. The scaffolds of D. persimilis were oriented to the salivary chromosomes on the basis of homology with D. pseudoobscura, which had a more detailed genetic and physical map. The karyotype of D. persimilis is standard on Muller C.
Muller element A (chromosome XL, sections 1–17):
Twenty-one scaffolds map to Muller A in D. persimilis for a total of 21.2 Mb of the genome (Figure 9, supplemental Table 22). The distances to the breakpoints for the fixed inversion difference between D. pseudoobscura and D. persimilis are found in supplemental Table 21 and Table 22.
Muller element B (chromosome 4, sections 82–99):
A total of 12 scaffolds map to Muller B in D. persimilis for a total of 28.4 Mb of the genome (Figure 9 and supplemental Table 22).
Muller element C (chromosome 3 standard arrangement, sections 63–81):
Eleven scaffolds map to Muller C in D. persimilis for a total of 19.9 Mb of the genome (Figure 9 and supplemental Table 22). Two scaffolds (scaffold_2 and scaffold_4) required breakage and rejoining to account for the distance between the standard and Arrowhead inversion breakpoints. The cytogenetic evidence shows that the distance between breakpoints is 5.9 Mb or 29.8% of Muller C in the two species. Both breakpoints for this inversion that were identified by Richards et al. (2005) map within scaffold_4 of the D. persimilis assembly, but are separated by only 1.4 Mb within the scaffold. Polytene chromosomes in hybrids between the D. pseudoobscura and D. persimilis genome strains confirm that there is a single inversion difference between the two species (S. W. Schaeffer, unpublished data). The junctions at the scaffold breaks and rejoins are supported by conserved linkage relationships among the 12 Drosophila species (S. W. Schaeffer, unpublished data).
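The reason for breaking and rejoining the two scaffolds can be seen from a short calculation. The sketch below is illustrative only; it assumes that the 19.8-Mb D. pseudoobscura Muller C total is the denominator behind the 29.8% figure, which the text does not state explicitly, and the names are hypothetical.

```python
def percent_of_arm(segment_mb, arm_mb):
    """Express a segment length as a percentage of the arm length."""
    return 100.0 * segment_mb / arm_mb


INVERSION_MB = 5.9             # cytogenetic distance between breakpoints (from the text)
ARM_MB = 19.8                  # assumption: D. pseudoobscura Muller C total as denominator
ASSEMBLED_SEPARATION_MB = 1.4  # breakpoint separation within scaffold_4 (from the text)

share = percent_of_arm(INVERSION_MB, ARM_MB)
shortfall = INVERSION_MB - ASSEMBLED_SEPARATION_MB

if __name__ == "__main__":
    print(f"inversion spans {share:.1f}% of the arm")
    print(f"assembled separation is {shortfall:.1f} Mb short of the cytogenetic distance")
```

A shortfall of this size relative to the cytogenetic distance is the kind of discrepancy that motivates splitting a scaffold rather than accepting the assembled order.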
Muller element A/D (chromosome XR, sections 18–42):
Fifty-two scaffolds map to Muller A/D in D. persimilis for a total of 29.2 Mb of the genome (Figure 9, supplemental Table 22). The distances to the breakpoints for the fixed inversion difference between D. pseudoobscura and D. persimilis are found in supplemental Tables 21 and 22.
Muller element E (chromosome 2, sections 43–62):
Five scaffolds map to Muller E in D. persimilis for a total of 31.7 Mb of the genome (Figure 9, supplemental Table 22). The distances to the breakpoints for the fixed inversion difference between D. pseudoobscura and D. persimilis are in supplemental Table 22.
Muller element F (chromosome 5, section 100):
Five scaffolds are assigned to Muller F in D. persimilis for a total of 1.5 Mb of the genome (Figure 9, supplemental Table 22), but the scaffolds are not oriented to the map because of the lack of genetic or physical markers.
D. willistoni maps
Members of the D. willistoni species group are found in the biomes of the neotropical region and account for >80% of the drosophilid fauna collected from fruit baits. This is particularly true if we consider the ubiquity of some species of the D. willistoni subgroup: D. willistoni, D. paulistorum, D. tropicalis, and D. equinoxialis, especially in the Amazonian region (Martins 1987). The willistoni species group is a clade of ∼25 described species (Bächli 2006). This group is restricted to the neotropical region and is basal to the Palaearctic and Nearctic members of the melanogaster and obscura groups. D. willistoni is one of the most widely distributed Drosophila species in the New World and can be found in the southern United States, throughout Central America and the Caribbean, and in southern South America, in Argentina (Patterson and Stone 1952; Spassky et al. 1971). Along with their sister taxon, the saltans group, the willistoni species lack some of the defining secondary sexual characters (e.g., sex combs) found in these more derived groups (O'Grady and Kidwell 2002). Gleason et al. (1998) have provided a phylogeny of the willistoni species group based on the nucleotide sequences of one mitochondrial and two nuclear genes. On the basis of their analyses, D. willistoni is most closely related to D. tropicalis, and these two taxa compose the sister group to a clade containing D. equinoxialis, D. paulistorum, and D. pavlovskiana. Phylogenies inferred from chromosome rearrangement events on Muller B, however, showed that D. willistoni is the sister group to D. tropicalis, D. equinoxialis, and D. paulistorum (Rohde et al. 2006).
The first reference map of the polytene chromosomes of D. willistoni was drawn by Dobzhansky (1950), who defined the standard karyotype from a strain collected from the Belém population in northern Brazil. Strains from the Belém population were relatively free from inversion variation, which allowed a standard polytene map to be developed (Dobzhansky 1950). The salivary chromosomes were divided into 100 sections across the five chromosomal arms in the following order: XL (Muller element A, 1–16), XR (Muller element D, 17–36), IIL (Muller element C, 37–55), IIR (Muller element B, 56–77), and III (Muller element F·E, 78–100).
Spassky and Dobzhansky (1950) isolated mutant strains from the Belém, Brazil, population to develop a genetic map. These genetic maps provide some useful markers for orienting the sequence scaffolds from D. willistoni, but these data must be viewed with caution because it is not always obvious if the new photographic maps developed for this project are homosequential with the Dobzhansky (1950) maps. D. willistoni has extensive gene arrangement polymorphism on all chromosomes, which has been described from chromosomal variability in natural populations (da Cunha et al. 1950, 1959; da Cunha and Dobzhansky 1954; Valente and Araujo 1985, 1986; Valente et al. 1993, 2001, 2003; Rohde 2000; Rohde et al. 2005).
Regner et al. (1996) developed the first photographic maps for D. willistoni using the Dobzhansky (1950) drawings to demarcate the major chromosomal sections. In parallel with those studies, in situ hybridization mapping studies were used to establish the homology among the different arms and delineate rearrangement breakpoints of D. willistoni. Bonorino et al. (1993) mapped the Hsp70 locus in seven species within the D. willistoni subgroup and in D. nebulosa. The CuZn Sod (Rohde et al. 1994) and Alcohol dehydrogenase (Adh) (Rohde et al. 1995) genes were also mapped in this species group. Rieger (1999) mapped nine genes in D. willistoni (Hsp8, Hsr-omega, Hsp 27, Ubi, BRC, E74, E75, 71E, and Sgs5) using heterologous probes from the D. melanogaster genome.
Rohde (2000) improved the photographic maps of Regner et al. (1996) after the analysis of at least 10 chromosomal arms per strain from 22 isofemale lines from populations representative of almost all geographical regions of the species. The standard arrangement of the X chromosome presented by Regner et al. (1996) was changed because this chromosome had a less common karyotype found throughout the geographical distribution of the species. First, autosome (chromosomes II and III) patterns were maintained according to Regner et al. (1996), following the order presented by Dobzhansky (1950). The maps were modified to redefine the boundaries of the numbered sections in each chromosomal arm within interband regions. Second, each section was divided into lettered subsections (starting with A in the proximal region), which allow higher precision in the description of inversion breakpoints and in situ localization of probes. Third, sections 81 and 80 of the third chromosome, III, were returned to their original order as described by Dobzhansky (1950). These modifications made by Rohde (2000) improved the photographic map and allowed higher precision in the description of inversion breakpoints and the in situ localizations.
Chromosome maps—D. willistoni:
In the photomap presented in Figure 10, all chromosomes are oriented in the same way, with the proximal (centromeric) regions at the left and the distal (telomeric) regions at the right. Despite some modifications, the section boundaries established by Dobzhansky (1950) and Rohde (2000) were preserved. At this time, there are limited numbers of genetic and physical markers for D. willistoni so the orientation of the scaffolds should be viewed as provisional.
Muller element A (chromosome XL, sections 1–16):
Five scaffolds map to Muller A in D. willistoni, comprising a total of 27.9 Mb of genomic sequence (Figure 10, supplemental Table 23). The five scaffolds were joined on the basis of the conserved synteny at the ends of the scaffolds. The orientation was determined on the basis of the location of the br locus, which maps close to the telomere (Spassky and Dobzhansky 1950). The order of sn, f, sc, y, lz, and w on the genetic map is consistent with their order within scaffold 181096. The genetic locations of Notch and cut, however, are at opposite ends of the scaffold map. This may result from an inversion difference between the genetic mapping strains and the genome strain. The orientation of these scaffolds should be viewed with caution.
Muller element B (chromosome 2R, sections 56–77):
Eight scaffolds map to Muller B in D. willistoni, comprising a total of 32.3 Mb of genomic sequence (Figure 10, supplemental Table 23). The eight scaffolds were joined on the basis of the conserved synteny at the ends of the scaffolds. Two markers, Adh and Cl, provide a tentative orientation for the scaffold map, although Adh is more distal on the genetic map (Lakovaara and Saura 1972), again suggesting an inversion difference between the genome strain and the mapping strain.
Muller element C (chromosome 2L, sections 37–55):
Eight scaffolds map to Muller C in D. willistoni, comprising a total of 29.6 Mb of genomic sequence (Figure 10, supplemental Table 23). The eight scaffolds were joined on the basis of the conserved synteny at the ends of the scaffolds. One assembly error was detected for one of the scaffolds on this arm, scaffold 181009. Nucleotides 1–2,748,112 had 231 orthologous gene calls on Muller C and nucleotides 2,841,197–3,492,693 had 80 orthologous gene calls on Muller D. This scaffold was therefore split between the two Muller elements. Two distal genetic markers, px and bw, suggest the orientation of the scaffold map relative to the polytene chromosomes.
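Detecting a misjoin of this kind amounts to walking along a scaffold and looking for the point where the Muller element of the orthologous gene calls changes. The Python sketch below shows one simple way to do this; it is an assumption about how such a check could be written, not the procedure used for scaffold 181009, and the toy coordinates are invented.

```python
def runs_by_element(gene_calls):
    """Group (start, end, element) gene calls into consecutive same-element runs.

    Calls are sorted by start coordinate; each run records its element, the
    span it covers, and the number of gene calls it contains.
    """
    runs = []
    for start, end, element in sorted(gene_calls):
        if runs and runs[-1]["element"] == element:
            runs[-1]["end"] = max(runs[-1]["end"], end)
            runs[-1]["genes"] += 1
        else:
            runs.append({"element": element, "start": start,
                         "end": end, "genes": 1})
    return runs


def candidate_breaks(runs):
    """Return (left_end, right_start) intervals between runs of different elements."""
    return [(a["end"], b["start"]) for a, b in zip(runs, runs[1:])]


# Toy gene calls on a hypothetical scaffold (coordinates are invented).
toy_calls = [(1, 1000, "C"), (1500, 2500, "C"),
             (3000, 4000, "D"), (4500, 5200, "D")]
toy_runs = runs_by_element(toy_calls)

if __name__ == "__main__":
    for run in toy_runs:
        print(run)
    print(candidate_breaks(toy_runs))
```

In practice a single stray ortholog call would create a spurious run, so a real check would likely require a minimum number of genes per run before proposing a split.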
Muller element D (chromosome XR, sections 17–36):
Eight scaffolds map to Muller D in D. willistoni, comprising a total of 29.6 Mb of genomic sequence (Figure 10, supplemental Table 23). The eight scaffolds were joined on the basis of the conserved synteny at the ends of the scaffolds. Bases 2,841,197–3,585,778 in scaffold 181009 map to Muller D. The genetic and physical markers do not clearly resolve the orientation of the scaffolds to the cytogenetic map. A tentative placement is shown in Figure 10.
Muller element F·E (chromosome 3, sections 78–100):
Three scaffolds map to Muller F·E in D. willistoni, comprising a total of 33.7 Mb of genomic sequence (Figure 10, supplemental Table 23). The three scaffolds were joined on the basis of the conserved synteny at the ends of the scaffolds. Base 1 of scaffold 181130 was anchored to the centromeric region of chromosome 3 on the basis of the three genes from the fused dot chromosome that hybridize to section 78 on the polytene map (Papaceit and Juan 1998). Probes for the Xdh locus map to the central region of the chromosome, which is consistent with the location of Xdh within scaffold 181089. Thus, the orientation of the F·E scaffolds is the best supported of the five major chromosomal arms of D. willistoni.
The junction between Muller F and E genes is located in scaffold 181130 and is defined between genes CG34036-PA of F and CG17119-PA of E and between nucleotides 2,014,728 and 2,029,101. The 14-kb sequence has no assembly gaps and is 64.7% A + T. This region was used in a BLASTN search to determine if the sequence provides any clues about the fusion and rearrangement process. The BLASTN comparison of the F·E junction region with the D. willistoni genome reveals two repeats, of 107 bp (2,022,283–2,022,362) and 141 bp (2,024,912–2,025,052), that are found ubiquitously throughout the genome. The majority of hits within D. willistoni match unplaced scaffolds, suggesting that this region may have been heterochromatic at one time. These repeats are also found in the Drosophila genomes, but the length of the repeats is 47 bp. A BLASTN search of the sequence between the two repeats finds a match to a P element of D. sturtevanti (GenBank accession no. AY578784). This could suggest a role for transposable elements in the F·E fusion and subsequent rearrangement.
D. virilis maps
Analyses of genes and genomes are a primary focus of research efforts on D. virilis and its close relatives. The arrangement of chromosomes in D. virilis represents the inferred karyotype of the common ancestor of the genus Drosophila (reviewed by Clayton and Guest 1986), and therefore, it has been used frequently as an outgroup for a variety of studies of the Sophophora lineage, especially for comparative sequence analyses of genes in D. melanogaster. Each chromosomal element in the D. virilis genome exists as an independent acrocentric arrangement (or rod) with a near-terminal centromere, so the diploid number is 12, including the small pair of dot chromosomes. Homology between these individualized elements and the chromosomal arms of the metacentric chromosomes of D. melanogaster was demonstrated by early studies using cross-species hybridization (Loukas and Kafatos 1986; Whiting et al. 1989), thus providing molecular confirmation of the classically inferred homology among Muller elements. The relationships between the Muller element and the numbering of the six chromosomes of D. virilis are (Muller element = chromosome) A = X, B = 4, C = 5, D = 3, E = 2, and F = 6.
Several investigators produced the original drawn maps of the polytene chromosomes of D. virilis and developed different nomenclatures for the banding patterns (Fujii 1936, 1942; Hughes 1936, 1939; Patterson et al. 1940b; Hsu 1952). However, the photographic map and nomenclature developed by Gubenko and Evgen'ev (1984) have been adopted as the standard for physical mapping on the polytene chromosomes of D. virilis. Under their system, the X chromosome is divided into 19 sections numbered consecutively with section 1 beginning at the telomere and section 19 ending at the proximal junction between the polytene chromosome and chromocenter. Each section of the X is further subdivided into regions A–D. The major autosomes are divided into 10 sections each with further subdivision designated by lettering, and the dot chromosome consists of a single section, 60, with lettered subdivisions. This demarcation of the polytene bands and associated nomenclature was further refined in the graphic map developed by Kress (1993). The nomenclature for the banding pattern in the polytene chromosomes contained in these maps has been used almost exclusively in physical mapping studies of D. virilis. Thus, it is unnecessary to consider older representations of the polytene chromosomes with alternative nomenclatures, and developing a new nomenclature would only create confusion.
Over 75 different D. virilis gene sequences have been obtained in studies of conservation within and among genes to identify functional constraints. Many investigators have used a widely shared λ-phage genomic library (Blackman and Meselson 1986) to isolate and sequence a large portion of these genes. In addition to revealing patterns of sequence conservation and divergence within and flanking single genes, conservation of nested gene structures has also been demonstrated (Kaymer et al. 1997). However, early studies of gene arrangement quickly revealed that the appearance of highly integrated overlapping gene structures in D. melanogaster is not necessarily an indication of evolutionary conservation, given the plethora of genome rearrangements between these genomes (Neufeld et al. 1991; Von Allmen et al. 1996). Performing comparative analyses to identify conserved regions of genes and gene clusters in D. melanogaster is now a straightforward endeavor, given the availability of complete genome sequences of D. virilis and 11 other Drosophila species (Drosophila 12 Genomes Consortium 2007).
Emergence of a sparse physical map of loci distributed throughout the genome of D. virilis was one indirect outcome of these comparative sequence analyses that frequently involved in situ localization of the genes. In addition to these individually mapped genes, a dense physical map was constructed by D. Hartl's lab using in situ hybridization to localize large-insert P1 clones at single chromosomal positions (Lozovskaya et al. 1993; Vieira et al. 1997b). Although sequences were originally obtained from only select clones containing a few targeted genes, microsatellite loci and putative gene regions have since been identified from partial sequences of the inserts of other mapped P1 clones (Schlötterer 2000; Huttunen and Schlötterer 2002; McAllister and Evans 2006; Vieira et al. 2006). Comparative genomic analyses using D. virilis as one reference species have also added loci to the physical map (Vieira et al. 1997a; Ranz et al. 1999; Päällysaho et al. 2001). Physically mapped positions within the genome and their associated sequences represent a critical link for associating an assembled genome sequence with the underlying arrangement of chromosomes. The dense physical map of D. virilis provides a unique opportunity to independently evaluate the accuracy of the assembly and to orient the sequence along each chromosome arm.
Availability of a well-resolved physical map made D. virilis a logical choice for genome sequencing, but this species could have presented unique challenges in whole-genome sequencing because of a much larger genome size relative to other Drosophila species. The total genome size of D. virilis is estimated to be about twice that of D. melanogaster (Kavenoff and Zimm 1973; Laird 1973). However, ∼40% of this enlarged genome is composed of highly repetitive satellite DNAs (Gall et al. 1971; Schweber 1974) that are likely sequestered in the heterochromatic regions adjacent to the centromeres of its acrocentric chromosomes (Beck 1977). Consistent with an unequal expansion of heterochromatic and euchromatic genomic regions, a comparison of a limited number of introns showed their mean length to be only 39% longer in D. virilis relative to D. melanogaster (Moriyama et al. 1998). Enlargement of both heterochromatic and euchromatic genomic regions persists despite there being a deletion bias that causes a high intrinsic rate of loss of repetitive sequences (Petrov et al. 1996; Petrov and Hartl 1998). Availability of the sequence of the considerably enlarged genome of D. virilis provides a substrate for gaining insight into the regulation of genome size and the asymmetric portioning of this excess into heterochromatic and euchromatic domains.
Although the distinction between heterochromatin and euchromatin is quite strong in the Drosophila genome, and changes in gene position relative to these domains lead to the well-known phenomenon of position-effect variegation, initial comparative studies indicate little conservation in gene content of heterochromatin between D. virilis and D. melanogaster. Both light and Dbp80 are heterochromatic in D. melanogaster, but are located within euchromatic regions of D. virilis (Yasuhara et al. 2005; Schulze et al. 2006). The RpL15 gene adjacent to Dbp80 in D. melanogaster does appear to be a conserved heterochromatic gene; however, it has moved between chromosomal elements in the melanogaster group, and independently, it has moved to a euchromatic position in D. pseudoobscura (Schulze et al. 2006). Flux in the gene content of the heterochromatin of the major chromosomes contrasts with conservation in the content of the unique dot chromosome (Riddle and Elgin 2006; Slawson et al. 2006). Another conserved chromosomal feature is the presence of TART elements at the telomeres (Casacuberta and Pardue 2003). Organization of the assembled genome sequence of D. virilis relative to markers in the euchromatic regions of the polytene chromosomes will provide initial guidance for identifying further movement of genes between these chromatin domains and will serve as a reference for conducting comparative genomic analyses in the subgenus Drosophila.
Chromosome maps—D. virilis:
Availability of a dense physical map, coupled with the identification of conserved syntenic groups at scaffold edges, suggests that most of the euchromatic coding genes are present in 22 scaffolds containing ∼150.7 Mb of sequence organized along each chromosomal arm. Furthermore, 10 scaffolds containing ∼12.2 Mb of sequence and likely representing centromeric regions of specific chromosomal elements are also identified (supplemental material), but these scaffolds are not oriented along the chromosomes.
Muller element A (chromosome X, sections 1–19):
The X chromosome of D. virilis has a high marker density due to the large number of individually mapped genes. A total of 105 physically mapped markers anchor six scaffolds along the X chromosome (Figure 11), and 14 additional markers are present in the genome sequence at positions that are inconsistent with their reported chromosomal location (supplemental Table 24). Because the markers showing inconsistencies are distributed throughout the chromosome, these disagreements most likely arise from incorrect assignment of map positions rather than from errors in the assembled sequence. The set of markers that order and orient the scaffolds spans cytological divisions 1A at the telomere to 19C near the centromere, which correspond to sequences in scaffold 13042 (A1) and scaffold 12970 (A6), respectively. Only five markers are represented in the centrally positioned 0.8-Mb scaffold 12472 (A5). Although these markers position the scaffold at cytological positions 13B and 13C, the orientation of the scaffold is not clearly delimited with these mapped positions. The analysis of conserved syntenic gene arrangement orients this scaffold by inferring joins with the flanking scaffolds. Position and orientation of the other five scaffolds are strongly supported by physically mapped markers, by loci within the linkage map, and by conserved syntenic gene arrangement at the junctions between scaffolds (Figure 11). Conservation of syntenic groups identified in the other sequenced genomes indicates no missing genes in the gaps that currently exist at the junctions between the scaffolds. These six scaffolds positioned on the X chromosome contain 30.5 Mb of the assembled genome sequence.
Correspondence between the linkage map and the assembled genome sequence is the strongest for the X chromosome. Six intervals in the linkage map are represented by loci identified within the same scaffold; thus, these intervals provide an estimate of the relationship between physical distance and recombination rate. The six intervals average 131 kb/cM, with an approximately twofold range from 75 to 158 kb/cM. This high level of recombination had been previously inferred from the overall length of the 170-MU linkage map of element A. Interestingly, the highest recombination rate is estimated for the interval flanked by yellow and scute near the telomere of the X, which is a region that exhibits an unusually low level of recombination in populations of D. melanogaster collected in North America (Aguadé et al. 1989).
Muller element B (chromosome 4, sections 40–49):
A total of 90 physically mapped markers anchor three scaffolds on chromosome 4 of D. virilis (Figure 11; supplemental Table 24). The most distal marker, corresponding to a P1 clone with sequence in scaffold 13246 (B1), is located in the telomeric cytological band 40A. The most proximal marker is the tandemly duplicated Adh gene at 49B (Nurminsky et al. 1996; Charlesworth et al. 1997), which is present in scaffold 12723 (B3). Twelve additional markers have reported map positions along chromosome 4, but their positions in the assembly are inconsistent with flanking markers and they likely represent cases of erroneous map positions (supplemental Table 24). Conservation of syntenic blocks orients these scaffolds in the same manner as the mapping data, and the conservation of gene order at scaffold edges indicates that known coding genes are not missing in the gaps. Therefore, the three scaffolds (13246, B1; 12963, B2; and 12723, B3), which represent 28.7 Mb of the assembly, appear to contain the entire euchromatic region of chromosome 4 (Figure 11).
Interestingly, loci on the linkage map of chromosome 4 are not congruent with the arrangement of the scaffolds inferred with physical markers and conserved synteny. One endpoint of the linkage map, plexus (Dvir\px) at the zero position, is located near the distal end of the genome sequence within scaffold 13246 (B1) using the putative ortholog Dmel\net as a BLAST query (Figure 11). At the proximal end, the terminal locus black (Dvir\bl) corresponds to the appropriate position within scaffold 12723 (B3), using the putative homolog Dmel\b to locate the corresponding gene within the genome sequence. Intermediate positions on the linkage map, however, are not consistent with this arrangement (supplemental Table 24). This inconsistency has several possible causes, such as errors in the assembly, incorrect assignment of homology between mutant phenotypes, or errors in the linkage map. Appearance of systematic disagreement between positions in the linkage map and the oriented sequence indicates that reported positions on the linkage map of chromosome 4 may comprise two linkage groups constructed in opposite directions along this chromosome.
Muller element C (chromosome 5, sections 50–59):
Four scaffolds of the assembled genome sequence are oriented on chromosome 5 using physical map data representing 68 markers (Figure 11). Mapped positions span the entire polytene chromosome map from the telomere at cytological band 50A, which corresponds with the sequence of scaffold 12823 (C1), to the pericentromeric heterochromatin at cytological band 59F, which corresponds to the sequence of scaffold 13324 (C4). Conserved syntenic gene arrangement identified only a single scaffold join on chromosome 5, represented by the telomeric scaffold (12823, C1) and the adjacent subtelomeric scaffold 10324 (C2).
As indicated in supplemental Table 24, five markers distributed throughout chromosome 5 are located in the genome sequence at positions that are inconsistent with their reported positions on the physical map. More importantly, a block of markers contained in the large scaffold positioned at the center of chromosome 5 exhibits a shared inconsistency with the reported map positions, thus indicating a likely error in the assembly (Figure 11). A block of markers localized on the cytological map at subdivisions 51B/C corresponds with the interval between positions 636 and 906 kb in the sequence of scaffold 12875. However, from position 1732 kb to position 20,532 kb, the markers in this scaffold run from cytological band 59E to band 51F. This disagreement between the sequence and groups of markers indicates that scaffold 12875 does not represent the correct organization of chromosome 5. Furthermore, conservation of a syntenic group consisting of putative orthologs of Dmel\Rya-r44F at position 1735 kb in scaffold 12875 and Dmel\CG8740 near the end of scaffold 13324 is consistent with an erroneous join within or between contigs in scaffold 12875 near position 1730 kb. Thus, the segment from base 1 to ∼1700 kb of scaffold 12875 is likely to be oriented in the minus direction around 51B/C (designated as C3′ in Figure 11). This reassignment resolves the apparent discrepancy between widely separated positions in the sequence corresponding with markers located within cytological section 59. One of these markers is a P1 clone from a library with an average insert size of 65.8 kb (Lozovskaya et al. 1993). This P1 clone has one end sequence located at position 1732 kb in scaffold 12875 (C3) and the other near the edge of scaffold 13324 (C4), which further suggests a join between an internal position within scaffold 12875 and scaffold 13324.
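The reasoning in this paragraph can be illustrated with a small check on marker spacing. The following is a minimal sketch, not the procedure used in this study: it flags adjacent markers on a scaffold whose cytological positions change far faster per megabase than the chromosome-wide average. The function names, the numeric encoding of lettered subdivisions, and the threefold tolerance are all assumptions made for illustration; the four markers are the interval endpoints quoted above, with 51B/C simplified to 51B.

```python
def cyto_rank(band):
    """Convert a band label such as '59E' to a sortable number.

    Assumption: lettered subdivisions are ranked A=0, B=1, ... and scaled
    by 0.1, so '59E' becomes 59.4. Only relative spacing matters here.
    """
    section, letter = int(band[:-1]), band[-1]
    return section + 0.1 * (ord(letter) - ord("A"))


def suspicious_intervals(markers, expected_sections_per_mb, tolerance=3.0):
    """Return (start_kb, end_kb) pairs of adjacent markers whose change in
    map position per Mb exceeds `tolerance` times the expected rate."""
    ordered = sorted(markers)  # order markers by sequence coordinate
    flagged = []
    for (pos_a, band_a), (pos_b, band_b) in zip(ordered, ordered[1:]):
        mb = (pos_b - pos_a) / 1000.0
        if mb <= 0:
            continue
        rate = abs(cyto_rank(band_b) - cyto_rank(band_a)) / mb
        if rate > tolerance * expected_sections_per_mb:
            flagged.append((pos_a, pos_b))
    return flagged


# Endpoints quoted in the text for scaffold 12875 (position in kb, band).
markers_12875 = [(636, "51B"), (906, "51B"), (1732, "59E"), (20532, "51F")]

# Ten map sections (50-59) spread over the 27.3 Mb assigned to chromosome 5.
expected = 10 / 27.3

flagged = suspicious_intervals(markers_12875, expected)
print(flagged)  # [(906, 1732)]
```

Under these assumptions, the only flagged interval lies between 906 and 1732 kb, which brackets the position near 1730 kb where the text infers an erroneous join.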
Although this apparent error in the assembly of scaffold 12875 affected the inference of junctions between scaffolds on the basis of the conserved syntenic groups, no other scaffolds (>5 kb) contain putative orthologs of genes on element C. Therefore, these four scaffolds, which contain a total of 27.3 Mb, apparently represent the sequence of chromosome 5 without any large gaps in the regions between the organized scaffolds.
Muller element D (chromosome 3, sections 30–39):
The largest scaffold in the assembled genome sequence (scaffold 13049 at 25.1 Mb) is anchored onto chromosome 3 with 107 markers distributed between cytological bands 30B and 39F (D4 in Figure 11). Two additional scaffolds contain markers mapping to the telomeric subdivisions 30A and 30B; however, the single marker in the most distal scaffold (10322, D1) does not orient the sequence, and the markers in the subtelomeric scaffold (12758, D3) provide only a tentative arrangement of this sequence. Conservation of syntenic groups, however, resolves the order and orientation as a single block consisting of these scaffolds and an additional small scaffold in the subtelomeric region (D2). Genes present in the syntenic group identified from the other genome sequences are missing at the inferred join between D3 and D4, which is represented by the gap in Figure 11. Overall, four scaffolds encompassing a total of 26.7 Mb of the assembled genome contain the sequence of chromosome 3.
The linkage group representing chromosome 3 is associated with the genome sequence through two loci with clear homologies to genes in D. melanogaster (Figure 11). One of the loci used to position the linkage map is the cinnabar locus of D. virilis, which was previously considered by Sturtevant and Novitski (1941) to be homologous with the scarlet locus of D. melanogaster. The reverse homology (i.e., Dvir\st with Dmel\cn) is evident on the linkage map of element C (Figure 11). Homology between the short veins (sv) locus of D. virilis and the rhomboid (Dmel\rho) locus of D. melanogaster provides a second anchor for the linkage map. These two loci are located ∼2.1 Mb apart in the assembly and separated by 15.5 MU, which is consistent with the length of a centimorgan estimated from the X chromosome.
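The consistency claimed here reduces to a single ratio. As a minimal sketch (the function name is ours, and map units are treated as centimorgans), the physical distance per unit of recombination between the two anchoring loci can be compared with the range reported above for the X chromosome:

```python
def kb_per_cm(physical_kb, genetic_cm):
    """Physical distance (kb) spanned by one centimorgan of recombination."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_kb / genetic_cm


# Values reported in the text for the two anchoring loci on chromosome 3:
# about 2.1 Mb apart in the assembly and 15.5 map units apart.
ratio = kb_per_cm(physical_kb=2100.0, genetic_cm=15.5)

# Range reported for the six X-linked intervals (kb/cM).
X_MIN, X_MAX = 75.0, 158.0
within_x_range = X_MIN <= ratio <= X_MAX

print(f"{ratio:.1f} kb/cM; within X-chromosome range: {within_x_range}")
# 135.5 kb/cM; within X-chromosome range: True
```

The resulting value of about 135 kb/cM is close to the X-chromosome average of 131 kb/cM.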
Muller element E (chromosome 2, sections 20–29):
Chromosome 2 is the longest, and the 35.5 Mb of sequence contained in the scaffolds placed on this chromosome reflects this feature. Three scaffolds are oriented along chromosome 2 using 134 physically mapped markers spanning the region between 20C and 29E (Figure 11). An additional 15 markers are located in the sequence at positions that are inconsistent with their reported chromosomal location (supplemental Table 24). The syntenic analysis indicated that genes are missing at the inferred scaffold joins between scaffold 12822 (E1) and 13047 (E2) and between scaffold 13047 (E2) and scaffold 12855 (E3). However, the comparative analysis places an additional scaffold (12954, E4) in the proximal euchromatin at the base of chromosome 2 without any indication of missing genes between these scaffold edges.
Muller element F (chromosome 6, section 60):
Extensive mapping and sequencing of the dot chromosome of D. virilis by S. Elgin and colleagues (Slawson et al. 2006) provides a unique resource for identifying and orienting the sequence of this chromosome. An ∼2-Mb single scaffold (13052, F1) appears to contain the entire assembled sequence of chromosome 6 (Figure 11). Furthermore, this is the only scaffold identified as containing genes located on element F of D. melanogaster.
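The per-element totals reported above for D. virilis can be tallied against the overall figure of 22 scaffolds containing ∼150.7 Mb. The short sketch below does only that arithmetic. It assumes that the 35.5 Mb for element E covers all four scaffolds (E1–E4) and takes the ∼2-Mb dot chromosome scaffold as exactly 2.0 Mb; both are our reading of the text rather than values stated in this form.

```python
# (number of scaffolds, Mb of sequence) per Muller element, as reported
# in the D. virilis sections. F is given only as "~2 Mb" (assumed 2.0).
placed = {
    "A": (6, 30.5),
    "B": (3, 28.7),
    "C": (4, 27.3),
    "D": (4, 26.7),
    "E": (4, 35.5),  # assumption: three oriented scaffolds plus E4
    "F": (1, 2.0),
}

total_scaffolds = sum(count for count, _ in placed.values())
total_mb = round(sum(mb for _, mb in placed.values()), 1)

print(total_scaffolds, total_mb)  # 22 150.7
```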
D. mojavensis maps
D. mojavensis is a member of the mulleri complex of the highly speciose repleta group that underwent an explosive radiation in the New World (Wasserman 1982). All of the >24 mulleri complex species are cactophilic, inhabiting regions and hosts that are inhospitable to many other Drosophila species. The polytene map of D. repleta was employed as the standard for all repleta group species (Wasserman 1962). To create maps for other members of the repleta group, Wharton's original drawings were physically cut and rearranged to suit the banding patterns observed microscopically and in photomicrographs, and the original nomenclature for the D. repleta map was retained in other species. It then was possible to tabulate the nature and number of chromosomal variations observed in the repleta radiation. Of 240 chromosomal rearrangements in the repleta group, 235 were found to be paracentric inversions and 4 were centric fusions (Wasserman 1982).
On the basis of the changes in the repleta map, Wasserman reconstructed the mutational events ancestral to the mulleri complex (Wasserman 1960) and D. mojavensis (Wasserman 1962). The closest relative of D. mojavensis, D. arizonae, differs from it by three fixed inversions in the X (Muller element A), second (Muller E), and third (Muller B) chromosomes. In D. mojavensis, polymorphism for chromosomal inversions exists in chromosomes 2 (five inversions) and 3 (two inversions) and varies among populations (Mettler 1963; Johnson 1980). Although chromosomes 4, 5, and 6 are colinear between the two species, and reproductive isolation is incomplete, there is no evidence for introgression in either direction (Counterman and Noor 2006; Machado et al. 2007b).
D. mojavensis was described by Patterson et al. (1940a) from a specimen found in southern California. Since then, three other populations have been discovered (Sonora, Mexico; Baja California, Mexico; Santa Catalina Island, CA) breeding in different host cactus species at the regional level. The four geographic host races show genetic differentiation from one another (Reed and Markow 2004; Ross and Markow 2006; Machado et al. 2007a) along with varying levels of reproductive isolation among themselves and with their sibling species, D. arizonae (Vigneault and Zouros 1986; Markow 1991; Reed and Markow 2004). These features of D. mojavensis have made it a popular and useful model species for studies of adaptation (Matzkin et al. 2006) and speciation (Markow and Hocutt 1998).
The strain of D. mojavensis utilized in the genome-sequencing project was derived from an isofemale strain collected on Santa Catalina Island that is fixed for the standard gene arrangements for chromosomes 2 and 3. The gene arrangements in the salivary chromosomes of this strain form the basis for the current physical and genetic maps for D. mojavensis. We constructed a standardized polytene chromosome map for the karyotype of the D. mojavensis strain from Catalina Island with the aid of our own photomicrographs as well as the modified (Ruiz et al. 1990) original drawings of Wharton (1942). Chromosome numbers were maintained as in Wharton's original D. repleta maps: chromosome 1 = X or element A; 2 = E; 3 = B; 4 = D; 5 = C; and 6 = F (dot). At this time, we have omitted chromosome 6 from the map. New nomenclature was created by dividing each element into 20 sections of roughly similar size. The sections were demarcated by sharp bands. The 20 sections were numbered from the distal to the proximal end for each chromosome. Each numbered section then was further subdivided (distal to proximal) into four lettered sections, again demarcated by prominent bands in photographs.
Chromosome maps—D. mojavensis:
Muller element A (chromosome X, sections 1–20):
Five scaffolds map to Muller A, comprising 32.0 Mb of sequence that localizes to cytological regions 1–20. Scaffold 6473, 16.9 Mb, was anchored to the proximal end of the chromosome (19B) (Figure 12, supplemental Table 25). Computational approaches were used to determine the positions of scaffolds 6328, 6308, and 6359, although they have not been anchored yet to the distal end of element A. The linkage markers further support the proposed relationship between the scaffolds. Two of the four loci, DMOJX030 and DMOJX080, are found within scaffolds 6308 and 6359, respectively.
Muller element B (chromosome 3, sections 41–60):
One scaffold maps to Muller B, comprising 32.4 Mb of sequence that localizes to cytological regions 41–60 (Figure 12, supplemental Table 25). The scaffold, 6500, was anchored near its distal end to 44B. The relationships of five markers were established for chromosome 3. No BLAST hits were found for marker A3-10-13 and hence its position on the scaffold cannot be determined. Marker A1-2-1 (not shown) is located 36.7 cM proximal to M2-17-5 and has a BLAST hit to scaffold 6499, a small (421 kb) scaffold of unknown orientation and placement relative to scaffold 6500.
Muller element C (chromosome 5, sections 81–100):
One scaffold maps to Muller C, comprising 26.9 Mb of sequence that localizes to cytological regions 81–100 (Figure 12, supplemental Table 25). The single scaffold 6496 was anchored at the telomeric (distal) end to 82A. The genetic map consisted of five markers. Marker A3-12-6 does not occur in the same order between the linkage map and the genome sequence. This discrepancy cannot be checked on an independent map because the previous linkage map (Staten et al. 2004) lacks sufficient information about linkage on the fifth chromosome.
Muller element D (chromosome 4, sections 61–80):
Two scaffolds map to Muller D, comprising 27.3 Mb of sequence (Figure 12, supplemental Table 25). Scaffold 6680 was 24.7 Mb and was localized at the proximal end by two different probes to 80C and at the distal end to 63A. The linkage map of chromosome 4 is composed of seven markers. The linkage discrepancy on the proximal end of chromosome 4 (DMOJ4040 and DMOJ4050) is likely due to statistical variance in the map over fairly long genetic intervals. The Staten et al. (2004) linkage map agrees with the genome sequence in that region. It is possible that the discrepancy at the distal end (A2-13-1 and M2-19-2) reflects inaccurate orientation of scaffold 6654.
Muller element E (chromosome 2, sections 21–40):
Chromosome 2 has a single scaffold, 6540, which comprises 34.1 Mb of sequence and contains cytological regions 21–40 (Figure 12, supplemental Table 25). The orientation of scaffold 6540 was determined by the in situ hybridization of probe 6540X2 to position 40D at the proximal end of the assembled sequence. Four markers were used in the genetic map. On chromosome 2, the DMOJ2020 marker maps to the central portion of scaffold 6540, but to the distal end of the linkage group. We reanalyzed a previous D. mojavensis linkage map (Staten et al. 2004), which shares some markers with the present map, and found comparable alignment discrepancies with the genome sequence in that region of chromosome 2. Thus, the scaffold sequence for Muller E should be viewed with caution.
Muller element F:
Muller element F is not included in our analysis because we failed to visualize the chromosome in any of the polytene chromosome preparations. A single scaffold comprising 3.4 Mb of sequence is assigned to Muller F on the basis of conserved linkage.
D. grimshawi maps
Members of the Hawaiian Drosophila are a premier example of adaptive radiation in nature. This group contains nearly 1000 species, most of which display extreme sexual dimorphism, elaborate courtship and mating displays, and a high degree of host plant specificity (O'Grady 2002). This Hawaiian Drosophila lineage is placed in the subgenus Drosophila where it is sister to the virilis–repleta radiation (Markow and O'Grady 2006). D. grimshawi, a species of Hawaiian Drosophila, is placed within the “picture wing” species group, so named because of the spectacularly pigmented wings that are used in courtship and mating displays (Edwards et al. 2007).
There are currently 112 described picture wing species, the majority of which have been included in Hampton Carson's polytene chromosome phylogeny (Carson 1992). D. grimshawi carries the standard chromosomal arrangement for this phylogeny; all other picture wing gene arrangements can be reached via a combination of 228 naturally occurring inversions (Carson 1992; Carson et al. 1992). Carson et al. (1992) provide detailed maps of polytene chromosome inversions in several species of Hawaiian Drosophila, including D. grimshawi.
Chromosome maps—D. grimshawi:
The dearth of in situ hybridization data for D. grimshawi posed a unique problem for anchoring genomic scaffolds to the polytene chromosomes. The extensive chromosome phylogeny work of Carson and colleagues (Carson 1992; Carson et al. 1992), coupled with a low frequency of between-species chromosomal variation, allowed in situ hybridizations in related species (D. heteroneura, D. silvestris, and D. nigribasis) to assist in localizing genes within D. grimshawi. Polytene chromosome band assignments were made for a total of 29 loci (supplemental Table 26). D. melanogaster homologs (supplemental Table 26) for these taxa were used to query the D. grimshawi genome (Altschul et al. 1997) and to assign scaffolds to a specific chromosome (supplemental Tables 15 and 26).
Muller element A (chromosome X, sections 1–20):
Scaffold 15203 localized to Muller's A on the basis of three loci (Figure 13, supplemental Table 26). The yolk protein gene (Yp3) maps to position 13B in D. grimshawi. The ras locus maps to position 18C on the X chromosome of D. silvestris and the per locus is near position 17B in D. nigribasis. The X chromosome of D. grimshawi differs from D. nigribasis by 9 inversions and from D. silvestris by 10 inversions. Of these, 4 occur outside the regions being mapped and therefore are not important for the purposes of placing these scaffolds. The five (D. nigribasis) or six (D. silvestris) inversions that do occur within the region mapped alter the arrangement of the polytene bands between the D. grimshawi standard and the other two taxa; these species are not homosequential within this region. However, the relative order of the Yp, per, and ras markers is the same in all three species. We can therefore conclude that the orientation of the contig will be the same, even though gene order between 13B and 18C is not conserved. Two additional scaffolds, 14851 and 15081, were anchored by synteny. Together, these three scaffolds account for a total of 26.4 Mb of sequence.
Muller element B (chromosome 3, sections 21–40):
Two scaffolds, 15252 and 15126, which together account for a total of 25.5 Mb, can be placed and oriented using six markers (Figure 13, supplemental Table 26). Two markers were mapped by in situ hybridization in D. grimshawi (ex and vasa). An additional four markers were mapped in D. silvestris (Pgk, Pez) and in D. heteroneura (dp, wg). Element B differs in two chromosome inversions between D. grimshawi and these other taxa. The 3m inversion is between bands 23A and 25D. The 3d inversion spans 35A to 38C. None of the markers mapped in D. silvestris or D. heteroneura localize to these regions so the band identities are identical. Computational analysis of synteny joins scaffolds 15252 and 15126 via scaffold 14978. A fourth scaffold, 9450, is joined to the distal end of scaffold 15252, creating a superscaffold 26.7 Mb in length.
Muller element C (chromosome 2, sections 41–60):
Scaffold 15245, accounting for 18.3 Mb of sequence, has been placed and oriented to Muller's C (Figure 13, supplemental Table 26). Although there are no markers mapped in D. grimshawi, a total of seven loci have been mapped by in situ hybridization in related species. Three loci (leo, lola, vg) have been placed in D. heteroneura, and four regions (spin, Khc, Jheh1, αAmy) are known from D. silvestris. In natural populations, element C of the other species differs from that of D. grimshawi by a single inversion, 2m, between bands 41A and 46A. The D. silvestris lines in which Khc, spin, and Jheh1 were localized were homosequential with the D. grimshawi standard. Two additional scaffolds, 9437 and 15112, are joined to scaffold 15245 via syntenic analysis, yielding a 23.5-Mb superscaffold.
Muller element D (chromosome 5, sections 61–80):
Scaffold 15110, 24.6 Mb in length, maps to element D (Figure 13, supplemental Table 26). Six loci have been mapped to Muller's D, three from D. silvestris (Argk, ATPsynb, Pgm), two from D. heteroneura (lark, Cp18), and one from D. grimshawi (αCat). These two species are homosequential with D. grimshawi for Muller's D so localizations are directly comparable between taxa. One gene located on element D of D. melanogaster was previously shown to map on element E of D. grimshawi (αCat), indicating movement of the gene between these elements.
Muller element E (chromosome 4, sections 81–100):
Three scaffolds have been localized to this chromosome via in situ hybridization (Figure 13, supplemental Table 26). A total of eight loci have been mapped to this element, two in D. grimshawi (Hr96, αCat), two in D. heteroneura (wts, fru), and four in D. silvestris (RpS3, ninaE, Sryβ, cher). There is a single fixed inversion, 4b, between D. silvestris–D. heteroneura and D. grimshawi spanning the region from 93A to 97D. The RpS3 locus is involved in this inversion and has been localized to band 95D in D. grimshawi, a position equivalent to the mapped position 95B in D. silvestris. The remainder of the chromosome is unaffected by this inversion. Two scaffolds, 14830 and 15116, can be joined to the mapped scaffolds, generating a single 34.2-Mb superscaffold for this chromosome.
Muller element F (chromosome 6):
A single marker, bt, localized in D. grimshawi, anchors scaffold 14822 to the microchromosome (Muller's F). Computational analysis joins scaffold 14592 to this, creating a 1.3-Mb superscaffold for the dot chromosome. Figure 13 shows Muller's F from D. silvestris (Carson 1992); there are no inversions or band differences in this element known from any Hawaiian Drosophila.
Unplaced or orphan scaffolds for the 11 species:
The supplemental text describes general properties of scaffolds that were not assigned or mapped to one of the Muller elements in each of the species. These scaffolds are also referred to as orphan scaffolds.
The combined approaches of comparative syntenic block analysis and direct physical mapping have collectively informed the orientation of these genome sequences relative to the polytene chromosomal maps. One of the more remarkable features of this analysis is the relatively low impact that genome size and structure had on the ability to assemble gene-rich components of each chromosome arm. Differences in genome size appear to arise primarily from variation in the heterochromatic or unassigned DNA of centromeric regions (Table 2) and the large differences in genome size among these species had little impact on the “quality” of the assemblies using the whole-genome shotgun (WGS) method. For example, measures of DNA content differ quite dramatically among species in the subgenus Drosophila (Table 2) (Bosco et al. 2007); D. virilis has the largest estimated genome size of all 12 sequenced species, whereas D. mojavensis has the smallest estimated genome size (Table 2). Virtually identical sequencing and assembly methods were used in these two species, yet the sequence localized to the chromosome arms of D. virilis is only slightly more fragmented at the level of scaffolds compared with D. mojavensis.
The sequence of D. pseudoobscura represents an interesting contrast because this species also has a relatively small genome, yet the assembled genome sequence of this species is quite fragmented. Whether this higher level of fragmentation is attributable to biological vs. methodological causes is unclear due to differences in methods used to generate the assembly of D. pseudoobscura (Richards et al. 2005). Different assembly programs can lead to different levels of fragmentation because the algorithms are more or less conservative about joining contigs together to form scaffolds. For instance, an assembly will be more fragmented if the number of BAC paired end reads required to join two contigs together is large. The relative proportion of small to large insert clones used in the whole-genome shotgun sequencing can also influence fragmentation. The use of fewer BAC clones will result in more fragmentation. With more repetitive sequences in the genome, assemblies will be more fragmented. We are tempted to conclude that differences in the assembler were responsible for the higher level of fragmentation in D. pseudoobscura. The Freeze 1 assembly of D. pseudoobscura was done with ATLAS (Havlak et al. 2004), while the D. persimilis assembly was done as assisted assembly using an ARACHNE assembly of the D. pseudoobscura genome. The D. persimilis assembly appears to be less fragmented through a more aggressive joining of contigs, which led to mis-joins. This suggests that a more conservative assembly, while more fragmented, may have fewer mis-joins. The data presented here suggest that assembly algorithms that integrate paired end reads and conserved linkage information of close relatives may be a useful strategy for reducing fragmentation of assemblies.
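The effect of a joining threshold on fragmentation can be shown with a toy model. The sketch below is not the algorithm of ATLAS or ARACHNE; it simply merges contigs whenever the number of paired-end reads bridging them meets a minimum, and counts the resulting scaffolds. The contig indices and link counts are hypothetical values chosen for illustration, and real assemblers also weigh read orientation and insert size.

```python
def count_scaffolds(n_contigs, links, min_links):
    """Count scaffolds formed when two contigs are joined only if at least
    `min_links` paired-end reads bridge them (simple union-find)."""
    parent = list(range(n_contigs))

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), support in links.items():
        if support >= min_links:
            parent[find(a)] = find(b)
    return len({find(i) for i in range(n_contigs)})


# Hypothetical data: six contigs and the number of read pairs bridging
# each neighboring pair.
links = {(0, 1): 5, (1, 2): 2, (2, 3): 1, (3, 4): 4, (4, 5): 3}

counts = [count_scaffolds(6, links, threshold) for threshold in (1, 2, 3, 4)]
print(counts)  # [1, 2, 3, 4]
```

In this toy case, each increase in the required number of bridging read pairs breaks one more join, so a more conservative threshold yields a more fragmented assembly but accepts fewer weakly supported joins.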
Synteny-based predictions for “scaffold joins” were used to complement the placement of scaffolds with genetic and physical anchors in a given species. The computational predictions of scaffold order and orientation were critical for filling gaps where experimental markers are not currently available. Although the orientation of the complete sequence of each chromosome arm should be viewed as a first draft, the overall synergism between the computational and marker-based organization of these chromosome-sized scaffolds generates confidence in the inferences.
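The logic of a synteny-based scaffold join can be sketched in a few lines. This is a simplified illustration and not the Synpipe analysis used here: it consults a single reference gene order rather than syntenic blocks conserved across several genomes, and it proposes a join whenever the terminal genes of two scaffolds are immediate neighbors in that reference. All gene and scaffold names are hypothetical.

```python
def infer_joins(scaffolds, reference_order):
    """Propose joins between scaffold ends whose terminal genes are
    adjacent in a reference gene order.

    scaffolds: dict mapping scaffold name to its ortholog ids in order.
    reference_order: ortholog ids along one chromosome arm of a reference.
    Returns tuples (scaffold_a, end_a, scaffold_b, end_b).
    """
    index = {gene: i for i, gene in enumerate(reference_order)}
    ends = []
    for name, genes in scaffolds.items():
        ends.append((name, "start", genes[0]))
        ends.append((name, "end", genes[-1]))

    joins = []
    for i, (name_a, end_a, gene_a) in enumerate(ends):
        for name_b, end_b, gene_b in ends[i + 1:]:
            if name_a == name_b:
                continue
            if gene_a in index and gene_b in index:
                if abs(index[gene_a] - index[gene_b]) == 1:
                    joins.append((name_a, end_a, name_b, end_b))
    return joins


# Hypothetical reference order and three scaffolds; S2 is inverted
# relative to the reference.
reference = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
scaffolds = {
    "S1": ["g1", "g2", "g3"],
    "S2": ["g6", "g5", "g4"],
    "S3": ["g7", "g8"],
}

joins = infer_joins(scaffolds, reference)
print(joins)
# [('S1', 'end', 'S2', 'end'), ('S2', 'start', 'S3', 'start')]
```

Because the end of S1 joins the end of S2, the sketch recovers both the order of the scaffolds and the fact that S2 must be reversed, which is the kind of inference that marker data alone could not always provide for small scaffolds.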
Computational predictions and marker data proved to be mutually informative. The analysis of the genome sequence of D. virilis is a good example of computational predictions that were useful in orienting small scaffolds with respect to flanking scaffolds because marker data indicated the order of scaffolds along the chromosome, but the resolution of standard physical mapping was insufficient to accurately orient the scaffolds along the chromosome (e.g., A5 internally with Muller element A and D1 at the distal end of element D, Figure 11). Small scaffolds not containing any experimental markers were also inserted and oriented using synteny information (e.g., D2 within element D and E4 at the proximal end of element E, Figure 11). The computational analysis also provided a comprehensive screen for orthologs of D. melanogaster genes, and on the basis of these orthology calls, almost all of the gene-containing regions appear to be organized along the chromosome arms of D. virilis and the other species.
The computational analysis is dependent upon having an accurate assembly, whereas the marker data provide a completely independent assessment of the assembled sequence. One caveat of marker data is the need for accuracy in the ordering of reference positions. These analyses uncovered several instances where cytological positions and linkage relationships did not match the assembled sequences. These discrepancies are due to either errors in the assembled sequences or errors in the ordering of the reference markers. One particular discrepancy for a set of mapped positions on chromosome 5 (element C) of D. virilis revealed an apparent error in the assembly of the largest scaffold mapped to this chromosome arm. The assembly error made it impossible to detect syntenic groups with the edges of this scaffold because the mis-join within the scaffold occurred between two real scaffold edges, thus hiding them from the Synpipe analysis (Bhutkar et al. 2006).
Assembly mis-joins also influenced the computational approach in D. pseudoobscura. A number of assembly superscaffolds were broken apart on the basis of experimental data, indicating an alternative alignment along the chromosomes. On the other hand, computational predictions performed well in the case of D. persimilis, despite a more fragmented assembly because of lower sequence coverage. The major exception was on Muller C where the assembly rearranged segments within the inverted region in the central part of the chromosome. A large number (90) of scaffold joins were inferred across five of six Muller elements, and there was perfect agreement with the experimental data derived from the backbone species D. pseudoobscura, as shown in Table 3.
The syntenic analysis is quite effective in identifying errors in the assembly arising from joins between sequences of different Muller elements. Errors are clearly apparent when a scaffold contains adjacent blocks of genes that belong to different Muller elements, no known chromosomal rearrangement predicts this composite gene order, and the same genes show the expected associations with Muller elements in closely related species. A number of these mis-joins, in the case of D. sechellia, for example, have been confirmed. A list of probable assembly mis-joins is provided in the supplemental material. These corrections of the initial freezes of the assemblies will provide more accurate data for downstream analysis. For example, knowledge of Muller element-wide scaffold and gene order and misassembly information enabled a fine-scale analysis of chromosome-wide rearrangements between species (Bhutkar et al. 2008).
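The screen for joins between different Muller elements can be sketched as follows. This is an illustrative simplification, not the procedure used in the analysis: it assumes each gene on a scaffold has already been assigned to a Muller element through its D. melanogaster ortholog, and it flags a boundary only when blocks of at least `min_block` genes from different elements sit side by side. The function names and the threshold of three genes are hypothetical choices.

```python
from itertools import groupby


def element_blocks(elements):
    """Collapse a per-gene list of Muller elements into (element, run_length) blocks."""
    return [(element, len(list(run))) for element, run in groupby(elements)]


def candidate_misjoins(elements, min_block=3):
    """Return (left_element, right_element, gene_index) for each suspicious boundary.

    A boundary is reported only when both adjacent blocks contain at least
    `min_block` genes, so isolated genes that moved between elements are ignored.
    """
    blocks = element_blocks(elements)
    hits = []
    position = 0
    for (left, left_n), (right, right_n) in zip(blocks, blocks[1:]):
        position += left_n  # index of the first gene of the right-hand block
        if left_n >= min_block and right_n >= min_block:
            hits.append((left, right, position))
    return hits


# Toy scaffold: a B block interrupted by one E gene, followed by a large C block.
scaffold = ["B"] * 5 + ["E"] + ["B"] * 4 + ["C"] * 6
print(element_blocks(scaffold))
print(candidate_misjoins(scaffold))
```

A flagged boundary is only a candidate. As noted above, it becomes a probable mis-join only when no known rearrangement, such as an element fusion, predicts the composite order and when closely related species show the expected element assignments.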
Patterns of chromosome evolution in the genus Drosophila:
A wide spectrum of gross changes in overall chromosome form is represented by species of Drosophila. For example, the curious dot chromosome, which has maintained a common form in all three species of the subgenus Drosophila and in most members of the subgenus Sophophora, exhibits two derived configurations in these species. A dot chromosome is not evident in the karyotype of D. willistoni. This small chromosome fused with the proximal end of the E element to generate a composite chromosome. The fusion is clearly demarcated in the genome sequence because genes from element F are mostly contained in a localized block within a single scaffold that has a distinct transition into genes from element E (supplemental Table 15). In this case, the composite nature of this scaffold captures the visible change in the karyotype (Papaceit and Juan 1998). The junction between the two elements is relatively precise: the two blocks of genes are separated by 14 kb of sequence without gaps. The junction sequence has repeats with at least one signature of a P element, suggesting that these repeats could have played a role in the fusion process. The other change in the dot chromosome entails the expansion of this chromosome into a large metacentric chromosome in D. ananassae. The scaffolds associated with element F contain the same syntenic genes seen in the dot chromosome of other species. The expansion of the F element in D. ananassae occurred through the addition of 32.5% more retro and DNA element sequence without major changes in coding capacity. Even with this expansion, the structure of the assembled scaffolds for the “dot” of D. ananassae does not differ substantially from some of the other species in the Sophophora lineage where the genes are also distributed among several scaffolds. By contrast, element F is represented by one or two scaffolds for species in the subgenus Drosophila.
These differences in assembly of the dot in the two subgenera are correlated with differences in chromatin structure. The dot of D. melanogaster is heterochromatic, while a heterochromatic structure is not evident for the dot of D. virilis (Slawson et al. 2006). It is unclear how chromatin structure could influence the WGS approach, but the differences in efficiency of assembling the sequence of element F, despite a relatively uniform level of transposable element content, suggest a possible impact on the effectiveness of genome sequencing and assembly.
The complete characterization of conserved linkage groups between these genome sequences further supports previous observations of limited exchange of genetic content between Muller's elements (Richards et al. 2005). However, exchange events have occurred during the evolution of these Drosophila species, and these instances of gene movement are not limited to members of multigene families, but also include single-copy genes (Ranz et al. 2003, 2007; Bhutkar et al. 2007b). This analysis reveals that large-scale genome rearrangement has played a limited role in the movement of genes between chromosome arms. For example, D. erecta and D. yakuba share a pericentric inversion at the base of the B·C element (2L·2R in D. melanogaster). Thus, relative to D. melanogaster, the B and C elements are now mixed from telomere to centromere. The new order is B → C and C → B.
Elements A and D were fused in D. pseudoobscura and D. persimilis to form a metacentric X chromosome that contains genes homologous to the X and autosomal 3L arms of D. melanogaster. Genes from the A element have moved to the base of the D element in the two obscura group species. This interchromosomal change was first explained by a pericentric inversion between elements A and D after the centric fusion that generated the metacentric X chromosome of D. pseudoobscura (Segarra and Aguadé 1992; Segarra et al. 1995). However, a pericentric inversion in the fused A and D elements would be expected to cause a reciprocal exchange of gene content between elements. No evidence for such an exchange has been detected in the assembled genome of D. pseudoobscura. Thus, the pericentric inversion responsible for the exchange between the A and D elements might be asymmetric with one of the breakpoints in the heterochromatic region very close to the centromere of element D. The accumulation of transposable elements in heterochromatic regions (Rizzon et al. 2002) and their putative role in the origin of inversions (Cáceres et al. 1999) might account for this possibility.
Alternatively, after the fusion of both elements, the centromere might have been repositioned toward a more interstitial region of element A. Centromere repositioning (CR) has been proposed to explain the karyotype evolution of diverse mammalian species (Ferreri et al. 2005; Carbone et al. 2006), but up to now CR has not been described either in Drosophila or in other insects. Centromere repositioning events would cause, like asymmetric pericentric inversions, a relocation of gene content between elements (with the shortening of one element and the lengthening of the other one), but with no inversion of gene order relative to the centromere. However, it is not straightforward to distinguish between the pericentric inversion and the centromere repositioning alternatives in the Drosophila case described here. The A·D fusion occurred following the divergence between the lineage that gave rise to D. pseudoobscura and the lineage currently represented by D. subobscura and close relatives. Given the ancient split between these lineages (19–22 MYA according to Gao et al. 2007), the gene order around the centromere of the fused and nonfused elements might have completely changed as a result of paracentric inversions fixed in both lineages.
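The two alternatives can be contrasted with a toy model. The sketch below is only an illustration under simplifying assumptions: a fused chromosome is represented as a list of hypothetical gene names with a single `CEN` marker, and the breakpoints are chosen arbitrarily. Both operations move genes from one arm to the other without a reciprocal exchange, but in this model the relocated block is reversed in the linear sequence after the asymmetric inversion and unchanged after repositioning.

```python
CEN = "CEN"


def arms(chromosome):
    """Split a chromosome into the gene lists to the left and right of the centromere."""
    i = chromosome.index(CEN)
    return chromosome[:i], chromosome[i + 1:]


def pericentric_inversion(chromosome, start, end):
    """Invert the slice [start:end], which must span the centromere."""
    segment = chromosome[start:end]
    if CEN not in segment:
        raise ValueError("a pericentric inversion must span the centromere")
    return chromosome[:start] + segment[::-1] + chromosome[end:]


def reposition_centromere(chromosome, new_index):
    """Move the centromere so that it precedes the gene at `new_index` (gene-only index)."""
    genes = [g for g in chromosome if g != CEN]
    return genes[:new_index] + [CEN] + genes[new_index:]


# Hypothetical fused A.D chromosome: element A genes, centromere, element D genes.
fused = ["a1", "a2", "a3", "a4", "a5", CEN, "d1", "d2", "d3"]

# Asymmetric inversion: one breakpoint inside A, the other immediately past the centromere.
inverted = pericentric_inversion(fused, 3, 6)

# Centromere repositioning to the same interstitial site in element A.
repositioned = reposition_centromere(fused, 3)

print(arms(inverted))
print(arms(repositioned))
```

As the text cautions, this difference in linear order would be erased if later paracentric inversions rearranged the region near the centromere, which is why the two scenarios are hard to distinguish in practice.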
Rearrangement of elements in the Drosophila genome also occurs through pericentric inversion. A pericentric inversion of the A element in D. ananassae converted the normally acrocentric X into a metacentric chromosome, thus representing a change in karyotype other than fusion between Muller's elements. The scaffolds in the genome sequence map separately to the two chromosome arms generated by this inversion. In addition to the relatively rare cases of major chromosomal rearrangements (fusions, translocations, and pericentric inversions), these species possess a rich spectrum of paracentric inversions and their presence in the sequenced genomes has been readily detected and confirmed (Sperlich and Pfriem 1986; Bhutkar et al. 2008).
Evaluation of these assembled genome sequences relative to the chromosome maps yields initial insight into the distinction between euchromatic and heterochromatic genomic regions as represented in polytenized chromosomal regions and the chromocenter of salivary glands. Scaffolds mapping to centromeric regions comprise initial glimpses into heterochromatic domains of these genomes (supplemental Text). The ∼10-fold increase in the abundance of interspersed repeats in the “orphan” scaffolds located in the pericentromeric regions in the genome of D. virilis is consistent with observations in the genome of D. melanogaster (Smith et al. 2007). Gene content within the pericentric regions appears quite variable among species, because D. mojavensis and D. grimshawi are essentially devoid of these “orphan” scaffolds, whereas D. virilis and D. ananassae have quite extensive regions of assembled pericentric sequence containing a modest number of genes. Gene content of these regions, however, exhibits a high rate of flux among species consistent with previous studies (Yasuhara et al. 2005; Schulze et al. 2006). Genes in the heterochromatin of D. melanogaster are generally distributed throughout the genomes of the other species, and from a different perspective, genes residing in pericentromeric scaffolds of D. virilis are generally present in euchromatic regions of D. melanogaster. This movement between chromosomal domains also generates the appearance of much more intra-element movement using the position of genes in the genome of D. melanogaster as a reference. For example, two scaffolds of D. virilis contain a mosaic of 17 genes from the D and E elements of D. melanogaster. Each of these genes is located within large scaffolds mapped to the E element of D. mojavensis, D. grimshawi, and D. willistoni. Therefore, the ancestral location of these genes appears to be element E, which has been observed directly for RpL15 of D. virilis and αCat of D. grimshawi.
These comparisons support a previous inference that the location of a small number (∼7) of genes on element D of D. melanogaster is a derived feature, possibly resulting from a small pericentric inversion(s) within the centromeric region of chromosome 3 in the melanogaster group that exchanged genes from element E to element D (Schulze et al. 2006).
Mapped assembly scaffolds add value to the comparative genome data:
The data presented here should be viewed as the starting point for experiments designed to understand how genomes evolve. These experimental studies are now more tractable, given the complete genome sequence of 12 representative species in the genus Drosophila. Availability of BAC libraries for nine additional Drosophila species further enables comparative genomic analyses (see http://www.genome.arizona.edu/BAC_special_projects/#Drosophila) (Markow et al. 2003). Studies of these species will be limited initially because few balancer and mutant strains are available. Transformation systems to introduce mutations and shuttle genes into different strains are also limited, but are likely to be developed in the future.
The scaffold maps presented here represent the first pass at anchoring the assembled sequences to the physical map and one should view these assignments with caution. The ordering and orientation of scaffolds revealed by this analysis will be represented in FlyBase (http://www.flybase.org) as a sequence of each chromosome arm in each species. The amount of genetic and physical mapping that was used in the different species varied, which leads to different levels of confidence in the maps. Gaps between scaffolds will be indicated by insertion of ambiguities. Although each chromosome will appear as one contiguous sequence, several real or possible limitations should be recognized when using these sequences for downstream analyses: (1) distances are not accurate for the intervals between positions that span scaffold joins; (2) a substantial amount of sequence, including coding regions, may be missing from each scaffold join; (3) scaffolds designated in the unplaced bin may contain sequence corresponding to gaps between oriented contigs or scaffolds, and the scaffolds in the unplaced bin are completely unordered; and (4) unrecognized assembly errors may exist within scaffolds. With these caveats in mind, the sequences do provide a comprehensive resource for comparison among the genomes of these species of Drosophila and represent a starting point for additional experiments that evaluate these uncertainties.
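To make the caveats above concrete, the sketch below shows one way an arm-level sequence could be assembled from ordered and oriented scaffolds, with each join filled by a run of ambiguity characters. It is an illustration only: the fixed gap length, the function names, and the toy sequences are assumptions, not the convention used by FlyBase. Because the gap length is arbitrary, coordinates downstream of a join do not reflect true physical distances, which is caveat (1).

```python
# Translation table for complementing DNA, including the ambiguity code N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_arm(placements, sequences, gap_length=100):
    """Concatenate scaffolds into one arm sequence with N-filled joins.

    placements: ordered list of (scaffold_name, strand), strand being "+" or "-".
    sequences:  dict mapping scaffold_name to its sequence.
    gap_length: placeholder number of Ns per join (an assumed value).
    Returns the arm sequence and (name, strand, start, end) records, 0-based, end exclusive.
    """
    pieces = []
    coords = []
    offset = 0
    for i, (name, strand) in enumerate(placements):
        if i > 0:
            # The true size of the gap is unknown; Ns only mark its presence.
            pieces.append("N" * gap_length)
            offset += gap_length
        seq = sequences[name]
        if strand == "-":
            seq = reverse_complement(seq)
        pieces.append(seq)
        coords.append((name, strand, offset, offset + len(seq)))
        offset += len(seq)
    return "".join(pieces), coords


# Toy example with two hypothetical scaffolds and a 5-bp placeholder gap.
sequences = {"s1": "ACGT", "s2": "AACC"}
arm, coords = build_arm([("s1", "+"), ("s2", "-")], sequences, gap_length=5)
print(arm)
print(coords)
```

Keeping the per-scaffold coordinate records alongside the concatenated sequence lets downstream analyses recognize intervals that span a join and treat the distances across them as unreliable.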
The original motivation for the 12 genomes project was to develop new bioinformatic tools to aid with genome assembly using whole-genome shotgun methods. The power of Drosophila is the availability of a sophisticated polytene chromosome map that allows for careful verification of new computational approaches for ordering and orienting scaffolds together. This approach shows that there are added benefits of using a set of related species in genome biology. Comparison of gene order among close relatives will help to order and orient scaffolds in species groups where physical map data may be limited.
There are additional challenges to the computational approaches described here. The Drosophila genome is relatively well behaved with respect to rearrangements. About 95% of the genes are syntenic and the major mechanism of genome change is inversion events. Mammalian genomes have the added complication of many more transposition events. Thus, new computational methods must take these rearrangements into account as new tools are developed.
This effort resulted from an international collaboration among Drosophilists and it is important to give appropriate credit to the groups and group leaders. S. W. Schaeffer coordinated the compilation of data and text for this article, contributed text throughout the article, and served as a group leader for the obscura group. A. Bhutkar, B. F. McAllister, M. Matsuda, L. M. Matzkin, P. M. O'Grady, C. Rohde, V. L. S. Valente, T. A. Markow, W. M. Gelbart, and T. C. Kaufman served as group leaders for the different species and contributed text and data for the article. The participants in each group are the following: the Computational Working Group—A. Bhutkar, S. M. Russo, T. F. Smith, and W. M. Gelbart; the D. melanogaster Working Group—T. C. Kaufman, R. Wilson, J. Goodman, and V. Strelets; the D. ananassae Working Group—M. Matsuda, H. Sato, Y. Tomimura, E. Kataoka, K. Yoshida, and Y. N. Tobari; the D. pseudoobscura and D. persimilis Working Group—S. W. Schaeffer, C. A. Machado, W. W. Anderson, M. Papaceit, M. Aguadé, C. Segarra, S. Richards, and M. A. F. Noor; the D. willistoni Working Group—C. Rohde, T. T. Rieger, A. C. L. Garcia, S. W. Schaeffer, and V. L. S. Valente; the D. virilis Working Group—B. F. McAllister, E. R. Lozovsky, J. Hartigan, and D. R. Smith; the D. mojavensis Working Group—L. M. Matzkin, M. Wasserman, L. K. Reed, T. Watts, and T. A. Markow; and the D. grimshawi Working Group—P. M. O'Grady, R. T. Lapoint, and K. Edwards.
We thank Agencourt Bioscience, The Broad Institute of MIT and Harvard, and the Washington University Genome Sequencing Center, which generated the CAF1 genome sequence data and were supported by grants and contracts from the National Human Genome Research Institute. This article benefited from two anonymous reviewers who completed their task faster than we thought possible and made many valuable points that helped us improve the presentation of these data. We are humbled by the attention to detail paid by these members of the Genetics community. We thank all investigators who contributed data on the physical and genetic maps of these species.
The computational analysis group thanks FlyBase for D. melanogaster data sets and computing support; the research computing group in the Division of Life Sciences at Harvard University for systems support; Biomolecular Engineering Research Center (Boston University) computing and support staff; Venky Iyer (Eisen Lab at the University of California at Berkeley) and the Assembly/Alignment/Annotation (AAA) coordinating committee for releasing GLEAN-R gene predictions. This work was supported by a subcontract to Boston University from Harvard University under National Institutes of Health grant HG000739.
The D. melanogaster group also thanks FlyBase for data sets and computing support. We are also indebted to Mike Eisen and Venky Iyer for their efforts in organizing the AAA data assembly and web pages as well as for the orthology calls that were used in our alignment of scaffolds to chromosomes. The melanogaster group alignments were supported by funding and facilities from the Indiana Genomics Initiative.
The D. ananassae chromosome group thanks Ryoko Ogawa for her technical assistance and also acknowledges the support of the National BioResource Project provided by The Ministry of Education, Science, and Culture of Japan.
S.W.S. thanks Richard J. Cyr for the use of his microscope that was used to obtain photomicrographs of the D. pseudoobscura and D. persimilis chromosomes and Lawrence G. Harshman and J. W. O. Ballard for suggestions on preparing linear salivary chromosomes.
The D. willistoni chromosome group thanks Jeffrey R. Powell for his confidence and for making strains available, Antonio B. de Carvalho for stimulating discussions, and the Brazilian funding agencies Conselho Nacional de Desenvolvimento Científico e Tecnológico, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, and Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul for grants and fellowships.
B.F.M. thanks A. L. Evans for technical assistance. B.F.M. was supported by National Science Foundation grant no. DEB-0420399.
2 Present address: Department of Genetics, NC State University, Raleigh, NC 27695.
Communicating editor: R. S. Hawley
- Received December 15, 2007.
- Accepted March 13, 2008.
- Copyright © 2008 by the Genetics Society of America