Abstract
We studied levels of intra- and interspecific nucleotide variation associated with a Y-linked gene in five members of the Drosophila melanogaster subgroup. Using published sequence for 348 bp of the Dhc-Yh3 gene, and degenerate PCR primers designed from comparisons of the sea urchin and Chlamydomonas flagellar dynein genes, we recovered a 1738-bp region in D. melanogaster. Analyses of sequence variation in a worldwide collection of 11 lines of D. melanogaster and 10 lines of D. simulans found no polymorphism in D. melanogaster and only a single silent polymorphism in D. simulans. The synonymous site divergence per site for Dhc-Yh3 is comparable to values for X and autosomal genes. Assuming a Wright-Fisher population model, the observed variation is significantly lower than expected using appropriately reduced estimates of θ from the X and autosomes. Because the Y chromosome encodes only six known genes, genetic hitchhiking associated with background selection is unlikely to explain this low variation. Conversely, adaptive hitchhiking, as associated with sex-ratio chromosomes, or a large variance in male fertility may reduce the polymorphism on the Y chromosome. Codon bias is very low, as seen for other genes in regions of low recombination.
To understand the forces affecting nucleotide variability in natural populations, we have taken the informative approach of contrasting genomic regions that differ in specific features such as population gene number, sex-limited transmission, and levels of recombination (Berry et al. 1991; Begun and Aquadro 1992). The model system for much of this work has been the sibling pair Drosophila melanogaster and D. simulans, where considerable data on intra- and interspecific nucleotide variation associated with the autosomes, sex chromosomes, and mtDNA have accrued (Moriyama and Powell 1996). Although studies of ribosomal DNA and Stellate gene copy number variation have found heterogeneity among Y chromosomes of D. melanogaster (Lyckegaard and Clark 1989; Clark and Lyckegaard 1990; Clark et al. 1990), a measure of silent-site variation for a single gene on the Y chromosome has been missing. Because the Y chromosome undergoes no recombination and is male limited in transmission, it is male specific with respect to both mutation rate and those features of male life history that are likely to influence the effective population number of Y-linked copies. Finally, under panmixia the effective population number of the Y chromosome is one-quarter the autosomal number and one-third the X chromosome value.
Gepner and Hays (1993) reported a small dynein-related partial cDNA that localized exclusively to the Y chromosome of D. melanogaster and named this gene Dhc-Yh3. Using available Y-autosomal translocations they identified the gene as the previously known male fertility factor kl-5, which was reported to be associated with flagellar function (Goldstein et al. 1982; Hardy et al. 1984). This 348-bp fragment shows extensive amino acid sequence identity with the ATP-binding domains of the sea urchin (Anthocidaris crassispina) sperm dynein (Gibbons et al. 1991; Ogawa 1991) and Chlamydomonas reinhardtii flagellar dynein motors (Mitchell and Brown 1994). In this article we use the D. melanogaster sequence of this small piece of Y-specific DNA to enter and capture a larger piece of the Y chromosome by using the Chlamydomonas and sea urchin dynein β-heavy-chain amino acid sequences to identify regions of conservation extending beyond the original 348-bp region. Using degenerate primers and cDNA from males we expand the D. melanogaster Y-specific Dhc-Yh3 region to encompass 1738 bp. From this region we survey nucleotide diversity in worldwide collections of 11 Y chromosomes of D. melanogaster, 10 chromosomes of D. simulans, and single copies from D. sechellia, D. mauritiana, and D. yakuba.
MATERIALS AND METHODS
Lines: Worldwide collections of isofemale lines were used for both D. melanogaster and D. simulans. The D. melanogaster lines were from France (MT61, R27, MT42), Zimbabwe (Z29, Z22, Z40), Kenya (K3782), Long Island, New York (DPF1-95, DPF2-95, DPF3-95), and Israel (N-line). The D. simulans lines were obtained from the MidAmerica Species stock center and were from Hawaii (No. 0.251.0), Guyana (0.251.1), Colombia (0.251.2), Australia (0.251.4), Guatemala (0.251.161), Kushla (0.251.162), Morro Bay (0.251.163), Florida City (0.251.165), Islamorada (0.251.166), and Soloway-Hochman (0.251.167). Single lines of D. sechellia (lab of J. David), D. mauritiana (line S080, Umeå Stock Center), and D. yakuba (line S180, Ivory Coast, Umeå Stock Center) were used. From each line a single male was mated to 5–10 virgin females, and a line that was isogenic for the Y chromosome was established.
Primary sequence recovery: Messenger RNA from D. melanogaster was extracted from 150–300 male flies using the Quick-Prep kit from Pharmacia (Piscataway, NJ), and single-strand cDNA was subsequently generated using the Pharmacia protocol. Two outward-facing Dhc-Yh3-specific primers (MELDYN1, 5′-GCTATAAACTTTAACGCAGTC-3′ and MELDYN2, 5′-GCAAGCAATATGCTCTC-3′) were used in conjunction with two degenerate inward-facing primers (DYN3, 5′-TTCCCCCGYTTYTAYTTYGT-3′ and DYN4, 5′-GTCNCGRTCNACBATCCA-3′) designed from the alignment of the sea urchin and Chlamydomonas amino acid sequences. Approximately 10 ng of single-stranded cDNA was amplified in 50 μl of 50 mM Tris-HCl, pH 8.3, 20 mM KCl, 0.01% gelatin, 2.5 or 3.0 mM MgCl2, 2 units of Amplitaq polymerase (Perkin-Elmer, Norwalk, CT), and 120 ng of each primer. The resulting PCR fragments were run on low-melting-point agarose, excised, and used as template in a double-stranded sequencing reaction (Khorana et al. 1994). Sequencing primers were spaced about every 250 bp. Reaction products from each sequencing reaction were loaded on standard acrylamide gels with an electrolyte gradient and electrophoresed for 3–5 hr. Both strands were completely sequenced for each allele, with rare gaps of 5–10 bases for which only one strand produced readable sequence. All differences were confirmed on both strands.
RESULTS
Figure 1 depicts the amplified region and strategy used to recover D. melanogaster Dhc-Yh3-specific dynein sequences. The degenerate primers allowed us to amplify a 1738-nucleotide region (all numbers referred to here are with respect to the D. melanogaster sequence). This translates into 402.66 effectively silent sites (Kreitman 1983), which is a region larger than that screened for most genes surveyed in both species (Moriyama and Powell 1996). In early attempts to use genomic DNA as template, it became apparent that a small region, between nucleotide positions 1476 and 1569, would not amplify and that this was the site of a potential intron. In D. simulans we amplified the region from nucleotides 60 to 1689 (372.67 effective sites) using cDNA generated from our iso-Y lines. For D. sechellia, D. mauritiana, and D. yakuba, 1719 bases were sequenced from cDNA. Sequences for the five species are available under GenBank accession nos. AF136243–AF136266.
The nucleotide sequence differences for D. melanogaster, D. simulans, D. sechellia, D. mauritiana, and D. yakuba are given in Figure 2. There are two replacement changes (a Thr/Ala difference between D. yakuba and the D. melanogaster-simulans lineage at nucleotide 1015 and a Ser/Thr change between D. melanogaster and the other species at nucleotide 1270) and 139 silent substitutions overall, with 48 silent substitutions fixed between D. melanogaster and D. simulans. The pairwise silent-site divergence between D. melanogaster and D. simulans (d = 0.13) is at the high end of estimates from a survey of autosomal and sex-linked genes (Moriyama and Powell 1996; average d = 0.102). Finally, we observed no polymorphisms in our sample of 11 D. melanogaster Y chromosomes from a diversity of localities including both cosmopolitan and east African populations. In our sample of 10 D. simulans chromosomes a single silent polymorphism at nucleotide 1377 (G to A) was observed in association with the Australia, Hawaii, and Guatemala lines.
Figure 1. Diagram of the major cytological regions and fertility factors of the D. melanogaster Y chromosome. The strategy to recover a 1738-bp portion of the Y chromosome-associated dynein gene Dhc-Yh3 using the published 348-bp cDNA sequence from Gepner and Hays (1993) is shown. The position of the proposed intron is marked by an inverted triangle.
It is generally assumed that most synonymous site mutations evolve in a neutral fashion and that their dynamics can be modeled with a Wright-Fisher population (Ewens 1979), where steady-state polymorphism is determined by the joint parameter θ = 4Neμ, where Ne is the effective number of diploid individuals of both sexes, and μ is the neutral mutation rate per silent site. Given that the Y chromosome has a lower effective number than the rest of the genome, we would have expected to see less polymorphism; but would we have expected to observe zero polymorphisms in D. melanogaster and only a single polymorphic site in D. simulans? This question may be approached statistically in two ways. First, we can jointly contrast the level of polymorphism and divergence with another autosomal or sex-linked gene (i.e., the HKA test; Hudson et al. 1987). The HKA test of Dhc-Yh3 against the 5′ Adh region in D. melanogaster is not significant (χ² = 2.80, P < 0.094). In D. simulans, the Dhc-Yh3 region was tested against the G6pd (Eanes et al. 1996) and Tpi (Hasson et al. 1998) genes, both found in chromosomal regions with normal recombination; only the contrast with Tpi is statistically significant (χ² = 4.25, P < 0.039). This test depends on the choice of genes used in the contrast, and in this case the results are mixed. Alternatively, one may estimate the parameter θ for a large sample of autosomal and X-linked genes, assume that the average is representative of the true value, and predict the number of Y-linked polymorphisms expected after correcting for the different copy number of each chromosome. For X-linked and Y-linked genes, θ is three-quarters and one-quarter the autosomal value, respectively. Watterson's estimator is then used to determine the expected number of segregating sites under the infinite sites model (Watterson 1975). The results are summarized in Table 1. For a region of this size, numerous Y-associated polymorphisms were expected in both species.
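The copy-number correction and the Watterson expectation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculation behind Table 1: the function names are hypothetical, and the autosomal θ of 0.01 per silent site is an arbitrary placeholder. Only the sample size (11 copies) and the number of effectively silent sites (402.66) are taken from the text.

```python
def watterson_coefficient(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n gene copies (Watterson 1975)."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(theta_autosomal, copy_ratio, n, sites):
    """Expected number of segregating sites, E[S] = theta_site * sites * a_n.

    theta_autosomal: per-site theta estimated from autosomal genes
    copy_ratio: chromosome copy number relative to autosomes
                (3/4 for the X, 1/4 for the Y)
    n: number of sampled chromosomes
    sites: number of effectively silent sites surveyed
    """
    theta_site = theta_autosomal * copy_ratio
    return theta_site * sites * watterson_coefficient(n)


Y_RATIO = 0.25          # Y-linked theta is one-quarter the autosomal value
THETA_AUTOSOMAL = 0.01  # hypothetical placeholder, not a value from the paper

expected_y = expected_segregating_sites(THETA_AUTOSOMAL, Y_RATIO, n=11, sites=402.66)
print(f"Expected Y-linked segregating sites: {expected_y:.2f}")
```

Substituting an empirical autosomal average for `THETA_AUTOSOMAL` gives the kind of expectation that is compared with the observed count of polymorphisms.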
Conditioned on the observed mean value of θ = 0.00135 (for 15 autosomal and 7 sex-linked genes in Moriyama and Powell 1996), our sample size, and the effective number of silent sites, an average of S = 4.11 polymorphisms would be expected in D. melanogaster. For historical reasons, the Moriyama and Powell (1996) average is biased toward genes in regions of very reduced recombination, and if only those genes in regions of normal recombination are used we expect an average of S = 4.42 polymorphisms. Using Equation 9.5 from Tavaré (1984), the probability of observing zero polymorphisms under stochastic sampling was low (P = 0.038). Thus, one would expect to see Y-linked polymorphisms for a region of this size, and their absence is statistically significant even given the large stochastic variance associated with the historical process. The observation of zero polymorphisms precludes putting an estimate on θ for D. melanogaster.
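To make the sampling argument concrete, the following sketch computes the probability of seeing no segregating sites in a sample of n nonrecombining copies under the neutral infinite-sites coalescent. It uses the standard product formula, in which the next event with k lineages is a coalescence rather than a mutation with probability (k − 1)/(k − 1 + θ). That this matches the exact form of Tavaré's Equation 9.5 is an assumption, the function name is hypothetical, and the result need not reproduce the reported P value exactly.

```python
def prob_no_segregating_sites(theta_locus, n):
    """P(S = 0) for n copies with no recombination.

    theta_locus is theta summed over all surveyed sites (per-site theta
    multiplied by the number of sites). With k lineages present, the next
    event is a coalescence rather than a mutation with probability
    (k - 1) / (k - 1 + theta_locus).
    """
    p = 1.0
    for k in range(2, n + 1):
        p *= (k - 1) / (k - 1 + theta_locus)
    return p


# Convert an expected number of segregating sites back to a locus-wide theta.
n = 11
a_n = sum(1.0 / i for i in range(1, n))
theta_locus = 4.11 / a_n  # 4.11 is the expectation quoted in the text

print(f"P(S = 0) = {prob_no_segregating_sites(theta_locus, n):.3f}")
```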
Figure 2. Distribution of variable nucleotide sites between five members of the D. melanogaster subgroup for a 1738-bp region of the Y-linked dynein gene Dhc-Yh3.
Table 1. Summary numbers and parameters for Y-linked dynein in D. melanogaster and D. simulans
Our estimate of θ for the D. simulans Dhc-Yh3 region was 0.00084, or an order of magnitude lower than the one-quarter autosome–one-third X chromosome-derived prediction. For D. simulans, we would have expected S = 8.48 polymorphisms. Our observation of one (or fewer) polymorphism in our sample was expected with probability P = 0.013 (Equation 9.5; Tavaré 1984). We conclude that the observation of only one polymorphism for this part of the Dhc-Yh3 gene is not compatible with the simple reduction in population size predicted from copy number differences between the Y chromosome and the X and autosomes.
We may use Tavaré's Equation 9.5 (Tavaré 1984) to put an upper 95% confidence limit on θ for both species. For D. melanogaster the lower limit of θ is zero, and the upper 95% confidence limit is 0.0034. For D. simulans the upper 95% confidence limit on θ is 0.0062.
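One way to obtain an upper limit of this kind is to find the largest θ for which the observed count of segregating sites (or fewer) still has probability of at least 5%. The sketch below does this by bisection, using the standard no-recombination coalescent distribution of S built as a convolution of geometric mutation counts. The function names are hypothetical, the approach is an assumption about how such a limit could be computed, and the output need not match the reported limits exactly.

```python
def segregating_site_pmf(theta_locus, n, max_s):
    """P(S = 0), ..., P(S = max_s) for n copies, no recombination, infinite sites.

    While k lineages remain, the number of mutations before the next
    coalescence is geometric with success probability (k-1)/(k-1+theta).
    Convolving over k = 2..n gives the distribution of S; truncating each
    step at max_s leaves the retained probabilities exact.
    """
    pmf = [1.0] + [0.0] * max_s
    for k in range(2, n + 1):
        q = theta_locus / (k - 1 + theta_locus)  # next event is a mutation
        geom = [(1.0 - q) * q ** m for m in range(max_s + 1)]
        pmf = [sum(pmf[j] * geom[s - j] for j in range(s + 1))
               for s in range(max_s + 1)]
    return pmf


def upper_theta_limit(observed_s, n, alpha=0.05, hi=100.0, tol=1e-9):
    """Largest locus-wide theta with P(S <= observed_s) >= alpha, by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(segregating_site_pmf(mid, n, observed_s)) > alpha:
            lo = mid  # data still plausible: the limit lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Zero polymorphisms among 11 copies; divide by the number of silent sites
# to express the limit per site.
limit_locus = upper_theta_limit(observed_s=0, n=11)
print(f"Upper 95% limit on per-site theta: {limit_locus / 402.66:.4f}")
```

The bisection relies on P(S ≤ s) decreasing as θ increases, which holds because larger θ adds mutations on every branch of the genealogy.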
In principle, relative values of the diploid effective population size can be estimated from each chromosome or organelle, once corrected for genome copy number. Based on polymorphism at synonymous sites, Table 2 compares the estimates of Neμ from autosomal and X-linked loci, mtDNA, and Y-linked Dhc-Yh3. If mutation rates are equal between sexes, chromosomes, and organelles, all estimates should be the same. The comparison for D. melanogaster is difficult to assess because the lower estimate of θ is zero for the Y chromosome. Estimates of Neμ for D. simulans differ by an order of magnitude across chromosomes and the mitochondrial genome.
DISCUSSION
This study constitutes the first estimate of intra- and interspecific variation in a Y-linked single copy gene in Drosophila. Our fundamental observation is that in the 1738-bp region of the Dhc-Yh3 gene we see no polymorphism within D. melanogaster and only a single silent polymorphism in D. simulans. The level of interspecific divergence, characterized as simple pairwise divergence per silent site, is comparable to the average observed for a sample of autosomal and X-linked genes. While the overall results can be succinctly summarized, the lack of variation in association with the unique nature of the Y chromosome raises a number of questions.
Explanations for the lower-than-expected variation associated with the Y chromosome must involve either male-associated reductions in the effective population number of the Y or a reduction in the male-associated mutation rate. Bauer and Aquadro (1997) recently evaluated sex-specific rate differences by comparing the number of germline divisions in males and females and also comparing levels of synonymous site divergence on the X chromosomes and autosomes. They concluded the sex-specific mutation rates were similar. Is there evidence for Dhc-Yh3 that the synonymous site mutation rate is different for the Y chromosome? While the pairwise divergence at synonymous sites is higher for Dhc-Yh3 compared to the values for autosomal-X genes provided by Moriyama and Powell (1996), divergence per se underestimates relative mutation rate differences. This is because under the neutral theory divergence includes not only time since the isolation of the ancestral gene pools but the 2N generations associated with the ancestral coalescence (Gillespie and Langley 1979). Recognizing this, it would appear that the Y-associated mutation rate is at least not lower than for X or autosomal genes, and may even be higher. Irrespective of whether the rate is truly higher or simply equal, a reduced mutation rate does not appear to be a viable explanation for the reduced variation on the Y chromosome.
Table 2. Comparison of estimates of Nμ based on Y-linked Dhc-Yh3, mitochondrial cytochrome b, and averages for autosomal and X-linked genes
The other possibility is that the effective population size of the Y has been reduced below the level expected from simple chromosome copy number differences. In this regard, studies of sequence polymorphism in genes scattered across the genome of D. melanogaster show a systematic pattern of low polymorphism in regions of reduced recombination (e.g., Aguadé et al. 1989; Berry et al. 1991; Begun and Aquadro 1992). This reduction has been attributed to two types of genetic hitchhiking (Maynard-Smith and Haigh 1974), termed selective sweeps (Kaplan et al. 1989; Stephan et al. 1992) and background selection (Charlesworth et al. 1993; Hudson and Kaplan 1995). In both models, steady-state levels of linked neutral polymorphism depend on the regional level of recombination as well as the mutational input from either advantageous (under sweeps) or deleterious (under background selection) mutations in the region under restricted recombination. While there is no recombination on the Y chromosome, only six genes or fertility factors have been identified (see Kennison 1981). Charlesworth et al. (1993) examined the ability of background selection to generate reductions in heterozygosity for several chromosomal regions observed to possess reduced variation. If values of sh = 0.01–0.02 are assumed, the total deleterious mutation rate must be >5 × 10⁻³ per locus to see Y-associated reductions in heterozygosity of this magnitude. The required rates become even larger if higher levels of dominance are assumed for Y-linked genes. Although dynein genes have large coding regions (the sea urchin sperm dynein gene encodes 4466 amino acid residues), with introns possibly in the 100-kb range, these necessary rates are still unrealistically large. On the other hand, adaptive sweeps cannot be rejected.
Perlitz and Stephan (1997) used coalescent methods to estimate the time since fixation of the last advantageous mutation (or catastrophic hitchhiking event) for four autosomal and sex-linked regions with low recombination in several Drosophila species. Their estimates put the frequency of a sweep at one every 0.1N generations. One cannot rule out such adaptive sweeps on the Y chromosome, but the number of genes in the regions studied in Perlitz and Stephan (1997) is still an order of magnitude, or more, greater than the number of genes on the Y chromosome, so the opportunity will be reduced accordingly. It should be pointed out that studies of viability, segregational, and fertility variation in D. melanogaster have found no contribution associated with Y chromosomes (Clark 1987, 1990; Lyckegaard and Clark 1989; Clark et al. 1990). In contrast, the potential for natural selection to act on the Y-linked rDNA array associated with the abnormal abdomen phenotype has been shown for D. mercatorum (Hollocher and Templeton 1994). Y-linked adaptive sweeps could also be associated with the periodic emergence of sex-ratio chromosome polymorphisms if transmission modifiers have arisen on the Y chromosome. Evidence for X-linked sex-ratio polymorphisms and Y-linked suppression factors has been recently shown in D. simulans (Cazemajor et al. 1997), so this must be seriously entertained as a potential type of adaptive sweep. A final explanation for the low level of polymorphism may simply be the disproportionate impact that male fertility variation has on the effective population number of the Y chromosome. While sex-specific fertility variation in Drosophila, as reflected in the components of mating success, will be present in both sexes, it is proposed to be much larger in males (Brittnacher 1981; Brockett et al. 1996).
Sex-specific variation in fertility will have a disproportionate effect on the effective population number of different chromosomes and organelles; the effective population number is averaged between the sexes for the autosomes, while it is biased toward the female contribution for the X-linked genome, exclusively female biased for the mitochondrial genome, and exclusively male biased with respect to Y-linked genes. This male bias associated with fertility variation probably contributes something to the absence of nucleotide polymorphism on the Y chromosome. However, even if the variance in fertility is orders of magnitude higher in males, this cannot reduce the population number of the Y relative to the autosomes by more than a factor of 2 (Jacquard 1974).
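The factor-of-2 bound can be illustrated with the standard textbook expressions for effective number when the effective numbers of breeding males and females differ. These formulas are an assumption brought in for illustration and are not quoted from the source. All values are on the autosomal scale, so that θ = 4Neμ applies to each compartment, and the function name is hypothetical.

```python
def effective_numbers(n_males, n_females):
    """Effective numbers by compartment for unequal breeding sexes.

    Standard textbook forms, expressed on the autosomal (diploid) scale.
    """
    return {
        "autosome": 4.0 * n_males * n_females / (n_males + n_females),
        "X": 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females),
        "Y": n_males / 2.0,
        "mtDNA": n_females / 2.0,
    }


# Equal sexes recover the copy-number expectations: X = 3/4, Y = 1/4.
equal = effective_numbers(500, 500)
print(equal["Y"] / equal["autosome"], equal["X"] / equal["autosome"])

# Extreme male fertility variance: very few effective males.
skewed = effective_numbers(1, 1_000_000)
print(skewed["Y"] / skewed["autosome"])
```

Under these expressions the Y-to-autosome ratio is (Nm + Nf)/(8Nf), which falls from 1/4 with equal sexes toward a floor of 1/8 as the effective number of males shrinks. That is at most a twofold reduction below the copy-number expectation.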
This entire discussion has been predicated on the assumption that the synonymous changes on all chromosomes are truly neutral. This is likely to be the case for synonymous sites in the Dhc-Yh3 gene because its codon bias is extremely low. The codon adaptation index (CAI; Sharp and Li 1987), a measure of uneven codon usage, varies overall from low values of ~0.15, which is the expected range for a gene evolving strictly under mutation-drift pressure (Moriyama and Hartl 1993), to a hypothetical upper value of 1.0 if only preferred codons are used in a gene. The most biased genes such as the highly transcribed glycolytic genes and abundant structural proteins such as actin and tubulin have CAI values as high as 0.803 (see Kliman and Hey 1993). The CAI value for Dhc-Yh3 is 0.174, which in the Kliman and Hey compilation of 385 genes is eclipsed on the low end only by the fourth chromosome ci gene (CAI = 0.165). It would appear the effective population numbers for both the fourth and Y chromosomes are reduced to the point that codon usage reflects simple mutation pressure. The theoretical arguments for codon bias and its variability are associated with the nearly neutral theory (Li 1987; Akashi 1995), where small selection coefficients (associated with translational efficiency) become effectively neutral as they approach in magnitude the reciprocal of the effective population size. The least-biased genes have been observed on the fourth chromosome as well as in regions of severely depressed recombination (Kliman and Hey 1993). Our observation for the Dhc-Yh3 gene clearly extends the association of low bias with reduced recombination to the Y chromosome and implies that most synonymous mutation in Dhc-Yh3 behaves as truly neutral.
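For readers unfamiliar with the index, CAI is the geometric mean of the relative adaptiveness values of the codons in a gene, where each codon's value is its usage relative to the most-used synonymous codon in a reference set of highly expressed genes (Sharp and Li 1987). The sketch below shows only the arithmetic. The weight table is a hypothetical toy example, not Drosophila data, and the function name is illustrative.

```python
import math


def codon_adaptation_index(codons, weights):
    """Geometric mean of relative adaptiveness values over a gene's codons.

    Codons missing from the weight table (for example, codons of amino
    acids with a single codon) are skipped in this sketch.
    """
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for illustration only: 1.0 marks the preferred codon.
toy_weights = {"CTG": 1.0, "CTA": 0.25, "GCC": 1.0, "GCA": 0.25}

preferred_only = codon_adaptation_index(["CTG", "GCC"], toy_weights)
mixed_usage = codon_adaptation_index(["CTG", "CTA", "GCC", "GCA"], toy_weights)
print(preferred_only, mixed_usage)
```

A gene built only from preferred codons scores 1.0, and the score falls as less-preferred codons are used more often.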
Acknowledgments
We acknowledge the MidAmerica Stock Center and Umeå Stock Center for supplying a number of lines. Dave Duvernell and Yihao Duan read and commented on an earlier version of the article. This study was supported in part by U.S. Public Health Service Grant GM-45247 and National Science Foundation Grant DEB-9318381 to W. F. Eanes, and by the Institute of Entomology, Czech Academy of Science.
Footnotes
- Communicating editor: W. Stephan
- Received December 10, 1998.
- Accepted July 29, 1999.
- Copyright © 1999 by the Genetics Society of America