Abstract
Sex chromosomes are generally morphologically and functionally distinct, but the evolutionary forces that cause this differentiation are poorly understood. Drosophila americana americana was used in this study to examine one aspect of sex chromosome evolution, the degeneration of nonrecombining Y chromosomes. The primary X chromosome of D. a. americana is fused with a chromosomal element that was ancestrally an autosome, causing this homologous chromosomal pair to segregate with the sex chromosomes. Sequence variation at the Alcohol Dehydrogenase (Adh) gene was used to determine the pattern of nucleotide variation on the neo-sex chromosomes in natural populations. Sequences of Adh were obtained for neo-X and neo-Y chromosomes of D. a. americana, and for Adh of D. a. texana, in which it is autosomal. No significant sequence differentiation is present between the neo-X and neo-Y chromosomes of D. a. americana or the autosomes of D. a. texana. There is a significantly lower level of sequence diversity on the neo-Y chromosome relative to the neo-X in D. a. americana. This reduction in variability on the neo-Y does not appear to have resulted from a selective sweep. Coalescent simulations of the evolutionary transition of an autosome into a Y chromosome indicate there may be a low level of recombination between the neo-X and neo-Y alleles of Adh and that the effective population size of this chromosome may have been reduced below the expected value of 25% of the autosomal effective size, possibly because of the effects of background selection or sexual selection.
Sex chromosomes generally exhibit a high degree of morphological and functional differentiation (Bull 1983). This heteromorphism is usually represented by a heterochromatic sex-limited chromosome (Y or W) that has few coding loci, but often contains an abundance of repetitive sequences. The chromosome (X or Z) that is shared between the two sexes contains a variety of functional genes and may exhibit some form of dosage compensation between the sexes. Sex chromosomes in a variety of taxa exhibit this pattern of heteromorphism, suggesting many independent origins of sex chromosomes from homologous chromosomal pairs (Bull 1983). Support for this hypothesis is provided by sequence comparisons that reveal independent autosome-to-sex chromosome transitions in mammals and birds (Graves 1995; Lahn and Page 1997; Fridolfsson et al. 1998). The absence of shared sequences between the X and Y chromosomes of Drosophila has, however, inspired a proposal of an alternative pathway for the evolution of its Y chromosome (Hackstein et al. 1996). A variety of models have been proposed to account for various aspects of the evolutionary transition of an identical chromosomal pair into heteromorphic sex chromosomes (Muller 1914, 1918; Nei 1970; Charlesworth 1978, 1996; Rice 1987, 1996, 1998), although the mechanisms involved are poorly supported by empirical evidence.
Once a pair of homologues begins to segregate as proto-sex chromosomes, differentiation cannot occur until recombination in the heterogametic sex is restricted between the two sex chromosomes. Restricted recombination between the two sex chromosomes causes the genetic isolation of the sex-limited chromosome, and this isolation from recombination is an integral component of most models of Y-chromosome degeneration (Charlesworth 1996; Rice 1996). An empirical study of an artificial Y chromosome has confirmed the importance of this genetic isolation by demonstrating the accumulation of nonrecessive deleterious alleles in the absence of recombination (Rice 1994). Accumulation of deleterious alleles and subsequent loss in gene function on a nonrecombining chromosome must occur through an interaction between mutation, selection, and drift; alternative models of degeneration involve different relative contributions from these forces (Muller 1918; Nei 1970; Charlesworth 1978, 1996; Rice 1987, 1996, 1998; Charlesworth and Charlesworth 1997; Orr and Kim 1998). Two processes, background selection and selective sweeps, are likely candidates for the initial decay in gene function on a nonrecombining proto-Y chromosome when effective population size (Ne) is very large, such as in many Drosophila species (Rice 1987, 1996; Charlesworth 1996). The stochastic accumulation of deleterious alleles caused by the successive loss of the class of chromosomes carrying the smallest number of deleterious mutations (Muller 1964), commonly known as Muller's ratchet, may also contribute to the degeneration of a nonrecombining proto-Y chromosome (Charlesworth 1978). The speed of this process is, however, highly dependent on Ne and characteristics of deleterious mutations (Charlesworth and Charlesworth 1997), so that it is not clear whether it would have a significant effect in groups like Drosophila, with their very large effective population sizes (Charlesworth 1996).
Under both background selection and Muller's ratchet, a nonrecombining Y chromosome has a low effective size, due to purifying selection continually removing chromosomes containing deleterious mutations from the population (Charlesworth et al. 1993; Rice 1998). When selection is sufficiently strong in relation to 1/Ne, so that wild-type alleles always predominate at a given locus, mutation from wild-type to mutant allele is effectively irreversible. A sample of genes taken from a nonrecombining population ultimately traces its ancestry back to the relatively small genotypic class carrying the smallest number of such strongly selected deleterious mutations. This may facilitate the accumulation of mildly deleterious mutations (for which selection is weak in relation to 1/Ne) on a proto-Y chromosome, because drift can cause such mutations to rise to high frequencies or even fixation (Charlesworth 1996) under conditions when they will be eliminated from a proto-X chromosome, because of the greater effectiveness of purifying selection in overcoming the effects of drift when Ne is large (Ohta 1992). Similarly, the fixation of advantageous mutations is inhibited on the proto-Y chromosome (Rice 1996; Orr and Kim 1998).
The selective sweep model proposes that selection acts in a positive way. An advantageous mutation occurring on a particular proto-Y chromosome may cause the fixation, or selective sweep, of the entire chromosome on which it occurs. Deleterious alleles at other loci on this chromosome would fix by hitchhiking with the beneficial allele (Rice 1987), assuming that the chromosome has a net fitness advantage. Fixation of deleterious alleles during a selective sweep may be facilitated by the masking effect of the X chromosome (Muller 1914, 1918; Fisher 1935; Nei 1970). The long-term consequence of these selective sweeps would be a locus-by-locus decay in genetic function on the proto-Y chromosome as successive alleles are fixed in the population.
Species with secondary sex chromosomes, formed by the fusion of an autosome with a primary sex chromosome, provide excellent models for studying sex chromosome evolution (Steinemann and Steinemann 1998). Several species of Drosophila have secondary sex chromosomes that represent a gradation in sex chromosome differentiation, with older chromosomes exhibiting greater differentiation (Ashburner 1989; Charlesworth 1996; Powell 1997). The secondary sex chromosomes of Drosophila americana americana were formed by fusion of the primary X with chromosomal element 4 (Muller's element B), which was a freely segregating autosome in the ancestral state represented by D. a. texana and D. virilis (Hughes 1939; Stalker 1940; Throckmorton 1982). Following the original fusion event, chromosomes with the fused X/4 arrangement would have acquired sequence variability through recombination with unfused chromosomes in heterozygous females. As the X/4 fusion increased in frequency, the element 4 homologues began to segregate as sex chromosomes. In D. a. americana (see Figure 1a), neo-X chromosomes are represented by the element 4 arm of X/4 fusion chromosomes, and neo-Y chromosomes are represented by free fourth chromosomes, which are male-limited and segregate with the ancestral Y during male meiosis. Recombination has been restricted between the neo-X and neo-Y in D. a. americana since the fixation (or near fixation) of the X/4 fusion, due to the extremely low frequency of recombination in male D. virilis (Kikkawa 1935), as in other species of Drosophila (Gethmann 1988). With this genetic isolation of the neo-Y, the process of degeneration was initiated.
Figure 1. Karyotypes of D. virilis, D. a. americana, and F1 hybrids. The dot chromosomes are excluded from the diagrams. (a) Chromosomal configurations in male D. virilis and male D. a. americana. All five chromosomal elements are present in D. virilis. The X/4 fusion, which created the neo-X and neo-Y, is represented in the karyotype of the male D. a. americana together with the fusion of chromosomes 2 and 3. (b) Chromosomal configurations of the male and female F1 progeny derived from a cross between female D. virilis and male D. a. americana. Due to segregation in male D. a. americana, the neo-Y chromosome is inherited by male F1 progeny and the neo-X chromosome is inherited by female F1 progeny.
The secondary sex chromosomes in D. a. americana apparently represent a very early stage of differentiation; therefore, they should provide a good model for examining the mechanisms that lead to the initial loss in gene function on Y chromosomes. There is no cytological or genetic evidence for degeneration of the neo-Y or for dosage compensation on the neo-X in D. a. americana (Bone and Kuroda 1996; Marin et al. 1996; Charlesworth et al. 1997a). One explanation for the lack of differentiation between the neo-sex chromosomes of D. a. americana is that the recent fixation of the X/4 fusion has provided little opportunity for differentiation to occur. The argument is supported by the absence of the X/4 fusion in D. a. texana (Throckmorton 1982), coupled with DNA sequence data indicating that D. a. americana and D. a. texana are very closely related (Tominaga and Narise 1995; Hilton and Hey 1996, 1997; Nurminsky et al. 1996). The presence of the X/4 fusion chromosome in D. a. americana is currently the only diagnostic feature distinguishing the two subspecies. Previous studies indicated that the X/4 fusion chromosome is fixed throughout the range of D. a. americana in the north central to northeastern United States, whereas this arrangement is not present in D. a. texana in the south central to southeastern United States (Hsu 1952; Patterson and Stone 1952; Throckmorton 1982). A hybrid zone involving the X/4 fusion is present in the parapatric region where the ranges of the two subspecies overlap (Throckmorton 1982). Because of the presence of the hybrid zone, an alternative explanation for the lack of degeneration of the neo-Y in D. a. americana is that gene flow from D. a. texana provides a “fresh” input of unfused autosomes into the pool of neo-Y chromosomes (Charlesworth et al. 1997a).
In this study, a population analysis of DNA sequences is presented for the Alcohol Dehydrogenase (Adh) gene of D. a. americana and D. a. texana. The Adh gene is located at cytological band 49B, which is near the centromere of chromosomal element 4 in D. virilis, D. a. americana, and D. a. texana (Gubenko and Evgen'ev 1984; Nurminsky et al. 1996; Charlesworth et al. 1997a). Sequence data from Adh provide a direct measure of differentiation between neo-X and neo-Y chromosomes of D. a. americana and the autosomes of D. a. texana. Furthermore, the pattern of sequence variation on the neo-Y chromosomes of D. a. americana can be examined to determine if hitchhiking due to background selection, Muller's ratchet, and/or selective sweeps has influenced nucleotide variability on this chromosome. These processes are expected to reduce nucleotide variability on this chromosome; background selection is, however, unlikely to cause a detectable departure from neutrality in the pattern of nucleotide variation at segregating sites, whereas selective sweeps can have a strong effect (Braverman et al. 1995; Charlesworth et al. 1995; Simonsen et al. 1995). The fact that the reductions in Ne in both Muller's ratchet and background selection result from a similar process suggests that they will have similar effects on patterns of variability, although this remains to be investigated formally.
MATERIALS AND METHODS
Strains: Laboratory strains of D. virilis and D. a. americana were obtained from the National Drosophila Species Resource Center (Charlesworth et al. 1997a). In addition to these laboratory lines, which represent a subspecies-wide sample and have been maintained in culture for many years, samples of D. a. americana and D. a. texana were collected from natural populations for this study. A sample of D. a. americana was collected during October 1996 from bait cups containing fermented banana hung among a dense stand of sandbar willows (Salix exigua) in a marsh located ∼10 km west of Gary, Indiana. A sample of D. a. texana was similarly collected during April 1997 among large black willows (S. nigra) growing in a cove of a lake about 5 km west of Lone Star, Texas. Each line was established from either a single wild-caught female or a wild-caught pair. The Gary sample is referred to as G96.# (with # being the line identification), whereas the Lone Star sample is referred to as LP97.#.
Karyotypic analyses were performed on the newly collected lines to verify the presence or absence of the X/4 fusion chromosome. The laboratory lines were also karyotyped recently (Charlesworth et al. 1997a). Larvae were cultured at low density on banana medium maintained at 18°. Ganglia from third instar larvae were dissected in Ringer's solution, placed in a 0.07 M sodium citrate solution for 15 min, fixed in a 1:1 solution of 100% ethanol:45% acetic acid, and stained in 2% orcein. Several cells from at least two larvae were scored for each line.
DNA extraction and sequencing: Hybrids between D. a. americana and D. virilis are easy to obtain, so DNA was extracted from hybrid offspring and used to obtain templates for sequencing. This procedure allowed for the use of species-specific primers so that single alleles of D. a. americana could be sequenced and assigned to neo-X or neo-Y chromosomes. Males from the D. a. americana lines were mated to females from a strain (V.46) of D. virilis carrying multiple visible mutations (see Charlesworth et al. 1997a). Segregation in male D. a. americana carrying the X/4 fusion results in the F1 female offspring carrying the neo-X chromosome, whereas F1 males carry the neo-Y chromosome (Figure 1b). F1 hybrids were also used to obtain a sequencing template from D. a. texana by crossing either a male or female to the V.46 strain of D. virilis. Single F1 flies were used for DNA extraction in all cases. Each fly was homogenized in 50 μl of a buffer containing 0.15 M NaCl, 0.1 M EDTA (pH 8.0), 0.5% SDS, and 150 μg/ml protease K. The solution was incubated at 55° for 1 hr and extracted once with phenol/chloroform (1:1) and once with chloroform alone. The DNA was ethanol precipitated and resuspended in TE.
Nucleotide sequences of the Adh locus from species in the virilis group were obtained from GenBank (Nurminsky et al. 1996). To obtain sequences from single alleles, differences between the Adh sequences of D. virilis and D. a. americana/D. a. texana were identified. Primers were designed in these regions, allowing for the use of PCR in the specific amplification of D. a. americana/D. a. texana alleles in the F1 hybrids. Two overlapping regions of the Adh gene were amplified (Figure 2): the 5′ end by Adh5f, 5′ CCG ACT AGA AAG CAT CAC, and AdhI2r, 5′ TTA CAT TCG GTT TGT TAT ATG, and the 3′ end by AdhI1f, 5′ GTG TGT ATC AGA TTT CTG C, and AdhE3r, 5′ ATT TGA ATG GTT TAG ATA TGC. To obtain a template for sequencing, DNA from the F1 flies was used in 25-μl reactions containing 1/10 reaction buffer (Perkin Elmer, Norwalk, CT), 2.5 mM MgCl2, 0.1 μM each dNTP, 0.2 μM each primer, and 2 units Taq polymerase. After an initial 95° denaturation for 2 min, the reactions were cycled 30 times using the following parameters: 95°, 0.5 min; 54°, 1 min; and 72°, 1 min. Upon obtaining reaction products, 20 μl of each reaction was column purified (QIAGEN, Valencia, CA). The four primers used to amplify the products and two additional internal primers were used in 10-μl cycle sequencing reactions. Approximately 30 ng of each purified PCR product was mixed with 4 μl of ABI PRISM dye terminator reaction mix (Perkin Elmer) and 3 pg of the appropriate oligonucleotide. Reactions were cycled 25 times with the following parameters: 95°, 30 sec; 54°, 30 sec; and 60°, 3 min. Products were ethanol precipitated and run on either an ABI 373 or ABI 377 automated DNA sequencer.
Figure 2. Structure and analysis of the Adh gene. Sequences were obtained for an 884-bp region of the gene, from 7 bases upstream of the initiation codon to 7 bases prior to the stop codon. The three exons are indicated by black boxes and the noncoding regions are represented by the line. Positions of the two primer pairs, Adh5f/AdhI2r and AdhI1f/AdhE3r, used for amplification of the region from genomic DNA, are indicated. Sequences of D. virilis (Dv) and D. a. americana/texana (Da) are indicated for the primer regions, where asterisks denote the sites of nucleotide differences between the primer sequences and the D. virilis sequence. Positions of two internal sequencing primers are also indicated by light lines.
Statistical analyses of sequence data: The DNA sequences were visually aligned using SeqPup (D. Gilbert, Indiana University) and compared using SITES (Hey and Wakeley 1997). The standard estimates of nucleotide diversity, Π and θ, and their expected errors, were calculated according to Tajima (1993). These are based respectively on the average pairwise differences present in the samples and the number of segregating sites in the samples. To determine the level of sequence divergence that was present among samples, the net number of pairwise differences (d) was estimated (Nei 1987, p. 276). The D statistics of Tajima (1989) and Fu and Li (1993) were used to determine if the distribution of variation within samples was consistent with a neutral model. Both statistics were based on all the observed nucleotide variants. The D. virilis sequence was used as the outgroup to determine the ancestral state for the Fu and Li test.
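The two diversity estimators and Tajima's D can be illustrated with a short sketch. This is not the SITES implementation. It assumes aligned sequences of equal length with no gaps, counts segregating sites rather than the number of mutations, and uses hypothetical function names (`diversity`, `tajimas_d`) chosen only for this example. The constants follow the published definition of Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def pairwise_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def diversity(seqs):
    """Return mean pairwise differences (k), segregating sites (S),
    and the per-site estimates pi and theta (Watterson's estimator)."""
    n = len(seqs)
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    k = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    return {"k": k, "S": S, "pi": k / length, "theta": S / a1 / length}


def tajimas_d(seqs):
    """Tajima's D computed from k and S; returns None when S is zero."""
    n = len(seqs)
    stats = diversity(seqs)
    k, S = stats["k"], stats["S"]
    if S == 0:
        return None  # the statistic is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy alignment, invented for illustration only (not data from this study).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
toy_stats = diversity(toy)
toy_d = tajimas_d(toy)
```

A negative D arises when θ exceeds Π, that is, when there is an excess of rare variants relative to the neutral expectation.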
It was also necessary to determine the statistical significance of an observed difference in variability between the neo-X and neo-Y chromosomes. An appropriate standard statistical test has not been devised to compare levels of nucleotide diversity between samples, so a resampling method was used. A pooled set of neo-X and neo-Y chromosomes corresponding to the sequences in the actual samples was randomly sampled with replacement into two groups of the same sizes as the original samples. For each iteration of this resampling, the mean number of pairwise differences between sequences was calculated for the two randomized samples, along with the difference in their means. Repetition of the resampling process 1000 times provided an estimate of the probability of obtaining neo-X and neo-Y samples from this population with a difference in mean pairwise difference greater than or equal to the observed value. Because the neo-X and neo-Y sequences were obtained from a single population of flies, it is appropriate to apply this test, which only considers the sampling error associated with the estimated diversities for these two samples and disregards the evolutionary stochastic error. The resampling procedure is simply a test of the null hypothesis that the neo-X and neo-Y alleles are sampled from one homogeneous population at the locality where the flies were collected.
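The resampling procedure can be sketched as follows. This is a minimal illustration and not the program used for the analysis. It assumes that the difference is taken as the first sample minus the second (a one-sided comparison), and the function names and the fixed seed are hypothetical choices made for reproducibility of the example.

```python
import random
from itertools import combinations


def mean_pairwise(seqs):
    """Mean number of pairwise differences among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def resampling_p_value(sample_x, sample_y, iterations=1000, seed=0):
    """Estimate the probability of a difference in mean pairwise
    differences at least as large as the observed one when both groups
    are drawn, with replacement, from the pooled sequences."""
    rng = random.Random(seed)
    pooled = list(sample_x) + list(sample_y)
    observed = mean_pairwise(sample_x) - mean_pairwise(sample_y)
    hits = 0
    for _ in range(iterations):
        group_x = rng.choices(pooled, k=len(sample_x))  # with replacement
        group_y = rng.choices(pooled, k=len(sample_y))
        if mean_pairwise(group_x) - mean_pairwise(group_y) >= observed:
            hits += 1
    return observed, hits / iterations


# Toy sequences, invented for illustration only (not data from this study).
toy_x = ["AAAA", "AATT", "TTAA", "TTTT"]
toy_y = ["AAAA", "AAAA", "AAAT"]
toy_observed, toy_p = resampling_p_value(toy_x, toy_y)
```

Because both groups are redrawn from the same pool, the procedure reflects only sampling error under the null hypothesis of a single population, as described above.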
Coalescent simulations: In addition, coalescent simulations were used to examine the time of origin of the neo-sex chromosomes and the extent of any reduction in effective population size for the neo-Y chromosome. We assume that the centric fusion generating the neo-X chromosome reached fixation T time units ago (Figure 3), where time is measured in units of the mean time to coalescence of a strictly neutral autosomal locus. One coalescent time unit is thus equal to 2Ne generations, where Ne is the effective population size with respect to autosomal genes, assumed to be constant throughout (Hudson 1990). The time taken for fixation of the fusion is assumed to be negligible compared with coalescent time, and is treated as zero in what follows. Prior to the origin of the neo-Y chromosome, the locus was inherited as an autosomal gene and hence had a coalescent time of unity.
After the fixation of the fusion, the neo-X and neo-Y alleles are assumed to have coalescent times that are fractions tx and ty of the autosomal rate (Figure 3). In the absence of sexual selection, which would cause the variance in reproductive success of males to be greater than that of females, tx is 0.75, but it may approach or even exceed 1 if there is intense sexual selection (Charlesworth 1994; Caballero 1995; Nagylaki 1995). We similarly expect ty to be equal to 0.25 in the absence of sexual selection, but to be reduced below this by sexual selection. Hitchhiking due to selective sweeps or background selection acting on the neo-Y would reduce ty even more. In the simulations, we assumed that tx = 0.75 or 1.0 and considered the effect of varying ty between 0.25 and 0.01. The two values for tx span the range between no sexual selection and intense sexual selection, and the values for ty span the range from no sexual selection or hitchhiking to intense effects of such factors. The neutral mutation rate scaled in units of coalescent time is denoted by M = 2 Neu, where u is the neutral mutation rate per generation, summed over all relevant sites.
Figure 3. Representation of the genealogical process without recombination as modeled in the coalescent simulations. Fixation of the X/4 fusion and isolation between the neo-Y and neo-X chromosomes occurs at time T before the samples of nx and ny neo-X and neo-Y alleles were collected. Prior to this the alleles were present in one population, but afterward the two populations were separate, each with their respective coalescent times, tx and ty. (a) Idealized genealogy of alleles with coalescent events occurring both after and prior to the fusion at time T. (b) Idealized example of “premature coalescence,” where all the neo-X and neo-Y alleles descend from a single common ancestor following the fixation of the X/4 fusion at T.
The coalescent model is similar to that previously employed for two subpopulations connected by migration (Hudson 1990; Wakeley 1996; Nordborg 1997), except that we allow for a finite time since the split between the two subpopulations. At the start of a replicate run, a sample of nx and ny genes is assumed to be drawn from neo-X and neo-Y chromosomes derived from a natural population. At an arbitrary time t prior to sampling (t ≤ T), let there be nxt and nyt genes from the sample present in the neo-X and neo-Y subpopulations, respectively. The rates per unit coalescent time of coalescent events within the neo-X and neo-Y subpopulations are kxt = nxt (nxt – 1)/2tx and kyt = nyt (nyt – 1)/2ty, respectively.
Let Rx be the rate (in units of coalescent time) at which alleles that are currently in the neo-X subpopulation are derived from the neo-Y population, and Ry be the corresponding rate at which alleles in the neo-Y population are derived from the neo-X subpopulation. If unfused fourth chromosomes are rare, there are approximately three times as many neo-X chromosomes as neo-Y chromosomes. It is thus reasonable to assume that Rx = 0.333 Ry. At time t, the net rates at which recombination events occur are thus nxtRx and nytRy for the neo-X and neo-Y, respectively. If these rates are of order 1, the corresponding distributions of the waiting times to recombination events are exponential with expectations of 1/(nxtRx) and 1/(nytRy). The probability that a given event occurs in a given subpopulation and is a coalescent or a recombination event is the ratio of the rate of the event to the sum of the rates for all four types of event (Hudson 1990). The latter is given by the sum of the rates of the events in question. A uniform random number is thus compared with the probabilities of each class of event to determine what happens at the time point following t. An exponentially distributed random number is chosen, from a distribution of mean one, and divided by the sum of the current rate parameters to determine the time from t to the next coalescent or recombination event.
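The choice of the next event and its waiting time can be sketched in a few lines. This is an illustrative reading of the description above, not the simulation program itself. The function names are hypothetical, and the parameter values in the example (including Rx and Ry) are arbitrary placeholders rather than estimates from the data.

```python
import random


def event_rates(nx, ny, tx, ty, Rx, Ry):
    """Rates, per unit coalescent time, of the four possible event types."""
    return {
        "coal_x": nx * (nx - 1) / (2.0 * tx),  # coalescence among neo-X genes
        "coal_y": ny * (ny - 1) / (2.0 * ty),  # coalescence among neo-Y genes
        "rec_x": nx * Rx,  # a neo-X gene derives from the neo-Y subpopulation
        "rec_y": ny * Ry,  # a neo-Y gene derives from the neo-X subpopulation
    }


def next_event(nx, ny, tx, ty, Rx, Ry, rng):
    """Return (event type, waiting time) for the next event back in time."""
    rates = event_rates(nx, ny, tx, ty, Rx, Ry)
    total = sum(rates.values())
    if total == 0.0:
        return None, float("inf")  # no event is possible
    # Exponential deviate of mean one, divided by the sum of the rates.
    wait = rng.expovariate(1.0) / total
    # Uniform deviate compared with the cumulative probabilities.
    u = rng.random() * total
    cumulative = 0.0
    chosen = None
    for name, rate in rates.items():
        if rate > 0.0:
            chosen = name
        cumulative += rate
        if u < cumulative and rate > 0.0:
            return name, wait
    return chosen, wait  # guard against floating-point round-off


# Placeholder parameters: tx = 0.75, ty = 0.25, and Rx = Ry / 3.
example_rates = event_rates(19, 19, 0.75, 0.25, 0.1, 0.3)
example_event, example_wait = next_event(1, 2, 0.75, 0.25, 0.0, 0.0,
                                         random.Random(1))
```

With one neo-X gene, two neo-Y genes, and no recombination, the only possible event is a coalescence within the neo-Y subpopulation, which gives a simple check of the selection step.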
The genes within a given subpopulation are classified as having been present in that population in the initial sample or as having entered the subpopulation from the other one by recombination. Records are kept of the numbers of genes in each of these classes and of the numbers of genes within each of them that have previously experienced a coalescent event with a given class of gene. Mutations are laid down on internal and external branches of the gene tree connecting the set of sampled alleles, according to Poisson distributions with means 0.5 M times the lengths of the appropriate branches. Singleton variants correspond to mutations on external branches; fixed mutations correspond to mutations that occur on branches that are not shared by genes sampled from different subpopulations. This procedure is repeated until time T is reached or until coalescence of all the neo-X genes or neo-Y genes with other genes present in the same initial subpopulation (“premature coalescence”). Premature coalescence corresponds to a situation in which one or more of the following possibilities are realized: (i) all variants on the neo-X or all neo-Y are found only within their own subpopulation; (ii) there are fixed differences between the neo-X and neo-Y chromosomes; or (iii) there are no segregating sites within either or both of the neo-X and neo-Y samples (Figure 3b). Because any one of these events is incompatible with the observed data, runs of this type were not used for calculating the statistics of interest.
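Placement of mutations on the gene tree can be illustrated with the sketch below, which draws a Poisson number of mutations for each branch with mean 0.5 M times the branch length, as stated above. The branch labels, the value of M, and the function names are hypothetical. The Poisson deviate is generated by counting arrivals of a unit-rate process, which is one of several equivalent ways to sample it and is not necessarily the method used in the original program.

```python
import random


def poisson(mean, rng):
    """Poisson deviate: count unit-rate exponential arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def drop_mutations(branch_lengths, M, rng):
    """Number of mutations on each branch, with mean 0.5 * M * length."""
    return {
        branch: poisson(0.5 * M * length, rng)
        for branch, length in branch_lengths.items()
    }


# Hypothetical branch lengths in coalescent time units, for illustration.
toy_branches = {"external_1": 0.2, "external_2": 0.2, "internal_1": 0.0}
toy_mutations = drop_mutations(toy_branches, M=10.0, rng=random.Random(3))
```

Mutations that fall on external branches would then be scored as singleton variants, following the classification given in the text.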
After time T is reached in the absence of premature coalescence, coalescent events proceed at a rate determined by the autosomal rate, and no further recombination events are allowed, because the genes now all form part of the same population. As before, however, records are still kept of the numbers of genes that derived from the original set of neo-X or neo-Y genes and of the numbers that have experienced a coalescent event with a gene in a given category.
RESULTS
Patterns of DNA sequence variability: Sequences of an 884-bp region of the Adh gene were obtained for 6 neo-X and 5 neo-Y chromosomes from the laboratory strains of D. a. americana, 19 neo-X and 19 neo-Y chromosomes from the recently collected D. a. americana G96 sample, and 10 autosomes from the recently collected D. a. texana LP97 sample. A total of 41 variable nucleotide sites were identified upon comparison of the 884-bp sequences from these 59 chromosomes, and these are presented in Figure 4. Of the 41 variable sites, 8 occur in the 130 bp of noncoding sequence, 32 are present at silent sites of 252 codons, and 1 is a replacement substitution. The one replacement substitution is a threonine-to-isoleucine replacement on the neo-Y chromosome of G96.40. The variable sites are distributed very evenly among the chromosomes (Figure 4), indicating very little haplotype structure in the samples and suggesting a historically high level of recombination in this region. There are 50 different haplotypes represented by the 59 sequences, and six instances where identical sequences were obtained from two different chromosomes: D.am.6X/G96.12X, D.am.0Y/LP97.02, G96.14Y/G96.30Y, D.am.0X/G96.06Y, G96.21Y/LP97.08, and G96.03Y/G96.48Y. In another case, four different chromosomes, D.am.3X, D.am.7X, G96.13X, and G96.45X, share the same sequence. A total of 21 haplotypes are present on 25 neo-X chromosomes, 22 haplotypes are present on 24 neo-Y chromosomes, and 10 haplotypes on 10 autosomes of D. a. texana. Haplotype structure in the samples of neo-X and neo-Y chromosomes of D. a. americana and autosomes of D. a. texana is, therefore, very similar.
Another case where identical sequences were obtained involves the neo-X and neo-Y chromosomes of G96.31. This is the only line (out of 18) from the G96 sample where the F1 female and male share the same sequence. Under the experimental methods that were followed to obtain the sequences, identical sequences can be obtained from F1 males and females if the X/4 fusion is not present in the D. a. americana male used in the original cross with D. virilis. Given the low haplotype structure observed in the samples, it is improbable that a neo-X and neo-Y from the same strain would share the same sequence, so it seems likely that the X/4 fusion was not present in the parental male of G96.31 used for this cross. Cytological examination of the G96.31 line confirmed that the X/4 fusion is polymorphic in this line; therefore, these sequences were excluded from the analyses. The line was established from a wild-caught pair, and apparently one of the three X chromosomes present in the two individuals used to establish the line was not fused to the fourth chromosome. Our sequence and cytological analyses indicate that the other 22 lines of D. a. americana are fixed for the X/4 fusion chromosome. If the unfused X chromosome in G96.31 represents a low-level polymorphism, the frequency of the unfused X chromosomes in the population at Gary, Indiana is apparently <1.5%. Cytological analyses confirmed the absence of the X/4 fusion chromosome in the LP97 sample of D. a. texana.
On the basis of the DNA sequences that were obtained, the different samples are very similar with respect to the variable nucleotide sites present in each. No fixed differences were found between any of the samples, and only two nucleotide site variants were identified that were present in more than one sequence but limited to a single sample (positions 196 and 451 in Figure 4). The divergence between samples was quantified using d, a measure of the mean number of net nucleotide differences between a pair of populations (Nei 1987). As can be seen in Table 1, comparisons between samples of D. a. americana generally yield negative values for d, which indicates there is a greater average number of differences within samples than between samples. The comparisons between the samples of D. a. americana and D. a. texana have positive values for d, but these are very small. Given that the variance of d is large when its value is small (Takahata and Nei 1985), inferences regarding the relationships among these samples are unreliable. What is clear from these comparisons, however, is that there is no quantitative divergence between the neo-X and neo-Y chromosomes of D. a. americana. Furthermore, there is no divergence between either of these chromosomes and the homologous locus in D. a. texana.
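The net divergence d can be made concrete with a short sketch. It assumes the usual form of the net number of differences, d = dxy - (dx + dy)/2, where dxy is the mean number of differences between sequences from the two samples and dx and dy are the mean numbers of differences within each sample. The function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mean_within(seqs):
    """Mean number of pairwise differences within one sample."""
    pairs = list(combinations(seqs, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(sample_x, sample_y):
    """Mean number of differences between sequences of two samples."""
    pairs = list(product(sample_x, sample_y))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def net_divergence(sample_x, sample_y):
    """Net number of differences: between-sample mean minus the
    average of the two within-sample means."""
    within = (mean_within(sample_x) + mean_within(sample_y)) / 2.0
    return mean_between(sample_x, sample_y) - within


# Toy samples, invented for illustration only (not data from this study).
toy_a = ["AAAA", "AAAT"]
toy_b = ["TTTT", "TTTA"]
toy_d_ab = net_divergence(toy_a, toy_b)
toy_d_aa = net_divergence(toy_a, toy_a)
```

When two samples are drawn from the same pool of sequences, the estimate can be negative, because the within-sample averages may exceed the between-sample average, which is the pattern reported for the comparisons among samples of D. a. americana.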
Table 1. Average and net number of pairwise sequence differences at Adh
Figure 4. Nucleotide polymorphisms in the Adh gene region of D. a. americana and D. a. texana. Positions of the variable nucleotide sites are presented relative to the sequenced region. Changes are indicated as noncoding (I), synonymous (S), or replacement (R). The ancestral state at each variable nucleotide position is based on sequences from D. virilis and D. lummei (Nurminsky et al. 1996). Complete sequences have been deposited in GenBank under accession nos. AF136650–AF136708.
Sequence diversity per nucleotide site within each sample was estimated by the standard measures θ, based on the number of segregating sites, and Π, based on average pairwise differences among sequences (Tajima 1993). The laboratory strains represent individual isolates from throughout the range of D. a. americana and provide a standard for estimating the total variability within the subspecies, whereas the G96 sample was obtained from multiple flies collected at a single locality. As is shown in Table 2, measures of variability for both the neo-X and neo-Y are relatively consistent between the established laboratory strains of D. a. americana and the recently collected G96 sample. The estimates of Π, however, are in closer agreement than those for θ. For example, Π is 0.0056 and 0.0059 for the G96 and Lab samples of the neo-Y, whereas θ is 0.0071 and 0.0060, respectively (Table 2). The comparison between the Lab and G96 samples reveals that nucleotide diversity in the population at Gary, Indiana is similar to that present in the subspecies as a whole, indicating there have been no recent bottlenecks that have influenced nucleotide diversity in the population at Gary.
Table 2. Nucleotide diversity at the Adh locus in D. a. americana and D. a. texana
There is reason to suspect that sequence variability at Adh on the neo-X chromosome of D. a. americana may have been recently affected by selection. The X/4 fusion was presumably derived through a single mutational event. If some form of directional selection was responsible for the increase in frequency of this chromosome, sequence variation at Adh may have been affected. Because this locus is located relatively close to the centromere on the X/4 fusion chromosome, current variation at this locus would reflect the frequency of recombination between this locus and the centromere, and the time required for fixation of the X/4 fusion (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Stephan et al. 1992). Both the Tajima (1989) and Fu and Li (1993) statistical tests were used to examine the pattern of sequence variation in the G96-X sample for significant departure from neutrality. Although θ was greater than Π for the G96-X sample, yielding a Tajima's D of –0.88 and a Fu and Li's D of –0.59, both test statistics are well within the limits of the neutral model (the α = 0.05 value for Tajima's D is –1.80 and for Fu and Li's D is –1.89). Also, selection on the X/4 fusion would be expected to cause a reduction in sequence variation at Adh, but the observed variation was very high in the G96-X and Lab-X samples, ∼14% greater than the observed diversity at Adh on the autosomes of D. a. texana (Table 2). Nucleotide diversity at Adh on the neo-X chromosome is also relatively consistent with the levels of silent nucleotide variability observed at other loci in the genome of D. a. americana and D. a. texana (Hilton and Hey 1996, 1997; B. F. McAllister, unpublished data).
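The sign of Tajima's D follows directly from the comparison of θ and Π noted above. The sketch below implements the standard Tajima (1989) formula from the sample size n, the number of segregating sites S, and the mean number of pairwise differences k; the function name and the input values are illustrative only and are not the values for the G96-X sample.

```python
import math


def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S, mean pairwise differences k."""
    if n < 2 or S < 1:
        raise ValueError("need at least two sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two estimates of theta (per sequence).
    # Denominator: estimated standard deviation of that difference.
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Illustrative inputs (not from this study): an excess of rare variants makes
# the segregating-sites estimate exceed k, so D is negative.
d_example = tajimas_d(n=10, S=16, k=3.888889)
```

When k is smaller than S/a1, as happens when θ exceeds Π, the numerator and therefore D are negative; significance is then judged against the critical values quoted in the text.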
Nucleotide variability on the neo-Y of D. a. americana would not have been influenced directly by the fixation of the X/4 fusion chromosome. The population of neo-Y chromosomes currently present in D. a. americana represents a subpopulation of the freely segregating autosomes that was present before the fusion of this element with the X chromosome, but these chromosomes have now been “captured” by the X/4 fusion. However, the selective sweep model of Y-chromosome degeneration (Rice 1987) predicts that the pattern of sequence diversity at Adh on the neo-Y chromosome would be distorted by the bottleneck in population size caused by a selective sweep (Braverman et al. 1995; Simonsen et al. 1995). Negative values for the D statistics were generally observed for the two samples of neo-Y chromosomes (Table 2). For the G96-Y sample, Tajima's D was –0.83 and Fu and Li's D was –0.30; both are consistent with neutrality. On the basis of these statistical tests, there is no evidence for any selective sweeps having occurred on the neo-Y chromosome of D. a. americana.
Although the G96-X and G96-Y sequences were obtained from the same sample and primarily from the same strains of flies, there is a striking difference in the level of nucleotide variability present in these two chromosomal populations. Nucleotide diversity on the neo-Y is ∼65% (with a 95% CI of ∼±27%) of that observed on the neo-X, regardless of whether the comparison is based on Π or θ (Table 2). The average pairwise difference among the neo-X sequences from the G96 sample is 7.58 bp for the entire 884-bp region, whereas the neo-Y sequences from the G96 sample have an average pairwise difference of 4.95 bp. The sequences from these two G96 samples were used in a resampling scheme to determine if the observed 2.64-bp difference in nucleotide diversity between the neo-Y and neo-X chromosomes is statistically significant (see Materials and Methods). We investigated the probability of observing a greater difference in the amount of nucleotide diversity than is observed for the G96-X and G96-Y sequences by repeatedly reconstructing and comparing two random samples of these sequences. Two out of 1000 replicates were obtained with a difference between the random samples that is greater than or equal to the observed 2.64-bp difference in mean pairwise diversity between the G96-X and G96-Y samples. This result provides strong statistical support (α ≤ 0.002) for rejecting the null hypothesis that these two samples of alleles were obtained from a single homogeneous population, thus indicating that average nucleotide diversity on the neo-Y is significantly lower than that on the neo-X.
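A resampling test of this general kind can be sketched as follows. The details are assumptions rather than the procedure actually used: the sketch pools the two samples, reshuffles the pool without replacement into two groups of the original sizes, and counts replicates whose absolute difference in mean pairwise diversity is at least as large as the observed one. The function names, the seed, and the use of the absolute difference are all hypothetical choices.

```python
import random
from itertools import combinations


def mean_pairwise(seqs):
    """Mean number of pairwise differences among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def resampling_p_value(sample_a, sample_b, replicates=1000, seed=0):
    """Fraction of random repartitions with a diversity gap >= the observed gap."""
    rng = random.Random(seed)
    observed = abs(mean_pairwise(sample_a) - mean_pairwise(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(pooled)  # random reassignment of sequences to the two groups
        gap = abs(mean_pairwise(pooled[:n_a]) - mean_pairwise(pooled[n_a:]))
        if gap >= observed:
            hits += 1
    return hits / replicates
```

With 1000 replicates, two replicates meeting the criterion would give a proportion of 0.002, matching the level of support reported in the text.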
Results of coalescent simulations: As described in Materials and Methods, coalescent simulations were performed to investigate the following questions: (1) the value of the time T since the origin of the neo-X and neo-Y chromosomes and (2) the effective population size of the neo-Y, which is inversely proportional to ty. There are several features of the data that need to be reproduced with reasonable probability by values of these parameters: (i) the relatively low proportion of neo-Y chromosome variants that are unique to this subpopulation; (ii) the relatively high proportion of unique neo-X chromosome variants; (iii) the high proportion of singletons among the unique neo-Y chromosome variants; (iv) the low diversity on the neo-Y chromosome compared with the neo-X; and (v) the lack of fixed differences between the neo-Y and neo-X chromosomes.
To ask these questions, the simulation program determined the following statistics, conditioning on assumed values of the scaled mutation rates and recombination rates described in Materials and Methods: (i) the proportion (Puy) of runs in which the proportion of neo-Y chromosome variants unique to the neo-Y chromosome was equal to or less than the proportion observed; (ii) the proportion (Pux) of runs in which the proportion of neo-X chromosome variants unique to the neo-X chromosome was equal to or greater than the proportion observed; (iii) the proportion (Ps) of runs in which the proportion of singleton variants among variants unique to the neo-Y chromosome was equal to or greater than that observed; (iv) the proportion (Py) of runs in which the fraction of all segregating sites represented by neo-Y chromosomal variants was equal to or less than that observed; and (v) the proportion (Pf) of runs in which the number of fixed differences between the neo-X and neo-Y was equal to or less than the number observed. In addition, Pp, the proportion of runs in which “premature coalescence” occurred (see Materials and Methods and Figure 3b), was recorded. Values of Pf and Pp are not shown in the tables, because Pf is always close to one and Pp is always zero or small, unless recombination is zero and T is large.
As can be seen from Table 3, Puy and Ps tend to decline with increasing T when Ry is zero, and Py declines when ty is larger. Recombination is expected to reduce Pux and Py and to increase Ps, as was observed in the simulation results. An overall assessment of the extent to which a given parameter set is compatible with the observed variants segregating at silent sites in the samples of neo-Y and neo-X alleles can be obtained by calculating the proportion (Pc) of runs in which the criteria used to establish the values of Puy, Pux, Ps, and Py are all met, and when premature coalescence did not occur. If Pc ≤ 0.05, we may consider the parameter set in question to be incompatible with the data at the 5% probability level. While this criterion is ad hoc, it seems to capture most features of the data that are relevant to the parameters of interest and is computationally straightforward. We chose to use the number of segregating sites as the basis for our test statistics, rather than pairwise difference measures, because these have superior statistical properties in the context of subdivided populations (Wakeley 1997).
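The bookkeeping behind Puy, Pux, Ps, Py, and the joint criterion Pc can be sketched as follows. This is only an illustration of how the proportions are tallied from per-run summary statistics; it does not simulate the coalescent itself. The `RunStats` record, its field names, and the example values are hypothetical, apart from the observed singleton proportion (two of three) and the observed fraction 0.409, which are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    frac_unique_y: float     # proportion of neo-Y variants unique to the neo-Y
    frac_unique_x: float     # proportion of neo-X variants unique to the neo-X
    frac_singleton_y: float  # proportion of singletons among unique neo-Y variants
    frac_sites_y: float      # fraction of all segregating sites found on the neo-Y
    premature: bool = False  # whether "premature coalescence" occurred in the run


def acceptance_proportions(runs, observed):
    """Tally the per-criterion proportions and the joint proportion Pc."""
    n = len(runs)
    uy = [r.frac_unique_y <= observed.frac_unique_y for r in runs]
    ux = [r.frac_unique_x >= observed.frac_unique_x for r in runs]
    s = [r.frac_singleton_y >= observed.frac_singleton_y for r in runs]
    y = [r.frac_sites_y <= observed.frac_sites_y for r in runs]
    # Pc requires all four criteria to hold and no premature coalescence.
    joint = [a and b and c and d and not r.premature
             for a, b, c, d, r in zip(uy, ux, s, y, runs)]
    return {"Puy": sum(uy) / n, "Pux": sum(ux) / n, "Ps": sum(s) / n,
            "Py": sum(y) / n, "Pc": sum(joint) / n}


# Hypothetical observed values for the first two fields; the last two follow the text.
observed = RunStats(0.2, 0.5, 2 / 3, 0.409)
runs = [
    RunStats(0.1, 0.6, 0.7, 0.3),
    RunStats(0.3, 0.6, 0.7, 0.3),
    RunStats(0.1, 0.6, 0.7, 0.3, premature=True),
    RunStats(0.1, 0.4, 0.7, 0.5),
]
result = acceptance_proportions(runs, observed)
compatible = result["Pc"] > 0.05  # parameter set rejected when Pc <= 0.05
```

Because Pc is a joint proportion, it can be far smaller than any of the individual proportions, which is why a parameter set may pass each criterion separately yet still be rejected.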
Table 3. Results of coalescent simulations
The top section of Table 3 shows the results for the case when no recombination between the neo-X and neo-Y is allowed, corresponding to a state of complete fixation of the X/4 fusion and no introgression from D. a. texana. Most of the runs were done with an M value of 3.5; this corresponds to a θ value of 7 for the Adh silent sites, which is in the middle range of the values estimated for the sum over all silent sites for the samples of americana neo-X, neo-Y, and texana alleles. Sample sizes of 25 and 24 were used for the neo-X and neo-Y alleles, respectively, corresponding to the pooled data for the americana sequences. A coalescent time of 0.75 for the X chromosome relative to the autosomes was used, corresponding to the case of no sexual selection (Charlesworth 1994; Caballero 1995; Nagylaki 1995). One thousand runs were performed for each parameter set. The simulation results were compared with the observed number of three segregating sites unique to the neo-Y, two of which were singletons, and a value of 0.409 for the proportion of all variants that are found on the neo-Y.
With no recombination, it is not hard to account for the proportion of variants that are unique to the neo-Y chromosome sample, except when T is fairly large (≥0.2), unless ty is very small. The frequency of neo-Y singletons is also sensitive both to T and the effective population size of the neo-Y chromosome subpopulation; the region of parameter space for which T ≥ 0.30 and ty ≥ 0.10 is ruled out by this statistic (Ps) alone. The proportion of neo-X variants that are unique to the neo-X becomes significantly too large for large T or small ty. The observed value of the fraction of all variants that are found on the neo-Y chromosome is compatible with a wide range of values of T and ty, although Py becomes >0.95 for small ty and large T, which implies that the observed value is significantly too large for these parameters. The observed number of fixed differences is also compatible with the data for the range of parameters shown, although Pf increases as T decreases. Overall, only T ≤ 0.10 and ty = 0.1 are compatible with the data. This remains true even if widely different M values are used (data not shown). One possibility is that strong sexual selection may cause the coalescent time for X-linked genes relative to autosomal genes to be much higher than the value of 0.75 assumed in Table 3. This was tested by runs in which tx was set to 1, but this made only a minor difference to the results.
Given the evidence that some X chromosomes in D. a. americana are unfused, it is important to investigate the consequences of recombinational exchange between the centromere and Adh. Such recombination can occur in females that lack the fusion, which may result from incomplete fixation of the fusion in D. a. americana or introgression from D. a. texana. This would result in a transfer of alleles between the neo-X (fused fourth chromosomes) and neo-Y (unfused fourth chromosomes) subpopulations, at a rate determined jointly by the frequency of crossing over between the centromere and Adh in females and the frequency of unfused X chromosomes.
The results of simulations that include recombination are shown in the lower part of Table 3. The overall effect of recombination is to increase the frequency of runs that meet the test criteria, especially for large T. If an M value of 3.5 is assumed, the only cases in which the probability of meeting the criteria for acceptance exceeds 0.05 are those with ty < 0.25. Values of M that are much smaller than 3.5 deviate significantly from the observed number of segregating sites in the neo-X chromosome sample (data not shown). Similarly, an M value of 5 gives a worse fit than M = 3.5. Somewhat counterintuitively, the best fit is obtained when Ry = 10 for a wide range of T values. Substantially higher Ry values lead to a lack of overall agreement with the data. No very precise conclusion concerning the likely value of T can thus be drawn from these results. A very high recombination rate (≫100), coupled with a large T (>1) is, however, inconsistent with the observed fraction of all segregating sites that are on the neo-Y chromosome, given the significantly lower diversity for the neo-Y chromosome than for the neo-X (see above). Overall, the results suggest the occurrence of a low frequency of exchange of Adh alleles between the neo-X and neo-Y chromosomes, with a significant reduction in the effective population size of the neo-Y chromosome to well below 25% of the autosomal value.
DISCUSSION
Sequence data from the Adh gene of D. a. americana reveal two important features of the neo-sex chromosomes: effectively no sequence divergence is present between the neo-X and neo-Y chromosomes, and sequence diversity is significantly lower on the neo-Y chromosome relative to the neo-X. These findings have implications for the progression of sex chromosome differentiation as represented by the secondary sex chromosomes of D. a. americana. The neo-sex chromosomes of D. a. americana are currently in an incipient stage of differentiation, given the observed lack of divergence between the neo-X and neo-Y. Substantial reduction in nucleotide diversity on the neo-Y relative to the neo-X suggests that Ne for the neo-Y chromosome has been reduced well below the expected 25% of the autosomal value. We consider both of these points in turn.
Lack of divergence between neo-X and neo-Y alleles: This is the first study to provide direct evidence that the neo-sex chromosomes of D. a. americana share a high degree of sequence similarity. Because Adh is located on both the neo-X and neo-Y chromosomes, divergence at this locus provides a direct measure of the degree of separation between these two chromosomes, but even a low frequency of recombination between the neo-X and neo-Y chromosomes would retard this sequence divergence (Hudson 1990; Wakeley 1996; Nordborg 1997). The simulation results (Table 3) are consistent with the presence of rare recombination events: Ry values of ∼10 provided a somewhat better fit to the observed sequence data than zero recombination, although the latter cannot be ruled out if the fusion is of recent origin (T ≤ 0.1) and ty is ≤0.1. However, a high rate of exchange (Ry ≫ 10) seems to be ruled out.
The value of Ry is of the order of the product of the effective population size and the rate of exchange between the centromere and the neo-Y copy of Adh. Because silent site diversities at Adh and other loci (Hilton and Hey 1996, 1997; B. F. McAllister, unpublished data) suggest an autosomal effective population size of >1,000,000 in D. a. americana, the frequency of exchange in this chromosomal interval must be of the order of 1 in 100,000 or less. This is an average recombination rate for all potential sources of exchange between fourth chromosomes that are fused to the X and unfused fourth chromosomes. The simulation results therefore suggest the possibility of a cumulative effect of rare recombination events between neo-X and neo-Y chromosomes.
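The order-of-magnitude argument above can be written out as a short calculation. It assumes only that Ry is roughly the product of the effective population size and the per-generation exchange rate, ignoring any constant scaling factor; the function name is hypothetical.

```python
def exchange_rate_order(scaled_rate, effective_size):
    """Order-of-magnitude exchange rate implied by a scaled rate Ry ~ Ne * r."""
    return scaled_rate / effective_size


# Values from the text: best-fitting Ry of about 10 and an autosomal Ne above 1,000,000.
RY_BEST_FIT = 10.0
NE_LOWER_BOUND = 1_000_000

# Because Ne is a lower bound, the result is an upper bound on the exchange rate.
rate_upper_bound = exchange_rate_order(RY_BEST_FIT, NE_LOWER_BOUND)
```

This gives a rate of about 1 in 100,000, and a larger effective population size would imply a proportionally smaller rate, hence the qualifier "or less" in the text.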
There are several possible sources of recombination events. Measured rates of recombination in male D. virilis are very low (Kikkawa 1935), but rare recombination events in males may be a source of exchange between neo-X and neo-Y chromosomes if the fusion has gone to complete fixation. The observed polymorphism of the X/4 fusion provides another opportunity for rare recombination events. Previous studies of D. a. americana indicate that the X/4 fusion chromosome is fixed throughout its range, with the exception of the relatively narrow hybrid zone (Hsu 1952; Throckmorton 1982). As already noted, a single line out of 22 collected from the population at Gary, Indiana is polymorphic for the X/4 fusion, suggesting that <1.5% of the X chromosomes in this population are not fused with chromosome 4. Such a low frequency polymorphism of the X/4 fusion provides an opportunity for neo-Y chromosomes to undergo infrequent recombination when present in females heterozygous for the fusion. Because Adh is close to the centromere at cytological band 49B (Gubenko and Evgen'ev 1984; Nurminsky et al. 1996; Charlesworth et al. 1997a), and recombination is suppressed near translocation breakpoints (Ashburner 1989), it may experience only a low frequency of exchange in such females, accounting for the low value of Ry that we have estimated. Loci that are located more distally on chromosome 4 should exhibit greater effects of recombination, because there would be a greater opportunity for crossing over when an unfused chromosome 4 was present in a female where exchange could occur. Sequence data from two additional regions of chromosome 4 are consistent with this prediction (B. F. McAllister, unpublished data).
While increasing the rate of recombination causes an increase in similarity for a given T, even with an extended amount of time, it does not do so in such a way that all criteria can be satisfied for an arbitrarily large T. This is understandable from the fact that recombination between the two chromosome classes behaves like a conservative migration process connecting two partially isolated subpopulations; use of equations (A1) of Nordborg (1997) shows that the difference in the equilibrium level of nucleotide diversities between two subpopulations connected by conservative migration is small in comparison with the divergence between the subpopulations, even if they differ substantially in size. Inclusion of recurrent gene flow from texana into americana, another possible source of unfused fourth chromosomes, would behave in a similar way to recombination. It is therefore essentially impossible to produce both a lack of divergence between the two chromosome classes and a reduction in diversity within the neo-Y chromosomes, with very large values of both T and recombination rates.
The only known difference between D. a. americana and D. a. texana is the X/4 chromosomal fusion, suggesting a relatively recent origin of this chromosomal arrangement, especially as it is also absent from close relatives such as D. novamexicana and D. virilis. The maintenance of a steep cline between the fused arrangement of D. a. americana and the unfused arrangement of D. a. texana (Patterson and Stone 1952; Throckmorton 1982) indicates the operation of selection across the hybrid zone against immigrant karyotypes. The probable close linkage of the Adh gene to the centromere of chromosome 4 suggests that sequence divergence at Adh could have proceeded between the two subspecies even in the face of gene flow, due to the elimination of immigrant genotypes, if sufficient time had elapsed since the origin of the fusion (Barton 1986; Barton and Bengtsson 1986; Charlesworth et al. 1997b). The lack of such divergence suggests that T may in fact be of the order of the coalescence time or less. Absence of sequence divergence between D. a. americana and D. a. texana has also been observed for the period gene (Hilton and Hey 1996) on the distal end of the X chromosome, the oskar gene on chromosome 2 (Hilton and Hey 1997), and two separate genes, seven in absentia and transformer, on the distal end of chromosome 3 (B. F. McAllister, unpublished data). This consistency in the lack of sequence differentiation documented for Adh and the other loci suggests that the recent separation between the two taxa has had a greater impact than current gene flow on the distribution of nucleotide site variants between D. a. americana and D. a. texana.
Reduction of diversity on the neo-Y chromosome: The finding that sequence diversity is reduced 34% on the neo-Y chromosome relative to the neo-X is very interesting in light of the evidence for possible ongoing recombination between the neo-Y and neo-X chromosomes. Established sex chromosomes should have a Y chromosome with ∼33% of the sequence diversity present on the X, unless strong selection greatly increases the variance in male reproductive success relative to that in females (Charlesworth 1994; Caballero 1995; Nagylaki 1995). The coalescent simulations suggest that the best fit to the data is provided by a model in which Ne for the neo-Y is reduced to 10% or less of the autosomal value. The facts that the neo-Y chromosomes have few unique variants and that there is no evidence for a drastic loss of variability or distortion of the allele frequency spectrum militate against any selective sweeps having occurred in the neighborhood of the Adh locus as postulated by Rice (1987) in his model for the degeneration of Y chromosomes. The reduction in Ne for the neo-Y chromosome must therefore have been caused by either strong sexual selection, background selection (Charlesworth et al. 1993), or temporally fluctuating selection (Gillespie 1994, 1997), because these would not necessarily produce such effects.
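The ∼33% expectation can be made explicit with a short calculation. It assumes that neutral diversity is proportional to effective population size, with equal mutation rates on the two chromosomes, and uses the relative effective sizes of 0.75 for the X and 0.25 for the Y that apply in the absence of sexual selection; the observed ratio uses the G96 mean pairwise differences reported in the Results. Variable names are illustrative.

```python
from fractions import Fraction

# Effective sizes relative to an autosome, assuming no sexual selection.
REL_NE_X = Fraction(3, 4)
REL_NE_Y = Fraction(1, 4)

# Expected Y/X diversity ratio for fully isolated, established sex chromosomes.
expected_ratio = REL_NE_Y / REL_NE_X

# Observed ratio from the G96 samples: 4.95 bp (neo-Y) versus 7.58 bp (neo-X).
observed_ratio = 4.95 / 7.58
```

The observed ratio of roughly 0.65 is well above the expected one-third. This is the contrast drawn in the text: the reduction on the neo-Y is substantial, yet smaller than that expected for fully isolated sex chromosomes, consistent with occasional exchange between the neo-X and neo-Y.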
This study of D. a. americana provides an initial assessment of the early processes involved in Y-chromosome degeneration. The possibility of a low frequency of recombination between the neo-Y and neo-X chromosomes is consistent with previous analyses indicating that the neo-Y chromosome is still capable of functioning in the homozygous condition (Charlesworth et al. 1997a) and that the neo-X chromosome does not show signs of dosage compensation (Bone and Kuroda 1996; Marin et al. 1996). The loss of sequence diversity on the neo-Y is currently the only apparent consequence of partial genetic isolation between these chromosomes, indicating a strong reduction in effective population size of the neo-Y chromosomes as postulated in standard models of Y-chromosome degeneration (Charlesworth 1996; Rice 1996). Additional sequence data are needed to confirm these conclusions.
Acknowledgments
The authors thank J. Hnilicka and K. Poneta for technical assistance, and K. Dritz and T. McAllister for assistance in locating suitable collecting localities. We also thank D. Charlesworth, J. Feder, D. Guttman, and S. Yi for helpful discussions of this project and/or commenting on previous versions of this manuscript. This work was supported by a National Science Foundation/Alfred P. Sloan Postdoctoral Research Fellowship in Molecular Evolution to B. McAllister, and by U.S. Public Health Service grant GM-50355 and a grant from the Royal Society to B. Charlesworth.
Footnotes
- Communicating editor: W. F. Eanes
- Received November 3, 1998.
- Accepted April 28, 1999.
- Copyright © 1999 by the Genetics Society of America