Abstract
The origins and divergence of Drosophila simulans and close relatives D. mauritiana and D. sechellia were examined using the patterns of DNA sequence variation found within and between species at 14 different genes. D. sechellia consistently revealed low levels of polymorphism, and genes from D. sechellia have accumulated mutations at a rate that is ∼50% higher than the same genes from D. simulans. At synonymous sites, D. sechellia has experienced a significant excess of unpreferred codon substitutions. Together these observations suggest that D. sechellia has had a reduced effective population size for some time, and that it is accumulating slightly deleterious mutations as a result. D. simulans and D. mauritiana are both highly polymorphic and the two species share many polymorphisms, probably since the time of common ancestry. A simple isolation speciation model, with zero gene flow following incipient species separation, was fitted to both the simulans/mauritiana divergence and the simulans/sechellia divergence. In both cases the model fit the data quite well, and the analyses revealed little evidence of gene flow between the species. The exception is one gene copy at one locus in D. sechellia, which closely resembled other D. simulans sequences. The overall picture is of two allopatric speciation events that occurred quite near one another in time.
SEVERAL hundred thousand years ago one species of Drosophila gave rise to three that today we call Drosophila simulans, D. mauritiana, and D. sechellia. Today the three species are morphologically distinct (primarily on the basis of male genitalia), partially intersterile (male hybrids are sterile, female hybrids fertile), and largely allopatric (D. simulans is a nearly cosmopolitan human commensal, while the other two are island endemics). The combination of clear phenotypic distinction, partial infertility, and recent coancestry (not to mention their evolutionary proximity to D. melanogaster) has made this little species complex our most thoroughly studied speciation model system (Coyne and Kreitman 1986; Coyne 1992; Wu and Palopoli 1994; Coyne and Charlesworth 1997).
Historically there have been two main approaches to the genetic study of species divergence. The classical approach is to genetically map traits that are thought to be important in speciation. Such traits tend to fall into one of three categories.
The most straightforward are those for which the species exhibit characteristic differences and that probably represent species-specific adaptations. Major lifestyle or life history adaptations can, in principle, play a large direct role in speciation, particularly if those changes arise first as polymorphisms within the ancestral species (Bush 1969; Rice and Hostert 1993). For example, the preferred host of D. sechellia, Morinda citrifolia, is toxic to the other species of the D. melanogaster complex, and the genes that confer resistance can be mapped in the species hybrids (Jones 1998).
A second class of “speciation” traits consists of features of mating pairs of organisms. In recent years, a host of interesting Drosophila mating phenotypes have come under focus, including species-specific mate detection pheromones (Coyne et al. 1994; Coyne and Charlesworth 1997), sperm competition (Snook et al. 1994; Price et al. 1999), and female mediation of sperm competition (Price 1997).
The third class of speciation traits consists of those that appear almost exclusively in species hybrids. Examples of these are hybrid inviability and hybrid sterility. In general such traits can be genetically studied only in species where postzygotic isolation is incomplete, which is necessary for the production of F2s or backcrosses.
A different genetic approach to understanding speciation is to study the history of species divergence as it is revealed in the polymorphism pattern at randomly selected genes. In recent years, comparative DNA sequence data (especially mitochondrial) have been frequently used to address basic questions about the relatedness of close sister taxa and populations (Avise 1989). Conceptually, the idea is a direct extension of basic population genetic questions (i.e., questions about population subdivision, gene flow, and natural selection) to the species level. However, the use of DNA sequence data also permits the use of genealogical coalescent models, which incorporate classical population genetic parameters (e.g., effective population size and migration rate) within a gene tree framework (Hudson 1990), as well as the entire suite of tools used by molecular phylogeneticists. These methods become even more informative when data come from multiple loci and thus can be used to distinguish forces that act on all genes from those, like natural selection, that affect individual loci (Hudson et al. 1987; Hey 1994). For the sake of a useful label, we refer to this general approach in the remainder of the article as divergence population genetics (DPG).
The two approaches (the mapping of speciation phenotypes and DPG) have historically been directed at very different questions. The gene mapping studies address the genetic architecture of phenotypes that may have been important in speciation; however, these maps bear no direct connection to the demographic factors that have caused species, and they may not have a direct connection to the selective factors that have actually caused species. All of the speciation phenotypes listed above can arise during or following speciation that is primarily caused by selection on other phenotypes. Thus, for example, hybrid sterility and inviability may arise as epistatic by-products of independent adaptations in the separate incipient species (Dobzhansky 1936; Muller 1940). In contrast to the gene map approach, divergence studies can focus directly on evolutionary forces, particularly those demographic factors that affect all of the genes in the genome.
The two approaches can greatly complement one another, such as when interpretations of the evolution of phenotypic traits are laid upon an understanding of phylogenetic history. For example, recent attempts to demonstrate sympatric speciation and assess its frequency have strongly relied upon accurate branching phylogenies (Schliewen et al. 1994; Barraclough and Vogler 2000; Coyne and Price 2000). However, phylogenetic history, particularly for recent speciation events, may not be well represented by simple branching trees. Most real species probably emerge gradually whereas a branching model entails the assumption of instantaneous splitting. This can be misleading particularly if speciation has been recent or there have been multiple speciation events that overlap in their time course. In such cases we must consider “phylogeny” more broadly as concerning the genesis of phyla, however complex or slow that process might have been. If speciation events have been recent, and if they have been complex and not instantaneous, it may be possible to reveal the complexities using a population genetics approach.
This report brings together the efforts of several investigators interested in the speciation events that have led to our current simulans complex species. To date, DPG studies on the simulans complex have been done for five different nuclear loci (Hey and Kliman 1993; Kliman and Hey 1993a; Hilton et al. 1994). Here we report on patterns of DNA sequence divergence at an additional nine loci. Together these data permit a broad, genome-wide assessment of speciation.
MATERIALS AND METHODS
The data for five loci, per, yp2, z, ase, and ci, have previously been described (Hey and Kliman 1993; Kliman and Hey 1993a; Hilton et al. 1994). DNA sequences were collected from multiple lines of each species of the simulans complex for each of nine additional loci. For some of these genes, new data were aligned with existing, previously reported data for some species. For all loci, at least one sequence from D. melanogaster was also available.
Zw, Adh, and est-6: DNA sequences had previously been reported for D. melanogaster and D. simulans for these genes (Kreitman 1983; Cooke and Oakeshott 1989; McDonald and Kreitman 1991; Eanes et al. 1993, 1996; Karotam et al. 1993). DNA was extracted from single individuals of D. mauritiana and D. sechellia drawn from isofemale lines that had been in the laboratory for >200 generations. PCR on these genomic DNAs was done to generate a 1.3-kb region of Zw, a 0.87-kb region of Adh, and a 1.6-kb region of est-6. The sequenced region of Zw corresponds to sites 148-1460 in a previously reported D. simulans sequence (Eanes et al. 1993). The sequenced region of Adh corresponds to sites 2195-2900 in a previously reported D. simulans sequence (Cohn et al. 1984). The sequenced region of est-6 corresponds to sites 157-1679 of a previously reported D. melanogaster sequence (Cooke and Oakeshott 1989). Primary PCR products were purified from an agarose gel slice using the QIAquick gel extraction kit (QIAGEN, Valencia, CA) and were then used as the template for subsequent amplification of shorter regions for DNA sequencing. These reactions employed one primer carrying an M13 forward tail, while the other primer carried an M13 reverse tail. Subsequent sequencing reactions were done in both directions simultaneously using fluorescently labeled M13 primers on a Li-Cor (Lincoln, NE) automated DNA sequencer. For D. mauritiana, six sequences were used. Four of these (lines 105, 197, 152, and 207) were the same as those used previously for per, yp2, z, ase, and ci (designated in those articles as MA-3, MA-4, MA-5, and MA-6, respectively). For D. sechellia, two sequences were obtained, one from SE-C1 (also called strain 24) and one from SE-P1 (also called strain ss77; Kliman and Hey 1993a).
janus: Two loci, janus-A and janus-B, are overlapping and related by an ancient duplication (Yanicostas et al. 1989). Three lines of D. mauritiana and D. simulans were sequenced, as well as one line each from D. melanogaster and D. sechellia. To isolate just one allele from each isofemale line, single males were crossed with a virgin female from a balanced lethal strain of D. melanogaster [Df(3R)X3F/TM3Sb + P[ry + .RP49].84F] deficient for a region including the janus loci. Following DNA extraction from single F1 individuals, PCR was conducted using the primers from positions 639-658 of GenBank record DMSRYG1 and positions 1736-1717 of GenBank record DRO-JAN. DNA sequencing was conducted with internal primers spaced approximately every 250 bp, using an Amersham (Piscataway, NJ) sequenase T7 kit and corresponding protocol. The final sequence spanned bases 429-1491 of Yanicostas et al. (1989), including most of the janus-A locus and part of the janus-B locus. Throughout the analyses, these regions were treated as a single locus. D. simulans strains were provided by C. Montchamp Moreau; D. mauritiana strains and the D. sechellia strain were provided by F. Lemeunier.
hb, mt:ND5, Sxl, and w: Portions of each of these loci were sequenced from single flies drawn from inbred lines of each species that were collected or obtained from others. For D. mauritiana, 1 line was obtained from H. Robertson and 9 from O. Kitagawa. For D. sechellia, 8 isofemale lines were collected from the Seychelles in 1985. These 8 lines were sequenced for each of these genes. In addition, 6 lines of D. sechellia were collected from the Seychelles in 1989. These lines were sequenced only for the Sxl locus. For D. simulans, 3 lines (from France, Tunisia, and Kenya) were obtained from the Drosophila species stock center, and 13 lines were collected from diverse locations, including Florida City, FL; Beltsville, MD; Murakata City, Japan; Palmer Island, Australia; Ottawa, Canada; Cairns, Australia; Capetown, South Africa; Brazzaville, Congo; Morven, GA; and Praslin, Seychelles.
DNA sequencing was done using templates generated via PCR on genomic DNA. PCR was done using a kinased primer and was followed by treatment with λ exonuclease to degrade one strand (Higuchi and Ochman 1989). The DNAs were sequenced with the dideoxy method with [35S]dATP label (Sanger et al. 1977).
For hb (hunchback), the sequenced region corresponds to intronic sequence from positions 7769-8052 of D. melanogaster GenBank record U17742. For mt:ND5 (mitochondrial NADH-ubiquinone oxidoreductase chain 5), the sequenced region corresponds to positions 7256-7472 of D. melanogaster GenBank record U37541. For Sxl (Sex Lethal), the sequenced region corresponds to intronic sequence from positions 241722-241977 of D. melanogaster GenBank record AE003439. For w (white), the sequenced region corresponds to intronic sequence from positions 12260-12478 of D. melanogaster GenBank record X02974.
In(2L)t: In(2L)t refers to the D. simulans/D. mauritiana/D. sechellia homologue of the proximal breakpoint site of the In(2L)t inversion that segregates in natural populations of D. melanogaster (Andolfatto et al. 1999). D. simulans isofemale lines were collected from Arena Farms, Maryland. The lines for D. sechellia include the original “Robertson” isofemale line collected and described by Tsacas and Baechli (1981) and provided by Hugh Robertson and two lines collected from Cousin Island, Seychelles, in 1985. The D. mauritiana lines were provided by Chung-I Wu. To obtain alleles from D. simulans (Arena Farms, Maryland) and D. mauritiana populations, multiple males from each isofemale line were crossed to virgin female D. melanogaster In(2L)t homozygotes. The resulting hybrid progeny (all female) were heterozygous for In(2L)t. This allowed the recovery of individual (one per isofemale line) D. simulans and D. mauritiana alleles by PCR with a standard arrangement-specific primer pair (Andolfatto et al. 1999). For D. sechellia, genomic DNA was prepared, one individual per isofemale line. Due to the unexpectedly high degree of similarity between one D. simulans (ar07) and one D. sechellia allele (from the Robertson line), male genitalia were checked for both lines and both alleles were resampled and sequenced. Polyethylene glycol-precipitated PCR products were directly sequenced on both strands using a Rhodamine Terminator cycle sequencing kit (Applied Biosystems, Foster City, CA) and run on an ABI377XL automated sequencer.
GenBank accession numbers: Accession numbers are as follows: for Zw, Adh, and est-6, AF284474-AF284497; janus, AF284453-AF284459; hb, AF295808-AF295835; mt:ND5, AF295836-AF295861; Sxl and w, AF295862-AF295921; In(2L)t, AF294398-AF294409 and AF217926-AF21791.
RESULTS
Polymorphism summaries: Sample sizes and basic statistics of the loci studied are listed in Table 1. DNA sequence variation is summarized in Table 2. A simple weighted average of nucleotide diversity per base pair shows D. simulans to be the most variable, followed by D. mauritiana and D. sechellia. For the autosomal loci the weighted average values of
Tests of selective neutrality: To focus on demographic factors associated with the divergence of species, we first addressed whether the data show evidence that natural selection has shaped levels of variation. Table 3 shows the results of contingency table tests in which variable sites are classified both with respect to whether they are polymorphisms within species or fixed differences between species and whether they occurred at synonymous or replacement sites within the protein-coding regions. Under a model in which all mutations are either deleterious or neutral, the expected ratio of synonymous to amino acid replacement variation should be the same for polymorphisms and for fixed differences between species. Three loci (est-6, janus, and Zw) revealed a poor fit to the neutral model (Table 3), and in each case the direction is the same as had previously been reported for Adh (McDonald and Kreitman 1991) and Zw (Eanes et al. 1993) in contrasts involving D. simulans and D. melanogaster. If we assume that the pattern of synonymous site variation is close to that expected for neutral mutations, then the direction of departure for these tests is one in which the number of fixed replacement differences between species is higher than expected. This pattern would result if directional natural selection has caused some amino acid mutations to become fixed within species.
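The contingency table logic described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used for Table 3: the function name `fisher_exact_two_sided`, the choice of Fisher's exact test, and the four counts are all assumptions made here for the example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts for one locus (NOT taken from Table 3).
fixed_syn, fixed_rep = 10, 8   # fixed differences between species
poly_syn, poly_rep = 30, 4     # polymorphisms within species

p_value = fisher_exact_two_sided(fixed_syn, fixed_rep, poly_syn, poly_rep)
# An excess of fixed replacements shows up as a higher replacement:synonymous
# ratio among fixed differences than among polymorphisms.
excess_fixed_replacements = fixed_rep / fixed_syn > poly_rep / poly_syn
print(p_value, excess_fixed_replacements)
```

Under the neutral expectation the two ratios are equal, so a small P value together with `excess_fixed_replacements` being true corresponds to the direction of departure described in the text.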
Table 1. Gene and sample summary statistics
Similar in principle to the McDonald-Kreitman tests in Table 3, the Hudson-Kreitman-Aguadé (HKA) test examines whether the relative levels of observed polymorphism and divergence are consistent across multiple loci. Figure 1 shows the results of HKA tests (Hudson et al. 1987). Rather than rely on the assumption that the test statistic follows a χ² distribution (Hudson 1987), the overall test statistic was compared with a distribution generated from 10,000 coalescent simulations. Figure 1 shows, for each locus, whether or not the observed values of polymorphism and divergence are higher or lower than expected, and it shows the contribution from each data point to the overall test statistic. In each case the overall test statistic indicates a rejection of the neutral model: D. simulans, χ² = 25.31, P = 0.0010; D. mauritiana, χ² = 18.61, P = 0.0390; and D. sechellia, χ² = 52.84, P = 0.0010. For D. simulans, Figure 1 shows how ase and ci make large contributions to the test statistic, as expected from previous reports (Berry et al. 1991; Hilton et al. 1994). These two genes are also the only ones in the study from low recombination portions of the genome (as identified in D. melanogaster; see Table 1) and thus have probably been subject to collateral selective effects via linkage, either genetic hitchhiking (Maynard Smith and Haigh 1974) or background selection (Charlesworth et al. 1993). The effect of this indirect selection, whether via beneficial or deleterious mutations, is to reduce polymorphism levels in regions of low recombination while leaving divergence levels typical of those seen among loci (Begun and Aquadro 1992). In the case of D. mauritiana, the ci and In(2L)t loci contributed a large amount to the test statistic. D. sechellia presents an interesting situation, for it carries very low polymorphism levels at nearly all loci, almost certainly due to a small effective population size (Cariou et al. 1990; Hey and Kliman 1993).
Again, both ase and ci have low polymorphism, but, in this species, neither locus appears unusual, as all loci but one have low polymorphism levels. The exception is In(2L)t, which revealed 23 polymorphisms in D. sechellia. Upon inspection of the three sequences, two were revealed to be very similar to each other (four differences), while the third closely resembled sequences from D. simulans. It is this simulans-like sequence that contributes most of the polymorphisms to the D. sechellia sample for In(2L)t. There are two possible explanations for the observation: limited gene flow in the wild and recent admixture in the laboratory. Gene flow seems reasonable in that D. sechellia and D. simulans are partially interfertile and that both have been collected on the large island of Mahé (Cariou et al. 1990; R’Kha et al. 1991). However, no other loci show a pattern suggestive of recent gene flow. The second explanation, recent mixing in the laboratory, also does not fit the observed pattern at In(2L)t in a simple way, as the D. sechellia line from which this sequence arose has normal viability and normal male genitalia for this species. Thus, neither explanation can be directly supported nor ruled out. Also, despite the evidence from the McDonald-Kreitman tests (Table 3) of excess replacement differences between species at est-6, janus, and Zw, we do not find evidence that these loci have overall levels of polymorphism and divergence that are inconsistent with the neutral model (Figure 1).
Table 2. Polymorphism statistics
Table 3. Contrasting levels of synonymous and replacement variation
The HKA test was repeated with the exclusion of just those loci that showed the strongest departures from expectations. As expected, the value of the overall test statistics dropped markedly, though that for D. sechellia was still significant (D. simulans, χ² = 10.77, P = 0.1308; D. mauritiana, χ² = 13.92, P = 0.1520; D. sechellia, χ² = 21.90, P = 0.0459*). In the case of D. sechellia the still significant departure is primarily due to two loci (mt:ND5 and w) that revealed two polymorphisms where none were expected.
We also considered Tajima’s D statistic (Table 2) of the difference between different estimators of the population mutation rate, θ = 2Gu, where G is the effective number of gene copies and u is the mutation rate (Tajima 1989b). For a diploid species of effective population size N, G = 2N for an autosomal locus; G = 3N/2 for an X-linked locus; and G = N/2 for a sex-limited, effectively haploid locus found on the mitochondria or the Y chromosome. Under a neutral model of constant population size, the expected value of D is very near zero (Tajima 1989b). A negative D results when more than the expected number of polymorphic sites have low frequencies in the sample, a pattern that can be caused either by recent selection that has removed variation or by a recent population size expansion. For D. simulans, the values of D vary considerably and one (hb) is significantly less than zero. However, for D. mauritiana, nine values were negative while only two were positive (two could not be calculated and one was equal to zero), and again one of the values was significantly different from zero (w). To check whether such an overall negative pattern of D values is very unlikely by chance, the average observed value of D was calculated (weighted by locus length) and compared to the distribution of the same quantity generated by computer simulation. The simulations were the same ones used for the HKA tests and included 10,000 independent standard coalescent simulations using estimates of divergence time and θ for each locus that were generated from the observed polymorphism and divergence levels. For each simulation, we noted whether the absolute value of the observation was greater than the absolute value of the simulated value (two-tailed test). For D. mauritiana, the weighted average of D was -0.677 and only 2% (P = 0.020) of the simulations generated a more extreme value. For D. simulans and D. sechellia, the same analysis revealed a weighted value of D that fell near the middle of the simulated distribution (results available upon request). The overall negative pattern of D values from D. mauritiana suggests that recent population demographics have shaped the polymorphism pattern, with the simplest explanation being recent population size expansion (Tajima 1989a).
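As an illustration of the quantities in this paragraph, the sketch below computes G for each locus type as defined in the text and Tajima's D from a sample size, a count of segregating sites, and the mean number of pairwise differences. The D formula is the standard published one (Tajima 1989); the function names and the toy inputs are assumptions made here and are not values from Table 2.

```python
from math import sqrt

def effective_gene_copies(n_eff, locus_type):
    """G in theta = 2*G*u, using the definitions given in the text."""
    factor = {"autosomal": 2.0, "x_linked": 1.5, "haploid": 0.5}
    return factor[locus_type] * n_eff

def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites, and pi mean
    pairwise differences (both totals for the locus, not per-site values)."""
    if seg_sites == 0 or n < 4:
        return float("nan")  # undefined: no variation or zero variance
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)

# Toy inputs (hypothetical): four singleton sites in a sample of five.
# Each singleton contributes (n - 1) / C(n, 2) = 0.4 to pi.
d_singletons = tajimas_d(5, 4, 4 * 0.4)
# Four sites at intermediate frequency (2 vs. 2) in a sample of four.
d_intermediate = tajimas_d(4, 4, 4 * (4 / 6))
print(d_singletons, d_intermediate)
```

An excess of low-frequency variants, as in the first toy case, gives a negative D, which is the pattern the text reports for D. mauritiana.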
Figure 1. Three multilocus HKA tests were done (Hudson et al. 1987), one per species. (+) SimP, (+) SimD, (▵) MauP, (▴) MauD, (○) SecP, (•) SecD. In each case, polymorphism within species (as listed in Table 2) and divergence from a single D. melanogaster sequence were used for the test. Shown are the contributions to the overall χ² test statistic by the polymorphism and divergence observations for each locus. Thus, for example, SimP refers to the standardized departure from expectations for polymorphism within D. simulans and SimD refers to the same quantity for divergence from D. melanogaster. If the observed value was greater than the expected, then the point is placed above the line; otherwise it is placed below the line. In the case of SecP for In(2L)t the value was 38.4, and in the case of SecD for In(2L)t the value was 7.39. These extreme values are represented by points outside the graph. Loci are ordered from left to right in rough accord with their degree of departure from expectations.
Divergence of genes: The three species of the simulans complex are closely related to one another, and much of the history of gene samples drawn from the simulans complex predates the origins of the three species (Hey and Kliman 1993). This recent complicated history precludes any simple analysis in which gene divergence is equated with species divergence (see below). However, we can ask some simple questions about how individual gene copies have diverged. In particular, we can use sequences from D. melanogaster to root the differences between pairs of sequences drawn from the simulans complex and ask whether genes drawn from different species have accumulated mutations at the same rate.
Relative rate tests were conducted for pairs of simulans complex sequences, rooted by a D. melanogaster sequence, using the method of Wu and Li (1985). Because each test involved just a pair of sequences, and there are many such pairs, we have done two types of summaries. In the first place, we examined all of the data for the 14 genes by doing all possible pairwise comparisons of sequences drawn from different species and summarized the results for each species pair and each gene in Table 4. Half of the genes (hb, In(2L)t, janus, per, Sxl, w, and Zw) had some pairs of sequences in which the substitution rate difference seemed excessive under the null model of no rate variation. These significant comparisons tended to show up in all three pairwise species contrasts. Another pattern is that for all these genes where some tests were significant, there was an average substitution rate excess for D. sechellia relative to D. simulans and relative to D. mauritiana (Δ columns in Table 4). In the comparisons between D. simulans and D. mauritiana the direction of departure varied evenly among genes.
The second method of summarizing was applied to the data, prior to the relative rate tests, so as to have results that are not complicated by so many multiple comparisons. This analysis employed just one single constructed sequence from each species. The following genes have been sequenced, at least in part, from at least one individual from each of the three simulans complex species, as well as D. melanogaster: the proximal Amylase gene (Amy-P; Shibata and Yamazaki 1995); Amyrel (Da Lage et al. 1998); the Cecropin gene cluster (Ramos-Onsins and Aguadé 1998); dynein (Dhc-Yh3; Zurovcova and Eanes 1999); glutathione-S-transferase D1 (GstD1; M. T. Hargis and J. B. Cochrane, unpublished sequences in GenBank); myosin alkali light chain (Mlc1; Leicht et al. 1995); male accessory gland peptide genes Mst26Aa and Mst26Ab (Aguadé et al. 1992); nullo (Caccone et al. 1996); Cu-Zn superoxide dismutase (Sod; K. Arxontaki, P. Kastanis, S. Tsakas, M. Loukas and E. Eliopoulos, unpublished sequences in GenBank); and serendipity (sry-α; Caccone et al. 1996). The sequence from each species, for each of these genes, was aligned by eye and concatenated. To these sequences were added one randomly drawn sequence from each species from the 14 genes listed in Table 1. The final data set included concatenated data from 23 gene regions, with a total length for each sequence of 28,692 bases. For each of the three possible comparisons, the D. melanogaster sequence was used to root the divergence between the sequences from each of the other two species and to obtain estimates of the substitution rate per base pair since that root point. For the comparisons between D. simulans and D. sechellia the relative rate test yielded values of 0.0092 and 0.0137 changes per site, respectively, which are highly significantly different (P < 0.0001).
For the simulans/mauritiana comparison, the values are 0.0102 and 0.0122, respectively (P < 0.05); and for the mauritiana/sechellia contrast the values are 0.0121 and 0.0145, respectively (P < 0.05). On balance it appears that genes from D. sechellia have been evolving ∼50% more quickly than have genes in D. simulans and that genes in D. mauritiana have an average rate of mutation accumulation that is in between that of the other species. Put another way, if we consider just the 112 Mb of DNA sequence recently reported for the D. melanogaster genome project, then a random copy of the D. sechellia genome had >500,000 more mutations accumulate than a comparable copy of the D. simulans genome since the various times at which the different genes had common ancestors.
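The arithmetic behind the "∼50%" and ">500,000" figures can be checked directly from the numbers quoted in the text. The short sketch below only reproduces that calculation; the variable names are introduced here for illustration.

```python
# Substitutions per site since the melanogaster root, as quoted in the text.
rate_simulans = 0.0092
rate_sechellia = 0.0137

# Length of the D. melanogaster genome sequence referred to in the text.
genome_bp = 112e6

# Relative rate: about 1.49, i.e. roughly 50% faster in D. sechellia.
rate_ratio = rate_sechellia / rate_simulans

# Extra mutations expected on a D. sechellia genome copy over that length.
excess_mutations = (rate_sechellia - rate_simulans) * genome_bp

print(f"ratio = {rate_ratio:.2f}, excess = {excess_mutations:,.0f}")
```

The result, close to 504,000, matches the ">500,000" stated above under the simplifying assumption that the rate difference from the 23 concatenated gene regions applies uniformly across the genome.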
Table 4. Relative rate test results
The ranking of mutation accumulation rates inversely mirrors the ranking of estimated effective population sizes: the larger the effective population size, the lower the rate of mutation accumulation. This pattern is consistent with the slightly deleterious model of mutation accumulation, in which more mutations are effectively neutral when population sizes are smaller (Ohta 1972, 1973). If the slightly deleterious mutation model does explain the differing rates of mutation accumulation, then we would also expect this to be reflected in the ways that synonymous mutations have accumulated in the different species. Synonymous codon usage in Drosophila does appear to have been shaped, in part, by natural selection (Kliman and Hey 1993b; Akashi 1994, 1995; Duret and Mouchiroud 1999), and the degree to which preferred codons (vs. unpreferred codons) accumulate can be taken as a measure of the efficacy of natural selection on codon usage. Thus, for example, Akashi (1995) found that fixations of unpreferred codons were significantly more numerous than fixations of preferred codons in D. melanogaster, indicating that selection on codon usage may have become ineffective in this species subsequent to its split from D. simulans. Using D. melanogaster as an outgroup, we identified by parsimony the ancestral and derived states for fixed synonymous substitutions unique to each of the three simulans complex species (see Table 5). D. simulans and D. mauritiana had too few fixed synonymous substitutions with which to conduct a test, but there are 29 such fixations in D. sechellia. Of these, 17 substitute an unpreferred codon for an ancestral preferred codon, while only 6 show the opposite pattern. These values differ significantly from equality (G = 5.48, P = 0.0019), consistent with the hypothesis that selection on silent sites has been ineffective in D. sechellia in the time since coancestry with the other species.
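The G statistic for the 17 versus 6 comparison can be reproduced with a goodness-of-fit test against equal expected counts. The sketch below is an illustration only: the function name is introduced here, and the tail probability assumes a χ² distribution with 1 degree of freedom and no continuity correction, which is an assumption of this example and need not match the procedure behind the reported P value.

```python
from math import erfc, log, sqrt

def g_test_equal_counts(k1, k2):
    """G test of two counts against a 1:1 expectation.

    Returns (G, P), where P is the upper tail of a chi-square
    distribution with 1 degree of freedom (an assumption of this sketch).
    """
    expected = (k1 + k2) / 2.0
    g = 2.0 * sum(k * log(k / expected) for k in (k1, k2) if k > 0)
    # For 1 df, P(X > g) equals erfc(sqrt(g / 2)).
    p = erfc(sqrt(g / 2.0))
    return g, p

# Counts from the text: preferred -> unpreferred vs. unpreferred -> preferred.
g_stat, p_value = g_test_equal_counts(17, 6)
print(f"G = {g_stat:.2f}, P = {p_value:.4f}")
```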
Table 5. Fixed synonymous mutations
Table 6. Shared polymorphisms and fixed differences
The evidence of reduced effective population size and reduced efficiency of natural selection, in D. sechellia relative to the other species, is also consistent with the finding that D. sechellia bears many fewer genes that contribute to hybrid sterility in crosses with D. simulans than does D. mauritiana. Though this pattern was once interpreted as evidence that D. simulans and D. sechellia are the most closely related species pair (Palopoli et al. 1996), it is also consistent with a greater rate of adaptation in D. mauritiana, as might occur with a larger effective population size.
Divergence of species: As incipient species begin to diverge from one another they can be expected to share genetic variation that was common to their ancestral species. If neither incipient species experiences a strong population bottleneck, then these shared polymorphisms may persist for a long period of time, particularly at those genes that are not associated with adaptive divergence (and are not linked to such genes). Table 6 shows the numbers of shared polymorphisms and fixed differences found between each species pair. Both D. simulans and D. mauritiana are highly polymorphic, and even though the number of sequences sampled is small, we find that the two species share polymorphisms at a majority of the loci. In contrast, species comparisons that involve D. sechellia generally revealed no shared polymorphisms, as expected given the low level of polymorphism found within this species. The exceptions involving D. sechellia are a single shared polymorphism between D. simulans and D. sechellia at per and the abundance of shared polymorphisms at In(2L)t due to a single D. sechellia sequence (see above).
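Counting shared polymorphisms and fixed differences amounts to classifying each aligned site by the sets of bases seen in the two samples. The following sketch shows one way to do that; the category names, the rule that a fixed difference means the two samples have no base in common, and the toy sequences are assumptions made here, not the definitions or data behind Table 6.

```python
def classify_sites(sample1, sample2):
    """Classify aligned sites from two samples of equal-length sequences.

    shared      both samples polymorphic for the same two (or more) bases
    fixed       the samples have no base in common at the site
    exclusive1  polymorphic only in sample1
    exclusive2  polymorphic only in sample2
    other       both polymorphic but sharing only one base
    """
    counts = {"shared": 0, "fixed": 0, "exclusive1": 0,
              "exclusive2": 0, "other": 0}
    for col1, col2 in zip(zip(*sample1), zip(*sample2)):
        bases1, bases2 = set(col1), set(col2)
        if bases1.isdisjoint(bases2):
            counts["fixed"] += 1
        elif len(bases1) > 1 and len(bases2) > 1:
            key = "shared" if len(bases1 & bases2) > 1 else "other"
            counts[key] += 1
        elif len(bases1) > 1:
            counts["exclusive1"] += 1
        elif len(bases2) > 1:
            counts["exclusive2"] += 1
        # Sites monomorphic for the same base in both samples are not counted.
    return counts

# Toy alignments (hypothetical), three sequences per species.
species_a = ["ACGTA", "ACGAA", "ATGAA"]
species_b = ["GCGTA", "GCGAC", "GCGAC"]
site_counts = classify_sites(species_a, species_b)
print(site_counts)
```

In the toy data the first site is a fixed difference, the fourth is a shared polymorphism, and the second and fifth are polymorphic in only one sample.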
To assess how many of the shared polymorphisms could be expected to arise just by recurrent mutation, we conducted a simple calculation under the assumption that mutations occur randomly and independently with equal probability at all sites. If s1 and s2 polymorphic sites were observed in each of two historically independent species over a common region of length L, then the probability that exactly ss of those polymorphisms fall on the same base positions in the two samples is given by the hypergeometric probability
\[ P(s_s) = \frac{\binom{s_1}{s_s}\binom{L - s_1}{s_2 - s_s}}{\binom{L}{s_2}} \]
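The hypergeometric calculation described here is straightforward to evaluate numerically. The sketch below is a minimal version under the stated assumption of equal mutation probability at all sites; the function names and the example values of L, s1, and s2 are hypothetical and are not taken from the data.

```python
from math import comb

def prob_shared_by_chance(length, s1, s2, shared):
    """Probability that exactly `shared` of the s2 polymorphic positions in
    one sample coincide with the s1 polymorphic positions in the other,
    when positions are placed at random over `length` sites."""
    if shared < 0 or shared > min(s1, s2):
        return 0.0
    return (comb(s1, shared) * comb(length - s1, s2 - shared)
            / comb(length, s2))

def prob_at_least(length, s1, s2, shared):
    """Upper tail: probability of `shared` or more coincident positions."""
    return sum(prob_shared_by_chance(length, s1, s2, k)
               for k in range(shared, min(s1, s2) + 1))

# Hypothetical locus: 1,000 sites, 30 and 20 polymorphic sites per species.
L, S1, S2 = 1000, 30, 20
expected_shared = S1 * S2 / L          # mean of the hypergeometric
p_five_or_more = prob_at_least(L, S1, S2, 5)
print(expected_shared, p_five_or_more)
```

With these toy values fewer than one coincident site is expected by chance, so observing several shared polymorphisms would be hard to attribute to recurrent mutation alone.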
Another way to check whether mutations are occurring randomly and fairly uniformly across sites is to compare observations with a Poisson distribution. An approximate check can be made by asking whether the numbers of sites that support a 2-, 3-, or 4-base polymorphism are consistent with a Poisson distribution, given the number of sites that revealed no polymorphism. Fitting a Poisson distribution to the D. simulans data set returned expected values of 12,271, 275, 3, and 0 positions with 1, 2, 3, and 4 segregating bases, respectively (sites with 1 segregating base are invariant). The observed values were 12,271, 271, 7, and 0. The good fit of the Poisson distribution suggests that overall the data set has just a small number of sites where recurrent mutations have occurred.
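The Poisson check can be illustrated with a short Python sketch. It rests on two assumptions that the text does not spell out. First, the Poisson mean is estimated from the invariant class alone. Second, a site carrying k mutations is treated as showing k + 1 segregating bases. The function name is hypothetical. Under these assumptions the rounded expectations agree with the values reported above.

```python
from math import exp, factorial, log


def poisson_site_expectations(n_invariant, n_total, max_hits=3):
    """Expected numbers of sites with 0, 1, ..., max_hits mutations,
    with the Poisson mean estimated from the invariant (zero-hit) class."""
    lam = -log(n_invariant / n_total)  # solves n_total * exp(-lam) = n_invariant
    return [n_total * exp(-lam) * lam ** k / factorial(k)
            for k in range(max_hits + 1)]


# Observed D. simulans site counts with 1, 2, 3, and 4 segregating bases.
observed = [12271, 271, 7, 0]
expected = poisson_site_expectations(observed[0], sum(observed))

if __name__ == "__main__":
    for n_bases, (obs, exp_count) in enumerate(zip(observed, expected), start=1):
        print(n_bases, obs, round(exp_count, 2))
```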
Also revealed in the comparison between D. simulans and D. mauritiana is the negative correlation, across loci, that is expected between fixed differences and shared polymorphisms. In the absence of recombination and recurrent mutation, a gene tree for one locus can support either fixed differences or shared polymorphisms, but not both (neither may occur as well), as a simple byproduct of the possible gene tree topologies (Wakeley and Hey 1997). However, if recombination has been occurring, then different portions of a locus have different gene trees and it is possible for both shared polymorphisms and fixed differences to occur.
Figure 2 shows the results of cluster analyses for most of the genes (similar diagrams for the remaining genes were reported previously). These diagrams should not be equated with gene tree estimates, as most loci showed evidence of recombination and thus do not have a bifurcating gene tree history. However, these diagrams do serve to show the variable patterns of similarity that are found among genes and how those patterns are not consistent with simple phylogenetic relationships among species. As in the case of the original studies on ase, ci, per, yp2, and z (Hey and Kliman 1993; Kliman and Hey 1993a; Hilton et al. 1994), sequences from D. simulans show only a limited tendency to cluster by their taxonomic designation. The same kind of dispersed pattern is seen for D. mauritiana sequences at hb, In(2L)t, and janus. The D. sechellia samples do consistently cluster with one another (with the exception of one sequence at In(2L)t), but depending on the gene the D. sechellia cluster may fall almost anywhere within the diagram.
The frequent tendency for genes from D. simulans and D. mauritiana to cluster with those from the other species is entirely consistent with the presence of a large number of shared polymorphisms between these species (Table 6). These patterns are expected if multiple gene lineages persist in both species since the time of speciation (Clark 1997; Wakeley and Hey 1997).
Fitting a speciation model: We compared the data to what would be expected under a simple speciation model, called an “isolation model,” in which an ancestral constant size population splits over a very short period of time into two populations, each of constant size. There are four primary parameters to the model, including three θ’s, or population mutation rates (one for the ancestral population and one for each descendant), and a time since the splitting event. The model fitting requires the counts of shared polymorphisms and fixed differences, as well as counts of the numbers of unique polymorphisms. The method is outlined in Wakeley and Hey (1997) and Wang et al. (1997).
Table 7 shows the results of fitting the isolation model to four different data sets. The first case includes D. simulans and D. mauritiana and, as in the original application of the method for this species pair, the ancestral species appears to have had a size intermediate between the descendants and to have occurred not very long ago (Wakeley and Hey 1997). The second case is the same as the first except that the numbers of shared polymorphisms were reduced by the number expected by chance and independent mutation as shown in Table 6. The isolation model parameter estimates are very similar to the first case. The third and fourth applications are to the case of D. simulans and D. sechellia (with and without the shared sequence of In(2L)t, respectively). It is interesting that the removal of that sequence does not have a large effect on the parameter estimates. In both cases D. sechellia has a low estimated value for θ, while the ancestral species estimate is considerably larger than that for either descendant species. The reason for the similarities, with and without the In(2L)t sequence, is that In(2L)t is not the only locus where a shared polymorphism was found (one also occurred at per; Table 6). Thus, in both applications, the model must still reconcile the presence of divergence between the taxa, low polymorphism within D. sechellia, and the presence of shared polymorphism. The combined effect of all three is to drive up the estimate of the size of the ancestral species (Wang et al. 1997).
We also performed statistical tests of the quality of fit between the expected levels of polymorphism under the isolation model and the observed values (Wang et al. 1997). The test proceeds by conducting coalescent simulations based on the parameter estimates and then by comparing the distribution of results from 10,000 such simulations with the actual data. These simulations incorporated recombination, at rates based on the estimated amount of recombination that occurs within each gene in each species, as this strongly affects the degree to which shared polymorphisms and fixed differences covary. We used the γ estimate of the population recombination rate (Hey and Wakeley 1997) as determined for each species and locus. Just as with the actual data, each simulated data set is partitioned into the four categories of polymorphic sites (polymorphisms exclusive to species 1, those exclusive to species 2, shared polymorphisms, and fixed differences) for each locus, and these quantities are used to generate isolation model parameter estimates and expected values for each of the quantities. The simulations were also used to generate 95% confidence intervals for the parameter estimates.
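The partition of sites into the four categories can be sketched as follows. This is a simplified illustration, not the procedure used for the analyses. It assumes aligned sequences of equal length and essentially biallelic sites, and it calls a site shared whenever both samples are polymorphic there. The function name and the toy sequences are hypothetical.

```python
def classify_sites(species1, species2):
    """Count polymorphisms exclusive to each species, shared
    polymorphisms, and fixed differences between two aligned samples."""
    counts = {"exclusive_1": 0, "exclusive_2": 0, "shared": 0, "fixed": 0}
    # zip(*seqs) walks the alignment column by column.
    for col1, col2 in zip(zip(*species1), zip(*species2)):
        bases1, bases2 = set(col1), set(col2)
        poly1, poly2 = len(bases1) > 1, len(bases2) > 1
        if poly1 and poly2:
            counts["shared"] += 1        # simplification: same bases not checked
        elif poly1:
            counts["exclusive_1"] += 1
        elif poly2:
            counts["exclusive_2"] += 1
        elif bases1 != bases2:
            counts["fixed"] += 1         # each species monomorphic, but different
    return counts


if __name__ == "__main__":
    # Toy alignments, invented for illustration only.
    sample1 = ["ACGTA", "ACGTT", "ATGTA"]
    sample2 = ["GCGAA", "GCGAT", "GCCAA"]
    print(classify_sites(sample1, sample2))
```

For the toy alignments, the five columns yield one polymorphism exclusive to each sample, one shared polymorphism, and two fixed differences.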
Figure 2. Distance trees for nine loci. The length of the branch to the outgroup sequence of D. melanogaster is shown in units of estimated changes per base pair. Comparable trees for the remaining loci (ase, ci, per, yp2, and z) were reported previously (Hey and Kliman 1993; Kliman and Hey 1993a; Hilton et al. 1994). For each locus, DNA sequences were aligned by eye and clustering was done using the neighbor-joining algorithm (Saitou and Nei 1987). Where lines were not common to multiple loci, lines are labeled only to species. For Adh, est6, and Zw, the lines with specific line numbers were the same as some of those used in earlier reports on ase, ci, per, yp2, and z (see materials and methods). For hb, mt:ND5, Sxl, and w, most lines came from a common set (as described in materials and methods) and these lines are numbered within each species.
Table 7. Isolation model fitting
We considered two test statistics. One was a simple χ2 statistic that summed the discrepancies between observations and expectations for each locus and each polymorphism type. If we denote the counts of the four types of polymorphisms for locus i as Si,j, with j = 1... 4, and if there are L loci, then

$$\chi^2 = \sum_{i=1}^{L} \sum_{j=1}^{4} \frac{\left(S_{i,j} - E(S_{i,j})\right)^2}{E(S_{i,j})}.$$
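A minimal sketch of how such a χ2 statistic and a simulation-based P value might be computed is given here. It assumes the usual squared-discrepancy-over-expectation form and assumes that the statistic has already been computed for each simulated data set. The function names and the numbers in the example are hypothetical.

```python
def chi_square(observed, expected):
    """Sum (S - E)^2 / E over loci (rows) and the four polymorphism
    categories (columns); cells with zero expectation are skipped."""
    total = 0.0
    for obs_locus, exp_locus in zip(observed, expected):
        for obs, exp_value in zip(obs_locus, exp_locus):
            if exp_value > 0:
                total += (obs - exp_value) ** 2 / exp_value
    return total


def simulated_p_value(observed_stat, simulated_stats):
    """Proportion of simulated statistics at least as large as the observed one."""
    n_extreme = sum(1 for stat in simulated_stats if stat >= observed_stat)
    return n_extreme / len(simulated_stats)


if __name__ == "__main__":
    # One hypothetical locus: exclusive 1, exclusive 2, shared, fixed.
    observed = [[10, 5, 2, 3]]
    expected = [[8.0, 5.0, 4.0, 3.0]]
    stat = chi_square(observed, expected)
    # Stand-in values for statistics computed from simulated data sets.
    print(stat, simulated_p_value(stat, [0.5, 1.0, 2.0, 3.0]))
```

A large proportion of simulated values at or above the observed statistic indicates that the model is not rejected.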
From comparison of the first two rows of Table 7, it is clear that adjusting the observed numbers of shared polymorphisms by the number expected by chance has little effect on the parameter estimates or the quality of the fit of the isolation model. Similarly, from rows three and four, we see that including the unusual sequence of In(2L)t within the D. sechellia sample has little effect on the parameter estimates. There is an effect on the fit between the model and the data (the model fits better when the sequence is excluded), but in neither case is the model rejected.
DISCUSSION
Our basic approach has been to extend DNA sequence-based population genetics to questions associated with relatively ancient speciation events. The divergence of the simulans complex species probably began hundreds of thousands of years ago (Hey and Kliman 1993), yet in the patterns of mutation accumulation and in the patterns of shared and fixed differences, we can still assess the effects of population sizes and assess the historical presence of gene flow between species. For two quite different reasons, the DPG approach to the study of recent speciation events becomes considerably more informative the more that comparative DNA sequence data are available from multiple independently segregating genes. First, multiple loci permit the assessment of different evolutionary forces. The historical portraits that are developed for each locus can be compared to see whether different loci are consistent with a common historical model. Thus multiple loci can be used to distinguish those demographic forces that have acted on many genes (e.g., population splitting, population size changes, and migration) from those that have acted just on smaller parts of the genome (e.g., natural selection). The second benefit of multiple loci concerns sampling effort. For populations or species that have been diverging for some time, the gene trees within species may not extend back to the time of the common ancestor, and even if they do, only a small minority of lineages are expected to be of that age. Thus, only a small portion of the true genealogical history for a species, at a locus, may extend from the time period under investigation. When this is the case, repetitive sampling within species tends to include that history even in a small sample. Put another way, the older nodes of a species’ true genealogy, for a locus, tend to be revealed in a small sample, whereas more recent portions are, on average, only revealed as the sample size per locus grows large (Kliman and Hey 1993a). 
This basic feature of genealogical sampling necessarily dictates an optimal strategy that is shifted away from multiple sequences per locus and toward multiple loci—each with few sequences. In the extreme, it is even possible to study the sizes of ancestral species by using just one sequence from each species, so long as many loci are studied (Takahata 1986).
Departures from the neutral model: The major assumptions of the basic null model that is used as a heuristic guide for many analyses, and as an explicit baseline in the statistical tests, are that mutations are neutral and that population sizes are constant (the McDonald-Kreitman test does not rely upon the latter). We observed four distinct kinds of departures from null expectations: an overall negative value of Tajima’s D for D. mauritiana (Table 2), suggestive of a recently expanding population size in this species (Tajima 1989a); significant McDonald-Kreitman tests at est6, janus, and Zw (Table 3), suggestive of an accumulation of excess amino acid replacement differences between species at these loci (Eanes et al. 1993); an excess accumulation of mutations in D. sechellia, many of which are probably slightly deleterious (Tables 5 and 6); and significant HKA tests, primarily due to low variation within species at genes in low recombination portions of the genome (Figure 1). On balance, our null model does not fare very well, though we do learn a great deal from each of these exceptions. These findings necessarily lessen the applicability of the isolation model fitting, which strongly relies upon the neutral model. In recent years, particularly with growing data on polymorphism and divergence, there have come many reports on exceptions to the neutral model, particularly for the well-studied D. melanogaster (Kreitman 1996; Moriyama and Powell 1996; Ohta 1996; Hey 1999).
Speciation: Throughout this report, the three simulans complex species are considered to be biological entities within which evolutionary forces of natural selection and genetic drift play out amid a recombining gene pool, and between which there is a near absence of gene exchange (Dobzhansky 1937). If we wish, this starting point can be taken as an assumption under test. Thus, for example, consider that the three simulans complex taxa have been represented by DNAs prepared from organisms that were taxonomically identified on morphological grounds. Then our evolutionary investigation amounts to a testing of the hypothesis that the taxonomic samples have indeed come from biological species. In particular, we can ask two questions: (1) whether the patterns of DNA sequence variation for any one taxon are consistent with that taxon being a single biological species and (2) whether the patterns of DNA sequence variation between taxa are consistent with genetic isolation. The first question can be partly assessed by asking whether single taxa show evidence of multiple separate gene pools. An example of this would be if multiple genes show evidence of relatively ancient population subdivision among the samples from a single taxon. For example, a pattern like this was found for multilocus samples of D. novamexicana (Hilton and Hey 1996, 1997). The second question can be partly assessed by asking whether multiple taxa have experienced substantial gene flow at multiple loci. Hilton and Hey (1996, 1997) also found a case wherein two cytospecies, D. americana americana and D. a. texana, had experienced considerable gene exchange at multiple loci and thus should not be considered as species (see also McAllister and Charlesworth 1999).
When simulans taxa are considered from the standpoint of just one single locus, then we often do find that taxa are poorly reflected by the patterns of similarity among individual gene copies. Pairs of gene copies drawn from D. simulans and D. mauritiana vary widely in the degree to which they differ, and gene tree estimates for individual genes show that these taxa are highly paraphyletic when represented by multiple gene copies for a single locus (Satta and Takahata 1990; Hey and Kliman 1993; Kliman and Hey 1993a; Hilton et al. 1994; Figure 2). However, when the same set of multiple strains from each species was studied at multiple loci, there was no tendency for ancient population subdivision within species and no strong evidence of recent gene flow at multiple loci (Hey and Kliman 1993; Hilton et al. 1994). In the branching cluster diagrams for z, yp2, and per, the D. sechellia sequences clustered, as did those for D. mauritiana, while those for D. simulans tended to be spread out across the diagram and to come together mostly at the deepest parts of the diagram (Hey and Kliman 1993). However, despite these common patterns among taxa across loci, the clustering patterns varied widely within taxa across loci, even though the same set of inbred lines had been studied for each locus. For some of the newly studied loci (hb, w, mt:ND5, and Sxl), a common set of inbred lines was used, and again we see that there are common patterns among taxa (very similar to what was found with z, yp2, and per), and again we see that within taxa the relationships among particular lines vary widely across loci (Figure 2). These results are just what we expect for taxa that represent real, recently diverged, biological species, within which there is recombination. The overall picture is one in which all three species diverged at about the same time, in which both D. mauritiana and D. simulans have had large effective population sizes and still carry shared polymorphisms since divergence, and in which D. sechellia has a small effective population size.
At the crux of many speciation discussions is the question of whether or not natural selection plays a direct creative role in forming species. In the simplest models of allopatric speciation it does not, and speciation is a byproduct of the evolution that proceeds in physically separated populations. Thus, for example, in the classic speciation model of Dobzhansky and Muller, each of two separate populations accumulates adaptations one by one. However, it turns out that when given the chance to hybridize, the mutations fixed in one species are incompatible with the novel genome of the other species. In other words, they are epistatic and deleterious when expressed in a genetic background other than the one in which they arose (Dobzhansky 1936; Muller 1940). However, speciation may also arise in a more dynamic context where natural selection promoting regional specialization is in a tug-of-war against recombination and gene flow that break down associations both among genes and between genes and geography. Under these circumstances, where diverging populations are sympatric or parapatric, natural selection is acting directly to shape the species barrier.
Whether natural selection promotes species formation directly or indirectly depends on whether gene exchange was occurring among incipient species. Thus, research on the historical demographic processes associated with species divergence may reveal evidence of ancient gene flow and, therefore, illuminate the kinds of natural selection and the kinds of phenotypes that might have existed during the beginning stages of species formation. Of course, if gene flow and natural selection were important factors for just a short period of time at the beginning of speciation, then patterns of variation may be indistinguishable from those expected under the isolation model, particularly if those events were long ago. However, population genetic methods can sometimes reveal recent or ongoing gene flow between species that are otherwise long diverged. With such findings our understanding of species as evolutionary entities undergoes a significant adjustment; for it is then that natural selection can be seen as having maintained the phenotype, by which we recognize the species, in the face of that gene flow.
Assessing gene flow: Variation can be shared between species either by gene flow or by dual persistence since the time of population splitting. These historical alternatives can be difficult to distinguish, relying primarily on two kinds of observations. First, if most gene sequences suggest moderate or high divergence, but a minority are identical for two species, then the simplest explanation may be population splitting long ago and limited recent gene flow. This kind of observation is essentially one of an appearance of a sequence that is “atypical” for its taxon. An example of this pattern was found at the per locus in D. pseudoobscura and D. persimilis (Wang and Hey 1996). Among the loci studied here, only In(2L)t showed an example of this. The second kind of observation that can be suggestive of gene flow is if loci vary widely in the degree to which they share polymorphisms. A model of limited gene flow is expected to give wide variation among loci in apparent divergence (Wakeley 1996; Wang et al. 1997). However, such variation among genes must be quite high, and assessment of it is strongly dependent upon recombination rates. Thus we could not reject the model of no gene flow, in the testing of the isolation model, between D. simulans and D. sechellia, despite the appearance of haplotype sharing at In(2L)t. A third explanation of shared variation is not genealogical, but mutational: recurrent mutations can also cause shared variation. Given the overall recent low levels of DNA sequence divergence, this has probably not been a large factor in this study. However, recurrent mutation could well have been the cause of the single polymorphism that is shared by D. sechellia and D. simulans at the per locus.
Limited evidence of gene flow among these species also comes from a study of ase and ci. In this case the observation did not involve shared variation (polymorphism is nearly absent in these genes) but rather that divergence between the simulans complex species was less than expected given what had been found at other loci (Hilton et al. 1994). On balance, the data are largely consistent with an absence of gene flow. Only In(2L)t revealed the kind of pattern expected of a very recent gene flow event, and in this case laboratory admixture cannot be ruled out. Overall the levels of shared polymorphisms and fixed differences are consistent with an isolation model. However, given the difficulty of distinguishing gene flow from shared ancestral variation, we cannot rule out a speciation model that included a period of gene flow following population splitting.
Phylogeny: As an important model system for the study of speciation, the D. simulans complex has been the subject of many efforts to infer phylogeny. Indeed, all three possible pairs of taxa have been proposed as the most closely related species pair, including simulans/sechellia (Cariou 1987; Palopoli et al. 1996) and simulans/mauritiana (Lachaise et al. 1986; Joly 1987; Hey and Kliman 1993; Coyne and Charlesworth 1997; Harr et al. 1998; Ting et al. 2000), and also the sechellia/mauritiana pairing (Caccone et al. 1988, 1996), which seems unlikely on the basis that it would require a colonization from one remote island to another, whereas the other models simply require two colonizations from the mainland.
The difficulty of the phylogeny problem can be seen both from the standpoint of the data and from the standpoint of theory. Regarding data, a simple appraisal of the cluster diagrams for the 14 genes shows how difficult it could be to try to discern an overall species branching history. Thus consider from the standpoint of D. sechellia sequences, which always cluster together [excepting In(2L)t], and ask whether the next most similar sequence is from D. simulans, or D. mauritiana, or whether it is a node that joins sequences from both of these species. A plurality of genes pair the D. sechellia cluster with a mix of simulans and mauritiana gene copies, including ase and ci (Hilton et al. 1994), as well as est6, hb, In(2L)t, janus, and yp2 (Hey and Kliman 1993). Five genes reveal a simulans gene copy, or a cluster of simulans copies, as the next most similar to the D. sechellia cluster, including Adh, per (Kliman and Hey 1993a), Sxl, w, and z (Hey and Kliman 1993). Just 2 genes, mt:ND5 and Zw, have a sechellia/mauritiana pairing. On balance, there is a suggestion that the origin of what we call D. sechellia arose prior to the splitting that gave rise to our other species. This was the conclusion based on just 3 genes (Hey and Kliman 1993), and now with 14 genes we see that a plurality of the cluster diagrams favor this explanation. This conclusion is also supported by a recent study of the Odysseus (OdsH) locus that contributes to hybrid sterility between D. mauritiana and D. simulans. Multiple sequences from each of the three species revealed a striking pattern of very low polymorphism within species and multiple fixed differences between species (Ting et al. 2000). When considered in light of the relative paucity of fixed differences found at other genes, this pattern is strongly suggestive of multiple recurrent selective sweeps at this locus.
The OdsH coding region sequences of the three taxa appear quite separate and distinct on the estimated gene tree, with those from D. simulans and D. mauritiana more closely related to each other than either is to those of D. sechellia (Ting et al. 2000).
It is noteworthy that what appears to be the most unlikely pairing for the most recent speciation event, on the basis of these cluster analyses and on biogeographic grounds (D. sechellia and D. mauritiana), was the favored topology in a study that brought together multiple comparative DNA sequence data sets (Caccone et al. 1996). Caccone et al. included data sets for which there were only single copies from each taxon, as well as those available data sets with multiple sequences from each. When multiple sequences were available the data were collapsed within taxa, so as to represent each taxon by just a single sequence, with polymorphisms represented using the IUPAC ambiguity codes (A. Caccone, personal communication). Different genes supported different topologies, but when all the data were combined into one large data set (i.e., one long sequence for each species), the result was strong support for the sechellia/mauritiana pairing. This result was not sensitive to inclusion of the ambiguous (i.e., polymorphic) positions. For three reasons, we do not further explore why Caccone et al.’s method of data combining would yield a network that is at odds with the data from most of the 14 genes studied here. First, it is difficult to assess whether the collapsing and combining of data from many genes, with widely varying histories, might lead to a misinterpretation of closely spaced speciation events. Second, the data presented here cannot rule out any particular bifurcating topology—though the sechellia/mauritiana pairing seems unlikely. Third, we have tried to avoid imposing a traditional phylogenetic model on our analyses. Such models necessarily employ assumptions of instantaneous splitting among distinct homogeneous entities. In the diversification of the simulans complex, we have the opportunity to understand phylogeny in a broader sense.
It is worth noting that the difficulty of inferring a branching species history is probably not a simple byproduct of too little data. The 14-locus data set comprises very nearly 220,000 bp of DNA sequence, not including the D. melanogaster outgroup sequences, and there are a total of 554 polymorphic sites, including 320 so-called “phylogenetically informative” polymorphisms (i.e., the rarer base occurs more than once). Also, as these are very closely related DNA sequences, only a small fraction of these polymorphisms are expected to have occurred at the same site (see results). One might suppose that a data set with just three taxa and hundreds of informative sites (with little recurrent mutation) would permit a straightforward, traditional, phylogenetic resolution, but clearly it does not.
If we consider “phylogeny” as pertaining to the genesis of phyla then we have good reasons for eschewing most analyses that impose a simple bifurcating model on the history of these species. All three species are similarly related to one another, and the data suggest that all three have been evolving as separate entities for about the same amount of time. It also appears that divergence has been accruing in a manner consistent with allopatric speciation. If that is correct then we must also consider the likelihood that there was an extended period of time when multiple separate, but nonreproductively isolated, populations existed. The isolation model used here for some analyses assumes an instantaneous population splitting event, but even if that is accurate, neither that model nor any of our data help us to think about the origins of reproductive isolation. Given the recency of these speciation events, their evident proximity in time to one another, and the biological necessity that such events encompassed some time, there seems a large chance that we could misunderstand history if we were to take “speciation event” too literally as denoting an instance in time. For example, under allopatry and the Dobzhansky/Muller model (Dobzhansky 1936; Muller 1940), it would have taken some time for independent adaptive mutations to arise and sweep to fixation in the separate populations.
There are also a number of ways that the demographic circumstances associated with the origin of these taxa could positively mislead any attempt to impose a bifurcating model. For example, if the ancestral species consisted of multiple populations with limited gene exchange, differentiation, and local adaptation, then the divergence of multiple species out of this ancestral species could be expected to reflect this structure. Indeed, there is evidence that D. simulans once had more population structure than we find at present (Hamblin and Veuille 1999). It is entirely possible that conclusions from a majority of gene trees, or a combined data set, might mistakenly reflect this population structure and fail to reflect the actual sequence of speciation events.
A synthesis: If we draw from the current biogeography and patterns of DNA sequence similarities, then it appears as if there were two island colonization events by flies that came from a large continental population. Given the large variation among loci in how DNA sequences from the different species cluster, it seems nearly certain that a large amount of the variation that presently occurs among species includes samples of the variation that was present in ancestral species. If the two colonization events happened nearly at the same time, then different genes are expected to suggest different orders and topologies for these population splitting events.
Figure 3. Diagram of one mainland population of constant size giving rise to two island populations of different sizes.
Consider a model in which a large continental species gives rise to two smaller isolated populations on offshore islands, and that after formation these island populations are constant in size and exchange no genes with the mainland population (Figure 3). Then the expected amount of divergence between a gene copy from an island endemic and the mainland species can be expressed as a function of the time since splitting, the mutation rate since splitting, and the amount of variation within the mainland ancestral species. Let dim be the average number of base pair differences between the island species (i) and the mainland species (m); let t be the time of island population formation; and let ui and um be the respective mutation rates per year experienced by each. Then
The absolute time can be roughly assessed by assuming that um applies to the divergence between D. melanogaster and D. simulans. The average of the pairwise differences between these species, summed across these 14 loci, is 476.62. If we assume that the separation of these gene copies was ∼3 million years ago (Hey and Kliman 1993), then um = 476.62/(2 × 3 × 10^6) = 7.94 × 10^-5. Applying this rate we obtain an estimate of t, for D. sechellia, of 413,000 years and of t, for D. mauritiana, of 263,000 years. These dates scale linearly with any estimate of um, and it should be noted that the 3 million year date is very rough, as it relies upon a few amber fossils of early Drosophilids of somewhat uncertain age (Throckmorton 1975) and an assumption of a molecular clock (Hey and Kliman 1993; Kliman and Hey 1993a).
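The rate calculation can be reproduced with a short Python sketch. It uses only the two numbers quoted in the text: the summed average pairwise difference and the assumed separation time. The divisor of 2 reflects that both lineages accumulate mutations after separation. The function name is hypothetical.

```python
def mutation_rate_per_year(pairwise_difference, separation_years):
    """Mutations per year along one lineage, summed over all loci,
    given the average pairwise difference between two species."""
    # Differences accumulate along both lineages, hence the factor of 2.
    return pairwise_difference / (2 * separation_years)


# Values quoted in the text: 476.62 differences and ~3 million years.
u_m = mutation_rate_per_year(476.62, 3e6)

if __name__ == "__main__":
    print(f"u_m = {u_m:.3g} per year")
    # A different assumed separation date changes the rate proportionally.
    print(f"with 2.5 My: {mutation_rate_per_year(476.62, 2.5e6):.3g} per year")
```

Because the rate depends directly on the assumed separation date, any revision of the 3 million year figure carries through to the estimated island colonization times.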
Acknowledgments
We thank Constantin Yanicostas for assistance with janus. R.M.K. and J.H. were supported by National Institutes of Health (NIH) grant R01GM58060. R.M.K. also received support from the Jeffress Memorial Trust. F.D. was supported by “Groupe de Recherche sur les Génomes” grant GREG92-392 to Michel Veuille. J.C. was supported by NIH GM 58260. J.W. was supported by National Science Foundation grant DEB-9815367.
Footnotes
- Communicating editor: W. F. Eanes
- Received April 14, 2000.
- Accepted September 11, 2000.
- Copyright © 2000 by the Genetics Society of America