Abstract
An essential component of the immune system of animals is the production of antimicrobial peptides (AMPs). In vertebrates and termites the protein sequence of some AMPs evolves rapidly under positive selection, suggesting that they may be coevolving with pathogens. However, antibacterial peptides in Drosophila tend to be highly conserved. We have inferred the selection pressures acting on Drosophila antifungal peptides (drosomycins) from both the divergence of drosomycin genes within and between five species of Drosophila and polymorphism data from Drosophila simulans and D. melanogaster. In common with Drosophila antibacterial peptides, there is no evidence of adaptive protein evolution in any of the drosomycin genes, suggesting that they do not coevolve with pathogens. It is possible that this reflects a lack of specific fungal and bacterial parasites in Drosophila populations. The polymorphism data from both species differed from neutrality at one locus, but this was not associated with changes in the protein sequence. The synonymous site diversity was greater in D. simulans than in D. melanogaster, but the diversity both upstream of the genes and at nonsynonymous sites was similar. This can be explained if both upstream and nonsynonymous mutations are slightly deleterious and are removed more effectively from D. simulans due to its larger effective population size.
GENES involved in host-parasite interactions are often subject to strong balancing or directional selection. For example, parasite antigens commonly evolve rapidly, and natural selection can maintain many different alleles in a population (Escalante et al. 1998). Similarly, vertebrate MHC genes are highly polymorphic and have elevated levels of nonsynonymous substitutions (Hughes and Nei 1988). However, most of our understanding of the molecular evolution of immune systems comes from studies of the acquired immune response of vertebrates. Acquired immune responses detect and eliminate many different parasites by generating a huge repertoire of receptor molecules with different specificities by somatic rearrangement. This type of immune system is a relatively recent evolutionary innovation of vertebrates; invertebrates instead rely on an innate immune response for defense against pathogens. The innate immune response also remains an essential component of the vertebrate immune response.
The innate immune response relies on a limited number of germline-encoded receptor and effector molecules. Despite this, it is still highly effective in defending against a diverse array of pathogens. This is thought to be because parasites are recognized using highly conserved molecular patterns and eliminated using responses that are effective across a broad range of parasite taxa (Medzhitov and Janeway 1997). Presumably these parasite molecules are highly conserved because they are under strong functional constraints, unlike protein antigens that commonly evolve very quickly. Under this scenario, there may be less opportunity for rapid host-parasite coevolution between pathogens and the innate immune system than between pathogens and the acquired immune system.
There is, however, indirect evidence of coevolution between parasites and invertebrate hosts (which possess only innate immune systems). For example, parasites are often adapted to their local host population, as is predicted by most theoretical models of host-parasite coevolution (Ebert 1994; Morand et al. 1996; Lively and Dybdahl 2000). However, it is currently unclear whether these patterns result from coevolution between parasites and components of the innate immune system. Alternatively, coevolution could occur between parasites and other host molecules involved in host-parasite interactions (e.g., cell surface molecules exploited by pathogens to enter cells).
If the innate immune system does coevolve with pathogens, then we would expect immune system genes to show patterns of rapid adaptive evolution or elevated polymorphism. This is the case in Drosophila simulans, where a survey of immune system genes showed evidence of stronger directional selection than was the case for nonimmunity genes (Schlenke and Begun 2003). This could be the result of coevolution, but there is currently no evidence of reciprocal changes in parasite genes. Alternatively, directional selection may result from ecological factors that change the type of opportunistic infections acquired by flies or alter the costs of mounting an immune response.
Clues as to how innate immune systems adapt to novel parasites or parasite genotypes can be gained by comparing the lists of genes under directional and purifying selection. For example, peptidoglycan recognition proteins, which include receptors that bind to bacteria leading to the expression of antimicrobial peptides (AMPs), tend to be highly conserved (Jiggins and Hurst 2003). In contrast, thioester-containing proteins and scavenger receptors, some members of which bind to pathogens and are involved in their subsequent phagocytosis or encapsulation, can be under strong directional selection (Lazzaro 2005; Little and Cobbe 2005).
In this study we have focused on the evolution of AMPs, which are an important component of the innate immune response. Most AMPs are thought to exploit the fact that the outer surface of bacterial membranes contains negatively charged phospholipid headgroups that are absent from animal and plant cells (Zasloff 2002). Because AMPs have an amphipathic structure (separate hydrophobic and hydrophilic domains) they can first carpet and then integrate into the outer layer of the membrane. This then allows them either to disrupt the integrity of the cell membrane (e.g., by lysing the membrane or forming pores) or to enter the cell to disrupt some intracellular target (Zasloff 2002).
In vertebrates, AMPs appear to be a hotspot of rapid adaptive evolution. The most dramatic case is found in frogs, each species of which produces 10–20 AMPs in their dermal gland secretions. These AMPs differ among closely related species in their size, sequence, and antimicrobial specificity, and this rapid diversification has been driven by diversifying selection (Duda et al. 2002). In various groups of mammals, both α- and β-defensins have diversified under positive selection, and this has led to changes in their antimicrobial specificity (Hughes and Yeager 1997; Morrison et al. 2003; Semple et al. 2003; Antcheva et al. 2004; Lynn et al. 2004). These patterns strongly suggest that these AMPs are subject to continually changing selection pressures, presumably due to a changing pathogen environment. The changing selection pressures may be the result of parasites coevolving with the AMP molecules.
Studies of insect AMPs have produced more varied results. In termites, the antifungal peptide termicin evolves rapidly under positive selection (Bulmer and Crozier 2004). However, several studies of six different families of antibacterial peptides in Drosophila have consistently failed to produce evidence for rapid adaptive evolution of the amino acid sequence (Clark and Wang 1997; Ramos-Onsins and Aguade 1998; Lazzaro and Clark 2003). Although these studies have detected some evidence of positive selection, it is clear that the rate of adaptive protein evolution is dramatically less than that in vertebrates and termites [an exception, andropin, is discussed later (Date-Ito et al. 2002)].
We wanted to test whether this pattern of evolution was general across Drosophila AMPs. Therefore, we have investigated the evolution of the drosomycins, a family of AMPs that are active against fungi rather than against bacteria and that show no homology to previously studied antibacterial peptides. Drosomycin (Drs) strongly inhibits the growth of filamentous fungi, but has no effect on the growth of a range of bacteria (Fehlbaum et al. 1994). This makes it the only purely antifungal peptide characterized in Drosophila. At low concentrations, drosomycin causes the cell cytoplasm to be extruded along the hyphae, suggesting that it lyses cell membranes (Fehlbaum et al. 1994). However, drosomycin's exact mechanism of action is unknown. There are also six drosomycin-like genes in the D. melanogaster genome, all found within a 56-kb region of the left arm of chromosome 3 (Figure 1). Unlike drosomycin itself, the antifungal activity of these genes has not been tested experimentally, although it is known that one of them (Dro5) is upregulated following fungal infection (De Gregorio et al. 2001).
Figure 1. The arrangement of drosomycin genes in D. melanogaster, showing the region sequenced.
MATERIALS AND METHODS
Fly lines:
We sequenced the drosomycin genes from 20 D. melanogaster chromosomes and 20 D. simulans chromosomes. The flies were all originally collected as isofemale lines from natural populations. We used D. melanogaster lines from Gabon in west central Africa and from The Netherlands. The D. simulans lines were collected in Kenya and France.
The third chromosome of the D. melanogaster stocks was made isogenic by standard crosses to the balancer stock TM6/Sb. Population genetics analyses in D. melanogaster can be confounded by linkage to chromosomal inversions. Therefore, we crossed the isogenic chromosomes to an inversion-free stock and inspected the salivary gland chromosomes of the F1 progeny for inversion loops. Inversions were uncommon in the region containing the drosomycin genes (1 of 21 Gabon chromosomes and 0 of 15 Netherlands chromosomes). Chromosomes containing inversions were discarded. The D. simulans lines were inbred by sib mating for six to nine generations.
DNA sequence data:
PCR primers were designed from the genome sequences of D. melanogaster and D. simulans using the program Primer 3. From each chromosome we sequenced the three regions shown in Figure 1. These were amplified in multiple overlapping amplicons. In some cases mutations in the primer binding site prevented the PCR primers from working. In all such cases new primers were designed to ensure that the complete sequence was obtained from all 40 chromosomes. The PCR products were purified either with QIAGEN (Chatsworth, CA) PCR cleanup columns or by using exonuclease I and shrimp alkaline phosphatase to digest unused PCR primers and dNTPs. The PCR products were then sequenced directly using big dye reagents on an ABI capillary sequencer. The sequence chromatograms were inspected by eye to confirm the validity of all differences within and between species. The sequences were then aligned using Clustal W and the alignment was corrected by eye.
Sequences were also obtained from three other species in the melanogaster group, D. yakuba, D. ananassae, and D. erecta, whose genomes have been sequenced but are currently not annotated (Smith 2004; Wilson 2004). The genome assemblies used were D. yakuba Langley Group assembly 22/5/2004, D. erecta Agencourt assembly 28/10/2004, and D. ananassae Agencourt assembly 6/12/2004. The contigs of these species were searched by Blast using the protein sequences of all the D. melanogaster drosomycin genes. Unique coding sequences were then downloaded and aligned with the D. simulans and D. melanogaster sequences. These sequences may contain some errors. However, these are not expected to occur preferentially at either synonymous or nonsynonymous sites. Therefore, any errors will tend to bias estimates of dN/dS toward 1 (neutrality).
Tree reconstruction:
The phylogenetic relationships of the drosomycin sequences from different species were reconstructed by maximum likelihood using PAUP* v.4.0b10 (Swofford 1998). We used the HKY85 model of sequence evolution and allowed gamma distributed rate heterogeneity between sites (Hasegawa et al. 1985). The transition-to-transversion ratio and the shape of the gamma distribution were first estimated from a maximum parsimony tree. The phylogeny was then reconstructed using a heuristic search with nearest-neighbor interchanges. The robustness of the topology was assessed by 1000 nonparametric bootstrap replicates. The tree was rooted using the sequences from D. ananassae.
The dN/dS ratio:
We tested the hypothesis that functionally distinct regions of the protein have different ratios of nonsynonymous-to-synonymous substitutions (dN/dS ratios). This analysis used the sequences, alignment, and tree topology shown in Figures 2 and 3 (with the exclusion of D. ananassae sequence D). Therefore, we are looking at the divergence of both paralogs and orthologs. The analysis was performed using the Codeml program in the PAML v. 3.14 package (Yang 1997), which fits a maximum-likelihood model of codon substitution along the phylogenetic tree. First, we fitted a model under the null hypothesis that all the codons have the same dN/dS ratio. Then, using a likelihood-ratio test, we compared this to a model in which the signal peptide and mature peptide have different dN/dS ratios. Finally, this was compared to a three-ratio model in which the active site has a dN/dS ratio different from that of the rest of the mature peptide.
It is possible that positive selection could act on sites that cannot be predicted a priori. To detect such sites, we used a model that allows the dN/dS ratio to vary between codons (Yang et al. 2000). Our null model, M8A, forces the dN/dS ratio of all the codons to be ≤1. In this model, sites either have 1 of 10 different dN/dS ratios calculated from a beta distribution bounded by 0 and 1 or belong to an eleventh class in which the dN/dS ratio is fixed at 1 (Swanson et al. 2003). This is then compared to model M8, in which the eleventh dN/dS ratio is free to vary above 1 (i.e., positive selection is allowed). In all the PAML analyses, statistical significance was assessed by comparing the likelihood-ratio statistic (2Δl) to the χ² distribution, with the degrees of freedom being the number of additional parameters in the more complex model.
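The likelihood-ratio comparison described above can be illustrated with a minimal Python sketch. It is not part of PAML and is not the code used in this study; the function names `chi2_sf` and `lrt_pvalue` are hypothetical, and the χ² tail probability is computed from its closed form for integer degrees of freedom so that only the standard library is needed. The three 2Δl values are those reported in the Results for the nested codon models.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even d.f.: exp(-y) * sum_{k < df/2} y^k / k!
        terms = sum(y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * terms
    # Odd d.f.: start from the d.f. = 1 tail (erfc) and add one term per extra 2 d.f.
    tail = math.erfc(math.sqrt(y))
    extra = sum(y ** (k + 0.5) / math.gamma(k + 1.5) for k in range(df // 2))
    return tail + math.exp(-y) * extra

def lrt_pvalue(two_delta_l, extra_params):
    """P-value for a likelihood-ratio statistic with extra_params added parameters."""
    return chi2_sf(two_delta_l, extra_params)

# 2*delta-l values reported in the Results, each with one additional parameter
comparisons = [
    ("one ratio vs. two ratio", 73.1),
    ("two ratio vs. three ratio", 30.3),
    ("M8A vs. M8", 0.00),
]
for label, stat in comparisons:
    print(f"{label}: 2dl = {stat}, P = {lrt_pvalue(stat, 1):.3g}")
```

Under these assumptions the first two comparisons give P < 0.001 and the M8A vs. M8 comparison gives P = 1, in line with the values quoted in the text.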
Tests of neutrality:
The majority of population genetics analyses were performed using the program DnaSP (Rozas and Rozas 1999). The neutral distribution of Fay and Wu's H, Tajima's D, and Fu and Li's F and D statistics was obtained by 2000 coalescent simulations. The simulations were performed on the basis of the number of segregating sites (S), the length of the sequence, and the per-site recombination parameter (C). In D. simulans and D. melanogaster, there is no recombination in males, and therefore for autosomal genes C = 2Nc (where N is the effective population size and c is the crossing-over rate/base pair/generation in females). We assumed that in D. simulans N = 2 × 10⁶ and that in D. melanogaster N = 10⁶ (Andolfatto and Przeworski 2000). We used the estimate of c (3.2 × 10⁻⁸) for this region in D. melanogaster obtained by Marais et al. (2003) using the polynomial method of Hey and Kliman (2002). We conservatively assumed the same value of c for D. simulans. Therefore, in D. simulans C = 0.128 and in D. melanogaster C = 0.064.
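The arithmetic behind the two recombination parameters can be checked with a short sketch. It assumes only the relationship C = 2Nc stated above, with N = 2 × 10⁶ for D. simulans, N = 10⁶ for D. melanogaster, and c = 3.2 × 10⁻⁸ for both species; the function and variable names are hypothetical.

```python
def recombination_parameter(n_e, c):
    """Per-site recombination parameter C = 2Nc for an autosomal locus
    in a species with no recombination in males."""
    return 2 * n_e * c

C_PER_BP = 3.2e-8  # crossing-over rate per base pair per generation in females

c_sim = recombination_parameter(2e6, C_PER_BP)  # D. simulans
c_mel = recombination_parameter(1e6, C_PER_BP)  # D. melanogaster

print(f"D. simulans C = {c_sim:.3f}")
print(f"D. melanogaster C = {c_mel:.3f}")
```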
The HKA test was performed using the program HKA written by Jody Hey (http://lifesci.rutgers.edu/~heylab). The test statistics were compared to a neutral distribution generated from 10,000 coalescent simulations (Hudson et al. 1987; Wang and Hey 1996).
RESULTS
Origins of the gene family:
We began by reconstructing the patterns of gene duplication and loss leading to today's drosomycin gene family. We found six drosomycin genes in D. yakuba, seven in D. simulans, D. melanogaster, and D. erecta, and four in D. ananassae. The fourth D. ananassae gene (D in Figure 2) could not be aligned at the 3′-end and lacked the disulphide bridges that are essential for maintaining the tertiary structure of the peptide. Therefore, this gene was omitted from further analyses. The phylogenetic tree of these sequences shows that the D. ananassae sequences form a single monophyletic group, suggesting that they diverged after the split from the erecta/yakuba/melanogaster/simulans lineage (Figure 3). Similarly, assuming that the tree is rooted with the D. ananassae sequences, the seven D. melanogaster drosomycins duplicated after the split from D. ananassae (18–30 million years ago; Powell 1997) but before D. erecta split from the simulans/melanogaster lineage (6–15 million years ago; Powell 1997). The Dro3 gene either has been lost from D. yakuba or is simply missing from the currently available sequence. The relationship of the seven drosomycin genes roughly mirrors their arrangement in the genome, with related genes being nearby in the genome (Figures 1 and 3).
Figure 2. Amino acid alignment of the drosomycin genes from five species of Drosophila. The species are D. melanogaster (mel), D. simulans (sim), D. yakuba (yak), D. erecta (ere), and D. ananassae (ana). The positions of the β-strand (b), α-helix (a), disulphide bridges (C), hydrophobic region (*), and putative active site (#) are marked above the alignment.
Figure 3. Phylogeny of the drosomycin genes. Percentage of bootstrap support (if >50) is given next to nodes.
The phylogenetic relationships of the melanogaster subgroup have been the subject of much debate (Ko et al. 2003). Some studies have grouped D. yakuba into a clade containing D. melanogaster and D. simulans but not D. erecta. Other work groups D. yakuba with D. erecta. Our data clearly support the latter tree topology, as the D. yakuba sequences tend to be most closely related to those of D. erecta. To test whether this was a true pattern, we reconstructed a phylogeny under the constraint that the D. simulans, D. melanogaster, and D. yakuba sequences of each gene were monophyletic. This constraint significantly lowered the likelihood relative to the unconstrained tree [Shimodaira-Hasegawa (1999) test: unconstrained, −ln l = 2481; constrained, −ln l = 2506; Δ ln l = 25.0; P = 0.01].
Concerted evolution:
Concerted evolution is the maintenance of similar nucleotide sequences among members of a gene family within a species, despite those sequences changing over time. It is commonly observed within multi-gene families due to gene conversion or unequal crossing over.
If concerted evolution has homogenized all or most of the sequences of two genes, it can be detected by reconstructing the phylogeny of the genes from two different species. Figure 3 shows that, following the initial duplication of drosomycin, the genes have diverged independently. In other words, each gene is more similar to its ortholog in another species than to other members of the family in the same species. This result still holds if all 40 D. simulans and D. melanogaster alleles of each gene are included in the tree (data not shown). Therefore, the drosomycin phylogeny provides no evidence for concerted evolution.
It is possible that a phylogenetic approach will fail to detect gene conversion if only small tracts of sequence are converted within genes. To test for this pattern, we used the runs test of Sawyer (1989). This method first identifies runs of similarity between pairs of sequences, which are given a score based on their length. The order of sites is then permuted and the test statistic recalculated to assess whether the runs of sequence similarity are longer than expected. This permutation test accounts for multiple tests. We detected only one possible gene conversion event in D. erecta involving the conversion of 7–33 bp of Dro6 into Dro4 (P < 0.05). Therefore, gene conversion appears to be rare or absent within this gene family.
Indels and functionally important mutations:
The alignment of the D. simulans and D. melanogaster sequences contained numerous insertions and deletions (indels) both within and between the species. Of particular note is the replacement in D. simulans of sequence upstream from Dro5 with sequence upstream from Dro2. It is possible that this has resulted in regulatory elements that have been copied from upstream of Dro2 to upstream of Dro5. This rearrangement consists of a 182- to 258-bp deletion ∼644 bp before the Dro5 gene in D. simulans relative to D. melanogaster. The original length of this sequence has been largely restored by an ∼226-bp insertion just 42 bp before the deletion. This inserted sequence has been copied from the region ending 6 bp before the Dro2 coding sequence. Unfortunately, it is impossible to align this region with an outgroup, making the direction (insertion vs. deletion) uncertain. This same region also has many indels segregating within populations, including a 604- to 680-bp deletion in one of the D. melanogaster chromosomes.
We found three polymorphisms in natural populations that probably result in nonfunctional proteins. First, 4 of 20 D. simulans Dro2 alleles contain a 2-bp deletion that introduces a frameshift into the mature peptide. Second, 1 of 20 D. melanogaster Dro1 alleles contains an internal stop codon. Finally, 2 of 20 D. melanogaster Dro1 alleles have a mutation that results in the loss of a disulphide bridge. The presence of several null alleles suggests that there is not strong selection maintaining the function of these loci. It is notable that null alleles have been recorded in attacins and class C scavenger receptors in D. melanogaster (Hedengren et al. 2000; Lazzaro and Clark 2001; Lazzaro 2005).
Amino acid divergence of drosomycin genes:
If hosts and parasites are coevolving, this can drive the rapid divergence of amino acid sequences. Conversely, stabilizing selection may conserve amino acids that are essential for the function or structural integrity of the molecule. In this section we have compared the patterns of divergence between different regions of the drosomycin molecule and inferred the selection pressures acting on them.
In species where natural selection is driving rapid divergence of AMPs, the mature peptide is much less conserved than the signal peptide (Duda et al. 2002). However, in the amino acid alignment of drosomycin, it is clear that the signal peptide is slightly more variable than the mature peptide (Figure 2). We can test whether different selection pressures act on the two protein domains by comparing different models of codon substitution with a likelihood-ratio test (Table 1). The model in which the signal and mature peptides have different dN/dS ratios is a significantly better fit to the data than the model in which they have the same ratio (Table 1; one ratio vs. two ratio: 2Δl = 73.1, d.f. = 1, P < 0.001). The mature peptide has the lower dN/dS ratio, indicating that it is under stronger stabilizing selection than the signal peptide.
Table 1. Estimated dN/dS ratios for different regions of the drosomycin protein
The structure of drosomycin, in common with most other AMPs, includes a cluster of hydrophobic amino acids that probably interact directly with the fungal cell membrane (Figure 2) (Landon et al. 2000). This structure, combined with comparisons to better-characterized AMPs, has been used to identify the probable active site of the molecule (Figure 2) (Landon et al. 2000). If drosomycin is coevolving with pathogens, then the active site might be a target of selection. Inspection of Figure 2 does show several mutations in the active site, including mutations from hydrophobic to hydrophilic amino acids. Furthermore, the dN/dS ratio in the active site is significantly higher than in the rest of the mature peptide (Table 1; two ratio vs. three ratio: 2Δl = 30.3, d.f. = 1, P < 0.001). However, this ratio is still <1 (dN/dS = 0.49), and therefore this difference can be explained by either relaxed selective constraints or positive selection on the active site.
Next we tested whether positive selection acts on amino acids scattered elsewhere in the drosomycin molecule. Including a fraction of positively selected sites in the model of codon substitution did not alter the likelihood score (M8A vs. M8: 2Δl = 0.00, d.f. = 1, NS). Therefore, the variation in the rate of amino acid substitution in different regions of the protein appears to be due to variation in selective constraints rather than to positive selection.
Finally, we looked at the evolution of cysteine residues. The drosomycin molecule contains four disulphide bridges between eight cysteine residues that stabilize the structure of the molecule (Michaut et al. 1996; Landon et al. 1997). These cysteines have been conserved in the same location in all the drosomycin genes across all five species (Figure 2), confirming that they are essential to the function of drosomycin. The only exception is “sequence D” in D. ananassae, which has lost three of the four disulphide bridges. It is therefore unlikely that this gene produces a functional antifungal peptide.
Nucleotide polymorphism:
In both D. melanogaster and D. simulans, the African population tends to be more diverse than the European one, but the differences are small and not consistent (Table 2). This is typical of the pattern observed at other autosomal loci in D. melanogaster, but the difference between continents is normally larger in D. simulans (Begun and Aquadro 1993; Hamblin and Veuille 1999; Andolfatto 2001).
Table 2. The nucleotide diversity of different species, populations, and genes
The diversity of coding sequence is substantially higher in D. simulans than in D. melanogaster, which is similar to the pattern reported for other genes (Moriyama and Powell 1996; Andolfatto 2001). The higher diversity in D. simulans is entirely due to a larger number of synonymous polymorphisms, as the two species have similar numbers of nonsynonymous polymorphisms (Tables 3 and 4). Therefore the two species have different ratios of synonymous-to-nonsynonymous segregating sites (Table 4; Europe: G = 5.41, P = 0.02; Africa: G = 0.95, NS).
Table 3. Nucleotide diversity and divergence in coding and intergenic sequence
Table 4. Intergenic, synonymous, and replacement segregating mutations
The diversity of noncoding intergenic sequence is very similar in D. simulans and D. melanogaster (Tables 2–4). The ratio of intergenic-to-replacement polymorphisms is similar in the two species (Table 4; Europe: G = 0.06, NS; Africa: G = 0.94, NS). However, the ratio of synonymous (coding region) to intergenic polymorphisms is very different (Table 4; Europe: G = 11.01, P = 0.001; Africa: G = 6.95, P = 0.01). In summary, the two species have similar diversities at nonsynonymous sites and in intergenic regions, but D. simulans has a much higher diversity at synonymous sites.
The divergence between the two species also differs among the three classes of sites. As expected from the dN/dS analysis above, synonymous divergence is far greater than nonsynonymous divergence (Table 3). Interestingly, the intergenic divergence is also lower than the synonymous divergence, suggesting that intergenic regions have either lower mutation rates or higher functional constraints (Table 3).
Frequency of polymorphic sites:
A selective sweep or population expansion will result in an increase in the number of low-frequency polymorphisms relative to those at intermediate frequency. This can be detected using Tajima's test statistic D, which is negative if there is an excess of rare mutations and positive if there is an excess of intermediate-frequency mutations (Tajima 1989). In our data set there are 10 significantly negative estimates of D and 1 significantly positive estimate (Table 5). Although it is clear from this that our data do not fit the neutral equilibrium model, the multiple tests performed make it difficult to identify specific loci that might be under selection. We also performed Fu and Li's D and F tests with an outgroup (Fu and Li 1993) by comparing the number of mutations on terminal and internal branches of the genealogy. These tests produced qualitatively similar results to Tajima's D (data not shown).
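To make the behavior of Tajima's D concrete, the following minimal Python sketch computes the statistic from a set of aligned sequences using the standard formula of Tajima (1989). It is an illustration only and is not the DnaSP implementation used here; the function name `tajimas_d` and the two four-sequence toy alignments are hypothetical and are not data from this study.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (no gaps or missing data)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignments: every variant is a singleton vs. every variant is at 50%
rare_variants = ["AAAA", "AAAT", "AATA", "ATAA"]
intermediate_variants = ["AAAA", "AAAA", "ATTT", "ATTT"]
print(tajimas_d(rare_variants))          # negative: excess of rare mutations
print(tajimas_d(intermediate_variants))  # positive: excess of intermediate-frequency mutations
```

Statistical significance is not assessed by this sketch; in the analyses reported here it was obtained from coalescent simulations.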
Table 5. Tests of neutrality based on the frequency of mutations
Genetic hitchhiking during a selective sweep results in an excess of derived variants at high frequency (Fay and Wu 2000). This effect occurs at some distance away from the selected site, as recombination must have partially broken down the association with the actual site under selection. Furthermore, the effect is relatively short lived. We found an excess of high-frequency-derived variants in the intergenic sequence just downstream of Drs in the European population of D. melanogaster (Table 5; Fay and Wu's H = −6.67; P < 0.00001). This result is still highly significant after correction for multiple tests and when different rates of recombination are assumed (data not shown). It stems from the fact that all four ancestral-state sites segregating in this region are singletons, all of which are found in a single haplotype. Therefore, this could reflect a single recombination event during a selective sweep. It is notable that only one other estimate of H is marginally significantly different from the neutral distribution. Furthermore, this statistic is thought to be less sensitive to demographic processes than Tajima's D. Therefore, it is possible that there has been a recent selective sweep in this region. Interestingly, there are no fixed differences between Drs in D. simulans and D. melanogaster. Therefore, the data cannot be explained by positive selection on the Drs locus fixing nonsynonymous mutations.
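The direction of Fay and Wu's H can be illustrated with a short sketch that takes, for each polarized segregating site, the number of chromosomes carrying the derived allele. It assumes the unnormalized form of the statistic from Fay and Wu (2000), H = θπ − θH; the function name `fay_wu_h` and the example counts are hypothetical and are not taken from Table 5.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    derived_counts: derived-allele count at each segregating site
                    (ancestral state inferred from an outgroup).
    n: number of sampled chromosomes.
    """
    denom = n * (n - 1)
    # theta_pi weights each site by heterozygosity, 2k(n-k) / n(n-1)
    theta_pi = sum(2.0 * k * (n - k) for k in derived_counts) / denom
    # theta_H weights each site by squared derived frequency, 2k^2 / n(n-1)
    theta_h = sum(2.0 * k * k for k in derived_counts) / denom
    return theta_pi - theta_h

# Derived alleles near fixation give a negative H, as expected after hitchhiking
print(fay_wu_h([9, 9, 9], n=10))
# Derived alleles that are all rare give a positive H
print(fay_wu_h([1, 1, 1], n=10))
```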
Polymorphism and divergence:
Under the neutral model, the ratio of synonymous-to-nonsynonymous polymorphisms will be the same as the ratio of synonymous-to-nonsynonymous differences between species. However, positive selection can increase the number of nonsynonymous differences between species due to the fixation of favorable mutations. This can be detected using a McDonald-Kreitman test, which simply compares the two ratios in a 2 × 2 contingency table using Fisher's exact test (McDonald and Kreitman 1991). The McDonald-Kreitman test was not significant for any of the genes individually or for the combined data set of all genes (data in Table 6, all comparisons not significant). The same was true when the sites in intergenic regions were included as synonymous sites. If there are slightly deleterious amino acid polymorphisms that contribute to polymorphism but not to divergence, this can reduce the power of this test. Therefore, we removed singletons from the data set and repeated the test, but the result was still nonsignificant (data not shown).
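The 2 × 2 comparison underlying the McDonald-Kreitman test can be sketched as follows. This is a minimal standard-library implementation of a two-sided Fisher's exact test, written for illustration and not the software used in this study; the function name and the example counts are hypothetical and are not the values in Table 6.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    For a McDonald-Kreitman test the rows would be polymorphic and fixed
    differences and the columns synonymous and nonsynonymous changes.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical counts: 20 synonymous and 5 nonsynonymous polymorphisms,
# 30 synonymous and 8 nonsynonymous fixed differences
print(fisher_exact_two_sided(20, 5, 30, 8))
```

With similar synonymous-to-nonsynonymous ratios in the two rows, as in this made-up example, the test gives no indication of an excess of nonsynonymous fixed differences.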
Table 6. Polymorphism and divergence at synonymous and nonsynonymous sites
Selective sweeps will reduce the genetic diversity of a gene, and balancing selection may increase it. It is possible to detect these effects as neutral theory predicts that both interspecific divergence and intraspecific polymorphism are proportional to the neutral mutation rate and therefore positively correlated across loci. Deviations from this prediction can be detected in a contingency table (where the table rows are loci and the table columns are polymorphism and divergence) using an HKA test (Hudson et al. 1987). In our case, this test is conservative as it makes the assumption that there is free recombination between loci and no recombination within loci, which clearly does not apply to our data.
We have used the HKA test to compare polymorphism and divergence across all our coding and intergenic sequences (Table 7). The sum of deviations was significantly greater than expected in D. simulans but not in D. melanogaster (D. simulans: sum deviations = 24.3, P = 0.01; D. melanogaster: sum deviations = 9.1, NS). If only a single locus has been under selection, it is more powerful to test whether the maximum standardized discrepancy (MSD) for any one of the loci is greater than expected (Wang and Hey 1996). This approach also has the advantage of identifying specific loci that differ from the rest of the data set. This test was significant only in D. simulans, due to the Drs locus (MSD = 5.43, Drs locus, P = 0.003). The unusual pattern of polymorphism in Drs can be clearly seen in Tables 2 and 7. This gene has both the lowest interspecific divergence of all the loci and the lowest intraspecific diversity (θ) in D. melanogaster, and yet in D. simulans θ is the highest of all the drosomycin genes.
Polymorphic sites and mean interspecific divergence of genes and intergenic sequence
The polymorphic sites in the D. simulans Drs sequences are shown in Figure 4. They include four nonsynonymous mutations and eight synonymous mutations. The nonsynonymous mutations are all in the signal peptide. Interestingly, the most common haplotype is identical to that found in D. melanogaster. Therefore, the derived mutations tend to be at a lower frequency than expected under neutrality, resulting in a significantly positive value of Fay and Wu's H in the African population (Table 5). Overall, the segregating mutations tend to be found at a low frequency, as reflected by negative estimates of Tajima's D (Table 5, NS).
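The two frequency-spectrum statistics mentioned above can be computed from an unfolded site frequency spectrum, as in the following minimal sketch. It uses the standard definitions of Tajima's D and of the unnormalized Fay and Wu's H (π minus θH); the function name and the input layout are hypothetical, and the polarization of each mutation as ancestral or derived is assumed to have been done already with an outgroup.

```python
from math import sqrt


def sfs_statistics(sfs, n):
    """Return (Tajima's D, Fay and Wu's H) from an unfolded spectrum.

    sfs[i - 1] is the number of sites at which the derived allele occurs
    in exactly i of the n sampled sequences (i = 1 .. n - 1).
    D is None when there are no segregating sites.
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    S = sum(sfs)
    pairs = n * (n - 1) / 2

    # Mean pairwise differences and the estimator weighted toward
    # high-frequency derived variants.
    pi = sum(s * i * (n - i) for i, s in enumerate(sfs, start=1)) / pairs
    theta_h = sum(s * i * i for i, s in enumerate(sfs, start=1)) / pairs
    H = pi - theta_h
    if S == 0:
        return None, H

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    D = (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
    return D, H
```

With a made-up spectrum consisting only of singletons, for example `sfs_statistics([3, 0, 0], 4)`, D is negative and H is positive, which is the direction of the pattern described for Drs in D. simulans. Assessing significance requires coalescent simulations, which are not attempted here.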
Polymorphic sites in Drs from D. simulans. Nonsynonymous sites are marked as (*).
DISCUSSION
Adaptive protein evolution:
The protein sequences of antimicrobial peptides in vertebrates and termites have extremely high rates of evolution, driven by long-term positive selection (see Introduction). This is in stark contrast to the drosomycin genes, whose amino acid sequence is principally under purifying selection. Although not all the drosomycins appear to evolve neutrally, it is clear that their rate of adaptive evolution is far lower than that of AMPs in other species. This pattern is similar to that reported for antibacterial peptides in Drosophila, which also tend to have conserved amino acid sequences (see Introduction).
Why do AMPs show such differing patterns of evolution? One hypothesis is that there are differences intrinsic to the peptides themselves that alter their mode of evolution. For example, certain AMPs could have strong structural constraints, or they may target pathogen molecules that are highly conserved. However, there are now data from several structurally unrelated peptides in vertebrates and Drosophila, and consistent differences have emerged between AMP evolution in the two groups of hosts. Although more comparisons are needed, it currently seems likely that it is the ecology or physiology of the species that explains the different modes of evolution.
One explanation is that vertebrates and termites coevolve with specialist bacterial and fungal pathogens, but Drosophila does not. It is striking that no specialist fungal or bacterial pathogens of D. melanogaster have been described, despite this being one of the most studied of all species. If such pathogens do not exist, it may be related to the ecology of D. melanogaster and related species. They have short life spans and live on ephemeral food patches. Within each food patch there are unlikely to be many overlapping generations before dispersal to a new food patch, making it difficult for a pathogen to persist by transmission solely between individuals of D. melanogaster. Therefore, it is possible that most D. melanogaster pathogens are generalists infecting many species, which in turn may restrict the opportunity for coevolution. This contrasts with vertebrates and termite colonies, which are long lived with overlapping generations in the same habitat patch, facilitating the spread of specialist pathogens. This hypothesis could be tested by a comparative analysis of AMP evolution across taxa with differing ecologies.
This hypothesis is not inconsistent with the observation of rapid evolution in other components of the Drosophila immune system (Begun and Whitley 2000; Schlenke and Begun 2003; Lazzaro 2005). First, many of these genes may also be involved in defenses against specialist parasites such as parasitoids or viruses. For example, the Toll pathway is important in antiviral as well as antibacterial defenses (Contamine et al. 1989; Zambon et al. 2005). Also, many of these genes are involved in signaling pathways, and selection may be acting to alter patterns and the magnitude of immune responses rather than adapting to novel parasite genotypes.
The one notable exception to the pattern above is a peptide called andropin in D. melanogaster, which evolves rapidly under positive selection (Date-Ito et al. 2002). This peptide, which is related to the AMP cecropin and has antibacterial activity, is unusual in being expressed in the male ejaculatory duct and transferred to females during mating (Samakovlis et al. 1991; Lung et al. 2001). Proteins transferred from males to females often evolve very rapidly, and it is therefore unclear whether the rapid evolution is associated with its antibacterial properties or with some other postmating function (Swanson et al. 2001). If andropin does coevolve with microbes, infections of the reproductive tract might be D. melanogaster specialists if they are sexually transmitted.
Selection on Drs:
Although we did not find rapid adaptive protein evolution, there was some indication of recent selection at or near Drs in both species. In D. melanogaster, there was an excess of high-frequency-derived mutations just downstream of Drs. This could result from a selective sweep at linked sites, such as the Drs coding sequence or regulatory elements. A selective sweep could also explain the low diversity of the Drs coding sequence. However, there are no fixed differences between the Drs coding sequence of the two species, suggesting that selection favoring beneficial changes to the amino acid sequence is unlikely. This result was also significant only in Europe, and we are unable to exclude the possibility that it is an artifact of some demographic process.
In D. simulans, an HKA test showed that Drs had an anomalous polymorphism-to-divergence ratio. This was the result of Drs having both the lowest interspecific divergence (the most common allele is identical to D. melanogaster) and the highest intraspecific polymorphism of any gene. One explanation for these data is that balancing selection has increased the diversity at this locus. This is consistent with the flanking intergenic sequence also having fairly high levels of polymorphism (Table 2). However, the high polymorphism is due to low-frequency-derived mutations that have occurred on multiple different haplotypes, and there are no nonsynonymous polymorphisms in the mature peptide. It seems unlikely that such a pattern would arise from balancing selection. This suggests that some recent change has led to the accumulation of variation. This could be a relaxation of selective constraints, a local increase in the mutation rate, gene conversion, or recent positive selection increasing the frequency of certain mutations.
An alternative hypothesis is that the Drs locus has introgressed between the two species, resulting in reduced interspecific divergence. A similar process has been postulated in the Cecropin gene region of D. melanogaster (Date et al. 1998). In our case, this would have to be coupled to a selective sweep reducing the diversity of Drs in D. melanogaster, as in this species the polymorphism-to-divergence ratio is normal. This is possible, given the evidence for selection near Drs (see above) and might be expected if selection favored the introgressing allele.
The observation that the D. melanogaster Drs sequence is identical to the most common D. simulans allele is certainly unusual and is compatible with the introgression hypothesis. Introgression could potentially introduce derived D. simulans polymorphisms into D. melanogaster. However, using D. yakuba as an outgroup, we found that the shared melanogaster/simulans allele is ancestral to all the other simulans alleles. Introgression might also be expected to involve some of the flanking intergenic sequence. However, the interspecific divergence of the flanking sequence was typical of that elsewhere in the drosomycin region (sliding-window analysis; data not shown). Instead, the short length of the Drs gene means that the absence of fixed differences between species may simply have occurred by chance. Assuming that the occurrence of fixed differences between the species follows a Poisson distribution, two of the other drosomycin loci have divergences that fall within the 95% confidence interval of divergence at Drs. Similarly, the upper 95% confidence limit on Ks between the most common D. simulans allele and D. melanogaster is 0.079, which is only slightly lower than the divergence estimated at the other loci. In conclusion, although the Drs locus does not fit the neutral model, the causes of the unusual patterns of polymorphism and divergence remain unclear.
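The Poisson argument above can be illustrated with a short Python sketch that finds the exact upper confidence limit on the expected number of differences when only a few (or none) are observed. The function names are hypothetical, and the choice of a two-sided 95% interval (2.5% in the upper tail) is an assumption; the text does not state which convention was used, and no attempt is made to reproduce the reported value of 0.079.

```python
from math import exp


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    term = exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def poisson_upper_limit(k, alpha=0.025):
    """Mean at which observing k or fewer events has probability alpha.

    Found by bisection; alpha = 0.025 corresponds to the upper end of a
    two-sided 95% interval (an assumption of this sketch).
    """
    low, high = 0.0, 10.0
    # Widen the bracket until the upper bound is certainly too large.
    while poisson_cdf(k, high) > alpha:
        high *= 2
    for _ in range(200):
        mid = (low + high) / 2
        if poisson_cdf(k, mid) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

Dividing `poisson_upper_limit(0)` by the number of sites compared gives an upper bound on per-site divergence that is still compatible with seeing no fixed differences. For a short gene this bound can be large, which is the sense in which the absence of fixed differences at Drs may be a chance outcome.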
Polymorphism in coding and intergenic sequences:
The data that we collected are ideal for comparing sequence evolution in D. simulans and D. melanogaster, as we sequenced similar numbers of alleles from a single genomic region that encompassed both coding and intergenic sequence. The two species have similar nonsynonymous diversities, but D. simulans has much greater synonymous site diversity. Similar patterns have been observed previously and been attributed to D. simulans having a larger population size than D. melanogaster (Moriyama and Powell 1996; Andolfatto 2001). This is expected to increase the diversity of neutrally evolving synonymous sites. However, if nonsynonymous mutations in drosomycin tend to be slightly deleterious, they will be removed more effectively from D. simulans than from D. melanogaster.
The intergenic sequence has diverged less between species than have synonymous sites. This has been observed before and has been attributed either to higher mutation rates in transcribed regions or to selective constraints on intergenic regions (Halligan et al. 2004; Kern and Begun 2005). The intergenic sequence has similar diversities in the two species (i.e., it behaves most like the nonsynonymous sites). This is the result of the nucleotide diversity of intergenic regions being markedly lower than the diversity of synonymous sites in D. simulans, while in D. melanogaster synonymous and intergenic diversities are more similar. This difference in diversity cannot be easily explained by higher mutation rates in transcribed regions, as this would increase the synonymous diversity but not alter the intergenic-to-synonymous polymorphism ratio in the two different species. The pattern can be explained, however, if polymorphisms in the intergenic sequence tend to be slightly deleterious and are therefore removed by selection more efficiently in D. simulans (Kern and Begun 2005). Similar observations were reported by Kern and Begun (2005), who found similar intergenic but different synonymous diversities in the two species. However, when they compared African D. melanogaster with a sample of D. simulans of mixed origin they found similar synonymous diversities in the two species. In our data we have compared both ancestral (African) and derived (European) populations of both species and found that the original pattern holds in both cases.
Acknowledgments
The isofemale lines were kindly supplied by Penny Haddrill. This work was funded by a Wellcome Trust Research Career Development Fellowship.
Footnotes
- Received May 10, 2005.
- Accepted August 31, 2005.
- Copyright © 2005 by the Genetics Society of America