Genetics, Vol. 168, 1323-1335, November 2004, Copyright © 2004
doi:10.1534/genetics.103.025775

Evolution of P Elements in Natural Populations of Drosophila willistoni and D. sturtevanti

Department of Ecology and Evolutionary Biology, The University of Arizona, Tucson, Arizona 85721

1 Corresponding author: The Institute for Genomic Research, 9712 Medical Center Dr., Rockville, MD 20850.
E-mail: jsilva@tigr.org

Manuscript received December 12, 2003. Accepted for publication June 25, 2004.

ABSTRACT

To determine how population structure of the host species affects the spread of transposable elements and to assess the strength of selection acting on different structural regions, we sequenced P elements from strains of Drosophila willistoni and Drosophila sturtevanti sampled from across the distributions of these species. Elements from D. sturtevanti exhibited considerable sequence variation, and similarity among them was correlated to geographic distance between collection sites. By contrast, all D. willistoni elements sampled were essentially identical (π < 0.2%) and exhibited patterns typical of a recent population expansion. While the canonical P elements sampled from D. sturtevanti appear to be long-time residents in that species, a rapid expansion of a very young canonical P-element lineage is suggested in D. willistoni, overcoming barriers such as large geographical distances and moderate levels of population subdivision. Between-species comparisons reveal selective constraints on P-element evolution, as indicated by significantly different substitution rates in noncoding, silent, and replacement sites. Most remarkably, in addition to replacement sites, selection pressure appears to be strong in the first and third introns and in the 3' and 5' flanking regions.


P ELEMENTS are one of the most thoroughly studied families of eukaryotic transposable elements (TEs), and much is known about their structure, transposition mechanisms, and evolutionary history (O'HARE and RUBIN 1983; ENGELS 1989, 1996; CLARK et al. 2002; RIO 2002). Here we address two understudied aspects of P-element evolution, namely their transmission within natural populations and the distribution of sequence motifs affecting element fitness among different structural regions.

The canonical P element, first isolated from Drosophila melanogaster, is ~3 kb long and contains four open reading frames (ORFs) that together encode a transposase (Figure 1; O'HARE and RUBIN 1983). In addition, a truncated polypeptide consisting of only the first three ORFs and part of the third intron encodes a repressor of transposition (LASKI et al. 1986; ROBERTSON and ENGELS 1989; MISRA and RIO 1990; GLOOR et al. 1993). P-element sequences have been grouped into ~25 subfamilies according to their level of sequence identity and host taxa (HAGEMANN et al. 1994, 1996a,b; CLARK and KIDWELL 1997; SARKAR et al. 2003; OLIVEIRA DE CARVALHO et al. 2004). The distribution of P elements seems to be mainly restricted to the order Diptera, the vast majority of the elements having been sampled from the genus Drosophila and a few closely related genera (for a review see CLARK et al. 2002), as well as from Anopheline mosquitoes (SARKAR et al. 2003; OLIVEIRA DE CARVALHO et al. 2004).



 
FIGURE 1.—

Schematic of the structure of the canonical P element. (A) The canonical D. melanogaster P element. The element is flanked by 31-bp inverted terminal repeats (represented by arrows). These are adjacent to 5' and 3' terminal regions (represented by single lines). The element has four ORFs, numbered 0–3, which together encode a transposase enzyme. ORFs are represented by open rectangles, and introns by single lines. The first and last positions of each noncoding region are numbered above. The inverted terminal repeats were excluded from all analyses, as they were the primer-binding regions. (B) Structure of the P elements sequenced from strains of D. willistoni and D. sturtevanti. Open boxes represent open reading frames, and horizontal lines represent introns and flanking regions. The location of insertions is marked by a T-shaped sign, and their relative length is indicated by the size of the top of the T. An interruption in the structure of an element represents a large deletion. All the elements sequenced from D. willistoni are complete, except for small indels.

 
Only two P-element subfamilies include elements that are known to be active. One of them, the canonical subfamily, owes its name to the inclusion of the canonical element from D. melanogaster and others that are closely related to it (O'HARE and RUBIN 1983). This subfamily is characteristic of the two New World species groups of the subgenus Sophophora, the Drosophila willistoni and Drosophila saltans groups (CLARK et al. 1995), and is the main focus of this study. The second active P element was found in Scaptomyza pallida (SIMONELIG and ANXOLABéHèRE 1991), and belongs to the M-type subfamily (HAGEMANN et al. 1994).

Previous studies revealed the presence of selective constraints acting on nonsynonymous sites of P elements (WITHERSPOON 1999; SILVA and KIDWELL 2000). These results were explained by purifying selection acting at the level of the host to preserve the functional coding sequence of the repressor polypeptide. In addition, selection was also explained by the advantage that autonomous elements (those encoding all sequence factors required for transposition) have over nonautonomous ones at the time of horizontal transfer, which is considered to be an essential step in the P-element life cycle (HARTL et al. 1997; PINSKER et al. 2001; SILVA et al. 2004). However, those studies were based on either a small number of sequences or a small coding region of the P element, which limited the power of the analyses and prevented assessment of possible constraints in noncoding regions.

While several studies have explicitly addressed the transfer of canonical P elements among species (CLARK and KIDWELL 1997; SILVA and KIDWELL 2000; LORETO et al. 2001), the spread of these elements within species has been studied only in D. melanogaster. It has been argued that the association of D. melanogaster with human activities has played a major role in the acquisition and subsequent rapid spread of P elements throughout worldwide populations of the species in <200 years (KIDWELL 1979; ENGELS 1992). Therefore, it is questionable whether the spread of P elements in this species is representative of the rapidity of spread of TEs in host populations.

Here we report a study of the molecular evolution of canonical P elements within two species, D. willistoni and D. sturtevanti, members of the willistoni and saltans groups, respectively. The main goals of this study were twofold: (1) to assess the nature of selective constraints acting in both coding and noncoding regions of P elements within and between species, on the basis of a large sample of elements, and (2) to investigate the spread of P elements among natural populations of two species that, in contrast with D. melanogaster, are not commensal with humans.


MATERIALS AND METHODS

Strains:

P-element sequences were obtained from 35 strains of D. willistoni and 9 strains of D. sturtevanti. The origin of these strains is broadly representative of the distributions of the two species, which range from Mexico and Florida in the north to Southern Brazil, but sampling was more exhaustive in D. willistoni than in D. sturtevanti (Figure 2, Table 1).



 
FIGURE 2.—

Collection sites of D. willistoni and D. sturtevanti strains. Numbers correspond to the first column of Table 1, where location, date, and collector are specified for each strain.

 


 
TABLE 1

Strains of D. willistoni and D. sturtevanti: collection date and location

 

DNA isolation and sequencing:

Total genomic DNA was obtained from each strain following standard protocols. P elements were PCR amplified from each strain using oligonucleotide PInvRep (Table 2) under the following conditions: 2 min of initial template denaturation at 95°, 20 cycles of 30 sec denaturation at 95°, 20 sec of primer annealing at 55°, and 3 min of primer extension at 68°, with a final extension for 7 min at 68°. PCR amplifications were done using Platinum Taq DNA polymerase High Fidelity (GIBCO BRL, Gaithersburg, MD; error rate ~0.8 × 10⁻⁶) and standard concentrations of all reagents. Few rounds of amplification were used so as to minimize potential polymerase errors in the PCR products. The expected number of errors in a 3-kb fragment in 20 rounds of amplification is 3000 × 20 × 0.8 × 10⁻⁶, or 0.048/fragment. In the whole data set of 46 sequences, there will be on average approximately two false polymorphisms. The products of PCR amplifications were cloned into the TOPO TA cloning vector (Invitrogen, San Diego). For each strain, one clone was chosen randomly among those with an insert of ~3 kb, the length of the canonical element, and sequenced. When no inserts of the desired size were present, as happened in a few strains of D. sturtevanti, inserts of other sizes were chosen. P-element inserts were sequenced directly, using a series of internal primers that allowed sequencing of each element in both directions (Table 2). Sequences were obtained by the Laboratory for Molecular Systematics and Evolution at the University of Arizona, using an ABI 377 automated sequencer. Several P-element sequences were obtained from the literature: the canonical P element (O'HARE and RUBIN 1983), the functional P element from S. pallida (SIMONELIG and ANXOLABéHèRE 1991), the canonical elements from Drosophila mediopunctata (LORETO et al. 2001) and Drosophila nebulosa (LANSMAN et al. 1987), and partial P sequences obtained by CLARK et al. (1995) from D. sturtevanti, D. willistoni, and representatives of noncanonical P subfamilies.
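The error budget quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes, as the text does, that the quoted error rate applies per nucleotide per cycle and that every cycle exposes every site to error.

```python
# Expected polymerase errors, using only the figures quoted in the text.
FRAGMENT_LENGTH_NT = 3000   # ~3-kb canonical P element
PCR_CYCLES = 20             # rounds of amplification
ERROR_RATE = 0.8e-6         # quoted polymerase error rate (assumed per nt per cycle)
N_SEQUENCES = 46            # elements in the whole data set


def expected_errors_per_fragment(length_nt, cycles, error_rate):
    """Simple product used in the text: sites x cycles x error rate."""
    return length_nt * cycles * error_rate


per_fragment = expected_errors_per_fragment(FRAGMENT_LENGTH_NT, PCR_CYCLES, ERROR_RATE)
whole_data_set = per_fragment * N_SEQUENCES

print(f"{per_fragment:.3f} expected errors per fragment")
print(f"{whole_data_set:.2f} expected false polymorphisms in {N_SEQUENCES} sequences")
```

The product for the whole data set is slightly above two, which is the "approximately two false polymorphisms" figure given in the text.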



 
TABLE 2

P-element primers used in PCR and sequencing reactions

 

Sequence analyses:

Alignment of all sequences was done by eye using MacClade 4 (MADDISON and MADDISON 2001). The location of all insertions, deletions, and point mutations was identified using SITES (HEY and WAKELEY 1997).

Nucleotide variation:

Estimates of heterozygosity per site were obtained from the average pairwise number of differences between elements, π (NEI and LI 1979), and from θ, calculated on the basis of the number of polymorphic sites, S (WATTERSON 1975). These calculations were performed with DnaSP 3 (ROZAS and ROZAS 1999).
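As an illustration of the two estimators, the following sketch computes π and θ per site for a small, made-up alignment. It is not the DnaSP implementation: the function names and the toy sequences are hypothetical, and gap handling is deliberately simplified (gapped positions are ignored pair by pair, and the full alignment length is used as the denominator).

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count mismatches between two aligned sequences, ignoring gapped positions."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b and "-" not in (a, b))


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences, per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / len(pairs) / length


def watterson_theta(seqs):
    """theta per site, from the number of polymorphic (segregating) sites S."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(1 for column in zip(*seqs) if len(set(column) - {"-"}) > 1)
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic correction for sample size
    return segregating / a_n / length


# Toy alignment (invented for illustration): 4 sequences, 10 sites, 2 singletons.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi_hat = nucleotide_diversity(toy_alignment)
theta_hat = watterson_theta(toy_alignment)
print(f"pi = {pi_hat:.4f}, theta = {theta_hat:.4f}")
```

When most polymorphisms are singletons, as in this toy alignment, π falls below θ, because rare variants contribute little to average pairwise differences but count fully toward S.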

Substitution estimates:

The number of synonymous substitutions per synonymous site, dS, and the number of nonsynonymous substitutions per nonsynonymous site, dN, were estimated using the method of NEI and GOJOBORI (1986). Standard deviations for the average dS and dN within and between groups of sequences were calculated as described by NEI and JIN (1989). Divergence in noncoding regions (introns and untranslated regions) was estimated using the Jukes-Cantor method (JUKES and CANTOR 1969). Maximum-likelihood estimates of ω among D. willistoni elements, where ω = dN/dS, were obtained using the method of GOLDMAN and YANG (1994) implemented in PAML 3.13 (YANG 1997). The model of evolution used assumes one ω for all sites, and we tested whether the value of ω estimated from the data provided a significantly better fit to the data than when ω is fixed and equal to 1. Because the model in which ω is constrained is a special case of the more general model where ω is free to vary, the difference in likelihood of the two models can be tested for statistical significance using a likelihood-ratio test (LRT), in this case by comparing the LRT statistic to a χ² distribution with 1 d.f. (YANG and NIELSEN 2002).
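For the noncoding regions, the Jukes-Cantor correction converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). A minimal sketch follows; the function name and the toy sequences are hypothetical, and positions with a gap in either sequence are simply skipped, which may differ from how the original analysis treated indels.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences (gapped sites skipped)."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    p = differing / compared  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("p >= 0.75: the Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy example (invented): 1 difference in 10 compared sites, so p = 0.1.
d_jc = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")
print(f"p = 0.100, Jukes-Cantor d = {d_jc:.4f}")
```

At low divergence the correction is small (d is only slightly larger than p), which is the regime relevant to most comparisons among canonical elements.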

Phylogenetic analysis:

The phylogenetic relationship among P-element sequences was reconstructed by maximum parsimony. Tree space was searched using branch-and-bound. Bootstrap analysis consisted of 100 bootstrap replicates using branch-and-bound. Phylogenetic analyses were performed in PAUP* (SWOFFORD 1999).


RESULTS

Length polymorphism in D. willistoni P elements:

Amplification of P elements from all 35 collected strains of D. willistoni produced a fragment ~3 kb long. Most strains also yielded smaller-sized fragments. All 3-kb P elements sampled are identical in structure and sequence to the D. melanogaster canonical element (Figure 1 and Table 3), except for rare insertions, deletions, and point mutations. Most indels are one nucleotide (nt) in length and hence disrupt the reading frame when located in a coding region. There are four longer deletions, which all occur in ORFs, one of which has a length that is not a multiple of three. There are fewer indels per nucleotide in coding regions (9 in 2223 nt) than in noncoding regions (4 in 655 nt), but the difference is not statistically significant (indels whose length is a multiple of three, which do not disrupt the reading frame, were excluded, making the test conservative).
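The comparison of indel densities can be illustrated with a two-sided Fisher's exact test on the counts given above. This is only a sketch: the test actually used in the original analysis is not stated in the surviving text, so the choice of Fisher's exact test is an assumption, as are the function names.

```python
from math import comb


def hypergeom_pmf(k, total_sites, coding_sites, n_indels):
    """P(k of the n_indels fall in coding sites) if indels land on sites at random."""
    return (comb(coding_sites, k)
            * comb(total_sites - coding_sites, n_indels - k)
            / comb(total_sites, n_indels))


def fisher_two_sided(coding_indels, coding_sites, noncoding_indels, noncoding_sites):
    """Two-sided Fisher's exact test: sum the tables no more likely than the observed one."""
    total_sites = coding_sites + noncoding_sites
    n_indels = coding_indels + noncoding_indels
    observed = hypergeom_pmf(coding_indels, total_sites, coding_sites, n_indels)
    p_value = 0.0
    for k in range(n_indels + 1):
        prob = hypergeom_pmf(k, total_sites, coding_sites, n_indels)
        if prob <= observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += prob
    return min(p_value, 1.0)


# Counts from the text: 9 indels in 2223 coding nt, 4 indels in 655 noncoding nt.
p_indels = fisher_two_sided(9, 2223, 4, 655)
print(f"two-sided P = {p_indels:.3f}")
```

Under this assumed test the P value is far above 0.05, in line with the statement that the difference between coding and noncoding regions is not significant.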



 
TABLE 3

Variable nucleotide positions in D. willistoni P elements

 

Nucleotide polymorphism in D. willistoni P elements:

There are 78 polymorphic sites among the 35 P elements examined (Table 3). Of these, 75 are singletons, two are doubletons, and one is present in three sequences. The 78 polymorphisms can be classified as follows: 17 are in noncoding regions, 23 are synonymous, 36 are replacements, and 2 lead to termination codons, one of which is the mutation present in three sequences (position 2625). These polymorphisms are distributed evenly along the element, as revealed by similar levels of polymorphism in different structural regions (Table 4). Within coding regions, polymorphisms are about twice as frequent in synonymous as in nonsynonymous sites, but the difference is not statistically significant (Table 4). A more precise estimate of ω, the ratio dN/dS, obtained using maximum likelihood, supports this result. The maximum-likelihood estimate of ω is 0.634 (ln L = –3817.9547). However, this value does not provide a significantly better fit to the data than ω = 1 (ln L = –3819.2663), as the LRT statistic is 2 × [–3817.9547 – (–3819.2663)] = 2.623, which is smaller than the critical value of a χ² distribution with 1 d.f. at the 5% level (3.84). It is unclear whether this lack of significance reflects a true value of ω = 1 (i.e., that ω ~0.6 is just stochastic variance around the mean of ω = 1), or whether it results from insufficient power of the method to detect significance at very low levels of divergence.
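The likelihood-ratio test can be checked directly from the two log-likelihoods reported above. In the sketch below the names are hypothetical and the 5% significance level is an assumption; the tail probability uses the identity that, for 1 d.f., P(χ² > x) = erfc(√(x/2)).

```python
import math

LNL_OMEGA_FREE = -3817.9547   # ln L with omega estimated from the data (0.634)
LNL_OMEGA_ONE = -3819.2663    # ln L with omega fixed at 1
CRITICAL_5PCT_1DF = 3.841     # chi-square critical value, 1 d.f., 5% level (assumed level)


def lrt_statistic(lnl_general, lnl_constrained):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (lnl_general - lnl_constrained)


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


lrt = lrt_statistic(LNL_OMEGA_FREE, LNL_OMEGA_ONE)
p_value = chi2_upper_tail_1df(lrt)
print(f"LRT statistic = {lrt:.3f}, P = {p_value:.3f}")
print("reject omega = 1" if lrt > CRITICAL_5PCT_1DF else "cannot reject omega = 1")
```

The statistic falls below the critical value, so the hypothesis ω = 1 cannot be rejected at the assumed 5% level, as stated in the text.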



 
TABLE 4

D. willistoni P elements: polymorphism and divergence per region

 

Length polymorphism in the D. sturtevanti P elements:

The P-element fragments amplified using the terminal repeat primers in D. sturtevanti strains varied in size between 500 bp and 4 kb. The PCR products amplified from strains from Mexico (Apazapán and Matlapa), El Salvador, and Panama were of two sizes, ~2.7 and 3 kb in length. One random 3-kb clone was sequenced from each strain, except for A10S, from which two randomly chosen clones were sequenced. The PCR result from the Jamaican line showed a single 500-bp product (one clone was randomly chosen for sequencing); that from the Dominican Republic yielded two fragments of ~3 kb and 650 bp (one of each size was sequenced). In total, 11 P elements were obtained from the nine strains of D. sturtevanti (Table 1). Structurally, these elements are considerably more polymorphic than those obtained from D. willistoni (Figure 1). Some have deletions that encompass almost the entire length of the element, one has a large insertion, and others have smaller deletions of various lengths (Table 5). The presence of multiple indels that disrupt the reading frame of the transposase makes it unlikely that any of these elements encodes a functional transposase.



 
TABLE 5

Insertions and deletions in D. sturtevanti P elements, relative to the canonical element from D. melanogaster

 

Nucleotide variability in D. sturtevanti P elements:

In addition to the high degree of sequence length polymorphism, the elements from D. sturtevanti also differ markedly from one another in nucleotide sequence: there are 283 nucleotide polymorphisms in the 11 D. sturtevanti P elements sequenced. In comparison with the canonical P-element reference sequence, the polymorphisms in the D. sturtevanti elements can be grouped as follows: 84 occur in noncoding regions and 199 in coding regions. Of the latter, 64 are synonymous and 135 are nonsynonymous polymorphisms.

Phylogenetic relationships among P elements:

Phylogenetic relationships among D. willistoni and D. sturtevanti P-element sequences are depicted in Figure 3. The element from S. pallida and others representing noncanonical subfamilies were used as outgroups. Due to their similarity, only a few representative elements were chosen from among all D. willistoni sequences and from among the Mexican strains of D. sturtevanti.



 
FIGURE 3.—

Phylogenetic relationships of P elements sampled from D. willistoni and D. sturtevanti, reconstructed using maximum parsimony. Elements collected for this study are in boldface type; they are prefixed with "Dwill" or "Dst" for D. willistoni and D. sturtevanti, respectively. All 35 D. willistoni P elements sampled form a monophyletic clade (only two representatives of these are shown in the shaded polygon comprising Dwill sequences); likewise, six D. sturtevanti P elements from Mexico and Central America are monophyletic (two representatives are shown in the shaded polygon comprising Dst sequences). The S. pallida P element and the noncanonical elements from Drosophila lusaltans, Drosophila pavlovskiana, and D. sturtevanti (Dsturtevanti42) were used as outgroups. Encircled letters mark individual clades of D. sturtevanti sequences and are used for reference in the text and tables. One of 2210 most parsimonious trees is shown, with branch lengths proportional to the number of nucleotide changes and bootstrap support shown on the tree. All elements analyzed in this study are part of the canonical clade, with the exception of Matlapa2 from D. sturtevanti.

 
Several interesting patterns emerge from this analysis. Our sampling scheme, which was designed to detect canonical elements, did indeed produce only canonical elements, with the possible exception of the element Matlapa2 from D. sturtevanti, which contains a large insertion (Figure 1). The elements from D. willistoni and D. sturtevanti, as well as other known canonical elements from D. nebulosa, D. capricorni, and D. mediopunctata, form a monophyletic clade, with the exception of Dsturtevanti42, which is known to belong to a noncanonical subfamily (CLARK et al. 1995), and Dst-Matlapa2, which groups with noncanonical elements. Also, the elements from D. sturtevanti are paraphyletic in relation to those from D. willistoni and other species and form clades according to the geographic origin of the strains from which they were obtained. The D. sturtevanti elements from Mexico (with the exception of Matlapa2) group with those from Central America. One of the elements from the Dominican Republic forms a clade with the one from Jamaica and, finally, the D. sturtevanti elements from Brazil form a monophyletic group. This geographic grouping also coincides with the distribution of the length of PCR products obtained in each line and is further supported by the location and size of indels (Figure 1 and Table 5). Each of the clades identified in this analysis was treated independently for the purposes of polymorphism and divergence estimates.

Divergence between D. willistoni and S. pallida P elements:

The D. melanogaster canonical element differs from the consensus sequence of the D. willistoni elements only at nucleotide position 32 ("A" in D. melanogaster and "G" in D. willistoni); this extremely low divergence is explained by the recent horizontal transfer of a P element between the two species (DANIELS et al. 1990). We estimated the divergence of D. willistoni P elements from the S. pallida P element 18 (SIMONELIG and ANXOLABéHèRE 1991). The latter is a member of the only P-element subfamily, other than the canonical one, for which functionality has been confirmed in vivo, and differs from the canonical element by ~25% (SILVA and KIDWELL 2000). The numbers of substitutions per synonymous and per nonsynonymous site (dS and dN, respectively) were estimated separately for each P-element region, as well as for the coding region as a whole (Table 4). Several remarkable results emerge from these comparisons. First, dS is significantly larger than dN in all comparisons. Second, dN is larger in ORF3 than in the other three ORFs (significantly so in relation to ORFs 0 and 1). Finally, the number of substitutions in noncoding regions is significantly lower than that in synonymous sites, with the exception of intron 2. The significance of these observations is addressed in the DISCUSSION.

Polymorphism and divergence in D. sturtevanti P elements:

Five D. sturtevanti elements from Mexico and Central America were grouped into clade A (Figure 3). These elements are almost intact structurally, and the few indels observed are almost all fixed among them (Table 5). The degree of polymorphism in synonymous sites does not differ significantly from that in nonsynonymous sites (Table 6). However, when D. sturtevanti elements are compared with those from D. willistoni, dS is significantly larger than dN, despite the relatively small divergence between the elements from these two species (<10%). In addition, and as already observed for D. willistoni P elements, the 5' and 3' regions flanking the transposase, as well as the first and third introns, have tended to evolve more slowly than silent sites (the difference is statistically significant for intron 1), and nonsynonymous sites in ORF3 have evolved significantly faster than those in the other three ORFs.



 
TABLE 6

Polymorphism and divergence of D. sturtevanti P elements: clade A

 
Two P elements from the Antilles strains of D. sturtevanti (Jamaica and the Dominican Republic) also form a monophyletic group, clade B. They share one small insertion in the 3' end and one large internal deletion that eliminates most of ORF0, -1, and -2, and half of ORF3 (Figure 1 and Table 5). These elements bear a strong similarity to the canonical elements from D. willistoni (Table 7). When these elements are compared to the canonical elements from D. willistoni, synonymous sites evolve faster than nonsynonymous sites and 5' and 3' noncoding regions, but the difference is not significant.



 
TABLE 7

Polymorphism and divergence of D. sturtevanti P elements: clade B

 
The two Brazilian elements from D. sturtevanti, from Ceará and Minas Gerais (I27), are very similar to each other, as reflected in the low level of polymorphism in clade D (Table 8) and the multiple indels that they share (Figure 1 and Table 5). The degree of polymorphism in nonsynonymous sites does not differ from that in synonymous sites. When compared to the D. willistoni elements, dN is always larger than dS (significantly so within ORF1, ORF2, and for the element as a whole), and the first and third introns are the slowest evolving structural regions of the element. Nonsynonymous sites in ORF3 evolve faster than those in the other ORFs. Finally, one element from the Dominican Republic (DomRep4) and the element from Matlapa each form individual lineages (C and E, respectively). Details of their divergence from the D. willistoni elements are presented in Table 9. While dN is not significantly lower than dS for most exons in the case of the DomRep4 element, when the coding region is considered as a whole this difference is significant. In the case of the element from Matlapa, substitutions have accumulated significantly faster in nonsynonymous than in synonymous sites in all but ORF2, in which a long deletion prevents a reliable estimate. Once more the evolution rate in the 5' and 3' noncoding regions and in the first and third introns is lower than that of synonymous sites, and ORF3 evolved faster than the other ORFs for which a reliable rate could be obtained.



 
TABLE 8

Polymorphism and divergence of D. sturtevanti P elements: clade D

 


 
TABLE 9

Divergence between D. sturtevanti and D. willistoni P elements: lineages C and E

 

Interspecies P-element comparisons:

Our survey revealed a striking difference between the two species in the degree of structural and sequence variability of the P elements sampled. While those sampled from D. willistoni are extremely similar to one another, with overall π ~0.16% (Table 4), the elements from D. sturtevanti differ considerably from each other in both structure and sequence (Figure 1 and Table 5). In addition, similarity among D. sturtevanti P elements is related to the geographic regions from which the strains were collected, as evidenced by the groups formed by elements from Central America (represented by Mexico, El Salvador, and Panama), from the Antilles (Jamaica and the Dominican Republic), and from Brazil (Ceará and Minas Gerais). Each of these three groups is characterized by specific insertions and deletions (Figure 1, Table 5), as well as by shared derived nucleotide polymorphisms, which are reflected in the P-element phylogeny (Figure 3). These data, together with homogeneous PCR band sizes within each group, are consistent with the hypothesis that different D. sturtevanti populations carry their own sets of P elements.

Comparisons of P elements between species provide strong evidence that sites in all structural regions of the element, except for intron 2, have evolved under purifying selection. These include not only replacement sites in exons but also, somewhat surprisingly, those in introns 1 and 3 and both the flanking regions upstream and downstream from the transposase gene. However, the distribution of polymorphisms within species provides no clear evidence of selective constraints in the short-term evolution of P elements.


DISCUSSION
We studied a group of 45 canonical and one noncanonical P element sampled from strains of D. willistoni and D. sturtevanti collected from across the geographic distributions of these species (Table 1, Figure 2). The striking differences observed between the two species in the degree of P-element structural and sequence variability, as well as differences in the selection patterns acting on their structural regions, are further discussed below in the context of evolutionary scenarios and mechanisms that could have facilitated them.

The residence time of canonical P elements in D. willistoni and D. sturtevanti:

We have determined that canonical elements sampled previously from the willistoni and saltans groups diversified at most three million years ago (SILVA and KIDWELL 2000). However, the very high degree of sequence similarity among D. willistoni P elements, which contrasts sharply with the results from D. sturtevanti, suggests that the sampled D. willistoni canonical elements last shared a common ancestor much more recently than the time of diversification of the two species. If our sample is representative of all canonical P elements in D. willistoni, then this species might have been one of the last species within the two New World Sophophora groups to be invaded by canonical P elements. The P elements sampled from D. sturtevanti are paraphyletic in relation to the other canonical elements, including those in D. willistoni (Figure 3). Together with the high sequence and structural heterogeneity of D. sturtevanti elements, this suggests an older invasion of D. sturtevanti by canonical P elements, relative to that observed for D. willistoni, and their subsequent divergence by mutation and drift. However, the high similarity among D. sturtevanti elements of clade A, which are present in lines from Mexico and Central America, and the much lower degree of polymorphism among these elements than in host markers (SILVA 2000), provide strong evidence for the activity of D. sturtevanti elements that postdates the origin of this species. Whether a more exhaustive sampling of other subpopulations of D. sturtevanti would show additional localized spread of canonical elements within this species remains to be seen. If this were the case, it would suggest that the P-element family retains many pockets of transposition activity, which may serve as the source of new waves of horizontal transfer.

Spread of P elements within species:

In sharp contrast to the polymorphism observed among D. sturtevanti P elements, that of D. willistoni elements is characterized by the almost exclusive presence of singleton mutations, which include both length and nucleotide polymorphisms. Because our sequences were obtained from cloned P elements, some of the polymorphisms could represent polymerase errors. The expected number of false polymorphisms in our total data set caused by such errors is approximately two (see MATERIALS AND METHODS). However, we found over 70 polymorphisms, clearly above the expected background error level, and the effect (if any) of polymerase errors on the results should be negligible. Furthermore, the fact that the excess of singletons is observed in indels, as well as in point mutations, suggests that the pattern is real. The predominance of singleton substitutions may indicate either a rapid expansion of P elements in the host species or the presence of weakly deleterious mutations. The extremely high degree of similarity among P elements sampled from locales that are separated by thousands of kilometers, together with a lack of evidence for strong selective constraints within species, suggests that canonical P elements invaded D. willistoni relatively recently and spread quickly throughout the whole species.

D. willistoni populations, with an estimated Fst ~0.15 for both nuclear (Adh) and mitochondrial (ND5) markers, are much less structured than those of D. sturtevanti, for which Fst ≥ 0.48 for both types of molecular markers (SILVA 2000). The marginal populations of D. willistoni, such as those in the Antilles and Florida, show a significant degree of genetic differentiation from the continental populations, while in D. sturtevanti significant differentiation is observed among continental populations as well (SILVA 2000). This species difference likely results from the higher density, wider ecological range, and concomitant larger effective population size of D. willistoni relative to that of D. sturtevanti.

Our results suggest that the spread of P elements can be hindered by significant subdivision of the host population, as shown by the association between element similarity and geographic origin of the D. sturtevanti elements. However, the canonical P elements in D. willistoni, including those from peripheral, isolated populations, are all nearly identical, showing that these elements can spread rapidly throughout the species range even when gene flow, as detected from host markers, is somewhat restricted.

Selection on P elements:

In both D. willistoni and D. sturtevanti, the presence of deletions and insertions that disrupt the transposase reading frame, as well as the fact that these indels are evenly distributed among coding and noncoding regions, suggests a lack of selective constraints on P elements within species. Between species, however, the slower rate of evolution of nonsynonymous sites in all four ORFs relative to that of synonymous sites suggests that there is selective pressure to maintain transposase activity, in agreement with previous results (WITHERSPOON 1999). This suggests that elements that encode an active transposase have a fitness advantage over nonautonomous elements. Interestingly, in all comparisons, nonsynonymous sites in ORF3 evolve significantly faster than those in the other three ORFs. As ORF3 is the only ORF not included in the 66-kD polypeptide that represses transposition, this result provides support for the hypothesis that independent, additive constraints act on the transposase and on repressors of transposition (WITHERSPOON 1999). While ORFs 0–2 are subjected to selective pressures imposed by both types of constraint, ORF3 is subjected only to those imposed on the transposase; therefore it evolves faster than the first three ORFs.

Surprisingly, the rate of evolution of some noncoding regions, namely the first and third introns and, often, the 3' and 5' flanking regions, is of the same order of magnitude as that of nonsynonymous sites and considerably lower than that observed in synonymous sites. Several characteristics of the P-element transposition mechanism, as well as the maternal transmission of the 66-kD repressor of transposition, are linked to these regions and provide possible explanations for their relatively high degree of conservation. The 3' and 5' noncoding regions contain motifs that are required for transposition, such as the inverted terminal repeats, 11-bp internal inverted repeats, and unique sequences ~150 bp in length at each end of the element (ENGELS 1989). Recent evidence suggests that the third intron is involved in targeting of the P-element mRNA to the oocyte (SIMMONS et al. 2002). More importantly, this intron is not spliced out from the P-element mRNA transcript in the somatic tissue, where it is translated; its presence is essential to maintain in the correct frame the early stop codon that gives rise to the truncated 66-kD repressor polypeptide (ROBERTSON and ENGELS 1989; MISRA and RIO 1990; ROCHE et al. 1995), thus preventing transposase activity in the soma, which would be clearly deleterious to the host. Therefore, it is not too surprising that this intron evolves under selection. Because the rate of evolution of intron 1 is just as low as that of intron 3, our results strongly suggest that the former contains a motif, or motifs, involved in the repression of transposition.

Selection mechanism:

The patterns of divergence described above support a scenario according to which selection acts on the element only at the time of horizontal transfer between species, and not during evolution within species. When invading a new genome, the fitness of autonomous elements is higher than that of nonautonomous elements because the former contain, by definition, all intact sequence motifs required for transposition and can therefore spread in the new host; during within-species transmission, however, transposase could act in trans, benefiting autonomous and nonautonomous elements alike. Previously, this has been the hypothesis most frequently put forward to explain observations of selective constraint in the evolution of transposases encoded by class II TEs (WITHERSPOON 1999; SILVA and KIDWELL 2000).

Our present results from a large sample of P elements from D. willistoni show that dS within species is larger than dN for all four ORFs, even though the difference is not statistically significant (Table 4). The maximum-likelihood value of ω is also <1, but again not significantly so. These results suggest the possibility that selection also acts during the transmission of elements within species, but that the small number of polymorphisms observed makes it impossible to obtain enough power to produce a significant result. Selection at the within-species level could occur by two processes. First, it could result from the fact that some of the nonsynonymous polymorphisms are effectively neutral or only slightly deleterious (KIMURA 1983). Slightly deleterious nonsynonymous polymorphisms linger in populations for many generations, but only rarely go to fixation and hence the ratio of synonymous to nonsynonymous mutations increases with time, being barely notable in a sample of recently diverged sequences. A similar pattern is observed in the mitochondrial DNA of many animal species, in which the ratio of synonymous to nonsynonymous mutations within species is lower than the ratio between species (NACHMAN 1998). This first process requires that transposase acts preferentially in cis, such that the mutated elements have a lower transposition efficiency; this seems unlikely in eukaryotes, as the processes of translation and transposition occur in different cellular compartments. A second process that could lead to a difference in the rate of accumulation of synonymous and replacement substitutions would occur when the selection coefficient of mutations in an element varies with the ratio between autonomous and nonautonomous elements in a cell. This corresponds to selection mechanism (4) described by WITHERSPOON (1999) and can be explained as follows: during the first stages after invasion of a new host species, most elements are complete and transposase will be plentiful, acting either in cis or in trans. However, with time, elements accumulate mutations and autonomous elements become sparse. If, at this point, the number of autonomous and of nonautonomous elements per cell varies independently, and if the frequency of transposition is a direct function of the number of autonomous elements per cell, such that transposition is more frequent in cells with the most functional elements, then the conditions are created for selection to favor autonomous over nonautonomous elements. A larger number of sequences (or more divergent elements) will have to be collected to determine whether the lack of a significant difference between dS and dN within D. willistoni is real or the result of a lack of power due to the small number of mutations.
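The second process can be illustrated with a toy calculation. It is a sketch under strong simplifying assumptions and is not the model of WITHERSPOON (1999) or of this article: transposase is assumed to act purely in trans, every element in a cell is assumed to transpose at a rate proportional to the number of autonomous elements in that cell, and the function name and example copy numbers are hypothetical. Under these assumptions, the average rate experienced by an autonomous copy exceeds that of a nonautonomous copy by an amount proportional to Var(A)/E(A), where A is the number of autonomous elements per cell, provided that autonomous and nonautonomous copy numbers vary independently.

```python
def mean_transposition_rates(cells, rate_per_autonomous=1.0):
    """Average transposition rate experienced by each class of element.

    cells: list of (autonomous_count, nonautonomous_count) tuples, one per cell.
    Every element in a cell is assumed to transpose at rate
    rate_per_autonomous * autonomous_count (transposase acting in trans).
    Returns (mean rate per autonomous copy, mean rate per nonautonomous copy).
    """
    total_autonomous = sum(a for a, _ in cells)
    total_nonautonomous = sum(n for _, n in cells)
    if total_autonomous == 0 or total_nonautonomous == 0:
        raise ValueError("both element classes must be present")
    rate_autonomous = sum(a * rate_per_autonomous * a for a, _ in cells) / total_autonomous
    rate_nonautonomous = sum(n * rate_per_autonomous * a for a, n in cells) / total_nonautonomous
    return rate_autonomous, rate_nonautonomous


# Copy numbers vary independently: every autonomous count is paired with
# every nonautonomous count (hypothetical values).
variable_cells = [(a, n) for a in (0, 1, 2, 3) for n in (2, 4)]
# Same nonautonomous counts, but no variation in autonomous copy number.
uniform_cells = [(2, 2), (2, 4)]

print(mean_transposition_rates(variable_cells))
print(mean_transposition_rates(uniform_cells))
```

In this toy setting the advantage of autonomous elements arises only because they are, by definition, overrepresented in the cells where transposase is most abundant. When autonomous copy number does not vary among cells, the two classes experience the same rate.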

In conclusion, we have provided evidence consistent with the idea that P elements are opportunistic elements that can spread very rapidly throughout a species, despite the existence of a small amount of population structure. We have also shown that selective pressures differ significantly among structural regions of the element. Most remarkably, evolutionary constraints in the first and third introns are almost as strong as those observed in nonsynonymous sites. Because the strongest constraints seem to be associated with functions important for both transposition and its repression, both in the third intron and in exons, we postulate that the function(s) related to putative motifs present in the first intron of the element fall into this category. Finally, our results show a pattern that could be caused by selection acting not only at the time of horizontal transfer, but also during the spread of P elements within species, although a more exhaustive sample or more divergent elements are needed to obtain a conclusive result.


ACKNOWLEDGEMENTS
We thank Salvador Baez, Glória Lagunes, and Jaime Piñero for logistic assistance in the field and Cláudia Carareto, Vera Valente, Elgion Loreto, and Fabiana Herédia for providing the Brazilian strains of Drosophila. We also thank Jonathan Clark, Andrew Holyoake, Patrick O'Grady, Michael Nachman, Michael Simmons, and one anonymous reviewer for numerous suggestions that significantly improved the focus of this manuscript. This research was funded through fellowship support from Junta Nacional de Investigação Científica e Tecnológica, from the Interdisciplinary Program in Genetics at the University of Arizona, and from the Flinn Foundation to J.C.S. This work was also supported by National Science Foundation grants DEB-9701252 and DEB-9815754.


FOOTNOTES
Sequence data from this article have been deposited with the GenBank Data Library under accession nos. AY578739, AY578740, AY578741, AY578742, AY578743, AY578744, AY578745, AY578746, AY578747, AY578748, AY578749, AY578750, AY578751, AY578752, AY578753, AY578754, AY578755, AY578756, AY578757, AY578758, AY578759, AY578760, AY578761, AY578762, AY578763, AY578764, AY578765, AY578766, AY578767, AY578768, AY578769, AY578770, AY578771, AY578772, AY578773, AY578774, AY578775, AY578776, AY578777, AY578778, AY578779, AY578780, AY578781, AY578782, AY578783, AY578784.


LITERATURE CITED

CLARK, J. B., and M. G. KIDWELL, 1997 A phylogenetic perspective on P transposable element evolution in Drosophila. Proc. Natl. Acad. Sci. USA 94: 11428–11433.

CLARK, J. B., T. K. ALTHEIDE, M. J. SCHLOSSER and M. G. KIDWELL, 1995 Molecular evolution of P transposable elements in the genus Drosophila. I. The saltans and willistoni species groups. Mol. Biol. Evol. 12: 902–913.

CLARK, J. B., J. C. SILVA and M. G. KIDWELL, 2002 Evidence of horizontal transfer of P transposable elements, pp. 161–171 in Horizontal Gene Transfer, edited by M. SYVANEN and C. I. KADO. Academic Press, San Diego.

DANIELS, S. B., K. R. PETERSON, L. D. STRAUSBAUGH, M. G. KIDWELL and A. CHOVNICK, 1990 Evidence for horizontal transmission of the P transposable element between Drosophila species. Genetics 124: 339–355.

ENGELS, W., 1989 P elements in Drosophila melanogaster, pp. 437–483 in Mobile DNA, edited by D. E. BERG and M. M. HOWE. American Society for Microbiology, Washington, DC.

ENGELS, W. R., 1992 The origin of P elements in Drosophila melanogaster. BioEssays 14: 681–686.

ENGELS, W. R., 1996 P elements in Drosophila, pp. 103–123 in Transposable Elements, edited by H. SAEDLER and A. GIERL. Springer-Verlag, Berlin.

GLOOR, G. B., C. R. PRESTON, D. M. JOHNSON-SCHLITZ, N. A. NASSIF, R. W. PHILLIS et al., 1993 Type I repressors of P element mobility. Genetics 135: 81–95.

GOLDMAN, N., and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725–736.

HAGEMANN, S., W. J. MILLER and W. PINSKER, 1994 Two distinct P element subfamilies in the genome of Drosophila bifasciata. Mol. Gen. Genet. 244: 168–175.

HAGEMANN, S., E. HARING and W. PINSKER, 1996a A new P element subfamily from Drosophila tristis, D. ambigua, and D. obscura. Genome 39: 978–985.

HAGEMANN, S., E. HARING and W. PINSKER, 1996b Repeated horizontal transfer of P transposons between Scaptomyza pallida and Drosophila bifasciata. Genetica 98: 43–51.

HARTL, D. L., A. R. LOHE and E. R. LOZOVSKAYA, 1997 Modern thoughts on an ancyent marinere: function, evolution, regulation. Annu. Rev. Genet. 31: 337–358.

HEY, J., and J. WAKELEY, 1997 A coalescence estimator of the population recombination rate. Genetics 145: 833–846.

JUKES, T. H., and C. R. CANTOR, 1969 Evolution of protein molecules, pp. 21–132 in Mammalian Protein Metabolism, edited by H. N. MUNRO. Academic Press, New York.

KIDWELL, M. G., 1979 Hybrid dysgenesis in Drosophila melanogaster. The relationship between the P-M and I-R interaction systems. Genet. Res. 33.

KIMURA, M., 1983 The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK/London/New York.

LANSMAN, R. A., R. O. SHADE, T. A. GRIGLIATTI and H. W. BROCK, 1987 Evolution of P transposable elements: sequences of Drosophila nebulosa P elements. Proc. Natl. Acad. Sci. USA 84: 6491–6495.

LASKI, F. A., D. C. RIO and G. M. RUBIN, 1986 Tissue specificity of Drosophila P element transposition is regulated at the level of mRNA splicing. Cell 44: 7–19.

LORETO, E. L., V. L. VALENTE, A. ZAHA, J. C. SILVA and M. G. KIDWELL, 2001 Drosophila mediopunctata P elements: a new example of horizontal transfer. J. Hered. 92: 375–381.

MADDISON, D. R., and W. P. MADDISON, 2001 MacClade 4: Analysis of Phylogeny and Character Evolution. Sinauer Associates, Sunderland, MA.

MISRA, S., and D. C. RIO, 1990 Cytotype control of Drosophila P element transposition: the 66 kd protein is a repressor of transposase activity. Cell 62: 269–284.

NACHMAN, M. W., 1998 Deleterious mutations in animal mitochondrial DNA. Genetica 102–103: 61–69.

NEI, M., and T. GOJOBORI, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418–426.

NEI, M., and L. JIN, 1989 Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6: 290–300.

NEI, M., and W. H. LI, 1979 Mathematical models for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76: 5269–5273.

O'HARE, K., and G. M. RUBIN, 1983 Structures of P transposable elements and their sites of insertion and excision in the Drosophila melanogaster genome. Cell 34: 25–35.

OLIVEIRA DE CARVALHO, M., J. C. SILVA and E. L. S. LORETO, 2004 Analyses of P-like transposable element sequences from the genome of Anopheles gambiae. Insect Mol. Biol. 13: 55–63.

PINSKER, W., E. HARING, S. HAGEMANN and W. J. MILLER, 2001 The evolutionary life history of P transposons: from horizontal invaders to domesticated neogenes. Chromosoma 110: 148–158.

RIO, D. C., 2002 P transposable elements in Drosophila melanogaster, pp. 484–518 in Mobile DNA II, edited by N. L. CRAIG, R. CRAIGIE, M. GELLERT and A. M. LAMBOWITZ. American Society for Microbiology, Washington, DC.

ROBERTSON, H. M., and W. R. ENGELS, 1989 Modified P elements that mimic the P cytotype in Drosophila melanogaster. Genetics 123: 815–824.

ROCHE, S. E., M. SCHIFF and D. C. RIO, 1995 P-element repressor autoregulation involves germ-line transcriptional repression and reduction of third intron splicing. Genes Dev. 9: 1278–1288.

ROZAS, J., and R. ROZAS, 1999 DnaSP 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174–175.

SARKAR, A., R. SENGUPTA, J. KRZYWINSKI, X. WANG, C. ROTH et al., 2003 P elements are found in the genomes of nematoceran insects of the genus Anopheles. Insect Biochem. Mol. Biol. 33: 381–387.

SILVA, J. C., 2000 Population genetics of P transposable elements and their host species, with emphasis on Drosophila willistoni and Drosophila sturtevanti. Ph.D. Thesis, University of Arizona, Tucson.

SILVA, J. C., and M. G. KIDWELL, 2000 Horizontal transfer and selection in the evolution of P elements. Mol. Biol. Evol. 17: 1542–1557.

SILVA, J. C., E. L. LORETO and J. B. CLARK, 2004 Factors that affect the horizontal transfer of transposable elements. Curr. Issues Mol. Biol. 6: 57–72.

SIMMONS, M. J., K. J. HALEY and S. J. THOMPSON, 2002 Maternal transmission of P element transposase activity in Drosophila melanogaster depends on the last P intron. Proc. Natl. Acad. Sci. USA 99: 9306–9309.

SIMONELIG, M., and D. ANXOLABÉHÈRE, 1991 A P element of Scaptomyza pallida is active in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 88: 6102–6106.

SWOFFORD, D. L., 1999 PAUP*: Phylogenetic Analysis Using Parsimony and Other Methods. Sinauer Associates, Sunderland, MA.

WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.

WITHERSPOON, D. J., 1999 Selective constraints on P-element evolution. Mol. Biol. Evol. 16: 472–478.

YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.

YANG, Z., and R. NIELSEN, 2002 Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19: 908–917.



