Genetics, Vol. 164, 533-544, June 2003, Copyright © 2003

DNA Polymorphism in the β-Esterase Gene Cluster of Drosophila melanogaster

Evgeniy S. Balakirev (a, b), V. R. Chechetkin (c), V. V. Lobzin (c), and Francisco J. Ayala (a)
(a) Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525
(b) Institute of Marine Biology, Vladivostok 690041, Russia, and Academy of Ecology, Marine Biology, and Biotechnology, Far Eastern State University, Vladivostok 690600, Russia
(c) Troitsk Institute of Innovation and Thermonuclear Investigations (TRINITI), Theoretical Department of Division for Perspective Investigations, 142190 Troitsk, Moscow Region, Russia

Corresponding author: Francisco J. Ayala, 321 Steinhaus Hall, University of California, Irvine, CA 92697-2525. E-mail: fjayala@uci.edu

Communicating editor: N. TAKAHATA


*  ABSTRACT

We have analyzed nucleotide polymorphism within a 5.3-kb region encompassing the functional Est-6 gene and the ψEst-6 putative pseudogene in 28 strains of Drosophila melanogaster and one of D. simulans. Two divergent sequence types were detected, which are not perfectly associated with Est-6 allozyme variation. The level of variation (π) is very similar in the 5'-flanking region (0.0059) and Est-6 gene (0.0057), but significantly higher in the intergenic region (0.0141) and putative pseudogene (0.0122). The variation in the 3'-flanking region is intermediate (0.0083). These observations may reflect different levels of purifying selection in the different regions. Strong linkage disequilibrium occurs within the region studied, with the largest values revealed in the putative pseudogene and 3'-flanking region. Moreover, recombination is restricted within ψEst-6. Gene conversion is detected both within and (to a lesser extent) between Est-6 and ψEst-6. The data indicate that ψEst-6 exhibits some characteristics that are typical of nonfunctional genes, while other characteristics are typically attributed to functional genes; the same situation has been observed in other pseudogenes (including those of Drosophila). The results of structural entropy analysis demonstrate higher structural ordering in Est-6 than in ψEst-6, in accordance with expectations if ψEst-6 is indeed a pseudogene. Taking into account that the function of ψEst-6 is not known (but could exist) and following the terminology of J. Brosius and S. J. Gould, we suggest that the term "potogene" may be appropriate for ψEst-6, indicating that it is a potential gene that may have acquired some distinctive but unknown function.


The β-esterase gene cluster is on the left arm of chromosome 3 of Drosophila melanogaster, at 68F7–69A1 in the cytogenetic map (but see PROCUNIER et al. 1991). The cluster is composed of two tandemly duplicated genes, originally named Est-6 and Est-P (COLLET et al. 1990). The coding regions of these genes are 1686 and 1691 bp long, respectively, and consist of two exons (1387 and 248 bp) and a small (51 bp in Est-6 and 56 bp in Est-P) intron (OAKESHOTT et al. 1987). The Est-6 gene is well characterized (reviewed by RICHMOND et al. 1990; OAKESHOTT et al. 1993). The gene encodes the major β-carboxylesterase (EST-6) that is transferred by D. melanogaster males to females in the seminal fluid during copulation (RICHMOND et al. 1980) and affects the female's subsequent behavior and mating proclivity (GROMKO et al. 1984). Less information is available for Est-P. COLLET et al. (1990) first described Est-P and concluded that it was a functional gene, on the basis of several lines of evidence: transcriptional activity, intact splicing sites, no premature termination codons, and presence of initiation and termination codons. However, BALAKIREV and AYALA (1996) found premature stop codons within the Est-P coding region and some other indications suggesting that Est-P might in fact be a pseudogene, and named it ψEst-6. DUMANCIC et al. (1997) showed that some alleles of Est-P produce a catalytically active esterase corresponding to the previously identified EST-7 isozyme (HEALY et al. 1991) and accordingly renamed the gene Est-7.

Our earlier investigation of ψEst-6 (BALAKIREV and AYALA 1996) was limited to 10 lines of D. melanogaster and 2.8 kb. We now increase the sample size to 28 lines and extend the analysis by comparing the nucleotide variability in the ψEst-6 putative pseudogene and Est-6 gene in a random sample of D. melanogaster derived from a natural population in California. The full sequence now analyzed is 5394 bp long and includes the 5'-flanking region, complete Est-6 gene, intergenic region, ψEst-6 putative pseudogene, and 3'-flanking region. The data for the 5'-flanking region and Est-6 gene (1686 bp) are from BALAKIREV et al. (2002).


*  MATERIALS AND METHODS

Drosophila strains:
The 28 D. melanogaster strains were derived from a random sample of wild flies collected by F. J. Ayala (October 1991) in El Rio Vineyard, Acampo, California. The strains were made fully homozygous for the third chromosome by crosses with balancer stocks, as described by SEAGER and AYALA (1982). The strains were named in accordance with the esterase-6 (the letter before the hyphen) and superoxide dismutase (the letter after the number) electrophoretic alleles they carry: ultra slow (US), slow (S), and fast (F) (Figure 1).



Figure 1. DNA polymorphism in the putative pseudogene ψEst-6. The numbers above the top sequence represent the segregating sites and the start of a deletion or insertion. Nucleotides are numbered from the Est-P start codon (position 3052 in COLLET et al. 1990). Eleven premature stop codons are due to single-nucleotide polymorphisms G/A (site 1068, strains F-357F, F-274F, F-517F, F-517S, and F-1461S) and T/G (site 1388, strains S-501S, S-510S, and S-5F), as well as to a 9-bp insertion of ACATTTGAT (position 1379–1387, strains S-501S, S-510S, and S-5F); these sites and nucleotides are marked by boldface type. The S, US, and F letters before the strain numbers refer to the EST-6 allozymes slow, ultra-slow, and fast. (The S and F after the numbers refer to the allozyme polymorphism at the Sod locus and have been previously used to tag these lines.) ▲ denotes a 3-bp deletion of CAG (position 232–234, strain F-775F); † denotes the absence of a deletion; ▼ denotes an insertion of ACATTTGAT (position 1379–1387, strains S-501S, S-510S, and S-5F); ‡ denotes the absence of an insertion.

DNA extraction, amplification, and sequencing:
Total genomic DNA was extracted using the tissue protocol of the QIAamp tissue kit (QIAGEN, Valencia, CA). The D. melanogaster Est-6 sequence (GenBank accession nos. M33780 and M33781; COLLET et al. 1990) was used for designing PCR and sequencing primers. The primers used for the PCR amplification reactions were 5'-gattttgcttcgagtgataatgg-3' (forward primer) and 5'-agactacgtgcacagtgtggtggg-3' (reverse primer). The PCR reactions were carried out in final volumes of 50 µl using TaKaRa Ex Taq in accordance with the manufacturer's description (Takara Biotechnology, Berkeley, CA). The reaction mixtures were subjected to 30 cycles of denaturation, annealing, and extension: 95° for 30 sec, 63° for 30 sec, and 72° for 2.0 min (for the first cycle, progressively adding 3 sec at 72° for every subsequent cycle), with a final 7-min extension period at 72°. The PCR products were purified with the Wizard PCR preps DNA purification system (Promega, Madison, WI), directly sequenced by the dideoxy chain-termination technique using Dye Terminator chemistry, and separated with the ABI PRISM 377 automated DNA sequencer (Perkin-Elmer, Norwalk, CT). For each line, the sequences of both strands were determined, using 12 overlapping internal primers spaced, on average, 350 nucleotides apart. (See GenBank accession nos. AF147095–AF147102, AF150809–AF150815, AF217624–AF217645, and AF526538–AF526558 for the ψEst-6 sequences.) At least two independent PCR amplifications were sequenced for each polymorphic site in all D. melanogaster strains to guard against possible PCR or sequencing errors.

DNA sequence analysis:
The sequences were assembled using the program SeqMan (Lasergene, DNASTAR, 1994–1997). The computer programs DnaSP, version 3.4 (ROZAS and ROZAS 1999) and PROSEQ, version 2.4 (FILATOV and CHARLESWORTH 1999) were used to analyze the data by means of the "sliding window" method (HUDSON and KAPLAN 1988) and for most intraspecific analyses. Departures from neutral expectations were investigated using KELLY's (1997) and WALL's (1999) neutrality tests incorporating recombination. The permutation approach of HUDSON et al. (1992) was used to estimate the significance of sequence differences between haplotype families. The coalescent simulations (HUDSON 1990) were performed with the PROSEQ program to estimate the probabilities of the observed values of Kelly's ZnS and Wall's B and Q statistics and confidence intervals for the nucleotide diversity values. The method of SAWYER (1989, 1999) was used to analyze intra- and intergenic conversion events.

Entropy analysis:
If ψEst-6 is in fact a pseudogene or nonessential gene, one could expect lower structural regularity and higher structural divergence in this putative pseudogene than in its functional paralogous gene, Est-6. These features can be quantitatively assessed with the proper structural analysis of the relevant sequences. Our approach is based on spectral methods previously developed (CHECHETKIN and TURYGIN 1994, 1996; CHECHETKIN et al. 1994; CHECHETKIN and LOBZIN 1996, 1998; for a review and further references, see LOBZIN and CHECHETKIN 2000).

We begin with the necessary definitions. The Fourier harmonics corresponding to the nucleotides of type α (where α is A, C, G, or T) in a sequence of length M are defined as

(1)

where ρ_{m,α} indicates the positions occupied by the nucleotides of type α: ρ_{m,α} = 1 if a nucleotide of type α occupies the mth site, and 0 otherwise. The amplitudes of Fourier harmonics (or structure factors) are expressed as

(2)

where the asterisk denotes complex conjugation. The zeroth harmonics, which depend only on the nucleotide composition, contain no structural information and are discarded below. Due to the symmetry property

(3)

the spectra for structure factors can be restricted to their left halves from n = 1 to

(4)

where the brackets denote the integer part of the quotient. The structure factors will always be normalized with respect to mean spectral values,

(5)

where N_α is the total number of nucleotides of type α in a sequence of length M. The structural regularity of the nucleotides of type α in a sequence of length M is assessed with the spectral structural entropy

(6)

The value of spectral entropy for a counterpart random sequence having the same nucleotide composition is the highest and the corresponding mean characteristics averaged over an ensemble of various random realizations are given by

(7)

Using the values (7) as the reference characteristics, it is convenient to introduce the relative spectral structural entropy

(8)

as well as the relative normalized deviations

(9)

The value of S_{α,rel} serves for the comparison of structural regularity for the different sequences (generally, also of different lengths); the higher the value of S_{α,rel}, the higher the structural regularity of a sequence, while r_α serves for the assessment of the statistical significance of observed deviations. Assuming a Gaussian distribution for r in the case of random deviations, the probability of finding values of r exceeding some threshold r0 is given by

(10)

The value Pr = 0.05 corresponds to that of r0 = 1.64.
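The correspondence between the threshold r0 and the tail probability can be checked numerically. The following is a minimal sketch, assuming that the probability in question is the one-sided upper tail of a standard normal distribution; the function name `gaussian_tail` is introduced here for illustration only.

```python
import math


def gaussian_tail(r0: float) -> float:
    """One-sided probability that a standard normal variable exceeds r0."""
    return 0.5 * math.erfc(r0 / math.sqrt(2.0))


# Threshold quoted in the text; the result is close to 0.05.
print(round(gaussian_tail(1.64), 4))
```

Under this assumption, larger values of r correspond to smaller tail probabilities, so the same function can be used to translate any reported r into an approximate significance level.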

The level of structural divergence may be quantitatively estimated with deviations

(11)

where the cross correlation coefficients are determined as

(12)


(13)

The structure factor harmonics F_{αα}(q_n), the mean spectral values F̄_{αα}, and the number N are defined in Equations 2, 4, and 5, while the superscripts 1 and 2 refer to a pair of compared sequences. The definition (12) assumes equal lengths for the compared sequences 1 and 2 corresponding to the patterns of the same gene in two different strains of D. melanogaster. In the presence of insertions/deletions and unequal lengths in the compared sequences, the shorter sequence is supplemented by void sites up to the length of the longer one (MARPLE 1987). For n different sequences there are n(n - 1)/2 pairwise cross correlation coefficients. In our case n = 28 and the number of cross correlation coefficients is equal to 378. Higher values of the deviations δ_α correspond to higher structural divergence between compared sequences.
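The bookkeeping described above can be sketched in a few lines. The example below enumerates all pairs among 28 sequences and pads a shorter indicator sequence to the length of a longer one. The strain labels are placeholders, the helper names are hypothetical, and treating a "void site" as a zero in the indicator sequence is an assumption in the spirit of zero padding in spectral analysis.

```python
from itertools import combinations


def n_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences, n(n - 1)/2."""
    return n * (n - 1) // 2


def pad_with_void_sites(indicator, target_len):
    """Extend a 0/1 indicator sequence with zeros (assumed void sites)."""
    return list(indicator) + [0] * (target_len - len(indicator))


# Placeholder labels standing in for the 28 strains.
strains = [f"line_{i}" for i in range(1, 29)]
pairs = list(combinations(strains, 2))
print(len(pairs), n_pairs(28))

# A 3-site indicator padded to the length of a 5-site partner.
print(pad_with_void_sites([1, 0, 1], 5))
```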


*  RESULTS

Nucleotide polymorphism and recombination:
Figure 1 shows a total of 92 polymorphic sites in a sample of 28 sequences of the ψEst-6 putative pseudogene: 62 sites in exon I (1 site involves a 3-bp deletion), 2 sites in the intron, 12 sites in exon II, and 16 sites in the 3'-flanking region. Two indel polymorphisms occur. A 3-bp deletion occurs in the F-775F strain and a 9-bp insertion occurs in the S-510S, S-501S, and S-5F strains (Figure 1). For the ψEst-6 coding region we detected 41 replacements (1 site involves a 9-bp insertion) and 33 synonymous polymorphic sites. We previously found 13 replacements and 23 synonymous polymorphic sites in the Est-6 coding region (BALAKIREV et al. 2002). The ratio of replacement to synonymous polymorphic sites is 0.565 for the Est-6 gene and more than twice that, 1.242, for the putative pseudogene. We detected 11 premature stop codons (all TGA) within the coding region of the putative pseudogene. The stop codons are generated by single mutations (positions 1068 and 1388) as well as by the insertion ACATTTGAT (Figure 1). The mdg-3 retrotransposon insertion (5.2 kb) was detected within the intron of ψEst-6 in the S-438S D. melanogaster strain (data not shown). GAME and OAKESHOTT (1990) detected the same insertion previously in strain 12I-11.2 of D. melanogaster, which carried a null allele of ψEst-6.
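The two ratios quoted above follow directly from the site counts given in the text. As a quick arithmetic check (the function name is introduced here for illustration only):

```python
def replacement_to_synonymous(replacement: int, synonymous: int) -> float:
    """Ratio of replacement to synonymous polymorphic sites."""
    return replacement / synonymous


est6_ratio = replacement_to_synonymous(13, 23)  # Est-6 coding region
psi_ratio = replacement_to_synonymous(41, 33)   # putative pseudogene

print(f"Est-6: {est6_ratio:.3f}")
print(f"psiEst-6: {psi_ratio:.3f}")
print(f"fold difference: {psi_ratio / est6_ratio:.2f}")
```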

Table 1 shows estimates of nucleotide diversity for the putative pseudogene as well as for the Est-6 gene and the flanking regions. The π value for the full sequence is 0.0084, which is within the range of values observed in other high-recombination gene regions in D. melanogaster (MORIYAMA and POWELL 1996). The π value is very similar in the 5'-flanking (0.0059) and Est-6 regions (0.0057), but significantly higher in the intergenic region (0.0141) and putative pseudogene (0.0122), and intermediate in the 3'-flanking region (0.0083). The level of synonymous variation is 0.0152 in the Est-6 coding region but 0.0268 (1.76 times higher) in the putative pseudogene. The difference is more pronounced for nonsynonymous variation, which is 0.0026 in the Est-6 gene and 0.0078 (3.0 times higher) in the putative pseudogene. This could indicate different degrees of selective constraint in the Est-6 gene and the putative pseudogene. The level of silent polymorphism in the 3'-flanking region is 0.0083, but 0.0268 (3.2 times higher) in the putative pseudogene. These differences could again indicate differences in selective constraints. The level of silent divergence between D. melanogaster and D. simulans is similar for the Est-6 gene (0.1469) and the putative pseudogene (0.1393), but lower in the 5'-flanking (0.0807) and 3'-flanking (0.0417) regions.
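For readers unfamiliar with π, the quantity can be illustrated as the average number of pairwise differences per site among aligned sequences. The sketch below is a simplified illustration under stated assumptions, not the DnaSP implementation: sequences are assumed to be aligned and of equal length, and every character mismatch (including a gap against a base) is counted as a difference. The toy alignment is invented for the example.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions over every unordered pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


toy_alignment = ["AAAA", "AAAT", "AATT"]  # invented data
print(round(nucleotide_diversity(toy_alignment), 4))
```

In the toy alignment the three pairs differ at 1, 2, and 1 sites, so the estimate is 4 differences over 3 pairs and 4 sites.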


 

 
Table 1. Nucleotide diversity and divergence of Est-6, ψEst-6, and flanking regions, in 28 strains of D. melanogaster

The method of HUDSON and KAPLAN (1985) reveals a minimum of 15 recombination events in the whole region (5394 bp) analyzed. The minimum number of recombination events is six for the Est-6 gene but three for the putative pseudogene. There is a large (3300-fold) difference in the value of the recombination estimator C (HUDSON 1987) obtained for the Est-6 gene and the putative pseudogene (Table 2). The same tendency (but much less pronounced) is obtained using the permutation-based method of MCVEAN et al. (2002) and also the method of HEY and WAKELEY (1997) based on the number of pairs of sites with incongruent genealogical histories (Table 2).
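The idea behind a minimum number of recombination events can be illustrated with the four-gamete test: if all four allele combinations occur at a pair of biallelic sites, at least one recombination event must have occurred between them (assuming no recurrent mutation). The sketch below is a simplified illustration and not the DnaSP code. It assumes biallelic sites coded as characters, collects all incompatible site pairs as open intervals, and counts the largest set of non-overlapping intervals with a greedy scan, which is intended to give the same lower bound. The toy haplotypes are invented.

```python
from itertools import combinations


def four_gametes(haps, i, j):
    """True if all four allele combinations occur at biallelic sites i and j."""
    return len({(h[i], h[j]) for h in haps}) == 4


def min_recombination_events(haps):
    """Lower bound on recombination events from incompatible site pairs."""
    n_sites = len(haps[0])
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if four_gametes(haps, i, j)]
    # Greedy selection of disjoint open intervals, ordered by right end.
    intervals.sort(key=lambda ij: ij[1])
    events, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            events += 1
            last_end = end
    return events


toy_haplotypes = ["000", "011", "101", "110"]  # invented data
print(min_recombination_events(toy_haplotypes))
```

In the toy data every pair of sites shows four gametes, but the interval spanning sites 0 to 2 contains the two shorter intervals, so only two events are required.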


 

 
Table 2. Recombination estimates

Thus, total nucleotide variability is about twice as high in the putative pseudogene (Table 1), but the recombination rate is at least two times higher in the Est-6 gene (Table 2). The association in ψEst-6 of a high level of nucleotide variation with low recombination is contrary to the well-documented positive relationship between within-species DNA variation and recombination rates (e.g., BEGUN and AQUADRO 1992).

Haplotype structure:
Previously, ODGERS et al. (1995) and BALAKIREV et al. (1999, 2002) described two groups of haplotypes for both the 5'-flanking and Est-6 coding regions in D. melanogaster. There are also two groups of haplotypes in the putative pseudogene region (Figures 1 and 2) that are labeled S or F according to the Est-6 haplotype with which they are associated. Note, however, that lines F-531F, F-611F, and F-96S belong to the fast allozyme group of the Est-6 gene (BALAKIREV et al. 2002) but they are in the slow allozyme group in the neighbor-joining tree of the putative pseudogene sequences (Figure 2). The average number of nucleotide differences (K) between the two haplotypes (excluding the 3'-flanking region) is 20.534. The S group includes most haplotypes (22 out of 28), which are 3.69 times less variable (π = 0.0026 ± 0.0006) than the six F haplotypes (π = 0.0096 ± 0.0042). The permutation test of HUDSON et al. (1992) is highly significant for the F and S haplotypes, K*st = 0.3438 (P < 0.001).



Figure 2. Neighbor-joining tree of the ψEst-6 haplotypes of D. melanogaster based on Kimura's two-parameter distance. The numbers at the nodes are percentages of bootstrap probability values based on 10,000 replications.

Gene conversion:
The method of SAWYER (1989, 1999) detects gene conversion events within both the Est-6 gene (5 regions in 10 sequences, P = 0.0097) and ψEst-6 (13 regions in all 28 sequences, P = 0.0000). The numbers of significant fragments are 14 for the Est-6 gene (fragment length from 314 to 1183 bp, average 662 bp) and 85 for ψEst-6 (fragment length from 154 to 1052 bp, average 669 bp). Gene conversion events between Est-6 and ψEst-6 are detected only in the protein alignment (involving a single region between amino acids 41 and 55, P = 0.0102). The number of significant fragment pairs showing intergenic conversion, which involve 23 Est-6 and 6 ψEst-6 sequences, is 138. Taken together, these results show that gene conversion has played an important role in the evolution of the β-esterase gene cluster.

Sliding-window analysis:
Figure 3 shows the distribution of polymorphism along the Est-6 (thin line) and ψEst-6 (thick line) sequences. There is a distinct peak in the Est-6 sequences at 750–950, which includes the F/S replacement site (position 772). We detected this peak previously (BALAKIREV et al. 1999, 2002) in our data, and also in data of HASSON and EANES (1996) and COOKE and OAKESHOTT (1989), and suggested that it may reflect the effect of balancing selection (STROBECK 1983; HUDSON and KAPLAN 1988). There are four (approximately equal) strong peaks in the ψEst-6 sequences at 50–200, 400–600, 850–1050, and 1300–1650. The putative pseudogene peaks are more acute than the Est-6 gene peaks, have a regular distribution along the sequence (with an interval of 200–300 bp), and are not centered around the replacement polymorphisms (Figure 1).



Figure 3. Sliding-window plots of nucleotide diversity (π) in Est-6 (thin line) and ψEst-6 (thick line). Window sizes are 100 nucleotides with 1-nucleotide increments.
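A sliding-window profile of this kind can be sketched as follows. This is a simplified illustration, not the DnaSP or PROSEQ procedure: it assumes aligned sequences of equal length, counts every character mismatch as a difference, and averages per-site pairwise diversity within each window. The defaults mirror the window size and increment stated in the caption; the toy alignment and the small window used with it are invented for the example.

```python
from itertools import combinations


def per_site_diversity(seqs):
    """Fraction of sequence pairs that differ at each alignment column."""
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    return [sum(a != b for a, b in combinations(column, 2)) / n_pairs
            for column in zip(*seqs)]


def sliding_window_pi(seqs, window=100, step=1):
    """List of (window start, mean per-site diversity) along the alignment."""
    site_pi = per_site_diversity(seqs)
    return [(start, sum(site_pi[start:start + window]) / window)
            for start in range(0, len(site_pi) - window + 1, step)]


toy_alignment = ["AAAA", "AAAT", "AATT"]  # invented data
for start, value in sliding_window_pi(toy_alignment, window=2, step=1):
    print(start, round(value, 3))
```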

We measure heterogeneity in the distribution of polymorphic sites along the ψEst-6 sequence and discordance between π (within-melanogaster polymorphism) and K (melanogaster-simulans divergence) by means of GOSS and LEWONTIN's (1996) and MCDONALD's (1996, 1998) statistics and assess their significance by Monte Carlo simulations of the coalescent model incorporating recombination (MCDONALD 1996, 1998). On the basis of 10,000 simulations, with the recombination parameters varying from 1 to 64, the tests are not significant: GOSS and LEWONTIN's (1996) interval length variance (VIL) is 0.000164, P > 0.05, and modified interval length variance (QIL) is 0.000767, P > 0.05; MCDONALD's (1998) maximum sliding G statistic is 6.9202, P > 0.05; the Kolmogorov-Smirnov statistic is 0.043733, P > 0.05. The tests are significant for the Est-6 gene including the promoter region, but not for the Est-6 coding region alone (BALAKIREV et al. 2002).

Linkage disequilibrium:
We have calculated the P value of Fisher's exact test in all pairwise comparisons of informative polymorphic sites. The numbers (and percentages) of pairwise comparisons that are significant are, for the whole region, 4235 out of 7626 (55.53%, 2.62% with the Bonferroni correction); for the Est-6 gene, 151 out of 300 (50.33%, 18.33% with the correction); for the putative pseudogene, 1486 out of 1830 (81.20%, 14.81% with the correction); for the 3'-flanking region, 66 out of 78 (84.62%, 57.69% with the correction); and between Est-6 and ψEst-6, 927 of 1525 (60.79%, but none with the Bonferroni correction). The significant interlocus linkage disequilibria are caused by six divergent haplotypes, F-517S, F-517F, F-1461S, F-274F, F-357F, and F-775F, which have unique polymorphisms in both Est-6 and ψEst-6.
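The pairwise test and the multiple-testing correction can be sketched as follows. This is a minimal illustration, assuming a two-sided Fisher's exact test on the 2 x 2 table of haplotype counts for two biallelic sites and a Bonferroni threshold of 0.05 divided by the number of comparisons. The haplotype counts in the example are hypothetical; only the totals (28 sequences, 1830 comparisons in the putative pseudogene) are taken from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one.
    return sum(p for p in map(prob, range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical pair of sites in complete association among 28 haplotypes.
p_value = fisher_exact_two_sided(22, 0, 0, 6)
threshold = bonferroni_threshold(0.05, 1830)
print(p_value < 0.05, p_value < threshold)
```

The correction is stringent because the threshold shrinks with the number of comparisons, which is why the percentages of significant pairs drop sharply once it is applied.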

Tests of neutrality:
In a previous study (BALAKIREV et al. 2002) we detected significant deviations from neutrality in the 5'-flanking and Est-6 coding regions, using KELLY's ZnS (1997) and WALL's (1999) B and Q tests, based on linkage disequilibrium between segregating sites; both tests were significant with the population recombination rate ≥0.010 (Kelly's test) or without recombination (Wall's test). For ψEst-6, Kelly's ZnS and Wall's B and Q values are even higher (ZnS = 0.422; B = 0.432; Q = 0.520) than those for the Est-6 gene and significant by coalescent simulations with the population recombination rate ≥0.005 (ZnS statistic) or without recombination (B and Q statistics).
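Kelly's ZnS summarizes linkage disequilibrium as the average squared allelic correlation (r²) over all pairs of segregating sites. The sketch below illustrates that definition under simplifying assumptions: haplotypes are equal-length strings, only biallelic sites are used, and the significance assessment by coalescent simulation is not attempted. The toy haplotypes are invented.

```python
from itertools import combinations


def r_squared(haps, i, j):
    """Squared correlation between alleles at biallelic sites i and j."""
    n = len(haps)
    ref_i, ref_j = haps[0][i], haps[0][j]  # arbitrary reference alleles
    p_i = sum(h[i] == ref_i for h in haps) / n
    p_j = sum(h[j] == ref_j for h in haps) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haps) / n
    d = p_ij - p_i * p_j
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def zns(haps):
    """Average r^2 over all pairs of biallelic segregating sites."""
    segregating = [k for k in range(len(haps[0]))
                   if len({h[k] for h in haps}) == 2]
    pairs = list(combinations(segregating, 2))
    if not pairs:
        raise ValueError("need at least two biallelic segregating sites")
    return sum(r_squared(haps, i, j) for i, j in pairs) / len(pairs)


print(zns(["00", "00", "11", "11"]))  # complete association
print(zns(["00", "01", "10", "11"]))  # no association
```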

Entropy analysis:
We use this new type of analysis when seeking to ascertain the functionality of ψEst-6. We have calculated the relevant characteristics for the exon-intron-exon sequences of Est-6 and ψEst-6 before splicing and for exon-exon sequences after splicing. Examples of spectra for structure factor harmonics (see MATERIALS AND METHODS, Equation 5) are illustrated in Figures 4 and 5, where the period p is related to the ordinal number of structure factor harmonic n as p = M/n. The high peaks at p = 3 (n = 561 for Est-6 and n = 563 for ψEst-6) are a distinctive feature of protein-coding regions (FICKETT and TUNG 1992) and are inherent to both genes and pseudogenes (HOLSTE et al. 2000). The characteristics for the structural entropy averaged over the 28 lines of D. melanogaster are presented in Table 3. The values of integral spectral structural entropy per harmonic summed over all four types of nucleotides, S/(M - 1), are equal to -1.808 (r = 3.16) for Est-6, -1.821 (r = 3.45) for spliced Est-6, -1.718 (r = 0.73) for ψEst-6, and -1.736 (r = 1.20) for spliced ψEst-6. The values obtained for ψEst-6 are not significantly different from random sequence, while the entropy of the Est-6 gene is significantly higher than expected for random sequence (Table 3). These results demonstrate higher structural ordering in Est-6 than in ψEst-6, in accordance with expectations if ψEst-6 is indeed a pseudogene. (Note the generally higher values of r, indicating higher structural ordering in the spliced genes, with the exception of a few cases due to structural coupling between exons and intron.)
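The link between three-periodicity and a spectral peak at n = M/3 can be illustrated with a small computation. The sketch below is an illustration under stated assumptions and may differ in convention from Equations 1–5: the Fourier harmonics are taken as a discrete Fourier transform of the 0/1 indicator sequence with a 1/sqrt(M) normalization, the structure factor is the squared modulus, and the mean spectral value N_α(M - N_α)/(M(M - 1)) is the one that follows from that normalization by Parseval's identity. The toy sequence and the function names are invented for the example.

```python
import cmath


def structure_factors(seq, base):
    """Squared moduli of the harmonics n = 1..M-1 for one nucleotide type."""
    m_len = len(seq)
    indicator = [1 if ch == base else 0 for ch in seq]
    factors = []
    for n in range(1, m_len):
        q = 2 * cmath.pi * n / m_len
        rho = sum(x * cmath.exp(-1j * q * m)
                  for m, x in enumerate(indicator, start=1)) / m_len ** 0.5
        factors.append(abs(rho) ** 2)
    return factors  # list index 0 corresponds to n = 1


def normalized_spectrum(seq, base):
    """Structure factors divided by their mean over n = 1..M-1 (assumed form)."""
    m_len, n_base = len(seq), seq.count(base)
    mean_f = n_base * (m_len - n_base) / (m_len * (m_len - 1))
    return [f / mean_f for f in structure_factors(seq, base)]


toy_seq = "ATG" * 10  # perfectly three-periodic toy sequence, M = 30
spectrum = normalized_spectrum(toy_seq, "A")
left_half = spectrum[: len(toy_seq) // 2]
peak_n = max(range(len(left_half)), key=left_half.__getitem__) + 1
print(peak_n, len(toy_seq) / peak_n)
```

For the toy sequence the peak falls at n = M/3, that is, at period p = 3. Real coding sequences show the same peak on top of a noisy background rather than an otherwise empty spectrum.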



Figure 4. The normalized structure factor spectra (see MATERIALS AND METHODS, Equations 1–5) for the unspliced Est-6 gene. The high peaks at n = 561 correspond to three-periodicity (p = 3), which is a fundamental feature of protein-coding regions.



Figure 5. The normalized structure factor spectra (see MATERIALS AND METHODS, Equations 1–5) for the unspliced ψEst-6 putative pseudogene. The high peaks at n = 563 correspond to three-periodicity (p = 3), which is a fundamental feature of protein-coding regions.


 

 
Table 3. The characteristics for the structural entropy averaged over 28 lines of Drosophila melanogaster

The mean values of the deviations δ_α averaged over the set of 378 pairwise cross correlation coefficients corresponding to each gene are summarized in Table 4, while examples of their distributions are presented in Figures 6 and 7. The values ⟨δ⟩ in the last column in Table 4 are obtained by additional averaging of δ_α over the four types of nucleotides. The insertions/deletions are present only in ψEst-6 and produce the main contribution to the deviations δ_α. Comparison of structural divergence in ψEst-6 was also performed upon equalizing the lengths of the sequences by removing insertions/deletions. As seen in Table 4, even after equalizing, the structural divergence remains distinctly higher in ψEst-6 than in Est-6, as expected if ψEst-6 is not a functional gene. In addition, it is worth noting the correlation between the deviations δ_α and the heights of peaks for f_{αα} at p = 3 (see Equation 5 as well as Figures 4 and 5): the higher the peaks at p = 3, the smaller the δ_α and the narrower their distributions (see Figures 6 and 7 as well as Table 4).



Figure 6. Histograms for the deviations δ_α (see MATERIALS AND METHODS, Equations 11–13) for Est-6 in 28 lines of D. melanogaster. Higher values of δ_α correspond to higher structural divergence between compared sequences.



Figure 7. Histograms for the deviations δ_α (see MATERIALS AND METHODS, Equations 11–13) for ψEst-6 in 28 lines of D. melanogaster. Higher values of δ_α correspond to higher structural divergence between compared sequences.


 

 
Table 4. The mean divergences of cross correlations δ_α within 28 lines of Drosophila melanogaster


*  DISCUSSION

Pseudogenes in Drosophila and other organisms:
Relative to what is known in other organisms, especially vertebrates (MIGHELL et al. 2000), pseudogenes are not common in Drosophila (POWELL 1997). Moreover, sequence evolution in many of the pseudogenes detected in Drosophila has indications of some functional constraint, including lower than expected levels of intraspecific variability and interspecific divergence, significant heterogeneity of nucleotide variability and divergence along the sequences, higher rate of substitution at synonymous nucleotide positions, conservation of important functional regions, transcriptional activity, and codon bias (JEFFS and ASHBURNER 1991; CURRIE and SULLIVAN 1994; JEFFS et al. 1994; SULLIVAN et al. 1994; PRITCHARD and SCHAEFFER 1997; RAMOS-ONSINS and AGUADE 1998). Two Adh Drosophila genes originally identified as pseudogenes (FISHER and MANIATIS 1985; JEFFS and ASHBURNER 1991) were later considered to be novel functional genes (LONG and LANGLEY 1993; BEGUN 1997).

The unusual patterns of pseudogene evolution suggesting some functional constraints have also been revealed in other organisms. Moreover, it was shown that a small number of detrimental alterations is a common feature of pseudogenes; there are many examples of extremely conserved pseudogene sequences exhibiting 90% and higher homology with functional counterparts (for a review, see VANIN 1985; WILDE 1986; MIGHELL et al. 2000). The extent of similarity between a pseudogene and its functional counterpart could be sharply nonuniform along the sequence (e.g., SUDO et al. 1990; MATTERS and GOODENOUGH 1992; JOHN et al. 1996).

By definition, pseudogenes should be transcriptionally and translationally silent; however, nonfunctional (or functional only in some cases) transcripts of many pseudogenes have been described (e.g., FOTAKI and IATROU 1988; SORGE et al. 1990; NGUYEN et al. 1991; ZHOU et al. 1992; CRISTIANO et al. 1993; BARD et al. 1995; FÜRBASS and VANSELOW 1995; for a review and further references, see VANIN 1985; WILDE 1986; MIGHELL et al. 2000). In some cases, translation activity was shown in vivo (MCCARREY and THOMAS 1987; BRISTOW et al. 1993) and in vitro (MISRA-PRESS et al. 1994). SORGE et al. (1990) also detected considerable subject-to-subject variation in the relative amounts of transcripts derived from gene and pseudogene that could be due to polymorphisms in the promoter regions of the pseudogene or to polymorphisms affecting stability of pseudogene-derived mRNA. This situation is reminiscent of the ψEst-6 putative pseudogene, for which stop codons were detected only in some lines of D. melanogaster.

Pattern of recombination:
Disruption of homologous sequences by insertion or nucleotide polymorphisms can significantly reduce recombination frequencies. Even individual nucleotide substitutions have been shown to affect recombination (e.g., SELVA et al. 1995; LUKACSOVICH and WALDMAN 1999). In comparison with recombination between alleles lacking large insertion polymorphisms, recombination within the maize a1 gene (XU et al. 1995) is inhibited when one allele with a transposon insertion is paired with a second allele lacking an insertion. In this respect, we note that the insertion of the mdg-3 retrotransposon within the intron of the ψEst-6 putative pseudogene has been detected in one D. melanogaster strain studied here (S-438S) and also in strain 12I-11.2 analyzed by GAME and OAKESHOTT (1990).

Recombination is not a random process; recombination hotspots caused by specific initiating sequences are reported for many gene systems. The unprecedented evolutionary stability of simple repeats promoting recombination in the expressed mammalian MHC-DRB genes was detected in some specific genome locations (review in SCHWAIGER and EPPLEN 1995). In mammalian DRB pseudogenes, however, the simple repeat stretch seems to lose its characteristic pattern gradually over the course of evolution (LARHAMMAR et al. 1985; RIESS et al. 1990; SCHWAIGER and EPPLEN 1995). BLISKOVSKII et al. (1993) described the structure of the human son processed pseudogene, which has 96% homology with the son functional gene. Despite the high sequence homology, the son pseudogene lacks five monomers of the perfect tandem repeat area, which, it has been suggested, are associated with the initiation of recombination processes (JEFFREYS et al. 1985).

In several yeast and maize genes, the sequence signals initiating recombination often occur within the promoter but not within the gene itself (WHITE et al. 1993; FAN et al. 1995; XU et al. 1995). The promoter of the ψEst-6 putative pseudogene is limited to 193 bp consisting of the intergenic region between Est-6 and ψEst-6. It might be that the obvious reduction in recombination is connected with the promoter truncation of ψEst-6. The sequences promoting recombination could also have been eroded within ψEst-6 by stochastic accumulation of mutations, as in the case of the HLA-DRB (LARHAMMAR et al. 1985; RIESS et al. 1990; SCHWAIGER and EPPLEN 1995) and son (BLISKOVSKII et al. 1993) pseudogenes.

Pseudogene function:
A possible role for pseudogenes in development as a source of intracellular inhibitors was suggested by MCCARREY and RIGGS (1986). It has also been suggested that pseudogenes may have regulatory roles for the genes from which they have been derived (FOTAKI and IATROU 1988; INOUYE 1988; ZHOU et al. 1992; TROYANOVSKY and LEUBE 1994). HEALY et al. (1996) have shown that 3' sequences within the ψEst-6 transcription unit contain elements that modulate the expression of Est-6.

A functional role has been proposed and, in some cases, clearly demonstrated for pseudogenes in the diversity of the vertebrate immune response (e.g., REYNAUD et al. 1987; KNIGHT 1992; VARGAS-MADRAZO et al. 1995; SAYEGH et al. 1999). Immunoglobulin gene diversity is generated by somatic gene conversion events in which sequences derived from pseudogenes are integrated into functional germ-line genes. Gene conversion events have also been reported in several bacterial pathogens as a mechanism for generating antigenic variation (of sequence diversity in the expressed genes; e.g., THON et al. 1989; ZHANG et al. 1997; NOORMOHAMMADI et al. 2000; BRAYTON et al. 2001). It has been suggested that human olfactory receptor (OR) pseudogenes might be important for the generation and maintenance of diversity (GLUSMAN et al. 2000). While OR pseudogenes have lost coding function, they are apparently under new evolutionary constraints; OR pseudogenes adopt noncoding functions as CpG islands (GLUSMAN et al. 2000), enhancers (BUETTNER et al. 1998), and matrix attachment regions (GIMELBRANT and MCCLINTOCK 1997). While pseudogenes are generally defined as nonfunctional, the above examples serve to challenge this characterization.

Conclusions:
We have detected some contrasting characteristics of nucleotide variation in the Est-6 gene and the ψEst-6 putative pseudogene. The level of total nucleotide variation is 2.1 times higher in ψEst-6 than in Est-6. The population recombination rate is at least 2.6 times lower in ψEst-6 than in Est-6. As a consequence, linkage disequilibrium is more pronounced in ψEst-6 than in Est-6. The haplotype structure of ψEst-6 is dimorphic. However, the divergent sequences of ψEst-6 are not perfectly associated with Est-6 allozyme variation. Some of the detected features of ψEst-6 indicate that it could be a pseudogene: 11 premature stop codons out of 28 strains are hardly compatible with functionality of the encoded protein. The level of nonsynonymous variation is 3.0 times higher in ψEst-6 than in Est-6. The results of the structural entropy analysis reveal a lower structural regularity and a higher structural divergence for ψEst-6, in accordance with expectations if it is a pseudogene or nonfunctional gene. On the other hand, it has been shown that the gene is expressed (COLLET et al. 1990) and that some alleles of ψEst-6 produce a catalytically active esterase (DUMANCIC et al. 1997). On the basis of the results we have presented and the review of the literature about pseudogene nucleotide variation, we conclude that there is no sharp division between genes and pseudogenes with respect to function. A pseudogene may lose some specific function but retain or acquire another, which may not be easily recognizable. There are many examples of functional or "active" pseudogenes, a statement that would amount to a genetic oxymoron if pseudogenes are defined as nonfunctional.
Taking into account that the function of ψEst-6 is not known (but could exist and may be discovered in the future), we suggest that the term potogene be used for ψEst-6, following the terminology of BROSIUS and GOULD (1992), reflecting that the status of this gene is not certain at present. These authors pointed out that the products of gene duplication, including those that become pseudogenes, may eventually acquire distinctive functions, and thus might be called potogenes to call attention to their potentiality for becoming new genes or acquiring new functions. The distinctive features of ψEst-6, including its pattern of population variation, suggest that it may already have some functional role (for instance, as a reservoir of sequences that can recombine with the expressed Est-6 gene), but its designation as a potogene would imply that the function is not known and far from confirmed, although the potentiality exists.
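
As a rough check on the fold difference in nucleotide variation quoted above, the following Python sketch recomputes the ratio from the π values reported in the abstract (0.0122 for the putative pseudogene, 0.0057 for Est-6). It also illustrates, on a toy alignment invented purely for this example, how π can be computed as the average number of pairwise differences per site. The function name, variable names, and toy sequences are assumptions made for illustration; this is not the software used in the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) for
    aligned, gap-free sequences of equal length."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment (NOT data from the paper).
toy = ["ACGT", "ACGA", "TCGA"]
toy_pi = nucleotide_diversity(toy)  # 4 differences / (3 pairs * 4 sites)

# pi values as reported in the abstract.
PI_EST6 = 0.0057       # functional Est-6 gene
PI_PSI_EST6 = 0.0122   # putative pseudogene
ratio = PI_PSI_EST6 / PI_EST6

print(f"toy pi = {toy_pi:.3f}; pseudogene/gene pi ratio = {ratio:.1f}")
```

The ratio rounds to 2.1, in agreement with the figure stated in the Conclusions.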


*  ACKNOWLEDGMENTS

We are grateful to S. A. Sawyer, G. McVean, D. A. Filatov, J. K. Kelly, J. H. McDonald, J. D. Wall, J. M. Comeron, F. Depaulis, and J. Rozas for useful advice on analyses and for providing computer programs. We thank Elena Balakireva, Andrei Tatarenkov, Victor DeFilippis, Martina Zurovkova, and Carlos Márquez for encouragement and help; and W. M. Fitch, B. Gaut, R. R. Hudson, A. Long, and two anonymous reviewers for detailed and valuable comments. This work is supported by National Institutes of Health grant GM42397 to F. J. Ayala.

Manuscript received December 16, 2002; Accepted for publication February 21, 2003.


*  LITERATURE CITED

BALAKIREV, E. S. and F. J. AYALA, 1996  Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511-1518.

BALAKIREV, E. S., E. I. BALAKIREV, F. RODRIGUEZ-TRELLES, and F. J. AYALA, 1999  Molecular evolution of two linked genes, Est-6 and Sod, in Drosophila melanogaster. Genetics 153:1357-1369.

BALAKIREV, E. S., E. I. BALAKIREV, and F. J. AYALA, 2002  Molecular evolution of the Est-6 gene in Drosophila melanogaster: contrasting patterns of DNA variability in adjacent functional regions. Gene 288:167-177.

BARD, J. A., S. P. NAWOSCHIK, B. F. O'DOWD, S. R. GEORGE, and T. A. BRANCHEK et al., 1995  The human serotonin 5-hydroxytryptamine1D receptor pseudogene is transcribed. Gene 153:295-296.

BEGUN, D., 1997  Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics 145:375-382.

BEGUN, D. J. and C. F. AQUADRO, 1992  Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356:519-520.

BLISKOVSKII, V. V., A. V. KIRILLOV, K. S. SPIRIN, V. M. ZAKHAREV, and I. M. CHUMAKOV, 1993  Son pseudogene does not contain five repeated elements of the area of perfect tandem repeats present in the homologous son gene sequence. Mol. Biol. 27:61-68.

BRAYTON, K. A., D. P. KNOWLES, T. C. MCGUIRE, and G. H. PALMER, 2001  Efficient use of a small genome to generate antigenic diversity in tick-borne ehrlichial pathogens. Proc. Natl. Acad. Sci. USA 98:4130-4135.

BRISTOW, J., S. E. GITELMAN, M. K. TEE, B. STAELS, and W. L. MILLER, 1993  Abundant adrenal-specific transcription of the human P450c21A "pseudogene". J. Biol. Chem. 268:12919-12924.

BROSIUS, J. and S. J. GOULD, 1992  On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk" DNA. Proc. Natl. Acad. Sci. USA 89:10706-10710.

BUETTNER, J. A., G. GLUSMAN, N. BEN-ARIE, P. RAMOS, and D. LANCET et al., 1998  Organization and evolution of olfactory receptor genes on human chromosome 11. Genomics 53:56-68.

CHECHETKIN, V. R. and V. V. LOBZIN, 1996  Levels of ordering in coding and non-coding regions of DNA sequences. Phys. Lett. A 222:354-360.

CHECHETKIN, V. R. and V. V. LOBZIN, 1998  Study of correlations in segmented DNA sequences: application to structural coupling between exons and introns. J. Theor. Biol. 190:69-83.

CHECHETKIN, V. R. and A. Y. TURYGIN, 1994  On the spectral criteria of disorder in non-periodic sequences: application to inflation models, symbolic dynamics and DNA sequences. J. Phys. A Math. Gen. 27:4875-4898.

CHECHETKIN, V. R. and A. Y. TURYGIN, 1996  Study of correlations in DNA sequences. J. Theor. Biol. 178:205-217.

CHECHETKIN, V. R., L. A. KNIZHNIKOVA, and A. Y. TURYGIN, 1994  Three-quasiperiodicity, mutual correlations, ordering and long-range modulations in genomic nucleotide sequences for viruses. J. Biomol. Struct. Dyn. 12:271-299.

COLLET, C., K. M. NIELSEN, R. J. RUSSELL, M. KARL, and J. G. OAKESHOTT et al., 1990  Molecular analysis of duplicated esterase genes in Drosophila melanogaster. Mol. Biol. Evol. 7:9-28.

COOKE, P. H. and J. G. OAKESHOTT, 1989  Amino acid polymorphisms for esterase-6 in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 86:1426-1430.

CRISTIANO, R. J., S. J. GIORDANO, and A. W. STEGGLES, 1993  The isolation and characterization of the bovine cytochrome b5 gene, and a transcribed pseudogene. Genomics 17:348-354.

CURRIE, P. D. and D. T. SULLIVAN, 1994  Structure, expression and duplication of genes which encode phosphoglyceromutase of Drosophila melanogaster. Genetics 138:352-363.

DUMANCIC, M. M., J. G. OAKESHOTT, R. J. RUSSELL, and M. J. HEALY, 1997  Characterization of the EstP protein in Drosophila melanogaster and its conservation in Drosophilids. Biochem. Genet. 35:251-271.

FAN, Q., F. XU, and T. PETES, 1995  Meiosis-specific double-strand breaks at the HIS4 recombination hot spot in the yeast Saccharomyces cerevisiae control in cis and trans. Mol. Cell. Biol. 15:1679-1688.

FICKETT, J. W. and C.-S. TUNG, 1992  Assessment of protein coding measures. Nucleic Acids Res. 20:6441-6450.

FILATOV, D. A. and D. CHARLESWORTH, 1999  DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-1434.

FISHER, J. A. and T. MANIATIS, 1985  Structure and transcription of the Drosophila mulleri alcohol dehydrogenase genes. Nucleic Acids Res. 13:6899-6917.

FOTAKI, M. E. and K. IATROU, 1988  Identification of a transcriptionally active pseudogene in the chorion locus of the silkmoth Bombyx mori. Regional sequence conservation and biological function. J. Mol. Biol. 203:849-860.

FÜRBASS, R. and J. VANSELOW, 1995  An aromatase pseudogene is transcribed in the bovine placenta. Gene 154:287-291.

GAME, A. Y. and J. G. OAKESHOTT, 1990  Associations between restriction site polymorphism and enzyme activity variation for esterase 6 in Drosophila melanogaster. Genetics 126:1021-1031.

GIMELBRANT, A. A. and T. S. MCCLINTOCK, 1997  A nuclear matrix attachment region is highly homologous to a conserved domain of olfactory receptors. J. Mol. Neurosci. 9:61-63.

GLUSMAN, G., A. SOSINSKY, E. BEN-ASHER, N. AVIDAN, and D. SONKIN et al., 2000  Sequence, structure, and evolution of a complete human olfactory receptor gene cluster. Genomics 63:227-245.

GOSS, P. J. E. and R. C. LEWONTIN, 1996  Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143:589-602.

GROMKO, M. H., D. F. GILBERT and R. C. RICHMOND, 1984 Sperm transfer and use in the multiple mating system of Drosophila, pp. 371–426 in Sperm Competition and the Evolution of Animal Mating Systems, edited by R. L. SMITH. Academic Press, New York.

HASSON, E. and W. F. EANES, 1996  Contrasting histories of three gene regions associated with In(3L)Payne of Drosophila melanogaster. Genetics 144:1565-1575.

HEALY, M. J., M. M. DUMANCIC, and J. G. OAKESHOTT, 1991  Biochemical and physiological studies of soluble esterases from Drosophila melanogaster. Biochem. Genet. 29:365-388.

HEALY, M. J., M. M. DUMANCIC, A. CAO, and J. G. OAKESHOTT, 1996  Localization of sequences regulating ancestral and acquired sites of esterase 6 activity in Drosophila melanogaster. Mol. Biol. Evol. 13:784-797.

HEY, J. and J. WAKELEY, 1997  A coalescent estimator of the population recombination rate. Genetics 145:833-846.

HOLSTE, D., O. WEISS, I. GROSSE, and H. HERZEL, 2000  Are noncoding sequences of Rickettsia prowazekii remnants of "neutralized" genes? J. Mol. Evol. 51:353-362.

HUDSON, R. R., 1987  Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.

HUDSON, R. R., 1990  Gene genealogies and the coalescent process. Oxf. Surv. Biol. 7:1-44.

HUDSON, R. R. and N. KAPLAN, 1985  Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.

HUDSON, R. R. and N. KAPLAN, 1988  The coalescent process in models with selection and recombination. Genetics 120:831-840.

HUDSON, R. R., D. BOOS, and N. L. KAPLAN, 1992  A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138-151.

INOUYE, M., 1988  Antisense RNA: its functions and applications in gene regulation—a review. Gene 72:25-34.

JEFFREYS, A. J., V. WILSON