Genetics, Vol. 166, 1845-1856, April 2004, Copyright © 2004

Nucleotide Variation in the tinman and bagpipe Homeobox Genes of Drosophila melanogaster

Evgeniy S. Balakireva,b,c and Francisco J. Ayalaa
a Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525,
b Institute of Marine Biology, Vladivostok 690041, Russia
c Academy of Ecology, Marine Biology, and Biotechnology, Far Eastern State University, Vladivostok 690600, Russia

Corresponding author: Francisco J. Ayala, 321 Steinhaus Hall, University of California, Irvine, CA 92697-2525., fjayala{at}uci.edu (E-mail)

Communicating editor: L. HARSHMAN


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

The tinman (tin) and bagpipe (bap) genes are members of the NK homeobox gene family of Drosophila, so that tin occupies a higher position than bap in the regulatory hierarchy. Little is known about the level and pattern of genetic polymorphism in homeobox genes. We have analyzed nucleotide polymorphism in 27 strains of Drosophila melanogaster and one each of D. simulans and D. sechellia, within two closely linked regions encompassing a partial sequence of tin and the complete sequence of bap. The two genes exhibit different levels and patterns of nucleotide diversity. Two sets of sharply divergent sequence types are detected for tin. The haplotype structure of bap is more complex: about half of the sequences are identical (or virtually so), while the rest are fairly heterogeneous. The level of silent nucleotide variability is 0.0063 for tin but significantly higher, 0.0141, for bap, a level of polymorphism comparable to the most polymorphic structural genes of D. melanogaster. Recombination rate and gene conversion are also higher for bap than for tin. There is strong linkage disequilibrium, with the highest values in the introns of both genes and exon II of bap. The patterns of polymorphism in tin and bap are not compatible with an equilibrium model of selective neutrality. We suggest that negative selection and demographic history are the major factors shaping the pattern of nucleotide polymorphism in the tin and bap genes; moreover, there are clear indications of positive selection in the bap gene.


THE population polymorphism of homeobox genes in Drosophila has received scant attention. BEGUN et al. 1994 Down investigated restriction-map polymorphism in the cut locus region of D. melanogaster and D. simulans. The amount of nucleotide variation was in both species about one-fourth of the average amount in comparable restriction-map studies of gene regions with normal recombination (AQUADRO 1992 Down), even though the gene is located in a genomic region with "normal" recombination rate. BEGUN et al. 1994 Down suggested that selective sweeps associated with some closely linked gene(s) might account for the decreased variability of cut. BAINES et al. 2002 Down investigated nucleotide polymorphism in the D. melanogaster bicoid region. The level of silent polymorphism for the noncoding region was lower than typical values observed in D. melanogaster (MORIYAMA and POWELL 1996 Down); silent diversity in the coding region was also low ({pi} = 0.0002). An interesting feature of the bicoid coding region variability was that 6 of 7 polymorphic sites involved replacement polymorphisms, which could indicate a relaxation of selective constraints in this region (BAINES et al. 2002 Down). A significant excess of intraspecific replacement polymorphism (16 of 21) was also observed in the Arabidopsis thaliana CAULIFLOWER homeotic gene, which was in this case attributed to positive selection (PURUGGANAN and SUDDITH 1998 Down). Nucleotide polymorphism has been investigated also for the even-skipped (LUDWIG and KREITMAN 1995 Down), Ultrabithorax (GIBSON and HOGNESS 1996 Down), and OdysseusH (TING et al. 1998 Down) Drosophila homeobox genes. The data obtained for these genes, however, involved only interspecific comparisons or only very few sequences, so that intraspecific variability could not be estimated. On the average, Drosophila regulatory loci display reduced levels of diversity (less than half) compared to structural genes (MORIYAMA and POWELL 1996 Down). In contrast, the level of polymorphism is about the same in some plant regulatory genes as in structural loci (PURUGGANAN and SUDDITH 1999 Down; PURUGGANAN 2000 Down). However, recent data indicate that even adjacent regulatory genes can show a wide range in the level and patterns of sequence variation, which suggests that different members of a regulatory gene cluster may be subject to distinct evolutionary forces (LAWTON-RAUH et al. 2003 Down; SHEPARD and PURUGGANAN 2003 Down; see also PURUGGANAN and SUDDITH 1999 Down).

Previously, we have investigated nucleotide variability in the ß-esterase gene cluster located on the left arm of chromosome 3 of Drosophila melanogaster, at 68F7–69A1 in the cytogenetic map (BALAKIREV and AYALA 1996 Down, BALAKIREV and AYALA 2003A Down, BALAKIREV and AYALA 2003B Down, BALAKIREV and AYALA 2004 Down; BALAKIREV et al. 1999 Down, BALAKIREV et al. 2002 Down, BALAKIREV et al. 2003 Down; AYALA et al. 2002 Down). The cluster comprises two tandemly duplicated genes, Est-6 and {psi}Est-6 (originally named Est-P by COLLET et al. 1990 Down). We detected complex haplotype structures in both genes. We now present a population genetic analysis of the tinman (tin) and bagpipe (bap) homeobox genes in 27 strains of D. melanogaster.

The tin and bap genes are members of the NK homeobox gene family, which consists of several closely linked interacting regulatory genes, located on the right arm of D. melanogaster chromosome 3, at 93DE in the cytogenetic map (KIM and NIRENBERG 1989 Down; review in JAGLA et al. 2001 Down). The common cosmopolitan inversion In(3R) (89C2.3;96A1.19; LEMEUNIER and AULARD 1992 Down) encloses the tin and bap genes, but no third-chromosome inversions have been found segregating in the El Rio population studied in the present investigation (SMIT-MCBRIDE et al. 1988 Down and unpublished data from our laboratory). There are two exons (444 and 705 bp) in bap, but three (186, 609, and 453 bp) in tin. Correspondingly, there is one intron in bap (121 bp) and two (905 and 430 bp) in tin. The two genes are closely linked. The transcription start site of bap is 7.1 kb downstream of the 3'-flanking end of the tin gene. No open reading frame has been recorded in the region between the two genes (ADAMS et al. 2000 Down). Both genes are transcribed in the same direction. There is a hierarchical relationship between the two genes since bap activation in the dorsal mesoderm depends on tin, while tin does not require bap (AZPIAZU and FRASCH 1993 Down; BODMER 1993 Down). We have sequenced 3818 bp, which include the 5'-flanking region of tin (767 bp), the partial tin gene (exon I, 186 bp; intron I, 905 bp; and partial exon II, 570 bp), the 5'-flanking region of bap (64 bp), the complete bap gene (exon I, 444 bp; intron, 121 bp; and exon II, 705 bp), and the 3'-flanking region of bap (56 bp).


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Drosophila strains:
The 27 D. melanogaster strains derive from a random sample of wild flies collected by F. J. Ayala (October 1991) in El Rio Vineyard, Acampo, California. The strains were made fully homozygous for the third chromosome by crosses with balancer stocks, as described by SEAGER and AYALA 1982 Down. The 27 strains were previously investigated for ß-esterase gene cluster (see above). Strains 1–27 successively correspond to the following strains in BALAKIREV et al. 2003 Down: S-114S, F-517S, F-517F, F-531F, F-1461S, S-2588S, S-581F, S-255S, F-357F, F-611F, S-174F, S-501F, S-501S, S-1224F, S-377F, F-274F, S-512F, F-96S, US-255F, S-968F, S-521S, S-549S, F-775F, S-438S, S-26F, S-510S, and S-565F.

DNA extraction, amplification, and sequencing:
Total genomic DNA was extracted using the tissue protocol of the QIAamp tissue kit (QIAGEN, Valencia, CA). The published Celera genome sequence of D. melanogaster was used for designing PCR and sequencing primers (ADAMS et al. 2000 Down). The primers used for the tin PCR amplification reactions were 5'-acaaacgtaaatacatggggactat-3' (forward primer) and 5'-attgacacttaacgcaaggaaacag-3' (reverse primer). The primers used for the bap PCR amplification reactions were 5'-atcaatagaaatccgtagtgaac-3' (forward primer) and 5'-ttacacatagaggctaaaacac-3' (reverse primer). The PCR reactions were carried out in final volumes of 50 µl, using TaKaRa Ex Taq in accordance with the manufacturer's description (Takara Biotechnology, Berkeley, CA). The reaction mixtures were overlaid with mineral oil; placed in a DNA thermal cycler (Perkin-Elmer-Cetus, Norwalk, CT); incubated 5 min at 95°; and subjected to 30 cycles of denaturation, annealing, and extension: 95° for 30 sec, 60° for 30 sec, and 72° for 2.0 min (for the first cycle and progressively adding 3 sec at 72° for every subsequent cycle), with a final 7-min extension period at 72°. The PCR reactions were purified with the Wizard PCR preps DNA purification system (Promega, Madison, WI), directly sequenced by the dideoxy chain-termination technique using Dye Terminator chemistry, and separated with the ABI PRISM 377 automated DNA sequencer (Perkin-Elmer). For each strain, both strands were sequenced using six overlapping internal primers spaced, on average, 350 nucleotides apart. At least two independent PCR amplifications were sequenced for each polymorphic site in all D. melanogaster strains to prevent possible PCR or sequencing errors. The GenBank accession numbers for the sequences are AY368077, AY368078, AY368079, AY368080, AY368081, AY368082, AY368083, AY368084, AY368085, AY368086, AY368087, AY368088, AY368089, AY368090, AY368091, AY368092, AY368093, AY368094, AY368095, AY368096, AY368097, AY368098, AY368099, AY368100, AY368101, AY368102, AY368103, AY368104, AY368105, AY368106, AY368107, AY368108, AY368109 and AY369088, AY369089, AY369090, AY369091, AY369092, AY369093, AY369094, AY369095, AY369096, AY369097, AY369098, AY369099, AY369100, AY369101, AY369102, AY369103, AY369104, AY369105, AY369106, AY369107, AY369108, AY369109, AY369110, AY369111, AY369112, AY369113, AY369114, AY369115.

DNA sequence analysis:
Multiple alignment was carried out using the program CLUSTAL W (THOMPSON et al. 1994 Down). Linkage disequilibrium between polymorphic sites was evaluated using Fisher's exact test of independence. The computer programs DnaSP, version 3.4 (ROZAS and ROZAS 1999 Down) and PROSEQ, version 2.4 (FILATOV and CHARLESWORTH 1999 Down) were used to analyze the data by means of the "sliding-window" method (HUDSON and KAPLAN 1988 Down) and for most intraspecific analyses. Departures from neutral expectations were investigated using the tests of HUDSON et al. [1987; Hudson-Kreitman-Aguadé (HKA) test], HUDSON et al. 1994 Down(haplotype test), MCDONALD 1996 Down, MCDONALD 1998 Down, KELLY 1997 Down, and WALL 1999 Down. The permutation approach of HUDSON et al. 1992 Down was used to estimate the significance of sequence differences between haplotype families. The coalescent simulations (HUDSON 1983 Down, HUDSON 1990 Down) were performed with the PROSEQ program to estimate the probabilities of the observed values of Kelly's ZnS and Wall's B and Q statistics and confidence intervals for the nucleotide diversity values. The method of SAWYER 1989 Down, SAWYER 1999 Down was used to analyze intra- and intergenic conversion events.


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Nucleotide polymorphism and recombination:
Fig 1 shows a total of 69 polymorphic sites in a sample of 27 sequences of the tin and bap genes. There are five length polymorphisms: four in intron I of tin and one in exon II of bap. Some relevant statistics are given in Table 1. The {pi} value for the full sequence is 0.0056, which is within the range of values observed in highly recombining gene regions in D. melanogaster (MORIYAMA and POWELL 1996 Down; POWELL 1997 Down). The silent {pi} value is 0.0063 for tin but significantly higher (by coalescent simulation), 0.0140, for bap. The level of synonymous variation is 0.0052 in the tin coding region but three times higher, 0.0167, in the bap coding region (the difference is highly significant by coalescent simulation even without recombination, P = 0.01). The difference is less pronounced for nonsynonymous variation, which is 0.0011 in the tin gene and two times higher, 0.0021, in the bap gene (P > 0.05). Silent variability is three times higher in exon II ({pi} = 0.0224) than in exon I ({pi} = 0.0073) of bap; the difference is statistically significant by coalescent simulation, even without recombination (P = 0.01). The elevated level of nucleotide variability within the bap exon II cannot be accounted for by relaxation of functional constraints. This is because the bap exon II contains a functionally important homeodomain (183 bp at position 3136–3318, our coordinates), which is conserved within a wide phylogenetic scale (DUBOULE 1994 Down). There are four polymorphic sites within this bap homeodomain region (Fig 1, positions 3192, 3228, 3246, and 3255), with {pi} = 0.0051, which is slightly lower than that for the whole bap exon II ({pi} = 0.0063).



View larger version (47K):
In this window
In a new window
Download PPT slide
 
Figure 1. DNA polymorphism in the tin and bap genes of 27 strains of Drosophila melanogaster. The lines are numbered consecutively according to their genetic similarity. The numbers above the top sequence represent the position of segregating sites and the start of a deletion or insertion. Nucleotides are numbered from the beginning of our sequence, position 155034 for the tin and 165505 for the bap gene in ADAMS et al. 2000 Down. The coding regions of the genes are underlined below the top reference sequence. Amino acid replacement polymorphisms are marked with asterisks. Dots indicate the same nucleotide as the reference sequence. A hyphen represents deleted nucleotides. {blacktriangleup}, a deletion; {dagger},the absence of a deletion. {blacktriangleup}1, a 2-bp deletion of TT (position 252–253); {blacktriangleup}2, a 3-bp deletion of TCC (position 1007–1009); {blacktriangleup}3, a 101-bp deletion of TACCAAAATCTACTAAATCGAATCAGTGTACATATATGATATAAAACTAATTATTATACCGGTTAACTATCTAAATTGCAATGATTTAAGAAGGTATCTGT (position 1058–1158); {blacktriangleup}4, a 9-bp deletion of TCCTGCTTT (position 1842–1850). {blacktriangledown}, a 12-bp insertion of GAGCGGAGAGCG (position 3716–3727); {ddagger}, the absence of an insertion. The coordinates for the functional regions of the genes are: 1–767 (tin, 5'-flanking region), 768–953 (tin, exon I), 954–1858 (tin, intron I), 1859–2428 (tin, exon II), 2429–2492 (bap, 5'-flanking region), 2493–2936 (bap, exon I), 2937–3057 (bap, intron), 3058–3774 (bap, exon II), and 3775–3830 (bap, 3'flanking region). sec, D. sechellia; sim, D. simulans.


 
View this table:
In this window
In a new window

 
Table 1. Nucleotide diversity and divergence in the tin and bap genes of D. melanogaster

The silent variability of bap ({pi} = 0.0140) is close to that of one of the most polymorphic genes of D. melanogaster, Est-6 ({pi} = 0.0156), in the same North American population (BALAKIREV et al. 2002 Down; BALAKIREV and AYALA 2003A Down). The silent variability of bap exon II is very close to that of {psi}Est-6 ({pi} = 0.0253), which is almost twice as variable as Est-6 (BALAKIREV et al. 2003 Down; BALAKIREV and AYALA 2004 Down). The nucleotide variability detected in the tin gene is in the range of values observed in many other regulatory genes evolving under negative selection (MORIYAMA and POWELL 1996 Down; PURUGGANAN 2000 Down; RILEY et al. 2003 Down). The differences in the level of variability between tin and bap could be due to positive selection reflecting different degrees of functional constraint (see below).

The level of divergence, K, between D. melanogaster and D. simulans is similar for tin and bap (Table 1). The K/{pi} ratio is, however, substantially higher for tin in both the coding and noncoding regions: Kcod/{pi}cod is 10.67 for tin, but 5.29 for bap and Kncod/{pi}ncod is 9.28 for tin, but 3.85 for bap. These differences reflect that the pressure of purifying selection is higher for tin than for bap. Silent divergence is very similar in exon I (K = 0.0988) and exon II (K = 0.1012) of bap, despite the fact that the level of silent variability is significantly different in these coding regions, as noted above. The level of divergence between D. melanogaster and D. sechellia is very close to the level of divergence between D. melanogaster and D. simulans (Table 1); the only difference concerns nonsynonymous divergence, which is 1.4 and 1.7 times higher between D. melanogaster and D. sechellia than between D. melanogaster and D. simulans for the tin and bap genes, respectively.

The method of HUDSON and KAPLAN 1985 Down reveals a minimum of nine recombination events in the whole region analyzed. The minimum number of recombination events is six for bap but two for tin. There is a large difference (33 times) in the recombination estimator ({rho}, MCVEAN et al. 2002 Down) for tin ({rho} = 0.0008) and bap ({rho} = 0.0264) (Table 2). For the chromosomal region 93DE the laboratory estimate of recombination rate (Clab; COMERON et al. 1999 Down) based on the physical and genetic maps of D. melanogaster is 0.0744. A substantial difference between the laboratory-based and sequence-based estimates of recombination has been explained by a recent bottleneck (for review, see WALL 2001 Down). The large difference between the sequence-based estimates of recombination rate for the closely linked tin and bap genes (7.1 kb apart) is inconsistent with the demographic explanation of the difference between the laboratory-based and sequence-based estimates of recombination rate (ANDOLFATTO and PRZEWORSKI 2000 Down; WALL 2001 Down).


 
View this table:
In this window
In a new window

 
Table 2. Recombination estimates

The method of SAWYER 1989 Down, SAWYER 1999 Down detects gene conversion events within both tin (P = 0.0097) and bap (P = 0.0029). The number of significant fragments is 7 for tin (fragment length varies from 1149 to 1581 bp, average 1396 bp) but 49 for bap (fragment length varies from 323 to 1135 bp, average 665 bp). Intragenic conversion involves two tin regions (coordinates 751–1899 and 319–1899) in 6 sequences; for bap, intragenic conversion involves eight regions (coordinates 1–763, 1–1135, 250–763, 592–1135, 592–1303, 592–1402, 996–1318, and 996–1402) in 23 sequences. Thus, intragenic gene conversion is more pronounced within bap than within tin. Intergenic gene conversion is not detected, which is likely due to the low nucleotide similarity between tin and bap (42.7%), which is insufficient to satisfy the homology requirements for efficient intergenic conversion. The recombination machinery is sensitive even to a single-nucleotide mismatch; individual nucleotide substitutions have been shown to affect recombination in many organisms (for review, see BALAKIREV and AYALA 2003C Down).

Haplotype structure:
We have previously described the presence of two sets of haplotypes for the ß-esterase gene cluster of D. melanogaster, localized on the left arm of the third chromosome (BALAKIREV et al. 1999 Down, BALAKIREV et al. 2002 Down, BALAKIREV et al. 2003 Down). There is also strong haplotype structure for the tin and bap genes (Fig 1 Fig 2 Fig 3). For the tin gene, there are two sets of haplotypes, each completely associated with one of two deletions, 3 bp and 101 bp, within intron I (Fig 1, {blacktriangleup}2 and {blacktriangleup}3) and almost completely associated with the replacement polymorphism at position 1884 (Fig 1). The {blacktriangleup}2 deletion exists also in D. simulans and D. sechellia (Fig 1), which suggests that it is the ancestral condition, whereas the {blacktriangleup}3 deletion has appeared after the split of D. melanogaster from the other two species. Strong haplotype structure is also observed for the bap gene (Fig 1 and Fig 3). There is a set of 13 sequences, 12 of them identical to each other plus one (no. 10) that differs by a nonsynonymous substitution, and a second set of fairly heterogeneous sequences (nos. 14–27). The difference between the two tin haplotype sets (1–18 vs. 19–27) is highly significant (Kst* = 0.4692; Kst*0.999 = 0.1580, P < 0.001 by the permutation test of HUDSON et al. 1992 Down). However, the level of variability is not significantly different between the two tin haplotype sets ({pi} = 0.009 vs. {pi} = 0.0019).



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 2. Neighbor-joining tree of the tin haplotypes of D. melanogaster, based on Kimura's two-parameter distance. The numbers at the nodes are bootstrap percentage of probability values based on 10,000 replications.



View larger version (15K):
In this window
In a new window
Download PPT slide
 
Figure 3. Neighbor-joining tree of the bap haplotypes of D. melanogaster based on Kimura's two-parameter distance. The numbers at the nodes are bootstrap percentage of probability values based on 10,000 replications.

The homogeneous haplotype set of bap (strains 1–13, Fig 1 and Fig 3) has scant variability ({pi} = 0.0001). The heterogeneous set (strains 14–27) is significantly different from the first set (Kst* = 0.2852; Kst*0.999 = 0.1002, P < 0.001 by the permutation test) and has significantly greater variability ({pi} = 0.0089; P < 0.0001). The distribution of mutations in the bap gene is highly asymmetrical in the two haplotype sets: the heterogeneous set is 74 times more variable than the homogeneous haplotype set.

Sliding-window analysis:
Fig 4 shows sliding-window plots of the distribution of nucleotide polymorphism in D. melanogaster (Fig 4A), divergence between this species and D. simulans (Fig 4B), polymorphism-to-divergence ratio (Fig 4C), and linkage disequilibrium (Fig 4D). There are two distinct peaks of nucleotide variability in tin (Fig 4A), in the 5'-flanking region (midpoint coordinates 271–280) and intron (midpoint coordinates 992–1027). These peaks coincide with the peaks of divergence (Fig 4B) and linkage disequilibrium (Fig 4D). The highest peak of the polymorphism-to-divergence ratio within the tin gene, however, does not coincide with the peaks of variability and divergence (it is centered on the tin exon I, Fig 4C). There is also a peak of variability in the bap gene (Fig 4A) in the intron region (midpoint coordinates 2969–3003). This peak is not accompanied by a peak of divergence (Fig 4B); rather, there is an obvious decrease of divergence in the intron region of bap, which produces a peak in the silent divergence-to-polymorphism ratio (Fig 4C, midpoint coordinates 2700–2900). The peak in the bap intron coincides with a peak of linkage disequilibrium (Fig 4D). The highest peak of the silent divergence-to-polymorphism ratio is centered on the bap exon II (Fig 4C, midpoint coordinate 3200). The minimal values of polymorphism-to-divergence ratio within the bap gene are at the beginning of exon I and centered on the two replacement polymorphic sites (Fig 4C and Fig 5, positions 2676 and 2677). The lowest and highest polymorphism-to-divergence ratios are accompanied by the largest maximum and average sliding G values of the MCDONALD's (1996, 1998) tests (Fig 6).



View larger version (24K):
In this window
In a new window
Download PPT slide
 
Figure 4. Sliding-window plots of (A) nucleotide diversity (measured by {pi}), (B) silent nucleotide diversity (thin line) and divergence (K, thick line), (C) polymorphism-to-divergence ratio, and (D) linkage disequilibrium (measured by D) along the tin and bap genes of D. melanogaster. K is the average number of nucleotide substitutions per site between D. melanogaster and D. simulans. Window sizes are 100 nucleotides with 1-nucleotide increments for A, 50 nucleotides with 1-nucleotide increments for B, and 200 nucleotides with 50-nucleotide increments for D. Window size is 10 variable substitutions for the sliding-window plot of polymorphism-to-divergence ratio (C). A schematic of the region investigated is displayed at bottom. Exons are indicated by boxes; the introns and the 5'- and 3'-flanking regions are shown by thin lines.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 5. Sliding-window plot of synonymous polymorphism-to-divergence ratio in the tin and bap genes of D. melanogaster. A vertical line indicates the separation between tin and bap. D. simulans is used as an outgroup species. Window size is 10 variable substitutions.




View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 6. Sliding-window plot of the largest average sliding G value (A) and the largest maximum sliding G value (B) for the bap gene (synonymous variability). Window size is 13 variable substitutions for A and 15 for B.

The variants within the intron sequences of both genes segregate as single-haplotype blocks (Fig 1). Correspondingly there are obvious peaks of nucleotide variability and linkage disequilibrium in the introns (Fig 4A and Fig D). For the tin gene the intron peak of nucleotide variability also corresponds to the high peak of divergence but for the bap gene there is an opposite tendency: the peak of variability corresponds to the lowest level of the divergence (Fig 4A and Fig B). A similar tendency is observed for the bap exon II: the level of variability and divergence is close in this region (Fig 4B). Thus for the tin gene the amount of divergence between species is in accordance with the amount of intraspecific polymorphism but for the bap gene the pattern is different: a decrease of the divergence and a parallel increase of polymorphism in the intron and exon II regions that could be explained by the influence of positive selection (see below).

We have measured heterogeneity in the distribution of polymorphic sites along the sequence and discordance between the level of within-melanogaster polymorphism and the melanogaster-simulans (or melanogaster-sechellia) divergence by means of GOSS and LEWONTIN's (1996) and MCDONALD's (1996, 1998) statistics and have assessed their significance by Monte Carlo simulations of the coalescent model incorporating recombination (MCDONALD 1996 Down, MCDONALD 1998 Down; Table 3). On the basis of 10,000 simulations, with the recombination parameters varying from 1 to 64, the tests are not significant for the tin gene. However, the tests are significant for the bap gene (Table 3). There are two areas within the bap gene with the largest average and maximum sliding G values (Fig 6A and Fig B). The first (and most pronounced) area is located at the beginning of bap exon I and coincides with a region of low polymorphism-to-divergence ratio (Fig 5). The second area is located in bap exon II and coincides with a region of high polymorphism-to-divergence ratio. The region of low polymorphism-to-divergence ratio is centered on the two replacement substitutions (positions 2676 and 2677). The region of high polymorphism-to-divergence ratio is localized within exon II but is not centered on the replacement polymorphism (Fig 1 and Fig 4C). An area of low polymorphism could result from a selective sweep whereas high polymorphism could result from balancing selection (MCDONALD 1996 Down, MCDONALD 1998 Down). Previously, we have shown that both types of selection are involved in the evolution of the Est-6 gene of D. melanogaster (BALAKIREV et al. 2002 Down). We suggest that both types of selection are involved within the bap gene. To examine this suggestion one would analyze separately the polymorphism-to-divergence ratio in the regions with low and high polymorphism (that roughly correspond to exon I and exon II of bap), using the MCDONALD and KREITMAN 1991 Down and/or the HKA test (HUDSON et al. 1987 Down). However, both tests are hardly applicable in this case because there are only two silent polymorphic sites within exon I of bap. Thus, we cannot contrast the pattern of polymorphism-to-divergence in both exons. However, for bap exon II the HKA test reveals a higher polymorphism-to-divergence ratio in comparison with the complete tin gene ({chi}2 = 4.484, P = 0.034 if D. simulans is used as an outgroup and {chi}2 = 4.813, P = 0.028 if D. sechellia is used as an outgroup), which is in accordance with the possible action of positive selection on this region.


 
View this table:
In this window
In a new window

 
Table 3. Test statistics for the tin and bap genes of D. melanogaster

Linkage disequilibrium:
For the whole region there are 1378 pairwise comparisons and 514 (37.30%) of them are significant (Fig 7). With the Bonferroni correction, 156 (11.32%) remain significant. There is strong linkage disequilibrium within the tin gene: 83.69% (272 out of 325) pairwise comparisons are significant (55.69%, 181 with the Bonferroni correction). The significant associations are due mostly to polymorphic sites located within the tin noncoding regions (5'-flanking region and intron); only three exonic polymorphic sites (positions 1884, 2245, and 2284) are involved in significant associations (Fig 7). Within the bap gene 48.43% pairwise comparisons (170 out of 351) show statistically significant linkage disequilibrium (9.69%, 34 with the Bonferroni correction). The distribution of significant associations within bap is not homogeneous: more than half of the sites involved in significant associations are located within exon II. Between tin and bap genes only 5.22% (72 out of 1378) of tests are significant (Fig 7, shaded areas); none is significant with the Bonferroni correction.



View larger version (104K):
In this window
In a new window
Download PPT slide
 
Figure 7. Fisher's exact test of nonrandom associations between pairs of tin and bap polymorphisms. Singleton mutations are excluded from the analysis. Each box in the matrix represents the comparison of two polymorphic sites. The location of the segregating sites on the tin and bap genes is shown on the diagonal, which indicates the position of the 5'-flanking, coding, and 3'-flanking regions. The intergenic associations are shaded boxes. The associations that remain significant after Bonferroni correction are solid boxes. {ddagger}, P < 0.001; {dagger}, 0.001 < P < 0.01; {circ}, 0.01 < P < 0.05.

Within both the tin and bap genes there are strong associations between intronic sites (Fig 7). Clusters of significant linkage disequilibrium that occur predominantly in introns have been repeatedly observed in Drosophila (MIYASHITA and LANGLEY 1988 Down; MIYASHITA et al. 1993 Down; SCHAEFFER and MILLER 1993 Down). KIRBY et al. 1995 Down showed that the linkage disequilibria clustered within the introns of the Adh locus of D. pseudoobscura are caused by epistatic selection maintaining the secondary structures of precursor mRNA. The mechanism underlying the action of epistatic selection is based on a model of compensatory fitness interactions (KIMURA 1985 Down), which suggests that mutations occurring in RNA helices are individually deleterious but become neutral in appropriate combinations. STEPHAN 1996 Down has shown that the rate of compensatory evolution substantially decays over a distance of 100 nucleotides, which is in agreement with our results. Indeed, the distance between the intronic sites is <100 nucleotides for both tin (positions 975, 982, 1010, 1011, 1016, 1019, 1022, 1037, and 1045) and bap (positions 2954, 2999, 3001, 3005, 3011, and 3019). Thus the evolution of the intronic sequences of the tin and bap genes may be subjected to secondary structure constraints.

We have analyzed the relationship between linkage disequilibrium (LD; measured as D') and physical distance between sites by the method of MCVEAN et al. 2002 Down, with the significance of Pearson's correlation coefficient estimated by 10,000 permutations. There is a significant decline in LD with increasing distance for both tin (Pearson's correlation coefficient is –0.1432; P = 0.0070) and bap (–0.2189; P = 0.0009).

Tests of neutrality:
KELLY's (1997) ZnS test (Table 4; based on linkage disequilibrium between segregating sites) detects significant deviations from neutrality for the entire region with recombination 0.0072 (the value of recombination obtained by the method of MCVEAN et al. 2002 Down, Table 2). WALL's (1999) B and Q tests are significant even without recombination. The areas of significant values of Kelly's and Wall's statistics coincide with peaks of linkage disequilibrium within both tin and bap (not shown). For tin the tests are significant with a lower level of recombination than for bap. For instance, the ZnS statistic obtained for tin is significant (P = 0.01) with recombination rate C = 0.0072, while bap requires C = 0.020 (Table 4). Overall the tests are significant for the entire region as well as for each gene separately, with a recombination rate much lower than the laboratory estimate (Clab = 0.0744) based on the physical and genetic maps of D. melanogaster (J. M. COMERON, personal communication; COMERON et al. 1999 Down). We suggest that the significance of the tests could reflect the action of selection combined with the demographic history of D. melanogaster, which originated from Africa and migrated in relatively recent times to the rest of the world. The higher test statistics values for tin may reflect the specific character (rare recombination) of the evolution of this gene, as well as the demographic history of D. melanogaster.


 
View this table:
In this window
In a new window

 
Table 4. Tests of neutrality for the tin and bap genes of D. melanogaster

There are two sets of divergent haplotypes for the tin and bap genes (Fig 1). It is appropriate to use the haplotype test (HUDSON et al. 1994 Down) in this case to see whether directional selection has increased the frequency of some haplotypes. For tin, there are a total of 37 polymorphic sites and a subset of 18 sequences with 13 sites (Fig 1, strains 1–18). The haplotype test is not significant (P = 0.103) even with a laboratory estimate of recombination equal to 0.0744 (see above). A total of 32 polymorphic sites are in the sample of 27 bap sequences, but the set of homogeneous strains (1–13) includes just one polymorphic site (Fig 1 and Fig 3). The probability of this configuration, obtained by the haplotype test, is 0.002, even without recombination. The region of amino acid substitution between species (Fig 1) at the beginning of bap exon I may be a likely candidate for a selective sweep.

We have also used the neutrality tests of DEPAULIS and VEUILLE 1998 Down to analyze the haplotype distribution. The tests are not significant for the tin gene (as in the HUDSON et al. 1994 Down and MCDONALD 1996 Down, MCDONALD 1998 Down tests). The tests are significant for the bap gene when applied to the homogeneous and heterogeneous sets of sequences separately (the haplotype groups are the same as for the HUDSON et al. 1994 Down test). Particularly, for the homogeneous set of sequences, the observed haplotype diversity is 0.167, while the expected haplotype diversity is significantly higher (0.640, P < 0.05), which is compatible with the hypothesis of directional selection. For the heterogeneous set of sequences, this test reveals a significant excess of variability: the observed haplotype diversity is 0.952, but the expected haplotype diversity is 0.850 (P < 0.05), which is compatible with the hypothesis of diversifying selection. We have also applied the DEPAULIS and VEUILLE 1998 Down tests for different functional parts of the tin and bap genes. There is no deviation from neutral expectation for any partition (5'-flanking region, intron, and exon II) of the tin gene (exon I is excluded from these separate analyses because it is only 186 bp). For the bap gene, the tests are significant for exon I and exon II separately, but this significance is due to opposite patterns. For exon I, the test reveals a significant excess of different haplotypes: the observed number is 7, but only 4.9 haplotypes are expected (P < 0.05). For exon II, the test reveals a significant deficit of haplotypes: the observed diversity is 0.604, while the expected haplotype diversity is significantly higher (0.820, P < 0.05). These additional tests corroborate our suggestion that the bap gene is under the complex influence of positive selection (see above).


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

There is a significant difference in the level and pattern of nucleotide variability in tin and bap, two closely linked homeobox genes of D. melanogaster. The level of tin variability is within the range observed in other regulatory genes of Drosophila (MORIYAMA and POWELL 1996 Down; POWELL 1997 Down; BAINES et al. 2002 Down; RILEY et al. 2003 Down) and some other organisms (PURUGGANAN 2000 Down). The silent variability of bap is, however, significantly higher and close to the values observed for the most polymorphic Drosophila genes, such as Est-6 and {psi}Est-6 (BALAKIREV and AYALA 1996 Down, BALAKIREV and AYALA 2003A Down, BALAKIREV and AYALA 2003B Down, BALAKIREV and AYALA 2004 Down; BALAKIREV et al. 1999 Down, BALAKIREV et al. 2002 Down, BALAKIREV et al. 2003 Down; AYALA et al. 2002 Down). The pattern of nucleotide variability in tin and bap is not compatible with an equilibrium model of selective neutrality. We suggest that the colonizing and demographic history of D. melanogaster together with negative (purifying) selection may be the main factors shaping the observed patterns of nucleotide variability. The bap data suggest that positive selection may also contribute to the observed patterns: diversifying selection would have increased the level of nucleotide variation, while directional selection would account for the excess of nearly identical sequences. Positive selection in the bap gene is supported by significant HKA (HUDSON et al. 1987 Down), MCDONALD 1996 Down, MCDONALD 1998 Down, and haplotype (HUDSON et al. 1994 Down; DEPAULIS and VEUILLE 1998 Down) tests. We have previously suggested that the pattern of nucleotide variability of the Est-6 coding region is shaped by the influence of both directional and balancing selection (BALAKIREV et al. 1999 Down, BALAKIREV et al. 2002 Down; AYALA et al. 2002 Down). A similar account has been proposed for the Adh locus of A. thaliana (HANFSTINGL et al. 1994 Down), the Acp29AB gene of D. melanogaster (AGUADE 1999 Down), and regulatory gene TFL1 of A. thaliana (OLSEN et al. 2002 Down). The results of the neutrality tests must, nevertheless, be cautiously interpreted, given the modest-sized sample of sequences from a single population (SIMONSEN et al. 1995 Down). Moreover, there are nonselective factors that could partly account for the patterns of the tin and bap polymorphisms. Possible explanatory processes include bottlenecks and founding effects and/or population admixture, as well as varying recombination rates in different genomic regions. One way of distinguishing between selective and demographic processes could be to perform similar investigations in other populations of D. melanogaster.

The homeobox tin and bap genes are involved in recruiting cardioblasts and visceral muscle primordia from the mesodermal mass (AZPIAZU and FRASCH 1993 Down; BODMER 1993 Down). We have detected significant differences in the level and pattern of nucleotide variability between two closely linked genes that occupy different hierarchical positions in the interacting gene network. tin functions at the top of a genetic hierarchy, specifying the heart and the visceral mesoderm; tin is first expressed in all mesoderm and tin mutations abolish bap expression but not vice versa (AZPIAZU and FRASCH 1993 Down). The level of variability is significantly lower and the ratio of divergence-to-polymorphism is higher in tin than in bap. Also, silent divergence between D. melanogaster and D. simulans is higher in the introns than in the exons of tin, suggesting that selective constraints reduce the level of variation in the tin coding regions. The difference between synonymous and nonsynonymous divergence is less than half for the tin gene compared to that for bap, suggesting that negative selection is stronger in tin. But there is evidence that positive selection is involved in the molecular evolution of bap. Overall, it appears that the higher hierarchical position of tin is associated with lower genetic variability and negative selection, whereas the functionally dependent component of this interacting complex, bap, exhibits evidence of adaptive evolution. A similar relationship was observed between genes of the Ras-mediated signal transduction pathway of Drosophila (RILEY et al. 2003 Down).


*  ACKNOWLEDGMENTS

We are grateful to J. H. McDonald, G. McVean, D. A. Filatov, J. K. Kelly, J. D. Wall, J. M. Comeron, F. Depaulis, and J. Rozas for useful advice on analyses and for providing computer programs. We thank Elena Balakireva for encouragement and help. This work is supported by National Institutes of Health grant GM42397 to F. J. Ayala.

Manuscript received August 15, 2003; Accepted for publication January 9, 2004.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

ADAMS, M. D., S. E. CELNIKER, R. A. HOLT, C. A. EVANS, and J. D. GOCAYNE et al., 2000  The genome sequence of Drosophila melanogaster.. Science 287:2185-2195.[Abstract/Free Full Text]

AGUADÉ, M., 1999  Positive selection drives the evolution of the Acp29AB accessory gland protein in Drosophila. Genetics 152:543-551.[Abstract/Free Full Text]

ANDOLFATTO, P. and M. PRZEWORSKI, 2000  A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genetics 156:257-268.[Abstract/Free Full Text]

AQUADRO, C. F., 1992  Why is the genome variable? Insights from Drosophila. Trends Genet. 8:355-362.[Medline]

AYALA, F. J., E. S. BALAKIREV, and A. G. SÁEZ, 2002  Genetic polymorphism at two linked loci, Sod and Est-6, in Drosophila melanogaster.. Gene 300:19-29.[CrossRef][Medline]

AZPIAZU, N. and M. FRASCH, 1993  tinman and bagpipe: two homeo box genes that determine cell fates in the dorsal mesoderm of Drosophila.. Genes Dev. 7:1325-1340.[Abstract/Free Full Text]

BAINES, J. F., Y. CHEN, A. DAS, and W. STEPHAN, 2002  DNA sequence variation at a duplicated gene: excess of replacement polymorphism and extensive haplotype structure in the Drosophila melanogaster bicoid region. Mol. Biol. Evol. 19:989-998.[Abstract/Free Full Text]

BALAKIREV, E. S. and F. J. AYALA, 1996  Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511-1518.[Abstract]

BALAKIREV, E. S. and F. J. AYALA, 2003a  Nucleotide variation of the Est-6 gene region in natural populations of Drosophila melanogaster.. Genetics 165:1901-1914.[Abstract/Free Full Text]

BALAKIREV, E. S. and F. J. AYALA, 2003b  Molecular population genetics of the ß-esterase gene cluster of Drosophila melanogaster.. J. Genet. 82:101-117.[CrossRef][Medline]

BALAKIREV, E. S. and F. J. AYALA, 2003c  Pseudogenes: Are they "junk" or functional DNA? Annu. Rev. Genet. 37:123-151.[CrossRef][Medline]

BALAKIREV, E. S. and F. J. AYALA, 2004  The ß-esterase gene cluster of Drosophila melanogaster: Is {psi}Est-6 a pseudogene, a functional gene, or a potogene? Genetica 121(in press).

BALAKIREV, E. S., E. I. BALAKIREV, F. RODRIGUEZ-TRELLES, and F. J. AYALA, 1999  Molecular evolution of two linked genes, Est-6 and Sod, in Drosophila melanogaster.. Genetics 153:1357-1369.[Abstract/Free Full Text]

BALAKIREV, E. S., E. I. BALAKIREV, and F. J. AYALA, 2002  Molecular evolution of the Est-6 gene in Drosophila melanogaster: contrasting patterns of DNA variability in adjacent functional regions. Gene 288:167-177.[CrossRef][Medline]

BALAKIREV, E. S., V. R. CHECHETKIN, V. V. LOBZIN, and F. J. AYALA, 2003  DNA polymorphism in the ß-esterase gene cluster of Drosophila melanogaster.. Genetics 164:533-544.[Abstract/Free Full Text]

BEGUN, D. J., S. N. BOYER, and C. F. AQUADRO, 1994  cut locus variation in natural populations of Drosophila.. Mol. Biol. Evol. 11:806-809.[Medline]

BODMER, R., 1993  The gene tinman is required for specification of the heart and visceral muscles in Drosophila.. Development 118:719-729.[Abstract]

COLLET, C., K. M. NIELSEN, R. J. RUSSELL, M. KARL, and J. G. OAKESHOTT et al., 1990  Molecular analysis of duplicated esterase genes in Drosophila melanogaster.. Mol. Biol. Evol. 7:9-28.[Abstract]

COMERON, J. M., M. KREITMAN, and M. AGUADÉ, 1999  Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239-249.[Abstract/Free Full Text]

DEPAULIS, F. and M. VEUILLE, 1998  Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.[Medline]

DUBOULE, D. (Editor), 1994 Guidebook to the Homeobox Genes. Sambrook & Tooze Publications/Oxford University Press, London/New York/Oxford.

FILATOV, D. A. and D. CHARLESWORTH, 1999  DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-1434.[Abstract/Free Full Text]

GIBSON, G. and D. S. HOGNESS, 1996  Effect of polymorphisms in the Drosophila regulatory gene Ubx on homeotic stability. Science 271:200-203.[Abstract]

GOSS, P. J. E. and R. C. LEWONTIN, 1996  Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143:589-602.[Abstract]

HANFSTINGL, U., A. BERRY, E. A. KELLOGG, J. T. COSTA, III, and W. RÜDIGER et al., 1994  Haplotype divergence coupled with lack of diversity at the Arabidopsis thaliana alcohol dehydrogenase locus: Roles for both balancing and directional selection? Genetics 138:811-828.[Abstract]

HUDSON, R. R., 1983  Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.[CrossRef][Medline]

HUDSON, R. R., 1990  Gene genealogies and the coalescent process. Oxf. Surv. Biol. 7:1-44.

HUDSON, R. R. and N. KAPLAN, 1985  Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.[Abstract/Free Full Text]

HUDSON, R. R. and N. KAPLAN, 1988  The coalescent process in models with selection and recombination. Genetics 120:831-840.[Abstract/Free Full Text]

HUDSON, R. R., M. KREITMAN, and M. AGUADÉ, 1987  A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.[Abstract/Free Full Text]

HUDSON, R. R., D. BOOS, and N. L. KAPLAN, 1992  A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9:138-151.[Abstract]

HUDSON, R. R., K. BAILEY, D. SKARECKY, J. KWIATOWSKI, and F. J. AYALA, 1994  Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster.. Genetics 136:1329-1340.[Abstract]

JAGLA, K., M. BELLARD, and M. FRASCH, 2001  A cluster of Drosophila homeobox genes involved in mesoderm differentiation program. BioEssays 23:125-133.[CrossRef][Medline]

JUKES, T. H., and C. R. CANTOR, 1969 Evolution of protein molecules, pp. 21–120 in Mammalian Protein Metabolism, edited by H. M. MUNRO. Academic Press, New York.

KELLY, J. K., 1997  A test of neutrality based on interlocus associations. Genetics 146:1197-1206.[Abstract]

KIM, Y. and M. NIRENBERG, 1989  Drosophila NK-homeobox genes. Proc. Natl. Acad. Sci. USA 86:7716-7720.[Abstract/Free Full Text]

KIMURA, M., 1985  The role of compensatory neutral mutations in molecular evolution. J. Genet. 64:7-19.

KIRBY, D. A., S. V. MUSE, and W. STEPHAN, 1995  Maintenance of pre-mRNA secondary structure by epistatic selection. Proc. Natl. Acad. USA 92:9047-9051.[Abstract/Free Full Text]

LAWTON-RAUH, A., R. H. ROBICHAUX, and M. D. PURUGGANAN, 2003  Patterns of nucleotide variation in homeologous regulatory genes in the allotetraploid Hawaiian silversword alliance (Asteraceae). Mol. Ecol. 12:1301-1313.[CrossRef][Medline]

LEMEUNIER, F., and S. AULARD, 1992 Inversion polymorphism in Drosophila melanogaster, pp. 339–405 in Drosophila Inversion Polymorphism, edited by C. B. KRIMBAS and J. R. POWELL. CRC Press, Cleveland.

LUDWIG, M. Z. and M. KREITMAN, 1995  Evolutionary dynamics of the enhancer region of even-skipped in Drosophila.. Mol. Biol. Evol. 12:1002-1011.[Abstract]

MCDONALD, J. H., 1996  Detecting non-neutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 13:253-260.[Abstract]

MCDONALD, J. H., 1998  Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 15:377-384.[Abstract]

MCDONALD, J. H. and M. KREITMAN, 1991  Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.[CrossRef][Medline]

MCVEAN, G., P. AWADALLA, and P. FEARNHEAD, 2002  A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241.[Abstract/Free Full Text]

MIYASHITA, N. and C. H. LANGLEY, 1988  Molecular and phenotypic variation of the white locus region in Drosophila melanogaster.. Genetics 120:199-212.[Abstract/Free Full Text]

MIYASHITA, N., M. AGUADÉ, and C. H. LANGLEY, 1993  Linkage disequilibrium in the white locus region of Drosophila melanogaster.. Genet. Res. 62:101-109.[Medline]

MORIYAMA, E. N. and J. R. POWELL, 1996  Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261-277.[Abstract]

NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Pr