Genetics, Vol. 156, 385-399, September 2000, Copyright © 2000

Contrasting Patterns of Nonneutral Evolution in Proteins Encoded in Nuclear and Mitochondrial Genomes

Daniel M. Weinreich (a) and David M. Rand (a)
(a) Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island 02912

Corresponding author: Daniel M. Weinreich, Department of Biology, Muir Bldg., University of California, 9500 Gilman Dr., San Diego, CA 92093. E-mail: dmw{at}ucsd.edu

Communicating editor: A. G. CLARK


ABSTRACT

We report that patterns of nonneutral DNA sequence evolution among published nuclear and mitochondrially encoded protein-coding loci differ significantly in animals. Whereas an apparent excess of amino acid polymorphism is seen in most (25/31) mitochondrial genes, this pattern is seen in fewer than half (15/36) of the nuclear data sets. This differentiation is even greater among data sets with significant departures from neutrality (14/15 vs. 1/6). Using forward simulations, we examined patterns of nonneutral evolution using parameters chosen to mimic the differences between mitochondrial and nuclear genetics (we varied recombination rate, population size, mutation rate, selective dominance, and intensity of germ line bottleneck). Patterns of evolution were correlated only with effective population size and strength of selection, and no single genetic factor explains the empirical contrast in patterns. We further report that in Arabidopsis thaliana, a highly self-fertilizing plant with effectively low recombination, five of six published nuclear data sets also exhibit an excess of amino acid polymorphism. We suggest that the contrast between nuclear and mitochondrial nonneutrality in animals stems from differences in rates of recombination in conjunction with a distribution of selective effects. If the majority of mutations segregating in populations are deleterious, high linkage may hinder the spread of the occasional beneficial mutation.


SINCE the introduction of DNA sequencing technology to population genetics (KREITMAN 1983), many protein-coding loci have been examined in many species. A wide range of patterns of polymorphism and divergence, consistent with a variety of selective processes, have been described in nuclear genes (BROOKFIELD and SHARP 1994; KREITMAN and AKASHI 1995). The elimination of strongly deleterious mutations by the action of purifying selection appears to be a ubiquitous selective force (KIMURA 1983; KREITMAN 1983), although examples of balancing selection (e.g., HUGHES and NEI 1988; KREITMAN and HUDSON 1991) and directional selection (e.g., LONG and LANGLEY 1993; MESSIER and STEWART 1997) acting on amino acid replacement mutations have also been reported.

To date, protein-coding genes on mitochondrial DNA (mtDNA) in animals have not been found to exhibit the diversity of polymorphism and divergence patterns seen in nuclear genes. On the contrary, nearly every sequencing study testing the neutrality of animal protein-coding genes in mtDNA reveals the same pattern: an excess of amino acid replacement mutations segregating within species, relative to fixed amino acid replacement mutations (BALLARD and KREITMAN 1994; NACHMAN et al. 1994, 1996; RAND et al. 1994; RAND and KANN 1996; WISE et al. 1998). Moreover, surveys of previously published animal mtDNA sequences have extended these observations (HASEGAWA et al. 1998; NACHMAN 1998; RAND and KANN 1998).

OHTA and KIMURA (1971) pointed out that slightly deleterious mutations may reach fixation in populations of finite size in spite of the pressure from purifying selection, as a consequence of genetic drift. Because the probability of fixation for deleterious mutations is an inverse function of the product of N (effective population size) and s (selective coefficient; OHTA 1972), a deleterious mutation with a given s (s < 0) will be more likely to reach fixation in a small population than in a large one. KIMURA (1983, Figure 3.7) made a second prediction about slightly deleterious mutations: for any negative value of Ns, the reduction in fixation probability relative to the strictly neutral expectation will be greater than the reduction in heterozygosity. This is a consequence of the fact that even those slightly deleterious mutations destined for loss may nevertheless persist in the population for a time due to drift and will therefore contribute to heterozygosity. For example, AKASHI (1995) has argued that a large effective population size in Drosophila simulans is responsible for the observation of significant excess of selectively "unpreferred" codons segregating in that species, relative to fixed "unpreferred" codons. In contrast, D. melanogaster, which is thought to have a smaller effective population size, shows no such excess segregation of putatively mildly deleterious synonymous mutations.
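The dependence of fixation probability on Ns can be illustrated numerically. The following is a minimal sketch, assuming the standard diffusion approximation for a new mutation in a diploid population of N individuals with additive selection and an initial frequency of 1/(2N); the function names and this parameterization are illustrative assumptions and are not taken from the text.

```python
import math

def fixation_probability(n_individuals, s):
    """Diffusion approximation for a new mutation at frequency 1/(2N).

    Assumes a diploid population whose effective size equals n_individuals
    and a selection coefficient s acting additively (hypothetical setup).
    """
    if s == 0:
        return 1.0 / (2 * n_individuals)  # strictly neutral expectation
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n_individuals * s))

def relative_fixation(n_individuals, s):
    """Fixation probability relative to the neutral value 1/(2N)."""
    return fixation_probability(n_individuals, s) * 2 * n_individuals

# The same deleterious s fixes less readily as N grows.
for n_individuals in (100, 1000, 10000):
    print(n_individuals, relative_fixation(n_individuals, -0.001))
```

With s held at -0.001, the relative fixation probability falls steadily as N increases, which is the qualitative point made above.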

Thus the observation of a relative excess number of segregating amino acid replacement mutations in mtDNA-encoded loci is consistent with the assumption that many segregating amino acid replacement mutations are slightly deleterious. RAND and KANN (1996) partitioned segregating mutations at the (mitochondrial) ND5 locus in D. melanogaster into synonymous and amino acid replacement sites and applied Tajima's D statistic (TAJIMA 1989) to these two classes of sites independently. They found that the hypothesis of neutral evolution could be rejected only for amino acid replacement mutations, which showed a deviation consistent with weak purifying selection acting on these sites. Recently, NIELSEN and WEINREICH (1999) have shown that in models of recurrent mutation and genetic drift, mildly deleterious mutations will on average be younger than neutral mutations, even in the absence of recombination. They further showed that the mean age of segregating amino acid replacement mutations tends to be less than the mean age of segregating synonymous mutations in animal mtDNA, consistent with the view that such mutations are being weakly selected against.

Here we explore two questions suggested by these observations. First, is there significantly more diversity in the patterns of polymorphism and divergence of nuclear-encoded genes than of mitochondrially encoded genes? To assess this question, we have performed a careful survey of the literature for data sets of nuclear and mitochondrial polymorphism and divergence. Second, animal mitochondrial and nuclear DNA exhibit five gross genetic differences: mtDNA apparently lacks recombination (MORITZ et al. 1987; but see LUNT and HYMAN 1997; AWADALLA et al. 1999; EYRE-WALKER et al. 1999), has a smaller effective population size (BIRKY et al. 1983) as a consequence of maternal inheritance and an extreme population bottleneck during the course of oogenesis (BENDALL et al. 1996; PARSONS et al. 1997), has a higher mutation rate in at least some lineages (AVISE 1991), and is haploid (HAUSWIRTH and LAIPIS 1982; JENUTH et al. 1996). If we assume that the molecular evolution of nuclear- and mitochondrially encoded loci is driven by common selective forces, then can these genetic differences account for observed differences in the patterns of evolution? Classical diffusion-derived expressions for polymorphism (KIMURA 1969) and divergence (KIMURA 1957) are known, assuming genetic and selective independence among sites, but could not easily be extended to the present case. We therefore performed a series of computer simulations in which each of these genetic factors was independently varied under models of positive, negative, and no selection. These simulations allowed us to examine the evolutionary and sampling behavior of genes under "nuclear" and "mitochondrial" conditions. While the manifestations of selection in simulated nuclear and mitochondrial genes differ dramatically, no single genetic factor is sufficient to explain the empirical differences seen.


MATERIALS AND METHODS

Published DNA sequences:
Data sets consisting of DNA sequence polymorphism and divergence for 39 nuclear loci from Drosophila spp. and 31 data sets for 7 mtDNA-encoded loci from diverse animal species were compiled from the literature. Many of the nuclear data sets are those used in MORIYAMA and POWELL (1996); most of the mtDNA data sets are pooled from NACHMAN (1998), RAND and KANN (1998), and NIELSEN and WEINREICH (1999). Two studies of each of three genes in D. melanogaster are included in Table 1 (Acp26Aa, Acp26Ab, and Est-6). For the purposes of this study, one data set for each of these genes, chosen at random, was discarded, leaving 36 nuclear data sets. Gene name, protein length, species, sample size, fixed and polymorphic synonymous and amino acid replacement site counts, neutrality index (N.I., defined below), P value of the test statistic from the associated MCDONALD and KREITMAN (1991) test, and citations appear in Table 1 (nuclear encoded) and Table 2 (mtDNA encoded).


 
Table 1. Locus, number of codons sequenced, species, sample size, mutation class counts, N.I., statistical significance of the MCDONALD and KREITMAN (1991) test, and citation for nuclear-encoded DNA sequence surveys used in this study

A data set consisting of DNA sequence polymorphism and divergence for six nuclear loci from the plant Arabidopsis thaliana was similarly compiled from the literature. Gene name, protein length, sample size, fixed and polymorphic synonymous and amino acid replacement site counts, N.I., P value of the test statistic from the associated MCDONALD and KREITMAN (1991) test, and citations appear in Table 3.


 
Table 2. Locus, number of codons sequenced, species, sample size, mutation class counts, N.I., statistical significance of the MCDONALD and KREITMAN (1991) test, and citation for mtDNA-encoded sequence surveys used in this study


 
Table 3. Locus, number of codons sequenced, sample size, mutation class counts, N.I., statistical significance of the MCDONALD and KREITMAN (1991) test, and citation for nuclear-encoded sequence surveys of Arabidopsis thaliana used in this study

Computer simulations:
Computer simulations were written in C and compiled to run under UNIX. Simulations were parameterized in eight dimensions, shown in Table 4. Simulations follow N chromosomes, each represented by the interval (0, 1), which undergo repeated cycles (generations) of mutation, recombination, random mating and selection, and sampling. All statistics are calculated after recombination but before random mating and selection (WATTERSON 1975; R. N. NIELSEN, personal communication).


 
Table 4. Definitions of parameters used in computer simulations

Mutations are of two sorts, selected and neutral, and in each generation the number of each sort in the population is determined by an independent Poisson-distributed deviate with mean Nµ/2. Chromosomes to be mutated are chosen at random and mutated "sites" are located as uniformly distributed real numbers on the interval (0, 1). Thus our simulations adhere to the infinite sites model (KIMURA 1969).
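The mutation step can be sketched as follows. This is an illustrative Python sketch and not the C code used in the study: the data layout (one pair of sets per chromosome), the function names, and the use of Knuth's multiplication method for Poisson deviates are all assumptions made for the example.

```python
import math
import random

def poisson_deviate(mean, rng):
    """Knuth's multiplication method; adequate for small means such as N*mu/2."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def mutate(population, mu, rng):
    """Add new neutral and selected sites to randomly chosen chromosomes.

    population: list of dicts {"neut": set, "sel": set} of site positions
    (a hypothetical layout chosen for this sketch).
    """
    n_chrom = len(population)
    for kind in ("neut", "sel"):
        # One independent Poisson draw per sort of mutation, mean N*mu/2.
        for _ in range(poisson_deviate(n_chrom * mu / 2, rng)):
            chromosome = rng.choice(population)
            # A new site is a uniform real number, so sites never repeat
            # in practice (infinite sites behavior).
            chromosome[kind].add(rng.random())

rng = random.Random(1)
population = [{"neut": set(), "sel": set()} for _ in range(1000)]
for _ in range(200):
    mutate(population, 0.0005, rng)
```

Storing sites as real-valued positions means that recombination and fixation checks reduce to simple set operations on each chromosome.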

In any given generation, the number of recombination events is Poisson distributed with mean Nc. Pairs of "parental" chromosomes to be recombined are chosen randomly and the location of the crossover site is chosen as a uniformly distributed real number on the interval (0, 1). Each recombination event generates two novel chromosomes consisting, respectively, of all sites present on the first parental whose locations are numerically less than the crossover site together with all sites present on the second parental whose locations are numerically greater than the crossover site, and all sites present on the second parental chromosome whose locations are numerically less than the crossover site together with all sites present on the first parental whose locations are numerically greater than the crossover site.
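A single recombination event, as described above, can be sketched as below. The representation of a chromosome as a set of site positions and the function name are assumptions made for illustration.

```python
def recombine(parent1, parent2, crossover):
    """Return the two reciprocal recombinant chromosomes.

    Each chromosome is a set of site positions on (0, 1); crossover is the
    position of the crossover site.
    """
    left1 = {x for x in parent1 if x < crossover}
    right1 = {x for x in parent1 if x > crossover}
    left2 = {x for x in parent2 if x < crossover}
    right2 = {x for x in parent2 if x > crossover}
    # First product: left of parent 1 joined to right of parent 2.
    # Second product: left of parent 2 joined to right of parent 1.
    return left1 | right2, left2 | right1

child_a, child_b = recombine({0.1, 0.8}, {0.3, 0.6}, 0.5)
```

In this toy case child_a carries sites 0.1 and 0.6, and child_b carries sites 0.3 and 0.8.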

Relative fitness is assessed for diploid genotypes. Genotype frequencies are calculated by the Hardy-Weinberg equation using allele frequencies before selection, which is equivalent to assuming random mating. Thus in these simulations N is both the census and effective population size. Under a multiplicative fitness model, the fitness of the i-jth genotype is given by

(1)

where s is the selection coefficient acting on selected sites, h is the degree of dominance, m_{i,j} is the number of selected sites chromosomes i and j have in common, and n_{i,j} is the number of selected sites appearing on exactly one of chromosomes i and j. \bar{w} is the population mean fitness and is given by

Under an additive fitness model, the fitness of the i-jth genotype is given by

(2)

although w_{i,j} is set to 0 if s < -1/(2m_{i,j} + hn_{i,j}). The remaining symbols, including m_{i,j} and n_{i,j}, are as above.

Finally, Wright-Fisher sampling is performed according to GILLESPIE (1993) and all chromosomes in the population are compared to identify sites newly fixed in the population. Whenever such a site is found, the corresponding fixation-event counter is incremented [neutral (c_neut) or selected (c_sel)], and the site is removed from all chromosomes. Since sites reach fixation only in the sampling phase of the simulation, m_{i,j} and n_{i,j} in Equation 1 and Equation 2 include only segregating selected sites, and the fitness of a chromosome is independent of the number of selected site fixations that have previously occurred in the simulation.
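The fixation bookkeeping described above can be sketched as follows. Only the detection and removal of fixed sites is shown; the sampling step itself is omitted, and the set-based representation and function name are assumptions made for the example.

```python
def remove_fixed_sites(population):
    """Count sites present on every chromosome and strip them in place.

    population: list of sets of site positions, one set per chromosome.
    Returns the number of newly fixed sites, to be added to a fixation counter.
    """
    if not population:
        return 0
    fixed = set.intersection(*population)  # sites carried by all chromosomes
    for chromosome in population:
        chromosome -= fixed  # in-place removal
    return len(fixed)

population = [{0.1, 0.2}, {0.2, 0.5}, {0.2}]
newly_fixed = remove_fixed_sites(population)
```

Here site 0.2 is carried by every chromosome, so one fixation is counted and the site no longer appears among the segregating sites.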

Uniform deviates on (0, 1) were generated with the UNIX library random number function (drand48()), seeded with the program's unique process identifier (getpid()). Poisson and binomial deviates were generated as described in PRESS et al. (1992).

Intralineal population bottlenecks were implemented as described in BERGSTROM and PRITCHARD (1998) and occur after mutation but before selection. The parameters N and s were varied in this group of simulations, but Nc was set to 0.0, h was set to 1.0, and µ was set to 1/2N. The additional parameters M (the number of intralineal chromosomes before bottleneck, M ≤ N) and B (the size of the intralineal bottleneck, B ≤ M) were employed as follows. In all cases, the intralineal bottleneck size (B) was set to 1 and the number of lineages was held at 1000, so that N = 1000 · M. Thus, in these simulations, N is not necessarily equal to the effective population size. The intensity of the bottleneck was parameterized by M, which assumed values of 10 (moderate intensity) and 100 (high intensity). When M is set to 1, the Bergstrom and Pritchard model degenerates to the no-bottleneck model described above.

Simulations were performed at steady-state as previously described (NIELSEN and WEINREICH 1999). Briefly, a population fixed for a chromosome carrying no mutations is initiated for some point in parameter space and run for 100 · N generations to reach quasi-equilibrium. At 2 · T_div generation intervals thereafter, all the chromosomes in the population and the neutral and selected site fixation counters (c_neut and c_sel) are recorded in a unique computer data file created for that point in parameter space. 2 · T_div generations of simulation correspond to T_div generations of divergence occurring simultaneously in two species. c_neut and c_sel were set to zero after the 100 · N generation initialization and after each 2 · T_div generation interval.

Since forward simulations are time intensive, these data files represented archived results, which could be reanalyzed as needed. Additionally, random chromosome samples of size n < N were drawn from archived population replicates to examine the consequence of sampling on statistics of interest. Finally, recording replicate results into data files allowed us to make our simulation reentrant, thereby permitting us to utilize QUAHOG (http://www.cs.brown.edu/software/quahog/), a UNIX-based job management facility with access to >100 ULTRASparc1 workstations within the Brown University Computer Science Department. Simulations for each point in parameter space were run until 1000 replicates had accumulated in the data file for that point, unless otherwise noted.

The correctness of the simulations was verified by comparison with expectations from analytic results (KIMURA 1957, 1969; CHARLESWORTH et al. 1993) where possible.

Statistics:
Published DNA sequence data sets were tested for deviation from neutral expectation with the MCDONALD and KREITMAN (1991) test. In this test, all polymorphic sites are classified either as synonymous or as causing an amino acid replacement, and all fixed differences are similarly classified. No attempt was made to correct for multiple mutations at a nucleotide in counting fixed differences. Additionally, the neutrality index (N.I.; RAND and KANN 1996) was calculated for each McDonald/Kreitman table as

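\[ \mathrm{N.I.} = \frac{\text{polymorphic replacement sites} / \text{polymorphic synonymous sites}}{\text{fixed replacement sites} / \text{fixed synonymous sites}} \qquad (3) \]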

As defined by RAND and KANN (1996), N.I. values range from 0 to ∞, and under strict neutrality the ratios in the numerator and denominator are expected to be equal (MCDONALD and KREITMAN 1991; but see MAYNARD SMITH 1994), giving an N.I. of 1.0. However, in those data sets in which the number of polymorphic synonymous or fixed replacement sites equals zero, we substituted 1 for the purposes of calculating N.I. (RAND and KANN 1998) to avoid division by zero. We denote this "no-division-by-zero" protocol with asterisks.
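The neutrality index and the no-division-by-zero protocol can be sketched as a short function. The function and argument names are assumptions for illustration, and the counts in the example are arbitrary, not taken from any table in this study.

```python
def neutrality_index(poly_rep, poly_syn, fixed_rep, fixed_syn):
    """N.I. as a ratio of ratios, with the no-division-by-zero protocol.

    A zero count of polymorphic synonymous or fixed replacement sites is
    replaced by 1, as described in the text.
    """
    poly_syn = poly_syn if poly_syn > 0 else 1
    fixed_rep = fixed_rep if fixed_rep > 0 else 1
    # (poly_rep / poly_syn) / (fixed_rep / fixed_syn), rearranged so that
    # only the two protected counts appear in the denominator.
    return (poly_rep * fixed_syn) / (poly_syn * fixed_rep)

excess_polymorphism = neutrality_index(10, 5, 2, 4)  # arbitrary counts
with_zeros = neutrality_index(3, 0, 0, 6)            # both zeros replaced by 1
```

Values above 1.0 indicate an excess of amino acid replacement polymorphism relative to fixed replacements, and values below 1.0 indicate the opposite.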

The following statistics were tabulated from the computer simulations: the number of neutral and selected segregating sites in the entire population in the ith replicate (S^{i}_{N,neut} and S^{i}_{N,sel}, respectively) and the number of neutral and selected site fixation events in the ith replicate (c^{i}_{neut} and c^{i}_{sel}, respectively). To explore the consequences of sampling from whole populations, 10 independent random samples of x chromosomes each were drawn from each evolutionary replicate. We denote the number of neutral and selected sites segregating in the jth such sample drawn from the ith replicate as S^{i,j}_{n=x,neut} and S^{i,j}_{n=x,sel}, respectively.

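N.I._N, the mean neutrality index for the entire population, was calculated as

\[ \overline{\mathrm{N.I.}}_{N} = \frac{1}{r} \sum_{i=1}^{r} \mathrm{N.I.}^{i}_{N} \qquad (4a) \]

where r is the number of evolutionary replicates performed and N.I.^{i}_{N}, given by

\[ \mathrm{N.I.}^{i}_{N} = \frac{S^{i}_{N,sel} / S^{i*}_{N,neut}}{c^{i*}_{sel} / c^{i}_{neut}} \qquad (4b) \]

represents the neutrality index in the entire population in the ith simulated replicate. N.I._{n=x}, the mean neutrality index for a sample of x chromosomes drawn from the population, was calculated as

\[ \overline{\mathrm{N.I.}}_{n=x} = \frac{1}{10r} \sum_{i=1}^{r} \sum_{j=1}^{10} \mathrm{N.I.}^{i,j}_{n=x} \qquad (5a) \]

where N.I.^{i,j}_{n=x}, given by

\[ \mathrm{N.I.}^{i,j}_{n=x} = \frac{S^{i,j}_{n=x,sel} / S^{i,j*}_{n=x,neut}}{c^{i*}_{sel} / c^{i}_{neut}} \qquad (5b) \]

represents the neutrality index in the jth subsample of size x drawn from the ith simulated replicate. Values of N.I._{n=10} and N.I._{n=30} were calculated. We extended our no-division-by-zero protocol to these simulated data, substituting a 1 for S^{i}_{N,neut}, S^{i,j}_{n=x,neut}, or c^{i}_{sel}, in any case in which a zero was observed.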
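The averaging over replicates and samples can be sketched as below. Each neutrality index is taken as the ratio of selected to neutral segregating sites divided by the ratio of selected to neutral fixations, with zeros in the neutral segregating count or the selected fixation count replaced by 1. The tuple layout, the function name, and the counts are invented for the example.

```python
def mean_sample_ni(replicates):
    """Mean neutrality index over all samples of all replicates.

    replicates: list of (c_sel, c_neut, samples), where samples is a list of
    (s_sel, s_neut) segregating-site counts (hypothetical layout).
    """
    values = []
    for c_sel, c_neut, samples in replicates:
        c_sel = max(c_sel, 1)  # no-division-by-zero protocol
        for s_sel, s_neut in samples:
            s_neut = max(s_neut, 1)  # no-division-by-zero protocol
            values.append((s_sel * c_neut) / (s_neut * c_sel))
    return sum(values) / len(values)

toy_replicates = [
    (2, 4, [(3, 3), (1, 0)]),  # two samples from the first replicate
    (0, 5, [(2, 4)]),          # one sample from a replicate with no selected fixations
]
mean_ni = mean_sample_ni(toy_replicates)
```

Because the mean is taken over per-sample ratios rather than over pooled counts, replicates with small denominators can have a large influence on the result.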

If one assumes that amino acid replacement mutations are selected and that synonymous mutations are neutral, then Equations 4a and 4b and Equations 5a and 5b are seen to be equal to Equation 3. Though selection is known to act on some synonymous mutations in both genomes (BALLARD and KREITMAN 1994; AKASHI 1995; RAND and KANN 1998), few would dispute that on average, selection is stronger on segregating amino acid replacement mutations, and so we do not feel that this assumption undermines our approach (see DISCUSSION).

Power analysis:
The statistical power of the MCDONALD and KREITMAN (1991) test to detect selection under the present model was measured as previously described (NIELSEN and WEINREICH 1999). For each set of parameter values simulated, power was estimated for three cases: using all polymorphisms segregating in the population, and using only that polymorphism segregating in random samples of size n = 10 and 30 chromosomes drawn from the population. In the latter cases, McDonald/Kreitman tests were performed on 10 replicate samples drawn from each replicate population. In all cases, the proportion of replicates that gave a test statistic significant at the 5% level was tabulated.
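Power estimated as a proportion of significant 2 × 2 tables can be sketched as follows. The text does not state which test statistic was applied to each table, so a two-sided Fisher exact test is used here purely as an assumed stand-in; the function names and the example tables are also illustrative.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, tail)

def power(tables, alpha=0.05):
    """Proportion of tables whose test is significant at level alpha."""
    return sum(fisher_exact_p(*t) < alpha for t in tables) / len(tables)

toy_tables = [(5, 1, 1, 14), (3, 3, 3, 3)]  # arbitrary counts
toy_power = power(toy_tables)
```

In a real analysis each table would hold the fixed and polymorphic counts for the two mutation classes from one simulated replicate or sample.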


RESULTS

Published nuclear- and mtDNA-encoded DNA sequence analysis:
N.I. values for 39 nuclear- and 31 mtDNA-encoded loci are shown in Table 1 and Table 2. After randomly removing one of each of the three duplicate nuclear data sets (see MATERIALS AND METHODS), the mean nuclear-encoded N.I. value (±SD) is 1.21 ± 1.38 while the mtDNA-encoded mean N.I. is 4.41 ± 4.52, which differ significantly (t = 3.82, d.f. = 66, P = 0.0002). Fig 1 is a frequency histogram of N.I. values for nuclear- and mtDNA-encoded genes, and partitioning N.I. values into the classes shown in Fig 1 reveals a highly significant association with genome (G = 18.56, d.f. = 6, P = 0.005).



Figure 1. Frequency histogram of observed values of N.I. (Equation 3) for 36 nuclear-encoded (shaded bars) and 31 mtDNA-encoded (solid bars) loci shown in Table 1 and Table 2, respectively.

Moreover, we may restrict ourselves to those data sets with significant (defined as P < 0.05) McDonald/Kreitman test results: there were 6 such nuclear-encoded loci (Acp26Aa, G6pd, jgw, per, Pgi, and z), of which 5 have N.I. values <1.0. In contrast, 15 mtDNA-encoded loci have significant test results (ATPase 6 in D. melanogaster; CO II in hominoids; Cyt b in Ambystoma spp., Brachyramphus spp., Drosophila spp., Grus spp., Melospiza melodia, Microtus spp., Pomatostomus temporalis, and Sciurus aberti; NADH 2 in Homo sapiens; NADH 3 in Mus domesticus and Pan troglodytes; NADH 5 in D. melanogaster; and restriction fragment length polymorphism (RFLP) survey in H. sapiens), only one of which has an N.I. value <1.0. These two observations jointly have a P value of 0.0004 against a null hypothesis of no difference in genome-specific bias in direction of significant deviation (G = 12.37, 1 d.f.).
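The G statistic quoted above can be checked from the four counts given in the text (5 and 1 nuclear loci, and 1 and 14 mtDNA loci, with N.I. below and above 1.0, respectively). The sketch below assumes a G-test of independence without continuity correction and a chi-square reference distribution with 1 d.f.; the function name is illustrative.

```python
import math

def g_test_2x2(a, b, c, d):
    """G-test of independence for the 2 x 2 table [[a, b], [c, d]].

    Returns (G, P), with P from the chi-square distribution with 1 d.f.
    """
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            if observed[i][j] > 0:
                expected = rows[i] * cols[j] / total
                g += 2 * observed[i][j] * math.log(observed[i][j] / expected)
    # Survival function of chi-square with 1 d.f. is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

# Rows: nuclear, mtDNA; columns: N.I. < 1.0, N.I. > 1.0.
g_stat, p_value = g_test_2x2(5, 1, 1, 14)
```

Under these assumptions the function gives G of about 12.37 and P of about 0.0004, in agreement with the reported values.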

Published sequence analysis for nuclear DNA of A. thaliana:
N.I. values for six nuclear genes from A. thaliana are shown in Table 3. Five of the six genes have N.I. values >1.0, and the mean (±SD) N.I. value for these genes is 2.97 ± 2.02. Three of the genes exhibit significant MCDONALD and KREITMAN (1991) test statistics; all of these have N.I. values >1.0.

Diffusion approximation provides a lower bound for N.I. as a function of Ns:
By assuming selective and genetic site independence, KIMURA (1957) developed analytic expectations for the probability of selected and neutral site fixation (u_sel and u_neut, respectively), as well as for the number of segregating selected and neutral sites (S_sel and S_neut, respectively; KIMURA 1969). Under these assumptions, a lower bound for N.I. is given by

(6)

where µ is the per-chromosome mutation rate. The asterisks again denote our no-division-by-zero protocol. Thus E(u*_sel) is given by the greater of Equation 5.6 of KIMURA (1957) and 1/(2 · µ · T_div), and E(S*_neut) is given by the greater of Equation 29 of KIMURA (1969) and 1. The right-hand quantity in Equation 6 represents a lower bound on E(N.I.) because we have substituted the ratio of ratios of expectations for the expectation of a ratio of ratios. By Jensen's inequality, variance in either denominator will inflate the left-hand side of the equation by more than it will the right-hand side.
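The role of Jensen's inequality can be seen in a toy case. The sketch below uses an invented random denominator X that takes the values 1 and 3 with equal probability and compares the expectation of 1/X with 1 divided by the expectation of X; nothing in it is drawn from the simulations.

```python
from fractions import Fraction

# Toy distribution for a positive denominator X: (value, probability).
outcomes = [(Fraction(1), Fraction(1, 2)), (Fraction(3), Fraction(1, 2))]

mean_x = sum(x * p for x, p in outcomes)                  # E[X]
expectation_of_inverse = sum(p / x for x, p in outcomes)  # E[1/X]
inverse_of_expectation = 1 / mean_x                       # 1/E[X]

print(expectation_of_inverse, inverse_of_expectation)
```

Here E[1/X] is 2/3 while 1/E[X] is 1/2, so replacing a random denominator by its expectation can only lower the result, which is why the diffusion-derived quantity acts as a lower bound.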

Simulation of N.I. as a function of Ns:
In Fig 2, we present mean simulated whole-population neutrality index values (N.I._N, Equations 4a and 4b) under the multiplicative (Equation 1, open circles) and additive (Equation 2, solid circles) fitness schemes, for -0.01 ≤ s ≤ 0.01 when N = 1000, µ = 0.0005, h = 1, Nc = 0, T_div = 30N = 30,000 generations, and M = B = 1. Fig 2 also shows the diffusion-derived expression (Equation 6, solid line), which is exceeded by simulated values (under both models) for all values of Ns, as expected. As previously noted (NACHMAN 1998), N.I. is inversely related to Ns. The most striking pattern in the figure is the existence of a maximum N.I., the direct consequence of our no-division-by-zero protocol, which comes to dominate selected fixation values (c^{i}_{sel}) in Equations 4a and 4b and u_sel in Equation 6. The maximum in N.I. is not the consequence of replacing the neutral segregating site count (S^{i}_{N,neut} in Equations 4a and 4b) with 1, because in our simulations of both fitness models only a very small proportion of replicates have S^{i}_{N,neut} values equal to 0 when Ns < 0, and this proportion is insensitive to Ns [not shown, though recall that S_neut is relatively insensitive to background selection (CHARLESWORTH et al. 1993)]. Likewise, E(S_neut) in Equation 6 is independent of Ns (KIMURA 1969) and thus cannot be responsible for any change in slope. The location of the maximum in Fig 2 is approximately the point at which 2 · u_sel · µ · T_div < 1, or equivalently, when u_sel < 1/(2 · µ · T_div), which means that our no-division-by-zero protocol will cause N.I. to become increasingly insensitive to purifying selection as selection strength increases (driving down u_sel), as mutation rate goes down, and as divergence time decreases. The maximum empirical value of N.I. is also dependent on these parameters, and numeric values of N.I. larger than those shown in Fig 2 are possible with larger values of µ · T_div.

We present results for a range of values of s and µ, but have restricted ourselves to T_div = 30N, which we judge to be a biologically realistic number [e.g., D. melanogaster-D. simulans divergence time is ~3 million years (HEY and KLIMAN 1993) × 10 generations/year ÷ 10^6 effective population size (KREITMAN 1983) = 30N; H. sapiens-P. paniscus divergence time is ~6 million years (SIBLEY 1992) ÷ 20 years/generation ÷ 10^4 effective population size (TAKAHATA 1993) = 30N].
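The two divergence-time conversions quoted above can be written out as plain arithmetic. The variable and function names are illustrative; the numbers are those given in the text.

```python
def divergence_in_units_of_n(generations, effective_size):
    """Express a divergence time in generations as a multiple of N."""
    return generations / effective_size

# D. melanogaster vs. D. simulans: ~3 million years at 10 generations per year.
drosophila = divergence_in_units_of_n(3_000_000 * 10, 1_000_000)

# H. sapiens vs. P. paniscus: ~6 million years at 20 years per generation.
hominid = divergence_in_units_of_n(6_000_000 / 20, 10_000)

print(drosophila, hominid)
```

Both comparisons come out at 30, matching the value of T_div = 30N used in the simulations.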



Figure 2. Simulation results for values of N.I._N (Equations 4a and 4b) under multiplicative (Equation 1, ○) and additive (Equation 2, •) fitness functions for Ns from -10.0 to 10.0. Diffusion-derived lower bound for N.I. (Equation 6, solid line) is shown. N.I._{n=10} (×) and N.I._{n=30} (+) (Equations 5a and 5b) under multiplicative fitness function are also shown. Other parameter values: N = 1000, µ = 5 × 10^-4, h = 1, Nc = 0, T_div = 30N, and M = B = 1.

Mean sample neutrality index values (N.I._{n=x}, Equations 5a and 5b) are also shown in Fig 2 for x = 10 (×) and 30 (+) drawn from populations under multiplicative fitness. Note first that the location of the maximum is unaffected by sampling. When s is negative, the number of segregating neutral sites in samples (S^{i,j}_{n=x,neut}) is again largely independent of both sample size and strength of purifying selection (not shown), as was the number of segregating neutral sites in the whole population. Thus the location of the maximum is driven by the behavior of c_sel, which contributes equally to Equations 4a and 4b and Equations 5a and 5b. However, the value of N.I._{n=x} is conservative when s is negative. Selection keeps deleterious mutations at low frequency (TAJIMA 1989), and so such sites will tend to be underrepresented in small samples. But since selection also generally prevents deleterious mutations from fixing, sampling has the effect of reducing the deviation in Equations 5a and 5b relative to neutral expectation. Because positively selected sites segregate at high frequency (TAJIMA 1989), sampling has a much smaller effect. Curiously, some underrepresentation of selected sites in very small samples (n = 10) occurs when s is positive but small (≤ 4/N), thereby biasing N.I._{n=10} downward (Fig 2, inset). Thus, under weak positive selection, small sample estimates of N.I. can overstate the true population deviation from the neutral expectation, although this effect is modest.

Behavior under the additive fitness model (Equation 2) does not differ qualitatively for any parameter values examined, and no further results under this model are presented.

Consequences of variation in Nc, N, µ, h, and M on values of N.I.:
As noted in the Introduction, the genetics of mtDNA- and nuclear-encoded genes exhibit five gross differences: recombination rate, effective population size, mutation rate, degree of selective dominance, and intralineal bottlenecks. These aspects were modeled in our simulations by the parameters Nc, N, µ, h, and M, respectively, which were varied independently. Mean N.I. values from these simulations are shown in Table 5. Mean N.I. is monotonic when Ns > -3 (Fig 2), and selection coefficients acting on amino acid replacement mutations in mtDNA-encoded proteins have been estimated to lie in the range -3 ≤ Ns ≤ 0 (NACHMAN 1998; NIELSEN and WEINREICH 1999). In the interest of representational clarity, we now restrict ourselves to three selection coefficients, s = -0.003, 0.0, and 0.003, corresponding to Ns of -3.0, 0.0, and 3.0 when N is 1000. Entries in Table 5 are grouped to indicate variation of orthogonal parameters. Each entry represents a point in parameter space, and on each line results are presented in three columns, corresponding to values of s equal to 0.003, 0.0, and -0.003. Within columns, the whole-population neutrality index (N.I._N) and neutrality index values for samples of size n = 10 (N.I._{n=10}) and 30 (N.I._{n=30}) are shown to left, center, and right, respectively.


 
Table 5. Simulated neutrality index values for population and samples

Three patterns seen in Fig 2 are also manifest in Table 5. First, in almost all cases, the inverse relationship between N.I. and Ns is preserved, so that weak positive selection (represented in the left column) gives N.I. values <1.0 and weak purifying selection (right column) gives N.I. values >1.0. Second, sample neutrality index values deviate from 1.0 less than whole-population values. And finally, sample size generally has only a modest effect on N.I._{n=x}. Several additional conclusions are apparent. Most surprising to us was the general insensitivity of N.I. to recombination. In contrast, N.I._N is very sensitive to the population size (seen when N = 10,000 and when M = 10 and 100, both of which reduce the influence of genetic drift), although this sensitivity is greatly attenuated when the neutrality index is calculated for realistically sized samples. It should also be noted that small values of N · µ cause a jump in the proportion of replicates in which zero segregating sites are observed (e.g., WATTERSON 1975). These zeros bias mean N.I. values down, accounting for the results shown when N = 100 and µ = 5 × 10^-5. N.I. was found to be largely insensitive to dominance, although simulations of overdominance (s > 0 and h ≥ 2 or s < 0 and h < 0) could not be completed because under these conditions the number of segregating sites grew impractically large. Finally, computation time per generation of simulation increased with the number of chromosomes in the population, and the number of generations simulated increased with population size (since T_div = 30 · N). Thus, <1000 replicates were completed for large values of Nc, N, µ, and M.

Consequences of variation in Nc, N, µ, h, and M on McDonald/Kreitman power:
The MCDONALD and KREITMAN (1991) test compares the ratio of segregating synonymous to amino acid replacement mutations with the ratio of fixed synonymous to amino acid replacement mutations. We performed McDonald/Kreitman tests on the simulated populations presented here as well as random samples thereof, under the assumption outlined above that synonymous mutations are selectively neutral while amino acid replacement mutations are selected. We focused on the frequency of simulated data sets in which a significant deviation is observed while s is nonzero, which represents a measure of the test's statistical power to detect the action of natural selection. Since we have independently varied each of five genetic characteristics in our simulations, these data can be employed to estimate the sensitivity of power of the McDonald/Kreitman test to variations in these factors.

The proportion of replicates that give a significant McDonald/Kreitman test statistic while recombination rate, population size, mutation rate, dominance, and bottleneck size are independently varied is shown in Table 6, which has the same format as Table 5. The test was found to be more sensitive to negative selection than to positive selection (AKASHI 1999), and the test's power to detect both positive and negative selection is seen to be sensitive mainly to increases in N and µ. (Recall that N = 1000 · M under the Bergstrom/Pritchard model, so increasing M necessarily increases N.) Increasing either N or µ increases the number of mutations segregating in the population (and in samples thereof), and thus, by increasing the numeric values entering the 2 x 2 table, these changes naturally increase the power of the test. Recombination increases the test's power to detect purifying selection only very slightly.
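
The dependence of power on the magnitude of the counts can be illustrated with a short sketch. It assumes a G-test of independence with 1 d.f. and no continuity correction; the helper name and the counts are hypothetical and are not taken from our simulations. Multiplying every cell by the same factor leaves the neutrality index unchanged but scales G linearly, so the same departure from neutrality becomes easier to detect.

```python
from math import erfc, log, sqrt


def g_test_2x2(table):
    """Return (G, P) for a 2 x 2 table, 1 d.f., no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * log(observed / expected)
    # Survival function of the chi-square distribution with 1 d.f.
    p = erfc(sqrt(max(g, 0.0) / 2.0))
    return g, p


# Hypothetical counts: rows are polymorphic and fixed, columns are
# replacement and synonymous. N.I. = (6 * 12) / (10 * 3) = 2.4 at every scale.
base = [[6, 10], [3, 12]]
for scale in (1, 2, 4):
    scaled = [[scale * cell for cell in row] for row in base]
    g, p = g_test_2x2(scaled)
    print(f"scale {scale}: G = {g:.2f}, P = {p:.3f}")
```

In this invented example the unscaled table is not significant at the 0.05 level, whereas the same ratios with fourfold larger counts are.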


 

 
Table 6. Proportion of simulation replicates with significant (P < 0.05) MCDONALD and KREITMAN (1991) test results


*  DISCUSSION

Patterns of polymorphism and divergence in nuclear- and mtDNA-encoded proteins differ significantly:
Mitochondrially encoded proteins exhibit a consistent pattern of excess amino acid replacement mutations segregating within species, as measured by N.I. (BALLARD and KREITMAN 1994; NACHMAN et al. 1994, NACHMAN et al. 1996; RAND et al. 1994; RAND and KANN 1996, RAND and KANN 1998; HASEGAWA et al. 1998; NACHMAN 1998; WISE et al. 1998). In contrast, nuclear-encoded proteins exhibit a variety of patterns consistent with purifying selection, neutrality, and positive selection (e.g., BROOKFIELD and SHARP 1994; KREITMAN and AKASHI 1995). Our observations comparing published nuclear- and mtDNA-encoded polymorphism and divergence for protein-coding loci (Fig 1) confirm our qualitative intuition that the molecular evolution of loci in animals differs significantly as a function of the encoding genome. Whereas N.I. values for nuclear-encoded loci are evenly distributed about 1.0, the vast majority of N.I. values for mtDNA-encoded loci are >1.0.

To be fair, we did have some a priori expectation of this pattern (e.g., MORIYAMA and POWELL 1996) before we tabulated the data in Table 1 and Table 2, which may artificially inflate the significance of the values presented. But regardless of the historical contingencies by which we became aware of the signal seen in Fig 1, given the extremely small associated P values, patterns of nonneutral evolution clearly differ considerably between nuclear- and mtDNA-encoded proteins.

In theory, this could be a numerical artifact. Because the neutrality index (Equation 3) is a ratio of ratios, estimates of its value will be inflated by large sampling variance in either denominator (fixed amino acid replacement site or polymorphic synonymous site counts). Thus shorter genes or smaller population samples will bias the N.I. upward. And indeed, mean gene length is significantly smaller in the mtDNA-encoded data set (mean number of amino acids ± SD encoded in nuclear data set, 381.09 ± 152.19; in mtDNA data set, 260.65 ± 156.67, t = 3.18, d.f. = 65, P = 0.0011). However, four of six mtDNA-encoded loci with N.I. values <1.0 are shorter than the sample mean, and within both data sets, N.I. is uncorrelated with gene length (not shown). Moreover, no significant difference in length exists between the 15 longest mtDNA-encoded genes and the entire nuclear data set (t = 0.167, d.f. = 47, P = 0.43), although a highly significant association between N.I. values >1.0 and genome persists (partitioning N.I. values as >1.0 and <1.0: G = 11.06, d.f. = 1, P = 0.0009). Similarly, no significant difference in length exists between the 18 shortest nuclear-encoded genes and the entire mitochondrial data set (t = 0.270, d.f. = 49, P = 0.39), but again a highly significant association between N.I. and genome is detected (G = 10.36, d.f. = 1, P = 0.0013). Thus the pattern in Fig 1 seems not to be driven by any bias in gene lengths. Sample sizes also differ significantly (n ± SD for nuclear data sets, 17.69 ± 14.98; for mtDNA data sets, 28.99 ± 26.91, t = 1.99, d.f. = 65, P = 0.025); however, the larger average mtDNA data set should reduce variance in those estimates and bias estimates of N.I. downward, suggesting that the reported genome-specific difference in N.I. may be conservative. Thus differences in sampling variance (either in gene length or sample size) cannot account for the pattern in Fig 1.
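
The upward bias of a ratio of ratios under sampling variance can be illustrated with a small Monte Carlo sketch. The assumptions are ours for illustration only and are not part of the analysis above: all four site counts are drawn as independent Poisson variates with the same mean (so the true N.I. is 1.0), replicates with a zero in either denominator are discarded, and the function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mean_neutrality_index(expected_count, replicates, seed=1):
    """Mean N.I. over replicates when every cell has the same expected count."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(replicates):
        pn, ps, dn, ds = (poisson(rng, expected_count) for _ in range(4))
        if ps == 0 or dn == 0:
            continue  # N.I. undefined; dropping the replicate is an assumption
        total += (pn * ds) / (ps * dn)
        used += 1
    return total / used


# Small counts (short genes or small samples) versus large counts.
small = mean_neutrality_index(expected_count=5, replicates=5000)
large = mean_neutrality_index(expected_count=50, replicates=5000)
print(f"mean N.I. with expected count 5:  {small:.2f}")
print(f"mean N.I. with expected count 50: {large:.2f}")
```

Under these assumptions the mean index is noticeably above 1.0 when counts are small and close to 1.0 when counts are large, which is the direction of bias discussed above.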

The McDonald/Kreitman test is insensitive to departures from population equilibrium, to recombination, and to variation in nucleotide mutation rates (MCDONALD and KREITMAN 1991), and we believe the neutrality index is similarly robust. Furthermore, although different sampling strategies may underlie the nuclear and mitochondrial data sets, we do not believe that these differences bias our analysis because N.I. is based on segregating site counts rather than on site frequencies. For example, several of our nuclear data sets (e.g., Adh and Est-6) represent explicitly stratified samples intended to include previously described allozyme classes. However, because truly random samples of even modest size would be expected to include allozymes segregating at moderate frequency, stratification will have only a marginal effect on segregating site counts. Additionally, since allozyme classes are the consequence of amino acid replacement mutations, one would expect that the intentional addition of very rare allozyme classes to one of our data sets would inflate segregating amino acid replacement site counts, biasing N.I. upward. Since nuclear data sets appear on average to suffer a deficit of segregating amino acid replacement sites, this effect would seem to make estimates of N.I. conservative. Several of our mitochondrial data sets may include some geographic stratification, but this is also unlikely to affect our analysis. The local fixation of standing variation will not inflate N.I. as long as the haplotypes are fixed at random with respect to their number of segregating replacement sites. Although multiple-niche models of balancing selection are possible (e.g., LEVINE 1953), in cases where this effect was explicitly tested for in mtDNA, no support for this model was found (FRY and ZINK 1998; BROWN et al. 2000). Finally, mutations under balancing selection are expected to persist in the population, but in the mtDNA data sets employed here, no such evidence exists (NIELSEN and WEINREICH 1999).

Alternatively, natural selection acting on synonymous mutations could be responsible for the pattern seen in Fig 1. For example, in D. simulans, segregating unpreferred synonymous mutations are overrepresented relative to fixations (AKASHI 1996), which will bias N.I. values downward. However, natural selection is similarly known to act on synonymous mutations in mtDNA-encoded genes in several species of Drosophila (BALLARD and KREITMAN 1994; RAND and KANN 1998). Thus we do not believe that natural selection acting on codon bias is a major factor contributing to the nuclear-mitochondrial contrast we report. Another possible explanation for the pattern in Fig 1 is the much broader species representation among the mitochondrial data sets. However, confining ourselves to data sets from D. melanogaster and D. simulans, a highly significant association between genome and N.I. value still exists (G = 7.48; d.f. = 1; P = 0.006). Although there are two drosophilid mitochondrial data sets with N.I. <1.0, both come from D. pseudoobscura and may reflect a recent population expansion in that species (RAND and KANN 1998; HAMBLIN and AQUADRO 1999). Thus the much broader species representation among mtDNA data sets cannot explain the pattern we describe.

There are only 13 mtDNA-encoded loci in metazoans (WOLSTENHOLME 1992), and thus the 31 loci in Table 2 necessarily include multiple data sets for single loci, although all such duplicates are from different species. This suggests the possibility that the statistical significance seen in Fig 1 could be the consequence of repeatedly sampling from correlated evolutionary processes. However, the association between genomes and N.I. (partitioned as N.I. > 1.0 and N.I. < 1.0) is preserved when a single data set for each locus is randomly chosen from Table 2 (G = 6.79, d.f. = 1, P = 0.0092). More generally, all mtDNA-encoded proteins are constituents of the enzymes responsible for oxidative phosphorylation (OXPHOS), whereas none of the nuclear-encoded proteins in Table 1 are. This common functionality among mtDNA-encoded loci might cause an evolutionary correlation, driving the observed patterns of polymorphism and divergence. For example, nearly all the nuclear-encoded proteins in Table 1 are soluble, whereas OXPHOS enzymes all reside in the inner mitochondrial membrane and are extremely hydrophobic (GILLHAM 1994). It is known that strong purifying selection acts to eliminate hydrophilic amino acids from mtDNA-encoded proteins (see NAYLOR et al. 1995). However, among nuclear-encoded proteins, no correlation exists between N.I. and hydrophobicity (scored by the method of KYTE and DOOLITTLE 1982; not shown). Nevertheless, the possibility that unique selective forces are acting on (mtDNA-encoded) OXPHOS proteins cannot be dismissed. An obvious approach to this question is to examine the polymorphism and divergence patterns in some of the nuclear-encoded OXPHOS proteins, an avenue that we are currently pursuing.

Genetic factors alone seem unable to account for empirical patterns in neutrality index values:
Our simulations (Fig 2) repeat the observation that N.I. is quite sensitive to Ns, the strength and direction of selection (NACHMAN 1998). Thus progressively stronger positive selection increases the selected site fixation count and reduces N.I. monotonically for all parameter values examined. Progressively stronger purifying selection depresses the selected site fixation count and increases N.I., although under the no-division-by-zero protocol employed, N.I. is not permitted to climb to infinity. While we find some effect on N.I. for most of the genetic factors examined (Table 5), no single factor breaks this roughly inverse relationship between s and N.I. for biologically realistic parameter values. Population size, which had the strongest influence, predicts smaller deviations from 1.0 in mitochondrial N.I. since mtDNA have smaller effective populations, whereas larger effects are empirically observed. Moreover, population size had its effect greatly attenuated when sample N.I. means were measured. Thus, if we wish to regard our samples of nuclear- and mtDNA-encoded genes as multiple realizations of a single evolutionary process, we are at present unable to appeal to genetic differences between genomes to account for the pattern seen in Fig 1. We acknowledge that we have not explored the effects of interaction among these factors because of the prohibitive amount of computation time required for a thorough exploration of parameter space.

As noted, only 17% (6 of 36) of the nuclear data sets in Table 1 show a significant deviation from neutral expectation by the MCDONALD and KREITMAN (1991) test, while 48% (15 of 31) of the mitochondrial data sets in Table 2 do. Inasmuch as many of the nuclear samples were constructed with an a priori intuition that selection might be working, while many of the mtDNA data sets were constructed to explore questions of phylogeography, this contrast is perhaps conservative. However, the results of our McDonald/Kreitman power analysis may account for this pattern. Although the effective population size of mtDNA is less than that of nuclear DNA (BIRKY et al. 1983), the census number of mtDNA molecules in a population is much larger (GILLHAM 1994). Furthermore, in at least some animal species, mtDNA mutation rates are higher than those of nuclear DNA (AVISE 1991). Both of these factors increase the statistical power of the test, particularly for sample sizes used in this study, irrespective of the sign of s (Table 6). Thus if one regards the data in Table 1 and Table 2 as repeated samples drawn from a single evolutionary process, on the basis of our power analysis one would predict a greater incidence of significant McDonald/Kreitman test statistics among the mitochondrially encoded data sets. However, these results shed no light on the cause of the highly significant genome-specific differentiation in the direction of deviation among these data sets.

The biological importance of the frequency distribution of s:
At present, there is little support for the hypothesis that unique selective forces acting on mitochondrially encoded OXPHOS proteins explain the pattern shown in Fig 1, although we cannot rule out this possibility. And no single genetic difference between nuclear and mtDNA genetics examined appears sufficient to explain this pattern. However, both our simulations and analytic expectations assume that s is equal for all selected mutations entering the population (we have not included deleterious mutations of large effect since such mutations contribute very little to polymorphism or divergence). This assumption of a single fixed s is clearly simplistic; indeed, it is theoretically problematic (GILLESPIE 1995). However, very little is known about the true frequency distribution of s. Although it is likely that the majority of amino acid replacement mutations are slightly deleterious (OHTA 1973), it seems equally likely that some are also advantageous (GILLESPIE 1995). Nevertheless, we believe the interplay between recombination and a common distribution of mutational s could at least in part be responsible for the striking contrast seen in Fig 1. If the majority of mutations that contribute to polymorphism are indeed slightly deleterious, then we reason that the patterns of polymorphism and divergence for a tightly linked chromosome will be dominated by those mutations, and that when the occasional mutation with a positive s occurs, it will be able to contribute to the process only if it happens to land on a relatively "unloaded" copy, an unlikely event. (Or similarly unlikely, an advantageous mutation would reach fixation in the absence of recombination only if it were sufficiently strongly selected to offset the cumulative effect of the deleterious mutations to which it was linked.) This may account for the pattern seen in mtDNA-encoded proteins, where N.I. values >1.0 predominate. In contrast, the same small fraction of advantageous mutations landing on a recombining chromosome will be more likely to get onto an unloaded segment of the chromosome before being lost. Once on an unloaded chromosomal segment, such a mutation will have its advantage expressed, resulting in differential reproductive output, and therefore will have its frequency increased by selection. We believe this could account for the pattern seen in nuclear-encoded proteins, where N.I. values appear to be evenly distributed about 1.0. [It should be noted that although indirect evidence of recombination in animal mtDNA has recently accumulated (LUNT and HYMAN 1997; AWADALLA et al. 1999; EYRE-WALKER et al. 1999), very low levels of mitochondrial recombination are suggested (EYRE-WALKER et al. 1999).]

Our hypothesis predicts that empirical neutrality index values should be inversely correlated with recombination rate, although among the nuclear genes in Table 1 for which we were able to find published estimates of recombination rate, no correlation exists. Moreover, in a cursory exploration of selective frequency distribution space we were unable to find parameter values in which this effect was observed. Recently, GILLESPIE (1999) explored the behavior of N.I. ( in his notation) under several more sophisticated frequency distributions of s. His simulations compared N.I. under free recombination and complete linkage as a function of population size, but, like us, he was unable to find a case in which linkage carried N.I. from ~1.0 to considerably larger values.

However, suggestive comparisons emerge from DNA polymorphism and divergence data recently accumulated from the plant A. thaliana (Table 3). A. thaliana is almost exclusively self-fertilizing, and its effective recombination rate is consequently very low (KAMABE and MIYASHITA 1999). Consistent with our hypothesis, N.I. is >1.0 for five of the six nuclear genes in A. thaliana in Table 3. Of course there are many other biological differences between Arabidopsis and Drosophila evolution that may be responsible for this observation.

Another intriguing system is the Ost/O3+4 chromosomal inversion in D. subobscura. Acph-1 lies very near one of the inversion breakpoints (SEGARRA et al. 1996), and since recombination between inversion haplotypes is greatly suppressed near breakpoints, under random mating the recombination rate at Acph-1 within karyotype will be proportional to p^2, where p is the frequency of the karyotype in question. Thus our hypothesis predicts that neutrality index values calculated at Acph-1 within karyotypes should be inversely correlated with the square of karyotype frequency. And indeed, in a sample of D. subobscura taken from a population in which the frequency of O3+4 was estimated as 0.767 and of Ost as 0.147 (NAVARRO-SABATE et al. 1999), N.I. values within the former karyotype are much lower (1.70) than within the latter (9.0; N.I. calculated from data in NAVARRO-SABATE et al. 1999). Since these karyotypes exhibit a latitudinal cline (NAVARRO-SABATE et al. 1999), this system offers the possibility of varying effective recombination rate while holding gene function constant by sampling from different points along the cline and calculating N.I. within the karyotype.
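
The arithmetic behind this comparison can be made explicit with a brief sketch. It uses only the karyotype frequencies and N.I. values quoted above; the function name is hypothetical, and the quantity computed is a relative scale (p^2), not an estimate of the recombination rate itself.

```python
def relative_recombination(karyotype_frequency):
    """Within-karyotype recombination scale under random mating (p squared)."""
    return karyotype_frequency ** 2


# Frequencies and N.I. values as quoted in the text.
karyotypes = {
    "O3+4": {"frequency": 0.767, "ni": 1.70},
    "Ost": {"frequency": 0.147, "ni": 9.0},
}

for name, values in karyotypes.items():
    scale = relative_recombination(values["frequency"])
    print(f"{name}: p^2 = {scale:.4f}, N.I. = {values['ni']}")

ratio = relative_recombination(0.767) / relative_recombination(0.147)
print(f"ratio of p^2 values (O3+4 / Ost): {ratio:.1f}")
```

On this scale the more common karyotype recombines roughly 27 times as often as the rarer one, and it is the one with the lower N.I.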

Finally, several groups (BRAVERMAN et al. 1997; SCHUG et al. 1998; JENSEN et al. 1999) are exploring the interaction of recombination rate and levels of putatively silent polymorphism. Surprisingly, there are few Drosophila data sets for protein-coding loci in regions of lowest recombination. We are now beginning to collect additional polymorphism and divergence data from such loci located on the tip of the X and on the fourth chromosome of D. melanogaster, regions of low recombination, to test the joint predictions that our hypothesis makes.

While our simulations revealed no single genetic factor to account for the marked difference in patterns of nonneutral evolution seen in nuclear- and mtDNA-encoded proteins (Fig 1), we suggest two (nonexclusive) hypotheses. Fig 2 and Table 5 demonstrate that N.I. is inversely related to Ns, so that if the selective histories of the genes in Table 1 and Table 2 are distinct, N.I. will be affected. Thus, if the fraction of mildly deleterious amino acid replacement mutations entering OXPHOS genes is larger than the corresponding fraction for nuclear loci (or equivalently if the opportunities for positive selection are greater for nuclear-encoded loci), mtDNA-encoded N.I. values will be biased upward. Additionally, we speculate that genetic linkage in mtDNA results in patterns of polymorphism and divergence that are dominated by the largest class of mutations entering the population. If the frequency distribution of selection coefficients is such that a majority of mutations that contribute to polymorphism and divergence are mildly deleterious, values of N.I. >1 may result in regions of low recombination. Both hypotheses are open to experimental attack.


*  ACKNOWLEDGMENTS

R. Nielsen encouraged us to explore this problem by computer simulation and solved an interesting bug. Two anonymous reviewers improved this study considerably. Access to over 100 SUN workstations was kindly made available to us by the Brown University Computer Science Department. D.M.W. was supported by National Science Foundation grants 9527709 and 9707676 awarded to D.M.R.

Manuscript received June 22, 1999; Accepted for publication May 19, 2000.


*  LITERATURE CITED

AGUADÉ, M., 1998 Different forces drive the evolution of the Acp26Aa and Acp26Ab accessory gland genes in Drosophila melanogaster species complex. Genetics 150:1079-1089.

AGUADÉ, M., 1999 Positive selection drives evolution of the Acp29AB accessory gland protein locus in Drosophila. Genetics 152:543-551.

AKASHI, H., 1995 Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067-1076.

AKASHI, H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297-1307.

AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238.

AVISE, J. C., 1991 Ten unorthodox perspectives on evolution prompted by comparative population genetic findings on mitochondrial DNA. Annu. Rev. Genet. 25:45-69.

AWADALLA, P., A. EYRE-WALKER, and J. MAYNARD SMITH, 1999 Linkage disequilibrium and recombination in hominid mitochondrial DNA. Science 286:2524-2525.

BALAKIREV, E. S., E. I. BALAKIREV, F. RODRÍGUEZ-TRELLES, and F. J. AYALA, 1999 Molecular evolution of two linked genes, Est-6 and Sod, in Drosophila melanogaster. Genetics 153:1357-1369.

BALLARD, J. W. O. and M. KREITMAN, 1994 Unraveling selection in the mitochondrial genome of Drosophila. Genetics 138:757-772.

BEGUN, D. J. and C. F. AQUADRO, 1994 Evolutionary inferences from DNA variation at the 6-phosphogluconate dehydrogenase locus in natural populations of Drosophila: selection and geographic differentiation. Genetics 136:155-171.

BENDALL, K. E., V. A. MACAULAY, J. R. BAKER, and B. C. SYKES, 1996 Heteroplasmic point mutations in the human mtDNA control region. Am. J. Hum. Genet. 59:1276-1287.

BERGSTROM, C. T. and J. PRITCHARD, 1998 Germline bottlenecks and the evolutionary maintenance of mitochondrial genomes. Genetics 149:2135-2146.

BIRKY, C. W., JR., T. MARUYAMA, and P. FUERST, 1983 An approach to population and evolutionary genetic theory for genes in mitochondria and chloroplasts, and some results. Genetics 103:513-527.

BRAVERMAN, J., M. AGUADÉ and C. LANGLEY, 1997 Reduced level of DNA sequence variation at the erect wing locus of D. melanogaster and D. simulans, p. 234A in Proceedings of the 38th Annual Drosophila Research Conference, Chicago, April 1997. Genetics Society of America, Bethesda, MD.

BROOKFIELD, J. F. Y. and P. M. SHARP, 1994 Neutralism and selectionism face up to DNA data. Trends Genet. 10:109-111.

BROWN, A. F., L. M. KANN, and D. M. RAND, 2000 Gene flow versus local adaptation in the Northern acorn barnacle Semibalanus balanoides: insights from mtDNA control region polymorphism. Evolution in press.

CHARLESWORTH, B., M. T. MORGAN, and D. CHARLESWORTH, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

EANES, W. F., M. KIRCHNER, and J. YOON, 1993 Evidence for adaptive evolution of the g6pd gene in Drosophila melanogaster and Drosophila simulans lineages. Proc. Natl. Acad. Sci. USA 90:7475-7479.

EYRE-WALKER, A., N. H. SMITH, and J. MAYNARD SMITH, 1999 How clonal are human mitochondria? Proc. R. Soc. Lond. Ser. B 266:477-483.

FRY, A. J. and R. M. ZINK, 1998 Geographic analysis of nucleotide diversity and song sparrow (Aves: Emberizidae) population history. Mol. Ecol. 7:1303-1313.

GILLESPIE, J. H., 1993 Substitutional processes in molecular evolution. I. Uniform and clustered substitutions in a haploid model. Genetics 134:971-981.

GILLESPIE, J. H., 1995 On Ohta's hypothesis: most amino acid substitutions are deleterious. J. Mol. Evol. 40:64-69.

GILLESPIE, J. H., 1999 The role of population size in molecular evolution. Theor. Popul. Biol. 55:145-156.

GILLHAM, N. W., 1994 Organelle Genes and Genomes. Oxford University Press, New York.

GLEASON, J. M. and J. R. POWELL, 1997  Interspecific and intraspecific comparisons of the period locus in the Drosophila willistoni sibling species. Mol. Biol. Evol. 14:741-753