- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Bustamante, C. D.
- Articles by Hartl, D. L.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Bustamante, C. D.
- Articles by Hartl, D. L.
Directional Selection and the Site-Frequency Spectrum
Carlos D. Bustamantea, John Wakeleya, Stanley Sawyerb, and Daniel L. Hartlaa Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
b Department of Mathematics, Washington University, St. Louis, Missouri 63130
Corresponding author: Daniel L. Hartl, Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Ave., Cambridge, MA 02138., dhartl{at}oeb.harvard.edu (E-mail)
Communicating editor: N. TAKAHATA
| ABSTRACT |
|---|
In this article we explore statistical properties of the maximum-likelihood estimates (MLEs) of the selection and mutation parameters in a Poisson random field population genetics model of directional selection at DNA sites. We derive the asymptotic variances and covariance of the MLEs and explore the power of the likelihood ratio tests (LRT) of neutrality for varying levels of mutation and selection as well as the robustness of the LRT to deviations from the assumption of free recombination among sites. We also discuss the coverage of confidence intervals on the basis of two standard-likelihood methods. We find that the LRT has high power to detect deviations from neutrality and that the maximum-likelihood estimation performs very well when the ancestral states of all mutations in the sample are known. When the ancestral states are not known, the test has high power to detect deviations from neutrality for negative selection but not for positive selection. We also find that the LRT is not robust to deviations from the assumption of independence among sites.
THE Poisson random field (PRF) models of ![]()
![]()
![]()
![]()
![]()
![]()
![]()
Since the PRF model assumes independence among sites, it is a limiting case for population genetic models that incorporate mutation, selection, and recombination. For typical DNA sequence data, in which many sites segregate simultaneously, the assumption of independence among sites is equivalent to the assumption that there is free recombination between segregating nucleotides. Studying statistical inference in the PRF framework can therefore illuminate efforts to build models that relax the assumption of independence among sites. The reason for this is that any model that hopes to estimate mutation, selection, and recombination parameters can do as well as or worse than the PRF if the data conform to the PRF model.
It would be of great interest to know, therefore, how well maximum-likelihood methods for point estimation and hypothesis testing in the PRF framework perform under varying levels of selection and mutation. One property of interest is the probability of rejection neutrality if the locus is truly under selection, which addresses the power of the likelihood-ratio tests (LRTs) of neutrality. Another statistical issue of interest concerns the robustness of these models to deviations from the assumption of independently evolving sites, since this assumption may be unrealistic for certain types of data.
This article treats in detail the PRF model for site-frequency data proposed by ![]()
| THEORY |
|---|
Poisson random field model:
First let us consider a simple example with which to illustrate the type of data being modeled. The data in Table 1 represent a sample of six individuals sequenced at five variable DNA sites. Each column is an aligned DNA site and each row is an individual sampled from the population. At each site where the individual carries the derived nucleotide, the resulting entry in the matrix is 1; otherwise the entry is 0. For each site, then, the sum of the column gives the number of individuals in the sample that carry the derived form of the mutation.
|
Define the unfolded site-frequency spectrum as the random vector X = (X1, X2, ... , Xn-1) of sample configurations Xi, where Xi represent the number of sites that have n - i ancestral and i derived nucleotides for 1
i
n - 1 among the n aligned nucleotide sequences. For the example in Table 1 the unfolded site-frequency spectrum is (3, 1, 0, 1, 0) since there are three sites where one individual carries the derived mutation, one site where two individuals carry the derived mutation, no sites where three individuals or five individuals carry the derived mutations, and one site where four individuals carry the derived mutation.
So far we have assumed that for each site we know which state is derived and which is ancestral. If this information is unknown, we would model the folded site-frequency spectrum, X', defined as the number of sites where 1 individual or (n - 1) individuals carry a mutation, 2 individuals or (n - 2) individuals carry a mutation, etc., so that X'i = Xi + Xn-i for i < n - i and X'
= Xn/2 for i =
. For the example above, the folded site-frequency spectrum is (3, 2, 0).
We wish to model the site-frequency spectrum for DNA sites in an infinite-sites (i.e., irreversible mutation) independent (i.e., freely recombining) model with (weak) directional haploid selection. We assume that all mutations that enter the population have the same selective effect and that there are no epistatic interactions among mutations (i.e., independent fitness effects of all mutations). We also assume that there is no population structure to the data and the population has achieved statistical stationarity.
![]()
![]()
![]()
) be the density of the frequency of a single mutation, q, in the range dq, their formula is
![]() |
(1) |
where
is the product of the selection coefficient (a in Fisher's notation, usually s in Wright's notation) and the effective haploid population size. Here s is the per-generation increase in log(p/q) so that
q = s p(1 - p) if s is small.
![]()
/2)f(q,
), where
/2 is the per-locus mutation rate scaled by the effective population size. They also showed that this random field is a PRF, which means that the number of sites whose derived mutation frequency q satisfies 0 < a < q < b < 1 has a Poisson distribution for given a, b and also that the number of sites for nonoverlapping intervals (a, b) is independent (![]()
For a particular site, given the frequency q of the mutation in the population, the probability of sampling i sequences of one type and n - i of another is binomially distributed with parameters n and q. If mutations enter the population as a Poisson process with rate
/2, the Xi's are independent Poisson-distributed random variables with mean,
F(i,
), where
![]() |
(2) |
Therefore, the probability of observing xi sites that have i derived and (n - i) ancestral mutations is
![]() |
(3) |
The PRF model outline above leads directly to a likelihood-ratio test of neutrality, which compares the null hypothesis
= 0 with the alternative hypothesis that
0. Since the Xi's are independent, the likelihood function is the product of the individual p(Xi = xi|
,
). Letting Lf(
,
| x) be the likelihood function for the folded model and Lu be the likelihood function for the unfolded model,
![]() |
(4) |
and
![]() |
(5) |
where
i
is the greatest integer
i and
![]() |
(6) |
To perform the likelihood-ratio test, we need to maximize (4) or (5) for
and
under (A) the unconstrained state space and (B) the constraint that
= 0. Letting
and
be the unconstrained MLEs of
and
, and
W the MLE of
under neutrality, the likelihood-ratio test statistic,
, is
![]() |
(7) |
Appealing to large sample results (![]()
) is 
21 distributed since the neutral and selected models differ by 1 d.f.
Maximum-likelihood estimation and asymptotic results:
Finding the maximum-likelihood estimates of
and
for both the folded and unfolded site frequency models is relatively straightforward. Let l be the log-likelihood function for the unfolded model
![]() |
(8) |
The maximum-likelihood estimates of
and
(i.e.,
and
) are usually found by setting the partial derivatives of the log-likelihood function equal to zero. Letting S represent the total number of segregating sites (S =
n-1i=1 xi),
![]() |
(9) |
![]() |
(10) |
where
![]() |
(11) |
and
![]() |
(12) |
An efficient approach to solving for the MLEs is to maximize the profile likelihood that has the same global maximum but requires only maximization in one dimension. The log-profile-likelihood function, l*(
), is l(
,
), where
is the maximum-likelihood estimate of
given
. By setting (9) to zero and solving for
one can easily show that
![]() |
(13) |
It follows from (2) that
![]() |
(14) |
which proves that under neutrality
![]() |
(15) |
Here
W is WATTERSON's (1975) estimator of
. This shows that the total number of segregating sites is a sufficient statistic for
given
. The two preceding results hold, regardless of the folding of the site configurations since
n-1i=1F(i,
) =
[
]i=1G(i,
).
We can now maximize the log-profile-likelihood function l*(
) numerically, using a standard optimization technique such as Newton-Raphson iteration (![]()
) with respect to
. To do this we first substitute (13) into (8) and take the necessary derivatives. The results are
![]() |
(16) |
and
![]() |
(17) |
where
![]() |
(18) |
![]() |
(19) |
![]() |
(20) |
and
![]() |
(21) |
where
![]() |
(22) |
To find the equivalent results for the folded model, substitute n by [n/2] and F, F', and F'' by G, G', and G'', respectively, in the equations above, where
![]() |
(23) |
![]() |
(24) |
To find the asymptotic covariance matrix of the maximum-likelihood estimates, we compute the Fisher information matrix I(
,
), which is minus the expected value of the matrix of second derivations of the log-likelihood function. Since the components of I are linear in xi and S, this is equivalent to replacing xi by E(Xi) =
F(i,
) and S by
T0(
). The asymptotic covariance matrix of
and
is the inverse of the empirical Fisher information matrix I (
,
).
The components of the Fisher information matrix I (
,
) are easily shown to be
![]() |
(25) |
![]() |
(26) |
![]() |
(27) |
where
![]() |
(28) |
The asymptotic covariance of the MLEs is the inverse matrix of I (
,
), which by Cramer's rule equals
![]() |
(29) |
evaluated at (
,
). It then follows that
![]() |
(30) |
![]() |
(31) |
![]() |
(32) |
Note that the negative inverse of the second derivative of the log-profile-likelihood function (17), with xi replaced by E(Xi) =
F(i,
) and
by S/T0, is formally the same as (32). this is consistent with the fact that both expressions give the asymptotic variance of
.
From general likelihood theory, we expect (
,
) in large samples to be distributed as a multivariate normal with mean (
,
) and variance-covariance matrix M. Using this result and the fact that 2 ln(
) is approximately
21 distributed, we can define two types of confidence sets for
at given confidence level 1 -
.
The first confidence set consists of
in the interval
± Z(1 -
/2)
, where Z(1 -
/2) is the standard normal quantile corresponding to level 1 -
/2 (e.g., Z(0.975) = 1.96). The second confidence set corresponds to the set of values of
for which we would not reject the likelihood-ratio test:
such that l*(
)
l*(
) - 0.5
21,1-
(![]()
| SIMULATIONS |
|---|
The two properties of the LRT that we are interested in exploring are the size and power of the test.
The size, confidence level, or probability of type I error of a test is defined as the probability of rejecting a true null hypothesis and is denoted
. The power of a test is the probability of rejecting a false null hypothesis and is denoted 1 - ß, where ß is the probability of not rejecting a false null hypothesis (i.e., the probability of a type II error).
In particular, we explore whether the LRT maintains the proper size even if the assumption of independence among segregating sites is not met. If the test is robust to the assumption of free recombination and the null hypothesis is true, then in repeated sampling we should reject the null hypothesis (100 ·
)% of the time. If the test is not robust to the assumption of free recombination, the realized size of the test will vary with the rate of recombination. We might also expect the realized size of the test to vary with the mutation rate if the test is not robust to recombination, since a larger mutation rate will produce a test with a higher probability of rejecting the null hypothesis. It would also be of interest to know how the power of the LRT changes as the strength of selection changes for data that conform to the independence assumption.
To explore these two issues, as well as the coverage of the two different types of confidence intervals for
, we simulated data under two types of models. The first type of data was generated using coalescent simulations for neutral data under varying rates of recombination and mutation (R. HUDSON, personal communication). These data sets were used to explore the robustness of the test to deviations from independence among sites. The second type of data was generated using the PRF model outlined above by sampling independent Poisson random variables with rates given by mutation and selection parameters of interest. These data were used to explore the power of the test under varying levels of mutation and selection, as well as the sampling distributions of
W,
, and
. All of the data sets generated had n = 10 sequences and consisted of 500 replicates. The same data were used for all folded and unfolded analysis, with the folded site configurations generated by summing 1 and (n - 1), 2 and (n - 2), etc.
All numerical integration was done using Romberg integration, maximization by Newton-Raphson iteration with bisection, and random number generation using standard numerical algorithms (![]()
| RESULTS AND DISCUSSION |
|---|
Size and power of the LRT:
In Table 2 we summarize the results for the analysis of the size of the PRF test for both folded and unfolded site configurations under two different mutation rates and 10 levels of recombination. From this analysis we see that the Poisson random field LRT is extremely sensitive to the assumption of independence among sites. If the test were not sensitive to this assumption, we would expect the entries in the table to fluctuate
5% but we see rejection levels that reach upward of 50% for tight linkage. While the strength of the effect declines quickly as the rate of recombination increases, the tests do not attain level
until the rate of recombination is more than an order of magnitude greater than the mutation rate.
|
The reason for this becomes clear when one examines the effect of recombination rate on the distribution of the site-frequency spectrum, X. In Fig 1 we present representative replicates of X from the coalescent simulations (
= 0) under varying levels of recombination for
= 10. The expected value of the number of sites at a given frequency i/n is the same for each of the six graphs presented [E(Xi) =
/i]. The variance of Xi, though, increases as the recombination rate tends to zero. The reason, therefore, that we reject neutrality more than expected in Table 2 with increasing linkage is that we underestimated the variance for the site frequencies. Likewise, if neutral sites are linked to selected sites there will also be an increase in the variance in the distribution of the X. These results suggest that the analysis of single genes using this method is invalid if the rate of recombination is not extremely large, since one rejects the null hypothesis (which of course includes the assumption of free recombination) if the data are neutral but not independent.
|
Performance of maximum likelihood for point estimation in the PRF framework:
While the method may not be applicable for linked regions, the analysis of genome-wide single-nucleotide polymorphism may have the quasi-independence required for the test to maintain level
. It would be desirable, therefore, to know how well the method performs in moderate sample sizes when the data conform to the PRF model. For the remainder of this article, all of the analysis is performed on data generated under the PRF model.
In Fig 2 we summarize the results for the analysis of the power of the PRF test for both folded and unfolded site configurations under the same mutation rates above for
[-10, 10]. We see from that that the LRT has very good power to detect deviations from neutrality even in moderate sample sizes when the ancestral states of variable sites are known (i.e., for unfolded site configurations). A surprising result is that for folded site configurations the test has strong power to detect negative selection but very weak power to detect positive selection. The reason for this is related to the issue of point estimation in general in the PRF framework.
|
From general-likelihood theory, we expect the maximum-likelihood estimate for a parameter under certain regularity conditions to be consistent, unbiased, and efficient as the sample size used to estimate the parameter tends to infinity. By "consistent" we mean that the parameter estimate will converge in probability to the true value of the parameter (i.e., the variance in the parameter estimate will tend toward 0), by "unbiased" we mean that the expected value of the parameter estimate is centered on the true value of the parameter, and by "efficient" we mean that the estimator has the lowest possible variance of all possible estimators. In practice, though, maximum-likelihood estimation can often behave very poorly if the sample size is not large enough or if the regularity conditions are not met.
For the levels of mutation examined in our generated data, we find that the distributions for
are reasonably centered around the true value of
for the unfolded model (Fig 3) with increasing variance as one moves away from neutrality. We note that in regions of extreme selection (|
| > 5) the distribution for
becomes quite skewed with
=
or
= -
for certain replicate data sets. The reason for this is that the log-likelihood function becomes very flat when the magnitude of
is very large. In the case of negative selection, samples consisting of only singletons are consistent with extreme negative selection and very high mutation, so that
becomes indistinguishable from -
and we show shortly why the likelihood function asymptotes to
for any given data set. These results indicate that these extrema represent either a local maximum or minimum that needs to be checked once an MLE for
has been estimated. If it is found that the likelihood at one of these two extremes is higher than the estimate
, this is problematic for point estimation but not for hypothesis testing since we can still test whether the observed sample configuration is consistent with a neutral model (for most of the cases we examined where
=
or
= -
for unfolded data, the LRT is significant).
|
While maximum-likelihood estimation works reasonably well for point estimation using unfolded data, the method performs very poorly for folded data in the region of positive selection as alluded to above in the section on power. As we see in Fig 3, for the folded data the method performs well in terms of estimating
for the region of negative selection, performs reasonably well for a small region of positive selection, and then asymptotes to neutrality for arbitrarily large
.
For a better understanding of this phenomenon, let us focus on one of the replicate data sets we generated under
= 10 and
= 10, x = (11, 6, 8, 1, 2, 6, 2, 3, 10). For these data,
= 8.66,
=
, and LRT = 18.08 (P < 0.0001) for the unfolded model. We note that the profile-likelihood function for the unfolded data is monotonic with a peak at
(Fig 4). As expected from the LRT, we see in Fig 5 that the predictions from the infinite selection model are quite different from the predictions of the neutral model. For the folded model,
= 20.03,
= -0.473, and LRT = 0.18 (P = 0.328). As expected from the LRT, we see in Fig 5 that the folded site-frequency spectrum is fitted very well by a neutral model. What may be surprising is that the data are equally well fit by a model with infinitely strong positive selection! Another interesting property of these data is that the likelihood function for folded data configurations has a bimodal nature with a region in between that can be significantly different from either peak, implying that the data are consistent with either neutrality or strong selection but not weak selection (Fig 4).
|
|
The reason for this phenomenon is that when the site configurations are folded, the limiting distribution as
is indistinguishable from neutral evolution with a mutation rate that is twice as large as the true mutation rate. This can be shown by noting that the log-likelihood function for a given data set will reach finite asymptotic value for
since
![]() |
(33) |
Recalling that G(i,
) = F(i,
) + F(n - i,
) and the result derived earlier that F(i, 0) = 1/i, we can easily see that
![]() |
(34) |
![]() |
(35) |
leading to a bimodal log-likelihood function in the extreme case with maxima near 0 and
as seen in the example above.
It is important to note that this is a general problem for methods that attempt to detect selection using folded site configurations [e.g., TAJIMA's (1989) D-statistic]. This also explains the odd result reported in ![]()
One interesting point related to this issue is the effect of the estimation of
on the estimation of
. We see from Fig 6 that estimation of
is excellent for the unfolded model for regions of positive selection. It may seem paradoxical that while the variance in the estimates of
increases with increasing level of selection, the variance in the estimates of
decreases. The reason for this is that once selection becomes large the predictions of the PRF model for the site-frequency spectrum become indistinguishable (this is exactly the reason for the asymptote at
for positive selection), but only a small range of
is consistent with the observed number of segregating sites. In the case of negative selection, the results are reversed since as selection increases in the negative selection the values for F(i,
) become very small, leading to large variance in the estimate of both selection and mutation.
|
For the folded model, in Fig 6 we begin to see the effect of the maximum near 0 for
on the estimation of mutation. Watterson's estimator of
will always be larger than the estimated value of
under positive selection, since F(i,
) increases with increasing
and reaches its maximum as explained above at twice the value of the partial harmonic sum in Watterson's estimator. This means that since we tend to underestimate selection in the folded model for positive selection, we will tend to overestimate
.
Confidence sets for
:
The last issue we explore is the construction of confidence intervals for the selection parameter in the PRF framework. As we outlined above our two methods for constructing confidence intervals are (1) the region that contains (1 -
) · 100% of the area in the normal approximation to likelihood function and (2) the region in the profile likelihood space that is <0.5
21,1-
likelihood units from the maximum-likelihood point. For the first confidence set, at
= 0.05 this corresponds to the area that is 1.96 standard deviations away from
where the standard deviation is given by
. Likewise for the second set, at
= 0.05 this corresponds to the region where l*(
) - l*(
) < 3.84/2 likelihood units.
We see from Fig 7 that confidence intervals based on the normal approximation to the likelihood tend to undercover (i.e., they tend to be too small), particularly for the folded model and for strong positive selection. The confidence intervals based on the profile likelihood, however, have excellent coverage regardless of the folding of the data or the strength of selection.
|
The reason for this is seen in Fig 8, where we plot the log-profile-likelihood function and its normal approximation for a data set with the same
but varying sample sizes (in terms of the number of segregating sites for the sample). We see that the number of segregating sites needs to be quite large before the normal approximation begins to have the same coverage as the profile likelihood in a region of 1.92 log-likelihood units. This occurs because a confidence interval based on the normal approximation will, by definition, be symmetric, and the log-likelihood function is quite right skewed (as we explained above). Since we need an immensely large sample for the region around the maximum-likelihood point to be normally distributed (upward of 1000 segregating sites), confidence intervals based on the normal approximation will tend to undercover.
|
| CONCLUSIONS |
|---|
The Poisson random field model for site-frequency data holds great potential for untangling the effects of mutation and selection on standing genetic variation. Our analysis suggests, however, that when the data do not conform to the assumptions of very loose linkage, the estimates from this model may be misleading. Likewise, if the data set being analyzed is small, one should be cautious about the asymptotic assumptions implicit in maximum likelihood as a method of point estimation. This is a general problem for methods that attempt to estimate selection using site-frequency data, and methods that attempt to incorporate recombination into point estimation will only make some of these problems worse. The solution may be to focus on genetic variation that is independent (such as the growing number of single nucleotide polymorphisms for many organisms) rather than focusing on methods that detect selection at single loci. We also find that folding the site-frequency spectrum obscures positive selection and suggest that every effort should be made to incorporate information on the ancestry of mutations being studies, if one wishes to estimate selection.
We also point out that simply adding information on the number of fixed differences in the region between the study population and a close species will not aid in point estimation for selection. Such a method would add exactly one data point and one parameter (time since divergence), which can be scaled to fit that point exactly. A method that uses two classes of sites (e.g., silent and replacement) and fixed differences circumvents this problem and will add useful information for the estimation of selection. We have developed a method that uses divergence data and the site-frequency spectrum to estimate selection. Likewise, information on variation in the site-frequency spectrum among different genomic regions also holds great potential for refining point estimation for single regions of interest. We have also developed a method that uses Markov chain Monte Carlo to perform Gibbs and Metropolis sampling for a hierarchical model of this form. We will expand upon these two methods in subsequent publications.
| ACKNOWLEDGMENTS |
|---|
We thank Richard Lewontin, Rasmus Nielsen, and Steve Palumbi for many helpful comments. This work was supported by a Howard Hughes Medical Institute Graduate Fellowship to C.D.B. and National Science Foundation grant DMS9707045 to S.A.S.
Manuscript received April 24, 2001; Accepted for publication August 20, 2001.
| LITERATURE CITED |
|---|
AKASHI, H., 1995 Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067-1076[Abstract].
AKASHI, H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221-238
AKASHI, H. and S. W. SCHAEFFER, 1997 Natural selection and the frequency distributions of "silent" DNA polymorphism in Drosophila. Genetics 146:295-307[Abstract].
FISHER, R. A., 1930 The Genetical Theory of Natural Selection. Clarendon Press, Oxford.
HARTL, D. L., E. N. MORIYAMA, and S. A. SAWYER, 1994 Selection intensity for codon bias. Genetics 138:227-234[Abstract].
HARTL, D., E. F. BOYD, C. D. BUSTAMANTE and S. SAWYER, 2000 The glean machine: what can we learn from DNA sequence polymorphism, pp. 3749 in Genomics and Proteomics: Functional and Computational Aspects, edited by S. SUHAI. Plenum Press, New York.
KENDALL, M., 1987 Kendall's Advanced Theory of Statistics, Vol. 2. Oxford University Press, Oxford.
KINGMAN, J., 1993 Poisson Processes. Oxford University Press, Oxford.
LANGE, K., 1999 Numerical Analysis for Statisticians. Springer-Verlag, New York.
PRESS, W. H., B. P. FLANNERY and S. A. T. W. T. VETTERLING, 1988 Numerical Recipes in C. Cambridge University Press, Cambridge, UK.
SAWYER, S. and D. L. HARTL, 1992 Population genetics of polymorphism and divergence. Genetics 132:1161-1176[Abstract].
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595
WATTERSON, G. A., 1975 On the number of segregating sites in genetic models without recombination. Theor. Popul. Biol. 7:256-276[Medline].
WRIGHT, S., 1938 The distribution of gene frequencies under irreversible mutation. Proc. Natl. Acad. Sci. USA 24:253-259
WRIGHT, S., 1969 Evolution and the Genetics of Populations, Vol. 2: The Theory of Gene Frequencies. University of Chicago Press, Chicago.
This article has been cited by other articles:
![]() |
E. Huerta-Sanchez, R. Durrett, and C. D. Bustamante Population Genetics of Polymorphism and Divergence Under Fluctuating Selection Genetics, January 1, 2008; 178(1): 325 - 337. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. D. Hernandez, S. H. Williamson, L. Zhu, and C. D. Bustamante Context-Dependent Mutation Rates May Cause Spurious Signatures of a Fixation Bias Favoring Higher GC-Content in Humans Mol. Biol. Evol., October 1, 2007; 24(10): 2196 - 2202. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Chen, R. E. Green, S. Paabo, and M. Slatkin The Joint Allele-Frequency Spectrum in Closely Related Species Genetics, September 1, 2007; 177(1): 387 - 398. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. D. Hernandez, S. H. Williamson, and C. D. Bustamante Context Dependence, Ancestral Misidentification, and Spurious Signatures of Natural Selection Mol. Biol. Evol., August 1, 2007; 24(8): 1792 - 1800. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. G. Pybus, A. Rambaut, R. Belshaw, R. P. Freckleton, A. J. Drummond, and E. C. Holmes Phylogenetic Evidence for Deleterious Mutation Load in RNA Viruses and Its Contribution to Viral Evolution Mol. Biol. Evol., March 1, 2007; 24(3): 845 - 852. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. L.F. Johnson and M. Slatkin Inference of population genetic parameters in metagenomics: A clean look at messy data Genome Res., October 1, 2006; 16(10): 1320 - 1327. [Abstract] [Full Text] [PDF] |
||||
![]() |
W.-Y. Ko, S. Piao, and H. Akashi Strong Regional Heterogeneity in Base Composition Evolution on the Drosophila X Chromosome Genetics, September 1, 2006; 174(1): 349 - 362. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. J. Welch Estimating the Genomewide Rate of Adaptive Protein Evolution in Drosophila Genetics, June 1, 2006; 173(2): 821 - 837. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Zhu and C. D. Bustamante A Composite-Likelihood Approach for Detecting Directional Selection From DNA Sequence Data Genetics, July 1, 2005; 170(3): 1411 - 1421. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. H. Williamson, R. Hernandez, A. Fledel-Alon, L. Zhu, R. Nielsen, and C. D. Bustamante Simultaneous inference of selection and population growth from patterns of variation in the human genome PNAS, May 31, 2005; 102(22): 7882 - 7887. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Williamson, A. Fledel-Alon, and C. D. Bustamante Population Genetics of Polymorphism and Divergence for Diploid Selection Models With Arbitrary Dominance Genetics, September 1, 2004; 168(1): 463 - 475. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. K.-S. Wong, Z. Yang, D. A. Passey, M. Kibukawa, M. Paddock, C.-R. Liu, L. Bolund, and J. Yu A Population Threshold for Functional Polymorphisms Genome Res., August 1, 2003; 13(8): 1873 - 1879. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Williamson Adaptation in the env Gene of HIV-1 and Evolutionary Theories of Disease Progression Mol. Biol. Evol., August 1, 2003; 20(8): 1318 - 1325. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Wakeley Polymorphism and Divergence for Island-Model Species Genetics, January 1, 2003; 163(1): 411 - 420. [Abstract] [Full Text] [PDF] |
||||
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
- Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via HighWire
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Bustamante, C. D.
- Articles by Hartl, D. L.
- Search for Related Content
- PUBMED
- PubMed Citation














































