Genetics, Vol. 168, 1041-1051, October 2004, Copyright © 2004
doi:10.1534/genetics.104.031153
Accuracy and Power of Statistical Methods for Detecting Adaptive Evolution in Protein Coding Sequences and for Identifying Positively Selected Sites
Wendy S. W. Wong*,1, Ziheng Yang†, Nick Goldman‡ and Rasmus Nielsen*,§
* Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850
† Department of Biology, University College London, London WC1E 6BT, United Kingdom
‡ European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
§ Center for Bioinformatics, University of Copenhagen, Copenhagen 2100 Kbh Ø, Denmark
1 Corresponding author: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14850.
E-mail: sww8{at}cornell.edu
The parsimony method of SUZUKI and GOJOBORI (1999) and the maximum likelihood method developed from the work of NIELSEN and YANG (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.
MUCH attention has recently been devoted to the detection of positive selection on protein-coding DNA sequences in molecular evolutionary genomics (e.g., SWANSON and VACQUIER 2002; BERNATCHEZ and LANDRY 2003; CHOISY et al. 2004). The most commonly used criterion for detecting positive selection in protein-coding genes is to compare the nonsynonymous rate (dN) with the synonymous rate (dS). When the rate ratio ω = dN/dS > 1, the nonsynonymous rate is greater than the synonymous rate and this is interpreted as evidence for the action of positive selection.
Several methods have been proposed for detecting if a protein is experiencing an excess of nonsynonymous substitution or elevated values of ω. The most popular methods are parsimony methods (FITCH et al. 1997; BUSH et al. 1999; SUZUKI and GOJOBORI 1999) and maximum likelihood methods (NIELSEN and YANG 1998; YANG et al. 2000). Using these methods, numerous genes have been identified to be evolving under the influence of positive selection (e.g., YANG and BIELAWSKI 2000; LIBERLES et al. 2001; LIBERLES and WAYNE 2002).
Parsimony methods were independently developed by FITCH et al. (1997) and SUZUKI and GOJOBORI (1999). In these methods, substitutions are inferred using parsimony reconstruction of ancestral sequences, and an excess of nonsynonymous substitutions is tested independently for each site. The two methods differ in that FITCH et al. (1997) (see also BUSH et al. 1999) first estimated the average dN/dS ratio along the sequence and then compared the nonsynonymous/synonymous rate ratio at each site against this average, while SUZUKI and GOJOBORI (1999) compared the dN/dS ratio at each site independently against the neutral expectation 1. The SUZUKI and GOJOBORI (1999) method is implemented in the Adaptsite computer program of SUZUKI et al. (2001).
GOLDMAN and YANG (1994) and MUSE and GAUT (1994) were the first to develop codon-based models for likelihood estimation of ω. NIELSEN and YANG (1998) and YANG et al. (2000) extended these methods to allow variation in ω among sites, thereby providing a more powerful framework for detecting positive selection when sites undergoing positive selection are interspersed among sites dominated by negative selection. They suggested the use of an empirical Bayes approach for identifying putatively positively selected sites in genes that have been demonstrated to undergo positive selection. In the approach of NIELSEN and YANG (1998), a (neutral) model (model M1) allowing only two categories of sites, with ω = 1 and ω = 0, is compared using a likelihood ratio test (LRT) with a (selection) model (M2), which allows an additional category of positively selected sites with ω > 1. If M1 (neutral) can be rejected in favor of M2 (selection), positive selection is inferred. Several similar but more-realistic models were implemented by YANG et al. (2000). One commonly used pair involves a null model (M7) in which ω was assumed to be beta-distributed among sites and an alternative selection model (M8), which allows an extra category of positively selected sites. The likelihood methods are implemented in the codeml program in the PAML package (YANG 1997).
The likelihood method in its current form proposes a two-step procedure in which an LRT is first used to test for positive selection in a gene as a whole. If this test indicates statistical evidence for the presence of a proportion of sites evolving under positive selection, identification of putative positively selected sites can then proceed (NIELSEN and YANG 1998; YANG et al. 2000). In contrast, the parsimony method in the SUZUKI and GOJOBORI (1999) implementation has been proposed as a test for individual sites. If one's interest is to detect positive selection in a gene and multiple sites are analyzed, a correction for multiple testing is therefore needed. We wish here to distinguish between the two different inferential problems of testing for positive selection in a particular gene or section of a gene and of predicting which sites are most likely to be under positive selection.
A number of simulation experiments have been performed to study various aspects of the parsimony and likelihood methods for detecting positive selection in protein-coding genes. ANISIMOVA et al. (2001, 2002) studied the likelihood method. They concluded that the accuracy and power of the LRT and of the Bayes identification of sites under positive selection depend on the data. Both accuracy and power are low when the data contain only a few highly similar sequences or when selection is weak. Overall, the method was found to have good accuracy and power in data sets of moderate or large sizes (for example, for 15 or more sequences).
SUZUKI and GOJOBORI (1999) performed simulations to examine the performance of their parsimony method. They compared the results of the method on analyzing two tree topologies (64 and 128 taxa, respectively), with various branch lengths (0.01, 0.02, and 0.03 synonymous changes per synonymous site for each branch) and various dN/dS ratios (0.2, 0.5, 1.0, 2.0, and 5.0). The power of the method was found to increase with increasing branch lengths and strength of the positive selection. The study also concluded that the method has a very low false-positive rate in general.
SUZUKI and NEI (2001, 2002) also conducted simulation studies to compare the reliability of the parsimony and likelihood methods. These two studies focused mainly on predicting positively selected sites. It was argued that the parsimony-based method was robust against the assumptions of the models and tends to be conservative, whereas the likelihood method gave numerous false-positive results with certain parameters in the simulation. SUZUKI and NEI (2001) also compared the likelihood and parsimony methods for identifying amino acid sites under positive selection using a data set of human leukocyte antigen (HLA) alleles. Performance was evaluated by examining the number and location, relative to the antigen recognition site (ARS), of amino acid residues inferred to be under positive selection. The authors discussed a number of problems in the likelihood approach and concluded that it was inferior to the parsimony method using reconstructed ancestral sequences. Those results contrast sharply with the analysis of a similar HLA data set by YANG and SWANSON (2002), in which the likelihood results were biologically sensible.
Since the results shown in different studies have been contradictory, we have conducted a new and more comprehensive simulation study to determine the reliability and power of the parsimony and maximum likelihood methods. We examine the performance of both methods in answering two questions: (i) Is a gene under positive selection or does it have any sites under positive selection? and (ii) Which sites in a gene are under positive selection?
Likelihood and parsimony methods for detecting positive selection:
In the maximum likelihood method, site-specific models M1 (neutral), M2 (selection), M7 (beta), M8 (beta&ω; NIELSEN and YANG 1998; YANG et al. 2000), and M8a (beta&ω = 1; SWANSON et al. 2003) were used with codeml in the PAML 3.13 package (YANG 2000b). Model M1 (neutral) allows two classes of sites with ω0 = 0 and ω1 = 1 in proportions p0 and p1 = 1 − p0, respectively. Model M2 (selection) has an additional class with ω2, which takes on any nonnegative value, and applies to a proportion p2 of sites, now with the constraint p0 + p1 + p2 = 1. We test for positive selection by comparing twice the log-likelihood difference between M1 and M2 with a χ² distribution with 2 d.f. (χ²₂) in the LRT (YANG et al. 2000). Model M7 (beta) assumes a β-distribution for 0 ≤ ω ≤ 1. Model M8 (beta&ω) adds to M7 an extra category, with proportion p1 of sites with ω1, while the rest of sites (at frequency p0 = 1 − p1) have ω from the β-distribution between 0 and 1. Here we compare twice the log-likelihood difference between M7 and M8 with a χ²₂ distribution to test for positive selection (YANG et al. 2000; ANISIMOVA et al. 2001). Model M8a was introduced in SWANSON et al. (2003); it is similar to model M8 except that the category ω1 is fixed at ω1 = 1. It was argued that twice the log-likelihood difference between M8 and M8a should be asymptotically distributed as a 50:50 mixture of a point mass at 0 and χ²₁ (SWANSON et al. 2003). However, this asymptotic result holds only if all the parameters of the null model are estimable (CHERNOFF 1954; SELF and LIANG 1987), which is not always the case for the M8a-M8 comparison. Thus besides the (1/2)χ²₀ + (1/2)χ²₁ mixture distribution, to be conservative we use the χ²₁ distribution as well for comparison with the test statistic. We also use slight variations to M1 (neutral) and M2 (selection), by letting ω0 vary freely between 0 and 1 rather than fixing it at 0. These models are referred to below as M1a and M2a. These two models were implemented in a modified version of codeml. Notice that the M0 vs. M3 test that was used in SUZUKI and NEI (2001, 2002) and ANISIMOVA et al. (2001, 2002) is a test of heterogeneity in ω among sites and not really a test for positive selection. We did not include this test here since our primary interest is identifying positive selection.
To predict which sites are under positive selection in the likelihood framework, the empirical Bayes method described in NIELSEN and YANG (1998) and YANG et al. (2000) was applied. A site is predicted as positively selected if the (empirical Bayes) posterior probability that it belongs to the positive selection category is greater than a predetermined cutoff value Pb. It is worth mentioning here that this method is not designed to control the frequentist type I error, that is, the probability of inferring positive selection when the null hypothesis is true (i.e., the site is not under positive selection). SUZUKI and NEI (2001, p. 1866) incorrectly suggest that this error rate is expected to be (1 − Pb) when the cutoff is Pb. In the empirical Bayes method, Pb is the probability that a site inferred to be positively selected is truly under positive selection (termed the accuracy by ANISIMOVA et al. 2002), and what should equal (1 − Pb) is the proportion of sites inferred to be positively selected that are not under positive selection. However, we will here concentrate on evaluating the false-positive rate (frequentist type I error rate) of the empirical Bayes method, using Pb = 0.95 or Pb = 0.99.
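As a rough illustration of the LRTs described above, the following Python sketch converts a pair of log-likelihood values into a test statistic and a P-value under the three reference distributions mentioned (χ²₂, χ²₁, and the 50:50 mixture of a point mass at 0 and χ²₁). The function names and the example log-likelihood values are hypothetical; they are not codeml output.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable, df = 1 or 2 only."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        # For df = 1, X is the square of a standard normal variable.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # For df = 2, the chi-square distribution is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood difference, floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def lrt_pvalue(lnl_null: float, lnl_alt: float, reference: str = "chi2_2") -> float:
    """P-value of the LRT under one of the reference distributions in the text."""
    stat = lrt_statistic(lnl_null, lnl_alt)
    if reference == "chi2_2":   # M1 vs. M2, M1a vs. M2a, M7 vs. M8
        return chi2_sf(stat, 2)
    if reference == "chi2_1":   # conservative choice for M8a vs. M8
        return chi2_sf(stat, 1)
    if reference == "mixture":  # 50:50 mixture of a point mass at 0 and chi2_1
        return 1.0 if stat == 0.0 else 0.5 * chi2_sf(stat, 1)
    raise ValueError(f"unknown reference distribution: {reference}")


# Hypothetical log-likelihood values for a null and an alternative model.
for ref in ("chi2_2", "chi2_1", "mixture"):
    print(ref, lrt_pvalue(-1000.0, -997.0, ref))
```

For any positive test statistic, the mixture P-value is exactly half the χ²₁ P-value, which is why the χ²₁ reference is the more conservative of the two choices for the M8a-M8 comparison.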
The maximum parsimony approach to detecting positive selection in protein coding nucleotide sequences was described in SUZUKI and GOJOBORI (1999) (see also FITCH et al. 1997; BUSH et al. 1999). Given a set of aligned sequences and assuming that each codon site is independent, the method first infers the ancestral codon states using either the parsimony method (FITCH 1971; HARTIGAN 1973) or the empirical Bayes method (YANG et al. 1995), with parameters estimated from pairwise distances rather than using maximum likelihood (ZHANG and NEI 1997; ZHANG et al. 1998). Second, for each codon site, the method counts the numbers of synonymous and nonsynonymous sites and the numbers of synonymous and nonsynonymous differences. Finally, for each site, a test of neutrality is conducted to see whether dN > dS or ω > 1. A one-sided test for positive selection is used in this simulation study, with the significance level set at 5 or 1%. If the test is significant, the method concludes that the site is undergoing positive selection. We compare this test of selection at each site with the empirical Bayesian identification of sites under positive selection (NIELSEN and YANG 1998; YANG et al. 2000), as did SUZUKI and NEI (2001, 2002).
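A per-site neutrality test of this kind can be sketched as a one-sided binomial test. The sketch below assumes that, under neutrality, each inferred change at a site is nonsynonymous with probability sN/(sS + sN), where sS and sN are the numbers of synonymous and nonsynonymous sites; the exact computation performed by Adaptsite may differ, and the function name and input values are illustrative only.

```python
from math import comb


def one_sided_selection_pvalue(c_syn: int, c_non: int, s_syn: float, s_non: float) -> float:
    """P(at least c_non nonsynonymous changes out of c_syn + c_non) under neutrality.

    c_syn, c_non: inferred numbers of synonymous and nonsynonymous changes at the site.
    s_syn, s_non: numbers of synonymous and nonsynonymous sites for the codon.
    """
    n_changes = c_syn + c_non
    if n_changes == 0:
        return 1.0  # no changes, so no evidence in either direction
    p_non = s_non / (s_syn + s_non)  # neutral chance that a change is nonsynonymous
    return sum(
        comb(n_changes, k) * p_non**k * (1.0 - p_non) ** (n_changes - k)
        for k in range(c_non, n_changes + 1)
    )


# Hypothetical codon with 1 synonymous site and 2 nonsynonymous sites.
for c_non in (4, 7, 8):
    print(c_non, one_sided_selection_pvalue(0, c_non, 1.0, 2.0))
```

With these hypothetical site counts, a codon with no synonymous changes needs at least eight inferred nonsynonymous changes before the one-sided test reaches the 5% level, which gives some intuition for why a site-by-site test needs many substitutions per site to have power.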
We also use the procedure of SUZUKI and GOJOBORI (1999) to test whether there is any site under positive selection in the whole protein, for comparison with the likelihood ratio test of NIELSEN and YANG (1998) and YANG et al. (2000). For such a test of positive selection in a protein, a correction for multiple testing is needed since each site is tested for positive selection independently. We use Simes' improved Bonferroni procedure (SIMES 1986). That is, we rank the P-values of the test on each site, from the lowest to the highest. If any site has a P-value smaller than the designated type I error α divided by its rank, we claim that the data set is significant for positive selection. Simulation studies showed that Simes' improved Bonferroni procedure has better power than the traditional Bonferroni procedure (SIMES 1986) and hence it is used in this study.
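The global test over sites can be sketched as follows. This sketch follows the usual statement of Simes' procedure, in which the n P-values are sorted in increasing order and the global null hypothesis is rejected if the i-th smallest P-value is at most iα/n for some i. Whether the ranking convention used in this study matches that statement exactly is an assumption here, and the function names and example P-values are hypothetical.

```python
def simes_global_test(p_values, alpha=0.05):
    """Return True if Simes' procedure rejects the global null hypothesis."""
    ordered = sorted(p_values)
    n = len(ordered)
    # Compare the i-th smallest P-value with i * alpha / n.
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))


def bonferroni_global_test(p_values, alpha=0.05):
    """Return True if the traditional Bonferroni procedure rejects the global null."""
    if not p_values:
        return False
    return min(p_values) <= alpha / len(p_values)


# Hypothetical per-site P-values for a four-codon alignment.
site_p_values = [0.02, 0.02, 0.03, 0.90]
print(simes_global_test(site_p_values), bonferroni_global_test(site_p_values))
```

In this toy input the second-smallest P-value (0.02) falls below 2 × 0.05/4 = 0.025, so the Simes test is significant, while the smallest P-value does not reach the Bonferroni threshold of 0.05/4 = 0.0125.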
Real and simulated data sets analyzed in this article:
HLA data used in SUZUKI and NEI (2001):
To understand why drastically different conclusions were reached by YANG and SWANSON (2002) and SUZUKI and NEI (2001) in the analysis of two similar data sets, we reanalyzed the data of SUZUKI and NEI (2001) using codeml. Following SUZUKI and NEI (2001), we fixed branch lengths at estimates obtained under a nucleotide-based model on a neighbor-joining tree (SAITOU and NEI 1987). As in SUZUKI and NEI (2001), the F61 model was used to account for codon usage bias, with the equilibrium codon frequencies estimated by the observed frequencies in the data (GOLDMAN and YANG 1994).
Simulated data:
Data sets were simulated using evolver in the PAML 3.13 package (YANG 2000b), on a 5-taxon tree (Figure 1A) and a 30-taxon tree (Figure 1B). The following parameters are common in all sets of simulations: (1) the transition/transversion rate ratio κ = 1, (2) the stationary frequency of each of the 61 sense codons is 1/61, (3) the number of codons in each sequence is 500, and (4) the tree length (the expected number of nucleotide substitutions per codon along all branches in the phylogeny) is 3. For each of the two tree topologies, six sets of different ω-values were simulated, as follows.
Data sets that contain only neutrally or negatively selected sites:
1. ω = 1 for all codon sites; 100 replicates.
2. (a) ω = 0 for 50% of the sites, and ω = 1 for 50% of the sites; 100 replicates.
   (b) ω = 0 for 90% of the sites, and ω = 1 for 10% of the sites; 100 replicates.
3. ω = 0.5 for 50% of the sites, and ω = 1 for 50% of the sites; 100 replicates.
Data sets that contain positively selected sites:
4. ω = 1.5 for 50% of the sites, ω = 1 for 50% of the sites; 100 replicates.
5. ω = 0 for 45% of the sites, ω = 1 for 45% of the sites, and ω = 1.5 for 10% of the sites; 50 replicates.
6. ω = 0 for 45% of the sites, ω = 1 for 45% of the sites, and ω = 5 for 10% of the sites; 50 replicates.
Note that the ω-values in three of the above schemes (schemes 2, 3, and 4) were identical to those used in SUZUKI and NEI (2002). Schemes 1, 5, and 6 are designed to mimic pseudogene evolution, weakly positively selected evolution, and highly positively selected evolution, respectively. We note that some of the simulation schemes used here are highly unrealistic for real data sets, such as scheme 4. However, they provide difficult test cases, useful for evaluating detection methods.
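To make the structure of such schemes concrete, the sketch below lays out the ω class of each of the 500 codon sites for two of the schemes (schemes 4 and 6) and computes the site-averaged ω. It does not simulate sequences, which is the job of evolver; the function and variable names are hypothetical, and the deterministic ordering of sites is chosen only for simplicity.

```python
def assign_site_omegas(classes, n_codons=500):
    """Expand (omega, proportion) pairs into one omega value per codon site."""
    total = sum(proportion for _, proportion in classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    omegas = []
    for omega, proportion in classes:
        omegas.extend([omega] * round(proportion * n_codons))
    if len(omegas) != n_codons:
        raise ValueError("proportions do not give whole numbers of sites")
    return omegas


def mean_omega(omegas):
    """Average omega over all codon sites."""
    return sum(omegas) / len(omegas)


# (omega, proportion of sites) pairs taken from the scheme descriptions above.
SCHEME_4 = [(1.5, 0.50), (1.0, 0.50)]
SCHEME_6 = [(0.0, 0.45), (1.0, 0.45), (5.0, 0.10)]

for name, scheme in (("scheme 4", SCHEME_4), ("scheme 6", SCHEME_6)):
    sites = assign_site_omegas(scheme)
    print(name, len(sites), mean_omega(sites))
```

Under scheme 6 the site-averaged ω is 0.95, below 1, even though 10% of the sites have ω = 5. This is the situation in which models that allow ω to vary among sites are expected to be more informative than an average over the whole sequence.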
Analysis of simulated data:
The simulated data were analyzed using the parsimony method with Adaptsite 1.3 (SUZUKI et al. 2001) and the maximum likelihood method with codeml in the PAML 3.13 package (YANG 2000b). The procedure for data analysis with Adaptsite is as follows:
1. Since Adaptsite cannot estimate the branch lengths of the tree, we used Bn-Bs (ZHANG et al. 1998) to estimate the synonymous branch lengths of the tree, with the true topology given.
2. Adaptsite-p was applied to the data, using the true tree topology and estimated branch lengths, to estimate the total and average numbers of synonymous and nonsynonymous sites for the phylogenetic tree with user-given mutation rates between the four nucleotides. The mutation rates between any two nucleotides were set to 1, since κ = 1 in the simulated data.
3. Given the output from adaptsite-p, we used adaptsite-t to compute the P-values of one-sided and two-sided neutrality tests independently for each codon site.
4. Since Adaptsite is not capable of analyzing some of the sites in the data sets (e.g., those that have >10,000 combinations for possible ancestral codons over all nodes), upon the program's author's recommendation, we excluded those sites in calculating the summarized results.
5. Tests of neutrality (ω ≤ 1 for all sites) were then completed using Simes' improved Bonferroni procedure (SIMES 1986) as described earlier.
We ranked only those sites that Adaptsite was able to analyze. Regarding step 1 above, SUZUKI and GOJOBORI (1999) used the neighbor-joining method for constructing the tree topology and then used the NEI and GOJOBORI (1986) method for estimating the number of synonymous substitutions. Since these two steps were implemented in one program included in the Adaptsite 1.3 package (SUZUKI et al. 2001), we used the Bn-Bs program (ZHANG et al. 1998) so that we can feed Adaptsite with the true tree topology. The Bn-Bs program implements a modified method from the original NEI and GOJOBORI (1986) to take into account the transition bias for estimating synonymous and nonsynonymous substitutions along the branches of a given tree. Steps 2–4 above are the standard procedures described in the README file included in the Adaptsite 1.3 package (SUZUKI et al. 2001).
The procedure for data analysis for codeml in PAML is as follows:
- Given the topology of the tree, models M0, M1, M2, M1a, M2a, M7, M8, and M8a are used, with κ fixed at 1 in all models. Under models M2, M2a, M7, M8, and M8a, the same analysis is conducted multiple times using different initial values, to investigate possible problems with convergence of likelihood optimizations or multiple local maxima of the likelihood function (YANG 1997; YANG et al. 2000).
- Log-likelihood values from each data set and the putative positively selected sites inferred by codeml are obtained. For a data set analyzed with different initial values, the result with a higher likelihood value is used, in accordance with standard theory (STUART et al. 1999).
- LRTs were performed to compare models M1 with M2, M1a with M2a, M7 with M8, and M8a with M8.
When interpreting the results we distinguish between tests of positive selection (the LRT and the parsimony-based test using a Bonferroni correction) and prediction of sites under positive selection.
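The rule of keeping the run with the highest likelihood when the same model is fitted from several starting points can be sketched as below. The record layout, the tolerance, and the numerical values are hypothetical and do not correspond to codeml output or to any data set analyzed here.

```python
def pick_best_run(runs, tolerance=0.01):
    """Return the run with the highest log likelihood and a disagreement flag.

    runs: list of dicts, each with at least an "lnL" entry.
    The flag is True when the runs differ by more than `tolerance` log-likelihood
    units, which hints at convergence problems or multiple local maxima.
    """
    if not runs:
        raise ValueError("at least one run is required")
    best = max(runs, key=lambda run: run["lnL"])
    worst = min(runs, key=lambda run: run["lnL"])
    disagree = (best["lnL"] - worst["lnL"]) > tolerance
    return best, disagree


# Hypothetical results of fitting one model from three sets of initial values.
runs_m8 = [
    {"start_omega": 0.5, "lnL": -2105.37},
    {"start_omega": 1.5, "lnL": -2101.92},
    {"start_omega": 3.0, "lnL": -2101.92},
]
best, disagree = pick_best_run(runs_m8)
print(best, disagree)
```

Only the log likelihood of the selected run would then enter the LRT; using a suboptimal run for the alternative model would understate the test statistic, and using one for the null model would overstate it.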
Analysis of the HLA data set:
The log-likelihood values and parameter estimates of the HLA data set of SUZUKI and NEI (2001) under various models are shown in Table 1. The results for M0 (one-ratio) are the same as those of SUZUKI and NEI (2001) (Table 1). However, the results for all other models, that is, M1 (neutral), M2 (selection), M3 (discrete), M7 (beta), and M8 (beta&ω), are different, and those in SUZUKI and NEI (2001) are incorrect. Models M2 (selection), M3 (discrete), and M8 (beta&ω), which allow for sites under positive selection, all suggest presence of such sites (Table 1). Those models also fit the data significantly better than the corresponding null models, namely M1 (neutral), M0 (one-ratio), and M7 (beta), respectively. A number of sites are identified by the models to be under positive selection. For example, model M8 identified 24 sites at the 95% probability level. Of these, 20 sites are on the list of 57 amino acids within the ARS (BJORKMAN et al. 1987a,b). The other 4 sites identified (45M, 83G, 94T, and 113Y; site numbering refers to the PDB structural file 1AKJ) are not on the list but are all located in the same region. The sites are very similar to those identified by YANG and SWANSON (2002) from a similar data set. Three of the 4 non-ARS sites (45M, 94T, and 113Y) were identified to be under positive selection by YANG and SWANSON (2002) as well.
Multiple runs using different starting values identified a suboptimal local maximum of the likelihood function for model M2 (selection) at p0 = 0.578 and p1 = 0.101, together with the associated estimate of ω2, with log likelihood ℓ = −8229.64. Model M8 (beta&ω) also has a local optimum, at p0 = 0.555, p = 0.031, q = 0.102, and ω = 0.046, with ℓ = −8228.63. These likelihood values are much lower than those in Table 1, and we use the results of Table 1 corresponding to the higher peaks. Note that if ω in M8 and ω2 in M2 are constrained to be ≥ 1, as suggested by SWANSON et al. (2003), there will be only one peak under those two models. Model M7 (beta) seems also to have a local maximum, at which one β-distribution parameter is estimated at 0.130, with ℓ = −8267.39.
Simulation results:
Hypothesis tests:
Table 2 shows the number of data sets detected by the two methods to have significant evidence for the presence of positive selection, for each set of parameter values. Note that under schemes 1, 2a, 2b, and 3, no sites are under positive selection with ω > 1, so that any data sets in which positive selection is claimed are false positives (type I errors). The improved Bonferroni procedure combined with Adaptsite did not detect positive selection in any of the data sets simulated under those schemes and thus had zero false positives. In general, the false-positive rate of the LRT with codeml is lower than or equal to the nominal significance level. In particular, the false-positive rates for the M7 vs. M8 comparison were all below 5%, much lower than the error rates reported by SUZUKI and NEI (2002). However, the type I errors of M8a-M8 comparison using the mixture of χ² distributions suggested by SWANSON et al. (2003) were about twice the desired level. The LRT comparing M8a vs. M8 using a χ²₁ distribution performed better. None of the original tests suggested by NIELSEN and YANG (1998) and YANG et al. (2000) had elevated levels of falsely significant results.
In sum, neither Adaptsite nor the LRT implemented in codeml suffers from an excess of falsely significant results under the simulation conditions investigated here. However, they differ dramatically in their power to detect positive selection. Note that under schemes 4, 5, and 6, sites under positive selection with ω > 1 exist, so that a method that detects positive selection more often has higher power. Adaptsite detected no positive selection even when ω = 5 in 10% of the sites (scheme 6) or when half of the sites were undergoing weak positive selection (scheme 4). In contrast, in scheme 4, the LRT between M7 and M8 (5% significance level) identified positive selection in 72 and 98% of the cases when the numbers of taxa were 5 and 30, respectively. In scheme 6 all the LRTs had power close to 100%. While Adaptsite essentially has zero power to detect positive selection under all of the conditions studied, the power of the LRT can be quite high even for five sequences, without inflating the type I error rate of the test.
Prediction of positively selected sites:
The accuracy of Adaptsite and codeml in predicting positively selected sites in data sets that do contain positively selected sites is shown in Table 3. Adaptsite detected <1% of the positively selected sites when either 50% (scheme 4) or 10% (scheme 5) of the sites were under weak positive selection (ω = 1.5). However, for 30 sequences when 10% of the sites are under strong positive selection (ω = 5 in scheme 6), Adaptsite identified 8% of those sites and had no false positives before Simes' improved Bonferroni procedure. Codeml performs even better on the same data sets, correctly identifying over 75% of the positively selected sites without wrongly categorizing any of the neutral sites as being positively selected. Furthermore, Adaptsite was not able to identify any positively selected sites with the same distribution of ω on the five-taxon tree, whereas codeml detected nearly 20% of them.
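The quantities used to judge site predictions in a simulation, where the truth is known, can be sketched as follows. The sketch reports the proportion of truly selected sites that are detected (power), the proportion of sites not under positive selection that are wrongly called (false-positive rate), and the proportion of called sites that are truly selected (the accuracy in the sense of ANISIMOVA et al. 2002). The function name and the toy data are hypothetical and are not taken from Table 3.

```python
def site_prediction_summary(truly_selected, predicted_selected):
    """Summarize per-site predictions against the known simulation truth.

    Both arguments are sequences of booleans, one entry per codon site.
    """
    pairs = list(zip(truly_selected, predicted_selected))
    true_pos = sum(1 for truth, call in pairs if truth and call)
    false_pos = sum(1 for truth, call in pairs if not truth and call)
    n_selected = sum(1 for truth, _ in pairs if truth)
    n_not_selected = len(pairs) - n_selected
    n_calls = true_pos + false_pos
    return {
        "power": true_pos / n_selected if n_selected else None,
        "false_positive_rate": false_pos / n_not_selected if n_not_selected else None,
        "accuracy": true_pos / n_calls if n_calls else None,
    }


# Hypothetical example: 500 codon sites, the last 50 truly under positive selection.
truth = [False] * 450 + [True] * 50
calls = [False] * 448 + [True] * 2 + [True] * 40 + [False] * 10
summary = site_prediction_summary(truth, calls)
print(summary)
```

The false-positive rate and one minus the accuracy have different denominators (sites not under selection versus sites called as selected), so the two quantities need not be close to each other.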
In the weak positive selection data sets (schemes 4 and 5), the empirical Bayes methods predict almost equal numbers of neutral and positively selected sites to belong to the positive selection category. The proportion of sites evolving neutrally that are predicted to be under positive selection can be as high as 36% with M8. The high error rates are due to inaccuracies in maximum likelihood estimates of parameters in the ω-distribution. Adaptsite predicts no positively selected sites in either category. None of the methods are capable of discriminating between sites in which ω = 1 and ω = 1.5 with any confidence. Clearly, differentiating between sites evolving under such similar values of ω is very hard.
Table 4 shows the proportion of neutral sites that are falsely predicted to be under positive selection by codeml in the data sets without positive selection. Results from Adaptsite are not included in Table 4, since it did not have any false positives. Again note that the distributions of ω in schemes 2a, 2b, and 3 are the same as those used in SUZUKI and NEI (2002). We did not find any false positives after the LRTs in these sets. However, there were still some false positives (<5% of cases for M1a vs. M2a and M7 vs. M8; <10% for M8a vs. M8) in the pseudogene set (scheme 1) after the LRT.
Predicting which sites are under positive selection is a very hard statistical problem, especially when the value of ω is low at the positively selected sites. None of the examined methods could reliably distinguish between sites evolving at ω = 1 and those evolving at ω = 1.5. Caution should thus be exercised against drawing strong conclusions when the estimated ω is only marginally >1, particularly if the estimated standard error of ω is large relative to ω − 1. Furthermore, the current implementation of the empirical Bayes approach fails to accommodate the sampling errors in the maximum likelihood estimates of model parameters (such as proportions of sites and the ω-ratios), and as a result, posterior probabilities calculated from small data sets may be inflated if they are based on inaccurate parameter estimates (ANISIMOVA et al. 2002). It is then important to consider the posterior probabilities only if the LRT is significant.
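The dependence of the empirical Bayes posterior probabilities on the estimated parameters can be illustrated with a minimal sketch. It assumes that the likelihood of the data at one site under each site class is already available and simply applies Bayes' rule with the estimated class proportions as prior weights. The function name, the class proportions, and the site likelihoods are all hypothetical numbers chosen for illustration.

```python
def class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class for a single codon site.

    proportions: estimated proportions of sites in each class (prior weights).
    site_likelihoods: likelihood of the site's data under each class.
    """
    weighted = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical likelihoods of one site under three classes:
# omega = 0, omega = 1, and omega > 1 (the positive selection class).
site_likelihoods = [1e-9, 4e-6, 6e-6]

# Posterior with proportions close to a scheme with 10% selected sites.
accurate = class_posteriors([0.45, 0.45, 0.10], site_likelihoods)

# Posterior when the proportion of selected sites is badly overestimated.
inaccurate = class_posteriors([0.05, 0.05, 0.90], site_likelihoods)

print(accurate[2], inaccurate[2])
```

With the same site likelihoods, the posterior probability of the positive selection class is about 0.25 under the first set of proportions but exceeds the 0.95 cutoff under the second, which shows how treating poorly estimated proportions as known can inflate the posterior probabilities.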
In sum, results of this simulation study suggest that the LRT of positive selection does not generally lead to an excess of false positives, when the models are applied correctly and optimization problems are eliminated, consistent with the simulation studies of ANISIMOVA et al. (2001, 2002). Previous claims of excessive false-positive rates for the ML method were based on results either known to be incorrect (SUZUKI and NEI 2001) or most likely caused by numerical optimization problems or simulation errors (SUZUKI and NEI 2002).
In contrast, Adaptsite was unable to identify positive selection in virtually all of the simulated data sets analyzed here. Even in scheme 6 with strong positive selection (ω = 5), when the LRT detected positive selection with ~100% power for both small and large trees and the empirical Bayes method distinguished between neutral and positively selected sites with great accuracy (Tables 2 and 3), Adaptsite essentially predicts all sites to be neutral. Similarly, in a real data set of the tax gene of a human T-cell lymphotropic virus, Adaptsite failed to detect positive selection even when the ω-ratio averaged over all sites and all branches is much greater than 1 (SUZUKI and NEI 2004). The lack of power of the method makes it unusable for testing positive selection except in large data sets with many sequences. This conclusion is consistent with the original study of SUZUKI and GOJOBORI (1999), who recommended its use in large data sets. While the method has been successful in several large data sets, of HLA alleles (SUZUKI and NEI 2001) and viral genes such as HIV-1 env (YAMAGUCHI-KABATA and GOJOBORI 2000), it is in general unknown how large the data set should be for the method to have any power. We suggest that failure of the method to detect positive selection should not be taken as evidence for absence of positive selection and that the method be used for exploratory data analysis only, to provide a heuristic assessment of synonymous and nonsynonymous changes at individual sites (see also FITCH et al. 1997).
It is quite possible that the likelihood models used for detecting positive selection can be violated such that the rate of false positives of the LRT is increased over the nominal level. Identification of such cases is an important step toward improving the methods, and we encourage researchers to continue the quest to find conditions under which the likelihood method fails. We also note that the empirical Bayes prediction can be improved, for example, by integrating over the uncertainty in the parameters in the ω-distribution. Likewise, T. MASSINGHAM and N. GOLDMAN (unpublished observations) have proposed a related likelihood procedure that may accurately control the false-positive rates. Future studies examining the properties of the methods for identifying positively selected sites may help to further improve and refine them.
Furthermore, the limitations of detection methods based on comparison of synonymous and nonsynonymous rates should always be borne in mind. Such methods detect positive selection only if there is an excess of nonsynonymous substitutions and are thus suitable for detecting recurrent diversifying selection, but may not detect directional selection that drives an advantageous mutation quickly to fixation. A reasonable amount of synonymous and nonsynonymous substitutions is also necessary for such methods to work, as too little information is available at low divergence levels while synonymous substitutions are often saturated at high divergence. In viral sequences, excessive recombination can also cause false positives for the detection method (ANISIMOVA et al. 2003).
ANISIMOVA, M., J. P. BIELAWSKI and Z. YANG, 2001 Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18: 1585-1592.
ANISIMOVA, M., J. P. BIELAWSKI and Z. YANG, 2002 Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19: 950-958.
ANISIMOVA, M., R. NIELSEN and Z. YANG, 2003 Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164: 1229-1236.
BERNATCHEZ, L., and C. LANDRY, 2003 MHC studies in nonmodel vertebrates: What have we learned about natural selection in 15 years? J. Evol. Biol. 16: 363-377.
BJORKMAN, P. J., S. A. SAPER, B. SAMRAOUI, W. S. BENNET, J. L. STROMINGER et al., 1987a Structure of the class I histocompatibility antigen, HLA-A2. Nature 329: 506-512.
BJORKMAN, P. J., S. A. SAPER, B. SAMRAOUI, W. S. BENNET, J. L. STROMINGER et al., 1987b The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329: 512-518.
BUSH, R. M., W. M. FITCH, C. A. BENDER and N. J. COX, 1999 Positive selection on the H3 hemagglutinin gene of human influenza virus A. Mol. Biol. Evol. 16: 1457-1465.
CHERNOFF, H., 1954 On the distribution of the likelihood ratio. Ann. Math. Stat. 25: 573-578.
CHOISY, M., C. H. WOELK, J. F. GUEGAN and D. L. ROBERTSON, 2004 Comparative study of adaptive molecular evolution in different human immunodeficiency virus groups and subtypes. J. Virol. 78: 1962-1970.
FITCH, W. M., 1971 Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20: 406-416.
FITCH, W. M., R. M. BUSH, C. A. BENDER and N. J. COX, 1997 Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl. Acad. Sci. USA 94: 7712-7718.
GOLDMAN, N., and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725-736.
HARTIGAN, J. A., 1973 Minimum mutation fits to a given tree. Biometrics 29: 53-65.
KIMURA, M., 1980 A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111-120.
LIBERLES, D. A., and M. L. WAYNE, 2002 Tracking adaptive evolutionary events in genomic sequences. Genome Biol. 3: REVIEWS1018.
LIBERLES, D. A., D. R. SCHREIBER, S. GOVINDARAJAN, S. G. CHAMBERLIN and S. A. BENNER, 2001 The adaptive evolution database (TAED). Genome Biol. 2: RESEARCH0028.
MUSE, S. V., and B. S. GAUT, 1994 A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11: 715-724.
NEI, M., and T. GOJOBORI, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418-426.
NIELSEN, R., and Z. YANG, 1998 Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929-936.
SAITOU, N., and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
SELF, S., and K.-Y. LIANG, 1987 Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Stat. Assoc. 82: 605-610.
SIMES, R. J., 1986 An improved Bonferroni procedure for multiple tests of significance. Biometrika 73: 751-754.
SORHANNUS, U., 2003 The effect of positive selection on a sexual reproduction gene in Thalassiosira weissflogii (Bacillariophyta): results obtained from maximum-likelihood and parsimony-based methods. Mol. Biol. Evol. 20: 1326-1328.
STUART, A., K. ORD and S. ARNOLD, 1999 Kendall's Advanced Theory of Statistics. Arnold, London.
SUZUKI, Y., and T. GOJOBORI, 1999 A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16: 1315-1328.
SUZUKI, Y., and M. NEI, 2001 Reliabilities of parsimony-based and likelihood-based methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 18: 2179-2185.
SUZUKI, Y., and M. NEI, 2002 Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 19: 1865-1869.
SUZUKI, Y., and M. NEI, 2004 False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol. Biol. Evol. 21: 914-921.
SUZUKI, Y., T. GOJOBORI and M. NEI, 2001 ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 17: 660-661.
SWANSON, W. J., and V. D. VACQUIER, 2002 The rapid evolution of reproductive proteins. Nat. Rev. Genet. 3: 137-144.
SWANSON, W. J., R. NIELSEN and Q. YANG, 2003 Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20: 18-20.
YAMAGUCHI-KABATA, Y., and T. GOJOBORI, 2000 Reevaluation of amino acid variability of the human immunodeficiency virus type 1 gp120 envelope glycoprotein and prediction of new discontinuous epitopes. J. Virol. 74: 4335-4350.
YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555-556.
YANG, Z., 2000a Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J. Mol. Evol. 51: 423-432.
YANG, Z., 2000b Phylogenetic Analysis by Maximum Likelihood (PAML), Version 3.13. University College, London.
YANG, Z., and J. P. BIELAWSKI, 2000 Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15: 496-503.
YANG, Z., and W. J. SWANSON, 2002 Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19: 49-57.
YANG, Z., S. KUMAR and M. NEI, 1995 A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141: 1641-1650.
YANG, Z., R. NIELSEN, N. GOLDMAN and A. M. PEDERSEN, 2000 Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155: 431-449.
ZHANG, J., and M. NEI, 1997 Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 44: S139-S146.
ZHANG, J., H. F. ROSENBERG and M. NEI, 1998 Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95: 3708-3713.