Genetics, Vol. 167, 949-958, June 2004, Copyright © 2004
doi:10.1534/genetics.102.010959
Detecting Selection in Noncoding Regions of Nucleotide Sequences
Wendy S. W. Wong¹ and Rasmus Nielsen
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850
¹ Corresponding author: 434 Warren Hall, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853.
E-mail: sww8{at}cornell.edu
>ABSTRACT
MODEL OF THE CODING...
MODEL OF THE NONCODING...
COMBINING BOTH THE NONCODING...
LIKELIHOOD-RATIO TESTS AND...
SIMULATION
DATA ANALYSIS
RESULTS
DISCUSSION
ACKNOWLEDGEMENTS
LITERATURE CITED
We present a maximum-likelihood method for examining the selection pressure and detecting positive selection in noncoding regions using multiple aligned DNA sequences. The rate of substitution in noncoding regions relative to the rate of synonymous substitution in coding regions is modeled by a parameter $\zeta$. When a site in a noncoding region is evolving neutrally $\zeta = 1$, while $\zeta > 1$ indicates the action of positive selection, and $\zeta < 1$ suggests negative selection. Using a combined model for the evolution of noncoding and coding regions, we develop two likelihood-ratio tests for the detection of selection in noncoding regions. Data analysis of both simulated and real viral data is presented. Using the new method we show that positive selection in viruses is acting primarily in protein-coding regions and is rare or absent in noncoding regions.
MUCH attention has recently been given to positive selection at the molecular level because of its functional importance. Positive selection has been identified in the coding region in the human immunodeficiency virus (HIV)-1 envelope gene (BONHOEFFER et al. 1995), the major histocompatibility complex (MHC; HUGHES and NEI 1988), the tumor suppressor gene BRCA1 (HUTTLEY et al. 2000), female reproductive proteins in mammals (SWANSON et al. 2001), and many other proteins (YANG and BIELAWSKI 2000).
Several methods can be used to detect selection acting on protein-coding regions. One of the common approaches is to estimate the nonsynonymous rate (dN) to synonymous rate (dS) ratio (the most frequently used symbols in this article are given in Table 1). The dN/dS ratio is sometimes referred to as $\omega$ (e.g., GOLDMAN and YANG 1994). When a codon site undergoes negative selection, synonymous substitutions occur at a faster rate than nonsynonymous substitutions, and therefore $\omega < 1$. However, when there is no selection (neutrality), the rate of synonymous substitutions is equal to the rate of nonsynonymous substitutions, i.e., $\omega = 1$. Alternatively, if the site undergoes positive selection, new mutations are beneficial and $\omega > 1$.
Inferences regarding $\omega$ have been used to demonstrate the presence of selection in many viral systems. Viruses may escape an existing immune response due to mutations in the proteins involved in interactions with the immune system. As a result, several viral proteins have been observed to evolve under strong positive selection. In the HIV-1 envelope gene, positive selection has been found at sites that code for the surface positions in the protein (BONHOEFFER et al. 1995; MINDELL 1996; NIELSEN and YANG 1998; YAMAGUCHI and GOJOBORI 1997; YAMAGUCHI-KABATA and GOJOBORI 2000). In human influenza A (H3N2), positive selection has been found in the hemagglutinin (HA) gene, which encodes a molecule that triggers the humoral immune response in humans (FITCH et al. 1997). ENDO et al. (1996) and YANG and BIELAWSKI (2000) gave more comprehensive lists of genes that are undergoing positive selection.
While positive selection has been found in the viral genes that code for proteins that interact with the host immune system, very little is known regarding selection in noncoding regions. A variety of research has shown that viral noncoding regions play an important role in gene regulation and function (SHIROKI et al. 1995; WALKER et al. 1995; TAKAYOSHI et al. 1998). Furthermore, CARTER and ROIZMAN (1996) showed that viral introns are involved in alternative splicing and regulation of their own gene expression. The functional importance of the noncoding regions suggests that selection may be acting on them. However, since most interaction between the host immune system and viruses is at the level of proteins and peptides, very little positive selection is expected in noncoding regions compared to that in coding regions. Unfortunately, this and other hypotheses regarding selection in the noncoding region have not been subject to scientific evaluation because no appropriate statistical method has been available for this purpose.
We here extend the NIELSEN and YANG (1998) and YANG et al. (2000) methods for detecting positive selection in coding regions to noncoding regions. We model the evolution in coding regions and the evolution in noncoding regions jointly and assume that the neutral (synonymous) nucleotide substitution rate is constant in both the coding and noncoding regions of the same gene. Under this assumption, we introduce a new parameter $\zeta$ to model the evolution in the noncoding region. $\zeta$ is the nucleotide substitution rate in the noncoding region, normalized by the synonymous nucleotide substitution rate in the coding region. Therefore, when a site is evolving neutrally, $\zeta = 1$. Similarly, $\zeta > 1$ indicates positive selection, while $\zeta < 1$ suggests the presence of negative selection. The interpretation of $\zeta$ is, therefore, similar to the interpretation of $\omega$ in models of evolution in coding regions. Using such a combined model we are able to develop a test to detect selection that is applicable to noncoding regions.
Four different simulated data sets are used to test the validity of the models. As an illustration of the method, we also compile 13 viral data sets from publications and GenBank to examine the hypothesis that positive selection mainly targets coding regions of viral genomes.
MODEL OF THE CODING...
The codon substitution process in the coding region is described by a rate matrix $Q = \{q_{ij}\}$, where $q_{ij}$ is the rate of substitution from codon $i$ to codon $j$. This rate is assumed to be proportional to the stationary frequency of the substituted nucleotide in codon $j$. If we represent codon $i$ as a triplet $i_1 i_2 i_3$ and codon $j$ as $j_1 j_2 j_3$ ($i_1, i_2, i_3, j_1, j_2, j_3 \in \{T, C, A, G\}$), and if codons $i$ and $j$ differ by exactly one nucleotide in position $k$, then

$$q_{ij} = \begin{cases} \mu_{j_k} & \text{synonymous transversion} \\ \kappa \mu_{j_k} & \text{synonymous transition} \\ \omega \mu_{j_k} & \text{nonsynonymous transversion} \\ \omega \kappa \mu_{j_k} & \text{nonsynonymous transition.} \end{cases} \qquad (1)$$

Additionally, $q_{ij} = 0$ if codons $i$ and $j$ differ in more than one position. The diagonal entries are defined as $q_{ii} = -\sum_{j \neq i} q_{ij}$ to fulfill the mathematical requirement that the row sums must equal 0. The parameter $\kappa$ is the transition/transversion rate ratio. $\omega$, as defined before, is the nonsynonymous/synonymous (dN/dS) rate ratio. $\mu_{j_k}$ is the stationary frequency of $j_k$, the nucleotide in the $k$th position of the codon. For instance, $q_{\mathrm{CAC},\mathrm{CAG}} = \omega \mu_G$, since it is a nonsynonymous transversion (from histidine to glutamine, and C $\to$ G is a transversion). The stationary frequency of codon $i$ ($i_1 i_2 i_3$) is assumed to be the product of the stationary frequencies of nucleotides $i_1$, $i_2$, and $i_3$, divided by the sum of the corresponding products over the sense codons:

$$\pi_i = \frac{\mu_{i_1} \mu_{i_2} \mu_{i_3}}{\sum_{l \in \text{sense codons}} \mu_{l_1} \mu_{l_2} \mu_{l_3}}. \qquad (2)$$

Here we assume a universal genetic code, but with slight modification the method can be applied to other genetic codes.
To reduce the number of free parameters, we restrict our model to be time reversible, i.e., $\pi_i q_{ij} = \pi_j q_{ji}$ for any $i$, $j$. It is easy to prove that this is indeed the case using standard methods. Furthermore, we take advantage of FELSENSTEIN's (1981) pruning algorithm to save computational time.
Note that this model is different from the GOLDMAN and YANG (1994) model. In our model the codon substitution rates depend on the stationary nucleotide frequencies, whereas in the GOLDMAN and YANG (1994) model, the codon substitution rates depend on the stationary codon frequencies. $\kappa$ and the stationary nucleotide frequencies ($\mu_T$, $\mu_C$, $\mu_A$, $\mu_G$) govern the neutral mutation rates while $\omega$ models the effect of selection.
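The codon model can be made concrete with a short sketch. The Python code below is an illustration only and is not taken from EvoNC: the function names and the nucleotide frequencies are hypothetical, while $\kappa = 5$ and $\omega = 0.4$ are the values used later in the simulations. It builds the rate matrix of Equation 1 and the codon frequencies of Equation 2 under the universal genetic code, so that the row-sum and time-reversibility conditions can be checked numerically.

```python
from itertools import product

BASES = "TCAG"
# Universal genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # the 61 sense codons
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def codon_rate(i, j, mu, kappa, omega):
    """Off-diagonal rate q_ij of Equation 1 for sense codons i != j."""
    diffs = [k for k in range(3) if i[k] != j[k]]
    if len(diffs) != 1:
        return 0.0  # codons differ at more than one position
    k = diffs[0]
    rate = mu[j[k]]  # stationary frequency of the substituted nucleotide
    if (i[k], j[k]) in TRANSITIONS:
        rate *= kappa  # transition rather than transversion
    if CODE[i] != CODE[j]:
        rate *= omega  # nonsynonymous change
    return rate


def codon_rate_matrix(mu, kappa, omega):
    """Return Q as a dict of dicts whose rows sum to zero."""
    q = {i: {j: codon_rate(i, j, mu, kappa, omega) for j in SENSE if j != i}
         for i in SENSE}
    for i in SENSE:
        q[i][i] = -sum(q[i].values())  # diagonal entry
    return q


def codon_frequencies(mu):
    """Stationary codon frequencies of Equation 2."""
    raw = {c: mu[c[0]] * mu[c[1]] * mu[c[2]] for c in SENSE}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


# Hypothetical nucleotide frequencies, chosen only for illustration.
mu = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
Q = codon_rate_matrix(mu, kappa=5.0, omega=0.4)
pi = codon_frequencies(mu)
```

With these inputs, `Q["CAC"]["CAG"]` equals $\omega \mu_G$ as in the example above, and `pi[i] * Q[i][j]` matches `pi[j] * Q[j][i]` for every pair of codons, which is the time-reversibility condition.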
MODEL OF THE NONCODING...
In the noncoding region, the rate of substitution from nucleotide $i$ to nucleotide $j$ ($i \neq j$) is given by

$$r_{ij} = \begin{cases} \zeta \kappa \mu_j & \text{transition} \\ \zeta \mu_j & \text{transversion,} \end{cases} \qquad (3)$$

where $i, j \in \{A, C, T, G\}$. $\zeta$ is the rate of substitution relative to the neutral rate. As in the codon model, the diagonal entries are $r_{ii} = -\sum_{j \neq i} r_{ij}$.
Under this model, the estimate of $\zeta$ at a site indicates the type of selection acting on it. As in the models by GAUT and WEIR (1994), MUSE (1995), NIELSEN and YANG (1998), and YANG et al. (2000) and in similar work by other authors, we assume that the synonymous substitution rate in the coding region reflects the neutral nucleotide substitution rate. Then

$$\zeta = \frac{\text{nucleotide substitution rate in the noncoding region}}{\text{synonymous substitution rate in the coding region}}. \qquad (4)$$

When $\zeta = 1$, the rate of nucleotide substitution at a site is equal to the synonymous codon substitution rate in the coding region, which in turn equals the synonymous nucleotide substitution rate. To illustrate this, consider C $\to$ G changes in both the noncoding and coding regions, and assume that the C $\to$ G change in the coding region is a change of codon TCC (serine) to TCG (serine). In the noncoding region, $r_{CG} = \zeta \mu_G$ since the C $\to$ G change is a transversion, whereas in the coding region, $q_{\mathrm{TCC},\mathrm{TCG}} = \mu_G$ since the change is a synonymous transversion. Likewise, $\zeta > 1$ implies that the substitution rate is greater than the neutral nucleotide substitution rate, whereas $\zeta < 1$ implies that the nucleotide substitution rate is lower than the neutral rate.
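The relationship between the noncoding rate and the neutral rate can be checked with a few lines of Python. This is a minimal sketch under assumed inputs: the function names and the nucleotide frequencies are hypothetical and do not come from EvoNC. It implements the transition/transversion rates of Equation 3 and reproduces the C to G example, in which the noncoding rate equals the synonymous transversion rate $\mu_G$ when the selection parameter is 1.

```python
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}
BASES = "TCAG"


def noncoding_rate(i, j, mu, kappa, zeta):
    """Rate of substitution from nucleotide i to nucleotide j (i != j)."""
    rate = zeta * mu[j]  # scaled by the frequency of the target nucleotide
    if (i, j) in TRANSITIONS:
        rate *= kappa  # transitions occur kappa times faster
    return rate


def noncoding_rate_matrix(mu, kappa, zeta):
    """4 x 4 rate matrix as a dict of dicts, with rows summing to zero."""
    r = {i: {j: noncoding_rate(i, j, mu, kappa, zeta)
             for j in BASES if j != i} for i in BASES}
    for i in BASES:
        r[i][i] = -sum(r[i].values())
    return r


# Hypothetical nucleotide frequencies, chosen only for illustration.
mu = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
neutral = noncoding_rate("C", "G", mu, kappa=5.0, zeta=1.0)    # equals mu["G"]
conserved = noncoding_rate("C", "G", mu, kappa=5.0, zeta=0.2)  # slower than neutral
positive = noncoding_rate("C", "G", mu, kappa=5.0, zeta=5.0)   # faster than neutral
```

Because the selection parameter multiplies every off-diagonal entry, changing it rescales the whole matrix without altering the relative rates of transitions and transversions.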
In our study we consider three rate classes of sites (negatively selected, neutral, and positively selected) in the noncoding region. We have implemented three models, namely the neutral model, the two-category model, and the three-category model (see Table 2). In the neutral model, we assume that there is no positive selection. Therefore, $\zeta$ can only be $\le 1$. In the two-category model, $\zeta$ can be $< 1$, and it can be $\ge 1$. In the three-category model, $\zeta$ can take on values in three categories: $\zeta < 1$, $\zeta = 1$, and $\zeta > 1$; that is,

$$\zeta_0 < 1, \quad \zeta_1 = 1, \quad 1 < \zeta_2 < \infty, \quad \text{and} \quad p_1 + p_2 + p_3 = 1.$$
The neutral model is a special case of the three-category model in which $p_2 = 0$ and a special case of the two-category model in which $\zeta_2 = 1$. Since the neutral model is nested within the two other models, a likelihood-ratio test of the hypothesis of no positive selection can be performed by comparing the maximum-likelihood value obtained under the two- and three-category models to the maximum-likelihood value obtained under the neutral model.
COMBINING BOTH THE NONCODING...
In our joint model of coding and noncoding regions, the individual sites (codon sites in the case of the coding region and nucleotide sites in the case of the noncoding region) are assumed to be independent. Therefore, given the model and a particular phylogenetic tree, the log likelihood can be calculated as the sum of the log likelihoods over sites,

$$\ell = \sum_{h \in \text{coding}} \log f(x_h) + \sum_{h \in \text{noncoding}} \log f(x_h), \qquad (5)$$

where $x_h$ denotes the data at site $h$.
The log likelihood defined in Equation 5 can be optimized using standard optimization techniques. The popular local unconstrained optimization procedure [Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm] adopted from Numerical Recipes in C (PRESS et al. 1992) is used in our program EvoNC. However, since the parameters in our models are constrained (all parameters are constrained to be $> 0$, the proportions of $\zeta$ in each category have to add up to 1, and some $\zeta$'s have to be $< 1$ and some $> 1$), it is a constrained optimization problem. We therefore replaced the original objective function by a quadratic penalty function. The quadratic penalty function consists of the original objective function and a penalty term for each constraint. The penalty term is a multiple of the square of the constraint violation when the current parameter vector violates the constraint and is 0 otherwise (NOCEDAL and WRIGHT 1999). A barrier term was also incorporated for each parameter to ensure that the vector of parameters lies in the interior of the parameter space.
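The penalty construction can be sketched as follows. This Python fragment is an assumption-laden illustration rather than the EvoNC implementation: the function name, the penalty weight, the parameter layout (three rate classes with their proportions), and the use of a logarithmic barrier are all hypothetical choices, since the text does not specify them. The objective is taken to be a negative log likelihood that is minimized.

```python
import math


def penalized_objective(neg_log_lik, zetas, props, weight=1e3, barrier=1e-6):
    """Negative log likelihood plus quadratic penalties and a log barrier.

    neg_log_lik: callable (zetas, props) -> float, the unconstrained objective.
    zetas: rate classes ordered so that zetas[0] should be < 1 and
           zetas[-1] should be > 1.
    props: category proportions, which should sum to 1.
    """
    params = list(zetas) + list(props)
    if min(params) <= 0.0:
        return math.inf  # outside the interior of the parameter space
    value = neg_log_lik(zetas, props)
    # Equality constraint: the proportions add up to 1.
    value += weight * (sum(props) - 1.0) ** 2
    # Inequality constraints: penalized only when violated.
    value += weight * max(0.0, zetas[0] - 1.0) ** 2
    value += weight * max(0.0, 1.0 - zetas[-1]) ** 2
    # Barrier term (assumed logarithmic) grows as a parameter approaches 0.
    value -= barrier * sum(math.log(x) for x in params)
    return value
```

A function of this form can be handed to any unconstrained minimizer such as BFGS. In a typical penalty scheme the weight is increased over successive runs so that the minimizer is pushed toward the feasible region.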
In NIELSEN and YANG (1998) and YANG et al. (2000) $\omega$ was allowed to vary among sites. In this study we do not pursue such models as our primary interest is in the noncoding region. Allowing $\omega$ to vary among sites in the coding region is likely to increase the likelihood of the model. It might affect the parameter estimates of $\zeta$ since $\omega$ and $\zeta$ may be correlated. However, implementing a model that allows variable $\omega$ will make the optimization procedure even more computationally intensive.
LIKELIHOOD-RATIO TESTS AND...
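Given $K$ categories of $\zeta$, $\zeta_1, \ldots, \zeta_K$, let the corresponding proportions of nucleotide sites in the categories be $p_1, \ldots, p_K$, under the constraint that $p_1 + \cdots + p_K = 1$. The probability of observing the data $x_h$ at site $h$ is then

$$f(x_h) = \sum_{k=1}^{K} p_k f(x_h \mid \zeta_k). \qquad (6)$$

The conditional probability can be calculated using an empirical Bayes approach as shown in NIELSEN and YANG (1998). The posterior probability that the nucleotide site $h$ belongs to category $k$ is given by

$$\mathrm{prob}(\zeta_k \mid x_h) = \frac{p_k f(x_h \mid \zeta_k)}{\sum_{l=1}^{K} p_l f(x_h \mid \zeta_l)}. \qquad (7)$$

We assign sites to categories by choosing the category $k$ that maximizes $\mathrm{prob}(\zeta_k \mid x_h)$.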
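The empirical Bayes classification amounts to a few arithmetic steps per site. The Python sketch below illustrates Equations 6 and 7 under assumed inputs: the function names are hypothetical, and the per-category site likelihoods and proportions in the example are made-up numbers, not values from any data set in this article. In practice the site likelihoods would come from the pruning algorithm evaluated at the maximum-likelihood parameter estimates.

```python
def site_posteriors(site_likelihoods, props):
    """Posterior probability of each category for one site (Equation 7).

    site_likelihoods[k] is f(x_h | zeta_k); props[k] is p_k.
    """
    joint = [p * f for p, f in zip(props, site_likelihoods)]
    total = sum(joint)  # marginal probability of the site data (Equation 6)
    return [v / total for v in joint]


def classify_site(site_likelihoods, props):
    """Return (index of the most probable category, its posterior)."""
    post = site_posteriors(site_likelihoods, props)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best]


# Hypothetical inputs for one site with three categories.
likelihoods = [2e-6, 1e-5, 4e-4]
proportions = [0.70, 0.25, 0.05]
category, posterior = classify_site(likelihoods, proportions)
```

Returning the posterior alongside the category makes it straightforward to apply a cutoff, so that only sites classified with high posterior probability are reported.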
The maximum-likelihood framework allows us to test the null hypothesis of no positive selection using likelihood-ratio tests. Two different likelihood-ratio tests can be performed by comparing the neutral model against the three-category model and the neutral model against the two-category model. The three-category model has two more parameters ($\zeta_3$, the category that allows $\zeta > 1$, and $p_3$, the proportion of sites with $\zeta_3$) than the neutral model. P values for this test may be approximated by comparing twice the log-likelihood difference between these two models with the $\chi^2$ distribution with 2 d.f. However, because one of the parameters is on the boundary of the parameter space and another parameter is not estimable under the null hypothesis, the true asymptotic distribution of the likelihood-ratio test statistic is not known under the null hypothesis. The resulting test will therefore be conservative. A better test, similar to test II in SWANSON et al. (2003) for the coding region, is to compare the neutral model against the two-category model. In this test twice the difference in log-likelihood is asymptotically distributed as a 50:50 mixture of a point mass at 0 and a $\chi^2$ distribution with 1 d.f. (CHERNOFF 1954; SELF and LIANG 1987). The benefits of this test are that the reduction in the degrees of freedom may in some cases increase the power of the test and that the true asymptotic distribution of the test statistic is actually known.
SIMULATION
$\omega$ was set to 0.4. Both regions shared the same $\kappa = 5$. The proportions of the three categories of $\zeta$ were chosen to reflect the distribution in real data sets. The distribution of $\zeta$ in the four data sets is shown in Table 3.
DATA ANALYSIS
Table 4 summarizes the sources and details of the 13 data sets; the GenBank accession numbers are available from the authors upon request. NEIGHBOR in the PHYLIP package version 3.6 (FELSENSTEIN 2002) was used for estimating phylogenetic trees. Alignment gaps were removed by our program EvoNC.
The coding region of each of the data sets was analyzed by codeml in the PAML package version 3.12 (YANG 1997). Models M7, M8 (YANG et al. 2000), and M8a (SWANSON et al. 2003) were used in the study to determine if positive selection exists in the coding region.
For each of the data sets, two tests were performed for the coding region. Test 1 compares twice the log-likelihood difference between M7 and M8 to a $\chi^2$ distribution with 2 d.f. As noted in YANG et al. (2000), this test is conservative. Test 2 compares twice the log-likelihood difference between M8 and M8a to a $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$ distribution, as suggested in SWANSON et al. (2003).
The selection in the noncoding region was investigated using EvoNC. Both the coding and the noncoding regions were used in the program. Similar to the coding region, two tests were performed for each of the data sets. Test 1 compares twice the log-likelihood difference between the neutral and the three-category model to a $\chi^2_2$ distribution, whereas test 2 compares twice the log-likelihood difference between the neutral and the two-category model to a $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$ distribution. For test 2, the critical value of the likelihood-ratio test statistic, for a test performed at the 5% significance level, is then 2.71.
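The two reference distributions have closed-form tail probabilities, so the P values can be computed without special software. The following Python sketch uses only the standard library; the function names are illustrative and are not part of EvoNC or PAML. It relies on two standard identities: the upper tail of a $\chi^2$ distribution with 1 d.f. is $\mathrm{erfc}(\sqrt{x/2})$, and that of a $\chi^2$ distribution with 2 d.f. is $e^{-x/2}$.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference, truncated at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square distribution with 2 d.f."""
    return math.exp(-x / 2.0)


def p_value_test1(stat):
    """Test 1: compare the statistic to chi-square with 2 d.f."""
    return chi2_sf_2df(stat)


def p_value_test2(stat):
    """Test 2: 50:50 mixture of a point mass at 0 and chi-square with 1 d.f."""
    if stat <= 0.0:
        return 1.0  # the point mass at zero makes any nonpositive value uninformative
    return 0.5 * chi2_sf_1df(stat)
```

Evaluating `p_value_test2(2.71)` gives a value very close to 0.05, which agrees with the critical value quoted above. For test 1 the corresponding 5% critical value is about 5.99.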
RESULTS
Simulation data:
The simulated data were analyzed using the three models (the neutral model, the two-category model, and the three-category model, as mentioned above), and the log-likelihood of each model was used to perform likelihood-ratio tests. Estimates of all parameters of the model were obtained from the three-category model. Each site was categorized according to the posterior probabilities calculated using Equation 7. The estimates of parameters are summarized in Table 5, and the classification of sites in the conserved and positive sets is shown in Figure 1.
The estimated values of $\zeta$ match the actual values quite well (Table 5). In the neutral data set, the same estimate of $\zeta$ was obtained for all sites under all models. At this boundary of the parameter space, the proportion of sites in the different categories is not identifiable. In the conserved data set, the maximum-likelihood estimates $\hat{\zeta}_0$ and $\hat{\zeta}_2$ obtained under the three-category model were virtually identical to the true values of 0.2 and 1.0. Again, the proportion of sites in categories 1 and 2 is not identifiable. The estimate of the proportion of conserved sites was $\hat{p}_0 = 0.51$, which was slightly larger than the true proportion of 0.50. In positive data set 1, the estimates $\hat{\zeta}_0$ and $\hat{\zeta}_2$ obtained under the three-category model differed slightly from the true values of 0.2 and 5.0, respectively. The estimated proportion of neutral sites was less than expected (0.15 vs. 0.25). On the other hand, $\hat{p}_0 = 0.79$ and $\hat{p}_2 = 0.06$, which were both larger than the true values of 0.725 and 0.025. In positive data set 2, the three-category model gave estimates $\hat{\zeta}_0$ and $\hat{\zeta}_2$ with corresponding proportions $\hat{p}_0 = 0.79$ and $\hat{p}_2 = 0.10$. Again, the estimated proportions were slightly larger than the true values. In the neutral and conserved data sets, all three models gave approximately the same maximum log-likelihood value. In these two cases, the neutral null hypothesis was true and was not rejected by the likelihood-ratio test.
In positive data set 1, both tests, based on the maximum log-likelihood values of the three models, rejected the false null hypothesis of no positive selection. In positive data set 2, however, only test 2 (neutral model vs. two-category model) was significant. This is probably due to the fact that only 5 of 200 sites were slightly positively selected.
The Bayesian classification of sites performed quite well for the first three simulated data sets. In the neutral data set, EvoNC classified all 200 sites correctly. In the conserved set, 14 of the actual neutral sites were miscategorized as conserved sites while 3 of the actual conserved sites were miscategorized as neutral sites. A total of 183 sites were classified correctly. In positive data set 1, the 5 positively selected sites were all included in the estimated set of positively selected sites. Three neutral sites were also falsely included in this set.
In positive data set 2, 1 out of the 5 positively selected sites was classified as being neutral and 16 out of the 195 negatively selected and neutral sites were classified as being positively selected. The main reason the method performs worse when there are only a few positively selected sites is that the estimates of $\zeta_2$ are less reliable. When the maximum-likelihood estimates of parameters have large variance, the empirical Bayes estimates of posterior probabilities may have reduced accuracy. Nevertheless, a close examination revealed that the falsely classified sites have low posterior probabilities. Therefore, by adjusting the cutoff probability the accuracy of the method can be controlled.
These results suggest that our method is capable of detecting strong positive selection, even when as few as 2.5% of the sites are positively selected. On the other hand, when only a small proportion of the sites are undergoing weak positive selection, the classification of sites is less accurate.
The viral data sets:
The results of the PAML analysis are summarized in Table 6. Five out of the 13 viral data sets have significant evidence for positive selection, using test 1 for the coding region (M7 vs. M8): glycoprotein of Ebola virus, foot-and-mouth disease polyprotein, hepatitis C polyprotein, Newcastle disease virus nucleocapsid protein, and poliovirus polyprotein. Among these 5 data sets, 2 were not significant using test 2 for the coding region (M8a vs. M8). This may be because the true value of $\omega$ was only slightly $> 1$ in the positively selected sites or, possibly, because the beta distribution assumed in model M7 did not fit the true distribution of $\omega$ well. For instance, in the foot-and-mouth disease data set, $\omega$ was equal to 1.01 in the positively selected sites. Clearly, this cannot be interpreted as good evidence for positive selection in this virus. On the other hand, for example, 11% of the sites in Ebola glycoprotein have $\omega = 5.92$, which is probably a good indication that this protein is undergoing positive selection.
In the noncoding regions there was little or no evidence of positive selection (Table 6). For most data sets, the log-likelihoods of all three models were almost exactly the same (Table 7). One exception was the Japanese encephalitis virus. The test of the two-category model vs. the neutral model in the 3'-untranslated region of Japanese encephalitis virus was marginally significant (P = 0.05), but the test of the three-category model vs. the neutral model (P = 0.22) was not. NAM et al. (2002) described this region as the variable region since it showed a high degree of sequence variation and deletion. However, they also pointed out that despite the fact that the region was highly variable, the predicted RNA structures all had a similar type of loop at the 5' terminus.
After correcting for multiple testing by the Bonferroni procedure, we conclude that the likelihood-ratio tests provide no evidence for positive selection in the viral data that we examined. This result confirms our expectation that positive selection occurs primarily in the coding regions of viruses.
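The effect of the Bonferroni correction can be illustrated with a short calculation. The Python sketch below is hypothetical in its inputs: it assumes one test per data set (13 tests), and all P values other than the marginal 0.05 reported for the Japanese encephalitis virus are made-up placeholders. The function name is likewise an illustration and not part of any program used in this study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each P value against the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# One marginal P value (0.05) plus 12 placeholder values for the other data sets.
p_values = [0.05] + [0.5] * 12
flags = bonferroni_significant(p_values)
threshold = 0.05 / len(p_values)
```

Under these assumptions the per-test threshold is 0.05/13, so a P value of 0.05 falls well short of significance after correction.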
DISCUSSION
The results of our analysis of simulated data suggest that the new method provides accurate parameter estimates. The likelihood-ratio tests also performed well and detected selection from 15 sequences when only 2.5% of the sites were undergoing positive selection. When more than one category of $\zeta$ was present, the program miscategorized approximately 10% of the sites. In small data sets the classification of sites may not have the highest accuracy. A similar conclusion has been reached regarding classification of sites in coding regions (ANISIMOVA et al. 2001, 2002). In the analysis of real data, it is advisable to confirm the presence of positive selection in particular sites by employing additional structural, functional, or evolutionary information.
Our model is based on the assumption that there is no selection acting on the synonymous sites and that the rate of substitution is constant among sites. This assumption would be violated if there is codon usage bias in the coding region. Many viruses may have codon usage bias due to overlapping reading frames and/or RNA structural constraints in coding regions. We do not know how much such bias would affect the parameter estimates, but this is a question that could be addressed using simulations.
The models presented here do not allow rate variation among the lineages in the phylogeny. This may result in a loss of statistical power of the likelihood-ratio tests. Our method will not be able to identify positively selected sites if positive selection occurs in only a few lineages, while the negative selection dominates the rest. In the future, we would like to incorporate rate variation among the lineages into the program as well.
Since the optimization procedure (BFGS algorithm) used here is a local optimization procedure, it is possible that the search becomes trapped at a local optimum of the likelihood. While we do not have a rigorous solution to the problem, it is advisable to run EvoNC with different sets of initial values. If they all result in the same log-likelihood it is very likely that one has found the global optimum.
EvoNC is currently under development for incorporating new models and functionalities. The beta version for the models tested in this article is available from the authors upon request.
LITERATURE CITED
ANISIMOVA, M., J. P. BIELAWSKI and Z. YANG, 2001 Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol. 18: 15851592.
ANISIMOVA, M., J. P. BIELAWSKI and Z. YANG, 2002 Accuracy and power of bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19: 950958.
BADRANE, H., and N. TORDO, 2001 Host switching in Lyssavirus history from the Chiroptera to the Carnivora orders. J. Virol. 75: 80968104.
BONHOEFFER, S., E. C. HOLMES and M. A. NOWAK, 1995 Causes of HIV diversity. Nature 376: 125.[CrossRef][Medline]
BREUN, L. A., T. J. BROERING, A. M. MCCUTCHEON, S. J. HARRISON, C. L. LUONGO et al., 2001 Mammalian reovirus L2 gene and lambda2 core spike protein sequences and whole-genome comparisons of reoviruses type 1 Lang, type 2 Jones, and type 3 Dearing. Virology 287: 333348.[CrossRef][Medline]
CARTER, K. L., and B. ROIZMAN, 1996 Alternatively spliced mRNAs predicted to yield frame-shift proteins and stable intron 1 RNAs of the herpes simplex virus 1 regulatory gene alpha 0 accumulate in the cytoplasm of infected cells. Proc. Natl. Acad. Sci. USA 93: 1253512540.
CHERNOFF, H., 1954 On the distribution of the likelihood ratio. Ann. Math. Stat. 25: 573578.
ENDO, T., K. IKEO and T. GOJOBORI, 1996 Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13: 685690.[Abstract]
FELSENSTEIN, J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17: 368376.[CrossRef][Medline]
FELSENSTEIN, J., 2002 Phylogenetic Inference Package (PHYLIP), Version 3.6. University of Washington, Seattle.
FITCH, W. M., R. M. BUSH, C. A. BENDER and N. J. COX, 1997 Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl. Acad. Sci. USA 94: 77127718.
FUJIWARA, K., O. YOKOSUKA, K. FUKAI, F. IMAZEKI, H. SAISHO et al., 2001 Analysis of full-length hepatitis A virus genome in sera from patients with fulminant and self-limited acute type A hepatitis. J. Hepatol. 35: 112119.[CrossRef][Medline]
GAUT, B. S., and B. S. WEIR, 1994 Detecting substitution-rate heterogeneity among regions of a nucleotide sequence. Mol. Biol. Evol. 11: 620629.[Abstract]
GOLDMAN, N., and Z. YANG, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725736.[Abstract]
HASEGAWA, M., H. KISHINO and T. YANO, 1985 Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: 160174.[CrossRef][Medline]
HUGHES, A. L., and M. NEI, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167170.[CrossRef][Medline]
HUTTLEY, G. A., S. EASTEAL, M. C. SOUTHEY, A. TESORIERO, G. G. GILES et al., 2000 Adaptive evolution of the tumour suppressor BRCA1 in humans and chimpanzees. Australian Breast Cancer Family Study. Nat. Genet. 25: 410413.[CrossRef][Medline]
LUO, K., H. HE, Z. LIU, D. LIU, H. XIAO et al., 2002 Novel variants related to TT virus distributed widely in China. J. Med. Virol. 67: 118126.[CrossRef][Medline]
MINDELL, D. P., 1996 Positive selection and rates of evolution in immunodeficiency viruses from humans and chimpanzees. Proc. Natl. Acad. Sci. USA 93: 32843288.
MUSE, S. V., 1995 Evolutionary analyses of DNA sequences subject to constraints of secondary structure. Genetics 139: 1429-1439.
MUSE, S. V., and B. S. GAUT, 1994 A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11: 715-724.
NAM, J. H., S. L. CHAE, S. H. PARK, Y. S. JEONG, M. S. JOO et al., 2002 High level of sequence variation in the 3' noncoding region of Japanese encephalitis viruses isolated in Korea. Virus Genes 24: 21-27.
NIELSEN, R., and Z. YANG, 1998 Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929-936.
NOCEDAL, J., and S. J. WRIGHT, 1999 Numerical Optimization, pp. 490-527. Springer-Verlag, New York.
PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING and B. P. FLANNERY, 1992 Numerical Recipes: The Art of Scientific Computing, pp. 425-430. Cambridge University Press, Cambridge/London/New York.
SALEMI, M., and A. M. VANDAMME, 2002 Hepatitis C virus evolutionary patterns studied through analysis of full-genome sequences. J. Mol. Evol. 54: 62-70.
SANCHEZ, A., S. G. TRAPPIER, B. W. MAHY, C. J. PETERS and S. T. NICHOL, 1996 The virion glycoproteins of Ebola viruses are encoded in two reading frames and are expressed through transcriptional editing. Proc. Natl. Acad. Sci. USA 93: 3602-3607.
SEAL, B. S., J. M. CRAWFORD, H. S. SELLERS, D. P. LOCKE and D. J. KING, 2002 Nucleotide sequence analysis of the Newcastle disease virus nucleocapsid protein gene and phylogenetic relationships among the Paramyxoviridae. Virus Res. 83: 119-129.
SELF, S., and K.-Y. LIANG, 1987 Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Stat. Assoc. 82: 605-610.
SHIROKI, K., T. ISHII, T. AOKI, M. KOBASHI, S. OHKA et al., 1995 A new cis-acting element for RNA replication within the 5' noncoding region of poliovirus type 1 RNA. J. Virol. 69: 6825-6832.
SWANSON, W. J., Z. YANG, M. F. WOLFNER and C. F. AQUADRO, 2001 Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc. Natl. Acad. Sci. USA 98: 2509-2514.
SWANSON, W. J., R. NIELSEN and Q. YANG, 2003 Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20: 18-20.
TAKAYOSHI, I., S. M. TAHARA and M. M. C. LAI, 1998 The 3'-untranslated region of hepatitis C virus RNA enhances translation from an internal ribosomal entry site. J. Virol. 72: 8789-8796.
THOMPSON, J. D., D. G. HIGGINS and T. J. GIBSON, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
VILCEK, S., and S. BELAK, 1997 Organization and diversity of the 3'-noncoding region of classical swine fever virus genome. Virus Genes 15: 181-186.
WALKER, P. A., L. E. LEONG and A. G. PORTER, 1995 Sequence and structural determinants of the interaction between the 5'-noncoding region of picornavirus RNA and rhinovirus protease 3C. J. Biol. Chem. 270: 14510-14516.
YAMAGUCHI, Y., and T. GOJOBORI, 1997 Evolutionary mechanisms and population dynamics of the third variable envelope region of HIV within single hosts. Proc. Natl. Acad. Sci. USA 94: 1264-1269.
YAMAGUCHI-KABATA, Y., and T. GOJOBORI, 2000 Reevaluation of amino acid variability of the human immunodeficiency virus type 1 gp120 envelope glycoprotein and prediction of new discontinuous epitopes. J. Virol. 74: 4335-4350.
YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555-556.
YANG, Z., and J. P. BIELAWSKI, 2000 Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15: 496-503.
YANG, Z., R. NIELSEN, N. GOLDMAN and A. M. PEDERSEN, 2000 Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155: 431-449.