Estimation of Effective Population Size of HIV-1 Within a Host: A Pseudomaximum-Likelihood Approach
Tae-Kun Seo(a,b), Jeffrey L. Thorne(c), Masami Hasegawa(a,b), and Hirohisa Kishino(d)
a Department of Biosystems Science, The Graduate University for Advanced Studies, Hayama, Kanagawa, 240-0193, Japan,
b The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu Minato-ku, Tokyo 106-8569, Japan,
c Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695-7566
d Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Yayoi 1-1-1 Bunkyo-ku, Tokyo 113-8657, Japan
Corresponding author: Tae-Kun Seo, Box 7566, North Carolina State University, Raleigh, NC 27695-7566. E-mail: seo{at}statgen.ncsu.edu
Communicating editor: G. B. GOLDING
| ABSTRACT |
|---|
Using pseudomaximum-likelihood approaches to phylogenetic inference and coalescent theory, we develop a computationally tractable method of estimating effective population size from serially sampled viral data. We show that the variance of the maximum-likelihood estimator of effective population size depends on the serial sampling design only because internal node times on a coalescent genealogy can be better estimated with some designs than with others. Given the internal node times and the number of sequences sampled, the variance of the maximum-likelihood estimator is independent of the serial sampling design. We then estimate the effective size of the HIV-1 population within nine hosts. If we assume that the mutation rate is 2.5 × 10⁻⁵ substitutions/generation and is the same in all patients, estimated generation lengths vary from 0.73 to 2.43 days/generation and the mean (1.47) is similar to the generation lengths estimated by other researchers. If we assume that generation length is 1.47 days and is the same in all patients, mutation rate estimates vary from 1.52 × 10⁻⁵ to 5.02 × 10⁻⁵. Our results indicate that effective viral population size and evolutionary rate per year are negatively correlated among HIV-1 patients.
One of the most striking features of human immunodeficiency virus type 1 (HIV-1) infection is the large variation among patients in the length of the asymptomatic period. During this period, the number of CD4+ T cells decreases slowly, the immune system gradually weakens, and the viral load remains roughly constant. The asymptomatic stage can last from a few years to >10 years. At the end of the asymptomatic period, progression to AIDS begins. This progression is characterized by increasing viral loads and a continued decrease in the number of CD4+ T cells (for a review, see …).
Mathematical models to account for the variation in the length of the asymptomatic period have been introduced (…).
Because of their high evolutionary rates, RNA viruses such as HIV-1 are potentially more informative about evolutionary processes than are more slowly evolving model systems. For example, under the constant-rate assumption of a molecular clock, the rate of molecular evolution and the internal node times of a phylogenetic tree can be estimated simultaneously from serially sampled viral data (…).
As another illustration of the rich information available in serially sampled viral data sets, important population genetic parameters that are confounded in contemporaneously isolated sequence data can be estimated separately from serially sampled data. For example, effective population size and the mutation rate per generation are two of the central parameters in population genetic theory. With contemporaneously isolated sequence data, only their product can be estimated. With serially sampled viral data and a known generation time, effective population size and mutation rate per generation can be estimated separately. Likewise, with serially sampled viral data and a known mutation rate, effective population size and generation time can be disentangled (…).
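A minimal Python sketch can make this confounding concrete. It is an illustration only, not part of the authors' procedure: it assumes a haploid coalescent in which i lineages coalesce at rate i(i − 1)/(2Ne) per generation, and it measures the intervals of a contemporaneous genealogy in expected substitutions per site, as sequence data alone would. The helper name and the interval values are hypothetical. Scaling Ne up and µ down by the same factor leaves the log-likelihood unchanged, so only the product Neµ is identifiable.

```python
import math

def loglik_substitution_scale(tau, ne, mu):
    """Coalescent log-likelihood for contemporaneous samples (hypothetical helper).

    tau maps i (number of lineages) to the length of the interval with i
    lineages, measured in expected substitutions per site. Assumes a haploid
    coalescent with rate i(i-1)/(2*ne) per generation and mu substitutions
    per site per generation.
    """
    ll = 0.0
    for i, t_subst in tau.items():
        rate = i * (i - 1) / (2.0 * ne)  # coalescence rate per generation
        t_gen = t_subst / mu             # convert the interval to generations
        # Density of t_gen, times the Jacobian 1/mu of the change of variables
        ll += math.log(rate) - rate * t_gen - math.log(mu)
    return ll

tau = {4: 0.001, 3: 0.002, 2: 0.005}
a = loglik_substitution_scale(tau, ne=1000.0, mu=1e-5)
b = loglik_substitution_scale(tau, ne=4000.0, mu=2.5e-6)  # same ne * mu
print(abs(a - b) < 1e-9)  # True: only the product ne * mu is identifiable
```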
The stages identified by …
In this article, we adopt a two-stage estimation procedure. Times of internal nodes are first estimated from sequence data, and these estimated node times then serve as the basis for inferring effective population size. Because the main interest is effective population size and not the times of internal nodes, internal node times are nuisance parameters in our analysis, and the number of these nuisance parameters increases with the number of sequences. This situation can be analyzed with a pseudomaximum-likelihood approach (…).
| THEORY |
|---|
Coalescent theory:
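Subsequent to the pioneering work of (…) … For n sequences sampled at the same time, let ti denote the length of the interval during which the genealogy has exactly i lineages (i = 2, …, n). Under the coalescent, ti has the probability density

$$p(t_i \mid N_e) = \frac{i(i-1)}{2N_e}\exp\left[-\frac{i(i-1)}{2N_e}\,t_i\right],$$

where we write Ne for the effective population size to emphasize that it is not necessarily the same as the total population size. Here, the time intervals ti are measured in generations. The vector of these intervals is denoted t. The mean and variance of ti are, respectively,

$$E(t_i) = \frac{2N_e}{i(i-1)}$$

and

$$\operatorname{Var}(t_i) = \left[\frac{2N_e}{i(i-1)}\right]^2.$$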
Initially, we treat the vector of time intervals t as observed and discuss inference of Ne for this situation. Later, we replace t in the equations below by its estimate. When there are n contemporaneously isolated sequences, the likelihood function is

$$L(N_e \mid \mathbf{t}) = \prod_{i=2}^{n} \frac{i(i-1)}{2N_e}\exp\left[-\frac{i(i-1)}{2N_e}\,t_i\right]. \tag{1}$$
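Using the log-likelihood equation

$$\frac{\partial \ln L(N_e \mid \mathbf{t})}{\partial N_e} = -\frac{n-1}{N_e} + \sum_{i=2}^{n}\frac{i(i-1)\,t_i}{2N_e^{2}} = 0, \tag{2}$$

the maximum-likelihood estimate of the effective population size is

$$\hat{N}_e = \frac{1}{n-1}\sum_{i=2}^{n}\frac{i(i-1)\,t_i}{2}. \tag{3}$$

This estimate is unbiased, and its variance is

$$\operatorname{Var}(\hat{N}_e) = \frac{N_e^{2}}{n-1} \tag{4}$$

(e.g., …).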
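As a quick numerical check of Equations 3 and 4, the sketch below simulates coalescent intervals from the exponential density of ti (assuming the haploid parameterization with mean 2Ne/[i(i − 1)]) and applies the estimator repeatedly. The sample size, Ne, seed, and number of replicates are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_intervals(n, ne, rng):
    """Draw t_i (generations with i lineages), i = 2..n, from the coalescent."""
    return {i: rng.expovariate(i * (i - 1) / (2.0 * ne)) for i in range(2, n + 1)}

def ne_mle(intervals):
    """Equation 3: mean of i(i-1)t_i/2 over the n - 1 coalescent intervals."""
    return sum(i * (i - 1) * t / 2.0 for i, t in intervals.items()) / len(intervals)

rng = random.Random(2002)
n, ne = 10, 4000.0
estimates = [ne_mle(simulate_intervals(n, ne, rng)) for _ in range(20000)]
print(round(statistics.mean(estimates)))      # close to 4000 (unbiased)
print(round(statistics.variance(estimates)))  # close to ne**2 / (n - 1), about 1.78e6
```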
Coalescent likelihood of serially sampled data:
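Recently, coalescent theory has been applied to the investigation of serially sampled viral populations (…) … Consider, for example, n1 sequences sampled at the most recent sampling time and n2 sequences sampled k2 generations earlier, with c lineages ancestral to the first sample remaining at time k2. For a vector t representing the pertinent time intervals (e.g., $t_{n_1}, \ldots, t_{c+1}, t^*_c + s_{c+n_2}, s_{c+n_2-1}, \ldots, s_2$), where $t^*_c$ is the time, ending at k2, during which c lineages remain from the first sample, and $s_j$ is the length of an interval with j lineages further back in time than k2, the likelihood can be expressed as a product of probability densities of time intervals:

$$L(N_e \mid \mathbf{t}) = \prod_{i=c+1}^{n_1}\frac{i(i-1)}{2N_e}\exp\left[-\frac{i(i-1)t_i}{2N_e}\right] \times \exp\left[-\frac{c(c-1)t^*_c}{2N_e}\right] \times \prod_{j=2}^{c+n_2}\frac{j(j-1)}{2N_e}\exp\left[-\frac{j(j-1)s_j}{2N_e}\right]. \tag{5}$$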
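As was shown by (…) …, the likelihood in Equation 5 can be written as a product of n − 1 factors, each associated with one coalescent event (here n = n1 + n2 is the total number of sampled sequences):

$$L(N_e \mid \mathbf{t}) = \prod_{i=1}^{n-1} L_i(N_e \mid \mathbf{t}). \tag{6}$$

Moreover, each factor provides some information regarding the value of Ne. We define

$$\tilde{N}_e^{(i)} = \arg\max_{N_e} L_i(N_e \mid \mathbf{t}), \tag{7}$$

where argmax_x f(x) represents the value of x that maximizes f(x). Because the lengths of time intervals between coalescent events are random, the Ñ(i)e are random variables. By transforming the random variables from the time intervals in Equation 6 to the Ñ(i)e, the distributions of the Ñ(i)e are seen to be independent of the serial sampling design. Specifically, the probability density p(Ñ(i)e|Ne) is an exponential distribution with mean Ne. By expressing Equation 5 in terms of the Ñ(i)e instead of the time intervals, we get

$$L(N_e \mid \tilde{N}_e^{(1)}, \ldots, \tilde{N}_e^{(n-1)}) = \prod_{i=1}^{n-1} \frac{1}{N_e}\exp\left(-\frac{\tilde{N}_e^{(i)}}{N_e}\right). \tag{8}$$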
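The maximum-likelihood estimate of Ne is therefore

$$\hat{N}_e = \frac{1}{n-1}\sum_{i=1}^{n-1}\tilde{N}_e^{(i)}. \tag{9}$$

Because the Ñ(i)e are independent exponential random variables with mean Ne,

$$\operatorname{Var}(\hat{N}_e) = \frac{N_e^{2}}{n-1}. \tag{10}$$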
It is straightforward to show that the approaches used here to derive N̂e and Var(N̂e) apply to more general serial sampling designs. Therefore, given the divergence times, Var(N̂e) is affected by the total number of sampled sequences but not by other particulars of the serial sampling design.
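The per-event construction can be sketched in Python. This is an illustrative reimplementation under assumptions of our own: a haploid coalescent with rate C(k, 2)/Ne, time measured backward in generations from the most recent sample, and a hypothetical event-list format. For each coalescent event, the likelihood factor is proportional to (1/Ne) exp(−A/Ne), where A is the sum of C(k, 2) × (interval length) accumulated since the previous coalescence, so A is the per-event estimate of Equation 7. The simulation compares a serial design with a contemporaneous design that has the same total number of sequences (20).

```python
import math
import random
import statistics

def per_event_ne(events):
    """Return N~e^(i) for each coalescent event (Equation 7, haploid model).

    events: (time, kind, count) tuples sorted by time, measured backward in
    generations from the most recent sample. kind is "sample" (count lineages
    enter the genealogy) or "coal" (two lineages merge; count is ignored).
    """
    lineages, last_time, acc, out = 0, 0.0, 0.0, []
    for time, kind, count in events:
        # Accumulate C(k, 2) * (interval length) since the last coalescence.
        acc += lineages * (lineages - 1) / 2.0 * (time - last_time)
        last_time = time
        if kind == "sample":
            lineages += count
        elif kind == "coal":
            out.append(acc)  # maximizer over Ne of (1/Ne) * exp(-acc / Ne)
            acc, lineages = 0.0, lineages - 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return out

def simulate_serial(samples, ne, rng):
    """Simulate coalescent events for sampling batches [(time, count), ...]."""
    events, t, k, idx = [], 0.0, 0, 0
    while True:
        next_sample = samples[idx][0] if idx < len(samples) else math.inf
        if k >= 2:
            # Memorylessness lets us redraw the waiting time after each sample.
            wait = rng.expovariate(k * (k - 1) / (2.0 * ne))
            if t + wait < next_sample:
                t += wait
                events.append((t, "coal", 1))
                k -= 1
                continue
        if next_sample == math.inf:
            return events  # a single ancestral lineage remains
        t = next_sample
        events.append((t, "sample", samples[idx][1]))
        k += samples[idx][1]
        idx += 1

rng = random.Random(7)
ne = 4000.0
designs = {
    "serial": [(248.3 * j, 5) for j in range(4)],  # 5 sequences per yearly sample
    "contemporaneous": [(0.0, 20)],
}
results = {}
for name, design in designs.items():
    est = [statistics.mean(per_event_ne(simulate_serial(design, ne, rng)))
           for _ in range(5000)]
    results[name] = (statistics.mean(est), statistics.variance(est))
    print(name, round(results[name][0]), round(results[name][1]))
```

Both designs should give a mean near 4000 and a variance near Ne²/19 (about 8.4 × 10⁵), in line with Equation 10; the 248.3-generation spacing corresponds to the 1-year interval used in the simulations described in RESULTS.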
Pseudomaximum-likelihood approach for serial samples:
Usually, the time of sampling is measured in chronological units (days, years, etc.), and the times of internal nodes are inferred in the same units. To apply coalescent theory, chronological time must be converted to generations. We can estimate the evolutionary rate r (number of substitutions per site per year) with serially sampled data. For simplicity, we assume that r is approximately constant. So that the mutation rate per generation can be converted into the substitution rate per year, we also assume that all mutations are selectively neutral. If the generation length g (days/generation) is known, 1 year corresponds to roughly 365/g generations, and the mutation rate µ (number of mutations/generation) can be calculated as rg/365. If the mutation rate µ is known, 1 year corresponds to roughly r/µ generations, and the generation length g can be calculated as 365µ/r. Unfortunately, µ and g cannot be determined separately from serially sampled data measured in chronological time units alone; one of the two must be inferred from other information. Previously estimated values of µ include 4.0 × 10⁻⁵ (…) …, and previously estimated values of g include 2.6 days (…) ….
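The unit conversions above amount to simple arithmetic, sketched here in Python with g denoting the generation length in days (the function names are hypothetical). The example values are the mean evolutionary rate and the generation length used later in the simulations.

```python
DAYS_PER_YEAR = 365.0

def mu_from_generation_length(r, g):
    """Mutation rate per generation, mu = r * g / 365, given g in days."""
    return r * g / DAYS_PER_YEAR

def generation_length_from_mu(r, mu):
    """Generation length in days, g = 365 * mu / r, given mu per generation."""
    return DAYS_PER_YEAR * mu / r

def generations_per_year(g):
    """Number of generations in one year, 365 / g."""
    return DAYS_PER_YEAR / g

r = 0.00735  # substitutions per year (mean over the nine patients)
print(round(generations_per_year(1.47), 1))            # 248.3
print(f"{mu_from_generation_length(r, 1.47):.3g}")    # 2.96e-05
print(round(generation_length_from_mu(r, 2.5e-5), 2))  # 1.24
```

The last value is the generation length implied by the mean rate. It differs from the mean of per-patient generation lengths, because 365µ/r is a nonlinear (convex) function of r.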
Equations 3 and 9 apply when the time intervals are known. In real situations, the intervals are estimated rather than known with certainty, because the sequence data X are insufficient for exact estimation of the internal node times and topology. Therefore, uncertainty in the internal node times needs to be considered when calculating the variance of the maximum-likelihood estimate of effective population size. Suppose that one of µ and g is known exactly. For n sequences (equivalently, n − 1 coalescent events),

$$p(X, \mathbf{t} \mid N_e, r) = p(\mathbf{t} \mid N_e)\,p(X \mid \mathbf{t}, r). \tag{11}$$
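The first factor above, p(t|Ne), is itself a product of n − 1 terms that each correspond to a specific coalescent event (see Equations 5 and 6). The second factor, p(X|t, r), is the conventional quantity that is calculated for phylogeny reconstruction with Felsenstein's pruning algorithm (…). Defining l1(Ne|t) = ln p(t|Ne) and l2(t) = ln p(X|t, r), we get

$$l(N_e, \mathbf{t}) = \ln p(X, \mathbf{t} \mid N_e, r) = l_1(N_e \mid \mathbf{t}) + l_2(\mathbf{t}).$$

Also, we note that

$$l_1(N_e \mid \mathbf{t}) = \sum_{i=1}^{n-1} l_{1,i}(N_e \mid \mathbf{t}),$$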
where l1,i(Ne|t) corresponds to one particular coalescent event among the n − 1. With pseudomaximum likelihood, all parameters except those of interest are regarded as nuisance parameters, and replacing the nuisance parameters with consistent estimates simplifies the estimation problem. Instead of estimating t and Ne simultaneously, we infer t̂ by maximizing l2 and then estimate Ne by maximizing l1(Ne|t̂). We use N̂e(t) to denote the value of Ne that maximizes l1(Ne|t) for a fixed value of t. Note that the estimates t̂ and N̂e(t̂) are not necessarily equal to those obtained by jointly maximizing l(Ne, t) over Ne and t.
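The values of t̂ can be inferred with a maximum-likelihood method that incorporates serially sampled data (e.g., the TipDate software package; …). Once the values of t̂ are obtained, N̂e(t̂) can be inferred by solving ∂l1(Ne|t̂)/∂Ne = 0 (Equation 2). This approach is sensible because the maximum-likelihood estimate of t is consistent (…). The variance of N̂e(t̂) can be approximated as

$$\operatorname{Var}\bigl[\hat{N}_e(\hat{\mathbf{t}})\bigr] \approx \operatorname{Var}\bigl[\hat{N}_e(\mathbf{t})\bigr] + \left(\frac{\partial \hat{N}_e(\mathbf{t})}{\partial \mathbf{t}}\right)^{\!\top} \operatorname{Var}(\hat{\mathbf{t}}) \left(\frac{\partial \hat{N}_e(\mathbf{t})}{\partial \mathbf{t}}\right). \tag{12}$$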
The right side of Equation 12 is the sum of two terms. The first term, Var[N̂e(t)], is the variance of N̂e when the divergence times are known (see Equation 10). For a given value of Ne, the divergence times are random and vary stochastically among coalescent realizations; it is this randomness due to genetic sampling that is captured by Var[N̂e(t)]. We can approximate this genetic sampling term with N̂e(t̂)²/(n − 1). The second term in Equation 12 arises from statistical sampling: with finite sequence lengths, divergence times cannot be estimated perfectly, and this term represents the resulting uncertainty in N̂e(t̂). The impact of this statistical sampling can be handled either via a numerical approximation of the second term of Equation 12 or by simulation (see below).
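For contemporaneous samples, N̂e(t) in Equation 3 is linear in the intervals, so the gradient in Equation 12 is exact and both terms can be computed directly. The sketch below assumes that the covariance matrix of the estimated intervals is already available (for example, from an inverse Fisher information matrix); the function name and the numbers in the example are hypothetical. It uses the plug-in N̂e(t̂)²/(n − 1) for the genetic sampling term.

```python
import numpy as np

def ne_variance_terms(intervals, cov_intervals):
    """Two terms of Equation 12 for contemporaneous samples (illustrative).

    intervals: estimated t_n, t_(n-1), ..., t_2 in generations.
    cov_intervals: covariance matrix of those estimates.
    Returns (ne_hat, genetic_term, statistical_term).
    """
    t = np.asarray(intervals, dtype=float)
    n = t.size + 1
    i = np.arange(n, 1, -1)              # lineage counts n, n-1, ..., 2
    grad = i * (i - 1) / 2.0 / (n - 1)   # dNe_hat/dt_i (Equation 3 is linear)
    ne_hat = float(grad @ t)
    genetic = ne_hat ** 2 / (n - 1)      # plug-in for Var[Ne_hat(t)], Equation 10
    statistical = float(grad @ np.asarray(cov_intervals, dtype=float) @ grad)
    return ne_hat, genetic, statistical

ne_hat, genetic, statistical = ne_variance_terms([1.0, 2.0], 0.1 * np.eye(2))
print(ne_hat, genetic, round(statistical, 6))  # 2.5 3.125 0.25
```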
| METHODS |
|---|
Sequence sampling from asymptomatic period:
We analyzed the C2-V5 region of HIV-1 env sequences from nine patients. These data were described by (…).
A striking pattern among the nine HIV-1 sequence data sets is that divergence, defined as the evolutionary distance from an early founder sequence, increases linearly during the first portion of the infection and then tends to stabilize at a later stage (…).
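Whether a data set lies in the linear phase can be checked informally by regressing divergence on sampling time. The sketch below is a simple least-squares diagnostic, not the maximum-likelihood procedure used in this study, and the divergence values are hypothetical. Because sequences share ancestry, the points are not independent, so the slope is only a rough indication of the rate and its least-squares standard error should not be trusted.

```python
import numpy as np

def divergence_slope(sampling_years, divergence):
    """Least-squares slope and intercept of divergence on sampling time."""
    slope, intercept = np.polyfit(np.asarray(sampling_years, dtype=float),
                                  np.asarray(divergence, dtype=float), 1)
    return float(slope), float(intercept)

years = [0.0, 1.0, 2.0, 3.0]
div = [0.001, 0.0085, 0.016, 0.0235]   # hypothetical substitutions/site
slope, intercept = divergence_slope(years, div)
print(f"{slope:.4f} {intercept:.4f}")  # 0.0075 0.0010
```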
The sequence set from each patient was aligned with the default options of ClustalW version 1.07 (…).
Phylogenies were then inferred from the aligned sequences by maximum likelihood. Tree reconstruction was performed in three steps. First, a consensus of all sequences with the earliest isolation date from a particular patient was constructed. This consensus sequence was added to the data and treated as an outgroup. Then, the neighbor-joining method (…) …
Estimation of evolutionary rate and effective population size:
With Equation 12, knowledge of Var(t̂) is necessary to determine Var[N̂e(t̂)]. According to asymptotic theory, the distribution of maximum-likelihood estimates can be approximated by a multivariate normal distribution, and the approximation improves as sequence length increases. We approximated the variance of the estimated times of internal nodes with the inverse of a numerically estimated Fisher information matrix. By sampling from the resulting multivariate normal distributions, we simulated new sets of estimated times and then estimated effective population sizes from these simulated times. To estimate Ne in this fashion, it was necessary to assume a specific value for either g or µ and then apply a generalization of Equation 5. For each patient and with each value of g or µ, variance estimates were based on 20,000 samples from the appropriate multivariate normal distribution. This computational procedure yields an approximation of the second of the two terms in Equation 12 that are needed to estimate Var[N̂e(t̂)].
When node times are sampled from a multivariate normal distribution, some samples may be inconsistent with the serial sampling design. For example, in Equation 5, if a set of sampled node times implies more than n1 − 1 coalescent events between the most recent sampling time and time k2, the likelihood function is not defined. Sets of sampled node times for which the likelihood function was not defined were discarded.
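The sampling-and-discard procedure can be sketched with NumPy. The example is hypothetical: three contemporaneous sequences, node heights (h3, h2) with an assumed mean and diagonal covariance, and Equation 3 written in terms of heights. Draws that violate 0 < h3 < h2 play the role of samples for which the likelihood is undefined. Because the estimator is linear here, the Monte Carlo variance should be close to the delta-method value 10⁴ + 0.25 × 9 × 10⁴ = 32,500.

```python
import numpy as np

def statistical_variance_mc(node_mean, node_cov, estimator, is_valid,
                            n_samples=20000, seed=0):
    """Variance of an Ne estimator induced by node-time uncertainty.

    Node times are drawn from N(node_mean, node_cov); draws rejected by
    is_valid (inconsistent with the sampling design) are discarded.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(node_mean, node_cov, size=n_samples)
    estimates = np.array([estimator(d) for d in draws if is_valid(d)])
    return float(estimates.var(ddof=1)), estimates.size

# Hypothetical three-sequence example with node heights (h3, h2) in generations:
# t_3 = h3 (three lineages), t_2 = h2 - h3 (two lineages), n - 1 = 2 events.
def ne_from_heights(h):
    h3, h2 = h
    return (3.0 * h3 + 1.0 * (h2 - h3)) / 2.0   # Equation 3

def heights_valid(h):
    return 0.0 < h[0] < h[1]

var_stat, kept = statistical_variance_mc([500.0, 2000.0],
                                         np.diag([100.0**2, 300.0**2]),
                                         ne_from_heights, heights_valid)
print(kept, round(var_stat))  # nearly all 20000 kept; variance near 32500
```

Adding the genetic sampling term N̂e(t̂)²/(n − 1) to this value gives the approximation of Var[N̂e(t̂)].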
As an alternative to our strategy of sampling from a multivariate normal distribution to estimate the uncertainty in N̂e(t̂) due to uncertainty in node times, a bootstrap approach could be used, in which sites are sampled with replacement. For each bootstrap sample, effective population size could be estimated, and the variance of these estimates among bootstrap samples could be determined. As with our multivariate sampling strategy, this bootstrap approach would estimate only the uncertainty in N̂e(t̂) that is due to uncertainty in node times. This contribution from node time uncertainty would be added to N̂e(t̂)²/(n − 1) to approximate Var[N̂e(t̂)]. The advantages of our multivariate normal sampling strategy over the bootstrap alternative are simplicity and computational feasibility.
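A minimal sketch of the site bootstrap is shown below. In a real analysis the estimator would re-estimate node times and Ne on each resampled alignment, which is the computationally expensive part; here a toy p-distance between two sequences stands in for it, and all names are hypothetical.

```python
import numpy as np

def site_bootstrap_variance(alignment, estimator, n_boot=200, seed=0):
    """Variance of estimator(alignment) over bootstrap resamples of sites.

    alignment: 2-D array-like (sequences x sites). Columns are resampled with
    replacement; estimator maps a resampled alignment to a number.
    """
    aln = np.asarray(alignment)
    rng = np.random.default_rng(seed)
    n_sites = aln.shape[1]
    stats = [estimator(aln[:, rng.integers(0, n_sites, size=n_sites)])
             for _ in range(n_boot)]
    return float(np.var(stats, ddof=1))

def p_distance_first_pair(aln):
    """Toy stand-in estimator: proportion of differing sites, sequences 0 and 1."""
    return float(np.mean(aln[0] != aln[1]))

aln = np.array([list("ACGTACGTAC"), list("ACGTACGAAC")])
print(site_bootstrap_variance(aln, p_distance_first_pair) > 0.0)  # True
```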
| RESULTS |
|---|
Correlation between Ne and r:
Table 1 shows estimated evolutionary rates and effective population sizes. Evolutionary rates appear to vary widely among patients. This variation can be explained by any of three scenarios: (1) mutation rates are the same among patients, but generation lengths differ; (2) generation lengths are the same, but mutation rates differ; (3) both mutation rates and generation lengths differ. Because estimates of mutation rate and generation length from serially sampled data are confounded when sampling times are measured in chronological units, we considered only the first two of the three potential causes of evolutionary rate variation.
When an identical mutation rate among patients was assumed, we used the value 2.5 × 10⁻⁵ (mutations per generation) that was estimated by (…) …
Estimates of the generation time g have been obtained via a wide variety of approaches, and these estimates themselves have varied widely. Estimates of g range from 1.2 days (…) … Our estimates of g ranged among patients from 0.73 to 2.43 days, with a mean of 1.47 days. This mean value is close to values of 1.78 days (…) …
Fig 2 shows the relation between evolutionary rate and effective population size for two scenarios: constant generation length and constant mutation rate. Assuming a viral generation length of 1.47 days in each patient, we observed a clearly negative correlation between evolutionary rate and effective population size. A negative correlation was also observed when the mutation rate was assumed to be shared among viruses in different patients, but here the correlation was weaker.
Bias and precision of pseudomaximum-likelihood estimates:
Via simulation, we investigated the bias and precision of effective population size estimates under various circumstances (Table 2). Our simulation scenarios differed in the number of sequences (5, 10, 15, 20, or 25) sampled at each of four sampling times. The interval between sampling times was 1 year, which corresponds to 248.3 generations under the assumption that the generation length (g) is 1.47 days. The true value of Ne was set to 4000 because this is close to the mean estimated effective population size (4232.2) of the nine patients when g was set equal to 1.47 days. Sequence lengths were simulated to be 600 bases, because the lengths of the aligned sequences from the nine patients ranged from 544 to 600 bases. The evolutionary rate was set to 0.00735 substitutions/year because this is the mean of the estimates from the nine patients. This evolutionary rate corresponds to a mutation rate of 2.97 × 10⁻⁵ mutations/generation.
Node times and the tree topology were simulated according to the coalescent process. In the simulations, one additional sequence was added so that the root of the ingroup genealogy could be inferred. This additional sequence had a sampling time that was set to chronologically precede the root of the ingroup by one generation. The time at which the lineage leading to the ingroup joined the lineage leading to the outgroup was then randomly determined according to the coalescent process. The Jukes-Cantor model (…) …
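A minimal sketch of evolving a sequence along one branch under the Jukes-Cantor model is shown below; it is not the authors' simulation code. It assumes that the branch length is expressed in expected substitutions per site, obtained here by multiplying a number of generations by the mutation rate per generation, and it uses the standard Jukes-Cantor probability that a site ends in a different base.

```python
import math
import random

BASES = "ACGT"

def jc69_evolve(seq, d, rng):
    """Evolve seq along a branch of d expected substitutions/site (Jukes-Cantor).

    A site ends in a different base with probability (3/4)(1 - exp(-4d/3));
    the new base is then uniform over the other three bases.
    """
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return "".join(rng.choice([b for b in BASES if b != base])
                   if rng.random() < p_diff else base
                   for base in seq)

rng = random.Random(1)
root = "".join(rng.choice(BASES) for _ in range(600))  # 600 bases, as simulated
d = 248.3 * 2.97e-5  # one year of generations times mutations/generation
child = jc69_evolve(root, d, rng)
print(len(child), sum(a != b for a, b in zip(root, child)))  # 600 and a small count
```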
After generating the sequences, effective population size was inferred from these data in five different situations. In three cases (referred to as cases 1–3), … was assumed. For case 1, the true tree topology and coalescent times were both treated as known. For case 2, the true tree topology was treated as known but the coalescent times were estimated by maximum likelihood. For case 3, the tree topology was estimated with a combination of neighbor-joining and local rearrangements, and the coalescent times were then estimated by maximum likelihood. In two situations (referred to as cases 4 and 5), … was assumed. As with case 2, case 4 involved treating the true tree topology as known but estimating coalescent times. As with case 3, case 5 involved reconstructing the tree topology via a combination of neighbor-joining and local rearrangements and then estimating the coalescent times on this topology by maximum likelihood.
For each entry in Table 2, the estimated mean and standard error of N̂e are based on results from 100 simulated data sets. As expected from the second term of Equation 12, case 1 exhibits lower standard errors than do the other cases. Another general and expected trend is that the standard error for estimating Ne decreases as the number of sequences increases; this decrease can be explained by the first term in Equation 12.
As the number of sequences increases, the difference between the standard errors for the cases where the true tree topology was and was not treated as known grows. This indicates that topological uncertainty is more important for data sets with large numbers of sequences. However, the impact of topological uncertainty on the standard errors of N̂e seems to be relatively small compared with the impact of uncertainty in coalescent times.
Results from cases 3 and 5 indicate that the approach introduced here to estimate Ne has an upward bias. The first step of our pseudomaximum-likelihood approach is to estimate t by maximizing P(X|t). These estimates of t will be asymptotically unbiased and will asymptotically approach a multivariate normal distribution. However, when the sequences are not long enough, the sampling distribution of the estimates of t may differ greatly from a multivariate normal. When there are many sequences, coalescent time intervals are relatively short near the tips, and the estimated internal node times are constrained by the times of the tips. Thus, the deviation between the sampling distribution of the estimates of t and a multivariate normal distribution may become particularly important when there are many sequences. As a result, the sampling distribution of the estimates of t is truncated, and the expected values of t̂ will be larger than their true values. This could explain the upward bias that we observe for estimating Ne even when the true tree topology is used. The effect of the truncation of coalescent times was also noted by …
The estimation properties of the pseudomaximum-likelihood technique are also affected by the fact that one step reconstructs a bifurcating topology, and this reconstruction step is not sufficiently influenced by the times at which sequences are sampled. As an illustration, consider a data set in which sequences A and B were sampled at the same time but sequence C was sampled k generations earlier. It is possible, especially if the mutation rate is small, that these three sequences are identical because no substitutions occurred after their common ancestor. In this case, our topology reconstruction procedure would consider each of the three rooted topologies shown in Fig 3 to be equally likely, and one of these three topologies would be selected arbitrarily. If the topology of Fig 3A were selected, the resulting estimate of Ne for these data would be 0. In contrast, selection of the topology in Fig 3B or Fig 3C would yield an estimate for Ne of k. The need to select a bifurcating topology arbitrarily because branch-length estimates are zero arises more often when the number of sequences is large, and the impact of these arbitrary selections is likely to be greater when the number of sequence sampling times is large. The effect of phylogenetic reconstruction on estimating population parameters has also been investigated by (…) … when the number of sequences is large and when using UPGMA trees without correcting for multiple hits.
The number of generations represented by each branch of the genealogy controls the estimate of effective population size. Our approach is to estimate these numbers of generations from sequence data. Because sequence data can be used to directly estimate the expected number of sequence changes on each branch of the genealogy, the number of generations represented by each branch can be straightforwardly estimated from the number of sequence changes on that branch and from a known rate µ of sequence change per generation. When the chronological time per generation g is assumed known and µ is assumed unknown, the rate of sequence change per chronological time unit must also be estimated to permit estimation of the number of generations represented by each branch. Because the rate of sequence change per chronological time unit is subject to estimation uncertainty, the standard errors for estimating Ne are smaller for cases 4 and 5 than for cases 2 and 3.
| DISCUSSION |
|---|
Recently, a Bayesian approach that incorporates the uncertainty in the genealogical structure has been developed (…) …
Although there are multiple Bayesian and frequentist options for model checking, a potential advantage of our approach over the full Bayesian strategy is ease of model checking. The first step of our procedure is to reconstruct a genealogy, and the structure of the reconstructed genealogy can then be inspected immediately. Inspection of the genealogy permits both formal and informal checks of whether the assumed model of population history is plausible. For example, the structure of the reconstructed genealogy can give an immediate indication as to whether a model of population size increase over time is justified. Because we believe that models are apt to be the weakest point of evolutionary analysis, whereas the method of estimation based on the model is an important but secondary concern, ease of model checking should not be dismissed when evaluating a procedure.
In our approach, the estimated times of internal nodes are nuisance parameters rather than being of major interest. Our pseudomaximum-likelihood method does not account for uncertainty in these nuisance parameters. To account for this uncertainty, an empirical Bayes approach could be adopted. In empirical Bayes approaches, the marginal-likelihood function is obtained by integrating the conditional distribution over the space of nuisance parameters (…) …
In our case, the marginal-likelihood function of Ne given the sequence data X could be obtained by integrating over the times t:

$$P(X \mid N_e) = \int p(X \mid \mathbf{t}, r)\, p(\mathbf{t} \mid N_e)\, d\mathbf{t}.$$
This integration is not analytically simple, but a numerical approximation is available (…) …

[Equation 13 missing in source]
The marginal likelihood can be calculated with Equation 13 for each Ne and the maximum-likelihood estimate of Ne can be found by maximizing P(X|Ne).
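One simple way to approximate such an integral, not necessarily the approximation behind Equation 13, is plain Monte Carlo: draw genealogies from p(t|Ne) and average p(X|t). The sketch below assumes a toy data model in which X is the total number of mutations on the genealogy, Poisson distributed with mean µL × (total branch length), where µL is a hypothetical mutation rate per generation for the whole sequence; a real analysis would compute p(X|t, r) with the pruning algorithm instead. Reusing the same random seed for every Ne gives common random numbers, which should keep the estimated likelihood curve smooth over the grid.

```python
import math
import numpy as np

def marginal_likelihood_mc(x, ne, n, mu_locus, n_draws=5000, seed=0):
    """Monte Carlo estimate of P(X = x | Ne) for a toy Poisson data model."""
    rng = np.random.default_rng(seed)   # same seed -> common random numbers
    i = np.arange(2, n + 1)
    scale = 2.0 * ne / (i * (i - 1))    # mean of t_i in generations
    t = rng.exponential(scale, size=(n_draws, n - 1))
    lam = mu_locus * (t @ i)            # expected mutations on total tree length
    log_p = x * np.log(lam) - lam - math.lgamma(x + 1)
    return float(np.mean(np.exp(log_p)))  # average likelihoods, not log-likelihoods

def ne_marginal_mle(x, n, mu_locus, grid):
    """Grid maximization of the Monte Carlo marginal likelihood."""
    values = [marginal_likelihood_mc(x, ne, n, mu_locus) for ne in grid]
    return float(grid[int(np.argmax(values))])

grid = np.linspace(200.0, 40000.0, 100)
print(ne_marginal_mle(5, 10, 1e-3, grid), ne_marginal_mle(50, 10, 1e-3, grid))
```

As expected, the estimate of Ne increases with the observed number of mutations.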
To facilitate data analysis, we combine the constant-rate assumption of the molecular clock hypothesis with Kingman's n-coalescent (…) …
Kingman's n-coalescent also requires a constant population size. Because clinical measurements such as viral load counts do not change much during the asymptomatic period (reviewed in …) …
Regarding the molecular clock, we focus here on analysis of sequences that were isolated during the approximately linear phase of sequence divergence that is characteristic of the early asymptomatic period in HIV-1 infections. This linear pattern is what would be seen with neutral evolution and with a molecular clock. Because the linear pattern does not extend into later portions of the asymptomatic period, we did not analyze sequences isolated during the later portions. Therefore, instead of using a highly realistic and overly complicated model to analyze the entire data set, we opted here to investigate only the portions of the data that seemed relatively compatible with the simple model of a molecular clock.
Searching for associations between effective population size and measurements that reflect the physiological condition of a patient may also be fruitful. As a potential marker of disease progression, we examined the time points at which the CD4+ T cell count of each patient decreased to 200/µl (see …) …
In previous work (…) … differences in Var[N̂e(t̂)] among serial sampling designs that share a common number of sampled sequences will be attributable to differences in Var(t̂) (see Equation 12). Designs with smaller Var(t̂) are expected to yield smaller Var[N̂e(t̂)].
In this study, we assumed either that viral mutation rate or viral generation length is constant among patients. In reality, both mutation rate and generation time probably vary to some extent among patients. To investigate this simultaneous variation, more data and especially a more sophisticated model for viral evolution may be warranted.
The negative correlation between evolutionary rate and effective population size (Fig 2) is noteworthy. We cannot formally exclude the possibility that this negative correlation is an artifact of our pseudomaximum-likelihood procedure, because possible variation among patients in both viral generation length and evolutionary rate makes such an exclusion difficult. However, because differences in effective population size and evolutionary rate estimates among patients exceed the uncertainty within patients, the possibility that this negative correlation is simply an artifact seems unlikely.
In population genetics, this negative correlation is predicted by both the slightly deleterious mutation model (…) and … with a mean that can be either negative or positive. It was shown by simulation that the substitution rate (note that "substitution" here does not refer solely to variants that have been fixed in the population) is negatively correlated with Ne (…).
It is not clear whether the above classical views of population genetics explain our finding of a negative correlation. There are potentially many population genetic scenarios that could result in this correlation. Also, the correlation could be attributable to the variation of immune response among patients. If a patient's immune system is strong, viral sequences could experience strong positive selection and this could lead to an increased evolutionary rate and a decreased effective population size. We believe that further characterization of this apparent correlation and its possible sources would facilitate our understanding of the mechanism of viral adaptation within hosts.
| ACKNOWLEDGMENTS |
|---|
We thank two anonymous reviewers for their suggestions. T.-K.S. and H.K. were supported by the Japan Society for the Promotion of Science (JSPS) United States-Japan research collaboration program 12554037; T.-K.S. by the Japanese government scholarship program for foreign students; M.H. and H.K. by grant BSAR-497 from the Ministry of Education, Culture, Sports, Science, and Technology (MECSST); J.L.T. and H.K. by grant 13308013 of MECSST; and J.L.T. by National Science Foundation grants DBI-0077503 and INT-990934.
Manuscript received August 13, 2001; Accepted for publication January 21, 2002.
| APPENDIX |
|---|
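Using the following equation,

$$\left.\frac{\partial l(N_e, \hat{\mathbf{t}})}{\partial N_e}\right|_{N_e = \hat{N}_e(\hat{\mathbf{t}})} = 0,$$

we can estimate N̂e(t̂). With a Taylor expansion of (∂/∂Ne) l(N̂e(t), t) around Ne, we get

$$\frac{\partial l(N_e, \mathbf{t})}{\partial N_e} + \frac{\partial^2 l(N_e, \mathbf{t})}{\partial N_e^2}\bigl[\hat{N}_e(\mathbf{t}) - N_e\bigr] \approx 0, \tag{A1}$$

because

$$\frac{\partial l(\hat{N}_e(\mathbf{t}), \mathbf{t})}{\partial N_e} = 0.$$

Equation A1 leads to

$$\hat{N}_e(\mathbf{t}) - N_e \approx -\left[\frac{\partial^2 l(N_e, \mathbf{t})}{\partial N_e^2}\right]^{-1}\frac{\partial l(N_e, \mathbf{t})}{\partial N_e}$$

and

[equation missing in source]

A Taylor expansion of N̂e(t̂) around t yields

$$\hat{N}_e(\hat{\mathbf{t}}) \approx \hat{N}_e(\mathbf{t}) + \left(\frac{\partial \hat{N}_e(\mathbf{t})}{\partial \mathbf{t}}\right)^{\!\top}(\hat{\mathbf{t}} - \mathbf{t}).$$

Therefore,

$$\operatorname{Var}\bigl[\hat{N}_e(\hat{\mathbf{t}})\bigr] \approx \operatorname{Var}\bigl[\hat{N}_e(\mathbf{t})\bigr] + \left(\frac{\partial \hat{N}_e(\mathbf{t})}{\partial \mathbf{t}}\right)^{\!\top}\operatorname{Var}(\hat{\mathbf{t}})\left(\frac{\partial \hat{N}_e(\mathbf{t})}{\partial \mathbf{t}}\right). \tag{A2}$$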
| LITERATURE CITED |
|---|
ADACHI, J., and M. HASEGAWA, 1996 Programs for molecular phylogenetics based on maximum likelihood, Molphy Version 2.3.
CONGDON, P., 2001 Bayesian Statistical Modelling, pp. 472474. John Wiley & Sons, New York.
DRUMMOND, A. and A. G. RODRIGO, 2000 Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA. Mol. Biol. Evol. 17:1807-1815
DRUMMOND, A., G. K. HICHOLLS, A. G. RODRIGO, and W. SOLOMON, 2002 Estimating mutation rate, population history, substitution model and genealogy simultaneously from temporally spaced sequence data. Genetics in press.
FELSENSTEIN, J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376[Medline].
FELSENSTEIN, J., 1988 Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. 22:521-565[Medline].
FELSENSTEIN, J., 1992 Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates. Genet. Res. 59:139-147[Medline].
FELSENSTEIN, J., M. K. KUHNER, J. YAMATO and P. BEERLI, 1999 Likelihoods on coalescents: a Monte Carlo sampling approach to inferring parameters from population samples of molecular data, pp. 163185 in Statistics in Molecular Biology and Genetics (IMS Lecture Series, Vol. 33), edited by F. SEILLIER-MOISEIWITSCH. Institute of Mathematical Statistics and American Mathematical Society, Hayward, CA.
FU, Y.-X., 1994 A phylogenetic estimator of effective population size or mutation rate. Genetics 136:685-692[Abstract].
FU, Y.-X., 2001 Estimating mutation rate and generation time from longitudinal samples of DNA sequences. Mol. Biol. Evol. 18:620-626
GOLDING, G. B., 1997 The effect of purifying selection on genealogies, pp. 271285 in Progress in Population Genetics and Human Evolution (IMA Volumes in Mathematics and Its Applications, Vol. 87), edited by P. DONNELLY and S. TAVARÉ. Springer-Verlag, New York.
GONG, G. and F. J. SAMANIEGO, 1981 Pseudo maximum likelihood estimation: theory and applications. Ann. Stat. 9:861-869.
HASEGAWA, M., H. KISHINO, and T. YANO, 1985 Data of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174[Medline].
HUDSON, R. R., 1991 Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.
JUKES, T. H., and C. R. CANTOR, 1969 Evolution of protein molecules, pp. 21-132 in Mammalian Protein Metabolism, edited by H. N. MUNRO. Academic Press, New York.
KINGMAN, J. F. C., 1982a The coalescent. Stoch. Proc. Appl. 13:235-248.
KINGMAN, J. F. C., 1982b On the genealogy of large populations. J. Appl. Probab. 19A:27-43.
KUHNER, M. K., J. YAMATO, and J. FELSENSTEIN, 1998 Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:429-434.
MANSKY, L. M., 1996 Forward mutation rate of human immunodeficiency virus type 1 in a T lymphoid cell line. AIDS Res. Hum. Retroviruses 12:307-314.
NEUHAUSER, C., and S. M. KRONE, 1997 The genealogy of samples in models with selection. Genetics 145:519-534.
NIELSEN, R., and Z. YANG, 1998 Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.
NOWAK, M. A., and R. M. MAY, 1992 Coexistence and competition in HIV infections. J. Theor. Biol. 159:329-342.
NOWAK, M. A., R. M. MAY, and R. M. ANDERSON, 1990 The evolutionary dynamics of HIV-1 quasi species and the development of immunodeficiency disease. AIDS 4:1095-1103.
O'HAGAN, A., 1996 Kendall's Advanced Theory of Statistics, Vol. 2B: Bayesian Inference, pp. 131-132. John Wiley & Sons, New York.
OHTA, T., 1987 Very slightly deleterious mutations and the molecular clock. J. Mol. Evol. 26:1-6.
OVERBAUGH, J., and C. R. M. BANGHAM, 2001 Selective forces and constraints on retroviral sequence variation. Science 292:1106-1109.
PERELSON, A. S., A. U. NEUMANN, M. MARKOWITZ, J. M. LEONARD, and D. D. HO, 1996 HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271:1582-1586.
PRZEWORSKI, M., B. CHARLESWORTH, and J. D. WALL, 1999 Genealogies and weak purifying selection. Mol. Biol. Evol. 16:246-252.
RAMBAUT, A., 2000 Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16:395-399.
RODRIGO, A. G., and J. FELSENSTEIN, 1999 Coalescent approaches to HIV population genetics, pp. 233-272 in The Evolution of HIV, edited by K. A. CRANDALL. Johns Hopkins University Press, Baltimore.
RODRIGO, A. G., E. G. SHAPER, E. L. DELWART, A. K. IVERSEN, and M. V. GALLO et al., 1999 Coalescent estimates of HIV-1 generation time in vivo. Proc. Natl. Acad. Sci. USA 96:2187-2191.
SAITOU, N., and M. NEI, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SEO, T.-K., J. L. THORNE, M. HASEGAWA, and H. KISHINO, 2002 A viral sampling design for testing the molecular clock and for estimating evolutionary rates and divergence times. Bioinformatics 18:115-123.
SHANKARAPPA, R., J. B. MARGOLICK, S. J. GANGE, A. G. RODRIGO, and D. UPCHURCH et al., 1999 Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 73:10489-10502.
TACHIDA, H., 1991 A study on a nearly neutral mutation model in finite populations. Genetics 128:183-192.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON, 1994 Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
VISCIDI, R. P., 1999 HIV evolution and disease progression via longitudinal studies, pp. 346-389 in The Evolution of HIV, edited by K. A. CRANDALL. Johns Hopkins University Press, Baltimore.