Originally published as Genetics Published Articles Ahead of Print on March 21, 2005.
Genetics, Vol. 170, 419-431, May 2005, Copyright © 2005
doi:10.1534/genetics.103.025692
Stochastic Models for Horizontal Gene Transfer
Taking a Random Walk Through Tree Space
Marc A. Suchard1
Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, California 90095-1766
1 Address for correspondence: Department of Biomathematics, David Geffen School of Medicine, UCLA, 650 Charles Young Dr., Box 951766, Los Angeles, CA 90095-1766.
E-mail: msuchard{at}ucla.edu
ABSTRACT
Horizontal gene transfer (HGT) plays a critical role in evolution across all domains of life with important biological and medical implications. I propose a simple class of stochastic models to examine HGT using multiple orthologous gene alignments. The models function in a hierarchical phylogenetic framework. The top level of the hierarchy is based on a random walk process in "tree space" that allows for the development of a joint probabilistic distribution over multiple gene trees and an unknown, but estimable species tree. I consider two general forms of random walks. The first form is derived from the subtree prune and regraft (SPR) operator that mirrors the observed effects that HGT has on inferred trees. The second form is based on walks over complete graphs and offers numerically tractable solutions for an increasing number of taxa. The bottom level of the hierarchy utilizes standard phylogenetic models to reconstruct gene trees given multiple gene alignments conditional on the random walk process. I develop a well-mixing Markov chain Monte Carlo algorithm to fit the models in a Bayesian framework. I demonstrate the flexibility of these stochastic models to test competing ideas about HGT by examining the complexity hypothesis. Using 144 orthologous gene alignments from six prokaryotes previously collected and analyzed, Bayesian model selection finds support for (1) the SPR model over the alternative form, (2) the 16S rRNA reconstruction as the most likely species tree, and (3) increased HGT of operational genes compared to informational genes.
TRADITIONAL views of molecular evolution hold that genetic material mutates slowly over time as it is passed in a vertical fashion from parent to progeny. Molecular phylogenetics then aims to reconstruct this history of inheritance of genetic sequence data from contemporary organisms into a tree-like structure. However, belief in a single tree, mandated by vertical transmission, for all genetic material is changing. Evolutionary biologists increasingly recognize the horizontal transmission of genetic material between distantly related organisms as an important mechanism of evolution (SYVANEN 1994; LAWRENCE 1999; JAIN et al. 2002).
The process of horizontal (or lateral) gene transfer (HGT) plays a critical role across all domains of life and in particular among prokaryotes (JAIN et al. 1999; KOONIN et al. 2001). For example, many prokaryotes are agile at quickly adapting to new environments. Often, this ability stems from the acquisition of new genes through HGT rather than through random mutation (LAWRENCE 1999). At least three mechanisms promote HGT in prokaryotes (JAIN et al. 2002). These include: (1) transformation in which free DNA sequences are absorbed from the environment, (2) conjugation between two different prokaryotic species, and (3) transduction of genetic material through viruses. Finally, HGT also has medical importance (BROWN 2003). In the field of infectious diseases, HGT among bacterial pathogens of antibiotic resistance genes has greatly contributed to the emergence of multidrug-resistant bacteria in clinical settings (LEVERSTEIN-VAN HALL et al. 2002). In the field of oncology, HGT may also affect tumor progression; BERGSMEDH et al. (2001) show that eukaryotic cells can transfer active oncogenes.
Three general methods have been employed to examine HGT. The first focuses on single genomes and identifies genes suspected to have been imported through HGT by examining variation in nucleotide base composition and codon usage patterns (LAWRENCE and OCHMAN, 1997). The latter two methods are comparative studies across species. One uses similarity approaches based on gene content to identify HGT (RAGAN 2001) and to propose average genome or species-level trees (SNEL et al. 1999), while the alternative method endorses phylogenetic reconstruction using orthologous genes (JAIN et al. 1999). Base composition and codon bias studies may perform poorly when compared to phylogenetic methods (KOSKI et al. 2001). Further, phylogenetic methods offer at least one advantage over similarity-based approaches. The reconstructed phylogenies have direct biological interpretability as descriptions of the underlying evolutionary histories of the different genes (DOOLITTLE 1999). If a reconstructed gene tree differs from the assumed phylogeny of the species being studied, then HGT is offered as a possible explanation (SYVANEN 1994). One intrinsic difficulty is that the true species tree is often itself unknown. Therefore, it is necessary to either fix the species tree to equal the inferred gene tree for a specially chosen gene, e.g., the 16S rRNA tree (WOESE 2000), or simultaneously estimate the species tree and gene trees given a biologically plausible model relating them. As a first step, several research groups have attacked the inverse problem of reconstructing a species tree given gene trees subject to HGT. Most notable are the parsimony-based reconciled tree work by Page and colleagues (e.g., PAGE 2000) and the algorithmic work of MIRKIN et al. (2003).
I propose a simple class of stochastic models for HGT that enable the simultaneous estimation of the underlying species tree relating a group of organisms and the gene trees subject to HGT for a set of orthologous gene alignments. These HGT models function in a hierarchical manner (SUCHARD et al. 2003a) in which standard Bayesian phylogenetic approaches (e.g., SINSHEIMER et al. 1996; YANG and RANNALA 1997; MAU et al. 1999; LI et al. 2000; HUELSENBECK et al. 2001) are used to reconstruct each gene tree from its corresponding gene alignment. Simultaneously with the reconstructions, the HGT models impose a second probabilistic distribution over the gene trees (MADDISON 1997). This hierarchical distribution describes the gene trees' likelihoods given an unknown species tree and an unknown number of HGT events leading from that species tree to each gene tree. The model is fit in a Bayesian framework that naturally handles uncertainty in discrete parameters such as all the trees and the number of HGT events and compares various models using Bayes factors (SUCHARD et al. 2001). Stochastic models fit in statistical frameworks offer several advantages over parsimony approaches. First, parsimony may underestimate the number of HGT events linking the species tree to the gene trees. This consequence is similarly seen in parsimonious reconstructions of the trees themselves, in which the number of nucleotide substitutions is underestimated. Second, it is easier in a statistical framework to include measures of uncertainty, and these levels may be high in the inferred gene trees given the sparse data from which they are reconstructed.
One additional advantage of building stochastic models for HGT is the ability to compare competing models and to incorporate possible differences in the stochastic processes across genes, while assessing the significance of these differences in a formal statistical framework. As one example of possible differences across genes, JAIN et al. (1999) propose the complexity hypothesis. Under this hypothesis, genes are divided into one of two classes, informational or operational genes. Between classes, the rates of HGT differ. It is suspected that rates are higher for operational genes than for informational genes. This hypothesis and others can be tested by integrating over all possible species trees and gene trees weighed by their posterior probabilities. This Bayesian model-averaging approach reduces the possible bias inherent in selecting a specific species tree, minimizes underestimation of the uncertainty associated with the hypotheses (TAYLOR et al. 1996), and eliminates the need for ad hoc analyses. Formal comparison of different models for HGT will help gather further insight into the underlying biological processes.
MODEL
Within-gene reconstruction model:
I begin with a hierarchical framework for phylogenetic reconstruction using molecular sequence data Y (SUCHARD et al. 2003a). Data Y = (Y1, ... , YK) consist of K naturally disjoint partitions. Partition data Yk for k = 1, ... , K represent the aligned DNA sequences of length Lk from one specific gene per partition, sequenced from the same N taxa across all partitions. A hierarchical phylogenetic model enables the pooling of information across gene partitions to improve estimate precision in individual partitions, while permitting estimation and testing of tendencies in across-partition quantities. For HGT, such across-partition quantities include: (1) an overall species tree, (2) appropriate stochastic models from which to construct a probability distribution over individual gene trees given the species tree, and (3) the stochastic model parameters that may vary between different classes of genes.
To utilize standard Bayesian models for phylogenetic reconstruction (e.g., SINSHEIMER et al. 1996; YANG and RANNALA 1997; MAU et al. 1999; LI et al. 2000) within a gene partition, data Yk further divide into ordered homologous sites Ykl for l = 1, ... , Lk. Site data Ykl = (Ykl1, ... , YklN)^t contain one nucleotide from each taxon, such that $Y_{kln} \in \{A, G, C, T\}$ or their ambiguous wildcards for n = 1, ... , N. I assume that sites within a partition are independent and identically distributed, and the likelihood of observing Ykl is given by a multinomial distribution over the $4^N$ possible outcomes with ambiguous nucleotides being integrated over their possible realizations. The multinomial outcome probabilities become functions of an unknown tree $\tau_k$ that describes the relatedness of the N taxa, branch lengths tk = (tk1, ... , tkB), and a model to describe nucleotide mutation along these branches, all within partition k.
I elect for a reversible, continuous-time Markov chain (CTMC) model for nucleotide substitution (FELSENSTEIN 1981) popularized by TAMURA and NEI (1993) (TN93). The TN93 model is further parameterized by two transition:transversion rate ratios, $\alpha_k$ between purines A and G and $\gamma_k$ between pyrimidines C and T, and the stationary distribution of the underlying Markov chain $\pi_k = (\pi_{kA}, \pi_{kG}, \pi_{kC}, \pi_{kT})$. The final scale parameter in the TN93 model is fixed such that branch lengths measure the expected number of nucleotide substitutions between the nodes in $\tau_k$ that the branch connects. Because I assume a reversible model for nucleotide substitution and make no clock-like restrictions on branch lengths, the root of each tree is unidentifiable (FELSENSTEIN 1981). As a consequence, the descriptions of all trees to follow are unrooted with N − 2 internal nodes and B = 2N − 3 branches.
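The TN93 parameterization described above can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the function names, the nucleotide ordering (A, G, C, T), and the reading of the two ratios as transition rates relative to a transversion rate of one are assumptions made here for concreteness.

```python
import numpy as np

def tn93_rate_matrix(alpha, gamma, pi):
    """Build a TN93-style rate matrix scaled to one expected substitution per unit time.

    alpha: assumed purine (A<->G) transition rate relative to transversions
    gamma: assumed pyrimidine (C<->T) transition rate relative to transversions
    pi:    stationary frequencies in the assumed order A, G, C, T
    """
    pi = np.asarray(pi, dtype=float)
    # Relative exchange rates: 1 for transversions, alpha or gamma for transitions.
    r = np.ones((4, 4))
    r[0, 1] = r[1, 0] = alpha   # A <-> G
    r[2, 3] = r[3, 2] = gamma   # C <-> T
    q = r * pi[np.newaxis, :]   # off-diagonal rate q_ij = r_ij * pi_j
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))   # rows of a rate matrix sum to zero
    # Fix the scale so that branch lengths count expected substitutions per site.
    expected_rate = -np.dot(pi, np.diag(q))
    return q / expected_rate

def n_branches(n_taxa):
    """Number of branches B = 2N - 3 in an unrooted bifurcating tree."""
    return 2 * n_taxa - 3
```

Because the exchange rates are symmetric, the resulting chain satisfies detailed balance with respect to the supplied frequencies, which is the reversibility property the text relies on.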
Across-gene hierarchical model:
Following the hierarchical framework of SUCHARD et al. (2003a), I take branch lengths tk as exponentially distributed with unknown expected divergence µk within partition k and model

[Equation 1]

where $\Pi = (\Pi_A, \Pi_G, \Pi_C, \Pi_T)$ and the other mean parameters are unknown across-partition-level expectations, variance-covariance matrix V has diagonal form, and $\sigma^2_\alpha$, $\sigma^2_\gamma$, $\sigma^2_\mu$, and $N_\pi$ are unknown across-partition-level measures of precision. Leaving V and the other across-partition-level parameters as unknowns specified only by hyperprior distributions, and estimating them simultaneously with the within-partition-level continuous parameters (the rate ratios, the stationary distributions, and µk for all k), enables the borrowing of strength of information from one partition by another, producing more precise within-partition-level estimates. I assume conjugate (when possible) and flat or noninformative hyperpriors on these across-partition-level parameters, as discussed in SUCHARD et al. (2003a). While the development of hierarchical priors over the continuous within-partition-level parameters has been straightforward, constructing a hierarchical prior over gene trees $\tau_k$ that incorporates the stochastic nature of HGT is more involved. This is illustrated in the next section.
Horizontal gene transfer models:
To build a stochastic model for HGT, I first present a formal description of the set of all possible N-taxon trees, commonly referred to as "tree space" (BILLERA et al. 2001), as a mathematical graph and then discuss several possible random walks (D. ALDOUS and J. FILL, unpublished results) on this graph that mirror the observed effects of HGT.
There exist $M = (2N - 5)!/[2^{N-3}(N - 3)!]$ possible trees relating N extant taxa (FELSENSTEIN 1981). On the basis of these M trees, I construct a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with vertex set $\mathcal{V}$ and edge set $\mathcal{E}$. Each tree represents a different vertex, or node, in the graph, such that the size of the vertex set $|\mathcal{V}| = M$. An edge $uv \in \mathcal{E}$ of a graph describes a direct connection between two of the graph's vertices $u, v \in \mathcal{V}$. The number of edges emanating from a single vertex v defines its degree d(v). Two vertices that are joined together by a single edge are called adjacent. Restricting attention to simple graphs, in which pairs of vertices may be connected to each other only by a single edge and no vertex is connected to itself by a looping edge, a single vertex v from graph $\mathcal{G}$ may be adjacent to as few as zero and as many as M − 1 other vertices. The set of all vertices adjacent to v is its neighborhood $\mathcal{N}(v)$, and the size of this neighborhood is $|\mathcal{N}(v)| = d(v)$. The specification of a neighborhood for each vertex completes the description of $\mathcal{G}$, and many choices are available.
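To give a sense of how quickly the vertex set grows, the tree-count formula above can be evaluated directly. The function name below is hypothetical and the sketch assumes unrooted, bifurcating trees with at least three taxa.

```python
from math import factorial

def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees, M = (2N - 5)! / [2^(N - 3) (N - 3)!]."""
    if n_taxa < 3:
        raise ValueError("the formula applies to N >= 3 taxa")
    return factorial(2 * n_taxa - 5) // (2 ** (n_taxa - 3) * factorial(n_taxa - 3))

# Size of tree space for a few taxon counts.
for n in (4, 5, 6, 7, 8):
    print(n, n_unrooted_trees(n))
```

For the six-taxon data set analyzed later in the article, this gives M = 105 candidate trees.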
Subtree-prune-regraft-based model:
One approach to defining neighborhoods for each possible tree stems from subtree transfer operations (ALLEN and STEEL 2001). Subtree transfer operators act on trees producing local rearrangements. Applying a subtree transfer operator to one tree $\tau$ results in the creation of one of several possible new topologies that differs from $\tau$ by an extent dependent on the operator. The collection of all trees one operation away from $\tau = v$ becomes its neighborhood $\mathcal{N}(v)$ under that operator. Nearest-neighbor interchange (ROBINSON 1971), tree bisection and reconnection (SWOFFORD et al. 1996), and subtree prune and regraft (SPR) (HEIN 1990, 1993) are three examples. In light of the goals of this article, SPR offers an advantage over the other two operators because of its potential biological interpretation. Applying the SPR operator to $\tau = v$ with its resultant drawn from $\mathcal{N}_{\text{SPR}}(v)$ mirrors the differences observed between a species tree and an individual gene tree affected by one HGT (or recombination) event (HEIN 1990, 1993; JAIN et al. 1999; ALLEN and STEEL 2001).
Figure 1 illustrates one realization of the SPR operator applied to a six-taxon tree. The operator works in two steps. The first step selects and cuts any branch in the initial tree, $\tau_{\text{initial}}$. Cutting the branch prunes away a subtree, $\tau_{\text{subtree}}$. This subtree then regrafts itself using the same cut branch to a new internal node obtained by subdividing a preexisting branch in what remains of the initial tree, $\tau_{\text{initial}} \setminus \tau_{\text{subtree}}$.
Several important properties of the graph $\mathcal{G}_{\text{SPR}}$ induced by the SPR operator have been previously studied. First, $\mathcal{G}_{\text{SPR}}$ is regular, implying that every vertex $v \in \mathcal{G}_{\text{SPR}}$ possesses the same degree $d(v) = 2(N - 3)(2N - 7)$ and, hence, neighborhood size (ALLEN and STEEL 2001). Also, $\mathcal{G}_{\text{SPR}}$ is connected, meaning that a sequence of consecutive edges (a path) exists, connecting every pair of vertices in $\mathcal{G}_{\text{SPR}}$ (ROBINSON 1971; ALLEN and STEEL 2001).
One straightforward stochastic process on any simple graph $\mathcal{G}$ is an unweighted random walk. A random walk on $\mathcal{G}$ proceeds from vertex to vertex along existing edges of the graph, generating a discrete-time Markov chain (DTMC), where the states of the chain are the visited vertices. Because the walk is unweighted, the chain chooses its next vertex uniformly at random from all neighbors of its current vertex. For this DTMC, the one-event transition probability matrix A has entries

$$(\mathbf{A})_{uv} \propto \begin{cases} 1 & \text{if } uv \in \mathcal{E}, \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

rescaled to be a stochastic matrix [i.e., $\sum_v (\mathbf{A})_{uv} = 1$], so that each nonzero entry in row u equals $1/d(u)$.
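The construction of the one-event matrix from a neighborhood specification can be sketched as follows. The four-vertex ring used here is a hypothetical toy graph, not a piece of SPR tree space, and the function name is an assumption of this sketch.

```python
import numpy as np

def random_walk_matrix(neighbors):
    """One-event transition matrix of an unweighted random walk.

    neighbors: dict mapping each vertex index 0..M-1 to the list of adjacent
    vertex indices. Every vertex is assumed to have at least one neighbor,
    as holds for a connected graph with more than one vertex.
    """
    m = len(neighbors)
    a = np.zeros((m, m))
    for u, adjacent in neighbors.items():
        for v in adjacent:
            a[u, v] = 1.0                       # indicator that edge uv exists
    return a / a.sum(axis=1, keepdims=True)     # rescale each row to sum to one

# Hypothetical toy graph: a ring of four vertices, each with degree two.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
A = random_walk_matrix(ring)
```

On a regular graph such as the one induced by SPR, every row has the same number of nonzero entries, so the resulting matrix is symmetric.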
On the basis of K random walks on the graph $\mathcal{G}_{\text{SPR}}$ induced by the SPR operator, I construct a hierarchical prior over the joint distribution of all gene trees $\tau_k$. To accomplish this task, I assume:
- An unknown species tree $\tau$ exists.
- The vertex representing $\tau$ is the initial state of K Markov chains.
- The Markov chains are conditionally independent given $\tau$ and A.
- The vertex representing $\tau_k$ is the final state of the kth chain.
- Each chain is of unknown length $0 \le E_k < \infty$.
Figure 2 depicts one set of the possible paths of K = 4 Markov chains starting at species tree $\tau$ and ending at gene trees $\tau_k$ on a small portion of a representative graph. The lengths of paths Ek shown range from one to three. I illustrate no paths of length zero, but these realizations should be most likely. A parsimony-like analysis considering only the beginning and end points of the chains in Figure 2 would, for example, underestimate E4 as zero instead of three.
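The gap between the number of events actually taken and the number a parsimony-like analysis would report can be illustrated by simulation. The sketch below walks on a hypothetical six-vertex ring standing in for a tiny piece of tree space; the graph, function names, and seed are all assumptions made for illustration.

```python
import random
from collections import deque

def walk(neighbors, start, n_steps, rng):
    """Take n_steps of an unweighted random walk and return the final vertex."""
    v = start
    for _ in range(n_steps):
        v = rng.choice(neighbors[v])
    return v

def shortest_path_length(neighbors, source, target):
    """Breadth-first search distance, i.e., the parsimony-like event count."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("target is not reachable from source")

# Hypothetical toy graph: a ring of six vertices.
TOY = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

rng = random.Random(1)
for true_events in range(6):
    end = walk(TOY, 0, true_events, rng)
    print(true_events, shortest_path_length(TOY, 0, end))
```

The shortest-path count can never exceed the true number of steps, and it falls short whenever the walk backtracks, which is the direction of bias described in the text.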
Given the assumptions listed above, the probability of species tree $\tau$ giving rise to gene tree $\tau_k$ after Ek HGT events is

$$P(\tau_k = v \mid \tau = u, E_k) = (\mathbf{A}^{E_k})_{uv}. \tag{3}$$
To complete the hierarchical specification, I assign a prior distribution over $\tau$ by letting

$$P(\tau = u) = z_u, \quad u = 1, \ldots, M, \tag{4}$$

where $z = (z_1, \ldots, z_M)$ are prior probabilities that sum to one. One reasonable choice is z1 = ... = zM = 1/M; alternatively, one may choose z such that the prior odds of competing hypotheses regarding $\tau$ are one in a hypothesis-testing setting (SUCHARD et al. 2003a). A further choice is discussed later. I further assume a conditionally independent prior on all Ek,
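$$E_k \mid \lambda_k \sim \text{Poisson}(\lambda_k), \tag{5}$$

where $\lambda_k$ is the expected number of HGT events for gene k and is a deterministic function of across-gene-level parameters. This prior is conjugate to (3), allowing all Ek to be integrated out of the model, improving sampling efficiency (LIU 1994),

$$P(\tau_k = v \mid \tau = u, \lambda_k) = \sum_{E_k = 0}^{\infty} (\mathbf{A}^{E_k})_{uv} \, \frac{e^{-\lambda_k} \lambda_k^{E_k}}{E_k!}, \tag{6}$$

where $P(\tau_k = v \mid \tau = u, \lambda_k) = (\mathbf{P})_{uv}$, the multistep transition probability matrix,

$$\mathbf{P} = \exp\{-\lambda_k (\mathbf{I} - \mathbf{A})\}, \tag{7}$$

and the $\lambda_k$ are scaled as the expected number of HGT events per gene. Let $\lambda = (\lambda_1, \ldots, \lambda_K)$. Then, recalling the conditional independence assumption between Markov chains, the joint distribution over all gene trees $\tau_k$ becomes

$$P(\tau_1, \ldots, \tau_K \mid \tau, \lambda) = \prod_{k=1}^{K} P(\tau_k \mid \tau, \lambda_k). \tag{8}$$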
Calculating the probabilities in (8) requires numerical methods to determine the matrix exponential involving PSPR. These methods involve calculating the complete set of eigenvalues and eigenvectors of PSPR, requiring $O(M^3)$ operations. Such procedures quickly become computationally prohibitive as N, and hence M, increases. As a consequence, numerical approximations may be necessary to develop weighted graph extensions to $\mathcal{G}_{\text{SPR}}$ directly. The weights in these extended graphs would be functions of unknown parameters, and sampling these parameters would necessitate repetitive diagonalization.
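The equivalence between a Poisson-weighted sum over walk lengths and a matrix exponential obtained by diagonalization can be checked numerically on a small example. This is a sketch under stated assumptions: the four-vertex ring is a hypothetical stand-in for a regular graph, the function names are invented here, and the eigendecomposition route assumes a symmetric one-event matrix.

```python
import numpy as np
from math import exp

def multistep_by_series(a, lam, max_events=200):
    """Sum over E of Poisson(E; lam) * A^E, truncated at max_events terms."""
    m = a.shape[0]
    total = np.zeros((m, m))
    power = np.eye(m)        # A^0
    weight = exp(-lam)       # Poisson probability of zero events
    for e in range(max_events + 1):
        total += weight * power
        power = power @ a
        weight *= lam / (e + 1)   # advance to the Poisson probability of e + 1
    return total

def multistep_by_eigen(a, lam):
    """exp{-lam (I - A)} by eigendecomposition; assumes A is symmetric."""
    values, vectors = np.linalg.eigh(a)
    # Scale each eigenvector column by the exponential of its eigenvalue term.
    return (vectors * np.exp(-lam * (1.0 - values))) @ vectors.T

# Hypothetical regular toy graph: unweighted walk on a ring of four vertices.
RING = np.array([[0.0, 0.5, 0.0, 0.5],
                 [0.5, 0.0, 0.5, 0.0],
                 [0.0, 0.5, 0.0, 0.5],
                 [0.5, 0.0, 0.5, 0.0]])
```

The diagonalization step is the part whose cost scales cubically with the number of vertices, which is why it becomes the bottleneck as the number of taxa grows.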
Random walks with analytic solutions:
One way around this computational barrier is to use random walks on graphs for which analytic solutions are known for any size M. To help find such solutions, Equation 7 demonstrates the close connection between a DTMC with a Poisson-distributed number of events and a CTMC. In fact, any such DTMC can be expressed as a unique CTMC, called the "continuized" version (D. ALDOUS and J. FILL, unpublished results). Analytic solutions for several weighted and unweighted CTMC processes on a complete graph are commonly used in phylogenetics. In a complete graph, all vertices are adjacent to all others. The most notable examples are the CTMC models for nucleotide substitution. The simplest model by JUKES and CANTOR (1969) is unweighted. In the APPENDIX, I present the multistep transition probability matrix PGJC for a generalized Jukes-Cantor (GJC) model involving an arbitrary number of vertices M. Proposed by KIMURA (1980), the next most sophisticated model for a complete graph is weighted. This model presupposes that the vertices are divided into two disjoint sets, $\mathcal{V}_1 \cup \mathcal{V}_2 = \mathcal{V}$, and that transitions within and between $\mathcal{V}_1$ and $\mathcal{V}_2$ occur at varying rates. In terms of HGT, such a weighted random walk may prove useful to model varying rates of HGT between different groups of taxa. Letting $M_1 = |\mathcal{V}_1|$, $M_2 = |\mathcal{V}_2|$, and R equal the ratio of within- to between-transition rates, I present the multistep transition probability matrix PGK given M1, M2, and R for a generalized Kimura (GK) model in the APPENDIX.
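For the unweighted walk on a complete graph, a closed form follows from the matrix exponential because the one-event matrix has only two distinct eigenvalues. The sketch below is a derivation made here under the assumption that the GJC walk is exactly this unweighted complete-graph walk with a Poisson-distributed number of events; it is not copied from the article's APPENDIX, and the function names are hypothetical.

```python
import numpy as np
from math import exp

def gjc_probabilities(m, lam):
    """Closed-form entries of exp{-lam (I - A)} for the complete graph on m vertices.

    Returns (stay, move): the probability of ending at the starting vertex and
    the probability of ending at any one particular other vertex.
    """
    decay = exp(-lam * m / (m - 1))
    stay = 1.0 / m + (m - 1) / m * decay
    move = (1.0 - decay) / m
    return stay, move

def gjc_by_series(m, lam, max_events=100):
    """Brute-force check: Poisson-weighted sum of powers of the one-event matrix."""
    a = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    total, power, weight = np.zeros((m, m)), np.eye(m), exp(-lam)
    for e in range(max_events + 1):
        total += weight * power
        power = power @ a
        weight *= lam / (e + 1)
    return total
```

With m = 4 the expressions reduce to the familiar Jukes-Cantor transition probabilities, and no eigendecomposition is needed for any m, so the cost does not grow with the number of taxa.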
Modeling differences across gene classes:
I incorporate potential differences across genes in the expected number of HGT events $\lambda_k$ by employing a generalized linear model (GLM) approach (MCCULLAGH and NELDER 1983). GLMs link the mean response, in this case $\lambda_k$, to a set of linear predictors. First, I divide all K genes into one of C possible classes, where the definition of the classes depends on the specific research question at hand. To identify gene-class membership in the GLM, I construct a K x C design matrix D = (Dkc), where matrix elements Dk1 = 1 for all k, representing the baseline multiplier for the reference class, and

$$D_{kc} = \begin{cases} 1 & \text{if gene } k \text{ belongs to class } c, \\ 0 & \text{otherwise,} \end{cases} \quad c = 2, \ldots, C. \tag{9}$$

The expected numbers of HGT events then follow the log-linear model

$$\log \lambda_k = \sum_{c=1}^{C} D_{kc} \, \theta_c, \tag{10}$$

where $\theta = (\theta_1, \ldots, \theta_C)$ specify, on the log-scale, the expected number of HGT events for all classes. I complete the hierarchical prior specification by assuming

[Equation 11]

where the prior dispersion matrix equals diag(10, ... , 10). This provides a quite diffuse prior on $\theta$, with the median expected number of HGT events per gene ≈ 0.14 (GARCIA-VALLVE et al. 2000) for all classes.
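As an example of how this GLM construction functions, consider the C = 2 classes case. Then,

$$\lambda_k = \begin{cases} \exp(\theta_1) & \text{if gene } k \text{ is in class 1,} \\ \exp(\theta_1 + \theta_2) & \text{if gene } k \text{ is in class 2.} \end{cases} \tag{12}$$

When $\theta_2 = 0$, no difference across classes exists. Likewise, when $\theta_2 < 0$, the expected number of HGT events per gene is smaller in class 2 than in class 1, and when $\theta_2 > 0$, the expected number is larger.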
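The design-matrix construction can be made concrete with a short sketch. The class labels and the value chosen for the second linear predictor are hypothetical inputs picked for illustration, not estimates from the article, and the function names are assumptions.

```python
import numpy as np

def design_matrix(gene_classes, n_classes):
    """K x C design matrix: a baseline column of ones plus class indicators.

    gene_classes: class label in 1..C for each of the K genes; class 1 is the
    reference class.
    """
    d = np.zeros((len(gene_classes), n_classes))
    d[:, 0] = 1.0                          # D_k1 = 1 for every gene
    for row, label in enumerate(gene_classes):
        if label >= 2:
            d[row, label - 1] = 1.0        # indicator for a non-reference class
    return d

def expected_hgt_events(gene_classes, theta):
    """Expected HGT events per gene under the log-linear link."""
    theta = np.asarray(theta, dtype=float)
    d = design_matrix(gene_classes, len(theta))
    return np.exp(d @ theta)

# Two classes; the second predictor value here is purely illustrative.
lam = expected_hgt_events([1, 2, 2, 1], [np.log(0.14), 0.7])
```

Genes in the reference class receive exp(theta_1), while genes in class 2 receive that baseline multiplied by exp(theta_2), so a positive second predictor corresponds to more expected events in class 2.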
STATISTICAL FRAMEWORK
When models are nested, a relatively simple Bayes factor calculation is available via the Savage-Dickey ratio (VERDINELLI and WASSERMAN 1995) and involves generating a posterior sample from the larger model only (SUCHARD et al. 2003b). For example, to assess the significance of differences across gene classes in the expected number of HGT events, let M1 represent the unrestricted model proposed above. Nested within M1 exists M0, the equal-rates model, where $\theta_c = 0$ for c = 2, ... , C. Further, the GJC model is nested within the GK model, as both are equal when R = 1.
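A Savage-Dickey calculation compares the prior and posterior densities of the restricted parameter at the restriction point. The sketch below assumes both densities are summarized by normal approximations; the function names and any numeric inputs a reader supplies are hypothetical, and the article's own estimator may differ in detail.

```python
from math import exp, log10, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))

def savage_dickey_log10_bf(prior_mean, prior_sd, post_mean, post_sd, point=0.0):
    """log10 Bayes factor in favor of the unrestricted model over the restriction.

    The ratio is prior density over posterior density, both evaluated at the
    restriction point under the unrestricted model.
    """
    prior_density = normal_pdf(point, prior_mean, prior_sd)
    posterior_density = normal_pdf(point, post_mean, post_sd)
    return log10(prior_density / posterior_density)
```

A positive value means the data moved posterior mass away from the restriction point, which is evidence for the unrestricted model; only a posterior sample from the larger model is needed to obtain the inputs.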
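On the other hand, the GJC and SPR models are non-nested, but both possess zero free parameters in their respective P matrices. For two arbitrary models M0 and M1 in situations like this, it is possible to estimate the posterior probabilities p(M0|Y) and p(M1|Y) by constructing a mixture model over the joint space of M0 and M1. By applying Bayes' theorem, the Bayes factor in favor of M1 over M0 is

$$B_{10} = \frac{p(Y \mid M_1)}{p(Y \mid M_0)} = \frac{p(M_1 \mid Y)}{p(M_0 \mid Y)} \bigg/ \frac{p(M_1)}{p(M_0)}, \tag{13}$$

where the posterior sample from the mixture model provides estimates of p(M0|Y) and p(M1|Y) (CARLIN and CHIB 1995; SUCHARD et al. 2002).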
Models SPR and GK neither are nested nor contain the same number of free parameters. One might entertain constructing a reversible-jump Markov chain Monte Carlo (MCMC) sampler (GREEN 1995) over the joint space of these models to compute the Bayes factor in support of SPR over GK. However, a simpler algebraic solution exists given the two preceding Bayes factor calculations,
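$$B_{\text{SPR,GK}} = \frac{B_{\text{SPR,GJC}}}{B_{\text{GK,GJC}}}, \tag{14}$$

where $B_{X,Y}$ denotes the Bayes factor in favor of model X over model Y.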
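Both Bayes factor manipulations in this section are simple arithmetic on the log10 scale. The sketch below uses hypothetical function names and takes posterior model probabilities as given inputs; estimating those probabilities from the sampler is outside its scope.

```python
from math import log10

def log10_bf_from_posteriors(post_m1, post_m0, prior_m1=0.5, prior_m0=0.5):
    """log10 Bayes factor for M1 over M0: posterior odds divided by prior odds."""
    return log10(post_m1 / post_m0) - log10(prior_m1 / prior_m0)

def log10_bf_spr_vs_gk(log10_bf_spr_gjc, log10_bf_gk_gjc):
    """Chain two Bayes factors that share GJC as the reference model.

    On the log10 scale the ratio of Bayes factors becomes a difference.
    """
    return log10_bf_spr_gjc - log10_bf_gk_gjc
```

Routing both comparisons through the common GJC model avoids building a sampler that jumps between SPR and GK directly.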
To estimate all model parameters and Bayes factors, I employ MCMC. I further develop this MCMC algorithm and discuss its performance in the APPENDIX.
EXAMPLE
I parallel the above analysis by assuming that the number of the different gene classes C = 2. I let class c = 1 represent the informational genes and class c = 2 represent the operational genes. To further maintain consistency with JAIN et al. (1999), I exclude third codon position nucleotides from all alignments and assume that first and second codon position nucleotides are evolving independently under the same process for each gene.
Selection of stochastic model:
I begin by comparing the relative likelihoods of the three different stochastic models, SPR, GJC, and GK. For the GK model, I define my two disjoint sets of trees as (1) those that support a split between the four Eubacteria and the two Archaea, $\mathcal{V}_1$, and (2) those that do not, $\mathcal{V}_2$. These definitions offer a first approximation to modeling differing rates of HGT within life domains and across domains in this example. HGT events that start and end in set $\mathcal{V}_1$ are within-domain transfers, while events that start in $\mathcal{V}_1$ and end in $\mathcal{V}_2$, or vice versa, are across-domain transfers.
The log10 Bayes factor in favor of SPR over GJC and the log10 Bayes factor in favor of GK over GJC are
[Equation 15: log10 Bayes factor in favor of SPR over GJC]
[Equation 16: log10 Bayes factor in favor of GK over GJC]
Considering these Bayes factor estimates, the data strongly reject (KASS and RAFTERY 1995) the two complete graph models with analytic solutions in favor of the more biologically plausible process based on the SPR operator. However, the GJC and GK models should not be discounted completely; their computational complexity does not increase with increasing number of taxa N and they can offer some insight into the underlying biological processes. For example, the Bayes factor in favor of GK over GJC offers some indirect support for differing HGT rates within domains rather than across domains. One caveat should be kept in mind to keep from drawing too strong a conclusion from this finding: the unbalanced study design with only two Archaea precludes identifying HGT events within that domain. All further results in this article are based on the SPR model.
Estimating the species tree:
Figure 3 displays the currently accepted species tree relating the six prokaryotes studied here. The four Eubacteria and two Archaea form two distinct clades (FENG et al. 1997) and Aa is the earliest branching species of the Eubacteria studied (DECKERT et al. 1998). The branching order of the remaining three Eubacteria Ec, S6, and Bs is more ambiguous (GIOVANNONI et al. 1996). The three possible resolutions of this trifurcation are depicted on the right side of Figure 3. Much of the debate surrounding the trifurcation depends on data choice and reconstruction methodology. For example, the top resolution produces species tree $\tau_{\text{Ec-S6}}$ that places Ec and S6 as nearest neighbors. Protein synthesis elongation factor (EF) Tu gene reconstructions support this tree (LAKE and RIVERA 1996) and JAIN et al. (1999) fix $\tau_{\text{Ec-S6}}$ as their reference tree in their analysis. Reconstructions of 16S rRNA phylogeny support the middle resolution of species tree $\tau_{\text{Bs-S6}}$ (COLE et al. 2003) with Bs and S6 as nearest neighbors. The final resolution of species tree $\tau_{\text{Ec-Bs}}$ gains support from reconstructions of phenylalanyl-tRNA synthetase (TEICHMANN and MITCHISON 1999). However, even these three critical genes are subject to HGT (WOLF et al. 1999; ZAP et al. 1999; KE et al. 2000) and their reconstructed phylogenies may inaccurately represent the true species tree.
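On the basis of the SPR model for HGT, I infer $\tau_{\text{Bs-S6}}$ as the most likely species tree with >0.999 posterior probability. The two other resolutions, $\tau_{\text{Ec-Bs}}$ and $\tau_{\text{Ec-S6}}$, are the second and third most likely species trees, respectively. To estimate the Bayes factors in favor of $\tau_{\text{Bs-S6}}$ against $\tau_{\text{Ec-Bs}}$ and $\tau_{\text{Ec-S6}}$, I judiciously reweight my prior probabilities on trees z and calculate
[Equation 17: Bayes factor in favor of $\tau_{\text{Bs-S6}}$ against $\tau_{\text{Ec-Bs}}$]
[Equation 18: Bayes factor in favor of $\tau_{\text{Bs-S6}}$ against $\tau_{\text{Ec-S6}}$]
Although the posterior support for $\tau_{\text{Ec-Bs}}$ and $\tau_{\text{Ec-S6}}$ initially appears quite small, on a relative scale it is not; probabilities for the remaining 102 trees are >15 orders of magnitude smaller.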
Data sets as large as the K = 144 gene alignments from JAIN et al. (1999) are currently rare. Consequently, I examine via simulation the number of alignments necessary to identify the species tree under the SPR model. Under this simulation, I randomly sample without replacement a fixed number of gene alignments K and then estimate the posterior support for $\tau_{\text{Bs-S6}}$, assuming this is the true species tree. I repeat this simulation 20 times for each value of K. For K = 2, the expected posterior probability of $\tau_{\text{Bs-S6}}$ is 0.14. This estimate is approaching its prior value, signifying appropriate MCMC sampling with limited data. Approximately K = 50 gene alignments are required to achieve an expected posterior probability ≥ 0.80, and K = 70 are required for ≥ 0.90.
Hierarchical estimates of evolutionary pressures:
Table 2 presents the posterior estimates of the across-gene-level parameters used to pool information about $(\alpha_k, \gamma_k, \mu_k, \pi_k)$. The table also lists posterior estimates of
[Equation 19]
The estimates are consistent with a previous study using a subset of the data in a hierarchical framework (SUCHARD et al. 2003a). Also in comparison to this previous study, differences in estimates of M, $\sigma^2_A$, $\sigma^2_G$, $\sigma^2_M$, and $N_\pi$ all trend in the correct directions given the increase in the number of taxa and genes fit here.
Varying rates of HGT across gene classes:
Figure 5 displays model estimates for the linear predictors $\theta_1$ and $\theta_2$ and for the expected number of HGT events per gene, $\lambda_k$, for the informational and operational gene classes. The two top plots display histograms of the posterior samples of $\theta_1$ (left) and $\theta_2$ (right). These plots also include normal approximations to the posterior (solid lines) and prior densities (dashed lines). Examining the plot on the right, the prior density at $\theta_2 = 0$ (dotted vertical line) is considerably higher than the normal approximation to the posterior density. Further, the 95% BCI of $\theta_2$ is (0.27, 1.15) and does not cover zero. Both observations support the hypothesis that $\theta_2 \neq 0$ and, hence, that rates of HGT differ between informational and operational genes. Formally, the Bayes factor in favor of differing rates is given by the Savage-Dickey ratio. The log10 Bayes factor is
[Equation 20: log10 Bayes factor in favor of differing rates]
The bottom plot in Figure 5 transforms $\theta_1$ and $\theta_2$ into the expected number of HGT events per gene and displays histograms of the posterior samples of these quantities. Depicted in dark shading is $\lambda_k$ for the operational genes and depicted in light shading is $\lambda_k$ for the informational genes. Although $\lambda_k$ for operational genes is significantly greater than $\lambda_k$ for informational genes from the argument above, a small amount of overlap is observed (solid shading) between these marginal histograms. This overlap results from the high negative correlation between $\theta_1$ and $\theta_2$ (data not shown) and illustrates the need for caution in making inference on the basis of marginal posterior summaries alone.
REMARKS
The specific stochastic models for HGT developed in this article have important limitations. First and foremost, the random walks explore only the discrete, topological portion of tree space and do not consider changes in branch lengths between trees as part of the underlying HGT process. As a result, HGT between nearest neighbors in a tree remains unidentified, as this process does not result in a change in the topological configuration of the tree. Model extensions that consider a continuous random drift process on the joint space of $(\tau, t)$ (BILLERA et al. 2001) may circumvent this shortfall. For a related problem involving coalescence, YANG (2002) shows that including branch lengths t in the probabilistic model across loci improves power. Additionally, I assume that the K DTMCs representing the random walks of the gene trees $\tau_k$ away from the species tree $\tau$ are conditionally independent given $\tau$. This assumption implies that the evolutionary histories of all genes are unlinked, while evidence for the HGT of, at a minimum, complete operons abounds in prokaryotes (KOONIN et al. 2001). Possible modeling extensions include allowing for linked or partially linked genes.
HGT is not the only process that may cause incongruence between gene trees. Although the effects of lineage sorting should be minor given the extensive divergence between the species studied here, the inclusion of paralogous gene copies within the orthologous alignments may mislead inference. Also important, stochastic error due to sparse phylogenetic data, evolutionary model misspecification, and parallel/convergent evolution can falsely produce incongruence between trees (CAO et al. 1998). These effects should upwardly bias the inferred number of HGT events. However, I suspect this bias is less than one HGT event per gene, as only a modest percentage of genes should be affected and the error should produce just minor changes in the inferred tree. There is no a priori reason to suspect that this bias differs between the informational and operational gene classes, so the bias does not affect the relative difference between classes in HGT rates or inference regarding the complexity hypothesis.
For the SPR model, numerical approximations to the matrix exponentials involving the multistep transition probability matrix P_SPR may offer promise in handling research problems with larger numbers of taxa N (MOLER and VAN LOAN 2003). As N increases, the dimension of the square matrix P_SPR grows superexponentially, while the size of the neighborhood of each vertex grows only as O(N²). As a consequence, P_SPR becomes increasingly sparse. In this situation, the number of unique eigenvalues increases substantially more slowly than the matrix's dimension. Krylov subspace techniques (SIDJE and STEWART 1999) may stretch computational limits upward to N = 8 or more.
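To illustrate how sparsity can be exploited, the following Python sketch computes a single row of a multistep transition matrix using only sparse vector-matrix products. It does not implement the Krylov subspace methods cited above; it uses the simpler Poisson-weighted series (uniformization) and assumes the multistep matrix can be written as the sum over E of Poisson(E; lam) times A^E, where A is a one-step transition matrix stored as adjacency lists. The names `neighbors`, `weights`, and `transition_row` are hypothetical, and the toy graph is a complete graph rather than an SPR graph.

```python
import math

def sparse_vecmat(row_vec, neighbors, weights):
    """Return row_vec @ A, where A is stored as adjacency lists with matching weights."""
    out = {}
    for i, mass in row_vec.items():
        for j, w in zip(neighbors[i], weights[i]):
            out[j] = out.get(j, 0.0) + mass * w
    return out

def transition_row(u, lam, neighbors, weights, tol=1e-12):
    """Row u of sum_E Poisson(E; lam) A^E, truncated once the unused Poisson mass drops below tol."""
    current = {u: 1.0}            # row u of A^0
    pois = math.exp(-lam)         # Poisson(0; lam)
    row = {u: pois}
    remaining = 1.0 - pois
    steps = 0
    while remaining > tol:
        steps += 1
        current = sparse_vecmat(current, neighbors, weights)
        pois *= lam / steps       # Poisson(steps; lam) from the previous term
        for j, mass in current.items():
            row[j] = row.get(j, 0.0) + pois * mass
        remaining -= pois
    return row

# Toy example: a walk on a complete graph with M = 4 states and equal move probabilities.
M = 4
neighbors = [[j for j in range(M) if j != i] for i in range(M)]
weights = [[1.0 / (M - 1)] * (M - 1) for _ in range(M)]
row0 = transition_row(0, 1.2, neighbors, weights)
print(row0)
```

Each pass touches only the nonzero entries of A, so the cost per term scales with the neighborhood size rather than with the full dimension of the matrix.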
In spite of these limitations, these stochastic models for HGT offer several advantages over previous approaches to studying HGT using multiple orthologous gene alignments. Under these stochastic models, the species tree is an unknown parameter that may be either integrated out of the analysis as a nuisance parameter or estimated jointly with the multiple gene trees. Joint analysis decreases the possibility of bias introduced through fixing the species tree when knowledge about it is uncertain. A stochastic approach also overcomes the bias inherent in parsimony-like estimation. Further, the hierarchical framework in which the stochastic model sits enables the borrowing of strength in the estimation of all gene-partition-level estimates including the gene trees themselves. Finally, and most importantly, stochastic models lend themselves well to formal statistical testing, with no need for ad hoc procedures. The ability to compare differing models for HGT will continue to shed further insight into the underlying biological process.
APPENDIX
Complete models:
To determine the multistep transition probability matrix P_GJC for the GJC model with M ≥ 2 states, I first recall that
[Equation A1]
To determine the eigenvalues of Q_GJC, I write
[Equation A2]
where the rate parameter for gene k is expressed in terms of the expected number of HGT events per gene, J is the M × M matrix of all ones, and I is the M × M identity matrix. Matrix J has a rank of one and, therefore, one nonzero eigenvalue that equals M/(M − 1). Given the eigenvalues of J and expression (A2), the M eigenvalues of Q_GJC become
[Equation A3]
Note that Q_GJC for M > 2 continues to have only two distinct eigenvalues. Conceptually, this results because the qualitative behavior of the underlying Markov chain does not change as the size of the state-space increases.
By letting the expected number of HGT events per gene grow without bound, I see that the stationary distribution is the eigenvector corresponding to the 0 eigenvalue. By examining the other limiting case, where the expected number of HGT events equals 0, and considering the initial conditions, algebraic rearrangement yields
[Equation A4]
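The two-eigenvalue structure described above implies a simple closed form for a Jukes-Cantor-type chain on M states. The Python sketch below checks that closed form against a truncated Taylor series for the matrix exponential. It assumes a rate matrix with equal off-diagonal rates lam/(M − 1), so that lam is the expected number of events per unit time; this normalization, and the function names, are assumptions made for illustration and may differ in detail from the expressions in this appendix.

```python
import math

def gjc_rate_matrix(M, lam):
    """Rate matrix with off-diagonal entries lam/(M-1) and diagonal entries -lam."""
    off = lam / (M - 1)
    return [[-lam if i == j else off for j in range(M)] for i in range(M)]

def matmul(A, B):
    """Dense matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm_series(Q, terms=60):
    """Truncated Taylor series for exp(Q); adequate for small matrices with modest entries."""
    n = len(Q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, Q)]   # Q^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def gjc_closed_form(M, lam):
    """Probabilities of staying in the same state and of moving to one specific other state."""
    decay = math.exp(-lam * M / (M - 1))   # from the single nonzero eigenvalue -lam*M/(M-1)
    same = 1.0 / M + (M - 1) / M * decay
    different = 1.0 / M - decay / M
    return same, different

same, different = gjc_closed_form(5, 1.5)
P = expm_series(gjc_rate_matrix(5, 1.5))
print(same, P[0][0], different, P[0][3])
```

As lam grows, both probabilities approach 1/M, the uniform stationary distribution; at lam = 0 the chain stays in its starting state with probability one.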
The state-space of the GK model is partitioned into two disjoint sets, indexed 1 and 2. Let M1 and M2 denote the numbers of states in sets 1 and 2, respectively, where M1 + M2 = M, and let R be the ratio of rates for transitions within a structural set to transitions between sets. Then, following arguments similar to those above, one can find the multistep transition probability matrix P_GK for the GK model.
![]() | (A5) |
![]() | (A6) |
2, then
![]() | (A7) |
3 = M2R + M1. For R ≠ 1, note that there are four unique eigenvalues when M1 ≠ M2 and three unique eigenvalues otherwise. This is consistent with the standard Kimura model, in which M1 = M2 = 2 with three unique eigenvalues.
Sampling algorithm:
For each gene-partition k, let
k = (
k, tk,
k,
k, µk,
k) and, then, assemble
= (
1, ... ,
K) to be the collection of all gene-level parameters. To specify the hierarchical prior parameters, let
= (V,
,
, N
,
,
). Across-gene-level parameters
also include R when considering the GK model and a mixing parameter taking values in {0, 1} when comparing models SPR and GJC. I employ an MCMC approach to sample from each model's joint posterior distribution of the gene-level and across-gene-level parameters given the data Y. I generate samples from these posteriors using two nested Metropolis-within-Gibbs cycles, as laid out in SUCHARD et al. (2003a) for hierarchical phylogenetic models. The outer cycle first iterates over gene partitions k and then over the across-gene-level parameters. Within each gene partition k, the inner cycle proceeds over the gene-level parameters. With the exception of proposals for the species tree, the linear predictors, R, and the mixing parameter, all parameter proposals follow those in SUCHARD et al. (2003a).
The multinomial prior placed on the species tree is conjugate to its sampling density. As a result, it is possible to Gibbs sample the species tree from its full conditional distribution for moderately small M. This full conditional distribution remains multinomial with M state probabilities given by
![]() | (A8) |
where gene tree k is in state vk for all k and the conditioning vector contains all model parameters excluding the species tree
. Similar to the reweighted prior approach to estimate
, varying z can improve sampling efficiency when estimating the relative posterior probabilities of specific species trees
.
I draw the transition ratio R and the linear predictors via separate Metropolis-Hastings proposals. For R, I propose new parameter values by generating a normal random variate centered at the current value of R with a tunable variance s_R². Given the high degree of correlation between column vectors in the design matrix D, I expect the posterior distribution of the linear predictors to also exhibit strong correlation. This expectation stems from a normal linear regression approximation that has a variance-covariance structure proportional to (D'D)⁻¹. As a consequence, component-by-component updating of the linear predictors should lead to a slowly mixing MCMC chain (ROBERTS and SAHU 1997). To help ensure adequate mixing, I propose all components simultaneously using a multivariate normal random variate centered at the current value of the linear predictors with a tunable variance-covariance matrix. I adjust the tunable variances such that proposals have acceptance rates of 30–40% (GELMAN et al. 1996) and fix the correlation matrix approximately equal to the posterior correlation of the linear predictors determined by a trial chain.
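The joint proposal described above can be sketched in Python as follows. The sketch assumes the proposal covariance is assembled as diag(sd) C diag(sd), with tunable variances and a fixed correlation matrix C, and draws the correlated step through a Cholesky factor. The function names and the numerical values are hypothetical; only the proposal step is shown, not the Metropolis-Hastings acceptance test.

```python
import math
import random

def build_covariance(variances, corr):
    """Combine tunable variances with a fixed correlation matrix: diag(sd) C diag(sd)."""
    sds = [math.sqrt(v) for v in variances]
    n = len(sds)
    return [[sds[i] * corr[i][j] * sds[j] for j in range(n)] for i in range(n)]

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L

def propose(current, L, rng):
    """Random-walk proposal: current value plus L times a vector of standard normal draws."""
    z = [rng.gauss(0.0, 1.0) for _ in current]
    return [c + sum(L[i][k] * z[k] for k in range(i + 1)) for i, c in enumerate(current)]

# Hypothetical tuning values: two components with strong negative correlation.
cov = build_covariance([0.04, 0.09], [[1.0, -0.8], [-0.8, 1.0]])
L = cholesky(cov)
candidate = propose([0.0, 0.0], L, random.Random(0))
print(candidate)
```

Because the step is drawn along the estimated posterior correlation, proposals tend to fall in regions of comparable posterior density, which is the motivation for updating all components together.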
When comparing HGT models using a mixture approach, I sample the mixing parameter directly from its full conditional distribution in a Gibbs step,
![]() | (A9) |
![]() | (A10) |
= u, and
k = vk. Values a may be saved at each iteration and used to construct a Rao-Blackwellized estimator for p(M1|Y) (SUCHARD et al. 2003a).
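A minimal Python sketch of the Rao-Blackwellized estimator mentioned above follows. It assumes that the saved value a at each iteration is the full conditional probability that the mixing parameter selects model M1, and it contrasts averaging those probabilities with averaging the sampled 0/1 indicators. The function names and the toy inputs are hypothetical.

```python
def rao_blackwell_estimate(conditional_probs):
    """Average the saved full conditional probabilities of model M1 across iterations."""
    return sum(conditional_probs) / len(conditional_probs)

def indicator_estimate(indicator_draws):
    """Naive alternative: the fraction of iterations in which the sampled indicator chose M1."""
    return sum(indicator_draws) / len(indicator_draws)

# Toy values standing in for saved MCMC output (not results from this article).
saved_a = [0.2, 0.4, 0.6]
sampled_indicators = [0, 0, 1]
rb = rao_blackwell_estimate(saved_a)
naive = indicator_estimate(sampled_indicators)
print(rb, naive)
```

Both averages target the same posterior model probability; averaging the conditional probabilities typically has lower Monte Carlo variance because it replaces each 0/1 draw with its conditional expectation.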
Finally, the inferred number of HGT events Ek for the SPR model can be recovered after posterior simulation. The full conditional distribution
[Equation A11]
where the species tree is in state u and gene tree k is in state vk. Since the corresponding (u, vk) matrix entry is at most 1,
[Equation A12]
[Equation A13]
For each posterior sample p = 1, ... , P, I draw one replicate Ek(p). For each p, I first generate E* from a Poisson distribution and U from the uniform distribution. Then, if U ≤ (A^E*)uv/(P)uv, where u and v are the states of the species tree and of gene tree k in sample p, I set Ek(p) = E*. Otherwise, I reject the current proposal and begin again by regenerating (E*, U).
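The rejection step above can be sketched in Python under stated assumptions. The sketch assumes the target distribution of the event count is proportional to Poisson(E; lam) times the (u, v) entry of A^E, where A is a one-step transition matrix. Dividing by the constant (P)uv only normalizes that target, so here a Poisson proposal E* is accepted with probability (A^E*)uv, which is at most 1. The function names and the two-state toy matrix are hypothetical.

```python
import math
import random

def step(row_vec, A):
    """Return row_vec @ A for a dense one-step transition matrix A."""
    n = len(A)
    return [sum(row_vec[i] * A[i][j] for i in range(n)) for j in range(n)]

def prob_after_steps(A, u, v, steps):
    """Entry (u, v) of A**steps, computed by repeated vector-matrix products."""
    vec = [1.0 if i == u else 0.0 for i in range(len(A))]
    for _ in range(steps):
        vec = step(vec, A)
    return vec[v]

def poisson_draw(lam, rng):
    """Knuth's multiplication method for a Poisson variate; adequate for small lam."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def sample_event_count(A, u, v, lam, rng):
    """Propose E* ~ Poisson(lam) and accept with probability (A^E*)_uv; retry on rejection.

    Assumes state v is reachable from state u, otherwise the loop never terminates.
    """
    while True:
        e_star = poisson_draw(lam, rng)
        if rng.random() < prob_after_steps(A, u, v, e_star):
            return e_star

# Toy chain that always flips between two states, so moving from 0 to 1 needs an odd count.
flip = [[0.0, 1.0], [1.0, 0.0]]
rng = random.Random(2005)
draws = [sample_event_count(flip, 0, 1, 2.0, rng) for _ in range(200)]
print(draws[:10])
```

In the toy chain every accepted count is odd, since an even number of flips returns the walk to its starting state.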
MCMC performance:
I run my MCMC chains for 1.1 × 10^5 outer Metropolis-within-Gibbs cycles, discard the first 10^4 cycles as burn-in, and subsample every 10 cycles. This process retains P = 10^4 posterior samples with decreased autocorrelation. Time-series plots of the model log-likelihood during simulation suggest that the total chain length and burn-in time are moderately longer than required.
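The burn-in and subsampling arithmetic above can be checked with a short Python sketch. The helper name `retained_cycles` is hypothetical; it lists the cycle indices kept after discarding the burn-in and thinning.

```python
def retained_cycles(total_cycles, burn_in, thin):
    """Indices of the cycles kept after dropping the burn-in and keeping every thin-th cycle."""
    return range(burn_in, total_cycles, thin)

kept = retained_cycles(total_cycles=110_000, burn_in=10_000, thin=10)
print(len(kept))
```

With 1.1 × 10^5 cycles, a burn-in of 10^4, and thinning by 10, the number of retained samples is (110,000 − 10,000)/10 = 10,000.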
To assess the performance of the MCMC sampler, I employ scaled SRQ plots (MYKLAND et al. 1995; LI et al. 2000; SUCHARD et al. 2002). SRQ plots are useful to demonstrate adequate sampler mixing within discrete model parameters. For the primary measures in this study, two important discrete parameters are the species tree and the model mixture parameter. In particular, I use SRQ plots to assess mixing when comparing the relative probabilities of two possible species trees and of differing stochastic models for HGT. In these SRQ plots, the local slope around a given point depicts the ratio of the relative posterior probability estimate based on the entire MCMC chain to an estimate based on a short segment of the chain around that point. Substantial deviation of the slope from one implies that the sampler is slowly mixing and, as a result, the chain is not sufficiently long to generate stable estimates. For continuous model parameters and Bayes factors based on the Savage-Dickey ratio, I assess convergence by comparing posterior estimates obtained from simulations of at least five independent chains with starting values drawn directly from the model priors.
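As a hedged sketch of how the coordinates of an SRQ plot for a discrete parameter might be computed, the following Python code treats each visit to a chosen state as a regeneration, which is an assumption of this sketch. It plots the scaled visit index i/n against the scaled visit time T_i/T_n, so that the local slope is the ratio of the overall visit frequency to the local visit frequency. The function names and toy chains are hypothetical.

```python
def srq_points(chain, state):
    """Pairs (i/n, T_i/T_n) for the i-th visit to `state`, with times counted from 1."""
    visit_times = [t + 1 for t, s in enumerate(chain) if s == state]
    if not visit_times:
        raise ValueError("state never visited")
    n = len(visit_times)
    last = visit_times[-1]
    return [((i + 1) / n, t / last) for i, t in enumerate(visit_times)]

def max_deviation(points):
    """Largest vertical distance between the SRQ points and the diagonal of slope one."""
    return max(abs(y - x) for x, y in points)

# Toy chains: one that alternates states regularly and one that sticks in each state.
well_mixed = [0, 1] * 50
sticky = [1] * 10 + [0] * 80 + [1] * 10
dev_mixed = max_deviation(srq_points(well_mixed, 1))
dev_sticky = max_deviation(srq_points(sticky, 1))
print(dev_mixed, dev_sticky)
```

Points that hug the diagonal correspond to a slope near one throughout the run, whereas long excursions away from the chosen state pull the points off the diagonal.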
LITERATURE CITED
ALLEN, B., and M. STEEL, 2001 Subtree transfer operations and their induced matrices on evolutionary trees. Ann. Combinatorics 5: 1–15.
BERGSMEDH, A., A. SZELES, M. HENRIKSSON, A. BRATT, M. FOLKMAN et al., 2001 Horizontal transfer of oncogenes by uptake of apoptotic bodies. Proc. Natl. Acad. Sci. USA 98: 6407–6411.
BILLERA, L., S. HOLMES and K. VOGTMANN, 2001 Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27: 733–767.
BROWN, J., 2003 Ancient horizontal gene transfer. Nat. Rev. Genet. 4: 121–132.
CAO, Y., A. JANKE, P. WADDELL, M. WESTERMAN, O. TAKENAKA et al., 1998 Conflict among individual mitochondrial proteins in resolving the phylogeny of Eutherian orders. J. Mol. Evol. 47: 307–322.
CARLIN, B., and S. CHIB, 1995 Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57: 473–484.
COLE, J., B. CHAI, T. MARSH, R. FARRIS, Q. WANG et al., 2003 The ribosomal database project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res. 31: 442–443.
DECKERT, G., P. WARREN, T. GAASTERLAND, W. YOUNG, A. LENOX et al., 1998 The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392: 353–358.
DOOLITTLE, W., 1999 Lateral gene transfer, genome surveys and the phylogeny of prokaryotes. Science 286: 1443a.
FELSENSTEIN, J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17: 368–376.
FENG, D., G. CHO and R. DOOLITTLE, 1997 Determining divergence times with a protein clock: update and reevaluation. Proc. Natl. Acad. Sci. USA 94: 13028–13033.
GARCIA-VALLVE, S., A. ROMEU and J. PALAU, 2000 Horizontal gene transfer in bacterial and archeal complete genomes. Genome Res. 10: 1719–1725.
GELMAN, A., G. ROBERTS and W. GILKS, 1996 Efficient Metropolis jumping rules, pp. 599–608 in Bayesian Statistics, Vol. 5, edited by J. BERNARDO, J. BERGER, A. DAWID and A. SMITH. Oxford University Press, Oxford.
GIOVANNONI, S., M. RAPP, D. GORDON, E. URBACH, M. SUZUKI et al., 1996 Ribosomal RNA and the evolution of bacterial diversity, pp. 63–85 in Evolution of Microbial Life, edited by D. ROBERTS, P. SHARP, G. ALDERSON and M. COLLINS. Cambridge University Press, Cambridge, UK.
GREEN, P., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.
HEIN, J., 1990 Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci. 98: 185–200.
HEIN, J., 1993 A heuristic method to reconstruct the history of sequences subject to recombination. J. Mol. Evol. 36: 396–405.
HUELSENBECK, J., F. RONQUIST, R. NIELSEN and J. BOLLBACK, 2001 Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294: 2310–2314.
JAIN, R., M. RIVERA and J. LAKE, 1999 Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96: 3801–3806.
JAIN, R., M. RIVERA, J. MOORE and J. LAKE, 2002 Horizontal gene transfer in microbial genome evolution. Theor. Popul. Biol. 61: 489–495.
JUKES, T., and C. CANTOR, 1969 Evolution of protein molecules, pp. 21–132 in Mammalian Protein Metabolism, edited by H. MUNRO. Academic Press, New York.
KASS, R., and A. RAFTERY, 1995 Bayes factors. J. Am. Stat. Assoc. 90: 773–795.
KE, D., M. BOISSINOT, A. HULETSKY, F. PICARD, J. FRENETTE et al., 2000 Evidence for horizontal gene transfer in evolution of elongation factor Tu in enterococci. J. Bacteriol. 182: 6913–6920.
KIMURA, M., 1980 A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111–120.
KOONIN, E., K. MAKAROVA and L. ARAVIND, 2001 Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55: 709–742.
KOSKI, L., R. MORTON and G. GOLDING, 2001 Codon bias and base composition are poor indicators of horizontally transferred genes. Mol. Biol. Evol. 18: 404–412.
LAKE, J., 1991 The order of sequence alignment can bias the selection of tree topology. Mol. Biol. Evol. 8: 378–385.
LAKE, J., and M. RIVERA, 1996 The prokaryotic ancestry of eukaryotes, pp. 87–108 in Evolution of Microbial Life, edited by D. ROBERTS, P. SHARP, G. ALDERSON and M. COLLINS. Cambridge University Press, Cambridge, UK.
LAWRENCE, J., 1999 Gene transfer, speciation and the evolution of bacterial genomes. Curr. Opin. Microbiol. 2: 519–523.
LAWRENCE, J., and H. OCHMAN, 1997 Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44: 383–397.
LEVERSTEIN-VAN HALL, M., A. BOX, H. BLOK, A. PAUUW, A. FLUIT et al., 2002 Evidence of extensive interspecies transfer of integron-mediated antimicrobial resistance genes among multidrug-resistant Enterobacteriaceae in a clinical setting. J. Infect. Dis. 186: 49–56.
LI, S., D. PEARL and H. DOSS, 2000 Phylogenetic tree construction using Markov chain Monte Carlo. J. Am. Stat. Assoc. 95: 493–508.
LIU, J., 1994 The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J. Am. Stat. Assoc. 89: 958–966.
MADDISON, W., 1997 Gene trees in species trees. Syst. Biol. 46: 523–536.
MAU, B., M. NEWTON and B. LARGET, 1999 Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics 55: 1–12.
MCCULLAGH, P., and J. NELDER, 1983 Generalized Linear Models: Monographs on Statistics and Applied Probability. Chapman & Hall, New York.
MIRKIN, B., T. FENNER, M. GALPERIN and E. KOONIN, 2003 Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3: 2.
MOLER, C., and C. VAN LOAN, 2003 Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. Soc. Ind. Appl. Math. Rev. 45: 3–49.
MYKLAND, P., L. TIERNEY and B. YU, 1995 Regeneration in Markov chain samplers. J. Am. Stat. Assoc. 90: 233–241.
PAGE, R., 2000 Extracting species trees from complex gene trees: reconciled trees and vertebrate phylogeny. Mol. Phylogenet. Evol. 14: 89–106.
RAGAN, M., 2001 Detection of lateral gene transfer among microbial genomes. Curr. Opin. Genet. Dev. 11: 620–626.
RIVERA, M., R. JAIN, J. MOORE and J. LAKE, 1998 Genomic evidence of two functionally distinct gene classes. Proc. Natl. Acad. Sci. USA 95: 6239–6244.
ROBERTS, G., and S. SAHU, 1997 Updating schemes, correlation structure, blocking and parameterization of the Gibbs sampler. J. R. Stat. Soc. Ser. B 59: 291–317.
ROBINSON, D., 1971 Comparison of labeled trees with valency three. J. Comb. Theor. Ser. B 11: 105–119.
SIDJE, R., and W. STEWART, 1999 A numerical study of large sparse matrix exponentials arising in Markov chains. Comput. Stat. Data Anal. 29: 345–368.
SINSHEIMER, J., J. LAKE and R. LITTLE, 1996 Bayesian hypothesis testing of four-taxon topologies using molecular sequence data. Biometrics 52: 193–210.
SNEL, B., P. BORK and M. HUYNEN, 1999 Genome phylogeny based on gene content. Nat. Genet. 21: 108–110.
SUCHARD, M., R. WEISS and J. SINSHEIMER, 2001 Bayesian selection of continuous-time Markov chain evolutionary models. Mol. Biol. Evol. 18: 1001–1013.
SUCHARD, M., R. WEISS, K. DORMAN and J. SINSHEIMER, 2002 Oh brother, where art thou? A Bayes factor test for recombination with uncertain heritage. Syst. Biol. 51: 715–728.
SUCHARD, M., C. KITCHEN, J. SINSHEIMER and R. WEISS, 2003a Hierarchical phylogenetic models for analyzing multipartite sequence data. Syst. Biol. 52: 649–664.
SUCHARD, M., R. WEISS and J. SINSHEIMER, 2003b Testing a molecular clock without an outgroup: derivations of induced priors on branch length restrictions in a Bayesian framework. Syst. Biol. 52: 48–54.
SWOFFORD, D., G. OLSEN, P. WADDELL and D. HILLIS, 1996 Phylogenetic inferences, pp. 407–514 in Molecular Systematics, Ed. 2, edited by D. HILLIS, C. MORITZ and B. MABLE. Sinauer Associates, Sunderland, MA.
SYVANEN, M., 1994 Horizontal gene transfer: evidence and possible consequences. Annu. Rev. Genet. 28: 237–261.
TAMURA, K., and M. NEI, 1993 Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: 512–526.
TAYLOR, J., A. SIQUEIRA and R. WEISS, 1996 The cost of adding parameters to a model. J. R. Stat. Soc. Ser. B 58: 593–607.
TEICHMANN, S., and G. MITCHISON, 1999 Is there a phylogenetic signal in prokaryote proteins? J. Mol. Evol. 49: 98–107.
VERDINELLI, I., and L. WASSERMAN, 1995 Computing Bayes factors using a generalization of the Savage-Dickey density ratio. J. Am. Stat. Assoc. 90: 614–618.
WOESE, C., 2000 Interpreting the universal phylogenetic tree. Proc. Natl. Acad. Sci. USA 97: 8392–8396.
WOLF, Y., L. ARAVIND, N. GRISHIN and E. KOONIN, 1999 Evolution of aminoacyl-tRNA synthetases: analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res. 9: 689–710.
YANG, Z., 2002 Likelihood and Bayes estimation of ancestral population sizes in hominoids using data from multiple loci. Genetics 162: 1811–1823.
YANG, Z., and B. RANNALA, 1997 Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14: 717–724.
ZAP, W., Z. ZHANG and Y. WANG, 1999 Distinct types of rRNA operons exist in the genome of the actinomycete Thermomonospora chromogena and evidence for horizontal gene transfer of an entire rRNA operon. J. Bacteriol. 181: 5201–5209.
Communicating editor: J. HEIN