IN a recent article, Wang and Hey (2009) consider estimation of the parameters in an isolation-with-migration (IM) model for two species. For each locus, the data *X* consist of two samples, and therefore the probability of the data depends only on the time to the most recent common ancestor (MRCA), and we can write the likelihood for a single locus aswhere *f*(*t*|Θ) is the probability density of the coalescent time. Assuming free recombination between loci, the full likelihood is a product of each locus likelihood.

To determine the likelihood we must determine the density *f*(*t*|Θ) for coalescent of two samples in the IM model. Wang and Hey (2009) find the time to the MRCA by explicitly integrating over all possible sample paths in the system. The purpose of this letter is to demonstrate that the time to the MRCA is easy to compute from a matrix exponential. Furthermore, the matrix exponential framework has the advantage that it generalizes to more than two samples. We describe the IM model, briefly describe the solution to computing the coalescence time density from Wang and Hey (2009), and finally present an approach that computes the density through matrix exponentials.

The description of the model is taken from Wang and Hey (2009) and formulates the IM model as a continuous time Markov chain. Before time *T* and for two samples the system is in one of the following five states: *S*_{11}, both genes are in population 1; *S*_{22}, both genes are in population 2; *S*_{12}, one gene is in population 1 and the other is in population 2; *S*_{1}, the genes have coalesced and the single gene is in population 1; and *S*_{2}, the genes have coalesced and the single gene is in population 2. The rates between the states can be described by the instantaneous rate matrix *Q* given by(1)where the diagonals are such that each row sums to zero. After time *T*, the system only has two states: *S _{AA}* corresponding to two genes in the ancestral population and

*S*corresponding to one single gene in the ancestral population. The rate of going from state

_{A}*S*to state

_{AA}*S*is 2/θ

_{A}_{A}. The model parameters are thus Θ = (θ

_{1}, θ

_{2}, θ

_{A},

*m*

_{1},

*m*

_{2},

*T*), where θ

_{1}, θ

_{2}, and θ

_{A}are the scaled population sizes,

*m*

_{1}and

*m*

_{2}are the migration rates and

*T*is the speciation time. We refer to Wang and Hey (2009, Figure 1) for an illustration of the model and for more details on the model parameters.

Now consider the sample path *z* = {*z*(*s*) : 0 ≤ *s* ≤ *t* } shown in Figure 1, where the coalescent happens at time *t* < *T* and in population 2. The density for this sample path is given bywhere *x* and *y* are the number of transitions to the states *S*_{12} and *S*_{11}, respectively. The number of transitions from *S*_{11} to *S*_{12} and from *S*_{12} to *S*_{11} is therefore *y* and, since the starting state is *S*_{12}, the number of transitions from *S*_{12} to *S*_{22} and *S*_{22} to *S*_{12} is *x* − *y* + 1 and *x* − *y*. The variables *U, V*, and *W* are the total amount of time spent in the states *S*_{12}, *S*_{11}, and *S*_{22}, respectively.

To determine the density *f*_{2}(*t*|Θ) for coalescent in population 2 at time *t* < *T*, Wang and Hey (2009) explicitly integrate all possible sample paths that find a MRCA at time *t*. The integration is performed by first integrating all sample paths that share the same values for the five variables (*x*, *y*, *U*, *V*, *W*); this density is termed *p*(*x*, *y*, *U*, *V*, *W* |Θ). Second, this expression is summed over variables (*x*, *y*) and integrated over variables (*U*, *V*, *W*) with the constraint *U* +*V* +*W* = *t*:The inner summation becomes a Bessel *I*_{α}(*x*) function and the two-dimensional planar integration is handled numerically.

It is possible, however, to take immediate advantage of the continuous time Markov chain representation (1) and to solve the system of ordinary differential equations analytically. The two samples are either from the same species (corresponding to the starting state being *S*_{11} or *S*_{22}) or the two samples are from each of the species (in which case the starting state is *S*_{12}). Starting in state *a*, the density for coalescent in population 1, for *t* < *T*, is given by(2)where *e ^{A}* is the matrix exponential and (

*e*)

^{A}_{jk}is entry (

*j*,

*k*) in

*e*. Calculating matrix exponentials has a very long history (

^{A}*e.g.*, Moler and Van Loan 2003), and implementations are available in most standard computer languages. The density for coalescent in population 2 at time

*t*<

*T*is(3)and the total density for a coalescent at time

*t*<

*T*is(4)Similarly, the density for coalescent in the ancestral population at time

*t*>

*T*is(5)

In Figure 2 we illustrate the coalescent density in the two species IM model. We use the same parameters as in the simulation study in Wang and Hey (2009, Table 6): θ_{1} = 0.005, θ_{2} = 0.003, θ_{A} = 0.002, *m*_{1} = 50, *m*_{2} = 100, and *T* = 0.003 (the vertical line).

Multiple analytical approaches for computing the coalescence time density can be found in the literature. Variations of the IM model have been analyzed using Laplace transforms in, *e.g.,* Latter (1973), Takahata (1995), and Wilkinson-Herbots (1998). Their results can also be derived using matrix exponentiation. In Wakeley (1996), a spectral decomposition was used to obtain a continuous-time approximation to a discrete time model. Generalizations of the IM model dealing with two loci with recombination between loci are analyzed using expressions for continuous time Markov chains in Slatkin and Pollack (2006) and Simonsen and Churchill (1997).

We finally emphasize that the matrix exponential framework described above generalizes to more than two samples and more than two populations. For any coalescence system that can be expressed as a finite-state homogeneous continuous time Markov chain, we can compute the density of coalescence times using matrix exponentiation. Expressing a coalescence process as such a system is straightforward although a major complication is an explosion in the number of states when the number of samples and populations increase.

If the Markov chain is not homogeneous, that is, the rate matrix *Q* depends on the time parameter *t*, simple matrix exponentiation is no longer a solution to the coupled set of differential equations. The model of Innan and Watanabe (2006), for instance, consists of the same set of states and almost the same rate matrix, but has the migration rates depend linearly on the time variable *t*. For this system, the approach described above cannot be applied.

## Footnotes

Available freely online through the author-supported open access option.

Communicating editor: L. Excoffier

- Copyright © 2011 by the Genetics Society of America

Available freely online through the author-supported open access option.