Abstract
In this article we present a model for analyzing patterns of genetic diversity in a continuous, finite, linear habitat with restricted gene flow. The distribution of coalescent times and locations is derived for a pair of sequences sampled from arbitrary locations along the habitat. The results for mean time to coalescence are compared to simulated data. As expected, mean time to common ancestry increases with the distance separating the two sequences. Additionally, this mean time is greater near the center of the habitat than near the ends. In the distant past, lineages that have not undergone coalescence are more likely to have been at opposite ends of the population range, whereas coalescent events in the distant past are biased toward the center. All of these effects are more pronounced when gene flow is more limited. The pattern of pairwise nucleotide differences predicted by the model is compared to data collected from sardine populations. The sardine data are used to illustrate how demographic parameters can be estimated using the model.
MIGRATION often plays an important role in shaping patterns of genetic diversity. Under conditions of restricted gene flow, the geographical and genetic structures of a population tend to become correlated. In the most basic terms, we expect individuals in close geographical proximity to be genetically more similar than geographically distant individuals. This differentiation will arise even in the absence of local adaptation, due to locally occurring genetic drift. As a result, it is possible to use existing patterns of neutral genetic variation to make inferences about the geographic structure of populations.
The best-studied models of geographic structure are the island (Wright 1931; Maruyama 1970a) and stepping-stone (Kimura and Weiss 1964) models. Both types of model assume a population composed of a number of subpopulations, or demes, connected to each other through migration. Each deme is assumed to be panmictic. In the island model, there is no explicit geography, in that each migration event occurs via a common migrant pool. Stepping-stone models, in contrast, permit migration only between neighboring demes. In the one-dimensional model, the demes are arrayed in a line, and each deme exchanges migrants only with the two adjacent demes. The analogous two-dimensional model assumes a grid of demes, with each deme exchanging migrants with some number of neighbors (e.g., four).
Coalescent theory differs from classical population genetics in its focus on the time to the most recent common ancestor of two or more sequences, rather than on the properties of the population as a whole. This focus on the genealogical structure of a sample provides a framework in which properties of populations can be estimated. Coalescent theory applied to geographically structured populations with discrete demes has been formalized as the “structured coalescent” (see, e.g., Wilkinson-Herbots 1998).
The coalescent model developed in this article assumes a population distributed uniformly along a finite, one-dimensional habitat. Gene flow is restricted, so locations of parents and offspring are correlated. A diffusion approximation is used to characterize the locations of ancestors of sampled sequences. Applied to pairs of sequences, this approach fully specifies the probability density for the times and locations of their most recent common ancestors and also provides summary statistics such as the mean time to coalescence.
The model is analogous to the one-dimensional stepping-stone model, but with some important differences that illustrate the motivation for this work. Although there has been a lot of work on finite stepping-stone models (discussed below), most analyses of the stepping-stone model rely on nonrealistic treatments of habitat boundaries. The best-studied models fall into two categories. Models in the first category assume that the ends of the array are joined together (the circular stepping-stone model, or the toroidal model in two dimensions; e.g., Maruyama 1970b; Nagylaki 1974a, 1977; Strobeck 1987; Slatkin 1991). This assumption secures mathematical tractability by making all demes identical and migration isotropic (Strobeck 1987), but is directly applicable to few systems in nature (e.g., a population inhabiting the entire coast of an island). Models in the second category assume a linear habitat of infinite length (e.g., Weiss and Kimura 1965; Nagylaki 1974b, 1976; Sawyer 1976, 1977). While these models provide useful insights regarding the short-term behavior of populations in a one-dimensional habitat, they predict infinite divergence between individuals (Sawyer 1976; Griffiths 1981; Wilkinson-Herbots 1998).
The model studied here assumes a finite linear habitat (e.g., along a stretch of coastline). The analysis indicates that the expected pattern of genetic diversity does, in fact, depend on location in the habitat, suggesting that application of a circular model to a finite linear population is problematic. At the very least, this misapplication entails discarding information encoded in the variation in genetic diversity along the population range.
Models of isolation by distance in a continuous population date back to Wright (1943), who defined the effective neighborhood population size as the reciprocal of the probability of self-fertilization. That is, the neighborhood size is approximately the number of individuals within the single-generation dispersal range. Wright's work shows that the correlation between adjacent individuals and the differentiation between neighborhoods both increase as the neighborhood size becomes small compared with the total population size. However, much of the theoretical work since has focused on populations subdivided into discrete demes. While a discrete model may be appropriate for many organisms, others may be distributed more or less continuously across a particular range, but nevertheless be geographically structured due to limited gene flow. The model presented here assumes a continuous population, but can be applied in modified form to the discrete-demes model.
Work from within the classical population genetics paradigm provides some insight into the properties of finite linear models similar to the one considered here. Finite one-dimensional stepping-stone models have been analyzed by Maruyama (1970c), Fleming and Su (1974), and Malécot (1975). These analyses derive expectations for classical measures such as the covariance in gene frequencies across demes. Nagylaki and Barcilon (1988) have considered probabilities of identity in a semi-infinite linear habitat. Maruyama (1971) has also derived probabilities of identity for a continuous population on a torus and the rate of decrease in genetic variability in a finite two-dimensional population (Maruyama 1970d, 1972). Hey (1991) has compared the mean coalescence time for a pair of sequences sampled from opposite ends of a finite linear stepping stone with that of a pair sampled at random from the entire population. The result that coalescence times are longer near the center of the habitat range is consistent with findings of Herbots (1994, pp. 66 and 145-146), who found a similar pattern in linear stepping-stone models with three and five demes.
While all of these results are intimately linked to the distribution of coalescence times (Slatkin 1991), many classical population-genetic analyses do not make use of all of the information in DNA sequence data, making the coalescent approach presented here preferable. Furthermore, all of these analyses rely on approximations that assume a large local population size, an assumption that is relaxed in the present analysis. However, it is noteworthy that the results derived here are consistent with, and in some cases anticipated by, much of this previous work.
A model similar to the one presented here was proposed by Barton and Wilson (1995, 1996), who applied a coalescent approach to a continuous population in two dimensions, deriving recursion equations that describe the coalescent process for a pair of sequences. These equations agree closely with simulated distributions of coalescence times. However, the method becomes cumbersome for long coalescence times and does not readily lead to summary statistics such as the moments of the probability distribution. While limited to pairs of genes in a one-dimensional habitat, the model presented here is easily applied to both long and short coalescence times and yields summary statistics that can be used to make inferences regarding demographic history from genetic data. Simulation results confirm that the diffusion approximation used in this model provides an accurate characterization of the entire coalescent process under a broad range of parameter values.
THE MODEL
The model assumes a uniformly distributed population of N haploid individuals in a linear habitat, but can be applied to a population of N/2 diploid individuals without modification. Distance is scaled such that any location along the habitat is indexed by a number between 0 and 1 (with 1/2 being the midpoint of the habitat). Absolute density-dependent population regulation is assumed. Each individual occupies a space of width 1/N, from which all other individuals are excluded. The structure of the population is a one-dimensional lattice, as in the voter model (Holley and Liggett 1975), a contact-process model used in many ecological applications (Durrett and Levin 1994). The distribution of coalescence times is found using a continuous approximation. Another way to think of the model is as a stepping-stone model consisting of N demes, each of size 1.
Generations are nonoverlapping. Each individual produces a very large number of gametes, which are dispersed according to a normal distribution centered at the location of the individual and with variance
The boundaries of the habitat are reflecting, so a gamete that would otherwise land outside the habitat range is reflected back an equal distance within it. Each individual thus has the same expected number of offspring regardless of its location. This means that migration is conservative, so migration alone is sufficient to maintain the relative population densities at all locations in the habitat (Nagylaki 1980). Nonreflecting boundaries would correspond to the case where those gametes dispersing outside the habitat range are lost. In such a system, individuals near the edges of the habitat would have a reduced effective fecundity relative to those nearer to the center.
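The claim that reflecting boundaries make migration conservative can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the parameter name `sigma` (standing for the standard deviation of the dispersal distribution), and the choice of a histogram as a density check are all assumptions introduced here for illustration.

```python
import random


def reflect(z):
    """Fold a position onto the habitat [0, 1] by reflection at 0 and 1."""
    # Repeated reflection at both ends is periodic with period 2.
    z = z % 2.0
    return 2.0 - z if z > 1.0 else z


def offspring_histogram(n_parents, sigma, bins, seed=0):
    """Disperse one gamete from each of n_parents uniformly placed parents
    and count where the gametes land (after reflection) in equal-width bins."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n_parents):
        parent = rng.random()                       # uniform parental location
        child = reflect(rng.gauss(parent, sigma))   # normal dispersal, then reflection
        counts[min(int(child * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Bin counts should be roughly equal, including the bins at the habitat edges.
    print(offspring_histogram(50000, 0.1, 10, seed=1))
```

With lossy (nonreflecting) boundaries the `reflect` call would be replaced by discarding gametes that land outside [0, 1], and the edge bins would then be depleted relative to the central ones.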
As Felsenstein (1975) pointed out, most continuous-space models in population genetics assume a uniform population density that would not actually be maintained by the proposed reproductive scheme. A normal distribution of gametes without severe density regulation generates a population that is clumped together at certain locations and sparsely populated at others. With its absolute density regulation at all locations, the model of reproduction proposed here will immediately generate and maintain a population that is uniformly distributed across its habitat range.
Applying a coalescent approach to the analysis of this model involves tracking the location of the ancestors of a particular sequence back in time. The location of a single sampled lineage can be approximated using a diffusion process with diffusion constant
The analysis presented here derives the distribution of the time to coalescence for a pair of sequences drawn from locations
If we consider a single lineage, disregarding habitat boundaries for the moment, it is equally likely to have come from a location to its left or its right in the previous generation. It follows that when we consider two lineages at some distance from each other, they are equally likely to have been closer together or farther apart in the previous generation. However, if the two shared a common ancestor in the previous generation, they must have been closer together. Thus, conditional on not coalescing in the previous generation, the two lineages are slightly more likely to have been farther apart than closer together. In contrast to the single-sequence case, as time in the past becomes very large, the ancestral lineages are not equally likely to be found anywhere in the habitat range. If the two lineages are still distinct, they are likely to have been more geographically separated than a uniform distribution dictates.
The analysis involves a transformation of variables to create two new parameters that do diffuse independently. The first parameter encodes the distance between the two sequences, and the second their average position:
The genealogical history is now modeled as a single diffusion process in this triangular state space. The diffusion constant is
Diffusion is subject to two different boundary conditions in the triangular state space. The two short sides of the triangle are reflecting boundaries. Reflection at these lines corresponds to reflection of a lineage off of a habitat boundary. The long side of the triangle is a more complex, partially reflecting, partially absorbing boundary. Positions along this line represent states where the two ancestral lineages are very close together in space. Reflection is equivalent to the two lineages moving past each other. Absorption is equivalent to a coalescent event.
The diffusion process in the transformed state space is isotropic, but not separable. Although diffusion in one dimension is independent of diffusion in the other dimension, it is not independent of location, due to the fact that the state space is triangular. The diagonal reflecting boundaries can be eliminated by placing a mirror image of the state space opposite the boundary. Reflection at the boundary is now represented as movement into the mirror-image state space. Three such reflections transform the state space into a square ranging from –1 to 1 in both x and y. All four edges are coalescent boundaries, and the symmetric shape makes the x and y diffusion processes separable as
The boundary conditions at the edges of the square depend on both the population density and the dispersal (migration) rate. If the population density and dispersal rate are very high, then when the two lineages come close together, they are likely to pass by each other rather than share a common ancestor, because there are a large number of individuals within their dispersal range. This corresponds to a more reflecting boundary in the two-dimensional state space. On the other hand, if the population density and dispersal rate are low, neighborhood size is small, and two lineages that are close together in space will be more likely to share a common ancestor, corresponding to a more absorbing boundary. Mathematically speaking, the flux rate of the probability distribution across the boundary is equal to the probability that the two lineages coalesce in the previous generation.
Because the model assumes perfect density-dependent population regulation, there is exactly one haploid lineage in each span of width 1/N. A coalescent event occurs when the two ancestral lineages are found within the same 1/N span in a given generation. Note that giving the lineages a finite physical width addresses the concern raised by Sawyer (1976) that common ancestry in a continuous model requires two lineages to have a physical separation of zero, which leads to pathological behaviors when applied to models of more than one dimension. In our continuous approximation of the population, we assume that a coalescent event occurs whenever the two lineages are separated by a distance of less than 1/(2N), that is, when 1 – 1/(2N) < x < 1. This approximates the probability that both lineages are found within the same fixed span of width 1/N. Applying this criterion at the boundaries, the result is an infinite series of sine and cosine terms. The full solution is derived in the appendix, and the main results are reproduced here in the text. For example, the joint probability density for the locations of the two lineages is given by
At time t, the probability of the two lineages not yet having coalesced is equal to the volume under the probability surface defined by U(x, y, t) within the square –1 < x < 1, –1 < y < 1. Intuitively, this is because coalescence is represented by the diffusion of the probability density out of the square. Thus, the instantaneous rate of coalescence at time t is given by the rate at which the probability volume within the square is decreasing at that time. From this relationship it is possible to derive the expectation of the time to coalescence for two sequences sampled from locations corresponding to x_{0} and y_{0}:
It is possible using this approach to derive a number of other analytic results, including the full probability distribution for the time to coalescence (Equations A25 and A33). Each of the moments of the distribution has a simple form analogous to that for the expectation (Equations A26, A27, A28, A29, A30, A31 and A34). It is also possible to write down the exact distribution of the locations of two lineages that have not yet coalesced (A40), as well as the distribution of locations of the coalescent events (A44, A45, A46).
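The link between the probability volume remaining in the square and the distribution of coalescence times, as described in the text, can be written generically as follows. The symbols S(t) (probability that the pair has not yet coalesced), f(t) (density of the coalescence time T), and E[T] are notation introduced here for illustration and are not the labels used in the appendix; the sketch ignores the half-generation correction discussed there and assumes that t S(t) vanishes as t grows large.

```latex
\begin{align}
S(t) &= \int_{-1}^{1}\!\int_{-1}^{1} U(x, y, t)\,dx\,dy, \\
f(t) &= -\frac{dS(t)}{dt}, \\
\mathrm{E}[T] &= \int_{0}^{\infty} t\,f(t)\,dt
              = \int_{0}^{\infty} S(t)\,dt .
\end{align}
```

The last equality follows from integration by parts with S(0) = 1, so the mean time to coalescence is the total area under the curve of the not-yet-coalesced probability.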
RESULTS
The results derived in this article have been compared to simulation data to assess the accuracy of the diffusion approximation in this context. This section contains analytic and simulation results for a number of values over a range of parameters. These results provide both reassurance regarding the accuracy of the equations and insight into the behavior of the coalescent process in a continuous, linear habitat.
Monte Carlo simulations were performed backward in time using two different migration/coalescence processes. In the first process, the locations of the two lineages were kept as floating point numbers. Migration each generation was performed by drawing a random number from a normal distribution. If the new location for the lineage lay outside the habitat range, the new location was selected by reflection at the habitat boundary. A coalescent event occurs if, after translocation and reflection, the two lineages lie within a distance 1/(2N) of each other. Times and locations of coalescent events were averaged over a large number of sample runs.
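The first simulation process can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used to produce the results: the function names, the parameter name `sigma` (the standard deviation of the per-generation normal displacement), and the choice to record the midpoint of the two lineages as the location of the coalescent event are all introduced here.

```python
import random


def reflect(z):
    """Fold a position onto the habitat [0, 1] by reflection at 0 and 1."""
    z = z % 2.0
    return 2.0 - z if z > 1.0 else z


def coalescence_time(z1, z2, n, sigma, rng, max_gen=10**7):
    """Trace two lineages backward in time from locations z1 and z2.

    Returns (t, location): the number of generations until the lineages
    lie within 1/(2N) of each other, and their midpoint at that time
    (the midpoint is an assumption of this sketch).
    """
    threshold = 1.0 / (2.0 * n)
    for t in range(1, max_gen + 1):
        # One generation of migration for each lineage, then reflection.
        z1 = reflect(rng.gauss(z1, sigma))
        z2 = reflect(rng.gauss(z2, sigma))
        if abs(z1 - z2) < threshold:
            return t, 0.5 * (z1 + z2)
    raise RuntimeError("no coalescence within max_gen generations")


def mean_coalescence_time(z1, z2, n, sigma, runs, seed=0):
    """Average the coalescence time over independent sample runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        t, _location = coalescence_time(z1, z2, n, sigma, rng)
        total += t
    return total / runs


if __name__ == "__main__":
    print(mean_coalescence_time(0.3, 0.7, n=20, sigma=0.2, runs=200, seed=1))
```

Migration occurs before the first coalescence check, so the shortest possible coalescence time in this sketch is one generation, matching the discrete-time structure of the model.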
The second process used a discrete lattice model. Each of the two lineages was assigned an integer location between 1 and N. Each generation two pseudorandom integers were drawn from independent Poisson distributions for each lineage, which were then translocated by the difference of the two Poissons. This produces a discrete distribution that approximates the shape of a normal distribution. Reflections were performed as in the first process. If, after migration and reflection, the two lineages were at the same location (had the same integer location value), a coalescent event was considered to have occurred. Again, the times and locations of the coalescent events were averaged over a large number of sample runs.
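A minimal sketch of the lattice process is given below. Several details are assumptions made for illustration rather than taken from the text: both Poisson draws for a lineage share the same mean `lam` (so the displacement is symmetric about zero with variance 2·`lam` in lattice units), reflection is taken about the half-integer points 1/2 and N + 1/2, and the Poisson sampler uses Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate by Knuth's method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def reflect_lattice(k, n):
    """Fold an integer position onto 1..n, reflecting about 1/2 and n + 1/2."""
    m = (k - 1) % (2 * n)          # the folded pattern has period 2n
    return m + 1 if m < n else 2 * n - m


def lattice_coalescence_time(i, j, n, lam, rng, max_gen=10**7):
    """Trace two lineages on the lattice until they occupy the same site.

    Returns (t, site): generations to coalescence and the shared site.
    """
    for t in range(1, max_gen + 1):
        # Each lineage moves by the difference of two Poisson draws.
        i = reflect_lattice(i + poisson(lam, rng) - poisson(lam, rng), n)
        j = reflect_lattice(j + poisson(lam, rng) - poisson(lam, rng), n)
        if i == j:
            return t, i
    raise RuntimeError("no coalescence within max_gen generations")


if __name__ == "__main__":
    print(lattice_coalescence_time(3, 15, n=20, lam=2.0, rng=random.Random(7)))
```

Averaging the returned times and sites over many runs, as in the first process, gives estimates that can be compared with the analytic mean.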
The relative value of the mean time to coalescence is determined largely by the product
Inspection of the tables reveals that the analytically derived mean time to coalescence is in good agreement with simulated data, with the results of the three methods differing typically by no more than 0.5%. Tables 1, 2, 3 and Figure 3 also immediately reveal two features of this model. First is the intuitively pleasing result that the mean time to coalescence increases with the physical distance between the two sampled sequences. The rate of increase with distance is dependent on the migration rate, with lower migration rates corresponding to higher rates of genetic divergence with distance.
A second, less intuitive, result from these data is the dependence of time to coalescence on the location of the two sampled sequences (in contrast to their separation). The pattern is most easily seen by considering pairs of adjacent sequences sampled from various locations along the habitat. A pair of adjacent sequences sampled from the center of the population range has a longer mean time to coalescence than pairs sampled closer to the ends, an effect that is more pronounced at lower migration rates. This result is anticipated by classical population genetics results in which the probability of identity by descent is higher for demes near a reflecting boundary (Maruyama 1970c; Nagylaki and Barcilon 1988) and by the work of Herbots (1994).
The model also provides results regarding the locations of the lineages and common ancestors (Equations A40, A41, A42, A43, A44, A45, A46). Figure 4 shows the probability surface for the time and location of coalescence for three different pairs of sequences (from Equation A44). Recent coalescent events are likely to be found in the region between the two sampling sites. In the more distant past, the probability distribution depends only on the migration rate and not the sampling locations. This distant-past distribution is biased toward the center of the range. Another result that can be derived from the model is the distribution of the lineage locations conditional on their still being separate at a time t in the past (Equation A40). In the distant past, this distribution is skewed toward the edges. Intuitively, if the two lineages are separated by a very deep genealogical branch, it most likely results from their having spent a lot of time at opposite ends of the range. A number of other results are also derived and presented in the appendix, including the strong-migration limit (Nagylaki 1980, 2000) and application of these results to a linear array of demes.
APPLICATION TO DATA
The expected time to coalescence can be used to estimate demographic parameters. In this section published sequence data are used to fit the model and estimate the effective population size and the genetic dispersal rate. Our purpose here is not to determine specific parameter values for a particular organism. In fact, the population considered below is likely to violate one or more assumptions of the model. Our goal is simply to illustrate the fact that patterns of genetic diversity such as the one predicted by the model may be found in nature. We also want to emphasize the fact that the finite linear model makes different predictions under neutrality than either an island or a circular model and that these differences may alter our interpretation of sequence data.
We have applied the model developed here to sequence data collected from the mitochondrial control region in the five different regional forms of sardines (Sardinops) in the Indian and Pacific oceans (Bowen and Grant 1997). Sardines are characterized by an antitropical distribution and are restricted to five temperate upwelling zones off the coasts of Japan, California, Chile, Australia, and South Africa. Temperate waters extend continuously from South Africa through Australia to Chile and from Japan to North America. The two temperate zones are separated by warmer tropical waters in the Pacific. However, Bowen and Grant (1997) point out that this tropical zone is fairly narrow in the eastern Pacific, along the west coast of Mexico, suggesting that genetic contact between the California and Chile sardine populations is or has been possible. In the western Pacific, the tropical zone is much broader, making genetic exchange between Japan and Australia unlikely.
On the basis of these observations, it may be reasonable to apply the finite, linear model to this system, treating the populations as linearly arrayed from Japan to California to Chile to Australia to South Africa, with genetic contact possible only between adjacent populations. The total length of this range is ∼25,000 miles, with the five sampling sites occurring at ∼5000-mile intervals. These five sites yield 15 pairwise comparisons, which were fit to the model by finding the parameter values that minimize the sum of the squares of the differences between the predicted and observed values. The statistical properties of these estimators are not investigated here. However, in this case, the model does appear to fit the data reasonably well, reproducing the same pattern of genetic diversity, and it provides a framework that highlights certain features of the data.
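The fitting step described above, minimizing the sum of squared differences over 15 pairwise comparisons, can be sketched with a simple grid search. Everything specific in this sketch is hypothetical: `toy_predict` is a placeholder linear function standing in for the model's predicted values (it is not the formula derived in this article), the "observed" values are synthetic numbers generated from that placeholder rather than the sardine data, and a grid search is only one of several possible minimization strategies.

```python
from itertools import product


def sum_squared_error(observed, predicted):
    """Sum of squared differences over all pairwise comparisons."""
    return sum((observed[pair] - predicted[pair]) ** 2 for pair in observed)


def grid_search_fit(observed, predict, grids):
    """Return the parameter tuple on the grid that minimizes the squared error.

    observed: dict mapping (site_i, site_j) to an observed value
    predict:  function (pair, *params) -> predicted value
    grids:    one list of candidate values per parameter
    """
    best_params, best_sse = None, float("inf")
    for params in product(*grids):
        predicted = {pair: predict(pair, *params) for pair in observed}
        sse = sum_squared_error(observed, predicted)
        if sse < best_sse:
            best_params, best_sse = params, sse
    return best_params, best_sse


def toy_predict(pair, a, b):
    """Hypothetical placeholder: value grows linearly with site separation."""
    i, j = pair
    return a + b * abs(i - j)


# Five sites give 15 comparisons: 10 between sites plus 5 within sites.
pairs = [(i, j) for i in range(5) for j in range(i, 5)]
synthetic = {pair: toy_predict(pair, 2.0, 0.5) for pair in pairs}  # synthetic data
best, sse = grid_search_fit(synthetic, toy_predict,
                            [[1.0, 2.0, 3.0], [0.25, 0.5, 0.75]])
print(best, sse)
```

In an actual application, `predict` would evaluate the expected number of pairwise differences under the model for each pair of sampling locations, and the grids would span the demographic parameters being estimated.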
Figure 5 shows the observed and expected average number of pairwise nucleotide differences (proportional to the expected coalescence time) in the data set for parameter values minimizing the sum of the squares of the errors. The observed data manifest the two key features of the model: increasing genetic differentiation with distance and greater genetic diversity near the center of the habitat. The product
Inferences can also be drawn from differences in observed and expected values. In the fourth column of Figure 5 (labeled “Calif.”), the observed interpopulation values are all lower than the corresponding expected values. In the fifth column (“Japan”), on the other hand, the expected values are lower than the observed. This pattern suggests that the California and Chile populations are more closely connected genetically than might be predicted from distance alone, whereas the Japan and California populations are genetically more distant than expected. This observation supports Bowen and Grant's (1997) argument that the tropical barrier in the eastern Pacific is, or has recently been, traversable. In fact, the data suggest that the tropical water in the eastern Pacific may represent less of a genetic barrier than the equally large band of temperate water between North America and Japan.
This analysis is presented not to address specific issues in sardine biogeography or question the conclusions of Bowen and Grant, who attribute the observed pattern of genetic diversity to a range expansion of the sardine populations. Our purpose has been simply to show that patterns predicted by the model can, in fact, be found in natural populations and to illustrate how the model can be employed to estimate interesting demographic parameters. Furthermore, the analysis suggests how observed patterns of genetic diversity can be made more meaningful when compared to a more sophisticated null model.
DISCUSSION
We have presented a model for analyzing genetic diversity in a finite, continuous, linear population. Using a diffusion approximation, the model fully characterizes the distribution of possible genealogical histories of a pair of sequences sampled from such a population. Results derived from the model include the full distribution of coalescence times and locations, as well as a number of summary statistics, such as the mean time to the most recent common ancestor.
The analytic results derived from the model are in good agreement with simulations over a wide range of parameter values. This agreement extends even to extremely small neighborhood sizes (approaching one), allowing us to relax the usual coalescent theory assumption of a large local population size. In the other extreme, the strong-migration limit where the neighborhood size approaches the population size, the model converges on well-established results for the coalescent process in a panmictic population.
The model makes several predictions regarding genealogies in a finite continuous habitat. In addition to the intuitive result that genetic divergence increases with distance, the model predicts that genetic diversity will be greater near the center of the habitat than at the edges. Coalescent events in the recent past are most likely to occur between the sampling locations of the two sequences. In the distant past, the distribution of locations of coalescent events becomes independent of sampling location and is concentrated toward the center of the habitat. The locations of lineages (conditional on not having coalesced), on the other hand, are biased toward the edges of the habitat in the distant past. All of these effects are more pronounced under lower migration.
In this development of the model, we have assumed reflecting habitat boundaries, meaning that individuals suffer no loss of fecundity when they are situated at the edge of the habitat. It may be more reasonable to assume absorbing habitat boundaries. In the forward-time model, this would mean that gametes that dispersed outside the habitat range would be lost. The effect in the backward-time model would be to bias the distribution of lineage locations slightly toward the center of the habitat, decreasing slightly the time to common ancestry, but preserving the broad patterns described for the reflecting-boundaries case.
One property of the model not discussed above is the dependence of the coalescent process on neighborhood size. For a given pair of locations, the ratio of the expected time to coalescence to the total population size is determined primarily by the product
It is also possible using this model to derive other values for a particular set of parameters. Slatkin (1991) derived F_{ST} in relation to mean time to coalescence for pairs of genes as
The distribution of coalescence times given by Equation A25 can also be combined with a particular mutational model. Integration of this probability against the mutational process will yield the probability of identity in state or the likelihood of a particular set of differences between the two sequences. Results such as these may be valuable in the analysis of sequence data.
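As a sketch of the integration described above, suppose mutations arise on each lineage as a Poisson process at rate μ per sequence per generation, and write f(t) for the probability density of the coalescence time T. Both μ and f(t) are notation introduced here for illustration; the infinite-alleles and infinite-sites models are assumed as two standard choices of mutational model, and time is treated as continuous.

```latex
\begin{align}
\Pr(\text{identity in state})
  &= \int_{0}^{\infty} f(t)\, e^{-2\mu t}\, dt
  && \text{(infinite alleles)}, \\
\Pr(k \text{ differences})
  &= \int_{0}^{\infty} f(t)\, \frac{(2\mu t)^{k}}{k!}\, e^{-2\mu t}\, dt
  && \text{(infinite sites)}, \\
\mathrm{E}[\text{number of differences}]
  &= 2\mu\, \mathrm{E}[T] .
\end{align}
```

The factor of 2 arises because mutations accumulate along both lineages back to the common ancestor, and the last line is the sense in which the average number of pairwise differences is proportional to the expected coalescence time.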
Mathematica files for generating the results described in this article are available from the authors, as is a C program for estimating parameter values from sequence data.
Acknowledgments
We thank N. H. Barton, J. L. Cherry, T. Nagylaki, A. Platt, J. Wall, and three anonymous reviewers for helpful discussions and comments on the manuscript. This work was supported by a grant from the Howard Hughes Medical Institute to J.F.W. and in part by National Science Foundation grant no. DEB-9815367 to J.W.
APPENDIX
Derivation of formulas: Let the initial positions of the two sequences be denoted by z^{0}_{1} and z^{0}_{2}, where z^{0}_{1} and z^{0}_{2} represent the relative locations along the length of the entire habitat, and therefore both lie between 0 and 1. Let x_{0} and y_{0} be the following transformations of these coordinates:
The two short sides of the triangular state space represent reflecting boundaries, which may be eliminated by the method of reflecting the state space across the boundary. Three such reflections generate a square state space ranging from –1 to 1 in each direction. Note that crossing over the diagonals of this square involves a transposition of x and y. However, since the diffusion process is isotropic, this transposition does not affect the analysis.
Because the x and y diffusion processes are now separable, further discussion focuses on a one-dimensional diffusion process. The two-dimensional process can be reconstructed by multiplication of two such one-dimensional processes. The derivation uses only x. The equations for the diffusion process in y are identical.
The long side of the triangular state space, which is now replicated four times as the boundary of the new square state space, represents a partially reflecting, partially absorbing boundary. The diffusion process has now been reduced to a Sturm-Liouville-type problem, where the boundary conditions are set by a relationship between the flux rate across the boundary and the density function within the boundary.
The function U_{x}(x, t) must satisfy the diffusion equation within the range (–1, 1),
The general solution to the diffusion equation is
The equations for the α_{i} and
The form of the solution in equation A17 works well so long as the probability of coalescence in the first generation in the past is very small, that is, when the two sequences are separated by a sufficient distance (when x_{0} is not close to 1) or when the neighborhood size is large. If x_{0} is close to 1 and the neighborhood size is not large, we must use a more complex form for the initial conditions. This results from the fact that the model assumes discrete time steps, and so the shortest possible coalescence time is one generation. The diffusion approximation solution, on the other hand, assumes continuous time, and coalescence is possible at any time t > 0. When the probability of coalescence occurring within the first time step is very small, this correction is negligible. However, under certain conditions, we must account for the fact that one generation of migration occurs prior to the first opportunity for coalescence. This migration effectively moves the initial condition probability peak away from the boundary, resulting in a longer predicted time to coalescence.
The more accurate form of the solution, valid over all values of x_{0}, takes as its initial conditions a normal distribution of variance
Assuming this distribution for the initial conditions is equivalent to permitting one-half generation of migration to occur prior to initiation of the coalescent process. In this way, coalescent events that occur over the first generation take place in the range of times ^{1}/_{2} < t < ^{3}/_{2}. Note that t = 1, the first time when coalescence can occur in the discrete-time model, lies at the center of this range. This “premigration,” which is necessary to compensate for approximations made in the translation from discrete to continuous time, gives rise to the “t – 1/2” terms in equation A20 and in subsequent derivations. Once these factors have been taken into account, a full description of the state of the system at time t is given by U(x, y, t) = U_{x}(x, t) U_{y}(y, t), which can be manipulated to yield a number of other results.
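Among these results are the moments of the coalescence time, which follow from the probability that the two lineages are still separate at time t. The sketch below illustrates the general relationship E[T^p] = p ∫ t^{p–1} S(t) dt, where S(t) is that probability. It assumes a hypothetical S(t) written as a weighted sum of decaying exponentials, the form that separation-of-variables solutions of a diffusion equation take; the weights and rates are invented for illustration and are not the coefficients derived in this appendix.

```python
import math


def survival(t, weights, rates):
    """Hypothetical probability that the lineages are still separate at t."""
    return sum(w * math.exp(-r * t) for w, r in zip(weights, rates))


def moment_numeric(p, weights, rates, t_max=60.0, n=100_000):
    """p-th moment via p * integral of t**(p-1) * S(t), trapezoid rule."""
    h = t_max / n

    def g(t):
        return p * t ** (p - 1) * survival(t, weights, rates)

    total = 0.5 * (g(0.0) + g(t_max))
    for k in range(1, n):
        total += g(k * h)
    return total * h


def moment_exact(p, weights, rates):
    """Closed form for the exponential-sum case: p! * sum(w / r**p)."""
    return math.factorial(p) * sum(w / r ** p for w, r in zip(weights, rates))


if __name__ == "__main__":
    weights, rates = (0.7, 0.3), (0.5, 2.0)  # illustrative values only
    for p in (1, 2):
        print(p, moment_numeric(p, weights, rates), moment_exact(p, weights, rates))
```

Because each exponential term integrates in closed form, a series solution of this kind gives every moment as a sum over modes, with the slowest-decaying mode dominating the higher moments.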
Cumulative distribution function of the coalescence time: First we derive the probability that the two lineages have coalesced prior to time t. Recall that, in this formulation, coalescence is equivalent to diffusion outside of the square state space. The formulas derived above have been normalized so that, at t = 0, the volume under the probability surface is equal to one. At a time t, the probability that the two lineages are still separate is simply given by the volume under the probability surface within the square:
Probability density function of the coalescence time: The probability that the two lineages coalesce at time t is given by the time derivative of the cumulative distribution function,
Moments of the distribution: The expectation and variance for the distribution of coalescence times, as well as all higher moments, can be derived from the expression for C(t). The pth moment of the distribution is given by
Simplified form: If the two samples are taken from locations separated by more than the single-generation dispersal range, or if the neighborhood size is large, we can neglect the premigration correction introduced above and use a simplified form of the solution where the initial state of the system is the δ function δ(x – x_{0}, y – y_{0}). The cumulative distribution function, probability density function, and pth moment of the distribution of the time to coalescence are given by
High migration limit: Considering the strong-migration limit, where
Locations of the lineages: The probability distribution for the locations of two lineages that have not yet coalesced is fully described by U(x, y, t). The transformed (triangular) state space is represented four times within the square space over which we have been considering the values of U. Thus, each combination of lineage locations (z_{1}, z_{2}) corresponds to four pairs of (x, y) coordinates. The likelihood at time t that the two lineages are at positions z_{1} and z_{2} is given by
Locations of the coalescence events: The instantaneous rate of coalescence at a particular location is equivalent to the flux across the boundary at the point corresponding to that location. The probability of a coalescent event occurring at a particular location in the habitat, z_{0}, corresponds to flux at four locations in the transformed state space, one on each of the four sides, and the differential width in the habitat, δz, is twice the corresponding differential width (δx or δy) in the transformed space. The coalescence rate within the range z_{0} to z_{0} +δz is
Application to the discrete-demes stepping-stone model: This solution can be applied approximately to a finite-length stepping-stone model of population structure by treating the D demes as part of a continuous population. Assume a total population size of N (deme population size of N/D) and migration rate m (a fraction m/2 of a deme's population arrives from each of its two neighbors each generation). The distance between adjacent demes (from center to center), scaling the total population from 0 to 1 as in the continuous case, is 1/D, and the migration variance is m/D^{2}, so
The functions f_{i}(x_{0}) and
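The migration variance quoted for the stepping-stone model can be checked directly from the single-generation displacement of a lineage. The sketch below assumes an interior deme (edge effects are ignored) and uses exact rational arithmetic; the function name and the numerical values of m and D are illustrative assumptions.

```python
from fractions import Fraction


def step_variance(m, D):
    """Variance of one generation's displacement of a lineage.

    Tracing a lineage back one generation, it moves one deme spacing
    (1/D) to either side with probability m/2 each and otherwise stays
    in place. Assumes an interior deme.
    """
    d = Fraction(1, D)
    outcomes = [(-d, m / 2), (Fraction(0), 1 - m), (d, m / 2)]
    mean = sum(x * p for x, p in outcomes)
    return sum((x - mean) ** 2 * p for x, p in outcomes)


if __name__ == "__main__":
    m, D = Fraction(1, 10), 20  # illustrative values only
    print(step_variance(m, D), m / D ** 2)
```

The two printed values agree, consistent with a migration variance of m/D^{2} on the 0-to-1 habitat scale.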
Footnotes

Communicating editor: M. W. Feldman
 Received June 10, 2001.
 Accepted March 4, 2002.
 Copyright © 2002 by the Genetics Society of America