Abstract
Recently Kruglyak, Durrett, Schug, and Aquadro showed that microsatellite equilibrium distributions can result from a balance between polymerase slippage and point mutations. Here, we introduce an elaboration of their model that keeps track of all parts of a perfect repeat and a simplification that ignores point mutations. We develop a detailed mathematical theory for these models that exhibits properties of microsatellite distributions, such as positive skewness of allele lengths, that are consistent with data but are inconsistent with the predictions of the stepwise mutation model. We use our theoretical results to analyze the successes and failures of the genetic distances (δμ)^{2} and D_{SW} when used to date four divergences: African vs. non-African human populations, humans vs. chimpanzees, Drosophila melanogaster vs. D. simulans, and sheep vs. cattle. The influence of point mutations explains some of the problems with the last two examples, as does the fact that these genetic distances have large stochastic variance. However, we find that these two features are not enough to explain the problems of dating the human-chimpanzee split. One possible explanation of this phenomenon is that long microsatellites have a mutational bias that favors contractions over expansions.
MICROSATELLITES are simple sequence repeats in DNA that typically have a high level of variability due to a high rate of mutations that alter their length. For this reason they have been useful for studying population structure on the time scale of thousands of generations (see Bowcock et al. 1994; Roy et al. 1994; Goldstein et al. 1995b; Underhill et al. 1996; Goldstein and Pollock 1997; Harr et al. 1998; Irwin et al. 1998; Reich and Goldstein 1998; Goldstein et al. 1999; Pritchard et al. 1999; Ruiz-Linares et al. 1999). To make inferences from observed patterns, one needs a statistic to measure differentiation between populations and a model to give the distribution of that statistic. Here, we consider two genetic distances: (δμ)^{2} of Goldstein et al. (1995a,b) and D_{SW} of Shriver et al. (1995).
We examine the behavior of two genetic distances (δμ)^{2} and D_{SW} in four increasingly divergent examples: (i) African vs. non-African human populations, (ii) human vs. chimpanzee, (iii) Drosophila melanogaster vs. D. simulans, and (iv) cattle vs. sheep. If one assumes the stepwise mutation model (SMM) of Ohta and Kimura (1973), then the expected value of (δμ)^{2} grows linearly in time. When used on example (i), the statistic (δμ)^{2} gives good estimates (see Goldstein et al. 1995b), but when applied to examples (ii) and (iii), it gives answers that are roughly one-seventh and one-thirtieth of the commonly accepted values. The nonlinear distance D_{SW} does not do as well as (δμ)^{2} at dating the human population split but has a slightly better performance for examples (ii) and (iii), yielding estimates that are about one-third and one-eighth of the commonly accepted values.
Finally, in example (iv), the two species are too far diverged for microsatellites to be useful molecular clocks. Results of Ellegren et al. (1997) show that roughly one-half of the microsatellites they isolated in one species were monomorphic in the other and have presumably lost their ability to mutate. This observation suggests that in the long run point mutations break up perfect repeats and reduce the mutation rates of microsatellite loci. It is natural to ask if this mechanism can explain the underestimates that arise in examples (ii) and (iii). To investigate this possibility, we introduced two new models. The first is a slight generalization of the model of Kruglyak et al. (1998), which we call the proportional slippage/point mutation (PS/PM) model. In this model point mutations spoil perfect repeats; the slippage rate is zero for microsatellites with fewer than κ repeat units and then increases linearly. The PS/PM model can be used to estimate slippage rates from DNA sequence data, but to address the divergence question we need a second model, called the PCR model, that keeps track of the lengths of all perfect repeats that make up an imperfect repeat.
The PCR model is complicated, but it is possible to obtain a simple formula for the variance of a repeat L_{t} as a function of time t in generations (see Theorem 2). Using a = 2 × 10^{−8} as an estimate for the point mutation rate per repeat unit and a threshold of four repeat units for slippage events to be possible, this formula shows that the variance of the repeat length begins to depart from linearity when t/(10,000,000) is not small relative to one. This result explains some of the problems with the use of (δμ)^{2} in the comparison of D. melanogaster and D. simulans, which diverged ~25,000,000 generations ago, but makes the failure of (δμ)^{2} in the human vs. chimpanzee split even more mysterious, since, as our calculations have shown, point mutations will not have had a significant effect in 250,000 generations.
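The size of the point mutation effect can be gauged with a short calculation. The sketch below evaluates the product aκt, which the later discussion of Theorem 2 treats as the quantity that must be small for point mutations to be ignored. The divergence times are the approximate generation counts quoted in the text; the function and dictionary names are illustrative only.

```python
# Point mutation rate per repeat unit and slippage threshold quoted in the text.
a = 2e-8
kappa = 4

# Approximate divergence times in generations, as quoted in the text.
divergence_generations = {
    "African vs. non-African": 6_000,
    "human vs. chimpanzee": 250_000,
    "cattle vs. sheep": 8_000_000,
    "D. melanogaster vs. D. simulans": 25_000_000,
}


def point_mutation_load(t, a=a, kappa=kappa):
    """Return a * kappa * t; point mutations are negligible when this is small."""
    return a * kappa * t


loads = {name: point_mutation_load(t) for name, t in divergence_generations.items()}
for name, load in loads.items():
    print(f"{name}: a*kappa*t = {load:.4g}")
```

Under these assumed inputs the product is of order 0.01 for the human vs. chimpanzee split and of order one for the Drosophila comparison, which matches the qualitative conclusion stated above.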
To further investigate the problems in dating the human vs. chimpanzee split, we investigated the behavior of the PS/PM model when there are no point mutations. This special case, called the PS/0M model, and denoted
Calculations for the fourth moment show that if β is the per locus slippage rate, and α is the initial activity of a microsatellite, i.e., the length minus the threshold κ for slippage to occur, then the kurtosis becomes large when βt/α^{2} is large relative to one. In general fourth moments of the microsatellite lengths are larger under the PS/0M model than under the SMM. Consequently, microsatellite statistics that use these moments, such as those of Reich and Goldstein (1998) and Gonser et al. (2000), will have a much different distribution under PS/0M than under SMM. In the case of our four examples the kurtoses are (i) 3.02, (ii) 3.93, (iii) 6.75, and (iv) 10.7, compared to 3 for the normal distribution. In the case of the human-chimpanzee split, (ii), this implies that confidence intervals are 1.21 times as large as they would be under the SMM. However, this again does not explain the magnitude of the failures of (δμ)^{2} and D_{SW} in dating the human-chimpanzee split. The last observation and the fact that the simulated microsatellite distributions given in Figures 1 and 3 have many more large microsatellites than are typically observed lead us to conclude that there are forces that constrain the growth not yet incorporated into our models. We return to this point in the discussion.
GENETIC DISTANCES
Our first step is to define the two genetic distances (δμ)^{2} and D_{SW} and to compute their values for the four examples. We then introduce our two new models, state the theoretical results we have obtained, and use them to study the four examples. To define (δμ)^{2}, let μ_{A} and μ_{B} be the mean length of alleles at a microsatellite locus in populations A and B, and define genetic distance between the two populations [see (1) of Goldstein et al. 1995b] as
To motivate the definition of our second distance, we recall (see, e.g., p. 6723 of Goldstein et al. 1995b) that, if X and X′ are the lengths of the microsatellite locus in a sample of size two from population A, and Y and Y′ are a similar random sample of size two from population B, then
Theorem 1: If 2βτ is large and τ ⩾ N_{e} then
FOUR EXAMPLES
To test the behavior of the statistics (δμ)^{2} and D_{SW} we consider four increasingly divergent examples.
Divergence of human populations: Goldstein et al. (1995b) investigated 30 microsatellite loci and estimated that the value of (δμ)^{2} between African and non-African populations was 6.47. Using this in (1) with their mutation rate estimate of 5.6 × 10^{−4} gives a prediction of 5776 generations for the divergence time. Assuming a human generation time of 27 years, they then arrived at the estimate of 156,000 years, a figure that they argued was in agreement with previous genetic estimates and with archaeological data.
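The arithmetic behind this estimate is a one-line inversion. The sketch below assumes that (1) has the form E(δμ)^{2} = 2βτ, which is the form used later in the text when the simulated average is compared with 2βτ. The function name is illustrative.

```python
def divergence_generations(delta_mu_sq, beta):
    """Invert E[(delta mu)^2] = 2 * beta * tau for tau, in generations."""
    return delta_mu_sq / (2.0 * beta)


delta_mu_sq = 6.47      # African vs. non-African estimate quoted in the text
beta = 5.6e-4           # mutation rate per locus per generation
generation_years = 27   # human generation time assumed in the text

tau = divergence_generations(delta_mu_sq, beta)
years = tau * generation_years
print(f"{tau:.1f} generations, about {years:,.0f} years")
```

Because the estimate is linear in (δμ)^{2} and inversely proportional to β, any uncertainty in the mutation rate carries over directly to the date.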
Rubinsztein et al. (1995) studied 24 microsatellite loci in East Anglians and Sub-Saharan Africans and obtained an estimate of 1.45 for D_{SW}. Assuming β = 5.6 × 10^{−4} and taking N_{e} = 5000 as the size of one of the two subpopulations, we can use (2) to give a prediction of 9880 generations for the divergence time. Multiplying by 27 years leads to an estimate of 267,000 years, which is much larger than the estimate of Goldstein et al. (1995b). One possible explanation is that we have chosen the wrong effective population size for our estimate. If instead we use N_{e} = 750 then an estimate of 5630 generations results, which is similar to the value estimated by Goldstein et al. (1995b).
Humans vs. chimpanzees: Rubinsztein et al. (1995) also studied 24 microsatellite loci in chimpanzees. Combining this with their human data, they obtained an estimate of 5.475 for D_{SW} for the human-chimpanzee comparison. They commented that the ratio of this estimate to the East Anglian vs. African comparison, 5.475/1.45 = 3.78, was surprising since the ratio of the divergence times for the two splits is at least 50. The nonlinearity of D_{SW} shown in Theorem 1 helps explain this discrepancy. If we use the slippage rate of β = 5.6 × 10^{−4} from the previous example for both humans and chimpanzees and assume an effective population size of N_{e} = 10^{4} for each population, then using Theorem 1 we arrive at an estimate of τ = 88,200 generations for their divergence time. If we use an average generation time of 20 years for humans and chimpanzees this translates into 1.76 million years, about one-third the accepted estimate of 5–6 million years (see, e.g., Goodman et al. 1998 or Kumar and Hedges 1998).
Since Rubinsztein et al. (1995) report only the genetic distances D_{SW} for their loci, we need to turn to other sources for data we can use to calculate (δμ)^{2}. Bowcock et al. (1994), Deka et al. (1994), and Garza et al. (1995) studied 10, 7, and 8 microsatellite loci, respectively, in these two species. The data are given in Table 1. From this, we can compute (δμ)^{2} values of 7.56, 86.19, and 40.19, respectively. Even though the second estimate is >11 times the first, we can use all 25 loci in Table 1 together to get (δμ)^{2} ≈ 40. Using (1) now with the slippage rate estimate β = 5.6 × 10^{−4} gives 35,700 generations, or ~700,000 years, which is less than one-seventh the accepted age.
Assuming the SMM and that the above parameters remain constant, coalescent simulations show that the (δμ)^{2} and D_{SW} estimates are significantly smaller than those expected under the SMM. Specifically, for two samples of 20 individuals with 25 unlinked microsatellites in two separate randommating populations of size 10^{4}, which were separated until 275,000 generations ago and with mutations following the SMM with β = 5.6 × 10^{−4}, we expect a 95% confidence interval for (δμ)^{2} of 179–465, whereas the data were only 40, and a 95% confidence interval for D_{SW} of 7.97–14.6, whereas the data were 5.475.
Drosophila species: The divergence of D. melanogaster and D. simulans is estimated to have occurred ~2.5 million years ago (see Hey and Kliman 1993). Wetterstrand (1997) used eight di-, four tri-, and four tetranucleotide repeats and estimated (δμ)^{2} = 19.393 between these species. Using the mutation estimate of 6.3 × 10^{−6} from Schug et al. (1997), she then used (1) to estimate that the divergence occurred 1.52 million generations ago. Assuming 10 generations per year, she computed a divergence time of 152,000 years, which is about one-sixteenth of the estimate of Hey and Kliman (1993).
One of the problems with this estimation is that tri- and tetranucleotide repeats have considerably smaller slippage rates than dinucleotide repeats in Drosophila (see Schug et al. 1998). With this in mind, we applied Wetterstrand's analysis to data on 31 dinucleotide repeat loci from Hutter et al. (1998) given in Table 2. The average value of (δμ)^{2} for these loci is 16.09. Using the estimate β = 9.3 × 10^{−6} from Schug et al. (1998) in (1) we estimate the divergence time to be ~865,000 generations. Using the previous estimate of 10 generations per year, this translates into 86,500 years, which is about one-thirtieth of the estimate of Hey and Kliman (1993).
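Written out, the dinucleotide calculation runs as follows. This is a worked restatement of the numbers in the preceding paragraph, assuming that (1) reads E(δμ)^{2} = 2βτ; the hat on τ is our notation for the resulting estimate.

```latex
\[
\hat{\tau} \;=\; \frac{(\delta\mu)^{2}}{2\beta}
\;=\; \frac{16.09}{2 \times 9.3 \times 10^{-6}}
\;\approx\; 8.65 \times 10^{5}\ \text{generations},
\]
\[
\frac{8.65 \times 10^{5}\ \text{generations}}{10\ \text{generations/year}}
\;\approx\; 8.65 \times 10^{4}\ \text{years},
\qquad
\frac{2.5 \times 10^{6}\ \text{years}}{8.65 \times 10^{4}\ \text{years}} \;\approx\; 29 .
\]
```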
Independently, Harr et al. (1998) also used (δμ)^{2} to estimate the divergence times in the phylogeny of D. melanogaster, D. simulans, D. sechellia, and D. mauritiana. From the possible choices of the mutation rate β they list, we choose 10^{−5}, which is the closest to that of Schug et al. (1998). In this case their estimates differ from those of Hey and Kliman (1993) by factors of 10–30.
Our second statistic D_{SW} does much better on the data set of Hutter et al. (1998). The estimate of D_{SW} from their data is 3.64, so assuming an effective population size of N = 10^{6} and using (2) with β = 9.3 × 10^{−6}, we obtain an estimate of τ = 3,330,000 generations. With 10 generations a year this becomes 330,000 years, which is about one-eighth of the estimate of Hey and Kliman (1993).
Again coalescent simulations with the above parameters show that these estimates are significantly smaller than those expected under the SMM. Assuming the two populations are separated until 25 million generations ago we expect a 95% confidence interval of 315–728 for (δμ)^{2} while the data are <20 and a 95% confidence interval of 10.5–18.0 for D_{SW} while the data are 3.64.
Cattle vs. sheep: These two species diverged ~16 million years ago, which, assuming a generation of 2 years, translates into 8 million generations. Ellegren et al. (1997) examined 13 loci of bovine origin and 14 of ovine origin. Discarding 3 loci of bovine origin for which there was not reliable information about their length in sheep, the data are given in Table 3.
Two of these loci studied by Ellegren et al. (1997) show clear signs of mutations other than microsatellite slippage events. At RM103 allele sizes are 115–151 bp in cattle vs. 73 bp in sheep, but the example sequence given for the repeat in cattle is (CA)_{16}. Thus at least part of the average 61.6 bp difference must be due to a major deletion in the sequence flanking the microsatellite in sheep or to an insertion in cattle. At RME11 we have the surprising result that this locus is much longer in sheep than in cattle but that this longer and hence presumably more mutable microsatellite is monomorphic in sheep. Note also that this is the only locus of bovine origin with a large negative δμ. This suggests that again much of this difference in length is due to mutations involving the flanking sequence.
If we remove these two loci, which have an average (δμ)^{2} of 653, the remaining 22 loci have an average (δμ)^{2} of 74.4 per locus. If we use an average generation time of 2 years for cattle and sheep, then using (1) we can estimate that the average slippage rate must be β = 4.65 × 10^{−6}. We could find no information about slippage rates in cattle or sheep, but this is about one-thirteenth the rate of 6 × 10^{−5} that Ellegren (1995) observed for microsatellites in pigs.
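Here the relation is run in the opposite direction: the divergence time is taken as known and the slippage rate is solved for. The sketch below assumes that (1) reads E(δμ)^{2} = 2βτ and uses the 16 million years and 2-year generation time given above; the function name is illustrative.

```python
def slippage_rate(delta_mu_sq, tau_generations):
    """Solve E[(delta mu)^2] = 2 * beta * tau for beta."""
    return delta_mu_sq / (2.0 * tau_generations)


divergence_years = 16e6     # cattle vs. sheep divergence quoted in the text
generation_years = 2        # assumed generation time
tau = divergence_years / generation_years   # 8 million generations

beta = slippage_rate(74.4, tau)
pig_rate = 6e-5             # rate observed in pigs by Ellegren (1995)
print(f"beta = {beta:.3g}; pig rate / beta = {pig_rate / beta:.1f}")
```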
TWO MODELS WITH POINT MUTATIONS
In all but the first example of the African vs. non-African split in the human population, if we use the SMM with either of our statistics (δμ)^{2} or D_{SW}, then we underestimate divergence times. In view of this, it is natural to ask if there is some mechanism that interferes with the normal rate of growth of these divergence statistics. One possibility is that point mutations spoiling perfect repeats reduce microsatellite mutation rates over time. To investigate this we introduce a new model called the PS/PM model that is a modest generalization of the one proposed by Kruglyak et al. (1998).
PS/PM model: There are three types of changes that can occur:
Proportional slippage: A microsatellite of length ℓ > κ becomes length ℓ ± 1 at rate b(ℓ − κ) each. Microsatellites of length ℓ ⩽ κ do not experience slippage events.
Point mutations: For 1 ⩽ j < ℓ, a microsatellite of length ℓ becomes length j at rate a.
Birth of microsatellites: κ → κ+ 1 at rate c.
For later purposes, it is convenient to write the new proportional slippage rule succinctly as b(ℓ − κ)^{+}, where x^{+} = max{x, 0} denotes the positive part of x.
When κ = 1 the PS/PM model reduces to the original model of Kruglyak et al. (1998). The motivation for the change from κ = 1 to a general κ comes from several studies. Goldstein and Clark (1995) studied 17 microsatellite loci in Drosophila, plotted variance of repeat count vs. maximum repeat count, and found (see p. 3884) a straight line that hit zero at seven repeat units. Brinkman et al. (1998) studied 10,844 parent/child allelic transfers at nine short tandem repeat loci, finding 23 mutations. There were no mutations at loci with fewer than nine repeats and an approximate linear growth of mutations after that point (see Figure 3 on p. 1412). Finally, Rose and Falush (1998) studied dinucleotide repeats in the yeast genome and compared their frequency with what would be expected on the basis of random chance. The ratio was close to 1 for one to four repeat units and then the logarithm of the ratio increased linearly (see the middle figure on their p. 614).
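The three transition rules of the PS/PM model translate directly into a small event-driven simulation. The sketch below is one literal reading of those rules, not the authors' code: the function names are ours, and the values of b and c in the usage lines are placeholders chosen only for illustration (a = 2 × 10^{−8} is the point mutation rate used later in the text). Repeats shorter than κ receive no birth events here, because the rules as stated give a birth rate only at length κ.

```python
import random


def pspm_rates(length, kappa, a, b, c):
    """Total rates of the PS/PM event types for a repeat of `length` units."""
    activity = max(length - kappa, 0)        # (l - kappa)^+
    slip_up = b * activity                   # l -> l + 1
    slip_down = b * activity                 # l -> l - 1
    point = a * max(length - 1, 0)           # l -> j at rate a for each 1 <= j < l
    birth = c if length == kappa else 0.0    # kappa -> kappa + 1
    return slip_up, slip_down, point, birth


def pspm_step(length, kappa, a, b, c, rng):
    """One Gillespie step: return (waiting_time, new_length)."""
    up, down, point, birth = pspm_rates(length, kappa, a, b, c)
    total = up + down + point + birth
    if total == 0.0:
        return float("inf"), length          # absorbing under these rules
    wait = rng.expovariate(total)
    u = rng.random() * total
    if u < up:
        return wait, length + 1
    if u < up + down:
        return wait, length - 1
    if u < up + down + point:
        return wait, rng.randint(1, length - 1)   # uniform over 1..l-1
    return wait, length + 1                       # birth event


# Usage with placeholder rates b and c (assumptions, not estimates).
rng = random.Random(0)
length, elapsed = 10, 0.0
for _ in range(5):
    wait, length = pspm_step(length, kappa=4, a=2e-8, b=1e-5, c=1e-6, rng=rng)
    elapsed += wait
```

Setting kappa=1 in this sketch gives the special case that the text identifies with the original model of Kruglyak et al. (1998).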
In formulating the PS/PM model introduced above, our thought experiment consists of picking two nucleotides at random and seeing how many times they are repeated as we scan to the right, so we only need to keep track of the left one-half of a newly imperfect repeat that has been hit by a mutation. This viewpoint, along with appropriate bookkeeping, can be used to fit the model to data and estimate mutation rates (see Kruglyak et al. 1998). However, if we are going to look at microsatellites through the eyes of an experimentalist who only tracks the length of PCR-amplified fragments of DNA, we need to define a new process that keeps track of the lengths of all the perfect repeats in an interrupted repeat as a vector
In words, in our PCR fragment size model, each of the lengths of the perfect repeat units
PCR model: If the state at time t is
Let
Theorem 2: If the initial activity of the microsatellite is A_{0} then at any time t ⩾ 0
To understand the implications of Theorem 2 we return to our four examples. Thinking of dinucleotide repeats, we assume a point mutation rate of a = 2 × 10^{−8} per repeat unit (see Drake et al. 1998). Based on the work of Rose and Falush (1998) we choose
A MODEL WITHOUT POINT MUTATIONS
Our discussion of Theorem 2 suggests that when aκt is small, as is the case for comparisons between human populations or between humans and chimpanzees, we can ignore the effects of point mutations. If we set the point mutation rate a = 0 in the PS/PM model and add a superscript 0 to remind ourselves that we have done this, then the activity
PS/0M model: If
Theorem 3: If we use E_{α} to denote the expected value for the process starting from
Substantial differences between the PS/0M model and the SMM appear when we look at third and higher moments. The SMM is symmetric so E(X_{t} − ℓ)^{3} = 0, but as (6) shows, the proportional slippage model has positive skewness. Farrall and Weeks (1998) performed an analysis of 4558 AC dinucleotide repeat loci assayed in the CEPH pedigrees and found positive skewness in the distribution of microsatellite allele lengths. Rubinsztein et al. (1994) had earlier observed this skewness and suggested that it was evidence for “a bias in favor of gains” (see p. 1096 of Rubinsztein et al. 1999). However, our results show such a skewed distribution can result from the PS/0M model that has no mutational bias.
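A rough sense of the size of this skewness can be had from the moments used in Appendix C, where the activity Z_{s} started at ℓ has second central moment 2bℓs and third central moment 6b^{2}ℓs^{2}. The sketch below turns these into a standardized skewness and evaluates it for human-chimpanzee parameters, using b = β/(2α) as in the later discussion of Theorem 4. The function names are ours, and treating a single lineage over t = 250,000 generations is an illustrative assumption.

```python
import math


def ps0m_central_moments(b, ell, t):
    """Second and third central moments of the PS/0M activity started at ell."""
    second = 2.0 * b * ell * t
    third = 6.0 * b**2 * ell * t**2
    return second, third


def ps0m_skewness(b, ell, t):
    """Standardized skewness: third central moment over variance^(3/2)."""
    second, third = ps0m_central_moments(b, ell, t)
    return third / second**1.5


beta, alpha, t = 5.6e-4, 15, 250_000
b = beta / (2 * alpha)            # per-unit slippage rate
skew = ps0m_skewness(b, alpha, t)
print(f"skewness = {skew:.3f}")
```

The ratio simplifies to 3·sqrt(bt/(2ℓ)), so the skewness is positive for every t > 0 and grows like the square root of time, with no mutational bias in the model.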
Computing the fourth moment reveals another difference between our proportional slippage model and stepwise mutation. In the SMM the difference in microsatellite length, X_{t} − Y_{t}, between two individuals with a most recent common ancestor t generations ago is the sum of independent random variables. Thus, if t is large,
If the kurtosis is large then the distribution of X_{t} − Y_{t} will have a heavy tail and estimation of quantities such as (δμ)^{2} will be difficult. To see when the kurtosis will
In the African/non-African split if we assume t = 6000 generations, use an average activity α = 15, which corresponds to an average size of 20 repeat units, and set β = 5.6 × 10^{−4} then βt/2α^{2} = 0.0075 so the kurtosis is 3.02. For the human-chimpanzee split, t = 250,000, β = 5.6 × 10^{−4}, and α = 15, so βt/2α^{2} = 0.311 and
To interpret the numerical values of the kurtosis, we observe that if a random variable V has kurtosis
The last conclusion shows that estimates of (δμ)^{2} under the proportional slippage model are not very much more variable than under the SMM. However, the fluctuations under the SMM in this case are huge. Figure 2 gives a simulation of (δμ)^{2} under the parameters of the human-chimpanzee split. We used two populations of size N_{e} = 10,000 individuals, a divergence time of 250,000 generations, and a mutation rate of 5.6 × 10^{−4} per locus per generation. It is interesting to compare the simulations where 51% of the (δμ)^{2} values are >120 with the data in Table 1 where the largest (δμ)^{2} among 25 loci is 112. Indeed, as (1) predicts, the average value of (δμ)^{2} in the simulation is 2βτ = 280.
Further, coalescent simulations of the PS/0M model show that for the human-chimpanzee and the D. melanogaster-D. simulans splits the observed (δμ)^{2} and D_{SW} statistics are not within the expected 95% confidence intervals. These observations suggest that there may be some additional mechanism(s) preventing microsatellites from getting too long.
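The kurtosis figures quoted in this section can be checked numerically. The sketch below computes βt/2α^{2} for the first two examples and the factor by which the standard deviation of V^{2} grows when a mean-zero variable V has kurtosis k rather than 3, using the identity Var(V^{2}) = (k − 1)(EV^{2})^{2}. The expression 3 + 3x in the code is only an empirical fit to the two quoted kurtosis values; it is an assumption of this sketch and is not taken from the displayed formula. Function names are illustrative.

```python
import math


def kurtosis_ratio(beta, t, alpha):
    """The dimensionless quantity beta * t / (2 * alpha^2) used in the text."""
    return beta * t / (2.0 * alpha**2)


def sd_inflation(kurtosis, reference=3.0):
    """sd(V^2) for a mean-zero V with the given kurtosis, relative to the
    value for a reference kurtosis (3 for the normal), at equal variance."""
    return math.sqrt((kurtosis - 1.0) / (reference - 1.0))


beta, alpha = 5.6e-4, 15
x_human = kurtosis_ratio(beta, 6_000, alpha)      # African vs. non-African
x_chimp = kurtosis_ratio(beta, 250_000, alpha)    # human vs. chimpanzee

# Assumption: 3 + 3x reproduces the two quoted kurtoses (3.02 and 3.93).
fitted_human = 3.0 + 3.0 * x_human
fitted_chimp = 3.0 + 3.0 * x_chimp

inflation = sd_inflation(3.93)   # quoted human-chimpanzee kurtosis
print(f"x = {x_human:.4f}, {x_chimp:.3f}; inflation = {inflation:.2f}")
```

With the quoted kurtosis of 3.93 the inflation factor is about 1.21, the figure given for the human-chimpanzee confidence intervals.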
DEATH OF MICROSATELLITES
Our final topic is to compute the probability of microsatellite death in the PS/0M model, i.e., the probability a microsatellite will reach 0 activity in t generations. Since, as noted above, the PS/0M model is equivalent to the binary branching process of probability theory, we can compute not only all of the moments of
Theorem 4: Letting P_{α} denote the probability law for the PS/0M model starting from
To apply Theorem 4 to our four examples, we begin by recalling that b = β/(2α), where β is the per locus slippage rate and α is the activity, that is, the length minus κ = 4. In the African vs. non-African human comparison, t = 6000, β = 5.6 × 10^{−4}, and α = 15 (i.e., an average length of 19 repeat units), so (9) shows that the probability of having no activity after t = 6 × 10^{3} generations is (0.11/1.11)^{15} < 10^{−15}. In the human vs. chimpanzee comparison, t = 250,000, β = 5.6 × 10^{−4}, and α = 15, so the probability of having no activity after t generations is 0.054.
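Both numbers in the preceding paragraph can be reproduced from a single expression. The sketch below assumes that (9) has the form (bt/(1 + bt))^{α}, which is the shape of the quoted value (0.11/1.11)^{15} and the standard extinction probability for a critical binary branching process started with α individuals; the function name is ours.

```python
def death_probability(beta, alpha, t):
    """Probability that a repeat with initial activity alpha has activity 0
    by generation t, assuming the form (b*t / (1 + b*t)) ** alpha."""
    b = beta / (2.0 * alpha)      # per-unit slippage rate
    bt = b * t
    return (bt / (1.0 + bt)) ** alpha


beta, alpha = 5.6e-4, 15
p_human = death_probability(beta, alpha, 6_000)      # African vs. non-African
p_chimp = death_probability(beta, alpha, 250_000)    # human vs. chimpanzee
print(f"{p_human:.2e}  {p_chimp:.3f}")
```

With bt ≈ 0.11 the first probability is of order 10^{−15}, while for the human vs. chimpanzee split bt ≈ 4.7 and the probability is about 0.054.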
Figure 3 shows the distribution of the lengths in this case as computed from (10). Note the positive skewness in the distribution as predicted by Theorem 3. Note also that our numerical solution has 17% of the microsatellites having >30 repeat units while only 1 of 205 dinucleotide microsatellites in the original 1-Mb sample of human DNA in Kruglyak et al. (1998) has this length. This again suggests that there may be some additional mechanism(s) preventing microsatellites from getting too long.
For the D. melanogaster vs. D. simulans and cattle vs. sheep comparisons, the PS/0M model overestimates the number of microsatellites with no activity. But this is to be expected since our earlier results show that point mutations have slowed down microsatellite mutation processes over this amount of time.
DISCUSSION
In summary, microsatellite mutation models that incorporate point mutations and proportional slippage events fit the data better than the SMM. However, these two features are not enough to explain, for example, the observation that the genetic distance statistics (δμ)^{2} and D_{SW} tend to underestimate divergence times and have more difficulty with more distant comparisons. This and other evidence we presented suggests that long microsatellites are more likely to become shorter rather than longer when a mutation occurs.
One possibility is that there is selection against longer alleles. This effect is clearly noticeable in microbial genomes where selection for small genome size appears to cause microsatellites to be much shorter than they would be by chance alone (see Field and Wills 1998). The low frequency of di- and tetranucleotide repeats and the enhanced frequency of trinucleotide repeats in coding sequence in yeast (see Young et al. 2000) are another sign of the effects of selection.
Upper limits on allele sizes are a severe form of selection that has been incorporated in some models (e.g., Nauta and Weissing 1996; Feldman et al. 1997; Pollock et al. 1998; Stefanini and Feldman 2000). This simple approach allows one to develop a detailed theory. However, it is not clear what biological mechanism sets an absolute upper bound on the length of all CA repeats. A second approach (see Garza et al. 1995; Zhivotovsky et al. 1997; Zhivotovsky 1999) is that there is a mutational bias such that alleles of large size mutate preferentially to alleles of smaller size. Wierdl et al. (1997) observed this where they inserted GT repeats of various sizes into the coding sequence of a yeast gene. In D. melanogaster mutation-accumulation lines Harr and Schlötterer (2000) observed that for long microsatellites, although the number of upward and downward mutations was identical, the size of the downward mutations was larger than the size of the upward ones. Two recent studies of mutations observed in human pedigrees also support this notion. Ellegren (2000) found a weak but statistically significant negative relationship between the magnitude of mutation and standardized allele size. Xu et al. (2000) examined 236 mutations at 122 tetranucleotide repeat loci and found that the rate of expansion mutations is roughly constant but contraction mutations increase with length. It will be of interest to explore the dependence of the mutational bias on length and to incorporate these effects into our proportional slippage/point mutation model to develop a more accurate model of microsatellite evolution. If mutational bias appears only when microsatellites are fairly long this bias should have only a limited impact on our previous model-based estimates of slippage rates (see Kruglyak et al. 1998, 2000).
Acknowledgments
We thank two anonymous reviewers, Tessa Bauer DuMont, Jennifer Calkins, Semyon Kruglyak, Willie Swanson, and Todd Vision for their many helpful comments. This work was partially supported by National Institutes of Health (NIH) grant GM36431 to C.F.A., NIH grant GM36431-14S1 to C.F.A. and R.T.D., and National Science Foundation grant DMS-9877066 to R.T.D.
APPENDIX A
To compute D_{SW} and (δμ)^{2}, let ξ_{1}, ξ_{2}, … be independent and ±1 with probability 1/2 each and let S_{n} = ξ_{1} + · · · + ξ_{n}. The ξ_{i} are the results of the various slippage events and S_{n} is the total change after n events. If each population consists of N diploid individuals then the number of slippage events, U, before the ancestors of X and X′ (or of Y and Y′) coalesce has a shifted geometric distribution with success probability p = 1/(4βN + 1); that is, we have
In the case of (δμ)^{2} breaking things down according to the values of U and V and using the fact that
Let T be a random time, e.g., U or U + V. Writing 1_{(T⩾n)} for the function that is 1 if T ⩾ n and 0 otherwise, we have
In the case T = U + V, P(U + V ⩾ 2k + 1) is given by
Our next goal is to show that if 2βτ is large and τ ⩾ N we can drop the second term from (A7) to end up with
APPENDIX B
Writing
To prepare for the computation of the variance, let
To check the second equality note that if
Using this with (B1) and solving the differential equation we have
Turning now to
APPENDIX C
From the definition of the PS/0M model and the Kolmogorov differential equations, it follows that
Letting f_{4}(k) = (k − ℓ)^{4} we have
To compute the right-hand side we note that E_{ℓ}(Z_{s}(Z_{s} − ℓ)^{2}) = E_{ℓ}(Z_{s} − ℓ)^{3} + ℓE_{ℓ}(Z_{s} − ℓ)^{2} = 6b^{2}ℓs^{2} + 2bℓ^{2}s by (C4) and (C3), so we have
Footnotes

Communicating editor: S. Tavaré
 Received June 15, 2000.
 Accepted July 6, 2001.
 Copyright © 2001 by the Genetics Society of America