Abstract
Minisatellites and microsatellites are short tandemly repeated sequences dispersed throughout eukaryotic genomes, many of which are highly polymorphic because of variation in the copy number of the repeats. Because mutations change the copy number of the repeat sequences in a generalized stepwise fashion, stepwise mutation models are widely used to study the dynamics of these loci. We propose a minimum chi-square (MCS) method for simultaneously estimating all the parameters of a stepwise mutation model and the ancestral allelic type of a sample. The MCS estimator requires the mean number of alleles of each size in a sample, which can be estimated from Monte Carlo samples generated by a coalescent algorithm. The method is applied to samples of seven (CA)_{n} repeat loci from eight human populations and one chimpanzee population. The estimated parameter values suggest a general tendency for microsatellite alleles to expand in size, because (1) each mutation has a slight tendency to increase size and (2) the mean size increase per mutation is larger than the mean size decrease. Our estimates also suggest that most of these CA-repeat loci evolve according to multistep rather than single-step mutation models. We also introduce several quantities for measuring the quality of the estimated ancestral allelic type, and the majority of the estimated ancestral allelic types appear to be reasonably accurate. Implications of our analysis and potential extensions of the method are discussed.
SINCE the discovery that a large number of loci with tandemly repeated sequences in humans and many other eukaryotic species are highly polymorphic because individuals differ in the copy number of the repeats (Jeffreys 1985; Litt and Luty 1989; Weber and May 1989), allele size data from such loci have rapidly become the dominant source of genetic markers for genome mapping, forensic testing, and population studies. Loci with repeat units longer than 5 bp are generally referred to as minisatellite or variable number tandem repeat loci, and those with repeat units of 2 to 5 bp are referred to as microsatellite or short tandem repeat loci (Tautz 1993). Because mutations change the copy number at such loci in a stepwise fashion, the rapid accumulation of population samples from minisatellite and microsatellite loci has revived interest in the stepwise mutation model (SMM), which was popular in the 1970s.
Understanding the dynamics of polymorphism at minisatellite and microsatellite loci is important for avoiding misinterpretation of the information these loci provide, and it is vital for population and evolutionary studies. Estimating the relevant population parameters is central to a better understanding of the evolution of such loci. Several population parameters can affect the pattern of polymorphism in a sample. The most important is θ = 4Nμ, where N is the effective population size and μ is the mutation rate per locus per generation; θ is primarily responsible for the amount of polymorphism in a sample. Under the infinite-allele model, in which each mutation at a locus creates a new allele in the population, θ is the only parameter of the distribution of polymorphism in a sample from a stable population, and Ewens' (1972) sampling formula provides the basis for estimating θ from a single quantity: the number of alleles in a sample. Under the SMM, however, the pattern of polymorphism in a sample is more complex and depends on additional parameters, whose number depends on the complexity of the mutational mechanism at the locus. Unfortunately, no sampling distribution for the SMM parallel to Ewens' (1972) has been found, which makes inference under the SMM difficult. Because parameter values are needed to interpret the evolution of a locus under the SMM, proper estimation of the parameters is critical for studying the mechanism of evolution at such loci. To date, no method is available for simultaneously estimating all the parameters of the SMM, which limits the usefulness of these loci for studying the history of populations, particularly human populations.
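Under the infinite-allele model, the single-quantity estimate mentioned above can be made concrete. The expected number of distinct alleles in a sample of n genes is E(K) = Σ_{i=0}^{n−1} θ/(θ + i), and because K is sufficient for θ under Ewens' formula, the maximum likelihood estimate of θ from an observed count k solves E(K) = k. The minimal Python sketch below finds that root by bisection; the function names are our own, and bisection is just one convenient root finder.

```python
def expected_alleles(theta: float, n: int) -> float:
    """Expected number of distinct alleles in a sample of n genes under the
    infinite-allele model: sum_{i=0}^{n-1} theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


def theta_from_allele_count(k: int, n: int, tol: float = 1e-10) -> float:
    """Solve expected_alleles(theta, n) == k for theta by bisection.

    expected_alleles increases from 1 (theta -> 0) to n (theta -> infinity),
    so a finite positive root exists only when 1 < k < n."""
    if not 1 < k < n:
        raise ValueError("need 1 < k < n for a finite, positive estimate")
    lo, hi = 1e-12, 1.0
    while expected_alleles(hi, n) < k:  # widen the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * hi:           # relative-precision bisection
        mid = 0.5 * (lo + hi)
        if expected_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `theta_from_allele_count(10, 100)` returns the θ at which a sample of 100 genes is expected to carry 10 distinct alleles. No analogous one-number estimator exists under the SMM, which is the gap the rest of this article addresses.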
In this article we develop a method for simultaneously estimating all the parameters of the SMM and the ancestral allelic type of the alleles in a sample. The estimator combines the minimum chi-square (MCS) estimator with Monte Carlo simulation, taking advantage of fast coalescent algorithms. We apply the method to the dinucleotide repeat samples of Deka et al. (1995) and discuss the implications of our analyses.
STEPWISE MUTATION MODELS
Let π_{ij} be the probability that a mutation changes allele size from i to j. For a stable population, which is assumed throughout this article, an SMM is completely specified by the distribution π_{ij} and the mutation parameter θ = 4Nμ, where N is the effective population size and μ is the mutation rate per allele per generation. After Ohta and Kimura (1973) introduced the SMM, most subsequent studies in the 1970s were based on single- or two-step SMMs (e.g., Moran 1975; Li 1976; Weir et al. 1976; Chakraborty and Nei 1977). In particular, Moran (1975) showed that under a single-step mutation model, allele frequencies do not reach a steady distribution. Consequently, later studies of SMMs have focused on moments of allele frequencies that do have steady-state distributions, such as the variance of allele sizes (Weir et al. 1976; Chakraborty and Nei 1982). This tradition appears to continue in the recent surge of interest in the SMM (Shriver et al. 1993; Valdes et al. 1993; Di Rienzo et al. 1994; Kimmel et al. 1996).
Although most studies on SMMs assume either a single- or two-step SMM, there are as many SMMs as there are distributions π_{ij}. One problem is that many SMMs produce patterns of polymorphism that are practically indistinguishable, so the choice of π_{ij} is not trivial. Distributions that are flexible yet depend on few parameters, each with a clear biological meaning, should be preferred. Although the method developed in this article for estimating the parameters of an SMM applies to any distribution π_{ij}, we shall consider a distribution that is homogeneous in i, that is, one in which π_{ij} depends on i and j only through the difference j − i. We do so partly because such a model has fewer parameters and partly because only the relative sizes of alleles are known in the samples analyzed later. We shall consider the following distribution:

$$\pi_{ij} = \begin{cases} \alpha\,(1-P)\,P^{\,j-i-1}, & j > i,\\[4pt] (1-\alpha)\,(1-P)\,P^{\,i-j-1}, & j < i, \end{cases} \qquad (1)$$

where α is the probability that a mutation increases allele size and 0 ≤ P < 1.
It follows from distribution (1) that, given the direction of size change (increase or decrease), the size of a change follows the geometric distribution (1 − P)P^{|i − j| − 1}. A small value of P implies that changes are likely to be small, and a large value of P means that large changes are likely. An example of π_{ij} is given in Figure 1. Note that this geometric distribution imposes no maximum size (step) on a change: at least in theory, a change of any size is possible. We could instead use a truncated geometric distribution with a maximum step size, but doing so would introduce another parameter. Moreover, although any size change is possible under a geometric distribution, the probabilities of most large changes are small enough to be negligible in practice. For this reason, it is simpler to consider an effective number of steps than to impose an absolute maximum step. We define the effective number of steps of an SMM to be the smallest integer s such that π_{ij} < ε whenever |i − j| > s, for some threshold value ε; we shall use ε = 10^{−3}. For our model, s increases with P and is the largest integer that is not larger than
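The distribution and the definition of the effective number of steps translate directly into code. The Python sketch below uses hypothetical function names: it evaluates π_{ij} as a function of d = j − i, draws a size change by first drawing a geometric step count and then a direction, and finds s by scanning step sizes against the threshold ε rather than through a closed-form expression. It assumes the threshold is applied to both directions of change, so the larger of α and 1 − α determines s.

```python
import random


def step_pmf(d: int, alpha: float, P: float) -> float:
    """Probability that one mutation changes allele size by d repeat units
    (d = j - i): alpha*(1-P)*P**(d-1) for d > 0 and
    (1-alpha)*(1-P)*P**(-d-1) for d < 0. A change of zero is not a mutation."""
    if d == 0:
        return 0.0
    direction = alpha if d > 0 else 1.0 - alpha
    return direction * (1.0 - P) * P ** (abs(d) - 1)


def sample_step(alpha: float, P: float, rng: random.Random) -> int:
    """Draw one size change: the number of steps k >= 1 is geometric,
    Pr(k) = (1-P)*P**(k-1); the change is +k with probability alpha, else -k."""
    k = 1
    while rng.random() < P:  # each additional step occurs with probability P
        k += 1
    return k if rng.random() < alpha else -k


def effective_steps(alpha: float, P: float, eps: float = 1e-3,
                    max_k: int = 100_000) -> int:
    """Smallest s such that pi_ij < eps whenever |i - j| > s.

    Both directions are checked. Because step_pmf does not increase with |d|,
    s is one less than the first step size whose probability falls below eps."""
    for k in range(1, max_k + 1):
        if max(step_pmf(k, alpha, P), step_pmf(-k, alpha, P)) < eps:
            return k - 1
    raise ValueError("threshold not reached; increase max_k")
```

With α = 0.5, for instance, `effective_steps` gives s = 2 for P = 0.01 and s = 5 for P = 0.3, illustrating that s grows with P. Setting P = 0 in `sample_step` recovers the single-step model, where every change is ±1.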
Given that a locus evolves according to the SMM described above, the three parameters θ, α, and P determine the values of the various moments of allele frequencies at equilibrium, so moments computed from a sample can in general be used to estimate these parameters. Because moments are themselves computed from the allele frequencies of a sample, the allele frequencies contain more information about the parameters than a set of moments does; consequently, estimation based directly on allele frequencies should be more accurate than estimation based on a small set of moments. However, Moran (1975) showed that, whatever the initial allele frequencies, the frequency of any allele does not reach a steady distribution even after many generations. Because the allele frequencies of a population many generations ago are generally unknown, Moran's analysis appears to suggest that it would be futile to make inferences about the parameters of an SMM directly from the allele frequencies of a sample. A closer examination of what determines the allele frequencies in a sample changes this perception.
The sequences (chromosomes) of a sample of size n can be traced back to their most recent common ancestor (MRCA), which lived on average 4N(1 − 1/n) generations ago. From a coalescent point of view, it is clear that the distribution of allele frequencies in a sample is entirely determined by the allele A carried by the MRCA and the three SMM parameters θ, α, and P. Once the ancestral allele A is known or specified, the allele frequencies of the population in the more distant past are irrelevant to the allele frequencies in a sample. This observation implies that inferences about θ, α, and P can be made from the allele frequencies in a sample once A is specified.
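The 4N(1 − 1/n) figure follows from a standard coalescent argument for a diploid population of effective size N (2N gene copies). While k lineages remain, each of the k(k − 1)/2 pairs coalesces with probability 1/(2N) per generation, so the waiting time to the next coalescence has mean approximately 4N/[k(k − 1)] generations, and the expected times telescope:

$$E(T_{\mathrm{MRCA}}) = \sum_{k=2}^{n} \frac{4N}{k(k-1)} = 4N \sum_{k=2}^{n}\left(\frac{1}{k-1} - \frac{1}{k}\right) = 4N\left(1 - \frac{1}{n}\right).$$

For example, with n = 100 the MRCA is expected to lie 3.96N generations back, already close to the 4N limit for large samples.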
In addition to its importance in determining the distribution of allele frequencies in a sample, the ancestral allele A of a sample is itself of great interest in studying the evolution of the locus from which the alleles were sampled. It is thus desirable to infer A as well as θ, α, and P from a sample, an inference that would be difficult if it were based on a set of moments alone. In this article, we treat A as a parameter whose value is to be estimated from the sample. For convenience, we collect these four parameters in a vector, Γ, whose components are θ, α, P, and A.
MINIMUM CHI-SQUARE ESTIMATOR OF Γ
Let f_{i}, i = 1, … be the number of alleles of size i in a sample of n chromosomes. Our aim is to derive an estimate of Γ from {f_{i}}.
There are two widely used approaches to estimating parameters: the maximum likelihood method and the least-squares method. The two often give similar estimates and share many desirable properties. The maximum likelihood method requires the probability of the observed allele frequencies {f_{i}} given the value of Γ. This computation is difficult and time-consuming, but the likelihood approach has the advantage of providing a framework for hypothesis testing. Least-squares-based methods, on the other hand, require only the means (and perhaps the variances and covariances) of allele frequencies, which are much easier to compute than probabilities. Least-squares-based methods are therefore generally easier to use in practice and are particularly appealing when many samples need to be analyzed. Their major disadvantage is that they are difficult to extend to hypothesis testing.
Let e_{i}(Γ), i = 1, …, be the expected number of alleles of size i conditional on the value of Γ, i.e., e_{i}(Γ) = E(f_{i} | Γ). Our strategy is to find the value of Γ that minimizes the quantity
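A minimal Python sketch of such a discrepancy, assuming the usual Pearson form χ^{2}(Γ) = Σ_{i} [f_{i} − e_{i}(Γ)]^{2}/e_{i}(Γ), summed over all allele sizes that are observed or have positive expectation. Both this form and the handling of sizes with zero expectation are our assumptions, and the function name is hypothetical.

```python
def chi_square(observed, expected, floor=1e-9):
    """Pearson-type discrepancy sum_i (f_i - e_i)^2 / e_i.

    observed: {allele size: count f_i}; expected: {allele size: e_i(Gamma)}.
    Sizes absent from a dict count as zero. Expected values are floored at
    `floor` (an assumption) so that an observed allele the model deems
    essentially impossible is heavily penalized instead of dividing by zero."""
    total = 0.0
    for i in set(observed) | set(expected):
        f = observed.get(i, 0)
        e = max(expected.get(i, 0.0), floor)
        total += (f - e) ** 2 / e
    return total
```

Working with dictionaries keyed by allele size keeps the sketch independent of how far apart the observed sizes are.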
Although some constrained optimization procedures could be adapted to search for the MCS estimate, a simple grid search suffices here because there are only four parameters. What complicates this seemingly straightforward procedure is that neither a formula nor even a numerical solution for e_{i}(Γ) is currently available, so its values must be estimated from simulated samples. Estimating e_{i}(Γ) for a large number of combinations of parameter values can be very time-consuming, even when samples are generated by a fast algorithm from coalescent theory (Kingman 1982a,b; see Hudson 1991 for a recent review). When only a few samples are to be analyzed, a two-step grid search can be used. The first step is a fast full-grid search to locate the vicinity of the MCS estimate; to keep it fast, only a modest number of Monte Carlo samples is used to estimate e_{i}(Γ) for each Γ. The second step is a fine-scale grid search within the small region identified by the first step, in which a relatively large number of Monte Carlo samples is used to obtain more accurate estimates of e_{i}(Γ), and thus a more accurate MCS estimate. When many samples are to be analyzed, an alternative approach is to build a database of {e_{i}} for all reasonable combinations of parameter values and to retrieve the values of {e_{i}} from the database when estimating for each sample. We used this latter approach in our analyses of 63 samples of dinucleotide repeats.
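The two-step strategy can be sketched generically in Python. Here `chi2(gamma, m)` is a hypothetical user-supplied function that simulates m Monte Carlo samples at the parameter vector gamma, estimates e_{i}(Γ), and returns the χ^{2} value. The box half-widths that define the vicinity searched in the second step are also our assumption.

```python
import itertools


def grid_argmin(points, objective):
    """Return (best_point, best_value) over an iterable of parameter tuples."""
    best, best_val = None, float("inf")
    for point in points:
        value = objective(point)
        if value < best_val:
            best, best_val = point, value
    return best, best_val


def two_step_search(coarse_axes, fine_axes, halfwidths, chi2,
                    coarse_m=500, fine_m=10_000):
    """Two-step grid search for the MCS estimate.

    coarse_axes / fine_axes: one sequence of candidate values per parameter.
    halfwidths: per-parameter half-width of the box around the coarse winner.
    chi2(gamma, m): chi-square when e_i(gamma) is estimated from m samples."""
    # Step 1: full grid, kept cheap by using few Monte Carlo samples per point.
    coarse_best, _ = grid_argmin(itertools.product(*coarse_axes),
                                 lambda g: chi2(g, coarse_m))
    # Step 2: fine grid restricted to the neighbourhood of the coarse winner,
    # now with many Monte Carlo samples per point.
    local_axes = [[x for x in axis if abs(x - c) <= h]
                  for axis, c, h in zip(fine_axes, coarse_best, halfwidths)]
    return grid_argmin(itertools.product(*local_axes),
                       lambda g: chi2(g, fine_m))
```

A toy objective that ignores m, such as a quadratic bowl, is enough to check that the fine search lands on the grid point nearest its minimum. If the coarse grid is too sparse, the true minimum can lie outside the box, which is why the vicinity should be generous.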
Let us consider how e_{i}(Γ) can be estimated. Suppose we have M simulated samples of size n for a given value of Γ, and let n_{ij} be the number of alleles of size i in the jth sample. We can then use the mean ê_{i} of the n_{ij} as an estimate of e_{i}(Γ). That is,

$$\hat{e}_i = \frac{1}{M}\sum_{j=1}^{M} n_{ij}.$$
Each simulated sample is generated in two steps. First, a genealogy of n sequences (alleles) is generated by a coalescent algorithm (e.g., Hudson 1991). The simulated genealogy provides not only the topological relationships among the sequences but also the number of mutations on each of its branches; simulating it requires only the value of θ. Second, a value is assigned to A, which by definition is the allele at the root of the genealogy, and the allele resulting from each mutation on the genealogy is then determined. Because this requires knowing the allele type before each mutation, it is done by starting at the root and proceeding toward the tips. In this process, the type of each new mutant allele is simulated according to the distribution {π_{i}}, which is completely specified by the values of P and α.
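The two simulation steps can be sketched with Python's standard library alone. This is a minimal illustration under standard coalescent assumptions (constant population size, time in units of 2N generations, θ > 0), not the authors' program. Allele sizes are in repeat units, and the helper names are hypothetical. The sketch also returns the standard errors of ê_{i}, the quantities reported alongside ê_{i} in Table 1.

```python
import math
import random
from collections import Counter


def sample_step(alpha, P, rng):
    """Size change of one mutation: +k with probability alpha, else -k,
    where k >= 1 is geometric with Pr(k) = (1-P)*P**(k-1)."""
    k = 1
    while rng.random() < P:
        k += 1
    return k if rng.random() < alpha else -k


def simulate_sample(n, theta, alpha, P, A, rng):
    """Allele sizes (repeat units) of one coalescent sample of n genes.

    With k lineages the next coalescence occurs after an exponential time with
    rate k(k-1)/2; mutations fall on each branch as a Poisson process with
    rate theta/2 (time in units of 2N generations)."""
    parent = [0] * (2 * n - 1)
    time = [0.0] * (2 * n - 1)
    lineages = list(range(n))            # leaves 0..n-1, sampled at time 0
    t, next_id = 0.0, n
    while len(lineages) > 1:             # step 1: build the genealogy
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)   # two distinct lineages coalesce
        lineages.remove(a)
        lineages.remove(b)
        parent[a] = parent[b] = next_id
        time[next_id] = t
        lineages.append(next_id)
        next_id += 1
    root = next_id - 1
    size = [0] * (2 * n - 1)
    size[root] = A                       # step 2: ancestral allele at the root
    # Parents get larger ids than their children, so this descending sweep
    # fixes each parent's allele before its children's (root toward tips).
    for node in range(root - 1, -1, -1):
        branch = time[parent[node]] - time[node]
        s = size[parent[node]]
        w = rng.expovariate(theta / 2)   # waiting time to the first mutation
        while w < branch:
            s += sample_step(alpha, P, rng)
            w += rng.expovariate(theta / 2)
        size[node] = s
    return size[:n]


def estimate_expected_counts(n, theta, alpha, P, A, M, seed=0):
    """e_hat_i = (1/M) * sum_j n_ij over M replicates, with standard errors."""
    rng = random.Random(seed)
    total, total_sq = Counter(), Counter()
    for _ in range(M):
        for i, c in Counter(simulate_sample(n, theta, alpha, P, A, rng)).items():
            total[i] += c
            total_sq[i] += c * c
    e_hat = {i: total[i] / M for i in sorted(total)}
    se = {i: math.sqrt(max(total_sq[i] - M * e_hat[i] ** 2, 0.0) / (M - 1) / M)
          for i in e_hat}
    return e_hat, se
```

Because the ê_{i} of every replicate sum to n, a quick sanity check is that the returned values add up to the sample size.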
Table 1 shows examples of the values of ê_{i} and their standard errors for different numbers of Monte Carlo samples. Note that most of the estimates are reasonably accurate even when M = 500. These results, as well as many other simulation experiments we performed, suggest that 500–1000 Monte Carlo samples are usually sufficient to identify the vicinity of the MCS estimate and that 10,000 Monte Carlo samples are adequate for a fine-scale grid search to obtain the final MCS estimate.
We define t_{i}(n) = e_{i}/n as the expected proportion of alleles of size i, and we measure the closeness between {t_{i}(n)} for two different sample sizes n_{1} and n_{2} by the Euclidean distance, i.e.,

$$d(n_1, n_2) = \sqrt{\sum_i \left[t_i(n_1) - t_i(n_2)\right]^2}.$$
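A minimal Python sketch of t_{i}(n) and the Euclidean distance. Profiles are stored as hypothetical {allele size: proportion} dictionaries, and sizes missing from one profile are treated as having proportion zero.

```python
import math


def proportions(e: dict, n: int) -> dict:
    """t_i(n) = e_i / n: expected proportion of alleles of size i."""
    return {i: v / n for i, v in e.items()}


def profile_distance(t1: dict, t2: dict) -> float:
    """Euclidean distance between two {size: proportion} profiles; a size
    present in only one profile contributes its full proportion."""
    sizes = set(t1) | set(t2)
    return math.sqrt(sum((t1.get(i, 0.0) - t2.get(i, 0.0)) ** 2 for i in sizes))
```

Comparing `proportions(e_50, 50)` with `proportions(e_100, 100)` for two simulated expectation tables gives a single number that summarizes how sensitive the size profile is to sample size.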
APPLICATIONS
Since the discovery of highly polymorphic CA-repeat loci (Litt and Luty 1989; Weber and May 1989), many samples of such loci from human populations have been reported (e.g., Kamino et al. 1993; Bowcock et al. 1994; Di Rienzo et al. 1994; Deka et al. 1995). The samples of Deka et al. (1995) are particularly useful for population studies because their sample sizes are relatively large and the populations sampled are anthropologically well defined. We therefore use their data to illustrate our method and to examine several issues concerning the evolution of microsatellite loci. Deka et al. (1995) reported eight CA-repeat loci in nine populations: Samoans (SA), Dogrib Indians (DG), Pehuenche Indians (PH), New Guineans (NG), Kachari (KA), Germans (GR), CEPH (CP), Sokoto (SO), and chimpanzees (CH). We exclude the locus D13S137 from our analysis because there are many single-nucleotide insertions/deletions within its repeat motifs. At the remaining seven loci, we also exclude all alleles resulting from single-nucleotide insertions/deletions, because the mechanisms of repeat-number change and of insertion/deletion are likely to differ. The allele sizes and their frequencies at these loci are given in the appendix of Deka et al. (1995). Figure 3 shows the allele frequencies at the loci FLT1 and D13S122. Because the polymerase chain reaction (PCR) was used for amplification and the distances between the upstream primer sequences and the first CA repeat were unknown for these loci, the alleles were reported as sequence lengths measured from the primer sequences rather than as copy numbers. This poses no difficulty for our analysis, because mutation model (1) depends only on the difference i − j in copy number, which is available from the samples. However, the estimated ancestral allele A at each locus must then be given as a sequence length measured from the primer sequence.
At first glance our estimation task appears challenging, because there are 63 samples to be analyzed, each requiring a considerable amount of CPU time. The observation made earlier, that the proportion of alleles of a given size is rather insensitive to sample size, is of great help here. Instead of generating a large number of samples to estimate e_{i} separately for each of the 63 samples, we chose to obtain good estimates of e_{i}/n for a sample of 100 chromosomes only and to rescale them to obtain e_{i} for samples of other sizes. We selected θ = 0.2(0.2)10 (i.e., 0.2, 0.4, …, 9.8, 10), P = 0.01(0.01)0.10(0.05)0.9, and α = 0.5(0.5)1.0, a total of 14,500 combinations of values of the three parameters. For each set of parameters, we generated 20,000 independent samples and obtained estimates of e_{i}/n from (5). We did not need to estimate e_{i}/n separately for α < 0.5, because by symmetry these can be obtained from the estimates for α > 0.5: replacing α with 1 − α reflects every size change about the ancestral allele. Therefore, we effectively obtained estimates of e_{i}/n for 29,000 parameter sets. These estimates were stored in a database from which they can be retrieved easily.
For each of the 63 samples, we performed a grid search over all the parameter sets to obtain an MCS estimate of Γ. That is, for each of the 29,000 parameter sets we computed the χ^{2} value and updated the running minimum χ^{2} value and the corresponding parameter values; the MCS estimate was obtained once all the parameter sets had been examined. This fine-scale grid search still requires nontrivial CPU time but is quite manageable. The estimation results are given in Table 3.
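The database search described in the last two paragraphs can be sketched as follows. This Python illustration rests on several assumptions of our own. Each stored profile holds the proportions e_{i}/n for n = 100, keyed by the offset (in repeat units) from the ancestral allele. Allele lengths in the data are spaced 2 bp apart (one CA repeat). Candidate ancestral alleles span the observed size range, and χ^{2} takes the Pearson form. Profiles for α < 0.5 are obtained by reflecting those for α > 0.5 about the ancestral allele, which is exact under distribution (1). The function also records the conditional minimum χ^{2} for each candidate A, the raw material for judging how well A is determined.

```python
import math


def chi_square(observed, expected, floor=1e-9):
    """Pearson-type sum over sizes of (f_i - e_i)^2 / e_i, with e_i floored."""
    total = 0.0
    for i in set(observed) | set(expected):
        e = max(expected.get(i, 0.0), floor)
        total += (observed.get(i, 0) - e) ** 2 / e
    return total


def mcs_search(observed, database, unit=2):
    """Grid search for the MCS estimate of (theta, alpha, P, A).

    observed: {allele length: count}; lengths are spaced `unit` apart.
    database: {(theta, alpha, P): profile} for alpha >= 0.5, where a profile
    maps the offset d (repeat units) from the ancestral allele to e_{A+d}/n.
    Returns the best (theta, alpha, P, A), its chi-square, and the conditional
    minimum chi-square for each candidate A."""
    n = sum(observed.values())
    # Hypothetical choice: candidate ancestral alleles span the observed range.
    candidates = range(min(observed), max(observed) + 1, unit)
    cond_min = {A: math.inf for A in candidates}
    best, best_val = None, math.inf
    for (theta, alpha, P), profile in database.items():
        variants = [(alpha, profile)]
        if alpha > 0.5:  # alpha -> 1 - alpha reflects every size change
            variants.append((round(1.0 - alpha, 10),
                             {-d: t for d, t in profile.items()}))
        for a, prof in variants:
            for A in candidates:
                # Rescale the stored proportions to expected counts for size n.
                expected = {A + unit * d: n * t for d, t in prof.items()}
                value = chi_square(observed, expected)
                cond_min[A] = min(cond_min[A], value)
                if value < best_val:
                    best, best_val = (theta, a, P, A), value
    return best, best_val, cond_min
```

Rescaling by n is where the insensitivity of e_{i}/n to sample size is used: one stored profile serves every sample, whatever its size.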
Note that 26 of the 63 samples show contraction (α < 0.5) in allele size and 37 show expansion (α > 0.5) (Table 3), suggesting a slight bias of mutations toward expansion. Table 3 also suggests that most of the loci evolve in a multistep fashion. When we divided the samples into two groups, one showing allele size contraction and the other expansion, we found that the P values of the expansion group are considerably larger than those of the contraction group, which means that size changes during expansion are likely to be greater than those during contraction. This result implies that even if the probability α that a mutation expands allele size is equal to or slightly smaller than the probability of contraction, the alleles in a population will still tend to increase in size. Because Table 3 shows that α is larger than ½ in the majority of samples, the tendency for most loci to expand in allele size is stronger than α alone would suggest.
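Under distribution (1), a short calculation makes the link between α, P, and the direction of size change explicit. Given its direction, the number of steps k in a mutation has the geometric distribution (1 − P)P^{k − 1}, k = 1, 2, …, with mean 1/(1 − P), so the expected size change per mutation is

$$E(\Delta) = \alpha\cdot\frac{1}{1-P} - (1-\alpha)\cdot\frac{1}{1-P} = \frac{2\alpha - 1}{1-P}.$$

For example, α = 0.6 gives 0.2/(1 − P) repeat units per mutation: 0.2 when P = 0 (the single-step case) but 0.4 when P = 0.5. Thus, for α > ½, the expected expansion per mutation increases with P, so under this model a sample with both α > ½ and a large P implies a strong expected expansion per mutation.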
Rubinsztein et al. (1995) compared allele sizes at 42 microsatellite loci in several primate species and found that alleles in humans are generally longer than those in other primates. They argued that microsatellite loci can evolve directionally and at different rates in closely related species. Our estimates of the ancestral allele sizes in human populations and in chimpanzees also show a slight bias toward longer alleles in humans, because at five (FLT1, D13S118, D13S71, D13S122, and D13S124) of the seven microsatellite loci the ancestral alleles of humans are longer than those of chimpanzees. However, some of these differences may be due to ascertainment bias, and analyses of more loci are needed to resolve this issue.
Weber and Wong (1993) studied 28 microsatellite loci on human chromosome 19 in a total of 20,000 parent-offspring allele pairs. They found that 78% of the 24 size changes in vivo were gains or losses of a single repeat unit, and no gain or loss of more than three repeat units was observed. When all the mutations in vivo and in vitro are considered, there is a strong tendency toward gains over losses of repeat units. Our analysis generally agrees with their observations. The in vivo mutations observed by Weber and Wong (1993) do not appear to suggest large P values. However, they do not necessarily contradict our estimates, for two reasons. First, the number of mutations found at each locus in their study is too small to yield a reliable estimate of P for that locus. Second, it is not unreasonable to suggest that a similar mutational bias may operate both in vitro and in vivo, and when both the in vitro and in vivo mutations found in their study are considered, the mutation pattern agrees well with larger P values.
Let θ_{ij} be the value of θ at the ith locus in the jth population. If these loci are selectively neutral, it is reasonable to assume that the mutation rate at each locus is the same in different populations. Under this neutrality assumption, therefore, θ_{ij}/θ_{ij″} = N_{j}/N_{j″}, where N_{j} and N_{j″} are the effective population sizes of the jth and j″th populations, respectively; that is, the ratio of the θs of two populations does not depend on the locus studied. If the estimates of θ were accurate, we would expect a consistent ratio of θs across loci. The estimated θs in Table 3 show, however, that for most pairs of populations this ratio is not very consistent across loci, which suggests that the variance of the θ estimates is likely to be large. The estimated θs also vary considerably among populations at each locus, but this is expected because the populations differ in effective size. A large variance in the estimation of θ is not surprising: estimates of θ from the segregating sites of DNA sequences (for example, the estimator of Watterson 1975), which contain more information about θ than microsatellite data do, also have relatively large variances. Furthermore, Kimmel and Chakraborty (1996) showed that the variance of a variance-based estimator of θ does not diminish with sample size.
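The consistency check on ratios of θ can be made concrete with a short Python sketch. Under neutrality, every locus gives an estimate of the same quantity, log(N_{j}/N_{j″}), so the spread of the per-locus log ratios measures how inconsistent the estimates are. The use of log ratios and of their standard deviation is our own choice, and the dictionaries of per-locus estimates are hypothetical inputs.

```python
import math
import statistics


def log_theta_ratios(theta_a: dict, theta_b: dict) -> dict:
    """log(theta_a / theta_b) for each locus estimated in both populations.
    Under neutrality every entry estimates the same log ratio of effective sizes."""
    shared = sorted(set(theta_a) & set(theta_b))
    return {locus: math.log(theta_a[locus] / theta_b[locus]) for locus in shared}


def ratio_spread(theta_a: dict, theta_b: dict) -> float:
    """Sample standard deviation of the per-locus log ratios (needs >= 2 loci);
    zero means a perfectly consistent ratio across loci."""
    return statistics.stdev(log_theta_ratios(theta_a, theta_b).values())
```

Logs make the comparison symmetric: swapping the two populations changes the sign of each log ratio but leaves the spread unchanged.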
When we draw conclusions from estimated parameter values, which are subject to sampling variance, it is important to have some measure of the accuracy of the estimates. The variance of an estimate of a single parameter is often hard enough to compute; measuring the accuracy of the simultaneous estimation of all four parameters of an SMM is undoubtedly more difficult, but it is of great importance in recommending a method. Answering such questions demands substantially more computing resources than the estimation itself, and the problem deserves considerable effort in the future. We therefore do not attempt to provide all the answers here; instead, we focus on the accuracy of the estimated ancestral allele A. Although the estimates of the four parameters are interrelated, we observed that values of P, α, and θ that yield a χ^{2} close to the minimum are also close to the values that yield the minimum χ^{2}, whereas MCS estimates conditional on different ancestral alleles can differ substantially. This is not difficult to understand. For example, specifying a small ancestral allele size (relative to the sizes of alleles in the sample) will require a large α and a large P (or θ) to explain the observed allele frequencies, whereas specifying a large ancestral allele size will result in a small estimated α. Our experience suggests that the accuracy of the estimated ancestral allele is a good indicator of the overall accuracy of estimation.
To measure the accuracy of the estimate of A, we compare the minimum χ^{2} values conditional on different ancestral allele sizes. One way to facilitate this comparison is to plot the conditional minimum χ^{2} against the ancestral allele size; a sharp dip in this curve at the estimated A suggests high accuracy of estimation. To allow comparison of estimates from different samples, we use the ratio of the conditional minimum χ^{2} to the overall minimum χ^{2}. Figure 4 shows two examples. Figure 4a suggests that the estimate of A at the FLT1 locus is accurate for population DG but uncertain for population CH. The allele frequencies in Figure 3 concur with this analysis: the frequency of the estimated A = 168 in the DG sample is extremely high, whereas the frequency of the estimated A = 176 in the CH sample is only intermediate, although it is the highest in that sample. Figure 4b shows a similar pattern for locus D13S122. Interestingly, the estimated A = 105 for the SA sample is not the most frequent allele in that sample, and Table 3 shows that the θ estimate from this sample is suspiciously large. Because Figure 4b shows that the estimate of A for this sample is rather uncertain, we expect that quite different sets of P, α, and θ can yield χ^{2} values close to the minimum. Indeed, when the ancestral allele A is set to 85, the conditional minimum χ^{2} value is 154.59, less than 1% larger than the overall minimum χ^{2} value, and the corresponding estimates of P, α, and θ are 0.3, 1.0, and 4.2, respectively.
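The comparison of conditional minima can be sketched in Python as follows. The only input is `cond_min`, a hypothetical mapping from each candidate ancestral allele to its conditional minimum χ^{2}. The 1% tolerance in the second function mirrors the margin in the A = 85 example above but is otherwise arbitrary.

```python
def chi2_ratio_profile(cond_min: dict) -> dict:
    """Ratio of each conditional minimum chi-square (keyed by ancestral allele A)
    to the overall minimum; the estimated A has ratio 1."""
    overall = min(cond_min.values())
    return {A: v / overall for A, v in sorted(cond_min.items())}


def near_optimal_alleles(cond_min: dict, tol: float = 0.01) -> list:
    """Ancestral alleles whose conditional minimum lies within a fraction tol of
    the overall minimum; more than one entry signals an uncertain estimate."""
    overall = min(cond_min.values())
    return sorted(A for A, v in cond_min.items() if v <= (1.0 + tol) * overall)
```

Plotting the ratio profile against A gives the kind of curve described above, and a sample whose `near_optimal_alleles` list has several entries is one whose other parameter estimates deserve caution.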
When the ancestral allele is reasonably certain, it makes sense to examine the estimates of the other parameters closely. Take the locus D13S122, for example: allele size 95 appears to be the ancestral allele for both the PH and GR samples (see Figure 3 and Table 4). Conditional on A = 95, we examined the minimum χ^{2} values for different values of α; the results are given in Figure 5. For the PH sample, Figure 5 shows that α < 0.5 is very unlikely, whereas for the GR sample α should not differ substantially from 0.5. These conclusions are reinforced by the allele frequencies of the two samples shown in Figure 3.
Another measure of accuracy, R_{2}, is the ratio of the χ^{2} value for the second-best estimate of A to that for the best. The larger the ratio, the worse the fit of the second-best A, and thus the better the estimate of A. A further useful measure, R_{m}, is the ratio of the mean χ^{2} value for the two neighboring sizes of A to that for A itself; it measures how much better A fits than its neighbors. Because neither neighbor can fit better than the second-best allele, we always have R_{2} ≤ R_{m}. These two measures are particularly convenient when there are many samples to analyze, as in our situation. The values of R_{2} and R_{m}, as well as the minimum χ^{2} value for each of the 63 samples, are given in Table 4.
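Both measures follow directly from the conditional minima. The Python sketch below is a minimal version under our own assumptions: candidate alleles are spaced `unit` apart (2 bp for a dinucleotide repeat), at least two candidates were evaluated, and when the estimated A lies at the edge of the candidate range, R_{m} uses the single available neighbor. The function name is hypothetical.

```python
def accuracy_measures(cond_min: dict, unit: int = 2):
    """Return (A_hat, R_2, R_m) from conditional minimum chi-square values.

    R_2: chi-square of the second-best A divided by that of the best A.
    R_m: mean chi-square of the neighbours A_hat - unit and A_hat + unit
    (whichever were evaluated) divided by that of the best A."""
    ranked = sorted(cond_min, key=cond_min.get)
    a_hat, runner_up = ranked[0], ranked[1]
    best = cond_min[a_hat]
    r2 = cond_min[runner_up] / best
    neighbours = [cond_min[a] for a in (a_hat - unit, a_hat + unit) if a in cond_min]
    rm = sum(neighbours) / len(neighbours) / best
    return a_hat, r2, rm
```

Each neighbor's χ^{2} is at least that of the runner-up, so the computed values always satisfy R_{2} ≤ R_{m}, matching the inequality stated above.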
Table 4 shows that 43 of the 63 samples have R_{2} > 1.20, and 30 have R_{2} > 1.5. Although further study is required to interpret these R_{2} values properly, it appears that when every ancestral allele other than the MCS estimate gives a χ^{2} value at least 20% larger (R_{2} > 1.2), this is a reasonable indication that the estimate of A is not grossly wrong.
DISCUSSION
A strength of the MCS estimator developed in this article is its ability to estimate all the parameters of the SMM simultaneously, including the ancestral allele, thereby making better use of the information available in a sample. The mutation mechanisms at minisatellite and microsatellite loci are still not fully understood, and even less is known about the mode of mutation, i.e., whether it is symmetric or asymmetric, single-step or multistep. Although Kimmel et al. (1996) and Kimmel and Chakraborty (1996) emphasized that estimates of intra- and interpopulation variation at repeat loci based on allele size variance are not affected by asymmetry of mutational size changes, their analyses also reflect the view that knowledge of the distribution of mutational size changes is critical. Furthermore, the mode of mutation is likely to differ from locus to locus. Being able to estimate all the parameters of an SMM simultaneously therefore has considerable advantages over methods that estimate a single parameter under an assumed mode of mutation, such as a symmetric single-step model, which may be grossly incorrect.
Another strength of our method is its flexibility. Although we considered only a mutation model with three parameters and assumed a constant population size, any combination of mutation model and population genetic model can be analyzed in a similar manner, as long as alleles can be simulated under these models. In particular, one can consider mutation models with constraints on allele size or with allele-size-dependent mutation rates, as well as more complex population genetic models, such as growing or subdivided populations. This strength should not be overlooked, for two reasons. First, the rapid accumulation of population samples from microsatellite and minisatellite loci provides excellent opportunities to examine various mutation models. Second, many natural populations, particularly human populations, are not panmictic, and proper statistical inference should be based on more realistic population models that allow for population growth and subdivision. Estimating the expected number of alleles of each size by Monte Carlo simulation gives the method great flexibility, although at the cost of more CPU time. Methods of parameter estimation that rely on Monte Carlo samples to obtain necessary quantities were also used in Fu (1994).
The MCS estimator we developed is a generalized least-squares estimator, a type of estimator often used in statistics for discrete distributions, so we expect it to have many desirable properties. Although the estimation procedure takes advantage of a fast coalescent algorithm, it is still time-consuming, which makes the statistical properties of the estimator hard to investigate. Nevertheless, these properties will be worth studying in the future.
There are a number of potential extensions of the method we proposed. We chose to obtain parameter estimates from allele frequencies, but the same approach can be applied to a set of summary statistics, such as various moments, the number of alleles, and heterozygosity. It is also possible to incorporate the variances and covariances of allele frequencies into the estimator, although doing so would demand even more computing resources. Another potential extension is to use the χ^{2} statistic for testing hypotheses, such as the hypothesis that mutations follow a single-step SMM, but hypothesis testing is likely to be more effective with likelihood-based approaches such as the method of Nielsen (1997).
Acknowledgments
This work was supported in part by National Institutes of Health grant R29 GM50428 (Y.X.F.) and GM58545 (R.C.).
Footnotes

Communicating editor: G. B. Golding
 Received March 13, 1998.
 Accepted May 26, 1998.
 Copyright © 1998 by the Genetics Society of America