Abstract
Positional cloning of gene(s) underlying a complex trait requires a high-resolution linkage map between the trait locus and genetic marker loci. Recent research has shown that this may be achieved through appropriately modeling and screening linkage disequilibrium between the candidate marker locus and the major trait locus. A quantitative genetics model was developed in the present study to estimate the coefficient of linkage disequilibrium between a polymorphic genetic marker locus and a locus underlying a quantitative trait, as well as the relevant genetic parameters, using samples from randomly mating populations. Asymptotic covariances of the maximum-likelihood estimates of the parameters were formulated. Convergence of the EM-based statistical algorithm for calculating the maximum-likelihood estimates was confirmed, and its utility for analyzing practical data was explored by use of extensive Monte Carlo simulations. Appropriateness of calculating the asymptotic covariance matrix in the present model was investigated for three different approaches. Numerical analyses based on simulation data indicated that accurate estimation of the genetic parameters may be achieved if a sample size of 500 is used and if segregation at the trait locus explains not less than a quarter of the phenotypic variation of the trait, but the study reveals difficulties in predicting the asymptotic variances of these maximum-likelihood estimates. A comparison was made between the statistical powers of the maximum-likelihood analysis and the previously proposed regression analysis for detecting the disequilibrium.
INVESTIGATION of linkage disequilibrium between genetic loci has long been a topic in evolutionary theory and has been shown to be a very powerful approach in distinguishing between alternative evolutionary models (Lewontin 1974). There has, however, been a recent interest in linkage disequilibrium analysis because of the vast abundance of genetic polymorphisms at the DNA molecular level and because of successes in using the disequilibrium measure to map disease genes (Hastbacka et al. 1992; Lander and Schork 1994) or to optimize breeding schemes for marker-assisted selection (Lande and Thompson 1990). From a statistical point of view, inferences about linkage disequilibrium involve two steps: detecting its presence and estimating its magnitude once the disequilibrium has been confirmed (Weir 1979). Many studies have focused on inferring linkage disequilibrium between alleles at two or more loci when gametic or genotypic data are available at the involved loci (Hill 1974, 1975; Brown 1975; Weir and Cockerham 1979; Hastbacka et al. 1992; Spielman et al. 1993; Kaplan et al. 1995; Slatkin and Excoffier 1995).
Most characters of economic or medical importance are affected by multiple genes called quantitative trait loci (QTL). The key difficulty encountered in modeling linkage disequilibria involving QTL is the unavailability of genotypic data on the trait. Hill and Robertson (1966, 1968) demonstrated the predictability of the expected dynamics of linkage disequilibrium between a pair of linked QTL for the purpose of investigating the effect of linkage on the limit of selection on the two loci. Luo et al. (1997) developed a population genetics model of the linkage disequilibrium between a polymorphic marker locus and a QTL in finite populations under selection and recombination. Their study indicates that evolution of the linkage disequilibrium between a marker locus and a QTL can be accurately predicted using the model. A theoretical analysis was carried out more recently by Luo (1998) to explore the statistical power for detecting the disequilibrium between a polymorphic marker locus and a locus contributing to quantitative genetic variation in natural populations. For a good understanding, and thus an appropriate use, of the marker-QTL linkage disequilibrium in the three aspects discussed above, knowledge is needed about the feasibility of estimating the magnitude of the disequilibrium in natural populations. While Luo (1998) concentrated on detecting the presence of the marker-QTL linkage disequilibrium, the aim of the present article is to develop a theory for estimating the coefficient of the linkage disequilibrium using the principle of maximum likelihood. Numerical analyses based upon intensive simulation are used to validate the theoretical analyses and to confirm the accuracy of the theoretical predictions.
MODEL AND NOTATION
The theoretical analysis involves modeling two autosomal loci: one affects a quantitative trait, whereas the other is a codominant marker locus that is devoid of effect on the trait. The two alleles are denoted by M and m at the marker locus and by A and a at the QTL. Three genotypes at the QTL, say AA, Aa, and aa, are assumed to have genotypic values G_{AA} = a – d/2, G_{Aa} = d/2, and G_{aa} = –a – d/2, respectively, where a and d indicate additive and dominance effects at the QTL. The phenotype (y) of an individual for the trait is assumed to be a random variate that can be expressed as μ + G_{X} + ϵ, where μ is the population mean, G_{X} represents the genotypic effect of the individual with the QTL genotype X, and ϵ is a normally distributed variate with mean zero and variance v. The residual variance, v, accounts for the variation of polygenes that are in linkage equilibrium with the marker alleles and for environmental variation. The frequencies of M and A in the population are denoted by p and q, respectively. A similar model has been described elsewhere (Luo 1998). The distribution of the QTL genotypes within each of three possible marker genotypes is illustrated in Table 1. Q (or R) is the frequency of allele A at the QTL among the chromosomes carrying marker allele M (or m), which is a function of the allelic frequencies and the coefficient of linkage disequilibrium between the two loci, denoted by D. Simple relationships exist among Q, R, and D as follows: Q = q + D/p, R = q – D/(1 – p), and D = p(1 – p)(Q – R). Note that the theoretical model described in Table 1 implies random union of gametes with respect to genotypes at both the marker locus and the QTL.
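The relationships among p, q, D, Q, and R can be checked numerically. The following is a minimal sketch, assuming that random union of gametes gives the conditional QTL genotype frequencies within each marker genotype as products of the gametic frequencies Q and R; the function name and the dictionary layout are illustrative and are not taken from Table 1.

```python
def conditional_qtl_distribution(p, q, D):
    """Return Q, R and P(AA, Aa, aa | marker genotype) under random union of gametes."""
    Q = q + D / p            # frequency of A among M-bearing chromosomes
    R = q - D / (1.0 - p)    # frequency of A among m-bearing chromosomes
    if not (0.0 <= Q <= 1.0 and 0.0 <= R <= 1.0):
        raise ValueError("D is outside the bounds implied by p and q")
    table = {
        # each genotype is the union of two gametes drawn independently
        "MM": (Q * Q, 2 * Q * (1 - Q), (1 - Q) ** 2),
        "Mm": (Q * R, Q * (1 - R) + R * (1 - Q), (1 - Q) * (1 - R)),
        "mm": (R * R, 2 * R * (1 - R), (1 - R) ** 2),
    }
    return Q, R, table


Q, R, table = conditional_qtl_distribution(p=0.5, q=0.5, D=0.1)
for marker_genotype, row in table.items():
    print(marker_genotype, [round(x, 4) for x in row])
```

Each row sums to one, and p(1 – p)(Q – R) recovers the value of D that was supplied.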
THEORETICAL ANALYSES
In the present study with the model given in Table 1, the statistical analysis involves formulation of maximum-likelihood estimates (MLEs) of the unknown genetic parameters θ = (p, q, D, μ, a, d, v) and estimation of asymptotic variances of these estimates.
The MLEs of the genetic parameters: Suppose that a random sample of n individuals is taken from the population described above, and let y_{ij} be the phenotypic record of the jth individual within the ith marker genotype group, where i = 1, 2, 3 refers to the marker genotypes MM, Mm, and mm, respectively, j = 1, 2, ..., n_{i}, and n_{1} + n_{2} + n_{3} = n. The log-likelihood of the observed phenotypic records of the quantitative trait and the marker genotypic data given the genetic parameters can be written as
The data in the likelihood function (1) are incomplete because information about the QTL genotypes is missing. Thus, the estimates of the parameters, which maximize Equation 1, can be appropriately formulated following the principles of missing data analysis (Little and Rubin 1987). In the present context, let ξ_{ijk} be a random indicator variate that takes a value of 1 when the individual ij has the kth QTL genotype or 0 otherwise. If ω_{ijk} represents the posterior probability of the individual having the kth QTL genotype given its phenotype y_{ij} and the marker genotype i, it can be easily shown from these definitions and the assumption of Hardy-Weinberg equilibrium at the QTL that Σ_{k=1}^{3} ω_{ijk} = 1, E(ξ_{ijk}) = E(ξ_{ijk}^{2}) = ω_{ijk}, k = 1, 2, 3, and E(ξ_{ijk}ξ_{ijl}) = 0 for k ≠ l.
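For orientation, the mixture structure described above can be written out explicitly. The following is a sketch under stated assumptions, not necessarily the published form of Equation 1: g_{i}(p) denotes the marker genotype frequencies p^{2}, 2p(1 – p), and (1 – p)^{2}; f_{ik}(Q, R) denotes the conditional frequency of the kth QTL genotype within the ith marker genotype; φ denotes the normal density; and G_{1}, G_{2}, G_{3} stand for G_{AA}, G_{Aa}, G_{aa}. The symbols g_{i}, f_{ik}, and φ are introduced here for illustration only.

```latex
\ell(\theta) \;=\; \sum_{i=1}^{3}\sum_{j=1}^{n_i}
  \log\!\Big[\, g_i(p) \sum_{k=1}^{3} f_{ik}(Q,R)\,
  \phi\big(y_{ij};\ \mu + G_k,\ v\big) \Big],
\qquad
\omega_{ijk} \;=\;
  \frac{f_{ik}(Q,R)\,\phi\big(y_{ij};\ \mu + G_k,\ v\big)}
       {\sum_{l=1}^{3} f_{il}(Q,R)\,\phi\big(y_{ij};\ \mu + G_l,\ v\big)} .
```

Under this form the marker genotype term g_{i}(p) separates from the mixture term, which is consistent with the marker genotypes carrying no missing information about p.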
The log-likelihood of the complete data given the model parameters is thus
E-step: Calculate the posterior probability of the jth individual having the kth QTL genotype given its phenotype y_{ij} and the marker genotype i as
M-step: Substitute ω_{ijk} into the following equations:
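The two steps can be sketched in code. This is a minimal illustration under stated assumptions, not a transcription of the article's M-step equations: the updates for Q and R use gene counting that also treats the linkage phase of double heterozygotes as missing, the updates for μ, a, and d are obtained from the weighted means of the three QTL genotype classes, and the starting values are arbitrary. The names em_fit, qtl_probs, and normal_pdf are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def qtl_probs(i, Q, R):
    """P(AA, Aa, aa | marker genotype i), with i = 0 (MM), 1 (Mm), 2 (mm)."""
    if i == 0:
        return (Q * Q, 2 * Q * (1 - Q), (1 - Q) ** 2)
    if i == 1:
        return (Q * R, Q * (1 - R) + R * (1 - Q), (1 - Q) * (1 - R))
    return (R * R, 2 * R * (1 - R), (1 - R) ** 2)


def em_fit(marker, y, max_iter=200, tol=1e-6):
    """marker[j] in {0, 1, 2} codes MM, Mm, mm; y[j] is the phenotype."""
    n = len(y)
    cnt = [marker.count(i) for i in range(3)]
    p = (2 * cnt[0] + cnt[1]) / (2.0 * n)      # marker data are complete
    mean_y = sum(y) / n
    var_y = sum((t - mean_y) ** 2 for t in y) / n
    Q = R = 0.5                                 # arbitrary starting values (assumption)
    mu, a, d, v = mean_y, math.sqrt(var_y), 0.0, var_y
    history = []                                # mixture part of the log-likelihood
    for _ in range(max_iter):
        means = (mu + a - d / 2, mu + d / 2, mu - a - d / 2)
        # P(phase MA/ma | Mm, Aa), needed to assign A to the M or the m chromosome
        phase = Q * (1 - R) / (Q * (1 - R) + R * (1 - Q))
        W, S, SS = [0.0] * 3, [0.0] * 3, [0.0] * 3
        a_on_M = a_on_m = loglik = 0.0
        for i, t in zip(marker, y):             # E-step: posterior weights
            f = qtl_probs(i, Q, R)
            joint = [f[k] * normal_pdf(t, means[k], v) for k in range(3)]
            total = sum(joint)
            loglik += math.log(total)
            w = [x / total for x in joint]
            for k in range(3):
                W[k] += w[k]
                S[k] += w[k] * t
                SS[k] += w[k] * t * t
            if i == 0:
                a_on_M += 2 * w[0] + w[1]
            elif i == 2:
                a_on_m += 2 * w[0] + w[1]
            else:
                a_on_M += w[0] + phase * w[1]
                a_on_m += w[0] + (1 - phase) * w[1]
        history.append(loglik)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
        m = [S[k] / W[k] for k in range(3)]     # M-step: weighted class means
        mu = (m[0] + 2 * m[1] + m[2]) / 4
        a = (m[0] - m[2]) / 2
        d = m[1] - (m[0] + m[2]) / 2
        v = sum(SS[k] - W[k] * m[k] ** 2 for k in range(3)) / n
        Q = a_on_M / (2 * cnt[0] + cnt[1])      # gene counting on M chromosomes
        R = a_on_m / (2 * cnt[2] + cnt[1])      # gene counting on m chromosomes
    est = {"p": p, "q": p * Q + (1 - p) * R, "D": p * (1 - p) * (Q - R),
           "mu": mu, "a": a, "d": d, "v": v}
    return est, history
```

Because the likelihood is unchanged when the labels A and a are exchanged (with the signs of a and D reversed), the sketch starts from a positive additive effect so that A labels the allele increasing the trait value. The mean parameters can be updated from the weighted class means because the three genotypic values map one-to-one onto (μ, a, d).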
Asymptotic variance of the MLEs: Louis (1982) formulated asymptotic variances of MLEs that are obtained from incomplete data. Applying his general equation to the present model yields the observed information matrix given by
Kao and Zeng (1997) used this approach to calculate the asymptotic variances of MLEs in their theoretical investigation of a composite interval mapping model. We found, however, that Louis' approach was inferentially valid only in very limited cases. This is discussed in detail in the numerical analyses section.
A simple but inferentially valid alternative for calculating the asymptotic variances is to calculate the observed information matrix using the formula
Likelihood ratio test of linkage disequilibrium: The maximum-likelihood analysis discussed above provides a statistical test for the presence of linkage disequilibrium between the marker locus and QTL. The hypotheses to be tested are H_{0}: D = 0 and H_{1}: D ≠ 0. The likelihood-ratio (LR) test statistic has the form of
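A likelihood-ratio test of this kind can be sketched as follows. The sketch assumes the usual form of the statistic, twice the difference between the maximized log-likelihoods under H_{1} and H_{0}, and assumes that it is referred to a chi-square distribution with 1 d.f. because H_{0} fixes the single parameter D; neither assumption is taken from the article's Equation 21. The function name lr_test is hypothetical.

```python
import math


def lr_test(loglik_h1, loglik_h0, alpha=0.05):
    """Return the LR statistic, its p-value, and whether H0 is rejected.

    Assumes an asymptotic chi-square reference distribution with 1 d.f.
    """
    lr = max(0.0, 2.0 * (loglik_h1 - loglik_h0))
    # For 1 d.f., P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value, p_value < alpha


# hypothetical maximized log-likelihoods, for illustration only
lr, p_value, reject = lr_test(loglik_h1=-700.0, loglik_h0=-703.0)
print(round(lr, 3), reject)
```

The maximized log-likelihood under H_{0} would be obtained by refitting the model with D fixed at zero, that is, with Q = R = q.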
NUMERICAL ANALYSES
Simulation study: Populations were simulated for 11 different sets of parameters as summarized in Table 2. For each set of parameters, joint genotypes at both the marker locus and the QTL for an individual were sampled from a multinomial distribution with the probability parameters as shown in Table 1 and the given sample size n. The appropriateness of this sampling strategy for simulating the linkage disequilibrium between a polymorphic marker locus and a QTL has been confirmed in Luo (1998). Once the marker-QTL joint genotype was determined, the phenotypic record for an individual was generated as its genotypic value at the QTL plus a random number sampled from a normal distribution of mean zero and variance v. The additive and dominance effects at the QTL were expressed in units of the phenotypic standard deviation, which was fixed at 1.0 in the present study. For simplicity, the population mean of the quantitative character, μ, was set to zero in all simulations. Each parameter set was repeated 100 times.
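The sampling scheme can be sketched as follows. This is a minimal illustration, assuming that the nine joint marker-QTL genotype classes have probabilities equal to the marker genotype frequency multiplied by the conditional QTL genotype frequency under random union of gametes; the function name simulate_sample and the integer genotype codes are hypothetical, and the parameter values in the usage lines are chosen for illustration rather than taken from Table 2.

```python
import random


def simulate_sample(n, p, q, D, mu, a, d, v, seed=None):
    """Draw n individuals; genotypes are coded 0, 1, 2 for MM/Mm/mm and AA/Aa/aa."""
    rng = random.Random(seed)
    Q = q + D / p
    R = q - D / (1 - p)
    marker_freq = (p * p, 2 * p * (1 - p), (1 - p) ** 2)
    qtl_given_marker = (
        (Q * Q, 2 * Q * (1 - Q), (1 - Q) ** 2),
        (Q * R, Q * (1 - R) + R * (1 - Q), (1 - Q) * (1 - R)),
        (R * R, 2 * R * (1 - R), (1 - R) ** 2),
    )
    # nine joint classes (marker genotype, QTL genotype) and their probabilities
    classes = [(i, k) for i in range(3) for k in range(3)]
    weights = [marker_freq[i] * qtl_given_marker[i][k] for i, k in classes]
    genotypic_value = (a - d / 2, d / 2, -a - d / 2)
    draws = rng.choices(classes, weights=weights, k=n)
    marker = [i for i, _ in draws]
    qtl = [k for _, k in draws]
    sd = v ** 0.5
    # phenotype = mean + genotypic value + normal residual with variance v
    y = [mu + genotypic_value[k] + rng.gauss(0.0, sd) for k in qtl]
    return marker, qtl, y


marker, qtl, y = simulate_sample(n=5000, p=0.5, q=0.5, D=0.1,
                                 mu=0.0, a=0.5, d=0.0, v=0.875, seed=7)
```

In the usage lines the residual variance is set to 0.875 so that, with q = 0.5, a = 0.5, and d = 0, the genetic variance 0.125 and the residual variance sum to a phenotypic variance of 1.0.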
Estimation of D using Hill's approach: It may be useful to compare the estimates of D using the approach developed in the present article to those derived when the genotypes at both loci are known. This provides an evaluation of increase in variation of the estimates due to missing genotypic information at one of the two loci. When genotypes at both loci are known, Hill (1974) suggested an iterative procedure to solve the cubic equation
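When genotypes at both loci are observed, an iterative estimate of D can be sketched by gene counting, in which only the double heterozygotes have ambiguous linkage phase. The sketch below is an illustration of this type of iteration under stated assumptions and does not reproduce the cubic equation itself; the function name and the 3 × 3 count layout (rows MM, Mm, mm; columns AA, Aa, aa) are hypothetical.

```python
def gene_counting_D(counts, max_iter=1000, tol=1e-12):
    """Estimate p, q and D from a 3 x 3 table of two-locus genotype counts."""
    n = float(sum(sum(row) for row in counts))
    p = (2 * sum(counts[0]) + sum(counts[1])) / (2 * n)        # frequency of M
    col = [sum(counts[i][k] for i in range(3)) for k in range(3)]
    q = (2 * col[0] + col[1]) / (2 * n)                        # frequency of A
    f_MA = p * q                                               # start from D = 0
    for _ in range(max_iter):
        f_Ma, f_mA = p - f_MA, q - f_MA
        f_ma = 1 - p - q + f_MA
        coupling, repulsion = f_MA * f_ma, f_Ma * f_mA
        total = coupling + repulsion
        # expected share of double heterozygotes that carry an MA gamete
        share = coupling / total if total > 0 else 0.5
        new = (2 * counts[0][0] + counts[0][1] + counts[1][0]
               + counts[1][1] * share) / (2 * n)
        converged = abs(new - f_MA) < tol
        f_MA = new
        if converged:
            break
    return p, q, f_MA - p * q


# expected counts for p = q = 0.5 and D = 0.1 in a sample of 1000 (illustrative)
expected = [[122.5, 105.0, 22.5],
            [105.0, 290.0, 105.0],
            [22.5, 105.0, 122.5]]
p_hat, q_hat, D_hat = gene_counting_D(expected)
```

With the expected counts used above, the iteration returns the value of D from which the counts were generated.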
Asymptotic variances of the MLEs in a mixture distribution model: A mixture of two normal distributions was also simulated to check the appropriateness of the approach suggested by Louis (1982) for calculating asymptotic variances of MLEs obtained using the EM algorithm. The theoretical model of the mixed distributions is the same as Example 4.2 considered in Louis (1982), but with the common variance being unknown here. This provides an analogy to the present estimation problem in which the common residual variance at the QTL was also assumed to be estimated. The EM algorithm for calculating the MLEs of parameters defining the model and the formulation of the asymptotic covariance matrix of the MLEs can be found elsewhere (Louis 1982).
Calculation of statistical powers for detecting the disequilibrium: Simulations were performed to calculate statistical powers for detecting linkage disequilibrium. The parameters used for the statistical power analyses are similar to those considered in the above estimation study except that the QTL effects are at least halved and are expressed in terms of the proportion of phenotypic variance attributable to the QTL for an easy comparison to the previous study (Luo 1998). Each parameter set was repeated 1000 times. Each set of the simulation data was used to perform the LR test as described in Equation 21 and the regression analysis proposed in Luo (1998). The frequency of significant tests among the repeated simulation trials gives the simulated power of each of the two analyses, as has been carried out in Luo (1998). The statistical power was also predicted for the regression analysis using the equations in Luo (1998).
NUMERICAL RESULTS
MLEs of the parameters: Table 3 shows the average values of the MLEs of the genetic parameters over 100 replicates of simulations and their corresponding standard errors. D̂_{h}'s were the estimates of the disequilibrium coefficients that were obtained using genotypic data at both the marker and QTL through Hill's method. D̂_{1}'s were the estimates of the coefficients derived using genotypic data at the marker locus and the phenotypic records of the quantitative trait and using the method developed in the present article.
It can be seen from Table 3 that the MLEs of the disequilibrium coefficient between the marker locus and the trait locus adequately estimated their corresponding actual values in all simulated populations by use of the estimation procedure developed in the present article. Accordingly, the allelic frequencies at the trait locus were also consistently well estimated from their MLEs. As suggested in the above theoretical analysis, the disequilibrium coefficient and the allelic frequency at the trait locus can be estimated using the EM algorithm in which the M-step calculation is implemented by either numerically solving the simultaneous Equations 11 and 12 or using Equations 13 and 14. Comparison was made, using the first replicate of the simulation data for each of the 11 simulated populations, to investigate the difference in the parameter estimates due to these different calculations. It was found that the maximum difference in the converged values of the parameter estimates occurred at the fourth decimal place. Thus, the difference in using these equations may be trivial. Equations 13 and 14 are suggested for their simplicity of numerical calculations.
As expected, the MLE of the marker allele frequency, p, converged after one step of the EM iteration, showing that the data contained no missing information for estimating this component of the parameter vector (Meng and Rubin 1991). D̂_{h} is the MLE of the disequilibrium coefficient obtained from solving the cubic Equation 22, which was based upon observations of genotypes at the two loci. In all sets of simulation data considered here, none was found to yield an estimate of the disequilibrium coefficient beyond its bound. When the estimates of the disequilibrium coefficient obtained using genotypic data at both loci were compared with those obtained when the genotype at one of the loci was not available, the marker-QTL disequilibrium estimates (D̂_{1}) had consistently larger standard errors than the marker-marker disequilibrium estimates (D̂_{h}).
The population mean and the additive and dominance effects at the QTL were well estimated by their MLEs in all simulated populations. Note that when the marker and QTL are in linkage equilibrium, the distribution of the marker genotype is independent of that of the QTL, and thus the marker genotype provides no information about the genotype at the QTL. Under this circumstance, the model of linkage disequilibrium collapses to the general mixture model of normal distributions. In the simulated population 4, in which the marker and QTL were in linkage equilibrium, the algorithm was, however, able to produce accurate estimates for all parameters except the residual variance v. The numerical analyses showed that the MLE of the residual variance would be biased downward when the genetic effect at the QTL was small (population 3) or when the marker and QTL were in linkage equilibrium (population 4).
A critical feature found in the calculation of the asymptotic variances of the MLEs using the method suggested in Louis (1982) was that the covariance matrix of the MLEs is not always guaranteed to be positive definite; that is, some of the variance estimates may be negative. Summarized in Table 4 are the MLEs computed from the simulated samples of a mixture of two normal distributions using the EM algorithm and the asymptotic variances of the MLEs calculated using the equations described in Louis (1982). In the two samples demonstrated here, the parameters defining the mixture model were accurately estimated by the maximum-likelihood procedure, but use of the approach of Louis (1982) yielded negative estimates of the variances of the MLEs. In contrast, when the same data sets were analyzed using the method suggested in the present article, the negative variance estimates were avoided. Also summarized in the table are the coefficients of skewness and kurtosis as well as their sample standard deviations for each of the subpopulations. It is seen that none of these nonnormality parameter estimates was significant, suggesting that the negative variance estimates were not likely due to a violation of the normality requirement by the random sampling process.
Table 5 illustrates the estimates of the asymptotic variances of the MLEs derived from the EM algorithm developed here. The variance estimates were computed in three different ways: the variance of the MLEs in repeated simulations (the empirical estimates), the mean of the variance estimates calculated using the Louis approach, and the average of the variances obtained by inverting the information matrix given by Equation 20. They are listed in three rows accordingly. The integers in the parentheses represent the number of negative estimates of the variances in 100 simulation trials. It can be seen that the Louis estimates of the asymptotic variances had a probability of up to 5% of being negative. In contrast, the equation proposed here for calculating the variances produced no negative estimates. Both of the approaches usually yielded an underestimation of the asymptotic variances; particularly the variances of the residual variance estimates of the QTL effect were consistently biased downward. The amount of the bias of the Louis approach was usually larger than that of the proposed method.
Although the MLE of the marker allele frequency was estimated during the EM iteration, it is not appropriate to calculate its estimated asymptotic variance in the same way as for the parameters whose estimation involves missing information (Meng and Rubin 1991). Taking into account that sampling the marker genotypes was equivalent to a binomial process, the variance of the marker allele frequency estimate can thus be easily calculated.
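The binomial argument can be made explicit. The following is a sketch assuming random union of gametes at the marker locus, so that the 2n marker alleles in a sample of n individuals are independent draws, with n_{1} and n_{2} the numbers of MM and Mm individuals as defined above.

```latex
\hat{p} \;=\; \frac{2n_1 + n_2}{2n},
\qquad
\operatorname{Var}(\hat{p}) \;=\; \frac{p(1-p)}{2n}
\;\approx\; \frac{\hat{p}(1-\hat{p})}{2n}.
```

For example, with n = 500 and p = 0.5 this gives a variance of 0.25/1000 = 0.00025, that is, a standard error of about 0.016.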
Tabulated in Table 6 are statistical powers at α = 0.05 for detecting the linkage disequilibrium. The powers for the nine representative populations were calculated as the frequencies of the significant statistical tests in 1000 repeated simulation trials either using the LR analysis described by Equation 21 in the present article (P_{LR}) or using the regression analysis presented in Luo (1998; P_{RS}). For the populations under consideration here, the powers of the regression analyses (P_{RT}) are also predicted using the equations in Luo (1998). It can be seen that the pattern of change in the simulated powers is parallel between the LR tests and the statistical tests of the regression analysis. Effects of changing the parameters on the power were the same between the two tests and have been discussed elsewhere (Luo 1998). The LR test performed less powerfully than the regression analysis for detecting the disequilibrium.
DISCUSSION
Recent research in human genetics has shown that statistical inference about linkage disequilibrium between polymorphic genetic markers and disease-susceptibility loci offers an effective approach for constructing a high-resolution linkage map of the genetic marker and the genetic disorders (Hastbacka et al. 1992; Kaplan et al. 1995). It has been theoretically but heuristically demonstrated in Kaplan and Weir (1996) that this can be achieved through appropriate statistical modeling of the rate of decay in linkage disequilibrium between the genetic marker and the disease locus, relying on the logic that a close genetic linkage is the only mechanism that maintains the disequilibrium between the two loci in a long-term evolutionary process with or without selection (Hill and Robertson 1966; Luo et al. 1997). The fine-scale linkage disequilibrium map paved the way for success in cloning the gene that causes diastrophic dysplasia in the Finnish population (Hastbacka et al. 1994). However, all of these theoretical and experimental studies share a common limitation: the analyses are feasible only when genotyping at both the marker locus and the disease locus is possible.
Most complex genetic diseases, such as those affecting blood pressure (Jacob et al. 1991), body-mass index (Allison 1997), and behavior traits (Wehner et al. 1997), however, share the common feature of quantitative traits, so that their underlying genetics cannot be precisely genotyped as simple Mendelian characters. This makes it almost impossible to apply the above strategy directly to map genes controlling complex diseases. The quantitative genetics model proposed in the present article can be used to estimate the coefficient of linkage disequilibrium between a marker locus and a locus contributing to quantitative genetic variation in natural populations as well as to estimate the other genetic parameters defining the model. The results of numerical analyses (Table 3) indicate that the theoretical analyses of the present model and statistical algorithm were adequate in computing the MLEs of these genetic parameters. In addition to its appropriateness in analyzing the linkage disequilibrium involving QTL, the model differs from many others in various aspects. Many theoretical studies have been dedicated to estimating the recombination fraction between marker and QTL and the genetic parameters of QTL effects (Lander and Botstein 1989; Zeng 1994). The linkage disequilibria between the marker loci and the linked QTL were assumed in these studies to be created from hybridization between inbred lines. The genetic structure of the segregating populations in these studies provides information on the linkage phase between alleles at the marker and QTL on one hand, and genetic segregation at the marker loci provides direct information about the genotype at the linked QTL on the other. This substantially eases the estimation problem. The present study deals with the much more complicated situation in which no such prior knowledge is available, and thus the analyses have to rely on a larger sample size for accurate estimation of the parameters.
Several researchers have developed theoretical models for detecting linkage and association of polygenic genetic markers to QTL using family data (Allison 1997; Martin et al. 1997). In contrast, the present study focuses on estimation of linkage disequilibrium through analyzing random samples from natural populations. This population-based analysis may be of advantage for collecting sample sizes as large as those considered in the simulation studies.
Efforts were made in this study to evaluate the asymptotic variances of the MLEs, which are important for constructing the confidence intervals of the MLEs. The approach given in Louis (1982) was originally developed for calculating the asymptotic variance-covariance matrix of the MLEs from the EM algorithm. Taking the normal mixture model with a known common variance as an example, Louis (1982) was able to reach a positive definite covariance matrix of the MLEs of parameters. He acknowledged that the numerical example illustrated in his article was carefully chosen so that the chance of the EM algorithm converging to a nonglobal maximum was small. The numerical example was reexamined here but with the common variance of the mixture model to be estimated. It was found that the covariance matrix of the MLEs could fail to be positive definite even though the converged MLEs were in very good agreement with their actual values (Table 4). Estimates of skewness and kurtosis were found to be not significant in the randomly generated samples, and thus the negative variance estimates were unlikely to be due to violation of normality of the random samples. Having treated QTL mapping analysis as a normal mixture model with the residual QTL variance component to be estimated, Kao and Zeng (1997) presented equations of the EM procedure for calculating the relevant MLEs and equations, based on the principles of Louis (1982), for computing the asymptotic variance-covariance matrix of the MLEs. They found several cases in which the predicted variances were not in close approximation to the empirical estimates, which were also calculated from the repeated simulations. However, they did not pursue further the possibility that the asymptotic covariance matrix obtained from the Louis approach is not positive definite.
In fact, the failure in guaranteeing a positive definite variance-covariance matrix of the EM-based MLEs may reflect that the iterative process of the EM algorithm might not always have converged to a (local) maximum but rather to a saddle point, or that the Louis approach might be sensitive to any violation of the normality condition of the likelihood function (Meng and Rubin 1991). However, it is hard to distinguish between these possibilities because no explicit form of the observed information matrix is available before the odd circumstance happens.
A simple but more widely valid equation was suggested in the present article for computing the asymptotic variances of the MLEs in mixed distribution estimation problems. The equation was compared to that of Louis (1982), and the numerical analyses showed that it usually gave a closer approximation to the empirical variance estimates. The equation was also more robust to the ill conditions that might yield negative estimates of the variances. Our numerical examples demonstrated that these negative estimates could occur not only in the present linkage disequilibrium model but also in the normal mixture model that has been widely studied in published reports. This issue is thus a feature of the likelihood function and the properties of the EM algorithm for calculating the MLEs but not a problem of the present model. The numerical evidence raises a general statistical problem: finding an appropriate approach for estimating the asymptotic variances in a mixture model with an unknown common variance.
A statistical test based upon the MLE was proposed to detect the linkage disequilibrium. The power of the LR test was compared to that of the regression test (Luo 1998). The simulation study shows that the LR analysis has a lower power than the regression analysis. This relatively low performance may be due, on the one hand, to the dependence of the power of the LR test on the accuracy in estimating the model parameters. On the other hand, increasing the number of parameters to be estimated may decrease the power of likelihood-based methods (Le Roy and Elsen 1995). Because the statistical model considered in the present study is the same as that studied in Luo (1998), the regression method is suggested for detecting the presence of the disequilibrium, while the estimation method developed here provides the MLE of the disequilibrium coefficient.
The present study, together with that of Luo (1998), builds the path for extending the general idea of disequilibrium-based mapping, which was described in Hastbacka et al. (1992) and Kaplan et al. (1995), to the case where the disease shows quantitative phenotypic variation. In fact, if a significant genetic component of a quantitatively inherited disease is found to be in linkage disequilibrium with a polymorphic marker locus by use of statistical tests (Allison 1997; Martin et al. 1997; Luo 1998), it can be inferred that there may be a major disease gene located in the vicinity of the marker locus (Weeks and Lathrop 1995) when the analyzed sample is randomly drawn from an isolated founder population in which the disease has evolved for quite a few generations. If, moreover, the evolutionary dynamics of the events of recombination between the major disease locus and the genetic marker can also be assumed to follow the simple Luria-Delbruck model (Luria and Delbruck 1943) or the stochastic processes (Kaplan et al. 1995; Xiong and Guo 1997), the present study provides an effective method for calculating the expected distribution of the joint gametic genotypes at the marker locus and the major disease locus in the observed population. This distribution, together with the knowledge of the number of generations since recombination started breaking down the linkage between the two loci as well as of the pattern in which the recombinants evolve, will provide an estimate of the rate of dissipation in the linkage disequilibrium and thus an estimate of the recombination fraction between the marker locus and the disease locus.
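As a simple illustration of the last step, consider the textbook deterministic decay of linkage disequilibrium under random mating. This is a sketch under strong assumptions (no drift, selection, mutation, or migration) and is not the Luria-Delbruck or stochastic treatment cited above; c denotes the recombination fraction, t the number of generations, and D_{0} and D_{t} the initial and current disequilibrium coefficients, all introduced here for illustration.

```latex
D_t \;=\; (1 - c)^{t}\, D_0
\quad\Longrightarrow\quad
c \;=\; 1 - \left(\frac{D_t}{D_0}\right)^{1/t}.
```

For example, if the disequilibrium has declined to half of its initial value over t = 100 generations, then c = 1 – 0.5^{1/100} ≈ 0.0069.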
The numerical analysis presented here considered the situation where the QTL allele had a frequency of not less than 0.3. When a rare allele is considered, effectiveness of the estimation procedure may be affected, because then the distribution pattern of the quantitative variation may provide little information about segregation of the gene in a sample of the natural population unless the rare allele has a large genetic effect. In fact, the probability for observing a significant disequilibrium between a rare allele with small effect and a specific marker gene is actually very low (Luo 1998). Selective genotyping may result in an increased contrast of quantitative effects among different marker classes and may thus be helpful in improving the power of the disequilibrium detection. However, if only those individuals with extreme phenotypes are genotyped, the estimate of the disequilibrium may not be used to infer the level of the linkage disequilibrium in the population from which these individuals are chosen because then it is no longer a random sample of the population under question.
The present study has focused on modeling linkage disequilibrium between a single marker locus and a single QTL. This gives some insight into the feasibility of estimating the linkage disequilibrium when genotypic information at one locus is not available. The model discussed here may seem far from being completely realistic for polygenic inheritance of quantitative variation and for availability of the data on a genetic marker map. Recent data from model organisms, however, indicated that a substantially large proportion of quantitative genetic variation can be attributed to only a few individual QTL or DNA segments (Lai et al. 1994; Wehner et al. 1997). Although theoretical efforts have been made to model linkage disequilibria using data on three marker loci (Brown 1975; Hill 1975), the analyses have been very tedious, and the algebra will become even more complicated when the multilocus system involves QTL. When multiple markers and QTL are considered, the question remains how the multilocus linkage disequilibrium can be modeled in the analysis. It appears necessary to establish new theoretical strategies that may enable the number of model parameters to be reduced to a level at which estimation of these parameters is tractable. The model studied here is thus likely to be a subunit of the sophisticated framework of the multilocus disequilibria analyses and serves as an important step toward a full understanding of the associated inheritance of quantitative trait loci with the molecular genetic markers in natural populations and of its implications in locating complex genetic variations.
Acknowledgments
Drs. W. G. Hill, T. Louis, and C. C. Tan are gratefully acknowledged for their comments and encouragement. I am indebted to Dr. Z-B. Zeng and two anonymous reviewers for their constructive comments and criticisms that have been helpful in improving presentation and clarifying several ambiguities in an earlier version of the present article. This research was supported in part by the National Natural Science Foundation of China, the QiuShi Foundation, and the Shanghai Life Science Center.
APPENDIX
If λ_{ij} is the likelihood of the ijth complete data, x_{ij}, its logarithm can be written as
Calculation of E{S(x_{ij},θ)}: By definition,
Footnotes

Communicating editor: Z-B. Zeng
 Received April 16, 1998.
 Accepted September 14, 1998.
 Copyright © 1999 by the Genetics Society of America