Abstract
For tightly linked loci, cosegregation may lead to nonrandom associations between alleles in a population. Because of its evolutionary relationship with linkage, this phenomenon is called linkage disequilibrium. Today, linkage disequilibriumbased mapping has become a major focus of recent genome research into mapping complex traits. In this article, we present a new statistical method for mapping quantitative trait loci (QTL) of additive, dominant, and epistatic effects in equilibrium natural populations. Our method is based on haplotype analysis of multilocus linkage disequilibrium and exhibits two significant advantages over current disequilibrium mapping methods. First, we have derived closedform solutions for estimating the markerQTL haplotype frequencies within the maximumlikelihood framework implemented by the EM algorithm. The allele frequencies of putative QTL and their linkage disequilibria with the markers are estimated by solving a system of regular equations. This procedure has significantly improved the computational efficiency and the precision of parameter estimation. Second, our method can detect markerQTL disequilibria of different orders and QTL epistatic interactions of various kinds on the basis of a multilocus analysis. This can not only enhance the precision of parameter estimation, but also make it possible to perform wholegenome association studies. We carried out extensive simulation studies to examine the robustness and statistical performance of our method. The application of the new method was validated using a case study from humans, in which we successfully detected significant QTL affecting human body heights. Finally, we discuss the implications of our method for genome projects and its extension to a broader circumstance. The computer program for the method proposed in this article is available at the webpage http://www.ifasstat.ufl.edu/genome/~LD.
DESPITE its significant importance in plant and animal breeding and evolutionary studies, the genetic basis of a quantitatively inherited trait has not been well understood. Two factors have caused this. First, a quantitative trait, or a complex trait, is likely to be controlled by many genes, each having a small effect (Lynch and Walsh 1998). Such genes are difficult to identify using classic genetic approaches. Second, as revealed in a considerable body of literature, the genes underlying a complex trait are not independent in exerting an effect on quantitative variation (reviewed in Mackay 2001). The dependence between alleles at different loci may be due to their linkage on the same chromosome or nonrandom association even if they are on different chromosomes. In particular, the nonrandom association between two linked genes is called linkage (gametic) disequilibrium (Risch and Merikangas 1996; Hudson 2000). Unlinked genes may also be associated if biological processes, such as population differentiation, population admixture, and natural selection, occur in a population. Linkage and nonrandom association leading to cosegregation of different genes affect gamete distribution and frequencies and, thus, are considered to be population genetic parameters (Mackay 2001).
Apart from possible allelic associations, different genes may be mutually dependent when one gene prevents another from manifesting its effect through a particular biochemical pathway (Phillips 1998). Such dependence of genes is referred to as epistasis, a ubiquitous phenomenon in the genetic control of a complex trait. Epistasis, described as a type of gene interaction on phenotypes, possesses the property of a quantitative genetic parameter. Classical estimation of quantitative genetic parameters is based on the resemblance among relatives using a theory established by Fisher (1918), but this approach lacks the power to estimate epistasis because epistasis contributes little to the relative resemblance. A lack of powerful methods for estimating epistasis has diminished its central role in trait evolution and domestication, as claimed by S. Wright and his followers (Wolfet al. 2000).
The advent of complete DNAmarker linkage maps has provided a powerful tool for investigating the genetic architecture of a quantitative trait in plants, animals, or humans (reviewed in Mackay 2001; Barton and Keightley 2002). A number of statistical methods have been proposed to map genes responsible for complex traits, referred to as quantitative trait loci (QTL), based on different genetic designs, different marker types, and different mapping populations (reviewed in Jansen 2000; Hoeschele 2000). Most of these methods were developed to adapt gene segregation patterns in a mapping pedigree derived from a controlled cross and have two significant limits when they are applied on a wider scheme. First, for many species such as humans, it is not possible to obtain controlled crosses and thus their QTL mapping should be based on existing natural populations. Second, many methods assume that epistatic effects due to gene interaction are absent, to facilitate the estimation of quantitative genetic parameters. Given the complexity of quantitative variation, however, the assumption of no epistasis most likely deviates from biological reality (Mackay 2001; Barton and Keightley 2002).
In this article, we present a new statistical method for mapping QTL with epistasis in natural populations. Our analysis is based on an important population property of gene segregation, that is, linkage disequilibrium (Lynch and Walsh 1998; Hudson 2000), although this property made quantitative genetic study more difficult in the past. Linkage causing linkage disequilibrium is used as a basis for fine mapping of human diseases (Templeton 1999) and has been successful in several case studies of gene positional cloning (reviewed in Ardlieet al. 2002). Luo and Suhai (1999), Luo et al. (2000), and Wu et al. (2002) extended disequilibriumbased mapping strategies to map QTL for a quantitative trait using a random sample drawn from a natural population. However, in these studies only the simplest model, based on one marker and one QTL, was formulated. The application of these models to study the genetic architecture of polygenic traits is therefore limited.
Our model presented here uses a multilocus linkage disequilibrium analysis to detect epistatic QTL and estimate the effects of their gene action and interaction on a quantitative trait. Our model can be considered to be comprehensive, because it includes all possible disequilibria among different markers and QTL and considers quantitative gene interaction effects of different kinds. We derive a simple closed form of the expectationmaximization (EM) algorithm for estimating the allelic frequencies and coefficients of linkage disequilibria and gene action and interaction effects. Because of this computational innovation, the estimation precision of QTL population and quantitative parameters from our multilocus model can be significantly improved over that of traditional treatments, despite the fact that the multilocus model contains an increasingly larger number of unknown parameters. Extensive simulations are performed to study the robustness and performance of our multilocus disequilibrium mapping of QTL in a natural population. Our method has been validated by the successful detection of QTL for human body height.
POPULATION GENETIC MODEL
Suppose there is a panmictic natural population, in which m biallelic markers,
Haplotypes, zygote configurations, and zygote genotypes: With the above notation for the markers and QTL and their corresponding alleles, a general multilocus haplotype mix for the markers (M_{1} M_{2}... M_{m}) and the QTL (Q^{1} Q^{2}... Q^{q}) is denoted by
Table 1 describes a matrix for the population frequencies of 27 zygote genotypes at two markers and one QTL, which contain 36 zygote configurations through random combinations of the eight haplotypes. As shown in the table, we use P_{ξ}(ξ= 1,..., 36) to denote the frequencies of different zygote configurations. Because different configurations may produce the same zygote genotype when paternal and maternal gametes have different alleles at the same locus, the number of zygote configurations may be greater than the number of zygote genotypes. For example, zygote genotype
Of the eight threelocus haplotype frequencies contained in the joint probability matrix of Table 1, seven are independent because of the constraint
A more complicated situation is that in which two QTL (
Multiallelic disequilibrium analysis: For outcrossing populations, it is not uncommon to have multiple alleles at a locus. Although the measure of linkage disequilibrium between two multiallelic loci was well defined (Weir 1996), a multiallelic disequilibrium analysis has not been extended to many different loci, because of an increasingly large number of disequilibrium coefficients. Assume that there are u_{k} alleles for marker
Hexagenic linkage disequilibria between four marker nonalleles and two QTL nonalleles are expressed as
QUANTITATIVE GENETIC MODEL
In the above section, we discussed the population properties of markers and QTL and derived the joint probabilities of the QTL genotypes and the marker genotypes on the basis of their zygote formation. We now introduce a genetic model for characterizing the additive, dominant, and epistatic effects of QTL affecting a quantitative trait. We use Mather and Jinks’ (1982) notation to describe the epistasis, under which the genotypic values for any two QTL (
The phenotypic value of a quantitative trait due to these two putative QTL for individual i from a random sample of the population can be written as
For many outcrossing populations, the biallelicQTL genetic model described by Equation 5 may not be sufficient and should be extended to a multiallelicQTL situation. Quantitative genetic models based on multiallelic inheritance have been well discussed in the literature (see Cockerham 1980).
STATISTICAL MODEL
Likelihood of the complete data: In this QTL mapping study, marker data (M) and phenotype data (y) for all n individuals randomly drawn from a natural population are observed data, whereas zygote formation modes, i.e., zygote configurations (g), are unobservable or missing data. The observed data alone are incomplete data, which, along with the missing data, comprise the complete data. For simplicity, the statistical framework is illustrated for the relatively simple twomarker/oneQTL model. This framework can be extended to consider a multiplemarker/multipleQTL model, considering multiple alleles at each locus. The likelihood function of the complete data given the unknown parameters Ω = (Ω_{p}, Ω_{q}) is written, under the twomarker/oneQTL model, as
The likelihood of incomplete data: The data of QTL mapping are incomplete because only marker data and phenotype data can be observed and the data on zygote configurations and QTL genotypes are missing. The likelihood of incomplete data is built upon a mixture model in that each individual is assumed to have arisen from one of these unknown QTL genotypes. Finite mixtures of distributions used to model a wide variety of random phenomena (see McLachlan and Peel 2000) can provide a sound mathematicalbased approach for distinguishing unknown QTL genotypes. With a normal mixture modelbased approach for separating QTL genotypes, the likelihood function of incomplete data including the phenotype (y) and marker information (M) is written as
The estimation of haplotype frequencies is based on the number of a particular haplotype in the population. For the complete data, 36 modes of zygote formation via random combinations of paternal haplotypes and maternal haplotypes are known a priori and, therefore, the number of various markerQTL haplotypes can be directly counted (see Table 1). But for the incomplete data, zygote formation is unobservable and its inference is viewed as a missing data problem. In fact, it is possible to calculate the expected number of a particular markerQTL haplotype for a given marker zygote genotype. Although markerQTL zygote genotypes are also unknown, they can be inferred on the basis of marker information and phenotype data by implementing the EM algorithm. By differentiating Equation 9, we have
Each iteration consists of two steps. First, on the E step, the completedata loglikelihood is averaged over the conditional distribution of the indicator variables given the observed data, using the current estimate of the parameter vector. Since the completedata loglikelihood is linear in these indicator variables, the Estep of the EM algorithm involves simply replacing them by the current values of their conditional expectations, that is, the posterior probabilities of component membership expressed in Equation 11. Next, in the M step, conditional on Π, we solve for the zeros of Equation 10 (likelihood equations) to get our estimates of Ω. The likelihood equation can be split into two terms: The first term refers to the genetic association as specified by haplotype frequencies, and the second term to the phenotypegenotype relationship described by the genotypic means and residual variance. The estimates are then used to update Π in the E step, and the process is repeated until convergence. The values at convergence are the MLEs.
It is interesting to note that the estimation of allelic frequencies and coefficients of linkage disequilibria can be obtained from the closedform solutions of haplotype frequencies. This will largely improve the computational efficiency of these parameters that can only be expressed as polynomial equations using traditional procedures (see Luo and Suhai 1999; Luoet al. 2000; Wuet al. 2002).
Disequilibrium mapping of epistatic QTL: We consider two cases for mapping QTL with epistasis for a quantitative trait based on linkage disequilibrium. First, two epistatic QTL are predicted by two independent pairs of markers. Second, two epistatic QTL are predicted by two pairs of genetically associated markers. In the first case, the two QTL can be assumed to be independent from a population genetic standpoint, although they are not so in terms of quantitative genetic effects. In the second case, the two QTL should be associated genetically and interact epistatically to affect a quantitative trait. In both cases, the EM algorithm proposed for the above twomarker/oneQTL model is extended to estimate both population and quantitative genetic parameters.
Two basic tasks should be completed before the implementation of the EM algorithm for epistasis mapping. The first task is the derivation of an (81 × 9) matrix for joint probabilities of fourmarker/twoQTL zygote genotypes based on zygote configurations. For two independent sets of markers and QTL, such probabilities are simply calculated as the Krockner product of the joint probability for each set, so that there are 16 haplotype frequencies (and 14 independent haplotype frequencies) in this case. But for two dependent sets of loci, the derivations of such probabilities should consider the formation principle of a sixlocus zygote, in which case there are a total of 64 haplotype frequencies, of which 63 are independent. In the first case, we have only digenic and trigenic linkage disequilibria, whereas, in the second case, we must face digenic, trigenic, quadrigenic, pentagenic, and hexagenic disequilibria (see Equation 3).
The second task is the modeling of epistatic QTL with additive, dominant, and additive × additive, additive × dominant, dominant × additive, and dominant × dominant interaction effects (see matrix 4). The estimation of both population (from the first task) and quantitative genetic parameters (from the second task) for epistatic QTL can be obtained using closed forms.
Hypothesis tests: We can test two major hypotheses in the following sequence: (1) the existence of QTL by testing the significance of QTL effects and (2) the association of significant QTL with markers by testing linkage disequilibrium. The test for the existence of a significant QTL may not rely upon the association between the QTL and known marker(s), but the random association of a QTL with markers can be tested only when the existence of the QTL is statistically tested.
The existence of a QTL can be tested on the basis of two alternative hypotheses:
After the existence of the QTL is statistically tested, we formulate a second test for its association with two markers under consideration. The loglikelihood ratio for such a test is calculated by
Other more specific tests for the existence of additive, dominant, or epistatic effect and the association of a QTL with a particular marker can also be formulated similarly. Generally, these hypotheses can be tested on the basis of a critical threshold calculated from permutation tests (Churchill and Doerge 1994). Other approaches as proposed by Piepho (2001) can also be used, which do not require expensive computations.
MONTE CARLO SIMULATION
Simulation scenarios: We have performed extensive simulation studies to investigate the robustness and performance of our linkage disequilibriumbased mapping of quantitative traits. A number of simulation scenarios are performed to compare results under different QTL models (onemarker/oneQTL vs. multiplemarker/multipleQTL), different heritability levels, different sample sizes, and different population genetic parameters (equal vs. extreme allele frequencies at a locus). Marker and QTL genotype data are simulated on the basis of joint frequencies for a given sample size, whereas phenotype data are simulated by summing genotypic values and environmental errors, distributed as N(0, σ^{2}). The residual variance σ^{2} is determined under a given heritability level. The simulated data are analyzed using the genetic model proposed in this article. Each simulation scenario was repeated 200 times to estimate the biases and mean square errors of the MLEs.
To show the advantage of our multilocus disequilibrium model, we first perform the analysis on the basis of a onemarker/oneQTL model (MQ), as used in earlier studies (Luo and Suhai 1999; Luoet al. 2000; Wuet al. 2002). The results from MQ are compared with those obtained from a twomarker/oneQTL model (MQM) and two epistatic QTL models (two separate sets of twomarker/oneQTL models, denoted by MQMMQM, and two dependent sets of twomarker/oneQTL models, denoted by MQMMQM). For each QTL model, we design two different heritabilities (H ^{2} = 0.1 vs. 0.4), two different sample sizes (n = 500 vs. 1000), and two different allele frequencies for a locus. These designs including the hypothesized parameter values are listed in Tables 2, 3, 4.
Results: Compared to the excellent accuracy and precision of the MLEs of marker allele frequencies that were obtained directly from the observed data, the accuracy and precision of the MLEs of QTLallele frequency and disequilibrium and QTL effects are relatively reduced, sometimes seriously. As expected, the MLEs of QTLrelated parameters under all given QTL models from MQ to MQMMQM (Table 2, 3, 4) display increased accuracy and precision with increased heritabilities and sample sizes, although the extent of the increase may be different among the models (Table 5, 6, 7). Generally, a more significant improvement of parameter estimation, especially for QTL allele frequency and disequilibrium, was observed when H ^{2} is increased from 0.1 to 0.4 than when N is increased from 500 to 1000.
Additivedominant model: Under MQ, we consider two situations, one with no disequilibrium between the marker and QTL (MQ1MQ4) and the other with disequilibrium between the two loci (MQ5MQ8). The first situation is similar to the detection of major genes because marker information is actually not used. It is found that the MLEs of all QTLrelated parameters in the second situation are improved over those in the first situation, especially for the dominant effect of the QTL and model mean (Table 5), suggesting the value of utilizing marker information in quantitative genetic research.
In MQM, we first assume that both markers are not associated with a putative QTL (MQM1MQM4). As expected, this has no improvement in parameter estimation compared to MQ1MQ4 (Table 6). Second, we assume that one of the two markers is associated with the QTL (MQM5MQM8), whereas the second marker is not. This design is similar to composite interval mapping in which those markers outside the marker interval carrying a QTL are used as cofactors to control genome background. We did not observe any improvement in parameter estimation from composite disequilibrium mapping (MQM5MQM8; Table 6) over disequilibrium mapping (MQ5MQ8; Table 5).
We also design a scheme in which two markers are associated with each other, both of which are further associated with a putative QTL (MQM9MQM12). Although this kind of mutual association may not lead to a significant improvement for the MLEs of QTL allele frequency and residual variance, the estimation for the model mean, the additive and dominant effects of the QTL, and the residual variance is largely benefitted by a simultaneous use of two associated markers (MQM9MQM12; Table 6).
We examine the effects of different allelic frequencies on parameter estimation. The relative frequencies of two alternative alleles reveal the degree of genetic differentiation in a population. When QTL alleles are less differentiated, as shown by the allele frequencies of 0.3 vs. 0.7 (MQM13MQM16), the precision of the MLEs is not affected as long as more differentiated markers are used (Table 6). We also found that the inclusion of some less differentiated markers did not affect the MLEs of QTL parameters (MQM17MQM20). However, when less differentiated markers are used to predict less differentiated QTL (MQM21MQM24), the estimation precision of QTL effects and model parameters is largely affected, although this may not be true for the MLEs of the population genetic parameters of QTL.
Additivedominantepistasis model: When two QTL interact epistatically, we develop two different multilocus disequilibrium schemes to estimate their epistasis. The first scheme (MQMMQM) permits simultaneous use of two independent sets of markers to predict their corresponding QTL, whereas the second scheme (MQMMQM) assumes two dependent sets of markerQTL relationships. In general, more tightly linked loci tend to have a greater chance for nonrandom association than less tightly linked loci do. Thus, the first scheme is roughly similar to a situation in which two sets of markers and QTL are from different chromosomes, whereas the second scheme is similar to a situation in which two sets of loci are from the same chromosome.
In the first scheme, all genetic parameters can be estimated with reasonable accuracy and precision (Table 7), suggesting that our disequilibriumbased mapping strategy can be used to map epistatic QTL. When two independent marker pairs are used at the same time, the MLEs of QTLallele frequencies and QTLmarker disequilibrium are more precise than when only one marker pair is used. The root meansquare errors of these estimates are ∼0.122 for QTL allele frequencies, 0.033 for QTLmarker digenic disequilibrium, and 0.017 for QTLmarker trigenic disequilibrium under MQMMQM1 when two independent marker sets are used (Table 7). Yet, the corresponding values are 0.161, 0.045, and 0.023 under MQM9 when one marker pair is used (Table 6). But compared to MQM, the estimation precision of QTL additive and dominant effects is reduced for MQMMQM (Table 7).
In terms of estimation precision, more precise MLEs of the additive × additive interaction effect between two different QTL than those for the additive effect at a single QTL are obtained, whereas the additive × dominant or dominant × additive interaction effect is similar to the dominant effect, but the dominant × dominant interaction effect is poorer than the dominant effect (Table 7). As expected, the MLE of the dominant × dominant interaction effect may be quite imprecise. However, when both sample size and heritability are significantly increased, the MLEs of the dominant × dominant effect can be close to those of other parameters (Table 7).
In the second scheme, the precision of the MLEs of all parameters is reduced, compared to the first scheme, because of the inclusion of many unknowns. We perform an additional simulation under H ^{2} = 0.4 for the second scheme by increasing the sample size from 1000 to 2000 to 5000, keeping the other parameters unchanged. The precision of the MLEs of population genetic parameters remained approximately constant with increased sample sizes (data not given), but the MLEs of gene effects including the additive, dominant, and epistatic effects displayed significantly improved precision when the sample size is increased (Table 8). When N = 2000, the estimation of the additive and additive × additive interaction effects achieves acceptable precision under the complex MQMMQM model. The estimation of the other effects is certainly acceptable when a sample size is 5000 (Table 8).
A CASE STUDY FROM HUMANS
A case study from a human genome project for mapping obesity genes is used to validate our method. The subjects sampled consisted of 643 women from the National Heart, Lung, and Blood Institutefunded Woman’s Ischemic Syndrome Evaluation (WISE) study (J. A. Johnson, unpublished results). Women included in the WISE study were >18 years old and consist of 105 blacks and 538 whites, each measured for body height and other traits. However, as an example, only body height is analyzed here. All study participants provided informed written consent prior to participation in the study. WISE study protocols were approved by the Institutional Review Boards of the participants’ institutions.
Genotypes for the women studied were determined using Orchid BioScience’s proprietary primer extension technology (SNPIT). A total of 10 singlenucleotide polymorphisms (SNPs) were genotyped at six candidate genes for obesity on different chromosomes, which are ADRB1, ADRB2, ADRB3, ADRA1A, the G_{S} protein αsubunit (GNAS1), and the G protein β3 subunit (GNB3; J. A. Johnson, unpublished results). The ADRB1 gene contains two common nonsynonymous polymorphisms at codon 49 (Ser → Gly) and codon 389 (Arg → Gly). At the ADRB2 gene, there are three polymorphism sites at codon 19 of the β_{2}AR 5′ leader cistron (Cys → Arg), codon 16 (Gly → Arg), and codon 27 (Gln → Gly). Two sites at codon 131 and codon 371 were genotyped for the GNAS1 gene.
Our newly developed statistical method is used to perform the association studies between SNPs and body heights in two different populations. According to Mackay (2001), the gene for a quantitative trait detected by a SNP is called a quantitative trait nucleotide (QTN). We found different QTN that affect human body heights between the black and white populations (Table 9). Two SNPs, Gly16Arg and Cys19Arg, at gene ADRB2 can identify significant QTN in the black population, although the significance of the QTL detected by the second SNP is marginal (Table 9). In fact, the QTL detected by these two SNPs may be identical for two reasons. First, the sum of the allele frequencies of the QTN identified is one, implying that variants detected by the two SNPs within the same gene are actually two alternative alleles. This alternative relationship is confirmed by different signs of disequilibrium coefficients. Second, the QTN displays a similar genetic effect on body height, with different directions for the additive effect as expected. This blackspecific height QTN can also be identified by combining one of these two SNPs with SNP C825T at gene GNB3.
In the white population, a significant QTN was detected by SNP C835T at gene GNB3 (Table 9). As shown by allele frequencies, linkage disequilibria, and genetic effects, we conjecture that the same QTN has been detected by combining this SNP with other SNPs. Although body height QTNs may be population specific, they can be detected in both populations when some common SNPs are combined.
DISCUSSION
Since its successful use in mapping human diseases (Hästbacka et al. 1992, 1994; Jordeet al. 1993), linkage disequilibriumbased mapping methods have become one of the most active areas in current theoretical and empirical genomic research (Allison 1997; Xiong and Guo 1997; Collins and Morton 1998; Terwilliger and Weiss 1998; Kruglyak 1999; Meuwissen and Goddard 2000; Pritchard and Przeworski 2001; Reichet al. 2001; Weiss and Clark 2002). A major advantage of disequilibrium mapping is that it can potentially provide highresolution mapping of actual genes responsible for variation of biomedically important diseases by exploiting their associations with adjacent markers (Haley 1999; Zaykinet al. 2002). The intense interest in linkage disequilibrium mapping is further stimulated by the advent of highdensity SNP maps (Wanget al. 1998; Sachidanandamet al. 2001) and an inherited paucity of power for narrowing the genomic interval carrying disease genes for traditional pedigreebased linkage analyses (Ardlieet al. 2002).
Linkage disequilibrium mapping makes use of cumulative crossovers between markers and the target gene during a large number of generations since the first appearance of the disequilibrium. However, current applications of linkage disequilibrium are limited to genetic diseases in humans that are fairly simple, monogenic, and obey the rule of Mendelian inheritance. In this article, we present a maximumlikelihoodbased method for multilocus linkage disequilibrium mapping of complex traits that are controlled by many, genetically interacting, genes. The motivation of our study includes three aspects. First, current disequilibriumbased mapping is based mostly on a onemarker/onegene model, although Terwillinger (1995) and Garner and Slatkin (2002) proposed a likelihoodbased method for twomarker haplotype data in disease mapping. The theoretical principle of the onemarker/onegene model has been extended for fine mapping QTL for quantitative traits (Luo and Suhai 1999; Luoet al. 2000; Wuet al. 2002). However, singlemarker likelihood analysis does not adequately make use of information from multiple cosegregated markers. We extend singlemarker analysis to multilocus linkage disequilibrium mapping. As demonstrated by the simulations, such an extended model provides a higher resolution of QTL localization by more precisely estimating the linkage disequilibrium between the QTL and markers.
Second, in many organisms, including humans, quantitatively inherited traits are economically or biomedically important. Yet, their inheritance mode can be complex and likely includes multiple genes with interactions at both the gene and population levels (Mackay 2001; Barton and Keightley 2002). Given the genetic complexity of a quantitative trait, the current onemarker/oneQTL model of Luo and Suhai (1999), Luo et al. (2000), and Wu et al. (2002) may be oversimplified. Our method can incorporate the complex inheritance of a quantitative trait by including multiple QTL in the mapping framework. The degree of interaction between two different QTL at the population level is expressed as the coefficient of linkage disequilibrium. With linkage disequilibrium, the segregation of one QTL will depend on the segregation of other QTL. The interaction at the gene level is the epistasis that triggers a direct effect on the phenotype. Gene association and gene interaction have presented much difficulty in genetic research. The statistical method proposed in this study displays power to estimate both linkage disequilibrium and epistasis, although their precise estimation relies upon a large sample size and/or high heritability level. Indeed, given the possible contributions of these two phenomena to the genetic architecture of a complex trait (Mackay 2001; Barton and Keightley 2002), a reasonable investment in the maintenance of a huge mapping population and its careful management to reduce environmental noises should be well rewarded. Also, with our method and such a mapping population, we will be able to address the current debate, initiated by R. A. Fisher and S. Wright a long time ago, on the role of epistasis in trait development and evolution (Wolfet al. 2000).
Third, one of the critical problems that has not been solved well in previous linkage disequilibrium mapping is the estimation of QTL allele frequencies and disequilibrium coefficients. Unlike gene effect parameters, these population parameters have no explicit closed forms for their solutions in the M step of the EM algorithm. The estimation of these two parameters in previous studies (Luo and Suhai 1999; Luoet al. 2000; Wuet al. 2002) relied upon the resolutions of polynomial loglikelihood equations by other numerical methods, such as the NewtonRaphson iteration. However, this is not efficient from a computational perspective and has difficulty in providing precise estimates of the population parameters. In this article we have derived a closedform solution for estimating haplotype frequencies, from which the MLEs of QTL allele frequency and linkage disequilibrium are obtained. This theoretical derivation can be regarded as one of our major contributions to linkage disequilibrium mapping and will likely make our method more attractive to realworld applications.
In analyzing two epistatic QTL using multiple markers, we presented two schemes by assuming, respectively, that the two QTL interact at the gene level (epistasis), but not at the population level (no disequilibrium), and that the two QTL interact at both the gene and population levels. Our simulation results suggest that the population and quantitative genetic parameters of QTL in the first epistasis scheme can be estimated with reasonable precision. More interesting, the MLEs of the population genetic parameters for two nonassociated but epistatic QTL are more precise than when only a single QTL is modeled. This result may have practical implications for selecting optimal marker combinations in disequilibrium mapping for a complex trait. However, our simulation results also show that parameter estimation for QTL in the second epistasis scheme has reduced precision because of the inclusion of too many known parameters. To overcome this problem in the second scheme, the use of a much larger sample size is recommended. Other strategies include the development of an analytical method that can typically handle a high dimension of parameter space, like Markov chain Monte Carlo algorithms (MCMC; Robert and Casella 1999).
The strengths of the Bayesian approach for haplotype analysis have been demonstrated by Liu et al. (2001) and Niu et al. (2002). First, it can make a reasonable inference of haplotypes comprising a large number of markers, larger than can be handled by the EM algorithm (Zhanget al. 2001). Second, the Bayesian approach, implemented with MCMC, is powerful in constructing ancestral haplotypes by considering multiple founders, uncertainty of linkage phases, and incompletion of marker data (Liuet al. 2001). We expect that a similar Bayesian framework can be developed to map QTL based on linkage disequilibrium analysis. However, our method based on the EM algorithm is still needed. It can efficiently analyze a relatively small number of haplotypes and provide accurate information about haplotype construction. Moreover, the results from the EM algorithm can provide prior information for the Bayesian approach with capacity to solve more complicated problems.
Our method can also be modified in other areas. First, we assume an equilibrium population. When this assumption is violated, as is often observed in practice, the introduction of the coefficient of HardyWeinberg disequilibrium is necessary. Recently, Yang (2000, 2002) proposed a multilocus statistic to examine zygotic associations in nonequilibrium populations. The disequilibria of different degrees due to a single locus or multiple loci can be summarized in such a statistic. It would be interesting to incorporate Yang’s zygotic association theory to map QTL in an arbitrary natural population. Second, our method can estimate only linkage disequilibria; however, linkage disequilibrium alone may not be effective to infer the linkage between the marker and QTL. Wu and Zeng (2001) have developed a new statistical strategy for simultaneously estimating linkage and linkage disequilibrium, and this strategy shows great potential for fine mapping of quantitative trait loci in natural populations (Wuet al. 2002). Once our current method is integrated with Wu and Zeng’s model, we can test the relationship between linkage and linkage disequilibrium and make linkage disequilibrium mapping a more predictable and powerful approach.
Acknowledgments
This work is partly supported by an Outstanding Young Investigator Award of the National Science Foundation (NSF) of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense grant (722206112) to R.W., NSF of China grant (30000097) to X.Y. Lou, and NSF grant (9971586) to G.C. The collection of the SNP and human body height data used in this study was supported by grants from the National Institutes of Health (NIH; HL65729) and Orchid Biosciences to J.A.J. and from the NIH (HL64924) to C.J.P. The publication of this manuscript was approved as journal series no. R09224 by the Florida Agricultural Experiment Station.
Footnotes

Communicating editor: C. Haley
 Received October 7, 2002.
 Accepted January 16, 2003.
 Copyright © 2003 by the Genetics Society of America