| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Corresponding author: Rongling Wu, 533 McCarty Hall C, University of Florida, Gainesville, FL 32611., rwu{at}stat.ufl.edu (E-mail)
Communicating editor: C. HALEY
| ABSTRACT |
|---|
For tightly linked loci, cosegregation may lead to nonrandom associations between alleles in a population. Because of its evolutionary relationship with linkage, this phenomenon is called linkage disequilibrium. Today, linkage disequilibrium-based mapping has become a major focus of recent genome research into mapping complex traits. In this article, we present a new statistical method for mapping quantitative trait loci (QTL) of additive, dominant, and epistatic effects in equilibrium natural populations. Our method is based on haplotype analysis of multilocus linkage disequilibrium and exhibits two significant advantages over current disequilibrium mapping methods. First, we have derived closed-form solutions for estimating the marker-QTL haplotype frequencies within the maximum-likelihood framework implemented by the EM algorithm. The allele frequencies of putative QTL and their linkage disequilibria with the markers are estimated by solving a system of regular equations. This procedure has significantly improved the computational efficiency and the precision of parameter estimation. Second, our method can detect marker-QTL disequilibria of different orders and QTL epistatic interactions of various kinds on the basis of a multilocus analysis. This can not only enhance the precision of parameter estimation, but also make it possible to perform whole-genome association studies. We carried out extensive simulation studies to examine the robustness and statistical performance of our method. The application of the new method was validated using a case study from humans, in which we successfully detected significant QTL affecting human body heights. Finally, we discuss the implications of our method for genome projects and its extension to a broader circumstance. The computer program for the method proposed in this article is available at the webpage http://www.ifasstat.ufl.edu/genome/~LD.
DESPITE its significant importance in plant and animal breeding and evolutionary studies, the genetic basis of a quantitatively inherited trait has not been well understood. Two factors have caused this. First, a quantitative trait, or a complex trait, is likely to be controlled by many genes, each having a small effect (![]()
![]()
![]()
![]()
![]()
Apart from possible allelic associations, different genes may be mutually dependent when one gene prevents another from manifesting its effect through a particular biochemical pathway (![]()
![]()
![]()
The advent of complete DNA-marker linkage maps has provided a powerful tool for investigating the genetic architecture of a quantitative trait in plants, animals, or humans (reviewed in ![]()
![]()
![]()
![]()
![]()
![]()
In this article, we present a new statistical method for mapping QTL with epistasis in natural populations. Our analysis is based on an important population property of gene segregation, that is, linkage disequilibrium (![]()
![]()
![]()
![]()
![]()
![]()
![]()
Our model presented here uses a multilocus linkage disequilibrium analysis to detect epistatic QTL and estimate the effects of their gene action and interaction on a quantitative trait. Our model can be considered to be comprehensive, because it includes all possible disequilibria among different markers and QTL and considers quantitative gene interaction effects of different kinds. We derive a simple closed form of the expectation-maximization (EM) algorithm for estimating the allelic frequencies and coefficients of linkage disequilibria and gene action and interaction effects. Because of this computational innovation, the estimation precision of QTL population and quantitative parameters from our multilocus model can be significantly improved over that of traditional treatments, despite the fact that the multilocus model contains an increasingly larger number of unknown parameters. Extensive simulations are performed to study the robustness and performance of our multilocus disequilibrium mapping of QTL in a natural population. Our method has been validated by the successful detection of QTL for human body height.
| POPULATION GENETIC MODEL |
|---|
Suppose there is a panmictic natural population, in which m biallelic markers,
1,
2, ... ,
m, and q biallelic QTL,
1,
2, ...
q, are segregating. For the sake of distinction, we use the subscripts to denote the markers and the superscripts to denote the QTL. Two alleles at a locus are denoted by M1k and M2k, or indexed by rk (rk = 1, 2), for marker
k, and Ql1 and Ql2, or indexed by sl (sl = 1, 2), for QTL
l. This population is assumed to be in disequilibrium, so that all markers and QTL are associated with one another. We denote Dk1k2 as the coefficient of digenic linkage disequilibrium between markers
k1 and
k2, Dl1l2 as the coefficient of digenic linkage disequilibrium between QTL
l1 and
l2 and Dl1k1 as the coefficient of digenic linkage disequilibrium between marker
k1 and QTL
l1. Similarly, the coefficients of trigenic (Dl1k1k2 and Dl1l2k1), quadrigenic (Dl1l2k1k2), or higher-order linkage disequilibrium can be defined.
Haplotypes, zygote configurations, and zygote genotypes:
With the above notation for the markers and QTL and their corresponding alleles, a general multilocus haplotype mix for the markers (M1 M2 ... Mm) and the QTL (Q1 Q2 ... Qq) is denoted by gs1s2...sqr1r2...rm (r1, ... , rm; s1, ... , sq = 1, 2), with population frequency expressed as ps1s2...sqr1r2...rm. This haplotype frequency can be divided into the allele frequencies and a set of disequilibrium coefficients of different orders (![]()
k)/one-QTL (
l) haplotype frequency is divided into two parts, no disequilibrium and digenic disequilibrium, expressed as
![]() |
(1) |
From now on we omit the symbols k for markers and l for QTL for simplicity, unless they are needed. A two-marker/one-QTL haplotype frequency is expressed as
![]() |
(2) |
When four or more genetic loci are associated, disequilibria of one set of loci may interact with those of other sets to form so-called multiplicative disequilibria (![]()
![]()
![]() |
(3) |
For the m markers and q QTL, a total of 2m+q haplotypes randomly unite to generate 3m+q zygote genotypes containing 2m+q(2m+q + 1)/2 zygote configurations or modes of zygote formation via different combinations of paternal and maternal gametes. The number of zygote genotypes is smaller than that of zygote configurations because different configurations may generate the same zygote genotype. A zygote genotype can be generally formulated as

where {·}{·} are used to separate two different loci, and r1
r'1, ... , rm
r'm,s1
s'1, ... ,sq
s'q = 1, 2 are the two alternative alleles of a zygote at the same locus. The joint frequency of a zygote for all markers and QTL is expressed as

At Hardy-Weinberg equilibrium, the population frequency of a (m + q)-locus zygotic configuration is calculated as the product of the two corresponding haplotype frequencies (![]()
Table 1 describes a matrix for the population frequencies of 27 zygote genotypes at two markers and one QTL, which contain 36 zygote configurations through random combinations of the eight haplotypes. As shown in the table, we use P
(
= 1, ... , 36) to denote the frequencies of different zygote configurations. Because different configurations may produce the same zygote genotype when paternal and maternal gametes have different alleles at the same locus, the number of zygote configurations may be greater than the number of zygote genotypes. For example, zygote genotype G{12}{11}{12} is generated via configuration g111 x g212 or g211 x g112.
|
Of the eight three-locus haplotype frequencies contained in the joint probability matrix of Table 1, seven are independent because of the constraint
. These seven independent frequencies, arrayed by
, are composed of three allele frequencies (pr1,pr2,ps), three digenic linkage disequilibria (Dk1k2,Dlk1,Dlk2), and one trigenic linkage disequilibrium (Dlk1k2). We develop a statistical algorithm to estiate
p and ultimately estimate those population genetic parameters describing allelic association and population evolutionary history.
A more complicated situation is that in which two QTL (
l1 and
l2) are detected by two pairs of markers (
k1 -
k2 and
k3 -
k4). We permit these six loci to be mutually associated in the population. A total of 64 (26) six-locus haplotypes randomly unite to generate 729 (36) six-locus zygotes whose population frequencies are arrayed by matrix
. If these six loci contain two independent sets each with two markers and one QTL, we have
where
is the Krockner product. If the two sets are not independent, however, the joint zygote probabilities of four markers and two QTL should be derived on the basis of different zygote configurations via random combinations of the 64 haplotypes. Summing to 1, the 64 six-locus haplotype frequencies contain 63 independent frequencies (arrayed as
p), which are expressed in terms of an equal number of composite parameters, six-allele frequencies, and 57 coefficients of linkage disequilibria of different orders.
Multiallelic disequilibrium analysis:
For outcrossing populations, it is not uncommon to have multiple alleles at a locus. Although the measure of linkage disequilibrium between two multiallelic loci was well defined (![]()
k and vl for QTL
l. According to ![]()

with constraints
Trigenic linkage disequilibrium among two marker alleles rk1 and rk2 at markers
k1 and
k2 and one QTL allele sl at QTL
l is expressed as

with constraints
.
Hexagenic linkage disequilibria between four marker nonalleles and two QTL nonalleles are expressed as


with constraints
| QUANTITATIVE GENETIC MODEL |
|---|
In the above section, we discussed the population properties of markers and QTL and derived the joint probabilities of the QTL genotypes and the marker genotypes on the basis of their zygote formation. We now introduce a genetic model for characterizing the additive, dominant, and epistatic effects of QTL affecting a quantitative trait. We use MATHER and JINKS' (1982) notation to describe the epistasis, under which the genotypic values for any two QTL (
l1 and
l2) are expressed as
![]() |
(4) |
where µ is the overall mean; al1, al2, dl1, and dl2 are the additive and dominant effects of the two QTL, respectively; and il1l2, jl1l2, kl1l2, and ll1l2 are the additive x additive, additive x dominant, dominant x additive, and dominant x dominant epistatic effects between the two QTL, respectively.
The phenotypic value of a quantitative trait due to these two putative QTL for individual i from a random sample of the population can be written as
![]() |
(5) |
where xiS1S'1S2S'2 is an indicator variable, defined as 1 if individual i has QTL genotype Qs1Qs'1Qs2Qs'2 and 0 otherwise, and ei is the residual error, distributed as N(0,
2). It is possible that each QTL genotype has a different residual variance. Since there are nine QTL genotypes, nine independent genotypic values arrayed by {µ{s1s'1}{s2s'2}}1x9 can be defined as in (4). These nine genotypic values are composed of nine unknown quantitative genetic parameters: one overall mean (µ), two additive {al1, al2}, two dominant {dl1, dl2}, and four epistatic effects {il1l2, jl1l2, kl1l2, ll1l2}. The nine genotypic values and residual variance comprise a quantitative genetic parameter vector arrayed by
q = {µ{11}{11}, ... , µ{22}{22},
2}, which can be estimated by incorporating the population genetic properties of the QTL within a mapping framework.
For many outcrossing populations, the biallelic-QTL genetic model described by Equation 5 may not be sufficient and should be extended to a multiallelic-QTL situation. Quantitative genetic models based on multiallelic inheritance have been well discussed in the literature (see ![]()
| STATISTICAL MODEL |
|---|
Likelihood of the complete data:
In this QTL mapping study, marker data (M) and phenotype data (y) for all n individuals randomly drawn from a natural population are observed data, whereas zygote formation modes, i.e., zygote configurations (g), are unobservable or missing data. The observed data alone are incomplete data, which, along with the missing data, comprise the complete data. For simplicity, the statistical framework is illustrated for the relatively simple two-marker/one-QTL model. This framework can be extended to consider a multiple-marker/multiple-QTL model, considering multiple alleles at each locus. The likelihood function of the complete data given the unknown parameters
= (
p,
q) is written, under the two-marker/one-QTL model, as
![]() |
(6) |
or
![]() |
(7) |
where n{r1r'1}{r2r'2} is the observed number of individuals with marker genotype Mr1Mr'1Mr2Mr'2,


and f{ss'}(yi) is a normal density of observation yi with a particular QTL genotype QsQs' having mean µ{ss'} and variance
2. The maximization of the likelihood (7) is achieved by differentiating the log-likelihood with respect to
, which leads to
![]() |
(8) |
where
By letting the score (Equation 8) equal zero and solving the log-likelihood equation, we obtain the maximum-likelihood estimates (MLE) of the unknown parameters as

where
denotes the sample size of a particular zygote configuration
(
= 1, ... , 36) as defined in Table 1. For the complete data,
is observable. Based on the invariance property of maximum-likelihood estimates, the MLEs of allelic frequencies, the coefficients of linkage disequilibrium, and QTL effects can be obtained from

The likelihood of incomplete data:
The data of QTL mapping are incomplete because only marker data and phenotype data can be observed and the data on zygote configurations and QTL genotypes are missing. The likelihood of incomplete data is built upon a mixture model in that each individual is assumed to have arisen from one of these unknown QTL genotypes. Finite mixtures of distributions used to model a wide variety of random phenomena (see ![]()
![]() |
(9) |
where Prob(G{r1r'1}{r2r'2}) is the population frequency of marker genotypes at the two markers,

and the other notation is defined as above.
The estimation of haplotype frequencies is based on the number of a particular haplotype in the population. For the complete data, 36 modes of zygote formation via random combinations of paternal haplotypes and maternal haplotypes are known a priori and, therefore, the number of various marker-QTL haplotypes can be directly counted (see Table 1). But for the incomplete data, zygote formation is unobservable and its inference is viewed as a missing data problem. In fact, it is possible to calculate the expected number of a particular marker-QTL haplotype for a given marker zygote genotype. Although marker-QTL zygote genotypes are also unknown, they can be inferred on the basis of marker information and phenotype data by implementing the EM algorithm. By differentiating Equation 9, we have
![]() |
(10) |
where
![]() |
(11) |
which could be thought of as a posterior probability that individual i has a marker-QTL zygote configuration
given marker zygote genotype G{r1r'1}{r2r'2}. We then implement the EM algorithm (![]()
,
}, where
Each iteration consists of two steps. First, on the E step, the complete-data log-likelihood is averaged over the conditional distribution of the indicator variables given the observed data, using the current estimate of the parameter vector. Since the complete-data log-likelihood is linear in these indicator variables, the E-step of the EM algorithm involves simply replacing them by the current values of their conditional expectations, that is, the posterior probabilities of component membership expressed in Equation 11. Next, in the M step, conditional on
, we solve for the zeros of Equation 10 (likelihood equations) to get our estimates of
. The likelihood equation can be split into two terms: The first term refers to the genetic association as specified by haplotype frequencies, and the second term to the phenotype-genotype relationship described by the genotypic means and residual variance. The estimates are then used to update
in the E step, and the process is repeated until convergence. The values at convergence are the MLEs.
It is interesting to note that the estimation of allelic frequencies and coefficients of linkage disequilibria can be obtained from the closed-form solutions of haplotype frequencies. This will largely improve the computational efficiency of these parameters that can only be expressed as polynomial equations using traditional procedures (see ![]()
![]()
![]()
Disequilibrium mapping of epistatic QTL:
We consider two cases for mapping QTL with epistasis for a quantitative trait based on linkage disequilibrium. First, two epistatic QTL are predicted by two independent pairs of markers. Second, two epistatic QTL are predicted by two pairs of genetically associated markers. In the first case, the two QTL can be assumed to be independent from a population genetic standpoint, although they are not so in terms of quantitative genetic effects. In the second case, the two QTL should be associated genetically and interact epistatically to affect a quantitative trait. In both cases, the EM algorithm proposed for the above two-marker/one-QTL model is extended to estimate both population and quantitative genetic parameters.
Two basic tasks should be completed before the implementation of the EM algorithm for epistasis mapping. The first task is the derivation of an (81 x 9) matrix for joint probabilities of four-marker/two-QTL zygote genotypes based on zygote configurations. For two independent sets of markers and QTL, such probabilities are simply calculated as the Krockner product of the joint probability for each set, so that there are 16 haplotype frequencies (and 14 independent haplotype frequencies) in this case. But for two dependent sets of loci, the derivations of such probabilities should consider the formation principle of a six-locus zygote, in which case there are a total of 64 haplotype frequencies, of which 63 are independent. In the first case, we have only digenic and trigenic linkage disequilibria, whereas, in the second case, we must face digenic, trigenic, quadrigenic, pentagenic, and hexagenic disequilibria (see Equation 3).
The second task is the modeling of epistatic QTL with additive, dominant, and additive x additive, additive x dominant, dominant x additive, and dominant x dominant interaction effects (see matrix 4). The estimation of both population (from the first task) and quantitative genetic parameters (from the second task) for epistatic QTL can be obtained using closed forms.
Hypothesis tests:
We can test two major hypotheses in the following sequence: (1) the existence of QTL by testing the significance of QTL effects and (2) the association of significant QTL with markers by testing linkage disequilibrium. The test for the existence of a significant QTL may not rely upon the association between the QTL and known marker(s), but the random association of a QTL with markers can be tested only when the existence of the QTL is statistically tested.
The existence of a QTL can be tested on the basis of two alternative hypotheses:
![]() |
(12) |
The log-likelihood-ratio test statistic for the existence of a QTL is calculated by comparing the likelihood values under H1 (full model) and H0 (reduced model), using
![]() |
(13) |
where the tilde denotes the MLEs of unknown parameters under H0 and the other notation is defined as above. The LRQ is considered to asymptotically follow a
2 distribution with the degrees of freedom equaling the difference of the number of unknown parameters to be estimated under the two hypotheses. However, the approximation of a
2 distribution may be inappropriate when some regularity conditions are violated. The permutation test approach proposed by ![]()
After the existence of the QTL is statistically tested, we formulate a second test for its association with two markers under consideration. The log-likelihood ratio for such a test is calculated by LRQ
![]() |
(14) |
which is asymptotically distributed as a
2 distribution with 3 d.f.
Other more specific tests for the existence of additive, dominant, or epistatic effect and the association of a QTL with a particular marker can also be formulated similarly. Generally, these hypotheses can be tested on the basis of a critical threshold calculated from permutation tests (![]()
![]()
| MONTE CARLO SIMULATION |
|---|
Simulation scenarios:
We have performed extensive simulation studies to investigate the robustness and performance of our linkage disequilibrium-based mapping of quantitative traits. A number of simulation scenarios are performed to compare results under different QTL models (one-marker/one-QTL vs. multiple-marker/multiple-QTL), different heritability levels, different sample sizes, and different population genetic parameters (equal vs. extreme allele frequencies at a locus). Marker and QTL genotype data are simulated on the basis of joint frequencies for a given sample size, whereas phenotype data are simulated by summing genotypic values and environmental errors, distributed as N(0,
2). The residual variance
2 is determined under a given heritability level. The simulated data are analyzed using the genetic model proposed in this article. Each simulation scenario was repeated 200 times to estimate the biases and mean square errors of the MLEs.
To show the advantage of our multilocus disequilibrium model, we first perform the analysis on the basis of a one-marker/one-QTL model (MQ), as used in earlier studies (![]()
![]()
![]()
|
|
|
Results:
Compared to the excellent accuracy and precision of the MLEs of marker allele frequencies that were obtained directly from the observed data, the accuracy and precision of the MLEs of QTL-allele frequency and disequilibrium and QTL effects are relatively reduced, sometimes seriously. As expected, the MLEs of QTL-related parameters under all given QTL models from MQ to MQMMQM (Table 2 Table 3 Table 4) display increased accuracy and precision with increased heritabilities and sample sizes, although the extent of the increase may be different among the models (Table 5 Table 6 Table 7). Generally, a more significant improvement of parameter estimation, especially for QTL allele frequency and disequilibrium, was observed when H2 is increased from 0.1 to 0.4 than when N is increased from 500 to 1000.
|
|
|
Additive-dominant model: Under MQ, we consider two situations, one with no disequilibrium between the marker and QTL (MQ1MQ4) and the other with disequilibrium between the two loci (MQ5MQ8). The first situation is similar to the detection of major genes because marker information is actually not used. It is found that the MLEs of all QTL-related parameters in the second situation are improved over those in the first situation, especially for the dominant effect of the QTL and model mean (Table 5), suggesting the value of utilizing marker information in quantitative genetic research.
In MQM, we first assume that both markers are not associated with a putative QTL (MQM1MQM4). As expected, this has no improvement in parameter estimation compared to MQ1MQ4 (Table 6). Second, we assume that one of the two markers is associated with the QTL (MQM5MQM8), whereas the second marker is not. This design is similar to composite interval mapping in which those markers outside the marker interval carrying a QTL are used as cofactors to control genome background. We did not observe any improvement in parameter estimation from composite disequilibrium mapping (MQM5MQM8; Table 6) over disequilibrium mapping (MQ5MQ8; Table 5).
We also design a scheme in which two markers are associated with each other, both of which are further associated with a putative QTL (MQM9MQM12). Although this kind of mutual association may not lead to a significant improvement for the MLEs of QTL allele frequency and residual variance, the estimation for the model mean, the additive and dominant effects of the QTL, and the residual variance is largely benefitted by a simultaneous use of two associated markers (MQM9MQM12; Table 6).
We examine the effects of different allelic frequencies on parameter estimation. The relative frequencies of two alternative alleles reveal the degree of genetic differentiation in a population. When QTL alleles are less differentiated, as shown by the allele frequencies of 0.3 vs. 0.7 (MQM13MQM16), the precision of the MLEs is not affected as long as more differentiated markers are used (Table 6). We also found that the inclusion of some less differentiated markers did not affect the MLEs of QTL parameters (MQM17MQM20). However, when less differentiated markers are used to predict less differentiated QTL (MQM21MQM24), the estimation precision of QTL effects and model parameters is largely affected, although this may not be true for the MLEs of the population genetic parameters of QTL.
Additive-dominant-epistasis model: When two QTL interact epistatically, we develop two different multilocus disequilibrium schemes to estimate their epistasis. The first scheme (MQM-MQM) permits simultaneous use of two independent sets of markers to predict their corresponding QTL, whereas the second scheme (MQMMQM) assumes two dependent sets of marker-QTL relationships. In general, more tightly linked loci tend to have a greater chance for nonrandom association than less tightly linked loci do. Thus, the first scheme is roughly similar to a situation in which two sets of markers and QTL are from different chromosomes, whereas the second scheme is similar to a situation in which two sets of loci are from the same chromosome.
In the first scheme, all genetic parameters can be estimated with reasonable accuracy and precision (Table 7), suggesting that our disequilibrium-based mapping strategy can be used to map epistatic QTL. When two independent marker pairs are used at the same time, the MLEs of QTL-allele frequencies and QTL-marker disequilibrium are more precise than when only one marker pair is used. The root mean-square errors of these estimates are
0.122 for QTL allele frequencies, 0.033 for QTL-marker digenic disequilibrium, and 0.017 for QTL-marker trigenic disequilibrium under MQM-MQM1 when two independent marker sets are used (Table 7). Yet, the corresponding values are 0.161, 0.045, and 0.023 under MQM9 when one marker pair is used (Table 6). But compared to MQM, the estimation precision of QTL additive and dominant effects is reduced for MQM-MQM (Table 7).
In terms of estimation precision, more precise MLEs of the additive x additive interaction effect between two different QTL than those for the additive effect at a single QTL are obtained, whereas the additive x dominant or dominant x additive interaction effect is similar to the dominant effect, but the dominant x dominant interaction effect is poorer than the dominant effect (Table 7). As expected, the MLE of the dominant x dominant interaction effect may be quite imprecise. However, when both sample size and heritability are significantly increased, the MLEs of the dominant x dominant effect can be close to those of other parameters (Table 7).
In the second scheme, the precision of the MLEs of all parameters is reduced, compared to the first scheme, because of the inclusion of many unknowns. We perform an additional simulation under H2 = 0.4 for the second scheme by increasing the sample size from 1000 to 2000 to 5000, keeping the other parameters unchanged. The precision of the MLEs of population genetic parameters remained approximately constant with increased sample sizes (data not given), but the MLEs of gene effects including the additive, dominant, and epistatic effects displayed significantly improved precision when the sample size is increased (Table 8). When N = 2000, the estimation of the additive and additive x additive interaction effects achieves acceptable precision under the complex MQMMQM model. The estimation of the other effects is certainly acceptable when a sample size is 5000 (Table 8).
|
| A CASE STUDY FROM HUMANS |
|---|
A case study from a human genome project for mapping obesity genes is used to validate our method. The subjects sampled consisted of 643 women from the National Heart, Lung, and Blood Institute-funded Woman's Ischemic Syndrome Evaluation (WISE) study (J. A. JOHNSON, unpublished results). Women included in the WISE study were >18 years old and consist of 105 blacks and 538 whites, each measured for body height and other traits. However, as an example, only body height is analyzed here. All study participants provided informed written consent prior to participation in the study. WISE study protocols were approved by the Institutional Review Boards of the participants' institutions.
Genotypes for the women studied were determined using Orchid BioScience's proprietary primer extension technology (SNP-IT). A total of 10 single-nucleotide polymorphisms (SNPs) were genotyped at six candidate genes for obesity on different chromosomes, which are ADRB1, ADRB2, ADRB3, ADRA1A, the GS protein
-subunit (GNAS1), and the G protein ß3 subunit (GNB3; J. A. JOHNSON, unpublished results). The ADRB1 gene contains two common nonsynonymous polymorphisms at codon 49 (Ser
Gly) and codon 389 (Arg
Gly). At the ADRB2 gene, there are three polymorphism sites at codon 19 of the ß2AR 5' leader cistron (Cys
Arg), codon 16 (Gly
Arg), and codon 27 (Gln
Gly). Two sites at codon 131 and codon 371 were genotyped for the GNAS1 gene.
Our newly developed statistical method is used to perform the association studies between SNPs and body heights in two different populations. According to ![]()
|
In the white population, a significant QTN was detected by SNP C835T at gene GNB3 (Table 9). As shown by allele frequencies, linkage disequilibria, and genetic effects, we conjecture that the same QTN has been detected by combining this SNP with other SNPs. Although body height QTNs may be population specific, they can be detected in both populations when some common SNPs are combined.
| DISCUSSION |
|---|
Since its successful use in mapping human diseases (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Linkage disequilibrium mapping makes use of cumulative crossovers between markers and the target gene during a large number of generations since the first appearance of the disequilibrium. However, current applications of linkage disequilibrium are limited to genetic diseases in humans that are fairly simple, monogenic, and obey the rule of Mendelian inheritance. In this article, we present a maximum-likelihood-based method for multilocus linkage disequilibrium mapping of complex traits that are controlled by many, genetically interacting, genes. The motivation of our study includes three aspects. First, current disequilibrium-based mapping is based mostly on a one-marker/one-gene model, although ![]()
![]()
![]()
![]()
![]()
Second, in many organisms, including humans, quantitatively inherited traits are economically or biomedically important. Yet, their inheritance mode can be complex and likely includes multiple genes with interactions at both the gene and population levels (![]()
![]()
![]()
![]()
![]()
![]()
![]()
![]()
Third, one of the critical problems that has not been solved well in previous linkage disequilibrium mapping is the estimation of QTL allele frequencies and disequilibrium coefficients. Unlike gene effect parameters, these population parameters have no explicit closed forms for their solutions in the M step of the EM algorithm. The estimation of these two parameters in previous studies (![]()
![]()
![]()
In analyzing two epistatic QTL using multiple markers, we presented two schemes by assuming, respectively, that the two QTL interact at the gene level (epistasis), but not at the population level (no disequilibrium), and that the two QTL interact at both the gene and population levels. Our simulation results suggest that the population and quantitative genetic parameters of QTL in the first epistasis scheme can be estimated with reasonable precision. More interesting, the MLEs of the population genetic parameters for two nonassociated but epistatic QTL are more precise than when only a single QTL is modeled. This result may have practical implications for selecting optimal marker combinations in disequilibrium mapping for a complex trait. However, our simulation results also show that parameter estimation for QTL in the second epistasis scheme has reduced precision because of the inclusion of too many known parameters. To overcome this problem in the second scheme, the use of a much larger sample size is recommended. Other strategies include the development of an analytical method that can typically handle a high dimension of parameter space, like Markov chain Monte Carlo algorithms (MCMC; ![]()
The strengths of the Bayesian approach for haplotype analysis have been demonstrated by ![]()
![]()
![]()
![]()
Our method can also be modified in other areas. First, we assume an equilibrium population. When this assumption is violated, as is often observed in practice, the introduction of the coefficient of Hardy-Weinberg disequilibrium is necessary. Recently, ![]()
![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
This work is partly supported by an Outstanding Young Investigator Award of the National Science Foundation (NSF) of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense grant (7222061-12) to R.W., NSF of China grant (30000097) to X.-Y. Lou, and NSF grant (9971586) to G.C. The collection of the SNP and human body height data used in this study was supported by grants from the National Institutes of Health (NIH; HL65729) and Orchid Biosciences to J.A.J. and from the NIH (HL64924) to C.J.P. The publication of this manuscript was approved as journal series no. R-09224 by the Florida Agricultural Experiment Station.
Manuscript received October 7, 2002; Accepted for publication January 16, 2003.
| LITERATURE CITED |
|---|
ALLISON, D. B., 1997 Transmission disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60:676-690.[Medline]
ARDLIE, K. G., L. KRUGLYAK, and M. SEIELSTAD, 2002 Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. 3:299-309.[Medline]
BARTON, N. H. and P. D. KEIGHTLEY, 2002 Understanding quantitative genetic variation. Nat. Rev. Genet. 3:11-21.[Medline]
BENNETT, J. H., 1954 On the theory of random mating. Ann. Eugen. 18:311-317.
CHURCHILL, G. A. and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.[Abstract]
COCKERHAM, C. C., 1980 Random and fixed effects in plant genetics. Theor. Appl. Genet. 56:119-131.
COLLINS, A. and N. E. MORTON, 1998 Mapping a disease locus by allelic association. Proc. Natl. Acad. Sci. USA 95:1741-1745.
DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via EM algorithm. J. R. Stat. Soc. Ser. B 39:1-38.
FISHER, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Proc. R. Soc. Edinb. 52:399-433.
GARNER, C. and M. SLATKIN, 2002 Likelihood-based disequilibrium mapping for two-marker haplotype data. Theor. Popul. Biol. 61:153-161.[Medline]
HALEY, C., 1999 Advances in quantitative trait locus mapping, pp. 4759 in From J. L. Lush to Genomics: Visions for Animal Breeding and Genetics, edited by J. C. M. DEKKERS, S. J. LAMONT and M. F. ROTHSCHILD. Iowa State University Press, Ames, IA.
HÄSTBACKA, J., A. DE LA CHAPELLE, I. KAITILA, P. SISTONEN, and A. WEAVER et al., 1992 Linkage disequilibrium mapping in isolated founder populations: diastropic dysplasia in Finland. Nat. Genet. 2:204-221.[Medline]
HÄSTBACKA, J., A. DE LA CHAPELLE, M. MAHTANI, G. CLINES, and M. P. REEVE-DALY et al., 1994 The diastropic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78:1073-1087.[Medline]
HOESCHELE, I., 2000 Mapping quantitative trait loci in outbred pedigrees, pp. 599644 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.
HUDSON, R. R., 2000 Linkage disequilibrium and recombination, pp. 309324 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.
JANSEN, R. C., 2000 Quantitative trait loci in inbred lines, pp. 567597 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.
JORDE, L. B., W. S. WATKINS, D. VISKOCHIL, P. OCONNELL, and K. WARD, 1993 Linkage disequilibrium in the neurofibromatosis-I (NFI) region: implications for gene mapping. Am. J. Hum. Genet. 53:1038-1050.[Medline]
KRUGLYAK, L., 1999 Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.[Medline]
LIU, J. S., C. SABATTI, J. TENG, B. J. B. KEATS, and N. RISCH, 2001 Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Res. 11:1716-1724.
LUO, Z. W. and S. SUHAI, 1999 Estimating linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Genetics 151:359-371.
LUO, Z. W., S. H. TAO, and Z-B. ZENG, 2000 Inferring linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Genetics 156:457-467.
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
MACKAY, T. F. C., 2001 The genetic architecture of quantitative traits. Annu. Rev. Genet. 35:303-339.[Medline]
MATHER, K., and J. L. JINKS, 1982 Biometrical Genetics, Ed. 3. Chapman & Hall, London.