Genetics, Vol. 160, 779-792, February 2002, Copyright © 2002

Joint Linkage and Linkage Disequilibrium Mapping of Quantitative Trait Loci in Natural Populations

Rongling Wu (a), Chang-Xing Ma (a, b), and George Casella (a)
(a) Department of Statistics, University of Florida, Gainesville, Florida 32611
(b) Department of Statistics, Nankai University, Tianjin 300071, People's Republic of China

Corresponding author: Rongling Wu, 533 McCarty Hall C, University of Florida, Gainesville, FL 32611. E-mail: rwu@stat.ufl.edu

Communicating editor: Y.-X. FU


ABSTRACT

Linkage analysis and allelic association (also referred to as linkage disequilibrium) studies are two major approaches for mapping genes that control simple or complex traits in plants, animals, and humans. However, each approach has limited utility when used alone, because it uses only part of the information available in a mapping population. Recently, a new mapping strategy has been designed to integrate the advantages of linkage analysis and linkage disequilibrium analysis for genome mapping in outcrossing populations. The new strategy makes use of a random sample from a panmictic population and the open-pollinated progeny of the sample. In this article, we extend the new strategy to map quantitative trait loci (QTL), using molecular markers within a maximum-likelihood framework implemented with the EM algorithm. The most significant advantage of this extension is that both the linkage and the linkage disequilibrium between a marker and a QTL can be estimated simultaneously, thus increasing the efficiency and effectiveness of genome mapping for recalcitrant outcrossing species. Simulation studies are performed to examine the statistical properties of the maximum-likelihood estimates (MLEs) of genetic and genomic parameters, including QTL allele frequency, QTL effects, QTL position, and the linkage disequilibrium between the QTL and a marker. The potential utility of our mapping strategy is discussed.


GENETIC mapping of quantitative trait loci (QTL) has become a routine tool for the genetic study of plants, animals, and humans. With such a tool, many fundamental genetic questions, including the mode of inheritance of a quantitative trait, genotype x environment interaction, and the genetic basis of heterosis, can be addressed (reviewed by TANKSLEY 1993; TEMPLETON 1999; WU et al. 2000; MACKAY 2001). Genetic mapping also has the potential to reshape our understanding of complex biological phenomena, such as human diseases and adaptive plasticity (the capacity of a given individual to change its phenotype across different environments). Most of these phenomena are now viewed as having a genetic component and, therefore, can be modified genetically toward features beneficial to humans. Genetic mapping can be expected to play an increasingly important role in unraveling the genetic basis of quantitative variation in the next decade, with the advent of novel DNA-based molecular marker technologies such as single-nucleotide polymorphisms (SNPs; WANG et al. 1998).

Because study materials differ in their biological properties, considerable effort is being made to develop statistical genetic mapping methods for specific species or populations. In terms of the genetic principles behind mapping, the methodology of genetic mapping includes two main areas: linkage analysis and association studies (reviewed by OLSON et al. 1999). Linkage analysis is based on the recombination between nonalleles at a marker and a QTL during meiosis and, thus, can directly estimate the map distance (measured by the recombination fraction) between the two syntenic loci. However, it is difficult to detect recombination events between closely spaced (<1 cM) loci when only a limited number of meioses occur in a mapping population (e.g., HASTBACKA et al. 1992, 1994; DARVASI et al. 1993; LONG et al. 1995). Association studies, on the other hand, use all recombinations that have occurred since the nonrandom association of nonalleles at a marker and a QTL (commonly referred to as linkage disequilibrium) was introduced into a population, thus increasing the precision of the estimate of the QTL location (RISCH and MERIKANGAS 1996; RABINOWITZ 1997; XIONG and GUO 1997). Yet the localization of QTL by linkage disequilibrium mapping is ineffective when a significant linkage disequilibrium detected between a marker and a QTL results from the recent creation of disequilibrium rather than from tight linkage between the loci. Such a spurious association, which can be detected even when the marker is not physically linked to any causative locus, may be due to population subdivision and admixture. Current population-based (GORDON et al. 2000; LUO et al. 2000) or family-based analyses [e.g., the transmission/disequilibrium test (TDT); SPIELMAN and EWENS 1996; ALLISON 1997] of linkage disequilibrium cannot distinguish strong disequilibrium and loose linkage from weak disequilibrium and tight linkage (WHITTAKER et al. 2000).

The limits of linkage analysis and linkage disequilibrium mapping when used alone can be overcome by a strategy that takes advantage of both approaches. Such a joint linkage and linkage disequilibrium mapping strategy was recently devised by WU and ZENG (2001), in which a random sample from a natural population and the open-pollinated progeny of the sample are analyzed jointly. This strategy is based on the principle of gene transmission from the parental to the progeny generation, during which the linkage between a marker and a QTL is broken down by meiotic recombination. It can therefore partition the composite measure of linkage disequilibrium used by traditional population- or family-based association tests, which rely on recombination in a single generation, into two components: the linkage between the marker and the QTL, and their linkage disequilibrium created at some historical time. With estimates of these two components, one can determine the mechanistic basis of a significant disequilibrium detected between a marker and a QTL, which makes fine mapping of QTL affecting a quantitative trait more feasible.

In this article, we extend the joint linkage and linkage disequilibrium mapping strategy to map QTL segregating in a natural population. The extension allows simultaneous estimation of a number of genetic and genomic parameters, including the allele frequency of the QTL, its effects, its location, and its population association with a known marker locus. Our analysis is performed within the maximum-likelihood framework, implemented with the expectation-maximization (EM) algorithm. The statistical properties of the estimates of the different genetic parameters are studied through extensive simulations. The power to detect linkage disequilibrium is compared between traditional disequilibrium analyses and the joint linkage and linkage disequilibrium analysis proposed here.


STATISTICAL METHOD

Population structure theory:
Outcrossing species likely have heterogeneous genomes, on which both dominant and codominant loci are distributed. Codominant loci often carry a large but variable number of alleles from locus to locus (WEBER and WONG 1993; PFEIFFER et al. 1997). To simplify the description of our mapping model, we consider only biallelic codominant loci in this article. Although straightforward in principle, extensions to other marker types, such as dominant, missing, and multiallelic markers, require particular mathematical manipulations.

Consider one marker (M) and one QTL (Q), both segregating in a random-mating population at Hardy-Weinberg equilibrium. The two alleles are denoted by M1 and M2 at the marker locus and by Q1 and Q2 at the QTL. The frequencies of alleles Mr (r = 1, 2) and Qs (s = 1, 2) in the population are denoted by $p_r$ and $q_s$, with $p_1 + p_2 = 1$ and $q_1 + q_2 = 1$. The population frequencies of the one-locus genotypes $M_{r_1}M_{r_2}$ ($r_1 \le r_2 = 1, 2$) and $Q_{s_1}Q_{s_2}$ ($s_1 \le s_2 = 1, 2$) are denoted by $P_{r_1r_2}$ and $Q_{s_1s_2}$, with $\sum_{r_1 \le r_2} P_{r_1r_2} = 1$ and $\sum_{s_1 \le s_2} Q_{s_1s_2} = 1$. The alleles at the marker and the QTL are not independent of each other in the population; their nonrandom association is measured by the coefficient of gametic linkage disequilibrium, denoted by $D_{rs}$.

If the marker and the QTL are located in the same region of a chromosome, they are likely to be linked, with recombination fraction $\theta$. On the basis of population genetic theory (NAGYLAKI 1992), the population frequencies of the four two-locus gametes (haplotypes) MrQs (r, s = 1, 2), which combine at random to form the current generation t, are easily derived as

$$p^{(t)}_{rs} = p^{(t)}_r\, q^{(t)}_s + D^{(t)}_{rs}, \quad r, s = 1, 2 \qquad (1)$$

where $D^{(t)} \equiv D^{(t)}_{11} = D^{(t)}_{22} = -D^{(t)}_{12} = -D^{(t)}_{21}$ is bounded by $\max[-p^{(t)}_1 q^{(t)}_1, -p^{(t)}_2 q^{(t)}_2] \le D^{(t)} \le \min[p^{(t)}_1 q^{(t)}_2, p^{(t)}_2 q^{(t)}_1]$ (WEIR 1996). Through random union, these gametes from the maternal and paternal sides produce nine different genotypes $M_{r_1}M_{r_2}Q_{s_1}Q_{s_2}$ ($r_1 \le r_2$, $s_1 \le s_2 = 1, 2$), whose population frequencies $P^{(t)}_{r_1r_2s_1s_2}$ in the current generation t are calculated as products of the population frequencies of the maternal and paternal gametes (Table 1). Table 1 also gives the (conditional) frequencies of the four gametes produced by each of the nine genotypes for the next (progeny) generation t + 1. As shown by population genetic theory, the linkage disequilibrium between two loci decays by a factor of $(1 - \theta)$ after one generation of random mating (NAGYLAKI 1992). Therefore, the coefficient of gametic linkage disequilibrium in the progeny generation t + 1 becomes $D^{(t+1)} = (1 - \theta)D^{(t)}$. Thus, the population frequencies of the two-locus gametes MrQs (r, s = 1, 2) that combine at random to form the progeny generation t + 1 are expressed as

$$p^{(t+1)}_{rs} = p^{(t)}_r\, q^{(t)}_s + (1 - \theta) D^{(t)}_{rs}, \quad r, s = 1, 2 \qquad (2)$$
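The gamete-frequency relations of this subsection can be checked numerically. The Python sketch below assumes the usual sign convention for two biallelic loci, $D_{11} = D_{22} = -D_{12} = -D_{21} = D$; the function names `haplotype_freqs` and `next_generation_d` and all numerical values are illustrative and not part of the original method. It returns the four gamete frequencies $p_r q_s + D_{rs}$, rejects values of D outside the admissible bounds, and applies the one-generation decay $D^{(t+1)} = (1 - \theta)D^{(t)}$.

```python
def haplotype_freqs(p1, q1, D):
    """Frequencies of gametes M_r Q_s given allele frequencies p1, q1 and disequilibrium D.

    Assumes the sign convention D_11 = D_22 = D and D_12 = D_21 = -D.
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    lower = max(-p1 * q1, -p2 * q2)  # admissible range of D (WEIR 1996)
    upper = min(p1 * q2, p2 * q1)
    if not lower <= D <= upper:
        raise ValueError(f"D = {D} lies outside [{lower}, {upper}]")
    return {
        ("M1", "Q1"): p1 * q1 + D,
        ("M1", "Q2"): p1 * q2 - D,
        ("M2", "Q1"): p2 * q1 - D,
        ("M2", "Q2"): p2 * q2 + D,
    }


def next_generation_d(D, theta):
    """Disequilibrium after one generation of random mating: (1 - theta) * D."""
    return (1.0 - theta) * D


# Hypothetical values, chosen only for illustration.
p1, q1, D_t, theta = 0.6, 0.7, 0.05, 0.1
gametes_t = haplotype_freqs(p1, q1, D_t)                             # form generation t
gametes_t1 = haplotype_freqs(p1, q1, next_generation_d(D_t, theta))  # form generation t + 1
print(gametes_t)
print(gametes_t1)
```

Allele frequencies are held fixed across the two generations here, which presumes random mating with no selection, migration, or mutation.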

For plants, all genetic information about the progeny generation is contained in the seeds. If the parental and progeny generations do not overlap in reproduction, the frequencies of the genotypes at the marker and QTL are the products of the frequencies of the corresponding gametes.
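To illustrate how the nine two-locus genotype frequencies arise as products of maternal and paternal gamete frequencies, the following sketch enumerates all ordered pairs of gametes and pools them into unordered genotypes. It assumes that both parents draw gametes from the same pool, with frequencies $p_r q_s + D_{rs}$ under the convention $D_{11} = D_{22} = -D_{12} = -D_{21} = D$. The double heterozygote M1M2Q1Q2 pools the coupling and repulsion phases, which is why nine rather than ten classes result. Function names and numerical values are hypothetical.

```python
from itertools import product


def haplotype_freqs(p1, q1, D):
    """Gamete frequencies p_r q_s + D_rs, with D_11 = D_22 = D and D_12 = D_21 = -D."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return {("M1", "Q1"): p1 * q1 + D, ("M1", "Q2"): p1 * q2 - D,
            ("M2", "Q1"): p2 * q1 - D, ("M2", "Q2"): p2 * q2 + D}


def genotype_freqs(gametes):
    """Unite maternal and paternal gametes at random and pool ordered pairs into
    the nine unordered genotypes M_r1 M_r2 Q_s1 Q_s2."""
    geno = {}
    for (mat, f_mat), (pat, f_pat) in product(gametes.items(), repeat=2):
        marker = "".join(sorted((mat[0], pat[0])))  # e.g. "M1M2"
        qtl = "".join(sorted((mat[1], pat[1])))     # e.g. "Q1Q2"
        geno[(marker, qtl)] = geno.get((marker, qtl), 0.0) + f_mat * f_pat
    return geno


freqs = genotype_freqs(haplotype_freqs(0.6, 0.7, 0.05))  # hypothetical values
for genotype, f in sorted(freqs.items()):
    print(genotype, round(f, 4))
```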


 

 
Table 1. Probabilities of the gamete genotypes produced by mother plants with different genotypes for the marker and QTL

Sampling theory:
Assume that we randomly select M unrelated individuals from the population and collect the seeds from the selected individuals. The seeds are germinated and grown into seedlings for a progeny test, which is a regular procedure in traditional quantitative genetic experiments (MCKEAND and BRIDGWATER 1998). Both the selected individuals and their progeny are genotyped for molecular markers. Assuming that the species studied is dioecious, the genotypes of the seeds from a selected individual are derived from its own (maternal) gametes, each with a frequency given in Table 1, and from the paternal gametes in the pollen pool, each with a frequency described by Equation 2. Thus, every selected individual represents an open-pollinated (i.e., half-sib) family with a common mother and different (unknown) fathers. By sampling theory, the M selected individuals comprise three marker genotypes, with $M_{r_1r_2}$ denoting the number of individuals of genotype $M_{r_1}M_{r_2}$; under Hardy-Weinberg equilibrium, the expected genotype frequencies in the sampled population are $(p^{(t)}_1)^2$ for M1M1, $2p^{(t)}_1p^{(t)}_2$ for M1M2, and $(p^{(t)}_2)^2$ for M2M2. The progeny of the selected individuals (called mothers) with different marker genotypes differ in genotype composition and genotype frequency (Table 2). In other words, the marker genotype of an offspring ($g_o$) depends on the marker genotype of its mother ($g_m$):

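$$g_o \mid g_m \in \begin{cases} \{M_1M_1,\ M_1M_2\}, & g_m = M_1M_1 \\ \{M_1M_1,\ M_1M_2,\ M_2M_2\}, & g_m = M_1M_2 \\ \{M_1M_2,\ M_2M_2\}, & g_m = M_2M_2 \end{cases} \qquad (3)$$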

Thus, the combinations of mother and progeny marker genotypes form seven unique two-level marker genotypes, i.e., {M1M1 - M1M1}, {M1M1 - M1M2}, {M1M2 - M1M1}, {M1M2 - M1M2}, {M1M2 - M2M2}, {M2M2 - M1M2}, and {M2M2 - M2M2}. The number of progeny of marker genotype $M_{\gamma_1}M_{\gamma_2}$ produced by the ith mother plant of marker genotype $M_{r_1}M_{r_2}$ is denoted by $N^{\gamma_1\gamma_2}_{r_1r_2i}$, where the subscripts stand for the marker genotype of the mother and the superscripts for the marker genotype of its progeny ($r_1, r_2, \gamma_1, \gamma_2$ = 1 or 2, constrained by Expression 3). The conditional probabilities of the QTL genotypes given each two-level marker genotype are given in Table 2 (see Appendix A for the derivations). These conditional probabilities are used to calculate the likelihood of the trait phenotypes in an open-pollinated progeny design.
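The constraint in Expression 3 follows from Mendelian transmission in a half-sib family: the mother passes on each of her two marker alleles with probability 1/2, and the unknown father contributes M1 with the pollen-pool frequency. The sketch below assumes that this pollen-pool frequency equals the population allele frequency $p_1$ and that there is no selfing (consistent with the dioecy assumption); it enumerates the two-level marker genotypes that have nonzero probability. The function name and the value of $p_1$ are illustrative.

```python
NAMES = {2: "M1M1", 1: "M1M2", 0: "M2M2"}  # keyed by number of M1 alleles


def offspring_marker_probs(mother_m1, p1):
    """P(offspring marker genotype | mother's marker genotype) in a half-sib family.

    mother_m1: number of M1 alleles carried by the mother (0, 1 or 2).
    p1: frequency of M1 in the pollen pool (assumed equal to the population frequency).
    """
    t = mother_m1 / 2.0  # probability that the mother transmits M1
    p2 = 1.0 - p1
    probs = {2: t * p1,                      # M1 from mother and M1 from father
             1: t * p2 + (1.0 - t) * p1,     # one M1 from either parent
             0: (1.0 - t) * p2}              # M2 from both parents
    return {NAMES[g]: pr for g, pr in probs.items() if pr > 0.0}


p1 = 0.3  # hypothetical allele frequency
two_level = [(NAMES[m], o, pr)
             for m in (2, 1, 0)
             for o, pr in offspring_marker_probs(m, p1).items()]
for mother, offspring, pr in two_level:
    print(f"{{{mother} - {offspring}}}: {pr:.3f}")
```

With $0 < p_1 < 1$ the enumeration yields 2 + 3 + 2 = 7 combinations, matching the seven two-level marker genotypes listed above.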


 

 
Table 2. Conditional probabilities of the QTL genotypes upon the marker genotypes of a progeny produced by a mother plant with different marker genotypes (two-level marker genotypes)

Estimation theory:
Suppose there is a segregating QTL responsible for a quantitative trait in the half-sib families. The phenotypic value of offspring j in an open-pollinated progeny test at the putative QTL is described by a simple statistical model

$$y_j = \mu + x_j\alpha + z_j\delta + \epsilon_j \qquad (4)$$

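where $\mu$ is the overall mean, and $x_j$ and $z_j$ are indicator variables describing the QTL genotype of offspring j,

$$x_j = \begin{cases} 2 & \text{for } Q_1Q_1 \\ 1 & \text{for } Q_1Q_2 \\ 0 & \text{for } Q_2Q_2 \end{cases}$$

and

$$z_j = \begin{cases} 1 & \text{for } Q_1Q_2 \\ 0 & \text{for } Q_1Q_1 \text{ or } Q_2Q_2 \end{cases}$$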

and $\epsilon_j$ is the random error, $\epsilon_j \sim N(0, \sigma^2)$. The genotypic values of Q1Q1, Q1Q2, and Q2Q2 are $\mu + 2\alpha$, $\mu + \alpha + \delta$, and $\mu$, respectively, where $\mu$ is the population mean and $\alpha$ and $\delta$ are the additive and dominant effects of the QTL. The unknown genetic parameters specifying the genetic architecture of the trait in the progeny population are collected in the vector $\Omega$. The maximum-likelihood estimates (MLEs) of these parameters can be obtained by maximizing the likelihood of the phenotype (y) and marker (M) data. The likelihood of the phenotypic trait and marker genotype data observed in the open-pollinated progeny can be written as a mixture model,

(5)

where N is the total number of offspring (seeds) in the open-pollinated progeny design, $h_{\kappa j}$ is the conditional probability of the $\kappa$th QTL genotype given the two-level marker genotype of the jth offspring ($\kappa$ = 0, 1, 2), $h^{\gamma_1\gamma_2}_{\kappa r_1r_2j}$ is this probability specified for offspring marker genotype $M_{\gamma_1}M_{\gamma_2}$ and mother genotype $M_{r_1}M_{r_2}$ (Table 2), and $f_\kappa(y_j)$ is the normal density function

$$f_\kappa(y_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_j - \mu_\kappa)^2}{2\sigma^2}\right],$$

with $\mu_\kappa$ the genotypic value of the $\kappa$th QTL genotype.
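The phenotypic part of the mixture likelihood can be evaluated directly once the conditional probabilities $h_{\kappa j}$ are available. In the sketch below, each offspring's probabilities are passed in as a dictionary keyed by QTL genotype; in a real analysis they would come from Table 2, which is not reproduced here, so the values in the example are hypothetical. The sketch computes the sum over offspring of $\log \sum_\kappa h_{\kappa j} f_\kappa(y_j)$, using the genotypic values $\mu + 2\alpha$, $\mu + \alpha + \delta$, and $\mu$ given in the text; it omits any marker-only factor of the likelihood.

```python
import math


def genotypic_means(mu, alpha, delta):
    """Genotypic values of Q1Q1, Q1Q2 and Q2Q2 under the additive/dominance model."""
    return {"Q1Q1": mu + 2.0 * alpha, "Q1Q2": mu + alpha + delta, "Q2Q2": mu}


def normal_pdf(y, mean, sigma2):
    return math.exp(-(y - mean) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def mixture_loglik(y, h, mu, alpha, delta, sigma2):
    """Sum over offspring j of log( sum_k h_kj * f_k(y_j) ).

    y: phenotypes; h: one dict per offspring mapping QTL genotype -> h_kj.
    """
    means = genotypic_means(mu, alpha, delta)
    return sum(
        math.log(sum(hj.get(g, 0.0) * normal_pdf(yj, m, sigma2) for g, m in means.items()))
        for yj, hj in zip(y, h)
    )


# Hypothetical data: two offspring with made-up conditional QTL probabilities.
y = [11.8, 10.2]
h = [{"Q1Q1": 0.6, "Q1Q2": 0.3, "Q2Q2": 0.1},
     {"Q1Q1": 0.1, "Q1Q2": 0.3, "Q2Q2": 0.6}]
print(mixture_loglik(y, h, mu=10.0, alpha=1.0, delta=0.5, sigma2=1.0))
```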

The MLEs of $\Omega$ can be obtained by differentiating the log-likelihood of Equation 5 with respect to each unknown parameter, setting the derivatives equal to zero, and solving the resulting log-likelihood equations. In this way, we obtain an explicit ML estimator of the marker allele frequency $p_1$:

(6)

For the other parameters in $\Omega$, explicit ML estimators cannot be derived. To obtain their MLEs, we use the EM algorithm (DEMPSTER et al. 1977), starting from arbitrary initial values of the parameters (Appendix B).
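Appendix B gives the full EM algorithm. As a simplified illustration only, the sketch below implements EM for the quantitative genetic parameters ($\mu$, $\alpha$, $\delta$, $\sigma^2$) while treating the mixing probabilities $h_{\kappa j}$ as known and fixed. In the method described here those probabilities themselves depend on the allele frequencies, linkage disequilibrium, and recombination fraction, which this sketch does not update. The E-step computes each offspring's posterior QTL genotype probabilities. The M-step takes posterior-weighted genotype means, which map one-to-one onto ($\mu$, $\alpha$, $\delta$), and a pooled weighted residual variance. The function name and the simulated check are hypothetical.

```python
import math
import random

GENOTYPES = ("Q1Q1", "Q1Q2", "Q2Q2")


def normal_pdf(y, mean, sigma2):
    return math.exp(-(y - mean) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def em_fixed_weights(y, h, mu, alpha, delta, sigma2, n_iter=500, tol=1e-8):
    """EM for (mu, alpha, delta, sigma2) with fixed prior QTL probabilities h (a simplification)."""
    n = len(y)
    for _ in range(n_iter):
        means = {"Q1Q1": mu + 2 * alpha, "Q1Q2": mu + alpha + delta, "Q2Q2": mu}
        # E-step: posterior probability of each QTL genotype for each offspring.
        post = []
        for yj, hj in zip(y, h):
            w = {g: hj.get(g, 0.0) * normal_pdf(yj, means[g], sigma2) for g in GENOTYPES}
            total = sum(w.values())
            post.append({g: w[g] / total for g in GENOTYPES})
        # M-step: weighted genotype means, then back-transform to (mu, alpha, delta).
        m = {g: sum(pj[g] * yj for pj, yj in zip(post, y)) / sum(pj[g] for pj in post)
             for g in GENOTYPES}
        new_mu = m["Q2Q2"]
        new_alpha = (m["Q1Q1"] - new_mu) / 2
        new_delta = m["Q1Q2"] - new_mu - new_alpha
        new_sigma2 = sum(pj[g] * (yj - m[g]) ** 2
                         for pj, yj in zip(post, y) for g in GENOTYPES) / n
        change = max(abs(new_mu - mu), abs(new_alpha - alpha),
                     abs(new_delta - delta), abs(new_sigma2 - sigma2))
        mu, alpha, delta, sigma2 = new_mu, new_alpha, new_delta, new_sigma2
        if change < tol:
            break
    return mu, alpha, delta, sigma2


# Hypothetical check: simulate phenotypes from known h and parameters, then re-estimate.
rng = random.Random(7)
patterns = [{"Q1Q1": 0.8, "Q1Q2": 0.15, "Q2Q2": 0.05},
            {"Q1Q1": 0.1, "Q1Q2": 0.8, "Q2Q2": 0.1},
            {"Q1Q1": 0.05, "Q1Q2": 0.15, "Q2Q2": 0.8}]
true_means = {"Q1Q1": 14.0, "Q1Q2": 13.0, "Q2Q2": 10.0}  # mu = 10, alpha = 2, delta = 1
h, y = [], []
for j in range(1500):
    hj = patterns[j % 3]
    g = rng.choices(GENOTYPES, weights=[hj[k] for k in GENOTYPES])[0]
    h.append(hj)
    y.append(true_means[g] + rng.gauss(0.0, 1.0))
print(em_fixed_weights(y, h, mu=9.0, alpha=1.0, delta=0.0, sigma2=2.0))
```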

The existence of the QTL under consideration can be tested by formulating the two hypotheses

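$$H_0: \alpha = \delta = 0 \quad \text{vs.} \quad H_1: \text{at least one of } \alpha, \delta \ne 0 \qquad (7)$$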

A log-likelihood ratio (LR) test statistic for these two hypotheses is calculated as

$$LR_Q = -2\left[\ln L(\tilde{\Omega} \mid y, M) - \ln L(\hat{\Omega} \mid y, M)\right],$$

where $\hat{\Omega}$ and $\tilde{\Omega}$ denote the MLEs of the unknown vector $\Omega$ under the full model (H1) and the reduced model (H0), respectively; $LR_Q$ asymptotically follows a $\chi^2$ distribution with 2 d.f. The hypotheses for testing the linkage disequilibrium detected in the progeny generation t + 1 can be formulated as

$$H_0: D^{(t+1)} = 0 \quad \text{vs.} \quad H_1: D^{(t+1)} \ne 0 \qquad (8)$$

with the corresponding $LR_D$ approximately $\chi^2$ distributed with 1 d.f. (WEIR 1996). Failure to reject the null hypothesis of (8) may be due either to the absence of linkage disequilibrium or to a combination of loose linkage and weak linkage disequilibrium. Rejection of the null hypothesis of (8), on the other hand, reveals strong linkage disequilibrium, with or without tight linkage. Further hypotheses for testing whether there is significant linkage can be formulated as

$$H_0: \theta = 0.5 \quad \text{vs.} \quad H_1: \theta < 0.5 \qquad (9)$$

with the corresponding $LR_R$ also approximately $\chi^2$ distributed with 1 d.f. (TERWILLIGER 1995). If the null hypothesis of (8) is rejected but the null hypothesis of (9) is not, then the significant linkage disequilibrium detected between a marker and a QTL in the progeny generation is not due to tight linkage. In this case, results from pure linkage disequilibrium mapping (LUO et al. 2000; MEUWISSEN and GODDARD 2000) are ineffective for genome mapping because the linkage disequilibrium detected is spurious. If tight linkage is detected, one can further test whether the linkage is tight enough for fine mapping of the QTL. This test can be carried out by fixing $\theta$ at a particular small value, e.g., 0.01, under the null hypothesis. In summary, by testing simultaneously for the significance of linkage and linkage disequilibrium, our approach makes gene mapping in a natural population more predictable.
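The testing logic for (7), (8), and (9) can be summarized in a few lines. The sketch below computes an LR statistic from two maximized log-likelihoods, converts it to a p-value with the closed-form $\chi^2$ upper-tail probabilities for 1 and 2 d.f. (which, as stated above, are only asymptotic approximations), and maps the outcomes of the disequilibrium and linkage tests onto the interpretations given in the text. The 0.05 level, the function names, and the log-likelihood values are assumptions for illustration.

```python
import math


def lr_statistic(loglik_full, loglik_reduced):
    """LR = -2 [ln L(reduced) - ln L(full)], clipped at zero to absorb numerical noise."""
    return max(0.0, 2.0 * (loglik_full - loglik_reduced))


def chi2_pvalue(lr, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(lr / 2.0))
    if df == 2:
        return math.exp(-lr / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def interpret(p_ld, p_linkage, level=0.05):
    """Combine the disequilibrium test (8) and the linkage test (9)."""
    if p_ld >= level:
        return "no detectable disequilibrium (none, or loose linkage with weak disequilibrium)"
    if p_linkage >= level:
        return "disequilibrium not explained by tight linkage (possibly spurious)"
    return "disequilibrium supported by linkage (candidate for fine mapping)"


# Hypothetical maximized log-likelihoods under the full and reduced models.
p_ld = chi2_pvalue(lr_statistic(-1402.1, -1409.8), df=1)
p_link = chi2_pvalue(lr_statistic(-1402.1, -1402.9), df=1)
print(p_ld, p_link, interpret(p_ld, p_link))
```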


SIMULATION

The statistical properties of the mapping method proposed in this article are examined by using simulated examples. Suppose the mother plants from which seeds are collected and grown into seedlings are randomly sampled from a panmictic population. A biallelic marker locus and a biallelic QTL, each segregating in the population, are genetically associated. A number of factors may affect the precision of the method and its power to detect the putative QTL, including the sampling scheme, the degree of marker and QTL segregation, the degree of linkage and linkage disequilibrium, and the mode of QTL gene action.

The effects of sampling schemes and population heterozygosity:
How the sample size and its allocation between and within open-pollinated families affect the behavior of a statistical method in a mapping experiment is an important issue for practitioners. In this simulation, we investigate the effects of three sampling schemes on parameter estimation: (1) a few large families (10 x 100), (2) a moderate number of moderately sized families (32 x 32), and (3) many small families (100 x 10). The effects of sampling schemes may also depend on other factors, such as gene segregation, the degree of nonrandom association between the marker and QTL, and the QTL effect. Here, we examine the interaction between sampling schemes and gene segregation. The degree of segregation of a gene in a population is described by the difference in the frequencies of its alternative alleles. A larger difference (say 0.10 vs. 0.90) implies that the population is closer to fixation and has a lower degree of segregation; conversely, a population with a smaller difference in allele frequency (say 0.50 vs. 0.50) has a higher degree of segregation. Table 3 gives the parameter values used to simulate the effects of sampling schemes and gene segregation. Assuming that each of the M selected open-pollinated families has the same size, the phenotype and marker data are generated using the following steps:

  Step 1. Randomly assign the three marker genotypes to the M hypothesized mother plants according to the probabilities $(p^{(t)}_1)^2$ (M1M1), $2p^{(t)}_1p^{(t)}_2$ (M1M2), and $(p^{(t)}_2)^2$ (M2M2).

  Step 2. Randomly assign marker genotypes to the progeny of a mother plant of a particular marker genotype according to the probabilities of the progeny marker genotypes (Table 2).

  Step 3. Randomly sample the joint marker-QTL genotype of each offspring derived from each mother plant from a multinomial distribution with the probabilities calculated from Table 2.

  Step 4. Determine the phenotypic value of an individual with a given marker-QTL joint genotype as the genotypic value of its QTL genotype plus a random number sampled from a normal distribution with mean zero and variance $\sigma^2 = 1$.

Table 3. Means and standard errors (in parentheses) of the MLEs of the genetic parameters for different sampling schemes and different heterogeneity in allelic frequency from 100 simulation replicates
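The data-generating steps above can be mimicked with a short forward simulation. Rather than sampling from the conditional probabilities of Table 2, which are not reproduced here, this sketch draws each mother's two haplotypes from the generation-t gamete pool, forms each egg by Mendelian segregation with recombination fraction $\theta$, and draws each pollen gamete from the generation-(t + 1) pool, whose disequilibrium is $(1 - \theta)D^{(t)}$. Under the model assumptions this should produce the same joint genotype distribution, but it is our substitution, not the authors' code. The function name, record layout, and parameter values are hypothetical.

```python
import random

HAPLOTYPES = [("M1", "Q1"), ("M1", "Q2"), ("M2", "Q1"), ("M2", "Q2")]


def gamete_weights(p1, q1, D):
    """Gamete frequencies p_r q_s + D_rs (D_11 = D_22 = D, D_12 = D_21 = -D)."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return [p1 * q1 + D, p1 * q2 - D, p2 * q1 - D, p2 * q2 + D]


def simulate_half_sibs(n_families, family_size, p1, q1, D_t, theta,
                       mu, alpha, delta, sigma2=1.0, seed=1):
    rng = random.Random(seed)
    mother_pool = gamete_weights(p1, q1, D_t)                  # forms generation t
    pollen_pool = gamete_weights(p1, q1, (1.0 - theta) * D_t)  # forms generation t + 1
    value = {2: mu + 2 * alpha, 1: mu + alpha + delta, 0: mu}  # by number of Q1 alleles
    records = []
    for fam in range(n_families):
        h1, h2 = rng.choices(HAPLOTYPES, weights=mother_pool, k=2)  # mother's haplotypes
        mother_m1 = (h1[0] == "M1") + (h2[0] == "M1")
        for _ in range(family_size):
            first, second = (h1, h2) if rng.random() < 0.5 else (h2, h1)
            # Egg: marker allele from one maternal haplotype; QTL allele from the
            # other haplotype if a crossover occurs (probability theta).
            egg = (first[0], second[1]) if rng.random() < theta else first
            pollen = rng.choices(HAPLOTYPES, weights=pollen_pool, k=1)[0]
            n_m1 = (egg[0] == "M1") + (pollen[0] == "M1")
            n_q1 = (egg[1] == "Q1") + (pollen[1] == "Q1")  # latent QTL genotype
            y = value[n_q1] + rng.gauss(0.0, sigma2 ** 0.5)
            records.append({"family": fam, "mother_m1": mother_m1,
                            "offspring_m1": n_m1, "y": y})
    return records


data = simulate_half_sibs(10, 100, p1=0.5, q1=0.5, D_t=0.1, theta=0.05,
                          mu=10.0, alpha=1.0, delta=0.5)
print(len(data), data[0])
```

The QTL genotype is used only to generate phenotypes and is not stored, mirroring its status as missing data in the analysis.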

The mean and standard error of the MLE of each unknown over 100 simulation replicates are given in Table 3. The MLE of the marker allele frequency is obtained directly from Equation 6, whereas the estimation of the other parameters is treated as a missing-data problem. In general, the EM algorithm derived in this article provides MLEs that are close to the true parameter values. However, the estimates are more precise, in terms of the standard errors across simulation replicates, under the sampling scheme of few large families (10 x 100) than under that of many small families (100 x 10). This gain in precision from a better sampling scheme is much more pronounced when the sampled population is closer to fixation. For example, when the difference in allele frequency at both the marker and the QTL is 0.80, the standard error of the QTL allele frequency is 0.0151 for many small families and 0.0087 for few large families, whereas the corresponding values are 0.0105 and 0.0081 for a population with equal frequencies of the alternative alleles at each locus.
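The degree of segregation contrasted here is defined through the difference between allele frequencies. As a complementary view (our addition, not a quantity used in the original analysis), the sketch below also reports the Hardy-Weinberg expected heterozygosity $2p_1p_2$ for the two allele-frequency settings, together with the total number of offspring implied by each sampling scheme. Function and variable names are illustrative.

```python
def allele_frequency_difference(p1):
    """Difference between the frequencies of the two alleles at a biallelic locus."""
    return abs(p1 - (1.0 - p1))


def expected_heterozygosity(p1):
    """Hardy-Weinberg heterozygosity 2 p1 p2 (our addition, used only for comparison)."""
    return 2.0 * p1 * (1.0 - p1)


SCHEMES = {"few large families": (10, 100),
           "moderate families": (32, 32),
           "many small families": (100, 10)}

for p1 in (0.1, 0.5):
    print(p1, allele_frequency_difference(p1), expected_heterozygosity(p1))
for name, (n_fam, size) in SCHEMES.items():
    print(name, n_fam * size, "offspring")
```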

We also investigate the power of our method to detect a significant linkage disequilibrium. For a less segregating population, the power depends strongly on the sampling scheme, being 0.79 for many small families and 0.95 for few large families (Table 3).
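The precision and power summaries reported for the simulations can be computed from replicate output as sketched below. This assumes that the reported standard error is the standard deviation of the MLEs across replicates, and that power is the proportion of replicates whose LR statistic exceeds a chosen critical value; the 5% $\chi^2_1$ critical value of 3.841 is our assumption, since the threshold is not stated in this section. The replicate values in the example are made up.

```python
import statistics


def summarize(estimates):
    """Mean and across-replicate standard deviation (reported as a standard error)."""
    return statistics.mean(estimates), statistics.stdev(estimates)


def empirical_power(lr_values, critical=3.841):
    """Proportion of replicates in which the LR statistic exceeds the critical value."""
    return sum(lr > critical for lr in lr_values) / len(lr_values)


# Made-up output from five replicates.
q1_hat = [0.61, 0.58, 0.60, 0.63, 0.59]
lr_d = [12.3, 2.1, 8.7, 5.0, 15.9]
print(summarize(q1_hat), empirical_power(lr_d))
```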

The effects of linkage and linkage disequilibrium:
Because the missing information about the QTL is inferred from the marker genotype, the relationship between the marker and the QTL affects the estimates of the genetic parameters. Here, four relationship patterns are compared on the basis of the 32 x 32 sampling scheme: (1) tight linkage and weak disequilibrium, (2) tight linkage and strong disequilibrium, (3) loose linkage and weak disequilibrium, and (4) loose linkage and strong disequilibrium (Table 4). In these four patterns, all parameters except the recombination fraction and the linkage disequilibrium are set equal. As expected, the marker allele frequency is estimated very well. Given the same linkage between the marker and the QTL, a marker in stronger disequilibrium with the QTL tends to provide more precise estimates of both the population genetic parameter (allele frequency) and the quantitative genetic parameters of the QTL (the overall mean, additive and dominant effects, and residual variance) than a marker in weaker disequilibrium. Also, in our simulation example, the power to detect a QTL is significantly greater with a more strongly associated marker than with a less associated one. Similarly, given the same disequilibrium, a more tightly linked marker provides greater precision and greater power for detecting a QTL than a less tightly linked marker. When the marker has loose linkage and weak disequilibrium with the QTL, it provides little information about the genotype at the QTL. Under this circumstance, the MLEs of the QTL parameters are biased and less precise than under the other patterns, and the power to detect an existing QTL on the basis of such a marker (loose linkage, $\theta$ = 0.20, and weak disequilibrium) is typically low (Table 5).
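Why a marker in stronger disequilibrium is more informative can be seen from the gamete frequencies. Using $p_{rs} = p_r q_s + D_{rs}$, the probability that a gamete carrying M1 also carries Q1 is $(p_1 q_1 + D)/p_1$, and for M2 it is $(p_2 q_1 - D)/p_2$, so the difference between the two equals $D/(p_1 p_2)$ and vanishes when D = 0. The small sketch below, with hypothetical values, evaluates this difference at weak and strong disequilibrium and after one generation of decay to $(1 - \theta)D$; the function names are illustrative.

```python
def qtl_given_marker(p1, q1, D):
    """P(Q1 | marker allele) on a gamete, using p_rs = p_r q_s + D_rs."""
    p2 = 1.0 - p1
    return {"M1": (p1 * q1 + D) / p1, "M2": (p2 * q1 - D) / p2}


def informativeness(p1, q1, D):
    """P(Q1 | M1) - P(Q1 | M2), which equals D / (p1 * p2)."""
    cond = qtl_given_marker(p1, q1, D)
    return cond["M1"] - cond["M2"]


p1, q1, theta = 0.5, 0.5, 0.20  # hypothetical values
for D in (0.02, 0.15):          # weak vs. strong disequilibrium
    print(D, informativeness(p1, q1, D), informativeness(p1, q1, (1 - theta) * D))
```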


 

 
Table 4. Means and standard errors (in parentheses) of the MLEs of genetic parameters


 

 
Table 5. Means and standard errors (in parentheses) of the MLEs of the genetic parameters for different effects of the QTL from 100 simulation replicates

The effects of linkage and linkage disequilibrium on parameter estimation vary among parameters. Generally, these effects are larger on the estimates of the dominant effect of the QTL and the residual variance than on those of the additive effect and the overall mean (Table 4).

The effects of QTL gene action:
It has been well demonstrated that the magnitude of the QTL effect affects parameter estimation, with a QTL of large effect being estimated more precisely than a QTL of small effect. Similar results have been observed in linkage disequilibrium-based mapping of QTL (LUO and SUHAI 1999; LUO et al. 2000). However, it is unclear how different modes of gene action affect the precision and power of parameter estimation in linkage disequilibrium mapping. The simulation here is designed to investigate the effect of gene action on the estimates of QTL parameters.

Our simulation of gene action includes four patterns: (1) purely additive ($\delta = 0$), (2) partially dominant ($0 < \delta/\alpha < 1$), (3) dominant ($\delta/\alpha = 1$), and (4) overdominant ($\delta/\alpha > 1$). Except for the marker allele frequency, all parameters show a consistent trend in the precision and power of estimation across modes of gene action (Table 5). As shown by the standard errors, a dominant QTL can be estimated more precisely than an additive QTL, and an overdominant QTL can be estimated more precisely than a dominant or partially dominant QTL. However, the power to detect a significant linkage disequilibrium between the marker and the QTL is greater for an additive QTL than for a dominant QTL, and greater for a partially dominant QTL than for an overdominant QTL (Table 5).
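The four modes of gene action are defined through the ratio $\delta/\alpha$. For concreteness, the sketch below classifies a pair ($\alpha$, $\delta$) according to these definitions and returns the implied genotypic values $\mu + 2\alpha$, $\mu + \alpha + \delta$, and $\mu$. It assumes $\alpha > 0$ so that the ratio is well defined; the function names and parameter values are hypothetical.

```python
def gene_action(alpha, delta, tol=1e-9):
    """Classify QTL gene action from the dominance ratio delta / alpha (alpha > 0 assumed)."""
    if alpha <= 0:
        raise ValueError("this sketch assumes alpha > 0")
    ratio = delta / alpha
    if abs(ratio) <= tol:
        return "additive"
    if abs(ratio - 1.0) <= tol:
        return "dominant"
    if 0.0 < ratio < 1.0:
        return "partially dominant"
    if ratio > 1.0:
        return "overdominant"
    return "outside the four simulated patterns (delta / alpha < 0)"


def genotypic_values(mu, alpha, delta):
    return {"Q1Q1": mu + 2 * alpha, "Q1Q2": mu + alpha + delta, "Q2Q2": mu}


for delta in (0.0, 0.5, 1.0, 1.5):  # hypothetical dominance effects with alpha = 1
    print(delta, gene_action(1.0, delta), genotypic_values(10.0, 1.0, delta))
```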

Comparison between traditional disequilibrium mapping and our joint mapping:
We conduct an additional simulation study to compare the power to detect linkage disequilibrium between the traditional disequilibrium mapping approach (ALLISON 1997; LUO et al. 2000) and our joint linkage and linkage disequilibrium mapping approach. For comparison, the same sets of genetic parameters are assumed for the two approaches, covering different combinations of linkage and disequilibrium (Table 6). For both approaches, a sample size of 1000 is assumed. For the pure disequilibrium mapping approach, this sample is drawn at random from a natural population and represents a single generation. For our joint linkage and linkage disequilibrium mapping approach, the sample is allocated between the parental generation and the open-pollinated progeny generation, using the 32 x 32 sampling scheme.


 

 
Table 6. Difference in power to detect the linkage disequilibrium between a marker and QTL using the existing (top) and our (bottom) approaches

Table 6 shows the observed power to detect linkage disequilibrium using the two mapping approaches. Generally, greater power is observed for the joint linkage and linkage disequilibrium analysis than for the pure disequilibrium analysis. However, the gain in power from the joint analysis depends on the degrees of linkage and linkage disequilibrium between a marker and a QTL: when the linkage is loose or the disequilibrium is weak, the joint mapping approach has significantly greater power than the traditional disequilibrium mapping approach.


DISCUSSION

We have provided a unifying framework for the fine-scale mapping of QTL affecting a quantitative trait in a natural population, based on the joint linkage and linkage disequilibrium mapping strategy proposed by WU and ZENG (2001). We model marker-QTL association on the basis of a random sample (mothers) drawn from a natural population, and marker-QTL linkage on the basis of the open-pollinated progeny of the sample, in which recombination events occur between loci. This two-stage (mother and progeny) hierarchical model provides simultaneous estimates of the linkage and the linkage disequilibrium between the marker and the QTL, and thus goes beyond many existing composite linkage disequilibrium mapping methods that cannot distinguish strong association and loose linkage from weak association and tight linkage (TERWILLIGER 1995; XIONG and GUO 1997; COLLINS and MORTON 1998; TERWILLIGER and WEISS 1998; LUO et al. 2000). Moreover, by unifying the information about linkage and linkage disequilibrium, the joint mapping method has greater power to detect linkage disequilibrium than traditional linkage disequilibrium analyses.

As an example, we used a simple model with one biallelic codominant marker and one biallelic QTL to demonstrate the statistical properties of the joint linkage and linkage disequilibrium analysis in the precise mapping of individual QTL for a complex trait. Linkage analysis, which requires informative meioses in a pedigree, can rarely detect a target gene within 1 cM of markers, but it should be useful for a genome-wide scan for QTL because a high-density map covering the entire genome can be constructed from a single pedigree. Thus, genomic regions containing QTL can first be identified through a genome-wide linkage scan. These regions can then be saturated with more markers and narrowed further around the QTL by using the joint linkage and linkage disequilibrium mapping strategy. We employ the maximum-likelihood method implemented with the EM algorithm to obtain the MLEs of model parameters, including the allele frequency of the QTL, its effects, its location, and its linkage disequilibrium with a marker. Extensive simulation studies show that the method provides reasonable estimates of these genetic and genomic parameters over a wide range of parameter values.

In the current model, we have not considered the phenotypes of the genotyped mothers that are sampled from a natural population and supply the progeny (contained in seeds). This omission does not reduce the efficiency and utility of the model, because we have integrated the mothers' and progeny's marker genotypes into a two-level marker genotype framework, so that the phenotypes of the progeny population can be associated directly with the two-level marker genotypes. Not requiring the mothers' phenotypes is practically advantageous in at least two respects. First, for species like forest trees, mothers sampled from a natural population are easily genotyped, but their phenotypes are difficult to measure. Second, the sampled mothers cannot be compared with their progeny in phenotype because of differences in age and growth environment. However, for species that can be propagated vegetatively, a field trial can be established with clonal replicates of both mothers and their progeny. In this case, mothers and their progeny of the same age can be measured and compared simultaneously. A further simulation study is needed to examine the advantage of a model that incorporates mothers' phenotypes.

Although the codominant-marker assumption is valid for markers such as SNPs, real data analyses of natural outcrossing populations often involve dominant markers, such as random amplified polymorphic DNAs and amplified fragment length polymorphisms. Also, our one marker-one QTL model is clearly too simplistic for a quantitative trait that may be controlled by multiple genes, each with a different effect. For these two practical reasons, the joint linkage and linkage disequilibrium mapping approach needs to be extended to multiple markers, including dominant and multiallelic markers. Linkage analysis in a pedigree using dominant markers is often biased and has low precision, especially when the sample size is small (MALIEPAARD et al. 1997). But these problems can be overcome by combining dominant markers with codominant markers through a Markov chain (JIANG and ZENG 1997). For the linkage disequilibrium analysis of dominant markers, a similar improvement in the precision of parameter estimates can be expected from their combined use with codominant markers. For multiple alleles and/or loci, basic extensions of the single-marker disequilibrium measures presented above have been developed in the literature. As in linkage analysis, multipoint disequilibrium analysis can be more efficient than single-marker analysis. For example, HILL and WEIR (1994) showed that the variance of the linkage disequilibrium between a closely linked marker and a QTL is so large that the disequilibrium cannot be used for the precise mapping of the QTL. When the disequilibria between all markers and the QTL are analyzed simultaneously, the high variability of a single linkage disequilibrium is avoided (MEUWISSEN and GODDARD 2000). Likelihood-based multipoint approaches to linkage disequilibrium mapping can be found in TERWILLIGER (1995), MCPEEK and STRAHS (1999), MEUWISSEN and GODDARD (2000), and MORRIS et al. (2000).
When a narrow region is being considered for fine-scale linkage disequilibrium mapping, conditioning on the distances between markers allows a composite likelihood to be used to extract information from multiple markers. XIONG and GUO (1997) give a general likelihood framework for linkage disequilibrium mapping that incorporates multiallelic markers, multiple loci, and mutational processes at the disease and marker loci.
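The precision lost with dominant markers can be illustrated in a setting simpler than pedigree linkage analysis. The sketch below is our own illustration, not part of the model above: it compares the expected Fisher information about an allele frequency q from n individuals under Hardy-Weinberg equilibrium when the marker is codominant (all three genotypes distinguishable) and when it is dominant (only the recessive homozygote, with frequency q², is distinguishable). The function names are hypothetical.

```python
def info_codominant(q, n):
    """Expected Fisher information about allele frequency q from n codominant
    genotypes under Hardy-Weinberg equilibrium (2n binomial allele draws)."""
    return 2.0 * n / (q * (1.0 - q))


def info_dominant(q, n):
    """Expected Fisher information about q when only the recessive homozygote
    (probability q**2) can be told apart: n Bernoulli(q**2) observations,
    I(q) = n * (2q)**2 / (q**2 * (1 - q**2)) = 4n / (1 - q**2)."""
    return 4.0 * n / (1.0 - q ** 2)


def relative_efficiency(q):
    """Information ratio dominant / codominant; algebraically 2q / (1 + q)."""
    return info_dominant(q, 1) / info_codominant(q, 1)


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5, 0.9):
        print(f"q = {q:.1f}: relative efficiency = {relative_efficiency(q):.3f}")
```

The ratio 2q/(1 + q) is below 1 for every q < 1 and drops sharply for a rare recessive allele (about 0.18 at q = 0.1), which is one simple way to see why combining dominant with codominant markers helps.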

For a multi-QTL model, many genetic parameters are unknown. These include the number of QTL, the additive and dominance effects of each QTL, the various epistatic effects between each pair of QTL, the chromosomal location of each QTL (determined by the recombination fractions between the QTL and its flanking markers), the linkage disequilibrium between each pair of QTL, and the linkage disequilibrium between each QTL and each marker. The maximum-likelihood method that works in the one-marker/one-QTL case may be insufficient for handling so many unknowns. Markov chain Monte Carlo (MCMC) methods within a Bayesian framework may be a better solution for multi-QTL linkage and linkage disequilibrium mapping. Unlike the traditional maximum-likelihood method, MCMC methods estimate unobservables from their posterior distributions, which are explored by simulation (ROBERT and CASELLA 1999). Within the MCMC paradigm, prior information on model parameters, including the number of QTL, can be incorporated where appropriate, which is an advantage over the maximum-likelihood method. Given the impressive applications of the Bayesian approach in QTL linkage mapping (see SATAGOPAN et al. 1996; SILLANPAA and ARJAS 1996, 1999; UIMARI et al. 1996; HEATH 1997; STEPHENS and FISCH 1998), we are confident that a powerful Bayesian approach can be developed for joint linkage and linkage disequilibrium mapping of multiple QTL throughout the entire genome.
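To make the MCMC idea concrete, the following minimal sketch draws from the posterior distribution of a single recombination fraction θ given k recombinants among n informative meioses. It is an illustration only, not the multi-QTL sampler envisioned above: it assumes a binomial likelihood, uses a uniform prior on (0, 0.5) as the prior information, and samples with a random-walk Metropolis algorithm. The function names, starting value, step size, and burn-in length are arbitrary choices.

```python
import math
import random


def log_posterior(theta, k, n):
    """Unnormalized log posterior of theta: binomial likelihood for k
    recombinants among n meioses times a uniform prior on (0, 0.5)."""
    if not 0.0 < theta < 0.5:
        return -math.inf  # the prior puts zero mass outside (0, 0.5)
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def metropolis(k, n, n_iter=20000, burn_in=2000, step=0.05, seed=1):
    """Random-walk Metropolis sampler; returns the post-burn-in draws of theta."""
    rng = random.Random(seed)
    theta = 0.25
    current = log_posterior(theta, k, n)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio); exp(-inf) is 0.0.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws


if __name__ == "__main__":
    sample = metropolis(20, 100)
    print("posterior mean of theta:", sum(sample) / len(sample))
```

With k = 20 and n = 100, the mean of the draws should be close to 21/102 ≈ 0.206, the mean of the corresponding Beta(21, 81) distribution (its truncation at 0.5 is negligible here).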


*  ACKNOWLEDGMENTS

We are grateful to Dr. Bruce Weir, Dr. Shizhong Xu, Dr. Nengjun Yi, and Dr. Zhao-Bang Zeng for stimulating discussions about this study, and to the associate editor and two anonymous referees for their helpful comments on this manuscript. This manuscript was approved for publication as journal series no. R-07961 by the Florida Agricultural Experiment Station.

Manuscript received June 15, 2001; Accepted for publication November 20, 2001.


*  APPENDIX A

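We describe a procedure for deriving the conditional probabilities of QTL genotypes given two-level (mother and progeny) marker genotypes. Let us first consider the mothers with marker genotype M1M1. This maternal marker genotype comprises three joint marker-QTL genotypes, M1M1Q1Q1, M1M1Q1Q2, and M1M1Q2Q2, with respective population frequencies $[p_{11}^{(t)}]^2$, $2p_{11}^{(t)}p_{12}^{(t)}$, and $[p_{12}^{(t)}]^2$ in the mother generation (t) (Table 1), where $p_{rs}^{(t)}$ is the frequency of the two-locus gamete MrQs in generation t. Each of these three maternal two-locus genotypes produces gamete M1Q1, gamete M1Q2, or both, which unite with the four possible two-locus gametes M1Q1, M1Q2, M2Q1, and M2Q2 from the pollen pool, with population frequencies $p_{11}^{(t+1)}$, $p_{12}^{(t+1)}$, $p_{21}^{(t+1)}$, and $p_{22}^{(t+1)}$, respectively, to produce the progeny generation (t + 1) (contained in seeds). The probabilities of the different joint marker-QTL genotypes in the progeny population then follow directly. For example, the probability of progeny joint genotype M1M1Q1Q1 derived from mother genotype M1M1 is the sum $[p_{11}^{(t)}]^2 p_{11}^{(t+1)} + p_{11}^{(t)}p_{12}^{(t)}p_{11}^{(t+1)}$, where the first term results from the union of gamete M1Q1 from mother genotype M1M1Q1Q1 with gamete M1Q1 from the pollen pool, and the second term from the union of gamete M1Q1 (produced with probability 1/2) from mother genotype M1M1Q1Q2 with gamete M1Q1 from the pollen pool. Thus, according to Bayes' theorem, the conditional probability of QTL genotype Q1Q1, given the mother's marker genotype M1M1 and the progeny's marker genotype M1M1, is

$$\Pr(Q_1Q_1 \mid \text{mother } M_1M_1, \text{progeny } M_1M_1) = \frac{p_{11}^{(t)}\left[p_{11}^{(t)}+p_{12}^{(t)}\right]p_{11}^{(t+1)}}{\left[p_{11}^{(t)}+p_{12}^{(t)}\right]^2\left[p_{11}^{(t+1)}+p_{12}^{(t+1)}\right]} = \frac{p_{11}^{(t)}p_{11}^{(t+1)}}{\left[p_{11}^{(t)}+p_{12}^{(t)}\right]\left[p_{11}^{(t+1)}+p_{12}^{(t+1)}\right]},$$

where the denominator of the first fraction is the probability that a mother of marker genotype M1M1 produces a progeny of marker genotype M1M1, that is, the sum of the probabilities of the progeny joint genotypes M1M1Q1Q1, M1M1Q1Q2, and M1M1Q2Q2.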

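The probability of progeny joint genotype M1M1Q1Q2 derived from mother genotype M1M1 has two components: (1) the union of maternal gamete M1Q1 with paternal gamete M1Q2 from the pollen pool, with probability $p_{11}^{(t)}[p_{11}^{(t)}+p_{12}^{(t)}]p_{12}^{(t+1)}$, and (2) the union of maternal gamete M1Q2 with paternal gamete M1Q1 from the pollen pool, with probability $p_{12}^{(t)}[p_{11}^{(t)}+p_{12}^{(t)}]p_{11}^{(t+1)}$. The conditional probability of QTL genotype Q1Q2 given the mother's marker genotype M1M1 and the progeny's marker genotype M1M1 is thus

$$\Pr(Q_1Q_2 \mid \text{mother } M_1M_1, \text{progeny } M_1M_1) = \frac{p_{11}^{(t)}p_{12}^{(t+1)} + p_{12}^{(t)}p_{11}^{(t+1)}}{\left[p_{11}^{(t)}+p_{12}^{(t)}\right]\left[p_{11}^{(t+1)}+p_{12}^{(t+1)}\right]}.$$

The remaining conditional probabilities of the QTL genotypes given the mother's marker genotype M1M1 are calculated in the same way (see Table 2).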
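For the M1M1 mother with M1M1 progeny, the three conditional QTL genotype probabilities can be collected into one small function. This is a minimal sketch assuming random union of gametes in the mother generation, so that the maternal joint genotypes M1M1Q1Q1, M1M1Q1Q2, and M1M1Q2Q2 have frequencies $[p_{11}^{(t)}]^2$, $2p_{11}^{(t)}p_{12}^{(t)}$, and $[p_{12}^{(t)}]^2$; the Q2Q2 entry follows by the same argument as Q1Q1. The function name and the string keys for the haplotypes MrQs are hypothetical.

```python
def qtl_given_m1m1_m1m1(p_t, p_t1):
    """Pr(progeny QTL genotype | mother M1M1, progeny M1M1).

    p_t, p_t1: dicts mapping haplotype labels "11", "12", "21", "22"
    (for M1Q1, M1Q2, M2Q1, M2Q2) to frequencies in the mother generation t
    and in the pollen pool t + 1.
    """
    # Pr(maternal gamete carries M1) * Pr(pollen carries M1), after
    # cancelling the common factor p11 + p12 from the mother genotype frequency.
    denom = (p_t["11"] + p_t["12"]) * (p_t1["11"] + p_t1["12"])
    return {
        "Q1Q1": p_t["11"] * p_t1["11"] / denom,
        "Q1Q2": (p_t["11"] * p_t1["12"] + p_t["12"] * p_t1["11"]) / denom,
        "Q2Q2": p_t["12"] * p_t1["12"] / denom,
    }


if __name__ == "__main__":
    mothers = {"11": 0.4, "12": 0.2, "21": 0.1, "22": 0.3}
    pollen = {"11": 0.35, "12": 0.25, "21": 0.15, "22": 0.25}
    print(qtl_given_m1m1_m1m1(mothers, pollen))
```

The three probabilities sum to one because their numerators add up to $[p_{11}^{(t)}+p_{12}^{(t)}][p_{11}^{(t+1)}+p_{12}^{(t+1)}]$.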

When the marker genotype of a sampled mother is M1M2, the three possible joint marker-QTL genotypes are M1M2Q1Q1, M1M2Q1Q2, and M1M2Q2Q2, with population frequencies $2p_{11}^{(t)}p_{21}^{(t)}$, $2[p_{11}^{(t)}p_{22}^{(t)} + p_{12}^{(t)}p_{21}^{(t)}]$, and $2p_{12}^{(t)}p_{22}^{(t)}$ in generation t, respectively. The probabilities of the four joint marker-QTL gamete genotypes generated by each of these three joint genotypes are given in Table 1. Thus, the probability of progeny joint genotype M1M1Q1Q1 derived from mother marker genotype M1M2 is the sum of $p_{11}^{(t)}p_{21}^{(t)}p_{11}^{(t+1)}$ and $[(1-\theta)p_{11}^{(t)}p_{22}^{(t)} + \theta p_{12}^{(t)}p_{21}^{(t)}]p_{11}^{(t+1)}$, because a double heterozygote in coupling phase (M1Q1/M2Q2) produces gamete M1Q1 with probability $(1-\theta)/2$, whereas one in repulsion phase (M1Q2/M2Q1) produces it with probability $\theta/2$. The conditional probability of QTL genotype Q1Q1 given the mother's marker genotype M1M2 and the progeny's marker genotype M1M1 can be calculated accordingly. The probabilities of all QTL genotypes conditional upon the different progeny marker genotypes derived from the mother's marker genotype M1M2 are given in Table 2.
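As a worked example for the M1M2 mother, derived under the same assumptions (random union of gametes, with θ the marker-QTL recombination fraction), the Q1Q1 case can be carried through to the conditional probability. A coupling-phase double heterozygote M1Q1/M2Q2 transmits M1Q1 with probability $(1-\theta)/2$ and a repulsion-phase one M1Q2/M2Q1 with probability $\theta/2$, and all M1-bearing gametes of M1M2 mothers together have probability $[p_{11}^{(t)}+p_{12}^{(t)}][p_{21}^{(t)}+p_{22}^{(t)}]$, so

$$\Pr(Q_1Q_1 \mid \text{mother } M_1M_2, \text{progeny } M_1M_1) = \frac{\left[p_{11}^{(t)}p_{21}^{(t)} + (1-\theta)\,p_{11}^{(t)}p_{22}^{(t)} + \theta\,p_{12}^{(t)}p_{21}^{(t)}\right]p_{11}^{(t+1)}}{\left[p_{11}^{(t)}+p_{12}^{(t)}\right]\left[p_{21}^{(t)}+p_{22}^{(t)}\right]\left[p_{11}^{(t+1)}+p_{12}^{(t+1)}\right]}.$$

Unlike the M1M1 case, θ does not cancel here, which is why progeny of heterozygous mothers carry the linkage information.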

The same procedure yields the conditional probabilities of the different QTL genotypes when the mother's marker genotype is M2M2 (Table 2).
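Rather than deriving each entry of Table 2 by hand, the conditional probabilities can also be obtained by brute-force enumeration, which is a convenient check on the algebra. The sketch below assumes random union of maternal haplotypes in generation t, a recombination fraction θ between marker and QTL, and a pollen pool with haplotype frequencies $p_{rs}^{(t+1)}$. It enumerates ordered maternal haplotype pairs, their gametes, and all pollen haplotypes, then applies Bayes' theorem within each (mother marker genotype, progeny marker genotype) class. Haplotypes are encoded as hypothetical (marker allele, QTL allele) tuples, so (1, 2) stands for M1Q2, and the function names are likewise hypothetical.

```python
from collections import defaultdict
from itertools import product

# Two-locus haplotypes (marker allele r, QTL allele s): M1Q1, M1Q2, M2Q1, M2Q2.
HAPLOTYPES = [(1, 1), (1, 2), (2, 1), (2, 2)]


def maternal_gametes(h1, h2, theta):
    """Gamete probabilities for a mother carrying haplotypes h1 and h2."""
    probs = defaultdict(float)
    probs[h1] += (1 - theta) / 2          # nonrecombinant
    probs[h2] += (1 - theta) / 2          # nonrecombinant
    probs[(h1[0], h2[1])] += theta / 2    # recombinant
    probs[(h2[0], h1[1])] += theta / 2    # recombinant
    return probs


def conditional_qtl_table(p_t, p_t1, theta):
    """Map (mother marker, progeny marker, progeny QTL) genotypes to
    Pr(progeny QTL genotype | mother marker, progeny marker).

    Genotypes are sorted allele pairs, e.g. (1, 2) for M1M2 or Q1Q2.
    """
    joint = defaultdict(float)
    for h1, h2 in product(HAPLOTYPES, repeat=2):
        pair_freq = p_t[h1] * p_t[h2]     # ordered haplotype pair, random union
        mother = tuple(sorted((h1[0], h2[0])))
        for gamete, p_g in maternal_gametes(h1, h2, theta).items():
            for pollen in HAPLOTYPES:
                marker = tuple(sorted((gamete[0], pollen[0])))
                qtl = tuple(sorted((gamete[1], pollen[1])))
                joint[(mother, marker, qtl)] += pair_freq * p_g * p_t1[pollen]
    # Bayes' theorem: divide by the total probability of each two-level marker genotype.
    totals = defaultdict(float)
    for (mother, marker, _), prob in joint.items():
        totals[(mother, marker)] += prob
    return {key: prob / totals[key[:2]]
            for key, prob in joint.items() if totals[key[:2]] > 0}


if __name__ == "__main__":
    p_t = {(1, 1): 0.4, (1, 2): 0.2, (2, 1): 0.1, (2, 2): 0.3}
    p_t1 = {(1, 1): 0.35, (1, 2): 0.25, (2, 1): 0.15, (2, 2): 0.25}
    for key, prob in sorted(conditional_qtl_table(p_t, p_t1, 0.1).items()):
        print(key, round(prob, 4))
```

With these illustrative frequencies and θ = 0.1, the entry for mother M1M1 and progeny M1M1 reproduces the closed form $p_{11}^{(t)}p_{11}^{(t+1)}/\{[p_{11}^{(t)}+p_{12}^{(t)}][p_{11}^{(t+1)}+p_{12}^{(t+1)}]\}$, and the probabilities within each marker class sum to one.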


*  APPENDIX B

The MLEs of the unknown parameters can be computed by implementing an EM algorithm (DEMPSTER et al. 1977; MENG and RUBIN 1993). The log-likelihood is given by

with derivatives

where we define

(B1)

which can be interpreted as the posterior probability that progeny j has QTL genotype $\kappa$. We then implement the EM algorithm with the expanded parameter set $\{\Omega, H\}$, where H is the set of these posterior probabilities over all progeny and QTL genotypes. Conditional on H, we solve for the zeros of $\partial \log L(\Omega)/\partial \Omega$ to obtain the estimates of $\Omega$ (the M step). In the M step, the quantitative genetic parameters of the QTL, $\mu$, $\alpha$, $\beta$, and $\sigma^2$, are obtained from

(B2)


(B3)



(B4)


(B5)
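Equations B2–B5 are not recoverable from this text, so the following is only a hedged sketch of what a weighted M step of this kind looks like. It assumes that the genotypic values of Q1Q1, Q1Q2, and Q2Q2 are coded as μ + α, μ + β, and μ - α (our assumption, not taken from the paper) and that progeny phenotypes are normal with common variance σ². Because these three means are a one-to-one reparameterization of (μ, α, β), maximizing the posterior-weighted log-likelihood amounts to computing posterior-weighted genotype means, with σ² equal to the posterior-weighted mean squared residual. The function name `m_step` and the array layout are hypothetical.

```python
import numpy as np


def m_step(y, post):
    """Weighted M step for (mu, alpha, beta, sigma2).

    y    : (n,) array of progeny phenotypes.
    post : (n, 3) posterior probabilities of QTL genotypes Q1Q1, Q1Q2, Q2Q2.
    Assumed genotypic values: Q1Q1 -> mu + alpha, Q1Q2 -> mu + beta,
    Q2Q2 -> mu - alpha.
    """
    expected_counts = post.sum(axis=0)                      # expected progeny per genotype
    means = (post * y[:, None]).sum(axis=0) / expected_counts
    mu = (means[0] + means[2]) / 2.0
    alpha = (means[0] - means[2]) / 2.0
    beta = means[1] - mu
    resid_sq = (y[:, None] - means[None, :]) ** 2           # (n, 3) squared residuals
    sigma2 = (post * resid_sq).sum() / y.size
    return mu, alpha, beta, sigma2
```

With hard (0/1) posteriors this reduces to ordinary genotype-class means and the pooled within-class variance.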

Because closed-form solutions cannot be derived for the population genetic parameters $q_s^{(t)}$ and $D_{rs}^{(t)}$ and the genomic parameter $\theta$, these parameters are estimated numerically (PRESS et al. 1992) by solving Equations B6–B8 in Scheme 1.


(B6)


(B7)

and

(B8)

where for the $\kappa$th QTL conditional on a two-level marker genotype with the subscripts and superscripts given by Equation 5, and .
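The text cites Numerical Recipes for the numerical step without naming a routine, and Equations B6–B8 are not reproduced here. As one possibility (our choice, not necessarily the authors'), a one-dimensional parameter such as θ can be updated by maximizing its part of the M-step objective with golden-section search, a bracketing method described in Numerical Recipes. The objective in the usage example is a hypothetical stand-in of the form R log θ + N log(1 - θ), with made-up expected recombinant and nonrecombinant counts R and N, chosen because its maximizer R/(R + N) is known.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:          # maximum lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                # maximum lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


if __name__ == "__main__":
    R, N = 12.3, 87.7   # hypothetical expected counts
    theta_hat = golden_section_max(
        lambda t: R * math.log(t) + N * math.log(1.0 - t), 1e-9, 0.5)
    print(theta_hat)    # close to R / (R + N) = 0.123
```

Golden-section search needs no derivatives and keeps θ inside [0, 0.5] by construction; for the jointly constrained $q_s^{(t)}$ and $D_{rs}^{(t)}$ a multidimensional routine would be needed instead.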

The estimates obtained from Equations B2–B8 in Scheme 1 are then used to update H (the E step). In the E step, the posterior probability that progeny j has QTL genotype $\kappa$ is calculated using Equation B1. The E and M steps are iterated until convergence, and the values at convergence are taken as the MLEs of the parameters.
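Putting the E and M steps together, the following self-contained sketch iterates them until the observed-data log-likelihood stops increasing. It makes a simplifying assumption that is not in the paper: each progeny's prior QTL genotype probabilities (which in the model come from Table 2 and depend on $q_s^{(t)}$, $D_{rs}^{(t)}$, and θ) are treated as known and held fixed, so only μ, α, β, and σ² are updated. The genotype coding (μ + α, μ + β, μ - α), the function names, and the convergence tolerance are likewise assumptions.

```python
import numpy as np

ALPHA_CODE = np.array([1.0, 0.0, -1.0])  # assumed coefficients of alpha for Q1Q1, Q1Q2, Q2Q2
BETA_CODE = np.array([0.0, 1.0, 0.0])    # assumed coefficients of beta


def normal_pdf(y, mean, var):
    return np.exp(-((y - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def em_qtl(y, prior, mu, alpha, beta, sigma2, tol=1e-8, max_iter=1000):
    """EM for a three-component normal mixture with progeny-specific weights.

    y     : (n,) progeny phenotypes.
    prior : (n, 3) conditional QTL genotype probabilities, held fixed here.
    """
    prev_ll = -np.inf
    for _ in range(max_iter):
        means = mu + alpha * ALPHA_CODE + beta * BETA_CODE
        # E step: posterior probability that progeny j carries genotype k
        dens = prior * normal_pdf(y[:, None], means[None, :], sigma2)
        mix = dens.sum(axis=1)
        ll = np.log(mix).sum()            # log-likelihood at the current parameters
        post = dens / mix[:, None]
        # M step: posterior-weighted genotype means mapped back to (mu, alpha, beta)
        m = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
        mu, alpha = (m[0] + m[2]) / 2.0, (m[0] - m[2]) / 2.0
        beta = m[1] - mu
        sigma2 = (post * (y[:, None] - m[None, :]) ** 2).sum() / y.size
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, alpha, beta, sigma2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 3000
    prior = rng.dirichlet([1.0, 1.0, 1.0], size=n)
    geno = np.array([rng.choice(3, p=p) for p in prior])
    y = np.array([12.0, 10.5, 8.0])[geno] + rng.normal(0.0, 0.5, size=n)
    print(em_qtl(y, prior, 10.0, 1.5, 0.0, 1.0))  # simulated: 10, 2, 0.5, 0.25
```

With well-separated genotype means and a few thousand progeny, the estimates should land close to the simulated values. In the full model, the fixed `prior` array would be recomputed from the updated population genetic parameters at every iteration.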


*  LITERATURE CITED

ALLISON, D. B., 1997  Transmission disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60:676-690.

COLLINS, A. and N. E. MORTON, 1998  Mapping a disease locus by allelic association. Proc. Natl. Acad. Sci. USA 95:1741-1745.

DARVASI, A., A. WEINREB, V. MINKE, J. I. WELLER, and M. SOLLER, 1993  Detecting marker-QTL linkage and estimating QTL gene effect and map location using a saturated genetic map. Genetics 134:943-951.

DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977  Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39:1-38.

GORDON, D., I. SIMONIC, and J. OTT, 2000  Significant evidence for linkage disequilibrium over a 5-cM region among Afrikaners. Genomics 66:87-92.

HÄSTBACKA, J., A. DE LA CHAPELLE, I. KAITILA, P. SISTONEN, and A. WEAVER et al., 1992  Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet. 2:204-221.

HÄSTBACKA, J., A. DE LA CHAPELLE, M. MAHTANI, G. CLINES, and M. P. REEVE-DALY et al., 1994  The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78:1073-1087.

HEATH, S. C., 1997  Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61:748-760.

HILL, W. G. and B. S. WEIR, 1994  Maximum-likelihood estimation of gene location by linkage disequilibrium. Am. J. Hum. Genet. 54:705-714.

LONG, A. D., S. L. MULLANEY, L. A. REID, J. D. FRY, and C. H. LANGLEY et al., 1995  High resolution mapping of genetic factors affecting abdominal bristle number in Drosophila melanogaster. Genetics 139:1273-1291.

LUO, Z. W. and S. SUHAI, 1999  Estimating linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Genetics 151:359-371.

LUO, Z. W., S. H. TAO, and Z-B. ZENG, 2000  Inferring linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Genetics 156:457-467.

MACKAY, T. F. C., 2001  Quantitative trait loci in Drosophila. Nat. Rev. Genet. 2:11-20.

MALIEPAARD, C., J. JANSEN, and J. W. VAN OOIJEN, 1997  Linkage analysis in a full-sib family of an outbreeding plant species: overview and consequences for applications. Genet. Res. 70:237-250.

MCKEAND, S. E. and F. E. BRIDGWATER, 1998  A strategy for the third breeding cycle of loblolly pine in the Southeastern US. Silvae Genet. 47:223-234.

MCPEEK, M. S. and A. STRAHS, 1999  Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. Am. J. Hum. Genet. 65:858-875.

MEUWISSEN, T. H. E. and M. E. GODDARD, 2000  Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155:421-430.

MORRIS, A. P., J. C. WHITTAKER, and D. J. BALDING, 2000  Bayesian fine-scale mapping of disease loci, by hidden Markov models. Am. J. Hum. Genet. 67:155-169.

NAGYLAKI, T., 1992  Introduction to Theoretical Population Genetics. Springer-Verlag, Berlin.

OLSON, J. M., J. S. WITTE, and R. C. ELSTON, 1999  Tutorial in biostatistics: genetic mapping of complex traits. Stat. Med. 18:2961-2981.

PFEIFFER, A., A. M. OLIVIERI, and M. MORGANTE, 1997  Identification and characterization of microsatellites in Norway spruce (Picea abies K.). Genome 40:411-419.

PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING, and B. P. FLANNERY, 1992  Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York.

RABINOWITZ, D., 1997  A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47:342-350.

RISCH, N. and K. MERIKANGAS, 1996  The future of genetic studies of complex human diseases. Science 273:1516-1517.

ROBERT, C. P. and G. CASELLA, 1999  Monte Carlo Statistical Methods. Springer, New York.

SATAGOPAN, J. M., B. S. YANDELL, M. A. NEWTON, and T. C. OSBORN, 1996  Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144:805-816.

SILLANPAA, M. J. and E. ARJAS, 1996  Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373-1388.

SILLANPAA, M. J. and E. ARJAS, 1999  Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151:1605-1619.

SPIELMAN, R. S. and W. J. EWENS, 1996  The TDT and other family-based tests for linkage disequilibrium and association. Am. J. Hum. Genet. 59:983-989.

STEPHENS, D. A. and R. D. FISCH, 1998  Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54:1334-1347.

TANKSLEY, S. D., 1993  Mapping genes. Annu. Rev. Genet. 27:205-233.

TEMPLETON, A. R., 1999  Uses of evolutionary theory in the human genome project. Annu. Rev. Ecol. Syst. 30:23-49.

TERWILLIGER, J. D., 1995  A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am. J. Hum. Genet. 56:777-787.

TERWILLIGER, J. D. and K. M. WEISS, 1998  Linkage disequilibrium mapping of complex disease: Fantasy or reality? Curr. Opin. Biotechnol. 9:578-594.

UIMARI, P., G. THALLER, and I. HOESCHELE, 1996  The use of multiple markers in a Bayesian method for mapping quantitative trait loci. Genetics 143:1831-1842.

WANG, D. G., J. B. FAN, C. J. SIAO, A. BERNO, and P. YOUNG et al., 1998  Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077-1082.

WEBER, J. L. and C. WONG, 1993  Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123-1128.

WEIR, B. S., 1996  Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.

WHITTAKER, J. C., M. C. DENHAM, and A. P. MORRIS, 2000  The problems of using the transmission/disequilibrium test to infer tight linkage. Am. J. Hum. Genet. 67:523-526.

WU, R. L. and Z-B. ZENG, 2001  Joint linkage and linkage disequilibrium mapping in natural populations. Genetics 157:899-909.

WU, R. L., Z-B. ZENG, S. E. MCKEAND, and D. M. O'MALLEY, 2000  The case for molecular mapping in forest tree breeding. Plant Breed. Rev. 19:41-68.

XIONG, M. M. and S. W. GUO, 1997  Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am. J. Hum. Genet. 60:1513-1531.



