Genetics, Vol. 165, 901-913, October 2003, Copyright © 2003

A General Statistical Framework for Mapping Quantitative Trait Loci in Nonmodel Systems: Issue for Characterizing Linkage Phases

Min Lin (a), Xiang-Yang Lou (a,b), Myron Chang (a), and Rongling Wu (a,c)
a Department of Statistics, University of Florida, Gainesville, Florida 32611,
b Department of Agronomy, Zhejiang University, Hangzhou, Zhejiang 310029, People's Republic of China
c Laboratory of Statistical Genetics, Zhejiang Forestry College, Lin'An, Zhejiang 311300, People's Republic of China

Corresponding author: Rongling Wu, 533 McCarty Hall C, University of Florida, Gainesville, FL 32611. E-mail: rwu@stat.ufl.edu

Communicating editor: S. TAVARÉ


*  ABSTRACT

Because of uncertainty about linkage phases of founders, linkage mapping in nonmodel, outcrossing systems using molecular markers presents one of the major statistical challenges in genetic research. In this article, we devise a statistical method for mapping QTL affecting a complex trait by incorporating all possible QTL-marker linkage phases within a mapping framework. The advantage of this model is the simultaneous estimation of linkage phases and QTL location and effect parameters. These estimates are obtained through maximum-likelihood methods implemented with the EM algorithm. Extensive simulation studies are performed to investigate the statistical properties of our model. In a case study from a forest tree, this model has successfully identified a significant QTL affecting wood density. Also, the probability of the linkage phase between this QTL and its flanking markers is estimated. The implications of our model and its extension to more general circumstances are discussed.


RECENT developments of modern molecular marker technologies and statistical and computational tools have led to a great resurgence of interest in studying the inheritance and genetic architecture of a complex trait at the individual quantitative trait locus (QTL) level (LANDER and SCHORK 1994; LANDER and WEINBERG 2000; MACKAY 2001). A number of analytical methodologies that suit different situations of QTL mapping have been framed (LYNCH and WALSH 1998; JANSEN 2000; WELLER 2001) and the results of genetic analysis for a variety of organisms using these methodologies have been reported (reviewed in WU et al. 2000; MACKAY 2001). According to the biological properties of study objects, all these theoretical or empirical studies can be classified into two categories, one for model systems and the other for nonmodel systems.

QTL mapping for model systems, in which homozygous inbred lines can be developed, is performed with well-designed experiments. One popular experimental design is to create a segregating progeny population, such as F2 or backcross, by using two complementary inbred lines. Statistical technologies for identifying QTL in these standard designs are relatively simple because there are only two segregating alleles for each genetic locus and because the allelic frequencies and linkage phases for both the markers and QTL are known. LANDER and BOTSTEIN (1989) were the first to propose a maximum-likelihood-based method to map a QTL in a chromosomal interval bracketed by two flanking markers. The theory behind this interval-mapping method was subsequently extended to create so-called composite-interval mapping by combining multiple marker regression analysis techniques, which can overcome the influences of QTL in other marker intervals (reviewed in JANSEN 2000). Although these two statistical developments of QTL mapping have brought about numerous publications on QTL identification, they are not designed to make a simultaneous search for all possible QTL affecting a quantitative trait throughout the entire genome. KAO and ZENG (1997) derived general formulas for obtaining maximum-likelihood estimates for QTL positions and effects. In their article, the authors developed a multiple-interval-mapping method to search and map all possible QTL by analyzing multiple marker intervals simultaneously and, therefore, to estimate the genetic architecture of a quantitative trait in a comprehensive framework.

Unlike the model systems, it is difficult or impossible to generate inbred lines in nonmodel systems and, thus, QTL mapping for these species should be based on existing nondomesticated populations, such as a full-sib family derived from heterozygous parents (GRATTAPAGLIA and SEDEROFF 1994). In the mapping populations of nonmodel systems the number of segregating alleles per marker locus or QTL and the linkage phases between different loci are usually unknown. These two uncertainties make linkage analysis and QTL mapping using molecular markers much more challenging for a full-sib family of outbred lines than for the progeny of a cross derived from inbred lines. Several studies have been conducted for linkage analyses of molecular markers that differ in segregation informativeness (RITTER et al. 1990; ARUS et al. 1994; RITTER and SALAMINI 1996; MALIEPAARD et al. 1997; RIDOUT et al. 1998) or for QTL identification using these different markers in a full-sib family (SCHAFER-PREGL et al. 1996; JOHNSON et al. 1999; SONG et al. 1999). In some studies, more sophisticated statistical algorithms, such as Bayesian approaches relying on Markov chain Monte Carlo, have been proposed to take the complexity of full-sib family mapping into account (HOESCHELE et al. 1997; SILLANPAA and ARJAS 1999; XU and YI 2000). However, all these studies are still simplified in practice because they do not provide a robust approach for characterizing linkage phases between markers and QTL. Because different segregation patterns are expected under different linkage phases (WU et al. 2002), the failure to characterize a correct linkage phase may lead to serious biases in the estimation of QTL positions and effect sizes in a full-sib family.

In this article, we extend the multilocus analysis procedure of WU et al. (2002) to simultaneously estimate linkage and linkage phases between markers and QTL segregating in outcrossing populations. Our idea here is to integrate all possible linkage phases between a putative QTL and two flanking markers in two parents, each specified by a phase probability, within the framework of a mixture statistical model. In characterizing a most likely linkage phase on the basis of the phase probabilities, the QTL position, QTL effects, and other model parameters are also estimated using a likelihood approach. We perform numerous simulation studies to investigate the robustness, power, and precision of our statistical mapping method, incorporating linkage phases. An example from an outcrossing forest tree is used to validate the application of our method to QTL mapping for nonmodel systems.


*  STATISTICAL MODEL

Marker segregation types:
A commonly used mapping population for outcrossing species is one derived from a full-sib family generated by two heterozygous outbred parental lines. In such a full-sib family, many different marker segregation types can be expected given the heterozygosity of the two parents. GRATTAPAGLIA and SEDEROFF (1994) proposed a well-accepted pseudo-testcross design for mapping in an outcrossing population, but this design can use only the portion of the genome markers that segregate in one parent and are null in the other. For a full-sib family derived from two outbred parents, up to four marker alleles, besides a null allele, can occur at a single locus. Also, the number of alleles may vary over loci. Each of the marker alleles, symbolized by a, b, c, and d, is dominant to the null allele, symbolized by o. Assume that all markers follow Mendelian segregation without distortion. Depending on how different alleles are combined between the two parents used for the cross, a total of 18 possible cross types exist for a segregating marker locus (Table 1). On the basis of both parental and offspring marker band patterns, these cross types can be classified into seven groups:

  • A. Loci that are heterozygous in both parents and segregate in a 1:1:1:1 ratio, including four alleles, ab x cd; three nonnull alleles, ab x ac; three nonnull alleles and a null allele, ab x co; and two null alleles and two nonnull alleles, ao x bo.


     
    Table 1. Possible marker genotype cross combinations and observed marker band patterns for parents and their offspring

  • B. Loci that are heterozygous in both parents and segregate in a 1:2:1 ratio, which include three groups:

  • B1. Three alleles form a nonsymmetrical cross type between the two parents. Of the three alleles, one is a null allele in one parent, e.g., ab x ao.

  • B2. The reciprocal of B1.

  • B3. Two alleles form a symmetrical type between the two parents, i.e., ab x ab.

  • C. Loci that are heterozygous in both parents and segregate in a 3:1 ratio, i.e., ao x ao.

  • D. Loci that are in the testcross configuration between the parents and segregate in a 1:1 ratio, which include two groups:

  • D1. Heterozygous in one parent and homozygous in the other, including three alleles, ab x cc; two alleles, ab x aa, ab x oo, and bo x aa; and one allele ao x oo.

  • D2. The reciprocals of D1.
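The segregation ratios quoted for groups A through D can be checked by enumerating the four equally likely zygotes of a cross. The following is a minimal Python sketch, assuming (as stated above) that every non-null allele is dominant to the null allele o, so that the observed band pattern is simply the set of non-null alleles an offspring carries. The function names `band_pattern` and `segregation` are illustrative and do not come from the original article.

```python
from collections import Counter
from itertools import product

def band_pattern(genotype):
    """Observed phenotype: the set of non-null alleles ('o' is the null allele)."""
    bands = sorted(set(genotype) - {"o"})
    return "".join(bands) or "o"

def segregation(parent_p, parent_q):
    """Count offspring band patterns among the four equally likely zygotes."""
    zygotes = (a + b for a, b in product(parent_p, parent_q))
    return dict(Counter(band_pattern(z) for z in zygotes))

# One representative cross per ratio discussed in the text
print(segregation("ab", "cd"))  # group A:  1:1:1:1
print(segregation("ab", "ao"))  # group B1: 1:2:1 (pattern 'a' occurs twice)
print(segregation("ao", "ao"))  # group C:  3:1
print(segregation("ab", "cc"))  # group D1: 1:1
```

Because the enumeration works on band patterns rather than genotypes, it also shows why groups B and C carry less information than group A: several distinct genotypes collapse into the same observed pattern.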

A general statistical framework has been proposed for linkage analysis of different types of markers in nonmodel systems (WU et al. 2002). A multilocus linkage phase inference model is derived on the basis of a hidden Markov chain process to simultaneously estimate linkage and linkage phases for the markers on the same linkage group. The genetic mapping of QTL is conducted using such a well-constructed linkage map.

A general framework:
Consider two outbred parental lines denoted as P and Q, each containing two homologous chromosomes, 1 and 2, in a set. The cross between these two lines, 12 × 12, results in four possible parental chromosome pairings, 11, 12, 21, and 22, in which the first digit refers to the chromosome inherited from P and the second to the chromosome inherited from Q. In this article, we use italic numbers to denote parental chromosomes.

As explained above and seen in Table 1, there may be many different marker types in a full-sib family derived from the two outbred parental lines. However, all observed markers, no matter which type they come from, can be described by two alleles, $M^{k}_{1}$ and $M^{k}_{2}$, at marker $k$ and two alleles, $M^{k+1}_{1}$ and $M^{k+1}_{2}$, at marker $k+1$ for parent P. Similarly, the corresponding alleles for parent Q are described as $N^{k}_{1}$ and $N^{k}_{2}$ at marker $k$ and $N^{k+1}_{1}$ and $N^{k+1}_{2}$ at marker $k+1$. Suppose there is a QTL between the two markers. The four alleles of the QTL are denoted by $P_1$ and $P_2$ for parent P and by $Q_1$ and $Q_2$ for parent Q. Analogous to the marker segregation described in WU et al. (2002), the QTL segregates to generate zygotes $P_1Q_1$, $P_1Q_2$, $P_2Q_1$, and $P_2Q_2$ in a 1:1:1:1 ratio in the family. The recombination fractions between the two markers, between marker $k$ and the QTL, and between the QTL and marker $k+1$ are denoted by $r$, $r_1$, and $r_2$, respectively, with $r = r_1 + r_2 - 2r_1r_2$. Parent-specific difference of linkage is ignored. The alleles of these two markers and the QTL are arranged between the two homologous chromosomes in one of four possible linkage phases for each parent. However, the allelic linkage phases of the two markers can be determined for both parents through linkage analyses of markers using a strategy proposed in WU et al. (2002). Thus, under a fixed marker linkage phase, we have 2 × 2 = 4 parental combinations ($\Phi$'s) of linkage phase of the QTL relative to the two markers, schematically expressed, along with the order of the four QTL genotypes in the progeny, as




where the first and second subscripts of $\Phi$ denote the two possible phases of parents P and Q, respectively, and the vertical lines for each phase combination denote the two parental chromosomes, 1 and 2, of each parent. Each parent, whichever phase it has, will generate eight three-locus haploid gametes, with the gamete probabilities depending on the phase. Under phase combination $\Phi_{11}$, the eight gamete probabilities are calculated as

where we use the subscripts to denote the marker and QTL alleles. The eight gametes from parent P unite randomly with the eight gametes from parent Q, which will generate a total of 64 zygotic genotypes. The probabilities of the joint genotypes for the two markers and the QTL are calculated on the basis of the $g$'s, which are expressed in matrix $\mathbf{G}^{k(k+1)}$ (Table 2). This joint probability matrix is composed of four vectors, $\mathbf{g}^{k(k+1)}_{11}$, $\mathbf{g}^{k(k+1)}_{12}$, $\mathbf{g}^{k(k+1)}_{21}$, and $\mathbf{g}^{k(k+1)}_{22}$, each corresponding to a different parental chromosomal pairing. The probabilities of the 2-marker gametic genotypes are expressed as


 
Table 2. A 16-dimensional vector ($\mathbf{M}^{k(k+1)}$) for the probabilities of the marker genotypes for $k$ and $k+1$ and a (16 × 4)-matrix ($\mathbf{G}^{k(k+1)}$) for the probabilities of the joint genotypes for the two markers and the QTL bracketed by the markers in a full-sib family

These 4 marker gametic probabilities are used to calculate 16 marker zygotic probabilities denoted by vector $\mathbf{M}^{k(k+1)}$. Thus, according to Bayes' theorem, the matrix ($\mathbf{H}^{k(k+1)}$) for the conditional probabilities of different QTL genotypes, conditional upon the marker interval genotypes, can be derived as

where $\oslash$ stands for the division of the corresponding elements of each column in a matrix by a column vector. Correspondingly, the conditional probability matrix $\mathbf{H}^{k(k+1)}$ is composed of four vectors, $\mathbf{h}^{k(k+1)}_{11}$, $\mathbf{h}^{k(k+1)}_{12}$, $\mathbf{h}^{k(k+1)}_{21}$, and $\mathbf{h}^{k(k+1)}_{22}$, each represented by a different parental chromosomal pairing.
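The chain from gamete probabilities to the conditional QTL probabilities can be traced numerically. The sketch below is a hedged illustration, not the layout of Table 2: it assumes no crossover interference (which is what $r = r_1 + r_2 - 2r_1r_2$ implies), fully informative markers, the same recombination fractions in both parents, and phase combination $\Phi_{11}$ with all index-1 alleles on chromosome 1. The function names and the dictionary-based genotype keys are hypothetical choices made for readability.

```python
from itertools import product

def gamete_probs(r1, r2):
    """Eight three-locus gamete probabilities for one parent (no interference).

    Keys are (m, t, n): the chromosome of origin (0 or 1) of the allele at
    marker k, at the QTL, and at marker k+1.
    """
    probs = {}
    for m, t, n in product((0, 1), repeat=3):
        left = r1 if m != t else 1.0 - r1    # marker k to QTL
        right = r2 if t != n else 1.0 - r2   # QTL to marker k+1
        probs[(m, t, n)] = 0.5 * left * right
    return probs

def conditional_qtl_probs(r1, r2):
    """Return (M, H): marker-genotype probabilities and P(QTL genotype | markers)."""
    g = gamete_probs(r1, r2)  # assumption: identical map in both parents
    joint = {}                # 16 marker genotypes x 4 QTL genotypes = 64 zygotes
    for (gp, prob_p), (gq, prob_q) in product(g.items(), repeat=2):
        marker = (gp[0], gq[0], gp[2], gq[2])  # marker k (P, Q), marker k+1 (P, Q)
        qtl = (gp[1], gq[1])                   # QTL allele from P, from Q
        joint[(marker, qtl)] = prob_p * prob_q
    M = {}
    for (marker, _), pr in joint.items():
        M[marker] = M.get(marker, 0.0) + pr
    # Element-wise division of each joint probability by its marker probability
    H = {key: pr / M[key[0]] for key, pr in joint.items()}
    return M, H

M, H = conditional_qtl_probs(0.06, 0.14)
print(H[((0, 0, 0, 0), (0, 0))])  # P(P1Q1 | both gametes nonrecombinant)
```

Each row of the resulting conditional table sums to one over the four QTL genotypes, which is a quick sanity check on any implementation of the Bayes step.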

Because of different gamete probability combinations, the joint probabilities of the 64 zygotic genotypes (and therefore the conditional probabilities of the QTL genotypes given marker genotypes) will be different among the four phase combinations. However, regardless of the difference among these four phase combinations, these conditional probabilities under different phase combinations can be obtained just by changing the order of the QTL genotypes corresponding to a particular phase combination (Table 2).

Let $u$ and $v$ denote the QTL alleles that an offspring $i$ has received from parents P and Q, respectively. The conditional probability of the QTL genotype for this individual under parental-phase combination $\Phi_{11}$ is denoted by $h_{uvi}$, expressed in one of the four vectors, $\mathbf{h}^{k(k+1)}_{11}$, $\mathbf{h}^{k(k+1)}_{12}$, $\mathbf{h}^{k(k+1)}_{21}$, and $\mathbf{h}^{k(k+1)}_{22}$. The probability with which a particular phase occurs is denoted as $p$ for parent P and $q$ for parent Q. Without loss of generality, let $\phi_{11} = pq$, $\phi_{12} = p(1 - q)$, $\phi_{21} = (1 - p)q$, and $\phi_{22} = (1 - p)(1 - q)$. Thus, the conditional probability of a QTL genotype $P_uQ_v$ in the full-sib family should be a mixture of the corresponding conditional probabilities under these four phase combinations, weighted by $\phi_{11}$, $\phi_{12}$, $\phi_{21}$, and $\phi_{22}$.
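The phase mixture can be written compactly if the reordering of QTL genotypes under an alternative phase is read as a swap of the QTL allele labels in the parent whose phase is flipped. The Python sketch below makes that assumption explicit; allele and phase indices are 0-based (0 for the first allele or phase, 1 for the second), and the names `phase_weights` and `mixed_probs` are illustrative only.

```python
PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # index pairs for 11, 12, 21, 22

def phase_weights(p, q):
    """Weights of the four parental phase combinations from p and q."""
    return {(a, b): (p if a == 0 else 1.0 - p) * (q if b == 0 else 1.0 - q)
            for a, b in PAIRS}

def mixed_probs(h, p, q):
    """Mix one offspring's conditional QTL probabilities over the four phases.

    h maps (u, v) to the conditional probability computed under the first
    phase combination. Flipping a parent's phase swaps that parent's QTL
    allele labels, implemented here with XOR on the 0/1 indices.
    """
    phi = phase_weights(p, q)
    return {(u, v): sum(h[(u ^ a, v ^ b)] * phi[(a, b)] for a, b in PAIRS)
            for u, v in PAIRS}

h = {(0, 0): 0.70, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.05}
print(mixed_probs(h, 0.9, 0.8))
```

With $p = q = 1$ the mixture returns the input unchanged, and with $p = q = 0.5$ every QTL genotype receives probability 0.25, which is the situation in which the markers carry no usable phase information.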

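Assume that the phenotypic values $y$ of a QTL genotype $P_uQ_v$ are normally distributed with mean $\mu_{uv}$ and variance $\sigma^2$, expressed as $y \sim N(\mu_{uv}, \sigma^2)$ with density $f_{uv}(y)$. The likelihood function of the phenotypic values ($\mathbf{y}$) for all $N$ offspring in the full-sib family is expressed in terms of a normal mixture model:

$$L(\Omega \mid \mathbf{y}) = \prod_{i=1}^{N} \sum_{u=1}^{2} \sum_{v=1}^{2} \pi_{uvi}\, f_{uv}(y_i), \qquad (1)$$

where $\Omega = (\mu_{uv}, \sigma^2, p, q)^{T}$ is the vector of unknown parameters contained within the mixture model, and

$$\begin{aligned}
\pi_{11i} &= h_{11i}\phi_{11} + h_{12i}\phi_{12} + h_{21i}\phi_{21} + h_{22i}\phi_{22},\\
\pi_{12i} &= h_{12i}\phi_{11} + h_{11i}\phi_{12} + h_{22i}\phi_{21} + h_{21i}\phi_{22},\\
\pi_{21i} &= h_{21i}\phi_{11} + h_{22i}\phi_{12} + h_{11i}\phi_{21} + h_{12i}\phi_{22},\\
\pi_{22i} &= h_{22i}\phi_{11} + h_{21i}\phi_{12} + h_{12i}\phi_{21} + h_{11i}\phi_{22}
\end{aligned}$$

are the mixtures of the conditional probabilities of different QTL genotypes over different phase combinations. The parameters contained in $\Omega$ can be estimated by implementing the expectation-maximization (EM) algorithm (DEMPSTER et al. 1977). The log-likelihood of Equation 1 is given by

$$\log L(\Omega \mid \mathbf{y}) = \sum_{i=1}^{N} \log\left[\sum_{u=1}^{2} \sum_{v=1}^{2} \pi_{uvi}\, f_{uv}(y_i)\right], \qquad (2)$$

with a derivative for any unknown $\Omega_\lambda$

$$\frac{\partial}{\partial \Omega_\lambda} \log L(\Omega \mid \mathbf{y}) = \sum_{i=1}^{N} \sum_{u=1}^{2} \sum_{v=1}^{2} \Pi_{uvi}\, \frac{\partial}{\partial \Omega_\lambda} \log\left[\pi_{uvi}\, f_{uv}(y_i)\right],$$

where we define

$$\Pi_{uvi} = \frac{\pi_{uvi}\, f_{uv}(y_i)}{\sum_{u'=1}^{2} \sum_{v'=1}^{2} \pi_{u'v'i}\, f_{u'v'}(y_i)}, \qquad (3)$$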

which could be thought of as a posterior probability that the $i$th offspring has a QTL genotype $P_uQ_v$. We then implement the EM algorithm with the expanded parameter set $\{\Omega, \Pi\}$, where $\Pi = \{\Pi_{uvi}\}$. Conditional on $\Pi$, we solve for the zeros of $(\partial/\partial\Omega_\lambda)\log L(\Omega \mid \mathbf{y})$ (Appendix A) to get our estimates of $\Omega$ (the M step). The estimates are then used to update $\Pi$ (the E step), and the process is repeated until convergence. The values at convergence are the maximum-likelihood estimates (MLEs).
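One way to realize this iteration is sketched below in plain Python. It is a hedged illustration rather than the update equations of Appendix A, which are not reproduced here: it assumes the standard EM treatment in which each offspring carries a latent (QTL genotype, phase combination) pair, so that the phase probabilities are updated as average posterior phase memberships. All function names are hypothetical, and indices are 0-based.

```python
import math

PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # QTL genotypes and phase combinations

def normal_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def joint_weights(h_i, y_i, mu, var, p, q):
    """Unnormalized weight of every (genotype, phase) pair for one offspring."""
    w = {}
    for u, v in PAIRS:
        f = normal_pdf(y_i, mu[(u, v)], var)
        for a, b in PAIRS:
            phi = (p if a == 0 else 1.0 - p) * (q if b == 0 else 1.0 - q)
            w[(u, v, a, b)] = h_i[(u ^ a, v ^ b)] * phi * f
    return w

def log_likelihood(h, y, mu, var, p, q):
    """Log of the mixture likelihood, summed over offspring."""
    return sum(math.log(sum(joint_weights(h_i, y_i, mu, var, p, q).values()))
               for h_i, y_i in zip(h, y))

def em_step(h, y, mu, var, p, q):
    """One E step followed by one M step; returns updated (mu, var, p, q)."""
    n = len(y)
    post, p_sum, q_sum = [], 0.0, 0.0
    for h_i, y_i in zip(h, y):
        w = joint_weights(h_i, y_i, mu, var, p, q)
        total = sum(w.values())
        Pi = {g: 0.0 for g in PAIRS}        # posterior QTL genotype probabilities
        for (u, v, a, b), val in w.items():
            val /= total
            Pi[(u, v)] += val
            p_sum += val if a == 0 else 0.0  # posterior weight of first phase in P
            q_sum += val if b == 0 else 0.0  # posterior weight of first phase in Q
        post.append(Pi)
    new_mu = {g: sum(Pi[g] * y_i for Pi, y_i in zip(post, y)) /
                 sum(Pi[g] for Pi in post) for g in PAIRS}
    new_var = sum(Pi[g] * (y_i - new_mu[g]) ** 2
                  for Pi, y_i in zip(post, y) for g in PAIRS) / n
    return new_mu, new_var, min(1.0, p_sum / n), min(1.0, q_sum / n)
```

In use, `h` would be a list with one dictionary of conditional QTL probabilities per offspring at a fixed putative QTL position, and `em_step` would be called repeatedly until the change in `log_likelihood` falls below a tolerance. Because this is a proper EM scheme for the mixture likelihood, the log-likelihood should never decrease between iterations.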

Because marker information for each offspring has been incorporated into the mixture model of Equation 1, one unknown parameter $r_1$ or $r_2$ (that determines the location of the QTL on the interval) should be estimated along with vector $\Omega$. But in practice, the QTL location is estimated by treating $r_1$ (and therefore $r_2$) as fixed. Using a grid approach, we can obtain the MLE of the QTL location from the peak of the profile of the log-likelihood ratio test statistics across a chromosome.
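The grid over putative QTL positions can be sketched as follows. The article does not name a map function, so the Haldane function is used here as an assumption; it is the map function consistent with the no-interference relation $r = r_1 + r_2 - 2r_1r_2$. The helper names and the `lr_at` callback (which would wrap the EM fit at a fixed position) are hypothetical.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def grid_positions(interval_cm, step_cm):
    """List of (position, r1, r2) for grid points strictly inside a marker interval."""
    n_steps = int(round(interval_cm / step_cm))
    grid = []
    for i in range(1, n_steps):
        x = i * step_cm
        grid.append((x, haldane_r(x), haldane_r(interval_cm - x)))
    return grid

def profile_peak(grid, lr_at):
    """Return the grid point with the largest LR; lr_at(r1, r2) is user supplied."""
    return max(grid, key=lambda point: lr_at(point[1], point[2]))

# Every grid point reproduces the two-marker recombination fraction
r = haldane_r(20.0)
for x, r1, r2 in grid_positions(20.0, 2.0):
    assert math.isclose(r1 + r2 - 2.0 * r1 * r2, r, abs_tol=1e-12)
```

Scanning in steps of 1 to 2 cM is a common choice for this kind of profile; a finer grid only sharpens the location estimate if the likelihood surface is itself sharply peaked.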

On the basis of quantitative genetic theory, the genotypic value of a QTL can be partitioned into the additive and dominant effects as

$$\mu_{uv} = \mu + \alpha_u + \beta_v + \delta_{uv},$$

where $\mu$ is the overall mean, $\alpha_u$ and $\beta_v$ are the allelic (additive) effects of alleles $u$ and $v$, respectively, and $\delta_{uv}$ is the interaction (dominant) effect at the QTL. Considering all possible alleles and allele combinations between the two parents, there are a total of four additive effects ($\alpha_1$ and $\alpha_2$ from parent P and $\beta_1$ and $\beta_2$ from parent Q) and four dominant effects ($\delta_{11}$, $\delta_{12}$, $\delta_{21}$, and $\delta_{22}$). But these additive and dominant effects are not independent and, therefore, are not all estimable. After reparameterization, there are two independent additive effects, $\alpha = \alpha_1 = -\alpha_2$ and $\beta = \beta_1 = -\beta_2$, and one dominant effect, $\delta = \delta_{11} = -\delta_{12} = -\delta_{21} = \delta_{22}$, to be estimated.

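Let $\mathbf{m} = (\mu_{uv})_{4 \times 1}$ and $\mathbf{a} = (\mu, \alpha, \beta, \delta)^{T}$, which can be connected by a design matrix $\mathbf{D}$. We have

$$\mathbf{m} = \mathbf{D}\mathbf{a},$$

where, with the genotypic means ordered as $(\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22})^{T}$,

$$\mathbf{D} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}.$$

The MLE of $\mathbf{a}$ can be obtained from the MLE of $\mathbf{m}$ by

$$\hat{\mathbf{a}} = \mathbf{D}^{-1}\hat{\mathbf{m}}.$$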
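The conversion between genotypic means and genetic effects can be illustrated with a few lines of Python. The design matrix below is an assumption derived from the sign constraints stated above ($\alpha_1 = -\alpha_2$, $\beta_1 = -\beta_2$, $\delta_{11} = -\delta_{12} = -\delta_{21} = \delta_{22}$) with the means ordered as $\mu_{11}, \mu_{12}, \mu_{21}, \mu_{22}$; the function names are illustrative.

```python
# Rows: genotypes P1Q1, P1Q2, P2Q1, P2Q2; columns: mu, alpha, beta, delta
D = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, -1, -1, 1]]

def means_from_effects(a):
    """Genotypic means m = D a for a = (mu, alpha, beta, delta)."""
    return [sum(D[i][j] * a[j] for j in range(4)) for i in range(4)]

def effects_from_means(m):
    """Effects a = D^{-1} m. This D is symmetric with D D = 4 I, so D^{-1} = D / 4."""
    return [sum(D[j][i] * m[i] for i in range(4)) / 4.0 for j in range(4)]

m = means_from_effects([10.0, 0.5, 0.5, 0.5])
print(m)                      # [11.5, 9.5, 9.5, 9.5]
print(effects_from_means(m))  # [10.0, 0.5, 0.5, 0.5]
```

The example uses the effect values $\alpha = \beta = \delta = 0.5$ quoted in the simulation figures; the overall mean of 10 is an arbitrary illustrative value.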

Fitting marker phenotypes:
We have built a general framework for QTL mapping based on the two-marker zygote genotypes. But in practice only the phenotypes of the marker zygotes can be observed. The numbers of the zygote phenotypes of a marker are 4, 3, 3, 3, 2, 2, and 2 for marker types A, B1, B2, B3, C, D1, and D2, respectively (Table 1). We have designed different incidence matrices $\mathbf{I}$ (WU et al. 2002; Appendix B) to connect the zygotic genotypes to the zygotic phenotypes for all different marker types listed in Table 1. Thus, general expressions for the probability vector of two-marker genotypes or the joint probability matrix of two markers and QTL for particular marker types can be derived by using the corresponding incidence matrices (Appendix B), which are expressed as

where $\otimes$ is the Kronecker product, and $\mathbf{I}_k$ and $\mathbf{I}_{k+1}$ are the incidence matrices for markers $k$ and $k+1$. For some marker types, the pattern and structure of the incidence matrices are dependent on the linkage phases of the two flanking markers. Hence, the conditional probability matrix for an observed marker type is calculated as

which is used as a basis for QTL mapping in outcrossing species.
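How an incidence matrix collapses genotype probabilities into phenotype probabilities can be sketched as follows. Appendix B is not reproduced here, so the matrices below are assumptions: a fully informative marker (type A) is given the identity matrix, and a dominant marker of type C (ao x ao) is given a 2 x 4 matrix that merges the three band-present genotypes, assuming the non-null allele sits on chromosome 1 in both parents. The ordering of the 16 marker genotypes and all function names are likewise illustrative.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def marker_genotype_probs(r):
    """16 two-marker zygote probabilities; index = 4 * (pairing at k) + (pairing at k+1),
    with pairings ordered 11, 12, 21, 22 and recombination fraction r in both parents."""
    def g2(m, n):  # two-marker gamete probability
        return 0.5 * ((1.0 - r) if m == n else r)
    return [g2(mp, np_) * g2(mq, nq)
            for mp in (0, 1) for mq in (0, 1)
            for np_ in (0, 1) for nq in (0, 1)]

I_A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # type A: 4 phenotypes
I_C = [[1, 1, 1, 0], [0, 0, 0, 1]]                              # type C: band present / absent

r = 0.18
M = marker_genotype_probs(r)
M_obs = matvec(kron(I_A, I_C), M)  # 8 observable two-marker phenotype classes
print(len(M_obs), sum(M_obs))
```

The same collapsed matrix `kron(I_A, I_C)` would be applied to each column of the joint marker-QTL probability matrix before the element-wise division that yields the conditional QTL probabilities.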

Hypothesis tests:
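The existence of a QTL of significant effect within a marker interval can be tested by calculating a log-likelihood ratio (LR) test statistic under the null (H0, there is no QTL) and alternative hypotheses (H1, there is a QTL), expressed as

$$\mathrm{LR} = -2 \log\left[\frac{L(\tilde{\Omega} \mid \mathbf{y})}{L(\hat{\Omega} \mid \mathbf{y})}\right], \qquad (4)$$

where $\tilde{\Omega}$ and $\hat{\Omega}$ denote the MLEs of the unknown parameters under H0 and H1, respectively.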

The LR under the null hypothesis is asymptotically $\chi^2$-distributed with 4 d.f. However, because the position of a QTL is not identifiable under the null hypothesis, the assumption of the $\chi^2$ distribution of the LR is violated. CHURCHILL and DOERGE (1994) proposed a permutation test approach to determine a critical threshold for declaring the existence of a QTL at a given type I error rate. Similar test statistics can be formulated for testing for the significance of any kind of gene effects of the QTL detected.
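The permutation idea can be sketched in a few lines: shuffling the phenotypes breaks any marker-trait association while preserving the marker data, and the upper quantile of the resulting maximum statistics serves as the threshold. In the sketch below the function names are hypothetical, and the toy statistic (a squared difference in group means at two-class markers) is only a stand-in for the maximum LR over a linkage group, which would require the full mixture-model fit.

```python
import math
import random

def permutation_threshold(y, max_stat, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of max_stat over phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(y)  # work on a copy so the caller's data are untouched
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max_stat(shuffled))
    null_max.sort()
    idx = min(n_perm - 1, int(math.ceil((1.0 - alpha) * n_perm)) - 1)
    return null_max[idx]

def make_toy_max_stat(genotypes):
    """genotypes[m][i] in {0, 1}: a hypothetical two-class score at marker m."""
    def max_stat(y):
        best = 0.0
        for marker in genotypes:
            g0 = [v for v, g in zip(y, marker) if g == 0]
            g1 = [v for v, g in zip(y, marker) if g == 1]
            if g0 and g1:
                diff = sum(g1) / len(g1) - sum(g0) / len(g0)
                best = max(best, diff * diff)
        return best
    return max_stat

rng = random.Random(42)
genotypes = [[rng.randint(0, 1) for _ in range(90)] for _ in range(6)]
y = [rng.gauss(0.0, 1.0) for _ in range(90)]
print(permutation_threshold(y, make_toy_max_stat(genotypes), n_perm=200))
```

Taking the maximum over all tested positions inside each permutation is what makes the threshold chromosome-wide rather than pointwise.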

In a full-sib family derived from two outbred parents, it is possible that a putative QTL does not segregate in a 1:1:1:1 ratio. The genetic model (1) proposed in this article has power to test if the QTL detected is diallelic segregating 1:2:1 (like marker type B) or 1:1 (like marker type D1 or D2; see Table 1). The hypothesis that a significant QTL conforms to segregation type B can be tested by formulating

Similarly, the hypothesis for testing for the consistency of the QTL segregation to type D can be formulated as

In each of the two hypotheses above, the LR is calculated similarly to Equation 4. In practice, the segregation pattern of a significant QTL should be tested because this is important for designing an efficient breeding strategy.


*  MONTE CARLO SIMULATION

Extensive simulation studies are performed to test the statistical properties of our method for simultaneously estimating QTL position and effects and linkage phase between the QTL and markers in an outcrossed population. Suppose a full-sib family is derived from two outcrossed parents. This full-sib family is genotyped at six equally spaced (20-cM) fully informative markers (type A), forming five intervals. A QTL is hypothesized at 26 cM from the first marker (located within the second interval).

The phenotypic values for this full-sib family are simulated by giving a particular set of unknown QTL effect parameters under the linkage phase combination $\Phi_{11}$ for the two parents. These simulated data are subject to mapping analysis using our model that considers a mixture of all possible linkage phases. Thus, if the MLEs of the phase probabilities $p$ and $q$ are near one, this indicates that our model can precisely characterize the linkage phase for a practical phase-unknown data set. Of course, if the data are simulated under another linkage phase combination, the values of $p$ and $q$ reflecting the correct linkage phase change correspondingly.

The critical thresholds for declaring a significant QTL are determined from the distribution of the LR values calculated from the simulated phenotypic data assuming no QTL. The simulated data under this null hypothesis are analyzed by the statistical model proposed. The distribution of the LR values over 1000 simulation replicates can be approximated by a $\chi^2$ distribution. The 99th percentile of the distribution of the maximum is used as the empirical critical value to declare chromosome-wide existence of a significant QTL at the significance level $\alpha$ = 0.01.

Our simulation schemes include different gene action modes (additive, dominant, or overdominant), different heritabilities ($H^2$ = 0.1 or 0.4), and different sample sizes ($N$ = 200 or 400; Table 3). Given a heritability and the genetic variance calculated from hypothesized genetic effect values, we determine the residual variance ($\sigma^2$). The accuracy and precision of parameter estimates are affected by gene action modes in three ways (Table 3). First, an overdominant QTL tends to have a more precise estimate of location than does a dominant or additive QTL. For example, the standard error (SE) of the location MLE for an overdominant QTL is 14–20% smaller than those for other QTL when the heritability is 0.4 and sample size is 400. Second, for a small heritability trait, the MLEs of additive effects for an overdominant QTL are less biased than those for additive or dominant QTL. Third, the dominant effect is overestimated to a larger extent than the additive effect, especially for an overdominant QTL.
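The step from heritability to residual variance can be made concrete. The sketch below assumes that heritability is defined as $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma^2)$ and that the four QTL genotypes are equally frequent (1:1:1:1 segregation), so that the genetic variance is the variance of the four genotypic values; both assumptions, and the function names, are illustrative rather than taken from the article.

```python
def genetic_variance(alpha, beta, delta):
    """Variance of the four equally frequent genotypic values (overall mean set to 0)."""
    values = [alpha + beta + delta,    # P1Q1
              alpha - beta - delta,    # P1Q2
              -alpha + beta - delta,   # P2Q1
              -alpha - beta + delta]   # P2Q2
    mean = sum(values) / 4.0
    return sum((v - mean) ** 2 for v in values) / 4.0

def residual_variance(h2, var_g):
    """Solve H^2 = var_g / (var_g + var_e) for the residual variance var_e."""
    return var_g * (1.0 - h2) / h2

var_g = genetic_variance(0.5, 0.5, 0.5)
print(var_g)                          # 0.75
print(residual_variance(0.4, var_g))  # residual variance for H^2 = 0.4
print(residual_variance(0.1, var_g))  # residual variance for H^2 = 0.1
```

Under these assumptions the genetic variance reduces to $\alpha^2 + \beta^2 + \delta^2$, because the three effect contrasts are mutually orthogonal across the four genotypes.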


 
Table 3. MLEs (±SE) of QTL position, QTL effects, and phase probabilities and the power of detecting a significant QTL under different heritabilities ($H^2$) and sample sizes ($N$)

The estimation accuracy and precision of all parameters can be improved when heritabilities and sample sizes are increased (Table 3). For example, it is difficult to estimate the position of QTL for a low heritability ($H^2$ = 0.1) trait when $N$ = 200 (Fig. 1A). An increased sample size ($N$ = 400) can lead to more precise estimation of the QTL location. For a high heritability ($H^2$ = 0.4) trait, the QTL can be precisely localized, especially when a larger sample size is used (Fig. 1B). Similar trends also hold for the estimates of other QTL parameters, such as additive and dominant effects, and model parameters (overall means and residual variance; Table 3). It appears that there are more substantial improvements in the accuracy and precision of parameter estimates due to an increased heritability level from 0.1 to 0.4 than to an increased sample size from 200 to 400.



Figure 1. The profiles of the log-likelihood ratio (LR) test statistic from one random simulation replicate for QTL detection across a linkage group for a quantitative trait with different heritabilities (A) $H^2$ = 0.1 and (B) $H^2$ = 0.4. The statistical model used considers all possible linkage phases between the QTL and its flanking markers (second and third marker). Results from different sample sizes ($N$ = 200, broken curves; $N$ = 400, solid curves) are compared. The empirical thresholds for declaring the existence of a QTL at the significance level 0.05 are indicated by two horizontal lines ($N$ = 200, broken lines; $N$ = 400, solid lines). The vertical lines with an arrow indicate the position of the hypothesized QTL. The additive and dominant effects of a QTL hypothesized in the model are $\alpha$ = 0.5, $\beta$ = 0.5, and $\delta$ = 0.5.

It is interesting to note that our model estimates the linkage phase between the QTL and the markers well. The MLEs of phase probabilities are close to 0.90 for a small heritability trait and ≥0.95 for a high heritability trait (Table 3). Unlike the estimation of other parameters, the power of detecting a significant QTL seems to be more sensitive to sample sizes than to heritabilities (Table 3). For a small mapping population, the power of detecting a significant QTL is considerably reduced.

We also performed an additional simulation experiment to test the influence of incorrectly characterizing a linkage phase on QTL detection and parameter estimation. The simulated data, given $H^2$ = 0.4 and $N$ = 400, under linkage phase combination $\Phi_{11}$ are analyzed using models based on this phase and three other different phases, $\Phi_{12}$, $\Phi_{21}$, and $\Phi_{22}$. Because different linkage phases change only the order of the parental chromosomal pairings, the maximums of the LR values from the correct linkage phase $\Phi_{11}$ and the three incorrect linkage phases $\Phi_{12}$, $\Phi_{21}$, and $\Phi_{22}$ will be identical (see Fig. 2), suggesting that phase-separate analyses have no power to select a most likely linkage phase. Also, as shown by flat, crooked curves, the maximum LR value from a single linkage phase model cannot be used to precisely determine the QTL position. Fig. 2 also illustrates the LR values across the linkage group calculated when all linkage phase combinations are considered simultaneously on the basis of the same simulated data set. A higher peak of the curve for a mixed-phase analysis (see also Fig. 1B) indicates that our model has a greater advantage in detecting a significant QTL than usual phase-separate analyses do. When an incorrect linkage phase is used, the signs of the MLEs of the additive and dominant effects of a QTL will be reversed (results not shown).



Figure 2. The profiles of the log-likelihood ratio (LR) test statistic from one random simulation replicate for QTL detection across a linkage group under one mixed- (solid curve) and four separate-phase analyses (dotted curves). The heritability for the trait hypothesized is $H^2$ = 0.4 with a sample size of 400. It should be noted that the same simulated data set given $\alpha$ = 0.5, $\beta$ = 0.5, and $\delta$ = 0.5 is used for all of these five different (one mixed- and four separate-phase) analyses.


*  A CASE STUDY

We use an example of an outcrossed forest tree to demonstrate the power of our statistical model for mapping QTL affecting a quantitative trait. The study material used was derived from the hybridization between two poplar species, Populus deltoides and P. euramericana. A genetic linkage map was constructed using a so-called pseudo-testcross strategy (GRATTAPAGLIA and SEDEROFF 1994) based on 90 genotypes selected randomly from the F1 interspecific hybrid family with random amplified polymorphic DNAs (RAPDs), amplified fragment length polymorphisms (AFLPs), and inter-simple sequence repeats (ISSRs; YIN et al. 2002). This map is composed of the 19 largest linkage groups for each parental map, which roughly represent 19 pairs of chromosomes. The 90 hybrid genotypes used for map construction were measured for wood density with wood samples collected from 11-year-old stems in a field trial. The measurement for each genotype was repeated four to six times to reduce measurement errors. The means of these genotypes were calculated and used for QTL mapping here.

Our model can successfully identify a significant QTL for wood density on linkage group D17 as reported in YIN et al. (2002). In this example, the empirical estimate of the critical value is obtained from 1000 permutation tests. It is found that the critical value for declaring the existence of a QTL on the whole linkage group under consideration is 6.9 at the significance level P = 0.05. The profile of the LRs of the full vs. reduced model across the length of linkage group D17 has a steep peak within a narrow marker interval, AG/CGA-480–AG/CGA-330 (Fig. 3). The LR value at this peak is 11.7, well beyond the empirical critical threshold at the significance level P = 0.05.



Figure 3. The profile of the log-likelihood ratio (LR) test statistic for QTL detection across linkage group D17 in YIN et al. (2002), using the mixed-phase analysis. The empirical threshold based on permutation tests (CHURCHILL and DOERGE 1994) is indicated by the horizontal line. The marker names across the linkage group are given at the bottom.

The additive effect of the significant QTL detected is 0.033, equivalent to 7% of the overall mean. This QTL was found to explain ~30% of the phenotypic variance for wood density in hybrid poplars. The MLE of phase probability $p$ is 0.82, suggesting a fairly high probability of linkage phase $\Phi_{11}$. This indicates that the positive allele of this QTL that increases wood density is, at a probability of 0.82, in coupling phase with dominant alleles of the two markers AG/CGA-480 and AG/CGA-330 flanking the QTL.

The same material was analyzed using a traditional interval-mapping approach that assumes one possible QTL-marker linkage phase at a time. This phase-separate approach can also identify a significant QTL for wood density (data not shown), but cannot determine a correct linkage phase because the maximums of the LR values are identical between two possible linkage phases. Our method provides important information about nonallelic arrangements on the homologous chromosomes.


*  DISCUSSION

Statistical strategies for mapping QTL segregating in an inbred population have been well established (reviewed in JANSEN 2000), which has led to a number of publications reporting the detection of QTL in different species (WU et al. 2000; MACKAY 2001). Yet, despite its importance, QTL mapping in outcrossing species is often hampered by the lack of an appropriate statistical method that accounts for the high heterozygosity of this group of species. In this article, we present a statistical model for mapping QTL in these outcrossing, nonmodel systems by incorporating their heterozygous nature into a mapping framework.

Our model has three advantages over current QTL mapping methods for full-sib families derived from outcrossing species (ANDERSSON et al. 1994; HALEY et al. 1994; XU 1996; KNOTT et al. 1997). First, our model can characterize the correct linkage phase between a putative QTL and markers. For heterozygous populations, the allelic arrangement of different markers and QTL on a single chromosome, i.e., the linkage phase, generally cannot be known a priori. The current statistical methods for full-sib analysis either were based on the simplifying assumption that markers are segregating but QTL are fixed in a full-sib family (HALEY et al. 1994) or failed to consider the influence of incorrectly characterizing a marker-QTL linkage phase on parameter estimation when both markers and QTL are assumed to be segregating (KNOTT et al. 1997). Linkage phase affects statistical inference about QTL effect size and direction for a fixed-model approach, although this problem does not occur for a random model-based mapping approach (XU and ATCHLEY 1995).

Second, our model can analyze all possible types of markers and can test how a QTL is segregating in a family. Most current studies consider only fully informative markers. For example, XU (1996) proposed a full-sib family-mapping approach by assuming four different alleles at each marker and QTL. The applicability of this approach is likely limited because the genome of an outcrossing species is often covered by different types of polymorphic markers, forming many different cross types when two parents are crossed (Table 1).

Third, our model provides a way of simultaneously estimating linkage phases and QTL parameters within a unified framework. Simulation studies suggest that this unified framework has power to characterize the most likely linkage phase and also displays increased power to detect a significant QTL (Figure 2). For two heterozygous parents used to generate a full-sib family, there are multiple possible linkage phases, but only one is correct. For pure marker analysis, maximum-likelihood approaches can be used to select the most likely linkage phase because different phases correspond to different LR values (WU et al. 2002). When two markers are used to infer a QTL, however, all linkage phases theoretically give an identical LR (Figure 2), so the linkage phase cannot be correctly determined on the basis of such a likelihood comparison. If QTL identification is based on a wrong linkage phase, the estimates of the QTL additive and dominance effects will have the opposite sign.
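The identical likelihood and reversed effect sign under a wrong linkage phase can be illustrated with a small simulation. The sketch below is a deliberate simplification, not the mixed-phase interval model of this article: it assumes the two QTL genotype classes are known up to the phase labeling, so that assuming the opposite phase simply swaps the class labels. The function name `fit_two_group` and all numerical values are hypothetical.

```python
import numpy as np

def fit_two_group(y, g):
    """MLEs for a two-class normal model with a common variance.
    Returns the estimated effect (class 1 mean minus class 0 mean)
    and the maximized log-likelihood."""
    m0, m1 = y[g == 0].mean(), y[g == 1].mean()
    resid = y - np.where(g == 1, m1, m0)
    sigma2 = np.mean(resid ** 2)                      # MLE of residual variance
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    return m1 - m0, loglik

rng = np.random.default_rng(1)
g = rng.integers(0, 2, size=200)                      # class labels under one phase
y = 10.0 + 1.0 * g + rng.normal(0.0, 1.0, size=200)   # simulated phenotypes

effect_a, ll_a = fit_two_group(y, g)                  # analysis under the assumed phase
effect_b, ll_b = fit_two_group(y, 1 - g)              # opposite phase: labels swapped

print(np.isclose(ll_a, ll_b), np.isclose(effect_a, -effect_b))
```

Because relabeling the classes leaves the fitted residuals unchanged, the two analyses cannot be ranked by their likelihoods, while the estimated effect changes sign. This is why phase information has to enter the model as an explicit parameter rather than be chosen by comparing separate analyses.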

The correct characterization of the linkage phase between the QTL and markers for a practical data set is not only important for parameter estimation and model selection, but also essential for the application of molecular results to genetic improvement programs. For a breeding program, we need to know the direction of genetic effects to make marker-assisted selection efficient. Suppose a dominant marker allele is in coupling phase with the positive allele of a QTL. Selection for the dominant marker allele can then lead to improved phenotypes due to the favorable QTL allele. Without this knowledge, however, selection that relies only on a significant marker-QTL association may use the marker allele to select the negative allele of the QTL.

The robustness and performance of our statistical method have been examined through extensive simulations. One of the most important findings is that improvements in the accuracy and precision of QTL parameter estimates can be more substantial with increased heritabilities than with increased sample sizes (Table 2). In practice, this conclusion has important implications for designing an optimal experiment for precise estimation of QTL parameters. To increase parameter precision, for example, special attention should be paid to silvicultural measures that increase site homogeneity, rather than to planting a huge sample on large-scale, nonuniform sites.
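A rough calculation indicates why heritability can matter more than sample size. The sketch below rests on simplifying assumptions that are not part of the model in this article: the genetic variance due to the QTL, written here as \sigma_g^2, is held fixed, QTL genotypes are treated as known, and the heritability values 0.1 and 0.4 are purely illustrative.

```latex
% Heritability and residual variance, with the QTL variance \sigma_g^2 fixed:
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma^2}
\quad\Longrightarrow\quad
\sigma^2 = \sigma_g^2 \, \frac{1 - h^2}{h^2}

% Sampling variance of an estimated genotypic effect with n progeny:
\operatorname{Var}(\hat{a}) \;\propto\; \frac{\sigma^2}{n}
  = \frac{\sigma_g^2}{n} \cdot \frac{1 - h^2}{h^2}

% Raising h^2 from 0.1 to 0.4 at fixed n:
\frac{(1 - 0.1)/0.1}{(1 - 0.4)/0.4} = \frac{9}{1.5} = 6
```

Under these assumptions, reducing environmental noise enough to raise the heritability from 0.1 to 0.4 shrinks the sampling variance as much as a sixfold increase in sample size would.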

The proposed model is based on simple interval mapping for a single QTL affecting a quantitative trait using a mixed set of marker types. Extending the theory to capture information from markers outside the interval under consideration is straightforward (see ZENG 1994), and simulations can be similarly conducted to test the analytical advantages and disadvantages of including more markers in our analysis model. A more important statistical aspect of QTL-mapping models is the simultaneous mapping of multiple linked QTL for a trait. Multiple-QTL analysis is closer to biological reality because many traits are actually polygenic (LYNCH and WALSH 1998) and, from an analytical perspective, can also increase the power to detect QTL of smaller effects. Several QTL may be located on the same linkage group or on different groups. In addition to modeling more gene actions and interactions between different QTL, multiple-QTL analyses involve more linkage phase combinations relative to the marker intervals on which the different QTL are located. It is possible that these more complex models can be solved using Markov chain Monte Carlo algorithms (ROBERT and CASELLA 1999; SILLANPAA and ARJAS 1999; XU and YI 2000). For broader application, we have proposed a unified framework for simultaneous maximum-likelihood estimation of linkage phases and QTL actions and interactions in our computer program.


*  ACKNOWLEDGMENTS

This work is partially supported by an Outstanding Young Investigators Award of the National Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense grant (7222061-12) to R.W., and National Science Foundation of China grant (30000097) to X.-Y.L. The publication of this manuscript is approved as journal series no. R-09202 by the Florida Agricultural Experiment Station.

Manuscript received July 13, 2002; Accepted for publication June 17, 2003.


*  APPENDIX A

We present the formulas for obtaining the MLEs of the unknown parameters Ω = (µ_uv, σ², φ_j)^T in the M step. For the distribution parameters within the mixture model, we have

For the phase probabilities, we have

where


*  APPENDIX B

The pattern and structure of an incidence matrix (Ik) relating the zygotic genotypes to phenotypes for marker Mk depend on the type of this marker (Table 1). When marker Mk is from marker types A, B1, B2, B3, C, D1, and D2, we have


*  LITERATURE CITED

ANDERSSON, L., C. S. HALEY, H. ELLEGREN, S. A. KNOTT, and M. JOHANSSON et al., 1994 Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.

ARUS, P., C. OLARTE, M. ROMERO, and F. VARGAS, 1994 Linkage analysis of 10 isozyme genes in F1 segregating almond progenies. J. Am. Soc. Hort. Sci. 119:339-344.

CHURCHILL, G. A. and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138:963-971.

DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39:1-38.

GRATTAPAGLIA, D. and R. SEDEROFF, 1994 Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics 137:1121-1137.

HALEY, C. S., S. A. KNOTT, and J. M. ELSEN, 1994 Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136:1195-1207.

HOESCHELE, I., P. UIMARI, F. E. GRIGNOLA, Q. ZHANG, and K. M. GAGE, 1997 Advances in statistical methods to map quantitative trait loci in outbred populations. Genetics 147:1445-1457.

JANSEN, R. C., 2000 Quantitative trait loci in inbred lines, pp. 567–597 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, New York.

JOHNSON, D. L., R. C. JANSEN, and J. A. M. VAN ARENDONK, 1999 Mapping quantitative trait loci in a selectively genotyped outbred population using a mixture model approach. Genet. Res. 73:75-83.

KAO, C.-H. and Z-B. ZENG, 1997 General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53:653-665.

KNOTT, S. A., D. B. NEALE, M. M. SEWELL, and C. S. HALEY, 1997 Multiple marker mapping of quantitative trait loci in an outbred pedigree of loblolly pine. Theor. Appl. Genet. 94:810-820.

LANDER, E. S. and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199.

LANDER, E. S. and N. J. SCHORK, 1994 Genetic dissection of complex traits. Science 265:2037-2048.

LANDER, E. S. and R. A. WEINBERG, 2000 Genomics: journey to the center of biology. Science 287:1777-1782.

LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.

MACKAY, T. F. C., 2001 The genetic architecture of quantitative traits. Annu. Rev. Genet. 35:303-339.

MALIEPAARD, C., J. JANSEN, and J. W. VAN OOIJEN, 1997 Linkage analysis in a full-sib family of an outbreeding plant species: overview and consequences for applications. Genet. Res. 70:237-250.

RIDOUT, M. S., S. TONG, C. J. VOWDEN, and K. R. TOBUTT, 1998 Three-point linkage analysis in crosses of allogamous plant species. Genet. Res. 72:111-121.

RITTER, E. and F. SALAMINI, 1996 The calculation of recombination frequencies in crosses of allogamous plant species with applications to linkage mapping. Genet. Res. 67:55-65.

RITTER, E., C. GEBHARDT, and F. SALAMINI, 1990 Estimation of recombination frequencies and construction of RFLP linkage maps in plants from crosses between heterozygous parents. Genetics 125:645-654.

ROBERT, C. P., and G. CASELLA, 1999 Monte Carlo Statistical Methods. Springer, New York.

SCHÄFER-PREGL, R., F. SALAMINI, and C. GEBHARDT, 1996 Models for mapping quantitative trait loci (QTL) in progeny of non-inbred parents and their behaviour in presence of distorted segregation ratios. Genet. Res. 67:43-54.

SILLANPAA, M. J. and E. ARJAS, 1999 Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151:1605-1619.

SONG, J. Z., M. SOLLER, and A. GENIZI, 1999 The full-sib intercross line (FSIL): a QTL mapping design for outcrossing species. Genet. Res. 73:61-73.

WELLER, J. I., 2001 Quantitative Trait Loci Analysis in Animals. CABI Publishing, New York.

WU, R., C.-X. MA, I. PAINTER, and Z-B. ZENG, 2002 Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theor. Popul. Biol. 61:349-363.

WU, R. L., Z-B. ZENG, S. M. MCKEAND, and D. M. O'MALLEY, 2000 The case for molecular mapping in forest tree breeding. Plant Breed. Rev. 19:41-68.

XU, S. Z., 1996 Mapping quantitative trait loci using four-way crosses. Genet. Res. 68:175-181.

XU, S. Z. and W. R. ATCHLEY, 1995 A random model approach to interval mapping of quantitative trait loci. Genetics 141:1189-1197.

XU, S. and N. YI, 2000 Mixed model analysis of quantitative trait loci. Proc. Natl. Acad. Sci. USA 97:14542-14547.

YIN, T. M., X. Y. ZHANG, M. R. HUANG, M. X. WANG, and Q. ZHUGE et al., 2002 The molecular linkage maps of the Populus genome. Genome 45:541-555.

ZENG, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468.



