Genetics, Vol. 158, 463-475, May 2001, Copyright © 2001

Detection of Closely Linked Multiple Quantitative Trait Loci Using a Genetic Algorithm

Reiichiro Nakamichia, Yasuo Ukaia, and Hirohisa Kishinoa
a Laboratory of Biometrics, Graduate School of Agricultural and Life Science, University of Tokyo, Tokyo 113-8657, Japan

Corresponding author: Reiichiro Nakamichi, Laboratory of Biometrics, Graduate School of Agricultural and Life Science, University of Tokyo, Yayoi 1-1-1, Bunkyo, Tokyo 113-8657, Japan., naka{at}peach.ab.a.u-tokyo.ac.jp (E-mail)

Communicating editor: Z-B. ZENG


*  ABSTRACT
*TOP
*ABSTRACT
*MODEL AND GENETIC ALGORITHM
*SIMULATION EXPERIMENTS
*DISCUSSION
*LITERATURE CITED

The existence of a quantitative trait locus (QTL) is usually tested using the likelihood of the quantitative trait on the basis of phenotypic character data plus the recombination fraction between QTL and flanking markers. When doing this, the likelihood is calculated for all possible locations on the linkage map. When multiple QTL are suspected close by, it is impractical to calculate the likelihood for all possible combinations of numbers and locations of QTL. Here, we propose a genetic algorithm (GA) for the heuristic solution of this problem. GA can globally search the optimum by improving the "genotype" with alterations called "recombination" and "mutation." The "genotype" of our GA is the number and location of QTL. The "fitness" is a function based on the likelihood plus Akaike's information criterion (AIC), which helps avoid false-positive QTL. A simulation study comparing the new method with existing QTL mapping packages shows the advantage of the new GA. The GA reliably distinguishes multiple QTL located in a single marker interval.


MANY traits of plants and animals are quantitative; that is, the phenotype of the trait is continuous and described by measurement values. In many cases, one quantitative trait is controlled by multiple genetic loci with relatively small effects each. Furthermore, as quantitative traits fluctuate under the influence of environmental effects, we cannot infer a genotype from a phenotype unambiguously. Therefore, genetic and environmental effects contributing to the target traits must be estimated using information from phenotypic values and molecular markers to locate quantitative trait loci (QTL) on linkage maps.

The simple interval mapping (SIM) approach of LANDER and BOTSTEIN 1989 Down examines the existence of a QTL at each location on the linkage map. The likelihood is calculated from (a) the conditional probability of the QTL genotype given the marker genotype and (b) the conditional distribution of the phenotypic value given the QTL genotype. The existence of the QTL is evaluated by the log-likelihood ratio (called the LOD score), comparing the proposed QTL with the "no QTL" hypothesis. The LOD score is calculated at all the locations on the linkage map, and, if a peak of the LOD score is higher than a given threshold, a QTL is mapped at the point. SIM uses the expectation maximization (EM) algorithm to estimate the maximum-likelihood (ML) point, and the calculation is fairly computationally expensive. HALEY and KNOTT 1992 Down described a regression method that can decrease the computation. This regression method examines the existence of a QTL using the information of flanking markers and phenotypic value, but estimates parameters using multiple regression rather than explicit ML. The regression method reduces the calculation task, but the approximation is not so good (KAO 2000 Down).

The SIM method is biased, when there are multiple QTL, because it assumes only one QTL at a time. To correct for this bias, ZENG 1993 Down, ZENG 1994 Down and JANSEN 1993 Down independently developed composite interval mapping (CIM). CIM is a combination of SIM and multiple regression. When testing a QTL using the method of interval mapping, CIM uses the genotypes of the markers (except for the flanking markers of the target QTL) as covariates to control genetic effects at other than the target QTL. However, if there are multiple QTL within a marker interval, neither CIM nor SIM can detect them separately.

It is possible to construct a genetic model that considers multiple QTL simultaneously. Because we do not know the number of QTL in advance, it is impractical to calculate the LOD scores for all possible combinations of number and locations of QTL. We require a practical method to simultaneously estimate the optimum number of QTL. Because the LOD score drops at the positions of markers and consequently shows many local optima, most numerical optimization procedures using information based on the local shape of the LOD score, such as the Newton-Raphson procedure and the simplex method, do not perform so well.

Bayesian analysis of QTL using Markov chain Monte Carlo (MCMC) has been reported recently (UIMARI and HOESCHELE 1997 Down; STEPHENS and FISCH 1998 Down; SILLANPAA and ARJAS 1998 Down; YI and XU 2000 Down). This method searches for the best combination of the QTL number and locations by using a prior distribution to update the parameters of the model step by step, which enables us to evaluate the reliability of the estimates in terms of a posterior distribution. However, the approach is presently computationally prohibitive, and thus it is often difficult to detect QTL correctly in a realistic time period.

KAO and ZENG 1997 Down and KAO et al. 1999 Down described a general model for handling multiple QTL simultaneously. This model includes not only additive effects of QTL but also epistatic effects between QTL. Their proposed multiple interval mapping (MIM) searches for the best model with a stepwise selection using likelihood-ratio statistics to add or to delete a QTL. Threshold values for QTL detection were calculated assuming a {chi}2 distribution of the ratio statistics plus a Bonferroni correction for multiple testing of marker intervals (LANDER and BOTSTEIN 1989 Down; ZENG et al. 1999 Down).

As an alternative with high potential, we propose and test a genetic algorithm (GA) for QTL mapping. We use a multiple-QTL model and Akaike's information criterion (AIC, AKAIKE 1974 Down) as the evaluation function. Because AIC measures predictive power for the distribution of quantitative traits, it is particularly useful for the application to breeding such as marker-assisted selection. Candidates for the QTL numbers and locations are encoded as "genotypes" of the GA. "Selection" on the genotype then produces an efficient global search after maintaining diversity in the GA population through "crossover" and "mutation" type operations. A combination of "drastic mutation" and "slight modification" enables efficient global search for QTL number and locations.

A major advantage of using AIC for QTL mapping comes when the prime interest is marker-assisted breeding. Here a major problem is not detecting closely linked QTL of opposite effect (a type II error), and so not checking the results of offspring where rearrangement has occurred. In general, it is very difficult to control type II error rates. Most of the proposed QTL mapping methods control type I error rates with no regard to type II error rates. However, by maximizing predictive power, AIC creates a balance of type I and type II errors, and in this sense is desirable in marker-assisted breeding QTL prediction.


*  MODEL AND GENETIC ALGORITHM
*TOP
*ABSTRACT
*MODEL AND GENETIC ALGORITHM
*SIMULATION EXPERIMENTS
*DISCUSSION
*LITERATURE CITED

Let us consider an F2 population from two completely homozygous parents P1 and P2. Let the number of the QTL be M. The phenotypic value Y of one F2 individual is expressed as

(1)

where µ is a constant value that has no relation to the QTL genotype, and e is the residual following a normal distribution with mean zero and constant variance. The term gm (m = 1, 2, ... M) is the genetic effect of the mth QTL. Let the allele from P1 be denoted by Qm and the allele from P2 be denoted by qm; then gm takes am, dm, and -am for the QTL genotypes QmQm, Qmqm, and qmqm, respectively. The additive effect, am, and the dominance effect, dm, may be either positive or negative. For simplicity, we assume there are no epistatic effects.

Given a set of the locations of QTL, we can evaluate the plausibility of the set using likelihood. Therefore, QTL mapping is a case of combinatorial optimization, which must search for the combination of locations on a linkage map, which maximizes the likelihood.

The GA is a numerical optimization procedure, based on an analogy to organic evolution. Candidates for the optimum solution are coded as the "genotypes" of virtual creatures (GA individuals), and candidates have a probability of survival described by the "fitness." First, a population of GA individuals with random genotypes is generated. Then, "mating" among the GA individuals, "mutation," and "selection" by the fitness are repeated to improve fitness until the fitness in many succeeding generations changes by less than a threshold value. At this time, the genotype of the best GA individual is adopted as the optimum solution. "Mating" and "mutation" increase the diversity of the GA population, while "selection" reduces it. For GA to reliably find a good global optimum, the factors in quotation marks need to be appropriately matched to the problem.

GA genotype:
In our GA, the genotype is the number and location of QTL. Fig 1 shows the GA genotype. M denotes the number of QTL. The term (Cx, Lx) denotes the location of the xth QTL, where Cx is the chromosome on which the xth QTL exists, and Lx is the location measured in centimorgans on the chromosome Cx.



View larger version (7K):
In this window
In a new window
Download PPT slide
 
Figure 1. GA genotype. M, number of QTL; Cj, chromosome on which jth QTL exists; Lj, location of the jth QTL on the chromosome Cj (j = 1, 2, ..., M).

Selection function and alternating generations:
The tournament method is adopted as the "selection" of parent GA individuals that breed offspring GA individuals. Fig 2 gives an example where some constant number Nt (called the tournament size) of the GA individuals is extracted (tournament 1) from the GA population of the current generation. Next, the GA individual with the highest fitness (here, the lowest AIC) among tournament 1 is adopted as a parent GA individual (parent 1). This parent selection is repeated again without replacement (so we get parent 2 from tournament 2), and the genotype of the offspring GA individual is created from the genotypes of the selected two parents through the "crossover" and mutation. The breeding procedures are repeated without replacement until the number of the GA individuals in the next generation becomes the same as that of the current generation (dotted line in Fig 2). This alternation of GA generations continues until the GA individual with the highest fitness appears in successive generations, indicating convergence (bold solid line in Fig 2).



View larger version (31K):
In this window
In a new window
Download PPT slide
 
Figure 2. Overview of GA. NGA, number of GA individuals; Nt, tournament size; Pd, probability of drastic mutations; Ni, number of identical GA individuals with newly created GA individual in the next generation; Ps, base probability of slight mutation (slight mutation occurs with probability Ps x Ni); Dshift, distance of slight shift.

GA crossover:
GA crossover creates an offspring GA individual as shown in Fig 3. There are two parent GA individuals selected through the tournament method. Parent 1 has two QTL located at (Ca, La) and (Ca, Lb). Parent 2 has four QTL located at (Ca, La), (Ca, Lc), (Cb, Ld), and (Cb, Lf). The number of QTL for the offspring is selected randomly from the range determined by the numbers of QTL of the parents. The upper limit of the range is the larger QTL number of the parents, and the lower limit is the smaller one. In the example in Fig 3, the upper limit is four (QTL number of parent 2), and the lower limit is two (QTL number of parent 1). Then, the offspring's QTL number is selected randomly from two, three, or four, and three is selected in this example. There are six QTL locations among the genotypes of the parents, but the location (Ca, La) is duplicated, so there are five available locations, namely, (Ca, La), (Ca, Lb), (Ca, Lc), (Cb, Ld), and (Cb, Lf). Three locations are selected randomly from these five locations. In this example, (Ca, Lb), (Ca, Lc), and (Cb, Lf) are selected.



View larger version (19K):
In this window
In a new window
Download PPT slide
 
Figure 3. An example of GA crossover. Parent 1 has two QTL located at (Ca, La) and (Ca, Lb). Parent 2 has four QTL located at (Ca, La), (Ca, Lc), (Cb, Ld), and (Cb, Lf). The QTL number of the offspring GA individual is selected randomly from the range defined by the QTL numbers of parents (here, two to four). The QTL locations of the offspring are selected randomly from these locations. In this example, three locations, (Ca, Lb), (Ca, Lc), and (Cb, Lf), are selected.

GA mutation of drastic change:
The offspring that the GA individual created through the crossover may be changed further by a random mutation event. GA mutation has three patterns (Fig 4), which mimic real genomic evolution: insertion, deletion, and substitution (relocation). When a mutation occurs, one of these is randomly selected and applied to the offspring GA individual. Insertion adds a random location to the list of QTL locations in the GA genotype, deletion randomly selects a QTL from the GA genotype and deletes it, and substitution randomly selects a QTL from the GA genotype and moves it to a random location on a random chromosome.



View larger version (22K):
In this window
In a new window
Download PPT slide
 
Figure 4. An example of three patterns of GA mutation. The offspring created in Fig 3 has adopted the mutation. Insertion adds one QTL on a random location. Deletion selects a QTL randomly and deletes it. Substitution selects a QTL randomly and changes to a random location.

GA mutation of slight modification:
A serious problem of GA is premature convergence. In the alternation of the GA generation, a GA individual with higher fitness breeds more offspring than the GA individuals with lower fitness. This procedure decreases genotypic diversity of the GA population. If the GA selection drives diversity too low, the GA easily falls into local optima. In our GA system, diversity means the number of the candidates of the QTL location contained in the GA population. Since the GA mutations mentioned above change GA genotypes drastically, most of them may be deleted from the population. To help maintain the diversity of the GA population, we introduce very slight mutation. This additional mutation randomly selects a QTL from a GA individual and moves it a little from its present location. When an offspring GA individual is created through crossover and drastic mutation, if GA individuals of identical genotype already exist in the newly created population, the slight shift mutation occurs with the probability proportional to the number of identical GA individuals.

GA-controlling parameters:
In our GA, five parameters control the performance: size of GA population, tournament size, probabilities of drastic and slight mutations, and shift distance for the slight mutation. GA population size affects the speed of the search and the diversity. Small population size takes less time than a large one, but a large population contains more candidates of QTL locations in the GA population, that is, it allows high diversity, while in a small population, diversity can quickly decrease. Tournament size affects the selection pressure. Large tournament size increases the chance to select a GA individual with high fitness at the expense of diversity. Mutation affects the rate at which a promising new QTL location enters into the GA population. A high mutation rate and large shift distance quickly generate many QTL locations not contained in the earlier GA populations, but can result in a very slow convergence. Despite the many studies that have attempted to balance these conflicting requirements (e.g., SCHRAUDOLPF and BELEW 1992 Down; MYERS and HANCOCK 1997 Down; STREIFEL et al. 1999 Down), there is no general criterion for setting up these parameters. Accordingly, GA parameters are typically determined empirically. We repeated many simulations, changing the GA controlling parameters, and then selected parameter values that showed good QTL detection. We settled on the number of GA individuals being 100, the tournament size was two, the rate of the drastic GA mutation was 0.05, and the rate of slight mutation was set to be 0.25 of the number of identical GA individuals to a newly created GA individual. The slight mutation moves the selected QTL 1 cM. To maximize the speed and certainty of convergence to the global optimal point, further study of GA-controlling parameters is desirable.

Fitness of GA:
Fitness is measured by the LOD score, which expresses the goodness of fit of the model to the data. However, the LOD score tends to increase with the number of QTL, simply because of an additional degree of freedom. To avoid the false-positive detection of QTL, we need a penalty on the number of QTL. Since GA repeats the evaluation of fitness so many times, we need a criterion that is quickly calculated. Further, in the actual application of QTL analysis to the field of breeding such as marker-assisted selection, it is often of most concern to predict the distribution of quantitative traits. Accordingly, we adopt AIC as the fitness criterion. AIC is very simple and was developed to select the model with the highest expected predictive power (see AKAIKE 1974 Down; SAKAMOTO et al. 1983 Down). The fitting of a model with a parameter vector {theta} to the data X is expressed as the maximum log-likelihood

where is the vector of maximum-likelihood estimates (MLE) of {theta}. Assume that the data set Y is independent of X and follows the same distribution as X. The predictive power of the model is measured by logeL( (X) | Y). On average, this is worse than the maximum log-likelihood because it is not tailored to the data at hand. The difference in likelihood may be examined with the Taylor expansion:

Under regularity condition, (Y) - (X) asymptotically follows a normal distribution of mean zero and variance equal to -2 · (E[{nabla}{theta}{nabla}'{theta} logeL({theta} | X)])-1.

Hence, Z = ((Y) - (X))'({nabla}{theta}{nabla}'{theta} logeL((X) | X)) ((Y) - (X)) follows {chi}2 distribution with degree of freedom being the dimension of {theta}. Taking the expectation, we get

By taking account of this difference, AIC is defined as

The model with lowest AIC has maximized predictive power. The parameters of our model (Equation 1) are the location, additive and dominance effects of each QTL, the constant value, and the residual variance. So, for M QTL,

(2)

Here, max logeL is the log-likelihood maximized over the additive and dominance effects of the QTL (am, dm, m = 1, 2, ... M), the constant (µ), and the residual variance ({sigma}2), given the locations of the QTL. The penalty of one log-likelihood unit for one additional parameter is valid under the regularity condition that guarantees the central limit theorem. When the sample size is not large enough for a good approximation to the multinormal asymptotic distribution of the parameter estimates, further elaborations become necessary (LANDER and BOTSTEIN 1989 Down; CHURCHILL and DOERGE 1994 Down; DOERGE and CHURCHILL 1996 Down; ZENG et al. 1999 Down).

AIC is more liberal than likelihood-ratio hypothesis testing procedures, because it may allow the model to incorporate "nonsignificant" QTL, as long as they contribute to increasing the predictive power. Like the hypothesis-testing procedure, the procedure based on AIC has some probability of making type I errors. However, the predictive power and the accuracy of the estimated genetic effects are maximized, and, in this sense, an optimal balance of type I and type II errors is achieved. In practical situations, such as marker-assisted selection, it may be advantageous to examine the best AIC model and exclude QTL with negligible genetic effects.

Suppose we have a sample of size N from the F2 population. Each of the M QTL has three possible genotypes, QmQm, Qmqm, and qmqm, indexed as km = 1, 2, and 3 (m = 1, 2, ... M). The genetic effect gm,km of the mth QTL is am, dm, and -am for km = 1, 2, and 3. The log-likelihood function is

(3)

where Pj,k1,...kM is the conditional probability of the QTL's composite genotype given the information of the markers' genotypes and

(4)

When each interval between adjacent markers has at most one QTL, Pj,k1,...kM is the product of the conditional probabilities of the QTL genotypes:

(5)

P*j,km depends on the genotypes of the two markers flanking the QTL and on the recombination fraction between them (Table 1; HAYASHI and UKAI 1994 Down; KAO and ZENG 1997 Down). Table 1 describes the probabilities for one QTL between two markers. But, there can be multiple QTL in a marker interval particularly when dense marker linkage maps are not available. In such a case, QTL other than the target QTL are regarded as markers and used as flanking markers in Table 1. Details of this problem are mentioned in the DISCUSSION.


 
View this table:
In this window
In a new window

 
Table 1. Conditional probability of the QTL genotype, given flanking marker genotypes

KAO and ZENG 1997 Down and KAO et al. 1999 Down described an EM algorithm to estimate the genetic effects of multiple QTL. With the explicit formula of conditional probabilities of genotypes of up to two QTL in a marker interval, their procedure can detect multiple QTL successfully. Instead of extending their formula to a general case, we adopt the following two-step moment method for simplicity. First, we obtain the crude estimates of the genetic effects ãm, m, the constant m, and the residual variance 2m for each QTL, assuming a single QTL. These estimates are obtained by the EM algorithm. This repeats the E step of calculating the expected log-likelihood; hence

(6)

where

and the M step of maximization of logeL* is repeated until the parameters converge. These crude estimates of genetic parameters are biased because of the effects of other QTL. Thus, we next apply a moment method to these crude estimates to remove the effects of other QTL. The marginal expectations for these genotypes are obtained by

(7)


(8)


(9)

where rmh is the recombination fraction between the mth and hth QTL. Then, we obtained the final estimates and âmm(m = 1, 2, ... M) by solving the linear equations

(10)


(11)


(12)

Likewise, the estimates of the residual variance 2 are obtained by

(13)

Since the estimates of µ and {sigma}2 have different values for each m in this procedure, we adopt the average as and 2.


*  SIMULATION EXPERIMENTS
*TOP
*ABSTRACT
*MODEL AND GENETIC ALGORITHM
*SIMULATION EXPERIMENTS
*DISCUSSION
*LITERATURE CITED

We examined the performance of our GA through a set of numerical experiments using simulated F2 populations. We used Haldane's model to generate the marker genotype data assuming no interference among markers. The variance of the residual was set at 1.0, and the constant value was set at 0. That is, phenotypic value data were generated following a normal distribution with variance 1.0 and mean determined by the summation of the genetic effects of QTL. The sample size of the F2 population was set at 500, each F2 individual had one or three pairs of chromosomes of 100-cM length each, and 11 markers were located every 10 cM on each chromosome.

In these experiments, the number of GA individuals was 100, and the tournament size was two. The rate of the ordinary GA mutation was 0.05, and the rate of the additional one was set to be 0.25 of the number of GA individuals with the same GA genotype as the one newly created. These GA-controlling parameters were exactly those found to work well empirically.

Behavior of the GA:
In this experiment, three QTL were located on the same chromosome at positions of 17, 43, and 85 cM from the end of a chromosome. Each QTL had an additive effect of 1.0 and a dominance effect of 0 (broad-sense heritability is 0.6). Fig 5 and Table 2 show that the GA converged by the 20th GA generation. The estimates of three QTL had locations of 16, 84, and 43 cM, and additive effects of 1.424, 1.358, and 1.084, all close to the true values. The dominance effects were correctly estimated to be negligible. The residual variance was also reliably estimated (0.977 vs. 1.0). Repeating the same experiment 100 times (Fig 6), we detected three true QTL in 96 out of 100 simulations, and four QTL, including one false positive, in the remaining cases.



View larger version (9K):
In this window
In a new window
Download PPT slide
 
Figure 5. Improvement of the fitness of the best GA individual with GA generations.



View larger version (10K):
In this window
In a new window
Download PPT slide
 
Figure 6. Results of 100 trials of QTL analysis using GA.


 
View this table:
In this window
In a new window

 
Table 2. Improvement of QTL estimation with more GA generations

Effects of QTL:
Next, we altered the pattern of QTL, so seven QTL were located on three chromosomes at 17, 43, and 72 cM along the first chromosome, 24 and 85 cM along the second chromosome, and 62 and 91 cM along the third chromosome. Each QTL had the same genetic effect. Simultaneously, additive effects were varied from 0.2 to 1.0 in steps of 0.2, and the dominance effects were fixed at 0.

The experiments were repeated 100 times. Table 3 shows the mean and the standard deviation of the estimates when a QTL was detected at the relevant region. It also shows the empirical probabilities of the detection. For example, when the additive effect of each QTL was 0.2, the first QTL was detected 71 times out of 100 trials, and the second, 57 times. With increasing genetic effects, it becomes easier to correctly detect QTL. When the additive effects are 1.0, the average probability of detection is 0.964, but when the additive effects are 0.2, the average probability is 0.624. False-positive QTL, where QTL have negligible genetic effects or are located outside the true relevant regions, occurred with an average frequency of 0.32 with small effects, dropping to 0.00 with genetic effects of 1.0. This question of whether anything can be done about AIC being too liberal at lower genetic effects requires further study. The trade-off will likely be decreasing type I errors but more rapidly rising type II errors (i.e., failure to include the true QTL).


 
View this table:
In this window
In a new window

 
Table 3. Ability of GA to locate seven QTL (100 replicates for each combination of additive and dominance effect)

Closely linked QTL:
We examined the performance of our GA in the situations of closely linked QTL (Table 4). In this situation, two QTL were located on the same chromosome at 43 and 47 cM. As the markers are located every 10 cM, no marker exists between the two QTL. These two QTL had reverse genetic effects on each other. The additive and dominance effects of the QTL at 43 cM were changed from 0.4 and 0.2 to 2.0 and 1.0, and those of the QTL at 47 cM were changed from -0.4 and -0.2 to -2.0 and -1.0.


 
View this table:
In this window
In a new window

 
Table 4. Ability of GA to locate two very close QTL with opposite effects (100 replicates for each combination of additive and dominant effect)

The simulations were repeated 100 times. Table 4 shows the results. For example, when the additive and dominance effects of the QTL are 1.6 and 0.8 for the QTL at 43 cM and -1.6 and -0.8 for the QTL at 47 cM, two QTL with reverse genetic effects were detected 87 times out of 100 trials in the marker interval 40–50 cM, one QTL was detected 10 times, and no QTL was detected 3 times. No cases of three QTL being detected were observed.

Here, the estimates of the genetic effects are smaller than their true values. This bias of the estimated genetic effects comes from the high correlation between the two QTL, even conditional on the marker information. However, unlike multicolinearity in regression analysis, the estimates are conservative rather than liberal, preventing practitioners from being too optimistic.

Comparison with other methods:
We compared the precision of our GA method with those of SIM, CIM, and Bayesian estimation. Software packages used in these experiments were QTL Cartographer (BASTEN et al. 1998 Down) for simple and composite interval mapping and Multimapper (SILLANPAA 1998 Down) for Bayesian estimation. For SIM and CIM, the critical value of the LOD score for claiming QTL detection with a 5% probability of a type I error was based on 1000 replicates of the simulation under the null hypothesis of no QTL. Three separate experiments were repeated 100 times, and the results are summarized in Table 5 and Table 6.


 
View this table:
In this window
In a new window

 
Table 5. Comparison of four methods inferring three QTL (100 replicates)


 
View this table:
In this window
In a new window

 
Table 6. Comparison of four methods in the case of two QTL with reverse genetic effects in a marker interval (100 replicates)

Conditions for generating the data were as above. In the first experiment (Table 5), three QTL were located on the same chromosome at 17, 43, and 85 cM from the end of a chromosome. Each QTL had an additive effect of 1.0 and a dominance effect of 0 (broad-sense heritability is 0.6). CIM and GA always detected the three QTL correctly, but SIM could not detect the three QTL separately. Bayesian analysis sometimes detected three QTL, but in other cases it detected only two. The probability of success for the Bayesian analysis was 81/100 for the QTL at 17 cM, 73/100 for the QTL at 43 cM, and 95/100 for the QTL at 85 cM.

In the second experiment (Table 6), two QTL were located on the same chromosome at 43 and 47 cM. As the markers are located every 10 cM, no marker exists between the two QTL. The QTL at 43 cM had additive and dominance effects of 2.0 and 1.0, respectively, whereas the QTL at 47 cM had the opposite effects, -2.0 and -1.0 (broad-sense heritability is ~0.82). SIM, CIM, and the Bayesian method all detected only one QTL with very little genetic effect because of the two opposite QTL effects. However, our GA detected the two QTL in 98 cases out of 100 trials.


*  DISCUSSION
*TOP
*ABSTRACT
*MODEL AND GENETIC ALGORITHM
*SIMULATION EXPERIMENTS
*DISCUSSION
*LITERATURE CITED

Numerical simulations clearly showed the high level of detection of our GA. Composite interval mapping is well known as an improved interval mapping. However, if there are two QTL in the same marker interval, even composite interval mapping cannot detect QTL separately, as in the case of simple interval mapping. Bayesian analysis can detect QTL separately in such a case, but will not work well if there are many QTL or if the amount of information decreases due to QTL having opposite net effects. Moreover, Bayesian analysis is time intensive, because it uses the MCMC method. We performed our experiments on a DEC Alpha 21164A 500MHz workstation, and for the three-QTL experiment (Table 5), Bayesian analysis took 150–200 min, while our GA took only 1–3 min.

Since the GA repeats the evaluation of the fitness so many times, a criterion that is easily calculated is required. And as mentioned above, it is often of primary importance to predict the distribution of quantitative traits in the field of breeding such as marker-assisted selection. AIC was developed to select the model with the highest predictive power (AKAIKE 1974 Down; SAKAMOTO et al. 1983 Down) and is very simple. The GA detects QTL with higher probability than the other methods. That is, it shows a significant improvement in type II error rates, while type I errors are not overly apparent. For the purpose of marker-assisted breeding, this is a major advantage. However, it does not necessarily mean that our GA has better performance than all the others, if type I error rate is the sole focus of attention (and in fact it should not be, since these methods explicitly aim to control type I errors). AIC is a relatively liberal criterion, while software packages of interval mapping and Bayesian analysis use even more conservative criteria. Accordingly, our GA picked up more false-positive QTL than the other methods. The question of how to trade off type I (false detections) and type II (failures to detect) is an important topic that is yet to be studied.

In this study, we adopted a moment method to estimate the genetic effects. On the other hand, KAO and ZENG 1997 Down and KAO et al. 1999 Down proposed the MIM using a general EM algorithm for multiple QTL. MIM directly implements simultaneous estimation of genetic effects. Since the moment method is less statistically efficient than ML, MIM may give more precise MLEs than our method. On the other hand, a merit of our moment method is simplicity. Our method is easier to implement than MIM. To check the difference between true MLEs and the results of our method, we applied the Newton-Raphson procedure to the results of our GA and searched for genetic parameters that maximize the fitness function. The differences found were negligible, but we should leave this as an open question for more extensive comparisons.

Dense marker linkage maps presently exist only for some major plants and animals. Worse, genes controlling the same trait are often clustered (MEYERS et al. 1998 Down; SIMONS et al. 1998 Down). Even though rice is fairly densely sampled, two important genes in one interval were regarded as a single gene using standard QTL mapping (WANG et al. 1999 Down). This suggests that multiple QTL may well exist within a given marker interval. If there are multiple QTL in one marker interval, the conditional probabilities of QTL genotypes given marker genotypes are not described by Table 1 and Equation 5. For example, let two QTL (QTL1 and QTL2) exist between two markers (marker A and marker B) in the following order: marker A, QTL1, QTL2, and marker B. The conditional probability of QTL genotypes given marker genotypes is

(14)

where Q1, Q2, A, and B are the genotypes of QTL1, QTL2, marker A, and marker B. Equation 14 becomes more complex if there are more, i.e., an unknown number of QTL in a marker interval. JIANG and ZENG 1995 Down, JIANG and ZENG 1997 Down derived an explicit formula of the joint conditional probabilities calculating the left side of Equation 14 directly. Their method is easy to implement when the number of QTL is few, but the task increases exponentially with more QTL. Our method is free from this complexity for the estimation of the genetic effects because the moment method uses only recombination fractions between QTL. On the other hand, for the calculation of maximum likelihood, our method also needs to use the joint probabilities mentioned above. We compared the results considering and not considering Equation 14 in the cases of closely linked QTL mentioned above (Table 4); however, the differences were negligible. Since one of our aims is to handle multiple QTL in large data sets with practical calculation time, the approximate likelihood estimated by not considering the joint probability is used.

Evaluation of the accuracy of the estimation is an important problem. For instance, in the Bayesian model, the accuracy is easy to evaluate, because the result of the estimation is described by a posterior distribution. With our GA, however, only one GA individual with the best fitness is adopted as the optimum solution. A possible solution is to calculate Fisher's information matrix (FIM; also called the Hessian matrix), which can be used as a rough evaluation of the precision. The variance of the estimates is approximated by the inverse of the FIM. We tried to evaluate the accuracy of our GA with FIM in the experiment shown in Fig 5 and Table 2. Variances of the estimates of QTL locations were ~0.5 (1 SD = 0.7 cM), and they are well in accordance with the differences between correct locations and estimated locations. However, estimated variances of genetic parameters are too small (~10-4–10-5), so this measure of accuracy may be too optimistic. Further, our aims are to develop a method to evaluate the accuracy of the estimation using information from lower-ranked GA individuals.

The GA method allows the user to construct models for complicated problems and gain the optimum solution within a practical time. One major goal is to extend our GA to many complex cases, such as epistasis and outbred lines, and to develop a widely applicable QTL mapping method.


*  ACKNOWLEDGMENTS

We thank the communicating editor Dr. Zhao-Bang Zeng and two anonymous reviewers for their many helpful comments. Dr. Hiroyuki Hirano and Dr. Hiroyoshi Iwata provided useful information in running the simulations. We thank Stephane Aris-Brosou, Douglas M. Robinson, and Peter J. Waddell for their assistance in writing this article. This work is supported by grants 09788, 12554037, and BSAR-497 from the Japan Society for the Promotion of Science. Software of our GA use, written in the C++ language, is available from Reiichiro Nakamichi (http://peach.ab.a.u-tokyo.ac.jp/~naka).

Manuscript received August 17, 2000; Accepted for publication February 5, 2001.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MODEL AND GENETIC ALGORITHM
*SIMULATION EXPERIMENTS
*DISCUSSION
*LITERATURE CITED

AKAIKE, H., 1974 A new look at the statistical model identification. IEEE Trans. Automat. Control AC-19: 716–723.

BASTEN, C. J., B. S. WEIR and Z-B. ZENG, 1998 QTL Cartographer (available at http://statgen.ncsu.edu/qtlcart/cartographer.html).

CHURCHILL, G. A. and R. W. DOERGE, 1994  Empirical threshold values for quantitative trait mapping. Genetics 138:967-971.

DOERGE, R. W. and G. A. CHURCHILL, 1996  Permutation tests for multiple loci affecting a quantitative character. Genetics 142:285-294[Abstract].

HALEY, C. S. and S. A. KNOTT, 1992  A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324[Medline].

HAYASHI, T. and Y. UKAI, 1994  Detection of additive and dominance effects of QTL in interval mapping of F2 RFLP data. Theor. Appl. Genet. 87:1021-1027.

JANSEN, R. C., 1993  Interval mapping of multiple quantitative trait loci. Genetics 135:205-211[Abstract].

JIANG, C. and Z-B. ZENG, 1995  Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140:1111-1127[Abstract].

JIANG, C. and Z-B. ZENG, 1997  Mapping quantitative trait loci with dominant and missing markers from crosses from two inbred lines. Genetica 101:47-58[Medline].

KAO, C. H., 2000  On the differences between maximum likelihood and regression interval mapping in the analysis of quantitative trait loci. Genetics 156:855-865[Abstract/Free Full Text].

KAO, C. H. and Z-B. ZENG, 1997  General formulas for obtaining the MLEs and the asymptotic variance-covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53:653-665[Medline].

KAO, C. H., Z-B. ZENG, and R. D. TEASDALE, 1999  Multiple interval mapping for quantitative trait loci. Genetics 152:1203-1216[Abstract/Free Full Text].

LANDER, E. S. and D. BOTSTEIN, 1989  Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199[Abstract/Free Full Text].

MEYERS, B. C., D. B. CHIN, K. A. SHEN, S. SIVARAMAKRISHNAN, and D. O. LAVELLE et al., 1998  The major resistance gene cluster in lettuce is highly duplicated and spans several megabases. Plant Cell 10:1817-1832[Abstract/Free Full Text].

MYERS, R. and E. R. HANCOCK, 1997  Genetic algorithm parameter sets for line labelling. Patt. Recogn. Lett. 18:1363-1371.

SAKAMOTO, Y., M. ISHIGURO and G. KITAGAWA, 1983 Akaike information criterion statistics. Kluwer Academic, Dordrecht/Norwell, MA.

SCHRAUDOLPF, N. N. and R. K. BELEW, 1992  Dynamic parameter encoding for genetic algorithms. Mach. Learning 9:9-21.

SILLANPÄÄ, M. J., 1998 Multimapper (available at http://www.rni.helsinki.fi/~mjs/).

SILLANPÄÄ, M. J. and E. ARJAS, 1998  Bayesian mapping of multiple quantitative trait locus from incomplete inbred line cross data. Genetics 148:1373-1388[Abstract/Free Full Text].

SIMONS, G., J. GROENENDIJK, J. WIJBRANDI, M. REIJANS, and J. GROENEN et al., 1998  Dissection of the Fusarium I2 gene cluster in tomato reveals six homologs and one active gene copy. Plant Cell 10:1055-1068[Abstract/Free Full Text].

STEPHENS, D. A. and R. D. FISCH, 1998  Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54:1334-1347.

STREIFEL, R. J., R. J. MARKS, R. REED, J. J. CHOI, and M. HEARLY, 1999  Dynamic fuzzy control of genetic algorithm parameter coding. IEEE Trans. Syst. Man Cybern. B: Cybernet. 29:426-433.

UIMARI, P. and I. HOESCHELE, 1997  Mapping linked quantitative trait loci using Bayesian analysis and Markov chain Monte Carlo algorithms. Genetics 146:735-743[Abstract].

WANG, Z.-X., M. YANO, U. YAMANOUCHI, M. IWAMOTO, and L. MONNA et al., 1999  The Pib gene for rice blast resistance belongs to the nucleotide binding and leucine-rich repeat class of plant disease resistance genes. Plant J. 19:55-64[Medline].

YI, N. and S. XU, 2000  Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155:1391-1403[Abstract/Free Full Text].

ZENG, Z-B., 1993  Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90:10972-10976[Abstract/Free Full Text].

ZENG, Z-B., 1994  Precision mapping of quantitative trait loci. Genetics 136:1457-1468[Abstract].

ZENG, Z-B., C-H. KAO, and C. J. BASTEN, 1999  Estimating the genetic architecture of quantitative traits. Genet. Res. 74:279-289[Medline].




This article has been cited by other articles:


Home page
GeneticsHome page
C.-H. Kao
Mapping Quantitative Trait Loci Using the Experimental Designs of Recombinant Inbred Populations
Genetics, November 1, 2006; 174(3): 1373 - 1386.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
S. H. Lee and J. H. J. Van der Werf
Simultaneous Fine Mapping of Multiple Closely Linked Quantitative Trait Loci Using Combined Linkage Disequilibrium and Linkage With a General Pedigree
Genetics, August 1, 2006; 173(4): 2329 - 2337.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
A. Baierl, M. Bogdan, F. Frommlet, and A. Futschik
On Locating Multiple Interacting Quantitative Trait Loci in Intercross Designs
Genetics, July 1, 2006; 173(3): 1693 - 1703.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. Zhang, K. L. Montooth, M. T. Wells, A. G. Clark, and D. Zhang
Mapping Multiple Quantitative Trait Loci by Bayesian Classification
Genetics, April 1, 2005; 169(4): 2305 - 2318.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
W. Zhao, R. Wu, C.-X. Ma, and G. Casella
A Fast Algorithm for Functional Mapping of Complex Traits
Genetics, August 1, 2004; 167(4): 2133 - 2137.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. Bogdan, J. K. Ghosh, and R. W. Doerge
Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci
Genetics, June 1, 2004; 167(2): 989 - 999.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
C. R. Moreno, F. Lantier, I. Lantier, P. Sarradin, and J.-M. Elsen
Detection of New Quantitative Trait Loci for Susceptibility to Transmissible Spongiform Encephalopathies in Mice
Genetics, December 1, 2003; 165(4): 2085 - 2091.
[Abstract] [Full Text] [PDF]