## Abstract

We propose a birth–death–merge data-driven reversible jump (DDRJ) for multiple-QTL mapping where the phenotypic trait is modeled as a linear function of the additive and dominance effects of the unknown QTL genotypes. We compare the performance of the proposed methodology, usual reversible jump (RJ) and multiple-interval mapping (MIM), using simulated and real data sets. Compared with RJ, DDRJ shows a better performance to estimate the number of QTLs and their locations on the genome mainly when the QTLs effect is moderate, basically as a result of better mixing for transdimensional moves. The inclusion of a merge step of consecutive QTLs in DDRJ is efficient, under tested conditions, to avoid the split of true QTL’s effects between false QTLs and, consequently, selection of the wrong model. DDRJ is also more precise to estimate the QTLs location than MIM in which the number of QTLs need to be specified in advance. As DDRJ is more efficient to identify and characterize QTLs with smaller effect, this method also appears to be useful and brings contributions to identifying single-nucleotide polymorphisms (SNPs) that usually have a small effect on phenotype.

GENETICISTS and molecular biologists have aimed at locating regions associated with quantitative traits in a chromosome. These chromosomal regions are known as quantitative trait *loci* (QTL) and their location and effects on the phenotypic traits are estimated by genetic markers. The most popular genetic markers are simple sequence repeats (SSR) and single-nucleotide polymorphism (SNP); their location is specified by the linkage map and their genotype is known.

A phenotype is usually modeled as a linear function of the additive and dominance effects of the QTL genotypes and several methods have been developed for the localization and characterization of QTLs. The standard estimation method in experimental crosses is the interval mapping (IM) presented by Lander and Botstein (1989) and Haley and Knott (1992). Lander and Botstein (1989) propose using the EM algorithm (Dempster *et al.* 1977), assuming a single putative QTL at each location on the genome and comparing the hypothesis of a single QTL to the null hypothesis of no segregation QTLs by the logarithm of the odds ratio (LOD score). However, the estimate of the QTL effects can be influenced by the effect of other possible QTLs in adjacent regions since this effect is not controlled in the model and nonexisting or ghost QTLs can be identified. A ghost QTL appears when two or more QTLs are linked in coupling (meaning that their effects have the same sign) and interval mapping gives a maximum LOD score at a location between the two QTLs (Broman and Speed 1999).

Jansen (1993), Jansen and Stam (1994), and Zeng (1994) propose composite-interval mapping (CIM) to control the effect of QTLs located in adjacent regions and avoid the identification of ghost QTLs. They propose to include in the single putative QTL regression model a subset of markers as cofactors. Kao *et al.* (1999) propose multiple-interval mapping (MIM), which considers the effect of all possible QTLs and the epistatic effect between them in a single model. This model, with a fixed number of QTLs, is estimated by the EM algorithm and the number of QTLs is selected by model selection methods such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), among others.

Bayesian methods for QTL mapping are interesting tools since they allow us to select and estimate the model jointly. Earlier Bayesian approaches were proposed by Stephens and Smith (1993) and Satagopan *et al.* (1996). The authors estimate the locations and effect of a prespecified number of QTLs. In practice, however, the number of QTLs is unknown and must be estimated. Satagopan and Yandell (1996) and Stephens and Fisch (1998) propose variants of reversible-jump (RJ) Markov chain Monte Carlo to estimate it and the remaining parameters of the model jointly. An important characteristic in the chain generated in MCMC is that it mixes well; *i.e.*, it moves around the parameter space rather easily and quickly finds its stationary distribution. Forming a good Markov chain and monitoring its behavior is delicate and sophisticated work (Broman and Speed 1999).

Over the past decade, different ways to generate proposal parameters in MCMC have been suggested to facilitate the moves between models and accelerate the convergence of the original RJ algorithm. Green and Mira (2001) propose an algorithm that, on rejection, makes a second attempt to move. Regarding the inclusion of a new QTL, Yi and Xu (2002) suggest generating its effects (additive and dominance) from the conditional *a posteriori* distribution. Yi *et al.* (2005) propose updating the location of a specific QTL and its genotypes together. As a QTL’s location and genotype are correlated, the acceptance probability of a new QTL’s location is higher if its genotype is updated jointly.

To accelerate the search procedure of the correct number of QTLs, *K*, more suitable and efficient dimensional change candidates must be generated. For this purpose, we propose a birth–death–merge data-driven reversible jump (DDRJ) for multiple-QTL mapping. It simulates a more likely location for a new QTL using the available data, chooses a QTL to be excluded according to its importance in the current model, or merges the effects of two consecutive QTLs if their genotypes are correlated. Consequently, candidates are more likely to be accepted and the space of possible models is more easily explored. Jain and Neal (2004, 2007) and Saraiva and Milan (2012) show that data-driven methods are effective in simplifying the methodology and improving the chain mixing.

The merge movement of consecutive QTLs is efficient under tested conditions to avoid identification of false QTLs. Usually, as both QTLs have similar estimated genotypes, the effects of the true QTL are split between the two QTLs and bias the estimate of the number of QTLs and their effects. Split QTLs can be seen as the opposite problem to that of ghost QTLs.

The proposed method has also the advantage of providing interval estimates that can be used to analyze the uncertainty of estimates. The usual methods generally provide only point estimates or asymptotic confidence intervals for big samples.

This article is organized as follows: *Model for Quantitative Traits* presents such a model and discusses the likelihood function; *Bayesian Approach* addresses the Bayesian approach for the model, including the DDRJ procedure to estimate the number of QTLs; *Applications* analyzes the performance of DDRJ and compares it with RJ and MIM performance in simulated and real data sets. Finally, *Discussion* provides a discussion of the methods.

## Model for Quantitative Traits

Let be a quantitative trait of *n* individuals from an F_{2} population. Assume this phenotype has been affected by *K* QTLs located at positions , for , between *m* different genotyped markers with a known linkage map.

Phenotype for the *i*th individual can be modeled by(1)where *μ* is the average of expected values of genotypes and is the additive effect of the *k*th QTL; is the dominance effect of the *k*th QTL; represents the genotype of the *k*th QTL of the *i*th individual coded as or 1 for , , or , respectively, and Normal is the random error; and and are supposed to be independent for .

The phenotype can also be affected by environmental covariates and interactions among QTLs or between covariates and QTLs. The model defined by Equation 1 does not consider these effects, but extensions (modeling environmental covariates as fixed effects, for example) are straightforward.

The data set consists of the observations regarding the quantitative trait of *n* individuals; , the markers’ genotype are coded as or 1 for , or , respectively; and , the distances (in centimorgans) between each marker and the first marker, where

We assume there is at most one QTL between two consecutive markers, therefore , and the QTL’s genotype is explained only by flanking markers; *i.e.*, and are independent for , where is the genotype of the marker to the right of the *k*th QTL for the *i*th individual and is the genotype of the marker to the left of the *k*th QTL for the *i*th individual.

The joint probability distribution of **Y** and **Q**, where is the matrix of the *K* QTLs genotypes for the *n* individuals, is(2)where , for , for and *f* is the conditional normal density for .

In practice, the number of QTLs *K* is unknown and the parameters of the model are . The likelihood function of **θ** given and is(3)where is the residual of the *i*th observation and is the conditional probability of the QTL genotype given the flanking markers genotypes as defined by Stephens and Fisch (1998). Such a probability is a function of recombination fractions between the *k*th QTL and its flanking markers calculated by the Haldane distance function. Note that , and , is nonobservable and must be estimated.

Without losing the generality and for simplicity, consider the models with one and two QTLs defined, respectively, asandfor . Observe if for all or almost all individuals, and , the models and are equally or almost equally likely and it can be hard to select the correct model in this situation. The genotype of two loci has a high probability of being equal when they are close on the same chromosome and the model is wrongly estimated if the effect of two or more true close QTLs are merged in only one QTL or if the effect of one true QTL is split with one or more false close QTLs. We note in our simulated data sets, some of them shown in *Applications*, using multiple QTLs methods to estimate the model, that often methods split the effect of one true QTL with one or more false QTLs. Conventional methodologies for QTL mapping often do not deal well with this problem.

## Bayesian Approach

The usual Bayesian methodology for models with unknown *K* is the RJ proposed by Green (1995). This method consists of running Metropolis–Hastings steps that either accept or reject different moves, like “birth” or “death” of a QTL. These steps enable transitions from the current model to models of higher or lower dimensions.

Parameters , , , *μ*, , and elements of **α** and **δ** are supposed to be independent and the joint *a priori* density for **θ** is written as(4)Particularly, we consider

Uniform.

Normal, , where and are known hyperparameters.

Normal, , where and are known hyperparameters.

Normal, where and are known hyperparameters.

Inverse-gamma, where and are known hyperparameters.

. If there is no

*a priori*information about the QTL’s location, each location is assumed uniformly distributed over the possible loci.

Combining the likelihood function in Equation 3 with the *a priori* distributions, we obtain the conditional *a posteriori* distributions of , , , , and , , provided in Supporting Information, File S1.

The nonobservable genotype , and , is simulated and updated by its conditional *a posteriori* distribution given by(5)for and where is the Normal density function.

From Equation 5, Multinomial, whereParameters *μ*, , , and and nonobservable variables , and , are updated by Gibbs sampling steps and is updated jointly with by Metropolis–Hastings steps, in which is sampled from a Uniform distribution and the block is accepted according to probability , where(6) is the residual of the *i*th individual, , and is calculated using and .

### DDRJ

The movements that change *K* are called birth , death , or merge moves when a new QTL is included in the model or, conversely, one QTL is excluded from the current model or the effects of two QTLs are summed into a single QTL. The birth, death, and merge moves are implemented by Metropolis–Hastings steps and either increase or reduce the number of QTLs by one at each step.

Consider the current state of the MCMC procedure with *K* QTLs and the proposed movement, where ′ means a birth , a death , or a merge of QTLs. Therefore, if a birth movement is proposed or if a death or a merge movement is proposed. This move is accepted according to Metropolis–Hastings probability , where(7)and is the transition function, described below.

At each step, we choose a movement to increase or reduce the number of QTLs as follows:

If , a birth or a death is randomly chosen, according to its probability. Here, we assume and .

If , a birth is chosen;

*i.e.*,If , a death is chosen;

*i.e.*,

#### Birth proposal:

When a birth movement is chosen, a location is selected for the new QTL in a marker interval that has no QTL and its genotype and effect parameters must be defined. The selection of a location through a Uniform distribution can be inefficient, mainly if we have a large number of marker intervals.

If there is a strong association between a marker and a trait, it is reasonable to suppose there is a QTL nearby that marker. Therefore, the association between markers and trait can be used to guide the search for new QTLs in the estimation process. As each marker can be seen as a factor with three levels affecting differently the phenotype mean or the residual mean of the current model, we use the Kruskal–Wallis test statistic to measure this association. The *F* statistics in a one-way analysis of variance could also be used. Higher values indicate the residual mean is different for the distinct levels of the marker and there is a higher chance of a QTL close to it whose effect is not considered in the current model. Values close to zero indicate the residual mean is the same for all levels of the marker and its contribution to explain the quantitative trait is not relevant or its effect is already considered in the model.

The complete birth step is built as follows:

Select a marker to allocate the new QTL from a Multinomial, where , , and is the statistics of the Kruskal–Wallis test from residuals of the current model and the

*j*th marker genotype, defined aswhere is the number of individuals in the*l*th group and the three groups are specified by the genotype of the*j*th marker, is the rank (among all individuals) of the*i*th individual from the*l*th group, , and is the average of all the . Note that markers which most affect the residual mean are more likely to be chosen;Assume the marker has been chosen, and , and suppose there is no QTL between the and markers. The new QTL can be located in and is defined as , where Beta and

*a*is calculated according toConsequently, the expected value of is the average of the marker and its flanking markers’ position weighted by their effect on the residual mean of the current model and the new QTL is more likely to be close to the marker that has the most relevant effect on the residual mean. Note the distribution is Uniform when , and have the same effect on the residual mean and the marker is in the center of .If , , or already contains a QTL then the new QTL will be located in , , , or , respectively, and its position is simulated as in step 2, considering only two markers and not three.

Sample the genotype of the new QTL for all individuals, , from

Sample from its conditional

*a posteriori*distribution considering andSample from its conditional

*a posteriori*distribution considering and .Sample from its conditional

*a posteriori*distribution, considering , , and .Sample from its conditional

*a posteriori*distribution, considering , , , and .

Therefore, we have a new set of QTL genotypes and parameters . This transition proposal is denoted by and its probability is(8)where is the conditional *a posteriori* distribution for each parameter used to sample the candidate values. The acceptance probability for the birth move is , where is given by Equation 7. The probability of the transition proposal denoted by is

#### Death proposal:

Since a death move has been selected, we choose a QTL from the current model to be deleted.

As assumes only values , 0, and 1 and assumes only 0 and 1, for and , the current absolute value of and shows the importance and significance of the *k*th QTL, *i.e.*, higher absolute values of or indicate the *k*th QTL is more relevant to explain the phenotype. The current values of these parameters are useful for the choice of the QTL to be excluded without changing significantly the predictive power of the model.

Instead of selecting a QTL to be excluded from a Uniform, we select it from a Multinomial, where , for *i.e.*, QTLs that exert the strongest effects and are the most relevant to the model are less likely to be selected and deleted. Therefore, the acceptance probability of the death movement is improved.

The complete death step is as follows:

Select the QTL to be excluded from Multinomial, the QTL.

Delete , , , and from

**q**,**λ**,**α**, and**δ**, respectively.Sample from its conditional

*a posteriori*distribution, considering only QTLs.Sample from its conditional

*a posteriori*distribution, considering the reduced model.

We have a new set of QTL's genotypes and parameters . This transition proposal is denoted by and its probability is(10)where is the conditional *a posteriori* distribution of each parameter used to generate the candidate values.

The acceptance probability for the death movement is , where with some suitable substitutions. The probability of transition proposal denoted by is defined as(11)where is the marker on the left of the QTL and is the marker on the right of the QTL.

Note that if we first choose a birth movement in state *x*, giving , and then choose the death of the th QTL, we can recover *x* and state *x* is likely to be recovered after a birth process of . If the candidate movement is not accepted, the chain remains in the current model, the value of *K* does not change, and the remaining parameters of the model are updated by Metropolis–Hastings or Gibbs steps.

#### Merge proposal:

Instead of proposing data driven with only birth and death steps, we also include a merge movement in the procedure since the model can be wrongly estimated if the effect of a true QTL is split between two or more false QTLs. The split of a QTL may happen if a QTL appears very close to an existent QTL and, as their genotypes are very similar, both are in the model and split the additive and dominance effect that would be of only one QTL. The death of one of these QTLs is not generally accepted since the effects of both QTLs are relevant to explain the phenotype variability. The merge moves of two consecutive QTLs are usually accepted and effective to avoid split QTLs since the effect of the QTL that is removed from the model is added to the effect of an adjacent QTL and the predictive power of the model does not change significantly.

For merging two QTLs we must choose a pair of consecutive QTLs to be merged and choose one QTL to be removed from the model. Its effects are added to the effect of the other QTL. We propose to build a data-driven merge candidate as follows:

Select a pair of consecutive QTLs to be merged from Multinomial, where , and , and is Cramér’s

*V*measure of association between the genotypes of the*k*th and the*j*th QTLs. Note that pairs of successive QTLs with more associated genotypes have higher probability to be merged since the split happens between QTLs with similar genotype. Suppose the pair of QTLs and has been selected.Choose the or to be excluded from the current model, according to , Consider that the has been chosen to be excluded.

Delete , , , and from

**q**,**λ**,**α**, and**δ**, respectively.Update , ,

*μ*, and , successively, from their conditional*a posteriori*distribution considering , , and with QTLs.

Instead of adding the value of and to and , respectively, we propose to update and from their conditional *a posteriori* probability, using the reduced model. It is equivalent since we remove the effects of the QTL from the current model to update and and simplify the calculation of merge acceptance probability since is not necessary to define deterministic transformations to reduce the dimension of the model.

We have a new set of QTL's genotypes and parameters . This transition proposal is denoted by and its probability is(12)where is the conditional *a posteriori* distribution of each parameter used to sample the candidate values.

The acceptance probability for the merge movement is , where is defined by Equation 7. The probability of a transition proposal denoted by that represents a split of the QTL is defined as(13)where is the marker on the left of the QTL and is the marker on the right of the QTL.

Since we include the QTL merge move only to avoid split QTLs, we do not include a QTL split step in this procedure. However, a split step could be easily included in the algorithm, using the transition function of a split movement defined in Equation 13.

#### Algorithm:

The birth–death–merge DDRJ is specified as follows:

Initialize a configuration for

**θ**and**q**.For the

*l*th iteration, ,Choose a death or birth movement.

Generate the candidate values of .

Accept the proposal with probability , where ′ means either

*b*or*d*.If a birth movement has been accepted, do and consider .

If a death movement has been accepted, do and consider .

If no movement has been accepted, do and consider

*x*.

If , generate and evaluate the acceptance of a merge of a QTLs pair. If a merge movement has been accepted, do and consider .

Update , .

Update , and , from its conditional

*a posteriori*distribution.Update and , , from their conditional

*a posteriori*distributions.Update

*μ*from its conditional*a posteriori*distribution.Update from its conditional

*a posteriori*distribution.

This algorithm is implemented in R language and the codes are available in File S3 and File S4. R is a free software environment for statistical computing and graphics and more details are found in its homepage, “https://www.r-project.org”.

## Applications

We apply the proposed method to simulated and real data sets and compare the performance of the RJ, DDRJ, and MIM methodologies. Although the computational efficiency is an important feature of the methods, we focus on analyzing and comparing their performance in selecting and estimating the correct model. We set hyperparameters , , and This setup provides *a priori* distributions with large variability and weak information about the parameters.

### Simulated data sets

We simulate a high-dimension linkage map with 450 loci that are allocated on a large chromosome of 450 cM (average distance between the loci is 1 cM) and their genotype for an F_{2} population of 300 individuals by QTL Cartographer 2.5 software available at “http://statgen.ncsu.edu/qtlcart/WQTLCart.htm” (Basten *et al.* 1997). We choose loci located at to be the QTLs and simulate the phenotype using , , , and three values of *σ* . The effects of the first and the second QTLs are stronger and are easily identified, the fourth and fifth QTLs have opposite effects, and the effect of the third QTL is the weakest.

We run RJ and DDRJ chains iterations, discard the first 5000 iterations, and take one for every 10 iterations. The chains are initialized with Convergence is verified using trace plots.

Figure 1, Figure 2, and Figure 3 show the RJ and DDRJ trace plots of *K* for , 1.0, and 1.5, respectively. We observe DDRJ chains show better mixing since they easily move around the models space throughout the chain as a consequence of better proposal candidates. The RJ chain moves with greater difficulty between the possible models and it can get stuck in a specific model for longer periods even if it is a wrong model. When , we observe a very poor mixing of the RJ chain since it gets stuck for long periods (at the beginning and end of the chain) in the model with (wrong model). When , the RJ chain moves easily around the models space in the beginning of the chain but not in its end.

We also analyze the mixing of the chains by their effective sample size (ESS) (Kass *et al.* 1998), which is the number of effectively independent draws from the *a posteriori* distribution. A large discrepancy between the ESS and the simulation sample size indicates poor mixing. Table 1 shows the ESS for the RJ and DDRJ *K* sequences and we observe the DDRJ ESS is larger than the RJ ESS, which confirms a better mixing of DDRJ chains. We observe a very poor mixing of the RJ chain mainly for DDRJ and RJ ESSs of the remaining parameters of the models are shown in Table A of File S2 and DDRJ ESSs are in most cases larger than RJ ESSs.

Table 2 shows *a posteriori* probabilities for *K* calculated as the relative frequency of each value of *K* in the sequence. The highest *a posteriori* probability estimate for each situation is in boldface type and the argument that maximizes this probability is the estimate of *K*. In situations where the genetic effects of QTLs are strong compared with the size of the error variability () both methodologies estimate correctly However, as a result of weak mixing, the RJ chain gets stuck in for long periods and tends to underestimate the *a posteriori* probability of *K*. Since represents a small variability of the random error and, consequently, the effect of QTLs is more evident, the choice of the correct model should be precise. When , the opposite fourth and fifth QTLs, although they have higher additive effect than the third QTL, are not identified by RJ and DDRJ since their effects cancel each other. For , the RJ procedure estimates only and shows greater difficulties in locating the QTLs.

Table 3 shows the estimates (*a posteriori* average) of parameters and their credibility interval. The estimates of both methodologies are similar when and close to the true values. The DDRJ point estimates of the additive and the dominance effect of the fourth and fifth QTLs are closer to the true simulated parameters than the RJ estimates. Zero belongs to the RJ credibility interval of . The additive and dominance effects of the third QTL are the worst estimate in both methods. When , RJ and DDRJ estimates for the model with QTLs are similar and the additive and dominance effects estimates of the third QTL are also the worst estimate in both methods. For , RJ shows a low performance to estimate the number of QTLs and the parameters associated with them. The RJ point estimates are different from the parameters and interval estimates are large.

We also analyze the simulated data sets, using the MIM method available in QTL Cartographer. The main model selection criterion available in QTL Cartographer to select the number of QTLs is BIC , where is the maximum-likelihood estimator of , *p* is the number of free parameters to be estimated, and . Other definitions of are used and available in QTL Cartographer such as (AIC), , , , and , where we define We choose the MIM forward search method to estimate the initial model and test the six model selection criteria to optimize QTLs positions, search for new QTLs, and test existing QTLs. We report the results of , which shows the best results for the simulated data sets.

The MIM method combined with BIC model selection methodologies and optimization procedures of QTL location and effect estimates for , and 1.5, respectively. Table 4 shows the MIM estimates of the remaining parameters of the models. The method identifies one nonexisting QTL at 9.0 cM when and the additive and dominance effects of the second QTL are biased. We observe that if we sum the estimates of additive and dominance effects of first and second QTLs, we have estimates closer to additive and dominance effects of the QTL located at 15.0 cM; that is, the effects of the true QTL estimated at 14 cM are split with a false QTL identified at 9 cM. When and 1.5, the opposite fourth and fifth QTLs are not identified and the DDRJ estimates of the remaining parameters, especially estimates associated with the third QTL that has weaker effects, are better than MIM estimates. We do not have a confidence interval to analyze the uncertainty of the parameters.

Unlike BIC (), we stop the AIC, BIC-like criteria with and estimation when they wrongly identify , and 9 significant QTLs for , and 1.5 located at , , respectively. The BIC-like criterion with estimates significant QTLs located at for all values of *σ* and the BIC-like criterion with estimates QTLs located at for , QTLs located at for , and significant QTL located at for Therefore, we observe the MIM method combined with BIC model selection is sensitive to choice for these simulated data sets; that is, depending on the choice, the method overestimates or underestimates the number of QTLs. If the data were not simulated and we did not know the correct model, we could estimate the model by the six MIM model selection criteria and select the estimated model that was the most frequent between all criteria. In this case, we would choose, for all values of *σ*, the model estimated by the AIC, BIC-like criterion with and , which is the worst estimated model.

### Real data set

We apply RJ and DDRJ to the bone mineral density data set. It consists of 661 female F_{2} mice derived from matings of F_{1} individuals from NZB/B1NJ × RF/J parents. This cross is designed to identify the genetic loci regulating femur mechanical properties, geometric properties, and bone mineral density (BMD). The data have 94 genetic markers located in 19 chromosomes. NZB, RF, and heterozygous markers are coded as 1, , and 0, respectively. The data were downloaded from the site “http://qtlarchive.org/db/q?pg=projlist”.

Twenty-three phenotypes were measured in all individuals; however, we analyze only the total femur volumetric BMD in milligrams per cubic centimeter. The trait was log-transformed before analysis to be comparable with Wergedal *et al.* (2006) and Cox *et al.* (2009) results.

We run RJ iterations, discard the first 10,000 and take one for every 10 iterations. We run DDRJ iterations, discard the first 5000, and take one for every 10 iterations. The sequences are initialized with and, in DDRJ, we update the birth candidate 10 times before evaluating its acceptance, as proposed by Green and Mira (2001). We analyze the convergence and conclude the number of iterations is sufficient for reliable results.

Table 5 shows the *a posteriori* DDRJ probability (relative frequency) for *K* in each chromosome whose value is evidence of a QTL presence. The *a posteriori* probability of the model with one QTL is 0.67 in chromosome 7, 0.42 in chromosome 11, 0.38 in chromosome 19, 0.33 in chromosome 9, and 0.25 in chromosome 1, which represents strong evidence of a QTL in chromosome 7 since is the argument that maximizes the *a posteriori* probability of *K* and moderate probability in chromosomes 1, 9, 11, and 19 since, despite that the maximum *a posteriori* probability is not for , it is >0.25. In chromosomes 10, 12, 17, and 18, the probability of a QTL is not negligible. Depending on the cost and researcher interest, these loci can be studied in more detail. Therefore, we identify at least QTLs regulating bone mineral density.

Table 6 shows estimates and credibility intervals for QTLs’ locations (centimorgans) and additive and dominance effects in chromosomes 1, 7, 9, 10, 11, 12, 17, 18, and 19. Additive and dominance effects explain how QTLs genotypes are associated to bone mineral density and their estimates are small (close to zero) because of the scale of the log(BMD). Although the chance of a QTL in chromosomes 10, 17, and 19 is not negligible, zero belongs to their additive and dominance effects credibility interval. Therefore, DDRJ identifies relevant QTLs at chromosomes 1, 7, 9, 11, 12, and 18.

We also analyze these data by a RJ and MIM forward search method combined with BIC model selection (), which shows better results in the simulated data sets. We observe only RJ low *a posteriori* probabilities 0.0006, 0.0009, and 0.027 for one QTL in chromosomes 7, 9, and 11, respectively. MIM identifies one QTL in chromosomes 1, 7, 9, 11, and 12 located at 88, 65, 70, 34, and 28 cM, respectively. The MIM point estimates of additive and dominance effects are and . The MIM effect estimates are close to DDRJ estimates; however, we do not have information about MIM estimates uncertainty. Wergedal *et al.* (2006) use a three-stage strategy and LOD score to identify QTLs located in chromosomes 3, 7, 10, 11, and 18 at 10-, 65-, 65-, 40-, and 50-cM positions, respectively.

If we use the DDRJ *a posteriori* probability of *K* as evidence of QTL presence, we observe DDRJ, MIM, and Wergedal methodologies identify QTLs in chromosomes 7 and 11; DDRJ and MIM identify three more QTLs in chromosomes 1, 9, and 12; and DDRJ and Werdegal methods identify another QTL in chromosome 18. The Werdegal method also identifies one QTL in chromosomes 3 and 10 whose credibility interval of additive effect and dominance effect contains zero. Therefore, for this data set, DDRJ methodology identifies QTLs with strong and weak effects in BMD that are not identified by other QTL mapping methods.

## Discussion

We propose a birth–death–merge DDRJ for QTL mapping in an F_{2} population with an unknown number of QTLs. We compare the performance of the proposed method with traditional RJ and MIM combined with a model selection method and optimization procedures that are the most popular methodologies for QTL mapping in experimental crosses. Although computational efficiency is an important feature of the methods, we focus on analyzing and comparing their performance in identifying significant QTL regions.

DDRJ shows a better performance to identify and estimate QTLs mainly when their effects are moderate and RJ does not identify them. The better performance of DDRJ occurs because it facilitates the moves around the models space and improves the chain mixing as a consequence of better proposals in transdimensional moves. Unlike DDRJ, the RJ method moves with greater difficulty between the possible models and it can get stuck in a specific model for longer periods even if it is a wrong model. Compared with MIM combined with model selection methods, DDRJ also shows better performance in identifying QTL regions and provides uncertainty information for all parameters through credibility intervals. For simulated data sets, MIM shows sensitivity to the choice of model selection criterion and, depending on the criterion choice, the method overestimates or underestimates the number of QTLs. As QTLs single effects are not so high in practice, mainly the effect of SNP QTLs (Yang *et al.* (2010)), the proposed methodology appears to be useful and brings contributions to identification and characterization of QTLs.

The DDRJ *a posteriori* probability of *K* is evidence of QTL presence and, even when this value is not maximum for , it allows us to specify regions that can be further explored by genetic researchers. The application in a real data set illustrates an example where DDRJ identifies QTLs with strong, moderate, and weak effects on the phenotype that are not identified by RJ, MIM, or other QTL mapping methods.

The inclusion of merge moves in DDRJ is efficient under analyzed data sets to avoid the split of a true QTL effect with one or more false QTLs. The conventional methodologies usually deal with a ghost QTL that appears between two or more QTLs linked in coupling and is generally more significant than the true QTLs. The problem presented here is the opposite of that of a ghost QTL since the true QTLs share their importance with one or more false QTLs. Ghost QTLs are usually avoided by multiple-QTL mapping methods and merge moves included in DDRJ reduce the chance of split QTLs. Since we include the QTLs merge move only to avoid split QTLs, we do not include a QTL split step in this procedure.

The R codes of birth–death–merge data-driven reversible jump are available in File S3 and File S4 and we are improving them to be more efficient and user friendly.

The amplitude of the DDRJ credibility interval of QTLs' location is large when error variability is higher. To improve the DDRJ performance, we can estimate the genotype of a QTL using more than the two flanking markers or using nonconjugate samplers, and analyze the results in future work. The proposed data-driven method can be extended to generalized linear models and identifies QTLs that affect binary or discrete phenotypes or for QTL mapping in pedigree data in which the individuals’ genotype is correlated if they are relatives and improves SNP mapping methods that have a smaller single effect on the phenotype.

## Acknowledgment

The authors thank two referees for useful comments and suggestions that improved the manuscript.

## Footnotes

*Communicating editor: N. Yi*Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180802/-/DC1.

- Received July 16, 2015.
- Accepted October 30, 2015.

- Copyright © 2016 by the Genetics Society of America