## Abstract

Genomewide multiple-loci mapping can be viewed as a challenging variable selection problem where the major objective is to select genetic markers related to a trait of interest. It is challenging because the number of genetic markers is large (often much larger than the sample size) and there is often strong linkage or linkage disequilibrium between markers. In this article, we developed two methods for genomewide multiple loci mapping: the Bayesian adaptive Lasso and the iterative adaptive Lasso. Compared with eight existing methods, the proposed methods have improved variable selection performance in both simulation and real data studies. The advantages of our methods come from the assignment of adaptive weights to different genetic makers and the iterative updating of these adaptive weights. The iterative adaptive Lasso is also computationally much more efficient than the commonly used marginal regression and stepwise regression methods. Although our methods are motivated by multiple-loci mapping, they are general enough to be applied to other variable selection problems.

IT is well known that complex traits, including many common diseases, are controlled by multiple loci (Hoh and Ott 2003). However, multiple-loci mapping remains one of the most attractive and most difficult problems in genetic studies, mainly due to the high dimensionality of the genetic markers as well as the complicated correlation structure among genotype profiles (throughout this article, we use the term “genotype profile” to denote the genotype profile of one marker, instead of the genotype profile of one individual). Suppose a quantitative trait and the genotype profiles of *p*_{0} markers (*e.g.*, single-nucleotide polymorphisms, SNPs) are measured in *n* individuals. We treat this multiple-loci mapping problem as a linear regression problem,

(1) *y _{i}* = *b*_{0} + ∑_{*j*=1}^{*p*} *x _{ij}b _{j}* + *e _{i}*,

or **y** = **b**_{0} + **Xb** + **e** in matrix form, where **y** = (*y*_{1}, … , *y _{n}*)^{T}, **X** = (*x _{ij}*)_{*n*×*p*}, **b**_{0} = *b*_{0}**1**_{*n*×1}, **b** = (*b*_{1}, … , *b _{p}*)^{T}, **e** = (*e*_{1}, … , *e _{n}*)^{T}, and **e** ∼ *N*(**0**_{*n*×1}, σ _{e}^{2}**I**_{*n*×*n*}). *y _{i}* is the trait value of the *i*th individual, and *b*_{0} is the intercept. Here *p* is the total number of covariates. If we consider only the main effect of each SNP, *p* = *p*_{0}; and if we consider the main effects and all the pairwise interactions, *p* = *p*_{0} + *p*_{0}(*p*_{0} − 1)/2. *x _{ij}* is the value of the *j*th covariate of individual *i*. The specific coding of *x _{ij}* depends on the study design and the inheritance model. For example, if additive inheritance is assumed, the main effect of a SNP can be coded as 0, 1, and 2 on the basis of the number of minor alleles. The major objective of multiple-loci mapping is to identify the correct subset model, *i.e.*, to identify those *j*'s such that *b _{j}* ≠ 0, and to estimate the *b _{j}*'s.

Marginal regression and stepwise regression are commonly used for multiple-loci mapping. Permutation-based thresholds for model selection have been used for these two methods (Churchill and Doerge 1994; Doerge and Churchill 1996). Broman and Speed (2002) proposed a modified Bayesian information criterion (BIC) for model selection, which was further written as a penalized LOD score criterion and implemented within a forward–backward model selection framework (Manichaikul *et al.* 2009). The threshold of the penalized LOD score is also estimated by permutations.
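To make the dimension count concrete, here is a small sketch (hypothetical genotype data; `design_matrix` is our own illustrative helper, not code from the article) that builds **X** with additive 0/1/2 coding and, optionally, all pairwise interactions:

```python
import numpy as np
from itertools import combinations

def design_matrix(G, interactions=False):
    """Build the covariate matrix X from an n x p0 genotype matrix G
    (entries 0/1/2 = number of minor alleles, additive coding).
    With interactions=True, append all pairwise products, giving
    p = p0 + p0 * (p0 - 1) / 2 columns in total."""
    p0 = G.shape[1]
    cols = [G[:, j] for j in range(p0)]
    if interactions:
        cols += [G[:, j] * G[:, k] for j, k in combinations(range(p0), 2)]
    return np.column_stack(cols)

# toy genotypes for n = 4 individuals and p0 = 3 markers
G = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]])
X_main = design_matrix(G)                    # p = 3 columns
X_full = design_matrix(G, interactions=True)  # p = 3 + 3*2/2 = 6 columns
```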

Several simultaneous multiple-loci mapping methods have been developed, among which two commonly used approaches are Bayesian shrinkage estimation and Bayesian model selection. Most existing Bayesian shrinkage methods are hierarchical models based on the additive linear model specified in Equation 1, with covariate-specific priors: *p*(*b _{j}* | σ _{j}^{2}) ∼ *N*(0, σ _{j}^{2}). The coefficients are shrunk because their prior mean values are 0. The degree of shrinkage is controlled by the prior of the covariate-specific variance σ _{j}^{2}. An inverse-Gamma prior on σ _{j}^{2} (Equation 2) leads to an unconditional prior of *b _{j}* as a Student's *t* distribution (Yi and Xu 2008). We refer to this method as the Bayesian *t*. Another choice is an exponential prior (Equation 3), σ _{j}^{2} ∼ Exp(*a*^{2}/2), where *a* is a hyperparameter. In this case, the unconditional prior of *b _{j}* is a Laplace distribution, which is closely related to the Bayesian interpretation of the Lasso (Tibshirani 1996); therefore, it has been referred to as the *Bayesian Lasso* (Yi and Xu 2008). Park and Casella (2008) constructed the Bayesian Lasso using a similar but distinct prior: *p*(*b _{j}* | σ^{2}, τ _{j}^{2}) ∼ *N*(0, σ^{2}τ _{j}^{2}) and *p*(τ _{j}^{2} | *a*^{2}/2) = Exp(*a*^{2}/2). Hans (2009) proposed another Bayesian Lasso method, with more emphasis on prediction than on variable selection. Several general Bayesian model selection methods have been applied to multiple-loci mapping, for example, the stochastic search variable selection (George and McCulloch 1993) and reversible-jump Markov chain Monte Carlo (MCMC) methods (Richardson and Green 1997). One example is the composite model space approach (CMSA) (Yi 2004).
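The Laplace marginal underlying the Bayesian Lasso can be checked by simulation: drawing the covariate-specific variance from its exponential prior and then the coefficient from the conditional normal reproduces a double-exponential sample. The sketch below is a Monte Carlo check (our own illustration, with an arbitrary value for *a*); the Laplace distribution has excess kurtosis 3, whereas a normal has 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 200_000, 1.5                       # a is an arbitrary hyperparameter value
# sigma_j^2 ~ Exp(rate a^2/2), i.e., scale 2/a^2; then b_j | sigma_j^2 ~ N(0, sigma_j^2)
sigma2 = rng.exponential(scale=2.0 / a**2, size=n)
b = rng.normal(0.0, np.sqrt(sigma2))

def excess_kurtosis(x):
    """Sample excess kurtosis: about 0 for a normal, about 3 for a Laplace."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
```

The heavier-than-normal tails of the mixture are exactly what makes the unconditional prior resemble a Lasso rather than a ridge penalty.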

We propose two variable selection methods: the Bayesian adaptive Lasso (BAL) and the iterative adaptive Lasso (IAL). The BAL is a fully Bayesian approach while the IAL is an expectation conditional maximization (ECM) algorithm (Meng and Rubin 1993). Both the BAL and the IAL are related to the adaptive Lasso (Zou 2006), which extends the Lasso (Tibshirani 1996) by allowing covariate-specific penalties. The adaptive Lasso enjoys the oracle property (Fan and Li 2001); *i.e*., the covariates with nonzero coefficients will be selected with probability tending to 1, and the estimates of the nonzero coefficients have the same asymptotic distribution as they would under the correct model. However, the adaptive Lasso requires consistent initial estimates of the regression coefficients, which are generally not available in the high dimension, low sample size (HDLSS) setting. Huang *et al.* (2008) showed that with initial estimates obtained from the marginal regression, the adaptive Lasso still has the oracle property in the HDLSS setting under a partial orthogonality condition: the covariates with zero coefficients are weakly correlated with the covariates with nonzero coefficients. However, in many real-world problems, including the multiple-loci mapping problem, the covariates with zero coefficients are often strongly correlated with some covariates with nonzero coefficients. The BAL and the IAL extend the adaptive Lasso in the sense that they do not require any informative initial estimates of the regression coefficients, so they can be applied in the HDLSS setting even if there is high correlation among the covariates.

After we completed an earlier version of this article, we noticed independent work on extending the adaptive Lasso from a Bayesian point of view (Griffin and Brown 2007). There are several differences between Griffin and Brown's approach and our work. First, Griffin and Brown (2007) did not study the fully Bayesian approach, while we have implemented and carefully studied the BAL. Second, Hoggart *et al.* (2008) implemented Griffin and Brown's approach in HyperLasso, a coordinate descent algorithm, which differs from the IAL in both model setup and implementation. We show in our simulation and real data analysis that the IAL has significantly better variable selection performance than the HyperLasso. The differences between the IAL and the HyperLasso are further elaborated in the discussion.

In this article, we focus on the genomewide multiple-loci mapping in experimental crosses of inbred strains (*e.g.*, yeast segregants, F_{2} mice) where typically thousands of genetic markers are genotyped in hundreds of samples. Another situation is genomewide association studies (GWAS) in outbred populations where millions of markers are genotyped in thousands of individuals. Multiple-loci mapping in experimental crosses and GWAS presents different challenges. In experimental crosses, genotype profiles have higher correlations in longer genomic regions, which give simultaneous multiple-loci mapping methods more advantages than the marginal regression or stepwise regression. In contrast, GWAS data have higher dimensionality but the linkage disequilibrium (LD) blocks often have limited sizes. We focus on the experimental cross in this study, but similar methods can also be applied to GWAS data, although preselection or mapping chromosome-by-chromosome may be necessary to reduce the dimension of the covariates.

The remainder of this article is organized as follows. We first introduce the BAL and the IAL in the following two sections, respectively, and then evaluate them and several existing methods by extensive simulations in simulation studies. A real data study of multiple-loci mapping of gene expression traits is presented in the gene expression QTL study section. We summarize and discuss the implications of our methodology in the discussion.

## THE BAL

The BAL is a Bayesian hierarchical model. The priors are specified as

(4) *p*(*b _{j}* | κ _{j}) = (2κ _{j})^{−1} exp(−|*b _{j}*|/κ _{j}),

(5) *p*(κ _{j} | δ, τ) = Inv-Gamma(δ, τ),

(6) *p*(σ _{e}^{2}) ∝ σ _{e}^{−2},

where δ > 0 and τ > 0 are two hyperparameters. The unconditional prior of *b _{j}* is

(7) *p*(*b _{j}* | δ, τ) = δτ^{δ}/[2(|*b _{j}*| + τ)^{δ+1}],

which we refer to as a *power* distribution with parameters δ and τ. From this unconditional prior, we can see that smaller τ and larger δ lead to bigger penalization. In practice, it could be difficult to choose specific values for the hyperparameters δ and τ. Following a rationale similar to that of Yi and Xu (2008), we suggest the joint improper prior *p*(δ, τ) ∝ τ^{−1} and let the data estimate δ and τ. We suggest the prior ∝ τ^{−1} because in our application in the HDLSS setting we encourage a larger penalty, and hence smaller τ and larger δ.

The posterior distribution of all the parameters is given by

(8) *p*(**b**, *b*_{0}, κ, σ _{e}^{2}, δ, τ | **y**, **X**) ∝ (σ _{e}^{2})^{−*n*/2−1} exp(−rss/(2σ _{e}^{2})) ∏_{*j*=1}^{*p*} (2κ _{j})^{−1} exp(−|*b _{j}*|/κ _{j}) ∏_{*j*=1}^{*p*} [τ^{δ}/Γ(δ)]κ _{j}^{−δ−1} exp(−τ/κ _{j}) × τ^{−1},

where rss indicates the residual sum of squares, *i.e.*, rss = ∑_{*i*=1}^{*n*}(*y _{i}* − *b*_{0} − ∑_{*j*=1}^{*p*}*x _{ij}b _{j}*)^{2}. We sample from this posterior distribution using a Gibbs sampler, which is presented in supporting information, File S1, Section A.

Our model can also be explained from the perspective of penalized least squares. Using the unconditional prior to replace the conditional prior, and taking the negative logarithm of the posterior probability, we obtain a penalized least-squares objective function: rss/(2σ _{e}^{2}) + ∑_{*j*}(1 + δ)log(|*b _{j}*| + τ), with a log penalty for |*b _{j}*|. Another form of log penalty is λlog(|*b _{j}*|), which has been introduced by Zou and Li (2008) as the limit of the bridge penalty |*b _{j}*|^{γ} as γ → 0. Comparing these two forms of log penalties, λ and 1 + δ are both tuning parameters that control the size of the penalty, and our log penalty has an extra parameter τ that keeps the penalty finite when *b _{j}* = 0. In contrast, if *b _{j}* = 0, the log penalty of Zou and Li (2008) is infinite, which is not a problem for their one-step estimate, given that the initial estimate of *b _{j}* is not zero, but would cause problems in our iterative algorithms. Friedman (2008) proposed a log penalty, λlog((1 − β)|*b _{j}*| + β), with 0 < β < 1, which is equivalent to the log penalty that we use. Friedman's motivation is that this log penalty bridges the *L*_{1} penalty (Lasso) and the *L*_{0} penalty (all-subset selection) as β changes from 1 to 0. In contrast, we motivate this log penalty and solve the penalized least-squares problem from a Bayesian point of view.
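The behavior of the log penalty can be seen on the scalar problem minimize_*b* (*z* − *b*)²/2 + pen(*b*). The sketch below (our own illustration, with tuning values λ, δ, and τ chosen arbitrarily) minimizes over a fine grid: the Lasso shifts a large *z* by the full λ, whereas the log penalty (1 + δ)log(|*b*| + τ) shrinks it only slightly and still sends small *z* to exactly zero:

```python
import numpy as np

def argmin_scalar(z, pen):
    """Minimize (z - b)^2 / 2 + pen(b) over a dense grid of b values."""
    grid = np.linspace(-10.0, 10.0, 200_001)   # step 1e-4
    return grid[np.argmin(0.5 * (z - grid) ** 2 + pen(grid))]

lam, delta, tau = 2.0, 2.0, 0.5                # illustrative tuning values
l1 = lambda b: lam * np.abs(b)                 # Lasso penalty
logp = lambda b: (1.0 + delta) * np.log(np.abs(b) + tau)  # log penalty, finite at 0

b_l1, b_log = argmin_scalar(5.0, l1), argmin_scalar(5.0, logp)
b_l1_0, b_log_0 = argmin_scalar(0.8, l1), argmin_scalar(0.8, logp)
```

For *z* = 5 the Lasso solution is 5 − λ = 3, while the log penalty yields roughly 4.39, i.e., far less shrinkage of a large signal; for *z* = 0.8 both penalties give exactly zero.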

The BAL can be better understood by comparing it with the Bayesian Lasso. Recall that in the Bayesian Lasso, *b _{j}* | σ _{j}^{2} ∼ *N*(0, σ _{j}^{2}), σ _{j}^{2} ∼ Exp(*a*^{2}/2), and the unconditional prior for each *b _{j}* is a double-exponential distribution. The conditional normal prior resembles a ridge penalty and the unconditional Laplace prior resembles a Lasso penalty. Therefore an intuitive (albeit not accurate) explanation of the Gibbs sampler for the Bayesian Lasso is that it approaches a Lasso penalty by iteratively applying covariate-specific ridge penalties. Figure 1 in Park and Casella (2008) justifies this intuitive explanation: the coefficient paths of the Bayesian Lasso are a compromise between the coefficient paths of the Lasso and ridge regression. In contrast, an intuitive explanation of the BAL is that it approaches the log penalty by iteratively applying the adaptive Lasso penalty. Figure 1 illustrates the difference between the unconditional prior distributions of the BAL and the Bayesian Lasso. The former has a higher peak at zero and heavier tails, which leads to more penalization for smaller coefficients and less penalization for larger coefficients. This shape can potentially be an advantage in the HDLSS setting where a strong penalty is needed, although, as pointed out by a reviewer, the shape that works best depends on the data.

## THE IAL

Because no point mass at zero is specified in the Bayesian shrinkage methods (including the BAL), the samples of the regression coefficients would not be exactly zero, and thus the Bayesian shrinkage methods do not automatically select variables. However, if we look for the mode of the posterior distribution, it could be exactly zero. This leads to the following ECM algorithm: the iterative adaptive Lasso. Specifically, under the setup of the BAL (Equations 4–6), we treat θ = (*b*_{0}, *b*_{1}, …, *b _{p}*) as the parameter and let ϕ = (σ _{e}^{2}, κ_{1}, …, κ _{p}) be the missing data. The observed data are the *y _{i}*'s and *x _{ij}*'s. The complete-data log-posterior of θ is

(9) *l*(θ | **y**, **X**, ϕ) = −rss/(2σ _{e}^{2}) − ∑_{*j*=1}^{*p*}|*b _{j}*|/κ _{j} + *C*,

where *C* is a constant with respect to θ. Suppose in the *t*th iteration the parameter estimates are θ^{(t)} = (*b*_{0}^{(t)}, *b*_{1}^{(t)}, …, *b _{p}*^{(t)}). Then, after some derivations (File S1, Section B), the conditional expectation of *l*(θ | **y**, **X**, ϕ) with respect to the conditional density *f*(ϕ | **y**, **X**, θ^{(t)}) is

(10) *Q*(θ | θ^{(t)}) = −*n* rss/(2 rss^{(t)}) − ∑_{*j*=1}^{*p*}(1 + δ)|*b _{j}*|/(|*b _{j}*^{(t)}| + τ) + *C*,

where rss^{(t)} is the residual sum of squares calculated on the basis of θ^{(t)}. Comparing Equations 9 and 10, it is obvious that to obtain *Q*(θ | θ^{(t)}), we can simply let σ _{e}^{2} = rss^{(t)}/*n* and κ _{j} = (|*b _{j}*^{(t)}| + τ)/(1 + δ).

On the basis of the above discussions, the IAL is implemented as follows:

1. Initialization: initialize *b _{j}* (0 ≤ *j* ≤ *p*) with zero, initialize σ _{e}^{2} by the variance of **y**, and initialize κ _{j} (1 ≤ *j* ≤ *p*) with τ/(1 + δ).
2. Conditional maximization (CM) step: with σ _{e}^{2} and the κ _{j}'s fixed, update the *b _{j}*'s by maximizing *Q*(θ | θ^{(t)}), a weighted (adaptive) Lasso problem with covariate-specific penalties 1/κ _{j}.
3. Expectation (E) step: with the updated *b _{j}*'s, recalculate the residual sum of squares, rss, and do the following:
   - Update σ _{e}^{2}: σ _{e}^{2} = rss/*n*.
   - Update κ _{j}: κ _{j} = (|*b _{j}*| + τ)/(1 + δ).
4. Repeat the CM and E steps until convergence. We say the algorithm has converged if the coefficient estimates have little change.
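As a concrete illustration, the algorithm above can be sketched in a few dozen lines (our own minimal implementation, not the authors' code; the CM step is solved here by coordinate descent with soft thresholding, and the tuning values δ and τ below are arbitrary):

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ial(X, y, delta=1.0, tau=0.3, max_iter=200, tol=1e-6):
    """Sketch of the iterative adaptive Lasso (ECM).

    CM step: minimize rss/(2 sigma2) + sum_j |b_j| / kappa_j over the b_j's
    (a weighted Lasso), by coordinate descent.
    E step: sigma2 = rss / n and kappa_j = (|b_j| + tau) / (1 + delta).
    """
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    sigma2 = np.var(y)                       # initialize sigma_e^2 by var(y)
    kappa = np.full(p, tau / (1.0 + delta))  # initialize kappa_j = tau/(1+delta)
    xx = np.sum(X**2, axis=0)
    r = y - b0 - X @ b                       # current residual
    for _ in range(max_iter):
        b_old = b.copy()
        # CM step: coordinate descent for the weighted Lasso
        for _ in range(50):
            b_prev = b.copy()
            b0_new = np.mean(r + b0)         # unpenalized intercept
            r += b0 - b0_new
            b0 = b0_new
            for j in range(p):
                rho = X[:, j] @ r + xx[j] * b[j]
                bj = soft(rho, sigma2 / kappa[j]) / xx[j]
                r += X[:, j] * (b[j] - bj)
                b[j] = bj
            if np.max(np.abs(b - b_prev)) < tol:
                break
        # E step: update sigma2 and the adaptive weights
        sigma2 = (r @ r) / n
        kappa = (np.abs(b) + tau) / (1.0 + delta)
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b0, b
```

On a toy simulated data set with three nonzero coefficients, this sketch recovers the true variables with near-unbiased estimates, reflecting the weak penalization of large coefficients discussed above.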

The above discussion proves that the IAL is an ECM algorithm, which guarantees its convergence. In each step of the ECM algorithm, the conditional log posterior is concave, so a local maximum is a global maximum, and thus it is computationally easy to maximize the conditional log posterior. However, the unconditional log prior is not concave; thus for some tuning parameters, the unconditional log posterior (that is, the concave log likelihood plus the nonconcave log prior) could be nonconcave, so that a local maximum may not be the global maximum. Therefore, similar to other EM algorithms, it is possible that the IAL identifies a local mode of the posterior. This is a common problem for EM algorithms and we address this issue by choosing appropriate initial values of the coefficients and appropriate tuning of the hyperparameters τ and δ.

In the HDLSS setting, especially where the covariates are highly correlated, initial estimates from ordinary least squares (OLS) or ridge regression are unavailable, unstable, or noninformative. Therefore we initialize all the coefficients by zero.

To decide δ and τ, we first consider this problem from an asymptotic point of view to show that theoretically, we can identify optimal δ and τ (see Theorem 1 in File S1, Section C). However, the theoretical results provide only a rough scale for δ and τ. In practice, we select the specific values of δ and τ by the BIC, followed by a variable filtering step. The BIC is written as

(11) BIC_{τ,δ} = *n* log(rss_{τ,δ}/*n*) + d.f._{τ,δ} log *n*,

where d.f._{τ,δ} is the number of nonzero coefficients, an estimate of the degrees of freedom (Zou *et al.* 2007). Given the subset model selected by the BIC, the variable filtering step can be implemented by a single multiple regression or a stepwise backward selection. Multiple regression is computationally more efficient. However, occasionally, if two highly correlated covariates are both included in the model, it is possible that both of them are insignificant by multiple regression. In backward regression, however, after dropping one of them, the other one may become significant. We use backward regression for all the numerical results in this article. The *P*-value cutoff for variable filtering can be set as 0.05/*p*_{E}, where *p*_{E} is the effective number of independent tests. A conservative choice is *p*_{E} = *p*, the total number of covariates. In this article, we estimate *p*_{E} by the number of independent tests; see Sun and Wright (2010) and File S1, Section D for details. Our approach is closely related to the screening and cleaning method proposed by Wasserman and Roeder (2009). Wasserman and Roeder (2009) use the Lasso, marginal regression, or forward stepwise regression accompanied by cross-validation for screening and use a multiple regression for cleaning. We use the IAL for screening and use multiple regression or backward regression for cleaning. As pointed out by an anonymous reviewer, a ridge regression using all the selected covariates may also be an appropriate choice for cleaning.
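As a numeric sketch of the BIC selection step (our own toy example, assuming the standard form *n* log(rss/*n*) + d.f. log *n* for Equation 11): for three nested OLS fits, the BIC penalizes the underfitted model through its larger rss and the overfitted model through the extra degree of freedom. Below, the spurious third covariate is constructed orthogonal to the model-2 residual, so rss is unchanged and its BIC is larger by exactly log *n*:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2, z = rng.normal(size=(3, n))
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def bic(cols, y):
    """BIC = n log(rss/n) + d.f. * log(n), with d.f. = number of coefficients."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + X.shape[1] * np.log(len(y))

# residual of the true (two-covariate) model, and a spurious covariate
# made exactly orthogonal to that residual
X2 = np.column_stack([np.ones(n), x1, x2])
r = y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]
x3 = z - (z @ r) / (r @ r) * r

b1, b2, b3 = bic([x1], y), bic([x1, x2], y), bic([x1, x2, x3], y)
```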

In contrast to the BIC plus variable filtering approach, an alternative strategy is to apply an extended BIC, which imposes a larger penalty on bigger models (Chen and Chen 2008). The simulation results in the next section show that the extended BIC leads to slightly worse variable selection performance. Our explanation is that the extended BIC is valid asymptotically and is conservative when the sample size is relatively small (*n* = 360 in our simulation). Compared with the extended BIC, the ordinary BIC tends to select larger models with all or most of the true discoveries plus some false discoveries (Chen and Chen 2008), which can be filtered out by the variable filtering step.

## SIMULATION STUDIES

We first use simulations to evaluate the variable selection performance of 10 methods: marginal regression, forward regression, forward–backward regression (with penalized LOD as the model selection criterion), the CMSA, the adaptive Lasso (with initial regression coefficients from marginal regression), the IAL, the HyperLasso, and three Bayesian shrinkage methods: the Bayesian *t*, the Bayesian Lasso, and the BAL. See File S1, Section E for the implementation details.

#### Simulation setup:

We first simulate a marker map of 2000 markers from 20 chromosomes of length 90 cM, with 100 markers per chromosome [using function sim.map in R/qtl (Broman *et al.* 2003)]. The chromosome length is chosen to be close to the average chromosome length in the mouse genome. Next we simulate genotype data for 360 F_{2} mice on the basis of the simulated marker map (using function sim.cross in R/qtl). As expected, the markers from different chromosomes have little correlation, while the majority of the markers within the same chromosome are positively correlated (Figure S1). In fact, given the genetic distance of two SNPs, the expected *R*^{2} between two SNPs in this F_{2} cross can be explicitly calculated (Figure S2). For example, the *R*^{2}'s of two SNPs 1, 5, and 10 cM apart are 0.96, 0.82, and 0.67, respectively. Finally, we randomly choose 10 markers as QTL and simulate quantitative traits in six situations with 1000 simulations per situation. Given the 10 QTL, the trait is simulated on the basis of the linear model in Equation 1, where genotype (*x _{ij}*) is coded by the number of minor alleles. The QTL effect sizes across the six situations are listed below:

Situation 1, unlinked QTL: One QTL per chromosome, with effect sizes 0.5, 0.4, −0.4, 0.3, 0.3, −0.3, 0.2, 0.2, −0.2, and −0.2; σ _{e}^{2} = 1. Recall that σ _{e}^{2} is the variance of the residual error.

Situation 3, QTL linked in coupling: Two QTL per chromosome, with effect sizes of the QTL for each chromosome as (0.5, 0.3), (−0.4, −0.4), (0.3, 0.3), (0.2, 0.2), and (−0.2, −0.2); σ _{e}^{2} = 1.

Situation 5, QTL linked in repulsion: Two QTL per chromosome, with effect sizes of the QTL for each chromosome as (0.5, −0.3), (0.4, −0.4), (0.3, −0.3), (0.2, −0.2), and (0.2, −0.2); σ _{e}^{2} = 1.

Situations 2, 4, and 6 are the same as situations 1, 3, and 5, respectively, except that σ _{e}^{2} = 0.5. The locations and effect sizes of the QTL in each situation are illustrated in Figure 2. To mimic the reality that the genotype of a QTL may not be observed, we randomly select 1200 markers with “observed genotype profiles” and use only these 1200 markers in the multiple-loci mapping. The information loss is limited due to the high density of the markers. In fact, the vast majority of the 800 markers with missing genotype can be tagged with *R*^{2} > 0.8 by at least one marker with observed genotype (Figure S3).
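The expected *R*^{2} values quoted above (0.96, 0.82, and 0.67 at 1, 5, and 10 cM) are consistent with a simple closed form: for additively coded F_{2} genotypes the correlation between two markers is 1 − 2θ, where θ is the recombination fraction, and under Haldane's map function θ = (1 − e^{−2d/100})/2 for a distance of *d* cM, so *R*^{2} = (1 − 2θ)^{2} = e^{−4d/100}. A quick check (our own sketch, under these standard assumptions):

```python
import math

def expected_r2(d_cm):
    """Expected R^2 between two markers d_cm centimorgans apart in an F2
    cross with additive (0/1/2) coding, using Haldane's map function."""
    theta = (1.0 - math.exp(-2.0 * d_cm / 100.0)) / 2.0  # recombination fraction
    return (1.0 - 2.0 * theta) ** 2                      # = exp(-4 * d_cm / 100)

for d in (1, 5, 10):
    print(d, round(expected_r2(d), 2))   # prints 0.96, 0.82, 0.67
```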

One important aspect in the implementation of Bayesian methods is the diagnosis of the convergence of the MCMC. The results in Figure S4, Figure S5, and Figure S6 suggest the convergence of the Bayesian Lasso, Bayesian *t*, and BAL, respectively.

#### Results:

We divide the methods to be tested into two groups: the Bayesian methods, which do not explicitly carry out variable selection (since most coefficients remain nonzero), and the stepwise regression, adaptive Lasso, and HyperLasso, which explicitly select a subgroup of variables. The IAL is classified into the first group if we use the ordinary BIC to select the hyperparameters and into the second group if we use the extended BIC or the ordinary BIC plus variable filtering.

For either group, we compare the performance of different methods by comparing the number of true discoveries and false discoveries across different cutoffs of coefficient size or posterior probability. Given a cutoff, we can obtain a final model. We count the number of true discoveries in the final model as follows. For each of the true QTL, we check whether any marker in the final model satisfies the following three criteria: (1) it is located on the same chromosome as the QTL, (2) it has the same effect direction (sign of the coefficient) as the QTL, and (3) the *R*^{2} between this marker and the QTL is >0.8. Different *R*^{2} cutoffs, such as 0.7 and 0.9, lead to similar conclusions (results not shown). If there is no such marker, there is no true discovery for this QTL. If there is at least one such marker, the one with the highest *R*^{2} with the QTL is recorded as a true discovery and is excluded from the true-discovery search for the other QTL. After the true discoveries of all the QTL are identified, the remaining markers in the final model are defined as false discoveries. These false discoveries are further divided into two classes: false discoveries linked to at least one QTL (linked false discoveries) and false discoveries unlinked to any QTL (unlinked false discoveries). A false discovery is linked to a QTL if it satisfies the above three criteria. We summarize the results of each method by a receiver operating characteristic (ROC)-like curve that plots the median number of true discoveries *vs.* the median number of false discoveries across different cutoff values. The methods with ROC-like curves closer to the upper-left corner of the plot have better variable selection performance because they have fewer false discoveries and more true discoveries. It is possible that a few cutoff values correspond to the same median of false discoveries but different medians of true discoveries. In this case, the largest median of true discoveries is plotted to simplify the figure. In other words, these ROC-like curves illustrate the best possible performance of these methods. Forward regression outperforms marginal regression in all situations; therefore we omit the results for marginal regression for readability of the figures.
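The matching rules above can be sketched as follows (our own illustrative helper, not the authors' code; `chrom` and `sign` are hypothetical lookup tables keyed by marker or QTL, and `r2(q, m)` returns the squared correlation between QTL `q` and marker `m`):

```python
def classify_discoveries(selected, qtls, chrom, sign, r2, cutoff=0.8):
    """Split selected markers into true discoveries, linked false discoveries,
    and unlinked false discoveries, using the three matching criteria:
    same chromosome, same effect direction, and R^2 > cutoff with the QTL."""
    def linked(q, m):
        return chrom[m] == chrom[q] and sign[m] == sign[q] and r2(q, m) > cutoff

    remaining = list(selected)
    true_disc = []
    for q in qtls:
        matches = [m for m in remaining if linked(q, m)]
        if matches:
            best = max(matches, key=lambda m: r2(q, m))  # highest R^2 wins
            true_disc.append(best)
            remaining.remove(best)          # excluded from the search for other QTL
    linked_false = [m for m in remaining if any(linked(q, m) for q in qtls)]
    unlinked_false = [m for m in remaining if m not in linked_false]
    return true_disc, linked_false, unlinked_false
```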

We first compare the Bayesian methods and the IAL (with ordinary BIC). If the linked false discoveries are counted as false discoveries, the IAL has apparent advantages in all situations. Approximately, the performance of these methods can be ranked as IAL ≥ CMSA ≥ BAL ≥ Bayesian *t* ≥ Bayesian Lasso (Figure 3). If the linked false discoveries are counted as true discoveries, the performances of different methods are not well separated (Figure S7). Overall the IAL, the BAL, and the CMSA have similar and superior performance, and the Bayesian Lasso has inferior performance. We point out that the comparison of the IAL and the Bayesian methods should be interpreted with caution. The posterior mode estimates of parameters by the IAL provide less information than the fully Bayesian methods. For example, the Bayesian methods provide the distribution of the regression coefficients or the distribution of the probability of being included in the model, while the IAL cannot provide such information.

Next we compare the stepwise regression method, the HyperLasso, the adaptive Lasso (with initial estimates from marginal regression), and the IAL with extended BIC or ordinary BIC plus variable filtering. These methods tend not to select the unlinked false discoveries. In fact, the ROC-like curve for each of these methods is exactly the same whether we count unlinked false discoveries as true discoveries or not. If the linked false discoveries are treated as true discoveries, an additional fine-mapping step is needed to pinpoint the location of the QTL in a cluster of linked markers. Therefore, in general, methods that avoid linked false discoveries should be preferred. As shown in Figure 4, the IAL with ordinary BIC plus variable filtering has the best performance while the HyperLasso has the worst performance in all the situations. When the QTL are linked in repulsion, the HyperLasso has no power at all. The adaptive Lasso has similar performance to the IAL when the signal is strong (*i.e.*, QTL linked in coupling); otherwise it has significantly worse performance than the IAL. The stepwise regression and the IAL using the extended BIC have slightly worse performance than the IAL using ordinary BIC plus variable filtering.

We have compared different methods by ROC-like curves and used somewhat *ad hoc* rules to define true/false discoveries. In practice, we may need different criteria to define the true/false discoveries. For example, if there is no missing covariate, we may define a true discovery as the identification of the exact covariate, instead of a highly correlated one. Likewise, for prediction, a small number of false discoveries with small coefficients may not affect the prediction accuracy.

## GENE EXPRESSION QTL STUDY

QTL study of one particular trait may favor one method by chance. To evaluate our methods on real data in a comprehensive manner, we study the gene expression QTL (eQTL) of thousands of genes. The expression of each gene, like other complex traits, is often controlled by multiple QTL (Brem and Kruglyak 2005). Therefore multiple-loci mapping has important applications for eQTL studies. In this section, we study eQTL data with >6000 genes and 2956 SNPs in 112 yeast segregants (Brem and Kruglyak 2005; Brem *et al.* 2005). The gene expression data are downloaded from Gene Expression Omnibus (GEO) (GSE1990). The expression of 6229 genes is measured in the original data. We drop 129 genes that have >10% missing values and impute the missing values in the remaining 6100 genes by R function impute.knn (Troyanskaya *et al.* 2001). The genotype data were obtained from Rachel Brem. Fifteen SNPs with >10% missing values are excluded from this study, and the missing values in the remaining SNPs are imputed using the function fill.geno in R/qtl (Broman *et al.* 2003). The neighboring SNPs with the same genotype profiles are combined, resulting in 1027 genotype profiles. With >6000 genes, it is extremely difficult, if not impossible, to examine the QTL mapping results gene by gene to filter out possible linked false discoveries. Therefore, the Bayesian methods, which generate many linked false discoveries, were not applied to these eQTL data.

We apply the IAL, marginal regression, forward regression, forward–backward regression, and the HyperLasso to these yeast eQTL data to identify multiple eQTL of each gene separately. In other words, we examine the performances of these methods across 6100 traits. The permutation *P*-value cutoff for marginal regression and stepwise regression is set at 0.05. The parameters δ and τ of the IAL are selected by ordinary BIC followed by backward regression, with a *P*-value cutoff of 0.05/412, which is based on a conservative estimate of 412 independent tests (see File S1, Section D). For the HyperLasso, the “lambda” parameter is set at , and the “shape” parameter is set to 1. The IAL and the stepwise regressions have similar power to identify the genes with at least one eQTL (Table 1). The IAL is clearly the most powerful method in terms of identifying multiple eQTL per gene, and the HyperLasso has the least power to identify either single or multiple eQTL per gene (Table 1).

Next we focus on the results of the IAL. Many previous studies have identified one-dimensional (1-D) eQTL hotspots. A 1-D eQTL hotspot is a genomic locus that harbors the eQTL of several genes. Similarly, if the expression levels of several genes are associated with the same *k* loci, these *k* loci are referred to as a *k*-D eQTL hotspot. The results of the IAL reveal several 1-D eQTL hotspots (Figure 5), as well as many eQTL hotspots of higher dimensionality. We illustrate the 2-D eQTL hotspots in a two-dimensional plot where each point corresponds to one 2-D eQTL and the *x*, *y* coordinates of the point are the locations of the two eQTL (Figure 6). Comparing Figures 5 and 6, it is interesting that a 1-D eQTL hotspot can be further divided into several groups on the basis of the 2-D eQTL results, which is consistent with the findings of Zhang *et al.* (2010).

We divide the whole yeast genome into 600 bins of 20-kb regions, which leads to 600 × 599/2 = 179,700 bin pairs as potential “2-D eQTL hotspots”. Eleven bin pairs are linked to >15 genes (Table S1). The cutoff is chosen arbitrarily so that we can focus on a relatively small group of 2-D hotspots with clearly significant enrichment. Due to space limits, we discuss in detail only the largest 2-D hotspot, located at chromosome (Chr)15, 160–180 kb and Chr15, 560–580 kb. There are 46 genes linked to these two loci simultaneously, and among them 16 are involved in “generation of precursor metabolites and energy” (*P*-value 3.60 × 10^{−13}). A closer look reveals that 41 of the 46 genes are linked to one marker block at Chr15, 170,945–180,961 bp, and one marker at Chr15, 563,943 bp. One potential causal gene near Chr15, 171–181 kb is PHM7 (Zhu *et al.* 2008), and one potential causal gene near Chr15, 564 kb is CAT5 (Yvert *et al.* 2003). Interestingly, both PHM7 and CAT5 are among the 46 genes linked to both loci.
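The bin-pair scan described above reduces to a simple counting procedure; the sketch below (our own illustration, with `eqtl` a hypothetical mapping from each gene to its eQTL positions as (chromosome, bp) pairs) tallies, for every unordered pair of 20-kb bins, how many genes have an eQTL in both:

```python
from collections import Counter
from itertools import combinations

def two_d_hotspots(eqtl, bin_kb=20, min_genes=2):
    """For every unordered pair of genomic bins, count how many genes have
    an eQTL in both bins; return the pairs supported by >= min_genes genes.
    (The article uses a cutoff of >15 genes on the full yeast data.)"""
    counts = Counter()
    for gene, loci in eqtl.items():
        bins = {(chrom, bp // (bin_kb * 1000)) for chrom, bp in loci}
        for pair in combinations(sorted(bins), 2):
            counts[pair] += 1
    return {pair: k for pair, k in counts.items() if k >= min_genes}

# toy input: three genes share eQTL in the same two Chr15 bins
eqtl = {
    "g1": [(15, 170_000), (15, 565_000)],
    "g2": [(15, 175_000), (15, 563_000)],
    "g3": [(15, 179_000), (15, 561_000), (2, 50_000)],
    "g4": [(3, 10_000)],
}
hot = two_d_hotspots(eqtl, min_genes=3)
```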

There are also several cases in which one group of genes is linked to three loci (∼35.8 million possible three-loci combinations, Table S2) or even four loci (∼5.3 billion possible four-loci combinations, Table S3). For example, three genes, KGD2, SDH1, and SDH3, are all linked to four loci: Chr2, 240–260 kb; Chr13, 20–40 kb; Chr15, 160–180 kb; and Chr15, 560–580 kb. Interestingly, all three genes are involved in the “acetyl-CoA catabolic process” (*P*-value 1.93 × 10^{−7}).

## DISCUSSION

In this article, we proposed two variable selection methods, namely the BAL and the IAL. These two methods extend the adaptive Lasso in the sense that they do not require any informative initial estimates of the regression coefficients. The BAL is implemented by MCMC. Through extensive simulations, we observe that the BAL has clearly better variable selection performance than the Bayesian Lasso, slightly better performance than the Bayesian *t*, and slightly worse performance than the CMSA. The IAL, which is an ECM algorithm, aims at finding the mode of the posterior distribution. The IAL has uniformly the best variable selection performance among all the 10 methods we tested. Coupled with a variable filtering step, the type I error of the IAL can be explicitly controlled.

The IAL differs from the HyperLasso (Griffin and Brown 2007; Hoggart *et al.* 2008) in at least two aspects. First, the HyperLasso specifies an inverse gamma distribution for κ_{j}^{2}/2, and the resulting unconditional posterior must be evaluated numerically. In contrast, we specify an inverse gamma distribution for κ_{j}, which yields a much simpler unconditional posterior (Equation 7). The difference is not trivial: it leads not only to convenient theoretical study in Theorem 1 (File S1, Section C), but also to better numerical stability. For example, the HyperLasso becomes unstable for small values of the shape parameter, while the IAL is stable for all possible values of δ and τ. Second, we select δ and τ by BIC and further filter out covariates with insignificant effects. In contrast, the HyperLasso directly assigns a large penalization to control the type I error. As shown in the *Results* section, the strong penalization of the HyperLasso leaves it little power to detect relatively weak signals.
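For reference, under the standard shape–scale parameterization (our notation here, taking δ as the shape and τ as the scale of the hyperparameters above), the inverse gamma prior on κ_{j} has density

```latex
p(\kappa_j \mid \delta, \tau)
  = \frac{\tau^{\delta}}{\Gamma(\delta)}\,
    \kappa_j^{-(\delta+1)}
    \exp\!\left(-\frac{\tau}{\kappa_j}\right),
  \qquad \kappa_j > 0,
```

whereas the HyperLasso places the corresponding prior on κ_{j}^{2}/2 rather than on κ_{j} itself; the exact form of the resulting unconditional posterior is given in Equation 7.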

The IAL is computationally very efficient. For example, it takes 4 hr to carry out multiple-loci mapping for the yeast eQTL data with 6100 genes and 1017 markers. In contrast, marginal regression, forward regression, and forward–backward regression take ∼60, 100, and 200 hr, respectively. All of the computation was done on a dual Xeon 2.0-GHz quad-core server. An additional computational advantage of the IAL is that its type I error is controlled by the computationally efficient variable filtering step, so the IAL results can be reused for different type I error levels. In contrast, for stepwise regression, all the computation needs to be redone for each type I error level.

Our results seem to contradict the finding of Yi and Xu (2008) that the Bayesian Lasso has adequate variable selection performance. This inconsistency can be explained by the fact that we study the variable selection problem with a much denser marker map. It is known that the Lasso does not have variable selection consistency if there are strong correlations between the covariates with zero and nonzero coefficients (Zou 2006). Since the Bayesian Lasso has penalization characteristics similar to those of the Lasso (Park and Casella 2008) and the denser marker map leads to higher correlations among genotype profiles, it is not surprising that the Bayesian Lasso performs worse in our simulations. In fact, in our simulations, the Bayesian Lasso overpenalizes the regression coefficients (Figure S8). This is consistent with the finding that the “Lasso has had to choose between including too many variables or overshrinking the coefficients” (Radchenko and James 2008, p. 1310). In contrast, the Bayesian *t*, the BAL, and the IAL impose increasingly smaller penalization on the coefficient estimates. The IAL appears to provide unbiased coefficient estimates, which suggests that the IAL may possess the oracle property; this warrants further theoretical study.
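The trade-off between over-shrinkage and over-selection can be seen in a minimal closed-form example (our construction: under an orthonormal design, the Lasso reduces to coordinate-wise soft-thresholding, and the adaptive penalty simply rescales each threshold).

```python
import numpy as np

def soft_threshold(z, t):
    # Per-coordinate Lasso solution when X^T X / n = I.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Under an orthonormal, noiseless toy design the OLS estimate z equals
# the true coefficients: one large, one null, and one small effect.
z = np.array([2.0, 0.0, 0.5])
lam = 0.2

beta_lasso = soft_threshold(z, lam)     # uniform penalty: every |beta| shrunk by lam
w = 1.0 / np.maximum(np.abs(z), 1e-8)   # adaptive weights from the initial fit
beta_adapt = soft_threshold(z, lam * w) # coordinate-specific penalties
```

The uniform penalty biases the large coefficient by the full λ (2.0 → 1.8), so achieving sparsity forces over-shrinkage of real signals; the adaptive penalty shrinks the large coefficient far less (2.0 → 1.9) while penalizing near-zero coordinates heavily.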

Due to their computational cost and the need to further filter out false discoveries, the Bayesian shrinkage methods are less attractive for large-scale computation such as eQTL studies. However, there is room to improve these Bayesian shrinkage methods, for example via the equi-energy sampler approach (Kou *et al.* 2006). Alternative prior distributions for the hyperparameters may also lead to better variable selection performance of the BAL. These strategies are among our future studies. Specifically, for the gamma prior in the BAL, we have tried setting the hyperparameters δ and τ as fixed numbers. In general, the results are not very sensitive to the choices of δ and τ, and no combination of δ and τ leads to significantly better results than assigning the joint prior for δ and τ; *i.e.*, *p*(δ, τ) ∝ τ^{−1}.

In the current implementation, we handle missing genotype data by imputing them first (using the Viterbi algorithm implemented in R/qtl) and then taking the imputed values as known. A more sophisticated approach for the BAL is to treat the genotype data as unknown and sample them within the MCMC; for the IAL, we can summarize its results across multiple imputations (Sen and Churchill 2001). However, these approaches are computationally more intensive and are mainly designed for relatively sparse marker maps. The current high-density SNP arrays often have high-confidence genotype call rates >98% (Rabbee and Speed 2006). Imputation methods are also an active research topic. Haplotype information from related or unrelated individuals can be used to obtain accurate genotype imputation (Marchini *et al.* 2007). Therefore simply imputing the genotype data and then taking them as known may be sufficient for many studies using high-density SNP arrays, although careful examination of missing data patterns is always important.

We have mainly discussed our method in a linear regression framework. Extension to the generalized linear model (*e.g.*, logistic regression for binary responses) is possible. The generalized linear model can be solved by iteratively reweighted least squares. Similar to the approach used in Friedman *et al.* (2009), our method can be plugged in to solve the least-squares problem within the loop of iteratively reweighted least squares. However, this approach is computationally intensive. Yi and Banerjee (2009) proposed an intriguing and efficient EM algorithm for multiple-loci mapping by generalized linear regression. In their EM algorithm, many correlated covariates can be simultaneously updated, which has the advantage of accommodating the correlation among the covariates. However, our method uses a different model setup and cannot adopt the same approach as Yi and Banerjee (2009). Computationally efficient extension of our method to a generalized linear model warrants further study.
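The iteratively reweighted least squares scheme referred to above can be sketched as follows for logistic regression. This is a generic IRLS sketch with a plain (lightly ridge-stabilized) inner solve, not our method itself; the demo data and the 1e-8 stabilizer are our illustrative choices, and the commented line marks where a penalized solver would be plugged in.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irls_logistic(X, y, n_iter=25):
    """Fit a logistic regression by iteratively reweighted least squares.

    The weighted least-squares solve below is the step where a penalized
    (e.g., adaptively weighted) solver could be substituted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = sigmoid(eta)
        w = np.maximum(mu * (1.0 - mu), 1e-10)  # IRLS weights
        z = eta + (y - mu) / w                  # working response
        WX = X * w[:, None]
        # Plain weighted least squares; a small ridge term keeps it stable.
        beta = np.linalg.solve(X.T @ WX + 1e-8 * np.eye(p), X.T @ (w * z))
    return beta

# Illustrative fit: intercept plus one centered covariate, non-separable labels.
X_demo = np.column_stack([np.ones(6), np.arange(6.0) - 2.5])
y_demo = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_hat = irls_logistic(X_demo, y_demo)
```

Because the labels are not linearly separable, the maximum-likelihood estimate is finite and the iterations converge to a positive slope; with a penalized inner solve, each pass would instead shrink or zero out weak covariates.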

We tested the robustness of our methods by additional sets of simulations where the traits are log or exponentially transformed (results not shown). The conclusion is that our methods are robust to mild violations of the linearity assumption. However, we still expect that the additive linear model assumption is severely violated in some situations, *e.g.*, when epistatic interactions are present. As mentioned in the Introduction, we can include pairwise interactions in our model, similar to the approaches used by Zhang and Xu (2005) and Yi *et al.* (2007). In practice, prioritizing interactions by the significance of main effects or by biological knowledge may help to reduce the multiple-testing burden and to improve power. How to penalize the interaction terms also warrants further study. Grouping the interaction terms and the corresponding main effects together and applying group penalties (Yuan and Lin 2006) may be a better approach than penalizing the main effects and interactions separately.

In summary, we have developed iterative adaptive penalized regression methods for genomewide multiple-loci mapping problems. Both theoretical justifications and empirical evidence suggest that our methods have superior performance to the existing methods. Although our work is motivated by genetic data, our methods are general enough to be applied to other HDLSS problems.

## Acknowledgments

The authors thank two anonymous reviewers and the associate editor for their constructive comments and suggestions. The authors also thank Jun Liu for his valuable suggestions for the implementation of the Bayesian adaptive Lasso. This research was supported by National Institute of Mental Health (NIMH) grants 1 P50 MH090338-01 and 1 RC2 MH089951-01. J. G. Ibrahim's research was partially supported by National Institutes of Health (NIH) grants GM70335 and CA74015. F. Zou's research was partially supported by NIH grant GM074175-03.

## Footnotes

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.110.114280/DC1.

Communicating editor: H. Zhao

- Received January 17, 2010.
- Accepted February 3, 2010.

- Copyright © 2010 by the Genetics Society of America