Stochastic Search Variable Selection for Identifying Multiple Quantitative Trait Loci
Nengjun Yi, Varghese George, David B. Allison

Abstract

In this article, we use stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure embeds multiple regression in a hierarchical normal mixture model, where latent indicators for all markers are used to identify the markers with significant effects. These markers can be identified as those with a higher posterior probability of being included in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns, including the latent indicators, genetic effects for all markers, and other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under typical situations of most QTL studies in terms of number of markers and marker density.

Most complex traits important to evolution, animal and plant breeding, and medical genetics are influenced by the segregation of multiple genes [quantitative trait loci (QTL)] and environmental factors. There is strong interest in inferring the number, genomic locations, and genetic effects of QTL. Currently, the most widely used methods are those based on interval mapping (Lander and Botstein 1989; Haley and Knott 1992). These methods are developed on the basis of a single-QTL model and detect QTL effects at different genomic locations separately. Although these methods have been successfully applied to detect QTL for a number of traits in a number of organisms, they may result in biased estimates for QTL locations and effects when the traits are actually controlled by multiple, especially linked, QTL (e.g., Haley and Knott 1992).

For complex traits governed by multiple QTL, it is necessary to take the whole genome into account for estimating the number, locations, and genetic effects of QTL. It has recently been shown both theoretically and empirically that multiple-QTL methods can improve power in detecting QTL and eliminate biases in estimates of QTL locations and genetic effects that can be introduced by using a single-QTL model (e.g., Haley and Knott 1992). Composite interval mapping provides a relatively simple and systematic procedure to map multiple QTL (Jansen and Stam 1994; Zeng 1994). This method detects and estimates each individual QTL by conditioning the test on other selected markers to absorb effects of other QTL. In the last decade, several statistical methods have been developed to detect multiple QTL and estimate their locations and effects simultaneously, including the multiple-interval mapping approach (Kao et al. 1999; Zeng et al. 2000), variable selection methods (Ball 2001; Piepho and Gauch 2001; Broman and Speed 2002), and Bayesian methodology with the reversible-jump Markov chain Monte Carlo algorithm (Satagopan and Yandell 1996; Satagopan et al. 1996; Heath 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998; Xu and Yi 2000; Yi and Xu 2000, 2001; Hoeschele 2001). These methods treat mapping multiple QTL as a problem of model determination and variable selection (Sillanpää and Corander 2002).

In this study, we propose an alternative Bayesian method for identifying multiple quantitative trait loci in experimental designs. Our method is based on a variable selection method, called stochastic search variable selection (SSVS), developed by George and McCulloch (1993). SSVS was originally introduced for linear regression models and has been adopted for more complex models such as generalized linear models (George and McCulloch 1997), log-linear models (Ntzoufras et al. 1997), and multivariate regression models (Brown et al. 1998). The difference between SSVS and other variable selection approaches is that the dimensionality is kept constant across all possible models by limiting the posterior distribution of nonsignificant terms to a small neighborhood of zero instead of removing them from the model as is usually done. Due to this property, SSVS can (1) be easily implemented via the Gibbs sampler, (2) evaluate each variable effect on the dependent response, and (3) provide the posterior probability that each variable should be included in the model.

In most QTL studies, a large number of markers are available across the genome, and these markers are usually closely related. It has been hypothesized that the genetic variation of most quantitative traits is actually controlled by a few loci with large effects and a large number of loci with small effects (e.g., Lynch and Walsh 1998). Therefore, only a small number of markers are expected to have large effects on the trait because of being linked to large-effect loci, and most of the markers have nonsignificant effects. Our method considers all markers simultaneously and is able to evaluate not only marker effects of the entire genome, but also the posterior probability of each marker having significant effects.

METHODS

Linear model: We describe the method primarily for a mapping population with only two segregating genotypes, e.g., a backcross, double-haploid lines (DHLs), or recombinant inbred lines (RILs). Assume that we observe K markers along the genome. Among the K markers, some may be tightly linked to genes with large effects and therefore have large effects, and others may have only weak effects. Our aim here is to identify which markers are tightly linked to genes with large effects and to estimate the magnitude of their effects. For a continuously distributed trait, the observed phenotypic value of individual i, $y_i$, can be described by the linear model

$$y_i = \mu + \sum_{j=1}^{K} x_{ij}\alpha_j + e_i, \tag{1}$$

where $\mu$ is the population mean, $x_{ij}$ denotes the genotype of marker j for individual i and is coded as 0.5 or -0.5 for the two genotypes in the mapping population, $\alpha_j$ is the effect size associated with marker j, and $e_i$ is the residual error, assumed to follow $N(0, \sigma_e^2)$.
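To make Equation 1 concrete, the following is a minimal sketch that simulates genotypes and phenotypes for a backcross on a single chromosome. The function name `simulate_backcross`, the single-chromosome layout, and the fixed recombination fraction `r` between adjacent markers are illustrative assumptions, not part of the original method description.

```python
import random

def simulate_backcross(n, K, effects, mu=1.0, sigma_e=1.0, r=0.05, seed=0):
    """Simulate genotypes (coded +0.5 / -0.5) and phenotypes under Equation 1.

    effects: dict mapping marker index j -> effect alpha_j (all others are 0).
    r: assumed recombination fraction between adjacent markers.
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        row = [rng.choice((0.5, -0.5))]
        for _ in range(1, K):
            prev = row[-1]
            # The next marker flips genotype only if a recombination occurs.
            row.append(-prev if rng.random() < r else prev)
        genetic = sum(row[j] * a for j, a in effects.items())
        # y_i = mu + sum_j x_ij * alpha_j + e_i, with e_i ~ N(0, sigma_e^2)
        y.append(mu + genetic + rng.gauss(0.0, sigma_e))
        x.append(row)
    return x, y

x, y = simulate_backcross(n=300, K=21, effects={4: 1.0, 15: -1.0})
```

Coding the two genotypes as 0.5 and -0.5 means that each effect in `effects` is the difference in expected phenotype between the two genotype classes at that marker.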

In practice, some marker data may be missing. Two methods can be used to deal with missing marker data. The first method is to replace the missing genotype $x_{ij}$ by its conditional expectation $E(x_{ij} \mid M_i) = 0.5\,p(x_{ij} = 0.5 \mid M_i) - 0.5\,p(x_{ij} = -0.5 \mid M_i)$, where $M_i$ is the observed marker data for individual i, and $p(x_{ij} = 0.5 \mid M_i)$ and $p(x_{ij} = -0.5 \mid M_i)$ are the conditional probabilities that marker j for individual i takes each of the two genotypes; these can be calculated using the multipoint method (Jiang and Zeng 1997). The second method is to impute the missing marker genotypes by sampling from the corresponding fully conditional probability distribution. If $x_{ij}$ is missing, the fully conditional distribution can be derived as

$$p(x_{ij} = 0.5 \mid y_i, x_{i(-j)}, \mu, \alpha, \sigma_e^2) = \frac{p(y_i \mid x_{i(-j)}, x_{ij} = 0.5, \mu, \alpha, \sigma_e^2)\, p(x_{ij} = 0.5 \mid x_{i(j-1)}, x_{i(j+1)})}{\sum_{x_{ij}} p(y_i \mid x_{i(-j)}, x_{ij}, \mu, \alpha, \sigma_e^2)\, p(x_{ij} \mid x_{i(j-1)}, x_{i(j+1)})}$$

and

$$p(x_{ij} = -0.5 \mid y_i, x_{i(-j)}, \mu, \alpha, \sigma_e^2) = 1 - p(x_{ij} = 0.5 \mid y_i, x_{i(-j)}, \mu, \alpha, \sigma_e^2), \tag{2}$$

where $\alpha = (\alpha_1, \ldots, \alpha_K)$, $x_{i(-j)} = (x_{i1}, \ldots, x_{i(j-1)}, x_{i(j+1)}, \ldots, x_{iK})$, the conditional probability $p(x_{ij} = 0.5 \mid x_{i(j-1)}, x_{i(j+1)})$ depends on the recombination rates between marker j and its flanking markers (j - 1) and (j + 1), and $p(y_i \mid x_{i(-j)}, x_{ij} = 0.5, \mu, \alpha, \sigma_e^2)$ is a normal density function with mean $\mu + \sum_{k=1}^{K} x_{ik}\alpha_k$ (evaluated at $x_{ij} = 0.5$) and variance $\sigma_e^2$. The first method ignores the probability distributions of the missing genotypes and provides only approximate estimates of the missing genotypes. In contrast, the second method takes these probability distributions into account. In this study, we use the second method to describe our Bayesian approach.
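The second imputation method can be sketched as follows for a backcross. This is a minimal illustration that assumes no crossover interference, so that a marker differs from a flanking marker with probability equal to the recombination fraction; the function names and the argument `rest_mean` (the mean contribution of everything except marker j) are hypothetical.

```python
import math
import random

def flank_prob(g, left, right, r_left, r_right):
    """Unnormalized p(x_ij = g | flanking genotypes) in a backcross.

    A flank passed as None (missing, or a chromosome end) carries no information.
    """
    p = 1.0
    if left is not None:
        p *= r_left if g != left else 1.0 - r_left
    if right is not None:
        p *= r_right if g != right else 1.0 - r_right
    return p

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def prob_genotype_half(y_i, rest_mean, alpha_j, sigma2, left, right, r_left, r_right):
    """p(x_ij = 0.5 | y_i, other genotypes, mu, alpha, sigma_e^2) as in Equation 2.

    rest_mean = mu + sum over k != j of x_ik * alpha_k.
    """
    w = {}
    for g in (0.5, -0.5):
        likelihood = normal_pdf(y_i, rest_mean + g * alpha_j, sigma2)
        w[g] = likelihood * flank_prob(g, left, right, r_left, r_right)
    return w[0.5] / (w[0.5] + w[-0.5])

def impute(p_half, rng=random):
    """Draw the missing genotype from its full conditional distribution."""
    return 0.5 if rng.random() < p_half else -0.5
```

When the marker effect is zero the phenotype carries no information and the result reduces to the flanking-marker probability alone; a large effect lets the phenotype sharpen the imputation.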

Stochastic search marker selection: In variable selection problems, statistical models can be naturally represented by a set of binary indicator variables γ = (γ1,..., γK), where γj = 1 or 0 represents the presence or absence of covariate j in the model, respectively. The difference between SSVS and other variable selection approaches is that the dimension of the parameter space remains unchanged, so that the Gibbs sampler can be easily used to explore both the model space and the parameter space (George and McCulloch 1993).

The SSVS constructs the prior distribution for $(\gamma, \alpha)$ in two stages. The prior distribution of the model indicator variables, $p(\gamma)$, is chosen to reflect prior belief in whether particular markers are linked to QTL. A simple choice might have the $\gamma_j$'s independent, so that

$$p(\gamma) = \prod_{j=1}^{K} p(\gamma_j). \tag{3}$$

When no information is available, a uniform prior is chosen for each $\gamma_j$, i.e., $p(\gamma_j = 0) = p(\gamma_j = 1) = 0.5$.

The marker effects $\alpha_j$ (j = 1,..., K) are given normal prior distributions conditional on the corresponding indicators $\gamma_j$:

$$\alpha_j \mid \gamma_j \sim (1 - \gamma_j)\, N(0, \tau_j^2) + \gamma_j\, N(0, c_j^2\tau_j^2), \quad j = 1, \ldots, K. \tag{4}$$

The prior parameters $\tau_j^2$ and $c_j^2$ are chosen so that $\tau_j^2$ is "small" and $c_j^2\tau_j^2$ is "large." Hence, if $\gamma_j = 0$, the prior distribution for $\alpha_j$ forces this parameter to be close to zero. If $\gamma_j = 1$, a nonzero estimate of $\alpha_j$ should be included in the model and its posterior distribution will largely be determined by the data. On the basis of the above prior specification, a multivariate normal distribution can be used as the joint prior distribution for $\alpha$ conditional on $\gamma$, given by

$$\alpha \mid \gamma \sim N_K(0, D_\gamma R D_\gamma), \tag{5}$$

where R is the prior correlation matrix, usually taken to be $R = I$ or $R \propto (x^{T}x)^{-1}$ with x the n × K matrix of marker genotypes, and $D_\gamma = \mathrm{diag}[a_1\tau_1, \ldots, a_K\tau_K]$ with $a_i = 1$ if $\gamma_i = 0$ and $a_i = c_i$ if $\gamma_i = 1$. The prior distribution for $\mu$ is assumed to be normal, $N(\eta, \tau^2)$, with prespecified prior mean $\eta$ and prior variance $\tau^2$. The prior for $\sigma_e^2$ is chosen to be a scaled inverse-$\chi^2$ distribution, $\text{Inv-}\chi^2(\nu_0, \sigma_0^2)$, with known hyperparameters $\nu_0$ and $\sigma_0^2$.
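One way to see what "small" and "large" mean in practice is to compute the effect size at which the two normal components of the mixture prior have equal density: effects smaller than this point are better supported by the narrow component, larger ones by the wide component. The sketch below is illustrative only; the function names are hypothetical, and the three variance pairs are the settings used later in the simulation study.

```python
import math

def normal_pdf(a, var):
    """Density of N(0, var) evaluated at a."""
    return math.exp(-a * a / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def intersection_point(tau2, slab_var):
    """|alpha| at which N(0, tau2) and N(0, slab_var) have equal density.

    With c2 = slab_var / tau2, setting the two densities equal gives
    alpha^2 = tau2 * log(c2) * c2 / (c2 - 1).
    """
    c2 = slab_var / tau2
    return math.sqrt(tau2 * math.log(c2) * c2 / (c2 - 1.0))

for tau2, slab_var in [(0.001, 10.0), (0.01, 10.0), (0.01, 100.0)]:
    d = intersection_point(tau2, slab_var)
    print(f"tau2={tau2}, c2*tau2={slab_var}: densities cross at |alpha| = {d:.3f}")
```

For these three settings the crossing points are roughly 0.10, 0.26, and 0.30, so the first setting requires the smallest effect before the wide component is favored.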

TABLE 1

Locations and effects of simulated QTL

On the basis of the prior specifications described above, we can use the Gibbs sampler to generate samples from the posterior distribution $p(\mu, \alpha, \sigma_e^2, \gamma \mid y, M)$. Starting with an initial value $(\mu^{(0)}, \alpha^{(0)}, \sigma_e^{2(0)}, \gamma^{(0)})$, the Gibbs sampler proceeds as follows:

  1. Sample the missing marker genotypes from the full conditional posterior distributions described in Equation 2.

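  2. Sample $\mu$ from the full conditional posterior distribution: $$p(\mu \mid y, x, \alpha, \sigma_e^2) = N\!\left(\frac{\dfrac{\eta}{\tau^2} + \dfrac{\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{K} x_{ij}\alpha_j\right)}{\sigma_e^2}}{\dfrac{1}{\tau^2} + \dfrac{n}{\sigma_e^2}},\; \frac{1}{\dfrac{1}{\tau^2} + \dfrac{n}{\sigma_e^2}}\right).$$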

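  3. The full conditional posterior distribution of $\alpha$ is multivariate normal, $$N_K\!\left(\left(x^{T}x + \sigma_e^2 (D_\gamma R D_\gamma)^{-1}\right)^{-1} x^{T}(y - \mu),\; \sigma_e^2\left(x^{T}x + \sigma_e^2 (D_\gamma R D_\gamma)^{-1}\right)^{-1}\right),$$ where x is the n × K matrix of marker genotypes. Sampling from this distribution requires recomputing $\left(x^{T}x + \sigma_e^2 (D_\gamma R D_\gamma)^{-1}\right)^{-1}$ on the basis of new values of $\sigma_e^2$ and $\gamma$ and thus may be costly. To avoid computing this inverse, we sample $\alpha_j$ (j = 1,..., K) from the full conditional posterior distribution $p(\alpha_j \mid y, x, \alpha_{(-j)}, \sigma_e^2, \gamma)$, which is a normal distribution (Wang et al. 1994), where $\alpha_{(-j)}$ denotes all terms of $\alpha$ except $\alpha_j$.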

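  4. Sample $\sigma_e^2$ from the full conditional posterior distribution: $$p(\sigma_e^2 \mid y, x, \mu, \alpha) = \text{Inv-}\chi^2\!\left(\nu_0 + n,\; \frac{\nu_0\sigma_0^2 + \sum_{i=1}^{n}\left(y_i - \mu - \sum_{j=1}^{K} x_{ij}\alpha_j\right)^2}{\nu_0 + n}\right).$$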

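  5. Sample $\gamma_j$ from $p(\gamma_j \mid y, x, \gamma_{(-j)}, \mu, \alpha, \sigma_e^2) = p(\gamma_j \mid \gamma_{(-j)}, \mu, \alpha, \sigma_e^2)$, which is Bernoulli with probability $$p(\gamma_j = 1 \mid \gamma_{(-j)}, \mu, \alpha, \sigma_e^2) = \frac{p(\alpha \mid \gamma_{(-j)}, \gamma_j = 1)\, p(\gamma_j = 1)}{p(\alpha \mid \gamma_{(-j)}, \gamma_j = 1)\, p(\gamma_j = 1) + p(\alpha \mid \gamma_{(-j)}, \gamma_j = 0)\, p(\gamma_j = 0)},$$ where $\gamma_{(-j)}$ denotes all terms of $\gamma$ except $\gamma_j$.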

The above steps are repeated until a chosen criterion for convergence is reached. The posterior sample $\{(\mu^{(t)}, \alpha^{(t)}, \sigma_e^{2(t)}, \gamma^{(t)}): t = 1, 2, \ldots\}$ converges in distribution to the joint posterior distribution, $p(\mu, \alpha, \sigma_e^2, \gamma \mid y, M)$. The embedded subsequence $\{\gamma_j^{(t)}: t = 1, 2, \ldots\}$ thus converges to $p(\gamma_j \mid y, M)$, j = 1,..., K. Generally, the markers with large effects will appear most frequently and quickly, making them easier to identify. Therefore, markers with a high posterior probability of being included in the model will most probably be linked to large-effect QTL.
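The sampling steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' C++ program: it assumes R = I, complete marker data (so step 1 is skipped), a common τ² and c²τ² for all markers, and deterministic starting values instead of draws from the priors. The names `gibbs_ssvs` and `inclusion_prob` are hypothetical.

```python
import math
import random

def inclusion_prob(a, tau2, slab_var, prior1=0.5):
    """p(gamma_j = 1 | alpha_j = a) for R = I, computed via the log odds."""
    log_odds = (math.log(prior1 / (1.0 - prior1))
                + 0.5 * math.log(tau2 / slab_var)
                + 0.5 * a * a * (1.0 / tau2 - 1.0 / slab_var))
    if log_odds < -700.0:  # avoid overflow in exp for extreme settings
        return 0.0
    return 1.0 / (1.0 + math.exp(-log_odds))

def gibbs_ssvs(x, y, tau2=0.01, slab_var=10.0, eta=0.0, mu_var=2.0,
               nu0=0.0, s02=0.0, n_iter=500, seed=1):
    """Gibbs sampler for the SSVS model; x is a list of n rows of K genotypes."""
    rng = random.Random(seed)
    n, K = len(y), len(x[0])
    mu, sigma2 = sum(y) / n, 1.0          # assumed starting values
    alpha, gamma = [0.0] * K, [0] * K
    xx = [sum(row[j] ** 2 for row in x) for j in range(K)]
    e = [yi - mu for yi in y]             # residuals y_i - mu - sum_j x_ij alpha_j
    draws = {"mu": [], "alpha": [], "sigma2": [], "gamma": []}
    for _ in range(n_iter):
        # Step 2: mu given everything else
        prec = 1.0 / mu_var + n / sigma2
        mean = (eta / mu_var + sum(ei + mu for ei in e) / sigma2) / prec
        new_mu = rng.gauss(mean, math.sqrt(1.0 / prec))
        e = [ei + mu - new_mu for ei in e]
        mu = new_mu
        # Step 3: each alpha_j from its univariate normal full conditional
        for j in range(K):
            prior_var = slab_var if gamma[j] else tau2
            prec = xx[j] / sigma2 + 1.0 / prior_var
            s = sum(x[i][j] * (e[i] + x[i][j] * alpha[j]) for i in range(n))
            new_a = rng.gauss(s / sigma2 / prec, math.sqrt(1.0 / prec))
            for i in range(n):
                e[i] += x[i][j] * (alpha[j] - new_a)
            alpha[j] = new_a
        # Step 4: sigma_e^2 from a scaled inverse chi-square distribution
        scale_sum = nu0 * s02 + sum(ei * ei for ei in e)
        sigma2 = scale_sum / rng.gammavariate((nu0 + n) / 2.0, 2.0)
        # Step 5: indicators; with R = I only alpha_j informs gamma_j
        gamma = [1 if rng.random() < inclusion_prob(a, tau2, slab_var) else 0
                 for a in alpha]
        draws["mu"].append(mu)
        draws["alpha"].append(list(alpha))
        draws["sigma2"].append(sigma2)
        draws["gamma"].append(list(gamma))
    return draws
```

Keeping the residual vector up to date after every single-effect update is what avoids forming or inverting the K × K matrix in step 3; each sweep costs on the order of nK operations.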

SIMULATION STUDIES AND REAL DATA ANALYSIS

Simulation studies: The applicability of the proposed method was demonstrated by analyzing simulated data. The experimental sample was from a backcross and contained 300 segregating individuals. Four chromosomes of length 100 cM each were simulated. Twenty-one codominant markers were evenly placed on each chromosome with marker intervals of 5 cM each. We simulated 8 large-effect QTL and 16 small-effect QTL controlling the expression of a quantitative trait. The locations of the simulated QTL and their genetic effects are shown in Table 1. The overall mean and the residual variance were set to $\mu = 1$ and $\sigma_e^2 = 1$, respectively. The genetic variance of QTL j is calculated as $v_j = a_j^2/4$, where $a_j$ is the true genetic effect. Ignoring the covariance due to linkage, the total genetic variances for the 8 large-effect QTL and the 16 small-effect QTL are 2 and 0.04, respectively. Therefore, the proportions of phenotypic variance explained by each large-effect QTL and each small-effect QTL are ∼8% and ∼0.08%, respectively. We randomly set 10% of the marker genotypes to missing. The design was replicated five times and analyzed using the proposed method. The results reported are averages over the five replicates.
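The variance figures quoted above can be checked with a few lines of arithmetic. In this sketch the per-QTL effect magnitudes are back-calculated from the stated totals using the variance formula for a backcross; they are an inference for illustration, not values read from Table 1.

```python
n_large, n_small = 8, 16
v_large_total, v_small_total = 2.0, 0.04   # total genetic variances stated in the text
sigma_e2 = 1.0                             # residual variance

# Phenotypic variance, ignoring covariance due to linkage.
v_pheno = v_large_total + v_small_total + sigma_e2

# Share of phenotypic variance explained by a single QTL of each class.
share_large = (v_large_total / n_large) / v_pheno
share_small = (v_small_total / n_small) / v_pheno

# Effect magnitudes implied by v_j = a_j^2 / 4 (inferred, see lead-in).
a_large = (4.0 * v_large_total / n_large) ** 0.5
a_small = (4.0 * v_small_total / n_small) ** 0.5

print(f"phenotypic variance: {v_pheno:.2f}")
print(f"per large-effect QTL: {100 * share_large:.1f}%")
print(f"per small-effect QTL: {100 * share_small:.3f}%")
```

The shares come out at about 8.2% and 0.082%, matching the rounded values in the text, and the implied effect magnitudes are 1 and 0.1.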

For each analysis, the initial values $(\mu^{(0)}, \alpha^{(0)}, \sigma_e^{2(0)}, \gamma^{(0)})$ were randomly generated from their priors. We used the uniform distribution as the prior for $\gamma$, as described earlier. Following the principles developed in George and McCulloch (1993, 1997) for choosing $\tau_j$ and $c_j$, three different pairs of prior variances, i.e., $(\tau_j^2, c_j^2\tau_j^2) = (0.001, 10)$, $(0.01, 10)$, and $(0.01, 100)$, were used for the conditional prior distribution of the genetic effect $\alpha_j$. The prior correlation matrix was taken to be the identity matrix, i.e., $R = I$. The prior distribution for $\mu$ was $N(0, 2)$. The hyperparameter $\nu_0$ for $\sigma_e^2$ was set to zero, which yields the noninformative prior distribution $p(\sigma_e^2) \propto 1/\sigma_e^2$ (Gelman et al. 1995).

Figure 1. Simulation study. Posterior probabilities of marker indicators (left) and marker effects (right) are plotted against marker locations along the genome. Red, green, and blue denote the three prior settings $(\tau_j^2, c_j^2\tau_j^2) = (0.001, 10)$, $(0.01, 10)$, and $(0.01, 100)$.

The Gibbs sampler was run for 50,000 cycles after discarding the first 2000 cycles as the burn-in period. It took ∼1 hr to generate each sample with a C++ program on a Pentium 4 PC. The chain was thinned (one iteration saved in every 5 cycles) to reduce serial correlation in the stored samples, so that the total number of samples kept for the post-Bayesian analysis was 10,000 (Gelman et al. 1995). The stored sample was used to infer the parameters of interest.

The estimated posterior probabilities for the marker indicators $\gamma_j$ (j = 1,..., K) are given in Figure 1 (left). The posterior probability $p(\gamma_j = 1 \mid y, M)$ was obtained by counting the number of samples in which the marker indicator $\gamma_j$ is 1 and dividing by the total number of samples. As shown in Figure 1, for almost all markers, the posterior probabilities for the first prior setting were larger than those for the other two settings, and the posterior probabilities for the second prior setting were larger than those for the third. For the three sets of prior variances, however, the profiles of the posterior probability distributions were similar. These profiles are very peaked, suggesting that the markers corresponding to the peaks have much larger effects than the rest. From these profiles, we also found that in most situations the two markers flanking a large-effect QTL have very different posterior probabilities, one being large and the other being close to zero. Therefore, our method may be powerful in distinguishing closely linked markers. There are a total of eight main peaks along the four simulated chromosomes on the profiles, two on each chromosome. The markers corresponding to the peaks are those closest to the eight simulated large-effect QTL. Therefore, our Bayesian method was shown to be powerful for identifying multiple QTL. None of the simulated small-effect QTL were identified in our analyses; these QTL had effects close to zero and thus were picked up only occasionally.
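The counting estimate of the inclusion probabilities, together with the burn-in and thinning scheme described above, can be sketched as follows. The function name `inclusion_frequencies` and the list-of-lists layout for the stored indicator draws are assumptions made for illustration.

```python
def inclusion_frequencies(gamma_draws, burn_in=2000, thin=5):
    """Estimate p(gamma_j = 1 | data) for each marker j.

    gamma_draws: list of per-cycle indicator vectors (lists of 0/1).
    The first `burn_in` cycles are discarded and every `thin`-th cycle is kept.
    """
    kept = gamma_draws[burn_in::thin]
    K = len(kept[0])
    return [sum(g[j] for g in kept) / len(kept) for j in range(K)]

# Toy check with 52,000 cycles: 2000 burn-in cycles plus 50,000 cycles
# thinned by 5 leaves 10,000 stored samples.
toy_draws = [[t % 2, 0] for t in range(52000)]
n_kept = len(toy_draws[2000::5])
probs = inclusion_frequencies(toy_draws)
```

In the toy input the first indicator alternates between 0 and 1 and the second is always 0, so the estimates are 0.5 and 0.0 from 10,000 kept samples.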

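Figure 2. Simulation study. Posterior distributions of overall mean and residual variance are shown: (a) overall mean for $(\tau_j^2, c_j^2\tau_j^2) = (0.001, 10)$; (b) residual variance for $(\tau_j^2, c_j^2\tau_j^2) = (0.001, 10)$; (c) overall mean for $(\tau_j^2, c_j^2\tau_j^2) = (0.01, 10)$; (d) residual variance for $(\tau_j^2, c_j^2\tau_j^2) = (0.01, 10)$; (e) overall mean for $(\tau_j^2, c_j^2\tau_j^2) = (0.01, 100)$; (f) residual variance for $(\tau_j^2, c_j^2\tau_j^2) = (0.01, 100)$.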

The profiles of marker effects are displayed in Figure 1 (right). Although the empirical posterior distribution of each marker effect can be depicted, for simplicity we report only the posterior mean over the samples. The marker effects were estimated to be essentially identical for the three sets of prior variances $(\tau_j^2, c_j^2\tau_j^2)$. Therefore, these prior variances had a negligible influence on the posterior inference about the marker effects. As in the case of the posterior probability distribution of marker indicators, the profiles of marker effects also have eight obvious peaks, each corresponding to the marker closest to a large-effect QTL. For markers far from the large-effect QTL, the effects were estimated to be close to zero. From Figure 1, it can be seen that the accuracy of the estimate of a marker effect depends on the estimated posterior probability of the marker indicator. When the posterior probability was estimated to be close to one, the estimate of the marker effect was close to the true value. Otherwise, the marker effect was slightly underestimated. These results are expected because marker effects whose indicators are zero are forced to be close to zero by the priors. However, we observed that the conditional estimates of marker effects were close to the true value if we used only the posterior samples with the corresponding indicators equal to one.
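The difference between the overall posterior mean of a marker effect and the conditional estimate that uses only samples with the indicator equal to one can be sketched as follows. The function name and the sample layout are illustrative assumptions.

```python
def effect_summaries(alpha_draws, gamma_draws, j):
    """Return (marginal mean, mean conditional on gamma_j = 1) for marker j.

    alpha_draws, gamma_draws: lists of per-sample vectors of equal length.
    """
    a = [d[j] for d in alpha_draws]
    g = [d[j] for d in gamma_draws]
    marginal = sum(a) / len(a)
    included = [ai for ai, gi in zip(a, g) if gi == 1]
    # The conditional mean is undefined if the marker was never included.
    conditional = sum(included) / len(included) if included else float("nan")
    return marginal, conditional

# Toy example: the marker is included in half of the samples.
alpha_toy = [[1.0], [0.0], [1.0], [0.0]]
gamma_toy = [[1], [0], [1], [0]]
marginal, conditional = effect_summaries(alpha_toy, gamma_toy, 0)
```

In the toy example the marginal mean is 0.5 while the conditional mean is 1.0, which mirrors the pattern described above: averaging over samples in which the effect is shrunk toward zero pulls the overall estimate down.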

Figure 3. Single-marker regression analysis for chromosome 1. (a) Values of the t-test statistic; (b) marker effects.

The empirical posterior distributions for the overall mean and the residual variance are depicted in Figure 2 (a-f). The estimated means for these two parameters were very close to the simulated values and the standard deviations were small, showing that the overall mean and the residual variance were estimated with precision.

For comparison, we also performed single-marker analyses, using simple regression on each marker, and the usual multiple regression analysis with all markers as predictors. For the single-marker analyses, the profiles of the t-test statistics and the marker effects on chromosome 1 are shown in Figure 3. These two profiles have only one peak covering a wide range, which shows that the single-marker analyses fail to separate the two linked QTL. It is also obvious that the marker effects were seriously overestimated. Since the marker density is quite high, the results of single-marker analyses should be close to those of interval mapping. Therefore, the proposed method was shown to be more powerful than the widely used interval-mapping method for detecting multiple QTL. Figure 4 shows the marker effects from the multiple regression analysis plotted against the genome location (centimorgans) of the markers. The effects of most markers far from the large-effect QTL are not shrunk in the usual multiple regression analysis, indicating that multiple regression failed to detect clear signals of QTL. A common feature of the proposed Bayesian method and the usual multiple regression method is that all markers are included in the model. The clear advantage of the proposed method is that it uses two different prior distributions for the markers, which force the posterior means of insignificant markers to be close to zero and allow the posterior distributions of significant markers to be determined by the data.
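The single-marker comparison analysis amounts to an ordinary simple regression of the phenotype on one marker at a time. A minimal sketch of the slope and its t statistic is given below; the function name `single_marker_t` is hypothetical and the small data set is made up for illustration.

```python
import math

def single_marker_t(xj, y):
    """Simple regression of y on one marker; returns (slope, t statistic)."""
    n = len(y)
    mx, my = sum(xj) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in xj)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xj, y))
    slope = sxy / sxx
    # Residual sum of squares around the fitted line, with n - 2 degrees of freedom.
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(xj, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

# Toy data: four individuals, genotypes coded +0.5 / -0.5.
slope, t_stat = single_marker_t([0.5, -0.5, 0.5, -0.5], [2.1, 0.9, 1.9, 1.1])
```

Because each marker is tested in isolation, a marker linked to two nearby QTL absorbs the effects of both, which is consistent with the single broad peak and inflated effects reported for chromosome 1.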

Real data analysis: Data from the North American Barley Genome Mapping Project (Tinker et al. 1996) were analyzed using the proposed Bayesian method. Seven traits were investigated in the project: heading, yield, maturity, height, lodging, kernel weight, and test weight. We present only the results for "heading" here. The DH (double-haploid) population contained 145 lines (n = 145), each grown in a range of environments. A total of 127 mapped markers (K = 127) covering a 1500-cM genome along seven linkage groups were used in the analysis. The average phenotypic values across the environments were calculated for each line, and these averages were treated as the original phenotypic values ($y_i$). These phenotypic values were then standardized, and the standardized records were used in the analysis.

The prior distributions for $(\mu, \alpha, \sigma_e^2, \gamma)$, the prior variances $(\tau_j^2, c_j^2\tau_j^2)$, the length of the Gibbs sampler, and the thinning scheme of the posterior sample were set to be the same as those in the analysis of our simulated data described above. The initial values $(\mu^{(0)}, \alpha^{(0)}, \sigma_e^{2(0)}, \gamma^{(0)})$ were randomly generated from their priors.

For the three different prior specifications, the posterior probabilities of the marker indicators are shown in Figure 5, and the marker effects are depicted in Figure 6. As observed in our simulation studies, for almost all markers, the posterior probabilities for the first prior setting were larger than those for the other two settings, and the posterior probabilities for the second prior setting were larger than those for the third. However, the profiles of the posterior probability distributions were similar. As shown in Figure 6, the marker effects were estimated to be essentially identical for the three sets of prior variances $(\tau_j^2, c_j^2\tau_j^2)$. For the first prior setting, we found five markers, located on chromosomes 1, 3, 4, and 6, with posterior probabilities from 0.75 to 0.95 and marker effects of ∼±0.4. We also found five markers on chromosomes 3, 4, 5, and 6 with posterior probabilities from 0.4 to 0.6. Using the interval mapping of Lander and Botstein (1989) and the composite interval mapping of Zeng (1994), however, Tinker et al. (1996) declared only three QTL, on chromosomes 1, 4, and 7, as significant (data not shown here). Two markers on chromosome 1 were found to have a posterior probability of ∼0.76 and an effect of ∼-0.4, whereas Tinker et al. (1996) declared only one QTL on chromosome 1 as significant.

Figure 4. Marker effects plotted against marker locations along the genome from multiple-marker regression analysis.

For the first prior setting, the posterior means of the overall mean and the residual variance were estimated to be $\hat{\mu} = 0.0882$ and $\hat{\sigma}_e^2 = 0.3102$, respectively. The proportion of phenotypic variance explained by the markers is calculated as $\hat{h}^2 = (\sigma_y^2 - \hat{\sigma}_e^2)/\sigma_y^2$, where $\sigma_y^2 = 1$ is the phenotypic variance for the standardized phenotype. Thus, the proportion of phenotypic variance explained by the markers was estimated to be ∼69%. From the estimates of the marker effects $\hat{\alpha}_j$, we calculated the proportion of phenotypic variance explained by marker j as $\hat{\alpha}_j^2/(4\sigma_y^2)$ and found that the proportion of phenotypic variance explained by each of the five strongest markers was ∼4%.
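The ∼69% and ∼4% figures can be reproduced with simple arithmetic, as sketched below. The sketch takes the residual variance estimate to be 0.3102 and a marker effect of magnitude 0.4, and it applies the per-locus variance formula (effect squared divided by 4) used for the simulated backcross; treating the doubled-haploid marker variance the same way is an assumption of this illustration.

```python
sigma_y2 = 1.0           # phenotype standardized to unit variance
sigma_e2_hat = 0.3102    # posterior mean of the residual variance
alpha_hat = 0.4          # approximate magnitude of the strongest marker effects

# Proportion of phenotypic variance explained by all markers together.
h2_total = (sigma_y2 - sigma_e2_hat) / sigma_y2

# Proportion explained by a single marker with genotypes coded +0.5 / -0.5.
h2_marker = alpha_hat ** 2 / 4.0 / sigma_y2

print(f"all markers: {100 * h2_total:.1f}%")
print(f"one strong marker: {100 * h2_marker:.1f}%")
```

Five markers at about 4% each account for roughly 20% of the phenotypic variance, so most of the ∼69% is attributed to the remaining markers jointly.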

DISCUSSION

Mapping multiple QTL can be viewed essentially as a problem of model selection (e.g., Broman and Speed 2002; Sillanpää and Corander 2002). A variety of statistical selection procedures, including both non-Bayesian and Bayesian methods, have been developed for conventional statistical models. Some of these procedures have been modified to map multiple QTL. In this study, we developed a Markov chain Monte Carlo (MCMC) algorithm on the basis of the SSVS approach of George and McCulloch (1993) for identifying multiple markers. The proposed method was shown to be efficient under typical situations of most QTL studies in terms of the number of markers and the marker density. Compared with existing Bayesian methods, such as the reversible-jump MCMC, the SSVS approach has advantages in simplicity of computation and in diagnosing convergence (George and McCulloch 1997). The SSVS procedure can even be implemented using the publicly available software BUGS (Congdon 2002) and thus can be widely used in QTL studies.

An essential element of the performance of our Gibbs sampler is its ability to move between the two values of the indicator variable $\gamma_j$. In the analyses of the simulated data and real data, the value of $\gamma_j$ changed frequently, suggesting that the proposed algorithm mixes well and the chain converges quickly. However, as described in George and McCulloch (1993, 1997), convergence can be very slow and thus computational problems can arise when the ratio of the two prior variances, $c_j^2$, is set too large. This setup can lead to very small transition probabilities for $\gamma_j$ to go from 0 to 1 or from 1 to 0. After extensive testing, George and McCulloch (1997) indicated that these problems can be avoided by keeping this ratio below an upper bound that they recommend. Also, mixing behavior and convergence of the chain are expected to be affected by marker density. In the case of high-density maps with hundreds of markers, one might consider a second iteration of SSVS with a reduced set of markers based on the first run (George and McCulloch 1993). This two-stage strategy may improve the accuracy of the estimated marker effects and posterior probabilities.

Figure 5. Posterior probabilities of marker indicators for heading in barley. Ticks on the horizontal axes represent markers. Red, green, and blue denote the three prior settings $(\tau_j^2, c_j^2\tau_j^2) = (0.001, 10)$, $(0.01, 10)$, and $(0.01, 100)$.

Recently, Xu (2003) proposed a Bayesian method under the random regression model to simultaneously estimate genetic effects associated with markers of the entire genome in inbred line crosses. In his Bayesian framework, each genetic effect was assigned a normal prior distribution with mean zero and a unique variance. The effect-specific prior variance was further assigned a vague prior so that the variance was estimated from the data. This approach is analogous to the Bayesian method of Meuwissen et al. (2001) for BLUP prediction of gene effects in outbred populations. For a backcross population with K markers, Xu's method needs to estimate K different marker effects $\alpha_j$ (j = 1,..., K) and K different variances, one for each effect. Although this approach can evaluate each marker effect, it does not provide a probability statement about the statistical significance of marker effects. In our approach, indicator variables are introduced, but all effects included in the model have the same prior variance and all effects excluded from the model have another common prior variance. Therefore, our method not only estimates each marker effect, but also provides the posterior probability that each marker has a significant effect on the trait. The introduction of indicator variables may allow a large number of markers to be included in the model. It is worth noting that both the random-model approach and our method include all markers in the analysis and thus may have the ability to control the genetic variances of a large number of small-effect QTL. Whether this property can improve power in detecting multiple QTL and estimating the genetic effects deserves further investigation.

Figure 6. Marker effects plotted against marker locations along the genome for heading in barley. Ticks on the horizontal axes represent markers. Red, green, and blue denote the three prior settings $(\tau_j^2, c_j^2\tau_j^2) = (0.001, 10)$, $(0.01, 10)$, and $(0.01, 100)$.

We have applied the SSVS approach to identify multiple QTL by analyzing all markers of the whole genome. When the markers are densely and regularly spaced, the marker analysis would provide reasonable estimates of marker effects and marker posterior probabilities even when QTL are located in the marker intervals. If the markers are sparse and irregularly spaced, however, the marker analysis will be biased. In these situations, we can extend the proposed method to allow for finer structure mapping in two ways. The first approach is to use the multiple imputation method to generate the missing genotypes at grids of points between markers (Sen and Churchill 2001). The imputed genotypes are then incorporated into our Bayesian procedure. The second approach is to replace markers with positions in the marker intervals, assuming that at most one QTL lies in any marker interval. This approach requires searching for the optimal positions within the marker intervals. Algorithms for updating QTL positions have been developed (e.g., Xu and Yi 2000; Yi and Xu 2000, 2001) and can be easily incorporated into our procedure.

The SSVS approach has been extended to the multivariate regression model (Brown et al. 1998). In QTL-mapping studies, the joint analysis of multiple traits can provide formal procedures to test a number of biologically interesting hypotheses concerning the nature of genetic correlations between different traits. Under certain situations, the joint analysis can improve statistical power in detecting QTL and estimating the genetic parameters. We can extend the proposed method by applying the SSVS approach to jointly identify QTL for correlated multiple traits. In this study, we considered mapping multiple QTL under the nonepistatic model. A growing number of experiments provide strong evidence of the presence of interactions between genes for many complex traits. Under the epistatic model, the number of genetic effects increases exponentially as the number of markers increases. The multiple-stage SSVS approach can be employed to identify interacting QTL.

Acknowledgments

We are grateful to two anonymous reviewers for their helpful comments. This work was supported by National Institutes of Health grants R01ES009912, P41RR006009, R01DK054298, and P30DK56336 to D.B.A.

Footnotes

  • Communicating editor: J. B. Walsh

  • Received December 5, 2002.
  • Accepted March 13, 2003.
