## Abstract

Information from cosegregation of marker and QTL alleles, in addition to linkage disequilibrium (LD), can improve genomic selection. Variance components linear models have been proposed for this purpose, but accommodating dominance and epistasis is not straightforward with them. A full-Bayesian analysis of a mixture genetic model is favorable in this respect, but is computationally infeasible for whole-genome analyses. Thus, we propose an approximate two-step approach that neglects information from trait phenotypes in inferring ordered genotypes and segregation indicators of markers. Quantitative trait loci (QTL) fine-mapping scenarios, using high-density markers and pedigrees of five generations without genotyped females, were simulated to test this strategy against an exact full-Bayesian approach. The latter performed better in estimating QTL genotypes, but precision of QTL location and accuracy of genomic breeding values (GEBVs) did not differ for the two methods at realistically low LD. If, however, LD was higher, the exact approach resulted in a slightly higher accuracy of GEBVs. In conclusion, the two-step approach makes mixture genetic models computationally feasible for high-density markers and large pedigrees. Furthermore, markers need to be sampled only once and results can be used for the analysis of all traits. Further research is needed to evaluate the two-step approach for complex pedigrees and to analyze alternative strategies for modeling LD between QTL and markers.

DUE to advances in molecular genetics, high-density single-nucleotide polymorphisms (SNPs) are becoming available in animal and plant breeding. These can be used for whole-genome analyses such as prediction of genomic breeding values (GEBVs) and fine mapping of quantitative trait loci (QTL). Genomic selection (GS) (Meuwissen *et al*. 2001) is promising to improve response to selection by exploiting linkage disequilibrium (LD) between SNPs and QTL (Hayes *et al*. 2009; Vanraden *et al*. 2009), but the accuracy of GEBVs depends on additive-genetic relationships between the individuals used to estimate SNP effects and selection candidates (Habier *et al*. 2007, 2010). Use of cosegregation information, in addition to LD, may reduce this dependency and improve GS. Calus *et al*. (2008) used a variance components linear model for this purpose in which random QTL effects are modeled conditional on marker haplotypes. The covariance between founder haplotypes allows one to include LD (Meuwissen and Goddard 2000), and the covariance between nonfounder haplotypes computed as in Fernando and Grossman (1989) allows one to include cosegregation. The resulting covariance matrices, however, can be nonpositive definite, which necessitates bending with the effect that information can be lost (Legarra and Fernando 2009). Furthermore, accommodating dominance and epistasis is not straightforward with linear models, especially for crossbred data. In contrast with mixture genetic models, genetic covariance matrices do not enter into the analysis, and accommodating dominance and epistasis is more straightforward (Goddard 1998; Pong-Wong *et al*. 1998; Stricker and Fernando 1998; Du *et al*. 1999; Du and Hoeschele 2000; Hoeschele 2001; Yi and Xu 2002; Pérez-Enciso 2003; Yi *et al*. 2003, 2005).

Mixture model analyses, however, are more computationally demanding because the unknowns to be estimated in these analyses include the effects of unobservable QTL genotypes. In linear model analyses, in contrast, it is effects of observable marker genotypes that are estimated. The mixture model analysis can be thought of as a weighted sum of linear model analyses corresponding to each possible state for the unobservable QTL genotypes, where the weights are the probabilities of the QTL genotype states conditional on the observed marker genotypes and trait phenotypes. In practice, the analysis needs to consider all possible haplotypes at the markers also because even when all marker genotypes are observed, some of the marker haplotypes may not be known. As a result, the computational burden of these analyses stems from the number of unknown genotype and haplotype states that need to be summed over being exponentially related to the number of individuals in the pedigree and the number of loci.

It can be shown that conditional on the genotypes of their parents, genotypes of offspring are independent of the genotypes of all their ancestors. This conditional independence can be exploited to efficiently compute the weighted summation in the mixture model analysis, provided the pedigree is not too complex (Lauritzen and Sheehan 2003). In genetics, this strategy is called peeling (Elston and Stewart 1971; Cannings *et al*. 1978) and is equivalent to variable elimination in graphical models (Lauritzen and Sheehan 2003). This approach, however, becomes infeasible when the pedigree is complex and the number of loci is large. Another strategy for analysis of mixture models is based on using Markov chain Monte Carlo (MCMC) theory to draw samples of QTL genotypes and marker haplotypes conditional on the observed marker genotypes and trait phenotypes. Pérez-Enciso (2003) developed an MCMC-based Bayesian analysis for a mixture genetic model that uses information from both LD and cosegregation to fine map a single QTL, but this approach becomes computationally infeasible for whole-genome analyses without approximations.

In this article, we investigate a two-stage, approximate analysis that uses information from both LD and cosegregation. In the first stage, ordered genotypes of markers are sampled conditional only on the observed, unordered marker genotypes, ignoring information from the trait phenotypes. These samples are drawn using a Gibbs sampler with overlapping blocks (Thomas *et al*. 2000; Abraham *et al*. 2007) in which peeling is performed within a block while conditioning on variables outside the block. From these samples, founder haplotype probabilities and segregation probabilities for the QTL, also called probabilities of descent of QTL (PDQs) alleles, are calculated. In the second stage, these probabilities are used to sample QTL genotypes conditional on the trait phenotypes. In this analysis, information from LD is incorporated by allowing the QTL allele frequencies in founders to be dependent on the marker haplotypes, and information from cosegregation is incorporated by using the PDQs from the first stage to sample QTL alleles in nonfounders. The approximation comes from ignoring trait phenotypes in sampling ordered marker genotypes. A major advantage of the two-step approach is that markers have to be sampled only once and can then be used to analyze all quantitative traits with a mixture model.

The objective of this study is to test the hypothesis that this approximation is negligible given high-density SNPs. To test this hypothesis, results from the two-stage, approximate analysis are compared to a full-Bayesian analysis that does not ignore the information from the trait phenotypes in sampling the ordered marker genotypes. The full-Bayesian approach was selected, because it is considered to be the ideal statistical model as it accounts for all uncertainties (Hoeschele 2001). Because the full-Bayesian approach is computationally too demanding for application to GS, the approximate and full-Bayesian analyses are used to fine map within a simulated chromosomal region that is known to contain a QTL to make the comparison computationally feasible. If the consequences of ignoring trait phenotypes to sample ordered marker genotypes are negligible, further research to apply mixture genetic models to GS and comparisons with linear models are justifiable.

## THEORY

#### Statistical models:

In this study, we assumed that a single additive QTL has previously been located within a given genome region and that SNP data are available for this region. The goal was to fine map this QTL into one of the genome intervals spanned by two flanking SNPs. Thus, if *K* SNPs are available, there will be *K* − 1 intervals or in other words *K* − 1 putative QTL locations. For each putative QTL location, δ, which equals the midpoint between the two flanking SNPs on an interval in centimorgans, a mixture genetic model can be written as(1)where is the vector of trait phenotypes, μ is the overall mean, is the column vector of unobservable genotype scores for the putative QTL location δ, *a* is the random additive-genetic effect of the QTL to be located, and is the vector of residual effects for location δ. The QTL is assumed to be biallelic with alleles *Q*_{1} and *Q*_{2}, and thus genotype scores in are set to −1, 0, and 1 for the genotypes *Q*_{1}*Q*_{1}, *Q*_{1}*Q*_{2}, and *Q*_{2}*Q*_{2}, respectively. The mean phenotypic difference between the two homozygous genotypes is therefore 2*a*. Residual effects in are modeled to be different for each putative QTL location because genotypic scores in depend on the QTL location for two reasons: (a) marker-QTL LD can be different for each interval and (b) the distance between true and putative QTL location affects cosegregation of marker and QTL alleles. Residual effects are assumed to be normally distributed with zero mean and variance that is specific to each QTL location.

To model LD with a manageable number of parameters, the joint probability of allele states of markers and QTL in a founder is written as follows. Let **S**^{m}(*i*) denote, for example, the maternal allele states for founder individual *i*. Given disequilibrium, these allele states are not independent, and thus the joint probability of these allele states is expressed as(2)where *S*^{m}(*i*, *j*) is the maternal allele state variable of individual *i* at locus *j*, the marginal probabilities, Pr[*S*^{m}(*i*, 1)], for allele states at marker locus 1 are written as Pr[*S*^{m}(*i*, 1) = 1] = η and Pr[*S*^{m}(*i*, 1) = 0] = 1 – η, and letting λ_{x}(*j*) = Pr [*S*^{m}(*i*, *j*) = 1 | *S*^{m}(*i*, *j* – 1) = *x*] for *x* = 0 or 1, the conditional probabilities of allele states in (2) are written asThe map positions of markers and pedigree information are assumed known.

Unlike modeling known marker genotypes, the genotypic scores in cannot be observed. Only trait phenotypes in and the unordered genotypes of *K* SNPs denoted by **M** are observed. Unordered, as opposed to ordered, means that the maternal and paternal origins of the alleles are unknown.

Beta priors with shape parameters equal to 1, which reduce these distributions to a uniform distribution in the interval (0, 1), are used for both marginal and conditional probabilities of allele states. The prior for the QTL location parameter δ is uniform with probability , for μ it is 1, and for it is scale inverse chi square with ν = 4.2 d.f. and scale parameter , where is the simulated residual variance in this study. The prior for *a* is gamma with shape parameter 0.4, which was estimated by Hayes and Goddard (2001) from published estimates of QTL effects, and the scale parameter is 1.66 such that the variance is 1 (Meuwissen *et al*. 2001).

#### Statistical methods:

First, we describe general principles that are common to both the full-Bayesian and the approximate, two-stage analyses. This is followed by the specific details for each approach. In Bayesian analyses, inference about model parameters such as, , and are made from their posterior distributions given the observed data, which in our case consists of the unordered marker genotypes, **M**, and trait phenotypes, (Sorensen and Gianola 2002). In most problems, however, computing the posterior distributions of the parameters is computationally infeasible, and inferences are made by drawing samples from the joint posterior of the parameters,(3)using MCMC techniques (Sorensen and Gianola 2002). To draw samples for the variables in (3), it is often convenient to include allele state and allele origin variables into the joint posterior. Including these variables in (3), the augmented joint posterior distribution becomes(4)where, for example, **S**^{m} denotes the maternal allele state variables and **O**^{m} the maternal allele origin variables. As an alternative to augmenting (3) with allele state and allele origin variables, it is possible to replace the maternal and paternal allele state variables in (4) by the corresponding genotype variables. Then, the augmented joint posterior becomes(5)where **G** denotes the genotype variables. Depending on the number of alleles at a locus and the amount of missing information, sampling from one of these posterior distributions may be more efficient than sampling from the other. In this study, sampling from (5) was more efficient. For the purpose of describing the algorithm, however, (4) is more convenient. Thus, in the following, we use (4) to describe the sampling scheme.

After simple elimination of impossible states of allele states and origins, a Gibbs sampling strategy was used to sample the remaining variables in (4), where samples for one or more of these variables are drawn from the conditional distribution of those variables given the data and all other variables in the posterior. As described in the appendix, this joint conditional distribution can be written as the product of several probability functions, each involving only a few variables. Then in principle, peeling and reverse peeling can be used to draw samples for the allele state and allele origin variables from their joint conditional distribution given all the parameters and the data (Ploughman and Boehnke 1989; Jensen *et al*. 1995; Heath 1997; Fernández *et al*. 2001). Suppose the variables to be sampled by peeling and reverse peeling are denoted by *V*_{1}, … , *V _{n}*, where the variables are numbered in the reverse order of peeling. Then, peeling (Elston and Stewart 1971) is used to compute efficiently the marginal distribution for

*V*

_{1}, by summing over the possible states of all the other variables. In reverse peeling (Ploughman and Boehnke 1989; Jensen

*et al*. 1995; Heath 1997; Fernández

*et al*. 2001), a sample for

*V*

_{1}is drawn from this marginal distribution, and the intermediate results from the peeling process are used to compute efficiently the marginal for

*V*

_{2}conditional on the sampled value of

*V*

_{1}. Then, a sample for

*V*

_{2}is drawn from its marginal distribution. This process is continued for all

*n*variables, where at step

*i*, intermediate results from peeling are used to compute efficiently the marginal for

*V*conditional on the sampled values of

_{i}*V*

_{1},

*V*

_{2}, … ,

*V*

_{i−1}.

When the pedigree is complex and the number of loci is large, however, peeling becomes computationally infeasible. In this case, allele state and allele origin variables can be sampled in blocks (Jensen *et al*. 1995). Blocks can be formed of pedigree members and marker loci to reduce the number of variables within a block that are sampled jointly by peeling and reverse peeling conditional on the variables outside the block. Here, a pedigree block included a sire, its mates, their offspring, and the parents of the sire and the mates. Note that these blocks are overlapping, because sires and dams also occur as offspring in other blocks, a strategy also described by Thomas *et al*. (2000). A locus block included allele state and allele origin variables of adjacent marker loci, where a certain number of the leftmost loci of a block are also included in the previous block (except for the first block) and a certain number of the rightmost loci in the following block (except for the last block) (Abraham *et al*. 2007). The overlapping of variables serves to improve mixing of the Gibbs algorithm (Abraham *et al*. 2007). Allele origin and state variables of loci are peeled only within intersections of pedigree and locus blocks, while conditioning on all variables outside such an intersection. One round of sampling is completed after all intersections have been processed successively.

In this study, sampling was feasible without blocking by pedigree but using locus blocks of size eight with three overlapping loci on each side. To evaluate alternative blocking strategies, either pedigree blocking was used or the number of loci was reduced to four with two overlapping loci. To improve the estimation of PDQs, the two haplotypes of a founder are arbitrarily labeled maternal and paternal, which can be done if at least one of the observed marker genotypes of a founder is heterozygous (van Arendonk *et al*. 1999).

The parameters in η and are gene frequencies, and given the beta priors used for these gene frequencies, their full conditionals are also beta distributions (Sorensen and Gianola 2002). The parameters for the QTL include its location δ, the mean for the trait μ, the effect of the QTL *a*, the allele frequencies involving the QTL in , and the residual variance . These parameters are sampled jointly from their full conditional using the Metropolis–Hastings algorithm, which is described in the next section.

#### Full-Bayesian analysis:

At the marker loci, as described above, peeling and reverse peeling are used to sample allele state and origin variables conditional on the current values for the allele state and allele origin variables at the QTL, , , and the observed, unordered genotypes at the marker loci. Thus, peeling is applied only to the marker variables in (10) with the current values of the QTL variables being treated as constants. Once all the allele state and origin variables at the marker loci were sampled, and elements of corresponding to markers are sampled from their full-conditional distributions, which are beta distributions that depend on the sampled allele states of the founders (Sorensen and Gianola 2002). Following this, as described below, the Metropolis–Hastings (MH) algorithm is used to jointly sample a new QTL location, the parameters of this QTL, and the allele state and origin variables for the QTL at the new location, conditional on the marker allele state and origin variables.

In the MH algorithm, candidate samples are obtained by drawing from a proposal distribution, and these candidate samples are accepted or rejected according to the MH acceptance probability (Gilks and Roberts 1996). First, the location variable is sampled from an adaptive proposal, constructed from a Polya urn (Ross 2007), in which the probability of sampling location *j* in round *t* is given by(6)where *p*_{0} is the prior degree of belief that each QTL location is proposed with equal probability of , and *n*_{tj} is the frequency of accepted samples in location *j*, with *n*_{0j} = 0. Given the proposed QTL location, the proposed value for μ is drawn from a normal distribution, for *a* from gamma, for from a scaled inverse chi square, and for allele frequencies from beta distributions, where the parameters of these distributions are chosen such that the proposed values are in the neighborhood of the most recently accepted values for the proposed QTL location as described in the appendix.

Given the proposed location and QTL parameters, the QTL allele state and origin variables are sampled from the full-conditional distribution for these variables. Peeling and reverse peeling are used to sample these variables jointly conditional on the allele state and origin variables at the markers and all the relevant parameters. Here, peeling is applied only to the QTL variables in (10) with the current values of the marker variables being treated as constants. Finally, the proposed QTL location, QTL parameters, and QTL origin and state variables are accepted or rejected according to the MH algorithm in which the probability of acceptance iswhere is the vector of the currently accepted values for the variables being sampled, is the vector of the proposed values for these variables, and denote distributions for the data models, and are prior distributions, and and are proposal distributions.

#### Approximate two-stage approach:

In the first stage of the approximate method, allele state and allele origin variables for the marker loci are sampled conditional on the observed, unordered genotypes at the marker loci and the current values of η and for the markers, ignoring the trait phenotypes. As described previously, these samples are obtained by using a blocking Gibbs sampler that uses peeling and reverse peeling to jointly sample all the variables within a block, except that the joint distribution (Equation A1 in the appendix) is constructed without any QTL related probabilities. These samples are then used to estimate two types of probabilities. The first type consists of the joint probabilities of the maternal (Pr[*h*^{m}(*i*, δ) | **M**]) and paternal (Pr[*h*^{p}(*i*, δ) | **M**]) marker haplotypes, consisting of the allele states of the two markers flanking each of the *K* – 1 potential QTL positions in the founders. They are estimated from the sampled allele state variables in the founders. The second type consists of the joint probabilities of the allele origin variables for the maternal and paternal marker haplotypes, Pr[*O*^{m}(*i*, δ – 1), *O*^{m}(*i*, δ + 1) | **M**] and Pr[*O*^{p}(*i*, δ – 1), *O*^{p}(*i*, δ + 1) | **M**], flanking each of the *K* – 1 potential QTL positions in nonfounders. They are estimated from the sampled allele origin variables of nonfounders. With the allele origin variable for a given marker allele being either 0 or 1, there are four possible combinations of allele origin variables for a two-marker haplotype. Given one of these combinations, the conditional PDQ can be calculated for the maternal and the paternal QTL allele of a nonfounder at the midpoint of the flanking markers using formulas shown in Table 1, assuming no interference. The PDQs conditional on the observed, unordered marker genotypes, Pr[*O ^{x}*(

*i*, δ) |

**M**], are derived by weighting the conditional PDQ for each combination of segregation indicators of a two-marker haplotype with their corresponding probabilities,

*i.e.*, Pr[

*O*

^{m}(

*i*, δ – 1),

*O*

^{m}(

*i*, δ + 1) |

**M**] and Pr[

*O*

^{p}(

*i*, δ – 1),

*O*

^{p}(

*i*, δ + 1) |

**M**].

The parameters of the mixture model in the two-stage approximate method are δ, μ, *a*, and as in the full-Bayesian analysis, with a new parameter for the conditional allele frequency at the QTL given the allele states at the flanking markers. The vector has four elements, where(7)for *x* = *m* or *p*, and (*l*, *m*) = (0, 0), (0, 1), (1, 0), or (1, 1).

As in the full-Bayesian analysis, these parameters, the allele states, and the allele origin variables at the QTL are sampled jointly using the MH algorithm in the second stage of the approximate method. However, while Equation A1 in the appendix was used in the full-Bayesian analysis to sample QTL allele states and allele origin variables conditional on the allele states and allele origin variables at the flanking markers, here QTL variables are sampled conditional on the observed, unordered marker genotypes, **M**, as described below.

The joint distribution for QTL allele state and allele origin variables given the observed, unordered marker genotypes and the QTL parameters is expressed as(8)where *F* is the number of founders, *s*_{i} denotes the sire and *d*_{i} the dam of individual *i*, and Pr[*S ^{x}*(

*i*, δ) |

**M**] is computed as(9)Peeling and reverse peeling are used to draw candidate samples for the QTL allele states and allele origin variables from (8), and candidate samples for the parameters were drawn in the neighborhood of their current values as described in the full-Bayesian analysis. These samples were accepted or rejected using the MH rule.

#### Sample scheme:

Each MCMC round has two types of sampling steps to update QTL parameters and variables. These were called *jump* and *resample* steps, where each of these two steps can be performed a different number of times within each MCMC round. In the *jump* step, a new QTL location other than the one currently accepted is proposed along with its parameters and variables, whereas in the *resample* step, new parameters and variables of the QTL location currently accepted are proposed. The *jump* step was performed 30 times and the *resample* step 20 times in each round.

The full-Bayesian approach was run for 24 hr and each step of the two-step approach for 12 hr, where the burn-in was always 5000 rounds. The processor used for all computations was an AMD Opteron 2352 with 2.1 GHz and 4 Gb memory.

## SIMULATIONS

Simulations were used to study the consequences of neglecting the information that trait phenotypes provide to infer allele states and origins of markers and of using alternative blocking strategies for the Gibbs sampler on estimation of QTL parameters and on prediction of GEBVs. The aim was (1) to simulate marker–QTL LD similar to that found recently in real cattle populations, where the study of de Roos *et al*. (2008) served as a template, and (2) to simulate marker–QTL LD that is higher than in the first scenario to compare the full-Bayesian with the two-step approach at low and high LD. The two scenarios differed only in the effective size of the simulated base populations, which were randomly mated, including selfing, for 1000 discrete generations. Effective sizes of the two base populations were 1500 and 500 individuals in scenarios 1 and 2, respectively. The population was reduced to a size of 100 individuals after 1000 generations and randomly mated for another 15 generations. The population was then gradually increased over the next 10 generations to obtain a population of 800 males and 800 females in generation 1025. In the following 3 generations, 80 sires were randomly selected and mated to 800 dams in each discrete generation. Each female had one male and one female offspring and thus each sire had 10 sons and 10 daughters. Individuals of generation 1028 were considered as founders of the pedigree used for fine mapping. Four additional generations were generated by selecting 20 sires from the male offspring in each generation at random and mating them to 200 founder dams from generation 1028, where different founder dams were used in each generation. Each mating produced one male offspring and thus each sire had 10 offspring. It was assumed that neither phenotypic nor marker data were available for dams and thus they were not included in the analyses. In total, the pedigree consisted of 820 individuals, the 20 founder sires and 200 male offspring in each of the 4 subsequent generations. Because dams were not included in the analyses, the pedigree had no loops and thus QTL variables could be peeled without pedigree blocking. Otherwise further approximation strategies were needed, which must be different from those used for SNPs. This is discussed in the discussion.

A single-chromosome segment of length 5 cM was simulated with 2000 evenly spaced SNPs in generation 1. A total of 300 QTL were randomly distributed among those SNPs and their effects were sampled from a gamma distribution with shape 0.4 and scale 1.66 as used by Meuwissen *et al*. (2001). All loci were simulated to be biallelic with initial allele frequencies 0.5 and in Hardy–Weinberg and linkage equilibrium. The mutation rate for both SNPs and QTL was 2.5 × 10^{−5}, which is larger than estimates of actual mutation rate to ensure that enough loci are segregating for statistical analyses after 1025 generations of random mating. However, it can be shown that the mutation rate has only a small effect on LD in this simulation. Recombinations were modeled according to a binomial map function, where the maximum number of uniformly and independently distributed crossovers on a chromosome of 1 M was four (Karlin 1984), *i.e.*, assuming interference. In generation 1028 (founder generation of the pedigree), the chromosome was first divided into 100 bins with an equal number of SNPs in each bin. Then, within each bin the SNP with highest minor allele frequency was selected. Next, the QTL with frequency closest to 0.5 was selected as the only locus with effect on the quantitative trait. Twenty adjacent SNPs around this QTL were finally selected for fine mapping, where the position of the QTL was random between SNPs 4 and 16. Because each of the 100 bins had a length of 0.05 cM, the expected length of the fine-mapping region was 1 cM.

Heritability (*h*^{2}) of the quantitative trait was set to 0.05 by rescaling the QTL effect in generation 1028. Phenotypes were calculated as the sum of the QTL genotype effect of an individual and a residual effect sampled from a standard normal distribution. To compare methods also at a higher *h*^{2} of 0.3, residual effects were resampled with a residual variance of 0.123. The 820 individuals were all phenotyped for the quantitative trait and genotyped for the 20 SNPs.

#### Comparison of methods:

For both the full-Bayesian approach and the two-step approach, the mean absolute difference between estimated and true value of δ and *a* was obtained aswhere and *z _{j}* denote, respectively, posterior mean and true value of replicate

*j*of the simulation, and

*n*is the number of replicates. The posterior mean of the genotypic score of an individual

*i*, , was derived from samples of allele states at all putative QTL locations. The mean absolute differences between true and estimated genotypic scores were first averaged over individuals and then over replicates. The ability to estimate LD between the QTL and two-SNP intervals was evaluated with regression

*R*

^{2}(Zhao

*et al*. 2005), which was calculated for each putative QTL location in each iteration of the MCMC algorithm aswhere Pr[

*h*(

^{x}*i*, δ)] = Pr[

*S*(

^{x}*i*, δ – 1),

*S*(

^{x}*i*, δ + 1)] (

*x*= m or p) denotes the frequency of the maternal or paternal two-SNP haplotype at QTL location δ, and Pr[

*S*(

^{x}*i*, δ)|

*h*(

^{x}*i*, δ)] describes LD between the QTL and the flanking SNPs at location δ. This conditional probability is obtained from η and elements in in the full-Bayesian approach, whereas is used in the two-step approach. The true was determined using true frequencies from the simulation. Because LD is different for every putative QTL location, the absolute difference between true and posterior means was calculated for each location separately, weighted by the posterior probabilities for δ, and summed to obtain a single value. These values were finally averaged over replicates. The GEBV of an individual

*i*was calculated as , where is the posterior mean of the QTL effect. The accuracy of GEBVs was calculated as the correlation between true simulated breeding values and GEBVs. Finally, the number of MCMC rounds obtained for sampling SNPs and QTL variables was recorded. Paired

*t*-tests were applied to test for significant differences between comparison criteria of the full-Bayesian and two-step approach when replicates were simulated with the same LD, whereas normal

*t*-tests were used across LD scenarios.

## RESULTS

The average length of the genome regions obtained from 32 replicates of the simulation was 0.94 (±0.005) cM containing 20 SNPs at intervals of 0.05 (±0.001) cM and with minor allele frequency of 0.46 (±0.01). The mean fraction of ordered SNP genotypes and paternal segregation indicators that could be determined by simple genotype elimination prior to sampling was 24.8% each. Average allele frequency of the QTL was 0.5 (±3 × 10^{−4}) such that the mean simulated QTL effect was 0.324 (±1 × 10^{−6}). The extent of LD between pairs of QTL and SNPs generated after 1028 generations along with that reported recently for real cattle populations (de Roos *et al*. 2008) is depicted for a map distance of up to 1 cM in Figure 1. Average *r*^{2} decreased exponentially with increasing distance and was similar for all populations above a distance of 0.6 cM. As expected, average *r*^{2} at shorter distances was substantially higher for 500 effective individuals (*N*_{e}) than for 1500, and values for *N*_{e} = 1500 matched well with those for real cattle populations, although they were lower at distances <0.01 cM. Average *r*^{2} at 0.025 cM, which corresponds to the average distance between a putative QTL location and a flanking SNP, was 0.5 for *N*_{e} = 500 and 0.24 for *N*_{e} = 1500. Figure 2 presents true *r*^{2} between the QTL and SNPs used for mapping as well as true *R*^{2} between the QTL and each two-SNP interval for selected replicates of the simulation with *N*_{e} = 1500. The latter LD measure, which was the relevant one in this study because two-SNP haplotypes were used to model LD, was often substantially >*r*^{2} for each of the flanking SNPs, especially when *r*^{2} was incomplete. In replicates 1, 4, 5, and 6, *R*^{2} clearly decreased with increasing distances to the true QTL position, whereas in replicates 2 and 3 the difference between close and distant putative QTL locations was not as distinct, where a clear *R*^{2} peak was missing in replicate 2. Further, note that the SNP interval with the highest *R*^{2} may not be the one closest to the true QTL position, as can be seen in replicates 2–4.

Figure 2 also presents posterior distributions of the QTL location parameter δ for the exact and two-step approaches. Except for replicate 2, which had low and almost uniform distributed *R*^{2}, the positions of posterior modes agreed with locations of the highest *R*^{2} values in both methods. For replicates with high *R*^{2}, the posterior distributions were generally highly peaked and standard deviations of the QTL location were small, as can be seen for replicates 1 and 3 in Table 2. Posterior distributions of both methods matched well for these two replicates, but in 5 and 6, which also had high *R*^{2}, they differed apparently and standard deviations were high (Figure 2 and Table 2). With low *R*^{2}, posterior modes were less distinct and standard deviations were usually larger than with high *R*^{2} (replicates 2 and 4). An explanation for the shape of the posterior distributions can be found in the estimated *R*^{2} depicted in Figure 3. In general, estimated *R*^{2} agreed well between both methods, but it was much lower than the true *R*^{2}. Clear maxima resulted only for replicates 1 and 3, whereas estimated *R*^{2} was almost flat around the true QTL position in replicates 5 and 6, although there were two-SNP intervals with high true *R*^{2}. Thus both methods had difficulties in these replicates to distinguish between putative QTL locations surrounding the true QTL position, resulting in different posterior distributions. For low *R*^{2}, the estimated *R*^{2} was usually flat as in replicates 2 and 4.

Estimates and standard deviations of μ, *a*, and obtained with both methods were very similar (Table 2). Replicates 1 and 3 were conspicuous compared to the other selected replicates in that *a* was overestimated and underestimated, which was generally the case when true *R*^{2} was high and estimated sufficiently well. Otherwise *a* tended to be underestimated and overestimated.

Table 3 shows accuracies of GEBVs and mean absolute differences (MAD) between true and estimated values of δ, **q**, *a*, and *R*^{2}, as well as the average number of MCMC rounds per hour obtained for sampling SNP and QTL variables for the exact and two-step approaches, depending on the extent of LD and *h*^{2}. In all scenarios, and especially at high LD, the MAD of **q** was significantly lower for the exact approach than for the two-step approach. The accuracy of GEBVs of the exact approach, however, was only higher at high LD, whereas both methods had equal accuracies at low LD. The MAD of δ, *a*, and *R*^{2} showed no clear differences with respect to both *h*^{2} and the method used. With high LD, accuracies of GEBVs as well as MAD of *a* and *R*^{2} were higher compared to low LD, whereas MAD of **q** was significantly lower. The MAD of δ ranged between 0.058 and 0.078 cM, which means that the distance between estimated and true QTL position was on average 6–8% of the size of the genome region prior to fine mapping. Similarly, MAD of **q** was 16–18% of the initial range of 2, and MAD of *a* was 19–26% of the simulated QTL effect.

Average number of MCMC rounds per hour for sampling both SNP and QTL variables was always higher for the two-step approach (Table 3). Because the exact approach was run for 24 hr and each step of the two-step approach for 12 hr, the total number of rounds for SNPs was higher for the exact approach, but lower for QTL variables. Furthermore, the number of rounds in the exact approach was lower with higher *h*^{2}, but higher at high LD. As expected for the two-step approach, the number of rounds was not different for sampling SNPs regarding *h*^{2} and LD, but the number of rounds was higher for sampling QTL variables at high LD as in the exact approach. The reason is that more QTL locations were in high LD with the QTL than in the low-LD scenario, and thus it was easier for the MCMC chain to jump between QTL locations, thereby reducing the average number of iterations used in the *jump* step such that the next MCMC round could start more rapidly.

Results of using alternative strategies for sampling SNP variables in the two-step approach are shown in Table 4. Pedigree blocking (alternative a) had no effect on both accuracy of GEBVs and mean absolute differences, but the number of MCMC rounds for sampling SNP variables was higher (compare with Table 3, low LD, *h*^{2} = 0.05, two-step approach). If, however, the size of the locus blocks was reduced from eight to four and the number of overlapping loci from three to two (alternative b), MAD of **q** was higher and accuracy of GEBVs significantly lower. Increasing the block size above eight loci did not improve precision of QTL parameter estimates or GEBVs (results not shown).

## DISCUSSION

The primary goal of this study was to utilize information from LD and cosegregation for whole-genome analyses with a mixture genetic model, because accommodating dominance, epistasis, and imprinting is straightforward with such a model. To make it computationally feasible for a large number of high-density markers and large pedigrees, an approximate strategy was proposed in which (1) marker and QTL variables are not sampled jointly as in an exact approach, but in two consecutive steps, and (2) a Gibbs sampler with overlapping blocks is used to sample marker variables. Simple pedigrees without females and fine-mapping scenarios with realistic LD were simulated to compare the exact with the two-step approach. Estimates for QTL location, additive effect of QTL, and LD parameters obtained by both methods were very similar. When LD between the QTL and two-SNP intervals could not be estimated sufficiently well to distinguish between putative QTL locations, posterior distributions of the QTL location parameter slightly differed between methods. Although the exact approach always performed better in estimating QTL genotypes, accuracy of GEBVs was identical for the two methods at realistically low LD. If, however, LD was higher, the exact approach resulted in a slightly higher accuracy of GEBVs. Pedigree blocking used to sample SNP variables had neither an effect on estimates of QTL parameters nor one on accuracy of GEBVs, whereas a smaller locus block size reduced the accuracy of GEBVs.

#### Modeling of LD and cosegregation:

The extent of LD between QTL and SNPs, how it is modeled, and whether it can be estimated determines, to a large extent, the success of whole-genome analyses with high-density markers. In this study, LD is incorporated by allowing the joint probability of marker and QTL allele states to deviate from the product of their marginal probabilities, which implies independence of the allele states. More precisely, the conditional probabilities of QTL allele states given the flanking, two-SNP haplotypes of a founder were used as parameters in the model. These parameters were then used along with founder haplotype probabilities to derive *R*^{2}, which was highly underestimated in this study, irrespective of whether the true LD was high or low. Although the uncertainty about founder haplotypes contributes to the inaccurate *R*^{2} estimates, more important seems the precision of the four QTL frequencies that have to be estimated solely from phenotypic data. Nevertheless, in replicates with complete true LD clear maxima were observed, whereas estimated *R*^{2} was flat in the incomplete case. By assuming that LD is due to mutation, the joint probability of marker and QTL allele states can be modeled as a function of the effective population size and time since the mutation (Morris *et al*. 2000). Pérez-Enciso (2003) used this approach to incorporate LD into a fine-QTL mapping analysis. In that study also *r*^{2} was underestimated, but not as much as in this study. The reason might be that more parameters need to be estimated in our study. Furthermore, in replicates of the simulation in which true LD was high and estimated sufficiently well to determine the QTL position, the QTL effect was overestimated and the residual variance underestimated. This could be a compensation for the underestimation of the LD parameters and might be resolved either by improving the estimation of LD parameters or by increasing the contribution of cosegregation information. The latter is achieved if the number of nonfounders is increased or females are included in the analyses so that the maternal alleles of nonfounders are not treated as founder alleles. The underestimation of QTL effects if *R*^{2} cannot be estimated well may be the effect of poor estimates of individual QTL genotypes such that contrasts between QTL genotypes become less distinct.

Instead of using 2-SNP intervals to model LD between QTL and markers, one could use 1 or >2 SNPs per interval, where the number of parameters would change correspondingly. As depicted in Figure 2, true LD for 2-SNP intervals (*R*^{2}) was higher than for flanking SNPs (*r*^{2}) when *r*^{2} was incomplete, but both LD measures were similar when *r*^{2} was high. Thus, if true *r*^{2} is high, it may be advantageous to model LD with single SNPs, because fewer parameters have to be estimated, whereas if *r*^{2} is low then ≥2 SNPs per interval could perform better. Meuwissen *et al*. (2001) also assumed that LD is due to a mutational event. However, rather than model LD through the joint probability of marker and QTL allele states, they proposed to model LD through the probability that QTL alleles in founders are identical-by-descent given markers. Grapes *et al*. (2006) used this method to fine map QTL using SNPs and compared various haplotype sizes. For the lowest SNP spacing analyzed of 0.25 cM, 4 SNPs per haplotype performed better than 2, and both of these options were better than 6 or 10 SNPs. Calus *et al*. (2008) modeled LD with the same method to estimate GEBVs and compared haplotype sizes of 2 and 10 SNPs. They found that accuracy of GEBVs hardly differed at a SNP spacing of 0.125 cM. Note that adjacent SNPs were on average 0.05 cM apart in this study, and thus >2 SNPs per interval may not be necessary, especially if LD is high only close to the true QTL position and declines rapidly with distance as is the case in real cattle populations. Nevertheless, further study is needed to determine the optimal number of SNPs per interval for a SNP density at least as high as simulated here and whether the methods to model LD proposed by Meuwissen and Goddard (2000) or Morris *et al*. (2000) are better alternatives than that used here.

Cosegregation was modeled in the two-step approach by using joint probabilities of segregation indicators of the two flanking SNPs of a putative QTL location. The segregation indicators of the two flanking SNPs are sufficient, because conditional on them the PDQs for a putative QTL location are independent of segregation indicators of other marker loci. Moreover, all available marker data in the pedigree were used efficiently to estimate those probabilities, combining information from LD and cosegregation, by using a blocking Gibbs sampler that uses peeling and reverse peeling to sample jointly ordered marker genotypes and segregation indicators. The contribution of cosegregation information to fine-QTL mapping and estimation of breeding values increases with number of nonfounders and helps to determine the QTL genotypes in founders, which improves the precision of LD parameter estimates.

#### Application to whole-genome analyses with large numbers of markers:

In the two-step approach, markers on different chromosomes can be processed separately, because they are independent of each other, while multiple processors can be used within a chromosome by applying locus blocking. Markers on a chromosome are first divided into overlapping locus blocks, and then each processor is assigned to two adjacent blocks. In the first iteration all processors start with the left block, while the marker genotypes and segregation indicators of the right blocks are held constant. In the following iterations the left and right blocks are sampled alternately. In addition, rule-based methods (Wang *et al*. 2007) can be applied prior to sampling to reduce the state space of ordered marker genotypes and segregation indicators in pedigrees with at least three generations. As a result, cutset sizes, mixing problems, and computing time decrease considerably. To demonstrate this strategy, genotypes of 100 SNPs on a chromosomal segment of length 5 cM were simulated for 8239 North American Holstein bulls that were genotyped for the Illumina Bovine50K array in practice. The rule-based method was able to resolve 98% of the ordered genotypes of nonfounders and 87% of the grandparental origins of bulls having genotyped grandfathers. This left a few short chromosomal segments for which the grandparental origins were unknown and some genotypes that were unordered. Given the known grandparental origins at the flanking markers, the states of those chromosomal segments could be sampled independently from the rest of the chromosome. Moreover, if marker genotypes of the father were ordered on those chromosomal segments, then the sampling distributions of the grandparental origins of the offspring were independent of those of their sibs and ancestors in the pedigree. Therefore, 1000 iterations of the Gibbs sampler with overlapping blocks were assumed to be sufficient to sample the remaining ambiguous states. However, the optimal number of iterations has to be analyzed in further studies. The computing time was 2 hr with 4 processors (2.4 GHz AMD 280 Opteron processors), where 1 hr and 13 min was required to set up the cutsets needed for peeling and reverse peeling. Thus, a chromosome of length 1 M having 2000 SNPs, which is about the number of available SNPs on bovine chromosome 1 using the 50K SNP panel, is expected to take 2 hr with 80 processors. Note that the proportion of ambiguous states that can be resolved by the rule-based method increases with SNP density, and therefore the sampling time for nonfounder states is not expected to increase in the future.

The size of the locus blocks and the number of overlapping loci had only a slight, yet significant, effect on the accuracy of GEBVs in this study, where a higher number of loci performed better than a lower one (Table 4). This is attributed to better mixing of the MCMC chain (Abraham *et al*. 2007), although the number of MCMC rounds per hour increases with smaller block sizes. Pedigree blocking, as a further means to reduce cutset sizes during peeling and thus to reduce the demand of computer memory, had neither a significant effect on estimates of QTL parameters nor one on accuracy of GEBVs. This might be different for complex pedigrees with loops, which should be analyzed in a further study. Complex pedigrees with genotypes on males and females were not simulated here, because the pedigree blocking strategy failed for the QTL variables due to very poor mixing of the Gibbs algorithm. However, a blocking strategy without mixing problems that is similar to the one described by Abraham *et al*. (2007) can be applied for QTL variables.

Note that numerous other methods have been presented to estimate segregation probabilities in the literature (Wang *et al*. 1995; Pong-Wong *et al*. 2001; Totir *et al*. 2003; Windig and Meuwissen 2004; Kong *et al*. 2008) and these can be used in the first step of the two-step approach. These methods are approximations to the joint sampling of ordered genotypes and segregation indicators of all markers, but may be computationally more efficient than our Gibbs approach. Further studies are needed to compare precision and computing time of the different methods.

The number of putative QTL positions that have to be evaluated in the second step of the two-step approach equals at least the number of markers minus one. Statistical models have been proposed that include multiple QTL (Heath 1997; Sillanpaa and Arjas 1998, 1999; Yi 2004; Zhang *et al*. 2005), even with dominance and epistatic effects (*e.g.*, Yi and Xu 2002; Yi *et al*. 2003, 2005). The simplest approach is to use a single-site Gibbs algorithm, but further research is necessary to improve computing efficiency and time.

In practice, missing marker genotypes as well as genotyping and pedigree errors occur. A missing genotype of an individual at a certain locus increases the sample space of ordered marker genotypes. First, genotype elimination can be used prior to sampling, which utilizes the observed marker genotypes of relatives. The Gibbs sampler of the first step of the two-step approach then utilizes LD and cosegregation to infer ordered marker genotypes, which are highly informative if high-density markers are available. The proportion of missing genotypes within an individual is small in practice [<1% in our own studies using the BovineSNP50 BeadChip from Illumina (San Diego)] and thus the observed marker genotypes should be sufficiently informative that the contribution of trait phenotypes to sample marker variables should not be significantly different from that found here. Genotyping errors can be accounted for by treating an observed marker genotype not as the true genotype, but as a marker phenotype plus error, which introduces a penetrance function for marker genotypes (Sobel *et al*. 2002). Pedigree errors can be avoided by use of the marker information in a parentage test prior to the whole-genome analysis.

#### Contribution of trait phenotypes to infer marker variables:

Loss of accuracy was limited with the two-step approach, and thus the contribution of trait phenotypes to infer marker variables, in addition to observed marker data, appears to be small. One explanation is the information content of observed marker data, which depends on the minor allele frequencies of markers, marker spacing, and pedigree structure. The average minor allele frequency in this study was high at 0.46. With decreasing frequency, the expected number of heterozygous marker genotypes with unknown order decreases, and thus fewer marker genotypes need to be sampled. On the other hand, more segregation indicators have to be sampled because less can be determined prior to sampling due to the higher proportion of homozygous marker genotypes. For simple pedigrees without females, it can be shown that when minor allele frequency decreases, the expected fraction of marker genotypes that needs to be sampled decreases at the same rate as the expected fraction of segregation indicators increases. This might be one reason why the differences between the exact and approximate two-step approaches did not change for a minor allele frequency of 0.35 (results not shown). Another reason is the advantage of the low marker spacing. Even if genotypes or segregation indicators are ambiguous at a certain marker locus, the flanking markers can be highly informative because recombinations between these loci are very unlikely to occur. The information content of markers also increases with the size of half- and full-sib families in each generation as well as the number of generations. In a complex pedigree containing males and females, the expected fraction of genotypes and segregation indicators that can be determined prior to sampling is higher than in the simple pedigree, but the total number of variables to be sampled is higher. Therefore, the importance of trait phenotypes in complex pedigrees remains to be analyzed.

In the exact approach, trait phenotypes provide information to sample markers through LD between QTL and SNPs to a large extent. Because the LD parameters were underestimated, the information flow from QTL to markers was limited. With higher LD, however, QTL genotypes and GEBVs were estimated with higher precision. Thus, if the LD parameters could be estimated more accurately with alternative strategies, trait phenotype information would be exploited more efficiently and the difference between the exact and two-step approaches could be larger.

The available markers may not be evenly distributed on the genome, resulting in regions with higher or lower density than assumed in this study. This could affect the information content of the observed markers and thus the contribution of trait phenotypes to infer ambiguous marker variables. The information that flanking markers provide through cosegregation is not expected to change notably if the distance between adjacent markers remains within a few centimorgans. LD information, however, is more affected, because LD between markers decreases rapidly with distance. The results of this study suggest that the differences between the approximate and the exact approach should decrease with decreasing marker density and vice versa.

#### Conclusions:

The proposed two-step approach makes mixture genetic models computationally feasible for high-density markers and large pedigrees. Neglecting phenotypic information to infer marker variables has limited effect on precision of estimated QTL parameters and accuracy of GEBVs for a simple pedigree without genotyped females, such as those currently available in dairy cattle. Thus, markers need to be sampled only once and results can be used for the analysis of all traits. Further research is needed to evaluate the two-step approach and the Gibbs sampler with overlapping blocks for complex pedigrees and to analyze alternative strategies for modeling LD between QTL and markers.

## APPENDIX

#### Joint distribution of allele state and allele origin variables:

Conditional on the observed, unordered marker genotype and the marker and QTL parameters, the joint distribution of allele state and allele origin variables can be written as a product of simple functions involving only a few variables and the parameters. This simplicity, which is the key to efficient computation of marginal probabilities by peeling, stems from two Markov properties of these variables. The first of these results from assuming no interference. Let *O*(*i*, *j*) denote either the maternal or the paternal allele origin for locus *j* of individual *i*. Then assuming no interference, given *O*(*i*, *j*), allele origin variables *O*(*i*, *k* < *j*) are independent of allele origin variables *O*(*i*, *m* > *j*). This conditional independence between allele origin variables is the basis for the Lander–Green algorithm (Lander and Green 1978). The second Markov property is due to Mendelian inheritance; because offspring inherit genes directly from their parents, given allele states of the parents, allele states of an individual are independent of allele states of ancestors and sibs (Sheehan and Thomas 1993). This conditional independence between allele state variables is the basis for the Eslton–Stewart algorithm (Elston and Stewart 1971). Suppose individuals are numbered such that individuals *i* ≤ *F* are founders and *i* > *F* are nonfounders. Then, these two Markov properties can be combined to write the joint distribution of allele state and allele origin variables as(A1)(Fishelson and Geiger 2002), where *y*_{ij} is the unordered marker genotype for a marker locus *j* and is the trait phenotype for a QTL *j*, *S*^{m}(*i*, *j*) and *S*^{p}(*i*, *j*) are the maternal and paternal allele state variables, and *O*^{m}(*i*, *j*) and *O*^{p}(*i*, *j*) are the maternal and paternal allele origin variables at locus *j* for individual *i*. For a marker locus, the penetrance function Pr[*y*_{ij} | *S*^{m}(*i*, *j*), *S*^{p}(*i*, *j*)] is null if the unordered genotype is not consistent with the allele state variables *S*^{m}(*i*, *j*) and *S*^{p}(*i*, *j*) and is unity if it is consistent. For a QTL, the penetrance function is a normal density with mean μ + *q*_{i}*a* and variance , where the value of *q*_{i} is −1, 0, or 1 corresponding to allele state values of *Q*_{1}*Q*_{1}, *Q*_{1}*Q*_{2}, or *Q*_{2}*Q*_{2} for *S*^{m}(*i*, *j*) and *S*^{p}(*i*, *j*). The probability Pr[**S**^{m}(*i*)] is given in Equation 2, and Pr[**S**^{p}(*i*)] is similarly written in terms of and . The variables *S*^{m}(*d*_{i}, *j*) and *S*^{p}(*d*_{i}, *j*) are the maternal and paternal allele state variables of the mother *d*_{i} of *i*. The allele origin variable *O*^{m}(*i*, *j*) is used to indicate if *i* inherited its mother's paternal allele (*O*^{m}(*i*, *j*) = 1) or maternal allele (*O*^{m}(*i*, *j*) = 0). Thus,andThe probability Pr[*S*^{p}(*i*, *j*) | *S*^{m}(*s*_{i}, *j*), *S*^{p}(*s*_{i}, *j*), *O*^{p}(*i*, *j*)] is similarly defined for the paternal allele state variable of *i* given its paternal allele origin variable and its father's allele state variables. Finally, *O*^{m}(*i*, *j*) ≠ *O*^{m}(*i*, *j* – 1) indicates a recombination between loci *j* – 1 and *j* in the maternal haplotype of *i*. Thus,for *j* = 2, … , *K*, where *r*_{j−1} is the recombination rate between loci *j* – 1 and *j*. For *j* = 1, this probability is set to 0.5 such that the maternal and paternal alleles have equal probability of being inherited. The probability Pr[*O*^{p}(*i*, *j*) | *O*^{p}(*i*, *j* – 1)] of paternal allele origin variables is similarly defined.

#### Sampling of proposal values:

The most recent accepted parameter values of the Metropolis–Hastings algorithm, denoted in the following with an asterisk, are used to derive the parameters of the proposal distributions of μ, *a*, , and the conditional QTL allele frequencies used to model marker–QTL LD. Distributions for *a* and the QTL allele frequencies, in particular, differ for the *jump* and *resample* steps, which were described at the end of the *Full-Bayesian analysis* section. The scale parameters are larger in the *jump* step than in the *resample* step to make larger steps from the last accepted value to another point in the parameters space when the MCMC algorithm tries to move from one putative QTL location to another. The purpose of the *resample* step, in contrast, is to search the parameter space in the vicinity of accepted values at the QTL location that is currently accepted. The proposal distribution is normal for μ with mean μ* and variance 0.01, gamma for *a*, scale inverse chi square for , and beta for QTL allele frequencies. The method of moments is used to derive the parameters of distributions for *a*, , and QTL allele frequencies. The shape parameter of the gamma distribution is α = *a**β, where β is the scale parameter being 100 in the *jump* step and 60 in the *resample* step. The scale inverse chi square distribution has ν = 30 d.f. and a scale parameter of . The two shape parameters of the beta distributions are and , where *p** is the most recent accepted QTL allele frequency conditional on marker allele states and σ^{2} is the desired variance of the beta-distributed proposal values being 0.02 in the *jump* step and 0.01 in the *resample* step.

## Acknowledgments

D. Habier acknowledges financial support from the Deutsche Forschungsgemeinschaft. This research was further supported by the U.S. Department of Agriculture (USDA), National Research Initiative grant USDA-NRI-2007-35205-17862, and State of Iowa Hatch and Multistate Research Funds.

## Footnotes

Communicating editor: I. Hoeschele

- Received February 19, 2010.
- Accepted March 30, 2010.

- Copyright © 2010 by the Genetics Society of America