## Abstract

Dominance (intralocus allelic interactions) plays often an important role in quantitative trait variation. However, few studies about dominance in QTL mapping have been reported in outbred animal or human populations. This is because common dominance effects can be predicted mainly for many full sibs, which do not often occur in outbred or natural populations with a general pedigree. Moreover, incomplete genotypes for such a pedigree make it infeasible to estimate dominance relationship coefficients between individuals. In this study, identity-by-descent (IBD) coefficients are estimated on the basis of populationwide linkage disequilibrium (LD), which makes it possible to track dominance relationships between unrelated founders. Therefore, it is possible to use dominance effects in QTL mapping without full sibs. Incomplete genotypes with a complex pedigree and many markers can be efficiently dealt with by a Markov chain Monte Carlo method for estimating IBD and dominance relationship matrices (). It is shown by simulation that the use of increases the likelihood ratio at the true QTL position and the mapping accuracy and power with complete dominance, overdominance, and recessive inheritance modes when using 200 genotyped and phenotyped individuals.

A variance component approach is an efficient and powerful tool for dissecting the variation due to quantitative trait loci (QTL) because of its generality and flexibility (Goldgar 1990; Amos 1994; Grignola *et al.* 1996; Almasy and Blangero 1998; Van Arendonk *et al.* 1998; Blangero *et al.* 2000; George *et al.* 2000). In the approach, QTL are treated as random effects, which makes it possible to apply complex covariance structures on the basis of general and complex pedigree. It is also feasible to simultaneously consider additional effects in the statistical model, *e.g.*, those of other QTL and their interactions (Blangero *et al.* 2000).

Quantitative trait variation associated with causal mutations in the loci can be decomposed into additive genetic effects, dominance (intralocus interactions), and epistasis (interlocus interactions). Additive genetic effects due to QTL and their interlocus interactions have been explicitly considered in several QTL mapping studies (Jansen 1993; Zeng 1993, 1994; Mitchell *et al.* 1997; Blangero *et al.* 2000; Carlborg *et al.* 2000). However, few studies about dominance in QTL mapping have been reported for outbred populations although a part of variation due to QTL may be caused by dominance. The reason for dominance not having been explicitly considered may be because common dominance effects can be predicted mainly for full sibs that do not frequently occur in outbred or natural populations with a general pedigree. Moreover, such a general pedigree structure spans several generations with complex relationships, and ancestors' genotypes are often unavailable (Haley 1999).

Identity-by-descent (IBD) coefficients between haplotypes of unrelated founders in a recorded pedigree can be estimated on the basis of linkage disequilibrium (LD) using coalescence methods (Meuwissen and Goddard 2001; Perez-Enciso 2003; Morris *et al.* 2004). This makes it possible to estimate the covariance between unrelated animals due to additive gene effects and intralocus allelic interactions. Additive genetic and dominance relationship coefficients based on LD and pedigree information give much more information about covariance between individuals for a targeted QTL region. Therefore, this approach allows estimation of additive genetic and dominance effects in an outbred or natural population without a particular designed family structure.

The aim of this study is to investigate the use of LD-based in mapping of QTL with a general complex pedigree. The mapping accuracy with and without is investigated with various levels of LD and inheritance modes. In estimating IBD coefficients and , a Markov chain Monte Carlo approach is used as it is a flexible and efficient tool to deal with the case of complex pedigree with incomplete genotypic data and many markers (Lee *et al.* 2005).

## MATERIALS AND METHODS

#### Mixed linear model:

A vector of phenotypic observations can be modeled as(1)where *y* is a vector of phenotypic observations on the trait of interest, is a vector of fixed effects, is a vector with a random polygenic effect for each of *n* individuals, *q* is a vector of *n* random additive genetic effects due to QTL, *d* is a vector of *n* random effects due to intralocus allelic interaction at the QTL, and *e* are residuals. The random effects *u*, *q*, *d*, and *e* are assumed uncorrelated with each other and normally distributed with mean zero and variance . *A* is the numerator relationship matrix based on additive genetic relationships predicted from pedigree, is the additive genotype relationship matrix at QTL whose elements are IBD probabilities between individuals computed for a putative QTL position and conditional on marker information, is the dominance relationship matrix at QTL based on IBD probabilities, and *I* is an identity matrix. *X* and *Z* are incidence matrices for the effects in , and *u*, *q*, and *d*, respectively. From (1), the associated variance covariance matrix of all observations (*V*) for a given pedigree and marker genotype set is modeled as(2)It is noted that the IBD coefficients between all haplotypes (2*n*) are transformed to a covariance between individual genotypes without loss of information, which is (*n* × *n*) (Tier and Solkner 1993; Pong-Wong *et al.* 2001; Lee and van der Werf 2006). Therefore, the number of terms corresponding to *q* reduces to *n* × *n*, which is the same for *u*. The number of terms corresponding to *d* is originally the same as that for *u* (Liu *et al.* 2002). Therefore, a single incidence matrix, *Z*, can be used for *u*, *q*, and *d*.

#### Estimation of G_{RM} and D_{RM}:

##### Probabilities of segregation states given marker data and pedigree:

Each configuration of segregation states has its own probability that is estimated on the basis of observed marker and pedigree. The probability for one configuration (*S*) is(3)where represents the observed marker data, pr() is prior probability of the segregation state, Pr( | ) is the probability of the observed marker data given , and the denominator is summed over the probabilities of all possible segregation states. Since the computation of the denominator is not generally feasible, a Markov chain Monte Carlo approach is required to obtain the posterior distribution of the segregation state.

##### Sampling segregation state using Markov chain Monte Carlo approaches:

The meiosis Gibbs sampler makes joint updates for the inheritance at linked loci for one individual at a time. For example, for the *i*th individual, segregation indicators at all loci can be sampled using a forward–backward algorithm (Thompson and Heath 1999), according to all possible segregation states for the *i*th individual, conditional on the segregation indicators for other individuals (see Lee *et al.* 2005). Joint updates for each individual result in better mixing properties and the process is much more reliable than that of a single-site Gibbs sampler (Thompson and Heath 1999). To increase mixing properties, the random walk approach (Sobel and Lange 1996) is used for the potential reducible sites in the meiosis sampler (Lee *et al.* 2005).

##### Haplotype reconstruction:

In the process of the Markov chain Monte Carlo (MCMC) sampling, marker genotypes are sampled for paternal and maternal gametes for all animals conditioned on observable variables (*e.g.*, observed marker data and pedigree information), taking into account the linkage across the chromosome. This is known as haplotype reconstruction. This procedure is performed in each sampling round.

##### QTL allelic identity-by-descent coefficients based on LD and linkage:

Given the sampled segregation state and haplotypes, it is possible to estimate QTL allelic IBD coefficients on the basis of LD and linkage. Sampled haplotypes for base animals are used to estimate LD-based IBD probabilities between unrelated base animals, using the method of Meuwissen and Goddard (2000, 2001). Sampled segregation indicators at multiple marker loci for descendants are used to estimate linkage-based IBD probabilities between relatives. There are four IBD probabilities between any pair of individuals *i* and *j* in the pedigree that are the probabilities of the paternal or maternal QTL allele of individual *i* being IBD to the paternal or maternal allele of individual *j*, given marker genotypes (Liu *et al.* 2002); *i.e*., , where *x* is the paternal (P) or maternal (M) QTL allele for each individual. From the probabilities, the additive genotype coefficient between animals *i* and *j* at the QTL isThe dominance relationship coefficient between animals *i* and *j* at the QTL is

#### Restricted maximum likelihood:

By assuming multivariate normality of the data with vector and a variance–covariance matrix of all observations, the resulting likelihood and variance components can be estimated. An efficient algorithm to obtain restricted maximum-likelihood (REML) estimates is one that uses the average of the information from the observed derived Hessian coefficients and the expected derived Fisher information coefficients (see Gilmour *et al.* 1995; Johnson and Thompson 1995). We solved the average information (AI) REML equation, directly using the variance–covariance matrix rather than using the mixed model equations (see Lee and van der Werf 2006).

#### Simulation study:

One hundred generations with a population size of 100 were simulated for 10 marker loci at 1-cM intervals and a QTL to generate LD between QTL and flanking markers beyond the recorded pedigree. In each generation, the number of male and female parents was 50 and their alleles were transmitted to descendants on the basis of Mendelian segregation using the genedropping method (MacCluer *et al.* 1986). Parents were randomly mated with a total of two offspring for each of 50 mating pairs. Note that no pedigree information was recorded for these 100 generations.

The number of base alleles in each marker locus was four and starting allele frequencies were all at 0.25. The marker alleles were mutated at a rate of 4 × 10^{−4}/generation (Dallas 1992; Weber and Wong 1993; Ellegren 1995). A QTL was located at the middle of the fourth and fifth markers, and unique numbers were assigned to the base population alleles. In generation 100 and afterward, phenotypic values for individuals were simulated as(4)The population mean (μ) was 100, values for polygenic effects () were drawn from with = 25 (*A* is numerator relationships among individuals calculated since generation *t* + 1), and values for residuals () were from with = 50. The value due to QTL genotypes () for each individual was decomposed as additive effects of paternal and maternal QTL alleles and their nonadditive allelic interaction; that is,(5)where is the paternal (maternal) QTL allele and is interaction between the two alleles. Three cases were considered for intralocus allelic interaction for the QTL. The favorable QTL allele () had an additive value of 7 (*a* = 7) and allelic interaction with a particular allele (say ) gave a value of 14 (*d* = 7) for case 1 or 21 (*d* = 14) for case 2. The last one (case 3) was that homozygous recessive alleles (any pair of ) gave a value of 14 and otherwise no effects (Table 1). The allele , , or was randomly chosen from the base alleles with a frequency of >0.1 and <0.9. Since the number of alleles and their frequencies were stochastically simulated, the ratio of additive and dominance QTL variance over phenotypic variance (*i.e.*, and ) varied for each case (Table 1).

The likelihood ratio [LR = 2()] over 50 replicates was estimated in LD and linkage (LDL) mapping with or without fitting nonadditive intralocus allelic interaction (*i.e*., with or without using ). Both LR and proportion of replicates positioning QTL for each marker bracket are plotted against genomic region as an indication of the mapping accuracy. The power was calculated as the proportion of replicates detecting QTL with LR value that was >6.6 (= ). Analyses were carried out for a general pedigree of 2 generations (generation 100–∼101), varying marker density or past effective size with changing levels of LD. In addition, results from LDL mapping and linkage-only mapping are compared using a pedigree of 5 generations (generation 100–∼104). It is noted that a pedigree of 2 generations does not have enough linkage information to estimate in linkage mapping alone. In all cases, marker genotypes and phenotypes were available for the last 2 generations (200 animals) for a fair comparison. For the pedigree of 5 generations, ancestral genotypes from generation 100–102 were totally missing.

The middle point of each marker bracket is tested for QTL signal. Therefore, nine and matrices were generated. Variance components for model parameters (, , , and ) were estimated with each set of and , using REML.

## RESULTS

#### Increased accuracy of estimated position:

The averaged value of LR and the proportion of replicates positioning the QTL with or without for each putative QTL position is plotted against the genomic region. When using a no-interaction model, the LR curve with is almost identical to that without ; *e.g.*, at the true QTL position, the LR is 20.7 with and 20.2 without (Figure 1A). The density curve for the proportion of replicates positioning the QTL for each position is also very similar for both models with and without (Figure 1A). Whether is used or not, the QTL is significantly detected for all replicates (the power is 1). This shows that the use of has no effect on the mapping performance when dominance effects are absent.

In case 1, the LR at the correct QTL position is much higher with used to fit the effects of allelic interaction (31.1) than without (26.8) (Figure 1B). The proportion of replicates positioning the QTL at the correct location is 0.62 with , whereas the proportion is 0.46 without (Figure 1B) although the power is 0.96 for both models with and without .

In case 2 (*i.e.*, overdominance model), the LR value at the correct QTL position is substantially higher with (49.5) than without (34.4) (Figure 1C). The proportion positioning the QTL on the correct location is 0.74 with and 0.54 without (Figure 1C). The power is 0.98 for both models. Compared to the result in case 1 (Figure 1B), the averaged LR value and the accuracy are higher and the difference between the models with and without is much bigger.

In case 3 (*i.e*., recessive model), the LR curve without is relatively flat and low even at the correct QTL position (LR = 2.7) (Figure 1D). However, the LR curve with is highly peaked on the correct QTL position (LR = 7.7). The proportion positioning the QTL at the correct location is 0.32 with and 0.22 without (Figure 1D). The power is much higher with (0.5) than without (0.14).

#### Effect of marker density and past effective size:

Since the IBD and dominance relationship coefficients are based on LD information, the performance of the proposed method may depend on the levels of LD. Lower LD could arise from using either a lower marker density or a higher past effective size. Figure 2 shows the patterns of LR values averaged over replicates when using two different marker densities (1 and 5 cM). The difference between the LR values at the correct QTL position with and without is significant (31.09 and 26.76) with a 1-cM marker spacing. However, the difference is negligible (9.16 and 8.79) in the case of a 5-cM marker spacing. The overall LR values across the genome region are much lower with the lower marker density.

Figure 3 shows LR values when using two different past effective sizes (100 and 400). The difference between the performances with and without is much bigger when using an effective size of 100 (31.09 and 26.76) than when using an effective size of 400 (11.38 and 10.73). The overall LR value substantially decreases with the higher value of past effective size.

The level of LD can be predicted given the length of a chromosomal region and the value of past effective size,(6)(Sved 1971), where is past effective size and *c* is the recombination rate of the chromosomal region. *E*(*r*^{2}) is defined here as the expected probability of the chromosomal region being IBD when two random haplotypes are taken from the population (Hayes *et al.* 2003). For the parameters used in the simulated data, the expected *r*^{2} is 0.2 with 1 cM of marker spacing and effective size of 100, 0.05 with 5 cM of marker spacing and effective size of 100, and 0.06 with 1 cM of marker spacing and effective size of 400.

This shows that when the levels of LD are low (∼0.05), the use of does not improve the value of the LR. It is also noted that the efficiency of LDL mapping decreases with lower LD. This is because IBD and dominance relationship coefficients are based on LD information.

#### Comparison between LDL mapping and linkage mapping alone:

The accuracy of estimated QTL position with LDL mapping or linkage mapping is estimated and compared when using a five-generation pedigree where only the last two generations are genotyped (incomplete genotypes). The LR values with or without are estimated with a complete dominance model.

The accuracy of estimated QTL position with LDL mapping is much higher than that with linkage mapping (Figure 4). The LR value with LDL mapping is fairly peaked at the true QTL position and high (19.23 with and 17.47 without ). However, the LR curve with linkage mapping alone is flat and low (∼4 on the true QTL with or without ) and gives little useful information about QTL position. It should be noted that analysis with is clearly better than that without in LDL mapping. However, there is no difference in linkage mapping (Figure 4). This shows that dominance effects are not captured by linkage information in a general pedigree.

#### Estimated variance components:

Table 2 shows estimated parameters with or without when a complete dominance model is considered. The ratio of polygenic variances over phenotypic variance () is not much biased for analyses both with and without using . The ratio of additive QTL variance over phenotypic variance is relatively accurate with using whereas it is overestimated without using . The ratio of nonadditive QTL dominance variance over phenotypic variance is overestimated with using . This is probably due to the fact that the number of records is small with insufficient information to accurately estimate dominance variances, even though the power of QTL detection is increased due to (see discussion).

## DISCUSSION

The use of in variance component QTL mapping (in both linkage mapping and LDL mapping) has not been tested before for outbred populations. This is because lack of full sibs would give little or no information to estimate the (co)variance due to dominance effects (Goddard and Meuwissen 2005). This is true for the case that is based on linkage information alone (as illustrated in Figure 4), but IBD coefficients and based on LD provide much more information. Our study investigated the usefulness of based on combined LD and linkage in QTL mapping. It was shown that the mapping resolution was higher with using when was higher.

Previous reports on LD variance component mapping (Meuwissen and Goddard 2000, 2001; Meuwissen *et al.* 2002; Blott *et al.* 2003) have ignored intralocus interaction. We constructed based on combined LD and linkage information with a general complex pedigree of two generations using a MCMC method (Lee *et al.* 2005) and investigated whether power and accuracy of mapping increases with the . Moreover, we showed that the MCMC method made it possible to construct IBD matrices and with a complex pedigree of several generations (approximately five generations) having a large proportion of missing genotypes. The recursive method described in Liu *et al.* (2002) could not deal with missing genotypes.

In cases 1 and 2, the power was high (>0.95) and the same for both models with and without . This was probably due to the fact that additive QTL effects were already large enough to provide significant QTL signals ( = 0.23–∼0.28) whether was used or not. However, the proportion of replicates positioning the QTL on the correct location was significantly increased with , which probably gave more information about the QTL position. When additive QTL effects were small as in case 3, it was clearly shown that the power as well as the accuracy was much higher with than without .

The LR values with a five-generation pedigree were lower than those with a two-generation pedigree. This was probably because haplotype reconstruction for the founders was based on descendants' genotypes, and this was less accurate compared to using their own genotypes as in the two-generation pedigree. In addition to, a high proportion of missing genotypes and complex relationships may often cause noncommunicating classes in the Markov chain (*i.e.*, a reducibility problem). However, the LR values were still fairly peaked on the true QTL position with a five-generation pedigree using LDL mapping.

Our dominance model in case 1 assumed interaction only between and . An alternative interaction model was considered where showed an allelic interaction with any other allele (), giving a value of 14. The results from this alternative were very similar to case 1 in that the LR values with were significantly higher than those without (Figure 5). This indicates that the proposed method performs well for different intraallelic interaction models.

In all cases, only a single QTL was involved with polygenic effects and residuals. The question is whether still performs well when multiple dominance QTL are linked in a chromosome. We simultaneously simulated three QTL across a 100-cM genomic region, each with a different inheritance mode. The first QTL at 13.5 cM was additive (*a* = 7), the second QTL at 46.5 cM showed complete dominance (*a* = 5 and *d* = 5), and the third QTL at 76.5 cM was recessive (*a* = 7 and *d* = −7). The values for polygenic effects and residuals were the same as before. Thirty-eight markers were spanned across the region such that markers were basically positioned at 10-cM intervals, and 9 markers were added and positioned at 1-cM intervals for each region from 10 to 20 cM, from 40 to 50 cM, and from 70 to 80 cM. Twenty-five replicates were carried out. The average ratios of additive and dominance variance are = 0.14, 0.13, and 0.089 and =0, 0.031, and 0.056 for the first, second, and third QTL, respectively. As shown in Figure 6, the pattern of LR curves clearly showed that the use of improved the mapping resolution more with higher even when mapping simultaneously with other QTL. The LR value for the first QTL was the same for both models with and without . However, the LR value with is higher for the second QTL and much higher for the third QTL.

The distribution of the estimated in replicates was upwardly skewed (Figure 7A), which caused overestimated (Table 2). When the number of genotyped and phenotyped individuals used in the analysis increased to 1000, the upward bias was much reduced (Figure 7B). The mode of the distribution coincided with the true observed value for . Although the power to map the QTL on the correct position was high with a few hundred records, accurate estimation of variance parameters may require more numbers of records.

Another reason for upward bias was probably because the dominance variances were estimated and averaged over for replicates where it was significant, which resulted in a truncated distribution of the estimation (Beavis 1994, 1998; Xu 2003). Bayesian model averaging (see Ball 2001; Sillanpaa and Corander 2002) could be used for this problem, which could reduce the upward bias of QTL effect estimates.

Inbreeding may produce a correlation between additive and dominance effects that was ignored in constructing the variance–covariance matrix (2). De Boer and Hoeschele (1993) reported that unbiased predictions of additive and dominance effects were obtained with only slightly reduced accuracies without considering the correlation between additive and dominance values even with highly inbred populations.

In this study, it was shown that LR values and mapping accuracy with using were significantly higher than those without using although the dominance variance was relatively small ( = 0.03–∼0.13). This implies that the proposed approach can be useful for economical traits in livestock populations where a significant proportion of genetic variation is caused by dominance effects (Gengler *et al.* 1997; Misztal *et al.* 1998). For mapping single-gene characteristics, *e.g.*, a recessive disease gene, the proposed approach can also be an alternative to homozygosity mapping (Lander and Botstein 1987; Miano *et al.* 2000).

## Acknowledgments

We are grateful for the comments from anonymous reviewers. This study was supported by Australian Wool Innovation, Sygen, and Sheep Genomics Australia.

## Footnotes

Communicating editor: M. Nordborg

- Received May 14, 2006.
- Accepted August 8, 2006.

- Copyright © 2006 by the Genetics Society of America