Abstract
There is a growing need for the development of statistical techniques capable of mapping quantitative trait loci (QTL) in general outbred animal populations. Presently used variance component methods, which correctly account for the complex relationships that may exist between individuals, are challenged by the difficulties incurred through unknown marker genotypes, inbred individuals, partially known or unknown marker phases, and multigenerational data. In this article, a two-step variance component approach that enables practitioners to routinely map QTL in populations with the aforementioned difficulties is explored. The performance of the QTL mapping methodology is assessed via its application to simulated data. The capacity of the technique to accurately estimate parameters is examined for a range of scenarios.
With the widespread usage of genetic markers in helping to detect and localize quantitative trait loci (QTL), marker data are becoming available on human and livestock populations with increasingly complex pedigree structures. QTL analysis in such populations is challenging because the number of alleles segregating at the QTL is unknown, the marker phases may be unknown or only partially known, the marker and QTL allele frequencies must be estimated from the data, inbreeding loops may exist in the pedigree, and markers may be noninformative or ungenotyped. Although it is possible to simplify the analysis of complex pedigree data by fragmenting the pedigree into smaller component families, methods that fully account for complex relationships between individuals are expected to provide greater power to detect QTL (Almasy and Blangero 1998).
The literature on mapping QTL in populations with complex pedigrees can be classified according to the allelic assumptions associated with the QTL. Mapping methods assume that the QTL is either a fixed effect with a finite number of alleles or a random effect with an infinite number of alleles.
Analyses of statistical models that treat the QTL as a fixed effect range from simple regression-based methodologies (Knott et al. 1996) to complex statistical analyses involving Markov chain Monte Carlo (MCMC) methods within frequentist (Heath 1997; Jansen et al. 1998) and Bayesian (Uimari and Hoeschele 1997; George et al. 2000) paradigms. The statistical models are mixture distributions, where the number of component densities is determined by the number of QTL genotypes, and assumptions regarding the total number of segregating alleles have a profound effect on the formulation of the statistical model.
Random effects models, based upon the simple premise that individuals of like phenotype are more likely to share genes identical-by-descent (IBD), offer a less parameterized statistical environment in which to map QTL. This environment is obtained by assuming the QTL effects are normally distributed, an assumption that circumvents the estimation of QTL allele frequencies and is robust to violation (Hoeschele et al. 1997).
Random effects models have long been utilized by human geneticists interested in partitioning the genetic variance of quantitative traits into effects due to specific chromosomal regions. As early as the 1970s, variance component approaches (i.e., analytical methods that estimate the parameters of random effects models) were being used to detect QTL in phase-known pedigrees (Jayakar 1970). Since then, the development of increasingly sophisticated variance component methods has enabled QTL to be mapped in increasingly general pedigrees (Amos 1994; Almasy and Blangero 1998).
In contrast to the long association human geneticists have had with random effects models, animal geneticists’ acceptance of QTL as random effects is relatively recent. Fernando and Grossman (1989), Hoeschele (1993), and Van Arendonk et al. (1994) began by assuming the QTL variance and location, among other parameters, were known. These parameters were later estimated with a single-marker single-QTL model (Grignola et al. 1996a,b) and with a model for multiple linked markers and QTL (Grignola et al. 1997). To date, QTL mapping in outbred animal populations has been confined to experimental populations (e.g., daughter and granddaughter designs). This can be attributed to the availability of data and to the complexities associated with calculating (co)variance matrices for QTL effects given multigenerational pedigrees.
The aim of this article is to present to the animal genetics community a new two-step variance component method that is capable of routinely mapping QTL in populations with considerable missing marker information and complex pedigree structures. The methodology is based upon an interval mapping procedure and begins by utilizing available marker data and pedigree information to calculate the (co)variance matrices associated with a QTL at a particular position on the genome. Once the (co)variance matrix is calculated, the mixed linear model is constructed and parameter estimates are obtained. This two-step process of calculating the (co)variance matrix and estimating the parameters of the mixed linear model is repeated for each position on the genome. A test statistic measuring QTL presence is then obtained from which position and size can also be determined. The ability of this method to analyze complex pedigree data is owed to the recently upgraded and freely available software package Loki (Heath 1997). Loki enables the IBD probabilities at a QTL to be calculated between all pairs of individuals given considerable missing information and pedigree complexities. These IBD probabilities are used to construct the QTL’s (co)variance matrix.
MATERIALS AND METHODS
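Mixed linear models: When constructing a mixed linear model that accounts for a QTL, the quantitative trait is commonly assumed to be controlled by a linear combination of fixed effects, putative QTL, and additive residual (polygenic) effects. The polygenic effects account for the cumulative result of all loci affecting the quantitative trait that are unlinked to the QTL. Mixed linear models can be constructed at the animal or gametic level. In this article, an animal model is presented, which, in matrix notation, is defined as
y = Xβ + Zu + Zv + e, (1)
where y is the vector of phenotypes, β is the vector of fixed effects, u and v are the vectors of additive polygenic and additive QTL effects, e is the vector of residuals, and X and Z are the corresponding incidence matrices.
The random effects u, v, and e are assumed to be uncorrelated and distributed as multivariate normal densities as follows:
u ∼ N(0, Aσ^{2}_{u}), v ∼ N(0, Gσ^{2}_{v}), e ∼ N(0, Iσ^{2}_{e}),
where A is the additive genetic relationship matrix, G is the (co)variance matrix of the QTL effects, and I is an identity matrix.
When no QTL is assumed to be segregating in the population, the mixed linear model in matrix notation becomes
y = Xβ + Zu + e. (2)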
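Because the random effects are assumed to be uncorrelated, the (co)variance matrix of the phenotypes implied by the QTL model follows directly. The expression below is a sketch under assumed notation rather than a formula quoted from the text: A denotes the additive genetic relationship matrix, G the (co)variance matrix of the QTL effects, I an identity matrix, and σ_u², σ_v², and σ_e² the polygenic, QTL, and residual variances.

```latex
\begin{equation*}
\operatorname{Var}(\mathbf{y}) = \mathbf{V}
  = \mathbf{Z}\mathbf{A}\mathbf{Z}'\,\sigma^{2}_{u}
  + \mathbf{Z}\mathbf{G}\mathbf{Z}'\,\sigma^{2}_{v}
  + \mathbf{I}\,\sigma^{2}_{e}
\end{equation*}
```

Setting σ_v² = 0 removes the QTL term and gives the (co)variance matrix under the model without a QTL.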
Calculating the IBD probabilities for the G matrix: In practice, QTL genotypes are unobservable. Instead, linked markers are genotyped and used to infer QTL IBD status. The marker information in complex pedigrees is often incomplete. Unknown linkage phases, noninformative markers, and/or missing marker genotypes complicate the calculation of G. Several methods for calculating IBD probabilities in complex pedigrees have been developed. These methods fall into one of three classes: recursive algorithms, correlation-based algorithms, or simulation-based algorithms.
Recursive algorithms: Recursive algorithms to calculate IBD probabilities for a QTL’s gametic relationship matrix were developed by Van Arendonk et al. (1994) and Wang et al. (1995). These algorithms can also be used to construct G since a simple linear relationship exists between the (co)variance matrix used in animal QTL models and the gametic relationship matrix. That is, g_{ij} = 0.5 Σ_{s=m,p} Σ_{t=m,p} g_{isjt}, where s, t ∊ {maternal (m), paternal (p)} and g_{isjt} represents the probability of the sth parental gamete inherited from individual i being IBD to the tth parental gamete inherited from individual j. The calculation of the gametic IBD probabilities is based upon information from a single fully genotyped marker linked to a QTL. Extensions to linked phase-known marker data were made by Grignola et al. (1996a).
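The linear relationship between the gametic matrix and G can be illustrated with a short sketch. The storage layout is an assumption made for the example only: rows and columns 2i and 2i + 1 of the gametic matrix are taken to hold the maternal and paternal gametes of individual i, and the function name is hypothetical.

```python
def animal_ibd_matrix(gametic):
    """Collapse a 2n x 2n gametic IBD matrix into an n x n animal-level matrix.

    Rows/columns 2*i and 2*i + 1 are assumed to hold the maternal and paternal
    gametes of individual i (an ordering chosen for this sketch).
    """
    n = len(gametic) // 2
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for s in (0, 1):      # maternal, paternal gamete of i
                for t in (0, 1):  # maternal, paternal gamete of j
                    total += gametic[2 * i + s][2 * j + t]
            G[i][j] = 0.5 * total  # g_ij = 0.5 * sum_s sum_t g_isjt
    return G


# Made-up example: a non-inbred mother (0) and her offspring (1), where the
# marker gives no information on which maternal allele was transmitted.
gametic = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.5, 0.0],
    [0.5, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
G = animal_ibd_matrix(gametic)
```

For non-inbred individuals the diagonal elements equal 1, and the mother-offspring element equals 0.5 whichever maternal allele was transmitted.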
Recursive algorithms are an effective and economical way of calculating IBD probabilities given the availability of full marker information; however, this requirement is difficult to guarantee for complex pedigrees. Wang et al. (1995) discussed a nonstochastic approach to handling missing marker information while maintaining the recursive integrity of the algorithm; however, large amounts of missing marker information render the algorithm intractable. Furthermore, recursive algorithms follow a “top-down” strategy beginning with the calculation of IBD probabilities for the parents and using these estimates to infer the IBD probabilities of the offspring. Missing information on individuals early in the pedigree introduces estimation errors that propagate throughout the pedigree because recursive algorithms are incapable of utilizing information that is not otherwise passed down through the parents.
Correlation-based algorithms: Almasy and Blangero (1998) developed an alternate approach for IBD probability calculation. Their methodology espouses the IBD correlation relationships of Amos (1994), who, in matrix notation, showed
Almasy and Blangero (1998) have made a significant contribution to the advancement of correlation-based algorithms; however, little attention is paid to the difficulties of calculating G_{M} given missing marker information. The authors suggest Monte Carlo methods to impute the missing marker genotypes but irreducibility (i.e., the ability of a sampler to visit any consistent state in a parameter space with positive probability) of the chains is difficult to assess and guarantee. Issues relating to the use of Monte Carlo methods to infer missing marker genotypes are discussed in further detail below.
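Simulation-based algorithms: For pedigrees with incomplete marker information, direct application of recursive or correlation-based IBD algorithms is impossible. In this situation, G is often replaced by its expectation conditioned on the observed marker data (M_{obs}) such that
E(G | M_{obs}) = Σ_{ω} G_{ω} Pr(ω | M_{obs}), (4)
where the sum is over the configurations ω of the missing marker information that are consistent with M_{obs} and G_{ω} is the (co)variance matrix given configuration ω.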
Calculating the expectation of G for pedigrees containing substantial missing data presents two computational challenges. First, the number of configurations in ω is potentially large, thus the order of the summation in (4) makes the calculation infeasible. In practice, a Monte Carlo approximation is used (see Grignola et al. 1996a). Second, the exact calculation of Pr(ω | M_{obs}) is intractable. Exact methods such as the Elston and Stewart (1971) algorithm and peeling algorithms (Cannings et al. 1978) are exponential in pedigree complexity and marker polymorphicity. Instead, practitioners rely on simulation techniques, namely MCMC methods.
A plethora of MCMC algorithms have been developed for the exploration of ω and thus approximation of Pr(ω | M_{obs}). Among the simplest are the “single-site” approaches (Sheehan 1990), which update each locus for each individual separately. The individual’s genotype is updated, conditioned upon the individual’s phenotype and the current genotypes of the parents, spouses, and offspring. Unfortunately, single-site samplers can possess poor “mixing” qualities for complex pedigrees and irreducibility of the chains can be ensured only for biallelic loci (Lin et al. 1994). Difficulties in exploring ω stem from the observed marker data constraining the set of missing marker configurations. Not all marker configurations are consistent with Mendelian inheritance rules. Several more complex samplers (Lin et al. 1993; Geyer and Thompson 1995; Lund and Jensen 1998) that reportedly ensure irreducibility have been suggested; however, irreducibility of the chains can still not always be guaranteed as discovered by Jensen and Sheehan (1998).
These difficulties prompted Thompson (1994) to devise an alternate sampling strategy which can be used for a variety of tasks including the estimation of G. It has long been recognized that segregation events (i.e., the separation of alleles at a locus during meiosis) govern the inheritance of genetic material from parent to offspring. In fact, marker genotypes are merely the observed results of segregations. Thompson (1994) developed a sampler, based upon segregation indicators that are binary variables modeling segregations, to explore the set of possible segregation configurations (Λ). This then allowed G_{s}, the (co)variance matrix for a QTL conditioned on the segregation indicators, and Pr(s | M_{obs}) to be estimated where s ∊ Λ. Also the expectation of G can be easily calculated as E(G | M_{obs}) = Σ_{s∊Λ} G_{s} Pr(s | M_{obs}). The space of segregation indicators is far less constrained than the space of missing genotypes, culminating in Monte Carlo chains with improved convergence and irreducibility properties. A single-site Metropolis-Hastings sampler was developed by Thompson (1994) and later extended to the simultaneous updating of multiple sites in Thompson and Heath (1999).
Segregation indicators and their use in estimating G: Using notation consistent with Thompson and Heath (1999), the segregation indicator (S_{ij}) equals 0 if the inherited allele at the ith segregation and the jth locus is the parent’s maternal allele. Alternately, S_{ij} = 1 if the inherited allele at the ith segregation and the jth locus is the parent’s paternal allele. The set of segregation indicators for the m segregations in the pedigree and the n loci, where these loci may be marker loci and/or QTL, is represented by s = {S_{ij}; i = 1, ..., m; j = 1, ..., n}.
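To make the definition concrete, the sketch below traces founder alleles through a pedigree at a single locus for one given set of segregation indicators. The pedigree, the allele labels, and the function name are hypothetical and chosen only for illustration, and each individual is assumed to have either both parents known or both unknown.

```python
def trace_founder_alleles(pedigree, seg):
    """Trace founder allele labels through a pedigree at a single locus.

    pedigree: list of (individual, mother, father), parents listed before
              their offspring; founders have mother = father = None.
    seg:      dict mapping (individual, "m") / (individual, "p") to 0 or 1 for
              the segregation from the mother / father. 0 means the parent's
              maternal allele was transmitted, 1 its paternal allele.
    Returns a dict: individual -> (maternal allele label, paternal allele label).
    """
    alleles = {}
    for ind, mother, father in pedigree:
        if mother is None and father is None:
            # Each founder carries two unique alleles.
            alleles[ind] = (f"{ind}m", f"{ind}p")
        else:
            maternal = alleles[mother][seg[(ind, "m")]]
            paternal = alleles[father][seg[(ind, "p")]]
            alleles[ind] = (maternal, paternal)
    return alleles


# Hypothetical pedigree: 1, 2, 3 are founders; 4 = 1 x 2; 5 = 4 x 3.
pedigree = [(1, None, None), (2, None, None), (3, None, None),
            (4, 1, 2), (5, 4, 3)]
seg_a = {(4, "m"): 0, (4, "p"): 0, (5, "m"): 0, (5, "p"): 1}
seg_b = {(4, "m"): 0, (4, "p"): 0, (5, "m"): 1, (5, "p"): 1}
alleles_a = trace_founder_alleles(pedigree, seg_a)
alleles_b = trace_founder_alleles(pedigree, seg_b)
```

Two gametes are IBD under a given set of indicators when they carry the same founder label. Changing a single indicator (here the maternal segregation of individual 5) changes the founder allele to which its maternal gamete traces back.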
Consider the pedigree depicted in Figure 1, where ordered marker information is recorded on a single locus (i.e., x y implies x is the allele inherited from the maternal parent and y is the allele inherited from the paternal parent). Shown are three different sets of segregation indicators consistent with the observed marker data and pedigree structure. These segregation indicators give possible allelic pathways through the pedigree. Since segregation events are not directly observable, several segregation patterns may be consistent with the same set of marker data.
By obtaining a large number of s with probability Pr(s | M_{obs}), these segregation indicators can be used to estimate IBD probabilities between any pair of individuals in the pedigree. For example, in the first and third segregation patterns in Figure 1, the maternal allele of individual 6 originates from (or is IBD to) the maternal allele of individual 1, while in the second segregation pattern, the maternal allele of individual 6 originates from (or is IBD to) the maternal allele of individual 2. Therefore, based upon these realizations, Pr(6_{m} ≡ 2_{m} | M_{obs}) = 1/3 and Pr(6_{m} ≡ 1_{m} | M_{obs}) = 2/3, where ≡ represents IBD.
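The calculation in this example is a relative frequency over realizations, which can be sketched as follows. The function name and the allele labels are illustrative assumptions, and the realizations are weighted equally, as is appropriate when they are draws from Pr(s | M_{obs}).

```python
from collections import Counter


def ibd_origin_probabilities(origins):
    """Relative frequency of each founder allele among sampled realizations."""
    counts = Counter(origins)
    total = len(origins)
    return {allele: count / total for allele, count in counts.items()}


# Origin of the maternal allele of individual 6 in the three segregation
# patterns described for Figure 1.
probs = ibd_origin_probabilities(["1m", "2m", "1m"])
```

With many realizations from the sampler in place of these three, the same tally yields the Monte Carlo estimate of each IBD probability.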
Multiple-site segregation sampler: A brief introduction to the multiple-site segregation sampler, as developed by Thompson and Heath (1999) and employed in this article, is now presented. Readers who wish to pursue a more rigorous derivation are invited to read Thompson and Heath (1999).
Very simply, the multiple-site segregation sampler is a cleverly designed Gibbs sampler (Geman and Geman 1984) with batch updating, which allows IBD probabilities to be calculated in pedigrees with unknown marker genotypes and unknown marker phases. Exploration of the joint density Pr(s | M_{obs}), which may be of high dimension when the pedigree is large, is facilitated through the sampling of m simpler n-dimensional conditional distributions such that
The first step involves moving through the genome, calculating locus by locus, cumulative probabilities for S_{ij}. These probabilities are relatively easy to calculate recursively. Once all n cumulative probabilities have been obtained, the second step involves moving back down the genome, sampling S_{ij} from a univariate density that is a function of the associated cumulative probability, the previous sampled segregation indicator (S_{i,j+1}), and the recombination rate between loci j and j + 1. In this way, s_{i•} can be sampled from its conditional distribution. By repeating these two steps for i = 1, ..., m, a realization from (5) is obtained.
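The two passes described above have the structure of forward filtering followed by backward sampling on a two-state chain. The sketch below illustrates that structure for a single segregation and is not the implementation used in Loki. The per-locus weights are a hypothetical stand-in for the likelihood of the marker data given S_{ij} = 0 or 1, which in the real sampler depends on the current values of all other segregations in the pedigree; the two weights at a locus are assumed not to be both zero.

```python
import random


def sample_meiosis(weights, theta, rng=random):
    """Sample indicators S_1..S_n for one meiosis over n ordered loci.

    weights: list of (w0, w1) per locus; relative likelihood of the data at
             that locus given S_j = 0 or 1 (stand-in for pedigree terms).
    theta:   list of n - 1 recombination fractions between adjacent loci.
    """
    n = len(weights)
    # Forward pass: filtered[j][k] = Pr(S_j = k | data at loci 1..j).
    filtered = []
    prior = (0.5, 0.5)  # Mendelian segregation at the first locus
    for j in range(n):
        if j > 0:
            prev, r = filtered[j - 1], theta[j - 1]
            prior = (prev[0] * (1 - r) + prev[1] * r,
                     prev[0] * r + prev[1] * (1 - r))
        f0, f1 = prior[0] * weights[j][0], prior[1] * weights[j][1]
        norm = f0 + f1
        filtered.append((f0 / norm, f1 / norm))
    # Backward pass: draw S_n, then each S_j given the sampled S_{j+1}.
    s = [0] * n
    s[n - 1] = 1 if rng.random() < filtered[n - 1][1] else 0
    for j in range(n - 2, -1, -1):
        r = theta[j]
        p0 = filtered[j][0] * ((1 - r) if s[j + 1] == 0 else r)
        p1 = filtered[j][1] * ((1 - r) if s[j + 1] == 1 else r)
        s[j] = 1 if rng.random() < p1 / (p0 + p1) else 0
    return s
```

Sampling the whole vector for one segregation in a single draw, rather than one locus at a time, is what distinguishes this batch update from a single-site update.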
Implementation of the multiple-site segregation sampler: Implementation of the multiple-site segregation sampler is via an adapted version of the QTL mapping software Loki. Loki was originally designed for multipoint linkage analysis in general pedigrees using MCMC methods; however, it has since been modified for IBD probability calculation. The user supplies Loki with the pedigree structure, marker genotypes, marker positions, and QTL positions for which the IBD matrices are to be calculated. Dependent chains of IBD probabilities are then obtained for each QTL position. Convergence is determined by monitoring the IBD probabilities over the iteration number. Once the probabilities stabilize, the sampler is deemed to have reached convergence.
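One simple way to automate the monitoring described above is to track the running mean of a sampled IBD indicator and ask whether it has stopped moving. The following is a sketch of such a check and not a feature of Loki; the function names, the window length, and the tolerance are arbitrary assumptions.

```python
def running_means(draws):
    """Cumulative mean of a chain of sampled IBD indicators (0/1)."""
    means, total = [], 0.0
    for k, x in enumerate(draws, start=1):
        total += x
        means.append(total / k)
    return means


def has_stabilized(draws, window=1000, tol=0.01):
    """True if the running mean moved by less than tol over the last window draws."""
    means = running_means(draws)
    if len(means) <= window:
        return False
    recent = means[-window:]
    return max(recent) - min(recent) < tol
```

A stable running mean is a necessary rather than a sufficient sign of convergence, so a check of this kind complements visual inspection and multiple runs from dispersed starting values.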
Two-step variance component approach: The variance component approach to map QTL in complex pedigrees is composed of two distinct steps:
Step 1. For each QTL position on the chromosomal segment, the (co)variance matrix for the QTL (i.e., G) is calculated.
Step 2. For each position considered in step 1, construct the mixed linear models (1) and (2), obtain estimates of the parameters, and test for the presence of a QTL.
These steps are common to all interval mapping-based variance component methods; however, their implementation differs greatly among practitioners. For example, there are various approaches to calculating the G matrix that have already been discussed and there are numerous analytical and simulation techniques for estimating the parameters of a mixed linear model.
With regard to the implementation strategy adopted in this article, in step 1 the IBD probabilities for the G matrix are obtained via the multiple-site segregation sampler. In step 2 ASREML (Gilmour et al. 1998) provides restricted maximum-likelihood (REML) estimates of (1) and (2). ASREML was chosen over other available REML packages due to its ability to handle large user-defined (co)variance matrices. To test for the presence of a QTL against no QTL at a particular chromosomal position, the test statistic log LR = −2 ln(L_{0}(H_{0}, no QTL present)/L_{1}(H_{1}, QTL present)) is constructed, where L_{1} and L_{0} represent the respective likelihood values of (1) and (2) evaluated at the REML solutions.
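The scan over test positions can be outlined as below. This is a sketch of the control flow only: fit_qtl_model is a hypothetical callable standing in for both steps at one position (the Loki run that yields G and the ASREML fit of the QTL model), and it is assumed to return the maximized log-likelihood ln L_{1}.

```python
def log_lr(loglik_null, loglik_qtl):
    """log LR = -2 ln(L0 / L1), computed from the two maximized log-likelihoods."""
    return -2.0 * (loglik_null - loglik_qtl)


def scan_chromosome(positions, loglik_null, fit_qtl_model):
    """Return the log LR profile and the position at which it peaks.

    fit_qtl_model(position) is a hypothetical callable that carries out both
    steps at one position and returns ln L1.
    """
    profile = [(pos, log_lr(loglik_null, fit_qtl_model(pos))) for pos in positions]
    best_pos, best_stat = max(profile, key=lambda item: item[1])
    return profile, best_pos, best_stat


# Made-up likelihood surface that peaks at 35 cM, for illustration only.
def fake_fit(pos):
    return -100.0 - 0.01 * (pos - 35) ** 2


profile, best_pos, best_stat = scan_chromosome([5, 20, 35, 50], -105.0, fake_fit)
```

The model without a QTL does not depend on the test position, so its log-likelihood needs to be computed only once per data set.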
Distribution of the test statistic: Statistical theory states that log LR follows a χ^{2} distribution with the degrees of freedom equal to the number of parameters being tested (Wilks 1938). However, in the context of interval mapping, the asymptotic behavior of log LR is under nonstandard conditions since the null hypothesis places parameters on the boundary of the parameter space defined by the alternative hypothesis (Stram and Lee 1994). Furthermore, the distribution of log LR under H_{0} is influenced by the chromosomal segment length, the degree of missing marker data, and the distributional properties of the trait.
When a single chromosomal position is being tested, log LR follows a 50:50 mixture distribution, where one mixture component is a peak at 0 and the other component is a χ^{2} distribution with 1 d.f.
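A pointwise P value under this mixture can be computed without special libraries. The sketch assumes that the continuous component is a χ² distribution with 1 d.f., the standard result when a single variance component is tested on the boundary of its parameter space; the function name is hypothetical.

```python
import math


def mixture_pvalue(stat):
    """P value under a 50:50 mixture of a point mass at 0 and a chi-square(1)."""
    if stat <= 0:
        return 1.0
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))
```

Under this assumption the pointwise 5% threshold is the 90th percentile of the χ² distribution with 1 d.f., ∼2.71, rather than 3.84. The calculation applies to a single test position only and does not account for a search along the chromosome.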
Since this article deals with simulated data, it is possible to replicate data under the null hypothesis, construct the empirical distribution of log LR, and derive empirical threshold values with which to determine QTL presence. For real data, permutation methods (Churchill and Doerge 1994) have been suggested. The large number of required analyses, though, limits the methodology to relatively small pedigrees. Furthermore, it is not clear how the data should be permuted given populations with complex pedigree structures.
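Deriving an empirical threshold from null replicates amounts to taking a quantile of the replicate-wise maxima of the log LR profile. The sketch below shows one way to do this; the function name and the choice of order statistic are assumptions, and other quantile conventions would serve equally well.

```python
import math


def empirical_threshold(null_profiles, alpha=0.05):
    """Chromosome-wide threshold from log LR profiles simulated under H0.

    null_profiles: one list of log LR values (one per test position) for each
                   null replicate. The threshold is the empirical (1 - alpha)
                   quantile of the per-replicate maxima.
    """
    maxima = sorted(max(profile) for profile in null_profiles)
    # Smallest order statistic with at least (1 - alpha) of the maxima at or below it.
    k = math.ceil((1 - alpha) * len(maxima)) - 1
    return maxima[k]
```

Using the maximum over positions within each replicate is what makes the threshold apply to the whole chromosomal segment rather than to a single test position.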
SIMULATION STUDY
The simulation study begins with the analysis of data generated under H_{0}. By constructing a histogram of the test statistic over replicates, an empirical distribution of log LR is obtained. QTL presence at a chromosomal position is then determined in subsequent analyses by comparing the respective test statistic to the empirical threshold.
To investigate the performance of the two-step variance component method for mapping QTL in complex pedigrees, data are generated under four simulation setups. The first setup, referred to as the “benchmark” setup, involves the generation of fully genotyped, highly polymorphic marker data. A biallelic QTL that explains 10% of the total variation (i.e., h^{2}_{v} = 0.1) is segregating in the population. Setups A, B, and C then change a single feature of the benchmark setup, enabling the effect on the variance component method’s performance to be assessed. The four setups considered in this study are summarized in Table 1 and are discussed in detail below together with the generation of data under H_{0}.
Generation of data under H_{0}: Replicates are generated according to the benchmark setup and setup A (which are described below) but without a segregating QTL in the population. These two setups are equivalent except that the benchmark setup is based upon a sheep pedigree, whereas setup A is based upon a pig pedigree. This allows the sensitivity of the test statistic’s distribution to changes in pedigree structure to be assessed.
Benchmark setup: The pedigree structure is based upon a real pedigree created to explore copper deficiency in a selected sheep population. The original experiment contained over 2000 individuals; however, for the purposes of demonstrating the methodology, a subset of 500 individuals is selected. In reducing the pedigree’s size, careful attention is given to maintaining the structure’s original complex nature. The reduced pedigree consists of 269 related families spanning four generations with 1.8 offspring on average per mating. The pedigree structure contains no inbreeding.
The marker information consists of four polymorphic markers segregating with eight equally frequent alleles and placed on a chromosomal segment of length 60 cM at positions 0, 20, 40, and 60 cM. A biallelic QTL with alleles Q and q segregating at equal frequencies is then placed between the second and third markers at position 35 cM. If an individual inherits QQ from its parents, its phenotypic contribution due to the QTL is v_{i} = m + a, where m = 0 and a = 13.5. If the individual’s QTL genotype is heterozygous or qq, the individual’s phenotypic contribution is v_{i} = m + d or v_{i} = m − a, respectively, where d = 0. Falconer (1989) defines m as the mid-homozygote value, a as the additive effect, and d as the dominance effect.
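The QTL variance implied by these values can be worked out from the standard single-locus results for a randomly mating population. The sketch assumes random mating and takes h^{2}_{v} to be the ratio of the additive QTL variance to the total phenotypic variance; the implied total variance is therefore a derived quantity rather than a figure stated in the text.

```python
def qtl_variances(p, a, d):
    """Additive and dominance variance of a biallelic QTL under random mating.

    p is the frequency of allele Q; a and d are the additive and dominance
    effects in the sense of Falconer (1989).
    """
    q = 1.0 - p
    alpha = a + d * (q - p)              # average effect of allele substitution
    var_additive = 2.0 * p * q * alpha ** 2
    var_dominance = (2.0 * p * q * d) ** 2
    return var_additive, var_dominance


var_a, var_d = qtl_variances(p=0.5, a=13.5, d=0.0)
total_variance = var_a / 0.1   # implied if the QTL explains 10% of the variation
```

With equal allele frequencies and no dominance, the additive QTL variance is a²/2 = 91.125, which under the assumption above corresponds to a total variance of about 911.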
The polygenic contribution made by an individual is dependent upon the polygenic contributions of its parents and Mendelian sampling. Since the complete parentage of every individual in the pedigree is not available, u_{i} is generated according to the number of known parents. If both parents are unknown (i.e., the individual is a founder), then
The value of
Setup A: The ability of the variance component method to map QTL in a pedigree with large numbers of offspring per mating and inbreeding is investigated. The pedigree used in this study is again based upon a real structure originating from a Meishan pig experiment. The initial experiment contained ∼2500 related individuals, but for meaningful comparisons to be made with the benchmark analysis, 500 related individuals are selected. The average number of offspring per family is 14.3 across five generations of matings, consisting of 35 related families. The average inbreeding coefficient is 4.5% with a maximum inbreeding coefficient of 17%.
Setup B: Complex pedigrees often contain individuals with missing marker genotypes. This missing information introduces uncertainty into the analysis. To better understand the ability of the methodology to cope with this uncertainty, two approaches to removing the marker information generated according to the benchmark setup are explored. The first approach is where 50% of the marker genotypes are randomly removed. The second approach removes only the marker information of offspring that are not themselves parents. This results in a 53% loss in marker information.
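The two removal schemes can be sketched as follows. The data layout is an assumption made for the example: genotypes maps each individual to a list with one entry per marker, a missing genotype is represented by None, and the pedigree is a list of (individual, mother, father) records.

```python
import random


def mask_random(genotypes, fraction, rng):
    """Set a random fraction of all (individual, marker) genotypes to None."""
    cells = [(ind, k) for ind, row in genotypes.items() for k in range(len(row))]
    masked = {ind: list(row) for ind, row in genotypes.items()}
    for ind, k in rng.sample(cells, round(fraction * len(cells))):
        masked[ind][k] = None
    return masked


def mask_nonparent_offspring(genotypes, pedigree):
    """Remove all genotypes of offspring that never appear as a parent."""
    parents = {p for _, mother, father in pedigree
               for p in (mother, father) if p is not None}
    offspring = {ind for ind, mother, father in pedigree
                 if mother is not None or father is not None}
    return {ind: ([None] * len(row) if ind in offspring and ind not in parents
                  else list(row))
            for ind, row in genotypes.items()}
```

The first scheme spreads the missing genotypes over individuals and markers, whereas the second leaves some individuals fully genotyped and others with no marker information at all.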
Setup C: The final setup investigates the effect a reduction in marker informativeness has upon the analysis. Data are generated according to the benchmark setup except three alleles as opposed to eight alleles are segregating at the markers.
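The size of this reduction can be gauged from the expected heterozygosity of a marker with equally frequent alleles. Heterozygosity is used here as one common measure of marker informativeness; it is an assumption of this sketch and not a quantity reported in the article.

```python
def expected_heterozygosity(k):
    """Expected heterozygosity of a marker with k equally frequent alleles."""
    return 1.0 - 1.0 / k


het_eight = expected_heterozygosity(8)   # benchmark setup
het_three = expected_heterozygosity(3)   # setup C
```

Under random mating, 87.5% of individuals are expected to be heterozygous at a marker with eight equally frequent alleles, compared with about 67% at a marker with three.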
RESULTS
Results from the application of the two-step variance component method to replicated data generated under the above-described simulation study are now reported. Due to the analyses being computationally demanding, only every third centimorgan is tested for the presence of a QTL. A single analysis across the chromosome can take up to 56 min on a Compaq Professional Workstation XP1000 utilizing a single Alpha 21264 processor running at 500 MHz. Four parallel analyses per replicate are performed, where Loki and ASREML runs begin from different well-dispersed starting values. The empirical distributions of the test statistic, however, are created from the analysis of 500 replicated data sets; therefore only a single run is performed per replicate due to obvious computational constraints.
Construction of the empirical distribution of log LR under H_{0}: Figure 2A reveals close agreement between the empirical and theoretical (i.e., 50:50 mixture where one component mixture is a peak at 0 and the other is a χ^{2} distribution with 1 d.f.) distributions of log LR.
In Figure 2B, when a chromosome-wide QTL search is performed, the empirical distributions appear to follow a
Benchmark setup: The mean log LR profile over 50 replications of data generated under the benchmark setup is shown in Figure 3. The profile peaks between markers 2 and 3 at the position of the simulated QTL (i.e., 35 cM). The mean peak is well below the 5% empirical threshold; however, this result is slightly misleading. The peak of the mean profile is biased downward because the estimated position, and thus corresponding peak of the profile, varies across replicates. In fact, 48% of the analyses yield a test statistic along the chromosomal segment in excess of the 5% threshold.
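The downward bias of the mean profile follows from a general inequality: the maximum of an average cannot exceed the average of the maxima, and it is strictly smaller when the replicate profiles peak at different positions. The short example below uses two made-up profiles to illustrate the point; the function names are hypothetical.

```python
def peak_of_mean_profile(profiles):
    """Peak of the position-wise mean of several log LR profiles."""
    n = len(profiles)
    mean_profile = [sum(values) / n for values in zip(*profiles)]
    return max(mean_profile)


def mean_of_peaks(profiles):
    """Mean of the per-replicate peak log LR values."""
    return sum(max(p) for p in profiles) / len(profiles)


# Two made-up replicates whose peaks fall at different test positions.
profiles = [[1.0, 8.0, 2.0], [2.0, 1.0, 8.0]]
peak_mean = peak_of_mean_profile(profiles)
mean_peak = mean_of_peaks(profiles)
```

In this example each replicate peaks at 8, yet the mean profile peaks at only 5, which is why the proportion of replicates exceeding the threshold is a better guide to power than the height of the mean profile.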
The ability of the methodology to accurately estimate the parameters of interest can be gauged from the results presented in Table 2. The mean parameter estimates of
For the parameters
Setup A: Increasing the average family size has an obvious effect on the performance of the methodology as evidenced in Figure 3. With larger families, the peak of the log LR profile (based upon 50 replicates) increases from 3.9 to 11.0, where 82% of the analyses yield a log LR in excess of the 5% empirical threshold. Once again, the parameters are well estimated (see Table 3), with mean biases slightly smaller than the biases obtained under the benchmark setup.
Setup B: In setup B, five patterns of missing marker data are analyzed. Patterns 1–4 correspond to the random removal of 50% of the marker genotypes while pattern 5 is obtained by only genotyping the parents, which constitutes a 53% loss in marker genotypes. The proportions of individuals in the pedigree with 0, 1, 2, 3, and 4 missing marker genotypes, for each pattern, are given in Table 4. Each pattern consists of 50 replicates. These replicates, before the marker data are removed, are the same as those replicates generated under the benchmark setup. Thus, differences between the results obtained under setup B and the benchmark setup can be directly attributed to the effect of missing marker information.
The mean log LR profiles for patterns 1–5 together with the mean log LR profile for the benchmark setup are shown in Figure 4. There are two points of interest to note with respect to this figure. First, the mean log LR profile for data with partially genotyped markers lies below the profile obtained with complete marker information. Less marker information introduces extra uncertainty into the analysis and the method’s ability to detect QTL decreases. In fact, the percentages of analyses yielding a log LR in excess of the 5% empirical threshold are only 24, 36, 24, 28, and 20% for patterns 1–5, respectively, well below the 48% achieved when the same data contain completely genotyped individuals.
Second, no real difference exists between the performance of the method across patterns 1–4. However, the mean profile for pattern 5, where only the parents are genotyped, does appear to differ from the other log LR profiles.
The difference in the method’s performance across the five patterns of missing marker data is further evidenced in Table 5. The SD, average between-run variance, and mean bias are marginally higher for patterns 1–4 than they are under the benchmark setup (given in Table 2). For pattern 5, though, the method struggles to obtain reasonable parameter estimates, with the mean bias for
Setup C: The impact of less informative markers on the ability of the variance component method to detect QTL is evident from Figure 5. The mean log LR profile is well below the 5% threshold with only 26% of the analyses yielding a significant peak log LR. The parameter estimates (shown in Table 6), however, are similar to the estimates obtained under the benchmark setup with highly informative markers. Thus, the use of less polymorphic markers imparts greater uncertainty into the detection of QTL as opposed to the estimation of QTL.
DISCUSSION
To date, several statistical approaches have been developed to map QTL in outbred livestock populations; however, these methods focus on granddaughter or half-sib designs and are not easily extended to more challenging pedigree structures. In this article, a two-step variance component method is presented that is capable of detecting and estimating QTL in populations with complex pedigrees and considerable missing marker information. The methodology is illustrated through its application to simulated sheep and pig populations.
By formulating the QTL mapping problem within a mixed linear model framework, a less parameterized statistical environment is obtained, reducing the computational burden of the analysis. The complex relationships that may exist between individuals are included within the model, leading to more accurate parameter inferences, and additional fixed and random effects can be easily incorporated into the analysis with minimal adjustment to the methodology.
For example, to simultaneously map two linked QTL, the mixed linear model becomes y = Xβ + Zu + Zv_{1} + Zv_{2} + e, where v_{1} and v_{2} are the additive effects of the linked QTL. Analogous to the two-step process of mapping a single QTL, Loki is used to calculate the (co)variance matrices of v_{1} and v_{2} at two separate test positions along the chromosome. Estimates of the parameters are then obtained via ASREML and the test statistic for the presence of two linked QTL is constructed. This process is repeated for each pair of test positions on the chromosome, enabling multiple QTL to be detected and localized. Note that, when two QTL are being mapped, the QTL profile is a two-dimensional surface.
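The number of model fits required by a two-QTL search grows quadratically with the number of test positions, as the following sketch shows. The grid of every third centimorgan on a 60-cM segment is used for illustration only, and in practice positions coinciding with markers would need to be shifted slightly because G is singular at marker loci.

```python
from itertools import combinations


def position_pairs(positions):
    """All unordered pairs of distinct test positions for a two-QTL search."""
    return list(combinations(positions, 2))


pairs = position_pairs(range(0, 61, 3))   # every third cM on a 60-cM segment
```

With 21 test positions a single-QTL scan needs 21 fits of the QTL model, whereas the two-QTL surface needs one fit for each of the 210 pairs.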
A two-step approach to estimating the variance components of a mixed linear model is not new per se. Fernando and Grossman (1989), Van Arendonk et al. (1994), and Wang et al. (1995) are but a few who first calculate the IBD probabilities for the QTL’s (co)variance matrix and then estimate the parameters of the mixed linear models using standard statistical techniques. Difficulties in determining marker phase and unknown marker genotypes, however, mean these methods have limited application to populations with general pedigree structures. Via the multiple-site segregation sampler, opportunities now exist to analyze data with considerably complex pedigrees.
A variety of algorithms are available for the calculation of REML estimates. Standard algorithms such as ASREML require the inverse of the QTL’s (co)variance matrix, which is singular at marker loci. In this article, G and thus the test statistic are calculated at a position slightly to the right or left of the marker, an approach also adopted by I. Hoeschele (personal communication). Visscher et al. (1999) instead use a derivative-free algorithm to calculate REML estimates that does not require G^{−1} but V^{−1}, where V represents the complete (co)variance matrix for the likelihood. The complete (co)variance matrix is always nonsingular, allowing the test statistic to be calculated at marker positions. The two approaches give almost identical results. To illustrate this, a single replicate from setup A was analyzed using ASREML and the derivative-free algorithm of Visscher et al. (1999). The resulting QTL profiles are shown in Figure 6. Clearly, there is little difference between the two REML strategies; however, the present version of the derivative-free approach is considerably more computer intensive due to its reliance upon calculating V^{−1} for every convergence iterate.
Pivotal to the success of any interval mapping procedure is the calculation of an appropriate threshold value at which QTL are declared significant. The threshold value is dependent upon the distribution of the test statistic, which is known for a single point [i.e., 50:50 mixture where one component mixture is a peak at 0 and the other is a χ^{2} distribution with 1 d.f.].
Problems also surround the construction of confidence intervals for QTL position estimates. These problems are not unexpected given that the construction of such intervals is challenging in even simple pedigrees. For an approximate confidence interval, the LOD drop-off method could be employed and more accurate confidence intervals obtained under parametric and/or nonparametric bootstrapping methods. However, as with permutation testing, resampling for nonparametric bootstrapping methods may be difficult. Clearly, further research is needed to resolve these issues.
This article has prompted work in three further areas of research. First, the simulation study suggests that partial marker information on most individuals is preferable to a mixture of fully genotyped and completely ungenotyped individuals. This is currently being explored in greater detail for a range of missing marker scenarios. Second, a new recursive algorithm to calculate IBD probabilities in complex pedigrees has been developed and is currently being tested. Third, the methodology is to be applied to the analysis of real sheep and beef cattle data.
Acknowledgments
The authors thank Simon Heath for his many useful comments and fine-tuning of Loki. This work was partly supported by a Biotechnology and Biological Sciences Research Council award.
Footnotes

Communicating editor: T. F. C. Mackay
 Received May 12, 2000.
 Accepted July 27, 2000.
 Copyright © 2000 by the Genetics Society of America