## Abstract

Often in genetic research, presence or absence of a disease is affected by not only the trait locus genotypes but also some covariates. The finite logistic regression mixture models and the methods under the models are developed for detection of a binary trait locus (BTL) through an interval-mapping procedure. The maximum-likelihood estimates (MLEs) of the logistic regression parameters are asymptotically unbiased. The null asymptotic distributions of the likelihood-ratio test (LRT) statistics for detection of a BTL are found to be given by the supremum of a χ^{2}-process. The limiting null distributions are free of the null model parameters and are determined explicitly through only four (backcross case) or nine (intercross case) independent standard normal random variables. Therefore a threshold for detecting a BTL in a flanking marker interval can be approximated easily by using a Monte Carlo method. It is pointed out that use of a threshold incorrectly determined by reading off a χ^{2}-probability table can result in an excessive false BTL detection rate much more severely than many researchers might anticipate. Simulation results show that the BTL detection procedures based on the thresholds determined by the limiting distributions perform quite well when the sample sizes are moderately large.

A variety of quantitative and/or qualitative traits in plants and animals are attributed mainly to genotypes at certain chromosome locations (so-called trait loci). Extensive studies on quantitative trait locus (QTL) mapping have been conducted since Sax (1923). For experimental organisms, there are several statistical methods developed for mapping QTL systematically. Soller and Brody (1976) proposed the traditional approach of detecting a QTL by using a nearby marker as a surrogate for a QTL. Lander and Botstein (1989) developed the interval-mapping (IM) method to test for a QTL in a flanking marker interval. As the genotypes of the QTL are unobservable, the distribution of the quantitative trait follows a finite mixture model, say a mixture of two normal distributions as many authors proposed. For some recent developments on the QTL mapping method, see, *e.g.*, Zeng (1994), Hackett and Weller (1995), Kao and Zeng (1997), Lange and Whittaker (2001), Broman (2003), Thomson (2003), Chen and Chen (2005), Sillanpaa and Bhattacharjee (2005), and Xu *et al.* (2005).

The QTL mapping methods and corresponding statistical procedures discussed above are for quantitative traits with continuous phenotypic values. In real life, however, some disease traits are qualitative, and in many cases they are binary (binary trait locus, BTL). For example, the fusiform rust disease in loblolly pine is described as presence or absence of the formation of galls on a polygenic basis. Current approaches to the BTL mapping usually assume that the binary trait of interest is to be controlled by an underlying continuous liability so that the QTL mapping methods can be modified for the BTL mapping. On the other hand, as several authors remarked (see, *e.g.*, Xu and Atchley 1996), mapping genes for binary traits is more complicated than that for continuous traits. Although many descriptive procedures are proposed, the development of inferential procedures has been left behind due to the great challenge met in determining the threshold for detecting a BTL with a controlled false BTL detection rate. A reasonable account of large sample studies for the BTL interval-mapping methods has been lacking. As a result, the thresholds for significance tests and critical values of interval estimates of the parameters have to be established using an empirical approach (Yi and Xu 1999b) or a repeated sampling technique, *e.g.*, a permutation test (Churchill and Doerge 1994) or bootstrapping test (Visscher *et al.* 1996). It is also common practice due to the lack of large sample theory that the thresholds for detecting a BTL on the basis of the log-likelihood ratio or the LOD score are incorrectly determined by reading off a χ^{2}-probability table as suggested by the classic asymptotic χ^{2}-theory, with the perception that the actual thresholds may be only *slightly* larger than those from a χ^{2}-table (see, *e.g.*, Haley and Knott 1992; Zeng 1994; Xu and Atchley 1996).

In this article, the finite logistic regression mixture models for the BTL interval mapping and the methods under these models are developed and investigated with a quite general scheme: (a) both the backcross and intercross populations are considered; (b) readily available covariates are accommodated as one of the main features of the models; and (c) other markers in addition to the two flanking markers can be naturally incorporated into the models as part of the covariates to control the genetic background of other chromosomal regions. The research is intended to establish the direct asymptotic procedures for determining the thresholds for the BTL interval-mapping methods on the basis of the log-likelihood ratio or equivalently the LOD score.

Note that some covariates such as environmental and nutritional conditions, age, sex, and location of field are often readily available in genetic research. While including several explanatory covariates can substantially increase the efficiency of the statistical analysis, the identifiability of the logistic regression mixture model in the parameters of interest can also be augmented via the covariates (Follmann and Lambert 1991). It is widely recognized that identifiability is a central issue in the finite mixture models (see, *e.g.*, Chen and Chen 2001).

## MODELS AND METHODS

Suppose that a BTL of an experimental organism is flanked by two markers, say marker I and marker II. Assume that the putative BTL has two alleles *B* and *b*, marker I has two alleles *M*_{1} and *m*_{1}, and marker II has *M*_{2} and *m*_{2}. To detect a BTL in the flanked marker interval, a series of experiments can be arranged so that eventually two inbred lines P_{1} and P_{2} can be obtained, where any individual from P_{1} has homozygous genotype *M*_{1}*BM*_{2}/*M*_{1}*BM*_{2} at the three locations on a chromosome: marker I, the BTL, and marker II, while any individual from P_{2} has homozygous genotype *m*_{1}*bm*_{2}/*m*_{1}*bm*_{2}. Thus, all individuals in the F_{1} population, the progeny of P_{1} and P_{2}, have the same heterozygous genotype *M*_{1}*BM*_{2}/*m*_{1}*bm*_{2}. Let γ be the recombination fraction between marker I and marker II. Assume that γ is specific or can be preestimated. Also assume that the two flanking markers, marker I and marker II, are linked; *i.e.*, γ < 0.5. In fact, in interval mapping γ is usually chosen to be substantially below 0.5. An experimental population can then be produced either by F_{1} individuals backcrossing with one of the parents P_{1} and P_{2}, resulting in a backcross population, or by F_{1} intermating with F_{1} itself, resulting in an intercross population also called an F_{2} population.

Let *Y* be the observable binary trait of concern: *Y* = 1 or 0 according to presence or absence of the trait on a randomly selected individual. Suppose that in addition to the putative BTL, the binary trait *Y* may also be affected by some explanatory covariates, say *x* = (*x*_{1}, … , *x _{p}*)

^{τ}, where

*a*

^{τ}is the transpose of a vector

*a*. The covariate vector

*x*can be fixed effects or random effects, or part of it is fixed and part of it random. In the case of random effects, a conditional model of

*Y*given

*x*is adopted to form the likelihood function. As such, part of the explanatory covariates can be the index variables of genotypes of some other markers in addition to marker I and marker II. For example, if an individual is homozygous at the

*j*th marker, define

*x*= 1; otherwise,

_{j}*x*= 0. As commented by Kao and Zeng (1997), combining interval mapping with multiple markers may greatly improve the identification of a BTL. Other explanatory covariates can be environmental and nutritional conditions, age, sex, location of field, and so on.

_{j}We now present the models of *Y* and the methods of BTL detection combining the IM method in the backcross and intercross populations, respectively.

#### Model and method with backcross population:

For specification, consider a backcross population to be the progeny of P_{1} and F_{1} so that the individuals in the backcross population have four different genotypes at marker I and marker II: *M*_{1}*M*_{2}/*M*_{1}*M*_{2}, *M*_{1}*M*_{2}/*M*_{1}*m*_{2}, *M*_{1}*M*_{2}/*m*_{1}*M*_{2}, and *M*_{1}*M*_{2}/*m*_{1}*m*_{2}, coded as 1, 2, 3, and 4. Let *J* denote the code of the genotype for a randomly selected individual at marker I and marker II, *J* = 1, 2, 3, and 4. The genotypes of markers, *i.e.*, the values of *J*, are observable with the probabilities(1)

At a putative BTL, the genotype can be homozygous *BB* or heterozygous *Bb*. Let *r* be the recombination fraction between marker I and the BTL and *s* the recombination fraction between the BTL and marker II. Without interference, *s* can be computed from γ and *r* as *s* = (γ − *r*)/(1 − 2*r*). The genetic association among the two flanking markers and the BTL can be described by the conditional probabilities *p*_{r}(*j*) = *P*(*BB*|*J* = *j*) that a randomly selected individual has homozygous genotype *BB* given *J* = *j* as follows:(2)

Suppose that given covariate value *x* and genotype value *u* at the BTL (*u* = 1 if *BB* and *u* = 0 if *Bb*), the probability distribution of *Y* = *y* follows a logistic model,(3)(Bonney 1986; Hosmer and Lemeshow 1989), where the parameters β and α and the 1 × *p* parameter vector λ are unknown. Since the genotype of the putative BTL is unobservable, the probability distribution for *Y* given *J* = *j* and *x* can be constructed hieratically according to the homozygous or heterozygous genotype of the BTL, so that a logistic regression mixture model for detection of a BTL by the interval-mapping method is induced as follows: for *y* = 0, 1,(4)where θ = (*r*, β, α, λ^{τ}) is the parameter vector of interest with *p* + 3 components, and *p*_{r}(*j*) and π are given by (2) and (3). The parameter *r* is the genetic distance between marker I and the BTL, β is the base-line gene effect on the binary trait and α is the effect of the BTL, and λ is the effects of the explanatory covariate vector *x*. When a random sample *y _{i}*,

*j*,

_{i}*x*,

_{i}*i*= 1, … ,

*n*, of size

*n*is observed on the binary trait

*Y*, the flanking marker genotype code

*J*, and the explanatory covariate

*x*, the log-likelihood function for θ is(5)where(6)The maximum-likelihood estimate (MLE) for θ is such that it solves the partial derivative equation: ∂

*l*(θ)/∂θ = 0. A computational algorithm can be as follows. For any fixed

_{n}*r*, the EM algorithm (Dempster

*et al.*1977) is used to find the restricted MLE , with Newton-Raphson employed in the M-step (detailed instructions are given in the appendix). Then to obtain and hence , vary

*r*over the interval [0, γ] with a small increment of 1 or 2 cM at a time.

Clearly the identifiability question of the model (4) in θ must be settled before one can meaningfully discuss any inferential procedure on the basis of the likelihood function *l _{n}*(θ). First, it is clear that if α = 0,

*f*(

*y*|

*j*,

*x*, θ) = π(

*y*|

*x*, β, λ) and the mixture model reduces to the ordinary logistic regression model so that the model is unidentifiable in

*r*since the reduced model is free of

*r*. To settle the identifiability problem in other cases, define a cumulative probability functionwhere

*I*{·} is the indicator function. The mixture model

*f*(

*y*|

*j*,

*x*, θ) can be reparameterized by λ and the mixing distribution

*G*as follows:In light of Theorem 2 of Follmann and Lambert (1991),

_{j}*f*(

*y*|

*x*, λ,

*G*) =

_{j}*f*(

*y*|

*x*, λ′,

*G*′

_{j}) holds for all

*y*,

*x*, and

*j*= 1, 2, 3, 4, if and only if λ = λ' and

*G*=

_{j}*G*′

_{j}for

*j*= 1, 2, 3, 4. On the other hand, for

*r*fixed, (β, α) identifies

*G*; if α ≠ 0, (

_{j}*r*, β, α) identifies

*G*. These results imply the following:

_{j}The parameter (β, α, λ) identifies the model (4) with

*r*fixed.The parameter (

*r*, β, α, λ) identifies the model (4) if α ≠ 0.When α = 0, the model (4) reduces to the ordinary logistic regression model and is unidentifiable in

*r*.

Applying Wald's (1949) consistency argument and using the techniques developed in Chen and Chen (2005), the MLEs of β, α, and λ under the logistic regression mixture model (4) are consistent, but the MLE of *r* is inconsistent. However, if indeed there is a BTL in the flanking interval, *i.e.*, α ≠ 0, then the MLE of *r* is consistent.

Detecting a BTL in the flanking marker interval is accomplished by testing H_{0}: α = 0 *vs.* H_{1}: α ≠ 0. Denote the MLE of θ under the full model (4) by , while the MLE under the null model is denoted by . The null hypothesis is rejected if the likelihood-ratio test (LRT) statisticexceeds a threshold value, where *l _{n}*(θ) is given by (5). When the sample size

*n*is large, the null distribution of

*T*

_{n}can be approximated by that of(7)where

*Z*

_{1}, … ,

*Z*

_{4}are four independent standard normal variables, , and

*q*

_{bc}(

*j*) and

*p*

_{r}(

*j*) are defined by (1) and (2), respectively. Thus a threshold or a

*P*-value based on the LRT statistic

*T*can be determined asymptotically by the distribution of

_{n}*T*. Justification of this asymptotic testing procedure is given in the appendix.

It should be noted that if log_{10} is chosen to define the logarithm of likelihood function, the log-likelihood ratio becomes the so-called LOD. Although log_{10} might be a more popular choice than log_{e} historically in the genetics community, log_{e} is adopted in this article to enjoy convenience with the calculus and asymptotic analysis of *T*_{n}. Needless to say, the LOD score and *T*_{n} are only a scale transform of each other (*T*_{n} = 2LOD/ log_{10}*e*) and hence equivalent for summarizing the evidence for a BTL. For those who would feel more comfortable with the LOD score than with the LRT statistic *T*_{n}, an adjusting formula and procedure is immediate: the threshold of the LOD score can be asymptotically determined by the distribution ofor 0.217*T* approximately, where *T* is given by (7).

Note that the distribution of *T* is convenient to implement for determining a threshold at a prespecified significance level via a Monte Carlo method. The null limiting distribution is determined solely by four independent standard normal random variables, free of the nuisance parameters β and λ and free of the distribution of the covariate *x*. Table 1 displays the thresholds determined by the distribution of *T* at the levels 1, 5, and 10% with nine different marker interval sizes (between 2 and 40 cM). In each case, 1,000,000 Monte Carlo repetitions of four independent standard normal variables were performed. The supremum of the χ^{2}-process, as defined by (7), was obtained by exhaustive search over *r* ∈ [0, γ] in a grid increment of 1 cM (the centimorgan unit was converted to the recombination fraction *r* by Haldane's mapping function). From Table 1, the difference between an actual threshold and the χ^{2}-threshold is clearly notable. The consequences of the difference on the false BTL detection rate are shown in the end of this section.

#### Model and method with intercross population:

In addition to the four genotypes at marker I and marker II in the backcross population, there are another five observable genotypes in the intercross population: *M*_{1}*m*_{1}/*M*_{2}*m*_{2}, *M*_{1}*m*_{1}/*m*_{2}*m*_{2}, *m*_{1}*m*_{1}/*M*_{2}*M*_{2}, *m*_{1}*m*_{1}/*M*_{2}*m*_{2}, and *m*_{1}*m*_{1}/*m*_{1}*m*_{2}, coded as 5, 6, 7, 8, and 9, respectively. Continue to denote by *J* the code of the genotype of a randomly selected individual from the intercross population. Let *P*(*J* = *j*) = *q*_{ic}(*j*), *j* = 1,…,9. Without interference,At the putative BTL, there are now three different genotypes: *BB*, *Bb*, and *bb*. Let *p*_{r,1}(*j*) = *P*(*BB*|*J* = *j*), *p*_{r,2}(*j*) = *P*(*bb*|*J* = *j*), and *p*_{r,3}(*j*) = *P*(*Bb*|*J* = *j*). Recall that *s* = (γ − *r*)/(1 − 2*r*) is the recombination fraction between the BTL and marker II. Thenand for *j* = 1, 2, … , 9,For the *i*th randomly selected individual with *Y* = *y _{i}*,

*x*=

*x*, and

_{i}*J*=

*j*, because there are three different but unobservable genotypes at the BTL, the probability model is a mixture of three logistic regression distributions,(8)where η = (

_{i}*r*, β, α, ν, λ

^{τ}) is the parameter vector with 4 +

*p*components. With a random sample of size

*n*, the log-likelihood function for η is(9)where π

_{i}(β, λ) = π(

*y*|

_{i}*x*, β, λ) as defined in (6).

_{i}Similar to the backcross case, it can be seen that the MLEs of β, α, ν, and λ are always consistent under the model (8), although the MLE of *r* is inconsistent when α = ν = 0. However, when either α or ν or both are not 0, the MLE of *r* is consistent.

Detecting a BTL in the flanking interval is now accomplished by testing H_{0}: α = ν = 0 *vs.* H_{1}: α ≠ 0 or ν ≠ 0. The LRT statistic iswhere *l _{n}*(η) is defined by (9), is the unrestricted MLE of η, and is the MLE of η under the null model. The null hypothesis is rejected and hence a BTL is detected if

*S*

_{n}exceeds a threshold that can be determined asymptotically by the distribution of(10)where

*Z*

_{1},…,

*Z*

_{9}are nine independent standard normal variables, andA proof of the limiting distribution (10) of

*S*

_{n}is outlined in the appendix.

If the LOD score is used to measure the evidence of a BTL, it is immediate that a threshold of the LOD can be determined asymptotically by the distribution of (log_{10}*e*/2)*S* or 0.217*S*.

As in the backcross population case, the limiting distribution (10) is free of nuisance parameters β and λ and free of the distribution of *x* and is entirely determined by nine independent standard normal variables. Therefore, (10) is easy to use to approximate a threshold or a *P*-value of the LRT *S*_{n} via a Monte Carlo method. Note that , implying stochastically. Table 2 displays the thresholds determined by the distribution of *S* at levels of 1, 5, and 10% with nine different marker interval sizes (between 2 and 40 cM). As expected, the thresholds determined by the distribution of *S* are between and and can be very close to , depending on the size of the flanking marker interval.

#### Consequences of using a χ^{2}-threshold:

The differences between the asymptotic thresholds determined correctly by the limiting distribution and incorrectly by a χ^{2}-distribution have been noted in Tables 1 and 2. The consequences of the differences can further be described in terms of the false BTL detection rate. A false detection of a BTL over the flanking marker interval occurs when there is actually not any BTL presented in the marker interval, but the LRT statistic (or equivalently the LOD score) exceeds a threshold preset to satisfy a desired level of false BTL detection, say 5%. It is thus clear that if the threshold used is lower than it is supposed to be, the resulting false BTL detection rate will be higher than expected.

Table 3 displays the actual asymptotic false BTL detection rates of the LRT statistic when a χ^{2}-threshold at a level of 5% is incorrectly used (Lander and Botstein 1989; Xu and Atchley 1996; Lynch and Walsh 1998, p. 447). Five different marker interval lengths (2, 10, 20, 30, and 40 cM) are considered. As seen from Table 3, the actual false BTL detection rate can be substantially higher than expected and is more than doubled in both the backcross and intercross cases, when the interval length is as short as 40 cM. Even in the event of an unusually short interval length of 2 cM, the actual false BTL detection is >1% higher than the desired 5%.

## SIMULATION STUDIES

Simulation studies were intended primarily to assess the consequences of using the limiting distributions (*n* = ∞) of *T* and *S* in real life (*n* < ∞). Therefore, for simplicity, only two flanking markers and one continuous covariate were included in the simulation models.

The simulation studies were planned with the following consideration: sample sizes *n* = 200 and 500, marker interval size = 20 cM (*i.e.*, γ = 0.091), absence or presence of a BTL at the location of 4 cM (*r* = 0.038) from marker I, and a standard normal covariate *x* with coefficient λ = 1. Both the backcross and intercross populations with different combinations of effects were considered. The nominal significance level was 5%. The simulated thresholds, *i.e.*, 4.89 in the backcross case (Table 1) and 7.14 in the intercross case (Table 2), were adopted.

In each scenario, 5000 Monte Carlo repetitions were performed. With each Monte Carlo sample, the EM algorithm was used to obtain the MLEs and the supremum of a χ^{2}-process was obtained by exhaustive search from 0 to 20 cM in a grid increment of 1 cM. The simulation results are summarized in Tables 4 and 5.

The simulation results demonstrate that the asymptotic distributions approximate the sampling distributions well. Specifically, (i) the probabilities of type I errors are very close to the nominal one, (ii) the detection rates for a BTL correctly reflect the sample sizes and the sizes of discrepancy between an alternative and the null, and (iii) the estimates of the parameters confirm the expected consistency behaviors.

The simulation results also show the weak performance of the estimates for the BTL locations. As a referee remarked, it is often the case that the QTL estimates are drawn to the center of the flanking marker interval. This phenomenon is also evident in Tables 4 and 5. It appears that performance of the estimates of *r* depends largely on the BTL's genetic distance to the flanking markers as well as on the discrepancy between the null model and the alternative. The performance gets weaker when the BTL is closer to a flanking marker and is best when the BTL is in the middle of the marker interval. It should be noted that this interesting dynamic consistency property of the MLE of *r* does not show up in Xu and Atchley's (1996) simulation report because they considered only the BTLs located in the middles of the marker intervals.

## CONCLUSIONS AND DISCUSSION

The finite logistic regression mixture models for the BTL interval mapping and the methods under these models have been developed and investigated.

The binary trait of interest is assumed to be affected by a BTL and some explanatory covariates. Since the genotype of the BTL is unobservable, the finite logistic regression mixture models are used to develop the procedures for detecting a BTL in a flanking marker interval. Compared to the existing methods, the new approach accommodates trait-associated covariates and allows multiple markers that are expected to help improve the efficiency of BTL mapping. Both the backcross and intercross populations are considered. It should be pointed out that the logistic regression mixture models in this research assume no interaction among the covariates and the genotype of the putative BTL.

The MLEs of the logistic regression parameters are asymptotically unbiased. The null asymptotic distributions of the likelihood-ratio test statistics for detecting a BTL are presented and found to be given by the supremum of a χ^{2}-process. Nevertheless, the limiting null distributions are free of the nuisance parameters and free of the null model parameters and the distribution of the covariates and are explicitly determined through four (backcross case) or nine (intercross case) independent standard normal variables. Thus the thresholds for detecting a BTL in a flanking marker interval can be easily determined by the limiting distributions via a Monte Carlo method. The asymptotic thresholds turn out to be substantially larger than those given by the ordinary χ^{2}-distribution and the actual false BTL detection rates can be much higher than many researchers might have anticipated when a χ^{2}-threshold is incorrectly used.

The asymptotic procedures developed are applicable to large samples in general. For small or moderate samples, Churchill and Doerge (1994) described how to perform a permutation test in QTL mapping. The permutation procedure can be extended to BTL mapping by permuting the binary data along with the associated covariates.

Finally, note that the present research is for detecting only one putative BTL on a chromosome using two flanking markers. One disadvantage of this method, similar to that in Lander and Botstein's (1989) IM method, is that if there is more than one BTL on a chromosome, the estimate of a BTL effect and the location in a tested interval is likely to be affected by the BTLs at other locations. The main purpose of this article is to provide a basic method in genetic locus mapping for binary traits and it is hoped that future research can extend it for testing multiple BTL simultaneously. For some recent research on BTL mapping, see, *e.g.*, Yi and Xu (1999a,b, 2000), Kadarmideen *et al.* (2000, 2001), McIntyre *et al.* (2001), and Arechavaleta-Velasco and Hunt (2004).

## APPENDIX

#### Assumptions:

All the asymptotic results assume that (i) the parametric space of the logistic regression model parameters is a compact super-rectangular in *R*^{2+p} for the backcross population and in *R*^{3+p} for the intercross and *r* ∈ [0, γ] with γ < being specific. On the covariate *x*, assume that *E*{*xx*^{τ}} > 0 is positive definite and the distribution *g* of *x* satisfies *E*{|log *g*(*x*)|} < ∞.

#### Null limiting distribution of *T*_{n}:

Suppose that under the null hypothesis, the true parameter is θ_{0} = (γ, β_{0}, 0, λ_{0}); then, the probability mass function of the trait value *Y* isRewriteLetwhereRecall that the MLEs of β, α, and λ are consistent. So for *r* fixed, consider the Taylor expansion of δ_{i}(θ) at θ = θ_{0}. Rearranging the leading terms, we have<!?A3B2 tpb=2pt?>(A1)where and ϵ_{i}(θ) is the remainder.

By the zero-mean conditionsand<!?A3B2 tpb=1pt?>it can be shown (see Chen and Chen 2005 for technical details in a similar situation) that<!?A3B2 tpb=1pt?>where(A2)and *o*_{p}(1) means convergence to zero in probability. A standard analysis for *T*_{n2} yieldsHence,Uniformly in *r*,almost surely, and as *n* → ∞,in distribution. Therefore, by Slutsky's convergence theorem, the asymptotic distribution of *T*_{n} is given by (7).

#### Null limiting distribution of *S*_{n}:

The proof is similar to the case of *T*_{n}, but just adjust the orthogonal decomposition of δ_{i} in (A1) as follows:<!?A3B2 tpt=1pt?>Note that the first two terms contribute to *Q*_{n} defined in (A2) that is canceled out with the leading term of *T*_{n2} and the last terms are orthogonal to the first two and orthogonal to each other. And finally, sinceanduniformly in *r*, the limiting distribution of *S*_{n} is implied and given by (10).

#### EM algorithm for obtaining MLEs:

The EM algorithm with the backcross population is given here, while the EM algorithm with the intercross is similar and so omitted. Fix *r*. Let *u _{i}* be the genotype of the BTL (

*u*= 1 if

_{i}*BB*and = 0 if

*Bb*). Since

*u*is unobservable, it is treated as a missing value. DenoteThen the log-likelihood function with the complete data isThus in the E-step of the (

_{i}*t*+ 1)th EM iteration, we need to calculate onlyReplace the missing value

*u*by

_{i}*w*

_{i}^{(t)}in the log-likelihood function with the complete data; then in the M-step, we maximizewith respect to β, α, and λ. To do so, we can use the Newton-Raphson iteration method that needs the first and second partial derivatives given below:

## Acknowledgments

The authors thank the associate editor J. Bruce Walsh and two reviewers for their constructive suggestions and comments leading to substantial improvements of the manuscript. They also acknowledge helpful discussions with Hong Zhang. This research is supported in part by National Institutes of Health grant EY014478 (Z. L.).

## Footnotes

Communicating editor: J. B. Walsh

- Received June 24, 2005.
- Accepted October 24, 2005.

- Copyright © 2006 by the Genetics Society of America