## Abstract

The identification of genes that affect quantitative traits has been of great interest to geneticists for many decades, and many statistical methods have been developed to map quantitative trait loci (QTL). Most QTL mapping studies in experimental organisms use purely inbred lines, where the two homologous chromosomes in each individual are identical. As a result, many existing QTL mapping methods developed for experimental organisms are applicable only to genetic crosses between inbred lines. However, it may be difficult to obtain inbred lines for certain organisms, *e.g*., mosquitoes. Although statistical methods for QTL mapping in outbred populations, *e.g*., humans, can be applied to such crosses, these methods may not take full advantage of their unique features. For example, we can generally assume that the two grandparental lines are homozygous at the QTL of interest, but such information is not utilized by methods developed for outbred populations. In addition, mating types and phases can be relatively easy to establish through the analysis of adjacent markers because of the large number of offspring that can be collected, which substantially reduces the computational burden. In this article, motivated by a mosquito intercross experiment involving two selected lines that are not genetically homozygous across the genome, we develop statistical methods for QTL mapping for genetic crosses involving noninbred lines. In our procedure, we first infer parental mating types and use likelihood-based methods to infer phases in each parent on the basis of genotypes of offspring and one parent. A hidden Markov model is then employed to estimate the number of high-risk alleles at marker positions and putative QTL positions between markers in each offspring, and QTL mapping is finally conducted through the inferred QTL configuration across all offspring in all crosses.
The performance of the proposed methods is assessed through simulation studies, and the usefulness of this method is demonstrated through its application to a mosquito data set.

Most statistical methods for QTL mapping developed for experimental organisms are applicable only to crosses starting from two inbred lines differing in the trait of interest, with each inbred line genetically homozygous between the two sets of chromosomes. Crossing between the two lines yields F_{1} progeny, who receive a copy of each chromosome from the two homozygous inbred lines; thus they are heterozygous at all loci where the two inbred lines differ. Therefore, genotypes of the F_{2} progeny are highly informative for the inheritance pattern at putative QTL sites. For example, if the two parental lines are denoted by a high- (H) trait line and a low- (L) trait line, respectively, having genotype HH, HL, or LL at one putative QTL for an F_{2} individual offers full information on the number of high-trait alleles at the QTL, *i.e.*, 2, 1, or 0, respectively.

However, crosses may be carried out between individuals who are not completely homozygous across the genome. Although we can generally assume that the two lines being crossed are still homozygous at the QTL, other loci along the genome may be heterozygous. For example, a mosquito intercross experiment was conducted to detect QTL in *Anopheles gambiae* that control melanotic encapsulation response against *Plasmodium cynomolgi* Ceylon (Zheng *et al*. 2003). The encapsulation response is defined as the proportion of the encapsulated oocysts among all oocysts in a single mosquito. The crosses were carried out between a laboratory-selected *A. gambiae* refractory strain L3-5 and a susceptible strain 4Ar/r. The *A. gambiae* female is recognized as the most successful vector of human malarias. Refractoriness to *P. cynomolgi* Ceylon in the original refractory strain L3-5 seems to be largely but not completely recessive, which is also called incomplete recessive (Collins *et al*. 1986; Vernick and Collins 1989). The encapsulation responses of 167 F_{2} females, among which 123 became infected, from six intercrosses among offspring of L3-5 females and 4Ar/r males were tested and genotype data were collected on these F_{2} females and their F_{1} maternal parents at 52 microsatellite markers spanning the current genetic map of *A. gambiae*. Among the six F_{1} female parents of the F_{2} progeny, 11 autosomal markers were found to have more than two alleles, indicating that one or both of the parental strains (strain L3-5 and strain 4Ar/r) were polymorphic at these markers. However, there were no genotype data from the two original strains. For a detailed experimental setup and phenotype information in F_{0} and F_{1} generations, see Zheng *et al*. (2003). 
Due to the heterozygosity among individuals in the parental lines, genotypes of F_{2} progeny no longer offer full inheritance information at the genetic markers, and the existing QTL mapping methods are either not applicable or not designed to optimally use information from such studies, although much work has been done on QTL mapping between outbred lines. For example, Haley *et al*. (1994) developed a least-squares method for the analysis of crosses between outbred lines that simultaneously uses information from multiple linked markers. Their method has been applied in a three-generation pig experiment (Knott *et al.* 1998) and implemented in a web-based computer program, “QTL express” (Seaton *et al.* 2002). Although they assumed that major QTL affecting the trait of interest are fixed (although markers may be segregating), genotypic data on both the F_{2} individuals and their parents and grandparents are needed to consider all possible line origin combinations at a putative QTL in an F_{2} individual, making their method inapplicable for the current mosquito crosses, which lack grandparental and paternal genotype information. More recently, Lin *et al*. (2003) proposed a method that simultaneously estimates linkage phase, QTL location, and effect parameters. In this method, no simplified assumption that QTL are fixed was made, so all possible cross types have to be considered, making the method not optimized for the current mosquito crosses. Because of the uniqueness of the current mosquito crosses, we can make the assumption similar to that of Knott *et al*. (1994) that the genotypes at the QTL locations in two grandparental lines are fixed; thus, genotypes at the QTL locations in an F_{2} individual can be considered to be known provided that the phase type of an F_{2} individual can be estimated. 
Moreover, because of the large number of F_{2} progeny of the mosquito crosses, parental mating types and parental haplotypes can be inferred with very high accuracy; thus, there is no need to consider all possible cross types as required in the method of Lin *et al*. (2003). This can greatly reduce the inference complexity and makes our inference procedure more efficient. Therefore, to take full advantage of these unique features of the mosquito data, we have developed a statistical procedure for QTL mapping for genetic crosses with similar characteristics. This procedure consists of four components. First, we use genotypes from one parent in the F_{1} generation and the F_{2} offspring to infer the F_{1} mating type. Second, a likelihood-based method is used to infer the phases of the F_{1} parents. Third, a hidden Markov model is employed to estimate the number of high-trait alleles each F_{2} individual has at putative QTL locations. Finally, we perform linear regression analysis coupled with permutation-based methods to identify regions linked to the QTL. The performance of our procedure is evaluated through simulation studies, and our method is then applied to the mosquito data set to identify QTL in *A. gambiae* controlling the encapsulation response against *P. cynomolgi* Ceylon. To mimic the mosquito crosses, we focus on the intercross design, but extensions to other designs are straightforward.

## METHODS

#### Parental mating-type inference:

For the mosquito data, only genotypes from one parent are available, so the genotypes of the other parent have to be inferred. One advantage of the experimental crosses studied here is that there are a sufficient number of F_{2} progeny within each family so that the F_{1} mating type can be inferred with very high confidence from the observed genotypes of the F_{2} progeny together with possibly available parental information. In Appendix A and Table 1, we show that the probability of making correct mating-type inference approaches 1 when the number of offspring is ∼20–30 from each family, which is about the family size in the mosquito experiment. Therefore, the mating type at each locus can be inferred with very high confidence.
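One simple ingredient behind this behavior can be written down directly. The calculation below is only an illustrative sketch and is not the Appendix A derivation: it assumes Mendelian segregation and independent offspring, and it considers only the event that both alleles of a heterozygous, ungenotyped parent are transmitted at least once among $n$ offspring.

```latex
P(\text{both alleles observed among } n \text{ offspring})
  = 1 - 2\left(\tfrac{1}{2}\right)^{n} = 1 - 2^{\,1-n},
\qquad
n = 20:\quad 1 - 2^{-19} \approx 0.999998 .
```

Under these assumptions, the chance of failing to observe one of the two alleles is already negligible at the family sizes considered here.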

#### Likelihood approach for phase inference in F_{1} parents:

In contrast to crosses between inbred lines where phase (the composition of marker alleles on a single chromosome) information is completely known for F_{1} individuals, the phase information is not known with certainty due to heterozygosity in the F_{0} generation. However, phase information is essential for QTL analysis to correlate the number of high-trait alleles at a putative QTL site with the phenotype.

In this article, we use a likelihood-based approach for phase inference for the individuals in the F_{1} generation. We assume that the parental mating type can be inferred with high confidence and the genetic map is also known. As all these families have a very large number of F_{2} individuals, a full-likelihood approach to make simultaneous phase inference across all the markers is computationally prohibitive. Therefore, we conduct sequential inference by inferring the phase of each pair of adjacent markers. In Appendix B, we show that the probability of making correct phase inference on the basis of a set of 30 offspring is very high when the genetic distance between each pair of markers is not large (Table 3). As an illustration of this inference procedure, we consider the case where the two parents have four distinct alleles in our discussion here, and the other cases can be derived similarly. Let *a*, *b*, *c*, and *d* denote the four alleles at the first marker, and the four alleles at the second marker are denoted by 1, 2, 3, and 4. In this case, for each pair of markers, we write the mating type at the first marker as *ab* × *cd*, where allele *a* represents the allele in the mother with a grandmaternal origin, allele *b* represents the allele in the mother with a grandpaternal origin, and *c* and *d* represent alleles in the father with grandmaternal and grandpaternal origins, respectively. When genotype information is not available from either parent, we can fix the mating type at the first marker as *ab* × *cd*, and then there are eight possible mating types at the second marker: 12 × 34, 12 × 43, 21 × 34, 21 × 43, 34 × 12, 34 × 21, 43 × 12, and 43 × 21. For each of these eight mating types at the second marker, we can calculate the likelihood of observing individual genotypes across these two markers in the F_{2} progeny within each family. The phase across these two markers can be inferred through the comparisons of the likelihoods among all eight possible mating types.
For example, consider the following mating type between two parents: (a1, b2) × (c3, d4), where the phases in the mother are a1 and b2, and those in the father are c3 and d4. In the formation of F_{2} progeny, the mother has a chance of $(1-\theta)/2$ to generate a nonrecombinant gamete a1 or b2, respectively, and a chance of $\theta/2$ to generate a recombinant gamete a2 or b1, respectively. Similarly, the father has a chance of $(1-\theta)/2$ to generate a nonrecombinant gamete c3 or d4, respectively, and a chance of $\theta/2$ to generate a recombinant gamete c4 or d3, respectively. Here $\theta$ is the recombination fraction between the two markers. Therefore, 16 possible two-locus phase types can be formed in the F_{2} progeny. The probability for each possible two-locus F_{2} phase type under this specific case is summarized in Table 2. Because the genotypes of the F_{2} progeny are independent conditional on the F_{1} mating types, the likelihood of the observed F_{2} genotypes is

$$L_j = \prod_i p_{ij}^{\,n_i}, \qquad (1)$$

where $p_{ij}$ is the probability of an F_{2} progeny having genotype combination *i* at the two markers given that the F_{1} two-locus phase type is *j*; and $n_i$ is the number of F_{2} progeny having genotype combination *i* at the two markers. We choose the F_{1} two-locus phase type that gives the highest likelihood among the eight possible F_{1} parental phase types and proceed with the same procedure until reaching the last marker on the chromosome. We then obtain the four complete haplotypes for the F_{1} parents, conditioning on the prefixed phase type at the first marker. When the genotype information from one parent is available, there are four possible mating types at each marker and the phase information can be inferred similarly as described above.
As described in Appendix B and shown in Table 3, simulations suggest that the probability of correctly inferring the F_{1} haplotype approaches 1 with ∼20–30 offspring in a given family, so this pairwise procedure offers an excellent balance between inference accuracy and computational efficiency.
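The pairwise comparison can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it covers only the case where one parent is genotyped (four candidate mating types at the second marker), it assumes four distinct alleles at each marker and a known recombination fraction `theta`, and the function names and data layout are hypothetical.

```python
import math
from itertools import product

def gamete_probs(marker1, marker2, theta):
    """Two-locus gamete probabilities for one parent whose haplotypes are
    (marker1[0], marker2[0]) and (marker1[1], marker2[1])."""
    probs = {}
    for i, j in product((0, 1), repeat=2):
        # Same index: nonrecombinant gamete; different index: recombinant.
        probs[(marker1[i], marker2[j])] = (1 - theta) / 2 if i == j else theta / 2
    return probs

def log_likelihood(offspring, mother, father, theta):
    """Log-likelihood of unordered two-marker offspring genotypes.

    offspring: list of (frozenset at marker 1, frozenset at marker 2).
    mother, father: (ordered alleles at marker 1, ordered alleles at marker 2).
    """
    mother_gametes = gamete_probs(mother[0], mother[1], theta)
    father_gametes = gamete_probs(father[0], father[1], theta)
    total = 0.0
    for geno1, geno2 in offspring:
        p = sum(
            pm * pf
            for (m1, m2), pm in mother_gametes.items()
            for (f1, f2), pf in father_gametes.items()
            if frozenset((m1, f1)) == geno1 and frozenset((m2, f2)) == geno2
        )
        if p == 0.0:  # genotype impossible under this candidate
            return float("-inf")
        total += math.log(p)
    return total

def infer_phase(offspring, theta):
    """Pick the candidate second-marker mating type with the highest likelihood.

    Assumption for illustration: the first marker is fixed as ab x cd, the
    mother carries alleles 1 and 2, and the father carries alleles 3 and 4.
    """
    first_mother, first_father = ("a", "b"), ("c", "d")
    candidates = [
        (m, f)
        for m in (("1", "2"), ("2", "1"))
        for f in (("3", "4"), ("4", "3"))
    ]
    scores = {
        cand: log_likelihood(
            offspring, (first_mother, cand[0]), (first_father, cand[1]), theta
        )
        for cand in candidates
    }
    return max(scores, key=scores.get), scores
```

Repeating this comparison for each adjacent marker pair, and carrying the chosen phase forward, corresponds to the sequential scheme described above.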

#### Hidden Markov model for the inference of the number of high-trait alleles:

In statistical analysis of crosses resulting from inbred lines, the degree of association between the trait value of each individual in the F_{2} generation and the number of high-trait alleles at the putative QTL site(s) provides information on genetic linkage. For these crosses, it is relatively straightforward to estimate the number of high-trait alleles at each position along the chromosomes. However, for the crosses considered here, such inference is more difficult, especially when the grandparental origin of each allele is unknown. In the following discussion, although the two noninbred lines are not genetically homozygous at some markers, we assume that they are genetically identical at the QTL within each grandparental line and are different between the two lines because the trait of interest is not segregating within each grandparental line. Therefore, we can denote the genotype at a QTL of the high-trait line in the F_{0} generation as HH and that of the low-trait line as LL, so each F_{1} progeny has genotype HL, with one high-trait allele and one low-trait allele. In the intercross setting, at each putative QTL, if we label the two haplotypes, inferred phases across all the markers, in the mother by M1 and M2 and the two haplotypes in the father by P1 and P2, there are four possible combinations of haplotypes and QTL alleles for F_{1} parents: (H, M1)(L, M2) × (H, P1)(L, P2), (H, M1)(L, M2) × (H, P2)(L, P1), (H, M2)(L, M1) × (H, P1)(L, P2), and (H, M2)(L, M1) × (H, P2)(L, P1). For example, in the case of (H, M1)(L, M2) × (H, P1)(L, P2), M1 and P1 are the haplotypes transmitted from the high-trait line and M2 and P2 are the haplotypes transmitted from the low-trait line. For each F_{2} individual, our goal is to estimate the number of high-trait (H) alleles at each candidate QTL. In the absence of information on the origins of haplotypes M1, M2, P1, and P2, we need to consider all these four possibilities. 
For each given possibility, *e.g.*, (H, M1)(L, M2) × (H, P1)(L, P2), we can use a hidden Markov model described in the following to infer the number of H alleles at the putative QTL sites along the chromosome conditional on the observed genotypes of the offspring and the haplotypes of the parents.

At a given locus *i* on a chromosome for an F_{2} progeny, the grandparental origin, *i.e*., the number of high-trait alleles at a putative QTL site, is not directly observable and can be thought of as a hidden state. We use 1 to denote the event that the mother transmitted the high-trait segment to the offspring, 3 to denote the event that the mother transmitted the low-trait segment, and 2 or 4 to denote the event that the father transmitted the high-trait segment or low-trait segment to the offspring, respectively. Using this notation, there are four possible types at marker *i* in the F_{2} progeny, denoted by (1, 2), (1, 4), (3, 2), and (3, 4). So there are four possible hidden states, and the hidden state at marker *i* is denoted by $H_i$. Note that the hidden state for an F_{2} individual at a given marker corresponds to the inheritance vector for this individual at that marker (Lander and Green 1987). The observed genotype of the F_{2} progeny at the marker is treated as the observed state. Assuming the no interference model for the crossover process, the hidden states $\{H_i\}$ follow a Markov chain,

$$P(H_{i+1} \mid H_1, \ldots, H_i) = P(H_{i+1} \mid H_i), \qquad (2)$$

where $P(H_{i+1} \mid H_i)$ is the transition probability of the hidden states from one marker to another marker, which depends only on the recombination fraction between the two markers. The transition probabilities are summarized in Table 4. The distribution of the initial hidden states, *i.e.*, hidden states at the first marker on a chromosome, is set to

$$P(H_1 = h) = \tfrac{1}{4}, \qquad h \in \{(1,2),\ (1,4),\ (3,2),\ (3,4)\}. \qquad (3)$$

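As for the emission probability at marker *i* in the F_{2} progeny, we take a marker with F_{1} phase type *ab* × *ba* as an example, and the genotypes at this marker in the F_{2} progeny are *aa*, *ab*, and *bb*. The phase types at this marker in the F_{2} progeny, corresponding to the hidden states (1, 2), (1, 4), (3, 2), and (3, 4), are *ab*, *aa*, *bb*, and *ba*, respectively. Thus, the emission probabilities at marker *i* are

$$P(O_i = ab \mid H_i = (1,2)) = P(O_i = aa \mid H_i = (1,4)) = P(O_i = bb \mid H_i = (3,2)) = P(O_i = ab \mid H_i = (3,4)) = 1, \qquad (4)$$

with all other emission probabilities equal to 0, where H denotes the hidden state and O denotes the observed state.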

Given these three components in the hidden Markov model, we can estimate the probabilities of the hidden states in the F_{2} progeny at each marker as reviewed in Rabiner (1989). Note that we have four sets of estimated high-trait allele distributions according to the four phase types at each putative QTL location.
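As a rough illustration of how the three components fit together, the sketch below runs an unscaled forward-backward pass over the four hidden states. It is not the authors' code. Because Table 4 is not reproduced here, the transition matrix is an assumption: each parental meiosis is taken to switch grandparental origin independently with probability `theta`. Genotypes are assumed error free, missing genotypes are treated as uninformative, and all function names are hypothetical.

```python
STATES = [(1, 2), (1, 4), (3, 2), (3, 4)]  # (maternal origin, paternal origin)
N_HIGH = [2, 1, 1, 0]                      # high-trait alleles carried by each state

def transition(theta):
    """4x4 transition matrix between adjacent loci (assumed form, see lead-in)."""
    return [
        [
            ((1 - theta) if m0 == m1 else theta) * ((1 - theta) if f0 == f1 else theta)
            for (m1, f1) in STATES
        ]
        for (m0, f0) in STATES
    ]

def emission(mother, father, observed):
    """Emission probabilities for one marker.

    mother, father: (allele on high-trait haplotype, allele on low-trait haplotype).
    observed: two-allele genotype such as "ab", or None when missing.
    """
    if observed is None:
        return [1.0] * 4
    out = []
    for m, f in STATES:
        child = {mother[0] if m == 1 else mother[1], father[0] if f == 2 else father[1]}
        out.append(1.0 if child == set(observed) else 0.0)
    return out

def posteriors(emissions, thetas):
    """Posterior hidden-state probabilities at every locus (no scaling, so
    intended for short chains only)."""
    n = len(emissions)
    fwd = [[0.25 * e for e in emissions[0]]]  # uniform initial distribution
    for i in range(1, n):
        T = transition(thetas[i - 1])
        fwd.append([
            sum(fwd[i - 1][s] * T[s][t] for s in range(4)) * emissions[i][t]
            for t in range(4)
        ])
    bwd = [[1.0] * 4 for _ in range(n)]
    for i in range(n - 2, -1, -1):
        T = transition(thetas[i])
        bwd[i] = [
            sum(T[s][t] * emissions[i + 1][t] * bwd[i + 1][t] for t in range(4))
            for s in range(4)
        ]
    post = []
    for f_row, b_row in zip(fwd, bwd):
        weights = [f * b for f, b in zip(f_row, b_row)]
        total = sum(weights)
        post.append([w / total for w in weights])
    return post

def expected_high_alleles(post):
    """Expected number of high-trait alleles at each locus."""
    return [sum(p * h for p, h in zip(row, N_HIGH)) for row in post]
```

For a marker with phase type *ab* × *ba* and an observed genotype *aa*, the posterior puts all its mass on state (1, 4); an adjacent untyped locus then inherits that information through the transition matrix. The same pass would be run once for each of the four QTL configurations.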

#### Hidden Markov model for interval mapping:

Lander and Botstein (1989) proposed an interval mapping method that is more powerful than marker-based analysis. The above hidden Markov model can be extended in a straightforward manner to infer the number of high-trait alleles at any position between two flanking markers. If we denote the observed genotypes at all marker positions by $O$, denote by $G_j$ the genotype at the *j*th putative QTL locus among a total of *J* considered QTL between the two flanking markers $i$ and $i+1$, and also denote by $Q_j$ the phase type at the *j*th putative QTL, we then have

$$P(Q_j \mid O) = \sum_{H_i,\, H_{i+1}} P(Q_j \mid H_i, H_{i+1})\, P(H_i, H_{i+1} \mid O). \qquad (5)$$

Here $P(H_i, H_{i+1} \mid O)$ can be obtained similarly as described in the previous section, and $P(Q_j \mid H_i, H_{i+1})$ can be obtained on the basis of the recombination fractions $\theta_{i,j}$ and $\theta_{j,i+1}$, where $\theta_{i,j}$ is the recombination fraction between marker $i$ and the *j*th putative QTL, and $\theta_{j,i+1}$ is the recombination fraction between the *j*th putative QTL and marker $i+1$. The two recombination fractions can be calculated on the basis of the map distance using Haldane's map function, which assumes no interference. We can add putative QTL positions throughout the chromosome (for example, every 1 cM or every 2 cM). For each added putative QTL position, the recombination fractions can be calculated on the basis of the map distance from the added position to the two flanking markers. A detailed derivation of Equation 5 can be found in Appendix C.
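The conversion from map distance to recombination fraction is short enough to show directly. The sketch below uses Haldane's map function, θ = (1 − e^(−2d))/2 with d in Morgans; the helper names and the grid-building function are hypothetical conveniences for illustration.

```python
import math

def haldane(distance_cm):
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

def flanking_thetas(left_cm, qtl_cm, right_cm):
    """Recombination fractions from a putative QTL to its two flanking markers."""
    return haldane(qtl_cm - left_cm), haldane(right_cm - qtl_cm)

def putative_positions(left_cm, right_cm, step_cm=2.0):
    """Grid of putative QTL positions strictly between two flanking markers."""
    positions = []
    pos = left_cm + step_cm
    while pos < right_cm:
        positions.append(pos)
        pos += step_cm
    return positions
```

Under no interference, the two flanking fractions combine as θ₁ + θ₂ − 2θ₁θ₂, which equals the recombination fraction between the flanking markers themselves; this gives a quick consistency check on any grid of added positions.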

#### QTL analysis:

The statistical association between a putative QTL and phenotypes is tested by regressing the phenotypes on the estimated number of high-trait alleles obtained from the hidden Markov model,

$$y_k = \beta_0 + \beta_1 x_k + \varepsilon_k, \qquad (6)$$

where $x_k$ denotes the inferred expected number of high-trait alleles of the *k*th individual in the F_{2} progeny at the tested putative QTL, and $y_k$ denotes the quantitative phenotype of the *k*th individual in the F_{2} progeny. Within each intercross family, we calculate the *F*-statistic for the null hypothesis of no association between the phenotype and a putative QTL, $H_0\colon \beta_1 = 0$,

$$F_l = \frac{\sum_{k=1}^{n_l} (\hat{y}_k - \bar{y}_l)^2}{\sum_{k=1}^{n_l} (y_k - \hat{y}_k)^2 / (n_l - 2)}, \qquad l = 1, \ldots, L, \qquad (7)$$

where $\bar{y}_l$ is the mean phenotype within family *l*, $n_l$ denotes the number of F_{2} progeny within family *l*, $\hat{y}_k$ is the fitted value of phenotype, and *L* is the number of families. We can then obtain the overall *F*-statistic, *F*_{com}, across the *L* families (Equation 8). For each putative QTL, we conduct four tests based on the estimated number of high-trait alleles assuming the four configurations at the QTL as explained above, and we select the maximum *F*_{com}-statistic among the four *F*_{com}-statistics and name it the *F*_{all}-statistic, which will be the test statistic for QTL mapping.
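The regression step can be illustrated with a short, dependency-free sketch. The within-family *F*-statistic below is the usual one for a simple linear regression. The pooled statistic is an assumption made only for illustration (regression and residual sums of squares summed over families), since the exact form of *F*_{com} is not shown in this text; all function names are hypothetical.

```python
def regression_sums(x, y):
    """Regression and residual sums of squares for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssr, sse

def family_f(x, y):
    """Within-family F-statistic for the null hypothesis of zero slope."""
    ssr, sse = regression_sums(x, y)
    return ssr / (sse / (len(x) - 2))

def pooled_f(families):
    """ASSUMED combination across families: pooled sums of squares.

    families: list of (x, y) pairs, one pair per family.
    """
    parts = [regression_sums(x, y) for x, y in families]
    ssr = sum(p[0] for p in parts)
    sse = sum(p[1] for p in parts)
    df_resid = sum(len(x) - 2 for x, _ in families)
    return (ssr / len(families)) / (sse / df_resid)

def f_all(families_by_configuration):
    """Maximum combined statistic over the candidate QTL configurations."""
    return max(pooled_f(families) for families in families_by_configuration)
```

Here `x` holds the expected numbers of high-trait alleles from the hidden Markov model and `y` the phenotypes; `f_all` would be called with one list of families for each of the four configurations.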

#### Simulation studies:

To study the performance of our procedure for QTL mapping in experimental crosses in the case of noninbred lines in the F_{0} generation, we performed two simulation studies. The first study evaluated the procedure when the QTL was genotyped, *i.e.*, the QTL is one of the genetic markers used, called the putative simulation case in the following discussion; and the second simulation study considered the situation where the QTL was not genotyped but fell between two markers, called the interval simulation case in the following discussion. All simulations were designed to mimic the mosquito intercross experiment: six intercross families with 30 progeny in each family, with 15 markers on one chromosome for the putative simulation case and 14 markers for the interval simulation case. We assumed an incomplete recessive genetic model, which means that instead of only F_{1} homozygotes being susceptible, a proportion of heterozygous mosquitoes can also develop the phenotype. Thus, we defined one genotypic value for QTL genotype HH and another for QTL genotype LL, whereas individuals with QTL genotype HL formed two groups with different average values, with *b* called the degree of incompleteness. We set the overall trait variance to be 1, so that the heritability, *i.e.*, the proportion of variance due to genetic factors, equals the genetic variance. Note that the heritability used here is defined in the broad sense, also called the degree of genetic determination (Falconer and Mackay 1996). By fixing the heritability and the degree of incompleteness *b* to some prespecified level, we can obtain the genotypic value by solving the resulting heritability equation (for a detailed derivation of the heritability with incompleteness, see Appendix D). Given the simulated genotypes, the phenotype can then be simulated from the normal distributions described above.

We first assigned allele frequencies of the 15 markers on one chromosome for the two grandparental lines in the F_{0} population. To mimic the situation of two noninbred lines in the F_{0} population, a single QTL was positioned at 35 cM on the 100-cM chromosome segment (marker 4), which was genetically homozygous, while all other markers were polymorphic instead and the number of alleles was up to four. These numbers were also chosen to mimic the mosquito intercross data. The 15 genetic markers had a marker spacing of 4.5–10.7 cM. The assigned allele frequencies and marker locations are displayed in Table 5.

For the putative simulation case, we simulated two haplotypes of the F_{1} mother on the basis of the predefined allele frequencies with one haplotype from each population line. The same simulation procedure was repeated to obtain the two haplotypes of the F_{1} father. The genotypes of each F_{2} progeny were simulated on the basis of the simulated F_{1} haplotypes and the recombination rates among the markers. More specifically, we chose an allele at the first marker with probability 0.5 from the mother and father and then, starting from the second marker, chose an allele either from the same haplotype or from a different haplotype from the subsequent marker according to whether there was a recombination event between the two markers for a parent.
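The gamete-by-gamete simulation just described might look like the following sketch. It is an illustration under stated assumptions, not the simulation code used in the study: haplotypes are plain lists of alleles, `thetas[k]` is the recombination fraction between markers `k` and `k + 1`, crossovers occur independently between intervals, and the function names are hypothetical.

```python
import random

def simulate_gamete(hap1, hap2, thetas, rng):
    """Draw one gamete from a parent carrying haplotypes hap1 and hap2."""
    haps = (hap1, hap2)
    current = rng.randrange(2)          # starting haplotype, probability 0.5 each
    gamete = [haps[current][0]]
    for k, theta in enumerate(thetas, start=1):
        if rng.random() < theta:        # recombination in this interval
            current = 1 - current
        gamete.append(haps[current][k])
    return gamete

def simulate_offspring(mother, father, thetas, rng):
    """Simulate one F2 individual as a list of (maternal, paternal) allele pairs.

    mother, father: each a pair of haplotypes (lists of alleles).
    """
    maternal = simulate_gamete(mother[0], mother[1], thetas, rng)
    paternal = simulate_gamete(father[0], father[1], thetas, rng)
    return list(zip(maternal, paternal))
```

A family of 30 progeny is then 30 calls to `simulate_offspring` with a shared random generator, for example `random.Random(2024)`.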

For the interval simulation case, the putative QTL located at marker 4 was removed from the simulated data sets, making the total number of markers equal to 14. Therefore, the QTL was located between the third and fourth markers in this case. The phenotype data were still simulated on the basis of the QTL genotype for each individual, and all other simulation procedures remained the same.

We also performed simulation studies to study the impact of varying marker information content on QTL analysis. We tested this by setting some F_{2} genotypes to be missing at some markers or by setting several markers to have only one allele in the F_{1} mating type, where we modified the allele frequencies of three markers (2, 9, and 15) as shown in Table 5. When genotype information is missing in F_{2} individuals at some markers, the interval QTL calculation for the number of expected high-trait alleles is carried out using the two nearest existing flanking markers while the positions of the added interval QTL are kept, *i.e.*, between two flanking markers that are supposed to be genotyped no matter whether the genotype information is present or not. We explored the impact of missing marker genotypes by testing different levels of missingness.

For all sets of simulation studies, we also performed interval mapping by considering putative QTL between two flanking markers as described in *Hidden Markov model for interval mapping*.

We performed 1000 simulations for all simulation scenarios. For each simulation, we first estimated the 100(1−α)% chromosomewide threshold for the *F*_{all}-statistic. With specified heritability and degree of incompleteness, we simulated F_{2} phenotype data on the basis of F_{2} genotype data at the true QTL. We then applied our procedure and recorded the chromosomewide maximum *F*_{all}-statistic under H_{1} for this specific simulation. The 100(1−α)% chromosomewide threshold for the *F*_{all}-statistic was estimated through the permutation test. To generate the permuted samples, we permuted the simulated phenotypes within each family and kept the simulated F_{2} genotype at the QTL location intact, *i.e.*, kept the expected number of high-trait alleles of each individual intact. Each permutation generated a new data set in which the null hypothesis of no linkage between the phenotypes and the putative QTL holds. The resulting chromosomewide maximum *F*_{all}-statistic was stored and the permutation procedure was repeated 1000 times. The 100(1−α) percentile of the maximum *F*_{all}-statistic was recorded (Churchill and Doerge 1994; Broman 2003) as our estimated chromosomewide critical value, and we used this value to detect the presence of a QTL on the chromosome while controlling the overall type I error rate at ≤α for the current simulated data set. The whole procedure was repeated 1000 times, and the proportion of times that the null hypothesis of no QTL on this chromosome was rejected was the estimated power to detect the QTL.
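The within-family permutation scheme can be sketched as follows. This is a minimal illustration, not the authors' code: `max_stat` is a hypothetical user-supplied function that takes the permuted phenotypes (one list per family) and returns the chromosomewide maximum statistic, and the percentile is taken as a simple order statistic of the permuted maxima.

```python
import math
import random

def permute_within_families(phenotypes_by_family, rng):
    """Shuffle phenotypes inside each family, leaving genotypes untouched."""
    permuted = []
    for family in phenotypes_by_family:
        shuffled = list(family)   # copy so the input is not modified
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted

def permutation_threshold(phenotypes_by_family, max_stat, n_perm=1000, alpha=0.05, seed=0):
    """Estimate the 100(1 - alpha)% chromosomewide critical value."""
    rng = random.Random(seed)
    null_maxima = sorted(
        max_stat(permute_within_families(phenotypes_by_family, rng))
        for _ in range(n_perm)
    )
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_maxima[index]
```

An observed chromosomewide maximum exceeding the returned value would be declared significant at level α.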

#### Mosquito data:

As explained in the Introduction, a set of experiments was carried out to identify genes in *A. gambiae* controlling the melanotic encapsulation response against *P. cynomolgi* Ceylon (Zheng *et al.* 2003). The encapsulation responses of 167 F_{2} females from six intercrosses among offspring of L3-5 females and 4Ar/r males were tested, among which 123 F_{2} females became infected. The encapsulation responses of the F_{2} females in the six families are summarized in Table 6 and the histograms of each individual family and six families combined are provided in Figure 1. Note the significant departure from a normal distribution of the encapsulation responses; therefore, the standard deviations in Table 6 should be interpreted as a very approximate indication of the spread of the responses. Among the 52 genotyped microsatellite markers spanning the genetic map of *A. gambiae*, 35 were informative markers, including 5 on the *X* chromosome, 16 on the second chromosome, and 14 on the third chromosome. No QTL was found on the *X* chromosome, so we focused on chromosome 2 and chromosome 3 in this article. Note that Zheng *et al.* (1996) observed that recombination in both male and female mosquitoes was comparable, and a similar observation was made in other anopheline mosquitoes (Mitchell *et al.* 1993; Seawright and Narang 1993). Therefore, in this article, we used the same genetic map for both males and females.

The *F*_{all}-statistic at each marker position as well as at each putative QTL position between two flanking markers was obtained and the statistical significance of the *F*_{all}-statistics was evaluated empirically through the permutation test. To generate the permuted samples, we permuted the encapsulation phenotype within each family and kept the expected number of high-trait alleles of each individual intact. Each permutation generated a new data set in which the null hypothesis of no linkage between the phenotype and the genotype of the putative QTL holds. The permuted data were then analyzed for QTL effects within each individual family. The resulting *F*_{all}-statistics at each position were stored and the procedure was repeated 10,000 times. To calculate the empirical *P*-values adjusting for multiple testing, we applied the procedure by Lystig (2003), where, for each permutation, the largest *F*_{all}-statistic was recorded and the adjusted *P*-value is the proportion of the maximum *F*_{all}-statistics from 10,000 permutations that is greater than or equal to the original *F*_{all}-statistic.
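The adjusted *P*-value described here reduces to a one-line proportion once the permutation maxima are available. The sketch below assumes those maxima have already been collected in a list; the function name is hypothetical.

```python
def adjusted_p_value(observed_stat, permuted_maxima):
    """Proportion of permutation maxima at least as large as the observed statistic."""
    exceed = sum(1 for m in permuted_maxima if m >= observed_stat)
    return exceed / len(permuted_maxima)
```

Because each permutation contributes only its largest statistic across all tested positions, the resulting *P*-value accounts for the multiple positions scanned.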

We also conducted QTL mapping by comparing the observed chromosomewide maximum *F*_{all}-statistic with the chromosomewide critical value obtained similarly as described in the *Simulation studies* section on the basis of permutations. Chromosomes that have chromosomewide maximum *F*_{all}-statistics exceeding the critical value possibly bear one or more QTL.

## RESULTS

In the simulation studies, we considered four levels of heritability, from very low to moderate, 0.03125, 0.0625, 0.125, and 0.25. We also considered two levels of incompleteness, 0.5 and 0.7, to mimic the mosquito study, where the degree of incompleteness was slightly >50%. With these two parameters fixed, the corresponding phenotypic value can be calculated accordingly. The significance level α was set to be 0.05. Another simulation parameter we considered was missing genotype rate. We tested three different missing genotype rates, 0, 0.1, and 0.25, to explore the impact of missing marker genotypes. We also tested the impact of having one allele at some marker positions in F_{1} mating types using interval simulation when the missing genotype rate was set at 0.1 and the degree of incompleteness was set at 0.5.

As shown in Table 1, when the number of F_{2} progeny was ∼20–30 in a given family, the probability of inferring the F_{1} mating type correctly approached 1. Similarly, Table 3 shows that the probability of inferring an F_{1} haplotype correctly also approached 1, even when the map distance between two markers was as long as 40 cM. We note that the maximum map distances between adjacent markers on the two mosquito chromosomes are both ∼18 cM. Therefore, with six mosquito families having ∼15–37 individuals in the F_{2} generation, we should have very high confidence that our pairwise procedure for the F_{1} phase inference within each mosquito family can lead to accurate inference.

To estimate the overall type I error rate, we simulated F_{2} phenotype data under the null hypothesis of no QTL on the chromosome and obtained the chromosomewide critical value for the simulation following the permutation procedures described in *Simulation studies*. We then compared the observed chromosomewide maximum *F*_{all}-statistic with the chromosomewide critical value. The above procedure was repeated 1000 times and the proportion of times that the null hypothesis was rejected was an estimate of the overall type I error rate. Our simulation for the overall type I error rate was based on one parameter setting when the heritability was 0.25, the degree of incompleteness was 0.5, and the missing genotype rate was 0. The overall type I error rate for the interval simulation case with QTL at a marker was 0.049, and it was 0.059 for the interval simulation case using the interval mapping method, both of which are within the 95% confidence interval (0.036, 0.064) for a 5% nominal type I error rate based on 1000 simulations.
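The quoted interval can be reproduced with a short calculation. The version below assumes the usual normal approximation to the binomial proportion; the text does not state which method was used, so this is offered only as a consistency check.

```latex
0.05 \pm 1.96\sqrt{\frac{0.05 \times 0.95}{1000}}
  \approx 0.05 \pm 1.96 \times 0.00689
  \approx 0.05 \pm 0.0135
  \;\Longrightarrow\; (0.036,\ 0.064)
```

Both estimated error rates, 0.049 and 0.059, fall inside this range.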

The power analysis results from different simulation studies for both interval simulation and putative simulation scenarios are summarized in Table 7 and plotted in Figure 2. Figure 2 includes results both from marker-based analyses and from interval mapping scanning through the chromosome at every 2 cM. We note the following:

The interval mapping method did provide higher power than mapping at marker positions only for all scenarios considered. For example, when heritability was 12.5% and the missing genotype rate was 25%, for the interval simulation scenario we had 75% power using marker-based analysis and 80% power using interval mapping.

As heritability increased, the power to detect QTL increased, achieving almost 100% power when the heritability was set at 25% for both putative and interval simulations with different missing genotype rates considered.

As the missing genotype rate increased, the power decreased, where the loss in power was greater for the interval simulation scenario than that for the putative simulation scenario. The loss in power was the greatest when heritability was either 0.0625 or 0.125, with as much as 30% power loss, whereas the power loss was relatively small when heritability was 0.25.

When the degree of incompleteness was increased from 0.5 to 0.7, there was a consistent decrease in power for both putative and interval simulations at all levels of heritability.

In testing the impact of having only one allele at some markers in the F_{1} mating type, we focused on interval simulation when the degree of incompleteness was 0.5, and the missing genotype rate was 0.1. The results are summarized in Table 8. There was clear power loss in QTL detection when there was only one allele at some markers in the F_{1} mating type compared to the situation when there were two or more alleles at all markers in the F_{1} mating type for all heritability values considered. Note that the estimated QTL effect, *i.e.*, the regression coefficient estimate from the QTL analysis, was ∼0.7–0.9 when the heritability value was 0.25. When the heritability was 0.03125, the estimated QTL effect decreased to ∼0.3–0.5.

The results from the mosquito intercross data at marker positions, as well as results from interval QTL mapping scanning through the chromosome at every 2 cM, are plotted in Figure 3 for chromosome 2 and in Figure 4 for chromosome 3. Also plotted are the adjusted *P*-values for analysis at marker positions, estimated on the basis of the permutation procedure described in methods, and the chromosomewide threshold values controlling the overall type I error rate at the nominal level. These results suggest that one possible QTL region on chromosome 2 and two QTL regions on chromosome 3 were related to encapsulation response against *P. cynomolgi* Ceylon. Note that previous genetic mapping experiments on encapsulation response against *P. cynomolgi* B, a simian malaria originating in Malaysia, with the current refractory (L3-5) and susceptible (4Ar/r) strains have identified one major (named *Pen1*) and two minor (named *Pen2* and *Pen3*) autosomal dominant QTL. *Pen1* has been mapped to chromosome 2R, division 8C, while *Pen2* and *Pen3* are less precisely located (Collins *et al.* 1997, 1999; Zheng *et al.* 1997). In our investigation of the response of L3-5 to infection with *P. cynomolgi* Ceylon, a major QTL on chromosome 2 was identified at marker *3F11-C1*, close to markers *H290* or *H175*, where *Pen1* was most precisely defined. Previous analyses of this data set found that marker *3F11-C1* had a large but nonsignificant *F*-statistic and thus did not define a new QTL (Zheng *et al.* 2003). Marker *H095* appeared to define another minor QTL on the basis of the permutation-adjusted *P*-value, and this marker was most closely associated with *Pen3*, a minor QTL for *P. cynomolgi* B encapsulation. Note that previous analyses defined markers *H135* and *H095* as a novel QTL, termed *Pcen2R* (Zheng *et al.* 2003). The interval mapping method identified another possible minor QTL between markers *H290* and *H799* on chromosome 2.

Two major QTL regions were identified on chromosome 3. Marker *3A6-17* defined a major QTL region, where the nearest markers *H555* and *H158* defined a QTL, *Pcen3R*, in the previous linear regression analyses (Zheng *et al.* 2003). Marker *3A2-29* defined another major QTL region; the same marker defined a novel QTL, *Pcen3L*, in the previous linear regression analysis (Zheng *et al.* 2003). Interval mapping with a 2-cM scan on chromosome 3 revealed another peak between the two tightly linked markers *3A2-29* and *H577*, which confirmed the major QTL defined by *3A2-29*. Note that the estimated QTL effect was ∼0.3–0.4 for chromosome 2 and ∼0.4–0.5 for chromosome 3. We also note that marker *H758* was the marker most closely associated with *Pen2*, and that marker *3A2-29*, which gave the highest *F*_{all}-statistic on chromosome 3, was not yet available when studies on *P. cynomolgi* B were conducted.
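The chromosomewide thresholds and permutation-adjusted *P*-values used in these analyses follow the general logic of a maximum-statistic permutation test, which can be sketched as follows. This Python sketch rests on several assumptions: a simple one-regressor *F*-statistic stands in for the *F*_{all}-statistic, the inferred number of high-trait alleles at each position is taken as given, and the names `regression_f` and `permutation_scan` are hypothetical.

```python
import random

def regression_f(x, y):
    """F-statistic (1 and n-2 d.f.) for simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no evidence of association
    r2 = sxy ** 2 / (sxx * syy)
    if r2 >= 1.0:
        return float("inf")
    return r2 * (n - 2) / (1.0 - r2)

def permutation_scan(allele_counts, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """allele_counts[k][i] is the (assumed known) number of high-trait alleles
    at position k for offspring i. Returns the observed F per position, the
    chromosomewide threshold, and a permutation-adjusted P-value per position."""
    rng = random.Random(seed)
    observed = [regression_f(x, phenotypes) for x in allele_counts]
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        null_max.append(max(regression_f(x, shuffled) for x in allele_counts))
    null_max.sort()
    # Approximate (1 - alpha) quantile of the permuted chromosomewide maxima.
    threshold = null_max[min(n_perm - 1, round((1 - alpha) * n_perm))]
    adjusted = [sum(m >= f for m in null_max) / n_perm for f in observed]
    return observed, threshold, adjusted

# Toy data (hypothetical): three positions, effect at the first position only.
rng = random.Random(1)
counts = [[rng.choice([0, 1, 2]) for _ in range(40)] for _ in range(3)]
pheno = [0.8 * c + rng.gauss(0.0, 1.0) for c in counts[0]]
f_obs, f_crit, p_adj = permutation_scan(counts, pheno, n_perm=500)
print(f_crit, p_adj)
```

Comparing each observed statistic with the distribution of the permuted maximum, rather than with a pointwise null distribution, is what controls the error rate over the whole chromosome.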

## DISCUSSION

In this article, we have considered QTL mapping in experimental crosses between noninbred lines. Under this situation, most QTL mapping methods are no longer applicable since they generally assume that experimental crosses begin with two genetically homozygous inbred lines. The development of this method is motivated by a mosquito intercross experiment involving two laboratory-selected lines that are not genetically homozygous across many markers. Although some methods have been developed for crosses between outbred lines (*e.g.*, Haley *et al*. 1994; Knott *et al*. 1998; Lin *et al*. 2003), none of them are optimized for the current mosquito crosses for reasons noted in the Introduction. Therefore, our method may prove useful for QTL mapping in similar crosses. Note that in our procedure we focus on the F_{2} intercross only, but extensions to other designs are straightforward.

The simulation results indicate the feasibility and power of the proposed method for both putative simulation and interval simulation scenarios. As we expected, under the putative simulation scenario, we had higher power for QTL mapping than that under the interval simulation scenario, and, as heritability increased, the power to detect QTL increased as well. In addition, the interval mapping approach had better power than that of marker-based analysis for all scenarios considered. We note that marker information content can influence the power of QTL analysis, where missing genotype information in the F_{2} progeny would lead to power loss in QTL mapping. A higher missing genotype rate could lead to more power loss, and the loss is greater for the interval simulation scenario than that for the putative simulation scenario. We have also investigated the effect of the degree of incompleteness on power. Note that the power obtained here is for whether there is a QTL on the chromosome, and this procedure is not designed to test whether there are multiple QTL. Extensions of the permutation-based method to estimate the empirical threshold values for detecting minor QTL not explained by major QTL have been studied by Doerge and Churchill (1996).

We note that in the mosquito data, the phenotype, which is the proportion of the encapsulated oocysts among all oocysts, does not follow a normal distribution, but rather is concentrated at both ends, *i.e.*, close to either 1 or 0. Different transformations have been examined, but all these transformations led to similar results (data not shown). This is partly due to the use of permutations to set thresholds for statistically significant findings. Therefore, the original phenotypes were used in the mosquito data analysis. Broman (2003) studied the case when the normal distribution assumption is violated, such as when there is a spike in the distribution. Tilquin *et al.* (2001) studied the power of QTL mapping methods applied to bacteria counts and concluded that the power of parametric QTL mapping methods is strongly reduced if raw bacteria counts are analyzed. Note that the encapsulated oocyst counts are similar to bacteria counts and we are currently working on methodology development to take into account the “two-spike” data in linkage analysis to increase statistical power in QTL mapping.

We also note that it may not be sufficient to use two markers at a time to estimate phases in the F_{1} individuals, as information in the data from other markers is not utilized in our approach. Although this procedure works well for the cases we considered here, more accurate inference of phases could be made by using all markers jointly, leading to more powerful tests. However, the computational burden may make joint analysis of all markers infeasible. In addition, we have assumed no interference in the hidden Markov model. When this assumption is violated, the inferred number of high-trait alleles may not be exactly correct, but it should still provide a very good estimate. Finally, note that the statistical model (6) for QTL analysis implicitly assumes a codominant genetic model, whereas an incomplete recessive model is more appropriate for the mosquito data. As the mosquito data have a degree of incompleteness only slightly >50%, the current statistical model may be appropriate. However, when the degree of incompleteness is far from 0.5 and close to 0 or 1, a more appropriate approach is to incorporate the degree of incompleteness into the statistical model.

## APPENDIX A: ACCURACY OF F_{1} MATING-TYPE INFERENCE FROM F_{2} GENOTYPES AND F_{1} MATERNAL GENOTYPE

The probability of inferring the F_{1} mating type correctly from F_{2} genotypes and the F_{1} maternal genotype depends both on the true F_{1} mating type and on the number of F_{2} progeny. Therefore, with a fixed number of F_{2} progeny *n*, we consider all possible F_{1} mating types by grouping them according to the number of distinct alleles. We obtain the probability of inferring each F_{1} mating type correctly by considering and summing over all possible combinations of observed F_{2} genotypes and F_{1} maternal genotype. In forming the probability, we assume that, given the F_{1} mating type, the genotypes of the F_{2} progeny are independent. We summarize the probability of inferring the F_{1} mating type correctly in Table A1, where we denote by *n*_{i} the number of F_{2} individuals having the *i*th possible genotype, *i* can be 1, 2, 3, or 4 for F_{2} progeny within a family, and Σ_{i} *n*_{i} = *n*. With the number of F_{2} progeny *n* large enough, we will have a high probability of observing all possible F_{2} genotypes in the F_{2} generation for a given true F_{1} mating type and, thus, a high probability of inferring the F_{1} mating type correctly.
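The closing argument can be illustrated with a simplified calculation. The Python sketch below assumes an F_{1} mating type with four distinct alleles, so that the four possible F_{2} genotypes are equally likely, and it ignores both the information in the F_{1} maternal genotype and missing genotypes. It is therefore not the quantity tabulated in Table A1, and the function name `prob_all_genotypes_seen` is hypothetical.

```python
from math import comb

def prob_all_genotypes_seen(n, k=4):
    """Probability that all k equally likely offspring genotypes appear at
    least once among n independent progeny (inclusion-exclusion)."""
    return sum((-1) ** j * comb(k, j) * (1 - j / k) ** n for j in range(k + 1))

for n in (10, 20, 30):
    print(n, round(prob_all_genotypes_seen(n), 4))
```

Under these assumptions the probability is about 0.78 for 10 progeny and exceeds 0.98 for 20 progeny, in line with the observation that family sizes of ∼20–30 give near-certain inference.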

## APPENDIX B: ACCURACY OF F_{1} PHASE TYPE INFERENCE

To test the accuracy of F_{1} phase type inference, we performed several simulation studies. Because the purpose of inferring F_{1} phase type is to infer the number of high-trait alleles at each putative QTL, we are concerned only about the two-marker haplotype instead of the phase across the whole chromosome. Therefore, in each simulation study, all inferred two-marker haplotypes along the chromosome were examined and compared with the true two-marker haplotypes. In the simulation study, we fixed map distances along the chromosome at different levels to study the impact of recombination fraction on the accuracy of haplotype inference. If all four inferred two-marker haplotypes from two parents were not consistent with the four true two-marker haplotypes, we say that this haplotype is inferred incorrectly. The simulation procedure was repeated 100,000 times and the probability of inferring each haplotype with fixed map distance correctly was recorded. Note that the true F_{1} mating type was simulated under one situation when the frequencies of alleles *a*, *b*, *c*, and *d* were 0.35, 0.35, 0.15, and 0.15, respectively, for both the maternal line and the paternal line in the F_{0} generation. The results are summarized in Table 3. The results will differ for other mating types, and the scenario considered by us (with all possible haplotypes present) is the most difficult to infer.
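To relate the fixed map distances to recombination fractions, a map function is needed. The sketch below uses Haldane's map function, which corresponds to the absence of crossover interference. Its use here is an assumption, because the choice of map function for these simulations is not stated above, and the function name is hypothetical.

```python
import math

def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in cM under Haldane's
    map function (no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

for d in (18, 40):
    print(d, round(haldane_recombination_fraction(d), 3))
```

Under this map function, 18 cM corresponds to a recombination fraction of about 0.15 and 40 cM to about 0.28, so both remain well below the value of 0.5 at which two markers carry no phase information about each other.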

## APPENDIX C

Here we give the detailed derivation of the probability of phase type for interval QTL mapping between two flanking markers. Using the same notation as that introduced in methods, we denote the observed genotypes at the *i*th marker position as , and denote the genotype at the *j*th putative QTL position lying in between two flanking markers and as . We also denote as the phase type at this putative QTL. We then haveIn calculating , we can apply the forward and backward equations in the hidden Markov model (Rabiner 1989). The emission probabilities and transition probabilities can be obtained similarly as discussed in methods.
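As a generic illustration of the forward and backward equations (Rabiner 1989), the Python sketch below computes posterior state probabilities at every position of a discrete hidden Markov model. The two-state toy example, in which the hidden state records which of two parental haplotypes was transmitted, is hypothetical and much simpler than the phase-type states used in this article. An unobserved putative QTL position is handled by giving every state an emission probability of 1.

```python
def forward_backward(init, trans, emit):
    """Posterior state probabilities for a discrete hidden Markov model.

    init[s]        : prior probability of state s at the first position
    trans[t][r][s] : probability of moving from state r at position t
                     to state s at position t + 1
    emit[t][s]     : probability of the observation at position t given
                     state s (1.0 for all s at an unobserved position)
    """
    T, S = len(emit), len(init)
    fwd = []
    for t in range(T):  # scaled forward pass
        if t == 0:
            raw = [init[s] * emit[0][s] for s in range(S)]
        else:
            raw = [emit[t][s] * sum(fwd[t - 1][r] * trans[t - 1][r][s]
                                    for r in range(S)) for s in range(S)]
        total = sum(raw)
        fwd.append([v / total for v in raw])
    bwd = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):  # scaled backward pass
        raw = [sum(trans[t][r][s] * emit[t + 1][s] * bwd[t + 1][s]
                   for s in range(S)) for r in range(S)]
        total = sum(raw)
        bwd[t] = [v / total for v in raw]
    post = []
    for t in range(T):  # combine and normalize
        raw = [fwd[t][s] * bwd[t][s] for s in range(S)]
        total = sum(raw)
        post.append([v / total for v in raw])
    return post

# Toy example: marker, putative QTL, marker; recombination fraction 0.1 in
# each interval; both markers show that haplotype 0 was transmitted.
r = 0.1
step = [[1 - r, r], [r, 1 - r]]
posterior = forward_backward(
    init=[0.5, 0.5],
    trans=[step, step],
    emit=[[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]],
)
print(posterior[1])
```

In this toy case the posterior probability of haplotype 0 at the putative QTL equals (1 − *r*)² / [(1 − *r*)² + *r*²], the familiar result for a position flanked by two nonrecombinant markers.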

## APPENDIX D: DERIVATION OF THE HERITABILITY EXPRESSION

Here we give the detailed derivation of the heritability expressionwhich is a function of phenotypic value and degree of incompleteness *b*. Consider a QTL with two alleles H (high trait) and L (low trait); and are the allele frequencies of the high-trait (*e.g.*, refractoriness) allele and low-trait allele (*e.g.*, susceptibility) of the QTL. In our case, . Assume an incomplete recessive genetic model with the degree of incompleteness being ; that is, for heterozygotes HL, have the phenotype. The relationship between QTL genotype and phenotype is displayed in Table D1. From Table D1, we can derive the mean of the phenotypic value asThe average gene effect of allele H isThe average gene effect of allele L isTherefore, if allele H and allele L have independent effects, we can obtain the additive effect of each genotype:The dominance effect for each genotype isThe proportion of the total variation due to this QTL can be decomposed into two parts, additive variance and dominance variance:Therefore,
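Because the displayed equations and Table D1 are not reproduced here, the following Python sketch shows only the general shape of such a derivation under an assumed parameterization, not the exact expressions of this appendix. The assumptions are: genotypic values of 0 for LL, *b·a* for HL, and *a* for HH, where *a* is the phenotypic value and *b* the degree of incompleteness; allele frequencies of 1/2 for both alleles, as in an F_{2} intercross; and a residual variance of 1. All function names are hypothetical.

```python
import math

def variance_components(a, b, p=0.5):
    """Additive and dominance variance for assumed genotypic values
    LL = 0, HL = b * a, HH = a, with frequency p for allele H."""
    q = 1.0 - p
    d = a * (b - 0.5)              # dominance deviation from the midpoint a / 2
    alpha = a / 2.0 + d * (q - p)  # average effect of an allele substitution
    v_add = 2.0 * p * q * alpha ** 2
    v_dom = (2.0 * p * q * d) ** 2
    return v_add, v_dom

def heritability(a, b, resid_var=1.0, p=0.5):
    """Proportion of total variance due to the QTL (assumed residual variance)."""
    v_gen = sum(variance_components(a, b, p))
    return v_gen / (v_gen + resid_var)

def phenotypic_value(h2, b, resid_var=1.0, p=0.5):
    """Value of a > 0 that yields heritability h2 for a given b."""
    v_unit = sum(variance_components(1.0, b, p))  # QTL variance scales with a**2
    v_gen = h2 * resid_var / (1.0 - h2)
    return math.sqrt(v_gen / v_unit)

a = phenotypic_value(0.25, 0.5)
print(a, heritability(a, 0.5))
```

Under these assumptions, fixing the heritability and the degree of incompleteness determines the phenotypic value, which is how the two simulation parameters are used in the simulation studies.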

## Acknowledgments

We thank two reviewers for their constructive comments. This work was supported in part by National Institutes of Health (NIH) grant GM59507 to H.Z. and by NIH grant AI43053 and an award from the Burroughs Wellcome Fund to L.Z.

## Footnotes

Communicating editor: R. W. Doerge

- Received September 1, 2005.
- Accepted January 9, 2006.

- Copyright © 2006 by the Genetics Society of America