Originally published as Genetics Published Articles Ahead of Print on January 16, 2006.
Genetics, Vol. 172, 2293-2308, April 2006, Copyright © 2006
doi:10.1534/genetics.105.050419
Mapping Quantitative Trait Loci in Noninbred Mosquito Crosses
Shuang Wang*, Song Huang, Liangbiao Zheng and Hongyu Zhao1
* Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York 10032,
Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520 and
Department of Epidemiology and Public Health and
Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520
1 Corresponding author: Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College St., New Haven, CT 06520-8034.
E-mail: hongyu.zhao{at}yale.edu
ABSTRACT
The identification of genes that affect quantitative traits has been of great interest to geneticists for many decades, and many statistical methods have been developed to map quantitative trait loci (QTL). Most QTL mapping studies in experimental organisms use purely inbred lines, where the two homologous chromosomes in each individual are identical. As a result, many existing QTL mapping methods developed for experimental organisms are applicable only to genetic crosses between inbred lines. However, it may be difficult to obtain inbred lines for certain organisms, e.g., mosquitoes. Although statistical methods for QTL mapping in outbred populations, e.g., humans, can be applied to such crosses, these methods may not take full advantage of the unique features of these crosses. For example, we can generally assume that the two grandparental lines are homozygous at the QTL of interest, but such information is not utilized by methods developed for outbred populations. In addition, mating types and phases can be relatively easy to establish through the analysis of adjacent markers because of the large number of offspring that can be collected, which substantially reduces the computational burden. In this article, motivated by a mosquito intercross experiment involving two selected lines that are not genetically homozygous across the genome, we develop statistical methods for QTL mapping for genetic crosses involving noninbred lines. In our procedure, we first infer parental mating types and use likelihood-based methods to infer phases in each parent on the basis of genotypes of offspring and one parent. A hidden Markov model is then employed to estimate the number of high-risk alleles at marker positions and putative QTL positions between markers in each offspring, and QTL mapping is finally conducted through the inferred QTL configuration across all offspring in all crosses.
The performance of the proposed methods is assessed through simulation studies, and the usefulness of this method is demonstrated through its application to a mosquito data set.
MOST statistical methods for QTL mapping developed for experimental organisms are applicable only to crosses starting from two inbred lines differing in the trait of interest, with each inbred line genetically homozygous between the two sets of chromosomes. Crossing between the two lines yields F1 progeny, who receive a copy of each chromosome from the two homozygous inbred lines; thus they are heterozygous at all loci where the two inbred lines differ. Therefore, genotypes of the F2 progeny are highly informative for the inheritance pattern at putative QTL sites. For example, if the two parental lines are denoted by a high- (H) trait line and a low- (L) trait line, respectively, having genotype HH, HL, or LL at one putative QTL for an F2 individual offers full information on the number of high-trait alleles at the QTL, i.e., 2, 1, or 0, respectively.
However, crosses may be carried out between individuals who are not completely homozygous across the genome. Although we can generally assume that the two lines being crossed are still homozygous at the QTL, other loci along the genome may be heterozygous. For example, a mosquito intercross experiment was conducted to detect QTL in Anopheles gambiae that control melanotic encapsulation response against Plasmodium cynomolgi Ceylon (ZHENG et al. 2003). The encapsulation response is defined as the proportion of the encapsulated oocysts among all oocysts in a single mosquito. The crosses were carried out between a laboratory-selected A. gambiae refractory strain L3-5 and a susceptible strain 4Ar/r. The A. gambiae female is recognized as the most successful vector of human malarias. Refractoriness to P. cynomolgi Ceylon in the original refractory strain L3-5 seems to be largely but not completely recessive, which is also called incomplete recessive (COLLINS et al. 1986; VERNICK and COLLINS 1989). The encapsulation responses of 167 F2 females, among which 123 became infected, from six intercrosses among offspring of L3-5 females and 4Ar/r males were tested and genotype data were collected on these F2 females and their F1 maternal parents at 52 microsatellite markers spanning the current genetic map of A. gambiae. Among the six F1 female parents of the F2 progeny, 11 autosomal markers were found to have more than two alleles, indicating that one or both of the parental strains (strain L3-5 and strain 4Ar/r) were polymorphic at these markers. However, there were no genotype data from the two original strains. For a detailed experimental setup and phenotype information in F0 and F1 generations, see ZHENG et al. (2003). 
Due to the heterozygosity among individuals in the parental lines, genotypes of F2 progeny no longer offer full inheritance information at the genetic markers, and the existing QTL mapping methods are either not applicable or not designed to optimally use information from such studies, although much work has been done on QTL mapping between outbred lines. For example, HALEY et al. (1994) developed a least-squares method for the analysis of crosses between outbred lines that simultaneously uses information from multiple linked markers. Their method has been applied in a three-generation pig experiment (KNOTT et al. 1998) and implemented in a web-based computer program, "QTL express" (SEATON et al. 2002). Although they assumed that major QTL affecting the trait of interest are fixed (although markers may be segregating), genotypic data on both the F2 individuals and their parents and grandparents are needed to consider all possible line origin combinations at a putative QTL in an F2 individual, making their method inapplicable for the current mosquito crosses, which lack grandparental and paternal genotype information. More recently, LIN et al. (2003) proposed a method that simultaneously estimates linkage phase, QTL location, and effect parameters. In this method, no simplified assumption that QTL are fixed was made, so all possible cross types have to be considered, making the method not optimized for the current mosquito crosses. Because of the uniqueness of the current mosquito crosses, we can make the assumption similar to that of KNOTT et al. (1994) that the genotypes at the QTL locations in two grandparental lines are fixed; thus, genotypes at the QTL locations in an F2 individual can be considered to be known provided that the phase type of an F2 individual can be estimated. 
Moreover, because of the large number of F2 progeny of the mosquito crosses, parental mating types and parental haplotypes can be inferred with very high accuracy; thus, there is no need to consider all possible cross types as required in the method of LIN et al. (2003). This can greatly reduce the inference complexity and makes our inference procedure more efficient. Therefore, to take full advantage of these unique features of the mosquito data, we have developed a statistical procedure for QTL mapping for genetic crosses with similar characteristics. This procedure consists of four components. First, we use genotypes from one parent in the F1 generation and the F2 offspring to infer the F1 mating type. Second, a likelihood-based method is used to infer the phases of the F1 parents. Third, a hidden Markov model is employed to estimate the number of high-trait alleles each F2 individual has at putative QTL locations. Finally, we perform linear regression analysis coupled with permutation-based methods to identify regions linked to the QTL. The performance of our procedure is evaluated through simulation studies, and our method is then applied to the mosquito data set to identify QTL in A. gambiae controlling the encapsulation response against P. cynomolgi Ceylon. To mimic the mosquito crosses, we focus on the intercross design, but extensions to other designs are straightforward.
METHODS
Parental mating-type inference:
For the mosquito data, only genotypes from one parent are available, so the genotypes of the other parent have to be inferred. One advantage of the experimental crosses studied here is that there are a sufficient number of F2 progeny within each family so that the F1 mating type can be inferred with very high confidence from the observed genotypes of the F2 progeny together with possibly available parental information. In APPENDIX A and Table 1, we show that the probability of making correct mating-type inference approaches 1 when the number of offspring is approximately 20-30 from each family, which is about the family size in the mosquito experiment. Therefore, the mating type at each locus can be inferred with very high confidence.
Likelihood approach for phase inference in F1 parents:
In contrast to crosses between inbred lines where phase (the composition of marker alleles on a single chromosome) information is completely known for F1 individuals, the phase information is not known with certainty due to heterozygosity in the F0 generation. However, phase information is essential for QTL analysis to correlate the number of high-trait alleles at a putative QTL site with the phenotype.
In this article, we use a likelihood-based approach for phase inference for the individuals in the F1 generation. We assume that the parental mating type can be inferred with high confidence and the genetic map is also known. As all these families have a very large number of F2 individuals, a full-likelihood approach to make simultaneous phase inference across all the markers is computationally prohibitive. Therefore, we conduct sequential inference by inferring the phase of each pair of adjacent markers. In APPENDIX B, we show that the probability of making correct phase inference on the basis of a set of 30 offspring is very high when the genetic distance between each pair of markers is not large (Table 3). As an illustration of this inference procedure, we consider the case where the two parents have four distinct alleles; the other cases can be derived similarly. Let a, b, c, and d denote the four alleles at the first marker, and let 1, 2, 3, and 4 denote the four alleles at the second marker. In this case, for each pair of markers, we write the mating type at the first marker as ab x cd, where allele a represents the allele in the mother with a grandmaternal origin, allele b represents the allele in the mother with a grandpaternal origin, and c and d represent alleles in the father with grandmaternal and grandpaternal origins, respectively. When genotype information is not available from either parent, we can fix the mating type at the first marker as ab x cd, and then there are eight possible mating types at the second marker: 12 x 34, 12 x 43, 21 x 34, 21 x 43, 34 x 12, 34 x 21, 43 x 12, and 43 x 21. For each of these eight mating types at the second marker, we can calculate the likelihood of observing individual genotypes across these two markers in the F2 progeny within each family. The phase across these two markers can be inferred through the comparisons of the likelihoods among all eight possible mating types. For example, consider the following mating type between two parents: (a1, b2) x (c3, d4), where the phases in the mother are a1 and b2, and those in the father are c3 and d4. In the formation of F2 progeny, the mother has a chance of $(1-\theta)/2$ to generate each of the nonrecombinant gametes a1 and b2, and a chance of $\theta/2$ to generate each of the recombinant gametes a2 and b1. Similarly, the father has a chance of $(1-\theta)/2$ to generate each of the nonrecombinant gametes c3 and d4, and a chance of $\theta/2$ to generate each of the recombinant gametes c4 and d3. Here $\theta$ is the recombination fraction between the two markers. Therefore, 16 possible two-locus phase types can be formed in the F2 progeny. The probability for each possible two-locus F2 phase type under this specific case is summarized in Table 2. Because the genotypes of the F2 progeny are independent conditional on the F1 mating types, the likelihood of the observed F2 genotypes is

$$L_j = \prod_i p_{ij}^{\,n_i}, \qquad (1)$$

where $p_{ij}$ is the probability of an F2 progeny having genotype combination i at the two markers given that the F1 two-locus phase type is j; and $n_i$ is the number of F2 progeny having genotype combination i at the two markers. We choose the F1 two-locus phase type that gives the highest likelihood among the eight possible F1 parental phase types and proceed with the same procedure until reaching the last marker on the chromosome. We then obtain the four complete haplotypes for the F1 parents, conditioning on the prefixed phase type at the first marker. When the genotype information from one parent is available, there are four possible mating types at each marker and the phase information can be inferred similarly as described above. As described in APPENDIX B and shown in Table 3, simulations suggest that the probability of correctly inferring the F1 haplotype approaches 1 with approximately 20-30 offspring in a given family, so this pairwise procedure offers an excellent balance between inference accuracy and computational efficiency.
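The pairwise comparison of the eight candidate phase types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the string encoding of candidates (for example "1234" for 12 x 34), and the representation of an F2 two-locus genotype as a pair of unordered allele sets are all hypothetical, and the sketch covers only the four-distinct-alleles case with the first marker fixed at ab x cd.

```python
import math
from itertools import product

# Candidate phases at the second marker: the alleles that share a haplotype
# with a, b (mother) and c, d (father) at the first marker, in that order.
# "1234" stands for 12 x 34, "3412" for 34 x 12, and so on (assumed encoding).
CANDIDATES = ["1234", "1243", "2134", "2143", "3412", "3421", "4312", "4321"]

def gamete_probs(hap1, hap2, theta):
    """Two-locus gamete probabilities for one doubly heterozygous parent."""
    (x1, y1), (x2, y2) = hap1, hap2
    return {
        (x1, y1): (1 - theta) / 2, (x2, y2): (1 - theta) / 2,  # nonrecombinant
        (x1, y2): theta / 2, (x2, y1): theta / 2,              # recombinant
    }

def genotype_probs(candidate, theta):
    """P(unordered two-locus F2 genotype) under one candidate phase type."""
    m1, m2, f1, f2 = candidate
    mother = gamete_probs(("a", m1), ("b", m2), theta)
    father = gamete_probs(("c", f1), ("d", f2), theta)
    probs = {}
    for (gm, pm), (gf, pf) in product(mother.items(), father.items()):
        key = (frozenset((gm[0], gf[0])), frozenset((gm[1], gf[1])))
        probs[key] = probs.get(key, 0.0) + pm * pf
    return probs

def log_likelihood(counts, candidate, theta):
    """Log of the product over genotype classes of p_ij ** n_i."""
    probs = genotype_probs(candidate, theta)
    total = 0.0
    for genotype, n in counts.items():
        if n == 0:
            continue
        p = probs.get(genotype, 0.0)
        if p == 0.0:
            return -math.inf  # observed genotype impossible under candidate
        total += n * math.log(p)
    return total

def infer_phase(counts, theta):
    """Return the candidate phase type with the highest likelihood."""
    return max(CANDIDATES, key=lambda c: log_likelihood(counts, c, theta))
```

Here `counts` maps each observed two-locus genotype to the number of F2 progeny carrying it, and `theta` is taken from the known genetic map. Working on the log scale avoids underflow when the family is large.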
Hidden Markov model for the inference of the number of high-trait alleles:
In statistical analysis of crosses resulting from inbred lines, the degree of association between the trait value of each individual in the F2 generation and the number of high-trait alleles at the putative QTL site(s) provides information on genetic linkage. For these crosses, it is relatively straightforward to estimate the number of high-trait alleles at each position along the chromosomes. However, for the crosses considered here, such inference is more difficult, especially when the grandparental origin of each allele is unknown. In the following discussion, although the two noninbred lines are not genetically homozygous at some markers, we assume that they are genetically identical at the QTL within each grandparental line and are different between the two lines because the trait of interest is not segregating within each grandparental line. Therefore, we can denote the genotype at a QTL of the high-trait line in the F0 generation as HH and that of the low-trait line as LL, so each F1 progeny has genotype HL, with one high-trait allele and one low-trait allele. In the intercross setting, at each putative QTL, if we label the two haplotypes, inferred phases across all the markers, in the mother by M1 and M2 and the two haplotypes in the father by P1 and P2, there are four possible combinations of haplotypes and QTL alleles for F1 parents: (H, M1)(L, M2) x (H, P1)(L, P2), (H, M1)(L, M2) x (H, P2)(L, P1), (H, M2)(L, M1) x (H, P1)(L, P2), and (H, M2)(L, M1) x (H, P2)(L, P1). For example, in the case of (H, M1)(L, M2) x (H, P1)(L, P2), M1 and P1 are the haplotypes transmitted from the high-trait line and M2 and P2 are the haplotypes transmitted from the low-trait line. For each F2 individual, our goal is to estimate the number of high-trait (H) alleles at each candidate QTL. In the absence of information on the origins of haplotypes M1, M2, P1, and P2, we need to consider all these four possibilities. 
For each given possibility, e.g., (H, M1)(L, M2) x (H, P1)(L, P2), we can use a hidden Markov model described in the following to infer the number of H alleles at the putative QTL sites along the chromosome conditional on the observed genotypes of the offspring and the haplotypes of the parents.
At a given locus i on a chromosome for an F2 progeny, the grandparental origin, i.e., the number of high-trait alleles at a putative QTL site, is not directly observable and can be thought of as a hidden state. We use 1 to denote the event that the mother transmitted the high-trait segment to the offspring, 3 to denote the event that the mother transmitted the low-trait segment, and 2 or 4 to denote the event that the father transmitted the high-trait segment or low-trait segment to the offspring, respectively. Using this notation, there are four possible types at marker i in the F2 progeny, denoted by (1, 2), (1, 4), (3, 2), and (3, 4). So there are four possible hidden states, and the hidden state at marker i is denoted by $S_i$. Note that the hidden state for an F2 individual at a given marker corresponds to the inheritance vector for this individual at that marker (LANDER and GREEN 1987). The observed genotype $G_i$ of the F2 progeny at the marker is treated as the observed state. Assuming the no interference model for the crossover process, the hidden states {$S_i$} follow a Markov chain,

$$P(S_1, \ldots, S_m) = P(S_1) \prod_{i=2}^{m} P(S_i \mid S_{i-1}), \qquad (2)$$

where m is the number of markers on the chromosome and $P(S_i \mid S_{i-1})$ is the transition probability of the hidden states from one marker to another marker, which depends only on the recombination fraction between the two markers. The transition probabilities are summarized in Table 4. The distribution of the initial hidden states, i.e., hidden states at the first marker on a chromosome, is set to

$$P(S_1 = s) = \tfrac{1}{4}, \quad s \in \{(1,2), (1,4), (3,2), (3,4)\}. \qquad (3)$$

As for the emission probability $P(G_i \mid S_i)$ at marker i in the F2 progeny, we take a marker with F1 phase type ab x ba as an example, and the genotypes at this marker in the F2 progeny are aa, ab, and bb. The phase types at this marker in the F2 progeny are ab, aa, bb, and ba. Thus, the emission probabilities at marker i are

$$P(G_i = ab \mid S_i = (1,2)) = P(G_i = aa \mid S_i = (1,4)) = P(G_i = bb \mid S_i = (3,2)) = P(G_i = ab \mid S_i = (3,4)) = 1, \qquad (4)$$

with all other emission probabilities equal to 0.
Given these three components in the hidden Markov model, we can estimate the probabilities of the hidden states in the F2 progeny at each marker as reviewed in RABINER (1989). Note that we have four sets of estimated high-trait allele distributions according to the four phase types at each putative QTL location.
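As a rough illustration of how the three components fit together, the following Python sketch runs a forward-backward pass over the four hidden states for one F2 individual under one assumed assignment of haplotypes to grandparental lines. Several pieces are assumptions made for the example rather than details taken from the article: the transition matrix treats the maternal and paternal meioses as independent (Table 4 is not reproduced here), genotypes are assumed error-free, a missing genotype is encoded as `None`, and all function and variable names are hypothetical.

```python
# Hidden states: (maternal origin, paternal origin); 1 and 2 mark high-trait
# segments from mother and father, 3 and 4 mark low-trait segments.
STATES = [(1, 2), (1, 4), (3, 2), (3, 4)]

def transition(theta):
    """4 x 4 transition matrix, assuming independent parental meioses."""
    def step(s, t):
        p = 1.0
        for u, v in zip(s, t):
            p *= (1 - theta) if u == v else theta
        return p
    return [[step(s, t) for t in STATES] for s in STATES]

def emission(genotype, phase):
    """1 if the unordered genotype matches the state, else 0.

    phase maps origin labels 1-4 to marker alleles, e.g. {1: 'a', 3: 'b',
    2: 'b', 4: 'a'}. A missing genotype (None) is compatible with all states.
    """
    if genotype is None:
        return [1.0] * 4
    return [1.0 if sorted((phase[m], phase[f])) == sorted(genotype) else 0.0
            for m, f in STATES]

def posterior(genotypes, phases, thetas):
    """Posterior state probabilities at each marker (forward-backward)."""
    n = len(genotypes)
    em = [emission(g, ph) for g, ph in zip(genotypes, phases)]
    fwd = [[0.25 * e for e in em[0]]]  # uniform initial distribution
    for i in range(1, n):
        tr = transition(thetas[i - 1])
        fwd.append([em[i][t] * sum(fwd[-1][s] * tr[s][t] for s in range(4))
                    for t in range(4)])
    bwd = [[1.0] * 4 for _ in range(n)]
    for i in range(n - 2, -1, -1):
        tr = transition(thetas[i])
        bwd[i] = [sum(tr[s][t] * em[i + 1][t] * bwd[i + 1][t] for t in range(4))
                  for s in range(4)]
    post = []
    for f, b in zip(fwd, bwd):
        w = [x * y for x, y in zip(f, b)]
        z = sum(w)  # zero only if the genotypes contradict the parental phases
        post.append([x / z for x in w])
    return post

def expected_high(post_i):
    """Expected number of high-trait alleles given state probabilities."""
    return sum(p * ((s[0] == 1) + (s[1] == 2)) for p, s in zip(post_i, STATES))
```

No rescaling of the forward and backward variables is done, which should be adequate for the few dozen markers per chromosome considered here but would need attention for much denser maps.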
Hidden Markov model for interval mapping:
LANDER and BOTSTEIN (1989) proposed an interval mapping method that is more powerful than marker-based analysis. The above hidden Markov model can be extended in a straightforward manner to infer the number of high-trait alleles at any position between two flanking markers. If we denote the observed genotypes at all marker positions by $G = (G_1, \ldots, G_m)$, denote $G_{q_j}$ as the genotypes at the jth putative QTL locus among a total of J considered QTL between the two flanking markers i and i + 1, and also denote $S_{q_j}$ as the phase type at the jth putative QTL, we then have

$$P(S_{q_j} \mid G) = \sum_{S_i,\, S_{i+1}} P(S_{q_j} \mid S_i, S_{i+1}) \, P(S_i, S_{i+1} \mid G), \qquad (5)$$

where $P(S_i, S_{i+1} \mid G)$ can be obtained similarly as described in the previous section, and $P(S_{q_j} \mid S_i, S_{i+1})$ can be obtained on the basis of the recombination fractions $\theta_{1j}$ and $\theta_{2j}$, where $\theta_{1j}$ is the recombination fraction between marker i and the jth putative QTL, and $\theta_{2j}$ is the recombination fraction between the jth putative QTL and marker i + 1. The two recombination fractions can be calculated on the basis of the map distance using Haldane's map function, which assumes no interference. We can add putative QTL positions throughout the chromosome (for example, every 1 cM or every 2 cM). For each added putative QTL position, the recombination fractions can be calculated on the basis of the map distance from the added position to the two flanking markers. A detailed derivation of Equation 5 can be found in APPENDIX C.
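The two ingredients that depend only on the map can be sketched as follows. Haldane's map function is standard; the second function is a hedged illustration, for a single parental meiosis, of how the origin at a putative QTL can be conditioned on the two flanking markers under no interference. The function names and the single-meiosis simplification are assumptions of this sketch, not taken from the article.

```python
import math

def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def p_qtl_same_as_left(theta1, theta2, flanks_recombinant):
    """P(QTL origin equals the left flanking marker's origin) in one meiosis.

    theta1: recombination fraction between the left marker and the QTL.
    theta2: recombination fraction between the QTL and the right marker.
    flanks_recombinant: True if the two flanking markers differ in origin.
    Requires a nonzero interval when flanks_recombinant is True.
    """
    theta12 = theta1 + theta2 - 2.0 * theta1 * theta2  # flank-to-flank fraction
    if flanks_recombinant:
        return (1.0 - theta1) * theta2 / theta12
    return (1.0 - theta1) * (1.0 - theta2) / (1.0 - theta12)
```

For example, a putative QTL placed 4 cM from the left marker in a 10-cM interval would use `haldane(4)` and `haldane(6)`. Applying the second function separately to the maternal and paternal meioses gives the conditional probability of each four-state phase type at the QTL.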
QTL analysis:
The statistical association between a putative QTL and phenotypes is tested by regressing the phenotypes on the estimated number of high-trait alleles obtained from the hidden Markov model,

$$y_k = \mu + \beta x_k + \varepsilon_k, \qquad (6)$$

where $x_k$ denotes the inferred expected number of high-trait alleles of the kth individual in the F2 progeny at the tested putative QTL, and $y_k$ denotes the quantitative phenotype of the kth individual in the F2 progeny. Within each intercross family, we calculate the F-statistic for the null hypothesis of no association between the phenotype and a putative QTL, $H_0\!: \beta = 0$,

$$F_l = \frac{\sum_{k=1}^{n_l} (\hat{y}_k - \bar{y}_l)^2}{\sum_{k=1}^{n_l} (y_k - \hat{y}_k)^2 / (n_l - 2)}, \qquad (7)$$

where $l = 1, \ldots, L$, $\bar{y}_l$ is the mean phenotype in family l, $n_l$ denotes the number of F2 progeny within family l, $\hat{y}_k$ is the fitted value of phenotype $y_k$, and L is the number of families. We can then obtain the overall F-statistic Fcom,
![]() | (8) |
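The within-family test can be illustrated with an ordinary least-squares fit of the phenotype on the expected number of high-trait alleles. The sketch below is a generic simple-regression F-statistic on 1 and n - 2 degrees of freedom, written under the assumption that this is what is computed in each family; it does not attempt to reproduce how the family-level statistics are combined into the overall statistic. The function name is hypothetical.

```python
def regression_f(x, y):
    """Slope and F-statistic for regressing phenotype y on allele count x.

    x: expected numbers of high-trait alleles (not all equal).
    y: phenotypes, same length as x, with at least three individuals.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    mu = mean_y - beta * mean_x
    fitted = [mu + beta * xi for xi in x]
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)
    ss_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return beta, ss_regression / (ss_residual / (n - 2))
```

A perfectly linear relationship makes the residual sum of squares zero, so real use would need a guard for that case.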
Simulation studies:
To study the performance of our procedure for QTL mapping in experimental crosses in the case of noninbred lines in the F0 generation, we performed two simulation studies. The first study evaluated the procedure when the QTL was genotyped, i.e., the QTL is one of the genetic markers used, called the putative simulation case in the following discussion; and the second simulation study considered the situation where the QTL was not genotyped but fell between two markers, called the interval simulation case in the following discussion. All simulations were designed to mimic the mosquito intercross experiment: six intercross families with 30 progeny in each family, with 15 markers on one chromosome for the putative simulation case and 14 markers for the interval simulation case. We assumed an incomplete recessive genetic model, which means that instead of only F1 homozygotes being susceptible, a proportion of heterozygous mosquitoes can also develop the phenotype. Thus, we defined the genotypic value to be
for QTL genotype HH; and for individuals with QTL genotype HL,
had average value
, and
had average value
, where
is called the degree of incompleteness; for the QTL genotype LL, the genotypic value is
. We set the overall trait variance to be 1, and the proportion of variance due to genetic factors (heritability) is
i.e.,
, with
. Note that the heritability used here is defined in the broad sense, also called the degree of genetic determination (FALCONER and MACKAY 1996). By fixing heritability
and the degree of incompleteness b to some prespecified level, we can obtain the genotypic value
by solving the equation
![]() |
or
as described above. We first assigned allele frequencies of the 15 markers on one chromosome for the two grandparental lines in the F0 population. To mimic the situation of two noninbred lines in the F0 population, a single QTL was positioned at 35 cM on the 100-cM chromosome segment (marker 4), which was genetically homozygous, while all other markers were polymorphic, with up to four alleles. These numbers were also chosen to mimic the mosquito intercross data. The 15 genetic markers had a marker spacing of 4.5-10.7 cM. The assigned allele frequencies and marker locations are displayed in Table 5.
For the putative simulation case, we simulated two haplotypes of the F1 mother on the basis of the predefined allele frequencies with one haplotype from each population line. The same simulation procedure was repeated to obtain the two haplotypes of the F1 father. The genotypes of each F2 progeny were simulated on the basis of the simulated F1 haplotypes and the recombination rates among the markers. More specifically, we chose an allele at the first marker with probability 0.5 from the mother and father and then, starting from the second marker, chose an allele either from the same haplotype or from a different haplotype from the subsequent marker according to whether there was a recombination event between the two markers for a parent.
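The gamete-by-gamete simulation just described can be sketched as below. This is a minimal illustration, assuming each haplotype is a list of marker alleles in map order and `thetas` holds the recombination fractions between adjacent markers; the function names are hypothetical.

```python
import random

def simulate_gamete(hap1, hap2, thetas, rng):
    """Walk along the chromosome, switching haplotypes at each recombination."""
    haps = (hap1, hap2)
    current = rng.randrange(2)  # start on either haplotype with probability 0.5
    gamete = [haps[current][0]]
    for i, theta in enumerate(thetas, start=1):
        if rng.random() < theta:  # recombination between markers i-1 and i
            current = 1 - current
        gamete.append(haps[current][i])
    return gamete

def simulate_offspring(mother, father, thetas, rng):
    """Unordered marker genotypes of one F2 progeny from two F1 parents."""
    maternal = simulate_gamete(*mother, thetas, rng)
    paternal = simulate_gamete(*father, thetas, rng)
    return [tuple(sorted(pair)) for pair in zip(maternal, paternal)]
```

Passing a seeded `random.Random` instance as `rng` makes a simulated family reproducible.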
For the interval simulation case, the putative QTL located at marker 4 was removed from the simulated data sets, making the total number of markers equal to 14. Therefore, the QTL was located between the third and fourth markers in this case. The phenotype data were still simulated on the basis of the QTL genotype for each individual, and all other simulation procedures remained the same.
We also performed simulation studies to study the impact of varying marker information content on QTL analysis. We tested this by setting some F2 genotypes to be missing at some markers or by setting several markers to have only one allele in the F1 mating type, where we modified the allele frequencies of three markers (2, 9, and 15) as shown in Table 5. When genotype information is missing in F2 individuals at some markers, the interval QTL calculation for the number of expected high-trait alleles is carried out using the two nearest existing flanking markers while the positions of the added interval QTL are kept, i.e., between two flanking markers that are supposed to be genotyped no matter whether the genotype information is present or not. We explored the impact of missing marker genotypes by testing different levels of missingness.
For all sets of simulation studies, we also performed interval mapping by considering putative QTL between two flanking markers as described in Hidden Markov model for interval mapping.
We performed 1000 simulations for all simulation scenarios. For each simulation, we first estimated the $100(1-\alpha)\%$ chromosomewide threshold for the Fall-statistic. With specified heritability and degree of incompleteness, we simulated F2 phenotype data on the basis of F2 genotype data at the true QTL. We then applied our procedure and recorded the chromosomewide maximum Fall-statistic under H1 for this specific simulation. The $100(1-\alpha)\%$ chromosomewide threshold for the Fall-statistic was estimated through the permutation test. To generate the permuted samples, we permuted the simulated phenotypes within each family and kept the simulated F2 genotype at the QTL location intact, i.e., kept the expected number of high-trait alleles of each individual intact. Each permutation generated a new data set in which the null hypothesis of no linkage between the phenotypes and the putative QTL holds. The resulting chromosomewide maximum Fall-statistic was stored and the permutation procedure was repeated 1000 times. The $100(1-\alpha)$ percentile of the maximum Fall-statistic was recorded (CHURCHILL and DOERGE 1994; BROMAN 2003) as our estimated chromosomewide critical value, and we used this value to detect the presence of a QTL on the chromosome to control the overall type I error rate to be $\alpha$ for this current simulated data set. The whole procedure was repeated 1000 times, and the proportion of times that the null hypothesis of no QTL on this chromosome was rejected was the estimated power to detect the QTL.
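The permutation step can be sketched generically as follows. The data layout (a list of families, each a pair of phenotypes and per-position allele counts), the function names, and in particular the placeholder statistic `abs_cov_statistic` are assumptions made for illustration; the placeholder is not the F-statistic used in the article and would be replaced by it in practice.

```python
import math
import random

def abs_cov_statistic(families):
    """Placeholder statistic (NOT the article's F-statistic): per position,
    the sum over families of |covariance(allele count, phenotype)|."""
    n_pos = len(families[0][1])
    stats = []
    for p in range(n_pos):
        total = 0.0
        for y, x_by_pos in families:
            x = x_by_pos[p]
            mx, my = sum(x) / len(x), sum(y) / len(y)
            total += abs(sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(y))
        stats.append(total)
    return stats

def permutation_threshold(families, statistic, alpha=0.05, n_perm=1000, seed=0):
    """Chromosomewide critical value from within-family permutations.

    families: list of (phenotypes, x_by_position) pairs, where
    x_by_position[p][k] is the expected number of high-trait alleles of
    individual k at position p. Allele counts stay fixed; only phenotypes
    are shuffled, and only within their own family.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        permuted = []
        for y, x_by_pos in families:
            y_perm = list(y)
            rng.shuffle(y_perm)
            permuted.append((y_perm, x_by_pos))
        maxima.append(max(statistic(permuted)))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]
```

Taking the maximum over positions before computing the percentile is what makes the threshold chromosomewide rather than pointwise.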
Mosquito data:
As explained in the Introduction, a set of experiments was carried out to identify genes in A. gambiae controlling the melanotic encapsulation response against P. cynomolgi Ceylon (ZHENG et al. 2003). The encapsulation responses of 167 F2 females from six intercrosses among offspring of L3-5 females and 4Ar/r males were tested, among which 123 F2 females became infected. The encapsulation responses of the F2 females in the six families are summarized in Table 6 and the histograms of each individual family and six families combined are provided in Figure 1. Note the significant departure from a normal distribution of the encapsulation responses; therefore, the standard deviations in Table 6 should be interpreted as a very approximate indication of the spread of the responses. Among the 52 genotyped microsatellite markers spanning the genetic map of A. gambiae, 35 were informative markers, including 5 on the X chromosome, 16 on the second chromosome, and 14 on the third chromosome. No QTL was found on the X chromosome, so we focused on chromosome 2 and chromosome 3 in this article. Note that ZHENG et al. (1996) observed that recombination in both male and female mosquitoes was comparable, and a similar observation was made in other anopheline mosquitoes (MITCHELL et al. 1993; SEAWRIGHT and NARANG 1993). Therefore, in this article, we used the same genetic map for both males and females.
The Fall-statistic at each marker position as well as at each putative QTL position between two flanking markers was obtained and the statistical significance of the Fall-statistics was evaluated empirically through the permutation test. To generate the permuted samples, we permuted the encapsulation phenotype within each family and kept the expected number of high-trait alleles of each individual intact. Each permutation generated a new data set in which the null hypothesis of no linkage between the phenotype and the genotype of the putative QTL holds. The permuted data were then analyzed for QTL effects within each individual family. The resulting Fall-statistics at each position were stored and the procedure was repeated 10,000 times. To calculate the empirical P-values adjusting for multiple testing, we applied the procedure by LYSTIG (2003), where, for each permutation, the largest Fall-statistic was recorded and the adjusted P-value is the proportion of the maximum Fall-statistics from 10,000 permutations that is greater than or equal to the original Fall-statistic.
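The multiplicity-adjusted P-value described above reduces to a simple proportion once the per-permutation maxima are stored. The following sketch assumes those maxima have already been computed; the function name is hypothetical.

```python
def adjusted_p_values(observed, perm_maxima):
    """Adjusted P-value per position: the share of permutation maxima that
    are greater than or equal to the observed statistic at that position."""
    n = len(perm_maxima)
    return [sum(m >= obs for m in perm_maxima) / n for obs in observed]
```

With 10,000 permutations the smallest nonzero adjusted P-value this can return is 0.0001.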
We also conducted QTL mapping by comparing the observed chromosomewide maximum Fall-statistic with the chromosomewide critical value obtained similarly as described in the Simulation studies section on the basis of permutations. Chromosomes that have chromosomewide maximum Fall-statistics exceeding the critical value possibly bear one or more QTL.
RESULTS
The overall type I error rate $\alpha$ was set to be 0.05. Another simulation parameter we considered was the missing genotype rate. We tested three different missing genotype rates, 0, 0.1, and 0.25, to explore the impact of missing marker genotypes. We also tested the impact of having one allele at some marker positions in F1 mating types using interval simulation when the missing genotype rate was set at 0.1 and the degree of incompleteness was set at 0.5.
As shown in Table 1, when the number of F2 progeny was approximately 20-30 in a given family, the probability of inferring the F1 mating type correctly approached 1. Similarly, Table 3 shows that the probability of inferring an F1 haplotype correctly also approached 1, even when the map distance between two markers was as long as 40 cM. We note that the maximum map distances between adjacent markers on the two mosquito chromosomes are both approximately 18 cM. Therefore, with six mosquito families having 15-37 individuals in the F2 generation, we should have very high confidence that our pairwise procedure for the F1 phase inference within each mosquito family can lead to accurate inference.
To estimate the overall type I error rate, we simulated F2 phenotype data under the null hypothesis $H_0$ that no QTL is on the chromosome, and obtained the chromosomewide critical value for the simulation following the permutation procedures described in Simulation studies. We then compared the observed chromosomewide maximum Fall-statistic with the chromosomewide critical value. The above procedure was repeated 1000 times and the proportion of times that the null hypothesis was rejected was an estimate of the overall type I error rate. Our simulation for the overall type I error rate was based on one parameter setting in which the heritability was 0.25, the degree of incompleteness was 0.5, and the missing genotype rate was 0. The overall type I error rate for the interval simulation case with QTL at a marker was 0.049, and it was 0.059 for the interval simulation case using the interval mapping method, both of which are within the 95% confidence interval (0.036, 0.064) for a 5% nominal type I error rate based on 1000 simulations.
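The quoted interval can be checked with a short calculation. The sketch below assumes a normal approximation to the binomial distribution of the rejection proportion, which is one common way to obtain such an interval; the article does not state which method was used, and the function name is hypothetical.

```python
import math

def nominal_rate_interval(alpha=0.05, n_sim=1000, z=1.96):
    """Approximate 95% interval for an estimated rejection rate when the
    true rate is alpha and n_sim independent simulations are run."""
    se = math.sqrt(alpha * (1 - alpha) / n_sim)
    return alpha - z * se, alpha + z * se
```

With the defaults, the standard error is about 0.0069, and the interval rounds to (0.036, 0.064), so both 0.049 and 0.059 fall inside it.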
The power analysis results from different simulation studies for both interval simulation and putative simulation scenarios are summarized in Table 7 and plotted in Figure 2. Figure 2 includes results both from marker-based analyses and from interval mapping scanning through the chromosome at every 2 cM. We note the following:
- The interval mapping method did provide higher power than mapping at marker positions only for all scenarios considered. For example, when heritability was 12.5% and the missing genotype rate was 25%, for the interval simulation scenario we had 75% power using marker-based analysis and 80% power using interval mapping.
- As heritability increased, the power to detect QTL increased, achieving almost 100% power when the heritability was set at 25% for both putative and interval simulations with different missing genotype rates considered.
- As the missing genotype rate increased, the power decreased, where the loss in power was greater for the interval simulation scenario than that for the putative simulation scenario. The loss in power was the greatest when heritability was either 0.0625 or 0.125, with as much as 30% power loss, whereas the power loss was relatively small when heritability was 0.25.
- When the degree of incompleteness was increased from 0.5 to 0.7, there was a consistent decrease in power for both putative and interval simulations at all levels of heritability.
- The estimated QTL effect was 0.7–0.9 when the heritability value was 0.25. When the heritability was 0.03125, the estimated QTL effect decreased to 0.3–0.5.
The results from the mosquito intercross data at marker positions, as well as results from interval QTL mapping scanning through the chromosome at every 2 cM, are plotted in Figure 3 for chromosome 2 and in Figure 4 for chromosome 3. Also plotted are the adjusted P-values for analysis at marker positions, estimated on the basis of the permutation procedure described in METHODS, and the chromosomewide threshold values controlling the overall type I error rate. These results suggest that one possible QTL region on chromosome 2 and two QTL regions on chromosome 3 were related to encapsulation response against P. cynomolgi Ceylon. Note that previous genetic mapping experiments on encapsulation response against P. cynomolgi B, a simian malaria originating in Malaysia, with the current refractory (L3-5) and susceptible (4Ar/r) strains have identified one major (named Pen1) and two minor (named Pen2 and Pen3) autosomal dominant QTL. Pen1 has been mapped to chromosome 2R, division 8C, while Pen2 and Pen3 are less precisely located (COLLINS et al. 1997, 1999; ZHENG et al. 1997). In our investigation with these mosquitoes on the response of L3-5 to infection with P. cynomolgi Ceylon, a major QTL was identified on chromosome 2 at marker 3F11-C1, close to markers H290 or H175, where Pen1 was most precisely defined. Previous analyses of this data set found that marker 3F11-C1 had a large but not significant F-statistic and thus did not define a new QTL (ZHENG et al. 2003). Marker H095 appeared to define another minor QTL on the basis of the permutation-adjusted P-value, and this marker was most closely associated with Pen3, a minor QTL for P. cynomolgi B encapsulation. Note that previous analyses defined markers H135 and H095 as a novel QTL, termed Pcen2R (ZHENG et al. 2003). The interval mapping method identified another possible minor QTL between markers H290 and H799 on chromosome 2.
Two major QTL regions were identified on chromosome 3. Marker 3A6-17 defined a major QTL region, where the nearest markers H555 and H158 defined a QTL Pcen3R in the previous linear regression analyses (ZHENG et al. 2003). Marker 3A2-29 defined another major QTL region, where marker 3A2-29 defined a novel QTL Pcen3L from the previous linear regression analysis (ZHENG et al. 2003).
Interval mapping with a 2-cM scan on chromosome 3 revealed another peak between the two tightly linked markers 3A2-29 and H577, which confirmed the major QTL defined by 3A2-29. Note that the estimated QTL effect was 0.3–0.4 for chromosome 2 and 0.4–0.5 for chromosome 3. We also note that marker H758 was the marker most closely associated with Pen2, and that marker 3A2-29, which gave the highest $F_{\mathrm{all}}$-statistic on chromosome 3, was not yet available when studies on P. cynomolgi B were conducted.
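The permutation-based chromosomewide threshold and adjusted P-values referred to above follow the general idea of CHURCHILL and DOERGE (1994): shuffle the phenotypes, recompute the test statistic at every position, and record the maximum over positions. The sketch below is a generic illustration of that idea, not the procedure from METHODS. It assumes a simple regression F-statistic on per-position allele counts rather than the statistic used in this article, and the names `regression_f`, `permutation_threshold`, and the simulated data are all hypothetical.

```python
import random


def regression_f(x, y):
    """F-statistic (1 and n-2 d.f.) for the simple linear regression of y on x."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx  # assumes the position is polymorphic (sxx > 0)
    ss_res = syy - ss_reg
    return ss_reg / (ss_res / (n - 2))


def permutation_threshold(positions, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """positions: list of per-position allele-count lists (one value per offspring).
    Returns (chromosomewide critical value, permutation-adjusted P-values)."""
    rng = random.Random(seed)
    observed = [regression_f(x, phenotype) for x in positions]
    shuffled = list(phenotype)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_null.append(max(regression_f(x, shuffled) for x in positions))
    max_null.sort()
    # Empirical (1 - alpha) quantile of the permutation maxima.
    k = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    threshold = max_null[k]
    adjusted_p = [
        (1 + sum(m >= f for m in max_null)) / (n_perm + 1) for f in observed
    ]
    return threshold, adjusted_p


if __name__ == "__main__":
    rng = random.Random(1)
    n, m = 200, 10
    # Simulated F2-like allele counts (0, 1, 2 with probabilities 1/4, 1/2, 1/4).
    positions = [[rng.choice((0, 1, 1, 2)) for _ in range(n)] for _ in range(m)]
    # Hypothetical QTL at position index 3.
    phenotype = [0.8 * positions[3][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    threshold, adjusted_p = permutation_threshold(positions, phenotype, n_perm=200)
    print(round(threshold, 2), [round(p, 3) for p in adjusted_p])
```

Comparing each observed statistic with the distribution of permutation maxima, rather than with a pointwise null distribution, is what makes the resulting P-values adjusted for the scan across the chromosome.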
DISCUSSION
The simulation results indicate the feasibility and power of the proposed method for both putative simulation and interval simulation scenarios. As we expected, under the putative simulation scenario, we had higher power for QTL mapping than that under the interval simulation scenario, and, as heritability increased, the power to detect QTL increased as well. In addition, the interval mapping approach had better power than that of marker-based analysis for all scenarios considered. We note that marker information content can influence the power of QTL analysis, where missing genotype information in the F2 progeny would lead to power loss in QTL mapping. A higher missing genotype rate could lead to more power loss, and the loss is greater for the interval simulation scenario than that for the putative simulation scenario. We have also investigated the effect of the degree of incompleteness on power. Note that the power obtained here is for whether there is a QTL on the chromosome, and this procedure is not designed to test whether there are multiple QTL. Extensions of the permutation-based method to estimate the empirical threshold values for detecting minor QTL not explained by major QTL have been studied by DOERGE and CHURCHILL (1996).
We note that in the mosquito data, the phenotype, which is the proportion of the encapsulated oocysts among all oocysts, does not follow a normal distribution, but rather is concentrated at both ends, i.e., close to either 1 or 0. Different transformations have been examined, but all these transformations led to similar results (data not shown). This is partly due to the use of permutations to set thresholds for statistically significant findings. Therefore, the original phenotypes were used in the mosquito data analysis. BROMAN (2003) studied the case when the normal distribution assumption is violated, such as when there is a spike in the distribution. TILQUIN et al. (2001) studied the power of QTL mapping methods applied to bacteria counts and concluded that the power of parametric QTL mapping methods is strongly reduced if raw bacteria counts are analyzed. Note that the encapsulated oocyst counts are similar to bacteria counts and we are currently working on methodology development to take into account the "two-spike" data in linkage analysis to increase statistical power in QTL mapping.
We also note that it may not be sufficient to use two markers at a time to estimate phases in the F1 individuals, as information from other markers is not utilized in our approach. Although this procedure works well for the cases we considered here, more accurate inference of phases could be made by using all markers jointly, leading to more powerful tests. However, the computational burden may make joint analysis of all markers infeasible. In addition, we have assumed no interference in the hidden Markov model. When this assumption is violated, the inferred number of high-trait alleles may not be exactly correct, but it should still provide a very good estimate. Finally, note that the statistical model (6) for QTL analysis implicitly assumes a codominant genetic model, whereas an incomplete recessive model is more appropriate for the mosquito data. As the mosquito data have a degree of incompleteness only slightly greater than 50%, the current statistical model may be appropriate. However, when the degree of incompleteness is far from 0.5 and close to 0 or 1, a more appropriate approach is to incorporate the degree of incompleteness into the statistical model.
APPENDIX A: ACCURACY OF...
Denote $n_i$ as the number of F2 individuals having genotype $i$, where $i$ can be 1, 2, 3, or 4 for F2 progeny within a family, and $\sum_{i} n_i = n$. With the number of F2 progeny $n$ large enough, we will have a high probability of observing all possible F2 genotypes in the F2 generation for a given true F1 mating type and, thus, a high probability of inferring the F1 mating type correctly.
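As a rough check on this claim, consider the most informative case, in which an F1 mating produces four distinguishable F2 genotypes, each with probability 1/4. This equal-probability setting is an assumption made for illustration, and `prob_all_genotypes_observed` is a hypothetical helper. The probability that all four classes appear among n offspring then follows from inclusion-exclusion over the set of missing classes.

```python
from math import comb


def prob_all_genotypes_observed(n: int, k: int = 4) -> float:
    """P(all k equally likely genotype classes appear among n offspring),
    by inclusion-exclusion over the classes that could be missing."""
    return sum(
        (-1) ** j * comb(k, j) * ((k - j) / k) ** n for j in range(k + 1)
    )


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, round(prob_all_genotypes_observed(n), 4))
```

Under these assumptions the probability is about 0.78 for n = 10, about 0.987 for n = 20, and above 0.999 for n = 30, consistent with the high accuracy reported in RESULTS for moderately sized families.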
APPENDIX B: ACCURACY OF...
APPENDIX C
Consider the genotype at the jth putative QTL position lying between two flanking markers, together with the phase type at this putative QTL. The required probabilities can be computed by applying the forward and backward equations in the hidden Markov model (RABINER 1989). The emission probabilities and transition probabilities can be obtained similarly as discussed in METHODS.
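For readers unfamiliar with the forward and backward equations, the sketch below shows the generic scaled forward-backward recursion for a discrete hidden Markov model, following the standard presentation in RABINER (1989). It is a generic illustration, not the model of this article: the two-state example, the transition and emission values, and the function name `forward_backward` are all hypothetical.

```python
def forward_backward(init, trans, emit, observations):
    """Posterior state probabilities P(state_t | all observations) for a
    discrete HMM, using per-step scaling to avoid numerical underflow.

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][o]  : P(observation = o | state = i)
    """
    n_states = len(init)
    n_obs = len(observations)

    # Forward pass, scaled so that each alpha_t sums to 1.
    alphas, scales = [], []
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            a = [init[i] * emit[i][obs] for i in range(n_states)]
        else:
            a = [
                sum(prev[i] * trans[i][j] for i in range(n_states)) * emit[j][obs]
                for j in range(n_states)
            ]
        c = sum(a)  # assumes the observation has nonzero probability
        a = [v / c for v in a]
        alphas.append(a)
        scales.append(c)
        prev = a

    # Backward pass, scaled with the same constants.
    betas = [[1.0] * n_states for _ in range(n_obs)]
    for t in range(n_obs - 2, -1, -1):
        nxt = observations[t + 1]
        betas[t] = [
            sum(trans[i][j] * emit[j][nxt] * betas[t + 1][j] for j in range(n_states))
            / scales[t + 1]
            for i in range(n_states)
        ]

    # With this scaling, alpha_t(i) * beta_t(i) is already the posterior.
    return [
        [alphas[t][i] * betas[t][i] for i in range(n_states)] for t in range(n_obs)
    ]


if __name__ == "__main__":
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [[0.8, 0.2], [0.2, 0.8]]
    for row in forward_backward(init, trans, emit, [0, 0, 1, 1]):
        print([round(p, 3) for p in row])
```

In a QTL setting the hidden states would play the role of the unobserved genotype or phase configuration along the chromosome and the observations the marker genotypes, but that mapping is only indicated here and is not spelled out by the sketch.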
APPENDIX D: DERIVATION OF...
Consider a QTL with two alleles, H (high trait) and L (low trait), and let $p$ and $q$ be the allele frequencies of the high-trait (e.g., refractoriness) allele and the low-trait (e.g., susceptibility) allele of the QTL. In our case, $p = q = 1/2$. Assume an incomplete recessive genetic model with the degree of incompleteness being $b$; that is, for heterozygotes HL, a proportion $b$ have the phenotype. The relationship between QTL genotype and phenotype is displayed in Table D1. From Table D1, we can derive the mean of the phenotypic value.
LITERATURE CITED
BROMAN, K. W., 2003 Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163: 1169–1175.
CHURCHILL, G. A., and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
COLLINS, F. H., R. K. SAKAI, K. D. VERNICK, S. PASKEWITZ, D. C. SEELEY et al., 1986 Genetic selection of a Plasmodium-refractory strain of the malaria vector Anopheles gambiae. Science 234: 607–610.
COLLINS, F. H., L. B. ZHENG, S. M. PASKEWITZ and F. C. KAFATOS, 1997 Progress in map-based cloning of Anopheles gambiae genes responsible for encapsulation of malaria parasites. Ann. Trop. Med. Parasitol. 91: 517–521.
COLLINS, F. H., R. D. SAUNDERS, F. C. KAFATOS, C. ROTH, Z. KE et al., 1999 Genetics in the study of mosquito susceptibility to Plasmodium. Parassitologia 41: 163–168.
DOERGE, R. W., and G. A. CHURCHILL, 1996 Permutation tests for multiple loci affecting a quantitative character. Genetics 142: 285–294.
FALCONER, D. S., and T. F. C. MACKAY, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, Essex, UK.
HALEY, C. S., S. A. KNOTT and J. M. ELSEN, 1994 Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136: 1195–1207.
KNOTT, S. A., L. MARKLUND, C. S. HALEY, K. ANDERSSON, W. DAVIES et al., 1998 Multiple marker mapping of quantitative trait loci in a cross between outbred wild boar and large white pigs. Genetics 149: 1069–1080.
LANDER, E. S., and D. BOTSTEIN, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
LANDER, E. S., and P. GREEN, 1987 Construction of multilocus genetic maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.
LIN, M., X. Y. LOU, M. CHANG and R. L. WU, 2003 A general statistical framework for mapping quantitative trait loci in nonmodel systems: issue for characterizing linkage phases. Genetics 165: 901–913.
LYSTIG, T. C., 2003 Adjusted P-values for genomewide scans. Genetics 164: 1683–1687.
MITCHELL, S. E., J. A. SEAWRIGHT and S. K. NARANG, 1993 Linkage map of the mosquito (Anopheles quadrimaculatus species A), pp. 3.273–3.276 in Genetic Maps, Ed. 6, edited by S. J. O'BRIEN. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
RABINER, L., 1989 A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77: 257–286.
SEATON, G., C. S. HALEY, M. KEARSEY and P. M. VISSCHER, 2002 QTL express: mapping quantitative trait loci in simple and complex pedigrees. Bioinformatics 18: 339–340.
SEAWRIGHT, J. A., and S. K. NARANG, 1993 Linkage map of the mosquito (Anopheles gambiae), pp. 3.269–3.272 in Genetic Maps, Ed. 6, edited by S. J. O'BRIEN. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
TILQUIN, P., W. COPPIETERS, J. M. ELSEN, F. LANTIER, C. MORENO et al., 2001 Statistical power of QTL mapping methods applied to bacteria counts. Genet. Res. 78: 303–316.
VERNICK, K. D., and F. H. COLLINS, 1989 Association of a Plasmodium-refractory phenotype with an esterase locus in Anopheles gambiae. Am. J. Trop. Med. Hyg. 40: 593–597.
ZHENG, L. B., M. Q. BENEDICT, A. J. CORNEL, F. H. COLLINS and F. C. KAFATOS, 1996 An integrated genetic map of the African human malaria vector mosquito, Anopheles gambiae. Genetics 143: 941–952.
ZHENG, L. B., A. J. CORNEL, R. WANG, H. ERFLE, H. VOSS et al., 1997 Quantitative trait loci for refractoriness of Anopheles gambiae to Plasmodium cynomolgi B. Science 276: 425–428.
ZHENG, L. B., S. WANG, P. ROMANS, H. Y. ZHAO, C. LUNA et al., 2003 Quantitative trait loci in Anopheles gambiae controlling the encapsulation response against Plasmodium cynomolgi Ceylon. BMC Genet. 4: 16.
Communicating editor: R. W. DOERGE