It has been suggested that the collaborative cross, a large set of recombinant inbred strains derived from eight inbred mouse strains, would be a powerful resource for the dissection of complex phenotypes. Here we use simulation to investigate the power of the collaborative cross to detect and map small genetic effects. We show that for a fixed population of 1000 individuals, 500 RI lines bred using a modified version of the collaborative cross design are adequate to map a single additive locus that accounts for 5% of the phenotypic variation to within 0.96 cM. In the presence of strong epistasis more strains can improve detection, but 500 lines still provide sufficient resolution to meet most goals of the collaborative cross. However, even with a very large panel of RILs, mapping resolution may not be sufficient to identify single genes unambiguously. Our results are generally applicable to the design of RILs in other species.
THE Complex Trait Consortium (CTC) outlined a strategy, called the collaborative cross (CC), to construct a very large set of recombinant inbred (RI) strains from a genetically diverse set of inbred strains of mice (Churchill et al. 2004). It is suggested that the CC is an ideal resource for systems biology as it will provide a reproducible, highly varied yet controlled set of genetic backgrounds for functional genomic studies. One projected use of the CC is the detection and mapping of genetic effects underlying complex phenotypes. In principle, the CC could solve the obdurate problem of high-resolution quantitative trait locus (QTL) mapping: the CTC position paper suggested the creation of 1000 RIs to map QTL to a resolution where gene identification becomes possible, within a reasonable time and budget. An international collaboration to breed these lines has started and is described at http://www.complextrait.org.
However, the experimental design and statistical power of the CC has not yet been rigorously assessed. A number of factors need to be considered. First, the genetic architecture of complex traits, which remains largely unknown, will influence the CC's ability to detect and resolve genetic effects. Second, it is not clear how best to design the cross to maximize and maintain genetic diversity. If the CC is not seeded with sufficient diversity in its initial stages, or loses diversity during breeding, there may be insufficient detectable recombinants in the RI strains to deliver adequate mapping resolution. Third, assuming that the CC does have the potential to deliver high-resolution QTL mapping, how does it compare to other possibly more efficient strategies? For instance, detection and fine mapping can be achieved within a population of manageable size by making use of historical recombinants. Advanced intercross lines (AILs), populations made from F2's by repeated intercrossing within a large population, have the simple genetic background necessary for unambiguous assignment of identity by descent along with the highly recombinant haplotype structure that allows high mapping resolution (Darvasi and Soller 1995; Wang et al. 2003). Heterogeneous stocks (HS) are a generalization of AILs, derived from a cross of eight inbred mouse lines (McClearn et al. 1970; Demarest et al. 2001). Although HS are more complex to analyze, they provide additional genetic diversity and may offer increased mapping resolution over AILs (Talbot et al. 2003). But AILs and HS have drawbacks, since they take years to breed and are expensive to genotype. The efficiency of AIL and HS designs, relative to the CC, has not been investigated.
In this article, we use simulation to compare QTL detection and accuracy of the collaborative cross design against several alternatives. We simulate the realistic situation of a quantitative trait that owes half its variance to genetic factors. Of that half, a small portion is due to a QTL on a genotyped chromosome. The remaining variance is attributable to multiple QTL that lie on ungenotyped, hence unobserved, chromosomes. We investigate the ability of the CC to detect and map a QTL when it acts independently and when it is under strong epistatic control. We investigate variations on the basic breeding design and compare it with traditional alternatives (F2 intercross and the backcross) and newer, related strategies [recombinant inbred heterogeneous stocks (RIHS) (Valdar et al. 2003) and RIs based on AILs]. Finally, we ask how many lines are necessary to fine map small-effect QTL both in simple additive and in strongly epistatic systems.
Breeding strategies overview:
We examined five breeding strategies: the CC (Complex Trait Consortium 2003; Churchill et al. 2004), RIHS (Valdar et al. 2003), recombinant inbred advanced intercross lines (RIAILs), the F2 intercross (F2), and the backcross (BC). RIHS and RIAILs are panels of recombinant inbred strains made from, respectively, HS and advanced intercross (AI) populations. The HS, RIHS, AI, RIAIL, and CC all share a common aim: to produce fine-grain haplotypic mosaics from a limited known genetic repertoire. For this reason, we term them “mosaic crosses.” Moreover, with the exception of the backcross, the mosiac crosses and the F2 are variants of a more general multistage breeding program. That program has three stages: a compulsory mixing stage, followed by two optional stages, maintenance and inbreeding (Figure 1). In the mixing stage, inbred progenitor strains are intercrossed to produce a foundation population of mice whose genomes each contain some genetic material from every founder strain. In the maintenance stage, foundation mice are intercrossed over a number of generations. The aim is to introduce recombinants and produce a population of mice whose genomes are fine-grained mosaics of the original founder haplotypes. Intercrossing in the maintenance stage is typically performed using a pseudo-random mating technique that avoids pairing of close relatives and thereby reduces genetic drift and the risk of fixation. In the inbreeding stage, randomly chosen pairs of mice are inbred by recurrent brother–sister mating over 20 generations, each pair producing a distinct recombinant inbred line. We describe the four main types of strategies, and their variants, below and in Table 1.
F2 intercross and backcross:
In an F2 cross two inbred lines, A and B, are crossed in the mixing stage, to make F1 foundation animals, of genetic composition AB. They enter a maintenance stage of one generation to produce F2 animals. The inbreeding stage is omitted. For the backcross, two inbred lines, A and B, are combined in an F1 cross. These foundation animals are then crossed with strain A to produce animals whose chromosome pair comprises one A haplotype and one recombinant AB haplotype.
Advanced intercross lines (Darvasi and Soller 1995) resemble an F2 with an extended maintenance stage that lasts two or more generations. AIL genomes thus also have genetic composition AB, but contain more recombinants. To provide an informative comparison with the CC and RIHS crosses, which is the true focus of our work, we do not simulate AILs as such, but instead recombinant inbred strains derived from an AIL (RIAIL), made by adding on an inbreeding stage to the traditional AIL design. The genomes of RIAIL mice are therefore highly recombinant mosaics of strains A and B that theoretically are homozygous and identical within a particular line but different between lines. We use a notation in which RIAILs are suffixed with the number of filial generations intercrossed before inbreeding, e.g., RIAIL25 means RIs produced from an F25 or, equivalently, an AIL25 population.
Recombinant inbred heterogeneous stocks (Valdar et al. 2003) are like RIAILs but derived from eight founders rather than from two. They are made as follows. Eight inbred lines, A–H, are combined in a three-step mixing phase. In the first step, four F1 populations are produced of compositions AB, CD, EF, and GH by mating A with B, C with D, and so on, i.e., a “two-way” cross. The second step proceeds with two “four-way” crosses: AB and CD mice are mated to produce mice of composition ABCD, while EF and GH similarly give rise to EFGH. The last step is an “eight-way” cross, in which ABCD mice are crossed with EFGH mice to produce mice of genetic composition ABCDEFGH. We call these animals foundation heterogeneous stocks, or HS0. After maintenance for n generations to build up recombinants they become HSn. Subsequent inbreeding produces RIHSn; e.g., RIHS30 are RIs produced by inbreeding pairs of HS30 mice.
The collaborative cross as initially proposed (Complex Trait Consortium 2003; Churchill et al. 2004) is much like RIHS but with a more elaborate mixing stage and no maintenance breeding. In its most extreme form, mixing is ambitiously complex. Simpler mixings have since been suggested (Churchill et al. 2004), whose complexity lies between that of the original and that of simple RIHS mixing. Here we simulate the most extreme form, noting that this may overestimate the efficacy of an intermediate. As in RIHS, the mixing stage of the collaborative cross has three steps. The first step produces F1's from every distinct pair of founders, generating mice of composition AB, AC, … , FH, GH. This “diallel” design comprises 28 two-way crosses or 56 if reciprocal crosses between males and females are performed (which they are not here as we restrict our attention to autosomes). In step two, every possible four-way cross is performed between F1's, excluding instances of shared descent. For example, AB × CD is allowed but AB × BC is not. We call this a “disjoint diallel.” In the last step, eight-way crosses are performed in a further disjoint diallel (where, for example, ABCD × EFGH is allowed but ABCD × ABDE is not), resulting in foundation mice of composition ABCDEFGH. Its repeated disjoint diallel structure means the mixing stage incurs a combinatorial explosion of crosses. In the unrealistically conservative situation of every distinct cross being performed with no more than one mating pair, step two would see 210 matings and step three, 310. Only a subset of those eight-way crosses would be feasible in practice. We simulate them all.
In the standard CC proposal, pairs of foundation mice then are inbred to produce RIs. We also investigated variant CC designs with more recombination, to increase mapping accuracy. In “cctop10,” we select only the most highly recombinant 10% of mice from the four-way cross (which could be determined by genotyping) to proceed to the eight-way cross. In “cc04” and “cc08,” we insert four and eight maintenance generations (∼1 and 2 years of extra breeding), respectively, between the mixing and inbreeding.
There are several established ways to maintain a laboratory population over many generations with a view to preserving heterozygosity (Caballero and Toro 2000). Circular mating (also known as rotational breeding) defines a structured mixing of the animals, repeated at each generation that implicitly restricts the consanguinity of each mating and so minimizes the extent of inbreeding, drift, and fixation (Kimura and Crow 1963; Boucher and Cotterman 1990). We perform maintenance in all strategies using circular mating. We simulate it as follows. Given a population of 2m individuals, split equally between the sexes, put couples into cages labeled 1–m. Prepare empty cages labeled 1′–m′ to hold the next generation. Mate the female from cage 1 with the male from cage 2 and send their offspring to cage 1′. In similar fashion, mate the female of cage k with the male from cage k + 1 (or the male from cage 1 if k = m) and send their offspring to cage k′, doing this for all remaining k. Repeat this procedure, with suitable relabeling, until the required number of maintenance generations is reached.
The genetic diversity of the final population depends critically on m, the number of mating pairs. When m is small, the maintenance stage acts as a population bottleneck that encourages extremes of drift and fixation. Conversely, a larger m delays these effects. Consistent with existing HS mice (Northport HS) (Demarest et al. 2001), we set m at 50 when generating RIHS lines unless otherwise specified and do this for RIAILs also. However, for our CC variants, a project of more ambitious scope, we set m equal to the number of RIs required, which can be as high as 1000.
Founder strains and marker sets:
Northport HS stocks derive from a cross of the mouse strains A/J, AKR/J, BALB/cByJ, C3H/HeJ, C57BL/6J, CBA/J, DBA/2J, and LP/J. R. Hitzemann (personal communication) chose those strains with dual purpose: to emulate the original Boulder HS, whose founders were genetically diverse by the standards of that day, and to provide a contrast of response classes to haloperidol-induced catalepsy.
The Complex Trait Consortium (2003) urges that the founder set chosen for the CC balances genetic diversity, phenotypic diversity, breeding performance, and the availability of complementary resources. The strains finally selected for breeding by the CTC were A/J, C57BL/6J, 129Sv/ImJ, NOD/LtJ, NZO/H1J, CAST/Ei, PWK/Ph, and WSB/Ei, which include a number of wild-derived strains. However, at the time of simulation this set was not yet finalized, and only incomplete genotype data were available for them. For these reasons, and because we wished to compare alternative eight-founder designs on an equal footing, all simulated populations were founded with the Northport set listed above or, in the case of the two-strain strategies, with pairs of strains drawn from that list. The genomes of wild-derived strains such as CAST/Ei are more distantly related than are standard strains such as C57BL/6J. Therefore we would expect more QTL to segregate in the actual CC but the results of our simulations should not change overmuch since they are conditioned on the presence of segregating QTL.
We use two marker sets. The first is an artificial fine-grain set of 1000 fully informative markers (i.e., markers with eight alleles) spaced at regular intervals of 0.1 cM. We use this set only to establish unrecombined block length for different designs (see later) and not for mapping. The second, which we use to map QTL, is a real data set typed for chromosome 1, provided by the Genomics Institute of the Novartis Research Foundation (GNF) (Pletcher et al. 2004). We selected markers that segregate among the eight Northport HS progenitors to make our simulations as realistic as possible. The GNF set numbers 512 diallelic SNPs with average spacing 0.19 cM (standard deviation 0.16 cM) over 100 cM. These markers segregate among the eight founders with varying minor allele frequencies. The strain distribution pattern of the SNPs is such that of the 512 minor alleles, 68 are present in only one strain, 136 are present in two strains, 209 in three strains, and 99 in four strains.
Representing the mouse genome:
We represent a mouse by up to seven pairs of homologous autosomes, depending on the complexity of the genetic model being simulated. Chromosome 1 is 100 cM long and contains the GNF marker set or in the case of establishing block lengths the fine-grain artificial marker set. The QTL to be mapped always lies on chromosome 1. The remaining six chromosomes act as an unseen genetic background that affects the phenotype; these chromosomes are ignored during mapping. For our simulations of epistasis, chromosome 2 contains a single diallelic marker that acts as a masking QTL in our simulation of epistasis but is ignored otherwise. Chromosomes 3–7 each contain 10 diallelic markers spaced at 10-cM in-tervals. These 50 markers are used as background QTL (see below). Each of the 51 markers on chromosomes 2–7 has its high allele on four founders independently and randomly chosen from the eight.
Simulating breeding in the mouse population:
We simulate mouse populations stochastically, using software written in house (available from http://www.well.ox.ac.uk/∼valdar/software/). Gametogenesis is simulated by randomly drawing one of each parental autosome and recombining it with its homolog, choosing breakpoints according to the Haldane model with no interference, in which meioses occur as events in a one-dimensional Poisson process (Lynch and Walsh 1998).
Phenotypic variance components:
We simulate a quantitative phenotype that is 50% heritable in the founder population. This means that genetic factors explain half the phenotypic variance and environmental noise accounts for the remainder. The heritable portion of the variance is split between foreground and background QTL. Foreground QTL are defined as those that lie on chromosome 1. These QTL are visible in our mapping. Background QTL lie on the unseen chromosomes (2–7) and, although they affect the phenotype, are unmappable. The phenotypic variance partitions thus,where subscripts identify phenotypic (P), environmental (E), genetic (G), foreground (Q), and background (B) variance components, and where, for a phenotype that is 50% heritable, .
QTL effect sizes and allelic values:
The effect size of a QTL is defined here as the proportion of phenotypic variance it explains in the founder population, e.g., . We explicitly vary the effect size of a single foreground QTL while implicitly adjusting the effect sizes of the background QTL to make up the remaining heritability. These 50 background QTL act independently to provide only additive genetic variation. Thus, ignoring covariance between QTL (which is negligible in our design), , where the effect size of the ith background QTL is , such that in any one simulation all background QTL are equally potent in the founder population. The case of epistasis is described below.
The allelic value is defined here as the raw number that a QTL's high allele contributes to the phenotype value. We adjust each QTL's allelic value at the start of each simulation to yield the required effect size in the founder population. That allelic value then remains constant throughout breeding and phenotyping. This means, owing to inevitable drifts in allele frequency, that a QTL's effect size in the mapping population may differ from its starting value in the founder population, but it models reality better than fixing the effect size in the mapping population by changing its allelic value.
We simulate two main foreground QTL systems: a single additive QTL and a single QTL in strong epistasis with an unseen QTL on chromosome 2. In both cases, background QTL are present but act independently of QTL in the foreground.
We simulate a single foreground QTL as follows. Before any breeding takes place, a marker is drawn at random from chromosome 1 and designated the QTL. Then the strain distribution pattern (SDP) of its alleles is shuffled. We do this for the following reason. In the GNF marker set, 90% of the SNPs segregate between C57BL/6 and A/J or DBA/2, due to the method of SNP ascertainment. As a result, the strain distribution pattern of the QTL matches that of nearby SNPs more frequently than would be likely in reality, since there is no reason to suppose a QTL favors any particular SDP. To correct for this bias, we shuffle the SDP of the QTL but keep the SDP of the remaining SNPs unchanged. For example, a QTL with its minor allele in strains A, B, and C may, after shuffling, find that allele moved to strains B, F, and H. Were the strain distribution pattern of the QTL not shuffled it would be easy to detect through single-marker association because its allelic state would be closely matched by those of its flanking markers. We demonstrate this point by performing an additional set of unshuffled simulations for one experimental condition, contrasting its results with those of the shuffled case (see results). Once phenotyping is complete, the foreground QTL marker is concealed, and we then attempt to recover that marker's location.
The high allele of each QTL is assigned an allelic value on the basis of its frequency and required effect size in the founder population. The phenotype of each animal in the mapping population is calculated as , where, with subscripts B identifying background loci, denotes the allelic value, xB the number of high alleles, and ε a normally distributed error term with variance accounting for half the total variance in the founders. Table 2 illustrates the relation between the genotype (x) of an additive locus and its genetic effect (). For inbred mice x is usually 0 or 2 (although it can be 1 if inbreeding has not fixed that QTL); for F2 mice x is 0, 1, or 2; and for BC mice x is 0 or 1.
For simulations of epistasis, the foreground and background QTL are generated as before except that the genetic effect of the foreground QTL on chromosome 1 now depends on the masking QTL, M, on chromosome 2. The equation for a mouse phenotype is now , such that, for example, in a population of inbred lines the genetic effect from the foreground QTL () is nonzero only when both foreground and masking QTL are in the high state. Note that the “effect size” of the foreground QTL is now defined as the proportion of variance explained by the QTL pair. The goal of subsequent detection and fine mapping is to find the first QTL despite the other's confounding effect.
The foreground and masking QTL always segregate in each simulation. Thus for F2, RIAIL, and BC, founded on just two strains, the pair of strains chosen is random, subject to the constraint that those QTL segregate between them.
Detection and fine mapping:
We detect and fine map simulated QTL using two methods, single-marker association (SMA) and HAPPY, a multipoint method (Mott et al. 2000). In single-marker association, we regress the phenotype on the number of minor alleles present at each marker in turn. If the marker showing the strongest association is statistically significant (see later), we classify the QTL as detected and predicted to be located in the most significant marker interval. Otherwise, we deem the QTL undetected.
Mapping using HAPPY is also performed in a regression framework, but here the phenotype is regressed on the inferred probability of descent at each marker interval. Briefly, HAPPY models genotypes as symbols emitted from hidden progenitor states in a discrete Markov process where transitions between states correspond to recombination events. This hidden Markov model (HMM) is fitted to the data using a dynamic programming algorithm described in Mott et al. (2000). The HMM transition matrix is parameterized by the expected number of recombinants between markers, which, given the centimorgan distance between markers, can be summarized by the effective number of generations . Inbreeding, among other things, causes to deviate from the actual number of generations and so varies with breeding strategy. For each strategy we estimate by calibrating the observed level of informative recombination against that seen in a simulated outbred population.
By combining all the genotype data along a diploid chromosome, HAPPY computes for each individual and at each locus the probability of descent from each pair of the founder strains. The progenitor probabilities are regressed on the phenotype to identify loci where the pattern of strain inheritance correlates with the phenotype. As with SMA, the marker interval that associates most strongly with the phenotype is tested for statistical significance and, if it passes, declared the putative QTL.
In each simulation, 1000 animals are generated. As a consequence, in some experiments multiple animals are phenotyped from each RI strain. In that case we regress the mean phenotype of those replicates (i.e., the “strain mean”) on genetic information from a randomly chosen member. Residual heterozygosity within an RI means that “replicates” are rarely identical at all loci, so we implicitly model the drawbacks of minimal genotyping.
Assessing statistical significance using generalized extreme value distributions:
We measure the association between locus and phenotype in both single-marker association and HAPPY by an F-test of their respective linear model fits. The overall score for a simulation over m loci is defined as follows. Let , where is the P-value from the F-test at locus . Define , that is, the maximum score over all loci. The reported P-value for the highest-scoring locus is grossly anticonservative. We therefore assess statistical significance by comparing the obtained P-value with genomewide thresholds that have been corrected for multiple testing. The Bonferroni correction is too conservative as it ignores linkage disequilibrium (LD) between markers, and a permutation test is computationally unfeasible for so many simulations. Instead, we take LD into account by modeling the distribution of Smax under the null hypothesis of no foreground QTL and hence calculate suitable threshold quantiles.
The LD structure varies with breeding strategy, number of RI lines, and extent of background genetic variation so we estimate the null distribution of Smax separately for each experimental condition. Provided there is negligible long-range dependence between distant , with increasing m, asymptotically follows a generalized extreme value distribution (Coles 2001) with distribution function , where μ, σ, and ξ are the location, scale, and shape parameters, respectively, and . A closely related use of generalized extreme values (GEVs) is described in Dudbridge and Koeleman (2004).
For each experimental condition we perform 1000 null simulations with background QTL only. We then fit a GEV by maximum likelihood to the observed values of , using the R package “evd” (Stephenson 2003). An example of fit is shown in Figure 2a, which plots empirical and fitted distributions of for the cc strategy with 500 recombinant inbred lines (RILs). To determine genomewide significance thresholds we used the upper tail quantiles corresponding to P-values of g(5%), g(1%), g(0.1%), and g(0.01%), where Bonferroni corrects for the fact that we simulate only one chromosome per animal but set thresholds as if there were 20 [i.e., ].
Table 3 lists these significance thresholds calculated for all combinations of strategies and numbers of RILs for the case of a total background effect of = 45%.
We found that, for a given number of null simulations, estimating thresholds via a GEV model is more efficient than the traditional approach of taking empirical quantiles from the observed cumulative distribution of . To illustrate this point, Figure 2b shows how threshold estimates from both methods vary with the number of null simulations available.
Assessing the efficacy of detection and fine mapping:
For a given breeding protocol and QTL system, we calculate two main quantities from 1000 independent simulations: power to detect and location error. “Power to detect” is the probability that a QTL is detected at a given genomewide significance level , defined as the proportion of trials in which the highest-scoring locus exceeds the -threshold. “Mean location error” is the mean absolute distance of the predicted QTL from the actual QTL in trials where the predicted QTL reaches -level significance. We investigated measuring accuracy using root mean squared error but found that its sensitivity to outliers (i.e., false positive QTL detection events) produced misleading results.
Single-marker association compared to HAPPY for detecting QTL:
We first investigated the effects of algorithm choice (SMA vs. HAPPY) and SNP ascertainment. Figure 3, a and b, plots the performance of HAPPY and single-marker association in fine mapping a 5% QTL segregating in a population of 1000 mice derived from 500 collaborative cross RIs (i.e., with 2 replicate mice per RI strain). Note that here we define a 5% QTL as one that accounts for 5% of the phenotypic variance among the eight founder strains and that we include additional unobserved QTL to make up the additive genetic variance to 50% (see methods).
Figure 3a shows the power to detect a significant QTL. The orange and light green lines show the performance of SMA and HAPPY in the case where the QTL is an unaltered SNP drawn from the original GNF marker set. The dark red and dark green lines show what happens when the SNP chosen to be the QTL has had its SDP shuffled with respect to the founders in advance of the simulation and represent a more likely scenario (see methods for details). In the first case, SMA does better than HAPPY, detecting the QTL with higher power at all thresholds. But if the SDP of the QTL is shuffled, then single-marker association fails whereas HAPPY remains powerful. Similarly, Figure 3b, which plots the location error against the significance threshold, tells a similar story: when the QTL is an unaltered SNP, both HAPPY and single-marker association predict its location with subcentimorgan accuracy. But when the alleles of the QTL have been shuffled, the location error of SMA predictions rises dramatically whereas those of HAPPY are almost unchanged. Hereafter, we consider only shuffled QTL.
Figure 4 plots the location error of HAPPY predictions against those of SMA in 1000 simulations. For F2 populations (Figure 4a) the two methods are similar, with poor localization (where the peak is >15 cM away from the true location) slightly more in HAPPY than in SMA (3.1% vs. 1.2% of cases). However, for CC populations (Figure 4b) HAPPY localizes the QTL as accurately as before but SMA misses the QTL by >15 cM far more often (2.8% vs. 10.1% of cases). Hereafter, we consider only mapping predictions made by HAPPY for all strategies except BC and F2, for which we consider predictions made by both HAPPY and SMA.
Variation in haplotype block length:
Table 4 gives summary statistics of distributions of the lengths of unrecombined haplotype blocks produced by the breeding strategies. Statistics were calculated from 1000 simulations of 1000 animals, using the artificial fully informative marker set described in methods. The estimates have a slight upward bias because double recombinants returning to the same founder that occur between markers evade detection. Despite this, the observed distributions were close to the exponential distribution predicted by theory. As expected, the block length shortens with more generations. For instance, within the group of CC strategies, cc, which has no selection or maintenance, has the longest blocks whereas cc08, which has 2 years of maintenance, has the shortest. Strategies rihs05 and rihs10 have block lengths comparable to those of cc04 and cc08, and this owes much to their similar maintenance periods. Although the maintenance periods of cc04 and cc08 are also close to those of riail08 and riail13, these RIAIL strategies have longer block lengths because, with two founder haplotypes rather than eight, double recombinants are always invisible and many recombinations are uninformative. The shortest mean block length is seen in rihs30 at 4 cM. However, the block lengths of cc04 and rihs10 are not much longer, indicating that maintenance beyond a couple of years is subject to diminishing returns.
Single QTL—detection and resolution:
We simulated a single additive QTL in RI, backcross, and F2 intercross populations using a fixed population size of 1000 animals. The allelic value of the QTL was constant across designs and accounted for 5% of the phenotypic variance among the eight founders. For the RI designs, we varied the number of recombinant inbred lines from which the 1000 animals were drawn. For example, 100 RILs correspond to 10 replicates from each of 100 lines. The results are plotted in Figure 5 and tabulated in Table 5.
Figure 5A shows the power to detect an effect at the 5% genomewide significance level plotted against the number of RILs (with non-RI strategies at 0 RILs). Power is defined here as the proportion of 1000 trials in which the highest-scoring locus exceeds a given significance threshold. The cc strategy (red line) illustrates the trend among RI strategies: the greater the number of lines is, the more power to detect. There is consistent variation between RI strategies too: at 500 RILs, two-way strategies (i.e., those involving only two progenitor strains; blue) are more powerful than eight-way ones, and strategies with more recombination (e.g., cc08 or riail13) do worse than their less recombinant counterparts. At 1000 RILs, the eight-way strategies catch up. Overall, BC and F2, two-strain strategies with minimal recombination, have the greatest power to detect QTL.
Figure 5B shows the mean location error for detected QTL. Whereas each point in Figure 5A was based on 1000 simulations, in Figure 5B each point is based on only those simulations that were significant. For instance, the estimate for cc at 100 RILs (leftmost red point) is an average over only 40 simulations and so will have greater variance than the other plotted points. Nonetheless, the cc strategy (red) clearly shows how accuracy improves with increasing numbers of RILs. In particular, its elbow bend at 500 lines (1.1 cM) characterizes a process of diminishing returns where, past that point, modest gains hardly justify additional RILs. The remaining trends are the reverse of Figure 5A: more strains contributing to the cross and greater recombination give rise to sharper localization of the QTL, and this is most salient at 1000 RILs, where the best performer, cc08, reaches 0.58 cM. F2 and BC deliver more accurate localization with SMA (open circles) than with HAPPY (solid circles) but nonetheless lag behind the ≥500 RIL strategies.
Figure 5 does not show results from the RIHS strategies because their power was too low to allow meaningful estimates of location error. Echoing the trend above, power and degree of recombination were inversely related with 1000 RILs of rihs05 detecting 14.1% of QTL, rihs10 detecting 2.2%, and rihs30 detecting 0.5% (see Table 5). The explanation appears to be genetic drift.
Genetic drift of the foreground QTL:
By definition the QTL segregates in the progenitor strains and has an allele frequency of 0.125, 0.25, 0.375, or 0.5, depending on how many founders carry its high allele. During breeding, the frequency drifts unpredictably, affecting the QTL effect size and detectability in the final mapping population. Table 6 describes the genetic drift of the QTL under detection for the different breeding strategies and for different starting allele frequencies. We do not list the mean of the QTL allele frequency in the simulated mapping populations, since an average hides the underlying fluctuation. Rather, we give the standard deviation of allele frequencies, listing strategies prone to wide fluctuations first. The order of strategies closely matches the reversed order of power to detect a 5% QTL. Notably, rihs30 is the only breeding strategy where the allele frequency is so unstable as to drift to fixation occasionally.
QTL effect size:
We investigated the ability of the cc strategy with 500 RILs to map QTL of greater effect. Figure 6 shows detection and mapping of QTL with effect sizes 5, 10, 20, 30, and 40% using both HAPPY and SMA. In all cases, heritability was set at 50% with background QTL making up the remaining genetic variance. Figure 6a shows that effect sizes of ≥10% are uniformly detected with HAPPY (solid circles). With SMA (open circles), guaranteed detection occurs for effect sizes of ≥30%. The HAPPY line in Figure 6b is asymptotic, with the mean localization error at 0.24 cM for a 40% QTL (note: the average spacing between markers in the marker set is 0.19 cM). The localization by SMA shows small but consistent improvement with QTL of effect size ≥10%, yet never falls below the 2-cM level.
We next simulated a foreground QTL under strong epistatic control from an unlinked masking QTL lying on an ungenotyped chromosome (see methods) and tried to detect and map that foreground QTL in a population of 1000 animals. The combined effect size of the foreground and masking QTL was 10% in the founder population, with background QTL making up the genetic variance up to 50%. The results are plotted in Figure 7 and tabulated in Table 7.
Figure 7A shows the power to detect for the different breeding strategies. It exhibits the same trends as the detection graph in Figure 5 but with all points depressed by ∼40%. The top performer is BC using SMA, which achieves >90% power. The RI strategies lag far behind, with the best two-way RI (riail08) reaching just above 60% and the best eight-way strategies (hardly distinguishable cc and cctop10) coming close to 55% at 1000 RILs. As before, the power of the RIHS strategies was too low to merit their inclusion in Figure 7, with 1000 RILs of rihs05, rihs10, and rihs30 achieving 9, 1.9, and 0.5%, respectively. Figure 7B plots location error and tells a similar story to that of Figure 5B, but with all strategies locating the QTL with roughly twice the error. RI strategies with ≥500 RILs outperform F2 and BC strategies, although in the case of BC mapped with SMA this is by a small margin. The elbow bend of the cc line (500 RILs, 1.7 cM) shows, as before, that increasing the number of lines beyond 500 delivers only modest gains in resolution. The most accurate strategy and the only one that reaches below the 1-cM mark is cc08 (0.98 cM).
The collaborative cross is a highly attractive proposition because a large set of recombinant inbred strains is a powerful resource for investigating the genetic basis of complex traits, making it possible to map genetic effects into small regions and investigate gene-by-gene and gene-by-environment interactions. One drawback to the proposal is that the cross might have to be extremely large to meet some of its objectives: initial suggestions were that 1000 strains would be required. Our simulation results for QTL mapping are more optimistic.
We explored the power and mapping resolution of CC strategies for identifying QTL and compared the results to classical F2 intercross and BC approaches, as well as to two other RI methods: one using RIs derived directly from HS animals and the other from an advanced intercross. We found a number of striking trends that together characterize the relationship between power, accuracy, and aspects of breeding design in two-way and eight-way crosses.
First, we show the increase in power and resolution that accompanies an increase in RI lines (Figure 5). We used a fixed population size of 1000 animals regardless of how many RI lines were included. This meant that we implicitly modeled the trade-off between replication and genetic diversity, since one is bought at the cost of the other (see Belknap 1998 for a theoretical treatment).
Second, there was a tendency for more highly recombinant strategies to detect the QTL less readily but, once detected, to map it more accurately. Higher recombination produces a finer-grain haplotype mosaic that, by its reduced linkage disequilibrium between neighboring markers, allows sharper localization. Most strategies achieve higher recombination through increased generation time (the exception being cctop10, which undergoes specific selection for recombination during breeding). More generations mean greater opportunity for genetic drift and hence more cases of some QTL allele frequencies dropping to undetectable levels (Falconer and Mackay 1996). This explains the close resemblance between the order of strategies in Table 6 and their relative power in Figures 5 and 7. Replication accentuates these undesirable effects by reducing the effective population size. Conversely, increasing the number of lines stabilizes the allele frequencies and results in a convergence of data series as seen in the power graphs of Figures 5 and 7.
Third, our simulations suggest that the greatest resolution is achieved through a hybrid collaborative cross strategy in which two maintenance years are inserted between mixing and inbreeding (cc08). In both simulations of single and epistatic QTL, this strategy delivered subcentimorgan mapping with 1000 RILs and only slightly less accurate predictions with 500 RILs. The standard cc localized with ∼0.5 cM more error on average. However, the advantage of cc08 must be weighed against its lower power and the additional time and cost it would incur in stock creation. We acknowledge that the precision of our QTL localizations was limited by the density of markers in the GNF set. Notably this limit caused the apparent asymptote in Figure 6 (see results). We chose the GNF set because it was the best one available at the time of simulation. Since then, however, there have been and will continue to be denser marker sets available. It is possible that performing the simulations with more closely spaced markers may uncover a stronger advantage to using greater numbers of RILs than is suggested by Figure 5.
Our conclusions deserve certain qualifications, both general and specific in nature. On a general level, we chose a 5% QTL as a standard, but this may have provided a particularly stringent target, thereby making our results more conservative than necessary. Although a recent review surveyed the effect sizes of reported QTL phenotypes mapped using inbred crosses and found the mean was closer to 6% (Flint et al. 2005), in fact a smaller effect size is likely to be a more typical value. That is partly because most mapping studies overestimate QTL effect sizes (due to their small sample size) and partly because experiments that dissect QTL so often find that a single QTL is due to the combined effects of a number of physically linked QTL of small effect (Flint et al. 2005).
We also point out that our results do not tell us about the ability of the strategies to detect epistasis. Rather, our concern was to map a QTL in the presence of epistasis. In our simulations the extent to which the epistatic QTL remains detectable depends on the minor allele frequency of the second (masking) locus, which in turn depends on how well the chosen breeding strategy preserves heterogeneity. We may therefore underestimate the power and resolution attainable in practice when all chromosomes can be analyzed simultaneously.
A further general qualification is that our assessment of the trade-off between replication and genetic diversity, which affects the advantage of adding RI lines, is crucially dependent on the level of background genetic variation, which, in the case of mapping a 5% QTL, we have set at 45%. Replication squeezes out environmental noise. Phenotyping an individual once with environmental error is always going to be less accurate than phenotyping 10 identical individuals and taking their mean. That difference in accuracy is related to the size of the environmental component. For instance, if variation in the trait were entirely genetic, replication could confer no advantage. During mapping we take into account whether the population contains replicates. If not, we regress on single animals. But if there are replicates, we regress on strain means. Doing so amounts to deriving a pseudo-population of fewer animals with a QTL of greater effect size. How does this influence our results?
Background genetic variation, QTL effect size, and replication are related as follows. Let be the effect size (i.e., proportion of variance explained) in the mapping population of primary QTL, be the summed effect size of all background QTL, and be the remaining environmental variance, where , and suppose that the population of size N comprises N/n groups of n replicates. Then regressing on strain means is equivalent to mapping in a pseudo-population of N/n animals, where the effect size of the primary QTL is nowFor instance, consider a 5% QTL with no background QTL segregating in a population of 1000 genetically distinct animals (i.e., = 0.05, = 0, N = 1000, n = 1). Then consider the same QTL in a population that includes replicates: 1000 animals drawn in equal measures from 100 RILs (i.e., = 0.05, = 0, N = 1000, n = 10). Both instances describe a 5% QTL in a population of 1000 individuals. But by averaging phenotypes across genetically identical animals and thereby exploiting known population structure, the second instance is more usefully viewed as a 34% QTL in a pseudo-population of 100 “averaged” individuals.
How does inclusion of background QTL change this? In our simulations background QTL account for 45% of the phenotypic variance, such that the total heritability is 50% ( = 0.05, = 0.45). Without exploiting population structure, both instances still describe a 5% QTL segregating in 1000 individuals. However, treating the second instance as a pseudo-population of 100 now delivers an effect size of only 9%, showing a diminished advantage of replication in the face of a strong genetic background.
We chose a 50% heritable trait because it accorded with the estimated heritability of many physiological and behavioral traits. We found that power rose consistently with the number of lines used but that accuracy leveled off after 500 RILs, suggesting that 500 may be an economic optimum at which acceptable power is accompanied by fine localization. However, we note that if the trait were less heritable, this optimum would be lower; if the trait were more heritable, it would be higher.
It is noteworthy that the BC and F2 apparently performed better than many of the RI strategies, particularly in the presence of epistasis and when the number of RI lines was low. The main advantage of the RI is the much better mapping resolution achievable with ∼500 lines. However, the two-way strategies F2, BC, and RIAIL owed some of their success to strain ascertainment bias. Selecting only founder pairs in which the QTL segregate conferred several advantages over the eight-way strategies. First, it ensured that QTL allele frequency in the founding population was always 50% (while it could be as low as 12.5% in an eight-way cross). Consequently, not only did the marker allele uniquely determine strain origin, but also the effect size of the QTL was often amplified. We set the allelic value (the amount that an allele adds to the phenotype) to be constant across designs and calibrated it so it would account for 5% of the variance in a population of the eight founders. Among the founders, most QTL had a minor allele frequency of <50%. However, in the two-way strategies, the QTL frequency rises to 50%, which increases the variance attributable to it and thus its effect size. Moreover, because some background QTL that segregate among eight founders will inevitably not segregate between two founders, the amount of background variation will be diminished. Note that this is somewhat offset by the fact that the same will be true of non-QTL markers, fewer of which will be polymorphic, leading to coarser-grain mapping.
These are important consequences of using simpler breeding programs. We could have set the effect size of the QTL according to the mapping population, but that would have required varying the allelic value across designs, an unfair depiction of biological reality. Nonetheless, there is one advantage of eight-way strategies that our simulations mask: namely that a QTL is more likely to segregate among eight strains than among two. We have assumed that the QTL is known to segregate between the two chosen strains. Although pairs of strains are rarely chosen for a cross at random, it is a goal of the collaborative cross to make that decision less crucial to success of a mapping experiment. We discuss this point more fully in the appendix and detail how it upwardly biases our assessment of power and accuracy for two-way strategies.
The F2 and BC also have inherent disadvantages against RI strategies, two-way or otherwise. Because we simulate dominance effects, homozygous individuals lie at the extremes of the phenotypic spectrum. This means that the genetic variance of a QTL in an RI population will be greater than that in an F2 population, which has heterozygotes, and particularly greater than that in a BC population, which lacks one homozygote class (Belknap 1998; Flint et al. 2005). However, our inclusion of background QTL moderates these effects.
The poor performance of the RIHS strategies is also due partly to genetic drift, but the explanation is slightly more complex and relates to our inclusion of background QTL. The simulated RIHS design was faithful to the construction of the existing Northport HS. In particular, the number of mating pairs in the mixing stage of breeding was set at 50. In the case of rihs30, that introduced a dangerous bottleneck that allowed fixation or near fixation at many loci. In simulations mapping a foreground QTL that affects a monogenic quantitative trait we have found that increasing the number of mating pairs to 150 recovers some power and accuracy (data not shown). But in the presence of background QTL more extreme increases are necessary.
The rihs08 and rihs04 strategies echo these problems to a lesser extent. But the impact of the maintenance bottleneck in RIHS runs deeper. Table 3 lists 5% significance thresholds for different strategies. The thresholds for the rihs30 strategy are excessively high because the rihs30 population is quasi-inbred. Many markers on the genotyped chromosome (chromosome 1) are fixed or nearly fixed, and many of the background QTL (on chromosomes 2–7; see methods) have drifted far from their starting frequencies of 50% to have highly unequal effects. In our null simulations of rihs30, where the foreground QTL has zero effect, it is likely that some of the genotyped markers are in strong linkage LD with background QTL of amplified effect. Because these markers act as surrogates for a true QTL, they will produce good regression fits even in the absence of a foreground QTL and, as a result, cause threshold estimates based on that null distribution to soar.
Moreover, increasing the number of lines from 500 RILs to 1000 RILs will make matters worse. In the CC strategies, RILs are drawn from either a vast pool of genetically different individuals from an expansive mixing phase (as in cc and cctop10) or a large maintenance population. In the RIHS strategies, RILs are always drawn from a small maintenance population. Therefore, increasing the number of RILs releases little extra genetic variation but does increase the apparent sample size and therefore the significance of the observed association. Note that RIHS performed far worse here than in simulations reported previously (Valdar et al. 2003). The reasons are twofold. First, in those earlier simulations we did not include background QTL and so avoided the pitfalls described above. Second, in those simulations we fixed the effect size of the QTL relative to the mapping population, which meant that QTL drifting to low frequency were given commensurately larger allelic values to produce a 5% effect. By contrast, our current simulations calibrate the allelic value in the founders and fix it thereafter such that the effect size of the QTL changes as its allele frequency drifts through breeding. Thus, our current simulations implicitly reward strategies that stem genetic drift and penalize those that encourage it.
In summary, we find that for QTL detection and high-resolution mapping, the CC would not greatly benefit from increasing the number of inbred strains above 500. In the presence of epistatic QTL, additional RIs certainly help, particularly with detection, but the gains are relatively modest. Given the cost and logistical difficulties inherent in creating, maintaining, and distributing a very large set of RIs, it is hard to justify the routine use of a CC >500 for QTL mapping, although there may be advantages using additional lines for other experimental purposes, such as expression analyses. The results presented here are not specific to mice and so will be of interest to geneticists designing panels of RILs for other model organisms and plants.
Finally, we found that no strategy described here will definitively identify a gene. The mapping resolutions are still insufficient to achieve that goal. In the best scenario that we achieve using an eight-way strategy, the location error is just under 0.6 cM (obtained by 2 additional years of strain mixing in the CC and using 1000 RIs). Note that this is not the same as a confidence interval. Rather, it means that if we were to map a QTL, the gene would lie on average 0.6 cM away from the highest association peak. Since the mean gene density in the mouse genome is ∼10 genes/Mb, ideally we need a method with a resolution of 100 kb (equivalent to a location error of 0.05 cM).
The success of two-way strategies in our simulations is partly due to strain ascertainment bias. For these strategies, we selected founder pairs in which the QTL segregate, since to do otherwise would have meant simulating no QTL for most cases. However, this means that our assessment of power for these strategies is upward biased.
It is instructive to consider what would happen if, rather than ascertain strains, we choose the founder pair randomly from the eight. How would this affect power and resolution? The following reasoning assumes the QTL is one of the GNF SNPs and segregates among the eight founders. Power to detect describes a probability, P(detect). We can split this probability as follows:where “segregates” means the QTL segregates and “∼segregates” means it does not. Clearly, P(∼segregates) = 1 − P(segregates). Also the P(detect | ∼segregates), the probability to detect a QTL given that it is fixed, is equivalent to the significance thresholdIn the simulations performed here, we report only cases where the QTL segregates and so we report the power as P(detect | segregates). In the case of an eight-way strategy, P(segregates) = 1, and so P(detect | segregates) = P(detect). However, for a two-way strategy without strain ascertainment, P(segregate) < 1 and so P(detect) ≠ P(detect | segregate).
We can work out P(segregrates) for a single SNP as follows. If the high allele exists in b of 8 strains, then the probability of choosing two strains but only one high allele is . Averaging this over all m GNF SNPs giveswhere bi is the value of b for marker i and P(i) is the probability of choosing that marker. Therefore, for the two-way crosses, the power to detect without strain ascertainment iswhich would mean that an F2, BC, or RIAIL whose power was 100% with strain ascertainment would have only ∼50% power without this advantage. This reduction is even more pronounced in the epistatic case. There, further strain selection to ensure segregation of the unlinked QTL results in P(segregates) = 0.269, which reduces an apparent 100% detection to ∼30%.
In summary, if we were to simulate the extreme opposite case of no strain ascertainment, for two-way strategies power would be about half that reported here when mapping a single QTL and less than a third in the case of epistasis.
We thank Ken Paigen and Karl Broman for helpful discussions. This work was funded by a grant from The Wellcome Trust.
Communicating editor: K. W. Broman
- Received December 3, 2004.
- Accepted December 7, 2005.
- Copyright © 2006 by the Genetics Society of America