Skip to main content
  • Facebook
  • Twitter
  • YouTube
  • LinkedIn
  • Google Plus
  • Other GSA Resources
    • Genetics Society of America
    • G3: Genes | Genomes | Genetics
    • Genes to Genomes: The GSA Blog
    • GSA Conferences
    • GeneticsCareers.org
  • Log in
Genetics

Main menu

  • HOME
  • ISSUES
    • Current Issue
    • Early Online
    • Archive
  • ABOUT
    • About the journal
    • Why publish with us?
    • Editorial board
    • Early Career Reviewers
    • Contact us
  • SERIES
    • All Series
    • Genomic Prediction
    • Multiparental Populations
    • FlyBook
    • WormBook
    • YeastBook
  • ARTICLE TYPES
    • About Article Types
    • Commentaries
    • Editorials
    • GSA Honors and Awards
    • Methods, Technology & Resources
    • Perspectives
    • Primers
    • Reviews
    • Toolbox Reviews
  • PUBLISH & REVIEW
    • Scope & publication policies
    • Submission & review process
    • Article types
    • Prepare your manuscript
    • Submit your manuscript
    • After acceptance
    • Guidelines for reviewers
  • SUBSCRIBE
    • Why subscribe?
    • For institutions
    • For individuals
    • Email alerts
    • RSS feeds
  • Other GSA Resources
    • Genetics Society of America
    • G3: Genes | Genomes | Genetics
    • Genes to Genomes: The GSA Blog
    • GSA Conferences
    • GeneticsCareers.org

User menu

  • Log out

Search

  • Advanced search
Genetics

Advanced Search

  • HOME
  • ISSUES
    • Current Issue
    • Early Online
    • Archive
  • ABOUT
    • About the journal
    • Why publish with us?
    • Editorial board
    • Early Career Reviewers
    • Contact us
  • SERIES
    • All Series
    • Genomic Prediction
    • Multiparental Populations
    • FlyBook
    • WormBook
    • YeastBook
  • ARTICLE TYPES
    • About Article Types
    • Commentaries
    • Editorials
    • GSA Honors and Awards
    • Methods, Technology & Resources
    • Perspectives
    • Primers
    • Reviews
    • Toolbox Reviews
  • PUBLISH & REVIEW
    • Scope & publication policies
    • Submission & review process
    • Article types
    • Prepare your manuscript
    • Submit your manuscript
    • After acceptance
    • Guidelines for reviewers
  • SUBSCRIBE
    • Why subscribe?
    • For institutions
    • For individuals
    • Email alerts
    • RSS feeds
Previous ArticleNext Article

Distribution of Parental Genome Blocks in Recombinant Inbred Lines

Olivier C. Martin and Frédéric Hospital
Genetics October 1, 2011 vol. 189 no. 2 645-654; https://doi.org/10.1534/genetics.111.129700
Olivier C. Martin
INRA, Université Paris-Sud, Centre National de la Recherche Scientifique, AgroParisTech, UMR0320/UMR8120 Génétique Végétale, F- 91190 Gif-sur-Yvette, FranceUniversité Paris-Sud, Centre National de la Recherche Scientifique, UMR 8626 Laboratoire de Physique Theorique et Modeles Statistiques, F-91405, Orsay, France
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: olivier.martin@u-psud.fr
Frédéric Hospital
INRA, UMR1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Article
  • Figures & Data
  • Supplemental
  • Info & Metrics
Loading

Abstract

We consider recombinant inbred lines obtained by crossing two given homozygous parents and then applying multiple generations of self-crossings or full-sib matings. The chromosomal content of any such line forms a mosaic of blocks, each alternatively inherited identically by descent from one of the parents. Quantifying the statistical properties of such mosaic genomes has remained an open challenge for many years. Here, we solve this problem by taking a continuous chromosome picture and assuming crossovers to be noninterfering. Using a continuous-time random walk framework and Markov chain theory, we determine the statistical properties of these identical-by-descent blocks. We find that successive block lengths are only very slightly correlated. Furthermore, the blocks on the ends of chromosomes are larger on average than the others, a feature understandable from the nonexponential distribution of block lengths.

WITH the advent of dense genomic maps, in particular based on single-nucleotide polymorphism (SNP) data, the study of haplotypes has become central for modern analyses in population genetics (Buckler and Gore 2007; Carlton 2007; Frazer et al. 2007; Mott 2007; Jakobsson et al. 2008; Bryc et al. 2010). Here, the term haplotype refers to the series of alleles that an individual carries on a chromosome pair at a collection of (possibly many) loci and contrasts with single-locus genotypes that were the objects of many past studies. Haplotypic information can be used for association studies (Gold et al. 2008), for diversity studies (Lindblad-Toh et al. 2005), or for recognizing signals of positive selection using various measures of haplotype homozygosity (Sabeti et al. 2002; Zhang et al. 2006; Lencz et al. 2007; Tang et al. 2007; Curtis et al. 2008). Many approaches capitalize on the apparent “block” structure of haplotypes (Stumpf 2002; Cardon and Abecasis 2003; Wall and Pritchard 2003; Altshuler et al. 2005; Zheng and McPeek 2007). Various causes can be called upon to explain the apparent structuration of genomes in haplotype blocks (Tishkoff and Verrelli 2003; Zondervan and Cardon 2004; Pe’er et al. 2006), among which are recombination hotspots (Goldstein 2001; Jeffreys et al. 2001) and population structure (Pritchard et al. 2000; Grote 2007; Slate and Pemberton 2007). However, the situation is often complicated (Shifman et al. 2003; Yalcin et al. 2004; Cuppen 2005; Kauppi et al. 2005; Greenawalt et al. 2006; Moore et al. 2008). In particular, the theoretical properties of many of the objects mentioned above, e.g., haplotype block lengths, remain largely unknown. Often, the distribution of blocks is declared “nonrandom” (Curtis et al. 2008) although the null hypothesis is not clearly specified.

The task of determining statistical properties of chromosomal block structures has arisen in many different contexts. These can be classified into two types according to the kind of populations considered and lead to different mathematical techniques. In the first class, one asks how the genome of one or more parents in a population gets broken up into blocks at successive generations and how different descendents may share identical-by-descent (IBD) blocks. The framework most generally taken allows for random mating between individuals, a stochastic number of offspring for each individual, and possibly population growth; because of this stochasticity, the mathematical theory of branching processes plays a key role. The second class on the contrary assumes complete knowledge of all genealogies and so is relevant only for controlled crosses. But because the corresponding framework is thus constrained, Markov chains can be used to follow the statistics of IBD blocks and even how the genomes of all founding parents get shared among descendants.

The mathematical treatments in the first class typically build on Fisher’s “theory of junctions” (Fisher 1949). Fisher defined a junction in a chromosome as a boundary point between segments descended by different routes from the founders. Once formed by crossovers, junctions can be inherited, just like point mutations. Fisher (1949, 1954, 1959) and Bennett (1953) investigated the expected number of chromosomal regions separated by junctions for different systems of inbreeding (repeated selfing, repeated sib mating, repeated parent–offspring mating, etc.). Stam (1980) extended Fisher’s theory of junctions to a random mating population of constant size and any number of generations. Furthermore, he was able to derive the “probability distribution of the heterogenic part of the genome” (and not just the expected number of fragments) by assuming that the fragments were exponentially distributed—a critical hypothesis that was justified by numerical simulations. Chapman and Thompson (2003) extended Stam’s work to the case of a subdivided population. They were also able to relax the hypothesis of an exponential distribution and showed that the IBD tracts of chromosomes followed a distribution not quite exponential, having in fact a fat tail. They also determined how these properties were affected by the population size. Analogous work by Baird et al. (2003) focused on the case of a formally infinite population; this simplifies the problem because related individuals never mate with one another. Furthermore, they worked in the approximation of allowing only 0 or 1 crossover at each meiosis, a case sometimes referred to as complete interference; then each individual can carry at most just one block from the reference founding parent. Within this framework, they were able to treat the problem exactly, deriving in particular the distribution of the number of descendants containing blocks and the first two moments of these block sizes.

The mathematical treatment of the second class was initiated by Donelly (1983). Numerous studies since have derived exact mathematical results on different kinds of pedigree systems (Slatkin 1972; Franklin 1977; Donelly 1983; Guo 1994; Bickeboller and Thompson 1996a,b; Stefanov 2000; Browning and Browning 2002; Cannings 2003; Dimitropoulou and Cannings 2003; Ball and Stefanov 2005; Walters and Cannings 2005; Rodolphe et al. 2008). Such studies map the IBD problem to that of a random walk on a pedigree-dependent graph. It is that Markovian framework which we use here in the context of recombinant inbred lines, a particular kind of pedigree that has the additional complication of allowing for an infinite number of generations. We provide a description that is mathematically rigorous but also of practical use.

Recombinant inbred lines (RILs) can be derived by either self-fertilization (plants) or brother–sister matings (animals). RILs have become a tool of choice for animal and plant studies [genetic maps, QTL detection, and association studies (Churchill et al. 2004; Churchill 2007; Crow 2007; Keurentjes et al. 2007; Yu et al. 2008)]. Moreover, such lines are fixed and provide ever-lasting replicable reference homozygous genomes; these are very useful to dissect complex traits and estimate epistatic effects or genotype × environment interactions (Bergland et al. 2008; Maccaferri et al. 2008; Alcazar et al. 2009). To produce a RIL, one typically starts with F1 hybrids derived from the cross of two homozygous parents, say PA and Pa. Offspring are generated from these F1 and the process is repeated for many generations; this can be done by selfing [single-seed descent (SSD)] or by full-sib mating (hereafter referred to as “SIB”). At each generation mean heterozygosity decreases and in fact the process tends toward homozygosity at all loci. Due to the formation of crossovers during meiosis, the genomes at each generation are mixtures of the two parental ones, in which closer loci have a higher probability of descending from the same parent PA or Pa. The fixed genomes then form successions of blocks, each block being IBD to one of the two parents. In effect, we have a mosaic genome for the RIL, patching together pieces from each parent PA or Pa. What is the mean length of blocks? We shall see that it is 0.5 M in SSD and 0.25 M in SIB if the chromosome genetic length is large. But one may also ask what is the block length distribution, what is the mean number of blocks on a finite chromosome, or even what are the analogous statistics before all loci are fixed. As genome coverage becomes dense or as one approaches a nucleotide-level description of genomes [be it for association studies or genomic selection (Meuwissen et al. 2001)], one is inevitably driven toward a continuous chromosome picture, requiring block-like descriptions. Here, we address the need to work at this level where blocks are the elements of interest. In particular, we show how to calculate block statistics in RILs, using a mix of combinatorial analysis and probability theory.

Model and Methods

Junctions

For all our work, we deal with diploid organisms and are concerned with the construction of RILs in both SSD and SIB. Each chromosome pair is subject to independent dynamics, so without loss of generality we can focus on the case of a single chromosome pair. Furthermore, the objects of study are the IBD blocks, hereafter simply referred to as blocks. The F1 is considered to be the zeroth generation (g = 0). To go from one generation to the next, we produce one offspring in the case of SSD and a brother–sister pair in the case of SIB. An offspring individual is the union of two gametes, each of which is produced by a parent through meiosis, during which there can be crossovers. (In the case of SSD, there is just one parent.) Once a locus is fixed, it stays so forever. In SSD this simply requires the two homologous chromosomes to have the same allele at that locus; in SIB, it requires all four chromosomes to have the same allele.

We measure continuous positions on the chromosome in morgans, with the leftmost end of the chromosome corresponding to the origin of our axis; i.e., x = 0. Following the work of Fisher (1949, 1954, 1959), Bennett (1953), and Donelly (1983), a crossover is referred to as a “junction” and is identified with an arbitrarily precise point on the chromosome. We assume that crossovers arise without interference; then the production or not of a junction in the interval [x, x + dx] is independent of occurrences of junctions in any other interval. Here x denotes the genetic position, dx is infinitesimal, and junctions arise with density 1 along the chromosome. Figure 1 illustrates the use of these junctions when following successive generations under SSD. At one generation, consider the pair (H, H′) of homologous chromosomes. A meiosis takes place and results in an offspring chromosome (gamete) that is a mosaic of chromosome segments coming from either H or H′. A junction separates two adjacent segments. We use a binary label (0 or 1) to specify the origin (H or H′) of each segment. For any position x, having the list of its labelings for all generations g allows us to determine the IBD content at x as shown on the right-hand side of Figure 1. Note that the numbering of the junctions is done from left to right, not as a function of its occurrence in generations. The successive steps of the procedure are shown in Supporting Information, Figure S1: first one lays out the junctions and their numbering, then one introduces the binary labels across each junction, and finally one reconstructs the haplotypes (see File S1). As illustrated in Figure S2, the case of SIB mating is analogous, and again at each generation a chromosome consists of a mosaic of the founding parents’ chromosomes (see File S1).

Figure 1
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 1

Labeling in a SSD RIL. At each generation the homologs are called H and H′. To keep track of the IBD property, for each point on the continuous chromosome under consideration we specify the origin (H or H′ in the parent) using a 0–1 label, covering zones separated by junctions. The genotype at any generation g can be reconstructed from these binary numbers as shown on the right. Note that a junction need not separate two blocks.

So far, the introduction of junctions can be formulated for any pedigree system. Many previous studies have done this and mapped the IBD problem to that of a continuous-time random walk on a pedigree-dependent graph (Slatkin 1972; Franklin 1977; Donelly 1983; Guo 1994; Bickeboller and Thompson 1996a,b; Stefanov 2000; Browning and Browning 2002; Cannings 2003; Dimitropoulou and Cannings 2003; Ball and Stefanov 2005; Walters and Cannings 2005; Rodolphe et al. 2008). Here we consider RILs and then the Markovian framework’s pedigree-dependent graph is a hypercube. The sequence of binary labels at any given locus x specifies a unique vertex of the hypercube, and how this vertex changes as one moves along the chromosome determines the block structure. Note that some of this mathematical framework is close in spirit to that used for studying the coalescent in the presence of recombination; there the central object is the so-called ancestral recombination graph, and the problem (Wiuf and Hein 1999; McVean and Cardin 2005) is to describe how this graph changes with position along a continuous chromosome. This is a very difficult problem and so the authors of those studies derived relatively few exact results.

The application of this Markovian framework for SSD and SIB RILs requires considering all possible continuous-time walks on a high-dimensional hypercube. We do this in two steps. First, we enumerate by computer all possible discrete time random walks on that hypercube. Then we tackle the continuous waiting times of the original walks by analytical techniques. Finally, the numerical treatment of these analytical expressions is performed using Mathematica (Wolfram 1991). The C and Mathematica codes for these different tasks are provided in File S2.

The continuous-time Markov process

When creating the generations 1 to g, 2g gametes are produced in the SSD mating scheme, and 4g gametes are produced in the SIB mating scheme. Denote this number by Nc as it is also the number of (new) chromosomes produced in the RIL construction (remember we follow only one chromosome pair). Rather than follow each gamete from one generation to the next, it is (more) useful to consider all Nc gametes simultaneously. This can be visualized by stacking the pairs of chromosomes for all generations on top of each other and then scanning the chromosome stack from left to right to see where the junctions appear in order of increasing x.

Of great importance is the fact that the junctions on these Nc gametes are independent in all respects: having a junction on one gamete in [x, x + dx] does not affect the probability of having another junction elsewhere, be it on the same gamete or on any other gamete. Because of this independence, one can think of the production of junctions among the whole set of gametes as being a “continuous-time” Markov process (Feller 1950), where x plays the role of time. For the interval [x, x + dx], a junction arises with probability Ncdx, and then if such an event is realized the junction is assigned randomly to one of the Nc gametes (each with probability 1/Nc). The operation is then repeated for the interval [x + dx, x + 2dx], and so forth. We thus have a Markov process where interevent intervals are independent and distributed asEmbedded Image(1)while junction assignments to chromosomes are done equiprobably.

Discrete and continuous-time random walks on the hypercube

We initialize the binary labels at x = 0 randomly and uniformly because segregation is unbiased. The continuous-time Markov process extends these labels from x = 0 toward increasing x. At any given x, and using a {0, 1} notation for each binary label, we call ℳ the map from the Nc dimensional hypercube ℋ Embedded Image> to the genotypes at generation g; this map can be thought of as a coloring of the vertices of the Nc-dimensional hypercube. There are as many colors as there are one-locus genotypes at generation g: 4 for SSD and 16 for SIB. Then the block pattern at generation g can be “read off” by examining the succession of vertex colors visited by the Markov walk on the hypercube. Note that having a junction appear at x corresponds to hopping to a random neighboring vertex on ℋ at that time, while the residence time on each vertex is exponentially distributed (cf. Equation 1).

For the block statistics, we want to find the probability that the walk on ℋ leads to a given pattern of successive colors. For this, we sum over all possible walks compatible with the desired pattern. The crucial point is that the continuous variables of the interjunction values affect the lengths of the blocks, but not the pattern of successive blocks. This allows us to decompose the problem of block statistics into two parts. The first comes from the discrete set of possibilities for the sequence of vertices visited on ℋ (the “topology” of the junctions); we use the master equation of the discrete-time random walk on the hypercube to track these sequences. The second is associated with the continuous nature of the junction–junction intervals, which involves summing over known probability distributions derived from Equation 1.

Extracting block length distributions

Consider the simplest observable: the length of the first block along the chromosome. If the block is heterozygous, then its length distribution in SSD is that of the distance of the first junction and is given by Equation 1. Indeed, the locus x = 0 must be heterozygous at generation g, but in SSD, at the very first hop of our random walk on the hypercube, the heterozygous block will end, starting a fixed block. The situation is more instructive if the first block is homozygous (fixed). To calculate the length distribution of that first block, we first consider all possibilities for the different walks from the starting vertex (defined from the situation at x = 0). A walk will maintain the homozygous structure at generation g for perhaps a few hops and then one hop will change that. If k is the first hop that ends the block, we can collect together all the discrete time walks that have the same k. We thus define P(1)(k) as the probability to perform k – 1 hops while staying at generation g in the same fixed state as that of x = 0 and to then terminate the block at the kth hop. P(1)(k) is the sum of the probabilities of all discrete time walks on ℋ that are compatible with staying in the first fixed state during exactly k – 1 hops. Because the number of such walks grows exponentially with k, it is best to determine this quantity by recursions rather than by enumerations. This is precisely what is done when using the associated master equation. Each iteration of that equation updates a vector on the hypercube and generates the successive P(1)(j). To obtain P(1)(k) one has to perform k iterations of the master equation. File S1 specifies this master equation, the initialization of the vector iterated, and the relation between the iterated vector and P(1)(j); the C programs for implementing these iterations are also provided (see File S2).

Given the P(1)(k) probabilities, we can reintroduce the continuous times spent on each vertex of the hypercube to get the distribution of the length of the first bloc. Indeed, for SSD as well as for SIB, for all walks that contribute to this situation, x will go from 0 to xk with xk distributed as a rescaled Gamma distribution,Embedded Image(2)as this the distribution of the sum of k independent exponentially distributed variables. The distribution of ℓ1, the length of the first block (assuming it is fixed) is then given byEmbedded Image(3)This result holds for an infinite chromosome. For a finite chromosome of length L, we note that if ℓ1 > L, we have “stepped off” the finite chromosome. Thus, if the value of ℓ1 (distributed as in Equation 3) is greater than L, we see that on the finite chromosome the block is actually only L long. Thus to adapt Equation 3 to a finite chromosome, we simply keep the distribution as is when ℓ1 < L while for all those values ℓ1 ≥ L we set ℓ1 = L. Mathematically, this generates a delta function at that point of weight given by the probability that ℓ1 ≥ L. This derivation corresponds to a simple truncation of a distribution, and it can be extended to other observables. These include the length of the nth block, which requires calculating the probabilities P(n)(k) of stepping off the nth block after k steps, and the joint distribution of lengths for different blocks. Details on such derivations are given in File S1. The Mathematica codes for performing sums like those in Equation 3 are also provided in File S2.

Results

Infinite and semi-infinite chromosomes

Mean block lengths:

As the number g of generations increases, alleles become fixed and since nothing changes thereafter the statistics of the blocks must have a limit at large g. In that situation, there is only alternation of IBD blocks homozygous of type PA or Pa. We focus on the statistics of such blocks, either at arbitrary g or in the g → ∞ limit, approximated by taking g large enough. However, our computational framework applies to arbitrary blocks, homozygous or heterozygous. From the practical point of view, we are limited computationally by the part of the algorithm that follows occupation probabilities on the hypercube ℋ: executing this master equation on the computer uses Embedded Image operations where NJ is the maximum number of hops of the walks; this restricts our study to 14 generations in SSD and 7 generations in SIB.

A simple statistic of blocks is their mean length 〈ℓ〉. This quantity is related to the density η of block extremities: on a very large chromosome of size L, the number of blocks n will satisfy n/L ≈ η while 〈ℓ〉 ≈ L/n. For SSD RILs the density η is known to be 2 while it is 4 in SIB. Such a result follows by considering a small interval [x1, x2] and asking that the interval be recombinant. In SSD this occurs with probability R = 2r/(1 + 2r), where r is the recombination rate per meiosis between x1 and x2 (Haldane and Waddington 1931). Taking x2 – x1 to be infinitesimal, we get r ≈ (x2 – x1) (Haldane 1919) and so R ≈ 2(x2 – x1); noting that recombination then implies the presence of a block extremity in this interval, we see that the density of block extremities is 2. Setting η = 2, one obtains directly 〈ℓ〉 = Embedded Image, valid at large g and for large chromosomes. An identical reasoning gives 〈ℓ〉 = Embedded Image for SIB since in this case R = 4r/(1 + 6r).

Distribution of block lengths:

Getting the distribution of a block length cannot be achieved by such shortcuts. Instead, we generalize Equation 3, again at very large g and for a very long (semi-infinite) chromosome. Starting from x = 0, the successive block lengths are ℓ1, ℓ2, … ; as the block number n increases, the distributions of ℓn tend toward a limiting distribution μ*(ℓ) that has no memory of the state at the chromosome’s origin. The computation of the distribution at any given n, in direct analogy with what was done for ℓ1, requires calculating the probabilities P(n)(k) of staying on the nth block during k – 1 hops and stepping off at the kth one. Again, we use the master equation to compute these quantities iteratively; cf. File S1. Then the distribution of ℓn is obtained by replacing the P(1)(k) in Equation 3 by P(n)(k). These successive distributions are displayed for SSD in Figure 2. Note that the convergence in n is very rapid; only the first block is visibly different from the others. In File S1, we provide a parameter-free approximation to μ*(ℓ) that works quite well as shown in Figure S3. In the case of SIB RILs, the convergence with n is much slower as shown in Figure S4.

Figure 2
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 2

Block length distribution. Displayed is the probability density of homozygous block length for block number 1, 2, . . . , 10 in SSD. The chromosome is semi-infinite, and the number of generations is large to be in fixation; except for the first block, the curves seem to superpose but in fact are distinct.

A log–log plot of these distributions shows that they are not exponential, though in the tail they all are well approximated by an exponential; such a form in the tail is a consequence of the spectral decomposition of the Markov process (see File S1). Note that these distributions for ℓ1, ℓ2, … must all decay at the same asymptotic rate. Another important point is that the distribution of ℓ2 is slightly different from that of ℓ3, proving that the successive lengths are not independent: the block lengths are not generated by a stationary renewal process.

Although we were not able to derive the analytic form of μ*, we nevertheless haveEmbedded Image(4)This can be proved by relating μ*(ℓ → 0) to double-recombinant frequencies as follows. We take two successive intervals I1,2 and I2,3, each of length dx (infinitesimal), and ask what the probability is that the intervals are both recombinant. There is a probability 2dx in SSD and 4dx in SIB that the first interval is recombinant (recall that the densities of block extremities are respectively 2 and 4). Then given this first extremity, the probability that there is another extremity in the second interval is μ*(ℓ → 0)dx, leading to a total probability of 2dxμ*(ℓ → 0)dx in SSD and 4dxμ*(ℓ → 0)dx in SIB. However, this double-recombinant probability can also be computed (Martin and Hospital 2006) in terms of the three recombination frequencies R1,2, R2,3, and R1,3 associated with the locus pairs (1, 2), (2, 3), and (1, 3). In the limit of small dx, it is (2dx)23/2 for SSD and (4dx)27/4 for SIB. Identifying the different expressions, we get the claimed results.

Analogous studies can be performed at given values of g, including or not heterozygous blocks. Recalling that fixation arises rather rapidly in SSD, it is no surprise that the statistics of homozygous blocks converge quickly to a large g limit. For example, if one computes in SSD the length distribution of the first block when it is homozygous, one finds that it does not vary much for g ≥ 3 as illustrated in Figure S5. For completeness, we show the analogous result in SIB in Figure S6.

The first block is longer than the following ones:

The system does not follow a stationary renewal process. Nevertheless, it turns out that the large difference we see between the first and the remaining blocks is not so much due to the memory from one block to the next but to the nonexponential distribution of block lengths. Even in the presence of memory from block to block, one has the following general relation on a semi-infinite chromosome at large g,Embedded Image(5)where ℓ1 denotes the length of the first block and ℓ∞ denotes that of faraway blocks. The proof boils down to considering the blocks on the infinite line and taking the origin of the (semi-infinite) chromosome at random. It falls inside a block of length ℓ∞ with probability density proportional to ℓ∞ itself. Denoting as before by μ(1)(ℓ1) the probability density of the length of the first block, we haveEmbedded Image(6)(The reader can check that this is a normalized probability density.) Using this density, the computation of the first moment of ℓ1 leads directly to Equation 5. To interpret this result, note that Equation 5 implies that the relative difference (〈ℓ1〉 – 〈ℓ∞〉)/〈ℓ∞〉 is equal to the [relative variance of μ*(ℓ∞) Embedded Image. When μ*(ℓ∞) is a pure exponential, this quantity vanishes and then 〈ℓ1〉 = 〈ℓ∞〉. Thus we have 〈ℓ1〉 > 〈ℓ∞〉 if and only if the relative variance of μ*(ℓ∞) > 1, which is what we find to happen in this system. For instance in the SSD case, we have 〈ℓ1〉 = 0.595, to be compared with 〈ℓ∞〉 = Embedded Image. The first block is thus on average substantially larger than the others.

From Figure 2 one can see that μ(1) is more spread out than μ*; μ* gives μ(1) from Equation 6 so that μ(1)(0) = 2 in SSD and 4 in SIB by direct computation using 〈ℓ∞〉. Note that μ(1)(0)dx can also be interpreted as the probability of having the first block end between x = 0 and x = dx; since that is the same as the density of junctions times dx, i.e., 2 or 4 × dx, we indeed recover the result μ(1)(0) = 2 for SSD and μ(1)(0) = 4 for SIB.

The lengths of successive blocks are slightly correlated:

Even though junctions are independent, each junction affects the IBD property in its neighborhood. Two positions will have nearly independent IBD only when they are distant along the chromosome because only in that case will there be many crossover events separating them. It thus seems natural to expect that the successive block lengths will not be independent in contrast to the underlying Δx separating junctions. In fact this must be the case given that we found earlier that on a semi-infinite chromosome the distribution of lengths is different for the second and the third block.

Our framework allows one to compute joint distributions and thus the linear correlation coefficientsEmbedded Image(7)where Embedded Image (resp. Embedded Image) is the standard deviation of ℓn (resp. ℓn+1). The joint distribution of ℓn and ℓn+1 is given byEmbedded Imagewhere P(n,n+1)(kn, kn+1) is the probability that the nth block ends after kn hops and the n + 1th after kn+1. From this distribution, the mean of the product ℓnℓn+1 is the sum of the probabilities P(n,n+1)(kn, kn+1) times the average of ℓn times the average of ℓn+1 (factorization), each of which is obtained from Equation 2. The linear correlation coefficient then reduces toEmbedded Image(8)where Embedded Image (resp. Embedded Image) is the standard deviation of kn (resp. kn+1). These quantities are directly obtainable from the probability P(n,n+1)(kn, kn+1), as long as the number of generations is not too large. We have computed these quantities in SSD for a semi-infinite chromosome, for different g’s and choices of block numbers. For instance, if we consider the first and second blocks, assumed to be fixed, the value of C(ℓ1, ℓ2) is –0.0197 at g = 2, –0.0125 at g = 3, –0.00841 at g = 4, … , with a trend that is compatible with a vanishing limit at large g. We find the same trend for the following blocks too. Furthermore, at given g, C(ℓn, ℓn+1) rapidly converges to a limiting value as n increases. This is illustrated in Table S1. The computer programs for obtaining these correlation coefficients are also provided (see File S2).

Case of finite chromosomes

Length distributions:

Clearly on a finite chromosome of length L the distributions of block lengths are modified. The computations are more complicated, but remain feasible. As an illustration, consider the case where the chromosome has just two blocks. We use the P(1,2)(k1, k2) probabilities, and for each (k1, k2) we impose the filter that ℓ1 < L while ℓ1 + ℓ2 > L. Then we have for the probability densities μ(1)(ℓ1) = μ(1)(ℓ2) and thus the distribution must be symmetric about L/2, and we also have 〈ℓ1〉 = 〈ℓ2〉 = L/2. The explicit expression for this density isEmbedded Image(9)From this it turns out that μ(1)(ℓ1) has a minimum at ℓ1 = L/2: it is more likely to have one rather short and one rather long block than to have two blocks of approximately the same size.

A comparison with experimental RIL data:

It is appropriate to compare our theoretical computations with block statistics measured in experimental RILs. Since the block sizes are random variables, it is best to work with RIL data sets where (i) the block structure has been determined precisely, requiring high-density genotyping, and (ii) there are many lines, an easier task for SSD than for SIB crosses. Such a data set has been produced within the species Arabidopsis thaliana by Singer et al. (2006). These authors genotyped several hundred thousand loci via hybridization arrays, from which they determined block extremities in 100 SSD recombinant inbred lines derived from the crossing of Columbia and Landsberg homozygotes. Singer et al. provide the genetic map of their cross and the physical positions of the block extremities. From these data we determined the block lengths for each RIL and each of the five chromosomes. We display in Figure S7 the distributions of the first block length, for all five chromosomes. The solid line is the theoretical curve, corresponding to the infinite chromosome case but truncated to the genetic length of each chromosome. As was explained previously, if the (infinite chromosome) random variable ℓ1 is larger than the length L of the chromosome, one sees in practice a block of length ℓ1 = L; this happens with a finite probability that is represented in Figure S7, using a solid dot. The histograms are for the experimental data, and we have included the 95% confidence intervals for each bin. We see that the theoretical predictions agree well with the experimental values except for the last bin of chromosome 1 and chromosome 4. Interestingly, for both of these the segregation data exhibit significant distortion; such distortion, typically caused by loci under selection pressures, can affect recombination rates. It is thus satisfying that the agreement between theory and experiment is as good as it is in spite of this distortion.

Number of blocks:

Clearly the typical number of blocks will grow with the total length L of the chromosome. Furthermore, for large g, the density of block extremities is 2 in SSD (4 in SIB); thus at large L the mean number of blocks should grow as 2L in SSD (as 4L in SIB). It is also of interest to determine the distribution of the number of blocks.

Consider first the probability that the whole chromosome at generation g is in one single homozygous block. Starting at the left end of the chromosome, we must be in a fixed state: we choose it to be, for instance, the PA genotype. This constraint is used to set the occupation probabilities of the random walks on ℋ before the first hop. Explicitly, at x = 0 we introduce Vi(0) = 0 on vertex i if its color is incompatible with the PA genotype; otherwise Vi(0) is a site-independent constant such that the total probabilities sum to 1. At each junction (hop), the master equation is used to update the vector of probabilities on the hypercube, and so for K hops we have the vector Embedded Image. We iterate for up to a given total of NJ junctions. In practice we cannot take NJ = ∞ because of the numerical nature of the algorithm; so instead we take NJ sufficiently large so that only negligible probabilities are dropped in the truncation. As a ballpark estimate, NJ must be large enough to have NJ/Nc ≫ L, and then each gamete can have many junctions per morgan. During the application of the master equation to the vector, hops terminating the block are stored and we keep in a file the probabilities Π(1)(K) that the first block ends after K hops, 1 ≤ K ≤ NJ. Then the probability p1 that there will be just one block in 0 ≤ x ≤ L is given byEmbedded Image(10)times the probability of fixing at x = 0 (in SSD this is 1 – 1/2g; in SIB it is given by a recurrence relation). Note that these integrals correspond to incomplete Gamma functions, allowing for a relatively efficient computation (see File S2). As mentioned before, in practice the sum over K is truncated to K ≤ NJ and one must check that the error induced by this truncation is small enough. We find that the probability to have a single block decreases exponentially when L grows. For instance in SSD, at large g, the probability is 0.417 for L = 0.5 M, 0.189 for L = 1.0 M, and 0.088 for L = 1.5 M.

The previous approach can be extended to compute the probability that the chromosome consists of n blocks as follows. For simplicity, consider directly the large g limit so we can ignore hops on ℋ leading to heterozygous genotypes at g. We use the master equation framework to keep track of the joint probabilities of being on a vertex of ℋ and of being in the mth block, for the number of hops K going from 0 to NJ. At each hop, there are transitions from vertex to vertex and potentially from block to block. If one hops to a vertex incompatible with the desired block pattern, the probability is set to 0. We keep track of the probabilities Π(n)(K) that the nth block terminates at the Kth hop. The probability of having at most n blocks when 0 ≤ x ≤ L is then given by the same formula as for one block (Equation 10) but replacing Π(1)(K) by Π(n)(K). (Note that K is the sum of the number of hops for blocks 1, 2, … , n.) Repeating the computation for n – 1, we obtain the probability of having at most n – 1 blocks in the region 0 ≤ x ≤ L, and taking the difference of the two results we obtain the probability of having exactly n blocks (see File S2). As an illustrative example, Figure 3 gives the probability distribution of n at large g in the case of SSD for a chromosome of length L = 1.5 M.

Figure 3
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 3

Distribution of block number in SSD RIL. The histogram shows the frequencies of having 1, 2, . . . blocks in SSD at large number of generations; here L = 1.5 M.

Discussion

In the dense marker or continuous chromosome picture, genotypes appearing in a RIL form IBD blocks. The block structure is nontrivial in part because the consequence of a crossover at generation g depends on crossovers arising at previous generations. To study the statistics of such blocks, we used a labeling procedure that allows for a mapping onto a Markov process. Such a process reduces our problem of blocks to linear operations (associated with a master equation that we implemented in a C code), followed by relatively standard analysis involving sums and integrals (that we treated via Mathematica). The sources for these computations are provided with this work (see File S2).

We illustrated a number of properties of block statistics, highlighting in particular several closed-form results. Although the joint statistics of blocks are quite complex, we found that block-to-block correlations were very weak and thus genotype frequencies are well approximated by a stationary renewal process. In addition, the distributions of block lengths are not too far from exponential. Thus approximating the RIL case using exponential distributions will lead to qualitatively correct results with an accuracy of ∼20%. This level of accuracy should hold for other systems of crosses such as randomly mating populations, justifying the use of the exponential approximation in several previous studies (Stam 1980; Chapman and Thompson 2003).

We mainly stressed cases with complete fixation because the construction of RILs aims to have homozygous genotypes, but our formalism is applicable to both homozygous and heterozygous blocks. Note that within SSD RILs, as shown in Results, heterozygous blocks have an exponential distribution for their lengths; furthermore, heterozygous blocks interrupt the memory of the process in SSD; that is, two blocks separated by a heterozygous segment are independent. The reason is that there is only one way to be heterozygous in SSD (up to irrelevant exchanges of gametes at the same generation).

Clearly it is possible to extend our formalism to cases involving more than two parents; this may be of use when dealing with multiparental RILs that are being developed currently to have greater power in association studies (Churchill et al. 2004). We hope our results will stimulate work in this direction.

Acknowledgments

We thank F. Rodolphe for discussing his results with us. This work has been supported by grants from the Agence Nationale de la Recherche: ANR-09-GENM-022-003 SingleMeiosis (to O.C.M.) and ANR-O6-BLANC-0128 (to F.H.).

Footnotes

  • Communicating editor: I. Hoeschele

  • Received December 24, 2010.
  • Accepted July 12, 2011.
  • Copyright © 2011 by the Genetics Society of America

Literature Cited

  1. ↵
    1. Alcazar R.,
    2. Garcia A. V.,
    3. Parker J. E.,
    4. Reymond M.
    , 2009 Incremental steps toward incompatibility revealed by Arabidopsis epistatic interactions modulating salicylic acid pathway activation. Proc. Natl. Acad. Sci. USA 106: 334–339.
    OpenUrlAbstract/FREE Full Text
  2. ↵
    1. Altshuler D.,
    2. Brooks L. D.,
    3. Chakravarti A.,
    4. Collins F. S.,
    5. Daly M. J.,
    6. et al.
    , 2005 A haplotype map of the human genome. Nature 437: 1299–1320.
    OpenUrlCrossRefPubMedWeb of Science
  3. ↵
    1. Baird S.,
    2. Barton N.,
    3. Etheridge A.
    , 2003 The distribution of surviving blocks of an ancestral genome. Theor. Popul. Biol. 64: 451–471.
    OpenUrlCrossRefPubMed
  4. ↵
    1. Ball F.,
    2. Stefanov V.
    , 2005 Evaluation of identity-by-descent probabilities for half-sibs on continuous genome. Math. Biosci. 196: 215–225.
    OpenUrlCrossRefPubMed
  5. ↵
    1. Bennett J.
    , 1953 Junctions in inbreeding. Genetica 26: 392–406.
    OpenUrlCrossRefPubMed
  6. ↵
    1. Bergland A. O.,
    2. Genissel A.,
    3. Nuzhdin S. V.,
    4. Tatar M.
    , 2008 Quantitative trait loci affecting phenotypic plasticity and the allometric relationship of ovariole number and thorax length in Drosophila melanogaster. Genetics 180: 567–582.
    OpenUrlAbstract/FREE Full Text
  7. ↵
    1. Bickeboller H.,
    2. Thompson E.
    , 1996a Distribution of genome shared IBD by half-sibs: approximation by the Poisson clumping heuristic. Theor. Popul. Biol. 50: 66–90.
    OpenUrlCrossRefPubMedWeb of Science
  8. ↵
    1. Bickeboller H.,
    2. Thompson E.
    , 1996b The probability distribution of the amount of an individual’s genome surviving to the following generation. Genetics 143: 1043–1049.
    OpenUrlAbstract/FREE Full Text
  9. ↵
    1. Browning S.,
    2. Browning B.
    , 2002 On reducing the statespace of hidden Markov models for the identity by descent process. Theor. Popul. Biol. 62: 1–8.
    OpenUrlCrossRefPubMedWeb of Science
  10. ↵
    1. Bryc K.,
    2. Auton A.,
    3. Nelson M. R.,
    4. Oksenberg J. R.,
    5. Hauser S. L.,
    6. et al.
    , 2010 Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc. Natl. Acad. Sci. USA 107: 786–791.
    OpenUrlAbstract/FREE Full Text
  11. ↵
    1. Buckler E.,
    2. Gore M.
    , 2007 An Arabidopsis haplotype map takes root. Nat. Genet. 39: 1056–1057.
    OpenUrlCrossRefPubMed
  12. ↵
    1. Cannings C.
    , 2003 The identity by descent process along the chromosome. Hum. Hered. 56: 126–130.
    OpenUrlCrossRefPubMed
  13. ↵
    1. Cardon L. R.,
    2. Abecasis G. R.
    , 2003 Using haplotype blocks to map human complex trait loci. Trends Genet. 19: 135–140.
    OpenUrlCrossRefPubMedWeb of Science
  14. ↵
    1. Carlton J. M.
    , 2007 Toward a malaria haplotype map. Nat. Genet. 39: 5–6.
    OpenUrlCrossRefPubMed
  15. ↵
    1. Chapman N. H.,
    2. Thompson E. A.
    , 2003 A model for the length of tracts of identity by descent in finite random mating populations. Theor. Popul. Biol. 64: 141–150.
    OpenUrlCrossRefPubMed
  16. ↵
    1. Churchill G. A.
    , 2007 Recombinant inbred strain panels: a tool for systems genetics. Physiol. Genomics 31: 174–175.
    OpenUrlFREE Full Text
  17. ↵
    1. Churchill G.,
    2. Airey D. C.,
    3. Allayee H.,
    4. Angel J. M.,
    5. Attie A. D.,
    6. et al.
    , 2004 The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. 36: 1133–1137.
    OpenUrlCrossRefPubMedWeb of Science
  18. ↵
    1. Crow J. F.
    , 2007 Haldane, Bailey, Taylor and recombinant-inbred lines. Genetics 176: 729–732.
    OpenUrlFREE Full Text
  19. ↵
    1. Cuppen E.
    , 2005 Haplotype-based genetics in mice and rats. Trends Genet. 21: 318–322.
    OpenUrlCrossRefPubMedWeb of Science
  20. ↵
    1. Curtis D.,
    2. Vine A. E.,
    3. Knight J.
    , 2008 Study of regions of extended homozygosity provides a powerful method to explore haplotype structure of human populations. Ann. Hum. Genet. 72: 261–278.
    OpenUrlCrossRefPubMedWeb of Science
  21. ↵
    1. Dimitropoulou P.,
    2. Cannings C.
    , 2003 Recsim and indstats: probabilities of identity in general genealogies. Bioinformatics 19: 790–791.
    OpenUrlCrossRefPubMed
  22. ↵
    1. Donelly K. P.
    , 1983 The probability that related individuals share some section of genome identical by descent. Theor. Popul. Biol. 23: 34–63.
    OpenUrlCrossRefPubMedWeb of Science
  23. ↵
    1. Feller W.
    , 1950 An Introduction to Probability Theory and Its Applications. John Wiley & Sons, New York.
  24. ↵
    1. Fisher R.
    , 1949 The Theory of Inbreeding. Oliver & Boyd, Edinburgh.
  25. ↵
    1. Fisher R.
    , 1954 A fuller theory of “junctions” in inbreeding. Heredity 8: 187–197.
    OpenUrlCrossRefWeb of Science
  26. ↵
    1. Fisher R.
    , 1959 An algebraically exact examination of junction formation and transmission in parent-offspring inbreeding. Heredity 13: 523–542.
    OpenUrl
  27. ↵
    1. Franklin I.
    , 1977 The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11: 60–80.
    OpenUrlCrossRefPubMed
  28. ↵
    1. Frazer K. A.,
    2. Ballinger D. G.,
    3. Cox D. R.,
    4. Hinds D. A.,
    5. Stuve L. L.,
    6. et al.
    , 2007 A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
    OpenUrlCrossRefPubMedWeb of Science
  29. ↵
    1. Gold B.,
    2. Kirchhoff T.,
    3. Stefanov S.,
    4. Lautenberger J.,
    5. Viale A.,
    6. et al.
    , 2008 Genome-wide association study provides evidence for a breast cancer risk locus at 6q22–33. Proc. Natl. Acad. Sci. USA 105: 4340–4345.
    OpenUrlAbstract/FREE Full Text
  30. ↵
    1. Goldstein D. B.
    , 2001 Islands of linkage disequilibrium. Nat. Genet. 29: 109–111.
    OpenUrlCrossRefPubMedWeb of Science
  31. ↵
    1. Greenawalt D. M.,
    2. Cui X. F.,
    3. Wu Y. J.,
    4. Lin Y.,
    5. Wang H. Y.,
    6. et al.
    , 2006 Strong correlation between meiotic crossovers and haplotype structure in a 2.5-mb region on the long arm of chromosome 21. Genome Res. 16: 208–214.
    OpenUrlAbstract/FREE Full Text
  32. ↵
    1. Grote M. N.
    , 2007 A covariance structure model for the admixture of binary genetic variation. Genetics 176: 2405–2420.
    OpenUrlAbstract/FREE Full Text
  33. ↵
    1. Guo S.
    , 1994 Computation of identity by descent proportions shared by two siblings. Am. J. Hum. Genet. 54: 1104–1109.
    OpenUrlPubMed
  34. ↵
    1. Haldane J. B. S.
    , 1919 The combination of linkage values, and the calculation of distances between the loci of linked factors. J. Genet. 8: 299–309.
    OpenUrlWeb of Science
  35. ↵
    1. Haldane J. B. S.,
    2. Waddington C. H.
    , 1931 Inbreeding and linkage. Genetics 16: 357–374.
    OpenUrlFREE Full Text
  36. ↵
    1. Jakobsson M.,
    2. Scholz S. W.,
    3. Scheet P.,
    4. Gibbs J. R.,
    5. VanLiere J. M.,
    6. et al.
    , 2008 Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451: 998–1003.
    OpenUrlCrossRefPubMedWeb of Science
  37. ↵
    1. Jeffreys A. J.,
    2. Kauppi L.,
    3. Neumann R.
    , 2001 Intensely punctate meiotic recombination in the class ii region of the major histocompatibility complex. Nat. Genet. 29: 217–222.
    OpenUrlCrossRefPubMedWeb of Science
  38. ↵
    1. Kauppi L.,
    2. Stumpf M. P. H.,
    3. Jeffreys A. J.
    , 2005 Localized breakdown in linkage disequilibrium does not always predict sperm crossover hot spots in the human MHC class II region. Genomics 86: 13–24.
    OpenUrlCrossRefPubMedWeb of Science
  39. ↵
    1. Keurentjes J. J. B.,
    2. Fu J. Y.,
    3. Terpstra I. R.,
    4. Garcia J. M.,
    5. van den Ackerveken G.,
    6. et al.
    , 2007 Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc. Natl. Acad. Sci. USA 104: 1708–1713.
    OpenUrlAbstract/FREE Full Text
  40. ↵
    1. Lencz T.,
    2. Lambert C.,
    3. DeRosse P.,
    4. Burdick K. E.,
    5. Morgan T. V.,
    6. et al.
    , 2007 Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 104: 19942–19947.
    OpenUrlAbstract/FREE Full Text
  41. ↵
    1. Lindblad-Toh K.,
    2. Wade C. M.,
    3. Mikkelsen T. S.,
    4. Karlsson E. K.,
    5. Jaffe D. B.,
    6. et al.
    , 2005 Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438: 803–819.
    OpenUrlCrossRefPubMedWeb of Science
  42. ↵
    1. Maccaferri M.,
    2. Sanguineti M. C.,
    3. Corneti S.,
    4. Ortega J. L. A.,
    5. Ben Salem M.,
    6. et al.
    , 2008 Quantitative trait loci for grain yield and adaptation of durum wheat (Triticum durum desf.) across a wide range of water availability. Genetics 178: 489–511.
    OpenUrlAbstract/FREE Full Text
  43. ↵
    1. Martin O. C.,
    2. Hospital F.
    , 2006 Two- and three-locus tests for linkage analysis using recombinant inbred lines. Genetics 173: 451–459.
    OpenUrlAbstract/FREE Full Text
  44. ↵
    1. McVean G. A. T.,
    2. Cardin N. J.
    , 2005 Approximating the coalescent with recombination. Philos. Trans. R. Soc. B Biol. Sci. 360: 1387–1393.
    OpenUrlAbstract/FREE Full Text
  45. ↵
    1. Meuwissen T. H. E.,
    2. Hayes B. J.,
    3. Goddard M. E.
    , 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
    OpenUrlAbstract/FREE Full Text
  46. ↵
    1. Moore R. C.,
    2. Henry M.,
    3. Stevens H.
    , 2008 Local patterns of nucleotide polymorphism are highly variable in the selfing species Arabidopsis thaliana. J. Mol. Evol. 66: 116–129.
    OpenUrlCrossRefPubMedWeb of Science
  47. ↵
    1. Mott R.
    , 2007 A haplotype map for the laboratory mouse. Nat. Genet. 39: 1054–1056.
    OpenUrlCrossRefPubMed
  48. ↵
    1. Pe’er I.,
    2. Chretien Y. R.,
    3. de Bakker P. I. W.,
    4. Barrett J. C.,
    5. Daly M. J.,
    6. et al.
    , 2006 Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am. J. Hum. Genet. 78: 588–603.
    OpenUrlCrossRefPubMedWeb of Science
  49. ↵
    1. Pritchard J. K.,
    2. Stephens M.,
    3. Donnelly P.
    , 2000 Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
    OpenUrlAbstract/FREE Full Text
  50. ↵
    1. Rodolphe F.,
    2. Martin J.,
    3. Della-Chiesa E.
    , 2008 Theoretical description of chromosome architecture after multiple back-crossing. Theor. Popul. Biol. 73: 289–299.
    OpenUrlCrossRefPubMed
  51. ↵
    1. Sabeti P. C.,
    2. Reich D. E.,
    3. Higgins J. M.,
    4. Levine H. Z. P.,
    5. Richter D. J.,
    6. et al.
    , 2002 Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.
    OpenUrlCrossRefPubMedWeb of Science
  52. ↵
    1. Shifman S.,
    2. Kuypers J.,
    3. Kokoris M.,
    4. Yakir B.,
    5. Darvasi A.
    , 2003 Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet. 12: 771–776.
    OpenUrlCrossRefPubMedWeb of Science
  53. ↵
    1. Singer T.,
    2. Fan Y. P.,
    3. Chang H. S.,
    4. Zhu T.,
    5. Hazen S. P.,
    6. et al.
    , 2006 A high-resolution map of Arabidopsis recombinant inbred lines by whole-genome exon array hybridization. PLoS Genet. 2: e144.
    OpenUrlCrossRefPubMed
  54. ↵
    1. Slate J.,
    2. Pemberton J. M.
    , 2007 Admixture and patterns of linkage disequilibrium in a free-living vertebrate population. J. Evol. Biol. 20: 1415–1427.
    OpenUrlCrossRefPubMedWeb of Science
  55. ↵
    1. Slatkin M.
    , 1972 On treating the chromosome as the unit of selection. Genetics 72: 157–168.
    OpenUrlAbstract/FREE Full Text
  56. ↵
    1. Stam P.
    , 1980 The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35: 131–155.
    OpenUrlCrossRef
  57. ↵
    1. Stefanov V. T.
    , 2000 Distribution of genome shared identical by descent by two individuals in grandparent-type relationship. Genetics 156: 1403–1410.
    OpenUrlAbstract/FREE Full Text
  58. ↵
    1. Stumpf M. P. H.
    , 2002 Haplotype diversity and the block structure of linkage disequilibrium. Trends Genet. 18: 226–228.
    OpenUrlCrossRefPubMedWeb of Science
  59. ↵
    1. Tang K.,
    2. Thornton K. R.,
    3. Stoneking M.
    , 2007 A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5: e071.
    OpenUrlCrossRef
  60. ↵
    1. Tishkoff S. A.,
    2. Verrelli B. C.
    , 2003 Role of evolutionary history on haplotype block structure in the human genome: implications for disease mapping. Curr. Opin. Genet. Dev. 13: 569–575.
    OpenUrlCrossRefPubMedWeb of Science
  61. ↵
    1. Wall J. D.,
    2. Pritchard J. K.
    , 2003 Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4: 587–597.
    OpenUrlCrossRefPubMedWeb of Science
  62. ↵
    1. Walters K.,
    2. Cannings C.
    , 2005 The probability density of the total IBD length over a single autosome in unilineal relationships. Theor. Popul. Biol. 68: 55–63.
    OpenUrlCrossRefPubMed
  63. ↵
    1. Wiuf C.,
    2. Hein J.
    , 1999 Recombination as a point process along sequences. Theor. Popul. Biol. 55: 248–259.
    OpenUrlCrossRefPubMedWeb of Science
  64. ↵
    1. Wolfram S.
    , 1991 Mathematica: A System for Doing Mathematics by Computer, Ed. 2. Addison Wesley Longman Publishing, Redwood City, CA.
  65. ↵
    1. Yalcin B.,
    2. Fullerton J.,
    3. Miller S.,
    4. Keays D. A.,
    5. Brady S.,
    6. et al.
    , 2004 Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc. Natl. Acad. Sci. USA 101: 9734–9739.
    OpenUrlAbstract/FREE Full Text
  66. ↵
    1. Yu J.,
    2. Holland J. B.,
    3. McMullen M. D.,
    4. Buckler E. S.
    , 2008 Genetic design and statistical power of nested association mapping in maize. Genetics 178: 539–551.
    OpenUrlAbstract/FREE Full Text
  67. ↵
    1. Zhang C.,
    2. Bailey D. K.,
    3. Awad T.,
    4. Liu G. Y.,
    5. Xing G. L.,
    6. et al.
    , 2006 A whole genome long-range haplotype (wglrh) test for detecting imprints of positive selection in human populations. Bioinformatics 22: 2122–2128.
    OpenUrlCrossRefPubMedWeb of Science
  68. ↵
    1. Zheng M. X.,
    2. McPeek M. S.
    , 2007 Multipoint linkage-disequilibrium mapping with haplotype-block structure. Am. J. Hum. Genet. 80: 112–125.
    OpenUrlCrossRefPubMed
  69. ↵
    1. Zondervan K. T.,
    2. Cardon L. R.
    , 2004 The complex interplay among factors that influence allelic association. Nat. Rev. Genet. 5: 89–100.
    OpenUrlPubMedWeb of Science
View Abstract
Previous ArticleNext Article
Back to top

PUBLICATION INFORMATION

Volume 189 Issue 2, October 2011

Genetics: 189 (2)

ARTICLE CLASSIFICATION

INVESTIGATIONS
Genetics of complex traits
View this article with LENS
Email

Thank you for sharing this Genetics article.

NOTE: We request your email address only to inform the recipient that it was you who recommended this article, and that it is not junk mail. We do not retain these email addresses.

Enter multiple addresses on separate lines or separate them with commas.
Distribution of Parental Genome Blocks in Recombinant Inbred Lines
(Your Name) has forwarded a page to you from Genetics
(Your Name) thought you would be interested in this article in Genetics.
Print
Alerts
Enter your email below to set up alert notifications for new article, or to manage your existing alerts.
SIGN UP OR SIGN IN WITH YOUR EMAIL
View PDF
Share

Distribution of Parental Genome Blocks in Recombinant Inbred Lines

Olivier C. Martin and Frédéric Hospital
Genetics October 1, 2011 vol. 189 no. 2 645-654; https://doi.org/10.1534/genetics.111.129700
Olivier C. Martin
INRA, Université Paris-Sud, Centre National de la Recherche Scientifique, AgroParisTech, UMR0320/UMR8120 Génétique Végétale, F- 91190 Gif-sur-Yvette, FranceUniversité Paris-Sud, Centre National de la Recherche Scientifique, UMR 8626 Laboratoire de Physique Theorique et Modeles Statistiques, F-91405, Orsay, France
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: olivier.martin@u-psud.fr
Frédéric Hospital
INRA, UMR1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
Citation

Distribution of Parental Genome Blocks in Recombinant Inbred Lines

Olivier C. Martin and Frédéric Hospital
Genetics October 1, 2011 vol. 189 no. 2 645-654; https://doi.org/10.1534/genetics.111.129700
Olivier C. Martin
INRA, Université Paris-Sud, Centre National de la Recherche Scientifique, AgroParisTech, UMR0320/UMR8120 Génétique Végétale, F- 91190 Gif-sur-Yvette, FranceUniversité Paris-Sud, Centre National de la Recherche Scientifique, UMR 8626 Laboratoire de Physique Theorique et Modeles Statistiques, F-91405, Orsay, France
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: olivier.martin@u-psud.fr
Frédéric Hospital
INRA, UMR1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero

Related Articles

Cited By

More in this TOC Section

Investigations

  • Comparative Genomics and Transcriptomics To Analyze Fruiting Body Development in Filamentous Ascomycetes
  • The Fate of Deleterious Variants in a Barley Genomic Prediction Population
  • The Role of Anti-Müllerian Hormone in Testis Differentiation Reveals the Significance of the TGF-β Pathway in Reptilian Sex Determination
Show more Investigations

Genetics of Complex Traits

  • The Fate of Deleterious Variants in a Barley Genomic Prediction Population
  • Novel Neuroprotective Loci Modulating Ischemic Stroke Volume in Wild-Derived Inbred Mouse Strains
  • Natural Variation and Genetic Determinants of Caenorhabditis elegans Sperm Size
Show more Genetics of Complex Traits
  • Top
  • Article
    • Abstract
    • Model and Methods
    • Results
    • Discussion
    • Acknowledgments
    • Footnotes
    • Literature Cited
  • Figures & Data
  • Supplemental
  • Info & Metrics

GSA

The Genetics Society of America (GSA), founded in 1931, is the professional membership organization for scientific researchers and educators in the field of genetics. Our members work to advance knowledge in the basic mechanisms of inheritance, from the molecular to the population level.

Online ISSN: 1943-2631

  • For Authors
  • For Reviewers
  • For Subscribers
  • Submit a Manuscript
  • Editorial Board
  • Press Releases

SPPA Logo

GET CONNECTED

RSS  Subscribe with RSS.

email  Subscribe via email. Sign up to receive alert notifications of new articles.

  • Facebook
  • Twitter
  • YouTube
  • LinkedIn
  • Google Plus

Copyright © 2019 by the Genetics Society of America

  • About GENETICS
  • Terms of use
  • Advertising
  • Permissions
  • Contact us
  • International access