We demonstrate that recent data from human males are consistent with constant interference levels among chromosomes under the two-pathway model, whereas inappropriately fitting shape parameters of Gamma distributions to immunofluorescent interfoci distances observed on finite chromosomes generates false interpretations of higher levels of interference on shorter chromosomes. We provide appropriate statistical methodology.
Working with mice, de Boer et al. (2006) fit the shape parameter of the Gamma distribution to inter-MLH1-foci distances on the synaptonemal complex as a new method for measuring crossover interference. The method exploited demonstrations that the Gamma distribution models intercrossover distances well when measured in terms of genetic length [on infinitely long chromosomes; Broman and Weber (2000)] and that the synaptonemal complex length may simply introduce a change in scale (Lynn et al. 2002). The methodology has since been employed to estimate interference again in mice (Barchi et al. 2008), as well as in dogs (Basheva et al. 2008), cats (Borodin et al. 2007), shrews (Borodin et al. 2008), minks (Borodin et al. 2009), tomatoes (Lhuissier et al. 2007), and humans (Lian et al. 2008).
Lian et al. (2008) used the estimates of the shape parameter in human males to conclude that interference increases as chromosomal length decreases. This seemingly contradicts earlier work of Kaback et al. (1999), who observed in budding yeast that the recombination rate per megabase increases but that interference decreases as chromosomal length decreases. In this note, we argue that the results of both Kaback et al. (1999) and Lian et al. (2008) are consistent with constant interference levels when analyzed under the two-pathway hypothesis for crossing over (Stahl et al. 2004; Getz et al. 2008).
According to this hypothesis, there are two recombinational pathways: crossovers in the pairing pathway promote pairing of the chromosomes and have no interference whereas crossovers in the disjunction pathway manifest interference. If, as in Housworth and Stahl (2003), we assume that each chromosome must have the same average number of double-strand breaks in the pairing pathway to achieve synapsis and the same proportion of these will have crossover resolutions, then the mathematical model for estimating genetic distance X in centimorgans from the physical distance L in number of megabase pairs would be X = aL + b. Here, a is the rate, per 100 meioses, of disjunction crossovers per megabase pair and b is the fixed average number of pairing pathway crossovers, per 100 meioses, per chromosome. Under this model, shorter chromosomes will have higher total recombination per megabase than long chromosomes and will seem to have lower levels of interference, explaining both of the results of Kaback et al. (1999). The regression analysis for human males based on the high-resolution Rutgers map (Matise et al. 2007) is given in Figure 1.
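As an illustration of the linear model X = aL + b, the following Python sketch fits the line by ordinary least squares and shows why a fixed per-chromosome term b raises the recombination rate per megabase on shorter chromosomes. The chromosome lengths and the values of a and b are hypothetical, chosen only for illustration; they are not taken from the Rutgers map or from Figure 1, and the function name fit_line is our own.

```python
def fit_line(lengths_mb, map_cm):
    """Ordinary least squares for X = a*L + b; returns (a, b)."""
    n = len(lengths_mb)
    mean_l = sum(lengths_mb) / n
    mean_x = sum(map_cm) / n
    sxx = sum((l - mean_l) ** 2 for l in lengths_mb)
    sxy = sum((l - mean_l) * (x - mean_x) for l, x in zip(lengths_mb, map_cm))
    a = sxy / sxx
    return a, mean_x - a * mean_l

# Hypothetical values, NOT estimates from any real map:
# a in cM per Mb (disjunction pathway), b in cM per chromosome (pairing pathway).
true_a, true_b = 0.5, 40.0
lengths_mb = [50.0, 100.0, 150.0, 250.0]
map_cm = [true_a * l + true_b for l in lengths_mb]

a, b = fit_line(lengths_mb, map_cm)
rate_per_mb = [x / l for x, l in zip(map_cm, lengths_mb)]
print("slope a =", a, "intercept b =", b)
print("total cM per Mb, shortest to longest:", rate_per_mb)
```

Because b is divided by a smaller L on short chromosomes, the total rate X/L = a + b/L decreases with length even though the disjunction-pathway rate a is the same for every chromosome.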
When analyzed with the two-pathway model, the results of Lian et al. (2008) are also explained by constant interference levels among chromosomes. Under this model, it is only the crossovers in the disjunction pathway that show up as MLH1 foci in the immunofluorescence images. Further, the interfoci distances given in the histograms of Lian et al. (2008) are conditional on the chromosomes receiving at least two crossovers, which, among other things, truncates the distance to be no more than the length of the chromosome whereas the Gamma distribution models intercrossover distances on infinitely long chromosomes. Indeed, the results of our simulation of 100 meioses from 10 men with constant interference given in Figure 2 match well with the corresponding histograms in Figure 2 of Lian et al. (2008). We conclude that the large shape estimates reported by Lian et al. (2008) for short chromosomes are simply due to the bias induced by inappropriately fitting a Gamma distribution to the data.
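The truncation effect can be illustrated with a small simulation. The Python sketch below is not the simulation behind Figure 2. It assumes a stationary renewal process with Gamma-distributed spacings whose rate is twice the shape, a hypothetical interference level ν = 3, and two hypothetical genetic lengths (0.6 and 3.0 morgans in the disjunction pathway). As a simple stand-in for fitting a Gamma distribution, it estimates the shape by the method of moments (squared mean over variance) from the interfoci distances of meioses with at least two foci.

```python
import random

def simulate_interfoci(nu, genetic_length, n_meioses, rng):
    """Interfoci distances (morgans) from meioses with at least two foci."""
    scale = 1.0 / (2.0 * nu)  # rate = 2*nu, so the mean spacing is 0.5 morgans
    distances = []
    for _ in range(n_meioses):
        # Stationary start: a uniform fraction of a length-biased spacing,
        # which for a Gamma(nu, scale) spacing is Gamma(nu + 1, scale).
        pos = rng.random() * rng.gammavariate(nu + 1.0, scale)
        foci = []
        while pos < genetic_length:
            foci.append(pos)
            pos += rng.gammavariate(nu, scale)
        distances.extend(right - left for left, right in zip(foci, foci[1:]))
    return distances

def moment_shape(distances):
    """Method-of-moments Gamma shape: mean squared over variance."""
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    return mean * mean / var

rng = random.Random(2009)
true_nu = 3.0  # hypothetical constant interference level
shape_short = moment_shape(simulate_interfoci(true_nu, 0.6, 20000, rng))
shape_long = moment_shape(simulate_interfoci(true_nu, 3.0, 20000, rng))
print("naive shape, short chromosome:", shape_short)
print("naive shape, long chromosome: ", shape_long)
```

Although both chromosomes are simulated with the same ν, the naive estimate for the short chromosome is expected to be clearly larger, because distances longer than the chromosome can never be observed and the observed distances therefore appear more regular than they are.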
We note further that the natural measure of interference involves genetic distances. In that framework, the shape and scale parameters of the Gamma distribution do not both vary freely: the rate (reciprocal of the scale) is twice the shape, ν, so that the mean intercrossover distance is fixed at 0.5 morgans. For data normalized to be a percentage of the entire length, the restriction is that the scale is μ = 100/(2νx), where x is the genetic length (in morgans) of the chromosome in the disjunction pathway.
METHOD OF ANALYSIS FOR FOCI DATA
de Boer et al. (2006) recognized the finite chromosome issue and employed an ad hoc method to address it, which was not utilized by the subsequent authors. The proper method of analysis of foci data would take into account all information, including the distances from the ends to the nearest crossovers. Assume that the data are relative to the total length. Then a small set of results for four meioses involving a given chromosome is given in Table 1.
The total likelihood of the data set is the product of the probabilities of all of these events, including the distances to the ends and the probability of receiving no crossover when none occurred. Following Broman and Weber (2000) without thinning the results from the four-strand bundle, the probability density of observing a given intercrossover distance, y, is Gamma distributed with shape ν and scale μ and is given by the formula
$$f(y \mid \nu, \mu) = \frac{y^{\nu - 1} e^{-y/\mu}}{\Gamma(\nu)\,\mu^{\nu}}.$$
The probability density of the length to one of the ends, y, is the probability of the censored observation, which is
$$1 - F(y \mid \nu, \mu),$$
where F(y | ν, μ) is the cumulative distribution function of f(y | ν, μ).
The probability density of the length to the other end, y, is the density required for stationarity (so that it does not matter which end is considered the censored one) and is
$$g(y \mid \nu, \mu) = \frac{1 - F(y \mid \nu, \mu)}{\nu \mu},$$
where F(y | ν, μ) is again the cumulative distribution function of f(y | ν, μ) and νμ is the mean of that distribution.
The probability of having no crossovers on the entire length is the probability of not getting a first crossover, which, for data expressed as a percentage of the entire length, is
$$1 - G(100 \mid \nu, \mu),$$
where G(y | ν, μ) is the cumulative distribution function of g(y | ν, μ). Chromosomes with no crossovers would contribute this probability to the product.
If the scale is restricted to μ = 100/(2νx) with the genetic length x known, then the likelihood is a function of only one variable, ν, and can be optimized easily. Code in R that takes as input a data set such as the one in Table 1 and returns the best estimate for ν along with an estimate of the standard error in ν is provided as supporting information, File S1, and also at http://mypage.iu.edu/~ehouswor/Software/InterMLH1fociCode.html.
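The one-parameter likelihood can be sketched in a few lines. The following Python code is not the R code of File S1; it is a minimal illustration under stated assumptions. Each meiosis is assumed to be given as a list of foci positions in percent of the chromosome length (an empty list for no crossovers), which may differ from the layout of Table 1. The cumulative distribution function G is evaluated with the identity G(y | ν, μ) = F(y | ν + 1, μ) + y[1 − F(y | ν, μ)]/(νμ), which follows from integrating g by parts. The data, the genetic length x = 1 morgan, the search interval for ν, and all function names are hypothetical.

```python
import math

def gamma_cdf(y, shape, scale):
    """F(y | shape, scale) by the power series for the incomplete gamma function."""
    x = y / scale
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    a = shape
    for _ in range(5000):
        a += 1.0
        term *= x / a
        total += term
        if term < 1e-16 * total:
            break
    return min(1.0, total * math.exp(shape * math.log(x) - x - math.lgamma(shape)))

def log_surv(p):
    """log(1 - p), floored so that extreme parameter values do not raise an error."""
    return math.log(max(1.0 - p, 1e-300))

def log_likelihood(nu, meioses, genetic_length):
    mu = 100.0 / (2.0 * nu * genetic_length)  # restricted scale, percent units
    ll = 0.0
    for foci in meioses:
        if not foci:  # no crossover: log(1 - G(100))
            g100 = (gamma_cdf(100.0, nu + 1.0, mu)
                    + 100.0 * (1.0 - gamma_cdf(100.0, nu, mu)) / (nu * mu))
            ll += log_surv(g100)
            continue
        foci = sorted(foci)
        # Distance from one end to the first focus: stationary density g.
        ll += log_surv(gamma_cdf(foci[0], nu, mu)) - math.log(nu * mu)
        # Interfoci distances: Gamma density f.
        for left, right in zip(foci, foci[1:]):
            d = right - left
            ll += (nu - 1.0) * math.log(d) - d / mu - math.lgamma(nu) - nu * math.log(mu)
        # Distance from the last focus to the other end: censored, 1 - F.
        ll += log_surv(gamma_cdf(100.0 - foci[-1], nu, mu))
    return ll

def fit_nu(meioses, genetic_length, lo=0.2, hi=30.0, tol=1e-5):
    """Golden-section maximization over nu, with a curvature-based standard error."""
    f = lambda nu: log_likelihood(nu, meioses, genetic_length)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    nu_hat = 0.5 * (a + b)
    h = 0.01 * nu_hat
    curvature = (f(nu_hat + h) - 2.0 * f(nu_hat) + f(nu_hat - h)) / (h * h)
    se = math.sqrt(-1.0 / curvature) if curvature < 0.0 else float("nan")
    return nu_hat, se

# Hypothetical foci positions (percent of length) for four meioses.
meioses = [[22.0, 71.0], [48.0], [35.0, 62.0], [10.0, 40.0, 88.0]]
nu_hat, se = fit_nu(meioses, genetic_length=1.0)
print("estimated nu:", nu_hat, "approximate standard error:", se)
```

The golden-section search assumes that the log-likelihood has a single maximum within the search interval, and the standard error is the usual approximation from the numerically estimated second derivative at the maximum. With only four meioses the standard error will be large; the example is meant to show the structure of the calculation, not to produce a meaningful estimate.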
Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.109.103853/DC1.
Communicating editor: I. Hoeschele
- Received April 14, 2009.
- Accepted June 28, 2009.
- Copyright © 2009 by the Genetics Society of America