Abstract
The crossover distribution in meiotic tetrads of Arabidopsis thaliana differs from those previously described for Drosophila and Neurospora. Whereas a chi-square distribution with an even number of degrees of freedom provides a good fit for the latter organisms, the fit for Arabidopsis was substantially improved by assuming an additional set of crossovers sprinkled, at random, among those distributed as per chi square. This result is compatible with the view that Arabidopsis has two pathways for meiotic crossing over, only one of which is subject to interference. The results further suggest that Arabidopsis meiosis has >10 times as many double-strand breaks as crossovers.
Cobbs (1978) and Stam (1979) proposed tidy mathematical models for crossover (chiasma) interference in meiosis. Their equivalent models envisioned Poisson-distributed crossover “attempts” among acts of meiosis for a specified bivalent. Successful attempts (resulting in crossing over) were separated by a fixed number of “failures,” which gave no crossing over. The models are sometimes referred to as “chi-square” models because the resulting probability distribution for interexchange distances is a chi-square distribution with an even number of degrees of freedom (Lange et al. 1997).
Foss et al. (1993), inspired by Mortimer and Fogel (1974), offered a “counting model” for crossover interference, assuming no chromatid interference, which (unwittingly) expanded on that of Cobbs and Stam by detailing the nature of the “failures” that are “counted” between crossovers. The failures were assumed to be double-strand breaks that were repaired without crossing over (“noncrossovers”; Szostak et al. 1983). Since double-strand-break repair can result in gene conversion whether or not it is accompanied by crossing over, it was reasonable to estimate the number of obligate failures between successes from the fraction of gene conversions that are unaccompanied by crossing over of markers flanking the conversion. For Drosophila, the only estimate of this fraction set the counting number at four (Hilliker and Chovnick 1981; Hilliker et al. 1991), while data for Neurospora gave a value of two (Perkins et al. 1993). A mark of the model's success was that the values four and two for those two organisms, respectively, generated optimal expressions for multiple crossover data analyzed in a variety of ways (Foss et al. 1993; Lande and Stahl 1993; McPeek and Speed 1995; Zhao et al. 1995). The model received further encouragement from evidence in the literature, albeit weak, of negative interference between crossovers and noncrossovers (see Foss et al. 1993).
A prediction of the counting model, related to the negative interference between crossovers and noncrossovers, is that the interval between a pair of close exchanges should be especially enriched for noncrossovers, with some of them manifested as conversions when markers are present to detect them. Experiments in budding yeast by Foss and Stahl (1995) failed to support that prediction. Since that time, two developments in yeast genetics imply that the test was doomed: (1) In addition to crossovers that are subject to interference, yeast may have additional crossovers, not subject to interference, that derive from recombinational events required for chromosome pairing. As proposed by Zalevsky et al. (1999), these crossovers are the ones remaining in msh4 and zip1 mutants, which have reduced levels of crossing over and of interference. The existence of such “contaminating” crossovers would confound the quantitative predictions of the counting model. (2) More importantly, the genetic markers used to detect and enumerate the expected noncrossovers may have altered the events such that many were not detected as conversions (Borts and Haber 1989; Borts et al. 1990; Chen and Jinks-Robertson 1999). For instance, the chromosomes may have been driven by the presence of the markers to repair some of their breaks using sister chromatids as template. Alternatively, the markers may have prevented breaks.
Why do some organisms (e.g., Drosophila and Neurospora) appear to have crossing over that is subject to simple rules of interference, while interference in yeast appears to be complicated by noninterfering exchanges involved in pairing? A provisional answer to this query (Zalevsky et al. 1999) lies in the different ways that yeast, on the one hand, and flies, on the other, secure chromosome pairing during prophase I of meiosis. The chromosomes of Drosophila (Hawley 1980; reviewed in McKee et al. 2000) and of Caenorhabditis elegans (Villeneuve 1994; Albertson et al. 1997), which also has robust interference, have cis-acting “pairing centers,” sequences that ensure homolog pairing. Yeast (for reviews, see Kleckner 1996; Roeder 1997), on the other hand, like mouse (Romanienko and Camerini-Otero 2000) and Coprinus (Celerin et al. 2000), relies on recombinational interactions to secure stable pairing. In yeast, these interactions require the meiosis-specific strand-invasion protein Dmc1p (Bishop et al. 1992), which appears to be lacking in both Drosophila and C. elegans, organisms that apparently employ only the generalized strand-invasion protein Rad51p (Shinohara et al. 1992) to effect meiotic double-strand-break repair. This correlation suggests the “rule” that organisms possessing Dmc1p use a noninterference crossover pathway to ensure chromosome pairing, while organisms that lack Dmc1p manifest robust interference of an uncomplicated sort because their chromosomes are paired by a nonrecombinational route.
Arabidopsis requires the early recombination function Spo11 to achieve synapsis (Grelon et al. 2001), putting it in the camp with yeast, mouse, and Coprinus. That observation plus the demonstrated presence of a DMC1 homolog in Arabidopsis (Klimyuk and Jones 1997; Doutriaux et al. 1998; Couteau et al. 1999) requires this green plant, if it is to follow the rule, to have both an interference and a noninterference pathway for meiotic crossing over. This possibility can be tested by scoring the segregation of abundant PCR-based molecular markers in the meiotic tetrads produced in quartet mutants of Arabidopsis (Preuss et al. 1994; Copenhaver et al. 1998).
Our analysis is based on the simple assumption that the disposition of exchange points in the interference pathway is governed by the counting model and that additional exchanges, arising in the pairing pathway, are (pre)sprinkled randomly (i.e., without interference) on this background. The adequacy of our model is supported by control analyses of Neurospora and Drosophila data.
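The two-pathway assumption can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function name `simulate_crossovers` and its arguments are hypothetical, positions are measured in morgans along the tetrad, and the interference pathway is started in a uniformly random counting state (an assumption corresponding to a stationary start).

```python
import random

def simulate_crossovers(length_morgans, m, p, rng):
    """Return sorted crossover positions (in morgans) for one simulated tetrad.

    Hypothetical helper: m is the number of obligate failures between
    interference-pathway crossovers; p is the expected proportion of
    crossovers that arise without interference.
    """
    positions = []
    # Pathway without interference: crossovers sprinkled as a Poisson
    # process with rate 2p per morgan.
    if p > 0:
        x = rng.expovariate(2 * p)
        while x < length_morgans:
            positions.append(x)
            x += rng.expovariate(2 * p)
    # Interference pathway: Poisson events at rate 2(1 - p)(m + 1) per
    # morgan; every (m + 1)th event is a crossover, the rest are failures.
    if p < 1:
        rate = 2 * (1 - p) * (m + 1)
        failures = rng.randrange(m + 1)  # assumed stationary starting state
        x = rng.expovariate(rate)
        while x < length_morgans:
            if failures == m:
                positions.append(x)
                failures = 0
            else:
                failures += 1
            x += rng.expovariate(rate)
    return sorted(positions)
```

Under these assumptions the expected number of crossovers per tetrad is twice the map length, whatever the values of m and p; only the spacing of the crossovers changes.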
RESULTS
The markers (chromosome 1, nga59, nga63, g2395, m235, SO392, 7G6, T27K12, nga280, ETR, TAG, AthATPASE, nga692; chromosome 2, nga1145, mi310, THY1B, nga1126, nga361, nga168; chromosome 3, nga32, nga162, Arlim, GAPA, GL1, NIT1, AFC1, nga112; chromosome 4, GA1, DET1, COP9B, AG, nga1139, nga1107; and chromosome 5, CTR, ca72, nga139, SO262, SO191, DFR, ASB2, LFY3) and their map locations are described in Copenhaver et al. (1998). More complete information is available at the Arabidopsis Information Resource (TAIR) web site (http://www.arabidopsis.org).
Markers were scored and recorded independently by two people and then cross-checked. The complete absence of “gene conversions” in these tetrad data sets further testifies to the reliability of the scoring. The data for each of the five Arabidopsis chromosomes (Tables 1–5) were analyzed separately assuming no chromatid interference. For (n + 1) markers, our data consist of tetrad patterns (t1, t2, …, tn), where ti = 0 denotes parental ditype, ti = 1 denotes tetratype, and ti = 2 denotes nonparental ditype with respect to the ith and (i + 1)st markers. We extend the model of Zhao et al. (1995) to include two types of crossover resolutions of the double-strand-break intermediates: type I without interference and type II with interference. Our model has two parameters: the interference parameter, m, which is the number of obligate “failures” between crossover “successes,” and the probability, p, which is the proportion of type I (without interference) crossovers out of all crossovers. For our analyses, the intermarker genetic distances were determined by the Perkins formula; that is, X = (TT/2 + 3NPD)/N, where TT is the number of tetratypes and NPD is the number of nonparental ditypes observed out of a sample of size N.
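As a worked illustration of the Perkins formula, the sketch below computes X from tetrad counts. The function name `perkins_distance` is hypothetical, and the counts in the usage note are invented for illustration; they are not taken from Tables 1–5.

```python
def perkins_distance(n_tetratype, n_npd, n_total):
    """Map distance X in morgans from the Perkins formula X = (TT/2 + 3*NPD)/N."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_tetratype / 2 + 3 * n_npd) / n_total
```

For example, an interval with a hypothetical 20 tetratypes and 1 nonparental ditype among 57 tetrads gives `perkins_distance(20, 1, 57)`, which is 13/57 morgans.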
We determine the maximum-likelihood estimators for m and p from the log-likelihood function: L(m, p) = ∑ log(Pr((t1, t2, …, tn)|m, p, y1, y2, …, yn)), where the sum is taken over all the tetrads in the data set under consideration. See the appendix for the calculation of Pr((t1, t2, …, tn)|m, p, y1, y2, …, yn).
We restrict the possibilities for the interference parameter, m, to be integers between 0 and 20 and we allow p, the probability that a randomly chosen crossover is of the noninterference type, to range between 0 and 1. For each fixed m, we determine the value of p, pm, which maximizes the log-likelihood function, using the golden section algorithm. We then find the pair, (m, pm), which maximizes the log-likelihood function over all the possibilities for m.
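The search just described can be sketched as follows. This is a minimal illustration under stated assumptions: `loglik(m, p)` is a hypothetical placeholder for the log-likelihood function (its calculation is not implemented here), and the function is assumed to be unimodal in p for each fixed m, as golden section search requires.

```python
import math

def golden_section_max(f, lo=0.0, hi=1.0, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def profile_maximum(loglik, m_values=range(0, 21)):
    """For each integer m, maximize over p, then keep the best (m, p_m, value)."""
    best = None
    for m in m_values:
        p_m = golden_section_max(lambda p: loglik(m, p))
        value = loglik(m, p_m)
        if best is None or value > best[2]:
            best = (m, p_m, value)
    return best
```

With a toy surface such as `lambda m, p: -(m - 7) ** 2 - (p - 0.2) ** 2`, `profile_maximum` returns m = 7 and a p close to 0.2.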
To determine whether the model with the additional parameter, p, provides a substantially better fit to the tetrad data from Arabidopsis than an interference-alone model (for which p = 0), we conducted a likelihood-ratio test. The test statistic is two times the difference between the maximum of the log-likelihood function under the extended model and the maximum of the log-likelihood function under the null or interference-only model. For large sample sizes, this test statistic will have approximately a chi-square distribution with degrees of freedom equal to the difference in the number of parameters involved in the extended and null models. In this case, there is one extra parameter, p, in the extended model. We verified that our data set consisting of 57 three- or four-viable-spore tetrads was large enough for the distribution of the test statistic to be well approximated by a chi-square distribution with 1 d.f. by simulating data under the null hypothesis, forming the test statistic, and checking that the chi-square cut-off for rejection at the 5% significance level, χ²(0.95) = 3.84, led to rejection of the null hypothesis no more than 5% of the time.
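The likelihood-ratio test with 1 d.f. can be sketched in a few lines. The function name `lrt_one_df` is hypothetical, and the log-likelihood values used in the usage note are invented. The sketch relies on the identity that, for a chi-square variable with 1 d.f., the upper-tail probability at x equals erfc(√(x/2)).

```python
import math

def lrt_one_df(loglik_extended, loglik_null):
    """Return (test statistic, P value) for a likelihood-ratio test with 1 d.f."""
    stat = 2.0 * (loglik_extended - loglik_null)
    # Upper tail of chi-square with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value
```

For instance, hypothetical maximized log-likelihoods of -100.0 (extended) and -101.92 (null) give a statistic of 3.84 and a P value close to 0.05, matching the cut-off quoted above.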
Table 1. Tetrad data for Arabidopsis chromosome 1
The results of our analysis of the five linkage groups in Arabidopsis are summarized in Table 6. The model with two crossover pathways (one with and one without interference) fits the data on the longer linkage groups, 1, 3, and 5, substantially better than does the model with only an interference pathway. There is no reason to believe that the true values of the interference parameter, m, differ for these linkage groups. The distribution of the estimator m for these data sets is dispersed and skewed to the right. Due to the skewness and to computational problems encountered in obtaining estimates of m > 20, we cannot report meaningful standard errors or confidence intervals for the parameter estimates. However, simulations indicate that if the true interference parameter were 10, obtaining estimates for m of 17 is likely. Similarly, if the true interference parameter were 17, obtaining estimates for m of 10 is likely. On the other hand, these simulations reveal that if the true value of m were 3, estimates of 10 and 17 are unlikely and if the true value of m were 5, estimates of 10 are possible but estimates of 17 are unlikely.
The estimate of the proportion of crossovers without interference, p, was bounded above by ~0.25 and generally was close to 0.20. While the estimate did occasionally fall below 0.10 when two crossover pathways were simulated, the case for p > 0 comes strongly from the fact that, when only the interference pathway was simulated, statistically significant estimates for p > 0 were rare (<5%).
Table 2. Tetrad data for Arabidopsis chromosome 2
Because our markers span the centromere on each chromosome, we considered the possibility that centromere disruption of interference might be the cause of our positive estimates for type I (without interference) crossovers. To rule out this possibility, we simulated data for chromosome 1 under an interference-only model (with m = 3 and with m = 10) but with complete disruption of interference by the centromere. The null hypothesis (that the interference-only model explains the data as well as the extended model) was not rejected more often than expected by chance (5%); centromere disruption did lead to a decreased estimate for the interference parameter, m, on average. Thus, we conclude that centromere disruption does not explain our significant test results.
Table 3. Tetrad data for Arabidopsis chromosome 3
Table 4. Tetrad data for Arabidopsis chromosome 4
To verify that our results were not the spurious consequence of having added a parameter (p) to the interference-alone model, we ran the test against the Drosophila data of Bridges and Curry (Morgan et al. 1935) and the Neurospora data of Perkins (1962). For Neurospora, the interference-alone null model provides a good statistical fit to the data (Zhao et al. 1995) as well as a good visual fit (Foss et al. 1993). For Drosophila, the statistical fit is somewhat lacking (Zhao et al. 1995) although the visual fit of the interference-alone model is quite good (Foss et al. 1993). Tests of our extended model against the Neurospora and Drosophila data sets confirm those observations. For the Neurospora data set, the most likely value for p in the extended model is 0 and the most likely value for the interference parameter, m, is 2, in keeping with previous analyses. [The finding of p = 0 predicts that recombination functions are not required for synapsis in Neurospora and that Neurospora may lack a DMC1 homolog. mei-3, the only recA homolog reported for Neurospora, belongs to the RAD51 subfamily (Heyer 1994; Hatakeyama et al. 1995).] For the Drosophila data set of Bridges and Curry (Morgan et al. 1935), the most likely value for p in the extended model is 0.01 and the most likely value for the interference parameter, m, is 4. The visual difference between the null model and the extended model with p = 0.01 is virtually undetectable. The positive value for the probability that a crossover is of type I (without interference), while mildly significant (test statistic of 4.74 and a P value of 0.0295), is of little practical significance.
Table 5. Tetrad data for Arabidopsis chromosome 5
Table 6. Estimates of m and p
DISCUSSION
Limitations of the conclusions: Although our analysis yields results that are compatible with two discrete classes of crossovers in Arabidopsis, those with and those without interference, by themselves they are not strong support for that view. For instance, some models in which interference is imposed by a “careless” counting mechanism acting upon a single class of crossovers may not be ruled out by the data (e.g., Lange et al. 1997). More direct evidence for two discrete pathways will likely require the isolation of mutants that specifically eliminate one or the other class, as appears to have been done for yeast.
Our analyses of chromosomes 1, 3, and 5 gave comparable estimates of m and p, with p = 0 ruled out. For the short chromosomes, 2 and 4, there were insufficient data to rule out a p value of 0. While the short chromosomes may not differ from the others with respect to p, it remains possible that crossing over on chromosomes 2 and 4 occurs only, or primarily, by the interference pathway. Briscoe and Tomkiel (2000) and McKee et al. (2000), for Drosophila, and Stitou et al. (2001), for rodents, have concluded that nucleolus organizers (NORs) can act as chromosome pairing centers. We noted in the Introduction the view that creatures whose chromosomes are well endowed with cis-acting pairing centers, like those of Drosophila and C. elegans, have no need for recombinational interactions to achieve synapsis. If further data demonstrate small p values for chromosomes 2 and 4, we would propose that the presence of NORs on those two chromosomes results in a reduced requirement for the noninterfering crossovers of the pairing pathway.
In Arabidopsis, Moran et al. (2001) noted that two different mutant strains defective in synapsis suffered reductions in chiasmata differentially on the long (1, 3, and 5) and short (2 and 4) chromosomes. It is notable that a chromosome-specific response to meiotic mutations has not been seen in Drosophila (S. Hawley, personal communication). This difference may reflect the postulated presence of two crossover pathways in Arabidopsis and one in Drosophila.
Are the large estimates for m realistic? Stack and Anderson (1986) reported a 15-fold excess of early recombination nodules over late nodules in tomatoes, suggestive of a large value for m. In Zea mays, Franklin et al. (1999) noted a 10- to 20-fold excess of Rad51p zygotene foci over the estimated total number of crossovers per nucleus. However, although Rad51p does promote repair of double-strand breaks by binding to ssDNA at the resected ends, these authors were disinclined to view the excess as signaling double-strand-break-induced noncrossovers: (1) “It would seem both unnecessary and catastrophic …”; and (2) “Copenhaver et al. (1998) did not observe any gene conversion events in Arabidopsis, despite scoring >1000 polymorphic loci.” However, our estimate of m ≈ 15 for Arabidopsis implies, within the framework of the counting model, that green plants may indeed have a large excess of conversions over crossovers. We therefore calculate how many conversions should have been seen in the tetrad data involving 52 markers reported by Copenhaver et al. (1998). To make the calculation, we make several assumptions:
(1) For each noninterference crossover there is one noncrossover in that pathway. This is equivalent to assuming that the canonical double Holliday junction intermediate is equally likely to be resolved to give a crossover or a noncrossover.
(2) For each crossover in the interference pathway, the total number of potential conversions (crossovers plus noncrossovers) = (m + 1). Combining assumptions (1) and (2), the ratio of total conversions to crossovers is 2p + (m + 1)(1 − p).
(3) We take the length of a conversion tract to be 1 kb, which is about what it is in better-characterized organisms.
(4) The probability that a gene conversion tract of 1 kb will coincide with a particular marker on a chromosome of length L is 1 kb/L. We calculated the length of each Arabidopsis chromosome by adding the number of sequenced nucleotides between the distal-most markers scored on each chromosome (Arabidopsis Genome Initiative 2000; GenBank accession nos. NC_003070, NC_003071, NC_003074, NC_003075, and NC_003076) to the sizes of the respective unsequenced centromere arrays that were estimated by fluorescence in situ hybridization methods (Haupt et al. 2001).
The expected number of gene conversion tracts per bivalent (S) is 2[chromosome length in morgans][2p + (m + 1)(1 − p)]. Combining these two calculations, the mean number of times a gene conversion tract is expected to coincide with a given marker is (S)(1 kb/L). To derive the number of expected gene conversions for each chromosome, this value is multiplied by the number of markers scored on a given chromosome and by the number of tetrads in which a conversion could have been seen. The latter number (33) is equal to the number of full (four viable spores) tetrads (25) plus one-quarter of the number of tetrads with three viable spores (32). The resulting value was halved to account for mismatch repair processes that restore potential conversions to the Mendelian ratio, yielding 0.21, 0.08, 0.15, 0.24, and 0.21 expected gene conversions on chromosomes 1–5, respectively, for a total of 0.89 gene conversions (0.88 if the individual chromosome values are not rounded prior to totaling). If we omit the unsequenced centromere arrays from the calculation of L, we expect 0.96 gene conversions in the full data set reported in Copenhaver et al. (1998). We conclude that the failure to have seen a conversion in these Arabidopsis tetrad data is compatible with our estimates of the number of double-strand breaks per crossover.
This conclusion reduces the force of the speculation that Rad51p is involved in early pairing exercises that are not associated with breaks.
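The expected-conversion calculation described above can be sketched as a single function. This is an illustration only: the function and parameter names are hypothetical, and the inputs in the usage note are invented round numbers, not the map lengths or chromosome sizes actually used in the analysis.

```python
def expected_conversions(map_length_morgans, m, p, n_markers, physical_length_kb,
                         n_informative_tetrads=33.0, tract_kb=1.0,
                         restoration_factor=0.5):
    """Expected number of detected gene conversions on one chromosome."""
    # Ratio of total conversions to crossovers: 2p + (m + 1)(1 - p).
    ratio = 2 * p + (m + 1) * (1 - p)
    # Expected conversion tracts per bivalent (S in the text).
    tracts_per_bivalent = 2 * map_length_morgans * ratio
    # Mean number of tracts coinciding with one given marker: S * (1 kb / L).
    hits_per_marker = tracts_per_bivalent * tract_kb / physical_length_kb
    # Scale by markers and informative tetrads, then halve for restoration.
    return restoration_factor * hits_per_marker * n_markers * n_informative_tetrads
```

With hypothetical inputs of a 1.0-morgan map, m = 15, p = 0.2, 12 markers, and a 30,000-kb chromosome, the function returns about 0.17, the same order of magnitude as the per-chromosome values reported above.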
Acknowledgments
We thank Daphne Preuss for critical discussions. Terry Speed, Scott Hawley, Edward Van Veen, and Wolf Heyer made useful suggestions. Jette Foss and Lisa Young helped groom the text. G.P.C. was supported by National Science Foundation grant DBI-9872641. E.H. was supported by an Interdisciplinary Grant in the Mathematical Sciences, DMS-0075143. Support to F.W.S. was from National Science Foundation grant MCB-9402695 and National Institutes of Health grant GM-33677.
APPENDIX
Theorem 1. Let X be the genetic distance spanned by an interval in morgans. Define D(k, m, p, y) to be the (m + 1) × (m + 1) matrix with (i,j) entry given by
Proof. Let m be the interference parameter, i.e., the number of type II simple gene conversions between two type II crossovers. Let p be the probability any particular crossover is a type I crossover and 1 − p be the probability any particular crossover is a type II crossover.
Let y be the standardized interval length (standardized so that the rate for all Poisson events is 1). The rate for a tetrad is twice that for a bivalent. Type I crossovers make up a fraction p of the crossovers in the interval and are an independent portion of the Poisson events. Type II crossovers make up a fraction (1 − p) of the crossovers in the interval, and for every type II crossover we expect m type II simple gene conversions [that is, we see only a fraction, 1/(m + 1), of the type II Poisson events]. The rate for all Poisson events per morgan is therefore 2(p + (1 − p)(m + 1)), and y = 2(p + (1 − p)(m + 1))X.
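The standardization can be checked numerically with a short sketch; the function name `standardized_length` is hypothetical.

```python
def standardized_length(x_morgans, m, p):
    """Standardized interval length y = 2(p + (1 - p)(m + 1)) X."""
    return 2 * (p + (1 - p) * (m + 1)) * x_morgans
```

With p = 0 and m = 4, an interval of 0.1 morgan has y = 1, that is, one expected Poisson event per tetrad; with p = 1 the same interval has y = 0.2, the expected number of crossovers alone.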
We want to form a (m + 1) × (m + 1) matrix, D(k, m, p, y), whose (i,j) entry di,j(k, m, p, y) is the probability of having k crossovers in the current interval of length y and j type II simple gene conversions
Let l (0 ≤ l ≤ k) be the number of type I crossovers so that k − l of the crossovers are of type II. To count the number of Poisson events, n, that we will have in the current interval, note that we need m − i type II simple gene conversions (
Also note that Pr(n Poisson events and l type I events in the current interval) = Pr(l type I events in the current interval given n Poisson events)Pr(n Poisson events). The distribution of type I crossovers given n Poisson events is just the binomial distribution. The probability that any given Poisson event is a type I event is the ratio of the rate of type I's to the rate of all events: p/(p + (1 − p)(m + 1)). Thus,
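Pr(l type I events in the current interval given n Poisson events) = C(n, l) [p/(p + (1 − p)(m + 1))]^l [(1 − p)(m + 1)/(p + (1 − p)(m + 1))]^(n − l), where C(n, l) = n!/(l!(n − l)!) is the binomial coefficient.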
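The binomial probability described above, with success probability p/(p + (1 − p)(m + 1)), can be sketched as follows. The function name is hypothetical, and the sketch assumes Python 3.8 or later for `math.comb`.

```python
import math

def prob_type1_given_events(l, n, m, p):
    """Binomial probability of l type I events among n Poisson events."""
    # Probability that any given Poisson event is of type I.
    q = p / (p + (1 - p) * (m + 1))
    return math.comb(n, l) * q ** l * (1 - q) ** (n - l)
```

As a check, with m = 0 and p = 0.5 each event is of type I with probability 0.5, so `prob_type1_given_events(1, 2, 0, 0.5)` is 0.5.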
In the special case where l = k, all crossovers in the interval are of type I and none are of type II. In this case, we have that j, the number of simple type II gene conversions after the last type I crossover in the current interval, must be at least i, the number of simple type II gene conversions after the last type I crossover in the previous interval, since we had no type II crossovers in the current interval.
Thus the general formula for the (i,j) entry in D(k, m, p, y) is
The sum of all these probabilities over all the possibilities for the number of type I crossovers gives the probability of having k crossovers and j type II simple gene conversions
Similarly, 1/(m + 1)(1, 1, …, 1) D(k1, m, p, y1) D(k2, m, p, y2) (1, 1, …, 1)′ is the sum over all preceding and ending possibilities for the number of
Given k ≥ 1 crossovers in an interval and assuming no chromatid interference, the probability the resulting tetrad pattern would be t = 0 (parental ditype) is equal to the probability that the pattern would be t = 2 (nonparental ditype) and is (1/3)(1/2 + (−1/2)^k); the probability the tetrad pattern would be t = 1 (tetratype) is thus (2/3)(1 − (−1/2)^k) (Zhao et al. 1995).
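These pattern probabilities are easy to tabulate. In the sketch below the function name is hypothetical, and the k = 0 case (no crossovers, hence parental ditype with certainty) is added for completeness; the stated formulas cover k ≥ 1.

```python
def tetrad_pattern_probs(k):
    """Return (Pr[t=0], Pr[t=1], Pr[t=2]) given k crossovers in an interval,
    assuming no chromatid interference."""
    if k == 0:
        # Added case: with no crossovers the tetrad is a parental ditype.
        return (1.0, 0.0, 0.0)
    r = (-0.5) ** k
    ditype = (0.5 + r) / 3       # parental and nonparental ditype are equally likely
    tetratype = 2 * (1 - r) / 3
    return (ditype, tetratype, ditype)
```

A single crossover always yields a tetratype, two crossovers yield the patterns in the ratio 1:2:1, and as k grows the probabilities approach 1/6, 2/3, and 1/6.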
Thus, the (i,j) entries of the matrices P if t = 0, T if t = 1, and N if t = 2 give the probability of having the specified tetrad type and j type II simple gene conversions
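For the analysis, we did not determine the maximum-likelihood estimators for the genetic distances between markers. Instead, we used the Perkins formula, X = (TT/2 + 3NPD)/N, where TT is the number of tetratypes and NPD is the number of nonparental ditypes observed out of a sample of size N.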
Footnotes
- Communicating editor: M. E. Zolan
- Received October 19, 2001.
- Accepted January 24, 2002.
- Copyright © 2002 by the Genetics Society of America