## Abstract

The “NPD ratio,” widely used by yeast geneticists, is of limited applicability and is prone to falsely indicate significant crossover interference in a chi-square test. A simple, better chi-square test for interference in two-factor crosses is described.

For organisms, such as yeast, in which the four products of individual meioses can be examined, crossover interference can be assessed in two-factor crosses. In 1952, Haig Papazian noted that, for the null hypothesis of no interference, the frequency of nonparental ditype (NPD) tetrads expected (*fN*_{exp}) can be written as a function of the frequency of tetratype (TT) tetrads observed (*fT*_{obs}):

$$\mathit{fN}_{\text{exp}} = \tfrac{1}{2}\left[1 - \mathit{fT}_{\text{obs}} - \left(1 - \tfrac{3}{2}\,\mathit{fT}_{\text{obs}}\right)^{2/3}\right] \tag{1}$$

Equation 1, which is equivalent to Papazian's (1952) Equation 5, assumes the absence of both chromatid and chromosome (chiasma) interference and is a rewritten form of equations by Haldane (1931). Interference is indicated when the observed frequency of NPDs (*fN*_{obs}) differs significantly from *fN*_{exp}. The “NPD ratio” (*fN*_{obs}/*fN*_{exp}) is an indication of the intensity of interference, with smaller values indicating stronger positive interference (Snow 1979a,b).
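The NPD ratio described above can be sketched in a few lines of Python. This is an illustration only: it assumes the usual no-interference relation between *fN*_{exp} and *fT*_{obs}, and the function names and tetrad counts are hypothetical rather than taken from any cited study.

```python
def papazian_fn_exp(ft_obs):
    """Expected NPD frequency under no interference, given the observed TT frequency."""
    if not 0.0 <= ft_obs <= 2.0 / 3.0:
        # Outside this range the relation has no meaningful real root.
        raise ValueError("fT must lie between 0 and 2/3")
    base = max(0.0, 1.0 - 1.5 * ft_obs)  # guard against floating-point round-off at fT = 2/3
    return 0.5 * (1.0 - ft_obs - base ** (2.0 / 3.0))


def npd_ratio(pd, npd, tt):
    """Observed NPD frequency divided by the no-interference expectation."""
    total = pd + npd + tt
    fn_obs = npd / total
    fn_exp = papazian_fn_exp(tt / total)
    return fn_obs / fn_exp


# Hypothetical counts chosen only to illustrate the call.
print(npd_ratio(pd=200, npd=5, tt=195))
```

A ratio below 1 points toward positive interference, but the ratio alone says nothing about significance.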

#### Papazian's approximate method:

In 1952, in the absence of today's electronic calculators, Papazian provided an easily calculable approximation to Equation 1:

$$\mathit{fN}_{\text{exp}} \approx \tfrac{1}{8}\,\mathit{fT}_{\text{obs}}^{\,2}\left(1 + \tfrac{2}{3}\,\mathit{fT}_{\text{obs}}\right) \tag{2}$$

Although Papazian (1952) advertised the approximation as generally valid, it is useful only for short intervals. For longer intervals, *fN*_{exp} calculated from this approximation falls ever farther below the values calculated from the exact equation. At the limit, when *fT*_{obs} = 2/3, *fN*_{exp} calculated from the approximate equation is only 0.080, which is less than half of its exact expectation of 1/6. Subsequently, Strickland (1958) proposed an approximation that is more useful over a wider range of *fT*_{obs} values and that, he says, changes Papazian's (allegedly incorrect) 2/3 to 3/2.
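To make the divergence concrete, the sketch below tabulates the exact and approximate expectations over a range of *fT* values. It assumes that the approximation is the series expansion of the exact relation carried to the cubic term, *fT*²/8 × (1 + 2*fT*/3), which reproduces the value of 0.080 quoted above at the limit; the function names are illustrative.

```python
def fn_exact(ft):
    """Exact no-interference expectation for the NPD frequency."""
    base = max(0.0, 1.0 - 1.5 * ft)  # clamp round-off at the fT = 2/3 limit
    return 0.5 * (1.0 - ft - base ** (2.0 / 3.0))


def fn_approx(ft):
    """Assumed form of the short-interval approximation (series expansion to fT^3)."""
    return ft ** 2 / 8.0 * (1.0 + 2.0 * ft / 3.0)


print("fT      exact    approx")
for ft in (0.1, 0.3, 0.5, 0.6, 2.0 / 3.0):
    print(f"{ft:.3f}  {fn_exact(ft):.4f}  {fn_approx(ft):.4f}")
```

The two columns agree closely for short intervals and separate steadily as *fT* approaches 2/3.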

Some authors, noting correctly that Papazian's exact equation gives meaningless roots when *fT*_{obs} > 2/3, have used Equation 2 to calculate *fN*_{exp} when confronted with data in which *fT*_{obs} > 2/3. Apparently, these authors took Papazian (1952, p. 179) seriously when he wrote that his approximation “… is valid for long intervals and can be used where other methods of measuring interference cannot.” However, not only is the approximation a poor one for large *fT*_{obs}, its use when *fT*_{obs} > 2/3 is as invalid as is that of the exact expression: *fT*_{obs} cannot (significantly) exceed 2/3 in the absence of interference, as noted by Perkins (1955). Furthermore, since Equation 1 describes relationships in the presumed absence of interference, the only value for *fN*_{exp} that can reasonably be based solely on an *fT*_{obs} that (insignificantly) exceeds 2/3 is 1/6, the value for *fN*_{exp} as map length approaches infinity. Some other authors have simply ignored the upper limit on *fT* for tetrads in which *fT*_{obs} > 2/3 and extracted a meaningless “*fN*_{exp}” from the exact equation.

#### Tests for significance:

Several users of Papazian's equation test for significance by chi square, but they fail to justify use of this test or to publish particulars. The most obvious, though not necessarily legitimate (see below), application of chi square would seem to be a test of the observed numbers of NPDs and of parental ditypes (PDs) *vs*. the numbers expected (d.f. = 1), where the frequency of PDs expected is *fP*_{exp} = 1 – *fN*_{exp} – *fT*_{obs}. Some investigators appear to ignore the PDs and calculate chi square as (*N*_{obs} − *N*_{exp})^{2}/*N*_{exp} (d.f. = 1), where *N*_{exp} is *fN*_{exp} × total tetrads, rounded to the nearest integer. (The rounding seems to imply the inappropriate use of a contingency chi-square test rather than the appropriate “goodness-of-fit” chi-square test.)
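The two Papazian-based calculations described above can be sketched as follows. This is one reading of procedures that, as noted, are rarely published in detail; the function names are hypothetical, and the *P*-value uses the exact tail of the chi-square distribution with one degree of freedom, erfc(√(χ²/2)).

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def papazian_chi_squares(pd, npd, tt):
    """Return (two-class chi square, NPD-only chi square) for one interval."""
    total = pd + npd + tt
    ft_obs = tt / total
    if not 0.0 < ft_obs < 2.0 / 3.0:
        raise ValueError("the exact relation requires 0 < fT < 2/3")
    fn_exp = 0.5 * (1.0 - ft_obs - (1.0 - 1.5 * ft_obs) ** (2.0 / 3.0))
    n_exp = fn_exp * total
    p_exp = (1.0 - fn_exp - ft_obs) * total  # the TT class is taken as observed

    # Variant 1: NPDs and PDs against their expected numbers.
    two_class = (npd - n_exp) ** 2 / n_exp + (pd - p_exp) ** 2 / p_exp

    # Variant 2: NPDs only, with the expected number rounded to an integer.
    n_rounded = round(n_exp)
    npd_only = float("inf") if n_rounded == 0 else (npd - n_rounded) ** 2 / n_rounded
    return two_class, npd_only


# Hypothetical counts, for illustration only.
two_class, npd_only = papazian_chi_squares(pd=150, npd=10, tt=140)
print(two_class, chi2_p_value_df1(two_class))
print(npd_only, chi2_p_value_df1(npd_only))
```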

#### Problems with chi square:

Papazian's (1952) equation uses only a fraction of the data available for testing interference. The resulting inefficiency, which is minimal at small values of *fN*_{obs}, increases with increasing *fN*_{obs}. Because the Papazian-based chi-square tests ascribe all of the deviation to two of the three classes, their chi-square values, based on the sum of the squares of those deviations, are enlarged. Accordingly, such Papazian-based methods will be prone to falsely claiming interference.

Below, we offer a simple, better way to test two-factor tetrad data for interference and compare it with the above-mentioned Papazian-based methods.

#### The better way:

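In the presumed absence of interference, map length (*X*, in morgans) may be calculated from the observed recombinant frequency (*R*) using Haldane's (1919) equation

$$X = -\tfrac{1}{2}\ln(1 - 2R) \tag{3}$$

where *R*, by definition, is

$$R = \mathit{fN}_{\text{obs}} + \tfrac{1}{2}\,\mathit{fT}_{\text{obs}} \tag{4}$$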

The expected frequencies of the three classes of tetrads are written below in a form that is user friendly for calculation. They appear in different form in Haldane (1931), Papazian (1952), and Snow (1979a):

$$\mathit{fT}_{\text{exp}} = \tfrac{2}{3}\left(1 - e^{-3X}\right) \tag{5}$$

$$\mathit{fN}_{\text{exp}} = \tfrac{1}{2}\left(1 - \mathit{fT}_{\text{exp}} - e^{-2X}\right) \tag{6}$$

$$\mathit{fP}_{\text{exp}} = 1 - \mathit{fT}_{\text{exp}} - \mathit{fN}_{\text{exp}} \tag{7}$$

with all equations written for the presumed absence of interference. Multiplying these expected frequencies by the total number of tetrads in the analysis gives a corresponding set of expected numbers, which we test by chi-square goodness-of-fit (d.f. = 1) against the observed numbers (see below). For further convenience, the calculations are automated at Stahl Lab Online Tools at http://molbio.uoregon.edu/~fstahl/.
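A minimal Python sketch of the procedure just described is given here. It is not the code behind the online tools. It assumes the standard no-interference expressions: *X* = −½ ln(1 − 2*R*) with *R* = *fN* + *fT*/2, *fT*_{exp} = ⅔(1 − e^{−3X}), *fN*_{exp} = ½(1 − *fT*_{exp} − e^{−2X}), and *fP*_{exp} as the remainder. The function name `better_way` is hypothetical.

```python
import math


def better_way(pd, npd, tt):
    """Return (map length X in morgans, chi square, P-value) for one interval."""
    total = pd + npd + tt
    r = (npd + tt / 2.0) / total              # observed recombinant frequency
    if not 0.0 < r < 0.5:
        raise ValueError("the map function requires 0 < R < 1/2")
    x = -0.5 * math.log(1.0 - 2.0 * r)        # map length, no interference assumed

    ft_exp = (2.0 / 3.0) * (1.0 - math.exp(-3.0 * x))
    fn_exp = 0.5 * (1.0 - ft_exp - math.exp(-2.0 * x))
    fp_exp = 1.0 - ft_exp - fn_exp

    observed = (pd, npd, tt)
    expected = (fp_exp * total, fn_exp * total, ft_exp * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Three classes, minus one for the fixed total, minus one for the fitted X: d.f. = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return x, chi2, p_value


# Counts for the two published intervals discussed in the text.
for counts in ((188, 16, 235), (542, 80, 617)):
    x, chi2, p = better_way(*counts)
    print(counts, round(x, 3), round(chi2, 2), round(p, 2))
```

Under these assumptions the two intervals give *P*-values of roughly 0.07 and 0.12. Because *X* is fitted from the same data, all three classes share the deviation, whereas the Papazian-based tests load it onto two.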

With respect to false positives, the performance of the better way (BW) can be compared with those of the Papazian-based methods and the maximum-likelihood estimation (MLE) (Fisher 1925), using the appropriate likelihood equation of estimation (Equation 7 of Snow 1979a). When a universe of tetrads lacking interference is sampled, the BW behaves as well as the MLE over the range tested, giving “significance” 5% of the time (Figure 1). The false positives of the Papazian-based methods, on the other hand, increase with increasing map length. For instance, a 2003 yeast article in Molecular and Cellular Biology uses Papazian-based chi-square tests and reports interference significant at the 0.02 level for an interval that has 188 PDs, 16 NPDs, and 235 TTs. When tested by the better way, the *P*-value for that interval is 0.07. A 2003 article in Genetics reports negative interference (*P* = 0.03) on the basis of 542 PDs, 80 NPDs, and 617 TTs. When tested by the BW, the *P*-value is 0.12.
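The kind of sampling experiment summarized above can be sketched with the Python standard library. This is not the simulation used for Figure 1. The sample size, number of samples, seed, and function names are all assumptions, and the Papazian-based test is taken to be the two-class (NPD and PD) version.

```python
import math
import random


def tetrad_freqs(x):
    """No-interference (PD, NPD, TT) frequencies for map length x in morgans."""
    ft = (2.0 / 3.0) * (1.0 - math.exp(-3.0 * x))
    fn = 0.5 * (1.0 - ft - math.exp(-2.0 * x))
    return 1.0 - ft - fn, fn, ft


def p_df1(chi2):
    """Upper-tail probability of chi square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def bw_p(pd, npd, tt):
    """P-value from the fitted-map-length test; None if R is out of range."""
    total = pd + npd + tt
    r = (npd + tt / 2.0) / total
    if not 0.0 < r < 0.5:
        return None
    x = -0.5 * math.log(1.0 - 2.0 * r)
    expected = [f * total for f in tetrad_freqs(x)]
    return p_df1(sum((o - e) ** 2 / e for o, e in zip((pd, npd, tt), expected)))


def papazian_p(pd, npd, tt):
    """P-value from the two-class Papazian-based test; None if fT is out of range."""
    total = pd + npd + tt
    ft = tt / total
    if not 0.0 < ft < 2.0 / 3.0:
        return None
    n_exp = 0.5 * (1.0 - ft - (1.0 - 1.5 * ft) ** (2.0 / 3.0)) * total
    p_exp = total - tt - n_exp
    return p_df1((npd - n_exp) ** 2 / n_exp + (pd - p_exp) ** 2 / p_exp)


def false_positive_rates(x, n_tetrads=400, n_samples=2000, alpha=0.05, seed=1):
    """Fraction of no-interference samples that each test calls significant."""
    rng = random.Random(seed)
    weights = tetrad_freqs(x)
    tests = {"bw": bw_p, "papazian": papazian_p}
    hits = dict.fromkeys(tests, 0)
    used = dict.fromkeys(tests, 0)
    for _ in range(n_samples):
        draws = rng.choices("PNT", weights=weights, k=n_tetrads)
        counts = (draws.count("P"), draws.count("N"), draws.count("T"))
        for name, test in tests.items():
            p = test(*counts)
            if p is not None:  # skip samples the test cannot handle
                used[name] += 1
                hits[name] += p < alpha
    return {name: hits[name] / used[name] for name in tests}


print(false_positive_rates(0.3))
```

Running the function at several values of `x` gives a rough picture of how each rate changes with map length. Samples that a method cannot evaluate are dropped for that method, which is itself a simplification.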

In Table 1, we compare the properties of the three methods for some of the *Schizosaccharomyces pombe* data of Munz (1994) and for fictitious data that vary with respect to map length, interference, and total sample size. Table 1 shows that chi-square values for the BW and the MLE are practically identical. The Papazian-based method, on the other hand, often gives appreciably larger chi-square values. The entries in Table 1 suggest that, in the presence of positive interference, MLEs of *X* are slightly larger than BW estimates. This suggestion is supported by extensive comparisons (appendix) made (unintentionally) by Snow (1979a,b).

In Figure 2, the BW and the MLE are compared with respect to the efficiency with which they reflect the universe sampled. Figure 2A shows that, even for small samples (100 tetrads each), the mean values of *X* obtained by the BW reflect the universe mean about as well as those obtained by the MLE over the range tested. Figure 2B shows that the variances of the sample distributions obtained by the BW are also practically indistinguishable from those obtained by the MLE. The test range was limited by the frequency with which sample *R* values ≥ 1/2 prevented the use of the BW.

The similarity in performance of the MLE and the BW justifies the use of chi square as an easily calculated test for significance when the estimate of map length (*X*) is based on Equation 3 rather than on the tedious (or computerized) MLE. The BW is valid for values of *fT*_{obs} that are > 2/3 (as long as the observed *R* is not ≥ 1/2).

#### APPENDIX

Snow (1979a) developed maximum-likelihood equations for estimating interference according to a model by Barratt *et al*. (1954). The MLE equations contain parameters for the map length (*X*) and for the strength of interference (*k*). Snow calculated what he called *X*(*i*) for “*X* with interference.” He was later advised that what he had, in fact, calculated was *X* as it would be if there were no interference (Snow 1979b).

Snow (1979a, p. 237) writes: “Because of the nature of tetrad data and the conditions of the interference model, the estimate of *x* obtained by (9) and (10) is equal to −(1/2)ln *p*(0) (*i.e*., *p*(0) = *e*^{−2x}), where *p*(0) is the proportion of tetrads with no exchanges, that is, PD − NPD/*N*. For example, a Saccharomyces cross involving the markers *mat1* and *his4* on chromosome *3* (see Table 2) produced 97 PD, 7 NPD, and 174 T. The maximum likelihood estimate of *x* with interference is 0.56390, which is −(1/2)ln(90/278).”

Note that *fP* − *fN* = 1 – 2*fN* – *fT* = 1 – 2*R*. Thus, Snow calculated the *X*(*i*) values of his Table 2 from equations exactly equivalent to our Equations 3 and 4, which assume no interference (Haldane 1919). Snow appeared at that time not to realize what he had done, judging from his remark that “the estimates for *x*(*i*) are somewhat smaller than those for *x*, in keeping with expectations of the interference model” (p. 241). In fact, Snow's estimates of *X*(*i*) tend to be less than his MLEs of *X* in the no-interference model simply because the BW procedure for estimating *X* assuming the absence of interference is slightly biased. By comparing MLEs and BW estimates of *X* to five decimal places, Snow (1979a) has given us examples of the extent of bias in the *X* values of the BW method. The degree of the bias can be appreciated by examining the many entries for *X* and *X*(*i*) in Snow's (1979a) Table 2.
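The identity used in this argument can be checked numerically with Snow's quoted example. The short sketch below assumes only that Haldane's map function is *X* = −½ ln(1 − 2*R*) with *R* = *fN* + *fT*/2; the variable names are illustrative.

```python
import math

pd, npd, tt = 97, 7, 174           # Snow's mat1-his4 example
total = pd + npd + tt

# Snow's expression: -(1/2) ln p(0), with p(0) = (PD - NPD) / N.
x_snow = -0.5 * math.log((pd - npd) / total)

# Haldane's map function applied to the observed recombinant frequency.
r = (npd + tt / 2.0) / total
x_haldane = -0.5 * math.log(1.0 - 2.0 * r)

print(x_snow, x_haldane)
```

The two values coincide and match the quoted 0.56390 to within about 1 in the fifth decimal place.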

## Acknowledgments

Elizabeth Housworth kindly conducted the Monte Carlo assessments of the various methods, prepared the graphs for Figures 1 and 2, and was indispensably helpful throughout. Jette Foss edited the text for clarity.

## Footnotes

Communicating editor: G. R. Smith

- Received January 10, 2008.
- Accepted February 27, 2008.

- Copyright © 2008 by the Genetics Society of America