Genetics, Vol. 150, 459-472, September 1998, Copyright © 1998

Statistical Analysis of Ordered Tetrads

Hongyu Zhao (a) and Terence P. Speed (b)
(a) Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut 06520
(b) Department of Statistics, University of California, Berkeley, California 94720

Corresponding author: Hongyu Zhao, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College St., New Haven, CT 06520. E-mail: hongyu.zhao@yale.edu

Communicating editor: D. BOTSTEIN


ABSTRACT

Ordered tetrad data yield information on chromatid interference, chiasma interference, and centromere locations. In this article, we show that the assumption of no chromatid interference imposes certain constraints on multilocus ordered tetrad probabilities. Assuming no chromatid interference, these constraints can be used to order markers under general chiasma processes. We also derive multilocus tetrad probabilities under a class of chiasma interference models, the chi-square models. Finally, we compare centromere map functions under the chi-square models with map functions proposed in the literature. Results in this article can be applied to order genetic markers and map centromeres using multilocus ordered tetrad data.


GENETIC studies using tetrad data are very valuable in studying the chance mechanisms in meiosis, including: (1) positions of crossovers along the four-strand bundle; (2) nonsister strand pairs involved in each crossover; (3) spindle-centromere attachment at the first meiotic division; and (4) spindle-centromere attachment at the second meiotic division. Deviation from random distributions of crossovers on the four-strand bundle is called chiasma interference. Deviation from random involvement of nonsister chromatid pairs in each crossover is called chromatid interference. Compared with single spore data, where the four products from a single meiosis can only be recovered separately, tetrad data, where four meiotic products can be recovered together, have several advantages. First, chromatid interference and chiasma interference can be distinguished using tetrad data. Second, when chromatid interference is absent, chiasma interference can be detected with only two markers, whereas at least three markers are needed for single spore data. Chiasma interference can even be detected with one marker in some studies. Third, the position of the centromere can be inferred. In some organisms, such as Neurospora crassa, the asci are produced in a linear order corresponding to the meiotic divisions and are called ordered tetrads. In other organisms, such as Saccharomyces cerevisiae, the asci are produced as a group without order and are called unordered tetrads.

Ordered tetrads have been used extensively to study the crossover process during meiosis since LINDEGREN (1932, 1933, 1936a,b). However, most studies on ordered tetrads and unordered tetrads, for example, PAPAZIAN (1952) and PERKINS (1955), used only three loci for the detection of chromatid interference and one locus for mapping centromeres. Several studies have taken a multilocus approach: (1) under the assumption of no chromatid interference (NCI), RISCH and LANGE (1983) fitted one class of chiasma interference models, the count-location model, to one multilocus unordered tetrad data set; (2) ZHAO et al. (1995b) fitted another class of chiasma interference models, the chi-square model, to several multilocus unordered tetrad data sets; and (3) ZHAO et al. (1995a) derived a set of linear equality and inequality constraints on the probabilities of unordered tetrad patterns, with an arbitrary number of loci under the assumption of NCI, and tested these constraints on data sets from a variety of organisms reported in the literature.

In this article, ordered tetrad data are studied under different assumptions on the chance mechanisms. For each assumption, a detailed discussion is provided for single marker and two marker data. General results for multiple markers are then presented. Although the number of spores is four in a tetrad and eight in an octad, there is no loss of generality in discussing only tetrads when aberrant segregations can be ignored. Half-tetrad data are another type of genetic data that are closely related to ordered tetrad data and widely used in genetic studies. A detailed study of half-tetrad data is given in the accompanying article (ZHAO and SPEED 1998).

We adopt the following notation in this article. Markers are denoted by script letters; for example, we use $\mathcal{A}$ and $\mathcal{B}$ to denote markers. Alleles are denoted by italic letters. For example, A and a denote two alleles of marker $\mathcal{A}$. We use [X, Y, Z, W] to denote the observed marker configuration for an ordered tetrad, where X and Y are attached to one centromere and Z and W are attached to the other centromere. For example, [AB, Ab, aB, ab] represents an ordered tetrad with two strands carrying AB and Ab attached to one centromere and with two strands carrying aB and ab attached to the other centromere. The centromere is denoted by CEN. For patterns between a pair of markers, we use P to denote parental ditype where all four strands retain the parental type, T to denote tetratype where two of the four strands show recombination, and N to denote nonparental ditype where all four strands are recombinants.


METHODS

Random spindle-centromere attachment assumption:
Random spindle-centromere attachment (RSCA) assumes that two centromeres have the same chance to go to either pole at the first meiotic division, and the divided centromeres have the same chance to go to either pole at the second meiotic division (GRIFFITHS et al. 1996).

For marker $\mathcal{A}$ with alleles A and a inherited from two parents, there are six distinguishable configuration types, as illustrated in Table 1. Under RSCA, types 1 and 2 ([A, A, a, a] and [a, a, A, A]) have the same probability because of random spindle-centromere attachment at the first meiotic division, whereas types 3 to 6 ([A, a, A, a], [a, A, a, A], [A, a, a, A], and [a, A, A, a]) have the same probability because of random spindle-centromere attachment at the second meiotic division. Types 1 and 2 are called the first division segregation (FDS) pattern, and types 3 to 6 are called the second division segregation (SDS) pattern (GRIFFITHS et al. 1996). For a single marker, RSCA can be tested by examining whether types 1 and 2 occur with equal frequency and whether types 3, 4, 5, and 6 occur with equal frequency. RSCA is generally confirmed (FINCHAM et al. 1979).


 
Table 1. Six possible configurations at marker $\mathcal{A}$

For the two markers $\mathcal{A}$ and $\mathcal{B}$, there are six distinguishable configurations at $\mathcal{A}$ and six distinguishable configurations at $\mathcal{B}$. Therefore, there are 36 distinguishable patterns jointly at two markers. Under RSCA, if one pattern can be changed to another pattern through one of the eight permutations in Table 2, these two patterns should have the same probability. For example, the following eight types have the same probability: [AB, Ab, aB, ab], [Ab, AB, aB, ab], [AB, Ab, ab, aB], [Ab, AB, ab, aB], [aB, ab, AB, Ab], [ab, aB, AB, Ab], [aB, ab, Ab, AB], and [ab, aB, Ab, AB]. After examining all 36 distinguishable patterns, the number of distinct probabilities is seven under RSCA. These seven groups are shown in Table 3. When $\mathcal{A}$ shows FDS, there are three distinct groups corresponding to whether $\mathcal{A}$ and $\mathcal{B}$ have parental ditype, nonparental ditype, or tetratype, denoted by P1, N1, and T12, respectively, in Table 3. When $\mathcal{A}$ shows SDS, there are four distinct groups. Two groups correspond to $\mathcal{A}$ and $\mathcal{B}$ having parental and nonparental ditypes, denoted by P2 and N2, respectively, in Table 3. When $\mathcal{A}$ and $\mathcal{B}$ have tetratype, $\mathcal{B}$ can show either FDS or SDS. Under RSCA, these two groups may have different probabilities, denoted by T21 and T2. RSCA can be tested by examining whether all distinguishable patterns within each group occur with equal frequency. For example, the eight distinguishable patterns listed above, which correspond to group T12 with $\mathcal{A}$ showing FDS and $\mathcal{A}$ and $\mathcal{B}$ having tetratype, should occur with equal frequency. These and more cases were first studied by WHITEHOUSE (1942).


 
Table 2. Eight ordered tetrad types with the same probability under RSCA


 
Table 3. Seven distinct groups under RSCA

For n markers, there are $6^n$ distinguishable patterns. Under RSCA, these $6^n$ patterns reduce to $(6^n + 5 \times 2^n)/8$ distinct probabilities. This result was first derived by PAPAZIAN (1952). In Appendix 1 (Proposition 1), we present a different derivation of this result through a more direct counting method. As for the one- and two-marker cases, RSCA can be tested by examining whether all the distinguishable tetrad patterns that should have the same probability under RSCA (as discussed in the proof of Proposition 1 in Appendix 1) do occur equally frequently.
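The count of distinct probabilities can be checked by brute force for small n. The sketch below assumes that the eight permutations of Table 2 are the strand relabelings obtained by swapping the two strands on either centromere and/or swapping the two centromeres, which is consistent with the eight equivalent patterns listed for the two-marker example. The names `PERMS`, `single_marker_configs`, and `count_groups` are illustrative only.

```python
from itertools import combinations, product

# Assumed form of the eight permutations in Table 2: swap the strands
# within either centromere pair and/or swap the two centromere pairs.
PERMS = [
    (0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2),
    (2, 3, 0, 1), (3, 2, 0, 1), (2, 3, 1, 0), (3, 2, 1, 0),
]


def single_marker_configs():
    """All 6 ways to place two copies of each allele on four ordered strands."""
    return [tuple(1 if i in upper else 0 for i in range(4))
            for upper in combinations(range(4), 2)]


def count_groups(n):
    """Number of RSCA-equivalence classes among the 6**n n-marker patterns."""
    representatives = set()
    for pattern in product(single_marker_configs(), repeat=n):
        # Apply each strand permutation to every marker simultaneously.
        images = [tuple(tuple(cfg[i] for i in perm) for cfg in pattern)
                  for perm in PERMS]
        representatives.add(min(images))  # canonical member of the orbit
    return len(representatives)


for n in (1, 2, 3):
    print(n, count_groups(n), (6**n + 5 * 2**n) // 8)
```

For each n the enumerated count and the closed-form expression are printed side by side, so any disagreement would be visible directly.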

No chromatid interference (one marker):
For the case of one marker, $\mathcal{A}$, if NCI holds, the four configurations corresponding to the SDS pattern have the same probability before meiotic divisions. Therefore, these four types should occur with the same frequency even if RSCA fails. As a result, only when both NCI and RSCA fail can these four types occur with different probabilities.

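As shown by MATHER (1935), under NCI, the probabilities that $\mathcal{A}$ shows FDS and SDS patterns, given k chiasmata between CEN and $\mathcal{A}$, are

$$F(k) = \frac{1}{3} + \frac{2}{3}\left(-\frac{1}{2}\right)^{k} \quad\text{and}\quad S(k) = \frac{2}{3}\left[1 - \left(-\frac{1}{2}\right)^{k}\right]. \tag{1}$$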

As k increases, the probability of SDS tends to 2/3. For one marker, $\mathcal{A}$, NCI imposes no constraints on the probabilities of FDS and SDS, denoted by F and S. For any observed FDS and SDS proportions, we can always construct a chiasma process model that gives rise to the observed FDS and SDS proportions. In fact, the process with probability F having zero chiasmata and with probability S having one chiasma is the simplest such model.
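The limit of 2/3 can be seen by stepping through chiasmata one at a time. The sketch below assumes the standard NCI argument: one further chiasma always turns an FDS pattern into SDS, while an SDS pattern stays SDS for two of the four nonsister strand pairs. It compares this recursion with the closed form $S(k) = \frac{2}{3}[1 - (-\frac{1}{2})^k]$ attributed to MATHER (1935); the function names are illustrative.

```python
from fractions import Fraction


def sds_given_chiasmata(k):
    """P(SDS | k chiasmata) under NCI, built up one chiasma at a time."""
    s = Fraction(0)  # with no chiasma the marker always shows FDS
    for _ in range(k):
        # FDS -> SDS with probability 1; SDS -> SDS with probability 1/2.
        s = (1 - s) + s / 2
    return s


def sds_closed_form(k):
    """Closed form S(k) = (2/3) * (1 - (-1/2)**k)."""
    return Fraction(2, 3) * (1 - Fraction(-1, 2) ** k)


for k in range(7):
    print(k, sds_given_chiasmata(k), sds_closed_form(k))
```

Exact rational arithmetic is used so that the two columns can be compared without rounding error; the values oscillate around 2/3 and approach it as k grows.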

No chromatid interference (two markers):
Two markers, $\mathcal{A}$ and $\mathcal{B}$, may be (1) on different chromosomes; (2) on the same chromosome but on different sides of the centromere (that is, the order is $\mathcal{A}$–CEN–$\mathcal{B}$); or (3) on the same chromosome and on the same side of the centromere (that is, the order is CEN–$\mathcal{A}$–$\mathcal{B}$ or CEN–$\mathcal{B}$–$\mathcal{A}$). We study these three cases separately. These three cases were first discussed in detail by WHITEHOUSE (1942). We use the notation in Table 3 to denote seven distinct groups. For example, P1 is the group with both markers showing FDS and no strand showing recombination between $\mathcal{A}$ and $\mathcal{B}$.

Two markers on different chromosomes: Let p and q denote the probabilities of SDS at $\mathcal{A}$ and $\mathcal{B}$, respectively. When both markers have FDS, tetrad types between $\mathcal{A}$ and $\mathcal{B}$ can be either parental ditype or nonparental ditype, depending on which pair of alleles are separated at the first meiotic division: AB vs. ab or Ab vs. aB. The probability of either outcome, group P1 or N1, is (1 - p)(1 - q)/2. Similar considerations lead to the probabilities of the seven groups in Table 4. These seven probabilities are determined by two independent parameters: p and q.


 
Table 4. The probabilities of seven groups when $\mathcal{A}$ and $\mathcal{B}$ are on different chromosomes
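The body of Table 4 is not reproduced in this copy. As a hedged illustration of the "similar considerations" mentioned in the text, the sketch below works out the seven group probabilities for unlinked markers from p and q alone. Only the P1 and N1 entries are stated in the text; the remaining entries are an assumption based on independent segregation of the two chromosomes (in particular, the 1:2:1 split of the double-SDS class), and the function name is illustrative.

```python
def group_probs_unlinked(p, q):
    """Seven group probabilities for markers on different chromosomes.

    p, q: SDS probabilities at the first and second marker.
    Entries other than P1 and N1 are derived here, not quoted from Table 4.
    """
    both_sds = p * q
    return {
        "P1": (1 - p) * (1 - q) / 2,  # both FDS, parental ditype
        "N1": (1 - p) * (1 - q) / 2,  # both FDS, nonparental ditype
        "T12": (1 - p) * q,           # first marker FDS, second SDS
        "T21": p * (1 - q),           # first marker SDS, second FDS
        "P2": both_sds / 4,           # both SDS, parental ditype
        "T2": both_sds / 2,           # both SDS, tetratype
        "N2": both_sds / 4,           # both SDS, nonparental ditype
    }


probs = group_probs_unlinked(0.3, 0.5)
print(probs, sum(probs.values()))
```

The seven values depend only on p and q, in line with the statement that two independent parameters determine them.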

Two markers on different sides of the centromere ($\mathcal{A}$–CEN–$\mathcal{B}$): We use $p(P_1^{(k,\ell)})$, $p(N_1^{(k,\ell)})$, $p(T_{12}^{(k,\ell)})$, $p(P_2^{(k,\ell)})$, $p(N_2^{(k,\ell)})$, $p(T_{21}^{(k,\ell)})$, and $p(T_2^{(k,\ell)})$ to denote the frequency of ordered tetrads of groups P1, N1, T12, P2, N2, T21, and T2 among meioses with k chiasmata in $\mathcal{A}$–CEN and $\ell$ chiasmata in CEN–$\mathcal{B}$. We can easily check that

and

For $k + \ell > 2$,

where F(k) and S(k) were defined in (1). For a given chiasma process along the four-strand bundle, let $c_{k\ell}$ denote the joint probability of there being k chiasmata between CEN and $\mathcal{A}$ and $\ell$ chiasmata between CEN and $\mathcal{B}$. The above relations can be combined with expressions for $(c_{k\ell})$ and summed, to give our desired frequencies. For example,

and

On the basis of the above results, it can be shown that, for any chiasma process, seven distinct groups can have at most five different probabilities: the probabilities of P1, N1, T12, T21, and (P2 + T2 + N2) can differ, and we denote them by $\alpha$, $\beta$, $\gamma$, $\delta$, and $\epsilon$. The ratio of the probabilities of P2:T2:N2 is 1:2:1. Therefore, the probabilities of P2, T2, and N2 are $\epsilon/4$, $\epsilon/2$, and $\epsilon/4$, respectively. These are summarized in Table 5.


 
Table 5. The probabilities of seven groups when $\mathcal{A}$ and $\mathcal{B}$ are on different sides of the centromere

The probabilities of these seven groups can be derived by another approach. If we treat the centromere as a marker, the results for unordered tetrads (ZHAO et al. 1995a) can be applied in this context. If the two centromeres from the two parents could be distinguished, there would be three types, P, T, and N, between $\mathcal{A}$ and CEN, and three types, P, T, and N, between CEN and $\mathcal{B}$. This would lead to nine distinct probabilities. Let $p_{k0}$, $p_{k1}$, and $p_{k2}$ denote the conditional probabilities of P, T, and N, given k chiasmata between a pair of markers. Under NCI, MATHER (1935) showed that, for $k \geq 1$,

$$p_{k0} = p_{k2} = \frac{1}{3}\left[\frac{1}{2} + \left(-\frac{1}{2}\right)^{k}\right], \qquad p_{k1} = \frac{2}{3}\left[1 - \left(-\frac{1}{2}\right)^{k}\right]. \tag{2}$$

When k = 0, $p_{00} = 1$ and $p_{01} = p_{02} = 0$. Let $p_{ij}$ denote the probability of joint tetrad pattern (ij), where i, j = 0, 1, or 2 corresponding to P, T, and N in each interval; then,

$$p_{ij} = \sum_{k=0}^{\infty}\sum_{\ell=0}^{\infty} c_{k\ell}\, p_{ki}\, p_{\ell j},$$

where $p_{ki}$ and $p_{\ell j}$ were defined in (2). Because the two centromeres cannot be distinguished, some of these classes are not distinguishable. For example, (0, 0) and (2, 2) both give rise to FDS at two markers and no recombinations between these two markers. Using the notation in Table 5, we have $\alpha = p(P1) = p_{00} + p_{22}$, $\beta = p(N1) = p_{02} + p_{20}$, $\gamma = p(T12) = p_{01} + p_{21}$, $\delta = p(T21) = p_{10} + p_{12}$, and $\epsilon = 4p(P2) = 2p(T2) = 4p(N2) = p_{11}$.
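The grouping just described can be sketched numerically. The code below assumes Mather's conditional probabilities $p_{k0} = p_{k2} = \frac{1}{3}[\frac{1}{2} + (-\frac{1}{2})^k]$ and $p_{k1} = \frac{2}{3}[1 - (-\frac{1}{2})^k]$ for $k \geq 1$, sums $c_{k\ell}\, p_{ki}\, p_{\ell j}$ over a user-supplied joint chiasma-count distribution, and then merges the nine classes into the five probabilities of Table 5. The input layout (a dictionary keyed by (k, l)) and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction


def ptn_given_chiasmata(k):
    """(P, T, N) probabilities for one interval given k chiasmata, under NCI."""
    if k == 0:
        return (Fraction(1), Fraction(0), Fraction(0))
    h = Fraction(-1, 2) ** k
    p0 = Fraction(1, 3) * (Fraction(1, 2) + h)
    p1 = Fraction(2, 3) * (1 - h)
    return (p0, p1, p0)


def grouped_probs(c):
    """Map a joint chiasma-count distribution c[(k, l)] to the five groups."""
    p = [[Fraction(0)] * 3 for _ in range(3)]
    for (k, l), ckl in c.items():
        pk, pl = ptn_given_chiasmata(k), ptn_given_chiasmata(l)
        for i in range(3):
            for j in range(3):
                p[i][j] += ckl * pk[i] * pl[j]
    return {
        "alpha": p[0][0] + p[2][2],    # P1
        "beta": p[0][2] + p[2][0],     # N1
        "gamma": p[0][1] + p[2][1],    # T12
        "delta": p[1][0] + p[1][2],    # T21
        "epsilon": p[1][1],            # P2 + T2 + N2
    }


# Illustrative process: always two chiasmata on one arm, none on the other.
print(grouped_probs({(2, 0): Fraction(1)}))
```

Any distribution over (k, l) pairs can be passed in; the five returned values always sum to the total probability mass supplied.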

It was shown in ZHAO et al. (1995a) that NCI imposes equality and inequality constraints on the probabilities of distinguishable unordered tetrad patterns. Here we derive constraints on ordered tetrad probabilities under NCI. Following the definition in ZHAO et al. (1995a), we say a chiasma process is compatible with a given set of joint tetrad probabilities p if, under NCI, this chiasma process gives rise to these joint probabilities. It was shown in ZHAO et al. (1995a) that for any underlying chiasma process, if NCI holds, there is a chiasma process with at most two exchanges between each consecutive pair of markers, inducing the same tetrad probabilities. Using this property, ZHAO et al. (1995a) showed that, for a given set of unordered tetrad probabilities $p = (p_0, p_1, p_2)'$, the probabilities of P, T, and N between two markers, there is some chiasma process compatible with p if and only if $T_1^{-1} p \geq 0$, where

$$T_1 = \begin{pmatrix} 1 & 0 & 1/4 \\ 0 & 1 & 1/2 \\ 0 & 0 & 1/4 \end{pmatrix}$$

has as its columns the probabilities of (P, T, N) given zero, one, and two chiasmata from (2), and 0 = (0, 0, 0)'. For two markers, write the $p_{ij}$ in lexicographical order as p; there is an underlying chiasma process satisfying NCI compatible with unordered tetrad probabilities p if and only if $T_2^{-1} p \geq 0$, where $T_2 = T_1 \otimes T_1$, $T_2^{-1} = T_1^{-1} \otimes T_1^{-1}$, and 0 is a column vector with nine 0's, plus equality constraints described in ZHAO et al. (1995a). The operator $\otimes$ is the standard tensor product (see, e.g., BELLMAN 1970). If the chiasma process has at most two chiasmata in each interval, the correspondence between p and $c = (c_{k\ell})$ is simply $c = T_2^{-1} p$. Using the property that, for any underlying chiasma process, there is a compatible chiasma process with at most two chiasmata in each interval, we may focus on the study of chiasma processes with at most two chiasmata in each interval. Using the notation in Table 5, we have the following proposition, whose proof is given in Appendix 1 (Proposition 2): under NCI, for any joint ordered tetrad probabilities with two markers on different sides of the centromere, there is an underlying chiasma process that is compatible with these probabilities if and only if $\alpha \geq \beta$ and $\gamma + \delta \geq 2\beta$ and the ratios of p(P2):p(T2):p(N2) are 1:2:1.

Using $\alpha$, $\beta$, $\gamma$, $\delta$, and $\epsilon$, we may express the probabilities that $\mathcal{A}$ and $\mathcal{B}$ show P, T, and N as $p(P) = \alpha + \epsilon/4$, $p(T) = \gamma + \delta + \epsilon/2$, and $p(N) = \beta + \epsilon/4$, respectively. In the unordered tetrad case, the constraints imposed by NCI are $p(P) \geq p(N)$ and $p(T) \geq 2p(N)$ (ZHAO et al. 1995a). Substituting $\alpha$, $\beta$, $\gamma$, $\delta$, and $\epsilon$ in these two inequalities, we get $\alpha \geq \beta$ and $\gamma + \delta \geq 2\beta$. Therefore, for markers on different sides of the centromere, the only extra constraints added by ordered tetrads are the 1:2:1 proportionality constraints among p(P2), p(T2), and p(N2).
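The proposition for markers on different sides of the centromere translates into a simple check on a vector of seven group probabilities. The sketch below is illustrative only: the dictionary keys follow the group labels of Table 3, the tolerance `tol` is an arbitrary choice for floating-point comparisons, and the check applies to probabilities rather than to observed counts, so it is not a statistical test.

```python
def nci_compatible_opposite_sides(p, tol=1e-9):
    """Check the NCI constraints for two markers flanking the centromere.

    p: dict with keys P1, N1, T12, T21, P2, T2, N2 (group probabilities).
    """
    alpha, beta = p["P1"], p["N1"]
    gamma, delta = p["T12"], p["T21"]
    # Equality constraints: p(P2) : p(T2) : p(N2) = 1 : 2 : 1.
    ratio_ok = (abs(p["T2"] - 2 * p["P2"]) <= tol
                and abs(p["N2"] - p["P2"]) <= tol)
    # Inequality constraints inherited from the unordered-tetrad case.
    return (alpha >= beta - tol
            and gamma + delta >= 2 * beta - tol
            and ratio_ok)


example = {"P1": 0.4, "N1": 0.05, "T12": 0.2, "T21": 0.15,
           "P2": 0.05, "T2": 0.1, "N2": 0.05}
print(nci_compatible_opposite_sides(example))
```

With real data the same quantities would be estimated from counts, and a formal test would have to account for sampling variation.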

Two markers on the same side of the centromere (CEN–$\mathcal{A}$–$\mathcal{B}$; the case of CEN–$\mathcal{B}$–$\mathcal{A}$ can be discussed similarly): As in the previous discussion, we use (i, j) to denote the nine distinct groups if the centromere can be treated as a marker. Because the centromere cannot be observed, (0, 0) and (2, 0) cannot be distinguished. Both (0, 0) and (2, 0) show FDS at $\mathcal{A}$ and parental ditype between $\mathcal{A}$ and $\mathcal{B}$. Therefore, we have $p(P1) = p_{00} + p_{20}$. Similarly, $p(N1) = p_{02} + p_{22}$, $p(T12) = p_{01} + p_{21}$, $p(P2) = p_{10}$, $p(N2) = p_{12}$, and $p(T21) = p(T2) = p_{11}/2$. Therefore, these seven groups can have at most six different probabilities. Each of these 2 × 3 types can be represented as $(i_1 i_2)$, with $i_1 = 0$ or 1 corresponding to FDS or SDS at $\mathcal{A}$, and $i_2 = 0$, 1, or 2 corresponding to P, T, or N between $\mathcal{A}$ and $\mathcal{B}$. Denote these probabilities by $\alpha = p(P1)$, $\beta = p(N1)$, $\gamma = p(T12)$, $\delta = p(P2)$, $\epsilon = p(N2)$, and $\phi = 2p(T21) = 2p(T2)$ (Table 6). It can be shown, as in ZHAO et al. (1995a), that for joint tetrad probabilities with two markers on the same side of the centromere, there is a compatible chiasma process under NCI if and only if $\alpha \geq \beta$, $\gamma \geq 2\beta$, $\delta \geq \epsilon$, $\phi \geq 2\epsilon$, and p(T21) = p(T2).
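For markers on the same side of the centromere, the grouping of the nine (i, j) classes and the resulting constraints can be sketched as follows. The 3 × 3 input layout (row index for the centromere-proximal interval, column index for the interval between the two markers), the function names, and the tolerance are illustrative assumptions; the equal split of the (1, 1) class between T21 and T2 follows the relation p(T21) = p(T2) stated above.

```python
def same_side_groups(p):
    """Collapse p[i][j] (i, j in 0..2 for P, T, N) into the seven groups."""
    return {
        "P1": p[0][0] + p[2][0],
        "N1": p[0][2] + p[2][2],
        "T12": p[0][1] + p[2][1],
        "P2": p[1][0],
        "N2": p[1][2],
        "T21": p[1][1] / 2,
        "T2": p[1][1] / 2,
    }


def nci_compatible_same_side(g, tol=1e-9):
    """Check the five NCI constraints for two markers on the same arm."""
    alpha, beta, gamma = g["P1"], g["N1"], g["T12"]
    delta, eps = g["P2"], g["N2"]
    phi = g["T21"] + g["T2"]  # equals 2 p(T21) when the equality holds
    return (alpha >= beta - tol and gamma >= 2 * beta - tol
            and delta >= eps - tol and phi >= 2 * eps - tol
            and abs(g["T21"] - g["T2"]) <= tol)


joint = [[0.5, 0.1, 0.0], [0.2, 0.1, 0.0], [0.05, 0.05, 0.0]]
groups = same_side_groups(joint)
print(groups, nci_compatible_same_side(groups))
```

The example matrix is an arbitrary illustration, not data from the article.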


 
Table 6. The probabilities of seven groups when $\mathcal{A}$ and $\mathcal{B}$ are on one side of the centromere and in the order of CEN–$\mathcal{A}$–$\mathcal{B}$

No chromatid interference (multiple markers):
Here we consider only markers on the same chromosome. Markers on the same side of the centromere and markers on different sides of the centromere are discussed separately.

Markers on the same side of the centromere: Under the assumption of NCI, there are $2 \times 3^{n-1}$ distinct probabilities for n markers in the order of CEN–$\mathcal{A}_1$–$\mathcal{A}_2$– · · · –$\mathcal{A}_n$. Each of these $2 \times 3^{n-1}$ classes can be identified as follows: FDS and SDS are distinguished at $\mathcal{A}_1$, and for each pair of consecutive markers, there are three types: P, T, and N. Each of these $2 \times 3^{n-1}$ types can be represented as $(i_1 i_2 \cdots i_n)$, where $i_1 = 0$ or 1 corresponding to FDS or SDS at $\mathcal{A}_1$, and $i_r = 0$, 1, or 2 corresponding to P, T, or N between $\mathcal{A}_{r-1}$ and $\mathcal{A}_r$, for r = 2, ... , n. For unordered tetrads with n markers (n - 1 intervals), there are $3^{n-1}$ distinct probabilities because FDS and SDS cannot be differentiated at $\mathcal{A}_1$; that is, $i_1$ cannot be determined. Write the probabilities of the observable patterns $(i_2 \ldots i_n)$ from unordered tetrads, denoted by $p^t_{i_2 \ldots i_n}$, in lexicographical order as $p^t$. It was shown that there is an underlying chiasma process satisfying NCI compatible with unordered tetrad probabilities if and only if $T_{n-1}^{-1} p^t \geq 0$, where $T_{n-1} = T_1 \otimes \cdots \otimes T_1$ (n - 1 terms), plus equality constraints described in ZHAO et al. (1995a).

In our discussion of multiple marker ordered tetrad data, the $p^t_{0 i_2 \ldots i_n}$ and $p^t_{1 i_2 \ldots i_n}$ are considered separately. Write the $p^t_{0 i_2 \ldots i_n}$ in lexicographical order as $p^t_0$, and the $p^t_{1 i_2 \ldots i_n}$ in lexicographical order as $p^t_1$. If for a given $(0 i_2 \ldots i_n)$, there are $k \geq 2$ tetratypes in the n - 1 intervals, these tetratype combinations may be subdivided further into $4^{k-1}$ subcells as follows. First, the strands can be labeled such that strands 1 and 3 always show recombination between the two markers that have tetratype closest to the centromere. For the other k - 1 intervals showing tetratype, recombinations can occur on four possible pairs of strands. Therefore, there are $4^{k-1}$ subtypes. The probability of each subcell can be denoted by $p_{0 i_2 \ldots i_n}(h_1, \ldots, h_{k-1})$, where each $h_j$ is 1, 2, 3, or 4.

If for a given $(1 i_2 \ldots i_n)$, there are $k \geq 1$ tetratypes in the n - 1 intervals, these tetratype combinations may be subdivided further into $2 \times 4^{k-1}$ subcells as follows. Suppose the first pair of markers showing tetratype from the centromere is $\mathcal{A}_{r-1}$ and $\mathcal{A}_r$. Marker $\mathcal{A}_{r-1}$ must show SDS, because otherwise there must be a tetratype before $\mathcal{A}_{r-1}$. Marker $\mathcal{A}_r$ can show either FDS or SDS. Two types can thus be distinguished, depending on whether $\mathcal{A}_r$ shows FDS or SDS. The strands can be labeled such that strands 1 and 3 always show recombination between $\mathcal{A}_{r-1}$ and $\mathcal{A}_r$. For the other k - 1 intervals showing tetratype, recombinations can occur on four possible pairs of strands. The probability of each subcell can be denoted by $p_{1 i_2 \ldots i_n}(h_0, h_1, \ldots, h_{k-1})$, where $h_0$ is 0 or 1 if $\mathcal{A}_r$ shows FDS or SDS, and each $h_j$ is 1, 2, 3, or 4 for $j \geq 1$. Using arguments similar to those in ZHAO et al. (1995a), it can be shown that there is an underlying chiasma process satisfying NCI compatible with $p^t_0$ and $p^t_1$ if and only if $T_{n-1}^{-1} p^t_0 \geq 0$, $T_{n-1}^{-1} p^t_1 \geq 0$, all the subcell probabilities $p^t_{0 i_2 \ldots i_n}(h_1, \ldots, h_r)$ in a cell $i_2 \ldots i_n$ with $i_r = 1$ for more than one r are equal, and all the subcell probabilities $p^t_{1 i_2 \ldots i_n}(h_0, h_1, \ldots, h_r)$ in a cell $i_2 \ldots i_n$ with $i_r = 1$ for one or more r are equal.

Markers on different sides of the centromere: Consider markers on one side of the centromere in the order of CEN–$\mathcal{A}_1$–$\mathcal{A}_2$– · · · –$\mathcal{A}_{n_1}$ and markers on the other side in the order of CEN–$\mathcal{B}_1$–$\mathcal{B}_2$– · · · –$\mathcal{B}_{n_2}$. If the centromere could be observed, any joint tetrad pattern could be represented by $(i_1 i_2 \ldots i_{n_1}; j_1 j_2 \ldots j_{n_2})$, where $i_r = 0$, 1, or 2 corresponding to P, T, or N between $\mathcal{A}_{r-1}$ and $\mathcal{A}_r$, $j_s = 0$, 1, or 2 corresponding to P, T, or N between $\mathcal{B}_{s-1}$ and $\mathcal{B}_s$, and $\mathcal{A}_0$ and $\mathcal{B}_0$ both denote the same centromere. Because the centromere is not observable, and both $(0 i_2 \ldots i_{n_1}; 0 j_2 \ldots j_{n_2})$ and $(2 i_2 \ldots i_{n_1}; 2 j_2 \ldots j_{n_2})$ show FDS at $\mathcal{A}_1$ and $\mathcal{B}_1$ and parental ditype between $\mathcal{A}_1$ and $\mathcal{B}_1$, they are not distinguishable. Similarly, $(0 i_2 \ldots i_{n_1}; 1 j_2 \ldots j_{n_2})$ is not distinguishable from $(2 i_2 \ldots i_{n_1}; 1 j_2 \ldots j_{n_2})$, $(1 i_2 \ldots i_{n_1}; 0 j_2 \ldots j_{n_2})$ is not distinguishable from $(1 i_2 \ldots i_{n_1}; 2 j_2 \ldots j_{n_2})$, and $(0 i_2 \ldots i_{n_1}; 2 j_2 \ldots j_{n_2})$ is not distinguishable from $(2 i_2 \ldots i_{n_1}; 0 j_2 \ldots j_{n_2})$. For SDS at both $\mathcal{A}_1$ and $\mathcal{B}_1$, that is, tetratype in the intervals $\mathcal{A}_1$–CEN and CEN–$\mathcal{B}_1$, there are three distinguishable types based on the configuration between $\mathcal{A}_1$ and $\mathcal{B}_1$: P, T, or N.

We combine tetrad types having P between $\mathcal{A}_1$ and $\mathcal{B}_1$, namely $(0 i_2 \ldots i_{n_1}; 0 j_2 \ldots j_{n_2})$, $(2 i_2 \ldots i_{n_1}; 2 j_2 \ldots j_{n_2})$, and the one of the three types of $(1 i_2 \ldots i_{n_1}; 1 j_2 \ldots j_{n_2})$ showing P between $\mathcal{A}_1$ and $\mathcal{B}_1$, and denote the grouped type by $(P; i_2 \ldots i_{n_1}; j_2 \ldots j_{n_2})$. Similarly, we obtain new grouped types, $(T; i_2 \ldots i_{n_1}; j_2 \ldots j_{n_2})$ and $(N; i_2 \ldots i_{n_1}; j_2 \ldots j_{n_2})$, where the tetrad types between $\mathcal{A}_1$ and $\mathcal{B}_1$ are T and N. It can be shown that the inequality constraints imposed by NCI on ordered tetrads are the same as the inequality constraints imposed on unordered tetrads applied to the above new grouped types. In the new grouped types, FDS or SDS information is ignored at $\mathcal{A}_1$ and $\mathcal{B}_1$. The equality constraints can be established but are more complex; we omit the details here.

Genetic mapping (one marker):
The probabilities of FDS and SDS at a marker $\mathcal{A}$ can be related to the map distance between CEN and $\mathcal{A}$ if a chiasma process model is specified. We study several chiasma models and compare various map functions derived from these models and map functions proposed in the literature. Note that centromeres can be mapped using other types of data. When markers at the centromere are available, the centromere can be treated as a marker and standard mapping procedures can be used to map centromeres (FERGUSON-SMITH et al. 1975). For unordered tetrads, centromeres can be mapped with three markers on three chromosomes (PERKINS 1949).

Complete interference model: If there is at most one chiasma between CEN and $\mathcal{A}$, let $c_0$ and $c_1$ denote the probabilities of having 0 and 1 chiasma; then, $F = p(\mathrm{FDS}) = c_0$ and $S = p(\mathrm{SDS}) = c_1$. The map distance d between CEN and $\mathcal{A}$ is $c_1/2$. Therefore, F = 1 - 2d and S = 2d.

If more than one chiasma is allowed, map distance d cannot be estimated from F and S unless the chiasma process is fully specified with the map distance as the only unknown parameter.

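Poisson model: The most widely used chiasma process model is the Poisson process, which imposes no chiasma interference. In this model, the probability of k chiasmata between CEN and $\mathcal{A}$ is $e^{-2d}(2d)^k/k!$. Therefore, from (1),

$$F = \frac{1}{3} + \frac{2}{3}e^{-3d} \quad\text{and}\quad S = \frac{2}{3}\left(1 - e^{-3d}\right). \tag{3}$$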

Under the complete interference model, the SDS proportion is twice the map distance. Under the Poisson model, which imposes no chiasma interference, the SDS proportion will never exceed 2/3. Therefore, for ordered tetrad data, the presence of chiasma interference can be shown with just a single marker if NCI is assumed and the observed SDS proportion is significantly above 2/3. In many organisms, the SDS proportion was observed to be larger than 2/3 for some markers (WEINSTEIN 1936; BARRATT et al. 1954; PERKINS 1962; DEKA et al. 1990). On the other hand, for many markers, the observed SDS proportion is less than twice the map distance, especially for markers far from the centromere, indicating less than complete interference.
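The two limiting map functions can be compared numerically. The sketch below assumes the Poisson closed form $S = \frac{2}{3}(1 - e^{-3d})$, checks it against the direct sum of Poisson chiasma-count probabilities weighted by $S(k) = \frac{2}{3}[1 - (-\frac{1}{2})^k]$, and inverts it to recover d from an SDS proportion below 2/3. Distances are in Morgans, and all function names are illustrative.

```python
import math


def sds_complete_interference(d):
    """S = 2d under complete interference (meaningful for 0 <= d <= 0.5)."""
    return 2 * d


def sds_poisson(d):
    """Closed-form SDS proportion under the Poisson (no-interference) model."""
    return (2 / 3) * (1 - math.exp(-3 * d))


def sds_poisson_series(d, kmax=60):
    """Same quantity as a truncated sum over the number of chiasmata k."""
    total = 0.0
    for k in range(kmax + 1):
        pk = math.exp(-2 * d) * (2 * d) ** k / math.factorial(k)
        total += pk * (2 / 3) * (1 - (-0.5) ** k)
    return total


def distance_from_sds_poisson(s):
    """Invert the Poisson map function; defined only for 0 <= s < 2/3."""
    if not 0 <= s < 2 / 3:
        raise ValueError("SDS proportion must lie in [0, 2/3) under this model")
    return -math.log(1 - 1.5 * s) / 3


for d in (0.05, 0.2, 0.5):
    print(d, sds_complete_interference(d), sds_poisson(d), sds_poisson_series(d))
```

The inversion raises an error for SDS proportions of 2/3 or more, which mirrors the point made above that such values are not attainable without chiasma interference.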

There are several proposals in the literature to incorporate chiasma interference in relating the map distance and the SDS proportion. The earliest one appears to be the model proposed by BARRATT et al. (1954). In their model, the probability of having $r \geq 1$ chiasmata is

(4)
where

Map distances and SDS proportions can be expressed in terms of x and $\alpha$. BARRATT et al. (1954) used k instead of $\alpha$ in (4). To avoid confusion with other notation in this article, $\alpha$ is used in the following discussion. BARRATT et al. (1954) found that $\alpha$ between 0.2 and 0.3 provided good fit to Drosophila and Neurospora data.

After trying out many candidates for simple map functions for SDS proportions, OTT et al. (1976) found that, for SDS proportions between 0 and 0.6, the function S = sin(3d) was in excellent agreement with the empirical data in PERKINS (1962).

On the basis of a map function relating the map distance d and the recombination fraction $\theta$ between two markers proposed by RAO et al. (1977),

MORTON et al. (1990) proposed the map function $S = 3\theta - d$.

Here we will compare these map functions with map functions derived from the chi-square chiasma interference models. The chi-square model, first introduced by FISHER et al. (1947), was suggested as a plausible biological model by FOSS et al. (1993), although there are now doubts concerning the appropriateness of this motivation (FOSS and STAHL 1995). In the FOSS et al. (1993) study, the model is represented in the form of Cx(Co)^m, as follows: assume the crossover intermediates are randomly distributed along the four-strand bundle, and every intermediate resolves either as a crossover (Cx) or not (Co). When an intermediate resolves as a Cx, the next m intermediates must resolve as Co's, and after m Co's the next intermediate must resolve as a Cx. The process is made stationary by allowing the leftmost crossover intermediate an equal chance to be any one of the m + 1 elements of Cx(Co)^m. The chi-square model was found to provide good fit to data from different organisms (ZHAO et al. 1995b). One nice property of the chi-square model is that the probability of any ordered tetrad pattern has a closed-form expression, thus facilitating genetic data analysis under this model.

Let p = m + 1, and define $D_k(y)$ to be the $p \times p$ matrix whose (i, j)th entry is $d_k(ij) = e^{-y} y^{pk+j-i}/(pk+j-i)!$ if $pk + j - i \geq 0$, and $d_k(ij) = 0$ otherwise. Let $\mathbf{1} = (1, 1, \ldots, 1)'$ and $\alpha = (1/p)\mathbf{1}'$. For an interval defined by parameter y, the map distance d is y/2p because (1) the average number of crossover intermediates between these two markers is y; (2) one out of every p = m + 1 intermediates resolves as a crossover; and (3) each strand has a chance of 1/2 of being involved in each crossover. Therefore, a given strand is involved in a crossover for every 2p crossover intermediates. The probability of having k chiasmata between two markers is $c_k = \alpha D_k \mathbf{1}$ (ZHAO et al. 1995b). For the simple case of m = 1,

Therefore, from (1),

where d is related to y by d = y/4 from the above discussion. The expressions of F(y) and S(y) are more complicated for m > 1. Map functions relating the SDS proportion and the map distance for different m's are plotted in Figure 1. Note that m = 0 corresponds to the no-interference model, that is, the Poisson model. Under the no-interference model, the SDS proportion never goes above 2/3. For m > 0, the SDS proportion rises above 2/3. As m increases, the maximal value of S increases, and it is achieved at smaller d. For m > 0, there is no one-to-one correspondence between S and d. Therefore, the centromere cannot be uniquely mapped when the SDS proportion is larger than 0.6, and chiasma interference cannot be ruled out.
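The behaviour described for different m can be explored numerically without the closed-form expressions. The sketch below rests on two assumptions: that the (i, j)th entry of $D_k(y)$ is the Poisson probability $e^{-y} y^{n}/n!$ with $n = pk + j - i$ (zero when n is negative), and that $c_k = \alpha D_k \mathbf{1}$ is the average over rows of the row sums of $D_k(y)$. The truncation point `kmax` and all function names are illustrative choices.

```python
import math


def poisson_pmf(n, y):
    """Poisson probability of n events with mean y, computed on the log scale."""
    if y == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(y) - y - math.lgamma(n + 1))


def chiasma_count_probs(m, d, kmax=40):
    """c_k for k = 0..kmax under the chi-square model with parameter m."""
    p = m + 1
    y = 2 * p * d  # map distance d = y / (2p)
    probs = []
    for k in range(kmax + 1):
        total = 0.0
        for i in range(p):
            for j in range(p):
                n = p * k + j - i
                if n >= 0:
                    total += poisson_pmf(n, y)
        probs.append(total / p)  # alpha = (1/p) * (1, ..., 1)
    return probs


def sds_chi_square(m, d, kmax=40):
    """SDS proportion: sum over k of c_k * (2/3) * (1 - (-1/2)**k)."""
    return sum(ck * (2 / 3) * (1 - (-0.5) ** k)
               for k, ck in enumerate(chiasma_count_probs(m, d, kmax)))


for m in (0, 1, 2):
    curve = [sds_chi_square(m, x / 100) for x in range(0, 201)]
    print(m, max(curve), curve.index(max(curve)) / 100)
```

Each printed line gives m, the largest SDS proportion found on a grid of map distances from 0 to 2 Morgans, and the distance at which it occurs, which is one way to look at the trend summarized for Figure 1.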



Figure 1. Map function relating the map distance between the centromere and the marker to the second division segregation proportion under different chi-square models. The upper limit for the frequency of SDS under the Poisson model, y = 2/3, is also plotted in the figure.

To compare map functions proposed in the literature, we plot different map functions in Figure 2. The map functions presented are: (1) the map function under the complete-interference model, (2) the map function under the no-interference model, (3) the map function proposed by BARRATT et al. (1954) with $\alpha = 0.3$, (4) the map function proposed by OTT et al. (1976), (5) the map function proposed by MORTON et al. (1990) with p = 0.40, and (6) the map function under the Cx(Co)^2 model. It is clear from Figure 2 that all map functions, except those under the complete-interference model and the no-interference model, agree with each other fairly well for S up to 2/3. Therefore, the map functions proposed in the literature can be well approximated by the map functions under the Cx(Co)^2 model. In the context of single-spore data, it was also found that map functions under the chi-square model can approximate most map functions in the literature (ZHAO and SPEED 1996). Therefore, the chi-square model is a good candidate for multilocus analysis of ordered tetrad data.



Figure 2. Comparison of different map functions proposed in the literature. The upper limit for the frequency of SDS under the Poisson model, y = 2/3, is also plotted in the figure.

Genetic mapping (two markers):
From two-marker ordered tetrad data, the map distances among two markers and the centromere can be estimated for a given chiasma process model. Here we derive joint ordered tetrad probabilities under the chi-square model. A special case of the chi-square model, the Poisson model, is studied separately, because joint tetrad probabilities can be expressed rather easily under this model. We consider markers on different sides of the centromere and markers on the same side of the centromere in turn.

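Markers on different sides of the centromere (Poisson model): For a Poisson chiasma process, if the map distance between CEN and $\mathcal{A}$ is $d_1$, and if the centromere could be observed, $p_0(d_1) = \frac{1}{6}(1 + 2e^{-3d_1} + 3e^{-2d_1})$, $p_1(d_1) = \frac{1}{3}(2 - 2e^{-3d_1})$, and $p_2(d_1) = \frac{1}{6}(1 + 2e^{-3d_1} - 3e^{-2d_1})$, where $p_0(d_1)$, $p_1(d_1)$, and $p_2(d_1)$ are the probabilities of P, T, and N between CEN and $\mathcal{A}$, respectively (HALDANE 1931). If the map distance between CEN and $\mathcal{B}$ is $d_2$, similarly we obtain $p_0(d_2)$, $p_1(d_2)$, and $p_2(d_2)$, the probabilities of P, T, and N between CEN and $\mathcal{B}$. The joint tetrad probability $p_{ij}$ for type (ij), where i, j = 0, 1, or 2, is $p_i(d_1)p_j(d_2)$. Therefore, the five probabilities in Table 5 are:

$$\begin{aligned}
\alpha &= p_0(d_1)p_0(d_2) + p_2(d_1)p_2(d_2), \\
\beta &= p_0(d_1)p_2(d_2) + p_2(d_1)p_0(d_2), \\
\gamma &= [p_0(d_1) + p_2(d_1)]\,p_1(d_2), \\
\delta &= p_1(d_1)\,[p_0(d_2) + p_2(d_2)], \\
\epsilon &= p_1(d_1)p_1(d_2).
\end{aligned}$$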
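As a numerical companion to the Poisson case, the sketch below evaluates the single-interval probabilities of P, T, and N, assuming the normalizing factors 1/6, 1/3, and 1/6 (these make the three values sum to 1 and give P = 1 at d = 0), and combines the two arms into the five probabilities of Table 5 using the grouping $\alpha = p_{00} + p_{22}$, $\beta = p_{02} + p_{20}$, $\gamma = p_{01} + p_{21}$, $\delta = p_{10} + p_{12}$, $\epsilon = p_{11}$. Function names are illustrative, and distances are in Morgans.

```python
import math


def ptn_poisson(d):
    """(P, T, N) probabilities across one interval of length d, no interference."""
    e2, e3 = math.exp(-2 * d), math.exp(-3 * d)
    return ((1 + 2 * e3 + 3 * e2) / 6,
            (2 - 2 * e3) / 3,
            (1 + 2 * e3 - 3 * e2) / 6)


def five_probs_opposite_sides(d1, d2):
    """Table 5 probabilities for markers at distances d1 and d2 from CEN."""
    a, b = ptn_poisson(d1), ptn_poisson(d2)
    return {
        "alpha": a[0] * b[0] + a[2] * b[2],    # P1
        "beta": a[0] * b[2] + a[2] * b[0],     # N1
        "gamma": (a[0] + a[2]) * b[1],         # T12
        "delta": a[1] * (b[0] + b[2]),         # T21
        "epsilon": a[1] * b[1],                # P2 + T2 + N2
    }


print(five_probs_opposite_sides(0.1, 0.25))
```

Because the two arms are treated as independent under this model, the five values are products and sums of single-arm terms and always add up to 1.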

Markers on different sides of the centromere (chi-square model): The chi-square model $C_x(C_o)^m$ assumes that the chiasma process is stationary. This model has been applied mostly to markers on the same side of the centromere. Because the chiasma interference pattern may differ across the centromere, two chi-square models, each starting from the centromere and running toward a different telomere, may be necessary to model the chiasma process. In general, we may model interference across the centromere by relating the two most proximal crossover intermediates on the two sides of the centromere. For example, we may assign the most proximal crossover intermediates on both sides of the centromere as the first $C_o$ after a $C_x$, thus inducing higher chiasma interference in the centromere region than in other regions. Or we may assign these crossover intermediates as the $m$th $C_o$ after a $C_x$, which induces lower chiasma interference in the centromere region than in other regions. For simplicity, in this discussion we assume that, starting from the centromere, there are two stationary chiasma processes on the two arms of the chromosome. In this case, there is no chiasma interference between the two arms.

For a given marker, if the centromeres from the two parents could be distinguished, the probabilities $p_0$, $p_1$, and $p_2$ of P, T, or N between CEN and the marker can be evaluated as follows. Let $D_k(y)$ be as defined above; the probability of having $k$ chiasmata between CEN and the marker is $c_k = \alpha D_k(y)\mathbf{1}$. Define $P(y) = \sum_{k=0}^{\infty} p_{k0}D_k(y)$, $T(y) = \sum_{k=0}^{\infty} p_{k1}D_k(y)$, and $N(y) = \sum_{k=0}^{\infty} p_{k2}D_k(y)$, where $p_{k0}$, $p_{k1}$, and $p_{k2}$ were defined in (2). Then $p_0 = \alpha P(y)\mathbf{1}$, $p_1 = \alpha T(y)\mathbf{1}$, and $p_2 = \alpha N(y)\mathbf{1}$. The relation between the map distance $d$ and the parameter $y$ is $d$ = . Using these results, $p_0(d_1)$, $p_1(d_1)$, and $p_2(d_1)$ can be obtained for the first marker. Similarly, the probabilities of P, T, or N between CEN and the second marker, $p_0(d_2)$, $p_1(d_2)$, and $p_2(d_2)$, can be evaluated. The joint tetrad probability $p_{ij}$ is $p_i(d_1)p_j(d_2)$. Therefore, the five probabilities can be obtained as in the Poisson model.

When m = 1, it can be shown that

and

where d = . Even for this simple model, the analytical forms are not so simple. No general results for arbitrary m are presented in this article.

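Markers on the same side of the centromere (Poisson model): For a Poisson chiasma process, if the map distance between CEN and the proximal marker is $d_1$, as shown in Equation 3, then $F(d_1) = \frac{1}{3}(1 + 2e^{-3d_1})$ and $S(d_1) = \frac{2}{3}(1 - e^{-3d_1})$. If the map distance between the proximal marker and the distal marker is $d_2$, then $p_0(d_2) = \frac{1}{6}(1 + 2e^{-3d_2} + 3e^{-2d_2})$, $p_1(d_2) = \frac{2}{3}(1 - e^{-3d_2})$, and $p_2(d_2) = \frac{1}{6}(1 + 2e^{-3d_2} - 3e^{-2d_2})$, where $p_0(d_2)$, $p_1(d_2)$, and $p_2(d_2)$ are the probabilities of P, T, and N between the two markers, respectively. The six probabilities in Table 6 are the products of $F(d_1)$ or $S(d_1)$ with $p_0(d_2)$, $p_1(d_2)$, or $p_2(d_2)$.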

Markers on the same side of the centromere (chi-square model): Under the chi-square model, the joint probability $c_{k\ell}$ of having $k$ and $\ell$ chiasmata in the interval between CEN and the proximal marker and in the interval between the two markers is $\alpha D_k(y_1)D_\ell(y_2)\mathbf{1}$, where $\alpha$, $D_k(y)$, and $\mathbf{1}$ were defined above (ZHAO et al. 1995b). For joint tetrad type $(i_1 i_2)$,

$p_{i_1 i_2} = \sum_{k=0}^{\infty}\sum_{\ell=0}^{\infty} p_{k i_1}\, p_{\ell i_2}\, c_{k\ell}$,

where $p_{k i_1}$ is the conditional probability for FDS ($i_1 = 0$) or SDS ($i_1 = 1$) defined in Equation 1, and $p_{\ell i_2}$ is the conditional tetrad type probability defined in Equation 2. Define $F(y) = \sum_{k=0}^{\infty}\left[\frac{1}{3} + \frac{2}{3}\left(-\frac{1}{2}\right)^k\right]D_k(y)$ and $S(y) = \sum_{k=0}^{\infty}\left[\frac{2}{3}\left(1 - \left(-\frac{1}{2}\right)^k\right)\right]D_k(y)$. For any joint tetrad pattern $(i_1 i_2)$, $p_{i_1 i_2} = \alpha M_1(y_1)M_2(y_2)\mathbf{1}$, where $M_1(y_1) = F(y_1)$ or $S(y_1)$ when $i_1 = 0$ or 1, and $M_2(y_2) = P(y_2)$, $T(y_2)$, or $N(y_2)$ when $i_2 = 0$, 1, or 2. The matrices $P(y_2)$, $T(y_2)$, and $N(y_2)$ were defined above. Explicit expressions for $F(y)$, $S(y)$, $P(y)$, $T(y)$, and $N(y)$ were obtained in the previous discussion under the $C_xC_o$ model.
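As a consistency check, the definitions of $F(y)$ and $S(y)$ can be evaluated in the Poisson special case. The derivation below is a sketch under two assumptions that are not stated in this section: that for $m = 0$ the matrix $D_k(y)$ reduces to the scalar Poisson probability $e^{-y}y^k/k!$ (with $\alpha = \mathbf{1} = 1$), and that $y = 2d$, the mean number of chiasmata on the four-strand bundle for an interval of $d$ Morgans.

```latex
\begin{align*}
F(y) &= \sum_{k=0}^{\infty}\left[\tfrac{1}{3} + \tfrac{2}{3}\left(-\tfrac{1}{2}\right)^{k}\right]\frac{e^{-y}y^{k}}{k!}
      = \tfrac{1}{3} + \tfrac{2}{3}\,e^{-y}\sum_{k=0}^{\infty}\frac{(-y/2)^{k}}{k!}
      = \tfrac{1}{3} + \tfrac{2}{3}\,e^{-3y/2},\\
S(y) &= 1 - F(y) = \tfrac{2}{3}\left(1 - e^{-3y/2}\right).\\
\intertext{Substituting $y = 2d$:}
F &= \tfrac{1}{3}\left(1 + 2e^{-3d}\right), \qquad
S = \tfrac{2}{3}\left(1 - e^{-3d}\right).
\end{align*}
```

Under these assumptions the result coincides with the Poisson-model expressions for FDS and SDS, and $S$ approaches 2/3 as $d$ grows.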

Genetic mapping (multiple markers):
As before, markers on the same side of the centromere and on different sides of the centromere are considered separately.

Markers on the same side of the centromere: Consider $n$ markers, labeled 1, 2, ..., $n$, in the order CEN–1–2– · · · –$n$. Under NCI, there are $2 \times 3^{n-1}$ different probabilities corresponding to patterns $(i_1 i_2 \ldots i_n)$. These $2 \times 3^{n-1}$ types were mentioned in the discussion of the NCI assumption for the multiple-marker case. Denote the map distance between markers $r-1$ and $r$ by $d_r$, where marker 0 is the centromere.

For a Poisson chiasma process, from the previous discussion, $F_1(d_1) = \frac{1}{3}(1 + 2e^{-3d_1})$, $S_1(d_1) = \frac{2}{3}(1 - e^{-3d_1})$, $p_0(d_r) = \frac{1}{6}(1 + 2e^{-3d_r} + 3e^{-2d_r})$, $p_1(d_r) = \frac{2}{3}(1 - e^{-3d_r})$, and $p_2(d_r) = \frac{1}{6}(1 + 2e^{-3d_r} - 3e^{-2d_r})$. The probability of tetrad pattern $(i_1 i_2 \ldots i_n)$ is $f \times \prod_{r=2}^{n} p_{i_r}(d_r)$, where $f$ is $F_1(d_1)$ or $S_1(d_1)$ when $i_1 = 0$ or 1.
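The product form for the Poisson model is easy to evaluate directly. The Python sketch below is an illustration only; the function names are hypothetical and distances are assumed to be in Morgans. It computes the probability of a pattern $(i_1 i_2 \ldots i_n)$ and enumerates all $2 \times 3^{n-1}$ patterns so that their probabilities can be checked to sum to 1.

```python
import math
from itertools import product


def first_interval_probs(d1):
    """(F, S): FDS and SDS probabilities for the marker nearest CEN."""
    e3 = math.exp(-3.0 * d1)
    return ((1.0 + 2.0 * e3) / 3.0, 2.0 * (1.0 - e3) / 3.0)


def tetrad_type_probs(d):
    """(p0, p1, p2): probabilities of P, T, N for an interval of length d."""
    e3, e2 = math.exp(-3.0 * d), math.exp(-2.0 * d)
    return ((1.0 + 2.0 * e3 + 3.0 * e2) / 6.0,
            2.0 * (1.0 - e3) / 3.0,
            (1.0 + 2.0 * e3 - 3.0 * e2) / 6.0)


def pattern_prob(pattern, distances):
    """Probability of pattern (i1, ..., in) for distances (d1, ..., dn).

    i1 is 0 (FDS) or 1 (SDS); each later ir is 0 (P), 1 (T), or 2 (N).
    """
    if len(pattern) != len(distances):
        raise ValueError("pattern and distances must have the same length")
    prob = first_interval_probs(distances[0])[pattern[0]]
    for i_r, d_r in zip(pattern[1:], distances[1:]):
        prob *= tetrad_type_probs(d_r)[i_r]
    return prob


def all_patterns(n):
    """All 2 * 3**(n - 1) patterns for n markers on one arm."""
    return [(i1,) + rest
            for i1 in (0, 1)
            for rest in product((0, 1, 2), repeat=n - 1)]
```

For three markers, `all_patterns(3)` yields 18 patterns, and summing `pattern_prob(p, (0.1, 0.2, 0.3))` over them gives 1 up to rounding.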

Under the chi-square model, define $F(y)$, $S(y)$, $P(y)$, $T(y)$, and $N(y)$ as above. The probability of tetrad pattern $(i_1 i_2 \ldots i_n)$ is $\alpha\left(\prod_{r=1}^{n} M_r\right)\mathbf{1}$, where $M_1 = F(y_1)$ or $S(y_1)$ for $i_1 = 0$ or 1, and $M_r = P(y_r)$, $T(y_r)$, or $N(y_r)$ for $i_r = 0$, 1, or 2 when $r \geq 2$. The parameter $y_r$ and the map distance $d_r$ are related by $d_r$ = .

Markers on different sides of the centromere: Consider $n_1$ markers on one arm and $n_2$ markers on the other arm, with the markers on each arm numbered 1, 2, ... outward from the centromere. If the two chiasma processes on different sides of the centromere are independent, we may first consider the case in which the centromere could be observed. For tetrad pattern $(i_1 i_2 \ldots i_{n_1})$ on CEN and markers 1, 2, ..., $n_1$ of the first arm, $p(i_1 i_2 \ldots i_{n_1}) = \alpha\left(\prod_{r=1}^{n_1} M_r\right)\mathbf{1}$, where $M_r = P(y_r)$, $T(y_r)$, or $N(y_r)$ for $i_r = 0$, 1, or 2. The map distance between markers $r-1$ and $r$ is $d_r$ = . For tetrad pattern $(j_1 j_2 \ldots j_{n_2})$ on CEN and markers 1, 2, ..., $n_2$ of the second arm, $p(j_1 j_2 \ldots j_{n_2}) = \alpha\left(\prod_{s=1}^{n_2} M_s\right)\mathbf{1}$, where $M_s = P(y_s)$, $T(y_s)$, or $N(y_s)$ for $j_s = 0$, 1, or 2. The map distance between markers $s-1$ and $s$ is $d_s$ = . Because the centromere is not observable, instead of $3^{n_1+n_2}$ probabilities, there are $5 \times 3^{n_1+n_2-2}$ distinct probabilities. These $5 \times 3^{n_1+n_2-2}$ distinct probabilities can be denoted by $(o; i_2 \ldots i_{n_1}; j_2 \ldots j_{n_2})$, where $o$ describes the two markers nearest the centromere, one on each arm: $o = 0$ corresponds to FDS at both markers with tetrad type P between them, $o = 1$ corresponds to FDS at both markers with tetrad type N between them, $o = 2$ and $o = 3$ correspond to the two cases with FDS at one of these markers and SDS at the other, and $o = 4$ corresponds to SDS at both markers. The probability of type $(o; i_2 \ldots i_{n_1}; j_2 \ldots j_{n_2})$ is


*  RESULTS

In this section, the methods developed and described in METHODS are used to find the order of a set of markers and to estimate map distances between the centromere and genetic markers.

Order markers under NCI (two markers):
As discussed above and summarized in Tables 4, 5, and 6, different orders of two markers impose different constraints among the probabilities of the seven groups in Table 3. These constraints can be used to order two markers. Data from HOWE (1956) are analyzed here to illustrate the procedure. The first pair of markers analyzed are and . The observed numbers of tetrads for the seven groups are shown in Table 7. It is clear that the data satisfy the constraints under the order CEN but not the constraints under other orders. Thus, the order CEN can be established. To make the inference more rigorous, the maximum likelihood estimates of the probabilities for the seven groups and the corresponding maximized likelihoods were calculated under the four possible orders: (1) CEN1, CEN2, (2) CEN, (3) CEN, and (4) CEN. It is straightforward to obtain the maximum likelihood estimates under order (1). For orders (2), (3), and (4), the estimates must satisfy linear inequality constraints among the seven probabilities, so an expectation maximization (EM) algorithm (DEMPSTER et al. 1977) was implemented to find them under each order. This algorithm is similar to the EM algorithm used in ZHAO et al. (1995a, p. 1061) in that it treats the unobserved chiasma frequencies as constituting the complete data. The details of this algorithm are provided in Appendix 1 (EM algorithm). The expected number of observations for each group and the maximized log-likelihood under each order are given in Table 7. Among the four orders, the order CEN yielded the largest maximized log-likelihood, thus establishing the order CEN. The second pair of markers analyzed are and . The observations as well as the expected values and maximized log-likelihoods under the four orders are summarized in Table 8. Comparing the maximized log-likelihoods under the four orders leads to the order CEN. The last pair of markers studied are and (Table 9). The data are consistent with and being on different chromosomes. PERKINS (1953) discussed the detection of linkage using unordered tetrads. With ordered tetrads, as can be seen from these examples, it is possible not only to detect linkage but also to order the two markers relative to the centromere. This results from the extra information in ordered tetrads.
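The comparison of maximized log-likelihoods across orders can be sketched as follows. This Python example is an illustration under stated assumptions, not the authors' procedure: it takes the expected counts under each order as already fitted (the EM step is not shown), it omits the multinomial coefficient, which is the same for every order, and the function names and order labels are hypothetical.

```python
import math


def multinomial_loglik(observed, expected):
    """Log-likelihood of observed group counts given expected counts.

    The expected counts are rescaled to probabilities; the constant
    multinomial coefficient is omitted.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    expected_total = float(sum(expected))
    loglik = 0.0
    for n_i, e_i in zip(observed, expected):
        if n_i == 0:
            continue  # an empty group contributes nothing
        if e_i <= 0:
            return float("-inf")  # observed a group the model rules out
        loglik += n_i * math.log(e_i / expected_total)
    return loglik


def best_order(observed, expected_by_order):
    """Return the label of the order with the largest log-likelihood."""
    return max(expected_by_order,
               key=lambda label: multinomial_loglik(
                   observed, expected_by_order[label]))
```

A usage example would pass the seven observed group counts together with a dictionary mapping each candidate order to its seven fitted expected counts; `best_order` then returns the label of the order that the log-likelihood comparison favors.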


 

 
Table 7. Observed and expected counts of seven groups for markers and under four possible orders for and


 

 
Table 8. Observed and expected counts of seven groups for markers and under four possible orders for and


 

 
Table 9. Observed and expected counts of seven groups for markers and under four possible orders for and

Order markers under NCI (three markers):
Consider three markers , , and . Under RSCA, there are 32 distinct probabilities (Appendix 1: Proposition 1). When , , and are on the same chromosome, there are a total of 12 possible orders among them: (1) CEN, (2) CEN, (3) CEN, (4) CEN, (5) CEN