Genetics, Vol. 159, 1339-1350, November 2001, Copyright © 2001

A Multivalent Pairing Model of Linkage Analysis in Autotetraploids

Samuel S. Wu (1, a), Rongling Wu (1, a), Chang-Xing Ma (a, b), Zhao-Bang Zeng (c), Mark C. K. Yang (a), and George Casella (a)
a Department of Statistics, University of Florida, Gainesville, Florida 32611,
b Department of Statistics, Nankai University, Tianjin 300071, People's Republic of China
c Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695

Corresponding author: Samuel S. Wu, Division of Biostatistics, P. O. Box 100212, University of Florida, Gainesville, FL 32610. E-mail: samwu{at}biostat.ufl.edu

Communicating editor: J. B. WALSH


*  ABSTRACT

Polyploidy has been recognized as an important step in the evolutionary diversification of flowering plants and may have a significant impact on plant breeding. Statistical analyses for linkage mapping in polyploid species can be difficult because of the considerable complexity of polysomic inheritance. In this article, we develop a novel statistical method for linkage analysis of polymorphic markers in a full-sib family of autotetraploids. The method is based on multivalent pairing of homologous chromosomes at meiosis and provides simultaneous maximum-likelihood estimates of the double reduction frequencies at, and the recombination fraction between, two markers. The EM algorithm is implemented to provide a tractable way of estimating the relative proportions of different modes of gamete formation that generate identical gamete genotypes under multivalent pairing. Extensive simulation studies were performed to examine the statistical properties of the method. The implications of the new method for understanding the genome structure and organization of polyploid species are discussed.


POLYPLOIDY is an important evolutionary force in flowering plants (STEBBINS 1971; GRANT 1981; BEVER and FELBER 1992; JACKSON and JACKSON 1996; SOLTIS and SOLTIS 2000). It is estimated that as many as 30–80% of angiosperms are polyploids or have experienced one or more episodes of polyploidization (STEBBINS 1971; GRANT 1981; MASTERSON 1994). Evidence for the creative role of polyploidy in evolution is well synthesized in a recent review by OTTO and WHITTON (2000), although they estimated that only 2–4% of speciation events in flowering plants involve polyploidization. The frequency of polyploidy in domesticated plant taxa is also high (75%); alfalfa, banana, canola, coffee, cotton, potato, soybean, strawberry, sugarcane, sweet potato, and wheat are excellent examples of polyploids of economic importance (HILU 1993). To study the evolutionary consequences of polyploidy for genome organization and to develop superior varieties of polyploid plant species, a number of genome projects have been launched to construct genetic linkage maps with molecular markers and to identify genes responsible for economically important traits in polyploid populations ranging from tetraploid (potato) to octoploid (sugarcane; WU et al. 1992; DA SILVA et al. 1993; YU and PAULS 1993; GRIVET et al. 1996; HACKETT et al. 1998; MEYER et al. 1998; BROUWER and OSBORN 1999; RIPOL et al. 1999).

For allopolyploids, which are derived from the combination of chromosomes of distinct genomes followed by chromosome doubling (SOLTIS and SOLTIS 2000), statistical methods developed for molecular linkage mapping in diploid species by estimating recombination fractions between loci (LANDER and GREEN 1987) also apply. However, these methods cannot be used for autopolyploids, which are formed by chromosome doubling of the same genome through fusion of unreduced gametes (SOLTIS and SOLTIS 2000). At meiosis, autopolyploids may undergo bivalent pairing (two chromosomes pair), multivalent pairing (more than two chromosomes pair), or both, in which a gene has more than one possible partner (or set of partners). Polysomic inheritance can result from multivalent formation. Most available statistical methods for autopolyploid linkage analysis assume bivalent pairing (WU et al. 1992; HACKETT et al. 1998; RIPOL et al. 1999; LUO et al. 2000, 2001). Statistical analysis assuming multivalent pairing has not been explored thoroughly because of the complexity of polysomic inheritance.

Double reduction is a phenomenon in which two sister chromatids of a chromosome sort into the same gamete (DARLINGTON 1929; DE WINTON and HALDANE 1931; MATHER 1936; FISHER 1947). It can be generated by multivalent pairing in autopolyploids. Figure 1 shows how different types of gametes are formed. At anaphase I, chromatids located on a chromosome may migrate either to the same pole (reductional separation) or to different poles (equational separation). The type of separation depends on the number and type of crossovers between the centromere and the locus under consideration. We consider the segregation of two loci, A and B, in an autotetraploid undergoing quadrivalent formation during meiosis. Locus A is so close to the centromere that no crossover occurs between them; its first division is therefore reductional, and double reduction never occurs (path X). Locus B has one crossover with the centromere and thus undergoes equational separation. If the four homologous chromosomes segregate randomly, they may migrate to the same cell in two different ways. In the first, chromosomes 1 and 2 and their respective homologues migrate to the same cells, so alleles located on sister chromatids reach different gametes and double reduction never occurs (path Y). In the second, chromosomes 1 and 2 and chromosomes 3 and 4 migrate to the same cells, which may cause double reduction when chromatids segregate randomly (path Z).



Figure 1. A diagram displaying the segregation patterns of loci A and B during meiosis in an autopolyploid (modified from MATHER 1936 and BEVER and FELBER 1992). Locus A, with no crossover between it and the centromere, undergoes path X of reductional separation (no double reduction), whereas locus B, with a crossover between it and the centromere, undergoes either path Y of equational separation with no double reduction or path Z of equational separation with double reduction. Gametes that have undergone double reduction are underscored.

FISHER (1947) formulated a pioneering theoretical model for analyzing two linked loci in an autotetraploid undergoing quadrivalent pairing during meiosis. Although Fisher elegantly described the modes of gamete formation in terms of the recombination number between the two loci and the frequency of double reduction at each locus, he was not able to provide a tractable computational method for estimating these parameters. In this article, we use Fisher's model to devise a maximum-likelihood method for simultaneously estimating the frequency of double reduction and the recombination fraction between markers in autopolyploids whose gamete formation is predominantly due to multivalent pairing. The method relies on an expectation-maximization (EM) algorithm (DEMPSTER et al. 1977). Mathematically, we prove that in tetraploids the difference in the frequency of double reduction between two loci is bounded by twice the recombination fraction. Our linkage analysis is based on fully informative codominant markers, i.e., markers with eight different alleles between the two autotetraploid parents. The statistical properties of the method are examined in a simulation study.


*  AUTOTETRAPLOID MODEL

A quadrivalent pairing model for two linked markers:
Consider two linked markers k and l on the same chromosome in an autotetraploid. At marker k, the four alleles, each carried by one of the four homologous chromosomes, are labeled Pk1, Pk2, Pk3, and Pk4 for parent P and Qk1, Qk2, Qk3, and Qk4 for parent Q. Likewise, the four alleles at marker l are labeled Pl1, Pl2, Pl3, and Pl4 for parent P and Ql1, Ql2, Ql3, and Ql4 for parent Q. The recombination fraction between the two markers is denoted by θP for parent P and θQ for parent Q. For the two autotetraploid parents used in the cross, there are a total of 24 × 24 = 576 allelic configurations, or linkage phase assignments, between the two markers, one of which is schematically expressed as

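$$
P:\;
\begin{array}{cccc}
P_{k1} & P_{k2} & P_{k3} & P_{k4}\\
| & | & | & |\\
P_{l1} & P_{l2} & P_{l3} & P_{l4}
\end{array}
\qquad \times \qquad
Q:\;
\begin{array}{cccc}
Q_{k1} & Q_{k2} & Q_{k3} & Q_{k4}\\
| & | & | & |\\
Q_{l1} & Q_{l2} & Q_{l3} & Q_{l4}
\end{array}
\tag{1}
$$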

where the vertical lines indicate the individual homologous chromosomes on which the two markers are located. The recombination fractions θP and θQ are estimated from the segregation of the two-marker joint genotypes observed in the progeny of the family. However, the observed joint marker genotypes are confounded by the mode of meiotic pairing (bivalent or quadrivalent) and by the parental linkage phases of the alleles across the two maternally and two paternally derived chromosomes. To estimate θP and θQ accurately, therefore, it is essential to select the most likely pairing model and linkage phase configuration for the two parents.

In this article, we propose a model for fully informative codominant markers, i.e., markers with eight different alleles between the two autotetraploid parents. We assume that the four homologous chromosomes form quadrivalents. Thus, for a particular marker k, we must consider the full chromatid complement, which may be represented as gametes Pk1Pk1, Pk2Pk2, Pk3Pk3, and Pk4Pk4 for parent P and gametes Qk1Qk1, Qk2Qk2, Qk3Qk3, and Qk4Qk4 for parent Q. The generation of these gametes is typical of the four-strand model, in which both chromatids of a single chromosome may be passed to the same gamete, a phenomenon known as double reduction (DARLINGTON 1929; MATHER 1936; FISHER 1947). The frequency of double reduction is constant for any given locus and depends on its distance from the centromere. We denote the frequencies of double reduction at marker k by αP for parent P and αQ for parent Q, and those at marker l by βP and βQ. Following the classification in FISHER (1947), two linked loci in a tetrasomic show four different combinations in terms of double reduction:

  1. Both markers display double reductions;

  2. only marker k displays double reductions;

  3. only marker l displays double reductions; and

  4. neither marker displays double reductions.

Since there are four sources for the allele at each locus, a gametic chromosome carrying the two loci can be made up in 16 ways, $P_{ki}P_{lj}$ with $i, j = 1, \ldots, 4$.

Each gamete has two chromosomes, and these can be of 1/2 × 16 × 17 = 136 different types. For one parent, all these types of gametes can be classified into 11 basic modes according to double reduction and the number of recombination events (MATHER 1936; Table 1):

The second mode has 12 possibilities as a result of recombination between a pair of the four chromosomes:

3 and 4: The next two modes involve double reduction only at marker k. For mode 3, one parental chromosome is unchanged, but the other is made up by all possible types of recombination between this chromosome and the remaining three. There are 12 possibilities, and typical gametes are

And for mode 4, both chromosomes are derived from recombination between the four parental chromosomes. There are also 12 possibilities such as

5 and 6: The next two modes involve double reduction only at marker l and are classified analogously to modes 3 and 4.

Other 5: The last five modes (7A, 7B, 8A, 8B, and 9), in which neither marker k nor l has double reduction, can be sorted into three types. In the first type, mode 7, two gametic chromosomes are derived from two of the parental chromosomes either without recombination (mode 7A) or with recombination (mode 7B). There are six possibilities for each group. Typical gamete types are

Because they represent the same genotype, 7A and 7B cannot be distinguished on the basis of the marker phenotypes. The second type (mode 8) of nondouble reduction arises when the two gametic chromosomes are derived from three of the parental chromosomes with one recombination event (8A, 24 possibilities) or two recombination events (8B, 24 possibilities). Gamete examples for modes 8A and 8B are

They are also indistinguishable because they have identical genotypes. The third type (mode 9) of nondouble reduction includes recombination between all four different chromosomes such as

Mode 9 has 12 possibilities.
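The counting argument above can be checked by brute force. The sketch below is our own illustration rather than part of the original method. It encodes a gametic chromosome as a pair (i, j), meaning that it carries Pki and Plj under the assignment in which Pki and Pli sit on the same homologue (an assumed convention). It then enumerates all unordered pairs of chromosomes and sorts them into modes using the double-reduction and recombination criteria described above. The helper name `classify` is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# A gametic chromosome is written (i, j): it carries allele Pk_i at marker k
# and allele Pl_j at marker l (i, j index the four parental homologues, and
# Pk_i and Pl_i are assumed to sit on the same homologue).
CHROMOSOMES = list(product(range(1, 5), repeat=2))  # 4 x 4 = 16 chromosomes

# A diploid gamete is an unordered pair of chromosomes: 16 * 17 / 2 = 136 types
GAMETES = list(combinations_with_replacement(CHROMOSOMES, 2))


def classify(gamete):
    """Return the formation mode ('1'...'9', with 7A/7B and 8A/8B) of a gamete."""
    (i1, j1), (i2, j2) = gamete
    dr_k = i1 == i2                      # both k alleles from one homologue
    dr_l = j1 == j2                      # both l alleles from one homologue
    n_rec = (i1 != j1) + (i2 != j2)      # recombinant chromosomes in the gamete
    if dr_k and dr_l:
        return "1" if n_rec == 0 else "2"
    if dr_k:
        return "3" if n_rec == 1 else "4"
    if dr_l:
        return "5" if n_rec == 1 else "6"
    n_homologues = len({i1, i2, j1, j2})  # parental homologues involved
    if n_homologues == 2:
        return "7A" if n_rec == 0 else "7B"
    if n_homologues == 3:
        return "8A" if n_rec == 1 else "8B"
    return "9"


mode_sizes = Counter(classify(g) for g in GAMETES)
# 136 gametes; modes 1-6: 4, 12, 12, 12, 12, 12; 7A/7B: 6/6; 8A/8B: 24/24; 9: 12
print(len(GAMETES), dict(sorted(mode_sizes.items())))
```

The 11 mode sizes sum to 136, which is consistent with the counts given in the text for modes 2 to 9 and leaves 4 formations for mode 1.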

Because gametes for fully informative markers are unique to the two parents, and because the two parents are assumed to behave independently with respect to double reduction and recombination, gamete genotypes provide as much information for linkage analysis as zygote genotypes. To simplify our treatment, we therefore base the linkage analysis on the segregation of gamete genotypes in each parent. Hereafter, only parent P is considered, because a symmetrical inference can be made for parent Q. We denote the frequencies of double reduction and the recombination fraction between the markers for parent P by α, β, and θ, without the subscript P, unless otherwise specified.

Parameter estimation:
For marker k, assume a fixed assignment of the four alleles of parent P in the order Pk1, Pk2, Pk3, and Pk4. Given this fixed assignment for marker k, the four observed alleles of marker l, Pl1, Pl2, Pl3, and Pl4, can be assigned to the four homologous chromosomes in 4! = 24 different ways. One of these corresponds to the correct assignment of the alleles of the two markers among the four homologous chromosomes. The estimates of the frequencies of double reduction and of the recombination fraction between the two markers should be based on this best, but unknown, allelic assignment across the parental chromosomes. For linkage analysis in autotetraploid populations, therefore, the vector of unknown parameters can be denoted by πω = (Aω, α, β, θ)ᵀ, where Aω is the ωth allelic assignment for marker l relative to the fixed allelic assignment of marker k.

Given a particular allelic assignment for parent P as shown in expression (1), the four double reduction gametes and six nondouble reduction gametes generated by marker k can be arrayed in the order {Pk1Pk1, Pk2Pk2, Pk3Pk3, Pk4Pk4, Pk1Pk2, Pk1Pk3, Pk1Pk4, Pk2Pk3, Pk2Pk4, Pk3Pk4}, and those at marker l in the order {Pl1Pl1, Pl2Pl2, Pl3Pl3, Pl4Pl4, Pl1Pl2, Pl1Pl3, Pl1Pl4, Pl2Pl3, Pl2Pl4, Pl3Pl4}. Thus, we can identify 10 × 10 = 100 two-marker gamete genotypes for parent P. Following the notation of FISHER (1947), we define f, the relative frequencies of the 11 different modes of gamete formation, which must sum to unity (Table 1). However, because the marker phenotypes of 7A and 7B cannot be distinguished, we use a single f7 to denote the combined frequency of 7A and 7B at meiosis. For the same reason, the combined frequency of 8A and 8B is denoted by f8. The joint relative frequencies of two-marker diploid gametes can then be expressed in matrix notation:

(2)

However, as illustrated earlier, there are as many as 136 gamete formations for any two linked markers. The 36 "extra" gamete formations are each due to a reciprocal allelic assignment of marker l and are located in the 6 x 6 = 36 cells of the above matrix's bottom-right corner, in which neither of the two markers displays double reduction (Table 1). Of these 36 formations, 6 are under mode 7, 24 are under mode 8, and the remaining 6 are under mode 9. For example, gamete formations

are two reciprocal assignments, but they have the same genotype and are mixed in the same cell at row 5 and column 5.
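To see how the 136 formations collapse into the 100 observable genotype cells, the following hypothetical sketch groups formations by their unordered allele pairs at k and at l. It uses an (i, j) chromosome encoding that we assume for illustration; it is not taken from the article. The sketch confirms that only the 36 cells without double reduction hold two formations each, split 6/24/6 among modes 7, 8, and 9. It also shows that the cell at row 5, column 5 (Pk1Pk2 with Pl1Pl2) holds one 7A and one 7B formation.

```python
from collections import Counter, defaultdict
from itertools import combinations_with_replacement, product

# Chromosome (i, j) carries Pk_i at marker k and Pl_j at marker l (assumed encoding)
CHROMOSOMES = list(product(range(1, 5), repeat=2))
GAMETES = list(combinations_with_replacement(CHROMOSOMES, 2))  # 136 formations


def genotype(gamete):
    """Observable two-marker genotype: unordered allele pairs at k and at l."""
    (i1, j1), (i2, j2) = gamete
    return tuple(sorted((i1, i2))), tuple(sorted((j1, j2)))


# Group the 136 formations into observable genotype cells
cells = defaultdict(list)
for g in GAMETES:
    cells[genotype(g)].append(g)

# Cells with two different alleles at both k and l (no double reduction)
no_dr = {c: gs for c, gs in cells.items()
         if c[0][0] != c[0][1] and c[1][0] != c[1][1]}

# Homologues shared by the k and l genotypes: 2 -> mode 7, 1 -> mode 8, 0 -> mode 9
overlap = Counter(len(set(k) & set(l)) for k, l in no_dr)

print(len(cells), len(no_dr), overlap)
print(cells[((1, 2), (1, 2))])  # [((1, 1), (2, 2)), ((1, 2), (2, 1))]: a 7A and a 7B
```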

Because formation mode 7 is a mixture of double recombinants and nonrecombinants, determining the expected number of recombination events under this mode requires the relative proportions of these two types of gametes. Given the relative proportion of double recombinants in mode 7 (φ = θ²/[θ² + 9(1 - θ)²]; see Appendix), the expected number of recombination events is 2φ. Similarly, for mode 8, which is a mixture of single recombinants and double recombinants, the expected number of recombination events is 1 · (1 - ψ) + 2 · ψ = 1 + ψ, where ψ is the proportion of double recombinants in mode 8 (ψ = θ/(3 - 2θ); see Appendix). The expected numbers of recombination events between the two markers can be expressed in matrix notation as

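This information allows us to express the recombination fraction θ and the two double reduction parameters, α at marker k and β at marker l, in terms of f1, ..., f9 and φ, ψ. We have

$$
\begin{aligned}
\alpha &= f_1 + f_2 + f_3 + f_4,\\
\beta &= f_1 + f_2 + f_5 + f_6,\\
\theta &= f_2 + \tfrac{1}{2}f_3 + f_4 + \tfrac{1}{2}f_5 + f_6 + \varphi f_7 + \tfrac{1}{2}(1+\psi) f_8 + f_9 .
\end{aligned}
$$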

From these equations, it follows that |α - β| = |f3 + f4 - f5 - f6| ≤ f3 + f4 + f5 + f6 ≤ 2θ. The last inequality holds because every gamete in modes 3 to 6 carries at least one recombinant chromosome, so these modes alone contribute at least (f3 + f4 + f5 + f6)/2 to θ. Therefore, in tetraploids, the difference in the frequency of double reduction between two loci is bounded by twice the recombination fraction. This inequality is consistent with the fact that closely linked markers tend to have similar double reduction rates. We believe that similar inequalities exist for other ploidy levels; however, owing to the complexity of gamete types in those cases, we are not yet able to generalize the result.
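A minimal sketch of these relations follows, with two assumptions flagged up front. The θ equation and the forms φ = θ²/[θ² + 9(1 - θ)²] and ψ = θ/(3 - 2θ) are our reconstruction: they reproduce the estimates reported for the simulated example, but the Appendix is not shown here. Plugging in the mode frequencies quoted for the simulated example recovers (α, β, θ) ≈ (0.05, 0.1, 0.05) and confirms the bound for that case. The function names are hypothetical.

```python
def phi(theta):
    """Assumed share of double recombinants (7B) within mode 7."""
    return theta**2 / (theta**2 + 9 * (1 - theta) ** 2)


def psi(theta):
    """Assumed share of double recombinants (8B) within mode 8."""
    return theta / (3 - 2 * theta)


def params_from_modes(f, phi_val, psi_val):
    """Map mode frequencies f1..f9 (plus phi, psi) to (alpha, beta, theta)."""
    f1, f2, f3, f4, f5, f6, f7, f8, f9 = f
    alpha = f1 + f2 + f3 + f4            # double reduction at marker k
    beta = f1 + f2 + f5 + f6             # double reduction at marker l
    # Expected number of recombinant chromosomes per gamete, divided by 2
    theta = (f2 + f3 / 2 + f4 + f5 / 2 + f6
             + phi_val * f7 + (1 + psi_val) * f8 / 2 + f9)
    return alpha, beta, theta


# Mode frequencies quoted for the simulated example (true theta = 0.05)
f_example = (0.04071, 0.00130, 0.00446, 0.00353, 0.04301,
             0.01498, 0.88221, 0.00736, 0.00245)
alpha, beta, theta = params_from_modes(f_example, phi(0.05), psi(0.05))
print(round(alpha, 4), round(beta, 4), round(theta, 4))  # 0.05 0.1 0.05
print(abs(alpha - beta) <= 2 * theta)                     # True
```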

For fully informative markers, every gamete genotype can be unambiguously identified. Thus, the N offspring of a full-sib family can be sorted into the nine distinguishable gamete formation modes, of sizes N1, N2, ..., N9, respectively (see Table 1). The maximum-likelihood estimates of the frequencies of these nine formation modes, f1, f2, ..., f9, have the explicit form f̂i = Ni/N (i = 1, ..., 9), derived from the following likelihood function given the observed marker data (M):

From the above matrix H, which indicates where double reduction has occurred at each marker, the two double reduction parameters α and β can be estimated from the corresponding frequencies of formation modes, i.e., α̂ = (N1 + N2 + N3 + N4)/N and β̂ = (N1 + N2 + N5 + N6)/N. Since these are simply estimates of binomial proportions, the variances of α̂ and β̂ are α(1 - α)/N and β(1 - β)/N, respectively.
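The closed-form estimates can be sketched as follows. The per-mode cell counts (4, 12, 12, 12, 12, 12, 6, 24, 6) are an assumption derived from the gamete classification, not a table from the article. With them, the log-likelihood of the counts for assignment 1 of the simulated example (N1, ..., N9 = 11, 0, 3, 0, 9, 1, 173, 2, 1) comes out within about 0.01 of the reported llA = -482.98. The function names are illustrative only.

```python
import math

# Assumed number of distinguishable genotype cells per mode (sums to 100)
CELLS_PER_MODE = (4, 12, 12, 12, 12, 12, 6, 24, 6)


def mode_mles(counts):
    """Closed-form MLEs and binomial standard errors from mode counts N1..N9."""
    n = sum(counts)
    f_hat = [c / n for c in counts]
    alpha_hat = sum(counts[0:4]) / n                                # modes 1-4
    beta_hat = (counts[0] + counts[1] + counts[4] + counts[5]) / n  # modes 1, 2, 5, 6
    se_alpha = math.sqrt(alpha_hat * (1 - alpha_hat) / n)
    se_beta = math.sqrt(beta_hat * (1 - beta_hat) / n)
    return f_hat, alpha_hat, beta_hat, se_alpha, se_beta


def log_likelihood(counts, f, cells=CELLS_PER_MODE):
    """Multinomial log-likelihood over genotype cells (coefficient dropped);
    each cell of mode i is assumed to have probability f_i / cells_i."""
    return sum(n * math.log(fi / c) for n, fi, c in zip(counts, f, cells) if n > 0)


counts_1 = [11, 0, 3, 0, 9, 1, 173, 2, 1]  # assignment 1 of the simulated example
f_hat, a_hat, b_hat, se_a, se_b = mode_mles(counts_1)
print(a_hat, b_hat, se_a, se_b, log_likelihood(counts_1, f_hat))
```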

Suppose we could distinguish the two f7 modes and the two f8 modes; the likelihood function given complete data (N1, N2, ... , N6, N7A, N7B, N8A, N8B, N9) is

(3)

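On the basis of the observed, incomplete data N1, N2, ..., N7, N8, N9, the EM algorithm is used to estimate the recombination fraction by maximizing the likelihood in Equation 3 (DEMPSTER et al. 1977; LANDER and GREEN 1987). The (τ + 1)th iteration of the EM algorithm consists of the following two steps.

E step: Given the current estimate θ(τ), calculate the expected proportions of double recombinants in modes 7 and 8:

$$
\varphi^{(\tau+1)} = \frac{[\theta^{(\tau)}]^2}{[\theta^{(\tau)}]^2 + 9[1 - \theta^{(\tau)}]^2},
\qquad
\psi^{(\tau+1)} = \frac{\theta^{(\tau)}}{3 - 2\theta^{(\tau)}}
\tag{4}
$$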

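M step: Maximize the expected complete-data log-likelihood with respect to θ. This gives an updated estimate of the recombination fraction:

$$
\theta^{(\tau+1)} = \frac{1}{N}\Big[N_2 + \tfrac{1}{2}N_3 + N_4 + \tfrac{1}{2}N_5 + N_6 + \varphi^{(\tau+1)} N_7 + \tfrac{1}{2}\big(1 + \psi^{(\tau+1)}\big) N_8 + N_9\Big]
\tag{5}
$$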

These two steps are repeated until the estimate of θ converges to a stable value. Such a stable value is the maximum-likelihood estimate (MLE) of θ.
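The two EM steps reduce to a short fixed-point loop. This sketch assumes the E step φ = θ²/[θ² + 9(1 - θ)²], ψ = θ/(3 - 2θ) and the M step θ = [2N2 + N3 + 2N4 + N5 + 2N6 + 2φN7 + (1 + ψ)N8 + 2N9]/(2N). `em_theta` is a hypothetical helper, and the starting value and tolerance are arbitrary choices. On the counts of assignment 1 from the simulated example, it converges to the reported θ̂ = 0.0453.

```python
def em_theta(counts, theta=0.25, tol=1e-10, max_iter=1000):
    """EM iteration for theta from mode counts N1..N9 (sketch; phi/psi forms assumed)."""
    n = sum(counts)
    _, n2, n3, n4, n5, n6, n7, n8, n9 = counts   # N1 carries no recombinants
    for _ in range(max_iter):
        # E step: expected shares of double recombinants in modes 7 and 8
        phi = theta**2 / (theta**2 + 9 * (1 - theta) ** 2)
        psi = theta / (3 - 2 * theta)
        # M step: expected recombinant chromosomes over the 2N sampled chromosomes
        new_theta = (2 * n2 + n3 + 2 * n4 + n5 + 2 * n6
                     + 2 * phi * n7 + (1 + psi) * n8 + 2 * n9) / (2 * n)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


print(round(em_theta([11, 0, 3, 0, 9, 1, 173, 2, 1]), 4))  # 0.0453
```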

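Substituting φ and ψ from Equation 4 into Equation 5 shows that the stable values of the iterative procedure are solutions of the following polynomial equation in θ:

$$
\theta\, D_1(\theta)\, D_2(\theta) = a\, D_1(\theta)\, D_2(\theta) + \frac{N_7}{N}\,\theta^2 D_2(\theta) + \frac{N_8}{2N}\,\theta\, D_1(\theta)
\tag{6}
$$

where $D_1(\theta) = \theta^2 + 9(1-\theta)^2$, $D_2(\theta) = 3 - 2\theta$, and $a = (N_2 + N_3/2 + N_4 + N_5/2 + N_6 + N_8/2 + N_9)/N$.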

Since this is a fourth-order polynomial in θ, its roots have closed-form expressions and can be computed easily.
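The fixed points satisfy θ = a + (N7/N)φ + (N8/2N)ψ with a = (N2 + N3/2 + N4 + N5/2 + N6 + N8/2 + N9)/N, so clearing the denominators of φ and ψ gives a quartic. This sketch uses the same assumed forms of φ and ψ. It does not apply the closed-form quartic formula. Instead, it scans [0, 1] for sign changes and refines each bracket by bisection, a substitute technique that is simpler to write and adequate for illustration. The function names are hypothetical.

```python
def fixed_point_poly(theta, counts):
    """Quartic whose roots are the EM fixed points (assumed phi/psi forms)."""
    n = sum(counts)
    _, n2, n3, n4, n5, n6, n7, n8, n9 = counts
    a = (n2 + n3 / 2 + n4 + n5 / 2 + n6 + n8 / 2 + n9) / n   # theta-free part
    b = n7 / n                                               # multiplies phi
    c = n8 / (2 * n)                                         # multiplies psi
    d1 = theta**2 + 9 * (1 - theta) ** 2                     # denominator of phi
    d2 = 3 - 2 * theta                                       # denominator of psi
    # theta = a + b*phi + c*psi, multiplied through by d1 * d2
    return theta * d1 * d2 - a * d1 * d2 - b * theta**2 * d2 - c * theta * d1


def roots_in_unit_interval(counts, grid=2000, tol=1e-12):
    """Locate sign changes on a grid over [0, 1] and refine each by bisection."""
    def p(t):
        return fixed_point_poly(t, counts)

    roots = []
    for step in range(grid):
        lo, hi = step / grid, (step + 1) / grid
        if p(lo) == 0.0:
            roots.append(lo)
            continue
        if p(lo) * p(hi) >= 0:
            continue
        while hi - lo > tol:          # root stays bracketed in [lo, hi]
            mid = (lo + hi) / 2
            if p(lo) * p(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append((lo + hi) / 2)
    return roots


print(roots_in_unit_interval([11, 0, 3, 0, 9, 1, 173, 2, 1]))
```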

The characterization of linkage phase:
We have derived statistical procedures for estimating α, β, and θ under the allelic assignment shown in expression (1). Estimates of (α, β, θ) for any of the other 23 assignments can be obtained similarly by rearranging the corresponding elements of matrices H and D. One remaining issue is how to determine the best assignment, i.e., the one corresponding to the most likely parental linkage phase of the two markers. The most likely linkage phase can be determined using the posterior probability of πω = (Aω, α, β, θ)ᵀ conditional on the marker data M, where Aω is the ωth allelic assignment for marker l relative to the fixed allelic assignment of marker k. From Bayes' theorem,

$$
P(\pi_\omega \mid M) = \frac{P(M \mid \pi_\omega)\, P(\pi_\omega)}{\sum_{\omega'=1}^{24} P(M \mid \pi_{\omega'})\, P(\pi_{\omega'})}.
$$

These posterior probabilities for all possible assignments depend on the prior probabilities P(πω). In practice, the prior distribution can be assumed to be uniform over all 24 assignments, in which case the posterior probabilities are proportional to the likelihoods L(πω) = P(M|πω). The final MLEs of the parameters (α, β, θ) are based on the most likely assignment, i.e., the one with the highest posterior probability.
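Under a uniform prior, the posterior over the 24 assignments is simply the normalized likelihood. This hypothetical sketch lists the assignments in the dictionary order used for the figures, 1234, 1243, ..., 4321, and normalizes log-likelihoods with the log-sum-exp shift to avoid underflow. As a usage example, it compares only the two assignments worked through in the simulated example (log-likelihoods -482.98 and -758.50), not all 24.

```python
import math
from itertools import permutations

# The 4! = 24 orderings of the marker-l alleles, in dictionary order 1234, ..., 4321;
# with an independent choice for each parent there are 24 * 24 = 576 configurations.
ASSIGNMENTS = ["".join(map(str, p)) for p in permutations((1, 2, 3, 4))]


def posterior_probs(log_liks, log_priors=None):
    """Posterior P(assignment | M) from log-likelihoods; uniform prior by default."""
    if log_priors is None:
        log_priors = [0.0] * len(log_liks)   # a constant log prior cancels out
    scores = [ll + lp for ll, lp in zip(log_liks, log_priors)]
    top = max(scores)                        # log-sum-exp shift for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


print(ASSIGNMENTS[:3], ASSIGNMENTS[-1])
print(posterior_probs([-482.98, -758.50]))
```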

SVED (1964) demonstrated that, unless they form bivalents exclusively, autopolyploids have a recombination fraction bounded by 1 - 1/x, where x is the ploidy level. Thus, for autotetraploids undergoing quadrivalent pairing, the maximum value of the recombination fraction is θ = 0.75. The test for linkage between two markers is based on the log-likelihood ratio comparing the full model (Equation 3), evaluated at the parameter estimates from the most likely assignment, with the reduced model constrained to θ = 0.75. Because θ = 0.75 lies on the boundary of the parameter space, the likelihood-ratio test (LRT) statistic calculated in this way has a χ²-distribution with 1/2 d.f. under the null hypothesis, i.e., asymptotically a 50:50 mixture of χ² distributions with 0 and 1 d.f. (SELF and LIANG 1987). Thus, two markers k and l can be declared linked if the LRT exceeds the critical value χ²δ for an appropriate choice of the type I error rate δ (for example, χ² = 2.42).


*  SIMULATION

Analysis of a simulated data set:
We illustrate the autotetraploid model by analyzing a simulated example. Since gamete genotypes provide as much information for linkage analysis as zygote genotypes, we consider only the segregation of gamete genotypes in parent P, which is assumed to have double reduction frequencies (α, β) = (0.05, 0.1) and recombination fraction θ = 0.05. These parameters correspond to relative frequencies of the nine gamete formation modes f = (0.04071, 0.00130, 0.00446, 0.00353, 0.04301, 0.01498, 0.88221, 0.00736, 0.00245), which give the joint relative frequencies of two-marker diploid gametes in the matrix H. A random sample of N = 200 gametes was simulated from a multinomial distribution with probabilities given by H. The marker data M, i.e., the counts of all gamete types, can be presented in the following matrix form:
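The sampling step can be sketched with the Python standard library. This sketch introduces a simplification of our own: gametes are drawn directly at the level of the nine formation modes, which is all that α̂, β̂, and the EM estimate of θ use, rather than over the 100 cells of H. The seed is arbitrary, so the counts will not match the sample analyzed below. `simulate_mode_counts` is a hypothetical name.

```python
import random

# Mode frequencies f1..f9 quoted for (alpha, beta, theta) = (0.05, 0.1, 0.05)
F_EXAMPLE = [0.04071, 0.00130, 0.00446, 0.00353, 0.04301,
             0.01498, 0.88221, 0.00736, 0.00245]


def simulate_mode_counts(f, n, seed=None):
    """Draw n gametes and tally them by formation mode (a multinomial sample)."""
    rng = random.Random(seed)
    counts = [0] * len(f)
    for mode in rng.choices(range(len(f)), weights=f, k=n):
        counts[mode] += 1
    return counts


counts = simulate_mode_counts(F_EXAMPLE, 200, seed=1)
alpha_hat = sum(counts[:4]) / 200   # double reduction at k: modes 1-4
print(counts, alpha_hat)
```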

Suppose parent P has the allelic assignment

then there are 11 gametes in the first gamete formation mode (N1 = 3 + 4 + 3 + 1). Similarly, the counts for the other eight modes are N2 = 0, N3 = 3, N4 = 0, N5 = 9, N6 = 1, N7 = 173, N8 = 2, and N9 = 1. Hence the MLEs of the relative frequencies of the nine gamete formation modes are f̂ = (11/200, 0, 3/200, 0, 9/200, 1/200, 173/200, 2/200, 1/200), which correspond to α̂ = (N1 + N2 + N3 + N4)/N = 0.07, β̂ = (N1 + N2 + N5 + N6)/N = 0.105, and θ̂ = 0.0453, with log-likelihood llA = -482.98. Furthermore, under the null hypothesis θ = 0.75, the MLEs of the mode frequencies are f̂ = (0.01396, 0, 0.00757, 0, 0.02272, 0.25448, 0.43679, 0.01, 0.25448), and the parameter estimates are α̂ = 0.022 and β̂ = 0.291, with llN = -616.62.

For a second assignment

gamete classification is different. For example, gamete

has no recombination and should be classified into mode 1 instead of mode 2, and gamete

should be in mode 7 instead of mode 8. The counts for all nine gamete formation modes under the new assignment are N1 = 7, N2 = 4, N3 = 2, N4 = 1, N5 = 5, N6 = 5, N7 = 53, N8 = 123, and N9 = 0. Consequently, we obtain the MLE (α̂, β̂, θ̂) = (0.07, 0.105, 0.46436) with log-likelihood llA = -758.50. As for the first assignment, under the null hypothesis we obtain the MLE (α̂, β̂, θ̂) = (0.118, 0.208, 0.75) with log-likelihood llN = -786.47.

This procedure must be repeated for the remaining 22 assignments. Table 2 presents the MLEs and log-likelihoods for all 24 allelic assignments. Figure 2 (top left) plots the log-likelihood values against the 24 assignments of marker l in dictionary order 1234, 1243, ..., 4321; the estimates of the recombination fraction for the different assignments are indicated by different symbols. The true assignment has the largest log-likelihood value.



Figure 2. Plot of the log-likelihood (ll) vs. the 24 assignments for 18 sets of parameters (θ, α, β) from one simulation with sample size N = 200. The x-axes show the 24 assignments in dictionary order, i.e., 1234, 1243, ..., 4321. Different symbols indicate the range of the MLE θ̂ for each assignment: circles for θ̂ in (0, 1/8]; crosses for (1/8, 3/8]; stars for (3/8, 5/8]; diamonds for (5/8, 3/4]; and squares for others. The MLEs θ̂ corresponding to the most likely assignment are indicated in each plot.


 
Table 2. Maximum-likelihood estimates of the recombination fraction with all 24 different allelic assignments of marker l for the simulated example

Since assignment 1 has the largest log-likelihood, we base the final MLEs on the first assignment, i.e., (α̂, β̂, θ̂) = (0.07, 0.105, 0.0453) with log-likelihood llA = -482.98. Under the null hypothesis, however, the final MLE comes from the last assignment, with log-likelihood llN = -616.28. Thus the LRT statistic equals -2 × (-616.28 + 482.98) = 266.60, which is much larger than the cut point χ² = 2.42, indicating very strong evidence that the two markers are linked.
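The test statistic and its reference distribution can be sketched as below. We read the "1/2 d.f." of the boundary case as a 50:50 mixture of χ² distributions with 0 and 1 d.f.; this is our interpretation of the SELF and LIANG (1987) result. Under that reading, the cut point 2.42 corresponds to a type I error rate of about 0.06. The function names are illustrative.

```python
import math


def lrt_statistic(ll_null, ll_full):
    """-2 * (log-likelihood under theta = 0.75 minus unrestricted log-likelihood)."""
    return -2.0 * (ll_null - ll_full)


def boundary_tail_prob(x):
    """P(T > x) for T ~ 0.5*chi2(0) + 0.5*chi2(1), our reading of '1/2 d.f.'."""
    if x < 0:
        return 1.0
    # P(chi2(1) > x) = erfc(sqrt(x / 2)); the chi2(0) half is a point mass at zero
    return 0.5 * math.erfc(math.sqrt(x / 2.0))


lrt = lrt_statistic(-616.28, -482.98)   # values from the simulated example
print(round(lrt, 2), boundary_tail_prob(2.42), boundary_tail_prob(lrt))
```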

More simulations:
Extensive simulation studies were performed to investigate the properties of our method by evaluating its effectiveness in determining the correct allelic assignment, the precision of the parameter estimates, and the power to detect linkage. A number of genetic scenarios were designed to explore how different parameter values affect their estimation. A segregating full-sib family of size N = 80, 200, 400, or 800 was simulated under recombination fractions ranging from tight linkage to free recombination, θ = 0.05, 0.15, 0.25, 0.50, 0.65, and 0.75, and under pairs of double reduction rates with various degrees of difference between the two markers, (α, β) = (0.05, 0.1), (0.15, 0.2), (0.25, 0.3), (0.1, 0.2), and (0.05, 0.3). For θ = 0.05, however, only the first three pairs of (α, β) were considered, because the other two combinations are impossible (recall that |α - β| ≤ 2θ). The simulation was repeated 1000 times for each scenario. For each replicate, the maximum-likelihood estimates (α̂, β̂, θ̂) and the log-likelihood value were obtained for all 24 possible assignments. In addition, the LRT was calculated for each replicate to test for the significance of linkage.

In Figure 2, the log-likelihood values are plotted against the 24 allelic assignments of marker l in dictionary order 1234, 1243, ..., 4321. Different assignments yield different estimates of the recombination fraction, as indicated by the plotting symbols. The true assignment usually corresponds to the largest log-likelihood value, and there is a distinct gap between the largest and the second-largest log-likelihood values, especially when θ is small. This suggests that our method can reliably characterize the marker linkage phase in the parents. In some cases, the second-largest log-likelihood value is associated with an estimate of θ > 0.75, so the corresponding assignment is easily ruled out.

We do not report simulation results for the double reduction rate estimates α̂ and β̂, because closed-form formulas for their variances are available. To evaluate the precision of the recombination fraction estimates, square-rooted mean square errors (RMSEs) were calculated for all simulation scenarios (Table 3). As expected, the RMSEs decrease with increasing sample size, although the gain from each further increase in N diminishes. This suggests that a sample size of 200–400 is adequate for a precise estimate of θ; the estimate also works reasonably well when N = 80. In addition, the RMSEs of θ̂ increase with decreasing θ but decrease at θ = 0.75 because of the boundary effect. The precision of θ̂ also depends on the true double reduction rates (α, β), with two tendencies (Table 3). First, the RMSEs tend to be larger when the double reduction rates are larger. Second, the RMSEs tend to increase as the difference in double reduction between the two markers increases. For example, for θ = 0.5 or above, the RMSEs are larger for (α, β) = (0.10, 0.20) than for (0.25, 0.30), although the latter combination has larger double reduction rates.
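For a single scenario, the RMSE calculation can be sketched as a small Monte Carlo loop. It reuses the assumed E and M steps (φ = θ²/[θ² + 9(1 - θ)²], ψ = θ/(3 - 2θ)) and simulates at the level of formation modes. It also takes the known allelic assignment as given, so it only illustrates how the RMSE shrinks with N and does not reproduce Table 3. The only mode frequencies quoted in the text, those for (θ, α, β) = (0.05, 0.05, 0.1), are used; the seed and number of replicates are arbitrary choices.

```python
import math
import random

# Mode frequencies quoted for (theta, alpha, beta) = (0.05, 0.05, 0.1)
F_EXAMPLE = [0.04071, 0.00130, 0.00446, 0.00353, 0.04301,
             0.01498, 0.88221, 0.00736, 0.00245]
THETA_TRUE = 0.05


def em_theta(counts, theta=0.25, tol=1e-10, max_iter=1000):
    """EM estimate of theta from mode counts N1..N9 (assumed phi/psi forms)."""
    n = sum(counts)
    _, n2, n3, n4, n5, n6, n7, n8, n9 = counts
    for _ in range(max_iter):
        phi = theta**2 / (theta**2 + 9 * (1 - theta) ** 2)   # E step
        psi = theta / (3 - 2 * theta)
        new_theta = (2 * n2 + n3 + 2 * n4 + n5 + 2 * n6      # M step
                     + 2 * phi * n7 + (1 + psi) * n8 + 2 * n9) / (2 * n)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


def rmse_theta(f, theta_true, n, reps=300, seed=2001):
    """Square-rooted mean square error of the EM estimate over simulated replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        counts = [0] * 9
        for mode in rng.choices(range(9), weights=f, k=n):
            counts[mode] += 1
        total += (em_theta(counts) - theta_true) ** 2
    return math.sqrt(total / reps)


for n in (80, 200, 400, 800):
    print(n, round(rmse_theta(F_EXAMPLE, THETA_TRUE, n), 4))
```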


 
Table 3. Square-rooted mean square error (RMSE) for the estimator of the recombination fraction for all 28 combinations of parameters (θ, α, β) and four sample sizes N

The power to detect significant linkage was examined on the basis of 1000 replicates (Figure 3). As expected, the power of the test increases with increasing sample size. However, the effect of sample size depends on the double reduction rates and the recombination fraction. For example, the effect is larger for (α, β) = (0.1, 0.2) than for (α, β) = (0.15, 0.2) when θ = 0.65, but this is reversed for θ = 0.5.



Figure 3. The size and power of the likelihood-ratio test of linkage using χ² = 2.42, based on 1000 replicates, plotted against the true θ for all five sets of double reduction rates.


*  DISCUSSION

The main difficulty in performing linkage analysis for autopolyploids stems from the complexities of polysomic inheritance. With the occurrence of polysomic inheritance, the recombination fraction alone is no longer sufficient to specify the frequencies of gamete genotypes and their segregation patterns. To simplify linkage analysis in autopolyploids, many earlier methods assume a pure bivalent pairing model between homologous chromosomes during meiosis (WU et al. 1992; HACKETT et al. 1998; RIPOL et al. 1999; LUO et al. 2001). Although the statistical merits of these methods were demonstrated by extensive simulations, their underlying assumption may significantly deviate from biological reality. For an autopolyploid, multivalent pairings during gametogenesis may result in double reduction (DARLINGTON 1929; DE WINTON and HALDANE 1931; MATHER 1936; FISHER 1947), a phenomenon that adds extra complexity in the establishment of a workable model for polysomic linkage analysis.

In this article, we derive a statistical method for simultaneously estimating linkage and linkage phase between different markers in a full-sib family of autotetraploids undergoing quadrivalent pairing at meiosis. This method is not a simple extension of existing models based on bivalent pairing. Rather, it incorporates the cytological mechanisms underlying gamete formation from multivalent pairings, some of which (namely, double reduction) are unique to multivalents and do not occur with bivalent pairing. We also show that the difference in the frequency of double reduction between two markers is bounded by twice their recombination fraction in autotetraploids.

Building on these mechanisms of quadrivalent pairing, FISHER (1947) formulated a pioneering genetic model that counts all possible modes of gamete formation in autotetraploids. However, he could not separate, and hence could not estimate, two different modes that generate the same gamete genotypes (e.g., mode 7A vs. 7B or mode 8A vs. 8B; Table 1). With the maximum-likelihood method implemented through the EM algorithm (DEMPSTER et al. 1977), we can now distinguish these modes and estimate their proportions by treating the unobserved mode as missing data.
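
Treating the mode labels as missing data means that, at each EM iteration, the observed count of a merged class (7 or 8) is split in expectation between its A and B modes in proportion to their conditional probabilities given the current θ. Using the proportions φ and ψ derived in the Appendix, a sketch of that E-step might look as follows. The function names and the counts `n7` and `n8` are hypothetical, and the M-step, which updates θ from the completed counts through the article's Equation 6, is not reproduced in this section and is therefore omitted.

```python
def phi(theta):
    """Share of mode-7 gametes attributed to 7B (two recombinations), per the Appendix."""
    return theta ** 2 / (9 * (1 - theta) ** 2 + theta ** 2)

def psi(theta):
    """Share of mode-8 gametes attributed to 8B (two recombinations), per the Appendix."""
    return theta / (3 - 2 * theta)

def e_step(n7, n8, theta):
    """Expected complete-data counts for the merged classes 7 and 8 at the current theta.

    Only the E-step is shown; the M-step (Equation 6 of the article) is not sketched here.
    """
    n7b = n7 * phi(theta)
    n8b = n8 * psi(theta)
    return {"7A": n7 - n7b, "7B": n7b, "8A": n8 - n8b, "8B": n8b}

# Hypothetical merged counts; at theta = 0.75 both splits happen to be even
print(e_step(40, 30, 0.75))  # {'7A': 20.0, '7B': 20.0, '8A': 15.0, '8B': 15.0}
```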

An advantage of the EM algorithm is that each iteration yields a closed-form update for the recombination fraction. If we forgo this, it is also possible to perform a Bayesian analysis. We may assign a Dirichlet prior to the frequencies of the nine formation modes, f = (f1, f2, ..., f9), which yields a Dirichlet posterior distribution of f given the sample frequencies N1, N2, ..., N9. We can then easily sample from the posterior of f and obtain a posterior sample of (α, β, θ) by letting α = f1 + f2 + f3 + f4 and β = f1 + f2 + f5 + f6, and solving for θ using Equation 6 with each Ni/N replaced by fi. Moreover, if we extend this to the 11 basic gamete modes f* = (f1, f2, ..., f6, f7A, f7B, f8A, f8B, f9), a Gibbs sampler could be set up to obtain posterior samples (ROBERT and CASELLA 2000).
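
Because the Dirichlet prior is conjugate to the multinomial counts N1, ..., N9, the posterior of f is again Dirichlet, with each prior parameter incremented by the corresponding count. A minimal sketch of drawing posterior samples of (α, β) follows. It assumes a symmetric Dirichlet(1, ..., 1) prior, uses hypothetical counts, and builds each Dirichlet draw by normalizing independent gamma variates from Python's standard `random` module. The θ step is left out because it requires Equation 6, which is not reproduced in this section.

```python
import random

def sample_dirichlet(concentration, rng):
    """One Dirichlet draw, obtained by normalizing independent Gamma(a_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_alpha_beta(counts, prior=1.0, n_draws=2000, seed=1):
    """Posterior draws of (alpha, beta) under a symmetric Dirichlet(prior) prior on f1..f9."""
    if len(counts) != 9:
        raise ValueError("expected counts N1..N9")
    rng = random.Random(seed)
    concentration = [prior + n for n in counts]  # conjugate Dirichlet-multinomial update
    samples = []
    for _ in range(n_draws):
        f = sample_dirichlet(concentration, rng)
        alpha = f[0] + f[1] + f[2] + f[3]  # f1 + f2 + f3 + f4
        beta = f[0] + f[1] + f[4] + f[5]   # f1 + f2 + f5 + f6
        samples.append((alpha, beta))
    return samples

counts = [5, 5, 10, 20, 10, 10, 20, 10, 10]  # hypothetical N1..N9, not real data
draws = posterior_alpha_beta(counts)
mean_alpha = sum(a for a, _ in draws) / len(draws)
mean_beta = sum(b for _, b in draws) / len(draws)
# The exact posterior means are 44/109 (about 0.40) and 34/109 (about 0.31)
print(round(mean_alpha, 2), round(mean_beta, 2))
```

Sampling via normalized gammas keeps the sketch dependency-free; a numerical library's Dirichlet sampler would serve equally well.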

Although we have devised a statistical method for a fundamentally important problem in autopolyploid linkage analysis, one that has puzzled geneticists for over half a century, there is still much room for improvement. First, our model is proposed for fully informative codominant markers, i.e., markers at which the two autotetraploid parents carry eight different alleles. For these markers, an explicit expression exists for the MLE of the frequency of double reduction, although the estimate of the recombination fraction must rely on EM iterations. In a practical full-sib mapping population, other types of markers, such as dominant or partially informative markers, may be common. For autopolyploids, dominant markers derived from randomly amplified polymorphic DNA (RAPD) or amplified fragment length polymorphism (AFLP) technologies typically cannot be classified as simplex (single dose), duplex (double dose), or multiplex (multiple dose), because all of these dosages produce the same band-present phenotype (WU et al. 1992; YU and PAULS 1993; LUO et al. 2000). For such dominant or partially informative markers, gametes formed with double reduction may have the same genotypes as those formed without double reduction, so estimating the frequency of double reduction will also require the EM algorithm. In addition, linkage analysis for these markers must be based on the segregation of zygote genotypes, because segregation at the gamete level does not provide adequate information.

Second, our method is based on a single pairing model, the quadrivalent. In autopolyploids, chromosome pairing is in fact a function of the homology between the genomes involved, with homologous chromosomes tending to pair in preference to homeologous ones; this propensity is quantified by the preferential pairing factor (SYBENGA 1994). The preferential pairing factor determines the relative importance of bivalent vs. multivalent pairing in autopolyploids and can therefore be used to model the frequency of double reduction and the recombination fraction when bivalent and multivalent pairings occur simultaneously during meiosis. Last, our method is developed for autotetraploids, but its extension to autohexaploid, autooctoploid, and autodecaploid species is important because many plant species have such high ploidy levels (SOLTIS and SOLTIS 2000). In an autohexaploid plant, for instance, meiosis generates triploid gametes of three types: pure double reduction, partial double reduction, and no double reduction.

The statistical method proposed in this article provides a mapping framework for studying genome structure and organization in complex autopolyploid species. It also provides a necessary platform on which researchers can map quantitative trait loci (QTL) underlying economically and biologically important traits in autopolyploids. Although some preliminary studies of QTL mapping in autopolyploids have been reported under the assumption of pure bivalent pairing (DOERGE and CRAIG 2000; XIE and XU 2000), these should be viewed as premature until a comprehensive model is framed that takes both bivalent and multivalent pairings into account.


*  FOOTNOTES

1 These authors contributed equally to this work.


*  ACKNOWLEDGMENTS

We are grateful to Dr. S. Xu and Dr. M. Gallo-Meagher for stimulating discussions regarding this project. The authors thank the associate editor, referee Dr. R. Deborah Overath, and one anonymous referee for their constructive comments. This manuscript was approved as Journal Series R-08464 by the Florida Agricultural Experiment Station.

Manuscript received May 2, 2001; Accepted for publication August 29, 2001.


*  APPENDIX

Among the 16 possible allele configurations, 4 involve no recombination and 12 involve one recombination. If we form gametes with two chromosomes by selecting, with replacement, from the 16 configurations twice, there are 16 × 16 = 256 possibilities (16 with no recombination, 96 with one recombination, and 144 with two recombinations).
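
These counts can be checked by brute-force enumeration. The sketch below assumes, for illustration, that an allele configuration is the pair (homolog carrying the allele at the first marker, homolog carrying the allele at the second marker), so that a configuration involves no recombination exactly when both alleles come from the same homolog.

```python
from collections import Counter
from itertools import product

homologs = range(4)  # the four homologous chromosomes, labelled 0..3
# A configuration: (homolog at marker 1, homolog at marker 2)
configs = list(product(homologs, repeat=2))

def n_recomb(config):
    """0 if both marker alleles come from the same homolog, 1 otherwise."""
    return int(config[0] != config[1])

single = Counter(n_recomb(c) for c in configs)
# Two configurations drawn with replacement form a gamete; recombinations add up
pairs = Counter(n_recomb(c1) + n_recomb(c2) for c1, c2 in product(configs, repeat=2))
print(len(configs), dict(single))      # 16 {0: 4, 1: 12}
print(len(configs) ** 2, dict(pairs))  # 256 {0: 16, 1: 96, 2: 144}
```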

Recall that φ and ψ are the proportions of gamete types that have two recombination events under modes 7 and 8, respectively. Note that f7A contains 12 of the 16 gametes with no recombination and f7B contains 12 of the 144 gametes with two recombinations; thus the relative proportions should be 12(1 − θ)²/16 : 12θ²/144 = 9(1 − θ)² : θ². Similarly, f8A contains 48 of the 96 gametes with one recombination and f8B contains 48 of the 144 gametes with two recombinations; thus the relative proportions should be 48 × 2θ(1 − θ)/96 : 48θ²/144 = 3(1 − θ) : θ. Consequently, we may assume φ = θ²/[9(1 − θ)² + θ²] and ψ = θ/(3 − 2θ).
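
The ratio arithmetic above can be verified exactly with rational numbers. The sketch below recomputes φ and ψ from the class sizes quoted in the text (12 of 16, 12 of 144, 48 of 96, and 48 of 144) and checks them against the closed forms. The function name `split_shares` is hypothetical, and θ is restricted to 0 < θ ≤ 1 so that neither denominator vanishes.

```python
from fractions import Fraction

def split_shares(theta):
    """Return (phi, psi) from the Appendix counting argument, using exact arithmetic.

    Requires 0 < theta <= 1; at theta = 0 both mode-8 weights are zero.
    """
    t = Fraction(theta)
    # Mode 7: 12 of 16 no-recombination gametes (7A) vs 12 of 144 two-recombination gametes (7B)
    w7a = Fraction(12, 16) * (1 - t) ** 2
    w7b = Fraction(12, 144) * t ** 2
    # Mode 8: 48 of 96 one-recombination gametes (8A) vs 48 of 144 two-recombination gametes (8B)
    w8a = Fraction(48, 96) * 2 * t * (1 - t)
    w8b = Fraction(48, 144) * t ** 2
    return w7b / (w7a + w7b), w8b / (w8a + w8b)

# The counting argument reproduces the closed forms exactly
for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(3, 4)):
    phi, psi = split_shares(theta)
    assert phi == theta ** 2 / (9 * (1 - theta) ** 2 + theta ** 2)
    assert psi == theta / (3 - 2 * theta)
```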


*  LITERATURE CITED

BEVER, J. D. and F. FELBER, 1992  The theoretical population genetics of autopolyploidy. Oxf. Surv. Evol. Biol. 8:185-217.

BROUWER, D. J. and T. C. OSBORN, 1999  A molecular marker linkage map of tetraploid alfalfa (Medicago sativa L.). Theor. Appl. Genet. 99:1194-1200.

DARLINGTON, C. D., 1929  Chromosome behaviour and structural hybridity in the Tradescantiae. J. Genet. 21:207-286.

DA SILVA, J., M. E. SORRELLS, W. L. BURNQUIST, and S. D. TANKSLEY, 1993  RFLP linkage map and genome analysis of Saccharum spontaneum. Genome 36:782-791.

DEMPSTER, A. P., N. M. LAIRD, and D. B. RUBIN, 1977  Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39:1-38.

DE WINTON, D. and J. B. S. HALDANE, 1931  Linkage in the tetraploid Primula sinensis. J. Genet. 24:121-144.

DOERGE, R. W. and B. A. CRAIG, 2000  Model selection for quantitative trait locus analysis in polyploids. Proc. Natl. Acad. Sci. USA 97:7951-7956.

FISHER, R. A., 1947  The theory of linkage in polysomic inheritance. Philos. Trans. R. Soc. Ser. B 233:55-87.

GRANT, V., 1981 Plant Speciation, Ed. 2. Columbia University Press, New York.

GRIVET, L., A. D'HONT, D. ROQUES, P. FELDMANN, and C. LANAUD et al., 1996  RFLP mapping in cultivated sugarcane (Saccharum spp): genome organization in a highly polyploid and aneuploid interspecific hybrid. Genetics 142:987-1000.

HACKETT, C. A., J. E. BRADSHAW, R. C. MEYER, J. W. MCNICOL, and D. MILBOURNE et al., 1998  Linkage analysis in tetraploid species: a simulation study. Genet. Res. 71:143-154.

HILU, K. W., 1993  Polyploidy and the evolution of domesticated plants. Am. J. Bot. 80:1491-1499.

JACKSON, R. C. and J. W. JACKSON, 1996  Gene segregation in autotetraploids: prediction from meiotic configurations. Am. J. Bot. 83:673-678.

LANDER, E. S. and P. GREEN, 1987  Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84:2363-2367.

LUO, Z. W., C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL, and D. MILBOURNE, 2000  Predicting parental genotypes and gene segregation for tetrasomic inheritance. Theor. Appl. Genet. 100:1067-1073.

LUO, Z. W., C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL, and D. MILBOURNE, 2001  Construction of a genetic linkage map in tetraploid species using molecular markers. Genetics 157:1369-1385.

MASTERSON, J., 1994  Stomatal size in fossil plants: evidence for polyploidy in majority of angiosperms. Science 264:421-424.

MATHER, K., 1936  Segregation and linkage in autotetraploids. J. Genet. 32:287-314.

MEYER, R. C., D. MILBOURNE, C. A. HACKETT, J. E. BRADSHAW, J. W. MCNICOL, and R. WAUGH, 1998  Linkage analysis in tetraploid potato and association of markers with quantitative resistance to late blight (Phytophthora infestans). Mol. Gen. Genet. 259:150-160.

OTTO, S. P. and J. WHITTON, 2000  Polyploid incidence and evolution. Annu. Rev. Genet. 34:401-437.

RIPOL, M. I., G. A. CHURCHILL, J. A. G. DA SILVA, and M. SORRELLS, 1999  Statistical aspects of genetic mapping in autopolyploids. Gene 235:31-41.

ROBERT, C. P. and G. CASELLA, 2000  Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag, New York.

SELF, S. G. and K. Y. LIANG, 1987  Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Stat. Assoc. 82:605-610.

SOLTIS, P. S. and D. E. SOLTIS, 2000  The role of genetic and genomic attributes in the success of polyploids. Proc. Natl. Acad. Sci. USA 97:7051-7057.

STEBBINS, G. L., 1971 Chromosomal Evolution in Higher Plants. Addison-Wesley, Reading, MA.

SVED, J. A., 1964  The relationship between diploid and tetraploid recombination frequencies. Heredity 19:585-596.

SYBENGA, J., 1994  Preferential pairing estimates from multivalent frequencies in tetraploids. Genome 37:1045-1055.

WU, K. K., W. BURNQUIST, M. E. SORRELLS, T. L. TEW, and P. H. MOORE et al., 1992  The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor. Appl. Genet. 83:294-300.

XIE, C. G. and S. H. XU, 2000  Mapping quantitative trait loci in tetraploid populations. Genet. Res. 76:105-115.

YU, K. F. and K. P. PAULS, 1993  Segregation of random amplified polymorphic DNA markers and strategies for molecular mapping in tetraploid alfalfa. Genome 36:844-851.



