Originally published as Genetics Published Articles Ahead of Print on October 22, 2006.

Genetics, Vol. 175, 923-931, February 2007, Copyright © 2007
doi:10.1534/genetics.106.064030

A New Strategy for Estimating Recombination Fractions Between Dominant Markers From an F2 Population

* Human Genetics Center, School of Public Health, University of Texas, Houston, Texas 77030, {dagger} College of Life Science, Hunan Normal University, Changsha, 410081, Hunan, People's Republic of China and {ddagger} Laboratory for Conservation and Utilization of Bioresources, Yunnan University, Kunming Province, 65091, Yunnan, China

1 Corresponding author: Human Genetics Center, School of Public Health, University of Texas, 1200 Herman Pressler, Houston TX 77030.
E-mail: yunxin.fu{at}uth.tmc.edu

Manuscript received July 27, 2006. Accepted for publication October 9, 2006.

ABSTRACT

Although most high-density linkage maps have been constructed from codominant markers such as single-nucleotide polymorphisms (SNPs) and microsatellites because of their high linkage information, dominant markers can be expected to become even more significant as proteomic techniques become widely applicable for generating protein polymorphism data from large samples. For dominant markers, however, the two possible linkage phases between a pair of markers complicate the estimation of recombination fractions between markers and consequently the construction of linkage maps. The low linkage information of the repulsion phase and the high linkage information of the coupling phase have led geneticists to construct two separate but related linkage maps. To circumvent this problem, we propose a new method for estimating the recombination fraction between markers, which greatly improves the accuracy of estimation by distinguishing between the coupling phase and the repulsion phase of the linked loci. The results obtained from both real and simulated F2 dominant marker data indicate that the recombination fractions estimated by the new method contain a large amount of linkage information for constructing a complete linkage map. In addition, the new method is applicable to data with mixed types of markers (dominant and codominant) with unknown linkage phase.


MOST high-density linkage maps have been constructed from codominant markers such as single-nucleotide polymorphisms (SNPs) and microsatellites because of their high linkage information, but linkage maps of dominant markers will become increasingly important because such markers are often related to biological functions and are increasingly available as proteomic techniques mature. Proteomic markers include position-shift locus (PSL), presence/absence spot (PAS), and protein quantitative locus (PQL) markers, of which PAS and PQL are dominant markers (THIELLEMENT et al. 1999; ZIVY and DE VIENNE 2000; CONSOLI et al. 2002). An example of a linkage map constructed from mostly dominant markers is the Escherichia coli bacteriophage T7 protein linkage map (BARTEL et al. 1996). High-density linkage maps in the future are more likely to be constructed from both dominant and codominant markers, since such maps can provide fine genetic locations of functional markers through the high-density codominant markers flanking them. Therefore, accurate estimates of recombination fractions between dominant markers and between dominant and codominant markers are important.

Due to dominance, the genotype of an individual at a dominant marker is often ambiguous, which increases the complexity of analysis. An important issue in the estimation of the recombination fraction is how to deal efficiently with different linkage phases between a pair of dominant loci (MESTER et al. 2003a). Two different linkage phases for a double heterozygote are well recognized. One is known as the repulsion phase, which corresponds to the situation in which the two dominant alleles reside on different chromosomes; otherwise, it is known as the coupling phase. In a two-point analysis that considers two markers at a time, the repulsion phase provides much less information about linkage than the coupling phase (ALLARD 1956; KNAPP et al. 1995; LIU 1998; MESTER et al. 2003a). This is especially true for double heterozygotes from the F2 population (LIU 1998). In practice, about half of the markers form one group whose members are in the coupling phase with one another, and the remaining markers form a second such group; any two markers from different groups are in the repulsion phase (LIU 1998; MESTER et al. 2003a). This leads in practice to the construction of two separate partner linkage maps: one is called the paternal map, on which markers are derived from the paternal parent, and the other is called the maternal map, consisting of the maternal markers (KNAPP et al. 1995; PENG et al. 2000; MESTER et al. 2003a). To date, there is no effective way to integrate the partner maps into a single complete map. MESTER et al. (2003a) attempted to use pairs of codominant and dominant markers to accomplish this task because such pairs of markers in the repulsion phase have higher linkage information than pairs of dominant markers in the coupling phase. However, this strategy is extremely demanding because it requires that every dominant marker be paired with a codominant marker.

The two-point analysis implemented by the expectation-maximization (EM) algorithm (DEMPSTER et al. 1977; LANDER and GREEN 1987; OTT 1991) is a powerful approach for estimating recombination fractions between codominant loci and between dominant loci in the coupling phase, but it has poor resolution for dominant loci in the repulsion phase (see LIU 1998). This is because the two-point analysis cannot distinguish the coupling phase from the repulsion phase of dominant markers, which have rather different statistical properties. In addition to the need for treating coupling and repulsion phases separately, examining three loci at a time will lead to a better utilization of available linkage information. The problem is that not only is the number of three-locus combinations large when the total number of loci is large, but the complexity of the analysis also increases because of the need to distinguish several types of double or triple heterozygotes. To circumvent these problems, we propose an alternative approach in this article. The new method considers three loci at a time. It first classifies phenotypes into four pairs of gamete genotypes and estimates their frequencies from the sample, which allows the linkage phase of the loci to be identified; it then estimates recombination fractions between loci according to their linkage phase and finally reduces the three-point estimates of the recombination fractions to two-point estimates. A key to this strategy is a fast method for estimating the frequencies of different gamete types, because a large number of loci combinations must be handled. We are able to develop very efficient estimators of these frequencies by taking advantage of the simplicity of their expectations. The estimates of recombination fractions obtained by this new method make it possible to integrate two separate partner linkage maps, based on the EM estimates of recombination fractions, into a single complete linkage map.


METHODS

Estimating the frequencies of three-locus gametes:

Since the novel method to be described for estimating recombination fractions makes use of the frequencies of gametes defined by alleles from three loci, we start by presenting estimators of these frequencies. Two cases need to be considered separately. The first corresponds to the situation in which all three loci are dominant and thus is referred to as "dominant loci." The second is that only one or two loci out of three are dominant and is referred to as "mixed loci."

Dominant loci:

Consider three dominant loci each having two alleles. Let A and a be the two alleles for the first locus, B and b be those for the second, and C and c be those for the third. Uppercase letters denote dominant alleles and lowercase letters recessive alleles. A meiosis from a triple-heterozygote individual of the F1 population can produce eight different types of three-locus gamete: ABC, ABc, Abc, AbC, aBC, abC, aBc, and abc, where ABC and abc, Abc and aBC, abC and ABc, and AbC and aBc are, respectively, sister gametes. These sister gametes are expected to have equal frequency under the assumption of no segregation distortion during meiosis. In practice, a chi-square test can be used to remove loci that exhibit significant segregation distortion. These gametes can be grouped into four pairs of nonsister gametes. Define an F2 population:

Formula
It follows that Formula. The individuals of the F2 population can be classified into four categories. Category i (i = 0, . . ., 3) consists of individuals with exactly i loci possessing a dominant allele. To estimate gamete frequencies, it is necessary to consider the frequency of each category. Let Formula represent the phenotype in which only locus c exhibits a dominant phenotype. Therefore Formula represents the group of individuals from category 1 whose locus c has a dominant allele(s). It is obvious that there are three genotypes in category 1 and Formula can be further dissected into

Formula
Phenotypes Formula and Formula are also dissected in a similar fashion.

There are also three phenotypes in category 2, each of which can be dissected into five pairs of sister gametes. For instance, the phenotype Formula can be dissected into

Formula
Note that the phenotype for category 3 is not very informative since the single phenotype corresponds to too many genotypes. Therefore frequencies for category 3 are not used.
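The classification of F2 individuals into categories 0 to 3 can be illustrated with a short sketch. It assumes, purely for illustration, that each individual is scored as a tuple of three 0/1 flags (1 = dominant phenotype at that locus); the function names and the toy sample are hypothetical and do not come from the original data.

```python
from collections import Counter

def phenotype_frequencies(phenotypes):
    """Observed frequency of each three-locus dominant phenotype.

    Each phenotype is a tuple of three 0/1 flags (1 = dominant phenotype
    at that locus). This encoding is an assumption of the sketch.
    """
    counts = Counter(tuple(p) for p in phenotypes)
    n = sum(counts.values())
    return {p: c / n for p, c in counts.items()}

def category(phenotype):
    """Category i = number of loci that show the dominant phenotype."""
    return sum(phenotype)

# Toy sample of eight F2 individuals (made-up scores, for illustration only).
sample = [(0, 0, 0), (0, 0, 1), (0, 0, 1), (1, 1, 0),
          (1, 1, 1), (1, 1, 1), (1, 1, 1), (0, 1, 0)]
freqs = phenotype_frequencies(sample)

# Category 3 (dominant at all three loci) is set aside, as in the text.
informative = {p: f for p, f in freqs.items() if category(p) < 3}
```

Only the seven phenotypes of categories 0, 1, and 2 are retained, which matches the seven expected phenotype frequencies introduced next.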

Let Formula, Formula, Formula, Formula, Formula, Formula, and Formula be the expected frequencies of phenotypes Formula, Formula, Formula, Formula, Formula, Formula, and Formula in the F2 population, respectively. Then

Formula 1(1)
and

Formula 2(2)
Letting Formula 2, Equation 2 may be rewritten as

Formula 3(3)
Moment estimates of Formula 3 can be obtained from the above sets of equations by replacing Formula 3 by their moment estimates, which are simply their observed frequencies in the sample. Theoretically Equation 1 is sufficient for deriving solutions for q's. However, Equation 3 can be used to further minimize the stochastic effect in the observed frequencies. Specifically, Formula 3, Formula 3, and Formula 3 can be estimated as

Formula 4(4)
where Formula 4 (see APPENDIX A). It follows that Formula 4, Formula 4, and Formula 4 can alternatively be estimated from the observed frequencies of Formula 4, Formula 4, Formula 4, and Formula 4. We can combine the two sets of estimates of Formula 4, Formula 4, and Formula 4 to obtain a more stable set of estimates as

Formula 5(5)
where Formula 5 and Formula 5 are weights of Formula 5 and Formula 5, respectively, where k = 2, 3, 4. Formula 5 is the estimate of Formula 5. Our simulation study showed that Formula 5 usually gives the best result for the estimation of Formula 5. When the sample is small, it is possible that Formula 5 or Formula 5. In such a case, one can set Formula 5 and Formula 5 for Formula 5, or Formula 5 and Formula 5 for Formula 5 and Formula 5.
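As a rough illustration of the moment-estimation idea, the sketch below recovers gamete frequencies from the category 0 and category 1 phenotype frequencies only. It is not a reproduction of Equations 4 and 5, whose exact form and weights are not available here. It assumes that each F2 individual arises from the random union of two F1 gametes, so that the all-recessive phenotype has expected frequency g_abc^2 and the phenotype that is dominant only at locus c has expected frequency (g_abc + g_abC)^2 - g_abc^2. The names g_abc, f000, and so on are hypothetical labels.

```python
from math import sqrt

def expected_low_category_freqs(g):
    """Expected phenotype frequencies for categories 0 and 1.

    g maps one gamete from each sister pair ("abc", "Abc", "aBc", "abC")
    to its per-gamete frequency; random union of gametes is assumed.
    """
    f000 = g["abc"] ** 2
    one_dominant = {k: (g["abc"] + g[k]) ** 2 - f000
                    for k in ("Abc", "aBc", "abC")}
    return f000, one_dominant

def moment_estimates(f000, f_abc_A, f_abc_B, f_abc_C):
    """Invert the expectations above, replacing them by observed frequencies.

    f000    : frequency of the all-recessive phenotype
    f_abc_X : frequency of the phenotype dominant only at locus X
    """
    g_abc = sqrt(f000)
    def single(f_one):
        return sqrt(f000 + f_one) - g_abc
    return {"abc": g_abc,
            "Abc": single(f_abc_A),
            "aBc": single(f_abc_B),
            "abC": single(f_abc_C)}

# Round trip with made-up gamete frequencies that sum to 1/2.
true_g = {"abc": 0.30, "Abc": 0.10, "aBc": 0.07, "abC": 0.03}
f000, ones = expected_low_category_freqs(true_g)
estimated = moment_estimates(f000, ones["Abc"], ones["aBc"], ones["abC"])
```

With real data the observed frequencies carry sampling error, which is why the text combines more than one set of estimates rather than relying on a single inversion like this one.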

Since Formula 5, therefore Formula 5 can be expressed as

Formula 6A(6a)
Similarly we have

Formula 6B(6b)

Formula 6C(6c)
Formula 6C and Formula 6C are estimated by Formula 6C and Formula 6C, so Formula 6C is estimated by

Formula 7A(7a)
Similarly

Formula 7B(7b)

Formula 7C(7c)
Formula 7C is estimated by

Formula 7D(7d)

Mixed loci:

Two configurations in the case of the mixed loci need to be considered. The first is two codominant loci and one dominant locus (2C1D), and the second is one codominant locus and two dominant loci (1C2D) (see Figure 1). For a codominant locus, "0" and "1" represent the two parental types of homozygotes and "2" represents the heterozygote. For the dominant locus, "A" and "a" represent a dominant phenotype and a recessive phenotype, respectively. Without loss of generality, we assume in the following discussion that the order of loci in the case of 2C1D is DCC. Twelve phenotypes are informative for linkage analysis: Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, Formula 7D, and Formula 7D, while phenotypes A20, A21, and A02 and A12, a22, and A22 are much less informative because they are (or potentially are) double and triple heterozygotes. In the Formula 7D population, similar to phenotype Formula 7D in dominant loci, phenotypes Formula 7D, Formula 7D, Formula 7D, and Formula 7D are homozygous and have the expected frequencies Formula 7D, Formula 7D, Formula 7D, and Formula 7D, respectively, and Formula 7D, Formula 7D, Formula 7D, and Formula 7D are similar to Formula 7D in dominant loci and have the expected frequencies Formula 7D, Formula 7D, Formula 7D, and Formula 7D, respectively. The frequencies of Formula 7D, Formula 7D, Formula 7D, and Formula 7D are expected to be Formula 7D, Formula 7D, Formula 7D, and Formula 7D, respectively. Thus, for any nonsister gamete type, there are three ways to estimate these gamete frequencies. For example, Formula 7D can be estimated by the following three equations:

Formula 8A(8a)

Formula 8B(8b)

Formula 8C(8c)
A simple single estimate can be obtained by taking the average of the three. The approach is also used for other gametes, resulting in the estimates

Formula 9A(9a)

Formula 9B(9b)

Formula 9C(9c)

Formula 9D(9d)
where Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, and Formula 9D are estimates of Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, Formula 9D, and Formula 9D, respectively.


 
FIGURE 1.—

Three marker regions on a chromosome. C, codominant marker; D, dominant marker.

 
Similarly, we can obtain estimates of the frequencies of these four types of nonsister gametes in 1C2D from

Formula 10A(10a)

Formula 10B(10b)

Formula 10C(10c)

Formula 10D(10d)
where Formula 10D, Formula 10D, Formula 10D, Formula 10D, Formula 10D, and Formula 10D are the estimated frequencies of phenotypes Formula 10D, Formula 10D, Formula 10D, Formula 10D, Formula 10D, Formula 10D, and Formula 10D, respectively.

Three-point estimates of recombination fractions between loci:

Recombination fractions between loci can be estimated from q's. Since q's are estimated separately, their sum does not always satisfy the equation Formula 10D. Therefore, before estimating the recombination fraction, we obtain normalized estimates of q's as

Formula 10D

Formula 10D
The three loci are regarded as mutually independent if the null hypothesis Formula 10D is not rejected at the 0.05 significance level. If only two of the four types of nonsister gametes have equal estimated frequencies at the 0.05 significance level, two of the loci are regarded as linked to each other and the remaining locus as independent of them.
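The normalization step can be sketched as a proportional rescaling. This is only one plausible reading, since the normalizing formula itself is not recoverable here. The target sum of 1/2 is an assumption that holds when each q is the frequency of a single gamete taken from each of the four sister pairs; the `target_sum` parameter and the function name are hypothetical.

```python
def normalize_gamete_freqs(q, target_sum=0.5):
    """Rescale separately estimated q's so that they add up to target_sum.

    q maps a gamete label to its estimated frequency. The default target
    of 0.5 is an assumption (one gamete per sister pair); pass 1.0 if the
    q's are defined as frequencies of whole sister pairs instead.
    """
    total = sum(q.values())
    if total <= 0:
        raise ValueError("estimated frequencies must have a positive sum")
    return {label: value * target_sum / total for label, value in q.items()}

# Made-up estimates whose sum (0.52) drifts from 1/2 through sampling error.
raw = {"abc": 0.41, "abC": 0.06, "aBc": 0.01, "Abc": 0.04}
normalized = normalize_gamete_freqs(raw)
```

Proportional rescaling preserves the ranking of the four frequencies, so the identification of the largest and smallest classes used below is unaffected.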

For linked loci, the frequencies of the four pairs of nonsister gametes can be used to distinguish the coupling phase from the repulsion phase between loci and consequently lead to proper estimates of the recombination fraction between loci according to whether they are in the coupling phase or in the repulsion phase. For example, suppose the order of the three loci is abc. Then if Formula 10D is the smallest and Formula 10D is the largest, each pair of the three loci is in the coupling phase, and if Formula 10D is the largest and Formula 10D is the smallest, then loci a and c are in the coupling phase but loci a and b and loci b and c are in the repulsion phase. On the other hand, if Formula 10D is the largest and Formula 10D is the smallest, then loci a and b are in coupling phase but loci a and c and loci b and c are in repulsion phase. Similarly if Formula 10D is the smallest and Formula 10D is the largest, then loci b and c are in coupling phase but loci a and b and loci a and c are in repulsion.

In the coupling phase Formula 10D is the frequency of double crossover in the F2 progeny. Thus, the recombination fractions between a and b, between b and c, and between a and c can be estimated by

Formula 11(11)
Estimates of the recombination fractions between loci in the other orders in the coupling phase are also obtained in a similar manner.

In the repulsion phase, the order (a–b–c) leads to Formula 11 due to double crossover, and thus the recombination fractions between a and b, between b and c, and between a and c are estimated by

Formula 12(12)
The recombination fractions between three loci in the other orders in the repulsion phase can be estimated in a similar fashion.
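The logic of this section (largest class is parental, smallest class is the double crossover, phase is read off the parental gamete) can be sketched with standard three-point arithmetic. This is offered as an illustration, not as Equations 11 and 12 themselves. The gamete labels, the function names, and the choice to divide by the sum of the four frequencies are assumptions of the sketch.

```python
def flipped_locus(g1, g2):
    """Index of the single locus at which g1 and g2 differ, up to sister gametes."""
    diff = [i for i in range(3) if g1[i] != g2[i]]
    if len(diff) == 1:
        return diff[0]
    # Two differences: the sister of g2 differs from g1 only at the third locus.
    return ({0, 1, 2} - set(diff)).pop()

def three_point(q):
    """q maps one gamete per sister pair (e.g. "abc", "abC", "aBc", "Abc")
    to its estimated frequency. Returns (order, phases, r)."""
    ranked = sorted(q, key=q.get)
    dco, parental = ranked[0], ranked[-1]   # smallest and largest classes
    singles = ranked[1:3]                   # single-recombinant classes
    total = sum(q.values())
    middle = flipped_locus(parental, dco)
    r = {}
    flanks = []
    for s in singles:
        flank = flipped_locus(parental, s)
        flanks.append(flank)
        # Recombinants between the middle locus and this flank: singles + doubles.
        r[tuple(sorted((middle, flank)))] = (q[s] + q[dco]) / total
    # Double crossovers do not recombine the two flanking loci.
    r[tuple(sorted(flanks))] = (q[singles[0]] + q[singles[1]]) / total
    phases = {}
    for i, j in ((0, 1), (0, 2), (1, 2)):
        same_case = parental[i].isupper() == parental[j].isupper()
        phases[(i, j)] = "coupling" if same_case else "repulsion"
    order = (min(flanks), middle, max(flanks))
    return order, phases, r

# Made-up frequencies: abc largest, aBc smallest, so the order is a-b-c.
order, phases, r = three_point({"abc": 0.40, "abC": 0.05, "aBc": 0.01, "Abc": 0.04})
```

In this toy case all three pairs are in the coupling phase and the estimates are 0.10 (a-b), 0.12 (b-c), and 0.18 (a-c). If aBc were the largest class instead, the same function would report a and c in coupling and both pairs involving b in repulsion, in line with the second case described above.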

Reduction of the three-point estimates of recombination fractions to the two-point estimates:

If n loci on a chromosome are genotyped in the mapping study, there are Formula 12 combinations of three loci, each of which results in three estimates of the recombination fraction. Therefore a total of Formula 12 recombination fractions are being estimated. When n is large, it will be difficult to compare all these combinations for building a linkage map of n loci even on a modern computer. Moreover, the Formula 12 recombination fractions contain coupling and repulsion linkage information. To avoid these complex comparisons, it is necessary to reduce the three-point estimates to two-point estimates. Although loci i and j would be configured with Formula 12 other loci to form Formula 12 three-point combinations, the linkage phase between loci i and j has already been fixed regardless of the other locus. Estimates of the recombination fraction between loci i and j may vary slightly with the other loci because of their different double-exchange frequencies and sampling error; hence, the estimate needs to be adjusted across the Formula 12 other loci. For convenience, let the estimate of the recombination fraction between loci i and j in a three-point combination (i, j, k) be referred to as a three-point estimate and denoted by Formula 12, where k is called a reference locus and Formula 12. Thus, for n loci on a chromosome or a fragment, the recombination fraction between loci i and j has Formula 12 three-point estimates. The order of loci i, j, and k in Formula 12 has been determined previously; that is, Formula 12 contains the order information of these three loci according to Equations 11 and 12. On the other hand, there are Formula 12 estimates of the recombination fraction between loci i and j. These Formula 12 estimates fluctuate with sampling errors and with different double-exchange values, which depend upon the distances of locus i and/or locus j from locus k. 
Three cases for the variation of double-exchange values with respect to the estimate of the recombination fraction between loci i and j are considered: (1) loci i and j are adjacent loci, and all reference loci are outside interval i–j; (2) loci i and j are two terminal loci on a chromosome or a fragment, and all reference loci are within interval i–j; and (3) loci i and j are nonadjacent loci and the reference loci are either within or outside interval i–j. In the first case, the double exchanges involving each reference locus are detected and measured but differ from one reference locus to another. In the second case, the double exchanges involving reference loci do not contribute to the recombination fraction between loci i and j. There is only one type in this case: loci i and j are two terminal loci, but the Formula 12 estimates still differ among reference loci because the double-exchange frequency depends on the reference locus; for example, a reference locus near locus i or j has a lower double-exchange frequency than a reference locus at some distance from both loci i and j. In other words, the former loses fewer double exchanges than the latter and therefore gives a larger estimate. The third case is intermediate between the first and second cases, as is seen in the next section. Thus, the recombination fraction between loci i and j is estimated by an average estimate over Formula 12 reference loci:

Formula 13(13)
Formula 13 thus contains not only information on the linkage phase but also the average double-exchange frequency over all reference loci and, in addition, balances sampling errors. Therefore, Formula 13 is expected to be closer to its true value than the estimate obtained by using an EM algorithm.
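The reduction from three-point to two-point estimates can be sketched as follows, assuming that the average in Equation 13 is a plain arithmetic mean over the available reference loci. The dictionary layout, the function name, and the numerical values are hypothetical and serve only to show the bookkeeping.

```python
from math import comb
from statistics import mean

def reduce_to_two_point(three_point_estimates):
    """Average three-point estimates of r_ij over their reference loci k.

    three_point_estimates maps (i, j, k) -> estimate of the recombination
    fraction between loci i and j obtained with reference locus k.
    """
    grouped = {}
    for (i, j, _k), value in three_point_estimates.items():
        grouped.setdefault((min(i, j), max(i, j)), []).append(value)
    return {pair: mean(values) for pair, values in grouped.items()}

n_loci = 4
n_triples = comb(n_loci, 3)        # number of three-locus combinations
n_estimates = 3 * n_triples        # three recombination fractions per triple

# Made-up three-point estimates for loci 1, 2, 3, 5 (illustrative values only).
estimates = {(1, 2, 3): 0.10, (1, 2, 5): 0.14,
             (1, 3, 2): 0.05, (1, 3, 5): 0.07}
two_point = reduce_to_two_point(estimates)
```

For four loci there are four triples, which agrees with the four combinations (123), (125), (135), and (235) used in the example that follows, and each pair of loci has n - 2 = 2 reference loci.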


AN EXAMPLE
As an example to illustrate the construction of linkage maps by MAPMAKER/EXP (version 3.0b), LANDER et al. (1987) provided an RFLP data set of 333 F2 mice. Since RFLP markers are codominant, A, H, and B are used in the data set for each locus to denote homozygotes of type A, heterozygotes (type H), and homozygotes of type B, respectively. To evaluate our new method, we converted these codominant marker data into dominant marker data by changing A to H and applied our new method to the dominant marker data set of the first six markers in the unknown linkage phase. Table 1 provides the estimates of the four pairs of nonsister gametes in the three-point combinations in the sample of 333 F2 individuals. It is clear that the frequencies of the four pairs of nonsister gametes containing both loci 4 and 6 all fit the ratios of 1:1:1:1 very well, which indicates that loci 4 and 6 are independent of each other and unlinked to the other four loci. Thus, these two loci are excluded. By using Equations 11 and 12, we obtained estimates of the recombination fractions in three-point combinations (123), (125), (135), and (235). The procedure is as follows. The first step is to determine the linkage order of three loci in a combination; for example, for combination (123), Formula 13 indicates that Formula 13 is the parental type and Formula 13 is the type due to double exchange. Those remaining are recombinants, where Formula 13 and Formula 13, respectively, represent recessive and dominant alleles in locus i (i = 1, 2, 3) in a combination. These three loci have the linkage order of 132. The second step is to determine the linkage phase: since gamete Formula 13 is recessive at all three loci and has the largest frequency among these four types of nonsister gametes, we can determine that loci 1, 2, and 3 are in the coupling phase. 
The third step is to estimate recombination fractions in combination (123) by applying Equation 11 for the case of the coupling phase to the data in Table 1; that is,

Formula 13
Similarly, we also obtained estimates of the recombination fractions in combinations (125), (135), and (235) (see Table 2).



 
TABLE 1

Estimation of frequencies of four types of nonsister gametes

 


 
TABLE 2

Estimation of recombination fractions by using the new method

 
Finally, the three-point estimates of the recombination fractions were incorporated into two-point estimates by applying Equation 13 to the data in Table 2:

Formula 13
On the basis of the two-point estimates of recombination fractions, the best linkage map for these four loci under study was found to be 1325, using a novel approach called the unidirectional growth method (TAN and FU 2006), where loci 1, 2, 3, and 5 correspond to markers T175, T93, C35, and C66, respectively, in the original data set. The same linkage map (see Figure 2A) was obtained when only some of the markers were converted to dominant markers and is also the same linkage map that was obtained by MAPMAKER (at LOD = 3.0) in the original data. However, when all markers are converted to the dominant type, MAPMAKER yielded a linkage map 132564 (at LOD = 3.0) where locus 6 corresponding to marker T209 was linked to locus 5 (C66) at map distance 30.3 cM and locus 4 corresponding to T24 was linked to locus T209 at map distance 14.9 cM (see Figure 2B). These observations indicate that the new method leads to a better estimate of recombination than the maximum-likelihood method between dominant markers in the case of unknown phase in F2 progeny.


 
FIGURE 2.—

Two linkage maps of loci built by the unidirectional growth method (TAN and FU 2006) on the basis of the new estimates of recombination fractions (A) and by MAPMAKER on the basis of the EM estimates (B), where the data of the RFLP markers provided in MAPMAKER/EXP (version 3.0, LANDER et al. 1987) were converted into dominant markers by replacing B with H.

 


SIMULATION STUDY
Since real data are not the best for fully evaluating a method because of unknown recombination fractions between loci, we used a computer simulation to generate data so that estimates of the recombination fraction can be compared to their true values. In addition to the new method, we also implemented the EM algorithm (see LIU 1998 for a detailed description of the process). To avoid potential unknown bias of a map-making method, we implemented the exhaustive search method to make maps (LIU 1998). Since the exhaustive search is extremely time consuming (MESTER et al. 2003b), we examined only two short linkage maps, composed of 6 and 11 dominant loci, respectively. Five map distances 10, 15, 20, 25, and 30 cM (1 cM = 1%) were randomly assigned to each adjacent interval. This setting makes it more difficult to estimate recombination fractions than in the case of a single fixed distance for all adjacent loci.

We took two cases of linkage phase into account in the simulation: (1) coupling phase (CP), in which allele 1 at every locus is assigned to one parental (P1) chromosome and allele 0 at every locus to the other parental (P2) chromosome; and (2) unknown phase (UP), in which allele 1 or 0 at each locus is allocated at random, with equal probability, to each of the two parental chromosomes. We used the point process crossover model (FOSS et al. 1993; MCPEEK and SPEED 1995) to generate recombinants. In each F1 meiosis, recombination events occur at random between two adjacent loci. We considered both crossover independence and complete crossover interference (in separate simulations). For complete crossover interference, we assumed that a crossover cannot occur within an interval between two nonsister chromatids when a crossover has already occurred within the adjacent interval between the same two nonsister chromatids, provided that the sum of the distances over the two adjacent intervals is ≤40 cM.
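A much simplified version of this simulation setting can be sketched as follows. It covers only the crossover-independence case, treats recombination in each adjacent interval as an independent event with probability equal to the map distance divided by 100 (following the 1 cM = 1% convention stated above), and does not implement the point process crossover model or interference. All function names and the seed values are hypothetical.

```python
import random

def make_f1_chromosomes(n_loci, phase, rng):
    """Return the two parental chromosomes carried by the F1 (lists of 0/1)."""
    if phase == "CP":
        p1 = [1] * n_loci                       # all dominant alleles on P1
    elif phase == "UP":
        p1 = [rng.randint(0, 1) for _ in range(n_loci)]
    else:
        raise ValueError("phase must be 'CP' or 'UP'")
    p2 = [1 - allele for allele in p1]          # F1 is heterozygous everywhere
    return p1, p2

def make_gamete(p1, p2, r_adjacent, rng):
    """One F1 gamete; each interval recombines independently (no interference)."""
    strands = (p1, p2)
    current = rng.randint(0, 1)
    gamete = [strands[current][0]]
    for locus, r in enumerate(r_adjacent, start=1):
        if rng.random() < r:
            current = 1 - current               # switch strand after a crossover
        gamete.append(strands[current][locus])
    return gamete

def simulate_f2(n_individuals, map_cm, phase="UP", seed=0):
    """Dominant phenotypes (1 = at least one copy of allele 1) of F2 individuals."""
    rng = random.Random(seed)
    r_adjacent = [d / 100.0 for d in map_cm]
    p1, p2 = make_f1_chromosomes(len(map_cm) + 1, phase, rng)
    population = []
    for _ in range(n_individuals):
        g1 = make_gamete(p1, p2, r_adjacent, rng)
        g2 = make_gamete(p1, p2, r_adjacent, rng)
        population.append(tuple(max(a, b) for a, b in zip(g1, g2)))
    return population

# Six loci with distances drawn from the five values used in the text.
interval_rng = random.Random(1)
intervals = [interval_rng.choice([10, 15, 20, 25, 30]) for _ in range(5)]
f2 = simulate_f2(300, intervals, phase="UP", seed=42)
```

Because the F1 is heterozygous at every locus in both settings, each locus is expected to show the dominant phenotype in about three-quarters of the F2 individuals.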

The expected ratio of phenotypes 1 and 0 for each locus is 3:1 among F2 individuals. The simulations were carried out with sample sizes N = 100, 200, and 300 F2 individuals, and loci that exhibited significant segregation distortion as revealed by a chi-square test were removed. For each parameter set, 500 replicates were generated. Two criteria were used to evaluate these methods. One is the bias of the estimates of recombination fractions between two adjacent loci, which is defined here as the average squared distance of the estimate from its true value (that is, the mean squared error), and the other is the accuracy of a method in recovering the true linkage map of given loci.
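The two checks mentioned here can be written out directly. The sketch below uses the textbook chi-square goodness-of-fit statistic for a 3:1 ratio with 1 d.f. and the 5% critical value of 3.841; the 0.05 level for this filter is an assumption, since the text does not state the level used for removing distorted loci. The function names are hypothetical.

```python
CHI2_CRIT_DF1_05 = 3.841  # upper 5% point of chi-square with 1 d.f.

def chi_square_3_to_1(n_dominant, n_recessive):
    """Goodness-of-fit statistic for a 3:1 dominant:recessive ratio."""
    n = n_dominant + n_recessive
    exp_dom, exp_rec = 0.75 * n, 0.25 * n
    return ((n_dominant - exp_dom) ** 2 / exp_dom
            + (n_recessive - exp_rec) ** 2 / exp_rec)

def shows_distortion(n_dominant, n_recessive, critical=CHI2_CRIT_DF1_05):
    """True if the locus departs significantly from 3:1 (assumed 0.05 level)."""
    return chi_square_3_to_1(n_dominant, n_recessive) > critical

def mean_squared_deviation(estimates, true_value):
    """Average squared distance of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)
```

For example, 80 dominant and 20 recessive individuals give a statistic of about 1.33 and the locus is kept, whereas a 50:50 split gives about 33.3 and the locus would be removed.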

Table 3 shows the biases of estimates in the case of UP obtained by the two methods. In all cases, the new method has a much smaller bias than the EM algorithm, which is a good indication that the new method is a better approach. However, the ultimate measure of the usefulness of a method for estimating recombination fractions is whether it leads to more accurate linkage map estimation. Table 4 summarizes the results of linkage map estimation obtained by applying the exhaustive search method to the recombination fractions estimated by both the EM algorithm and the new method. It can be seen from Table 4 that both the EM and the new estimators have a very high accuracy in the case of CP, even in a relatively small sample of 100 F2 individuals. However, the new estimator has a much higher accuracy than the EM estimator in the case of UP, as expected. Furthermore, the accuracy of the new method improves rapidly with sample size: it is 50.5% with a sample size of 100 F2 individuals and 85.1% with a sample size of 300 F2 individuals. The accuracy of both estimators decreases as the number of dominant loci increases. Table 5 shows the results of accuracy under the assumption of crossover interference. As expected, both methods perform more poorly than under the assumption of crossover independence. Although complete crossover interference in general likely occurs only between two very small adjacent intervals, the results in Table 5 suggest that crossover interference has in general a negative impact on the estimate of the recombination fraction.



 
TABLE 3

Variances of estimates of recombination fractions between adjacent dominant loci in the unknown phase (UP) deviated from their respective true values in 500 simulated samples

 


 
TABLE 4

Efficiencies of two recombination fraction estimators in recovering the true linkage orders of 6 and 11 linked dominant loci in 500 samples generated by simulations on the basis of crossover independence

 


 
TABLE 5

Efficiencies of two recombination fraction estimators in recovering the true linkage orders of 6 and 11 linked dominant loci in 500 samples generated by simulations on the basis of crossover interference

 


DISCUSSION
We showed in this article, using both real and simulated data, that the widely used EM algorithm for estimating the recombination fraction between a pair of loci performs poorly for dominant markers because it fails to distinguish the coupling phase from the repulsion phase. We also found (results not shown) that, in settings similar to those of Tables 4 and 5, MAPMAKER/EXP performed poorly (<10% accuracy) for dominant markers in the unknown linkage phase, regardless of whether a two-point or a three-point approach was used to estimate recombination fractions. The excellent performance of our new method may be due to several factors: (a) improved accuracy of the estimates of the gamete frequencies, (b) three-point analysis in which coupling and repulsion phases of loci are effectively distinguished, and (c) reduction of three-point estimates to two-point estimates, resulting in more stable estimates of the recombination fractions.

Although the new method appears to have a shortcoming in that good accuracy of recovering true linkage maps using its estimates requires a reasonably large sample size, it does provide a promising approach that can lead to a better estimation of linkage maps from either dominant loci or mixed loci when the sample size is ~300 F2 individuals. One likely application of the new method is to supplement the EM method. More specifically, one can apply both methods to the same data set and obtain two sets of estimates of recombination fractions. The EM estimates are used to build two partner linkage maps in which all linked loci are in the coupling phase. The new method's estimates can be used to integrate these two partner linkage maps into a single linkage map.

This study also indicates that examination of three loci at a time does provide additional information for estimating both recombination fractions and linkage maps. Since there are on the order of n^3 combinations of three loci, any approach that analyzes three loci at a time will be demanding computationally, particularly when the number of loci is large. It will be practical only when the speed of analyzing each combination of the three loci is sufficiently fast. The new method is practical even for a large number of loci since the amount of computation for each triplet of loci is minimal.


APPENDIX A
Since Formula 13, an alternative expression of Formula 13 is

Formula A1(A1)
Similarly, we have

Formula A2(A2)

Formula A3(A3)
It follows that

Formula A4(A4)
and

Formula A5(A5)
Equations A4 and A5 lead to the solution for Formula A5 as

Formula A6(A6)


ACKNOWLEDGEMENTS
We thank the High Performance Computer Center of Yunnan University for computational support and Sara Barton for editorial assistance. This research was supported by National Institutes of Health grant R01 GM50428 (to Y.-X. F.) and by funds from Yunnan University and a 973 project (2003CB415102).


LITERATURE CITED

ALLARD, R. W., 1956 Formulas and tables to facilitate the calculation of recombination values in heredity. Hilgardia 24: 235–278.

BARTEL, P. L., J. A. ROECKLEIN, D. SENGUPTA and S. FIELDS, 1996 A protein linkage map of Escherichia coli bacteriophage T7. Nat. Genet. 12: 72–77.

CONSOLI, L., A. LEFEVRE, M. ZIVY, D. DE VIENNE and C. DAMERVAL, 2002 QTL analysis of proteome and transcriptome variations for dissecting the genetic architecture of complex traits in maize. Plant Mol. Biol. 48: 575–581.

DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39B: 1–38.

FOSS, E., R. LANDE, F. W. STAHL and C. M. STEINBERG, 1993 Chiasma interference as a function of genetic distance. Genetics 133: 681–691.

KNAPP, S. J., J. L. HOLLOWAY, W. C. BRIDGES and B. H. LIU, 1995 Mapping dominant markers using F2 mating. Theor. Appl. Genet. 91: 74–81.

LANDER, E. S., and P. GREEN, 1987 Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.

LANDER, E. S., P. GREEN, J. ABRAHAMSON, A. BARLOW, M. J. DALY et al., 1987 MapMaker: an interactive computer package for constructing genetic linkage maps of experimental and natural populations. Genomics 1: 174–181.

LIU, B. H., 1998 Statistical Genomics. Linkage, Mapping, and QTL Analysis, pp. 163–214. CRC Press, Cleveland/Boca Raton, FL.

MESTER, D. I., Y. I. RONIN, Y. HU, E. NEVO and A. B. KOROL, 2003a Efficient multipoint mapping: making use of dominant repulsion-phase markers. Theor. Appl. Genet. 107: 1102–1112.

MESTER, D. I., Y. I. RONIN, Y. HU, E. NEVO and A. B. KOROL, 2003b Constructing large-scale genetic maps using an evolutionary strategy algorithm. Genetics 165: 2269–2282.

MCPEEK, M. S., and T. P. SPEED, 1995 Modeling interference in genetic recombination. Genetics 139: 1031–1044.

OTT, J., 1991 Analysis of Human Genetic Linkage. Johns Hopkins University Press, Baltimore/London.

PENG, J., A. KOROL, T. FAHIMA, M. RODER, Y. RONIN et al., 2000 Molecular genetic maps in wild emmer wheat, Triticum dicoccoides: genome-wide coverage, massive negative interference, and putative quasi-linkage. Genome Res. 10: 1509–1531.

TAN, Y.-D., and Y.-X. FU, 2006 A novel method for estimating linkage maps. Genetics 173: 2383–2390.

THIELLEMENT, H., N. BAHRMAN, C. DAMERVAL, C. PLOMION, M. ROSSIGNOL et al., 1999 Proteomics for genetic and physiological studies in plants. Electrophoresis 20: 2013–2026.

ZIVY, M., and D. DE VIENNE, 2000 Proteomics: a link between genomics, genetics and physiology. Plant Mol. Biol. 44: 575–580.

Communicating editor: N. TAKAHATA