Abstract
A maximum-likelihood method to estimate the recombination fraction and its sampling variance using informative and noninformative half-sib offspring is derived. Estimates of the recombination fraction are biased by up to 20 cM when noninformative offspring are discarded. In certain scenarios, the sampling variance can be increased or reduced up to fivefold because of the bias in estimating the recombination fraction, and the LOD score can be reduced by up to 5 units when noninformative offspring are discarded. Estimates of the recombination fraction, map distance, and LOD score were compared when constructing a genetic map with 251 two-point linkage analyses and six families of Norwegian cattle, to evaluate the implications of discarding noninformative offspring in practical situations. The average discrepancies in absolute value (average difference when using and neglecting noninformative offspring) were 0.0146, 1.64 cM, and 2.61 for the recombination fraction, map distance, and the LOD score, respectively. A method for simultaneous estimation of allele frequencies in the dam population and a transmission disequilibrium parameter is proposed. This method might account for the bias in estimating allele frequencies in the dam population when the half-sib offspring are selected for production traits.
A large effort has been made to construct genetic maps of markers for many farm animals. The traditional approach has been to use reference families in which one or more families consisting of parents and offspring are genotyped for genetic markers with as much coverage of the genome as possible. In cattle, the resource families have consisted of full-sibs (Barendse et al. 1997), mixed full- and half-sib groups (Kappes et al. 1997), or half-sibs (Ma et al. 1996). Increasing interest in the construction of genetic maps aimed at mapping quantitative trait loci (QTL) in cattle has focused on the development of male genetic maps in different populations (Georges et al. 1995; Ma et al. 1996; Våge et al. 2000). Those genetic maps have been used for QTL mapping by typing sires and their half-sib sons and using records of production in the granddaughters in the so-called granddaughter design (Weller et al. 1990). Usually DNA from sires and half-sib offspring is available. However, DNA from dams is usually not available because they are culled before typing starts. Noninformative half-sib offspring will be produced in this situation. For example, let a sire be heterozygous Aa at a codominant marker. Alleles segregating in the dam population are A, a, and a^{*}, where a^{*} is any other allele segregating in the population and different from the alleles of the sire. Five genotypes of half-sib offspring are possible: AA, Aa^{*}, Aa, aa^{*}, and aa. Tracing of sire alleles in offspring AA, aa, Aa^{*}, and aa^{*} is straightforward. However, tracing of the sire alleles in offspring with the same genotype as their sire, Aa, is not possible. Therefore, offspring with the same genotype as their sire are noninformative.
Methods for linkage analysis have either ignored offspring with genotype Aa (Vilkki et al. 1997; Ma et al. 1996) or made use of estimates of allele frequency in the dam population with a single sire family at a time (Georges et al. 1995).
The method proposed by Georges et al. (1995) to estimate allele frequencies in the dam population also assumed that either allele in a heterozygous sire segregates with equal probability, in spite of the fact that those sons have been highly selected for production traits. Consequently, a higher frequency of the alleles associated with those traits is expected among the selected offspring.
More recently, consensus maps are being developed for each of the cattle chromosomes using data provided by different labs with different reference families (Casas et al. 1999; Gu et al. 2000). The linkage analysis is carried out using the software CRIMAP, which ignores noninformative half-sib offspring.
It is the purpose of this article to show that discarding noninformative offspring leads to biased estimates of the recombination fraction and, therefore, of the map distances. It is also shown that neglecting the dam contribution affects both sampling variance of the estimates of the recombination fraction and the LOD score.
The order of this article is (1) to describe a maximum-likelihood method to estimate the recombination fraction and its sampling variance assuming known allele frequencies in the dam population, (2) to show the magnitude of the bias in the estimates of the recombination fraction when discarding noninformative offspring, (3) to show the effect of discarding noninformative offspring on the sampling variance of the recombination fraction and on the LOD scores, (4) to describe a maximum-likelihood method for estimating both a transmission disequilibrium parameter and allele frequencies in the dam population accounting for selection in the half-sib offspring, and (5) to evaluate the discrepancies in estimating the recombination fraction, its sampling variance, map distance, and LOD score when using or discarding noninformative offspring in the construction of a genetic map with six large half-sib families of Norwegian cattle.
THEORY AND METHODS
Maximum-likelihood estimation using informative and noninformative half-sib offspring: Let a sire be heterozygous at two codominant markers with alleles A and a at the first marker and with alleles B and b at the second marker. Alleles a^{*} and b^{*} represent any other allele segregating in the dam population different from the alleles of the sire at the first and the second marker, respectively. Assume that linkage phase is AB/ab and that the true recombination fraction is c. Offspring genotypes produced after mating the sire with different dams and having one offspring from each mating can be classified according to the gamete inherited from the sire: AB, Ab, aB, ab, Ax, ax, xB, xb, and xx. In this notation, x refers to noninformative offspring at the corresponding marker. For example, offspring with genotype xx are noninformative at both markers. Table 1 shows the type of gametes inherited from sire to noninformative offspring given the gametes from sire and dam. The expected frequency of each genotype can be computed by multiplying the frequencies of sire and dam gametes and adding over all possible combinations for each type of gamete. The gamete frequencies for each type of gamete can be used in the maximum-likelihood estimation of the recombination fraction. The maximum-likelihood equation is
The maximum-likelihood equation assuming both linkage equilibrium and known allele frequencies in the dam population is
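As a hedged illustration of this likelihood, the sketch below computes the expected frequency of each of the nine offspring classes and the log-likelihood up to a constant. The class probabilities are derived here from the stated assumptions (sire phase AB/ab, dam alleles in linkage equilibrium with frequencies f_A, f_a, f_B, f_b) and are not copied from Table 1 or Table 2; the function names `gamete_probs` and `log_likelihood` are hypothetical.

```python
import math

def gamete_probs(c, fA, fa, fB, fb):
    """Expected frequencies of the nine offspring classes (sire phase AB/ab).

    Assumption: dam alleles at the two markers are in linkage equilibrium.
    An offspring is noninformative (x) at a marker when the dam supplies the
    sire's other allele, so the offspring genotype equals the sire genotype.
    """
    nr = (1.0 - c) / 2.0  # frequency of each nonrecombinant sire gamete
    r = c / 2.0           # frequency of each recombinant sire gamete
    return {
        "AB": nr * (1 - fa) * (1 - fb),
        "ab": nr * (1 - fA) * (1 - fB),
        "Ab": r * (1 - fa) * (1 - fB),
        "aB": r * (1 - fA) * (1 - fb),
        "Ax": (1 - fa) * (nr * fb + r * fB),
        "ax": (1 - fA) * (nr * fB + r * fb),
        "xB": (1 - fb) * (nr * fa + r * fA),
        "xb": (1 - fB) * (nr * fA + r * fa),
        "xx": nr * (fa * fb + fA * fB) + r * (fa * fB + fA * fb),
    }

def log_likelihood(c, counts, freqs):
    """ln L(c) without the constant: sum over classes of n_i * ln(phi_i)."""
    phi = gamete_probs(c, *freqs)
    return sum(n * math.log(phi[k]) for k, n in counts.items() if n > 0)
```

Here `counts` maps each class label to its observed count and `freqs` is the tuple (f_A, f_a, f_B, f_b). The nine probabilities sum to one for any c, which is a quick check that no mating combination was missed.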
A solution to the maximum-likelihood equation can be obtained using the grid search method or methods using derivatives such as Newton-Raphson (Appendix A). The latter methods allow computation of the approximate sampling variance of the estimates of the recombination fraction
In general, the linkage phase is unknown and offspring from several sires are available. The joint-likelihood equation using both phases and multiple-sire families is
Bias in the estimation of the recombination fraction using informative offspring: The maximum-likelihood estimate of the recombination fraction when noninformative offspring are discarded (c^{*}) is simply the frequency of informative recombinants divided by the sum of frequencies of all informative offspring (Appendix B). From Table 1, the expected frequency of informative recombinants is
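A minimal Newton-Raphson sketch follows, under the same assumptions as the text (known phase AB/ab, known dam allele frequencies, linkage equilibrium). It uses the fact that each class probability is linear in c, so the first and second derivatives of ln L have simple closed forms. The class probabilities are derived here rather than taken from Table 2, and `gamete_probs` and `newton_raphson` are hypothetical names.

```python
def gamete_probs(c, fA, fa, fB, fb):
    """Class probabilities for sire phase AB/ab (assumed linkage equilibrium)."""
    nr, r = (1.0 - c) / 2.0, c / 2.0
    return {
        "AB": nr * (1 - fa) * (1 - fb), "ab": nr * (1 - fA) * (1 - fB),
        "Ab": r * (1 - fa) * (1 - fB), "aB": r * (1 - fA) * (1 - fb),
        "Ax": (1 - fa) * (nr * fb + r * fB), "ax": (1 - fA) * (nr * fB + r * fb),
        "xB": (1 - fb) * (nr * fa + r * fA), "xb": (1 - fB) * (nr * fA + r * fa),
        "xx": nr * (fa * fb + fA * fB) + r * (fa * fB + fA * fb),
    }

def newton_raphson(counts, freqs, c=0.25, tol=1e-10, max_iter=100):
    """Return (c_hat, approximate sampling variance of c_hat)."""
    # Each phi_i is linear in c: phi_i(c) = p0_i + slope_i * c.
    p0 = gamete_probs(0.0, *freqs)
    p1 = gamete_probs(1.0, *freqs)
    slope = {k: p1[k] - p0[k] for k in p0}

    def derivatives(c):
        score = hess = 0.0
        for k, n in counts.items():
            if n > 0:
                phi = p0[k] + slope[k] * c
                score += n * slope[k] / phi        # d ln L / dc
                hess -= n * (slope[k] / phi) ** 2  # d^2 ln L / dc^2
        return score, hess

    for _ in range(max_iter):
        score, hess = derivatives(c)
        # Newton step, kept inside (0, 0.5] so all probabilities stay positive.
        new_c = min(max(c - score / hess, 1e-6), 0.5)
        converged = abs(new_c - c) < tol
        c = new_c
        if converged:
            break
    _, hess = derivatives(c)
    return c, -1.0 / hess  # variance from the negative inverse second derivative
```

The bounds on c and the starting value 0.25 are choices made for this sketch; a grid search over the same log-likelihood is an equally valid alternative and avoids derivatives altogether.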
Consequently, the expected recombination fraction when discarding noninformative offspring is
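The size of this bias can be explored numerically. The sketch below assumes sire phase AB/ab and linkage equilibrium among dam alleles, and derives the expected frequencies of informative recombinants and nonrecombinants from those assumptions rather than from Table 1; `expected_c_star` is a hypothetical name.

```python
def expected_c_star(c, fA, fa, fB, fb):
    """Expected recombination fraction when noninformative offspring are discarded.

    An offspring is informative at a marker unless the dam supplies the sire's
    other allele; informative recombinants carry sire gamete Ab or aB.
    """
    rec = c * ((1 - fa) * (1 - fB) + (1 - fA) * (1 - fb))
    nonrec = (1 - c) * ((1 - fa) * (1 - fb) + (1 - fA) * (1 - fB))
    return rec / (rec + nonrec)

# Equal frequencies of the two sire alleles within each marker: no bias expected.
unbiased = expected_c_star(0.25, 0.3, 0.3, 0.2, 0.2)

# Very different frequencies at each marker, as in the simulated scenario
# f_A = 0.8, f_a = 0.1, f_B = 0.1, f_b = 0.8.
biased = expected_c_star(0.25, 0.8, 0.1, 0.1, 0.8)
```

Under these assumptions the two bracketed factors coincide whenever f_A = f_a and f_B = f_b, so the expected value reduces to c; with strongly unequal frequencies the two factors differ and the expected value moves away from c.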
Variance of the estimates of the recombination fraction using informative offspring: Discarding noninformative offspring also affects the sampling variance of the estimates of the recombination fraction. It is possible to evaluate the relation between both estimates using Fisher’s approximation. The estimate of the sampling variance of the recombination fraction when using only informative offspring is
Monte Carlo simulation: Analytic evaluation of the changes in the LOD scores caused by neglecting noninformative offspring is difficult because of the large number of combinations of the nine possible genotypes of offspring. A computer simulation was carried out in which one sire family was simulated with 50 offspring (the average number of sons used in the construction of a genetic map of Norwegian cattle; Våge et al. 2000). Linkage phase and allele frequencies in the dam population were assumed to be known. The probability of transmission of either allele at each marker from sire to offspring was assumed to be ^{1}/_{2}. Offspring from the sire were simulated using random drawings from a uniform distribution. If a drawing was in the interval between 0 and ϕ_{AB} then the gametic count for AB was increased by one. If the drawing was between ϕ_{AB} and (ϕ_{AB} + ϕ_{Ab}) then the gametic count for genotype Ab was increased by one. This process was carried out until all 50 offspring were assigned to one of the nine possible gametes according to their probability (ϕ_{i}; i = AB, Ab, aB, ab, Ax, ax, xB, xb, xx) and then accumulated in their corresponding gametic count. This approach is equivalent to simulating Mendelian inheritance of each marker from the sire to his sons, but it is computationally less demanding. Gametic counts were used to estimate the recombination fraction and to compute LOD scores (log_{10}[L(ĉ)/L(^{1}/_{2})]) using all offspring (informative and noninformative) and when discarding noninformative offspring. Each simulation set was replicated 10,000 times. Simulated recombination fractions were 0.05, 0.15, or 0.25. Results are given in Table 4.
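The simulation scheme can be sketched as follows. This is an illustration under stated assumptions, not the original program: class probabilities are derived assuming phase AB/ab and linkage equilibrium among dam alleles, estimation with all offspring uses a grid search, the estimate from informative offspring is capped at 0.5, and the default number of replicates is much smaller than 10,000 to keep the run short. All function names are hypothetical.

```python
import math
import random

CLASSES = ("AB", "Ab", "aB", "ab", "Ax", "ax", "xB", "xb", "xx")

def gamete_probs(c, fA, fa, fB, fb):
    """Class probabilities for sire phase AB/ab (assumed linkage equilibrium)."""
    nr, r = (1.0 - c) / 2.0, c / 2.0
    return {
        "AB": nr * (1 - fa) * (1 - fb), "ab": nr * (1 - fA) * (1 - fB),
        "Ab": r * (1 - fa) * (1 - fB), "aB": r * (1 - fA) * (1 - fb),
        "Ax": (1 - fa) * (nr * fb + r * fB), "ax": (1 - fA) * (nr * fB + r * fb),
        "xB": (1 - fb) * (nr * fa + r * fA), "xb": (1 - fB) * (nr * fA + r * fa),
        "xx": nr * (fa * fb + fA * fB) + r * (fa * fB + fA * fb),
    }

def simulate_family(c, freqs, n_offspring, rng):
    """Assign each offspring to one of the nine classes according to phi_i."""
    phi = gamete_probs(c, *freqs)
    draws = rng.choices(CLASSES, weights=[phi[k] for k in CLASSES], k=n_offspring)
    return {k: draws.count(k) for k in CLASSES}

def log10_lik(c, counts, freqs):
    phi = gamete_probs(c, *freqs)
    return sum(n * math.log10(phi[k]) for k, n in counts.items() if n > 0)

def estimate_all(counts, freqs, grid=500):
    """Grid-search estimate and LOD score using all offspring."""
    cs = [i / (2.0 * grid) for i in range(1, grid + 1)]  # 0.001, ..., 0.5
    c_hat = max(cs, key=lambda c: log10_lik(c, counts, freqs))
    return c_hat, log10_lik(c_hat, counts, freqs) - log10_lik(0.5, counts, freqs)

def estimate_informative(counts):
    """Estimate and LOD score using only doubly informative offspring."""
    n_rec = counts["Ab"] + counts["aB"]
    n_non = counts["AB"] + counts["ab"]
    n = n_rec + n_non
    if n == 0:
        return 0.5, 0.0
    c_star = min(n_rec / n, 0.5)
    lod = n * math.log10(2.0)
    if n_rec:
        lod += n_rec * math.log10(c_star)
    if n_non:
        lod += n_non * math.log10(1.0 - c_star)
    return c_star, lod

def run(c=0.15, freqs=(0.8, 0.1, 0.1, 0.8), n_offspring=50, reps=200, seed=1):
    """Average (c_all, LOD_all, c_informative, LOD_informative) over replicates."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0, 0.0]
    for _ in range(reps):
        counts = simulate_family(c, freqs, n_offspring, rng)
        results = estimate_all(counts, freqs) + estimate_informative(counts)
        totals = [t + x for t, x in zip(totals, results)]
    return tuple(t / reps for t in totals)
```

Calling `run()` returns the four averages for one scenario; raising `reps` toward 10,000 brings the sketch closer to the replication used in the text at the cost of a longer run.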
Estimation of allele frequency in the dam population: The above maximum-likelihood method to estimate the recombination fraction using half-sib families assumed that allele frequencies in the dam population are known. In practice, allele frequencies might be estimated from the same data. A general maximum-likelihood estimation of allele frequencies in the dam population allowing estimation of the transmission disequilibrium of the sire alleles among offspring is described in this section. This disequilibrium would arise if the marker is linked to loci affecting a quantitative trait under selection. Selected offspring are the more frequent situation in cattle since bulls with lower estimated breeding values are culled after progeny testing.
Let a heterozygous sire, Aa, produce half-sib offspring, and let the frequencies of alleles A, a, and a^{*} in the dam population be f_{A}, f_{a}, and f_{a^{*}}, respectively. As in the previous sections, allele a^{*} corresponds to any allele segregating in the dam population different from the alleles of that particular sire. The gametes inherited from sire and dam by the offspring, the offspring genotypes, and their frequencies are given in Table 3. In the notation of the table, the transmission parameter (v) is the probability of transmitting one of the sire alleles to his offspring.
The likelihood equation to be maximized is
Two particular cases can be considered in the above equation. The first is when both alleles have the same probability of being transmitted to the offspring; i.e., v = ½. The likelihood equation becomes
The second case is when reliable estimates of allele frequencies in the dam population are available and they do not need to be estimated. The explicit solution for the transmission parameter is
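One way to obtain such an explicit solution is sketched below. It assumes that v is the probability of transmitting allele A, so that the offspring genotype frequencies are v f_A for AA, v f_{a^{*}} for Aa^{*}, v f_a + (1 − v) f_A for Aa, (1 − v) f_{a^{*}} for aa^{*}, and (1 − v) f_a for aa. These frequencies are derived here and are not copied from Table 3. Setting the derivative of the log-likelihood to zero and clearing denominators gives a quadratic in v; the function name `estimate_v` and the coefficient names are hypothetical.

```python
import math

def estimate_v(n_AA, n_Aastar, n_Aa, n_aastar, n_aa, fA, fa):
    """Maximum-likelihood estimate of v with known dam allele frequencies.

    Assumes at least one informative offspring of each kind (received A,
    received a) and fA, fa > 0, so exactly one root lies inside (0, 1).
    """
    n1 = n_AA + n_Aastar   # offspring that certainly received A
    n2 = n_aa + n_aastar   # offspring that certainly received a
    n3 = n_Aa              # noninformative offspring
    n = n1 + n2 + n3
    d = fa - fA
    # Quadratic q2*v^2 + q1*v + q0 = 0 from the score equation
    # n1/v - n2/(1 - v) + n3*d/(fA + v*d) = 0.
    q2 = -n * d
    q1 = (n1 + n3) * d - (n1 + n2) * fA
    q0 = n1 * fA
    if abs(q2) < 1e-12:    # fA == fa: noninformative offspring carry no signal on v
        return -q0 / q1
    disc = math.sqrt(q1 * q1 - 4.0 * q2 * q0)
    roots = ((-q1 + disc) / (2.0 * q2), (-q1 - disc) / (2.0 * q2))
    return next(root for root in roots if 0.0 <= root <= 1.0)
```

When f_A = f_a the estimate reduces to the proportion of informative offspring that received A, which matches the intuition that Aa offspring then say nothing about v.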
Comparison of a genetic map of Norwegian cattle constructed when using only informative offspring and when using all offspring: The previous sections addressed the impact of discarding noninformative offspring on linkage analysis results. However, the assumption of known frequencies of the alleles segregating in the dam population was necessary. It is of interest to evaluate, in practice, the effect of discarding noninformative offspring in the construction of genetic maps. A genetic map of Norwegian cattle was constructed using six half-sib families with an average of 50 informative and noninformative offspring per family (Våge et al. 2000). The map covers all 29 autosomal chromosomes. More information about the map can be found at the web site http://www.nlh.no/Institutt/IHF/Genkartstorfe/.
Comparison of estimates of the recombination fraction using all (AO) vs. only informative offspring (IO) was carried out by computing (1) accumulated discrepancy in absolute value of the recombination fraction,
RESULTS
Examples of the magnitude of the bias in the estimation of the recombination fraction when ignoring the dam’s contribution are depicted in Figure 1. The bias is very severe (up to 20 cM) when allele frequencies are very different at each of the two markers. Bias increases with the value of the recombination fraction and can be positive or negative depending on which sort of gamete (recombinant or nonrecombinant) is more frequently produced among noninformative offspring.
Figure 2 shows the effect of discarding noninformative offspring on the sampling variances of the estimates of the recombination fraction. The component γ (Equation 4) increases or decreases as much as five times the sampling variance of the estimates of the recombination fraction when allele frequencies are very different at the markers. The value of γ is up to 17% larger or smaller for moderate differences between allele frequencies and increases with smaller recombination fractions.
The results of the simulation experiment are given in Table 4. The average estimates of the recombination fraction over replicates when discarding noninformative offspring (c̄) were very similar to their predicted values (c̄^{*}). Table 4 also shows that maximum-likelihood estimation using AO allows unbiased estimation of the recombination fraction. The use of all offspring may increase LOD scores by more than 5 units. As expected, there is no bias in the estimates of the recombination fraction when the frequencies of alleles within each marker are identical (Table 4).
In practice, a large variety of situations may occur with respect to allele frequencies at linked markers in the dam population. A comparison of the values of linkage parameters in practical scenarios is needed to evaluate the impact of using only informative offspring in the construction of a genetic map. Discrepancies in absolute value for the recombination fraction, map distance, and LOD score for each of the autosomal chromosomes are given in Table 5. The highest average discrepancy for the recombination fraction corresponds to chromosome 3 (0.28). The average discrepancy of the genetic distance for the 13 intervals in that chromosome is 3.2 cM. The average discrepancies in absolute value considering the entire genome are 0.0146, 1.64 cM, and 2.61 for the recombination fraction, map distance, and LOD score, respectively.
It is also of interest to know if the use of only informative half-sib offspring alters the length of the genetic map or how the loss of information affects the variances of the estimates of the recombination fraction. Table 6 shows the accumulated net discrepancies of the recombination fraction, map distance, average variance of the estimate of the recombination fraction, and number of linkage analyses with significant LOD scores (with values >3) when using only informative offspring and when using all information. The genetic map constructed using only informative offspring has a reduced length (2806.9 cM) when compared to a genetic map constructed using all offspring (2897.4 cM). However, this reduction in length is only ∼3%. The variance of the estimates of the recombination fraction is increased when only informative offspring are used (0.057), showing the loss of information by excluding noninformative offspring. Results from 22 linkage analyses were not significant (LOD score <3) when using only informative offspring but were significant (LOD score >3) when using all offspring.
DISCUSSION
Linkage analysis using half-sib families is a powerful tool for accurate estimation of recombination fractions and map distances. This is because the widespread use of artificial insemination in cattle makes a very large number of male meioses available for linkage analysis. However, the maximum-likelihood method must make use of both informative and noninformative offspring to yield unbiased estimates of map distances. If only informative offspring are used in the linkage analysis then a severe bias may occur in the estimation of the recombination fraction. The magnitude of the bias depends on the allele frequencies in the dam population being used. In particular cases, the bias can be up to 20 cM. The average discrepancy in absolute value observed in the construction of a genetic map of Norwegian cattle was 1.64 cM. One of the main reasons to construct genetic maps using half-sib families is their further use for QTL mapping in the granddaughter design. The results of this article indicate that genetic maps constructed using only informative offspring will be inaccurate, which may reduce power for QTL mapping. The amount of bias is also considerable in view of today's efforts aimed at fine structural mapping of already mapped QTL.
Discarding the information of noninformative half-sib offspring also increases the sampling variance of the estimates of the recombination fraction and reduces the LOD score. The relation between sampling variances of estimates using all offspring and only informative offspring was described by the term γ. It is the increase or decrease in the sampling variance due to bias when estimating the recombination fraction. In extreme cases, γ could take values of >5, indicating the large impact of neglecting noninformative offspring on the sampling variance of the estimates of the recombination fraction. Another factor affecting the sampling variances not represented in γ is the amount of data available for the linkage analysis. Discarding noninformative offspring reduces the amount of data available for the analysis and, consequently, increases the sampling variance of the estimates of the recombination fraction. This fact is likely contributing to the observed results for the sampling variance of the recombination fraction when constructing a genetic map of Norwegian cattle. The average sampling variance was 0.057 and 0.045 when using only informative and all offspring, respectively. In general, the use of the noninformative offspring increases the amount of information available for the analysis and, consequently, the LOD scores. The effect of using noninformative offspring on the LOD score is related to the magnitude of the allele frequencies in the dam population in two ways. First, higher frequencies of the alleles represented in the sire reduce the proportion of informative offspring and, therefore, the amount of information in the linkage analysis. Second, if the proportion of recombinants among noninformative offspring is higher than the proportion of recombinants produced by the sire then the estimates of the recombination fraction are biased downward. The smaller the recombination fraction is, the higher the LOD score becomes.
The latter can be observed in the simulation results when allele frequencies were f_{A} = 0.8, f_{a} = 0.1, f_{B} = 0.1, and f_{b} = 0.8, yielding average LOD scores of 1.69 and 7.22 when using only informative and all offspring, respectively.
Georges et al. (1995) carried out linkage analysis using one single half-sib family and making use of estimates of allele frequencies in the dam population. However, their method to estimate allele frequencies assumed that the transmission parameter was ^{1}/_{2}, i.e., equal probability of transmission for either allele from the sire to his half-sib sons. The American cattle population is selected for production traits and hence the sample taken for their linkage analysis corresponded to highly selected sons. Consequently, it is expected that markers linked to quantitative trait loci for production traits would not segregate 50:50 among offspring. The use of those estimates may yield biased estimates of the allele frequencies, which might also yield biased estimates of the recombination fraction. In this situation, the proposed maximum-likelihood approach to simultaneously estimate dam allele frequencies and the transmission parameter should be the method of choice. In addition, this maximum-likelihood approach could be used to carry out hypothesis testing for the transmission parameter being different from ^{1}/_{2} as a way to identify areas of the genome being effectively changed under selection. In fact, preliminary results using the genotyping information of Norwegian cattle yielded more significant results for the transmission parameter than would be expected by chance. Norwegian cattle is a dual-purpose breed in which the young bulls, tested and selected for growth performance, were used to construct the genetic map assuming a probability ^{1}/_{2} of transmission of either allele from sire to offspring (Våge et al. 2000).
The increasing number of male genetic maps generated in different cattle populations in the world (e.g., Vilkki et al. 1997; Georges et al. 1995; Ma et al. 1996; Våge et al. 2000) provides a source for genomic studies such as homogeneity of recombination across families. This is also important for QTL mapping because of the usual assumption that all sires have the same recombination fraction. However, a maximum-likelihood method using both informative and noninformative offspring should be used to avoid a different bias in the estimation of the recombination fraction for each of the sire families, since the sires are likely carriers of different alleles segregating at different frequencies in the dam population. Analysis of homogeneity of the recombination fraction could then be performed using the available male genetic maps.
There is an international initiative to develop consensus maps for each of the cattle chromosomes. Consensus maps for chromosomes 4 and 7 have already been published (Casas et al. 1999; Gu et al. 2000). Consensus maps are being constructed using data from different cattle populations and CRIMAP software, which makes use of only the informative offspring when analyzing half-sib families. It can be concluded from the results of this article that genetic distances in consensus maps are likely biased.
Maximum-likelihood methods using half-sib families have been discussed in the context of two-point linkage analyses. In practice, the methods can be extended to include any number of loci. The use of only informative offspring in multipoint linkage analysis may yield the wrong order of loci. Caution must be taken in using consensus maps, which ignore noninformative offspring in half-sib families.
APPENDIX A
Solution to maximum-likelihood equations using all offspring and the Newton-Raphson method: The maximum-likelihood equation (Equation 2) assuming linkage equilibrium and known allele frequencies in the dam population is L(c) = K(ϕ_{AB})^{n_{AB}}(ϕ_{Ab})^{n_{Ab}}(ϕ_{aB})^{n_{aB}}(ϕ_{ab})^{n_{ab}} × (ϕ_{Ax})^{n_{Ax}}(ϕ_{ax})^{n_{ax}}(ϕ_{xB})^{n_{xB}}(ϕ_{xb})^{n_{xb}}(ϕ_{xx})^{n_{xx}}, where ϕ_{i} are the gamete probabilities for gamete i (i = AB, Ab, aB, ab, Ax, ax, xB, xb, xx) with values given in Table 2.
Taking the natural logarithm of both sides of the likelihood equation and dropping the constant term from the equation,
The second derivative of the likelihood equation is
The approximate sampling variance of the estimate of the recombination fraction is Var(ĉ) ≈ −1/[∂^{2}ln L(c)/∂c^{2}]_{c=ĉ}.
APPENDIX B
Maximum-likelihood solution using only informative offspring: Assuming linkage phase AB/ab in the sire, let n_{AB}, n_{Ab}, n_{aB}, and n_{ab} be the observed counts for gametes AB, Ab, aB, and ab inherited from the sire. The maximum-likelihood equation discarding noninformative offspring is
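A hedged sketch of this derivation, assuming the informative offspring are treated as a binomial sample in which gametes Ab and aB are recombinants and gametes AB and ab are nonrecombinants:

```latex
L(c) \propto c^{\,n_{Ab}+n_{aB}}\,(1-c)^{\,n_{AB}+n_{ab}},
\qquad
\frac{\partial \ln L(c)}{\partial c}
  = \frac{n_{Ab}+n_{aB}}{c} - \frac{n_{AB}+n_{ab}}{1-c} = 0
\;\Longrightarrow\;
c^{*} = \frac{n_{Ab}+n_{aB}}{n_{AB}+n_{Ab}+n_{aB}+n_{ab}} .
```

This agrees with the description of c^{*} as the number of informative recombinants divided by the total number of informative offspring.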
APPENDIX C
Solution to the maximum-likelihood equation to estimate allele frequency in the dam population when either allele has the same probability of transmission from sire to sons (v = ^{1}/_{2}): The likelihood equation is
APPENDIX D
Estimation of the transmission parameter assuming known allele frequencies in the dam population: The maximum-likelihood equation is
After rearranging this equation a quadratic is obtained,
Acknowledgments
Genotype information from Norwegian cattle was provided by Helge Klungland, Dag Inge Våge, Ingrid Olsaker, and Sigbjørn Lien. Biological material for typing was provided by GENO (breeding organization of Norwegian cattle). This work has been supported by the Norwegian Research Council (NFR) project number 130162/130, titled “Strategic QTL Research Plan for Disease Resistance in Atlantic Salmon and Cattle.”
Footnotes

Communicating editor: J. B. Walsh
 Received June 15, 2000.
 Accepted November 20, 2000.
 Copyright © 2001 by the Genetics Society of America