Genetics, Vol. 161, 275-287, May 2002, Copyright © 2002

Simultaneous Mining of Linkage and Linkage Disequilibrium to Fine Map Quantitative Trait Loci in Outbred Half-Sib Pedigrees: Revisiting the Location of a Quantitative Trait Locus With Major Effect on Milk Production on Bovine Chromosome 14

Frédéric Farnir1,a, Bernard Grisart1,a, Wouter Coppietersa, Juliette Riqueta, Paulette Berzia, Nadine Cambisanoa, Latifa Karima, Myriam Mnia, Sirja Moisiob, Patricia Simona, Danny Wagenaara, Johanna Vilkkib, and Michel Georgesa
a Department of Genetics, Faculty of Veterinary Medicine, University of Liège (B43), 4000-Liège, Belgium
b Animal Production Research, Agricultural Research Centre MTT, 31600 Jokioinen, Finland

Corresponding author: Michel Georges, Faculty of Veterinary Medicine, University of Liège (B43), 20 Bd. de Colonster, 4000-Liège, Belgium., michel.georges{at}ulg.ac.be (E-mail)

Communicating editor: C. HALEY


*  ABSTRACT
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

A maximum-likelihood QTL mapping method that simultaneously exploits linkage and linkage disequilibrium and that is applicable in outbred half-sib pedigrees is described. The method is applied to fine map a QTL with major effect on milk fat content in a 3-cM marker interval on proximal BTA14. This proximal location is confirmed by applying a haplotype-based association method referred to as recombinant ancestral haplotype analysis. The origin of the discrepancy between the QTL position derived in this work and that of a previous analysis is examined and shown to be due to the existence of distinct marker haplotypes associated with QTL alleles having large substitution effects.


WITH the development of comprehensive genetic marker maps in many species, it has become possible to map quantitative trait loci (QTL) underlying the genetic variance of medically and economically important complex phenotypes (e.g., LANDER and SCHORK 1994 Down; PATERSON 1995 Down; ANDERSSON 2001 Down; FLINT and MOTT 2001 Down; MACKAY 2001 Down; MAURICIO 2001 Down). Generally speaking, however, the resolution that is achieved when mapping QTL (typically of the order of 20–30 cM) is insufficient if one intends to positionally clone the corresponding genes. Also, this level of mapping accuracy limits the benefits of QTL mapping information for marker-assisted selection (e.g., SPELMAN and VAN ARENDONK 1997 Down). There is therefore a need to develop methods that improve the resolution of QTL mapping.

Most strategies proposed to achieve this goal imply breeding of large numbers of additional offspring to increase the density of recombinants in the chromosome regions of interest (e.g., DARVASI 1998 Down). In man as well as most livestock species this proposition is not practical. An alternative to the generation of recombinants de novo is to exploit historical recombinants, i.e., use linkage disequilibrium (LD; e.g., TERWILLIGER and WEISS 1998 Down). As in most human populations useful levels of LD might not extend for >10–100 kb (KRUGLYAK 1999 Down; REICH et al. 2001 Down); this approach implies very dense maps counting maybe as many as 500,000 polymorphic markers.

We recently made the unexpected observation that, contrary to the situation in human, LD extends over several tens of centimorgans in cattle (DUNNER et al. 1997 Down; FARNIR et al. 2000 Down). This suggests that in cattle, and maybe in other livestock species, LD mapping might be effective using the available medium density microsatellite maps (e.g., KAPPES et al. 1998 Down). This prediction has been supported by the mapping of the bovine syndactyly gene (CHARLIER et al. 1996 Down) as well as the fine mapping of a QTL influencing milk composition (RIQUET et al. 1999 Down) using strategies based on identity-by-descent.

In the RIQUET et al. 1999 Down study, we applied a two-tiered approach for the fine mapping of QTL in outbred populations. In the first stage, heterozygous "Qq" individuals were identified by performing marker-assisted segregation analysis in their offspring. In the second stage, shared haplotypes flanking the "Q" allele predicted to be identical-by-descent among Qq individuals were identified, thereby defining the QTL position.

While applied successfully to a QTL previously mapped to bovine chromosome 14 (COPPIETERS et al. 1998A Down; HEYEN et al. 1999 Down; RIQUET et al. 1999 Down; LOOFT et al. 2001 Down), extending this approach to other QTL has been hampered mainly because of the difficulty in identifying sufficient numbers of individuals with an unambiguous Qq genotype. In this work, we describe the development of a mapping method that simultaneously exploits linkage and linkage disequilibrium and that allows inclusion of individuals with an ambiguous QTL genotype.

The method has been applied to the QTL influencing milk yield and composition located at the centromeric end of chromosome 14. The results suggest a more proximal map position than the one that was previously identified (RIQUET et al. 1999 Down). A preliminary analysis of recombinant ancestral haplotypes (RAH) supports this proximal location of the QTL. We provide evidence that our initial erroneous localization is due to the heterogeneity of haplotypes associated with large QTL substitution effects.


*  MATERIALS AND METHODS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Pedigree material:
The pedigree material used for QTL mapping comprised the following: Data set I:A Black-and-White Holstein-Friesian granddaughter design sampled in the Netherlands and composed of 22 paternal half-sib families for a total of 987 bulls.

Data set II:A Black-and-White Holstein-Friesian granddaughter design sampled in New Zealand and composed of seven paternal half-sib families for a total of 227 bulls. Data sets I and II have been described previously (SPELMAN et al. 1996 Down; COPPIETERS et al. 1998A Down).

Data set III:A Red-and-White Holstein-Friesian granddaughter design sampled in the Netherlands and Belgium comprising 23 paternal half-sib families for a total of 401 bulls (B. GRISART, unpublished data).

Data set IV:A novel granddaughter design composed of 39 Dutch Black-and-White Holstein-Friesian paternal half-sib families for a total of 430 bulls.

Data set V:A daughter design composed of 51 New Zealand Holstein-Friesian paternal half-sib families for a total of 529 cows.

Phenotypes:
Phenotypes were respectively daughter yield deviations (DYD) for bulls, lactation values (unregressed first lactation yield deviations) for cows, as well as average parental predicted transmitting abilities (PTA) for bulls and cows for milk protein and fat yield, as well as protein and fat percentage (VAN RADEN and WIGGANS 1991 Down). DYDs, lactation values, and PTA were obtained directly from CR-DELTA (Netherlands; data sets I, III, and IV) and Livestock Improvement Corporation (LIC; New Zealand; data sets II and V), respectively.

Marker genotyping:
Microsatellite genotyping was performed as previously described using either autoradiography (GEORGES et al. 1995 Down) or automatic sequencers (D. WAGENAAR and W. COPPIETERS, unpublished data).

Joint linkage and linkage disequilibrium mapping of QTL: Likelihood of a single half-sib pedigree assuming linkage between marker map and QTL:
Assume a paternal half-sib pedigree counting a total of O half-sibs. Assume also that all members of the pedigree (sire, dams, and offspring) have been phenotyped for a polygenic, quantitative trait with heritability, h2, and have been genotyped for a battery of M polyallelic markers (e.g., microsatellite markers) spanning a chromosome hypothesized to carry a QTL affecting the trait of interest at map position (p). Order and recombination rates between adjacent markers are assumed to be known.

The likelihood of this single pedigree, hypothesizing that the sire is heterozygous Qq for the QTL at map position (p), can be computed as

(1)

(GEORGES et al. 1995 Down), where is the product over all sons, and P(Sonj, A(p)|Mj) and P(Sonj, B(p)|Mj) are the probabilities that son j has inherited homologs A and B from its sire at map position (p) given its composite genotype Mj for all markers in the map. These probabilities can be computed as previously described (COPPIETERS et al. 1998B Down). Phj is the phenotypic value of offspring j, PAj is the average phenotypic value of the sire and of dam of j, and {phi}(Phj, PAj ± {alpha}/2, {sigma}2) is the probability density function for a normally distributed random variable with mean PAj ± {alpha}/2 and variance {sigma}2, computed as

This likelihood is a function of two unknown parameters: the Q to q allele substitution effect, {alpha}, and the residual variation, {sigma}2. The values of {alpha} and {sigma}2 maximizing L1Ped can be found by means of iterative procedures such as the "quasi-Newton" methods (e.g., GEMINI; LALOUEL 1983 Down) or expectation-maximization (EM) methods.

Note that in this study, we have used the average parental predicted transmitting ability (see above) as the parental average (PA). An estimate of the dam's breeding value that would not depend on her son's record would actually be preferred.

Likelihood of multiple half-sib pedigrees assuming linkage between marker map and QTL:

If N half-sib pedigrees are sampled in a given outbred population, one cannot assume that all founder sires will be segregating for the QTL at map position (p). Assuming a biallelic QTL with alleles Q and q in Hardy-Weinberg equilibrium and having respective frequencies fQ (allele Q) and fq (allele q), a proportion 2 fQ fq of sires will be heterozygous Qq, the others being either QQ (f2Q) or qq (f2q). Because one does not know a priori which founder sires are heterozygous, the overall likelihood of the data can be computed as

(2)

The QAqB vs. qAQB notations refer to QTL genotypes in which the Q allele is located on the A vs. B homolog of the sire, respectively. The probabilities of the ith pedigree conditional on the QTL genotype of the ith sire can be computed as

(3)


(4)


(5)

where is the product over all O offspring of sire i, {phi}(Phj, PAj, {sigma}2) is the probability density function for a normally distributed random variable with mean PAj and variance {sigma}2, and Mij denotes known marker information on son ij.

The likelihood expression in Equation 2 is a function of {alpha}, {sigma}2, and an additional unknown parameter, fQ. As before, the values of {alpha}, {sigma}2, and fQ jointly maximizing LN Peds can be found using optimization routines such as GEMINI (LALOUEL 1983 Down).

Likelihood of multiple half-sib pedigrees assuming linkage disequilibrium between marker map and QTL—extracting information from the sires only:

If we single out one of the microsatellite markers in the linkage group (m), we can assume that QTL allele Q appeared in the population of interest—by mutation or migration—on a chromosome carrying allele 1 at marker locus m, and that this occurred g generations ago. Pooling all other alleles at marker locus m in a single class, 0, the haplotype frequencies at the present generations have expected values

where

(6)

{rho} is a heterogeneity parameter corresponding to the proportion of Q-bearing chromosomes that are identical-by-descent with the founder chromosome bearing marker allele 1 (TERWILLIGER 1995 Down), {theta} is the distance in recombination units between marker locus m and map position p, and f1 and f0 (=1 - f1) are the respective frequencies of marker alleles 1 and 0. The latter can be estimated directly from the marker genotype data.

These haplotype frequencies can be used to calculate QTL genotype probabilities conditional on marker genotype mi, as shown in Table 1.


 
View this table:
In this window
In a new window

 
Table 1. QTL genotype probabilities conditional on marker genotype

From these, the likelihood of the data can now be computed as

(7)

In the absence of linkage disequilibrium, i.e., if reduces to f2Q, to f2q, and and to fQfq, thereby reducing (7) to (2).

In reality, we do not know which marker allele was initially associated with the Q QTL allele. We therefore have to compute (7) for all A alleles of marker m and sum over all likelihoods,

(8)

where fk is the population frequency of allele k, and corresponds to the probability that sire i has QTL genotype QAQB given its genotype for marker m and the assumption that marker allele k was initially associated with QTL allele Q.

So far, we have considered only the association between the QTL and one of the markers in the linkage group: m. Following TERWILLIGER 1995 Down, we multiplied the likelihoods computed separately for the different markers in the map according to (8) over all M markers to yield the following overall likelihood expression:

(9)

The likelihood expression in Equation 9 involves two new unknown parameters, g and {rho}, which can be estimated jointly with {alpha}, {sigma}2, and fQ to maximize LN Peds. Note that we follow TERWILLIGER 1995 Down in estimating only a single value for {rho}. Strictly speaking, however, different marker alleles may be associated with distinct subsets of Q alleles corresponding to distinct proportions of Q-bearing chromosomes.

Likelihood of multiple half-sib pedigrees assuming linkage disequilibrium between marker map and QTL—extracting information from sire and ungenotyped dams:

In most half-sib designs, the majority of dams will have only one offspring. The most commonly applied QTL mapping methods are therefore not able to extract information from the dam's meioses. This would require inclusion of additional pedigree relationships linking the dams to other pedigree members. This is, for instance, done in the great-granddaughter design (COPPIETERS et al. 1999 Down) or methods that target full-pedigree analysis (e.g., BINK and VAN ARENDONK 1999 Down). When exploiting LD, however, even meioses from parents having a single offspring are potentially informative. Even if the dams have not been genotyped, as would often be the case in, for instance, the granddaughter design, the marker genotypes of the maternal gametes transmitted to the offspring can easily be inferred. This information can be utilized in Equation 9 by modifying Equation 3Equation 4Equation 5 as follows:

(10)


(11)


(12)

In these expressions, fQ is the frequency of QTL allele Q; is the product over all O offspring of sire i; P(Sonij, .QB|Mij, Mi) and P(Sonij, .qB|Mij, Mi) are the probabilities that son ij has inherited, respectively, QTL alleles Q and q from its dam given its marker genotype Mij and that of its sire Mi; PHij is the phenotypic value of offspring j of sire i; and PAij is the average phenotypic value of sire i and dam ij. P(Sonij, .QB|Mij, Mi) and P(Sonij, .qB|Mij, Mi) are computed as

(13)

and

(14)

where P(Sonij, .1|Mij, Mi) and P(Sonij, .0|Mij, Mi) are the probabilities that son ij has inherited allele k and "not-k" from its dam at marker locus m, given marker information from son and sire. These probabilities are easily deduced and take on values of 1 or 0 for all combinations of sire, dam, and offspring genotype at marker locus m, except the situation where sire, dam, and offspring have genotype 10. In the latter case, exact probabilities can be computed from flanking marker genotypes and marker allele frequencies as previously described (COPPIETERS et al. 1998B Down).

P(Sonij, .QB (qB)|.1) and P(Sonij, .QB (qB)|.0) are the probabilities that son ij has inherited QB (respectively qB) from its dam given that it inherited marker allele k and not-k from its dam at marker locus m. These probabilities are computed as shown in Table 2.


 
View this table:
In this window
In a new window

 
Table 2. Probabilities that son ij has inherited QB (respectively qB) from its dam given that it inherited marker allele k and "not-k" from its dam at marker locus m

QTL mapping and hypothesis testing:

So far, we have assumed that we knew the position of the QTL. In reality, however, one attempts to identify the most likely position of the QTL. This can be accomplished using a conventional interval mapping approach, i.e., by sliding the position of the hypothetical QTL through the marker map and computing the likelihood of the data at each position according to Equation 9. The position on the map associated with the highest likelihood can then be considered as the most likely position of the QTL.

Note that one can consider three distinct likelihood profiles when attempting to map the QTL. Indeed, one can maximize the likelihood of the pedigree data with respect to the unknown parameters ({alpha}, {sigma}2, {rho}, fQ, and g), thereby extracting information from both linkage and linkage disequilibrium information. We refer to this hypothesis as H2. Alternatively, one can fix the value of g at {infty}, thereby ignoring all linkage disequilibrium information: H1. Finally, by comparing the likelihood under H1 with that obtained under H2 one can consider only the mapping data provided by the linkage disequilibrium signal alone.

In addition, one can compute the likelihood of the data under the null hypothesis of no QTL at the corresponding map position. We distinguish two distinct null hypotheses: H01, in which {alpha} is set at zero, and H02 in which the QTL is positioned at 50% recombination rate from the examined location, p.

The significance of these alternative hypotheses can be evaluated by generating different likelihood ratio statistics: H2/H0 tests the combined linkage + linkage disequilibrium signal, H1/H0 tests the linkage signal, and H2/H1 tests the linkage disequilibrium signal.

The approach chosen to combine the LD information from the different markers into a quasi-multipoint expression (TERWILLIGER 1995 Down) is an approximation. As a consequence, the theoretical distributions of these likelihood-ratio statistics are not precisely known. One can demonstrate that in the absence of LD, log10(LN Peds|H1)/(LN Peds|H0), computed according to Equation 9, corresponds to M x lod score. We therefore report the corresponding likelihood ratios as (1/M)log10(LN Peds|Halt)/(LN Peds|H0) in this work.

The LINKLD programs:

The described linkage and linkage disequilibrium analyses are implemented in a computer program (LINKLD), which is available from the authors upon request.

Recombinant ancestral haplotype analysis:
Using the maximum-likelihood approach outlined above, one can estimate the most likely genotype of the ancestral marker haplotype (AH) on which the mutation generating the Q allele occurred. Indeed, using Equation 7, one can compute the likelihood of the pedigree data assuming that a given allele (k) at the considered marker (m) was initially associated with the Q QTL allele. The allele yielding the highest likelihood can then be considered to be the most likely ancestral allele for that marker. The combination of the most likely ancestral alleles over all markers then yields the most likely AH. All other haplotypes are referred to as OH. Alternatively, the AH can be identified on the basis of the haplotype sharing among putative Qq sires as outlined in RIQUET et al. 1999 Down.

On the basis of the knowledge of this AH, we then define 2(n - 1) classes of RAH, including (n - 1) telomeric (T)RAH haplotypes that are identical-by-state with the AH from the telomeric end up to one of the n marker positions and (n - 1) centromeric (C)RAH haplotypes that are identical-by-state with the AH from the centromeric end up to one of the n marker positions. Fig 3 illustrates this concept. The genotype of individuals with phase-known marker genotype can then be expressed as a function of their haplotype class. Individuals could, for instance, be of OH/TRAH(n - 1) genotype, of AH/CRAH(3) genotype, etc. The effect on phenotype of each haplotype class can then be estimated using the linear model

(15)

where Phi is the phenotypic value of individual i, PAi is the average phenotypic value of the sire and dam of individual i, HPi and HMi are, respectively, the effect of paternal and maternal marker haplotypes of individual i as defined above, SHPi and SHMi are, respectively, the paternal and maternal marker haplotypes of the sire of individual i, and ei is a residual error term.






View larger version (125K):
In this window
In a new window
Download PPT slide
 
Figure 1. Lod scores obtained when analyzing data sets I and III jointly. Only sire information is considered in A and B, while sire and dam information is considered in C and D. The null hypothesis is that of no QTL (H01: {alpha} = 0) for A and C and that of an unlinked QTL (H02: {theta} = 0.5) in B and D. The black curves (solid diamond) correspond to (1/M) x log(H2/H0), therefore extracting information from both linkage and LD, the light gray curves (shaded square) to (1/M) x log(H1/H0), therefore extracting information from linkage only, and the dark gray curves (shaded triangle; scaled to 1/10) to (1/M) x log(H2/H1) correspond to LD only. The cross-hatched area corresponds to the interval selected by RIQUET et al. 1999 Down and subsequently refined by B. GRISART (unpublished data).



View larger version (61K):
In this window
In a new window
Download PPT slide
 
Figure 2. Schematic representation of the ancestral haplotype identified with the LINKLD program in data sets I and III (AH-Dutch) in black and in data set V (AH-NZ) in gray. Also represented are the fat-increasing haplotypes of 11 sires shown to be heterozygous Qq by marker-assisted segregation analysis (RIQUET et al. 1999 Down; B. GRISART, unpublished data). Chromosome segments identical-by-state with the AH-Dutch haplotype are highlighted in black and those identical-by-state with the AH-NZ haplotype are highlighted in gray. DBWx are Dutch Black-and-White sires (data set I), DRWx are Dutch Red-and-White sires (data set III), and NZBWx correspond to the New Zealand Black-and-White sires (data set II).



View larger version (91K):
In this window
In a new window
Download PPT slide
 
Figure 3. Recombinant ancestral haplotype (RAH) analysis. The compositions of the ancestral haplotype (AH) and centromeric (CRAH-n) and telomeric (TRAH-n) recombinant ancestral haplotypes are shown as bars, respectively, at the top and bottom, aligned with respect to the positions of the proximal markers for chromosome 14. The marker interval identified in the RIQUET et al. 1999 Down study corresponds to the stippled area. The light gray (shaded triangle) and black (solid diamond) lines in the graph connect the least-squares estimates of the allelic effects of the respective TRAH and CRAH. Least-squares estimates are flanked by error bars defining 95% confidence intervals (±2 standard errors of the estimate). The dotted line marks the estimate of the allelic effect of the OH (all-white) haplotype class (see MATERIALS AND METHODS). The dark gray (shaded box) line connects the least-squares estimates of the interval-specific contrasts flanked by error bars defined as above. The grouping of the RAH to generate the respective interval-specific contrast is marked by the shaded ovals underneath the haplotype bars. Haplotypes marked by a dark oval belong to group I and those marked by a light oval to group II.

Indeed, the phenotype of each individual can be partitioned into the contribution of the QTL of interest, a residual "polygenic" (PGi) component, and an error term as

(16)

The expectation of the polygenic component of individual i being the average of the polygenic components of sire (SPGi) and dam (DPGi),

(17)

The polygenic components of sire and dam can be expressed as functions of their respective phenotypic values (SPhi and DPhi) and marker haplotypes (SH and DH):

(18)


(19)

In the case of the granddaughter design the dams are not genotyped and the only known marker haplotype of the dam is HMi. Therefore,

which is identical to Equation 15. Note that the error terms, ei, in Equation 15Equation 16Equation 17Equation 18Equation 19 are not equivalent.

Following THALLER and HOESCHELE 2000 Down, but using historical rather than current recombinations, we then combine the different haplotype classes such as to derive specific contrasts for each marker interval. The expectation is that the interval-specific contrast will be maximal in the true interval.

Corresponding ANOVA analyses were performed with the GLM procedure of the SAS package.


*  RESULTS
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

Joint linkage and linkage disequilibrium mapping point toward proximal location of the chromosome 14 QTL:
We initially applied the LINKLD programs to data sets I and III jointly. Although data sets I and III correspond to Black-and-White vs. Red-and-White animals, respectively, the corresponding phenotypes (DYD) are obtained from a joint breeding value estimation (NRS, The Netherlands), allowing us to treat data sets I and III as a single, homogeneous population. Fig 1 summarizes the results that were obtained with LINKLD when analyzing the effect of chromosome BTA14 on fat percentage. The choice of fat percentage is justified by the fact that this trait was shown previously to be the most profoundly affected by this QTL (COPPIETERS et al. 1998A Down). Fig 1A and Fig B, illustrates the results obtained when considering only the information from the sires. It can be seen that the results are quite similar whether considering no QTL ({alpha} = 0) or an unlinked QTL ({theta} = 0.5) as null hypothesis. When considering linkage information only, very significant maximum lod scores of, respectively, 30 (H01: {alpha} = 0) and 28 (H02: {theta} = 0.5) are obtained at map position 5 (i.e., in the second marker bracket). The additional information obtained from including linkage disequilibrium is also shown in Fig 1A and Fig B, yielding under both null hypotheses a maximum of 1.9 additional lod score at map position 6, i.e., the third marker. The corresponding maximum-likelihood estimators of {alpha}, {rho}, {sigma}, g, and fQ are reported in Table 3.


 
View this table:
In this window
In a new window

 
Table 3. ML parameter estimates—data sets I and III

It can be seen that the Q to q allele substitution effect is estimated at 1.23 residual standard deviations, which clearly confirms the major effect of the corresponding QTL. The population frequency of the Q alleles is estimated at 0.5 when ignoring LD and 0.39 when considering LD, therefore implying that ~50% of the studied sire families might be segregating for this QTL. This high percentage is not really unexpected given the fact that at least 11 out of the 45 paternal half-sib families comprising data sets I and III were shown clearly to segregate on the basis of marker-assisted segregation analysis performed within families as described (RIQUET et al. 1999 Down; B. GRISART, unpublished observations) and despite the intrinsically low power to detect such segregating families. The average number of generations to coalescence was estimated at 3.8. Note that we have previously estimated that b individuals sharing an identical-by-descent mutation inherited independently from a common founder g generations back will share an identical-by-descent flanking segment of size 2/gb morgans on average (DUNNER et al. 1997 Down). The fat-increasing chromosome of the 11 Dutch segregating sires that were previously mentioned shared a common haplotype over a chromosome length of 13 cM (see Fig 2). This length has to be compared with the expected 4.7 cM. This discrepancy could be due to the close relatedness between some of the sires. The value of 92% for the heterogeneity parameter indicates the relative allelic homogeneity of the hypothetical Q population in the sires.

Fig 1C and Fig D, summarizes the results obtained from the same data sets when exploiting the information from the maternal chromosomes. When assuming no QTL ({alpha} = 0) as a null hypothesis (Fig 1C) and when considering only linkage information (i.e., ignoring linkage disequilibrium), lod scores reach a maximum value of 79 at the first marker locus. This is a surprising result as this type of analysis exploits essentially the same mapping information as in the analysis in Fig 1A. It is notable that this lod score inflation is not limited to the proximal end of chromosome 14 but applies to its entire length. We attribute this to the fact that this analysis attempts to estimate the QTL genotype of the maternal gamete (see Equation 10Equation 11 HREF="#FD12">Equation 12 in MATERIALS AND METHODS). If, after correction for the inheritance of the paternal homologues, the sons' phenotypic distribution clearly departs from normality, the null hypothesis of no QTL ({alpha} = 0) becomes very unlikely, thereby increasing the lod score values across the entire chromosome. This explanation is supported by the examination of the lod score curves obtained when using the alternate null hypothesis of an unlinked QTL ({theta} = 0.5; Fig 1D). Although the lod score profile is parallel to the previous ones across the entire chromosome, the actual values have dropped by >40 lod score units when ignoring LD. Lod scores even become negative in the middle of the chromosome, indicating the higher likelihood of an unlinked QTL for that position. A maximal lod score of 36 is obtained at the position of the first marker under this scenario. This value is still higher than the one obtained when ignoring dam information (z = 30; Fig 1A and Fig B). Assuming comparable distributions of lod score values under H0, this might indicate that even in the absence of LD, estimating the QTL genotype of the maternal gamete might increase the QTL mapping power.

The additional signal obtained from the consideration of LD also has a very similar profile across the chromosome under both null hypothesis options, maximizing in both cases just proximal from the first marker locus. At this position 7.4 additional lod score units are gained under the null hypothesis of no QTL vs. 3.3 units under the null hypothesis of an unlinked QTL. The reason for this discrepancy remains unclear. Table 3 reports the ML estimates of {alpha}, {rho}, {sigma}, g, and fQ. Considering the dams' gametic contributions has reduced the residual variance, thereby increasing the Q to q QTL alleles substitution effect to 1.8 standard deviations. The estimate of the Q allelic frequency has dropped from 0.40–0.49 to 0.22, indicating that the Q alleles might be less common in the dam than in the sire population. The number of generations to coalescence increases slightly (from 3.8 to 4.4), while the heterogeneity parameter, {rho}, drops slightly (from 0.92 to 0.81), which might be evidence for the existence of distinct Q alleles in the dam population.

For each marker locus, we identified the allele that yielded the highest likelihood (computed according to Equation 7) when considered associated with the hypothetical Q allele. When doing this at the respective positions that were associated with the highest signal from linkage disequilibrium, the same "ancestral haplotype" was obtained whether including information from the dams (map position 1; Fig 1C and Fig D) or not (map position 5; Fig 1A and Fig B) and is represented in Fig 2. As expected, it corresponds to the haplotype that was previously identified as being shared in total or in part by the fat-increasing chromosomes of the known Qq founder sires. (RIQUET et al. 1999 Down; B. GRISART, unpublished observations).

Whether considering only information from the sires' gametes (Fig 1A and Fig B) or including information from the dams' gametes (Fig 1C and Fig D), both the linkage and linkage disequilibrium signals obtained with the LINKLD program point toward a terminal location of the QTL on the centromeric end of chromosome 14. The BULGE36-BULGE17 interval, previously identified on the basis of haplotype sharing among Qq sires (RIQUET et al. 1999 Down; B. GRISART, unpublished observations), clearly lies outside and distal to the lod-2 support interval for the QTL.

Recombinant ancestral haplotype analysis supports the proximal location of the QTL:
To resolve this discrepancy, we performed a RAH analysis for the BULGE09-CSSM66 marker interval that spans the two conflicting QTL positions following the procedure described in MATERIALS AND METHODS. To increase the number of RAH, we genotyped an additional data set (data set IV) comprising 401 bulls corresponding to 39 paternal half-sib pedigrees for the markers in the region.

Fig 3 summarizes the obtained results. It can be seen that for the marker interval ILSTS39-CSSM66, the allelic effects of the centromeric RAH (CRAH-3 to CRAH-6) are superior to that of the corresponding telomeric RAH (TRAH-3 to TRAH-6), while for the most centromeric marker position (BULGE09) this tendency is reversed. This clearly suggests that the QTL location is proximal (centromeric) to marker ILSTS39. This hypothesis is also supported by the observation that the interval-specific contrasts maximize within the first marker interval (BULGE09-BULGE11), suggesting that the QTL lies in that interval.

These results therefore corroborate the LINKLD analysis and provide strong evidence against the location of the QTL in the ILSTS39-CSSM66 interval as initially inferred (RIQUET et al. 1999 Down; B. GRISART, unpublished data).

Understanding the origin of the discrepancy between the former ( Riquet et al. 1999 Down) and new estimates of the QTL map position:
Assuming that the QTL is indeed positioned proximal to the ILSTS39 marker, this raises the issue of what could have caused its initial erroneous localization in the BULGE36-BULGE17 interval on the basis of the haplotype sharing among putative Qq sires (RIQUET et al. 1999 Down; B. GRISART, unpublished data). It is noteworthy in this respect that the sire (NZ1) that defined the proximal boundary of the BULGE36-BULGE17 interval was sampled in the New Zealand dairy cattle population (Fig 2). We therefore analyzed data set II, corresponding to the New Zealand portion of our original granddaughter design (SPELMAN et al. 1996 Down; COPPIETERS et al. 1998A Down), with the LINKLD program. While the linkage component (H1) of this analysis generated convincing evidence for the segregation of a QTL with major effect on fat percentage on the centromeric end of chromosome 2 (zmax = 8.65 at the BULGE11 position) in this population as well, no clear supplementary signal could be obtained when accounting for the linkage disequilibrium component (H2) in the analysis whether considering maternal chromosomes or not (data not shown). The reasons for this remain unclear.

To nevertheless obtain linkage disequilibrium information in the New Zealand population, we collected a daughter design comprising 51 paternal half-sib families for a total of 529 cows (data set V). This data set was genotyped for proximal chromosome 14 markers and the resulting genotypes were analyzed with the LINKLD programs using lactation values as phenotype. Fig 4 summarizes the results that were obtained, using an unlinked QTL as null hypothesis (H02) and when considering the maternal haplotypes in the analysis. The signal from linkage maximizes at the position of the second marker (BULGE11; zmax = 3.37). A maximum of 0.5 lod score units were gained from considering linkage disequilibrium at the position of the third marker (ILSTS39). Despite the relatively modest lod score values, these results were again pointing toward a location of the QTL proximal from the BULGE34-BULGE17 interval.



View larger version (30K):
In this window
In a new window
Download PPT slide
 
Figure 4. Lod scores obtained when analyzing data set V, using both sire and dam information and an unlinked QTL as null hypothesis (H02). The black curve (solid diamond) corresponds to (1/M) x log(H2/H0), therefore extracting information from both linkage and LD, the light gray curve (shaded square) to (1/M) x log(H1/H0), therefore extracting information from linkage only, and the dark gray curve (shaded triangle; scaled to 1/10) to (1/M) x log(H2/H1) corresponds to LD only. The cross-hatched area corresponds to the interval selected in RIQUET et al. 1999 Down and subsequently refined by B. GRISART (unpublished data).

The corresponding maximum-likelihood (ML) estimators of {alpha}, {sigma}, fQ, g, and {rho} were, respectively, 0.15%, 0.13%, 0.32, 6, and 0.36. The allele substitution effect has increased from 0.11% to 0.15%. This is not unexpected given the fact that in this case we analyzed lactation values rather than daughter yield deviations (which correspond to one-half breeding values). The increased residual variance is as expected as well given the lower reliability of lactation values when compared to daughter yield deviations. The frequency of the Q allele is estimated at 0.32 in the New Zealand population, which is therefore in the same range as in the Dutch population. The value of 0.36 for {rho} suggests the occurrence of distinct subgroups of Q QTL alleles in the New Zealand population.

We then identified the most likely ancestral haplotype from the analysis of data set V following the procedure described above. As shown in Fig 2, in the BULGE09-BULGE36 interval the resulting AH proved to be identical with the haplotype from sire NZ1 that was associated with an increase in fat percentage. This therefore suggested that in the New Zealand population, a fat-increasing allele might be preferentially associated with a marker haplotype distinct from the one identified in the Dutch population.

To verify this, we computed the average substitution effect of this newly identified haplotype (BULGE9-BULGE30), using daughters whose sire was heterozygous for this haplotype. Seventeen such heterozygous sires were identified in data set V for a total of 183 daughters. Average lactation values corrected for paternal breeding value were compared between daughters that received the haplotype vs. those that did not. This yielded a highly significant substitution effect of +0.12% (p < 1E - 7) as expected. This observation, associated with the fact that the fat-increasing haplotype of the NZ1 sire was fortuitously identical-by-state with the Dutch AH in the BULGE34-BULGE4 interval, likely caused the erroneous QTL localization in the previous analyses (RIQUET et al. 1999 Down; B. GRISART, unpublished data).

The average substitution effect of the Dutch AH (BULGE09-BULGE30) in the New Zealand population was estimated using the same approach; i.e., average lactation values corrected for paternal breeding value were compared between daughters of heterozygous sires that received the Dutch haplotypes vs. those that did not. Seventeen sires heterozygous for the Dutch haplotype were identified in data set V for a total of 172 daughters. This analysis yielded an average substitution effect of +0.073% (p < 1E - 3), therefore confirming the increasing effect on fat percentage of the Dutch AH in the NZ population as well.


*  DISCUSSION
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

We herein describe a method aimed at simultaneously exploiting information from linkage and linkage disequilibrium to improve the power and resolution of QTL mapping in outbred half-sib pedigrees. The proposed approach is an extension of a multipoint association method developed by TERWILLIGER 1995 Down for the fine mapping of monogenic disease traits. It has been extended to (i) allow for the combined analysis of linkage and linkage disequilibrium in half-sib pedigrees and (ii) be applicable to quantitative traits. The principle underlying the proposed method consists of establishing a parallel between a disease-causing allele, D, in the case of a monogenic disease, and a QTL allele with large substitution effect, Q, in the case of a quantitative trait.

We demonstrate in this work that additional mapping information can indeed be obtained from linkage disequilibrium using the proposed approach. A drawback of the method as presently described is the unknown distribution of the generated lod scores under the null hypotheses of no QTL or unlinked QTL. As a consequence, one cannot confidently assign a significance level to the obtained LD signal. We are presently attempting to address this issue "empirically" using permutation tests (e.g., CHURCHILL and DOERGE 1995 Down). In these, the transmission probabilities of alternate paternal homologs to individual sons [P(Sonj, A(p)|Mj) and P(Sonj, B(p)|Mj)] are maintained, thereby conserving the mapping information from linkage. However, marker haplotypes of the sires and maternal haplotypes of the sons are subjected to permutation. This effectively disconnects QTL and marker genotype while maintaining the actual LD between marker loci. This empirical approach is tedious, however, given the requirement for long computation times and the difficulties encountered to find the likelihood maxima using the available numerical recipes.

The proposed approach allows for the extraction of information from the maternal gametes. This information is ignored by most of the analysis methods that are the most commonly applied to granddaughter designs (e.g., MACKINNON and WELLER 1995 Down; KNOTT et al. 1996 Down; COPPIETERS et al. 1998B Down). As presently formulated, the information from maternal chromosomes is obtained using an association study. This contrasts with the linkage disequilibrium information extracted from the sire chromosomes, which in essence corresponds to the more robust transmission disequilibrium test (SPIELMAN et al. 1993 Down). This approach is therefore susceptible to yielding associations independent of chromosomal localization in the case of population stratification. We therefore recommend considering the obtained result with caution. The algorithm could easily be reformulated to extract the maternal information using a transmission disequilibrium approach as well. Such an approach would, however, require genotyping of the dams, which is usually not practical.

We demonstrate in this work that the choice of null hypothesis, e.g., H01: {alpha} = 0 or H02: {theta} = 0.5, can have a major effect on the resulting lod scores. More specifically, we show an instance in which the use of H01: {alpha} = 0 tends to inflate the lod scores by up to 40 units. Note that this null hypothesis is the one that is generally used by QTL mappers. Alternatively, we have encountered situations (data not shown) where the real presence of an unlinked QTL with major effect on the examined trait substantially decreases the obtained lod scores when looking for QTL on a given chromosome and using H02: {theta} = 0.5. These observations therefore stress the need to generate significance thresholds adapted to specific circumstances (for instance by permutation), particularly when using ML approaches.

Estimation of the heterogeneity parameter, {rho}, supports a relatively high homogeneity of marker haplotypes flanking the Q allele in the Dutch population . We provide strong evidence, however, for the association of Q alleles increasing fat percentage with distinct marker haplotypes in the New Zealand population. It has been argued that one of the advantages of Terwilliger's method is that it would be robust against haplotype heterogeneity within the population of D bearing chromosomes (TERWILLIGER and WEISS 1998 Down). A distinct difference between disease-causing alleles (D) and QTL alleles with large substitution effects (Q), however, is that one can envisage situations where distinct haplotypes are associated with Q alleles, some having positive effects on the trait of interest and others having negative effects. Preliminary evidence suggests that this might occur in the New Zealand population. It remains to be established how the proposed method would behave in such situations.

While this work demonstrates that it is possible to extract linkage disequilibrium information using the available medium density maps, it remains uncertain whether this will yield a significant gain in power and—more importantly—mapping resolution. While we believe that this work provides strong evidence that the QTL on bovine chromosome 14 is actually located proximal to the previously identified BULGE34-BULGE17 interval, this evidence is obtained as much from linkage as from linkage disequilibrium information. There seems to be as much variation in the most likely position of the QTL across experiments, whether one considers information from linkage or linkage disequilibrium. We nevertheless believe that, at present, exploiting linkage disequilibrium remains one of the most promising strategies to increase mapping resolution and power in, for instance, dairy cattle. This work, as well as that from others (e.g., MEUWISSEN and GODDARD 2000 Down), is a first step toward that goal.


*  FOOTNOTES

1 Both authors contributed equally to this work. Back


*  ACKNOWLEDGMENTS

We are grateful to CR-DELTA and LIC for providing us with breeding values. We thank Richard Spelman and Johan van Arendonk for fruitful discussions. This work was funded by grants from Holland Genetics, Livestock Improvement Corporation (LIC), the Vlaamse Rundvee Vereniging, the Ministère des Classes Moyennes et de l'Agriculture, Belgium, and E.U. grants B104-CT95-0073 and PL970471.

Manuscript received September 10, 2001; Accepted for publication February 22, 2002.


*  LITERATURE CITED
*TOP
*ABSTRACT
*MATERIALS AND METHODS
*RESULTS
*DISCUSSION
*LITERATURE CITED

ANDERSSON, L., 2001  Genetic dissection of phenotypic diversity in farm animals. Nat. Rev. Genet. 2:130-138[Medline].

BINK, M. C. and J. A. M. VAN ARENDONK, 1999  Detection of quantitative trait loci in outbred populations with incomplete marker data. Genetics 151:409-420[Abstract/Free Full Text].

CHARLIER, C., F. FARNIR, P. BERZI, P. VANMANSHOVEN, and B. BROUWERS et al., 1996  IBD mapping of recessive traits in livestock: application to map the bovine syndactyly locus to chromosome 15. Genome Res. 6:580-589[Abstract/Free Full Text].

CHURCHILL, G. A. and R. W. DOERGE, 1995  Empirical threshold values for quantitative trait mapping. Genetics 138:963-971[Abstract].

COPPIETERS, W., J. RIQUET, J.-J. ARRANZ, P. BERZI, and N. CAMBISANO et al., 1998a  A QTL with major effect on milk yield and composition maps to bovine chromosome 14. Mamm. Genome 9:540-544[Medline].

COPPIETERS, W., A. KVASZ, J.-J. ARRANZ, B. GRISART, and F. FARNIR et al., 1998b  A rank-based nonparametric method to map QTL in outbred half-sib pedigrees: application to milk production in a granddaughter design. Genetics 149:1547-1555[Abstract/Free Full Text].

COPPIETERS, W., A. KVASZ, J.-J. ARRANZ, B. GRISART, and J. RIQUET et al., 1999  The great-grand-daughter design: a simple strategy to increase the power of a grand-daughter design. Genet. Res. 74:189-199.

DARVASI, A., 1998  Experimental strategies for the genetic dissection of complex traits in animal models. Nat. Genet. 18:19-24[Medline].

DUNNER, S., C. CHARLIER, F. FARNIR, B. BROUWERS, and J. CANON et al., 1997  Towards interbreed IBD fine mapping of the mh locus: double-muscling in the Asturiana de los Valles breed involves the same locus as in the Belgian Blue cattle breed. Mamm. Genome 8:430-435[Medline].

FARNIR, F., W. COPPIETERS, J.-J. ARRANZ, P. BERZI, and N. CAMBISANO et al., 2000  Extensive genome-wide linkage disequilibrium in cattle. Genome Res. 10:220-227[Abstract/Free Full Text].

FLINT, J. and R. MOTT, 2001  Finding the molecular basis of quantitative traits: successes and pitfalls. Nat. Rev. Genet. 2:437-445[Medline].

GEORGES, M., D. NIELSEN, M. MACKINNON, A. MISHRA, and R. OKIMOTO et al., 1995  Mapping quantitative trait loci controlling milk production by exploiting progeny testing. Genetics 139:907-920[Abstract].

HEYEN, D. W., J. I. WELLER, M. RON, M. BAND, and J. E. BEEVER et al., 1999  A genome scan for QTL influencing milk production and health traits in dairy cattle. Physiol. Genom. 1:165-175.

KAPPES, S. M., J. W. KEELE, R. T. STONE, R. A. MCGRAW, and T. S. SONSTEGARD et al., 1998  A second-generation linkage map of the bovine genome. Genome Res. 7:235-249[Abstract/Free Full Text].

KNOTT, S., J. M. ELSEN, and C. HALEY, 1996  Methods for multiple-marker mapping of quantitative trait loci in half-sib populations. Theor. Appl. Genet. 93:71-80.

KRUGLYAK, L., 1999  Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144[Medline].

LALOUEL, J. M., 1983  Optimization of functions. Contrib. Epidemiol. Biostat. 4:235-259.

LANDER, E. S. and N. J. SCHORK, 1994  Genetic dissection of complex traits. Science 265:2037-2048[Abstract/Free Full Text].

LOOFT, C., N. REINSCH, C. KARALL-ALBRECHT, S. PAUL, and M. BRINK et al., 2001  A mammary gland EST showing linkage disequilibrium to a milk production QTL on bovine chromosome 14. Mamm. Genome 12:646-650[Medline].

MAURICIO, R., 2001  Mapping quantitative trait loci in plants: uses and caveats for evolutionary biology. Nat. Rev. Genet. 2:371-381.

MACKAY, T. F. C., 2001  Quantitative trait loci in Drosophila. Nat. Rev. Genet. 2:11-20[Medline].

MACKINNON, M. and J. WELLER, 1995  Methodology and accuracy of estimation of quantitative trait loci parameters in a half-sib design using maximum likelihood. Genetics 141:755-770[Abstract].

MEUWISSEN, T. and M. GODDARD, 2000  Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155:421-430[Abstract/Free Full Text].

PATERSON, A. H., 1995  Molecular dissection of quantitative traits: progress and prospects. Genome Res. 5:321-333[Abstract/Free Full Text].

REICH, D. E., M. CARGILL, S. BOLK, J. IRELAND, and P. C. SABETI et al., 2001  Linkage disequilibrium in the human genome. Nature 411:199-204[Medline].

RIQUET, J., W. COPPIETERS, N. CAMBISANO, J.-J. ARRANZ, and P. BERZI et al., 1999  Identity-by-descent fine-mapping of QTL in outbred populations: application to milk production in dairy cattle. Proc. Natl. Acad. Sci. USA 96:9252-9257[Abstract/Free Full Text].

SPELMAN, R. J. and J. A. M. VAN ARENDONK, 1997  Effect of inaccurate parameter estimates on genetic response to marker assisted selection in an outbred population. J. Dairy Sci. 80:3399-3410[Abstract].

SPELMAN, R. L., W. COPPIETERS, L. KARIM, J. A. M. VAN ARENDONK, and H. BOVENHUIS, 1996  Quantitative trait loci analysis for five milk production traits on chromosome six in the dutch Holstein-Friesian population. Genetics 144:1799-1808[Abstract].

SPIELMAN, R., R. MCGINNIS, and W. EWENS, 1993  Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52:506-516[Medline].

TERWILLIGER, J. D., 1995  A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am. J. Hum. Genet. 56:777-787[Medline].

TERWILLIGER, J. D. and K. M. WEISS, 1998  Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr. Opin. Biotechnol. 9:578-594[Medline].

THALLER, G. and I. HOESCHELE, 2000  Fine-mapping of quantitative trait loci in half-sib families using current recombinations. Genet. Res. 76:87-104[Medline].

VAN RADEN, P. M. and G. R. WIGGANS, 1991  Derivation calculation and use of National Animal Model Information. J. Dairy Sci. 74:2737-2746[Abstract].




This article has been cited by other articles:


Home page
GeneticsHome page
A. P. W. de Roos, B. J. Hayes, R. J. Spelman, and M. E. Goddard
Linkage Disequilibrium and Persistence of Phase in Holstein-Friesian, Jersey and Angus Cattle
Genetics, July 1, 2008; 179(3): 1503 - 1512.
[Abstract] [Full Text] [PDF]


Home page
J DAIRY SCIHome page
J. Naslund, W. F. Fikse, G. R. Pielberg, and A. Lunden
Frequency and Effect of the Bovine Acyl-CoA:Diacylglycerol Acyltransferase 1 (DGAT1) K232A Polymorphism in Swedish Dairy Cattle
J Dairy Sci, May 1, 2008; 91(5): 2127 - 2134.
[Abstract] [Full Text] [PDF]


Home page
J DAIRY SCIHome page
A. P. W. de Roos, C. Schrooten, E. Mullaart, M. P. L. Calus, and R. F. Veerkamp
Breeding Value Estimation for Fat Percentage Using Dense Markers on Bos taurus Autosome 14
J Dairy Sci, October 1, 2007; 90(10): 4821 - 4829.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
H. H. Zhao, R. L. Fernando, and J. C. M. Dekkers
Power and Precision of Alternate Methods for Linkage Disequilibrium Mapping of Quantitative Trait Loci
Genetics, April 1, 2007; 175(4): 1975 - 1986.
[Abstract] [Full Text] [PDF]