## Abstract

We herein demonstrate that in the Holstein–Friesian dairy cattle population, microsatellites are as polymorphic on the X chromosome as on the autosomes but that the level of linkage disequilibrium between these markers is higher on the X chromosome than on the autosomes. The latter observation is not compatible with the small male-to-female ratio that prevails in this population and results in a higher gonosomal than autosomal effective population size. It suggests that the X chromosome undergoes distinct selective or mutational forces. We describe and characterize a novel Markovian approach to exploit this linkage disequilibrium to compute the probability that two chromosomes are identical-by-descent conditional on flanking marker data. We use the ensuing probabilities in a restricted maximum-likelihood approach to search for quantitative trait loci (QTL) affecting 48 traits of importance to the dairy industry and provide evidence for the presence of QTL affecting 5 of these traits on the bovine X chromosome.

EXTENSIVE use of artificial insemination (AI) in mammalian livestock species, especially cattle and pigs, leads to the frequent occurrence of large paternal half-sibships. This pedigree structure has been abundantly exploited to map quantitative trait loci (QTL) influencing a variety of agronomically important phenotypes. Detection of significant differences between the phenotypic means of half-sibs sorted according to the homolog inherited from their sire points toward linked QTL (*e.g.*, Georges *et al.* 1995). However, as sires are hemizygous for the X chromosome, this strategy has precluded exploration of the sex chromosomes except for the pseudoautosomal region (*e.g.*, Hiendleder *et al*. 2003; Kuhn *et al*. 2003).

It has recently been shown that linkage disequilibrium (LD) extends over unusually long distances in livestock species due to reduced effective population size (*N*_{e}) (*e.g.*, Farnir *et al.* 2000). As a result, significant LD can be detected using the medium-density marker maps available in these species. This can be exploited to increase the power and resolution of QTL mapping (*e.g.*, Meuwissen and Goddard 2000; Farnir *et al.* 2002). It also opens possibilities to explore correlation between phenotype and marker genotype on the X chromosome, *i.e.*, to map QTL on that chromosome.

In this article, we have (i) quantified the level of microsatellite polymorphism and LD on the X chromosome in Holstein–Friesian dairy cattle, (ii) used an approach based on Markov chains to compute identity-by-descent (IBD) probabilities of a pair of X chromosomes conditional on flanking marker data, and (iii) used corresponding IBD probabilities to map QTL on the X chromosome.

## MATERIALS AND METHODS

#### Pedigree material and phenotypes:

We used a previously described Holstein–Friesian granddaughter design (GDD) (Coppieters *et al.* 1998) comprising 22 paternal half-brother families for a total of 929 bulls. The genealogies of these animals were obtained from Holland Genetics (Arnhem, The Netherlands). The average number of recorded ancestors per bull–dam was 40.4, whereas up to 11 generations separated the bulls from their most distant ancestor. On the basis of available pedigree data and assuming that the founders were unrelated, we estimated the average inbreeding coefficient, *F*, of the bull–dams at 1.3% (range: 0–14%), and their average kinship coefficient, *f*, at 4% (range: 0–57%) (Farnir *et al.* 2000).

For each of these bulls we obtained estimated breeding values (BV) from Holland Genetics (Arnhem, The Netherlands) for 48 phenotypes of interest to the dairy industry. The analyzed phenotypes can be grouped in (i) 5 milk yield and milk composition traits, (ii) 13 conformation traits, (iii) 8 birth and fertility traits, (iv) 10 udder health traits, (v) 2 workability traits, and (vi) 10 productivity traits (Table 1 ). A detailed description of each of these traits can be found at http://www.nrs.nl/index-eng.htm.

#### Marker genotyping:

Using standard procedures (*e.g.*, Coppieters *et al.* 1998), the entire GDD was genotyped for 22 X-specific and three pseudoautosomal microsatellite markers, which have all been previously described (Sonstegard *et al.* 2001). Marker order and recombination rates between adjacent markers were taken from Sonstegard *et al.* (2001) and are shown in Figure 1. We also used phased genotypes available on the same GDD for 202 autosomal microsatellites (Farnir *et al.* 2000). Sex-averaged recombination rates between autosomal markers were obtained from Ihara *et al.* (2004).

#### Measuring linkage disequilibrium:

Pairwise LD was measured using *r*^{2} as previously described (Grisart *et al.* 2004). Briefly, *r*^{2} was computed as(1)where *u* and *v* are the respective numbers of alleles at the two marker loci *A* and *B*, *p _{i}* and

*q*are the population frequencies of alleles

_{j}*A*and

_{i}*B*, respectively, and

_{j}*x*is the observed frequency of haplotype

_{ij}*A*.

_{i}B_{j}#### Computing IBD probabilities conditional on marker genotype for pairs of X chromosomes:

The probability of IBD of two X chromosomes at a given map position conditional on multiple linked marker genotypes, Φ_{p}, was computed according to Meuwissen and Goddard (2001). This requires the computation of the probability of coalescence without recombination, , of a chromosome segment of size θ in a population with constant male and female population sizes of, respectively, *N*_{m} and *N*_{f}, within *T* generations, where *T* is the number of generations separating the present from the unrelated base population. This probability is computed using hidden Markov theory, knowing that(2)where , is the probability that the two X chromosome segments are two different male chromosomes *t* generations before present, is the probability that the two X chromosomes are one a male the other a female chromosome *t* generations before present, is the probability that the two X chromosomes are two different female chromosomes *t* generations before present, and is the probability that the two X chromosomes have coalesced between the present and *t* generations back. **M** is a matrix of transition probabilities:(3)*p*_{MM→MM} is the probability for the two X chromosomes to be different male chromosomes *t* generations ago and different male chromosomes *t +* 1 generations ago; *p*_{MM→MF} is the probability for the two X chromosomes to be different male chromosomes *t* generations ago and one a male chromosome and the other a female chromosome *t +* 1 generations ago, etc.…; and θ_{m} and θ_{f} are male and female recombination rates, respectively. For all but the pseudoautosomal region of the sex chromosomes θ_{m} = 0. As we are using sires in a GDD, **P ^{0}** was set at (1, 0, 0, 0).

*T*was set at 100,

*N*

_{m}at 25, and

*N*

_{f}at 1,000,000, yielding an effective population size

*N*

_{e}of ≈100.

Note that the same approach can also be used to compute coalescence probabilities for autosomal chromosome segments. **M** then becomes(4)

The probabilities of coalescence without recombination, , computed as described above were then utilized to compute IBD probabilities for pairs of X chromosomes conditional on flanking marker genotypes, Φ_{p}, using the rules defined by Meuwissen and Goddard (2001).

#### Mapping QTL on the X chromosome using a restricted maximum-likelihood approach:

QTL were mapped on the X chromosome using an “interval-mapping approach”; *i.e.*, the position of the hypothetical QTL was slid across the chromosome X (BTAX) marker map (Figure 1) and lod scores were computed at regular intervals. In this study, a lod score was generated in the middle of each marker interval.

Lod scores were computed essentially as previously described (Kim and Georges 2002; Blott *et al.* 2003). To test map position *p*, we first clustered the *n* BTAX haplotypes in the data set in a rooted dendrogram. The *n*(*n −* 1)/2 pairwise (1 − Φ_{p}) values (computed as described above) were used as distance measures and UPGMA as a hierarchical clustering algorithm (*e.g.*, Mount 2001). The tree was then used as a logical framework to group the haplotypes in functionally distinct clusters. To achieve this, the tree was scanned downward from the top and branches cut such that the distance between haplotypes within resulting subtrees would not exceed a threshold *C*.

We then modeled the breeding values of the sons using the following linear model:(5)**y** is the vector of breeding values for all sons. **b** is a vector of fixed effects, which reduces here to the overall mean. **X** is the incidence matrix relating fixed effects to individual sons, which here reduces to a vector of ones. **h** is the vector of random QTL effects corresponding to the functionally distinct haplotype clusters defined as above. **Z _{h}** is an incidence matrix that relates each son to his corresponding haplotype cluster.

**u**is the vector of random individual autosomal polygenic effects (“animal model,” Lynch and Walsh 1998).

**Z**is a diagnonal incidence matrix that relates each son to its individual polygenic effect.

_{u}**e**is the vector of individual error terms.

Haplotype cluster effects with corresponding variance, , individual polygenic effects with corresponding variance,, and individual error terms with corresponding variance, , were estimated using AIREML (Johnson and Thompson 1995), by maximizing the restricted log-likelihood function *L*:(6)

In this,(7)

Because we assumed that the covariance between the QTL effects of the different haplotype clusters was zero, **H** reduces to an identity matrix. **A** is the additive genetic relationship matrix (Lynch and Walsh 1998).

*L* was computed for all possible values of *C* (from 1 to 0), to identify a restricted maximum-likelihood (REML) solution for map position *p* under the H_{1} hypothesis.

The log-likelihood of the data under H_{1} was then compared with that under the null hypothesis, H_{0}, of no QTL at map position *p.* The latter was computed as described above but using the reduced model:(8)

Evidence in favor of a QTL at map position *p* was expressed as a lod score:(9)The statistical significance of observed *z _{p}*-values, accounting for the fact that we tested 24 marker intervals and multiple values of

*C*in each interval, was estimated from the distribution of largest

*z*-values obtained along the X chromosome when analyzing 1000 sets of simulated phenotypes. The latter were generated by “dropping” 30 unlinked, autosomal QTL through the known genealogy of the 929 bulls. All QTL were assumed to have four alleles with random allelic frequency and additive QTL effects sampled from () with randomly assigned positive or negative sign. The polygenic effect of each bull was computed as the sum of its 60 QTL effects. In addition, each bull was assigned a nongenetic effect sampled from a normal distribution such that(10)where is the variance due to the polygenic effects and is the heritability that was set at 0.75.

_{p}## RESULTS

A total of 276,048 phased autosomal and 22,181 X-linked microsatellite genotypes, produced on 929 bulls as described in materials and methods, were available for analysis. Figure 2 compares the number of alleles as well as the “effective” heterozygosity (computed as , where *p _{i}* is the frequency of allele

*i*of

*n*) for the 202 autosomal microsatellites and the 25 gonosomal microsatellites. Only the alleles inherited from the dam were considered in this analysis. The average number of alleles and effective heterozygosity (±SE) were, respectively, 6.6 ± 0.4 and 0.59 ± 0.02 for the autosomal markers

*vs.*, respectively, 8.0 ± 1.1 and 0.61 ± 0.05 for the gonosomal markers. Microsatellite markers appeared thus slightly but not significantly more polymorphic on the X chromosome than on the autosomes.

Figure 3A compares the level of pairwise LD measured in the same sample for syntenic autosomal and X-specific markers as a function of genetic distance. To allow for proper comparison, the distance between markers was set at for the autosomal markers and at for the gonosomal markers. In these, θ_{m} and θ_{f} correspond to the male and female recombination rates, while *N*_{m} and *N*_{f} correspond to the numbers of males and females, respectively. For the autosomes, θ_{A} corresponded to the sex-averaged θ-values obtained from Ihara *et al.* (2004). For the X chromosome, θ_{f}-values were obtained from Sonstegard *et al.* (2001). *N*_{m} and *N*_{f} were set at 25 and 1 million, respectively, yielding an effective (autosomal) population size () of ≈100. The latter value is close to recent, pedigree-based estimates of in this population (Boichard 1996). It can be seen that *r*^{2}-values are systematically higher for the X chromosome than for the autosomes, being ∼1.75 times as large and ranging from values of the order of 0.10 for recombination rates <5% to values of the order of 0.01 for recombination rates between 20 and 25%. Note that these *r*^{2}-values, including those between the more distant markers, are highly significant (Farnir *et al.* 2000). Note also that choosing less extreme sex ratios accentuates the difference between and .

Using the Markov model described in materials and methods, we computed the (929^{2} − 929)/2 pairwise IBD probabilities conditional on marker data, for the 24 BTAX marker intervals. In these calculations, we assumed that *T*, the number of generations to the base population, was 100 and that *N*_{m} was 25 and *N*_{f} 1 million—as before. The overall IBD probability (averaged across all chromosome pairs and marker intervals) was 0.26. This is lower than the theoretical expectation of 0.36, corresponding to the probability of coalescence of a chromosome region of size 0 computed using the gonosomal-specific Markov model (see materials and methods), and hence to the coefficient of kinship of a chromosome region of size 0 (Meuwissen and Goddard 2001). Adjusting these parameter values for the theoretical and observed values to match might be a way to fine tune the model.

For a randomly selected interval (interval 12), we compared the IBD probabilities obtained with the X-specific transition probabilities, with those that would be obtained using autosomal transition probabilities, assuming (i) 25 males and 1 million females, but sex-specific recombination rates (θ_{f} and θ_{m} = 0), and (ii) a population size of 100 with equal numbers of males and females, as well as equal recombination rates in males and females (= θ_{f}). The latter yields in essence what would be obtained when directly applying the autosomal method proposed by Meuwissen and Goddard (2001) to the X chromosome. The results obtained under scenario i are shown in Figure 4. It can be seen that—although the differences are quite modest—the correct gonosomal model yields more “conservative” IBD probabilities, in the sense that if marker information suggests that the considered pair of chromosomes has a higher probability of IBD than two randomly selected chromosomes this probability is inflated when using the erroneous autosomal model, while if the marker information suggests the opposite, the autosomal model underscores the IBD probabilities when compared to the correct gonosomal model. This tendency was even more pronounced when using an autosomal model assuming equal numbers of males and females and equal recombination rate in both sexes corresponding to θ_{f} (scenario ii; data not shown).

The pairwise IBD probabilities described in the previous section were utilized to search for QTL influencing 48 dairying traits on the X chromosome using the variance component approach described in materials and methods. The proposed model includes a random individual animal effect that should properly correct for differences in autosomal coancestry that may be correlated with BTAX IBD probabilities and thus protect against false-positive QTL effects as a result of stratification. The lod score threshold of 2.5 corresponding to a chromosomewide type I error of ≈5% was exceeded for five traits: fat yield, direct durability, durable prestation, milking speed, and rear leg set (Table 2
). Assuming that these are independent QTL (as suggested by the type of trait and the respective most likely QTL positions), and knowing that—because of their correlations—the 48 analyzed traits behave in essence as ≈20 independent traits (W. Coppieters, unpublished data), this is thus approximately five times more than expected by chance alone. The same five QTL emerged out of an analysis performed using less extreme parameter values *N*_{m} = 15,000, *N*_{f} *=* 1,000,000, and *T =* 100 (Table 1). Altogether, this suggests that at least some of these QTL are likely to be genuine.

The proportion of the trait variance explained by the QTL ranged from 2 to 8% and was—as expected—well correlated with the lod score value. The number of haplotype clusters associated with the REML solution ranged very widely from 2 to 745. Figure 5 shows representative examples of the distribution of haplotype-cluster effects with standard error of the estimates and corresponding cluster frequency in the analyzed sample. In all cases, we were able to identify common haplotype clusters with significantly different effects. These most likely explain the high lod scores that were obtained.

## DISCUSSION

In this article, we have compared the level of polymorphism and LD between X-linked and autosomal microsatellite markers in the Holstein–Friesian dairy cattle population. We find X-linked microsatellite markers to be at least as variable as autosomal ones and—at comparable distance—LD to be considerably higher on the X chromosome than on the autosomes.

Typically, genetic polymorphism is expected to be lower and LD to be higher for markers on the X chromosome, as observed in humans (*e.g.*, Dib *et al.* 1996; Sachidanandam *et al.* 2001; Schaffner 2004; Altshuler *et al.* 2005). The lower level of polymorphism on the X is thought to result from (i) higher genetic drift as , (ii) a reduced mutation rate as the male mutation rate () is expected to be 1.7 to 4 times higher than the female mutation rate (), and while (*e.g.*, Ellegren 2000; Sachidanandam *et al.* 2001), and (iii) enhanced purifying selection due to male hemizygosity. Note that ascertainment bias (*i.e.*, the fact that one selects on polymorphism when developing markers) is expected to considerably blur this picture. The higher level of LD on the X chromosome is primarily thought to result from higher genetic drift. Our results in the bovine may thus—at first glance—seem in reasonable agreement with theoretical predictions: the level of polymorphism on the X is comparable to that on autosomes as a result of ascertainment bias when developing microsatellite markers, but the higher level of LD on the X—which is not subject to the same ascertainment bias—is as predicted by basic population genetics.

As a matter of fact, is valid only when the sex ratio () is one. This was clearly not the case for domestic cattle and most probably was not for their wild-type ancestors. Assuming (*e.g.*, Wright 1969) that(11)and(12)one can easily show that becomes larger than when SR < 1/7 (Figure 6). Thus in cattle is likely to have been higher than before and certainly after domestication. Equations 11 and 12 assume that all parents of a given sex have an equal chance of contributing offspring to the next generation, which is obviously not valid in livestock. Accounting for nonrandom parental contribution, however, does not alter these conclusions (Figure 6 and appendix).

This may in part explain why—in the bovine—X-linked markers are as polymorphic as autosomal markers, contrary to what is observed in humans. However, it precludes stronger drift from underlying the higher level of LD between X-linked markers. This is at least in part corroborated by comparing the level of LD between autosomal and X-linked “*in silico*” markers whose segregation within the known genealogy of the 929 analyzed bulls is simulated by gene dropping as previously described (Farnir *et al.* 2000). While for autosomal markers *in silico* LD levels are of the same magnitude as actual LD levels, for X-linked markers the levels of *in silico* LD are of the same magnitude as for the autosomal markers and thus considerably lower than the actual LD levels (Figure 3B).

This strongly suggests that the higher level of LD observed on the X chromosome of cattle, and possibly other species as well, is due to as yet undetermined factors other than drift. The fact that the increased levels of LD are also observed for markers that are >10 cM apart suggests that these factors are still operating or have been operating until recently, certainly after domestication (*e.g.*, Hayes *et al.* 2003).

We have developed a novel approach based on a Markovian model that allows computation of IBD probabilities conditional on flanking marker data accommodating both autosomes and gonosomes, varying SR, and sex-specific recombination rates. At present, the model accounts only for recombination but it should be possible to extend it such that it includes mutation as well. This may be important when dealing with short chromosomal segments for which mutation and recombination rate are of the same order of magnitude.

It is worthwhile noting that it should be possible to use the variance of IBD probabilities () as the basis for a measure of LD information content (IC). For a given marker interval, could be compared with the variance expected if the marker information allowed one to unambiguously distinguish IBD status from non-IBD status. The variance corresponding to such an ideal map is(13)where corresponds to the average IBD probability in the considered interval. Indeed, a proportion of chromosome pairs are expected to be IBD, thus have an IBD probability of 1 if the markers are fully informative, and hence contribute (1 − )^{2} to the variance of IBD probabilities; a proportion (1 − ) of chromosome pairs are expected to be non-IBD, thus have an IBD probability of 0 if the markers are fully informative, and hence contribute ^{2} to the variance of IBD probabilities. The ratio could thus be viewed as a measure of the LD IC of the utilized map. Although the detailed behavior and suitability of the proposed IC measure still need to be examined in more detail, the IC profile obtained with this method along the X chromosome map (data not shown) suggests—as expected—that one could gain considerable power by increasing the marker density. The recent availability of tens of thousands of bovine single-nucleotide polymorphisms (SNPs) as well of array-based methods allowing for the effective genotyping now makes this possible for both autosomes and the X chromosome.

We present evidence in this work that the X chromosome harbors several QTL affecting traits of importance to the dairy breeding industry. This contradicts previous studies in which effects of the X chromosome on some of these traits were searched for by fitting models in which the expected phenotypic covariance between relatives included terms accounting for sex linkage (*e.g.*, Lynch and Walsh 1998; M. Goddard, personal communication). The discrepancy between these results might be due to substantial gains of detection power from considering marker information.

Contrary to Meuwissen and Goddard (2001), but following Kim and Georges (2002) and Blott *et al.* (2003), we are not directly converting between haplotype IBD probabilities in covariances between haplotype effects, but are clustering haplotypes on the basis of the IBD probabilities and then testing the effect of haplotype clusters on phenotype. One of the advantages of this approach is that it allows one to identify the haplotypes that are the most likely to be functionally distinct (as illustrated in Figure 5) and then to focus the molecular work on these haplotypes when searching for causal variants. In addition, the haplotype clustering method in essence models a discrete number of QTL alleles that may be closer to the actual biology than the infinite number of alleles that is modeled in the alternative approach.

Our results suggest that it might be advantageous to include X-linked markers when performing marker-assisted selection or genomic selection (Meuwissen *et al.* 2001) in dairy cattle. One could, for instance, select among full brothers according to the X chromosome inherited from the dam.

## APPENDIX

In the case of nonrandom parental contribution and unequal numbers of males and females, effective population size for autosomes () and the X chromosome () can be estimated using, respectively,(A1)and(A2)

In these equations *N*_{m}, *N*_{f}, and SR are as defined before, is the variance of the number of offspring of sex *y* per parent of sex *x*, and is the covariance between the number of offspring of sex *y* and the number of offspring of sex *z* per parent of sex *x* (*e.g*., Caballero 1994).

Unequal parental contribution can be modeled by assuming that the males belong to *K* classes of size with probability of fatherhood , such that , and likewise that the females belong to *L* classes of size with probability of motherhood , such that . Using this model, one can show that

We supposed that the effects of selection were independent from one generation to another; offspring do not inherit probability of parenthood.

Figure 6 compares estimates of and obtained with this model in a population of 1 million individuals and varying SR in the case of (i) equal probability of parenthood for all parents and (ii) unequal probability of parenthood between parents. In the latter case, it was assumed that (i) the males were subdivided in three categories representing, respectively, 0.001, 0.1, and 0.89 of the male population with relative probabilities of parenthood of 100, 10, and 1 and (ii) the females were subdivided in three categories representing, respectively, 0.001, 0.1, and 0.89 of the female population with relative probabilities of parenthood of 10, 2, and 1.

We used this model to estimate and in the Dutch Holstein–Friesian population. Using statistics for 2002, 2003, and 2004 obtained from the Nationaal Rundveeteelt Syndikaat (Arnhem, The Netherlands), we set the number of reproducing males at 15,650 and the number of reproducing females at 1,076,100. To match the known distributions of sons per bull–sire and bull–dam as well as daughters per son as closely as possible (data not shown), we subdivided (i) the reproducing males in five classes representing, respectively, 0.0005, 0.005, 0.2, 0.3, and 0.4945 of the male population with relative probabilities of parenthood of 15,000, 400, 10, 2, and 1 and (ii) the reproducing females in three classes representing, respectively, 0.05, 0.1, and 0.85 of the female population with relative probabilities of parenthood of 1000, 100, and 1. This yielded estimates of and of 88 and 99, respectively, thus very similar to the corresponding values of 100 and 112 obtained when considering 25 reproducing males and 1,000,000 reproducing females with equal probability of parenthood (as utilized throughout this article), as well as to pedigree-based estimates of (Boichard 1996).

## Acknowledgments

We are grateful for the expert technical assistance of Paulette Berzi, Nadine Cambisano, Latifa Karim, Myriam Mni, Patricia Simon, and Erica Davis in producing the microsatellite genotypes and to Alain Empain for expert administration of the computer grid. This work has been funded by grants from Holland Genetics (Arnhem, The Netherlands), the Livestock Improvement Corporation (Hamilton, New Zealand), the Walloon Ministry of Agriculture, and the Genetic Analysis of Multifactorial Entities ULg/Action de Recherche Concertée. Cynthia Sandor is a fellow of the Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture.

## Footnotes

Communicating editor: C. Haley

- Received April 10, 2006.
- Accepted April 20, 2006.

- Copyright © 2006 by the Genetics Society of America