A Whole Genome Scan for Quantitative Trait Loci Affecting Milk Protein Percentage in IsraeliHolstein Cattle, by Means of Selective Milk DNA Pooling in a Daughter Design, Using an Adjusted False Discovery Rate Criterion
 Mathias O. Mosig1,
 Ehud Lipkin,
 Galina Khutoreskaya,
 Elena Tchourzyna,
 Morris Soller and
 Adam Friedmann
 Department of Genetics, Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 91904, Israel
 Corresponding author: Ehud Lipkin, Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel. Email: lipkin{at}vms.huji.ac.il
Abstract
Selective DNA pooling was employed in a daughter design to screen all bovine autosomes for quantitative trait loci (QTL) affecting estimated breeding value for milk protein percentage (EBVP%). Milk pools prepared from high and low daughters of each of seven sires were genotyped for 138 dinucleotide microsatellites. Shadowcorrected estimates of sire allele frequencies were compared between high and low pools. An adjusted false discovery rate (FDR) method was employed to calculate experimentwise significance levels and empirical power. Significant associations with milk protein percentage were found for 61 of the markers (adjusted FDR = 0.10; estimated power, 0.68). The significant markers appear to be linked to 19–28 QTL. Mean allele substitution effects of the putative QTL averaged 0.016 (0.009–0.028) in units of the withinsire family standard deviation of EBVP% and summed to 0.460 EBVP%. Overall QTL heterozygosity was 0.40. The identified QTL appear to account for all of the variation in EBVP% in the population. Through use of selective DNA pooling, 4400 pool data points provided the statistical power of 600,000 individual data points.
LARGE halfsib or fullsib families are routinely produced in many domesticated animal species, as part of normal population reproduction procedures (dairy cattle, swine, fish) or as part of routine genetic improvement programs (poultry). These families embody great statistical power for withinpopulation quantitative trait locus (QTL) mapping, which can be accessed by use of a “daughter design” (Soller and Genizi 1978; Welleret al. 1990). However, achieving the high power theoretically available with the daughter design requires genotyping very large numbers of daughters against large numbers of markers, with resultant high genotyping costs (Soller 1990) and logistical problems of sample collection.
In the case of dairy cattle, the logistical problem of sample collection has been solved in part by the use of the granddaughter design (Welleret al. 1990). This design is based on the availability of large halfsib families of progenytested sons of elite sires with readily accessible semen samples. The granddaughter design has the further advantage of reducing the sample size required for given power about fourfold as compared to a daughter design. A large number of studies employing this design have been reported (summarized in Table 7). However, the statistical power of the granddaughter design is limited by the number and size of the available sire families and is far from equaling the enormous statistical power embodied in the very large sire halfsib daughter families found in the same dairy cattle populations.
Two recent developments have opened the way to utilization of the daughter design in dairy cattle QTL mapping and its extension to other species. For dairy cattle, the high costs of sample collection from the scattered daughter families can be reduced by the use of milk samples as a source of DNA for the PCR reaction (Lipkin et al. 1993, 1998). Such samples are routinely collected as part of milk recording schemes and, in some instances, can be made available for QTL mapping at little additional cost.
For all species, the high costs of screening large families for marker allele frequencies can be greatly reduced by the use of selective DNA pooling (Darvasi and Soller 1994; Lipkinet al. 1998). In this procedure, determination of linkage between a molecular marker and a QTL is based on the distribution of parental marker alleles among pooled DNA samples of the extreme high and low phenotypic groups of the offspring population. Estimating allele frequencies in pooled DNA samples is based on a linear relation between the allele frequency in the group of individuals making up the pool and the densitometric intensity of the band corresponding to the allele. In the case of dinucleotide microsatellite markers, this requires correction for confounding of main allele bands and overlapping “shadow” bands (Lipkinet al. 1998).
The present study follows the procedures of Lipkin et al. (1998) for QTL mapping by selective DNA pooling with dinucleotide microsatellites, but extends these to a complete scan of the bovine genome with respect to the target trait (milk protein percentage). In addition, a modification of the false discovery rate concept (Benjamini and Hochberg 1995; Welleret al. 1998) is introduced to deal with the problem of determining statistical significance in the case of multiple comparisons and to provide an empirical estimate of statistical power. The high power and low genotyping costs achieved in this study make selective DNA pooling with shadow band correction an attractive option for QTL mapping in species with large halfsib or fullsib families.
MATERIALS AND METHODS
Population and samples: The population studied was the same as described in Lipkin et al. (1998). Briefly, seven IsraeliHolstein artificial insemination bulls were chosen, each the sire of >1800 milkrecorded daughters. Based on estimated breeding values for milk protein percentage (EBVP%), a list of the highest and lowest 220 daughters was prepared for each sire. Milk samples of the designated daughters were obtained through the milk recording system of the Israel Cattle Breeders Association.
Two pools were prepared of the daughters at each phenotypic tail. The most extreme half of the high and low daughter groups formed the “external” high and low pools, respectively. The remaining daughters in each tail formed the “internal” high and low pools, respectively. Each pool was prepared in two independent duplicates.
Genotyping individual semen samples and individual and pooled milk samples was as described (Lipkinet al. 1998).
Microsatellites: A total of 138 dinucleotide microsatellite markers distributed over all 29 bovine autosomes were used in this study (web sites: U.S. Department of Agriculture, http://bos.cvm.tamu.edu/bovgbase.html; and IBRP Cattle Genome Database, http://spinal.tag.csiro.au). The distance between the markers or between the chromosome ends (centromeres and telomeres) and the closest marker averaged 17.0 cM.
Densitometric estimates of sire allele frequencies in the pools: Densitometric estimates of sire allele frequencies in the pools were obtained after correction for shadow bands, as described in Lipkin et al. (1998). To validate the densitometric procedure, individual milk samples in a number of the pools were genotyped. The regression and correlation of estimates of allele frequency based on pool densitometry after shadow correction and estimates of allele frequency based on individual genotyping were calculated as described (Lipkinet al. 1998). In addition, mean deviation of pool estimates from individual estimates was calculated as
MarkerQTL Linkage Tests
Comparisonwise linkage tests: Individual sirebymarker combinations: Since all tests are twotail tests, the P_{ij}, for the ith sirebyjthmarker combination is obtained as twice the area of the normal curve from Z(D_{ij}) to +∞, where Z(D_{ij}) = D_{ij}/SE(D_{ij}), D_{ij} is the difference in sireallele frequencies between the high and low daughter pools of the ith sire with respect to the jth marker (Khatibet al. 1994; Lipkinet al. 1998), and SE(D_{ij}) is the standard error of D_{ij}, calculated as described in Lipkin et al. (1998). Note that this test can be applied only to sires heterozygous at the given marker.
D_{ij} values for a given sirebymarker combination were obtained in a threestep procedure, in which the order of the steps was changed somewhat in the course of the study. At first, D_{ij} were obtained by taking the difference between the mean of the two replicates of the high external pools and the mean of the two replicates of the low external pools, as in Lipkin et al. (1998), and a test for marker significance carried out across the pooled Z_{ij} values for these D_{ij}. For markers having P < 0.01, D_{ij} and P_{ij} were also obtained for the internal pools, and finally, the results of all pools (external and internal) were combined for the definitive marker comparisonwise test. As data accumulated in the present study, however, it became clear that technical error was less than expected and that the number of daughters in the pools was the main component of SE(D_{ij}). Consequently, the procedure was changed so that D_{ij} were initially based on one replicate each of both the external and internal pools and the second replicate analyzed only for markers having P < 0.01. Again, the results of all pools (external and internal) were combined for the definitive marker comparisonwise test.
Individual markers: Here, too, since all tests are twotail tests, the P_{j} for an individual marker, M_{j}, was obtained (Welleret al. 1990; Lipkinet al. 1998) as twice the area of the chisquare distribution (d.f. = s) from χ^{2}_{j} to +∞, where χ^{2}_{j} (d.f. = s_{j}) = RZ^{2}(D_{ij}) and s_{j} is the number of heterozygous sires tested for the jth marker. For each marker, the summation is over the subset of sirebymarker combinations for that marker only.
Experimentwise linkage tests—the false discovery rate: For each of the above two series of tests (sirebymarker combinations and markers), a comparisonwise error rate (CWER), or type I error Pvalue, was calculated for each test using standard statistical procedures, as detailed above. In the usual experimental situation, where only a small number of treatments are compared, a CWER Pvalue of 0.05 or 0.01 would lead to rejection of the null hypothesis (H_{0} represents absence of treatment effect), with type I error likelihood P < 0.05 or P < 0.01. In the present case of linkage testing, however, multiple tests are carried out at the sirebymarker and marker levels. Consequently, the use of a CWER based on rejecting the null hypothesis (H_{0} represents no linkage) for an individual comparisonwise test at the usual CWER may result in a high proportion of false rejections among the group of rejected null hypotheses (Lander and Kruglyak 1995). That is, among the sirebymarker combinations or markers declared to represent linkage to a QTL, a high proportion will represent false linkages. Attempting to control CWER, however, such that there is a low experimentwise probability of rejecting even one of the null hypotheses by chance alone, will result in a high proportion of false acceptances among the accepted null hypothesis. That is, a high proportion of the tests that truly represent linkage to QTL will not be recognized as such, and experimental power will be low.
As a way out of this dilemma, Benjamini and Hochberg (1995) proposed controlling the false discovery rate (FDR), i.e., “the expected proportion of true null hypotheses within the class of rejected null hypotheses” (that is, the proportion of false positive tests among the individual comparisonwise tests that are declared significant). Weller et al. (1998) discuss in detail the application and usefulness of this approach for QTL mapping with multiple markers. Zaykin et al. (2000) have pointed out that using the Benjamini and Hochberg (1995) procedure, only the unconditional FDR is controlled and not the proportion of false results among all positive tests, given that at least one test is significant. Controlling FDR at a level α based on Weller et al. (1998), therefore, results in controlling the FDR at a level greater than the ostensible level, α, because the proportion of false positives is expected to be greater than α if at least one test has been declared significant (Zaykinet al. 2000). While accepting this critique in principle, Weller (2000), however, argued that in the general class of situations where many effects are routinely declared significant, the probability of declaring at least one test significant in any specific case will be close to unity and, therefore, controlling the unconditional FDR will give similar results as controlling the conditional FDR. It is important to note that the derivation of the FDR does not require that the tests are independent (Benjamini and Hochberg 1995; Welleret al. 1998).
To apply the FDR approach to a particular series of comparisonwise tests, CWER Pvalues for the given comparison are ordered such that P_{(1)} < P_{(2)} <... < P_{(}_{h}_{)}... < P_{(}_{n}_{)}, where n is the total number of tests in the series (sirebymarker or marker, as the case may be) and P_{(}_{h}_{)} is the Pvalue corresponding to the null hypothesis of the hth test. As shown by Benjamini and Hochberg (1995), the FDR can be controlled at some level, q, by determining the largest h = t for which q < nP_{(}_{t}_{)}/t. That is, under this condition, among t rejected null hypotheses, the expected proportion of falsely rejected hypotheses is no greater than q. Using this procedure, CWER Pvalues were calculated in the present study for each of the above two test series (sirebymarker and marker). Treating each series of tests separately, the critical CWER corresponding to various desired FDR were identified. All tests having a CWER Pvalue equal to or less than the critical CWER were then taken to be significant, at the given FDR.
Estimating the proportion of false null hypotheses among all null hypotheses: For the multipletest situations considered here, the population of all n comparisonwise tests includes two groups: G_{1}, comprising n_{1} tests, for which the null hypothesis is false (i.e., true cases of linkage), and G_{2}, comprising n_{2} tests for which the null hypothesis is true (i.e., true cases of nonlinkage). An estimate of the magnitude of n_{1} and n_{2} for the sirebymarker and marker linkage test situations can be obtained on the following argument, to wit: A false null hypothesis that is wrongly accepted will nevertheless tend to have a low CWER Pvalue. Consequently, if n_{1} is an appreciable fraction of n, there should be an excess of comparisons having low CWER Pvalues and a deficit of comparisons having high CWER Pvalues. The excess or deficit for any given interval, P_{h} < P_{j} < P_{k}, can be obtained as n_{h}_{,}_{k} = (t_{h} – t_{k}), where t_{h} and t_{k} are the rank numbers of the ordered CWER test comparisons having Pvalues equal to P_{h} and P_{k}, respectively (for example: P_{h} = 0.100 and P_{k} = 0.199, Tables 2 and 7); (t_{h} – t_{k}) is the number of tests having Pvalues within a defined range, and n_{2}(P_{h} – P_{k}) is the number expected by chance for that interval out of the n_{2} tests. Although n_{2} is not known initially, it can be obtained by iteration. In the first iteration, set n_{2} = n. Then, n_{1} is estimated as n_{1} = Rn_{h}_{,}_{k}, where the summation is over all intervals for which n_{h}_{,}_{k} is positive. In the second iteration, set n_{2} = n – n_{1} and repeat until there is no further change in the estimates of n_{1} and n_{2}.
Adjusted FDR: When the proportion of true effects among all comparisonwise tests is large, the FDR calculated as in Benjamini and Hochberg (1995) is not appropriate. The reason for this is that the expected number of false rejections, nP_{(}_{h}_{)}, is based on the total number of comparisonwise tests, n, but, as noted above, these tests include two groups, G_{1} comprising n_{1} tests, for which the null hypothesis is false, and G_{2} comprising n_{2} tests, for which the null hypothesis is true. Thus, the expected number of falsely rejected true null hypotheses is actually n_{2}P_{(}_{h}_{)}, and the FDR is more appropriately calculated as q < n_{2}P_{(}_{t}_{)}/t, where t is the rank numbers of the ordered CWER test comparisons and n_{2} is calculated as above. We use the term “adjusted FDR” for the FDR calculated in this way. Because it is conditional on some true proportion of tests for which the null hypothesis is false, we believe that the FDR calculated in this way is not subject to the critique of Zaykin et al. (2000) and, hence, provides unbiased estimates of the likely proportion of false positives among the tests declared to be significant.
The composition of G_{1} and G_{2}: For sirebymarker comparisons within markers declared to be in linkage to QTL, the composition of G_{1} and G_{2} is unequivocal: All comparisons for which the sire is heterozygous at the QTL belong in G_{1}; all comparisons for which the sire is homozygous at the QTL belong in G_{2}. At the marker level, however, the composition of G_{1} and G_{2} is more problematic. Clearly, all markers on chromosomes that do not carry even one QTL are unequivocally in G_{2}. Problems arise with respect to markers on the same chromosome as a QTL. Practically speaking, markers that are >35 cM from a QTL (equivalent to >25 recombination units) are relatively useless for purposes of markerassisted selection and positional cloning, and hence one would not want to include them in G_{1}. For chromosomes of length 100 cM, with one randomly placed QTL, 0.42 of markers will be in this category. The proportion of markers on the chromosome that end up counted in G_{1} will depend on QTL location (greater if central), and on power of the test for a marker located at the QTL. For a single centrally located QTL, having power of 0.80 at CWER 0.05 for a marker located at the QTL, about 0.65 of markers on the chromosome will end up counted in G_{1}. With two or three QTL on the chromosome, virtually all markers will end up in G_{1}. Thus, G_{1} will include some markers that are in linkage to the QTL, but too far for the linkage to be useful, except as an indication that there is something of interest on the chromosome. This is part of the general problem of wide confidence intervals for QTL map location (Darvasiet al. 1993; Darvasi and Soller 1997) and is not specific to the adjusted FDR analysis.
Power of the test for linkage: By an extension of the argument of the previous section, it is possible to obtain an empirical estimate of power for the sirebymarker and marker tests. Assuming that markers fall into the above G_{1} and G_{2} groups, then accepting the null hypothesis is a type II error for markers in G_{1}, but not for markers in G_{2}. In this case, power will equal n_{m}/n_{1}, where n_{1} is as defined above, and n_{m} is the number of true markerQTL linkage determinations among the group of markers having Pvalues below the critical CWER. n_{m} can be estimated as n_{m} = t – n_{2}P_{(}_{t}_{)}, where t, P_{(}_{t}_{)}, and n_{2} are as defined above.
Estimating the proportion of heterozygosity at the QTL: Continuing this line of reasoning, the average degree of heterozygosity at the QTL was estimated by the excess proportion of significant sirebymarker combinations among all sires heterozygous at the significant markers, as follows. At the markers showing linkage to QTL, let s be the total number of sirebymarker combinations for which these markers were heterozygous. Then, these individual sirebymarker combinations fall into two groups, namely: S_{1}, consisting of s_{1} sirebymarker combinations at which the QTL was heterozygous, and S_{2} consisting of s_{2} sirebymarker combinations at which the QTL was homozygous. The ratio H = s_{1}/s will then estimate the proportion of heterozygosity at the QTL. Estimates of s_{1} and s_{2} can be obtained by the same procedure used above to estimate n_{1} and n_{2}, except that s_{1} and s_{2} are calculated for significant markers only.
The parameter H estimated in this way is an overestimate of heterozygosity for all QTL affecting EBVP% in the population as a whole. The reason for this is that H can be obtained only for significant markers. But, with only a limited number of heterozygous sires per marker, some markers in linkage to QTL will fail to reach significance, simply because none of the sires heterozygous for the marker was also heterozygous at the QTL. Taking this into account, the true proportion of heterozygosity at the QTL can be estimated as follows. Consider all markers in linkage to QTL, and let h be the true proportion of heterozygosity at the QTL to which these markers are linked and b be the average number of sires heterozygous at the linked markers. Then on the binomial distribution, the expected proportion of instances for which x of the b sires heterozygous at a marker in linkage to a QTL are also heterozygous at the linked QTL is given by B(x) = B(x; h, b).
By definition, the true proportion of heterozygosity at the QTL is
Since B(0) is a function only of h and b, and both H and b are known from the data, h is obtained by substituting successive values of h in the above expression, until the calculated H and observed H correspond.
The number of QTL on a chromosome: When dealing with a single F_{2} or backcross population or with the progeny of a single sire, the issue of multiple QTL on the same chromosome, and thus the total number of uncovered QTL, can be resolved only by using an appropriate twoQTL interval mapping model (Zhanget al. 1998). Methods for application of one or twoQTL models to interval mapping based on data obtained by selective DNA pooling are under development (Dekkerset al. 1999), but are not yet available. Nevertheless, as a preliminary estimate, multiple QTL were identified in this study based on additional qualitative criteria. These included: (i) the presence of one or more sires showing significance for more proximal marker(s), with clear lack of significance for more distal marker(s), while the opposite was true for other sires, and (ii) the presence of a sire showing significance for both proximal and distal marker(s), but lack of significance for intervening marker(s).
Allele substitution effects: Allele substitution effects were calculated as described (Lipkinet al. 1998). We were unable to derive a method to provide standard errors for these estimates.
RESULTS
Marker distribution by sires: Sire heterozygosity at the 138 markers averaged 0.67, ranging from 0.62 to 0.71. Differences in marker heterozygosity among sires were not significant (by chisquare contingency test). On the average, each sire was heterozygous at 92 markers, allowing a total of 644 individual sirebymarker tests.
Validation of densitometric allele frequency estimates: For sixteen pools, each representing a high or low pool of a specific sirebymarker combination, densitometric estimates of allele frequency were compared to estimates of allele frequency obtained by individual genotyping. For the most part these comparisons were for sirebymarker combinations on BTA 6, which showed an exceptionally high number of significant effects; some comparisons were also for sirebymarker combinations on other chromosomes involving sire 1, who individually showed an exceptionally large number of highly significant effects. In each case, the densitometric estimates were based on the mean of duplicate pools. For individual genotyping, the number of genotyped daughters per pool averaged 44.8 (ranging from 18 to 78) for a total of 729 individual genotypes. Three of the sirebymarker combinations were presented previously (Lipkinet al. 1998). For the remaining combinations, Table 1 shows intercepts, regression coefficients, and correlation of densitometric estimates on individualgenotyping estimates.
None of the intercepts or regression coefficients differed significantly from 0.0 or 1.0, respectively. For all alleles in the population, the correlation coefficient ranged from 0.92 to 0.99 and was 0.96 pooled over all combinations. The mean deviation of pool estimates from individualbased estimates was 0.05. For sire alleles only, the correlation was 0.93 and the mean deviation 0.06.
The proportion of false null hypotheses among all null hypotheses: Table 2 shows the distribution of CWER Pvalues for sirebymarker and marker tests and the estimated number of false (n_{1}) and true (n_{2}) null hypotheses at the two levels of comparison. At both levels, the distribution of Pvalues differed significantly from that expected if all comparisonwise tests were generated by true null hypothesis. This indicates that the null hypothesis must be false for at least some of the tests. The estimated proportion of false null hypotheses (i.e., true linkage) out of all hypotheses tested was 0.59 and 0.22, for the marker and sirebymarker levels, respectively. These numbers indicate that more than half of all markers are in linkage to a QTL affecting trait value and that more than onefifth of markers tested in an individual sire are heterozygous at a linked QTL affecting trait value.
Critical CWER Pvalues according to raw and adjusted FDR: Table 3 shows critical comparisonwise Pvalues for raw and adjusted FDR of 0.05, 0.10, and 0.20, according to level of comparison. Critical Pvalues increase with increase in FDR, as expected. Critical Pvalues are greater for adjusted than for raw FDR. The difference between critical Pvalues according to raw and adjusted FDR stands in proportion to the difference between estimated number of true null hypotheses (n_{2}) and total number of comparisons tests (n). Thus, the difference is minor for the sirebymarker comparisons and larger for marker comparisons.
Power of the tests for linkage: Table 4 shows total number of significant comparisons (rejected null hypotheses) and estimated power of the analysis, according to FDR (raw and adjusted) and level of comparison. In any given cell of the table, the total number of significant comparisons includes both falsely rejected null hypotheses derived from G_{2} and correctly rejected null hypotheses derived from G_{1}. The number of correctly rejected hypotheses (i.e., the number of elements of G_{1} that are correctly identified as significant) is given by the number in the table, less the number of falsely rejected null hypotheses as given by the FDR. For example, at an adjusted FDR of 0.10, there are 61 rejected null hypotheses at the marker level. Of these, 6.1 represent falsely rejected null hypotheses, so that the number of truly identified markerQTL linkages is equal to 54.9. On a raw FDR basis, power was about 0.28 to 0.49 at the sirebymarker level and 0.41 to 0.58 at the marker level. On an adjusted FDR basis, power was 0.34 to 0.60 at the sirebymarker level and 0.48 to 0.89 (at an adjusted FDR of 0.20) at the marker level.
MarkerQTL linkage: Marker tests: There were a total of 138 comparisonwise linkage tests at the individual marker level. Adjusted experimentwise FDR of 0.05 and 0.10 were obtained at a CWER of P < 0.033 and P < 0.100, respectively (Table 3). There were 42 and 61 significant markers at these levels, with estimated power of 0.49 and 0.68, respectively (Table 4).
Of the 20 additional markers included at an FDR of 0.10 as compared to an FDR of 0.05, 15 were found on chromosomes carrying at least one significant marker at FDR of 0.05. Thus, these 15 probably represent markers in linkage to QTL already identified on these chromosomes. Four more were on BTA 12, suggesting the presence of a QTL on this chromosome. These 19 markers would appear to validate the use of the less stringent criterion. The remaining one marker was found on BTA 24 that did not carry a significant marker on the more stringent criterion. We propose therefore to use the CWER providing an adjusted FDR of 0.10 as our criterion for significance in the present study.
Of the 61 significant markers on the above criterion (Table 5), 15 had two or more sirebymarker tests significant at an adjusted FDR of 0.10; 35 had a single significant sirebymarker test at this level; and 11 did not have even a single markerbysire test significant at this level.
The 61 significant markers were distributed over 23 chromosomes (Table 5): BTA 1*, 2, 3*, 4, 5*, 6*, 7*, 8*, 9*, 10*, 11*, 12*, 13*, 14, 16*, 18, 20*, 21*, 22, 23*, 26*, 27*, 29*. Of these, 18 chromosomes (indicated by an asterisk above) had at least two markers significant on the above criterion; the remainder had a single significant marker only. The large variability in significance level among sires significant for the same marker and presumably heterozygous for the same QTL (Table 5), is best explained by the nonlinear relationship of significance level to Dvalues; Dvalues and the resultant allele substitution effects show much less variation (Table 5). Nevertheless, the possibility cannot be excluded that at least in some of the instances, the significant sires are heterozygous for different alleles or for different closely linked QTL.
The proportion of significant sirebymarker tests, for sires heterozygous at the markers, differed greatly among the sires, being 0.25 for sire 1, 0.02 for sire 6, and 0.06 to 0.10 for the remaining sires. The differences among sires were highly significant (by chisquare contingency test). A source for these differences is not apparent, as all sires were of the same breed and geographic origin and were active in the same years and same environment. Nor can the number of daughters explain these differences, as the two sires with the highest number of daughters, sire 1 (3394 daughters) and sire 6 (3407 daughters), had opposite extreme proportions of significant tests.
Due to the large number of significant tests obtained with sire 1, densitometric estimates of allele frequencies for this sire were confirmed by individual genotyping of four markers (BMS963, CSSM5, UWCA9, and BM415) distributed over four chromosomes and representing a wide range of Pvalues (Tables 1 and 5). The results of the individual genotypings showed a close correspondence to the densitometric estimates (Table 1).
Proportion of heterozygosity at the QTL: At the 61 markers significant at an FDR of 0.10, there were a total of 282 sirebymarker comparisons (4.6 per marker); at the remaining 77 markers there were a total of 362 sirebymarker comparisons (4.7 per marker). The difference in average number of sirebymarker comparisons between significant and nonsignificant markers was not statistically significant (by chisquare contingency test). Table 6 shows the distribution of CWER Pvalues for sirebymarker comparison tests, according to whether the marker was among the significant or nonsignificant markers. The estimated numbers of false (s_{1}) and true (s_{2}) null hypotheses at each marker group are also shown. The distribution differed significantly from expected for the significant markers, but did not differ from expected for the nonsignificant markers. The estimated number of true rejections of the null hypothesis (s_{1}) among the significant markers was 125; that for the nonsignificant markers was 19. Thus, the estimate of heterozygosity at QTL for the significant markers was H = 125/282 = 0.44. As noted, this is an overestimate, since it includes only QTL for which at least one sirebymarker combination was heterozygous. The corrected estimate is h = 0.40. Even this may be an overestimate, since it is those QTL at a lower degree of heterozygosity that would not be identified in the limited number of sires sampled from the population. The presence of more than one segregating QTL on some of the chromosomes will also contribute to inflating this estimate. In particular, chromosome 6 shows 80% significant markerbysire comparisons. This might be evidence of at least two QTL on this chromosome.
At an average heterozygosity of 0.40 and an average of 4.6 sirebymarker comparisons per marker, the proportion of QTL that are not represented in heterozygous state in at least one sirebymarker comparison is given by (0.6)^{4.6} = 0.095. Thus, 90% of the QTL segregating in the population will have had some opportunity to be identified in the present study.
The number of uncovered QTL: Of the chromosomes carrying significant markers, 19 (BTA 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 16, 18, 20, 22, 23, 26, 27, and 29) showed only a single distinct peak of significance, consistent with the presence of a single QTL (or a single linked QTL complex) on the chromosome (data not shown). Three chromosomes, BTA 1, 6, and 21, showed qualitative indications of two separated QTL; one chromosome, BTA 7, showed qualitative indications of three separated QTL. All told, therefore, in this study, at least 23 and possibly as many as 28 QTL were found, affecting milk protein percentage in the IsraeliHolstein population.
Allele substitution effects: For each marker significant at P < 0.100 (FDR = 0.10, Table 3), Table 5 presents the effects obtained for all individual sirebymarker combinations significant at P = 0.012 (FDR = 0.10), and the mean substitution effect of the marker, averaged over all such sires. Taking the effect of the most significant marker in any given region (when two or more adjacent markers appear to be representing the same QTL), mean effects of the putative 26 QTL averaged 0.016, ranging from 0.009 to 0.028, and summed to 0.460.
DISCUSSION
Selective DNA pooling: The results of this study confirm the effectiveness of selective DNA pooling using milk samples, for QTL mapping in dairy cattle. As found in the first stage of this study (Lipkinet al. 1998), pool estimates were accurate and unbiased (Table 1). As shown by Lipkin et al. (1998), the pools of the seven sires provided statistical power per marker equivalent to individual selective genotyping of 910 daughters per sire. Thus, the present study, involving 4396 pool genotypings, provided the equivalent of 644 (sirebymarker tests) × 910 (daughters per sire) = 587,860 individual genotypings, a 134fold reduction. Implications of this methodology for markerassisted selection and fine mapping of QTL were discussed in Lipkin et al. (1998).
The false discovery rate approach to statistical testing: In this study a modification of the false discovery rate approach proposed by Benjamini and Hochberg (1995), which we term the “adjusted” false discovery rate, was used to deal with the problem of establishing statistical significance in a multiple test situation. It enabled decisions to be made, emphasizing control of type II rather than type I errors, without greatly increasing the proportion of false declaration of linkage in the final results. Because of the novelty of these concepts, their validity and sensitivity to assumptions has not yet been tested, and this should be taken into account when the accuracy of this methodology is assessed.
Because the FDR has as yet not been used widely for hypothesis testing, there is no consensus as to appropriate levels of FDR for various experimental situations. It would appear appropriate to take into account the relative positive contribution to the goals of the experiment of a true rejection of the null hypothesis, as compared to the negative contribution of a false rejection of the null hypothesis. In the case of QTL mapping as a preliminary for markerassisted selection (MAS), genetic progress is proportional to the standard deviation of summed QTL value. On the simple assumption of equal effects for the different QTL, this will be proportional to the square root of the number of QTL that are followed. Thus, each additional truly identified QTL makes a positive contribution to MAS. A QTL falsely identified as present at a particular location may lead to some unnecessary genotyping costs. It will not, however, turn up as a heterozygous QTL in subsequent markerQTL phase determination of elite sires. Hence, it will not be actively incorporated in the MAS program and will not result in lost selection intensity in other areas. Thus, for purposes of MAS, the FDR can be set relatively high. On the other hand, in the case of QTL mapping as a preliminary to highresolution mapping or to comparative positional cloning, a false QTL identification may lead to major expenditure of wasted effort, and FDR would be set at a low level.
In the present study we chose to use an adjusted FDR of 0.10 for the marker level and 0.05 for the chromosome level. However, the general results would not have changed very much if an adjusted FDR = 0.05 had been used throughout. In contrast, the use of a Bonferroni approach would have changed results appreciably. At the marker level, an adjusted experimentwise FDR of 0.10 was obtained at a CWER of P ≤ 0.100 (Table 3). There were 61 significant markers at these levels, with estimated power of 0.68 (Table 4). Using a Bonferroni criterion, the CWER Pvalue needed for experimentwise significance at P = 0.10 (assuming 50 independent marker tests) is P ≤ 0.002. At this level, there would have been 24 significant markers (FDR = 0.005), and power would have been 0.29. Thus, although the FDR in this case would have been negligible, many true effects, snared at a somewhat higher FDR, would have been missed.
The power of the experiment and the adjusted false discovery rate: The adjusted FDR approach conceives of the population of multiple tests as consisting of two subgroups, one, which we have denoted G_{1}, for which the null hypothesis is false, and a second, denoted G_{2}, for which the null hypothesis is true. The characteristic feature of the first population is that it will tend to have lower CWER Pvalues than the second. On this basis we proposed a methodology for estimating the size of the two subgroups based on a comparison of expected and observed numbers in each Pvalue class. The excess numbers in the low Pvalue classes were assigned to G_{1}, the remainder to G_{2}. Using this approach, the estimated proportion of false null hypotheses (i.e., true linkage) out of all hypotheses tested was 0.59 and 0.22, for marker and the sirebymarker levels, respectively (Table 2). These proportions are internally consistent. If 23 of 29 (fourfifths) chromosomes carry a QTL, it is plausible that only about 50% of markers are in linkage to QTL: 20% of markers will be located on chromosomes that do not carry a QTL and, on chromosomes carrying a single QTL, some markers will be too far from the nearest QTL to show an effect. Similarly, if about 50% of markers are in linkage, it is plausible that only about 20–25% of all sirebymarker comparisons represent heterozygous QTL: Half of the markers are not in linkage, and of those in linkage, at least half of the sires are not heterozygous at the QTL.
By estimating the size of the two populations, it was possible to calculate an adjusted FDR, which further increased the power of the analyses. Further exploitation of this approach enabled estimates of the empirical power of the experiment to be obtained. Power at FDR = 0.10 was equal to 0.68 and 0.40 at the marker and sirebymarker levels, respectively (Table 4). The question of power is particularly important at the sirebymarker level, since correctly identifying sires heterozygous at the QTL is an essential component of phase determination for MAS. The rather low power at the marker and sirebymarker levels is probably due, at least in part, to the fact that many of the markers were at a distance from the QTL to which they were linked. As a result, power for these markers (and for the sirebymarker combinations involving them), would necessarily be low. Thus, we anticipate that power will be somewhat greater, once mapping proceeds to identify markers in tight linkage to the QTL. In addition, the high and low selected groups in this experiment included only the high and low 10% of the daughter population. Including 15% rather than 10% adds somewhat to mapping power (Darvasi and Soller 1992). Thus, a combination of more tightly linked markers and expanded high and low groups may be able to increase power of the pool analyses at the sirebymarker level. A combination of pool genotyping and individual genotyping may also provide further increase in power at this level.
Finally, through this approach we were able to estimate the average degree of heterozygosity at the QTL. This is of interest with respect to the potential impact of MAS, which will be greatest for positive alleles that are at low frequency. An average heterozygosity of 0.40 or less at the QTL, as found in this study, implies that for most QTL, allele frequency is unequal, with at least one allele at high frequency (>0.7) and the other at correspondingly low frequency (<0.3). This accords with the prior history of the IsraeliHolstein cattle population. This population was under intensive selection for high milk yield for almost 50 years. Because of the negative genetic and phenotypic correlation between milk yield and milk protein percentage, this will have resulted in strong selection against alleles with positive effects on milk protein percentage. Indeed, such selection is apparent in the higher proportion of target daughters accessed in the high EBVP% tail of the population than in the low tail (Lipkinet al. 1998). Thus, we can make the further inference that usually it is alleles with positive effects on milk protein percentage that are at low frequency in this population. The average proportion of heterozygosity (0.40) is quite favorable for MAS, since it implies that in each generation a significant proportion of QTL will be correctly phased at the level of the fullservice sire. This will allow markerQTL phase information to accumulate rapidly in the population as a whole.
Comparison to other studies in the literature: Because of the large scope and high power of the present study, it is of interest to compare the results of this study to those of other QTL mapping studies of milk production traits in dairy cattle (Table 7). Crossstudy comparisons are complicated by the fact that the different studies used different designs and levels of significance and examined different traits. With respect to levels of significance, we will accept a QTL as “confirmed” in a given region identified by two or more independent studies. With respect to the traits involved, because of the physiological correlation between milk quantity and milk composition and the correlation among the various milk components, a single QTL can be expected to have effects on more than one milk production trait. On this basis, we will consider a markerlinked effect on any of the milk production traits (kilograms milk, kilograms protein, kilograms fat, protein percentage, or fat percentage) as indicating a QTL affecting “milk production.” Comparing across studies, therefore, we propose that when the same or closely linked markers show effects on different milk production traits in different studies, the markers nevertheless be considered as reflecting the same underlying QTL. This is clearly a simplification, and, in practice, different QTL and possibly different alleles at the same QTL may differ in their effects on the various production traits. Indeed, it is the QTL alleles that break the overall correlation pattern among traits that are of greatest interest for purposes of markerassisted selection. It is also possible that milk production QTL are clustered, so that different studies are identifying different QTL located in the same general chromosomal region, but affecting different aspects of milk production.
We review the chromosomal level, without distinction as to marker location within the chromosome. On this basis, multiple studies, reviewed in Table 7 and in Boichard (1999), have reported QTL on 20 chromosomes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 16, 17, 18, 19, 20, 21, 23, and 29. The present study found significant evidence for linkage on all but BTA 17 and 19. On BTA 19 the growth hormone gene was linked previously to milk production traits (Hojet al. 1993; Lagzielet al. 1996; Falakiet al. 1997). Although none of the markers on BTA 19 was significant in the present study, we note that sire 1 had a sirebymarker CWER of P = 0.03 at MAP2C (FDR of 0.17), which is adjacent to bGH. On the other hand, the effect found by Lagziel et al. (1996) was associated with a haplotype that is very rare in the IsraeliHolstein population, and it was probably missing among the seven sires in the present study.
In addition, linkage to QTL was found in the present study for three chromosomes, BTA 12, 26, and 27, for which there is only one report in the literature. Linkage was also found on two chromosomes, BTA 13 and 22, for which results have not been reported in the literature.
In summary, the present study and the literature agree in finding QTL on 21 chromosomes and in not finding QTL on three chromosomes: BTA 15, 24, and 28. There is disagreement with respect to two chromosomes for which QTL were identified in the present study, but not confirmed in the literature, and with respect to two chromosomes for which QTL were confirmed in the literature, but were not found in the present study. We believe that any discrepancy between the results of the present study and those in the literature can best be explained by the strong threeway “chance” element at play in any particular study, namely, whether a QTL segregating in the population as a whole is segregating in the sampled set of sires, whether the sampled segregating QTL is close to a genotyped segregating marker, and whether the segregating QTL/marker pair passes the significance threshold.
The proportion of total genetic variance explained by the uncovered QTL: The present study may have uncovered as many as 90% of the QTL affecting milk protein percentage that are segregating in the IsraeliHolstein population. It is of interest, therefore, to calculate the actual fraction of the total genetic variance that is explained by the uncovered QTL. The fraction of genetic variance for protein percentage that is accounted for by the QTL identified in this study will stand in proportion to
The allele substitution effects derived in this study are in terms of estimated breeding values (EBVs). Because EBVs are regressed toward the mean (depending on the accuracy), allele substitution effects derived from EBVs will underestimate the true substitution effects (Israel and Weller 1998).
Currently there is no accepted procedure to translate allele substitution effects in units of EBVP% to the actual allele substitution effects in terms of trait value for milk protein percentage. Therefore, we propose to obtain an estimate of the fraction of genetic variance in milk protein percentage in the study population that is accounted for by the effects of the identified QTL, by keeping the calculations in terms of the variance of EBVP% (equal to the genetic variance times accuracy squared). To do this we will take allele substitution effects in terms of EBVP%, estimate variance of EBVP%, and calculate the proportion of variance in EBVP% that is explained by the observed allele substitution effects. It seems plausible that whatever the eventual translation from EBVP% to trait value for protein percentage, the relative proportion of explained genetic variance would remain the same.
An estimate of the variance of EBVP% in the study population was obtained as follows: The average difference between the high and low daughters of the various sires was 0.195 EBVP%. On the average, the high and low daughters represented the high and low 0.085 of each tail, so that the difference between the means of the two tails is about 3.6 phenotypic standard deviations. Thus, the standard deviation of EBVP% can be estimated as 0.054 (= 0.195/3.6), and the variance as 0.00292 (= 0.054^{2}). On a withinsire basis, only threefourths of the genetic variation is present. Thus, withinsire variance of EBVP% represents only 0.75 of the total variance of EBVP%, giving an estimate of 0.00389 (= 0.00292/0.75) for the population variance of EBVP%, assuming betweensire variance in EBVP% is 25% of total variance in EBVP%.
Average heterozygosity at the QTL was estimated above as h = 0.40, and the summed squared allele substitution effects came to 0.0080, taking the allele substitution effect associated with the most significant marker in each region. Thus, the estimate of EBVP% variance is 0.00320 (= 0.0080 × 0.4). Since average marker spacing was about 17 cM, assuming random distribution of QTL and markers, the average QTL would be about 4 cM from the nearest marker, so that αvalues are reduced by 8%. The corrected estimate of the variance in EBVP% contributed by these loci is thus 0.00373 [= (1.08)^{2}(0.00320)]. This is more or less equal to the estimated variance of EBVP% in the population (0.00389) and does not include the 10% of QTL that may have been homozygous in all sires tested and, hence, not uncovered. On the other hand, allele substitution effects of significant markers tend to have positive estimation errors and hence may be somewhat overestimated, and significant markers may be a bit more closely linked to QTL than the average marker (since more closely linked QTL have a greater chance to be detected). Because of the relatively high power of the present study, however, we believe that these effects will not be great. Assuming that the latter two effects are equal to the missing 10% of QTL, the QTL uncovered and implied in this study appear to account for the total genetic variance in EBVP% and, by inference, for total genetic variance in protein percentage in the IsraeliHolstein population.
Markerassisted selection: The summed total of effects of the putative QTL uncovered in the present study is equal to 0.460 EBVP%. Because EBVP% are moderately regressed, the actual substitution effects at the individual QTL may be considerably larger than their estimates (Israel and Weller 1998; Lagzielet al. 1999). Thus, 0.460 EBVP% represents a minimum estimate of the total summed allele substitution effects on milk protein percentage for QTL segregating in the IsraeliHolstein population. As noted above, the results of this study and the history of the IsraeliHolstein breed suggest that the average frequencies at the QTL of alleles having a positive effect on milk protein percentage are low. In this case, the summed allele substitution effects on milk protein percentage over all segregating QTL represents the lower limit for genetic increase in milk protein percentage in the population. Thus, existing genetic variation for milk protein percentage in the IsraeliHolstein population should allow an absolute genetic increase of as much as 1% in milk protein percentage. This would represent an increase of onethird above the present mean of 3%. Such an increase would, of course, be accompanied by correlated effects on yearly milk production and other traits, according to the specific pleiotropic effects of the milk protein percentage QTL.
Acknowledgments
This study was supported by the joint biotechnology program of the Ministry of Science and Technology (Israel) and Bundesministerium für Biotechnologie (Germany), by the Israel Science Foundation, the Israel MAGNET program, the Israel Cattle Breeders Association, and the Israel Milk Marketing Board.
Footnotes

Communicating editor: C. Haley
 Received June 9, 2000.
 Accepted January 5, 2001.
 Copyright © 2001 by the Genetics Society of America