Originally published as Genetics Published Articles Ahead of Print on April 28, 2006.
Genetics, Vol. 173, 1761-1776, July 2006, Copyright © 2006
doi:10.1534/genetics.105.049510
Genomic-Assisted Prediction of Genetic Value With Semiparametric Procedures
Daniel Gianola*,1, Rohan L. Fernando and Alessandra Stella
* Department of Animal Sciences, University of Wisconsin, Madison, Wisconsin 53706,
Parco Tecnologico Padano, 26900 Lodi, Italy,
Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway and
Department of Animal Science, Iowa State University, Ames, Iowa 50011
1 Corresponding author: Department of Animal Sciences, 1675 Observatory Dr., Madison, WI 53706.
E-mail: gianola{at}ansci.wisc.edu
ABSTRACT
Semiparametric procedures for prediction of total genetic value for quantitative traits, which make use of phenotypic and genomic data simultaneously, are presented. The methods focus on the treatment of massive information provided by, e.g., single-nucleotide polymorphisms. It is argued that standard parametric methods for quantitative genetic analysis cannot handle the multiplicity of potential interactions arising in models with, e.g., hundreds of thousands of markers, and that most of the assumptions required for an orthogonal decomposition of variance are violated in artificial and natural populations. This makes nonparametric procedures attractive. Kernel regression and reproducing kernel Hilbert spaces regression procedures are embedded into standard mixed-effects linear models, retaining additive genetic effects under multivariate normality for operational reasons. Inferential procedures are presented, and some extensions are suggested. An example is presented, illustrating the potential of the methodology. Implementations can be carried out after modification of standard software developed by animal breeders for likelihood-based or Bayesian analysis.
MASSIVE quantities of genomic data are now available, with potential for enhancing accuracy of prediction of genetic value of, e.g., candidates for selection in animal and plant breeding programs or for molecular classification of disease status in subjects (GOLUB et al. 1999). For instance, WONG et al. (2004) reported a genetic variation map of the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs) and demonstrated how the information can be used for targeting specific genomic regions. Likewise, HAYES et al. (2004) found 2507 putative SNPs in the salmon genome that could be valuable for marker-assisted selection in this species.
The use of molecular markers as aids in genetic selection programs has been discussed extensively. Important early articles are SOLLER and BECKMANN (1982) and FERNANDO and GROSSMAN (1989), with the latter focusing on best linear unbiased prediction of genetic value when marker information is used. Most of the literature on marker-assisted selection deals with the problem of locating one or few quantitative trait loci (QTL) using flanking markers. However, in the light of current knowledge about genomics, the widely used single-QTL search approach is naive, since there is evidence of abundant QTL affecting complex traits, as discussed, e.g., by DEKKERS and HOSPITAL (2002). This would support the infinitesimal model of FISHER (1918) as a sensible statistical specification for many quantitative traits, with complications being the accommodation of nonadditivity and of feedbacks (GIANOLA and SORENSEN 2004). DEKKERS and HOSPITAL (2002) observe that existing statistical methods for marker-assisted selection do not deal well with complexity posed by quantitative traits. Some difficulties are: specification of "statistical significance" thresholds for multiple testing, strong dependence of inferences on model chosen (e.g., number of QTL fitted, distributional forms), inadequate handling of nonadditivity, and ambiguous interpretation of effects in multiple-marker analysis, due to collinearity.
Here, we discuss how large-scale molecular information, such as that conveyed by SNPs, can be employed for marker-assisted prediction of genetic value for quantitative traits in the sense of, e.g., MEUWISSEN et al. (2001), GIANOLA et al. (2003), and XU (2003). The focus is on inference of genetic value, rather than detection of quantitative trait loci. A main challenge is that of positing a functional form relating phenotypes to SNP genotypes (viewed as thousands of possibly highly colinear covariates), to polygenic additive genetic values, and to other nuisance effects, such as sex or age of an individual, simultaneously.
Standard quantitative genetics theory gives a mechanistic basis to the mixed-effects linear model, treated either from classical (SORENSEN and KENNEDY 1983; HENDERSON 1984) or from Bayesian (GIANOLA and FERNANDO 1986) perspectives. MEUWISSEN et al. (2001) and GIANOLA et al. (2003) exploit this connection and suggest highly parametric structures for modeling relationships between phenotypes and effects of hundreds or thousands of molecular markers. A first concern is the strength of their assumptions (e.g., linearity, multivariate normality, proportion of segregating loci, spatial within-chromosome effects); it is unknown if their procedures are robust. Second, colinearity between SNP or marker genotypes is bound to exist, because of the sheer massiveness of molecular data plus cosegregation of alleles. While adverse effects of colinearity can be tempered when marker effects are treated as random variables, statistical redundancy is undesirable (LINDLEY and SMITH 1972).
The genome seems to be much more highly interactive than what standard quantitative genetic models can accommodate (e.g., D'HAESELEER et al. 2000). In theory, genetic variance can be partitioned into orthogonal additive, dominance, additive x additive, additive x dominance, dominance x dominance, etc., components, only under highly idealized conditions. These include linkage equilibrium, absence of natural or artificial selection, and no inbreeding or assortative mating (COCKERHAM 1954; KEMPTHORNE 1954). Arguably, these conditions are violated in nature and in breeding programs. Actually, marker-assisted selection exploits existence of linkage disequilibrium, and even chance creates disequilibrium. Further, estimation of nonadditive components of variance is notoriously difficult, even under standard assumptions (CHANG 1988). Therefore, it is doubtful whether or not standard quantitative genetic approaches can model fine-structure relationships between genotypes and phenotypes adequately, unless either departures from assumptions have mild effects or statistical constructs based on multivariate normality turn out to be more robust than what is expected on theoretical grounds. These considerations suggest that a nonparametric treatment of the data could be valuable.
On the other hand, application of the additive genetic model in selective breeding of livestock has produced remarkable dividends, as shown in DEKKERS and HOSPITAL (2002). Hence, a combination of nonparametric modeling of effects of molecular variables (e.g., SNPs) with features of the additive polygenic mode of inheritance is appealing.
Our objective is to present semiparametric methods for prediction of genetic value for complex traits that make use of phenotypic and genomic data simultaneously. This article is organized as follows. KERNEL REGRESSION ON SNP MARKERS introduces nonparametric regression, kernel functions, and smoothing parameters and proposes a nonparametric approximation to additive genetic value. Next, SEMIPARAMETRIC KERNEL MIXED MODEL combines features of kernel regression with the mixed-effects linear model and describes classical and Bayesian implementations. REPRODUCING KERNEL HILBERT SPACES MIXED MODEL uses established calculus of variations results and hybridizes the mixed-effects linear model with a regression on kernel basis functions. Estimation procedures are presented, the problem of incomplete genotyping is addressed, and a simulated example is given, to illustrate feasibility and potential. The article concludes with a discussion and with suggestions for additional research.
KERNEL REGRESSION ON SNP MARKERS
The regression function:
Consider a stylized situation in which each of a series of individuals possesses a measurement for some quantitative trait denoted as y, as well as information on a possibly massive number of genomic variables, such as SNP "genotypes," represented by a vector x. In the main, x is treated as a continuously valued vector of covariates, even though SNP genotypes are discrete (coding is done via dummy variates). Also, x could represent gene expression measurements from microarray experiments; here, it would be legitimate to regard this vector as continuous. Although gene expression measurements are typically regarded as response variables, there are contexts in which this type of information could be used in an explanatory role (MALLICK et al. 2005).
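Let the relationship between y and x be represented as
$$y_i = g(\mathbf{x}_i) + e_i, \qquad i = 1, 2, \ldots, n, \tag{1}$$
where g(.) is some unknown function of the covariates and $e_i \sim (0, \sigma^2)$ is a random residual, distributed independently of xi and with variance $\sigma^2$.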
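The conditional expectation function is
$$g(\mathbf{x}) = E(y \mid \mathbf{x}) = \frac{\int y\, p(\mathbf{x}, y)\, dy}{p(\mathbf{x})}. \tag{2}$$
A nonparametric kernel estimator of the density $p(\mathbf{x})$ is
$$\hat{p}(\mathbf{x}) = \frac{1}{n h^{p}} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right), \tag{3}$$
where $K(\cdot)$ is a kernel function and h is a window width or smoothing parameter. In (3), x is the value ("focal point") at which the density is evaluated and xi (i = 1, 2,...,n) is the observed p-dimensional SNP genotype of individual i in the sample. Hence, (3) estimates population densities (or frequencies). If
$\hat{p}(\mathbf{x})$ is to behave as a multivariate probability density function, then it must be true that the kernel function is positive and that the condition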
![]() |
![]() |
![]() |
is also a kernel function; again, yi is the observed sample value of variable y in individual i. The numerator of (2) can be estimated as
![]() | (4) |
, so that dy = hdz and
![]() |
and
. If so, the preceding expression is equal to yi, so that (4) becomes
![]() | (5) |
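Returning to (2), one can form the nonparametric estimator
$$\hat{g}(\mathbf{x}) = \frac{\sum_{i=1}^{n} y_i\, K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)}{\sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)} = \sum_{i=1}^{n} w_i(\mathbf{x})\, y_i, \tag{6}$$
with weights
$$w_i(\mathbf{x}) = \frac{K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)}{\sum_{j=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_j}{h}\right)}.$$
The kernel $K((\mathbf{x} - \mathbf{x}_i)/h)$ is typically chosen so that it is largest when x = xi and tails off to 0 as the distance between x and xi increases. The values of wi(x) decrease more abruptly as $h \to 0$. This is the reason why this type of estimator is called "local," in the sense that observations with xi coordinates closer to the focal point x are weighted more strongly in the computation of the fitted value $\hat{g}(\mathbf{x})$.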
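A minimal sketch of a locally weighted estimator of this kind may help fix ideas. It assumes a Gaussian kernel and a small made-up data set; the function names (`gaussian_kernel`, `nadaraya_watson`) and the toy values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel evaluated at each row of u (n x p)."""
    return np.exp(-0.5 * np.sum(u * u, axis=1))

def nadaraya_watson(x_focal, X, y, h):
    """Fitted value at x_focal as a kernel-weighted average of y."""
    k = gaussian_kernel((x_focal - X) / h)  # K((x - x_i) / h) for each i
    w = k / k.sum()                         # weights w_i(x), summing to 1
    return float(w @ y), w

# Hypothetical toy data: 5 individuals, 2 covariates
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 2.]])
y = np.array([1.0, 2.0, 2.0, 3.0, 5.0])

# Small h: the fit at x_1 is dominated by y_1 (very local)
fit_small_h, w_small = nadaraya_watson(X[0], X, y, h=0.1)
# Large h: all points get nearly equal weight (close to the overall mean)
fit_large_h, w_large = nadaraya_watson(X[0], X, y, h=100.0)
```

With a small window width the fitted value at an observed point essentially reproduces that observation, whereas a very large window width pulls the fit toward the sample mean.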
A specification that is more restrictive than (1) is the additive regression model
![]() | (7) |
![]() |
Impact of window width:
The scatter plot shown in Figure 1, from CHU and MARRON (1991), consists of data on log-income and age of 205 people. The solid line in Figure 1A is a moving weighted average of the points, with weights proportional to the curve at the bottom. CHU and MARRON (1991) regard the dip in average income in the middle ages as "unexpected." Figure 1B gives results from three different smooths at window widths of 1, 3, and 9. When h = 1, the curve displays considerable sampling variation or roughness. On the other hand, when h = 9, local features disappear, because points that are far apart receive considerable weight in the fitting procedure. If the dip is not an artifact, the oversmoothing results in a "bias." Hence, h must be gauged carefully.
MARRON (1988) and CHU and MARRON (1991) discuss data-driven procedures for assessing h. SILVERMAN (1986) gives a discussion in the context of density estimation, whereas MALLICK et al. (2005) consider Hilbert spaces kernel regression, with h treated as an unknown parameter. A conceptually simple and intuitively appealing procedure is cross-validation (e.g., SCHUCANY 2004). For instance, in the "leave-one-out" procedure, the datum for case i, that is (yi, xi), is deleted, and a fit is carried out on the basis of the other n − 1 cases. Then, the prediction
$\hat{y}_{-i}$ of yi is formed, where the notation −i indicates that all data other than that for case i are used for estimating the regression function. This process is repeated for all n data points. Subsequently, the cross-validation criterion
![]() |
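The leave-one-out procedure can be sketched as follows. This is an illustration under stated assumptions: a Gaussian kernel, a sum of squared prediction errors as the criterion, simulated data, and an arbitrary grid of candidate window widths; none of these choices is taken from the original text.

```python
import numpy as np

def loo_cv_score(X, y, h):
    """Sum of squared leave-one-out prediction errors for window width h."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    K = np.exp(-0.5 * d2 / h**2)        # Gaussian kernel values
    np.fill_diagonal(K, 0.0)            # delete case i when predicting y_i
    pred = (K @ y) / K.sum(axis=1)      # prediction from the other n - 1 cases
    return float(np.sum((y - pred) ** 2))

# Simulated data (hypothetical): smooth signal plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

grid = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]      # candidate window widths
scores = {h: loo_cv_score(X, y, h) for h in grid}
h_best = min(scores, key=scores.get)             # width with the smallest criterion
```

Setting the diagonal of the kernel matrix to zero is what removes case i from its own prediction, so all n leave-one-out fits are obtained from a single matrix product.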
D. GULISIJA, D. GIANOLA and K. A. WEIGEL (unpublished results) used another nonparametric procedure, LOESS, to study the relationship between performance and inbreeding in Jersey cows. There is some relationship between LOESS and kernel regression. In LOESS, the number of points contributing to a focal fitted value is fixed (contrary to kernel regression, where the actual number depends on the gentleness of the kernel chosen) and governed by a spanning parameter. This parameter (ranging between 0 and 1) plays a role equivalent to that of h and dictates the fraction of all data points that contribute toward a fitted value. Figure 2 gives a LOESS fit for protein yield (actually, residuals from a parametric model) and 100 bootstrap replicates, illustrating uncertainty about the regression surface. Without getting into details, note that yield decreases gently at low values of inbreeding, followed by a faster linear decline, and then by an apparent increase. Irrespective of the variability (which arises because few animals were either noninbred or highly inbred), neither the change of rate at low inbreeding nor the increase in yield at high consanguinity would be predicted by standard quantitative genetics theory. This is another illustration of how "irregularities" that would otherwise remain hidden can be discovered nonparametrically.
Estimation of linear approximation:
If the kernel function is a probability density function, the nonparametric density estimator (3) will be a density function as well, retaining differentiability properties of the kernel (SILVERMAN 1986). Consider E (y | x) = g(x) and suppose that one is interested in inferring a linear approximation to g(x) near some fixed point x* such as the mean value of the covariates; this leads to a nonparametric counterpart of additive genetic value. From a plant and animal breeding point of view, the concept of breeding value is essential in parametric models, so it seems important to develop a nonparametric counterpart as well. The linear function is
![]() | (8) |
![]() |
![]() | (9) |
![]() | (10) |
A nonparametric estimator of (8) is given by
![]() | (11) |
![]() | (12) |
is an estimate of the vector of first derivatives of g(x) with respect to x, evaluated at x*. The estimate could be the gradient of (6),
![]() | (13) |
![]() |
![]() |
![]() | (14) |
, where
is some estimate of the covariance matrix of x. Finally, the relative contribution to variance made by the linear effect of x(
2) can be assessed as
![]() | (15) |
is an estimate of
2, such as
![]() | (16) |
Gaussian kernel:
A p-dimensional Gaussian kernel with a single band width parameter h has the form
![]() |
![]() |
![]() |
The weights entering into estimator (6) are then
![]() |
![]() | (17) |
![]() |
Consider next the linear approximation in (14). Using (17), write
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Kernels for discrete covariates:
For a biallelic SNP, there are three possible genotypes at each "locus," as in stylized Mendelian situations. In a standard (parametric) analysis of variance representation, incidence situations (or additive and dominance effects at each of the loci) are described via two dummy binary variables per locus, and all corresponding epistatic interactions can be assessed from effects of cross-products of these variables. This leads to a highly parameterized structure and to formidable model selection problems.
Consider now the nonparametric approach. For an x vector with p coordinates, its statistical distribution is given by the probabilities of each of the $3^p$ combinations of binary outcomes. With SNPs, p can be very large (possibly much larger than n), so it is hopeless to estimate the probability distribution of genotypes accurately from observed relative frequencies, and smoothing is required (SILVERMAN 1986). Kernel estimation extends as follows: for binary covariates the number of disagreements between a focal x and the observed xi in subject i is given by
![]() |
|
where N (Y) stands for disagreement (agreement). Then, d(x, xi) = 4, which is twice the number of disagreements in genotypes because there are only 2 "d.f." per locus. In practice, one should work with a representation of incidences that is free of redundancies. For binary covariates, SILVERMAN (1986) suggests the "binomial" kernel
![]() |
; alternative forms of the kernel function are discussed by AITCHISON and AITKEN (1976) and RACINE and LI (2004). It follows that the kernel estimate of the probability of observing the focal value x is
![]() | (18) |
, every focal point gets an estimate equal to
, irrespective of the observed values x1, x2,..., xn.
The nonparametric estimator of the regression function is
![]() |
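A small sketch of kernel smoothing for binary covariates follows. It assumes that the binomial kernel takes the form $h^{p-d}(1-h)^{d}$, with d the number of disagreements between x and xi and $1/2 \le h \le 1$; this form, the function names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def binomial_kernel(x_focal, X, h):
    """Assumed binomial kernel h^(p - d) * (1 - h)^d for each observed x_i."""
    p = X.shape[1]
    d = np.sum(X != x_focal, axis=1)      # disagreements between x and x_i
    return h ** (p - d) * (1.0 - h) ** d

def prob_estimate(x_focal, X, h):
    """Kernel estimate of the probability of observing x_focal."""
    return float(np.mean(binomial_kernel(x_focal, X, h)))

def regression_estimate(x_focal, X, y, h):
    """Kernel-weighted average of y at x_focal."""
    k = binomial_kernel(x_focal, X, h)
    return float(k @ y / k.sum())

# Hypothetical data: 4 subjects, 3 binary covariates
X = np.array([[0, 0, 1], [0, 0, 1], [1, 0, 1], [1, 1, 0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
x = np.array([0, 0, 1])

p_half = prob_estimate(x, X, 0.5)            # (1/2)^p, regardless of the data
p_one = prob_estimate(x, X, 1.0)             # relative frequency of x in the sample
g_one = regression_estimate(x, X, y, 1.0)    # mean of y among exact matches
```

Under this form, h = 1/2 ignores the data entirely, while h = 1 applies no smoothing and returns observed relative frequencies; intermediate values borrow information from genotypes that differ at only a few coordinates.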
Since a discrete distribution does not possess derivatives, the additive genetic value must be defined in the classical sense (e.g., FALCONER 1960), that is, by defining contrasts between expected values of individuals having appropriate genotypes. Here, one can use either the vectorial representation
or the concept of the additive regression model in (7). Hereinafter, it is assumed that the distribution of x is continuous, and a continuous kernel function is employed throughout, as an approximation.
SEMIPARAMETRIC KERNEL MIXED MODEL
General considerations:
Consider now a situation for which there might be an operational or mechanistic basis for specifying at least part of a model. For instance, suppose that y is a measure on some quantitative trait, such as milk production of a cow. Animal breeders have exploited to advantage the infinitesimal model of quantitative genetics (FISHER 1918). Vectorial representations of this model are given by QUAAS and POLLAK (1980), and applications to natural populations are discussed by KRUUK (2004). In this section, we combine features of the infinitesimal model with a nonparametric treatment of genomic data and present semiparametric implementations.
Statistical specification:
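Model (1) is expanded as
$$y_i = \mathbf{w}_i'\boldsymbol{\beta} + \mathbf{z}_i'\mathbf{u} + g(\mathbf{x}_i) + e_i, \qquad i = 1, 2, \ldots, n, \tag{19}$$
where $\boldsymbol{\beta}$ is a vector of nuisance location effects and u is a q x 1 vector containing additive genetic effects of q individuals (these effects are assumed to be independent of those of the markers), some of which may lack a phenotypic record; w'i and z'i are known nonstochastic incidence vectors. As before, g(xi) is some unknown function of the SNP data. It is assumed that
$\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$, where
$\sigma_u^2$ is the "unmarked" additive genetic variance and A is the additive relationship matrix, whose entries are twice the coefficients of coancestry between individuals. Let
$\mathbf{e}$ be the n x 1 vector of residuals, and assume that
$\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$, where
$\sigma_e^2$ is the residual variance. Note that the model implies that
$$y_i - g(\mathbf{x}_i) = \mathbf{w}_i'\boldsymbol{\beta} + \mathbf{z}_i'\mathbf{u} + e_i \tag{20}$$
and
$$y_i - \mathbf{w}_i'\boldsymbol{\beta} - \mathbf{z}_i'\mathbf{u} = g(\mathbf{x}_i) + e_i. \tag{21}$$
If $\boldsymbol{\beta}$ and u were known, one could use (6) to estimate g(xi) employing $y_i - \mathbf{w}_i'\boldsymbol{\beta} - \mathbf{z}_i'\mathbf{u}$ as "observations." This suggests two strategies for analysis of data, discussed below.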
Strategy 1 (mixed model analysis):
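This follows from representation (20). First, estimate g(xi), for i = 1, 2, ... , n, via
$\hat{g}(\mathbf{x}_i)$, as in (6) or, if a Gaussian kernel is adopted, as in (17). Then, carry out a mixed model analysis using the "corrected" data vector and pseudomodel
$$\mathbf{y}^* = \mathbf{y} - \hat{\mathbf{g}}(\mathbf{x}) = \mathbf{W}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},$$
where $\mathbf{W}$ and $\mathbf{Z}$ are incidence matrices of appropriate order. The pseudomodel ignores uncertainty about
$g(\mathbf{x})$, since
$\hat{g}(\mathbf{x})$ is treated as if it were the true regression (on SNPs) surface.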
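Under the standard multivariate normality assumptions of the infinitesimal model, one can estimate the variance components
$\sigma_u^2$ and
$\sigma_e^2$ from y* via restricted maximum likelihood (REML) (PATTERSON and THOMPSON 1971) and form empirical best linear unbiased estimators and predictors of $\boldsymbol{\beta}$ and u, respectively, by solving the Henderson mixed model equations
$$\begin{bmatrix} \mathbf{W}'\mathbf{W} & \mathbf{W}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{W} & \mathbf{Z}'\mathbf{Z} + \mathbf{A}^{-1}\dfrac{\sigma_e^2}{\sigma_u^2} \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} \end{bmatrix} = \begin{bmatrix} \mathbf{W}'\mathbf{y}^* \\ \mathbf{Z}'\mathbf{y}^* \end{bmatrix}, \tag{22}$$
where the variance ratio $\sigma_e^2/\sigma_u^2$ is evaluated at REML estimates of the variance components. Solving system (22) is a standard problem in animal breeding even for very large q, since $\mathbf{A}^{-1}$ is easy to compute. The two-stage procedure could be iterated several times, i.e., use the solutions to (22) to obtain a new estimate of
$g(\mathbf{x}_i)$ using
$y_i - \mathbf{w}_i'\hat{\boldsymbol{\beta}} - \mathbf{z}_i'\hat{\mathbf{u}}$ as "data," and then update the pseudodata y*, etc.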
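As an illustration of the mixed model step, the sketch below builds and solves mixed model equations of the standard Henderson form for a toy data set. The helper name `solve_mme`, the pedigree (two unrelated parents and two full sibs), the pseudodata values, and the variance components are all hypothetical; in practice the variances would come from REML and the inverse relationship matrix would be built directly from the pedigree rather than by inverting A.

```python
import numpy as np

def solve_mme(W, Z, A_inv, y_star, var_e, var_u):
    """Solve Henderson's mixed model equations for (beta_hat, u_hat)."""
    lam = var_e / var_u                                   # variance ratio
    C = np.block([[W.T @ W, W.T @ Z],
                  [Z.T @ W, Z.T @ Z + A_inv * lam]])      # coefficient matrix
    rhs = np.concatenate([W.T @ y_star, Z.T @ y_star])    # right-hand side
    sol = np.linalg.solve(C, rhs)
    nb = W.shape[1]
    return sol[:nb], sol[nb:]

# Hypothetical example: 6 corrected records on 4 animals, intercept only
W = np.ones((6, 1))
Z = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)
A = np.array([[1.0, 0.0, 0.5, 0.5],     # animals 1, 2: unrelated parents
              [0.0, 1.0, 0.5, 0.5],     # animals 3, 4: their full-sib offspring
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
y_star = np.array([10.0, 12.0, 11.0, 13.0, 9.0, 14.0])   # pseudodata y - g_hat

beta_hat, u_hat = solve_mme(W, Z, np.linalg.inv(A), y_star, var_e=2.0, var_u=1.0)
```

Iterating the two-stage procedure would amount to recomputing the kernel fit on the residuals from this solution and calling the solver again with the updated pseudodata.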
The "total" additive genetic effect of an individual possessing a vector of SNP covariates with focal value x can be defined as the sum of the additive genetic effect of the SNPs, in the sense of (8), plus the polygenic effect ui, that is,
![]() |
in the mixed model equations. It is not obvious how a measure of uncertainty about
can be constructed using this procedure.
A Bayesian approach can be used instead, using the corrected data
as observations. Under standard assumptions made for the prior and the likelihood (e.g., WANG et al. 1993, 1994; SORENSEN and GIANOLA 2002), one can draw samples j = 1, 2, ... , m from the pseudoposterior distribution
via Gibbs sampling and then form "semiparametric" draws of the total genetic value as
![]() |
can be construed as draws from a semiparametric pseudoposterior distribution; one can readily calculate the means, median, percentiles, and variance of this distribution and produce posterior summaries, leading to an approximation to uncertainty about total genetic value. Irrespective of whether classical or Bayesian viewpoints are adopted, this approach ignores the error of estimation of g(x), as noted earlier.
Strategy 2 (random g(.) function):
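The estimate of
$g(\mathbf{x}_i)$ can be improved as follows. Consider (21) and suppose, temporarily, that
$\boldsymbol{\beta}$ and u are known. Then, form the offset
$y_i - \mathbf{w}_i'\boldsymbol{\beta} - \mathbf{z}_i'\mathbf{u}$ and the alternative Nadaraya-Watson estimator of g(xi):
$$\hat{g}(\mathbf{x}_i) = \frac{\sum_{j=1}^{n} K\!\left(\frac{\mathbf{x}_i - \mathbf{x}_j}{h}\right)\left(y_j - \mathbf{w}_j'\boldsymbol{\beta} - \mathbf{z}_j'\mathbf{u}\right)}{\sum_{j=1}^{n} K\!\left(\frac{\mathbf{x}_i - \mathbf{x}_j}{h}\right)}. \tag{23}$$
In practice, $\boldsymbol{\beta}$ and u are not observable, so they must be inferred somehow.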
Regard now
as unknown quantities possessing some prior distribution, which is modified via Bayesian learning into the pseudoposterior process
, after observation of the pseudodata y* and of the n x p matrix of observed SNP covariates X. Then, for some band width h,
would also be a random variable possessing some pseudoposterior distribution. Let
be a draw from the pseudoposterior process
, as in the preceding section. Further, regard
as a draw from the pseudoposterior distribution of
, given y*. Subsequently, estimate features of the pseudoposterior distribution of
by ergodic averaging. For example, an estimate of its posterior expectation would be
![]() |
are obtained via a Markov chain Monte Carlo (MCMC) procedure from the pseudoposterior distribution
. The MCMC procedure would consist of sequential, iterative, sampling from all conditional pseudoposterior distributions, as follows:
- Sample
from
, where ELSE denotes all other parameters, y*, X, and h. Using standard results, this conditional distribution has the form
where
and

- Sample u from
. Using similar standard results, the distribution to sample from is
where
and

- Sample the two variance components from the scaled inverse chi-square distributions
and
Above,
u,
and
e,
are known hyperparameters of independent scaled inverse chi-square priors assigned to
and
, respectively.
- Form draws from the pseudoposterior distribution of
as 
As usual, early draws in the MCMC procedure are discarded as burn-in and, subsequently, m samples are collected to infer features of interest. Note that the procedure termed as strategy 1 gives
as an estimate of the relationship between phenotypes and SNP data. In strategy 2, one can use
instead, where
and
are means of the pseudoposterior distributions
and
, respectively. Under strategy 2, a point predictor of total additive genetic value could be
![]() |
is the posterior mean of ui.
REPRODUCING KERNEL HILBERT SPACES MIXED MODEL
General:
What follows is motivated by developments in MALLICK et al. (2005) for classification of tumors using microarray data. The underlying theory is outside the scope of this article. Only essentials are given here, and foundations are in WAHBA (1990, 1999).
Using the structure of (19), consider the penalized sum of squares
![]() | (24) |
is some norm or "stabilizer." For instance, in smoothing splines,
is a function of the second derivatives of g(x) integrated between end points that compose the data. The second term in (24) acts as a penalty: if the unknown function g(x) is rough, in the sense of having slopes that change rapidly, the penalty increases. The main problem here is that of finding the function g(x) that minimizes (24). Since
is a functional on g(x), this is a variational or calculus of variations problem over a space of smooth curves. The solution was given by KIMELDORF and WAHBA (1971) and WAHBA (1999), and the minimizer admits the representation
![]() |
is called a reproducing kernel. A possible choice for the kernel (MALLICK et al. 2005) is the single smoothing parameter Gaussian function
![]() |
Mixed model representation:
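We embed these results into (19), leading to the specification
$$y_i = \mathbf{w}_i'\boldsymbol{\beta} + \mathbf{z}_i'\mathbf{u} + \sum_{j=1}^{n} K(\mathbf{x}_i, \mathbf{x}_j)\,\alpha_j + e_i, \tag{25}$$
where $K(\cdot,\cdot)$ is the reproducing kernel evaluated at smoothing parameter h, with $\alpha_0$ included as part of
$\boldsymbol{\beta}$. Note that there are as many regressions
$\alpha_j$ as there are data points. However, the roughness penalty in the variational problem leads to a reduction in the effective number of parameters in reproducing kernel Hilbert spaces (RKHS) regression, as it occurs in smoothing splines (FOX 2002).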
Define the 1 x n row vector
![]() |
, j = 1, 2,..., n; and the n x n matrix
![]() |
![]() |
Suppose, further, that the
j coefficients are exchangeable according to the distribution
. Hence, for a given smoothing parameter h, we are in the setting of a mixed-effects linear model.
Given h,
,
, and
(at a given h, the three variance components may be estimated by, e.g., REML) one can obtain predictions of the polygenic breeding values u and of the coefficients
from the solutions to the system
![]() | (26) |
![]() | (27) |
![]() |
j effects, its empirical best linear unbiased predictor, assuming known h, can be obtained by replacing these effects by the corresponding solutions from (26).
Incomplete genotyping:
At least in animal breeding, it is not feasible to have all individuals genotyped for SNPs. On the other hand, the number of animals with phenotypic information available is typically in the order of hundreds of thousands, and genotyping is selective, e.g., young bulls that are candidates for progeny testing in dairy cattle production. Animals lacking molecular data are not a random sample from the population, and ignoring this issue may lead to biased inferences. Unless missingness of molecular data is ignorable, in the sense of, e.g., HENDERSON (1975), RUBIN (1976), GIANOLA and FERNANDO (1986), or IM et al. (1989), the procedures given below require modeling of the missing data process, which is difficult and may lack robustness. Here, it is assumed that missingness is ignorable, enabling use of likelihood-based or Bayesian procedures as if selection had not taken place (SORENSEN et al. 2001). Two ad hoc procedures are discussed, and an alternative approach, suitable for kernel regression, is presented in the CONCLUSION.
Let the vector of phenotypic data be partitioned as
, where y1 (n1 x 1) consists of records of individuals lacking SNP data, whereas y2 (n2 x 1) includes phenotypic data of genotyped individuals. Often, it will be the case that n1 > p >> n2. We adopt the model
![]() | (28) |
and
are mutually independent but heteroscedastic vectors. In short, the key assumption made here is that the random effect
affects y2 but not y1 or, equivalently, that it gets absorbed into e1. With this representation, the mixed model equations take the form
![]() | (29) |
,
, and
are treated as known, then
is an unbiased estimator of
, and
and
are unbiased predictors of u and
, respectively. They are not "best," in the sense of having minimum variance or minimum prediction error variance, because the smooth function
of the SNP markers is missing in the model for individuals that are not genotyped (HENDERSON 1974).
An alternative consists of writing the bivariate model
![]() |
![]() |
and
are additive genetic variances in individuals without and with molecular information, respectively, and
is their additive genetic covariance. Computations would be those appropriate for a two-trait linear model analysis (HENDERSON 1984; SORENSEN and GIANOLA 2002).
Bayesian analysis:
To illustrate, consider the first of the two options in the preceding section. Suppose a kernel has been chosen but that the value of h is uncertain, so that the model unknowns are
![]() |
![]() | (30) |
indicates a multivariate normal distribution with appropriate mean vector and covariance matrix. The four variance components
are assigned independent scaled inverse chi-square prior distributions with degrees of freedom
and scale parameters S2, with appropriate subscripts. Assign an improper prior distribution to each of the elements of
and, as in MALLICK et al. (2005), adopt a uniform prior for h, with lower and upper boundaries hmin and hmax, respectively.
Given the parameters, observations are assumed to be conditionally independent, and the distribution adopted for the sampling model is
![]() |
Given h, one is again in the setting of the Bayesian analysis of a mixed linear model, and Markov chain Monte Carlo procedures for this situation are well known. Under standard conjugate prior parameterizations, all conditional posterior distributions are known, save for that of h. Hence, one can construct a Gibbs-Metropolis sampling scheme in which conditional distributions are used for drawing
and a Metropolis update is employed for h.
Location effects
are drawn from a multivariate normal distribution with mean vector and covariance matrix
![]() | (31) |
![]() | (32) |
![]() | (33) |
![]() | (34) |
is multivariate normal as well, with mean vector
![]() | (35) |
![]() | (36) |
All four variance components have scaled inverse chi-square conditional posterior distributions and are conditionally independent. The conditional posterior distributions to sample from are
![]() | (37) |
![]() | (38) |
![]() | (39) |
![]() | (40) |
The most difficult parameter to sample is h. Its conditional posterior density can be represented as
![]() | (41) |
![]() |
. Suppose that the Markov chain is at state h[t]. A proposal value h* is drawn from some symmetric candidate-generating distribution and accepted with probability
![]() |
,
![]() |
is assessed such that a reasonable acceptance rate for the Metropolis-Hastings algorithm is attained.
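The Metropolis update for h can be sketched generically as below. The sketch assumes a symmetric normal random-walk proposal and a uniform prior on [hmin, hmax]; the function `log_target` is a hypothetical stand-in for the logarithm of the conditional posterior density of h (it is not the density in (41)), and the step size, bounds, and chain length are arbitrary illustrative values.

```python
import numpy as np

def metropolis_step_h(h_curr, log_cond_density, h_min, h_max, step, rng):
    """One random-walk Metropolis update for the smoothing parameter h."""
    h_prop = h_curr + rng.normal(scale=step)       # symmetric proposal
    if not (h_min <= h_prop <= h_max):
        return h_curr, False                       # outside uniform prior: reject
    log_ratio = log_cond_density(h_prop) - log_cond_density(h_curr)
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return h_prop, True                        # accept the proposal
    return h_curr, False                           # stay at the current state

def log_target(h):
    """Hypothetical stand-in for the log conditional posterior density of h."""
    return -0.5 * ((h - 2.0) / 0.3) ** 2

rng = np.random.default_rng(42)
h, chain, n_acc = 5.0, [], 0
for _ in range(5000):
    h, accepted = metropolis_step_h(h, log_target, 0.1, 10.0, 0.5, rng)
    chain.append(h)
    n_acc += accepted
chain = np.array(chain)
acc_rate = n_acc / len(chain)    # tune `step` until this rate is reasonable
```

Because the proposal is symmetric, the acceptance probability involves only the ratio of conditional densities; the proposal scale (`step` here) is the quantity that would be tuned to reach a reasonable acceptance rate.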
Illustrative example:
Phenotypic and genotypic values were simulated (single replication) for a sample of N unrelated individuals, for each of two situations. The trait was determined either by five biallelic QTL, having additive gene action, or by five pairs of biallelic QTL, having additive-by-additive gene action. Under nonadditive gene action, the additive genetic variance was null, so that all genetic variance was of the additive-by-additive type. Heritability (ratio between genetic and phenotypic variance) in both settings was set to 0.5. Genotypes were simulated for a total of 100 biallelic markers, including the "true" QTL; all loci were simulated to be in gametic-phase equilibrium. Since all individuals were unrelated and all loci were in gametic-phase equilibrium, only the QTL genotypes and the trait phenotypes would provide information on the genotypic value. In most real applications, the location of the QTL will not be known, and so many loci that are not QTL will be included in the analysis.
A RKHS mixed model was used to predict genotypic values, given phenotypic values and genotypes at the QTL and at all other loci. The model included a fixed intercept
0 and a random RKHS regression coefficient
i for each subject; additive effects were omitted from the model, as individuals were unrelated, precluding separation of additive effects from residuals, in the absence of knowledge of variance components. The genetic value gi of a subject was predicted as
![]() |
and
were obtained by solving (26) for this model, using a Gaussian kernel at varying functions of h. The mean squared error of prediction of genetic value (MSEP) was calculated as
![]() | (42) |
that minimized (42). To evaluate the performance of
as a predictor, another sample of 1000 individuals ("PRED") was simulated, including genotypes, genotypic values, and phenotypes. This was deemed preferable to doing prediction in the training sample, to reduce dependence between performance and (h,
), whose values were assessed in the training sample. The genotypic values of the subjects in PRED were predicted, given their genotypes, using
. Genotypic values were also predicted using a multiple linear regression (MLR) mixed model with a fixed intercept and random regression coefficients on the linear effects of genotypes. Results for the RKHS mixed model are in Table 1, and those for the MLR mixed model are in Table 2, where "accuracy" is the correlation between true and predicted genetic values. When gene action was strictly additive, the two methods (each fitting k = 100 loci) had the same accuracy, indicating that RKHS performed well even when the parametric assumptions were valid. On the other hand, when inheritance was purely additive by additive, the parametric MLR was clearly outperformed by RKHS, irrespective of the number of loci fitted. An exception occurred at k = 100 and N = 1000; here, the two methods were completely inaccurate. However, when N was increased from 1000 to 5000, the accuracy of RKHS jumped to 0.85 (Table 1), whereas MLR remained inaccurate (Table 2). Note that the accuracy of RKHS decreased (with N held at 1000) when k increased. We attribute this behavior to the use of a Gaussian kernel when, in fact, covariates are discrete.
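The comparison described above can be mimicked with a small, self-contained sketch. It is not the analysis reported in Tables 1 and 2: it uses generic kernel ridge regression in place of system (26), fixes the variance ratio `lam` at 1, fits only k = 10 loci (the five QTL pairs), and assumes the Gaussian kernel is parameterized as exp(-||x - x'||^2 / h). A linear kernel stands in for the MLR model with random regression coefficients. As in the text, h is chosen on a grid by minimizing the mean squared error of prediction of genetic value in the training sample, and accuracy is the correlation between true and predicted genetic values in a separate sample.

```python
import numpy as np

def simulate(rng, n, k=10):
    """Additive-by-additive trait: five QTL pairs among k >= 10 loci, h2 = 0.5."""
    x = rng.binomial(2, 0.5, size=(n, k)).astype(float) - 1.0
    g = sum(x[:, 2 * j] * x[:, 2 * j + 1] for j in range(5))
    y = g + rng.normal(0.0, g.std(), size=n)
    return x, g, y

def gaussian_kernel(xa, xb, h):
    """K[i, j] = exp(-||xa_i - xb_j||^2 / h) (assumed parameterization)."""
    sq = (xa ** 2).sum(1)[:, None] + (xb ** 2).sum(1)[None, :] - 2.0 * xa @ xb.T
    return np.exp(-np.maximum(sq, 0.0) / h)

def kernel_ridge(k_train, y, lam):
    """Intercept as the phenotypic mean; coefficients from (K + lam I) a = y - mu."""
    mu = y.mean()
    alpha = np.linalg.solve(k_train + lam * np.eye(y.size), y - mu)
    return mu, alpha

rng = np.random.default_rng(11)
x_tr, g_tr, y_tr = simulate(rng, 1000)   # training sample
x_pr, g_pr, y_pr = simulate(rng, 1000)   # separate sample ("PRED")
lam = 1.0                                # assumed variance ratio

# Choose h by minimizing MSEP of genetic value in the training sample
best = None
for h in (2.0, 10.0, 50.0):
    k_tr = gaussian_kernel(x_tr, x_tr, h)
    mu, alpha = kernel_ridge(k_tr, y_tr, lam)
    msep = np.mean((g_tr - (mu + k_tr @ alpha)) ** 2)
    if best is None or msep < best[0]:
        best = (msep, h, mu, alpha)
_, h_hat, mu, alpha = best
g_hat_rkhs = mu + gaussian_kernel(x_pr, x_tr, h_hat) @ alpha

# Linear kernel: equivalent to ridge regression on the genotype codes
mu_l, alpha_l = kernel_ridge(x_tr @ x_tr.T, y_tr, lam)
g_hat_lin = mu_l + (x_pr @ x_tr.T) @ alpha_l

acc_rkhs = np.corrcoef(g_pr, g_hat_rkhs)[0, 1]
acc_lin = np.corrcoef(g_pr, g_hat_lin)[0, 1]
```

Under purely additive-by-additive gene action, the linear predictor has no signal to exploit, whereas the Gaussian kernel can represent products of genotype codes without those products being specified explicitly.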
CONCLUSION
Except for the parametric part of the model, that is, the standard normality assumption for additive genetic values and model residuals, the procedures attempt to circumvent potential difficulties posed by violation of assumptions required for an orthogonal decomposition of genetic variance stemming from SNP genotypes (COCKERHAM 1954; KEMPTHORNE 1954). Our expectation is that the nonparametric function of marker genotypes,
, captures all possible forms of interaction, but without explicit modeling. The procedures should be particularly useful for highly dimensional regression, including the situation in which the number of SNP variables (p) exceeds amply the number of data points (n). Instead of performing a selection of a few "significant" markers on the basis of some ad hoc method, information on all molecular polymorphisms is employed, irrespective of the degree of colinearity. This is because the procedures should be insensitive to difficulties caused by colinearity, given the forms of the estimators, e.g., (6) or (23). It is assumed that the assignment of genotypes to individuals is unambiguous, i.e., x gives the genotypes for SNP markers free of error.
Our methods share the spirit of those of MEUWISSEN et al. (2001), GIANOLA et al. (2003), XU (2003), YI et al. (2003), TER BRAAK et al. (2005), WANG et al. (2005), and ZHANG and XU (2005), but without making strong assumptions about the form of the marker-phenotype relationship, which is assumed linear by all these authors, and without invoking parametric distributions for pertinent effects.
A hypothetical example was presented, illustrating potential and computational feasibility of at least the RKHS procedure; a standard additive genetic model was outperformed by RKHS when additive-by-additive gene action was simulated. Comparisons between parametric and nonparametric procedures are needed. It is unlikely that computer simulations would shed much light in this respect. First, a huge number of simulation scenarios can be envisaged, resulting from limitless combinations of parameter values, numbers of markers, marker effects, residual distributions, etc. Second, simulations tend to have local value only, that is, conclusions are tentative only within the limited experimental region explored and are heavily dependent on the state of nature assumed. Third, end points and gauges of a simulation tend to be arbitrary. For instance, should frequentist procedures be assessed on Bayesian grounds and vice versa? We believe that studies based on predictive cross-validation for a range of traits and species are perhaps more fruitful. These studies will be conducted once adequate and reliable phenotypic-SNP data sets become more widely available.
There are at least two difficulties with the proposed methodology. As noted above, it is assumed that the SNP genotypes are constructed without error, which is seldom the case. To solve this problem, one would need to build an error-in-variables model, but at the expense of introducing additional assumptions. A second difficulty is that posed by the fact that many individuals will lack SNP information, at least in animal breeding. Earlier, we presented approximate procedures on the basis of the assumption that missingness of SNP data is ignorable, such that the effect of
can be absorbed into a residual that has a different variance from that peculiar to individuals possessing SNP data or into the additive genetic value in a two-trait implementation. A more appropriate treatment of missing data requires imputation of genotypes for individuals lacking SNP information. If the SNP data are missing completely at random or just at random, the solutions to system (29), after augmentation with the missing
, would give the means of the posterior distributions of
, and
, conditionally on the variance components and on h (GIANOLA and FERNANDO 1986; SORENSEN et al. 2001). However, the sampling procedure should address the constraint that SNP genotypes of related individuals must be more alike than those of unrelated subjects. In our treatment, and for operational reasons, we adopted the simplifying assumption that SNP genotypes are independently distributed. This may be anticonservative and could lead to some bias if SNP data are not missing completely at random.
It is unknown to what extent our procedures are robust with respect to the choice of kernel function. SILVERMAN (1986) discusses several options and, in the context of univariate density estimation, concludes that different kernels differ little in mean squared error. Also, an inadequate specification of the smoothing parameter h may affect inference adversely. In this respect, the procedures discussed in REPRODUCING KERNEL HILBERT SPACES MIXED MODEL provide for automatic specification of the bandwidth parameter. Here, one could either make an analysis conditionally on the "most likely" value of h or, alternatively, average over its posterior distribution, to take uncertainty into account fully. We have focused on using a continuous kernel, primarily to exploit differentiability properties. However, the vector of markers, x, has a discrete distribution, and careful investigation is needed to assess the adequacy of such an approximation.
A procedure to accommodate missing genotypes in kernel regression is as follows. Replace g(xi) in (19) by
![]() | (43) |
is the conditional expectation of xi given all the observed genotypes in the pedigree, and
. The conditional expectation in (43) is a function of the conditional probabilities of the missing genotypes given the observed genotypes. Much research has been devoted to computing such probabilities from complex pedigrees using either approximations (VAN ARENDONK et al. 1989; FERNANDO et al. 1993; KERR and KINGHORN 1996) or MCMC samplers (SHEEHAN and THOMAS 1993; JENSEN et al. 1995; JENSEN and KONG 1999; FERNÁNDEZ et al. 2002; STRICKER et al. 2002).
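To make the role of the genotype probabilities concrete, the sketch below computes the conditional expectation of the genotype codes for one individual from a table of genotype probabilities at each locus. The coding of genotypes as allele counts (0, 1, 2), the function name, and the probabilities are assumptions for illustration; in an application, the probabilities would come from one of the pedigree-based approximations or MCMC samplers cited above.

```python
import numpy as np

def expected_genotypes(probs, codes=(0.0, 1.0, 2.0)):
    """Conditional expectation of genotype codes at each locus.

    probs : array of shape (n_loci, 3); row j holds the conditional
            probabilities of the three genotypes at locus j, given the
            observed genotypes in the pedigree
    """
    probs = np.asarray(probs, dtype=float)
    if not np.allclose(probs.sum(axis=1), 1.0):
        raise ValueError("each row of probs must sum to 1")
    return probs @ np.asarray(codes, dtype=float)

# Hypothetical probabilities for three loci of one individual
probs = [
    [1.00, 0.00, 0.00],  # genotype observed: all mass on one genotype
    [0.25, 0.50, 0.25],  # genotype missing, no information from relatives
    [0.00, 0.10, 0.90],  # genotype missing, informative relatives
]
x_hat = expected_genotypes(probs)
```

For a locus with an observed genotype the expectation equals the observed code, so the same function covers individuals with and without SNP data.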
When genotypes for individual i are observed,
, and qi is null. When genotypes for i are missing,
is an approximation to the conditional expectation of g(xi) given the observed genotypes, and qi is a random variable with approximately null expectation. Now, the genotypic value of an individual can be written as
![]() | (44) |
![]() | (45) |
by its estimate in (45).
In the simplest approach to modeling the covariances of the qi, the random effects ui and qi are combined as
![]() |
![]() | (46) |
of a can be obtained by using the usual tabular algorithm that is based only on pedigree information, after setting the ith diagonal of
to Var(ui) + Var(qi). The inverse of this approximate
is sparse and it can be computed efficiently (HENDERSON 1976).
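For readers unfamiliar with it, the tabular algorithm mentioned above can be sketched as follows for the standard additive relationship matrix. The sketch covers only the pedigree-based recursion; the modification of the diagonal elements described in the text is not implemented. The function name, the parent coding (-1 for an unknown parent), and the small pedigree are assumptions for the example, and individuals must be ordered so that parents precede their offspring.

```python
import numpy as np

def relationship_matrix(sires, dams):
    """Additive relationship matrix by the tabular method.

    sires, dams : parent indices for individuals 0..n-1, with -1 denoting
                  an unknown parent; parents must precede their offspring
    """
    n = len(sires)
    a = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship)
        inbreeding = 0.5 * a[s, d] if s >= 0 and d >= 0 else 0.0
        a[i, i] = 1.0 + inbreeding
        # Off-diagonals: average of the relationships with the two parents
        for j in range(i):
            r = 0.0
            if s >= 0:
                r += 0.5 * a[j, s]
            if d >= 0:
                r += 0.5 * a[j, d]
            a[i, j] = a[j, i] = r
    return a

# Individuals 0 and 1 are founders, 2 and 3 are full sibs,
# and 4 is the offspring of the full-sib mating 2 x 3
sires = [-1, -1, 0, 0, 2]
dams = [-1, -1, 1, 1, 3]
a = relationship_matrix(sires, dams)
```

In large pedigrees the matrix itself is rarely formed; its sparse inverse is built directly from the pedigree (HENDERSON 1976), which is what makes the approach computationally attractive.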
Although qi is not null only for individuals with missing genotypes, as discussed below, the observed genotypes on relatives can provide information on the segregation of alleles in individuals with missing genotypes. Thus, observed genotypes can provide information on covariances between the qi. Let qij denote the additive genotypic value at locus j, which can be written as
![]() |
can be written as
![]() | (47) |
is the additive effect of haplotype hijm. These additive effects can be estimated from the linear function given by (8) as follows. Let
denote the value of x with all elements set to its mean value except at locus j where one of the haplotypes is set to hj. Then,
![]() |
The covariance between
and
, for example, can be written as
![]() | (48) |
is the probability that the maternal haplotype of i' is inherited from the maternal haplotype of d its dam. Suppose genotype information is available on the ancestors of d and on the descendants of i'. Then, the segregation probabilities in (48) may be different from 0.5, and thus the observed genotypes will contribute information for genetic covariances at this locus. However, even in this situation, the inverse of the gametic covariance matrix Gj for locus j that is constructed using (47) and (48) is sparse and it can be computed efficiently (FERNANDO and GROSSMAN 1989).
For an improved approximation to the modeling of covariances between the qi, the model for yi is written as
![]() | (49) |
, after setting the ith diagonal to
![]() | (50) |
Our models extend directly to binary or ordered categorical responses when a threshold-liability model holds (WRIGHT 1934; FALCONER 1965; GIANOLA 1982; GIANOLA and FOULLEY 1983). Here, the
function would be viewed as affecting a latent variable; MALLICK et al. (2005) used this idea in analysis of gene expression measurements.
Extension to multivariate responses is less straightforward. It is conceivable that each trait may require a different function of SNP genotypes. It is not obvious how this problem should be dealt with without making strong parametric assumptions.
In conclusion, we believe that this article gives a first description of nonparametric and semiparametric procedures that may be suitable for prediction of genetic value using dense marker data. However, considerable research is required for tuning, extending, and validating some of the ideas presented here.
LITERATURE CITED
AITCHISON, J., and C. G. G. AITKEN, 1976 Multivariate binary discrimination by the kernel method. Biometrika 63: 413-420.
CHANG, H. L. A., 1988 Studies on estimation of genetic variances under nonadditive gene action. Ph.D. Thesis, University of Illinois, Urbana-Champaign, IL.
CHU, C. K., and J. S. MARRON, 1991 Choosing a kernel regression estimator. Stat. Sci. 6: 404-436.
COCKERHAM, C. C., 1954 An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39: 859-882.
DEKKERS, J. C. M., and F. HOSPITAL, 2002 The use of molecular genetics in the improvement of agricultural populations. Nat. Rev. Genet. 3: 22-32.
D'HAESELEER, P., S. LIANG and R. SOMOGYI, 2000 Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726.
FALCONER, D. S., 1960 Introduction to Quantitative Genetics. Longman, London.
FALCONER, D. S., 1965 The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 29: 51-76.
FERNÁNDEZ, S. A., R. L. FERNANDO, B. GULBRANDTSEN, C. STRICKER, M. SCHELLING et al., 2002 Irreducibility and efficiency of ESIP to sample marker genotypes in large pedigrees with loops. Genet. Sel. Evol. 34: 537-555.
FERNANDO, R. L., and M. GROSSMAN, 1989 Marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21: 467-477.
FERNANDO, R. L., C. STRICKER and R. C. ELSTON, 1993 An efficient algorithm to compute the posterior genotypic distribution for every member of a pedigree without loops. Theor. Appl. Genet. 87: 89-93.
FISHER, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. 52: 399-433.
FOX, J., 2002 An R and S-PLUS Companion to Applied Regression. Sage, Thousand Oaks, CA.
FOX, J., 2005 Introduction to Nonparametric Regression (Lecture Notes) (http://socserv.mcmaster.ca/jfox/Courses/Oxford).
GIANOLA, D., 1982 Theory and analysis of threshold characters. J. Anim. Sci. 54: 1079-1096.
GIANOLA, D., and R. L. FERNANDO, 1986 Bayesian methods in animal breeding theory. J. Anim. Sci. 63: 217-244.
GIANOLA, D., and J. L. FOULLEY, 1983 Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15: 201-244.
GIANOLA, D., and D. SORENSEN, 2004 Quantitative genetic models for describing simultaneous and recursive relationships between phenotypes. Genetics 167: 1407-1424.
GIANOLA, D., M. PEREZ-ENCISO and M. A. TORO, 2003 On marker-assisted prediction of genetic value: beyond the ridge. Genetics 163: 347-365.
GOLUB, T. R., D. SLONIM, P. TAMAYO, C. HUARD, M. GASENBEEK et al., 1999 Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531-537.
HART, J. D., and C. L. LEE, 2005 Robustness of one-sided cross-validation to autocorrelation. J. Multivar. Anal. 92: 77-96.
HASTIE, T. J., and R. J. TIBSHIRANI, 1990 Generalized Additive Models. Chapman & Hall, London.
HASTIE, T., R. TIBSHIRANI and J. FRIEDMAN, 2001 The Elements of Statistical Learning. Springer, New York.
HAYES, B., J. LAERDAHL, D. LIEN, A. ADZHUBEI and B. HØYHEIM, 2004 Large scale discovery of single nucleotide polymorphism (SNP) markers in Atlantic Salmon (Salmo salar). AKVAFORSK, Institute of Aquaculture Research (www.mabit.no/pdf/hayes.pdf).
HENDERSON, C. R., 1973 Sire evaluation and genetic trends, pp. 10-41 in Proceedings of the Animal Breeding and Genetics Symposium in Honor of Dr. Jay L. Lush. American Society of Animal Science and American Dairy Science Association, Champaign, IL.
HENDERSON, C. R., 1974 General flexibility of linear model techniques for sire evaluation. J. Dairy Sci. 57: 963-972.
HENDERSON, C. R., 1975 Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423-447.
HENDERSON, C. R., 1976 A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32: 69-83.
HENDERSON, C. R., 1984 Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, ON, Canada.
IM, S., R. L. FERNANDO and D. GIANOLA, 1989 Likelihood inferences in animal breeding under selection: a missing data theory viewpoint. Genet. Sel. Evol. 21: 399-414.
JENSEN, C. S., and A. KONG, 1999 Blocking Gibbs sampling for linkage analysis in large pedigrees with many loops. Am. J. Hum. Genet. 65: 885-901.
JENSEN, C. S., A. KONG and U. KJAERULFF, 1995 Blocking Gibbs sampling in very large probabilistic expert systems. Int. J. Hum. Comp. Stud. 42: 647-666.
KEMPTHORNE, O., 1954 The correlation between relatives in a random mating population. Proc. R. Soc. Lond. Ser. B 143: 103-113.
KERR, R. J., and B. P. KINGHORN, 1996 An efficient algorithm for segregation analysis in large populations. J. Anim. Breed. Genet. 113: 457-469.
KIMELDORF, G., and G. WAHBA, 1971 Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33: 82-95.
KRUUK, L. E. B., 2004 Estimating genetic parameters in natural populations using the animal model. Philos. Trans. R. Soc. Lond. B 359: 873-890.
LINDLEY, D. V., and A. F. M. SMITH, 1972 Bayes estimates for the linear model. J. R. Stat. Soc. B 34: 1-41.
MALLICK, B. K., D. GHOSH and M. GHOSH, 2005 Bayesian classification of tumours by using gene expression data. J. R. Stat. Soc. B 67: 219-234.
MARRON, J. S., 1988 Automatic smoothing parameter selection: a survey. Emp. Econ. 13: 187-208.
METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER and E. TELLER, 1953 Equations of state calculations by fast computing machines. J. Chem. Phys. 21: 1087-1091.
MEUWISSEN, T. H. E., B. J. HAYES and M. E. GODDARD, 2001 Is it possible to predict the total genetic merit under a very dense marker map? Genetics 157: 1819-1829.
NADARAYA, E. A., 1964 On estimating regression. Theor. Probab. Appl. 9: 141-142.
PATTERSON, H. D., and R. THOMPSON, 1971 Recovery of interblock information when block sizes are unequal. Biometrika 58: 545-554.
QUAAS, R. L., and E. J. POLLAK, 1980 Mixed model methodology for farm and ranch beef cattle testing programs. J. Anim. Sci. 51: 1277-1287.
RACINE, J., and Q. LI, 2004 Nonparametric estimation of regression functions with both categorical and continuous data. J. Econom. 119: 99-130.
RUBIN, D. B., 1976 Inference and missing data. Biometrika 63: 581-582.
RUPPERT, D., M. P. WAND and R. J. CARROLL, 2003 Semiparametric Regression. Cambridge University Press, Cambridge, UK.
SCHUCANY, W. R., 2004 Kernel smoothers: an overview of curve estimators for the first graduate course in nonparametric statistics. Stat. Sci. 4: 663-675.
SHEEHAN, N., and A. THOMAS, 1993 On the irreducibility of a Markov chain defined on a space of genotype configurations by a sample scheme. Biometrics 49: 163-175.
SILVERMAN, B. W., 1986 Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
SOLLER, M., and J. BECKMANN, 1982 Restriction fragment length polymorphisms and genetic improvement. Proc. 2nd World Congr. Genet. Appl. Livestock Prod. 6: 396-404.
SORENSEN, D., and D. GIANOLA, 2002 Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer-Verlag, New York.
SORENSEN, D. A., and B. W. KENNEDY, 1983 The use of the relationship matrix to account for genetic drift variance in the analysis of genetic experiments. Theor. Appl. Genet. 66: 217-220.
SORENSEN, D., R. L. FERNANDO and D. GIANOLA, 2001 Inferring the trajectory of genetic variance in the course of artificial selection. Genet. Res. 77: 83-94.
STRICKER, C., M. SCHELLING, F. DU, I. HOESCHELE, S. A. FERNANDEZ et al., 2002 A comparison of efficient genotype samplers for complex pedigrees and multiple linked loci. 7th World Congress of Genetics Applied to Livestock Production. INRA, Castanet-Tolosan, France. CD-ROM communication no. 2112.
TER BRAAK, C. J. F., M. BOER and M. BINK, 2005 Extending Xu's (2003) Bayesian model for estimating polygenic effects using markers of the entire genome. Genetics 170: 1435-1438.
VAN ARENDONK, J. A. M., C. SMITH and B. W. KENNEDY, 1989 Method to estimate genotype probabilities at individual loci in farm livestock. Theor. Appl. Genet. 78: 735-740.
WAHBA, G., 1990 Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia.
WAHBA, G., 1999 Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV, pp. 68-88 in Advances in Kernel Methods, edited by B. SCHÖLKOPF, C. BURGES and A. SMOLA. MIT Press, Cambridge, MA.
WANG, C. S., J. J. RUTLEDGE and D. GIANOLA, 1993 Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25: 41-62.
WANG, C. S., J. J. RUTLEDGE and D. GIANOLA, 1994 Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91-115.
WANG, H., Y. M. ZHANG, X. LI, G. MASINDE, S. MOHAN et al., 2005 Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465-480.
WATSON, G. S., 1964 Smooth regression analysis. Sankhyā A 26: 359-372.
WONG, G. K., B. LIU, J. WANG, Y. ZHANG, X. YANG et al., 2004 A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature 432: 717-722.
WRIGHT, S., 1934 The results of crosses between inbred strains of guinea pigs, differing in number of digits. Genetics 19: 537-551.
XU, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163: 789-801.
YI, N., G. VARGHESE and D. A. ALLISON, 2003 Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics 164: 1129-1138.
ZHANG, Y., and S. XU, 2005 A penalized maximum-likelihood method for estimating epistatic effects of QTL. Heredity 95: 96-104.
Communicating editor: R. W. DOERGE