On Marker-Assisted Prediction of Genetic Value: Beyond the Ridge
Daniel Gianola^a, Miguel Perez-Enciso^b, and Miguel A. Toro^c
a Department of Animal Sciences, University of Wisconsin, Madison, Wisconsin 53706
b Station d'Amelioration Génétique des Animaux, Institut National de la Recherche Agronomique, 31326 Castanet-Tolosan, France
c Departamento de Mejora Genética Animal, Instituto Nacional de Investigaciones Agrarias, 28040-Madrid, Spain
Corresponding author: Daniel Gianola, 1675 Observatory Dr., Madison, WI 53706. E-mail: gianola{at}calshp.cals.wisc.edu
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
Marker-assisted genetic improvement of agricultural species exploits statistical dependencies in the joint distribution of marker genotypes and quantitative traits. An issue is how molecular (e.g., dense marker maps) and phenotypic information (e.g., some measure of yield in plants) is to be used for predicting the genetic value of candidates for selection. Multiple regression, selection index techniques, best linear unbiased prediction, and ridge regression of phenotypes on marker genotypes have been suggested, as well as more elaborate methods. Here, phenotype-marker associations are modeled hierarchically via multilevel models including chromosomal effects, a spatial covariance of marked effects within chromosomes, background genetic variability, and family heterogeneity. Lorenz curves and Gini coefficients are suggested for assessing the inequality of the contribution of different marked effects to genetic variability. Classical and Bayesian methods are presented. The Bayesian approach includes a Markov chain Monte Carlo implementation. The generality and flexibility of the Bayesian method are illustrated when a Lorenz curve is to be inferred.
THE availability of a plethora of markers has led to consideration of the extent to which molecular information can be used to advantage in genetic improvement programs of agricultural species, such as maize or dairy cattle. There is a large body of literature on this matter.
The basic idea in marker-assisted selection is to exploit statistical dependencies (linkage disequilibrium) existing in the joint distribution of marker and quantitative trait loci (QTL) genotypes. For example, when two inbred lines are crossed, the disequilibrium is manifest in the F2 generation. On the other hand, when there is linkage equilibrium at the population level, only the joint distribution of marker and QTL genotypes within a family is nontrivial.
Linkage disequilibrium between markers and QTL can be used for two main purposes: (1) to infer genomic location and effects of QTL affecting a trait and (2) to arrive at improved (in some statistical sense) predictions of genetic merit of candidates for selection in a breeding program. These two objectives may not be disjoint.
Our concern is with statistical models and methods for inferring genetic merit using molecular and phenotypic information. The objective is to describe phenotype-marker associations using multilevel hierarchical linear models. The setting is mainly as in ![]()
![]()
| A MIXED-EFFECTS MODEL FORMULATION |
|---|
Hierarchical representation:
Let the phenotypic value of individual i for a quantitative trait in an F2 from a cross between inbred lines be described by the model
$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + a_i + e_i. \tag{1}$$
Here, the p x 1 vector $\boldsymbol{\beta}$ contains some systematic effects representing, e.g., year of harvest, level of fertilization, or plant density; $\mathbf{x}_i'$ is a known incidence vector relating $\boldsymbol{\beta}$ to $y_i$; $a_i$ is an unobserved genetic value; and $e_i$ is an independently distributed random residual reflecting environmental variability or inadequacy of the model. It is assumed that the genetic value $a_i$ results from an unknown number (K, say) of QTL acting additively, so that
$$a_i = \sum_{k=1}^{K} Q_{ik}\alpha_k,$$
where $Q_{ik}$ is the genotype at biallelic (assumed for simplicity) locus k for individual i and $\alpha_k$ is the per-allele effect. If $a_i$ is random, (1) is a special case of the well-known mixed linear model.
It is conceptually convenient to develop model (1) hierarchically. The first level of a Gaussian hierarchy is given by
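$$y_i \mid \boldsymbol{\beta}, a_i, \sigma^2_e \sim N(\mathbf{x}_i'\boldsymbol{\beta} + a_i, \sigma^2_e), \tag{2}$$
where N(·) indicates a normal distribution and $\sigma^2_e$ is the environmental variance. If the n environmental deviates are independently and identically distributed, this leads to the matrix representation
$$\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{a}, \sigma^2_e \sim N(\mathbf{X}\boldsymbol{\beta} + \mathbf{a}, \mathbf{I}\sigma^2_e), \tag{3}$$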
where X is an n x p incidence matrix assumed (without loss of generality) to have full-column rank, a = {ai}, and I denotes an identity matrix, in this case n x n. Suppose next that individual i has been typed for marker genotypes at each of l loci; this is represented by the l x 1 vector $\mathbf{m}_i = (m_{i1}, m_{i2}, \ldots, m_{il})'$, where $m_{ik}$ is a numerical code for the genotype of individual i at marker locus k.
Assume that all individuals have been typed for all markers (this is not realistic, but see the DISCUSSION). Then, the unobserved genetic value can be modeled as
$$a_i = \mathbf{m}_i'\boldsymbol{\gamma} + \delta_i, \tag{4}$$
where $\boldsymbol{\gamma} = \{\gamma_k\}$ is of order l x 1. We refer to $\gamma_k$ as the "marked effect" of marker locus k on genetic value, while $\delta_i$ can be interpreted as some residual or "background" genetic effect not involved in the association between genetic value and the markers but, yet, having an effect on phenotype. The vector $\boldsymbol{\gamma}$ is the gradient or regression of the unobservable additive genetic value on the observable marker genotype, that is, $\boldsymbol{\gamma} = \partial a_i/\partial \mathbf{m}_i$. A nonlinear (e.g., quadratic) specification is also possible by adding a second regression on $\mathbf{m}^{2\prime}_i$, a row vector with elements consisting of the squares of the corresponding entries of $\mathbf{m}_i$. Interactions between marked effects for different marker loci can be modeled via cross-products between appropriate elements of $\mathbf{m}_i$. For simplicity, additivity of marked effects is assumed throughout.
The second level of the hierarchy is represented by a distribution describing the uncertainty about genetic values, given the marked effects, that is, the background genetic variability. We adopt the Gaussian model,
$$a_i \mid \mathbf{m}_i, \boldsymbol{\gamma}, \sigma^2_\delta \sim N(\mathbf{m}_i'\boldsymbol{\gamma}, \sigma^2_\delta),$$
where $\sigma^2_\delta$ is the background additive genetic variance. It is assumed (rightly or wrongly, depending on the context), given the marker genotypes and $\boldsymbol{\gamma}$, that the "background" genetic effects $\delta_i$ of different individuals are mutually independent. This implies that either there is no family structure or that, conditionally on the marked effects $\boldsymbol{\gamma}$, the family structure is not relevant. Family clustering is taken up in a later section. In matrix notation, and consistently with (4), the assumption of independence leads to
$$\mathbf{a} \mid \mathbf{M}, \boldsymbol{\gamma}, \sigma^2_\delta \sim N(\mathbf{M}\boldsymbol{\gamma}, \mathbf{I}\sigma^2_\delta), \tag{5}$$
where M is the n x l matrix of known marker genotypes. Unless there is some prior knowledge about $\sigma^2_e$ and $\sigma^2_\delta$ (or some clustering of individuals, such as a family structure), the background effect $\delta_i$ must be lumped together with $e_i$, because of nonidentifiability. On the other hand, if the variances $\sigma^2_\delta$ and $\sigma^2_e$ are known a priori, it is possible to "predict" $\delta_i$ distinctly from $e_i$, in the same way that one can predict additive genetic and environmental effects via BLUP when dispersion parameters are known.
One approach is to treat $\boldsymbol{\gamma}$ as a fixed parameter and employ ridge regression for estimation of this vector. From a Bayesian perspective, this is equivalent to regarding $\boldsymbol{\gamma}$ as having the distribution
$$\boldsymbol{\gamma} \mid \sigma^2_\gamma \sim N(\mathbf{0}, \mathbf{I}\sigma^2_\gamma), \tag{6}$$
with $\sigma^2_\gamma$ elicited in some manner. Assumption (6) is adopted as the third level of the hierarchy. In a frequentist setting, the assumption states that the marked effects $\boldsymbol{\gamma}$ are drawn at random from the multivariate normal distribution (6) in each conceptual repetition of a crossing experiment. In a Bayesian setting, this would be part of the prior ensemble of the model; this is discussed later. Regression coefficients for markers that are not adjacent to QTL are expected to be null under the assumption of no interference. However, $\boldsymbol{\gamma}$ can be treated as a random effect merely as a device for obtaining possibly improved (in some sense) predictions of genetic value. Note that (6) implies that the "marked effects" are independent and identically distributed, but this assumption is relaxed later on. The normality assumption in (6) is probably adequate and facilitates computation significantly.
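The three-stage hierarchy can be condensed by inserting (5) into (3), so that the model describing the phenotypic values can be written as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{a} + \mathbf{e} \tag{7}$$
$$\phantom{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{M}\boldsymbol{\gamma} + \boldsymbol{\delta} + \mathbf{e}, \tag{8}$$
where $\boldsymbol{\delta} = \{\delta_i\}$ and $\mathbf{e} = \{e_i\}$. This can be viewed as a frequentist mixed-effects model, where $\boldsymbol{\beta}$ is a fixed location parameter and $\boldsymbol{\gamma}$ and $\boldsymbol{\delta}$ are random terms. Unless additional assumptions are made or some knowledge about the partition of variance in the population is available, $\delta_i$ and $e_i$ are "confounded." However, it is conceptually useful to maintain these two vectors as distinct. The marginal (frequentist) distribution of the phenotypes induced by model (8) is the normal process
$$\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2_\gamma, \sigma^2_\delta, \sigma^2_e \sim N\left(\mathbf{X}\boldsymbol{\beta}, \mathbf{M}\mathbf{M}'\sigma^2_\gamma + \mathbf{I}(\sigma^2_\delta + \sigma^2_e)\right), \tag{9}$$
where $\mathbf{M}\mathbf{M}'\sigma^2_\gamma$ is the variance-covariance matrix of marked genotypic values, conditionally on the marker genotypes M observed in the experiment. In scalar notation, the "total variance" of a single observation is
$$\mathrm{Var}(y_i) = \sum_{k=1}^{l} m_{ik}^2\sigma^2_\gamma + \sigma^2_\delta + \sigma^2_e.$$
Hence,
$$\frac{\sum_{k=1}^{l} m_{ik}^2\sigma^2_\gamma}{\sum_{k=1}^{l} m_{ik}^2\sigma^2_\gamma + \sigma^2_\delta} \tag{10}$$
is interpretable as the fraction of genetic variance attributable to marked effects, in an experiment repeated over and over with the marker genotypes fixed across replications. The variance "due to" the association with the markers is $\sum_{k=1}^{l} m_{ik}^2\sigma^2_\gamma$, which depends nontrivially on $\mathbf{m}_i$, the specific marker genotype of individual i. On the other hand, since the marker genotypes vary at random over replications,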
![]() |
(11) |
where the expectation and covariance matrix are taken over the distribution of marker genotypes in the population. Knowledge of the distribution of marker genotypes is needed for evaluation of (11).
Best prediction and best linear prediction:
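Under standard assumptions, with $\boldsymbol{\beta}$ and the variance components $\sigma^2_\gamma$, $\sigma^2_\delta$, and $\sigma^2_e$ known, the joint distribution of $\boldsymbol{\gamma}$ and $\boldsymbol{\delta}$, given the phenotypes and the marker genotypes, is the multivariate normal process:
$$\begin{bmatrix}\boldsymbol{\gamma}\\ \boldsymbol{\delta}\end{bmatrix} \Bigm| \mathbf{y}, \mathbf{M}, \boldsymbol{\beta}, \sigma^2_\gamma, \sigma^2_\delta, \sigma^2_e \sim N\left(\begin{bmatrix}\hat{\boldsymbol{\gamma}}\\ \hat{\boldsymbol{\delta}}\end{bmatrix}, \mathbf{C}^{-1}\sigma^2_e\right). \tag{12}$$
Here,
$$\begin{bmatrix}\hat{\boldsymbol{\gamma}}\\ \hat{\boldsymbol{\delta}}\end{bmatrix} = \mathbf{C}^{-1}\begin{bmatrix}\mathbf{M}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\\ \mathbf{y} - \mathbf{X}\boldsymbol{\beta}\end{bmatrix}, \tag{13}$$
where $\lambda_\gamma = \sigma^2_e/\sigma^2_\gamma$, $\lambda_\delta = \sigma^2_e/\sigma^2_\delta$, and
$$\mathbf{C} = \begin{bmatrix}\mathbf{M}'\mathbf{M} + \mathbf{I}\lambda_\gamma & \mathbf{M}'\\ \mathbf{M} & \mathbf{I}(1 + \lambda_\delta)\end{bmatrix}. \tag{14}$$
The best linear predictor [BLP; also the best predictor (BP) under normality; HENDERSON 1973] of the unobserved total genetic values $\mathbf{a} = \mathbf{M}\boldsymbol{\gamma} + \boldsymbol{\delta}$ is
$$\hat{\mathbf{a}} = \mathbf{M}\hat{\boldsymbol{\gamma}} + \hat{\boldsymbol{\delta}}, \tag{15}$$
and the variance-covariance matrix of the prediction error (under normality this is also the covariance matrix of the conditional distribution of a) is
$$\mathrm{Var}(\hat{\mathbf{a}} - \mathbf{a}) = \begin{bmatrix}\mathbf{M} & \mathbf{I}\end{bmatrix}\mathbf{C}^{-1}\begin{bmatrix}\mathbf{M}'\\ \mathbf{I}\end{bmatrix}\sigma^2_e. \tag{16}$$
The best predictor has the smallest possible mean-squared error of prediction. Hence, it would be difficult to improve upon this, provided that the model is reasonable, normality holds, and parameters are known. Thus, (13)-(16) provide an alternative to a ridge regression approach to prediction. Generalization to multiple traits measured in different individuals is straightforward, but this is not dealt with here. Since, given M, the BLP or BP is unbiased, it is also unbiased unconditionally:
$$E_{\mathbf{M}}\left[E(\hat{\mathbf{a}} - \mathbf{a} \mid \mathbf{M})\right] = \mathbf{0},$$
with the expectation taken with respect to the joint distribution of marker genotypes in the entire population. A drawback of BP or BLP is that it is unrealistic to assume that $\boldsymbol{\beta}$, $\sigma^2_\gamma$, $\sigma^2_\delta$, and $\sigma^2_e$ are known without error.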
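The best linear predictor with known parameters can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `blp_genetic_values`, the -1/0/1 marker coding, and all simulated values are assumptions made here for the example.

```python
import numpy as np

def blp_genetic_values(y, X, M, beta, var_gamma, var_delta, var_e):
    """BLP of marked effects, background effects and total genetic values,
    treating beta and the three variance components as known."""
    n, l = M.shape
    lam_gamma = var_e / var_gamma   # ratio sigma2_e / sigma2_gamma
    lam_delta = var_e / var_delta   # ratio sigma2_e / sigma2_delta
    r = y - X @ beta                # phenotypes corrected for known fixed effects
    C = np.block([
        [M.T @ M + lam_gamma * np.eye(l), M.T],
        [M, (1.0 + lam_delta) * np.eye(n)],
    ])
    sol = np.linalg.solve(C, np.concatenate([M.T @ r, r]))
    gamma_hat, delta_hat = sol[:l], sol[l:]
    return gamma_hat, delta_hat, M @ gamma_hat + delta_hat

# Hypothetical data: 30 individuals, 5 markers coded -1/0/1 (assumed coding).
rng = np.random.default_rng(0)
n, l = 30, 5
X = np.ones((n, 1))
M = rng.integers(-1, 2, size=(n, l)).astype(float)
beta = np.array([10.0])
y = X @ beta + M @ rng.normal(0, 1, l) + rng.normal(0, 0.5, n) + rng.normal(0, 1, n)
gamma_hat, delta_hat, a_hat = blp_genetic_values(y, X, M, beta, 1.0, 0.25, 1.0)
```

The same predictor of a can be obtained in its "selection index" form, Cov(a, y)[Var(y)]^{-1}(y - Xβ), which gives a convenient numerical check of the block system.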
Best linear unbiased prediction:
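An obvious improvement is to use BLUP. BLUP takes into account uncertainty about $\boldsymbol{\beta}$, which is not the case of BP or BLP above, where $\boldsymbol{\beta}$ is treated as known. Under normality, BLUP(a) can be interpreted as the mean of the conditional distribution of the predictand $\mathbf{a} = \mathbf{M}\boldsymbol{\gamma} + \boldsymbol{\delta}$, given a vector of "error contrasts," denoted as w. For example, take $\mathbf{w} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is either the ordinary least-squares or the generalized least-squares estimator of $\boldsymbol{\beta}$. In such a setting, BLUP is the best predictor under normality, but only in the class of linear translation invariant predictors. BLUP can be computed by solving the mixed model equations
$$\begin{bmatrix}\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{M} & \mathbf{X}'\\ \mathbf{M}'\mathbf{X} & \mathbf{M}'\mathbf{M} + \mathbf{I}\lambda_\gamma & \mathbf{M}'\\ \mathbf{X} & \mathbf{M} & \mathbf{I}(1 + \lambda_\delta)\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\boldsymbol{\gamma}}\\ \hat{\boldsymbol{\delta}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{y}\\ \mathbf{M}'\mathbf{y}\\ \mathbf{y}\end{bmatrix}, \tag{17}$$
where, for P = I - X(X'X)-1X',
$$\begin{bmatrix}\mathbf{M}'\mathbf{P}\mathbf{M} + \mathbf{I}\lambda_\gamma & \mathbf{M}'\mathbf{P}\\ \mathbf{P}\mathbf{M} & \mathbf{P} + \mathbf{I}\lambda_\delta\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\gamma}}\\ \hat{\boldsymbol{\delta}}\end{bmatrix} = \begin{bmatrix}\mathbf{M}'\mathbf{P}\mathbf{y}\\ \mathbf{P}\mathbf{y}\end{bmatrix} \tag{18}$$
gives $\hat{\boldsymbol{\gamma}}$ = BLUP($\boldsymbol{\gamma}$) and $\hat{\boldsymbol{\delta}}$ = BLUP($\boldsymbol{\delta}$). Further,
$$\mathrm{Var}\begin{bmatrix}\hat{\boldsymbol{\gamma}} - \boldsymbol{\gamma}\\ \hat{\boldsymbol{\delta}} - \boldsymbol{\delta}\end{bmatrix} = \begin{bmatrix}\mathbf{M}'\mathbf{P}\mathbf{M} + \mathbf{I}\lambda_\gamma & \mathbf{M}'\mathbf{P}\\ \mathbf{P}\mathbf{M} & \mathbf{P} + \mathbf{I}\lambda_\delta\end{bmatrix}^{-1}\sigma^2_e. \tag{19}$$
The BLUP of $\mathbf{a} = \mathbf{M}\boldsymbol{\gamma} + \boldsymbol{\delta}$ is
$$\hat{\mathbf{a}} = \mathbf{M}\hat{\boldsymbol{\gamma}} + \hat{\boldsymbol{\delta}},$$
and the variance-covariance matrix of the prediction errors is
$$\mathrm{Var}(\hat{\mathbf{a}} - \mathbf{a}) = \begin{bmatrix}\mathbf{M} & \mathbf{I}\end{bmatrix}\mathrm{Var}\begin{bmatrix}\hat{\boldsymbol{\gamma}} - \boldsymbol{\gamma}\\ \hat{\boldsymbol{\delta}} - \boldsymbol{\delta}\end{bmatrix}\begin{bmatrix}\mathbf{M}'\\ \mathbf{I}\end{bmatrix}.$$
As noted, BLUP of the unobservable genetic values (marked or background genetic effects or any linear combination thereof) takes into account the uncertainty about $\boldsymbol{\beta}$, although given $\lambda_\gamma$, $\lambda_\delta$, and $\sigma^2_e$. Since BLUP is conditionally unbiased, it follows that it is also so unconditionally (averaged over marker genotypes). The unconditional covariance matrix of the prediction errors is the average of (19) taken over the distribution of marker genotypes.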
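A minimal numerical sketch of BLUP through Henderson-type mixed model equations is given next, assuming the variance ratios are known. The function name `blup_mme`, the simulated data, and the chosen ratio values are hypothetical and serve only to illustrate the computation.

```python
import numpy as np

def blup_mme(y, X, M, lam_gamma, lam_delta):
    """Solve the mixed model equations for (beta_hat, gamma_hat, delta_hat).

    lam_gamma = sigma2_e / sigma2_gamma and lam_delta = sigma2_e / sigma2_delta
    are taken as known; X must have full column rank.
    """
    n, p = X.shape
    l = M.shape[1]
    C = np.block([
        [X.T @ X, X.T @ M, X.T],
        [M.T @ X, M.T @ M + lam_gamma * np.eye(l), M.T],
        [X, M, (1.0 + lam_delta) * np.eye(n)],
    ])
    rhs = np.concatenate([X.T @ y, M.T @ y, y])
    sol = np.linalg.solve(C, rhs)
    return sol[:p], sol[p:p + l], sol[p + l:]

# Hypothetical example: intercept plus one covariate, 6 markers coded -1/0/1.
rng = np.random.default_rng(42)
n, l = 25, 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
M = rng.integers(-1, 2, size=(n, l)).astype(float)
y = rng.normal(size=n)
lam_gamma, lam_delta = 2.0, 4.0
beta_hat, gamma_hat, delta_hat = blup_mme(y, X, M, lam_gamma, lam_delta)
a_hat = M @ gamma_hat + delta_hat   # BLUP of total genetic values
```

The solution for the fixed effects coincides with the generalized least-squares estimator, so the block system avoids forming and inverting the n x n phenotypic covariance matrix explicitly.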
The needed variance components:
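Sensible values of the dispersion parameters must be specified for implementing BLUP. As stated, unless there is prior or external knowledge, it is not possible to separate the variance of the background genetic effects ($\sigma^2_\delta$) from that of the environmental influences ($\sigma^2_e$). The model can be rewritten as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{M}\boldsymbol{\gamma} + \mathbf{e}^*, \tag{20}$$
where $\mathbf{e}^* = \boldsymbol{\delta} + \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^{2*}_e)$, with $\sigma^{2*}_e = \sigma^2_\delta + \sigma^2_e$. Methods for obtaining maximum-likelihood estimates of $\sigma^2_\gamma$ and of ($\sigma^2_\delta + \sigma^2_e$) for (20) are well known. Given these variance components, the conditional distribution of $\boldsymbol{\gamma}$ would be
$$\boldsymbol{\gamma} \mid \mathbf{w}, \sigma^2_\gamma, \sigma^{2*}_e \sim N\left(\hat{\boldsymbol{\gamma}}, \mathbf{C}^*_{\gamma\gamma}\sigma^{2*}_e\right), \tag{21}$$
where
$$\hat{\boldsymbol{\gamma}} = (\mathbf{M}'\mathbf{P}\mathbf{M} + \mathbf{I}\lambda^*)^{-1}\mathbf{M}'\mathbf{P}\mathbf{y}, \qquad \lambda^* = \sigma^{2*}_e/\sigma^2_\gamma,$$
is the best predictor (given the error contrasts) or BLUP($\boldsymbol{\gamma}$). The covariance matrix of the conditional distribution (21) is
$$\mathbf{C}^*_{\gamma\gamma}\sigma^{2*}_e = (\mathbf{M}'\mathbf{P}\mathbf{M} + \mathbf{I}\lambda^*)^{-1}\sigma^{2*}_e,$$
with this being the same as the covariance matrix of the prediction errors $\hat{\boldsymbol{\gamma}} - \boldsymbol{\gamma}$, given the marker genotypes M. The BLUP of the marked quantitative trait genotype is $\mathbf{M}\hat{\boldsymbol{\gamma}}$, and the prediction error dispersion matrix is $\mathbf{M}\mathbf{C}^*_{\gamma\gamma}\mathbf{M}'\sigma^{2*}_e$. Hence, $\mathbf{M}\hat{\boldsymbol{\gamma}}$ could be used as a criterion for genetic evaluation in marker-assisted selection ignoring background effects. If reasonable maximum-likelihood (ML) or restricted maximum-likelihood (REML) estimates of $\sigma^2_\gamma$ and of $\sigma^{2*}_e$ are available, these can be treated as "true" values in (21).
Suppose that estimates of the additive genetic ($\sigma^2_a$) and of the environmental variance ($\sigma^2_e$) are available (or of heritability $h^2$ and the phenotypic variance $\sigma^2_y$) from an analysis ignoring marker information. Then, the variance of the background genetic effects (assuming additivity) could be estimated as $\hat{\sigma}^2_\delta = \hat{\sigma}^{2*}_e - \hat{\sigma}^2_e = \hat{\sigma}^{2*}_e - (1 - h^2)\sigma^2_y$ (hoping that this value will be positive). Then, form
$$\hat{\lambda}_\gamma = \hat{\sigma}^2_e/\hat{\sigma}^2_\gamma \quad \text{and} \quad \hat{\lambda}_\delta = \hat{\sigma}^2_e/\hat{\sigma}^2_\delta,$$
where $\hat{\sigma}^2_\gamma$ is the ML or REML estimate of $\sigma^2_\gamma$, and proceed with the BLUP implementation in (17). The resulting predictor is an empirical BLUP having properties that depend largely on the accuracy of the variance component estimates and on the adequacy of the model in (7) and (9).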
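The arithmetic for recovering the background genetic variance and the two shrinkage ratios can be sketched as below. The function names and every numerical value are hypothetical, chosen only to show the steps; the check for a nonpositive estimate mirrors the caveat in the text.

```python
def background_variance(var_e_star, h2, var_y):
    """Split the lumped residual variance into environmental and background parts.

    var_e_star: estimate of sigma2_delta + sigma2_e from the marker model
    h2, var_y: heritability and phenotypic variance from a marker-free analysis
    """
    var_e = (1.0 - h2) * var_y          # environmental variance
    var_delta = var_e_star - var_e      # background genetic variance
    if var_delta <= 0.0:
        raise ValueError("estimated background genetic variance is not positive")
    return var_e, var_delta

def shrinkage_ratios(var_gamma, var_e, var_delta):
    """Ratios lambda_gamma and lambda_delta entering the mixed model equations."""
    return var_e / var_gamma, var_e / var_delta

# Hypothetical estimates, for illustration only.
var_e, var_delta = background_variance(var_e_star=85.0, h2=0.3, var_y=100.0)
lam_gamma, lam_delta = shrinkage_ratios(var_gamma=0.5, var_e=var_e, var_delta=var_delta)
```

With these made-up inputs the environmental variance is 70 and the background variance is 15; a marker model that absorbs most of the genetic variance would leave a lumped residual close to the environmental variance, and the subtraction could then turn negative.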
Differences with ridge regression:
A main difference with the ridge regression approach lies in how the value of $\lambda_\gamma$ (or of $\lambda^* = \sigma^{2*}_e/\sigma^2_\gamma$) is obtained. Here, the formalism of the random-effects model (with a justification from a Bayesian viewpoint) is favored over an ad hoc procedure for doing shrinkage and tempering colinearity in a fixed-effects model, such as ridge regression.
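The algebraic link between the two approaches can be checked numerically: for a given shrinkage value, the ridge estimator and the random-effects predictor coincide, so the approaches differ only in where that value comes from. The sketch below assumes centered phenotypes (no fixed effects), a -1/0/1 marker coding, and arbitrary variance values; all of these are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, l = 20, 50                      # more markers than individuals
M = rng.integers(-1, 2, size=(n, l)).astype(float)
y = rng.normal(size=n)             # hypothetical centered phenotypes

var_gamma, var_e = 0.2, 1.0        # assumed variance components
lam = var_e / var_gamma            # shrinkage implied by the random-effects model

# Ridge regression form: an l x l system.
gamma_ridge = np.linalg.solve(M.T @ M + lam * np.eye(l), M.T @ y)

# Random-effects (BLP) form: Cov(gamma, y) Var(y)^{-1} y, an n x n system.
V = M @ M.T * var_gamma + var_e * np.eye(n)
gamma_blp = var_gamma * M.T @ np.linalg.solve(V, y)

max_abs_diff = float(np.max(np.abs(gamma_ridge - gamma_blp)))
```

The n x n form is cheaper when markers outnumber individuals, which is the usual situation with dense maps.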
An advantage of a random-effects treatment is the flexibility to accommodate additional model features. For example, the ridge regression estimator implies that the elements of $\boldsymbol{\gamma}$ are independent (a priori). Evidence of coexpression of genes in at least the same chromosome suggests that this assumption may need to be relaxed.
| EXTENDING THE HIERARCHY |
|---|
Model without background genetic effects:
For simplicity, assume first that the state of prior knowledge does not allow disentangling background genetic effects from environmental deviates. The first tier is then as in (20), so that
$$\mathbf{y} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}, \sigma^{2*}_e \sim N(\mathbf{X}\boldsymbol{\beta} + \mathbf{M}\boldsymbol{\gamma}, \mathbf{I}\sigma^{2*}_e). \tag{22}$$
The second tier is given by the distribution of the marked genetic effects. Here, instead of assuming that $\boldsymbol{\gamma} \mid \sigma^2_\gamma \sim N(\mathbf{0}, \mathbf{I}\sigma^2_\gamma)$, a "one-way layout" is adopted, to partition the variability of marked effects into between- and within-chromosome components. Arrange markers sequentially according to their order within each chromosome and assume a dense marker map. The model for the second tier is
$$\boldsymbol{\gamma} = \mathbf{T}\mathbf{c} + \mathbf{v}, \tag{23}$$
where c = {ci} is an nc x 1 vector of "marked chromosomal effects" (nc is the number of pairs of chromosomes), T is a known incidence matrix of order l x nc relating marked effects to chromosomes, and v = {vij} is an l x 1 vector of marked "within-chromosome" deviates. In scalar notation, $\gamma_{ij} = c_i + v_{ij}$ is the "marked quantitative effect of the jth marker at the ith chromosome." It is assumed that
$$\mathbf{v} \mid \mathbf{V} \sim N(\mathbf{0}, \mathbf{V}). \tag{24}$$
Here, V = Diag(Vi) is taken to be a block-diagonal matrix with nc blocks, where Vi is the li x li variance-covariance matrix of marked within-chromosome deviates in chromosome i, and li is the number of markers in chromosome i. The block diagonality of V in (24) implies that marked within-chromosome effects are independent across chromosomes. However, a within-chromosome dependence will be introduced. The matrix Vi can be structured in several different manners, and some forms of modeling such dependence are discussed below. In (23) and (24) the nc x 1 vector of chromosome effects is assumed to possess the distribution
$$\mathbf{c} \mid \sigma^2_c \sim N(\mathbf{0}, \mathbf{I}\sigma^2_c), \tag{25}$$
where $\sigma^2_c$ is the variance between chromosome effects. Some of the possible forms of Vi (among many possible ones) are considered next:
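- Within-chromosome deviates are correlated according to a first-order autoregressive process. Partition v as $\mathbf{v} = [\mathbf{v}_1', \mathbf{v}_2', \ldots, \mathbf{v}_{n_c}']'$, with $\mathbf{v}_i$ of order li x 1. Then, if markers are equally spaced,
$$\mathbf{V}_i = \sigma^2_{v_i}\begin{bmatrix}1 & \rho & \rho^2 & \cdots & \rho^{l_i - 1}\\ \rho & 1 & \rho & \cdots & \rho^{l_i - 2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho^{l_i - 1} & \rho^{l_i - 2} & \rho^{l_i - 3} & \cdots & 1\end{bmatrix},$$
where $\sigma^2_{v_i}$ is the variance between deviates in chromosome i (i = 1, 2, ... , nc) and $\rho$ is a parameter taking values between -1 and 1. This covariance structure satisfies the assumption that adjacent within-chromosome deviates are more strongly correlated than those further apart. A form of relaxing the requirement that markers be equally spaced is immediately below.
- Within-chromosome deviates are correlated according to a Gaussian decay model (e.g., VERBEKE and MOLENBERGHS 1997). The covariance structure here is
$$\mathbf{V}_i = \sigma^2_{v_i}\{\rho_{i,kt}\},$$
where $\rho_{i,kt}$, a decreasing (Gaussian) function of the squared distance $d^2_{i,kt}$, is the correlation between marked within-chromosome deviates k and t in chromosome i. Here, $d_{i,kt}$ is the distance (e.g., in physical units such as kilobases) between markers k and t in chromosome i, and $\phi_i > 0$ is a chromosome-specific parameter. When the distance between markers $\to 0$, $\rho_{i,kt} \to 1$; on the other hand, when $d_{i,kt} \to \infty$, the correlation between within-chromosome deviates is null. The parameter $\phi_i$ governs the rate at which the correlation decreases. When $\phi_i$ is close to 0, such correlation falls rapidly; when $\phi_i$ increases, the drop is more gentle. It may be convenient to assume that $\phi_i = \phi$ for all chromosomes.
- Suppose that the number of markers is constant across chromosomes (with l markers and nc chromosomes there would be z = l/nc markers per chromosome). If the markers are equally spaced and if the variance of within-chromosome deviates is constant across chromosomes, one may pose the Toeplitz covariance specification
$$\mathbf{V}_i = \sigma^2_v\begin{bmatrix}1 & \rho_1 & \rho_2 & \cdots & \rho_{z-1}\\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{z-2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho_{z-1} & \rho_{z-2} & \rho_{z-3} & \cdots & 1\end{bmatrix}$$
(VERBEKE and MOLENBERGHS 1997) for i = 1, 2, ... , nc. Here, the correlation between within-chromosome deviates k and t is $\rho_{kt} = \rho_{|k-t|}$, which depends only on the number of marker intervals separating k and t.

Note that $\rho_{kt}$ here, even though a correlation coefficient, is not the same as $\rho_{i,kt}$ in the preceding item. Using (23) in (22) the model can be expressed as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{M}\mathbf{T}\mathbf{c} + \mathbf{M}\mathbf{v} + \mathbf{e}^*. \tag{26}$$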
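The three within-chromosome covariance structures listed above can be sketched as follows. The function names are hypothetical, and the specific Gaussian decay parameterization used here, exp(-(d/phi)^2), is an assumption: it is one form that matches the limiting behavior described in the text, not necessarily the one used by the authors.

```python
import numpy as np

def ar1_cov(n_markers, rho, var_v):
    """First-order autoregressive covariance for equally spaced markers."""
    idx = np.arange(n_markers)
    lags = np.abs(np.subtract.outer(idx, idx))
    return var_v * rho ** lags

def gaussian_decay_cov(positions, phi, var_v):
    """Gaussian decay covariance; exp(-(d/phi)^2) is an assumed parameterization."""
    d = np.subtract.outer(positions, positions)
    return var_v * np.exp(-(d / phi) ** 2)

def toeplitz_cov(lag_correlations, var_v):
    """Toeplitz covariance with one free correlation per lag."""
    r = np.concatenate([[1.0], np.asarray(lag_correlations, dtype=float)])
    idx = np.arange(len(r))
    lags = np.abs(np.subtract.outer(idx, idx))
    return var_v * r[lags]

def block_diag(blocks):
    """Assemble V = Diag(V_i), one block per chromosome."""
    sizes = [b.shape[0] for b in blocks]
    V = np.zeros((sum(sizes), sum(sizes)))
    start = 0
    for b, s in zip(blocks, sizes):
        V[start:start + s, start:start + s] = b
        start += s
    return V

# Hypothetical example: two chromosomes with 3 and 4 markers.
V1 = ar1_cov(3, rho=0.5, var_v=2.0)
V2 = gaussian_decay_cov(np.array([0.0, 10.0, 25.0, 60.0]), phi=20.0, var_v=1.0)
V = block_diag([V1, V2])
V_toep = toeplitz_cov([0.6, 0.3], var_v=1.0)
```

The zero off-diagonal blocks of V encode the assumption that within-chromosome deviates are independent across chromosomes.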
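Under Gaussian assumptions, the joint distribution of c and v, given a vector of linearly independent error contrasts w, is
$$\begin{bmatrix}\mathbf{c}\\ \mathbf{v}\end{bmatrix} \Bigm| \mathbf{w} \sim N\left(\begin{bmatrix}\hat{\mathbf{c}}\\ \hat{\mathbf{v}}\end{bmatrix}, \mathbf{C}_{cv}^{-1}\sigma^{2*}_e\right). \tag{27}$$
Here, $\hat{\mathbf{c}}$ = BLUP(c) and $\hat{\mathbf{v}}$ = BLUP(v) are computed as
$$\begin{bmatrix}\hat{\mathbf{c}}\\ \hat{\mathbf{v}}\end{bmatrix} = \mathbf{C}_{cv}^{-1}\begin{bmatrix}\mathbf{T}'\mathbf{M}'\mathbf{P}\mathbf{y}\\ \mathbf{M}'\mathbf{P}\mathbf{y}\end{bmatrix}, \qquad \mathbf{C}_{cv} = \begin{bmatrix}\mathbf{T}'\mathbf{M}'\mathbf{P}\mathbf{M}\mathbf{T} + \mathbf{I}\sigma^{2*}_e/\sigma^2_c & \mathbf{T}'\mathbf{M}'\mathbf{P}\mathbf{M}\\ \mathbf{M}'\mathbf{P}\mathbf{M}\mathbf{T} & \mathbf{M}'\mathbf{P}\mathbf{M} + \mathbf{V}^{-1}\sigma^{2*}_e\end{bmatrix}, \tag{28}$$
and
$$\mathrm{Var}\begin{bmatrix}\hat{\mathbf{c}} - \mathbf{c}\\ \hat{\mathbf{v}} - \mathbf{v}\end{bmatrix} = \mathbf{C}_{cv}^{-1}\sigma^{2*}_e. \tag{29}$$
The BLUP of $\mathbf{M}\boldsymbol{\gamma}$ is $\mathbf{M}(\mathbf{T}\hat{\mathbf{c}} + \hat{\mathbf{v}})$. Since the predictor is conditionally unbiased (given M), it is also unbiased unconditionally. The unconditional covariance matrix of the prediction errors is the average of (29) taken with respect to the distribution of marker genotypes.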
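Here, it is possible to predict marked effects on a chromosomal basis. Consider model (26) and recall that $\boldsymbol{\gamma} = \mathbf{T}\mathbf{c} + \mathbf{v}$. Suppose that a hypothetical species has four pairs of chromosomes. For the n individuals assayed, the incidence matrix M can be written as
$$\mathbf{M} = \begin{bmatrix}\mathbf{m}_{11}' & \mathbf{m}_{12}' & \mathbf{m}_{13}' & \mathbf{m}_{14}'\\ \mathbf{m}_{21}' & \mathbf{m}_{22}' & \mathbf{m}_{23}' & \mathbf{m}_{24}'\\ \vdots & \vdots & \vdots & \vdots\\ \mathbf{m}_{n1}' & \mathbf{m}_{n2}' & \mathbf{m}_{n3}' & \mathbf{m}_{n4}'\end{bmatrix},$$
where $\mathbf{m}_{ij}'$ are the codes for the marker genotypes of individual i at chromosome j. Likewise, the "true" marked effects on a chromosome-by-chromosome basis can be written as
$$\boldsymbol{\gamma} = \begin{bmatrix}\boldsymbol{\gamma}_1\\ \boldsymbol{\gamma}_2\\ \boldsymbol{\gamma}_3\\ \boldsymbol{\gamma}_4\end{bmatrix} = \begin{bmatrix}\mathbf{1}_{l_1}c_1 + \mathbf{v}_1\\ \mathbf{1}_{l_2}c_2 + \mathbf{v}_2\\ \mathbf{1}_{l_3}c_3 + \mathbf{v}_3\\ \mathbf{1}_{l_4}c_4 + \mathbf{v}_4\end{bmatrix},$$
where cj is an effect peculiar to all markers in chromosome pair j and $\mathbf{1}_{l_j}$ is a vector of 1's of order lj. Then, after solving (28), one has the following relationships:
$$\mathrm{BLUP}(\boldsymbol{\gamma}_j) = \mathbf{1}_{l_j}\hat{c}_j + \hat{\mathbf{v}}_j, \qquad j = 1, 2, 3, 4.$$
For many reasonable covariance structures, all dispersion parameters are identifiable in this model and can be estimated by tailoring some suitable algorithm for maximum likelihood.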
Model with background genetic effects:
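The model is now an expanded version of (26):
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{M}\mathbf{T}\mathbf{c} + \mathbf{M}\mathbf{v} + \boldsymbol{\delta} + \mathbf{e}. \tag{30}$$
The best linear unbiased estimator of $\boldsymbol{\beta}$ and the BLUP of c, v, and $\boldsymbol{\delta}$ can be obtained by solving the system:
$$\begin{bmatrix}\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{M}\mathbf{T} & \mathbf{X}'\mathbf{M} & \mathbf{X}'\\ \mathbf{T}'\mathbf{M}'\mathbf{X} & \mathbf{T}'\mathbf{M}'\mathbf{M}\mathbf{T} + \mathbf{I}\sigma^2_e/\sigma^2_c & \mathbf{T}'\mathbf{M}'\mathbf{M} & \mathbf{T}'\mathbf{M}'\\ \mathbf{M}'\mathbf{X} & \mathbf{M}'\mathbf{M}\mathbf{T} & \mathbf{M}'\mathbf{M} + \mathbf{V}^{-1}\sigma^2_e & \mathbf{M}'\\ \mathbf{X} & \mathbf{M}\mathbf{T} & \mathbf{M} & \mathbf{I}(1 + \lambda_\delta)\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\beta}}\\ \hat{\mathbf{c}}\\ \hat{\mathbf{v}}\\ \hat{\boldsymbol{\delta}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{y}\\ \mathbf{T}'\mathbf{M}'\mathbf{y}\\ \mathbf{M}'\mathbf{y}\\ \mathbf{y}\end{bmatrix}. \tag{31}$$
As before, if an estimate of the total genetic variance is available, one can set $\sigma^2_\delta$ as $\hat{\sigma}^2_\delta = \hat{\sigma}^{2*}_e - (1 - h^2)\sigma^2_y$. Some estimates (e.g., REML) of $\sigma^2_c$ and of the parameters defining the structure of Vi can be obtained when estimating $\sigma^{2*}_e$ from the model where background genetic effects are ignored or from the full model but fixing the value of $\sigma^2_\delta$. The BLUP of the "total" genetic value of the entire collection of individuals in the sample is then
$$\hat{\mathbf{a}} = \mathbf{M}\mathbf{T}\hat{\mathbf{c}} + \mathbf{M}\hat{\mathbf{v}} + \hat{\boldsymbol{\delta}}, \tag{32}$$
and the covariance matrix of the prediction errors is
$$\mathrm{Var}(\hat{\mathbf{a}} - \mathbf{a}) = \begin{bmatrix}\mathbf{M}\mathbf{T} & \mathbf{M} & \mathbf{I}\end{bmatrix}\mathrm{Var}\begin{bmatrix}\hat{\mathbf{c}} - \mathbf{c}\\ \hat{\mathbf{v}} - \mathbf{v}\\ \hat{\boldsymbol{\delta}} - \boldsymbol{\delta}\end{bmatrix}\begin{bmatrix}\mathbf{T}'\mathbf{M}'\\ \mathbf{M}'\\ \mathbf{I}\end{bmatrix}. \tag{33}$$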
Differential contribution of marked effects to genetic variability:
A representation of the degree of inequality of a frequency distribution is given by the Lorenz curve.
Let Munique be a set of rows of M defining u unique marker genotypes (u may be much smaller than n), m'unique,j be the jth row of Munique, and munique,jk be the entry for marker locus k in m'unique,j. Define the corresponding "unique marked effect" as
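$$\mu_j = \mathbf{m}_{\text{unique},j}'\boldsymbol{\gamma} = \sum_{k=1}^{l} m_{\text{unique},jk}\gamma_k, \qquad j = 1, 2, \ldots, u. \tag{34}$$
Arbitrarily, define "total marked genetic variability" as the sum of the squared unique marked genetic effects, so
$$\text{total marked genetic variability} = \sum_{j=1}^{u}\mu_j^2. \tag{35}$$
This definition does not take into account the frequency of the unique marker genotypes in the population, but this can be accounted for easily by weighting each $\mu_j$ appropriately. Let now $\mu^2_{[j]}$ be the jth ordered value of $\mu^2_j$, sorted in an increasing order. The ordinate values in the Lorenz curve, i.e., the cumulative proportion of the "total marked genetic variation," are calculated as
$$L(i/u) = \frac{\sum_{j=1}^{i}\mu^2_{[j]}}{\sum_{j=1}^{u}\mu^2_{[j]}}, \qquad i = 1, 2, \ldots, u, \tag{36}$$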
where i/u is the cumulative proportion of unique marker genotypes. The Lorenz curve results from plotting L(i/u) against i/u; observe that L(0) = 0 and L(1) = 1. It can be shown (![]()
![]()
![]() |
(37) |
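A main difficulty in estimating the Lorenz curve and the Gini coefficient is that nonlinear functions of unknown quantities are involved. Simple method of moment estimators of the Lorenz curve and of the Gini coefficient are obtained by replacing $\mu^2_j$ in (36) and (37) by
$$\hat{\mu}^2_j = (\mathbf{m}_{\text{unique},j}'\hat{\boldsymbol{\gamma}})^2,$$
with $\hat{\boldsymbol{\gamma}}$ obtained from (31). This statistic does not take into account the uncertainty associated with $\hat{\boldsymbol{\gamma}}$.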
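The plug-in calculation can be sketched in a few lines. The function names, the marker codes, and the predicted effects are hypothetical. The Gini coefficient is computed here as one minus twice the trapezoidal area under the piecewise-linear Lorenz curve, which is a common definition assumed for this example; it may differ in form from expression (37).

```python
def unique_marked_effects(unique_genotypes, gamma_hat):
    """mu_j = m'_{unique,j} gamma_hat for each unique marker genotype."""
    return [sum(m * g for m, g in zip(row, gamma_hat)) for row in unique_genotypes]

def lorenz_curve(squared_effects):
    """Ordinates L(0), L(1/u), ..., L(1) of the Lorenz curve."""
    ordered = sorted(squared_effects)
    total = sum(ordered)
    if total <= 0.0:
        raise ValueError("total marked genetic variability must be positive")
    curve, running = [0.0], 0.0
    for value in ordered:
        running += value
        curve.append(running / total)
    return curve

def gini_from_lorenz(curve):
    """Gini coefficient as 1 - 2 * (trapezoidal area under the Lorenz curve)."""
    u = len(curve) - 1
    area = sum((curve[i] + curve[i + 1]) / (2.0 * u) for i in range(u))
    return 1.0 - 2.0 * area

# Hypothetical unique genotypes (coded -1/0/1) and predicted marked effects.
unique_genotypes = [[1, 0, -1], [0, 1, 1], [-1, -1, 0], [1, 1, 1]]
gamma_hat = [0.5, -0.2, 0.1]
mu = unique_marked_effects(unique_genotypes, gamma_hat)
curve = lorenz_curve([m * m for m in mu])
gini = gini_from_lorenz(curve)
```

A Gini value near 0 would indicate that all unique marker genotypes contribute about equally to the total, while values near 1 indicate that a few genotypes account for most of it.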
Family heterogeneity:
Suppose that the phenotypic and molecular marker information can be partitioned into F nonoverlapping or "independent" clusters representing some familial aggregation.
Let the vector of phenotypic records be partitioned as y = [y'1, y'2, ... , y'F]', where yi contains fi observations pertaining to family i (i = 1, 2, ... , F). Assume now that heterogeneity in the marker-phenotype association exists. In the presence of family-specific marked effects the entire vector $\boldsymbol{\gamma}$ can be written as
$$\boldsymbol{\gamma} = [\boldsymbol{\gamma}_1', \boldsymbol{\gamma}_2', \ldots, \boldsymbol{\gamma}_F']',$$
where $\boldsymbol{\gamma}_i$ is an l x 1 set of effects peculiar to family i. The model for phenotypic values, allowing now for family-specific background effects, $\boldsymbol{\delta}_f$, becomes
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{M}\boldsymbol{\gamma} + \mathbf{F}\boldsymbol{\delta}_f + \mathbf{w}, \tag{38}$$
where X = [X'1, X'2, ... , X'F]' and Xi has as many rows as there are observations for family i (fi) and p columns. Further, M = Diag(Mi). The incidence matrix Mi has fi rows and l columns, the number of markers. The F x 1 vector $\boldsymbol{\delta}_f = \{\delta_i\}$ contains F family background effects related to the phenotypic values via the known incidence matrix F (consisting of vectors of ones, 1i, in appropriate locations and of zeroes elsewhere); w is an n x 1 residual vector of independently distributed within-family deviations, with common variance $\sigma^2_w$ and where $n = \sum_{i=1}^{F} f_i$.
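In the absence of marker information, the phenotypic variance can be partitioned into between- and within-family components. For example, with half-sib families, the between-family variance contains one-quarter of the additive genetic variance (supposing additive inheritance), and the within-family component includes the environmental variance plus three-quarters of the additive genetic variance. In the setting of (38) it is assumed that $\boldsymbol{\delta}_f$ and w are independently distributed vectors, with distributions $\boldsymbol{\delta}_f \mid \sigma^2_{\delta_f} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\delta_f})$ and $\mathbf{w} \mid \sigma^2_w \sim N(\mathbf{0}, \mathbf{I}\sigma^2_w)$, respectively. This partitioning reassigns the unmarked additive genetic variance into between- ($\sigma^2_{\delta_f}$) and within-family ($\sigma^2_w$) components. Now, the $\boldsymbol{\gamma}_i$ marked effects can be treated as random regressions following the distribution
$$\boldsymbol{\gamma}_i \mid \boldsymbol{\gamma}_0, \mathbf{B} \sim N(\boldsymbol{\gamma}_0, \mathbf{B}), \qquad i = 1, 2, \ldots, F. \tag{39}$$
Here, $\boldsymbol{\gamma}_0$ is an l x 1 parameter that is common to all families (the "fixed" part of the regression) and B is the l x l variance-covariance matrix between the regression coefficients. This implies that $\boldsymbol{\gamma}_i = \boldsymbol{\gamma}_0 + \boldsymbol{\eta}_i$, where $\boldsymbol{\eta}_i \sim N(\mathbf{0}, \mathbf{B})$ is a vector of deviations from the common regression $\boldsymbol{\gamma}_0$ that is specific to family i. In a simpler hierarchical model it could be postulated, for example, that $\boldsymbol{\gamma}_i \mid \sigma^2_\gamma \sim N(\mathbf{0}, \mathbf{I}\sigma^2_\gamma)$ for i = 1, 2, ... , F. The model for the more general specification becomes
$$\mathbf{y} = \mathbf{X}^*\boldsymbol{\beta}^* + \mathbf{M}\boldsymbol{\eta} + \mathbf{F}\boldsymbol{\delta}_f + \mathbf{w}, \tag{40}$$
where $\boldsymbol{\beta}^{*\prime} = [\boldsymbol{\beta}', \boldsymbol{\gamma}_0']$, $\mathbf{X}^* = [\mathbf{X}\ \mathbf{M}_0]$, $\mathbf{M}_0 = [\mathbf{M}_1', \mathbf{M}_2', \ldots, \mathbf{M}_F']'$, and $\boldsymbol{\eta}' = [\boldsymbol{\eta}_1', \boldsymbol{\eta}_2', \ldots, \boldsymbol{\eta}_F']$ is a vector of family-specific deviations from the overall regression $\boldsymbol{\gamma}_0$ in (39).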
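Given the dispersion parameters B, $\sigma^2_{\delta_f}$, and $\sigma^2_w$, the best linear unbiased estimator of $\boldsymbol{\beta}^*$ and the BLUP of each of the family-specific deviations $\boldsymbol{\eta}_i$ are found by solving the system
$$\begin{bmatrix}\mathbf{X}^{*\prime}\mathbf{X}^* & \mathbf{X}^{*\prime}\mathbf{M} & \mathbf{X}^{*\prime}\mathbf{F}\\ \mathbf{M}'\mathbf{X}^* & \mathbf{M}'\mathbf{M} + \bigoplus_{i=1}^{F}\mathbf{B}^{-1}\sigma^2_w & \mathbf{M}'\mathbf{F}\\ \mathbf{F}'\mathbf{X}^* & \mathbf{F}'\mathbf{M} & \mathbf{F}'\mathbf{F} + \mathbf{I}\sigma^2_w/\sigma^2_{\delta_f}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\beta}}^*\\ \hat{\boldsymbol{\eta}}\\ \hat{\boldsymbol{\delta}}_f\end{bmatrix} = \begin{bmatrix}\mathbf{X}^{*\prime}\mathbf{y}\\ \mathbf{M}'\mathbf{y}\\ \mathbf{F}'\mathbf{y}\end{bmatrix}, \tag{41}$$
where $\bigoplus_{i=1}^{F}$ is the "direct sum" of matrices notation, so that $\bigoplus_{i=1}^{F}\mathbf{B}^{-1}\sigma^2_w$ is block diagonal with F blocks equal to $\mathbf{B}^{-1}\sigma^2_w$, with all these matrices being l x l. Note that
$$\mathbf{M}'\mathbf{M} = \bigoplus_{i=1}^{F}\mathbf{M}_i'\mathbf{M}_i$$
and
$$\mathbf{F}'\mathbf{F} = \mathrm{Diag}(f_1, f_2, \ldots, f_F).$$
The variance-covariance matrix of the prediction errors is given by the inverse of the coefficient matrix in (41) multiplied by $\sigma^2_w$. Unless there are many families (e.g., independent half-sib groups in dairy cattle) it will be difficult to obtain reliable maximum-likelihood estimates of B.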
The hierarchy can be extended as in (23) and (24) to include chromosome and within-chromosome effects that would now be family specific. Put
$$\boldsymbol{\eta}_i = \mathbf{T}_i\mathbf{c}_i + \mathbf{v}_i, \qquad i = 1, 2, \ldots, F, \tag{42}$$
where $\boldsymbol{\eta}_i$ is l x 1, ci is an nc x 1 vector of chromosome effects peculiar to family i, Ti is an l x nc incidence matrix, and vi is an l x 1 vector of within-chromosome effects for family i. It is convenient to assume that the within-chromosome marked effects are independent across families and chromosomes, but possibly dependent within chromosomes. Here, the variance-covariance matrix V in (24) would be Fl x Fl and would take the form
$$\mathbf{V} = \bigoplus_{i=1}^{F}\mathbf{V}_i = \mathrm{Diag}(\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_F),$$
where Vi = Var(vi) is an l x l variance-covariance matrix that is specific to family i. Now, partition
$$\mathbf{v}_i = [\mathbf{v}_{i1}', \mathbf{v}_{i2}', \ldots, \mathbf{v}_{in_c}']',$$
where vij is an lj (number of markers in chromosome j) x 1 vector of within-chromosome deviates for family i in chromosome j. Then, under the assumption of independence of deviates across chromosomes,
$$\mathrm{Cov}(\mathbf{v}_{ij}, \mathbf{v}_{ij'}') = \mathbf{0}, \qquad j \neq j'.$$
With this,
$$\mathbf{V}_i = \bigoplus_{j=1}^{n_c}\mathbf{V}_{ij}, \qquad \mathbf{V}_{ij} = \mathrm{Var}(\mathbf{v}_{ij}),$$
and Vij can be assigned any of the spatial structures discussed earlier. Then, the distribution of all family-specific deviations from regression would be
$$\boldsymbol{\eta}_i \mid \mathbf{c}_i, \mathbf{V}_i \sim N(\mathbf{T}_i\mathbf{c}_i, \mathbf{V}_i), \qquad i = 1, 2, \ldots, F. \tag{43}$$
The hierarchy is completed by assigning a distribution to the family-specific chromosome effects ci. An extension of assumption (25) is
$$\mathbf{c}_i \mid \mathbf{K} \sim N(\mathbf{0}, \mathbf{K}), \qquad i = 1, 2, \ldots, F, \tag{44}$$
where K = {kij} is an nc x nc matrix of covariances between chromosome effects of the same family, with this dispersion structure assumed homogeneous across families. For example, element k45 of K would be the covariance between the effects of chromosomes 4 and 5 within the same family. As a side note, observe that unconditionally to ci, (43) and (44) imply that $\boldsymbol{\eta}_i \sim N(\mathbf{0}, \mathbf{T}_i\mathbf{K}\mathbf{T}_i' + \mathbf{V}_i)$. Hence, assumption (39) is equivalent to (43) and (44) if and only if TiKT'i + Vi = B for all i.
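Using (42) in (40) leads to the following model for phenotypes,
$$\mathbf{y} = \mathbf{X}^*\boldsymbol{\beta}^* + \mathbf{M}\mathbf{T}\mathbf{c} + \mathbf{M}\mathbf{v} + \mathbf{F}\boldsymbol{\delta}_f + \mathbf{w}, \tag{45}$$
where M = Diag(Mi) and T = Diag(Ti). The BLUP of all family-specific chromosome and within-chromosome effects is obtained by solving
$$\begin{bmatrix}\mathbf{X}^{*\prime}\mathbf{X}^* & \mathbf{X}^{*\prime}\mathbf{M}\mathbf{T} & \mathbf{X}^{*\prime}\mathbf{M} & \mathbf{X}^{*\prime}\mathbf{F}\\ \mathbf{T}'\mathbf{M}'\mathbf{X}^* & \mathbf{T}'\mathbf{M}'\mathbf{M}\mathbf{T} + \bigoplus_{i=1}^{F}\mathbf{K}^{-1}\sigma^2_w & \mathbf{T}'\mathbf{M}'\mathbf{M} & \mathbf{T}'\mathbf{M}'\mathbf{F}\\ \mathbf{M}'\mathbf{X}^* & \mathbf{M}'\mathbf{M}\mathbf{T} & \mathbf{M}'\mathbf{M} + \mathbf{V}^{-1}\sigma^2_w & \mathbf{M}'\mathbf{F}\\ \mathbf{F}'\mathbf{X}^* & \mathbf{F}'\mathbf{M}\mathbf{T} & \mathbf{F}'\mathbf{M} & \mathbf{F}'\mathbf{F} + \mathbf{I}\sigma^2_w/\sigma^2_{\delta_f}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\beta}}^*\\ \hat{\mathbf{c}}\\ \hat{\mathbf{v}}\\ \hat{\boldsymbol{\delta}}_f\end{bmatrix} = \begin{bmatrix}\mathbf{X}^{*\prime}\mathbf{y}\\ \mathbf{T}'\mathbf{M}'\mathbf{y}\\ \mathbf{M}'\mathbf{y}\\ \mathbf{F}'\mathbf{y}\end{bmatrix}. \tag{46}$$
The variance-covariance matrix of the prediction errors is the inverse of the coefficient matrix in (46) times $\sigma^2_w$.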
Lande-Thompson regression model revisited:
Write $\mathbf{M}_i = \bar{\mathbf{M}}_i + (\mathbf{M}_i - \bar{\mathbf{M}}_i)$, where $\bar{\mathbf{M}}_i$ is an fi x l matrix with columns equal to the mean values of the corresponding columns of Mi, with all its rows being equal. A model of interest might be
![]() |
(47) |
where $\boldsymbol{\gamma}_B$ is the regression of phenotypes on mean family molecular scores, MW is a matrix of within-family deviations from the mean scores for the appropriate families, and $\boldsymbol{\gamma}_W$ is the vector of regressions of phenotypes on the within-family deviations. Assume that $\boldsymbol{\gamma}_B \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\gamma_F})$ and $\boldsymbol{\gamma}_W \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\gamma_W})$, where $\sigma^2_{\gamma_F}$ and $\sigma^2_{\gamma_W}$ are components of variance, and suppose that the two random vectors are independently distributed. The best linear unbiased predictor of the between- and within-family regressions can be calculated by solving
|
(48) |
The variance-covariance matrix of the prediction errors can be obtained from the inverse of the coefficient matrix in (48), times $\sigma^2_w$. Model (47) can be extended by allowing the within-family regressions to be family-specific. In this extended model, in addition to $\boldsymbol{\gamma}_B$, there would be $\boldsymbol{\gamma}_{W_i}$ (i = 1, 2, ... , F) within-family regressions.
| A BAYESIAN FORMULATION |
|---|
We now adopt a Bayesian point of view and describe a Markov chain Monte Carlo (MCMC) implementation. The focus is on model (40) and its hierarchical expansion in (42)-(44), leading to (45) and (46) in the mixed model and BLUP treatment. This is the most general and richly parameterized specification among those discussed so far. The developments follow the typical hierarchical structure of Bayesian multilevel models.
Joint posterior distribution:
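Start with (40) as sampling model (data-generating process) or first level of the Bayesian hierarchy. Conditionally on $\boldsymbol{\beta}^*$, $\boldsymbol{\eta}$, and $\boldsymbol{\delta}_f$, it is assumed that the phenotypic values are drawn independently from the distribution
$$\mathbf{y} \mid \boldsymbol{\beta}^*, \boldsymbol{\eta}, \boldsymbol{\delta}_f, \sigma^2_w \sim N(\mathbf{X}^*\boldsymbol{\beta}^* + \mathbf{M}\boldsymbol{\eta} + \mathbf{F}\boldsymbol{\delta}_f, \mathbf{I}\sigma^2_w). \tag{49}$$
It is convenient to express the joint density of all phenotypes, given $\boldsymbol{\beta}^*$, $\boldsymbol{\eta}$, $\boldsymbol{\delta}_f$, and $\sigma^2_w$, as the product of F densities corresponding to the contributions to information made by the phenotypic values in each of the families. Thus,
$$p(\mathbf{y} \mid \boldsymbol{\beta}^*, \boldsymbol{\eta}, \boldsymbol{\delta}_f, \sigma^2_w) = \prod_{i=1}^{F} N(\mathbf{y}_i \mid \mathbf{X}^*_i\boldsymbol{\beta}^* + \mathbf{M}_i\boldsymbol{\eta}_i + \mathbf{1}_i\delta_i, \mathbf{I}\sigma^2_w). \tag{50}$$
The notation $N(\mathbf{y}_i \mid \mathbf{X}^*_i\boldsymbol{\beta}^* + \mathbf{M}_i\boldsymbol{\eta}_i + \mathbf{1}_i\delta_i, \mathbf{I}\sigma^2_w)$ indicates a normal density or distribution with $\mathbf{y}_i$ as random variable, $\mathbf{X}^*_i\boldsymbol{\beta}^* + \mathbf{M}_i\boldsymbol{\eta}_i + \mathbf{1}_i\delta_i$ as mean vector, and $\mathbf{I}\sigma^2_w$ as covariance matrix; a similar notation is used if the distribution involves a scalar.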
The second level assigns a prior distribution to all unknown parameters of the first tier. It is assumed that the joint prior density can be written as
![]() |
(51) |
Above, $\boldsymbol{\beta}^*_0$ is the known mean of the prior distribution of $\boldsymbol{\beta}^*$ and $s^2_{\beta^*}$ is a known scalar that tunes the degree of vagueness of the prior for this parameter. Different degrees of vagueness may be assigned to the two components of $\boldsymbol{\beta}^*$; thus, there would be two distinct tuning parameters, $s^2_\beta$ and $s^2_{\gamma_0}$. The notation $\mathbf{V}(\boldsymbol{\psi})$ means that the variance-covariance matrix of v (the within-chromosome deviates) depends on some parameter vector $\boldsymbol{\psi}$. For example, if within-chromosome deviates are correlated according to an autoregressive process having $\rho$ as parameter and with chromosome-specific variances $\sigma^2_{v_1}, \sigma^2_{v_2}, \ldots, \sigma^2_{v_{n_c}}$, then $\boldsymbol{\psi} = [\rho, \sigma^2_{v_1}, \sigma^2_{v_2}, \ldots, \sigma^2_{v_{n_c}}]'$. The autoregressive process is assumed hereafter, to illustrate one of the possible specifications. In (51), $s^2_w$ and $\nu_w$ are known parameter values (hyperparameters) of a scaled inverted chi-square distribution assigned to $\sigma^2_w$. Now, as stated in the developments following (42) and leading to (43), given the chromosome effects c, the marked effects $\boldsymbol{\eta}_i$ of different families are taken to be mutually independent. Hence, (51) can be put as
![]() |
(52) |
The third level of the hierarchy consists of the prior distribution assigned to c, $\boldsymbol{\psi}$, and $\sigma^2_{\delta_f}$, the parameters of the second tier. It is assumed that the corresponding density takes the form
![]() |
(53) |
In the preceding, it is assumed that (a) the chromosome effects of different families are mutually independent, although correlated within a family with an nc x nc covariance matrix K, as in (44); (b) the parameter $\rho$ has a prior distribution indexed by some known parameter vector $\boldsymbol{\omega}$ and is independently distributed (a priori) of all within-chromosome variances $\sigma^2_{v_i}$; (c) the nc variances $\sigma^2_{v_i}$ are independently and identically distributed as scaled inverted chi square with known hyperparameters $s^2_v$ and $\nu_v$; and (d) the variance between families $\sigma^2_{\delta_f}$ follows a scaled inverted chi-square distribution with known parameters $s^2_f$ and $\nu_f$.
The fourth and final level of the hierarchical model is the prior distribution assigned to $K$, the covariance matrix between effects of the same family on different chromosomes. It is assumed that $K$ follows an inverted Wishart process of order $n_c$ (the order of $K$) with density $p(K \mid \nu_c S_c, \nu_c)$, where $\nu_c S_c$ is a known scale matrix and $\nu_c$ is a known positive parameter usually referred to as the "degrees of freedom" of the distribution. The mean of this distribution is $E(K \mid \nu_c S_c, \nu_c) = \nu_c S_c/(\nu_c - n_c - 1)$. For a very large value of $\nu_c$, $S_c$ then approximates the mean value of the prior distribution. In less structured models, e.g., independence of chromosome effects within a family, $K$ would take a simpler form, in which case the prior distribution would be modified appropriately.
Some of the prior distributions given above are difficult or impossible to elicit. In such a situation, one may consider using some of the standard default improper priors, carry out a technically involved reference prior analysis (![]()
![]()
![]()
![]()
cSc,
c) and letting H denote the set of all known hyperparameters, the joint posterior density of the uncertain parameters is, after rearrangement,
![]() |
(54) |
Markov chain Monte Carlo scheme:
At least in theory, all marginal or joint distributions of sets of parameters of interest can be derived from (54) by effecting the necessary integrations, to take uncertainty about nuisance parameters into account properly. Since the joint posterior distribution is not in a form amenable to analytical treatment, MCMC (e.g., ![]()
The hybrid algorithm proposed uses the conditional posterior distributions as candidate-generating densities, whenever possible. Save for the presence of the prior density of $\rho$ (whose structure has not been specified yet), the joint posterior process with density (54) is in a normal-inverse Gamma or normal-inverse Wishart form. All fully conditional distributions (except that of $\rho$) can be identified and sampled with relative ease. This defines a Gibbs sampler for drawing from most posterior distributions of interest. Since the conditional process is not recognizable for $\rho$, a Metropolis procedure is tailored. The most relevant expressions needed for implementing this hybrid algorithm are presented below. The notation [parameter or parameters|ELSE] is used throughout to denote a fully conditional posterior distribution, that is, the posterior distribution of a scalar or vector parameter given all other parameters, the data, and $H$.
Conditional distribution of ß*:
This is arrived at by retaining in (54) only the terms involving ß*. The resulting conditional density is

This can be identified as the density of the multivariate Gaussian distribution,
![]() |
(55) |
where

and

Obtaining samples from (55) is straightforward, especially if the order of $\beta^*$ is not too large. When the prior distribution of $\beta^*$ is diffuse ($s^2_{\beta^*} \to \infty$), this step involves essentially least-squares type computations, after making an offset of the data vector.
Conditional distributions of family-specific marked effects $\gamma_i$:
Inspection of (54) reveals that all family-specific marked regressions $\gamma_i$ are mutually independent, given all other parameters. The density of the conditional posterior distribution of $\gamma_i$ ($i = 1, 2, \ldots, F$) is

The distribution can be shown to be the l-variate normal process
![]() |
(56) |
where

and

All family-specific marked effects can be drawn on a family-by-family basis by sampling from the multivariate normal distribution given in (56).
Conditional distributions of family-specific background effects $\delta_i$:
From (54), it follows that

so that family-specific background effects are conditionally independent, given all other parameters. In particular, for the ith family one has

This can be shown to be the kernel of the density of the univariate normal distribution,
![]() |
(57) |
where

and

Note that $1_i'1_i = f_i$ is the number of individuals in family $i$ and that $\sigma^2_f/(\sigma^2_f + \sigma^2_w/f_i)$ is proportional to the "heritability of a family mean" (whenever the variance of background effects between families is proportional to the additive genetic variance), in the terminology of standard quantitative genetics. Sampling from distribution (57) is straightforward.
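As a rough illustration of the kind of draw involved in (57), the sketch below computes the mean and variance of a univariate normal conditional for one family's background effect and samples from it. It assumes a zero-mean normal prior with variance `sigma2_f` for the background effect and that `residuals` holds the family's records already adjusted for all other effects in the model; both assumptions, and the function names, are illustrative and are not taken from the text.

```python
import random


def background_effect_moments(residuals, sigma2_w, sigma2_f):
    """Mean and variance of a normal conditional for one family's background effect.

    Assumptions (not from the text): zero-mean normal prior with variance
    sigma2_f, and `residuals` = records adjusted for all other model effects.
    """
    f_i = len(residuals)                  # number of individuals in the family
    r_bar = sum(residuals) / f_i          # family mean of the adjusted records
    # Shrinkage weight of the same form as the ratio discussed in the text.
    shrink = sigma2_f / (sigma2_f + sigma2_w / f_i)
    mean = shrink * r_bar
    variance = sigma2_w / (f_i + sigma2_w / sigma2_f)
    return mean, variance


def draw_background_effect(residuals, sigma2_w, sigma2_f, rng=random):
    """Sample one value of the background effect from its normal conditional."""
    mean, variance = background_effect_moments(residuals, sigma2_w, sigma2_f)
    return rng.gauss(mean, variance ** 0.5)
```

With three adjusted records of 1, 2, and 3 and both variances equal to 1, the family mean of 2 is shrunk by a weight of 0.75, so larger families are shrunk less toward zero.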
Conditional distributions of chromosome effects:
The density of the conditional distribution of the vector of chromosome effects c is, from (54),

Thus, family-specific chromosome effect vectors ci (i = 1, 2, ... , F) are conditionally independent of each other. After algebra, as in ![]()
![]()
![]() |
(58) |
where

and

For instance, in a species with 30 pairs of chromosomes such as cattle, a draw would need to be made from a 30-variate normal distribution for each of the F families represented in the data set.
Between- and within-family variances of background effects:
The density of the conditional posterior distribution of the between-family variance of background effects, $\sigma^2_f$, is

Writing the $F$ Gaussian densities and the density of the scaled inverted chi-square prior process of $\sigma^2_f$ explicitly, one arrives at

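This indicates that the conditional posterior distribution of $\sigma^2_f$ is scaled inverted chi square,

$$\sigma^2_f \mid \text{ELSE} \sim \left(\sum_{i=1}^{F}\delta_i^2 + s^2_f\nu_f\right)\chi^{-2}_{F+\nu_f}, \qquad (59)$$

where $\delta_i$ is the background effect of family $i$. To draw samples from (59), one extracts a random deviate from a central chi-square distribution on $F + \nu_f$ degrees of freedom, takes its reciprocal, and multiplies the inverted deviate by $\left(\sum_{i=1}^{F}\delta_i^2 + s^2_f\nu_f\right)$.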
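The three-step recipe just described (chi-square deviate, reciprocal, rescaling) can be sketched as follows. The function and argument names are hypothetical; the chi-square deviate is obtained from the standard library as a Gamma variate with shape equal to half the degrees of freedom and scale 2.

```python
import random


def draw_between_family_variance(delta, s2_f, nu_f, rng=random):
    """One draw of the between-family variance from a scaled inverted chi-square.

    delta : current values of the F family background effects
    s2_f, nu_f : known hyperparameters of the scaled inverted chi-square prior
    """
    scale_sum = sum(d * d for d in delta) + s2_f * nu_f
    df = len(delta) + nu_f                       # F + nu_f degrees of freedom
    chi2 = rng.gammavariate(df / 2.0, 2.0)       # central chi-square deviate
    return scale_sum / chi2                      # reciprocal times the scale sum
```

The mean of repeated draws should be close to the scale sum divided by the degrees of freedom minus 2, which gives a quick sanity check on an implementation.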
Likewise, the density of the conditional posterior distribution of the within-family variance is

Writing the densities explicitly and rearranging one arrives at

This is the distribution of the scaled inverted chi-square random variable
![]() |
(60) |
The sampling process is similar to that described in connection with (59).
Autoregressive parameter $\rho$:
Note in (54) that the only terms involving $\rho$ are the prior densities of the family-specific marked effects $\gamma_i$ and the prior density of $\rho$. Hence,

Now, recall that $V_i(\theta)$ is an $l \times l$ covariance matrix such that

where $V_{ij}(\theta)$ is the $l_j \times l_j$ covariance matrix of within-chromosome deviates for family $i$ in chromosome $j$. Specifically,
![]() |
(61) |
Note that this covariance matrix is chromosome specific. Using this, the conditional posterior density above has the form
![]() |
(62) |
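To make the covariance structure referred to in (61) more concrete, the sketch below builds a first-order autoregressive covariance block for one chromosome and stacks such blocks into a family-level matrix. The specific entry form, variance times the autoregressive parameter raised to the marker separation, is an assumption chosen for illustration, as are the function names; the block-diagonal stacking reflects the stated independence of within-chromosome deviates across chromosomes.

```python
def ar1_covariance(n_markers, rho, sigma2_v):
    """Covariance block for one chromosome under an assumed AR(1) structure.

    Entry (a, b) is sigma2_v * rho**|a - b| (an illustrative assumption).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[sigma2_v * rho ** abs(a - b) for b in range(n_markers)]
            for a in range(n_markers)]


def block_diagonal(blocks):
    """Stack chromosome-specific blocks into one block-diagonal matrix."""
    size = sum(len(block) for block in blocks)
    full = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for a, row in enumerate(block):
            for b, value in enumerate(row):
                full[offset + a][offset + b] = value
        offset += len(block)
    return full
```

For example, `block_diagonal([ar1_covariance(2, 0.5, 1.0), ar1_covariance(1, 0.3, 4.0)])` gives a 3 x 3 matrix in which markers on different chromosomes have zero covariance.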
When viewed as a function of $\rho$, (62) is not in a recognizable form, irrespective of the form of the prior density adopted for the autoregressive parameter. Effecting a draw from this distribution requires a more involved procedure, such as a single-site Metropolis or Metropolis-Hastings algorithm. It is convenient to transform $\rho$ into $\xi$ as

with the Jacobian of the transformation being

Then, the conditional posterior density of $\xi$ is
![]() |
(63) |
where $\theta = [\rho, \sigma^2_{v_1}, \sigma^2_{v_2}, \ldots, \sigma^2_{v_{n_c}}]'$. Suppose now that the state of $\xi$ at iteration $t$ of the algorithm is $\xi^{[t]}$ and that all other parameters in its conditional posterior distribution have been updated. This implies that the state of $\theta$ is

We update $\xi$ via a Metropolis jump. Here, a candidate value $\xi^*$ must be sampled from a symmetric candidate-generating distribution with density $Q(\xi^* \mid \xi^{[t]})$; symmetry means that $Q(\xi^* \mid \xi^{[t]}) = Q(\xi^{[t]} \mid \xi^*)$ for all pairs $(\xi^*, \xi^{[t]})$ and for every $t$.
First, sample a deviate r from a Gamma(d.f./2, d.f./2) where d.f. denotes the degrees of freedom of the t-distribution.
Second, sample the candidate $\xi^*$ from

Third, using (63), compute the Metropolis ratio:

The integration constant cancels out in the numerator and denominator, so the ratio is calculated by direct evaluation of ratios involving (63).
Fourth, draw a deviate U from a Uniform(0, 1) distribution, and set

Invert the candidate as

and set

The preceding completes the updating of the autoregressive parameter.
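The four steps above can be sketched as a single update function. Several pieces are assumptions made for illustration: the transformation is taken to be the inverse hyperbolic tangent (Fisher's z), whose Jacobian is one minus the squared autoregressive parameter; the candidate is a scaled t deviate centered at the current state; and `log_post_rho`, `df`, and `scale` are hypothetical names for the log conditional posterior density (up to a constant), the t degrees of freedom, and a tuning constant.

```python
import math
import random


def log_jacobian(xi):
    """log(1 - tanh(xi)**2) = -2 log cosh(xi), computed in a stable way."""
    a = abs(xi)
    log_cosh = a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)
    return -2.0 * log_cosh


def metropolis_update_rho(rho_t, log_post_rho, df=5.0, scale=0.5, rng=random):
    """One Metropolis update of rho on the transformed scale xi = atanh(rho)."""
    xi_t = math.atanh(rho_t)
    # First: mixing variable r ~ Gamma(shape = df/2, rate = df/2).
    r = rng.gammavariate(df / 2.0, 2.0 / df)
    # Second: symmetric t-distributed candidate around the current state.
    xi_star = xi_t + scale * rng.gauss(0.0, 1.0) / math.sqrt(r)
    rho_star = math.tanh(xi_star)
    if abs(rho_star) >= 1.0:
        return rho_t          # candidate rounded onto the boundary: reject
    # Third: log Metropolis ratio on the xi scale (density of rho times Jacobian).
    log_ratio = (log_post_rho(rho_star) + log_jacobian(xi_star)
                 - log_post_rho(rho_t) - log_jacobian(xi_t))
    # Fourth: accept the candidate with probability min(1, ratio).
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return rho_star
    return rho_t
```

Because the candidate-generating density is symmetric, only the target density enters the ratio. Working on the transformed scale keeps every accepted value strictly inside (-1, 1) without any truncation of the proposal.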
Variances of within-chromosome deviates $\sigma^2_{v_i}$:
From (54),

It is convenient to rewrite the model for marked effects presented in (42) into a chromosome-specific basis. Write

where $\gamma^*_j$ is a vector of order $Fl_j \times 1$ containing the marked effects of the $l_j$ markers in chromosome $j$ for the $F$ families, and $v^*_j$ is a vector of within-chromosome deviates. An assumption made earlier is that within-chromosome deviates are conditionally mutually independent across chromosomes and families, but correlated within chromosomes according to the covariance matrix in (61). Since the preceding is merely a rearrangement of (42), it follows that

where the identity matrix is F x F. Hence,

indicating that the within-chromosome variances $\sigma^2_{v_j}$ have independent fully conditional posterior distributions. In particular, and writing the densities explicitly, one has for $\sigma^2_{v_j}$,
![]() |
(64) |
This is the density of the scaled inverted chi-square variate:
![]() |
(65) |
Thus, the within-chromosome variance can be sampled independently, chromosome by chromosome.
Covariance matrix of chromosome effects K:
Retaining in (54) only those terms involving K leads to

Now, we write explicitly the $F$ normal densities and the inverse Wishart density $p(K \mid \nu_c S_c, \nu_c)$, to obtain
![]() |
(66) |
This indicates that the fully conditional posterior distribution is inverse Wishart of order $n_c$, with degrees of freedom equal to $F + \nu_c$, scale matrix $\sum_{i=1}^{F} c_i c_i' + \nu_c S_c$, and posterior expectation

$$E(K \mid \text{ELSE}) = \frac{\sum_{i=1}^{F} c_i c_i' + \nu_c S_c}{F + \nu_c - n_c - 1}$$

(given all other parameters). Procedures for sampling from inverse Wishart distributions in a quantitative genetics context are in, e.g., Korsgaard et al. (1999).
The MCMC scheme in a nutshell:
A discussion of how Markov chain Monte Carlo schemes can be tuned, run, and monitored for convergence can be found in, e.g., Cowles and Carlin (1996).
- Set starting values that are hopefully "inside" of the target joint posterior distribution.
- Sample systematically from each of the following conditional posterior distributions, perhaps changing the order of visitation at each round: (55) for ß*; (56), (57), and (58) for the family-specific marked effects, background effects, and chromosome effects, respectively; (59) and (60) for the between- and within-family variances of the background effects; (62) for the autoregressive parameter, after incorporating the Metropolis step; (65) for the nc variances of within-chromosome deviates; and (66) for K, the covariance matrix of chromosome effects. This constitutes a single completed iteration of the scheme. At each subsequent loop, update the values of the appropriate conditioning values with the new draws.
- Repeat the iteration as many times as needed to ensure (a) that draws can be reasonably claimed as belonging to the posterior distribution and (b) that posterior features are estimated with a sufficiently small Monte Carlo error.
- If necessary, discard early iterations, in what is called the "burn-in" period.
- Using the remaining samples, estimate any feature of the posterior distribution, e.g., a posterior mean or variance, a posterior density, or the distribution of an order statistic.
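The steps listed above can be sketched on a deliberately simplified toy model in which each record is a family effect plus a residual, with zero-mean normal family effects and scaled inverted chi-square priors on the two variances. This is far simpler than the hierarchical model of the text (no markers, chromosome effects, or Metropolis step), and all names and default hyperparameter values are assumptions for illustration only.

```python
import random


def scaled_inv_chi2(scale_sum, df, rng):
    """Draw scale_sum divided by a central chi-square deviate on df degrees of freedom."""
    return scale_sum / rng.gammavariate(df / 2.0, 2.0)


def gibbs_one_way(y, n_iter=1500, burn_in=500,
                  s2_f=1.0, nu_f=4.0, s2_w=1.0, nu_w=4.0, seed=1):
    """Toy Gibbs sampler; y is a list of families, each a list of records."""
    rng = random.Random(seed)
    n_fam = len(y)
    n_obs = sum(len(fam) for fam in y)
    # Starting values.
    delta = [0.0] * n_fam
    sig2_f, sig2_w = 1.0, 1.0
    kept = []
    for it in range(n_iter):
        # Family effects: independent univariate normal conditionals.
        for i, fam in enumerate(y):
            prec = len(fam) / sig2_w + 1.0 / sig2_f
            mean = (sum(fam) / sig2_w) / prec
            delta[i] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # Between-family variance: scaled inverted chi-square conditional.
        ss_f = sum(d * d for d in delta)
        sig2_f = scaled_inv_chi2(ss_f + s2_f * nu_f, n_fam + nu_f, rng)
        # Within-family variance: scaled inverted chi-square conditional.
        ss_w = sum((obs - delta[i]) ** 2
                   for i, fam in enumerate(y) for obs in fam)
        sig2_w = scaled_inv_chi2(ss_w + s2_w * nu_w, n_obs + nu_w, rng)
        # Discard the burn-in period; keep the remaining draws.
        if it >= burn_in:
            kept.append((sig2_f, sig2_w))
    # Posterior means estimated from the retained samples.
    return {
        "sigma2_f": sum(k[0] for k in kept) / len(kept),
        "sigma2_w": sum(k[1] for k in kept) / len(kept),
        "n_kept": len(kept),
    }
```

Any other posterior feature (a variance, a density estimate, an order statistic) could be computed from the retained draws in the same way as the posterior means returned here.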
To illustrate the flexibility of the approach, consider inferring the Lorenz curve and the Gini coefficient in (36) and (37), respectively, and contrast this with the crude methods discussed in Differential contribution of marked effects to genetic variability. Let $\gamma^{(k)}$ be a draw from the posterior distribution of the marked effects in the simplest model. Then, as in (34), $\mu_j^{(k)} = m'_{\text{unique},j}\gamma^{(k)}$ is a draw from the posterior distribution of $\mu_j$, and $S(\gamma^{(k)})$ is a sample from the posterior distribution of $S(\gamma)$ in (35). Next, order all the $\mu_j^{2(k)}$, so that

is a sample from the posterior distribution of the ordered squared "unique" marked effects. Then

is a sample from the posterior distribution of the Lorenz curve in (36). Likewise,

is a sample from the posterior distribution of the Gini coefficient in (37), measuring unequal contribution of marked effects to genetic variability. The Bayesian analysis produces an entire description of the uncertainty about the Lorenz curve and the Gini coefficient.
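A minimal sketch of this post-processing is given below. It assumes the usual discrete Lorenz ordinates (cumulative shares of the ordered squared effects) and a Gini coefficient computed as one minus twice the trapezoidal area under the Lorenz curve; these may differ in detail from (36) and (37). The names `gamma_draws` (posterior draws of the marked effects) and `unique_genotypes` (rows of marker codes for the unique genotypes) are hypothetical.

```python
def lorenz_and_gini(squared_effects):
    """Lorenz ordinates and Gini coefficient for nonnegative values (sum > 0)."""
    ordered = sorted(squared_effects)
    total = sum(ordered)
    n = len(ordered)
    lorenz, running = [], 0.0
    for value in ordered:
        running += value
        lorenz.append(running / total)      # cumulative share up to this value
    # Twice the trapezoidal area under the Lorenz curve, starting from 0.
    twice_area = sum(prev + cur
                     for prev, cur in zip([0.0] + lorenz[:-1], lorenz)) / n
    return lorenz, 1.0 - twice_area


def posterior_gini_draws(gamma_draws, unique_genotypes):
    """One Gini value per posterior draw of the marked effects."""
    ginis = []
    for gamma in gamma_draws:
        mu = [sum(m * g for m, g in zip(row, gamma)) for row in unique_genotypes]
        _, gini = lorenz_and_gini([v * v for v in mu])
        ginis.append(gini)
    return ginis
```

The list returned by `posterior_gini_draws` can be summarized like any other set of MCMC samples, for example with a mean, quantiles, or a histogram, which gives the description of uncertainty mentioned in the text.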
Draws from the posterior distribution of the "total" genetic value (in the simplest model) would be obtained as $M\gamma^{(k)} + \delta^{(k)}$, and posterior means, variances, and covariances can be computed directly from such samples.
| DISCUSSION |
|---|
Several methods for incorporating molecular information into predictors of genetic merit of candidates for selection in improvement programs have been proposed in recent years.
Another difficulty with the least-squares method arises when the number of markers is almost as large as the number of individuals typed. For example, >1.5 million single-nucleotide polymorphisms (the current desideratum of genetic marker) have been identified in the human genome and their positions located at an average spacing of $2 \times 10^{-3}$ cM.
In practice, not all individuals can be typed for all markers. Further, if typing costs continue to fall, the number of predictor variables (markers) may grow faster than the number of observations. This situation is encountered routinely in animal breeding, and it can be solved by positing random-effect models. The BLUP methods used in animal breeding can be viewed as based on conditional or posterior distributions.
The classical and Bayesian methods outlined here do not accommodate complex genealogical structures such as those arising in animal breeding. To illustrate, consider the hypothetical situation in which the markers and the QTL are identical, so that there are no background effects if gene action is additive. Hence, in the notation of A MIXED-EFFECTS MODEL FORMULATION, $a_i = \sum_{k=1}^{l} m_{ik}\gamma_k$, and, therefore, one would expect a priori that the covariance between the genetic value of any two relatives $i$ and $i'$ is equal to

In the absence of knowledge of genotype frequencies, the term E(m'im'i) cannot be assessed. In a fully Bayesian method one needs to assign a prior to the genotypic frequencies and take the MCMC method even further (e.g., ![]()
Model (38) introduces a family structure, and (47) expands the model into four parts: (1) $\bar{M}_i\gamma_B$, the $f_i \times 1$ vector of unobserved (contrary to least-squares fitted values) mean molecular scores for all members of family $i$ (all its elements are equal); (2) $(M_i - \bar{M}_i)\gamma_W$, the within-family contributions to the molecular scores; (3) $\delta_i$, or background genetic effect of family $i$ (e.g., the additive genetic contribution, net of the marked part of the variability); and (4) $w_i$, a vector of within-family deviations (also net of the marked part of the variability). The mean of the posterior distribution of the total genetic merit of the individuals in family $i$ would be obtained by averaging $\bar{M}_i\gamma_B^{(k)} + (M_i - \bar{M}_i)\gamma_W^{(k)} + 1_i\delta_i^{(k)}$ over the MCMC samples. Similarly, the posterior mean of the within-family deviations can be estimated by averaging $w_i^{(k)} = y_i - X_i\beta^{(k)} - M_i\gamma_i^{(k)} - 1_i\delta_i^{(k)}$.

where hF
and hF
are some nonstochastic relative weights assigned to the unobserved molecular and quantitative trait family means, respectively, and hw
, hw
are corresponding weights assigned to the unobserved within-family deviations. The means, variances, covariances, and any feature of a posterior distribution involving the unobserved total merits Ti can be estimated from the MCMC samples. The procedure takes the ideas of ![]()
![]()
We have assumed throughout that all individuals have been typed for all markers, but this is seldom the case, as noted earlier. In the Bayesian context missing markers can be dealt with automatically via an augmentation of the posterior distribution. Subsequently, MCMC is used to make imputations for the unobserved part of the molecular information, at least if missingness is at random in the sense of ![]()
![]()
It remains to be seen to what extent the proposed procedures hold in practice or in simulations what they seem to promise on theoretical grounds. In a simulation, ![]()
![]()
![]()
| ACKNOWLEDGMENTS |
|---|
The authors thank Rohan Fernando, Davorka Gulisija, and Guilherme Rosa for useful comments. Research was supported by the Wisconsin Agriculture Experiment Station and by grants from the NRICGP/U.S. Department of Agriculture (99-35205-8162) and National Science Foundation (DEB-0089742).
Manuscript received May 3, 2002; Accepted for publication September 27, 2002.
| LITERATURE CITED |
|---|
AGUILAR-GUTIERREZ, X., 2000 Desigualdad y Pobreza en México, son inevitables? Instituto de Investigaciones Económicas, Universidad Nacional Autónoma de México, Mexico City.
BERNARDO, J. M., 1979 Reference posterior distributions for Bayesian inference. J. R. Stat. Soc. B 41:113-147.
BERNARDO, J. M., and A. F. M. SMITH, 1994 Bayesian Theory. John Wiley & Sons, New York.
BOX, G. E. P., 1980 Sampling and Bayes' inference in scientific modelling and robustness. J. R. Stat. Soc. A 143:383-430.
BOX, G. E. P., and G. C. TIAO, 1973 Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA.
CANTET, R. J. C., R. L. FERNANDO, and D. GIANOLA, 1992 Bayesian inference about dispersion parameters of univariate mixed models with maternal effects: theoretical considerations. Genet. Sel. Evol. 24:107-135.
CARON, H., B. VAN SCHAIK, M. VAN DER MEE, F. BAAS, and G. RIGGINS et al., 2001 The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291:1289-1292.
COWLES, M. K. and B. P. CARLIN, 1996 Markov chain Monte Carlo convergence diagnostics: a comparative review. J. Am. Stat. Assoc. 91:883-904.
FARNIR, F., W. COPPIETERS, J. J. ARRANZ, P. BERZI, and N. CAMBISANO et al., 2000 Extensive genome-wide linkage disequilibrium in dairy cattle. Genome Res. 10:220-227.
FERNANDO, R. L. and M. GROSSMAN, 1989 Marker-assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.
FISHER, R. A., 1921 On the probable error of a coefficient of correlation deduced from a small sample. Metron 4:3-32.
GASTWIRTH, J. L., 1971 A general definition of Lorenz curve. Econometrica 39:1037-1039.
GASTWIRTH, J. L., 1972 The estimation of the Lorenz curve. Rev. of Econ. Stat. 54:306-316.
GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 1995 Bayesian Data Analysis. Chapman & Hall, London.
GEORGES, M. D., D. NIELSEN, M. MACKINNON, A. MISHRA, and R. OKIMOTO et al., 1995 Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics 139:907-920.
GIANOLA, D., 1990 Can BLUP and REML be improved upon?, pp. 445-449 in Proceedings of the 4th World Congress on Genetics Applied to Livestock Production, Vol. XIII. Joyce Darling, Penicuik, UK.
GIANOLA, D. and R. L. FERNANDO, 1986 Bayesian methods in animal breeding theory. J. Anim. Sci. 63:217-244.
GIANOLA, D. and B. GOFFINET, 1982 Sire evaluation with best linear unbiased predictors. Biometrics 38:1085-1088.
GIANOLA, D., S. IM and F. W. MACEDO, 1990 A framework for prediction of breeding value, pp. 210-238 in Advances in Statistical Methods for Genetic Improvement of Livestock, edited by D. GIANOLA and K. HAMMOND. Springer-Verlag, Heidelberg.
GILKS, W. R., S. RICHARDSON and D. J. SPIEGELHALTER, 1996 Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
HARTL, D. L., and E. W. JONES, 2002 Essential Genetics: A Genomics Perspective. Jones & Bartlett, Sudbury, MA.
HARVILLE, D. A., 1976 Extension of the Gauss-Markov theorem to include the estimation of random effects. Ann. Stat. 4:384-395.
HAYES, B. and M. E. GODDARD, 2001 The distribution of the effects of genes affecting quantitative traits in livestock. Genet. Sel. Evol. 33:209-229.
HAZEL, L. N., 1943 The genetic basis for constructing selection indexes. Genetics 28:476-490.
HENDERSON, C. R., 1973 Sire evaluation and genetic trends, pp. 10-41 in Proceedings of the Animal Breeding and Genetics Symposium in Honor of Dr. Jay L. Lush. American Society of Animal Science and American Dairy Science Association, Champaign, IL.
HENDERSON, C. R., O. KEMPTHORNE, S. R. SEARLE, and C. M. VON KROSIGK, 1959 The estimation of environmental and genetic trends from records subject to culling. Biometrics 15:192-218.
HOERL, A. E. and R. W. KENNARD, 1970 Ridge regression: biased estimation and applications for non-orthogonal problems. Technometrics 12:55-82.
HOESCHELE, I., 2001 Mapping quantitative trait loci in outbred pedigrees, pp. 599-644 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, Chichester, UK.
HWANG, J. T. and D. NETTLETON, 2002 Investigating the probability of sign inconsistency in the regression coefficients of markers flanking quantitative trait loci. Genetics 160:1697-1705.
JENSEN, J., C. S. WANG, D. A. SORENSEN, and D. GIANOLA, 1994 Bayesian inference on variance and covariance components for traits influenced by maternal and direct genetic effects using the Gibbs sampler. Acta Agric. Scand. 44:193-201.
JUDGE, G. G., W. E. GRIFFITHS, R. C. HILL, H. LUTKEPOHL and T. C. LEE, 1985 The Theory and Practice of Econometrics. John Wiley & Sons, New York.
KORSGAARD, I. R., A. H. ANDERSEN, and D. SORENSEN, 1999 A useful reparameterisation to obtain samples from conditional inverse Wishart distributions. Genet. Sel. Evol. 31:177-181.
LANDE, R. and R. THOMPSON, 1990 Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743-756.
LANGE, C. and J. C. WHITTAKER, 2001 On prediction of genetic values in marker-assisted selection. Genetics 159:1375-1381.
LINDLEY, D. V. and A. F. M. SMITH, 1972 Bayes estimates for the linear model. J. R. Stat. Soc. B 34:1-41.
METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER, and E. TELLER, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1091.
MEUWISSEN, T. H. E., B. J. HAYES, and M. E. GODDARD, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819-1829.
OLLIVIER, L., 1998 The accuracy of marker-assisted selection for quantitative traits in populations in linkage equilibrium. Genetics 148:1367-1372.
ROBERT, C. P., and G. CASELLA, 1999 Monte Carlo Statistical Methods. Springer-Verlag, New York.
RUBIN, D. B., 1976 Inference and missing data. Biometrika 63:581-592.
SEARLE, S. R., 1974 Prediction, mixed models and variance components, pp. 229-266 in Reliability and Biometry, edited by F. PROSCHAN and R. J. SERFLING. Society for Industrial and Applied Mathematics, Philadelphia.
SEARLE, S. R., 1982 Matrix Algebra Useful for Statistics. John Wiley & Sons, New York.
SEARLE, S. R., G. CASELLA and C. E. MCCULLOCH, 1992 Variance Components. John Wiley & Sons, New York.
SMITH, C. and S. P. SIMPSON, 1986 The use of genetic polymorphisms in livestock improvement. J. Anim. Breed. Genet. 103:205-217.
SMITH, H. F., 1936 A discriminant function for plant selection. Ann. Eugen. 7:240-250.
SOLLER, M. and J. S. BECKMANN, 1983 Genetic polymorphisms in varietal identification and genetic improvement. Theor. Appl. Genet. 67:25-33.
SORENSEN, D., and D. GIANOLA, 2002 Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer-Verlag, New York.
SORENSEN, D., C. S. WANG, J. JENSEN, and D. GIANOLA, 1994 Bayesian analysis of genetic change due to selection using Gibbs sampling. Genet. Sel. Evol. 26:333-360.
STEIN, C., 1955 Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, pp. 192-206 in Proceedings of the Third Berkeley Symposium on Mathematics, Statistics, and Probability. University of California Press, Berkeley.
TANNER, M. A., 1993 Tools for Statistical Inference. Springer-Verlag, New York.
THEIL, H., 1971 Principles of Econometrics. John Wiley & Sons, New York.
URIOSTE, J. I., D. GIANOLA, R. REKAYA, W. F. FIKSE, and K. A. WEIGEL, 2001 Evaluation of extent and amount of heterogeneous variance for milk yield in Uruguayan Holsteins. Anim. Sci. 72:259-268.
VERBEKE, G., and G. MOLENBERGHS, 1997 Linear Mixed Models in Practice: A SAS Oriented Approach. Springer-Verlag, New York.
VERBEKE, G., and G. MOLENBERGHS, 2000 Linear Mixed Models for Longitudinal Data. Springer, New York.
WANG, C. S., J. J. RUTLEDGE, and D. GIANOLA, 1993 Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25:41-62.
WANG, C. S., J. J. RUTLEDGE, and D. GIANOLA, 1994 Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26:91-115.
WANG, T., R. L. FERNANDO, and M. GROSSMAN, 1998 Genetic evaluation by best linear unbiased prediction using marker and trait information in a multibreed population. Genetics 148:507-515.
WEIGEL, K. A., D. GIANOLA, R. J. TEMPELMAN, C. A. MATOS, and I. H. C. CHEN et al., 1991 Improving estimates of fixed effects in a mixed linear model. J. Dairy Sci. 74:3174-3182.
WHITTAKER, J. C., 2001 Marker-assisted selection and introgression, pp. 673-693 in Handbook of Statistical Genetics, edited by D. J. BALDING, M. BISHOP and C. CANNINGS. John Wiley & Sons, Chichester, UK.
WHITTAKER, J. C., R. THOMPSON, and P. M. VISSCHER, 1996 On the mapping of QTL by regression of phenotype on marker-type. Heredity 77:23-32.
WHITTAKER, J. C., R. THOMPSON, and M. C. DENHAM, 2000 Marker-assisted selection using ridge regression. Genet. Res. 75:249-252.
XU, S., 1998 Mapping quantitative trait loci using multiple families of line crosses. Genetics 148:517-524.
ZELLNER, A., and W. VANDAELE, 1975 Bayes-Stein estimators for k-means, regression and simultaneous equation models, pp. 627-653 in Studies in Bayesian Econometrics and Statistics, edited by S. FIENBERG and A. ZELLNER. North-Holland, Amsterdam.
ZENG, Z-B., 1993 Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90:10972-10976.