- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.106.060673v1
174/3/1613 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Gianola, D.
- Articles by Simianer, H.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Gianola, D.
- Articles by Simianer, H.
Originally published as Genetics Published Articles Ahead of Print on September 15, 2006.
Genetics, Vol. 174, 1613-1624, November 2006, Copyright © 2006
doi:10.1534/genetics.106.060673
A Thurstonian Model for Quantitative Genetic Analysis of Ranks: A Bayesian Approach
Daniel Gianola*,
,
,1 and
Henner Simianer
* Department of Animal Sciences, University of Wisconsin, Madison, Wisconsin 53706,
Institute of Animal Breeding and Genetics, Georg-August-University, 37075 Göttingen, Germany and
Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway
1 Corresponding author: Department of Animal Sciences, University of Wisconsin, 1675 Observatory Dr., Madison, WI 53706.
E-mail: gianola{at}ansci.wisc.edu
>ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
A fully Bayesian method for quantitative genetic analysis of data consisting of ranks of, e.g., genotypes, scored at a series of events or experiments is presented. The model postulates a latent structure, with an underlying variable realized for each genotype or individual involved in the event. The rank observed is assumed to reflect the order of the values of the unobserved variables, i.e., the classical Thurstonian model of psychometrics. Parameters driving the Bayesian hierarchical model include effects of covariates, additive genetic effects, permanent environmental deviations, and components of variance. A Markov chain Monte Carlo implementation based on the Gibbs sampler is described, and procedures for inferring the probability of yet to be observed future rankings are outlined. Part of the model is rendered nonparametric by introducing a Dirichlet process prior for the distribution of permanent environmental effects. This can lead to potential identification of clusters of such effects, which, in some competitions such as horse races, may reflect forms of undeclared preferential treatment.
LATENT variable models for describing mechanisms by which some continuous scale maps into ordered or unordered categories of response have a long tradition in quantitative genetics and psychometry. For example, the threshold-liability model for ordinal categories dates back to WRIGHT (1934), DEMPSTER and LERNER (1950), GRÜNEBERG (1952), and FALCONER (1965); see GIANOLA (1982) for a review. Extremal models were pioneered in psychology by THURSTONE (1927) and were adapted by BOCK and JONES (1968) to explain choices between unordered alternatives.
Suppose that an observation is an assignment into one of M mutually exclusive and exhaustive categories of response. The classical Thurstonian extremal model postulates the existence of some latent or unobserved continuous valued vector
, such that category i is observed when yi is larger than the other M – 1 elements of the vector. The probability of observing an assignment into category M, say, is given by
![]() |
, where y[1] is the liability of the individual ranked first, and so on. In particular, TAVERNIER (1991) described an empirical Bayes procedure for inferring features of the latent distribution, such as breeding value of racing horses or competing ability of each of the members of a list of competitors. The approach of TAVERNIER (1991) is based on calculating, iteratively, a conditional posterior mode, given some values of dispersion parameters; these, in turn, are estimated using an approximation to marginal maximum-likelihood estimation. Her procedure uses statistical and numerical approximations that are not needed in a Bayesian implementation based on Markov chain Monte Carlo (MCMC) sampling. The objective of this study is to present theory for a fully Bayesian method for quantitative genetic analysis of data consisting of ranks scored at each of a series of runs, experiments, or events. The method uses a Thurstonian representation of the mechanism by which an underlying continuum is translated into observed ranks. The parameters driving the Bayesian hierarchical model include additive genetic and permanent environmental effects, and the MCMC implementation allows easy computation of probabilities of yet to be observed runs. The article is organized as follows. PROBABILITY DISTRIBUTION OF OBSERVED RANKS outlines the Thurstonian concept and gives the sampling model for the observed data. In BAYESIAN STRUCTURE, a hierarchical specification is outlined. CONDITIONAL MAXIMUM A POSTERIORI ESTIMATION reviews the empirical Bayes methodology suggested by TAVERNIER (1991) and discusses potential shortcomings of this type of treatment of data. FULLY BAYESIAN ANALYSIS gives details of the MCMC scheme proposed and presents procedures for inference and prediction. Some assumptions are relaxed in NONPARAMETRIC DISTRIBUTION FOR EFFECTS. Specifically, the assumption of a normal distribution of permanent environmental deviates is replaced by a Dirichlet process, using a normal distribution as baseline probability measure; a Gibbs sampling scheme is presented. The article concludes with a discussion.
ABSTRACT
>PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
The model for the unobserved liability of individual j
winning event k
is the standard specification of quantitative genetics,
![]() | (1) |
is a random residual, distributed independently of all terms in the model, within and across subjects; and
is the residual variance. In (1),
. The liabilities are not observable, so it is assumed that
= 1 for parameter identification purposes. Hence, all effects are measured in units of residual standard deviation.
The liabilities of the "competitors" present in event k can be packed into the nk x 1 vector lk, and the corresponding vectorial representation is
![]() | (2) |
is a residual vector, and
is an identity matrix of order nk; and Xk, Zak, Zpk, and Zgk are incidence matrices of appropriate order. Vertical concatenation of all liabilities into a
vector l leads to the representation
![]() | (3) |
and
.
Prior to the competition, and conditionally on β, a, p, and g, the joint density of the yet to be realized liabilities, in view of the mutual independence of the residuals, is
![]() | (4) |
denotes the density of the normally distributed liability ljk, with mean µjk and variance 1 and
is a set of indicator variables; i.e.,
jk = 1 if competitor j is present in event k and 0 otherwise.
Consider competition k, such that the "winner" is the individual with the largest liability among the nk competitors. An ordered variable is denoted as
, which is the realized liability of individual j attaining rank = 1, 2, ... , nk in the event; for instance, if a competitor whose identification is 20 places second in the event, then the corresponding liability is denoted as
. Following HENERY (1981) and TAVERNIER (1991), the probability that a given order or ranking is observed in event k
is given by
![]() | (5) |
, is
![]() | (6) |
ABSTRACT
PROBABILITY DISTRIBUTION OF...
>BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
, a vector of variance parameters. Assume that the density of the joint prior distribution of
and
has the form
![]() | (7) |
, and
denote multivariate normal densities of orders F, Q, C, and K, respectively;
is a variance reflecting uncertainty about location parameters in β (it is assumed known and large); A is the numerator additive relationship matrix (HENDERSON 1976) and
is the additive genetic variance;
is the variance of permanent environmental effects and
is the variance between events. In addition,
is the density of a scaled inverted chi-square distribution on
i degrees of freedom, with
interpretable as a prior guess for
, and H =
is a set of known hyperparameters.
The density of the joint posterior distribution of
and
is proportional to the product of (6) and (7):
![]() | (8) |
ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
>CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
is used to obtain an approximate maximum marginal-likelihood estimate of
(GIANOLA et al. 1986; FOULLEY et al. 1987),
, say. In the second stage, the joint mode of the conditional posterior distribution
is used to obtain point estimates of
(GIANOLA and FOULLEY 1983) given
, or "conditional maximum a posteriori" (MAP). If the distribution
is exactly Gaussian, its posterior mode yields directly the posterior means of each of the elements of
(GIANOLA and FERNANDO 1986). If, in addition, a flat prior is used for β, the posterior mean vector of
yields the best linear unbiased predictor of β and the best linear unbiased predictors of a, p, and g. Also, if a flat prior is assigned to β and to each of the elements of
, under Gaussian assumptions the elements of the mode of the marginal distribution
coincide with the restricted maximum-likelihood estimates of the variance components (HARVILLE 1974).
TAVERNIER (1990, 1991) used this type of approach for finding the
-maximizer of the log of the conditional posterior density
, assuming a flat prior for β. The objective function for the model in (1), assuming a normal prior for β, is
![]() | (9) |
. The gradient vector and the Hessian matrix needed for the Newton–Raphson machinery are
![]() |
![]() |
![]() |
Aside from these technical issues, the procedure relies on statistical approximations that may not be always adequate. For instance, in a high-dimensional, nonnormal, multivariate distribution, the components of the mode of the joint posterior density may be far from the corresponding posterior expectations. Also, computing a measure of precision of estimates of
is a task by itself: a Gaussian approximation to the conditional (given
) posterior covariance matrix is obtained by inverting the negative of the Hessian matrix and by evaluating this inverse at the converged values of
. The diagonal elements of this inverse give a large sample approximation to the conditional posterior variances of the elements of
. Last but not least, the procedure for estimating parameters in
is also based on Gaussian approximations and, furthermore, the approximate modal inferences about
do not take into account the uncertainty of the approximate estimates of
. All these difficulties can be circumvented via MCMC procedures, as discussed in the following sections.
ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
>FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
Joint posterior distribution:
The Bayesian structure given above is maintained, but the parameter vector is augmented with the unobserved liabilities as latent variables (TANNER and WONG 1987; SORENSEN et al. 1995). A main difference with a standard latent variable problem is that, in our setting, the sample space of the liabilities is order constrained, within events.
After augmentation with the liabilities l, the joint posterior density is expressible as
![]() | (10) |
denotes the order of liabilities, or ranking attained, observed in competition k. For example, if competitors 1, 5, 7, and 14 participate in event 20, and the ranking observed is 5 > 14 > 1 > 7, then, following the notation employed in (10),
.
Markov chain Monte Carlo sampling:
A Gibbs sampler is described for effecting draws from the joint posterior distribution (10). This is a well-known procedure for sampling from joint distributions; see, e.g., GILKS et al. (1996) and ROBERT and CASELLA (2005). Briefly, the Gibbs sampler loops through all conditional posterior distributions, with uncertain quantities sampled either blockwise or piecewise. All samples from the fully conditionals are accepted with probability 1. Samples from early iterations are discarded as "burn in"; subsequently, at the end of each cycle of sampling, the coordinates of sample
are regarded as draws from the corresponding marginal posterior distributions. The chain can be thinned, such that samples are as lowly correlated (serially) as desired. If storage capacity is not a limiting factor, then it is better to use all samples in the post-Gibbs analysis. Convergence diagnostics and processing issues are discussed in COWLES and CARLIN (1996), SORENSEN and GIANOLA (2002), and GELMAN et al. (2003).
Sampling of liabilities:
The form of (10) indicates that the conditional posterior distribution of the liabilities in event k has the form
![]() | (11) |
,
,
, and DATA. Note that, conditionally on ELSE, the liabilities at different events are mutually independent. However, within event k, liabilities are not independent, because of the sample space constraint imposed by knowledge of its outcome, represented by Ik. The dependence arises through the sample space, but the kernel of the conditional distribution of the nk liabilities in event k, given ELSE, is proportional to the product of the kernels under conditional independence. Sampling from the space-constrained distribution is straightforward:- Draw "independently" the nk liabilities of competitors in event k (whose indexes are in Ik) from the distributions
, j
Ik.
- If the liabilities sampled satisfy the order constraint in Ik, then accept the vector lk as a draw from the posterior distribution. Otherwise, reject the proposal and repeat the sampling until a valid draw is effected.
- If the liabilities sampled satisfy the order constraint in Ik, then accept the vector lk as a draw from the posterior distribution. Otherwise, reject the proposal and repeat the sampling until a valid draw is effected.
This procedure, although simple, is typically inefficient and it may have a large rejection rate. A more efficient sampling technique, described by DEVROYE (1986), is given in more detail in SORENSEN and GIANOLA (2002). To illustrate, let the outcome of competition 20 be as in the example above; that is,
. Hence, in event 20, the sample space of the liabilities is l7,20 < l1,20 < l14,20 < l5,20. To sample the four liabilities, one proceeds from either first to last or last to first (the liability of the winner can be set equal to zero, as stated above). For instance:
- Sample l7,20 from
without any constraint. This is the realized value of the liability of the last competitor.
- Sample l1,20 from the truncated normal distribution
, where the parameters are as in the absence of truncation. The realized value is formed as 
- where
is the standard normal distribution function and U is a uniform random number in (0,1).
- Sample l14,20 from the truncated normal process
, forming the realized value as 
- Sample the liability of the winner from
as 
- Sample l1,20 from the truncated normal distribution
Once the K events are processed independently, this produces a realization of the
vector of liabilities at iteration s,
, which becomes the "data" in model (3) at iteration s = 1, 2, ... , S of the Gibbs sampler.
Sampling of parameters:
Given the liabilities, the conditional posterior density of all parameters can be deduced from (10) after fixing the vector l. This yields
![]() | (12) |
and
are in closed form (WANG et al. 1993, 1994), as follows. The location parameters
can be sampled from the multivariate normal process
![]() | (13) |
![]() |
![]() |
All conditional posterior distributions of the dispersion parameters are scaled inverted chi square and are mutually independent. The distributions are
![]() | (14) |
![]() | (15) |
![]() | (16) |
The Gibbs sampling algorithm consists of drawing liabilities from distributions (11), such that the order constraints are satisfied, followed by draws from (13) and (14)–(16) to obtain parameter updates. The sampling process is repeated as needed, after convergence diagnostics are satisfied, and such that the Monte Carlo error of estimation of features of the posterior distribution is small enough.
Inference:
The procedure of TAVERNIER (1991) produces conditional posterior modes of location parameters, as already noted, but can also be used to estimate the probability of observing a certain outcome, even in a hypothetical competition, although ignoring uncertainty about estimates. The fully Bayesian Markov chain Monte Carlo procedure is more flexible. First, it is possible to produce a complete posterior distribution for any uncertain quantity, e.g., the competing ability aj + pj of individual j or the additive genetic variance
. In addition, it is possible to estimate the probability of any event related to a future competition. Let If be an order associated with a yet to be realized event referred to as "future," with nf participants. The predictive probability distribution of If, posterior to historical data (DATA), is given by
![]() | (17) |
denotes expectation with respect to the distribution (.|.). The integral in (17) does not have a closed form, since the posterior distribution
is unknown. However, it can be estimated from the Markov chain Monte Carlo samples via ergodic averaging.
Let
, s = 1, 2, ... , S be the
-coordinate of a sample from the posterior distribution
, with density (10), and let
![]() |
. Note that, since event effects are assumed to be independent, DATA will not contain information about a future event effect gf. This does not pose a problem, as the posterior distribution can be augmented with this effect; however, DATA will convey information about
. A simulation-consistent estimator of (17) is
![]() | (18) |
is an indicator variable taking the value 1 if the ranking in If is attained in sample s and 0 otherwise. In practice, one samples liabilities for future event f from
![]() |
if so,
. This process is repeated for every Monte Carlo sample, leading directly to (18).
A similar procedure can be used to estimate the probability of any future event A, such as whether "participant" j is first, second, or third in a competition involving nf competitors; let such an event be denoted as A. Then, the probability of observing A in a future event can be estimated as
![]() |
as given above and checks whether or not A is realized, in which case
and 0 otherwise. ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
>NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
and
, respectively. However, it might be sensible to make less strong assumptions about these distributions. For instance, there may be clusters of events that are more similar to each other, i.e., clusters representing different, unknown, levels of competition. Likewise, there may be some competitors that receive similar "preferential" treatment. A suitable representation of uncertainty may be that furnished by the Dirichlet process prior (ESCOBAR 1994; MACEACHERN 1994; KLEINMAN and IBRAHIM 1998; VAN DER MERWE and PRETORIUS 2003). The case of the permanent environmental effects is discussed here. A priori, let the permanent effects pi be independently distributed as
![]() |
![]() |
being parameters of these process. For example, it could be that
is taken as "base prior," a distribution approximating the nonparametric shape of G, with M interpretable as a degree of belief on how close
is to G. The parameter M is such that, when
, then
; the same is true if all random effects are identical (KLEINMAN and IBRAHIM 1998). When G is mixed over the Dirichlet process, the Polya urn representation of the prior distribution of the random effects (KLEINMAN and IBRAHIM 1998) is, for
as base prior,
![]() |
![]() |
, the prior distribution tends toward the parametric process
, as noted earlier. The preceding implies that the conditional prior distribution of a permanent environmental effect (i, say), given all other such effects, is
![]() | (19) |
![]() |
) with point masses at pj
and of the parametric base distribution
. The mixing probabilities of states j
i are
for each of the C – 1 "conditioning" states, and
for the probability that pi is a draw from the normal process.
Given the liabilities, and assuming temporarily that M is assigned a fixed value, the joint posterior density of all uncertain quantities is obtained by modifying (12) into
![]() | (20) |
G denotes that pi follows the unknown distribution G drawn from a Dirichlet process with
as base measure and with known parameter M. Since the Dirichlet process allows pi to take any of the other p's as a current value, data from all individuals with records of performance contribute to the conditional posterior distribution of pi. One has as conditional posterior distribution
![]() | (21) |
![]() | (22) |
![]() | (23) |
![]() |
is the column of Zp in (3) pertaining to individual j, and
![]() |
![]() | (24) |
density. Hence, with li being the ri x 1 vector of liabilities of the individual, and
, and
being incidence matrices relating li to β, a, and g, respectively, a standard integration yields
![]() | (25) |
![]() |
is a matrix of 1's of order ri. Employing (23) and (24) in (22) leads to
![]() |
![]() | (26) |
with point masses at pj
and of the parametric distribution
. Let now
![]() |
![]() | (27) |
. This means that values of pj that generate liabilities li with higher plausibility are assigned a higher probability of selection. Eventually, this leads to clustering of permanent environmental effects into "plausibility" groups.
The fully conditional posterior distributions of all other parameters can be readily deduced from (20) by fixing p as well as the liabilities
![]() | (28) |
![]() | (29) |
![]() |
![]() | (30) |
![]() | (31) |
![]() | (32) |
At each iteration of the Gibbs sampler, the procedure induces a clustering structure of permanent environmental effects, with each cluster consisting of individuals sharing the same value of pi. Following VAN DER MERWE and PRETORIUS (2003), let t be the number of unique permanent environmental effects (0 < t
C). The posterior distribution of t can be obtained readily from the sampling procedure. BUSH and MACEACHERN (1996) suggest embedding an optional additional step in the Gibbs sampling procedure: let the number of clusters identified at iteration m be
; this means that there are
distinct values of the p's, which are labeled as
. The model for the liabilities in (3) is then rewritten as
![]() |
is a condensation of Zp of order
obtained by assigning liabilities to the appropriate cluster of permanent environmental effects to which they belong. At that point of the scheme, KLEINMAN and IBRAHIM (1998) and VAN DER MERWE and PRETORIUS (2003) suggest drawing a sample of
from the process
![]() |
![]() |
![]() |
is obtained, say
, the permanent environmental effects are reconstituted as
, where
is a C x
matrix that allocates cluster values to permanent environmental effects. Then, the MCMC process continues for other parameters as usual.
Turn attention now to inferring M, the degree of belief parameter of the Dirichlet process. Using results in ANTONIAK (1974) and WEST (1992), but giving more details, as their developments are compact, the prior distribution of the number of clusters t may be written as
![]() |
does not involve M;
is the Gamma function. As pointed out by WEST (1992), if one has samples of t (see above), the conditional posterior distribution of M is expressible as
![]() |
![]() |
![]() |
![]() | (33) |
. In addition, the conditional posterior distribution of M is
![]() |
![]() |
![]() |
![]() |
the conditional posterior distribution becomes
![]() | (34) |
x and 1 –
x. Note that
![]() |
,
![]() |
![]() | (35) |
, and
. If a = 0 and b = 0, the Gamma prior degenerates to
or, equivalently, to
constant. If such an improper prior is adopted, the Rao–Blackwell estimator reduces to
![]() |
. While the conditional posterior distribution of M is well defined for a = b = 0, this does not guarantee that its marginal posterior distribution will be always proper. Hence, the uniform prior on
should be used with caution. ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
>CONCLUSION
ACKNOWLEDGEMENTS
LITERATURE CITED
In a study of competitions of jumping horses, TAVERNIER (1990) argued that a "physical" measure of performance (e.g., time, number of obstacles broken or refused) is not always available, and that it is unclear how to measure the technical difficulty of an event. Also, measures such as yearly earnings are available only for horses that earn money, so that the information comes from a selected sample, which may distort inference considerably. As discussed earlier, TAVERNIER (1990, 1991) proposed an approximate Bayesian method in which ranks are viewed as being the manifestation of latent variables, as in the present study. In the analysis, involving
19,000 horses participating in all jumping competitions that took place in France in 1987,
14% of the horses had an accuracy <0.14 ("accuracy" ranges between 0 and 1), and 28% did not reach 0.45. These accuracies are approximations and probably understate uncertainty, because the error of estimates of parameters is not taken into account in the calculations. Important numerical problems were encountered by TAVERNIER (1990). For each Newton–Raphson iteration, and for a race with n horses, she had to calculate the following multivariate normal integrals: (1) an integral of order n – 1, (2) n – 1 integrals of order n – 2, and (3)
integrals of order n – 3. For this reason, she used approximations to integrals based on Taylor series expansions. None of these integrals need to be computed explicitly or approximated in our procedure, as all calculations rely on Monte Carlo sampling. Another difficulty in the conditional maximum a posteriori method of TAVERNIER (1990, 1991) is the computation of the joint posterior mode, which, in the context of horse races in France may involve solving (reiteratively) a system exceeding 100,000 unknowns. In our procedure, solving of equations is replaced by effecting draws from conditional posterior distributions and, under the assumptions of this article, all these are recognizable. A potential difficulty may be slow mixing of the MCMC algorithm, but this can often be accelerated by either changing the algorithm or adopting a different parameterization. Introducing a Dirichlet process prior enhances the level of difficulty of the calculations, but it confers robustness with respect to possible departures from the normality assumption for the distributions of events or of permanent environmental effects. Further, the nonparametric treatment of permanent environmental effects is a flexible alternative to a thick-tailed distribution (STRANDÉN and GIANOLA 1998, 1999; ROSA et al. 2001, 2003, 2004), in the sense that it allows for asymmetry, as well as for allocation of effects into clusters with some biological meaning.
Here, it was assumed that event effects were equally distributed a priori, but there may be heterogeneity with respect to level of difficulty. This can be accommodated via hierarchical modeling. For instance, the prior distribution
could be modified into
, where
is a vector of unknown "levels of difficulty" effects, and M is a known matrix relating g to
; in turn,
could be assigned a vague prior. In their primate studies, JOHNSON et al. (2002) fitted a latent variable model including a genus effect and a "paradigm-genus" deviation; they found that the latter did not contribute to variability appreciably and concluded that taxa differed in domain-general ability. A nonparametric alternative would be to assign a Dirichlet process prior to the event effects g and let the data drive to clusters representing level of difficulty.
In the context of horse breeding, it may be that there are individuals whose genes predispose them to perform well in certain types of competition, but not in others; i.e., there may be differents sets of genes for different types of competition. Our model can be extended by assuming that each individual has a liability peculiar to each type of competition. Suppose there are three types of competition; here, there would be three liabilities per individual with a within-individual correlation matrix R0, say. All three liabilities would be realized only if the individual in question participates in the three types of events. A simpler model could postulate that liabilities for different events are independent, but that each individual possesses an additive genetic effect peculiar to the type of event. An additive genetic variance–covariance matrix G0 would be introduced and inferred as in a multivariate Bayesian mixed-effects model (SORENSEN and GIANOLA 2002).
In many sports events, e.g., races or marathons, there are no ties. However, ties can be handled in our procedure via a simple modification of the limits the liability variates can take. For example, consider a competition with four participants (5, 7, 14, and 20), such that the observed ranking is
here,
denotes a tie in second place between individuals 7 and 14. In the process of sampling the liabilities, it suffices to accept samples such that 5 and 20 take the largest and smallest values, respectively, with the liabilities 7 and 14 taking any order between those of 5 and 20; then, the ranking of 7 and 14 would vary at random over MCMC samples. Alternatively, one could assign the average of the two liabilities sampled for 7 and 14 in each of the MCMC iterations. JOHNSON et al. (2002) handled ties via a parametric model for the probability that two genera are tied, with one additional parameter (with a corresponding prior distribution) introduced in the implementation.
A useful extension of this model, at least in the context of evaluation of horse races, would be one for a joint analysis of ranks and of some "continuous" variate, such as time in the event or earnings. As mentioned above, there are situations in which the rank information is available for all competitors, but earnings are available only for the placed competitors. Clearly, earnings for the other competitors are not missing completely at random. However, if the conditional probability of missing earning information depends on the ranks, then missingness would be ignorable in a joint analysis of ranks and of earnings (RUBIN 1976; IM et al. 1989; SORENSEN et al. 2001). For example, such a model could postulate a joint normal distribution of the liabilities and of the continuous trait measured on some competitors.
ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
>ACKNOWLEDGEMENTS
LITERATURE CITED
ABSTRACT
PROBABILITY DISTRIBUTION OF...
BAYESIAN STRUCTURE
CONDITIONAL MAXIMUM A POSTERIORI...
FULLY BAYESIAN ANALYSIS
NONPARAMETRIC DISTRIBUTION FOR...
CONCLUSION
ACKNOWLEDGEMENTS
>LITERATURE CITED
ANTONIAK, C. E., 1974 Mixtures of Dirichlet processes with applications to nonparametric problems. Ann. Stat. 2: 1152–1174.[CrossRef]
BOCK, R. D., and L. V. JONES, 1968 The Measurement and Prediction of Judgment and Choice. Holden–Day, San Francisco.
BUSH, C. A., and S. N. MACEACHERN, 1996 A semiparametric Bayesian model for randomized block designs. Biometrika 83: 275–285.
COWLES, M. K., and B. P. CARLIN, 1996 Markov chain Monte Carlo convergence diagnostics: a comparative review. J. Am. Stat. Assoc. 91: 883–904.[CrossRef]
DEMPSTER, E. R., and I. M. LERNER, 1950 Heritability of threshold characters. Genetics 35: 212–236.
DEVROYE, L., 1986 Non-Uniform Random Variate Generation. Springer, New York.
ESCOBAR, M. D., 1994 Estimating normal means with a Dirichlet process prior. J. Am. Stat. Assoc. 89: 268–275.[CrossRef]
FALCONER, D. S., 1965 The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 29: 51–76.[CrossRef]
FERGUSON, T. S., 1973 A Bayesian analysis of some nonparametric problems. Ann. Stat. 1: 209–230.[CrossRef]
FOULLEY, J. L., S. IM, D. GIANOLA and I. HÖSCHELE, 1987 Empirical Bayes estimation of genetic value for n binary traits. Genet. Sel. Evol. 19: 197–224.[CrossRef]
GARCIA-CORTÉS, L. A., and D. SORENSEN, 1996 On a multivariate implementation of the Gibbs sampler. Genet. Sel. Evol. 28: 121–126.[CrossRef]
GELMAN, A., J. B. CARLIN, H. S. STERN and D. B. RUBIN, 2003 Bayesian Data Analysis, Ed. 2. Chapman & Hall, London.
GIANOLA, D., 1982 Theory and analysis of threshold characters. J. Anim. Sci. 54: 1079–1096.
GIANOLA, D., and R. L. FERNANDO, 1986 Bayesian methods in animal breeding theory. J. Anim. Sci. 63: 217–244.
GIANOLA, D., and J. L. FOULLEY, 1983 Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15: 201–223.[CrossRef]
GIANOLA, D., J. L. FOULLEY and R. L. FERNANDO, 1986 Prediction of breeding values when variances are not known. Genet. Sel. Evol. 18: 485–498.[CrossRef]
GILKS, W. R., S. RICHARDSON and D. J. SPIEGELHALTER, 1996 Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
GRÜNEBERG, H., 1952 Genetical studies on the skeleton of the mouse. IV. Quasi-continuous variations. J. Genet. 51: 95–114.
HARVILLE, D. A., 1974 Bayesian inference of variance components using only error contrasts. Biometrika 61: 383–385.
HENDERSON, C. R., 1976 A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32: 69–83.[CrossRef]
HENERY, R. J., 1981 Permutation probabilities as models for horse races. J. R. Stat. Soc. B 43: 86–91.
IM, S., R. L. FERNANDO and D. GIANOLA, 1989 Likelihood inferences in animal breeding under selection: a missing data theory viewpoint. Genet. Sel. Evol. 21: 399–414.[CrossRef]
JOHNSON, V. E., R. O. DEANER and C. P. VAN SCHAIK, 2002 Bayesian analysis of multi-study rank data with application to primate intelligence ratings. J. Am. Stat. Assoc. 97: 8–17.[CrossRef]
KLEINMAN, K. P., and J. G. IBRAHIM, 1998 A semiparametric Bayesian approach to the random effects model. Biometrics 54: 921–938.[CrossRef][Medline]
LAWRENCE, S., R. BEASLEY, I. DOULL, B. BEGISHVILI, F. LAMPE et al., 1994 Genetic analysis of atopy and asthma as quantitative traits and ordered polychotomies. Ann. Hum. Genet. 58: 359–368.[Medline]
MACEACHERN, S. N., 1994 Estimation of normal means with a conjugate style Dirichlet process prior. Comm. Stat. Sim. 23: 727–741.[CrossRef]
ROBERT, C. P., and G. CASELLA, 2005 Monte Carlo Statistical Methods, Ed. 2. Springer, New York.
ROSA, G. J. M., D. GIANOLA and J. I. URIOSTE, 2001 Assessing relationships between genetic evaluations using robust regression with an application to Holsteins in Uruguay. Acta Agric. Scand. Sect. A 51: 21–34.[CrossRef]
ROSA, G. J. M., C. R. PADOVANI and D. GIANOLA, 2003 Robust linear mixed models with normal/independent distributions and Bayesian MCMC implementation. Biom. J. 45: 573–590.[CrossRef]
ROSA, G. J. M., D. GIANOLA and C. R. PADOVANI, 2004 Bayesian longitudinal data analysis with mixed models and thick-tailed distributions using MCMC. J. Appl. Stat. 31: 855–873.[CrossRef]
RUBIN, D. B., 1976 Inference and missing data. Biometrika 63: 581–582.
SORENSEN, D., and D. GIANOLA, 2002 Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer–Verlag, New York.
SORENSEN, D., S. ANDERSEN, D. GIANOLA and I. KORSGAARD, 1995 Bayesian inference in threshold models using Gibbs sampling. Genet. Sel. Evol. 27: 229–249.[CrossRef]
SORENSEN, D. A., R. L. FERNANDO and D. GIANOLA, 2001 Inferring the trajectory of genetic variance in the course of artificial selection. Genet. Res. 77: 83–94.[CrossRef][Medline]
STRANDÉN, I., and D. GIANOLA, 1998 Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study. Genet. Sel. Evol. 30: 565–583.[CrossRef]
STRANDÉN, I., and D. GIANOLA, 1999 Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach. Genet. Sel. Evol. 31: 25–42.[CrossRef]
TANNER, M. A., and W. WONG, 1987 The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82: 528–550.[CrossRef]
TAVERNIER, A., 1990 Estimation of breeding value of jumping horses from their ranks. Livest. Prod. Sci. 26: 277–290.[CrossRef]
TAVERNIER, A., 1991 Genetic evaluation of horses based on ranks in competitions. Genet. Sel. Evol. 23: 159–173.[CrossRef]
THURSTONE, L. L., 1927 A law of comparative judgement. Psych. Rev. 34: 278–286.
VAN DER MERWE, A. J., and A. L. PRETORIUS, 2003 Bayesian estimation in animal breeding using the Dirichlet process prior for correlated random effects. Genet. Sel. Evol. 35: 137–158.[CrossRef][Medline]
WANG, C. S., J. J. RUTLEDGE and D. GIANOLA, 1993 Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25: 41–62.[CrossRef]
WANG, C. S., J. J. RUTLEDGE and D. GIANOLA, 1994 Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91–115.[CrossRef]
WEST, M., 1992 Hyperparameter estimation in Dirichlet process mixture models. Technical Report 92-A03. ISDS, Duke University, Durham, NC.
WRIGHT, S., 1934 An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19: 506–536.
Communicating editor: J. B. WALSH
- THIS ARTICLE
-
Abstract
- Full Text (PDF)
-
All Versions of this Article:
genetics.106.060673v1
174/3/1613 most recent - Alert me when this article is cited
- Alert me if a correction is posted
- SERVICES
- Email this article to a friend
- Similar articles in this journal
- Similar articles in PubMed
- Alert me to new issues of the journal
- Download to citation manager
- Reprints & Permissions
- CITING ARTICLES
- Citing Articles via Google Scholar
- GOOGLE SCHOLAR
- Articles by Gianola, D.
- Articles by Simianer, H.
- Search for Related Content
- PUBMED
- PubMed Citation
- Articles by Gianola, D.
- Articles by Simianer, H.





































































