## Abstract

A fully Bayesian method for quantitative genetic analysis of data consisting of ranks of, *e.g*., genotypes, scored at a series of events or experiments is presented. The model postulates a latent structure, with an underlying variable realized for each genotype or individual involved in the event. The rank observed is assumed to reflect the order of the values of the unobserved variables, *i.e*., the classical Thurstonian model of psychometrics. Parameters driving the Bayesian hierarchical model include effects of covariates, additive genetic effects, permanent environmental deviations, and components of variance. A Markov chain Monte Carlo implementation based on the Gibbs sampler is described, and procedures for inferring the probability of yet to be observed future rankings are outlined. Part of the model is rendered nonparametric by introducing a Dirichlet process prior for the distribution of permanent environmental effects. This can lead to potential identification of clusters of such effects, which, in some competitions such as horse races, may reflect forms of undeclared preferential treatment.

LATENT variable models for describing mechanisms by which some continuous scale maps into ordered or unordered categories of response have a long tradition in quantitative genetics and psychometry. For example, the threshold-liability model for ordinal categories dates back to Wright (1934), Dempster and Lerner (1950), Grüneberg (1952), and Falconer (1965); see Gianola (1982) for a review. Extremal models were pioneered in psychology by Thurstone (1927) and were adapted by Bock and Jones (1968) to explain choices between unordered alternatives.

Suppose that an observation is an assignment into one of *M* mutually exclusive and exhaustive categories of response. The classical Thurstonian extremal model postulates the existence of some latent or unobserved continuous-valued vector $\mathbf{y} = (y_1, y_2, \ldots, y_M)'$, such that category *i* is observed when *y _{i}* is larger than the other *M* − 1 elements of the vector. The probability of observing an assignment into category *M*, say, is given by

$$ \Pr(\text{category } M \mid \text{parameters}) = \Pr(y_M > y_i,\ i = 1, 2, \ldots, M - 1 \mid \text{parameters}), $$

where "parameters" are typically unknown quantities driving the process. The extremal model can also be applied to data consisting of ranks, presumably reflecting the outcome of some underlying continuous process, *e.g.*, competing ability. For instance, Henery (1981) and Tavernier (1991) used this idea in the context of horse races, and Johnson *et al.* (2002) for analysis of ranked data collected in primate intelligence experiments. Ranked data also arise in human genetics (*e.g.*, Lawrence *et al.* 1994). In Henery (1981) and Tavernier (1991), the setting is one where several individuals compete in a series of events (*e.g.*, horse races). Prior to the event, each individual generates an unobservable random variable, or "liability of winning the event," and these liabilities are regarded as mutually independent, given some parameters. Posterior to the event, the observed ranking corresponds to the order of the realized values of these liabilities. Clearly, the latent variables are no longer independent, *a posteriori*, because the ranking implies that $y_{[1]} > y_{[2]} > \cdots > y_{[M]}$, where $y_{[1]}$ is the liability of the individual ranked first, and so on. In particular, Tavernier (1991) described an empirical Bayes procedure for inferring features of the latent distribution, such as breeding value of racing horses or competing ability of each of the members of a list of competitors. The approach of Tavernier (1991) is based on calculating, iteratively, a conditional posterior mode, given some values of dispersion parameters; these, in turn, are estimated using an approximation to marginal maximum-likelihood estimation. Her procedure uses statistical and numerical approximations that are not needed in a Bayesian implementation based on Markov chain Monte Carlo (MCMC) sampling.

The objective of this study is to present theory for a fully Bayesian method for quantitative genetic analysis of data consisting of ranks scored at each of a series of runs, experiments, or events. The method uses a Thurstonian representation of the mechanism by which an underlying continuum is translated into observed ranks. The parameters driving the Bayesian hierarchical model include additive genetic and permanent environmental effects, and the MCMC implementation allows easy computation of probabilities of yet to be observed runs. The article is organized as follows. *Probability distribution of observed ranks* outlines the Thurstonian concept and gives the sampling model for the observed data. In *Bayesian structure*, a hierarchical specification is outlined. *Conditional maximum a posteriori estimation* reviews the empirical Bayes methodology suggested by Tavernier (1991) and discusses potential shortcomings of this type of treatment of data. *Fully Bayesian analysis* gives details of the MCMC scheme proposed and presents procedures for inference and prediction. Some assumptions are relaxed in *Nonparametric distribution for effects*. Specifically, the assumption of a normal distribution of permanent environmental deviates is replaced by a Dirichlet process, using a normal distribution as baseline probability measure; a Gibbs sampling scheme is presented. The article concludes with a discussion.

## PROBABILITY DISTRIBUTION OF OBSERVED RANKS

Suppose *n _{k}* individuals "compete" in event *k* and that the information available for analysis is that furnished by the ranks observed in the event. It is assumed that no ties are possible, but this restriction is easy to lift. As in Henery (1981) and Tavernier (1991), it is supposed that the observed ranking is a manifestation of the magnitudes of latent variables, liability of winning the event, which are specific to each of the individuals.

The model for the unobserved liability of individual *j* winning event *k* is the standard specification of quantitative genetics,

$$ l_{jk} = \mathbf{x}'_{jk}\boldsymbol{\beta} + a_j + p_j + g_k + e_{jk} \tag{1} $$

(Dempster and Lerner 1950; Falconer 1965; Gianola 1982; Gianola and Foulley 1983), where **β** is a set of *F* location effects affecting liability and **x**′_{jk} is a known incidence vector peculiar to individual *j* in event *k*; *a _{j}* and *p _{j}* are the additive genetic and permanent environmental effects, respectively, of individual *j* on liability; *g _{k}* is an effect peculiar to event *k* (which may or may not be needed in the model); $e_{jk} \sim N(0, \sigma^2_e)$ is a random residual, distributed independently of all terms in the model, within and across subjects; and $\sigma^2_e$ is the residual variance. In (1), $\mu_{jk} = \mathbf{x}'_{jk}\boldsymbol{\beta} + a_j + p_j + g_k$. The liabilities are not observable, so it is assumed that $\sigma^2_e = 1$ for parameter identification purposes. Hence, all effects are measured in units of residual standard deviation.
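To make the latent specification concrete, the sketch below simulates liabilities under a model of the form (1) and recovers a ranking from their realized values. All dimensions, effect values, and variances are hypothetical, and the residual variance is fixed at 1 as in the text; the fixed-effect term is collapsed into a scalar offset for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: Q individuals, K events
Q = 6
K = 4

# Hypothetical "true" effects (sigma_e^2 is fixed at 1 for identification)
a = rng.normal(0.0, 0.5, size=Q)   # additive genetic effects
p = rng.normal(0.0, 0.3, size=Q)   # permanent environmental effects
g = rng.normal(0.0, 0.2, size=K)   # event effects

def liability(j, k, xb=0.0):
    """Liability of individual j in event k: mu_jk + e_jk, with e_jk ~ N(0, 1)."""
    mu = xb + a[j] + p[j] + g[k]
    return mu + rng.normal(0.0, 1.0)

# Simulate event k = 0 with all Q individuals competing, then rank them
l = np.array([liability(j, 0) for j in range(Q)])
ranking = np.argsort(-l)  # individual indexes ordered from winner to last place
```

Only the ordering in `ranking` would be observed; the liabilities `l` themselves are latent.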

The liabilities of the "competitors" present in event *k* can be packed into the *n _{k}* × 1 vector **l**_{k}, and the corresponding vectorial representation is

$$ \mathbf{l}_k = \mathbf{X}_k\boldsymbol{\beta} + \mathbf{Z}_{ak}\mathbf{a} + \mathbf{Z}_{pk}\mathbf{p} + \mathbf{Z}_{gk}\mathbf{g} + \mathbf{e}_k, \tag{2} $$

where **a** is a *Q* × 1 vector of additive genetic effects of all potential participants or genetically related individuals; **p** is a *C* × 1 vector of permanent environmental effects (including perhaps nonadditive genetic deviations) of individuals with at least one record of competition; **g** is a *K* × 1 vector of event effects; $\mathbf{e}_k \sim N(\mathbf{0}, \mathbf{I}_{n_k})$ is a residual vector, and $\mathbf{I}_{n_k}$ is an identity matrix of order *n _{k}*; and **X**_{k}, **Z**_{ak}, **Z**_{pk}, and **Z**_{gk} are incidence matrices of appropriate order. Vertical concatenation of all liabilities into a vector **l** leads to the representation

$$ \mathbf{l} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}_p\mathbf{p} + \mathbf{Z}_g\mathbf{g} + \mathbf{e}, \tag{3} $$

where $\mathbf{X} = [\mathbf{X}'_1, \mathbf{X}'_2, \ldots, \mathbf{X}'_K]'$ and the matrices $\mathbf{Z}_a$, $\mathbf{Z}_p$, and $\mathbf{Z}_g$ are stacked correspondingly.

Prior to the competition, and conditionally on **β**, **a**, **p**, and **g**, the joint density of the yet to be realized liabilities, in view of the mutual independence of the residuals, is

$$ p(\mathbf{l} \mid \boldsymbol{\beta}, \mathbf{a}, \mathbf{p}, \mathbf{g}) = \prod_{k=1}^{K}\prod_{j=1}^{n_k} \left[\phi(l_{jk} \mid \mu_{jk}, 1)\right]^{\delta_{jk}}, \tag{4} $$

where $\phi(l_{jk} \mid \mu_{jk}, 1)$ denotes the density of the normally distributed liability *l _{jk}*, with mean $\mu_{jk}$ and variance 1, and $\{\delta_{jk}\}$ is a set of indicator variables; *i.e.*, $\delta_{jk} = 1$ if competitor *j* is present in event *k* and 0 otherwise.

Consider competition *k*, such that the "winner" is the individual with the largest liability among the *n _{k}* competitors. An ordered variable is denoted as $l_{[r]k}$, which is the realized liability of the individual attaining rank $r$ = 1, 2, … , *n _{k}* in the event; for instance, if a competitor whose identification is 20 places second in the event, then the corresponding liability is denoted as $l_{[2]k} = l_{20,k}$. Following Henery (1981) and Tavernier (1991), the probability that a given order or ranking is observed in event *k* is given by

$$ \Pr(l_{[1]k} > l_{[2]k} > \cdots > l_{[n_k]k} \mid \boldsymbol{\theta}). \tag{5} $$

This probability can also be expressed in terms of $n_k - 1$ contrasts between successive liabilities being >0. Since the latent variables are not observable, one can "anchor" the liability of the winner to 0, so that all other liabilities are negative; the mean liability of a "perfect" competitor (*i.e.*, one winning all events) would then be null. The probability of the ranks observed in all *K* competitions (DATA), assuming that outcomes of different races are conditionally independent, given $\boldsymbol{\theta} = (\boldsymbol{\beta}', \mathbf{a}', \mathbf{p}', \mathbf{g}')'$, is

$$ P(\text{DATA} \mid \boldsymbol{\theta}) = \prod_{k=1}^{K} \Pr(l_{[1]k} > l_{[2]k} > \cdots > l_{[n_k]k} \mid \boldsymbol{\theta}). \tag{6} $$
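For more than a few competitors, the ranking probability in (5) lacks a simple closed form, but for given values of the liability means it can be approximated by brute-force simulation. A minimal sketch with hypothetical means:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical means of liabilities for 4 competitors in one event
mu = np.array([0.8, 0.2, -0.1, -0.5])

def prob_of_ranking(mu, order, n_draws=200_000):
    """Monte Carlo estimate of Pr(l_order[0] > l_order[1] > ... | theta),
    with liabilities l_j ~ N(mu_j, 1) drawn independently, as in (4)."""
    l = mu + rng.normal(size=(n_draws, len(mu)))
    ordered = l[:, order]
    hits = np.all(ordered[:, :-1] > ordered[:, 1:], axis=1)
    return hits.mean()

# Probability that the competitors finish in order 0 > 1 > 2 > 3
p_est = prob_of_ranking(mu, order=[0, 1, 2, 3])
```

Since the means are already in decreasing order, this ranking should be markedly more probable than its reverse.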

## BAYESIAN STRUCTURE

Consider now the problem of inferring, via the Bayesian approach, unobserved features of interest, *e.g.*, the vector of additive genetic effects **a**, as would be the case in horse breeding, or the linear combination **v** = **a** + **p**, which can be termed "competing ability." Define $\boldsymbol{\Lambda} = (\sigma^2_a, \sigma^2_p, \sigma^2_g)'$, a vector of variance parameters. Assume that the density of the joint prior distribution of **θ** and **Λ** has the form

$$ p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H) = N(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{I}\sigma^2_\beta)\, N(\mathbf{a} \mid \mathbf{0}, \mathbf{A}\sigma^2_a)\, N(\mathbf{p} \mid \mathbf{0}, \mathbf{I}\sigma^2_p)\, N(\mathbf{g} \mid \mathbf{0}, \mathbf{I}\sigma^2_g) \prod_{i \in \{a, p, g\}} \chi^{-2}(\sigma^2_i \mid \nu_i, s^2_i). \tag{7} $$

Above, the four normal terms denote multivariate normal densities of orders *F*, *Q*, *C*, and *K*, respectively; $\sigma^2_\beta$ is a variance reflecting uncertainty about location parameters in **β** (it is assumed known and large); **A** is the numerator additive relationship matrix (Henderson 1976) and $\sigma^2_a$ is the additive genetic variance; $\sigma^2_p$ is the variance of permanent environmental effects and $\sigma^2_g$ is the variance between events. In addition, $\chi^{-2}(\sigma^2_i \mid \nu_i, s^2_i)$ is the density of a scaled inverted chi-square distribution on $\nu_i$ degrees of freedom, with $s^2_i$ interpretable as a prior guess for $\sigma^2_i$, and $H = \{\sigma^2_\beta, \nu_a, s^2_a, \nu_p, s^2_p, \nu_g, s^2_g\}$ is a set of known hyperparameters.

The density of the joint posterior distribution of **θ** and **Λ** is proportional to the product of (6) and (7):

$$ p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}) \propto P(\text{DATA} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H). \tag{8} $$

## CONDITIONAL MAXIMUM *A POSTERIORI* ESTIMATION

Prior to the advent of MCMC procedures for sampling from posterior distributions, it was standard to use two-stage approaches to carry out an approximate Bayesian analysis. In the first stage, a Gaussian approximation is used to obtain an approximate maximum marginal-likelihood estimate of **Λ** (Gianola *et al.* 1986; Foulley *et al.* 1987), $\hat{\boldsymbol{\Lambda}}$, say. In the second stage, the joint mode of the conditional posterior distribution $[\boldsymbol{\theta} \mid \hat{\boldsymbol{\Lambda}}, \text{DATA}]$ is used to obtain point estimates of **θ** (Gianola and Foulley 1983) given $\hat{\boldsymbol{\Lambda}}$, or "conditional maximum *a posteriori*" (MAP). If the distribution $[\boldsymbol{\theta} \mid \boldsymbol{\Lambda}, \text{DATA}]$ is exactly Gaussian, its posterior mode yields directly the posterior means of each of the elements of **θ** (Gianola and Fernando 1986). If, in addition, a flat prior is used for **β**, the posterior mean vector yields the best linear unbiased estimator of **β** and the best linear unbiased predictors of **a**, **p**, and **g**. Also, if a flat prior is assigned to **β** and to each of the elements of **Λ**, under Gaussian assumptions the elements of the mode of the marginal distribution $[\boldsymbol{\Lambda} \mid \text{DATA}]$ coincide with the restricted maximum-likelihood estimates of the variance components (Harville 1974).

Tavernier (1990, 1991) used this type of approach for finding the **θ**-maximizer of the log of the conditional posterior density $p(\boldsymbol{\theta} \mid \boldsymbol{\Lambda}, \text{DATA})$. The objective function for the model in (1), assuming a normal prior for **β**, is

$$ L(\boldsymbol{\theta}) = \log P(\text{DATA} \mid \boldsymbol{\theta}) - \frac{1}{2}\left(\frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{\sigma^2_\beta} + \frac{\mathbf{a}'\mathbf{A}^{-1}\mathbf{a}}{\sigma^2_a} + \frac{\mathbf{p}'\mathbf{p}}{\sigma^2_p} + \frac{\mathbf{g}'\mathbf{g}}{\sigma^2_g}\right) + \text{constant}. \tag{9} $$

Tavernier (1990, 1991) employed the Newton–Raphson algorithm for maximizing (9) in the context of horse races. This requires calculation of first and second derivatives of (9) with respect to **θ**; the gradient vector and the Hessian matrix needed for the Newton–Raphson machinery involve first and second derivatives of $\log P(\text{DATA} \mid \boldsymbol{\theta})$. These derivatives are not trivial and involve integration of multivariate normal densities, as well as differentiation under the integral sign. Tavernier (1991) gives approximations to first and second derivatives that require calculation of expectations, variances, and covariances of order statistics. Also, numerical calculation of high-order multivariate normal integrals is cumbersome. Since the Newton–Raphson algorithm is iterative and the number of competitors and races in a data set can be very large, the procedure is numerically intensive, as a large number of calculations must be effected.

Aside from these technical issues, the procedure relies on statistical approximations that may not always be adequate. For instance, in a high-dimensional, nonnormal, multivariate distribution, the components of the mode of the joint posterior density may be far from the corresponding posterior expectations. Also, computing a measure of precision of estimates of **θ** is a task in itself: a Gaussian approximation to the conditional (given **Λ**) posterior covariance matrix is obtained by inverting the negative of the Hessian matrix and by evaluating this inverse at the converged values of **θ**. The diagonal elements of this inverse give a large sample approximation to the conditional posterior variances of the elements of **θ**. Last but not least, the procedure for estimating parameters in **Λ** is also based on Gaussian approximations and, furthermore, the approximate modal inferences about **θ** do not take into account the uncertainty of the approximate estimates of **Λ**. All these difficulties can be circumvented via MCMC procedures, as discussed in the following sections.

## FULLY BAYESIAN ANALYSIS

#### Joint posterior distribution:

The Bayesian structure given above is maintained, but the parameter vector is augmented with the unobserved liabilities as latent variables (Tanner and Wong 1987; Sorensen *et al.* 1995). A main difference from a standard latent variable problem is that, in our setting, the sample space of the liabilities is order constrained, within events.

After augmentation with the liabilities **l**, the joint posterior density is expressible as

$$ p(\mathbf{l}, \boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}) \propto \left\{\prod_{k=1}^{K}\prod_{j=1}^{n_k} \left[\phi(l_{jk} \mid \mu_{jk}, 1)\right]^{\delta_{jk}}\, 1(\mathbf{l}_k \in I_k)\right\} p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H). \tag{10} $$

Above, the indicator set *I _{k}* denotes the order of liabilities, or ranking attained, observed in competition *k*. For example, if competitors 1, 5, 7, and 14 participate in event 20, and the ranking observed is 5 > 14 > 1 > 7, then, following the notation employed in (10), $I_{20} = \{l_{5,20} > l_{14,20} > l_{1,20} > l_{7,20}\}$.

#### Markov chain Monte Carlo sampling:

A Gibbs sampler is described for effecting draws from the joint posterior distribution (10). This is a well-known procedure for sampling from joint distributions; see, *e.g.*, Gilks *et al.* (1996) and Robert and Casella (2005). Briefly, the Gibbs sampler loops through all conditional posterior distributions, with uncertain quantities sampled either blockwise or piecewise. All samples from the full conditionals are accepted with probability 1. Samples from early iterations are discarded as "burn-in"; subsequently, at the end of each cycle of sampling, the coordinates of the sampled vector are regarded as draws from the corresponding marginal posterior distributions. The chain can be thinned, such that retained samples are as weakly correlated (serially) as desired. If storage capacity is not a limiting factor, then it is better to use all samples in the post-Gibbs analysis. Convergence diagnostics and processing issues are discussed in Cowles and Carlin (1996), Sorensen and Gianola (2002), and Gelman *et al.* (2003).

#### Sampling of liabilities:

The form of (10) indicates that the conditional posterior distribution of the liabilities in event *k* has the form

$$ p(\mathbf{l}_k \mid \text{ELSE}) \propto \prod_{j=1}^{n_k} \left[\phi(l_{jk} \mid \mu_{jk}, 1)\right]^{\delta_{jk}}\, 1(\mathbf{l}_k \in I_k), \tag{11} $$

where ELSE denotes the unknown liabilities at all other events, **θ**, **Λ**, **δ**, and DATA. Note that, conditionally on ELSE, the liabilities at different events are mutually independent. However, within event *k*, liabilities are not independent, because of the sample space constraint imposed by knowledge of its outcome, represented by *I _{k}*. The dependence arises through the sample space, but the kernel of the conditional distribution of the *n _{k}* liabilities in event *k*, given ELSE, is proportional to the product of the kernels under conditional independence. Sampling from the space-constrained distribution is straightforward:

1. Draw "independently" the *n _{k}* liabilities of competitors in event *k* (whose indexes are in *I _{k}*) from the distributions $N(\mu_{jk}, 1)$, *j* ∈ *I _{k}*.
2. If the liabilities sampled satisfy the order constraint in *I _{k}*, then accept the vector **l**_{k} as a draw from the posterior distribution. Otherwise, reject the proposal and repeat the sampling until a valid draw is effected.

This procedure, although simple, is typically inefficient and it may have a large rejection rate. A more efficient sampling technique, described by Devroye (1986), is given in more detail in Sorensen and Gianola (2002). To illustrate, let the outcome of competition 20 be as in the example above; that is, 5 > 14 > 1 > 7. Hence, in event 20, the sample space of the liabilities is *l*_{7,20} < *l*_{1,20} < *l*_{14,20} < *l*_{5,20}. To sample the four liabilities, one proceeds from either first to last or last to first (the liability of the winner can be set equal to zero, as stated above). For instance:

1. Sample *l*_{7,20} from $N(\mu_{7,20}, 1)$ without any constraint. This is the realized value of the liability of the last competitor.
2. Sample *l*_{1,20} from the truncated normal distribution $N(\mu_{1,20}, 1)$ with lower bound $l_{7,20}$, where the parameters are as in the absence of truncation. The realized value is formed as
$$ l_{1,20} = \mu_{1,20} + \Phi^{-1}\left\{\Phi(l_{7,20} - \mu_{1,20}) + U\left[1 - \Phi(l_{7,20} - \mu_{1,20})\right]\right\}, $$
where $\Phi$ is the standard normal distribution function and *U* is a uniform random number in (0,1).
3. Sample *l*_{14,20} from the truncated normal process $N(\mu_{14,20}, 1)$ with lower bound $l_{1,20}$, forming the realized value as
$$ l_{14,20} = \mu_{14,20} + \Phi^{-1}\left\{\Phi(l_{1,20} - \mu_{14,20}) + U\left[1 - \Phi(l_{1,20} - \mu_{14,20})\right]\right\}. $$
4. Sample the liability of the winner from $N(\mu_{5,20}, 1)$ with lower bound $l_{14,20}$ as
$$ l_{5,20} = \mu_{5,20} + \Phi^{-1}\left\{\Phi(l_{14,20} - \mu_{5,20}) + U\left[1 - \Phi(l_{14,20} - \mu_{5,20})\right]\right\}. $$

Once the *K* events are processed independently, this produces a realization of the vector of liabilities at iteration *s*, $\mathbf{l}^{(s)}$, which becomes the "data" in model (3) at iteration *s* = 1, 2, … , *S* of the Gibbs sampler.
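The last-to-first scheme above can be sketched as follows, using the inverse-CDF draw from a lower-truncated normal; the means are hypothetical, and SciPy's `norm` supplies Φ and Φ⁻¹.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def sample_event_liabilities(mu_by_rank):
    """Draw order-constrained liabilities for one event.

    mu_by_rank: means ordered from LAST place to FIRST place, so each
    successive draw from N(mu, 1) is truncated below at the previous draw.
    """
    draws = []
    lower = -np.inf
    for mu in mu_by_rank:
        u = rng.uniform()
        f_low = norm.cdf(lower - mu)          # mass below the truncation point
        # Inverse-CDF draw from N(mu, 1) truncated below at `lower`
        draws.append(mu + norm.ppf(f_low + u * (1.0 - f_low)))
        lower = draws[-1]
    return np.array(draws)

# Example: event 20 with ranking 5 > 14 > 1 > 7 (processed last to first)
mu = np.array([-0.4, -0.1, 0.3, 0.6])  # hypothetical means mu_{7,20}, ..., mu_{5,20}
l = sample_event_liabilities(mu)
```

The returned vector is increasing by construction, so the order constraint is satisfied on every call, with no rejection step.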

#### Sampling of parameters:

Given the liabilities, the conditional posterior density of all parameters can be deduced from (10) after fixing the vector **l**. This yields

$$ p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \mathbf{l}, \text{DATA}) \propto p(\mathbf{l} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H). \tag{12} $$

Note that, given the liabilities, the information provided by the sets *I _{k}* is redundant. The density above is that of a Bayesian mixed-effects model under Gaussian assumptions (Gianola and Fernando 1986; Wang *et al.* 1993, 1994) with the sampled liabilities playing the role of data. The conditional posterior distributions of **θ** and **Λ** are in closed form (Wang *et al.* 1993, 1994), as follows. The location parameters **θ** can be sampled from the multivariate normal process

$$ \boldsymbol{\theta} \mid \text{ELSE} \sim N(\hat{\boldsymbol{\theta}}, \mathbf{C}^{-1}), \tag{13} $$

where

$$ \mathbf{C} = \mathbf{W}'\mathbf{W} + \boldsymbol{\Sigma}^{-1} \quad \text{and} \quad \hat{\boldsymbol{\theta}} = \mathbf{C}^{-1}\mathbf{W}'\mathbf{l}, $$

with $\mathbf{W} = [\mathbf{X}\ \mathbf{Z}_a\ \mathbf{Z}_p\ \mathbf{Z}_g]$ and $\boldsymbol{\Sigma}^{-1} = \text{diag}(\mathbf{I}\sigma_\beta^{-2},\ \mathbf{A}^{-1}\sigma_a^{-2},\ \mathbf{I}\sigma_p^{-2},\ \mathbf{I}\sigma_g^{-2})$ (recall that the residual variance is set to 1). The draw from distribution (13) can be done either in a single pass (Garcia-Cortés and Sorensen 1996) or element by element (Wang *et al.* 1994); the rate of convergence to the equilibrium distribution is lower and the autocorrelation between samples higher when sampling is piecewise.

All conditional posterior distributions of the dispersion parameters are scaled inverted chi square and are mutually independent. The distributions are

$$ \sigma^2_a \mid \text{ELSE} \sim \frac{\mathbf{a}'\mathbf{A}^{-1}\mathbf{a} + \nu_a s^2_a}{\chi^2_{\nu_a + Q}}, \tag{14} $$

$$ \sigma^2_p \mid \text{ELSE} \sim \frac{\mathbf{p}'\mathbf{p} + \nu_p s^2_p}{\chi^2_{\nu_p + C}}, \tag{15} $$

and

$$ \sigma^2_g \mid \text{ELSE} \sim \frac{\mathbf{g}'\mathbf{g} + \nu_g s^2_g}{\chi^2_{\nu_g + K}}. \tag{16} $$

The Gibbs sampling algorithm consists of drawing liabilities from distributions (11), such that the order constraints are satisfied, followed by draws from (13) and (14)–(16) to obtain parameter updates. The sampling process is repeated as needed, after convergence diagnostics are satisfied, and such that the Monte Carlo error of estimation of features of the posterior distribution is small enough.
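A draw from a scaled inverted chi-square distribution such as those in (14)–(16) can be obtained by dividing the scale sum by a central chi-square deviate. A sketch with hypothetical dimensions and hyperparameters, taking **A** as the identity for simplicity:

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_scaled_inv_chisq(scale_sum, df):
    """One draw from scale_sum * inv-chi-square(df), i.e., scale_sum / chi2(df)."""
    return scale_sum / rng.chisquare(df)

# Hypothetical current state of the chain: Q = 50 additive effects, A = I
Q, nu_a, s2_a = 50, 4.0, 0.2
a = rng.normal(0.0, 0.5, size=Q)

# Update of the additive genetic variance, as in (14) with A = I
sigma2_a = draw_scaled_inv_chisq(a @ a + nu_a * s2_a, nu_a + Q)
```

With a pedigree, `a @ a` would be replaced by the quadratic form **a**′**A**⁻¹**a**.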

#### Inference:

The procedure of Tavernier (1991) produces conditional posterior modes of location parameters, as already noted, but can also be used to estimate the probability of observing a certain outcome, even in a hypothetical competition, although ignoring uncertainty about estimates. The fully Bayesian Markov chain Monte Carlo procedure is more flexible. First, it is possible to produce a complete posterior distribution for any uncertain quantity, *e.g.*, the competing ability *a _{j}* + *p _{j}* of individual *j* or the additive genetic variance $\sigma^2_a$. In addition, it is possible to estimate the probability of any event related to a future competition. Let *I _{f}* be an order associated with a yet to be realized event referred to as "future," with *n _{f}* participants. The predictive probability distribution of *I _{f}*, posterior to historical data (**DATA**), is given by

$$ \Pr(I_f \mid \text{DATA}) = \int \Pr(I_f \mid \boldsymbol{\theta}, \boldsymbol{\Lambda})\, p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA})\, d\boldsymbol{\theta}\, d\boldsymbol{\Lambda} = E_{\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}}\left[\Pr(I_f \mid \boldsymbol{\theta}, \boldsymbol{\Lambda})\right], \tag{17} $$

where $E$ denotes expectation with respect to the distribution $[\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}]$. The integral in (17) does not have a closed form, since the posterior distribution is unknown. However, it can be estimated from the Markov chain Monte Carlo samples via ergodic averaging.

Let $(\boldsymbol{\theta}^{(s)}, \boldsymbol{\Lambda}^{(s)})$, *s* = 1, 2, … , *S*, be the $(\boldsymbol{\theta}, \boldsymbol{\Lambda})$-coordinate of a sample from the posterior distribution with density (10), and let $\Pr(I_f \mid \boldsymbol{\theta}^{(s)}, \boldsymbol{\Lambda}^{(s)})$ be the probability of observing the ranking in *I _{f}* in the future liabilities when $(\boldsymbol{\theta}, \boldsymbol{\Lambda}) = (\boldsymbol{\theta}^{(s)}, \boldsymbol{\Lambda}^{(s)})$. Note that, since event effects are assumed to be independent, **DATA** will not contain information about a future event effect *g _{f}*. This does not pose a problem, as the posterior distribution can be augmented with this effect; however, **DATA** will convey information about $\sigma^2_g$. A simulation-consistent estimator of (17) is

$$ \widehat{\Pr}(I_f \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S} 1\left(I_f^{(s)} = I_f\right), \tag{18} $$

where $1(I_f^{(s)} = I_f)$ is an indicator variable taking the value 1 if the ranking in *I _{f}* is attained in sample *s* and 0 otherwise. In practice, one samples liabilities for future event *f* from

$$ \mathbf{l}_f^{(s)} \sim N\left(\mathbf{X}_f\boldsymbol{\beta}^{(s)} + \mathbf{Z}_{af}\mathbf{a}^{(s)} + \mathbf{Z}_{pf}\mathbf{p}^{(s)} + \mathbf{1}g_f^{(s)},\ \mathbf{I}\right) $$

and checks whether or not the realized values satisfy the ordering in *I _{f}*; if so, $1(I_f^{(s)} = I_f) = 1$. This process is repeated for every Monte Carlo sample, leading directly to (18).
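The mechanics of the ergodic average in (18) can be sketched as follows. The posterior samples of the future competitors' means and of the between-event variance are fabricated stand-ins for actual Gibbs output, so only the prediction step is illustrated:

```python
import numpy as np

rng = np.random.default_rng(5)

S, n_f = 2000, 3  # hypothetical: 2000 posterior samples, 3 future competitors

# Stand-ins for posterior samples of each future competitor's mean
# (x'beta + a_j + p_j); in practice these come from the Gibbs sampler output.
mu_samples = rng.normal([0.5, 0.0, -0.5], 0.1, size=(S, n_f))
sigma2_g_samples = 0.2 + 0.05 * rng.random(S)  # stand-in draws of sigma_g^2

hits = 0
for s in range(S):
    # Augment with a future event effect drawn from its prior, given sigma_g^2
    g_f = rng.normal(0.0, np.sqrt(sigma2_g_samples[s]))
    l_f = mu_samples[s] + g_f + rng.normal(size=n_f)  # future liabilities
    if l_f[0] > l_f[1] > l_f[2]:                       # ranking I_f: 1 > 2 > 3
        hits += 1

p_If = hits / S  # simulation-consistent estimate of Pr(I_f | DATA), as in (18)
```

Because the event effect is common to all competitors in the future event, it cancels in the comparisons, but it is drawn here to mirror the augmentation described in the text.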

A similar procedure can be used to estimate the probability of any future event, such as whether "participant" *j* is first, second, or third in a competition involving *n _{f}* competitors; let such an event be denoted as *A*. Then, the probability of observing *A* in a future event can be estimated as

$$ \widehat{\Pr}(A \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S} 1\left(A^{(s)} = A\right). $$

Operationally, one samples liabilities **l**_{f} as given above and checks whether or not *A* is realized, in which case $1(A^{(s)} = A) = 1$ and 0 otherwise.

## NONPARAMETRIC DISTRIBUTION FOR EFFECTS

Above, it was assumed that prior uncertainty about permanent environmental or event effects could be described reasonably by the normal distributions $N(\mathbf{0}, \mathbf{I}\sigma^2_p)$ and $N(\mathbf{0}, \mathbf{I}\sigma^2_g)$, respectively. However, it might be sensible to make less strong assumptions about these distributions. For instance, there may be clusters of events that are more similar to each other, *i.e.*, clusters representing different, unknown, levels of competition. Likewise, there may be some competitors that receive similar "preferential" treatment. A suitable representation of uncertainty may be that furnished by the Dirichlet process prior (Escobar 1994; MacEachern 1994; Kleinman and Ibrahim 1998; Van Der Merwe and Pretorius 2003). The case of the permanent environmental effects is discussed here. *A priori*, let the permanent effects *p _{i}* be independently distributed as

$$ p_i \mid G \sim G, $$

where *G* is some general distribution. In turn, assume that *G* follows a Dirichlet process (DP) prior

$$ G \sim DP(M, G_0) $$

(Ferguson 1973; Antoniak 1974), meaning that *G* is a random member of a space of distributions, with *M* and $G_0$ being parameters of the process. For example, $G_0 = N(0, \sigma^2_p)$ could be taken as "base prior," a distribution approximating the nonparametric shape of *G*, with *M* interpretable as a degree of belief on how close $G_0$ is to *G*. The parameter *M* is such that, when $M \to \infty$, then $G \to G_0$; the same is true if all random effects are identical (Kleinman and Ibrahim 1998). When *G* is mixed over the Dirichlet process, the Polya urn representation of the prior distribution of the random effects (Kleinman and Ibrahim 1998) is, for $N(0, \sigma^2_p)$ as base prior,

$$ p_i \mid p_1, \ldots, p_{i-1} \sim \frac{M}{M + i - 1} N(0, \sigma^2_p) + \frac{1}{M + i - 1}\sum_{j=1}^{i-1} \delta_{p_j}, \quad i = 2, 3, \ldots, C, $$

where $\delta_{p_j}$ denotes a distribution with point mass at $p_j$, the mixing probabilities $P_1, P_2, \ldots, P_{C-1}$ following from the parameters of the Dirichlet process. Using results in Escobar (1994), Kleinman and Ibrahim (1998), and Van Der Merwe and Pretorius (2003), the prior distribution of the random effects is generated by this sequence, starting from $p_1 \sim N(0, \sigma^2_p)$. Note that as $M \to \infty$, the prior distribution tends toward the parametric process $N(\mathbf{0}, \mathbf{I}\sigma^2_p)$, as noted earlier. The preceding implies that the conditional prior distribution of a permanent environmental effect (*p _{i}*, say), given all other such effects, is

$$ p_i \mid \mathbf{p}_{-i} \sim \frac{M}{M + C - 1} N(0, \sigma^2_p) + \frac{1}{M + C - 1}\sum_{j \in B} \delta_{p_j}, \tag{19} $$

where $\mathbf{p}_{-i}$ denotes all permanent effects other than $p_i$ and *B* is the set over which the sum is taken (Escobar 1994; Van Der Merwe and Pretorius 2003). Note that *p _{i}* can be equal to the permanent environmental effect of other subjects with nonnull prior probability; this has an implication in the construction of conditional posterior distributions. Representation (19) illustrates that the conditional prior distribution of any of the permanent effects, given all other *p*'s, is a mixture of *C* − 1 degenerate distributions with point masses at the *p _{j}* and of the parametric base distribution $N(0, \sigma^2_p)$. The mixing probabilities are $1/(M + C - 1)$ for each of the *C* − 1 "conditioning" states, and $M/(M + C - 1)$ for the probability that *p _{i}* is a draw from the normal process.
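The Polya urn sequence above is easy to simulate. The sketch below generates *C* permanent effects for a hypothetical *M* and base-measure standard deviation, and counts the clusters that emerge:

```python
import numpy as np

rng = np.random.default_rng(6)

def polya_urn_draws(C, M, sigma_p=0.5):
    """Generate C effects from the Polya urn with N(0, sigma_p^2) base measure."""
    p = []
    for i in range(C):
        # New draw from the base measure with probability M / (M + i),
        # otherwise copy a previously drawn effect (which induces clustering)
        if rng.uniform() < M / (M + i):
            p.append(rng.normal(0.0, sigma_p))
        else:
            p.append(p[rng.integers(i)])
    return np.array(p)

draws = polya_urn_draws(C=100, M=2.0)     # M and C are hypothetical
n_clusters = len(np.unique(draws))
```

Small values of *M* yield few clusters; as *M* grows, nearly every effect becomes a fresh draw from the normal base measure, recovering the parametric prior.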

Given the liabilities, and assuming temporarily that *M* is assigned a fixed value, the joint posterior density of all uncertain quantities is obtained by modifying (12) into (20), in which the normal prior for **p** is replaced by the Dirichlet process specification. In (20), *p _{i}* ∼ *G* denotes that *p _{i}* follows the unknown distribution *G* drawn from a Dirichlet process with $N(0, \sigma^2_p)$ as base measure and with known parameter *M*. Since the Dirichlet process allows *p _{i}* to take any of the other *p*'s as a current value, data from all individuals with records of performance contribute to the conditional posterior distribution of *p _{i}*. One has as conditional posterior distribution

$$ p(p_i \mid \text{ELSE}) \propto \prod_{k=1}^{K} \left[\phi(l_{ik} \mid \mu_{ik}, 1)\right]^{\delta_{ik}}\, p(p_i \mid \mathbf{p}_{-i}), \tag{21} $$

with the conditional prior $p(p_i \mid \mathbf{p}_{-i})$ as in (19). Under standard parametric assumptions, the fully conditional posterior distribution of *p _{i}* is given by

$$ h(p_i \mid \text{ELSE}) \propto \prod_{k=1}^{K} \left[\phi(l_{ik} \mid \mu_{ik}, 1)\right]^{\delta_{ik}}\, N(p_i \mid 0, \sigma^2_p). \tag{22} $$

Using well-known results (*e.g.*, Wang *et al.* 1993, 1994; Sorensen and Gianola 2002) it can be shown that

$$ p_i \mid \text{ELSE} \sim N(\hat{p}_i, \tilde{v}_i), \tag{23} $$

where $\mathbf{z}_{p_i}$ is the column of **Z**_{p} in (3) pertaining to the individual, and

$$ \hat{p}_i = \tilde{v}_i\, \mathbf{z}'_{p_i}\left(\mathbf{l} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}_a\mathbf{a} - \mathbf{Z}_g\mathbf{g}\right), \qquad \tilde{v}_i = \left(\mathbf{z}'_{p_i}\mathbf{z}_{p_i} + \sigma_p^{-2}\right)^{-1}. $$

In addition, let

$$ q_0 = M \int \prod_{k=1}^{K} \left[\phi(l_{ik} \mid \mu_{ik}, 1)\right]^{\delta_{ik}}\, N(p_i \mid 0, \sigma^2_p)\, dp_i, \tag{24} $$

with this integral expressible in closed form. Note that the integrand is the product of the densities of all liabilities of individual *i*, with **β**, **a**, and **g** as fixed parameters (the *p*'s of other individuals do not enter into the model for the liabilities of individual *i*) times the $N(0, \sigma^2_p)$ density. Hence, with **l**_{i} being the $r_i$ × 1 vector of liabilities of the individual, and $\mathbf{X}_i$, $\mathbf{Z}_{ai}$, and $\mathbf{Z}_{gi}$ being incidence matrices relating **l**_{i} to **β**, **a**, and **g**, respectively, a standard integration yields

$$ \mathbf{l}_i \mid \boldsymbol{\beta}, \mathbf{a}, \mathbf{g}, \sigma^2_p \sim N\left(\mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_{ai}\mathbf{a} + \mathbf{Z}_{gi}\mathbf{g},\ \mathbf{V}_i\right), \tag{25} $$

where $\mathbf{V}_i = \mathbf{I}_{r_i} + \mathbf{J}_{r_i}\sigma^2_p$ and $\mathbf{J}_{r_i}$ is a matrix of 1's of order $r_i$. Employing (23) and (24) in (22) leads to the mixture

$$ p_i \mid \text{ELSE} \sim \sum_{j \neq i} Q_j\, \delta_{p_j} + Q_0\, N(\hat{p}_i, \tilde{v}_i). \tag{26} $$

The form of (26) indicates that the fully conditional posterior distribution of *p _{i}* is a mixture of the *C* − 1 degenerate distributions with point masses at the *p _{j}* and of the parametric distribution $N(\hat{p}_i, \tilde{v}_i)$. Let now

$$ Q_j \propto \prod_{k=1}^{K} \left[\phi\left(l_{ik} \mid \mu_{ik}(p_j), 1\right)\right]^{\delta_{ik}},\ j \neq i, \qquad Q_0 \propto q_0, $$

with the $Q$'s normalized to sum to 1. Similar to that in Van Der Merwe and Pretorius (2003), the rule for drawing samples from the conditional posterior distribution *p _{i}*|ELSE is to take as a current value for *p _{i}*

$$ p_i = \begin{cases} p_j & \text{with probability } Q_j,\ j \neq i, \\ \text{a draw from } N(\hat{p}_i, \tilde{v}_i) & \text{with probability } Q_0, \end{cases} \tag{27} $$

where *p _{j}* is the current state in the Markov chain of the permanent environmental effect of individual *j*, and *Q _{j}* is evaluated at the current values of all parameters entering into this probability. In the calculation of *Q _{j}*, the permanent effect *p _{i}* is replaced by *p _{j}* when computing $\mu_{ik}$. This means that values of *p _{j}* that generate liabilities **l**_{i} with higher plausibility are assigned a higher probability of selection. Eventually, this leads to clustering of permanent environmental effects into "plausibility" groups.
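The selection rule (27) can be sketched for a single individual. Here the "residuals" of the individual's liabilities (liabilities minus all model terms other than the permanent effect) and the other individuals' effects are hypothetical, the residual variance is 1, and the marginal density (25) is evaluated with the equicorrelated covariance matrix **I** + **J**σ²_p:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(7)

M, sigma2_p = 1.0, 0.25
y_i = np.array([0.9, 1.1])             # hypothetical residuals of l_i
p_others = np.array([-0.5, 0.0, 1.0])  # current effects of the other individuals

# Q_j: likelihood of individual i's liabilities when p_i is set to p_j
Q = np.array([norm.pdf(y_i - pj).prod() for pj in p_others])

# Q_0: M times the marginal density of l_i with p_i integrated out, as in (25)
r = len(y_i)
V = np.eye(r) + sigma2_p * np.ones((r, r))
Q0 = M * multivariate_normal.pdf(y_i, mean=np.zeros(r), cov=V)

# Normalize and draw: copy an existing p_j, or sample afresh from (23)
w = np.append(Q, Q0)
w = w / w.sum()
choice = rng.choice(len(w), p=w)
if choice < len(p_others):
    p_i = p_others[choice]
else:
    v = 1.0 / (r + 1.0 / sigma2_p)      # conditional variance from (23)
    p_i = rng.normal(v * y_i.sum(), np.sqrt(v))
```

Because the residuals sit near 1.0, the existing effect closest to that value tends to receive the largest selection probability, illustrating the "plausibility" clustering described above.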

The fully conditional posterior distributions of all other parameters can be readily deduced from (20) by fixing **p** as well as the liabilities:

$$ p(\boldsymbol{\beta}, \mathbf{a}, \mathbf{g}, \boldsymbol{\Lambda} \mid \mathbf{p}, \mathbf{l}, \text{DATA}) \propto p(\mathbf{l} \mid \boldsymbol{\theta})\, N(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{I}\sigma^2_\beta)\, N(\mathbf{a} \mid \mathbf{0}, \mathbf{A}\sigma^2_a)\, N(\mathbf{g} \mid \mathbf{0}, \mathbf{I}\sigma^2_g)\, p(\boldsymbol{\Lambda} \mid H). \tag{28} $$

The conditional posterior distribution of **β**, **a**, and **g** given everything else is the multivariate normal process in (29), with mean vector and covariance matrix as in (13), but with $\mathbf{W} = [\mathbf{X}\ \mathbf{Z}_a\ \mathbf{Z}_g]$ and the term $\mathbf{Z}_p\mathbf{p}$ treated as a known offset to the liabilities. The conditional posterior distributions of the variance components, (30), (31), and (32), are as in the parametric model; that is, they have the same scaled inverted chi-square forms as (14)–(16). The Gibbs sampler draws liabilities as before, permanent environmental effects from (27), **β**, **a**, and **g** from (29), and the three variance components from the preceding expressions.

At each iteration of the Gibbs sampler, the procedure induces a clustering structure of permanent environmental effects, with each cluster consisting of individuals sharing the same value of *p _{i}*. Following Van Der Merwe and Pretorius (2003), let *t* be the number of unique permanent environmental effects (0 < *t* ≤ *C*). The posterior distribution of *t* can be obtained readily from the sampling procedure. Bush and MacEachern (1996) suggest embedding an optional additional step in the Gibbs sampling procedure: let the number of clusters identified at iteration *m* be $t^{(m)}$; this means that there are $t^{(m)}$ distinct values of the *p*'s, which are labeled as $\eta_1, \eta_2, \ldots, \eta_{t^{(m)}}$. The model for the liabilities in (3) is then rewritten as

$$ \mathbf{l} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}^*_p\boldsymbol{\eta} + \mathbf{Z}_g\mathbf{g} + \mathbf{e}, $$

where $\mathbf{Z}^*_p$ is a condensation of **Z**_{p} of order $n \times t^{(m)}$ obtained by assigning liabilities to the appropriate cluster of permanent environmental effects to which they belong. At that point of the scheme, Kleinman and Ibrahim (1998) and Van Der Merwe and Pretorius (2003) suggest drawing a sample of **η** from the process

$$ \boldsymbol{\eta} \mid \text{ELSE} \sim N(\hat{\boldsymbol{\eta}}, \mathbf{V}_\eta), $$

where

$$ \hat{\boldsymbol{\eta}} = \mathbf{V}_\eta\, \mathbf{Z}^{*\prime}_p\left(\mathbf{l} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}_a\mathbf{a} - \mathbf{Z}_g\mathbf{g}\right) \quad \text{and} \quad \mathbf{V}_\eta = \left(\mathbf{Z}^{*\prime}_p\mathbf{Z}^*_p + \mathbf{I}\sigma_p^{-2}\right)^{-1}. $$

Once a sample of **η** is obtained, say $\boldsymbol{\eta}^{(m)}$, the permanent environmental effects are reconstituted by premultiplying $\boldsymbol{\eta}^{(m)}$ by a *C* × $t^{(m)}$ matrix that allocates cluster values to permanent environmental effects. Then, the MCMC process continues for other parameters as usual.

Turn attention now to inferring *M*, the degree of belief parameter of the Dirichlet process. Using results in Antoniak (1974) and West (1992), but giving more details, as their developments are compact, the prior distribution of the number of clusters *t* may be written as

$$ p(t \mid M) = c_C(t)\, M^t\, \frac{\Gamma(M)}{\Gamma(M + C)}, \quad t = 1, 2, \ldots, C, $$

where the factor $c_C(t)$ does not involve *M*; $\Gamma(\cdot)$ is the Gamma function. As pointed out by West (1992), if one has samples of *t* (see above), the conditional posterior distribution of *M* is expressible as

$$ p(M \mid t, \text{ELSE}) \propto p(M)\, M^t\, \frac{\Gamma(M)}{\Gamma(M + C)}, $$

where *p*(*M*) is the prior density of *M*. This result follows, because, if one knows **p**, this gives immediately the number of clusters *t* as well as the number of permanent environmental effects per cluster. Then, with $B(\cdot, \cdot)$ being the Beta function,

$$ \frac{\Gamma(M)}{\Gamma(M + C)} = \frac{(M + C)\, B(M + 1, C)}{M\, \Gamma(C)}, $$

so that

$$ p(M \mid t, \text{ELSE}) \propto p(M)\, M^{t-1}(M + C)\, B(M + 1, C). $$

West (1992) notes that this density results from marginalizing the joint distribution [*M*, *x*|*t*, ELSE], where *x* is a continuous random variable taking values between 0 and 1. Hence, it follows that

$$ p(M, x \mid t, \text{ELSE}) \propto p(M)\, M^{t-1}(M + C)\, x^M (1 - x)^{C-1}. $$

Consequently, the conditional posterior density of the auxiliary variable *x* is

$$ p(x \mid M, t, \text{ELSE}) \propto x^M (1 - x)^{C-1}, \tag{33} $$

so that $x \mid M, t, \text{ELSE} \sim \text{Beta}(M + 1, C)$. In addition, the conditional posterior distribution of *M* is

$$ p(M \mid x, t, \text{ELSE}) \propto p(M)\, M^{t-1}(M + C)\, x^M, $$

and its form depends on the prior adopted for *M*. Since *M* must be positive, West (1992) suggests a Gamma (*a*, *b*) distribution as prior, so that

$$ p(M \mid x, t, \text{ELSE}) \propto M^{a + t - 2}(M + C)\, e^{-M(b - \log x)}. $$

The normalized density is a mixture of two Gamma densities; the integrals in the denominator yield the corresponding normalizing constants. Hence, letting $a^* = t + a$ and $b^* = b - \log x$, the conditional posterior distribution becomes

$$ M \mid x, t, \text{ELSE} \sim \pi_x\, \text{Gamma}(a^*, b^*) + (1 - \pi_x)\, \text{Gamma}(a^* - 1, b^*). \tag{34} $$

This is a mixture of the two Gamma distributions indicated, with mixing probabilities $\pi_x$ and $1 - \pi_x$. Note that

$$ \frac{\pi_x}{1 - \pi_x} = \frac{a^* - 1}{C b^*}. $$

Since $\pi_x + (1 - \pi_x) = 1$,

$$ \pi_x = \frac{a^* - 1}{a^* - 1 + C b^*}. $$

At the end of the MCMC sampling process there will be *S* samples of the number of clusters *t* and of the auxiliary variables *x*. The density of the marginal posterior distribution of *M* can be estimated (West 1992) using the Rao–Blackwell estimator

$$ \hat{p}(M \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S} p\left(M \mid x^{(s)}, t^{(s)}, \text{ELSE}\right), \tag{35} $$

where each term is the Gamma mixture in (34) evaluated at $a^{*(s)} = t^{(s)} + a$ and $b^{*(s)} = b - \log x^{(s)}$. If *a* = 0 and *b* = 0, the Gamma prior degenerates to $p(M) \propto M^{-1}$ or, equivalently, to $p(\log M) \propto$ constant.
If such an improper prior is adopted, the Rao–Blackwell estimator reduces to the same mixture form, with *a** = *t* and *b** = −log *x* used in the calculation of $\pi_x$. While the conditional posterior distribution of *M* is well defined for *a* = *b* = 0, this does not guarantee that its marginal posterior distribution will always be proper. Hence, the uniform prior on log *M* should be used with caution.
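West's auxiliary-variable update for *M* translates directly into code. In the sketch below, the cluster count *t*, the number of effects *C*, and the Gamma prior hyperparameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

def update_M(M, t, C, a=1.0, b=1.0):
    """One auxiliary-variable update of M (West 1992):
    draw x ~ Beta(M + 1, C), then M from a two-component Gamma mixture."""
    x = rng.beta(M + 1.0, C)
    a_star = a + t
    b_star = b - np.log(x)                  # > 0, since log x < 0
    odds = (a_star - 1.0) / (C * b_star)    # pi_x / (1 - pi_x)
    pi_x = odds / (1.0 + odds)
    if rng.uniform() < pi_x:
        return rng.gamma(a_star, 1.0 / b_star)       # Gamma(a*, b*)
    return rng.gamma(a_star - 1.0, 1.0 / b_star)     # Gamma(a* - 1, b*)

# Hypothetical chain: 8 clusters among C = 100 permanent effects
M = 1.0
for _ in range(100):
    M = update_M(M, t=8, C=100)
```

Note that NumPy's `gamma` is parameterized by shape and scale, so the rate *b** enters as `1.0 / b_star`.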

## CONCLUSION

This study presents parametric and semiparametric procedures for analysis of rank data. Observations presented as ranks arise often in behavioral or cognitive sciences and in the context of competitions such as horse jumping events or greyhound races. For instance, Johnson *et al.* (2002) carried out a meta-analysis of 30 primate intelligence experiments (24 genera represented) grouped into "paradigms" and "procedures within paradigm." Paradigms are types of intelligence tests (*e.g.*, "reversal learning," where individuals learn to pick one object rather than another, and then the value of the objects changes, so that a previously unrewarded object is now rewarded); "procedures" refer to different methodologies used to investigate the paradigm. In one of six reversal studies the observed ranking was Cebus > Hylobates > Aotus > Saimiri. In the analysis of the ranks, they used Thurstonian models similar to those described in this article. Their Bayesian formulation allowed a clear assessment of uncertainty, which was viewed as especially important because, in these experiments, raters usually do not rank >15 items, so asymptotic approximations may be inadequate.

In a study of competitions of jumping horses, Tavernier (1990) argued that a “physical” measure of performance (*e.g.*, time, number of obstacles broken or refused) is not always available, and that it is unclear how to measure the technical difficulty of an event. Also, measures such as yearly earnings are available only for horses that earn money, so that the information comes from a selected sample, which may distort inference considerably. As discussed earlier, Tavernier (1990, 1991) proposed an approximate Bayesian method in which ranks are viewed as being the manifestation of latent variables, as in the present study. In the analysis, involving ∼19,000 horses participating in all jumping competitions that took place in France in 1987, ∼14% of the horses had an accuracy <0.14 (“accuracy” ranges between 0 and 1), and 28% did not reach 0.45. These accuracies are approximations and probably understate uncertainty, because the error of estimates of parameters is not taken into account in the calculations. Important numerical problems were encountered by Tavernier (1990). For each Newton–Raphson iteration, and for a race with *n* horses, she had to calculate the following multivariate normal integrals: (1) an integral of order *n* − 1, (2) *n* − 1 integrals of order *n* − 2, and (3) integrals of order *n* − 3. For this reason, she used approximations to integrals based on Taylor series expansions. None of these integrals need to be computed explicitly or approximated in our procedure, as all calculations rely on Monte Carlo sampling. Another difficulty in the conditional maximum *a posteriori* method of Tavernier (1990, 1991) is the computation of the joint posterior mode, which, in the context of horse races in France, may involve solving (iteratively) a system exceeding 100,000 unknowns.

In our procedure, solving equations is replaced by drawing from conditional posterior distributions and, under the assumptions of this article, all of these distributions are recognizable. A potential difficulty may be slow mixing of the MCMC algorithm, but this can often be accelerated by either changing the algorithm or adopting a different parameterization. Introducing a Dirichlet process prior enhances the level of difficulty of the calculations, but it confers robustness with respect to possible departures from the normality assumption for the distributions of events or of permanent environmental effects. Further, the nonparametric treatment of permanent environmental effects is a flexible alternative to a thick-tailed distribution (Strandén and Gianola 1998, 1999; Rosa *et al.* 2001, 2003, 2004), in the sense that it allows for asymmetry, as well as for allocation of effects into clusters with some biological meaning.

Here, it was assumed that event effects were equally distributed *a priori*, but there may be heterogeneity with respect to level of difficulty. This can be accommodated via hierarchical modeling. For instance, the prior distribution could be modified into $\mathbf{g}\,|\,\boldsymbol{\alpha} \sim N(\mathbf{M}\boldsymbol{\alpha}, \mathbf{I}\sigma_g^2)$, where **α** is a vector of unknown “levels of difficulty” effects, and **M** is a known matrix relating **g** to **α**; in turn, **α** could be assigned a vague prior. In their primate studies, Johnson *et al.* (2002) fitted a latent variable model including a genus effect and a “paradigm-genus” deviation; they found that the latter did not contribute to variability appreciably and concluded that taxa differed in domain-general ability. A nonparametric alternative would be to assign a Dirichlet process prior to the event effects **g** and let the data drive the allocation into clusters representing level of difficulty.

In the context of horse breeding, it may be that there are individuals whose genes predispose them to perform well in certain types of competition, but not in others; *i.e.*, there may be different sets of genes for different types of competition. Our model can be extended by assuming that each individual has a liability peculiar to each type of competition. Suppose there are three types of competition; here, there would be three liabilities per individual with a within-individual correlation matrix **R**_{0}, say. All three liabilities would be realized only if the individual in question participates in the three types of events. A simpler model could postulate that liabilities for different events are independent, but that each individual possesses an additive genetic effect peculiar to the type of event. An additive genetic variance–covariance matrix **G**_{0} would be introduced and inferred as in a multivariate Bayesian mixed-effects model (Sorensen and Gianola 2002).
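The within-individual correlation structure of such a three-liability model can be simulated as a quick sanity check. The values in **R**_{0} below are illustrative assumptions, not estimates; a standard way to generate the correlated liabilities is through the Cholesky factor of **R**_{0}.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed within-individual correlation matrix R0 for liabilities in
# three types of competition (illustrative values only).
R0 = np.array([[1.0, 0.6, 0.3],
               [0.6, 1.0, 0.5],
               [0.3, 0.5, 1.0]])

L = np.linalg.cholesky(R0)  # lower triangular, R0 = L L'

def sample_liabilities(n_individuals, rng=rng):
    """Draw, for each individual, three correlated liabilities
    (one per competition type) with unit variances and correlation R0."""
    z = rng.standard_normal((n_individuals, 3))
    return z @ L.T  # each row ~ N(0, R0)

liab = sample_liabilities(100_000)
empirical_R = np.corrcoef(liab, rowvar=False)
```

In the actual analysis only the liabilities for events an individual participates in would be realized, with the remaining ones integrated out, as in a standard multivariate threshold model.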

In many sports events, *e.g.*, races or marathons, there are no ties. However, ties can be handled in our procedure via a simple modification of the limits the liability variates can take. For example, consider a competition with four participants (5, 7, 14, and 20), such that the observed ranking is 5 > (7 = 14) > 20; here, (7 = 14) denotes a tie in second place between individuals 7 and 14. In the process of sampling the liabilities, it suffices to accept samples such that 5 and 20 take the largest and smallest values, respectively, with the liabilities of 7 and 14 taking any order between those of 5 and 20; then, the ranking of 7 and 14 would vary at random over MCMC samples. Alternatively, one could assign the average of the two liabilities sampled for 7 and 14 in each of the MCMC iterations. Johnson *et al.* (2002) handled ties via a parametric model for the probability that two genera are tied, with one additional parameter (with a corresponding prior distribution) introduced in the implementation.
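The accept/reject handling of a tie described above can be sketched as follows for the four-participant example. The means assigned to the competitors and the independent unit-variance draws are illustrative assumptions; the article's sampler would instead draw the liabilities from their truncated conditionals within the Gibbs scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical liability means for competitors 5, 7, 14, and 20.
mu = {5: 1.0, 7: 0.3, 14: 0.2, 20: -1.0}
ids = [5, 7, 14, 20]

def draw_liabilities(n_samples=200, rng=rng):
    """Rejection sampler for liabilities consistent with the ranking
    5 > (7 = 14) > 20: competitor 5 largest, 20 smallest, and the tied
    pair 7 and 14 in either order between them."""
    kept = []
    while len(kept) < n_samples:
        liab = {i: rng.normal(mu[i], 1.0) for i in ids}
        tied = (liab[7], liab[14])
        if liab[5] > max(tied) and min(tied) > liab[20]:
            kept.append(liab)
    return kept

samples = draw_liabilities()
# Over the retained samples the tied pair changes order at random.
share_7_first = float(np.mean([s[7] > s[14] for s in samples]))
```

Because the tied pair is left unordered, `share_7_first` falls strictly between 0 and 1, mirroring how the ranking of 7 and 14 varies at random over MCMC samples.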

A useful extension of this model, at least in the context of evaluation of horse races, would be one for a joint analysis of ranks and of some “continuous” variate, such as time in the event or earnings. As mentioned above, there are situations in which the rank information is available for all competitors, but earnings are available only for the placed competitors. Clearly, earnings for the other competitors are not missing completely at random. However, if the conditional probability of missing earnings information depends only on the ranks, then missingness would be ignorable in a joint analysis of ranks and of earnings (Rubin 1976; Im *et al.* 1989; Sorensen *et al.* 2001). For example, such a model could postulate a joint normal distribution of the liabilities and of the continuous trait measured on some competitors.

## Acknowledgments

Yu-Mei Chang and Xiao-Lin Wu are thanked for helpful comments. Research was completed while the senior author was Mercator Professor (Deutsche Forschungsgemeinschaft) at Georg-August-University, Göttingen, Germany. Support by the Wisconsin Agriculture Experiment Station and by grants National Research Initiatives Competitive Grants Program/U.S. Department of Agriculture 2003-35205-12833, National Science Foundation (NSF) DEB-0089742, and NSF DMS-044371 is acknowledged.

## Footnotes

Communicating editor: J. B. Walsh

- Received May 12, 2006.
- Accepted September 8, 2006.

- Copyright © 2006 by the Genetics Society of America