Maximum-Likelihood Estimation of Relatedness
Brook G. Milligan
Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, Scotland
Corresponding author: Brook G. Milligan, New Mexico State University, Las Cruces, NM 88003. E-mail: brook@nmsu.edu
Communicating editor: J. B. WALSH
| ABSTRACT |
|---|
Relatedness between individuals is central to many studies in genetics and population biology. A variety of estimators have been developed to enable molecular marker data to quantify relatedness. Despite this, no effort has been given to characterize the traditional maximum-likelihood estimator in relation to the remainder. This article quantifies its statistical performance under a range of biologically relevant sampling conditions. Under the same range of conditions, the statistical performance of five other commonly used estimators of relatedness is quantified. Comparison among these estimators indicates that the traditional maximum-likelihood estimator exhibits a lower standard error under essentially all conditions. Only for very large amounts of genetic information do most of the other estimators approach the likelihood estimator. However, the likelihood estimator is more biased than any of the others, especially when the amount of genetic information is low or the actual relationship being estimated is near the boundary of the parameter space. Even under these conditions, the amount of bias can be greatly reduced, potentially to biologically irrelevant levels, with suitable genetic sampling. Additionally, the likelihood estimator generally exhibits the lowest root mean-square error, an indication that the bias in fact is quite small. Alternative estimators restricted to yield only biologically interpretable estimates exhibit lower standard errors and greater bias than do unrestricted ones, but generally do not improve over the maximum-likelihood estimator and in some cases exhibit even greater bias. Although some nonlikelihood estimators exhibit better performance with respect to specific metrics under some conditions, none approach the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.
An understanding of the relatedness between individuals plays an important role in many areas of population biology and genetics. For example, it is central to quantitative genetics and plays a crucial role in estimating heritability and additive genetic variances and covariances.
The coefficient of coancestry, $\theta$, and the related coefficient $r = 2\theta$ are used to quantify the degree of relatedness between two individuals.
Estimates of $\theta$ or $r$ may be derived in a variety of ways. Traditionally, they are calculated from a known pedigree.
The statistical behavior of relatedness estimators is critical to their utility in practice. Because the estimators are complex, simulations have generally been relied upon to characterize their sampling error. These simulations have taken two approaches. The first constructed data sets from relatively simple conditions, assuming identical allele frequency distributions across loci.
Conspicuously lacking from the array of relatedness estimators under test is the traditional general maximum-likelihood estimator.
Given the many desirable features of maximum-likelihood estimators, it would be natural to develop one for relatedness of individuals. Further, it would be useful to determine whether a maximum-likelihood estimator of relatedness approaches its asymptotic properties rapidly enough to be useful in practice or to compete with nonlikelihood estimators. In this study, we investigate one such estimator.
Note that the approach taken here, to estimate relationship as a continuous parameter, is distinct from a related one, which infers the degree of relatedness from among a set of discrete possibilities. Both approaches are discussed by ![]()
The primary goal of this study is to assess the performance of the likelihood estimator, in comparison with some of those already developed, under a range of biological sampling conditions. In particular, the performance is quantified by two measures of the distribution of estimates obtained for each estimator (the standard error and the bias) and by the overall deviation of those estimates from the parametric value, quantified by the root mean-square error. From this information it is possible to determine how aspects of the sampling conditions, e.g., number of loci or segregating alleles, or aspects of the relationship being estimated influence the ability of one estimator or another to perform well. Thus, some guidance for experimental design can be developed.
The initial focus of this study is on a relatively simple set of conditions, although one meant to mimic a variety of natural situations. In this sense, it is more closely connected to the evaluation used by ![]()
| STATISTICAL MODELS |
|---|
For population samples in which the ancestry of individual alleles is unknown, the most general means of describing the relatedness of one individual to another is in terms of nine identity modes with probabilities $\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_9)$, each of which represents the probability of the four alleles at a single locus in two diploid individuals sharing the corresponding particular pattern of identity-by-descent. In a large, noninbred population, only $\Delta_7$, $\Delta_8$, and $\Delta_9$ are nonzero; consequently, in such a population any pattern of relationship between two individuals can be described by that set of three coefficients. The most commonly used summary of the degree of relationship is the coefficient of coancestry,

$$\theta = \frac{\Delta_7}{2} + \frac{\Delta_8}{4}, \tag{1}$$

which quantifies the probability that two individuals will produce an inbred offspring were they to mate. This coefficient plays central roles in the estimation of heritability and additive genetic variance.
Relatedness is commonly summarized as $\theta$ or $r = 2\theta$, even though there is a loss of information in the process of transforming the complete set of parameters $\Delta$ into a single quantity $\theta$. For development of a likelihood estimator of relatedness, the underlying complete set of parameters $\Delta$ is used; subsequently $\theta$ (or $r$) can be calculated from Equation 1 if necessary.
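The reduction from the identity-by-descent probabilities to a single coefficient can be sketched in a few lines. This is only an illustration: the function name `coancestry` and the dictionary of relationships are hypothetical, and the identity-by-descent probabilities listed for each relationship are the usual textbook values for noninbred pairs, assumed here rather than taken from the article.

```python
def coancestry(delta7, delta8, delta9):
    """Return (theta, r) for a noninbred pair from identity-by-descent probabilities."""
    total = delta7 + delta8 + delta9
    if abs(total - 1.0) > 1e-9 or min(delta7, delta8, delta9) < 0:
        raise ValueError("probabilities must be non-negative and sum to 1")
    theta = delta7 / 2 + delta8 / 4   # Equation 1
    return theta, 2 * theta           # r = 2 * theta


# Textbook identity-by-descent probabilities (assumed, not from the article)
RELATIONSHIPS = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full-sibs": (0.25, 0.5, 0.25),
    "first cousins": (0.0, 0.25, 0.75),
    "unrelated": (0.0, 0.0, 1.0),
}

for name, delta in RELATIONSHIPS.items():
    theta, r = coancestry(*delta)
    print(f"{name}: theta={theta}, r={r}")
```

Parent-offspring and full-sib pairs map to the same $\theta$ despite different underlying probabilities, which is one concrete instance of the loss of information noted above.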
Likelihood models:
Likelihood estimators are based on a probability model of the sampled data. In this case, the unit of sampling is a pair of individuals, each one of which has been assayed genetically at L loci. The estimator described here is based on the assumption of independently segregating marker loci. The likelihood for the overall sample, therefore, is simply the product of the likelihoods across the loci.
The basic probability model of the sampled alleles at a single diploid locus is well known.
There are nine distinct patterns of identity-in-state for the four alleles sampled at a single locus in two individuals. Table 1 lists these in the second column. As an example, a pair of individuals each homozygous for allele $A_1$ represents identity-in-state mode 1, whereas two individuals of genotypes $A_1A_1$ and $A_1A_2$ represent identity-in-state mode 3.
Given that a pair of individuals is known to be related according to identity-by-descent mode $S_j$, the probability of each identity-in-state pattern depends on the allele frequencies. For example, if two noninbred individuals have two identical-by-descent alleles ($S_7$), either of two identity-in-state patterns (1 or 7) could occur, depending on the sampling of actual alleles at the locus. The former, which corresponds to both identical-by-descent alleles also having the same state $A_i$, occurs with probability $p_i^2$, whereas the latter, which corresponds to distinct $A_i$ and $A_j$ alleles, occurs with probability $2p_ip_j$. Table 1 lists the probability of observing each of these patterns of identity-in-state conditioned upon the two individuals being related according to each of the possible modes of identity-by-descent (Fig 1), Pr(pattern $i$ | $S_j$). Note that this table corrects a typographical error present in the classical formulation.
Recall that the set of parameters $\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_9)$ corresponds to the probabilities of each identity-by-descent mode and completely quantifies the degree of relatedness between individuals. The probability of observing identity-in-state pattern $i$, for two individuals at a single locus, given the degree of relatedness $\Delta$ and the distribution of allele frequencies, is equal to the likelihood of $\Delta$:

$$L(\Delta) = \Pr(\text{pattern } i \mid \Delta) = \sum_{j=1}^{9} \Pr(\text{pattern } i \mid S_j)\,\Delta_j. \tag{2}$$

The likelihood of the entire sampled array of L loci is simply the product of Equation 2 across loci. Although each locus will be characterized by its own set of allele frequencies, the degree of relatedness between the two individuals (the parameter $\Delta$ in Equation 2) is constant across loci as it represents the overall relatedness of the individuals to each other.
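The structure of the likelihood can be made concrete with a small sketch for the noninbred case, in which only the last three identity-by-descent modes contribute. It is not a transcription of Table 1: the conditional probabilities are derived directly from the genotypes under assumed Hardy-Weinberg proportions and known allele frequencies, and the names `genotype_prob`, `cond_probs`, and `log_likelihood` are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def genotype_prob(g, p):
    """Hardy-Weinberg probability of an unordered genotype g = (a, b)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def cond_probs(g1, g2, p):
    """Pr(genotype pair | two, one, or zero allele pairs identical by descent)."""
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    p2 = genotype_prob(g1, p) if g1 == g2 else 0.0
    p1 = 0.0
    for a in set(g1) & set(g2):              # candidate shared allele
        b = g1[1] if g1[0] == a else g1[0]   # remaining allele of individual 1
        c = g2[1] if g2[0] == a else g2[0]   # remaining allele of individual 2
        p1 += p[a] * p[b] * p[c]
    p0 = genotype_prob(g1, p) * genotype_prob(g2, p)
    return p2, p1, p0


def log_likelihood(delta, pairs, freqs):
    """Log of Equation 2 multiplied across independent loci.

    delta: (delta7, delta8, delta9); pairs: per-locus (g1, g2);
    freqs: per-locus list of allele frequencies.
    """
    d7, d8, d9 = delta
    total = 0.0
    for (g1, g2), p in zip(pairs, freqs):
        p2, p1, p0 = cond_probs(g1, g2, p)
        lik = d7 * p2 + d8 * p1 + d9 * p0
        if lik <= 0.0:
            return -math.inf
        total += math.log(lik)
    return total


# Sanity check: each conditional distribution sums to 1 over all ordered pairs.
p = [0.5, 0.3, 0.2]
genotypes = list(combinations_with_replacement(range(len(p)), 2))
sums = [
    sum(cond_probs(g1, g2, p)[k] for g1 in genotypes for g2 in genotypes)
    for k in range(3)
]
```

Summing logarithms rather than multiplying per-locus likelihoods is a numerical convenience; it gives the same maximum as the product described above.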
Parameter space:
The maximum-likelihood estimate of the set of $\Delta$ is found by searching over the parameter space until a maximum is found. In general an algebraic solution is impossible; as a result numerical methods are used. The implementation used here is based on a translation of the simplex method.
A number of possibilities exist for defining the parameter space over which the optimization will be carried out. The complete parameter space is, of course, eight-dimensional, corresponding to the nine distinct parameters $\Delta_j = \Pr(S_j)$ constrained by the fact that they sum to unity. The immediate purpose, however, is to consider the case of a large noninbred population. In this instance only the last three parameters are nonzero. For the purposes of this analysis, the maximum-likelihood estimate is obtained by optimization within the two-dimensional parameter space defined by the parameters $(\Delta_7, \Delta_8, \Delta_9)$ constrained by their sum being unity. It is meaningless to admit solutions outside this region as they correspond to undefined values for the probability of identity-by-descent.
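The constrained search can be illustrated with a deliberately simple stand-in. The sketch below replaces the simplex method with an exhaustive grid over the two-dimensional region, which is slower but makes the constraint explicit. The function name `mle_delta`, the `steps` resolution, and the input format (one tuple of conditional probabilities per locus) are assumptions made for this example.

```python
import math


def mle_delta(cond_probs_per_locus, steps=100):
    """Grid-search maximum of the likelihood over the noninbred parameter space.

    cond_probs_per_locus: list of (p2, p1, p0), the probability of the data at
    each locus given modes S7, S8, and S9 respectively.
    Returns ((delta7, delta8, delta9), theta).
    """
    best, best_ll = (0.0, 0.0, 1.0), -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            # Every grid point is non-negative and sums to one by construction.
            d7, d8, d9 = i / steps, j / steps, (steps - i - j) / steps
            ll = 0.0
            for p2, p1, p0 in cond_probs_per_locus:
                lik = d7 * p2 + d8 * p1 + d9 * p0
                if lik <= 0.0:
                    ll = -math.inf
                    break
                ll += math.log(lik)
            if ll > best_ll:
                best, best_ll = (d7, d8, d9), ll
    d7, d8, _ = best
    return best, d7 / 2 + d8 / 4   # theta from Equation 1


# One locus, both individuals homozygous for an allele of frequency 0.5
estimate, theta = mle_delta([(0.25, 0.125, 0.0625)])
```

Because every candidate lies inside the constrained region, the resulting $\theta$ can never fall outside [0, 0.5]; with a single uninformative locus the maximum sits on the boundary, as in the example.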
One of the useful features of maximum-likelihood estimators is that they can be readily adapted to a variety of situations. In some cases it may be known that individuals are either full-sibs or unrelated (or any other pair of degrees of relatedness; ![]()
Of primary concern is the statistical behavior of the likelihood estimator, especially in contrast to existing alternatives.
Method-of-moments estimators:
The performance of six estimators of relatedness, quantified as the coancestry coefficient $\theta$, was investigated. The likelihood estimator calculated $\theta$ from Equation 1 using the maximum-likelihood estimate of the identity-by-descent probabilities, $\Delta$. Five additional nonlikelihood estimators were considered as being representative of the diversity of the ones available. They represent five different means of using the similarity in allelic states between individuals to construct estimates of relatedness.
For estimators that are not symmetric in the two individuals $x$ and $y$, the two reciprocal estimates (for the ordered pairs $xy$ and $yx$) were averaged, in one case across loci. Estimators not expressed on the scale of $\theta$ were transformed to $\theta$ to be comparable with the maximum-likelihood estimator.
Several of these estimators have undesirable behavior under certain conditions. For example, with two alleles the ![]()
Additionally, the method-of-moments estimators are generally not constrained to lie within the biologically relevant range of [0, 0.5], unlike the traditional maximum-likelihood estimator. This property enables them to remain statistically unbiased; however, individual estimates may not have meaning when interpreted as probabilities of identity-by-descent. One means of handling this is to truncate the method-of-moments estimates to lie within the proper range, that is, to replace lower values with zero and larger values with 0.5. To investigate the effect of the parameter range itself, as opposed to the type of estimator, all method-of-moments estimators were examined in both the standard and the truncated form.
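The truncation rule described above, and the way it trades variance for bias near the boundary, can be shown with a toy example. The function name `truncate` and the four raw estimates are made up for illustration; they are not simulation output.

```python
def truncate(theta_hat, low=0.0, high=0.5):
    """Clamp an estimate of theta to the biologically meaningful range."""
    return min(max(theta_hat, low), high)


# Made-up estimates scattered symmetrically around the true value of 0
raw = [-0.10, -0.05, 0.05, 0.10]
clamped = [truncate(x) for x in raw]        # negative values become 0.0

mean_raw = sum(raw) / len(raw)              # essentially zero: no bias
mean_clamped = sum(clamped) / len(clamped)  # positive: truncation adds bias
```

The clamped estimates are less spread out than the raw ones, but their mean is pulled above the true value, which is the same mechanism that biases any estimator confined to [0, 0.5].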
Simulations:
A range of sampling conditions was considered to mimic the variety of different genetic markers that might be available for estimating relatedness. Although for some organisms huge arrays of polymorphic genetic loci are available, for the vast majority of natural populations this is not the case. Thus, the range in number of loci (5–30) mimics moderate genetic samples. A great variety of types of genetic markers are available for quantifying relatedness. These range from markers that segregate few distinguishable alleles, generally including allozymes, single-nucleotide polymorphisms, or restriction/PCR fragments, to markers that segregate many distinguishable alleles, commonly microsatellites. To reflect this range in marker types, loci segregating 2, 5, 10, and 20 alleles were considered. Three different allele-frequency distributions were used for the simulations: one in which all alleles occur at equal frequency, one in which a single allele occurs with a frequency of 0.8 and the remainder are equally frequent, and one in which allele frequencies at each locus were independently drawn from the same Dirichlet distribution.
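The three allele-frequency distributions can be generated along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and the Dirichlet distribution is taken to be symmetric with concentration `alpha=1.0`, a value chosen for illustration because the parameters actually used are not given here.

```python
import random


def equal_freqs(n):
    """All n alleles equally frequent."""
    return [1.0 / n] * n


def dominated_freqs(n, major=0.8):
    """One allele at frequency `major`; the remaining n - 1 share the rest equally."""
    rest = (1.0 - major) / (n - 1)
    return [major] + [rest] * (n - 1)


def dirichlet_freqs(n, alpha=1.0, rng=random):
    """One draw from a symmetric Dirichlet(alpha), via normalized gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


# Example: 30 loci with 10 alleles each, frequencies drawn independently per locus
rng = random.Random(1)
loci = [dirichlet_freqs(10, rng=rng) for _ in range(30)]
```

Normalizing independent gamma variates is a standard way to sample a Dirichlet distribution using only the standard library.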
To determine the statistical behavior of each estimator under each condition, sets of 1000 replicate samples of two individuals were obtained. Each of the six estimators was used to estimate $\theta$ for each of the replicate samples. The mean and standard error of the population of estimates were calculated from these samples. The bias of each estimator was quantified as the deviation of the mean from the known parametric value of $\theta$ used to generate the data. The root mean-square error was quantified as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat\theta_i - \theta^*\right)^2}, \tag{3}$$

where $\hat\theta_i$ is the $i$th estimate, $\theta^*$ is the parametric value used to generate the simulated data sets, and $n$ is the number of replicate estimates. In all cases it was assumed that the allele-frequency distribution was known without error. Consequently, the focus is on the sampling properties of the relatedness estimators themselves.
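The three performance measures can be computed from a set of replicate estimates as sketched below. The function name `summarize` is hypothetical, and the standard error is taken here to be the population standard deviation of the estimates (divisor $n$), an assumption made so that the squared root mean-square error decomposes exactly into squared bias plus variance.

```python
import math
import statistics


def summarize(estimates, theta_true):
    """Mean, standard error, bias, and root mean-square error of replicate estimates."""
    estimates = list(estimates)
    mean = statistics.fmean(estimates)
    se = statistics.pstdev(estimates)        # spread of the replicate estimates
    bias = mean - theta_true                 # deviation of the mean from the truth
    rmse = math.sqrt(
        statistics.fmean([(e - theta_true) ** 2 for e in estimates])
    )
    return {"mean": mean, "se": se, "bias": bias, "rmse": rmse}


# Made-up replicate estimates for a pair whose true theta is 0.25
result = summarize([0.2, 0.3, 0.4], 0.25)
```

Under this convention `rmse**2` equals `bias**2 + se**2`, which is the sense in which the root mean-square error integrates the other two measures.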
| RESULTS |
|---|
As with any estimator of genetic relatedness, the quality of the estimate depends on the amount of available genetic information. Typically, both the number of loci for which genetic information is available and the number of alleles segregating at those loci have strong influences on the standard error of the estimate of relatedness. Additionally, different estimators of relatedness often respond differently to the amount of genetic information available.
Fig 2 illustrates the general level of variation yielded by each of the estimators. It is evident that the likelihood estimator described here has lower standard error than any of the others under all conditions; however, the other five are rather similar overall. One feature of the other estimators is their propensity to yield estimates of $\theta$ that lie outside the biologically meaningful range of [0, 0.5]. Under some conditions approximately half of the estimates are negative, for example. Because the focus of interest for these measures is on a specific pair of individuals, it is difficult to interpret the meaning of estimates that suggest, for example, that two individuals are less related than unrelated. However, because the likelihood estimator is constrained to always produce estimates within the biologically meaningful range, some bias is introduced near the boundary. For example, for unrelated individuals $\theta = 0$, yet the likelihood estimator commonly generates values that overestimate it. Thus, while exhibiting less variation, the likelihood estimator is more biased under some conditions. Clearly, the truncated estimators will also exhibit less variation and more bias than the untruncated ones for the same reason.
An additional feature that is evident from Fig 2 is that many of the estimators are skewed. This is particularly the case for the ![]()
Standard error:
Fig 3 quantifies the standard error of each estimator of $\theta$ as a function of the amount of genetic information available. The variation for all estimators declines with the number of loci sampled. Generally, the standard error of the likelihood estimator is lower than that of any of the others.
In these simulations there is no indication that the two different versions of the ![]()
The actual degree of relatedness between individuals has relatively little effect on the standard error of the likelihood estimator of $\theta$. For example, the standard errors of 30-locus likelihood estimates for full-sibs and first cousins differ by <4%, despite these representing quite different degrees of relatedness. The standard error is lower for unrelated individuals because of the constraint that estimates must be within the biologically realistic range. This independence of actual relatedness for standard error of $\theta$ is broadly consistent across all the estimators when the allele-frequency distribution is favorable; it is somewhat less so when the allele-frequency distribution is dominated by a single allele segregating at high frequency or when variation among loci in allele-frequency distributions exists.
Bias:
The second main statistical feature of each estimator, bias, is illustrated in Fig 4. Two features are immediately apparent. All of the nonlikelihood estimators are essentially unbiased under all conditions; in contrast, the likelihood estimator is biased under some conditions. As mentioned before, this is a consequence of the fact that the likelihood estimator is constrained within the biologically meaningful range of [0, 0.5].
Unlike for standard error, the actual degree of relatedness does influence the bias of the likelihood estimator. For example, the likelihood estimator for parent-offspring and full-sib relationships yields estimates that are quite close to the true value of $\theta$; in fact, its bias is either zero or close enough as to be biologically insignificant. However, when actual relatedness is close to the boundary, which is the case for both first cousins and unrelated individuals, the bias is much larger. This is true for any of the allele-frequency distributions. When the allele-frequency distribution is favorable, increasing numbers of loci can substantially reduce the bias of the likelihood estimator, potentially to the point of being biologically insignificant even for unrelated individuals. In contrast, when the allele-frequency distribution is dominated by a single allele, increasing numbers of loci have little effect on bias for reasonable numbers of loci.
Root mean-square error:
The third main statistical feature of each estimator, root mean-square error, is illustrated in Fig 5. This measure is a reflection of the mean deviation of the distribution of estimates from the parametric value of $\theta$ used in the simulation. As such it integrates both the standard error and the bias of the estimators. Largely these curves follow the corresponding ones for standard error, an indication that under most conditions all estimators are essentially unbiased. Only in situations both lacking in useful genetic information (e.g., a single predominant allele at each locus) and with true relationships near the boundary of the parameter space (e.g., especially unrelated individuals) does the likelihood estimator perform notably worse than the others with regard to root mean-square error. In all other cases considered, the likelihood estimator performs better than any alternative with regard to this integrated measure of performance.
Truncated estimators:
The performance of the truncated method-of-moments estimators relative to the likelihood estimator is largely anticipated from Fig 2. In the cases of full-sib and parent-offspring relatedness relatively few estimates lie beyond the meaningful range of [0, 0.5], so truncation has little effect; in contrast, for first cousins and unrelated individuals, substantial numbers of estimates do so and truncation has a large effect. For example, although the standard error is essentially unchanged between truncated and nontruncated estimators for the former two, it is somewhat reduced for first cousins and substantially reduced for unrelated individuals. For first cousins the truncated method-of-moments estimators exhibit no standard error lower than that of the maximum-likelihood estimator, and in most cases still exceed it. The reduction in standard error is great enough for unrelated individuals that, for the Dirichlet distribution of allele frequencies and when one allele predominates the ![]()
The bias of the truncated method-of-moments estimators increases substantially for the two less-related cases. Although the relative increase in bias is quite large (>10- to 100-fold in some cases), the bias is somewhat less than that exhibited by the maximum-likelihood estimator. Under these conditions the truncated ![]()
Segregating alleles:
The previous discussion illustrated the performance of alternative estimators under several different sampling conditions. These results are quite typical. However, the difference between the likelihood estimator and the others declines as the number of alleles segregating at each locus increases. That is, the nonlikelihood estimators improve in performance and approach the likelihood estimator as the amount of genetic information increases.
The variation of the likelihood estimator itself also changes in response to increasing numbers of alleles segregating at each locus (Fig 6). In all cases, more segregating alleles reduce the standard error of the likelihood estimate of $\theta$. This is especially true when alleles are equally frequent. When one allele predominates, even a few additional alleles substantially reduce the standard error, which is not further reduced by many additional alleles. Thus, for a wide range of conditions a large reduction in standard error relative to biallelic samples is possible by sampling loci with even a few alleles; additional reductions are possible only if many alleles segregate at intermediate frequencies.
In contrast, the bias of the likelihood estimator is relatively unaffected by the number of alleles segregating at each locus (Fig 7). In fact, when the allele-frequency distribution is dominated by a single allele, the number of additional alleles segregating (and, as noted above, the number of loci) has essentially no influence on the bias over the range of loci considered (although the estimator is asymptotically unbiased). These sampling conditions are basically uninformative, so large samples are unhelpful. However, if the allele-frequency distribution is more favorable, the number of alleles segregating at each locus does have an influence on the bias of the likelihood estimator. As with the standard error, substantial reductions in bias are possible under all conditions of actual relatedness when even a few alleles are segregating. Even for conditions exhibiting the largest bias (e.g., unrelated individuals), the degree of bias can be reduced to biologically insignificant levels for realistic samples.
| DISCUSSION |
|---|
Despite its basic importance for understanding the biology of natural populations, estimation of relatedness between individuals remains a difficult challenge. A diversity of estimators (![]()
The prominent feature of the likelihood estimator is that it exhibits a lower standard error than that of any of the others for a wide range of reasonable sampling conditions. This conclusion appears to be independent of the number of loci sampled, the number of alleles segregating at each locus, or the frequency distribution of the segregating alleles. When many loci are sampled, the other estimators approach the performance of the likelihood estimator. This is especially true for the ![]()
These conclusions differ dramatically from those obtained earlier for different maximum-likelihood estimators of relatedness.
The other feature of the likelihood estimator is that it, unlike the others, is biased under some conditions. The degree of bias is dependent on the actual degree of relatedness between individuals and the nature of the genetic information. If the actual degree of relatedness is near a boundary, such as for first cousins or unrelated individuals, the bias is more severe than if the actual degree of relatedness is within the interior of the parameter space, such as for full-sibs. However, the degree of bias can be greatly reduced by sampling loci that segregate for more alleles. Additional segregating alleles are not helpful if their frequency distribution is highly skewed. Samples of 20–30 microsatellite loci, which often segregate for 20 or more alleles and exhibit high heterozygosity, could yield estimates of $\theta$ that are quite unbiased, even for unrelated individuals. However, even markers segregating for only a few alleles can also dramatically reduce the bias compared with biallelic markers. Thus, even though the likelihood estimator is more biased than any of the others, suitable genetic sampling can greatly reduce this problem. The amount of genetic information required to reduce the bias of the likelihood estimator to insignificant levels is within the range of feasible sampling efforts.
These two quantities, standard error and bias, are integrated by the measure of root mean-square error. As with the standard error, the likelihood estimator maintains a low root mean-square error under almost all conditions. The exceptions are cases involving little useful genetic information and true relationships that are near the boundary of the parameter space. However, the fact that a low root mean-square error is maintained by the likelihood estimator, despite the inherent potential for enhanced bias, indicates that in fact the bias is of little biological consequence. The other estimators more than make up for generally lower bias through their greater standard error. Additionally, the skewness identified for several other estimators may lead to further problems in practice.
Often the primary interest lies in the estimate of relatedness itself. Such would be the case, for example, in ascertaining family membership or determining the spatial structure of relatedness for sessile organisms. In such cases it is critical that each estimate of relatedness yield a biologically meaningful value. Such is the case for the likelihood estimator, which is constrained to yield estimates of $\theta$ within the biologically meaningful range of [0, 0.5]. The method-of-moments estimators may also be truncated to lie within the same range. Under conditions of low relatedness, this reduces their standard error and increases their bias. In many cases, however, the maximum-likelihood estimator still exhibits lower standard error or root mean-square error than that of the truncated ones. Thus, the general performance characteristics of this estimator are not strictly the result of constraining the parameter space to include only biologically meaningful estimates.
The primary interest in estimating relatedness may alternatively lie in using the estimates in a subsequent analysis. Such would be the case, for example, in estimating heritability or additive genetic variance from relatedness estimates and phenotypic observations.
In such analyses the sampling error of the estimate of $\theta$ can also be especially important, because error in the estimate of $\theta$ propagates to variation in derived estimates of additive genetic variance, $V_A$.
Depending on the sampling conditions, standard errors for the nonlikelihood estimators are between 2 and 250% larger than the standard error for the likelihood estimator. That discrepancy is substantially improved only by truncating the nonlikelihood estimators when individuals are unrelated; unfortunately, in practice it will be unknown whether the specific pair of interest is unrelated or not.
This difference in standard error between estimators is likely to be significant. For example, in the study by ![]()
Overall, the likelihood estimator discussed in this article offers several advantages compared with existing, commonly used estimators. First, except for some truncated estimators applied to unrelated individuals, it uniformly exhibits lower variation, even under conditions of relatively abundant genetic information. For example, even when 30 loci, each segregating for 20 equally frequent alleles, are sampled, the standard error is 10% (and >250% under some conditions) greater for some estimators than for the likelihood estimator. Second, all likelihood estimates are constrained to lie within the biologically meaningful range. Thus, the biological interpretation of individual estimates is quite straightforward. Although apparently not general practice, this constraint can be obtained by truncating the nonlikelihood estimates. While each estimate is now interpretable biologically, the statistical behavior of the truncated estimators is not generally improved over the maximum-likelihood estimator and in some cases is worse. Finally, the likelihood estimator naturally accommodates different genetic sampling schemes. For example, the relative weighting of data from microsatellite loci segregating for many alleles in contrast to data from single-nucleotide polymorphisms segregating for only two alleles is accomplished directly by the likelihood function. Consequently, all available data can be used to ascertain the degree of relatedness between two individuals.
The main drawback associated with the likelihood estimator is that it can be biased under some conditions, especially if the true degree of relatedness is near the boundary of the parameter space and little genetic information is available. The same is also true of the truncated estimators, some of which are more biased than the likelihood function even when the true relatedness is not near the boundary. Even though biased, however, the maximum-likelihood estimator exhibits a lower root mean-square error than do alternative estimators under many conditions. Thus, the bias is quite minor from a biological perspective. Furthermore, the extent of the bias can be greatly reduced by suitable genetic sampling. If markers with many alleles segregating at intermediate frequencies are used, the bias can be reduced considerably. Often markers with only a few alleles segregating at intermediate frequencies approach the performance of highly polymorphic ones. Given the relative ease with which genetic information can be obtained, bias is not likely to be a major drawback in practice. However, a preliminary study aimed at defining the allele-frequency distributions prior to selecting loci for more intense sampling can dramatically reduce the genetic information required to obtain estimates of relatedness.
Although the maximum-likelihood estimator of relatedness performs extremely well overall, there are conditions under which others perform better according to specific metrics. None perform uniformly better under all conditions according to all metrics. Whether the most relevant performance metric is the standard error, the bias, the root mean-square error, or some other one likely depends in detail on the ultimate use of the relatedness estimates. For specific applications under specific genetic conditions it may be possible to identify one estimator that is optimal. However, for its wide range of applicability and excellent performance across almost all conditions it is difficult to improve on the maximum-likelihood estimator.
| ACKNOWLEDGMENTS |
|---|
Thanks are due to Kelly Gallagher, Kermit Ritland, Elizabeth Thompson, and two anonymous reviewers for their helpful comments. Thanks are also due to Jinliang Wang for his comments and for clarifying the details of the weighting scheme used in his estimator. Finally, Sara Knott and Peter Visscher provided many stimulating discussions and insightful comments that greatly improved the manuscript. This work was supported by National Science Foundation grant DEB-9904026.
Manuscript received July 17, 2002; Accepted for publication November 26, 2002.
| APPENDIX A |
|---|
Although the likelihood function is in general complex and does not admit full analysis, some insight can be obtained for special cases, and those results can be used to test the simulations. The first special case corresponds to a single locus segregating for n equally frequent alleles. Table A1 gives a summary of this situation. Each of the allelic identity-in-state patterns 1 through 9 (see Table 1) recurs a number of times with different alleles; for example, the pattern in which all four alleles are identical in state (pattern 1) will occur n different times, corresponding to each of the n different alleles that are possible. Given that the true relationship between the individuals is known, the probability of each of the identity-in-state patterns can be calculated. These are given for each of the fundamental modes of relationship in a noninbred population (Δ7 = 1, Δ8 = 1, and Δ9 = 1); any other relationship is simply a linear combination of these.
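Probabilities of this kind can be checked by exhaustive enumeration. The sketch below is an illustration under stated assumptions rather than the procedure used to build the table: the numbering of the nine identity-in-state patterns is assumed (pattern 1 is all four alleles alike; patterns 7, 8, and 9 are two heterozygotes sharing two, one, and no alleles), alleles that are not identical by descent are assumed to be drawn independently with frequency 1/n, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def iis_pattern(x, y):
    """Classify two diploid genotypes into an identity-in-state pattern (1-9).

    The numbering is an assumption made for this sketch.
    """
    x_hom = x[0] == x[1]
    y_hom = y[0] == y[1]
    shared = len(set(x) & set(y))  # number of distinct alleles in common
    if x_hom and y_hom:
        return 1 if shared else 2
    if x_hom:
        return 3 if shared else 4
    if y_hom:
        return 5 if shared else 6
    return {2: 7, 1: 8, 0: 9}[shared]

def pattern_probabilities(n):
    """Exact pattern probabilities for n equally frequent alleles.

    Returns probs[mode][pattern] for the noninbred modes 7, 8 and 9.
    """
    probs = {mode: {k: Fraction(0) for k in range(1, 10)} for mode in (7, 8, 9)}
    alleles = range(n)
    # Mode 9: no alleles identical by descent, four independent draws.
    for a, b, c, d in product(alleles, repeat=4):
        probs[9][iis_pattern((a, b), (c, d))] += Fraction(1, n ** 4)
    # Mode 8: one allele shared by descent, three independent draws.
    for a, b, c in product(alleles, repeat=3):
        probs[8][iis_pattern((a, b), (a, c))] += Fraction(1, n ** 3)
    # Mode 7: both alleles shared by descent, two independent draws.
    for a, b in product(alleles, repeat=2):
        probs[7][iis_pattern((a, b), (a, b))] += Fraction(1, n ** 2)
    return probs

probs = pattern_probabilities(2)
observable = sorted(k for k in range(1, 10)
                    if any(probs[mode][k] > 0 for mode in (7, 8, 9)))
```

Under these assumptions each mode's probabilities sum to one, and pairs sharing both alleles by descent show pattern 1 with probability 1/n.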
For the case of a single locus segregating for n equally frequent alleles, the maximum of the likelihood function can be solved analytically. In some cases (e.g., patterns 3, 5, and 8) the maximum depends on the number of alleles segregating. Note also that when only a few alleles are segregating, not all of the identity-in-state patterns can be observed.
Even from Table A1 it is possible to understand the behavior of the maximum-likelihood estimator under more general conditions. For example, if an infinite number of alleles segregate in a noninbred population, identity-in-state pattern i (i ∈ {7, 8, 9}) will occur only when individuals are related by identity-by-descent mode Si. The corresponding maximum-likelihood estimate will also be Δi = 1. More loci will reinforce the correct estimate; in cases such as full-sibs or first cousins involving a linear combination of the three fundamental relationship modes, more loci will yield an estimate of the correct proportion of loci corresponding to each of the fundamental modes. This may provide intuitive justification for the asymptotically unbiased nature of the maximum-likelihood estimator.
The effect of fewer alleles segregating is also evident from Table A1. In this case identity-in-state patterns 1 through 6 will be observed simply because identical alleles are resampled from the finite pool. The effect of this depends on the number of alleles segregating. If only two alleles are segregating, only five of the nine possible patterns are observable. Two of these yield estimates of Δ7 = 1, one yields an estimate of Δ9 = 1, and the remainder yield indeterminate estimates. Given that some of these patterns can occur under any degree of relationship, it is clear that the maximum-likelihood estimate may be misleading under such situations. However, this is entirely due to the fact that the information available for the inference is itself misleading, something that no estimator can alter.
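The single-locus behavior described above can be illustrated with a short sketch. For one locus the likelihood is linear in (Δ7, Δ8, Δ9), so on the set where the three sum to one it is maximized by putting all weight on the mode with the largest pattern probability, and ties give an indeterminate estimate. The pattern numbering, the per-pattern probabilities (derived here assuming random mating and allele frequency p = 1/n), and the names `MIN_ALLELES`, `mode_likelihoods`, and `ml_modes` are assumptions of this sketch, not taken from the table.

```python
from fractions import Fraction

# Smallest number of segregating alleles for which each pattern can occur
# (assumed pattern numbering: 1-6 involve a homozygote, 7-9 two heterozygotes).
MIN_ALLELES = {1: 1, 2: 2, 3: 2, 4: 3, 5: 2, 6: 3, 7: 2, 8: 3, 9: 4}

def mode_likelihoods(pattern, n):
    """Probability of one allelic instance of a pattern under modes 7, 8, 9."""
    p = Fraction(1, n)
    table = {
        1: (p ** 2, p ** 3, p ** 4),
        2: (0, 0, p ** 4),
        3: (0, p ** 3, 2 * p ** 4),
        4: (0, 0, 2 * p ** 4),
        5: (0, p ** 3, 2 * p ** 4),
        6: (0, 0, 2 * p ** 4),
        7: (2 * p ** 2, 2 * p ** 3, 4 * p ** 4),
        8: (0, p ** 3, 4 * p ** 4),
        9: (0, 0, 4 * p ** 4),
    }
    return dict(zip((7, 8, 9), table[pattern]))

def ml_modes(pattern, n):
    """Modes i for which setting the coefficient of mode i to 1 maximizes the
    single-locus likelihood.

    Returns None if the pattern cannot be observed with n alleles; a set with
    more than one element marks an indeterminate estimate.
    """
    if n < MIN_ALLELES[pattern]:
        return None
    likelihoods = mode_likelihoods(pattern, n)
    best = max(likelihoods.values())
    return {mode for mode, value in likelihoods.items() if value == best}

two_allele_estimates = {k: ml_modes(k, 2) for k in range(1, 10)
                        if ml_modes(k, 2) is not None}
```

Exact fractions are used so that ties are detected reliably rather than being hidden by floating-point rounding.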
The rapid improvement in performance of the maximum-likelihood estimator with increasing number of segregating alleles is also understandable from Table A1. With only two alleles, the set of observable identity-in-state patterns is quite constrained and the maximum-likelihood estimates are less concordant with the actual mode of identity-by-descent. Even one additional allele greatly improves the concordance.
A second special case that is amenable to analysis corresponds to a parent-offspring pair assayed at L loci, each of which segregates for n equally frequent alleles. In this case the true relationship is S8 = 1 and the likelihood function is given by
[Equation A1]
where li is the number of loci exhibiting identity-in-state pattern i, L1 = l1 + l7 and L2 = l3 + l5 + l8, and c is a constant of proportionality independent of Δi. Explicit maximization of Equation A1 yields the estimate of relationship
[Equation A2]
where L = L1 + L2 is the total number of loci sampled. The accompanying estimates are Δ8 = 1 - Δ7 and Δ9 = 0. These estimates, together with the probabilities of observation given in Table A1 and the binomial distribution, can be used to derive the mean and variance of the estimate of Δ7.
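The kind of maximization described for the parent-offspring case can be sketched numerically. The likelihood used below is an assumed form, not necessarily that of Equation A1: with Δ9 fixed at 0 and Δ8 = 1 - Δ7, loci in the first group (patterns 1 and 7) are taken to contribute a factor Δ7 + Δ8/n and loci in the second group (patterns 3, 5, and 8) a factor Δ8. The closed form in `delta7_hat` follows from setting the derivative of the logarithm of that assumed likelihood to zero; all function names are hypothetical, and n is assumed to be at least 2.

```python
import math

def delta7_hat(L1, L2, n):
    """Maximizer of (D7 + D8/n)**L1 * D8**L2 with D8 = 1 - D7 and D9 = 0."""
    L = L1 + L2
    unconstrained = ((n - 1) * L1 - L2) / ((n - 1) * L)
    # The parameter space requires 0 <= D7 <= 1, so negative roots are truncated.
    return min(1.0, max(0.0, unconstrained))

def log_likelihood(d7, L1, L2, n):
    """Logarithm of the assumed likelihood, omitting the constant c."""
    d8 = 1.0 - d7
    total = 0.0
    for count, value in ((L1, d7 + d8 / n), (L2, d8)):
        if count == 0:
            continue  # a factor raised to the power 0 contributes nothing
        if value <= 0.0:
            return -math.inf  # zero likelihood
        total += count * math.log(value)
    return total

def grid_maximizer(L1, L2, n, steps=1000):
    """Brute-force check: best value of D7 on an evenly spaced grid over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda d7: log_likelihood(d7, L1, L2, n))

interior = delta7_hat(6, 4, 5)   # root lies inside the parameter space
boundary = delta7_hat(1, 9, 5)   # negative root, truncated at zero
```

Under these assumptions a parent-offspring pair falls in the first group with probability 1/n, so the untruncated root has expectation zero; it is the truncation at the boundary that moves the mean of the estimate above zero.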
| APPENDIX B |
|---|
The method-of-moment estimator used by ![]()
![]()
![]()
![]()
First, let
[Equation B1]
[Equation B2]
[Equation B3]
[Equation B4]
For each of these pairs of terms, the first uses the notation of ![]()
![]()
![]()
[Equation B5]
Noting that
xy and
xy of ![]()
7 and
8 of ![]()
7 +
8 +
9 = 1, some algebra demonstrates that this is equivalent to
[Equation B6]
where LLR is the likelihood function proposed by ![]()
| LITERATURE CITED |
|---|
BOEHNKE, M., and N. J. COX, 1997 Accurate inference of relationships in sib-pair linkage studies. Am. J. Hum. Genet. 61:423-429.
BROMAN, K. W., and J. L. WEBER, 1998 Estimation of pairwise relationships in the presence of genotyping error. Am. J. Hum. Genet. 63:1563-1564.
CROW, J., and M. KIMURA, 1970 An Introduction to Population Genetics. Burgess, Minneapolis.
FALCONER, D. S., 1981 Introduction to Quantitative Genetics, Ed. 2. Longman, London.
HAMILTON, W. D., 1964a The genetical evolution of social behaviour. I. J. Theor. Biol. 7:1-16.
HAMILTON, W. D., 1964b The genetical evolution of social behaviour. II. J. Theor. Biol. 7:17-52.
JACQUARD, A., 1974 The Genetic Structure of Populations. Springer-Verlag, New York.
KENDALL, M. G., A. STUART, and J. K. ORD, 1979 Advanced Theory of Statistics, Vol. 2: Inference and Relationship, Ed. 4. Macmillan, New York.
LI, C. C., D. E. WEEKS, and A. CHAKRAVARTI, 1993 Similarity of DNA fingerprints due to chance and relatedness. Hum. Hered. 43:45-52.
LYNCH, M., and K. RITLAND, 1999 Estimation of pairwise relatedness with molecular markers. Genetics 152:1753-1766.
LYNCH, M., and B. WALSH, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
MOUSSEAU, T. A., K. RITLAND, and D. D. HEATH, 1998 A novel method for estimating heritability using molecular markers. Heredity 80:218-224.
PAINTER, I., 1997 Sibship reconstruction without parental information. J. Agric. Biol. Environ. Stat. 2:212-229.
PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING, and B. P. FLANNERY, 1992 Numerical Recipes in C: The Art of Scientific Computing, Ed. 2. Cambridge University Press, Cambridge, UK.
QUELLER, D. C., and K. F. GOODNIGHT, 1989 Estimating relatedness using genetic markers. Evolution 43:258-275.
RITLAND, K., 1996a Estimators for pairwise relatedness and individual inbreeding coefficients. Genet. Res. 67:175-185.
RITLAND, K., 1996b A marker-based method for inferences about quantitative inheritance in natural populations. Evolution 50:1062-1073.
RITLAND, K., 2000 Marker-inferred relatedness as a tool for detecting heritability in nature. Mol. Ecol. 9:1195-1204.
RITLAND, K., and C. RITLAND, 1996 Inferences about quantitative inheritance based on natural population structure in the yellow monkeyflower, Mimulus guttatus. Evolution 50:1074-1082.
SIEBERTS, S. K., E. M. WIJSMAN, and E. A. THOMPSON, 2002 Relationship inference from trios of individuals, in the presence of typing error. Am. J. Hum. Genet. 70:170-180.
STUART, A., and J. K. ORD, 1987 Kendall's Advanced Theory of Statistics, Vol. 1: Distribution Theory, Ed. 5. Oxford University Press, New York.
THOMAS, S. C., J. M. PEMBERTON, and W. G. HILL, 2000 Estimating variance components in natural populations using inferred relationships. Heredity 84:427-436.
THOMAS, S. C., D. W. COLTMAN, and J. M. PEMBERTON, 2002 The use of marker-based relationship information to estimate the heritability of body weight in a natural population: a cautionary tale. J. Evol. Biol. 15:92-99.
THOMPSON, E. A., 1975 The estimation of pairwise relationships. Ann. Hum. Genet. 39:173-188.
THOMPSON, E. A., 1986 Pedigree Analysis in Human Genetics. Johns Hopkins University Press, Baltimore.
VAN DE CASTEELE, T., P. GALBUSERA, and E. MATTHYSEN, 2001 A comparison of microsatellite-based pairwise relatedness estimators. Mol. Ecol. 10:1539-1549.
WANG, J., 2002 An estimator for pairwise relatedness using molecular markers. Genetics 160:1203-1215.
[Table caption fragment: identity-in-state patterns i given modes of identity-by-descent Sj]