Genetics, Vol. 160, 357-366, January 2002, Copyright © 2002

Assessing Allelic Dropout and Genotype Reliability Using Maximum Likelihood

Craig R. Millera, Paul Joyceb, and Lisette P. Waitsa
a Department of Fish and Wildlife, College of Natural Resources, University of Idaho, Moscow, Idaho 83844
b Department of Mathematics, Division of Statistics, University of Idaho, Moscow, Idaho 83844

Corresponding author: Craig R. Miller, College of Natural Resources, University of Idaho, Moscow, ID 83844., mill8560{at}uidaho.edu (E-mail)

Communicating editor: S. TAVARÉ


*  ABSTRACT
*TOP
*ABSTRACT
*METHODS
*RESULTS AND DISCUSSION
*CONCLUSIONS AND OUTSTANDING...
*APPENDIX
*LITERATURE CITED

A growing number of population genetic studies utilize nuclear DNA microsatellite data from museum specimens and noninvasive sources. Genotyping errors are elevated in these low quantity DNA sources, potentially compromising the power and accuracy of the data. The most conservative method for addressing this problem is effective, but requires extensive replication of individual genotypes. In search of a more efficient method, we developed a maximum-likelihood approach that minimizes errors by estimating genotype reliability and strategically directing replication at loci most likely to harbor errors. The model assumes that false and contaminant alleles can be removed from the dataset and that the allelic dropout rate is even across loci. Simulations demonstrate that the proposed method marks a vast improvement in efficiency while maintaining accuracy. When allelic dropout rates are low (0–30%), the reduction in the number of PCR replicates is typically 40–50%. The model is robust to moderate violations of the even dropout rate assumption. For datasets that contain false and contaminant alleles, a replication strategy is proposed. Our current model addresses only allelic dropout, the most prevalent source of genotyping error. However, the developed likelihood framework can incorporate additional error-generating processes as they become more clearly understood.


THE extraction and amplification of DNA from museum, noninvasive, and forensic sources has great potential for studying and managing wild populations (KOHN and WAYNE 1997 Down; TABERLET et al. 1999 Down). The power of these approaches lies in their circuity: free-ranging animals can be "tagged" and sexed without ever being handled or observed (MORIN et al. 1994 Down; PALSBOLL et al. 1997 Down; REED et al. 1997 Down; TABERLET et al. 1997 Down; KOHN et al. 1999 Down; WOODS et al. 1999 Down; ERNEST et al. 2000 Down) and generations long dead can reveal the effective size of a population (MILLER and KAPUSCINSKI 1997 Down) and changes in patterns and levels of variability over time (ROY et al. 1996 Down; MUNDY et al. 1997 Down; BOUZAT et al. 1998 Down; TESSIER and BERNATCHEZ 1999 Down; LEONARD et al. 2000 Down). Unfortunately, the DNA extracted from aged material such as bone or from noninvasive sources such as hair and feces is often at low concentrations and/or highly fragmented. Under these conditions the probability of a genotyping error is severely elevated (TABERLET et al. 1996 Down). At nuclear microsatellite loci there are three types of errors: failure to amplify one of an individual's two alleles (allelic dropout), polymerase error rendering a "false" allele, and amplification of contaminant DNA. Of the three errors, allelic dropout appears to be the most serious problem (GAGNEUX et al. 1997 Down). Contamination can be controlled by stringent laboratory protocols and the inclusion of multiple negative controls. False alleles are considerably less frequent (GAGNEUX et al. 1997 Down; GOOSSENS et al. 1998 Down), often show an unusual spectral pattern, or produce three alleles that can be tagged as suspicious and replicated. In contrast, an allele that has dropped out leaves no trace of itself in the genotype data.

The cause of allelic dropout is believed to be stochastic sampling error (NAVIDI et al. 1992 Down; TABERLET et al. 1996 Down). Sampling occurs when the extract is pipetted into the PCR mix and again when the primers and polymerase bind the template DNA in the PCR. If template DNA is at very low concentrations, then one copy may, by chance, be amplified more than the other. [In this article the term "copies" refers to the one maternally inherited allele and the one paternally inherited allele (without regard to state) so that the term "alleles" can be used to specify the variants at a locus.] If the two copies represent different alleles (a heterozygote) then dropout yields a "false homozygote."

TABERLET et al. 1996 Down used a stochastic sampling simulation and replicate PCRs in the laboratory to show that at low DNA concentrations the rate of allelic dropout can be >50%. As a worst-case scenario, they reasoned that in a heterozygote where every positive PCR reflects the sampling of only one allele, the probability of sampling the same allele i times consecutively is 1/2i-1 (assuming that each allele is equally likely to drop out). Treating this as a null hypothesis, a locus where only one allele is observed is accepted as a genuine homozygote only when the heterozygote hypothesis has been made untenable by the data. The probability of false acceptance can be made arbitrarily small ({alpha}) by making the integer i arbitrarily large: i >= 1 - (ln {alpha}/ln 2). If {alpha} = 0.05, i >= 6, and if {alpha} = 0.01, i >= 8. NAVIDI et al. 1992 Down used a similar hypothesis-based approach to evaluate genotype reliability. Although their model is more complex in that contamination is considered, the homozygote case reduces to the same rejection rule.

If multiple loci are considered simultaneously, then an acceptance error at any heterozygous locus renders the genotype erroneous. The worst-case rationale can be extended to multiple loci by casting it as a decision rule— that is, a procedure specified before collecting any data that will yield a correct genotype with probability >=1 - {alpha}. The procedure accomplishing this is one that renders a correct genotype with probability = 1 - {alpha} under the worst possible scenario (a dropout rate of 1 and all loci heterozygous). Under these circumstances, the probability of obtaining a correct multilocus genotype is

(1)

where L is the number of loci and i is the number of replicates (see Appendix for proof; the word "replicates" is used to specify the per locus number of reactions). This probability can be made arbitrarily large by making i arbitrarily large: i >= 1 - [ln(1 - (1 - {alpha})1/L)/2] (see Appendix). If L = 8 then 9 replicates are required to meet the {alpha} = 0.05 criteria and 11 replicates are required to meet the {alpha} = 0.01 criteria. For the duration of this article, this is called the "worst-case rule" (WCR).

There are both practical and statistical shortcomings to a worst-case approach. Pragmatically, it leads to the need to perform large numbers of replicates. Acquiring accurate genetic information on a population will often involve typing hundreds or thousand of samples. As is shown, a study using eight loci and the WCR could easily require an average of 35 reactions per sample. This equates to 35,000 single reactions to accurately genotype 1000 samples. The financial costs associated with this number would prohibit many noninvasive and historical genetic studies. Furthermore, the limited amount of DNA extract may be consumed before 30 or 40 reactions can be performed. The statistical problem with a worst-case approach is that it makes essentially no use of the data in hand.

In contrast to ignoring the available data, GAGNEUX et al. 1997 Down estimated the dropout rate from shed hairs (rather than assuming it is 1) by replicating genotypes on 18 chimpanzees up to six times. Maintaining the heterozygous null, the p value is given by (p) x (p/2)i-1, where p is the dropout rate (GAGNEUX et al. 1997 Down). The problem with this approach is that obtaining an accurate estimate of p at every locus for every individual requires many replicates and thus defeats the promise of improved efficiency. The problem might be circumvented by pooling results across loci and samples, but this requires the assumption that dropout rates are equal across loci and samples. GAGNEUX et al. 1997 Down observed significantly different dropout rates between individuals (ranging from 0 to 75%) but not between loci. As with the WCR, this approach makes the assumption that every locus is heterozygous regardless of how likely this is to be true.

What are the implications of committing genotyping errors? Clearly, a high genotyping error rate could bias most current applications of microsatellites, including genetic mark-recapture studies, forensic identification of individuals, parentage analysis, population assignment, and estimates of population substructure. D. ROON, L. WAITS and K. KENDALL (unpublished data) and WAITS and LEBERG 2000 Down have shown that genotyping errors in noninvasive mark-recapture studies can result in severe overestimates of population size. Compounding this problem is the fact that resolution between individuals is improved by adding loci (MILLS et al. 2000 Down). Because the probability of obtaining a correct multilocus genotype declines as the product across individual loci, adding loci while using a single-locus statistical procedure (or worse yet, ignoring statistical issues of reliability altogether) results in an increasing per sample error rate. The potentially severe implications of committing errors, along with the costly and inefficient methods currently employed for avoiding them, provide strong motivation for developing an approach that uses the available data more efficiently. In this article we develop the theoretical basis of a more efficient method for acquiring reliable genotype data, using the method of maximum likelihood. We then evaluate its performance and compare it with the worst-case approach under a range of simplified scenarios.


*  METHODS
*TOP
*ABSTRACT
*METHODS
*RESULTS AND DISCUSSION
*CONCLUSIONS AND OUTSTANDING...
*APPENDIX
*LITERATURE CITED

General approach:
Before developing the proposed approach, it is helpful to overview the rationale behind it. Suppose that an individual is genotyped at each of a number of diploid loci i times. Assuming that contamination and false alleles do not occur or can be removed from the data (see CONCLUSIONS AND OUTSTANDING ISSUES), the observation of two different alleles at a locus implies that the individual is a heterozygote. If only one allele is observed, however, then the individual may either be a true homozygote or it may be a heterozygote at which i dropouts of the same allele have occurred. The probability of the latter event can be estimated as it is a function of the probability that the two copies differ (i.e., the heterozygosity) and the probability of a dropout. If the allele frequencies are known and Hardy-Weinberg equilibrium is assumed, then the heterozygosity (conditional on the allele observed) is readily obtained. The dropout probability for the sample at hand can be estimated by finding that dropout rate that makes the observed data most likely. The dropout rate must be estimated for each sample because samples differ in age, environmental exposure, etc., and therefore in quality and quantity of DNA. The (un)reliability of the observed multilocus genotype can then be estimated by weighting the probability that sequential dropout errors have occurred by the probability that the locus is heterozygous multiplied across the observed homozygous loci. Samples that are not reliable must be replicated until they are.

The model:
Consider a population where all loci under study are independent, at Hardy-Weinberg equilibrium, and have known allele frequencies. The model involves two sampling events. First, during reproduction alleles are sampled from the gamete pool and fixed into individuals. Second, during the PCR alleles are sampled from individuals across loci. Let ij denote the number of times sampling occurs at locus j. Each time one copy drops out with probability pj. If this error occurs, we assume that each copy is equally likely to be the dropout. Otherwise, both copies are observed. We do not consider the event that both copies drop out because PCR failure may be due to more than stochastic sampling error such as PCR reagents, thermocycler problems, etc. Hence, pj is actually the conditional probability of detecting one allele given that at least one allele amplifies. (It can be shown that, for realistic error rates, neglecting double dropouts has a negligible effect.) If the two copies are labeled a and b (where a could be the same or different from b), the results can be summarized as the number of times in which a, b, and ab are sampled: ra,j, rb,j, and rab,j, respectively. Let the vector of these counts be denoted by j. When the individual's true genotype, g, is known, the likelihood of the data given the genotype and given the dropout rates is trinomial multiplied across the T heterozygous loci.

(2)

Truly homozygous loci can be ignored because the probability of the data at them is 1.

Of course, the true genotype is not known. This is addressed by writing the likelihood as the sum of genotype-specific probabilities weighted by the unconditional probability of the genotype (i.e., its expected frequency) over all possible genotypes. In addition, the model is greatly simplified by assuming that across all samples the dropout rates at different loci are related to one another by a collection of constants such that pj = cjp. Then

(3)

If the dropout rates are equal across loci then cj = 1 for all j and the likelihood reduces to

(4)

Although all subsequent theory and analyses in this article are based on the assumption that the dropout rate is even across loci, this assumption can be relaxed for any of the derivations that follow by substituting cjp in for p.

It is the reliability of the genotype, not the dropout rate, that is of interest to the investigator. Let Ej be the event that the observed genotype is correct at locus j and let Eg be the event that it is correct across all loci. Let fj denote the frequency of the observed allele at locus j. Note that it is at the M loci observed as homozygous where errors may be hidden. The reliability of a genotype, P(Eg), is given by

(5)

For purposes of study design, the unconditional probability that a genotype will be correctly identified is useful. Let Zl be the event that the lth locus is heterozygous, ZCl the event that the lth locus is homozygous, Hl the heterozygosity at locus l, and L the number of loci typed. Then

(6)

By comparing Equation 5 and Equation 6 we see how (5) is dependent on the observed data while (6) is not. The dependency on the data occurs in two ways. Since it cannot be determined before viewing the data which loci will be observed as homozygous, the observed allele with frequency fj at homozygous locus j is itself data dependent. Also, note that the total number of observed homozygous loci, M, is a random variable that is determined only after viewing the data.

Estimating the dropout rate, genotype reliability, and number of additional replicates:
The value of p that maximizes Equation 4 is the maximum-likelihood estimate (MLE), . If is substituted for p into Equation 5 and <= p, then accepting only those genotypes that exceed a reliability criteria (1 - {kappa}) of say, 95%, would limit the long run frequency of accepting false genotypes to <=5%. But when p > , substituting into Equation 5 overstates the reliability, a nonconservative error. This error can be guarded against by using an upper confidence bound on p in Equation 5, (up). Formally, we define the estimated reliability,

(7)

where (up) is chosen so that P(Reliability >= Estimated reliability) = {kappa}. A method for finding (up) is described in the simulation section below. This general approach of estimating the reliability on the basis of a MLE of the dropout rate is referred to as the MLR method.

Suppose that between one and three PCR replicates are conducted initially at each locus and the reliability is estimated. If the estimated reliability is <1 - {kappa}, then further replication is necessary, but how much and at which loci? The answer lies in Equation 7. Because the estimated reliability is a product across individual loci, it follows that the largest per reaction increase in estimated reliability will occur by adding a replicate to the most unreliable locus. In theory, the most efficient procedure is to add one replicate to the most unreliable locus, reestimate the reliability, and continue in this manner until the estimated reliability >= 1 - {kappa}. Adding replicates one at a time and reevaluating the data between each is called the "single addition method" (SAM). When the (re)evaluation entails the MLR method as just described, the abbreviation MLRSAM is used.

In the laboratory, however, SAM will usually be impractical as it entails a PCR, gel, and analysis between every additional reaction. A more practical approach is to add additional replicates in a block. The following algorithm provides an efficient way to choose the size of the block: mathematically add one to the number of replicates at the most unreliable locus, assume that the same homozygote is observed, and reestimate the reliability using the original (up). Repeat this process until the estimated reliability >= 1 - {kappa}. Perform this pattern of replication in the laboratory. Unless (up) increases substantially with the new data, this method will yield a genotype estimated as reliable. If (up) does jump and the genotype is still estimated as unreliable, perform another block of replicates. Adding replicates en masse is called the "blockwise addition method" (BAM). When the block size and acceptability criteria are based on the MLR method, the abbreviation MLRBAM is used. (The MLR prefix is necessary here because SAM and BAM are also used in conjunction with the WCR evaluation criteria.)

Simulations:
Simulations are used here for two basic purposes: (i) to find the upper confidence bound on p, (up), to be used in Equation 7 and (ii) to evaluate and compare the performances of the various approaches. Specific methods to these ends are detailed in the subsections below, but the same basic simulation algorithm is used throughout. First, model parameters are fixed. These include the number of loci, number of initial replicates, allele frequencies, and the dropout rate. Unless otherwise noted, the dropout rate is equal across loci. In each run a multilocus genotype is created and then sampled with errors as described in The model above, to yield the data. A computer algorithm is used to search the potential dropout rates between zero and one at 1% increments until the MLE from Equation 4 is found. Simulations are carried out using Visual Basic 6.0.

Performance of the MLE: The performance of Equation 4 in estimating the dropout rate is evaluated by conducting 1000 runs for a given set of parameters and then calculating the (estimated) bias and standard error. Simulations are run to assess the effects of the number of loci (three to six), number of replicates (one to three), heterozygosity (18–67%), and the parametric dropout rate (0–1) on the performance.

Upper confidence bound on the dropout rate: There are two steps in determining (up). First we need to know how large p could be and still yield estimates as small or smaller than only a specified proportion of the time ({alpha}). Thus the upper 1 - {alpha} confidence bound on p [denoted (up1-{alpha})] is defined as the value of p that would yield MLEs <= , {alpha}(100)% of the time. Because the data are discrete, there may be a range of p's that meet this criteria; (up1-{alpha}) is the largest of these. Finding (up1-{alpha}) requires knowing the sampling distribution of for any given p, number of replicates, number of loci, and allele frequencies. When repeated many times, the algorithm described in the first paragraph of this subsection on simulations generates just this. Candidate values of p are examined individually until the largest one having {alpha} of its probability <= is found. This is achieved by a computer algorithm that searches the candidate value of p at 1% increments.

This establishes how (up1-{alpha}) can be determined for any value of {alpha}; the second task is to determine what {alpha} renders a rate of false inclusions {approx}{kappa}. If the reliability was a deterministic function of p, then a 1 - {alpha} upper bound on p would render a 1 - {alpha} lower bound on the reliability. However, the reliability is a stochastic function depending on both p and the data. As is shown, a 1 - {kappa} of 95% corresponds to a 1 - {alpha} of only 70%. There does not appear to be an analytic method for finding {alpha} for a given {kappa}. Instead, simulations are employed to test how different upper bounds limit the rate of false inclusions across a range of reasonable conditions. In these simulations, the upper bound (1 - {alpha}), number of loci, number of initial replicates, allele frequencies, and the reliability criteria (1 - {kappa}) are fixed. In each run, data are generated by performing the initial replicates, determining (up1-{alpha}), and estimating the reliability according to Equation 7. If the estimated reliability is <1 - {kappa}, then SAM is used to add replicates until the estimated reliability is >=1 - {kappa}. Upon acceptance, the true and observed genotypes are compared to yield a binary result of either correct or incorrect. This process is repeated 1000 times and the observed incidence of false inclusions is calculated. The appropriate upper bound is defined as the smallest value of 1 - {alpha} that limits the observed incidence of false inclusions to <={kappa} for all values of p. Upper bounds are searched at 5% increments.

Efficiency of the different approaches: Four different methods—MLRSAM, MLRBAM, WCRSAM, and WCRBAM—are evaluated and compared by running simulations under common parametric conditions (i.e., fixed dropout rate, number of loci, allele frequencies, and number of initial replicates). In all simulations, the data are evaluated after the initial reactions have been performed. In the MLR simulations, a sample estimated to be unreliable is replicated at observed homozygous loci either in a single (SAM) or a blockwise (BAM) fashion as described above until it is estimated as reliable. By comparing it to the true genotype, the accepted genotype is scored as either correct or incorrect and the number of reactions invested in it is recorded. This is repeated 1000 times to yield an observed incidence of false inclusions and a mean number of reactions per sample.

The WCR simulations are analogous except in the rules governing acceptance and additional replication. Under the WCRSAM, all observed homozygous loci are replicated once and reevaluated, replicated once and reevaluated, and so on. Loci that do not turn up heterozygous by this process are replicated until Equation 1 is satisfied. With the WCRBAM, one block of reactions is added to all observed homozygous loci so that Equation 1 is satisfied. Because the BAM is more practical in the lab, further simulations focus on the MLRBAM and WCRBAM approaches. Specifically, the effects of heterozygosity (50–80%), number of loci (four to eight), number of initial replicates (one to three), and reliability criteria (95–99.9%) on the number of reactions per sample are explored.

Interlocus dropout heterogeneity: Simulations are conducted to investigate how well the MLRBAM approach performs when it is assumed that the dropout rates are even across loci, but they are not. This is accomplished by running successive simulations where the dropout rates across loci are made increasingly uneven but other parameters are held constant. Reliability is estimated under the assumption of dropout rate homogeneity (i.e., using Equation 4 and Equation 7), and replicates are added using BAM. Each simulation consists of 1000 runs from which the incidence of false inclusions is calculated.


*  RESULTS AND DISCUSSION
*TOP
*ABSTRACT
*METHODS
*RESULTS AND DISCUSSION
*CONCLUSIONS AND OUTSTANDING...
*APPENDIX
*LITERATURE CITED

Performance of the MLE:
When the number of loci is less than four, genotypes are unreplicated, or the heterozygosity is <50%, there is little information in the data regarding the dropout rate. In this case, the MLE from Equation 4 tends to be biased high for small values of p (data not shown). While no bias is desirable, overestimating the dropout rate is a conservative error. Above these values (or for p > 0.5 below these values), the estimator becomes approximately unbiased. As expected in a binomial model, the largest standard errors are observed when p is near 50%.

Upper confidence bound on the dropout rate:
To make a conservative estimate of the reliability, we need a sufficiently conservative estimate of the dropout rate. Preliminary simulations showed that, irrespective of what upper bound is used, the incidence of false inclusions is highest when p is between 0.5 and 0.8 (data not shown). We therefore concentrated on finding the appropriate upper bound for p in this range as it will be sufficiently large for other values of p. When a 95% reliability criteria is required for acceptance (i.e., 1 - {kappa} = 0.95) and two initial replicates are performed, the approximate upper bound on p for four, six, or eight loci with H = 50 or 67% is between 65 and 75% (Table 1). Increasing the reliability criteria to 99% elevates the upper bounds slightly to between 70 and 75%. In the case of three initial replicates, the appropriate upper bounds are in the 60–70% range, while with initially unreplicated data upper bounds are between 75 and 85% (data not shown). These results are used to set the upper bound in subsequent simulations.


 
View this table:
In this window
In a new window

 
Table 1. Upper bound as a function of {kappa}, number of loci, and H

Efficiency of the different approaches:
This study is motivated largely by the apparent inefficiency of the worst-case approach. The central question is, therefore, how efficient is the proposed MLR method by comparison? Consider a case where two initial replicates are performed at six loci, all with 67% heterozygosity. The reliability criteria are set at 1 - {kappa} = 95% for MLR simulations and the probability of a correct genotype is likewise set at 1 - {alpha} = 95% for the WCR simulations. In comparing the mean number of total reactions required to achieve acceptable genotypes under each of the four methods (Fig 1A), several important trends emerge. First, the MLR methods are virtually always more efficient than the WCR method of the same replication strategy. Second, differences between the MLR and the WCR methods are largest when p = 0 and they disappear as p approaches 1. For p <= 0.2, the MLR methods require 10–12 reactions fewer than the WCR methods (a 40–50% reduction). We do note, however, that the total number of reactions in this and all other simulations in this article does not include any failed PCR reactions. While PCR failures should be rare for samples with low values of p, failures will increase the total reaction counts and reduce the proportional difference between approaches. Third, the SAM and BAM approaches are nearly equivalent for small values of p, but SAM is increasingly superior as p approaches 1.



View larger version (21K):
In this window
In a new window
Download PPT slide
 
Figure 1. Comparative performance of the WCRSAM (•), WCRBAM ({blacktriangleup}), MLRSAM ({circ}), MLRBAM ({triangleup}) across dropout rate, p. (A) Mean total number of reactions to achieve acceptable genotypes. (B) Mean estimated reliability of accepted genotypes. For WCRSAM and WCRBAM, reliability estimated after genotype acceptance using Equation 7 and a 70% upper bound on p. (C) Incidence of false inclusion over the 1000 runs. (D) The number of rounds of reactions needed under MLRBAM to achieve genotype acceptance. ({blacksquare}) Three rounds, () two rounds, ({square}) one round. Based on 1000 runs/simulation, six loci, two initial replicates/locus, H = 67% [allele frequencies: {0.33, 0.33, 0.34}], reliability criteria 1 - {kappa} = 95%, upper bound on p (1 - {alpha}) = 70%.

This third trend is important because BAM is far more practical than SAM in the laboratory and because dropout rates should be <0.5 for most samples. For example, GAGNEUX et al. 1997 Down and GOOSSENS et al. 1998 Down observed mean dropout rates of 31 and 14%, respectively, on single hair extracts and ERNEST et al. 2000 Down observed an 8% dropout rate from felid fecal DNA. The practicality of the MLRBAM approach is apparent when one considers the number of rounds of replication (i.e., the number of blocks including the initial block of 12 reactions) that were necessary in the MLRBAM simulations of Fig 1A (Fig 1D). The sample is nearly always accepted within two rounds and when p is small it is not uncommon to accept it without further replication.

The WCR is generally inefficient because it is designed to guard against a worst possible scenario, p = 1. When p is not large, the payoff of (over)replicating is, of course, that it renders genotypes with high estimated reliabilities and very few errors (Fig 1B and Fig C). In contrast, the MLR methods yield moderate estimated reliabilities and an incidence of false inclusion below—but generally not far below—{kappa}. This is true of all the MLR simulations in this article except those involving interlocus dropout heterogeneity (see below). To put these false inclusion rates in perspective, if genotypes in these simulations were unconditionally accepted after initial replication (i.e., without being subject to a reliability criteria) the incidence of erroneous genotypes would be ~0, 7, 29, 54, 75, and 93% for p = 0, 0.2, 0.4, 0.6, 0.8, and 1, respectively.

It might be argued that the lower false inclusion rate observed with WCR approaches over most of the range of p suggests it actually outperforms the MLR approaches in one respect. Recall, however, that the investigator is willing to tolerate up to 5% errors to reduce the number of reactions. If the investigator wishes to reduce the incidence of false inclusions below 5%, the reliability criteria in the MLR methods can simply be raised. Fig 2 shows the number of replicates required to achieve estimated reliabilities of 95, 99, and 99.9% with the MLRBAM approach for six loci, two initial replicates, and H = 67%. When p is low, only a few more reactions are required to achieve the higher estimated reliabilities, and even at p = 0.4, 99% estimated reliability is just four reactions >95% estimated reliability. As p gets large the cost of higher estimated reliability grows considerably.



View larger version (17K):
In this window
In a new window
Download PPT slide
 
Figure 2. Effect of reliability criteria (1 - {kappa}) on the number of reactions needed to achieve acceptable genotypes under the MLRBAM method. Based on 1000 runs/simulation, six loci, two initial replicates/locus, H = 67%. Upper bounds on p (1 - {alpha}): 70% for {kappa} = 0.05 ({diamond}), 75% for {kappa} = 0.01 ({triangleup}), 80% for {kappa} = 0.001 ({circ}). In all simulations, observed incidence of false inclusions <= 1 - {kappa}.

In addition to assuming that p = 1, the WCR assumes that H = 1. As these assumptions are approached, we expect the relative performance of the WCR methods to improve. Examining the comparative effect of heterozygosity on efficiency in the WCRBAM and MLRBAM approaches shows this to be true (Fig 3). But even at H = 80%, MLRBAM outperforms WCRBAM across the range where most samples will realistically fall (for p < 70%). When H = 50% and p is small, MLRBAM renders acceptable genotypes in approximately one-half as many reactions. Interestingly, the efficiency of the MLRBAM approach is only slightly improved by increasing heterozygosity.



View larger version (15K):
In this window
In a new window
Download PPT slide
 
Figure 3. Effect of heterozygosity on the number of reactions needed to achieve acceptable genotypes under WCRBAM and MLRBAM approaches. (•) H = 80%, WCRBAM; ({circ}) H = 80%, MLRBAM; ({blacktriangleup}) H = 50%, WCRBAM; ({triangleup}) H = 50%, MLRBAM. Based on 1000 runs/simulation, six loci, two initial replicates/locus, 95% reliability criteria, and 70% upper bound on p. Allele frequencies: H = 50%, {0.5, 0.5}; H = 80%, {0.2, 0.2, 0.2, 0.2, 0.2}. In all simulations, observed incidence of false inclusions <= 1 - {kappa}.

Because an error anywhere in a genotype renders it erroneous, adding loci elevates the estimated reliability required of each locus and thereby the number of replicates per locus. The impact of this effect was investigated by running simulations at four and eight loci while holding all other parameters constant (two initial replicates, H = 67%; Fig 4). Surprisingly, when p is near zero, doubling the number of loci approximately doubles the total number of reactions in the MLRBAM approach. This near linear increase reflects a near constancy in per locus replication. As p increases, however, the cost of adding loci escalates in a nonlinear manner (as indicated by the divergent MLRBAM lines in the figure). Fig 4 also shows that unless p is near one, adding loci increases the disparity between the MLRBAM and WCRBAM approaches.



View larger version (14K):
In this window
In a new window
Download PPT slide
 
Figure 4. Effect of number of loci on mean number of reactions needed to reach acceptable genotypes under WCRBAM and MLRBAM approaches. (•) Eight loci, WCRBAM; ({circ}) eight loci, MLRBAM; ({diamondsuit}) four loci, WCRBAM; ({diamond}) four loci, MLRBAM. Based on 1000 runs/simulations, two initial replicates/locus, H = 67%, 95% reliability criteria, and 70% upper bound on p. In all simulations, observed incidence of false inclusions <= 1 - {kappa}.

One parameter that affects efficiency and is easily manipulated by the investigator is the number of initial replicates per locus. TABERLET et al. 1996 Down proposed performing three initial replicates—though this choice was influenced by considerations of false and contaminant alleles as well as allelic dropout. We consider the cases of one, two, and three initial replicates while the other parameters are held constant (six loci, H = 67%). The MLRBAM approach on unreplicated data is very inefficient except when p is near 0 (Fig 5). This is especially apparent when it is compared to the theoretical optimum: MLRSAM on initially unreplicated data. Note how much poorer the performance of BAM is relative to SAM with unreplicated (Fig 5) compared to two-replicate data (Fig 1A). This is because unreplicated data yield a higher upper bound and more observed homozygote loci and these drive the size of the reaction block up. In practice, the most efficient replication strategy is two initial replicates under the MLR BAM method. Three initial replicates under MLRBAM is more efficient than two only when p >= 0.8. The two- and three-replicate results under the WCRBAM approach are even less efficient except when p is near 1.



View larger version (28K):
In this window
In a new window
Download PPT slide
 
Figure 5. Effect of number of initial replicates, Ri, on mean number of reactions to achieve acceptable genotypes under WCRBAM and MLRBAM approaches. The results of Ri = 1 under MLRSAM (the theoretical optimum) is also shown for comparison. Based on six loci, H = 67%, 95% reliability criteria. Upper bounds on p(1 - {alpha}): 75% for Ri = 1, 70% for Ri = 2, and 70% for Ri = 3. In all simulations, observed incidence of false inclusions <= 1 - {kappa}.

Interlocus dropout heterogeneity:
All simulations to this point have assumed that dropout rates are equal across loci. Here we address a simple question: How well does MLRBAM perform when the dropout rates are assumed to be even but they are not? Fig 6A shows the observed incidence of false inclusions under MLRBAM for increasing degrees of dropout rate heterogeneity when the other parameters are fixed (six loci, H = 67%, two initial replicates). An upper bound of 70% is used as would be appropriate when the error rates genuinely are homogenous (Table 1). Even in the moderately uneven case where two of the loci are at 60% of the maximum rate, two are at 80%, and two are at the full error rate (coded "1 1 .8 .8 .6 .6" in Fig 6), the incidence of false inclusions remains near 5% so long as p <= 0.8. When the unevenness is more severe with loci at 40 and 70% of the maximum error rate (1 1 .7 .7 .4 .4), the incidence of false inclusion becomes unacceptably large for p > 0.4.



View larger version (31K):
In this window
In a new window
Download PPT slide
 
Figure 6. Effect of interlocus dropout rate heterogeneity on incidence of false inclusions under MLRBAM when rates are assumed to be even in the analysis. Using (A) a 70% upper bound on p and (B) a 95% upper bound on p. The constants dictating the relative dropout rates between the six loci: ({circ}) 1 1 .7 .7 .4 .4, () 1 1 .9 .9 .8 .8, ({diamondsuit}) 1 1 .8 .8 .6 .6, ({diamond}) 1 1 1 1 1 1. For example, when p = 0.6, the dropout constants {1 1 .7 .7 .4 .4} render dropout rates of {.6 .6 .42 .42 .24 .24}. Based on 1000 runs/simulation, six loci, two replicates/locus, and 95% reliability criteria.

One remedy might be to use a larger upper bound on p to estimate reliability. Fig 6B shows the incidence of false inclusions across the same set of uneven dropout rates when a 1 - {alpha} = 95% upper bound is used. This reduces the incidence of false inclusions at low and moderate p values, but the effect diminishes as p gets large. For the most uneven case considered, the false inclusions rate is acceptable so long as p < 0.6. Surprisingly, this increase in the upper bound on p from 70 to 95% elevates the number of reactions only slightly (Fig 7). These results suggest that, if the dropout rates across loci are not highly uneven and/or if the base rate is not large, analyzing data under the even dropout rate assumption still yields reliable results. Using a higher upper bound on p increases the range of violations over which the model remains robust while not increasing the number of reactions appreciably.



View larger version (18K):
In this window
In a new window
Download PPT slide
 
Figure 7. Effect of upper bound on number of reactions needed to achieve acceptable genotypes under MLRBAM for two levels of interlocus dropout rate heterogeneity. Upper bounds on p are indicated within parentheses in legend. Based on 1000 runs/simulation, six loci, two replicates/locus, and 95% reliability criteria. See Fig 6 for incidence of false inclusions associated with these simulations.

Study design:
When designing and budgeting a study, it is often valuable to have an estimate of the number of reactions that will be required. Equation 6, the unconditional probability of obtaining a correct genotype, can provide such an estimate. This requires three things of the investigator: (1) confidence that the model is appropriate; (2) knowledge of the heterozygosity per locus, or a willingness to make an educated guess; and (3) knowledge of the dropout rate, or an educated guess. To avoid underbudgeting, the investigator can use conservatively low heterozygosities and conservatively high dropout rates. It should be noted that the number of replicates need not be even across loci; the SAM algorithm can be used to forecast how replication will proceed on average after the initial replicates are performed. We also note that the number of reactions will additionally include failures.


*  CONCLUSIONS AND OUTSTANDING ISSUES
*TOP
*ABSTRACT
*METHODS
*RESULTS AND DISCUSSION
*CONCLUSIONS AND OUTSTANDING...
*APPENDIX
*LITERATURE CITED

The most important result of this article is that under the model assumptions the MLRBAM represents an efficient method for obtaining reliable genotypes, especially in comparison to the WCRBAM approach. Although a number of variables are shown to affect this efficiency, two valuable points emerge. First, the MLRBAM method is especially efficient when p is small—and published data suggest it generally will be 0–40% (GERLOFF et al. 1995 Down; GAGNEUX et al. 1997 Down; GOOSSENS et al. 1998 Down; ERNEST et al. 2000 Down). This result is robust to changes in the number of loci, heterozygosity, reliability criteria, and, to a lesser extent, the number of initial replicates. But there will be samples, if only occasionally, where allelic dropout occurs far more often. The second important point is that these problem samples may require more reactions, but they do not compromise accuracy under the MLRBAM approach.

Tantamount to the performance of MLRBAM in simulations is the issue of its applicability to real data. Several of the assumptions upon which the model is based warrant closer scrutiny. One such assumption made here is that the dropout rates are even across loci. The findings that the model is robust to mild departures from evenness and the failure of GAGNEUX et al. 1997 Down to reject the even error rate null hypothesis across 11 microsatellite loci suggest that this assumption may often be sufficiently approximated. Furthermore, the model can accommodate heterogeneity by assuming that the dropout rates across loci are related by a set of constants (see Equation 3). This would require a preliminary experiment to estimate the constants for the loci being used.

A second assumption made in the MLR model is that the two alleles in a heterozygote are equally likely to drop out. It has been suggested that the longer allele may drop out more often than the shorter (GERLOFF et al. 1995 Down; GOOSSENS et al. 1998 Down), but to our knowledge the empirical data have failed to detect significant departures from an even dropout rate (GERLOFF et al. 1995 Down; GAGNEUX et al. 1997 Down). If future data reveal how length, or some other allele attribute such as repeat motif, affects the dropout rate, the likelihood framework will readily allow the incorporation of this knowledge. Likewise, it has been assumed here that allele frequencies are known. If the sample size is large and there is no tendency for certain alleles to drop out more than others, then the observed allele frequencies should be relatively accurate. When these conditions are not met, the current model may need to be modified to address this source of uncertainty.

The most serious assumption made in our model is that there are no false or contaminant alleles in the analyzed data. We do not assume that such alleles never occur, but rather that they can be flagged and removed from the data. Although several studies have reported occurrences of false and contaminant alleles that are non-negligible (TABERLET et al. 1996 Down; GAGNEUX et al. 1997 Down; GOOSSENS et al. 1998 Down), three reasons suggest that these estimates may overstate the problem in the current context: (1) Some counts include the cases where three or more alleles (one or more false) appear in one reaction as well as the more insidious cases where one or two alleles are observed but one is false; (2) counts may include false alleles that have never otherwise been observed in the population or species and would therefore be viewed with great suspicion; and (3) the current practice of scoring microsatellites using electropherograms provides much greater resolution for identifying anomalous patterns compared to the former technique of scoring alleles by autoradiography.

Nevertheless, cryptic false and contaminant alleles do occur and when they are undetected, they will cause genotype errors. It may be possible to explicitly incorporate these errors into the likelihood model, for example, by assigning each allele a conditional probability of being true and a probability of being false. Certainly, there is information regarding how likely an allele is to be false vs. true such as its frequency and its length relative to the other allele (most false alleles are one repeat shorter or longer than a true allele). Likewise, the contamination probability can be estimated by using numerous blanks during DNA extraction and PCR. In practice, one current option is to follow TABERLET et al. 1996 Down in assuming that the probability that the same false or contaminant allele will occur twice at the same locus is remote. This leads to the requirement that every allele be observed twice before accepting a genotype.

A dilemma arises with this approach, however, when a series of reactions at a locus yield one heterozygote result and the same homozygote for all the rest (e.g., ab, a, a, a). If in replicating we continue to observe the same homozygote, at what point should we begin to have serious doubts about it being a genuine heterozygote? Suppose that the locus is truly a heterozygote. The probability of observing one heterozygous result and i - 1 homozygotes of the same allele given the model is

(8)

Replacing p with , or more conservatively with (up), estimated from data at the other loci, a p value can be calculated. Even in the worst case, where p is near 80–85%, it is very unusual not to observe both alleles twice in a true heterozygote within five or six replicates. Once replication drives this p value below, say, 5%, the sample should be abandoned at this locus. The sample cannot be declared a homozygote because casting doubt on the heterozygote hypothesis and accepting the homozygote-false/contaminant allele hypothesis are not equivalent. The competing hypotheses cannot be resolved (as with a likelihood-ratio test) without placing a probability on the event of a false/contaminant allele (see also NAVIDI et al. 1992 Down). An alternative to discarding the locus completely would be to count it as one-half a locus occupied by the allele observed in the homozygote.

Though false and contaminant alleles are generally rare, homozygotes are not. Each time a homozygote is observed, there is the possibility that it represents a true heterozygote where consecutive dropout errors have occurred. In this article we developed a general mathematical framework for dealing with this source of uncertainty. While incomplete, the model shows promise for vastly improving the efficiency in acquiring reliable genetic data—a critical step toward realizing the potential of noninvasive, historic, and forensic genetic sampling.


*  ACKNOWLEDGMENTS

We thank Gordon Luikart for advice on writing the simulation program. This research was supported by the National Science Foundation (NSF) grant DEB-0089756 and the NSF EPSCoR program (Experimental Program to Stimulate Competitive Research), NSF cooperative agreement nos. EPS-9720634 and EPS-0080935.

Manuscript received February 23, 2001; Accepted for publication October 22, 2001.


*  APPENDIX
*TOP
*ABSTRACT
*METHODS
*RESULTS AND DISCUSSION
*CONCLUSIONS AND OUTSTANDING...
*APPENDIX
*LITERATURE CITED

PROOF OF Equation 1
Assumptions:

  1. All L loci are true heterozygotes.

  2. Allelic dropout occurs in every reaction. Denoting the two alleles a and b, the true genotype is revealed only if a drops out during one replication and b drops out during another.

  3. The probability of observing a particular allele is one-half per replicate per locus.

  4. Loci are independent.

Probability experiment:
Each locus is initially replicated k times. The observed homozygotes are then replicated an additional i - k times for a total of i replicates.

Recording the results of the experiment:
If at the end of the experiment all replicates produce the same allele, the locus is typed as a homozygote; otherwise it is typed as a heterozygote. The genotype is correctly identified if at the end of the experiment each locus is typed as a heterozygote.

Mathematics:
Let C be the event that the genotype is typed correctly. Let Mk equal the number of observed homozygotes after k replicates. Let Xj = 1 if, after k replicates, locus j is an observed homozygote and Xj = 0 otherwise. It follows that Mk = X1 + X2 + ... + XL. Now let s be any real number. It follows that

(A1)

Therefore,

Now replace s = 1 - (1/2)i-k in (A1) to get

which is Equation 1. Note that P(C) does not depend on k. The initial number of replicates does not affect the probability of a correct genotype.

If a genotype error probability is set to {alpha}, then P(C) = 1 - {alpha}. Therefore,

Therefore,

implying


*  LITERATURE CITED
*TOP
*ABSTRACT
*METHODS
*RESULTS AND DISCUSSION
*CONCLUSIONS AND OUTSTANDING...
*APPENDIX
*LITERATURE CITED

BOUZAT, J. L., H. A. LEWIN, and K. N. PAIGE, 1998  The ghost of genetic diversity past: historical DNA analysis of the greater prairie chicken. Am. Nat. 152:1-6.

ERNEST, H. B., M. C. T. PENDO, B. P. MAY, S. M. SYVANEN, and W. M. BOYCE, 2000  Molecular tracking of mountain lions in the Yosemite Valley region in California: genetic analysis using microsatellite and faecal DNA. Mol. Ecol. 9:433-441[Medline].

GAGNEUX, P., C. BOESCH, and D. S. WOODRUFF, 1997  Microsatellite scoring errors associated with noninvasive genotyping based on nuclear DNA amplified from shed hairs. Mol. Ecol. 6:861-868[Medline].

GERLOFF, U., C. SCHLÖTTERER, K. RASSMANN, I. RAMBOLD, and G. HOHMANN et al., 1995  Amplification of hypervariable simple sequence repeats (microsatellites) from excremental DNA of wild living bonobos (Pan paniscus). Mol. Ecol. 4:515-518.

GOOSSENS, B., L. P. WAITS, and P. TABERLET, 1998  Plucked hair samples as a source of DNA: reliability of dinucleotide microsatellite genotyping. Mol. Ecol. 7:1237-1241[Medline].

KOHN, M. H. and R. K. WAYNE, 1997  Facts from feces revisited. Trends Ecol. Evol. 12:223-227.

KOHN, M. H., E. C. YORK, D. A. KAMRADT, G. HAUGHT, and R. M. SAUVAJOT et al., 1999  Estimating population size by genotyping faeces. Proc. R. Soc. Lond. Ser. B 266:1-7[Medline].

LEONARD, J. A., R. K. WAYNE, and A. COOPER, 2000  Population genetics of ice age brown bears. Proc. Natl. Acad. Sci. USA 97:1651-1654[Abstract/Free Full Text].

MILLER, L. M. and A. R. KAPUSCINSKI, 1997  Historical analysis of genetic variation reveals low effective population size in northern pike (Esox lucius) population. Genetics 147:1249-1258[Abstract].

MILLS, L. S., J. J. CITTA, K. P. LAIR, M. K. SCHWARTZ, and D. A. TALLMON, 2000  Estimating animal abundance using noninvasive DNA sampling: promise and pitfalls. Ecol. Appl. 10:283-294.

MORIN, P. A., J. WALLIS, J. J. MOORE, and D. S. WOODRUFF, 1994  Paternity exclusion in a community of wild chimpanzees using hypervariable simple sequence repeats. Mol. Ecol. 3:469-478[Medline].

MUNDY, N. I., C. S. WINCHELL, T. BURR, and D. W. WOODRUFF, 1997  Microsatellite variation and microevolution in the critically endangered San Clemente Island loggerhead shrike (lanius ludovicianus mearnsi). Proc. R. Soc. Lond. Ser. B 264:869-875.

NAVIDI, W., N. ARNHEIM, and M. S. WATERMAN, 1992  A multiple-tubes approach for accurate genotyping of very small DNA samples by using PCR: statistical considerations. Am. J. Hum. Genet. 50:347-359[Medline].

PALSBØLL, P. J., J. ALLEN, M. BÉRUBÉ, P. J. CLAPHAM, and T. P. FEDDERSEN et al., 1997  Genetic tagging of humpback whales. Nature 388:767-769[Medline].

REED, J. Z., D. J. TOLLIT, P. M. THOMPSON, and W. AMOS, 1997  Molecular scatology: the use of molecular genetic analysis to assign species, sex and individual identity to seal faeces. Mol. Ecol. 6:225-234[Medline].

ROY, M. S., E. GEFFEN, D. SMITH, and R. K. WAYNE, 1996  Molecular genetics of pre-1940 red wolves. Conserv. Biol. 10:1413-1424.

TABERLET, P., S. FRIFFIN, B. GOOSSENS, S. QUESTIAU, and V. MANCEAU et al., 1996  Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Res. 24:3189-3194[Abstract/Free Full Text].

TABERLET, P., J. J. CAMARRA, S. GRIFFIN, E. UHRÈS, and O. HANOTTE et al., 1997  Noninvasive genetic tracking of the endangered Pyrenean brown bear population. Mol. Ecol. 6:869-876[Medline].

TABERLET, P., L. P. WAITS, and G. LUIKART, 1999  Noninvasive genetic sampling: look before you leap. Trends Ecol. Evol. 14:323-327[Medline].

TESSIER, N. and L. BERNATCHEZ, 1999  Stability of population structure and genetic diversity across generations assessed by microsatellites among sympatric populations of landlocked Atlantic salmon (Salmo salar L.). Mol. Ecol. 8:169-179.

WAITS, J. L. and P. L. LEBERG, 2000  Biases associated with population estimation using molecular tagging. Anim. Conserv. 3:191-200.

WOODS, J. G., D. PAETKAU, D. LEWIS, B. N. MCLELLAN, and M. PROCTOR et al., 1999  Genetic tagging of free-ranging black and brown bears. Wildl. Soc. Bull. 27:616-627.




This article has been cited by other articles:


Home page
GeneticsHome page
P. C. D. Johnson and D. T. Haydon
Maximum-Likelihood Estimation of Allelic Dropout and False Allele Error Rates From Microsatellite Genotypes in the Absence of Reference Data
Genetics, February 1, 2007; 175(2): 827 - 842.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
J. B. A. Okello, G. Wittemyer, H. B. Rasmussen, I. Douglas-Hamilton, S. Nyakaana, P. Arctander, and H. R. Siegismund
Noninvasive Genotyping and Mendelian Analysis of Microsatellites in African Savannah Elephants
J. Hered., November 1, 2005; 96(6): 679 - 687.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
J. D. Wehausen, R. R. Ramey II, and C. W. Epps
Experiments in DNA Extraction and PCR Amplification from Bighorn Sheep Feces: the Importance of DNA Extraction Method
J. Hered., November 1, 2004; 95(6): 503 - 509.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
C. R. Miller and L. P. Waits
The history of effective population size and genetic diversity in the Yellowstone grizzly (Ursus arctos): Implications for conservation
PNAS, April 1, 2003; 100(7): 4334 - 4339.
[Abstract] [Full Text] [PDF]