Assessing Allelic Dropout and Genotype Reliability Using Maximum Likelihood
*Department of Fish and Wildlife, College of Natural Resources, University of Idaho, Moscow, Idaho 83844
†Department of Mathematics, Division of Statistics, University of Idaho, Moscow, Idaho 83844
¹Corresponding author: Department of Fish and Wildlife, College of Natural Resources, University of Idaho, Moscow, ID 83844. Email: mill8560{at}uidaho.edu
Abstract
A growing number of population genetic studies use nuclear DNA microsatellite data from museum specimens and noninvasive sources. Genotyping errors are elevated in these low-quantity DNA sources, potentially compromising the power and accuracy of the data. The most conservative method for addressing this problem is effective but requires extensive replication of individual genotypes. In search of a more efficient method, we developed a maximum-likelihood approach that minimizes errors by estimating genotype reliability and strategically directing replication at the loci most likely to harbor errors. The model assumes that false and contaminant alleles can be removed from the dataset and that the allelic dropout rate is even across loci. Simulations demonstrate that the proposed method is a vast improvement in efficiency while maintaining accuracy. When allelic dropout rates are low (0–30%), the number of PCR replicates is typically reduced by 40–50%. The model is robust to moderate violations of the even dropout rate assumption. For datasets that contain false and contaminant alleles, a replication strategy is proposed. Our current model addresses only allelic dropout, the most prevalent source of genotyping error. However, the likelihood framework developed here can incorporate additional error-generating processes as they become better understood.
THE extraction and amplification of DNA from museum, noninvasive, and forensic sources has great potential for studying and managing wild populations (Kohn and Wayne 1997; Taberlet et al. 1999). The power of these approaches lies in their indirectness: free-ranging animals can be “tagged” and sexed without ever being handled or observed (Morin et al. 1994; Palsbøll et al. 1997; Reed et al. 1997; Taberlet et al. 1997; Kohn et al. 1999; Woods et al. 1999; Ernest et al. 2000), and generations long dead can reveal the effective size of a population (Miller and Kapuscinski 1997) and changes in patterns and levels of variability over time (Roy et al. 1996; Mundy et al. 1997; Bouzat et al. 1998; Tessier and Bernatchez 1999; Leonard et al. 2000). Unfortunately, the DNA extracted from aged material such as bone or from noninvasive sources such as hair and feces is often at low concentrations and/or highly fragmented. Under these conditions the probability of a genotyping error is severely elevated (Taberlet et al. 1996). At nuclear microsatellite loci there are three types of errors: failure to amplify one of an individual's two alleles (allelic dropout), polymerase error producing a “false” allele, and amplification of contaminant DNA. Of the three, allelic dropout appears to be the most serious problem (Gagneux et al. 1997). Contamination can be controlled by stringent laboratory protocols and the inclusion of multiple negative controls. False alleles are considerably less frequent (Gagneux et al. 1997; Goossens et al. 1998), and they often show an unusual spectral pattern or produce three alleles in one reaction, so they can be flagged as suspicious and replicated. In contrast, an allele that has dropped out leaves no trace of itself in the genotype data.
The cause of allelic dropout is believed to be stochastic sampling error (Navidi et al. 1992; Taberlet et al. 1996). Sampling occurs when the extract is pipetted into the PCR mix and again when the primers and polymerase bind the template DNA in the PCR. If template DNA is at very low concentrations, then one copy may, by chance, be amplified more than the other. [In this article the term “copies” refers to the one maternally inherited allele and the one paternally inherited allele (without regard to state) so that the term “alleles” can be used to specify the variants at a locus.] If the two copies represent different alleles (a heterozygote), then dropout yields a “false homozygote.”
Taberlet et al. (1996) used a stochastic sampling simulation and replicate PCRs in the laboratory to show that at low DNA concentrations the rate of allelic dropout can be >50%. As a worst-case scenario, they reasoned that in a heterozygote where every positive PCR reflects the sampling of only one allele, the probability of sampling the same allele i times consecutively is (1/2)^{i−1} (assuming that each allele is equally likely to drop out). Treating heterozygosity as the null hypothesis, a locus where only one allele is observed is accepted as a genuine homozygote only when the heterozygote hypothesis has been made untenable by the data. The probability of false acceptance can be made arbitrarily small (α) by making the integer i sufficiently large: i ≥ 1 − (ln α/ln 2). If α = 0.05, i ≥ 6, and if α = 0.01, i ≥ 8. Navidi et al. (1992) used a similar hypothesis-based approach to evaluate genotype reliability. Although their model is more complex in that contamination is considered, the homozygote case reduces to the same rejection rule.
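The single-locus threshold is easy to check numerically. The Python sketch below (the function name is our own, hypothetical choice) returns the smallest integer i with (1/2)^{i−1} ≤ α, which is the same condition as i ≥ 1 − ln α/ln 2 under the worst-case assumptions just stated.

```python
import math


def min_single_locus_replicates(alpha: float) -> int:
    """Smallest integer i with (1/2)**(i - 1) <= alpha under the worst-case model
    (dropout rate 1, locus truly heterozygous, each allele equally likely to drop out)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # i >= 1 - ln(alpha) / ln(2), rounded up to a whole number of replicates
    i = math.ceil(1.0 - math.log(alpha) / math.log(2.0))
    # Guard against floating-point round-off right at an integer boundary
    while 0.5 ** (i - 1) > alpha:
        i += 1
    return i


for alpha in (0.05, 0.01):
    print(alpha, min_single_locus_replicates(alpha))
```

Run as written, this prints 6 for α = 0.05 and 8 for α = 0.01, the values quoted above.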
If multiple loci are considered simultaneously, then an acceptance error at any heterozygous locus renders the genotype erroneous. The worst-case rationale can be extended to multiple loci by casting it as a decision rule, that is, a procedure specified before any data are collected that will yield a correct genotype with probability ≥1 − α. The procedure accomplishing this is one that renders a correct genotype with probability 1 − α under the worst possible scenario (a dropout rate of 1 and all loci heterozygous); we refer to it as the worst-case rule (WCR). Under these circumstances, with i_{j} replicates at locus j of L loci, the probability of obtaining a correct multilocus genotype is

$$\prod_{j=1}^{L}\left[1-\left(\tfrac{1}{2}\right)^{i_{j}-1}\right] \ge 1-\alpha. \qquad (1)$$
There are both practical and statistical shortcomings to a worst-case approach. Pragmatically, it leads to the need to perform large numbers of replicates. Acquiring accurate genetic information on a population will often involve typing hundreds or thousands of samples. As shown later, a study using eight loci and the WCR could easily require an average of 35 reactions per sample. This equates to 35,000 single reactions to accurately genotype 1000 samples. The financial costs associated with this number would prohibit many noninvasive and historical genetic studies. Furthermore, the limited amount of DNA extract may be consumed before 30 or 40 reactions can be performed. The statistical problem with a worst-case approach is that it makes essentially no use of the data in hand.
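To see why the worst-case approach is so costly, the multilocus criterion can be evaluated directly. Under the worst case (dropout rate 1, every locus heterozygous), a locus typed i times is miscalled with probability (1/2)^{i−1}, so the probability of a correct genotype is the product of 1 − (1/2)^{i_j−1} across loci. The sketch below assumes, for simplicity, the same number of replicates at every locus (our simplification, not necessarily how the WCR simulations allocate reactions); the function names are hypothetical.

```python
def worst_case_correct_prob(replicates):
    """P(correct multilocus genotype) when p = 1 and all loci are heterozygous."""
    prob = 1.0
    for i in replicates:
        # a heterozygote is miscalled only if all i reactions show the same allele
        prob *= 1.0 - 0.5 ** (i - 1)
    return prob


def min_equal_replicates(n_loci, alpha):
    """Smallest common per-locus replicate count meeting the worst-case criterion."""
    i = 1
    while worst_case_correct_prob([i] * n_loci) < 1.0 - alpha:
        i += 1
    return i


print(min_equal_replicates(1, 0.05))  # single locus
print(min_equal_replicates(8, 0.05))  # eight loci
```

For a single locus this reproduces i = 6; for eight loci at α = 0.05 it gives nine replicates per locus, or 72 reactions for a sample in which no locus happened to reveal both alleles. The averages quoted above are lower because loci that turn up heterozygous can stop early.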
Rather than ignoring the available data, Gagneux et al. (1997) estimated the dropout rate from shed hairs (instead of assuming it is 1) by replicating genotypes of 18 chimpanzees up to six times. Maintaining the heterozygous null, the p value is given by p × (p/2)^{i−1}, where p is the dropout rate (Gagneux et al. 1997). The problem with this approach is that obtaining an accurate estimate of p at every locus for every individual requires many replicates and thus defeats the promise of improved efficiency. The problem might be circumvented by pooling results across loci and samples, but this requires the assumption that dropout rates are equal across loci and samples. Gagneux et al. (1997) observed significantly different dropout rates between individuals (ranging from 0 to 75%) but not between loci. As with the WCR, this approach assumes that every locus is heterozygous regardless of how likely this is to be true.
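The p value above can be read as follows: the first reaction must be a dropout of either copy (probability p), and each of the remaining i − 1 reactions must drop the same copy again (probability p/2 each). A minimal sketch, with a hypothetical function name:

```python
def gagneux_p_value(p: float, i: int) -> float:
    """P(a true heterozygote shows the same single allele in all i reactions).

    First reaction: dropout of either copy (p); each later reaction must drop
    the same copy again (p / 2).
    """
    if not 0.0 <= p <= 1.0 or i < 1:
        raise ValueError("need 0 <= p <= 1 and i >= 1")
    return p * (p / 2.0) ** (i - 1)


print(gagneux_p_value(1.0, 6))   # worst case: reduces to (1/2)**(i - 1)
print(gagneux_p_value(0.31, 3))  # a dropout rate of 31%
```

With p = 1 the expression reduces to the worst-case (1/2)^{i−1}; with p = 0.31, three identical homozygous results already give a p value of about 0.0074.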
What are the implications of committing genotyping errors? Clearly, a high genotyping error rate could bias most current applications of microsatellites, including genetic mark-recapture studies, forensic identification of individuals, parentage analysis, population assignment, and estimates of population substructure. D. Roon, L. Waits and K. Kendall (unpublished data) and Waits and Leberg (2000) have shown that genotyping errors in noninvasive mark-recapture studies can result in severe overestimates of population size. Compounding this problem is the fact that resolution between individuals is improved by adding loci (Mills et al. 2000). Because the probability of obtaining a correct multilocus genotype is the product of the per-locus probabilities, it declines as loci are added; using a single-locus statistical procedure (or worse yet, ignoring statistical issues of reliability altogether) therefore results in a per-sample error rate that increases with every added locus. The potentially severe implications of committing errors, along with the costly and inefficient methods currently employed for avoiding them, provide strong motivation for developing an approach that uses the available data more efficiently. In this article we develop the theoretical basis of a more efficient method for acquiring reliable genotype data, using the method of maximum likelihood. We then evaluate its performance and compare it with the worst-case approach under a range of simplified scenarios.
METHODS
General approach: Before developing the proposed approach, it is helpful to outline the rationale behind it. Suppose that an individual is genotyped i times at each of a number of diploid loci. Assuming that contamination and false alleles do not occur or can be removed from the data (see Conclusions and Outstanding Issues), the observation of two different alleles at a locus implies that the individual is a heterozygote. If only one allele is observed, however, then the individual may either be a true homozygote or be a heterozygote at which i dropouts of the same allele have occurred. The probability of the latter event can be estimated because it is a function of the probability that the two copies differ (i.e., the heterozygosity) and the probability of a dropout. If the allele frequencies are known and Hardy-Weinberg equilibrium is assumed, then the heterozygosity (conditional on the allele observed) is readily obtained. The dropout probability for the sample at hand can be estimated by finding the dropout rate that makes the observed data most likely. The dropout rate must be estimated for each sample because samples differ in age, environmental exposure, etc., and therefore in quality and quantity of DNA. The (un)reliability of the observed multilocus genotype can then be estimated by weighting the probability that sequential dropout errors have occurred by the probability that the locus is heterozygous, and multiplying across the observed homozygous loci. Samples that are not reliable must be replicated until they are.
The model: Consider a population where all loci under study are independent, at Hardy-Weinberg equilibrium, and have known allele frequencies. The model involves two sampling events. First, during reproduction, alleles are sampled from the gamete pool and fixed into individuals. Second, during the PCR, alleles are sampled from individuals across loci. Let i_{j} denote the number of times sampling occurs at locus j. Each time, one copy drops out with probability p_{j}. If this error occurs, we assume that each copy is equally likely to be the one that drops out. Otherwise, both copies are observed. We do not consider the event that both copies drop out, because PCR failure may have causes other than stochastic sampling error, such as problems with PCR reagents or the thermocycler. Hence, p_{j} is actually the conditional probability of detecting one allele given that at least one allele amplifies. (It can be shown that, for realistic error rates, neglecting double dropouts has a negligible effect.) If the two copies are labeled a and b (where a could be the same as or different from b), the results can be summarized as the number of times in which a, b, and ab are sampled: r_{a,j}, r_{b,j}, and r_{ab,j}, respectively. Let the vector of these counts be denoted by ñ_{j}. When the individual's true genotype, g, is known, the likelihood of the data given the genotype and the dropout rates is a trinomial probability multiplied across the T heterozygous loci.
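Under the assumptions just listed, our reading of the per-locus trinomial is i_j!/(r_{a,j}! r_{b,j}! r_{ab,j}!) × (p_j/2)^{r_{a,j}+r_{b,j}} × (1 − p_j)^{r_{ab,j}}. The sketch below (hypothetical function names, written from the stated model rather than copied from a displayed equation) computes it and the known-genotype likelihood.

```python
from math import factorial


def het_locus_likelihood(r_a: int, r_b: int, r_ab: int, p: float) -> float:
    """Trinomial probability of the replicate results at a true heterozygote.

    r_a, r_b: reactions in which only copy a or only copy b was seen (one copy dropped out);
    r_ab: reactions in which both copies were seen; p: per-reaction dropout probability,
    split evenly between the two copies.
    """
    n = r_a + r_b + r_ab
    coef = factorial(n) // (factorial(r_a) * factorial(r_b) * factorial(r_ab))
    return coef * (p / 2) ** (r_a + r_b) * (1 - p) ** r_ab


def genotype_likelihood(het_counts, p_by_locus):
    """Likelihood of the data given a known genotype: product over heterozygous loci.

    Homozygous loci contribute a factor of 1, because with double dropouts excluded
    a homozygote shows its single allele in every reaction.
    """
    like = 1.0
    for (r_a, r_b, r_ab), p in zip(het_counts, p_by_locus):
        like *= het_locus_likelihood(r_a, r_b, r_ab, p)
    return like


print(genotype_likelihood([(1, 0, 1), (0, 0, 2)], [0.3, 0.3]))
```

Because the three outcome probabilities p/2, p/2, and 1 − p sum to one, the per-locus probabilities over all possible count vectors with the same total also sum to one, which is a convenient sanity check.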
Of course, the true genotype is not known. This is addressed by writing the likelihood as the sum, over all possible genotypes, of the genotype-specific probabilities weighted by the unconditional probability of each genotype (i.e., its expected frequency). In addition, the model is greatly simplified by assuming that across all samples the dropout rates at different loci are related to one another by a collection of constants such that p_{j} = c_{j}p. Then
It is the reliability of the genotype, not the dropout rate, that is of interest to the investigator. Let E_{j} be the event that the observed genotype is correct at locus j and let E_{g} be the event that it is correct across all loci. Let f_{j} denote the frequency of the observed allele at locus j. Note that errors can be hidden only at the M loci observed as homozygous. The reliability of a genotype, P(E_{g}), is given by the product over these M loci:

$$P(E_{g}) = \prod_{j=1}^{M} P(E_{j}) = \prod_{j=1}^{M} \frac{f_{j}^{2}}{f_{j}^{2} + 2f_{j}(1-f_{j})\left(p_{j}/2\right)^{i_{j}}}. \qquad (5)$$
For purposes of study design, the unconditional probability that a genotype will be correctly identified is useful. Let Z_{l} be the event that the lth locus is heterozygous,
Comparing Equations 5 and 6 shows that (5) depends on the observed data while (6) does not. The dependence on the data arises in two ways. Because it cannot be determined before viewing the data which loci will be observed as homozygous, the frequency f_{j} of the observed allele at homozygous locus j is itself data dependent. Also, the total number of observed homozygous loci, M, is a random variable that is determined only after viewing the data.
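The contrast between data-dependent and unconditional reliability can be made concrete. The sketch below is our own derivation under the model's assumptions (Hardy-Weinberg equilibrium, no false alleles, no double dropouts), not a transcription of the article's equations, and the function names are hypothetical. An observed homozygote is a true homozygote with probability proportional to f², or a heterozygote showing the same allele in all i reactions with probability proportional to 2f(1 − f)(p/2)^i. Before any data are seen, a locus with heterozygosity H is miscalled with probability H · 2(p/2)^i.

```python
def locus_reliability(f: float, i: int, p: float) -> float:
    """P(an observed homozygote with allele frequency f, seen in all i reactions, is correct)."""
    if not 0.0 < f <= 1.0:
        raise ValueError("the observed allele must have a positive frequency")
    true_hom = f * f                                  # genuine homozygote under HWE
    hidden_het = 2.0 * f * (1.0 - f) * (p / 2.0) ** i  # heterozygote with i identical dropouts
    return true_hom / (true_hom + hidden_het)


def genotype_reliability(observed_homs, p: float) -> float:
    """Data-dependent reliability: product over the loci observed as homozygous.

    observed_homs is a list of (f_j, i_j); loci observed as heterozygous contribute 1.
    """
    rel = 1.0
    for f, i in observed_homs:
        rel *= locus_reliability(f, i, p)
    return rel


def unconditional_correct_prob(loci, p: float) -> float:
    """Design-stage probability of a correct genotype, before any data are seen.

    loci is a list of (H_l, i_l): locus l is heterozygous with probability H_l and is then
    miscalled only if all i_l reactions show the same allele, probability 2 (p/2)**i_l.
    """
    prob = 1.0
    for h, i in loci:
        prob *= 1.0 - h * 2.0 * (p / 2.0) ** i
    return prob


# Six loci with H = 2/3 and two replicates each, accepted without any reliability check
for p in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(p, round(1.0 - unconditional_correct_prob([(2 / 3, 2)] * 6, p), 3))
```

The first two functions need the observed alleles and the set of homozygous loci; the third needs only H_l and i_l, which is why it is usable at the design stage. For six loci with H = 2/3 and two replicates, the printed error probabilities (roughly 8, 28, 54, 76, and 91% for p = 0.2 to 1) are broadly in line with the simulated incidences reported in the Results for genotypes accepted without a reliability criterion.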
Estimating the dropout rate, genotype reliability, and number of additional replicates: The value of p that maximizes Equation 4 is the maximum-likelihood estimate (MLE),
Suppose that between one and three PCR replicates are conducted initially at each locus and the reliability is estimated. If the estimated reliability is <1 − κ, then further replication is necessary, but how much and at which loci? The answer lies in Equation 7. Because the estimated reliability is a product across individual loci, the largest per-reaction increase in estimated reliability will occur by adding a replicate to the most unreliable locus. In theory, the most efficient procedure is to add one replicate to the most unreliable locus, re-estimate the reliability, and continue in this manner until the estimated reliability is ≥1 − κ. Adding replicates one at a time and re-evaluating the data after each is called the “single addition method” (SAM). When the (re)evaluation uses the MLR method as just described, the abbreviation MLR_{SAM} is used.
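One evaluation cycle of the single addition method can be sketched as: estimate p by maximum likelihood, compute the product of per-locus reliabilities, and pick the least reliable observed homozygote for the next reaction. The sketch below assumes a common dropout rate across loci (all c_j = 1), maximizes the likelihood by a simple grid search (our choice, not necessarily the article's numerical method), and uses a locus encoding and function names that are purely hypothetical. The article plugs a conservative upper bound on p into the reliability rather than the point estimate, so that value is left to the caller here.

```python
import math


def log_likelihood(loci, p):
    """Log-likelihood of one sample's replicate data for a common dropout rate p,
    dropping factors that do not depend on p.

    Each locus is ("het", r_single, r_ab): both alleles seen, r_single reactions showing
    one allele and r_ab showing both; or ("hom", f, i): one allele of frequency f in all i.
    """
    ll = 0.0
    for locus in loci:
        if locus[0] == "het":
            _, r_single, r_ab = locus
            if r_single:
                ll += r_single * (math.log(p / 2) if p > 0 else -math.inf)
            if r_ab:
                ll += r_ab * (math.log(1 - p) if p < 1 else -math.inf)
        else:
            _, f, i = locus
            # true homozygote (f^2) or hidden heterozygote with i identical dropouts
            ll += math.log(f * f + 2 * f * (1 - f) * (p / 2) ** i)
    return ll


def mle_dropout(loci, steps=1000):
    """Grid-search maximum-likelihood estimate of p on [0, 1]."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: log_likelihood(loci, p))


def locus_reliability(f, i, p):
    hom = f * f
    return hom / (hom + 2 * f * (1 - f) * (p / 2) ** i)


def estimated_reliability(loci, p):
    """Product of per-locus reliabilities over loci observed as homozygous."""
    rel = 1.0
    for locus in loci:
        if locus[0] == "hom":
            rel *= locus_reliability(locus[1], locus[2], p)
    return rel


def next_locus_to_replicate(loci, p):
    """SAM step: index of the least reliable observed homozygote (None if there is none)."""
    homs = [k for k, locus in enumerate(loci) if locus[0] == "hom"]
    if not homs:
        return None
    return min(homs, key=lambda k: locus_reliability(loci[k][1], loci[k][2], p))


sample = [("het", 1, 1), ("hom", 0.5, 2), ("hom", 0.1, 2)]
print(mle_dropout(sample), estimated_reliability(sample, 0.7), next_locus_to_replicate(sample, 0.7))
```

In this toy sample, the homozygote for the rare allele (f = 0.1) is the least reliable locus, because a heterozygote carrying a rare allele is relatively more likely than a rare homozygote.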
In the laboratory, however, SAM will usually be impractical, as it entails a PCR, gel, and analysis between every additional reaction. A more practical approach is to add replicates in a block. The following algorithm provides an efficient way to choose the size of the block: mathematically add one to the number of replicates at the most unreliable locus, assume that the same homozygote is observed, and re-estimate the reliability using the original
Simulations: Simulations are used here for two basic purposes: (i) to find the upper confidence bound on p,
Performance of the MLE: The performance of Equation 4 in estimating the dropout rate is evaluated by conducting 1000 runs for a given set of parameters and then calculating the (estimated) bias and standard error. Simulations are run to assess the effects of the number of loci (three to six), number of replicates (one to three), heterozygosity (18–67%), and the parametric dropout rate (0–1) on the performance.
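A bias and standard-error study of this kind can be sketched in a few lines. The sketch below is a hypothetical setup, not the article's simulation code: it draws genotypes under Hardy-Weinberg equilibrium, lets one randomly chosen copy drop out with probability p in each PCR, estimates p by grid-search maximum likelihood (assuming a common rate across loci), and reports the Monte Carlo bias and the standard deviation of the estimates.

```python
import math
import random
import statistics


def simulate_locus(freqs, i, p, rng):
    """Draw a genotype under HWE, then simulate i >= 1 PCRs in which, with probability p,
    one randomly chosen copy drops out. Returns ("het", r_single, r_ab) or ("hom", f, i)."""
    a, b = rng.choices(range(len(freqs)), weights=freqs, k=2)
    results = []
    for _ in range(i):
        if rng.random() < p:
            results.append({rng.choice((a, b))})  # one copy dropped out
        else:
            results.append({a, b})               # both copies amplified
    seen = set().union(*results)
    if len(seen) == 2:
        r_single = sum(1 for r in results if len(r) == 1)
        return ("het", r_single, i - r_single)
    (allele,) = seen  # the single allele actually observed
    return ("hom", freqs[allele], i)


def log_likelihood(loci, p):
    """Log-likelihood in p with constants dropped (same model as the genotype likelihood)."""
    ll = 0.0
    for locus in loci:
        if locus[0] == "het":
            _, r_single, r_ab = locus
            if r_single:
                ll += r_single * (math.log(p / 2) if p > 0 else -math.inf)
            if r_ab:
                ll += r_ab * (math.log(1 - p) if p < 1 else -math.inf)
        else:
            _, f, i = locus
            ll += math.log(f * f + 2 * f * (1 - f) * (p / 2) ** i)
    return ll


def mle_dropout(loci, steps=200):
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: log_likelihood(loci, p))


def dropout_bias_and_se(n_loci, freqs, i, p, runs=1000, seed=1):
    """Monte Carlo bias and standard error of the MLE of p for one parameter set."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        sample = [simulate_locus(freqs, i, p, rng) for _ in range(n_loci)]
        estimates.append(mle_dropout(sample))
    return statistics.mean(estimates) - p, statistics.stdev(estimates)


# Six loci, three equally frequent alleles each (H = 2/3), two replicates, p = 0.3
print(dropout_bias_and_se(6, [1 / 3] * 3, 2, 0.3, runs=200))
```

Note that a sample in which every locus appears homozygous pushes the estimate to p = 1, since homozygous loci alone carry no evidence against dropout; this is one way the upward bias at small p described in the Results can arise when the data are uninformative.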
Upper confidence bound on the dropout rate: There are two steps in determining
This establishes how
Efficiency of the different approaches: Four different methods (MLR_{SAM}, MLR_{BAM}, WCR_{SAM}, and WCR_{BAM}) are evaluated and compared by running simulations under common parametric conditions (i.e., fixed dropout rate, number of loci, allele frequencies, and number of initial replicates). In all simulations, the data are evaluated after the initial reactions have been performed. In the MLR simulations, a sample estimated to be unreliable is replicated at observed homozygous loci, either singly (SAM) or blockwise (BAM) as described above, until it is estimated as reliable. The accepted genotype is then compared with the true genotype and scored as either correct or incorrect, and the number of reactions invested in it is recorded. This is repeated 1000 times to yield an observed incidence of false inclusions and a mean number of reactions per sample.
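The blockwise idea can be sketched as a planning loop: add replicates on paper to the least reliable observed homozygote, assuming each new reaction shows the same homozygote again, until the estimated reliability reaches 1 − κ, and then run the whole block at once. This sketch assumes (our assumption) that the dropout rate used in the reliability calculation is held fixed while the block is planned; the function names and the max_added safeguard are hypothetical.

```python
def locus_reliability(f, i, p):
    """P(an observed homozygote with allele frequency f, seen in all i reactions, is correct)."""
    hom = f * f
    return hom / (hom + 2.0 * f * (1.0 - f) * (p / 2.0) ** i)


def bam_block(hom_loci, p, kappa, max_added=500):
    """Plan one block of extra reactions for the loci currently observed as homozygous.

    hom_loci: list of (f_j, i_j). Returns the number of extra reactions per locus.
    """
    counts = [i for _, i in hom_loci]
    added = [0] * len(hom_loci)

    def reliability():
        rel = 1.0
        for (f, _), i in zip(hom_loci, counts):
            rel *= locus_reliability(f, i, p)
        return rel

    while reliability() < 1.0 - kappa:
        if sum(added) >= max_added:
            raise RuntimeError("criterion not reached; check p and kappa")
        # add one hypothetical replicate to the currently least reliable locus
        worst = min(range(len(hom_loci)),
                    key=lambda k: locus_reliability(hom_loci[k][0], counts[k], p))
        counts[worst] += 1
        added[worst] += 1
    return added


print(bam_block([(0.5, 2), (0.1, 2), (0.3, 2)], p=0.7, kappa=0.05))
```

If the block then reveals a heterozygote or the reliability is still too low, the data are re-evaluated and another block is planned, which is why the number of rounds, rather than single reactions, is the practical cost in the laboratory.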
The WCR simulations are analogous except in the rules governing acceptance and additional replication. Under the WCR_{SAM}, all observed homozygous loci are replicated once and re-evaluated, replicated once more and re-evaluated, and so on. Loci that do not turn up heterozygous by this process are replicated until Equation 1 is satisfied. With the WCR_{BAM}, one block of reactions is added to all observed homozygous loci so that Equation 1 is satisfied. Because the BAM is more practical in the lab, further simulations focus on the MLR_{BAM} and WCR_{BAM} approaches. Specifically, we explore the effects of heterozygosity (50–80%), number of loci (four to eight), number of initial replicates (one to three), and reliability criteria (95–99.9%) on the number of reactions per sample.
Interlocus dropout heterogeneity: Simulations are conducted to investigate how well the MLR_{BAM} approach performs when dropout rates are assumed to be even across loci but are not. This is accomplished by running successive simulations in which the dropout rates across loci are made increasingly uneven while other parameters are held constant. Reliability is estimated under the assumption of dropout rate homogeneity (i.e., using Equations 4 and 7), and replicates are added using BAM. Each simulation consists of 1000 runs from which the incidence of false inclusions is calculated.
RESULTS AND DISCUSSION
Performance of the MLE: When the number of loci is less than four, genotypes are unreplicated, or the heterozygosity is <50%, there is little information in the data regarding the dropout rate. In these cases, the MLE from Equation 4 tends to be biased high for small values of p (data not shown). Although an unbiased estimator would be preferable, overestimating the dropout rate is a conservative error. Above these values (or for p > 0.5 below these values), the estimator becomes approximately unbiased. As expected in a binomial model, the largest standard errors are observed when p is near 50%.
Upper confidence bound on the dropout rate: To make a conservative estimate of the reliability, we need a sufficiently conservative estimate of the dropout rate. Preliminary simulations showed that, irrespective of the upper bound used, the incidence of false inclusions is highest when p is between 0.5 and 0.8 (data not shown). We therefore concentrated on finding the appropriate upper bound for p in this range, as it will be sufficiently large for other values of p. When a 95% reliability criterion is required for acceptance (i.e., 1 − κ = 0.95) and two initial replicates are performed, the approximate upper bound on p for four, six, or eight loci with H = 50 or 67% is between 65 and 75% (Table 1). Increasing the reliability criterion to 99% elevates the upper bounds slightly, to between 70 and 75%. In the case of three initial replicates, the appropriate upper bounds are in the 60–70% range, while with initially unreplicated data the upper bounds are between 75 and 85% (data not shown). These results are used to set the upper bound in subsequent simulations.
Efficiency of the different approaches: This study is motivated largely by the apparent inefficiency of the worst-case approach. The central question is, therefore, how efficient is the proposed MLR method by comparison? Consider a case where two initial replicates are performed at six loci, all with 67% heterozygosity. The reliability criterion is set at 1 − κ = 95% for the MLR simulations, and the probability of a correct genotype is likewise set at 1 − α = 95% for the WCR simulations. In comparing the mean number of total reactions required to achieve acceptable genotypes under each of the four methods (Figure 1A), several important trends emerge. First, the MLR methods are virtually always more efficient than the WCR method with the same replication strategy. Second, differences between the MLR and the WCR methods are largest when p = 0, and they disappear as p approaches 1. For p ≤ 0.2, the MLR methods require 10–12 fewer reactions than the WCR methods (a 40–50% reduction). We note, however, that the total number of reactions in this and all other simulations in this article does not include failed PCRs. While PCR failures should be rare for samples with low values of p, failures will increase the total reaction counts and reduce the proportional difference between approaches. Third, the SAM and BAM approaches are nearly equivalent for small values of p, but SAM is increasingly superior as p approaches 1.
This third trend is important because BAM is far more practical than SAM in the laboratory and because dropout rates should be <0.5 for most samples. For example, Gagneux et al. (1997) and Goossens et al. (1998) observed mean dropout rates of 31 and 14%, respectively, on single hair extracts and Ernest et al. (2000) observed an 8% dropout rate from felid fecal DNA. The practicality of the MLR_{BAM} approach is apparent when one considers the number of rounds of replication (i.e., the number of blocks including the initial block of 12 reactions) that were necessary in the MLR_{BAM} simulations of Figure 1A (Figure 1D). The sample is nearly always accepted within two rounds and when p is small it is not uncommon to accept it without further replication.
The WCR is generally inefficient because it is designed to guard against the worst possible scenario, p = 1. When p is not large, the payoff of (over)replicating is, of course, that it yields genotypes with high estimated reliabilities and very few errors (Figure 1, B and C). In contrast, the MLR methods yield moderate estimated reliabilities and an incidence of false inclusion below, but generally not far below, κ. This is true of all the MLR simulations in this article except those involving interlocus dropout heterogeneity (see below). To put these false inclusion rates in perspective, if genotypes in these simulations were unconditionally accepted after initial replication (i.e., without being subject to a reliability criterion), the incidence of erroneous genotypes would be ~0, 7, 29, 54, 75, and 93% for p = 0, 0.2, 0.4, 0.6, 0.8, and 1, respectively.
It might be argued that the lower false inclusion rate observed with WCR approaches over most of the range of p means that the WCR actually outperforms the MLR approaches in one respect. Recall, however, that the investigator is willing to tolerate up to 5% errors to reduce the number of reactions. If the investigator wishes to reduce the incidence of false inclusions below 5%, the reliability criterion in the MLR methods can simply be raised. Figure 2 shows the number of replicates required to achieve estimated reliabilities of 95, 99, and 99.9% with the MLR_{BAM} approach for six loci, two initial replicates, and H = 67%. When p is low, only a few more reactions are required to achieve the higher estimated reliabilities, and even at p = 0.4, 99% estimated reliability requires just four more reactions than 95% estimated reliability. As p gets large, the cost of higher estimated reliability grows considerably.
In addition to assuming that p = 1, the WCR assumes that H = 1. As these conditions are approached, we expect the relative performance of the WCR methods to improve. Comparing the effect of heterozygosity on efficiency in the WCR_{BAM} and MLR_{BAM} approaches shows this to be true (Figure 3). But even at H = 80%, MLR_{BAM} outperforms WCR_{BAM} across the range where most samples will realistically fall (p < 70%). When H = 50% and p is small, MLR_{BAM} yields acceptable genotypes in approximately one-half as many reactions. Interestingly, the efficiency of the MLR_{BAM} approach is only slightly improved by increasing heterozygosity.
Because an error anywhere in a genotype renders it erroneous, adding loci raises the estimated reliability required of each locus and thereby the number of replicates per locus. We investigated this effect by running simulations at four and eight loci while holding all other parameters constant (two initial replicates, H = 67%; Figure 4). Surprisingly, when p is near zero, doubling the number of loci approximately doubles the total number of reactions in the MLR_{BAM} approach. This near-linear increase reflects nearly constant per-locus replication. As p increases, however, the cost of adding loci escalates nonlinearly (as indicated by the divergent MLR_{BAM} lines in the figure). Figure 4 also shows that, unless p is near one, adding loci increases the disparity between the MLR_{BAM} and WCR_{BAM} approaches.
One parameter that affects efficiency and is easily manipulated by the investigator is the number of initial replicates per locus. Taberlet et al. (1996) proposed performing three initial replicates, though this choice was influenced by considerations of false and contaminant alleles as well as allelic dropout. We consider the cases of one, two, and three initial replicates while the other parameters are held constant (six loci, H = 67%). The MLR_{BAM} approach on unreplicated data is very inefficient except when p is near 0 (Figure 5). This is especially apparent when it is compared to the theoretical optimum: MLR_{SAM} on initially unreplicated data. Note how much poorer the performance of BAM is relative to SAM with unreplicated data (Figure 5) than with two-replicate data (Figure 1A). This is because unreplicated data yield a higher upper bound and more observed homozygous loci, and these drive up the size of the reaction block. In practice, the most efficient replication strategy is two initial replicates under the MLR_{BAM} method. Three initial replicates under MLR_{BAM} are more efficient than two only when p ≥ 0.8. The two- and three-replicate results under the WCR_{BAM} approach are even less efficient except when p is near 1.
Interlocus dropout heterogeneity: All simulations to this point have assumed that dropout rates are equal across loci. Here we address a simple question: how well does MLR_{BAM} perform when the dropout rates are assumed to be even but are not? Figure 6A shows the observed incidence of false inclusions under MLR_{BAM} for increasing degrees of dropout rate heterogeneity when the other parameters are fixed (six loci, H = 67%, two initial replicates). An upper bound of 70% is used, as would be appropriate when the error rates genuinely are homogeneous (Table 1). Even in the moderately uneven case where two of the loci are at 60% of the maximum rate, two are at 80%, and two are at the full error rate (coded “1 1 .8 .8 .6 .6” in Figure 6), the incidence of false inclusions remains near 5% so long as p ≤ 0.8. When the unevenness is more severe, with loci at 40 and 70% of the maximum error rate (1 1 .7 .7 .4 .4), the incidence of false inclusion becomes unacceptably large for p > 0.4.
One remedy might be to use a larger upper bound on p to estimate reliability. Figure 6B shows the incidence of false inclusions across the same set of uneven dropout rates when a 1 − α = 95% upper bound is used. This reduces the incidence of false inclusions at low and moderate p values, but the effect diminishes as p gets large. For the most uneven case considered, the false inclusion rate is acceptable so long as p < 0.6. Surprisingly, this increase in the upper bound on p from 70 to 95% elevates the number of reactions only slightly (Figure 7). These results suggest that, if the dropout rates across loci are not highly uneven and/or the base rate is not large, analyzing data under the even dropout rate assumption still yields reliable results. Using a higher upper bound on p increases the range of violations over which the model remains robust without appreciably increasing the number of reactions.
Study design: When designing and budgeting a study, it is often valuable to have an estimate of the number of reactions that will be required. Equation 6, the unconditional probability of obtaining a correct genotype, can provide such an estimate. This requires three things of the investigator: (1) confidence that the model is appropriate; (2) knowledge of the heterozygosity per locus, or a willingness to make an educated guess; and (3) knowledge of the dropout rate, or an educated guess. To avoid under-budgeting, the investigator can use conservatively low heterozygosities and conservatively high dropout rates. Note that the number of replicates need not be equal across loci; the SAM algorithm can be used to forecast how replication will proceed on average after the initial replicates are performed. The budget should also allow for failed reactions, which are not included in these counts.
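A rough planning sketch along these lines: starting from the initial replicates, greedily add one replicate at a time to the locus with the largest chance of a hidden error until the unconditional probability of a correct genotype reaches 1 − κ. It assumes that locus l is miscalled with probability H_l · 2(p/2)^{i_l} (our reading of the model, not necessarily the exact form of Equation 6), and the function names are hypothetical. Because it allocates replicates to every locus, whereas loci observed as heterozygous need no further replication in practice, the total is best treated as a rough, likely generous, budget figure.

```python
def per_locus_miss(h, i, p):
    """P(locus with heterozygosity h, typed i times, shows a false homozygote)."""
    return h * 2.0 * (p / 2.0) ** i


def plan_replicates(heterozygosities, p, kappa, initial=2, max_total=500):
    """Greedy SAM-style forecast of replicates per locus for a target of 1 - kappa."""
    reps = [initial] * len(heterozygosities)

    def prob_correct():
        prob = 1.0
        for h, i in zip(heterozygosities, reps):
            prob *= 1.0 - per_locus_miss(h, i, p)
        return prob

    while prob_correct() < 1.0 - kappa:
        if sum(reps) >= max_total:
            raise RuntimeError("target not reached within max_total reactions")
        # replicate the locus currently most likely to hide a dropout error
        j = max(range(len(reps)), key=lambda k: per_locus_miss(heterozygosities[k], reps[k], p))
        reps[j] += 1
    return reps


plan = plan_replicates([2 / 3] * 6, p=0.2, kappa=0.05)
print(plan, sum(plan))
```

For six loci with H = 2/3, p = 0.2, and κ = 0.05, this plan adds one replicate to three of the six loci, for a total of 15 reactions.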
CONCLUSIONS AND OUTSTANDING ISSUES
The most important result of this article is that, under the model assumptions, MLR_{BAM} is an efficient method for obtaining reliable genotypes, especially in comparison to the WCR_{BAM} approach. Although a number of variables are shown to affect this efficiency, two valuable points emerge. First, the MLR_{BAM} method is especially efficient when p is small, and published data suggest that it generally will be 0–40% (Gerloff et al. 1995; Gagneux et al. 1997; Goossens et al. 1998; Ernest et al. 2000). This result is robust to changes in the number of loci, heterozygosity, reliability criterion, and, to a lesser extent, the number of initial replicates. But there will be samples, if only occasionally, in which allelic dropout occurs far more often. The second important point is that these problem samples may require more reactions, but they do not compromise accuracy under the MLR_{BAM} approach.
Equally important as the performance of MLR_{BAM} in simulations is the issue of its applicability to real data. Several of the assumptions on which the model is based warrant closer scrutiny. One such assumption is that the dropout rates are even across loci. The finding that the model is robust to mild departures from evenness, together with the failure of Gagneux et al. (1997) to reject the even error rate null hypothesis across 11 microsatellite loci, suggests that this assumption may often be sufficiently approximated. Furthermore, the model can accommodate heterogeneity by assuming that the dropout rates across loci are related by a set of constants (see Equation 3). This would require a preliminary experiment to estimate the constants for the loci being used.
A second assumption made in the MLR model is that the two alleles in a heterozygote are equally likely to drop out. It has been suggested that the longer allele may drop out more often than the shorter (Gerloff et al. 1995; Goossens et al. 1998), but to our knowledge the empirical data have failed to detect significant departures from an even dropout rate (Gerloff et al. 1995; Gagneux et al. 1997). If future data reveal how length, or some other allele attribute such as repeat motif, affects the dropout rate, the likelihood framework will readily allow the incorporation of this knowledge. Likewise, it has been assumed here that allele frequencies are known. If the sample size is large and there is no tendency for certain alleles to drop out more than others, then the observed allele frequencies should be relatively accurate. When these conditions are not met, the current model may need to be modified to address this source of uncertainty.
The most serious assumption in our model is that there are no false or contaminant alleles in the analyzed data. We do not assume that such alleles never occur, but rather that they can be flagged and removed from the data. Although several studies have reported non-negligible rates of false and contaminant alleles (Taberlet et al. 1996; Gagneux et al. 1997; Goossens et al. 1998), three reasons suggest that these estimates may overstate the problem in the current context: (1) some counts include cases where three or more alleles (one or more false) appear in one reaction, as well as the more insidious cases where one or two alleles are observed but one is false; (2) counts may include false alleles that have never otherwise been observed in the population or species and would therefore be viewed with great suspicion; and (3) the current practice of scoring microsatellites using electropherograms provides much greater resolution for identifying anomalous patterns than the former technique of scoring alleles by autoradiography.
Nevertheless, cryptic false and contaminant alleles do occur, and when they go undetected they cause genotype errors. It may be possible to incorporate these errors explicitly into the likelihood model, for example, by assigning each allele a conditional probability of being true and a probability of being false. Some information bearing on whether an allele is false is already available, such as its frequency and its length relative to the other allele (most false alleles are one repeat shorter or longer than a true allele). Likewise, the contamination probability can be estimated by including numerous blanks during DNA extraction and PCR. In practice, one current option is to follow Taberlet et al. (1996) in assuming that the same false or contaminant allele is very unlikely to occur twice at the same locus. This leads to the requirement that every allele be observed twice before a genotype is accepted.
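The every-allele-twice rule can be expressed as a short Python check. This is a hypothetical helper written for illustration (the name `accept_genotype` and its input format are assumptions, not part of the original method): it treats each PCR result as a tuple of observed alleles, counts an allele at most once per reaction, flags results with more than two alleles as suspicious, and otherwise accepts the genotype only when each observed allele has appeared in at least `min_obs` reactions.

```python
from collections import Counter


def accept_genotype(replicates, min_obs=2):
    """Return a consensus genotype as a sorted tuple, or None if the
    every-allele-twice rule is not yet satisfied (hypothetical helper).

    replicates -- one tuple of observed alleles per PCR, e.g. ("a", "b") for a
                  heterozygous result or ("a",) for a homozygous one.
    """
    counts = Counter()
    for rep in replicates:
        counts.update(set(rep))          # count each allele at most once per PCR
    if not counts or len(counts) > 2:
        return None                      # no data, or >2 alleles: flag as suspicious
    if any(n < min_obs for n in counts.values()):
        return None                      # an allele is unconfirmed: keep replicating
    return tuple(sorted(counts))


print(accept_genotype([("a", "b"), ("a",), ("b",)]))          # -> ('a', 'b')
print(accept_genotype([("a", "b"), ("a",), ("a",), ("a",)]))  # -> None
print(accept_genotype([("a",), ("a",)]))                      # -> ('a',)
```

The second call stays unresolved however many further "a" results accumulate, and the third accepts a homozygote after only two reactions: the rule guards against false alleles but not against repeated dropout of the same allele.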
A dilemma arises with this approach, however, when a series of reactions at a locus yield one heterozygote result and the same homozygote for all the rest (e.g., ab, a, a, a). If in replicating we continue to observe the same homozygote, at what point should we begin to have serious doubts about it being a genuine heterozygote? Suppose that the locus is truly a heterozygote. The probability of observing one heterozygous result and i − 1 homozygotes of the same allele given the model is
False and contaminant alleles are generally rare, but homozygotes are not. Each time a homozygote is observed, it may represent a true heterozygote in which the same allele has dropped out in consecutive reactions. In this article we have developed a general mathematical framework for dealing with this source of uncertainty. Although the model is incomplete, it shows promise for vastly improving the efficiency of acquiring reliable genetic data, a critical step toward realizing the potential of noninvasive, historic, and forensic genetic sampling.
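The uncertainty described above can be illustrated with a deliberately simplified Bayesian calculation. The sketch below is not the article's MLR model; it assumes Hardy-Weinberg genotype frequencies, a known frequency `p` for the observed allele, a per-replicate dropout rate `d` split evenly between the two alleles, and no false or contaminant alleles. Under these assumptions a true heterozygote shows the observed allele alone in a replicate with probability `d / 2`, so after `n` identical homozygous results the probability that the locus really is homozygous is `p**2 / (p**2 + 2*p*(1 - p)*(d/2)**n)`. The function names are hypothetical.

```python
# Simplified illustration, not the article's MLR model. Assumptions:
#   * Hardy-Weinberg genotype frequencies; known frequency p of the observed allele
#   * a true heterozygote shows the observed allele alone with probability d / 2
#     per replicate (the other allele drops out; even dropout between alleles)
#   * a true homozygote always shows its single allele; no false/contaminant alleles


def homozygote_reliability(p, d, n):
    """Posterior probability that a locus is truly homozygous after n
    replicates that all showed the same single allele."""
    if not (0.0 < p <= 1.0 and 0.0 <= d <= 1.0 and n >= 0):
        raise ValueError("need 0 < p <= 1, 0 <= d <= 1, n >= 0")
    prior_hom = p * p                 # P(true genotype is aa)
    prior_het = 2.0 * p * (1.0 - p)   # P(true genotype is a plus some other allele)
    het_lik = (d / 2.0) ** n          # the other allele dropped out n times in a row
    return prior_hom / (prior_hom + prior_het * het_lik)


def replicates_needed(p, d, target, max_n=50):
    """Smallest n whose reliability reaches target, or None if not reached by max_n."""
    for n in range(1, max_n + 1):
        if homozygote_reliability(p, d, n) >= target:
            return n
    return None


print(replicates_needed(p=0.2, d=0.3, target=0.99))   # -> 4
print(replicates_needed(p=0.05, d=0.3, target=0.99))  # -> 5
```

A homozygote for a rare allele is less likely a priori, so it needs more confirming replicates; calculations of this kind are one way replication can be concentrated on the loci most likely to harbor errors.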
Acknowledgments
We thank Gordon Luikart for advice on writing the simulation program. This research was supported by the National Science Foundation (NSF) grant DEB-0089756 and the NSF EPSCoR program (Experimental Program to Stimulate Competitive Research), NSF cooperative agreement nos. EPS-9720634 and EPS-0080935.
APPENDIX: PROOF OF EQUATION 1
Assumptions:
All L loci are true heterozygotes.
Allelic dropout occurs in every reaction. Denoting the two alleles a and b, the true genotype is revealed only if a drops out during one replication and b drops out during another.
The probability of observing a particular allele is one-half per replicate per locus.
Loci are independent.
Probability experiment: Each locus is initially replicated k times. The observed homozygotes are then replicated an additional i − k times for a total of i replicates.
Recording the results of the experiment: If at the end of the experiment all replicates produce the same allele, the locus is typed as a homozygote; otherwise it is typed as a heterozygote. The genotype is correctly identified if at the end of the experiment each locus is typed as a heterozygote.
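The probability experiment above can be checked numerically. One way to complete the argument under the four stated assumptions (our reconstruction, not necessarily the form in which Equation 1 is written) is as follows. A locus looks homozygous after k replicates with probability 2(1/2)^k = (1/2)^(k−1), so the number of such loci is binomial with L trials and success probability q = (1/2)^(k−1), with probability generating function (1 − q + qs)^L. Given m such loci, all are eventually typed correctly with probability s^m, where s = 1 − (1/2)^(i−k) is the chance that a homozygous-looking locus reveals its second allele in the i − k extra replicates. Evaluating the generating function at this s gives P(C) = (1 − (1/2)^(i−1))^L. The Python sketch below compares this closed form with a Monte Carlo simulation and finds the smallest i for a chosen α; the function names are hypothetical.

```python
import random

# Sketch of the appendix experiment under its assumptions: every locus is a true
# heterozygote, every replicate reveals exactly one allele, each allele is revealed
# with probability 1/2, and loci are independent.


def simulate_p_correct(L, k, i, trials=20000, seed=1):
    """Monte Carlo estimate of P(C): all L loci typed as heterozygotes after k
    initial replicates plus i - k extra replicates of every observed homozygote."""
    if not (1 <= k <= i):
        raise ValueError("need 1 <= k <= i")
    rng = random.Random(seed)
    n_correct = 0
    for _ in range(trials):
        all_correct = True
        for _ in range(L):
            obs = [rng.random() < 0.5 for _ in range(k)]  # True = allele a, False = b
            if len(set(obs)) == 1:                         # observed homozygote after k
                obs += [rng.random() < 0.5 for _ in range(i - k)]
            if len(set(obs)) == 1:                         # still homozygote: mistyped
                all_correct = False
                break
        n_correct += all_correct
    return n_correct / trials


def p_correct(L, i):
    """Closed form: a locus is mistyped only if all i replicates show one allele."""
    return (1.0 - 0.5 ** (i - 1)) ** L


def min_replicates(L, alpha, max_i=64):
    """Smallest total number of replicates i with P(C) >= 1 - alpha."""
    for i in range(1, max_i + 1):
        if p_correct(L, i) >= 1.0 - alpha:
            return i
    return None


print(simulate_p_correct(L=10, k=2, i=7), p_correct(L=10, i=7))
print(min_replicates(L=10, alpha=0.05))  # -> 9
```

Under these assumptions P(C) does not depend on k, because a locus is mistyped exactly when all i of its replicates show the same allele, whatever the size of the initial batch.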
Mathematics: Let C be the event that the genotype is typed correctly. Let M_{k} equal the number of observed homozygotes after k replicates. Let X_{j} = 1 if, after k replicates, locus j is an observed homozygote and X_{j} = 0 otherwise. It follows that M_{k} = X_{1} + X_{2} + … + X_{L}. Now let s be any real number. It follows that
If the acceptable probability of a genotype error is set to α, then P(C) = 1 − α. Therefore,
Footnotes

Communicating editor: S. Tavaré
 Received February 23, 2001.
 Accepted October 22, 2001.
 Copyright © 2002 by the Genetics Society of America