Abstract
Elimination of genotypes or alleles for each individual or meiosis, which are inconsistent with observed genotypes, is a component of various genetic analyses of complex pedigrees. Computational efficiency of the elimination algorithm is critical in some applications such as genotype sampling via descent graph Markov chains. We present an allele elimination algorithm and two genotype elimination algorithms for complex pedigrees with incomplete genotype data. We modify all three algorithms to incorporate inheritance restrictions imposed by a complete or incomplete descent graph such that every inconsistent complete descent graph is detected in any pedigree, and every inconsistent incomplete descent graph is detected in any pedigree without loops with the genotype elimination algorithms. Allele elimination requires less CPU time and memory, but does not always eliminate all inconsistent alleles, even in pedigrees without loops. The first genotype algorithm produces genotype lists for each individual, which are identical to those obtained from the LangeGoradia algorithm, but exploits the halfsib structure of some populations and reduces CPU time. The second genotype elimination algorithm deletes more inconsistent genotypes in pedigrees with loops and detects more illegal, incomplete descent graphs in such pedigrees.
DETERMINING the set of genotypes an individual can have at a particular locus, given incomplete genotype data on a pedigree, is needed in many genetic applications including genotype sampling and evaluation of the likelihood of the phenotypes. Efficient genotype samplers are needed for Bayesian or maximumlikelihood polygene mapping in complex pedigrees implemented via Markov chain Monte Carlo algorithms (e.g, Heath 1997; Uimari and Hoeschele 1997). Lange and Boehnke (1983) presented a genotype elimination algorithm deleting genotypes, which are inconsistent with genotype data, by comparing genotypes within parentoffspring triples. Inconsistent genotypes have a probability of zero conditionally on observed genotype data. The Lange and Boehnke algorithm does not guarantee to eliminate all inconsistent genotypes in a pedigree, irrespective of whether loops are present or not (Lange and Boehnke 1983).
Lange and Goradia (1987) modified the LangeBoehnke algorithm by comparing genotypes within nuclear families rather than parentoffspring triples. For complex pedigrees, the algorithm proceeds through a list of nuclear families repeatedly or iteratively, until no further genotypes can be eliminated. In a particular iteration, for each pair of genotypes of a father and mother formed from the current genotype lists of these two individuals, all possible zygotes are generated and subsequently compared with the genotype lists of all children in the nuclear family. The LangeGoradia algorithm was shown to eliminate all inconsistent genotypes for any pedigree without loops. However, for pedigrees with loops, this algorithm does not guarantee to eliminate all inconsistent genotypes as shown in their counterexample containing an inbreeding loop. Applications of the LangeGoradia algorithm can be found in Heath (1998) and Sobel et al. (1996).
Recently, descent graph Markov chains have been proposed as an alternative to genotype samplers, in particular for complex pedigrees (Sobel and Lange 1996; F.X. Du and I. Hoeschele, unpublished results). With missing genotype data, descent graph samplers require us to evaluate the consistency of an incomplete or complete descent graph with the available partial genotype data (F.X. Du and I. Hoeschele, unpublished results). A descent graph is defined to be complete if all meioses have specified indicators. A descent graph is defined to be incomplete if only some of the meioses have specified indicators. Given a complete descent graph, Sobel and Lange (1996) proposed an algorithm for tracing the founder origin of each meiosis in the pedigree and obtaining a permissible allele list for each founder meiosis.
An elimination is successful if the resulting genotype or allele list for each individual or meiosis is nonempty. If at least one list is empty, then the elimination has failed. In asyetunpublished results, F.X. Du and I. Hoeschele spell out the minimum requirements that an elimination algorithm must have for evaluating the consistency of complete or incomplete descent graphs in descent graph sampling. The elimination must succeed for each consistent descent graph (incomplete or complete), but it must fail for any complete inconsistent descent graph. Successful genotype or allele elimination given an inconsistent, incomplete descent graph is permitted but undesirable, because it affects computational efficiency adversely. In descent graph sampling for complex pedigrees with incomplete genotype data, an elimination algorithm needs to be executed many times, and therefore high computational efficiency is important.
In this study, we first describe an allele elimination algorithm, which requires less computing time and memory than genotype elimination to produce a list of consistent alleles for each meiosis. As opposed to the method of Sobel and Lange (1996), this allele elimination algorithm can be used to evaluate the consistency of an incomplete descent graph. We then present a modified genotype elimination algorithm, which takes into account the predominant halfsib (or fullsib within halfsib) structure of some populations (e.g., cattle, swine, tree, or fish populations), and which is computationally more efficient than the basic LangeGoradia genotype algorithm. This first modification always produces the same genotype lists as the LangeGoradia algorithm. Last, we describe a second modification of the genotype elimination, which identifies more inconsistent genotypes in the presence of pedigree loops. All three elimination algorithms have been modified to evaluate the consistency, with observed genotype data, of complete or incomplete descent graphs, and the properties of these algorithms are discussed.
Notation: Some notation used throughout the text is defined as follows. Pedigree members are numbered from 1 to N, and 0 represents an individual with unknown identity. Marker alleles are also represented by integers ranging from 1 to the number of distinct alleles, and 0 denotes an unknown allele; hence 00 denotes a missing genotype. An ordered genotype is represented by (ij), with i being the allele of paternal origin and j the allele of maternal origin [hence we distinguish between genotypes (ij) and (ji)]. An unordered marker genotype is denoted by {ij}, with the parental origins of i and j unknown, and with {00} denoting an unknown genotype. Meiosis indicator values of 1, 2, and 0 correspond to grandpaternal, grandmaternal, and unknown grandparental origin, respectively.
An allele elimination algorithm: Here we introduce an algorithm to eliminate alleles for each meiosis, which are inconsistent with the observed genotypes. While the unit (for genotype comparisons) in genotype elimination is the nuclear family, the corresponding unit in allele elimination is the meiosis group. A meiosis group consists of the two meioses of a parent (father or mother) and all paternal or maternal meioses of the parent’s offspring. The allele elimination algorithm consists of the following steps:
For each meiosis, list all alleles that are compatible with genotype data on the individual to whom the meiosis belongs.
For each father (mother) in the pedigree:
For each allele pair or ordered genotype of the parent:
If the paternal (maternal) meiosis of any child has one or two alleles of the pair, identify the allele(s) as consistent.
If each child of the individual has at least one consistent allele, save the allele pair for two meioses of the parent and all consistent alleles for the paternal (maternal) meiosis of each child.
If all alleles at the paternal (maternal) meiosis of at least one child are incompatible with both alleles of the pair, take no action to save alleles of the children and the allele pair of the parent.
Exclude those alleles that are not saved during step 2A.
For an individual with an observed, heterozygous genotype, if there is a change in the allele list of one meiosis, delete the inconsistent allele in the other meiosis of the individual.
Repeat step 2 until no further alleles can be deleted.
An example pedigree to illustrate the allele elimination algorithm is in Table 1. Individuals 1 and 4 are parents or nonfinal pedigree members. The starting allele list for each meiosis, which is compatible with genotype data, is given in the last two columns of Table 1 (step 1). To perform step 2 of the allele elimination algorithm, consider nonfinal pedigree member 1 and his two children 4 and 5. There is only one possible pair ([1] & [1]) for the two meioses of individual 1. Therefore, the allele list of the paternal meiosis of individuals 4 and 5 becomes [1] after performing step 2A for individual 1. Because individual 5 has an observed, heterozygous genotype and the allele list of her paternal meiosis was changed from [1, 2] to [1] in step 2A, the allele list of her maternal meiosis is changed from [1, 2] to [2] after performing step 2C. There are 16 possible allele pairs for the two meioses of individual 2. Seven of those ([1] & [2], [2] & [1], [2] & [2], [2] & [3], [3] & [2], [2] & [4], [4] & [2]) are consistent allele pairs; hence, there are no changes in the allele lists of both meioses of individual 2 and the maternal meioses of individuals 4 and 5. Similarly, no changes result from performing step 2 for individual 3 and her child. There are four allele pairs ([1] & [1], [1] & [2], [1] & [3], [1] & [4]) for individual 4, but only two pairs ([1] & [3], [1] & [4]) are consistent with the genotypes of his child (individual 6). Therefore, the allele list of the maternal meiosis of individual 4 is changed to [3, 4], and no change is made for the paternal meiosis of individual 6. The allele lists of all meioses in the pedigree after the first cycle of allele elimination are given in the second and third columns of Table 2. In the second cycle, no changes in allele lists result from examining the allele pair for individual 1. For individual 2, 4 allele pairs ([2] & [3], [2] & [4], [3] & [2], [4] & [2]) of 16 possible pairs are consistent with the allele lists of the maternal meioses of her children. Therefore, the allele lists of the paternal and maternal meioses of individual 2 are changed to [2, 3, 4]. No further changes result from completing the second cycle and performing a third cycle. Allele lists for all meioses obtained at the end of the allele elimination are shown in the last two columns of Table 2.
Like genotype elimination, allele elimination will never overeliminate by eliminating any consistent alleles, because an allele is eliminated only if an inconsistency is encountered. However, allele elimination will not always eliminate every inconsistent allele in pedigrees with and without loops. A counterexample of a pedigree without loops is given in Table 3. There, allele 1 is inconsistent for the paternal meiosis of individual 7, but it cannot be deleted through allele elimination. Individual 8’s paternal meiosis can have allele 2 or 3, but not both. However, this information is not passed on with the iterative allele elimination algorithm, where the consistency of individual 7 and the paternal meioses of individuals 9 and 10 are evaluated. Because this example consists of a pedigree without loops, the basic LangeGoradia genotype elimination algorithm produces fully efficient genotype lists from which fully efficient allele lists can be derived. A fully efficient genotype or allele list is a list containing only consistent genotypes or alleles; i.e., each inconsistent genotype or allele has been removed.
The allele elimination algorithm was extended to accommodate the meiosis inheritance restrictions imposed by an incomplete or complete descent graph. The modification includes two components. First, with a meiosis specified as being of grandpaternal or grandmaternal origin, the allele elimination algorithm is modified to compare alleles between one meiosis of the parent (instead of both meioses in the original algorithm) and the corresponding meiosis of a child (with specified meiosis indicator). Consequently, the algorithm will eliminate alleles that are inconsistent with specified meiosis indicators and observed genotype data. Second, the algorithm checks whether the two meioses of an inbred individual with an observed heterozygous genotype do not have the same founder origin as their single origin. If this occurred, then the descent graph would be inconsistent with the observed genotype data, but the inconsistency would not be detected without this additional check. The check is performed by first fixing all meiosis indicators, which can be determined from the (in)complete descent graph and observed genotype data. Subsequently, for each inbred heterozygous individual the founder origins of its two meioses are determined to ensure that the two meioses are not traced back to the same single founder meiosis. For illustration, consider a pedigree with father 1 (genotype {12}), mother 2({00}) mated with father 1 to produce male offspring 4 [genotypes (1^{*}) and (2^{*}) based on parents, where ^{*} denotes any allele possible], mother 3[(33)] mated with father 1 to produce female offspring 5 [genotypes (13) and (23) based on parents] and with offspring 4 and 5 mated to produce 6 ({12}). Suppose we have an incomplete descent graph that specifies the indicators of the paternal meioses of individuals 4, 5, and 6 as being 1. Although the maternal meiosis indicator of individual 6 is not specified by the descent graph, the genotype data imply that it is 1. Hence, the indicators of four meioses are fixed. Individual 6 is the only individual being both inbred and heterozygous, and hence the founder origins of its two meioses are determined. The descent graph turns out to be inconsistent with the observed genotype data, because it implies that individual 6 inherited the same allele from individual 1 at its two meioses, which is not possible given that 6 is heterozygous. Both genotype and allele elimination do not detect such an inconsistency without the additional check.
Given a descent graph (incomplete or complete), the allele elimination will always succeed if the descent graph is consistent with observed genotype data, because an allele is deleted only if an inconsistency is encountered. The elimination will fail if the allele list for at least one meiosis does not contain any consistent alleles. For an illegal incomplete descent graph on a pedigree with loops, neither the allele elimination nor the LangeGoradia genotype elimination is guaranteed to fail. A counterexample with an inbreeding loop and illegal meiosis indicators is presented in Table 4. Here, genotype data and the one specified meiosis indicator force individual 12 to carry both alleles 2 and 3, which is not possible given its ancestry. However, both allele elimination and LangeGoradia genotype elimination were successful, and the resulting allele and genotype lists are presented in columns 7 and 8 of Table 4, respectively, with all genotypes of individual 12 left after elimination being inconsistent.
With the allele elimination algorithm, allele lists for all meioses need to be stored, compared to storing genotype lists for all individuals in genotype elimination. The reduction in storage requirements depends on how many individuals are genotyped and on the number of alleles at the locus under consideration. If all individuals are genotyped, then storage requirements for allele and genotype elimination are identical. If many individuals are not genotyped, then storage requirements of allele elimination are smaller by a factor of at most s, where s is the number of alleles (s^{2} genotypes vs. s alleles in each of the two lists for paternal and maternal meioses). Such reduction in storage requirements becomes important when analyzing very large populations, as is typically done in genetic evaluation of livestock.
The allele elimination can also be far more efficient than the LangeGoradia algorithm in terms of CPU time for situations with and without descent graph restriction. To eliminate inconsistent genotypes, the LangeGoradia algorithm evaluates all possible motherfather genotype pairs for each marriage, and all possible genotypes of each child are compared to all possible zygotes generated from a motherfather genotype pair. In contrast, the allele elimination algorithm evaluates all possible allele pairs for each father or mother, and all possible alleles at the paternal or maternal meiosis of each child are compared to an allele pair of a parent. The actual difference in computational efficiency between the two algorithms depends on pedigree structure, percentage of observed genotype data, and number of alleles at the locus of interest.
Sobel and Lange (1996) defined a descent tree as the collection of and connections between a founder meiosis and all meioses having this founder meiosis as their origin. In a descent tree, each meiosis pertaining to an individual with known genotype has either 1 or 2 possible alleles in its allele list (individual is homozygous or heterozygous, respectively). These meioses in the descent tree form a group, and the list of permissible alleles for the founder meiosis of this descent tree is obtained as the intersection of the allele lists of the meioses in the group. Sobel and Lange (1996) also noted that, for each pedigree member with heterozygous genotype, the lists of permissible alleles of the two founder meioses associated with the two meioses of the individual can be restrictive to each other. For example, it is illegal that the two meioses of a heterozygous individual have the same founder meiosis. Therefore, they added an additional step to further reduce the lists of permissible alleles of those founder meioses whose descent trees are connected through heterozygous individuals. With a complete descent graph, our allele elimination algorithm compares a parental (father or mother) meiosis to all progeny (paternal or maternal) meioses (i.e., it conducts generation by generation comparisons). Therefore, the comparison of all meioses within a descent tree as proposed by Sobel and Lange (1996) is completed via multiple steps in one or several iterations in our allele elimination algorithm. This iterative procedure retains the same alleles in the lists for the founder meioses as the SobelLange algorithm, as long as one additional step is performed as described earlier (the two meioses of any inbred, heterozygous individual must not have the same founder meiosis as their single origin). Sobel and Lange (1996) showed that their algorithm, and therefore also our algorithm, produces an accurate allele list for each founder meiosis when a complete descent graph is given.
Although the SobelLange algorithm has some computational overhead for constructing descent graph trees and jointly evaluating several descent trees, our algorithm needs more CPU time to eliminate inconsistent alleles due to its iterative nature. There is little difference in the memory requirements of both algorithms, which are generally small. While developed for a complete descent graph, the SobelLange algorithm cannot be extended to evaluate an incomplete descent graph. The main advantage of our allele elimination algorithm is that it can evaluate both complete and incomplete descent graphs.
First genotype elimination algorithm: In allele elimination, a parental allele pair is compared to alleles of the paternal or maternal meiosis of the parent’s children. This principle is used to modify the LangeGoradia genotype elimination algorithm. The new algorithm exploits the common structure of livestock populations consisting of fullsib families nested within halfsib families. Below, we assume that the common parent is a father (switching father and mother would accommodate maternal halfsib families). The new genotype elimination algorithm consists of the following steps:
For each pedigree member, form a list of genotypes that are consistent with the individual’s own genotype data.
For each paternal halfsib family:
Consider each ordered genotype [denoted by (ij)] of the father, which was saved in the previous cycle, but is NOT saved (see step 2.B) and NOT deleted (see steps 2.A.a.ii and 2.A.c) in this cycle.
Identify all genotypes of each child that are consistent with father’s genotype (ij) and store those genotypes in LIST1.
If each child of the father has at least one consistent genotype, go to step 2.A.b.
If all genotypes of any child are inconsistent with the father’s genotype, delete genotypes (ij) (ji), (ii), and (jj) from father’s genotype list, discard LIST1 and LIST2 (see step 2.A.b.iii), and go to 2.A.
For each mother, consider each genotype (kh) of the mother that was saved in the previously cycle, but is NOT saved (i.e., not in LIST2; see step 2.A.b.iii) and NOT deleted (see step 2.A.b.ii) in the current cycle.
Identify those genotypes of each child of the mother in LIST1 that are consistent with the mother’s genotype.
If all genotypes of any child in LIST1 are inconsistent with the mother’s genotype, delete genotypes (kh), (hk), (hh), and (kk) from the list of genotypes for the mother, and go to step 2.A.b.
If one or more genotypes of each child in LIST1 are consistent with the mother’s genotype, add genotypes (kh) and (hk) into LIST2 for the mother; also add all consistent genotypes of each child into LIST2, and go to step 2.A.b or 2.A.c.
If LIST2 is empty for a mother after evaluating all of her genotypes, delete genotypes (ij) (ji), (ii), and (jj) from the genotype list of the father, discard LIST1 and LIST2, and go to step 2.A.
If each mother has a nonempty LIST2 after completing step 2.A.b, save genotypes (ij) and (ji) for the father, and save all genotypes in LIST2 for each of the father’s children and their mothers.
Repeat step 2 until no further genotypes can be deleted.
Table 5 contains a pedigree consisting of one halfsib family to illustrate this genotype elimination algorithm. The starting genotype lists, which are compatible with each individual’s own genotype data, are given in column 6 of Table 5 (step 1). To perform step 2, begin with the first genotype of the father, (11). This genotype produces only one possible paternal allele (1), which is inconsistent with the paternal allele of individual 6 and therefore not saved. Father’s genotype (12) is consistent with the paternal allele of all halfsibs, and LIST1 contains the following genotypes (consistent with father’s genotype) for the children: [(13)], [(22)], [(13)], and [(12), (21)] for individuals 5, 6, 7, and 8, respectively. Given LIST1, perform step 2.A.b for mother 2, who has one possible genotype (11). The only maternal allele (1) is inconsistent with the genotype of child 5. Therefore, father’s genotype (12) is inconsistent and is deleted together with genotypes (22) and (21) (see step 2.A.c). Similarly, father’s genotype (13) is deleted because it is inconsistent with the paternal allele of individual 6, and father’s genotypes (31) and (33) are also deleted by performing step 2.A.a.ii. Father’s genotype (23) is consistent with at least one genotype for each of his children, and LIST1 contains genotype sets [(31)], [(22)], [(31)], and [(21)] for halfsibs 5, 6, 7, and 8, respectively. By performing step 2.A.b. for mother 2, her only genotype (11) is consistent with her child’s genotype in LIST1, and her genotype as well as the genotype of child 5 from LIST1 are added into LIST2. Similarly for mother 3, genotype (22) is added into LIST2 for individuals 3 and 6. For mother 4, genotypes (11), (12), (13), (21), and (31) are identified as consistent, by performing step 2.A.b, and added into LIST2 together with genotypes (31) and (21) for children 7 and 8 from LIST1, while mother’s genotype (22), (23), and (33) are identified as inconsistent and deleted. Because LIST2 is nonempty for each mother, father’s genotypes (23) and (32) are saved, together with all genotypes in LIST2 for mothers and children. No further changes result from performing a second cycle. Genotype lists for all individuals left after genotype elimination are shown in the last column of Table 5.
This genotype elimination algorithm was constructed by modifying the LangeGoradia algorithm to improve its computational efficiency, in particular for pedigrees with predominant halfsib structure (typical in certain populations including cattle, swine, tree, and fish populations) and with a substantial amount of missing genotype data. Our genotype elimination algorithm produces genotype lists that are identical to those of the LangeGoradia algorithm, because every possible fathermother genotype pair is evaluated in every iteration by both algorithms. Although our algorithm has somewhat higher storage requirements (for LIST1 and LIST2), it is computationally more efficient than the LangeGoradia algorithm for several reasons. First, the evaluation of a fathermother genotype pair is partitioned into two steps: the father’s genotype is evaluated first by comparing it with the alleles of the paternal meiosis of each of his children. If inconsistent, the father’s genotype is deleted without the evaluation of the genotype pairs resulting from the father’s genotype and all possible genotypes of one or several mothers mated to this father. This modification accelerates the identification of an inconsistent father genotype, as deleting an inconsistent father genotype in the original algorithm requires the deletion of all genotype pairs consisting of this father’s genotype and the genotypes of one or several mothers. Second, the algorithm recognizes that fathermother genotype pair (ij) × (hk) generates the same zygotes as pairs (ij) × (kh), (ji) × (hk), and (ji) × (kh), by evaluating only one representative of these four pairs. Third, the algorithm recognizes that if a parent’s genotype (ij) (i ≠ j) is incompatible with a child’s genotypes, then parents’ genotypes (ii) and (jj) must also be incompatible, because they generate fewer types of alleles. Fourth, the algorithm is more efficient for pedigrees with predominant halfsib structure, because it identifies inconsistent genotypes during the current iteration, while an algorithm based on nuclear families would likely require additional iterations. Of course, the roles of father and mother could be reversed; i.e., the mother’s genotype could be evaluated first by comparing it with the alleles of the maternal meiosis of each of her children, and this approach would be more suitable in populations with predominant maternal rather than paternal halfsib structure.
Modification of this algorithm to include descent graph restrictions is straightforward: The comparison between a father’s (mother’s) genotype and an allele at the paternal (maternal) meiosis of a child is limited to the comparison of the child’s allele with the one parental allele specified by the child’s meiosis indicator. In addition, the algorithm must ensure that the two meioses of an inbred individual with an observed heterozygous genotype do not have the same founder origin.
For pedigrees without loops, the LangeGoradia algorithm and the algorithm described here, with the appropriate modifications for descent graph restrictions, produce a fully efficient genotype list for each pedigree member, regardless of the number of specified meiosis indicators in the pedigree. Therefore, every illegal incomplete or complete descent graph will be detected in any pedigree without loops.
For pedigrees with loops, the genotype elimination algorithm will correctly identify every illegal or legal complete descent graph and every legal incomplete descent graph. However, both the genotype and allele elimination algorithms will not identify every illegal incomplete descent graph. The example in Table 4 also serves as a counterexample for the first genotype elimination here.
Second genotype elimination algorithm: The LangeGoradia genotype elimination algorithm as well as our algorithm described in the previous section are not guaranteed to eliminate all inconsistent genotypes in pedigrees with loops. We first consider a counterexample consisting of a pedigree with an inbreeding loop, similar to the counterexample of Lange and Goradia (1987). The pedigree, genotype lists from LangeGoradia elimination, and fully efficient genotype lists are in Table 6. Given the genotypes of ancestors of related individuals 6 and 7, who form a marriage, the paternal meiosis of individual 6 cannot have allele 3 (2) if the paternal meiosis of individual 7 has allele 2 (3). Hence, fathermother genotype pairs (21) × (31) and (31) × (21) are not possible. The failure to identify impermissible genotype pairs for a marriage between related individuals is the cause of incomplete deletion of inconsistent genotypes in this case.
Similarly, both the LangeGoradia algorithm and our genotype elimination algorithm are not fully efficient in the presence of marriage loops. A counterexample with a marriage loop is given in Table 7, together with the genotype lists from LangeGoradia elimination and fully efficient genotype lists. Comparison of the two types of lists shows that individual 8 cannot have genotype (11), yet the genotype elimination fails to delete it. Individual 8 must pass on either a 2 or a 3 allele to offspring 9 or 10, because if one of its mates (6 or 7) carries a 2 (or 3) allele, the other cannot carry allele 3 (or 2). More precisely, the paternal meiosis of individual 6 cannot have allele 3 (or 2) if the paternal meiosis of individual 7 has allele 2 (or 3). Although individuals 6 and 7 do not form a marriage, each has a marriage with individual 8. Given genotype pair 11 × 21 for the marriage between individuals 8 and 6, for example, only the genotype pairs 11 × 11 and 11 × 21 are possible for the marriage between individuals 8 and 7, and both pairs are inconsistent with the genotypes of individual 10. Because the two marriages are not considered jointly, incomplete genotype elimination results.
The two counterexamples indicate that simultaneous consideration of all parents involved in a halfsib family is necessary for more complete genotype elimination. Consider a paternal halfsib family with one father and k mothers. Then there are k + 1 parental genotypes consisting of 2(k + 1) alleles. A combination of (k +1) genotypes is consistent given ancestor information if all 2(k + 1)[2(k + 1) + 1]/2  2(k + 1) allele pairs are consistent. However, consistency needs to be evaluated only for allele pairs pertaining to meioses of different parents with nonzero identitybydescent (IBD) probability. Note that each meiosis has two parental meioses (e.g., the parental meioses of the paternal meiosis of an individual are the two meioses of the individual’s father), and thus a pair of meioses (i, j) has four parental meioses, which consist either of four distinct meioses or two (i and j are derived from the two meioses of the same individual). An allele pair of a meiosis pair is consistent if it can be produced from a consistent quadruple of alleles at the four parental meioses of the pair under consideration. The parental allele quadruple is consistent if all of its allele pairs (at most six) are consistent.
On the basis of these considerations, our genotype elimination algorithm is modified to include the evaluation of genotype combinations of related parents in a halfsib family via evaluation of all genetically related meiosis pairs in this parental combination. Prior to genotype elimination, a list of all relevant meiosis pairs must be created. For each halfsib family, any meiosis pair pertaining to different parents and having nonzero IBD probability is added to the list (meiosis pairs pertaining to the same individual are excluded, because every ordered genotype is evaluated during elimination). After all halfsib families have been considered, the list is processed again by considering all parental meiosis pairs of each pair in the list (a parental meiosis pair can be from two different parent individuals or from the same). Any parental meiosis pair that pertains to two different individuals and has nonzero IBD probability is added to the list. The list is then processed again until no further parental meiosis pairs can be added. Given that consistency evaluation of allele quadruples can be performed via consistency evaluation of allele pairs, sequential elimination of inconsistent allele pairs for all relevant meiosis pairs produces a list of consistent allele pairs for each relevant meiosis pair.
Within a (paternal) halfsib family, the father may be related to a mother, or a mother may be related to another mother in the same family. Then, if any meiosis pair from the two related individuals is listed as a relevant pair, and not all possible allele pairs (derived from the genotype lists of the two individuals) are consistent, then the two individuals are termed restrictive to each other. That is, given that one individual carries a certain genotype, not all genotypes in the list of the other individual are consistent.
The modified genotype elimination algorithm, utilizing the list of relevant meiosis pairs, consists of the following steps.
Form mother groups within paternal halfsib families. Initially, each mother is assigned to a group by herself.
For each pedigree member, form a list of genotypes, which are consistent with the individual’s own genotype data.
Create the list of relevant meiosis pairs.
Form a list of allele pairs for each meiosis pair in the list created in step III using the genotype listscreated in step II. This collection of allele lists is referred to as LISTAP.
For each (paternal) halfsib family, consider the next ordered genotype [denoted by (ij)] of the father, which was saved in the previous cycle and is NOT deleted (see step V.A.b) in this cycle.
Identify those genotypes of each child of the father, which are consistent with genotype (ij).
If each child has at least one consistent genotype, store the consistent genotypes of all children in LIST1, and go to step V.B.
If all genotypes of any child are inconsistent with genotype (ij), delete genotypes (ij) (ji), (ii), and (jj) from father’s genotype list, discard LIST1, and go to step V.
For each mother group, consider each possible genotype combination χ of previously saved genotypes of each mother in the group.
All genotypes in combination χ and all consistent childrens’ genotypes are added into LIST2, if
For each relevant meiosis pair pertaining to a mother (related to the father) and the father or to two (related) mothers in the group, the allele pair deduced from combination χ and father genotype (ij) is included in LISTAP, and
Each child of the mother group has at least one consistent genotype in LIST1.
If LIST2 is nonempty (i.e., at least one combination is consistent) for the mother group, consider the next mother group, and go to step V.B.
If LIST2 is empty (i.e., no combination is consistent) for a mother group, discard LIST1 and LIST2, and go to step V.
If LIST2 is nonempty for all mothers, save father genotype (ij) and all genotypes in LIST2, discard LIST1 and LIST2, and go to step V.
For each relevant meiosis pair, delete any inconsistent allele pairs deduced from the current genotype lists produced in step V in LISTAP.
For each allele pair (denoted by [s] and [t]) in LISTAP of each relevant pair (P):
Consider the parent individual of the first meiosis of relevant pair P; search through the genotype list of the parent to identify a genotype with at least one [s] allele.
Consider the parent individual of the second meiosis of relevant pair P; search through the genotype list of the parent to identify a genotype with at least one [t] allele.
Evaluate the genotype pair (denoted by L) of the two parent individuals identified in steps VII.a and VII.b as follows.
If the two parents are different individuals, there are four meiosis pairs. Among those, consider each relevant pair and check whether the allele pair implied by genotype pair L is included in the allele pair list of the relevant pair.
If the two parents are one individual, check if any genotype in the list of the parent individual contains both alleles [s] and [t].
If yes to ii for all relevant pairs, or to iii, then save allele pair [s] and [t] for relevant pair P in LISTAP and move on to the next allele pair. If no to ii for at least one relevant pair, or to iii, then return to steps VII.b or VII.a.
Modify the mother groups within each halfsib family as follows: A mother is included in a group if she is restrictive to at least one other mother in the group.
Repeat steps V to VIII until no further genotypes for any pedigree member and no further allele pairs for any relevant pair can be excluded.
An example constructed to illustrate the modified genotype elimination algorithm is in Table 8. The initial genotypes for each pedigree member, which are consistent with its own observed genotype, are in the last column of Table 8. The pedigree consists of four paternal halfsib families. For reference, the paternal (maternal) meiosis of individual k is identified as meiosis 2k  1 (2k).
To identify all relevant meiosis pairs in step III of the algorithm, we first consider the only paternal halfsib family, which has genetically related parents: the family of father 11. There are four parents, their paternal meioses have nonzero IBD probabilities, and there are six relevant meiosis pairs (meioses 19&21, 19&23, 19& 25, 21&23, 21&25, 23&25). For relevant meiosis pair 19 and 21, the only parental meiosis pair not consisting of the same meiosis is 15&16. Pair 15&16 does not need to be considered, because it corresponds to a single individual, 8. For relevant meiosis pair 19&23, the parental meiosis quadruple consist of meioses 15, 16, 17, and 18. Among these, pair 15&17 pertains to two different individuals (8 and 9), which are genetically related. Therefore, meiosis pair 15&17 is a relevant pair. No additional relevant meiosis pairs can be identified by continuing the search. The list of relevant meiosis pairs is given in Table 9, and initial lists of allele pairs for all relevant meiosis pairs are shown in column 2 of Table 9 (step IV).
To perform step V, consider the halfsib family of father 1. Father genotype (23), which produces paternal alleles [2] and [3], is consistent with the paternal alleles of his children 8 and 9. All genotypes containing at least one 2 or 3 allele are stored in LIST1 for both children. For mother 2, her genotype (11) is consistent, and the consistent genotype list for her child (individual 8) is reduced to [(21), (31)]; the genotype list for individual 9 is identical. Father genotype (32) is saved by similar considerations. After performing similar eliminations for the halfsib families of fathers 8 and 9, individuals 10, 11, 12, and 13 have the same genotype list, [(11), (21), (31)]. For the halfsib family of father 11, parents are genetically related. Because each mother forms a group by herself in the first cycle, genotype elimination proceeds one mother at a time. No genotypes are eliminated for this family during the current cycle.
The genotype lists of all individuals after step V of the first cycle are given in column 7 of Table 8. The allele pair lists of all relevant meiosis pairs after performing step VI are in column 3 of Table 9.
To perform step VII, consider relevant meiosis pair 15&17, which has four possible allele pairs ([2] & [2], [2] & [3], [3] & [2], [3] & [3]). All four allele pairs can be produced from the allele pairs of the parental meioses. For relevant meiosis pair 19&21, nine allele pairs must be evaluated. Allele pair [1] & [1] is consistent, because it can be produced with parent individual 8 having genotype (21). Similarly, allele pairs [1] & [2], [1] & [3], [2] & [1], [2] & [2], [3] & [1], and [3] & [3] are consistent, because they can be produced with parent 8 having genotype (21) or (31). Allele pairs [2] & [3] and [3] & [2] are inconsistent, because they cannot be produced by any of 8’s current genotypes. Because not all allele pairs of relevant meiosis pair 19&21 are consistent, individuals 10 and 11 are flagged as being restrictive to each other. All allele pairs for relevant meiosis pairs 19&23, 19&25, 21&23, and 21&25 are consistent, and hence individual pairs 10 and 12, 10 and 13, 11 and 12, and 11 and 13 are not restrictive to each other. For relevant meiosis pair 23&25, 7 of 9 possible allele pairs ([1] & [1], [1] & [2], [1] & [3], [2] & [1], [2] & [2], [3] & [1], [3] & [3]) are consistent with the genotypes of parent individual 9, while two other allele pairs ([2] & [3], [3] & [2]) are inconsistent, indicating that individuals 12 and 13 are restrictive to each other. The allele pair lists of the relevant meiosis pairs after the first cycle are in column 4 of Table 9.
In performing step VIII for the halfsib family of father 11, mothers 12 and 13 form a group because they are restrictive to each other, while mother 10 continues to be in a group by herself.
In step V of the second cycle, there are no changes in the genotype lists of all individuals in the halfsib families of fathers 1, 8, and 9. For the halfsib family of father 11, father genotype (11) is consistent with his childrens’ genotypes, and the consistent genotypes stored in LIST1 for individuals 14, 15, and 16 are [(13)], [(12)], and [(13)], respectively. Mother 10 is restrictive to the father. First consider mother genotype (13), which is consistent and results in no eliminations in the genotype list of her child, individual 14. Because mother 10 forms a group by herself, we proceed with her. Genotypes [(11) and (21)] of mother 10 are inconsistent given the father’s genotype (11). The next mother group consists of individuals 12 and 13. For mother 12, genotype (11) is inconsistent, but her next genotype (21) is consistent. Given that mother 12 has genotype (21), mother 13 cannot have genotype (31), because allele pair [2] & [3] is inconsistent for relevant meiosis pair 23&25. The remaining genotypes of individual 13 [(11) and (21)] are both inconsistent with the genotype of child 16 in LIST1. Therefore, individual 13 has no consistent genotypes given father 11’s genotype (11) and mother 12’s genotype (12). Therefore, genotype (12) of mother 12 is inconsistent given father genotype (11). The next genotype for mother 12, genotype (31), is inconsistent with her child’s genotype in LIST1. Therefore, mother 12 has no consistent genotypes given father genotype (11). Consequently, father genotype (11) is deleted.
Father 11’s genotype (21) is inconsistent, because mother 10 has no consistent genotypes. For father 11’s genotype (31), the consistent childrens’ genotypes are [(13), (31)], [(12)], [(13), (31)] for individuals 14, 15, and 16, respectively. For mother 10, genotype (21) is inconsistent because allele pair [2] & [3] is not permitted for relevant meiosis pair 19&21. The other two genotypes of mother 10 [(11) and (31)] are both consistent, and there is no change in her child’s genotype list. For mother 12, genotypes (11) and (31) are inconsistent with individual 15’s genotype in LIST1. Genotype (21) for mother 12 is inconsistent, and no change in the genotype list of her child is needed. Given the genotype of mother 12 being (21), mother 13 cannot have genotype (31). Genotypes (21) and (11) are consistent for mother 13, and the genotype list for the child of mother 13 is changed to (31). Father 11’s genotype (31) is consistent and saved along with genotype list [(11), (31)], [(21)], [(11), (21)], [(13), (31)], [(12)], and [(31)] for individuals 10, 12, 13, 14, 15, and 16, respectively.
Given the new genotype list, all current allele pairs for all relevant pairs are consistent, and individuals are not restrictive to each other. A few more genotypes are deleted in cycle 3, and the final genotype lists for each individual and allele pair lists for each relevant pair are given in Tables 8 and 9, respectively.
For pedigrees without loops, our second genotype elimination algorithm reduces to the first algorithm described earlier. For pedigrees with loops (marriage or inbreeding), the second genotype elimination algorithm will eliminate more inconsistent genotypes than the previous algorithm, which is achieved at the expense of a generally moderate increase in storage space and computing time required for evaluating the consistency of allele pairs for each relevant meiosis pair. For all three counterexamples of the LangeGoradia algorithm in Tables 6, 7, and 8, our second genotype elimination algorithm produces fully efficient genotype lists (and allele lists) for each pedigree member. However, there is no guarantee that this algorithm will always be fully efficient.
Modifications to include descent graph restrictions are again required and are analogous to those described earlier for the first genotype elimination algorithm. The second genotype elimination algorithm will identify more illegal incomplete descent graphs than our first algorithm (or equivalently the LangeGoradia algorithm). For example, the second genotype elimination identifies the descent graph in Table 4 as illegal, because more genotypes in individuals 9 to 13 are deleted. While we cannot show that every illegal incomplete descent graph can be identified with the second genotype elimination algorithm, we were unable to construct a counterexample for the algorithm.
DISCUSSION
Genotype sampling algorithms based on peeling or descent graph sampling have recently received considerable attention, because they are needed for implementing analyses for a number of inferential problems in complex pedigrees, including gene mapping and estimation of marginal genotype probabilities for individuals. For any genotype sampler, the genotype space, consisting of a list of genotypes consistent with all observed data on the pedigree, must be determined for each pedigree member. For some samplers (some implementations of peeling), this needs to be done only once, while for others (some descent graph samplers), this needs to be done many times and conditionally on a completely or partially specified descent graph. Therefore, genotype elimination is becoming increasingly important and has been revisited in this article.
The purpose of this study was to investigate and improve the properties of genotype or allele elimination in complex pedigrees with and without loops and unconditional or conditional on complete or incomplete descent graphs. For descent graph sampling or allelic peeling, lists of consistent alleles for all meioses rather than lists of consistent genotypes for all individuals are needed. Although allele lists can be created from genotype lists, forming allele lists directly is more efficient computationally. Therefore, we present an allele elimination algorithm that requires far less memory and CPU time than genotype elimination. However, we show that allele elimination does not always eliminate all of the alleles deleted by genotype elimination in pedigrees containing fullsibs and with or without loops.
The genotype elimination algorithm of Lange and Goradia (1987) is based on the nuclear family and evaluates every combination of the genotypes of the parents in each nuclear family based on the zygotes it produces and the genotype lists of the children. In our first genotype elimination algorithm, evaluation of many such combinations is avoided (i) by evaluating a father’s genotype first with all of its offspring and evaluating a mother’s genotype conditional on father’s genotype only if the latter is consistent (with the roles of father and mother of course being reversible), (ii) by recognizing that certain genotype combinations produce the same zygotes and therefore only one combination needs to be evaluated, and (iii) by recognizing that inconsistency of a heterozygous genotype implies inconsistency of genotypes that are homozygous for either allele in the heterozygote. This algorithm also improves the flow of information, decreasing the number of iterations through the pedigree and reducing CPU time in pedigrees with predominant halfsib structure. Genotype lists produced by this algorithm are identical to those obtained from the LangeGoradia elimination.
In pedigrees with loops the LangeGoradia genotype elimination is not guaranteed to delete every inconsistent genotype. For situations where accurate genotype or allele elimination is important, we expand our first genotype elimination algorithm to account for genotype pair restrictions, which may exist when mates are related (inbreeding loops) or when an individual has several, related mates (marriage loops). Ignoring such restrictions causes failure to delete certain inconsistent genotypes.
For each elimination algorithm, we describe how the algorithm can be adapted to evaluate the consistency of a complete or incomplete descent graph, such that every inconsistent complete descent graph is detected. In this case, the algorithms produce genotype or allele lists for individuals or meioses, which are consistent with observed genotype data and known meiosis indicators. Given a complete descent graph, the modified allele elimination algorithm produces allele lists for founder meioses, which are identical to those obtained by the algorithm of Sobel and Lange (1996). Therefore, allele elimination will detect every illegal complete descent graph. In contrast to the method of Soble and Lange (1996), allele elimination can also be used to evaluate the consistency of an incomplete descent graph, although it may not detect every illegal incomplete descent graph in the presence of fullsibs, even in a pedigree without loops. With the LangeGoradia and our first genotype elimination algorithm, every illegal incomplete or complete descent graph will be detected in any pedigree without loops. For evaluation of an incomplete descent graph in a pedigree with loops, the second genotype elimination algorithm will detect more and likely most illegal descent graphs. We will report on the relative performances of the elimination algorithms in the context of descent graph sampling in another contribution (F.X. Du and I. Hoeschele, unpublished results).
Acknowledgments
Thanks to Bruce Tier for sharing his ideas on implementation of descent graph samplers with us. This research was supported by the National Science Foundation (grant DBI9723022) and by the U.S. Department of Agriculture’s National Research Initiative Competitive Grants Program (grant 96352053662). I. Hoeschele also acknowledges financial support, while on sabbatical at North Carolina State University, from the National Institutes of Health (grant GM45344).
Footnotes

Communicating editor: ZB. Zeng
 Received February 17, 2000.
 Accepted July 26, 2000.
 Copyright © 2000 by the Genetics Society of America