Abstract
Consanguineous unions increase the rate at which identical genomic segments are paired within individuals to produce runs of homozygosity (ROH). The extent to which such unions affect identity-by-descent (IBD) genomic sharing between rather than within individuals in a population, however, is not immediately evident from within-individual ROH levels. Using the fact that the time to the most recent common ancestor for a pair of genomes at a specific locus is inversely related to the extent of IBD sharing between the genomes in the neighborhood of the locus, we study IBD sharing for a pair of genomes sampled either within the same individual or in different individuals. We develop a coalescent model for a set of mating pairs in a diploid population, treating the fraction of consanguineous unions as a parameter. Considering mating models that include unions between sibs, first cousins, and nth cousins, we determine the effect of the consanguinity rate on the mean
$T_{MRCA}$, the time to the most recent common ancestor, for pairs of lineages sampled either within the same individual or in different individuals. The results indicate that consanguinity not only increases ROH sharing between the two genomes within an individual but also increases IBD sharing between individuals in the population, with the magnitude of the effect increasing with the kinship coefficient of the type of consanguineous union. Considering computations of ROH and between-individual IBD in Jewish populations whose consanguinity rates have been estimated from demographic data, we find that, in accord with the theoretical results, increases in consanguinity and ROH levels inflate levels of IBD sharing between individuals in a population. The results contribute more generally to the interpretation of runs of homozygosity, IBD sharing between individuals, and the relationship between ROH and IBD.
- coalescent
- consanguinity
- identity by descent
- runs of homozygosity
- time to the most recent common ancestor
Consanguineous unions, in which mating pairs have a close genetic relationship, produce offspring whose two genomic copies have higher levels of identity-by-descent (IBD) sharing than is seen for corresponding offspring of nonconsanguineous unions. The offspring of consanguineous unions can inherit two copies of a segment of the genome from the same recent ancestor, a shared close relative of both mother and father, through separate maternal and paternal lines of descent. Because this ancestor is recent, little time has been available for recombination to break the segment, so that the two copies can be identical over a long distance (Figure 1A).
Figure 1. Consanguinity and genomic sharing. (A) In a consanguineous pedigree, an inbred individual can possess two copies of a long genomic segment inherited from a common ancestor along two paths. (B) In a population with consanguineous mating, individuals experience increased genomic sharing of their two genomic copies (yellow); this article considers the effect of consanguinity on genomic sharing between genomes in two “unrelated” individuals (blue).
Long runs of homozygosity (ROH)—regions in which the two homologous chromosomes of an individual are identical over long distances—have been observed to co-occur with known high rates of consanguinity (Woods et al. 2006; Hunter-Zinck et al. 2010; Scott et al. 2016; Ceballos et al. 2018). In humans, many populations with a high rate for consanguineous unions have been seen to be among the populations with the largest fractions of their genomes residing in long ROH (Kirin et al. 2010; Pemberton et al. 2012; Karafet et al. 2015; Kang et al. 2016).
Measurement of IBD sharing between genomes in distinct individuals has emerged as a powerful method for analysis of population relationships and demographic history (Browning and Browning 2012; Palamara et al. 2012; Harris and Nielsen 2013; Ralph and Coop 2013; Thompson 2013). IBD sharing is computed for pairs of genomes in individuals at different geographic scales or in comparisons of pairs from the same or different populations. The pattern of sharing is then used to infer demographic histories.
Informally, high levels of IBD sharing between individuals within populations have been seen in some of the same populations that possess high ROH levels (Kang et al. 2016). However, it is not clear from a theoretical understanding of the determinants of IBD sharing that ROH levels, measured within individuals, and levels of between-individual IBD sharing would have a direct relationship. Consanguinity increases the probability that the same genomic segment appears in two copies in the same individual; the way in which consanguinity relates to genomic sharing between individuals does not directly follow from the within-individual pattern (Figure 1B).
It is possible that the increased IBD sharing within individuals that is produced by consanguinity increases IBD sharing between individuals, as an enlarged inbreeding coefficient decreases effective population size, and, hence, might increase genomic sharing between all pairs of individuals. On the other hand, it is possible that the increased genomic sharing within offspring resulting from consanguinity has little or no effect on sharing between pairs of genomes in individuals from different families; increased IBD sharing for individuals within a family that has many consanguineous unions might be counteracted by decreased IBD sharing for individuals from different families that are not closely related.
A difficulty in evaluating the relationships among consanguinity, ROH, and IBD sharing between individuals is that the phenomena of interest concern properties of a diploid population pedigree. Unlike in many problems in population genetics, in which a diploid population of $N$ exchangeable individuals can be approximated by a model of a corresponding haploid population of size $2N$ (Wakeley 2009, Chapter 6.1), for the study of consanguinity, it is important to consider mating pairs of diploid individuals, and to account for the possibility that individuals might have many consanguineous mating pairs in their ancestry.
Here, adapting a model of N diploid mating pairs, each of which can represent a consanguineous pair or a nonconsanguineous pair, we study the effect of consanguinity on the mean time to the most recent common ancestor ($T_{MRCA}$) for pairs of gene lineages sampled either as the two genomic copies within an individual or as two copies from different individuals. In the model, not only does consanguinity decrease $T_{MRCA}$ for pairs of genomic copies within an individual, thereby increasing ROH levels, it also decreases $T_{MRCA}$ for pairs of genomic copies in separate individuals in a population, thereby increasing between-individual IBD sharing. We verify the prediction of the model by examining ROH and IBD sharing in data from human populations.
Model
$T_{MRCA}$, ROH, and IBD
Our goal is to study the relationship between ROH within individuals and IBD sharing between individuals. To do so, we examine a model of a genetic locus in a population, in which we can consider two random variables: T, the $T_{MRCA}$ for the two gene lineages sampled from the same individual chosen at random in the population, and V, the $T_{MRCA}$ for a pair of lineages from two individuals chosen at random in separate mating pairs. The choice to study $T_{MRCA}$ arises from the fact that the length of genome shared around a focal site is closely related to $T_{MRCA}$ at the site (Palamara et al. 2012; Carmi et al. 2014; Browning and Browning 2015). Thus, lower values of T lead to longer homozygous segments within individuals, and lower values of V lead to longer IBD segments in pairs of individuals. The relationship between T and V, and its dependence on model parameters, then provide insight into the relationship between ROH and IBD.
Diploid mating model
We study a diploid discrete-generation model with sib mating that was introduced by Campbell (2015), extending it to permit other forms of consanguinity. The model of Campbell (2015) considers a constant-sized diploid population with $N$ monogamous mating pairs, $2N$ individuals, and $4N$ allelic copies at a locus. Some of the mating pairs are consanguineous, and the others are nonconsanguineous. In particular, in each generation, a constant fraction $c_0$ of the pairs represent sib matings (Figure 2A). Although in principle, $c_0$ can be viewed as a probability of consanguinity that ranges from 0 to 1, in our model, to ensure that the number of sib mating pairs is an integer, $c_0$ must be a multiple of $1/N$.
Figure 2. Diploid model of monogamous mating pairs, some of which are sib mating pairs. (A) Each generation has $N$ mating pairs, a fraction $c_0$ of which represent sib mating pairs. (B) Each sib mating pair is assigned one parental pair from the previous generation, representing parents of both sibs. (C) Each nonconsanguineous pair is assigned two distinct parental pairs from the previous generation, representing the two sets of parents for the members of the pair.
One generation back in time, for each of the $c_0 N$ sib mating pairs, a single parental mating pair is chosen uniformly at random with replacement to represent the parents of the mating sibs (Figure 2B). For each of the remaining $(1 - c_0)N$ nonconsanguineous mating pairs, two parental mating pairs in the previous generation are chosen uniformly at random from the $\binom{N}{2}$ possibilities, representing the parents of the two members of the pair in the current generation (Figure 2C). Because each nonconsanguineous mating pair chooses two distinct parental mating pairs, chance sib mating does not occur.
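The parental assignment step for one generation can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the study: the function name `assign_parental_pairs`, the integer labels for mating pairs, and the convention of listing sib mating pairs first are all hypothetical choices made here.

```python
import random


def assign_parental_pairs(n_pairs, c0, rng):
    """Assign parental mating pairs for one generation, looking backward in time.

    Hypothetical helper: pairs in the previous generation are labelled
    0..n_pairs-1. Returns a list of (parents_of_member_1, parents_of_member_2)
    tuples, with the sib mating pairs listed first.
    """
    n_sib = round(c0 * n_pairs)
    if abs(n_sib - c0 * n_pairs) > 1e-9:
        raise ValueError("c0 must be a multiple of 1/n_pairs")
    if n_sib < n_pairs and n_pairs < 2:
        raise ValueError("nonconsanguineous pairs need two distinct parental pairs")

    assignments = []
    # Sib mating pairs: both members descend from a single parental pair,
    # drawn uniformly at random, with replacement across sib mating pairs.
    for _ in range(n_sib):
        parents = rng.randrange(n_pairs)
        assignments.append((parents, parents))
    # Nonconsanguineous pairs: two distinct parental pairs are drawn,
    # so chance sib mating cannot occur.
    for _ in range(n_pairs - n_sib):
        first, second = rng.sample(range(n_pairs), 2)
        assignments.append((first, second))
    return assignments
```

For example, `assign_parental_pairs(10, 0.3, random.Random(0))` returns ten assignments, the first three of which name a single shared parental pair.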
In this model, two allelic copies at a locus have three possible states (Figure 3). They can be the two alleles of the same individual (state 1). Alternatively, they can be in the two individuals of a mating pair, one in each member of the pair (state 2). Finally, they can be in two individuals in separate mating pairs (state 3). We define three random variables corresponding to these three states: T is $T_{MRCA}$ for two alleles in the same individual, U is $T_{MRCA}$ for two alleles in two individuals in a mating pair, and V is $T_{MRCA}$ for two alleles in two individuals in separate mating pairs (Figure 3).
Figure 3. Three states possible for a pair of alleles. State 1: within the same individual (yellow). State 2: in two individuals in a mating pair (pink). State 3: in two individuals in different mating pairs (blue).
Campbell (2015) derived the mean coalescence time for two alleles in an individual as a function of the population size N and the fraction of sib mating pairs $c_0$. We begin by recapitulating the results of Campbell (2015) in the diploid model with sib mating, also examining $E[U]$ and $E[V]$. Next, we extend the model to consider $E[T]$, $E[U]$, and $E[V]$ in other consanguinity regimes: first cousin mating, nth cousin mating, and a superposition of multiple degrees of cousin mating. We find that, in all regimes, consanguinity decreases $E[T]$, $E[U]$, and $E[V]$, thereby predicting that consanguinity increases both ROH lengths within individuals and IBD sharing between individuals. A single result unifies the consanguinity regimes in terms of the kinship coefficient of the pairs of individuals in consanguineous unions.
Sib Mating
Following Campbell (2015), we first rederive $E[T]$, $E[U]$, and $E[V]$ in units of generations by setting up recursions using a first-step analysis. For $E[T]$, if two alleles are present within one individual (state 1), then they must have been present in two individuals in a mating pair in the previous generation (state 2). Hence, $T = U + 1$ and
$$E[T] = E[U] + 1. \tag{1}$$
For $E[U]$, if two alleles are in the two individuals of a mating pair (state 2), then, with probability $c_0$, the pair is a sib mating pair. In that case, three outcomes are possible in the previous generation. With probability $1/4$, the two alleles coalesce, giving a coalescence time of 1 generation. With probability $1/4$, they are the two alleles of the same individual (state 1), giving a mean coalescence time of $E[T] + 1$. With probability $1/2$, they are two alleles in the two individuals of a mating pair (state 2), giving a mean coalescence time of $E[U] + 1$.
Chance sib mating is forbidden among the pairs that are not among the $c_0 N$ sib mating pairs. If the two individuals in the mating pair are not sibs, then, in the previous generation, the alleles trace to two separate mating pairs (state 3), giving mean coalescence time $E[V] + 1$. Combining the various cases, we have
$$E[U] = c_0\left[\frac{1}{4}(1) + \frac{1}{4}\left(E[T] + 1\right) + \frac{1}{2}\left(E[U] + 1\right)\right] + (1 - c_0)\left(E[V] + 1\right). \tag{2}$$
Finally, for $E[V]$, because parental pairs are chosen uniformly at random with replacement among N possible pairs, two individuals in separate mating pairs are sibs with probability $1/N$. In the previous generation, if the two individuals are sibs, then the two alleles can either coalesce, be in the same parent (state 1), or be in separate parents (state 2). If they are not sibs, then the alleles lie in two individuals in separate mating pairs in the previous generation (state 3). Combining these cases, we have
$$E[V] = \frac{1}{N}\left[\frac{1}{4}(1) + \frac{1}{4}\left(E[T] + 1\right) + \frac{1}{2}\left(E[U] + 1\right)\right] + \left(1 - \frac{1}{N}\right)\left(E[V] + 1\right). \tag{3}$$
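Equations 1–3 form a linear system of equations in $E[T]$, $E[U]$, and $E[V]$, the solution to which is
$$E[T] = 4N(1 - c_0) + 6, \tag{4}$$
$$E[U] = 4N(1 - c_0) + 5, \tag{5}$$
$$E[V] = 4N\left(1 - \frac{3}{4}c_0\right) + 4. \tag{6}$$
Note that although Campbell (2015) presented only Equation 4, Equations 5 and 6 also result from solving the system.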
We can immediately observe that $E[V] - E[T] = c_0 N - 2$, so that if $c_0$ exceeds $2/N$, or the population has more than two consanguineous mating pairs each generation, then $E[V] > E[T]$, and the mean coalescence time for two alleles in different mating pairs exceeds the mean coalescence time within individuals. For $c_0 \le 2/N$, $E[T]$ and $E[V]$ differ by at most two generations.
If $c_0 = 0$, then the reduced model of N monogamous diploid pairs with sib mating avoidance produces mean coalescence times close to the mean coalescence time of $4N$ for two lineages chosen uniformly at random in a haploid population of size $4N$ (Wakeley 2009, Chapter 6.1). The factors of $1 - c_0$ in Equations 4 and 5 and $1 - 3c_0/4$ in Equation 6 provide linear reductions in mean coalescence time owing to increasing consanguinity $c_0$.
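The three-state description of the sib mating model also lends itself to a Monte Carlo check. The following sketch simulates the backward-in-time chain on states 1, 2, and 3 and averages the resulting coalescence times. It is an illustration under stated assumptions: the function names are hypothetical, and the sib fraction is treated as a per-generation probability that a mating pair is a sib pair, which matches the recursions but not the fixed-count bookkeeping of the full pedigree model.

```python
import random


def simulate_coalescence_time(n_pairs, c0, start_state, rng):
    """Simulate one coalescence time (in generations) starting from state 1, 2, or 3."""
    state = start_state
    generations = 0
    while True:
        generations += 1
        if state == 1:
            # Two alleles in one individual came from the two parents (state 2).
            state = 2
            continue
        # State 2: the mating pair is a sib pair with probability c0.
        # State 3: two individuals in separate pairs are sibs with probability 1/N.
        sib_probability = c0 if state == 2 else 1.0 / n_pairs
        if rng.random() >= sib_probability:
            state = 3
            continue
        draw = rng.random()
        if draw < 0.25:
            return generations           # the two alleles coalesce
        state = 1 if draw < 0.5 else 2   # same parent (state 1) or both parents (state 2)


def mean_coalescence_time(n_pairs, c0, start_state, replicates, seed=0):
    """Average simulated coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(
        simulate_coalescence_time(n_pairs, c0, start_state, rng)
        for _ in range(replicates)
    )
    return total / replicates
```

For $N = 20$ and $c_0 = 0.5$, the closed forms give $E[T] = 46$ and $E[V] = 54$ generations; averages over a few tens of thousands of replicates started in states 1 and 3 should fall close to these values.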
Equations 4 and 6, normalized by $4N$, are plotted in Figure 4 as functions of N for various values of $c_0$. As N increases, the constant terms in Equations 4–6 become unimportant, and the mean coalescence times are dominated by a product of $4N$, the number of allelic copies in the population, and the reduction factor due to consanguinity, $1 - c_0$ or $1 - 3c_0/4$.
Figure 4. Normalized mean coalescence times under sib mating as functions of the number of mating pairs N and the fraction of sib mating pairs $c_0$. (A) $E[T]/(4N)$, the normalized mean coalescence time for two alleles within an individual (Equation 4). (B) $E[V]/(4N)$, the normalized mean coalescence time for two alleles in two separate mating pairs (Equation 6).
First Cousins
Next, we extend the model to first cousin mating and again derive $E[T]$, $E[U]$, and $E[V]$ in the same manner as in the sib mating case. In each generation, the fraction of first cousin mating pairs is a constant value $c_1$. Similarly to the sib mating case, both chance first-cousin mating and chance sib mating are forbidden among the remaining nonconsanguineous pairs. Consanguineous pairs are assumed not to be double-first cousins, and chance double-first-cousin mating is also forbidden among nonconsanguineous pairs.
$E[T]$ is the same as with sib mating: if two alleles are present within one individual (state 1), then they must have been present in two individuals in a mating pair in the previous generation (state 2), and Equation 1 still holds.
For $E[U]$, if two alleles are in two individuals of a mating pair (state 2), then, with probability $c_1$, those individuals are first cousins. If they are first cousins, then each has a parent who is the offspring of the shared grandparental mating pair (Figure 5B). For each individual in the first cousin mating pair, the probability that the sampled allele is inherited from the sib parent is $1/2$. Consequently, the probability that the sampled alleles in both individuals are inherited from the sib parents is $1/4$. If both alleles are inherited from the sib parents, then, as with sib mating, three cases are possible two generations ago. First, with probability $1/4$, the two alleles coalesce, giving a coalescence time of two generations. With probability $1/4$, they are the two alleles of the same individual (state 1), giving a mean coalescence time of $E[T] + 2$. Finally, with probability $1/2$, they are two alleles in the two individuals of a mating pair (state 2), giving a mean coalescence time of $E[U] + 2$.
Figure 5. The path by which two sampled alleles (green) in a consanguineous union of individuals with a specified relationship are inherited from a recent shared ancestral mating pair. (A) Sibs. (B) First cousins. (C) nth cousins.
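With probability $1 - c_1/4$, two alleles in two individuals in a mating pair are not inherited from a shared grandparental mating pair. Because both chance sib and first-cousin mating are forbidden, two generations ago the alleles are in separate mating pairs, giving mean coalescence time $E[V] + 2$. Combining the cases gives
$$E[U] = \frac{c_1}{4}\left[\frac{1}{4}(2) + \frac{1}{4}\left(E[T] + 2\right) + \frac{1}{2}\left(E[U] + 2\right)\right] + \left(1 - \frac{c_1}{4}\right)\left(E[V] + 2\right). \tag{7}$$
Lastly, for $E[V]$ the formula is the same as Equation 3 because parental pairs of individuals are still chosen uniformly at random with replacement from the N pairs.
Equations 1, 7, and 3 form a linear system of equations, with solution
$$E[T] = 4N\left(1 - \frac{c_1}{4}\right) + 10, \tag{8}$$
$$E[U] = 4N\left(1 - \frac{c_1}{4}\right) + 9, \tag{9}$$
$$E[V] = 4N\left(1 - \frac{3c_1}{16}\right) + 7. \tag{10}$$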
We first note that $E[V] - E[T] = c_1 N/4 - 3$, so if $c_1 > 12/N$, or the number of consanguineous mating pairs exceeds 12, then the mean coalescence time for two alleles in different mating pairs, $E[V]$, exceeds the mean coalescence time for two alleles within an individual, $E[T]$. As $c_1$ approaches 0, the mean coalescence times are near $4N$, the mean coalescence time for two lineages from a randomly mating haploid population of size $4N$. On the other hand, for $c_1$ near 1, $E[T] \approx 3N$ and $E[V] \approx 13N/4$. The mean coalescence times are reduced linearly due to consanguinity, by a factor of $1 - c_1/4$ in Equations 8 and 9, and by a factor of $1 - 3c_1/16$ in Equation 10.
Equations 8 and 10, normalized by $4N$, are plotted in Figure 6. As the number of mating pairs N increases, the mean coalescence times approach the product of $4N$ and a reduction factor due to consanguinity. In contrast to sib mating, for which $E[T]/(4N)$ decreases to 0 and $E[V]/(4N)$ to $1/4$ for large N and $c_0$ near 1, under first cousin mating $E[T]/(4N)$ is bounded below by $3/4$ and $E[V]/(4N)$ by $13/16$.
Figure 6. Normalized mean coalescence times under first cousin mating as functions of the number of mating pairs N and the fraction of first cousin mating pairs $c_1$. (A) $E[T]/(4N)$, Equation 8. (B) $E[V]/(4N)$, Equation 10. The dashed lines represent the maximum reduction due to consanguinity, obtained by setting $c_1 = 1$: $3/4$ in (A) and $13/16$ in (B).
nth Cousins
The similarity of the derivation in the cases of sib mating and first-cousin mating suggests a generalization to nth cousin mating, where $n = 1$ represents first-cousin mating and $n = 0$ represents sib mating. As before, $c_n$ is the fraction of mating pairs that represent nth cousins. It will be convenient to assume that chance mating of ith cousins is forbidden for all i from 0 to n. Beginning with $E[T]$, if two alleles are within one individual, then, as before, they must have been in two individuals in a mating pair in the previous generation. Equation 1 continues to hold.
For $E[U]$, with probability $c_n$, the individuals in the mating pair are nth cousins. They then share an ancestral mating pair $n + 1$ generations in the past and have ancestors that are sibs n generations ago (Figure 5C). The probability that a pair of alleles, one in an offspring of one sib and one in an offspring of the other sib, both trace to the sibs is $1/4$. For each of the next $n - 1$ generations connecting the sibs to the nth cousins, the conditional probability that the transmitted alleles are both from the sibs given that they are from the sibs in the previous generation is $1/4$. Consequently, with probability $(1/4)^n$, the sampled alleles in the current generation are inherited from the sib ancestors.
Conditional on tracing to the sibs, three cases exist for the two alleles in the shared ancestral mating pair: with probability $1/4$, the alleles coalesce $n + 1$ generations ago. With probability $1/4$, the two alleles are in state 1 and have mean coalescence time $E[T] + n + 1$. Lastly, with probability $1/2$, the two alleles are in state 2 and have mean coalescence time $E[U] + n + 1$. If the two alleles are not inherited from the ancestral sibs, or if the individuals in the mating pair in the current generation are not nth cousins, then because chance mating of cousins of degree $i \le n$ is forbidden, the two alleles are in separate mating pairs $n + 1$ generations ago and have mean coalescence time $E[V] + n + 1$. Combining the cases gives
$$E[U] = \frac{c_n}{4^n}\left[\frac{1}{4}(n + 1) + \frac{1}{4}\left(E[T] + n + 1\right) + \frac{1}{2}\left(E[U] + n + 1\right)\right] + \left(1 - \frac{c_n}{4^n}\right)\left(E[V] + n + 1\right). \tag{11}$$
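Lastly, for $E[V]$, because parental pairs are chosen uniformly at random with replacement from the N possible pairs, Equation 3 continues to hold.
Equations 1, 11, and 3 form a linear system of equations, the solution to which is
$$E[T] = 4N\left(1 - \frac{c_n}{4^n}\right) + 4n + 6, \tag{12}$$
$$E[U] = 4N\left(1 - \frac{c_n}{4^n}\right) + 4n + 5, \tag{13}$$
$$E[V] = 4N\left(1 - \frac{3c_n}{4^{n+1}}\right) + 3n + 4. \tag{14}$$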
Note that Equations 12–14 give Equations 4–6 as a special case when $n = 0$, and Equations 8–10 when $n = 1$. We can consider the difference $E[V] - E[T] = N c_n/4^n - n - 2$. If $c_n > 4^n(n + 2)/N$, or the number of consanguineous pairs exceeds $4^n(n + 2)$, then the mean coalescence time for two alleles in different mating pairs, $E[V]$, exceeds the mean coalescence time for two alleles within an individual, $E[T]$. As n increases, the first term $N c_n/4^n$ approaches zero and the two means differ by approximately $n + 2$. For fixed n, the mean coalescence times are reduced linearly due to consanguinity, by a factor of $1 - c_n/4^n$ in Equations 12 and 13, and by $1 - 3c_n/4^{n+1}$ in Equation 14.
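As a check on the nth cousin results, the closed forms can be substituted back into the three recursions. The sketch below is an illustration, not the authors' code: the closed forms are written as $4N(1 - c_n/4^n) + 4n + 6$, $4N(1 - c_n/4^n) + 4n + 5$, and $4N(1 - 3c_n/4^{n+1}) + 3n + 4$, the recursions follow the first-step description in the text, and both function names are hypothetical.

```python
from fractions import Fraction


def nth_cousin_means(n_pairs, c_n, degree):
    """Closed-form (E[T], E[U], E[V]) for nth cousin mating; degree 0 is sib mating."""
    n_pairs = Fraction(n_pairs)
    q = Fraction(c_n) / 4**degree  # probability both alleles trace to the ancestral sibs
    e_t = 4 * n_pairs * (1 - q) + 4 * degree + 6
    e_u = 4 * n_pairs * (1 - q) + 4 * degree + 5
    e_v = 4 * n_pairs * (1 - Fraction(3, 4) * q) + 3 * degree + 4
    return e_t, e_u, e_v


def recursion_residuals(n_pairs, c_n, degree, e_t, e_u, e_v):
    """Left side minus right side for each of the three recursions; all zero if satisfied."""
    n_pairs = Fraction(n_pairs)
    q = Fraction(c_n) / 4**degree
    g = degree + 1  # generations back to the shared ancestral mating pair
    # Two alleles in one individual were in a mating pair one generation earlier.
    r1 = e_t - (e_u + 1)
    # Two alleles in a mating pair: shared ancestral pair with probability q.
    shared = Fraction(1, 4) * g + Fraction(1, 4) * (e_t + g) + Fraction(1, 2) * (e_u + g)
    r2 = e_u - (q * shared + (1 - q) * (e_v + g))
    # Two alleles in separate mating pairs: sibs with probability 1/N.
    sibs = Fraction(1, 4) + Fraction(1, 4) * (e_t + 1) + Fraction(1, 2) * (e_u + 1)
    r3 = e_v - (sibs / n_pairs + (1 - 1 / n_pairs) * (e_v + 1))
    return r1, r2, r3
```

For first cousins with $N = 100$ and $c_1 = 1/5$, the means are 390, 389, and 392 generations, the residuals are all zero, and $E[V] - E[T] = 2 = N c_1/4 - 3$.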
Equations 12 and 14, normalized by $4N$, are plotted in Figure 7 as functions of the degree n of the cousin relationship. The terms in these equations that reduce coalescence times are $c_n/4^n$ in Equation 12 and $3c_n/4^{n+1}$ in Equation 14. As the degree n of the cousin relationship increases, these terms decrease exponentially to zero, and the mean coalescence times approach $4N$. The ratio $E[V]/E[T]$, taking the ratio of Equations 14 and 12, is plotted in Figure 8 as a function of $c_n$ for n from 0 to 5. As the fraction of cousin mating $c_n$ increases, the ratio increases above 1, so $E[V] > E[T]$; however, as the degree of the relationship n increases for fixed $c_n$, the ratio decreases toward 1.
Figure 7. Normalized mean coalescence times as a function of degree n of the relationship and the fraction $c_n$ of nth cousin mating pairs. A fixed value of N is assumed. (A) $E[T]/(4N)$, Equation 12. (B) $E[V]/(4N)$, Equation 14.
Figure 8. The ratio $E[V]/E[T]$, or Equation 14/Equation 12, as a function of the fraction of cousin mating $c_n$ and the degree of the cousin relationship n. A fixed value of N is assumed.
Superposition of Multiple Mating Levels
We now combine all forms of consanguinity examined thus far into a superposition of levels of cousin mating, in which ith cousin mating is permitted for each i from 0 to n. For each i from 0 to n, let $c_i$ be the fraction of ith cousin mating pairs in each generation, and let n be the degree of the most distant cousin relationship allowed. For each $i \le n$, chance ith cousin mating is prohibited. We assume individuals in a consanguineous mating pair cannot be related by more than one path; for example, they cannot be both first and third cousins. This assumption is designed for use with a large population and a small value of $n$. For fixed n, as N becomes large, the probability that two individuals in a consanguineous mating pair share more than one recent ancestor is regarded as negligible.
$E[T]$ is the same as in the previous models: two alleles within one individual must have been in two individuals in a mating pair in the previous generation (Equation 1).
For $E[U]$, for each $i$ from 0 to $n$, with probability $c_i$ the individuals in the mating pair are ith cousins. As was seen with nth cousins, with probability $1/4^i$, the two alleles were inherited from sib ancestors i generations ago. Then, $i + 1$ generations ago, there are three possible cases: with probability $1/4$, the alleles coalesce. With probability $1/4$, the alleles are in state 1 and have mean coalescence time $E[T] + i + 1$. Finally, with probability $1/2$, the alleles are in state 2 and have mean coalescence time $E[U] + i + 1$.
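The probability is $1 - \sum_{i=0}^{n} c_i/4^i$ that the two alleles are either not in a consanguineous mating pair for all i from 0 to n, or not inherited from the shared ancestral mating pair. Then, because chance mating of cousins of degree $i \le n$ is forbidden, the two alleles are in separate mating pairs $n + 1$ generations ago and have mean coalescence time $E[V] + n + 1$. Combining the cases for all $i$ from 0 to $n$ gives
$$E[U] = \sum_{i=0}^{n} \frac{c_i}{4^i}\left[\frac{1}{4}(i + 1) + \frac{1}{4}\left(E[T] + i + 1\right) + \frac{1}{2}\left(E[U] + i + 1\right)\right] + \left(1 - \sum_{i=0}^{n} \frac{c_i}{4^i}\right)\left(E[V] + n + 1\right). \tag{15}$$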
Because parental pairs are chosen uniformly at random with replacement from the N possible pairs, for two alleles in separate mating pairs, Equation 3 holds as before.
We define c as the sum over i of the probability that two alleles in a mating pair chosen at random are inherited by descent from the same allele in a shared ancestral mating pair $i + 1$ generations in the past:
$$c = \sum_{i=0}^{n} \frac{c_i}{4^{i+1}}. \tag{16}$$
In other words, c is defined in the same way as the kinship coefficient of the two individuals in a randomly chosen mating pair (Jacquard 1972; Lange 1997); it is the probability that two alleles selected at random from a randomly chosen mating pair are identical by descent.
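Equations 1, 15, and 3 form a system of equations, the solution to which is
$$E[T] = 4N(1 - 4c) + 4n(1 - 4c) + 4d + 6, \tag{17}$$
$$E[U] = 4N(1 - 4c) + 4n(1 - 4c) + 4d + 5, \tag{18}$$
$$E[V] = 4N(1 - 3c) + 3n(1 - 4c) + 3d + 4, \tag{19}$$
where
$$d = \sum_{i=0}^{n} \frac{i\,c_i}{4^i}. \tag{20}$$
First, if $c_i = 0$ for all $i \ne n$, then Equations 17–19 reduce to Equations 12–14. The difference $E[V] - E[T]$ for large N is approximately $4Nc$.
For sufficiently large N, the constant terms in Equations 17–19 contribute little. Next, for each i, $i/4^i \le 1/4$, so $d \le 1/4$, and the contributions from $4d$ in Equations 17 and 18 and from $3d$ in Equation 19 are relatively small. Finally, noting that for probabilities $c_i$ with $\sum_{i=0}^{n} c_i \le 1$, the sum in Equation 16 is maximized if $c_0 = 1$ and all other $c_i$ equal 0, so $c \le 1/4$ and $0 \le 1 - 4c \le 1$. If $n \ll N$, then the maximal contribution of $4n(1 - 4c)$ in Equations 17 and 18 and $3n(1 - 4c)$ in Equation 19 is also relatively small. Then, except in the sib mating case of $c_0 = 1$ and $c = 1/4$, the means in Equations 17–19 are dominated by the product of $4N$ and the linear reduction factors $1 - 4c$ in Equations 17 and 18 and $1 - 3c$ in Equation 19.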
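The superposition case can be explored numerically with a short sketch. It is an illustration under stated assumptions rather than the authors' implementation: the kinship coefficient is taken as $c = \sum_i c_i/4^{i+1}$, the closed forms are those obtained here by solving the recursions with the nonconsanguineous term placed $n + 1$ generations back, and the auxiliary sum `d` $= \sum_i i\,c_i/4^i$ and all function names are labels chosen for this example.

```python
from fractions import Fraction


def kinship_coefficient(cousin_fractions):
    """Kinship coefficient c; cousin_fractions[i] is the fraction of ith cousin pairs."""
    return sum(Fraction(ci) / 4 ** (i + 1) for i, ci in enumerate(cousin_fractions))


def superposition_means(n_pairs, cousin_fractions):
    """Return (E[T], E[U], E[V]) for a mixture of sib and cousin mating levels."""
    n_pairs = Fraction(n_pairs)
    max_degree = len(cousin_fractions) - 1
    c = kinship_coefficient(cousin_fractions)
    # Auxiliary constant term (name assumed here): sum of i * c_i / 4^i.
    d = sum(i * Fraction(ci) / 4**i for i, ci in enumerate(cousin_fractions))
    e_u = 4 * (n_pairs + max_degree) * (1 - 4 * c) + 4 * d + 5
    e_t = e_u + 1
    e_v = 4 * n_pairs * (1 - 3 * c) + 3 * max_degree * (1 - 4 * c) + 3 * d + 4
    return e_t, e_u, e_v


def mating_pair_recursion_residual(n_pairs, cousin_fractions, e_t, e_u, e_v):
    """Left side minus right side of the recursion for two alleles in a mating pair."""
    max_degree = len(cousin_fractions) - 1
    rhs = Fraction(0)
    traced = Fraction(0)
    for i, ci in enumerate(cousin_fractions):
        q = Fraction(ci) / 4**i  # ith cousins and both alleles trace to the sibs
        g = i + 1
        rhs += q * (Fraction(1, 4) * g + Fraction(1, 4) * (e_t + g) + Fraction(1, 2) * (e_u + g))
        traced += q
    rhs += (1 - traced) * (e_v + max_degree + 1)
    return e_u - rhs
```

With $N = 100$, $c_0 = 1/10$, and $c_1 = 1/5$, the kinship coefficient is $3/80$ and the difference $E[V] - E[T]$ is 12.1 generations, somewhat below the large-N approximation $4Nc = 15$ because of the constant terms.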
Application to Data
Background
Previously, Kang et al. (2016) demonstrated that ROH sharing increases with consanguinity. Specifically, in their Figure 7, they observed a positive correlation between population means of the total ROH length and population levels of consanguinity available from demographic studies. This relationship accords with our prediction that increased consanguinity reduces within-individual mean pairwise coalescence times (Equations 4, 8, 12, and 17), and, hence, increases ROH length.
We use the data of Kang et al. (2016) to test predictions about the relationship between consanguinity, ROH, and IBD. Our model of the effect of consanguinity on coalescence times predicts that increased consanguinity decreases mean coalescence times both for pairs of alleles within individuals and for pairs of alleles in individuals in different mating pairs ($E[T]$ and $E[V]$, respectively), with a larger reduction for within-individual coalescence times. Because more recent coalescence times for pairs of lineages are expected to give rise to elevated genomic sharing, we expect IBD and ROH sharing to be correlated, owing to the fact that their associated coalescence times both decrease with increasing consanguinity. In addition, we expect a larger increase in ROH sharing relative to the corresponding increase in IBD, due to the larger relative decrease of coalescence times for $T$ compared to $V$.
Data set
We use data from Kang et al. (2016) consisting of 202 individuals from 18 Jewish populations, and 2903 individuals from 123 non-Jewish populations, with genotypes available at 257,091 SNPs. We focus our analysis on Jewish individuals classified by Kang et al. (2016) into six regional groups: Ethiopian, European, Middle Eastern, North African, South Asian, and Yemenite. The remaining non-Jewish individuals are a combination of the HGDP-CEPH and HapMap III data sets and were included only for phasing.
Data analysis
ROH lengths for each individual were taken from Kang et al. (2016). Following Pemberton et al. (2012), Kang et al. (2016) classified ROH segments into three length categories: Class A for short segments, Class B for segments of intermediate length, and Class C for long segments. Kang et al. (2016) further examined the relationship between length class and consanguinity, demonstrating that the total length of the Class C segments drives the correlation between ROH length and consanguinity.
To calculate IBD, we first phased the full data set with Beagle 4.1 (Browning and Browning 2007) using the default parameters (maxlr = 5000, lowmem = false, window = 50,000, overlap = 3000, niterations = 5, impute = false, cluster = 0.005, ne = 1,000,000, err = 0.001, seed = −99,999, modelscale = 0.8) and HapMap GRCh36 genetic maps for the map parameter. From the phased data, we called IBD segments with Refined IBD (Browning and Browning 2013) using the default parameters (window = 40.0, lod = 3.0, length = 1.5, trim = 0.15, scale = 3) and the same map files.
Total ROH length sums segments shared between two haplotypes within an individual, whereas total IBD length is a sum of four haplotype comparisons between two diploid individuals. To make IBD directly comparable with ROH, we calculated total IBD length by summing all segments shared between two individuals (reported by Refined IBD) and dividing by 4. This computation gives the mean total IBD length shared between two haplotypes chosen at random from the two individuals. We averaged this length across all pairs of distinct individuals within populations.
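The normalization described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the input is assumed to be a list of `(individual_1, individual_2, length)` tuples rather than the actual Refined IBD output format, and the function name is hypothetical. Pairs with no detected segments contribute zero to the average.

```python
from itertools import combinations


def mean_ibd_per_haplotype_pair(segments, individuals):
    """Mean total IBD length per haplotype pair, averaged over pairs of individuals.

    segments: iterable of (individual_1, individual_2, length) tuples (assumed format).
    individuals: identifiers of all individuals in one population.
    """
    totals = {frozenset(pair): 0.0 for pair in combinations(individuals, 2)}
    if not totals:
        raise ValueError("at least two distinct individuals are required")
    for first, second, length in segments:
        key = frozenset((first, second))
        # Skip within-individual segments and individuals outside the population.
        if first != second and key in totals:
            totals[key] += length
    # Four haplotype comparisons exist between two diploid individuals,
    # so dividing by 4 gives the mean length per haplotype pair.
    per_pair = [total / 4 for total in totals.values()]
    return sum(per_pair) / len(per_pair)
```

For three individuals A, B, and C with segments of lengths 8 and 4 shared between A and B and a segment of length 4 shared between B and C, the per-pair values are 3, 1, and 0, and the population mean is 4/3 in the units of the input lengths.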
Data availability
See Kang et al. (2016) for the data used in this study.
Results
In Figure 9A, we compare the relationship between mean total IBD across all pairs of individuals and mean total ROH across all individuals in 18 Jewish populations. As noted by Kang et al. (2016), the longest ROH lengths occur primarily in the two South Asian Jewish populations and several of the Middle Eastern Jewish populations. Our new computation of IBD length generally accords with those of Atzmon et al. (2010), Campbell et al. (2012), and Waldman et al. (2016a,b), in that the South Asian Jewish populations have the highest IBD sharing, followed by most Middle Eastern and North African Jewish populations, with European, Syrian, and Ethiopian Jews having the least sharing. Note that the particularly high level of IBD sharing in the Mumbai population has been observed previously in an independent sample (Waldman et al. 2016a).
Figure 9. Mean total ROH and mean total IBD length for 18 Jewish populations. Populations are color-coded by regional group as in Kang et al. (2016): Ethiopian (orange), European (blue), Middle Eastern (brown), North African (yellow), South Asian (red), and Yemenite (green). Al, Algerian; As, Ashkenazi; Az, Azerbaijani; C, Cochin; E, Ethiopian; G, Georgian; Iq, Iraqi; Ir, Iranian; It, Italian; K, Kurdish; L, Libyan; Mo, Moroccan; Mu, Mumbai; Se, Sephardi; Sy, Syrian; T, Tunisian; U, Uzbekistani; Y, Yemenite. A regression equation is reported for each panel. (A) All ROH. (B) Class A short ROH. (C) Class B intermediate ROH. (D) Class C long ROH.
IBD and ROH are positively correlated. The regression has positive slope 0.12, indicating that, at a population level, a 1 Mb increase in mean total ROH is associated with an expected increase of 120 kb in mean total IBD. The positive relationship between ROH and IBD is consistent with the prediction under the model of a correlated relationship for within- and between-individual coalescence times. Moreover, the slope is less than 1, reflecting the greater reduction in within-individual coalescence times due to consanguinity compared to between-individual coalescence times.
In Figure 9, B–D, we consider the relationship between mean total IBD and each of the ROH length classes. We observe the strongest correlation of IBD with total Class C (long) ROH. Classes A (short) and B (intermediate) have positive, but weaker, correlations. The regression line for Class C is significant, whereas the regression lines for Classes A and B are not. The relationship between IBD length and Class C ROH length suggests that, in general, IBD and ROH are correlated because both are affected by consanguinity, in agreement with our theoretical predictions. The weaker correlations between IBD length and Classes A and B might result from comparatively less accurate calling of short IBD segments.
Discussion
Summary
We have studied the effect of consanguinity on within- and between-individual coalescence times. We extended the sib mating model of Campbell (2015) to permit first cousin mating, nth cousin mating, and a superposition of multiple levels of cousin mating, deriving mean coalescence times for two alleles within an individual and two alleles in separate mating pairs ($E[T]$ and $E[V]$, respectively). We found that consanguinity linearly reduces both means, with a greater reduction for within-individual coalescence times. To test our theoretical predictions, we studied ROH and IBD patterns in 18 Jewish populations, finding that they are correlated, and that the correlation is driven by long Class C ROH. These results support the prediction of the modeling framework that ROH and IBD levels are both amplified by consanguinity.
In each of our various models, for large N, the mean within- and between-individual coalescence times are approximately equal to the product of the mean pairwise coalescence time in a haploid population of corresponding size and a linear reduction term that depends on the fraction of consanguineous pairs and their degree of consanguinity. Thus, although the model considers diploids with a rigid monogamous mating structure, its coalescence times are closely related to those of the standard haploid model.
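The linear dependence on the consanguinity rate can be illustrated with a small Markov chain for the sib mating case. The sketch below is a simplified reconstruction under stated assumptions, not the derivation used in this article: it tracks two alleles that are either within one individual, in the two members of a mating pair, or in separate mating pairs; it assumes N monogamous mating pairs, a fraction `sib_fraction` of which are sib pairs; and it assumes that two individuals from separate mating pairs share a parental pair with probability 1/N. The function name and state labels are hypothetical.

```python
import numpy as np

def mean_coalescence_times(n_pairs, sib_fraction):
    """Mean pairwise coalescence times (in generations) in a simplified
    three-state chain for sib mating (an illustrative assumption).

    States: 0 = two alleles within one individual,
            1 = two alleles in the two members of a mating pair,
            2 = two alleles in individuals from separate mating pairs.
    """
    c0 = float(sib_fraction)
    n = float(n_pairs)
    # Transition probabilities among the non-coalesced states, looking one
    # generation back; the missing row mass is the coalescence probability.
    q = np.array([
        # Within an individual: the alleles came from its two parents.
        [0.0, 1.0, 0.0],
        # Mating pair: sibs with probability c0. Two alleles drawn from a
        # shared parental pair coalesce (1/4), sit in one parent (1/4),
        # or sit in the two different parents (1/2).
        [c0 / 4.0, c0 / 2.0, 1.0 - c0],
        # Separate mating pairs: shared parental pair with probability 1/n.
        [1.0 / (4.0 * n), 1.0 / (2.0 * n), 1.0 - 1.0 / n],
    ])
    # Expected absorption times t satisfy t = 1 + q t.
    t = np.linalg.solve(np.eye(3) - q, np.ones(3))
    return {"within": t[0], "mating_pair": t[1], "between": t[2]}
```

In this simplified chain the solution works out to 4N(1 - c0) + 6 generations within individuals and N(4 - 3c0) + 4 generations between mating pairs, where c0 is the sib mating fraction. Both means are linear in c0, with the steeper decline within individuals.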
The difference is approximately
for nth cousin mating and
for the superposition of different mating levels. The quantity
can be viewed as the expected number of coalescence events due to consanguinity, as it is the product of the number of pairs of alleles in two individuals in a mating pair
and the probability that two alleles in a mating pair are identical by descent, and, therefore, coalesce quickly rather than on a coalescent timescale (c). In other words, two alleles in the same individual have probability c of having a coalescence time near zero, so that on average their coalescence time is expected to be
less than that of two alleles that are in different mating pairs and that do not have the probability c of near-immediate coalescence. Note that this perspective, based on the superposition case, also applies in the nth cousin mating case, as the difference
for nth cousins is
, and
is the probability that two alleles in two individuals in a mating pair are identical by descent in this case.
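The probability that two alleles in a mating pair are identical by descent can be made concrete with the standard kinship coefficients of cousins. The following is a minimal sketch, assuming that nth cousins share no ancestry other than the defining ancestral pair, so that their kinship coefficient is 1/4^(n+1) (1/4 for sibs with n = 0, 1/16 for first cousins, 1/64 for second cousins), and that nonconsanguineous pairs contribute zero. The function names and the input format are hypothetical.

```python
def kinship_of_nth_cousins(n):
    """Kinship coefficient of nth cousins (n = 0 denotes full sibs),
    assuming no shared ancestry beyond the defining ancestral pair."""
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    return 0.25 ** (n + 1)

def mean_pair_kinship(cousin_fractions):
    """Kinship coefficient of a randomly chosen mating pair.

    cousin_fractions maps cousin degree n to the fraction of mating pairs
    that are nth cousins; the remaining pairs are treated as unrelated.
    """
    total = sum(cousin_fractions.values())
    if total > 1.0 + 1e-12:
        raise ValueError("fractions of consanguineous pairs exceed 1")
    return sum(frac * kinship_of_nth_cousins(n)
               for n, frac in cousin_fractions.items())

# Hypothetical population: 10% sib pairs and 20% first cousin pairs.
c = mean_pair_kinship({0: 0.10, 1: 0.20})
```

For this hypothetical population, the sib pairs contribute 0.10/4 and the first cousin pairs contribute 0.20/16 to the kinship coefficient of a random mating pair.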
Theoretical population genetics of ROH and IBD
If two genomes share a recent common ancestor at a site, then the length of the shared segment surrounding that site is likely to be long, because recombination has had little time to break down the segment. If the genomes share a distant common ancestor, then the surrounding segment is likely to be short because recombination will have had many generations to break it. In this way, recombination produces an inverse relationship between coalescence times at a genomic site and the length of the surrounding shared segment (Palamara et al. 2012; Carmi et al. 2014; Browning and Browning 2015). The results of our model, that increased consanguinity decreases both within-individual and between-individual coalescence times, suggest that populations with higher rates of consanguinity will have more recent coalescence times and will share longer ROH and IBD segments. Moreover, the result suggests that the reduction is greater for ROH than for IBD, and that consanguinity will have a stronger effect on ROH sharing.
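The inverse relationship between coalescence time and shared segment length can be sketched with a standard approximation that is not specific to this article. If two genomes coalesce t generations ago at a site, recombination on either lineage occurs at rate 2t per Morgan, so the distance to the nearest breakpoint on each side of the site is approximately exponential with rate 2t, and the expected segment length is about 1/t Morgans, or 100/t cM. The function names below are hypothetical.

```python
import random

def expected_segment_length_cm(tmrca_generations):
    """Approximate expected length (cM) of the shared segment spanning a
    site whose two copies coalesce tmrca_generations ago."""
    return 100.0 / tmrca_generations

def simulate_segment_length_cm(tmrca_generations, rng):
    """Draw one segment length (cM) as the sum of two exponential
    distances (left and right of the site), each with rate 2t per Morgan."""
    rate = 2.0 * tmrca_generations
    left = rng.expovariate(rate)
    right = rng.expovariate(rate)
    return 100.0 * (left + right)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [simulate_segment_length_cm(10, rng) for _ in range(20000)]
simulated_mean_cm = sum(draws) / len(draws)
```

Under this approximation, a coalescence time of 10 generations corresponds to an expected segment of about 10 cM, whereas a coalescence time of 100 generations corresponds to about 1 cM.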
To study ROH and IBD together in the same model, we generalized a diploid coalescent model of sib mating. The Campbell (2015) model and its generalization represent examples of the increasing integration of coalescent perspectives into models that consider a diploid pedigree structure (Wollenberg and Avise 1998; Wakeley et al. 2012, 2016; Wilton et al. 2017; King et al. 2018). For example, Wakeley et al. (2012) found that, in a pedigree-based coalescent model, compared to a standard haploid model, the distribution of pairwise coalescence times for random pairs of individuals was altered, most strongly for the most recent coalescence times. Otherwise, the two models have similar coalescence time distributions. In our case, consideration of the pedigree—with no consanguinity—produced mean pairwise coalescence times close to the haploid mean pairwise coalescence time of . The inclusion of consanguinity in the model decreased the mean coalescence time by a linear factor dependent on the kinship coefficient of a randomly chosen mating pair.
Contrasting two hypotheses—one in which both within-individual ROH and between-individual IBD are increased by consanguinity, and the other in which consanguinity increases ROH but not IBD—we found support for the former rather than the latter view. According to our model, consanguinity inflates relatedness not only within families, but in the population in general, so that mean pairwise coalescence times decrease with increasing consanguinity, both for pairs of alleles within individuals and for pairs of alleles in separate mating pairs. We can understand this phenomenon through the concept of coalescent effective population size. In this perspective (Sjödin et al. 2005), mean pairwise coalescence times have a direct relationship with effective size. Consequently, the direct relationship that we observed between coalescence times associated with ROH and those associated with IBD can be viewed as resulting from a decreased coalescent effective size that in turn results from consanguinity, and which decreases coalescence times both within and between individuals.
Previously, Jacquard (1970) studied the effect of inbreeding avoidance on effective population size. He modeled a two-sex, diploid population of N individuals with equally many males and females, considering cases with and without sib mating avoidance (Jacquard 1970, pp. 175, 245). Sib mating avoidance generated a slightly larger effective size compared to the case in which sib mating was permissible, analogous to our observation that coalescence times decrease with increasing sib mating.
The sib mating case of our model is also similar to models of partial selfing in plants (Charlesworth 2003). Such models can be viewed as having a linear combination of “consanguinity” (selfing) and “random mating” (outcrossing). In our sib mating model, two alleles have probability of coalescing in the previous generation, whereas, under partial selfing, alleles have probability
of coalescence in the previous generation, where s is the selfing rate. In a partially selfing population of
diploid individuals, taking
, the effective population size is
individuals (Pollak 1987; Nordborg and Donnelly 1997), a product of the effective size in a randomly mating population and a linear reduction factor proportional to the probability of coalescence, similar to our findings for the mean within- and between-individual coalescence times. Our results are analogous to those of Milligan (1996), who studied the effect of partial selfing on within- and between-individual coalescence times,
and
, respectively, finding
and
for a population of size
diploid individuals. The greater reduction in coalescence time for within- vs. between-individual comparisons echoes our results for the corresponding mean coalescence times under consanguinity. Moreover,
, the product of the number of alleles and the probability of rapid coalescence from selfing, analogous to the difference
that we found for
.
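For comparison, the textbook large-population results for partial selfing can be written out as a short sketch. It assumes N diploid individuals and selfing rate s, and uses the standard forms F = s/(2 - s) for the equilibrium inbreeding coefficient and N(1 - s/2) for the effective size; the notation and scaling may differ from those of the cited studies. The mean coalescence times are the usual approximations in which two alleles within an individual coalesce almost immediately with probability F and otherwise behave like two alleles in different individuals. The function names are hypothetical.

```python
def selfing_inbreeding_coefficient(s):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under selfing rate s."""
    return s / (2.0 - s)

def selfing_effective_size(n_individuals, s):
    """Effective size N / (1 + F) = N * (1 - s / 2), in diploid individuals."""
    return n_individuals * (1.0 - s / 2.0)

def selfing_mean_coalescence_times(n_individuals, s):
    """Approximate mean pairwise coalescence times (generations), large N.

    between: two alleles in different individuals, 2 * Ne.
    within:  two alleles in one individual, (1 - F) * 2 * Ne.
    """
    ne = selfing_effective_size(n_individuals, s)
    f = selfing_inbreeding_coefficient(s)
    between = 2.0 * ne
    within = (1.0 - f) * between
    return within, between

# Hypothetical example: 1000 individuals, half of offspring from selfing.
within, between = selfing_mean_coalescence_times(1000, 0.5)
```

In this approximation the within-individual mean simplifies to 2N(1 - s) and the between-individual mean to N(2 - s), so both decline linearly in s and their difference is Ns.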
IBD in Jewish populations
Our population ordering by ROH and IBD accords with previous studies in Jewish populations (Atzmon et al. 2010; Campbell et al. 2012; Waldman et al. 2016a,b). Kang et al. (2016) observed that their ordering of populations by mean total ROH lengths was similar to the ordering reported by Waldman et al. (2016b). We find that the ordering of mean total IBD length in the data of Kang et al. (2016) is also similar to that of Waldman et al. (2016b). For the populations included in both studies, Waldman et al. (2016b) reported, in decreasing order, Mumbai, Cochin, Iranian, Libyan, Italian, Iraqi, Tunisian, Georgian, Yemenite, Syrian, Ashkenazi, Moroccan, Algerian, and Sephardi. Here we find a similar ordering: Mumbai, Cochin, Iranian, Libyan, Georgian, Moroccan, Ashkenazi, Yemenite, Iraqi, Italian, Tunisian, Algerian, Sephardi, and Syrian. Although some specific rankings differ, South Asian Jewish populations generally share the most IBD, followed primarily by some of the Middle Eastern and North African Jewish populations, with European Jewish populations tending toward intermediate and lower levels.
From our model, we expect ROH and IBD to be correlated because the mean within- and between-individual coalescence times both depend on consanguinity. Because
, we expect a stronger effect of consanguinity on ROH than on IBD. Indeed, we find that ROH and IBD are correlated with positive regression slope less than 1, reflecting the weaker effect of consanguinity on IBD. In particular, the correlation is strongest with Class C (long) ROH, though Classes A and B might produce larger correlations if IBD calling for short segments were more accurate. Long ROH in a population reflect consanguinity because long segments are the most likely to share a recent ancestor (Pemberton et al. 2012; Kang et al. 2016); the correlation between Class C ROH and IBD supports the prediction of our model that ROH and IBD are correlated because they are both amplified by consanguinity.
Limitations and extensions
Our analysis has a number of limitations. First, we assumed a constant population size and a constant fraction of consanguineous unions each generation. It might be possible to generalize these assumptions to accommodate temporal changes in population size and consanguinity that could affect ROH and IBD distributions. We also did not consider population substructure, which is potentially relevant if consanguinity is practiced as a culturally transmitted trait in subgroups of a population. Substructure would affect within- and between-individual coalescence probabilities, and, in turn, coalescence times. In the same manner that inbreeding and substructure can be viewed as forms of the same general phenomenon of deviation from random mating, it is possible that a structured population in which random mating occurs within subpopulations, but not between them, could produce similar phenomena to those we have seen in our consanguinity model.
Second, we focused only on neutral loci. Loci experiencing balancing selection can exhibit evidence of excess genetic differences for pairs of alleles sampled within individuals compared to that seen between individuals, so that a reverse effect might be observed. For example, for the HLA locus, Robertson et al. (1999) studied identity of haplotypes for haplotypes in the same individual and for haplotypes in different individuals, quantities expected to be inversely related to pairwise coalescence times under a neutral model. In a population with no first cousin and closer matings, they found an excess in the number of within-individual vs. between-individual haplotype differences compared to a neutral prediction, suggesting an increase in within-individual vs. between-individual coalescence times at the HLA locus. This result, which contrasts with our prediction of greater difference for between-individual comparisons, suggests that caution is warranted in interpreting ROH and IBD with our model for regions experiencing balancing selection.
A third limitation is that for nth cousin mating, we assumed . However, in practical scenarios, if n is large, then randomly mating pairs are related to some degree, and if N is small, then double-first cousin mating is non-negligible. It might therefore be unrealistic to consider n large in our model. A fourth limitation is that we did not examine the full distributions of T and V; further information about these distributions will be important for clarifying the theoretical relationship between ROH and IBD more precisely.
The same approach we took here can also be applied to other consanguinity regimes. Double-first cousins, for example, have twice the number of paths to a recent common ancestor as first cousins. In Equation 7, is the probability that two alleles in a mating pair are inherited from a shared grandparental mating pair. If, instead, we consider double-first cousins, and if
is the fraction of double-first cousin mating pairs, then
is the probability that two alleles in the two individuals of a mating pair are inherited from a shared grandparental mating pair. Substituting
for
in Equation 7 gives a computation for double-first cousin mating.
Our model has implications for empirical studies of ROH and IBD. Studies have used properties of IBD for inference of demographic parameters (e.g., Palamara et al. 2012; Harris and Nielsen 2013; Ralph and Coop 2013), and joint interpretation of ROH and IBD can potentially provide information about consanguinity. One method for distinguishing between the effects of small population size and those of consanguinity is to examine the relationship between the number and length of ROH segments (Ceballos et al. 2018). Our results suggest that examining the number and length of IBD segments could also assist in disentangling these effects, as such features of IBD segments are also affected by consanguinity.
The reduction we observed in coalescence times owing to consanguinity implies that assuming random mating when inferring effective population size may produce underestimates. Under random mating, the mean coalescence time is , whereas we find that, with consanguinity, it is
. Thus, in populations with consanguinity, an apparent estimate of
might actually be an estimate of
. Lastly, our finding that the means of T and V depend on the population size N and the kinship coefficient c suggests that, given the full distributions of these random variables, it may be possible to infer N and c from joint analysis of ROH and IBD sharing.
We have introduced a model for the simultaneous analysis of ROH and IBD, finding that both are driven by the same phenomena of consanguinity and reduction in effective population size. ROH and IBD have often been analyzed separately, with different motivations and techniques. Our results provide a formal connection between ROH and IBD, demonstrating the utility of considering them together in the same analysis.
Acknowledgments
We thank J. Kang for bioinformatics assistance. Support was provided by National Institutes of Health grant R01 HG005855, United States-Israel Binational Science Foundation grant 2017024, and by a National Science Foundation Graduate Research Fellowship.
Footnotes
Communicating editor: R. Nielsen
- Received September 25, 2018.
- Accepted March 22, 2019.
- Copyright © 2019 by the Genetics Society of America
Available freely online through the author-supported open access option.