## Abstract

We report a positive association between marital radius (the distance between mates' birthplaces) and fertility, detected in a large population. Spurious association due to socioeconomic factors is ruled out by a conditional analysis involving income, education, and urbanicity. The results provide strong evidence that the deleterious effects of consanguinity affect an entire human population.

MARITAL radius, *i.e.*, the distance between mates' birthplaces, is a classic population structure parameter (Cavalli-Sforza and Bodmer 1971). According to Malécot's theory on the spatial structure of genetic populations, the consanguinity coefficient of a pair of mates can be well approximated by a decreasing function of the marital radius (Malécot 1959; Kimura and Weiss 1964; Morton 1969, 1977). We report here a positive association between marital radius and fertility, measured in a large population (all women born in Denmark in 1954). In light of Malécot's theory, this association provides strong evidence of the deleterious effect of low genetic mobility affecting an entire human population. Moreover, the possibility of spurious association due to socioeconomic factors is ruled out by an analysis using a range of explanatory variables representing key socioeconomic factors. These data allow us to represent not only the effect of consanguinity but also the effect of local population size. The extent of the material used and the possibility of obtaining individual information on all the individuals in the population make our results a remarkable illustration of a classic theory of population genetics.

The study is based on the Danish Central Personal Register, a population register with almost perfect coverage. We constructed the cohort of all women born in Denmark in 1954 who were alive and living in Denmark in 1969, totaling 42,165 women. This cohort was followed up to the end of 1999, covering >1,200,000 person-years. The number of children each woman had between the ages of 15 and 45 years was determined and is called fertility. This period, termed the observation period, covered essentially the entire reproductive life of these women, since having children before the age of 15 or after the age of 45 is very rare in Denmark. A total of 22,298 women of the cohort had at least one child in the observation period. Note that an analogous study using a cohort formed by all men born in Denmark in 1954, although also of interest, would raise nontrivial issues due to censoring of the number of children each man had, because the fertility of men is not necessarily reduced after the age of 45.

The marital radius associated with each child born in the observation period to a mother in the cohort was estimated by the distance between the centroids of the parishes where the parents were born. The marital radius of each mother in the cohort who had children in the observation period is defined as the mean marital radius associated with her children. Half of the women in the cohort who had children had a marital radius <33 km. Using the links to the parents of all people born in Denmark in 1954, we found that the genetic mobility in the previous generation was lower; indeed, only 2.84% of the parents of children born in 1954 were born in different parishes. Therefore, the studied cohort originated from a population with a relatively low level of genetic mobility, which, however, presented variation in the level of consanguinity due to a partial increase in demographic and genetic mobility.
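The definition above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the text does not say which distance formula was applied to the parish centroids, so the great-circle (haversine) distance, the Earth radius constant, and the function names `haversine_km` and `mean_marital_radius` are all assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_marital_radius(children):
    """Mean distance between parental birthplace centroids over a woman's children.

    children: list of (mother_centroid, father_centroid) pairs, one per child,
    where each centroid is a (lat, lon) tuple in degrees.
    """
    if not children:
        raise ValueError("marital radius is undefined for a woman with no children")
    return sum(haversine_km(m, f) for m, f in children) / len(children)
```

A woman whose children have different fathers contributes one distance per child, which is why the per-woman value is a mean rather than a single distance.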

We present below strong evidence that fertility is positively associated with marital radius in the population in question. The Spearman correlation between the mean marital radius and the number of children was 0.038 (*P*-value for the test of no correlation < 0.0001), indicating a general association between marital radius and fertility. The association was further characterized by fitting a truncated Poisson regression predicting the number of children as a fourth-degree polynomial function of the marital radius. A likelihood-ratio test of the constancy of the expected number of children, based on this regression model, formally confirmed the reported association (*P*-value < 0.0001). Moreover, visual inspection of the graph of the expected number of children as a function of the marital radius (Figures 1 and 2) confirms that fertility and marital radius are positively associated.
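For orientation, one common way to write such a model is given below. Two points are assumptions rather than statements from the text: that the truncation is at zero (plausible because marital radius is defined only for women with at least one child) and that a log link is used. Here $Y_i$ is the number of children of woman $i$, $r_i$ her marital radius, and $\ell$ the maximized log-likelihood.

```latex
\begin{align}
P(Y_i = y \mid Y_i \ge 1)
  &= \frac{e^{-\lambda_i}\,\lambda_i^{\,y}}{y!\,\bigl(1 - e^{-\lambda_i}\bigr)},
  \qquad y = 1, 2, \dots \\
\log \lambda_i
  &= \beta_0 + \beta_1 r_i + \beta_2 r_i^{2} + \beta_3 r_i^{3} + \beta_4 r_i^{4} \\
E[\,Y_i \mid Y_i \ge 1\,]
  &= \frac{\lambda_i}{1 - e^{-\lambda_i}} \\
\Lambda
  &= 2\,\bigl\{\ell(\hat\beta_0, \dots, \hat\beta_4) - \ell(\tilde\beta_0)\bigr\}
  \;\sim\; \chi^2_4
  \quad \text{asymptotically under } H_0\colon \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0
\end{align}
```

Under this formulation, constancy of the expected number of children corresponds to the intercept-only model, so the likelihood-ratio statistic has four degrees of freedom.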

Fertility, measured in a complex population such as this, can be affected by socioeconomic factors. Therefore we also performed a conditional analysis involving three key socioeconomic indicators: education, income, and urbanicity. We linked the data from the Danish administrative social registers (the tax register and the housing register) in Statistics Denmark with the Danish Central Personal Register by means of the civil register number (which is a unique identifier for each person living in Denmark). This allowed us to determine the three socioeconomic indicators above for each woman in the cohort in 1994, when she was 40 years old. Education was defined as the maximum level attained in the nuclear family, classified on a graduated scale with four levels: 0–5 years of study (incomplete basic schooling), 5–10 years of study (complete basic schooling), completed high school or technical school, and higher education (university or advanced technical school). Income was taken as the per capita annual income of the nuclear family registered in 1994, and urbanicity was defined as the type of place of residence of each woman in 1994, classified on an ordered scale with five levels: 1, countryside with low population density; 2, town with <20,000 inhabitants; 3, town with 20,000–39,999 inhabitants; 4, city with 40,000–99,999 inhabitants; and 5, city with >100,000 inhabitants (including the capital and its surroundings).

The Spearman partial correlation between the number of children and marital radius conditional on urbanicity, education, and income is 0.041 (*P*-value < 0.0001), indicating that the raw positive association between fertility and marital radius reported above is not a mere artifact due to spurious association with these socioeconomic factors.
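A Spearman partial correlation can be sketched as follows: rank-transform every variable, compute the ordinary correlation matrix of the ranks, and read the partial correlation off the inverse of that matrix. The text does not describe how the authors computed it, so this route and the helper names (`average_ranks`, `pearson`, `invert`, `partial_spearman`) are assumptions of this illustration.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (intended for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_spearman(columns, i, j):
    """Spearman partial correlation of columns i and j given all other columns."""
    ranked = [average_ranks(c) for c in columns]
    corr = [[pearson(a, b) for b in ranked] for a in ranked]
    prec = invert(corr)  # inverse of the rank correlation matrix
    return -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
```

With only two columns the function reduces to the ordinary Spearman correlation, which makes it easy to compare a raw coefficient with its conditional counterpart using the same code path.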

We take advantage of the theory of graphical models to extract further relevant aspects of the correlation structure and the distribution of information among fertility, marital radius, and the three socioeconomic indicators. The idea is to represent the multivariate structure of these variables by a graph constructed in the following way: each variable is represented by a vertex (point) in the graph. Pairs of variables for which the conditional (or partial) correlation given the other variables is significantly different from zero are joined by an edge (line). The absence of an edge joining two variables indicates that the two variables are not significantly correlated given the other variables. Note that this theory allows both continuous and discrete variables in the same graph. Figure 3 displays the graph representing the data in our study.
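The construction rule can be illustrated with a short sketch. The significance test used here, a Fisher z-transform with a normal approximation, is an assumption chosen for simplicity and is not stated in the text; it is only approximate for rank-based or discrete variables. The function names and the numbers in the usage example are hypothetical.

```python
import math
from itertools import combinations


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for a partial correlation r from n observations,
    conditioning on k other variables (Fisher z, normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def build_graph(variables, partial_corr, n, alpha=0.05):
    """Return the edges whose partial correlation differs significantly from zero.

    partial_corr maps frozenset({a, b}) to the partial correlation of a and b
    given all remaining variables.
    """
    k = len(variables) - 2  # each pair is conditioned on all other variables
    edges = set()
    for a, b in combinations(variables, 2):
        r = partial_corr[frozenset((a, b))]
        if fisher_z_pvalue(r, n, k) < alpha:
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical three-variable example (values are illustrative, not study data).
example_vars = ["a", "b", "c"]
example_partials = {
    frozenset(("a", "b")): 0.05,
    frozenset(("a", "c")): 0.005,
    frozenset(("b", "c")): -0.04,
}
example_edges = build_graph(example_vars, example_partials, n=10000)
```

Pairs left without an edge are those for which the data give no significant evidence of conditional correlation; this is weaker than evidence of conditional independence.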

According to the theory of graphical models (see Whittaker 1990; Lauritzen 1996), if two vertices are connected, the related variables carry information on each other that is not contained in the other variables in the graph. This can be expressed in terms of the Kullback–Leibler information content (see Whittaker 1990, Chap. 4). The graph in Figure 3 indicates that marital radius carries information on fertility that is not contained in the socioeconomic indicators. Analogously, urbanicity carries information on fertility (Spearman partial correlation of −0.023, *P*-value < 0.0001) that is not contained in the other variables. Neither income nor education is directly connected to fertility, indicating that they do not contain information on fertility that is not already contained in urbanicity and marital radius.
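As intuition for how a partial correlation relates to information content, the standard expression for the conditional information between two variables $X$ and $Y$ given the remaining variables $Z$ is shown below. The closed form in the second line holds for jointly Gaussian variables only; the variables in this study are not jointly Gaussian, so it should be read as an illustration rather than as the quantity computed by the authors.

```latex
\begin{align}
\operatorname{Inf}(X \perp\!\!\!\perp Y \mid Z)
  &= E\!\left[\log \frac{f(X, Y \mid Z)}{f(X \mid Z)\, f(Y \mid Z)}\right] \\
  &= -\tfrac{1}{2}\,\log\!\bigl(1 - \rho_{XY\cdot Z}^{2}\bigr)
  \qquad \text{(jointly Gaussian case)}
\end{align}
```

The information is zero exactly when the partial correlation $\rho_{XY\cdot Z}$ is zero, which corresponds to a missing edge in the graph.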

Another general result of the theory of graphical models is the global Markov property (Lauritzen 1996), which states that *if two groups of variables, A and B, are separated by a third group of variables, C, then A and B are conditionally noncorrelated, given C*. Here the expression “group C separates A and B” means that every path connecting an element of A with an element of B necessarily contains an element of C.

An example of separation is that the (group containing the) variable urbanicity separates the group containing fertility and marital radius from the group containing income and education. This means that once we know the urbanicity, the knowledge of income or education (or both simultaneously) does not add more information either to fertility or to marital radius. As we are primarily interested in describing the relation between fertility and marital radius, it suffices to let urbanicity represent all the socioeconomic indicators at play.
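Separation in the sense defined above can be checked mechanically by a breadth-first search that is forbidden from entering the separating group. In the sketch below, the two edges involving fertility are taken from the text; the remaining edges are assumptions chosen to be consistent with the separation statement, since Figure 3 is not reproduced here. The function name `separates` and the variable labels are hypothetical.

```python
from collections import deque


def separates(edges, group_a, group_b, group_c):
    """True if every path from group_a to group_b contains a vertex of group_c."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    blocked = set(group_c)
    targets = set(group_b)
    start = set(group_a) - blocked
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if node in targets:
            return False  # reached group_b without passing through group_c
        for nxt in adjacency.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return True


EDGES = [
    ("fertility", "marital_radius"),  # stated in the text
    ("fertility", "urbanicity"),      # stated in the text
    ("urbanicity", "income"),         # assumed for illustration
    ("urbanicity", "education"),      # assumed for illustration
    ("income", "education"),          # assumed for illustration
]

urbanicity_separates = separates(
    EDGES,
    {"fertility", "marital_radius"},
    {"income", "education"},
    {"urbanicity"},
)
```

Adding an edge between marital radius and urbanicity would leave the result unchanged, because that path still ends at the separating vertex.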

Urbanicity, when considered together with income and education, might be viewed as essentially representing the local population density (although urbanicity might contain hidden information on socioeconomic status). Therefore our results strongly suggest that fertility is essentially determined by the local effective size and consanguinity, here represented by urbanicity and marital radius, respectively. This is compatible with Malécot's theory of spatial structure of genetic populations (Malécot 1959).

In conclusion, we have illustrated a clear effect of classic parameters of population structure operating in a large population. These effects cannot be attributed to spurious associations due to the presence of socioeconomic factors in the causal path. Our results are compatible with a scenario in which an entire human population, subjected to relatively high and widespread consanguinity in previous generations, presented variation in fertility due to a partial reduction of inbreeding caused by changes in the patterns of genetic mobility.

## Acknowledgments

This work was partially done during a stay of R.L. at the National Centre for Register-based Research (NCRR) at the University of Aarhus. The NCRR is supported by the Danish National Research Foundation. The Institute of Molecular Pathology and Immunology was partially supported by Fundação para a Ciência e a Tecnologia and Programa Operacional Ciência e Inovação 2010.

## Footnotes

Communicating editor: D. Charlesworth

- Received February 22, 2007.
- Accepted October 19, 2007.

- Copyright © 2008 by the Genetics Society of America