## Abstract

We show how to incorporate fluctuations of the recombination rate along the chromosome into standard gene-genealogical models for the decorrelation of gene histories. This enables us to determine how small-scale fluctuations (Poissonian hot-spot model) and large-scale variations (Kong *et al.* 2002) of the recombination rate influence this decorrelation. We find that the empirically determined large-scale variations of the recombination rate give rise to a significantly slower decay of correlations compared to the standard, unstructured gene-genealogical model assuming constant recombination rate. A model with long-range recombination-rate variations and with demographic structure (divergent population) is found to be consistent with the empirically observed slow decorrelation of gene histories. Conversely, we show that small-scale recombination-rate fluctuations do not alter the large-scale decorrelation of gene histories.

Genome-wide variation and decorrelation of gene histories are reflected in patterns of linkage disequilibrium, which in turn shape the genetic variation observed on the molecular level. Recently, Reich *et al.* (2002) reported the first genome-wide measurement of correlations of human gene histories. They show that their data are inconsistent with standard gene-genealogical models that allow for nontrivial population structures and demographic schemes but assume a constant recombination rate over the genome. The question is thus: Can fluctuations of the recombination rate along the chromosome explain the slow correlation decay of gene histories?

Empirical results (Jeffreys *et al.* 2001; Kong *et al.* 2002) indicate that the recombination rate is not constant along the chromosome. It was observed (Jeffreys *et al.* 2001) that, at certain locations, an appreciable fraction of recombination events are concentrated in short regions (∼1 kb long and spaced 60–90 kb apart), so-called hot spots. At least locally this implies small-scale (<100 kb) variations of the recombination rate along the chromosome. Genome-wide, long-range fluctuations of the recombination rate for humans have been empirically determined by Kong *et al.* (2002). It is thus necessary to incorporate the effect of fluctuating recombination rates into the standard gene-genealogical model (Griffiths 1981; Hudson 1983; Tavaré 1984; Kaplan and Hudson 1985; Hudson 1990; Nordborg and Tavaré 2002). More generally, it is necessary to determine: On which length scales do recombination-rate fluctuations at a certain scale influence the decorrelation function of gene histories most significantly? It has been argued (Reich *et al.* 2002) that small-scale recombination-rate fluctuations (<100 kb) related to hot spots are an important if not the main feature determining the slow decorrelation of gene histories (assuming that hot spots are to be found genome-wide).

Here we derive an expression for the correlation of gene histories in neutral gene-genealogical models allowing for fluctuating recombination rates. This enables us to explain and quantitatively describe the influence of recombination-rate fluctuations on the correlation of gene histories. We find that large-scale fluctuations of empirically determined recombination rates (Kong *et al.* 2002) give rise to a significantly slower decay of correlations compared to the standard, unstructured, constant population-size gene-genealogical model assuming constant recombination rate. Furthermore, a model with large-scale recombination-rate fluctuations and with demographic structure (divergent population, see Eyre-Walker *et al.* 1998; Reich *et al.* 2002; Teshima and Tajima 2002; and references cited therein) is found to be consistent with the empirically observed decorrelation of gene histories. It is not necessary to invoke hot spots.

In a neutral model, Kaplan and Hudson (1985) (see also Griffiths 1981) have derived a relation between the correlation function ρ_{τx,τy} of the times τ_{*x*} and τ_{*y*} to the most recent common ancestors of two loci *x* and *y* and the amount *C* of recombination between these two loci. The result of Kaplan and Hudson (1985) for the unstructured, constant population-size model is exact for sample size *n* = 2; for large *n* it is a very good approximation. We observe that their result depends on the total amount of recombination between the two loci, but not on the distribution of recombination events between these loci. Moreover, this is still true when population structure is taken into account. The expected correlation ρ^{exp}_{τx,τy} is obtained by averaging with a sliding window of length |*y* − *x*| along the chromosome. Thus, if *p*_{*X*}(*C*) is the genome-wide distribution of recombination intensity *C* in bins of length *X* = |*y* − *x*|, the expected correlation is

$$\rho^{\mathrm{exp}}_{\tau_x,\tau_y} = \int_0^{\infty} \mathrm{d}C \; p_X(C)\, \rho_{\tau_x,\tau_y}(C). \tag{1}$$

It also follows that small-scale fluctuations of the recombination rate on length scales much smaller than *X* are irrelevant to the decay of correlations on scales of the order of *X*. In particular, fluctuations due to hot spots at small scales cannot change the decorrelation of gene histories at much larger scales.
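The averaging over *p*_{*X*}(*C*) described above can be sketched numerically. The example below is an illustration only: it assumes the commonly quoted *n* = 2 form of the Kaplan and Hudson (1985) correlation, ρ(*C*) = (*C* + 18)/(*C*² + 13*C* + 18) with *C* the scaled recombination rate, and represents *p*_{*X*}(*C*) by a hypothetical list of values with weights. The function names are not taken from the paper.

```python
def kaplan_hudson_rho(C):
    """Correlation of the two coalescence times for sample size n = 2.

    Assumed form (constant-size, unstructured model):
    rho(C) = (C + 18) / (C^2 + 13 C + 18), with C the scaled
    recombination rate between the two loci.
    """
    return (C + 18.0) / (C * C + 13.0 * C + 18.0)


def expected_rho(c_values, weights=None, rho=kaplan_hudson_rho):
    """Average rho(C) over a discrete approximation of p_X(C)."""
    if weights is None:
        weights = [1.0] * len(c_values)
    total = sum(weights)
    return sum(w * rho(c) for c, w in zip(c_values, weights)) / total


# Two hypothetical genomes with the same mean recombination per window (C = 10):
# one without window-to-window variation, one where half the windows
# do not recombine at all and the other half recombine twice as much.
uniform = expected_rho([10.0])
variable = expected_rho([0.0, 20.0])
```

In this toy comparison `variable` exceeds `uniform`: windows with little recombination contribute correlations close to 1, so variation of *C* between windows raises the expected correlation at a fixed mean.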

Using (1) we have computed ρ^{exp}_{τx,τy} in four models (illustrated in Figure 1): assuming small-scale variation of the recombination rate (model I); incorporating, in addition, large-scale variation (model II); estimating *p*_{*X*}(*C*) from the empirical data of Kong *et al.* (2002) (model III); and, in addition, taking into account demographic population structure (model IV).

Model I is the Poissonian hot-spot model of Reich *et al.* (2002). Recombination events occur at hot spots (of zero width) with rate *R*_{1} = *R*/λ. Nearest-neighbor distances between hot spots are exponentially distributed with expected value λ^{−1} (*cf.* Figure 1a). Using (1), the expected correlation within this model can be derived explicitly, giving an expression for ρ^{exp}_{τx,τy} (Equation 2) that is exact for sample size *n* = 2. The expected correlation according to (2) is shown in Figure 2a. In agreement with Reich *et al.* (2002) we find that on distances of the order of the hot-spot spacing, correlations are larger than those in a constant recombination-rate model. However, no choice of parameters could explain the empirically observed decorrelation function (*cf.* data in Reich *et al.* 2002, Figure 6a, reproduced in Figure 2b here). In particular, no significant increase in correlations on length scales ≫λ^{−1} is observed, as expected from Equation 1. Reich *et al.* (2002) have fitted an “arbitrary mixed model” to their empirical data. To obtain these results it is necessary to introduce large-scale variations of the recombination rate, on a scale *L ≫ X* ∼ 1 Mb.
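The behavior of model I can be explored numerically by summing over the Poisson-distributed number of hot spots in a window. This is a sketch under stated assumptions, not the explicit expression referred to as Equation 2: it assumes the *n* = 2 form ρ(*C*) = (*C* + 18)/(*C*² + 13*C* + 18), takes *R* to be the average scaled recombination rate per unit length, and uses illustrative function names.

```python
import math


def kaplan_hudson_rho(C):
    # Assumed n = 2 constant-size result; C is the scaled recombination rate.
    return (C + 18.0) / (C * C + 13.0 * C + 18.0)


def poisson_pmf(k, mean):
    # Evaluated in log space so that large means do not overflow.
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def rho_exp_hotspots(X, lam, R):
    """Expected correlation at distance X for zero-width Poisson hot spots.

    lam : hot-spot density (mean spacing 1 / lam)
    R   : average scaled recombination rate per unit length (assumption)
    Each hot spot carries R1 = R / lam, so a window containing k hot
    spots has total recombination C = k * R1.
    """
    mean = lam * X
    R1 = R / lam
    # Truncate the Poisson sum far out in the upper tail.
    k_max = int(mean + 12.0 * math.sqrt(mean) + 30)
    return sum(poisson_pmf(k, mean) * kaplan_hudson_rho(k * R1)
               for k in range(k_max + 1))


# Hypothetical parameters: same average rate, sparse versus dense hot spots.
sparse = rho_exp_hotspots(X=1.0, lam=1.0, R=10.0)
dense = rho_exp_hotspots(X=1.0, lam=1000.0, R=10.0)
constant = kaplan_hudson_rho(10.0)
```

With these toy values, `sparse` (window length comparable to the hot-spot spacing) lies well above `constant`, while `dense` (window much longer than the spacing) is close to it, in line with the statement that hot spots do not raise correlations on scales ≫λ^{−1}.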

One possibility (model II) is to assume that hot spots occur in clusters, with long (≫1 Mb) regions of low recombination intensity between them: Hot spots occur in clusters of size *pL*, separated by empty regions of length (1 − *p*)*L*, where 0 ≤ *p* ≤ 1 and *L* is a typical length scale (of the order of several megabases). Within each cluster, the number of hot spots is Poisson distributed (*cf.* Figure 1b). This model provides a better fit to the empirical data (Figure 2b) than model I, indicating that large-scale fluctuations of the recombination rate are important. We note that assuming *p*_{*X*}(*C*) = (1 − *p*) · δ(*C* − *R*_{0}*X*) + *p* · δ(*C* − *R*_{1}*X*) can produce an equally good fit to the data (*e.g.*, for *p* = 0.55, *R*_{0} = 1.2 cM/Mb and *R*_{1} = 0.02 cM/Mb, not shown). In this model the recombination rate is constant on large scales (≫1 Mb) and alternates between two values *R*_{0} and *R*_{1}.
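As a worked step, inserting the two-valued distribution into the average over *p*_{*X*}(*C*) of Equation 1 collapses the integral to two terms. The following follows directly from the delta functions; the only assumption is that *R*_{0}*X* and *R*_{1}*X* are expressed in the same scaled units as the argument *C* of ρ_{τx,τy}:

```latex
\rho^{\mathrm{exp}}_{\tau_x,\tau_y}(X)
  = \int \mathrm{d}C \,\bigl[(1-p)\,\delta(C - R_0 X) + p\,\delta(C - R_1 X)\bigr]\,
    \rho_{\tau_x,\tau_y}(C)
  = (1-p)\,\rho_{\tau_x,\tau_y}(R_0 X) + p\,\rho_{\tau_x,\tau_y}(R_1 X).
```

Because ρ_{τx,τy}(*R*_{1}*X*) decays much more slowly in *X* than ρ_{τx,τy}(*R*_{0}*X*) when *R*_{1} ≪ *R*_{0}, the second term dominates at large distances.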

We have estimated the genome-wide distribution *p*_{*X*}(*C*) from empirical recombination data (model III) by sampling *C* = *g*(*x* + *X*) − *g*(*x*), where *g*(*x*) is the cumulative genetic distance, at uniformly distributed positions *x* in the genome. We have obtained *g*(*x*) from the pedigree data of Kong *et al.* (2002) as follows: We have taken columns 1 (physical distance) and 3 (sex-averaged genetic distance) in the web supplement NG917-S13 to Kong *et al.* (2002), ignored entries labeled “NA,” and shifted the origin of both physical and genetic distances so that *g*(0) = 0. The resulting *p*_{*X*}(*C*) for chromosome 5 is shown in Figure 1c, and the gene-history correlation results for the genome-wide distribution of *p*_{*X*}(*C*) are shown in Figure 2b. We find that the empirically determined large-scale fluctuations of the recombination rate give rise to significantly enhanced correlations (compared to the standard model assuming constant recombination rate), especially for large distances.
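The sampling procedure for *p*_{*X*}(*C*) can be sketched as follows. This is a minimal illustration, not the code used for the paper: it assumes that *g*(*x*) is interpolated linearly between markers, that marker positions are strictly increasing, and that windows are kept entirely on the map. The toy map and all function names are hypothetical.

```python
import bisect
import random


def interpolate_map(positions, genetic, x):
    """Piecewise-linear g(x) from sorted marker positions and cumulative distances."""
    i = bisect.bisect_right(positions, x)
    if i == 0:
        return genetic[0]
    if i == len(positions):
        return genetic[-1]
    x0, x1 = positions[i - 1], positions[i]
    g0, g1 = genetic[i - 1], genetic[i]
    return g0 + (g1 - g0) * (x - x0) / (x1 - x0)


def sample_window_recombination(positions, genetic, X, n_samples, seed=0):
    """Draw C = g(x + X) - g(x) at uniformly distributed window starts x."""
    rng = random.Random(seed)
    # Shift both origins so that g(0) = 0, as described in the text.
    pos = [p - positions[0] for p in positions]
    gen = [g - genetic[0] for g in genetic]
    x_max = pos[-1] - X  # keep the whole window on the map (assumption)
    if x_max < 0:
        raise ValueError("window longer than the map")
    samples = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, x_max)
        samples.append(interpolate_map(pos, gen, x + X)
                       - interpolate_map(pos, gen, x))
    return samples


# Hypothetical toy map (Mb, cM): a low-recombination stretch followed by a
# high-recombination stretch. Real input would be the two columns of the
# supplement with "NA" entries removed.
toy_positions = [10.0, 12.0, 14.0, 16.0, 18.0]
toy_genetic = [5.0, 5.2, 5.4, 8.4, 11.4]
toy_samples = sample_window_recombination(toy_positions, toy_genetic,
                                          X=1.0, n_samples=1000)
```

A histogram of `toy_samples` approximates *p*_{*X*}(*C*) for the toy map. The samples are in centimorgans; converting them to the scaled recombination rate that enters ρ_{τx,τy} requires a population-size parameter that is outside this sketch.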

It is expected that population structure can increase the correlations of gene histories at large distances. We have considered the effect of large-scale recombination-rate fluctuations within a well-established model of demographic structure (model IV): the population was of constant size *N* until τ_{0} generations ago, when it split into two fractions of size γ*N* and (1 − γ)*N*. The two subpopulations remained separate until a recent merging (see, for instance, Eyre-Walker *et al.* 1998; Reich *et al.* 2002; Teshima and Tajima 2002; and references therein). We have calculated ρ_{τx,τy} explicitly in this model (Eriksson and Mehlig 2004). Without recombination-rate fluctuations, demographic models do not describe the empirically observed correlation of gene histories (Reich *et al.* 2002). We have determined the effect of large-scale recombination-rate fluctuations (Kong *et al.* 2002) on the correlation of gene histories in this model, using Equation 1 together with the appropriate expression for ρ_{τx,τy} derived in Eriksson and Mehlig (2004). The parameters of the model (τ_{0} and *N*) were chosen to be consistent with the empirically estimated time to the most recent common ancestor and its coefficient of variation (Reich *et al.* 2002). The parameter γ was set to 0.3. The resulting correlation function matches the empirical data reasonably well. Decreasing γ gives rise to decreased correlations (γ = 0 corresponds to the model with fluctuating recombination rates but without demographic structure).

In summary we have determined the influence of recombination-rate fluctuations on the decorrelation of gene histories. We find that small-scale fluctuations are irrelevant to long-range correlation decay. Empirically determined large-scale fluctuations of the recombination rate, however, are found to significantly increase the correlations. Within a model with demographic structure, large-scale fluctuations of empirically determined recombination rates significantly contribute to the empirically observed slow decay of correlations.

We conclude by discussing the implications of our results for the study of genome-wide variability as reflected in single-nucleotide polymorphism (SNP) statistics. Equation 1 determines the effect of recombination-rate fluctuations on ρ^{exp}_{τx,τy}. This quantity, in turn, determines the genome-wide statistics of SNP locations: the variance of the number of SNPs in bins of length *l* along the chromosomes is determined by the integral of ρ^{exp}_{τx,τy} over *x* and *y* from 0 to *l*, *i.e.*, by how fast the correlations decay on scales of length *l* (Hudson 1990). The International SNP Map Working Group (2001) has empirically determined the variance of the number of SNPs in short reads (of average length 500 bp); the result was found to be consistent with the standard, unstructured gene-genealogical model assuming a constant recombination rate. This is consistent with our results (Figure 2b): On scales of the order of 500 bp, the recombination-rate fluctuations have little effect on the correlation function. We expect, however, that to understand the statistics of SNP counts in longer bins, it will be necessary to account for long-range recombination-rate fluctuations.
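The integral of the correlation function over *x* and *y* from 0 to *l* can be evaluated numerically once ρ^{exp}_{τx,τy} is known as a function of distance. The sketch below computes only this integral, without the prefactors that turn it into a variance. It uses the identity ∫∫ρ(|*x* − *y*|) d*x* d*y* = 2∫₀^*l* (*l* − *s*) ρ(*s*) d*s*, and as a stand-in for ρ^{exp}_{τx,τy} it assumes a constant-rate model with the *n* = 2 form ρ(*C*) = (*C* + 18)/(*C*² + 13*C* + 18). Names and parameter values are illustrative.

```python
def kaplan_hudson_rho(C):
    # Assumed n = 2 constant-size result; C is the scaled recombination rate.
    return (C + 18.0) / (C * C + 13.0 * C + 18.0)


def correlation_integral(rho_of_distance, l, n_steps=2000):
    """Integral of rho(|x - y|) over 0 <= x, y <= l.

    Reduced to the single integral 2 * int_0^l (l - s) rho(s) ds and
    evaluated with the midpoint rule.
    """
    h = l / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h
        total += (l - s) * rho_of_distance(s)
    return 2.0 * h * total


def constant_rate_rho(s, scaled_rate=1.0):
    # Hypothetical constant-rate model: C grows linearly with distance s.
    return kaplan_hudson_rho(scaled_rate * s)


# Integral normalized by l^2, for a short and a long bin (toy units).
short_bin = correlation_integral(constant_rate_rho, 0.01) / 0.01 ** 2
long_bin = correlation_integral(constant_rate_rho, 100.0) / 100.0 ** 2
```

Here `short_bin` stays close to 1 because the correlation has barely decayed within the bin, whereas `long_bin` is much smaller; replacing `constant_rate_rho` with an expected correlation that decays more slowly would raise the value for long bins.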

## Footnotes

Communicating editor: S. Tavaré

- Received May 27, 2003.
- Accepted November 2, 2004.

- Genetics Society of America