Genetics, Vol. 153, 1973-1988, December 1999, Copyright © 1999

Effect of DNA Sequence Divergence on Homologous Recombination as Analyzed by a Random-Walk Model

Youhei Fujitani (a) and Ichizo Kobayashi (b)
a Department of Applied Physics and Physico-Informatics, Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan
b Department of Molecular Biology, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan

Corresponding author: Youhei Fujitani, Department of Applied Physics and Physico-Informatics, Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan. E-mail: youhei@appi.keio.ac.jp

Communicating editor: N. TAKAHATA


*  ABSTRACT
*TOP
*ABSTRACT
*PREVIOUS MODELS
*THE RANDOM-WALK MODEL
*THEORY FOR THE VERY...
*THEORY FOR MMR-DEFECTIVE STRAINS
*FOR LONGER SUBSTRATES
*FURTHER DISCUSSION
*APPENDIX A
*APPENDIX B
*APPENDIX C
*LITERATURE CITED

A point connecting a pair of homologous regions of DNA duplexes moves along the homology in a reaction intermediate of the homologous recombination. Formulating this movement as a random walk, we were previously successful at explaining the dependence of the recombination frequency on the homology length. Recently, the dependence of the recombination frequency on the DNA sequence divergence in the homologous region was investigated experimentally; if the methyl-directed mismatch repair (MMR) system is active, the logarithm of the recombination frequency decreases very rapidly with an increase of the divergence in a low-divergence regime. Beyond this regime, the logarithm decreases slowly and linearly with the divergence. This "very rapid drop-off" is not observed when the MMR system is defective. In this article, we show that our random-walk model can explain these data in a straightforward way. When a connecting point encounters a diverged base pair, it is assumed to be destroyed with a probability that depends on the level of MMR activity.


MANY experimental studies have analyzed the relationship between the frequency of homologous recombination and the homology length, which ranges from some hundreds of base pairs up to ~20 kbp (SINGER et al. 1982; RUBNITZ and SUBRAMANI 1984; SHEN and HUANG 1986; AHN et al. 1988; DENG and CAPECCHI 1992; SUGAWARA and HABER 1992; JINKS-ROBERTSON et al. 1993). Bacterial systems were investigated first, and the data were explained in terms of the MEPS (minimal efficient processing segment) theory (SINGER et al. 1982; SHEN and HUANG 1986). A MEPS is a segment of the threshold length below which the reaction becomes inefficient, probably because a protein-DNA interaction requires a certain length to occur. The frequency is assumed to be proportional to the number of ways of obtaining a MEPS ($M_{\mathrm{eps}}$ bp) in the homologous region ($N$ bp; Figure 1) and is given by

$$c\,(N - M_{\mathrm{eps}} + 1), \tag{1}$$

where $c$ is the constant of proportionality. The linear function thus obtained, however, was later found to disagree with the nonlinear dependence of the frequency on the homology length observed in a mammalian gene targeting system (DENG and CAPECCHI 1992).



Figure 1. The number of ways of obtaining a MEPS. The top long line represents a homologous region with N bp. The subsequent shorter lines indicate some of the possible positions of a MEPS, of which the length is Meps bp; the uppermost shorter line indicates a case where a MEPS is located at the left end of the homologous region. Here, we suppose Meps = 6 bp although it is thought to be much longer actually. The total number of the positions is N - Meps + 1, which is the number of ways of obtaining a MEPS in the homologous region.

In contrast with the MEPS theory, our "random-walk model" was shown to explain the data from both systems (FUJITANI and KOBAYASHI 1995; FUJITANI et al. 1995). In our previous articles, we formulated the in vivo movement of a point connecting a pair of homologous regions of DNA duplexes in the reaction intermediate as a random walk, on the basis of the in vitro observations of THOMPSON et al. (1976) and PANYUTIN and HSIEH (1993); we found that a shift from the third-power dependence to the linear dependence of the recombination frequency on the homology length takes place as the homology length increases. The former dependence agrees well with the data from the mammalian gene targeting system.

The recombination frequency has been found to decrease as sequence differences are introduced into the homologous region; its logarithm appears to decrease linearly with an increase of the divergence (the ratio of the number of diverged base pairs to the number of all base pairs in a region of homology between two DNA duplexes) for very long homologous regions ($10^6$–$10^7$ bp) in bacterial systems (ROBERTS and COHAN 1993; ZAWADZKI et al. 1995; VULIC et al. 1997; MAJEWSKI and COHAN 1998). VULIC et al. (1997) studied the effects of the methyl-directed mismatch repair (MMR) system and the SOS system on the reduction; the absolute value of the slope becomes larger as the MMR activity increases, while the intercept goes up as the SOS activity increases when the MMR system is active. DATTA et al. (1997) used a short homologous region of 350 bp in a yeast mitotic recombination system and found that the logarithm drops rapidly in a regime of very low divergence and drops slowly and linearly beyond this regime in the wild-type (Mmr+) strains. In the MMR-defective (Mmr-) strains, the logarithm was shown to drop without the "very rapid drop-off" as the divergence increases from zero.

As described in the next section, these effects of the MMR system have been explained in terms of the MEPS theory, which has already failed to explain the nonlinear dependence of the recombination frequency on the homology length. Here we present an alternative explanation in terms of the random-walk model after a brief review of the original version of the random-walk model. Symbols we use frequently are listed in Table 1.


 
Table 1. Glossary


*  PREVIOUS MODELS

Assuming that a base pair at a particular position in a homologous region will be diverged with a probability equal to the divergence ($D$, $0 \le D \le 1$), one can calculate the average recombination frequency to compare it with experimental data. We express this average over positions of diverged base pairs by putting the recombination frequency, denoted by $\Pi$, between the angle brackets, $\langle$ and $\rangle$, in the following equations. The recombination frequency at $D = 0$ need not be averaged.

In the MEPS theory, the initial enzymes are supposed to work only when they bind to a MEPS devoid of diverged base pairs; the recombination frequency is proportional to the number of ways of picking a MEPS devoid of diverged base pairs from the homologous region ($N$ bp in total; VULIC et al. 1997; MAJEWSKI and COHAN 1998). The probability that a segment of $M$ bp has no diverged base pairs is $(1 - D)^M$, where it does not matter whether the segment is part of a longer divergence-free region. Thus, the number of ways is $(N - M_{\mathrm{eps}} + 1)(1 - D)^{M_{\mathrm{eps}}}$ on average, and the averaged recombination frequency is a function of $D$ and $N$,

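$$\langle \Pi^{(M)}(D, N) \rangle = c\,(N - M_{\mathrm{eps}} + 1)(1 - D)^{M_{\mathrm{eps}}} = \Pi^{(M)}(D = 0, N)\,(1 - D)^{M_{\mathrm{eps}}}, \tag{2}$$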

where the superscript $(M)$ indicates a result in the framework of the MEPS theory, $c$ is the constant used in Equation 1, and $\Pi^{(M)}(D = 0, N)$ is the recombination frequency at $D = 0$ given by Equation 1. When $D \ll 1$, because $e^{-D} \approx 1 - D$, we have

$$\langle \Pi^{(M)}(D, N) \rangle \approx \Pi^{(M)}(D = 0, N)\, e^{-M_{\mathrm{eps}} D}. \tag{3}$$
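As a quick numerical check of the MEPS expressions above, the following Python sketch counts MEPS positions and compares $(1 - D)^{M_{\mathrm{eps}}}$ with its small-$D$ form $e^{-M_{\mathrm{eps}} D}$. The function names and the example values ($N = 350$, $M_{\mathrm{eps}} = 23$, $c = 1$) are chosen here for illustration only; they are assumptions of this sketch, not part of the MEPS theory as published.

```python
import math


def meps_positions(n_bp: int, meps_bp: int) -> int:
    """Number of positions at which a MEPS of meps_bp fits into n_bp of homology."""
    return max(n_bp - meps_bp + 1, 0)


def meps_frequency(divergence: float, n_bp: int, meps_bp: int, c: float = 1.0) -> float:
    """Averaged frequency c (N - M + 1) (1 - D)^M, as described for Equation 2."""
    return c * meps_positions(n_bp, meps_bp) * (1.0 - divergence) ** meps_bp


def meps_frequency_small_d(divergence: float, n_bp: int, meps_bp: int, c: float = 1.0) -> float:
    """Small-divergence form in which (1 - D)^M is replaced by exp(-M D)."""
    return c * meps_positions(n_bp, meps_bp) * math.exp(-meps_bp * divergence)


# Illustrative values only (assumed): N = 350 bp, M_eps = 23 bp, c = 1.
for d in (0.0, 0.01, 0.05, 0.2):
    exact = meps_frequency(d, 350, 23)
    approx = meps_frequency_small_d(d, 350, 23)
    print(f"D = {d:.2f}  exact = {exact:.4g}  exp-approx = {approx:.4g}")
```

The two forms agree closely at small divergence and separate as $D$ grows, which is why the exponential form is stated only for $D \ll 1$.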

The reaction, thus initiated, may be aborted by the MMR system. The MMR system would attack a mismatch, which is produced at a diverged base pair as the heteroduplex elongates.

VULIC et al. (1997) thought that a divergence-free segment would be required not only for the initiation but also for escape from the attack of the MMR system. Thus, $M_{\mathrm{eps}}$ should be modified to include the length required for the latter; they rewrote Equation 3 as

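$$\langle \Pi^{(M)}(D, N) \rangle \approx \Pi^{(M)}(D = 0, N)\, e^{-M^{\dagger}_{\mathrm{eps}} D}, \tag{4}$$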

where the modified MEPS length, $M^{\dagger}_{\mathrm{eps}}$, depends on the level of MMR activity. Equation 4 implies that the logarithm is a linear function of $D$ with the slope dependent on the level of MMR activity. As shown later, VULIC et al. (1997) could not fit Equation 4 to their data set for the strains overproducing MMR proteins over the whole divergence range examined; the absolute value of the observed slope appears to become smaller as $D$ increases, as in the very rapid drop-off. They supposed that this happens because the MMR machinery is saturated by many mismatches, but they did not formulate this saturation.

DATTA et al. (1997) assumed that, if the heteroduplex region has elongated less than $\beta$ bp before it encounters the first diverged base pair, the MMR system is always triggered by the resultant mismatch; otherwise, they assumed that the MMR system is not triggered by the mismatch with probability $R_0$. Because the probability with which the heteroduplex elongates by $\beta$ bp or more without producing mismatches is $(1 - D)^{\beta} \approx e^{-\beta D}$, the probability with which the MMR system is triggered is given by $1 - R_0 e^{-\beta D}$. They introduced a factor $f$ denoting the probability with which the reaction is aborted after the MMR system is triggered and expressed the averaged recombination frequency as a function of $D$, $N$, and $f$:

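$$\langle \Pi^{(M)}(D, N, f) \rangle = \Pi^{(M)}(D = 0, N, f = 0)\, e^{-M_{\mathrm{eps}} D}\,\{1 - f\,(1 - R_0 e^{-\beta D})\}. \tag{5}$$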

They fitted Equation 5 to their experimental data ($N = 350$) for the wild-type strains showing the very rapid drop-off to obtain $f = 0.97$. When $f = 0$, Equation 5 is equivalent to Equation 3, which can explain the data for the Mmr- strains showing no very rapid drop-off. Equation 5 gives different values to the recombination frequency between identical substrates in the wild-type strains, $\langle \Pi^{(M)}(D = 0, N = 350, f = 0.97) \rangle$, and to that in the Mmr- strains, $\langle \Pi^{(M)}(D = 0, N = 350, f = 0) \rangle$, which agrees with their data. DATTA et al. (1997) suggested that this difference is observed probably because, even between identical substrates, the MMR system is triggered by either intrastrand secondary structure or unpaired regions caused by the branch migration passing into the flanking nonhomologous region.
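The behavior of this five-parameter expression can be explored with a short Python sketch. It assumes that Equation 5 has the product form suggested by the verbal description above: an initiation factor $e^{-M_{\mathrm{eps}} D}$ multiplied by the probability $1 - f(1 - R_0 e^{-\beta D})$ of escaping MMR-triggered abortion. That form is an assumption of this sketch. The default parameter values are those quoted in the caption of Figure 5; the function name is hypothetical.

```python
import math


def datta_frequency(divergence: float, f: float, pi0: float = 5.1e-6,
                    meps: float = 23.0, beta: float = 610.0, r0: float = 0.18) -> float:
    """Averaged frequency under the assumed product form of Equation 5."""
    initiation = math.exp(-meps * divergence)            # Equation 3 factor
    triggered = 1.0 - r0 * math.exp(-beta * divergence)  # MMR is triggered
    return pi0 * initiation * (1.0 - f * triggered)      # reaction not aborted


# Wild type (f = 0.97) versus Mmr- (f = 0) over a few divergence values.
for d in (0.0, 0.003, 0.01, 0.05, 0.2):
    wild = datta_frequency(d, f=0.97)
    mmr_minus = datta_frequency(d, f=0.0)
    print(f"D = {d:<5}  ln(wild) = {math.log(wild):7.3f}  ln(Mmr-) = {math.log(mmr_minus):7.3f}")
```

Under this assumed form, setting $f = 0$ removes the bracketed factor and leaves the exponential of Equation 3, and the two strains differ already at $D = 0$ by the factor $1 - f(1 - R_0)$.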

We feel that DATTA et al. (1997) introduced many fitting parameters without discussing the reaction mechanism in enough detail, although they fitted Equation 5 to their data well. They did not convincingly explain why the probability of triggering the MMR system is uniformly $1 - R_0$ if the heteroduplex elongates longer than a threshold length without producing mismatches and is otherwise uniformly unity.


*  THE RANDOM-WALK MODEL

Here we review the original version of the random-walk model (FUJITANI et al. 1995; Figure 2 and Figure 3), which is appropriate for an identical region. A connecting point is assumed to "walk randomly" over sites in the homologous region of $n$ bp. Assuming for simplicity that the step size of the random walk is exactly the interval between neighboring base pairs along a DNA duplex, we have $n$ ($\gg 1$) sites in the region. We assume that a connecting point is produced at the initial time ($t = 0$) with probability $\alpha$ per site and neglect cases where more than one connecting point is produced in a relatively short identical region ($n\alpha \ll 1$). A "randomly walking" connecting point is assumed to be processed somewhere within the region. Here, "being processed" includes "being resolved to a recombinant" and "being destroyed" (i.e., "disappearing without yielding a recombinant"). We write $k$ ($0 < k \le 1$) for the conditional probability of resolution given that a connecting point is processed. A connecting point is assumed to be destroyed whenever it encounters either end of the homology. This is the condition of a totally absorbing boundary (VAN KAMPEN 1981). Hence, we have the master equation [Equations 1–4 of FUJITANI et al. 1995],

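$$\begin{aligned}
\frac{d}{dt}p_j(t) &= g\,\{p_{j-1}(t) + p_{j+1}(t) - (2 + h)\,p_j(t)\} \quad (1 \le j \le n),\\
p_0(t) &= p_{n+1}(t) = 0,\\
\frac{d}{dt}p_*(t) &= ghk \sum_{j=1}^{n} p_j(t),
\end{aligned} \tag{6}$$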

where $p_j(t)$ denotes the probability distribution of a connecting point at a (real) site $j$ ($1 \le j \le n$) at time $t$, and $p_*(t)$ is this probability distribution at an imaginary site $*$ (Figure 3A). This site represents the state at which a homologous recombinant has been formed. The parameter $g$ is the transition probability per unit time (or transition rate) of the random walk; $h$ is the ratio of the probability with which a random walker (a connecting point) is processed per site per unit time to $g$. The assumption adopted here that $g$, $h$, and $k$ are site-independent is appropriate when the homologous region is devoid of sequence divergence. We assume that the recombination frequency is measured after a long enough time in the experiments.



Figure 2. Likely steps of homologous recombination. (A) A region of homology between two DNA duplexes. (B) A recombinogenic event in one of them causes their homologous pairing. (C) The homologous regions are connected at a point. A Holliday junction is one example of the connecting point, but the molecular details need not be specified. (D) The connecting point of the reaction intermediate moves along the homology. During this movement, it may be somehow destroyed, or (E) it may be resolved to a recombinant. (F) When the connecting point encounters the nonhomology, the intermediate is somehow destroyed.



Figure 3. The random-walk model for an identical region. (A) A connecting point "walks randomly" over n sites with the transition probability per unit time (or transition rate) g. Sites x and * are imaginary, representing the state at which a connecting point has been destroyed at one of the real sites from 1 to n and the state at which it has been resolved to a recombinant, respectively. Ratios h and k are defined in the text and Table 1; ghk gives the transition probability with which a random walker is resolved to a recombinant at each site per unit time, and gh(1 - k) gives the transition probability with which it is destroyed, i.e., disappears without yielding a recombinant, at each site per unit time. Each of sites 0 and n + 1 is imaginary, representing the state at which a connecting point is destroyed by encounter with an end of the homology. (B) The potential of the intermediate would depend on the position of the connecting point. The potential, supposed in the original version of the random-walk model, is schematically plotted against the position. Each of the sites, over which the random walk occurs, is located at the valley bottom. For simplicity, "being processed" is not represented.

Suppose first that a connecting point is produced at a real site $m$, and the initial condition is given by $p_j(0) = 0$ for $j \ne m$ and $p_m(0) = 1$. The solution $p_j(t)$ of Equation 6 depends on $m$ and the number of the sites $n$; we use a superscript $(m;n)$ to express this dependence. As derived in Appendix A, the recombination frequency after a long enough time is given by

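$$\begin{aligned}
p_*^{(m;n)}(\infty) &= ghk \int_0^{\infty} \sum_{j=1}^{n} p_j^{(m;n)}(t)\,dt && (7)\\
&= k\left\{1 - \frac{\cosh[\phi\,(n + 1 - 2m)/2]}{\cosh[\phi\,(n + 1)/2]}\right\}, && (8)
\end{aligned}$$

where

$$\phi \equiv 2\sinh^{-1}\!\left(\frac{\sqrt{h}}{2}\right), \quad \text{i.e.,} \quad \cosh\phi = 1 + \frac{h}{2}. \tag{9}$$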

Here, sinh and cosh, as well as tanh and coth appearing below, are the hyperbolic functions. Because a connecting point is actually produced with probability $\alpha$ per site, the recombination frequency is given by

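$$\Pi(n) = \alpha \sum_{m=1}^{n} p_*^{(m;n)}(\infty) = k\alpha\left\{n + 1 - \coth\!\left(\frac{\phi}{2}\right)\tanh\!\left[\frac{(n + 1)\,\phi}{2}\right]\right\}. \tag{10}$$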

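When $h \ll 1$, we have

$$\Pi(n) \approx k\alpha\left\{n - \frac{2}{\sqrt{h}}\tanh\!\left(\frac{\sqrt{h}\,n}{2}\right)\right\} \tag{11}$$

$$\approx \begin{cases} k\alpha\,h\,n^3/12 & (n \ll 2/\sqrt{h}),\\ k\alpha\,(n - 2/\sqrt{h}) & (n \gg 2/\sqrt{h}), \end{cases} \tag{12}$$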

as described in Appendix A and in FUJITANI et al. (1995). Thus, the transition from the third-power dependence to the linear dependence happens as the length ($n$) increases above $2/\sqrt{h}$. The expression in the lower line of Equation 12 apparently coincides with the linear function given by Equation 1. One can see that the parameter $h$, named "relative probability of intermediate processing," is a key parameter here, instead of the MEPS length in the MEPS theory. As shown by FUJITANI et al. (1995), the third-power dependence agrees well with the data from a mammalian gene targeting system, where the dependence was originally described as exponential (DENG and CAPECCHI 1992).
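The two regimes can be checked numerically with a small Python sketch. It evaluates $\Pi(n)/(k\alpha)$, the sum over starting sites of the probability that a walker is processed before reaching either end, in two ways: from a closed form written with $\cosh\phi = 1 + h/2$, and by solving the stationary equations of the embedded jump chain directly. Both the closed form as written here and the function names are assumptions of this sketch (the closed form follows from the stationary equations but is not necessarily the form printed in the original equations).

```python
import math


def resolution_sum(n: int, h: float) -> float:
    """Closed form for Pi(n) / (k * alpha) with absorbing ends at sites 0 and n + 1."""
    half_phi = math.asinh(math.sqrt(h) / 2.0)  # phi / 2, where cosh(phi) = 1 + h / 2
    return n + 1 - math.tanh((n + 1) * half_phi) / math.tanh(half_phi)


def resolution_sum_direct(n: int, h: float) -> float:
    """Same quantity from (2 + h) u_j - u_{j-1} - u_{j+1} = h, u_0 = u_{n+1} = 0.

    u_j is the probability that a walker started at site j is processed
    before it reaches either end; the tridiagonal system is solved by
    forward elimination and back substitution.
    """
    diag = [2.0 + h] * n
    rhs = [h] * n
    for j in range(1, n):
        diag[j] -= 1.0 / diag[j - 1]
        rhs[j] += rhs[j - 1] / diag[j - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for j in range(n - 2, -1, -1):
        u[j] = (rhs[j] + u[j + 1]) / diag[j]
    return sum(u)


h = 1.0e-4  # illustrative value (assumed); the crossover length 2 / sqrt(h) is 200
for n in (10, 50, 200, 1000, 5000):
    cubic = h * n ** 3 / 12.0
    linear = n - 2.0 / math.sqrt(h)
    print(f"n = {n:5d}  closed = {resolution_sum(n, h):10.4f}  "
          f"direct = {resolution_sum_direct(n, h):10.4f}  "
          f"h n^3/12 = {cubic:12.4f}  n - 2/sqrt(h) = {linear:10.1f}")
```

For lengths well below $2/\sqrt{h}$ the result tracks the third-power expression, and for lengths well above it tracks the linear one.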

Expressed in terms of physics [see, e.g., chapters VI and X of VAN KAMPEN 1981], the reaction intermediate would have a potential energy depending on where the connecting point is. One may refer to the potential energy as "free energy" following the transition state theory of Eyring (EYRING and EYRING 1963). We assumed that this potential energy has approximately a periodicity such that the period is equal to the interval between neighboring base pairs along a DNA duplex (Figure 3B), and that the difference between its maxima and its minima is large enough, as described in Appendix A of FUJITANI et al. (1995). Diffusion in such a periodic potential can be considered as a (symmetrical) random walk over sites, each of which is located at the "valley bottom" of the potential. Thus, we formulated the movement of a connecting point as a random walk.


*  THEORY FOR THE VERY RAPID DROP-OFF

Here we explain, in terms of the random-walk model, why the very rapid drop-off was observed in the data of DATTA et al. (1997) for the wild-type strains (Mmr+; open squares in Figure 5). Below we perform curve fits to experimental data by using the software IGOR (WaveMetrics, Lake Oswego, OR) on a Macintosh computer. We use $\chi^2 \equiv \sum_i (y - y_i)^2$ as a measure of the goodness of fit, where $y_i$ is the data value (the natural logarithm of the recombination frequency) for the $i$th data point and $y$ is the value of a theoretical curve at the point. The results are summarized in Table 2.
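The goodness-of-fit measure defined above is an unweighted sum of squared residuals on the logarithmic scale. A minimal Python sketch follows; the function name and the example numbers are hypothetical and are not taken from the data sets analyzed here.

```python
import math


def chi_squared(theory_log_values, data_log_values):
    """Sum of squared differences between theoretical and observed log frequencies."""
    return sum((y - yi) ** 2 for y, yi in zip(theory_log_values, data_log_values))


# Hypothetical frequencies, for illustration only.
observed = [1.0e-6, 3.0e-7, 1.0e-7]
predicted = [9.0e-7, 3.5e-7, 1.1e-7]
chi2 = chi_squared([math.log(p) for p in predicted],
                   [math.log(o) for o in observed])
print(f"chi-squared = {chi2:.4f}")
```

Because the residuals are taken on natural logarithms, a given ratio between theory and data contributes the same amount whether the frequency is large or small.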



Figure 4. Explanation of $F_l(m, n)$. The symbol | indicates an identical site (a site of an identical base pair), and x indicates a diverged site (a site of a diverged base pair). The homologous region ($N$ sites) is divided into several subregions by diverged sites. The $l$th site from the left end of the homologous region is a diverged site, or the $m$th site from the left end of an identical subregion (A) with $n$ sites. The probability of this case is denoted by $F_l(m, n)$, where $1 \le l \le N$, $1 \le m \le n$, and $1 \le n \le N$. This identical subregion lies between two diverged sites. An identical subregion (B) with $n = 2$ lies between an end of the homologous region and a diverged site.



Figure 5. The recombination frequency vs. sequence divergence: data and theory (see Figure 8). The natural logarithm of the recombination frequency is plotted against the divergence ($D$). The inset shows a low-divergence regime ($0 \le D \le 0.06$). The open squares and the solid circles represent the experimental data of DATTA et al. (1997) for the wild-type Mmr+ strains and the Mmr- strains of yeast, respectively (mitotic recombination; $N = 350$ bp). The bottom solid curve is obtained by a curve fit of Equation 13 to the data for the wild-type strains; the fitted values are $h = 1.2 \times 10^{-4}$ and $k\alpha = 3.4 \times 10^{-9}$ ($\chi^2 = 7.3$). The crosses represent our simulation results by use of Equation 14 and Equation 17 with $h' = 1.2$, $k' = 0$, and the other parameter values the same as above. Each simulation result is obtained from $10^5$ trials. The bottom dashed curve is the curve fitted by DATTA et al. (1997) to the data for the wild-type strains, which is Equation 5 with $f = 0.97$, $M_{\mathrm{eps}} = 23$, $\beta = 610$, $R_0 = 0.18$, and $\Pi^{(M)}(f = D = 0) = 5.1 \times 10^{-6}$. The top solid curve is obtained by a curve fit of Equation 18 to the data for the Mmr- strains with the $k'/k$ value restricted to be positive; the fitted values are $h = 2.2 \times 10^{-3}$, $k\alpha = 8.4 \times 10^{-9}$, $h' = 8.1 \times 10^{-2}$, and $k'/k = 6.9 \times 10^{-7}$ ($\chi^2 = 1.2 \times 10$). The triangles ($\triangle$) represent our simulation results by use of Equation 14 and Equation 17 with the same parameter values as just above. Each simulation result is obtained from $10^5$ trials. The top dashed curve is the curve fitted by DATTA et al. (1997) to the data for the Mmr- strains, which is Equation 5 with $f = 0$ and the same values of the other parameters as above.


 
Table 2. Results of curve fits

As in the previous models (DATTA et al. 1997; VULIC et al. 1997), we assume that the MMR system aborts the reaction by attacking mismatches resulting from diverged base pairs. To formulate it simply in terms of the random-walk model, we assume that a connecting point is always destroyed when it is produced at a diverged site (i.e., a site of a diverged base pair) and when it encounters a diverged site during its random walk. Thus, a diverged site plays the role of a totally absorbing boundary. The recombination frequency in an identical region is proportional to the third power of its length if the length falls in the range shown by the upper line of Equation 12. Suppose that one diverged base pair is introduced at the center of such an identical region to divide it into equal halves. Because a connecting point is produced in either of the two identical subregions, the recombination frequency in the entire homologous region drops very rapidly to one-eighth of the frequency for zero divergence. When two diverged base pairs are present at equal intervals, the recombination frequency drops to $(1/3)^3 = 1/27$ of the frequency for zero divergence. Because $1/27 > (1/8)^2$, the frequency drop from no diverged base pairs to one diverged base pair is more "rapid" than that from one diverged base pair to two diverged base pairs. It is probable that the random-walk model thus explains the very rapid drop-off. Actually, the recombination frequencies obtained by DATTA et al. (1997) for zero divergence are 92, 86, 110, 71, and 170 $\times 10^{-8}$, and those for one diverged base pair introduced rather close to the center are 21, 30, 23, 31, and 29 $\times 10^{-8}$. The drop rates are not so far from the one-eighth.
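The arithmetic in this paragraph can be reproduced with a few lines of Python. The frequencies are the ones quoted above (in units of $10^{-8}$); summarizing them by the ratio of their arithmetic means is a choice made for this sketch, not a statistic reported in the text.

```python
# Recombination frequencies quoted in the text, in units of 1e-8.
zero_divergence = [92, 86, 110, 71, 170]
one_diverged_bp = [21, 30, 23, 31, 29]

mean_zero = sum(zero_divergence) / len(zero_divergence)
mean_one = sum(one_diverged_bp) / len(one_diverged_bp)
observed_ratio = mean_one / mean_zero

print(f"mean at zero divergence   = {mean_zero:.1f} x 1e-8")
print(f"mean with one diverged bp = {mean_one:.1f} x 1e-8")
print(f"ratio of means            = {observed_ratio:.3f}")

# Comparison of successive drops used in the argument: 1/27 versus (1/8)^2.
print(f"1/27 = {1 / 27:.4f}, (1/8)^2 = {(1 / 8) ** 2:.4f}")
```

The ratio of means comes out near one-quarter, that is, within about a factor of two of the one-eighth estimate.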

Let us examine this scenario. Suppose that one connecting point is produced initially at the $l$th site (say, from the left end) of a homologous region with $N$ sites. This region may be divided into some identical subregions by diverged sites, each of which plays the role of a totally absorbing boundary. Suppose that this $l$th site is an identical site (i.e., a site of an identical base pair), and we define $F_l(m, n)$ ($1 \le l \le N$, $1 \le m \le n$, $1 \le n \le N$) as the probability with which the connecting point is produced at the $m$th site of an identical subregion with $n$ sites. The identical subregion lies between diverged sites (Figure 4A), lies between a diverged site and either end of the homologous region (Figure 4B), or coincides with the entire homologous region. In the first case, we have $F_l(m, n) = D^2(1 - D)^n$ because $n$ bp are identical with probability $(1 - D)^n$ and 2 bp at both ends are diverged with probability $D^2$. In the second case, we have $F_l(m, n) = D(1 - D)^n$ because 1 bp at an end need not be diverged. Which case we have is determined by the relationship among $l$, $m$, $n$, and $N$ as shown in Appendix B.

Noting that Equation 8 gives the probability of resolution of the connecting point considered above, we can express the averaged recombination frequency in the homologous region by

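$$\langle \Pi^{+}(D, N) \rangle = k\alpha \sum_{l=1}^{N} \sum_{n=1}^{N} \sum_{m=1}^{n} F_l(m, n)\left\{1 - \frac{\cosh[\phi\,(n + 1 - 2m)/2]}{\cosh[\phi\,(n + 1)/2]}\right\}, \tag{13}$$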

where we added the superscript $+$ to indicate that this expression is valid when the MMR system is active enough. Note that $\phi$, defined by Equation 9, depends only on $h$. By setting $D = 0$ in Equation 13, we recover Equation A12 with $n$ replaced by $N$.

The value of $\langle \Pi^{+}(D, N) \rangle/(k\alpha)$ is independent of the $k\alpha$ value. Thus, when we plot $\ln\langle \Pi^{+}(D, N) \rangle$ against $D$, changing the $k\alpha$ value only shifts the curve upward or downward, with the curve shape remaining the same. The parameter $h$ also influences the overall position of the curve because the intercept, i.e., the logarithm at $D = 0$, is given by the logarithm of Equation 12 with $n$ replaced by $N$. The curve shape depends not on $k\alpha$ but on $h$.
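One way to evaluate such an average numerically is sketched below in Python. Instead of summing over $l$ and $m$ explicitly, the sketch groups the terms by subregion length $n$: it weights the resolution sum of an $n$-site identical subregion by the expected number of such subregions, using the probabilities $D^2(1 - D)^n$, $D(1 - D)^n$, and $(1 - D)^N$ given above. This regrouping, the closed form for the resolution sum (written with $\cosh\phi = 1 + h/2$), and all function names are assumptions of the sketch rather than the authors' procedure. The parameter values are the fitted ones quoted for the wild-type strains.

```python
import math


def resolution_sum(n: int, h: float) -> float:
    """Sum over start sites of the processing probability in n identical sites
    that are bounded by absorbing sites on both sides."""
    half_phi = math.asinh(math.sqrt(h) / 2.0)  # phi / 2, where cosh(phi) = 1 + h / 2
    return n + 1 - math.tanh((n + 1) * half_phi) / math.tanh(half_phi)


def expected_runs(n: int, total: int, d: float) -> float:
    """Expected number of identical subregions of exactly n sites among `total` sites."""
    if n == total:
        return (1.0 - d) ** total                          # no diverged site at all
    at_an_end = 2.0 * d * (1.0 - d) ** n                   # one flanking diverged site
    interior = (total - n - 1) * d * d * (1.0 - d) ** n    # two flanking diverged sites
    return at_an_end + interior


def averaged_frequency(d: float, total: int, h: float, k_alpha: float) -> float:
    """Average over diverged-site positions, with diverged sites totally absorbing."""
    return k_alpha * sum(
        expected_runs(n, total, d) * resolution_sum(n, h) for n in range(1, total + 1)
    )


N, H, K_ALPHA = 350, 1.2e-4, 3.4e-9  # fitted values quoted in the text
for d in (0.0, 1.0 / 350.0, 0.01, 0.05, 0.2):
    value = averaged_frequency(d, N, H, K_ALPHA)
    print(f"D = {d:.4f}  ln<Pi+> = {math.log(value):8.3f}")
```

Changing `K_ALPHA` multiplies every value by the same factor, so it shifts the logarithmic curve vertically without changing its shape, as stated above.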

We have two fitting parameters in Equation 13: $h$ and the product $k\alpha$. Curve fitting to the data of DATTA et al. (1997) for the wild-type strains (Figure 5) results in the fitted values $h = 1.2 \times 10^{-4}$ and $k\alpha = 3.4 \times 10^{-9}$ ($\chi^2 = 7.3$). These values are consistent with the estimates of FUJITANI et al. (1995) for a similar yeast system ($h < 10^{-4}$ and $k\alpha > 10^{-10}$). The fitted curve can follow the very rapid drop-off shown by the data (Figure 5). We replot the fitted curve of DATTA et al. (1997), Equation 5, in Figure 5. It has five fitting parameters: $\Pi^{(M)}(D = 0, N, f = 0)$, $M_{\mathrm{eps}}$, $f$, $R_0$, and $\beta$, of which the last four parameters are responsible for the curve shape. Their fit ($\chi^2 = 1.8$) is better than ours.

The homology length (350 bp) is found to be comparable to $2/\sqrt{h} = 1.8 \times 10^2$, around which the shift in the dependence should occur as shown by Equation 12. Even taking this into account, the calculated ratio of the frequency for one diverged base pair to that for zero divergence, $\langle \Pi^{+}(D = 1/350, N = 350) \rangle / \Pi^{+}(D = 0, N = 350) = 0.71$, appears to be large as compared with the one-eighth mentioned in the second paragraph of this section. The reason is as follows. The one-eighth corresponds with the case where the diverged base pair is at the center of the homologous region in the third-power dependence range. The average $\langle \Pi^{+}(D = 1/350, N = 350) \rangle$ is influenced not only by this case but also by the case where a diverged base pair is introduced near either end of the homologous region to give almost the same recombination frequency as $\Pi^{+}(D = 0, N = 350)$.
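As a worked check of the crossover length compared with the homology length above, assume that it is $2/\sqrt{h}$ (the form that reproduces the quoted value of $1.8 \times 10^2$) and insert the fitted value $h = 1.2 \times 10^{-4}$:

```latex
\[
\frac{2}{\sqrt{h}}
  = \frac{2}{\sqrt{1.2 \times 10^{-4}}}
  \approx \frac{2}{1.095 \times 10^{-2}}
  \approx 1.8 \times 10^{2} .
\]
```

The homology length of 350 bp is therefore about twice this length, so the substrate lies in the crossover between the third-power and the linear regimes rather than deep inside either.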

Thus, the random-walk model can offer a very straightforward explanation for the presence of the very rapid drop-off in the wild-type strains (Mmr+). The same mechanism can explain the map expansion phenomenon, $R_{ac} > R_{ab} + R_{bc}$, where each term denotes the recombination frequency between the two markers indicated by the letters of the subscript, and the loci of the markers a, b, and c are arranged in this order (HOLLIDAY 1964; FINCHAM and HOLLIDAY 1970; SHEN and HUANG 1989). A marker is a diverged base pair or a minute block containing diverged base pairs and plays the role of a totally absorbing boundary in terms of the random-walk model. For example, $R_{ac}$ is eight times as large as $R_{ab} = R_{bc}$ if the locus b is at the center of the ac interval, which amounts to $R_{ac} > R_{ab} + R_{bc}$. See FUJITANI and KOBAYASHI (1997) for the details.
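The factor of eight can be illustrated with a short Python sketch. It treats each marker as a totally absorbing site and takes the frequency between two markers to be proportional to the resolution sum of the identical interval between them, using a closed form written with $\cosh\phi = 1 + h/2$. The interval lengths, the value of $h$, and the function name are illustrative assumptions, chosen so that the intervals lie in the third-power regime.

```python
import math


def resolution_sum(n: int, h: float) -> float:
    """Pi / (k * alpha) for n identical sites between two absorbing markers."""
    half_phi = math.asinh(math.sqrt(h) / 2.0)  # phi / 2, where cosh(phi) = 1 + h / 2
    return n + 1 - math.tanh((n + 1) * half_phi) / math.tanh(half_phi)


h = 1.0e-6            # illustrative value; 2 / sqrt(h) = 2000 sites
sites_ac = 101        # identical sites between markers a and c
sites_ab = 50         # marker b occupies the central site, leaving 50 on each side
sites_bc = 50

r_ac = resolution_sum(sites_ac, h)
r_ab = resolution_sum(sites_ab, h)
r_bc = resolution_sum(sites_bc, h)

print(f"R_ac / R_ab          = {r_ac / r_ab:.3f}")
print(f"R_ac / (R_ab + R_bc) = {r_ac / (r_ab + r_bc):.3f}")
```

With these assumed values the first ratio is close to eight and the second is close to four, so the inequality $R_{ac} > R_{ab} + R_{bc}$ holds with a wide margin.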


*  THEORY FOR MMR-DEFECTIVE STRAINS

Assuming that a connecting point is always destroyed at a diverged site, unlike at an identical site, we were successful in the preceding section at explaining the very rapid drop-off. What we assumed is a kind of site dependence in the transition rates. Thus, we expect to explain the absence of the very rapid drop-off in the data of DATTA et al. (1997) for the Mmr- strains (Δmsh2 Δmsh3; solid circles in Figure 5) by similarly assuming site dependence in the transition rates. We assume that, when the MMR system is defective, a connecting point is a little more likely to be processed and destroyed at a diverged site than at an identical site; the resolution step could be affected by mismatches themselves (SHEN and HUANG 1989). Here, we adopt a set of site-dependent transition rates, which is called the random-jump-rate model or the random-trap model (DENTENEER and ERNST 1984; HAUS and KEHR 1987) in the study of diffusion in a random medium.

As illustrated in Figure 6A, this model supposes that the potential felt by a random walker has the same "height" at the "hilltops." We assume that there are two kinds of heights of the valley bottoms: one for an identical site and the other for a diverged site (Figure 6A). The latter should be higher than the former because a connecting point is assumed to be a little more unstable at a diverged site. A random walker can reach a neighboring site after "climbing up" a lower "hill," i.e., with a larger transition rate, when it starts from a diverged site than when it starts from an identical site [see, e.g., chapter X of VAN KAMPEN 1981]. The master equation is, instead of Equation 6,

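$$\begin{aligned}
\frac{d}{dt}p_1(t) &= g_2\,p_2(t) - (2 + h_1)\,g_1\,p_1(t),\\
\frac{d}{dt}p_j(t) &= g_{j-1}\,p_{j-1}(t) + g_{j+1}\,p_{j+1}(t) - (2 + h_j)\,g_j\,p_j(t) \quad (2 \le j \le N - 1),\\
\frac{d}{dt}p_N(t) &= g_{N-1}\,p_{N-1}(t) - (2 + h_N)\,g_N\,p_N(t),\\
\frac{d}{dt}p_*(t) &= \sum_{j=1}^{N} g_j h_j k_j\,p_j(t),
\end{aligned} \tag{14}$$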

where $g_j$, $h_j$, and $k_j$ take the values $g$, $h$, and $k$, respectively, at an identical site, and take $g'$, $h'$, and $k'$, respectively, at a diverged site (Figure 6B). Without diverged sites, Equation 14 is reduced to Equation 6 with $n$ replaced by $N$.



Figure 6. The random-walk model with a set of transition rates of the random-trap type. (A) A potential of the random-trap type; the potential has the same height at the hill tops. Each of the sites, over which the random walk occurs, is located at the valley bottom, as in Figure 3B. The potential is assumed to be higher at a diverged site j than at identical sites j - 2, j - 1, j + 1, and j + 2. For simplicity, being processed is not represented. As discussed in the text, the transition rate g is replaced by g' from a diverged site to one of the neighboring sites. (B) Unlike in Figure 3, sequence divergence is taken into account here. The ratios h and k are replaced by h' and k', respectively, at a diverged site.

As in Equation 7 and Equation 10, the recombination frequency is given by

$$\Pi^{(\mathrm{RT})}(N) = \alpha \sum_{m=1}^{N} p_*^{(m;N)}(\infty), \tag{15}$$

where the superscript (RT) indicates the recombination frequency for a set of transition rates of the random-trap type, and $p_*^{(m;N)}(\infty)$ is given by

$$p_*^{(m;N)}(\infty) = \int_0^{\infty} \sum_{j=1}^{N} g_j h_j k_j\, p_j^{(m;N)}(t)\,dt, \tag{16}$$

where $p_j^{(m;N)}(t)$ is the solution of Equation 14 under the initial condition $p_j(0) = 0$ for $j \ne m$ and $p_m(0) = 1$. We have, from Equation 15 and Equation 16,

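$$\Pi^{(\mathrm{RT})}(N) = k\alpha \sum_{m=1}^{N} \int_0^{\infty} \sum_{j=1}^{N} g_j h_j \frac{k_j}{k}\, p_j^{(m;N)}(t)\,dt. \tag{17}$$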

As shown later, $\Pi^{(\mathrm{RT})}(N)$ is independent of $g$ and $g'$. Because $p_j^{(m;N)}(t)$ is a solution of the first three equations of Equation 14 and is independent of $\alpha$ and $k$, $\Pi^{(\mathrm{RT})}(N)$ is invariant for any set of values of $\alpha$, $k$, and $k'$ as long as $k\alpha$ and $k'/k$ remain fixed. This is also the case with its average $\langle \Pi^{(\mathrm{RT})}(D, N) \rangle$; we can therefore regard $h$, $k\alpha$, $h'$, and $k'/k$ as the parameters of $\langle \Pi^{(\mathrm{RT})}(D, N) \rangle$. The shape of the curve of $\ln\langle \Pi^{(\mathrm{RT})}(D, N) \rangle$ depends not on $k\alpha$ but on $h$, $h'$, and $k'/k$, as the shape of the curve of $\ln\langle \Pi^{+}(D, N) \rangle$ depended not on $k\alpha$ but on $h$.

We simulate the dynamics described by Equation 14 with a computer (VT-Alpha 433S8/3N, 433 MHz cpu; Visual Technology, Tokyo). Suppose that a random walker is now at an identical site. According to Equation 14, the probability of its jump to either of the neighboring sites in a short time $\Delta t$ is given by $2g\Delta t$, and the probability of its being processed in this short time is given by $gh\Delta t$. Thus, on average, some action (i.e., a jump to a neighboring site or being processed) of the random walker at an identical site occurs in a short time $\Delta t = 1/\{g(2 + h)\}$. Similarly, a random walker at a diverged site takes some action in a short time $\Delta t' = 1/\{g'(2 + h')\}$ on average. One time step (Monte Carlo step) in our simulation is made to correspond with this time interval $\Delta t$ or $\Delta t'$ when the random walker is at an identical site or a diverged site, respectively. Thus, some action occurs at each time step in our simulation. A random walker jumps to one neighboring site with probability $g/\{g(2 + h)\}$, jumps to the other with probability $g/\{g(2 + h)\}$, and is processed with probability $gh/\{g(2 + h)\}$ at each time step if it is at an identical site. If it is at a diverged site, the probabilities are $g'/\{g'(2 + h')\}$, $g'/\{g'(2 + h')\}$, and $g'h'/\{g'(2 + h')\}$, respectively. This rule is modified at either end of the homology. Because these probabilities are independent of $g$ and $g'$, we need not specify values of $g$ and $g'$ to calculate the recombination frequency. This point is shown analytically in Appendix C.
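The stepping rule described above can be sketched in Python as follows. This is a minimal illustration, not the simulation program used for Figure 5: the function names, the way diverged sites are drawn (independently with probability $D$ per site for every trial), the uniform choice of the starting site, and the treatment of the ends (a jump off either end destroys the walker) are assumptions of the sketch. A closed form for an identical region, written with $\cosh\phi = 1 + h/2$, is included only for comparison.

```python
import math
import random


def simulate_walker(start, diverged, n_sites, h, h_div, k, k_div, rng):
    """Follow one connecting point; return True if it ends as a recombinant."""
    site = start
    while 1 <= site <= n_sites:
        on_diverged = site in diverged
        h_here = h_div if on_diverged else h
        k_here = k_div if on_diverged else k
        # One Monte Carlo step: left jump, right jump, or processing, with
        # probabilities 1/(2+h), 1/(2+h), h/(2+h); the rate g (or g') cancels.
        u = rng.random() * (2.0 + h_here)
        if u < 1.0:
            site -= 1
        elif u < 2.0:
            site += 1
        else:
            return rng.random() < k_here  # processed: resolved with probability k
    return False  # stepped off either end of the homology: destroyed


def simulate_frequency(n_sites, divergence, h, h_div, k, k_div, trials, seed=0):
    """Monte Carlo estimate of the averaged frequency divided by alpha."""
    rng = random.Random(seed)
    resolved = 0
    for _ in range(trials):
        diverged = {j for j in range(1, n_sites + 1) if rng.random() < divergence}
        start = rng.randint(1, n_sites)  # production is equally likely at every site
        resolved += simulate_walker(start, diverged, n_sites, h, h_div, k, k_div, rng)
    return n_sites * resolved / trials


def exact_identical_sum(n, h):
    """Closed form of the same quantity for an identical region with k = 1."""
    half_phi = math.asinh(math.sqrt(h) / 2.0)
    return n + 1 - math.tanh((n + 1) * half_phi) / math.tanh(half_phi)


# Small illustrative case (assumed values), cheap enough to run quickly.
estimate = simulate_frequency(20, 0.0, 0.05, 0.05, 1.0, 1.0, trials=5000, seed=0)
print(f"simulated = {estimate:.2f}, closed form = {exact_identical_sum(20, 0.05):.2f}")
```

No value of $g$ or $g'$ enters the sketch, which reflects the cancellation noted above. With small $h$ and long substrates each walk takes many steps, so the run time grows quickly with the homology length.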

We have introduced a set of transition rates of the random-trap type to analyze the data for the Mmr- strains, but we should also be able to analyze data for the Mmr+ strains with Equations 14-17. We first analyze the data of DATTA et al. 1997 again for comparison with the analysis in the preceding section. In Equation 6, the relative probability of intermediate processing, h, is the ratio of the transition rate of being processed to the transition rate from a site to a neighboring site. In Equation 14, h is the ratio at an identical site while h' is the ratio at a diverged site. Hence, the condition that a connecting point at a diverged site is almost always destroyed without moving to a neighboring site can be expressed by h' >> h and k' << k. Because we assumed that a connecting point is always destroyed at a diverged site in the preceding section, we can expect that the averaged recombination frequency from Equation 14 tends to Equation 13 as h'/h -> {infty} and k'/k -> 0. This expectation is verified in Figure 5; the cross symbols, which are obtained numerically from Equation 14 with large h'/h and k' = 0, agree with the bottom solid curve obtained in the preceding section. This point is also discussed in the next section.

Let us now analyze the data of DATTA et al. 1997 for the Mmr- strains. We have smaller h'/h (>1) and larger k'/k (<1) than above because we assume that a connecting point is a little more likely to be processed and destroyed at a diverged site than at an identical site. We usually have h << 1 as estimated in the preceding section, and so we can expect 0 < h' - h << 1. Thus, we can use the decoupling approximation introduced in Appendix C to average Equation 15 over positions of diverged sites,

(18)

where the site-averaged ratio, written here as h_av, is defined by h_av {equiv} (1 - D)h + Dh', and {phi}_av is {phi} of Equation 9 with h replaced by h_av.

The data of DATTA et al. 1997 for the Mmr- strains show no very rapid drop-off and a larger intercept than their data for the wild-type strains (Figure 5). The latter implies that the MMR system somehow hinders the homologous recombination between identical substrates. Thus, the Mmr- strains would not have the same h and k{alpha} values as the wild-type strains. Curve fitting to the data for the Mmr- strains results in the fitted values h = 2.2 x 10^-3, k{alpha} = 8.4 x 10^-9, and h' = 8.1 x 10^-2 with {chi}^2 = 1.2 x 10 (Figure 5). The fitted k'/k value varies from 10^-7 to 10^-4 depending on the initial condition of curve fitting; the curve shape is insensitive to k'/k so long as it is not too large. This is expected because k'/k appears only in the first term in the first braces of Equation 18, which is negligible as compared with the second term when k'/k is not too large. We also obtained simulation results with the same parameter values (Figure 5); the agreement between them and the fitted curve shows the validity of our decoupling approximation.

DATTA et al. 1997 explained their data by using Equation 5 with f = 0 and the other parameters the same as for the wild-type strains (Figure 5; {chi}^2 = 7.1). Their fit is better than ours, judging from the {chi}^2 value over the divergence range examined (0 <= D <= 0.26). Our curve is convex (i.e., its second derivative is positive) although the data appear to be concave as a whole; our curve deviates considerably from the data point at D = 0.26. Except for this data point, however, our curve can be fit to the data ({chi}^2 = 3.8) better than their line ({chi}^2 = 7.1).


*  FOR LONGER SUBSTRATES

VULIC et al. 1997 studied conjugational crosses of enterobacteria, which formally involve very long substrates of the order of 10^7 bp, to obtain data for the Mmr- strains (mutS), for the wild-type strains (Mmr+), and for the strains overproducing the MMR proteins MutS and MutL (Mmr++). They analyzed their data by line fits with Equation 4. To analyze them in terms of the random-walk model, we first study how our curves change as N increases and check again the validity of Equation 18. We plot ln<{Pi}(RT)(D, N = 350)>, changing the h' value or changing the k'/k value (Figure 7, A and B). Using the same sets of parameter values, we plot the logarithm for N = 3500 in Figure 7, C and D.



Figure 7. The recombination frequency vs. the sequence divergence: theory and simulation. The natural logarithm of the recombination frequency is plotted against the divergence (D). The symbols {square}, x, {circ}, and {triangleup} represent simulation results by use of Equation 14 and Equation 17; each simulation result is obtained from 10^5 trials. We use h = 3.0 x 10^-5 and k{alpha} = 3.6 x 10^-8 in common. The solid curve represents Equation 13. (A) We use N = 350 and k'/k = 2.0 x 10^-4 in common, and use h' = 2.0 x 10^-3 ({square}), 2.0 x 10^-2 (x), 2.0 x 10^-1 ({circ}), and 2.0 ({triangleup}). The first three h' values are also used for the top, the middle, and the bottom dashed curves representing Equation 18, respectively. (B) We use N = 350 and h' = 2.0 in common, and use k'/k = 2.0 x 10^-1 ({square}), 2.0 x 10^-2 (x), 2.0 x 10^-3 ({circ}), and 0 ({triangleup}). (C) We use N = 3500 and k'/k = 2.0 x 10^-4 in common, and use h' = 2.0 x 10^-3 ({square}), 2.0 x 10^-2 (x), 2.0 x 10^-1 ({circ}), and 2.0 ({triangleup}). The first three h' values are also used for the top, the middle, and the bottom dashed curves representing Equation 18, respectively. (D) We use N = 3500 and h' = 2.0 in common, and use k'/k = 2.0 x 10^-1 ({square}), 2.0 x 10^-2 (x), 2.0 x 10^-3 ({circ}), and 0 ({triangleup}).

We find that the curves that the decoupling approximation yields for h' = 2.0 x 10^-3 and h' = 2.0 x 10^-2 (i.e., the top two dashed curves in Figure 7, A and C) agree well with the corresponding simulation results. This is expected because we then have h' - h << 1 (h = 3.0 x 10^-5). We again find that the simulation results tend to Equation 13 as h'/h -> {infty} and k'/k -> 0 in each of Figure 7, A-D; the very rapid drop-off then appears.

We find that the corresponding curves for N = 350 and N = 3500 share almost the same shape. The curve shape is thus insensitive to N, probably because the horizontal axis represents the divergence. At the same divergence, the average interval between two neighboring diverged sites is independent of the homology length. This average interval would mainly determine how frequently the connecting point encounters a diverged site and thus would mainly determine how the recombination frequency is reduced from that in the case of zero divergence.
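The statement about the average interval can be checked numerically. The sketch below assumes, as in Appendix B, that each site is diverged independently with probability D; the function name, the number of trials, and the random seed are illustrative choices made here.

```python
import random


def mean_gap(N, D, trials, rng):
    """Average distance between neighboring diverged sites, pooled over trials.

    Each of the N sites is diverged independently with probability D.
    """
    total_gap = 0
    gap_count = 0
    for _ in range(trials):
        positions = [i for i in range(N) if rng.random() < D]
        for a, b in zip(positions, positions[1:]):
            total_gap += b - a
            gap_count += 1
    return total_gap / gap_count


rng = random.Random(0)
gap_350 = mean_gap(350, 0.1, 2000, rng)   # short homology
gap_3500 = mean_gap(3500, 0.1, 200, rng)  # tenfold longer homology
```

For D = 0.1 both averages come out close to 1/D = 10 sites, with only a small end effect for the shorter homology, which is the sense in which the interval does not depend on N.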

Curve fitting of Equation 18 to the data of VULIC et al. 1997 for the Mmr- strains in Figure 8 results in the fitted values of h = 3.2 x 10^-5, k{alpha} = 3.1 x 10^-9, and h' = 1.9 x 10^-3 ({chi}^2 = 6.0 x 10^-1). The fitted k'/k value varies from 10^-7 to 10^-3 depending on the initial condition of curve fitting, as in the preceding section. Line fitting to the data for the Mmr- strains gives the fitted intercept -3.6 and the fitted slope -1.7 x 10 ({chi}^2 = 3.8 x 10^-1). These comparable {chi}^2 values demonstrate that our fit is as good as the line fit of VULIC et al. 1997.



Figure 8. The recombination frequency vs. the sequence divergence: data and theory (see Figure 5). The natural logarithm of the recombination frequency is plotted against the divergence (D). The symbols x, {circ}, and {triangleup} represent the data for the Mmr- strains, the wild-type strains, and the Mmr++ strains of VULIC et al. 1997, respectively (conjugational cross of enterobacteria). We use N = 10^7 in our analysis. The top solid curve is obtained by a curve fit of Equation 18 to the data for the Mmr- strains; the fitted values are h = 3.2 x 10^-5, k{alpha} = 3.1 x 10^-9, h' = 1.9 x 10^-3, and k'/k = 3.6 x 10^-7 ({chi}^2 = 0.60). The middle solid curve represents Equation 13 with the same h and k{alpha} values ({chi}^2 = 2.3 x 10). The bottom solid curve is obtained by a curve fit of Equation 13 to the data for the Mmr++ strains with the k{alpha} value fixed to be the same as above. The fitted h value is 1.0 x 10^-6 ({chi}^2 = 2.5 x 10). The dashed lines are obtained by line fits to the data as was done by VULIC et al. 1997; the top line is fitted to the data for the Mmr- strains, the middle line to the data for the wild-type strains, and the bottom line to the data up to D = 0.05 for the Mmr++ strains. The fitted intercepts are -3.6, -2.8, and -2.9, the fitted slopes are -1.7 x 10, -6.2 x 10, and -2.2 x 10^2, and the {chi}^2 values are 0.38, 0.47, and 3.0, respectively. The dotted line is fitted to the data for the Mmr++ strains up to D = 0.17; the fitted intercept and slope are -5.9 and -7.1 x 10, respectively, with {chi}^2 = 2.9 x 10.

The fitted h value gives 1/{phi} = 3.5 x 10^2, which is much smaller than N = 10^7. Unless h changes drastically enough to make 1/{phi} comparable to or much larger than N, the intercept is still given approximately by k{alpha}N, as shown by the bottom line of Equation 12 with n replaced by N. The intercepts appear to be the same among the Mmr- strains, the wild-type strains, and the Mmr++ strains in Figure 8. We assume that the same k{alpha} value is shared among the three types of strains; we expect that their h values are not drastically different.
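The length scale quoted above can be reproduced from the fitted h values. The sketch below assumes that the quantity compared with N is 1/{phi} and uses the small-{phi} relation h {approx} 4{phi}^2 stated in Appendix A; the function name is an illustrative choice made here.

```python
import math


def inverse_phi(h):
    """Approximate 1/phi from the small-phi relation h ~ 4 * phi**2."""
    return 2.0 / math.sqrt(h)


N = 10**7
scale_mmr_minus = inverse_phi(3.2e-5)  # h fitted for the Mmr- strains
scale_mmr_plus2 = inverse_phi(1.0e-6)  # h fitted for the Mmr++ strains

# Both length scales are far below N, so the intercept stays near k*alpha*N.
both_short = scale_mmr_minus < N and scale_mmr_plus2 < N
```

The first value rounds to about 3.5 x 10^2 sites and the second to about 2 x 10^3 sites, so a smaller h gives a longer range over which the third-power dependence applies.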

Judging from our analysis of the data of DATTA et al. 1997, Equation 13 is expected to be applicable to the data for the wild-type strains of VULIC et al. 1997. This equation yields the very rapid drop-off as shown in Figure 5 and Figure 7, while their data appear to show no very rapid drop-off (open circles in Figure 8). Thus, giving up curve fitting of Equation 13 to the data, we only plot Equation 13 with the same h and k{alpha} values as obtained for the Mmr- strains (Figure 8). We find that the data point at D = 0.17 is not so far from the curve, but the overall agreement of the curve with the data is poor ({chi}^2 = 2.3 x 10). If we do a line fit as in VULIC et al. 1997, the fitted intercept and slope are -2.8 and -6.2 x 10, respectively, with {chi}^2 = 4.7 x 10^-1 (Figure 8). This fit is much better than ours.

Let us fit Equation 13 to the data for the Mmr++ strains with h being the only fitting parameter. Using the 433 MHz machine to perform the summation over N = 10^7 in Equation 13, we obtain the fitted value h = 1.0 x 10^-6 with {chi}^2 = 2.5 x 10 (Figure 8). The data for the Mmr++ strains appear to show the very rapid drop-off, which is followed by our curve. Attributing this tendency to saturation of the MMR proteins without formulating it, VULIC et al. 1997 did a line fit to the data up to D = 0.05 (Figure 8); the fitted intercept and slope are -2.9 and -2.2 x 10^2, respectively ({chi}^2 = 3.0). In passing, if the extreme data point is included, these values are -5.9 and -7.1 x 10, respectively, with {chi}^2 = 2.9 x 10.

Our curves for the Mmr- strains and for the Mmr++ strains (the top and the bottom solid curves in Figure 8, respectively) appear to have the same intercept regardless of their different h values, as expected. Comparing our curve for the Mmr++ strains with that for the wild-type strains (the middle curve in Figure 8), we find that the slope near D = 0 is steeper, i.e., the very rapid drop-off becomes more prominent, as h decreases. This can be explained qualitatively as follows. As D increases in Equation 13, the whole homologous region is separated by a greater number of totally absorbing boundaries and the average length of an identical subregion becomes shorter. When 1/{phi} is larger, even if D is small, more identical subregions can be in the third-power dependence range of Equation 12. This dependence causes the very rapid drop-off as discussed in the second paragraph of THEORY FOR THE VERY RAPID DROP-OFF.

Although the substrates are very long (~10^7 bp), we have used the random-walk model with a single random walker. In other words, we still assumed N{alpha} << 1 in this section as in Equation 6 and Equation 14. This is consistent with the fitted value of k{alpha} = 3.1 x 10^-9 above.


*  FURTHER DISCUSSION

As mentioned in the Introduction, VULIC et al. 1997 reported that, when the MMR system is active, the intercept goes up without significant change in the slope as the SOS activity increases to induce overproduction of RecA protein. They explained this observation by adjusting the homology length N in the right-hand side of Equation 4 because they assumed that the total length of DNA available for recombination increases with the RecA concentration. In the random-walk model, the homology length N is a fixed length of the region where the connecting point randomly walks. It would be natural to assume that the probability of initial production of a connecting point per site, {alpha}, increases with the RecA concentration. As discussed, our curve of either ln<{Pi}+(D, N)> or ln<{Pi}(RT)(D, N)> is then lifted with its shape remaining the same. Thus, the random-walk model can also explain this SOS-induced change of the intercept in a very straightforward way.
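The lifting of the curve can be written out explicitly. The following is a sketch, assuming (as the intercept k{alpha}N quoted in the preceding section suggests) that the averaged frequency is proportional to k{alpha}; the shape function F and the factor c are labels introduced here for illustration and do not appear in the original equations.

```latex
\ln\langle\Pi^{(RT)}(D,N)\rangle
  = \ln(k\alpha) + \ln F(D,N;\,h,h',k'/k),
\qquad
\alpha \to c\,\alpha
\;\Longrightarrow\;
\ln\langle\Pi^{(RT)}(D,N)\rangle \to \ln\langle\Pi^{(RT)}(D,N)\rangle + \ln c .
```

Under this assumption, an increase of {alpha} by a factor c raises every point of the curve by the same amount ln c, so the intercept changes while the slope at each D does not.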

Table 2 summarizes the results of the curve fits. The {chi}^2 values show that the curves in our model cannot be fit to the data better than those in the previous models, except for the Mmr++ strains. However, this does not mean that our model fails. First, the previous models are based on the MEPS theory, which has failed to explain the nonlinearity between the recombination frequency and the homology length as discussed in the opening section. Second, the previous models cannot explain the very rapid drop-off well; VULIC et al. 1997 did not include the data point at D = 0.17 in their line fit to the data for the Mmr++ strains, and DATTA et al. 1997 introduced many fitting parameters rather intuitively. Assuming that a connecting point is always destroyed at a diverged site in terms of the random-walk model, we derived Equation 13 to explain the very rapid drop-off observed in the data of DATTA et al. 1997 for the wild-type strains (Figure 5) and in the data of VULIC et al. 1997 for the Mmr++ strains (Figure 8). This equation has the parameters h and k{alpha}, which also determine the dependence of the homologous recombination on the homology length in Equation 11. We have mentioned an agreement between the estimates in Equation 11 and Equation 13 in the paragraph next but one to that containing Equation 13. In particular, how the logarithm drops very rapidly from the intercept is determined by only one parameter, h. This parameter, the relative probability of intermediate processing, is also the key to the relationship between the recombination frequency and the homology length. This very simple explanation for the very rapid drop-off is our main result. The very rapid drop-off is not observed in the wild-type strains of VULIC et al. 1997 (Figure 8), in which a connecting point may not always be destroyed at a diverged site.

We also assumed site dependence of the transition rates for the Mmr- strains of DATTA et al. 1997 and VULIC et al. 1997, in which the very rapid drop-off was not observed (Figure 5 and Figure 8). We adopted a set of the transition rates of the random-trap type and verified that the averaged recombination frequency calculated from Equation 17 tends to that from Equation 13 as a diverged site severely obstructs the homologous recombination (Figure 7). It is possible that Equation 13 is the extreme expression approached not only by Equation 17 but also by a corresponding equation coming from a set of transition rates of another type, because we derived Equation 13 without using a set of transition rates of the random-trap type. This is why we explained the very rapid drop-off before introducing a set of transition rates of the random-trap type, although we can explain it using a set of transition rates of this type.

Although we find that the very rapid drop-off becomes less prominent as a diverged site obstructs the homologous recombination less severely (Figure 7), our curve cannot be fitted to the data of DATTA et al. 1997 for the Mmr- strains better than their fitted line (Figure 5). In particular, our curve cannot follow the apparent concavity shown in their data set. This concavity appears to be absent in the data of VULIC et al. 1997 for the Mmr- strains (Figure 8). To this data set, our curve can be fitted as well as their line.

We supposed that the MMR system, if active enough, detects mismatches to abort the homologous recombination as in VULIC et al. 1997 and DATTA et al. 1997. However, WALDMAN and LISKAY 1988, by studying recombination between plasmids and herpes simplex virus in a mammalian cell, claimed that the recombination frequency is determined not by the divergence but by the length of a divergence-free stretch, and that the heteroduplex can elongate through a region with significant divergence. Furthermore, MAJEWSKI and COHAN 1998 studied sexual isolation in Bacillus and concluded that the reduction in the recombination frequency due to the sequence divergence is caused predominantly by resistance to the heteroduplex formation and only fractionally by mismatch repair. NEGRITTO et al. 1997, on the contrary, reported the relevance of the MMR system by analyzing the recombination between a DNA fragment and a genomic target in a yeast system, although they also found that only mismatches close to the edge of the fragment can inhibit the recombination.

To explain all these findings, we may also have to take into account possible influence of the divergence on the initial events in the random-walk model. PORTER et al. 1996 suggested that the relevance of the MMR system to the reduction of the recombination frequency caused by sequence divergence depends on the system. Whether the site dependence of the transition rates in the random walk or the influence of the divergence on the initial events is relevant to the reduction could depend on the system.

The data of DATTA et al. 1997 show the difference in the intercept between the wild-type strains and the Mmr- strains (Figure 5), which implies that the MMR system influences the recombination frequency between identical substrates, as they pointed out. We have explained the difference by adjusting the h and k{alpha} values. The intercept of our curve for the Mmr- strains (the upper solid curve in Figure 5) is larger by 1.5 than that of our curve for the wild-type strains (the lower solid curve in Figure 5). Of this difference, 0.9 is caused by the difference in k{alpha} and the rest is caused by the difference in h as calculated with Equation 12. On the contrary, as discussed in the preceding section, both the h and k{alpha} values need not remain fixed in explaining (almost) the same intercepts among the data sets of VULIC et al. 1997 for the three types of strains (Figure 8). Equation 12 tells that the intercept, i.e., the logarithm of the recombination frequency between identical substrates, is insensitive to the h value in the linear-dependence range; we have only to fix k{alpha} among the three types of strains.

We again emphasize that the random-walk model can explain, in a straightforward way, the linear dependence and the nonlinear dependence of the recombination frequency on the homology length, the presence or the absence of the very rapid drop-off and the SOS-induced change of the intercept in the relationship between the recombination frequency and the sequence divergence, and the map expansion. We therefore believe that the random-walk model helps in understanding essential aspects of the reaction of the homologous recombination.


*  ACKNOWLEDGMENTS

Y.F. acknowledges helpful advice of Dr. G. J. M. Koper and Professor K. Kitahara. He also thanks Y. Mizoguchi and J. Kawai, who helped him in some of the curve fits. The work by Y.F. was supported by Keio Gakuji Shinko Shikin. The work by I.K. was supported by grants from the Ministry of Education, Science, Sports and Culture of Japanese government (Class C, Class B, Repair, Genome), Nagase Science and Technology Foundation, Takeda Science Foundation, Yakult Bio-Science Foundation, and New Energy and Industrial Technology Development Organization (NEDO).

Manuscript received March 8, 1999; Accepted for publication August 11, 1999.


*  APPENDIX A

Using Equations 5 and B9 of FUJITANI et al. 1995, we obtain from Equation 7

(A1)

where

(A2)

However, a simpler expression of p(m;n)*({infty}), equivalent to the above, saves us computing time.

Using the Laplace transform of pj(t),

(A3)

we obtain from Equation 6

(A4)

where L is an n x n matrix

(A5)

From Equation 7, we thus obtain

(A6)

where L^-1 is the inverse of L. Thus, we have from Equation 10

(A7)

Equation A6 is equivalent to Equation 16 of FUJITANI and KOBAYASHI 1995 under "{gamma} = 0." As is well known in the research community of path integrals [see, e.g., Equation 3.41 of SAKITA and KIKKAWA 1986], we have

(A8)

Here, {phi} is defined by Equation 9 and satisfies

(A9)

One can check that substituting Equation A5 and Equation A8 into L L^-1 produces the n x n unit matrix. [One way to derive Equation A8 is substituting "x1" and "xN-1" obtained from Equations B8 and B9 into Equations B5 and B7 of FUJITANI and KOBAYASHI 1995 under {gamma} = 0.]

Using Equation A8, we have

(A10)

where we used Equations 1.341.2, 1.314.6, 1.334.1, and 1.313.2 of GRADSHTEYN and RYZHIK 1980. Equation A6 and Equation A10 yield Equation 8 with the aid of Equation A9.

Equation A10 leads to

(A11)

where we used Equations 1.314.6, 1.341.4, and 1.313.2 of GRADSHTEYN and RYZHIK 1980. From Equation A9 and Equation A11, we obtain

(A12)

When {phi} << 1, we have h {approx} 4{phi}^2 from Equation A9, and Equation A12 produces Equation 11 because coth {phi} {approx} 1/{phi}.


*  APPENDIX B

Suppose (N + 1)/2 >= l. Then, the identical subregion can reach neither end of the homologous region if n <= l - 1, but it reaches only the left end if l <= n <= N - l and m = l. Considering it in this way and writing F for Fl(m, n), we have

1. Case of l = 1:

When 1 <= n <= N - 1,

When n = N,

2. Case of l = 2: When n = 1, F = D^2(1 - D). When 2 <= n <= N - 2,

When n = N - 1,

When n = N,

3. Cases of 3 <= l <= (N + 1)/2: When n <= l - 1, F = D^2(1 - D)^n. When l <= n <= N - l [this case does not exist if l = (N + 1)/2],

When N - l + 1 <= n <= N - 2,

When n = N - 1,

When n = N,

We can obtain Fl(m, n) for l > (N + 1)/2 by using

(B1)

which comes from the symmetry of the one-dimensional lattice where the random walk occurs. When D = 0, the above Fl(m, n) is reduced to

(B2)

The lth site of a homologous region is diverged with probability D, and otherwise it is the mth site of an identical subregion with n sites with probability Fl(m, n), where 1 <= m <= n and 1 <= n <= N. Thus, the normalization condition is given by

(B3)

It is easy to see that this condition is satisfied when D = 0 because of Equation B2. Let us next check this condition when D != 0 and l <= (N + 1)/2; we then have

(B4)

Here, the first term does not exist when l = 1, the second term does not exist when l = 1 and when l = (N + 1)/2, the third term does not exist when l = (N + 1)/2, and the fourth term does not exist when l <= 2. Using the sum formulas of the geometric series and the arithmetico-geometric series [Equations 0.112 and 0.113 of GRADSHTEYN and RYZHIK 1980, respectively], we can derive Equation B3 from Equation B4. Similarly, we can derive Equation B3 when D != 0 and l > (N + 1)/2.
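For a small lattice, the normalization condition and the interior case F = D^2(1 - D)^n can be checked by brute force. The sketch below enumerates every pattern of diverged and identical sites, each site being diverged independently with probability D; the function name and the choice N = 7, l = 4, D = 0.1 are illustrative and are not taken from the article.

```python
from itertools import product
import math


def subregion_distribution(N, l, D):
    """Exact Fl(m, n) obtained by enumerating all 2**N site patterns.

    Sites are numbered 1..N. The returned dict maps (m, n) to the probability
    that site l is the m-th site of an identical subregion with n sites.
    """
    F = {}
    for pattern in product((False, True), repeat=N):  # True means diverged
        if pattern[l - 1]:
            continue  # site l itself is diverged
        k = sum(pattern)
        weight = D**k * (1 - D) ** (N - k)
        start = l
        while start > 1 and not pattern[start - 2]:  # extend to the left
            start -= 1
        end = l
        while end < N and not pattern[end]:  # extend to the right
            end += 1
        key = (l - start + 1, end - start + 1)
        F[key] = F.get(key, 0.0) + weight
    return F


N, l, D = 7, 4, 0.1
F = subregion_distribution(N, l, D)
total = D + sum(F.values())  # normalization: should equal 1
interior_ok = all(
    math.isclose(F[(m, n)], D**2 * (1 - D) ** n)
    for n in range(1, l)
    for m in range(1, n + 1)
)
```

The enumeration grows as 2^N, so it is only a consistency check for small N and not a substitute for the closed-form expressions.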


*  APPENDIX C

Following the derivation of Equation A7, we can obtain from Equations 14-16

(C1)

where M and V are N x N matrices,

(C2)

and

(C3)

with being an arbitrary real number.

We can expand the inverse of the matrix in Equation C1 as

(C4)

where

(C5)

This is a generalization of Equation A8, and is defined so as to satisfy

(C6)

Introducing an N x N matrix,

(C7)

we obtain from Equations C1 and C4

(C8)

where {Delta}nj {equiv} hnj - . Equation C8 tells that {Pi}(RT)(N) is independent of g and g'.

Each of the products hn0 kn0 and hn0kn0 ({Delta}n1)({Delta}n2) ... ({Delta}nq) is put between the angle brackets, < and >, when Equation C8 is averaged over positions of diverged sites. Let us consider the average of the latter product. Suppose that the subscripts n0, n1, ... , nq contain r(: 0 <= r <= q) kinds of numbers, m0({equiv}n0), m1, ... , mr, and that the subscripts n0, n1, ..., nq are composed of N0 pieces of m0, N1 pieces of m1, ... , and Nr pieces of mr. Then, the average of the product is given by

(C9)

However, because all the subscripts n0, n1, ... , nq are different from each other in the overwhelming majority of terms appearing in the summation n0, n1, ..., nq of Equation C8, we can decouple the average of the product approximately as

(C10)

which coincides with the case of r = q and Ni = 1 for any i in Equation C9. This decoupling approximation is valid when both h - and h' - are set to be small enough as compared to unity to make terms of higher power with respect to them negligible in Equation C9. Then, Equation C8 reads

(C11)

Expanding the inverse of a matrix

(C12)

as in Equation C4, where E is the N x N unit matrix, we obtain the infinite series in the brackets of Equation C11. The matrix, Equation C12, turns out to be the matrix L with h replaced by and n replaced by N, where L is defined by Equation A5 and is defined just below Equation 18. Because replacing as such in Equation A8 gives the inverse of Equation C12, replacing as such in Equation A11 gives the summation in Equation C11. Thus, the decoupling approximation yields Equation 18 irrespective of .


*  LITERATURE CITED

AHN, B., K. J. DORNFELD, T. J. FAGRELIUS, and D. M. LIVINGSTON, 1988  Effect of limited homology on gene conversion in a Saccharomyces cerevisiae plasmid recombination system. Mol. Cell. Biol. 8:2442-2448.

DATTA, A., M. HENDRIX, M. LIPSITCH, and S. JINKS-ROBERTSON, 1997  Dual roles for DNA sequence identity and the mismatch repair system in the regulation of mitotic crossing-over in yeast. Proc. Natl. Acad. Sci. USA 94:9757-9762.

DENG, C. and M. R. CAPECCHI, 1992  Reexamination of the gene targeting frequency as a function of the extent of homology between the targeting vector and the target locus. Mol. Cell. Biol. 12:3365-3371.

DENTENEER, P. J. H. and M. H. ERNST, 1984  Diffusion in systems with static disorder. Phys. Rev. B 29:1755-1768.

EYRING, H., and E. M. EYRING, 1963 Modern Chemical Kinetics. Reinhold, New York.

FINCHAM, J. R. S. and R. HOLLIDAY, 1970  An explanation of fine structure map expansion in terms of excision repair. Mol. Gen. Genet. 109:309-322.

FUJITANI, Y. and I. KOBAYASHI, 1995  Random-walk model of homologous recombination. Phys. Rev. E 52:6607-6622.

FUJITANI, Y. and I. KOBAYASHI, 1997  Mismatch-stimulated destruction of intermediates as an explanation for map expansion in genetic recombination. J. Theor. Biol. 189:443-447.

FUJITANI, Y., K. YAMAMOTO, and I. KOBAYASHI, 1995  Dependence of frequency of homologous recombination on the homology length. Genetics 140:797-809.

GRADSHTEYN, I. S., and I. M. RYZHIK, 1980 Tables of Integrals, Series, and Products. Academic Press, New York.

HAUS, J. W. and K. W. KEHR, 1987  Diffusion in regular and disordered lattices. Phys. Rep. 150:263-406.

HOLLIDAY, R., 1964  A mechanism for gene conversion in fungi. Genet. Res. 5:282-304.

JINKS-ROBERTSON, S., M. MICHELITCH, and S. RAMCHARAN, 1993  Substrate length requirements for efficient mitotic recombination in Saccharomyces cerevisiae. Mol. Cell. Biol. 13:3937-3950.

MAJEWSKI, J. and F. M. COHAN, 1998  The effect of mismatch repair and heteroduplex formation on sexual isolation in Bacillus. Genetics 148:13-18.

NEGRITTO, M. T., X. WU, T. KUO, S. CHU, and A. M. BAILIS, 1997  Influence of DNA sequence identity on efficiency of targeted gene replacement. Mol. Cell. Biol. 17:278-286.

PANYUTIN, I. G. and P. HSIEH, 1993  Formation of a single base mismatch impedes spontaneous DNA branch migration. J. Mol. Biol. 230:413-424.

PORTER, G., J. WESTMORELAND, S. PRIEBE, and M. A. RESNICK, 1996  Homologous and homeologous intermolecular gene conversion are not differentially affected by mutations in the DNA damage or the mismatch repair genes RAD1, RAD50, RAD51, RAD52, RAD54, PMS1 and MSH2. Genetics 143:755-767.

ROBERTS, M. S. and F. M. COHAN, 1993  The effect of DNA sequence divergence on sexual isolation. Genetics 134:401-408.

RUBNITZ, J. and S. SUBRAMANI, 1984  The minimum amount of homology required for homologous recombination in mammalian cells. Mol. Cell. Biol. 4:2253-2258.

SAKITA, B., and K. KIKKAWA, 1986 Keiro Sekibun ni yoru Taryuushikei no Ryoushirikigaku (Quantum Mechanics of Many-Particle Systems and Path Integrals). Iwanami, Tokyo (in Japanese).

SHEN, P. and H. V. HUANG, 1986  Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics 112:441-457.

SHEN, P. and H. V. HUANG, 1989  Effect of base pair mismatches on recombination via the RecBCD pathway. Mol. Gen. Genet. 218:358-360.

SINGER, B. S., L. GOLD, P. GAUSS, and D. H. DOHERTY, 1982  Determination of the amount of homology required for recombination in bacteriophage T4. Cell 31:25-33.

SUGAWARA, N. and J. E. HABER, 1992  Characterization of double-strand break-induced recombination: homology requirements and single-stranded DNA formation. Mol. Cell. Biol. 12:563-575.

THOMPSON, B. J., M. N. CAMIEN, and R. C. WARNER, 1976  Kinetics of branch migration in double-stranded DNA. Proc. Natl. Acad. Sci. USA 73:2299-2303.

VAN KAMPEN, N. G., 1981 Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam.

VULIC, M., F. DIONISIO, F. TADDEI, and M. RADMAN, 1997  Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria. Proc. Natl. Acad. Sci. USA 94:9763-9767.

WALDMAN, A. S. and R. M. LISKAY, 1988  Dependence of intrachromosomal recombination in mammalian cells on uninterrupted homology. Mol. Cell. Biol. 8:5350-5357.

ZAWADZKI, P., M. S. ROBERTS, and F. M. COHAN, 1995  The log-linear relationship between sexual isolation and sequence divergence in Bacillus transformation is robust. Genetics 140:917-932.



