Genetics, Vol. 160, 353-355, January 2002, Copyright © 2002

A Simple Formula Useful for Positional Cloning

Richard T. Durrett (a), Kai-Yi Chen (b), and Steven D. Tanksley (b)
(a) Department of Mathematics, Cornell University, Ithaca, New York 14853-4201
(b) Department of Plant Breeding and Department of Plant Biology, Cornell University, Ithaca, New York 14853-1902

Corresponding author: Steven D. Tanksley, 252 Emerson Hall, Cornell University, Ithaca, NY 14853-1902. E-mail: sdt4@cornell.edu

Communicating editor: J. A. BIRCHLER


*  ABSTRACT

We derive a formula for the distribution of the length T of the recombination interval containing a target gene and using N gametes in a region where R kilobases correspond to 1 cM. The formula can be used to calculate the number of meiotic events required to narrow a target gene down to a specific interval size and hence should be useful for planning positional cloning experiments. The predictions of this formula agree well with the results from a number of published experiments in Arabidopsis.


POSITIONAL cloning has been widely used in both plants and animals to isolate genes known only by their phenotypic effects. Underlying positional cloning is the assumption that a gene can be pinpointed with sufficient precision to narrow its location to a DNA segment small enough to be sequenced and/or subjected to transformation/complementation experiments (TANKSLEY et al. 1995; LUKOWITZ et al. 2000).

The chromosomal position of a gene targeted for positional cloning is typically defined by the closest flanking crossover events. So, if a gene is to be pinpointed to a defined segment size, at least two crossovers are required, one on either side of the target gene (Figure 1). In principle, given an estimate of the kilobase to centimorgan ratio for a particular genome or genomic region, one can estimate the number of meiotic gametes that must be sampled to narrow the position of the gene to a prescribed physical segment of DNA. Surprisingly, despite the large number of genes that have been isolated by positional cloning and the popularity of this technique, we have been unable to find a published formula for making this calculation. For this reason, we describe here the derivation of such a formula and its application to positional cloning.



Figure 1. Independent meiotic crossovers (x) observed in a segment of chromosome containing a gene targeted for positional cloning. T is the size of the genomic segment in kilobases between the two closest flanking crossovers.


*  THE FORMULA AND ITS APPLICATIONS

These calculations assume that one can observe crossovers in gametes derived from an F1 hybrid that is heterozygous for the target locus, that the genotype of the target locus can be unambiguously determined, and that recombination rates can be assumed to be constant near the target locus. Let T represent the distance (in kilobases) between two crossovers that bracket a target gene; R the kb/cM ratio for the genomic region in question; N the number of gametes to sample (N is equivalent to the number of testcross progeny or two times the number of F2 progeny); and P the probability of finding in N gametes at least two crossovers (one on each side of the target gene) separated by a physical distance of less than T.

Then:

$$P = 1 - \left(1 + \frac{NT}{100R}\right) e^{-NT/(100R)}$$

This formula assumes that the kilobase/centimorgan ratio (R) is constant in the region of width 2T centered at the target gene. The formula should be applicable to plants, animals, and any organism in which screens can be made for meiotic recombination. The greatest efficiency will be obtained in populations where meiotic recombination can be deduced simultaneously for both male and female gametes (e.g., F2 or recombinant inbred populations).

To illustrate the use of this formula, suppose that we are working in a region where 250 kb corresponds to 1 cM and we are interested in a target size of 100 kb, the size of a bacterial artificial chromosome. Table 1 shows the results for T = 100 kb and R = 250 kb/cM. We have included the middle column to emphasize the fact that the answer depends on the variables only through the ratio NT/(100R). One consequence of this is that the sample size needed is proportional to the estimated kb/cM ratio R. If one believes that there are 750 kb/cM, then the needed sample size for a given success probability will be three times as large.
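As a rough illustration of this example, the sketch below evaluates P = 1 - (1 + x)e^{-x}, with x = NT/(100R), which is the closed form that follows from the gamma distribution described in the Derivation section. The function name `success_probability` and the sample sizes in the loop are assumptions chosen for illustration; they are not the entries of Table 1.

```python
import math

def success_probability(n_gametes, target_kb, kb_per_cm):
    """Probability that the two closest flanking crossovers are < target_kb apart.

    Assumes a constant kb/cM ratio near the target, as in the text.
    """
    x = n_gametes * target_kb / (100.0 * kb_per_cm)  # design ratio NT/(100R)
    return 1.0 - (1.0 + x) * math.exp(-x)

# Illustrative sample sizes (hypothetical, not taken from Table 1).
for n in (250, 500, 1000, 2000):
    ratio = n * 100 / (100.0 * 250)
    print(n, ratio, round(success_probability(n, 100, 250), 3))

# Tripling R from 250 to 750 kb/cM requires tripling N for the same P.
print(math.isclose(success_probability(1500, 100, 750),
                   success_probability(500, 100, 250)))
```

Because the result depends on N, T, and R only through the ratio NT/(100R), any change in R can be offset by a proportional change in N.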


 
Table 1. Probability that the target interval is <100 kb for various sample sizes N

Table 2 allows one to pick the sample size that will produce a given probability. For example, if we want the recombination interval to be smaller than the target with probability P = 0.95, then we should take NT/(100R) = 4.744. In our example T = 100 kb and R = 250 kb/cM, so this translates into N = 1186.
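The design ratio for a chosen success probability can also be found numerically. The sketch below inverts P = 1 - (1 + x)e^{-x} by bisection; the function names and the tolerance are assumptions made for illustration, and bisection is simply one convenient root-finding choice rather than the method used to build Table 2.

```python
import math

def probability_from_ratio(x):
    """Success probability for design ratio x = NT/(100R)."""
    return 1.0 - (1.0 + x) * math.exp(-x)

def design_ratio(p, tol=1e-10):
    """Smallest design ratio x giving success probability >= p (bisection)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while probability_from_ratio(hi) < p:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if probability_from_ratio(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi

def gametes_needed(p, target_kb, kb_per_cm):
    """Number of gametes N so that N * T / (100 R) reaches the design ratio."""
    return math.ceil(design_ratio(p) * 100.0 * kb_per_cm / target_kb)

print(round(design_ratio(0.95), 3))      # design ratio for P = 0.95
print(gametes_needed(0.95, 100, 250))    # gametes for T = 100 kb, R = 250 kb/cM
```

With P = 0.95, T = 100 kb, and R = 250 kb/cM, this reproduces the design ratio 4.744 and the sample size N = 1186 quoted in the text.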


 
Table 2. Values of the design ratio NT/(100R) that are needed to achieve success with probability P

As a final check on our formula we compare its predictions with the results of positional cloning experiments in Arabidopsis (Table 2 in LUKOWITZ et al. 2000). In Table 3, N is the number of gametes (two times the number of individuals screened), T gives the estimated mapping resolution in kilobases, and P is the probability as computed by our formula of getting a result better than the one observed assuming that R = 250 kb/cM (the value they suggest in their article). Note that most of the P values are in the range 0.25–0.75, indicating that these experimental results are typical of what we expect. (If the assumptions underlying our formula are correct then the observed P values will be uniformly distributed between 0 and 1.) One group got lucky in localizing their target to a 20-kb interval using only a sample of size 972, while another was somewhat unlucky when their sample of size 1914 resulted in a 50-kb interval. These results and the fact that the target size is proportional to R suggest that R = 250 kb/cM is a good guess for Arabidopsis.
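The two experiments singled out above can be checked directly. The following sketch computes P for the (N, T) pairs mentioned in the text, assuming R = 250 kb/cM; the helper name `p_better_than_observed` is hypothetical, and only these two pairs are taken from the text (the remaining rows of Table 3 are not reproduced here).

```python
import math

def p_better_than_observed(n_gametes, observed_kb, kb_per_cm=250.0):
    """Probability of an interval shorter than the one actually observed."""
    x = n_gametes * observed_kb / (100.0 * kb_per_cm)
    return 1.0 - (1.0 + x) * math.exp(-x)

# (N gametes, observed interval in kb) for the two cases named in the text.
lucky = p_better_than_observed(972, 20)
unlucky = p_better_than_observed(1914, 50)
print(round(lucky, 3), round(unlucky, 3))
```

A small P means few experiments of that size would do better (a lucky outcome), while a P near 1 means most would do better (an unlucky outcome).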


 
Table 3. Results of positional cloning experiments for various Arabidopsis genes (reported in LUKOWITZ et al. 2000)


*  DERIVATION

The formula is obtained by combining the following facts:

  1. When distances are measured in morgans, recombinations follow a Poisson process with rate 1; i.e., the distances between successive crossovers are independent and have an exponential density with mean 1.

  2. When recombinations from N gametes are combined, the result is a Poisson process with rate N; i.e., the distances between successive crossovers are independent and have an exponential density with mean 1/N. See, for example, DURRETT 1999 (p. 141).

  3. The distances to the first crossover to the left (X) and right (Y) of the target site are independent exponentials with mean 1/N, so their sum X + Y has a gamma distribution. See, for example, DURRETT 1999 (p. 129). This implies

$$P(X + Y \le z) = 1 - e^{-Nz} - Nz\,e^{-Nz}.$$

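The distances X and Y are measured in morgans. Because R kb correspond to 1 cM and 100 cM to 1 morgan, a target of T kb corresponds to z = T/(100R) morgans; substituting this value of z expresses the result in kilobases.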

The reader should note that until the last step all of our computations are exact; i.e., the size of the recombination interval is measured in morgans and has a gamma distribution. In the last step we use the assumption of a constant recombination rate per unit distance near (i.e., within 2T kb) the target to convert from morgans to kilobases.
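The derivation can be sanity-checked by simulation. The sketch below draws X and Y as independent exponentials with mean 1/N (in morgans) and compares the empirical frequency of X + Y ≤ z with the gamma distribution function 1 - e^{-Nz} - Nz e^{-Nz}. The function names, the number of trials, and the seed are arbitrary assumptions, and the example values N = 1000, T = 100 kb, R = 250 kb/cM are chosen only for illustration.

```python
import math
import random

def exact_cdf(n_gametes, z):
    """P(X + Y <= z) for independent exponentials with mean 1/n_gametes."""
    nz = n_gametes * z
    return 1.0 - math.exp(-nz) - nz * math.exp(-nz)

def simulated_cdf(n_gametes, z, trials=100_000, seed=1):
    """Monte Carlo estimate of P(X + Y <= z)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.expovariate(n_gametes)  # distance to nearest crossover on the left
        y = rng.expovariate(n_gametes)  # distance to nearest crossover on the right
        if x + y <= z:
            hits += 1
    return hits / trials

n, t_kb, r = 1000, 100, 250
z = t_kb / (100.0 * r)  # target length converted from kb to morgans
print(exact_cdf(n, z), simulated_cdf(n, z))
```

The two printed values should agree to within Monte Carlo error, which is on the order of 0.001 for this number of trials.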


*  DISCUSSION

Two situations severely reduce or suppress crossover events: centromeric (heterochromatic) regions, and a chromosomal inversion that distinguishes the two parents used to create the mapping population. In both cases, positional cloning is not an appropriate strategy for isolating the target gene.

Chromosomal interference also reduces crossover events in some chromosomal regions and results in R values that vary across the genome. However, chromosomal interference guarantees some minimum number of crossovers (COPENHAVER et al. 1998), so the formula is still useful in this situation. As noted above, we do not need to assume a constant R value across the whole genome. If the target chromosomal region has a much larger R than the estimated average for the species, the formula can be used to calculate the required number of gametes on the basis of the new R value. For example, for the positional cloning of HY2, two rounds of recombinant screening were performed (KOHCHI et al. 2001). In the first screening, HY2 was mapped to an interval of ~360 kb, equal to 0.51 cM, so R was estimated to be ~700 kb/cM, which is much larger than the Arabidopsis average of 250 kb/cM. The second screening then used 3818 gametes to delimit HY2 to a 66-kb contig.
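To see how much the regional R value matters in the HY2 example, the sketch below evaluates the formula for N = 3818 gametes and T = 66 kb under both the genome-average ratio (250 kb/cM) and the locally estimated ratio (~700 kb/cM). The function and variable names are hypothetical, and the calculation is an illustration of the formula rather than an analysis reported by the authors.

```python
import math

def success_probability(n_gametes, target_kb, kb_per_cm):
    """P = 1 - (1 + x) exp(-x) with x = N T / (100 R)."""
    x = n_gametes * target_kb / (100.0 * kb_per_cm)
    return 1.0 - (1.0 + x) * math.exp(-x)

local_r = 360 / 0.51  # kb/cM estimated from the first screening (~700)
p_average = success_probability(3818, 66, 250)  # using the genome average
p_local = success_probability(3818, 66, 700)    # using the regional estimate
print(round(local_r), round(p_average, 4), round(p_local, 3))
```

Under the genome-average R, 3818 gametes would appear to be far more than needed, whereas under the regional estimate the same sample gives a success probability below 0.9, which illustrates why the regional R value should be used for planning when it is available.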


*  ACKNOWLEDGMENTS

Richard T. Durrett was partially supported by a grant from the program in probability at the National Science Foundation (no. DMS 9877066). Steven D. Tanksley was supported by grants from the National Science Foundation (no. DBI-9872617) and U.S. Department of Agriculture Plant Genome Program (no. 97-35300-4384).

Manuscript received July 19, 2001; Accepted for publication October 15, 2001.


*  LITERATURE CITED

COPENHAVER, G. P., W. E. BROWNE, and D. PREUSS, 1998 Assaying genome-wide recombination and centromere functions with Arabidopsis tetrads. Proc. Natl. Acad. Sci. USA 95:247-252.

DURRETT, R., 1999 The Essentials of Stochastic Processes. Springer, New York.

KOHCHI, T., K. MUKOUGAWA, N. FRANKENBERG, M. MUNEHISA, and A. YOKOTA et al., 2001 The Arabidopsis HY2 gene encodes phytochromobilin synthase, a ferredoxin-dependent biliverdin reductase. Plant Cell 13:425-436.

LUKOWITZ, W., C. S. GILLMOR, and W. R. SCHEIBLE, 2000 Positional cloning in Arabidopsis. Why it feels good to have a genome initiative working for you. Plant Physiol. 123:795-806.

TANKSLEY, S. D., M. W. GANAL, and G. B. MARTIN, 1995 Chromosome landing: a paradigm for map-based gene cloning in plants with large genomes. Trends Genet. 11:63-68.



