Comparative Genomics, Marker Density and Statistical Analysis of Chromosome Rearrangements
Daniel J. Schoen, Department of Biology, McGill University, Montreal, Quebec H3A 1B1, Canada
Corresponding author: Daniel J. Schoen, McGill University, 1205 Ave. Docteur Penfield, Montreal, Quebec H3A 1B1, Canada. E-mail: dan_schoen{at}maclan.mcgill.ca
Communicating editor: A. G. CLARK
| ABSTRACT |
|---|
Estimates of the number of chromosomal breakpoints that have arisen (e.g., by translocation and inversion) in the evolutionary past between two species and their common ancestor can be made by comparing map positions of marker loci. Statistical methods for doing so are based on a random-breakage model of chromosomal rearrangement. The model treats all modes of chromosome rearrangement alike, and it assumes that chromosome boundaries and breakpoints are distributed randomly along a single genomic interval. Here we use simulation and numerical analysis to test the validity of these model assumptions. Mean estimates of numbers of breakpoints are close to those expected under the random-breakage model when marker density is high relative to the amount of chromosomal rearrangement and when rearrangements occur by translocation alone. But when marker density is low relative to the number of chromosomes, and when rearrangements occur by both translocation and inversion, the number of breakpoints is underestimated. The underestimate arises because rearranged segments may contain markers, yet the rearranged segments may, nevertheless, be undetected. Variances of the estimate of numbers of breakpoints decrease rapidly as markers are added to the comparative maps, but are less influenced by the number or type of chromosomal rearrangement separating the species. Variances obtained with simulated genomes composed of chromosomes of equal length are substantially lower than those obtained when chromosome size is unconstrained. Statistical power for detecting heterogeneity in the rate of chromosomal rearrangement is also investigated. Results are interpreted with respect to the amount of marker information required to make accurate inferences about chromosomal evolution.
Evolutionary change in the macrostructure of individual chromosomes occurs largely by reciprocal translocation and inversion. During the course of the independent evolutionary histories separating two species from their common ancestor, divergence in chromosome structure arising from chromosomal rearrangement is manifested as the progressive fractionation of the genome into increasingly smaller conserved chromosome segments.
The discovery of conserved segments of chromosomes among taxa suggests that it may be possible to construct unified genetic maps for a number of organismal groups (e.g., the grasses, higher plants, fishes, and mammals).
Currently there are few statistical tools for comparing genetic maps, and most studies are based on visual inspection of shared syntenies and conserved gene arrangements. One of the more useful analytical approaches for interpreting comparative map data was initiated by Nadeau and Taylor (1984).
In practice, the information required to apply the random-breakage model to the estimation of chromosomal evolution comes from the comparative mapping of homologous marker loci such as conserved expressed sequence tags (ESTs).
| METHODS |
|---|
Maximum-likelihood estimates of chromosomal divergence and their variances (numerical solutions):
In analytical studies of the random-breakage model, the genome is represented as a single interval of unit length 1.0, broken at n randomly placed positions (e.g., by translocations and inversions) as well as by chromosome endpoints.

[Equation 1]
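Equation 1 gives the probability that a conserved segment carries r markers. As a hedged sketch, the snippet below assumes the standard random-breakage setting: n breakpoints and m markers placed independently and uniformly on the unit interval, so that the length of a segment follows a Beta(1, n) distribution. Integrating the binomial probability of r markers over that length distribution gives P(r) = [n/(n + m)] C(m, r)/C(n + m - 1, r). The function name `p_markers_in_segment` is hypothetical, and this closed form is an assumption rather than a transcription of the printed Equation 1.

```python
from math import comb


def p_markers_in_segment(r: int, n: int, m: int) -> float:
    """P(r): probability that a conserved segment carries exactly r of m markers.

    Assumed (illustrative) setting: n >= 1 breakpoints and m markers placed
    independently and uniformly on [0, 1]; a segment's length then follows a
    Beta(1, n) law, and integrating Binomial(m, x) over that density gives
        P(r) = n / (n + m) * C(m, r) / C(n + m - 1, r).
    """
    if n < 1:
        raise ValueError("need at least one breakpoint")
    if not 0 <= r <= m:
        return 0.0
    # int / int true division keeps the ratio of large binomials accurate
    return n / (n + m) * (comb(m, r) / comb(n + m - 1, r))


if __name__ == "__main__":
    n, m = 30, 100  # hypothetical numbers of breakpoints and markers
    probs = [p_markers_in_segment(r, n, m) for r in range(m + 1)]
    print(f"sum over r = {sum(probs):.6f}")  # the distribution should sum to 1
    print(f"P(0) = {probs[0]:.4f}, n/(n+m) = {n / (n + m):.4f}")
    print(f"P(1) = {probs[1]:.4f}")
```

Under these assumptions P(0) reduces to n/(n + m), which is the quantity that later determines how many segments are expected to remain empty and therefore undetected.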
In a comparative mapping study one uncovers conserved segments containing r ≥ 1 marker genes. These are referred to as "nonempty segments." There will also be a certain number of conserved segments that contain no markers and thus remain undetected (empty segments). Comparison of the maps of the two species provides the number of nonempty segments containing r marker genes ($s_r$, where r ≥ 1). The total number of nonempty segments, $a = \sum_{r \geq 1} s_r$, is sufficient for calculating the likelihood that there are n chromosomal breakpoints separating the two species:

[Equation 2]
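Equation 2 is the likelihood of n given the observed a and m. One way to make it concrete, under the same illustrative assumptions of uniformly placed breakpoints and markers: every interleaving of the n breakpoints and m markers is then equally likely, and counting the interleavings in which exactly a of the n + 1 segments receive markers gives P(a | n, m) = C(n + 1, a) C(m - 1, a - 1)/C(n + m, m). The sketch below evaluates this on the log scale and locates the MLE. The function names are hypothetical, and the correspondence with the printed form of Equation 2 is assumed, not confirmed.

```python
from math import inf, lgamma


def log_comb(x: float, k: float) -> float:
    """log C(x, k) via lgamma, so non-integer arguments are also accepted."""
    return lgamma(x + 1) - lgamma(k + 1) - lgamma(x - k + 1)


def log_likelihood(n: float, a: float, m: int) -> float:
    """log P(a | n, m) = log[C(n+1, a) C(m-1, a-1) / C(n+m, m)] (assumed form)."""
    if a < 1 or a > m or a > n + 1:
        return -inf  # impossible configuration
    return log_comb(n + 1, a) + log_comb(m - 1, a - 1) - log_comb(n + m, m)


def mle_breakpoints(a: int, m: int, n_max: int = 100_000) -> int:
    """Integer n maximizing log_likelihood for the observed a and m.

    L(n+1)/L(n) = (n+2)/(n+2-a) * (n+1)/(n+m+1) crosses 1 only once, so
    walking upward until the likelihood stops increasing finds the peak.
    If a == m (every marker isolated) the likelihood never peaks and the
    walk stops at n_max.
    """
    n = max(a - 1, 1)  # smallest n for which a <= n + 1
    while n < n_max and log_likelihood(n + 1, a, m) > log_likelihood(n, a, m):
        n += 1
    return n


if __name__ == "__main__":
    m = 200  # hypothetical number of mapped markers
    for a in (21, 41, 81):
        print(f"a = {a:3d} nonempty segments, m = {m} -> n_hat = {mle_breakpoints(a, m)}")
```

The upward walk replaces a full grid search only because the likelihood ratio test above shows the assumed likelihood is unimodal in n; with a different likelihood a bounded grid search would be the safer choice.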
Numerical analysis of Equation 2 allows one to determine the maximum-likelihood estimate (MLE) of n (hereafter $\hat{n}$). The estimated asymptotic variance of $\hat{n}$ can be calculated as

[Equation 3]

To evaluate $\hat{n}$ and its variance estimate, we require the value of a expected when there have been n chromosomal breakpoints and m markers. This expected value, a*, is derived from Equation 1 as

[Equation 4]
This solution allows one to obtain numerical solutions to Equation 3 under different combinations of n, m, and a*, and thereby examine how the numbers of markers used and the actual amount of chromosomal evolution influence the estimates of chromosomal rearrangement and their variances.
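The numerical procedure just described can be sketched as follows, under the same assumed likelihood P(a | n, m) = C(n + 1, a) C(m - 1, a - 1)/C(n + m, m). If n breakpoints define n + 1 segments, each empty with probability P(0) = n/(n + m), then a* = (n + 1)m/(n + m). The variance is approximated here by the usual inverse of the observed information, -1/(d² ln L/dn²), with the second derivative taken by a central finite difference on the lgamma-based log-likelihood at a = a*. The step size `h` and all function names are illustrative choices, not taken from the original study.

```python
from math import inf, lgamma, sqrt


def _log_comb(x: float, k: float) -> float:
    return lgamma(x + 1) - lgamma(k + 1) - lgamma(x - k + 1)


def log_likelihood(n: float, a: float, m: int) -> float:
    """Assumed form of Equation 2: log[C(n+1, a) C(m-1, a-1) / C(n+m, m)]."""
    if a < 1 or a > m or a > n + 1:
        return -inf
    return _log_comb(n + 1, a) + _log_comb(m - 1, a - 1) - _log_comb(n + m, m)


def expected_nonempty(n: float, m: int) -> float:
    """a* = (n + 1) * (1 - P(0)) with P(0) = n / (n + m)."""
    return (n + 1) * m / (n + m)


def asymptotic_variance(n: float, m: int, h: float = 0.05) -> float:
    """-1 / (d^2 ln L / dn^2) at n, with the observation set to a*(n, m)."""
    a = expected_nonempty(n, m)
    d2 = (log_likelihood(n + h, a, m) - 2 * log_likelihood(n, a, m)
          + log_likelihood(n - h, a, m)) / h ** 2
    return -1.0 / d2 if d2 < 0 else inf


if __name__ == "__main__":
    n = 50  # hypothetical true number of breakpoints
    for m in (100, 200, 400, 800):
        var = asymptotic_variance(n, m)
        print(f"m = {m:4d}: a* = {expected_nonempty(n, m):6.2f}, CV = {sqrt(var) / n:.3f}")
```

Printing the CV across increasing m is the kind of comparison summarized in Figures 1 and 2, although the numbers from this sketch need not match the published curves.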
Maximum-likelihood estimates of chromosomal divergence and their variances (simulation studies):
Results obtained with the methods outlined above give one picture of the relationship of the mean and variance of $\hat{n}$ to the numbers of markers used and the amount of chromosomal evolution separating the species. These results, however, may differ from those obtained with actual genomes for several reasons. First, the analytical model described above (and the associated likelihood estimator) assumes that all conserved segments arising from chromosomal rearrangement will be detected provided they contain one or more markers. As illustrated below, this need not be true in general, especially in the case of chromosomal inversions. Second, the random-breakage model assumes that the genome is composed of a single long interval with uniformly distributed breakpoints arising both from the reshuffling of chromosome segments and from chromosome endpoints. True chromosome size variation, however, is constrained.
To extend the investigation to more realistic genomes, chromosome evolution was modeled by computer simulation. A fixed ancestral genome size of T length units was assumed such that each chromosome was of equal length T/c. The m homologous marker genes were assigned to random positions along the chromosomes. Starting with this ancestral genome, t random translocation and i random inversion events were assigned at random to two isolated lineages. For each translocation, chromosome segment exchange involved two randomly chosen chromosomes and two randomly chosen breakpoints (separated by the same distance on each of the two chromosomes). For each inversion, one chromosome was chosen at random, and two breakpoints within it were randomly chosen. Following the e = t + i chromosome rearrangement events, the chromosomes of the two species were compared, and the number of conserved chromosome segments (i.e., the number of segments containing identical runs of one or more marker genes when compared in forward or reverse order) was counted. The total number of conserved segments containing one or more marker genes was recorded to obtain the value of a, which together with m was used to calculate the probabilities in Equation 2. The value of n that maximized the probability was retained as $\hat{n}$.
To restrict the number of different simulation conditions, it was assumed that chromosome numbers remain constant following divergence from the common ancestor. Chromosome evolution involving duplication of chromosomes, followed by divergence of the duplicated chromosomes, is thus outside the realm of the results presented below.
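The simulation procedure described above can be sketched as follows. This is a simplified Python re-implementation under stated assumptions, not the author's FORTRAN program. Assumptions: chromosomes have equal length; each inversion reverses the markers between two uniform breakpoints on one chromosome; each translocation swaps a segment between two uniform breakpoints on one chromosome with an equal-length segment starting at a uniform position on a second chromosome (one reading of "separated by the same distance on each of the two chromosomes"); event order is shuffled and each event is assigned to one of the two lineages with equal probability. Conserved segments are counted as maximal runs of markers that are adjacent on one chromosome, in the same or reversed order, in both genomes, and $\hat{n}$ is found by maximizing the likelihood C(n + 1, a) C(m - 1, a - 1)/C(n + m, m), an assumed form of Equation 2. All function names are hypothetical.

```python
import random
from math import inf, lgamma


def _log_comb(x, k):
    return lgamma(x + 1) - lgamma(k + 1) - lgamma(x - k + 1)


def log_likelihood(n, a, m):
    """Assumed form of Equation 2: log[C(n+1, a) C(m-1, a-1) / C(n+m, m)]."""
    if a < 1 or a > m or a > n + 1:
        return -inf
    return _log_comb(n + 1, a) + _log_comb(m - 1, a - 1) - _log_comb(n + m, m)


def mle_breakpoints(a, m, n_max=100_000):
    n = max(a - 1, 1)  # the assumed likelihood is unimodal in n: walk up to its peak
    while n < n_max and log_likelihood(n + 1, a, m) > log_likelihood(n, a, m):
        n += 1
    return n


def count_conserved_segments(genome_x, genome_y):
    """Maximal marker runs adjacent, in forward or reverse order, in both genomes."""
    where = {mk: (ci, idx) for ci, chrom in enumerate(genome_y) for idx, mk in enumerate(chrom)}
    a = 0
    for chrom in genome_x:
        if chrom:
            a += 1  # the first marker on a chromosome opens a segment
        direction = 0  # +1 forward, -1 reversed, 0 not yet fixed
        for prev, cur in zip(chrom, chrom[1:]):
            (c_prev, i_prev), (c_cur, i_cur) = where[prev], where[cur]
            step = i_cur - i_prev
            if c_prev == c_cur and abs(step) == 1 and direction in (0, step):
                direction = step  # the run continues
            else:
                a += 1  # adjacency broken: a new segment starts at cur
                direction = 0
    return a


def invert(chrom, length, rng):
    """Reverse the markers lying between two uniform breakpoints."""
    x, y = sorted((rng.uniform(0, length), rng.uniform(0, length)))
    return sorted((x + y - p, mk) if x <= p < y else (p, mk) for p, mk in chrom)


def translocate(chrom_a, chrom_b, length, rng):
    """Swap a segment of chrom_a with an equal-length segment of chrom_b (assumed rule)."""
    u, v = sorted((rng.uniform(0, length), rng.uniform(0, length)))
    d = v - u
    w = rng.uniform(0, length - d)
    a_in = [(p - u + w, mk) for p, mk in chrom_a if u <= p < v]
    b_in = [(p - w + u, mk) for p, mk in chrom_b if w <= p < w + d]
    a_out = [(p, mk) for p, mk in chrom_a if not u <= p < v]
    b_out = [(p, mk) for p, mk in chrom_b if not w <= p < w + d]
    return sorted(a_out + b_in), sorted(b_out + a_in)


def simulate_a(m, c, t, i, length=1.0, rng=None):
    """Apply t translocations and i inversions across two lineages; return a."""
    if rng is None:
        rng = random.Random()
    ancestor = [[] for _ in range(c)]
    for mk in range(m):
        ancestor[rng.randrange(c)].append((rng.uniform(0, length), mk))
    lineages = [[sorted(ch) for ch in ancestor] for _ in range(2)]
    events = ["T"] * t + ["I"] * i
    rng.shuffle(events)
    for event in events:
        genome = lineages[rng.randrange(2)]  # each event falls in one lineage at random
        if event == "I":
            k = rng.randrange(c)
            genome[k] = invert(genome[k], length, rng)
        else:
            k1, k2 = rng.sample(range(c), 2)
            genome[k1], genome[k2] = translocate(genome[k1], genome[k2], length, rng)
    orders = [[[mk for _, mk in ch] for ch in g] for g in lineages]
    return count_conserved_segments(*orders)


if __name__ == "__main__":
    rng = random.Random(1)
    m, c, t, i = 200, 20, 45, 45  # hypothetical parameter set
    n_expected = 2 * (t + i) + c  # expected breakpoints, as given in the Results
    estimates = [mle_breakpoints(simulate_a(m, c, t, i, rng=rng), m) for _ in range(100)]
    print(f"expected n = {n_expected}, mean n_hat = {sum(estimates) / len(estimates):.1f}")
```

Because single-marker segments that are inverted leave the marker order unchanged, a count of this kind can fall short of the number of rearranged segments, which is the mechanism examined in the Discussion.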
Simulations were conducted for a variety of combinations of m, c, t, and i. The choice of values for these parameters was guided by results from published investigations. For each combination, $\hat{n}$ was calculated over 500 simulation trials. A copy of the simulation program (written in FORTRAN) is available from the author on request.
Detection of heterogeneity in the rate of chromosomal evolution:
Studies have shown that different lineages may undergo different rates of chromosomal rearrangement.
Let the MLE of chromosomal rearrangement between two species, A and B, be denoted as $\hat{n}_{AB}$. The estimate of chromosomal rearrangement reported between two other species, C and D (scaled to the same estimated amount of time separating species A and B), is denoted as $n_{CD}$. The log-likelihood ratio test statistic follows from Equation 1 and Equation 2 as

$$2\left[\ln L(\hat{n}_{AB}) - \ln L(n_{\text{constrained}})\right] \qquad (5)$$

where $L(\cdot)$ is the likelihood of Equation 2 evaluated at the observed a and m, and $n_{\text{constrained}}$ is the value to which n is constrained under the null hypothesis (e.g., $n_{CD}$). The test statistic is asymptotically distributed as $\chi^2$ with 1 d.f.
The approximate statistical power (probability of rejecting the null hypothesis) of the test was examined by simulation. Simulations were conducted, as described above, under a variety of input parameters (different combinations of m, t, i, and c). For each combination of input parameters, 500 simulations were conducted, and for each set of simulated data the value of the test statistic was calculated for null hypotheses of $n_{\text{constrained}} = k\hat{n}_{AB}$ (where k is a constant that defines the null hypothesis in question). The proportion of cases in which the test statistic exceeded the critical value at the P < 0.05 and P < 0.01 levels approximates the statistical power of the test.
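The likelihood-ratio test and the power calculation described above can be sketched as follows, as an illustration rather than the original program. For speed, the sketch draws a directly from the idealized random-breakage model (n breakpoints and m markers placed uniformly on a unit interval) instead of simulating translocations and inversions, and it assumes the likelihood P(a | n, m) = C(n + 1, a) C(m - 1, a - 1)/C(n + m, m) for Equation 2. For 1 d.f., the χ² upper-tail probability equals erfc(√(x/2)), so Python's `math.erfc` suffices; 3.841 is the usual P = 0.05 critical value (6.635 would give P = 0.01). The function names are hypothetical.

```python
import random
from math import erfc, inf, lgamma, sqrt


def _log_comb(x, k):
    return lgamma(x + 1) - lgamma(k + 1) - lgamma(x - k + 1)


def log_likelihood(n, a, m):
    """Assumed form of Equation 2: log[C(n+1, a) C(m-1, a-1) / C(n+m, m)]."""
    if a < 1 or a > m or a > n + 1:
        return -inf
    return _log_comb(n + 1, a) + _log_comb(m - 1, a - 1) - _log_comb(n + m, m)


def mle_breakpoints(a, m, n_max=100_000):
    n = max(a - 1, 1)  # the assumed likelihood is unimodal in n: walk up to its peak
    while n < n_max and log_likelihood(n + 1, a, m) > log_likelihood(n, a, m):
        n += 1
    return n


def lr_statistic(a, m, n_constrained):
    """2[ln L(n_hat) - ln L(n_constrained)], chi-square with 1 d.f. under H0."""
    n_hat = mle_breakpoints(a, m)
    return 2 * (log_likelihood(n_hat, a, m) - log_likelihood(n_constrained, a, m))


def chi2_1df_pvalue(x):
    """Upper-tail probability of chi-square with 1 d.f."""
    return erfc(sqrt(x / 2)) if x > 0 else 1.0


def sample_a(n, m, rng):
    """Nonempty segments when n breakpoints and m markers fall uniformly on [0, 1]."""
    points = sorted([(rng.random(), False) for _ in range(n)]
                    + [(rng.random(), True) for _ in range(m)])
    a, prev_is_marker = 0, False
    for _, is_marker in points:
        if is_marker and not prev_is_marker:
            a += 1  # a run of markers opens after a breakpoint (or at the start)
        prev_is_marker = is_marker
    return a


def lrt_power(n, m, k, trials=500, crit=3.841, seed=1):
    """Fraction of simulated data sets rejecting H0: n = k * n_hat at cutoff crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = sample_a(n, m, rng)
        n_hat = mle_breakpoints(a, m)
        stat = 2 * (log_likelihood(n_hat, a, m) - log_likelihood(k * n_hat, a, m))
        if stat > crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"p-value at 3.841: {chi2_1df_pvalue(3.841):.4f}")
    for m in (100, 200, 400):  # hypothetical marker numbers; n = 200 as for c = 20, e = 90
        print(m, [lrt_power(200, m, k, trials=200) for k in (1.1, 1.2, 1.5)])
```

Reusing the same seed across values of k makes the power estimates for different null hypotheses directly comparable, because every k is tested against the same simulated values of a.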
| RESULTS |
|---|
Maximum-likelihood estimates of chromosomal divergence and their variances (numerical analysis):
The maximum-likelihood estimator returns the value of n expected for a given a*. The likelihood peak becomes progressively sharper with increases in m (Figure 1). The asymptotic variance estimate of $\hat{n}$ is a decreasing curvilinear function of m. The effect on the estimation of n is seen most clearly by examining the relationship of the coefficient of variation (CV) of $\hat{n}$ to m (Figure 2). The largest reductions in CV occur in the initial stages of mapping effort; as m is increased beyond several hundred markers, reductions in the variance of $\hat{n}$ become progressively smaller. For any given value of m, the CVs are larger when there are more chromosomal breakpoints separating the species in question (i.e., when the true value of n is large), but only marginally so (Figure 2).
Maximum-likelihood estimates of chromosomal divergence (simulation results):
When chromosome evolution occurs via t translocation and i inversion events in a pair of species each having c chromosomes, the number of chromosomal breakpoints expected is n = 2(t + i) + c.
When chromosome evolution occurs by translocation alone and the density of markers is high relative to the number of chromosomes, the MLEs are close to their expected values (Figure 3, a and b). When chromosome evolution occurs by translocation alone but the number of markers is low relative to the number of chromosomes, the MLEs depart significantly from their expected values (Figure 3c). The occurrence of inversions contributes further to underestimation of the true value of n by the MLEs: the underestimate is in the range of 5–50% when inversions account for half of the rearrangements (Figure 3, d–f) and rises to nearly 70% when inversions account for all rearrangements (Figure 3, g–i). Again, underestimation is most pronounced when marker number is low relative to chromosome number. The basis for this underestimation of n is discussed below.
As in the numerical analysis, the CVs of the MLEs decrease with m, most steeply at low marker numbers (Figure 4). The effect of increasing the number of markers is most pronounced for the first few hundred markers. For instance, with c = 20 chromosomes, going from 100 to 200 markers reduces the CV by ~50–60%, from 200 to 400 markers by 25%, and from 400 to 800 markers by ~10%. For any given value of m, the CVs are slightly larger when more rearrangement events separate the species, but the difference becomes almost nil for m ≥ 400. The relationships of the CVs with m are similar for translocations and inversions (Figure 4).
Heterogeneity in the rate of chromosomal evolution (simulation results and illustration using published data):
The power of the log-likelihood ratio test to detect heterogeneity in rates of chromosomal reshuffling increases as the number of markers placed on the maps increases, but for tests involving rate heterogeneity of >10%, the gain in statistical power per added marker diminishes rapidly. These results are shown in Figure 5 for c = 20 chromosomes and e = 90 rearrangements. Nearly identical results were obtained for c = 10 and c = 30 (results not shown). The increase in power is roughly linear when rate heterogeneity between the lineages being compared is about 10% or less; above 20% rate heterogeneity, the increase in power decelerates with increasing marker number. When m > 200, there are rapidly diminishing returns in power per marker added to the maps. There is relatively little difference in the shapes of the power curves across the range of e values studied (e = 15–90), regardless of whether rearrangements were due to translocations or inversions (results not shown).
To illustrate the application of the log-likelihood ratio test, published comparative mapping data from A. thaliana and Brassica nigra were examined (Lagercrantz 1998). The log-likelihood ratio test statistic can then be calculated from Equation 5.
This result is highly significant (P < 0.001) and supports Lagercrantz's conclusion that the rate of chromosomal rearrangement following divergence of Arabidopsis and Brassica has been unusually high.
| DISCUSSION |
|---|
Underestimates of n:
The simulation results illustrate that when marker number is low relative to the number of chromosomes, or when rearrangements occur by both translocation and inversion, the number of breakpoints is underestimated under the random-breakage model. This can be understood by considering Equations 1 and 2 together with some of the possible relationships that may arise between marker positions and chromosome rearrangement events, as illustrated in Figure 6. As noted, the MLE of n is a function of the total number of nonempty fragments (detected conserved fragments), a, observed in the comparative mapping study. Under the random-breakage model, probabilities of observing such fragments are defined by Equation 1. If, however, one considers the biological mechanisms by which chromosome breakpoints are generated (Figure 6), it becomes clear that there are several types of rearrangements of nonempty segments that may go undetected. Accordingly, the value of a obtained will be lower than expected under the random-breakage model.
One type of undetected rearrangement is an inversion that occurs in a segment containing a single marker (Figure 6A). Such an event is effectively "invisible" to the investigator, and the extent of underestimation can, in fact, be quantified when rearrangements arise only by inversion. Note that undetected inversions will occur with probability P(r = 1) as defined by Equation 1. The expected number of nonempty segments, a*, will therefore be multiplied by the fraction [1 - P(0) - P(1)]/[1 - P(0)]. From Equation 4, the number of nonempty fragments (when all rearrangements occur by inversion) becomes

$$a_I^* = a^* \, \frac{1 - P(0) - P(1)}{1 - P(0)} \qquad (6)$$
Calculating the MLE of n from $a_I^*$ reveals a relationship with m, and a level of underestimation, similar to those observed in the simulations (Figure 7).
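Equation 6 can be explored numerically. Under the random-breakage form of P(r) assumed in this sketch, P(0) = n/(n + m) and P(1) = nm/[(n + m)(n + m - 1)], so the correction factor simplifies to (m - 1)/(n + m - 1); with a* = (n + 1)m/(n + m), the MLE of n can be recomputed from the reduced value $a_I^*$. The likelihood C(n + 1, a) C(m - 1, a - 1)/C(n + m, m) and all function names are assumptions made for illustration; the sketch shows the qualitative pattern (strong underestimation at low m that shrinks as m grows), not the values plotted in Figure 7.

```python
from math import ceil, inf, lgamma


def _log_comb(x, k):
    return lgamma(x + 1) - lgamma(k + 1) - lgamma(x - k + 1)


def log_likelihood(n, a, m):
    """Assumed form of Equation 2; lgamma lets a be a non-integer expectation."""
    if a < 1 or a > m or a > n + 1:
        return -inf
    return _log_comb(n + 1, a) + _log_comb(m - 1, a - 1) - _log_comb(n + m, m)


def mle_breakpoints(a, m, n_max=100_000):
    n = max(ceil(a - 1), 1)  # smallest integer n with a <= n + 1
    while n < n_max and log_likelihood(n + 1, a, m) > log_likelihood(n, a, m):
        n += 1
    return n


def expected_nonempty(n, m):
    """a* = (n + 1)[1 - P(0)] with P(0) = n / (n + m)."""
    return (n + 1) * m / (n + m)


def expected_nonempty_inversions_only(n, m):
    """a_I* = a* [1 - P(0) - P(1)] / [1 - P(0)] (Equation 6)."""
    p0 = n / (n + m)
    p1 = n * m / ((n + m) * (n + m - 1))
    return expected_nonempty(n, m) * (1 - p0 - p1) / (1 - p0)


if __name__ == "__main__":
    c, i = 20, 45  # hypothetical: 20 chromosomes, 45 inversions, no translocations
    n = 2 * i + c  # n = 2(t + i) + c with t = 0
    for m in (100, 200, 400, 800):
        n_hat = mle_breakpoints(expected_nonempty_inversions_only(n, m), m)
        print(f"m = {m:4d}: true n = {n}, n_hat from a_I* = {n_hat} "
              f"({100 * (1 - n_hat / n):.0f}% underestimate)")
```

For comparison, feeding the unreduced a* back into the same estimator recovers n (up to a tie between n - 1 and n), so the shortfall printed here comes entirely from the correction factor in Equation 6.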
Translocations located in nonempty segments may also go undetected, as illustrated in Figure 6B. When these types of events occur, a is underestimated, and the MLE of n is again underestimated. Compared with undetected inversions, however, undetected translocations are less likely to lead to underestimation of n, because they involve the sequential progression of several events. The problem is expected to occur most frequently when the number of markers per chromosome is low, a result that is supported by the simulations (Figure 3, a–c).
Variances of the MLE estimate of n:
As progressively more conserved fragments are detected through comparative mapping of new markers, additional mapping effort only marginally reduces the variance of the estimate of chromosome evolution (Figure 4). A similar relationship was found between mapping effort and the ability to detect heterogeneity in the rate of chromosomal evolution (Figure 5). The variances and CVs obtained by simulation are substantially smaller (by 50% or more) than those obtained via numerical analysis based on the random-breakage model (Figure 2). This difference is not unexpected, given that the simple random-breakage model studied by numerical analysis assumes that chromosomal boundaries are distributed at random on the interval 0–1 (see above), whereas the simulated genomes consist of chromosomes of equal length. While the variances seen using simulation are likely to be more representative than those calculated under the random-breakage model, nonrandom distributions of translocations and inversions could act to inflate variances obtained with actual data.
Marker numbers, estimation of n, and the detection of conserved functional gene blocks:
There has been some discussion that blocks of genes found in conserved chromosome segments may represent gene combinations that interact functionally to produce important organismal characteristics (e.g., blocks of genes that interact to produce characteristics closely related to organismal fitness). Such segments may have been selectively conserved due to their function. But because n is estimated, the distribution P(x) is not known with certainty. The question arises, therefore, of how many markers are needed to compare observed with predicted segment length distributions. Applying the method of statistical differentials … These results suggest that, in contrast to the other applications discussed above, a comparative genomics investigation that aims to detect selectively conserved chromosome segments by examining segment size distribution may benefit from the mapping of larger numbers of markers. Moreover, the comparison of observed and expected chromosome segment length requires that the true segment lengths and the total genome length be known. The segment lengths can be estimated from the observed distances between the outermost markers on each segment.
Marker numbers, the estimation of n, and exploratory surveys of genomic evolution:
The results of this investigation have implications for applied studies and comparative evolutionary work based on comparative mapping. They suggest that studies of chromosome evolution based on low densities of markers (e.g., <100–200 per genome) may underestimate the amount of chromosomal rearrangement, especially between taxa that are distantly related or in instances where inversion has played a large role in restructuring chromosomes.
Another issue is how many markers are required to obtain a low variance estimate of chromosomal rearrangement. When comparative mapping is used to examine the prospects of applying genetic map information from a well-characterized model species to a less well-characterized target species, the emphasis is often on uncovering candidate regions containing quantitative trait loci (e.g., for genes contributing to yield or disease resistance).
Conclusions:
Apart from the bias against detection of inversions, the results presented in this investigation accord well with those of other studies in suggesting that estimation of numbers of chromosome breakpoints is robust to relatively small numbers of markers.
| ACKNOWLEDGMENTS |
|---|
I thank Steve Wright, Andrew Paterson, and two anonymous reviewers for commenting on the material presented in this article. This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.
Manuscript received February 12, 1999; Accepted for publication October 22, 1999.
| LITERATURE CITED |
|---|
AHN, S., and S. D. TANKSLEY, 1993 Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad. Sci. USA 90: 7980-7984.
BODMER, W. F., 1975 Analysis of linkage by somatic cell hybridization and its conservation by evolution, pp. 53-61 in Chromosome Variations in Human Evolution, edited by A. BOYCE. Taylor and Francis, London.
CHAKRAVARTI, A., L. K. LASHER, and J. E. REEFER, 1991 A maximum likelihood method for estimating genome length using genetic linkage data. Genetics 128: 175-182.
CHARLESWORTH, B., 1992 Evolutionary rates in partially self-fertilizing species. Am. Nat. 140: 126-148.
COPELAND, N. G., N. A. JENKINS, D. J. GILBERT, J. T. EPPIG, and L. J. MALTAIS et al., 1993 A genetic linkage map of the mouse: current applications and future prospects. Science 262: 57-66.
EHRLICH, J., D. SANKOFF, and J. H. NADEAU, 1997 Synteny conservation and chromosome rearrangements during mammalian evolution. Genetics 147: 289-296.
ELANDT-JOHNSON, R. C., 1971 Probability Models and Statistical Methods in Genetics. John Wiley & Sons, New York.
GALE, M. D., and K. M. DEVOS, 1998 Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95: 1971-1974.
LAGERCRANTZ, U., 1998 Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics 150: 1217-1228.
LIN, Y. R., K. F. SCHERTZ, and A. H. PATERSON, 1995 Comparative analysis of QTLs affecting plant height and maturity across the Poaceae, in reference to an interspecific sorghum population. Genetics 141: 391-411.
LUNDIN, L.-G., 1979 Evolutionary conservation of large chromosomal segments reflected in mammalian gene maps. Clin. Genet. 16: 72-81.
NADEAU, J. H., and D. SANKOFF, 1997 Landmarks in the Rosetta Stone of mammalian comparative maps. Nat. Genet. 15: 6-7.
NADEAU, J. H., and D. SANKOFF, 1998 The lengths of undiscovered conserved segments in comparative maps. Mamm. Genome 9: 491-495.
NADEAU, J. H., and B. A. TAYLOR, 1984 Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. USA 81: 814-818.
OHNO, S., 1967 Sex Chromosomes and Sex-linked Genes. Springer-Verlag, New York.
PATERSON, A. H., Y.-R. LIN, Z. LI, K. F. SCHERTZ, and J. F. DOEBLEY et al., 1995 Convergent domestication of cereal crops by independent mutations at corresponding genetic loci. Science 269: 1714-1718.
PATERSON, A. H., T. H. LAN, K. P. REISCHMANN, C. CHANG, and Y. R. LIN et al., 1996 Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nat. Genet. 14: 380-382.
PEREIRA, M. G., and M. LEE, 1995 Identification of genomic regions affecting plant height in sorghum and maize. Theor. Appl. Genet. 90: 380-388.
SANKOFF, D., and J. H. NADEAU, 1996 Conserved synteny as a measure of genomic distance. Disc. Appl. Math. 71: 247-257.
SANKOFF, D., M.-N. PARENT, I. MARCHAND, and V. FERRETTI, 1997 On the Nadeau-Taylor theory of conserved chromosome segments, pp. 262-274 in Combinatorial Pattern Matching, edited by A. APOSTOLICO and J. HEIN. Springer-Verlag, New York.
STEBBINS, G. L., 1971 Chromosomal Evolution in Higher Plants. Addison-Wesley, Reading, MA.
TANKSLEY, S. D., and S. R. MCCOUCH, 1997 Seed banks and molecular maps: unlocking genetic potential from the wild. Science 277: 1063-1065.
TANKSLEY, S. D., M. W. GANAL, J. P. PRINCE, M. C. DE VICENTE, and M. W. BONIERBALE et al., 1992 High density molecular linkage maps of the tomato and potato genomes. Genetics 132: 1141-1160.
VAN DEYNZE, A. E., M. E. SORRELLS, W. D. PARK, N. M. AYRES, and H. FU et al., 1998 Anchor probes for comparative mapping of grass genera. Theor. Appl. Genet. 97: 356-369.
WEIR, B. S., 1996 Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sinauer, Sunderland, MA.