Abstract
Estimates of the number of chromosomal breakpoints that have arisen (e.g., by translocation and inversion) in the evolutionary past between two species and their common ancestor can be made by comparing map positions of marker loci. Statistical methods for doing so are based on a random-breakage model of chromosomal rearrangement. The model treats all modes of chromosome rearrangement alike, and it assumes that chromosome boundaries and breakpoints are distributed randomly along a single genomic interval. Here we use simulation and numerical analysis to test the validity of these model assumptions. Mean estimates of numbers of breakpoints are close to those expected under the random-breakage model when marker density is high relative to the amount of chromosomal rearrangement and when rearrangements occur by translocation alone. But when marker density is low relative to the number of chromosomes, and when rearrangements occur by both translocation and inversion, the number of breakpoints is underestimated. The underestimate arises because rearranged segments may contain markers, yet the rearranged segments may, nevertheless, be undetected. Variances of the estimate of numbers of breakpoints decrease rapidly as markers are added to the comparative maps, but are less influenced by the number or type of chromosomal rearrangement separating the species. Variances obtained with simulated genomes comprised of chromosomes of equal length are substantially lower than those obtained when chromosome size is unconstrained. Statistical power for detecting heterogeneity in the rate of chromosomal rearrangement is also investigated. Results are interpreted with respect to the amount of marker information required to make accurate inferences about chromosomal evolution.
EVOLUTIONARY change in the macrostructure of individual chromosomes occurs largely by reciprocal translocation and inversion. During the course of the independent evolutionary histories separating two species from their common ancestor, divergence in chromosome structure arising from chromosomal rearrangement is manifested as the progressive fractionation of the genome into increasingly smaller conserved chromosome segments (Nadeau and Taylor 1984). For example, comparative mapping studies often show that closely related organisms share large portions of chromosome segments in which the identities and linear orders of genes are conserved, while more distantly related taxa exhibit shorter conserved chromosome segments (Paterson et al. 1996; Ehrlich et al. 1997).
The discovery of conserved segments of chromosomes among taxa suggests that it may be possible to construct unified genetic maps for a number of organismal groups (e.g., the grasses, higher plants, fishes, and mammals; Ahn and Tanksley 1993; Paterson et al. 1996; Nadeau and Sankoff 1997; Gale and Devos 1998). This could have important consequences for genetic and biotechnical applications. For instance, the detailed information on genome structure gained from sequencing and mapping efforts with model systems such as Arabidopsis thaliana may assist in the identification of agriculturally important genes in domesticated plant species and help facilitate marker-based introgression from exotic germ plasm, marker-assisted selection, and positional cloning. If the chromosomal locations of one or more genes of interest are known with reference to the positions of a set of marker genes in the model species, the probabilities of linkage between the markers and genes of interest in the target species can be calculated (Nadeau and Taylor 1984). Information on the overall amount of chromosomal rearrangement separating two species may also help to detect conserved gene blocks (i.e., blocks that are larger than expected given the overall amount of rearrangement observed between two genomes). As well, estimates of the extent and type of chromosomal rearrangement may be useful for reconstructing evolutionary history or for testing specific evolutionary hypotheses about rates of chromosome evolution (Ohno 1967; Charlesworth 1992).
Currently there are few statistical tools for comparing genetic maps, and most studies are based on visual inspection of shared syntenies and conserved gene arrangements. One of the more useful analytical approaches for interpreting comparative map data was initiated by Nadeau and Taylor (1984) and subsequently expanded by Sankoff and colleagues (Sankoff and Nadeau 1996; Ehrlich et al. 1997; Sankoff et al. 1997; Nadeau and Sankoff 1998). These researchers proposed the use of a probabilistic model to infer the amount of chromosomal rearrangement from the lengths and numbers of conserved chromosomal segments detected in a comparative genetic mapping investigation. They express the amount of chromosomal evolution between two species as the number of chromosomal breakpoints separating their genomes (Sankoff and Nadeau 1996). The underlying model is referred to as the “random-breakage model” of chromosomal evolution, because it assumes a uniformly distributed probability that any given chromosomal location will experience a breakpoint (e.g., arising from translocation or inversion) during divergence from a common ancestor.
In practice, the information required to apply the random-breakage model to the estimation of chromosomal evolution comes from the comparative mapping of homologous marker loci such as conserved expressed sequence tags (ESTs; Paterson et al. 1996; Van Deynze et al. 1998). Estimates based on the model are expected to depend on the validity of the model assumptions, and on the amount and quality of the comparative map data. The amount of chromosomal evolution separating the species in question may also influence the accuracy of the estimates. Given the increasing interest in comparative genomic investigations, it is surprising that there have been no studies of how the amount of mapping effort influences the quality of inferences obtained from the comparative maps. In this article I investigate the estimation of chromosome evolution based on the random-breakage model, specifically (1) how estimates of chromosomal breakpoints and their variances are influenced by the density of markers used in comparative mapping; (2) how estimates of chromosomal breakpoints and their variances are influenced by the amount and type of chromosomal rearrangement; and (3) how the ability to detect heterogeneity in the rate of chromosomal rearrangement is influenced by the density of markers and the extent of chromosomal evolution.
METHODS
Maximum-likelihood estimates of chromosomal divergence and their variances (numerical solutions): In analytical studies of the random-breakage model, the genome is represented as a single interval of unit length 1.0, broken at n randomly placed positions (e.g., by translocations and inversions) as well as by chromosome endpoints (Sankoff and Nadeau 1996). This results in n + 1 segments in which gene order is conserved with reference to another genome of interest. When there are m homologous marker genes distributed uniformly on the interval 0–1, the probability that an arbitrary segment contains r marker genes is (Sankoff and Nadeau 1996)
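A minimal sketch of how such a segment-occupancy probability can be evaluated is given here. It assumes a counting argument that follows from the model as stated (the n breakpoints and m markers form n + m points in uniformly random order, so a segment holds r markers when r markers are followed directly by a breakpoint); this form is an assumption for illustration, not a quotation of the cited equation. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def segment_occupancy_prob(r: int, n: int, m: int) -> Fraction:
    """P(an arbitrary segment holds exactly r of the m markers).

    Assumed form: C(n + m - r - 1, m - r) / C(n + m, m), for n >= 1
    breakpoints placed uniformly at random on the unit interval.
    """
    if r < 0 or r > m:
        return Fraction(0)
    return Fraction(comb(n + m - r - 1, m - r), comb(n + m, m))


def expected_nonempty_segments(n: int, m: int) -> Fraction:
    """Expected number of the n + 1 segments that hold at least one marker."""
    return (n + 1) * (1 - segment_occupancy_prob(0, n, m))


# Illustrative (hypothetical) values: 200 breakpoints, 100 markers.
n, m = 200, 100
total = sum(segment_occupancy_prob(r, n, m) for r in range(m + 1))
p_empty = segment_occupancy_prob(0, n, m)

print("sum over r:", total)            # probabilities sum to 1
print("P(r = 0):", p_empty)            # equals n / (n + m)
print("E[nonempty]:", expected_nonempty_segments(n, m))
```

Exact rational arithmetic is used so that the sanity checks (probabilities summing to one, P(0) = n/(n + m)) hold without rounding error.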
Maximum-likelihood estimates of chromosomal divergence and their variances (simulation studies): Results obtained with the methods outlined above give one picture of the relationship of the mean and variance of
To extend the investigation to more realistic genomes, chromosome evolution was modeled by computer simulation. A fixed ancestral genome size of T length units was assumed such that each of the c chromosomes was of equal length T/c. The m homologous marker genes were assigned to random positions along the chromosomes. Starting with this ancestral genome, t random translocation and i random inversion events were distributed at random to two isolated lineages. For each translocation, chromosome segment exchange involved two randomly chosen chromosomes and two randomly chosen breakpoints (separated by the same distance on each of the two chromosomes). For each inversion, one chromosome was chosen at random, and two breakpoints within it were randomly chosen. Following the e = t + i chromosome rearrangement events, the chromosomes of the two species were compared, and the number of conserved chromosome segments (i.e., the number of segments containing identical runs of one or more marker genes when compared in forward or reverse order) was counted. The total number of conserved segments containing one or more marker genes was recorded to obtain the value of a, which together with m was used to calculate the probabilities in Equation 2. The value of n that maximized the probability was retained as the MLE of n.
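The segment-counting step of the simulation can be sketched as follows. This is a simplified illustration rather than the simulation code used in the study: genomes are represented only by marker order (map distances are ignored), and a conserved segment is taken to be a maximal run of markers that are adjacent in both genomes in forward or reverse order. The helper names are hypothetical.

```python
from typing import FrozenSet, List, Set

Genome = List[List[int]]  # one list of marker IDs per chromosome


def invert(chrom: List[int], start: int, end: int) -> List[int]:
    """Return a copy of chrom with the markers in [start, end) reversed."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def adjacencies(genome: Genome) -> Set[FrozenSet[int]]:
    """Unordered pairs of markers that are neighbours on some chromosome."""
    pairs: Set[FrozenSet[int]] = set()
    for chrom in genome:
        for left, right in zip(chrom, chrom[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def count_conserved_segments(genome_a: Genome, genome_b: Genome) -> int:
    """Number of maximal marker runs shared in forward or reverse order.

    Each shared adjacency joins two markers into one run, so the count is
    the number of markers minus the number of shared adjacencies.
    """
    n_markers = sum(len(chrom) for chrom in genome_a)
    shared = adjacencies(genome_a) & adjacencies(genome_b)
    return n_markers - len(shared)


ancestral: Genome = [[1, 2, 3, 4, 5], [6, 7, 8]]

# An inversion spanning markers 2-4 splits chromosome 1 into three runs.
derived = [invert(ancestral[0], 1, 4), ancestral[1]]
print(count_conserved_segments(ancestral, derived))   # 4 segments

# An inversion spanning a single marker leaves the marker order unchanged.
hidden = [invert(ancestral[0], 2, 3), ancestral[1]]
print(count_conserved_segments(ancestral, hidden))    # 2 segments
```

The second case shows why a count based on marker order alone cannot register an inversion whose breakpoints enclose only one marker.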
To restrict the number of different simulation conditions, it was assumed that chromosome numbers remain constant following divergence from the common ancestor. Chromosome evolution involving duplication of chromosomes, followed by divergence of the duplicated chromosomes, is thus outside the realm of the results presented below.
Simulations were conducted for a variety of different combinations of m, c, t, and i. The choice of values for these parameters was guided by results from published investigations (Tanksley et al. 1992; Ahn and Tanksley 1993; Paterson et al. 1996; Nadeau and Sankoff 1997). For each combination of these parameters, the mean and variance of
Detection of heterogeneity in the rate of chromosomal evolution: Studies have shown that different lineages may undergo different rates of chromosomal rearrangement (Ehrlich et al. 1997), though there are few statistical tools for examining rate heterogeneity. Likelihood estimation as outlined above can be extended to the detection of heterogeneity in the rate of chromosomal rearrangement. One approach is to compare the estimated rate of chromosomal rearrangement for the taxa of interest with the rate(s) reported in studies of other taxa (Paterson et al. 1996; Lagercrantz 1998).
Let the MLE of chromosomal rearrangement occurring between two species, species A and B, be denoted as
The approximate statistical power (probability of rejection of the null hypothesis) of the test was examined by simulation. Simulations were conducted, as described above, under a variety of input parameters (different combinations of m, t, i, and c). For each combination of input parameters, 500 simulations were conducted, and for each set of simulated data, the value of Φ was calculated for null hypotheses of
RESULTS
Maximum-likelihood estimates of chromosomal divergence and their variances (numerical analysis): The maximum-likelihood estimator returns the value of
Maximum-likelihood estimates of chromosomal divergence (simulation results): When chromosome evolution occurs via t translocation and i inversion events in a pair of species each having c chromosomes, the number of chromosomal breakpoints expected is n = 2(t + i) + c (Sankoff and Nadeau 1996). Results obtained from the application of likelihood Equation 2 to the estimation of n with simulated data are shown in Figure 3.
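As a small worked example of the expected-breakpoint relation n = 2(t + i) + c, the following sketch evaluates it for illustrative inputs. The function name and the particular split of events between translocations and inversions are chosen here for illustration only.

```python
def expected_breakpoints(t: int, i: int, c: int) -> int:
    """Expected breakpoints for t translocations, i inversions, c chromosomes.

    Each rearrangement contributes two breakpoints, and the c chromosome
    boundaries are counted as well: n = 2 * (t + i) + c.
    """
    return 2 * (t + i) + c


# Illustrative case: c = 20 chromosomes and e = t + i = 90 rearrangements.
print(expected_breakpoints(t=90, i=0, c=20))    # translocations only
print(expected_breakpoints(t=45, i=45, c=20))   # equal mix, same total
```

Because the relation depends only on the total number of events, any departure of the estimates from this value across rearrangement types reflects detection, not the expectation itself.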
When chromosome evolution occurs by translocation alone, and the density of markers is high relative to the number of chromosomes, the MLEs are close to their expected values (Figure 3, a and b). In the case where chromosome evolution occurs by translocation alone, and the number of markers is low relative to the number of chromosomes, there is significant departure of the MLEs from their expected values (Figure 3c). The occurrence of inversions contributes further to underestimation of the true value of n by the MLEs. The underestimation is in the range of 5–50% when inversions account for half of the rearrangements (Figure 3, d–f) and rises to nearly 70% when inversions account for all rearrangements (Figure 3, g–i). Again, underestimation is most pronounced when marker number is low relative to chromosome number. The basis for this underestimation of
As seen in the numerical analysis, the CVs of the MLEs decrease with m at a decelerating rate (Figure 4). The effect of increasing the number of markers is most pronounced for the first few hundred markers. For instance, with c = 20 chromosomes, progressing from 100 to 200 markers reduces the CV by ~50–60%, from 200 to 400 markers by ~25%, and from 400 to 800 markers by ~10%. For any given value of m, the CVs are slightly larger when there are more rearrangement events separating the species, but the difference becomes almost nil for m ≥ 400. The relation of the CVs to m is similar for translocations and inversions (Figure 4).
Heterogeneity in the rate of chromosomal evolution (simulation results and illustration using published data): The power of the log-likelihood ratio test to detect heterogeneity in rates of chromosomal reshuffling increases as the number of markers placed on the maps increases, but for tests involving rate heterogeneity of >10%, the rate of gain in statistical power diminishes rapidly with marker number. These results are shown in Figure 5 for c = 20 chromosomes and e = 90 rearrangements. Nearly identical results were obtained for c = 10 and c = 30 (results not shown). The increase in power is roughly linear when rate heterogeneity between the lineages being compared is in the vicinity of 10% or less; but above 20% rate heterogeneity, the increase in power decelerates with increasing marker numbers. When marker numbers exceed m = 200, there are rapidly diminishing returns in power per marker added to the maps. There is relatively little difference in the shapes of the power curves across the range of e values studied (e = 15–90) regardless of whether rearrangements were due to translocations or inversions (results not shown).
To illustrate the application of the log-likelihood ratio test, published comparative mapping data from A. thaliana and Brassica nigra were examined (Lagercrantz 1998). In this study, comparative mapping based on 284 markers uncovered 87 conserved segments. Estimation of n based on numerical evaluation of the likelihood Equation 2 gives an estimate of 124 breakpoints separating the species. This is higher than the estimate obtained by Lagercrantz (1998), who used the more conservative procedure of Nadeau and Taylor (1984) that does not consider segments marked by single loci. Lagercrantz (1998) notes that the two mustard family species may have diverged ~35 million years ago, and that the rate at which chromosomal rearrangement has occurred since their divergence is significantly greater than that seen in other plants and animals. Comparison of the results for the A. thaliana-B. nigra rate estimate with those obtained in other comparative mapping investigations (Paterson et al. 1996; Lagercrantz 1998) lends qualitative support to this conclusion. For instance, the next highest rate of chromosome rearrangement currently reported is the 13 rearrangements between Triticum and Secale that are estimated to have occurred over 6 million years (Paterson et al. 1996). Scaling the Triticum-Secale estimate for the divergence time assumed above for Arabidopsis and Brassica gives 76 breakpoints. As this number is smaller than the 87 conserved segments observed by Lagercrantz (1998) in the Arabidopsis-Brassica comparison, a likelihood L(n_{constrained} | a*, m), where n_{constrained} = 76, cannot be calculated. If instead we take n_{constrained} to be equal to 87, Φ can be calculated as
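The point estimate and the scaling step in this illustration can be checked with a short sketch. It assumes that the likelihood of observing a nonempty segments has the occupancy form C(n + 1, a) C(m − 1, a − 1) / C(n + m, m); that form is an assumption made here for illustration and is not quoted from Equation 2. Under it, the likelihood rises with n as long as n ≤ m(a − 1)/(m − a), so the turning point can also be written in closed form. Function names are hypothetical.

```python
from math import comb, log


def log_likelihood(n: int, a: int, m: int) -> float:
    """Log of the assumed likelihood C(n+1, a) C(m-1, a-1) / C(n+m, m)."""
    if a < 1 or a > m or a > n + 1:
        return float("-inf")  # a nonempty segments cannot occur
    return log(comb(n + 1, a)) + log(comb(m - 1, a - 1)) - log(comb(n + m, m))


def mle_breakpoints(a: int, m: int, n_max: int = 2000) -> int:
    """Integer n in [a - 1, n_max] that maximizes the assumed likelihood."""
    return max(range(a - 1, n_max + 1), key=lambda n: log_likelihood(n, a, m))


m, a = 284, 87                      # markers and conserved segments
n_hat = mle_breakpoints(a, m)       # integer grid maximum
turning_point = m * (a - 1) / (m - a)

# 13 rearrangements over 6 million years, rescaled to 35 million years.
scaled = 13 * 35 / 6

print(n_hat, round(turning_point, 2), round(scaled, 1))
print(log_likelihood(76, a, m))     # -inf: 77 segments cannot yield 87
```

Under these assumptions the continuous turning point is about 123.98, which is consistent with the estimate of 124 quoted above, while the integer grid maximum is 123. The last line illustrates why a constrained value of 76 cannot be evaluated when 87 conserved segments have been observed.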
DISCUSSION
Underestimates of n: The simulation results illustrate that when marker number is low relative to the number of chromosomes, or when rearrangements occur by both translocation and inversion, the number of breakpoints is underestimated under the random-breakage model. This can be understood by considering Equations 1 and 2 together with some of the possible relationships that may arise between marker positions and chromosome rearrangement events, as illustrated in Figure 6. As noted, the MLE of n is a function of the total number of nonempty fragments (detected conserved fragments), a, observed in the comparative mapping study. Under the random-breakage model, probabilities of observing such fragments are defined by Equation 1. If, however, one considers the biological mechanisms by which chromosome breakpoints are generated (Figure 6), it becomes clear that there are several types of rearrangements of nonempty segments that may go undetected. Accordingly, the value of a obtained will be lower than expected under the random-breakage model.
One type of undetected rearrangement is an inversion that occurs in a segment containing a single marker (Figure 6a). Such an event is effectively “invisible” to the investigator, and the extent of underestimation can, in fact, be quantified when rearrangements arise only by inversion. Note that undetected inversions will occur with probability P(r = 1) as defined by Equation 1. The expected number of rearranged segments, a*, will, therefore, be reduced by the fraction [1 − P(0) − P(1)]/[1 − P(0)]. From Equation 4, the number of nonempty fragments (when all rearrangements occur by inversion) becomes
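As a rough numerical illustration of the reduction factor [1 − P(0) − P(1)]/[1 − P(0)], the sketch below evaluates it for a sparse and a dense marker set. The occupancy probability used for P(r) is an assumed form, C(n + m − r − 1, m − r)/C(n + m, m), derived from the random placement of n breakpoints and m markers; it is not quoted from Equation 1, and the function names and input values are hypothetical.

```python
from fractions import Fraction
from math import comb


def occupancy_prob(r: int, n: int, m: int) -> Fraction:
    """Assumed P(a segment holds exactly r of m markers), n >= 1 breakpoints."""
    if r < 0 or r > m:
        return Fraction(0)
    return Fraction(comb(n + m - r - 1, m - r), comb(n + m, m))


def detectable_fraction(n: int, m: int) -> Fraction:
    """Share of nonempty segments holding two or more markers.

    Evaluates [1 - P(0) - P(1)] / [1 - P(0)]; under the assumed form this
    simplifies to (m - 1) / (n + m - 1).
    """
    p0 = occupancy_prob(0, n, m)
    p1 = occupancy_prob(1, n, m)
    return (1 - p0 - p1) / (1 - p0)


n = 200  # illustrative number of breakpoints
for m in (100, 800):
    frac = detectable_fraction(n, m)
    print(f"m = {m}: fraction = {float(frac):.3f}")
```

With these illustrative inputs the fraction is much smaller for the sparse marker set than for the dense one, which is the direction of the effect described above.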
Translocations located in nonempty segments may also go undetected as illustrated in Figure 6b. When these types of events occur, a is underestimated, and the MLE of n is again underestimated. Compared with undetected inversions, however, undetected translocations are less likely to lead to underestimation of n, because they involve the sequential progression of several events. The problem is expected to occur most frequently when the number of markers per chromosome is low, a result that is supported by the simulations (Figure 3, a–c).
Variances of the MLE estimate of n: As progressively more conserved fragments are detected through comparative mapping of new markers, additional mapping effort will only marginally reduce the variance of the estimate of chromosome evolution (Figure 4). A similar relationship was found between mapping effort and ability to detect heterogeneity in the rate of chromosomal evolution (Figure 5). The actual values of the variances and CVs are significantly smaller (by 50% or more) than those obtained via numerical analysis based on the random-breakage model (Figure 2). This qualitative difference is not unexpected given that the simple random-breakage model studied by numerical analysis assumes that chromosomal boundaries are distributed at random on the interval 0–1 (see above). While the variances seen using simulation are likely to be more representative than those calculated under the random-breakage model, nonrandom distributions of translocations and inversions could act to inflate variances obtained with actual data.
Marker numbers, estimation of n, and the detection of conserved functional gene blocks: There has been some discussion that blocks of genes found in conserved chromosome segments may represent gene combinations that interact functionally to produce important organismal characteristics (e.g., blocks of genes that interact to produce characteristics closely related to organismal fitness; Bodmer 1975; Lundin 1979; Paterson et al. 1996). But because all genomes are interrelated, most colinear groups of genes detected in a comparative genomic investigation are likely to reflect nothing more than the limited number of genomic rearrangements following descent from a common ancestor. To move beyond the simple observation of large, conserved genome segments in the search for functionally related gene blocks, one requires knowledge of the “null” distribution of conserved segment lengths (i.e., that expected from random chromosome reshuffling and descent from a common ancestor). If the number of breakpoints separating the species in question is known, along with the total lengths of their genomes (in centimorgans or base pairs), the mean number of rearrangements per unit genome length n/L (where L is the total genome length) can be calculated. Given this information, the probability distribution of no rearrangements in a segment of length x can be derived from the Poisson distribution as P(x) = exp(−nx/L) (see Nadeau and Taylor 1984). This distribution provides a benchmark against which to compare the observed distribution of conserved segment sizes. One may then ask whether there are segments that appear longer than expected given
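The benchmark P(x) = exp(−nx/L) can be evaluated directly, as in the following sketch. The genome length and segment lengths used are hypothetical values chosen only to show the calculation; they are not taken from any of the studies cited.

```python
from math import exp


def prob_no_rearrangement(x: float, n: float, genome_length: float) -> float:
    """P(no breakpoint falls in a segment of length x) = exp(-n * x / L).

    n is the number of breakpoints and genome_length is L, in the same
    units as x (e.g., centimorgans).
    """
    return exp(-n * x / genome_length)


n = 124                 # breakpoints (illustrative)
genome_length = 1000.0  # hypothetical total map length, in cM

for x in (5.0, 10.0, 25.0, 50.0):
    p = prob_no_rearrangement(x, n, genome_length)
    print(f"x = {x:5.1f} cM: P(no rearrangement) = {p:.4f}")
```

A conserved segment whose length has a very small probability under this null distribution would be a candidate for closer inspection, subject to the accuracy of the estimate of n.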
Marker numbers, the estimation of n, and exploratory surveys of genomic evolution: The results of this investigation have implications for applied studies and comparative evolutionary work based on comparative mapping. They suggest that studies of chromosome evolution based on low densities of markers (e.g., <100–200 per genome) may underestimate the amount of chromosomal rearrangement, especially between taxa that are distantly related or in instances where inversion has played a large role in restructuring the chromosome. Ehrlich et al. (1997), who have used the random-breakage model to study chromosomal evolution in mammals, estimated that interchromosomal rearrangements have occurred roughly four times as often as intrachromosomal rearrangements following the divergence of humans and mice from their common ancestor. This is unexpected given the apparent strong selection against translocations. A relatively high ratio of translocations to inversions has also been reported in other investigations (Lagercrantz 1998). It is possible that some of the observed high ratios of translocations to inversions may be due to the inherent bias against detection of inversions as noted above.
Another issue is how many markers are required to obtain a low-variance estimate of chromosomal rearrangement. When comparative mapping is used to examine the prospects of applying genetic map information from a well-characterized model species to a less well-characterized target species, the emphasis is often on uncovering candidate regions containing quantitative trait loci (e.g., for genes contributing to yield or disease resistance; Lin et al. 1995; Paterson et al. 1995; Pereira and Lee 1995). The first objective is not a fine-scale comparative map, but rather the rough evaluation of the extent of conservation of synteny and gene order in the target group. Once a picture of this emerges, the investigator can determine whether additional map detail would greatly enhance the prospects of finding conserved segments containing the gene(s) of interest and marker(s). For the initial task, our results suggest that several hundred markers per species are sufficient. This means that if other (e.g., related) species are of interest, comparative mapping effort could be allocated over more members of the target group. This has relevance to efforts aimed at uncovering and evaluating the potential of nontraditionally used germ plasm (e.g., wild relatives of crop plants) as sources of useful genetic variation (Tanksley and McCouch 1997). In other words, a more useful division of comparative mapping effort in these types of investigations may be to spread effort across a larger number of candidate species rather than to pursue an ever more detailed comparative study of one or two species. A similar argument may hold when one is interested in using information on chromosomal rearrangement to construct a phylogeny or to compare rearrangement rates in different lineages (Ehrlich et al. 1997).
Conclusions: Apart from the bias against detection of inversion, the results presented in this investigation accord well with those of other studies in suggesting that estimation of numbers of chromosome breakpoints is robust to relatively small numbers of markers. For example, Nadeau and Sankoff (1998) have shown that as additional markers are included in a comparative mapping effort, the undetected but conserved segments become progressively smaller in number and in length. As well, estimates of genome rearrangement obtained with few markers have not changed substantially when many more markers are added (Nadeau and Taylor 1984; Copeland et al. 1993). It seems reasonable to conclude that much can be learned about the amount of gross chromosomal rearrangement from comparative mapping studies that use a moderate number of markers. In some species, however, factors such as many inversions, small-scale deletions and transpositions (e.g., below the resolution provided by the marker density used), and large-scale duplications of entire chromosomes may render the task more difficult, and more research will be required to deal with the complications resulting from such events.
Acknowledgments
I thank Steve Wright, Andrew Paterson, and two anonymous reviewers for commenting on the material presented in this article. This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.
Footnotes

Communicating editor: A. G. Clark
 Received February 12, 1999.
 Accepted October 22, 1999.
 Copyright © 2000 by the Genetics Society of America