Bayesian Estimation of Genomic Distance
 ^{*} Department of Mathematics, Cornell University, Ithaca, New York 14853
 ^{†} Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853
 1 Corresponding author: Department of Mathematics, 523 Malott Hall, Cornell University, Ithaca, NY 14853. Email: rtd1@cornell.edu
Abstract
We present a Bayesian approach to the problem of inferring the number of inversions and translocations separating two species. The main reason for developing this method is that it will allow us to test hypotheses about the underlying mechanisms, such as the distribution of inversion tract lengths or rate constancy among lineages. Here, we apply these methods to comparative maps of eggplant and tomato, human and cat, and human and cattle with 170, 269, and 422 markers, respectively. In the first case the most likely number of events is larger than the parsimony value. In the last two cases the parsimony solutions have very small probability.
UNDERSTANDING the relationship between the organizations of two genomes is important for transferring information between species, for example, for finding animal models of human diseases or locating genes of agricultural importance (O'Brien et al. 1999). Inferences concerning genome evolution have primarily used parsimony methods. In the biology literature, experts have compared chromosome-banding patterns to detail the evolution of primate genomes (Dutrillaux 1979) or have used comparative gene mapping and cross-species chromosome painting to analyze mammalian genome evolution (Chowdhary et al. 1996; Haig 1999; Murphy et al. 2001). In a separate and independent effort computer scientists have developed algorithms for computing the minimum number of events needed to transform one genome into another (Hannenhalli and Pevzner 1995a,b). The number of evolutionary events (i.e., inversions and translocations) separating two species is of intrinsic biological interest, but may also guide comparative mapping studies and may be used for estimating phylogenies.
In an earlier article (York et al. 2002) we developed a Bayesian method to infer the history of inversions separating homologous chromosomes from two different species. This method can be applied to mitochondrial genomes, to X chromosomes, and to chromosome arms of species of genera Drosophila (Ranz et al. 2001) and Anopheles (Sharakhov et al. 2002), where pericentric inversions and translocations are rare. In this article we extend these methods to the problem of genomic distance, i.e., the number of inversions and translocations needed to transform one genome into another. Here, fissions and fusions are included as a special case of translocations in which one of the input or output chromosomes is empty.
There are several reasons for preferring our statistical approach to parsimony methods. The first and simplest is that there is no guarantee that nature took the shortest path. Indeed, we see in two of our examples that the parsimony solution is not in the 95% credible interval. A second important reason is that our approach allows for estimation of rates of evolution and testing of hypotheses: Are all inversions equally likely? Are rates the same in different lineages? Answering the first question is important for a proper analysis of the second. Otherwise we will not know whether a perceived higher rate of rearrangements in rodents compared to carnivores is real or merely reflects the higher density of markers in the rodent comparison.
In this article we concentrate on estimating the number of events needed to change one genome into the other, leaving hypothesis-testing issues for later work. In the next section we describe existing parsimony methods. The following section (A BAYESIAN APPROACH) explains our new approach, and in the last section we analyze three data sets.
PARSIMONY METHODS
Hannenhalli and Pevzner (1995a,b), whom we abbreviate HP, developed a polynomial algorithm for computing the distance between two genomes, i.e., the smallest number of inversions and translocations needed to transform one genome into the other. They accomplished this in two steps. In the first article they solved the problem of computing the reversal distance, i.e., the smallest number of inversions needed to transform one chromosome or mitochondrial genome into another. In the second article they showed how to reduce the genome distance problem to the one for reversal distance by concatenating chromosomes into one unit. That approach is convenient for proving the existence of a polynomial algorithm, but not for our implementation, so we take a different approach.
To illustrate the method we consider part of the data of Doganlar et al. (2002), who constructed a comparative genetic linkage map of eggplant (Solanum melongena) on the basis of 233 markers for tomato. Using the first letter of the common name to denote the species they found that the marker orders on T1 and E1 and on T8 and E8 were identical, while in four other cases (T2 vs. E2, T6 vs. E6, T7 vs. E7, and T9 vs. E9) the collections of markers were the same and the orders became the same after a small number of inversions were performed (3, 1, 2, and 1, respectively).
In our example we compare the remaining six chromosomes from the two species. The first step is to divide the chromosomes into conserved segments where the adjacency of markers has been preserved between the two species, allowing for the possibility of the overall order being reversed. When such segments have two or more markers we can determine the relative orientation. However, as the HP algorithm assumes that one knows the relative orientation of segments we have to assign orientations to conserved segments consisting of single markers to minimize the distance. In the case of the tomato-eggplant comparison there are only five singleton segments so one can easily consider all 2^{5} = 32 possibilities. In the human-cat and human-cattle comparisons in ANALYSIS OF THREE DATA SETS there are 21 and 75 single-marker segments, respectively, so the amount of work is considerable (2^{21} = 2,097,152) or impossible (2^{75} = 3.77 × 10^{22}).
We now describe the details of the distance computation. Conceptually this is simple—one defines a graph and then computes the number of cycles. However, carrying out the procedure involves a number of details. The first part of Table 1 shows the two genomes with an assignment of signs to the singlemarker segments that minimizes the distance. The first step in preparing to use the HP algorithm is to double the markers. When segment i is doubled we replace it by two consecutive numbers, 2i – 1 and 2i; e.g., 6 becomes 11 and 12. A reversed segment –i is replaced by 2i and 2i – 1; e.g., –5 is replaced by 10 and 9. The second step is to add ends to the chromosomes and enough empty chromosomes to make the number of chromosomes equal. In this example, no empty chromosomes are needed. We have labeled the ends in the first genome by 1000–1011 and in the second genome by 2000–2011.
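The doubling step above is mechanical and can be captured in a few lines (a sketch; the helper name is ours, not from the original):

```python
def double_segment(i):
    """Double a signed conserved segment as in the HP construction:
    segment i becomes (2i - 1, 2i); a reversed segment -i becomes (2i, 2i - 1)."""
    if i > 0:
        return (2 * i - 1, 2 * i)
    return (-2 * i, -2 * i - 1)
```

For example, segment 6 becomes (11, 12) and segment –5 becomes (10, 9), matching the text.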
The second part of Table 1 shows the result of the first two preparatory steps. Commas indicate separations between two segments or between a segment and an end. The next step is to construct the breakpoint graph that results when the commas are replaced by edges that connect vertices with the corresponding numbers. We did not draw the graph since we need to know only the connected components of the graph. Since each vertex has degree at most two, these are easy to find: Start with a vertex and follow the connections. The resulting component will be either a path that connects two ends or a cycle that consists of markers and no ends. In our example there are five paths of length three that connect ends. These “short paths” tell us that end 1000 in genome 1 corresponds to end 2000 in genome 2, etc. The other correspondences between ends are determined after we compute the distance.
The other “long paths” in the breakpoint graph are listed in Table 1. At this point there are no cycles in the breakpoint graph. To compute a lower bound for the distance we start now with the number of commas seen when we write out one genome. In this example that is 33. We subtract the number of connected components in the breakpoint graph. In this example that is 5 + 7 = 12, and then we add the number of paths that begin and end in the same genome, which in this case is 0. The result, which is 21 in this case, is a lower bound on the distance since any inversion or translocation can at most reduce this quantity by 1, and it is 0 when the two genomes are the same.
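The lower-bound computation can be sketched as follows. This is our own minimal implementation, not the authors' code: the genome encoding is ours, it computes only the basic cycle/path bound (no hurdle or fortress corrections), and it assumes the two genomes carry the same segments and, after padding with empty chromosomes, the same number of chromosomes. Paths with both ends in genome 1 then pair off with paths having both ends in genome 2, so we count only the former.

```python
from collections import defaultdict

def breakpoint_lower_bound(genome1, genome2):
    """Lower bound on the inversion + translocation distance:
    (#commas in one genome) - (#connected components of the breakpoint
    graph) + (#paths whose two chromosome ends both lie in genome 1).
    Each genome is a list of chromosomes; each chromosome is a list of
    signed segment numbers."""

    def double(seg):
        # segment i -> (2i - 1, 2i); reversed segment -i -> (2i, 2i - 1)
        return (2 * seg - 1, 2 * seg) if seg > 0 else (-2 * seg, -2 * seg - 1)

    def comma_edges(genome, label):
        # replace each comma by an edge; chromosome ends get fresh vertices
        edges, n = [], 0
        for chrom in genome:
            verts = [(label, n)]
            n += 1
            for seg in chrom:
                verts.extend(double(seg))
            verts.append((label, n))
            n += 1
            edges += [(verts[k], verts[k + 1]) for k in range(0, len(verts), 2)]
        return edges

    parent = {}

    def find(v):  # union-find with path compression
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    e1, e2 = comma_edges(genome1, "g1"), comma_edges(genome2, "g2")
    for a, b in e1 + e2:
        parent[find(a)] = find(b)
    components = {find(v) for v in parent}
    end_labels = defaultdict(list)
    for v in parent:
        if isinstance(v, tuple):  # chromosome-end vertex
            end_labels[find(v)].append(v[0])
    g1_paths = sum(1 for labs in end_labels.values() if labs == ["g1", "g1"])
    return len(e1) - len(components) + g1_paths
```

On toy inputs, two identical genomes give 0; genomes one inversion apart, `breakpoint_lower_bound([[1, 2]], [[1, -2]])`, give 1; and a single fission, `breakpoint_lower_bound([[1, 2], []], [[1], [2]])`, also gives 1.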
In general the distance between genomes can be larger than the lower bound from the breakpoint graph and the computation of the exact distance can be quite complex. In the reversal distance problem there can be obstructions called hurdles that can prevent us from decreasing the distance and hurdles can be intertwined in a fortress of hurdles that takes an extra move to break. In the genome distance problem the situation is even more complex and one must consider six different quantities to compute the distance. Fortunately these complexities rarely arise in biological data sets. Bafna and Pevzner (1995) considered the reversal distance problem for 11 chloroplast and mitochondrial data sets and in all cases they found that the distance is equal to the lower bound. That is also the case for the three examples we consider in this article.
To verify that 21 is the minimum distance we construct a sequence of 21 moves that transforms genome 1 into genome 2. This is straightforward, but somewhat tedious, so we have hidden the details away in an appendix. We have given this computation to show that the procedure gives a systematic method for finding the shortest path. In the example we are considering our solution gives a slight improvement on that of Doganlar et al. (2002).
The HP method just described gives an estimate of the number of events that can be seen with the marker set used; e.g., an inversion that occurs between two markers or reverses the orientation of a single marker cannot be detected. The method of Nadeau and Taylor (1984), which can be used when we know the distances between markers in one of the genomes, avoids this problem. Their computation takes the lengths of conserved segments with two or more markers, enlarges these lengths on the basis of the expected value of the unseen flanking ends, and then uses the transformed data to get an estimate of the average size of conserved segments. Using the total size of the genome, this translates into an estimate of the number of conserved segments, S.
Subtracting the number of chromosomes in the genome for which the distances are known from S gives an estimate of the number of disruptions, R, that have occurred. Given R, we divide by 2 to get an estimate of the number of events that have occurred. In the literature this estimate is called the breakpoint distance. It is reasonable to use in Nadeau and Taylor's context, since they are considering the situation in which complete information about marker order has revealed all of the disruptions, and if we assume that all breakpoints are equally likely, then with high probability no two disruptions will occur between the same pair of nucleotides. On the other hand, with a small number of markers, e.g., in the tomato-eggplant comparison considered above, the breakpoint distance is not very accurate. For the six chromosomes considered, we have 27 – 6 = 21 disruptions, so the breakpoint distance is 10.5, while the genomic distance is 21.
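The arithmetic of the Nadeau and Taylor estimate can be captured in a short sketch (function and parameter names are ours; the flanking-end length adjustment itself is omitted and the adjusted mean segment length is taken as an input):

```python
def breakpoint_distance_estimate(mean_segment_mb, genome_mb, n_chromosomes):
    """Breakpoint-distance estimate from an adjusted mean conserved-segment
    length: S = genome length / mean segment length, R = S - #chromosomes,
    and the estimated number of events is R / 2."""
    n_segments = genome_mb / mean_segment_mb      # S
    n_disruptions = n_segments - n_chromosomes    # R
    return n_disruptions / 2                      # breakpoint distance
```

With the human-cattle figures given later in the article (adjusted mean length 7.57 Mb, a 3200-Mb genome, and 22 chromosomes) this yields roughly 200 events.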
A BAYESIAN APPROACH
The aim of the Bayesian approach is to generate the probability distribution of inversions and translocations given the data. The Bayesian approach is desirable because it allows probabilities to be assigned to particular evolutionary histories. On the basis of these probabilities, estimates of the number of inversions and translocations with associated measures of statistical uncertainty can be obtained. In addition, the Bayesian approach allows us to make probabilistic statements regarding the relative frequency of translocations and inversions and investigate the distribution of the inversion tract length. However, to use the Bayesian approach assumptions must be made regarding the biological processes generating inversions and translocations. We assume that inversions and translocations arise in the genome at a constant rate and that the process of inversions and translocations forms a Markov chain with state space Ω_{O} given by the set of all possible orderings of the markers on ordered chromosomes, where all markers are signed to indicate their orientation on the chromosome. The number of such states is enormous (for n signed markers on a single chromosome alone there are already 2^{n}n! orderings).
Transitions between neighboring states of the collapsed Markov chain on Ω_{U}, in which genomes that differ only in the order and orientation of whole chromosomes are identified, occur at rate λ_{I} if the two states differ by an inversion and λ_{T} if they differ by a translocation, except for translocations involving a fission of two chromosomes in an equivalence class with M_{0} empty chromosomes. These transitions occur at rate 2M_{0}λ_{T}. Therefore, the transition probabilities of the Markov chain obey the detailed balance equations and the Markov chain is time reversible. In the following we use the representation based on the Markov chain with state space Ω_{U}.
Consider the genomes of two organisms. The genomic marker data from one species (x_{1}) can be transformed into the genomic marker data from another species (x_{2}) through a sequence of inversions and translocations. Because the process we have defined is time reversible, we can write the sampling probability P(x_{2} | x_{1}) as a sum over the possible evolutionary paths y of inversions and translocations leading from x_{1} to x_{2}.
In practice, to avoid keeping track of the times between events, we use a method based on uniformization to ensure that the total rate of change is constant in time. In brief, letting I_{T}(t) and I_{I}(t) denote the numbers of possible translocations and inversions at time t, we allow pseudo-events of evolutionary change, which have no effect on marker order, to occur at rate Λ − (I_{T}(t)λ_{T} + I_{I}(t)λ_{I}) at time t, where Λ = max_{(I_{T}, I_{I})}{I_{T}λ_{T} + I_{I}λ_{I}}. The total rate of evolutionary change is then kept constant at Λ. As in York et al. (2002), an efficient proposal kernel for y is constructed by considering the breakpoint graph of Hannenhalli and Pevzner (1995a).
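A single uniformized update can be sketched as follows (our own illustration, not the authors' implementation; the counts of possible moves in the current state are passed in as arguments):

```python
import random

def uniformized_step(n_trans, n_inv, lam_t, lam_i, lam_max):
    """One step of the uniformized jump chain (a sketch).  n_trans and
    n_inv are the numbers of possible translocations and inversions in the
    current state; lam_max is the constant total rate, i.e., the maximum of
    n_trans * lam_t + n_inv * lam_i over all states."""
    real_rate = n_trans * lam_t + n_inv * lam_i
    if random.random() < real_rate / lam_max:   # a real event occurs
        # choose the event type proportionally to its share of the rate
        if random.random() < n_trans * lam_t / real_rate:
            return "translocation"
        return "inversion"
    return "pseudo-event"                       # marker order unchanged
```

When the current state attains the maximum (real_rate equals lam_max) no pseudo-event can occur; when no real moves are possible every step is a pseudo-event.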
The prior densities for λ_{T} and λ_{I} are assumed to be independent uniform on (0, λ_{Tmax}) and (0, λ_{Imax}), respectively. Updates to these parameters are then proposed by simulating a new value of the parameter uniformly in a window around the current value, independently of each other and of updates of y. The marginal posterior distributions of the parameters are then estimated by sampling values of λ_{T} and λ_{I} from the simulated Markov chain at stationarity. The distribution of values of λ_{T} and λ_{I} is proportional to the joint likelihood function for λ_{T} and λ_{I}, and the results can be interpreted in both a Bayesian and a likelihood framework. Here we focus on the Bayesian interpretation since the usual asymptotic properties of the likelihood function may not be satisfied.
The method can also be used to make inferences on y. The posterior distribution of the number of inversions (L_{I}) and the number of translocations (L_{T}) that have occurred in the history of the two species can be estimated by sampling from the simulated Markov chain. Likewise, the posterior distribution of inversion tract lengths can be estimated.
To improve convergence we used the technique of Metropolis-coupled Markov chain Monte Carlo, in which each chain with stationary distribution π is coupled to a set of “heated” chains with modified stationary distributions, π_{i} = π^{1/T_{i}}, where the “temperatures,” T_{i}, are slightly greater than one, resulting in flattened distributions and, consequently, better mixing. Only unheated chains are used in estimating posterior distributions.
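The coupling scheme can be sketched on a toy target (entirely our own illustration: log_pi is any log-density known up to a constant and propose is a symmetric proposal; the real sampler updates evolutionary paths and rate parameters rather than an integer state):

```python
import math
import random

def mc3_sample(log_pi, propose, x0, temps, n_steps, seed=0):
    """Metropolis-coupled MCMC sketch: chain i targets pi^(1/temps[i]),
    with temps[0] = 1 the unheated chain.  Adjacent chains periodically
    propose to swap states; only the unheated chain's states are returned."""
    rng = random.Random(seed)
    xs = [x0] * len(temps)
    cold_samples = []
    for _ in range(n_steps):
        # Metropolis update within each (possibly heated) chain
        for i, temp in enumerate(temps):
            y = propose(xs[i], rng)
            log_acc = (log_pi(y) - log_pi(xs[i])) / temp
            if rng.random() < math.exp(min(0.0, log_acc)):
                xs[i] = y
        # propose swapping the states of two adjacent chains
        i = rng.randrange(len(temps) - 1)
        a, b = xs[i], xs[i + 1]
        log_acc = (log_pi(b) - log_pi(a)) / temps[i] \
                  + (log_pi(a) - log_pi(b)) / temps[i + 1]
        if rng.random() < math.exp(min(0.0, log_acc)):
            xs[i], xs[i + 1] = b, a
        cold_samples.append(xs[0])
    return cold_samples
```

The heated chains cross low-probability valleys more easily, and accepted swaps feed those excursions back to the unheated chain.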
Our estimates of posterior distributions use every eighth update, after excluding an initial burn-in phase, i.e., the part of the Markov chain before convergence to stationarity. To determine an appropriate length of the burn-in phase, we run multiple chains and compare the between-chain variance of the evolutionary path length L (for example), B_{L}, to the within-chain variance, W_{L}, requiring that B_{L}/W_{L} become small. This is essentially the method of Gelman and Rubin (1992). We similarly consider B/W for L_{I}, L_{T}, λ_{I}, and λ_{T}, and define the burn-in phase to end when all five of these ratios are <0.1. The thinning factor of 8 is chosen to keep the output files to a reasonable size, and since this is much less than the number of updates to stationarity (which is >10^{4} for all of the data sets considered here) there should be little loss of information about the posterior distribution.
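The B/W criterion can be sketched as follows (a simplified version of the Gelman and Rubin diagnostic; the full statistic involves additional scaling terms):

```python
def between_within_ratio(chains):
    """Ratio of the between-chain variance of the chain means (B) to the
    average within-chain variance (W), both using n - 1 denominators."""
    means = [sum(c) / len(c) for c in chains]
    grand = sum(means) / len(means)
    b = sum((m - grand) ** 2 for m in means) / (len(means) - 1)
    w = sum(sum((x - m) ** 2 for x in c) / (len(c) - 1)
            for c, m in zip(chains, means)) / len(chains)
    return b / w
```

Chains that agree give a ratio near 0; here the burn-in is taken to end once the ratio drops below 0.1 for all five monitored quantities.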
ANALYSIS OF THREE DATA SETS
Tomato vs. eggplant: Doganlar et al. (2002) constructed a comparative map of tomato and eggplant consisting of 233 markers. Thinning their data set to choose one marker from each group in which the order was not resolved leads to a data set with 170 markers. On the basis of this map and comparisons with maps for potato (Tanksley et al. 1992) and pepper (Livingstone et al. 1999) they concluded: “Overall, eggplant and tomato were differentiated by 28 rearrangements, which could be explained by 23 paracentric inversions and five translocations during evolution from the species' last common ancestor” (Doganlar et al. 2002, p. 1697).
As demonstrated in PARSIMONY METHODS, the parsimony solution for this comparison is 21 events for the six chromosomes plus seven inversions for the other six, also a total of 28, but using only ordinary translocations. Our Bayesian analysis produced 95% credible intervals of [5, 7], [21, 31], and [28, 37] for the number of translocations, inversions, and total number of events (respectively) separating tomato and eggplant. These results are from 6 unheated chains, each coupled to 4 heated chains. Each of these 30 chains was updated 459,000 times, using 75,000 sec of CPU time on a 1.5-GHz processor. The first 14,000 updates are excluded as burn-in. The posterior distribution for the number of translocations assigns probability 0.08172 to 5, 0.55407 to 6, 0.32137 to 7, 0.03832 to 8, and 0.00453 to 9 or more. Figure 1 gives the posterior distribution of the number of inversions, which has a mode of 25. Thus, even in the case of these two closely related species, the most likely number of inversions and translocations is somewhat higher than the parsimony estimates.
Figure 2 gives the posterior joint distribution of the rates of translocations λ_{T} and inversions λ_{I}. The mode of the distribution occurs at λ_{T} = 0.000219 and λ_{I} = 0.0194. To interpret these numbers we note that in the two genomes being compared there are an average of 30,271 possible translocations and 1335 possible inversions. Multiplying we see that the total rate is 6.629 for translocations and 25.899 for inversions. This rate assumes that the total evolutionary time between the two genomes has been scaled to be 1. Taking 12 million years as an estimate of the divergence between tomato and eggplant we arrive at rates of 0.276 and 1.078 per genome per million years for translocations and inversions.
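The scaling from per-move rates to per-million-year genome rates works as follows (our own restatement of the arithmetic above; note that the path between the two genomes spans twice the divergence time, since changes accumulate on both lineages):

```python
def genome_rates_per_my(lam_t, lam_i, n_trans, n_inv, divergence_my):
    """Convert per-move rates (posterior mode) into genome-wide rates per
    million years.  n_trans and n_inv are the average numbers of possible
    translocations and inversions; the evolutionary path between the two
    genomes covers 2 * divergence_my million years."""
    path_my = 2.0 * divergence_my
    return lam_t * n_trans / path_my, lam_i * n_inv / path_my
```

With the tomato-eggplant values (0.000219 and 0.0194 per move, 30,271 and 1335 possible moves, 12 million years) this reproduces the quoted 0.276 and 1.078 per genome per million years.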
Human vs. cat: Murphy et al. (2000) created a radiation hybrid map of the cat genome integrating 424 genes with 176 microsatellite markers. Using resources on the National Center for Biotechnology Information (NCBI) home page, http://www.ncbi.nlm.nih.gov, we were able to locate the position of 281 of the genes in the human genome. From this set we deleted 12 singleton disruptions, i.e., isolated genes that map to a chromosome different from their neighbors. We do this because it is our belief that these genes are the result of duplications of small segments that recent studies (Bailey et al. 2002a) have shown to occur frequently in the human genome. In support of this practice we note that none of the regions deleted are observed in the chromosome-painting experiments of Weinberg et al. (1997) and those techniques are thought to be capable of detecting segments as small as 5 Mb.
Parsimony analysis shows that the marker order in the human genome can be transformed into the cat genome in 78 moves (14 translocations and 64 inversions). Our Bayesian analysis gives 95% credible intervals of [12, 15], [71, 89], and [85, 102] for the numbers of translocations, inversions, and total number of events, respectively. These results are from 6 unheated chains, each coupled to 3 heated chains. Each of these 24 chains was updated 2.2 million times, using 790,000 sec of CPU time. The first 306,000 updates are excluded as burn-in.
Note that the parsimony distance is not in the 95% credible interval for the total number of events. In fact, the posterior probability of a total of 78 events is ∼2 × 10^{–5}. The posterior distribution for the number of translocations assigns probability 0.5967 to 12, 0.3451 to 13, 0.0531 to 14, and 0.0050 to 15 or more, so this time the smallest value is the most likely. The posterior distribution of the number of inversions is given in Figure 3, with the mode being 79.
The extensive conservation of gene order between human and cat is a widely cited result; see, e.g., O'Brien et al. (1999). However, in some cases this fact is exaggerated on the basis of a misunderstanding: Bailey et al. (2002b, p. 83) say “Even human comparisons to distantly related mammals demonstrate strong conservation with the estimated number of rearrangements varying from 17 in felines to 180 in mice.” Unfortunately, here they are comparing the number of inversions and translocations in felines on the basis of fluorescence in situ hybridization data, which cannot detect many inversions, with the number of inversions and translocations in mice based on a fairly dense comparative map.
Murphy et al. (1999a) mapped 25 markers on the feline X chromosome and found that the marker order in humans is identical. In the radiation hybrid map of Murphy et al. (2000), complete conservation of marker order holds for the 8 markers on HSA 9 and FCA D4, but most other chromosomes show a number of inversions. Our estimate for the human-cat comparison is about one-half of the one for the human-cattle comparison given below. It would be interesting to use three-species comparisons to partition the events between lineages, but here and in most other cases there are not enough shared markers to make inferences. Bourque and Pevzner (2002) compared human, cat, and rat. They found 193 markers shared by the three species, but when they discarded the ones for which they could not determine the orientation only 114 were left.
It is interesting to compare the number of translocations estimated with the result of chromosome-painting experiments of Weinberg et al. (1997). The syntenic segments (i.e., contiguous sets of markers that all map to the same chromosome) from chromosome painting correspond to those in the comparative map with one notable exception: HSA 12 has a segment homologous to part of FCA D3, which was not revealed by the original chromosome painting but was identified by the follow-up study of Murphy et al. (1999b). If we add this segment to the chromosome painting then one needs 12 translocations and two inversions to transform one genome into the other.
Finally we compare our estimates with those of Nadeau and Taylor. There are 83 conserved segments with two or more markers with an average length of 16.3 Mb, which results in an estimated average length of 18.2 Mb. Using 3.2 Gb for the length of the human genome yields an estimate of 175.8 segments. Subtracting 22 chromosomes in the human genome, we arrive at an estimate of 155.8 disruptions or an estimate of 77.4 events. The reader should note that not only is this lower than the parsimony and Bayesian estimates but also the method of Nadeau and Taylor includes events that are not visible with our set of markers.
Figure 4 gives the posterior joint distribution of the rates of translocations λ_{T} and inversions λ_{I}. The mode of the distribution occurs at λ_{T} = 0.000161 and λ_{I} = 0.0350. To interpret these numbers we note that in the two genomes being compared there are an average of 79,650 possible translocations and 2370 possible inversions. Multiplying we see that the total rate is 12.82 for translocations and 82.95 for inversions. Taking 100 million years as an estimate for the divergence between humans and cats we arrive at rate estimates of 0.0641 and 0.415 per genome per million years for translocations and inversions.
Human vs. cattle: Band et al. (2000) constructed a radiation hybrid map of cattle (Bos taurus) with a total of 638 genes. For the data see http://bos.cvm.tamu.edu/htmls. Using resources on the NCBI home page, we were able to locate the position of 448 of the genes in the human genome. Deleting 24 singleton disruptions for the reasons indicated above results in a map with 422 markers. Again the reduced map is consistent with the results of chromosome painting (see Hayes 1995; Chowdhary et al. 1996).
Parsimony analysis shows that the marker order in the human genome can be transformed into the cattle genome in 155 moves (20 translocations and 135 inversions). Our Bayesian approach experienced convergence problems in this example. Four unheated chains were each coupled to eight heated chains. Two of these sets of chains ran for 1.3 million updates and the other two sets for 1.5 million updates, using a total of 2.6 million CPU seconds. Our usual criterion for the end of burn-in was not attained; a burn-in of 600,000 updates is used for the plots shown.
Figures 5 and 6 give the four posterior distributions for the number of translocations and inversions. In Figure 5 there is a considerable difference between the chains, indicating convergence problems. The qualitative differences between chains in the number of inversions are not as great as in the case of translocations. The modes are all in the range 185–191, but the variance differs considerably from run to run. We cannot make statements with much confidence about the number of inversions and translocations. However, two things are clear: (a) The number of events is roughly twice that in the human vs. cat comparison even though the divergence times are similar and (b) our conclusions differ considerably from those of Band et al. (2000, p. 1359), who say that their “comparative map suggests that 41 translocation events and a minimum of 54 internal rearrangements have occurred.” They do not explain how they reached this conclusion. However, our estimate would include a larger number of translocations had we not deleted the singletons, and theirs would underestimate the number of inversions if it relied on the breakpoint distance.
To estimate distances using the methods of Nadeau and Taylor we note that there are 125 conserved segments with 2 or more markers with an average length of 7.19 Mb, which results in an estimated average length of 7.57 Mb. Using 3.2 Gb for the length of the human genome yields an estimate of 422.7 segments. Subtracting 22 chromosomes in the human genome gives an estimate of 400.7 disruptions or an estimate of 200 events. This is somewhat larger than our Bayesian estimate, but it is consistent with the fact that our estimate is restricted to the events that can be detected by the 422 markers in our map.
CONCLUSIONS
Elucidating the evolutionary history of genomes is one of the major goals of comparative genomics. We have shown that a full probabilistic approach to the problem is feasible, but the method experiences difficulty with data sets that have had a large number of events (inversions and translocations). For the human-cat and human-cattle comparisons, the probability that the true number of events is close to the minimum number is very small, so in these cases the parsimony estimate is not reliable. The breakpoint distance, i.e., the number of disruptions divided by 2, is even worse.
Our analysis of the three data sets has revealed a consistent pattern in which inversions are four to seven times more frequent than translocations. Our ratio is larger than that of Murphy et al. (2001), who conclude that “the ordered gene maps of the cat and cow genomes show approximately twice as many intrachromosomal (inversions) as interchromosomal changes” (Murphy et al. 2001, p. 4). There are three reasons for this discrepancy: (i) Biologists use the breakpoint distance, which for these maps underestimates the minimum number of inversions; (ii) our Bayesian estimate is typically considerably larger than the minimum distance; and (iii) as genetic maps become denser more inversions can be seen.
Our methods can be extended to a model in which inversion probabilities depend upon tract length. This generalization is important if we are going to accurately extrapolate from the number of inversions seen by a collection of markers to the number that have actually occurred in the history of a genome. A second extension we would like to pursue is to incorporate gene duplications to treat the entire data set and obtain rate estimates for small segmental duplications.
Acknowledgments
All three authors were partially supported by grant no. 0201037 from a joint program of the National Science Foundation and National Institute of General Medical Sciences.
APPENDIX
To explain the solution for the eggplanttomato data set, we call the edges that result from commas in genome 1 black edges and those from genome 2 gray edges. On each step we choose an inversion or translocation involving two gray edges to increase the number of connected components by 1. It is easy to check that this can be done only by picking two edges in the same cycle.
Some of the cycles are easy to deal with. Using dashes to indicate gray edges, flipping segment 17 turns 1007 36–32 33–35 34–2009 into 1007 36–32 33–2009 and 35 = 34, where = indicates a black and a gray edge now connecting 34 and 35. Flipping 18 17 now we have 1007 36–2009 and 32 = 33. Of course we could have also flipped 18 17 to 17 18 and then flipped 17. Similarly, 1009 44–42 43–40 41–2002 is eliminated by changing 21 22 20 to 22 21 20 with two inversions and 1011 54–49 48–52 53–51 50–2011 is eliminated by changing 24 26 27 25 to 24 25 26 27 with three inversions.
To follow the solution on page 1707 of Doganlar et al. (2002) we also do two inversions to turn 1 5 2 to 1 2 5, flip the order of 14 11 15 3, flip 19, and perform a reciprocal translocation of T4 and T10. After these 11 events the situation is given by the arrangement called step 1 at the top of Table A1. Twenty-three components are now in the breakpoint graph, so the distance has been reduced to 10. At this point Doganlar et al. (2002) invoke a nonreciprocal translocation to put 3 4 in the right place. In our scheme this requires two translocations. Written in terms of the cycles the moves are 29–5 with 4–9 and then 29–9 with 8–2006. In terms of the chromosomes we first make 1 2 3 4 and 6 5 15 11 14 9 and then 1 2 3 4 5 6 and 15 11 14 9.
Performing translocations 27–19 with 26–29 and 31–39 with 38–47 and then performing two inversions to turn 12 23 to 23 12 lead to the arrangement called step 2. After these six changes the distance is 4. To finish up we flip 9 and then perform the translocation 46–23 with 31–47 to make 23 24 25 26 27 and 18 17 16 12 13 14 11 15. Flipping 11 15 to 15 11 and then 12 13 14 15 we have E10 written backward and we are done.
Footnotes

Communicating editor: J. B. Walsh
 Received June 9, 2003.
 Accepted October 1, 2003.
 Copyright © 2004 by the Genetics Society of America