Genetics, Vol. 154, 323-332, January 2000, Copyright © 2000

A Chromosome-Based Model for Estimating the Number of Conserved Segments Between Pairs of Species From Comparative Genetic Maps

David Waddington, Anthea J. Springbett, and David W. Burt
Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, Scotland

Corresponding author: David Waddington, Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, Scotland. E-mail: dave.waddington@bbsrc.ac.uk

Communicating editor: G. A. CHURCHILL


*  ABSTRACT

Comparative genetic maps of two species allow insights into the rearrangements of their genomes since divergence from a common ancestor. When the map details the positions of genes (or any set of orthologous DNA sequences) on chromosomes, syntenic blocks of one or more genes may be identified and used, with appropriate models, to estimate the number of chromosomal segments whose content is conserved between species. We propose a model for the distribution of the lengths of unobserved segments on each chromosome that allows for widely differing chromosome lengths. The model uses as data either the counts of genes in a syntenic block or the distance between extreme members of a block, or both. The parameters of the proposed segment length distribution, estimated by maximum likelihood, give predictions of the number of conserved segments per chromosome. The model is applied to data from two comparative maps for the chicken, one with human and one with mouse.


COMPARATIVE gene mapping, the analysis of the chromosomal location of homologous genes in different species, is a powerful tool for gene mapping and the study of genome organization and evolution. The most detailed comparisons are between mouse and man, with >2000 homologous genes mapped in both species. Almost 200 linkage groups are conserved between these two species (CARVER and STUBBS 1997). Even before these detailed comparative gene maps were assembled, the early genetic maps of man and mouse were used to estimate the mean length and number of chromosomal segments conserved during evolution (NADEAU and TAYLOR 1984). Comparison of the locations of 83 homologous loci revealed 13 conserved segments. Statistical models were developed for using this sample of conserved segments to estimate the mean length of all conserved autosomal segments in the genome as 8.1 cM. This mean length was then used to estimate the number of conserved segments as 198, which is very close to the number observed today. Most comparative studies have focused on mammals, notably mouse and human comparisons (O'BRIEN et al. 1993, 1997; WOMACK and KATA 1995; ANDERSSON et al. 1996; CARVER and STUBBS 1997). Recently, comparisons between birds (BURT et al. 1995; ANDERSSON et al. 1996; JONES et al. 1997; PITEL et al. 1998; SMITH and CHENG 1998) or bony fish (MORIZOT 1983; POSTLETHWAIT et al. 1998) and mammals have revealed a high degree of conservation of genome organization. This is surprising given that these species diverged from a common ancestor 420 mya.

The genetic marker maps of farm animals such as cattle, pigs, and poultry are now sufficiently well advanced to be of practical value for the study of economically important traits and livestock improvement (ANDERSSON et al. 1996). Knowledge of the location of coding sequences is, however, limited. Maps of major livestock species contain 1000–2000 anonymous microsatellite markers and only 5–10% of all genetic markers are genes. Mapping of several vertebrate genomes is progressing rapidly, but by far the most detailed information is still to be found for mouse and human. Through comparative gene mapping, it is possible to link the "gene-poor" maps of livestock to the "gene-rich" maps of human and mouse (ANDERSSON et al. 1996).

Many measures of genome rearrangement are possible, depending on the level of gene mapping information available (e.g., synteny, gene order, and gene position) and the corresponding mathematical modeling approach used. Two derived measures of the degree of genome reorganization between two species using synteny data have been proposed (BENGTSSON et al. 1993), and also a measure of genome similarity using gene order (ZAKHAROV et al. 1995). More mechanistic models have been derived from some or all of the known chromosome modification mechanisms such as reciprocal translocation, inversion, transposition, and chromosome fusion and fission. Such an approach has been developed to obtain a direct estimate of the number of conserved segments from synteny data (SANKOFF and NADEAU 1996; ERLICH et al. 1997), which takes account of as yet unobserved syntenies. When sequences of genes are accurately mapped, similar descriptive models of genome rearrangement are possible (SANKOFF 1993; HANNENHALLI 1995; HANNENHALLI and PEVZNER 1995), but these models do not allow for undiscovered segments.

Our concern is with incomplete data of an intermediate accuracy arising from genetic maps, which yield blocks of conserved synteny. These contain information on the number of genes per block and the measured distance (or range) between extreme genes in blocks with at least two members, but ignore information on gene order. The first published estimate of the number of conserved segments between man and mouse used an approach based on such data (NADEAU and TAYLOR 1984). Although this landmark work used measurements of distance, subsequent approaches have concentrated either on counts of genes (by chromosome or blocks within chromosomes) or on gene order. We build on the approach of NADEAU and TAYLOR (1984) by using both counts and the additional distance information available in ranges, when present. A central assumption of the NADEAU and TAYLOR (1984) model was that all chromosomes had identical distributions of the lengths of segments from which ranges had been sampled. Chromosome lengths were assumed to be large relative to segment lengths. This approximation is good for chromosomes >100 cM in length, and fair for those >50 cM in length (as in the mouse), but is untenable for species with shorter chromosomes, such as the chicken, which has extreme divergence in chromosome size. The currently established chicken linkage group sizes range from 2 cM to 518 cM, with several <50 cM. We have extended the method of NADEAU and TAYLOR (1984) to allow small chromosome lengths and also to use the probability density of the observed ranges in a likelihood approach. A similar method, using only the number of genes forming a syntenic block of one or more markers, is also proposed. This leads naturally to a combined approach using both types of data. The model allows a flexible description of chromosome breakage, which includes random breakage as a special case. The methods are illustrated using comparative maps that compare chickens with both humans and mice (BURT et al. 1999).


*  METHODS

Distributions of segment lengths for different chromosomes:
How are the lengths of conserved segments expected to change, in general, as chromosome length increases? Very small (hypothetical) chromosomes are likely to contain only a single conserved segment, while large chromosomes might be expected to contain many relatively short segments and a few long ones. Intermediate length chromosomes may have segments whose lengths are a substantial proportion of chromosome length. Thus, for our empirical model of segment lengths, we require a flexible distribution whose shape can be defined for each chromosome. The β distribution, a two-parameter distribution defined on the unit interval, can give distributional shapes as varied as unimodal, uniform, exponential-like, and reverse exponential, as its parameters vary. Segment lengths for the kth chromosome can be scaled by chromosome length, $l_k$, to follow a β distribution whose parameters are a function of $l_k$. Using the square parenthesis notation for a density function, assume the distribution of segment lengths, $y$, on chromosome $k$ to be

$$[y]_k = \frac{1}{l_k\,B(a, b)}\left(\frac{y}{l_k}\right)^{a-1}\left(1 - \frac{y}{l_k}\right)^{b-1}, \qquad 0 < y < l_k,$$

where $a$ and $b$ are the β distribution parameters and $B(a, b)$ is the β function $\int_0^1 x^{a-1}(1 - x)^{b-1}\,dx$. Assume also that the β parameters change with chromosome length in a smooth way: $a = \alpha l_k^{\beta}$ and $b = \gamma l_k^{\delta}$. The mean segment length on chromosome $k$ is then $l_k a/(a + b)$ and the expected number of segments $S_k$ is $(a + b)/a$, or $S_k = 1 + (\gamma/\alpha)\,l_k^{(\delta-\beta)}$. The expected number of conserved segments is the sum of $S_k$ over all chromosomes.
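The scaled β density and the expected segment count can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the parameter values (`gamma=0.02` and so on) are hypothetical and chosen only to show how $S_k$ grows with chromosome length.

```python
import math

def beta_params(l_k, alpha, beta, gamma, delta):
    """Return (a, b) for a chromosome of length l_k (cM): a = alpha*l_k^beta, b = gamma*l_k^delta."""
    return alpha * l_k ** beta, gamma * l_k ** delta

def segment_density(y, l_k, a, b):
    """Density [y]_k of a conserved segment of length y on a chromosome of length l_k."""
    if not 0.0 < y < l_k:
        return 0.0
    x = y / l_k  # segment length as a fraction of chromosome length
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta_fn
    return math.exp(log_density) / l_k

def expected_segments(l_k, alpha, beta, gamma, delta):
    """Expected number of segments S_k = (a + b) / a."""
    a, b = beta_params(l_k, alpha, beta, gamma, delta)
    return (a + b) / a

# Hypothetical parameter values, for illustration only.
for l_k in (20.0, 60.0, 150.0):
    s_k = expected_segments(l_k, alpha=1.0, beta=0.0, gamma=0.02, delta=1.0)
    print(f"l_k = {l_k:5.0f} cM  ->  S_k = {s_k:.2f}")
```

Working on the log scale with `math.lgamma` avoids overflow in the β function when $a$ or $b$ becomes large on long chromosomes.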

An important special case occurs when the β distribution parameter $a$ equals one, so that $[y]_k = (b/l_k)(1 - y/l_k)^{b-1}$. This is the distribution of segment lengths when there are $b$ random breaks in a chromosome (SANKOFF and NADEAU 1996), and thus there are $S_k = b + 1$ conserved segments. Strictly, this random breakage pattern results from the superposition of chromosome breakage patterns of two species. We use the terms random genome breakage model and random chromosome breakage model to distinguish the following two cases, where $\delta = 1$ and $\delta \neq 1$, respectively. When $a = 1$ and $\delta = 1$, $b$ is a linear function of $l_k$. This corresponds to the random breakage model commonly found in the literature, which assumes that chromosome breakage occurs entirely at random throughout the genome with density $\gamma$ breaks per centimorgan. When $\delta \neq 1$, the density of random breaks changes from chromosome to chromosome. The more general nonrandom breakage model presented here uses only three parameters. Adding a fourth parameter produces no appreciable improvement in fit to our data. We set $\beta = 0$, and estimate the constants $\alpha$, $\gamma$, and $\delta$. Comparisons of likelihoods from these three models, starting from the three-parameter model and simplifying, allow the plausibility of random breakage models to be assessed.
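The reduction to the random breakage case can be checked directly from the β density. The following short derivation uses only the definitions given above (with $a = 1$, so that $\alpha = 1$ and $\beta = 0$); it is a worked check, not an additional modeling assumption.

```latex
B(1, b) = \int_0^1 (1 - x)^{b-1}\,dx = \frac{1}{b}
\quad\Longrightarrow\quad
[y]_k = \frac{b}{l_k}\left(1 - \frac{y}{l_k}\right)^{b-1}, \qquad 0 < y < l_k,
\\[6pt]
E(y) = \frac{l_k}{b + 1}, \qquad
S_k = \frac{l_k}{E(y)} = b + 1 = 1 + \gamma\, l_k^{\delta}.
```

Setting $\delta = 1$ then gives $S_k = 1 + \gamma l_k$, the random genome breakage model with a single parameter.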

Count data:
Observed genes are assumed to be distributed at random along the genome with constant density $D$ genes per centimorgan. If there are many genes and a large number of observed syntenic groups, then the distribution of the number of genes ($n$) in a syntenic group found on an underlying conserved segment of length $y$ will be approximately Poisson with mean $Dy$, defined for observable values of $n \geq 1$.

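The distribution of $n$, given $y$, is

$$[n \mid y]_1 = \frac{(Dy)^n e^{-Dy}}{n!\,(1 - e^{-Dy})}, \qquad n = 1, 2, \ldots$$

The marginal distribution of $n$ is then

$$[n]_k = \int_0^{l_k} [n \mid y]_1\,[y]_k\,dy.$$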

Parameters $\alpha$, $\beta$, $\gamma$, and $\delta$ (if not fixed) are estimated by maximizing the log-likelihood $L_1 = \sum_{\text{all counts}} \log([n]_k)$, with the integral evaluated numerically.
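A minimal sketch of the count likelihood follows. It assumes the zero-truncated Poisson form for the count given segment length and the scaled β density for segment lengths, and it uses a plain midpoint rule for the integral (the paper says only that the integral is evaluated numerically, so the quadrature rule, the function names, and the example data are all assumptions made for illustration).

```python
import math

def segment_density(y, l_k, a, b):
    """Scaled beta density [y]_k for a segment of length y on a chromosome of length l_k."""
    if not 0.0 < y < l_k:
        return 0.0
    x = y / l_k
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta_fn) / l_k

def truncated_poisson(n, mean):
    """P(N = n | N >= 1) for a Poisson count with the given mean (> 0)."""
    if n < 1:
        return 0.0
    log_p = n * math.log(mean) - mean - math.lgamma(n + 1)
    return math.exp(log_p) / (-math.expm1(-mean))

def marginal_count(n, l_k, a, b, D, steps=400):
    """Midpoint-rule approximation of the integral of [n|y]_1 [y]_k over 0 < y < l_k."""
    h = l_k / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h  # midpoints never touch the endpoints, where the density may be unbounded
        total += truncated_poisson(n, D * y) * segment_density(y, l_k, a, b)
    return total * h

def count_log_likelihood(blocks, alpha, beta, gamma, delta, D):
    """L_1 for an iterable of (gene_count, chromosome_length_cM) pairs."""
    loglik = 0.0
    for n, l_k in blocks:
        a = alpha * l_k ** beta
        b = gamma * l_k ** delta
        loglik += math.log(marginal_count(n, l_k, a, b, D))
    return loglik

# Hypothetical data: (genes in block, length of the chromosome carrying it).
blocks = [(1, 40.0), (3, 150.0), (1, 150.0), (2, 300.0)]
print(count_log_likelihood(blocks, alpha=1.0, beta=0.0, gamma=0.02, delta=1.0, D=0.03))
```

The returned value can be passed, with its sign reversed, to any derivative-free minimizer. Convergence of the midpoint rule is slow when $a < 1$ or $b < 1$, so a production fit would probably want an adaptive rule.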

Range data and combined data:
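An extension of the scheme for counts follows naturally for syntenic groups of at least two genes, where we have additional information on the range, $w$, between the outermost pair of the group. For this subset of the data the Poisson distribution of $n$ given $y$ has to be truncated to $n \geq 2$:

$$[n \mid y]_2 = \frac{(Dy)^n e^{-Dy}}{n!\,(1 - e^{-Dy} - Dy\,e^{-Dy})}, \qquad n = 2, 3, \ldots$$

The distribution of the range, conditional on $n$ and $y$, is

$$[w \mid n, y] = \frac{n(n - 1)\,w^{n-2}(y - w)}{y^{n}}, \qquad 0 < w < y$$

(PLACKETT 1971). Then the joint distribution of $n$ and $w$ is

$$[n, w]_k = \int_w^{l_k} [w \mid n, y]\,[n \mid y]_2\,[y]_k\,dy.$$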

Note that the lower limit of this integral is no longer zero because the underlying segment must be at least as long as the observed range. Parameters are estimated by maximizing the log-likelihood $L_2 = \sum_{\text{ranges}} \log([n, w]_k)$.
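The range likelihood can be sketched in the same way. The code below assumes the standard density of the range of $n$ points placed uniformly on a segment of length $y$, a Poisson count truncated to $n \geq 2$, and a midpoint rule starting at the observed range $w$; function names and the example values are hypothetical.

```python
import math

def segment_density(y, l_k, a, b):
    """Scaled beta density [y]_k for a segment of length y on a chromosome of length l_k."""
    if not 0.0 < y < l_k:
        return 0.0
    x = y / l_k
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta_fn) / l_k

def poisson_truncated_at_two(n, mean):
    """P(N = n | N >= 2) for a Poisson count with the given mean (> 0)."""
    if n < 2:
        return 0.0
    log_p = n * math.log(mean) - mean - math.lgamma(n + 1)
    norm = 1.0 - math.exp(-mean) * (1.0 + mean)  # removes the n = 0 and n = 1 terms
    return math.exp(log_p) / norm

def range_density(w, n, y):
    """Density of the range w of n points placed uniformly on (0, y)."""
    if n < 2 or not 0.0 < w < y:
        return 0.0
    log_d = math.log(n * (n - 1)) + (n - 2) * math.log(w) + math.log(y - w) - n * math.log(y)
    return math.exp(log_d)

def joint_count_range(n, w, l_k, a, b, D, steps=400):
    """Midpoint-rule approximation of [n, w]_k, integrating y from w to l_k."""
    if not 0.0 < w < l_k:
        return 0.0
    h = (l_k - w) / steps
    total = 0.0
    for i in range(steps):
        y = w + (i + 0.5) * h  # the segment must be at least as long as the observed range
        total += (range_density(w, n, y)
                  * poisson_truncated_at_two(n, D * y)
                  * segment_density(y, l_k, a, b))
    return total * h

# Hypothetical block: 3 genes spanning 25 cM on a 150 cM chromosome.
print(joint_count_range(n=3, w=25.0, l_k=150.0, a=1.0, b=3.0, D=0.03))
```

Summing `math.log(joint_count_range(...))` over all observed ranges gives $L_2$.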

It is possible to combine both preceding likelihoods. For single loci ($n = 1$) the distribution of $n$ in the Count data section may be used. For range data the joint distribution of $n$ and $w$ may be used, with one modification. The Poisson distribution for a count conditional on segment length, $[n \mid y]_2$, should be truncated to allow $n \geq 1$ rather than $n \geq 2$. This gives a common truncated Poisson distribution for both approaches, so that their respective log-likelihoods may be added.

Then we maximize $L_3 = \sum_{\text{single loci}} \log([n]_k) + \sum_{\text{ranges}} \log([n, w]_k)$.
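Written out in full, the combined log-likelihood takes the following form. This expansion assumes the zero-truncated Poisson form for the count and the uniform-range density for $w$; both are reconstructions from the description in the text rather than quoted equations.

```latex
L_3 = \sum_{\text{single loci}} \log \int_0^{l_k} [1 \mid y]_1\,[y]_k\,dy
    \;+\; \sum_{\text{ranges}} \log \int_w^{l_k} [w \mid n, y]\,[n \mid y]_1\,[y]_k\,dy,
\\[6pt]
[n \mid y]_1 = \frac{(Dy)^n e^{-Dy}}{n!\,(1 - e^{-Dy})}, \quad n \geq 1,
\qquad
[w \mid n, y] = \frac{n(n-1)\,w^{n-2}(y - w)}{y^{n}}, \quad 0 < w < y.
```

Both sums use the same truncation at $n \geq 1$, which is what allows the two contributions to be added as parts of one likelihood.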

Confidence intervals:
All maximizations were performed using standard derivative-free optimization routines. Confidence intervals for the number of conserved segments, $S$, were calculated only for the two random breakage models with β parameter $a = 1$.

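The random chromosome breakage model has a confidence region for $\delta$ and $\log(\gamma)$ that is an elliptical area defined by the critical log-likelihood contour corresponding to $L_{\max} - \chi^2_2$. For $k$ indexing all $N = 38$ autosomes,

$$S = \sum_{k=1}^{N} S_k = N + \gamma \sum_{k=1}^{N} l_k^{\delta}$$

and

$$\log(\gamma) = \log(S - N) - \log\left(\sum_{k=1}^{N} l_k^{\delta}\right).$$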
When $S$ is fixed at a value $S_0$, $\gamma$ and the log-likelihood may be expressed as a function of $\delta$. For small changes in $\delta$ the contours of constant $S$ are almost linear and run approximately parallel to the major axis of the elliptical confidence region for $\log(\gamma)$ and $\delta$. The likelihood was maximized for $\delta$, over a grid of integer $S_0$ values, and if the maximum exceeded that of the critical contour, then $S_0$ was taken to be inside the confidence interval for $S$.
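The grid search over $S_0$ can be sketched as follows. With $a = 1$ the model gives $S_k = 1 + \gamma l_k^{\delta}$, so fixing the total $S_0$ determines $\gamma$ for each $\delta$. In this sketch the log-likelihood is a caller-supplied function, the maximization over $\delta$ is replaced by a coarse grid, and the critical value is passed in directly; all three are simplifying assumptions, and the toy log-likelihood at the end exists only to exercise the code.

```python
def gamma_for_fixed_total(S0, delta, lengths):
    """Solve S0 = N + gamma * sum(l_k^delta) for gamma."""
    n_chrom = len(lengths)
    return (S0 - n_chrom) / sum(l ** delta for l in lengths)

def total_segments(gamma, delta, lengths):
    """Expected total number of segments, S = sum over k of (1 + gamma * l_k^delta)."""
    return sum(1.0 + gamma * l ** delta for l in lengths)

def interval_for_total(loglik, lengths, S_grid, delta_grid, critical):
    """Return (lowest, highest) S0 whose profile log-likelihood reaches `critical`.

    loglik(gamma, delta) is a hypothetical user-supplied log-likelihood.
    """
    inside = []
    for S0 in S_grid:
        profile = max(loglik(gamma_for_fixed_total(S0, d, lengths), d) for d in delta_grid)
        if profile >= critical:
            inside.append(S0)
    return (min(inside), max(inside)) if inside else None

# Toy example with made-up chromosome lengths and a made-up quadratic log-likelihood.
lengths = [20.0, 60.0, 90.0, 150.0, 300.0]

def toy_loglik(gamma, delta):
    return -(total_segments(gamma, delta, lengths) - 50.0) ** 2 / 8.0 - (delta - 1.0) ** 2

print(interval_for_total(toy_loglik, lengths, range(40, 61), [0.5, 1.0, 1.5], critical=-0.6))
```

Passing the critical value as an argument keeps the sketch independent of how the contour level is derived from the χ² distribution.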

The confidence interval for the random genome breakage model was found from the log-likelihood corresponding to a grid of integer $S_0$ values, using the critical value $L_{\max} - \chi^2_1$.

Comparing model and data:
Observed genes are assumed to be distributed at random over the genome. Those found by means that are not random (previously mapped by FISH, gene families, chromosome walking, cross-referenced genes from other species' maps, etc.) have been omitted. If the distribution of genes is random and of constant density $D$, then, on average, the number found on linkage group $k$ will be proportional to the length of the linkage group, $l_k$, and the observed number, $m_k$, will follow a Poisson distribution with mean $Dl_k$. A linear regression through the origin of Poisson variables $m_k$ against $l_k$ was fitted and the generalized Pearson chi-square used as a measure of lack of fit (COLLETT 1991) to assess the evidence for nonrandomness.
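A minimal sketch of this randomness check is given below. It assumes an identity-link Poisson model through the origin, for which the maximum-likelihood estimate of the density is the total gene count divided by the total map length; the function name and the example data are hypothetical.

```python
def poisson_origin_fit(counts, lengths):
    """Fit m_k ~ Poisson(D * l_k) and return (D_hat, Pearson chi-square, degrees of freedom)."""
    d_hat = sum(counts) / sum(lengths)  # MLE of the gene density D
    pearson = sum((m - d_hat * l) ** 2 / (d_hat * l) for m, l in zip(counts, lengths))
    df = len(counts) - 1                # one parameter (D) estimated
    return d_hat, pearson, df

# Hypothetical linkage groups: gene counts and lengths in cM.
counts = [1, 2, 2, 5, 8]
lengths = [20.0, 60.0, 90.0, 150.0, 300.0]
d_hat, pearson, df = poisson_origin_fit(counts, lengths)
print(f"D = {d_hat:.4f} genes/cM, Pearson chi-square = {pearson:.2f} on {df} d.f.")
```

With 38 linkage groups this gives 37 degrees of freedom. The statistic can be referred to a chi-square distribution, for example with `scipy.stats.chi2.sf(pearson, df)` if SciPy is available.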

We can also compare the observed number of segments per chromosome with a prediction from the model. To estimate the predicted number of observed segments the distribution $[y]_k$ is replaced by the distribution of the observed segments

Then $S_k$ and $S$ equivalents are calculated as before.

Gene mapping data from the chicken genetic linkage map:
For chicken, the genes were mapped as part of the EC CHICKMAP project and the worldwide effort to map the chicken genome (BURT et al. 1995; BURT and CHENG 1998). The mapping information is recorded in the chicken genome database, Arkdb-chick (http://www.ri.bbsrc.ac.uk).

To estimate the genetic length of the chicken genome we take map lengths from recombination among $m$ loci, using the Map Manager program (MANLY 1993), corrected using the Kosambi mapping function (KOSAMBI 1944) and multiplied by $(m + 1)/(m - 1)$ to adjust for failure to sample telomeric regions (MORTON 1991). The second correction assumes that loci are sampled randomly from a uniform distribution along the genetic map.
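The two corrections can be illustrated with a short sketch. It assumes that the map length of a linkage group is the sum of Kosambi distances between adjacent loci, which is a simplification of what a mapping program such as Map Manager does; the function names and the recombination fractions are hypothetical.

```python
import math

def kosambi_cM(r):
    """Kosambi map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def corrected_map_length(adjacent_recombination):
    """Sum adjacent-interval Kosambi distances and apply the (m + 1)/(m - 1) end correction."""
    m = len(adjacent_recombination) + 1  # m loci define m - 1 adjacent intervals
    if m < 2:
        raise ValueError("at least two loci are needed")
    observed = sum(kosambi_cM(r) for r in adjacent_recombination)
    return observed * (m + 1) / (m - 1)

# Hypothetical linkage group: 11 loci, each adjacent pair recombining at 10%.
print(round(corrected_map_length([0.10] * 10), 2))
```

The end correction matters most for sparsely mapped groups: with $m = 2$ loci the observed length is tripled, whereas with $m = 11$ it grows by only 20%.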

The locations of human and mouse genes were taken from the Genome Database (http://gdbwww.gdb.org/gdb/), UniGene (http://www.ncbi.nlm.nih.gov/), Online Mendelian Inheritance in Man (http://gdbwww.gdborg/omim/docs/omimtop.html), and the Mouse Genome Database (http://www.informatics.jax.org/).

The comparative gene map for chicken, human, and mouse (http://www.ri.bbsrc.ac.uk) contains 214 orthologous loci, most of which are known genes or conserved anonymous cDNA sequences. We excluded members of multigene families or genes for which specific orthology could not be determined or for which homology was in doubt.


*  RESULTS

Data presented for comparative maps are based on chicken linkage groups and are labeled chicken-human (C-H) and chicken-mouse (C-M).

Data:
Details of the observed numbers of conserved genes between chicken and human or mouse are given in Table 1. Gene density, for those considered found at random, was ~3/100 cM for both comparisons, having excluded one-third of the conserved loci that were considered nonrandom and therefore biased. The total estimated length of the linkage groups in the chicken map was 3836 cM. There were considerably more single loci than conserved syntenic groups with ranges, particularly so for the chicken-mouse comparison. Most of the ranges were derived from fewer than five genes. Ranges were observed on 19 (C-H) and 13 (C-M) linkage groups, and almost always as a single range per linkage group except for the four largest linkage groups (Fig 1). Smaller linkage groups were more likely to contain single loci than ranges for the chicken-mouse comparison. In all, 28 (C-H) and 26 (C-M) linkage groups were found to contain homology segments defined by a single gene (n = 1) or conserved syntenic groups with n >= 2. The largest observed ranges from both comparative maps exceeded the median linkage group length, emphasizing the need for models allowing for chromosome size.



 
Figure 1. Numbers of conserved syntenic blocks defined by one (solid) or more than one (hatched) gene observed in each chicken linkage group for (A) chicken-human (C-H) and (B) chicken-mouse (C-M) comparisons. Linkage groups are ordered by size.


 

 
Table 1. Conserved genes and numbers of syntenic groups for chicken-human and chicken-mouse comparisons, together with the numbers of ranges defined by a conserved syntenic block with two or more genes

Tests of randomness, using the number of loci per linkage group, gave $\chi^2_{37}$ values of 33.4 (C-H) and 28.9 (C-M); neither provided evidence against randomness. The data and fitted lines representing the expected number of genes, assuming random scattering, are shown in Fig 2.



 
Figure 2. Numbers of conserved loci per chicken linkage group vs. linkage group length, together with fitted lines corresponding to uniform gene density, for the (A) chicken-human (C-H) and (B) chicken-mouse (C-M) tests for random scattering of genes.

Model fitting and predictions:
The results of fitting the various models to different data types are presented in Table 2 for the chicken-human comparison and in Table 3 for the chicken-mouse comparison. The behavior of the models was broadly similar for both comparative maps. Using either count data alone or combined data there was no evidence against the random cutting of chromosomes. In contrast, a model of nonrandom breakage was preferred when range data were considered on their own [$\chi^2_1 = 13.16$, P < 0.001 (C-H) and $\chi^2_1 = 7.12$, P < 0.01 (C-M) for a comparison of the three-parameter model with the two-parameter model for random chromosome breakage]. For both comparative maps the estimated numbers of segments from the nonrandom breakage model were less than the observed numbers of 69 (C-H) and 85 (C-M). Much of the information about the frequency of the short conserved segments is lost from the data when single loci are excluded. Estimates of the number of conserved segments for combined data are intermediate between those for range data and count data alone, but much closer to those obtained from counts. Confidence intervals derived from combined data were less than two-thirds the width of those from count data alone. This reflects both the extra information in ranges and the expectation that larger point estimates would give rise to wider confidence intervals. The single-parameter random genome breakage model was favored (as the simplest model giving a comparable fit) in both comparative maps when using count or combined data, with the exception of the chicken-mouse combined data, where the two-parameter random chromosome breakage model was preferred. Both models give very similar estimates for the number of conserved segments and also have similar confidence intervals.


 

 
Table 2. Fits of various chromosome breakage models, together with maximum-likelihood estimates of parameters and confidence intervals, using count, range, and combined data, for chicken-human conserved segments


 

 
Table 3. Fits of various chromosome breakage models, together with maximum-likelihood estimates of parameters and confidence intervals, using count, range, and combined data, for chicken-mouse conserved segments

The observed numbers of segments per linkage group, plotted against linkage group length, are presented in Fig 3. Also included are predictions from the random chromosome breakage model of the number of underlying segments per linkage group and of the number of observed segments per linkage group with a 95% confidence interval. The chicken-mouse prediction for the number of observed segments shows good agreement with the data. Even so, there are still some (nonzero) observed numbers outside the confidence range for observed segments. This is inevitable when the model predicts a single segment for very small linkage groups. For the same reason the predicted curve for the number of conserved segments also lies below some of the observed numbers of segments for short linkage groups. In the chicken-human comparison this also occurs for the longest linkage groups.



 
Figure 3. Numbers of observed conserved segments per chicken linkage group vs. linkage group length for the (A) chicken-human (C-H) and (B) the chicken-mouse (C-M) comparisons. Predicted numbers of conserved segments from the random chromosome breakage model using combined data (solid line), and the corresponding prediction of observed numbers of segments (dashed line), together with its 95% confidence limits (dotted lines).

An illustration of the flexibility of the β distribution models to represent a wide range of segment length distributions is presented in Fig 4 using fitted distributions corresponding to the estimated parameters from the chicken-mouse comparison. The upper diagram shows changes in segment length distributions with chromosome length from the random chromosome breakage model applied to the combined data. Four chromosome lengths have been chosen for illustration. The distribution for the shortest chromosome of 20 cM has the most probable segment length equal to the chromosome. At a length of 60 cM the distribution is almost uniform, which would be appropriate for a single random cut point. At 90 cM the distribution becomes triangular, corresponding to two cut points. Longer chromosomes show an exponential-like segment length distribution shifted progressively further to the left. This is shown for a chromosome length of 150 cM, corresponding to 5 segments, for which the probability of a segment exceeding half of the length of the chromosome is 0.06. This probability halves for each additional segment on a chromosome.
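The tail probabilities quoted above follow from the random breakage density with $a = 1$: for $b$ breaks, the probability that a segment exceeds a fraction $t$ of the chromosome is $(1 - t)^b$. The sketch below assumes a whole number of segments for simplicity, whereas the fitted $b$ is generally not an integer; the function name is hypothetical.

```python
def tail_probability(fraction, n_segments):
    """P(segment length / chromosome length > fraction) with b = n_segments - 1 random breaks."""
    b = n_segments - 1
    return (1.0 - fraction) ** b

for n_segments in range(2, 8):
    print(n_segments, tail_probability(0.5, n_segments))
```

For 5 segments this gives 0.0625, matching the quoted 0.06, and each additional segment multiplies the probability by one half.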



 
Figure 4. Changes in the distribution of the ratio of segment length to chromosome length for the chicken-mouse comparison. Chromosomes are scaled to have length = 1 for display on a common axis. The upper diagram is from the fit of the random chromosome breakage model applied to the combined data, showing chromosomes of length 20 (dots), 60 (dashes), 90 (long dashes), and 150 cM. The lower diagram is from the fit of the nonrandom breakage model applied to range data alone, for chromosomes of length 60, 180, 270, and 450 cM, with line style order as above.

The lower diagram in Fig 4 is of the nonrandom breakage model fitted to the range data alone. For the shortest chromosome the most probable segment length is equal to that of the chromosome, as in the random breakage model. As chromosome lengths increase, however, the segment length distributions have progressively smaller means relative to the length of the chromosome and are unimodal. There is no evidence in the range data, and therefore no reflection in the fitted distributions, of a preponderance of very small segment lengths.


*  DISCUSSION

The small size of some chicken chromosomes and the relatively large size of some conserved syntenic blocks have driven the construction of a chromosome-based model for conserved segments. But, as the model is a generalization of that of Nadeau and Taylor, there is no reason why the approach should not be used more widely, particularly with its emphasis on the testing of model and data assumptions. As illustrated in RESULTS, the β distribution has provided a very flexible and intuitively appealing range of distributional shapes for the unobservable segments. Particularly important are special cases corresponding to random breakage models, one of which is already prevalent in the literature (see NADEAU and SANKOFF 1998 for a review). The likelihood approach presented here allows an explicit test of the plausibility of these random-breakage models, as well as providing a framework for deriving the confidence intervals that are an essential accompaniment to estimates. A particularly striking consequence of using such a flexible model is the need to use all available data to draw reliable conclusions. Discarding segments defined by single loci (homology segments) results in gross underestimation of genomic rearrangement. These issues are discussed in more detail below.

There are two untested assumptions made in the NADEAU and TAYLOR (1984) model, which have also been made here, without extensive comment: that genetic and physical distances are (approximately) proportional and that there are no insertions within a syntenic block, although inversions are permitted and fairly common in large conserved segments. This will lead to underestimation of genomic rearrangement. The algorithm of SANKOFF et al. (1997A) could be used to identify probable inversions in the data, rather than those caused by incorrect gene ordering, prior to model estimation.

The crucial assumption of genes spread at random over the genome has been tested, but at the simplest level, to assess constant density over chromosomes. There is some evidence that recombination rates in the chicken microchromosomes (the smallest 33 autosomes) are some 2.5 times those in macrochromosomes (the largest 5 autosomes; RODIONOV 1996), and that gene densities in microchromosomes are double those in macrochromosomes (SMITH et al. 1999). These two effects cancel out in the test for a random scattering of genes. Performing the same test on the human-mouse comparative map using >1600 unselected genes gave a $\chi^2_{18}$ value of 142, indicating a range of gene densities on different chromosomes well in excess of expectation under randomness. For this comparative map, the removal of nonrandomly selected genes represents a formidable task. Of course, assuming random scattering of the genes will only be an approximation, but a very useful one that is likely to become increasingly untenable, and practically impossible to remedy, as maps become more detailed. The overall consequences of nonrandomness of genes are not easily predicted. If genes are too clustered (as a result of chromosome walking or proximity of gene families), then corresponding segment lengths will be overestimated, resulting in a downward bias for the estimated number of conserved segments. If genes are too evenly spread (perhaps by map cross-referencing), then an excess of short homology segments may be observed, leading to an upwardly biased estimate. A preliminary model allowing for nonrandom gene distributions has recently been proposed by SANKOFF et al. (1997B), but further development is needed. For comparative mapping purposes, linking species with nascent maps to the detailed maps of humans or mice will be highly beneficial, but these are the very maps where nonrandomness of genes will be unavoidable. With careful examination, randomness of gene discovery may be a plausible approximation in the newly mapped species, but the background densities of orthologous genes on mouse or human maps may well vary. If this variation leads, as a first approximation, to groups of chromosomes of similar densities, our model is easily modified to reflect this.

The random chromosome and genome breakage models presented here are obtained as special cases of an empirical nonrandom breakage model. This allows likelihood-ratio tests for independent components of the model, an approach that is preferable to using goodness-of-fit tests for the whole model and then, if satisfactory, declaring that all of the model components are validated. Conclusions about the pattern of chromosome breakage are strongly influenced by the choice of which data measurements to analyze. It may be that when using only the ranges, which contain indirect metric evidence about segment lengths, we are detecting genuine nonrandomness. Or, perhaps, short ranges are underrepresented in the relatively small number of ranges in our sample covering the whole genome, often resulting in just a single range being present on a linkage group. When using all the data the tests within the model do not provide evidence against "chromosomes" being cut at random for the two comparative maps presented here, although the evidence is not unanimous about whether the randomness is on a chromosome or a genome basis. However, both models give similar estimates of conserved numbers of segments. Furthermore, if the model is modified so that observed ranges and linkage group lengths corresponding to microchromosomes are reduced by a factor of 2.5 (the minimum shrinkage factor corresponding to the almost linear part of the Kosambi mapping function), then the random genome breakage model is preferred for both comparative maps, and the estimates of conserved segment numbers change little. Other evidence for random genome breakage models is presented in NADEAU and SANKOFF (1998). One arbitrary feature of the chromosome model presented here is the smooth relationship chosen to change the distribution of segment length with chromosome length. There is no expectation that the number of conserved segments on a chromosome increases monotonically with chromosome length, although when chromosome lengths differ widely an increasing trend is likely. With random genome breakage the trend should be linear, becoming less variable with an increase in the number of generations and rearrangements between the two species being compared. This may be a factor in the superior agreement of the observed segment numbers and their prediction in the chicken-mouse comparison. The chicken-human comparison is dominated by linkage groups with only a single observed segment (Fig 1), and this suggests that other functions might be useful in relating the number of conserved segments to chromosome length for some contexts. For example, the model may be easily modified to fit different breakage rates among microchromosomes and among macrochromosomes in chickens, if considered biologically plausible.

The estimates of the number of conserved segments change considerably depending on which measurements are chosen as data, in contrast to the relative stability of the estimates over the different chromosome breakage models. The flexibility of these breakage models in describing segment length distributions means that the model will be more sensitive to the data than might be the case with, say, a single-parameter exponential distribution. This places considerable emphasis on the quality of the mapped data and the examination of the assumptions used to describe gene distributions.

As an example of the dramatic effect of very different model assumptions on conclusions, we have also used the model of SANKOFF et al. (1997B) to estimate conserved segment number. This elegant model, derived from the probability theory of runs, treats the genome as continuous and relies on both the process of gene identification and chromosome breakage occurring at random. It excludes considerations of gene order and distance both between and within conserved segments, and consequently ignores the impossibility of some configurations of observed ranges and chromosome boundaries. Further work is required to assess the limitations of the model assumptions. Estimates of the conserved number of segments from this model are considerably larger than those derived from our models: C-H = 142 segments and C-M = 293 segments.

Using our models, the chicken-mouse comparison gave an estimate of the number of conserved segments almost 50% larger than that from the chicken-human comparison for the combined data, with a confidence interval twice as wide. The intervals barely overlapped, suggesting that human and mouse differ in the number of segments conserved with the chicken. Further evidence of a difference comes from an examination of the ranges that were found in both comparisons. There are 16 ranges in common, of which 8 were of equal length. The remaining 8 have a measured range that is shorter for the chicken-mouse comparison. The two comparative maps presented here have many genes in common: genes that are first mapped in the chicken and then located in both of the more extensive mouse and human maps. This precludes the simplest approach to testing for human and mouse differences in conserved segment numbers with chicken, namely pooling the data and assuming them to be independent, because the sampling scheme above may induce a positive correlation between the estimates of conserved segment number. Finding a satisfactory representation of this correlation will be important in future work in evolutionary modeling, because in the long term we will wish to assess differences in the number of conserved segments for multiple comparative maps and to use these maps to give a new perspective on phylogenetic trees.
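As a rough indication of how one-sided the comparison of the 16 common ranges is, the counts above can be put through an exact sign test. This is a sketch and not an analysis reported in this article: it drops the 8 tied ranges, and it assumes that the 8 remaining paired differences are independent with either direction equally likely under the null hypothesis, which may not hold for ranges on the same chromosome. The function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_pos, n_neg):
    """Exact two-sided sign test p-value, with ties already removed.

    Under the null hypothesis each non-tied pair is equally likely to
    differ in either direction, so the smaller count is Binomial(n, 1/2).
    """
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # 8 ranges shorter in chicken-mouse, 0 shorter in chicken-human.
    print(sign_test_two_sided(0, 8))  # 2 * (1/2)**8 = 0.0078125
```

Because the test works on within-pair differences, it sidesteps the correlation between the two maps for any single range, but it says nothing about the overall number of conserved segments.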


*  ACKNOWLEDGMENTS

We thank Liz Archibald for her excellent typing, and Michael Romanov for Russian translation. We also thank the Ministry of Agriculture, Fisheries and Food (MAFF), the Biotechnology and Biological Sciences Research Council (BBSRC) and the Commission of the European Communities for supporting this work.

Manuscript received April 23, 1999; Accepted for publication September 20, 1999.


*  LITERATURE CITED

ANDERSSON, L., M. ASHBURNER, S. AUDUN, W. BARENDSE, and J. BITGOOD et al., 1996 Comparative genome organization of vertebrates: the first international workshop on comparative genome organization. Mamm. Genome 7:717-734.

BENGTSSON, B. O., K. KLINGA LEVAN, and G. LEVAN, 1993 Measuring genome reorganization from synteny data. Cytogenet. Cell Genet. 64:198-200.

BURT, D. W. and H. H. CHENG, 1998 Chicken gene map. ILAR J. 39:185-192.

BURT, D. W., N. BUMSTEAD, J. J. BITGOOD, F. A. PONCE DE LEON, and L. B. CRITTENDEN, 1995 Chicken genome mapping: a new era in avian genetics. Trends Genet. 11:190-194.

BURT, D. W., C. BRULEY, I. DUNN, C. T. JONES, and A. S. LAW et al., 1999 Dynamics of chromosome evolution in birds and mammals. Nature in press.

CARVER, E. A. and L. STUBBS, 1997 Zooming in on the human-mouse comparative map: genome conservation re-examined on a high-resolution scale. Genome Res. 7:1123-1137.

COLLETT, D., 1991 Modelling Binary Data. Chapman and Hall, London.

ERLICH, J., D. SANKOFF, and J. H. NADEAU, 1997 Synteny conservation and chromosome rearrangements during mammalian evolution. Genetics 147:289-296.

HANNENHALLI, S., 1995 Polynomial time algorithm for computing translocation distance between genomes, pp. 162-176 in Combinatorial Pattern Matching, 6th Annual Symposium, Lecture Notes in Computer Science, edited by Z. GALIL and E. UKKONEN. Springer-Verlag, New York.

HANNENHALLI, S. and P. A. PEVZNER, 1995 Transforming men into mice (polynomial algorithm for genomic distance problem). Proceedings of the 36th Ann. Symposium Found. Comp. Sci., IEEE Computer Society Press, pp. 581-592.

JONES, C. T., D. R. MORRICE, I. R. PATON, and D. W. BURT, 1997 Homologues of genes on human chromosome 15q21-q26 and a chicken microchromosome show conserved synteny and gene order. Mamm. Genome 8:436-440.

KOSAMBI, D. D., 1944 The estimation of map distance from recombination values. Ann. Eugen. 12:172-175.

MANLY, K. F., 1993 A Macintosh program for storage and analysis of experimental genetic mapping data. Mamm. Genome 4:303-313.

MORIZOT, D. C., 1983 Tracing linkage groups from fishes to mammals. J. Hered. 74:413-416.

MORTON, N. E., 1991 Parameters of the human genome. Proc. Natl. Acad. Sci. USA 88:7474-7476.

NADEAU, J. H. and D. SANKOFF, 1998 Counting on comparative maps. Trends Genet. 14:495-501.

NADEAU, J. H. and B. A. TAYLOR, 1984 Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. USA 81:814-818.

O'BRIEN, S. J., J. E. WOMACK, L. A. LYONS, K. J. MOORE, and N. A. JENKINS et al., 1993 Anchored reference loci for comparative genome mapping in mammals. Nat. Genet. 3:103-112.

O'BRIEN, S. J., J. WIENBERG, and L. A. LYONS, 1997 Comparative genomics: lessons from cats. Trends Genet. 13:393-399.

PITEL, F., V. FILLON, C. HEIMEL, N. LE FUR, and C. EL KHADIR-MOUNIER et al., 1998 Mapping of FASN and ACACA on two chicken microchromosomes disrupts the human 17q syntenic group well conserved in mammals. Mamm. Genome 9:297-300.

PLACKETT, R. L., 1971 An Introduction to the Theory of Statistics. Oliver and Boyd, Edinburgh.

POSTLETHWAIT, J. H., Y.-L. YAN, M. A. GATES, S. HORNE, and A. AMORES et al., 1998 Vertebrate genome evolution and the zebrafish gene map. Nat. Genet. 18:345-349.

RODIONOV, A. V., 1996 Micro versus macro: a review of structure and functions of avian micro- and macrochromosomes. Genetika 32:597-608.

SANKOFF, D., 1993 Models and analyses of genomic evolution, pp. 177-183 in Supercomputing and Complex Genome Analysis, Proceedings of the Second International Conference on Bioinformatics, edited by H. A. LIM, J. W. FICKETT and C. R. CANTOR. World Scientific Publishing Co. Pte. Ltd., Singapore.

SANKOFF, D. and J. H. NADEAU, 1996 Conserved synteny as a measure of genomic distance. Discrete Appl. Math. 71:247-257.

SANKOFF, D., V. FERRETTI, and J. H. NADEAU, 1997a Conserved segment identification. J. Comput. Biol. 4:559-565.

SANKOFF, D., M.-N. PARENT, I. MARCHAND, and V. FERRETTI, 1997b On the Nadeau-Taylor theory of conserved chromosome segments, pp. 262-274 in Combinatorial Pattern Matching, 8th Annual Symposium, Lecture Notes in Computer Science, edited by A. APOSTOLICO and J. HEIN. Springer-Verlag, New York.

SMITH, E. J. and H. CHENG, 1998 Mapping chicken genes using preferential amplification of specific alleles. Microbial and comparative genomics. Genomics 30:13-20.

SMITH, J., C. K. BRULEY, I. R. PATON, I. DUNN, and C. T. JONES et al., 1999 Differences in gene density in the chicken macrochromosomes and microchromosomes. Anim. Genet. in press.

WOMACK, J. E. and S. KATA, 1995 Bovine genome mapping: evolutionary inference and the power of comparative genomics. Curr. Opin. Genet. Dev. 5:725-733.

ZAKHAROV, I. A., V. S. NIKIFOROV, and E. V. STEPANYUK, 1995 Interval estimates of the combinatorial measures of similarity for orders of homologous genes. Genetika 31:1163-1167.



