Genetics, Vol. 160, 793-798, February 2002, Copyright © 2002

Comparative Analysis of the Human Dystrophin and Utrophin Gene Structures

Uberto Pozzoli(1,a), Manuela Sironi(1,a), Rachele Cagliani(a), Giacomo P. Comi(b), Alessandra Bardoni(a), and Nereo Bresolin(a,b)
a IRCCS E. Medea, Associazione La Nostra Famiglia, 23842 Bosisio Parini (LC), Italy
b Centro Dino Ferrari, Istituto di Clinica Neurologica, Università di Milano, IRCCS Ospedale Maggiore Policlinico, 20122 Milan, Italy

Corresponding author: Manuela Sironi, Via Don Luigi Monza 20, 23842 Bosisio Parini (LC), Italy. E-mail: msironi{at}bp.lnf.it

Communicating editor: A. J. LOPEZ


*  ABSTRACT

We present an analysis of intronic sequences in the human DMD and UTRN genes. In both genes, accumulation of repeated elements could account for intron expansion. Out-of-frame rod-domain exons have stronger splice sites and are separated by significantly longer introns than in-frame exons. These features are unique to the two homologs and are not shared by other spectrin superfamily genes.


THE DMD gene is the largest known human gene, spanning >2500 kb of the X chromosome and occupying ~0.1% of the genome (NISHIO et al. 1994; LANDER et al. 2001); it is composed of 79 exons that account for only 0.6% of the sequence (AHN and KUNKEL 1993). Its main protein product, dystrophin, a member of the spectrin superfamily, is a rod-shaped protein (KOENIG et al. 1988) that localizes to the sarcolemma. In vertebrates another large gene (LOVE et al. 1989) encodes utrophin, a protein whose structure is conserved with that of dystrophin over its entire length, with higher sequence similarity in the C- and N-terminal regions (TINSLEY et al. 1992; PEARCE et al. 1993); despite this high structural homology, the utrophin gene (UTRN) is about one-third the length of the dystrophin gene.

Mutations in the dystrophin gene are responsible for either Duchenne or Becker muscular dystrophy (DMD and BMD). The majority of DMD and BMD patients carry deletions in the gene (KOENIG et al. 1989), with long introns being preferential sites for deletion breakpoints. The worldwide incidence of DMD is 1 in 3500 male births, one-third of which arise from new mutations (AHN and KUNKEL 1993); it has been speculated that the size of the dystrophin gene might partially account for the high new mutation rate. The extreme length of dystrophin introns, a feature conserved in mammals, birds, and invertebrates (DOMINGUEZ-STEGLICH et al. 1990; NEUMAN et al. 2001), has long been a matter of debate and, in this respect, detailed analysis of intron sequences might be of help.

To allow a closer comparison to dystrophin gene structure, we used BLASTn analysis of utrophin cDNA against human genomic sequences to map intron/exon boundaries and to describe splice junctions and most intronic sequences: the gene consists of 74 exons with lengths varying between 23 and 269 bp. Average intron length is ~7633 bp (26,137 bp for dystrophin). In the two genes, 56 of the exons are identical in size, and pairwise sequence alignment of these exons revealed a mean identity score of 61.9%. In contrast, no relation seems to exist between the lengths of corresponding introns (Figure 1).

Sequence analysis was performed for the available intron regions of both dystrophin and utrophin (see Table 1 for dystrophin). A high concentration of repetitive elements was found to be a key feature of many long dystrophin and utrophin introns; overall, 32.1% of total dystrophin intron length (28.4% for utrophin) is accounted for by repeated sequences, with LINE-1 elements representing the major contribution to dystrophin intron size. Interestingly, when the total length of repetitive sequences per intron was compared with residual intron length (intron size after removal of all repeats), a highly significant correlation was found for both the dystrophin and utrophin genes (Spearman correlation coefficients = 0.93 and 0.70, respectively; P < 0.001 in both cases; Figure 2a). This finding might indicate that early insertional events (now obliterated by accumulated point mutations) triggered further insertions, leading to incremental intron growth. Nonetheless, it is also possible that the residual sequences did not arise from early insertions but rather existed per se and started accumulating interspersed repeated elements in proportion to their original length.
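
The comparison of repeat length with residual intron length can be illustrated with a minimal sketch. It assumes the usual rank-based definition of the Spearman coefficient (Pearson correlation of average ranks); the function names and the intron values are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (total intron length, repeat length) pairs in bp.
introns = [(12000, 3000), (45000, 16000), (800, 0), (110000, 40000), (30000, 9000)]
repeat_lengths = [rep for _, rep in introns]
# Residual length = intron size after removal of all repeats.
residual_lengths = [total - rep for total, rep in introns]
rho = spearman(repeat_lengths, residual_lengths)
print(f"Spearman rho = {rho:.2f}")
```

The sketch does not compute a P value; a statistics library would normally be used for that step.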

When the presence of different repeats was analyzed as a function of time (Figure 2b), similar profiles were obtained for the two homologs, and in both cases the trend was superlinear, indicating that the augmented intron size resulting from each insertion event has indeed favored further insertions in a process that produced, over the last 130 million years, a size increase of ~20% for both utrophin and dystrophin. These data indicate that gradual accumulation of repeated elements may be regarded as a convincing hypothesis to explain intron expansion in these genes.

Repeated elements also represent a large target for homologous unequal recombination, yet only a few breakpoints in the dystrophin gene have been sequenced and associated with homologous DNA misalignment (MCNAUGHTON et al. 1998). SUMINAGA et al. (2000) recently indicated a nonhomologous recombination event between Alu and LINE-1 repeats as the cause of a deletion in the dystrophin gene and hypothesized the existence of a novel source of instability. Nonetheless, here we show that 32.1% of the dystrophin intron size is represented by repetitive elements; this implies that even if breakpoints were not promoted by any single sequence element, approximately one-third of them would be expected to involve repeated sequences. Whatever the molecular mechanisms involved, the longest dystrophin introns have been shown to be preferential sites for deletion breakpoints (BAUMBACH et al. 1989); in this view intron expansion can be regarded only as a genetic load, all the more so if the energetic cost of transcription and potential problems in pre-mRNA processing are considered.

We have previously shown (SIRONI et al. 2001) that, in the dystrophin gene, out-of-frame (OF) exons have significantly stronger splice sites than in-frame (IF) exons. Here we extended this analysis to utrophin splice junctions, and Table 2 shows that the same holds for utrophin.
A similar bias is not observed when splice sites of other genes of the spectrin superfamily are considered; this implies that duplication, early in evolution, of a common structural motif cannot be invoked as an explanation. Interestingly, in both genes, the significant differences between splicing parameters are accounted for by rod-domain exons (Table 2), which encode a region where the two proteins display the lowest degree of conservation. One possibility is that this feature represents a device to minimize energetic waste: skipping of an IF rod-domain exon due to a splicing error would produce an internally truncated "Becker-like" protein, which would retain partial activity in cellular processes; in contrast, exon missplicing in the C- and N-terminal domains, where different binding sites are located, is expected to cause protein dysfunction irrespective of frame conservation or alteration.

Intron lengths were also considered in the two homologs (Table 2), and OF exons were found to be separated by significantly longer genomic distances than IF exons; again, the significant differences were accounted for by rod-domain exons and were not found when other genes of the spectrin superfamily were considered. This finding is quite surprising, since the probability of cryptic splice site activation is expected to increase with intron length. Nonetheless, despite the lack of any simple relation between the lengths of corresponding introns, this feature has been preserved in both dystrophin and utrophin, suggesting an underlying functional significance. In this respect it should be noted that attention has recently been focused on intron sequences as modulators of both splicing and transcription (BRINSTER et al. 1988; OKKEMA et al. 1993); intron-dependent recruitment of splicing factors to transcription sites has been reported in HeLa cells (HUANG and SPECTOR 1996), while NEEL et al. (1993) indicated that, in NIH-3T3 cells, intron removal rate is dependent upon the number of introns on the nascent transcript; the authors suggest that interaction between intron sequences can increase both the specificity and the efficiency of splicing.
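
The distinction between in-frame and out-of-frame exons can be sketched as follows. The sketch assumes the usual definition for internal coding exons (an exon is in frame when its length is a multiple of three, so that skipping it leaves the downstream reading frame intact); the function names and exon lengths are hypothetical and are not taken from Table 2.

```python
def frame_class(exon_length_bp):
    """Return "IF" if skipping the exon preserves the reading frame, else "OF"."""
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    return "IF" if exon_length_bp % 3 == 0 else "OF"


def skipping_consequence(exon_length_bp):
    """Describe the expected effect of skipping one internal coding exon."""
    if frame_class(exon_length_bp) == "IF":
        codons = exon_length_bp // 3
        return f"internal deletion of {codons} codons, downstream frame preserved"
    return "frameshift downstream of the skipped exon"


# Hypothetical exon lengths in bp (assumed values, for illustration only).
hypothetical_exons = {"exon A": 150, "exon B": 173, "exon C": 186}
classes = {name: frame_class(n) for name, n in hypothetical_exons.items()}

for name, length in hypothetical_exons.items():
    print(name, length, classes[name], "->", skipping_consequence(length))
```

First and last exons, which carry untranslated sequence, would need separate handling and are outside the scope of this sketch.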



 
Figure 1. Schematic representation of the dystrophin (a) and utrophin (b) genes. Vertical lines represent exons. Length information was not available for utrophin introns 32, 33, 41, and 42. Length of dystrophin introns 50–53 is only approximate. Utrophin genomic sequences can be freely accessed at the Celera publication site (http://public.celera.com) through BLASTn search against the cDNA sequence (GenBank accession no. NM007124); genomic sequences of exons 43–74 are available in GenBank (accession no. 13632062) or can be accessed through the University of California at Santa Cruz (UCSC) human genome browser (http://genome.ucsc.edu/index.html). In a few instances small gaps interrupted the sequences. Dystrophin genomic sequences were derived either from the Celera publication site (free accession no. GA_x8W864V:1..500000 for exons 44–50) or from the UCSC human genome browser. Some discrepancies were found with intron lengths previously predicted on the basis of cDNA hybridization experiments (http://www.dmd.nl/cdnagene.html).



 
Figure 2. Analysis of repetitive elements in the dystrophin and utrophin genes. A recent update of the RepeatMasker program (http://repeatmasker.genome.washington.edu) run under sensitive settings was used. (a) Total length of repetitive sequences vs. residual intron length; top, dystrophin; bottom, utrophin. (b) Intron expansion as a function of time. Solid line, dystrophin; dashed line, utrophin. Lengths of sequences estimated to have inserted at different times were subtracted from total intron size; the resulting lengths have been plotted as a function of time and expressed as a fraction of present size. Age of long interspersed element (LINE) and LTR sequences was defined as previously described (SMIT 1993; SMIT et al. 1995; SMIT and RIGGS 1996). For Alu and mammalian-wide interspersed repeat (MIR) sequences, age was determined as described by KAPITONOV and JURKA (1996) and JURKA et al. (1995), respectively. Overall, 57.2% (length percentage) of total dystrophin repetitive elements could be dated (66.6% of utrophin's).
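
The subtraction described for panel (b) can be sketched in a few lines. This is a minimal illustration under stated assumptions: each dated repeat is represented as an (age in million years, length in bp) pair, a repeat is treated as absent at any time point older than its estimated age, and all names and values are hypothetical rather than data from this study.

```python
def fraction_of_present_size(total_bp, dated_repeats, myr_ago):
    """Estimated intron size `myr_ago` million years ago, as a fraction of today's.

    Repeats younger than `myr_ago` had not yet inserted at that time,
    so their lengths are subtracted from the present total.
    """
    inserted_since = sum(length for age, length in dated_repeats if age < myr_ago)
    return (total_bp - inserted_since) / total_bp


# Hypothetical intron: 100 kb today, with three dated repeats (age in Myr, length in bp).
total_intron_bp = 100_000
dated_repeats = [(10, 5_000), (40, 8_000), (90, 7_000)]

time_points = [0, 50, 130]
profile = [fraction_of_present_size(total_intron_bp, dated_repeats, t)
           for t in time_points]

for t, frac in zip(time_points, profile):
    print(f"{t:>4} Myr ago: {frac:.2f} of present size")
```

Repeats that cannot be dated are simply left in place by this approach, so the computed past sizes are upper bounds.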


 

 
Table 1. Repetitive elements in dystrophin introns


 

 
Table 2. Splice site parameters and intron lengths: comparison between in-frame and out-of-frame exons

In a scenario whereby introns can enhance transcriptional activity and, eventually, stimulate the accumulation of splicing factors, long intronic regions might turn out to be not as disadvantageous as expected.


*  FOOTNOTES

1 These authors contributed equally to this work.


*  ACKNOWLEDGMENTS

We are grateful to Dr. R. Giorda for useful discussion about the manuscript. We thank the Celera publication site for allowing sequence retrieval and analysis.

Manuscript received August 31, 2001; Accepted for publication November 15, 2001.


*  LITERATURE CITED

AHN, A. H. and L. M. KUNKEL, 1993  The structural and functional diversity of dystrophin. Nat. Genet. 3:283-291.

BAUMBACH, L. L., J. S. CHAMBERLAIN, P. A. WARD, N. J. FARWELL, and C. T. CASKEY, 1989  Molecular and clinical correlation of deletion leading to Duchenne and Becker muscular dystrophies. Neurology 39:465-474.

BRINSTER, R. L., J. M. ALLEN, R. R. BEHRINGER, R. E. GELINAS, and R. D. PALMITER, 1988  Introns increase transcriptional efficiency in transgenic mice. Proc. Natl. Acad. Sci. USA 85(3):836-840.

DOMINGUEZ-STEGLICH, M., G. MENG, T. BETTECKEN, C. R. MULLER, and M. SCHMID, 1990  The dystrophin gene is autosomally located on a microchromosome in chicken. Genomics 8(3):536-540.

HUANG, S. and D. L. SPECTOR, 1996  Intron-dependent recruitment of pre-mRNA splicing factors to sites of transcription. J. Cell Biol. 133(4):719-732.

JURKA, J., E. ZIETKIEWICZ, and D. LABUDA, 1995  Ubiquitous mammalian-wide interspersed repeats (MIRs) are molecular fossils from the mesozoic era. Nucleic Acids Res. 23(1):170-175.

KAPITONOV, V. and J. JURKA, 1996  The age of Alu subfamilies. J. Mol. Evol. 42(1):59-65.

KOENIG, M., A. P. MONACO, and L. M. KUNKEL, 1988  The complete sequence of dystrophin predicts a rod-shaped cytoskeletal protein. Cell 53:219-228.

KOENIG, M., A. H. BEGGS, M. MOYER, S. SCHERPF, and K. HEINDRICH et al., 1989  The molecular basis for Duchenne versus Becker muscular dystrophy: correlation of severity with type of deletion. Am. J. Hum. Genet. 45:498-506.

LANDER, E. S., L. M. LINTON, B. BIRREN, C. NUSBAUM, and M. C. ZODY et al., 2001  Initial sequencing and analysis of the human genome. Nature 409(6822):860-921.

LOVE, D. R., D. F. HILL, G. DICKSON, N. K. SPURR, and B. C. BYTH et al., 1989  An autosomal transcript in skeletal muscle with homology to dystrophin. Nature 339:55-58.

MATHEWS, D. H., J. SABINA, M. ZUKER, and D. H. TURNER, 1999  Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288:911-940.

MCNAUGHTON, J. C., D. J. COCKBURN, G. HUGHES, W. A. JONES, and N. G. LAING et al., 1998  Is gene deletion in eukaryotes sequence-dependent?: a study of nine deletion junctions and nineteen other deletion breakpoints in intron 7 of the human dystrophin gene. Gene 222(1):41-51.

NEEL, H., D. WEIL, C. GIANSANTE, and F. DAUTRY, 1993  In vivo cooperation between introns during pre-mRNA processing. Genes Dev. 7(11):2194-2205.

NEUMAN, S., A. KABAN, T. VOLK, D. YAFFE, and U. NUDEL, 2001  The dystrophin/utrophin homologues in Drosophila and in sea urchin. Gene 263(1–2):17-29.

NISHIO, H., Y. TAKESHIMA, N. NARITA, H. YANAGAWA, and Y. SUZUKI et al., 1994  Identification of a novel first exon in the human dystrophin gene and of a new promoter located more than 500 kb upstream of the nearest known promoter. J. Clin. Invest. 94:1037-1042.

OKKEMA, P. G., S. W. HARRISON, V. PLUNGER, A. ARYANA, and A. FIRE, 1993  Sequence requirements for myosin gene expression and regulation in Caenorhabditis elegans. Genetics 135:385-404.

PEARCE, M., D. J. BLAKE, J. M. TINSLEY, B. C. BYTH, and L. CAMPBELL et al., 1993  The utrophin and dystrophin genes share similarities in genomic structure. Hum. Mol. Genet. 2:1765-1772.

SHAPIRO, M. B. and P. SENAPATHY, 1987  RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 15:7155-7174.

SIRONI, M., U. POZZOLI, R. CAGLIANI, G. P. COMI, and A. BARDONI et al., 2001  Analysis of splicing parameters in the dystrophin gene: relevance for physiological and pathogenetic splicing mechanisms. Hum. Genet. 109(1):73-84.

SMIT, A. F., 1993  Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res. 21(8):1863-1872.

SMIT, A. F. and A. D. RIGGS, 1996  Tiggers and DNA transposon fossils in the human genome. Proc. Natl. Acad. Sci. USA 93(4):1443-1448.

SMIT, A. F., G. TOTH, A. D. RIGGS, and J. JURKA, 1995  Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246(3):401-417.

SUMINAGA, R., Y. TAKESHIMA, K. YASUDA, N. SHIGA, and H. NAKAMURA et al., 2000  Non-homologous recombination between Alu and LINE-1 repeats caused a 430-kb deletion in the dystrophin gene: a novel source of genomic instability. J. Hum. Genet. 45(6):331-336.

TINSLEY, J. M., D. J. BLAKE, A. ROCHE, U. FAIRBROTHER, and J. RISS, 1992  Primary structure of dystrophin-related protein. Nature 360:591-593.



