Neurological Proteins Are Not Enriched For Repetitive Sequences
Melanie A. Huntley and G. Brian Golding
Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1, Canada
Corresponding author: G. Brian Golding, McMaster University, 1280 Main St. W., Hamilton, Ontario L8S 4K1, Canada. E-mail: golding@mcmaster.ca
Communicating editor: S. W. SCHAEFFER
| ABSTRACT |
|---|
Proteins associated with disease and development of the nervous system are thought to contain repetitive, simple sequences. However, genome-wide surveys for simple sequences within proteins have revealed that repetitive peptide sequences are the most frequent shared peptide segments among eukaryotic proteins, including those of Saccharomyces cerevisiae, which has few to no specialized developmental and neurological proteins. It is therefore of interest to determine if these specialized proteins have an excess of simple sequences when compared to other sets of compositionally similar proteins. We have determined the relative abundance of simple sequences within neurological proteins and find no excess of repetitive simple sequence within this class. In fact, polyglutamine repeats that are associated with many neurodegenerative diseases are no more abundant within neurological specialized proteins than within nonneurological collections of proteins. We also examined the codon composition of serine homopolymers to determine what forces may play a role in the evolution of extended homopolymers. Codon type homogeneity tends to be favored, suggesting replicative slippage instead of selection as the main force responsible for producing these homopolymers.
THE presence and abundance of simple repetitive sequences within nucleotide sequences are well known. Microsatellites and other tandemly repeated sequences within DNA are well characterized; however, similarly repetitive sequences within proteins are less well acknowledged and understood. Nevertheless, such repeats within eukaryotic proteins are abundant. They vary in composition from a simple reiteration of a single amino acid to long tracts of sequence that are predominated by the presence of one or only a few amino acids.
Genome-wide surveys for simple sequences have shown that these low-complexity sequences are the most commonly shared peptide fragments in eukaryotic proteomes (…).
Only a few functions have been ascribed to these unusual regions. Among the first described, and perhaps the best known, are the opa and opa-like repeats found in essential developmental proteins in insects (…). … 30 glutamines, with interspersed histidine residues.
In some prokaryotes, reversible mutations within regions of repetitive simple-sequence DNA are involved in phase variation (…).
Other well-known repetitive regions in proteins are thought to be the cause of several human neurodegenerative diseases. These are associated with proteins containing extended regions of tandemly repeated glutamine residues. These proteins and others involved in nervous system disease and development contain multiple long homopeptides within their sequence (…).
Huntington's disease was one of the first disorders shown to be caused by a homopeptide expansion. It is associated with neural cell death, progressive chorea, dementia, and seizures, and is believed to be caused by an increase in the length of a CAG triplet repeat within the huntingtin gene. The age of onset is inversely correlated with the length of the CAG repeat (…).
Kennedy's disease, also known as spinal and bulbar muscular atrophy (SBMA), is an X-linked disease that causes late-onset lower-motor and primary-sensory neuropathy. Clinical symptoms include muscular atrophy, twitching, tremors, and androgen deficiency. The primary cause of this disease is an expanded CAG triplet repeat within the androgen receptor (AR) gene (…).
Dentatorubral-pallidoluysian atrophy (DRPLA) is phenotypically similar to Huntington's disease, including late-onset dementia, cerebellar ataxia, myoclonic seizures, and choreic and athetoid movements. Again, an expanded CAG repeat, encoding polyglutamine, is responsible for the pathology of this disease (…).
Other neurological diseases that fall into this category are spinocerebellar ataxia (SCA) 1, 2, 3 (Machado-Joseph disease), 6, 7, and 17. All are caused by expansions of a polyglutamine tract in separate proteins (…).
Studies of synthetic homopolymers, including glutamine repeats, have shown that some can form stable structures (…).
Short and long homopeptides are both more frequent in developmental proteins than in other classes of proteins (…).
In most of the neurodegenerative proteins, polyglutamine results from a triplet repeat expansion of the CAG codon. It is generally believed that these simple sequences arise as a byproduct of replicative slippage at the DNA level, similar to the process occurring in microsatellite expansion. However, not all repeats follow this pattern. Serine reiterations in yeast do not show bias toward long tracts of one of the possible codons (…).
In this study, extended serine homopolymer tracts are used to show that tract length does not affect the mixture of codon types, but that the relative position of codons within a tract does affect codon composition, indicating that these tracts likely result from slippage.
| MATERIALS AND METHODS |
|---|
Neurological proteins:
Human and Drosophila neurological, developmental, and kinase proteins were collected from the National Center for Biotechnology Information (NCBI) using the ENTREZ query system. To search for neurological proteins, the key words neural, neuro, nerve, and axon were used. To search for developmental proteins, we used the key words development, morphogen, homeotic, differentiation, embryo, larva, and determination. These key words were based on those used in the Gene Ontology database (http://www.godatabase.org/dev/database/). Kinase proteins were collected by searching for the key word kinase. All key words (or modifications of the key words' roots) had to be present on the definition line of the GenPept files. All key-word matches were screened to eliminate matches that did not fit into their respective categories, such as homeostasis, which matched the root of homeotic. These databases are not exclusive, but this method is unbiased, explicit, and easily repeatable. All sequences targeted to the mitochondria were removed.
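The key-word screen can be sketched as a prefix match on the words of each definition line. This is a minimal sketch, not the authors' code: the root lists (for example `homeo` standing in for homeotic and `determin` for determination) and the `FALSE_POSITIVES` set are assumptions, and the manual screening step described above is reduced to a fixed exclusion list.

```python
import re

# Hypothetical roots derived from the key words named in the text.
NEURO_ROOTS = ("neural", "neuro", "nerve", "axon")
DEV_ROOTS = ("develop", "morphogen", "homeo", "differentiat",
             "embryo", "larva", "determin")

# Words that match a root but not the category. "homeostasis" is the example
# given in the text; the rest of the real screen was done by hand.
FALSE_POSITIVES = {"homeostasis", "homeostatic"}


def matches_category(definition_line, roots, false_positives=FALSE_POSITIVES):
    """Return True if any word on a definition line starts with one of the roots."""
    for word in re.findall(r"[a-z]+", definition_line.lower()):
        if word in false_positives:
            continue  # known spurious match, skip it
        if word.startswith(roots):  # str.startswith accepts a tuple of prefixes
            return True
    return False
```

Matching on roots rather than whole words catches variants such as "neuronal" or "embryonic", at the cost of spurious hits that must be screened out, which is exactly the trade-off the text describes.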
Many coding sequences within a genome are redundant duplicates, isozymes, or ancient duplications. Additionally, sequence databases can contain redundant sets of sequences. To construct a database of, for example, neurological proteins, such duplicates had to be filtered. First, a BLAST search (…) …
Databases from Drosophila protein sequences were constructed, resulting in 77 neurological proteins (a 45% reduction), 139 developmental proteins (a 56% reduction), and 128 kinases (a 65% reduction). The kinase database within 5% of the neurological lengths had 52 proteins, while the one within 10% of the neurological lengths had 64.
In total, we constructed five types of databases each for human and Drosophila proteins: neurological, developmental, kinase, kinase within 5% of neurological lengths, and kinase within 10% of neurological lengths. To analyze these databases we constructed comparison databases that were similar in composition to the original databases, while excluding neurological, developmental, and kinase proteins, respectively. Each database was used as a basis to sample sequences from the NCBI and to construct 100 random comparison databases. For instance, for human neurological proteins, 50 databases were constructed to contain human sequences that were not neurological, but otherwise randomly chosen from the NCBI and within 5% of the lengths of the neurological proteins. Another 50 databases were constructed to be within 10% of the lengths of the neurological proteins. Therefore, each protein within the neurological database had a protein of similar length within each of the comparison databases. In this way, none of the 100 comparison databases shares any protein with the human neurological database, yet each is similar to it in protein length composition.
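A minimal sketch of length-matched sampling follows, assuming each comparison database draws one distinct non-class protein per class protein from a pool of `(accession, length)` pairs. The function names, the pool format, and the rule that a protein is not reused within one database are hypothetical; the 5% and 10% tolerances and the 50 + 50 split come from the text.

```python
import random


def length_matched_database(query_lengths, pool, tolerance, rng):
    """Draw one distinct protein from `pool` per query length, within +/- tolerance.

    query_lengths: lengths of the class proteins (e.g. neurological ones).
    pool: (accession, length) pairs for proteins outside that class.
    tolerance: 0.05 or 0.10 for the 5% and 10% designs.
    """
    candidates = list(pool)
    rng.shuffle(candidates)  # random choice among eligible proteins
    used = set()
    chosen = []
    for qlen in query_lengths:
        low, high = qlen * (1 - tolerance), qlen * (1 + tolerance)
        for accession, length in candidates:
            if accession not in used and low <= length <= high:
                used.add(accession)
                chosen.append(accession)
                break
        else:  # loop finished without a match
            raise ValueError(f"no unused protein within {tolerance:.0%} of length {qlen}")
    return chosen


def comparison_databases(query_lengths, pool, seed=0):
    """50 databases at 5% tolerance and 50 at 10%, as in the human neurological example."""
    rng = random.Random(seed)
    dbs = [length_matched_database(query_lengths, pool, 0.05, rng) for _ in range(50)]
    dbs += [length_matched_database(query_lengths, pool, 0.10, rng) for _ in range(50)]
    return dbs
```

Because every class protein receives its own length-matched partner, each comparison database matches the class database in size and in length distribution, which is the property the comparison relies on.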
To determine how common highly repetitive, simple sequences were in these databases, BLAST searches were performed using 100-residue-long homopolymers of each amino acid as queries. The number of BLAST hits with expect values ≤0.01 was compared to the numbers found in the 100 comparison databases, and the corresponding percentiles were recorded.
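The queries and the percentile comparison can be sketched as below. Building the homopolymer queries is straightforward; the hit counts themselves would come from BLAST output filtered at E ≤ 0.01, which this sketch does not run. Defining a percentile as the share of comparison databases whose count is at or below the observed one is an assumption; other conventions treat ties differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def homopolymer_queries(length=100):
    """FASTA text with one homopolymer query per amino acid (e.g. 100 alanines)."""
    return "".join(f">poly{aa}\n{aa * length}\n" for aa in AMINO_ACIDS)


def percentile_rank(observed, comparison_counts):
    """Percentage of comparison databases with a hit count <= the observed count."""
    if not comparison_counts:
        raise ValueError("need at least one comparison database")
    at_or_below = sum(1 for c in comparison_counts if c <= observed)
    return 100.0 * at_or_below / len(comparison_counts)
```

Under this convention an observed count that ties the largest comparison count still scores 100, which is why "100th percentile" and "larger than any comparison database" are not quite the same statement.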
This analysis was also performed on the redundant databases, to examine how the analysis was affected by making the databases nonredundant.
To ensure that these results were robust, we also performed the same analysis using BLAST with 50-amino-acid-long homopolymers and using two entirely distinct algorithms, SIMPLE (…) and SEG (…).
Of these methods, the SIMPLE algorithm has the most rigid window length to search for cryptically simple sequences. During various trials we used total window lengths ranging from 40 to 100 and searched for monomeric-like simple sequences.
For analysis using the SEG algorithm, we chose a window length, L, of 40 and a complexity cutoff value, K2(1), of 2.6. All low-complexity segments were sorted into amino acid categories on the basis of the composition of the segment. If two or more amino acids each had frequencies of 30% or higher, that segment was counted toward each of those categories. This was done to search for highly repetitive, low-complexity regions.
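The 30% categorization rule for SEG segments can be sketched as follows. The threshold comes from the text; the helper names are hypothetical, and segments in which no amino acid reaches the threshold are simply left uncounted here, which the text does not specify. Integer arithmetic avoids floating-point edge cases at exactly 30%.

```python
from collections import Counter


def segment_categories(segment, min_percent=30):
    """Amino acids making up at least min_percent of a low-complexity segment.

    A segment counts toward every amino acid meeting the threshold, so a
    segment that is 40% Q and 35% H counts toward both Q and H.
    """
    n = len(segment)
    if n == 0:
        return []
    counts = Counter(segment.upper())
    return sorted(aa for aa, c in counts.items() if 100 * c >= min_percent * n)


def tally_categories(segments, min_percent=30):
    """Number of segments counted toward each amino acid category."""
    tally = Counter()
    for seg in segments:
        tally.update(segment_categories(seg, min_percent))
    return tally
```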
In addition, we analyzed the percentage of low complexity per sequence and the number of low-complexity regions per sequence. We did this using two different sets of SEG parameters: an L of 15 with K2(1) of 1.9 and an L of 40 with K2(1) of 2.6.
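Given the segment coordinates that SEG reports for one protein, the two per-sequence summaries can be computed as below. This is a sketch under assumptions: coordinates are taken as 1-based and inclusive, and overlapping or adjacent segments are merged before counting, a choice the text does not state.

```python
def low_complexity_summary(seq_length, segments):
    """Return (percent of residues in low-complexity segments, number of segments).

    segments: (start, end) pairs, 1-based and inclusive.
    """
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # overlaps or touches the previous segment: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return 100.0 * covered / seq_length, len(merged)
```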
This entire analysis was also performed on the proteins from Caenorhabditis elegans to determine how widespread the resulting patterns were.
Homopolymer tracts:
Analysis similar to a previous study (…) … ≥5 residues whose combined lengths totaled no less than 20. In sequences with extreme compositional bias, long homopeptides are expected to occur more often by chance. Karlin and Burge also screened for proteins containing at least one homopeptide of length ≥10 residues and at least one other of length ≥5 residues. We used the additional requirement that at least one homopeptide within a protein had a length of 15 residues or more to emphasize more extended homopolymers. The protein descriptions, their accession numbers, lengths, and the homopeptide lengths were recorded. Proteins with any known neurological function were grouped in the "neurological" category. Any of the remaining proteins with known developmental function were grouped under the "developmental" category. All other proteins with some known function were termed "other" and any remaining proteins were put in the "unknown or hypothetical" category. We further selected the serine homopeptides within these proteins and analyzed their codon content.
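The homopolymer screen can be sketched with `itertools.groupby`, which splits a sequence into maximal runs of one residue. The thresholds (runs of at least 5, a combined length of at least 20, and one run of at least 15) come from the text; reading "multiple homopeptides" as "at least two qualifying runs" is an assumption, since part of the original criterion is missing.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=5):
    """Maximal single-residue runs of at least min_len: (residue, 0-based start, length)."""
    runs, pos = [], 0
    for residue, group in groupby(seq.upper()):
        n = sum(1 for _ in group)
        if n >= min_len:
            runs.append((residue, pos, n))
        pos += n
    return runs


def passes_screen(seq, min_len=5, min_total=20, min_longest=15):
    """At least two homopeptides (>= min_len), combined length >= min_total, and
    one of length >= min_longest (with the default 15 this also covers >= 10)."""
    lengths = [n for _, _, n in homopolymer_runs(seq, min_len)]
    return (len(lengths) >= 2
            and sum(lengths) >= min_total
            and max(lengths) >= min_longest)
```

For example, a protein with a 15-residue poly(Q) and a 6-residue poly(S) passes (combined length 21), while one whose second run is only four residues long does not.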
Serine is unique among the amino acid residues as it has two types of codons (TCN and AGY) that are at least two mutational events apart. Because of the mutational distance between the two codon types, studying the codon composition of serine homopolymers allows for a stronger distinction between the two hypotheses for their mechanism of evolution: replicative slippage or selection at the protein level. The TCN codons (TCA, TCC, TCG, and TCT) are more frequent than the AGY codons (AGT and AGC). If the homopeptide was simply the product of DNA slippage during replication, we would expect little mixture of the two codon types. For example, a polyserine tract that was created via strand slippage should be composed of only TCN codons or only AGY codons, but seldom a mixture of both. If, however, other forces, such as selection, are acting to create these homopeptides, then a mixture of the codon types might be more common.
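To see why codon-type homogeneity is informative, consider the null case in which each codon of a tract is drawn independently according to genomic frequencies: the chance that an n-codon tract is all TCN or all AGY is f_TCN^n + f_AGY^n, which shrinks quickly as n grows. The sketch below computes this; the value f_TCN = 0.6 in the usage lines is a hypothetical illustration, not a frequency reported in this article.

```python
def prob_single_codon_type(n, f_tcn):
    """Chance that an n-codon polyserine tract is all TCN or all AGY when each
    codon is drawn independently (TCN with probability f_tcn, AGY otherwise)."""
    if not 0.0 <= f_tcn <= 1.0:
        raise ValueError("f_tcn must be a probability")
    f_agy = 1.0 - f_tcn
    return f_tcn ** n + f_agy ** n


if __name__ == "__main__":
    # Hypothetical genomic frequency, for illustration only.
    for n in (5, 10, 15, 20):
        print(n, round(prob_single_codon_type(n, 0.6), 5))
```

With that assumed frequency the probability is already below 1% for a 10-codon tract, so many long unmixed tracts would be hard to explain by random codon choice alone.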
We determined whether the length of the homopolymer tract influenced the mixture of the two codon types using a likelihood-ratio test, $\chi^2 = -2\ln(L_0/L)$, where $L_0$ is the likelihood of the null model and $L$ is the likelihood of the model being tested.
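The test statistic and its p-value can be computed from log-likelihoods, which avoids underflow when many per-codon probabilities are multiplied together. Because the models here are compared with 2 and 4 d.f., both even, the chi-square tail probability has a closed form and no special-function library is needed; `scipy.stats.chi2.sf(x, df)` would be the general alternative. The function names are illustrative.

```python
import math


def lr_statistic(loglik_null, loglik_alt):
    """-2 ln(L0 / L), computed from the two log-likelihoods."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x) for an even number of d.f.

    For df = 2k the survival function is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

Plugging in values reported in the Results, such as 22.83 with 2 d.f. or 81.29 with 4 d.f., gives tail probabilities well below 0.001.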
Given genomic codon usage frequencies ($f_{AGY}$ and $f_{TCN}$) and $N$ polyserine tracts of length $n_i = x_i + y_i$, where $x_i$ is the number of AGY codons and $y_i$ is the number of TCN codons in the $i$th tract, the likelihood model can be summarized as
[Equation 1]
This model assumes a linear relationship between the length of the tract and codon composition. The parameters a and b were adjusted to maximize the likelihood, L. The null model, L0, which is a random choice according to the frequencies, is the likelihood obtained with a = 1 and b = 0.
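Only the null model of the first likelihood model is fully described in the text: codons chosen at random according to the genomic frequencies. Its log-likelihood can be sketched as follows, with each tract summarized by its AGY and TCN counts (x_i, y_i). The function name is hypothetical, and the fitted alternative (parameters a and b) is not reconstructed because its formula is not available here.

```python
import math


def null_loglik(tracts, f_agy, f_tcn):
    """Log-likelihood with every codon drawn independently at genomic frequencies.

    tracts: list of (x_i, y_i) = (number of AGY codons, number of TCN codons).
    Any ordering (binomial) coefficient is omitted; it does not depend on the
    frequencies, so leaving it out is harmless as long as the alternative
    likelihood is computed the same way.
    """
    if f_agy <= 0 or f_tcn <= 0:
        raise ValueError("codon frequencies must be positive")
    return sum(x * math.log(f_agy) + y * math.log(f_tcn) for x, y in tracts)
```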
We used a second model to see if the position of a codon within a homopolymer tract influenced the type of codon found. For instance, if a codon position is flanked by AGY codons, is that position more likely to be occupied by an AGY or a TCN codon? Given $N$ polyserine tracts, each with length $n_i$, where $X_j$ denotes the codon at position $j$ within the homopolymer tract, we calculated the likelihood as
[Equation 2]
The null model assumes no dependence on neighboring codons. This situation is achieved when (…) and (…). Otherwise, the parameters P1, P2, P3, and P4 can range from 1/e to 1. This results in a logarithmic decay function, bounded between zero and one. The parameters P1 and P2 measure how likely the middle codon position is to be occupied by the same codon type as the two surrounding codons, given that the two surrounding codons are of the same type. Thus, smaller values of P1 and P2 translate to increased probabilities of codon-type homogeneity. P3 and P4 measure the bias of the middle codon position toward the left or the right codon position when these are not occupied by the same codon type. Therefore, smaller values of P3 and P4 mean an increased probability that the $X_j$ codon is of the same type as the $X_{j-1}$ codon only, while larger values of P3 and P4 correspond to an increased probability that it is of the same type as the $X_{j+1}$ codon.
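The dependence the second model targets can be explored empirically by tallying, for every interior codon, the types of its two neighbors and of the codon itself. This sketch does not fit P1 to P4; it only produces neighbor-context counts and a mixture check, with hypothetical helper names and tracts given as in-frame DNA strings.

```python
from collections import Counter

# Serine codons grouped into the two types discussed in the text.
SERINE_TYPE = {"TCA": "TCN", "TCC": "TCN", "TCG": "TCN", "TCT": "TCN",
               "AGC": "AGY", "AGT": "AGY"}


def codon_types(dna):
    """Translate an in-frame polyserine DNA tract into a list of codon types."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("tract length must be a multiple of 3")
    types = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if codon not in SERINE_TYPE:
            raise ValueError(f"{codon} is not a serine codon")
        types.append(SERINE_TYPE[codon])
    return types


def has_mixture(dna):
    """True if the tract contains both TCN and AGY codons."""
    return len(set(codon_types(dna))) > 1


def neighbor_context_counts(tracts):
    """Map (type of X_{j-1}, type of X_{j+1}) -> Counter of the type of X_j.

    Only interior positions, which have both neighbors, are counted.
    """
    counts = {}
    for dna in tracts:
        types = codon_types(dna)
        for j in range(1, len(types) - 1):
            context = (types[j - 1], types[j + 1])
            counts.setdefault(context, Counter())[types[j]] += 1
    return counts
```

Comparing, say, the share of TCN middle codons in the (TCN, TCN) context with the overall TCN share gives an informal read on the homogeneity that P1 and P2 quantify.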
| RESULTS |
|---|
Neurological proteins:
Table 1 shows that the human neurological database contained eight proteins with significant similarity to polyalanine. This number of BLAST hits was larger than that found for any of the 100 human nonneurological databases (matched to be within 5 and 10% of the neurological sequence lengths). The Drosophila neurological database also contained eight significant hits, which were in the 100th percentile of the number of significant BLAST hits from each of the 50 Drosophila nonneurological databases (matched to be within 5% of the neurological sequence lengths) and larger than that found for any of the 50 nonneurological databases (matched to be within 10% of the neurological sequence lengths).
Table 2 shows that developmental proteins appear to be enriched for alanine (A), glycine (G), proline (P), and serine (S) in comparison to equally numerous, length-matched nondevelopmental proteins. Glutamic acid (E) and glutamine (Q) also appear to be more common in developmental proteins; however, this result is not as consistent as for A, G, P, and S. Lysine (K) shows a notably large discrepancy between human and Drosophila: in human developmental proteins, the number of significant BLAST hits to poly(K) was in the 84th to 94th percentile, but in Drosophila it was only in the 2nd to 10th percentile.
Neurological proteins are consistently enriched for alanine (A) and histidine (H) as shown in Table 1. Glutamine, which is associated with many neurodegenerative diseases, is not found to be overrepresented in neurological proteins. There are also large discrepancies between the species for glutamic acid (E) and proline (P).
The kinase proteins in Table 3 show that none of the amino acids are consistently enriched in both species, compared to nonkinase proteins. Kinase databases constructed to be of similar lengths to the neurological proteins (Table 4 and Table 5) show no consistent enrichment and an increase in species-to-species discrepancies.
Neurological proteins show much less enrichment than developmental proteins. With the exception of alanine and histidine, neurological proteins are not consistently enriched for repetitive protein sequence.
We performed the BLAST analysis on the redundant data sets to investigate the effect of using nonredundant databases. We found no significant difference except for all of the kinases and for the neurological proteins from D. melanogaster. In these cases, the nonredundant databases were found to have significantly more BLAST hits per sequence than the redundant databases (data not shown).
Using the SIMPLE algorithm we obtained broadly similar results for neurological proteins. However, in many cases the windows detected as significantly simple were not as enriched for a predominant amino acid as those regions detected by BLAST. Another difference is that SIMPLE is not constructed to recognize residues with similar properties and misses such enriched regions as a result. In a parallel analysis using SEG, again the results were consistent with our BLAST analysis, but with more variability found within the Drosophila results (results not shown).
The patterns we obtained using BLAST with 50-amino-acid-long homopolymers were nearly identical to those found using the 100-residue-long homopolymers. However, the Drosophila results, like those from SEG, were more variable.
The parameter space for SEG is very large with numerous parameter sets possible for identifying different types of repetitive low-complexity sequences. Different parameter sets can give rise to dissimilar SEG results. The SEG analysis examining the percentage of low complexity and the number of low-complexity segments per sequence was highly inconsistent between the SEG parameters employed (data not shown).
The proteins of C. elegans yielded similar results to those of humans and Drosophila (data not shown). Again, the neurological proteins had no significant enrichment compared to the nonneurological databases. The developmental proteins had the greatest enrichment, while the kinase proteins had enrichment patterns like those found in humans and Drosophila.
Homopolymer tracts:
Table 6 shows the lengths of the longest homopolymer tracts for each amino acid. This table does not reflect homopolymer frequency or the average lengths of such tracts. Only the individual extreme cases are listed. In humans, many of the longest tracts for neurological proteins are longer than those for the developmental proteins. In Drosophila the opposite is true. Also, for nine amino acids, in both humans and Drosophila, kinase proteins have homopolymers as long as or longer than those of the developmental and neurological proteins.
The appendix lists proteins with multiple homopolymers containing at least one homopeptide of length 15 or more. It comprises 29 human proteins (Table A1) and 74 Drosophila proteins (Table A2). Such proteins are not only more numerous in Drosophila but also contain significantly (P < 0.05) more homopeptides per protein than the human sequences do. There are 559 homopeptide tracts for the Drosophila proteins and only 133 for humans. While (…) … 15 residues and our nonredundant database account for this difference.
In both humans and Drosophila, poly(Q) is the most frequent homopolymer tract. However, poly(Q) accounts for only 24.1% of the human homopolymers but 53.1% of the Drosophila homopolymers. Another discrepancy between the two species is the abundance of poly(E), which accounts for 18.8% of the human homopolymers but only 0.5% of the Drosophila homopolymers. Likewise, poly(G) and poly(P) are more than twice as frequent in humans as in Drosophila (15.0% vs. 7.3% and 10.5% vs. 4.7%, respectively).
These interspecies discrepancies are largely consistent with previous results (…).
Of the 11 polyserine tracts in humans, 7 had no mixture of the codon types at all; of the 56 Drosophila polyserine tracts, 26 had no mixture. For the first model, which was used to determine whether the length of a homopolymer tract influenced the underlying codon mixture, the likelihood-ratio test gave $\chi^2$ values of 22.83 for humans and 57.18 for Drosophila. With 2 d.f., these values corresponded to probabilities <0.001 of occurring by chance alone. Nevertheless, the likelihood model suggests that longer polyserine tracts did not have significantly less mixture of codon types: the parameter b is small in both cases and did not have a consistent direction between the two species. The significant test therefore appears to reflect the parameter a, whose maximum-likelihood estimate took a fractional value. This indicated that the maximum-likelihood codon frequencies within the homopolymers differed from the genomic codon frequencies.
For the second model, which examines the influence of codon position within a homopolymer tract, we found that P1, P2, and P4 were smaller than their null-model values. For humans, P3 was slightly greater than its null-model value, but for Drosophila it was smaller. Likelihood-ratio tests gave $\chi^2$ values of 81.29 for humans and 65.56 for Drosophila. With 4 d.f., these correspond to probabilities <<0.001. Thus, being surrounded by one type of codon significantly increases the likelihood that the middle position is also of that codon type. Also, if the two neighboring codons are of different types, the middle position ($X_j$) tends to be occupied by a codon type that matches the left-hand ($X_{j-1}$) site.
| DISCUSSION |
|---|
These results confirm previous reports, showing developmental proteins to be enriched for simple sequences composed primarily of alanine, glutamic acid, glycine, proline, glutamine, or serine. However, unexpectedly, neurological proteins are only slightly enriched for alanine or histidine.
Neurological proteins have been thought to be enriched for repeats. These results show that, as a class, they do not have an excess of glutamine-enriched regions. Yet many neurological disorders are linked with extended polyglutamine tracts and proteins enriched with glutamine residues. There is evidence that many of these disorders result from protein aggregation, triggered by tracts of polyglutamine forming polar zippers (…).
In contrast, these results confirm the well-known example of simple sequence protein repeats, the opa and opa-like repeats originally found in insects (…).
When we look only at the proteins containing multiple homopolymer tracts (Table A1 and Table A2), we again find a rather large discrepancy between the two species. Although both species have poly(Q) as the most frequent homopolymer tract, it is far more frequent in the Drosophila proteins, representing over half of the homopolymers, while comprising less than a quarter of the human homopolymers. Poly(E) and poly(P) are much more abundant in humans than in Drosophila.
The amino acids that are found to be overrepresented as repeats within these proteins have diverse properties and thus a variety of implications for the structures of the proteins in which they are embedded.
However, overall, little is known about the types of protein structures extended amino acid repeats can form. A survey of eukaryotic proteins within the structural database revealed that low-complexity protein repeats are underrepresented and rarely structurally characterized (…).
Although some repeat regions may arise and be maintained by selection, most appear to have arisen via slipped-strand mispairing, like microsatellite expansion. Our analysis of the serine homopolymers from Table A1 and Table A2 shows evidence for slippage, in contrast to the results found for yeast serine homopolymers (…).
A study in which glutamine, alanine, and glycine repeats were inserted into the loop of a protein showed that the stability and folding rates of the protein were minimally affected (…).
One hypothesis suggests that these repeats allow for protein elongation, followed by functional specialization of the repeat region via mutation (…).
Another argument in support of this hypothesis is that eukaryotes may compensate for longer generation times by using the extra variability afforded by protein repeats to rapidly create novel protein domains (…).
Such speculation about the function of protein repeats still does not explain why repeats are overabundant in the critically important developmental proteins but not in neurological proteins.
| ACKNOWLEDGMENTS |
|---|
We thank two anonymous reviewers for their valuable comments on the manuscript. This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) grant to G.B.G. and an NSERC scholarship to M.A.H.
Manuscript received August 19, 2003; Accepted for publication December 11, 2003.
| APPENDIX 1 |
|---|
| LITERATURE CITED |
|---|
ALBA, M. M., R. A. LASKOWSKI, and J. M. HANCOCK, 2002 Detecting cryptically simple protein sequences using the SIMPLE algorithm. Bioinformatics 18:672-678.
ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHAFFER, J. ZHANG, and Z. ZHANG et al., 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
BANFI, S., A. SERVADIO, M. Y. CHUNG, T. J. KWIATKOWSKI, JR., and A. E. MCCALL et al., 1994 Identification and characterization of the gene causing type 1 spinocerebellar ataxia. Nat. Genet. 7:513-520.[CrossRef][Medline]
BURKE, J. R., M. S. WINGFIELD, K. E. LEWIS, A. D. ROSES, and J. E. LEE et al., 1994 The Haw River syndrome: dentatorubropallidoluysian atrophy (DRPLA) in an African-American family. Nat. Genet. 7:521-524.[CrossRef][Medline]
CARIELLO, L., T. DE CRISTOFARO, L. ZANETTI, T. CUOMO, and L. DI MAIO et al., 1996 Transglutaminase activity is related to CAG repeat length in patients with Huntington's disease. Hum. Genet. 98:633-635.[CrossRef][Medline]
COOPER, G., N. J. BURROUGHS, D. A. RAND, D. C. RUBINSZTEIN, and W. AMOS, 1999 Microsatellite and trinucleotide-repeat evolution: evidence for mutational bias and different rates of evolution in different lineages. Proc. Natl. Acad. Sci. USA 96:11916-11921.
DAVID, G., N. ABBAS, G. STEVANIN, A. DURR, and G. YVERT et al., 1997 Cloning of the SCA7 gene reveals a highly unstable CAG repeat expansion. Nat. Genet. 17:65-70.[CrossRef][Medline]
DUNKER, A. K., C. J. BROWN, J. D. LAWSON, L. M. IAKOUCHEVA, and Z. OBRADOVIC, 2002 Intrinsic disorder and protein function. Biochemistry 41:6573-6582.[CrossRef][Medline]
DUYAO, M., C. AMBROSE, R. MYERS, A. NOVELLETTO, and F. PERSICHETTI et al., 1993 Trinucleotide repeat length instability and age of onset in Huntington's disease. Nat. Genet. 4:387-392.[CrossRef][Medline]
GOLDING, G. B., 1999 Simple sequence is abundant in eukaryotic proteins. Protein Sci. 8:1358-1361.[Medline]
GREEN, H. and N. WANG, 1994 Codon reiteration and the evolution of proteins. Proc. Natl. Acad. Sci. USA 91:4298-4302.
HOOD, D. W., M. E. DEADMAN, M. P. JENNINGS, M. BISERCIC, and R. D. FLEISCHMANN et al., 1996 DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc. Natl. Acad. Sci. USA 93:11121-11125.
HUNTLEY, M. and G. B. GOLDING, 2000 Evolution of simple sequence in proteins. J. Mol. Evol. 51:131-140.[Medline]
HUNTLEY, M. A. and G. B. GOLDING, 2002 Simple sequences are rare in the Protein Data Bank. Proteins 48:134-140.[CrossRef][Medline]
KARLIN, S. and C. BURGE, 1996 Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc. Natl. Acad. Sci. USA 93:1560-1565.
KARLIN, S., L. BROCCHIERI, A. BERGMAN, J. MRAZEK, and A. J. GENTLES, 2002 Amino acid runs in eukaryotic proteomes and disease associations. Proc. Natl. Acad. Sci. USA 99:333-338.
KAWAGUCHI, Y., T. OKAMOTO, M. TANIWAKI, M. AIZAWA, and M. INOUE et al., 1994 CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nat. Genet. 8:221-228.[CrossRef][Medline]
KIEBURTZ, K., M. MACDONALD, C. SHIH, A. FEIGIN, and K. STEINBERG et al., 1994 Trinucleotide repeat length and progression of illness in Huntington's disease. J. Med. Genet. 31:872-874.
KOIDE, R., T. IKEUCHI, O. ONODERA, H. TANAKA, and S. IGARASHI et al., 1994 Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nat. Genet. 6:9-13.[CrossRef][Medline]
KRULL, L., J. WALL, H. ZOBEL, and R. DIMLER, 1965 Synthetic polypeptides containing sidechain amide groups: water insoluble polymers. Biochemistry 4:626-632.[CrossRef][Medline]
LA SPADA, A. R., E. M. WILSON, D. B. LUBAHN, A. E. HARDING, and K. H. FISCHBECK, 1991 Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352:77-79.[CrossRef][Medline]
LADURNER, A. G. and A. R. FERSHT, 1997 Glutamine, alanine or glycine repeats inserted into the loop of a protein have minimal effects on stability and folding rates. J. Mol. Biol. 273:330-337.[CrossRef][Medline]
LI, S. H., M. G. MCINNIS, R. L. MARGOLIS, S. E. ANTONARAKIS, and C. A. ROSS, 1993 Novel triplet repeat containing genes in human brain: cloning, expression, and length polymorphisms. Genomics 16:572-579.[CrossRef][Medline]
LINDQUIST, S., S. KROBITSCH, L. LI, and N. SONDHEIMER, 2001 Investigating protein conformation-based inheritance and disease in yeast. Philos. Trans. R. Soc. Lond. B 356:169-176.
MAR ALBA, M., M. F. SANTIBANEZ-KOREF, and J. M. HANCOCK, 1999 Amino acid reiterations in yeast are overrepresented in particular classes of proteins and show evidence of a slippage-like mutational process. J. Mol. Evol. 49:789-797.[CrossRef][Medline]
MARCOTTE, E. M., M. PELLEGRINI, T. O. YEATES, and D. EISENBERG, 1999 A census of protein repeats. J. Mol. Biol. 293:151-160.[CrossRef][Medline]
MITCHELL, P. J. and R. TJIAN, 1989 Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245:371-378.
MOXON, E. R., P. B. RAINEY, M. A. NOWAK, and R. E. LENSKI, 1994 Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr. Biol. 4:24-33.[CrossRef][Medline]
MYERS, E. W. and W. MILLER, 1988 Optimal alignments in linear-space. Comput. Appl. Biosci. 4:11-17.
NAGAFUCHI, S., H. YANAGISAWA, K. SATO, T. SHIRAYAMA, and E. OHSAKI et al., 1994 Dentatorubral and pallidoluysian atrophy expansion of an unstable CAG trinucleotide on chromosome 12p. Nat. Genet. 6:14-18.[CrossRef][Medline]
NAKAMURA, K., S. Y. JEONG, T. UCHIHARA, M. ANNO, and K. NAGASHIMA et al., 2001 SCA17, a novel autosomal dominant cerebellar ataxia caused by an expanded polyglutamine in TATA-binding protein. Hum. Mol. Genet. 10:1441-1448.
PERUTZ, M. F. and A. H. WINDLE, 2001 Cause of neural death in neurodegenerative diseases attributable to expansion of glutamine repeats. Nature 412:143-144.[CrossRef][Medline]
PERUTZ, M. F., T. JOHNSON, M. SUZUKI, and J. T. FINCH, 1994 Glutamine repeats as polar zippers: their possible role in inherited neurodegenerative diseases. Proc. Natl. Acad. Sci. USA 91:5355-5358.
PERUTZ, M. F., B. J. POPE, D. OWEN, E. E. WANKER, and E. SCHERZINGER, 2002 Aggregation of proteins with expanded glutamine and alanine repeats of the glutamine-rich and asparagine-rich domains of Sup35 and of the amyloid beta-peptide of amyloid plaques. Proc. Natl. Acad. Sci. USA 99:5596-5600.
PULST, S. M., A. NECHIPORUK, T. NECHIPORUK, S. GISPERT, and X. N. CHEN et al., 1996 Moderate expansion of a normally biallelic trinucleotide repeat in spinocerebellar ataxia type 2. Nat. Genet. 14:269-276.
ROHL, C. A., W. FIORI, and R. L. BALDWIN, 1999 Alanine is helix-stabilizing in both template-nucleated and standard peptide helices. Proc. Natl. Acad. Sci. USA 96:3682-3687.
ROMERO, P., Z. OBRADOVIC, and A. K. DUNKER, 1999 Folding minimal sequences: the lower bound for sequence complexity of globular proteins. FEBS Lett. 462:363-367.
ROMERO, P., Z. OBRADOVIC, X. LI, E. C. GARNER, and C. J. BROWN et al., 2001 Sequence complexity of disordered protein. Proteins 42:38-48.
RUBINSZTEIN, D. C., B. AMOS, and G. COOPER, 1999 Microsatellite and trinucleotide-repeat evolution: evidence for mutational bias and different rates of evolution in different lineages. Philos. Trans. R. Soc. Lond. B 354:1095-1099.
SAUNDERS, N. J., A. C. JEFFRIES, J. F. PEDEN, D. W. HOOD, and H. TETTELIN et al., 2000 Repeat-associated phase variable genes in the complete genome sequence of Neisseria meningitidis strain MC58. Mol. Microbiol. 37:207-215.
SILVEIRA, I., C. MIRANDA, L. GUIMARAES, M. C. MOREIRA, and I. ALONSO et al., 2002 Trinucleotide repeats in 202 families with ataxia: a small expanded (CAG)n allele at the SCA17 locus. Arch. Neurol. 59:623-629.
SNELL, R. G., J. C. MACMILLAN, J. P. CHEADLE, I. FENTON, and L. P. LAZAROU et al., 1993 Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington's disease. Nat. Genet. 4:393-397.
STERN, A., M. BROWN, P. NICKEL, and T. F. MEYER, 1986 Opacity genes in Neisseria gonorrhoeae: control of phase and antigenic variation. Cell 47:61-71.
TRIEZENBERG, S. J., 1995 Structure and function of transcriptional activation domains. Curr. Opin. Genet. Dev. 5:190-196.
WHARTON, K. A., B. YEDVOBNICK, V. G. FINNERTY, and S. ARTAVANIS-TSAKONAS, 1985 opa: a novel family of transcribed repeats shared by the Notch locus and other developmentally regulated loci in D. melanogaster. Cell 40:55-62.
WOOTTON, J. C. and S. FEDERHEN, 1993 Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17:149-163.
YANAGISAWA, H., M. BUNDO, T. MIYASHITA, Y. OKAMURA-OHO, and K. TADOKORO et al., 2000 Protein binding of a DRPLA family through arginine-glutamic acid dipeptide repeats is enhanced by extended polyglutamine. Hum. Mol. Genet. 9:1433-1442.
ZHUCHENKO, O., J. BAILEY, P. BONNEN, T. ASHIZAWA, and D. W. STOCKTON et al., 1997 Autosomal dominant cerebellar ataxia (SCA6) associated with small polyglutamine expansions in the alpha 1A-voltage-dependent calcium channel. Nat. Genet. 15:62-69.
This article has been cited by other articles:
M. A. Huntley and A. G. Clark, Evolutionary Analysis of Amino Acid Repeats across the Genomes of 12 Drosophila Species. Mol. Biol. Evol., December 1, 2007; 24(12): 2598-2609.
M. A. Huntley and G. B. Golding, Selection and Slippage Creating Serine Homopolymers. Mol. Biol. Evol., November 1, 2006; 23(11): 2017-2025.