The idea that some genetic factors are able to move around chromosomes emerged more than 60 years ago when Barbara McClintock first suggested that such elements existed and had a major role in controlling gene expression and that they also have had a major influence in reshaping genomes in evolution. It was many years, however, before the accumulation of data and theories showed that this latter revolutionary idea was correct although, understandably, it fell far short of our present view of the significant influence of what are now known as “transposable elements” in evolution. In this article, I summarize the main events that influenced my thinking about transposable elements as a young scientist and the influence and role of these specific genomic elements in evolution over subsequent years. Today, we recognize that the findings about genomic changes affected by transposable elements have considerably altered our view of the ways in which genomes evolve and work.
Anecdotal, Historical and Critical Commentaries on Genetics
SINCE the radical suggestion by Barbara McClintock in the 1950s, based on her extensive genetic analyses in maize, that some genes might move along chromosomes, our knowledge of transposable elements (TEs) has vastly increased. TEs are no longer seen as “junk” and “selfish” pieces of DNA—the predominant view from the 1960s through the 1990s—but as major components of genomes that have played a significant role in evolution, an idea also first proposed by McClintock (1984: her Nobel Prize lecture). The history of these genomic elements provides one of the best examples of how scientific concepts in biology emerge and then evolve into new concepts. It is a salutary lesson for researchers, both young and old, to be tolerant of striking new ideas when they appear and not to dismiss them simply because they conflict with current theories and knowledge. History is easier to relate to when you have been a direct observer, and in this article I summarize the main events witnessed in my own scientific lifetime that paved the way to the present day understanding of the structure and composition of genomes. I make no attempt to present a complete or balanced historical account of what happened; instead, I describe the events and discoveries that influenced the thinking of a young scientist, not just despite, but because of, their very strangeness and incompatibility with what was then received opinion.
TRANSPOSABLE ELEMENTS AS COMPONENTS OF GENETIC DIVERSITY
In the 1970s, the field of population genetics was dominated by analysis of the genetic polymorphism of populations using allozymes, with the aim of deciphering population structuring (Lewontin 1974). The entire emphasis was on the role of point mutations in coding regions as the primary source of evolutionary change. Despite the observations of McClintock (1950) that in maize some genetic factors [e.g., the Activator (Ac)/Dissociator (Ds)] that can control the cell color of kernels were able to change their locations within and between chromosomes and could control the expression of some genes [see Fedoroff (1994) for a biography of B. McClintock], and the demonstrated presence of mobile DNA elements in bacteria (Shapiro 1969), the possibility that TEs could influence genetic polymorphism, and therefore genetic diversity, was, for the most part, ignored. Some researchers did not even believe that these unconventional DNA insertions actually moved or moved with significant frequency and so thought that they could not possibly contribute to genetic diversity. This was because they always found the same TE insertion in the particular mutant in which they were interested. Such a result was, however, to be expected, given the screening process that scientists were using to isolate their mutant.
However, the 1970s saw the emergence in Drosophila research of the hybrid dysgenesis phenomenon, in which crosses between specific lines of Drosophila melanogaster led to various genetic changes, including sterility and increased mutation and recombination rates. It was some years before these effects were finally shown to be associated with the mobilization of specific TEs: P elements (for the P/M system) and I elements (for the I/R system) (see Picard et al. 1978; Kidwell 1979; Rubin et al. 1982; Engel 1988). P elements were later shown to be DNA transposons (TEs that transpose via a DNA intermediary) and I elements to be non-long terminal repeat (LTR) retrotransposons [TEs that have no long terminal repeat at their extremities and that transpose via an RNA intermediary (Finnegan 1992)]. Yet even this simple division does not do justice to the known complexity of the entire set of TEs, as revealed over the past 30 years. Some of these new TEs, for example, have a composite structure and appear to be major components of some genomes. The LTR retrotransposons, which have a long terminal repeat at their extremities, have been divided into various subfamilies: Ty1-copia-like (Pseudoviridae), Ty3-gypsy-like (Metaviridae), and Pao-BEL-like, depending on their sequence similarity and the order of the gene products that they encode. Because Barbara McClintock's Ac elements in maize and the P elements in Drosophila, which both transpose via a DNA intermediary, were initially named “transposons,” it became customary to use this term as a generic term for all kinds of transposable elements, regardless of whether their mechanism of transposition involved DNA or RNA. To avoid confusion while still taking historical usage into account, it is now usual to use the term “DNA transposons” for the DNA-based TEs and the term “retrotransposons” (either with or without a long terminal repeat at their extremities) for the RNA-based TEs. 
To bring some order to the nomenclature in the field, Wicker et al. (2007) and Kapitonov and Jurka (2008) have proposed a new TE classification based on transposition mechanism, sequence similarities, and structural relationships so as to include the new TE classes that have emerged.
The discovery that P and I elements were actually TEs was decisive in convincing us that TEs were of interest, not only because we realized that they were able to move within the genome, but also because the P element had been shown to have invaded all known populations of D. melanogaster worldwide within the space of ∼50 years (Anxolabéhère et al. 1988), after having been transferred into this species from Drosophila willistoni (Daniels et al. 1990). Fifty years is a very short period in terms of evolutionary time, but a manageable interval for population geneticists. The apparently deleterious effects of the mobilization of the P and I elements as a result of crosses, however, confirmed the idea that TEs are “selfish” (i.e., that they produce only detrimental effects on organisms), an idea developed in 1980 by Doolittle and Sapienza and generally accepted for many years. Were this simple idea true, the evolutionary significance of TEs would be slight. But during the 1980s, new findings were already showing the picture to be more complex.
In particular, various TEs were discovered by molecular biologists who were interested either in the composition of the genomes or in the sequences of mutant alleles. Several Russian teams were in the vanguard of this research, and they referred to the first TEs of Drosophila as “mobile dispersed genes” (mdg), a term that everybody could understand. mdg1, mdg2, mdg3, and mdg4 were the first to be discovered and described (Ananiev and Ilyin 1981; Kulguskin et al. 1981; Gvozdev et al. 1981). This numerical nomenclature was soon superseded by more colorful names such as gypsy (mdg4), copia, and 412. Since then, many other names have emerged from the imaginative minds of scientists in a seemingly unending flow and, in a way, it is a pity that these imaginative names will finally have to be replaced by a more austere but accurate and much-needed systematic nomenclature reform, as noted above (Finnegan 1992; Wicker et al. 2007; Kapitonov and Jurka 2008).
The next phase in the exploration of these elements was to estimate their number of copies by Southern blots and to localize their chromosomal insertions precisely, as could be visualized on the polytene chromosomes of Drosophila, using in situ hybridization. Initially, such localization was carried out with radioactive labels and later with the more powerful biochemical labeling technique based on avidin-streptavidin and dyes. At the end of the 1970s, when I was a postdoc in the United States, I first came into contact with the in situ hybridization technique and some “strange things” about TEs. In particular, I remember being shown some pictures of Drosophila salivary gland polytene chromosomes taken by a Ph.D. student, which showed strong labeling in a specific band in one chromosome but no labeling on its homolog (the two chromosomes had been separated on the squash, as sometimes happens). This young researcher, using a probe of a specific TE, showed me these pictures surreptitiously because it was difficult at that time to understand what this was all about. Furthermore, the in situ hybridization technique led to estimations of the number of copies of various TEs and detection of the polymorphism of their insertion in different strains and populations. Such polymorphism provided direct evidence of the continuing mobility of these elements.
Most of these studies were done on D. melanogaster, which has TE insertions at low frequency at a large but limited number of sites (usually fewer than 100 in this species). One surprising result was the observation of transposition bursts of TEs such as Doc (a non-LTR retrotransposon; Gerasimova et al. 1990), copia (a retrotransposon; Biémont et al. 1987; Gerasimova et al. 1990; Pasyukova and Nuzhdin 1993), and P (a DNA transposon; Biémont et al. 1990) while inbred lines were being maintained in the laboratory. Although we did not know whether this mobilization was the result of the inbreeding itself, or simply “spontaneous” transposition revealed by the homozygosity of the lines, which made it easier to observe new insertions, we were forced to conclude that inbred lines were not as homozygous as expected, at least with regard to TE insertions. This opened the way for the further observation that TE movement and the resulting polymorphism could influence quantitative traits during artificial selection (Shrimpton et al. 1990). While empirical data accumulated about various strains and populations (Yamaguchi et al. 1987; Charlesworth et al. 1992; Biémont et al. 1994), theoretical approaches and simulations, whose parameters included transposition and excision rates, selection against the insertion of TEs, and effective population size, furnished useful models whose theoretical expectations could be confronted with the experimental data (Charlesworth and Charlesworth 1983).
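As a purely illustrative aside, the kind of reasoning behind such models can be sketched with a toy deterministic recursion for the mean TE copy number per individual. This is not the model of Charlesworth and Charlesworth (1983): the fitness function w(n) = exp(-s n²/2), the Poisson assumption (variance of copy number equal to its mean), the parameter values, and the function name are all assumptions made for the example, and genetic drift (and hence effective population size) is deliberately left out.

```python
def mean_copy_number_trajectory(n0, u, v, s, generations):
    """Iterate a toy recursion for the mean TE copy number per individual.

    Assumptions (illustrative only, not taken from the article):
      - u: transposition rate per copy per generation
      - v: excision rate per copy per generation
      - fitness w(n) = exp(-s * n**2 / 2), so d(ln w)/dn = -s * n
      - copy number is Poisson distributed, so its variance equals its mean
      - no genetic drift (infinite population)
    Under these assumptions the per-generation change is
      delta_n = n * (u - v) - s * n**2
    """
    n = float(n0)
    trajectory = [n]
    for _ in range(generations):
        n = n + n * (u - v) - s * n * n
        n = max(n, 0.0)  # a mean copy number cannot be negative
        trajectory.append(n)
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to make the behaviour visible.
    traj = mean_copy_number_trajectory(n0=1.0, u=0.01, v=0.001, s=0.0003,
                                       generations=5000)
    for generation in (0, 100, 500, 1000, 5000):
        print(generation, round(traj[generation], 3))
```

With these assumed values, the mean copy number rises from a single copy and settles near (u - v)/s = 30, the point at which net transposition is balanced by selection against insertions.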
These population-based models now look imperfect and oversimplified, mainly because they analyzed TE dynamics in terms of asymptotic values, with little understanding of the dynamics of the variation in TE copy number over long stretches of time. Nevertheless, at the time, these models were very useful in forcing experimental scientists to reanalyze their data, to estimate and reconsider the parameters that they were using in the models, and to invent new protocols. It is striking to note, however, that the precise mechanisms underlying the action of natural selection against TE insertions, which involve either gross chromosomal rearrangements caused by unequal recombination between TE copies [the ectopic exchange model of Langley et al. (1988)] or the slightly deleterious effects of TE insertions that reduce host fitness, are still topics of considerable debate (Charlesworth et al. 1997; Biémont et al. 1997), with most data now coming from plants (Tian et al. 2009; Lockton and Gaut 2010) and humans (Song and Boissinot 2006), in which the reproductive system (Hickey 1982; Dolgin et al. 2008), demography (Lockton et al. 2008), and population size (Lynch and Conery 2003) seem to play major roles. In the present context, it is interesting to remember some discussions in which the possibility of the presence or influence of TEs in organisms “higher” than bacteria or Drosophila, such as humans or plants, was dismissed out of hand. However, DNA-reassociation studies had already shown us long before that genomes of many organisms, including plants, do in fact contain many repeated sequences (Flavell et al. 1974; Baldari and Amaldi 1976; Crain et al. 1976; Venturini et al. 1987). The connection between such sequences and TEs would not be demonstrated for many years and, as a result, we found it hard to accept that the human genome is in fact full of TEs and other repeated sequences, a point that was definitively admitted only when the human genome had been fully sequenced. 
Plant genomes were also initially assumed to be free of retrotransposons until these elements were actually looked for, and we now know that the genomes of some plants contain large numbers of TEs of this kind (Voytas and Ausubel 1988) (see Figure 1 for the proportion of TEs in various organisms). Most genomes appear to contain a mixture of TEs, some of which are still active while others are ancient relics that have degenerated and are sometimes no longer recognizable as TEs. Because we can now detect the presence of ancient TE copies or decayed sequences that are the hallmarks of TEs, the proportion of the genome now known to have originated from TEs is increasing.
One interesting debate that began in earnest in the 1990s concerned the relationship between retroviruses and retrotransposable elements. Some people thought that TEs arose from retroviruses, but made no attempt to explain where the retroviruses had come from, a view that, like the hypothesis of the extraterrestrial origin of life, is not per se unlikely, but in the end only shifts the question elsewhere. Although it was easier to accept that retroviruses and the members of the LTR retrotransposon family had common ancestors, and that retroviruses arose from TEs by the addition of an envelope gene, even these ideas were not very readily accepted (Varmus 1988). This may have been because TEs did not have the status that viruses had then and because it was easier to recognize and admit the existence of viruses, with which we were used to coexisting, than to accept the idea of the presence of thousands of copies of junk DNA actually inside our human genome. The discovery that some TEs were capable of producing virus-like particles similar to those of retroviruses (Mossie et al. 1985; Miyake et al. 1987), and the similarity of their reverse transcriptase-like sequences (Xiong and Eickbush 1988), clearly revealed the link between retrotransposons and retroviruses (see Varmus 1988). It was also at this time that infectious retroviruses were discovered in Drosophila (Kim et al. 1994; Song et al. 1994), making this species once again a good model for the analysis of the dynamics of such elements within genomes and populations and between species. Such infectious retroviruses were subsequently also identified in plants (Wright and Voytas 1998). Because infectious retroviruses can be transmitted between organisms, this greatly broadened the notion of the possible horizontal transfer of TEs, one example of which had been unambiguously established for the transfer of the P DNA transposon between two species of Drosophila.
One important clue discovered toward the end of the 1980s was the observation that some TEs contain internal sequences similar to those found in murine mammary tumor virus LTRs, which are known to play a role in the regulation of the provirus by steroid hormones (Peronnet et al. 1986; Ziarczyk and Best-Belpomme 1991), and element 412 was found to contain a 20-OH ecdysone-responsive repetitive sequence (Micard et al. 1988). In addition to such hormone-sensitive sequences, the presence of sequences homologous to heat-shock consensus sequences suggested the possibility that TEs could be sensitive to their environment. It is surprising that such observations have not been investigated further, because hormones are known to regulate various genes during development, and their action on TEs could help to include TEs in gene network regulatory systems (see below).
It was during the 1990s that retroviruses and TEs began to be used as vectors to transfer genes within or between species and therefore became important tools in genetic engineering. While many experiments used retroviruses to insert new genes, especially in higher organisms, many experiments also tried to use LTR retrotransposons and DNA transposons with the aim of obtaining a powerful universal vector. We are still far from achieving this objective, although recent discoveries look promising (Grabundzija et al. 2010).
TRANSPOSABLE ELEMENTS AS PLAYERS IN EVOLUTION
Except for their use as potential genetic tools, the interest in molecular analyses of TEs faded between 1990 and 2000, and the population approach was not really understood. Researchers were interested in the precise mechanisms by which TE activity and copy number are regulated, rather than in global processes such as those involved at the level of populations. Some forces do indeed select against TE insertions (due to the deleterious impact of insertions or of their effects through ectopic recombination), and drift, resulting from the small effective population size of the host, may be at work even for TEs that are strongly regulated at the molecular level. However, there was a great renewal of interest when sequenced genomes of organisms such as humans, mice, and plants became available. The large number of TEs present in these organisms forced us to finally reconsider the assumption that the TEs were purely selfish and to envisage instead that they (or some of their insertions) may have evolved toward genomic functions (they are said to have been “domesticated”) (McDonald 1983). This idea of a possible genomic function of some TE insertions was initially rejected, mostly because no fixed TE insertions had been identified in chromosomes. However, this was mainly due to the in situ hybridization technique used, which tended to detect large insertions rather than smaller ones, such as the solo LTRs of retrotransposons that were later shown to be associated with gene regulation.
The domestication of specific TEs is now accepted in several cases: the RAG genes of the immunoglobulin system, which are derived from DNA transposons and act like a transposase in the V(D)J recombination system (Kapitonov and Jurka 2005); the use of the TART and HeT elements as protectors of telomeres in Drosophila (Pardue and Debaryshe 2003), instead of the classical sequences usually found in many organisms; and the acquisition of novel cellular functions by recruitment of TE-derived coding sections (Miller et al. 1999). In addition to these cases, the TEs were also seen as responsible for gene regulation.
On the basis of many recent findings, we now consider that TEs have considerably shaped the structure, function, and evolution of the genomes and that the regulatory sequences that they possess can interfere with the networks of regulation of many genes, even of genes located at some distance from them (Feschotte 2008; Herpin et al. 2010). TEs must therefore be considered to be integrated components of the genomes, which have played a major role in evolution. It has been proposed that it might be more accurate to view the genome as an ecosystem in which the TE families and subfamilies correspond to the “species” (Brookfield 2005; Mauricio 2005; Le Rouzic et al. 2007; Venner et al. 2009). These ideas suggest that the term “controlling element,” initially proposed by B. McClintock, was appropriate, even if not all TE insertions have this “gene-controlling” capacity. In the 1980s, I read in a book that B. McClintock had been right except in her suggestion that TEs could act as “controlling elements.” This illustrates the difficulty of changing one's views, perhaps especially when our own genome is concerned. The idea that TEs could control some genes appeared more reasonable when some insertions of even a part of their sequence, such as an LTR that possesses promoters, were found in permanent positions with respect to specific genes. However, this does not eliminate the possibility that some more recent TE insertion sites may also be involved in gene network regulation and in species adaptation (González et al. 2010). The influence that TEs have had and are still having on genomes is not diminished even if the effects of some of their insertions are neutral or detrimental and contribute to the inbreeding genetic load or display selfish behavior.
It is also gratifying to realize that the existence of TEs accounts for most of the early reports of “spontaneous” mutations observed in some natural populations or laboratory lines, which suggested the presence of “mutator genes or mutator systems” (Demerec 1927; Duseeva 1948; Tinyakov 1939; Green 1973).
The mutational capacity of the TEs, their power to regulate genetic systems, and their sensitivity to environmental stress that has been shown to mobilize them, soon led to the idea that the TEs not only could generate genetic polymorphism favoring population adaptation, but also could promote speciation (Flavell 1982; Georgiev et al. 1983; McDonald 1983; Syvanen 1984). Because the effects of TEs were seen at that time as being predominantly deleterious, some authors concluded that TEs would have only the effect of increasing the genetic load, rather than leading to speciation (Krieber and Rose 1986). We now have considerable evidence showing the impact of the TEs throughout evolution (Biémont and Vieira 2006; Feschotte and Pritham 2007), possibly through chromosomal rearrangements (King 1995; see Zhang et al. 2009 in maize), but the possibility that TEs may have facilitated or promoted speciation is still the subject of considerable debate (Fontdevila 2005; Rebollo et al. 2010). This is because it is difficult to find out whether a change in TE content or activity during a specific evolutionary period is the cause or the consequence of the speciation process.
In particular, we are not yet able to link TE characteristics, such as a high copy number or high transposition or transcription activity, to the ability of populations or species to adapt better to new environmental conditions. Is there any relationship between evolutionary radiation and the composition of the genomes in terms of TEs and other repetitive sequences? Does Bombyx mori, with its 45% of TEs (International Silkworm Genome Consortium 2008), actually do any better, or have stronger “evolvability,” than Apis mellifera (Honeybee Genome Sequencing Consortium 2006) with its puny 1% of TEs? And what is the biological significance when a plant or an amphibian genome is composed of >70% TEs? Does the increase in the TE amount seen in some invasive species result from confronting fresh environmental conditions (Vieira et al. 1999)? Do the genomes really need TEs? More comparative analyses of genomes of various organisms and various populations of the same species will be required before we can answer these questions. This will be facilitated by improvements in sequencing techniques that will allow the low-cost processing of data from many individuals and many species. This growing body of genomic sequence data will provide precise information about the exact localization and precise DNA sequence of the copies of all kinds of TE families in many individuals, populations, and species in various ecological and physiological environments, allowing the history of genome invasion to be inferred and the impact of the environment to be evaluated. We hope, however, that the sequences of the TEs and other repeated sequences will also be taken into consideration in these studies and that they will not be ignored, as has happened all too often because their annotation is difficult and time-consuming.
These large-scale DNA sequence analyses should make it possible to identify new TE families, thus helping to explain why only some TE families have invaded genomes efficiently, and why some TE families have invaded certain genomes but not others. In addition, there should be real progress in the analysis of epigenetic marks, which have been known since the 1980s to interfere with TE activity (Sobieski and Eden 1981). It is now clear that DNA methylation of cytosines, histone modifications, and RNA interference, all interdependent mechanisms associated with chromatin conformation, can switch TEs on or off. These processes must therefore play a role not only in defending the genome against invasion by TEs and retroviruses, but also in the complex interactions involved in gene regulation throughout development (Huda et al. 2010), as reported for neuronal development, as well as in developmental processes in mouse oocytes and preimplantation embryos (Muotri et al. 2007; Sasaki et al. 2008). The possibility remains, however, that the presence of the TEs in many eukaryote genomes is also the result of their having been selected as components of (inactive) heterochromatin because of their importance in chromosomal biology and cell division (Biémont 2009). We must not forget the role of TEs in the spread of antibiotic resistance genes in bacterial populations, which can have a dramatic impact on human health, and their role via their epigenetic activation-inactivation in the effects of the early environment (nutrition, ultraviolet light, temperature) to adulthood (Bollati and Baccarelli 2010), making them significant players in the interaction between genotype and phenotype.
New models of TE copy-number dynamics will be developed on the basis of the newly described epigenetic regulatory mechanisms that involve RNA interference (Lu and Clark 2010). While such models will be more precise and will have to be applied to each specific TE family, they will not entirely replace the more conventional models based on the transposition-excision rate, selection coefficient, and effective population size. These models still have their utility at the level of populations, especially because population structuring may have a greater impact on TE dynamics than previously imagined (Deceliere et al. 2005). The numbers of TE sequences to be analyzed will be large, and larger than those obtained from in situ hybridizations and Southern blots (Strobel et al. 1979; Charlesworth et al. 1992; Biémont et al. 1994), but we must be aware that while new questions will emerge from these data, the old questions, such as the analysis and modeling of the dynamics of TE families, will still depend on an appropriate definition of what a TE family is. Because a TE family is defined as a “set of phylogenetically close copies that share >80% sequence identity” (Wicker et al. 2007), the new data will not be very different from the data obtained by in situ hybridization, which specifically detected and identified sequences on the basis of such similarity criteria (Charlesworth et al. 1992; Biémont et al. 1994).
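To make the identity criterion quoted above concrete, here is a minimal sketch of how two TE copies might be compared. It assumes that the two sequences have already been aligned to the same length (with "-" marking gaps) and applies only the >80% identity threshold; the function names, the handling of gaps, and the example sequences are hypothetical, and any further criteria used in a full classification scheme are not modeled.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Alignment columns in which either sequence has a gap ("-") are skipped;
    this gap handling is an assumption of the sketch, not a standard.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_family(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True if the two copies share strictly more than `threshold` % identity."""
    return percent_identity(seq_a, seq_b) > threshold


if __name__ == "__main__":
    # Made-up toy sequences, far shorter than any real TE copy.
    reference = "ACGTACGTAC"
    close_copy = "ACGTACGTAT"    # 9 of 10 positions identical
    distant_copy = "ACGTACGTTT"  # 8 of 10 positions identical
    print(same_family(reference, close_copy))
    print(same_family(reference, distant_copy))
```

In this toy case the copy with 90% identity is grouped with the reference, whereas the copy with exactly 80% identity is not, because the criterion as quoted requires more than 80%.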
As the result of the ongoing sequencing of the genomes of a wide variety of organisms, we will be faced with the need to analyze huge amounts of data, and will enter a cycle of complexity in which the genome will be considered in its entirety. TEs and all other repeated sequences of the genome could play a role in deciphering this complexity (Shapiro 2010). While we will inevitably tend to concentrate on the human genome and on the genomes of organisms of industrial or agronomic interest, we should not overlook the historical model organisms, such as Drosophila, yeast, nematodes, and Arabidopsis, because the knowledge that we have accumulated about them over decades of study constitutes a mine of information that will enable us to develop new ideas and approaches to understanding our own genome.
TEs were initially considered to be just junk DNA, with no influence on the genes and endowed solely with the capacity to invade genomes and populations because of their ability to transpose. They began to acquire their more exalted status once we recognized their influence on recombination rates and chromosomal rearrangements, their role as mutators and gene regulators, and their ability to be domesticated by the genome, which transforms them into new genes (Volff 2006; Sinzelle et al. 2009). Thirty years ago we began to suspect that these repeated sequences could possibly have some influence on genetic diversity, but it was beyond even the most fertile minds to imagine all their properties and their huge impact on genome composition and genome functioning. So far molecular biologists have concentrated mainly on the 1–2% of the genome composed of protein-coding genes; we now have to incorporate all the sequences that surround these genes if we are to have an overall view of all the forces that allow genomes to define an organism in interaction with the environment. In addition to incorporating TEs into genetic networks (Ramos et al. 2007; Feschotte 2008), we will have to understand how some repeated sequence–gene interactions can be “redundant,” thus enabling genomes to cope with huge changes in their structure and composition. We will then have to consider all the repeated and nonrepeated sequences. Some sequences will be found to play major roles while others will undoubtedly be found to be only selfish. Such research is likely to have a major impact in the field of cancer, some of the phenomenology of which involves both epigenetic processes and TE reactivation (Serafino et al. 2009; Lamprecht et al. 2010; Shalgi et al. 2010; Wilkins 2010). TEs and all the other repeated sequences might, once again, have some major new surprises in store for us.
It is very satisfying for a population geneticist to see that whereas at the end of the 1970s we wondered whether TEs really were involved in the generation of biologically significant genetic variability, and whether there was any point in analyzing them in natural populations, today the study of variation at the genome level has settled the question. This is due to the discovery of numerous SNPs in the genomes of different individuals and of the great variation in epigenetic marks observed between individuals, tissues, and even cells (Biémont 2010; Johnson and Tricker 2010; Melcer and Meshorer 2010; Skipper et al. 2010). The history of transposable elements is a good example of how science works and of how new concepts can be progressively incorporated and evolved until, in the end, they entirely transform our way of looking at things. In the 1950s, Barbara McClintock's ideas that some DNA sequences were able to move between different sites in the chromosomes, and that some were involved in the control of gene expression, were difficult to understand and accept, but she has been fully vindicated. Her thinking represented a major step forward in our understanding of how genomes work. Although we are still far from a clear or complete vision of gene network interactions and regulation, major progress is being made in the analysis of complex systems. In this research, TEs and epigenetics and epigenomics can be expected to play a major role, a far larger one than could have been anticipated even 10 years ago.
I thank my colleagues E. Lerat and C. Vieira for their helpful comments, A. S. Wilkins for his useful suggestions that greatly improved the text, and Monika Ghosh for her invaluable help with the English language during all these years.
Copyright © 2010 by the Genetics Society of America