Part 2: CONSERVED GENES, WHOLE GENOME DUPLICATION, INTRONS, EXONS, TRANSPOSONS, REGULATORY GENES, SILENT GENES AND RETRO-VIRUSES
Rhawn Joseph, Ph.D.
1. THE ORIGIN OF EARTHLY LIFE
Life had taken root and repeatedly arrived on this planet between 3.8 and 4.2 BYA, a time period during which Earth was undergoing continual pummeling from the remnants and debris produced by the exploding parent star and its planetary system. Although microfossils resembling yeast cells and fungi were discovered in 3.8 BY old quartz (Pflug 1978), the nature of the first and earliest Earthlings can only be inferred indirectly based on the residue of photosynthesis, oxygen secretion, carbon isotopes, the structure of banded iron formations and high concentrations of carbon 12, or “light carbon”, all of which are typically associated with microbial life (Manning et al. 2006; Mojzsis et al. 1996; Nemchin et al. 2008; O'Neil et al. 2008; Rosing, 1999; Rosing and Frei, 2004; Schoenberg et al. 2002).
However, if simple eukaryotes such as fungi and yeast cells had arrived on Earth by 3.8 BYA, then we can certainly assume that those sojourners from the stars who had arrived hundreds of millions of years earlier included bacteria, archaea, and viruses--and this has been demonstrated by geo-physical and biochemical analysis (Manning et al. 2006; Mojzsis et al. 1996; Nemchin et al. 2008; O'Neil et al. 2008; Rosing, 1999; Rosing and Frei, 2004; Schoenberg et al. 2002).
2. THE FIRST EUKARYOTES
It is generally assumed, based on genomic analysis, that the first Earthly unicellular eukaryotes were fashioned when genes from archaea and bacteria combined, thereby inducing eukaryogenesis and giving rise to the eukaryote genome. These genes subsequently underwent repeated single gene and whole genome duplications, perhaps in response to regulatory signals or environmental triggers, and unicellular eukaryotes became multicellular and then increasingly complex and intelligent.
However, the possibility that the first eukaryotes also arrived on Earth contained in jettisoned planetary debris and ejecta from the shattered remnants of the parent star's solar system cannot be ruled out. Many species of bacteria form spores (Marquis and Shin 2006) and some survive in a state of suspended animation for hundreds of millions of years (Satterfield et al. 2005; Vreeland et al. 2000). Simple eukaryotic organisms, including yeast and fungi (Botts et al., 2009), also produce spores, often for reproductive purposes, but also in response to adverse, life-threatening conditions. Therefore, it is possible that some simple eukaryotic organisms, and their descendants, along with trillions of other microbes, may have survived the destruction of the parent star system only to be hurled upon the newly forming Earth hundreds of millions of years later. This would account for the presence of microfossils resembling yeast cells and fungi, discovered in 3.8 BY old quartz (Pflug 1978).
Once on Earth, these simplified eukaryotes may have phagocytosed archaea and bacteria (Kurland et al., 2006; Poole and Penny, 2007) and incorporated their genes, or were infiltrated by parasitic prokaryotes which donated genes to the eukaryotic genome.
Woese (2004) has proposed that these initial bacteria, archaea and eukaryotes may have lived together and repeatedly swapped and shared genes. "Eventually this collection of eclectic and changeable cells coalesced into the three basic domains known today. These domains become recognisable because much (though by no means all) of the gene transfer that occurs these days goes on within domains" (Woese, 2004).
A second possibility is that the first Earthly eukaryotic cells were created by the genetic fusion of bacteria and archaea, and possibly the injection of viral genes. Thus, hundreds of millions of years after arriving on Earth, archaea, bacteria, and a virus may have joined together, combining their genomes, and in so doing, created the first eukaryotes, which, nearly 4 billion years later, would give rise to humans.
3. ARCHAEA VS BACTERIA: GENE TRANSFER
Numerous species of bacteria act as endosymbionts or endoparasites (Dyall et al., 2004; Poole and Penny 2007). Viruses are parasitic by nature. Archaea do not generally serve in this capacity--though there are exceptions. Bacteria, of course, are not uniform and there may be innumerable species (Nakabachi et al., 2006; Ranea et al., 2005; Schulz and Jorgensen 2001; Schneiker et al., 2007).
Considered in the broadest terms, archaea are highly distinct from bacteria, particularly in regard to the size of their genomes and cell membranes. For example, archaeal membranes are made of ether lipids whereas bacterial cell membranes are created from phosphoglycerides with ester bonds (De Rosa et al., 1986). Like bacteria, archaea can live in the most extreme environments (Kimura et al., 2006, 2007; Leininger et al., 2006; Robertson et al., 2005). However, whereas bacteria are usually (but not always, e.g. Leininger et al., 2006) the most common form of life in the soil, archaea are the most common form of life in the ocean, dominating ecosystems below 150 m in depth (Karner et al., 2001; Robertson et al., 2005).
The genomes of archaea are rather uniform and compact in size, ranging from 0.5 Mb in the parasite Nanoarchaeum equitans (Waters et al., 2003) to 5.5 Mb in Methanosarcina barkeri (Maeder et al., 2006).
Bacterial genomes can vary by two orders of magnitude, from 180 kb in an intracellular symbiont, Carsonella ruddii (Nakabachi et al., 2006), to 13 Mb in Sorangium cellulosum, which dwells in soil (Schneiker et al., 2007). Although there are bacterial genomes of intermediate size, the vast majority of bacteria so far sequenced show a clear-cut bimodal distribution of genomes, i.e. large vs small, suggesting the existence of two distinct classes of bacteria: those with "small" genomes (Ranea et al., 2005) with the highest peak at 2 Mb and those with "large" genomes at about 5 Mb (Schulz and Jorgensen 2001).
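The size ranges quoted above can be checked with a few lines of arithmetic. The following is only a minimal sketch: it assumes 1 kb = 1,000 bp and 1 Mb = 1,000,000 bp, and the dictionary and function names are illustrative labels rather than anything taken from the cited studies.

```python
import math

# Genome sizes in base pairs, as quoted in the text
# (assumption: 1 kb = 1,000 bp and 1 Mb = 1,000,000 bp).
GENOME_SIZES_BP = {
    "Carsonella ruddii": 180_000,         # 180 kb, bacterial symbiont
    "Sorangium cellulosum": 13_000_000,   # 13 Mb, soil bacterium
    "Nanoarchaeum equitans": 500_000,     # 0.5 Mb, archaeal parasite
    "Methanosarcina barkeri": 5_500_000,  # 5.5 Mb, archaeon
}


def orders_of_magnitude(smallest_bp, largest_bp):
    """Return log10 of the ratio between the largest and smallest genome."""
    return math.log10(largest_bp / smallest_bp)


bacterial_span = orders_of_magnitude(
    GENOME_SIZES_BP["Carsonella ruddii"],
    GENOME_SIZES_BP["Sorangium cellulosum"],
)
archaeal_span = orders_of_magnitude(
    GENOME_SIZES_BP["Nanoarchaeum equitans"],
    GENOME_SIZES_BP["Methanosarcina barkeri"],
)

print(f"bacterial span: {bacterial_span:.2f} orders of magnitude")
print(f"archaeal span:  {archaeal_span:.2f} orders of magnitude")
```

On these figures the bacterial range comes to a little under two orders of magnitude (a roughly 72-fold difference), while the archaeal range is close to one order of magnitude (an 11-fold difference), which is consistent with the contrast drawn in the text.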
By contrast, eukaryotic genomes range wildly in size and are generally several orders of magnitude larger than those of prokaryotes. However, the genomes of some eukaryotic species, such as the microsporidian Encephalitozoon cuniculi (Katinka et al., 2001), are substantially smaller than many bacterial and archaeal genomes. Encephalitozoon cuniculi is also a parasite and may serve as a genetic messenger.
Likewise, those bacteria and archaea with the smallest genomes share a significant behavioral feature with Encephalitozoon cuniculi: they too are parasites and they prey upon other prokaryotes as well as eukaryotes (Waters et al., 2003; Huber et al., 2002). It is these parasitic behaviors which may explain their small genomes, and the presence of prokaryotic genes in the eukaryotic genome. These prokaryotes may have donated their genes to a eukaryotic host billions of years ago. Once donated, many of these genes were not replaced.
For example, prokaryotes with the smallest genomes, i.e. parasitic and symbiotic bacteria and archaeal parasites (e.g., N. equitans), no longer encode or express a variety of protein regulators, indicating the responsible genes have been transferred to the genome of the eukaryotic host. With the donation of these regulatory genes, the genomes of these parasitic and symbiotic prokaryotes decreased in size. However, in addition to donating genes, many species of parasitic bacteria/archaea may have taken up residence inside a eukaryotic host, after which they continued to transfer and donate genes (Dyall et al., 2004; Margulis et al., 1997).
Hundreds of specialized prokaryotic genes have been donated to the genomes of their hosts, possibly by horizontal gene transfer (Yutin et al., 2008), and were then preserved, unchanged, often in the same position even after hundreds of millions and, perhaps, even after billions of years of evolution. Some of these donated genes, or the complete engulfment of a bacterial parasite by eukaryotes, appear to be responsible for the metamorphosis of mitochondria, which also donated genes to the eukaryote genome (Margulis et al., 1997). These prokaryotic genes and bacterial/archaeal symbionts enabled eukaryotes to become increasingly complex and to colonize and conquer new environments which were being genetically engineered by prokaryotic genes.
The activity of photosynthesizing organisms and prokaryotic genes altered the environment via the liberation, secretion, and synthesis of a variety of chemicals and enzymes including oxygen (Buick 1992, 2008; Falkowski and Godfrey 2008; Holland 2006; Olson 2006; Williams and Fraústo da Silva 2006). The changed environment acted on gene selection, activating genes contributed by bacteria and archaea, giving rise to new traits and new species perfectly adapted for a world that had been prepared for them.
4. CONSERVED GENES & GENE EXPRESSION
Thousands of orthologous genes and hundreds of conserved genes can be traced back to the last common ancestor for eukaryotes (Snel et al., 2002; Mirkin et al., 2003; Kunin and Ouzounis 2003; Koonin 2003; Mushegian 2008; Bejerano et al., 2004). Almost all underwent duplication at the onset of eukaryotic evolution (Makarova et al., 2005). These genes then continued to undergo repeated episodes of single gene and whole genome duplication such that the eukaryotic genome increased in size. However, these duplications were often coupled with gene deletions, obscuring their original relationship with prokaryotes.
Almost all of the genes donated by prokaryotes, including those subsequently deleted from the eukaryotic genome, performed crucial functions that would guide the future of evolution. These included regulatory genes and genes controlling core cellular activities and the capacity to make duplicates of individual genes and the entire genome.
Genome sequencing has revealed an extensive conservation of the same repertoire of genes coding for core cellular functions in the genomes of prokaryotes and eukaryotes (Koonin et al., 2004; Koonin and Wolf 2008). A core set of approximately 70 genes contributed by archaea and bacteria has been conserved and passed down, without deletion, for billions of years, and these genes make up between 1% and 10% of the genes in the genomes of all multicellular life (Koonin 2003; Koonin and Wolf, 2008; Harris et al., 2003; Charlebois and Doolittle 2004).
These conserved genes, proteins (Koonin 2002), and gene sequences (Koonin 2009b) include those governing translation, the core transcription systems, and several central metabolic pathways, such as those for purine and pyrimidine nucleotide biosynthesis (Koonin 2003). Moreover, protein sequence conservation extends from mammals to bacteria, thus demonstrating their great antiquity (Dayhoff et al., 1974; Eck and Dayhoff 1966; Dayhoff et al., 1983).
Between 2150 and 4137 orthologous gene sets are highly conserved and can be traced back to the last common ancestor for eukaryotes (Makarova et al., 2005). Often these orthologs express or perform the same function regardless of species.
In yet other instances, these conserved genes had not been expressed in ancestral species and were activated only after hundreds of millions of years had passed; activated in response to changing environmental or regulatory conditions. These genes generally have numerous interaction partners indicating they can exert widespread effects across networks of genes.
Consider the evolution of the eye. It has been claimed that the chief components of the eye, such as photoreceptors, must have evolved essentially de novo 40–65 times independently according to Darwinian principles (Salvini-Plawen & Mayr 1977). However, genes do not evolve de novo or ex nihilo; they are transferred from another species, inherited from an ancestral species, or they are produced by exon shuffling, whole gene duplication, and numerous other replicative mechanisms.
Genes involved in eye development, known as Pax, "Pax-6" and opsin in vertebrates and "eyeless" in fruit flies, are homologous between diverse phyla (Quiring et al., 1994; Gehring & Ikeo 1999). Pax genes ("Pax-6") have also been found in the genomes of ancient species such as the sea urchin and Trichoplax, both of which have no eyes and cannot see (Sodergren et al., 2007; Callaerts et al. 1997; Hadrys et al., 2005).
Pax-6 serves as a master regulator of a network of genes that can give rise to a variety of different types of eyes that utilize the same visual pigment genes. That is, Pax-6 appears to act on different genes to produce the different structures on which the pigment cells are mounted in different creatures giving rise to a variety of eyes (Sheng et al., 1997; Gehring and Ikeo, 1999; Davidson, 2001).
Moreover, Pax-6 proteins show a 90% identity between vertebrates and invertebrates (e.g. squid) as well as insects (Drosophila) and marine worms (Tomarev, 1997). These genes also utilize identical homologous Pax-6 proteins during eye development (Gehring & Ikeo 1999). As the common ancestors for vertebrates and invertebrates diverged between 600 mya and 1.6 bya (Ayala et al., 1998; Wray et al., 1996; Gu, 1998; Cutler, 2000), this is an indication of the great antiquity of Pax genes--many of which can be traced to ancestral species who had no eyes and were unable to see. Those ancestors could include prokaryotes.
Consider, for example, the vitamin-A-related chromophores in the visual pigment, which are the single most important prerequisite for vision in vertebrates and invertebrates alike. Vitamin-A-related chromophores are also found in bacteria as well as algae (Seki and Vogt 1998; von Lintig and Vogt 2004).
These highly conserved genes were then passed down through numerous diverging ancestral species until activated in the period leading up to and including the Cambrian Explosion. Over 1000 genes involved in visual functioning, including ancestral Pax-6 genes, were inherited and are homologous between phyla (Quiring et al., 1994; Gehring and Ikeo, 1999; Tomarev et al. 1997), and have been isolated from several invertebrate and vertebrate species, including squid, flatworm, ribbonworm, ascidian, sea urchin, nematode, and fruit flies (Callaerts et al., 1997; Tomarev, 1997).
Be it vertebrate, flatworm or insect, and in spite of the large differences in eye morphology and mode of development (Gehring 1996), the same genes and same gene products related to the visual system are under the same genetic control (Quiring et al., 1994). Thus, regardless of species some parts of eyes are homologous because they are coded by the same genes and the same proteins.
Between 70% and 80% of these genes are common and evolutionarily conserved in the genomes of mammals, squid, octopus, flatworm, ribbonworm, ascidian, nematode, mosquitos, flies, tunicates, and vertebrates including humans (Ogura et al., 2004). The common ancestors for these species diverged anywhere from 1.2 bya to 830 million years ago (Ma) (e.g., Wray et al., 1996; Peterson et al., 2004; Nei et al., 2001; Gu 1998). As there is no evidence for visual functioning in any creature before 550 mya, these genes were therefore inherited, in silent form, from ancestral species which could not see.
However, regardless of their activity, genes that are highly conserved over the course of eukaryotic evolution not only remain in the same location but accumulate fewer substitutions in their protein sequences. Therefore the conservation of a gene and the fact that it is passed down vertically to subsequent species and is maintained unchanged in the same position, indicates biological importance and the identical roles it plays, almost regardless of species, over the course of evolution.
That importance may also have more to do with the future of evolution than with the survival of the species possessing that gene. Therefore, some highly conserved genes can be removed from (knocked out of) various genomes without having any noticeable impact on the viability of the organism or its ability to function (Koonin 2000). In fact, hundreds of genes have been knocked out of, or stripped from, various species which remained viable (Glass et al., 2006; Koonin 2000).
Mycoplasma genitalium, for example, has one of the smallest genomes of any organism but remained viable even after 100 of its 482 genes were removed (Glass et al., 2006). However, 28% of the minimal set of genes coded for unknown functions (Glass et al., 2006). Moreover, 80 genes of the original minimal gene set were represented by orthologs in all forms of life and many of these coded for unknown functions (Koonin 2000). Therefore, not all highly conserved genes are related to the viability of the organism, but instead serve the future evolution of new functions, new structures, and new species.
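The proportions implied by these Mycoplasma genitalium figures can be worked out directly. This is a rough sketch only: it assumes that the "minimal set" referred to in the text is simply the genes remaining after the 100 removals, which is an interpretation rather than something the cited sources are quoted as saying, and the variable names are illustrative.

```python
# Figures quoted in the text for Mycoplasma genitalium.
TOTAL_GENES = 482
REMOVED_GENES = 100
UNKNOWN_FUNCTION_SHARE = 0.28  # share of the minimal set with unknown function

# Assumption: the minimal set equals the genes left after removal.
minimal_set_size = TOTAL_GENES - REMOVED_GENES
fraction_removed = REMOVED_GENES / TOTAL_GENES
unknown_function_genes = round(UNKNOWN_FUNCTION_SHARE * minimal_set_size)

print(f"genes remaining: {minimal_set_size}")
print(f"fraction removed: {fraction_removed:.1%}")
print(f"genes of unknown function (approx.): {unknown_function_genes}")
```

Under that assumption, roughly a fifth of the genome was dispensable, and on the order of a hundred of the remaining genes would have had no known function.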
Likewise, features of gene architecture that are not necessarily directly relevant to gene function are highly conserved across lengthy periods of evolutionary history. This includes the positions of a large fraction of introns, with 25–30% conservation in orthologs from plants and chordates (Fedorov et al., 2002; Rogozin et al., 2003; Roy and Gilbert 2006). In the human genome, these ultraconserved elements often overlap introns or nearby genes involved in the regulation of transcription and development. Highly conserved genes are also located adjacent to exons involved in RNA processing (Bejerano et al., 2004).
In addition, the positions of a large number of introns are conserved between plants and vertebrates (Fedorov et al., 2002; Rogozin et al., 2003; Roy and Gilbert 2006) and between mammals and "living fossils" such as Trichoplax and the sea anemone (Putnam et al., 2007; Srivastava et al., 2008), species which diverged over a billion years ago. Introns play a major role in the regulation of gene expression and transcription and in the creation of new genes from old genes.
Thus, genes involved in transcription regulation which were donated by prokaryotes to eukaryotes interact with and overlap genes and introns also contributed by prokaryotes to eukaryotes. Moreover, these same genes were repeatedly duplicated and dispersed to a wide range of divergent species, and when activated gave rise to identical or similar evolutionarily advanced characteristics such as the eye and brain.
5. GENE REPLICATION & WHOLE GENOME DUPLICATION
Some of these highly conserved genes act as a genetic mechanism through which prokaryote genes, gene sequences, and proteins could be repeatedly duplicated within the eukaryotic genome. For example, a variety of regulatory genes and proteins were donated which ensure that specific genes and the functions they code for remained inhibited, while guaranteeing these same genes could be repeatedly duplicated and their functions preserved even as they grew in number and were passed down to subsequent species over hundreds of millions of years. Nevertheless, many of these genes were suppressed and remained silent.
For example, archaea and bacteria contributed three subunits of the core DNA-dependent RNA polymerase (Iwabe et al. 1991; Klenk et al. 1993) and two enzymes of DNA metabolism, RecA and Pol1A, to the eukaryotic genome (Eisen and Hanawalt 1999; Harris et al., 2003). These enzymes and the core RNA polymerase subunits serve many regulatory and replicative functions. For example, both RecA and Pol1A contributed to genetic continuity by gene conversion after recombination. They also ensure the integrity and maintenance of genetic information as the lengths of DNA strands increase and the genome grows larger in size (Eisen and Hanawalt 1999).
The replicative DNA polymerase, DnaN (COG0592), and the gene for the “sliding clamp” were also donated. These genes and proteins are necessary for the high degree of processivity of DNA polymerase during replication (Kuriyan and O'Donnell 1993; Hingorani and O'Donnell 2000). This enables the accurate replication of linked genes and the preservation of the information they encode.
Many of the proteins that regulate eukaryotic signal transduction networks, including those involved in programmed cell death, are also derived from the prokaryotic genome (Aravind et al., 1999; Koonin and Aravind 2002; Bidle and Falkowski 2004). These signaling molecules are common in bacteria, cyanobacteria, and archaea and include proteases from the AP-ATPase family. These proteases perform catalytic functions, are found in the plant and animal genome (Koonin and Aravind 2002; Bidle and Falkowski 2004), and are utilized by mitochondria.
Replication is a universal feature of cellular organisms, and eukaryotes and prokaryotes share many genes and characteristics involved in replication, including the production of RNA primers, replication bidirectionality, strand synthesis, and the utilization of the same principal proteins involved in transcription and translation. That these genes were transferred from prokaryotes to eukaryotes is demonstrated by their commonality.
Prokaryotic genes which guide replication and duplication contributed to the expanding size of the eukaryotic genome. Indeed, the number of signal transduction and regulatory proteins that are encoded parallels the increasing size of the genome. Thus, the larger the genome, the greater the number of genes dedicated to signal transduction (van Nimwegen 2003; Konstantinidis and Tiedje 2004; Galperin 2005).
Some of these genes that can be traced to a common ancestor also perform functions that involve the transfer of genetic information (Harris et al., 2003). Some interact with ribosomes and those ribosomal RNA genes which play fundamental roles in cellular functioning and DNA translation and transcription (Harris et al., 2003). Ribosome and ribosomal RNA genes were also likely transferred from prokaryotes to eukaryotes (Lake et al. 1984; Lake 1988; 1998; Rivera and Lake 1992; Rivera and Lake 2004; Vishwanath et al. 2004).
Thus, the ability to replicate and duplicate genes, and to transfer genes and to express these genes can be traced backwards in time to prokaryotes and to the direct descendants of the first creatures to arrive on Earth.
Moreover, the donation of these genes and proteins was not random but under extreme regulatory control, performing essential functions related to the metamorphosis and evolution of future eukaryotic species; and this is why they are highly conserved across diverse species. These functions include gene and whole genome duplications (Dehal and Boore 2005; Lynch and Conery 2000; Lynch et al., 2001; McLysaght et al., 2002).
Repeated replication, including whole genome duplication, freed up duplicated genes from regulatory restraint. Thus pre-coded genetic instructions were expressed giving rise to advanced traits which had been suppressed. Gene duplication is a major evolutionary mechanism (Ohno 1970).
However, with each duplication, genes were also deleted, often the original prokaryotic insert. For example, a comparison of the numbers of ancestral gene clusters with those of extant animals such as the nematode, fly, mouse and human, established that extant bilaterian animals have retained more than 3500 gene clusters of the ancestral gene set, and have lost more than 1600 gene clusters (Ogura et al., 2004). Following duplication the originals or the copies were moved to new locations within the eukaryotic genome. Therefore, most of the genes which originated in the prokaryote genome can no longer be traced back to their prokaryotic source.
After they had been donated and transferred to the eukaryotic genome, many of these genes were simultaneously deleted from the prokaryotic gene pool, thus ensuring they would not affect prokaryote evolution. In prokaryotes, gene loss is one of the two major evolutionary processes, along with horizontal gene transfer (HGT), that contribute to the intensive “gene flux” that seems to have shaped the genomes of these organisms.
Those donated genes included those regulating whole genome duplication (WGD). Thus, it appears that these genes underwent WGD only after they had been acquired by eukaryotes, as there is little evidence of WGD in prokaryotes.
There have been several whole genome duplications during the early evolution of eukaryotes which date back to the emergence of the first eukaryotic cells or their ancestors (Makarova et al., 2005). Reconstruction of ancestral gene repertoires has identified 4137 orthologous gene sets in the last multicellular eukaryotic common ancestor, and 2150 orthologous sets in the hypothetical first unicellular eukaryotic common ancestor, which is indicative of WGD coupled with deletions (Makarova et al., 2005).
There is evidence to suggest that the genome may be duplicated at least every 100 million years (Lynch et al., 2001; Lynch and Conery 2000). Therefore, the majority of the genes in most genomes of cellular life underwent at least one duplication at some point during evolution (Lynch 2007; Koonin et al., 1996) and many genes belong to large families of paralogs.
The number of ancestral gene sets at the time of the split of plant–animal–fungi and the divergence of bilaterian animals, is estimated to be 2469 and 6577, respectively (Ogura et al., 2004). There is a 2.7-fold increase in the number of gene clusters during the period from the evolutionary split of plant–animal–fungi to the divergence of bilaterian animals (Ogura et al., 2004). This indicates that at least one and possibly two whole genome duplications must have occurred coupled with massive deletions.
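The fold increase quoted above, and the reasoning from it to "one and possibly two" duplications, can be illustrated with a short calculation. This is a simplified sketch, not the method used by Ogura et al.: it assumes that each whole genome duplication exactly doubles the number of gene clusters and that any shortfall from a clean doubling reflects later deletions. The variable names are illustrative.

```python
import math

# Ancestral gene-set counts quoted in the text (Ogura et al., 2004).
SETS_AT_PLANT_ANIMAL_FUNGI_SPLIT = 2469
SETS_AT_BILATERIAN_DIVERGENCE = 6577

fold_increase = SETS_AT_BILATERIAN_DIVERGENCE / SETS_AT_PLANT_ANIMAL_FUNGI_SPLIT

# Simplifying assumption: one whole genome duplication doubles the gene set,
# so log2 of the fold increase gives the equivalent number of doublings.
equivalent_doublings = math.log2(fold_increase)

# If two full duplications occurred (a 4-fold expansion), this is the share
# of the expanded gene set that must have been retained after deletions.
retained_after_two_duplications = fold_increase / 4

print(f"fold increase: {fold_increase:.2f}")
print(f"equivalent doublings: {equivalent_doublings:.2f}")
print(f"retained after two duplications: {retained_after_two_duplications:.1%}")
```

Under these assumptions the increase corresponds to between one and two doublings: more than a single duplication can supply, and less than two would produce unless about a third of the duplicated gene sets were subsequently lost.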
Whole genome duplications have occurred in almost all lineages, including yeast (Wong et al., 2002; Vision et al., 2000; Kellis et al., 2004; Dietrich et al., 2004), fish (Van de Peer et al., 2003; Jaillon et al., 2004; Taylor et al., 2001), frogs (Tymowska et al., 1977; Jeffreys et al., 1980) and plants (Blanc and Wolfe 2004). The relatively large and complex vertebrate genome appears to have been duplicated at least twice (McLysaght et al., 2002; Dehal and Boore 2005).
Whole genome duplication played a central role in the primary radiation of chordates (Dehal and Boore 2005) during the Cambrian explosion over 500 million years ago. There followed additional duplications during chordate evolution, thereby forming many of the gene families of vertebrates (McLysaght et al., 2002).
Dehal and Boore (2005) reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, and then determined when each gene underwent duplication relative to the evolutionary tree of the organism. An analysis of the global physical organization and genomic map positions of paralogous genes indicates these specific genes were duplicated prior to the fish–tetrapod split, some 400 million years ago. This was followed by two distinct genome duplication events early in vertebrate evolution, as indicated by clear patterns of four-way paralogous regions covering a large part of the human genome (Dehal and Boore 2005).
Large-scale genomic events marked the transition and divergence between yeast and fungi (Liti and Louis, 2005), chordates and non-chordates (McLysaght et al., 2002), fish and tetrapods (Dehal and Boore 2005), and then once or twice more after vertebrates began to colonize the surface of Earth (Dehal and Boore 2005).
There is evidence to suggest that the genome may have been duplicated dozens of times over the course of evolutionary history (Lynch and Conery 2000; Lynch et al., 2001) thereby triggering the transition and divergence between numerous species, ranging from yeast and fungi (Liti and Louis, 2005) to chordates and non-chordates (Dehal and Boore 2005; McLysaght et al., 2002).
Gene and whole genome duplication are crucial mechanisms of evolutionary innovation and when coupled with regulatory genes contributed by prokaryotes, enabled the genomes of eukaryotes to become increasingly complex as well as larger in size. This also allowed for multiple copies of the same genes to appear in divergent species and to be passed down until a regulatory or environmental signal triggered their activation.
Gene duplication appears to provide the raw material for major evolutionary transitions and to trigger the emergence of new species in the absence of obvious intermediaries. The duplication of all genes at the same time could possibly induce rapid and extensive evolutionary change, i.e. the emergence of new species from old in the absence of obvious transitional species. Whole genome duplication also enabled the entire expanded gene repertoire to evolve together and reach a greater level of interaction and complexity as compared to single gene duplications.
Duplication is often followed by accelerated sequence evolution as well as rearrangement of a gene, an evolutionary mode that obliterates detectable connections to the original gene source. Moreover, although numerous genes might be retained, other duplicated genes or the original might be quickly eradicated (Wolfe 2001) thus erasing the genetic footprints that would lead back to the prokaryotic source. This would make it appear that a new gene has emerged because its origins are no longer apparent. In fact, the vast majority of duplicated genes are subsequently deleted (Dehal and Boore 2005); an event which may also lead to freeing the original, or the duplicate, from inhibitory restraint, and which can erase all evidence of genome duplication (Dehal and Boore 2005).
6. GENE LOSS & GENE EXPRESSION
Lineage-specific gene loss is one of the major evolutionary processes that have been brought to light by comparative analyses of gene sets from completely sequenced genomes (Aravind et al. 2000; Moran 2002). Genome analysis has revealed the extensive loss of genes after WGD, in yeasts (Katinka et al. 2001; Scannell et al., 2007; Wolfe and Shields 1997), plants (Soltis et al., 2008; Tuskan et al., 2006), and chordates (Dehal and Boore 2005; Durand 2003; McLysaght et al., 2002).
Gene loss without replacement is a common phenomenon in many genomes and appears to play an important role in shaping genome content (Snel et al. 2002). The extent of gene loss can be dramatic, and it can occur relatively rapidly under a strong selective pressure (Baumann et al. 1995).
Although genomes of parasites expose the most striking cases of massive gene loss, a possible function of deletion following transfer, the fact is: substantial gene loss has occurred in all phylogenetic lineages (Snel et al. 2002; Mirkin et al. 2003).
The eradication of the original gene may also play a role in the expression of the duplicate. Some of these duplicate genes appear to have been freed from inhibitory restraint and were able to undergo an accelerated rate of sequence change, thereby inducing the rapid evolution of new characteristics and abilities (Seoighe et al., 2003). Thus, after duplication followed by deletion, the duplicate or original genes, now freed of the constraints, could express an already encoded function (“neofunctionalization”) which had been repressed (Conant and Wolf 2008).
In many cases the 'new' function of one gene copy is a secondary property, or subfunction, that was always present, but which may have been suppressed, or which only came to be expressed when other more dominant functional capabilities were inhibited, suppressed or deleted. Therefore, old functions might be fractionated giving rise to new subfunctions (“subfunctionalization”). That is, the new function was not really "new" but had always been a property of a specific gene that could only be expressed following duplication, or duplication coupled with deletion.
Thus, it is not uncommon for the new paralogs to retain or express distinct subsets of the original functions of the ancestral gene whereas the rest of the functions differentially deteriorate (Lynch and Force 2000; Lynch and Katju 2004).
Duplication and the lessening of regulatory restraints might also make the gene more susceptible to environmental triggers.
7. INTRONS
DNA includes stretches of nucleotides, called exons, that are encoded and expressed to produce various proteins (De Souza et al., 1996). These strings of nucleotides are punctuated, bracketed, framed, and interspersed with long stretches of non-encoding DNA, called introns (Belfort, 1991, 1993; Breathnach et al., 1978; Buchman and Berg 1988; Witkowski, 1988). In complex multicellular organisms introns are often 10-fold longer than exons (De Souza et al., 1996). They also signal which lengths of exons are to be expressed (Belfort, 1991, 1993; Breathnach et al., 1978; Witkowski, 1988). Introns are typically snipped out as strings of exons are transcribed via RNA intermediaries, into proteins (Breibart et al., 1985; Leff et al., 1986).
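The removal of introns and joining of exons described above can be sketched in a few lines of Python. This is a toy illustration only: the sequence, the intron coordinates, and the function name `splice` are hypothetical and are not drawn from any of the cited studies.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the remaining exons.

    `introns` is a list of half-open (start, end) index pairs into `pre_mrna`.
    """
    mature = []
    cursor = 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end                           # skip over the intron itself
    mature.append(pre_mrna[cursor:])           # keep the final exon
    return "".join(mature)


# Hypothetical transcript: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "GUAAGUUUAG" + "UUUGAA" + "GUCCCCAG" + "UGA"
introns = [(6, 16), (22, 30)]
mature = splice(pre_mrna, introns)
```

In this sketch the three exons are joined into a single uninterrupted message, which is the form that would be passed on for translation.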
Introns are of particular importance in regulating gene expression (Brinster et al., 1988; Buchman and Berg, 1988; Collis et al., 1990; Lai, et al., 1998; Noe et al., 2003). If different "starter" or "stop" introns are activated this results in different segments or sequence lengths becoming expressed, thereby producing a different product (Belfort, 1991, 1993; Breathnach et al., 1978; Breibart et al., 1985; Leff et al., 1986). Hence, variation and diversity can be differentially induced if different "starter" exons or promoter introns are activated.
Introns have often been preserved in the same places in the genome over the course of evolution, be it in the genes of Drosophila melanogaster (the fruit fly), Caenorhabditis elegans (nematode), mice, or humans (De Souza et al., 1996; Federov et al., 2002). This extreme conservation and preservation of their positions within genes attests to their importance in regulating and coordinating evolution and metamorphosis among numerous species. Many are catalytically active and facilitate chemical reactions, even catalyzing their own synthesis (De Souza et al., 1996).
Some introns are found within or in association with ribosomes (Dürrenberger and Rochaix 1991; Jackson et al., 2002; Toro et al., 2007; Yoshihama et al., 2007). The functional part of the ribosome is fundamentally a ribozyme, the molecular machine that translates the RNA copies of exons into proteins (Cech 2000). Thus introns, in association with ribosomes play a major role in translation, transcription and protein synthesis. Ribozymes are also able to splice themselves and other introns out of the original transcript created by these RNA molecules (Jackson et al., 2002). Ribozymes can also be found in the intron of RNA transcripts, which had been removed from the transcript.
Mitochondrial ribosomes and introns are considered to be of bacterial origin (Kenmochi et al., 2001); a product of endosymbiosis (Dyall et al., 2004; O'Brien 2002). Ribosomal introns and protein sequences which circulate in the cytoplasm appear to have originated in the archae genome, and were later donated to eukaryotes, as there is a specific affinity between eukaryotic genes and their orthologs from archae (Lake et al. 1984; Lake 1988; 1998; Rivera and Lake 1992; Rivera and Lake 2004; Vishwanath et al. 2004). Archae and bacteria were a major source of introns and ribosomes.
Some introns are also known as spliceosomes, self-splicing introns, and as Group I and II introns (Roy and Gilbert, 2006). Spliceosomes and spliceosomal introns are responsible for splicing out introns and transposable elements, and ensuring that the genetic sequences in introns are not translated into proteins. Thus, they regulate gene expression and help guarantee that only designated exons are translated and transcribed (Roy and Gilbert, 2006).
Spliceosomal introns are found in the nuclear genes of higher eukaryotes including humans (Doolittle 1978; Gilbert 1978; Mattick 1994; Deutsch and Long 1999). Prokaryotes do not possess a nucleus and therefore lack nuclear introns. Nuclear introns also engage in alternative splicing, and can produce multiple types of messenger RNA from a single gene (Roy and Gilbert, 2006).
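Alternative splicing, in which one gene yields several messenger RNAs, can be illustrated with a minimal Python sketch. It assumes the simplest case, exon skipping, and the exon names, sequences, and the helper `isoforms` are hypothetical.

```python
from itertools import product


def isoforms(exons, optional):
    """Return every mRNA obtainable by including or skipping each optional exon.

    `exons` is an ordered list of (name, sequence) pairs; exons not listed in
    `optional` are always retained.
    """
    results = []
    for choice in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choice))
        results.append(
            "".join(seq for name, seq in exons if keep.get(name, True))
        )
    return results


# Hypothetical three-exon gene in which the middle exon may be skipped.
exons = [("E1", "AUGGCC"), ("E2", "UUUGAA"), ("E3", "CCAUGA")]
variants = isoforms(exons, optional=["E2"])
```

With one optional exon the gene yields two distinct messages; each additional optional exon doubles the number of possible products in this simplified model.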
Via the joining of exons after splicing, introns also trigger the synthesis of novel proteins with new properties (Brietbart et al., 1985; De Souza et al., 1996; Leff et al., 1986). They may also promote the creation of multiple copies of the proteins coded by single genes (Brietbart et al., 1985; Leff et al., 1986). In fact, the presence of an intron can increase transcriptional efficiency 100-fold whereas in the absence of the intron these genes may not be expressed at all (Brinster et al., 1988; Lai et al., 1998).
Hence, introns are involved in transcription, translation, signaling, protein synthesis, and regulating which gene sequences or portions of the gene should be expressed or inhibited (Brinster et al., 1988; Brietbart et al., 1985; Buchman and Berg 1988; Collis et al., 1990; Leff et al., 1986; Lai, et al., 1998; Noe et al., 2003). They also create new genes.
Introns guide or participate in the genetic recombinations between exons, a process called “exon shuffling” (Gilbert, 1978, 1987; Doolittle, 1978; Blake, 1978). Exon shuffling is the process where new full-length genes are created from exon “pieces” by recombination within the introns (De Souza et al., 1996, 1998, 2003; Fedorov 2001, 2003; Long et al., 1995; Roy 2003; Roy et al., 1999, 2001, 2003). Exon shuffling is associated with the formation of new genes from old genes.
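A minimal sketch of exon shuffling follows, assuming that each gene can be represented as an ordered list of exons and that recombination breakpoints fall only within introns, that is, between exons. The gene contents and the function `shuffle_exons` are hypothetical.

```python
def shuffle_exons(gene_a, gene_b, intron_a, intron_b):
    """Recombine two genes at intron breakpoints.

    `intron_a` and `intron_b` count the exons lying upstream of the breakpoint
    in each gene. The result joins the upstream exons of `gene_a` to the
    downstream exons of `gene_b`.
    """
    return gene_a[:intron_a] + gene_b[intron_b:]


# Hypothetical genes written as ordered lists of exon labels.
gene_a = ["A1", "A2", "A3"]
gene_b = ["B1", "B2"]

# Break gene_a in the intron after A2 and gene_b in the intron after B1.
chimera = shuffle_exons(gene_a, gene_b, 2, 1)
```

Because the breakpoints lie in introns, every exon in the resulting gene is left intact, which is why recombination of this kind can assemble a new full-length gene from existing pieces.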
Introns also are implicated in the production of additional genes and even gene clusters which are located deep within the intron (Henikoff et al. 1986; De Souza et al., 1996; Strachan & Read, 1996). Thus, introns may be responsible for producing duplicate genes as well as new genes and clusters of genes, including numerous copies of highly repetitive sequences of nucleotide base pairs (Finnegan, 1989; Henikoff et al. 1986; Peters & Fink, 1982). Indeed, introns, and intronic gene clusters are considered a "hot spot" for homologous recombination (Wahls et al. 1990).
Introns also play a major role in the origin and diversity of proteins by facilitating recombination of sequence coding for small protein/peptide modules (Brietbart et al., 1985; Leff et al., 1986; Koonin 2006). If the length of the code is altered and reframed, or if introns change their positions within the genes, the products produced by the altered code may also undergo subtle or profound changes (Brietbart et al., 1985; Leff et al., 1986). Therefore a variety of tissues and organs can be fashioned.
Introns also contain copies of gene sections that have been silenced and suppressed (De Souza et al., 1996). They maintain the "old code" for genes that were once translated into a protein, as well as the codes for genes that have not yet been expressed. Introns are thus implicated in the release of genetically pre-coded traits (de Jong & Scharloo, 1976; Dykhuizen & Hart, 1980; Gibson & Hogness, 1996; Polaczyk et al., 1998; Rutherford & Lindquist, 1998; Wade et al., 1997).
Hence, introns create genes from old genes, recombine pieces of genes, and thus can combine, fractionate, or reconfigure the structure of a gene, thereby creating new functions from the parsing or assimilation of old functions coded by single or multiple genes. Moreover, they can silence or activate the expression of the genes they create or those they regulate.
Introns, therefore, play a major role in evolution acting to regulate gene expression, maintaining copies of genes, and promoting the assembly of new genes and new gene sequences from old genes, and multiple copies of the same or a new protein product.
Thus, following the donation of introns to eukaryotes, new genes were assembled from old genes (De Souza et al., 1996, 1998, 2003; Fedorov 2001, 2003; Long et al., 1995; Roy 2003; Roy et al., 1999, 2001, 2003). The genome began to increase in size and complexity and genes expressed new, albeit precoded functions; which gave rise to new tissues, organs, and the evolution of new species (Duret 2001; Comeron and Krietman 2000). In fact, the number of introns per gene varies by more than two orders of magnitude between species (Roy 2004).
Therefore, introns, which may have originally been donated by prokaryotes (Cavalier-Smith 1991; Martin and Koonin 2006; Sharp 1991; Stoltzfus 1999), may play a significant role in regulating, copying, and duplicating genes which had also been transferred to the eukaryotic genome by prokaryotes. Moreover, they appear able to regulate the manufacture of new proteins and thus guide the evolution of new tissues, organs, and species. These are not random events, but are under precise regulatory control.
8. INTRONS ORIGINATED IN PROKARYOTES
Numerous introns invaded eukaryotic genes at the outset of eukaryogenesis as the first eukaryotes were being fashioned (Martin and Koonin 2006; Rogozin et al., 2005), and thus at the earliest stages of eukaryote evolution (Rogozin et al., 2005). All eukaryotes whose genomes have been sequenced, including parasitic protists, have been shown to possess introns (Doolittle 1978; Gilbert 1978; Mattick 1994; Deutsch and Long 1999; Nixon et al. 2002; Simpson et al. 2002; Vanacova et al. 2005). Even the simplest of eukaryotes contain introns as well as spliceosomal proteins within their genomes (Collins and Penny 2005).
Hence, introns were present when simple eukaryotes took root on this planet, or they originated in the prokaryote genome and were transferred to the first proto-eukaryotic organism (Cavalier-Smith 1991; Martin and Koonin 2006; Sharp 1991; Stoltzfus 1999). Introns then continued to be donated or duplicated as eukaryotes evolved.
Both archae and bacteria appear to have supplied eukaryotes with numerous introns (Martin and Koonin 2006), perhaps flooding the eukaryotic genome with introns and transposable elements at the earliest stages of eukaryogenesis (Cavalier-Smith 1991; Martin and Koonin 2006; Sharp 1991; Stoltzfus 1999). Or these prokaryotes may have supplied introns at the time the archae and bacteria genomes were unified to create the first eukaryotes (Martin and Koonin 2006). A massive influx of introns would also explain why ancient eukaryotes (Roy 2006), including the last common ancestors for eukaryotes, possessed high intron densities comparable even to vertebrates, which possess intron-rich modern genomes (Roy 2006; Carmel et al., 2007; Csuros et al., 2008).
Mitochondria may also be a direct and indirect source for introns including group II self-splicing introns and spliceosomal introns (Dyall et al., 2004; O'Brien 2002; Roy and Gilbert 2006). For example, spliceosomal introns may have evolved from group II self-splicing introns which originated in the genome of the alpha-proteobacterial progenitor of the mitochondria (Koonin 2006). Group II self-splicing introns are present in the genomes of many bacteria (Cavalier-Smith 1991; Koonin 2006; Roy 2006; Stoltzfus 1999). Thus, at least some eukaryotic introns may be linked to the same alpha-proteobacteria genome which gave rise to mitochondria which also donated numerous genes to the eukaryotic genome (Koonin 2006).
Moreover, archae may have contributed introns, including ribosomal introns and protein sequences. Some archael genomes contain genes that are dotted with micro-introns and some archae proteins are also bracketed by introns (Watanabe et al., 2002) as is common in eukaryotes.
Be it archae, bacteria, viruses, or a combination of influences, once these introns were donated to the eukaryotic genome, they then punctuated and framed numerous protein-coding genes and played crucial roles in recombination, gene creation, coordination of transcription and translation, the emergence of the spliceosome, as well as the nucleus, linear chromosomes, telomerase, the ubiquitin signaling system, inhibition and expression, gene duplication and creation, and the expansion of the genome (Comeron and Kreitman 2000; De Souza et al., 2003; Duret 2001; Fedorov 2003; Koonin 2006; Gilbert 1978, 1987; Long et al., 1995; Mattick 1994; Prachumwat et al., 2004; Roy and Gilbert 2006; Tonegawa et al., 1978).
Thus, introns which were donated by prokaryotes acted on genes which had been transferred by prokaryotes to the eukaryotic genome, thereby creating new genes from old genes, expressing pre-coded traits, and giving rise to new species. Introns play a major role in the regulation of evolutionary metamorphosis.
The donation of introns by prokaryotes following the metamorphosis of the first eukaryotes also explains the relative absence of introns in the genomes of most modern prokaryotes (Koonin 2006). Introns were donated and were not replaced, thus ensuring that eukaryotes and not prokaryotes would evolve into new species.
That these prokaryotes at one time may have contained an abundance of introns may also account for why the genomes of archae and bacteria contain split genes (Dassa et al., 2007). Therefore, having contributed their introns to the eukaryotic genome, most archae and most bacterial genes lack or have only a few introns, and their genes are encoded as uninterrupted open reading frames. This indicates that the donation of introns was not random, but under precise genetic control, such that their transfer to eukaryotes played a highly regulated role in eukaryotic evolution whereas their deletion from the prokaryotic genome ensured that only eukaryotes would continue to evolve.
9. INTRONS ARE CONSERVED
The positions of introns and numerous spliceosomal and spliceosome-associated proteins, have been highly conserved in the same locations and positions within the genes of numerous species (Anantharaman et al., 2002; Collins and Penny 2005; Federov et al., 2002). Thousands of introns are located in the exact same regions of the genome, even when comparing the genes of fungi and humans (Federov et al., 2002). This conservation of position and location indicates they exert extremely important influences on the coordination of gene regulation and expression even among different species, possibly even acting to coordinate the evolution of various species in relation to one another.
Studies have shown that highly conserved, shared intron positions are common in animal, plant and fungal genes (Federov et al., 2002). In one study it was found that 14% of animal introns match plant positions, and that ≈17–18% of fungal introns match animal or plant positions (Fedorov et al., 2002), even though animals and plants diverged from any common ancestors over a billion years ago (Wang et al., 1999).
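The percentages reported above are fractions of intron positions in one group that coincide with positions in another. A minimal sketch of that calculation is given here, assuming each intron position is recorded as a (gene, position) pair in aligned orthologous genes. The data and the function `shared_fraction` are hypothetical and do not reproduce the method or results of Fedorov et al.

```python
def shared_fraction(query, reference):
    """Fraction of intron positions in `query` that also occur in `reference`."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Hypothetical intron positions in three aligned orthologous genes.
fungal = {("geneA", 10), ("geneA", 55), ("geneB", 7), ("geneC", 101), ("geneC", 230)}
animal = {("geneA", 10), ("geneB", 7), ("geneB", 88), ("geneC", 300)}

# Two of the five fungal positions are also animal positions.
fraction = shared_fraction(fungal, animal)
```

Note that the measure is not symmetric: the same two shared positions make up a different fraction of the animal set than of the fungal set, which is why the text reports separate figures for each direction of comparison.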
Indeed, the three-way split between plants, animals and fungi has been estimated to have occurred around 1.6 bya, whereas the basal animal phyla (Porifera, Cnidaria, Ctenophora) diverged between 1.2 to 1.5 bya (Wang et al., 1999). Introns have an ancient pedigree.
Federov et al., (2002) examined 30 nonrelated genes with the highest numbers of common animal–plant introns and found that "60% of the fungal introns have positions common to animal and/or plant introns, and 39% of fungal introns are common simultaneously to both plant and animal introns. This exceptionally high abundance of introns with positions common to all three taxa of animals, plants, and fungi strongly supports the antiquity of these common intron positions."
In yet another genomic study (Rogozin et al., 2003), intron positions were compared in 684 orthologous gene sets from 8 complete genomes of animals, plants, fungi, and protists/parasite. Approximately one-third of the introns in the protist parasite were shared with at least one crown group of eukaryote; indicating that these introns have been conserved for over 1.2 billion years of evolution.
Between 10% and 20% of intron positions and other genomic features without obvious functions are conserved throughout the evolution of eukaryotes, up to and including humans (Bejerano et al., 2004; Fedorov et al., 2002). However, the fact that these functions are not obvious is not an indication of a lack of importance. These unknown functions may not be expressed except in future species. "What is conserved is functionally relevant" should be considered a central tenet of biology, even if the functions are not yet obvious.
10. INTRONS & PUNCTUATED EVOLUTIONARY EQUILIBRIUM
Sequences within introns have changed considerably over the course of evolution, sometimes by orders of magnitude, and at a faster pace than those of the exons (Federov et al., 2002). Thus, these highly conserved introns are obviously active and are exerting a variety of influences on the genome and gene expression, as well as the evolution of new species. In fact, bursts of introns appear to have invaded the eukaryotic genome initially and possibly at key points in eukaryotic evolution, such as the origin of animals and prior to the divergence of extant eukaryotic lineages (Carmel et al., 2007). For example, lineages leading to animals seem to have experienced a phase of massive intron invasion early in their evolution (Carmel et al., 2007).
After billions, or hundreds of millions or tens of millions of years of stasis, armies of introns either invade or rapidly duplicate within the eukaryotic genome, and are directly associated with, or may have directly triggered bursts of branching speciation and explosions of evolutionary change in the absence of transitional forms; a phenomenon that Eldredge and Gould (1972; Gould 2002) described as "punctuated equilibrium." Indeed, there is no fossil evidence of gradual change from one species to another or any fossil record of transitional forms acting as an evolutionary bridge between species (Eldredge and Gould 1972; Gould 2002). Evolution occurs in leaps. Thus, the regulation and coordination of these great evolutionary leaps may well be yet another function of introns.
Although the position of an intron in a gene's coding sequence is well conserved, introns can make copies of themselves which can be snipped out and transposed to another region of the genome (Finnegan, 1989; Moran et al., 1999). Introns change position within the genome, acting as transposable elements. Moreover, they can act as a plasmid or transposon and invade and transpose themselves into the genomes of other members of the same species (Dujon, 1989; Dujon et al., 1989; McDonald 1993). In this manner, they can coordinate gene expression among most members of the same species, such that all make the same evolutionary leaps simultaneously.
Also, many drop out of the genome after serving their function, which in turn would affect gene selection and exon transcription. When introns drop out, their deletion may halt any further evolutionary advance, thus leading to another long period of stasis. Intron deletion would also obscure and erase evidence of any genetic footprints leading to prokaryotes, viruses, or a common ancestor.
11. INTRON GAINS & LOSSES
The donation or duplication and deletion of introns may have occurred throughout eukaryotic evolution, with introns coming and going (Roy and Gilbert 2006). Eukaryotes harbor multiple introns per gene (Logsdon 1998; Mourier and Jeffares 2003; Jeffares et al. 2006), requiring hundreds of thousands, if not millions of individual introns to have been donated or duplicated throughout eukaryotic evolution and even during recent evolutionary history (Cavalier-Smith, 1985; Logsdon 1998; Palmer and Logsdon, 1991). However, gains are often accompanied or followed by losses.
It is inferred that a relatively high intron density was reached early in the metamorphosis of eukaryotes (Carmel et al., 2007; Cavalier-Smith 1991; Csuros et al., 2008; Martin and Koonin 2006; Roy 2006; Sharp 1991; Stoltzfus 1999). It has been estimated that the last common ancestor of eukaryotes contained >2.15 introns/kilobase. The last common ancestor of multicellular life acquired even more, harboring ∼3.4 introns/kilobase, a greater intron density than in modern insects, most extant fungi and some animals (Carmel et al., 2007); indicating a massive intron duplicative event coupled with deletions. Among the top six intron-rich species, five are ancestral forms, indicating that some species have subsequently lost introns, whereas initially the number of introns actually increased during the evolutionary leap from uni-cellular ancestor to the first multi-cellular ancestor.
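Intron density, as used above, is simply the number of introns divided by the length of coding sequence in kilobases. The short sketch below works through the arithmetic for a hypothetical gene and compares it with the two ancestral estimates quoted in the text; the gene, its numbers, and the function `intron_density` are assumptions made for illustration.

```python
def intron_density(intron_count, coding_length_bp):
    """Return the number of introns per kilobase of coding sequence."""
    return intron_count / (coding_length_bp / 1000)


# Ancestral estimates quoted in the text (introns per kilobase).
LAST_EUKARYOTIC_ANCESTOR = 2.15
LAST_MULTICELLULAR_ANCESTOR = 3.4

# Hypothetical gene: 7 introns interrupting 2,000 bp of coding sequence.
density = intron_density(7, 2000)

exceeds_eukaryotic_estimate = density > LAST_EUKARYOTIC_ANCESTOR
exceeds_multicellular_estimate = density > LAST_MULTICELLULAR_ANCESTOR
```

Expressing density per kilobase rather than per gene allows genes and genomes of very different lengths to be compared on the same scale.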
Just as prokaryotes may have lost introns upon donating them in massive amounts to ancestral eukaryotes, the higher density of introns in ancient vs more recent species also suggests that introns play a major role in evolution and then drop out in those species which will no longer evolve.
The evolution of eukaryotic genes is characterized by numerous gains and losses of introns (Carmel et al., 2007) and different species vary dramatically in their intron density, ranging from a few introns per genome to over eight per gene (Logsdon 1998; Mourier and Jeffares 2003; Jeffares et al. 2006). Introns are prevalent in complex eukaryotes but rare in the simple ones (Cavalier-Smith, 1985; Logsdon 1998; Palmer and Logsdon, 1991), indicating that the acquisition or duplication of introns is associated with species which have evolved. By contrast some introns have been eliminated from the genomes of those in a state of prolonged stasis and evolutionary equilibrium.
Therefore, intron gains and losses may be an indication of the evolutionary status of any particular species, if they are in a state of stasis or if their genome is primed to undergo additional evolutionary leaps. Thus intron gain, retention, or loss, may indicate if a species may continue to evolve.
For example, we see an elevated rate of intron loss in several lineages, such as fungi, insects, nematodes, and other arthropods (Carmel et al., 2007; Rogozin et al., 2003); species which no longer appear to be evolving, and which may have diverged from vertebrates around 1.2 bya (Wang et al., 1999). Thus, in non-vertebrates the rates of intron loss and gain have decreased in the last 1.3 billion yr. (Carmel et al., 2007). Further, in these lineages and in the last 100 to 300 million years, there has been a dramatic decrease in intron duplicative events, such that gains decreased faster than the decrease in losses, resulting in many lineages with very limited intron gains (Carmel et al., 2007; Rogozin et al., 2003).
Nematodes are characterized by a high number of events, with losses being more plentiful than gains (Cho et al. 2004; Coghlan and Wolfe 2004). Fungi also show more losses than gains (Nielsen et al. 2004). Recent intron losses are also seen in plant genes (Charlesworth et al., 1998).
Whereas many ancestral introns have been lost in fungi and other lower forms, they are retained in the genomes of higher vertebrates (Rogozin et al., 2003) many of which evolved in the last 40 million years. Many "higher" vertebrate species have continued to gain introns, albeit at a rather slowed pace, whereas "lower vertebrates" appear to be losing introns and to be experiencing a rapid reduction in gains (Fedorov et al. 2003; Babenko et al. 2004; Coulombe-Huntington and Majewski 2007). A survey of mammalian genes found six cases of intron losses in rodents relative to human (Roy et al., 2003). In fact, for most extant species, the total number of losses outnumbers the number of gains (Carmel et al., 2007).
The accelerated rate of loss in many species may indicate that these introns have been donated to the genomes of yet other species where they are exerting regulatory and evolutionary influences on gene selection and expression. As introns are quite mobile, they can also jump from location to location like a plasmid, coordinating the expression or suppression of a wide range of genes simultaneously and thus making it appear that introns have been lost, or gained, when they have merely moved to a new location; or, perhaps, jumped to the genome of a different species.
12. INTRONS & TRANSPOSONS
Introns have been implicated in the creation of new genes, new traits, new species, and thus evolutionary metamorphosis. They have played crucial roles in gene creation, coordination of transcription and translation, the expansion and possibly even the duplication of the genome, the emergence of the spliceosome, the nucleus, linear chromosomes, telomerase, the ubiquitin signaling system, and eukaryotic evolutionary innovation (Koonin 2006; Mattick 1994; Roy and Gilbert 2006).
Introns exert a significant regulatory influence over gene expression and may have played a role in the separation between transcription and translation (Roy and Gilbert 2006). For example, they appear to have provided two types of RNA genes to the eukaryotic genome--mRNA and iRNA. These highly structured eukaryotic RNAs are also linked with group II introns and might have originated from introns in the alphaproteobacterial progenitor of the mitochondria (Blumenthal, 2005; Toro et al., 2007).
Spliceosomal introns snip out introns and interrupt sequences of protein-coding genes and are among the defining features of eukaryotes (Doolittle 1978; Gilbert 1978; Mattick 1994; Deutsch and Long 1999). Numerous spliceosomal introns invaded genes of the emerging eukaryote during eukaryogenesis and thus must have originated in prokaryotes.
Splicing mechanisms are directly linked to bacterial group II introns (Toro et al., 2007), to archae and bacteria (Lake et al. 1984; Lake 1988; 1998; Rivera and Lake 1992; Rivera and Lake 2004; Vishwanath et al. 2004; Martin and Koonin 2006), to mitochondria (Blumenthal, 2005; Dyall et al., 2004; O'Brien 2002), and to bacterial operons (Garrett et al., 1994).
Self-splicing introns can be traced back to the earliest stages of eukaryotic evolution, and are linked to RNA and the basic machinery of gene expression: transcription, splicing, and translation (Blumenthal, 2005).
Likewise, spliceosomal proteins are part of the core cellular machinery that is conserved across eukaryotes, and are sometimes located within operons (Blumenthal and Gleason, 2003; Blumenthal et al., 2002; Garrett et al., 1994; Hill et al., 2000). Operons are sequences of nucleotides which include several structural genes and a promoter, and which produce messenger RNA (mRNA), via transcription by an RNA polymerase (Salgado et al., 2000). Operons are believed to have originated in the prokaryote genome (Che et al., 2006; Ermolaeva et al., 2001) and regulate the expression of various genes, depending on environmental conditions (Salgado, et al., 2000). This is accomplished by the binding of a repressor to the operator to prevent transcription, or by the introduction of an inducer molecule which binds to the repressor, thereby allowing expression (Blumenthal et al., 2002; Salgado, et al., 2000). Introns have retained the operon capacity to repress or selectively express gene sequences.
Group II self-splicing introns also evolved in partnership with the spliceosome, both of which may have originated in organelles which transferred type II introns into the nucleus (Cavalier-Smith, 1985; Rogers, 1989). Organelles are linked to the alpha-bacterial symbiont whose genes combined with archae to fashion the eukaryotic genome and which gave rise to mitochondria.
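The repressor and inducer logic of an operon, as described above, reduces to a simple truth table. The sketch below is a deliberately simplified Boolean model of an inducible operon; the function `operon_transcribed` and its parameters are hypothetical, and real operons involve graded rather than all-or-nothing responses.

```python
def operon_transcribed(repressor_present, inducer_present):
    """Return True if the structural genes of the operon are transcribed.

    The repressor blocks transcription by binding the operator, unless an
    inducer binds the repressor and prevents it from doing so.
    """
    repressor_on_operator = repressor_present and not inducer_present
    return not repressor_on_operator


# Enumerate all four combinations of repressor and inducer.
truth_table = {
    (repressor, inducer): operon_transcribed(repressor, inducer)
    for repressor in (False, True)
    for inducer in (False, True)
}
```

In this model the genes are silent in only one of the four states: repressor present and inducer absent.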
Self-splicing Group II introns serve as catalytic RNAs (ribozymes) and mobile retroelements, which reinsert themselves into the genome after they are snipped out (Finnegan, 1989; Moran et al., 1999; Roy and Gilbert 2006). They can change their position within the genome and can influence the expression of different sequences of genes in a step-wise temporal-sequential fashion (Dibb & Newman, 1989; John & Miklos, 1988; Kuhsel, et al. 1990).
Group II introns therefore, have the mobile characteristics of transposons and retrotransposons and also serve as transposable genetic elements (Crick, 1979; Coghlan and Wolfe, 2004; Finnegan, 1989; Hickey 1992; Moran et al., 1999). Likewise, some novel introns appear to arise by transposon insertions (Crick, 1979; Dibb & Newman, 1989; John & Miklos, 1988; Kuhsel, et al. 1990). Conversely, some retrotransposons, which have the ability to reinsert themselves, appear to have evolved from mobile group II introns.
Introns and transposable elements (TEs) are intimately linked and in some instances are indistinguishable. Eukaryotic genomes contain numerous TEs, many of which are found in introns (Nekrutenko and Li 2001). Most eukaryotic genomes are littered with introns and transposable elements, and many TEs are located within introns or have been inserted into exons during evolution (Nekrutenko and Li 2001; Hallick et al., 1993).
Coghlan and Wolfe (2004) have examined intron matches and found that around 70% have a nucleotide identity identical to transposable elements. In many cases the new intron is homologous to a transposon and to another intron, indicating the intron acted as a transposon and made a copy of itself which was inserted into another region of the genome. In this manner introns duplicate themselves, jump to different regions of the genome, and can coordinate gene expression in a wide range of gene networks. In some cases what appears to be a new intron is in fact an intron reinsertion, transcript retroposition, intron duplication, or gene conversion. If due to duplication then deletion of the original intron following transposition, then intron gains and losses may be one and the same. However, intron loss may also be a function of transfer to another organism.
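The point that a single transposition can register as both a gain and a loss can be made concrete with a small sketch. It assumes that the introns of a genome are recorded as a set of (gene, position) pairs sampled before and after the event; the data and the function `gains_and_losses` are hypothetical.

```python
def gains_and_losses(before, after):
    """Compare two snapshots of intron positions.

    Returns a pair (gained, lost): positions present only afterwards and
    positions present only beforehand.
    """
    return after - before, before - after


# Hypothetical starting state: two introns.
before = {("gene1", 120), ("gene2", 45)}

# Duplication followed by deletion of the original: the intron at gene1:120
# ends up at gene3:300 only.
moved = (before - {("gene1", 120)}) | {("gene3", 300)}

# Duplication alone: the original is retained and a copy appears at gene3:300.
copied = before | {("gene3", 300)}

moved_gained, moved_lost = gains_and_losses(before, moved)
copied_gained, copied_lost = gains_and_losses(before, copied)
```

In the first case a comparison of snapshots records one gain and one loss even though only one intron was involved, whereas duplication alone records a gain with no loss.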
The original introns were likely highly mobile, retrotransposable genetic elements which actively invaded the eukaryotic genome at the outset of eukaryotic evolution, relying in part on internally encoded enzyme activities for mobility.
13. TRANSPOSONS, INTRONS & GENE ACTIVATION VS GENE EXPRESSION
Introns also insert themselves into introns. The genomes of numerous species contain introns-within-introns (twintrons), indicating that introns are also targets of intron insertions (Copertino and Hallick 1991; Doetsch et al., 2001). Thus introns may also regulate introns.
TEs inserted into introns also affect RNA processing, and an intronic TE can render its host gene susceptible to siRNA-mediated transcriptional gene silencing (Doetsch et al., 2001). Therefore, they can turn genes on, or off.
The majority of all introns in the eukaryotic and human genome have Alu insertions (Grover et al. 2004). These Alu enzymes cut up foreign DNA in a process called "restriction" and are also found in bacteria and archaea (Arber and Linn 1969; Krüger and Bickle 1983). Possibly they were donated to the eukaryotic genome by prokaryotes, perhaps as a protection against viruses. "Restriction" is yet another means by which introns can silence genes, including nearby genes, as well as engage in cutting and splicing.
Moreover, transposons/introns, in association with RNA, can serve as regulators of gene expression and chromosome segregation by inserting and introducing heterochromatin which prevents gene expression by wrapping the gene in a protective protein coat (Hall et al., 2002; Grewal and Moazed 2003; Grewal and Martienssen, 2002; McClintock 1950; Volpe et al., 2002). Indeed, heterochromatin is characterized by a high density of transposons (Volpe et al., 2002). TE insertion therefore, can disrupt the coding sequences of a gene and inhibit the production of viable gene products.
These mechanisms mediating gene silencing and activation have also been adopted to evolve new traits (Liu et al., 2004). TE insertion within promoters, introns, and untranslated regions, can directly trigger incredible genetic variation and the full gambit of phenotypes, ranging from subtle epigenetic regulatory perturbations to the complete loss of gene function (Kidwell and Lisch, 1997; Wessler, 1988). That is, by turning genes on and off, different regions of a gene network may be activated and different products can be produced.
TEs that insert into introns are sometimes spliced out during mRNA processing. Even when spliced, however, these TE inserted introns can effect regulatory sequences and gene regulation in numerous ways including triggering or suppressing gene expression in certain tissues (Greene et al., 1994). Moreover, intronic transposable elements and transposons can significantly affect the expression of nearby genes (Finnegan, 1989; Dibb & Newman, 1989; John & Miklos, 1988; Kuhsel, et al. 1990; Lippman et al. 2004). Gene silencing is accomplished in a step-wise process involving RNA and the methylation of histones (Grewal and Martienssen, 2002; Hall et al., 2002).
Group II and III intronic retroelements often insert themselves into exons. Once inserted they are quickly integrated within these exonic sequences (Hallick et al., 1993) and can easily suppress these genes. Group III introns are sometimes formed from the domains of two individual group II introns (Hong and Hallick, 1994). The group III introns and group II introns also share a common evolutionary ancestor, which is linked to the alpha bacteria progenitor as well as archaea.
These introns possess the genetic mechanisms which allow them to be efficiently spliced out of transcripts, and to reinsert themselves in another part of the genome. They are able to demarcate coding sequences and to regulate gene expression in different regions of the genome, perhaps simultaneously as well as sequentially. Thus they can guide the activity of a number of gene networks to coordinate gene expression.
Therefore, introns, which may have originated in prokaryotes, can duplicate and give birth to themselves, and possess the genetic machinery which enables them to propagate throughout the genome and to regulate gene expression via silencing and restriction. As is also demonstrated by their highly conserved nature, these are not chance or random events.
14. INTRONS & RNA
Some introns may also propagate at the RNA level, including within messenger RNA. Messenger RNA (mRNA) is transcribed from a DNA template and carries the code for specific protein products to the ribosomes for protein synthesis. These introns mark which portions of the code are to be transcribed and translated; they are then snipped out and may be reinserted (spliced) into another region of the genome which lacks an intron.
Presumably, the new intron-containing RNA is reverse-transcribed and undergoes gene conversion leading to a new intron. Therefore, via reverse-splicing an excised intron sometimes reintegrates back into a different site in the same mRNA (Coghlan & Wolfe 2004; Tarrío et al., 1998) thereby exerting multiple coordinated influences on gene expression and protein synthesis.
Introns may have been the original information source for the creation of genes which code for mRNA. Likewise, genes involved in mRNA processing and splicing, and germline-expressed genes, preferentially gain introns (Roy 2004). By contrast, introns/TEs are generally excluded from mRNAs of highly conserved genes (van de Lagemaat et al., 2003).
A gene ontology analysis has demonstrated that novel introns are unusually frequent in genes with mRNA processing functions, relative to germ-line-expressed genes. This suggests that it is the function of these genes, rather than their mode of transcription, that makes them amenable to gaining introns (Coghlan & Wolfe 2004). Thus, introns regulate gene expression or suppression and control their own transposition to different regions of the genome. These properties enable introns to coordinate the expression or suppression of a wide network of genes.
For example, RNA not only serves as a messenger but can interfere with, inhibit, and silence gene expression (Hall et al., 2002). This is accomplished, in association with transposons/introns, via heterochromatin formation whose repressive capacity is mediated by components of the RNA interference (RNAi) machinery. This RNAi machinery acts to nucleate heterochromatin assembly and can initiate and propagate regional heterochromatic inhibition and gene silencing (Hall et al., 2002; Volpe et al., 2002). RNAi in association with introns/transposons can even control chromosome segregation and the expression of large chromosome domains (Grewal and Moazed 2003).
Thus, introns and transposons can exert regulatory control of individual genes, chromosomes, and thus the entire genome.
TE-induced genetic alterations and changes in regulatory sequences are of extreme evolutionary significance to their hosts and to the metamorphosis and evolution of future species (Britten 1996). TEs, especially when inserted into introns, can alter the size and arrangement of whole genomes, induce changes in single nucleotides, and generate new genetic variation on a scale, and with a degree of sophistication, ranging from subtle to dramatic alterations in the development and organization of tissues and organs (de Jong & Scharloo, 1976; Dykhuizen & Hart, 1980; Finnegan, 1989; Dibb & Newman, 1989; Gibson & Hogness, 1996; John & Miklos, 1988; Kuhsel, et al. 1990; Moran et al., 1999; Polaczyk et al., 1998; Rutherford & Lindquist, 1998; Strachan & Read, 1996; Wade et al., 1997). Such changes appear most likely if these insertions occur in coding regions and often confer useful traits on the host, as well as guide, coordinate, and regulate evolution and metamorphosis.
15. INTRONS INFECT OTHER SPECIES
Between 35% and 50% of the human genome is ultimately derived from transposable elements (International Human Genome Sequencing Consortium 2001; Lander et al., 2001; Smit 1996; Yoder et al., 1997), and there are many examples of human genes derived from single transposon insertions (Nekrutenko and Li, 2001; Sakai et al. 2007). Moreover, large numbers are found in human protein coding genes (Nekrutenko and Li, 2001).
In a study of genome-wide impact of transposable elements on evolution, Nekrutenko and Li (2001) found that almost 89% of these TEs reside within 'introns' and were recruited into coding regions as novel exons, such that it appears that TE insertion might create new genes (Nekrutenko and Li, 2001) and recruit new exons (Sakai et al. 2007), which would in turn, affect and accelerate species divergence. Numerous studies have in fact found that TEs in the mammalian genome promote the variation and diversification of genes, and affect the expression of many genes through the donation of transcriptional regulatory signals (Thornburg et al., 2006; van de Lagemaat et al., 2003; Jordan et al., 2003).
TEs, therefore, contribute to pre-transcriptional gene regulation, especially by moving transcriptional signals within the genome, which in turn leads to new gene expression patterns (Thornburg et al., 2006) and the creation of new genes from old genes (Nekrutenko and Li, 2001; Sakai et al. 2007). Further, TEs are involved in gene duplication and the creation of large numbers of interspersed repetitive sequences (Smit 1996). By contrast, mRNAs of highly conserved genes are generally devoid of TEs (van de Lagemaat et al., 2003).
TEs are more frequent in duplicate than single copy protein coding genes (Sakai et al. 2007), indicating they are involved in gene duplication and diversity (van de Lagemaat et al., 2003) and not gene conservation. Thus TEs serve as recombination hot spots and may express or create specific cellular functions through the control of protein translation and gene transcription (Thornburg et al., 2006). In fact, because many TEs are taxon-specific, their integration into coding regions could accelerate species divergence and contribute to sudden bursts of evolutionary development (Jordan et al., 2003; Morgan 1993; Nekrutenko and Li, 2001; Sakai et al. 2007; van de Lagemaat et al., 2003).
Moreover, gene classes which react to external environmental stimuli, have transcripts enriched with TEs (van de Lagemaat et al., 2003). In addition, TEs are intimately involved in the simultaneous regulation of multiple genes (Jordan et al., 2003). Thus TEs can trigger gene expression in numerous genes simultaneously in response to changing environmental conditions; and this may include whole genome duplication and/or explosive evolutionary leaps after long periods of evolutionary equilibrium.
The life cycle of TEs in any single phylogenetic lineage can apparently last for many thousands or millions of years and can be considered as a succession of six phases: dynamic replication, movement to another region of the genome, transfer to another species, activation, inactivation, degradation (Kidwell, 1993; Miller et al., 1996).
TEs are intrinsically parasitic (Doolittle and Sapienza, 1980; Dujon, 1989; Orgel and Crick, 1980; Hickey 1982; Kiyasu and Kidwell 1984; McDonald 1993; Yoder et al., 1997), and can easily duplicate themselves (Plasterk and Sherratt, 1995) and invade new species (Dujon, 1989; Dujon et al., 1989; McDonald 1993). A proclivity for horizontal transfer is consistent with the role of TEs as genomic parasites. TEs, therefore, also act as plasmids.
Horizontal transfer to another host lineage provides the opportunity for active TEs to begin the cycle over again in yet another species (Dujon, 1989; Dujon et al., 1989; Hurst et al., 1992; Kidwell, 1993; 1994; McDonald 1993) or to ensure that all members of the same species undergo the same genetic and evolutionary changes at the same time (McDonald 1993).
Moreover, this enables these intronic TEs to coordinate gene expression among multiple members of the same or divergent species, such that different species may evolve in tandem or develop complementary traits at the same time.
These TEs can survive over long periods of evolutionary time by spreading throughout numerous genomes belonging to numerous divergent and subsequent species. However, once transferred, transposed, and inserted, these TEs may serve only to inhibit gene expression (Waterland and Jirtle, 2003; Yoder et al., 1997). It may take hundreds of millions or even billions of years, before these genes become active and begin expressing new functions, new characteristics, and even new species; and this may require major changes in the environment and the elimination of suppressive influences.
16. GENE ACTIVATION & SUPPRESSION
Gene expression can be restricted and inhibited by a variety of mechanisms and proteins, such as by "restriction" via Alu enzymes (Arber and Linn 1969; Krüger and Bickle 1983), or the binding of a repressor molecule or protein to the operator to prevent transcription (Blumenthal et al., 2002; Salgado, et al., 2000), or via methylation and/or the generation of heterochromatin (Waterland, 2006; Waterland and Jirtle, 2003; Yoder et al., 1997).
Further, TEs inserted into introns can inhibit mRNA processing, and can render numerous genes susceptible to siRNA-mediated transcriptional gene silencing (Doetsch et al., 2001). Heterochromatin formation and its repressive capacity are also mediated by RNA interference (RNAi) machinery (Grewal and Moazed 2003; Hall et al., 2002; Volpe et al., 2002). Therefore, they can turn genes on, or off.
Transposons which use the gene replication machinery to reproduce themselves, also utilize methylation to prevent their own replication and to prevent the expression of nearby genes (Yoder et al., 1997; Rakyan et al., 2002). Most transposable elements in the mammalian genome, along with the genes positioned near them, are silenced by methylation (Yoder et al., 1997; Rakyan et al., 2002). DNA methylation involves four atoms, the methyl group, which attaches to and coats the gene thus silencing the gene by preventing its expression. Methylation is commonly employed to inactivate a variety of genes (Wolff et al., 1998; Yoder et al., 1997; Van den Veyver 2002). However, by inactivating a TE, methylation may instead induce gene expression.
Transposable elements, therefore, in conjunction with methylation, "restriction," siRNA-mediated transcriptional gene silencing, and the generation of heterochromatin, commonly silence or activate various genes, and can cause considerable phenotypic variability, making each individual mammal a "compound epigenetic mosaic" (Whitelaw and Martin, 2001).
17. ENVIRONMENT & GENE EXPRESSION: METHYLATION
Not just transposons and introns, but the environment also activates or silences genes, and can affect methylation. In fact, those genes which are most responsive to external environmental stimuli have transcripts enriched with TEs (van de Lagemaat et al., 2003). However, certain environmental triggers can induce or remove methylation, thus enabling the expression of these genes (Waterland and Jirtle, 2003; Wolff et al., 1998).
For example, it has been demonstrated that nutritional supplementation to the mother can permanently alter gene expression in her offspring by activating or silencing Agouti genes via methylation (Waterland and Jirtle, 2003; Wolff et al., 1998). In one set of experiments, pregnant mice that received dietary supplements of vitamin B12, folic acid, choline and betaine gave birth to babies with brown coats, whereas the control group gave birth predominantly to mice with yellow coats (Waterland and Jirtle, 2003). These four nutrients supply methyl groups, which reduced the expression of a specific gene, Agouti, via DNA methylation. Thus, diet altered the color of the coats by acting on gene selection. This effect is referred to as "epigenetic" because it occurs over and above the gene sequence without altering the four-unit genetic code.
Likewise, genes passed down from ancestral species can be expressed by varying the environment and through other stresses including fluctuations in temperature, oxygen levels, and diet (e.g., de Jong & Scharloo, 1976; Dykhuizen & Hart, 1980; Gibson & Hogness, 1996; Polaczyk et al., 1998; Rutherford & Lindquist, 1998; Wade et al., 1997). Change the environment, and gene expression patterns may also be altered, giving rise to slight or major differences in the products produced. For example, increases in the levels of oxygen, calcium, and other elements and gasses significantly impacted gene selection around 540 mya, triggering what became the Cambrian Explosion.
18. GENE EXPRESSION, HSP90 & MOLECULAR SWITCHES
These genetic-environmental interactions on gene expression are mediated through protein products like Hsp90 (Rutherford & Lindquist, 1998). Hsp90 is a highly conserved multifunctional protein which targets multiple signal transducers that act as "molecular switches" controlling gene expression in eukaryotes ranging from yeast to humans (Feder and Hofmann 1999; Rutherford 2003; Sangster et al., 2004).
Hsp90 "normally suppresses the expression of genetic variation affecting many developmental pathways" (Rutherford & Lindquist, 1998). Hsp90 does not act alone but is part of a network that includes other proteins such as Hsp70 and p23 (Pratt and Toft 2003). As summarized by Cossins (1998, p. 309), these and related regulatory and signaling proteins are sometimes referred to as "chaperones and have been discovered in all organisms studied so far. These signaling proteins form complex webs of molecular switches that allows signals both within and between cells to be transduced into responses." However, the coordination of these responses can be influenced by the environment.
"Hsp90 is one of the more abundant chaperones. At normal temperatures it binds to a specific set of proteins, most of which regulate cellular proliferation and cell development" (Cossins, 1998). At significantly lower or higher temperatures Hsp90 ceases to bind to these proteins, thus allowing for gene expression (Rutherford and Lindquist 1998). Thus these chaperones can also act for or against genetic variation and can trigger or prevent the expression of silent characteristics (Cossins, 1998; Rutherford and Lindquist 1998). For example, these proteins may prevent DNA expression by acting as a buffer between silent genes and their nucleotides and the environment. Therefore these genes are inhibited and are only expressed in reaction to changes in the environment, including temperature change.
19. HSP90, GENE EXPRESSION, NUCLEAR RECEPTORS & SNOW BALL EARTH
In response to significantly lowered or increased temperatures, Hsp90 levels are reduced and Hsp90 no longer acts as an effective buffer against the expression of signal-transduction proteins, which leads to the expression of genes that had been inhibited (Rutherford and Lindquist 1998). This allows for the expression of hidden genetic variation, leading to new developmental and evolutionary patterns. As demonstrated by Rutherford and Lindquist (1998, p. 341), Hsp90 acts as an "explicit molecular mechanism that assists the process of evolutionary change in response to the environment" and it accomplishes this through the "conditional release of stores of hidden morphological variation.... perhaps allowing for the rapid morphological radiations that are found in the fossil record."
This has important implications for evolution, as Earth has repeatedly undergone global ice ages followed or preceded by periods of high temperatures secondary to greenhouse warming. As lowered or raised temperatures can eliminate the suppressive influences of chaperones such as Hsp90, dramatic climate change, such as global glaciation or global warming, could affect a wide variety of signal-transduction proteins that are stabilized by Hsp90, thereby inducing gene expression and the expression of precoded traits, and thus the next stage of evolutionary metamorphosis.
The Hsp90 complex also regulates nuclear receptors (Arbeitman and Hogness 2000; Feder and Hofmann 1999; Mayer and Bukau 1999; Picard 2002; Rutherford 2003; Pratt and Toft 2003). These include receptors for retinoic acid, thyroid hormone, signal-transduction proteins, ligand-dependent transcription factors, tyrosine/serine/threonine kinases, and steroids. Most nuclear receptors appear to be restricted to metazoans (Laudet 1997; Escriva et al. 2000; Thornton 2001; Baker 2005). However, the metamorphosis of the first metazoans did not take place until during or after the 3rd world wide glaciation.
As will be detailed in part 3 and elsewhere (Joseph 2009b), Earth has undergone at least three major world-wide glaciations (Hoffman et al. 1998; Hyde et al., 2000; Runnegar 2000; Lubick 2002). Each was followed by periods of global warming and the diversification and evolution of new species. However, the last glaciation, which began around 635 mya, is also associated with the evolution of the first primitive metazoan, i.e. a "living fossil" known as Trichoplax, around 630 mya (Srivastava, et al., 2008). Trichoplax, however, was not a true bilateral animal and lacked muscle, heart, eyes or brain. Thus, although its genome likely possessed all the genes that code for these structures, including nuclear receptors (Srivastava, et al., 2008), the preponderance of evidence suggests they had not been expressed.
By the end of the 3rd glaciation, around 580 mya, what may be the first bilateral-symmetrical metazoan had evolved; an Echinodermata, Arkarua adami (Gehling 1987). In fact, a wide range of increasingly complex species appeared following the 3rd glaciation and ensuing warming cycle, leading to an explosive burst of evolutionary change and diversification (beginning 540 mya), including the appearance of complex animals and chordates equipped with bilateral bodies, eyes, and brains (Chen et al., 1995, 1999, 2003; Shu et al., 2001; Siveter et al., 2001). It was during this same time period, known as the Cambrian Explosion, that the genome doubled in size (Holland 1994, 1999; Dehal and Boore 2005), a period associated with the evolution of every phylum in existence today.
It can be assumed that the metamorphosis of the first true metazoans and chordates was paralleled by the functional expression of those nuclear receptors which are regulated by the Hsp90 protein complex and associated with metazoans. Thus, the explosion of complex life at the onset of the Cambrian could be related to the effects of world wide glacial freezing followed by global warming on the Hsp90 protein complex. This may have led to activation of genes that had been suppressed, and even the duplication of individual genes and the entire genome, thus enabling their expression.
In fact, the genome underwent duplication at this time (Holland 1994, 1999; Dehal and Boore 2005) and nuclear receptors appear to have evolved by a series of gene duplications, followed by functional expression of the duplicated gene (Laudet 1997; Baker 1997, 2003; Thornton 2001). Therefore, it appears that the genes coding for sex steroid, adrenal and other nuclear receptors, which have an important role in development and sexual differentiation, underwent duplications in chordates, possibly during the Cambrian Explosion (Baker 1997, 2003; Laudet 1997; Escriva et al. 2000; Thornton 2001), and were expressed once freed of inhibitory restraints.
Therefore, Hsp90, which can prevent the expression of a variety of genes or enable these genes to express functions which had been suppressed, may have been impacted by the extreme climatic changes in global temperatures. These global temperature changes, which may have been induced by biological activity, in turn affected a wide variety of signal-transduction proteins that are stabilized by Hsp90, thereby allowing for their expression and thus the metamorphosis of complex species, including those which appeared during the Cambrian Explosion.
Genes often interact in networks. Change the environment and gene expression patterns may be altered, giving rise to slight or major differences in the products produced and allowing for the expression of pre-determined traits (Rutherford & Lindquist, 1998). As demonstrated by experiments performed by Rutherford and Lindquist (1998), when these suppressive protein-buffering actions are altered by environmental change, including temperature fluctuations, "variants are expressed and selection can lead to the continued expression of these traits, even when" the actions of these repressor proteins are restored. However, it was also the actions of genes, that is, biological organisms, which were largely responsible for these dramatic changes in the climate and global temperatures.
Genes affect the environment and the environment acts on gene selection, creating an interactive feedback loop which significantly impacts the speed and rate of evolutionary metamorphosis. Switching these repressor proteins and other regulating genetic mechanisms off or on requires contact with and exposure to specific environmental agents. Presumably, these environmental influences directly impacted those genetic mechanisms involved in gene silencing, gene duplication, and gene expression, thereby giving rise to traits, functions, organs, and species which had been precoded into silent genes inherited from ancestral species, and which were donated to the eukaryotic genome by prokaryotes--the ancestors of which arrived on Earth from other planets.
Part 3. Genes, Microbes & Metazoan Metamorphosis: