Promoter regions are highly acetylated in humans, and many promoters contain CpG islands, which are important transcription-controlling elements and are unmethylated under normal circumstances. CpG islands have been shown to be highly acetylated. The locations of the 46,813 acetylation islands identified are significantly correlated with conserved noncoding sequences and many of them are colocalized with known regulatory elements in T cells (immune cells). The human genome project has identified 27,058 CpG islands, highly concentrated in a 1-kb region surrounding the transcription initiation site.
Telomeres are only weakly acetylated.
Acetylation - methylation.
Histone modifications (post-translational modifications) play an important role in gene regulation. Histones are the most abundant proteins in chromatin and play key roles in modulating chromatin folding to regulate the accessibility of DNA for transcription, replication, recombination and repair; that is, they change the chromatin structure. Histone acetylation is required for gene activation and cell growth, and many enzymes that regulate histone acetylation and deacetylation are transcriptional cofactors, so acetylation patterns usually correlate with transcriptional activity. The acetylated histones may recruit and/or stabilize transcription factors and/or chromatin remodeling enzymes at their target sites in chromatin. The levels and genomic distributions of specific modifications are regulated to provide chromatin responses to physiological stimuli such as hormones and growth factors. Also important are the mechanisms by which aberrant histone modifications act as epigenetic factors in the pathogenesis of diseases, including cancer.
The folding is the important signal. What does this mean? That the importance of the genome is to distribute active, regulating sites, that is, EM effects, magnetism and attraction-repulsion, not the DNA code per se. The 'letters' are meaningless alone; only the context gives a meaning. It matters whether the amino acid is lysine or cysteine because the amino acids have different 'metabolic pathways' in terms of EM effects.
Acetylation (of lysine) loci may mediate global chromatin remodeling and gene activation, and are epigenetic marks that allow prediction of functional regulatory elements. Methylation (of cytosine) is the 'lock function' on the chromatin. A diverse array of posttranslational modifications largely impinge on histone N termini, thereby regulating access to the underlying DNA. Modifications can generate synergistic or antagonistic interaction affinities for chromatin-associated proteins. The combinatorial nature of histone N-terminal modifications thus reveals a "histone code" that considerably extends the information potential of the genetic code. This epigenetic marking system represents a fundamental regulatory mechanism that has an impact on most, if not all, chromatin-templated processes. Histones provide complex recognition surfaces, or a code, for factors that regulate chromatin structure and gene activity.
The methylation pattern is heritable through cell division. DNA methylation plays an important role in cell differentiation during development, in immunity, and in other processes. Epigenetics is the study of heritable changes in chromatin (e.g., histone acetylation and DNA methylation) that do not involve changes in the DNA sequence. Imprinting involves the inheritance of a silenced allele of a gene through either the paternal or the maternal germline.
The fifth base of human DNA, 5-methylcytosine (5-mC), recognized in 1948, is formed by the addition of a methyl group to the carbon 5 position of the cytosine ring. The reaction is catalyzed by DNA methyltransferase in the context of the sequence 5'-CG-3', which is also referred to as a CpG dinucleotide.
Schematic representation of the biochemical pathways for cytosine methylation, demethylation, and mutagenesis of cytosine and 5-mC.
This inverse relationship between cytosine methylation and transcription has been observed in a large number of genes, although not universally. Numerous reports have shown the ability of promoter DNA methylation to inhibit transcription of a wide variety of genes in in vitro transfection assays, and in some cases, such methylation corresponds to the inactive state of the gene under study in vivo. Three possible mechanisms have been proposed.
- direct interference with the binding of specific transcription factors to their recognition sites in their respective promoters. Several transcription factors, including AP-2, c-Myc/Myn, the cyclic AMP-dependent activator CREB, E2F, and NF-κB, recognize sequences that contain CpG residues; others are indifferent, and many factors have no CpG dinucleotide residues in their binding sites.
- direct binding of specific transcriptional repressors to methylated DNA. Two such factors, MeCP-1 and MeCP-2 (methylcytosine binding proteins), bind to methylated CpG residues in any sequence context. MeCP-1 binds to DNA containing multiple symmetrically methylated CpG sites, as opposed to hemimethylated CpGs, and manifests as a large complex on electrophoretic mobility shift assay. The complex stability varies, and a weak or a strong promoter can make the shift. A complex with electrophoretic mobility similar to MeCP-1 forms efficiently with the methylated but not with the unmethylated embryonic rho-globin gene promoter sequences, suggesting a role for MeCP-1 or a similar complex in developmental silencing of embryonic globin genes.
- alteration of chromatin structure, through which methylation may mediate transcriptional repression.
Eukaryotic genomes are not methylated uniformly but contain methylated regions interspersed with unmethylated domains. During evolution, the dinucleotide CpG has been progressively eliminated from the genome of higher eukaryotes and is present at only 5% to 10% of its predicted frequency. Cytosine methylation appears to have played a major role in this process, because most CpG sites lost represent the conversion through deamination of methylcytosines to thymines. Approximately 70% to 80% of the remaining CpG sites contain methylated cytosines in most vertebrates, including humans. These methylated regions are typical of the bulk chromatin that represents the late replicating DNA with its attendant histone composition and nucleosomal configuration and is relatively inaccessible to transcription factors. In contrast to the rest of the genome, smaller regions of DNA, called CpG islands, ranging from 0.5 to 5 kb and occurring on average every 100 kb, have distinctive properties. These regions are unmethylated, GC rich (60% to 70%), have a ratio of CpG to GpC of at least 0.6, and thus do not show any suppression of the frequency of the dinucleotide CpG. Chromatin containing CpG islands is generally heavily acetylated, lacks histone H1, and includes a nucleosome-free region. This so-called open chromatin configuration may allow, or be a consequence of, the interaction of transcription factors with gene promoters. The DNA-MTase protein has inherent de novo methylating activity that may be altered by protein-protein interactions and enhanced by aberrant structures or 5-mC residues in the substrate DNA. Several studies in the last few years have demonstrated an increase in DNA-MTase activity in neoplastic cells.
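The CpG island criteria quoted above (length 0.5 to 5 kb, GC content of roughly 60% or more, CpG to GpC ratio of at least 0.6) can be turned into a small check. The following is only a minimal sketch: the function names, the default thresholds as parameters, and the test sequences are illustrative assumptions, and real island-finding tools use sliding windows and further filters.

```python
def cpg_stats(seq):
    """Return GC fraction, CpG/GpC ratio and observed/expected CpG for a DNA string."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # Count dinucleotides with a sliding window of width 2.
    cpg = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    gpc = sum(1 for i in range(n - 1) if s[i:i + 2] == "GC")
    gc_fraction = (c + g) / n if n else 0.0
    if gpc:
        cpg_to_gpc = cpg / gpc
    else:
        cpg_to_gpc = float("inf") if cpg else 0.0
    # Observed CpG count relative to the count expected from base composition.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"gc_fraction": gc_fraction, "cpg_to_gpc": cpg_to_gpc, "obs_exp": obs_exp}


def looks_like_cpg_island(seq, min_length=500, min_gc=0.60, min_ratio=0.6):
    """Apply the thresholds quoted in the text; the parameter names are assumptions."""
    stats = cpg_stats(seq)
    return (
        len(seq) >= min_length
        and stats["gc_fraction"] >= min_gc
        and stats["cpg_to_gpc"] >= min_ratio
    )


# Hypothetical toy sequences, not real genomic data.
island_like = "CG" * 300      # 600 bp, pure CpG repeats
bulk_like = "ATGCAT" * 100    # 600 bp, no CpG at all
print(looks_like_cpg_island(island_like), looks_like_cpg_island(bulk_like))
```

The observed/expected value is the quantity behind the statement that CpG occurs at only 5% to 10% of its predicted frequency in bulk DNA, whereas islands show no such suppression.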
The intrinsic mutagenicity of 5-mC, activation of proto-oncogenes through hypomethylation, transcriptional inactivation of tumor-suppressor genes through hypermethylation, and defects in chromosomal segregation due to failure of de novo methylation may all contribute to neoplasia. Studies have linked two global mechanisms of gene regulation, DNA methylation, and histone deacetylation. Further investigations are necessary to understand the complex links between the methyltransferases, demethylases, methyl cytosine binding proteins, histone acetylation, and the transcriptional activity of genes.
Two regulatory ways:
1. Approximately half of all genes in mouse and humans (ie, 40,000 to 50,000 genes) contain CpG islands (housekeeping genes that have a broad tissue pattern of expression).
2. Tissue-specific genes (40%) without CpG islands are variably methylated, often in a tissue-specific pattern, and usually methylation is inversely correlated with the transcriptional status of the genes.
The target site for DNA-MTase in DNA is the dinucleotide palindrome CG (commonly referred to as CpG, with p denoting the phosphate group).
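That CG is a palindrome means its reverse complement is again CG, so every CpG on one strand is paired with a CpG on the other. This is what makes symmetric methylation of both strands possible. The short sketch below illustrates the point; the helper names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpg_positions(seq):
    """Return 0-based start positions of every CG dinucleotide, read 5' to 3'."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


seq = "ATCGGACGTT"                       # hypothetical example sequence
forward = cpg_positions(seq)
reverse = cpg_positions(reverse_complement(seq))
# Map reverse-strand coordinates back onto the forward strand.
mirrored = sorted(len(seq) - 2 - i for i in reverse)

print(reverse_complement("CG"))          # the dinucleotide is its own reverse complement
print(forward == mirrored)               # CpG sites coincide on both strands
```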
Although the protein complexes (holoenzymes) that form during transcription initiation, elongation, and RNA processing are being extensively characterized in vitro, the long-range interactions that determine the overall efficiency of transcription are less well understood in vivo. Key determinants of transcriptional initiation and reinitiation include promoters, enhancers, locus control regions (LCRs), and those involved in the organization of chromosomal and nuclear context. Position effect variegation is perhaps the best-known phenomenon, where the chromosomal position of a gene and promoter element can influence its overall expression. One reason why chromosomal context can have such a dominant effect relates to the position of the element in relation to the competing effects of ‘open’, transcriptionally active chromatin and ‘condensed’, transcriptionally silent chromatin. Within the context of the nucleus of the living cell, the degree of condensation of chromatin can be thought of as different ‘cloud’-like densities of chromatin loops depending on the degree of compaction. Even though the mechanisms and interactions that govern the processes of gene expression now appear more complex, the theme emerging from experimental investigation is that of a dynamic regulated system. There are multiple superfamilies of chromatin proteins that modify gene expression and chromatin condensation. Within the spatial confines of the nucleus, chromatin is organized into loops attached to the core structure of the chromosome, which can then unravel into 30 nm fibers during interphase. The loops are attached at their base to a protein substructure termed either the nucleoskeleton, nuclear scaffold, or nuclear matrix. The attachment interactions are generally of functional importance, and the attached regions have been termed matrix attachment regions, scaffold attachment regions, or base of loops. We will refer to these collectively as loop attachment regions (LARs).
The current evidence from investigation of base of loop libraries and transient episomal reporter gene expression is that the LAR sequences map mainly to transcription units. Transcription and RNA processing are performed by very large protein complexes that are likely to be immobile structures within the gel-like nucleoplasm. If the RNA polymerase tracks along the template DNA dragging its RNA behind, as depicted in most models of transcription, then the RNA molecule will be wound around DNA every 10 bp. If RNA secondary structures form cotranscriptionally, then the complex is likely to result in a knot of protein and nucleic acid. For genes several kilobases long, the energy required to untangle RNA from DNA would appear to be a costly topological puzzle. The entwining problem for the cell can be sidestepped if the polymerase is made immobile and topoisomerases remove supercoils generated in the template DNA. The experimental evidence for immobile polymerases comes from a variety of observations. Perhaps the most compelling is the observation that active RNA polymerases appear to concentrate in sites, termed factories, the largest factory being the nucleolus. Labeling of nascent RNA in nuclei produces multiple foci when visualized with light microscopy. However, foci represent collections of nascent RNA around multiple polymerases (Fig. 1).
DNA is packed within the nucleus around histone octamers, protein complexes consisting of two copies each of four different histone proteins. Eight types of histone modifications are known (acetylation, methylation, phosphorylation, ubiquitylation, sumoylation, ADP ribosylation, deimination, and proline isomerisation). Each type of modification is specific to certain residues and has a different mechanism of function, and accordingly different functional consequences. There is no simple one-to-one correspondence between the type of modification and the functional consequence; rather, the combination of modification type, enzymatic activity, affected residue and the DNA sequence in the immediate vicinity of the affected histone determines the functionality of the modification in a very complex manner. The same type of modification can enhance or repress transcriptional activity, depending on the histone and residue on which it occurs. The transactivation potential of a given transcription factor depends on the degree of differentiation of the recipient cells, on the promoter structure, and on the affinity of the binding site.
The chromatin accessibility and gene expression of a genetic domain is correlated with hyperacetylation of promoters and other regulatory elements but not with generally elevated acetylation of the entire domain. Islands of acetylation are identified in the intergenic and transcribed regions.
An estimate is that about 2% (1%) of the unique sequences in the genome are associated with acetylated histones. Highly repetitive sequences (such as telomeres) are also associated with lower levels of histone acetylation.
Higher eukaryotic genomes differ from yeast in having much lower acetylation levels and more heterochromatic regions. (Is this linked to the intron sequences?)
What about sequence-association for features of the clustered promoters?
The clustering revealed three groups with major histone signal upstream, centered and downstream of the promoter. Narrow single peak promoters tend to have a concentrated activity of histone in the upstream region, while broad promoters tend to have a concentrated activity of histone and RNA polymerase II binding in the centered and downstream regions. A subset of promoters with high gene expression level, compared to subsets with low and medium gene expression, shows dramatic increase in histone activity in the upstream cluster only; this may indicate that promoters in the centered and downstream clusters are predominantly regulated at post-initiation steps. Furthermore, the upstream cluster is depleted in CpG islands and more likely to regulate un-annotated genes.
Clustering core promoters according to their surrounding acetylation signal is a promising approach for the study of histone modifications. When examining promoters clustered into groups according to their surrounding histone acetylation signal, we find that the relative localization and intensity of histone is very specific depending on characteristic sequence features of the promoter.
The association between the clusters and the three tested genomic features (promoter architecture, i.e. single peak vs. broad; CpG islands; and gene annotation) is highly significant.
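An association of this kind between cluster membership and a categorical feature is typically assessed on a contingency table. The text does not say which test was used, so the sketch below uses Pearson's chi-square statistic purely as one common choice. The counts are hypothetical placeholders, not data from the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = upstream / centered / downstream cluster,
# columns = promoters with / without a CpG island.
hypothetical_counts = [
    [30, 70],
    [60, 40],
    [55, 45],
]
stat, dof = chi_square_statistic(hypothetical_counts)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom indicates that the feature is not distributed independently of the clusters.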
Promoters of genes that transcribe relatively large amounts of mRNA have similar structures. They have a TATA sequence (sometimes called the TATA box or Goldberg-Hogness box) about 30 base pairs upstream from the site where transcription begins, as well as one or more promoter elements further upstream (CAAT and CACCC sequences). The TATA box is present in only about 10 to 15% of human genes. Two further core promoter motifs, the DPE and the MTE, have been discovered. Both the DPE and MTE are conserved from Drosophila to humans, and are located downstream of the transcription start site. There are sharp differences in the properties of TATA-dependent versus DPE-dependent core promoters. For example, Caudal, which is a sequence-specific DNA-binding protein that is a master regulator of the homeotic (Hox) genes, is a DPE-specific activator. Thus, enhancer-core promoter specificity can be used to create gene regulatory networks.
The "functional anatomy" of a promoter region showed that the first 109 base pairs preceding the cap site were sufficient for the correct initiation of gene transcription by RNA polymerase. The enzyme responsible for the transcription of messenger RNAs is RNA polymerase II, which binds to the promoter.
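The positional rule for the TATA box (about 30 base pairs upstream of the transcription start site) can be sketched as a windowed motif search. The pattern TATA(A/T)A(A/T) used here is a simplified consensus and an assumption, as are the window limits and the function name. The test sequence is made up.

```python
import re

# Simplified TATA consensus; an assumption, real promoters vary considerably.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def find_tata_offset(seq, tss_index, lo=-40, hi=-20):
    """Return the offset relative to the TSS of the first TATA-like match
    lying entirely within the window [lo, hi) upstream of the TSS, or None."""
    start = max(0, tss_index + lo)
    end = max(0, tss_index + hi)
    match = TATA_PATTERN.search(seq.upper()[start:end])
    if match is None:
        return None
    return start + match.start() - tss_index


# Hypothetical promoter: a TATA-like motif placed 30 bp upstream of the TSS.
toy_promoter = "G" * 70 + "TATAAAA" + "G" * 33
print(find_tata_offset(toy_promoter, tss_index=100))   # -30
print(find_tata_offset("G" * 110, tss_index=100))      # None
```

Downstream elements such as the DPE could be searched in the same way with a positive window, but their consensus sequences are not given in the text and are therefore not encoded here.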
Fig. Typical promoter region for a protein-coding eukaryotic gene. The gene diagrammed here contains a TATA box and three upstream promoter elements.
The CAAT and TATA sequence-boxes have been found to be critical elements in numerous eukaryotic promoters, but the CACCC sequence is seldom seen except in the β-globin gene promoters in several species. In humans, this sequence appears to be critical (usually no synthesis).
Promoters can function not only to bind RNA polymerase, but also to specify the places and times that transcription can occur from that gene. Histone modifications are one of the major mechanisms regulating gene expression, acting in combination with other mechanisms such as alternative promoter usage, alternative splicing, and microRNAs (a kind of 'language'). Acetylation occurs on the N-termini of the histone octamer proteins and neutralizes the basic charge of the affected lysine. As a result, the association between the DNA and the octamer becomes weaker, unravelling the DNA and making the genomic DNA more accessible to RNA polymerases and transcription factors. Like all histone modifications, acetylation can work on two different scales. Globally, the acetylation state of large genomic regions helps to define euchromatin and heterochromatin within the nucleus. Acetylation can also function locally, being restricted to short sequences of the genome, where it is associated with upregulated transcription of individual genes.
A promoter can be classified into different shape classes, the two most prominent being single peak (SP) and broad (BR) promoters (>97% of the total). Single peak promoters have the majority of their CAGE tags concentrated in a narrow region, while broad promoters have a more even, widespread distribution of start sites within the promoter. SP promoters are associated with the TATA box binding motif and tissue-specific expression, while BR promoters are associated with CpG islands and ubiquitously transcribed genes, including housekeeping genes. There is a more direct link (higher acetylation) between histone signal and gene expression level in the upstream cluster than in the centered and downstream clusters. Promoters can also be described as weak or strong. Broad promoters tend to regulate genes with a higher expression level than single peak promoters.
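The distinction between single peak and broad promoters can be illustrated with a simple rule on the distribution of CAGE tags. The sketch below calls a promoter SP when the majority of its tags fall within a few base pairs of the dominant start site. The window size, the 50% cutoff and the function name are assumptions made for illustration, not the classification procedure used in the original analysis.

```python
def classify_promoter_shape(tag_counts, half_window=2, min_fraction=0.5):
    """Classify a promoter as 'SP' or 'BR' from per-position CAGE tag counts.

    tag_counts: list of tag counts, one per base position in the promoter.
    Returns (label, fraction of tags near the dominant start site).
    """
    total = sum(tag_counts)
    if total == 0:
        raise ValueError("no CAGE tags in this promoter")
    # Dominant start site: the position with the most tags (first one on ties).
    peak = max(range(len(tag_counts)), key=tag_counts.__getitem__)
    lo = max(0, peak - half_window)
    hi = min(len(tag_counts), peak + half_window + 1)
    fraction = sum(tag_counts[lo:hi]) / total
    label = "SP" if fraction >= min_fraction else "BR"
    return label, fraction


# Hypothetical tag distributions.
narrow = [0, 0, 1, 50, 2, 0, 0, 0]
spread = [5] * 20
print(classify_promoter_shape(narrow))   # ('SP', 1.0)
print(classify_promoter_shape(spread))   # ('BR', 0.15)
```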
Around half of these un-annotated promoters are evolutionary conserved across mammals and are therefore likely to be promoters of yet undetected genes, including functional non-coding RNA genes. Large intergenic noncoding RNAs (lincRNAs) are a group of multi-exonic, functional RNAs that show strong conservation across mammals and are thought to be involved in various cellular processes, including embryonic stem cell pluripotency and differentiation, but they represent only a subset of the entire functional noncoding transcriptome. It is reasonable to assume that many of the un-annotated core promoters belong to ncRNA genes, yet undetected protein-coding genes, or may be alternative promoters of already annotated genes.
This association can be interpreted within the model of three main epigenetic modes of transcription initiation: genes experiencing initiation and elongation, genes experiencing transcription initiation but not elongation, and genes experiencing neither. The mechanisms of gene regulation in these groups may belong to the initiation or the elongation phase of transcription, respectively. This model, in combination with our observations, suggests that genes having the histone concentration in the centered and downstream region could predominantly be regulated at post-initiation steps. Such post-initiation regulation could be based on two general classes of regulation mechanisms: in one class, transcriptional pausing of RNA polymerase II, poor processivity, or abortive initiation prevents elongation. In the second class, transcription does take place but the transcript is immediately degraded by gene silencing.
With increasing promoter expression level, there is an increase in the number of promoters overlapping with repeat elements. Only ~5.8% of all lowly expressed promoters overlap with any of the repeat elements. For medium gene expression, ~7.8% of the promoters overlap with a repeat element, and for the promoters regulating highly expressed genes the figure is ~11.8%.
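Percentages like these come from counting how many promoter intervals overlap at least one repeat interval. A minimal sketch of that calculation follows, assuming half-open (start, end) coordinates on a single chromosome. The coordinates are invented for illustration and do not reproduce the figures above.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(promoters, repeats):
    """Fraction of promoter intervals that overlap at least one repeat interval."""
    if not promoters:
        return 0.0
    hits = sum(1 for p in promoters if any(overlaps(p, r) for r in repeats))
    return hits / len(promoters)


# Hypothetical coordinates on one chromosome.
promoters = [(0, 10), (20, 30), (40, 50)]
repeats = [(5, 8), (100, 120)]
print(f"{fraction_overlapping(promoters, repeats):.1%}")
```

The nested loop is quadratic in the number of intervals. For genome-scale data a sorted sweep or an interval tree would be the usual choice, but the counting logic is the same.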
Regulation of transcription of eukaryotic class-II genes with developmental, tissue and hormone-sensitive specificities requires a not yet understood organization and genome-wide co-ordination of signaling by many regulators. The genome-wide co-ordinated assembly of these regulators into higher-order complexes with target genes, to effect precisely timed changes in transcriptional activity, is one of the great remaining 'black boxes' in developmental biology. Recent findings on very tightly controlled genes have generated the notion of 'composite promoters' that have, and are necessarily dependent on, both a TATA box and an initiator. Thus far most examples of composite promoters have been from viruses, which depend on usurping the host transcriptional machinery at precisely the right moment in their life cycle and in just the right host-cell biological context. However, our studies have identified that a cellular gene, juvenile hormone esterase, behaves as if it possesses a composite core promoter. Transcription from the core promoter of the juvenile hormone esterase gene (-61 to +28) requires the presence of both an AT-rich motif (TATA box) and an initiator motif for any transcription to occur, when assayed either by transcription in vitro or by transient-transfection assay. Additional gel-shift experiments in which both the TATA box and initiator motifs are transversion mutated indicated that at least one additional binding site is utilized. The juvenile hormone esterase gene thus appears to be a model of a cellular composite core promoter with a multipartite, indispensable requirement not just for both the TATA box and the initiator, but also for at least a third core element.
The sequence of the modelled region is shown in the upper portion of the Figure. The transcription factor binding sites are shown within the boxed regions, with the bend angle and direction of the protein-induced bend shown above. The underlined bases indicate the bases on which the protein-induced bends were centred. The lower portion of the Figure shows a model of the promoter generated using Berkeley Enhanced Rasmol. The FMR1 model is superimposed, for scale, on the structure of DNA from the nucleosome core particle of Xenopus laevis. The boxed regions correspond to the boxed regions in the upper panel. Numbers indicate the first and last base of the modelled sequence.
To be continued.