- Four C5-Cytosine Modifications Comprise the Highest Diversity
- Two N6-Adenine Modifications Are Now Known
- 8-Oxoguanine Is a Regulatory Modification Via Alteration of G-Quadruplex Function
This blog represents an update to several earlier blogs on epigenomics, which defines the modified DNA bases formed after replication that influence the function of DNA in various ways. Epigenomics is a trending topic in nucleic acids research, as seen here by the marked increase in number of publications indexed to “epigenomics” as a keyword in PubMed during the past 10 years. The annual number was ~1,000 in 2018, which is the equivalent of ~3 publications per day, 7 days per week—that’s a lot of interest!
PubMed search and charting by Jerry Zon.
Prof. Sir Shankar Balasubramanian. Taken from commons.wikimedia.org and free to use.
The catalyst, so to speak, for writing this update was a recent perspective article I stumbled upon in the prestigious Journal of the American Chemical Society titled Detection, Structure and Function of Modified DNA Bases by Alexandre Hofer, Zheng J. Liu, and Shankar Balasubramanian, hereafter referred to as HLS. The last and corresponding author is Professor Sir Shankar, pictured here, whose seminal contributions to the emerging field of biologically functional G-quadruplexes (also trending) have been featured in past blogs in the Zone.
Prof. Sir Shankar was knighted in 2017 for his scientific achievements, which led me to an even more careful review of HLS, especially with regard to the perspectives offered. What follows draws from content in HLS, but it represents a limited and selective sampling. Interested readers should consult the publication by HLS for more information, as have more than 3,200 others in only about 4 months since its availability, according to the journal’s metrics as of this past July. There is also a 2018 review by Carell et al. titled Non-Canonical Bases in the Genome: The Regulatory Information Layer in DNA.
Methods to Detect and Map Modified Bases
Detection and Quantification by Liquid Chromatography-Mass Spectrometry: Typically, a DNA sample is enzymatically digested into nucleosides, which are subjected to liquid chromatography-mass spectrometry (LC-MS) to resolve and identify modified nucleosides. Secondary fragmentation by tandem MS (MS/MS) is used to aid in this identification process. LC-MS/MS is depicted here and explained elsewhere.
Detection of modified nucleobases by LC-MS/MS. Taken from Hofer et al. J. Amer. Chem. Soc. April 1, 2019, with permission. Copyright © 2019, American Chemical Society.
Use of a synthetic, stable-isotope (e.g. ¹³C, ¹⁵N, etc.) labeled analogue of the modified nucleoside of interest, as an internal standard, enables accurate quantification. Detection limits are down to sub-femtomole (<10⁻¹⁵ mol) amounts from only 1 μg of DNA (~150,000 human cells), which is <1 ppm of modification per nucleoside. Consequently, LC-MS/MS is the preferred method for the discovery, global quantification, and comparative analysis of modified bases across different cell states, tissues, and organisms, according to HLS.
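As a rough check on the numbers quoted above, here is a minimal Python sketch of isotope-dilution quantification and of the ppm arithmetic. The average nucleotide mass (~330 g/mol), the DNA content of a diploid human cell (~6.6 pg), and all function names are my own assumptions for illustration and are not taken from HLS.

```python
# Assumed constants (not from the article); both are rough textbook averages.
AVG_NUCLEOTIDE_MASS_G_PER_MOL = 330.0  # mean mass of one nucleotide residue in DNA
DNA_PER_HUMAN_CELL_G = 6.6e-12         # ~6.6 pg of DNA per diploid human cell


def isotope_dilution_amount(analyte_area, standard_area, standard_amount_mol):
    """Moles of analyte from its peak-area ratio to the labeled internal standard.

    Assumes labeled and unlabeled nucleosides give the same MS response
    (response factor of 1), which is the usual premise of isotope dilution.
    """
    return analyte_area / standard_area * standard_amount_mol


def modification_ppm(modified_mol, dna_mass_g):
    """Modified nucleosides per million total nucleosides in the digest."""
    total_nucleosides_mol = dna_mass_g / AVG_NUCLEOTIDE_MASS_G_PER_MOL
    return modified_mol / total_nucleosides_mol * 1e6


def cell_equivalents(dna_mass_g):
    """Approximate number of diploid human cells that yield this much DNA."""
    return dna_mass_g / DNA_PER_HUMAN_CELL_G


# 1 femtomole of a modified nucleoside detected in a digest of 1 microgram of DNA
print(f"{modification_ppm(1e-15, 1e-6):.2f} ppm")
print(f"{cell_equivalents(1e-6):,.0f} cells")
```

Under these assumptions, 1 fmol in 1 μg of DNA works out to about 0.33 ppm, and 1 μg corresponds to roughly 150,000 cells, which is consistent with the figures quoted by HLS.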
Mapping by Antibody-Based or Chemical Enrichment: The position of a modified base in genomic DNA can be approximately mapped by fragmenting the DNA to short (∼200−400 bases) pieces, enriching for DNA fragments that contain the modified base, deep-sequencing the enriched fragments, and then aligning the sequence-reads to the reference genome, as depicted here. The resultant map indicates where modifications are by a peak in the sequencing depth with a resolution limited to ∼200 bases. A critical aspect to sequencing-by-enrichment is having a highly selective method to recognize and pull-down DNA with the modification of interest.
Mapping modified bases by enrichment of DNA fragments containing the base-modifications and deep-sequencing. Taken from Hofer et al. J. Amer. Chem. Soc. April 1, 2019, with permission. Copyright © 2019, American Chemical Society.
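To make the idea of a peak in the sequencing depth more concrete, here is a toy Python sketch that bins aligned read start positions into 200-base windows and flags windows where the enriched sample exceeds an unenriched input control. The fold threshold, the pseudocount, and the function names are hypothetical choices of mine; real peak-calling software uses considerably more careful statistics.

```python
from collections import Counter

BIN_SIZE = 200  # bases; chosen to match the ~200-base resolution mentioned above


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count aligned reads per fixed-width genomic window."""
    return Counter(pos // bin_size for pos in read_starts)


def enriched_bins(ip_starts, input_starts, min_fold=4.0, pseudocount=1.0,
                  bin_size=BIN_SIZE):
    """Return (start, end, fold) for windows enriched in the pull-down sample."""
    if not ip_starts:
        return []
    ip = bin_counts(ip_starts, bin_size)
    ctrl = bin_counts(input_starts, bin_size)
    # Scale the pull-down counts so both libraries have the same total depth.
    scale = len(input_starts) / len(ip_starts)
    hits = []
    for b in sorted(ip):
        fold = (ip[b] * scale + pseudocount) / (ctrl[b] + pseudocount)
        if fold >= min_fold:
            hits.append((b * bin_size, (b + 1) * bin_size, round(fold, 2)))
    return hits


# Made-up read positions: the pull-down piles up between positions 1000 and 1199.
ip_reads = [1010, 1050, 1100, 1150, 1190, 1120, 1130, 1160, 300, 2500]
input_reads = [100, 300, 700, 1100, 1500, 1900, 2300, 2500, 2900, 3300]
print(enriched_bins(ip_reads, input_reads))  # [(1000, 1200, 4.5)]
```

Comparing against an input control matters here because any sequence that is pulled down regardless of modification status would otherwise show up as a false peak.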
A broadly used enrichment strategy is the use of DNA immunoprecipitation (DIP) with antibodies that recognize specific base modifications. In an early publication of modified base immunoprecipitation by Vucic et al., a 5-methylcytosine (5mC)-specific antibody was used for the immunoprecipitation of cytosine-methylated DNA (MeDIP). Following that report, numerous high-affinity antibodies have been raised against various modified bases, such as N6-methyladenine (N6mA), which has been widely studied by antibody-based recognition (e.g., see Koziol et al.).
5mC N6mA
Structures taken from commons.wikimedia.com and free to use.
However, the specificity of antibodies for modified bases has been repeatedly called into question, according to HLS. For example, in Nature Methods, Lentini et al. state that the results of DNA immunoprecipitation followed by sequencing (DIP-seq) “often show considerable variation between profiles of the same genome and between profiles obtained by alternative methods.” They showed that these differences are primarily due to the intrinsic affinity of immunoglobulin G (IgG) for short unmodified DNA repeat sequences, and concluded that “[t]his pervasive experimental error accounts for 50–99% of regions identified as ‘enriched’ for DNA modifications in DIP-seq data.”
This caveat by Lentini et al. is a good segue into chemical enrichment strategies, which avoid the use of DNA-recognition antibodies, and are based on chemical selectivity to specifically “tag” a modified base in DNA for subsequent enrichment. Examples of chemical tagging are the enzymatic glycosylation of 5-hydroxymethylcytosine (5hmC) or the reaction of the aldehyde group in 5-formylcytosine (5fC) with hydroxylamines to allow chemoselective biotinylation. By the same token, the carboxylic group of 5-carboxycytosine (5caC) can be selectively labeled using 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC)-mediated coupling.
5mC 5fC 5caC
Structures taken from commons.wikimedia.com and free to use.
Base-Resolution Sequencing of Modified Nucleobases: Sequencing modified DNA bases at single-base resolution is the “gold standard” for mapping exactly where these modifications occur in genomes. The general principle requires distinguishing the modified base from the unmodified counterpart by a transformation that selectively alters the Watson−Crick readout during sequencing. The established standard for sequencing C modifications at base resolution is bisulfite sequencing (BS-seq), which exploits the hydrolytic deamination of C bases in DNA mediated by bisulfite under controlled pH. The net result is that C is transformed to U (read as T) with high efficiency, whereas 5mC remains intact, as depicted here. Thus, after bisulfite treatment, C-to-T conversions mark positions that were unmethylated C, whereas Cs that remain were 5mC.
Taken from commons.wikimedia.org and free to use.
However, the applicability of this chemistry became questionable after the existence of 5hmC, 5fC, and 5caC was confirmed in mammalian DNA, since 5fC and 5caC both convert to U upon reaction with bisulfite (and thus behave as per C) whereas 5hmC does not convert to U by bisulfite (and thus behaves as 5mC). Recognition of this complication led to the development of additional chemistry that could disambiguate the various newly discovered C modifications.
Taken from Hofer et al. J. Amer. Chem. Soc. April 1, 2019, with permission. Copyright © 2019, American Chemical Society.
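The ambiguity of plain bisulfite sequencing can be seen in a few lines of Python. This is only an idealized sketch: it assumes complete conversion, represents a template as a list of base labels, and uses names of my own invention. The conversion behavior itself is as described above.

```python
# Which cytosine forms bisulfite converts to U (read as T), per the text above.
CONVERTED_BY_BISULFITE = {
    "C": True,
    "5mC": False,
    "5hmC": False,
    "5fC": True,
    "5caC": True,
}


def bisulfite_read(base):
    """Base call obtained after bisulfite treatment and sequencing."""
    if base in CONVERTED_BY_BISULFITE:
        return "T" if CONVERTED_BY_BISULFITE[base] else "C"
    return base  # A, G and T are unaffected


def bs_seq(template):
    """Read a template (list of base labels) after idealized bisulfite treatment."""
    return "".join(bisulfite_read(base) for base in template)


template = ["A", "C", "G", "5mC", "G", "T", "5hmC", "5fC", "5caC"]
print(bs_seq(template))  # ATGCGTCTT
```

In the output, 5mC and 5hmC both appear as C, while C, 5fC, and 5caC all appear as T, so BS-seq alone sorts five cytosine states into only two readouts.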
Example strategies to achieve this differentiation, reported by Prof. Sir Shankar and coworkers, include reductive and oxidative BS-seq, i.e. redBS-seq and oxBS-seq, respectively. These new methods rely on either the selective chemical reduction of 5fC to 5hmC or the oxidation of 5hmC to 5fC prior to bisulfite treatment, as shown here.
Alternatively, a chemoenzymatic approach based on ten-eleven translocation (TET) enzyme-assisted BS-seq (TAB-seq) exploits enzymatic glucosyltransferase-blocking of 5hmC, as well as oxidation of 5mC to 5caC by TET prior to bisulfite conversion, as described by Yu et al. TET is discussed further in the next section.
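The logic of these methods can be summarized as a pretreatment step followed by the ordinary bisulfite readout. The Python sketch below encodes only the transformations named above, under idealized complete conversion. The subtraction used to estimate 5hmC from paired BS-seq and oxBS-seq data is my own illustration of how the two readouts combine, and all names, including the "glucosyl-5hmC" label, are hypothetical.

```python
BASES = ["C", "5mC", "5hmC", "5fC", "5caC"]

# Forms that bisulfite converts to U (read as T); anything else reads as C.
BISULFITE_CONVERTS = {"C", "5fC", "5caC"}

# Chemical or enzymatic step applied before bisulfite in each method.
PRETREATMENT = {
    "BS-seq": {},
    "oxBS-seq": {"5hmC": "5fC"},    # oxidation of 5hmC to 5fC
    "redBS-seq": {"5fC": "5hmC"},   # reduction of 5fC to 5hmC
    "TAB-seq": {"5hmC": "glucosyl-5hmC", "5mC": "5caC"},  # block 5hmC, TET-oxidize 5mC
}


def readout(method, base):
    """Base call for a cytosine form under the given method."""
    treated = PRETREATMENT[method].get(base, base)
    return "T" if treated in BISULFITE_CONVERTS else "C"


def estimate_levels(bs_c_fraction, oxbs_c_fraction):
    """Per-site levels from the fraction of reads called C in each experiment.

    BS-seq reports 5mC + 5hmC as C, whereas oxBS-seq reports only 5mC as C.
    """
    return {"5mC": oxbs_c_fraction, "5hmC": bs_c_fraction - oxbs_c_fraction}


for method in PRETREATMENT:
    calls = ", ".join(f"{base}->{readout(method, base)}" for base in BASES)
    print(f"{method:10s} {calls}")
```

For example, a site read as C in 80% of BS-seq reads and 65% of oxBS-seq reads would be estimated at 65% 5mC and about 15% 5hmC. With real data, sampling noise can make such a difference slightly negative at sites with little or no 5hmC.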
An exciting detection breakthrough was reported by Wescoe et al., who used nanopore technology to discriminate among C, 5mC, 5hmC, 5fC, and 5caC in individual (i.e. single) DNA molecules. In this first-ever achievement, a patch-clamp amplifier was used to acquire ionic current traces caused by phi29 DNA polymerase-controlled translocation of DNA templates through a protein pore embedded in a lipid bilayer. They found that, for ~4400 individual DNA molecules analyzed at CG dinucleotides, correct base calls for single-pass reads ranged from 91.6% to 98.3%. This accuracy was determined to be dependent on the identity of the nearest neighbor bases surrounding the CG dinucleotide.
Modified Cytosine Bases
Since its discovery in the early 20th century, 5mC has established itself as the “fifth letter of the eukaryotic genetic alphabet,” according to HLS, who add that the “biological roles of 5mC in mammals have been thoroughly investigated, whereas its more recently discovered oxidation products 5hmC, 5fC and 5caC are the subject of ongoing studies in the field.”
Methylation and Demethylation at Cytosine C5 in DNA: As depicted here, 5mC is incorporated into eukaryotic genomes by direct methylation of C in DNA strands by DNA methyltransferases (DNMTs), which use S-adenosylmethionine (SAM) as their methyl donor. Demethylation can occur passively when methylation patterns are not maintained on newly synthesized DNA during replication. Loss of the methyl group can also occur by active demethylation, as observed during differentiation of pluripotent stem cells in early mammalian development, for example. Active loss of the 5-methyl group is thought to occur primarily by successive oxidation of 5mC to 5hmC, 5fC, and 5caC.
Taken from Hofer et al. J. Amer. Chem. Soc. April 1, 2019, with permission. Copyright © 2019, American Chemical Society.
These oxidation steps are catalyzed by TET enzymes, which use an Fe(II) center and 2-oxoglutarate (2-OG). Decarboxylation of 2-OG leads to formation of a transient, highly oxidative Fe(IV)=O species that is responsible for the oxidation of the nucleobase. Both 5fC and 5caC can be recognized and removed by thymine DNA-glycosylase (TDG), which initiates base excision repair (BER). TDG catalyzes the hydrolysis of the corresponding glycosidic bond to afford an abasic site, which is ultimately replaced by an unmethylated cytosine in the sequence. Direct deformylation of 5fC or the decarboxylation of 5caC are proposed alternatives to the excision of the oxidized 5mC.
Biological Roles of 5mC Oxidative Demethylation Intermediates: According to HLS, 5hmC has been shown to mark actively transcribed genes, whereas 5mC is typically associated with promoters of silent genes. 5hmC also marks enhancers—noncoding DNA regions that can increase transcription activity of nearby genes, as depicted here—that define cell and tissue identity.
Taken from commons.wikimedia.org and free to use.
Unlike 5mC, which is present at constant levels of typically 4−5% 5mC/C throughout different tissues, global 5hmC levels are highly tissue-specific as well as age-dependent, with levels ranging from 0.2% to 1.2% 5hmC/C. 5hmC formation on a newly synthesized DNA strand seems to be decoupled from 5mC formation, according to HLS. Of potential clinical relevance, HLS add, is “the broad observation that global 5hmC levels are generally depleted in tumors, whereas 5mC is generally elevated, suggesting important roles for 5hmC metabolism in the biology of cancer.”
Taken from commons.wikimedia.org and free to use.
The aldehyde derivative 5fC is a largely metabolically stable DNA modification in vivo, with global levels in mammals that are tissue-dependent, varying from 0.00002 to 0.0011% 5fC/C, in a way that is independent of 5mC and 5hmC levels, according to HLS. It has been shown, they say, that 5fC can influence the mechanical properties of DNA by introducing flexibility and altering the structure, with the potential to affect the interaction with nucleosomes.
As indicated by the scheme shown here, studies have revealed that 5fC can form Schiff bases with lysine (Lys) side chains of histone proteins (H3), and that these reversible interactions can be trapped using sodium cyanoborohydride. These findings suggest, according to HLS, “a chemical mechanism for the control of nucleosome positioning by 5fC in vivo.” However, “more work is needed to fully evaluate and understand the mechanisms involved.”
Taken from Hofer et al. J. Amer. Chem. Soc. April 1, 2019, with permission. Copyright © 2019, American Chemical Society.
HLS state that relatively little is understood about the carboxyl derivative 5caC, the levels of which are approximately 0.00005% in mouse embryonic stem cells (mESCs). Both 5fC and 5caC were shown by Wang et al. to be recognized by RNA Pol II and transiently slow down the activity of this polymerase, which may therefore provide a way to regulate transcription. “Mechanistic biological functions of 5caC, however, need to be elucidated, and improvements in chemical tools may enable this in the future.”
Modified Adenine Bases
According to HLS, global N6mA levels range from 0.05% to 2.8% N6mA/A in plants and fungi, with a high density of the modified base around transcription start sites. In vertebrates, N6mA maps from mESCs indicate that N6mA is enriched in mobile DNA elements called transposons, with high N6mA levels correlating with suppressed activity of both the transposons and downstream genes. Furthermore, N6mA levels vary during the development of zebrafish and pig embryos, with peak levels of N6mA at early embryonic stages.
Taken from en.wikipedia.org and free to use.
In humans, global levels of 0.05% N6mA/A were found in blood-derived cells, with levels of up to 0.2% N6mA/A in mitochondrial DNA, whereas human astrocytes had levels of only a few ppm N6mA/A. In astrocytes, N6mA was strongly correlated to heterochromatin, indicating possible “cross-talk” between adenine methylation and chromatin remodeling, as depicted here. For this reason, HLS state that “N6mA would appear to be important to the reprogramming events that occur during development and the establishment of cellular identity.”
Recent reports of rodent and human genomic N6mA also suggest clinical relevance, according to HLS. In the mouse model, N6mA levels in the prefrontal cortex are stress-dependent, and a link to neurological disorders such as depression has been proposed. In humans, liver cancer tissues have been found to have lower N6mA levels compared to adjacent nontumoral tissues, whereas glioblastoma cells had far higher N6mA levels (up to 0.1% N6mA/A) than nontumoral astrocytes.
A publication in 2019 by Xiong et al. describes the discovery of N6-hydroxymethyladenine (N6hmA) in mammalian DNA. The researchers found that N6hmA can be formed from the hydroxylation of N6mA by the Fe(II)- and 2-oxoglutarate-dependent ALKBH1 protein in genomic DNA of mammals. In addition, the content of N6hmA exhibited a significant increase in lung carcinoma tissues compared to normal tissues. HLS suggest that “[t]he development of improved methods for the detection and mapping of N6mA would play a key role in further advancement” of the biological roles of this modified adenine base. In my opinion, this suggestion is equally important for N6hmA.
Modified Thymine Bases
Recently, both 5-hydroxymethyluracil (5hmU) and 5-formyluracil (5fU) have been detected in the DNA of higher eukaryotes (∼0.5 and 2.5 per 10⁶ dN for 5hmU and 5fU, respectively, in mESCs), according to HLS. Isotopic labeling of nucleosides followed by LC-MS/MS quantification has revealed that up to 80% of 5hmU in mESC DNA is generated in a reactive oxygen species (ROS)-independent manner. TET has been identified as a possible enzyme for T to 5hmU oxidation in vivo. An alternative discussed by HLS is 5mC demethylation by TET oxidation to give 5hmC, followed by deamination into 5hmU, thus initiating DNA repair by the BER pathway to restore cytosine (depicted here).
Taken from Hofer et al. J. Amer. Chem. Soc. April 1, 2019, with permission. Copyright © 2019, American Chemical Society.
Uracil (U), which is thymine without its 5-methyl group, is another well-known noncanonical base found in DNA. Deamination of cytosine gives rise to U, which is read as T and therefore leads to mutagenesis (C:G to T:A). In addition to spontaneous deamination, HLS mention an enzymatic pathway involving activation-induced cytidine deaminase. Although T modifications are detectable in mammalian DNA, their low abundance means that “it remains a challenge to study the true significance of these sites,” according to HLS.
Modified Guanine
8oxoG; taken from commons.wikimedia.org and free to use.
In addition to being a byproduct of DNA damage, 8-oxoguanine (8oxoG) may play a role in transcriptional regulation under oxidative stress. During hypoxia, intracellular levels of ROS are elevated, facilitating the generation of 8oxoG. Gene regulation in response to hypoxia is mediated in part by hypoxia-inducible factor 1-α (HIF1α). This transcription factor binds to the hypoxic response element (HRE) sequence in the promoters of hypoxia-inducible genes. HLS say that, in rat pulmonary artery endothelial cells, an accumulation of 8oxoG was found under hypoxic conditions within ~200 protein-coding genes, which is “in a drastically different distribution to that found under normal conditions.” Furthermore, genes that gained 8oxoG signal upon hypoxic stress were associated with transcriptional upregulation.
More recently, HLS described an “intriguing hypothesis,” proposed by Fleming et al., who suggested that 8oxoG may be considered an epigenetic marker capable of controlling gene expression, specifically in the context of 4-stranded DNA G-quadruplexes, depicted here and featured in past blogs in the Zone. The proposed multi-step mechanism in the case of the VEGF promoter is discussed in detail in Fleming et al., and is now supported by findings published in 2019 on the NEIL3 DNA repair gene.
A generalized G-quadruplex structure and motif. (A) H-bonded planar structure of four guanines is formed from different G-tracts, which are separated by intervening loop regions. (B) Schematic of an intramolecular DNA motif comprised of four G-tracts of three guanines separated by loop regions. Taken from en.wikipedia.org and free to use.
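The motif in panel (B) lends itself to a simple sequence search. Below is a minimal Python sketch that scans a DNA string for four G-tracts of at least three guanines separated by short loops. The loop length limit of 1 to 7 bases is a commonly used convention that I am assuming here; it is not specified in the caption, and the function name is likewise my own. A match only indicates a putative G-quadruplex-forming sequence, not that the structure actually forms in vivo.

```python
import re

# Assumed motif: four runs of >=3 G, separated by three loops of 1-7 bases each.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_putative_g4(sequence):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Four copies of the human telomeric repeat TTAGGG
telomeric = "TTAGGG" * 4
print(find_putative_g4(telomeric))  # [(3, 24, 'GGGTTAGGGTTAGGGTTAGGG')]
```

Note that this sketch scans only the strand it is given; finding motifs on the complementary strand would require running it on the reverse complement as well.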
Concluding Comments
I fully agree with HLS’ opinion on the necessity of new and/or improved methods for discovering and mapping modified DNA bases, in order to enable continued elaboration of the scope and biological significance of epigenomics. In addition to expanding our collective scientific knowledge, the emergence of epigenomic methodology has led to new diagnostic applications in health and medicine. Readers interested in these topics can consult a 2018 perspective by Wang and Chang titled Epigenomics: Technologies and Applications.
In concert with advances in epigenomics, various commercial endeavors have been launched, the first of which was the aptly named company Epigenomics, founded in 1998 in Germany. Its technology focus was—and still is—methylated DNA analysis for detection of various types of cancer. A January 2018 article by Reuters states that, according to Statistics MRC, the global epigenetics market was estimated to be $753 million in 2016 and is expected to grow at a compounded annual growth rate of 14.0%, reaching $1.89 billion by 2023.
The Reuters article adds that over 40 companies are active in epigenomics, including Cambridge Epigenetix, which was co-founded by Prof. Sir Shankar and Dr. Bobby Yerramilli-Rao. The stated mission of this company is “to change the way medicine is practiced by reducing several routine and important diagnostic screening tests to a simple blood draw using the power of the 5-hydroxymethylcytosine (5hmC) epigenetic modification.” If past success indicates future outcome, then Cambridge Epigenetix may be a winner, given Prof. Sir Shankar’s past co-invention of Solexa sequencing, which underpins the market-leading Illumina® sequencing technology.
Your comments are welcome, as usual.