Introduction
In the present digital era, the quantity of data being produced continues to increase exponentially. The global demand for data storage is expected to grow to 2 × 10^14 gigabytes (GB) by 2025 and 10-fold more by 2030. The demand for denser and longer-lived information storage media is also increasing. Current optical and magnetic storage media are reaching their information-density limits and are not suitable for long-term (>50 years) storage. Valuable information must therefore be regularly transferred to new or improved storage media if it is to be preserved for future generations.
Nature provides an inspiring example of how to encode and transfer genetic information in the form of DNA sequences. In addition, DNA found in ancient bones and fossils can be amplified and sequenced, indicating its long-term stability under certain conditions. These perspectives about DNA, together with its theoretical data density being 10^6 times higher than that of conventional media used for encoding data, underlie a catchy headline on the subject in Science magazine.
One of the early demonstrations of digital information storage in chemically synthesized DNA was reported by Church et al. in 2012. A book comprising 53,000 words, 11 digital images, and a computer program was converted into binary information (1s and 0s) and encoded in a pool of approximately 55,000 oligos, each about 160 nt long. The oligos were synthesized by ink-jet printing on a microarray and, to read the encoded book, the oligo pool was amplified by PCR and then sequenced for decoding.
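As a rough illustration of converting binary information into bases, here is a minimal Python sketch. It assumes a one-bit-per-base mapping (A or C for 0, G or T for 1) in the spirit of that early work; the rule for choosing between the two candidate bases and all function names are illustrative assumptions, not the published encoder.

```python
# Illustrative sketch only: one bit per base, A/C = 0 and G/T = 1.
# The tie-breaking rule (avoid repeating the previous base) is an assumption.

def text_to_bits(text: str) -> str:
    """Convert text to a string of 1s and 0s (8 bits per UTF-8 byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def bits_to_dna(bits: str) -> str:
    """Map each bit to a base, never repeating the previous base."""
    bases = []
    for bit in bits:
        choices = "AC" if bit == "0" else "GT"
        first, second = choices[0], choices[1]
        # Use the first candidate unless it would create a homopolymer
        base = second if bases and bases[-1] == first else first
        bases.append(base)
    return "".join(bases)

def dna_to_bits(seq: str) -> str:
    """Decode: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in "AC" else "1" for base in seq)

def bits_to_text(bits: str) -> str:
    """Reassemble bytes from bits and decode them as UTF-8 text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

if __name__ == "__main__":
    print(bits_to_dna("0011"))  # ACGT
    encoded = bits_to_dna(text_to_bits("DNA"))
    print(encoded, bits_to_text(dna_to_bits(encoded)))
```

Because two bases are available for each bit value, the encoder has freedom to avoid runs of identical bases, which are generally harder to synthesize and sequence accurately.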
Since then, over 1,600 publications indexed to “DNA data storage” have appeared in Google Scholar. Among these is a recent review by Doricchi et al. (2022) titled Emerging Approaches to DNA Data Storage: Challenges and Prospects, which draws from over 150 references. This review’s depiction of the general strategy for DNA data storage is reproduced in Figure 1.
FIGURE 1. General strategy for DNA data storage. The data is stored directly in the sequence of the oligonucleotides. The six main steps—encoding, writing, storage, access, reading, and decoding—are depicted. Taken from Doricchi et al. (2022) and free to use under CC BY license.
The major steps involve (1) encoding digital information into DNA sequence, (2) data “writing” by synthesis of oligos, (3) storing the DNA either physically or biologically, (4) random access for amplification, (5) data readout by DNA sequencing, and (6) decoding sequences back into the original information. This blog provides examples of how some of these steps have been researched using chemically modified oligos, all of which—and many more types—are available from TriLink custom oligo synthesis.
5’-Phosphate Modified Oligos
Amplification of oligo pools from microarray synthesis is subject to PCR bias, in part due to highly variable oligo copy number. Gao et al. (2020) countered this using isothermal strand displacement amplification (iSDA) on magnetic beads. Briefly, the oligo pool is amplified with adapter primer-pairs having either 5’-phosphate or 5’-biotin (see below). The terminal phosphate group is recognized by lambda exonuclease, a highly processive enzyme that digests the phosphorylated strand to provide 5’-biotinylated ssDNA. Equimolar capture probes are hybridized for extension by Taq polymerase, followed by Exo I digestion to provide equimolar 5’-biotinylated ssDNA for unbiased iSDA on avidin-coated magnetic beads.
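To see why unequal amplification matters, the following Python sketch uses a generic textbook model of exponential PCR, in which each oligo's copy number is multiplied by (1 + efficiency) per cycle. This is not the analysis from Gao et al.; the efficiencies, starting copy numbers, and function names are all hypothetical values chosen for illustration.

```python
# Toy model (an assumption, not from Gao et al.): copies grow by a factor
# of (1 + efficiency) per PCR cycle, with efficiency between 0 and 1.

def amplify(copies, efficiencies, cycles):
    """Return the modeled copy number of each oligo after thermal cycling."""
    return [n * (1 + e) ** cycles for n, e in zip(copies, efficiencies)]

def fractions(values):
    """Return each oligo's share of the total pool."""
    total = sum(values)
    return [v / total for v in values]

if __name__ == "__main__":
    start = [100, 100]            # hypothetical equal starting copy numbers
    effs = [0.95, 0.85]           # hypothetical per-cycle efficiencies
    final = amplify(start, effs, cycles=30)
    print(fractions(start))       # equal shares before amplification
    print(fractions(final))       # skewed shares after amplification
    print(final[0] / final[1])    # over-representation of the first oligo
```

In this toy model, a modest difference in per-cycle efficiency compounds to a roughly fivefold skew after 30 cycles, and any inequality in starting copy number is carried through unchanged.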
5’-Biotin Modified Oligos
A small fraction of the stored sequence-information pool is irretrievably consumed during PCR-based sequence retrieval. To prevent this loss of the primary DNA-sequence file, Bögels et al. (2023) developed DNA storage in microreactors with temperature-dependent membrane permeability, referred to as “thermo-responsive” microcapsules, for repeated random access of multiple DNA files by PCR.
The method is based on encapsulating a 5’-biotinylated DNA file in a thermo-responsive semipermeable microcapsule at room temperature by binding to avidin. Following pooling of these capsules, PCR reagents and primers are infused at room temperature. Importantly, the membrane’s unique thermo-responsive permeability prevents outward diffusion of molecules at elevated temperatures during PCR, but allows non-biotinylated amplicons to be removed at room temperature, after thermal cycling, for pooled sequencing.
An additional feature of this methodology is the use of fluorescently labeled oligos (see next section) as barcoding for each encapsulated DNA file. This enables use of fluorescence-assisted sorting instrumentation to isolate one or more individual capsules for PCR and sequencing. Bögels et al. note that, in theory, it is possible to generate up to 2^N unique barcode combinations, where N is the number of fluorophores used for barcoding.
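The 2^N figure follows from treating each fluorophore as either present or absent on a capsule. The short Python sketch below enumerates those combinations; the dye names are hypothetical placeholders, not necessarily the fluorophores used by Bögels et al.

```python
from itertools import product

def barcodes(fluorophores):
    """List every present/absent combination of the given fluorophores."""
    combos = []
    for states in product((False, True), repeat=len(fluorophores)):
        # Keep only the fluorophores switched "on" in this combination
        combos.append(tuple(f for f, on in zip(fluorophores, states) if on))
    return combos

if __name__ == "__main__":
    dyes = ["dye_1", "dye_2", "dye_3"]  # hypothetical fluorophore labels
    for combo in barcodes(dyes):
        print(combo)
    print(len(barcodes(dyes)))  # 8, i.e. 2**3
```

Note that this count includes the combination with no fluorophore at all, which may not be usable as a barcode in practice.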
An alternative approach for non-consumptive, repeated access of DNA files reported by Lin et al. (2020) uses in vitro transcription (IVT) of RNA. This method (Figure 2) was demonstrated by designing three different 160-nt templates each characterized by a unique 20-nt file address and 117-nt data payload, but with the same 23-nt T7 promoter sequence.
FIGURE 2. Schematic of IVT RNA copying of a selected DNA file. Taken from Lin et al. (2020) and free to use under CC BY license.
Hybridization of a 5’-biotinylated (red dot) oligo complementary to the file address of a desired template (black) allows magnetic avidin-bead (brown) isolation for solid-phase IVT. Reverse transcription to cDNA is then followed by sequencing. De-hybridization of the 5’-biotinylated capture probe releases the file for its return to the original pool of files.
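The template layout and address-based capture described above can be sketched in Python as follows. The segment lengths (23 + 20 + 117 = 160 nt) come from the description above; the segment order, the promoter sequence, the addresses, the payloads, and the function names are placeholder assumptions, not the actual designs from Lin et al.

```python
# Placeholder sequences throughout; only the segment lengths follow the text.
PROMOTER = "TAATACGACTCACTATAGGGAGA"  # 23-nt placeholder promoter sequence
ADDRESS_LEN, PAYLOAD_LEN = 20, 117

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))

def build_template(address: str, payload: str) -> str:
    """Join promoter, file address, and payload (order is an assumption)."""
    assert len(address) == ADDRESS_LEN and len(payload) == PAYLOAD_LEN
    return PROMOTER + address + payload

def select_files(pool, probe: str):
    """Return templates whose address region is complementary to the probe."""
    target = reverse_complement(probe)
    start, end = len(PROMOTER), len(PROMOTER) + ADDRESS_LEN
    return [template for template in pool if template[start:end] == target]

if __name__ == "__main__":
    addresses = ["ACGT" * 5, "AACCGGTT" * 2 + "AACC", "ATATCGCGATATCGCGATAT"]
    payloads = [("AC" * 59)[:117], ("GT" * 59)[:117], ("AG" * 59)[:117]]
    pool = [build_template(a, p) for a, p in zip(addresses, payloads)]
    probe = reverse_complement(addresses[1])  # capture probe for file 2
    selected = select_files(pool, probe)
    print(len(pool[0]), len(selected))
```

Comparing only the address region keeps the selection from matching a payload that happens to contain the same 20-nt stretch.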
5’-Fluorophore Modified Oligos
Random access to DNA plasmid-encoded images reported by Banal et al. (2021) uses encapsulation in silica particles, each tagged with a unique 25-nt oligo barcode to select the desired image(s). This was demonstrated by accessing 3 of 20 prototypal 2-kilobyte (KB) image files using fluorescence-assisted sorting with a selection sensitivity of one in 10^6 files, which thereby enables one in 10^(6N) selection capability using N optical channels.
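The scaling with the number of optical channels is simple arithmetic: if each channel can pick out one file in 10^6, then N independent channels can in principle pick out one in 10^(6N). A minimal Python sketch, in which the function name and the assumption of fully independent channels are illustrative:

```python
def selection_capability(n_channels: int, per_channel: int = 10**6) -> int:
    """Number of files from which one can be selected using N channels,
    assuming each channel independently resolves one in `per_channel`."""
    return per_channel ** n_channels

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, selection_capability(n))
```

With three optical channels, as in the three-dye demonstration, this idealized figure is one in 10^18.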
As a proof-of-principle, Banal et al. encapsulated each of the 20 different 2-KB plasmid-encoded image files in surface-barcoded silica microparticles. Three capture probes for the desired target files were synthesized as 25-nt 5’-hexylamine-modified oligos for labeling with 3 different fluorescent dyes. Each probe sequence was complementary to a corresponding barcode, which enabled sequence-specific hybridization-based bead capture.
Following fluorescence-assisted sorting and isolation of each of the desired particles, a chemical etching reagent released the encapsulated DNA plasmid-encoded image file. Selected images were visualized by sequencing the plasmid to decode the image. Because plasmids are used to encode information, transfection of decoded plasmids into bacteria for replication in cell culture replenishes the molecular file database.
Nucleoside-Modified Oligos
Tabatabaei et al. (2022) reported a prototype DNA data storage system based on synthetic oligos with an extended (11-letter) alphabet comprising 4 natural bases and 7 chemically modified bases, such as 2,6-diaminopurine and 5-hydroxymethylcytosine. It was shown that nanopore sequencing using neural network signal analysis can discriminate different combinations and ordered sequences of these oligos. These researchers concluded that, if further work could realize 11-letter oligo writing and amplification, a 1.7-fold increase in DNA storage density could be achieved.
Importantly, Kawabe et al. (2023) later reported experimental proof-of-principle for a 1.8-fold increase in DNA storage density with a 12-letter alphabet read by nanopore sequencing. This elegant work, which involved combined chemical and enzymatic synthesis for writing, allows amplification: the 4 natural bases form 2 base pairs and the 8 synthetic bases form 4 base pairs. All 6 base pairs involve complementary H-bonding.
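The 1.7-fold and 1.8-fold figures are consistent with a simple information-theoretic estimate: a k-letter alphabet can carry at most log2(k) bits per position, compared with log2(4) = 2 bits for the natural bases. The Python sketch below computes this ratio; it is an upper-bound estimate with an illustrative function name, not the calculation used in either paper.

```python
import math

def density_gain(alphabet_size: int, baseline: int = 4) -> float:
    """Ratio of maximum bits per position for an expanded alphabet
    relative to the 4-letter natural alphabet."""
    return math.log2(alphabet_size) / math.log2(baseline)

if __name__ == "__main__":
    for letters in (11, 12):
        print(letters, round(density_gain(letters), 2))  # 1.73 and 1.79
```

Each added letter gives a diminishing return, since the gain grows with the logarithm of the alphabet size rather than with the size itself.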
Biological Storage
Biological storage refers to synthetic DNA digital data added to the genome of an organism, and therefore differs from all other forms of DNA-encoded data storage. To demonstrate long-term, non-frozen biological storage, Davis et al. (2020) encoded digital information into Halobacterium salinarum (Hsal), which thrives under high-salt conditions. While encased in salt crystals, such halophilic organisms can exist in a dormant state for geological periods of time.
Because storage experiments with genome-encoded digital data performed over thousands of years are obviously impractical, the hypothesis that digitally encoded Hsal cells can be revived after being dormant for long periods of time in salt crystals was studied as follows.
A high-salinity liquid culture of the encoded Hsal cells was divided into two aliquots. The first aliquot was used for genomic DNA extraction and decoding at time zero, while the second aliquot was left to evaporate. After 5 days, all of the liquid medium had evaporated, leaving Hsal-containing salt crystals. The salt crystals were then distributed in sterile test tubes, sealed, and stored at room temperature.
Over the next 3 years, specks of salt crystals from one of these samples were tested every 6 months for growth in freshly prepared high-salinity liquid media. In every test performed in this still-ongoing study, Hsal cells exhibited viability by reaching mid-exponential growth in less than 24 hours. Because halophiles comparable to Hsal are known to remain dormant for hundreds of millions of years, it was suggested that this method of digital archiving could be applicable to indefinite periods of time.
Concluding Comments
For digital DNA data storage to reach its full potential, Doricchi et al. believe that collaboration between scientists, engineers, and mathematicians will be essential to produce the necessary advanced chemical techniques, instrumentation, characterization methods, and automated analysis tools. Importantly, the DNA Data Storage Alliance formed in 2020 has a mission to “create and promote an interoperable storage ecosystem based on DNA as a data storage medium.” Encouragingly, the U.S. government awarded a total of $48 million in 2020 to researchers to develop digital data storage with synthetic DNA.
TriLink has been, and continues to be, a trusted and reliable provider of modified oligos that can be used for R&D in this emerging era of digital data storage in DNA.
Your comments are welcomed, as usual.
Please feel free to share this blog with your colleagues or on social media.