
Data storage infrastructure has become a growing problem in a data-intensive world. By 2025, the total amount of data consumed or stored is estimated to reach 181 zettabytes. Storing data takes up significant physical space, and that space grows with the amount of data we collect. One promising solution? DNA. With a diameter of roughly 2.5 nanometers, DNA is one of the most fundamental molecules of living organisms. But how can these polymeric strings serve as a medium for storing 0s and 1s?
Researchers at the University of Washington developed a method for synthesizing and reading DNA that uses machine learning algorithms to improve the accuracy and speed of the process. Their system can store data in the form of DNA strands and retrieve it with high accuracy (1). (The coding scheme known as "DNA Fountain" is a separate approach, published later by Yaniv Erlich and Dina Zielinski.) Storing data in DNA involves converting the data into a code made up of the four nucleotide bases found in DNA: adenine (A), cytosine (C), guanine (G), and thymine (T). In living organisms, these bases are arranged in specific sequences, and stretches known as genes provide the instructions for the development and function of the organism. For example, the sequence of bases in a gene might specify how to produce a particular enzyme or influence the color of an organism's eyes.
Now that we have a basic understanding of the structure of DNA, we can see how it can be used as a medium for storing data. The data must first be converted from binary into a code made up of the four nucleotide bases. Physical strands of DNA carrying that sequence can then be synthesized.
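To make the conversion step concrete, here is a minimal sketch in Python. It assumes a simple mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). That mapping and the function name `encode_to_dna` are illustrative assumptions, not the scheme used by the researchers cited here; practical systems add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_to_dna(data: bytes) -> str:
    """Convert raw bytes into a string of A/C/G/T, two bits per base."""
    # Write every byte as eight binary digits and join them into one bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Read the bit string two digits at a time and look up the matching base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    print(encode_to_dna(b"Hi"))  # prints CAGACGGC
```

With this mapping, every byte becomes exactly four bases, so a 1-kilobyte file would need 4,000 bases before any error-correction overhead.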
Once the data has been converted into a DNA code, it is written into physical strands by chemical DNA synthesis. The resulting strands can then be copied using a process called polymerase chain reaction (PCR), which amplifies existing DNA into many identical copies. The synthesized DNA strands can be stored in a variety of ways, including in a test tube or on a chip.
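How quickly PCR multiplies strands can be estimated with a short calculation. The sketch below assumes the textbook idealization in which each cycle multiplies the number of copies by (1 + efficiency), so a perfect reaction doubles the count every cycle. The function name `pcr_copies` and its parameters are hypothetical, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_strands: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the number of DNA copies after a given number of PCR cycles.

    efficiency is the fraction of strands copied per cycle (1.0 means
    perfect doubling). This is an idealized model, not a lab protocol.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Each cycle multiplies the current number of strands by (1 + efficiency).
    return initial_strands * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One strand, 30 ideal cycles: 2**30 copies, a little over a billion.
    print(pcr_copies(1, 30))
```

Under this model, 30 ideal cycles turn a single strand into 2^30 copies, which is why a tiny sample is enough to recover stored data.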
To retrieve the stored data, the DNA strands must be sequenced, which means determining the specific order of the nucleotide bases. There are several different techniques for DNA sequencing, and many of them read the DNA in short fragments, determining the order of the bases in each fragment. Once the sequences of the DNA strands have been determined, the stored data can be recovered by converting the DNA code back into its original binary form.
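The final conversion from bases back to binary can be sketched as follows. The example assumes a simple, hypothetical mapping of two bits per base (A to 00, C to 01, G to 10, T to 11) and a read that is already error-free; the function name `decode_from_dna` is illustrative, and real pipelines first have to correct sequencing errors and reassemble fragments in the right order.

```python
# Hypothetical base-to-bits mapping, chosen only for illustration.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def decode_from_dna(strand: str) -> bytes:
    """Convert a string of A/C/G/T back into bytes, two bits per base."""
    # Four bases make one byte, so a partial byte means the read is incomplete.
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    try:
        bits = "".join(BASE_TO_BITS[base] for base in strand)
    except KeyError as err:
        raise ValueError(f"unexpected base: {err.args[0]!r}") from None
    # Slice the bit string into groups of eight and turn each into a byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    print(decode_from_dna("CAGACGGC"))  # prints b'Hi'
```

Rejecting strands with an unexpected length or symbol is a deliberate choice here: silently dropping bases would shift every later byte and corrupt the rest of the file.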
While DNA has the potential to serve as a virtually unlimited and long-lasting medium for storing data, it also has several disadvantages. The main ones are the cost and complexity of the process. Synthesizing and reading DNA is still relatively slow and expensive, and further research and development is needed to make it a more practical and widely used technology.
In conclusion, while DNA data storage is not yet a practical option for most applications, it holds great potential for the future. With the increasing amount of data being generated and stored on the internet (2), there is a need for new and innovative methods for storing and preserving this data. DNA storage may be able to address this need by providing a virtually unlimited storage capacity and, under suitable conditions, the ability to preserve data for hundreds of thousands of years. While there are still challenges to be overcome, the future looks bright for DNA as a revolutionary new way to store and preserve information.
1. https://www.sciencedaily.com/releases/2016/04/160407121455.htm
2. https://www.datatobiz.com/big-data/