Data storage in synthetic DNA: coding and decoding in secret code

  • 19 May
  • Tineke van Heijst
  • Updated 21 May
KIA Community

This is blog 7 in the blog series about Green IT.

Visitors to the Archive Days 2023, organised by the Royal Association of Archivists in the Netherlands (KVAN), will remember the fascinating story by researcher Dina Zielinski about storing data in synthetic DNA. She also explains it clearly in her TED talk on YouTube (1). Despite the advanced stage of research, the cost of storage in DNA remains an obstacle. Nevertheless, the technology is expected to become commercially available within the next five to ten years.

How does it work?

DNA is contained in the cells of our body as a kind of blueprint. It somewhat resembles a very long chain built up of four building blocks: adenine (referred to as A), cytosine (C), guanine (G) and thymine (T). Within a strand, these building blocks are strung together by a sugar-phosphate backbone, while hydrogen bonds between paired building blocks hold the two strands of the double helix together.

A data file consists of 1s and 0s, which can be grouped into pairs: 00, 01, 10 and 11. These four patterns can be converted into the four building blocks of DNA. It is like creating a sort of secret code, where 00 is converted into A, 01 into C, 10 into G, and 11 into T. This ‘codified message’ is then transferred to a laboratory where it is stored in synthetic DNA.

You can then take home the result in a tube approximately three centimetres long. When you want to read the stored data, you send the tube back to the laboratory, where all the As, Cs, Gs and Ts are converted back into binary code: the bits and bytes of the original files. (2,3)
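To make the ‘secret code’ concrete, here is a minimal sketch in Python of the two-bit mapping described above (00 to A, 01 to C, 10 to G, 11 to T) and its reverse. The function names `encode` and `decode` are made up for this illustration. Real DNA storage schemes are more involved than this simple substitution (they add error correction and addressing, for instance), so treat this as an illustration of the idea only, not as how a laboratory actually does it.

```python
# Mapping taken from the text: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, two bits per letter (illustrative helper)."""
    # Write every byte as eight 0/1 characters and join them into one long bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Walk through the bit string two bits at a time and look up the matching letter.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA letters back into the original bytes (illustrative helper)."""
    # Replace every letter with its two bits.
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    # Regroup the bits into bytes of eight bits each.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Hi".encode("ascii")
strand = encode(message)
print(strand)          # the 'secret code' for the two characters "Hi"
print(decode(strand))  # the original bytes again
```

Each byte becomes four letters, so a file of one million bytes would turn into a code of four million letters under this simple mapping.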

Many advantages

Storing data in synthetic DNA offers a range of advantages. For instance, data can be stored for at least 10,000 years without any data loss. By moving all the data that does not need to be immediately accessible to DNA storage, we can substantially reduce the amount of energy required by cloud storage. A further advantage is that data stored in this way cannot be overwritten. (4)

DNA furthermore offers an astounding data density. For example, the information contained in 100 million HD films requires a volume approximately the size of a pencil eraser. It is even estimated that all the data on the internet could be stored in something like the size of a shoe box. (5)

Another important advantage of storage in synthetic DNA is that, as long as people exist, it will always be possible to decode the stored data. After all, the code to effect the translation is found in our own body.

But some drawbacks as well

Despite the many advantages of this technology, there are currently some drawbacks as well. Consider for instance the high cost: 1,000 dollars per TB. Additionally, the time it takes to decode the data is relatively long. The process requires returning the data to the laboratory for a chemical treatment, which means that the data is not immediately available. Also, there is a considerable risk of loss of quality when repeatedly decoding the data.

Looking at the environmental impact of DNA storage, what stands out is that, once the data has been stored, no further energy is required (cold storage). However, DNA functions best at low temperatures and in a dark environment. This makes it slightly more sensitive to environmental influences than for instance glass, which we will discuss shortly.

Another disadvantage is that the coding and decoding require biochemical processes (comparable to the PCR test familiar from the COVID pandemic), which use critical natural resources. One of these is phosphorus, which is even rarer than silicon. (6)

Greater data stability

In May 2023, the journal De Ingenieur published an article describing how an international team of researchers, including researchers from TU Eindhoven, Radboud University and Microsoft, developed a micro sphere. Strings of nucleotides that together form one file can attach to this sphere. (7) This makes it possible to retrieve specific information much faster.

Previously you needed to decode the entire ‘secret code’, and it was vitally important that the data had metadata affixed in the correct manner to make it retrievable; it’s comparable to looking through a huge stack of A4 sheets. These micro spheres make it possible to search in a much more targeted manner, comparable to browsing through books rather than digging around in a mountain of paperwork.

This new method also ensures a more stable quality of the stored data. Previously, around 35% of the quality would be lost after three rounds of decoding, but this improved approach has brought that figure down to just 0.3%.

Sooner than we think

Researchers expect that the first DNA datacentre will open within five to ten years. This datacentre will contain a special section where new files are encoded through DNA synthesis. At the same time, another part of the building will house large fields of micro spheres containing those files. A robot arm will select a sphere, read the contents, and place it back in the same spot.

You can see a video on YouTube in which a team of researchers from the University of Washington, working with Microsoft, demonstrate the first fully automated system for the storage and retrieval of data using synthetic DNA. (8) The video was made in March 2019. Given the pace of these developments, the prognosis of five to ten years certainly seems realistic.

Sources:

(1) Zielinski, Dina, ‘How can we store digital data in DNA’, TED Talk available on YouTube, last viewed on 1 November 2023.

(2) Seeker, ‘We Could Back Up The Entire Internet On A Gram of DNA’, YouTube video, last viewed on 1 November 2023.

(3) TED-Ed, ‘Is DNA the future of data storage’, TED-Ed video available on YouTube, last viewed on 1 November 2023.

(4) Heijst, Ad van, ‘Monitoring van opslagtechnieken 5: Informatieopslag in de vorm van DNA’, available on KIA Pleio, published on 3 February 2023.

(5) Reactions, ‘Is DNA The Future of Data Storage?’, YouTube video, last viewed on 1 November 2023.

(6) De Ingenieur, ‘Aminozuren als alternatief voor de cloud’, published on 1 May 2019.

(7) De Ingenieur, ‘Data-opslag in DNA weer stap dichterbij’, published on 5 May 2023.

(8) Microsoft Research, ‘Microsoft and UW demonstrate first fully automated DNA Data Storage’, YouTube video, last viewed on 1 November 2023.

About the blog series on Green IT

This blog series aims to familiarise heritage institutions with the subject of Green IT, making it easier to discuss this important topic within the organisation. The next blog first takes a closer look at CO2 emissions and their impact, and then applies the issue to the heritage sector.

This series was written by Tineke van Heijst, green tech watcher of the Green IT network group set up by the Dutch Digital Heritage Network (Netwerk Digitaal Erfgoed, NDE). This network group monitors developments regarding Green IT and the impact of the increasing digitalisation on the climate. The group specifically studies the (increasing) digitalisation within the heritage sector.

Previously published in this blog series:

Introduction into Green IT

IT’s double role in sustainability - KIA community

The need for a sustainability framework for the heritage sector - KIA community

Data Storage

The digital databerg - KIA community

The hidden impact of cloud storage - KIA community

To store 1% of the world’s data: what does that cost in terms of CO2 emission? - KIA community

The quest for sustainable alternatives to disks and tapes - KIA community
