To store 1% of the world’s data: what does that cost in terms of CO2 emission?

Tineke van Heijst · 19 May (edited 21 May) · KIA Community

This is blog 5 in the blog series about Green IT.

We are heading towards a data explosion. In 2026, the amount of data worldwide will likely exceed the number of grains of sand on all the world's beaches. This massive amount of data will largely be stored in the cloud, with huge CO2 emissions as a result. This prospect is a source of great concern for the discipline of digital preservation. In this blog we discuss the hypothetical CO2 impact of storing just 1% of this data, as calculated by Matthew Addis, director of software supplier Arkivum.

The idea of making this calculation emerged during a panel debate on the growth of data at iPRES 2023, a worldwide congress on digital preservation. The debate, titled 'Tipping Point' (1), examined whether we have reached a point where the amount of data created annually has become so large that it can no longer be processed in a meaningful manner, let alone managed and stored for the long term. The debate also stressed the climate costs of digital preservation.

Ballpark figure

The English entrepreneur Matthew Addis participated in this panel debate. He has been studying the environmental impact of digital preservation for some time. As the debate notably failed to present any specific figures on the impact of the long-term storage of huge amounts of data, he decided to make a hypothetical calculation (2).

Based on his calculations, performed in two different ways, he concluded that preserving just 1% of the data produced worldwide would amount to 10 MegaTonnes (MT) of CO2 equivalent per year. That is roughly the annual emission of a city of 2 million people, about the size of Chicago, near where iPRES 2023 happened to take place.

Although Matthew cautions that his calculation is a “ballpark figure”, or rough estimate, it remains interesting to share his reasoning in this blog.

How does he arrive at these figures?

Matthew used two methods to arrive at his conclusion: a top-down calculation, based on the global data volume and the CO2 emission of the ICT sector, and a bottom-up calculation, based on the actual measurements of the footprint of Arkivum’s large-scale preservation activities.

Top-down calculation based on the global data volume

Let’s first look at the top-down approach. For the total amount of data produced globally in one year, Matthew assumes the round number of 100 Zettabytes. The annual global CO2 emission is 40 GigaTonnes. Various studies suggest that the IT sector’s contribution is between 2 and 4%. This implies that the IT sector is responsible for approximately 1 GigaTonne of CO2 emission per year to support the production, processing, distribution and consumption of 100 ZB of data.

If we next assume that we wish to permanently preserve 1% of this amount, then the processing, storage, retrieval and use of this data would account for roughly 1% of that 1 GigaTonne, i.e. CO2 emissions of 10 MegaTonnes per year.
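To make the top-down arithmetic explicit, here is a minimal back-of-the-envelope sketch; the variable names and the 2.5% ICT share are our own illustrative choices (within the 2-4% range cited above), not figures from Matthew's article.

```python
# Back-of-the-envelope version of the top-down estimate (rounded figures from the text).
global_data_zb = 100        # data produced worldwide per year, in zettabytes
global_co2_gt = 40          # total global CO2 emissions per year, in gigatonnes
ict_share = 0.025           # assumed ICT share of global emissions (studies cite 2-4%)
preserved_fraction = 0.01   # fraction of the world's data we assume is preserved long-term

ict_co2_gt = global_co2_gt * ict_share                 # ~1 gigatonne per year for the ICT sector
preserved_co2_gt = ict_co2_gt * preserved_fraction     # 1% of that gigatonne

print(f"ICT sector: ~{ict_co2_gt:.1f} Gt CO2 per year")
print(f"Preserving 1% of the data: ~{preserved_co2_gt * 1000:.0f} Mt CO2 per year")
# -> roughly 10 megatonnes of CO2 per year
```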

Bottom-up calculation based on Arkivum as a case study

What did the bottom-up approach show? Arkivum has researched the actual CO2 footprint of its preservation activities, assessing the energy use and integrated footprint (3) of the ICT equipment involved (4).

Considering the huge amounts of data involved, the cloud is at present the only infrastructure capable of storing this volume in a sustainable manner. The CO2 emission varies depending on the type and volume of the data, and on how it is used, stored and retrieved. According to the figures from the Arkivum study, processing, storing and accessing 1 terabyte of data corresponds to an emission of 10 kg of CO2. Extrapolate this to 1 ZB of data, and you again arrive at a figure of 10 MegaTonnes of CO2 emission per year.
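The extrapolation itself is simple arithmetic; a hypothetical sketch using the rounded per-terabyte figure quoted above:

```python
# Bottom-up extrapolation from the per-terabyte figure (rounded, illustrative only).
kg_co2_per_tb = 10            # processing, storing and accessing 1 TB ~ 10 kg CO2 per year
tb_per_zb = 1_000_000_000     # 1 zettabyte = 10^9 terabytes
preserved_zb = 1              # 1% of the ~100 ZB produced per year

total_kg = kg_co2_per_tb * tb_per_zb * preserved_zb
total_mt = total_kg / 1_000_000_000   # 1 megatonne = 10^9 kg

print(f"~{total_mt:.0f} Mt CO2 per year")   # -> again roughly 10 megatonnes per year
```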

As noted, this is a hypothetical approach that can be debated. The calculation is based on projected growth figures, energy use, and the assumption of 1% preservation, and the figures have been rounded off for simplicity. Yet even though it is mainly the reasoning process that is debatable here, the resulting figures are astonishing, and they certainly have some bearing on reality.

Recommendations from the panel debate

The debate at iPRES 2023 centred on tackling the growing databerg at a time when the discipline of digital preservation is already challenged by shortages of workers, skills, financial resources and technology. Add to this the ecological impact of our work. A number of solutions were proposed:

- Save less and be more selective, so that only the data that truly need to be kept long-term are saved and the rest can be discarded.

- Accept that we cannot preserve everything, however much we would like to.

- Critically review the processes we apply to secure digital data for the future. Are all process steps truly necessary?

- Adjust expectations regarding access to our collections. Is it really necessary for everything to always be accessible, everywhere and all at once?

- Look at efficiency throughout the sector: make use of shared platforms and avoid the double storage of collection items, for instance in both your own collection and in a cross-institutional collection.

Finally

We will explore the questions above, which are in fact dilemmas for users, more thoroughly in the last theme of this blog series. The main goal of this blog was to draw attention to the volume of digital material heading our way and the associated potential CO2 impact. Suppose we did start storing an amount of data with a CO2 footprint comparable to that of a city of 2 million inhabitants. Can we justify this to future generations? Does our sector carry a responsibility to prevent this grim scenario?

Sources

(1) Stokes, Paul & Colbron, Karin, 'Tipping point: Have we gone past the point where we can handle the Digital Preservation Deluge?', iPRES 2023 paper, available online, last consulted on 29 October 2023.

(2) Addis, Matthew, 'What is the carbon footprint of large-scale global digital preservation?', published on DPC Online on 3 October 2023, last consulted on 29 October 2023.

(3) As described in the introductory blog, there is much debate on how to measure the CO2 footprint. Some studies focus only on energy use; others take account of the entire ICT lifecycle of equipment. When calculations include the ICT lifecycle, the associated term is the ‘embodied footprint’ or ‘embedded footprint’. Matthew Addis calls this the “integrated footprint” in his article.

(4) Addis, Matthew, 'Does net zero emissions from energy usage in the cloud mean carbon free digital preservation is on the horizon?', published on DPC Online on 31 July 2023, last consulted on 29 October 2023.

About the blog series on Green IT

This blog series aims to familiarise heritage institutions with the subject of Green IT, making it easier to discuss this important topic within the organisation. The next blog first takes a closer look at CO2 emission and its impact, and then applies the issue to the heritage sector.

This series was written by Tineke van Heijst, green tech watcher of the Green IT network group set up by the Dutch Digital Heritage Network (Netwerk Digitaal Erfgoed, NDE). This network group monitors developments regarding Green IT and the impact of the increasing digitalisation on the climate. The group specifically studies the (increasing) digitalisation within the heritage sector.

Previously published in this blog series:

Introduction to Green IT

- IT’s double role in sustainability

- The need for a sustainability framework for the heritage sector

Data Storage

- The digital databerg

- The hidden impact of cloud storage
