It’s Raining Tweets: Preserving Social Media Content is the Next Big Archival Challenge

  • May 2018
  • Zefi Kavvadia
  • Updated 27 Jun
KIA Community
  • Violet
  • Jacqueline Deknatel
  • Arjan Blom

So much of our lives now happens in the newly but firmly established spaces of social media networks. Ideas about a separation between Internet presence and ‘real life’ are becoming increasingly obsolete. As one meme puts it, we used to say BRB (Be Right Back) when we left our computers briefly; there is no need for this anymore, because we live here now.

Practical, intellectual, and even emotional labor is now carried out on the web. Facebook is being questioned about what can essentially be considered a breach of democratic process in one of the most powerful countries in the world. Politicians and commercial and non-commercial bodies engage in Twitter battles and performances. Fringe communities formed on Reddit, 4chan, and Tumblr not only attract media attention but also impact public life, often tragically. All of this presents us, as information and heritage professionals, with new challenges.

While we would once preserve journals and scrapbooks, folders of correspondence, photo albums, and newspaper clippings in order to serve our functions as keepers of memory and evidence, we now need to turn to bits, files, and data in all forms and flavors. Social media archiving thus becomes an imperative that cultural heritage organizations are slowly beginning to deal with, with varying degrees of success and efficiency. For example, in 2010 the Library of Congress in the US began a project with the aim of archiving every single public tweet since Twitter’s founding. The project ran into hurdles and was finally discontinued last year, as the LOC admitted that it is no longer viable to preserve every tweet: the collection had grown too big for comfort, and only selected tweets will now be archived.

Interestingly, the conversation on the principled and systematic preservation of social media data often overlaps with debates about information overabundance, freedom of expression vs. protection of privacy, and the Internet’s inherent storage and description capabilities. It seems that up until now, we have mostly been content with the idea that whatever was online would somehow stay there, provided we made the necessary updates and patches. But the reality is that Facebook pages are taken offline because of terms violations and Twitter users are banned, their tweets lost with them. As a UNESCO Expert Meeting on long-term preservation of digital material concluded in 2015, we are losing a lot and we are losing it now. The idea of the Internet as a self-sustaining archive cannot be responsibly supported, and stakeholders in cultural heritage and information management must think about how to tackle the problem of preserving the fluid and dynamic information encoded in social media content.

In the Netherlands, so far there has been no sustained effort by a cultural heritage institution to archive social media.* While the KB and the Institute of Sound and Vision have, among others, been collecting born-digital materials such as websites and videos, social media data sets are, to my knowledge, only preserved on platforms such as EASY, created through DANS by KNAW and NWO with the purpose of storing and managing research data for the long term. While such data sets are valuable, they are not sufficient.

The International Institute of Social History (IISH) has recently kicked off its born-digital collection development effort. One important feature of this, and part of the focus of my internship project there, is the creation and optimization of a workflow for preserving social media from the perspective of a cultural heritage organization with a specific mission statement and subject matter. This means there are technical and conceptual questions to answer, which we have slowly but steadily started to work on. For example, the acquisitions policy of the institute does not yet include specific points on what to collect from social networks. As IISH has a “safe haven” function, safeguarding records that their creators cannot or will not be able to preserve themselves, it will very often need to sort out exactly which data it is willing to preserve. This is why one of the next steps is to redefine the acquisition policy, as well as the digital preservation policy, which has so far been geared mostly towards digitized materials. At the same time, and pertinent to the appraisal and selection of materials, there is the question of context and provenance: how much are we to preserve to make a Facebook page meaningful? Social media posts appear and disappear in an ever-changing environment, both because the networks are built on content generated in real time and because the websites themselves are versioned and updated. This makes it tricky to decide what the relevant context for any given social media record is, and thus to ensure authenticity and reliability.

On the more practical side of things, we have been faced with several decisions. One is which preservation formats to select for our social media records (XLSX and CSV are human-readable and well-documented, yet JSON is machine-readable and interoperable). Another is coming up with harvesting schedules: the feeds of large organizations with a more streamlined social media presence can arguably be harvested only a few times per year, while prolific accounts may have to be harvested monthly or even weekly. We have also had to set up a software tool suite and overcome the initial learning curve and, very importantly, juggle the various licenses, permissions, and terms to be respected when using content owned by social media corporations. With GDPR coming into effect in less than two weeks, we must now think ahead about how it will affect our access and publishing practices. Specifically, we must think about how best to bring together donor agreement terms and legislative requirements without obstructing the needs and desires of the research community that the institute serves, and of course of the general public.
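
To make the format question a little more concrete, here is a minimal sketch in Python of how the same harvested records might be written out as both JSON and CSV. Everything in it is an assumption made for illustration: the sample posts, the field names, and the helper functions `to_json` and `to_csv` are hypothetical and do not reflect the institute's actual workflow or any platform's real data model. What it tries to show is that JSON keeps nested fields (such as a list of hashtags) intact, while CSV needs them flattened into a single cell.

```python
import csv
import io
import json

# Hypothetical harvested records. The field names are illustrative only
# and do not correspond to any particular platform's API.
posts = [
    {
        "id": "1",
        "created_at": "2018-05-01T09:30:00Z",
        "author": "example_org",
        "text": "Archive opening hours change next week.",
        "hashtags": ["archives", "news"],
    },
    {
        "id": "2",
        "created_at": "2018-05-02T14:05:00Z",
        "author": "example_org",
        "text": "New collection, now online",
        "hashtags": [],
    },
]


def to_json(records):
    """Serialize records as JSON, keeping nested fields (lists) intact."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Flatten records into CSV; the hashtag list is joined with ';'."""
    fieldnames = ["id", "created_at", "author", "text", "hashtags"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)  # copy, so the original record is not modified
        row["hashtags"] = ";".join(record["hashtags"])
        writer.writerow(row)
    return buffer.getvalue()


json_text = to_json(posts)
csv_text = to_csv(posts)
```

Reading the JSON back gives the original structure unchanged, whereas reading the CSV back gives only strings, so the `;` convention for hashtags would have to be documented alongside the file. That kind of side documentation is part of what a preservation workflow would need to capture if a tabular format is chosen.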

While this is all definitely complicated, these are exciting times for archives and cultural heritage institutions. The human archive has transformed, and together with it, our theories and practices. We live here now, and this is why we must strive to make it better.

* I have not been able to track down any such project after some research online. Admittedly, my Dutch is next to non-existent and I might have missed some important information because of the language barrier. If this is the case, and if there are projects by any institution or individual on the preservation of social media, please share some information on them! And of course, share any kind of observation, comment, or idea about this blog.

Comments

One comment, 16 May 2018
  • Interesting blog Zefi! At the moment I'm doing research about roughly the same topic. Maybe we can help each other. If you want you can contact me at arjan.blom@student.uva.nl and then we can discuss some topics.
    Best regards, Arjan.

    Arjan Blom
