Commercial Social Media Archiving Tools: Summary of First Survey by NDE/IISG

  • Feb 2022
  • Zefi Kavvadia
  • Updated 27 Jun
Private Websites and Social Media
  • Antal Posthumus
  • Patricia den Ambtman
  • Sophie Ham

Social media archiving is one of the latest hot topics in digital preservation and information and records management. It is now becoming widely recognized that the contemporary record, whether it is meant for evidence, research, memory or any combination of the above, can and will very often contain material originating from various social media platforms.

Background of the project

In the Netherlands, too, there has recently been mounting interest in developing and establishing social media archiving expertise. One of the problems that many organizations interested in preserving social media face is the lack of standardized tools and solutions. There is a wide range of what could almost be described as DIY approaches to capturing social media, mostly relying on open-source software that often requires considerable technical skill. For many organizations in cultural heritage and local government, however, it can be difficult to find the time, staff, and tools needed to implement and use open-source social media archiving tools reliably and systematically.

To address this need, the Dutch Digital Heritage Network (NDE) decided to expand its existing research into free and open-source social media archiving tools by looking into the possibilities of paid solutions. This article summarizes the survey research conducted in the second half of 2021 and presents the five commercial providers about which the most information could be gathered. Several other providers were also contacted, but they are not included in this summary because information gathering on them has not yet been completed.

Purpose

The purpose of this document is not to review or assess the commercial tools against specific standards, or to compare them with one another to establish which is most suitable for which activity. Rather, it is an introductory survey covering different aspects of commercial social media archiving solutions, such as their scope, technical features, and usability. The inclusion or exclusion of any commercial tool in this survey should therefore not be considered an endorsement (or a rejection) of that tool or company by NDE/IISG; the aim of this article is to inform interested organizations about their options.

Tool list and features

The following list was compiled based on requirements used previously in our free and open-source tools survey, and on requirements defined following both public and private discussions with colleagues from the heritage field, government archival organizations, and representatives of various commercial web and social media archiving companies.

The tools examined so far are:

  • Archive-It

  • Intradyn

  • MirrorWeb

  • Pagefreezer/Webpreserver

  • Socialex

Note: All information listed about the commercial tools is indicative and may be subject to change. Moreover, because most of these tools are neither available for trial nor backed by significant public documentation about their functions, some gaps or inaccuracies unfortunately remain in this survey.

For this reason, and since most providers offer at least some room for specific, tailor-made solutions, organizations interested in a feature or service that is not explicitly listed here are advised to contact the company directly.

Finally, if readers notice any significant omission or inaccuracy, we will be glad to hear about it and edit the article accordingly.

------------------------------------------------------------------------------------------------------------

Which social media platforms can the tool archive?

  • Archive-It: Facebook, Twitter, YouTube, Instagram, Tumblr, Pinterest, Flickr, Vimeo, and more

  • Intradyn: Facebook, Twitter, YouTube, LinkedIn, Instagram, Pinterest, Flickr, and more upon request

  • MirrorWeb: Facebook, Twitter, YouTube, Flickr, Vimeo, as well as internal communication platforms such as Slack and SharePoint, and messaging apps such as Telegram

  • Pagefreezer/Webpreserver: Pagefreezer can archive Facebook, Twitter, YouTube, Instagram, and LinkedIn. Webpreserver can archive these and more upon request. Additionally, WhatsApp Business archiving is available, but only outside the EU.

  • Socialex: Facebook, Twitter, LinkedIn, YouTube, Instagram, WhatsApp

Does the tool make use of social media platform APIs, web crawlers, or both?

  • Archive-It: Crawlers like Brozzler, Heritrix, and more

  • Intradyn: API-based capture via third-party services (Pagefreezer)

  • MirrorWeb: Crawlers like Electrolyte (proprietary), plus other open-source software

  • Pagefreezer/Webpreserver: API-based (Pagefreezer), crawlers (Webpreserver)

  • Socialex: Direct API access to Twitter; API access via a third-party service (Obi4Wan)

Can the tool capture private messages, private groups, and other non-public content?

  • Archive-It: Yes, but passwords and usernames are required

  • Intradyn: No, they focus on public content

  • MirrorWeb: Yes, if the content is behind a log-in for which there is a password and username available

  • Pagefreezer/Webpreserver: Yes, if passwords and usernames are provided by account owners

  • Socialex: Yes, if there are passwords and usernames of the account owners available

Can the tool capture personal social media accounts, as well as pages and groups?

  • Archive-It: Yes

  • Intradyn: No, only pages

  • MirrorWeb: Yes

  • Pagefreezer/Webpreserver: Yes

  • Socialex: Yes

Can potentially sensitive information be excluded from the capturing process and/or removed from the captured data (e.g., by anonymizing users, removing addresses, or censoring photos)?

  • Archive-It: Content can be filtered out by scoping, i.e., determining which URLs will be included in or excluded from a capture, but already captured data cannot be modified

  • Intradyn: Terms can be redacted in the exported files after a capture

  • MirrorWeb: Scoping rules can be applied to exclude URLs from being captured by a crawler

  • Pagefreezer/Webpreserver: Pagefreezer allows users to choose which of the collected data will be made available for public access or included in an export

  • Socialex: It is possible to redact information like names, reactions, etc.
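Since several answers above rely on URL scoping, a minimal sketch of how such rules might decide whether a crawler fetches a URL may help. It assumes rules expressed as regular-expression include and exclude lists, with exclusions taking precedence. The function name `in_scope` and the example patterns are hypothetical and do not reproduce the rule syntax of Archive-It, MirrorWeb, or any other provider.

```python
import re
from typing import Iterable

# Hypothetical scoping rules, for illustration only; real providers use their
# own rule formats and interfaces.
INCLUDE_PATTERNS = [r"^https://twitter\.com/example_org(/|$)"]
EXCLUDE_PATTERNS = [r"/photo/", r"/followers$"]


def in_scope(url: str,
             include: Iterable[str] = INCLUDE_PATTERNS,
             exclude: Iterable[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if a crawler should fetch `url` under the given rules.

    A URL is in scope when it matches at least one include pattern and
    no exclude pattern, so exclusions override inclusions.
    """
    if not any(re.search(p, url) for p in include):
        return False
    return not any(re.search(p, url) for p in exclude)


if __name__ == "__main__":
    candidates = [
        "https://twitter.com/example_org",
        "https://twitter.com/example_org/status/1/photo/1",
        "https://twitter.com/example_org/followers",
        "https://twitter.com/someone_else",
    ]
    for url in candidates:
        print(url, in_scope(url))
```

Note that a filter of this kind only prevents content from being captured in the first place; as the Archive-It answer indicates, it does not alter data that has already been collected.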

In which formats can the collected data be exported and shared?

  • Archive-It: WARC

  • Intradyn: PDF, EML

  • MirrorWeb: WARC

  • Pagefreezer/Webpreserver: WARC, MHTML, PDF, JPG

  • Socialex: PDF/A, WARC

How is the captured social media data transferred to the organization, if the capture does not happen in-house?

  • Archive-It: The captured data is available via an online application for browsing and download, and it is also possible to integrate an organization’s systems with the collections by using the Archive-It API

  • Intradyn: Available via the provider’s dashboard for download

  • MirrorWeb: Available via the provider’s dashboard for download

  • Pagefreezer/Webpreserver: -

  • Socialex: Available via an online dashboard, which can also be integrated with an organization’s content management system

Can the tool be used by staff themselves to capture data, or do the providers perform the captures on behalf of the organization?

  • Archive-It: The web application is meant to be used by the organization’s staff to perform captures themselves

  • Intradyn: The company performs the captures

  • MirrorWeb: -

  • Pagefreezer/Webpreserver: A web application and a browser plug-in can be used by staff themselves to perform captures (WebPreserver)

  • Socialex: The company performs the captures

Is continuous, real-time capture available?

  • Archive-It: Captures can be scheduled in various frequencies (one-time, daily, weekly, monthly, etc.)

  • Intradyn: Continuous monitoring and capture of new content is possible

  • MirrorWeb: Continuous monitoring and capture of new content is possible

  • Pagefreezer/Webpreserver: Continuous monitoring and capture of new content is possible, as well as scheduling

  • Socialex: Continuous monitoring and capture of new content is possible

Is support available for organizations using the tool (e.g., training, guides, technical support)?

  • Archive-It: Guides, webinars, consultation with web archivists, support

  • Intradyn: Guides, support available

  • MirrorWeb: Support available

  • Pagefreezer/Webpreserver: Training and support available

  • Socialex: Training and support available

Future goals

For now, our aim is to expand the survey to include missing information on the tools we already looked into, and potentially gather data on some others as well.

We would also like to gain more first-hand experience with the tools since, in the short time frame of the project, we were able to gain access to only one of them.

Finally, we would like to expand this document with more information and detail on the requirements and features of the tools, which we hope organizations can use when investigating which solution suits them best.

Note: Edited on 22.02.22 to include more information on Socialex.

For questions and comments about this article, the NDE social media archiving tools research projects, and any other related matter, please contact Zefi Kavvadia (IISG) at zefi.kavvadia@iisg.nl and Sophie Ham (KB) at Sophie.Ham@KB.nl
