When slowing down means moving fast: A blog about iPRES 2022 from the IISG
Summary: Zefi Kavvadia, Caro Matulessya, & Robert Gillesse, International Institute of Social History
Social media archiving is one of the latest hot topics in digital preservation and information and records management. It is now becoming widely recognized that the contemporary record, whether it is meant for evidence, research, memory or any combination of the above, can and will very often contain material originating from various social media platforms.
In the Netherlands, too, interest in developing and establishing social media archiving expertise has recently been mounting. One of the problems that many organizations interested in preserving social media face is the lack of standardized tools and solutions. There is a wide range of what can almost be described as DIY approaches to capturing social media, mostly relying on open-source software that often requires reasonable technical skills. But for many organizations in cultural heritage and local government, it might be difficult to find the resources in time, staff, and tools to implement and use open-source social media archiving tools reliably and systematically.
To address this need, the NDE decided to expand on the existing research into free and open-source social media archiving tools by looking into the possibilities of paid solutions. This article is a summary of the survey research that was conducted in the second half of 2021, and it presents the five main commercial providers about which the most information could be gathered. Several other providers were also contacted, but they have not been included in the present summary as the information gathering on them has not been completed yet.
The purpose of the present document is not to review or assess the commercial tools against specific standards or to compare them against one another to establish which is most suitable for which activity. Rather, it is meant as an introductory survey that covers different aspects of the commercial social media archiving solutions, such as their scope, technical features, usability, etc. Inclusion or exclusion of any commercial tool in this survey is therefore not to be considered an endorsement (or the lack thereof) of that tool or company by NDE/IISG, as the aim of this article is to inform interested organizations about their options.
The following list was compiled based on requirements used previously in our free and open-source tools survey, and on requirements defined following both public and private discussions with colleagues from the heritage field, government archival organizations, and representatives of various commercial web and social media archiving companies.
The tools examined so far are Archive-It, Intradyn, MirrorWeb, Pagefreezer/Webpreserver, and Socialex.
Note: All of the information listed about commercial tools is indicative and may be subject to change. Additionally, because most of these commercial tools are not available for a trial and do not provide any significant public documentation about their functions, there are unfortunately still some gaps or inaccuracies in this survey.
Because of this, and since most of the providers offer at least some room for specific and tailor-made solutions, organizations interested in a feature or service that is not listed explicitly here are advised to contact the company about it.
Additionally, if readers discover any glaring omission or inaccuracy, we will be glad to hear about it and edit accordingly.
------------------------------------------------------------------------------------------------------------
Which social media platforms can the tool archive?
Archive-It: Facebook, Twitter, YouTube, Instagram, Tumblr, Pinterest, Flickr, Vimeo, and more
Intradyn: Facebook, Twitter, YouTube, LinkedIn, Instagram, Pinterest, Flickr, and more upon request
MirrorWeb: Facebook, Twitter, YouTube, Flickr, Vimeo, as well as internal communications platforms like Slack and SharePoint, and messaging apps like Telegram
Pagefreezer/Webpreserver: Pagefreezer can archive Facebook, Twitter, YouTube, Instagram, LinkedIn. Webpreserver can archive these and more upon request. Additionally, WhatsApp Business archiving is available but only outside the EU.
Socialex: Facebook, Twitter, LinkedIn, YouTube, Instagram, WhatsApp
Does the tool make use of social media platform APIs, web crawlers, or both?
Archive-It: Crawlers like Brozzler, Heritrix, and more
Intradyn: API-based capture via third-party services (Pagefreezer)
MirrorWeb: Crawlers like Electrolyte (proprietary), plus other open-source software
Pagefreezer/Webpreserver: API-based (Pagefreezer), crawlers (Webpreserver)
Socialex: Direct API access to Twitter, API via third-party service (Obi4Wan)
Can the tool capture private messages, private groups, and other non-public content?
Archive-It: Yes, but passwords and usernames are required
Intradyn: No, they focus on public content
MirrorWeb: Yes, if the content is behind a log-in for which there is a password and username available
Pagefreezer/Webpreserver: Yes, if passwords and usernames are provided by account owners
Socialex: Yes, if there are passwords and usernames of the account owners available
Can the tool capture personal social media accounts, as well as pages and groups?
Archive-It: Yes
Intradyn: No, only pages
MirrorWeb: Yes
Pagefreezer/Webpreserver: Yes
Socialex: Yes
Is it possible to exclude from the capturing process, and/or remove from the captured data, potentially sensitive information, e.g., anonymize users, remove addresses, censor photos, etc.?
Archive-It: It is possible to filter out content by scoping, i.e., determining which URLs will be included or excluded from a capture, but not to modify already captured data
Intradyn: Terms can be redacted in the exported files after a capture
MirrorWeb: Scoping rules can be applied to exclude URLs from being captured by a crawler
Pagefreezer/Webpreserver: Pagefreezer allows users to choose which of the collected data will be made available for public access or included in an export
Socialex: It is possible to redact information like names, reactions, etc.
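To illustrate what scoping means in practice, here is a minimal Python sketch of include and exclude rules applied to candidate URLs before a capture. It is not how Archive-It or MirrorWeb implement scoping; the patterns, the `example_org` account name, and the `in_scope` function are all hypothetical and serve only to show the idea.

```python
import re

# Hypothetical scoping rules, for illustration only.
# A URL must match at least one include pattern and no exclude pattern.
INCLUDE_PATTERNS = [re.compile(r"^https://twitter\.com/example_org(/|$)")]
EXCLUDE_PATTERNS = [re.compile(r"/login"), re.compile(r"[?&]session=")]


def in_scope(url, include=INCLUDE_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if the URL should be captured under the given rules."""
    if not any(pattern.search(url) for pattern in include):
        return False
    return not any(pattern.search(url) for pattern in exclude)


candidate_urls = [
    "https://twitter.com/example_org/status/1",
    "https://twitter.com/example_org/login",
    "https://twitter.com/another_account",
]

# Only the first URL passes both the include and the exclude check.
to_capture = [url for url in candidate_urls if in_scope(url)]
```

Note that scoping of this kind only decides what is captured in the first place; it does not alter data that has already been captured, which is why some providers offer redaction as a separate step.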
In which formats can the collected data be exported and shared?
Archive-It: WARC
Intradyn: PDF, EML
MirrorWeb: WARC
Pagefreezer/Webpreserver: WARC, MHTML, PDF, JPG
Socialex: PDF/A, WARC
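For organizations that want to filter providers by export format, the answers above can be recorded in a small lookup structure. The following Python sketch simply encodes the formats listed in this survey; the `EXPORT_FORMATS` name and the `tools_exporting` helper are our own illustrative choices, and the data is as indicative as the survey itself.

```python
# Export formats per provider, as listed in this survey (indicative only).
EXPORT_FORMATS = {
    "Archive-It": {"WARC"},
    "Intradyn": {"PDF", "EML"},
    "MirrorWeb": {"WARC"},
    "Pagefreezer/Webpreserver": {"WARC", "MHTML", "PDF", "JPG"},
    "Socialex": {"PDF/A", "WARC"},
}


def tools_exporting(fmt):
    """Return the providers that list the given export format, sorted by name."""
    wanted = fmt.upper()
    return sorted(
        name for name, formats in EXPORT_FORMATS.items() if wanted in formats
    )


warc_tools = tools_exporting("warc")
pdf_tools = tools_exporting("PDF")
```

In this sketch PDF and PDF/A are treated as distinct formats, so a query for "PDF" does not return Socialex.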
How is the captured social media data transferred to the organization, if the capture does not happen in-house?
Archive-It: The captured data is available via an online application for browsing and download, and it is also possible to integrate an organization’s systems with the collections by using the Archive-It API
Intradyn: Available via the provider’s dashboard for download
MirrorWeb: Available via the provider’s dashboard for download
Pagefreezer/Webpreserver: -
Socialex: Available via an online dashboard, which can also be integrated with an organization’s content management system
Can the tool be used by staff themselves to capture data, or do the providers perform the captures on behalf of the organization?
Archive-It: The web application is meant to be used by the organization’s staff to perform captures themselves
Intradyn: The company performs the captures
MirrorWeb: -
Pagefreezer/Webpreserver: A web application and a browser plug-in can be used by staff themselves to perform captures (WebPreserver)
Socialex: The company performs the captures
Is continuous, real-time capture available?
Archive-It: Captures can be scheduled in various frequencies (one-time, daily, weekly, monthly, etc.)
Intradyn: Continuous monitoring and capture of new content is possible
MirrorWeb: Continuous monitoring and capture of new content is possible
Pagefreezer/Webpreserver: Continuous monitoring and capture of new content is possible, as well as scheduling
Socialex: Continuous monitoring and capture of new content is possible
Is support available for organizations using the tool, e.g., training, guides, technical support?
Archive-It: Guides, webinars, consultation with web archivists, support
Intradyn: Guides, support available
MirrorWeb: Support available
Pagefreezer/Webpreserver: Training and support available
Socialex: Training and support available
For now, our aim is to expand the survey to include missing information on the tools we already looked into, and potentially gather data on some others as well.
We would also like to gain more first-hand experience with the tools, as in the short duration of the project we were able to gain access to only one of them.
Finally, we would like to expand this document with more information and details on the requirements and features of the tools, something which we hope can be used by organizations themselves when they investigate what the most suitable solution is for them.
Note: Edited on 22.02.22 to include more information on Socialex.
For questions and comments about this article, the NDE social media archiving tools research projects, and any other related matter, please contact Zefi Kavvadia (IISG) at zefi.kavvadia@iisg.nl and Sophie Ham (KB) at Sophie.Ham@KB.nl