Functional requirements for social media archiving tools (open-source)

  • Jan 2022
  • Zefi Kavvadia
  • Modified 27 Jun
Particuliere Websites en SoMe

The list of functional requirements below is based on an earlier project carried out within IISG which focused on workflows for acquiring and preserving born-digital materials in the broad sense, and another project that looked into web archiving tools and workflows specifically.

For this NDE/IISG research on social media archiving tools, we were particularly interested in testing tools and their outcomes, focusing on how they can be used primarily in cultural heritage and secondarily in local government settings. For this reason, we took as broad a view as possible of what the tool outputs could be (webpage snapshots, structured data, even screenshots).

While there is growing interest in finding out what, for example, humanities and social science researchers need from web and social media collections (Hockx-Yu 2014a; Jackson et al. 2016; Winters 2017), we are still at a stage where many decisions are necessarily based on varying degrees of informed guesswork. The kinds of collections that researchers want depend on the type of research they perform (quantitative, qualitative, mixed methods, etc.), as well as on their digital skills and background.

1. Preservation of the "artefactual" value of content (the "original look and feel" of the page as it is browsed by end users of the platforms)

2. Preservation of the "informational" value of content (the "informational content" of the page, i.e., the textual content of posts and comments, the links, the usernames, and the metadata associated with those)

3. Capture of password-protected content (what sits behind a log-in screen, as opposed to publicly available social media content)

4. Media capture and/or extraction (e.g., capturing images and videos embedded in the page, but also downloading them as separate files)

5. Rich media capture (i.e., interactive graphics and other dynamic content)

6. Snapshot captures (one-time capture of the page) vs. scheduling of periodic captures

7. Output in accepted archival formats and/or widely adopted and/or open-source formats

8. Internal logging and documentation capabilities (e.g., logs, change tracking, capture session metadata), including the ability to extract this information in a usable format
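
When comparing tools, the eight requirements above could be recorded as a simple checklist. The following is only a minimal sketch in Python: the short requirement labels, the `ToolEvaluation` class, and the tool name `example-tool` are our own illustrative assumptions, not part of the original project or of any real tool.

```python
from dataclasses import dataclass, field

# Shorthand labels for the eight requirements listed above
# (the labels themselves are illustrative, not official names).
REQUIREMENTS = [
    "artefactual value",
    "informational value",
    "password-protected content",
    "media capture/extraction",
    "rich media capture",
    "snapshot vs. scheduled capture",
    "archival/open output formats",
    "logging and documentation",
]


@dataclass
class ToolEvaluation:
    """Records which requirements a tool was observed to meet during testing."""

    tool_name: str
    met: set = field(default_factory=set)

    def mark_met(self, requirement: str) -> None:
        # Reject labels that are not in the checklist to avoid silent typos.
        if requirement not in REQUIREMENTS:
            raise ValueError(f"Unknown requirement: {requirement}")
        self.met.add(requirement)

    def unmet(self) -> list:
        # Preserve the order of the original list of requirements.
        return [r for r in REQUIREMENTS if r not in self.met]

    def coverage(self) -> float:
        # Fraction of the eight requirements that were met.
        return len(self.met) / len(REQUIREMENTS)


# Hypothetical usage: "example-tool" is a placeholder, not a real tool.
evaluation = ToolEvaluation("example-tool")
evaluation.mark_met("informational value")
evaluation.mark_met("media capture/extraction")
print(f"{evaluation.tool_name}: {evaluation.coverage():.0%} of requirements met")
print("Still to verify:", evaluation.unmet())
```

A flat checklist like this treats all requirements as equally important; an institution that cares more about, say, artefactual value than about scheduling would need to add its own weighting.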

Why should we try out different types of tools?

Experimenting with different tools lets us gain experience with the possibilities for access that each tool's output creates, before we finalize our collecting strategies and methods.

Why should we use open-source tools?

While there are many great paid options, open-source tools can be a valuable choice for institutions that prefer more tailor-made solutions, more control over their data and processes, and a chance to contribute to the development and evolution of open-source archiving software.
