Munin-Indexer (Munin)

  • jan 2022
  • Zefi Kavvadia
  • Modified 27 Jun

Munin (Munin-Indexer) uses Docker to wrap different scraping and archiving tools together and offer a scraping solution for Facebook, Instagram, and VKontakte. It indexes and scrapes posts, then crawls and captures them, and finally uses pywb to display them.

Suitable for public social media content

The important thing to note about Munin is that it can only archive public posts, i.e., posts that do not sit behind a log-in. It is therefore useful for archiving public Facebook pages, public Facebook groups, or the public posts on a personal Facebook account, but it cannot archive private Facebook group content or private posts. Likewise, if Instagram posts belong to a restricted account, Munin cannot reach them or archive them.

Capturing and monitoring for new posts exclusively

Additionally, Munin is not meant for archiving historical content: it cannot capture posts made days, weeks, or months before the crawl is initiated. Munin is in fact a monitoring tool. Once you have entered a URL, it detects each new post created from the moment the crawl begins; posts made before that moment are not captured. It then stores each post in a separate WARC file. Most tools in this category create one WARC file for the entire page they crawl, whereas Munin creates an individual WARC for each post. This is reasonable because, unlike the other look-and-feel tools we reviewed, Munin indexes and scrapes individual public posts rather than entire pages.

Each post = separate WARC file

Having each post archived in its own WARC might be an issue if you intend to archive a large volume of material, as it can complicate ingest and storage. Additionally, if the ultimate goal is to preserve the look and feel of an account or page, the approach Munin takes could be said to affect the provenance of the material, and ultimately its authenticity, because it extracts posts from their originating context and presents them as individual pieces of content.
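If the number of per-post files becomes a burden at ingest, one possible mitigation is to consolidate them afterwards. The WARC format allows files to be concatenated, and gzip streams can likewise be joined end to end, so a plain byte-level merge is usually enough. The sketch below is a minimal illustration under stated assumptions, not part of Munin itself: the function name `merge_warcs`, the directory layout, and the `.warc.gz` extension are hypothetical and should be adapted to wherever your Munin installation actually writes its output.

```python
import shutil
from pathlib import Path


def merge_warcs(source_dir: str, target_file: str) -> int:
    """Concatenate every *.warc.gz file under source_dir into target_file.

    Hypothetical helper: assumes all inputs are gzip-compressed WARC files.
    Returns the number of files that were merged.
    """
    source = Path(source_dir)
    target = Path(target_file)

    # Sort for a reproducible record order, and skip the target itself
    # in case it was written into the source directory on an earlier run.
    parts = sorted(
        p for p in source.rglob("*.warc.gz")
        if p.resolve() != target.resolve()
    )

    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Byte-for-byte copy; records are not parsed or altered.
                shutil.copyfileobj(src, out)

    return len(parts)
```

A call such as `merge_warcs("munin_output", "account_2022.warc.gz")` (both names are placeholders) would produce one file per account or collection period. Any existing index for playback would need to be regenerated for the merged file, and keeping the original per-post files alongside it preserves the capture as Munin produced it.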

Nevertheless, Munin could be a viable choice, especially for Facebook, which is notoriously difficult to archive.
