The "look and feel" approach

  • Jan 2022
  • Zefi Kavvadia
  • Modified 27 Jun
Particuliere Websites en SoMe

This method is based on common web archiving practices, which makes sense as social media archiving can be seen as an offshoot of web archiving.

The most common method of web archiving, namely web crawling or web harvesting, attempts to preserve the so-called “look and feel” of online content, meaning the layout, structure, and style of a website, as well as its navigational features, like buttons and menus. However, crawling has proven to be relatively limited in what it can capture from social media.

What is web crawling and why is it not successful when capturing social media?

Crawlers are software that request the pages at the URLs we provide to them, the so-called seeds, and store the content they receive. They behave much like a browser, which sends an HTTP request for whatever is at an internet address and then displays to the user what the website's servers return. This works well when the server returns each page as a complete HTML document. Such HTML-based websites are called "static", and they are becoming more and more uncommon. Where it was previously possible to capture an entire website by visiting all its URLs one by one, e.g. www.example.com, www.example.com/faq, www.example.com/help, etc., the dynamic features of most websites today, including social media, very often require user interaction before pages load. In essence, the user's clicks and other interactions with the page trigger the construction of the URLs for the website's elements; without that interaction, these URLs are never constructed. This puts most contemporary social media content out of traditional crawlers’ reach.
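
The crawling loop described above can be illustrated with a minimal sketch in Python, using only the standard library. This is not how any particular archiving crawler is implemented; the names crawl, LinkExtractor and fetch_over_http, as well as the two tiny example "sites", are hypothetical and exist only for illustration. Real crawlers do considerably more, such as respecting robots.txt and limiting their request rate. The sketch starts from seeds, stores the HTML it receives, and follows the links it finds in that HTML. The demo replaces real HTTP requests with in-memory pages so that it can run offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_over_http(url):
    """Request a URL over HTTP and return the body; no scripts are executed."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, fetch=fetch_over_http, max_pages=50):
    """Visit the seeds, store what comes back, and follow same-host links."""
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}
    queue = list(seeds)
    archive = {}  # URL -> HTML exactly as received
    while queue and len(archive) < max_pages:
        url = queue.pop(0)
        if url in archive:
            continue
        try:
            html = fetch(url)
        except OSError:
            continue  # unreachable page: skip it and carry on
        archive[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc in allowed_hosts and target not in archive:
                queue.append(target)
    return archive


# Hypothetical static site: every page is complete HTML with ordinary links.
STATIC_SITE = {
    "http://www.example.com/": '<a href="/faq">FAQ</a> <a href="/help">Help</a>',
    "http://www.example.com/faq": "<p>Frequently asked questions</p>",
    "http://www.example.com/help": '<a href="/">Home</a>',
}

# Hypothetical dynamic site: the HTML is an empty shell and the links would
# only be built by a script after user interaction, which a crawler never does.
DYNAMIC_SITE = {
    "http://www.example.com/": '<div id="feed"></div><script>/* builds the feed on scroll */</script>',
}

static_archive = crawl(["http://www.example.com/"], fetch=STATIC_SITE.__getitem__)
dynamic_archive = crawl(["http://www.example.com/"], fetch=DYNAMIC_SITE.__getitem__)

print(len(static_archive), "pages captured from the static site")
print(len(dynamic_archive), "page captured from the dynamic site")
```

In this toy setup the crawler reaches all three pages of the static site from a single seed, but stops at the empty shell of the dynamic one, because the HTML it receives contains no links to follow.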

What is the best way to capture the "look and feel" of social media?

To capture the “look and feel” of social media content, i.e., the pages of a social media website that an end user can browse, scrapers and harvesters based on browser emulation are used. Because the user-facing pages of a social media website are the form of social media that most people recognize, being able to preserve this component is important. Browser-based crawlers, e.g., Brozzler and Webrecorder Desktop, mimic the behavior of a human user on a social media website: they play media content, expand nested comments, and generally trigger all the interactions the page requires to be fully rendered. The goal is to recreate and store the visual appearance and functionality of the page as it was at the time of capture.
