AI is not a new thing in digital heritage

Tineke van Heijst, KIA Community
20 May (updated 21 May)

This is blog 15 in the blog series about Green IT.

Heritage organisations have been using AI for many years to make their digital collections accessible and searchable in new ways and to allow more people to discover, explore and make use of cultural treasures. In this blog, we look at how the heritage sector is already using AI, and discuss a few example projects.

Applications in the heritage sector
Applying AI is not a new development at all: countless projects already use artificial intelligence successfully. Below, we list some of the most common applications in the heritage field, each illustrated with references to specific projects.

Image recognition
One of the most common uses of AI in the heritage sector is image recognition. For example, Erfgoed Gelderland used the image recognition technology of Google’s Vision AI to tag images with automatically generated keywords. (2) Digital heritage coach Tim Stapel explains how this works in an article on the website of the Dutch Digital Heritage Network (NDE): “The AI program analyses an image and attaches automatically generated keywords to it. Based on these keywords, other objects from the collections are shown to our website visitors. The idea is that visitors browsing the website without a specific purpose in mind can discover the connections that exist between different collections within CollectieGelderland.” Tagging all of these images (around half a million in total) by hand would not have been feasible.
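The article does not say exactly how related objects are selected from the keywords. Purely as an illustration, the Python sketch below assumes that each object has a set of automatically generated keywords and that other objects are ranked by keyword overlap (Jaccard similarity). The object IDs, the keywords and the `related_objects` function are hypothetical, and the sketch does not call Google’s Vision AI.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of keywords two objects have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_objects(
    target_id: str,
    keywords: Dict[str, Set[str]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank other objects by keyword overlap with target_id."""
    target = keywords[target_id]
    scored = [
        (obj_id, jaccard(target, tags))
        for obj_id, tags in keywords.items()
        if obj_id != target_id
    ]
    # Drop objects with no keywords in common, then sort best match first
    # (ties broken by ID so the order is stable).
    scored = [(obj_id, score) for obj_id, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical auto-generated keywords for objects from different collections.
keywords = {
    "museum-A/001": {"windmill", "landscape", "sky", "water"},
    "archive-B/042": {"windmill", "water", "boat"},
    "museum-C/317": {"portrait", "woman", "hat"},
    "archive-D/008": {"landscape", "sky", "water", "cow"},
}

print(related_objects("museum-A/001", keywords))
# prints [('archive-D/008', 0.6), ('archive-B/042', 0.4)]
```

Jaccard similarity is used here because it rewards shared keywords while penalising objects that carry many unrelated tags; a real collection portal might weight keywords or collections differently.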

Another large-scale application of image recognition is Vitec Memorix’s crowdsourcing platform Vele Handen (literally: Many Hands). (3) On this platform, archives and museums can upload their digitised collections to be “unlocked” by the public. Volunteers can, for example, help train an image recognition algorithm by labelling images within a project, or help train an algorithm to transcribe historical documents automatically by typing out the handwritten text.

Another example that shows how AI can improve the accessibility of collections is the research project Krant en foto’s (Newspapers and Photos). In this project, the photograph collections of the press photo agencies Fotopersbureau De Boer (Noord-Hollands Archief) and Persfotobureau D. van der Veen (Groninger Archieven) are linked with publications in the newspapers Haarlems Dagblad, IJmuider Courant and Nieuwsblad van het Noorden. A fascinating whitepaper was written about this project: Krant en foto’s verbonden: een verkenning om kunstmatige intelligentie in te zetten om erfgoedcollecties te verbinden (Connecting Newspapers and Photos: an exploration of AI’s application in connecting heritage collections), published by NDE. (4)

Facial recognition
In Belgium, meemoo, the Flemish Institute for Archives, has researched the use of facial recognition to identify people in photos and videos. (5) In the FAME project, innovative techniques were tested to make identifying people easier. The researchers applied an open-source tool to various heritage collections, focusing on public figures (in particular performing artists, racing cyclists, politicians and activists). With the tool, they can now identify not only the person shown in a photograph, but also the individual people in group photographs. (6)

Handwriting recognition
Another example of a successful AI application in the heritage sector is the project De ijsberg zichtbaar maken (literally: Making the Iceberg Visible), run by the National Archives of the Netherlands and Noord-Hollands Archief. In this project, AI is used to convert handwritten texts into machine-readable text, a process known as transcription. The project has converted one million scans of VOC (Dutch East India Company) documents held by the National Archives and one million scans of documents written by nineteenth-century notaries into easy-to-read, searchable texts. (7)

Speech recognition
Dutch researchers have been exploring ways to improve the searchability of audiovisual archives using speech recognition for more than twenty years. As far back as 2001, the Netherlands Institute for Sound and Vision was collaborating with the University of Twente on a European research project (ECHO) to determine whether automatic speech recognition could be applied in practice. Although the technology was far less advanced then than it is today, it was already clear that automatically transcribing spoken language in audio and video would contribute significantly to making archive materials more accessible. (8)

A concrete example is the MALACH project, which also launched in 2001. It aimed to preserve the testimonies of Holocaust survivors and make them accessible to a broad audience through digital search and analysis systems. Because the speech varied so widely (different languages, dialects, emotionally charged voices), it was difficult to find a technology that delivered acceptable quality across the full breadth of the collection.

The CLARIAH project has also set out to build a speech recognition service for research and heritage institutions. The speech recognition system used by the Netherlands Institute for Sound and Vision and CLARIAH is based on Kaldi (9), an open-source speech recognition toolkit that uses modern machine learning techniques to achieve significant quality improvements. This software has made speech recognition relatively easy to apply.

Application of generative AI
Generative AI, such as ChatGPT, is also a potential tool for the heritage sector. At a meeting titled ChatGPT of nie (ChatGPT or Not) on 13 February 2023, forty heritage professionals explored ways to put this technology to good use. (10) They concluded that the technology still needs to be assessed critically: although the chatbot’s answers may seem convincing at first glance, they are not always accurate.

NDE asked Heleen Wilbrink to write a blog series on ChatGPT and large language models in the field of heritage and digital preservation. (11) Over the past year, Wilbrink and her colleagues experimented with precursors to ChatGPT to convert historical texts into modern Dutch, so that more people can engage with heritage. They found that ChatGPT performs better than its predecessors, but it is still not perfect. Sometimes the chatbot invents content that does not exist in the source text; these are known as hallucinations. At other times it leaves out parts of the original that an expert would have kept in the “translation”. The results therefore have to be evaluated critically, which can take so much time that it may outweigh the advantages ChatGPT offers.

ChatGPT can also be used to generate code and structured data, such as JavaScript, HTML, RDF or SPARQL. Here too, an expert must check the output carefully to find and fix any errors.

The chatbot can also be useful for summarising texts such as notarial deeds, or for finding entities such as personal names and locations in texts and determining the relationships between them. This can help when creating indexes to make archives searchable. Finally, ChatGPT can be used to turn data into a story or a description.

Conclusion
An exhaustive list of AI heritage projects is beyond the scope of a single blog post. Rather, the purpose of this blog was to give an impression of the kinds of AI projects already under way, showing how broad the possibilities are and how this technology can improve the accessibility of our collections. Using this technology also saves a great deal of time: tasks that used to be done manually can now be completed much more quickly by an algorithm.

However, there is a flip side to this coin: AI uses a lot of energy due to the immense amount of computing power required to train and apply the algorithms. Our next blog will discuss this in more detail.

Sources

(1) NDE and Ministry of OCW, ‘Nationale Strategie Digitaal Erfgoed’, March 2021.

(2) Netwerk Digitaal Erfgoed, ‘Erfgoed Gelderland ging aan de slag met beeldherkenning. Dit zijn de geleerde AI-lessen’, published on 1 July 2021.

(3) The website can be accessed at http://www.velehanden.nl.

(4) Balmashnova, Evgeniya et al., ‘Krant en foto’s verbonden: een verkenning om kunstmatige intelligentie in te zetten om erfgoedcollecties te verbinden’, published by Netwerk Digitaal Erfgoed in 2022.

(5) meemoo, ‘FAME: gezichtsherkenning als tool voor metadatacreatie’, 2022.

(6) meemoo, ‘FAME loopt af! De eerste resultaten’, published on 4 October 2022.

(7) KIA Pleio, ‘Bijeenkomst RHC’s over het project “De ijsberg zichtbaar maken”’, published on 8 February 2021.

(8) Ordelman, Roeland, ‘Spraakherkenning voor onderzoek in AV-archieven - Twintig jaar ontwikkeling in Nederland’, published on AVA_Net on 15 April 2021.

(9) The Kaldi code is available at Kaldi ASR (kaldi-asr.org).

(10) Netwerk Digitaal Erfgoed, ‘ChatGPT: nog lang niet perfect maar wel met potentie voor het erfgoedveld’, published on 15 February 2023.

(11) The blogs by Heleen Wilbrink are available at KIA Pleio.

About the blog series on Green IT

This blog series aims to familiarise heritage institutions with the subject of Green IT, making it easier to discuss this important topic within the organisation. The next blog first takes a closer look at CO2 emissions and their impact, and then applies the issue to the heritage sector.

This series was written by Tineke van Heijst, green tech watcher of the Green IT network group set up by the Dutch Digital Heritage Network (Netwerk Digitaal Erfgoed, NDE). This network group monitors developments in Green IT and the climate impact of increasing digitalisation, with a specific focus on digitalisation within the heritage sector.

Previously published in this blog series:

Introduction to Green IT

IT’s double role in sustainability

The need for a sustainability framework for the heritage sector

Data Storage

The digital databerg

The hidden impact of cloud storage

To store 1% of the world’s data: what does that cost in terms of CO2 emission?

The quest for sustainable alternatives to disks and tapes

Data storage in synthetic DNA: coding and decoding in secret code

Data storage in atoms: science fiction or future reality?

Data storage in glass: Superman has already been immortalised

Green software

Green software: less energy = less CO2 emission

Green software: Do more when the electricity is cleaner

Green software: Extend the lifespan of your (used) IT equipment

Green software: Measuring to understand and to improve

Artificial Intelligence

Introducing Artificial Intelligence: friend or foe?
