Oral history collections: metadata and access challenges

Discussion
Aisha Mbeki
Feb 24, 2026 11:18 AM

Our institution is digitizing a collection of 2,000 oral history recordings from the liberation struggle era. We are facing several challenges:

  • Transcription in multiple languages (isiZulu, isiXhosa, English, Afrikaans)
  • Sensitive content requiring restricted access for some recordings
  • Linking recordings to named entities (persons, events, places)
  • Preservation of original cassette tapes alongside digital surrogates

Any recommendations for a metadata schema?

oral-history metadata digitization south-africa multilingual

2 Replies

Kwame Asante Feb 24, 2026 11:18 PM

For oral history metadata, I'd recommend looking at the Oral History Metadata Synchronizer (OHMS) and its index format, used alongside a Dublin Core description. It provides fields designed specifically for audio and video interviews: indexed segments, transcript sync points, and subject keywords per segment.
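To illustrate what "indexed segments with sync points" means in practice, here is a minimal Python sketch of an in-memory segment index with made-up sample data. The `Segment` class and the `segment_at` / `find_by_keyword` helpers are hypothetical and do not reproduce the actual OHMS XML export format; they only show the idea of time-coded segments that carry their own keywords.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical model of one indexed segment of an interview.
    start: float                      # sync point: seconds from start of recording
    title: str
    synopsis: str = ""
    keywords: list[str] = field(default_factory=list)


def segment_at(segments: list[Segment], t: float) -> Segment | None:
    """Return the segment playing at time t (segments must be sorted by start)."""
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1
    return segments[i] if i >= 0 else None


def find_by_keyword(segments: list[Segment], keyword: str) -> list[Segment]:
    """Return every segment tagged with the keyword (case-insensitive)."""
    kw = keyword.casefold()
    return [s for s in segments if kw in (k.casefold() for k in s.keywords)]


# Made-up sample index for one interview.
index = [
    Segment(0.0, "Introduction", keywords=["childhood"]),
    Segment(312.5, "Student organising", keywords=["protest", "school"]),
    Segment(905.0, "Detention", keywords=["prison"]),
]
print(segment_at(index, 400.0).title)                        # Student organising
print([s.title for s in find_by_keyword(index, "Protest")])  # ['Student organising']
```

Keeping segments sorted by start time lets a player jump from a keyword hit straight to the right point in the audio with a binary search rather than a linear scan.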

For the multilingual transcription challenge, Heratio's AI plugin has speech-to-text capabilities that work reasonably well for English and Afrikaans.
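If automatic speech-to-text is only dependable for some of the collection's languages, one option is to route each recording by the languages spoken in it. The Python sketch below assumes each record carries ISO 639-1 language codes and assumes, as this reply suggests, that only English and Afrikaans go through automatic transcription. The queue names and the `transcription_queue` function are hypothetical.

```python
from __future__ import annotations

# Languages for which automatic speech-to-text is assumed usable (per this reply).
ASR_LANGUAGES = {"en", "af"}                        # English, Afrikaans
# All languages expected in the collection (ISO 639-1 codes).
COLLECTION_LANGUAGES = {"zu", "xh", "en", "af"}     # isiZulu, isiXhosa, English, Afrikaans


def transcription_queue(languages: set[str]) -> str:
    """Pick a (hypothetical) workflow queue from a recording's spoken languages."""
    unknown = languages - COLLECTION_LANGUAGES
    if unknown:
        raise ValueError(f"unexpected language codes: {sorted(unknown)}")
    if languages and languages <= ASR_LANGUAGES:
        # Every spoken language is ASR-supported: machine draft, then human review.
        return "asr_then_review"
    # At least one language (e.g. isiZulu or isiXhosa) needs a human transcriber,
    # or no language has been recorded yet.
    return "manual_transcription"


print(transcription_queue({"en"}))        # asr_then_review
print(transcription_queue({"zu", "en"}))  # manual_transcription
```

Taking a set of languages rather than a single code matters for interviews that switch between languages: a recording that is mostly English but partly isiZulu still lands in the manual queue.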

Maria Garcia Feb 25, 2026 1:18 AM

We have a similar oral history collection from the Spanish Civil War period. Our approach:

  • Each recording gets a Dublin Core description with local extensions
  • Transcripts are stored as digital objects linked to the audio file
  • Named-entity recognition (NER) is run on transcripts to auto-generate name and place index entries
  • Access restrictions are handled per item using Security Clearance levels
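A per-item record along these lines might be modelled as follows. This is a minimal Python sketch with made-up sample values, assuming integer clearance levels where a higher number grants more access. The field names under `local` and the clearance scale are hypothetical and not taken from any particular system's Security Clearance feature. The local extensions here also record the original cassette, which touches on the preservation question in the original post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    identifier: str
    mime_type: str
    role: str                                   # e.g. "master audio", "transcript"


@dataclass
class RecordingDescription:
    dc: dict[str, list[str]]                    # Dublin Core elements (repeatable)
    local: dict[str, str] = field(default_factory=dict)        # hypothetical local extensions
    objects: list[DigitalObject] = field(default_factory=list)  # linked digital objects
    clearance: int = 0                          # minimum level required; 0 = public

    def transcripts(self) -> list[DigitalObject]:
        """Transcripts are separate digital objects linked to the same record."""
        return [o for o in self.objects if o.role == "transcript"]


def can_access(user_clearance: int, record: RecordingDescription) -> bool:
    """Per-item check: the user's level must meet the item's required level."""
    return user_clearance >= record.clearance


# Made-up sample record.
rec = RecordingDescription(
    dc={"title": ["Interview 0001"], "language": ["zu", "en"], "type": ["Sound"]},
    local={"original_carrier": "audio cassette", "carrier_id": "CAS-0001"},
    objects=[
        DigitalObject("rec0001.wav", "audio/wav", "master audio"),
        DigitalObject("rec0001_transcript.txt", "text/plain", "transcript"),
    ],
    clearance=2,
)
print(can_access(1, rec), can_access(2, rec))  # False True
```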

The NER + auto-indexing approach has been a game-changer for discovery.
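The auto-indexing step can be sketched independently of any particular NER tool. The Python below assumes the NER stage has already produced `(text, label, char_offset)` tuples for each transcript, using spaCy/OntoNotes-style labels (`PERSON`, `GPE`, `LOC`, `EVENT`). The `build_index` function, the label-to-heading mapping, and the normalisation rule are hypothetical, and the names in the sample data are fictional.

```python
from __future__ import annotations

from collections import defaultdict

# Map NER labels (spaCy/OntoNotes-style, assumed) to index headings.
LABEL_TO_INDEX = {"PERSON": "names", "GPE": "places", "LOC": "places", "EVENT": "events"}


def normalise(text: str) -> str:
    """Rough normalisation so 'Soweto' and 'soweto ' share one entry."""
    return " ".join(text.split()).casefold()


def build_index(entities_by_recording: dict[str, list[tuple[str, str, int]]]):
    """Return {heading: {entry: sorted list of (recording_id, char_offset)}}."""
    index = defaultdict(lambda: defaultdict(list))
    for recording_id, entities in entities_by_recording.items():
        for text, label, offset in entities:
            heading = LABEL_TO_INDEX.get(label)
            if heading is None:
                continue  # skip labels we do not index (e.g. DATE)
            index[heading][normalise(text)].append((recording_id, offset))
    # Convert to plain dicts with sorted hit lists for stable output.
    return {h: {e: sorted(hits) for e, hits in entries.items()} for h, entries in index.items()}


# Made-up NER output for two transcripts.
ner_output = {
    "rec0001": [("Soweto", "GPE", 120), ("Thandi Nkosi", "PERSON", 88), ("1976", "DATE", 40)],
    "rec0002": [("soweto", "GPE", 15), ("Soweto uprising", "EVENT", 15)],
}
idx = build_index(ner_output)
print(idx["places"])  # {'soweto': [('rec0001', 120), ('rec0002', 15)]}
```

In practice the automatically generated entries would still need review, since NER models trained mainly on English may miss or mislabel names in isiZulu or isiXhosa transcripts.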