Oral history collections: metadata and access challenges
Feb 24, 2026 11:18 AM
Our institution is digitizing a collection of 2,000 oral history recordings from the liberation struggle era. We are facing several challenges:
- Transcription in multiple languages (isiZulu, isiXhosa, English, Afrikaans)
- Sensitive content requiring restricted access for some recordings
- Linking recordings to named entities (persons, events, places)
- Preservation of original cassette tapes alongside digital surrogates
Any recommendations for a metadata schema?
2 Replies
For oral history metadata, I'd recommend looking at the Oral History Metadata Synchronizer (OHMS) and using its segment-level fields as an extension to Dublin Core. It provides fields designed specifically for audio/video interviews: indexed segments, transcript sync points, and subject keywords per segment.
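To make the segment idea concrete, here is a minimal sketch of a per-interview index in Python. The class and field names (Segment, InterviewIndex, start_seconds) and the sample identifier are my own placeholders, not the actual OHMS element names, so treat it as an illustration of the structure rather than the real schema.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    start_seconds: int                 # sync point into the recording
    title: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class InterviewIndex:
    identifier: str                    # hypothetical local identifier
    language: str                      # language code of the interview
    segments: list[Segment] = field(default_factory=list)  # sorted by start_seconds

    def segment_at(self, seconds: int) -> Segment | None:
        """Return the segment playing at the given offset, or None before the first one."""
        starts = [s.start_seconds for s in self.segments]
        pos = bisect_right(starts, seconds)
        return self.segments[pos - 1] if pos > 0 else None

    def find_keyword(self, keyword: str) -> list[Segment]:
        """Return every segment tagged with the keyword (case-insensitive)."""
        wanted = keyword.casefold()
        return [s for s in self.segments
                if any(k.casefold() == wanted for k in s.keywords)]


# Made-up sample data, only to show the shape of an index.
index = InterviewIndex(
    identifier="OH-0001",
    language="zu",
    segments=[
        Segment(0, "Introduction", ["childhood"]),
        Segment(95, "Joining the movement", ["recruitment", "exile"]),
        Segment(410, "Return home", ["exile", "family"]),
    ],
)
```

With something like this, a keyword search can jump straight to the matching point in the audio instead of returning the whole recording.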
For the multilingual transcription challenge, Heratio's AI plugin has speech-to-text capabilities that work reasonably well for English and Afrikaans.
We have a similar oral history collection from the Spanish Civil War period. Our approach:
- Each recording gets a Dublin Core description with local extensions
- Transcripts are stored as digital objects linked to the audio file
- Named-entity recognition (NER) on transcripts to auto-generate name/place index entries
- Access restrictions handled per-item using Security Clearance levels
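Roughly, the record structure and the per-item access check described above look like the following Python sketch. The level names (PUBLIC, RESEARCHER, RESTRICTED), the Recording class, the file names, and the rule that the description stays visible while the files are withheld are all assumptions for illustration, not how any particular system implements clearance levels.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Clearance(IntEnum):
    """Hypothetical levels; a higher value means more restricted."""
    PUBLIC = 0
    RESEARCHER = 1
    RESTRICTED = 2


@dataclass
class Recording:
    dc: dict[str, str]                 # Dublin Core elements plus local extensions
    audio_file: str
    transcripts: dict[str, str] = field(default_factory=dict)  # language code -> transcript file
    required_clearance: Clearance = Clearance.PUBLIC


def can_access(user: Clearance, recording: Recording) -> bool:
    """A user may open the files if their level meets the item's requirement."""
    return user >= recording.required_clearance


def view_for(user: Clearance, recording: Recording) -> dict:
    """Always expose the description; expose files only with sufficient clearance."""
    view: dict = {"description": dict(recording.dc)}
    if can_access(user, recording):
        view["audio_file"] = recording.audio_file
        view["transcripts"] = dict(recording.transcripts)
    return view


# Made-up sample item.
item = Recording(
    dc={"dc:title": "Interview 12", "dc:language": "es", "dc:type": "Sound"},
    audio_file="interview-12.wav",
    transcripts={"es": "interview-12.es.txt"},
    required_clearance=Clearance.RESEARCHER,
)
```

Keeping the restriction on the item rather than on the whole collection means the sensitive recordings can be locked down without hiding the rest.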
The NER + auto-indexing approach has been a game-changer for discovery.
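For anyone wanting to see the shape of the auto-indexing step, here is a small Python sketch. Note that it swaps in a plain gazetteer lookup (matching a known list of names) in place of a trained NER model, so it only shows how entity hits turn into index entries; the function name, record identifiers, and sample sentences are invented for the example.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_entity_index(
    transcripts: dict[str, str],
    gazetteer: dict[str, str],
) -> dict[tuple[str, str], list[str]]:
    """Map (entity_type, name) to the sorted record ids whose transcript mentions the name.

    transcripts: record id -> transcript text
    gazetteer:   known name -> entity type (stand-in for a real NER model)
    """
    hits: dict[tuple[str, str], set[str]] = defaultdict(set)
    for name, entity_type in gazetteer.items():
        # Whole-word, case-insensitive match on the known name.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for record_id, text in transcripts.items():
            if pattern.search(text):
                hits[(entity_type, name)].add(record_id)
    return {key: sorted(ids) for key, ids in sorted(hits.items())}


# Invented sample text, only to exercise the function.
transcripts = {
    "rec-001": "We left Madrid in the winter and walked toward Valencia.",
    "rec-002": "My aunt stayed in Valencia until the end.",
}
gazetteer = {"Madrid": "place", "Valencia": "place"}

entity_index = build_entity_index(transcripts, gazetteer)
```

A real pipeline would replace the gazetteer loop with model output and add a review step before entries are published, since automatic matches on names are not always right.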