Speech-to-text for Breton

Research output: Contribution to conference › Paper › peer-review

Technology is a vital part of language revitalisation and conservation. While certain languages have usable speech-to-text (STT) models, this is not the case for most Celtic languages, including Breton. Audio transcription plays a crucial role in improving linguistic accessibility and online presence, and it is used in a variety of fields. This paper seeks to train acoustic models with three open-source STT toolkits in order to investigate how these can be used to create efficient STT models for Breton, using publicly available speech and text corpora. Given the low resources available for Breton, optimising several toolkits could give greater insight into how each toolkit performs with a small amount of resources. Acoustic models are in this case the base models that convert spoken words into text. In addition, language models will be used to improve the accuracy of the acoustic models. The aim is to use this insight to produce and distribute usable and efficient STT models for Breton and make this technology accessible to the community.
Original language: English
Publication status: Published - Mar 2023
Event: Celtic Student Conference - University of Glasgow, Glasgow, United Kingdom
Duration: 30 Mar 2023 → 1 Apr 2023
Conference number: 10

Conference

Conference: Celtic Student Conference
Country/Territory: United Kingdom
City: Glasgow
Period: 30/03/23 → 1/04/23