Speech-to-text for Breton

Research output: Contribution to conference, Paper, peer-reviewed

Standard

Speech-to-text for Breton. / Vangberg, Preben; Farhat, Leena.
2023. Paper presented at Celtic Student Conference, Glasgow, United Kingdom.

Harvard

Vangberg, P & Farhat, L 2023, 'Speech-to-text for Breton', Paper presented at Celtic Student Conference, Glasgow, United Kingdom, 30/03/23 - 1/04/23.

APA

Vangberg, P., & Farhat, L. (2023). Speech-to-text for Breton. Paper presented at Celtic Student Conference, Glasgow, United Kingdom.

CBE

Vangberg P, Farhat L. 2023. Speech-to-text for Breton. Paper presented at Celtic Student Conference, Glasgow, United Kingdom.

MLA

Vangberg, Preben, and Leena Farhat. Speech-to-text for Breton. Celtic Student Conference, 30 Mar 2023, Glasgow, United Kingdom, Paper, 2023.

Vancouver

Vangberg P, Farhat L. Speech-to-text for Breton. 2023. Paper presented at Celtic Student Conference, Glasgow, United Kingdom.

Author

Vangberg, Preben ; Farhat, Leena. / Speech-to-text for Breton. Paper presented at Celtic Student Conference, Glasgow, United Kingdom.

RIS

TY - CONF

T1 - Speech-to-text for Breton

AU - Vangberg, Preben

AU - Farhat, Leena

N1 - Conference code: 10

PY - 2023/3

Y1 - 2023/3

AB - Technology is a vital part of language revitalisation and conversation. While certain languages have usable speech-to-text (STT) models, this is not the case for most Celtic languages, including Breton. Audio transcription plays a crucial role in improving linguistic accessibility and online presence, and it is used in a variety of fields. This paper seeks to use three open-source STT toolkits to train acoustic models and to investigate how these can be used to create efficient STT models for Breton. This will be done using publicly available speech and text corpora. Given the limited resources available for Breton, optimising several toolkits could give greater insight into how each toolkit performs when resources are scarce. Here, acoustic models are the base models that convert spoken words into text. In addition, language models will be used to improve the accuracy of the acoustic models. The aim is to use this insight to produce and distribute usable and efficient STT models for Breton and to make this technology accessible to the community.

M3 - Paper

T2 - Celtic Student Conference

Y2 - 30 March 2023 through 1 April 2023

ER -