SUBTLEX-CY: A new word frequency database for Welsh

Research output: Contribution to journal, article, peer-reviewed

We present SUBTLEX-CY, a new word frequency database created from a 32-million-word corpus of Welsh television subtitles. An experiment comprising a lexical decision task tested the SUBTLEX-CY frequency estimates using words whose frequencies were inconsistent with those in a much smaller Welsh corpus that is often used by researchers, the Cronfa Electroneg o’r Gymraeg (CEG), and in three other Welsh word frequency databases. Words that were classified as low frequency (LF) in SUBTLEX-CY but high frequency (HF) in CEG were compared with words that were classified as medium frequency (MF) in both SUBTLEX-CY and CEG. Reaction time analyses showed that the words classified as HF in CEG were responded to more slowly than the MF words, suggesting that the SUBTLEX-CY corpus provides a more reliable estimate of Welsh word frequencies. The new Welsh word frequency database, which also includes part-of-speech, contextual diversity, and other lexical information, is freely available for research purposes on the Open Science Framework repository at https://osf.io/9gkqm/.
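The word selection described in the abstract (low frequency in one database, high frequency in another) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the band cut-offs of 5 and 100 occurrences per million, the placeholder tokens, and the toy counts and corpus sizes are hypothetical and are not taken from the paper or from the SUBTLEX-CY files.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Convert a raw token count into occurrences per million words."""
    return count * 1_000_000 / corpus_size


def band(fpm: float, low_max: float = 5.0, high_min: float = 100.0) -> str:
    """Assign a frequency band. The cut-offs are illustrative assumptions."""
    if fpm < low_max:
        return "LF"
    if fpm >= high_min:
        return "HF"
    return "MF"


def inconsistent_words(counts_a, size_a, counts_b, size_b):
    """Return words banded LF in corpus A but HF in corpus B, sorted."""
    shared = counts_a.keys() & counts_b.keys()
    return sorted(
        word
        for word in shared
        if band(per_million(counts_a[word], size_a)) == "LF"
        and band(per_million(counts_b[word], size_b)) == "HF"
    )


# Toy data only: placeholder tokens, made-up counts and corpus sizes.
large_counts = {"word_a": 64, "word_b": 3200}
small_counts = {"word_a": 150, "word_b": 120}

selected = inconsistent_words(large_counts, 32_000_000, small_counts, 1_000_000)
print(selected)
```

Normalising to occurrences per million before banding is what makes counts from corpora of very different sizes comparable. With the toy numbers above, only `word_a` is low frequency in the larger corpus and high frequency in the smaller one.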
Original language: English
Pages (from-to): 1052–1067
Number of pages: 16
Journal: Quarterly Journal of Experimental Psychology
Volume: 77
Issue number: 5
Early online date: 30 August 2023
Status: Published - May 2024
