SUBTLEX-CY: A new word frequency database for Welsh

Research output: Contribution to journal › Article › peer-review

We present SUBTLEX-CY, a new word frequency database created from a 32-million-word corpus of Welsh television
subtitles. An experiment comprising a lexical decision task examined SUBTLEX-CY frequency estimates against words
with inconsistent frequencies in a much smaller Welsh corpus that is often used by researchers, the Cronfa Electroneg
o’r Gymraeg (CEG), and three other Welsh word frequency databases. Words were selected that were classified as low
frequency (LF) in SUBTLEX-CY and high frequency (HF) in CEG and compared with words that were classified as medium
frequency (MF) in both SUBTLEX-CY and CEG. Reaction time analyses showed that the words classified as HF in CEG (but LF
in SUBTLEX-CY) were responded to more slowly than MF words, suggesting that the SUBTLEX-CY corpus provides a more
reliable estimate of Welsh word frequencies. The new Welsh word frequency database, which also includes part-of-speech,
contextual diversity, and other lexical information, is freely available for research purposes on the Open Science
Framework repository at https://osf.io/9gkqm/.
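
The item selection described above can be sketched roughly as follows. This is an illustration only, not the authors' procedure: the band cutoffs (`LOW_MAX`, `HIGH_MIN`), the function names, and the placeholder words are all assumptions, since the abstract does not state the thresholds or the stimuli used.

```python
from typing import Dict, List, Tuple

# Hypothetical band cutoffs in occurrences per million words.
# The abstract does not report the thresholds used in the study.
LOW_MAX = 5.0    # below this: low frequency (LF)
HIGH_MIN = 50.0  # at or above this: high frequency (HF)


def per_million(count: int, corpus_size: int) -> float:
    """Convert a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def band(freq_pm: float) -> str:
    """Assign a per-million frequency to an LF, MF, or HF band."""
    if freq_pm < LOW_MAX:
        return "LF"
    if freq_pm >= HIGH_MIN:
        return "HF"
    return "MF"


def select_items(
    subtlex_pm: Dict[str, float], ceg_pm: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (inconsistent, controls) for words present in both sources.

    inconsistent: LF in the first source but HF in the second.
    controls:     MF in both sources.
    """
    inconsistent: List[str] = []
    controls: List[str] = []
    for word in sorted(subtlex_pm.keys() & ceg_pm.keys()):
        bands = (band(subtlex_pm[word]), band(ceg_pm[word]))
        if bands == ("LF", "HF"):
            inconsistent.append(word)
        elif bands == ("MF", "MF"):
            controls.append(word)
    return inconsistent, controls


# Placeholder entries, not real Welsh frequency data.
subtlex = {"word_a": 2.0, "word_b": 20.0, "word_c": 80.0}
ceg = {"word_a": 60.0, "word_b": 10.0, "word_c": 90.0}
inconsistent, controls = select_items(subtlex, ceg)
```

Normalising to a per-million rate before banding matters here because the two corpora differ greatly in size, so raw counts are not directly comparable.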
Original language: English
Pages (from-to): 1052–1067
Number of pages: 16
Journal: Quarterly Journal of Experimental Psychology
Volume: 77
Issue number: 5
Early online date: 30 Aug 2023
Publication status: Published - May 2024