SUBTLEX-CY: A new word frequency database for Welsh
Research output: Contribution to journal › Article › peer-review
Licence: CC BY-NC
We present SUBTLEX-CY, a new word frequency database created from a 32-million-word corpus of Welsh television
subtitles. An experiment comprising a lexical decision task examined SUBTLEX-CY frequency estimates against words
with inconsistent frequencies in a much smaller Welsh corpus that is often used by researchers, the Cronfa Electroneg
o’r Gymraeg (CEG), and three other Welsh word frequency databases. Words were selected that were classified as low
frequency (LF) in SUBTLEX-CY and high frequency (HF) in CEG and compared with words that were classified as medium
frequency (MF) in both SUBTLEX-CY and CEG. Reaction time analyses showed that the words classified as HF in CEG
(but LF in SUBTLEX-CY) were responded to more slowly than the MF words, suggesting that the SUBTLEX-CY corpus provides
a more reliable estimate of Welsh word frequencies. The new Welsh word frequency database, which also includes
part-of-speech, contextual diversity, and other lexical information, is freely available for research purposes on the
Open Science Framework repository at https://osf.io/9gkqm/.
Original language | English |
---|---|
Pages (from-to) | 1052–1067 |
Number of pages | 16 |
Journal | Quarterly Journal of Experimental Psychology |
Volume | 77 |
Issue number | 5 |
Early online date | 30 Aug 2023 |
Publication status | Published - May 2024 |