SUBTLEX-CY: A new word frequency database for Welsh
Research output: Contribution to journal › Article › Peer-reviewed
In: Quarterly Journal of Experimental Psychology, Vol. 77, No. 5, 05.2024, pp. 1052–1067.
RIS
TY  - JOUR
T1  - SUBTLEX-CY: A new word frequency database for Welsh
AU  - van Heuven, Walter J.B.
AU  - Payne, Joshua S.
AU  - Jones, Manon
PY  - 2024/5
Y1  - 2024/5
N2  - We present SUBTLEX-CY, a new word frequency database created from a 32-million-word corpus of Welsh television subtitles. An experiment comprising a lexical decision task examined SUBTLEX-CY frequency estimates for words whose frequencies were inconsistent with those in a much smaller Welsh corpus often used by researchers, the Cronfa Electroneg o’r Gymraeg (CEG), and in three other Welsh word frequency databases. Words classified as low frequency (LF) in SUBTLEX-CY but high frequency (HF) in CEG were compared with words classified as medium frequency (MF) in both SUBTLEX-CY and CEG. Reaction time analyses showed that the words classified as HF in CEG were responded to more slowly than the MF words, suggesting that the SUBTLEX-CY corpus provides a more reliable estimate of Welsh word frequencies. The new Welsh word frequency database, which also includes part-of-speech, contextual diversity, and other lexical information, is freely available for research purposes on the Open Science Framework repository at https://osf.io/9gkqm/.
AB  - We present SUBTLEX-CY, a new word frequency database created from a 32-million-word corpus of Welsh television subtitles. An experiment comprising a lexical decision task examined SUBTLEX-CY frequency estimates for words whose frequencies were inconsistent with those in a much smaller Welsh corpus often used by researchers, the Cronfa Electroneg o’r Gymraeg (CEG), and in three other Welsh word frequency databases. Words classified as low frequency (LF) in SUBTLEX-CY but high frequency (HF) in CEG were compared with words classified as medium frequency (MF) in both SUBTLEX-CY and CEG. Reaction time analyses showed that the words classified as HF in CEG were responded to more slowly than the MF words, suggesting that the SUBTLEX-CY corpus provides a more reliable estimate of Welsh word frequencies. The new Welsh word frequency database, which also includes part-of-speech, contextual diversity, and other lexical information, is freely available for research purposes on the Open Science Framework repository at https://osf.io/9gkqm/.
U2  - 10.1177/17470218231190315
DO  - 10.1177/17470218231190315
M3  - Article
VL  - 77
SP  - 1052
EP  - 1067
JO  - Quarterly Journal of Experimental Psychology
JF  - Quarterly Journal of Experimental Psychology
SN  - 1747-0218
IS  - 5
ER  - 