Leveraging Lexical Resources and Constraint Grammar for Rule-Based Part-of-Speech Tagging in Welsh

Research output: Contribution to conference, Paper, peer-reviewed

  • Steven Neale
    University of Wales, Cardiff
  • Kevin Donnelly
  • Gareth Watkins
    University of Wales, Cardiff
  • Dawn Knight
    University of Wales, Cardiff
As the quantity of annotated language data and the quality of machine learning algorithms have increased over time, statistical
part-of-speech (POS) taggers trained over large datasets have become as robust as, or better than, their rule-based counterparts. However,
for lesser-resourced languages such as Welsh there is simply not enough accurately annotated data to train a statistical POS tagger.
Furthermore, many of the more popular rule-based taggers still require that their rules be inferred from annotated data, which, while not
as extensive as that required for training a statistical tagger, must still be sizeable. In this paper we describe CyTag, a rule-based POS
tagger for Welsh based on the VISL Constraint Grammar parser. Leveraging lexical information from Eurfa (an extensive open-source
dictionary for Welsh), we extract lists of possible POS tags for each word token in a running text and then apply constraints,
based on features of surrounding word tokens, to prune the set of possible tags until the most appropriate tag for a given
token can be selected. We explain how this approach is particularly useful in dealing with some of the specific intricacies of Welsh,
such as morphological changes and word mutations, and present an evaluation of the performance of the tagger using a manually
checked test corpus of 611 Welsh sentences.
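
The pipeline outlined in the abstract (look up candidate tags in a lexicon, undo mutations where needed, then prune candidates with contextual constraints) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the tag names, the three-entry soft-mutation table, and the single rule are hypothetical and are not taken from Eurfa, from CyTag's rule set, or from VISL Constraint Grammar syntax.

```python
# Toy lexicon: hypothetical entries and tag names, NOT taken from Eurfa or CyTag.
LEXICON = {
    "mae": {"VERB"},
    "hi": {"PRON"},
    "yn": {"PREP", "PART"},   # ambiguous: preposition "in" or aspect particle
    "canu": {"VERBNOUN"},
    "y": {"DET"},
    "cath": {"NOUN"},
}

# Small, illustrative subset of soft mutation, reversed: mutated initial -> radical.
SOFT_MUTATION_REVERSED = {"g": "c", "b": "p", "d": "t"}


def candidate_tags(token):
    """Return every tag the lexicon allows for a token, trying the unmutated form too."""
    word = token.lower()
    if word in LEXICON:
        return set(LEXICON[word])
    for mutated, radical in SOFT_MUTATION_REVERSED.items():
        if word.startswith(mutated):
            base = radical + word[len(mutated):]
            if base in LEXICON:
                return set(LEXICON[base])
    return {"UNK"}


def remove_tag(cohorts, i, tag):
    """Discard a candidate tag, but never the last one remaining for a token."""
    if tag in cohorts[i] and len(cohorts[i]) > 1:
        cohorts[i].discard(tag)


def rule_yn_before_verbnoun(cohorts, tokens, i):
    """Hypothetical constraint: 'yn' directly before a verbnoun is not a preposition."""
    if (tokens[i].lower() == "yn"
            and i + 1 < len(tokens)
            and cohorts[i + 1] == {"VERBNOUN"}):
        remove_tag(cohorts, i, "PREP")


RULES = [rule_yn_before_verbnoun]


def tag(tokens):
    """Collect candidate tags for each token, then prune them with every rule."""
    cohorts = [candidate_tags(t) for t in tokens]
    for rule in RULES:
        for i in range(len(tokens)):
            rule(cohorts, tokens, i)
    return list(zip(tokens, cohorts))
```

Under these assumptions, `tag(["Mae", "hi", "yn", "canu"])` leaves "yn" with the single candidate `PART`, and `tag(["y", "gath"])` resolves the mutated form "gath" to the lexicon entry "cath". A real constraint grammar would use a far larger rule set and a fuller treatment of mutation.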
Original language: English
Pages: 3946-3954
Number of pages: 9
Status: Published - 7 May 2018
Externally published: Yes
Event: LREC 2018 - Miyazaki, Japan
Duration: 12 May 2018 to 12 May 2018

Conference

Conference: LREC 2018
Country/Territory: Japan
City: Miyazaki
Period: 12/05/18 to 12/05/18