Towards a Welsh Semantic Annotation System
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018. p. 980-985.
RIS
TY - GEN
T1 - Towards a Welsh Semantic Annotation System
AU - Piao, Scott
AU - Rayson, Paul
AU - Watkins, Gareth
AU - Knight, Dawn
PY - 2018/5/7
Y1 - 2018/5/7
N2 - Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic taggers have been developed for this task, particularly for English. However, for many languages, particularly for low-resource languages, such tools are yet to be developed. In this paper, we report on the development of an automatic Welsh semantic annotation tool (named CySemTagger) in the CorCenCC Project, which will facilitate semantic-level analysis of Welsh language data on a large scale. Based on Lancaster’s USAS semantic tagger framework, this tool tags words in Welsh texts with semantic tags from a semantic classification scheme, and is designed to be compatible with multiple Welsh POS taggers and POS tagsets by mapping different tagsets into a core shared POS tagset that is used internally by CySemTagger. Our initial evaluation shows that the tagger can cover up to 91.78% of words in Welsh text. This tagger is under continuous development, and will provide a critical tool for Welsh language corpus and information processing at semantic level.
AB - Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic taggers have been developed for this task, particularly for English. However, for many languages, particularly for low-resource languages, such tools are yet to be developed. In this paper, we report on the development of an automatic Welsh semantic annotation tool (named CySemTagger) in the CorCenCC Project, which will facilitate semantic-level analysis of Welsh language data on a large scale. Based on Lancaster’s USAS semantic tagger framework, this tool tags words in Welsh texts with semantic tags from a semantic classification scheme, and is designed to be compatible with multiple Welsh POS taggers and POS tagsets by mapping different tagsets into a core shared POS tagset that is used internally by CySemTagger. Our initial evaluation shows that the tagger can cover up to 91.78% of words in Welsh text. This tagger is under continuous development, and will provide a critical tool for Welsh language corpus and information processing at semantic level.
M3 - Conference contribution
SP - 980
EP - 985
BT - Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
ER -