Towards a Welsh Semantic Annotation System
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
Electronic versions
Links
- https://aclanthology.org/L18-1158.pdf
Final published version
Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic
taggers have been developed for this task, particularly for English. However, for many languages, especially low-resource
languages, such tools are yet to be developed. In this paper, we report on the development of an automatic Welsh semantic annotation
tool (named CySemTagger) in the CorCenCC Project, which will facilitate semantic-level analysis of Welsh language data on a large
scale. Based on Lancaster’s USAS semantic tagger framework, this tool tags words in Welsh texts with semantic tags from a semantic
classification scheme, and is designed to be compatible with multiple Welsh POS taggers and POS tagsets by mapping different tagsets
into a core shared POS tagset that is used internally by CySemTagger. Our initial evaluation shows that the tagger can cover up to
91.78% of words in Welsh text. This tagger is under continuous development, and will provide a critical tool for Welsh language corpus
and information processing at the semantic level.
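
The tagset-mapping idea in the abstract can be illustrated with a minimal Python sketch. It is not CySemTagger's actual implementation: the tagger names, the POS tag labels, the core POS categories, the lexicon entries, and all function names are hypothetical, chosen only to show how tags from different POS tagsets might be normalised to one shared core tagset before a semantic lexicon lookup, and how lexical coverage could then be computed. The semantic tags follow the style of USAS category codes, with `Z99` assumed here as the marker for unmatched words.

```python
# Hypothetical mappings from two invented POS tagsets to a shared core tagset.
TAGSET_MAPS = {
    "tagger_a": {"NN": "noun", "VB": "verb", "PRT": "part"},
    "tagger_b": {"E": "noun", "B": "verb", "U": "part"},
}

# Toy semantic lexicon keyed on (lowercased word, core POS).
# Entries are illustrative only, not taken from the real Welsh lexicon.
LEXICON = {
    ("ci", "noun"): ["L2"],
    ("mynd", "verb"): ["M1"],
    ("yn", "part"): ["Z5"],
}

UNMATCHED = "Z99"  # assumed marker for words not found in the lexicon


def to_core_pos(tagset, pos):
    """Map a tagger-specific POS tag to the core tagset ('unk' if unknown)."""
    return TAGSET_MAPS[tagset].get(pos, "unk")


def tag_tokens(tokens, tagset):
    """Attach semantic tags to (word, pos) pairs produced by a given tagger."""
    tagged = []
    for word, pos in tokens:
        core = to_core_pos(tagset, pos)
        sem = LEXICON.get((word.lower(), core), [UNMATCHED])
        tagged.append((word, core, sem))
    return tagged


def coverage(tagged):
    """Percentage of tokens that received a tag other than UNMATCHED."""
    if not tagged:
        return 0.0
    covered = sum(1 for _, _, sem in tagged if sem != [UNMATCHED])
    return 100.0 * covered / len(tagged)


if __name__ == "__main__":
    sample = [("Ci", "NN"), ("yn", "PRT"), ("mynd", "VB"), ("blorp", "NN")]
    result = tag_tokens(sample, "tagger_a")
    for row in result:
        print(row)
    print(f"coverage: {coverage(result):.2f}%")
```

Because the lexicon is keyed on the core POS category rather than on any tagger's own labels, supporting a further POS tagger in this sketch only requires adding one more entry to `TAGSET_MAPS`.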
Original language | English |
---|---|
Title | Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) |
Pages | 980-985 |
Number of pages | 6 |
Status | Published - 7 May 2018 |
Externally published | Yes |