Analysing and Correcting Dyslexic Arabic Texts
- PhD, School of Computer Science
Abstract
Dyslexia is a disorder that involves difficulty with literacy skills and language-related skills. It is related to an inability to master the use of written language and affects a significant number of people. This thesis describes the development of the Bangor Dyslexia Arabic Corpus (BDAC), built to facilitate the analysis and automatic correction of dyslexic Arabic text. The thesis also develops a new classification of the errors made in Arabic by people with dyslexia, which was used to annotate the BDAC. This dyslexic error classification scheme for Arabic texts (DECA) comprises a list of dyslexic spelling errors classified into 37 types and grouped into nine categories.
This thesis also investigates a new type of classification, dyslexia text classification, which identifies whether or not a text has been written by a person with dyslexia. The text compression scheme known as prediction by partial matching (PPM) is applied to the problem of distinguishing dyslexic text from non-dyslexic text. Experimental results show that PPM-based classification achieved an F1 score of 0.99 and outperformed other classifiers such as Multinomial Naïve Bayes and Support Vector Machines.
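The idea behind compression-based classification can be illustrated with a minimal sketch: train one character model per class, then assign a text to the class whose model encodes it in the fewest bits. The sketch below is not the thesis's implementation. It assumes a fixed order-2 character model with add-one smoothing as a simple stand-in for PPM, and the names `CharModel`, `codelength` and `classify` are hypothetical.

```python
import math
from collections import Counter


class CharModel:
    """Fixed-order character model with add-one smoothing.

    Assumption: this is a simple stand-in for PPM, not PPM itself.
    """

    def __init__(self, text, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.context_counts = Counter()
        self.ngram_counts = Counter()
        padded = " " * order + text
        for i in range(order, len(padded)):
            context = padded[i - order:i]
            self.context_counts[context] += 1
            self.ngram_counts[context + padded[i]] += 1

    def codelength(self, text):
        """Return the number of bits needed to encode `text` under this model."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            probability = (self.ngram_counts[context + padded[i]] + 1) / (
                self.context_counts[context] + self.alphabet_size
            )
            bits -= math.log2(probability)
        return bits


def classify(text, models):
    """Return the label whose model gives the shortest codelength for `text`."""
    return min(models, key=lambda label: models[label].codelength(text))
```

In use, `models` would map each class label (for example, dyslexic and non-dyslexic) to a `CharModel` trained on text of that class, and `classify` would be called on each unseen text.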
A new system called Sahah is also proposed for the automatic detection and correction of dyslexic errors in Arabic text. The system uses a language model based on the PPM text compression scheme in addition to edit operations (omission, addition, substitution and transposition). The correct alternative for each error word is chosen on the basis of the compression codelength. Two experiments were carried out to evaluate the Sahah system. Firstly, its accuracy was evaluated using the BDAC, which contains errors made by people with dyslexia. Secondly, the results of Sahah were compared with those obtained using word processing software and the Farasa tool. The results show that the Sahah system significantly outperforms Microsoft Word, Ayaspell and the Farasa tool, with an F1 score of 0.83 for detection and an F1 score of 0.58 for correction.
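The candidate-and-select approach described above can be sketched roughly as follows: generate every string one edit operation away from the error word, then keep the candidate with the shortest codelength. This is an illustration only and not the Sahah implementation. It assumes an add-one smoothed word-frequency model in place of the PPM language model, and the function names are hypothetical.

```python
import math
from collections import Counter


def edit_candidates(word, alphabet):
    """Return all strings that are one edit operation away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Undo an addition error: delete one character.
    deletions = {a + b[1:] for a, b in splits if b}
    # Undo an omission error: insert one character.
    insertions = {a + c + b for a, b in splits for c in alphabet}
    # Undo a substitution error: replace one character.
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    # Undo a transposition error: swap two adjacent characters.
    transpositions = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletions | insertions | substitutions | transpositions


def codelength(word, counts):
    """Bits to encode `word` under an add-one smoothed word-frequency model.

    Assumption: this replaces the PPM codelength used in the thesis.
    """
    total = sum(counts.values())
    probability = (counts[word] + 1) / (total + len(counts) + 1)
    return -math.log2(probability)


def correct(word, counts, alphabet):
    """Return the candidate (or the word itself) with the shortest codelength."""
    candidates = edit_candidates(word, alphabet) | {word}
    return min(candidates, key=lambda w: codelength(w, counts))
```

Here `counts` would be a `Counter` of word frequencies from a reference corpus and `alphabet` the set of characters allowed in insertions and substitutions. A fuller system would score each candidate in its sentence context rather than in isolation.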
Details
Original language | English |
---|---|
Awarding Institution | |
Supervisors/Advisors | |
Award date | 12 Nov 2019 |
Research outputs (1)
- Published
Visualisation Data Modelling Graphics (VDMG) at Bangor
Research output: Contribution to conference › Paper › peer-review