Analysing and Correcting Dyslexic Arabic Texts

Maha Alamri

School of Computer Science & Engineering

Research areas

PhD, School of Computer Science

Abstract

Dyslexia is a disorder that involves difficulty with literacy and language-related skills. It is associated with an inability to master the use of written language and affects a significant number of people. This thesis describes the development of the Bangor Dyslexia Arabic Corpus (BDAC), built to facilitate the analysis and automatic correction of dyslexic Arabic text. The thesis also develops a new classification of the errors made in Arabic by people with dyslexia, which was used to annotate the BDAC. This dyslexic error classification scheme for Arabic texts (DECA) comprises a list of dyslexia spelling errors classified into 37 types and grouped into nine categories.
This thesis also investigates a new classification task, dyslexia text classification, which identifies whether or not a text has been written by a person with dyslexia. The text compression scheme known as prediction by partial matching (PPM) was applied to the problem of distinguishing dyslexic text from non-dyslexic text. Experimental results show that PPM-based classification achieved an F₁ score of 0.99 and outperformed other classifiers such as Multinomial Naïve Bayes and Support Vector Machines.
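The idea behind compression-based classification can be sketched as follows: train one character model per class, then assign a text to the class whose model encodes it in the fewest bits. The sketch below is only an illustration under stated assumptions. It uses a fixed-order character model with add-one smoothing as a simple stand-in for PPM, not PPM itself, and the names `CharModel` and `classify` and the toy English training strings are hypothetical and not taken from the thesis or the BDAC.

```python
import math
from collections import defaultdict


class CharModel:
    """Order-k character model with add-one smoothing.

    Assumption: a simple stand-in for PPM, used only to illustrate
    classification by minimum codelength.
    """

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, text):
        # Pad with spaces so the first characters also have a context.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            self.counts[ctx][ch] += 1
            self.alphabet.add(ch)

    def codelength(self, text):
        """Approximate number of bits needed to encode text under this model."""
        padded = " " * self.order + text
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            seen = self.counts.get(ctx, {})
            total = sum(seen.values())
            p = (seen.get(ch, 0) + 1) / (total + vocab)
            bits -= math.log2(p)
        return bits


def classify(text, models):
    """Return the label whose model gives the shortest codelength for text."""
    return min(models, key=lambda label: models[label].codelength(text))


# Toy training strings (hypothetical, for illustration only).
non_dyslexic = CharModel(order=2)
non_dyslexic.train("the cat sat on the mat and the dog sat on the rug")
dyslexic = CharModel(order=2)
dyslexic.train("teh cat sta on teh mat adn teh dog sta on teh rug")

models = {"non-dyslexic": non_dyslexic, "dyslexic": dyslexic}
print(classify("teh dog sta on teh mat", models))
```

A full PPM model blends several context orders through an escape mechanism rather than relying on a single fixed order, so the codelengths it produces would differ from this sketch.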
A new system called Sahah is also proposed for the automatic detection and correction of dyslexia errors in Arabic text. The system uses a language model based on the PPM text compression scheme in addition to edit operations (omission, addition, substitution and transposition). The correct alternative for each error word is chosen on the basis of the compression codelength. Two experiments were carried out to evaluate the usefulness of the Sahah system. Firstly, its accuracy was evaluated using the BDAC, which contains errors made by people with dyslexia. Secondly, the results of Sahah were compared with those obtained using word processing software and the Farasa tool. The results show that the Sahah system significantly outperforms Microsoft Word, Ayaspell and the Farasa tool, with an F₁ score of 0.83 for detection and an F₁ score of 0.58 for correction.
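The correction step described above (generate alternatives with edit operations, then keep the one with the smallest codelength) can be sketched roughly as below. This is not the Sahah implementation. As an assumption, a smoothed word-frequency codelength stands in for the PPM language model, only single-edit candidates are considered, and the function names, the Latin alphabet and the toy word list are all hypothetical.

```python
import math
from collections import Counter


def edit_candidates(word, alphabet):
    """All strings exactly one edit operation away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Deleting a letter undoes an addition error.
    deletes = [a + b[1:] for a, b in splits if b]
    # Inserting a letter undoes an omission error.
    inserts = [a + c + b for a, b in splits for c in alphabet]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    return set(deletes + inserts + substitutes + transposes)


def make_codelength(corpus_words):
    """Build a codelength function from word counts.

    Assumption: add-one smoothed word frequencies replace the PPM model.
    """
    counts = Counter(corpus_words)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def codelength(word):
        return -math.log2((counts[word] + 1) / (total + vocab))

    return codelength


def correct(word, codelength, alphabet):
    """Return the candidate (or the word itself) with the smallest codelength."""
    candidates = edit_candidates(word, alphabet) | {word}
    # Sorting makes tie-breaking deterministic.
    return min(sorted(candidates), key=codelength)


# Toy word list (hypothetical, for illustration only).
codelength = make_codelength(["the", "the", "cat", "sat"])
print(correct("teh", codelength, "abcdefghijklmnopqrstuvwxyz"))
```

For Arabic text, the alphabet argument would hold Arabic letters, and a character-level model such as PPM could score each candidate in its sentence context rather than in isolation.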

Details

Original language	English
Awarding Institution	Bangor University
Supervisors/Advisors	William Teahan (Supervisor)
Award date	12 Nov 2019

Research outputs (1)

Published
Visualisation Data Modelling Graphics (VDMG) at Bangor
Research output: Contribution to conference › Paper › peer-review
