Feature selection and classification of non-traditional data: examples from veterinary medicine

Overview

Electronic versions

Documents

Hoare Phd 2007.pdf
12.1 MB, PDF document

Zoe Hoare

Research areas

PhD, School of Informatics

Abstract

Early diagnosis of notifiable diseases in the veterinary domain is important for agriculture, the health sector and the economy. With no diagnostic test in the live animal for either BSE or Scrapie, many cases may be misdiagnosed. Traditionally, data for pattern recognition are stored as recorded cases of interest, either labelled with their outcome (suitable for supervised classification) or unlabelled. Each case is described by a collection of symptoms, each recorded as present or absent. These are called "binary features". In the case of medical data, the number of cases recorded in this way may be limited for many reasons. To overcome this lack of data, expert-estimated probability tables have been proposed as a substitute. These "non-traditional" tables contain the estimated percentage frequencies of clinical symptoms in various diseases. The construction of the tables assumed that the clinical signs (features) were independent given the diseases (classes). Given the "non-traditional" data, various feature selection techniques were applied and compared in this study in order to select a reduced subset of features (symptoms). The potential, limitations and stability of Sequential Forward Selection (SFS), in particular, were investigated. Decision trees and Naive Bayes classifier models were applied to the diagnosis task. The apparent success and stability of Naive Bayes in the medical domain led to an in-depth investigation of the effects of this type of data, and of its inherent assumptions, on the model. Naive Bayes is known to be optimal in the case of independent features, which is the condition assumed by the estimated probability tables in the "non-traditional" data. Various proposed adaptations to the Naive Bayes model were investigated with regard to their optimality when the independence assumption is violated. Finally, the performance of Naive Bayes on traditionally stored medical data with binary features was assessed.
Naive Bayes and its adaptations performed well with the traditional data. Because assuming independence when it does not hold has only a minimal effect, using the "non-traditional" data with the Naive Bayes classifier can be a practical solution for veterinary diagnosis.
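
As an illustration of the idea described in the abstract, the following is a minimal sketch of how a Naive Bayes diagnosis could be computed directly from a table of estimated percentage frequencies, rather than from recorded cases. It is not taken from the thesis: the class names (disease_A, disease_B), the sign names, the percentages, the uniform priors and the function name naive_bayes_posterior are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical expert-estimated table: percentage frequency of each clinical
# sign given each disease. All names and numbers are invented for illustration.
FREQ_TABLE = {
    "disease_A": {"sign_1": 80, "sign_2": 10, "sign_3": 50},
    "disease_B": {"sign_1": 20, "sign_2": 70, "sign_3": 50},
}


def naive_bayes_posterior(table, observed, priors=None, eps=1e-6):
    """Return P(disease | observed signs), assuming signs are
    conditionally independent given the disease."""
    classes = list(table)
    if priors is None:
        # Assumption: equal prior probability for every disease.
        priors = {c: 1.0 / len(classes) for c in classes}

    log_scores = {}
    for c in classes:
        log_p = math.log(priors[c])
        for sign, present in observed.items():
            # Convert percentage to probability; clip to avoid log(0).
            p = min(max(table[c][sign] / 100.0, eps), 1.0 - eps)
            log_p += math.log(p if present else 1.0 - p)
        log_scores[c] = log_p

    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# One case described by binary features (sign present / absent).
case = {"sign_1": True, "sign_2": False, "sign_3": True}
posterior = naive_bayes_posterior(FREQ_TABLE, case)
diagnosis = max(posterior, key=posterior.get)
```

Working in log space keeps the product of many small probabilities from underflowing when the table contains a large number of signs. Under the independence assumption stated in the abstract, a sign with the same frequency in every disease (sign_3 here) does not change the posterior, which is one intuition for why a reduced subset of features can suffice.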

Details

Original language	English
Awarding Institution	Bangor University
Supervisors/Advisors	Ludmila Kuncheva (Supervisor)
Award date	Jan 2007