Random Balance: Ensembles of variable priors classifiers for imbalanced data

J.F. Diez-Pastor; J.J. Rodriguez; C. Garcia-Osorio; L.I. Kuncheva

doi:10.1016/j.knosys.2015.04.022

Research output: Contribution to journal › Article › peer-review

Standard

Random Balance: Ensembles of variable priors classifiers for imbalanced data. / Diez-Pastor, J.F.; Rodriguez, J.J.; Garcia-Osorio, C. et al.
In: Knowledge-Based Systems, Vol. 85, 07.05.2015, p. 96-111.

Vancouver

Diez-Pastor JF, Rodriguez JJ, Garcia-Osorio C, Kuncheva LI. Random Balance: Ensembles of variable priors classifiers for imbalanced data. Knowledge-Based Systems. 2015 May 7;85:96-111. doi: 10.1016/j.knosys.2015.04.022

Author

Diez-Pastor, J.F. ; Rodriguez, J.J. ; Garcia-Osorio, C. et al. / Random Balance: Ensembles of variable priors classifiers for imbalanced data. In: Knowledge-Based Systems. 2015 ; Vol. 85. pp. 96-111.

RIS

TY - JOUR

T1 - Random Balance: Ensembles of variable priors classifiers for imbalanced data

AU - Diez-Pastor, J.F.

AU - Rodriguez, J.J.

AU - Garcia-Osorio, C.

AU - Kuncheva, L.I.

PY - 2015/5/7

Y1 - 2015/5/7

N2 - In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty in the approach is that the proportions of the classes for each ensemble member are chosen randomly. The intuition behind the method is that the proposed diversity heuristic will ensure that the ensemble contains classifiers that are specialized for different operating points on the ROC space, thereby leading to larger AUC compared to other ensembles of classifiers. Experiments have been carried out to test the Random Balance approach by itself, and also in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost which combines Random Balance with AdaBoost.M2. This combination involves enforcing random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well-known repositories demonstrate the advantage of the Random Balance approach.

U2 - 10.1016/j.knosys.2015.04.022

DO - 10.1016/j.knosys.2015.04.022

M3 - Article

VL - 85

SP - 96

EP - 111

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

SN - 0950-7051

ER -
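
The resampling step described in the abstract (random class proportions per ensemble member, with SMOTE-style artificial instances filling the class that has to grow) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `random_balance` and `smote_like`, the uniform draw of the new class size, the preserved total size, and the simplified nearest-neighbour interpolation standing in for SMOTE are all choices made here for the example.

```python
import math
import random


def smote_like(points, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a randomly
    chosen point and one of its k nearest same-class neighbours.
    Simplified stand-in for SMOTE (assumption, not the reference version)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        if not others:
            # A single-point class can only be duplicated.
            synthetic.append(list(base))
            continue
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def _resize(points, target, k, rng):
    """Undersample without replacement, or oversample with synthetic points."""
    if target <= len(points):
        return rng.sample(points, target)
    return list(points) + smote_like(points, target - len(points), k, rng)


def random_balance(positives, negatives, rng, k=5):
    """Return one resampled training set with a randomly drawn class ratio.
    Assumption: the total size is kept and the new size of the negative
    class is uniform on [2, total - 2]. Needs at least 4 instances overall."""
    total = len(positives) + len(negatives)
    new_neg_size = rng.randint(2, total - 2)
    new_pos_size = total - new_neg_size
    return (_resize(positives, new_pos_size, k, rng),
            _resize(negatives, new_neg_size, k, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-class data: 10 positives, 90 negatives (hypothetical).
    positives = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
    negatives = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    for member in range(3):
        pos, neg = random_balance(positives, negatives, rng)
        print(member, len(pos), len(neg))
```

Each call yields the training data for one ensemble member; a base classifier would then be fitted on every resampled set and the outputs combined. Because the ratio is redrawn per member, some members see the original minority class as the larger one, which is the source of diversity the abstract refers to.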
