Residential College | false |
Status | Published |
An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets | |
Bee Wah Yap¹; Khatijahhusna Abd Rani¹; Hezlin Aryani Abd Rahman¹; Simon Fong²; Zuraida Khairudin¹; Nik Nik Abdullah³ | |
2014 | |
Conference Name | First International Conference on Advanced Data and Information Engineering (DaEng-2013) |
Source Publication | Lecture Notes in Electrical Engineering: Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013) |
Volume | 285 LNEE |
Pages | 13-22 |
Conference Date | Dec 16, 2013 - Dec 18, 2013 |
Conference Place | Kuala Lumpur, Malaysia |
Abstract | Most classifiers work well when the classes of the response variable are well balanced; problems arise when the dataset is imbalanced. This paper applies four methods for handling imbalanced datasets: oversampling, undersampling, bagging and boosting. The cardiac surgery dataset has a binary response variable (1 = Died, 0 = Alive) and contains 4976 cases, of which 4.2% are Died and 95.8% are Alive. CART, C5 and CHAID were chosen as the classifiers. In imbalanced classification problems, the accuracy rate of a predictive model is not an appropriate measure because it is biased towards the majority class. The performance of the classifiers is therefore measured using sensitivity and precision. Oversampling and undersampling were found to improve decision tree classification on the imbalanced dataset, whereas bagging and boosting did not improve decision tree performance. |
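The abstract's argument about accuracy, and the two resampling schemes it reports as effective, can be illustrated with a small sketch. It assumes plain random oversampling (minority rows duplicated with replacement) and random undersampling (majority rows subsampled without replacement), both to a 1:1 class ratio; the record does not state which ratios or tools the authors used, and all function names below are hypothetical. The labels are synthetic counts chosen to match the reported split (209 Died and 4767 Alive out of 4976, i.e. about 4.2% and 95.8%), not the cardiac surgery data. Bagging and boosting are not shown.

```python
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def sensitivity(y_true, y_pred):
    # Recall of the minority ("Died") class: TP / (TP + FN).
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def precision(y_true, y_pred):
    # TP / (TP + FP); by convention 0.0 when nothing is predicted positive.
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def random_oversample(rows, labels, minority=1, seed=0):
    """Hypothetical helper: duplicate minority rows (with replacement) until classes are equal."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    extra = [rng.choice(minority_idx) for _ in range(len(majority_idx) - len(minority_idx))]
    keep = majority_idx + minority_idx + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_undersample(rows, labels, minority=1, seed=0):
    """Hypothetical helper: keep all minority rows plus an equal-sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Synthetic labels mirroring the reported class split (no real patient data).
labels = [1] * 209 + [0] * 4767
rows = list(range(len(labels)))  # placeholder "feature rows": just case indices

# A trivial model that always predicts the majority class "Alive" (0).
always_alive = [0] * len(labels)
print(f"accuracy    = {accuracy(labels, always_alive):.3f}")
print(f"sensitivity = {sensitivity(labels, always_alive):.3f}")
print(f"precision   = {precision(labels, always_alive):.3f}")

_, y_over = random_oversample(rows, labels)
_, y_under = random_undersample(rows, labels)
print("oversampled (Died, Alive):", y_over.count(1), y_over.count(0))
print("undersampled (Died, Alive):", y_under.count(1), y_under.count(0))
```

Run as-is, the always-Alive baseline reaches an accuracy of about 0.958 while its sensitivity and precision are both 0, which is the bias towards the majority class that the abstract describes. After resampling, the two classes have equal counts (4767 each when oversampling, 209 each when undersampling); a classifier such as a decision tree would then typically be trained on the resampled set and evaluated on data that was not resampled.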
Keywords | Bagging, Boosting, Imbalanced Data, Oversampling, Undersampling |
DOI | 10.1007/978-981-4585-18-7_2 |
Indexed By | Other |
Language | English |
Scopus ID | 2-s2.0-84958535372 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | 1. Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Selangor, Malaysia; 2. Faculty of Science and Technology, University of Macau, China; 3. Faculty of Medicine, Universiti Teknologi MARA, Selangor, Malaysia |
Recommended Citation GB/T 7714 | Bee Wah Yap,Khatijahhusna Abd Rani,Hezlin Aryani Abd Rahman,et al. An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets[C], 2014, 13-22. |
APA | Bee Wah Yap, Khatijahhusna Abd Rani, Hezlin Aryani Abd Rahman, Simon Fong, Zuraida Khairudin, & Nik Nik Abdullah (2014). An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets. Lecture Notes in Electrical Engineering: Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), 285 LNEE, 13-22. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.