Impact of Data Sampling on Feature Selection Techniques for Software Defect Prediction
Author | Kehan Gao
Co-Author(s) | Taghi M. Khoshgoftaar; Amri Napolitano
Abstract | In software quality modeling, two problems often come with a software training dataset: (1) high dimensionality and (2) imbalanced distributions between the two classes (fault-prone and not-fault-prone modules). To overcome these problems, an effective method is to perform feature selection and data sampling prior to building classifiers for software quality prediction. In this study, we investigate 18 filter-based feature ranking techniques and three data sampling approaches, and compare the similarity between each pair of filters with respect to different sampling techniques. We also compare the prediction performance when using every combination of filter and sampling method. The experimental results demonstrate that data sampling increases the similarity between two feature ranking techniques on average and improves the classification performance when combined with feature selection approaches.
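The pipeline summarized in the abstract (sample the training data, rank features with filters, then compare the rankings) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the three sampling approaches, the 18 filters, or the similarity measure, so random undersampling, two simple stand-in rankers (`rank_by_mean_gap`, `rank_by_correlation`), a top-k overlap score (`top_k_overlap`), and the synthetic data from `make_toy_data` are all hypothetical and are not the authors' method.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n=200, seed=1):
    """Synthetic, imbalanced data (hypothetical; not from the paper)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.15 else 0  # 1 = fault-prone (minority)
        # Features 0 and 1 shift with the class; features 2-4 are pure noise.
        rows.append([rng.gauss(2.0 * y, 1.0), rng.gauss(1.0 * y, 1.0),
                     rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append(y)
    return rows, labels


def random_undersample(rows, labels, seed=0):
    """Drop majority-class rows at random until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def rank_by_mean_gap(rows, labels):
    """Stand-in filter: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        fp = [r[j] for r, y in zip(rows, labels) if y == 1]
        nfp = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(mean(fp) - mean(nfp)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def rank_by_correlation(rows, labels):
    """Stand-in filter: rank features by |Pearson correlation| with the label."""
    n_features = len(rows[0])
    y_mean, y_sd = mean(labels), pstdev(labels)
    scores = []
    for j in range(n_features):
        col = [r[j] for r in rows]
        x_mean, x_sd = mean(col), pstdev(col)
        if x_sd == 0 or y_sd == 0:
            scores.append(0.0)  # constant column carries no ranking signal
            continue
        cov = mean([(x - x_mean) * (y - y_mean) for x, y in zip(col, labels)])
        scores.append(abs(cov / (x_sd * y_sd)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def top_k_overlap(rank_a, rank_b, k):
    """Fraction of features shared by the top-k of two rankings (0.0 to 1.0)."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k


if __name__ == "__main__":
    rows, labels = make_toy_data()
    for name, (x, y) in {"no sampling": (rows, labels),
                         "undersampled": random_undersample(rows, labels)}.items():
        a, b = rank_by_mean_gap(x, y), rank_by_correlation(x, y)
        print(name, "top-2 overlap:", top_k_overlap(a, b, 2))
```

Running the script prints the top-2 overlap between the two stand-in rankers with and without sampling. On toy data this shows only the mechanics of the comparison and does not reproduce the study's findings.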
Article #: 1844
July 26-28, 2012 - Boston, Massachusetts, U.S.A.