International Society of Science and Applied Technologies
Similarity of Feature Subset Selection Methods on Software Metrics Data
Author | Huanjing Wang
Co-Author(s) | Taghi M. Khoshgoftaar; Naeem Seliya
Abstract | During the software development cycle, various software metrics are collected for different reasons. An intelligent selection of software metrics prior to building defect predictors may improve model performance. A software practitioner is interested in how similar the feature subsets selected by different metric (feature) selection algorithms are. To study the similarity of different feature selection methods, we test two filter-based rankers, two filter-based subset evaluators, and two wrappers, and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the similarity between techniques. Three software metric datasets from a real-world software project are used in this study. Results demonstrate that Chi-square (CS) and Signal-To-Noise (S2N) exhibit the most similarity regardless of perturbation level; in addition, filter-based feature selection methods are less similar to wrappers. This demonstrates that the choice of feature selection method has a major influence on the features chosen, and that practitioners must be careful when making these choices to ensure their techniques give optimal results.
Keywords | software metrics, feature selection, similarity, defect prediction
Article #: 22186
August 4-6, 2016 - Los Angeles, California, U.S.A.
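
As an illustration of the similarity measure named in the abstract, the following Python sketch computes a Tanimoto index between two feature subsets and averages it over all pairs of subsets. The abstract does not give the APTI formula, so this assumes the common set form |A ∩ B| / |A ∪ B| and an unweighted mean over pairs; the function names and the metric names in the example are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tanimoto_index(a, b):
    """Tanimoto index of two feature subsets, assumed here to be |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty subsets are treated as identical (an assumption of this sketch).
        return 1.0
    return len(a & b) / len(union)


def average_pairwise_tanimoto(subsets):
    """Unweighted mean of the Tanimoto index over every unordered pair of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("at least two feature subsets are required")
    return sum(tanimoto_index(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical subsets, e.g. as chosen by three different selection methods.
    selected = [
        {"loc", "cyclomatic", "fan_out"},
        {"loc", "cyclomatic", "churn"},
        {"loc", "fan_in", "churn"},
    ]
    print(average_pairwise_tanimoto(selected))
```

A value of 1 means every pair of subsets is identical and 0 means no pair shares a feature, so under these assumptions a higher score indicates that the compared selection methods agree more closely.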