Similarity of Feature Subset Selection Methods on Software Metrics Data

Author	Huanjing Wang
Co-Author(s)	Taghi M. Khoshgoftaar; Naeem Seliya
Abstract	During the software development cycle, various software metrics are collected for different purposes. Intelligently selecting software metrics before building defect predictors may improve model performance, so a software practitioner has an interest in how similar the feature subsets chosen by different metric (feature) selection algorithms are. To study this similarity, we test two filter-based rankers, two filter-based subset evaluators, and two wrappers, and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the similarity between techniques. Three software metric datasets from a real-world software project are used in this study. Results demonstrate that Chi-square (CS) and Signal-to-Noise (S2N) exhibit the most similarity regardless of perturbation level; in addition, filter-based feature selection methods are less similar to wrappers. This demonstrates that the choice of feature selection method has a major influence on the features chosen, and that practitioners must choose carefully to ensure their techniques give optimal results.
Keywords	software metrics, feature selection, similarity, defect prediction

		Article #: 22186

Proceedings of the 22nd ISSAT International Conference on Reliability and Quality in Design
August 4-6, 2016 - Los Angeles, California, U.S.A.

	International Society of Science and Applied Technologies
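
The abstract names the Average Pairwise Tanimoto Index (APTI) but does not define it. The sketch below is one plausible reading, not the authors' definition: it assumes the Tanimoto index of two feature subsets is the size of their intersection divided by the size of their union, and that APTI is the mean of that index over all pairs of subsets in a collection. The function names `tanimoto` and `average_pairwise_tanimoto` and the metric names in the usage example are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) index of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty subsets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def average_pairwise_tanimoto(subsets):
    """Mean Tanimoto index over all unordered pairs of subsets.

    Assumption: this averaging scheme is an illustrative guess at APTI;
    the paper may average over perturbation runs or datasets differently.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Usage: subsets selected by three hypothetical techniques.
selected = [
    {"loc", "cyclomatic"},
    {"cyclomatic", "fan_out"},
    {"loc", "cyclomatic"},
]
score = average_pairwise_tanimoto(selected)
```

A score near 1 would mean the techniques pick almost the same metrics, and a score near 0 would mean they pick almost disjoint ones.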