Performance of Feature Subset Evaluators for Software Engineering Datasets

Author	Huanjing Wang
Co-Author(s)	Taghi M. Khoshgoftaar; Kehan Gao
Abstract	The objective of feature selection is to identify irrelevant or redundant features, which can then be discarded from the analysis. Reducing the number of metrics (features) in a software dataset can speed up the training of defect prediction models and improve classifier performance. In the context of software defect prediction, we investigated two filter-based and five wrapper-based feature (software metric) subset evaluators and built classification models using five different classifiers. The models were evaluated using the area under the Receiver Operating Characteristic (ROC) curve (AUC). All experiments were conducted on nine imbalanced datasets from a real-world software project. The experimental results demonstrated that the choice of subset evaluator may significantly influence the conclusions drawn from the classification evaluation. In this study, we found that Correlation-Based Feature Selection performed best, followed by the k-nearest neighbors wrapper evaluator. The model built with a support vector machine performed best.
Keywords	feature subset selection, software measurements, filters, wrappers, software quality classification
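
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how AUC can be computed from class labels and classifier scores, using the pairwise (Mann-Whitney) formulation. It is an illustration only, not the authors' implementation; the function name `auc` and the sample labels and scores are hypothetical and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: sequence of 0/1 values, where 1 marks the positive class
            (for example, a fault-prone module).
    scores: classifier scores; a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")

    # Count positive/negative pairs that are ranked correctly.
    # A tie between a positive and a negative score counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 2 positive modules out of 8.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.40, 0.30, 0.70, 0.60, 0.90]
print(auc(labels, scores))
```

Because AUC depends only on how positive examples are ranked relative to negative ones, it does not require choosing a decision threshold, which is one reason it is commonly reported for imbalanced data.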

		Article #: 23-131

Proceedings of the 23rd ISSAT International Conference on Reliability and Quality in Design
August 3-5, 2017 - Chicago, Illinois, U.S.A.

	International Society of Science and Applied Technologies