Determining Optimal Feature Subset Size for Credit Card Fraud Detection

Author	Huanjing Wang
Co-Author(s)	John Hancock; Taghi M. Khoshgoftaar
Abstract	Financial computing requires accurate detection of fraudulent activities in credit card transactions. In credit card fraud detection, selecting relevant features before building a model can significantly enhance its performance. This study proposes a novel approach that combines an ensemble of supervised feature selection methods to remove irrelevant and redundant features. We compare feature subsets of different sizes on the Credit Card Fraud Detection Dataset, a widely used dataset of real-world transactions. We train credit card fraud detection models using six different classifiers and evaluate their performance using two metrics: the Area Under the Receiver Operating Characteristic Curve (AUC) and the Area Under the Precision-Recall Curve (AUPRC). Our empirical case study results demonstrate that an effective fraud detection model for classifying highly imbalanced data can be built with as few as ten features, and that model performance improved when over two-thirds of the features were eliminated. Additionally, we find that our novel ensemble supervised feature selection technique outperforms the baseline of using all features.
Keywords	Ensemble Feature Selection, Credit Card Fraud, Machine Learning
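
The abstract does not say how the individual feature rankings are combined, so the following is only a minimal sketch of one common way to build an ensemble of feature rankers: average each feature's rank across several scoring methods and keep the k best. The mean-rank rule, the function names, the feature names, and the scores are all assumptions made for illustration, not the authors' method or data.

```python
from statistics import mean


def rank_features(scores):
    """Map each feature to its rank (1 = highest score) under one scoring method."""
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def ensemble_select(score_tables, k):
    """Aggregate several rankings by mean rank and return the k best features."""
    rankings = [rank_features(table) for table in score_tables]
    features = list(rankings[0])
    mean_rank = {f: mean(r[f] for r in rankings) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical importance scores from three supervised rankers.
# Feature names and values are placeholders, not taken from the paper.
score_tables = [
    {"f1": 0.9, "f2": 0.1, "f3": 0.5, "f4": 0.3},
    {"f1": 0.7, "f2": 0.2, "f3": 0.8, "f4": 0.1},
    {"f1": 0.6, "f2": 0.4, "f3": 0.5, "f4": 0.2},
]

selected = ensemble_select(score_tables, k=2)
print(selected)
```

Varying `k` and retraining a classifier on each resulting subset is one way to run the kind of subset-size comparison the abstract describes.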

		Article #: RQD28-407

Proceedings of 28th ISSAT International Conference on Reliability & Quality in Design
August 3-5, 2023

	International Society of Science and Applied Technologies