Statistical Significance of Hyperparameter Tuning for Varying Levels of Class Imbalance

John Hancock

Author	John Hancock
Co-Author(s)	Taghi M. Khoshgoftaar; Sara Landset
Abstract	Researchers experimenting with classification tasks in Machine Learning can choose to use either optimized or default values for their algorithms’ hyperparameters. Our contribution is to conduct experiments with balanced and imbalanced datasets to show that hyperparameter tuning has a significant, positive impact on classification results regardless of class ratio. To the best of our knowledge, this is the first study to investigate whether hyperparameter tuning has a statistically significant impact on the classification of balanced and imbalanced datasets derived from the Health and Retirement Study. We conduct a series of experiments with three classifiers and five datasets. The classifiers are well known and widely used in Machine Learning research. The datasets are based on a survey of cognition in human subjects. Three of the datasets are balanced, and two are imbalanced. We perform Analysis of Variance (ANOVA) and Tukey’s Honestly Significant Difference (HSD) tests to determine the effect of hyperparameter tuning. Our results show that, regardless of class imbalance, using optimized hyperparameter values yields statistically significantly better results.
Keywords	CatBoost, Random Forest, Logistic Regression, Class Imbalance, Cognition, Health and Retirement Study, ANOVA, Tukey HSD, Machine Learning
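As a rough illustration of the two statistics named in the abstract, the Python sketch below computes a one-way ANOVA F statistic and a Tukey-Kramer studentized range statistic q for a single factor with two levels, default versus tuned hyperparameters. The single-factor design is an assumption made for illustration, the scores are invented placeholders rather than results from this study, and the helper names `one_way_anova` and `tukey_kramer_q` are hypothetical.

```python
# Sketch: one-way ANOVA F statistic and Tukey-Kramer q statistic in pure Python.
# Assumption for illustration only: one factor ("hyperparameters") with two
# levels, default vs. tuned. The scores below are hypothetical placeholders.
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def tukey_kramer_q(g1, g2, ms_within):
    """Studentized range statistic q for one pair of groups (Tukey-Kramer form)."""
    se = sqrt(ms_within / 2 * (1 / len(g1) + 1 / len(g2)))
    return abs(mean(g1) - mean(g2)) / se


default_scores = [0.70, 0.72, 0.71, 0.69, 0.73]  # hypothetical
tuned_scores = [0.75, 0.77, 0.76, 0.74, 0.78]    # hypothetical

f_stat, df_b, df_w, msw = one_way_anova([default_scores, tuned_scores])
q_stat = tukey_kramer_q(default_scores, tuned_scores, msw)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, q = {q_stat:.2f}")  # F(1, 8) = 25.00, q = 7.07
```

With only two groups, Tukey's HSD reduces to a single pairwise comparison and q² = 2F, which the placeholder data reproduce (F = 25, q = √50 ≈ 7.07). Converting these statistics into p-values requires the F and studentized range distributions, so in practice one would typically call a library routine such as `scipy.stats.f_oneway` or statsmodels' `pairwise_tukeyhsd` instead of this hand-rolled version.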

		Article #: RQD26-155

Proceedings of 26th ISSAT International Conference on Reliability & Quality in Design
Virtual Event
August 5-7, 2021

	International Society of Science and Applied Technologies