International Society of Science and Applied Technologies
Quality and Reliability of Data-Driven Business Applications: Methodology of Addressing Imbalanced Data Set Biases
Author | Andrei Shcheprov
Co-Author(s) | Brady McMicken; Mike Sturdevant; Alan Cordell
Abstract | Binary classification algorithms are commonly applied to real-world, data-driven business problems represented by highly imbalanced data sets. Classifiers built on imbalanced data are often viewed as biased towards the majority class. This bias can significantly impact the quality and reliability of data-driven business decisions and outcomes. This paper explores the nature of the bias and its influence on the performance of classification methods by comparing models trained on data with different class imbalance levels. It is shown that in practical applications the bias is usually associated with model evaluation metrics and can be significantly reduced if the selected metric is directly linked to the business requirements. The paper also emphasizes the importance of Cost Sensitive Learning.
Keywords | machine learning (ML) modeling, classification, data computing, performance analysis, optimization
Article #: RQD28-1
Proceedings of 28th ISSAT International Conference on Reliability & Quality in Design
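
The abstract's central claim, that the apparent majority-class bias is largely a matter of which evaluation metric is chosen, can be illustrated with a small sketch. The example below is not taken from the paper: the confusion-matrix counts, the two classifiers, and the misclassification costs (`COST_FP`, `COST_FN`) are hypothetical values chosen only to show how accuracy and a business-linked cost metric can rank the same two models in opposite order.

```python
# Illustrative sketch only: all counts and costs below are hypothetical
# assumptions, not results from the paper.

def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def total_cost(fp, fn, cost_fp, cost_fn):
    """Total misclassification cost under a simple two-cost model."""
    return fp * cost_fp + fn * cost_fn

# Hypothetical imbalanced test set: 990 negatives, 10 positives.
# Classifier A always predicts the majority (negative) class.
always_negative = {"tp": 0, "fp": 0, "tn": 990, "fn": 10}
# Classifier B flags more cases: it finds 8 of 10 positives
# at the price of 40 false alarms.
flagging_model = {"tp": 8, "fp": 40, "tn": 950, "fn": 2}

# Assumed business costs: a missed positive is 50 times as expensive
# as a false alarm.
COST_FP = 1.0
COST_FN = 50.0

def summarize(counts):
    """Return (accuracy, total cost) for one confusion matrix."""
    return (
        accuracy(**counts),
        total_cost(counts["fp"], counts["fn"], COST_FP, COST_FN),
    )

acc_a, cost_a = summarize(always_negative)
acc_b, cost_b = summarize(flagging_model)

print(f"always-negative: accuracy={acc_a:.3f}, cost={cost_a:.0f}")
print(f"flagging model:  accuracy={acc_b:.3f}, cost={cost_b:.0f}")
```

Under these assumed numbers, accuracy favors the always-negative classifier, while the cost metric favors the flagging model. This is the kind of reversal that motivates tying the metric to business requirements, and a cost-sensitive learner would optimize a cost of this form directly during training.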