Machine Learning for Compositional Data Analysis in Support of the Decision Making Process

Author	Thi Thuy Van Nguyen
Co-Author(s)	Cédric Heuchenne; Kim Phuc Tran
Abstract	Machine learning (ML) is central to modern data analysis, yet its application to compositional data (CoDa) has received limited attention. In this work, we summarize the most popular ML techniques for CoDa, including principal component analysis (PCA), clustering, classification, and regression. We also introduce an efficient method, based on Dirichlet density estimation, for transforming CoDa into real-valued data. The proposed method not only removes the constraints (nonnegativity and constant sum) on each CoDa vector, but also reduces its dimension and improves the quality of the data. We then apply the transformed data to anomaly detection using Support Vector Data Description (SVDD), a one-class classification algorithm that detects abnormal observations by modeling the normal ones. To illustrate the promise of this method for building classification and anomaly detection models on CoDa, a simulation example is provided at the end of the work.
Keywords	Compositional Data, Machine Learning, Anomaly Detection, SVDD, Dirichlet Density
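The abstract does not spell out the transformation itself, so the following is only a rough sketch of the general idea of mapping compositions to real values through an estimated Dirichlet density. Several parts are assumptions rather than the authors' method: the method-of-moments estimator, the use of the log-density as a one-dimensional real-valued feature, and a simple quantile threshold standing in for SVDD. All function and variable names are hypothetical.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one composition (nonnegative parts summing to 1) from Dirichlet(alpha)."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def fit_dirichlet_moments(data):
    """Method-of-moments estimate of alpha (an assumption, not the paper's estimator)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    var0 = sum((row[0] - mean[0]) ** 2 for row in data) / (n - 1)
    # For a Dirichlet, Var(x_j) = m_j * (1 - m_j) / (alpha_0 + 1), with alpha_0 = sum(alpha).
    alpha0 = mean[0] * (1.0 - mean[0]) / var0 - 1.0
    return [m * alpha0 for m in mean]

def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a composition x with strictly positive parts."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))

# Simulated "normal" compositions; the parameters are illustrative only.
rng = random.Random(0)
train = [sample_dirichlet([20.0, 30.0, 50.0], rng) for _ in range(500)]

# Estimate the density, then map each 3-part composition to one real number.
alpha_hat = fit_dirichlet_moments(train)
scores = sorted(dirichlet_logpdf(x, alpha_hat) for x in train)

# Stand-in for SVDD: flag anything scoring below the 1% quantile of normal data.
threshold = scores[int(0.01 * len(scores))]

def is_anomaly(x):
    """True if composition x is less likely than almost all training compositions."""
    return dirichlet_logpdf(x, alpha_hat) < threshold
```

Here the score is unconstrained and one-dimensional, which mirrors the two properties the abstract claims for the transformed data (no simplex constraint, reduced dimension). A proper SVDD model would replace the threshold with a learned boundary around the normal scores.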

		Article #: DSBFI23-19

Proceedings of 2nd ISSAT International Conference on Data Science in Business, Finance and Industry
January 8-10, 2023 - Da Nang, Vietnam

	International Society of Science and Applied Technologies