A Comprehensive Approach to Tabular Data Classification: FT-Transformer Enhanced by KNN Imputation and ICA in Diabetes Classification  
Author Tho Nguyen

 

Co-Author(s) Hao Mai Xuan; Quoc-Thong Nguyen; Kim Duc Tran; Ludovic Koehl; Kim Phuc Tran

 

Abstract Diabetes is a major global public health challenge that places a significant burden on healthcare systems and socio-economic development. Its impact is expected to grow: prevalence is projected to rise by 59.7% from 2021 to 2050, affecting over 1.31 billion people. There is therefore growing emphasis on processing medical data and evaluating machine learning models so that artificial intelligence can be used effectively in diabetes diagnosis. In this study, the FT-Transformer achieved good classification performance, with an accuracy of 98.07%, a precision of 95.76%, and a recall of 98.26%, and was evaluated alongside Random Forest, XGBoost, and LGBM. Although it lags slightly behind these models, it holds promise for future applications in more complex database systems. The data processing approach, which combines KNN imputation, ICA, and Isolation Forest, effectively simplifies and optimizes the model. These findings mark a significant step toward streamlining data processing and give future researchers insight into identifying suitable models for medical databases, such as those used in diabetes diagnosis.
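
The preprocessing combination named in the abstract (KNN imputation, ICA, Isolation Forest) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the order of the steps, the added standardization step, the function name `preprocess`, every parameter value, and the synthetic data are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.ensemble import IsolationForest
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler


def preprocess(X, n_neighbors=5, n_components=4, contamination=0.05, seed=0):
    """Impute, scale, apply ICA, then drop rows flagged as outliers.

    Hypothetical helper: step order and defaults are assumptions,
    not values taken from the paper.
    """
    # Fill each missing value from the nearest rows (NaN-aware distance).
    X_imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(X)
    # Standardize so that no single feature dominates the decomposition.
    X_scaled = StandardScaler().fit_transform(X_imputed)
    # Project onto statistically independent components.
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    X_ica = ica.fit_transform(X_scaled)
    # Isolation Forest labels inliers as 1 and outliers as -1.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    keep = forest.fit_predict(X_ica) == 1
    return X_ica[keep], keep


# Synthetic stand-in for a tabular medical dataset: 200 rows, 8 features,
# with roughly 10% of the entries set to missing.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan

X_clean, keep_mask = preprocess(X_demo)
```

In practice each transformer would be fitted on the training split only and then applied to held-out data, and `X_clean` would be passed to a classifier such as the FT-Transformer or one of the tree-based baselines.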

 

Keywords FT-Transformer, tabular data, ICA, KNN imputation, explainable artificial intelligence, diabetes
   
Article #: DSBFI25-30
 
Proceedings of 3rd ISSAT International Conference on Data Science in Business, Finance and Industry
January 6-8, 2025 - Da Nang, Vietnam