Improved Unsupervised Feature Selection for High Dimensional Data  
Author Peican Zhu

 

Co-Author(s) Xin Hou; Keke Tang; Zhen Wang; Feiping Nie

 

Abstract With the rapid development of information technology, data science has received extensive attention in recent years, and processing data efficiently has become particularly important. Feature selection is a key technique in dimensionality reduction: it aims to select a small set of representative features from many candidates. Feature selection can serve several purposes, including reducing dimensionality, improving model performance, and increasing execution efficiency. Large amounts of high-dimensional data are now being generated, and labeling such data is expensive and time-consuming. Unsupervised feature selection has therefore attracted increasing attention. Research on many classification tasks has found that data from the same class are often close to each other, so the importance of a feature can be assessed by its local compactness. In this paper, we propose a novel unsupervised feature selection algorithm, named Compactness Score (CSUFS), to select the desired features. To verify the accuracy and efficiency of the proposed algorithm, we compare CSUFS with several other unsupervised feature selection algorithms on eight public datasets. The experimental results suggest that the proposed algorithm is more accurate and efficient than the compared algorithms.
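
The idea of scoring a feature by its local compactness can be illustrated with a minimal sketch. This is not the CSUFS algorithm itself, whose exact definition is not given in the abstract. It is an assumed stand-in in which each feature is min-max scaled and scored by the mean distance from every sample to its k nearest neighbors along that feature alone, with lower scores taken to mean more compact. The function names `compactness_scores` and `select_features`, the parameter `k`, and the choice to rank constant features last are all hypothetical.

```python
import numpy as np


def compactness_scores(X, k=5):
    """Assumed per-feature score (not the paper's definition): the mean
    distance from each sample to its k nearest neighbors, measured along
    one feature at a time. Lower scores mean a more locally compact feature."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    constant = span == 0
    span[constant] = 1.0           # avoid division by zero
    Z = (X - lo) / span            # min-max scale so features are comparable

    scores = np.empty(d)
    for j in range(d):
        col = Z[:, j]
        # Pairwise 1-D distances; after sorting each row, column 0 is the
        # zero distance from a sample to itself.
        dist = np.abs(col[:, None] - col[None, :])
        dist.sort(axis=1)
        scores[j] = dist[:, 1:k + 1].mean()

    # Assumption: a constant feature carries no information, so rank it last.
    scores[constant] = np.inf
    return scores


def select_features(X, m, k=5):
    """Return the indices of the m features with the lowest assumed score."""
    return np.argsort(compactness_scores(X, k))[:m]


if __name__ == "__main__":
    clustered = np.concatenate([np.linspace(0.0, 0.1, 50),
                                np.linspace(9.9, 10.0, 50)])
    spread = np.linspace(0.0, 1.0, 100)
    X = np.column_stack([clustered, spread])
    print(select_features(X, m=1, k=3))
```

In the usage example, the first feature forms two tight groups while the second is spread evenly over its range, so the first feature receives the lower score under this assumed definition. The pairwise-distance computation costs O(n^2) memory per feature, which is acceptable for a sketch but would need a nearest-neighbor index for large sample counts.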

 

Keywords Unsupervised feature selection, dimensionality reduction, k-nearest neighbor distance
   
Article #: DSBFI23-85
 
Proceedings of 2nd ISSAT International Conference on Data Science in Business, Finance and Industry
January 8-10, 2023 - Da Nang, Vietnam