International Society of Science and Applied Technologies
An Evaluation of CNN and Vision Transformer Models for Mars Surface Image Classification
Author | Kehan Gao
Co-Author(s) | Sarah Tasneem; Taghi M. Khoshgoftaar
Abstract | This paper examines the effectiveness of three Convolutional Neural Network (CNN) architectures (InceptionNet, DenseNet, and EfficientNet) and a Vision Transformer (ViT) model in classifying Mars surface images, with a particular focus on their performance under varying degrees of class imbalance. Using NASA’s HiRISE imagery, we evaluate model robustness under two scenarios: one with severe imbalance across six terrain categories and another with moderate imbalance across four. While CNNs capture spatial hierarchies, ViTs treat images as sequences of patches and leverage self-attention, offering a contrasting approach to handling imbalanced planetary datasets. Experimental results demonstrate that all models perform well under moderate imbalance. However, a clear decline in classification performance is observed as imbalance becomes more severe. Among the models, the ViT consistently demonstrates greater robustness than the CNN architectures, achieving higher F1-scores and accuracy, particularly in the severely imbalanced setting. These findings highlight the potential of transformer-based models in addressing challenges posed by imbalanced datasets in image classification tasks.
Keywords | Convolutional Neural Networks (CNNs), InceptionNet, DenseNet, EfficientNet, Vision Transformer (ViT), Image Classification, Class Imbalance
Article #: RQD2025-167
Proceedings of 30th ISSAT International Conference on Reliability & Quality in Design