Improving the Accuracy of Drug and Supplement Classification Using Vision Transformer (ViT) Model and Fine-Tuning Techniques on Multinational Dataset

Authors

  • Irmawati Irmawati
  • Firman Aziz (Universitas Pancasakti Makassar)

Keywords:

Vision Transformer, drug classification, supplements, fine-tuning, artificial intelligence

Abstract

This study aims to improve the accuracy of drug and supplement classification by fine-tuning a Vision Transformer (ViT) on a large-scale multinational dataset. The main challenges in drug image classification are the high variability of packaging designs, languages, and lighting conditions, together with the visual similarity between products. The ViT model was compared with two widely used convolutional neural networks (CNNs), ResNet50 and VGG16, in terms of accuracy, generalization capability, and performance stability. Experimental results on simulated data show that ViT achieves the highest accuracy of the three models and a more stable loss trend than either CNN, which the authors attribute to its self-attention mechanism effectively capturing global dependencies within an image. These findings suggest that ViT is a strong candidate for AI-based drug classification systems, particularly for heterogeneous datasets collected across countries.
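The abstract credits ViT's advantage to self-attention capturing global dependencies in an image. As a minimal illustrative sketch (not the authors' implementation), the pure-Python code below splits an image into non-overlapping patches and applies one single-head scaled dot-product self-attention step over the resulting patch tokens. The helper names `patchify` and `self_attention`, the tiny 4×4 grayscale image, the 2×2 patch size, and the identity query/key/value projections are all assumptions made for brevity; a real ViT learns linear patch embeddings and separate projection matrices, adds position embeddings, and stacks many multi-head layers.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split a square grayscale image into non-overlapping patch x patch
    blocks and flatten each block (row-major) into one token vector."""
    n = len(image)
    if n % patch != 0:
        raise ValueError("image side must be divisible by the patch size")
    tokens = []
    for r in range(0, n, patch):
        for c in range(0, n, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: Q = K = V = tokens (identity projections).
    Returns (outputs, attention_weights); row i of the weights says how
    much token i draws from every token, including distant patches."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights, outputs = [], []
    for q in tokens:
        # Score this query against every key in the image (global context).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[t] * tokens[t][i] for t in range(len(tokens)))
                        for i in range(d)])
    return outputs, weights


if __name__ == "__main__":
    # Toy 4x4 image with pixel values scaled into [0, 1].
    img = [[float(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
    toks = patchify(img, 2)
    out, attn = self_attention(toks)
    for row in attn:
        print([round(w, 3) for w in row])
```

Every row of the attention matrix contains a positive weight for every patch, so each output token mixes information from the whole image within a single layer, whereas a convolutional layer in ResNet50 or VGG16 only sees a local receptive field. This is the global-dependency property the abstract refers to.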

Published

2025-11-30

Section

Articles