Application of Euclidean Distance-Based K-Means Clustering for Early Detection of Non-Compliant Cosmetic Product Risks

Muhammad Nur Arafah; Firman Aziz; Christian Victor Burdam

Authors

Muhammad Nur Arafah, IRMEX Digital Akademika
Firman Aziz
Christian Victor Burdam, Badan Pengawas Obat dan Makanan

Keywords:

K-Means; Cosmetics; Unsupervised Learning; Euclidean Distance; Machine Learning

Abstract

Risk-based supervision of cosmetics is a priority for Indonesia's Food and Drug Monitoring Agency (BPOM) in protecting the public from non-compliant products (Tidak Memenuhi Syarat, TMS) that may contain hazardous substances. This study applies Euclidean distance-based K-Means clustering to identify risk patterns among TMS cosmetic products and to support data-driven supervision strategies. The unsupervised workflow comprised data collection, preprocessing, feature selection, determination of the optimal number of clusters, clustering, and model evaluation. A total of 1,186 TMS cosmetic samples were analyzed based on product characteristics and naming patterns. The analysis identified five main clusters: general care, hair dye, sunscreen, daily cleanser, and premium anti-aging. The premium anti-aging cluster had the highest average Euclidean distance and was therefore interpreted as the most distinct from the other clusters. Model evaluation yielded a Silhouette Score of 0.587 and a Davies-Bouldin Index of 0.412, indicating fairly good cluster separation. A PCA projection also showed relatively clear cluster separation in two dimensions. These results suggest that clustering can serve as an analytical tool for prioritizing risk-based cosmetic supervision at BPOM.
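
The clustering and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the study's implementation: the toy 2-D points stand in for the (unpublished here) product features, the farthest-point initialization is an assumption chosen only to make the run deterministic, and "average Euclidean distance" is read as the mean distance of a cluster's members to their own centroid, which is one possible interpretation. All function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(euclidean(p, c) for c in centroids)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def mean_distance_to_centroid(points, labels, centroids):
    """Per-cluster mean Euclidean distance from members to their centroid."""
    dists = {}
    for p, lab in zip(points, labels):
        dists.setdefault(lab, []).append(euclidean(p, centroids[lab]))
    return {lab: sum(d) / len(d) for lab, d in dists.items()}

def silhouette_score(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())     # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: lower values mean better-separated clusters."""
    spread = mean_distance_to_centroid(points, labels, centroids)
    ids = sorted(spread)
    worst = [max((spread[i] + spread[j]) / euclidean(centroids[i], centroids[j])
                 for j in ids if j != i) for i in ids]
    return sum(worst) / len(worst)

# Toy data only: three well-separated groups, not the 1,186 TMS samples.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.8)]
centroids, labels = kmeans(points, k=3)
spread = mean_distance_to_centroid(points, labels, centroids)
sil = silhouette_score(points, labels)
dbi = davies_bouldin(points, labels, centroids)
print("labels:", labels)
print("mean distance to centroid:", spread)
print("silhouette:", round(sil, 3), "Davies-Bouldin:", round(dbi, 3))
```

On real data the features would normally be scaled before clustering, since Euclidean distance is sensitive to differences in feature magnitude. A Silhouette Score closer to 1 and a Davies-Bouldin Index closer to 0 both indicate tighter, better-separated clusters, which is the sense in which the reported values of 0.587 and 0.412 are read as fairly good separation.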