Probabilistic Threshold Optimization on Machine Learning-Based Credit Scoring with DCA and Expected Loss

Watty Rimalia; Hardiansyah Syam; Rahmat Fuadi Syam; Syahrul Usman; Muhammad Nur Arafah

Authors

Watty Rimalia Universitas Pancasakti Makassar
Hardiansyah Syam Universitas Pancasakti Makassar
Rahmat Fuadi Syam Universitas Pancasakti Makassar
Syahrul Usman Universitas Pancasakti Makassar
Muhammad Nur Arafah Irmex Digital Akademika, Makassar 90551, Indonesia

Keywords:

Credit Scoring, Machine Learning, Decision Curve Analysis, Threshold Optimization, Early Warning System

Abstract

High predictive performance in a credit scoring model does not always translate into optimal financial decisions, particularly on imbalanced datasets, where misclassifying high-risk borrowers as low-risk leads to substantial losses. This study proposes a probability threshold optimization framework based on Decision Curve Analysis (DCA) and Expected Loss to align machine learning predictions with economic decision value. The dataset comprises 1,670,214 historical consumer credit records with 37 features. Logistic Regression and Random Forest models were trained and evaluated with standard classification metrics, after which their decision thresholds were tuned. Experimental results show that Random Forest achieves higher discrimination performance (ROC-AUC 0.8691, PR-AUC 0.9027), while Logistic Regression remains competitive (F1-score 0.7845) and is more interpretable. For both models, the optimal probability threshold is 0.2. At this threshold, Logistic Regression yields the highest net benefit (0.81) and the lowest Expected Loss (IDR 1.21 billion). The findings demonstrate that integrating DCA and Expected Loss improves decision quality by reducing the financial risk driven by false negatives, compared with evaluation based on conventional metrics alone. Overall, Logistic Regression with a 0.2 threshold is recommended for practical implementation in credit scoring systems because it balances accuracy, interpretability, and economic efficiency.
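The threshold search described in the abstract can be illustrated with a minimal sketch. It uses the standard DCA definition of net benefit, TP/N - (FP/N) * pt / (1 - pt), where pt is the probability threshold. The abstract does not state how Expected Loss was computed, so the cost model here (a fixed cost per false negative and per false positive) is an assumption, as are the function names and the cost parameters `cost_fn` and `cost_fp`; this is not the authors' implementation.

```python
import numpy as np


def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN when borrowers with y_prob >= threshold are flagged as high risk."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(y_prob) >= threshold
    tp = int(np.sum(flagged & y_true))
    fp = int(np.sum(flagged & ~y_true))
    fn = int(np.sum(~flagged & y_true))
    return tp, fp, fn


def net_benefit(y_true, y_prob, threshold):
    """Standard DCA net benefit: TP/N - FP/N * pt / (1 - pt)."""
    tp, fp, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def expected_loss(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Assumed cost model (not from the paper): fixed cost per FN and per FP."""
    _, fp, fn = confusion_counts(y_true, y_prob, threshold)
    return fn * cost_fn + fp * cost_fp


def sweep_thresholds(y_true, y_prob, thresholds, cost_fn, cost_fp):
    """Return (threshold, net benefit, expected loss) for each candidate threshold."""
    return [
        (
            float(t),
            net_benefit(y_true, y_prob, t),
            expected_loss(y_true, y_prob, t, cost_fn, cost_fp),
        )
        for t in thresholds
    ]


if __name__ == "__main__":
    # Toy data for illustration only; 1 marks a high-risk (defaulting) borrower.
    y_true = [1, 1, 0, 0]
    y_prob = [0.9, 0.3, 0.4, 0.1]
    for t, nb, el in sweep_thresholds(y_true, y_prob, [0.2, 0.5], cost_fn=10.0, cost_fp=1.0):
        print(f"threshold={t:.1f}  net_benefit={nb:.4f}  expected_loss={el:.1f}")
```

In this toy example the lower threshold catches both high-risk borrowers at the price of one false positive, so it has both the higher net benefit and the lower assumed loss. With real data, the costs would need to reflect actual exposure and recovery figures.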