PREDICTING E-COMMERCE ORDER CANCELLATIONS IN INDONESIA USING RANDOM FOREST AND SMOTE

Authors

  • Felixiana Koten
  • I Wayan Sudiarsa
  • Ni Komang Trisnawati
  • Magdalena Matildis Palo Pera, INSTIKI (Institut Bisnis Dan Teknologi Indonesia)
  • Maria Avilia Ndinin

DOI:

https://doi.org/10.70585/jumali.v3i1.192

Keywords:

Order Cancellation, E-Commerce, Random Forest, SMOTE, Consumer Behavior.

Abstract

The high rate of order cancellation is a serious challenge to the operational efficiency and profitability of the e-commerce industry in Indonesia. This study aims to identify the determinant variables that trigger cancellations and to build a machine learning-based predictive model. It uses a quantitative method with a data mining approach based on the Random Forest algorithm. The data come from the secondary Kaggle dataset "Indonesia E-Commerce Sales and Shipping 2023-2025", which contains 19,189 transaction records. The research stages comprise data preprocessing, applying SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance, building the model, and evaluating it with a confusion matrix. The results show that the Random Forest model predicts cancellations with a very high accuracy of 99.94%. The factors with the greatest influence on the cancellation decision are the payment method (particularly Online Payment and COD, cash on delivery) and the transaction value. These findings carry managerial implications for platform providers, which can tighten payment verification and improve cost transparency to mitigate the risk of future cancellations.
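To make the SMOTE and confusion-matrix steps named above more concrete, here is a minimal pure-Python sketch. It is not the authors' code and does not use the Kaggle dataset: it assumes the features are already numeric (categorical fields such as payment method would need encoding first), and the function names `smote`, `confusion_matrix` and `metrics` are hypothetical. `smote` follows the interpolation idea of Chawla et al. (2002), drawing the base sample at random rather than cycling through the minority samples in order. The Random Forest step itself is omitted; in a typical pipeline the resampled training set would be passed to a library classifier such as scikit-learn's `RandomForestClassifier`, and SMOTE would be applied to the training split only, so that synthetic points do not leak into the test data.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_synthetic minority-class vectors by SMOTE-style interpolation.

    Each new vector lies on the segment between a random minority sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Indices of the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, p in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _distance(p, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # interpolation factor in [0, 1)
        base, other = minority[i], minority[j]
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) counts for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data: four "cancelled" orders with two numeric features.
    cancelled = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    print(len(smote(cancelled, n_synthetic=6, k=3, seed=42)))  # 6

    # A degenerate model that never predicts a cancellation (5 positives, 95 negatives).
    y_true = [1] * 5 + [0] * 95
    y_pred = [0] * 100
    print(metrics(*confusion_matrix(y_true, y_pred)))
    # {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The toy evaluation shows why a full confusion matrix is useful on imbalanced data: a model that never predicts a cancellation still reaches 95% accuracy on a 5:95 split, while its recall on the cancelled class is zero.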


References

Bakitacos. (2025). Indonesia E-Commerce Sales and Shipping 2023-2025 [Data set]. Kaggle. https://www.kaggle.com/datasets/bakitacos/indonesia-e-commerce-sales-and-shipping-20232025

Blagus, R., & Lusa, L. (2013). SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics, 14(1), 106. https://doi.org/10.1186/1471-2105-14-106

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953

Davagdorj, K., Pham, V. H., Theera-Umpon, N., & Ryu, K. H. (2020). XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction. International Journal of Environmental Research and Public Health, 17(18), 6513. https://doi.org/10.3390/ijerph17186513

Dewi, A. C., Hermawan, A., & Avianto, D. (2024). Classification of Customers' Repeat Order Probability Using Decision Tree, Naïve Bayes and Random Forest. Pilar Nusa Mandiri: Journal of Computing and Information System, 20(1), 43–50. https://doi.org/10.33480/pilar.v20i1.5243

Gao, X., Wen, J., & Zhang, C. (2019). An Improved Random Forest Algorithm for Predicting Employee Turnover. Mathematical Problems in Engineering, 2019, 1–12. https://doi.org/10.1155/2019/4140707

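Hakim, R. N. S. (2023). Optimasi Algoritma Random Forest dengan Teknik Boosting dalam Prediksi Churn Pelanggan di Industri Telekomunikasi [Optimizing the Random Forest Algorithm with Boosting Techniques for Customer Churn Prediction in the Telecommunications Industry]. Jurnal Ilmiah Teknologi Informasi, 21(1), 45–56.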

Kaya, E., Dong, X., Suhara, Y., Balcisoy, S., Bozkaya, B., & Pentland, A. S. (2018). Behavioral attributes and financial churn prediction. EPJ Data Science, 7(1), 1–15. https://doi.org/10.1140/epjds/s13688-018-0165-5

Mohammadpour, S., Khedmati, M., & Zada, M. (2023). Classification of truck-involved crash severity: Dealing with missing, imbalanced, and high-dimensional safety data. PLOS ONE, 18(3), e0281901. https://doi.org/10.1371/journal.pone.0281901

Netayawijit, P. (2025). Interpretable Machine Learning Framework for Diabetes Prediction: Integrating SMOTE Balancing with SHAP Explainability for Clinical Decision Support. Healthcare, 13(20), 2588. https://doi.org/10.3390/healthcare13202588

Nugroho, A. (2025). The Influence of the Timeliness of Goods Delivery, The Speed of Goods Delivery Time, The Transparency of Goods Delivery Information on Customer Satisfaction and Company Performance Case Study at Posind Kendari Main Branch Office. Apcore Online Journal, 1(1), 160–165. https://doi.org/10.65232/xh5yav58

Rodan, A., Fayyoumi, A., Faris, H., Alsakran, J., & Al-Kadi, O. (2015). Negative correlation learning for customer churn prediction: A comparison study. The Scientific World Journal, 2015, 1–11. https://doi.org/10.1155/2015/347683

Starcke, J., Spadafora, J., Spadafora, P., & Toma, M. (2025). The Effect of Data Leakage and Feature Selection on Machine Learning Performance for Early Parkinson’s Disease Detection. Bioengineering, 12(8), 845. https://doi.org/10.3390/bioengineering12080845

Suryanto, S., & Martias, M. (2021). Komparasi metode K-NN, support vector machine dan random forest pada e-commerce Shopee [Comparison of K-NN, support vector machine, and random forest methods on Shopee e-commerce]. INSANtek, 2(1), 15–21.

Tan, T., Ngoc, K., Thanh, H., Thu, H., & Hoang, U. (2024). Enhancing Repurchase Intention on Digital Platforms Based on Shopping Well-Being Through Shopping Value, Trust and Impulsive Buying. SAGE Open, 14(3), 1–12. https://doi.org/10.1177/21582440241278454


Published

2026-04-09

How to Cite

Koten, F., Sudiarsa, I. W., Trisnawati, N. K., Pera, M. M. P., & Ndinin, M. A. (2026). PREDIKSI PEMBATALAN PESANAN E-COMMERCE INDONESIA MENGGUNAKAN RANDOM FOREST DAN SMOTE [Predicting e-commerce order cancellations in Indonesia using Random Forest and SMOTE]. Jurnal Manajemen Akuntansi Dan Ilmu Ekonomi, 3(1), 8–14. https://doi.org/10.70585/jumali.v3i1.192


Section

Articles