PREDICTING ORDER CANCELLATIONS IN INDONESIAN E-COMMERCE USING RANDOM FOREST AND SMOTE
DOI:
https://doi.org/10.70585/jumali.v3i1.192
Keywords:
Order Cancellation, E-Commerce, Random Forest, SMOTE, Consumer Behavior.
Abstract
High order cancellation rates pose a serious challenge to the operational efficiency and profitability of Indonesia's e-commerce industry. This study aims to identify the determinant variables that trigger cancellations and to build a machine learning-based predictive model. The study takes a quantitative, data mining approach using the Random Forest algorithm. The data come from the secondary Kaggle dataset "Indonesia E-Commerce Sales and Shipping 2023-2025", which contains 19,189 transaction records. The research stages comprise data preprocessing, applying SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance, model building, and evaluation with a confusion matrix. The results show that the Random Forest model predicts cancellations with a very high accuracy of 99.94%. The factors with the greatest influence on the cancellation decision are the payment method (particularly Online Payment and COD) and the transaction value. These findings carry managerial implications for platform providers: tightening payment verification and improving cost transparency to mitigate the risk of future cancellations.
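Two of the stages named in the abstract, SMOTE oversampling and confusion-matrix evaluation, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the toy feature rows, and the choice of k are hypothetical, and the Random Forest step is left out because it would normally come from a library. The sketch only shows the core SMOTE idea from Chawla et al. (2002), interpolating between a minority-class row and one of its nearest minority neighbours, and how accuracy and recall are read off a binary confusion matrix.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    sampled row and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = cancelled."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    return (tp + tn) / (tn + fp + fn + tp)


def recall(tn, fp, fn, tp):
    # Share of actual cancellations that the model caught.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical, already scaled minority rows: [transaction value, payment code].
cancelled_rows = [[0.2, 1.0], [0.3, 1.0], [0.8, 0.0], [0.7, 0.0]]
extra_rows = smote(cancelled_rows, n_new=6, k=2)
print(len(extra_rows), "synthetic cancelled rows generated")
```

In a pipeline of this kind, oversampling is usually applied to the training split only, so that synthetic rows do not leak into the data used for evaluation. Reporting recall alongside accuracy is a common choice when the cancelled class is rare.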
References
Bakitacos. (2025). Indonesia E-Commerce Sales and Shipping 2023-2025 [Data set]. Kaggle. https://www.kaggle.com/datasets/bakitacos/indonesia-e-commerce-sales-and-shipping-20232025
Blagus, R., & Lusa, L. (2013). SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics, 14(1), 106. https://doi.org/10.1186/1471-2105-14-106
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Davagdorj, K., Pham, V. H., Theera-Umpon, N., & Ryu, K. H. (2020). XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction. International Journal of Environmental Research and Public Health, 17(18), 6513. https://doi.org/10.3390/ijerph17186513
Dewi, A. C., Hermawan, A., & Avianto, D. (2024). Classification of Customers' Repeat Order Probability Using Decision Tree, Naïve Bayes and Random Forest. Pilar Nusa Mandiri: Journal of Computing and Information System, 20(1), 43–50. https://doi.org/10.33480/pilar.v20i1.5243
Gao, X., Wen, J., & Zhang, C. (2019). An Improved Random Forest Algorithm for Predicting Employee Turnover. Mathematical Problems in Engineering, 2019, 1–12. https://doi.org/10.1155/2019/4140707
Hakim, R. N. S. (2023). Optimasi Algoritma Random Forest dengan Teknik Boosting dalam Prediksi Churn Pelanggan di Industri Telekomunikasi. Jurnal Ilmiah Teknologi Informasi, 21(1), 45–56.
Kaya, E., Dong, X., Suhara, Y., Balcisoy, S., Bozkaya, B., & Pentland, A. S. (2018). Behavioral attributes and financial churn prediction. EPJ Data Science, 7(1), 1–15. https://doi.org/10.1140/epjds/s13688-018-0165-5
Mohammadpour, S., Khedmati, M., & Zada, M. (2023). Classification of truck-involved crash severity: Dealing with missing, imbalanced, and high-dimensional safety data. PLOS ONE, 18(3), e0281901. https://doi.org/10.1371/journal.pone.0281901
Netayawijit, P. (2025). Interpretable Machine Learning Framework for Diabetes Prediction: Integrating SMOTE Balancing with SHAP Explainability for Clinical Decision Support. Healthcare, 13(20), 2588. https://doi.org/10.3390/healthcare13202588
Nugroho, A. (2025). The Influence of the Timeliness of Goods Delivery, The Speed of Goods Delivery Time, The Transparency of Goods Delivery Information on Customer Satisfaction and Company Performance Case Study at Posind Kendari Main Branch Office. Apcore Online Journal, 1(1), 160–165. https://doi.org/10.65232/xh5yav58
Rodan, A., Fayyoumi, A., Faris, H., Alsakran, J., & Al-Kadi, O. (2015). Negative correlation learning for customer churn prediction: A comparison study. The Scientific World Journal, 2015, 1–11. https://doi.org/10.1155/2015/347683
Starcke, J., Spadafora, J., Spadafora, P., & Toma, M. (2025). The Effect of Data Leakage and Feature Selection on Machine Learning Performance for Early Parkinson’s Disease Detection. Bioengineering, 12(8), 845. https://doi.org/10.3390/bioengineering12080845
Suryanto, S., & Martias, M. (2021). Komparasi metode K-NN, support vector machine dan random forest pada e-commerce Shopee. INSANtek, 2(1), 15–21.
Tan, T., Ngoc, K., Thanh, H., Thu, H., & Hoang, U. (2024). Enhancing Repurchase Intention on Digital Platforms Based on Shopping Well-Being Through Shopping Value, Trust and Impulsive Buying. SAGE Open, 14(3), 1–12. https://doi.org/10.1177/21582440241278454
License
Copyright (c) 2026 Felixiana Koten, I Wayan Sudiarsa, Ni Komang Trisnawati, Magdalena Matildis Palo Pera, Maria Avilia Ndinin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.









