Predicting New Shopee User Retention Using Machine Learning

Wahyu Fajrin Mustafa, Syarif Hidayat, Dhomas Hatta Fudholi

Abstract


Shopee has become one of the leading e-commerce platforms connecting sellers with consumers, but keeping users active and engaged on the platform is increasingly difficult. User retention, a platform's ability to keep its users active over time, is a key factor in the long-term success of an e-commerce business. Understanding why users stay active or stop interacting with the platform requires analyzing variables such as user behavior, preferences, shopping experience, and interactions with the platform. This study develops a user retention prediction model from data on new Shopee users. Eight machine learning methods (Logistic Regression, Decision Tree, Gaussian Naive Bayes, Random Forest, K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), AdaBoost, and XGBoost) are applied to predict whether a user is retained within 14 days of registering on Shopee. Random Forest performs best, with an accuracy of 0.733677, precision of 0.702161, recall of 0.811626, and F1-score of 0.752936. Cross-validation gives similar values (accuracy 0.727626, precision 0.698143, recall 0.801884, F1-score 0.746328), which indicates that the model's performance is consistent. Recall is the highest of the Random Forest model's four metrics, indicating good sensitivity in identifying retained users. These results can inform Shopee's retention strategies for new users, an important factor in the growth and sustainability of an e-commerce business.
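The four metrics reported in the abstract are related through the confusion matrix, and the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of those definitions, not the authors' evaluation code: the function names and the confusion-matrix counts are hypothetical and chosen only for illustration, while the precision, recall, and F1 values are the Random Forest figures quoted in the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts.

    tp: retained users predicted as retained
    fp: non-retained users predicted as retained
    fn: retained users predicted as non-retained
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# Random Forest values as reported in the abstract.
reported = {"precision": 0.702161, "recall": 0.811626, "f1": 0.752936}
recomputed_f1 = f1_from(reported["precision"], reported["recall"])
print(f"reported F1={reported['f1']:.6f} recomputed F1={recomputed_f1:.6f}")
```

The recomputed F1 agrees with the reported test-set value to within rounding. The same check on the cross-validation figures gives a slightly different number than the reported 0.746328, which would be expected if each metric was averaged over folds separately, since the mean of per-fold F1-scores is not in general the F1 of the mean precision and mean recall.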

Keywords


Prediction; User Retention; Shopee; E-commerce; Machine Learning


DOI: https://doi.org/10.30865/mib.v8i1.7074


Copyright (c) 2024 JURNAL MEDIA INFORMATIKA BUDIDARMA

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.



JURNAL MEDIA INFORMATIKA BUDIDARMA
Universitas Budi Darma
Secretariat: Sisingamangaraja No. 338 Telp 061-7875998
Email: mib.stmikbd@gmail.com