Prediction of New Shopee User Retention Using Machine Learning
DOI:
https://doi.org/10.30865/mib.v8i1.7074

Keywords:
Prediction, User Retention, Shopee, E-commerce, Machine Learning

Abstract
Shopee has evolved into one of the leading e-commerce platforms connecting sellers with consumers. However, keeping users active and engaged on the platform has become increasingly challenging. User retention, a platform's ability to sustain and strengthen user activity over time, is a key factor in the long-term success of an e-commerce business. Understanding why users remain active or stop interacting with the platform requires analyzing variables such as user behavior, preferences, shopping experience, and platform interactions. This research develops a user retention prediction model using data on new Shopee users. Applying eight machine learning methods (Logistic Regression, Decision Tree, Gaussian Naive Bayes, Random Forest, K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), AdaBoost, and XGBoost), the study predicts whether a user remains active within 14 days of registering on Shopee. The results indicate that the Random Forest model performs best, with an Accuracy of 0.733677, Precision of 0.702161, Recall of 0.811626, and F1-Score of 0.752936. Cross-validation confirms the model's consistency, with an Accuracy of 0.727626, Precision of 0.698143, Recall of 0.801884, and F1-Score of 0.746328. The Random Forest model's high recall indicates good sensitivity in identifying retained users. These results therefore provide valuable insights for Shopee in developing retention strategies for new users, an important aspect of the growth and sustainability of its e-commerce business.
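As a minimal sketch of the modelling workflow described in the abstract, assuming a tabular dataset with numeric behavioural features and a binary retained label (the file path and column name "retained" are hypothetical, not taken from the paper), the following Python code trains a Random Forest classifier and reports the same evaluation metrics on a hold-out split and under 5-fold cross-validation:

```python
# Hypothetical sketch of the evaluation setup, not the authors' exact pipeline.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Assumed input: numeric behavioural features plus a 0/1 "retained" column
# indicating activity within 14 days of registration.
df = pd.read_csv("shopee_new_users.csv")
X = df.drop(columns=["retained"])
y = df["retained"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Hold-out metrics corresponding to those reported in the abstract.
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))

# 5-fold cross-validation to check the consistency of the hold-out results.
cv = cross_validate(
    model, X, y, cv=5, scoring=["accuracy", "precision", "recall", "f1"]
)
for metric in ["accuracy", "precision", "recall", "f1"]:
    print(f"CV {metric}: {cv['test_' + metric].mean():.4f}")
```

In this sketch, cross-validated scores that stay close to the hold-out scores, as reported in the abstract, suggest the model's performance is stable across splits rather than an artifact of one particular train/test partition.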