Comparative Analysis Performance of Naïve Bayes and K-NN Using Confusion Matrix and AUC To Predict Insurance Fraud

Authors

  • Gandung Triyono, Budi Luhur University, Jakarta
  • Dermawan Ginting, Budi Luhur University, Jakarta

DOI:

https://doi.org/10.30865/mib.v6i4.4836

Keywords:

Insurance Claim Fraud, Naïve Bayes, K-NN, Confusion Matrix, AUC

Abstract

Claim submission data from 2019 to 2021 show that the percentage of claims in one province is much higher than in the other provinces. During that period, the percentage of claims in that province reached 22%, while the highest percentage in any other province was only 6%. Claim fraud is suspected in that province. The fraud allegedly begins when a customer applies for a policy for an elderly insured with a low sum insured, so that the premium is also low. The insured's health at that time may be poor, but this is not disclosed in the insurance application letter. To increase the sum insured, additional coverage is usually added to the policy. Fraudulent claims cause large losses for the insurance company, which has to pay claims it should not have to pay, so the company needs a mechanism to prevent them. This research aims to find the best method for predicting potential insurance claim fraud early, at the time customers apply for policy issuance, so that additional checks can be carried out on suspected submissions. The initial data available for this research consist of 14,778 claim records with the following attributes: date of claim submission, policy effective date, sum assured, type of claim, cause of claim, province, and fraud. To identify the best method in terms of accuracy and performance, two methods, Naïve Bayes and K-NN, are compared. Both are applied with a training-to-testing data ratio of 80:20, and several combinations were tried for each method. Measuring the accuracy and performance of the two methods with the Confusion Matrix and AUC, the study concludes that the better method is Naïve Bayes, with an accuracy of 90% and an AUC of 0.761. The attributes used are province, sum assured, additional coverage, and whether the insured is the policyholder.
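The two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the labels and scores below are invented toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical. It only shows how accuracy is read off a confusion matrix and how AUC can be computed as the fraction of fraud/non-fraud pairs in which the fraud case receives the higher score.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, where 1 means fraud."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def auc(y_true, scores):
    """AUC as the probability that a random fraud case outscores a
    random non-fraud case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration, not taken from the study).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.5, 0.05, 0.15]

# Assumed decision threshold of 0.5 turns scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("AUC:", auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC is computed from the raw scores and so summarizes ranking quality across all thresholds. That is one reason a study might report both.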

Published

2022-10-25