Comparative Analysis Performance of Naïve Bayes and K-NN Using Confusion Matrix and AUC To Predict Insurance Fraud

Gandung Triyono; Dermawan Ginting

doi:10.30865/mib.v6i4.4836

Authors

Gandung Triyono Budi Luhur University, Jakarta
Dermawan Ginting Budi Luhur University, Jakarta

DOI:

https://doi.org/10.30865/mib.v6i4.4836

Keywords:

Insurance Claim Fraud, Naïve Bayes, K-NN, Confusion Matrix, AUC

Abstract

Claim submission data from 2019 to 2021 show that the percentage of claims in one province is much higher than in other provinces. During that period, the percentage of claims in that province reached 22%, while the highest percentage in any other province was only 6%. Claim fraud is therefore suspected in that province. The fraud allegedly starts when a customer applies for a policy for an elderly insured with a low sum insured, so that the premium is also low. The insured's health condition at that time may be poor, but this is not disclosed in the insurance application letter. To increase the sum insured, additional coverage is usually added to the policy. Fraudulent claims cause large losses for the insurance company, since it has to pay claims that it should not pay. Insurance companies need a mechanism to prevent fraudulent claims. This research aims to find the best method for predicting potential insurance claim fraud early, when customers apply for policy issuance, so that additional checks can be carried out on suspect applications. The initial data available for this research are 14,778 claim records with the following attributes: date of claim submission, policy effective date, sum assured, type of claim, cause of claim, province, and fraud. To identify the best method in terms of accuracy and performance, two methods (Naïve Bayes and K-NN) are compared. Both methods are applied with a training-to-testing data ratio of 80:20, and several combinations were tested for each method. Using the Confusion Matrix and AUC to measure the accuracy and performance of the two methods, it can be concluded that the best one is Naïve Bayes, with an accuracy of 90% and an AUC of 0.761. The attributes used are province, sum assured, additional coverage, and whether the insured is the policy holder.
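The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`confusion_matrix`, `accuracy`, `auc`) and the ten toy records are assumptions made up for illustration, and they do not come from the 14,778 claim records used in the study. The sketch assumes binary labels (1 = fraud, 0 = not fraud) and a classifier that outputs a fraud score per record.

```python
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives (1 = fraud, 0 = not fraud)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def accuracy(cm: Dict[str, int]) -> float:
    """Share of records classified correctly."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen fraud record is scored
    higher than a randomly chosen non-fraud record (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one record of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores for ten records.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.5, 0.05, 0.7]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm), auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC is computed from the scores alone. This is one reason the two measures are reported together when comparing classifiers such as Naïve Bayes and K-NN.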

