Comparative Analysis Performance of Naïve Bayes and K-NN Using Confusion Matrix and AUC To Predict Insurance Fraud
DOI:
https://doi.org/10.30865/mib.v6i4.4836
Keywords:
Insurance Claim Fraud, Naïve Bayes, K-NN, Confusion Matrix, AUC
Abstract
Claim submission data from 2019 to 2021 show that the percentage of claims in one province is much higher than in other provinces. During that period, the percentage of claims in that province reached 22%, while the highest percentage in any other province was only 6%. Claim fraud is therefore suspected in that province. The fraud allegedly starts when a customer applies for a policy for an elderly insured with a low sum insured, so that the premium is also low. The insured's health at that time may be poor, but this is not disclosed in the insurance application letter. To increase the sum insured, additional coverage is usually added to the policy. Fraudulent claims cause large losses for an insurance company, because the company has to pay claims that it should not have to pay. Insurance companies therefore need a mechanism to prevent fraudulent claims. This research aims to find the best method for predicting potential insurance claim fraud early, when customers apply for policy issuance, so that additional checks can be carried out on suspicious applications. The initial data available for this research consist of 14,778 claim records with the following attributes: date of claim submission, policy effective date, sum assured, type of claim, cause of claim, province, and fraud. To find the best method in terms of accuracy and performance, two methods (Naïve Bayes and K-NN) are compared. Both methods are applied with an 80:20 ratio of training data to testing data, and several combinations were tested for each method. Using the confusion matrix and AUC to measure the accuracy and performance of the two methods, the best method is Naïve Bayes, with an accuracy of 90% and an AUC of 0.761. The attributes used are province, sum assured, additional coverage, and whether the insured is the policy holder.
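To make the two evaluation measures named above concrete, the following is a minimal sketch in plain Python of how accuracy can be derived from a confusion matrix and how AUC can be computed from classifier scores. It is not the authors' implementation: the function names, the 0.5 decision threshold, and the ten-record toy data are all assumptions made for illustration, and the numbers bear no relation to the 14,778 claim records or the reported results.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 as fraud and 0 as non-fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fraud record is scored higher
    than a random non-fraud record; tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one record of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true fraud labels and fraud scores from some classifier.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.05, 0.7, 0.15]

# Assumed decision threshold of 0.5 to turn scores into predicted labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("AUC:", auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the two measures are often reported together when fraud cases are a minority of the records.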
This work is licensed under a Creative Commons Attribution 4.0 International License