Tourist Places Recommender System Using Cosine Similarity and Singular Value Decomposition Methods

Tourism in the city of Bandung has diverse potential in culture, regional specialties, buildings, and other tourist attractions. The TripAdvisor site hosts many reviews from users who have visited tourist attractions in Bandung, and these reviews are an important element for analysis. We analyze them with rule-based sentiment analysis, using vaderSentiment to weight the positive and negative values of each review. The negative value is subtracted from the positive value to obtain a compound value, which is then converted to a rating value. The rating values are processed using the Cosine Similarity and Singular Value Decomposition methods to obtain recommendations for tourist attractions in Bandung. We use the Root Mean Square Error (RMSE) to measure the accuracy of the predicted values. The Cosine Similarity method yields an RMSE of 3.489, while the Singular Value Decomposition method yields 1.231, which is smaller by 2.258.


INTRODUCTION
Tourism is travel to visit tourist sites for recreation or other specific purposes. Indonesian tourism data show that the number of tourists visiting attractions increases every year [1]. In 2019, the Indonesian Central Statistics Agency reported that 47.61% of tourists visited attractions for entertainment. The number of tourists in 2019 reached 282 million, including both domestic and foreign tourists. Tourism is developing not only across Indonesia as a whole but also in West Java, across various tourism sectors.
West Java has 496 tourist attractions in the form of cultural tourism, nature tourism, building tourism, culinary tourism, and special interest tourism. According to the Department of Tourism and Culture, West Java has great natural potential, with 303 natural attractions [2]. As the capital of West Java, the city of Bandung has high physical and cultural appeal. Its culture is diverse and includes traditional musical instruments, regional languages, regional dances, and special foods. From a physical point of view, Bandung has old Dutch buildings, many of which are still well preserved. On September 25, 2013, UNESCO designated Bandung as one of the world's tourist destination cities [3]. This is reflected in the city's development in transportation, buildings, tourist attractions, and culinary offerings. Because this development is quite rapid, many tourist destinations are on offer and new attractions continue to emerge, from nature to culinary tourism. As a result, more tourists visit every year. The growth in visiting tourists can be seen in Figure 1.
The development of tourism has not been matched by the information available about tourist attractions. Under these conditions, users are faced with a choice limited to attractions that have existed for a long time and are already popular. Many new attractions are not yet known to tourists because of a lack of information and of reviews from other users. TripAdvisor is a travel site that lists many tourist destinations. The attractions listed on TripAdvisor are quite diverse, such as indoor and outdoor attractions, historical buildings, special foods, and much more. Users who have visited a particular attraction can post a review on the site. These reviews have important marketing value, and their quality reflects the level of user satisfaction [4].
Given these problems, a system is needed that can provide recommendations to users choosing tourist attractions. In this study we do two things: convert user reviews into rating values by calculating polarity values, and generate recommendations for tourist attractions using Cosine Similarity and the Surprise matrix factorization implementation of Singular Value Decomposition [5], [6].
Matrix Factorization is an approach often used in recommender systems. Here it is positioned as a preprocessing step that fills in empty ratings using the characteristics of the factorized matrix. The next stage looks for similarities using Cosine Similarity, selects neighbors with similar values, and predicts the weighted rating. Items whose resulting weight is small cannot serve as appropriate recommendations, and items with a low rating are not recommended [7], [8].
Surprise is a Python library for matrix factorization that supports the Singular Value Decomposition method and lets us explore similarities between items in the dataset being used. It supports model evaluation with cross-validation iterators and scikit-learn style metrics, as well as model selection with automated hyperparameter search, namely GridSearch. We use Surprise for the Singular Value Decomposition method because it executes quickly when exploring recommendation results [9]-[11].
The purpose of this study is to convert user reviews into rating values and then develop a system that provides the best recommendations for users, taking advantage of computing systems that assist in data processing. The system recommends tourist attractions based on user reviews on the TripAdvisor site. We compare two different methods to show how user reviews, once converted to rating values, can be used to generate recommendations. The Cosine Similarity and Singular Value Decomposition methods are compared by their accuracy as measured by the Root Mean Square Error (RMSE). RMSE has been used as a standard statistical metric for measuring model error in meteorological research studies [16].

System Design
In this study, we process user reviews into predicted rating values at the pre-processing stage. The predicted ratings are used to recommend tourist attractions using the Cosine Similarity and Singular Value Decomposition methods. The results of both methods are evaluated using the Root Mean Square Error and then compared. An overview of the system is shown in Figure 2.

Dataset
In this study, we use data from TripAdvisor containing 2,872 records. Each record includes details of a tourist attraction in the city of Bandung (tour id, tourist place name, address, category), the reviewer id, reviewer name, review title, and the review comment given by the user for that attraction. The data is then processed by predicting the rating and selecting the columns that will be used in the next process. The form of the dataset is shown in Table 1.

Rating Prediction Using Vader
In this study, we use Vader (Valence Aware Dictionary and Sentiment Reasoner) in the rating prediction process.
Vader is a rule-based sentiment analysis tool attuned to sentiments expressed in social media [12]. The rating prediction takes its input from review_tempatwisata.csv, using the columns id_wisata, id_user, and review.
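As a minimal sketch of this scoring step, the snippet below computes the positive-minus-negative value that this study describes. It relies only on the `polarity_scores` method that vaderSentiment's `SentimentIntensityAnalyzer` exposes; the function name `review_polarity` and the `FixedScores` stand-in are hypothetical and exist only so the call shape can be shown without the library installed. Note that this follows the study's description rather than VADER's own `compound` field.

```python
from typing import Dict, Protocol


class PolarityAnalyzer(Protocol):
    """Anything exposing a VADER-style polarity_scores(text) -> dict method."""

    def polarity_scores(self, text: str) -> Dict[str, float]: ...


def review_polarity(analyzer: PolarityAnalyzer, text: str) -> float:
    """Return the positive score minus the negative score for one review."""
    scores = analyzer.polarity_scores(text)
    return scores["pos"] - scores["neg"]


# With vaderSentiment installed, the real analyzer can be passed in directly:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   review_polarity(SentimentIntensityAnalyzer(), "A lovely, relaxing hot spring")


class FixedScores:
    """Hypothetical stand-in that returns made-up scores for illustration."""

    def polarity_scores(self, text: str) -> Dict[str, float]:
        return {"neg": 0.1, "neu": 0.5, "pos": 0.4, "compound": 0.6}


demo_value = review_polarity(FixedScores(), "any review text")
```

Passing the analyzer in as an argument keeps the scoring rule separate from the sentiment library, so the same rule could be reused if the lexicon is swapped, for example for Indonesian-language reviews.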

Cosine Similarity
Cosine Similarity is a method for calculating a similarity value between data items, which can then be compared to determine their level of similarity [13], [14]. Recommendations are produced by finding the items with the highest similarity values. One advantage of this measure is that it normalizes for vector length, which reduces the effect of vectors that contain a large amount of data. For two vectors $A$ and $B$, the similarity is computed as:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$$

Here $A \cdot B$ is the dot product of $A$ and $B$, calculated as $\sum_{i} A_i B_i$; $\|A\|$ is the length of vector $A$, calculated as $\sqrt{\sum_{i} A_i^2}$; and $\|B\|$ is the length of vector $B$, calculated as $\sqrt{\sum_{i} B_i^2}$.
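The calculation described above can be sketched in a few lines of plain Python. The function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions made for this illustration, not details taken from the study.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a vector with no ratings is treated as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score 1 regardless of their length.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
# Vectors with no shared non-zero entries score 0.
orthogonal = cosine_similarity([1, 0], [0, 1])
```

The first call illustrates the length normalization mentioned above: doubling every rating in a vector does not change its similarity to other vectors.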

Singular Value Decomposition
Singular Value Decomposition factors a matrix into two orthogonal matrices and a diagonal matrix containing the singular values [10]. The parameters underlying the Singular Value Decomposition model include item factors, which use rank values to predict the relationship between items [15]. The method is applied in a wide variety of systems, including dimension reduction, computer vision, and signal processing. The decomposition is:

$$R = U \Sigma V^{T}$$

Here $R$ is the rating matrix of the users, $U$ is the user feature matrix, $V$ is the item feature matrix, and $\Sigma$ is the diagonal matrix of singular values. $U$ and $V$ are orthogonal matrices, and the features express how relevant each user is to each tourist spot.

Comparison Using RMSE
Root Mean Square Error (RMSE) measures the level of error by calculating the difference between the values predicted by a model and the actual values; it is a statistical metric that is also used to measure model performance [16], [17]. The smaller the RMSE value (the closer to 0), the more accurate the predictions. It is computed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, and $n$ is the number of predictions.
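To make the metric concrete, here is a small sketch of the standard RMSE definition in plain Python. The function name `rmse` and the toy rating values are hypothetical and are not data from this study.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root of the mean squared difference between predictions and truth."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy ratings: the errors are 1, 0 and 2, so the mean squared error is 5/3.
example_error = rmse([3.0, 4.0, 5.0], [2.0, 4.0, 3.0])
```

Because the errors are squared before averaging, a single large miss (the error of 2 above) contributes more than several small ones, which is why RMSE penalizes badly wrong rating predictions heavily.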

Data Pre-Processing
The first step in our research is to process user review data into rating values. This step converts the information in the review text into rating data. The resulting ratings are then processed using the Cosine Similarity and Singular Value Decomposition methods. The pre-processing steps are removing stop words and analyzing the cleaned text to obtain the rating value.

Remove Stopword
Stopword removal is the process of removing characters and words that are unnecessary for the analysis, namely those that are considered uninformative or that appear very frequently. Examples of review text are shown below, the second both before and after removal.

Review excerpt: If you are looking for a soothing therapeutic foot and body massage by and in a hot spring this is for you. At a cost of less than 5usd you can get a good massage by some experienced masseurs readily available in the area If you are looking for a dip in the hot spring

Before: Geology museum in Bandung educate my self and kids about the journey of life, our ancient parents 'homo erectus paleojavanicus' and also some bones from the wild which life here in Java Island, around the museum there were a lot of culinary places and merchandise, and a store

After: Geology museum in Bandung educate my self and kids about the journey of life our ancient parents homo erectus paleojavanicus and also some bones from the wild which life here in Java Island around the museum there were a lot of culinary places and merchandise and a store

The resulting data is then processed into rating predictions. SentimentIntensityAnalyzer from vaderSentiment provides sentiment scores as output: positive, negative, and neutral values together with a compound value. These values are processed using the Python library, and only the compound score is extracted for each user opinion. The compound score is based on the sum of the positive, negative, and neutral scores, which are given in Table 3.
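The cleaning step can be sketched as follows. Judging from the before and after example, the step mainly strips punctuation, so that is what this sketch does by default; the function name `clean_review` and the optional `stopwords` argument are assumptions added for illustration, since the study does not list its stopword set.

```python
import string

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_review(text: str, stopwords: frozenset = frozenset()) -> str:
    """Strip punctuation, drop optional stopwords, collapse whitespace."""
    no_punct = text.translate(_PUNCT_TO_SPACE)
    # split() with no argument also collapses runs of spaces.
    kept = [word for word in no_punct.split() if word.lower() not in stopwords]
    return " ".join(kept)


cleaned = clean_review(
    "the journey of life, our ancient parents 'homo erectus paleojavanicus'"
)
without_stopwords = clean_review(
    "The museum, and a store", frozenset({"the", "a", "and"})
)
```

One design consideration: VADER's rules use cues such as exclamation marks, so stripping punctuation before scoring may change the sentiment scores compared with scoring the raw text.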

Rating Result
The previous step, Data Pre-Processing, converts user reviews into positive and negative values using the Python library (shown in Table 2). The rating value is determined by subtracting the negative value from the positive value, so a review whose positive value outweighs its negative value receives a higher rating. The result is then converted to a scale from 0 to 5, as shown in Table 4.

     id_wisata   id_user     rating
0    1           1118914     5
1    1           1925512     2
2    1           5618425     3
3    1           9144181     2
4    1           13212611    3

The assignment of a rating value on the 0 to 5 scale follows the sector division defined in the algorithm. The rating range is divided into 6 sectors, one for each value of the required rating scale. The generated rating values are then stored in a new data file containing the required columns: id_user, id_wisata, and rating. Columns are selected according to how important their information is, because the data will be reused in the next process.
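A minimal sketch of the sector division is given below. The study does not state the sector boundaries, so equal-width sectors over the range -1 to 1 are an assumption here, as is the function name `polarity_to_rating`.

```python
def polarity_to_rating(polarity: float, levels: int = 6) -> int:
    """Map a polarity in [-1, 1] to an integer rating in 0..levels-1.

    Assumption: the range is cut into `levels` equal-width sectors.
    """
    clipped = max(-1.0, min(1.0, polarity))
    # Shift to [0, 2], then scale so each sector has width 1.
    sector = int((clipped + 1.0) * levels / 2.0)
    # A polarity of exactly 1.0 would land on `levels`; keep it in the top sector.
    return min(sector, levels - 1)


ratings = [polarity_to_rating(p) for p in (-1.0, -0.5, 0.0, 0.5, 1.0)]
```

With six equal sectors, a neutral review (polarity 0.0) falls on the boundary between the third and fourth sectors and is assigned to the upper one; a different boundary rule would shift such cases down by one.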

Cosine Similarity
The similarity between items is calculated as the cosine of the angle between two variables or vectors. Each pair of vectors is compared, and a predicted rank is assigned based on the cosine similarity for each category. Before the matrix values are calculated, the data is divided into training data and testing data, with 80% used for training and 20% for testing. The amount of training data used can affect the values obtained on the testing data.

Matrix Factorization -Cosine Similarity
The Matrix Factorization technique is applied to the training data to calculate the matrix values for items with neighboring similarity values [7]. The factorized matrix values are then used to derive a ranking pattern [8]. The algorithm, which calculates the matrix values, can be seen in Table 5. The generalized cosine similarity is computed both against the values in data matrix B (cosine similarity of matrix A vs. B) and against the matrix itself (cosine similarity of matrix A vs. A). After the Matrix Factorization step, the Cosine Similarity method is used to find the set of items that are similar to their neighbors. Vector values cannot hold a negative rating, and items that do not have a rating are normalized to a rank of zero. The ranking is shown in Table 6.
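The neighbor-based prediction described above can be illustrated with a small item-to-item sketch. This is not the algorithm of Table 5; it is a generic similarity-weighted average under stated assumptions: rows are users, columns are tourist places, a zero cell means "not rated", and the names `item_similarity` and `predict` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def item_similarity(ratings: Matrix) -> Matrix:
    """Cosine similarity between item columns; unrated cells count as zero."""
    n_items = len(ratings[0])
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def predict(ratings: Matrix, sim: Matrix, user: int, item: int) -> float:
    """Similarity-weighted average of the user's ratings on other items."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, rating in enumerate(ratings[user]):
        if other != item and rating > 0:  # skip unrated (zero) cells
            weighted_sum += sim[item][other] * rating
            weight_total += sim[item][other]
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Toy matrix: 3 users x 3 places; user 0 has not rated place 2.
toy = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
]
similarity = item_similarity(toy)
guess = predict(toy, similarity, user=0, item=2)
```

Treating a zero as "not rated" follows the zero normalization described above, but it means a genuine rating of 0 on the 0 to 5 scale cannot be distinguished from a missing one in this sketch.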

Output -Cosine Similarity
The result of the Cosine Similarity method is a list of recommended tourist attractions, with predicted values, for users, based on the dataset we currently have. Sample results can be seen in Table 7.

Singular Value Decomposition
The first stage in this method is dividing the data into two parts: 80% training data and 20% testing data. The amount of training data used can affect the values obtained on the testing data. The Singular Value Decomposition process then consists of tuning hyperparameters, fitting the model on the training data, and testing the resulting model on the test data. The test results are compared with those of the Cosine Similarity method. The optimal parameters for the Singular Value Decomposition method are found using Grid Search. Grid Search divides the range of each parameter into a grid and traverses all points on the grid to find the optimal parameter values; it gives an accurate solution when the grid spacing is small and can find the global optimum when the search interval is sufficiently wide [18].
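The traversal of all grid points can be sketched with the standard library alone. The function `grid_search` and the error surface `toy_error` are hypothetical stand-ins; in practice the score would be a cross-validated RMSE, and Surprise offers this search through `GridSearchCV` in `surprise.model_selection`. The parameter names `lr_all` and `reg_all` are those of Surprise's `SVD`, while the grid values are made up.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple


def grid_search(
    param_grid: Dict[str, Sequence[float]],
    score: Callable[[Dict[str, float]], float],
) -> Tuple[Dict[str, float], float]:
    """Evaluate every grid point and return the lowest-scoring parameters."""
    names = list(param_grid)
    best_params: Dict[str, float] = {}
    best_score = float("inf")
    # itertools.product enumerates every combination of parameter values.
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_error(params: Dict[str, float]) -> float:
    """Hypothetical error surface standing in for a cross-validated RMSE."""
    return (params["lr_all"] - 0.005) ** 2 + (params["reg_all"] - 0.1) ** 2


grid = {"lr_all": [0.002, 0.005, 0.01], "reg_all": [0.02, 0.1, 0.4]}
best, err = grid_search(grid, toy_error)
```

The cost grows with the product of the grid sizes (nine evaluations here), which is the trade-off behind choosing a fine spacing versus a wide interval.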

Surprise-Singular Value Decomposition
After the data is split, it is used for modeling. The Python Surprise library provides a Singular Value Decomposition function. Surprise is useful for data exploration that requires predictions to be generated quickly. The algorithm can be seen in Table 8.
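For readers who want to see what such a model does internally, the sketch below trains a biased matrix factorization with stochastic gradient descent, which is the model the Surprise documentation describes for its `SVD` class. It is not a call into Surprise and not the algorithm of Table 8; the function name `train_mf`, the hyperparameter values, and the toy ratings are all assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def train_mf(
    ratings: List[Rating],
    n_users: int,
    n_items: int,
    n_factors: int = 2,
    n_epochs: int = 200,
    lr: float = 0.01,
    reg: float = 0.02,
    seed: int = 0,
) -> Callable[[int, int], float]:
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD; return the predictor."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    p = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

    def predict(u: int, i: int) -> float:
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict


def rmse_on(data: List[Rating], predictor: Callable[[int, int], float]) -> float:
    return math.sqrt(sum((r - predictor(u, i)) ** 2 for u, i, r in data) / len(data))


# Toy (user, place, rating) triples, not taken from the study's dataset.
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (1, 2, 1), (2, 0, 1), (2, 2, 5)]
model = train_mf(toy, n_users=3, n_items=3)
mean_rating = sum(r for _, _, r in toy) / len(toy)
baseline_rmse = rmse_on(toy, lambda u, i: mean_rating)
train_rmse = rmse_on(toy, model)
```

Comparing `train_rmse` with `baseline_rmse` shows how much the learned biases and factors improve on always predicting the mean rating; a held-out 20% split, as used in this study, would be needed to judge generalization.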

Output-Singular Value Decomposition
The result of the Singular Value Decomposition method is a list of recommended tourist attractions, with predicted values, for users, based on the dataset used. Sample results can be seen in Table 9.

Comparison using Root Mean Square Error
The evaluation in this study uses the Root Mean Square Error (RMSE). RMSE is an error metric that indicates how good a numerical prediction is and allows the prediction errors of different models, or of different configurations of one model, to be compared. The algorithm can be seen in Table 10. The evaluation results of the two methods were then compared: the Cosine Similarity method obtains an RMSE of 3.489 and the Singular Value Decomposition method obtains an RMSE of 1.231, as shown in Figure 3.

CONCLUSION
This study developed a system that recommends tourist attractions, using tourism in the city of Bandung as the dataset. The RMSE was used to evaluate the Cosine Similarity method with Matrix Factorization and the Singular Value Decomposition method implemented with the Python Surprise library. The evaluation value obtained by the Cosine Similarity method is 3.489 and that of the SVD method is 1.231.
The evaluation value of the Cosine Similarity method is therefore greater than that of the Singular Value Decomposition method. Because a smaller RMSE indicates better accuracy, this shows that the Singular Value Decomposition method is more accurate than the Cosine Similarity method. Suggestions for further research are to use user reviews written in Indonesian and to expand stopword removal. Further work could also explore parameters that yield more accurate results and faster methods for processing the data.