Application of Data Mining to Predict Procurement of Office Stationery in Al-Ikhwan Middle School using Naïve Bayes Method

Office stationery (ATK) is now a necessity for almost everyone, especially for companies and educational institutions. Stationery often has to be bought unexpectedly: institutions discover that an item is out of stock only when it is needed, and work in the company or educational institution is then not completed on time. One way to be more efficient is to apply data mining to predict the purchase of office stationery (ATK), in this study at AL IKHWAN Middle School in Tanjung Morawa. The Naive Bayes method is used to recognize patterns in the data and to predict the purchase of office stationery (ATK) at the school. The data required are last month's stationery purchase records, used as test data and counted from the date of first purchase until the stationery runs out. The result of this study is a prediction of whether office stationery (ATK) at AL IKHWAN Middle School in Tanjung Morawa needs to be bought again or can still be used for a long time. Based on the amount of out-of-stock data, when more than four types of stationery are running short, purchasing new stationery is considered feasible.


INTRODUCTION
Data mining is a series of processes for obtaining useful information from large data warehouses. It can also be interpreted as extracting new information from large volumes of data to help in decision making. Data mining offers many techniques, including the naïve Bayes algorithm, decision trees, artificial neural networks and many others. Prediction is the process of systematically estimating what is most likely to occur in the future based on the past and present information available, so that the error (the difference between what actually happens and the estimate) can be minimized. Predictions do not have to give definitive answers about events that will occur, but try to find answers that are as close as possible to what will occur. To use the data in an information system to support decision making, relying on operational data alone is not enough; data analysis is needed to explore the potential of the existing information. Decision makers try to use the data warehouse they already own to dig up information that helps them make decisions. Office stationery (ATK) has now become a need of almost everyone, especially for corporate agencies and educational institutions.
Stationery often has to be bought unexpectedly. Institutions discover that an item has run out only when it is needed, so work in company agencies or educational institutions is often not completed on time. One way to be more efficient is to apply data mining techniques to predict the purchase of office stationery (ATK) at company agencies or educational institutions, in this study at AL-IKHWAN Middle School, Tanjung Morawa, so that the large amount of available data can be used optimally.
Previous research was not yet optimal and still needs improvement. No schedule was set for reconciling the stock available in the warehouse with the stock recorded on the computer, which can produce differences in the inventory count. In addition, because office stationery consists of small objects that are easily moved, it is not easy to ascertain how much expenditure must be borne within a given period. A schedule is therefore needed for checking the inventory recorded on the computer against what is available in the warehouse, so as to minimize differences in the recorded inventory.
The Naive Bayes algorithm is one of the algorithms used in classification. It is a classification method based on probability and statistics, introduced by the British scientist Thomas Bayes, that predicts future probabilities from past experience, which is why it is known as the Bayes Theorem. The theorem is combined with the "naive" assumption that the attributes are mutually independent. Naive Bayes classification assumes that the presence or absence of a particular characteristic of a class is unrelated to the presence or absence of other characteristics [1].

The IJICS | Aida Sopia | http://ejurnal.stmik-budidarma.ac.id/index.php/ijics

Data Mining
Data mining is the process of obtaining useful information from large data warehouses. It can also be interpreted as the extraction of new information from large volumes of data to help in making decisions. The term data mining is sometimes also called knowledge discovery. One data mining technique is to explore existing data to build a model, and then use the model to recognize patterns in other data that are not in the stored database. Prediction tasks can also use this technique. Data mining can also group data, with the aim of discovering the general patterns in the existing data. Anomalies in transaction data also need to be detected so that the appropriate follow-up can be decided. All of these aim to support the company's operational activities so that the company's ultimate goals can be achieved.

Naïve Bayes
Naïve Bayes is a simple probabilistic prediction technique based on applying the Bayes theorem (Bayes rule) with a strong (naive) assumption of independence. In other words, the model used in Naive Bayes is the independent feature model. Strong independence here means that a feature of a data item is not related to the presence or absence of other features in the same data item. In terms of classification, the hypothesis in the Bayes theorem is the class label that is the target of the mapping, while the evidence is the set of features given as input to the classification model. If X is an input vector that contains the features and Y is a class label, Naïve Bayes is written as P(Y | X). The notation denotes the probability of class label Y after the features X have been observed. It is also called the final probability (posterior probability) of Y, while P(Y) is called the initial probability (prior probability) of Y.
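Written out for a feature vector X = (X_1, ..., X_n), where the index n is notation introduced here for illustration, the Bayes theorem and the naive independence assumption can be stated as the following sketch:

```latex
P(Y \mid X) = \frac{P(Y)\, P(X \mid Y)}{P(X)},
\qquad
P(X \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)
```

Because P(X) is the same for every class, comparing classes only requires the numerator P(Y) multiplied by the product of the per-feature terms.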
During training, the posterior probability P(Y | X) must be learned for each combination of X and Y from the information in the training data. Once the model has been built, a test record X' can be classified by finding the value Y' that maximizes P(Y' | X').
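The training and classification steps described above can be sketched in Python for categorical features. This is a minimal illustration, not the study's implementation: the function names, the two features (stock level and usage rate) and the toy records are all hypothetical, and no smoothing is applied to zero counts.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Estimate P(Y) and the counts behind P(X_i = v | Y) by relative frequency."""
    class_counts = Counter(labels)
    priors = {y: c / len(labels) for y, c in class_counts.items()}
    # value_counts[(i, y)][v] = number of class-y rows whose feature i equals v
    value_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
    return priors, value_counts, class_counts


def posterior_scores(x, priors, value_counts, class_counts):
    """Return P(Y) * prod_i P(X_i | Y) for every class (unnormalised posterior)."""
    scores = {}
    for y, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= value_counts[(i, y)][v] / class_counts[y]
        scores[y] = score
    return scores


def predict(x, priors, value_counts, class_counts):
    """Pick the class label with the largest unnormalised posterior."""
    scores = posterior_scores(x, priors, value_counts, class_counts)
    return max(scores, key=scores.get)


# Hypothetical toy records: (stock level, usage rate) -> label
rows = [("low", "high"), ("low", "low"), ("high", "low"),
        ("high", "high"), ("low", "high")]
labels = ["add", "add", "keep", "keep", "add"]

model = train(rows, labels)
print(predict(("low", "low"), *model))
```

The score is left unnormalised because dividing by P(X) does not change which class attains the maximum.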
Features of numeric (continuous) type, however, need special treatment before they can be used in Naïve Bayes. One approach is to discretize each continuous feature and replace its value with a discrete interval. This transforms continuous features into ordinal features.
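Discretization of this kind can be sketched as follows. The cut points, interval names and sample quantities are assumptions chosen for illustration only; the source does not state which intervals, if any, were used.

```python
from bisect import bisect_right


def discretize(value, cut_points, names):
    """Map a numeric value to the name of the interval it falls in.

    cut_points must be sorted, and len(names) must equal len(cut_points) + 1.
    A value equal to a cut point is placed in the upper interval.
    """
    return names[bisect_right(cut_points, value)]


# Hypothetical cut points for a monthly quantity (not taken from the study)
CUTS = [5, 20]
NAMES = ["low", "medium", "high"]

quantities = [2, 5, 9, 24, 60]
print([discretize(q, CUTS, NAMES) for q in quantities])
```

Once every continuous value has been replaced by an interval name, the feature can be handled like any other categorical feature.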
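Another approach is to assume a specific form of probability distribution for each continuous feature and estimate the distribution parameters from the training data. The Gaussian distribution is usually chosen to represent the conditional probability of a continuous feature in a class, P(Xi | Y). It is characterized by two parameters: the mean, µ, and the variance, σ². For each class yj, the conditional probability for the feature Xi is:
$$P(X_i = x_i \mid Y = y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \exp\left(-\frac{(x_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$
where µij and σij² are the mean and variance of feature Xi over the training records of class yj.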
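The Gaussian class-conditional probability described above can be sketched in Python. The function name and argument names are illustrative; the function takes the variance, not the standard deviation.

```python
import math


def gaussian_likelihood(x, mean, variance):
    """Density of a Gaussian with the given mean and variance, evaluated at x.

    In Gaussian Naive Bayes this value plays the role of P(X_i = x | Y = y_j),
    with mean and variance estimated from the training records of class y_j.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coefficient * math.exp(-((x - mean) ** 2) / (2.0 * variance))


# Standard normal density at its mean, as a sanity check
print(gaussian_likelihood(0.0, 0.0, 1.0))
```

The result is a density rather than a true probability, so it can exceed 1 for small variances. This does not affect the comparison between classes.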

Prediction
Prediction is forecasting things that have not yet happened. Some experts describe predicting as a process of scientific forecasting to gain systematic knowledge based on physical evidence. The term can give rise to various perceptions and even multiple interpretations, so it is no wonder that people interpret it in overlapping ways. According to the Great Dictionary of the Indonesian Language, a prediction is the result of the activity of predicting, forecasting or estimating. Prediction can be based on scientific methods or on purely subjective ones. For example, weather predictions are always based on the latest data and information from observations, including satellite observations. The same applies to predictions of earthquakes, volcanic eruptions or disasters in general. Predictions of soccer matches and other sports, however, are generally based on the subjective views of whoever makes them.

RESULT AND DISCUSSION
Analysis can be defined as the decomposition of a complete information system into its component parts in order to identify and evaluate the problems, opportunities and obstacles that arise, as well as the expected needs, so that improvements can be proposed. The study was conducted at Al-Ikhwan Middle School, where data processing is still done manually. An analysis of the procurement of office stationery (ATK) is therefore needed at Al-Ikhwan Middle School. It serves to identify the core problems that often occur at the school, and it helps the school know what steps to take in recording ATK supplies for the coming month. ATK procurement data is needed to find out what office equipment has been used so far; it shows which items have been used up and need to be added next month. The ATK procurement data for the last 3 months can be seen in the picture below.
Office stationery (ATK) usage data is needed to find out what office equipment has been used so far; it shows which items have been used up and need to be added next month. The ATK usage data for the last 3 months can be seen in the picture below. From the table above, the calculation with the Naive Bayes classification formula proceeds as follows:
$$\bar{x}_{\text{needs to be added}} = \frac{9 + 2 + 2}{3} = \frac{13}{3} = 4.333$$
$$\bar{x}_{\text{does not need to be added}} = \frac{24 + 2 + 24 + 60 + 3 + 4 + 5 + 3 + 32 + 2}{10} = \frac{159}{10} = 15.9$$
$$s^2_{\text{needs to be added}} =$$
$$s_{\text{does not need to be added}} = \sqrt{152.448} = 12.347$$
Because the final (posterior) probability is higher for the class "does not need to be added", office stationery does not need to be added.
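The class means and the square root in the calculation above can be checked with a few lines of Python. The list names are illustrative, and the quantities are copied from the sums in the calculation. The value 152.448 is taken as reported rather than recomputed, because the underlying table is not reproduced here.

```python
import math
from statistics import mean

# Quantities as they appear in the two sums of the calculation
needs_adding = [9, 2, 2]
no_need = [24, 2, 24, 60, 3, 4, 5, 3, 32, 2]

mean_needs_adding = mean(needs_adding)  # 13 / 3
mean_no_need = mean(no_need)            # 159 / 10

print(round(mean_needs_adding, 3), round(mean_no_need, 1))

# Square root of the variance value reported for the second class
print(round(math.sqrt(152.448), 3))
```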

IMPLEMENTATION
To run the Weka application, first prepare the hardware and software that the program requires. The steps for testing the Naïve Bayes algorithm with the Weka tool are as follows. Enter the data in a Microsoft Excel worksheet (xlsx) and save it with the save-as type CSV (Comma Delimited). Open the saved file in Weka; the data then appears with 5 attributes. Select the "Use training set" test option, then click "Start" to run the test with the Naïve Bayes algorithm; the results are then displayed.

Figure 8 Naive Bayes Algorithm Classification Results
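Based on Figure 8 above, the percentage of Correctly Classified Instances is 100%, while the percentage of Incorrectly Classified Instances is 0%. This accuracy was measured with the "Use training set" option, that is, on the same data that was used to build the model.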
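The two percentages are complementary, which the following sketch makes explicit. The function name and the label lists are hypothetical and only illustrate how such a summary is computed from actual and predicted class labels; they are not the study's data.

```python
def classification_summary(actual, predicted):
    """Return (percent correctly classified, percent incorrectly classified)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    pct_correct = 100.0 * correct / len(actual)
    return pct_correct, 100.0 - pct_correct


# Hypothetical labels, for illustration only
actual = ["keep", "keep", "add", "keep"]
predicted = ["keep", "keep", "add", "keep"]
print(classification_summary(actual, predicted))
```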

CONCLUSION
Based on the results of the author's analysis, the conclusions can be outlined as follows. 1. For the procurement of office stationery, with a total of 13 items, testing with the Weka application gives the result that stock does not need to be added. 2. Implementing the Naïve Bayes method makes it easier to predict office stationery needs. 3. The test results with the Weka application for office stationery prediction are less than optimal.