Implementation of K-Means Methods In Clustering Students Ability Levels in English Language

English proficiency is now essential for students, both for communicating and for reading and understanding literature written in English. To build this proficiency, students who are not majoring in English are given English as a common basic subject. At Politeknik Pos Indonesia, in the Bachelor's Degree program in Informatics Engineering, English is taught over three semesters using the direct method. To find out the results of this teaching, this research classifies students' ability levels into three categories: beginner, intermediate, and advanced. The objective of the grouping is to determine how many students have a low, medium, or high ability level, so that the faculty can determine the average level of students' proficiency and lecturers can adjust their teaching to develop the students' knowledge of English. The classification uses the K-Means clustering algorithm, which places similar data in the same group and dissimilar data in different groups. By applying the K-Means clustering method, the researchers were able to classify students by ability level as beginner, intermediate, or advanced.


INTRODUCTION
English is the most dominant international language, so many people hope to understand it and communicate in it well. In Indonesia, English is the only foreign language that must be learned from pre-school to university level. Although it is studied over a long period, many students still have problems learning English. They have difficulty speaking, listening, reading, and writing in English. This is due to a lack of confidence in speaking English and a fear of making mistakes in grammar, vocabulary, pronunciation, and other areas.
English proficiency is now essential, especially for students, both for communicating and for reading and understanding literature written in English. To build this proficiency, students who are not majoring in English are given English as a common basic subject. At Politeknik Pos Indonesia, in the Bachelor's Degree program in Informatics Engineering, for instance, English is taught over three semesters using the direct method: General English 1 in the 1st semester, General English 2 in the 2nd semester, and General English 3 in the 3rd semester. To find out the results of teaching English over these three semesters, the researchers classified the students' ability levels into three categories: beginner, intermediate, and advanced.
By grouping the students, their ability level can be determined from their fluency, comprehension, and grammar. The objective of the grouping is to determine how many students have a low, medium, or high level of English ability. The faculty can then determine the average level of students' proficiency, so that lecturers can adjust their teaching to develop the students' knowledge of English. To perform the grouping, the researchers apply the K-Means method. The K-Means method is needed to determine grouping criteria that can serve as a reference [1]. The K-Means clustering algorithm is one of the most widely used clustering techniques [2], and it partitions data into a specified number of clusters (groups, subsets, or categories) [3]. By applying the K-Means clustering method in this research, the researchers can classify students by ability level as beginner, intermediate, or advanced.

DSRM Method (Design Science Research Methodology)
A research methodology is a general approach to planning and carrying out research projects. The research methodology used by the researchers was the Design Science Research Methodology (DSRM). This method provided the framework for the procedures used, as well as an understanding of the review process for identifying and evaluating the results of the research. DSRM consists of seven process steps that researchers carry out [4].
The IJICS | Cahyo Prianto | http://ejurnal.stmik-budidarma.ac.id/index.php/ijics

Identification of Problems
This stage defines the research problem and justifies the value of finding a solution to it. Justifying the solution does two things: it focuses the research and the client of the study on finding a solution, and it helps others understand the researchers' reasoning about the problem. The resources required for these activities include knowledge about the state of the problem and the importance of its solution. The step used to identify the problem is data collection, which consists of:
1. A study of the literature, searching for references on the internet and in national or international journals.
2. An interview. In this study, the researchers interviewed an English lecturer in D4 Informatics Engineering at Politeknik Pos Indonesia.

Determining the Problem Solution
This stage derives the objectives of the solution from the existing problems: how the future state is expected to be better than the present one, or how the new artifact can support the resolution of the problems currently being handled. The resources required for this phase include knowledge about the state of the current problem.

Design and Development
This stage creates the artifacts, namely models, methods, or new properties of technical, social, or information resources. Conceptually, a research artifact can be any object that is designed and built by the researchers. This activity includes determining the desired function of the artifact and its architecture, and then creating the actual artifact. At the design stage, the database is designed, and the application processes are modeled using UML.

Processing Data with K-Means
This stage covers how the data are processed using K-Means. Data clustering with the K-Means method is generally done with the basic algorithm.

Demonstration
This stage demonstrates the use of the artifact to solve one or more of the existing problems. This could involve the person in charge of the company.

Evaluation
This stage observes and measures how well the artifact solves the problem. The activity involves comparing the objectives of the solution with the results actually observed when the artifact is used in the demonstration. This stage requires knowledge of the relevant metrics and analysis techniques. Depending on the nature of the problem and the artifact, the evaluation can take many forms.

Communication
This stage communicates the problem and the importance of the artifact to other researchers and to others interested in the publication of scientific research. Researchers may use reports or scientific journals for this purpose. The empirical research process (problem definition, literature review, hypothesis development, data collection, analysis, results, discussion, and conclusions) is a common structure for empirical research papers.

Clustering
Clustering is a method of grouping data [5]. Data that have similar characteristics are gathered in the same group or cluster, while data that have different characteristics are gathered in different groups or clusters [6]. Cluster analysis is a method used to divide a data set into groups based on predetermined similarities [7]. The main objective of a clustering method is to group a number of data or objects into clusters (groups) so that each cluster contains data that are as similar as possible [8].

K-Means Methods
K-Means is a non-hierarchical method of grouping data that seeks to partition the data into two or more groups [9]. The method partitions the data into clusters or groups so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters [10]. As a result, the data within one cluster or group have a small degree of variation [11]. The steps of the method are as follows:
1. Determine k, the number of clusters to be formed.
2. Determine the initial center point (centroid) of each of the k clusters.
3. Calculate the distance from each input data point to each centroid using the Euclidean distance formula, in order to find the centroid closest to each data point. The Euclidean distance is:
$$d(x, c) = \sqrt{\sum_{j=1}^{n} (x_j - c_j)^2}$$
where d is the distance, x is the data point, c is the centroid, j indexes the attributes, and n is the number of attributes.
4. Classify each data point based on its proximity to the centroids (the smallest distance).
5. Update the centroid values. The new centroid of a cluster is the average of the data in that cluster:
$$c = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$
where m is the number of data in the cluster and x^{(i)} is the i-th data point in that cluster.
6. Repeat steps 3 to 5 until the members of each cluster no longer change.
Once the condition in step 6 is met, the cluster center values from the last iteration are used as the parameters that determine the classification of the data.
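The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in the study. The function name `k_means`, its parameters, and the optional `init` argument (which lets the initial centroids be supplied by hand, as in a manual calculation) are assumptions made for this example.

```python
import math
import random


def k_means(points, k, init=None, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length numeric tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index of
    the cluster that points[i] belongs to.
    """
    # Step 2: use the supplied initial centroids, or pick k data points at random.
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = random.Random(seed).sample(points, k)

    assignments = None
    for _ in range(max_iter):
        # Steps 3-4: assign every point to the centroid with the smallest
        # Euclidean distance.
        new_assignments = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Step 6: stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 5: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments
```

With three attributes per student, each point would be a tuple such as `(grammar, fluency, comprehension)`, and `k` would be 3. Because the result of K-Means depends on the initial centroids, passing them explicitly through `init` makes a run reproducible.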

Data analysis
The dataset consists of the General English 2 scores of 51 students in D4 Informatics Engineering, 2018, with three attributes: grammar, fluency, and comprehension. Based on these three attributes, the students are grouped into three groups: beginner with a score of 0-40, intermediate with a score of 41-70, and advanced with a score of 71-100.
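The score ranges above can be written as a small lookup function. This is only a sketch: the function name `ability_level` is hypothetical, and treating a fractional score between 40 and 41 (or between 70 and 71) as belonging to the higher band is an assumption, since the text gives integer ranges only.

```python
def ability_level(score):
    """Map a 0-100 score to the level names used in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 40:
        return "beginner"      # 0-40
    if score <= 70:
        return "intermediate"  # 41-70
    return "advanced"          # 71-100
```

For example, `ability_level(70)` returns `"intermediate"` and `ability_level(71)` returns `"advanced"`.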

K-Means Clustering Algorithm
Once the necessary data are ready, they are first processed by manual calculation using the K-Means clustering technique:
a. Determine the attributes, namely grammar, fluency, and comprehension.
b. Determine the number of clusters. In this study, the data are grouped into three clusters: beginner, intermediate, and advanced.
c. Determine the initial center point (centroid) of each cluster, then calculate the distance from each data point to each centroid. The same calculation is carried out for the remaining data and attributes.
d. Having obtained the distances for each data point, assign each data point to the cluster whose centroid is nearest (the smallest Euclidean distance). For example, the first data point is placed in group C1 because its smallest distance, 13, is to that centroid. The remaining data are grouped in the same way. In the initial calculation, C1 (advanced) contains 40 data, C2 (intermediate) contains 8 data, and C3 (beginner) contains 3 data.
e. Repeat the previous process to make sure the cluster values no longer change. This repetition is called an iteration. Each iteration starts by updating the centroid values, so the new centroids are used instead of the initial ones. The distance between the first data point and the first, second, and third cluster center points (centroids) is then calculated, and the calculation is repeated for the remaining data and attributes. If the new centroid values differ from those of the previous iteration, another iteration is carried out, and so on until the centroid values equal those of the previous iteration.
Grouping the students' ability levels took 5 iterations and stopped at the fifth centroid, because the fourth and fifth centroids gave the same result. The resulting centroids are shown in the table below. The grouping results for the final (fifth) centroid and the fifth iteration can also be seen in the table below. The final grouping of the students' English ability levels is cluster 1 (advanced) with 37 data, cluster 2 (intermediate) with 10 data, and cluster 3 (beginner) with 4 data, out of 51 data. This can be seen in the graph below.
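Once the iterations have converged, each cluster still has to be matched to a level name and its members counted. One possible way to do this, sketched below, is to rank the clusters by the mean of their centroid coordinates and name them from lowest to highest. This ranking rule and the function name `label_clusters` are assumptions made for illustration. The text does not state how the clusters were matched to levels.

```python
from collections import Counter


def label_clusters(centroids, assignments,
                   names=("beginner", "intermediate", "advanced")):
    """Name clusters by ascending centroid mean and count their members."""
    # Rank cluster indices from the lowest to the highest mean centroid score.
    order = sorted(range(len(centroids)),
                   key=lambda i: sum(centroids[i]) / len(centroids[i]))
    name_of = {cluster: names[rank] for rank, cluster in enumerate(order)}
    # Count how many data points fall under each level name.
    counts = Counter(name_of[a] for a in assignments)
    return name_of, counts
```

Here `centroids` and `assignments` are expected in the form of a list of centroid tuples and a list of cluster indices, one index per student.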

Table 1. Initial Student Data

Table 2. Centroid 1
a. Calculate the distance from each input data point to each centroid, using the Euclidean distance formula to find the centroid closest to each data point. The calculation of the distance is as follows:
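As an illustration of this distance calculation, consider a hypothetical student with scores x = (80, 75, 70) for grammar, fluency, and comprehension, and a hypothetical centroid c = (85, 78, 74). These numbers are invented for the example and are not taken from the study's data. The Euclidean distance between them would be:

```latex
d(x, c) = \sqrt{(80 - 85)^2 + (75 - 78)^2 + (70 - 74)^2}
        = \sqrt{25 + 9 + 16}
        = \sqrt{50}
        \approx 7.07
```

The same calculation is repeated against the other two centroids, and the student is assigned to the cluster whose centroid gives the smallest of the three distances.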

Table 3.
To obtain the new centroids from the results of the first iteration, the calculation is carried out as in the following example, and likewise for the next attribute. This gives the new centroid center points shown in the table below.
Centroid 2

Table 5. Nearest Distance and Grouping Results of the Fifth Iteration