Comparison Analysis of the Huffman Algorithm and the Goldbach Codes Algorithm in Text File Compression Using the Exponential Comparison Method

With the development of technology, many people now know about compression. Simply put, compression is a process that shrinks a file from its original size. Compression applications in common use today include WinZip, WinRar, and 7-Zip, which aim to compress documents and save space in memory or during data transmission. Compressed data can take the form of images, audio, video, or text. Using the Huffman algorithm and the Goldbach Codes algorithm to compress text files is intended to provide great benefits in transmission and storage, requiring less memory space than uncompressed text. Each algorithm takes a string as input and produces as output a binary string, or code, that translates each input character, so that the result has fewer bits than the uncompressed string. The problem is therefore how to obtain such a code, with sorted characters and a frequency table as input and a shorter binary code as output. The Huffman algorithm and the Goldbach Codes algorithm both perform very well in compressing text files: nothing was lost from the original file, i.e. the compression is lossless.


INTRODUCTION
With the development of technology, many people now know about compression. Simply put, compression is a process that shrinks a file from its original size. Compression applications in common use today include WinZip, WinRar, and 7-Zip, which aim to compress documents and save space in memory or during data transmission. Compressed data can take the form of images, audio, video, or text. A text file is a collection of characters, or strings, that forms a single unit. Text files that contain many characters always cause problems for the storage media.
The compression method used today is to take finished (original) files, compress them, and then transmit them. Once the compressed data reaches the recipient, it is decompressed to return it to its original form, after which the file can be used. One solution to the problem above is therefore to compress the data so that it is smaller than its original size without reducing its content.
Data compression is evaluated using a compression ratio, a measure of the percentage of data that has been successfully compressed. Measuring the compression ratio shows how much reduction occurs during file compression compared with the original file. The greater the compression ratio, the greater the reduction in the size of the compressed file relative to the original. A compression ratio of 60% means the file size has been reduced by 60% from the original. The compression ratio can be negative, which means the compressed file is larger than the original file.
In a previous research journal, Nuryasin concluded that the resulting compression ratio varies considerably depending on the type of file being compressed, and that, regarding compression speed, the larger the file, the longer the time needed for the compression process [1].
In a previous research journal, Surya Darma Nasution and Mesran concluded that the Goldbach Codes (GC) algorithm is a simple compression algorithm that encodes a positive integer n by converting it to the even integer 2(n + 3) and then writing its sum of two primes in reverse [2].
Using the Huffman algorithm and the Goldbach Codes algorithm to compress text files is intended to provide great benefits in transmission and storage, requiring less memory space than uncompressed text.

Data Compression
Data compression is a technique for reducing the size of data (the compressed result) relative to the original data. Data compression is generally applied on computers because each symbol that appears on a computer has a fixed bit cost. For example, in ASCII each symbol is 8 bits long: the code for 'A' has decimal value 65, which in binary is 01000001. Data compression is used to reduce the number of bits generated for each symbol that appears, and is thereby expected to reduce the data size in storage [3].
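The fixed-width baseline described above can be illustrated with a short sketch (the helper name `ascii_bits` is ours, not from the paper):

```python
# Illustrative sketch: the fixed 8-bit ASCII representation of a symbol,
# which is the baseline that compression tries to improve on.
def ascii_bits(ch):
    """Return the 8-bit binary string of a character's ASCII code."""
    return format(ord(ch), "08b")

print(ascii_bits("A"))  # 'A' is decimal 65, i.e. 01000001
```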

Huffman's Algorithm
In 1952, David Huffman introduced a compression algorithm called Huffman coding. The binary tree in the Huffman algorithm is built from the leaves to the root, a construction known as bottom-up. The method shares almost all the characteristics of Shannon-Fano coding. The principle of the Huffman code is that the characters that appear most often in the data are encoded with the shortest codes, while characters that rarely appear are encoded with longer codes. The Huffman algorithm builds a binary tree to generate a prefix code [6].

Goldbach Codes Algorithm
In 2001, Peter Fenwick had the idea of using Goldbach's conjecture (assuming it is true) to design a completely new class of codes based on prime numbers. Prime numbers can serve as the basis of a number system, so that if we write an even integer in this system, its representation will have exactly two 1-bits. For example, the even number 20 equals 7 + 13 and can therefore be written 10100, where the five bits are given the prime weights (from left to right) 13, 11, 7, 5, and 3. Now reverse this bit pattern so the least significant bit comes first, producing 00101. Such numbers are easy to read and extract from a long bit string: simply stop reading at the second 1. Recall that unary code (a sequence of zeros ending in a single 1) is read by a similar rule, stopping at the first 1. The Goldbach code can therefore be considered an extension of the simple unary code [7].
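The even-number representation above can be sketched as follows. How the prime pair is chosen when several decompositions exist is an assumption here (we take the pair whose primes are closest, which reproduces the paper's examples):

```python
# Sketch of the Goldbach representation of an even integer: mark the two
# primes whose sum is n, then emit the pattern with the smallest prime
# weight first (the "reversed" form described in the text).
def primes_upto(limit):
    """Sieve of Eratosthenes, returning the odd primes 3, 5, 7, ... <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [p for p in range(3, limit + 1) if sieve[p]]

def goldbach_code(n):
    """Encode an even integer n > 6 as a bit pattern with exactly two 1s."""
    ps = primes_upto(n)
    pset = set(ps)
    best = None
    for p in ps:
        q = n - p
        # assumption: pick the decomposition with the smallest larger prime
        if q > p and q in pset and (best is None or q < best[1]):
            best = (p, q)
    p, q = best
    hi = ps.index(q)                    # position of the larger prime
    bits = ["0"] * (hi + 1)             # weights 3, 5, 7, ... left to right
    bits[ps.index(p)] = "1"
    bits[hi] = "1"
    return "".join(bits)

print(goldbach_code(20))  # 7 + 13 -> 10100 reversed: 00101
```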

Exponential Comparison Method
The process of calculating and comparing the two algorithms with this method is described in the Results and Discussion section below.

RESULTS AND DISCUSSION
The analysis comparing how the Huffman algorithm and the Goldbach Codes algorithm work is limited to the compression of text files. Compression is the reduction of a file to a size smaller than the original. Compressing a file is very beneficial when the file is large and its data contains many repeated characters. The compression technique replaces these repeated characters with a certain pattern so that the file size is minimized.
Each algorithm takes a string as input and produces as output a binary string, or code, that translates each input character, so that the result has fewer bits than the uncompressed string. The problem is therefore how to obtain such a code, with sorted characters and a frequency table as input and a shorter binary code as output. The text file compression analysis covers text documents with *.txt and *.doc extensions, with a process that depends on the size of the document and how many characters it contains. Data compression is evaluated using a compression ratio, a measure of the percentage of data that has been successfully compressed.
Measuring the compression ratio shows how much reduction occurs during file compression compared with the original file. The greater the compression ratio, the greater the reduction in the size of the compressed file relative to the original. Analyzing how the two algorithms work requires an application that shows the process each algorithm follows during compression. The application designed here is desktop based; the tool used to design the comparative analysis application is Microsoft Visual Basic.Net 2008, for both the design and the source code.

Goldbach Codes Algorithm
The following is an example of compression using the Goldbach Codes algorithm. Consider a text file with the extension .txt, 19 bytes in size, containing the string AKU ANAK MANDAILING. It is solved as follows:
1. Read all the characters in the text and calculate the frequency of occurrence of each character, where the character that appears most often comes first (n = 1), and so on.
2. Find the codeword for each character to be encoded by finding the prime numbers whose sum represents the target number. The primes start from 3, 5, 7, and so on.
3. A codeword contains only the two symbols 1 and 0. While searching the primes for the target sum, a prime that contributes to the sum is given a 1 and one that does not is given a 0, continuing until the sum is found. For example, for the number 24, 11 + 13 = 24. After the correct pair of primes is found, the bit pattern obtained is 11000. The bit pattern is then reversed end to end, producing the bit pattern 00011.
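Step 1 for the 19-byte example can be sketched as follows: count each character's frequency, rank the most frequent first (n = 1), and show the even integer 2(n + 3) that the Goldbach code will decompose into two primes. The tie-breaking by first appearance is our assumption:

```python
# Frequency ranking for the example string "AKU ANAK MANDAILING".
from collections import Counter

text = "AKU ANAK MANDAILING"
freq = Counter(text)
# rank by descending frequency; ties broken by first appearance (assumption)
order = sorted(freq, key=lambda c: (-freq[c], text.index(c)))
for n, ch in enumerate(order, start=1):
    print(repr(ch), "frequency:", freq[ch], "-> encodes integer", 2 * (n + 3))
```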

Huffman's Algorithm
The Huffman algorithm uses a table of character occurrences for the frequencies of the two trees being combined; the total cost of forming a Huffman tree is therefore the sum over all the combined leaves. Huffman gives an algorithm that constructs a Huffman code from an input text string S = {s1, s2, ..., sn} and character occurrence frequencies F = {f1, f2, ..., fn}, producing binary strings C = {c1, c2, ..., cn}, called the Huffman code.
Compression steps for the Huffman algorithm:
1. The data is first analyzed by building a frequency table for each ASCII symbol that appears; the table has the attributes ASCII symbol and frequency.
2. The two entries with the smallest occurrence frequencies are selected as the first nodes in the Huffman tree.
3. From these two nodes, a parent node is made that records the sum of the frequencies of the two nodes.
4. The two nodes are removed from the table and replaced by the parent node. This node is then used as a reference for forming the tree.
5. Steps 2-4 are repeated until the table holds only one entry; this entry becomes the root node.
6. Each node on a left branch (the node with the greater frequency) is given the value 0, and each node on a right branch (the node with the smaller frequency) is given the value 1.
7. The code is read from the root node toward each leaf node, noting the value on each branch.
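The steps above can be sketched with a priority queue: repeatedly merge the two lowest-frequency nodes, then read codes root-to-leaf with 0 on the higher-frequency branch and 1 on the lower, as in step 6. The use of `heapq` and the tie-breaking counter are implementation choices of ours:

```python
# Minimal Huffman-coding sketch following the compression steps above.
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character of text to its Huffman code."""
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, tie-breaker, tree); a tree is a char or a pair
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)     # smallest frequency
        f2, _, t2 = heapq.heappop(heap)     # second smallest
        # parent records the summed frequency; higher-frequency child goes
        # on the left (bit 0), lower-frequency child on the right (bit 1)
        heapq.heappush(heap, (f1 + f2, count, (t2, t1)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("AKU ANAK MANDAILING")
compressed_bits = sum(len(codes[c]) for c in "AKU ANAK MANDAILING")
print(codes["A"], compressed_bits)  # 59 bits total versus 152 bits uncompressed
```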
Steps to decode with the Huffman algorithm:
1. Read a bit from the set of binary codes.
2. Start from the root of the binary tree.
3. For each bit read in step 1, traverse the corresponding branch.
Formation of the Huffman tree: each character is described as a single-node leaf or tree. The two leaves with the smallest character occurrence frequencies are then combined to form a root whose value is the sum of the frequencies of its two leaves. This iteration is carried out until a single binary tree is formed. Label the sides of the binary tree with 0 and 1: the left side is labeled 0 and the right side is labeled 1. The process of forming the binary tree into a Huffman tree can be seen in the figure below. The rows of 0s and 1s on the tree's sides from root to leaf represent the Huffman code for the corresponding character; traverse the binary tree from root to leaf to form the Huffman code.
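The decoding walk can be sketched without an explicit tree: because the code is prefix-free, a character can be emitted as soon as the accumulated bits match a codeword. The small three-symbol table here is hypothetical, chosen only for illustration:

```python
# Decoding sketch: read bits one at a time and emit a character whenever
# the accumulated bits match a codeword in the prefix-free table.
def huffman_decode(bits, codes):
    """Decode a bit string using a prefix-free character-to-code table."""
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a leaf has been reached
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("bit string ended mid-codeword")
    return "".join(out)

# hypothetical prefix-free table for a three-symbol alphabet
table = {"A": "0", "N": "10", "K": "11"}
print(huffman_decode("010011", table))  # decodes to "ANAK"
```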

Exponential Comparison Method
The process of calculating and comparing the two algorithms is as follows:
1. Determine the alternatives. To analyze the comparison between the Huffman algorithm and the Goldbach Codes algorithm in compression, it is first necessary to determine the algorithms that will serve as the alternative compression algorithms.

Determine the criteria
To compare the two algorithms, the next step is to determine the criteria used to analyze the process and how each works. The criteria can be seen in the following table:

Ratio of Compression (RC)
The ratio between the size of the data in bits before compression and the size of the data in bits after compression.

Compression Ratio (CR)
The percentage comparison between the compressed data and the uncompressed data.

Space Saving (SS)
The difference between the size of the uncompressed data and the size of the compressed data.
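The three criteria above can be sketched as follows. The exact formulas are an assumption based on the definitions commonly paired with these names, not taken verbatim from the source: RC = original/compressed, CR = compressed/original as a percentage, and SS = the percentage of space saved.

```python
# Sketch of the three comparison criteria (formulas are assumptions).
def compression_metrics(original_bits, compressed_bits):
    rc = original_bits / compressed_bits              # Ratio of Compression
    cr = compressed_bits / original_bits * 100        # Compression Ratio (%)
    ss = (1 - compressed_bits / original_bits) * 100  # Space Saving (%)
    return rc, cr, ss

# 19 ASCII characters = 152 bits before compression; suppose 59 bits after
rc, cr, ss = compression_metrics(19 * 8, 59)
print(round(rc, 2), round(cr, 1), round(ss, 1))
```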

Grading values on each criterion
The criteria that have been formed must each be given a value. These values can be seen in the example below, where each value is taken from the analysis of the Huffman algorithm and the Goldbach Codes algorithm carried out earlier. After the final value, or total value, of each alternative is obtained, the next step is to determine the priority of the decision based on the value of each alternative. The resulting decision priorities can be seen in the table below:
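The scoring step of the Exponential Comparison Method can be sketched as follows: each alternative's total value is the sum of its criterion ratings raised to that criterion's weight. The ratings and weights below are hypothetical placeholders, not results from the paper:

```python
# Sketch of Exponential Comparison Method (MPE) scoring.
def mpe_score(ratings, weights):
    """Total MPE value: sum of rating ** weight over all criteria."""
    return sum(r ** w for r, w in zip(ratings, weights))

weights = [3, 2, 2]                 # hypothetical weights for RC, CR, SS
alternatives = {
    "Huffman":        [4, 3, 4],    # hypothetical ratings per criterion
    "Goldbach Codes": [3, 4, 3],
}
scores = {name: mpe_score(r, weights) for name, r in alternatives.items()}
print(scores)                       # the highest total gets decision priority
```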