Data Compression Using Stout Codes

Data compression is increasingly important for two reasons: people tend to accumulate data, and smaller data can be transferred faster. Many compression algorithms are available, one of which is the Stout codes algorithm. The Stout codes algorithm works by encoding characters that appear often with shorter codes than characters that appear less often. Applied to text data, the algorithm reduced the size of the text, with a compression ratio of 60% and a space saving of 40%.


INTRODUCTION
Several studies describe data compression as a representation of data that has been processed into a smaller size, so that it differs from the original data [1]. Many compression algorithms and software packages are now available to solve data compression problems. The performance of a compression algorithm is measured by time (the duration of the compression process and of the decompression process), ratio (the size of the data after compression relative to its size before compression), and space saving (the percentage difference between the data size before and after compression) [2] [3].
There are two types of data compression: lossy compression, in which some data is lost during the compression process, and lossless compression, in which no data is lost [3]. This research uses the Stout codes algorithm, which is a lossless compression method. Few journal publications currently discuss the Stout codes algorithm, although it is covered in books on data compression. This research was conducted to evaluate how well the Stout codes algorithm compresses text data.
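The ratio and space-saving measures described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `compression_ratio` and `space_saving` are hypothetical, and the sizes 96 and 160 bits are the ones reported in the Result and Discussion section.

```python
def compression_ratio(size_after: int, size_before: int) -> float:
    """Compressed size as a percentage of the original size."""
    return size_after * 100 / size_before


def space_saving(size_after: int, size_before: int) -> float:
    """Percentage of the original size removed by compression."""
    return 100 - compression_ratio(size_after, size_before)


# Sizes in bits, taken from the worked example in this paper.
print(compression_ratio(96, 160))  # 60.0
print(space_saving(96, 160))       # 40.0
```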

Compression
Compression means compacting or shrinking the size of something. Data compression is the process of encoding information using fewer bits than the unencoded representation of the data [4][5][6][7]. As mentioned earlier, there are two compression methods, lossy compression and lossless compression. Lossy compression is well suited to compressing images, audio, and video. Lossless compression is well suited to compressing text or programs.

Stout Codes
Stout codes are variable-length codes for integers, similar to the Elias omega and Even-Rodeh codes. The codeword generated by the Stout codes algorithm depends on the choice of a parameter $l \ge 2$. The codes were introduced by Quentin Stout as two families of recursive prefixes, $R_l$ and $S_l$ [8]. In the $R_l$ family, longer and longer length groups are read until a group is found that is followed by 0. Let $L = 1 + \lfloor \log_2 n \rfloor$, and let $B(n, l)$ denote the $l$-bit binary representation (beta code) of the integer $n$, so that $B(12, 5) = 01100$. For $l \ge 2$, the prefix $R_l(n)$ is defined by:

$$R_l(n) = \begin{cases} B(n, l), & 0 \le n \le 2^l - 1, \\ R_l(L)\,B(n, L), & n \ge 2^l. \end{cases}$$

The second family of Stout codes uses a similar method, but with a different prefix, denoted by $S_l(n)$. For small values of $l$, this family offers some improvement over the $R_l$ code. In particular, it eliminates the slight redundancy in the $R_l$ code that arises because a length group cannot be 0 (which is why a length group in the omega code encodes $L_i - 1$ and not $L_i$). The prefix $S_l$ is similar to the prefix $R_l$, with the difference that the length group for $L_i$ encodes $L_i - 1 - l$. The prefix $S_l(n)$ is defined recursively by:

$$S_l(n) = \begin{cases} B(n, l), & 0 \le n \le 2^l - 1, \\ S_l(L - 1 - l)\,B(n, L), & n \ge 2^l. \end{cases}$$

Table 1 lists some of the prefixes $S_2(n)$ and $S_3(n)$ and illustrates their regularity. Note that the leftmost column contains the value $L$, that is, the length of the encoded integer, and not the integer itself. A length group retains its value until the group that follows it becomes all 1s, at which point the length group increases by 1 and the group that follows is reset to 10...0. All length groups, except perhaps the leftmost one, start with 1. This behavior is the result of the choice $L_i - 1 - l$.

The IJICS | Surya Darma Nasution | http://ejurnal.stmik-budidarma.ac.id/index.php/ijics

Table 1. Codes S2(n) and Codes S3(n)
The prefix $S_2(64)$, for example, is built from the 7-bit group 1000000 = 64, preceded by $S_2(7 - 1 - 2) = S_2(4) = 00|100$. It is emphasized again that Table 1 lists only the prefixes, not the complete codewords. Once this is understood, it is not difficult to see that the second Stout code is a prefix code: once a codeword has been assigned, it will not be the prefix of any other codeword. For example, the prefix of every codeword for a 64-bit integer starts with 00 100, the prefix for 4-bit integers, but every codeword for a 4-bit integer has a 0 following 00 100, whereas codewords for 64-bit integers have a 1 following 00 100.
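The recursive construction of the $S_l$ prefix can be sketched as follows. This is an illustrative sketch that assumes the prefix for $n \ge 2^l$ is the prefix of $L - 1 - l$ followed by $B(n, L)$, which is what the $S_2(64)$ example above describes; the function names `beta` and `stout_s_prefix` are hypothetical.

```python
def beta(n: int, width: int) -> str:
    """B(n, width): n in binary, left-padded with zeros to `width` bits."""
    return format(n, "b").zfill(width)


def stout_s_prefix(n: int, l: int = 2) -> str:
    """Prefix S_l(n) for a non-negative integer n and parameter l >= 2."""
    if l < 2:
        raise ValueError("l must be at least 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2 ** l:
        # Base case: n fits in l bits.
        return beta(n, l)
    L = n.bit_length()  # L = 1 + floor(log2 n)
    # The length group encodes L - 1 - l, followed by the L bits of n.
    return stout_s_prefix(L - 1 - l, l) + beta(n, L)


print(beta(12, 5))            # 01100
print(stout_s_prefix(4, 2))   # 00100
print(stout_s_prefix(64, 2))  # 001001000000
```

The last line reproduces the example in the text: the group 1000000 preceded by 00 100.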

RESULT AND DISCUSSION
To show how the Stout codes algorithm compresses text data, the example sentence "SURYA DARMA NASUTION" is used. In this research, the Stout codes are discussed using the $S_l$ family. The first step is to count the frequency of each character and then sort the characters from the highest to the lowest frequency, as shown in Table 2. The second step is to form the codeword of each character based on the value of n of that character. Forming a codeword begins by determining the value of l; this study uses l = 2. The codewords are then obtained as follows:

a. For $0 \le n \le 2^l - 1$: because l = 2, this gives $0 \le n \le 2^2 - 1$, that is, $0 \le n \le 3$. The codeword is the binary value of n written in l = 2 bits.
1) n = 1: the binary value is 1, so the codeword is 01
2) n = 2: the binary value is 10, so the codeword is 10
3) n = 3: the binary value is 11, so the codeword is 11

b. For $n \ge 2^l$: because l = 2, this gives $n \ge 2^2$, that is, $n \ge 4$. L is the bit length of the binary value of n, and the codeword is $S_l(L - 1 - l)\,B(n, L)$. In every case below, $L - 1 - l$ is smaller than $2^l$, so the prefix is simply the value $L - 1 - l$ written in l = 2 bits.
1) n = 4, binary value 100, so L = 3. $S_2(3 - 1 - 2) = S_2(0)$, and 0 written in 2 bits is 00. Codeword: 00 100
2) n = 5, binary value 101, so L = 3. $S_2(3 - 1 - 2) = S_2(0)$, and 0 written in 2 bits is 00. Codeword: 00 101
3) n = 6, binary value 110, so L = 3. $S_2(3 - 1 - 2) = S_2(0)$, and 0 written in 2 bits is 00. Codeword: 00 110
4) n = 7, binary value 111, so L = 3. $S_2(3 - 1 - 2) = S_2(0)$, and 0 written in 2 bits is 00. Codeword: 00 111
5) n = 8, binary value 1000, so L = 4. $S_2(4 - 1 - 2) = S_2(1)$, and 1 written in 2 bits is 01. Codeword: 01 1000
6) n = 9, binary value 1001, so L = 4. $S_2(4 - 1 - 2) = S_2(1)$, and 1 written in 2 bits is 01. Codeword: 01 1001
7) n = 10, binary value 1010, so L = 4. $S_2(4 - 1 - 2) = S_2(1)$, and 1 written in 2 bits is 01. Codeword: 01 1010
8) n = 11, binary value 1011, so L = 4. $S_2(4 - 1 - 2) = S_2(1)$, and 1 written in 2 bits is 01. Codeword: 01 1011
9) n = 12, binary value 1100, so L = 4. $S_2(4 - 1 - 2) = S_2(1)$, and 1 written in 2 bits is 01. Codeword: 01 1100

After the codewords are formed, the third step is to replace each character with its codeword, as shown in Table 3. The fourth step, arranging the codewords in the order of the characters in the example sentence, produces the bit string "101100100001110100101011000010010001100101001010011001101101101001101101110000110".

The fifth step is to add padding and a flag. Padding is added if the length of the bit string is not divisible by 8. The flag is added to make it easier to remove the padding during decompression. The resulting string has 81 bits; 81 is not divisible by 8 and leaves a remainder of 1, denoted n. The padding consists of 7 − n zeros followed by a "1", appended to the end of the bit string; here 7 − 1 = 6 zeros followed by "1" gives the padding 0000001. The flag is computed with the formula 9 − n, here 9 − 1 = 8, expressed as an 8-bit binary number, which gives 00001000.
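The five steps above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the author's implementation: characters with equal frequency are ranked in order of first appearance, ranks start at n = 1, and a bit string whose length is already a multiple of 8 gets no padding and a flag of 1 (the text does not cover that case). All function names are hypothetical.

```python
from collections import Counter


def beta(n: int, width: int) -> str:
    """B(n, width): n in binary, left-padded with zeros to `width` bits."""
    return format(n, "b").zfill(width)


def s_prefix(n: int, l: int = 2) -> str:
    """Prefix S_l(n), used directly as the codeword for rank n as in the text."""
    if n < 2 ** l:
        return beta(n, l)
    L = n.bit_length()
    return s_prefix(L - 1 - l, l) + beta(n, L)


def build_codebook(text: str, l: int = 2) -> dict:
    """Steps 1 and 2: rank characters by frequency and assign codewords."""
    counts = Counter(text)
    # sorted() is stable, so ties keep first-appearance order (assumption).
    ranked = sorted(counts, key=lambda ch: -counts[ch])
    # Ranks start at n = 1, following the worked example.
    return {ch: s_prefix(n, l) for n, ch in enumerate(ranked, start=1)}


def add_padding_and_flag(bits: str) -> str:
    """Step 5: append 7 - n zeros, a 1, and the 8-bit flag 9 - n."""
    n = len(bits) % 8
    if n == 0:
        # Assumption: no padding, flag 1, so that 7 + 1 = 8 bits are removed later.
        return bits + beta(1, 8)
    return bits + "0" * (7 - n) + "1" + beta(9 - n, 8)


text = "SURYA DARMA NASUTION"
codebook = build_codebook(text)
bits = "".join(codebook[ch] for ch in text)  # steps 3 and 4
packed = add_padding_and_flag(bits)
print(len(bits), len(packed))  # 81 96
```

With these tie-breaking assumptions the sketch reproduces the 81-bit string given above.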
After the padding and flag are added, the bit string becomes "101100100001110100101011000010010001100101001010011001101101101001101101110000110000000100001000", with a total length of 96 bits. The performance of the Stout codes algorithm is calculated as follows:

a. Compression Ratio (CR)
Compression ratio is the size of the compressed data as a percentage of the size of the uncompressed data. The original sentence has 20 characters of 8 bits each, or 160 bits.
CR = (size after compression / size before compression) × 100%
CR = 96 / 160 × 100%
CR = 60%

b. Space Saving (SS)
Space saving is the percentage by which compression reduces the size of the data.
SS = 100% − CR
SS = 100% − 60%
SS = 40%

For the decompression process, the first step is to remove the padding and flag: take the last 8 bits, convert them to a decimal value n, and then remove the last 7 + n bits of the bit string. Here the last eight bits are 00001000, which has the decimal value 8, so 7 + 8 = 15 and the last 15 bits are removed. The second step is to check the bits from left to right; when the bits read match a codeword in Table 3, they are replaced with the character that has that codeword. These steps are repeated until all the bits in the bit string have been converted into characters.
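The first decompression step, removing the flag and padding, can be sketched as below. This covers only that step; the function name `strip_padding_and_flag` is hypothetical, and the input is the 96-bit string from the worked example.

```python
def strip_padding_and_flag(padded: str) -> str:
    """Read the 8-bit flag n at the end, then drop the last 7 + n bits."""
    flag = int(padded[-8:], 2)
    return padded[: len(padded) - (7 + flag)]


# The 96-bit string from the worked example: data, padding, then flag.
padded = (
    "101100100001110100101011000010010001100"
    "101001010011001101101101001101101110000110"
    "0000001"   # padding
    "00001000"  # flag = 8
)
bits = strip_padding_and_flag(padded)
print(len(padded), len(bits))  # 96 81
```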

CONCLUSION
The Stout codes algorithm successfully compressed the text data, with a compression ratio of 60% and a space saving of 40%, so it can be said that the Stout codes algorithm is suitable for compressing text data. The results of compression with the Stout codes algorithm are affected by the number of

Table 2. Calculate Frequency and Sort by Frequency

Table 3. Replace Characters With Codewords

The fourth step is to arrange the codewords in the order of the characters in the example sentence so that they form a bit string.