Trading Strategy on Market Stock by Analyzing Candlestick Pattern using Artificial Neural Network (ANN) Method

Technical analysis plays an important role in the stock market. Traders use technical analysis to find a trading strategy for a stock. Several technical indicator tools support technical analysis, such as Moving Average, Stochastic, and others. The Candlestick pattern is also one of the tools used in technical analysis to develop a trading strategy, since a Candlestick represents the behavior of the stock. Therefore, understanding the Candlestick pattern and the technical indicator tools is valuable for traders who want to predict a trading strategy. This study predicts the trading strategy by analyzing the Candlestick pattern using an Artificial Neural Network (ANN). The technical indicator tools and the Candlestick patterns are generated as the features and the label data, respectively, in the modeling process. The method is applied to four stocks from IDX through their technical indicators for a certain period of time. We find that for the period of 28 days the model generates the highest accuracy, reaching 85.96%. We also use K-Fold Cross-Validation to evaluate the performance of the generated model.


INTRODUCTION
The stock market plays a significant role in the economy since it serves both economic and financial functions [1]. Technical analysis is a financial technique used to predict stock market trends over a short time horizon [2]. According to Osman and coworkers, fundamental and technical analyses were the first two methods used to predict stock prices, and Artificial Neural Networks are the most common technique for predicting stock prices [3]. Several technical indicators are used to obtain accurate predictions in technical analysis [4]. According to Nuraini (2015), ten main indicator tools are popular in technical analysis [5].
Decision-making for financial investment is crucial in practice, and it has become an interesting topic in the research field of computational finance. Several computational approaches have been developed to support decisions in a trading strategy: predicting the stock price using time series models, finding the hidden market regimes for a given stock price, treated as observable states, using the statistical method called the Hidden Markov Model (HMM) [6], and applying deep learning for image processing to detect candlestick patterns from a given candlestick image, as in [7] and [8]. Candlestick patterns are commonly used by analysts to predict future trends in support of investment decisions. Although their formation and shapes are described in the literature, ambiguity and misreading are very likely to occur since the descriptions are written in natural language. This makes them difficult to adopt in computational analysis. There are more than a hundred known candlestick patterns, and several studies have tried to make the definitions of the patterns more comprehensive. One such work is [9], which formulated a formal specification of these patterns in first-order logic. A formulation such as [9] is helpful for computation, because it allows a pattern to be identified formally from the structure of the candlestick data, based on the recent condition.
The classification of candlestick patterns was studied in [7] with a visual approach that recognizes the pattern from candlestick images in two steps. The first step uses the Gramian Angular Field (GAF) to encode the time series data as images. The second step applies a Convolutional Neural Network (CNN) to classify the critical structure of the candlestick patterns. The work in [8] also used a CNN to identify the market movement from a given dataset of candlestick pattern images. Its learning model was developed from images generated by a formal rule formulation of candlesticks based on historical data, and was then used to predict the movement on the next day.
In this work, an ANN is used to predict the candlestick pattern from the technical indicators obtained from the stocks. The trading strategy, such as buying or selling the stock, is then generated from the categorical trends that characterize the candlestick patterns. Using technical indicators to predict the candlestick pattern is the novelty of this research. Moreover, since the input consists only of numerical data, the data preparation becomes simpler and computationally more efficient. The stocks chosen in this study are based on the stock clustering of IDX stocks in [10], which applied a clustering algorithm for portfolio investment diversification. In this study, several scenarios with different periods are performed. Once the model has been generated using the ANN for a preset time period, validation data are fed into the model to obtain the predicted candlestick patterns, and the trading strategy is then determined from these patterns. To evaluate the performance of the model, measurements from the confusion matrix are used. We also implemented K-Fold Cross-Validation to check the dependency and performance of the generated model. Figure 1 shows the implementation procedure, from predicting Candlestick patterns to obtaining the trading strategy for a given period. As shown in Figure 1, this study is conducted in eight stages. The stages begin with collecting the data, which are then processed by the ANN method. The results are then used to generate the recommendation for a trading decision, which consists of a buy or a sell position.

Data Collection
The data are the daily stock prices of several chosen stocks listed on the Indonesian Stock Exchange (IDX). The stocks are selected from the IDX market clustering study in [10]. They are the centroids of the stock clusters generated in that study. The selected stocks are INAF.JK, INCI.JK, DSFI.JK, and ULTJ.JK. Each centroid is the center of a stock cluster, and its price movement is similar to that of the other stocks in the same cluster. See [10] for the details. The data were collected from Yahoo Finance for the period from January 2000 to January 2021.

Data Analysis
Six features in the dataset are used to compute the technical indicators and to determine the candlestick patterns. The features are 'Adj Close', 'Close', 'High', 'Low', 'Open', and 'Volume'. These features are used to compute the ten indicator values of technical analysis and to generate the candlestick patterns from their rule representation. The patterns are then set as the target label in the modeling process that finds the classifier.

Generating 10 Indicator Tools of Technical Analysis
The features from the dataset are used to compute the values of 10 indicators: Simple Moving Average (SMA), Weighted Moving Average (WMA), Momentum, Stochastic K%, Stochastic D%, Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD), Larry William's R%, Accumulation/Distribution (A/D) Oscillator, and Commodity Channel Index (CCI). These indicators are calculated using the formulas described in Table 1. The formulas are taken from [11].

Table 1. Indicator Tools in Technical Analysis
(Columns: Indicator name; Formula. Rows include SMA and Stochastic D%.)
In Table 1, $C_t$, $H_t$, and $L_t$ are respectively the closing, high, and low prices at time $t$. DIFF ≔ EMA [12], where $\alpha = 2/(1 + k)$ is the smoothing factor and $k$ is the time period of the EMA. $LL_{t-n}$ and $HH_{t-n}$ are respectively the lowest low and the highest high in the last $t$ days. $Up_t$ is the uptrend and $Dw_t$ is the downtrend at time $t$.
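As an illustration of how indicators of this kind can be computed from the price features, the following minimal sketch implements four of them (SMA, WMA, Momentum, and Stochastic K%) using their commonly used textbook definitions. These definitions are an assumption and may differ in detail from the formulas of Table 1 in [11]; the function names and the choice to return None where a window is incomplete are hypothetical.

```python
from typing import List, Optional

Series = List[Optional[float]]

def sma(close: List[float], n: int) -> Series:
    """Simple moving average: mean of the last n closing prices."""
    out: Series = [None] * len(close)
    for t in range(n - 1, len(close)):
        out[t] = sum(close[t - n + 1 : t + 1]) / n
    return out

def wma(close: List[float], n: int) -> Series:
    """Weighted moving average: newest price has weight n, oldest has weight 1."""
    denom = n * (n + 1) / 2
    out: Series = [None] * len(close)
    for t in range(n - 1, len(close)):
        window = close[t - n + 1 : t + 1]  # oldest price first
        out[t] = sum((i + 1) * c for i, c in enumerate(window)) / denom
    return out

def momentum(close: List[float], n: int) -> Series:
    """Momentum, assumed here to be C_t - C_{t-n}."""
    out: Series = [None] * len(close)
    for t in range(n, len(close)):
        out[t] = close[t] - close[t - n]
    return out

def stochastic_k(close: List[float], high: List[float],
                 low: List[float], n: int) -> Series:
    """Stochastic K%: position of the close inside the n-period range, in percent."""
    out: Series = [None] * len(close)
    for t in range(n - 1, len(close)):
        hh = max(high[t - n + 1 : t + 1])  # highest high in the window
        ll = min(low[t - n + 1 : t + 1])   # lowest low in the window
        # Undefined when the window has no price range (hh == ll).
        out[t] = 100 * (close[t] - ll) / (hh - ll) if hh != ll else None
    return out
```

The window length n plays the role of the period (1, 5, 10, or 28 days) discussed later in the paper.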

Data Preprocessing
After the ten indicators are calculated, the next step is data preprocessing, which improves the quality of the data, such as its comprehensiveness and consistency, and increases the accuracy of the modeling process, as described in [12]. Several processes are applied in this work, including data cleaning and data normalization. The collected stock data are used to calculate the technical indicators for a certain period of time. These technical indicators are set as the feature variables that relate to the class of candlestick patterns. To capture the relation between these features and the target, the ANN model is used to generate a classifier, which is then used to predict an investment position in the future.
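The two preprocessing steps named above can be sketched as follows. The paper does not state which cleaning rule or normalization method was used, so dropping rows with missing values and min-max scaling to [0, 1] are both assumptions, and the function names are hypothetical.

```python
def drop_incomplete(rows):
    """Data cleaning (assumed rule): keep only rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r)]

def min_max_scale(rows):
    """Normalization (assumed method): scale each column to the range [0, 1].

    A column with a single constant value is mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

In practice the column minima and maxima would usually be taken from the training data only and then reused for the testing and validation data, so that no information leaks from the held-out samples.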

Generate the Candlestick Pattern
Candlestick patterns are formed by three to five consecutive candles [13] and are an alternative way to represent price behavior in the stock market [14]. Figure 2 shows the parts of a Candlestick and the relationship between its open, close, low, and high prices. The body of the Candlestick is formed by the distance between the open price and the close price. If the Candlestick body is black, the closing price is larger than the opening price; if the body is white, the opening price is larger than the closing price. The line attached to the Candlestick body is called the shadow and represents the highest and lowest prices of the current period. The upper shadow is the distance between the high price and the maximum of the open and close prices on the current day, and the lower shadow is the distance between the low price and the minimum of the open and close prices on the current day. Using candlestick patterns is helpful in technical analysis for a short-term trading strategy [15], whether for a buy or a sell position. For example, if the Candlestick is detected as a Hammer, it indicates a downtrend and is recognized as a buy signal. To generate the candlestick patterns, we analyze and apply the formulas according to the candlestick patterns in Figure 3. Twelve patterns are considered as the target labels. To determine the patterns, several rules relating to the body, the tail, and the color of the given Candlesticks are applied; see [16] for the details. The rules are simple. For example, for the Hammer pattern in Figure 3, the white body of the Hammer shows that the opening price is larger than the closing price. The long tail of the Hammer indicates that the pattern has a long lower shadow that is longer than the body. The head, which is almost absent in the Hammer pattern, indicates that the upper shadow does not appear and that the close price is close to the high price. An example of the rule for the Hammer pattern is given in Table 2. Please check [12] for the twelve candlestick rules considered in this paper.
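The body and shadow definitions above translate directly into code. The sketch below computes the three parts of a candle and applies a simplified Hammer test, taking the Hammer's long tail to be the lower shadow. The two thresholds (`tail_ratio` and `head_ratio`) are hypothetical values chosen for illustration, not the rule of Table 2, and the color condition is left out.

```python
def candle_parts(open_, high, low, close):
    """Return (body, upper_shadow, lower_shadow) of one candle."""
    body = abs(open_ - close)
    upper_shadow = high - max(open_, close)
    lower_shadow = min(open_, close) - low
    return body, upper_shadow, lower_shadow

def is_hammer(open_, high, low, close, tail_ratio=2.0, head_ratio=0.1):
    """Simplified Hammer test with assumed thresholds.

    tail_ratio: the lower shadow must be at least this many times the body.
    head_ratio: the upper shadow may be at most this fraction of the
                full high-low range (the head is almost absent).
    """
    body, upper, lower = candle_parts(open_, high, low, close)
    full_range = high - low
    if full_range <= 0:
        return False  # a flat candle has no shape to classify
    return lower >= tail_ratio * body and upper <= head_ratio * full_range
```

For instance, a candle with open 100, high 101, low 80, and close 98 has a body of 2, an upper shadow of 1, and a lower shadow of 18, so it passes this test.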

ANN Model
Artificial Neural Network (ANN) is a well-known method for prediction and classification problems, and more generally for finding the optimal regressor model for a given dataset [17]. An ANN builds knowledge by detecting patterns and relationships within the given data and learns through experience [18]. A Multilayer Perceptron (MLP) is used as the ANN model in this study. An MLP network is trained with backpropagation, a procedure that repeatedly adjusts the weights and threshold values to minimize the difference between the desired and the obtained output [19]. The architecture of the Multilayer Perceptron can be seen in Figure 4. Several parameters were set to generate the best model, including:
a. Number of hidden layers, input layer, and output layer.
b. Activation function. Two activation functions are suggested for the model: the Tanh function and the Softmax function, written respectively in equations (11) and (12),
$$\tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$
$$\sigma(\vec{Z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{12}$$
where $\sigma$ is the softmax, $\vec{Z}$ is the input vector, $e^{z_i}$ is the standard exponential function for the input vector, $K$ is the number of classes in the multi-class classifier, and $e^{z_j}$ is the standard exponential function for the output vector.
c. Optimizer, loss function, epoch, and batch size.
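The two activation functions in equations (11) and (12) can be written out as below, together with the forward pass of an MLP with one Tanh hidden layer and a Softmax output layer. This is only a sketch: the single hidden layer, the helper names, and any weights passed in are assumptions for illustration, not the architecture of Figure 4. The softmax subtracts the largest input before exponentiating, which leaves the result of equation (12) unchanged but avoids overflow.

```python
import math

def tanh(x):
    """Equation (11), transcribed directly (math.tanh is the robust alternative)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def softmax(z):
    """Equation (12): turn K scores into K class probabilities that sum to 1."""
    shift = max(z)  # subtracting a constant does not change the ratios
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, bias):
    """One fully connected layer: each row of weights yields one output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass: input -> Tanh hidden layer -> Softmax output layer."""
    hidden = [tanh(v) for v in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))
```

The predicted class is the index of the largest probability returned by `mlp_forward`; training the weights with backpropagation is not shown here.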

Evaluation
To evaluate the prediction model, K-Fold Cross-Validation is implemented to check the dependency of the model on the given dataset. Several scores that are important for evaluating model performance on the split between training and testing data are used in this work. They are derived from the confusion matrix and include Accuracy, Precision, Recall, and F-1 Score. In K-Fold Cross-Validation, the available learning set is partitioned into k disjoint subsets of approximately equal size [21]. The data are split into k subsets $\{D_1, D_2, \ldots, D_k\}$ of the same size, and the model is evaluated for k different splitting scenarios of training and testing data.
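The evaluation described above can be sketched in two parts: scores computed from a multi-class confusion matrix, and a partition of the sample indices into k disjoint folds. The score formulas are the standard definitions (accuracy as the share of correct predictions; per-class precision, recall, and F-1), which are assumed here because the paper's own formulas are not reproduced in this text. The function names and the optional shuffling seed are hypothetical.

```python
import random

def confusion_matrix(y_true, y_pred, labels):
    """m[i][j] counts samples of true class i that were predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m

def scores(m):
    """Return (accuracy, precision, recall, f1); the last three are per class."""
    k = len(m)
    total = sum(sum(row) for row in m)
    accuracy = sum(m[i][i] for i in range(k)) / total
    precision, recall, f1 = [], [], []
    for i in range(k):
        tp = m[i][i]
        predicted = sum(m[r][i] for r in range(k))  # column sum
        actual = sum(m[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return accuracy, precision, recall, f1

def k_fold_indices(n, k, seed=None):
    """Split indices 0..n-1 into k disjoint folds of near-equal size.

    If a seed is given, the indices are shuffled first.
    """
    indices = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(indices[start : start + size])
        start += size
    return folds
```

In each of the k scenarios, one fold serves as the testing data and the remaining k - 1 folds as the training data. Averaging the per-class precision, recall, and F-1 values gives one summary number per score.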

Developing the Trading Strategy
To define the trading strategy, unlabeled data are used as validation data, and the obtained classifier model predicts their candlestick labels. The validation data consist of the values of the 10 technical analysis indicators explained in Section 2.3. Next, the Candlestick prediction results are used to generate a trading strategy according to the buy and sell class positions in Table 3. Table 3 shows that every candlestick has a prior trend, either bearish or bullish, and that every prior trend indicates a movement trend, i.e., a downtrend or an uptrend, which leads to the decision to buy or sell. For example, if the candlestick's prior trend is bearish, it indicates a downtrend, so the recommended trading strategy is to buy.
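The rule chain described above (pattern, then prior trend, then decision) amounts to two table lookups, sketched below. Only the Hammer entry is taken from the text of this paper; the remaining patterns of Table 3 are deliberately not filled in, and the dictionary and function names are hypothetical.

```python
# Prior trend -> decision, as described in the text:
# a bearish prior trend indicates a downtrend (buy),
# a bullish prior trend indicates an uptrend (sell).
PRIOR_TREND_TO_ACTION = {"bearish": "buy", "bullish": "sell"}

# Pattern -> prior trend. Only "Hammer" is grounded in the text;
# the other eleven patterns would be added from Table 3.
PATTERN_PRIOR_TREND = {"Hammer": "bearish"}

def trading_action(pattern):
    """Return 'buy' or 'sell' for a predicted pattern, or None if it is unmapped."""
    trend = PATTERN_PRIOR_TREND.get(pattern)
    if trend is None:
        return None
    return PRIOR_TREND_TO_ACTION[trend]
```

Returning None for an unmapped pattern keeps the sketch from inventing a decision for patterns whose rule is not given here.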

Dataset Scenario
In the modeling process, we divided the dataset into four periods of time, i.e., 1 day, 5 days, 10 days, and 28 days. The aim is to compare the model performance across the periods. The data are divided by period at the point where we generate the values of the 10 indicator tools: we set the number of days used in the calculation according to the data period, i.e., 1, 5, 10, or 28. Likewise, when generating the Candlestick patterns, we adjust every pattern to the data period, e.g., for INAF stock in the 5-day period, we build the Candlestick pattern over 5 days. Candlesticks generated in different periods differ in their open, high, low, and close prices, e.g., a Candlestick with a 1-day period takes its open, high, low, and close prices over a range of 1 day, whereas a Candlestick with a 5-day period takes them over a range of 5 days. Different data periods therefore produce different numbers of detected labels. Table 4 displays the number of candlestick patterns detected as labels in each period of time.
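One way to build a Candlestick over a multi-day period is to aggregate the daily candles, as in the sketch below. The aggregation rule (open of the first day, highest high, lowest low, close of the last day) and the use of consecutive non-overlapping windows are assumptions; the paper does not state whether its windows overlap, and the function name is hypothetical.

```python
def aggregate_candles(daily, period):
    """Combine daily candles into candles that span `period` days.

    daily: list of (open, high, low, close) tuples in chronological order.
    Trailing days that do not fill a complete window are dropped.
    """
    candles = []
    for start in range(0, len(daily) - period + 1, period):
        window = daily[start : start + period]
        candles.append((
            window[0][0],               # open of the first day
            max(c[1] for c in window),  # highest high in the window
            min(c[2] for c in window),  # lowest low in the window
            window[-1][3],              # close of the last day
        ))
    return candles
```

With a period of 1 the function returns the daily candles unchanged, which corresponds to the 1-day scenario.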

Modeling Scenario
In the modeling scenario, we apply the same ANN parameters to every stock and every period. Several modeling scenarios were run, including adjusting the ANN parameters to obtain the maximum accuracy. The parameters that we finally used are displayed in Table 5.

Result Evaluation on Modeling using ANN
The results of modeling using ANN are presented in this section. Table 6, Table 7, Table 8, and Table 9 show the Accuracy and the average Precision, Recall, and F-1 Score over all classes. The highest Accuracy and the highest average Precision, Recall, and F-1 Score are found in the period of 28 days. In the 28-day period, the accuracy exceeds 70%, with the highest accuracy of 85.96% on INCI stock. The results also show that the period of 5 days has the lowest Accuracy and the lowest average Precision, Recall, and F-1 Score. All the modeling result values increase as the period increases, except in the 5-day period, which always has the lowest values among the periods. The modeling results differ between periods because the number of detected target labels differs; see Table 4 for the details.

Evaluation Results using K-Fold Cross-Validation Method
The results of K-Fold Cross-Validation are used to measure the model performance when the data sample is split randomly into K groups. For every stock, K-Fold Cross-Validation with K = [3, 4, 5, 6] yields the following result parameters: the average Accuracy, the highest Accuracy, the Standard Deviation, and the class with the highest average precision for the period used. We generate these parameters to analyze the results of the model in each fold and to gain insight into the model performance. The results are represented as graphs in Figure 5, Figure 6, Figure 7, and Figure 8. In these figures, the highest average accuracy and the highest accuracy occur in the period of 28 days. In the 28-day period, the average accuracy was between 85.15% and 90.91%, and the highest accuracy was between 87.87% and 97.53%. The results of all stocks in all periods follow the same graph pattern, and the pattern is also constant for each value of K. The graphs show that the average and highest accuracy decrease in the 5-day period and increase again from the 10-day to the 28-day period. For the standard deviation, we found that the minimum value is 1.23% or 0.0123, and the maximum value is 10.18% or 0.1087. Table 10 describes the average precision of the class that has the highest precision in each period, based on the K-Fold Cross-Validation results. In Table 10, the highest precision among the results is for the Three Black Crows class, with a value of 89.50%.
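The three accuracy summaries reported per stock, period, and K (average, highest, and standard deviation over the folds) can be computed as in the sketch below. The paper does not say whether the population or the sample standard deviation was used; the population form is assumed here, and the function name is hypothetical.

```python
import statistics

def summarize_folds(fold_accuracies):
    """Summarize the accuracies obtained on the K folds of one experiment."""
    return {
        "average": statistics.mean(fold_accuracies),
        "highest": max(fold_accuracies),
        # Population standard deviation (assumption); statistics.stdev
        # would give the sample standard deviation instead.
        "std": statistics.pstdev(fold_accuracies),
    }
```

A small standard deviation indicates that the accuracy depends little on which fold is held out for testing.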

Candlestick Prediction Results
For the prediction results, we use the validation data to generate predictions with the trained model. The data interval runs from January 2019 to May 2021 for all the stocks. The predicted data are the candlestick patterns generated from the 10 indicators used in the modeling process.

Trading Strategy Results
After generating the predictions, we use the rules in Table 3 to generate the trading strategy based on the Candlestick prediction. The goals of generating the trading strategy are to find and analyze the strategy obtained by predicting Candlesticks from the 10 indicator tools, and to examine the relation between the indicator tools and the detected Candlestick patterns. Table 11 describes the resulting trading strategy for INAF stock in each period of time. In Table 11, the start date of the prediction differs between periods because the 10 indicators are computed with a different number of days in each period. The difference between periods also produces different candlestick predictions: in Table 11, the candlestick prediction for 2019-01-11 is Hammer in the 1-day period, but Three Black Crows in the 5-day period.

CONCLUSION
This study presented a trading strategy on the stock market by analyzing Candlestick patterns using an Artificial Neural Network (ANN). In the modeling, the training and testing data are divided into four periods of time. We apply the ANN method to create models and to predict the Candlestick pattern from the 10 generated indicator tools. The goal of this study is to give traders a visualization of the trading strategy recommendation generated from the indicator tools and the candlestick pattern in the current period of time, so that in the future traders can learn and apply candlestick patterns along with the indicator tools to generate a trading strategy. The performance of the model is evaluated using K-Fold Cross-Validation and confusion matrix scores. From the results, it can be concluded that modeling with ANN is a useful method for predicting the candlestick pattern from the 10 indicators, since the model yields an accuracy of more than 70% for the stocks in the 28-day period. When we plot the results of K-Fold Cross-Validation, the average accuracy and the highest accuracy of all the stocks consistently decrease from the 1-day to the 5-day period and increase again from the 10-day to the 28-day period. The highest average accuracy and the highest accuracy are in the 28-day period. The result of K-Fold Cross Validation for each stock in each period with different value of K, e.g., INAF period 1 day on K = [3, 4, 5,