# Forecasting And Stock Market Critical Points Analysis Using Modified Local Holder Exponents

## Abstract

The article presents a new indicator for forecasting critical points of financial time series based on modified Hölder exponents. The indicator was developed to predict large movements of financial instruments on the stock market. Its performance was analyzed on the US and Russian stock markets using time series with a one-minute sampling frequency. It is shown that the indicator predicts large movements of financial time series with satisfactory statistics; the corresponding calculations, tables and statistics are presented. On average, the developed predictor correctly anticipates large market movements in 80% of cases on the US market and in 60% of cases on the Russian market. These results were obtained by statistical processing of all predictions of critical points on the US and Russian markets. A significant difference was also found between the parameters of the developed indicator for the US and Russian markets.

Keywords: Forecasting, time series, Hölder indicators, local Hölder exponents

## Introduction

The idea of predicting the price dynamics of financial instruments on stock and currency markets with the help of a mathematical approach arose long ago. Researchers apply the full variety of mathematical methods to this problem, from simple linear models to recurrent neural networks and hidden Markov models (Khan & Gour, 2013). In particular, one possible approach to analyzing and predicting stock-market dynamics is based on the multifractal formalism (Suárez-García & Gómez-Ullate, 2014; Kapecka, 2013; Kuperin & Schastlivtsev, 2008). In this paper we use the notion of local Hölder exponents (LHE), which are closely related to multifractal analysis; see, for example, (Kuperin & Schastlivtsev, 2008). More precisely, LHE are defined as follows.

Definition. Let a function $f\left(t\right)$ be defined on a domain $t\subset R$ and satisfy the relation

$\left|f\left(t+∆t\right)-f\left(t\right)\right|\sim {C}_{t}·∆{t}^{\alpha \left(t\right)}$

as $∆t\to 0$, with $0<\alpha \left(t\right)<1$ and ${C}_{t}>0$. Then the number $\alpha \left(t\right)$ is called the local Hölder exponent of the function $f\left(t\right)$ at the point $t$. Local Hölder exponents show how smooth the function is at a given point: the higher the value of $\alpha \left(t\right)$, the smoother the function $f\left(t\right)$. The method described in this article is based on the assumption that before a strong price movement in the market, the time series of prices becomes smoother (Kuperin & Schastlivtsev, 2008). Thus, LHE allow us to estimate the smoothness of the series and to forecast a subsequent strong price change. There are two basic approaches to the calculation of local Hölder exponents, each with its advantages and disadvantages (Abry et al., 2009); we describe them briefly below.

Method of oscillations. This is one of the simplest methods for computing local Hölder exponents (Legrand et al., 2006; Trujillo et al., 2010). Its drawback is the slow convergence of the computed local Hölder exponents to their theoretical values: in practice, large neighborhoods of a time series point are required to obtain accurate LHE values.

Wavelet transform method. This is one of the most frequently used methods for calculating Hölder exponents (Los & Yalamova, 2004; Struzik, 2001a, 2001b). It can be based on either a continuous or a discrete wavelet transform (Bacry et al., 2002). Its drawback is the so-called edge effect that appears when the wavelet transform is applied: for finite time series, changing even a single value at the beginning or at the end of the series leads to a strong change in the wavelet transform.
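The oscillation method can be sketched in a few lines: measure the oscillation of the series in neighborhoods of growing radius around the point and regress log-oscillation on log-radius; the slope of the fit approximates $\alpha \left(t\right)$. The following Python sketch illustrates the idea (the function name and radii are ours, not from the paper):

```python
import numpy as np

def lhe_oscillation(x, t, radii=(2, 4, 8, 16)):
    """Estimate the local Holder exponent of series x at index t by
    regressing log-oscillation on log-radius (oscillation method)."""
    logs_r, logs_o = [], []
    for r in radii:
        lo, hi = max(0, t - r), min(len(x), t + r + 1)
        window = x[lo:hi]
        osc = window.max() - window.min()   # oscillation in the r-neighborhood
        if osc > 0:
            logs_r.append(np.log(r))
            logs_o.append(np.log(osc))
    # slope of the log-log fit is the Holder exponent estimate
    slope, _ = np.polyfit(logs_r, logs_o, 1)
    return slope

# sanity check: for f(t) = |t - t0|^0.5 the exponent at t0 is 0.5
t0 = 512
grid = np.arange(1024)
f = np.abs(grid - t0) ** 0.5
print(round(lhe_oscillation(f, t0), 2))  # → 0.5
```

The slow convergence mentioned above shows up here as the need for several, fairly wide radii before the log-log fit stabilizes.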
In this paper, LHE are calculated by the modified local Hölder exponents (MLHE) method, proposed and described in detail in (Kuperin & Schastlivtsev, 2008). Compared with the methods described above, this method has no edge effect and its computed values converge rapidly to the corresponding theoretical values. Note that our computational procedure differs from that of (Kuperin & Schastlivtsev, 2008) in two respects: we do not use the notion of a signal line, and we smooth the MLHE not with a moving average but with the Hodrick-Prescott filter (Cogley & Nason, 1995).

## Basic concepts

The MLHE method allows one to detect changes in the dynamics of a financial time series that are characterized by strong amplitude motions of the series. The amplitude of the MLHE is defined as the difference between its maximum and minimum values.

Definition 1. We call a MLHE signal an event in which the amplitude of the MLHE exceeds a certain threshold value $s$, determined by numerical experiment. We assume that the signal appears at the maximum ${t}_{s}$ of the MLHE amplitude. The point in time at which a large motion of the time series begins will be called a critical point of the time series.

An important step in computing the statistics of the MLHE method is defining the term "movement of the financial time series". Obviously, different definitions yield different results. For objectivity and broader coverage of the possible applications of the MLHE method, this paper uses several such definitions at once and computes statistics separately for each of them. In addition, options trading often relies on the value of volatility at each point in time, since its change leads to a change in the value of an option; it is therefore reasonable to perform calculations with respect to volatility as well. The definitions used in calculating the statistics are given below.

Definition 2. We define the motion $∆H\left(\epsilon \right)$ of a time series over a certain time interval $\epsilon$ as the difference between the values of the final and the first points of this interval.

Definition 2.1. We define a large (significant) motion $∆H\left(\epsilon \right)$ in a certain time interval $\epsilon$ as a motion (see definition 2) whose value exceeds the mean value of motion on the same time interval for a given time series.

Definition 2.2. We define a large (significant) motion $∆H\left(\epsilon \right)$ as a motion (see definition 2) whose magnitude exceeds the median value of the motion for a given time series.

Definition 2.3. We define a large (significant) motion $∆H\left(\epsilon \right)$ of a time series in a certain time interval $\epsilon$ as the maximum difference of values (in modulus) between points inside the interval $\epsilon$ and the first point of the interval.

Definition 3.1. We define a large (significant) motion of a time series as a motion $∆H\left(\epsilon \right)$, the value of which exceeds the mean value of motion for a given time series.

Definition 3.2. We define a large (significant) motion as a motion $∆H\left(\epsilon \right)$ whose value exceeds the median value of the motion for a given time series.

Definition 4. We define the volatility $V$ of the time series over a certain time interval $\epsilon$ as the standard deviation of the logarithmic increments of the instrument $X$ under study on the interval $\epsilon$, formula (1).

Definition 4.1. We define large (significant) volatility as volatility, the value of which exceeds the average value of volatility for a given time series:

$V=\sqrt{\frac{1}{\epsilon -1}\stackrel{\epsilon }{\underset{i=1}{\sum }}{\left(\mathrm{l}\mathrm{n}\frac{{X}_{i}}{{X}_{i-1}}-〈\mathrm{l}\mathrm{n}\frac{{X}_{i}}{{X}_{i-1}}〉\right)}^{2}}$ (1)

where $〈\mathrm{l}\mathrm{n}\frac{{X}_{i}}{{X}_{i-1}}〉$ is the average of $\mathrm{l}\mathrm{n}\frac{{X}_{i}}{{X}_{i-1}}$ over all values of the series.
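Formula (1) translates directly into code; a minimal sketch (the function name is ours):

```python
import numpy as np

def volatility(x):
    """Volatility per formula (1): sample standard deviation of the
    logarithmic increments ln(X_i / X_{i-1}) over the window x."""
    r = np.diff(np.log(np.asarray(x, dtype=float)))
    return np.sqrt(np.sum((r - r.mean()) ** 2) / (len(r) - 1))

# a series with constant relative growth has (numerically) zero volatility
print(round(volatility([1.0, 2.0, 4.0, 8.0, 16.0]), 10))  # → 0.0
```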

Definition 5. Let us determine the average value of the motion $∆H\left(\epsilon \right)$ by the formula:

$〈∆H〉=\frac{1}{n\left(\epsilon \right)}\stackrel{n\left(\epsilon \right)}{\underset{i=1}{\sum }}∆H\left({\epsilon }_{i}\right)$ (2)

where ${\epsilon }_{i}$ are disjoint time intervals of equal length $\epsilon$, $n\left(\epsilon \right)=⌊\frac{N}{\epsilon }⌋$ is the number of intervals into which the time series is divided, and $N$ is the length of the time series. Here $⌊\frac{N}{\epsilon }⌋$ is the integer part of $\frac{N}{\epsilon }$.

We define the median value of the motion $Me\left[\mathrm{\Delta }H\left(\epsilon \right)\right]$ as the middle value of the ordered series of motions $∆H\left({\epsilon }_{i}\right)$ of the time series, where ${\epsilon }_{i}$ are disjoint time intervals of equal length $\epsilon$, $n\left(\epsilon \right)=⌊\frac{N}{\epsilon }⌋$ is the number of intervals into which the time series is divided, and $N$ is the length of the time series.
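Definitions 2 and 5 and the median definition above can be sketched directly, under the assumption that motions are taken over disjoint intervals (function names are ours):

```python
import numpy as np

def motions(x, eps):
    """Motions ΔH(ε_i) over disjoint intervals of length eps (Definition 2):
    last point of each interval minus its first point."""
    n = len(x) // eps                          # integer part of N / eps
    return np.array([x[(i + 1) * eps - 1] - x[i * eps] for i in range(n)])

def mean_motion(x, eps):
    return motions(x, eps).mean()              # formula (2)

def median_motion(x, eps):
    return float(np.median(motions(x, eps)))   # middle of the ordered motions

x = np.arange(10.0)                            # steadily rising toy series
print(mean_motion(x, 5), median_motion(x, 5))  # → 4.0 4.0
```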

The algorithm for calculating the predictor of large movements for financial instruments is implemented in several steps:

1. Calculation of the MLHE of the time series under study;

2. Smoothing of the MLHE by the Hodrick-Prescott filter (Cogley & Nason, 1995), where the trend curve is used as the smoothing curve;

3. For each MLHE signal, the value of the change of the time series over the interval $\left[{t}_{s},{t}_{s}+\epsilon \right]$ is calculated according to Definition 2 and Definition 3;

4. If the value of movement is large (significant), then the signal is considered correct, otherwise the signal is considered incorrect.

The motion is considered large according to the definition with respect to which the calculations are made. These calculations were carried out for all of the definitions of large motions listed above.
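Steps 3-4 above can be sketched as follows; this simplified version classifies signals against the mean-motion threshold of Definition 2.1 (the interface and names are ours, not the authors' code):

```python
import numpy as np

def evaluate_signals(x, signal_times, eps):
    """For each signal time t_s, compute the motion over [t_s, t_s + eps]
    and count the signal correct if |ΔH| exceeds the mean |ΔH| of the
    series (Definition 2.1). Returns the proportion P of correct signals."""
    moves = np.abs([x[i + eps] - x[i] for i in range(len(x) - eps)])
    threshold = moves.mean()                   # "large" = above the mean motion
    correct = [abs(x[t + eps] - x[t]) > threshold
               for t in signal_times if t + eps < len(x)]
    return sum(correct) / len(correct)

# a flat series with one jump: a signal just before the jump is correct,
# a signal in the flat region is not
x = np.zeros(100); x[50:] = 10.0
print(evaluate_signals(x, [48], 5), evaluate_signals(x, [10], 5))  # → 1.0 0.0
```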

As a mechanism to control the objectivity of the results, for each time series we also count the number of correct and incorrect signals obtained by random numerical generation.

By random generation we mean the following. Suppose we have obtained a proportion of correct MLHE signals equal to $P.$ We must now show that this proportion is not accidental: if we choose random time points (random signals) instead of MLHE signals on the same time series, the proportion of correct signals for the random selection should be (statistically) lower than for the MLHE signals. Random numerical generation produces random points in the time domain of the series, which we feed to the algorithm "as if" they were signals of large movements. If the proportion of correct signals for random signals were the same as for MLHE signals, the MLHE signals would have no value, i.e. they would themselves be accidental and unable to predict large movements. The random signal generation (RSG) thus shows what proportion of "correct signals" can be obtained on the time series under study if the motions of the series are calculated at random points.

To calculate the optimal parameters of the MLHE, a simple optimizer is used that enumerates all combinations of parameters in a given range with a given step for each parameter. This algorithm was applied to the 13 most liquid shares of the US market and to 5 liquid shares of the MICEX stock exchange for the period from 01.01.15 to 17.08.15 at a one-minute sampling frequency of the time series. Only data with a high sampling rate (minute values) are used in this paper: statistical analysis requires a large number of signals, i.e. series containing about 50,000 values or more, and such volumes are publicly available only at the minute sampling frequency.
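The RSG control described above can be sketched like this, reusing the mean-motion correctness criterion (a hypothetical implementation; the paper does not publish its code):

```python
import numpy as np

rng = np.random.default_rng(0)

def rsg_proportion(x, n_signals, eps, trials=200):
    """Random signal generation: draw signal times uniformly over the
    series, score them with the same mean-motion criterion, and average
    the proportion of 'correct' signals over many trials."""
    moves = np.abs([x[i + eps] - x[i] for i in range(len(x) - eps)])
    threshold = moves.mean()
    props = []
    for _ in range(trials):
        ts = rng.integers(0, len(x) - eps, size=n_signals)
        props.append(np.mean([abs(x[t + eps] - x[t]) > threshold for t in ts]))
    return float(np.mean(props))

# on a random walk, random signals hit a large motion only part of the time;
# a useful predictor's proportion P must exceed this baseline
walk = np.cumsum(rng.normal(size=5000))
baseline = rsg_proportion(walk, n_signals=50, eps=30)
```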
Optimization is carried out in three parameters: the MLHE window $w$ (the number of points in the series necessary to calculate one value of the MLHE), the interval $\epsilon$ and the threshold $s$ above which the amplitude of the MLHE is considered to be a signal amplitude. Intervals and optimization steps are listed below:

1. MLHE window $w$ : interval [200, 350], step 30 points (such a short interval and large step were chosen because computing the MLHE for one time series takes a long time, about 500 minutes);

2. time interval $\epsilon$ : interval [20, 220], step 10 points;

3. threshold value $s$ for the amplitude of the MLHE: interval [0.1, 4], step 0.1 (measured relative to the mean amplitude, i.e. 0.2 means 0.2 of the mean amplitude).

Below is a brief description of the optimization algorithm and the objective function.

We divide the original time series into two consecutive samples: primary (30% of the whole series) and secondary (70% of the whole series). On both samples, for every set of parameters $\left(w,s,\epsilon \right)$, we calculate the proportion $P\left(w,s,\epsilon \right)$ of correct signals. Let $\stackrel{-}{P}\left(w,s,\epsilon \right)$ denote the proportion of correct MLHE signals obtained on the primary sample with MLHE window $w,$ signal threshold $s$ and time interval $\epsilon .$ Likewise, let $\stackrel{^}{P}\left(w,s,\epsilon \right)$ be the proportion of correct MLHE signals obtained on the secondary sample. We define the objective function as follows:

$F\left(w,s,\epsilon \right)=\mathrm{\Delta }P\left(w,s,\epsilon \right)·\stackrel{^}{P}\left(w,s,\epsilon \right)$ (3)

where $\mathrm{\Delta }P=|\stackrel{^}{P}\left(w,s,\epsilon \right)-\stackrel{-}{P}\left(w,s,\epsilon \right)|$. Then the optimal values of $w,s,\epsilon$ can be determined from the condition

$F\left({w}^{\mathrm{*}},{s}^{\mathrm{*}},{\epsilon }^{\mathrm{*}}\right)={\mathrm{m}\mathrm{a}\mathrm{x}}_{\stackrel{ˆ}{P}} {\mathrm{m}\mathrm{i}\mathrm{n}}_{\mathrm{\Delta }P} \left[\mathrm{\Delta }P\left(w,s,\epsilon \right)\cdot \stackrel{ˆ}{P}\left(w,s,\epsilon \right)\right]$(4)

where ${w}^{\mathrm{*}},{s}^{\mathrm{*}},{\epsilon }^{\mathrm{*}}$ are the optimal parameters. In other words, the optimal parameters are those for which $\stackrel{-}{P}\left(w,s,\epsilon \right)$ and $\stackrel{^}{P}\left(w,s,\epsilon \right)$ differ as little as possible, while $\stackrel{^}{P}\left(w,s,\epsilon \right)$ is as large as possible. Such an optimization algorithm is needed to obtain a stable set of parameters $\left(w,s,\epsilon \right)$ for which the proportion of correct signals changes as little as possible over time. Otherwise, the obtained proportions of correct signals might merely be the result of overfitting on a given time interval and would carry no real meaning.
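The verbal criterion behind formulas (3)-(4) — $\stackrel{^}{P}$ as large as possible while $\mathrm{\Delta }P$ stays small — can be operationalized in several ways; the penalized score below is one simple reading of ours, an illustration of the idea rather than the authors' exact optimizer:

```python
def select_parameters(P_bar, P_hat):
    """P_bar / P_hat map a parameter tuple (w, s, eps) to the proportion of
    correct signals on the primary / secondary sample. Reward a high
    secondary-sample proportion, penalize instability ΔP = |P̂ - P̄|.
    (This penalized score is our reading of formulas (3)-(4).)"""
    def score(params):
        dP = abs(P_hat[params] - P_bar[params])
        return P_hat[params] - dP
    return max(P_bar, key=score)

# a stable parameter set beats one that was overfit on the primary sample
P_bar = {(290, 0.8, 60): 0.80, (200, 0.1, 20): 0.95}
P_hat = {(290, 0.8, 60): 0.78, (200, 0.1, 20): 0.55}
print(select_parameters(P_bar, P_hat))  # → (290, 0.8, 60)
```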

## Results and analysis

The results of the algorithm for the minute sampling frequency on the interval from 01.01.15 to 17.08.15 can be seen in Table 1 and Table 2 . The abbreviation RSG stands for random signal generator, a synonym for random numerical generation, i.e. simply random points on a time series. The generator uses a uniform distribution over the whole optimized interval, i.e. the probability of a signal appearing is equal for all points of the interval on which the optimization is carried out. We recall that (2.1), (2.2), (3.1), (3.2) are the sub-clauses of Definitions 2 and 3 given at the beginning of the paper. The slash sign "/" separates the parameter values and the results of the algorithm obtained for different definitions; for example, the record (2.1) / (2.2) means that the values before the slash were obtained for definition (2.1) and the values after it for definition (2.2). Normalized values of large movements of the time series are also given. The time series is normalized according to the formula $\stackrel{^}{X}=\frac{X}{〈X〉}$, where $X$ is the original time series and $〈X〉$ is its average.

The MLHE method rests on the hypothesis that financial time series are multifractal, so the method (and with it the hypothesis itself) can be tested from the reverse assumption by using a scrambling algorithm: we "destroy" the multifractal structure of a time series and apply the method to the new, scrambled series. For example, take the AT&T stock, which is listed first in Tables 1 , 2 , 3 . As the scrambling algorithm we take one based on the iAAFT technique (iterative Amplitude Adjusted Fourier Transform), proposed and described in (Schreiber & Schmitz, 2000). This algorithm is chosen because it preserves the probability distribution of the time series and its frequency spectrum (within the limits of calculation error). The error measuring the difference between the original spectrum and the spectrum of the scrambled series is given by:

$Err=\frac{1}{N}\stackrel{N}{\underset{i=1}{\sum }}\left||\stackrel{ˆ}{F}\left({\omega }_{i}\right)|-|\stackrel{ˆ}{S}\left({\omega }_{i}\right)|\right|$ (5)

where $|\stackrel{ˆ}{F}\left({\omega }_{i}\right)|,|\stackrel{ˆ}{S}\left({\omega }_{i}\right)|$ are the Fourier transform amplitudes of the original and scrambled series, and $N$ is the total number of frequencies in the Fourier expansion (for a discrete Fourier transform it equals the number of points of the original series). Thus, the error is the average deviation of the moduli of the Fourier spectrum amplitudes.
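A minimal sketch of an iAAFT surrogate together with the spectral error of formula (5) is shown below (simplified relative to Schreiber & Schmitz, 2000; function names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def iaaft(x, n_iter=100):
    """Iterative Amplitude Adjusted Fourier Transform surrogate: alternately
    impose the original amplitude spectrum and the original value
    distribution, which destroys higher-order (multifractal) structure."""
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.rfft(x))              # target Fourier amplitudes
    sorted_x = np.sort(x)                     # target value distribution
    s = rng.permutation(x)                    # start from a random shuffle
    for _ in range(n_iter):
        f = np.fft.rfft(s)
        # impose the target spectrum, keeping the current phases
        s = np.fft.irfft(amp * np.exp(1j * np.angle(f)), n=len(x))
        # impose the target distribution by rank ordering
        s = sorted_x[np.argsort(np.argsort(s))]
    return s

def spectrum_error(x, s):
    """Average deviation of the Fourier amplitude moduli, as in formula (5)."""
    fx, fs = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(s))
    return float(np.mean(np.abs(fx - fs)))

series = np.cumsum(rng.normal(size=256))      # toy 'price' series
surrogate = iaaft(series)
```

The last rank-ordering step makes the surrogate's value distribution match the original exactly, while the spectrum matches up to the iteration error measured by `spectrum_error`.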

We now apply the MLHE algorithm to the scrambled series. The results are shown in Table 4 , "Results of the application of the MLHE algorithm to a scrambled time series based on the AT&T time series".

It can be seen that the proportions of correct signals for this series are practically equal to the values obtained for the RSG and much lower than the values obtained earlier for the original AT&T series. All values are slightly higher than for the RSG because of the optimization algorithm. It can be shown that none of the values falls outside the confidence interval. For example, for definition (2.2) the 95% confidence interval extends to 0.9, so the value 0.56 falls within this interval and therefore cannot be considered statistically significant. For the other values the confidence interval is even wider, since they are calculated on a smaller number of signals. From these results we conclude that financial time series possess a multifractal structure and that the MLHE method stops working on scrambled time series.
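The confidence-interval check mentioned above can be reproduced with the usual normal approximation for a proportion (the paper does not state which interval formula it used, so this is an illustrative assumption):

```python
import math

def proportion_ci(p, n, z=1.96):
    """Approximate 95% confidence interval for a proportion of correct
    signals estimated from n signals (normal approximation)."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# with only a handful of signals the interval is very wide, so a
# proportion like 0.56 is not distinguishable from chance
lo, hi = proportion_ci(0.56, 10)
```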

The most important and almost obvious result, observed in all the tables for all selected time series (especially for the US market), is the clear statistical difference between the frequency of correct MLHE signals and the frequency of correct signals obtained by the random signal generator (RSG).

The optimal values of the MLHE window are nearly the same for all analyzed time series. In part this is due to the coarse optimization of this parameter (as described above, for reasons of computation speed), but in any case these values lie within a small interval, [290, 350] points, relative to the entire range for the window $w$.

The optimal values of the interval $\epsilon$ for the different series of the US market and of the Russian market (taken separately) lie in a small numerical interval relative to the entire range of $\epsilon$. On the one hand, this indicates the non-random operation of the MLHE method; on the other hand, it points to a common internal structure of the time series within the US market and within the Russian market (separately).

## Conclusions

In this study, several definitions of the concept of a large motion were used, and for each of them the corresponding probability of a correct (according to that definition) forecast of the dynamics of the time series was calculated. The method showed good performance, exceeding by several times the probabilities of correct signals produced by the random signal generator, which is necessary for an objective evaluation of the method's effectiveness. The best results were obtained with respect to definition (3.1). In practice, the average value of volatility is used to price options, so the MLHE can be considered a good tool for option trading. Also valuable are the differences in the optimal parameter values and in the probabilities of correct signals between instruments of the American and Russian financial markets. In addition, one more important conclusion can be drawn from the results: the greater the liquidity of an instrument, i.e. the more players trade it, the better the performance of the MLHE (which somewhat contradicts the efficient market hypothesis).

## References

1. Abry, P., Gonçalves, P., & Véhel, J. L. (Eds.). (2009). Scaling, fractals and wavelets. ISTE.
2. Bacry, E., Muzy, J.-F., & Arneodo, A. (2002). Wavelet-based estimators of scaling behavior. IEEE Transactions on Information Theory, 48, 2938-2954.
3. Cogley, T., & Nason, J. M. (1995). Effects of the Hodrick-Prescott filter on trend and difference stationary time series: Implications for business cycle research. Journal of Economic Dynamics and Control, 19, 253-278.
4. Kapecka, A. (2013). Fractal Analysis of Financial Time Series Using Fractal Dimension and Pointwise Hölder Exponents. Dynamic Econometric Models, 13(1), 107-126.
5. Khan, A. U., & Gour, B. H. U. P. E. S. H. (2013). Stock market trends prediction using neural network based hybrid model. International Journal of Computer Science Engineering and Information Technology Research, 3(1), 11-18.
6. Kuperin, Yu. A., & Schastlivtsev, R. R. (2008). Modified Holder Exponents Approach to Prediction of the USA Stock Market Critical Points and Crashes. Statistical Finance, 15. http://arxiv.org/abs/0802.4460
7. Legrand, P., Lutton, E., & Olague, G. (2006). Evolutionary denoising based on an estimation of Holder exponents with oscillations. Lecture Notes in Computer Science, 3907, 520-524.
8. Los, C., & Yalamova, R. (2004). Multifractal spectral analysis of the 1987 stock market crash. Working Paper, Department of Finance, Graduate School of Management, Kent State University, 1-39.
9. Schreiber, T., & Schmitz, A. (2000). Surrogate time series. Physica D, 142, 346-382.
10. Struzik, Z. R. (2001a). Revealing local variability properties of human heartbeat intervals with the local effective Hölder exponent. Fractals, 9(01), 77-93.
11. Struzik, Z. R. (2001b). Wavelet methods in (financial) time-series processing. Physica A: Statistical Mechanics and its Applications, 296(1-2), 307-319.
12. Suárez-García, P., & Gómez-Ullate, D. (2014). Multifractality and long memory of a financial index. Physica A: Statistical Mechanics and its Applications, 394, 226-234.
13. Trujillo, L., Legrand, P., & Lévy-Véhel, J. (2010). The estimation of Hölderian regularity using genetic programming. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2010), 861-868.

08 March 2021

#### Article Doi

https://doi.org/10.15405/epsbs.2021.03.71

#### eBook ISBN

978-1-80296-102-7

#### Publisher

European Publisher


#### Subjects

Digital economy, cybersecurity, entrepreneurship, business models, organizational behavior, entrepreneurial behavior, behavioral finance, personnel competencies