How Does Our Robot Work?

Netcoincapital Official
Oct 1, 2020 · 11 min read

Introduction

In today's world, more than 90% of the financial activity in large companies is handled by robots. Meanwhile, there are companies that claim to build and sell trading robots. Most of these robots do not perform as promised and are more harmful than profitable; the only ones making real money are the people selling them.

The world's largest banks, such as Goldman Sachs, Bank of Canada, JP Morgan and Citigroup, all rely on financial robots, each worth several million dollars. Yet none of them sell their robots to you. Why is that?

Because you only put something up for sale when it brings you no profit, or when the profit from selling it exceeds the profit from keeping it. So the conclusion is clear: whatever is truly valuable, I keep for myself and do not share.

The purpose of this article is simply to introduce the structure and algorithms used in the robot's code; the robot itself is proprietary and not for sale.

1- Binance

Binance is an international online exchange for digital currencies. It provides full access to the digital currency market and also allows professional users to run their own robots on the exchange and trade automatically.

You can see the API documentation here.
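As an illustrative sketch only (this is not the robot's own code), fetching a live price from Binance's public REST API in Python might look like the following; the trading pair BTCUSDT is an arbitrary example:

```python
import requests

# Fetch the latest BTC/USDT price from Binance's public REST API.
# No API key is required for public market-data endpoints.
resp = requests.get(
    "https://api.binance.com/api/v3/ticker/price",
    params={"symbol": "BTCUSDT"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"symbol": "BTCUSDT", "price": "..."}
```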

2- Neural Networks

Neural networks are modern systems and computational methods for machine learning, knowledge representation, and, ultimately, applying the acquired knowledge to optimize the output responses of complex systems. The main idea behind such networks is partly inspired by the way the biological nervous system processes data and information in order to learn and create knowledge. The key element of this idea is building new structures for the information-processing system.

Such a system is made up of a large number of highly interconnected processing elements, called neurons, that work together to solve a problem and transmit information through synapses (electrochemical connections). In these networks, if one cell is damaged, other cells can compensate for its absence and help restore the lost function. These networks are able to learn: for example, when tactile nerve cells are stimulated by a hot object, the system learns not to touch it again, and by the same mechanism it learns to correct its errors. Learning in these systems is adaptive, meaning that, given examples, the synaptic weights change in such a way that the system produces the correct response when new inputs arrive.

The main philosophy of artificial neural networks is to model the processing properties of the human brain, bringing conventional computational methods closer to the biological way of processing. In other words, an artificial neural network is a method that, through training, learns the relationships between several data sets and stores that knowledge for use in similar cases. This processor resembles the human brain in two ways:

· The neural network learns through training.

· Knowledge is stored as weights, similar to the way information is stored in the neural network of the human brain.
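As a minimal sketch of the idea above (the numbers are arbitrary), a single artificial neuron computes a weighted sum of its inputs and passes the result through an activation function; learning then amounts to adjusting the weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One artificial neuron: weighted sum of inputs plus bias, then activation.
x = np.array([0.5, -1.2, 3.0])   # example inputs
w = np.array([0.8, 0.1, -0.4])   # synaptic weights (adjusted during training)
b = 0.2                          # bias

output = sigmoid(np.dot(w, x) + b)
print(output)
```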

Some of the formulas used in the artificial neural network:

2–1 ) Normalize data

Feeding data to the network in raw form reduces its speed and accuracy. Therefore, the input data should be normalized.

In this work, the following equation is used to normalize the data, scaling the inputs to lie between 0.1 and 0.9. The network outputs can later be restored to their original scale by reversing the normalization.
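A common min-max scaling to the interval [0.1, 0.9] that matches this description, together with its inverse, would look roughly like this (the sample prices are made up for illustration):

```python
import numpy as np

def normalize(x, lo=0.1, hi=0.9):
    """Scale raw values linearly into [lo, hi]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def denormalize(x_norm, x_min, x_max, lo=0.1, hi=0.9):
    """Reverse the scaling to recover the original range (e.g. for network outputs)."""
    return x_min + (x_norm - lo) * (x_max - x_min) / (hi - lo)

prices = [9500.0, 10120.5, 10890.0, 11430.2]
scaled = normalize(prices)
restored = denormalize(scaled, min(prices), max(prices))
```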

2–2 ) Number of hidden layers

The number of hidden layers should be kept as small as possible. It has been shown that any function can be approximated with at most three hidden layers. The network is therefore first trained with a single hidden layer; if it does not perform adequately, the number of hidden layers is increased.

2–3 ) Number of hidden layer neurons

The size of a hidden layer is generally determined experimentally. For a network of reasonable size, the number of hidden neurons is chosen as a small fraction of the number of inputs. If the network does not converge to the desired answer, the number of hidden-layer neurons is increased; if the network converges and generalizes well, a smaller number of hidden neurons is tried where possible.
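As a sketch of this trial-and-error procedure (the use of Keras and all parameter values are assumptions, not the robot's actual configuration), the number of hidden layers and neurons can be treated as tunable arguments of a model builder:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(n_inputs, n_hidden_layers=1, n_hidden_neurons=8):
    """Feed-forward network whose depth and width are tuned by trial and error."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for _ in range(n_hidden_layers):
        model.add(layers.Dense(n_hidden_neurons, activation="sigmoid"))
    model.add(layers.Dense(1))  # single regression output
    model.compile(optimizer="adam", loss="mse")
    return model

# Start with one hidden layer and few neurons; grow only if training stalls.
model = build_mlp(n_inputs=10, n_hidden_layers=1, n_hidden_neurons=8)
```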

2–4 ) Momentum algorithm

In this algorithm, the weight-update rule is defined so that the weight change at the nth iteration depends, to some extent, on the weight change from the previous iteration.

ΔWji (n) = η δj Xji + α ΔWji (n-1)
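A minimal sketch of this update rule (the learning rate η and momentum coefficient α values are arbitrary choices for illustration):

```python
import numpy as np

def momentum_update(w, grad_term, prev_delta, eta=0.1, alpha=0.9):
    """Weight change at iteration n: eta * delta_j * x_ji plus alpha times
    the weight change from iteration n-1."""
    delta_w = eta * grad_term + alpha * prev_delta
    return w + delta_w, delta_w

# grad_term corresponds to delta_j * X_ji in the formula above.
w = np.zeros(3)
prev_delta = np.zeros(3)
grad_term = np.array([0.05, -0.02, 0.10])
w, prev_delta = momentum_update(w, grad_term, prev_delta)
```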

2–6 ) A summary of how the robot works in the neural network

3- Long Short-Term Memory network (Optimized by NCC)

This section discusses one of the most widely used approaches to coping with the difficulty of learning long-term dependencies: LSTM units. This problem remains an important challenge in deep learning, and other solutions exist, such as Echo State Networks or Leaky Units; one can refer to Goodfellow et al. (2016) for a general introduction.

We chose to discuss LSTM networks because they are among the most effective sequence models, together with the Gated Recurrent Unit (GRU). Both are gated recurrent neural networks.

The idea behind GRU and LSTM units is to create connections through time with a constant error flow, so that the gradient neither explodes nor vanishes. LSTM networks are explicitly designed to avoid the long-term dependency problem: remembering information for long periods of time is practically their default behavior, which is why they are the most popular neural network model for sequence learning.

3–1) LSTM architecture

The architecture of an LSTM is similar to that of an RNN; the difference is that, just as RNNs are supersets of MLPs, LSTM networks are supersets of recurrently connected subnets known as memory blocks (Graves; 2012). Instead of a single neural network layer, the basic memory block has three interacting elements:

· One (or more) memory cell sc, called the cell state, is the central feature. It is referred to as the constant error carousel (CEC) in Hochreiter and Schmidhuber (1997). The cell state undergoes only minor linear transformations, achieving a constant error flow through the memory block. It is a linear unit with a fixed recurrent self-connection. To control the cell state, gate cells are added to the memory cell(s).

· One multiplicative input gate unit is introduced to protect the current memory content stored in sc from being perturbed by previous states.

· One multiplicative output gate unit is introduced to protect subsequent units from being perturbed by currently irrelevant memory content.

These gates give the LSTM the ability to control the information flow in the cell state. They have a sigmoid activation function, which determines how much information to let through: they are closed when the activation is close to 0 and open when the activation is close to 1. Thus, the input gate decides when to keep or override information in the memory cell.

With this architecture, the cell state sc is updated based on its current state and three sources of input: netc, the input to the cell itself through the recurrent connection, and the inputs to the input and output gates (Gers et al.; 2000). At each time step, all units are updated during the forward pass, and the error signals for all weights are computed during the backward pass.

Although LSTMs have met with great success and numerous applications, Gers et al. (2000) identified a weakness when they process continual input streams without explicitly resetting the network state. In some cases, the cell state tends to grow linearly during learning, which can make the LSTM cell degenerate into an ordinary recurrent network where the gradient vanishes (Gers et al.; 2000). Gers et al. (2000) therefore proposed adding a forget gate to the memory block, which allows the block to reset itself thanks to a sigmoid activation function. From now on, we will only consider the extended LSTM with forget gates from Gers et al. (2000). Figure 1.1 illustrates such an LSTM cell.

Figure 1.1: LSTM cell with a forget gate, source: Graves (2013)

Let Xt be the observation at time t of the input vector; the LSTM cell from Figure 1.1 is implemented by the following equations (Graves; 2013):
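A standard form of these equations, following Graves (2013) with a forget gate and peephole connections (the bias vectors bi, bf, bc, bo are assumed, and ⊙ denotes element-wise multiplication), is:

it = σ(Wxi Xt + Whi ht-1 + Wci ct-1 + bi)

ft = σ(Wxf Xt + Whf ht-1 + Wcf ct-1 + bf)

ct = ft ⊙ ct-1 + it ⊙ tanh(Wxc Xt + Whc ht-1 + bc)

ot = σ(Wxo Xt + Who ht-1 + Wco ct + bo)

ht = ot ⊙ tanh(ct)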

where σ is the logistic sigmoid function, and i, f, o, c are respectively the input gate, forget gate, output gate and memory cell activations. Wxi, Whi, Wxf, Whf, Wcf, Wxc, Whc, Wxo, Who, Wco, whose indices are self-explanatory, are the input-to-input-gate weight matrix, the hidden-to-input-gate weight matrix, and so on.

3–2 ) Back-propagation for LSTM networks

Like the previous neural networks, LSTMs are trained with gradient descent, which requires the error gradient. In this section, we present the computation of the exact LSTM gradient with back-propagation through time (BPTT).

The forward and backward passes are calculated as in Section 1.2.2. For more details, the reader can refer to Graves (2012), from which the equations are taken.

Regularization techniques

Early stopping

We can see early stopping as a very efficient hyper-parameter selection algorithm, where the number of training steps (epochs) is just another hyper-parameter (Goodfellow et al.; 2016).

Early stopping is a method that stops training when a monitored quantity has stopped improving. Indeed, deep neural networks tend to overfit the data. Even though the training loss usually keeps decreasing, the test set error sometimes begins to increase after a certain number of training steps, which indicates overfitting. In that case, we obtain better generalization if we stop training before the test set error starts to increase.

With the early stopping method, we monitor, for example, the loss or a metric on the validation set after each training step, and we stop training if the loss starts to increase or if the accuracy starts to decrease (when accuracy is the metric used).

Early stopping is one of the most widely used regularization techniques in deep learning because of its simplicity and its effectiveness in reducing training time.
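A minimal sketch of early stopping with Keras (the library choice and the patience value are assumptions, not the robot's actual setup):

```python
from tensorflow import keras

# Stop training once the validation loss has not improved for 10 epochs,
# and restore the weights from the best epoch seen so far.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
)

# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           epochs=500,
#           callbacks=[early_stop])
```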

Dropout

The most common way to avoid overfitting in deep neural networks is to use a dropout layer, as proposed by Srivastava et al. (2014). The key idea is to randomly drop units (along with their connections) from the neural network during training, by forcing their outputs to zero, which reduces the effective number of parameters used at each training step.

Dropout can also be thought of as an effective bagging method for many large neural networks (Goodfellow et al.; 2016). Bagging involves training multiple models and evaluating them on each test example. With dropout, we effectively train an ensemble of all sub-networks that can be constructed by removing some units from an underlying base network.
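A minimal sketch of dropout layers in Keras (the dropout rate and layer sizes are illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),   # randomly zero out 50% of the units at each training step
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```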

Now that we have explained the different neural network architectures and have a reproducible, objective method to train and select the best model for a specific task, let us give examples of applications of deep neural networks to financial data.

Neural networks for time series forecasting

1–1 ) Cryptocurrencies as financial assets

The first cryptocurrency, Bitcoin, was created in 2008 by Satoshi Nakamoto. With Bitcoin, he invented the first unregulated digital currency designed to work as a medium of exchange, thanks to blockchain technology: a distributed ledger based on a decentralized peer-to-peer network that confirms transactions. Ten years later, 1512 alternative cryptocurrencies, called altcoins, were listed on CoinMarketCap, proving that a real cryptocurrency market has emerged. Indeed, the cryptocurrency market has experienced strong growth over recent years, which can be inferred from the CRIX index developed by Trimborn and Hardle (2016).

Because the economy is becoming more and more digital, it is natural to think that the role of digital assets, such as cryptocurrencies, in investment decisions will also grow. Indeed, investors from existing financial markets are now interested in cryptocurrencies, as new financial products appear on the cryptocurrency market with the arrival of options and futures on Bitcoin. Eisl et al. (2015) analyzed Bitcoin returns and showed that adding Bitcoin to portfolios can have an optimal diversification effect. Elendner et al. (2017) showed that cryptocurrencies are interesting for investors because of this diversification effect, since they are uncorrelated with each other and with traditional assets. Finally, Trimborn et al. (2017) showed that mixing cryptocurrencies with stocks could improve the risk-return trade-off of portfolio formation.

As in Elendner et al. (2017), we will investigate cryptocurrencies as alternative investment assets by studying their returns. In the following section, we will focus on applying the different models presented above to time series modeling, especially cryptocurrency prices.

Stock prices as time series

Stock price forecasting is one of the most important tasks in quantitative finance. It is particularly challenging because of the properties of stock prices, which do not behave as simple time series. Indeed, the random walk theory suggests that stock price returns are independently and identically distributed over time, so the past values of a stock's returns cannot explain their future values. The direct consequence is that the best prediction for tomorrow's price is today's price. This theory led to the Efficient Market Hypothesis (EMH), which asserts that stock prices reflect all relevant available information, implying that investors cannot outperform the market with a trading strategy based on decisions made from that information.

In short, fundamental analysis, which tries to evaluate the intrinsic value of a security, would be irrelevant to financial markets if the EMH were true. Technical analysis would also be irrelevant. As opposed to fundamental analysis, technical analysis aims to forecast future price movements based on statistics, such as technical market indicators (moving averages, Bollinger bands) or underlying variables such as price movement, volume, market capitalization and global economic variables.

Relying on technical analysis prevents us from using linear autoregressive models such as ARIMA to model price movements, because ARIMA models use only past observations of the time series itself as regressors and assume that the relationship is linear.

In recent years, the Efficient Market Hypothesis has lost credibility among economists, especially with the emergence of Behavioral Economics, which has revitalized fundamental and technical analysis.

For this work, we use both technical and fundamental analysis, and we assume that stock prices are represented by a Non-Linear AutoRegressive model with eXogenous variables (NLARX).
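As a hedged sketch of what an NLARX framing can look like in practice (the lag length and the choice of exogenous features are assumptions for illustration), the inputs can be built from lagged returns plus exogenous variables and then fed to a non-linear model such as the LSTM described above:

```python
import numpy as np

def make_nlarx_dataset(returns, exogenous, n_lags=5):
    """Build (X, y) pairs where X contains the last n_lags returns plus
    exogenous variables (e.g. volume, technical indicators) at time t,
    and y is the return at time t+1."""
    X, y = [], []
    for t in range(n_lags, len(returns) - 1):
        lagged = returns[t - n_lags:t]          # autoregressive part
        exo = exogenous[t]                      # exogenous part
        X.append(np.concatenate([lagged, exo]))
        y.append(returns[t + 1])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
returns = rng.normal(size=200)                  # placeholder return series
exogenous = rng.normal(size=(200, 3))           # e.g. volume plus two indicators
X, y = make_nlarx_dataset(returns, exogenous)
```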

