Activation functions of neural networks for energy load forecasting problems

A scientific group from the Moscow Power Engineering Institute conducted a study of the most common methods and approaches to using neural network software for forecasting loads on power generating facilities.

The first works on methods for forecasting the loads of energy systems appeared in the last quarter of the 20th century. The main disadvantage of regression and time series models based on statistical methods is the low level of detail of the resulting forecasts for power systems.

With the development of computing power of computer technology, models based on artificial neural networks (ANN) began to be used in solving problems of predicting energy loads. At first, these were expert systems using fuzzy sets, the so-called Fuzzy Expert Systems. Subsequent development led to the emergence of hybrid systems (expert system and neural network) and neural networks with fuzzy logic – Fuzzy Neural Networks (FNN).

Such approaches are promising because they make it possible to build a model of an object without a detailed description of it while maintaining sufficient adequacy of the model. The analysis also makes it possible to identify the factors that most affect the energy consumption of the object, to obtain their weighted contribution to the energy consumption process, and to calculate the probabilistic characteristics corresponding to various phenomena.

Why is the use of neural networks better than classical methods?

Existing analytical and statistical methods (including the fuzzy set method) are less effective in this case, because they cannot always describe the accumulated array of non-numeric statistical information. Evolutionary methods based on the Wiener filter, artificial neural networks, genetic algorithms, etc., have the advantage of searching for the optimal solution independently, with minimization of the prediction error used as the optimality criterion. Another advantage of ANNs is that they do not require the assumption that fluctuations in the characteristics of power systems follow the normal distribution of a random process. A further significant advantage is the ability to create a set of solutions in the form of subsystems obtained using G. Kron's tensor-topological method. Solutions found for individual subsystems can be stored and then used to determine solutions for more complex energy systems containing these subsystems connected in a wide variety of ways.

The set of solutions, for example in the presence of two free parameters, can be represented as a convex polyhedron in three-dimensional space. The process of solving the problem then looks like a sequential transition from one vertex of the polyhedron to another, with the value of the objective function obtained at each step approaching the optimum.

Graphical representation of the solution of the optimization problem

The following assumptions are accepted:

  • all variables are non-negative;

  • changes over time in the characteristics of installations associated with start-up, shutdown and transient processes are not taken into account, because transient modes occupy a small fraction of the operating time of the installations;

  • small changes in the characteristics of installations as a result of changing the mode of their operation are not taken into account, because the impact of these changes on the overall performance of the system is considered to be negligible.
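The picture described above corresponds to a linear programming formulation: with non-negative variables and linearized installation characteristics, the feasible set is a convex polyhedron and the optimum is reached at one of its vertices, which the solver visits in sequence. Below is a minimal sketch of such a problem; the cost coefficients, capacity limits and demand value are illustrative assumptions, not data from the study.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical example: two generating units with free parameters x1, x2 (MW).
# Objective: minimize fuel cost c1*x1 + c2*x2 subject to linear constraints.
c = np.array([3.2, 4.1])            # illustrative cost coefficients

# Linear constraints A_ub @ x <= b_ub (demand and capacity limits, illustrative).
A_ub = np.array([
    [-1.0, -1.0],                   # x1 + x2 >= 120  ->  -x1 - x2 <= -120 (demand)
    [ 1.0,  0.0],                   # x1 <= 100 (unit 1 capacity)
    [ 0.0,  1.0],                   # x2 <= 80  (unit 2 capacity)
])
b_ub = np.array([-120.0, 100.0, 80.0])

# Variables are non-negative by default (bounds=(0, None)), matching the assumptions above.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
print(res.x, res.fun)               # optimal vertex and objective value
```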

So what is the essence of the matter?

Artificial neural networks have found their widest application in operational and short-term load forecasting. An artificial neural network is a set of neural elements and connections between them. An ANN consists of formal neurons that perform a non-linear transformation of the sum, over all inputs, of the products of the input signals and the weight coefficients:

y = F(∑ wᵢ xᵢ) = F(W · X),

where X is the input signal vector, W is the weight vector, and F is a non-linear transformation operator.

The ANN apparatus does not impose any requirements on the forecasting procedure, but it does assume expert setting of the parameters of the object. It is important that the network parameters do not rely on the same input factors, as this leads to overfitting of the network. In addition, when the behavior of load consumption changes (a changepoint occurs), the prediction error of a neural network trained on previous time-series data increases significantly and often exceeds the allowable values.

To forecast the consumption of fuel and energy resources by medium- and large-scale objects, it is advisable to use multilayer ANNs based on Elman networks, feed-forward or cascade-forward networks, networks with a time delay, etc. To emulate ANNs on a conventional computer, simulator programs (neuropackages) are used, for example NeuroPro. Training of a neural network in the NeuroPro program is carried out according to the principle of dual functioning using one of the following optimization methods: gradient descent; the modified ParTan method; the conjugate gradient method.
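As a rough, library-agnostic illustration of this idea (a feed-forward multilayer network trained on past load values), here is a minimal sketch using scikit-learn; the lag structure, hidden-layer size and the synthetic load series are assumptions for illustration only, and scikit-learn's optimizers differ from the methods listed for NeuroPro.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic hourly load series (illustrative): daily cycle plus noise.
hours = np.arange(24 * 60)
load = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

# Lagged training set: predict the next hour from the previous 24 hours.
lags = 24
X = np.array([load[i:i + lags] for i in range(load.size - lags)])
y = load[lags:]

# Feed-forward network with one hidden layer and a sigmoidal activation,
# trained by a gradient-based optimizer (L-BFGS here).
model = MLPRegressor(hidden_layer_sizes=(25,), activation="logistic",
                     solver="lbfgs", max_iter=2000, random_state=0)
model.fit(X[:-100], y[:-100])

print("test MSE:", np.mean((model.predict(X[-100:]) - y[-100:]) ** 2))
```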

It is believed that for solving most problems in various branches of science and technology it is sufficient to use a two-layer perceptron network with one hidden layer and a sigmoidal activation function. However, feed-forward ANNs applied to forecasting fuel and energy resources exhibit several minima of the error criterion, observed when the number of neurons equals 25, 35, 45, 55 and 75 units. Single-layer neural networks are limited by the requirement that the training data be linearly separable. Therefore, the most common type of ANN is the multilayer perceptron, consisting of several sequentially connected layers of neurons that implement a vector function of several variables.

To estimate the optimal number of neurons in the hidden layers of the perceptron, a formula can be used that is a consequence of the Arnold-Kolmogorov-Hecht-Nielsen theorems:

Ny · Q / (1 + log₂ Q) ≤ Nw ≤ Ny · (Q / Nx + 1) · (Nx + Ny + 1) + Ny,

where Ny is the dimension of the output signal; Q is the number of elements of the training set; Nw is the required number of synaptic weights; Nx is the dimension of the input signal.
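A short calculation of these bounds for hypothetical dimensions (the values of Nx, Ny and Q below are chosen purely for illustration); the hidden-layer size is then commonly estimated as N = Nw / (Nx + Ny).

```python
import math

# Illustrative dimensions (assumptions, not values from the study).
Nx, Ny, Q = 24, 1, 1440   # input size, output size, number of training examples

# Bounds on the required number of synaptic weights Nw.
Nw_low = Ny * Q / (1 + math.log2(Q))
Nw_high = Ny * (Q / Nx + 1) * (Nx + Ny + 1) + Ny

# Commonly used corollary: number of neurons in the hidden layer.
N_low = Nw_low / (Nx + Ny)
N_high = Nw_high / (Nx + Ny)
print(f"Nw in [{Nw_low:.0f}, {Nw_high:.0f}], hidden neurons in [{N_low:.0f}, {N_high:.0f}]")
```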

A formal neuron, which reflects the basic properties of a biological neuron, is an elementary conversion element that has many inputs receiving signals, a summation block, a signal conversion block using a transfer (activation) function, and one output y. Each input has its own “weight”, and the neuron has a bias parameter.

Formal neuron model

The neuron operates in two cycles. At the first stage, the total excitation received by the neuron is calculated in the summation block:

S = w₁x₁ + w₂x₂ + … + wₙxₙ + θ = ∑ wᵢ xᵢ + θ

From the point of view of the implementation of the neuron model, the bias parameter θ is often represented as a single input:

S = ∑ wᵢ xᵢ  (i = 0, 1, …, n),  where x₀ = 1 and w₀ = θ

At the second stage, the total excitation is passed through the activation (transforming) function, as a result of which the output signal is determined.

y = F(S)
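The two stages described above fit in a few lines of code. This is a minimal sketch of a single formal neuron; the weights, the bias value and the choice of logsig as F are illustrative assumptions.

```python
import numpy as np

def logsig(s, alpha=1.0):
    """Sigmoidal activation function F with slope parameter alpha."""
    return 1.0 / (1.0 + np.exp(-alpha * s))

def formal_neuron(x, w, theta):
    """One formal neuron: summation block followed by the activation function."""
    s = np.dot(w, x) + theta   # stage 1: total excitation S = sum(w_i * x_i) + theta
    return logsig(s)           # stage 2: output y = F(S)

x = np.array([0.2, 0.7, 0.1])    # input signals (illustrative)
w = np.array([0.5, -0.3, 0.8])   # weights (illustrative)
print(formal_neuron(x, w, theta=0.1))
```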

The number of neurons in the hidden layer of a two-layer perceptron

A non-trivial task is to find the type of activation function that is optimal in terms of accuracy.

The sigmoid (logsig) is a monotonically increasing non-linear function. It is S-shaped, differentiable over its entire domain, and exhibits saturation.

A linear neuron is the simplest single-layer model, used, for example, to predict some continuous value:

y = ∑ wᵢ xᵢ + b

Graph of the linear neuron

ReLU (Rectified Linear Unit) is an improved linear neuron.

The function of such a neuron is:

f(x) = max(0, x)

Graph of the ReLU function

Typical LR functions (Gaussian, bell-shaped, triangular, trapezoidal) can also be used to specify fuzzy sets. The main advantage of these functions is their differentiability over the entire x-axis, which is required by ANN learning algorithms that rely on differentiation. The sigmoid has the property of amplifying weak signals while preventing saturation from strong signals, and is described by the function:

f(x) = 1 / (1 + e^(−αx)),

where α is the slope parameter of the function.

The disadvantage of the threshold activation function of the classical perceptron is that it is discontinuous and non-differentiable.

The hyperbolic tangent (tansig) is one of the types of sigmoid, all of whose values lie within the interval [–1; 1]:

f(x) = tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ).

Since

tanh(x) = 2 · logsig(2x) − 1,

tansig differs from logsig only by a rescaling along the axes:

Main types of activation function: a – logsig; b – tansig; c – purelin
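The relationship between tansig and logsig given above is easy to verify numerically; a quick check over an arbitrary grid of points:

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)
# tanh(x) and 2*logsig(2x) - 1 coincide up to floating-point error.
print(np.allclose(np.tanh(x), 2 * logsig(2 * x) - 1))   # True
```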

Purelin is a type of linear function, most often used as the activation function of input or output neurons. Another variation of the linear activation function is the saturating linear transfer function. Its major drawback is that it cannot be differentiated over the entire numerical axis, which narrows its applicability.

Softplus is an analytical approximation of ReLU. Its advantage is differentiability (in contrast to the ReLU function). The function has the form:

f(x) = ln(1 + eˣ)

Graph of the Softplus function
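For reference, the activation functions discussed above can be written compactly in NumPy; the slope parameter α and the test point are illustrative.

```python
import numpy as np

def logsig(x, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * x))   # sigmoid with slope parameter alpha

def tansig(x):
    return np.tanh(x)                          # hyperbolic tangent, values in [-1, 1]

def purelin(x):
    return x                                   # linear transfer function

def relu(x):
    return np.maximum(0.0, x)                  # rectified linear unit

def softplus(x):
    return np.log1p(np.exp(x))                 # smooth approximation of ReLU

x = 0.5
for f in (logsig, tansig, purelin, relu, softplus):
    print(f.__name__, f(x))
```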

It is advisable to use logsig as the activation function in the first layer, because it eliminates the problem of noise penetrating into the second layer. In the inner layers, it is advisable to use tansig. The resulting neural network can be trained with the error backpropagation algorithm, which determines the strategy for fitting the weights of a multilayer network using gradient optimization methods. It is based on an objective function formulated as the quadratic sum of the differences between the actual and expected values of the output signals.
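A minimal backpropagation sketch with the quadratic objective described above: for brevity it uses a single logsig hidden layer and a linear output rather than the exact layer configuration recommended in the text, and the layer sizes, learning rate and synthetic data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic regression data (illustrative): 24 inputs -> 1 output.
X = rng.normal(size=(200, 24))
y = np.sin(X.sum(axis=1, keepdims=True) / 4.0)

n_in, n_hidden, n_out = 24, 25, 1
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out)); b2 = np.zeros(n_out)
lr = 0.05

for epoch in range(2000):
    # Forward pass: logsig hidden layer, linear output.
    h = logsig(X @ W1 + b1)
    out = h @ W2 + b2

    # Quadratic objective: E = 0.5 * sum((out - y)**2); its gradient w.r.t. out is the error.
    err = out - y

    # Backward pass: gradients of E with respect to the weights.
    grad_W2 = h.T @ err
    grad_b2 = err.sum(axis=0)
    grad_h = err @ W2.T
    grad_s1 = grad_h * h * (1.0 - h)   # derivative of logsig
    grad_W1 = X.T @ grad_s1
    grad_b1 = grad_s1.sum(axis=0)

    # Gradient descent step (averaged over the training set).
    W1 -= lr * grad_W1 / len(X); b1 -= lr * grad_b1 / len(X)
    W2 -= lr * grad_W2 / len(X); b2 -= lr * grad_b2 / len(X)

print("final MSE:", float(np.mean(err ** 2)))
```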

The main advantage of neural network methods for forecasting electrical loads is the absence of any requirements on the input data. The main drawback is that an ANN works like a “black box”, i.e. it does not allow the forecasting results to be interpreted in a form understandable to an expert.

When solving problems of forecasting energy loads, part of the interdependencies between the analyzed variables can be described analytically with a minimum set of assumptions. In this case, it is advisable to use either an ANN with a rule layer and a time delay, or a Sugeno network with a time delay and several types of inputs.
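As a rough illustration of the Sugeno (Takagi–Sugeno) idea of combining a fuzzy rule layer with analytically described dependencies, here is a minimal first-order Sugeno inference for a single input; the membership functions and the linear rule consequents are illustrative assumptions, not the authors' model.

```python
import numpy as np

def gauss(x, c, sigma):
    """Gaussian membership function of a fuzzy set."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno_predict(x):
    # Rule layer: firing strengths of two fuzzy rules ("low load", "high load").
    w_low = gauss(x, c=0.2, sigma=0.15)
    w_high = gauss(x, c=0.8, sigma=0.15)

    # First-order Sugeno consequents: analytically given linear dependencies.
    y_low = 0.5 * x + 0.1
    y_high = 1.8 * x - 0.3

    # Defuzzification: weighted average of the rule outputs.
    return (w_low * y_low + w_high * y_high) / (w_low + w_high)

print(sugeno_predict(0.35))
```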

The authors of the material: Guzhov S.V., Varshavsky P.R., Bashlykov M.S., Torop D.V.

The work was carried out within the framework of the project “Development of neural network software for forecasting the demand for thermal energy by objects of mass construction in the city of Moscow” with the support of a grant from the National Research University “Moscow Power Engineering Institute” for the implementation of the research program “Priority 2030: Future Technologies” in 2022-2024.
