Neural stack (neural stack machine)
This article briefly discusses the architecture of a neural stack.
The neural stack is one of the models of neural networks with external memory.
You can read more about neural networks with external memory here.
Part 1: What is a neural stack?
The basic idea is to train a recurrent neural network (specifically an LSTM) to manipulate a stack of numbers.
The model is trained with backpropagation; since the network is recurrent, the variant used is backpropagation through time (BPTT).
The authors of this model solved the problem of teaching a neural network to push numbers onto a stack and pop them off.
In the original article the authors consider not only the neural stack, but also the neural queue and the neural DeQue (a double-ended queue).
Let the external memory (the neural stack) be a differentiable structure into and out of which real-valued vectors are pushed and popped. The discrete operations push and pop are deliberately made continuous by giving each of them a real-valued strength taking values in the interval (0, 1).
Intuitively, these values can be interpreted as the degree of certainty with which the controller wants to push a value onto the stack or pop the top of the stack.
Formally, a stack fully parameterized by an embedding size m is described at time step t by a value matrix V_t ∈ R^(t×m) and a strength vector s_t ∈ R^t.
This pair forms the core of the recurrent layer, which is acted upon by the controller. At each step the neural stack receives from the controller a value v_t ∈ R^m, a pop signal u_t ∈ (0, 1), and a push signal d_t ∈ (0, 1).
The stack, in turn, outputs a read vector r_t ∈ R^m.
The recurrence of this layer comes from the fact that it receives the previous state (V_{t−1}, s_{t−1}) as input and outputs the pair (V_t, s_t) as the next state, following the dynamics described below.
Here V_t[i] denotes the i-th row of V_t (an m-dimensional vector) and s_t[i] denotes the i-th entry of s_t.
Equation 1:
V_t[i] = V_{t−1}[i] for 1 ≤ i < t,    V_t[t] = v_t
Equation 1 shows the update of the value matrix V_t of the recurrent layer. The number of rows grows by one at each time step.
The values pushed onto the stack at each time step are retained, whether or not they are logically still on the stack. New values are appended as the last row of the matrix (the top of the stack) and are never modified afterwards.
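A minimal sketch of Equation 1 in NumPy (the function name is my own, for illustration): the old rows are untouched and the new value simply becomes the last row.

```python
import numpy as np

def update_values(V_prev, v_t):
    """Equation 1: rows of V are never modified; the pushed value
    v_t is appended as the last row (the top of the stack)."""
    # V_prev: (t-1, m) value matrix, v_t: (m,) vector pushed at step t
    return np.vstack([V_prev, v_t[None, :]])

V = np.empty((0, 2))                       # empty stack, embedding size m = 2
V = update_values(V, np.array([1.0, 0.0]))  # push v_1
V = update_values(V, np.array([0.0, 1.0]))  # push v_2
```

After two pushes V has shape (2, 2), and its first row still equals v_1.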
Equation 2:
s_t[i] = max(0, s_{t−1}[i] − max(0, u_t − Σ_{j=i+1}^{t−1} s_{t−1}[j])) for 1 ≤ i < t,    s_t[t] = d_t
Equation 2 shows how the push signal d_t and the pop signal u_t update the strength vector to produce s_t.
The pop operation is applied first and removes items from the stack.
The pop signal u_t can be viewed as the initial deletion quantity for the operation.
a) We traverse the strength vector s_{t−1} from the highest index down to the lowest.
b) If the next strength scalar is less than the remaining deletion quantity, it is subtracted from the remaining quantity and the scalar is set to 0.
c) If the remaining deletion quantity is less than the next strength scalar, the remaining quantity is subtracted from that scalar and deletion stops.
Then the strength of the value added at the current time step is set to the push signal d_t.
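The steps above can be sketched directly from Equation 2 (a hedged illustration; the helper name is my own). For each old row, the strength sitting above it absorbs the pop quantity first:

```python
import numpy as np

def update_strengths(s_prev, u_t, d_t):
    """Equation 2: a pop of strength u_t consumes strength from the
    top of the stack downward, then the new value is pushed with
    strength d_t."""
    t_minus_1 = len(s_prev)
    s_new = np.empty(t_minus_1 + 1)
    for i in range(t_minus_1):
        # total strength sitting above row i (closer to the top)
        above = s_prev[i + 1:].sum()
        s_new[i] = max(0.0, s_prev[i] - max(0.0, u_t - above))
    s_new[-1] = d_t  # strength of the newly pushed value
    return s_new

s = update_strengths(np.array([0.5, 0.4]), u_t=0.6, d_t=0.9)
# the pop of 0.6 wipes out the top strength 0.4 and takes 0.2
# from the row below it, then 0.9 is pushed on top
```

Here the pop of 0.6 first consumes the top strength 0.4 entirely, then removes the remaining 0.2 from the strength 0.5 below it, leaving [0.3, 0.0, 0.9].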
Equation 3:
r_t = Σ_{i=1}^{t} min(s_t[i], max(0, 1 − Σ_{j=i+1}^{t} s_t[j])) · V_t[i]
Equation 3 shows the dynamics of the read operation, which is similar to the pop operation.
A fixed initial read quantity equal to 1 is set at the top of a temporary copy of the strength vector s_t.
The temporary vector is traversed from the highest index to the lowest.
If the next strength value is less than the remaining read quantity, it is stored as that row's read weight and subtracted from the remaining read quantity.
Otherwise, the row's read weight is temporarily set equal to the remaining read quantity, and the strengths at all lower indices are treated as 0.
The result r_t of the read operation is the weighted sum of the rows of V_t, scaled by the temporary weights produced during the traversal.
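The read traversal can be sketched as a loop that spends one "unit" of read quantity from the top of the stack downward (function name mine, equivalent to Equation 3):

```python
import numpy as np

def read(V, s):
    """Equation 3: read one unit of strength from the top of the
    stack down, returning a weighted sum of the rows of V."""
    r = np.zeros(V.shape[1])
    remaining = 1.0                       # fixed initial read quantity
    for i in reversed(range(len(s))):     # top of the stack first
        w = min(s[i], max(0.0, remaining))  # this row's read weight
        r += w * V[i]
        remaining -= s[i]
    return r

r = read(np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.3, 0.9]))
# the top row contributes weight 0.9, leaving only 0.1 of the
# read quantity for the row beneath it
```

With strengths [0.3, 0.9], the top row is read with weight 0.9 and the bottom row with only the leftover 0.1, giving r = [0.1, 0.9].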
An example of computing the read from the stack over three time steps, after pushes and pops as described above, is shown in Figure 1a.
The third step shows how a value can be logically deleted from the stack (its strength driven to 0) and how it is then ignored during the read.
This completes the description of the forward dynamics of the neural stack, represented as a recurrent layer, as shown in Figure 1b. All operations described in this section are differentiable. The equations describing the backward dynamics are given below.
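The full forward dynamics can be collected into a single recurrent step (a sketch under the definitions above; the function name is my own):

```python
import numpy as np

def neural_stack_step(V, s, v_t, u_t, d_t):
    """One forward step of the neural stack (Equations 1-3).
    V: (t-1, m) values, s: (t-1,) strengths,
    v_t: (m,) pushed value, u_t/d_t: pop/push signals in (0, 1)."""
    # Eq. 1: append the new value; old rows are never changed
    V = np.vstack([V, v_t[None, :]])
    # Eq. 2: pop consumes strength from the top down, then push d_t
    s_new = np.empty(len(s) + 1)
    for i in range(len(s)):
        above = s[i + 1:].sum()
        s_new[i] = max(0.0, s[i] - max(0.0, u_t - above))
    s_new[-1] = d_t
    # Eq. 3: read one unit of strength from the top down
    r, remaining = np.zeros(V.shape[1]), 1.0
    for i in reversed(range(len(s_new))):
        r += min(s_new[i], max(0.0, remaining)) * V[i]
        remaining -= s_new[i]
    return V, s_new, r

V, s = np.empty((0, 2)), np.empty(0)
V, s, r = neural_stack_step(V, s, np.array([1.0, 0.0]), 0.0, 0.8)
V, s, r = neural_stack_step(V, s, np.array([0.0, 1.0]), 0.1, 0.5)
```

After the second step the strengths are [0.7, 0.5]: the pop of 0.1 weakened the first value, and the read blends both rows because less than one unit of strength sits on top.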
The architectures of the neural queue and the neural DeQue may be discussed in a later article.
Analysis of the backward dynamics of the neural stack
Here we will look at the partial derivatives of the output with respect to the input.
We will use the Kronecker delta δ_ij (δ_ij = 1 if i = j, 0 otherwise).
The equations below hold for all valid row indices i and j.
All partial derivatives other than those obtained using the chain derivative rule can be taken to be 0.
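Since the forward dynamics are built from min, max, and sums, they are differentiable almost everywhere, and the analytic derivatives can be sanity-checked numerically. A small sketch (all names mine): compare a finite-difference gradient of the read weights from Equation 3 against what the piecewise-linear structure predicts, at a point away from the kinks.

```python
import numpy as np

def read_weights(s):
    # weights from Equation 3: w_i = min(s_i, max(0, 1 - sum_{j>i} s_j))
    return np.array([min(s[i], max(0.0, 1.0 - s[i + 1:].sum()))
                     for i in range(len(s))])

def num_grad(f, x, eps=1e-6):
    """Central finite differences of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

s = np.array([0.3, 0.5, 0.9])
# scalar objective: read weights dotted with an arbitrary vector
g = num_grad(lambda s: read_weights(s) @ np.array([1.0, 2.0, 3.0]), s)
```

At this point the weights are [0, 0.1, 0.9]: increasing the top strength s_3 adds weight to row 3 and removes it from row 2, so the gradient is [0, 0, 3 − 2] = [0, 0, 1], which the finite-difference check reproduces.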
Initialization of model parameters
The original article reports that the model achieves its best results when the bias for the pop operation is initialized to a negative number.
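A hedged illustration of why a negative bias helps (the variable names and the value −1.0 are my own, not from the paper): if the pop signal is produced by a sigmoid, a negative bias keeps u_t small at the start of training, so the network initially pushes more than it pops.

```python
import numpy as np

# hypothetical bias of the controller's pop output unit;
# -1.0 is an illustrative value, not taken from the paper
b_pop = -1.0

# with near-zero weights early in training, the pop signal is
# approximately sigmoid(bias), which is well below 0.5
u_t = 1.0 / (1.0 + np.exp(-b_pop))
```

With b_pop = −1 the initial pop signal is about 0.27, so early in training values placed on the stack mostly stay there.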
Model results
Let's consider the task of printing a string in reverse order.
Example:
As an example of input and target data, let '$' be the start symbol, '|' a separator, and '&' a terminator. The input might be:
$A string to reverse|
and the expected result will be as follows:
esrever ot gnirts A&
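Such (input, target) pairs are easy to generate; a minimal sketch using the markers from the example above (the helper name is my own):

```python
def make_example(text):
    """Builds one (input, target) pair for the reversal task, using
    '$' as start marker, '|' as separator, '&' as terminator."""
    return "$" + text + "|", text[::-1] + "&"

inp, tgt = make_example("A string to reverse")
# inp == "$A string to reverse|"
# tgt == "esrever ot gnirts A&"
```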