Data Modeling for Recurrent Neural Networks (RNN)

Parveen Khurana
4 min read · Feb 7, 2022

In the last article, we discussed the Data and Task jars for sequence labeling problems. In this article, we touch upon the data modeling aspect for sequence classification and sequence labeling problems.

Data Modeling

A model is an approximation of the true relationship that exists between the input and the output. The only difference in modeling for sequence learning problems, compared with other problems, is that there is not a single input but “a series of inputs”.

In a sequence classification problem, the output depends on the entire sequence. For example, in a sentiment analysis problem, the output depends on all the words in the sequence and not just the first word or the last word.

So, the objective is to come up with a function such that the final output is a function of all the inputs, and a recurrent neural network essentially serves this purpose.

In a Fully Connected Neural Network (FCNN), by contrast, the output depends on a single input.

There, “y_hat” is computed from that one input in a single forward pass through the network (complete details are covered in a separate article).
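
As a rough sketch of what such equations look like, assume a network with one hidden layer. The symbols U, V, b, c, h and the output function O are chosen here for illustration and are not taken from the source:

```latex
\begin{aligned}
h &= \sigma(U x + b) \\
\hat{y} &= O(V h + c)
\end{aligned}
```

Here x is the single input, \sigma is the sigmoid non-linearity, and O is an output function such as softmax.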

The model’s equation for a Recurrent Neural Network (RNN) is very similar to the equation for an FCNN. The only change in the case of an RNN is that the output also depends on the hidden state (“si”) in addition to the input, and the hidden state inherently depends on the previous hidden state by way of the design of the equations.
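
One common way to write this linkage is sketched below. The notation is assumed: U, W, V, b, c and the output function O are illustrative symbols, not taken from the source.

```latex
\begin{aligned}
s_i &= \sigma(U x_i + W s_{i-1} + b) \\
\hat{y}_i &= O(V s_i + c)
\end{aligned}
```

The only structural addition over a feed-forward network with one hidden layer is the term W s_{i-1}, which feeds the previous hidden state into the current one. The initial state s_0 is an assumption as well, often taken to be a vector of zeros.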

Therefore, the final “y_hat” inherently depends on the final hidden state (“sT”), which by design keeps accumulating information from all the previous hidden states (s1, s2, s3, …, s(T-1)).

The way the final output is computed is slightly more complex than in a feed-forward neural network, where one forward pass gives the output. Here, the hidden states are computed one time step after another, each building on the previous one, so the state at the final time step inherently carries information from all the previous steps.

This is going to be a long computation because there are so many non-linearities: first we compute the sigmoid over “x1” (after multiplying it by a weight matrix) to get “s1”, then in the next step we compute the sigmoid over “s1” and “x2” after multiplying each by its weight matrix, and so on.
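
The chain of sigmoids described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the parameter names (U, W, V, b, c), the zero initial state, the softmax output, and the toy dimensions are all hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def rnn_classify(xs, U, W, b, V, c):
    """Sequence classification: y_hat is computed only from the final state s_T."""
    s = np.zeros(W.shape[0])            # s_0: assumed all-zero initial state
    for x in xs:                        # one hidden-state update per time step
        s = sigmoid(U @ x + W @ s + b)  # s_i depends on x_i and s_(i-1)
    return softmax(V @ s + c)           # probability distribution over classes

# Toy sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim, num_classes, T = 4, 3, 2, 5
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
V = rng.normal(size=(num_classes, hidden_dim))
c = np.zeros(num_classes)
xs = rng.normal(size=(T, input_dim))    # a sequence of T input vectors

y_hat = rnn_classify(xs, U, W, b, V, c)
print(y_hat.shape)  # (2,)
```

Because every update reuses the previous state, the loop cannot be collapsed into a single matrix product; the T steps have to be evaluated in order.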

Data Modeling for Sequence Labeling

Here the idea is to compute the output at every time step

In the case of an FCNN, the output at each time step would depend only on the input for that time step.

Here as well, each output depends on the previous inputs and not just the current input, and a recurrent neural network serves this purpose. The difference is that, in addition to the final y_hat, an output is computed at every time step.

The output at any time step depends on the current input and the current hidden state, which in turn depends on the previous hidden state (and that hidden state depends on the inputs before it).

This same equation is applied again and again at each time step, and each output is available as a probability distribution.

So, y1_hat, y2_hat, and so on each depend on the current hidden state, and that hidden state depends on the previous hidden state as well as the current input.
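
For sequence labeling, the same recurrence can be sketched with one output emitted per time step. As before, this is only an illustrative sketch: the parameter names (U, W, V, b, c), the zero initial state, the softmax output, and the toy sizes are assumptions, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def rnn_label(xs, U, W, b, V, c):
    """Sequence labeling: return one probability distribution per time step."""
    s = np.zeros(W.shape[0])                 # s_0: assumed all-zero initial state
    outputs = []
    for x in xs:
        s = sigmoid(U @ x + W @ s + b)       # current state from x_i and s_(i-1)
        outputs.append(softmax(V @ s + c))   # y_i_hat from the current state
    return np.stack(outputs)                 # shape: (T, num_labels)

# Toy sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim, num_labels, T = 4, 3, 2, 5
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
V = rng.normal(size=(num_labels, hidden_dim))
c = np.zeros(num_labels)
xs = rng.normal(size=(T, input_dim))         # a sequence of T input vectors

y_hats = rnn_label(xs, U, W, b, V, c)
print(y_hats.shape)  # (5, 2)
```

The same U, W, V, b, and c are reused at every step; only the hidden state changes as the sequence is read.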

So, the main message here is that this is still a model that tries to approximate the relation between the input and the output, and the output would either be computed at each time step or just at the final time step depending on the task at hand.

References: PadhAI