Learning Algorithm - Recurrent Neural Networks (RNNs)

Learning Algorithm

One question at this point would be “can the gradient descent algorithm be used in the context of a recurrent neural network?”

In addition to the weights “W” and “V”, there would be bias terms as well.

Total No. of Parameters

Let’s start with “U”.

Input dimension
Dimension of U
Dimension of W
Dimension of V
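
The count can be sketched in a few lines of Python. The symbols n (input dimension), d (state dimension) and k (output dimension), the shapes d x n for U, d x d for W and k x d for V, and the two bias vectors are assumptions made for illustration; the figures from the original post are not available here. The function name is hypothetical.

```python
def rnn_parameter_count(n, d, k):
    """Count the parameters of a vanilla RNN under the assumed shapes."""
    u = d * n  # U maps the n-dimensional input to the d-dimensional state
    w = d * d  # W maps the previous state to the current state
    v = k * d  # V maps the state to the k-dimensional output
    b = d      # bias added to the state pre-activation
    c = k      # bias added to the output pre-activation
    return {"U": u, "W": w, "V": v, "b": b, "c": c,
            "total": u + w + v + b + c}


# Example sizes chosen only for illustration.
counts = rnn_parameter_count(n=10, d=20, k=5)
print(counts["total"])  # 725
```

Note that the number of time steps does not appear anywhere in the count: the same U, W and V are reused at every step.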

Learning Algorithm — Derivatives w.r.t. V

Let’s briefly look at how the derivatives would be computed in the backpropagation step of the learning algorithm.

a₂ in the above image refers to the pre-activation at layer 2
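
As a hedged sketch of what this derivative looks like for an RNN, assume a softmax output with cross-entropy loss at each time step. The symbols below are assumptions introduced for illustration: s_t is the state at step t, o_t the output pre-activation, c the output bias, \ell_t the index of the true class at step t, and e_{\ell_t} the corresponding one-hot vector.

```latex
\begin{aligned}
o_t &= V s_t + c, \qquad \hat{y}_t = \mathrm{softmax}(o_t), \qquad
\mathcal{L}_t = -\log \hat{y}_{t,\ell_t} \\
\frac{\partial \mathcal{L}_t}{\partial o_t} &= \hat{y}_t - e_{\ell_t} \\
\frac{\partial \mathcal{L}_t}{\partial V} &= (\hat{y}_t - e_{\ell_t})\, s_t^{\top} \\
\frac{\partial \mathcal{L}}{\partial V} &= \sum_{t=1}^{T} (\hat{y}_t - e_{\ell_t})\, s_t^{\top}
\end{aligned}
```

Under these assumptions the state s_t does not depend on V, so each time step contributes one term and no recursion through earlier steps is needed.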

Learning Algorithm — Derivatives w.r.t. W

This is what the network looks like: the model predicts an output at each time step, and the loss is also computed at each time step using the cross-entropy loss function.

Here, 4 represents the number of time steps.
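
Because W is used at every time step, the loss at a given step depends on W through the current state and through every earlier state. A minimal NumPy sketch of backpropagation through time is given below. It is not the original post's code, and it assumes a tanh state activation, a softmax output, an all-zero initial state, and random example data; all function and variable names are hypothetical. A finite-difference check on one entry of W is included as a sanity test.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()


def forward(U, W, V, b, c, xs, labels):
    """Run the RNN over all steps; return total loss, states, predictions."""
    states = [np.zeros(W.shape[0])]  # s_0 is assumed to be all zeros
    preds, loss = [], 0.0
    for x, label in zip(xs, labels):
        s = np.tanh(U @ x + W @ states[-1] + b)
        y_hat = softmax(V @ s + c)
        loss += -np.log(y_hat[label])  # cross-entropy at this time step
        states.append(s)
        preds.append(y_hat)
    return loss, states, preds


def grad_W(U, W, V, b, c, xs, labels):
    """Gradient of the summed loss w.r.t. W via backpropagation through time."""
    _, states, preds = forward(U, W, V, b, c, xs, labels)
    dW = np.zeros_like(W)
    ds_next = np.zeros(W.shape[0])  # gradient arriving from later steps
    for t in reversed(range(len(xs))):
        d_out = preds[t].copy()
        d_out[labels[t]] -= 1.0              # softmax + cross-entropy gradient
        ds = V.T @ d_out + ds_next           # this step's loss plus later steps
        da = ds * (1.0 - states[t + 1] ** 2)  # back through tanh
        dW += np.outer(da, states[t])        # W multiplied the previous state
        ds_next = W.T @ da                   # pass gradient to the earlier step
    return dW


# Random example data with 4 time steps (sizes chosen only for illustration).
rng = np.random.default_rng(0)
n, d, k, T = 3, 4, 2, 4
U = rng.normal(size=(d, n))
W = rng.normal(size=(d, d))
V = rng.normal(size=(k, d))
b, c = np.zeros(d), np.zeros(k)
xs = rng.normal(size=(T, n))
labels = [0, 1, 1, 0]

analytic = grad_W(U, W, V, b, c, xs, labels)

# Central finite difference on a single entry of W as a sanity check.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (forward(U, W_plus, V, b, c, xs, labels)[0]
           - forward(U, W_minus, V, b, c, xs, labels)[0]) / (2 * eps)
print(abs(numeric - analytic[1, 2]))
```

The printed difference should be very small if the analytic gradient is right. The backward loop walks from the last step to the first, carrying `ds_next` so that each step's contribution to the gradient includes the losses of all later steps.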


