# How Do LSTMs Solve the Problem of Vanishing Gradients?

This article covers the content discussed in the Vanishing and Exploding Gradients and LSTMs module of the Deep Learning course offered on the website: https://padhai.onefourthlabs.in

We saw in the case of RNNs that, while back-propagating, gradients might vanish or explode. From there, we moved on to the concepts of LSTMs and GRUs, which use selective read, write, and forget operations to pass only the relevant information on to the state vector.

The gates regulate/control the flow of the information in LSTMs.

**Intuition: How gates help to solve the problem of vanishing gradients**

During forward propagation, gates control the flow of the information. They prevent any irrelevant information from being written to the state.

Similarly, during backward propagation, they control the flow of the gradients: during the backward pass, the gradients get multiplied by the gate values.

Let’s consider the following output gate:

We can write the hidden state **ht** as:

**ht = (st)*(ot) (Equation 1)**

Now suppose we wish to compute the derivative of the loss function with respect to **W**. At some point in that derivative, we would encounter the derivative of **ht** with respect to **st** (since we are back-propagating, and **ht** is obtained by multiplying **st** with the gate **ot**). From **Equation 1**, this derivative is simply **ot**. So, somewhere in the chain of derivatives (while computing the derivative of the loss with respect to **W**), we have a multiplication by **ot**.
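The gate-as-multiplier idea above can be sketched numerically (the state and gate values below are assumed toy numbers, and the nonlinearity is ignored as in Equation 1):

```python
import numpy as np

# Toy illustration: with h_t = s_t * o_t (elementwise, nonlinearity omitted
# as in Equation 1), the Jacobian dh_t/ds_t is just diag(o_t), so any
# gradient flowing back through h_t gets multiplied by the gate o_t.
s_t = np.array([0.5, -1.2, 0.8])
o_t = np.array([0.9, 0.1, 0.0])   # gate values lie in [0, 1]

h_t = s_t * o_t

# Suppose the loss gradient arriving at h_t is dL/dh_t:
dL_dh = np.array([1.0, 1.0, 1.0])

# Chain rule: dL/ds_t = dL/dh_t * dh_t/ds_t = dL/dh_t * o_t
dL_ds = dL_dh * o_t
print(dL_ds)  # components with a near-zero gate receive almost no gradient
```

Note how the third component, whose gate is 0, receives no gradient at all.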

And the same holds for the other gates as well; somewhere in the chain of derivatives, these gates appear.

Just as in forward propagation, where quantities are multiplied by gates that decide how much information passes through, during backward propagation the gradients are also multiplied by the gates. The gates thus control the backward flow of information as well, deciding how much of the gradient flows back.

Let's say we write **s(t-1)** as **s1** and **st** as **s2** (that is, **t** = 2 in this case).

If **s1** did not contribute much to the state **s2**, that means the values of the output gate **o(t-1)** and the forget gate **ft** would be close to 0, as these two gates are applied to **s(t-1)** (**s1** in this case), as highlighted in the red box in the image below:

So, if both the gates, i.e., the output gate **o(t-1)** and the forget gate **ft**, are close to 0, then we can say that **s(t-1)** (**s1** in this case) did not contribute much to **st** (**s2** in this case).

Now, during backpropagation, the gates **ft** and **o(t-1)** would show up somewhere in the chain rule, and since both of these quantities are close to 0, the overall gradient would go to 0 and we would have a vanishing gradient problem.

The key difference from the vanilla RNN is that both the flow of information and the flow of gradients are controlled by the gates. Since **ft** and **o(t-1)** are close to 0 here, **s1** did not contribute anything to **s2** during the forward pass itself. Holding **s1** responsible for errors in the loss therefore makes no sense: its information was blocked by the gates and never carried forward, so the gradients do not flow back to it either. The forward and backward passes are synchronized in this way: **because the gates did not allow the information to flow forward, the gradient does not flow backward**. This kind of vanishing is acceptable and does not hurt the model; **if a state did not contribute in the forward pass, it needs no feedback in the backward pass**.
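This synchrony can be illustrated with a tiny numeric sketch (the scalar values below are assumptions, using the standard cell-state update **st = ft · s(t-1) + it · st(~)**):

```python
# Simplified scalar cell-state update: s_t = f_t * s_prev + i_t * s_tilde.
# If the forget gate f_t is near 0, s_prev barely affects s_t in the forward
# pass, and d s_t / d s_prev = f_t is equally near 0 in the backward pass.
f_t = 0.01            # forget gate close to 0
s_prev = 5.0
i_t, s_tilde = 0.8, 0.3

s_t = f_t * s_prev + i_t * s_tilde   # forward: s_prev contributes only 0.05
grad = f_t                           # backward: d s_t / d s_prev = f_t

print(s_t, grad)  # both the forward contribution and the gradient are tiny
```

The same small gate value appears in both directions, which is exactly the "no contribution forward, no feedback backward" argument.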

**Revisiting RNNs:**

Let's say we want to compute the **gradient** of the **loss function** with respect to **W**. As discussed in previous articles, it is the sum of the derivatives of the loss with respect to **W** over all possible paths. Because of this summation over paths, the overall gradient vanishes only if the gradients along **all** of these paths vanish, while it explodes if the gradient along **any one** path explodes.
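This all-paths/any-path asymmetry can be shown with toy numbers (the per-step factors below are assumptions, purely for illustration):

```python
import numpy as np

# The total gradient is a sum of per-path products of local derivatives.
# It vanishes only if EVERY path's product vanishes, and explodes if ANY
# single path's product explodes.
paths = [
    [0.1, 0.1, 0.1],   # a vanishing path: product ~ 1e-3
    [0.9, 0.8, 0.7],   # a healthy path
    [3.0, 3.0, 3.0],   # an exploding path: product 27
]
per_path = [np.prod(p) for p in paths]
total = sum(per_path)
print(per_path, total)  # one large path dominates the sum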

**Dependency Diagram for LSTMs**

The equations involved in the LSTM are as follows:

In the above equations, we have two inputs, **h(k-1)** and **s(k-1)**, used to compute the new states **sk** and **hk**. **sk(~)** is just an intermediate (temporary) state, not an actual input, and is computed from **h(k-1)**. **xk** is also present, but we will not consider it here, just as in the dependency diagram of the RNN, where we ignored the input x and showed all the computations in terms of the state alone.

The same structure repeats at subsequent time steps: the dependency diagram at any time step looks exactly the same as at any other, with only the subscript of the state vector changing, i.e., at the **1st time step** we have **s1**, **h1**; at the **2nd time step**, **s2**, **h2**; and so on.

We will focus on one of the weights; the same argument holds for the other weights as well. Let's consider the weight **Wf** (shown in red in the image below).

We are interested in whether the gradient flows to **Wf** through **sk** (the same scenario as with RNNs: if **Wf** was wrong, then **sk** was wrong and hence the loss was high, so this information needs to be passed back to **Wf** through **sk**).

It is enough to show that the gradient flows up to **sk**, because **Wf** is only two steps away from **sk**, whereas reaching **sk** itself requires traversing many time steps. So we just need to ensure that the gradient flows up to **sk**.

Now, from **L(theta)** to **sk**, we have multiple paths, so the overall gradient of **L(theta)** with respect to **sk** is the sum of the derivatives along all of them. To show that the overall gradient does not vanish, it is therefore sufficient to show that the gradient does not vanish along just one of these paths.

So, we have taken one such path (highlighted in blue nodes in the image below); let's call the gradient along that path **t0**. This **t0** would look like:

We will not worry much about the highlighted part (in the image below), as it is directly connected to **ht**, and along a single path like this the gradient may well survive. The derivative of **ht** with respect to **st** would also be okay; our main concern is the part shown in blue in the image below, as that part is again a product of multiple terms.

We have **ht** as the following:

So, **ht** is given by the output gate multiplied (elementwise) with the **sigmoid** of **st**.

Every element of **ht** depends only on one element of **ot** and one element of **st**, so we would have:

**ht1 = (ot1)*(sigmoid of st1)**

Now, **ht** is a **d**-dimensional vector and **st** is also a **d**-dimensional vector, so the derivative of **ht** with respect to **st** is a **d x d** matrix with all off-diagonal elements equal to 0 (as discussed in the Vanishing and Exploding Gradients article: https://medium.com/@prvnk10/vanishing-and-exploding-gradients-52af750ede32). So, this matrix is a diagonal matrix. Its first element would be:

So, we can write the derivative of ht with respect to st as:
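The diagonal structure of this derivative can be sketched in NumPy (the gate and state values are assumed toy numbers; following the article's convention, **ht = ot · sigmoid(st)** elementwise):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# With h_t = o_t * sigmoid(s_t) elementwise, each h_t[i] depends only on
# s_t[i], so the Jacobian dh_t/ds_t is diagonal:
#   diag(o_t * sigmoid'(s_t)), where sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
o_t = np.array([0.9, 0.5, 0.1])
s_t = np.array([0.2, -1.0, 0.7])

sig = sigmoid(s_t)
jacobian = np.diag(o_t * sig * (1.0 - sig))   # d x d, off-diagonals are 0

print(jacobian)
```

Every off-diagonal entry is exactly zero, which is what makes the chained product below so easy to analyze.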

Now we want to compute the next term in the equation of t0.

So, we consider the derivative of **st** with respect to **s(t-1)**.

We have **st** as the following:

Note that **st(~)** also depends on **s(t-1)**, as is clear from the image below:

The derivative of **st** with respect to **s(t-1)** is the sum of the derivatives of the two terms in the formula for **st** with respect to **s(t-1)**, and our goal is to show that the overall gradient does not vanish. So, let's assume that the derivative of **st(~)** with respect to **s(t-1)** actually vanishes; if we can then show that the derivative of the first term, **ft \* s(t-1)**, with respect to **s(t-1)** does not vanish, our work is done, since some value remains and the overall gradient does not vanish. So, we focus only on the first quantity (highlighted in the image below).

Now we have st as:

So, every element of **st** depends on only one element of **ft** and one element of **s(t-1)**. Since **st** and **s(t-1)** are both **d**-dimensional vectors, the derivative of **st** with respect to **s(t-1)** is a **d x d** matrix with off-diagonal elements equal to 0.

Similarly, the derivative of the state at any other time step with respect to the previous state is a diagonal matrix whose diagonal elements are the values of the forget gate at the corresponding indices.

So, the overall term t0 looks as:

Now, if we multiply several diagonal matrices, we can write the result as a single diagonal matrix whose entries are the products of the corresponding entries of all the factors. Example:

If we multiply many diagonal matrices, the product is a diagonal matrix whose diagonal elements are the products of the corresponding diagonal elements of the factors.
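A minimal NumPy check of this fact (the gate values below are assumed toy numbers):

```python
import numpy as np

# The product of many diagonal matrices (here, diag(f_k) for each forget
# gate) is itself diagonal, with each diagonal entry equal to the product
# of the corresponding entries of all the factors.
forget_gates = [
    np.array([0.9, 0.5, 0.2]),
    np.array([0.8, 0.6, 0.1]),
    np.array([0.7, 0.4, 0.3]),
]
product_matrix = np.eye(3)
for f in forget_gates:
    product_matrix = product_matrix @ np.diag(f)

# Equivalent shortcut: elementwise product of the gate vectors on the diagonal.
elementwise = np.diag(np.prod(forget_gates, axis=0))
print(np.allclose(product_matrix, elementwise))
```

Each diagonal entry shrinks toward 0 only when the forget gates at that index are small at every step, which is the setting discussed next.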

Now, the gradient would vanish if all the forget gates, **f1, f2, f3**, ..., all the way up to the last forget gate **ft**, are very small; in that case, the entire gradient vanishes, so it looks like gradients could still vanish. But here is the catch: if **f1** is very small, **s1** did not contribute much to **s2** (as explained at the start of this article); if **f2** is small, **s2** did not contribute much to **s3**, which in turn means **s1** contributed almost nothing to **s3**. If **f4** is also small, **s3** did not contribute anything to **s4**, and by the same argument **s1** contributes even less to **s4**. So the contribution has vanished by the time we reach **s4**, because the gates were very small. In the same way, in the backward direction, the gradients from **s4** will have vanished by the time we reach **s1**. But this kind of vanishing is not a problem: if a state did not contribute in the forward direction, no feedback is required for it in the backward direction. So, the gates control the flow of information in both directions.

**Dealing with exploding gradients:**

For the overall gradient to explode, it is sufficient for the gradient to explode along at least one of the paths; that alone makes the overall gradient explode.

So, we consider the derivative of the loss function with respect to **s(k-1)** (one of the terms that appears in the chain rule), and we take one of the possible paths (shown in blue nodes in the image below) that leads from the **loss value L to s(k-1)**.

The derivative as per the chain rule would look like:

The term in the blue parentheses in the above image (**the 3 nodes ht, ot, h(t-1)**) would keep repeating all the way back.

Now, the derivative of **ht** with respect to **ot** is going to be a diagonal matrix, as discussed in the case of vanishing gradients (the derivative of a **d**-dimensional vector with respect to another **d**-dimensional vector is a **d x d** matrix, and here each element of **ht** depends on only one element of **ot**).

We have **ot**(represented as **ok** for **k’th time step**) as the following:

Let's ignore the sigmoid for a moment (it would just add one more term to the chain, nothing more). The last two terms in the above equation are then constants, and the derivative of **ok** with respect to **h(k-1)** is simply **Wo**.

So, we can write the derivative as:

Each of the terms in the blue parentheses contains one diagonal matrix and one **Wo**. Expanding this matrix multiplication, we can club all the diagonal matrices together into one large diagonal matrix, while **Wo** gets multiplied as many times as there are terms in the chain. Calling the magnitude of this large matrix **K**, we have:

So, if the highlighted value in the image below is large, the gradients explode. LSTMs therefore do not actually solve the problem of exploding gradients; the overall gradient can still explode. In practice, we deal with this using a technique called **clipping**: a gradient has a magnitude and a direction, and we want to move in the direction of the gradient but not with a large magnitude. Clipping rescales the gradient so that its magnitude lies within a certain range while its direction is preserved.
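To see why repeated multiplication by **Wo** can blow up the gradient, here is a toy sketch (the matrix and step count are assumptions for illustration):

```python
import numpy as np

# When the same weight matrix W_o (with norm K > 1) re-enters the chain at
# every step, the gradient norm grows roughly like K**n over n steps.
W_o = 1.5 * np.eye(3)               # toy matrix whose norm K is 1.5
grad = np.ones(3)

for step in range(20):
    grad = W_o @ grad               # repeated multiplication along the chain

print(np.linalg.norm(grad))         # ~ 1.5**20 times the initial norm
```

Even a modest K of 1.5 multiplies the gradient norm by more than 3000 after 20 steps.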

So, while backpropagating, if the norm of the gradient exceeds a certain value, it is rescaled to keep its norm within an acceptable threshold.
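A minimal sketch of norm-based gradient clipping (the threshold and gradient values below are assumptions):

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm does not exceed max_norm,
    preserving its direction (standard norm-based clipping)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])               # norm 50, same direction as [3, 4]
clipped = clip_by_norm(g, 5.0)
print(clipped, np.linalg.norm(clipped))  # direction kept, norm capped at 5
```

Deep learning frameworks provide this operation built in (e.g., `torch.nn.utils.clip_grad_norm_` in PyTorch).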

So, in essence, we can say that LSTMs do not have the problem of vanishing gradients (gradients can still vanish in an LSTM, but only when the information did not flow forward in the forward pass, which is fine, as discussed in this article).

LSTMs do not actually solve the problem of exploding gradients. Gradients can still explode, and the way we deal with this is to move in the direction of the gradient when updating the parameters, but with a small magnitude.

All the images used in this article are taken from the content covered in the Vanishing and Exploding Gradients and LSTMs module of the Deep Learning course on the site: padhai.onefourthlabs.in