The whiteboard analogy to deal with vanishing and exploding gradients
In the previous article, we briefly touched upon the problem of vanishing and exploding gradients, which becomes especially severe when dealing with longer sequences. In this article, we discuss one strategy, termed “selectively read, selectively write, and selectively forget”, that helps deal with this problem.
Consider a recurrent neural network in which there is a long chain between the input and the output. Suppose the loss at the 4ᵗʰ time step is high, and say this is because s₁ was not computed properly, which in turn caused s₂, s₃, and s₄ to be computed incorrectly. Further, suppose s₁ went wrong because the weight matrix W on which it depends was not configured correctly.
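To make this dependency chain concrete, here is a minimal sketch of the forward pass, assuming a vanilla RNN cell of the form sₜ = tanh(U·xₜ + W·sₜ₋₁); the cell form, the sizes, and the names U, x, hidden_dim are illustrative assumptions, not something specified in the article.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, input_dim, T = 8, 4, 4

# Assumed vanilla RNN parameters (illustrative sizes)
W = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # recurrent weights
U = rng.normal(scale=0.5, size=(hidden_dim, input_dim))   # input weights

x = rng.normal(size=(T, input_dim))   # inputs x1..x4
s = np.zeros((T + 1, hidden_dim))     # s[0] is the initial state

# Forward chain: s1 is computed from W, and s2, s3, s4 are each computed
# from the previous state, so all of them inherit any error in s1 (and in W)
for t in range(1, T + 1):
    s[t] = np.tanh(U @ x[t - 1] + W @ s[t - 1])

loss = 0.5 * np.sum(s[T] ** 2)  # a toy loss at the 4th time step
```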
Now this feedback needs to travel back to W: the loss is high because s₁ is not good enough, s₁ is not good enough because W is not configured properly, and hence W should change. This information has to flow through a long chain, from the loss to s₄ to s₃ to s₂ to s₁ and finally to W.
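By the chain rule, the part of the gradient of the loss with respect to W that comes through s₁ contains the product ∂L/∂s₄ · ∂s₄/∂s₃ · ∂s₃/∂s₂ · ∂s₂/∂s₁ · ∂s₁/∂W. The sketch below is a self-contained toy version of that backward walk, again assuming a tanh cell and a toy loss at the 4ᵗʰ step; it multiplies one per-step Jacobian at a time and prints the norm of the accumulated gradient so you can watch it shrink or grow along the chain.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 8, 4

# Recurrent weights; try scale=0.1 (gradient vanishes) vs scale=2.0 (gradient explodes)
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

# Stand-ins for the states s1..s4 from the forward pass above (tanh keeps them in (-1, 1))
s = np.tanh(rng.normal(size=(T + 1, hidden_dim)))

# Gradient of the toy loss L = 0.5 * ||s4||^2 with respect to s4
grad = s[T].copy()

# Walk the chain Loss -> s4 -> s3 -> s2 -> s1, multiplying by one Jacobian per step.
# For a tanh cell, ds_t/ds_{t-1} = diag(1 - s_t**2) @ W.
for t in range(T, 1, -1):
    jacobian = np.diag(1.0 - s[t] ** 2) @ W
    grad = jacobian.T @ grad
    print(f"||dL/ds_{t - 1}|| = {np.linalg.norm(grad):.6f}")

# Each step multiplies the feedback by another Jacobian, so over a long chain the
# norm either shrinks toward zero (vanishing) or blows up (exploding).
```

The longer the chain between the loss and s₁, the more Jacobians get multiplied together, which is exactly why the feedback that should reach W either fades away or explodes.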