
The whiteboard analogy to deal with vanishing and exploding gradients

Parveen Khurana
Mar 20, 2022


In the previous article, we briefly touched upon the problem of vanishing and exploding gradients, which becomes especially severe when dealing with longer sequences. In this article, we discuss one strategy, termed “selectively read, selectively write, and selectively forget”, that helps deal with this problem.

In a recurrent neural network, especially one with a long chain between the input and the output, suppose there is a high loss at the 4ᵗʰ time step. Say this high loss arises because s₁ was not computed properly, which in turn corrupted s₂, s₃, and s₄; and s₁ was not computed correctly because the weight matrix W on which it depends was not configured properly.
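To make that dependency chain concrete, here is a minimal NumPy sketch of the unrolled recurrence. The cell sₜ = tanh(W·sₜ₋₁ + U·xₜ) and all the dimensions below are illustrative assumptions, not details taken from this article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the article)
hidden_dim, input_dim, T = 4, 3, 4

W = rng.normal(size=(hidden_dim, hidden_dim))  # recurrent weights
U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
xs = rng.normal(size=(T, input_dim))           # inputs x1..x4

# Unrolled recurrence: s_t = tanh(W @ s_{t-1} + U @ x_t)
s = np.zeros(hidden_dim)  # s0
for t in range(T):
    s = np.tanh(W @ s + U @ xs[t])

# s4 is a function of s3, which is a function of s2, and so on back
# to s1, so a badly configured W corrupts s1 and every state after it.
print(s)
```

Because every state is computed from the previous one using the same W, an error introduced at step 1 propagates forward through all later states.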

Now this feedback needs to travel back to W: the loss is high because s₁ is not good enough, s₁ is not good enough because W is misconfigured, and hence W should change. This information has to flow through a long chain, from the loss to s₄ to s₃ to s₂ to s₁ and finally to W.
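Why the length of this chain matters follows from the chain rule: the gradient reaching s₁ is a product of one Jacobian per time step, and a long product of similar factors tends to either shrink toward zero or blow up. Here is a minimal sketch, assuming a simplified linear cell sₜ = W·sₜ₋₁ (an illustrative simplification, not the article's exact recurrence), where each Jacobian ∂sₜ/∂sₜ₋₁ is just W:

```python
import numpy as np

def grad_chain_norm(spectral_radius, T=50, dim=4, seed=0):
    """Norm of the product of T Jacobians for the simplified linear
    recurrence s_t = W @ s_{t-1}, where each ds_t/ds_{t-1} equals W.
    The linear cell and all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(dim, dim))
    # Rescale W so its largest eigenvalue magnitude is spectral_radius
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    J = np.eye(dim)
    for _ in range(T):
        J = W @ J  # chain rule: one Jacobian factor per time step
    return np.linalg.norm(J)

print("radius 0.9:", grad_chain_norm(0.9))  # ~0.9^50: gradient vanishes
print("radius 1.1:", grad_chain_norm(1.1))  # ~1.1^50: gradient explodes
```

With 50 factors, 0.9⁵⁰ ≈ 0.005 while 1.1⁵⁰ ≈ 117, which is exactly the vanishing or exploding behaviour that makes the feedback from the loss to W unreliable over long chains.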
