Activation Functions and Initialization Methods
This article covers the content discussed in the Activation Functions and Initialization Methods module of the Deep Learning course; all images are taken from that module.
In this article, we discuss a drawback of the logistic function, see how some other choices of activation function address this drawback, and then look at some common ways to initialize the parameters (weights) of a network, which helps with training.
Why are activation functions important?
Let’s say we have the following network architecture:
In this network, we compute ‘h2’ as:

h2 = sigmoid(a2) = 1 / (1 + e^(−a2)), where a2 = W2·h1 + b2

i.e., we apply the sigmoid over ‘a2’, where ‘a2’ is the weighted aggregate of the inputs from the previous layer (the first intermediate layer). The sigmoid is a non-linear function.
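To make this concrete, here is a minimal NumPy sketch of the computation above; the specific values of ‘h1’, ‘W2’, and ‘b2’ are hypothetical placeholders for illustration, not values from the course module:

```python
import numpy as np

def sigmoid(a):
    # Logistic function: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical values for illustration
h1 = np.array([0.5, -1.2, 0.3])      # outputs of the first intermediate layer
W2 = np.array([[0.4, -0.6, 0.1],
               [0.7,  0.2, -0.5]])   # weights into the second intermediate layer
b2 = np.array([0.1, -0.2])           # biases of the second intermediate layer

a2 = W2 @ h1 + b2    # weighted aggregate (pre-activation)
h2 = sigmoid(a2)     # non-linear activation
print(a2, h2)
```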
So, consider a simple scenario where ‘h2’ is equal to ‘a2’; in other words, we pass ‘a2’ as it is to the neuron in the output layer without applying any non-linearity to it.
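The point of this scenario is that without the non-linearity, stacked layers compose into a single linear map. Here is a small NumPy sketch (with hypothetical shapes and random values) that verifies this collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two purely linear layers (no activation), with hypothetical shapes
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = rng.standard_normal(3)

# Layer-by-layer: h1 = W1·x + b1, then h2 = W2·h1 + b2
h2 = W2 @ (W1 @ x + b1) + b2

# The same result from a single equivalent linear layer:
# W = W2·W1 and b = W2·b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(h2, W @ x + b))   # True: the stack is just one linear layer
```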