
Activation Functions and Initialization Methods

Parveen Khurana
15 min read · Feb 4, 2020


This article covers the content discussed in the Activation Functions and Initialization Methods module of the Deep Learning course; all the images are taken from that module.

In this article, we discuss a drawback of the logistic function, see how some other choices of activation function deal with this drawback, and then look at some common ways to initialize the parameters (weights) of the network, which helps in training.

Why are activation functions important?

Let’s say we have the network architecture below: a feed-forward network with two intermediate (hidden) layers feeding a single output neuron.

In this network, we compute ‘h2’ as:

That is, we apply the sigmoid over ‘a2’, where ‘a2’ is the weighted aggregate of the inputs coming from the previous layer (the first intermediate layer). The sigmoid is a non-linear function.
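In symbols, the computation looks like the following (a sketch based on the description above; the symbols W2 and b2 for the layer’s weights and bias are assumed here, since the original equation appears as an image in the module):

```latex
a_2 = W_2 h_1 + b_2, \qquad
h_2 = \sigma(a_2) = \frac{1}{1 + e^{-a_2}}
```

where h_1 is the output of the first intermediate layer and the sigmoid σ is applied element-wise.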

So, we can take a simple scenario where ‘h2’ is equal to ‘a2’, or in other words, we pass ‘a2’ as is to the neuron in the output layer without applying any non-linearity to it. In that case, the output is a linear function of a linear function of the input, which is itself just a linear function, so the extra layer adds no expressive power, as the sketch below illustrates.
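Here is a minimal NumPy sketch of that collapse (layer sizes and variable names are made up for illustration): with no non-linearity, two stacked linear layers reduce to a single linear map, while inserting a sigmoid breaks this reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network with made-up sizes: 3 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal(1)
x = rng.standard_normal(3)

# Case 1: h = a, i.e. no non-linearity on the hidden layer.
a1 = W1 @ x + b1
out_linear = W2 @ a1 + b2

# The same output comes from a single linear layer with
# W = W2 @ W1 and b = W2 @ b1 + b2 -- the extra depth buys nothing.
W, b = W2 @ W1, W2 @ b1 + b2
assert np.allclose(out_linear, W @ x + b)

# Case 2: apply the sigmoid -- the network is no longer a single linear map.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
out_nonlinear = W2 @ sigmoid(a1) + b2
```

With the sigmoid in place, the composition can no longer be rewritten as one matrix-vector product, which is exactly what lets deeper networks represent non-linear functions.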
