Sigmoid Neuron and Cross-Entropy

Parveen Khurana
5 min read · Jan 6, 2020

This article covers the content discussed in the Sigmoid Neuron and Cross-Entropy module of the Deep Learning course, and all the images are taken from the same module.

The situation is this: we are given an image, and we know its true label, that is, whether it contains text or not. In the case below, since the image contains text, all the probability mass is on the random variable taking the value Text (1), and there is zero probability mass on the value No Text (0).

Of course, in practice, we don't know this true distribution, so we approximate it using the sigmoid function. When we pass the image as input x to the sigmoid neuron, we get an output of, say, 0.7, which we can again interpret as a probability distribution: the probability of the image containing text is 0.7, and the probability of it not containing text is 0.3.
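As a rough illustration, here is a minimal Python sketch of this setup; the feature vector x, the weights w, and the bias b are made-up numbers (not from the module), chosen so that the neuron outputs roughly 0.7:

```python
import numpy as np

def sigmoid_neuron(x, w, b):
    # Predicted probability that the image contains text
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

# Hypothetical image features and parameters, for illustration only
x = np.array([0.2, 0.9, 0.4])
w = np.array([0.5, 1.0, -0.3])
b = 0.1

p_text = sigmoid_neuron(x, w, b)             # roughly 0.73 with these numbers
predicted = np.array([p_text, 1 - p_text])   # [P(Text), P(No Text)]
true_dist = np.array([1.0, 0.0])             # the image actually contains text

print(predicted, true_dist)
```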

So far, we have been measuring the difference between these two distributions using the squared error loss, but now we have a better metric, one grounded in probability theory: the KL divergence between the two distributions.
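To make the comparison concrete, here is a small sketch (not from the module) that computes the KL divergence between the true and predicted distributions; the small eps only guards against log(0) for the one-hot true distribution:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i)
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

true_dist = np.array([1.0, 0.0])   # all mass on "Text"
predicted = np.array([0.7, 0.3])   # the sigmoid neuron's output

print(kl_divergence(true_dist, predicted))   # = -log(0.7), about 0.357
```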

So now, instead of minimizing the squared error loss, we are interested in minimizing the KL divergence, and this minimization is with respect to the parameters of the model (w, b). Since the KL divergence equals the cross-entropy between the true and predicted distributions minus the entropy of the true distribution, and that entropy does not depend on (w, b), minimizing the KL divergence amounts to minimizing the cross-entropy loss.
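Below is a minimal sketch of that minimization, assuming a single sigmoid neuron, a single labelled example, and plain gradient descent on the cross-entropy loss; the data, initial parameters, and learning rate are all hypothetical, and the module's own training setup may differ:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One labelled example: hypothetical features x with true label y = 1 (contains text)
x = np.array([0.2, 0.9, 0.4])
y = 1.0

# Hypothetical initial parameters and learning rate
w = np.zeros_like(x)
b = 0.0
lr = 0.5

for step in range(100):
    p = sigmoid(np.dot(w, x) + b)                        # predicted P(Text)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))    # cross-entropy loss
    grad = p - y                                         # gradient w.r.t. the pre-activation
    w -= lr * grad * x                                    # update w
    b -= lr * grad                                        # update b

print(p, loss)   # p moves toward 1 and the loss toward 0
```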
