My understanding of Sigmoid Neuron and Cross-Entropy ... in the form of a quick post

Just like its name suggests, this not-so-primitive Deep Learning model is built on the classical Sigmoid function. For those not familiar with the term, it’s an S-shaped curve whose output values lie between 0 and 1. The inputs to the model are usually normalized – scaled into a limited range of values – so that every factor that goes into the decision making of an event gets equal treatment, rather than one large-valued feature dominating the rest.
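To make the two ideas above concrete, here is a minimal sketch: the classic sigmoid squashing function, and a simple min-max normalizer (the helper names are my own, not from the post) that pulls raw inputs into a common 0-to-1 range.

```python
import math

def sigmoid(z):
    # The classical S-shaped curve: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def min_max_normalize(values):
    # Scale raw inputs into [0, 1] so no single large-valued
    # feature dominates the weighted sum fed to the neuron
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(sigmoid(0))                       # 0.5 -- the midpoint of the S-curve
print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Note that no matter how extreme the input, the sigmoid output never actually reaches 0 or 1 – which is exactly why it reads naturally as a probability.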

In the functional world, this model is used for classification-type predictions. You may be wondering how a probability output is linked with classification. Well, it gives a probabilistic distance between the predicted and the true outcome. When you see this through the lens of a Cross-Entropy loss function, you appreciate the nature of this model. A loss function is something that measures how far the model’s output has shifted from ground reality. Cross-Entropy is a very popular method to calculate such loss and evolves from the concept of the gain (or loss) you make while betting/guessing. It is a sum, over all outcomes, of the actual distribution of an event multiplied by the gain of a prediction – and the gain is the negative log (base 2) of the guessed probability. Essentially, the word “cross” in Cross-Entropy indicates the bipolar nature of this product: a component of the actually occurred distribution multiplied by a component of the predicted distribution of the event.
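The “cross” product described above can be sketched in a few lines – the standard formula is H(p, q) = −Σ p(x) · log₂ q(x), with p the actual distribution and q the guessed one (the function name here is my own):

```python
import math

def cross_entropy(p_true, q_pred):
    # H(p, q) = -sum over outcomes x of p(x) * log2(q(x))
    # p_true: actually occurred distribution, q_pred: guessed distribution
    # Terms with p(x) == 0 contribute nothing, so they are skipped
    return -sum(p * math.log2(q) for p, q in zip(p_true, q_pred) if p > 0)

p = [1.0, 0.0]                       # ground truth: the event is class 0
print(cross_entropy(p, [1.0, 0.0]))  # 0.0 -- a perfect guess costs nothing
print(cross_entropy(p, [0.5, 0.5]))  # 1.0 -- hedging 50/50 costs one bit
print(cross_entropy(p, [0.1, 0.9]))  # ~3.32 -- a confident wrong guess is punished hard
```

The loss grows sharply as the guessed probability for the true class shrinks, which is what makes this a good training signal for a sigmoid-based classifier.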

As a sidekick, this gain has another interesting connotation: it is the number of bits required to transmit a message. An event with probability p needs −log₂(p) bits in an optimal encoding, so the rarer the event, the more bits it takes to communicate.
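A quick sketch of that bits interpretation (the function name is mine) – this is Shannon’s information content:

```python
import math

def bits_to_encode(prob):
    # Shannon information content: an event with probability p
    # needs -log2(p) bits in an optimal code
    return -math.log2(prob)

print(bits_to_encode(0.5))    # 1.0 -- a fair coin flip carries one bit
print(bits_to_encode(1 / 8))  # 3.0 -- a rarer event needs more bits
```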