Shouldn't we use a discrete loss function for a sigmoid neuron (classification problem)?

As the mentor explained in the Loss function jar, the squared error loss is the sum of squared differences between the actual and predicted values. I have trouble accepting this formula when the actual classes are discrete (0 or 1) but the loss is calculated from decimal predicted values. In the real world the prediction is supposed to be 0 or 1, so I will apply a threshold to the probabilities from the sigmoid function, which again gives discrete predicted values instead of decimals. Isn't it fair to use those discrete values in the loss function?

I think that while training the model, it is better to use the decimal predicted values for calculating the loss.
For example, let the actual ground truth value be 1 and the predicted value be 0.55, with a threshold of 0.5 (anything greater than the threshold is binarised to 1, and anything less than 0.5 to 0).
0.55 will be binarised to 1, but note that 0.55 can also be seen as the model's confidence (or probability) in predicting the value as 1.
Suppose another model predicts 0.95 for the same example. This can be seen as the model saying it is 95% confident that the value is 1.

Now, if we use the decimal predicted values for calculating the loss, we capture this information about how sure the model is of its prediction.
On the other hand, if you use only 0 or 1 for calculating the loss, both models above get the same loss, so you lose that information and a chance to improve the model.
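
To make the comparison concrete, here is a minimal sketch of the example above, using the squared error from the question. The function names `squared_error` and `binarise` are just illustrative, and treating a prediction of exactly 0.5 as class 0 is my assumption, since the example only covers values above and below the threshold.

```python
def squared_error(y_true, y_pred):
    """Squared difference between the actual and predicted value."""
    return (y_true - y_pred) ** 2


def binarise(p, threshold=0.5):
    """Map a sigmoid output to a class label.

    Assumption: a value exactly equal to the threshold is treated as 0.
    """
    return 1 if p > threshold else 0


y_true = 1            # ground truth from the example
p_a, p_b = 0.55, 0.95  # predictions of the two models

# Loss on the decimal (sigmoid) outputs: the two models are distinguishable.
loss_a = squared_error(y_true, p_a)  # roughly 0.2025
loss_b = squared_error(y_true, p_b)  # roughly 0.0025

# Loss on the thresholded outputs: both predictions become 1, so both losses are 0.
hard_a = squared_error(y_true, binarise(p_a))
hard_b = squared_error(y_true, binarise(p_b))

print(loss_a, loss_b)
print(hard_a, hard_b)
```

With the decimal values, the second model gets a much smaller loss than the first. With the thresholded values, both losses are 0, so the loss gives no signal that 0.95 is better than 0.55.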

Once the model is ready and you are making predictions on new, unseen data, it makes sense to convert the predicted values to 0 or 1 and calculate accuracy as a metric of how well your model is performing.
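
A rough sketch of that evaluation step is below. The labels and probabilities are made-up values for illustration only, and `accuracy` and `binarise` are hypothetical helper names, not from the course material.

```python
def binarise(p, threshold=0.5):
    """Map a sigmoid output to 0 or 1 (assumption: exactly 0.5 maps to 0)."""
    return 1 if p > threshold else 0


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [binarise(p, threshold) for p in probs]
    correct = sum(1 for t, y in zip(y_true, preds) if t == y)
    return correct / len(y_true)


# Hypothetical unseen data: true labels and the model's sigmoid outputs.
y_true = [1, 0, 1, 0]
probs = [0.95, 0.30, 0.40, 0.55]

print(accuracy(y_true, probs))
```

Here thresholding is applied only at prediction time, after training has already used the decimal outputs for the loss.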

In short, I prefer decimal predicted values while building the model (they help it improve) and binary values while using the model.

@sanjayk
Got it…this explanation was really helpful. Thanks!