Why do we use nn.LogSoftmax for NLL Loss instead of nn.Softmax?

Why is nn.LogSoftmax used instead of nn.Softmax, and how can we justify applying negative log likelihood after LogSoftmax?
It seems like we are using log twice, and when using LogSoftmax the output is not between 0 and 1 as it should be for probabilities (refer to the image).

Assume you have a vector z.

Softmax Function:
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dotsc, K

LogSoftmax Function:
\sigma_{\log}(\mathbf{z})_i = \log \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
\Rightarrow \sigma_{\log}(\mathbf{z})_i = z_i - \log \left( \sum_{j=1}^{K} e^{z_j} \right)
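
A quick way to sanity-check this identity in PyTorch (a minimal sketch; the input values are just made up for illustration):

```python
import torch
import torch.nn.functional as F

# Check that log_softmax(z)_i == z_i - log(sum_j exp(z_j))
z = torch.tensor([2.0, 1.0, 0.1])

log_sm = F.log_softmax(z, dim=0)        # computed in one numerically stable step
manual = z - torch.logsumexp(z, dim=0)  # z_i - log(sum_j e^{z_j})

print(torch.allclose(log_sm, manual))   # True
```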

Cross-Entropy Loss (nn.CrossEntropyLoss takes the raw vector z directly as input):
L = -\sum_{i} p_i \log \sigma(\mathbf{z})_i = -\sum_{i} p_i \log \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
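
For a one-hot target p this reduces to the negative log-probability of the correct class. A small sketch (the values and target index are made up) comparing the formula with F.cross_entropy, which takes the raw vector z:

```python
import torch
import torch.nn.functional as F

z = torch.tensor([2.0, 1.0, 0.1])   # raw scores
p = torch.tensor([0.0, 1.0, 0.0])   # one-hot target distribution (class 1)

manual_ce = -(p * torch.log(F.softmax(z, dim=0))).sum()
builtin_ce = F.cross_entropy(z.unsqueeze(0), torch.tensor([1]))  # expects raw z

print(torch.allclose(manual_ce, builtin_ce))  # True
```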

NLL-Loss (nn.NLLLoss implements this, requiring you to pass the output of LogSoftmax):
L = -\sum_{i} p_i \, \sigma_{\log}(\mathbf{z})_i

Basically, nn.NLLLoss expects log probabilities as input instead of probabilities, even though theoretically Negative Log-Likelihood and Cross-Entropy losses mean the same thing, as the snippet below shows.
I'm not sure why PyTorch chose these names, though; it does cause confusion.
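
To see the two losses line up in practice, here's a minimal sketch (random logits and made-up targets, assuming a plain multi-class classification setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z = torch.randn(4, 3)                 # batch of 4 samples, 3 classes (raw logits)
target = torch.tensor([0, 2, 1, 2])   # ground-truth class indices

ce = nn.CrossEntropyLoss()(z, target)                  # takes raw z
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(z), target)    # takes log probabilities

print(torch.allclose(ce, nll))  # True
```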


Thanks man, now it makes sense! Should have checked the docs :sweat_smile: