In “0413_InitailizationActivationfunction.ipynb” (Optimization Algorithms), log_loss is used to calculate the loss, but previously mean_squared_error was used. Why the change?
The log loss function is used because of the Sigmoid activation function (just as in logistic regression). The log loss function is derived from Maximum Likelihood Estimation.
Loss functions should be convex so that learning algorithms can find the global minimum. If we use squared error with a sigmoid (logit) function, the resulting loss surface has multiple local minima, so gradient-based learning algorithms may get stuck and fail to reach the global minimum. Log loss, by contrast, stays convex when combined with the sigmoid.
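You can check this numerically. The sketch below (a toy 1-D dataset I made up for illustration) evaluates both losses for a single sigmoid unit over a range of weight values and tests convexity via second differences: a convex curve has non-negative second differences everywhere.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D dataset (hypothetical values, for illustration only)
x = np.array([-2.0, -0.5, 1.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def mse_loss(w):
    # Squared error of a sigmoid unit with weight w
    p = sigmoid(w * x)
    return np.mean((p - y) ** 2)

def log_loss(w):
    # Binary cross-entropy (log loss) of the same unit
    p = np.clip(sigmoid(w * x), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

ws = np.linspace(-6, 6, 601)
mse_vals = np.array([mse_loss(w) for w in ws])
ll_vals = np.array([log_loss(w) for w in ws])

# Convex curve => second differences >= 0 (up to float noise)
mse_convex = bool(np.all(np.diff(mse_vals, 2) >= -1e-9))
ll_convex = bool(np.all(np.diff(ll_vals, 2) >= -1e-9))
print("MSE convex in w:", mse_convex)   # False: sigmoid saturation bends the curve
print("Log loss convex in w:", ll_convex)  # True: same loss as logistic regression
```

On this data the squared-error curve flattens where the sigmoid saturates, producing concave regions, while the log-loss curve passes the convexity check everywhere.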
Please go through the sources given below: Ben Lambert explains MLE in three short videos, and the derivation of the log-loss function is shown in the math exchange resource shared.
Hope it helps.