How can I reduce MAE and MSE values that increase as the number of hidden layers increases?

For my data, I have calculated the MSE and MAE for hidden layers ranging from 1-9 with each hidden layer having 7 neurons.

These are my findings for 1 to 9 hidden layers (HL).

HL   MAE      MSE
1    0.4814   0.5691
2    0.4805   0.5872
3    0.4979   0.5904
4    0.5018   0.5996
5    0.5868   0.952
6    0.5868   0.952
7    0.5867   0.952
8    0.587    0.952
9    0.6633   0.9755

Both MSE and MAE increase as the number of hidden layers increases.

Is this for the training set or the evaluation set?
What are the other hyperparameters that you’ve used?

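    if display_loss:
        Y_pred = self.predict(X)
        # loss[i] = log_loss(np.argmax(Y, axis=1), Y_pred)
        # Assuming mean_squared_error is sklearn.metrics.mean_squared_error,
        # whose argument order is (y_true, y_pred).
        loss[i] = mean_squared_error(Y, Y_pred)

    if display_loss:
        ...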

I have tried learning rates of 0.01, 0.05, 0.1, 0.2, and 0.5.

Well, to reduce the loss you shouldn't just keep adding hidden layers, because other hyperparameters also play a role. Instead of picking the learning rate at random, why don't you use 'tf.keras.callbacks.LearningRateScheduler' as a callback to vary the learning rate during training, then plot the loss against it to choose a suitable value? Similarly, experiment with the other hyperparameters to get better insight!
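A rough sketch of the schedule function such a sweep could use is below. START_LR, END_LR, NUM_EPOCHS and the name lr_sweep are made up for illustration, and the Keras wiring in the comments assumes you have a compiled tf.keras model, which the code posted above does not show. Treat it as a starting point rather than a recipe.

```python
# Hypothetical constants for the sweep; adjust them to your own setup.
START_LR = 1e-4   # assumed lower bound of the sweep
END_LR = 0.5      # largest learning rate tried in this thread
NUM_EPOCHS = 40   # assumed number of epochs used for the sweep


def lr_sweep(epoch, lr=None):
    """Return a learning rate that grows exponentially from START_LR to END_LR.

    The (epoch, lr) signature matches what LearningRateScheduler passes to its
    schedule function; the current lr is ignored here.
    """
    fraction = epoch / (NUM_EPOCHS - 1)
    return START_LR * (END_LR / START_LR) ** fraction


# The learning rate that would be used in each epoch of the sweep.
schedule = [lr_sweep(epoch) for epoch in range(NUM_EPOCHS)]

# With TensorFlow installed and a compiled Keras model (an assumption),
# the function could be plugged in roughly like this:
#   import tensorflow as tf
#   callback = tf.keras.callbacks.LearningRateScheduler(lr_sweep)
#   history = model.fit(X_train, Y_train, epochs=NUM_EPOCHS, callbacks=[callback])
# Plotting history.history["loss"] against `schedule` on a log-scaled x-axis
# then shows the range of learning rates where the loss falls fastest.
```

The idea is to pick a learning rate from the region where the loss is still dropping steeply, somewhat below the point where it starts to blow up.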

How much are optimization techniques going to help in this regard?

Sharing my understanding of your issue:

  • Ishvinder asked you whether these numbers are for the training or the evaluation set. You probably missed that question.
    (From the code you shared) Since you seem to be accumulating these results inside the 'fit' function, they would be for your training set.
    Looking for a configuration that gives very low error (or very high accuracy) on the training set is not necessarily a good thing. It may result in 'overfitting': the model fits the training data too closely and won't generalise.
    Remember, your goal is to improve prediction on unseen data (i.e. test data). So maybe you can also try checking these metrics on your non-training set.

  • How much are optimization techniques going to help in this regard?
    I remember that in one of the video lectures, Mitesh Sir mentioned that DL is mostly about hyperparameter tuning.
    So you will have to try different combinations of learning_rate and epochs on different models and see how they perform. Once you think a model looks good, you can check its results on the non-training set as well.
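To make both points concrete, here is a minimal NumPy sketch that tries several learning_rate and epochs combinations and reports MSE and MAE on a held-out split as well as on the training split. The data is synthetic and fit_linear is a simple linear model standing in for your network; both are assumptions for illustration only, since your data and model class are not shown in the thread.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def fit_linear(X, y, learning_rate, epochs):
    """Gradient descent on MSE for a linear model (hypothetical stand-in for the network)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        w -= learning_rate * 2 * X.T @ error / len(y)
        b -= learning_rate * 2 * error.mean()
    return w, b


# Synthetic regression data, made up for this example.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=200)

# Hold out the last 25% of the rows; the model never trains on them.
split = 150
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

results = []
for learning_rate in (0.01, 0.05, 0.1, 0.2, 0.5):  # rates tried in the thread
    for epochs in (50, 200):                       # assumed epoch counts
        w, b = fit_linear(X_train, y_train, learning_rate, epochs)
        val_pred = X_val @ w + b
        results.append({
            "learning_rate": learning_rate,
            "epochs": epochs,
            "train_mse": mse(y_train, X_train @ w + b),
            "val_mse": mse(y_val, val_pred),
            "val_mae": mae(y_val, val_pred),
        })

# Choose the configuration by its held-out error, not its training error.
best = min(results, key=lambda r: r["val_mse"])
print(best)
```

If train_mse keeps falling while val_mse rises for some configuration, that is the overfitting pattern described above. With a real network you would swap fit_linear for your own fit and predict calls.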