There is an error in the weight update. A closing parenthesis should separate the square root of the history from epsilon, so that eps is added after the square root is taken; it appears to have slipped to after eps by mistake. The same correction applies to the bias term.

Current, incorrect notation:

~

self.params["W"+str(i)] -= (eta/(np.sqrt(self.update_params["v_w"+str(i)]+eps)))*(self.gradients["dW"+str(i)]/m)

~

Correct notation:

~

self.params["W"+str(i)] -= (eta/(np.sqrt(self.update_params["v_w"+str(i)])+eps))*(self.gradients["dW"+str(i)]/m)

~
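To see how much the placement of eps matters, here is a minimal standalone sketch that compares the two denominators. The helper `scaled_step` and the values of eta, eps, grad and v are made up for illustration and are not part of the original class. The two versions are nearly identical when the history is large, and differ most when the history is close to zero.

```python
import numpy as np

def scaled_step(grad, v, eta=0.01, eps=1e-8, eps_inside_sqrt=False):
    """Hypothetical helper: return the amount subtracted from a parameter."""
    if eps_inside_sqrt:
        denom = np.sqrt(v + eps)   # placement in the current code
    else:
        denom = np.sqrt(v) + eps   # placement proposed above
    return (eta / denom) * grad

grad = np.array([1.0])

# With a large history the two placements give almost the same step.
v_large = np.array([4.0])
large_inside = scaled_step(grad, v_large, eps_inside_sqrt=True)
large_outside = scaled_step(grad, v_large)

# With an empty history the denominators are sqrt(eps) and eps,
# so the steps differ by several orders of magnitude.
v_zero = np.array([0.0])
zero_inside = scaled_step(grad, v_zero, eps_inside_sqrt=True)
zero_outside = scaled_step(grad, v_zero)

print(large_inside, large_outside)
print(zero_inside, zero_outside)
```

The same helper works unchanged for the bias term, since only the gradient and history arrays passed in would differ.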