ReLU and Leaky ReLU resulting in NaN

GitHub code
I am getting NaN values in the forward pass when using ReLU and Leaky ReLU.

Exactly as shown in the picture, when I use deeper networks I get NaN values.
While using wider networks there is no NaN issue.
The same happens with multiclass classification.


By deeper networks, do you mean ones with many layers, and by wider ones with more neurons per layer?


When using more than 5 layers I face the issue.
When using 3 layers with 10 neurons each I also face the issue.
In short, as the number of neurons and layers increases I get NaN values.
By debugging, I found the NaN comes from the line shown in the picture above. Can you please clarify?


Hi, I’m not clear about the reason for this problem.
NaNs generally occur when there is a problem such as exploding gradients, bad inputs, or similar issues.

My key questions:

  1. What learning rate are you using? (Try a smaller one, maybe.)
  2. Have you tried the same configuration with some other activation function (TanH?)
  3. In the shared GitHub notebook, which cell is producing the error? (A small helper like the sketch below can help localize the first non-finite value.)
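A minimal sketch of such a helper, assuming plain numpy arrays; `check_finite` is a made-up name, not something from the notebook:

```python
import numpy as np

# Hypothetical helper: call it on each intermediate array (activations,
# pre-activations, gradients) to find the first place values stop being finite.
def check_finite(name, arr):
    arr = np.asarray(arr)
    if np.all(np.isfinite(arr)):
        return True
    finite = arr[np.isfinite(arr)]
    peak = np.max(np.abs(finite)) if finite.size else float("nan")
    print(f"{name}: {np.isnan(arr).sum()} NaNs, {np.isinf(arr).sum()} infs, "
          f"largest finite |value| = {peak:.3e}")
    return False
```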

There is no problem with tanh and sigmoid; the problem only occurs with ReLU and Leaky ReLU.
The problem is in this line of the forward pass function, which produces infinite values:

self.A[self.nh+1] = np.matmul(self.H[self.nh], params["W"+str(self.nh+1)]) + params["B"+str(self.nh+1)]

so it leads to NaN when I run predict.
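As a rough illustration of the mechanism (a toy sketch with made-up layer sizes, not the code from the notebook): tanh and sigmoid are bounded, so their outputs cannot blow up, but ReLU is unbounded, and with a too-large weight scale the activations grow geometrically with depth until the matmul overflows to inf:

```python
import numpy as np

# Toy sketch: unscaled standard-normal weights make ReLU activations grow
# geometrically with depth; in float32 they eventually overflow to inf,
# and inf values later turn into NaN (e.g. inf - inf or inf * 0).
for depth in (1, 20, 40, 60):
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1, 64)).astype(np.float32)
    for _ in range(depth):
        W = rng.standard_normal((64, 64)).astype(np.float32)  # no 1/sqrt(n) scaling
        x = np.maximum(0.0, x @ W)  # ReLU forward step
    print(depth, float(np.max(x)))  # grows rapidly, hits inf for deep stacks
# He initialization, W * np.sqrt(2 / fan_in), keeps the activation scale stable.
```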

1 Like

This seems to be exploding gradients. Did you try decreasing the learning rate?
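Besides a smaller learning rate, gradient clipping is a common remedy. A minimal sketch, assuming the gradients live in a dict keyed like the thread's params ("W1", "B1", ...); `clip_gradients` and `max_norm` are made-up names:

```python
import numpy as np

# Sketch: rescale all gradients when their global L2 norm exceeds max_norm,
# so one bad batch cannot blow up the weights.
def clip_gradients(grads, max_norm=5.0):
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-12)
        grads = {key: g * scale for key, g in grads.items()}
    return grads
```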


Yes, I tried, but the problem persists.

Yes, it seems to be exploding gradients. For binary classification I tried mini-batch gradient descent and it worked.
But for multiclass classification I still face NaN even with mini-batch.
Using print statements I noticed that some gradient values become NaN, which in turn makes everything NaN at the end.
Can you please help with the multiclass classification code?

You can try out the following experiments:

  1. Change the model architecture.
  2. Use a significantly smaller batch size.
  3. Use a better optimizer.
  4. Use some regularizer (a minimal sketch follows this list).
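A minimal sketch of item 4, assuming params and grads are dicts keyed "W1", "B1", ... as in the thread; `add_l2` and `lam` are made-up names:

```python
# Sketch: add an L2 (weight-decay) term to the weight gradients before the
# parameter update; biases are conventionally left unregularized.
def add_l2(grads, params, lam=1e-3):
    for key, value in params.items():
        if key.startswith("W"):
            grads[key] = grads[key] + lam * value
    return grads
```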

I found out where the problem comes from:
it is the softmax function producing 0/0 (and inf/inf once exp overflows), which leads to NaN values.
How can I overcome it?
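For reference, the failure mode is easy to reproduce with a naive softmax (toy example, assuming numpy):

```python
import numpy as np

def naive_softmax(x):
    return np.exp(x) / np.sum(np.exp(x))

print(naive_softmax(np.array([-1000.0, -1000.0])))  # [nan nan]: 0/0 after underflow
print(naive_softmax(np.array([1000.0, 1000.0])))    # [nan nan]: inf/inf after overflow
```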

No worries, I have found this solution on Stack Overflow:

```python
def softmax(self, X):
    # subtract the row-wise max so the largest exponent is exp(0) = 1,
    # which avoids overflow without changing the result
    z = X - np.max(X, axis=-1, keepdims=True)
    numerator = np.exp(z)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    softmax = numerator / denominator
    return softmax
```
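A quick check that the shifted version stays finite on the same inputs that break the naive one:

```python
import numpy as np

x = np.array([[1000.0, 1000.0, -1000.0]])
z = x - np.max(x, axis=-1, keepdims=True)  # z <= 0, so np.exp(z) <= 1
print(np.exp(z) / np.sum(np.exp(z), axis=-1, keepdims=True))  # [[0.5 0.5 0. ]]
```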
Thank you for supporting me till now.
