ReLU and leaky ReLU resulting in NaN

GitHub code: https://github.com/saiteja9010/optimized.git
I am getting NaN values in the forward pass when using ReLU and leaky ReLU.

As shown in the picture, I get NaN values when using deeper networks, while wider networks give no NaN issue.
The same happens with multiclass classification.


By wider networks, do you mean ones having many layers, and by deeper, ones where each layer is dense?


The issue appears when I use more than 5 layers.
It also appears with 3 layers of 10 neurons each.
In short, as the number of neurons and layers increases, I get NaN values.
By debugging I found that the NaN comes from the line shown in the picture above. Can you please clarify?


Hi, I'm not sure yet what is causing this problem.
NaNs generally occur when there is a problem with exploding gradients, with the inputs, or similar.

My key questions:

  1. What learning rate are you using? (Maybe try a smaller one.)
  2. Have you tried the same configuration with some other activation function (tanh)?
  3. In the shared Git notebook, which cell are you facing the error in?
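In the meantime, a quick way to narrow this down is to check, layer by layer, where the first non-finite values appear in the forward pass. This is only a minimal sketch; it assumes your forward pass stores per-layer pre-activations and activations as NumPy arrays (the self.A/self.H names in the usage comment are a guess at your code):

import numpy as np

def check_finite(name, arr):
    # Print a short report if the array contains inf or NaN entries.
    n_inf = int(np.isinf(arr).sum())
    n_nan = int(np.isnan(arr).sum())
    if n_inf or n_nan:
        print(name, "->", n_inf, "inf and", n_nan, "NaN entries, shape", arr.shape)
    return n_inf == 0 and n_nan == 0

# Hypothetical usage after each layer of the forward pass, e.g.:
#   check_finite("A" + str(i + 1), self.A[i + 1])
#   check_finite("H" + str(i + 1), self.H[i + 1])

The first array that fails this check tells you whether the blow-up starts in the inputs, a particular layer's weights, or the output layer.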

There is no problem with tanh and sigmoid; the problem occurs only with ReLU and leaky ReLU.
The problem is in this line of the forward pass function, which produces infinite values:

self.A[self.nh+1] = np.matmul(self.H[self.nh], params["W"+str(self.nh+1)]) + params["B"+str(self.nh+1)]


So it leads to NaN when I run predict.


This seems to be exploding gradients. Did you try decreasing the learning rate?
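That would also explain why tanh and sigmoid are fine: they squash their outputs into a bounded range, while ReLU and leaky ReLU pass large values straight through. Once the weights get large (for example after a few updates with exploding gradients), the pre-activations can grow by a roughly constant factor per layer, overflow to inf, and the next matmul mixes +inf and -inf, which gives NaN. A toy sketch of the mechanism only; the layer sizes and weight scale here are made up and deliberately too large:

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# A deep stack of ReLU layers with oversized weights (no He-style scaling).
# float32 is used so the overflow shows up within a few dozen layers.
h = rng.standard_normal((32, 100)).astype(np.float32)   # fake input batch
for layer in range(1, 61):
    W = rng.standard_normal((100, 100)).astype(np.float32)
    b = np.zeros(100, dtype=np.float32)
    a = h @ W + b          # pre-activation, the analogue of self.A[...]
    h = relu(a)            # activation, the analogue of self.H[...]
    print("layer", layer, "max |pre-activation| =", float(np.abs(a).max()))
    if not np.all(np.isfinite(a)):
        print("non-finite pre-activations first appear at layer", layer)
        break

If you swap relu for np.tanh in this loop, the activations stay bounded in [-1, 1] and nothing overflows, which matches what you are seeing with tanh and sigmoid.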


Yes, I tried that, but the problem remains.

Yes, it does seem to be exploding gradients. For binary classification I tried mini-batch gradient descent and it worked.
But for multiclass classification I still get NaN even with mini-batch.
Using print statements I noticed that some of the gradient values become NaN, which in turn makes everything NaN in the end.
Can you please help with the multiclass classification code?

You can try out the following experiments:

  1. Change the model architecture.
  2. Use a significantly smaller batch size.
  3. Use a better optimizer.
  4. Use some regularizer.
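Since you mentioned that individual gradient values are going to NaN, one more experiment worth trying (not on the list above, but a common mitigation for exploding gradients) is gradient clipping: rescale the gradients whenever their overall norm gets too large. A minimal sketch, assuming your gradients live in a dict keyed like the "W1"/"B1" parameters in the thread; that key layout and the usage names below are guesses:

import numpy as np

def clip_gradients(grads, max_norm=5.0):
    # Rescale all gradients so their combined L2 norm is at most max_norm.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-12)
        grads = {key: g * scale for key, g in grads.items()}
    return grads

# Hypothetical usage inside a mini-batch update step:
#   grads = clip_gradients(grads, max_norm=5.0)
#   for key in params:
#       params[key] -= learning_rate * grads[key]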

I found out where the problem comes from.
It is the softmax function: a 0/0 division leads to the NaN values.
How can I overcome it?

No worries, I have found this solution on Stack Overflow:

def softmax(self, X):
    z = X - np.max(X, axis=-1, keepdims=True)
    numerator = np.exp(z)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    softmax = numerator / denominator
    return softmax
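This works because subtracting the row maximum makes the largest exponent exp(0) = 1, so np.exp can no longer overflow and the denominator is always at least 1, which removes the 0/0 (and inf/inf) cases. A quick standalone check with large logits, without the class wrapper:

import numpy as np

def stable_softmax(X):
    z = X - np.max(X, axis=-1, keepdims=True)
    numerator = np.exp(z)
    return numerator / np.sum(numerator, axis=-1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])

# Naive softmax: exp(1000) overflows to inf, and inf/inf gives NaN
# (NumPy also prints overflow/invalid warnings here).
naive = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)
print(naive)                   # [[nan nan nan]]

print(stable_softmax(logits))  # [[0.09003057 0.24472847 0.66524096]]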
Thank you for the support so far.
