With a sigmoid activation the output values are always in the range (0, 1), while with tanh they are in the range (-1, 1). In that case do we need to apply batch norm, or should it be applied only with ReLU or LeakyReLU?
Do we need batch norm if the activation function used is tanh or sigmoid?
Yes, we can use batch norm with sigmoid and tanh as well. Remember that it is typically applied before the activation function (as in the original batch norm paper), so at that point the values have not yet been squashed into (0, 1) or (-1, 1).
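To make the ordering concrete, here is a minimal pure-Python sketch (hypothetical values, no deep learning framework, and omitting batch norm's learnable scale/shift parameters): the normalization acts on the raw pre-activation outputs, and only afterwards does tanh squash them into (-1, 1).

```python
import math

def batch_norm(xs, eps=1e-5):
    # Normalize a batch of pre-activation values to zero mean, unit variance.
    # (A real BatchNorm layer would also apply learnable gamma/beta afterwards.)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def tanh_all(xs):
    # Apply tanh element-wise; outputs land in (-1, 1).
    return [math.tanh(x) for x in xs]

# Hypothetical pre-activation outputs of a linear layer for one unit
# across a batch of 4 examples:
pre_act = [4.0, 6.0, 8.0, 10.0]

normalized = batch_norm(pre_act)      # batch norm BEFORE the activation
activations = tanh_all(normalized)    # activation applied last
```

So the answer's point holds: by the time tanh (or sigmoid) bounds the values, batch norm has already done its job on the unbounded pre-activations.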