With a sigmoid activation the output values are always in the range (0, 1), while with tanh they are in (-1, 1). In that case do we still need to apply batch norm, or should it be applied only with ReLU or LeakyReLU?
Yes, we can use batch norm with sigmoid and tanh as well. Remember that it is typically applied before the activation function, so at that point the values have not yet been squashed into (0, 1) or (-1, 1).
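To make the ordering concrete, here is a minimal NumPy sketch (the layer shapes and `batch_norm` helper are illustrative, not from any particular framework): the pre-activations are normalized first, so tanh receives roughly zero-mean, unit-variance inputs regardless of the scale the linear layer produced.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Pretend these are pre-activations from a linear layer: badly centered and scaled.
z = rng.normal(loc=5.0, scale=3.0, size=(64, 10))

z_bn = batch_norm(z)   # batch norm *before* the activation
a = np.tanh(z_bn)      # tanh now sees ~zero-mean, unit-variance inputs

print(z_bn.mean(), z_bn.std())  # roughly 0 and 1
```

Without the batch norm step, `tanh(z)` would saturate for most of these inputs (mean 5, std 3), which is exactly the gradient problem batch norm helps avoid.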
Does batch norm provide any sort of non-linearity? If it does, can we omit the sigmoid or tanh functions?
Only in a weak sense: during training the normalization statistics are computed from the batch itself, which makes the output depend on the other examples in the batch. At inference, however, batch norm uses fixed running statistics and reduces to a plain affine transform, so it cannot replace sigmoid or tanh. Its primary purpose is gradient stabilization, not non-linearity.
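A quick way to see that inference-mode batch norm is not a usable non-linearity (the `bn_inference` helper below is a sketch, with made-up fixed statistics): an affine map preserves convex combinations of its inputs, which a real activation like tanh does not.

```python
import numpy as np

def bn_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    # Inference-mode batch norm: statistics are fixed, so this is a pure
    # affine map y = a * x + c with constant a and c.
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

rng = np.random.default_rng(1)
m, v = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)  # frozen running stats
g, b = rng.normal(size=4), rng.normal(size=4)             # learned scale/shift

x1, x2 = rng.normal(size=4), rng.normal(size=4)
t = 0.3
lhs = bn_inference(t * x1 + (1 - t) * x2, m, v, g, b)
rhs = t * bn_inference(x1, m, v, g, b) + (1 - t) * bn_inference(x2, m, v, g, b)
print(np.allclose(lhs, rhs))  # True: affine, so stacking layers stays linear
print(np.allclose(np.tanh(t * x1 + (1 - t) * x2),
                  t * np.tanh(x1) + (1 - t) * np.tanh(x2)))  # False: tanh is non-linear
```

Because stacking affine maps just yields another affine map, dropping sigmoid/tanh and keeping only linear layers plus batch norm would collapse the network into a single linear model at inference time.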