Why batch normalization?

We are using batch normalization for each layer, and the values get scaled to roughly the range -1 to 1… Why can't we use a sigmoid or tanh activation instead of performing batch normalization with ReLU?

Hi @Gaurav_Kandel,
Batch norm is not just used to scale values into a fixed range. It normalizes each layer's activations to zero mean and unit variance over the mini-batch and then applies a learnable scale and shift, which stabilizes training, allows higher learning rates, and reduces sensitivity to weight initialization. A sigmoid or tanh only squashes each output individually; it does not centre or rescale the distribution of activations across the batch, and it can reintroduce the vanishing-gradient problem that ReLU avoids.
You can refer to this article for a deeper explanation.
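
To make the difference concrete, here is a minimal NumPy sketch of what a batch norm layer computes per feature (the function name `batch_norm` and the toy data are just illustrative, not code from the course):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a (batch, features) activation matrix.

    Each feature is centred to zero mean and scaled to unit variance
    across the mini-batch, then re-scaled and shifted by the learnable
    parameters gamma and beta.
    """
    mean = x.mean(axis=0)                 # per-feature mean over the batch
    var = x.var(axis=0)                   # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta           # learnable scale and shift

# Toy usage: normalize badly-scaled pre-activations of one hidden layer.
rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=3.0, size=(8, 4))
gamma = np.ones(4)
beta = np.zeros(4)
z_bn = batch_norm(z, gamma, beta)
print(z_bn.mean(axis=0).round(3))   # ~0 for every feature
print(z_bn.std(axis=0).round(3))    # ~1 for every feature
```

Notice that the output has roughly zero mean and unit variance per feature no matter how the inputs were scaled, and `gamma`/`beta` let the network undo that normalization if it helps. An element-wise tanh or sigmoid cannot give you this guarantee; it only bounds each value.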