Why grad is set to zero after every epoch

Because we want to recompute the gradients from scratch at every epoch. In PyTorch, calling `backward()` *accumulates* gradients into the `.grad` attribute rather than overwriting it, so if we never zero them out, each epoch's update would use the sum of all past gradients instead of the current one. If it were mini-batch GD, we would do the same thing after each mini-batch rather than after each epoch. Please refer to the theory lecture videos for more explanation.
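A minimal sketch of what this looks like in a full-batch training loop (the toy data and learning rate here are just illustrative, not from the lecture):

```python
import torch

# Toy full-batch linear regression: fit y = 2x with a single weight.
x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2.0 * x
w = torch.zeros(1, requires_grad=True)

for epoch in range(100):
    loss = ((x * w - y) ** 2).mean()
    loss.backward()           # adds this epoch's gradient into w.grad
    with torch.no_grad():
        w -= 0.05 * w.grad    # gradient-descent step
        w.grad.zero_()        # reset, so stale gradients don't accumulate

print(w.item())  # converges close to 2.0
```

Without the `w.grad.zero_()` line, `w.grad` would keep growing with the sum of every past epoch's gradient, and the updates would overshoot. With an `optimizer`, the equivalent call is `optimizer.zero_grad()`.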