Because we want to recompute the gradients from scratch at the start of every epoch. With mini-batch gradient descent, we would instead reset and recompute them after each mini-batch. Please refer to the theory lecture videos for a more detailed explanation.
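The difference can be sketched as follows. This is a minimal NumPy example with a hypothetical one-weight regression problem; the data, learning rate, and batch size are illustrative assumptions, not taken from the lectures:

```python
import numpy as np

# Hypothetical toy data: fit y = 2x with a single weight w.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 2.0 * X

def grad(w, xb, yb):
    # Gradient of the mean-squared-error loss 0.5 * mean((w*x - y)^2).
    return np.mean((w * xb - yb) * xb)

# Full-batch GD: the gradient is recomputed from scratch once per epoch.
w = 0.0
for epoch in range(50):
    g = grad(w, X, y)           # fresh gradient each epoch
    w -= 0.1 * g

# Mini-batch GD: the gradient is recomputed after every mini-batch instead.
w_mb = 0.0
for epoch in range(50):
    for start in range(0, len(X), 20):
        xb, yb = X[start:start + 20], y[start:start + 20]
        g = grad(w_mb, xb, yb)  # fresh gradient each mini-batch
        w_mb -= 0.1 * g
```

Both loops converge to roughly the same weight; the only difference is how often the gradient is recomputed and applied.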