Why omit /N in Squared Loss Function

In the squared error loss function, suppose each term contributes an error of about 0.1 or more.
100 such records would then give a total loss of about 10. How does the loss ever reach 0?

Hi @HimajaMSC,
Can you please describe your doubt a bit more clearly?

The actual formula for the squared error loss is sigma((y_i - y)^2) / N.
Why do we ignore that division by N all the time?

Won’t this affect the final loss value, or grad_w and grad_b (which we calculate from the loss)?


Omitting N does not make much of a difference, for the following reasons:

  1. N is a constant for a given dataset, so dividing by it only rescales the loss by 1/N. The values of w and b that minimize the loss are the same either way.
  2. grad_w and grad_b are rescaled by the same factor of 1/N. In gradient descent they are multiplied by the learning rate anyway, so leaving out N is equivalent to using a learning rate N times larger. The reported loss value does change (a sum instead of a mean), but the direction of each update does not. If you want to keep N in the equation, that is no problem, since it is the complete form of the expression.
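
To make the effect of N concrete, here is a minimal sketch in plain Python. It is not from the course material: the toy data, the model y_hat = w * x + b, and the helper names grads and train are all assumptions made for illustration. It runs gradient descent once on the mean squared error and once on the summed squared error with the learning rate divided by N.

```python
# Hypothetical toy data generated from y = 2 * x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
N = len(xs)

def grads(w, b, average):
    """Gradients of the squared error for y_hat = w * x + b.

    With average=True the sums are divided by N (mean squared error);
    with average=False the /N is omitted (summed squared error).
    """
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    if average:
        grad_w, grad_b = grad_w / N, grad_b / N
    return grad_w, grad_b

def train(lr, average, steps=200):
    """Plain gradient descent starting from w = 0, b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_b = grads(w, b, average)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Mean loss with learning rate lr, versus summed loss with learning rate lr / N.
w_mean, b_mean = train(lr=0.1, average=True)
w_sum, b_sum = train(lr=0.1 / N, average=False)

print(w_mean, b_mean)
print(w_sum, b_sum)
```

Both runs should end at essentially the same w and b, because each update lr * grad is the same in the two cases. What does change when N is omitted is the scale: with the same learning rate, the steps would be N times larger, so the learning rate usually has to be made smaller as the dataset grows.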

This is as far as my understanding goes.
@Ishvinder sir, kindly correct me if I am wrong.