 # Explain fit() function in SigmoidNeuron class (Python code)

Hi, I have a doubt; I hope someone can help me understand. In the SigmoidNeuron class (Python code), the professor initializes dw and db to 0 (these are the derivatives of the loss function w.r.t. w and b).
Then, for each data point, we increment dw and db. So if I have 3 data points and
the dw values for the three points are 0.1, 0.15, and 1.5, then my final dw would be 0 + 0.1 + 0.15 + 1.5 = 1.75,
and after each epoch we update the w we defined earlier with w = w - learning_rate * dw.

Ideally, I thought, we should update w on each iteration over (x, y): for each data point, compute the error, take the derivative of the error w.r.t. the previous weight, update the weight, and continue until satisfied. Why does the code above deviate from this logic?

The code follows full batch gradient descent, i.e., the weights and bias get updated only once per epoch. In the case of three data points, the outputs are predicted and the respective losses and gradients are computed for all three points, and only then are the parameters updated.
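To make this concrete, here is a minimal sketch of a full-batch `fit()` in the style described above. The function name `fit_full_batch` and the squared-error loss (whose gradient w.r.t. w is `(y_pred - y) * y_pred * (1 - y_pred) * x`) are assumptions; the professor's actual class may differ in details, but the update pattern is the same: accumulate dw and db over all points, then update once per epoch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_full_batch(X, Y, epochs=100, lr=0.5):
    # Hypothetical single-input sigmoid neuron with squared-error loss.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        dw, db = 0.0, 0.0                      # reset accumulators each epoch
        for x, y in zip(X, Y):
            y_pred = sigmoid(w * x + b)
            grad = (y_pred - y) * y_pred * (1 - y_pred)
            dw += grad * x                     # gradients summed over ALL points
            db += grad
        w -= lr * dw                           # single update, after the full pass
        b -= lr * db
    return w, b
```

Note that the accumulated dw is exactly the sum the question describes (e.g. 0 + 0.1 + 0.15 + 1.5 = 1.75); the update is applied once with that sum.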

It is also possible to update the parameters after each data point (predicting the output and computing the loss and gradient for that single point). This is called stochastic gradient descent: after every data point is seen, the parameters are greedily updated.
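The stochastic variant only moves the update inside the inner loop; there is no per-epoch accumulator. Again a sketch under the same assumptions (single input, squared-error loss, hypothetical function name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_sgd(X, Y, epochs=100, lr=0.5):
    # Same hypothetical neuron, but parameters are updated after EVERY point.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            y_pred = sigmoid(w * x + b)
            grad = (y_pred - y) * y_pred * (1 - y_pred)
            w -= lr * grad * x   # immediate update: the next point sees the new w, b
            b -= lr * grad
    return w, b
```

This is exactly the "update on each iteration of (x, y)" logic the question expected.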

Both are fine, but the code here follows full batch gradient descent. In practice, a common middle ground between the two is mini-batch gradient descent, where the parameters are updated after every small batch of points.


Thanks, got the point.