Batch vs Stochastic Gradient Descent

The loss surface (the contour plot) on the left is for batch gradient descent. Once we have a meshgrid over w and b, we can generate the loss plot uniquely for a given set of training examples and a given loss function.
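As a minimal sketch of that idea, the snippet below evaluates a batch (full-dataset) loss surface over a meshgrid of w and b for a toy 1-D linear model with MSE loss. The dataset and grid ranges are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical toy dataset (roughly y = 2x), for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Meshgrid over the two parameters w and b
w_grid, b_grid = np.meshgrid(np.linspace(0, 4, 50), np.linspace(-2, 2, 50))

# Batch loss: MSE averaged over ALL training examples at every (w, b) point.
# preds has shape (50, 50, 4): one prediction per grid point per example.
preds = w_grid[..., None] * x + b_grid[..., None]
batch_loss = ((preds - y) ** 2).mean(axis=-1)  # shape (50, 50)

# batch_loss is the single, uniquely determined surface one would pass to
# plt.contour(w_grid, b_grid, batch_loss) to draw the contour plot.
```

Because every grid point averages over the whole dataset, this surface is fixed once the data and the loss function are fixed, which is why the batch contour plot is unique.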

The loss surface (contours) on the right is for stochastic gradient descent. As I understand it, even if I have a meshgrid of w and b, I don't know the loss, since it also depends on which example is chosen. Is my understanding correct?
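The point in the question can be made concrete: with the same hypothetical toy data as above, each training example induces its own loss surface over (w, b), and those per-example surfaces genuinely differ from one another:

```python
import numpy as np

# Same hypothetical toy dataset as before, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

w_grid, b_grid = np.meshgrid(np.linspace(0, 4, 50), np.linspace(-2, 2, 50))
preds = w_grid[..., None] * x + b_grid[..., None]

# One squared-error surface PER example: shape (50, 50, 4)
per_example_loss = (preds - y) ** 2

# The surface seen by an SGD step depends on which example was sampled;
# e.g. the surfaces for example 0 and example 1 are not the same:
print(np.allclose(per_example_loss[..., 0], per_example_loss[..., 1]))  # False
```

So at any given SGD step the "loss" being descended depends on the sampled example, even though averaging the per-example surfaces recovers the single batch surface.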

If yes, how was the overall loss plot (contour) on the right generated? I mean the overall contour plot itself, not the dynamically generated movement of the curve shown on top of it.

These two contour plots correspond to two entirely different datasets. That is, when we change the dataset, the loss computed at each parameter setting also changes. Therefore the contour plot depends on the dataset.

The two pictures depicting the loss (contour plots) are not for the same dataset but for two entirely different datasets.
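The dataset-dependence of the contour plot can be checked directly: evaluating the same batch MSE surface on two different (hypothetical) datasets over the same (w, b) grid gives two different surfaces, and hence two different contour plots:

```python
import numpy as np

def batch_loss_surface(x, y, w_grid, b_grid):
    """Mean-squared-error loss at every (w, b) grid point for data (x, y)."""
    preds = w_grid[..., None] * x + b_grid[..., None]
    return ((preds - y) ** 2).mean(axis=-1)

w_grid, b_grid = np.meshgrid(np.linspace(0, 4, 50), np.linspace(-2, 2, 50))

# Two hypothetical datasets with different underlying relationships
x1, y1 = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.1, 5.9])   # roughly y = 2x
x2, y2 = np.array([1.0, 2.0, 3.0]), np.array([4.0, 7.0, 10.2])  # roughly y = 3x + 1

surface1 = batch_loss_surface(x1, y1, w_grid, b_grid)
surface2 = batch_loss_surface(x2, y2, w_grid, b_grid)

# Different data -> different loss values at the same (w, b) points,
# hence different contour plots with differently located minima.
print(np.allclose(surface1, surface2))  # False
```

Here the minimum of the first surface sits near (w, b) ≈ (2, 0) while the second sits near (3, 1), so even the shapes and centers of the contours move when the dataset changes.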


OK, that makes sense. In the video, while comparing the two approaches (vanilla vs. stochastic), there was no mention of different datasets (or I missed it), though it seems natural now that you have pointed it out.
