# Mini-Batch GD Confusion

I have one confusion regarding mini-batch GD.
In the code, we can see that `points_seen` is reset to zero at the start of every epoch.

```python
for i in range(epochs):
    dw, db = 0, 0
    points_seen = 0
```

Now, let's say we take five data points:

| X | Y |
|---|----|
| 1 | 1  |
| 2 | 4  |
| 3 | 9  |
| 4 | 16 |
| 5 | 25 |

Some sort of square function.
Taking `mini_batch_size = 2`:

| X | Y  |                       |
|---|----|-----------------------|
| 1 | 1  |                       |
| 2 | 4  | ← update happens here |
| 3 | 9  |                       |
| 4 | 16 | ← update happens here |
| 5 | 25 |                       |

Does this mean my algorithm will never consider the fifth data point? Since `points_seen` is reset to zero, in every epoch my fifth point goes unnoticed.
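To make the concern concrete, here is a minimal runnable sketch of the per-epoch-reset version. The data, learning rate, and the linear model with quadratic-loss gradients are simplified stand-ins for illustration, not the original class:

```python
# Mini-batch GD with points_seen reset every epoch.
X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 25]
eta, mini_batch_size, epochs = 0.01, 2, 3

w, b = 0.0, 0.0
grad_w = lambda x, y: 2 * (w * x + b - y) * x   # d/dw of squared error
grad_b = lambda x, y: 2 * (w * x + b - y)       # d/db of squared error

points_in_updates = set()
for epoch in range(epochs):
    dw, db = 0.0, 0.0
    points_seen = 0                   # reset every epoch
    batch = []
    for x, y in zip(X, Y):
        dw += grad_w(x, y)
        db += grad_b(x, y)
        points_seen += 1
        batch.append(x)
        if points_seen % mini_batch_size == 0:
            w -= eta * dw / mini_batch_size
            b -= eta * db / mini_batch_size
            points_in_updates.update(batch)
            dw, db, batch = 0.0, 0.0, []

print(sorted(points_in_updates))      # → [1, 2, 3, 4]; x == 5 never appears
```

Every epoch flushes batches `[1, 2]` and `[3, 4]`, and the fifth point's gradient is discarded when the counters reset.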

I feel like the code below would work:

```python
points_seen = 0          # initialised once, so the count carries over across epochs
dw, db = 0, 0            # likewise, leftover gradients carry over
for i in range(epochs):
    for x, y in zip(X, Y):
        dw += self.grad_w(x, y)
        db += self.grad_b(x, y)
        points_seen += 1
        if points_seen % mini_batch_size == 0:
            self.w -= eta * dw / mini_batch_size
            self.b -= eta * db / mini_batch_size
            dw, db = 0, 0
```

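Under the assumption that both the counter and the running gradients are initialised once, outside the epoch loop, a runnable sketch (again with simplified stand-in data and gradients) shows every point eventually contributing to an update:

```python
# The counter and partial gradients persist across epochs,
# so leftover points roll into the next epoch's first batch.
X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 25]
eta, mini_batch_size, epochs = 0.01, 2, 2

w, b = 0.0, 0.0
grad_w = lambda x, y: 2 * (w * x + b - y) * x   # d/dw of squared error
grad_b = lambda x, y: 2 * (w * x + b - y)       # d/db of squared error

points_seen = 0
dw, db = 0.0, 0.0
batch, update_batches = [], []
for epoch in range(epochs):
    for x, y in zip(X, Y):
        dw += grad_w(x, y)
        db += grad_b(x, y)
        points_seen += 1
        batch.append(x)
        if points_seen % mini_batch_size == 0:
            w -= eta * dw / mini_batch_size
            b -= eta * db / mini_batch_size
            update_batches.append(batch)
            dw, db, batch = 0.0, 0.0, []

print(update_batches)   # → [[1, 2], [3, 4], [5, 1], [2, 3], [4, 5]]
```

The leftover fifth point pairs with the first point of the next epoch, so in the second epoch the updates fire after points 1, 3, and 5, exactly as in the table below.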
In this case, the fifth point is carried over and contributes to an update in the next epoch.
So in the second epoch the updates look like:

| X | Y  |                       |
|---|----|-----------------------|
| 1 | 1  | ← update happens here |
| 2 | 4  |                       |
| 3 | 9  | ← update happens here |
| 4 | 16 |                       |
| 5 | 25 | ← update happens here |

Can someone clarify?

This is an interesting observation.
Yes, that’s a better option.

But carrying a point over from the previous epoch isn't very intuitive either. What if we had only one epoch?
Even though the second version seems better, it would arguably be more appropriate to include the leftover point within the same epoch.

Training a model for only a single epoch is rarely done in practice; it would tend to underfit.

Note that in mini-batch GD the updates happen after each batch, so it doesn't matter whether a given point contributes to an update in the first epoch or the second.

Additionally, the behaviour already in the code, where the last few leftover samples are discarded, isn't a big deal if we have a large dataset.
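A third option, matching the suggestion above to include the leftover point within the same epoch, is to flush whatever remains at the end of each epoch as a smaller final batch (the opposite of what e.g. PyTorch's `DataLoader` does with `drop_last=True`). A minimal sketch, again with simplified stand-in data and gradients:

```python
# Flush the leftover partial batch at the end of each epoch.
X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 25]
eta, mini_batch_size, epochs = 0.01, 2, 2

w, b = 0.0, 0.0
grad_w = lambda x, y: 2 * (w * x + b - y) * x   # d/dw of squared error
grad_b = lambda x, y: 2 * (w * x + b - y)       # d/db of squared error

updates = 0
for epoch in range(epochs):
    dw, db, count = 0.0, 0.0, 0
    for x, y in zip(X, Y):
        dw += grad_w(x, y)
        db += grad_b(x, y)
        count += 1
        if count == mini_batch_size:
            w -= eta * dw / mini_batch_size
            b -= eta * db / mini_batch_size
            dw, db, count = 0.0, 0.0, 0
            updates += 1
    if count:                           # leftover partial batch
        w -= eta * dw / count           # average over its true size
        b -= eta * db / count
        updates += 1

print(updates)   # → 6: three updates per epoch (two full batches + one partial)
```

This keeps each epoch self-contained: every point contributes within the epoch it was seen, at the cost of one slightly noisier update from the smaller final batch.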