Which one of the following is a more robust method of weight update?

1. full batch gradient descent

2. Minibatch Gradient descent

3. Stochastic gradient descent

I would suggest mini-batch gradient descent, but the best batch size depends on various factors; you can fine-tune the model with different batch sizes.

Quoting Yoshua Bengio from his paper Practical recommendations for gradient-based training of deep architectures:

> The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation (with appropriate implementations) but requires visiting more examples in order to reach the same error, since there are less updates per epoch. In theory, this hyper-parameter should impact training time and not so much test performance, so it can be optimized separately of the other hyper-parameters, by comparing training curves (training and validation error vs amount of training time), after the other hyper-parameters (except learning rate) have been selected.
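To make the trade-off concrete, here is a minimal NumPy sketch (not from the paper; the `minibatch_sgd` function and the linear squared-error model are just illustrative assumptions) showing how the batch size interpolates between the three options in the question: `batch_size=1` recovers stochastic gradient descent, `batch_size=len(X)` recovers full-batch gradient descent, and anything in between is mini-batch.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=100, seed=0):
    """Illustrative mini-batch gradient descent for linear regression
    with squared-error loss. batch_size=1 -> stochastic GD;
    batch_size=len(X) -> full-batch GD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)              # reshuffle examples each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the mean squared error over this mini-batch only
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad                     # one weight update per mini-batch
    return w
```

With `batch_size=32`, each epoch performs roughly n/32 weight updates, which is exactly the trade-off the quote describes: larger batches give faster matrix-matrix computation per update, but fewer updates per pass over the data.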