more robust method of weight update

Which one of the following is a more robust method of weight update?

1. Full-batch gradient descent

2. Mini-batch gradient descent

3. Stochastic gradient descent

I would suggest mini-batch gradient descent, but the best batch size depends on various factors. You can fine-tune the model with different batch sizes and compare the results.
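To make the difference between the three options concrete, here is a minimal sketch of mini-batch gradient descent in NumPy, using linear regression with an MSE loss as a stand-in problem (the function name `minibatch_gd` and all hyper-parameter values are illustrative assumptions, not a reference implementation). Setting `batch_size=1` reduces it to stochastic gradient descent, and `batch_size=len(X)` to full-batch gradient descent:

```python
import numpy as np

def minibatch_gd(X, y, lr=0.01, batch_size=32, epochs=50, seed=0):
    """Mini-batch gradient descent for linear regression (MSE loss).

    batch_size=1       -> stochastic gradient descent
    batch_size=len(X)  -> full-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))                    # shuffle once per epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2.0 / len(Xb) * Xb.T @ (Xb @ w - yb)  # MSE gradient on the mini-batch
            w -= lr * grad                               # weight update
    return w

# Toy usage: recover w_true ~ [2, -3] from noisy data.
X = np.random.randn(500, 2)
w_true = np.array([2.0, -3.0])
y = X @ w_true + 0.1 * np.random.randn(500)
print(minibatch_gd(X, y, batch_size=32))
```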

Quoting Yoshua Bengio from his paper *Practical Recommendations for Gradient-Based Training of Deep Architectures*:

The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation (with appropriate implementations) but requires visiting more examples in order to reach the same error, since there are less updates per epoch. In theory, this hyper-parameter should impact training time and not so much test performance, so it can be optimized separately of the other hyper-parameters, by comparing training curves (training and validation error vs amount of training time), after the other hyper-parameters (except learning rate) have been selected.
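In the spirit of that last point, one possible (purely illustrative) way to compare batch sizes is to time training runs with everything else fixed and look at the resulting error. This sketch reuses the hypothetical `minibatch_gd`, `X`, and `y` from the example above:

```python
import time

# Compare wall-clock time vs. training error for several batch sizes,
# keeping the other hyper-parameters fixed.
for B in (1, 32, 128, len(X)):
    t0 = time.perf_counter()
    w = minibatch_gd(X, y, lr=0.01, batch_size=B, epochs=50)
    train_mse = np.mean((X @ w - y) ** 2)
    print(f"B={B:4d}  time={time.perf_counter() - t0:.2f}s  train MSE={train_mse:.4f}")
```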
