Sir, what is learning rate scheduling? I read that it is used to decrease the learning rate on a fixed schedule (time decay, step decay, or exponential decay). But if so, what is the difference between this and adaptive learning algorithms like Adagrad and Adam, which also seem to use the same scheme?
Yes, both reduce the effective learning rate over time, but if we look carefully there is a difference. LR schedulers apply a single, hand-chosen decay rule to one global learning rate, so we have to check which decay mechanism (exponential, time-based, step, etc.) works best with our data, which can be a computationally expensive thing to tune and verify.
Adaptive learning algorithms, on the other hand, are based on a particular heuristic: they adjust a per-parameter effective learning rate automatically from the gradient history, so they need less of that manual tuning and tend to be a bit less expensive in that respect.
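A minimal sketch of the contrast, with hypothetical values (the decay factors and step sizes here are just examples, not recommendations): a scheduler decays one global rate by a fixed rule you must pick in advance, while Adagrad derives a per-parameter rate from accumulated squared gradients.

```python
import numpy as np

def step_decay(lr0, epoch, drop=0.5, every=10):
    # Scheduler: one global LR follows a fixed, hand-chosen rule.
    # drop and every are hyperparameters we would have to search over.
    return lr0 * (drop ** (epoch // every))

def adagrad_step(params, grads, state, lr=0.1, eps=1e-8):
    # Adagrad: each parameter gets its own effective LR,
    # shrunk by that parameter's accumulated squared gradients.
    state = state + grads ** 2
    params = params - lr / (np.sqrt(state) + eps) * grads
    return params, state

# Scheduler: the rate depends only on the epoch, not on the data.
print(step_decay(0.1, 25))   # 0.1 * 0.5**2 = 0.025

# Adagrad: a large-gradient and a small-gradient parameter
# each get a different effective rate, with no schedule to pick.
p = np.array([1.0, 1.0])
g = np.array([1.0, 0.1])
s = np.zeros(2)
p, s = adagrad_step(p, g, s)
print(p)
```

Notice that the scheduler's behaviour is fully determined before training starts, whereas Adagrad's per-parameter rates emerge from the gradients it actually sees.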
Hope this helps!