Although I followed the links above, I still don't quite get the explanation of why the gradient points in the direction of steepest ascent. A simple intuition for this would be really helpful. Thanks in advance.

So it's basically like asking:

Why is it that we perform minimization (gradient "descent") only when we subtract the gradient from the parameter, that is, when we move in the direction opposite to gradient ascent?

Or in simple words,

Why does the update rule "subtract" the gradient to minimize the cost function?
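One way to build intuition is to check it numerically. For a small step along a unit direction u, the change in the cost is approximately the dot product of the gradient with u, which is largest when u points along the gradient and most negative when u points exactly opposite to it. Here is a rough sketch of that check. The bowl-shaped `cost` function, the starting point, and the learning rate `lr` are all made up for illustration and are not from the course material.

```python
import math

def cost(x, y):
    # Hypothetical bowl-shaped cost, chosen only for illustration.
    return x**2 + 3 * y**2

def gradient(x, y):
    # Analytic partial derivatives of cost with respect to x and y.
    return 2 * x, 6 * y

def steepest_direction(x, y, step=1e-3, samples=3600):
    # Brute force: try many unit directions and keep the one
    # that increases the cost the most for a tiny step.
    best_angle, best_gain = 0.0, -math.inf
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        dx, dy = step * math.cos(angle), step * math.sin(angle)
        gain = cost(x + dx, y + dy) - cost(x, y)
        if gain > best_gain:
            best_angle, best_gain = angle, gain
    return math.cos(best_angle), math.sin(best_angle)

x, y = 1.0, 1.0                      # arbitrary starting point (assumption)
gx, gy = gradient(x, y)
norm = math.hypot(gx, gy)
unit_grad = (gx / norm, gy / norm)   # gradient scaled to unit length
best = steepest_direction(x, y)      # direction found by brute force

lr = 0.1                             # assumed learning rate
descended = cost(x - lr * gx, y - lr * gy)  # step against the gradient
ascended = cost(x + lr * gx, y + lr * gy)   # step along the gradient

print("unit gradient:      ", unit_grad)
print("brute-force best:   ", best)
print("cost at start:      ", cost(x, y))
print("cost after subtract:", descended)
print("cost after add:     ", ascended)
```

The brute-force direction should come out very close to the normalized gradient, and the cost should drop when the gradient is subtracted and rise when it is added. That is the reason the update rule uses a minus sign. Note that this only holds for a small enough step: with a learning rate that is too large, even a step against the gradient can overshoot and raise the cost.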

Please check if this video helps in understanding the math involved behind that:
