While explaining the math behind the LSTM vanishing gradient, the prof wrote the equation below:

ds(t)/ds(t-1) = d/ds(t-1)[ f(t)·s(t-1) + i(t)·s̃(t) ]

where s̃(t) is the candidate cell state. He then said: let us set the second term i(t)·s̃(t) = 0 to make our lives hard, and we will prove that the derivative of f(t)·s(t-1) is > 0; if so, the gradients do not vanish, by the principle that if a > 0 and b = 0, then a + b > 0.
My question is about the term b = i(t)·s̃(t): can't it be negative? And if so, can't it cancel out a, so that even if a > 0 we cannot conclude a + b > 0?
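A quick numerical sanity check of this worry (the pre-activation values below are my own illustrative assumptions, not anything from the video): i(t) is a sigmoid output in (0, 1), but s̃(t) is a tanh output in (-1, 1), so their product is negative whenever the candidate is negative.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Assumed illustrative pre-activations, not values from the video:
i_t = sigmoid(0.8)        # input gate: sigmoid output, always in (0, 1)
s_cand = math.tanh(-1.2)  # candidate state: tanh output, in (-1, 1)
b = i_t * s_cand
print(b)  # about -0.58, i.e. b can indeed be negative
```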
Also, if we further take the derivative of i(t)·s̃(t) with respect to s(t-1), it seems that W(i), W, and o(t-1) would also control whether the gradient vanishes, along with the first term a that was already explained in the video. Is this understanding correct?
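To make that second part concrete, here is a minimal scalar sketch of the derivative of b with respect to s(t-1), where the chain runs through h(t-1) = o(t-1)·tanh(s(t-1)) into the input gate and the candidate. All weight values (U_i, U_g, o_prev) are my own assumptions chosen for illustration, not numbers from the lecture; a central finite difference stands in for the analytic derivative.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Single-unit scalar LSTM sketch. U_i, U_g, and o_prev are assumed
# illustrative values, not anything from the video.
U_i, U_g = 2.0, -2.0   # recurrent weights into the input gate / candidate
o_prev = 1.0           # output gate at t-1, held fixed

def b_term(s_prev):
    """b = i(t) * s~(t) as a function of s(t-1), via h(t-1)."""
    h_prev = o_prev * math.tanh(s_prev)   # h(t-1) = o(t-1) * tanh(s(t-1))
    i_t = sigmoid(U_i * h_prev)           # input gate
    s_cand = math.tanh(U_g * h_prev)      # candidate cell state
    return i_t * s_cand

# Central finite difference: d(b)/ds(t-1) at s(t-1) = 0.5
eps = 1e-6
s_prev = 0.5
db_ds = (b_term(s_prev + eps) - b_term(s_prev - eps)) / (2 * eps)
print(db_ds)  # negative with these weights, so it can offset the f(t) term
```

With these particular weights the derivative comes out negative, which is exactly the scenario the question is asking about: the recurrent weights and the previous output gate shape this term's sign and size.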