While explaining the math behind the LSTM vanishing gradient, in the equation below, the prof said:

ds(t)/ds(t-1) = d/ds(t-1)[ f(t)*s(t-1) + i(t)*s_cand(t) ] — let us set i(t)*s_cand(t) to 0, make our lives tough,

and we will prove

that the derivative of f(t)*s(t-1) is > 0, and if so we can prove the gradients do not vanish, following the principle that if a > 0, and assuming b = 0,

then a + b > 0.

My question is about the term b, which is i(t)*s_cand(t): can't its derivative be negative? If so, can't it cancel out a, so that

even if a > 0 it would not follow that a + b > 0?
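To make the worry concrete, here is a quick numerical check on a toy scalar LSTM cell (single-unit, scalar weights — all weight values below are made up for illustration, not from the lecture; I assume the standard formulation with a tanh candidate and gates reading h(t-1) = o(t-1)*tanh(s(t-1))). With these weights the b term comes out negative and larger in magnitude than a:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scalar LSTM cell; weight values chosen for illustration only.
W_f, b_f = 0.0, -3.0   # forget gate (constant here, so 'a' reduces to f(t) > 0)
W_i, b_i = 4.0, 0.0    # input gate
W_c, b_c = -4.0, 0.0   # candidate
o_prev = 0.9           # previous output gate value, held fixed

def state_parts(s_prev):
    """Return the two parts of s(t): f(t)*s(t-1) and i(t)*s_cand(t)."""
    h_prev = o_prev * math.tanh(s_prev)
    f = sigmoid(W_f * h_prev + b_f)
    i = sigmoid(W_i * h_prev + b_i)
    cand = math.tanh(W_c * h_prev + b_c)
    return f * s_prev, i * cand

def grad(fn, x, eps=1e-6):
    """Central finite-difference derivative of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

s_prev = 1.0
a = grad(lambda s: state_parts(s)[0], s_prev)   # d/ds(t-1) of f(t)*s(t-1)
b = grad(lambda s: state_parts(s)[1], s_prev)   # d/ds(t-1) of i(t)*s_cand(t)

print(f"a = {a:.4f}, b = {b:.4f}, a + b = {a + b:.4f}")
```

So at least for some weight settings, a > 0 does not guarantee a + b > 0 — though note this says the total gradient can be negative, not that it vanishes.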

If we further take the derivative of i(t)*s_cand(t) with respect to s(t-1), the product rule gives

s_cand(t)*sigmoid'(i(t))*W(i)*o(t-1) + i(t)*sigmoid'(s_cand(t))*W*o(t-1),

which would mean W(i), W, and o(t-1) would control whether the gradient vanishes, along with

the first term a that was already explained in the video. Is this understanding correct?
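For what it's worth, the cross term works out as a sum of two chains (product rule), not a product of them. A quick numerical sanity check on a toy scalar cell (all weight values below are made up for illustration, and I assume a tanh candidate and h(t-1) = o(t-1)*tanh(s(t-1)), which may differ from the lecture's exact formulation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy scalar cell; weight values are hypothetical, for illustration only.
W_i, W_c, o_prev, s_prev = 4.0, -4.0, 0.9, 1.0

h_prev = o_prev * math.tanh(s_prev)
i_gate = sigmoid(W_i * h_prev)
cand = math.tanh(W_c * h_prev)
dh_ds = o_prev * (1.0 - math.tanh(s_prev) ** 2)  # from h(t-1) = o(t-1)*tanh(s(t-1))

# Product rule: d/ds(t-1)[ i(t)*s_cand(t) ]
#   = s_cand(t) * sigmoid'(z_i) * W_i * dh/ds  +  i(t) * tanh'(z_c) * W_c * dh/ds
analytic = (cand * i_gate * (1 - i_gate) * W_i * dh_ds
            + i_gate * (1 - cand ** 2) * W_c * dh_ds)

def cross_term(s):
    h = o_prev * math.tanh(s)
    return sigmoid(W_i * h) * math.tanh(W_c * h)

eps = 1e-6
numeric = (cross_term(s_prev + eps) - cross_term(s_prev - eps)) / (2 * eps)
print(f"analytic = {analytic:.6f}, numeric = {numeric:.6f}")
```

Either way, the magnitudes of W(i), W, and o(t-1) do enter every factor of the cross term, so the broad conclusion — that they influence whether this term helps or hurts the gradient — survives the product-rule correction.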