Gradient Descent - Theory code vs Hands-on code


Doubt about gradient descent w.r.t. cross entropy. I think it's supposed to be like in the next screenshot.

Hi @180101120033,
Which specific snippet are you pointing to?

In the 1st screenshot (grad_w_ce & grad_b_ce) and on the right side of the 2nd screenshot (grad_w & grad_b), the expressions are supposed to be equal, because both are gradients of the cross-entropy loss. But why are they different in the explanation?
Can you please help? I got stuck there.


It's a fair doubt, but notice that we're taking grad_b = -1 * (1 - y_pred) only where y == 1.
We can simplify it as:
grad_b = (-1 + y_pred)
Therefore,
grad_b = (y_pred - 1)
which is nothing but
grad_b = (y_pred - y) as y is 1.
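The equivalence above can be checked numerically. Below is a minimal sketch (the toy data, the `sigmoid` helper, and the initial `w`, `b` values are my own assumptions, not from the screenshots): the case-split form from the theory code, `-1 * (1 - y_pred)` where `y == 1` and `y_pred` where `y == 0`, matches the compact hands-on form `y_pred - y` on every sample.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: single feature, binary labels
x = np.array([0.5, 2.0, -1.0, 1.5])
y = np.array([1, 1, 0, 1])
w, b = 0.3, -0.1  # assumed initial parameters

y_pred = sigmoid(w * x + b)

# "Theory" form: gradient of cross entropy, split by label value
grad_b_case = np.where(y == 1, -1 * (1 - y_pred), y_pred)
grad_w_case = grad_b_case * x

# "Hands-on" form: the same gradient written as one expression
grad_b_compact = y_pred - y
grad_w_compact = (y_pred - y) * x

assert np.allclose(grad_b_case, grad_b_compact)
assert np.allclose(grad_w_case, grad_w_compact)
```

So the two snippets compute the same update; the theory code just writes the y == 1 branch of (y_pred - y) explicitly.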