Which specific snippet are you pointing to?
In the 1st screenshot (grad_w_ce & grad_b_ce) and on the right side of the 2nd screenshot (grad_w & grad_b), the gradients are supposed to be equal, because both are used for gradient descent with respect to cross entropy. So why are they different in the explanation?
Could you please help? I got stuck there.
It’s a fair doubt, but notice that we’re taking
grad_b = -1 * (1 - y_pred) only where y is 1.
We can simplify it as:
grad_b = (-1 + y_pred)
grad_b = (y_pred - 1)
which is nothing but
grad_b = (y_pred - y) as y is 1.
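
If it helps, here is a small sketch that checks this numerically. It assumes a single sample, a sigmoid output, and binary cross entropy as the loss; the function names (`sigmoid`, `bce_loss`, `numeric_grad_b`, and so on) are made up for illustration and are not taken from the screenshots.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(z, y):
    # Binary cross entropy for one sample, with z = w*x + b (assumed setup)
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_b_y_is_1(y_pred):
    # The form from the explanation, valid only when y is 1
    return -1 * (1 - y_pred)

def grad_b_general(y_pred, y):
    # The general form for sigmoid + cross entropy
    return y_pred - y

def numeric_grad_b(z, y, eps=1e-6):
    # Central difference; dz/db = 1, so dL/db equals dL/dz
    return (bce_loss(z + eps, y) - bce_loss(z - eps, y)) / (2 * eps)

z = 0.3          # arbitrary pre-activation value
y = 1            # the case discussed above
y_pred = sigmoid(z)

a = grad_b_y_is_1(y_pred)
b = grad_b_general(y_pred, y)
c = numeric_grad_b(z, y)
print(a, b, c)   # all three should agree up to numerical error
```

With y set to 0 instead, `grad_b_y_is_1` no longer applies, while `grad_b_general` still matches the numeric gradient, which is why the two screenshots look different but describe the same thing.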