Backpropagation: Why do we sum the gradient of multiple paths?

Please refer to the screenshot. I am confused as to why the partial derivative here is a sum across multiple paths when it seems like it should be an average. In the equation shown, the RHS works out to twice the LHS, so the two sides don't appear to be equal.

Because the total loss is the sum of all the individual losses.

Assume you want to find \frac{\partial L}{\partial h_{13}} for the above diagram.
We know that Total Loss:
L = L_1 + L_2, where L_1 and L_2 are the losses of output neurons 1 and 2.

Now, since differentiation distributes over the sum, \frac{\partial L}{\partial h_{13}} = \frac{\partial L_1}{\partial h_{13}} + \frac{\partial L_2}{\partial h_{13}}

Partially differentiating L_1 and L_2 with respect to h_{13}, we get:
\frac{\partial L_1}{\partial h_{13}} = \frac{\partial L_1}{\partial a_{21}} \frac{\partial a_{21}}{\partial h_{13}} and \frac{\partial L_2}{\partial h_{13}} = \frac{\partial L_2}{\partial a_{22}} \frac{\partial a_{22}}{\partial h_{13}}
(after dropping the terms that do not depend on h_{13})

Hence, we have:
\frac{\partial L}{\partial h_{13}} = \frac{\partial L}{\partial a_{21}} \frac{\partial a_{21}}{\partial h_{13}} + \frac{\partial L}{\partial a_{22}} \frac{\partial a_{22}}{\partial h_{13}}
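
For anyone still unsure, here is a minimal numerical sketch of the same point, assuming a toy setup (not from the slides) where a single hidden activation h_{13} feeds two sigmoid output units with squared-error losses; the weights w1, w2, targets t1, t2, and the choice of activation are made up for illustration.

```python
import numpy as np

# Toy setup (assumed, not from the slides): one hidden activation h13 feeds
# two sigmoid output units a21 and a22 through weights w1 and w2; each output
# has its own squared-error loss, and the total loss is L = L1 + L2.
w1, w2 = 0.7, -1.3          # assumed weights for h13 -> a21 and h13 -> a22
t1, t2 = 0.5, 0.2           # assumed targets for the two outputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(h13):
    a21, a22 = sigmoid(w1 * h13), sigmoid(w2 * h13)
    L1, L2 = 0.5 * (a21 - t1) ** 2, 0.5 * (a22 - t2) ** 2
    return a21, a22, L1 + L2

h13 = 0.9
a21, a22, L = forward(h13)

# Path 1: dL1/da21 * da21/dh13   and   Path 2: dL2/da22 * da22/dh13
path1 = (a21 - t1) * a21 * (1 - a21) * w1
path2 = (a22 - t2) * a22 * (1 - a22) * w2
grad_sum = path1 + path2

# Central finite-difference estimate of dL/dh13 on the total loss
eps = 1e-6
grad_fd = (forward(h13 + eps)[2] - forward(h13 - eps)[2]) / (2 * eps)

print(grad_sum, grad_fd)    # the two values agree: paths are summed, not averaged
```

The sum of the two path products matches the finite-difference estimate of \frac{\partial L}{\partial h_{13}}; averaging the paths would give only half of it.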


Thanks for the reply! I get all that, but here is the thing. Look at the very last equation of your derivation. On the right-hand side, cancel the \partial a_{21} and \partial a_{22} terms in the two products and you end up with 2 \frac{\partial L}{\partial h_{13}}, which is twice the LHS. What is going on here?

Never mind, I get it now. There's a notation problem: the way it's written in the slides is the wrong way to write it. Everything you have written except the very last step is right.

Great! Please feel free to correct it :)

Replace the two L's on the RHS of your last equation with L_1 and L_2, respectively.
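
Written out in full, the corrected last step is:
\frac{\partial L}{\partial h_{13}} = \frac{\partial L_1}{\partial a_{21}} \frac{\partial a_{21}}{\partial h_{13}} + \frac{\partial L_2}{\partial a_{22}} \frac{\partial a_{22}}{\partial h_{13}}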