Backpropagation chain rule intuition

Hello,
In this video the instructor explains the partial derivatives, and he says that dL/dw121 = (dL/dy) . (dy/dh2) . (dh2/dh1) . (dh1/dw121) (image for reference).

But shouldn't it use the weights in the hidden layers too? Like this:
dL/dw121 = (dL/dy) . (dy/dw3) . (dw3/dh2) . (dh2/dw2) . (dw2/dh1) . (dh1/dw121)

For any dy/dh, you can either express y in terms of h and calculate dy/dh directly,
or express y in terms of some intermediate w, and w in terms of h. Then you can split
dy/dh as dy/dw * dw/dh.
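
To make this concrete for the network in the question, here is a rough sketch. It assumes a hypothetical scalar network with no activation functions (h1 = w1*x, h2 = w2*h1, y = w3*h2) and a squared-error loss; the names w1, w2, w3, x, t and the architecture are illustrative and not taken from the video. In this setup the hidden weights are parameters rather than functions of the activations, so they do not get factors of their own, but they do show up as the values of dy/dh2 and dh2/dh1:

```python
# Hypothetical scalar network, assumed for illustration only:
#   h1 = w1 * x,  h2 = w2 * h1,  y = w3 * h2,  L = (y - t)**2

def loss(w1, w2, w3, x, t):
    h1 = w1 * x
    h2 = w2 * h1
    y = w3 * h2
    return (y - t) ** 2

def grad_w1_chain(w1, w2, w3, x, t):
    # Forward pass, needed to evaluate dL/dy.
    h1 = w1 * x
    h2 = w2 * h1
    y = w3 * h2
    dL_dy = 2.0 * (y - t)
    dy_dh2 = w3    # the hidden weight w3 appears here
    dh2_dh1 = w2   # the hidden weight w2 appears here
    dh1_dw1 = x
    return dL_dy * dy_dh2 * dh2_dh1 * dh1_dw1

def grad_w1_numeric(w1, w2, w3, x, t, eps=1e-4):
    # Central finite difference as an independent check.
    up = loss(w1 + eps, w2, w3, x, t)
    down = loss(w1 - eps, w2, w3, x, t)
    return (up - down) / (2 * eps)

w1, w2, w3, x, t = 0.5, -1.5, 2.0, 3.0, 1.0
chain = grad_w1_chain(w1, w2, w3, x, t)
numeric = grad_w1_numeric(w1, w2, w3, x, t)
print(chain, numeric)  # the two values should agree closely
```

With real activation functions, each factor would also carry the activation's derivative, but the weights would enter the product in the same way.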

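The hidden-layer weights are used implicitly: they appear inside terms such as dy/dh2 when you evaluate them. Both ways of computing a derivative are fine. Consider the example below.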

If y = 2w and w = 3h, we can compute dy/dh in two ways:

=> dy/dw = 2, dw/dh = 3
=> dy/dh = dy/dw * dw/dh = 2 * 3 = 6

Or we can substitute first and differentiate directly:

y = 2w => y = 2(3h) => y = 6h
y = 6h, dy/dh = 6

Note how, in the second approach, w is not used directly in the derivative but is used in constructing the expression for y in terms of h.
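
If it helps, the toy example can be checked numerically. This is only a small sketch: the helper `derivative` and the function names are made up for illustration, and the derivative is approximated with a central finite difference rather than computed symbolically.

```python
def y_of_w(w):
    return 2.0 * w

def w_of_h(h):
    return 3.0 * h

def y_of_h(h):
    # y written directly in terms of h: y = 2 * (3h) = 6h
    return y_of_w(w_of_h(h))

def derivative(f, x, eps=1e-4):
    # Central finite-difference approximation of f'(x).
    return (f(x + eps) - f(x - eps)) / (2 * eps)

h = 1.7            # arbitrary evaluation point
w = w_of_h(h)

# Way 1: chain rule, dy/dh = dy/dw * dw/dh
via_chain = derivative(y_of_w, w) * derivative(w_of_h, h)

# Way 2: differentiate y(h) = 6h directly
direct = derivative(y_of_h, h)

print(via_chain, direct)  # both should be approximately 6
```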
