Backpropagation for multiclass classification

In the attached screenshot of multiclass classification, how is dw5 calculated?

How is the term (self.h3 - self.y1) obtained? Can you please explain the derivative behind it in detail?



Hi @shweta_s,
It’s the same chain rule that was used to explain backprop in the theory lectures, which I hope you have already covered. Please revisit them once and let me know where you’re getting stuck.

Hi Ishvinder,

I was trying to work out the chain of derivatives for w5 from the screenshot in my message above.

dL/dw5 = dL/dy1 * dy1/dh3 * dh3/da3 * da3/dw5

dL/dy1 = (y1-y1^)/(y1^ * (1-y)), where the loss is cross entropy and y1^ denotes y1 hat, i.e. the predicted output

If my understanding above is correct, what is y1 in terms of h3, and what is dy1/dh3?

One thing I do know is that y1 is the softmax output: y1 = e^a3/(e^a3 + e^a4 + e^a5 + e^a6)

I got stuck working out the terms of this chain of derivatives. Can you please give me some guidance here?
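
One way to check a chain like this is to compare it numerically against a finite-difference gradient. The sketch below is only an illustration and makes assumptions the thread does not confirm: that h3 is the softmax output for a3 (as the code's (self.h3 - self.y1) suggests), that y1 is the matching entry of a one-hot target, and that the loss is cross entropy L = -sum(y_k * log(h_k)). Under those assumptions the softmax and log terms cancel, so dL/da3 = h3 - y1, and dw5 is that value times whatever input w5 multiplies. The values of a, y and x are made up for the example.

```python
import math

def softmax(a):
    # Subtract the max before exponentiating for numerical stability.
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(h, y):
    # L = -sum_k y_k * log(h_k), with y a one-hot target.
    return -sum(yk * math.log(hk) for yk, hk in zip(y, h))

def analytic_grad(a, y):
    # Claimed result of the chain rule: dL/da_k = h_k - y_k.
    h = softmax(a)
    return [hk - yk for hk, yk in zip(h, y)]

def numeric_grad(a, y, eps=1e-6):
    # Central finite differences on L(softmax(a), y), one a_k at a time.
    grads = []
    for i in range(len(a)):
        up = list(a)
        down = list(a)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(softmax(up), y) - cross_entropy(softmax(down), y)
        grads.append(diff / (2 * eps))
    return grads

# Hypothetical values: pre-activations a3..a6 and a one-hot target y1..y4.
a = [0.5, -1.2, 0.3, 2.0]
y = [1.0, 0.0, 0.0, 0.0]
# Hypothetical input that w5 multiplies on its way into a3, so da3/dw5 = x.
x = 0.7

analytic = analytic_grad(a, y)
numeric = numeric_grad(a, y)
dw5 = analytic[0] * x  # (h3 - y1) * da3/dw5

print("analytic dL/da:", analytic)
print("numeric  dL/da:", numeric)
print("dw5:", dw5)
```

If the two gradient lists agree to several decimal places, the (h3 - y1) shortcut is consistent with the full chain dL/dh * dh/da summed over all four softmax outputs, which is why the code does not compute dL/dh3 and dh3/da3 separately.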