What will be the result of the partial derivative of a vector with respect to a matrix, and vice versa?

In the sequence problem we saw the gradient of the loss with respect to the weights written as \frac{\partial L_{t}(\theta)}{\partial W} = \frac{\partial L_{t}(\theta)}{\partial s_{t}} \sum_{k =1}^{t}\prod_{j = k}^{t - 1}\frac{\partial s_{j+1}}{\partial s_{j}}\frac{\partial s_{k}}{\partial W}. Here, what would be the result of \frac{\partial s_{k}}{\partial W}?

  1. s_{k} is a vector, so what is the partial derivative of a vector with respect to a matrix W? Does it result in a matrix, or in a tensor of higher dimension?

  2. Apart from the sequence equation, what is the result of the partial derivative of a matrix with respect to a vector? Say in the case of \frac{\partial {W}}{\partial s_{k}}, where W is a matrix and s_{k} is a vector?

I just want to understand the math behind them.


My understanding:

  • A derivative is a scalar. For example, df/dx means the change in f for a unit change in x (i.e., a unit change in the x direction). The same holds for df/dy, etc.
  • Note that df/dv (the derivative with respect to a vector v) is also a scalar. Consider a 2-D space where v is some linear combination of x and y: a unit change along v means some change in the corresponding x and y components, which can be propagated to f to calculate the corresponding delta in f. That delta is still a single number, i.e., a scalar.

Now coming to the question:

I think gradient is a better term here than partial derivative.

The result of the above operation will be a collection of partial derivatives (a collection of partial derivatives is what a gradient is), where each derivative is taken for an individual component of the vector with respect to an individual element of the matrix.

Dimension-wise, for every component of the vector you get n x m partial derivatives, where n x m is the dimension of the matrix.
Specifically, if the vector has 2 components and the matrix is 3 x 4, the corresponding gradient can be written as a 2 x 3 x 4 arrangement of partial derivatives.
The same reasoning can be extended to the second question.
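To make the 2 x 3 x 4 shape concrete, here is a small NumPy sketch. The functional form s(W) = A W x is my own assumption (the names A, x, and the dimensions are hypothetical), chosen only so that a 2-component vector depends on a 3 x 4 matrix; the gradient tensor is built entry by entry with finite differences and checked against the analytic formula \partial s_i / \partial W_{jk} = A_{ij} x_k.

```python
import numpy as np

# Hypothetical setup: a 2-component vector s that depends on a 3 x 4 matrix W
# via s(W) = A @ W @ x, with fixed A (2 x 3) and x (4,).
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
x = rng.normal(size=4)
W = rng.normal(size=(3, 4))

def s(W):
    return A @ W @ x  # shape (2,)

# Build the gradient ds_i / dW_jk by central finite differences:
# one 3 x 4 layer of partial derivatives per component of s.
eps = 1e-6
grad = np.zeros((2, 3, 4))
for j in range(3):
    for k in range(4):
        Wp, Wm = W.copy(), W.copy()
        Wp[j, k] += eps
        Wm[j, k] -= eps
        grad[:, j, k] = (s(Wp) - s(Wm)) / (2 * eps)

print(grad.shape)  # (2, 3, 4)

# Analytically, ds_i/dW_jk = A[i, j] * x[k]; compare:
analytic = np.einsum('ij,k->ijk', A, x)
print(np.allclose(grad, analytic, atol=1e-5))  # True
```

So the gradient of a 2-vector with respect to a 3 x 4 matrix is indeed a 2 x 3 x 4 stack of partial derivatives, matching the counting argument above.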

PS: This is just my current understanding. I request others to share their responses to the original post as well :+1:


Refer to the image: as we can see, the derivative of a vector with respect to a matrix (where S is the vector and W is the matrix) is a 3-D tensor. Its top layer contains the derivatives of the first component of the vector with respect to each element of the matrix. Similarly, the second layer holds the derivatives of the second component of the vector with respect to each element of the matrix, which is hard to represent in the diagram, but I hope this gives an intuition for how things work in higher dimensions.
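That layered structure can also be shown in code. As an assumption (the post does not fix a particular functional form), take the common linear case s = W x with W of shape 2 x 3; then \partial s_i / \partial W_{jk} = \delta_{ij} x_k, so layer i of the tensor is zero everywhere except in its own row i, where it holds x.

```python
import numpy as np

# Assumed example: s = W @ x with a 2 x 3 matrix W, so s has 2 components.
x = np.array([1.0, 2.0, 3.0])
W = np.arange(6.0).reshape(2, 3)
s = W @ x

# ds_i / dW_jk = delta_ij * x_k: build the 2 x 2 x 3 tensor layer by layer.
T = np.zeros((2, 2, 3))
for i in range(2):
    T[i, i, :] = x  # layer i: x sits in row i, all other entries are zero

print(T[0])  # layer for s_0
print(T[1])  # layer for s_1

# Spot-check one entry by a finite difference: perturbing W[0, 2] should
# change s_0 at rate x[2].
eps = 1e-6
Wp = W.copy()
Wp[0, 2] += eps
print(((Wp @ x - s) / eps)[0])  # approximately x[2] = 3.0
```

Each "layer" printed here is exactly one slice of the 3-D tensor the diagram is trying to depict.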

If this question refers to the theory videos, in which S is a vector of inputs that are constant, then the derivative of a constant with respect to a variable matrix W will always be zero, since the slope is defined as the change in the y variable for a given change in the x variable.
Hope this is helpful.