Here, while defining the RNN class, why are we passing input_size + hidden_size to the nn.Linear layer? Why not pass just input_size alone as the first parameter? What is the intuition behind it?
I have the same query. One more thing that I can’t understand is self.i2o. I thought it was supposed to be i2h and h2o
As explained in the theory lectures, there are two weight matrices, U and W. That's why the first layer takes input_size + hidden_size neurons: think of it as a scenario where the U and W inputs are combined, but there are only hidden_size outputs.
And here we're defining a fully connected layer for the hidden state, which takes both the current input and the previous hidden state together.
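To make this concrete, here is a minimal sketch of such an RNN module (the class and attribute names follow the classic PyTorch char-RNN tutorial; the exact sizes 57/128/18 are assumptions for illustration):

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # One linear layer acts on the concatenated vector [x(t); h(t-1)],
        # so its weight matrix holds both U and W side by side.
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        # i2o maps the same concatenated vector straight to the output,
        # which is why it is named i2o rather than h2o.
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, hidden):
        combined = torch.cat((x, hidden), dim=1)  # shape (1, input+hidden)
        hidden = self.i2h(combined)               # new hidden state
        output = self.softmax(self.i2o(combined))
        return output, hidden

rnn = RNN(57, 128, 18)
out, h = rnn(torch.zeros(1, 57), torch.zeros(1, 128))
print(out.shape, h.shape)  # torch.Size([1, 18]) torch.Size([1, 128])
```

Note that i2h alone already contains both U (its first 57 input columns) and W (the remaining 128), which answers the original question about input_size + hidden_size.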
After going through the code and working it out on paper, I understood it as below:
As h(t) = W·h(t-1) + U·x(t).
Here, in this context, the dimensions are: x = 57 × 1, h = 128 × 1, U = 128 × 57, and W = 128 × 128.
Adding W·h(t-1) and U·x(t) gives a result of dimension 128 × 1.
Instead of doing the two multiplications separately and adding them, nn.Linear here defines the input size as (input + hidden) and computes the result directly in a single step.
So finally we get a weight matrix of shape 128 × 185, according to the nn.Linear definition.
Here, in the weight tensor, the first 57 columns correspond to U and the remaining 128 columns correspond to the W matrix.
Could you confirm whether I am on the right track?
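The equivalence described above can be checked numerically. The sketch below (dimensions taken from this thread; variable names are my own) shows that multiplying by one combined 128 × 185 matrix gives the same result as the two separate multiplications:

```python
import torch

torch.manual_seed(0)
x = torch.randn(57, 1)    # input x(t)
h = torch.randn(128, 1)   # previous hidden state h(t-1)
U = torch.randn(128, 57)  # input-to-hidden weights
W = torch.randn(128, 128) # hidden-to-hidden weights

# Two separate multiplications, then add:
separate = U @ x + W @ h

# One combined matrix [U | W] acting on the stacked vector [x; h]:
combined_weight = torch.cat((U, W), dim=1)  # shape (128, 185)
combined_input = torch.cat((x, h), dim=0)   # shape (185, 1)
combined = combined_weight @ combined_input

print(torch.allclose(separate, combined, atol=1e-5))  # True
```

This is exactly why nn.Linear(input_size + hidden_size, hidden_size) works: it stores [U | W] as a single weight tensor.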
Yes. Please check the below explanation to understand it in detail:
Creating a simple RNN Network using PyTorch from scratch
This still doesn't align with the theory lectures, where there was supposed to be a separate weight matrix between the computed hidden state and the output layer, as O = softmax(V·h + b).
@GokulNC Can you help me relate the implementation with this explanation ?
If you look at this tutorial, it uses only the new hidden state to compute the output, instead of the combination of input and previous hidden state used in our notebook.