The softmax is applied to the hidden state in an RNN (or to the pre-activation layer in an FNN). As highlighted in the screenshot below, the class says that softmax is applied along the self.hidden_size axis. Can someone please elaborate on why the 3rd dimension of the tensor in the init_hidden function is the right axis along which to apply the softmax function?
By default, the output of an RNN has the shape (sequence_length, batch_size, output_size).
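For reference, here is a minimal sketch of that default layout using PyTorch's built-in nn.RNN with batch_first left at its default; the sizes are made up for illustration, and note that nn.RNN's last output dimension is hidden_size (the tutorial's model adds a linear layer on top of it to produce output_size):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just to show the default (seq_len, batch, features) layout.
seq_len, batch_size, input_size, hidden_size = 5, 3, 10, 20
rnn = nn.RNN(input_size, hidden_size)  # batch_first=False by default
x = torch.randn(seq_len, batch_size, input_size)
output, h_n = rnn(x)
print(output.shape)  # torch.Size([5, 3, 20]) -> (sequence_length, batch_size, hidden_size)
```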
In the above tutorial, for simplicity, both sequence_length and batch_size are 1.
So the output just before the softmax stage has shape (1, 1, num_hindi_chars).
Hence we apply softmax along the last dimension alone.
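A minimal sketch of that last step, assuming PyTorch and a made-up value for num_hindi_chars: the last axis (dim=2, equivalently dim=-1) is the only one with more than one entry, so it is the axis the probabilities should sum over.

```python
import torch
import torch.nn as nn

num_hindi_chars = 128                         # hypothetical vocabulary size
logits = torch.randn(1, 1, num_hindi_chars)   # (sequence_length=1, batch_size=1, output_size)

softmax = nn.Softmax(dim=2)                   # dim=2 is the last axis, same as dim=-1 here
probs = softmax(logits)
print(probs.shape)       # torch.Size([1, 1, 128])
print(probs.sum(dim=2))  # tensor([[1.]]) -- probabilities over characters sum to 1
```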