GRU - application of softmax on the last axis

The softmax is applied to the hidden state in an RNN (or to the pre-activation layer in an FNN).
As highlighted in the screenshot below, the class says that softmax is applied along the self.hidden_size axis.
Can someone please elaborate on why the 3rd dimension of the tensor in the init_hidden function is the right axis to apply the softmax function to?

By default, the output of the RNN has the shape (sequence_length, batch_size, output_size).

In the above tutorial, for simplicity, both sequence_length and batch_size are 1.
So the output just before the softmax stage has the shape (1, 1, num_hindi_chars).

Hence we apply softmax on the last dimension alone.
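Here is a minimal sketch of that shape logic, not the course's exact code: `num_hindi_chars` comes from the post above, while `hidden_size`, the linear projection, and the single-step input are my assumptions for illustration.

```python
import torch
import torch.nn as nn

hidden_size = 128        # assumed; the tutorial may use a different value
num_hindi_chars = 128    # vocabulary size, from the post above

gru = nn.GRU(input_size=num_hindi_chars, hidden_size=hidden_size)
fc = nn.Linear(hidden_size, num_hindi_chars)

x = torch.zeros(1, 1, num_hindi_chars)   # (seq_len=1, batch=1, input_size)
h0 = torch.zeros(1, 1, hidden_size)      # (num_layers=1, batch=1, hidden_size)

out, hn = gru(x, h0)                     # out: (1, 1, hidden_size)
scores = fc(out)                         # scores: (1, 1, num_hindi_chars)

# dim=2 (equivalently dim=-1) is the only axis that indexes the character
# scores; the first two axes are just seq_len and batch, both of size 1.
probs = torch.softmax(scores, dim=2)

print(probs.shape)       # torch.Size([1, 1, num_hindi_chars])
print(probs.sum(dim=2))  # each row sums to 1, i.e. a valid distribution
```

Using `dim=-1` instead of `dim=2` is equivalent here and keeps working if you later use a real batch or longer sequences, since the class scores always sit on the last axis.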
