I tried to implement a seq2seq model and add an attention mechanism to it. I was able to implement the mechanism successfully for a single input, but I am unable to make it work for batched inputs. The problem lies in computing Uatt(encoder_inp) and Watt(hidden_state), because with a batch dimension the result becomes a 3D tensor and I don't know how to combine the two terms. Any suggestions?
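One common way to batch additive (Bahdanau-style) attention in PyTorch is sketched below. It assumes batch-first encoder outputs of shape (B, T, enc_dim) and a decoder hidden state of shape (B, dec_dim), and a score of the form v^T tanh(Uatt(h_enc) + Watt(s)). The class name, the scoring layer `v`, and the optional `mask` argument are hypothetical additions for illustration. The key trick is that `nn.Linear` acts on the last dimension, so `Uatt` yields (B, T, attn_dim). `Watt(hidden)` yields (B, attn_dim), and `unsqueeze(1)` turns it into (B, 1, attn_dim) so it broadcasts across all T encoder positions. The weighted sum is then a batched matrix multiply.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BatchedAdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention over a batch; a sketch, not a reference implementation."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.Uatt = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder outputs
        self.Watt = nn.Linear(dec_dim, attn_dim, bias=False)  # projects decoder hidden state
        self.v = nn.Linear(attn_dim, 1, bias=False)           # hypothetical scoring vector

    def forward(self, encoder_out, hidden, mask=None):
        # encoder_out: (B, T, enc_dim), hidden: (B, dec_dim)
        enc_proj = self.Uatt(encoder_out)          # (B, T, attn_dim)
        dec_proj = self.Watt(hidden).unsqueeze(1)  # (B, 1, attn_dim), broadcasts over T
        scores = self.v(torch.tanh(enc_proj + dec_proj)).squeeze(-1)  # (B, T)
        if mask is not None:
            # mask: (B, T) bool, True for real tokens; padded positions get zero weight
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=1)         # (B, T), each row sums to 1
        # (B, 1, T) @ (B, T, enc_dim) -> (B, 1, enc_dim) -> (B, enc_dim)
        context = torch.bmm(weights.unsqueeze(1), encoder_out).squeeze(1)
        return context, weights


if __name__ == "__main__":
    torch.manual_seed(0)
    B, T, enc_dim, dec_dim, attn_dim = 4, 7, 16, 32, 10
    attn = BatchedAdditiveAttention(enc_dim, dec_dim, attn_dim)
    encoder_out = torch.randn(B, T, enc_dim)
    hidden = torch.randn(B, dec_dim)
    context, weights = attn(encoder_out, hidden)
    print(context.shape, weights.shape)  # torch.Size([4, 16]) torch.Size([4, 7])
```

If the hidden state comes straight from an `nn.GRU` or `nn.LSTM`, it has shape (num_layers * num_directions, B, H), so you would typically pass the last layer, e.g. `hidden[-1]`, to get (B, H). The mask matters when sequences in a batch are padded to different lengths. Without it, padding positions receive attention weight.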
Duplicate of: How to implement attention with batching in PyTorch