Sorry if this is a bit verbose; the question is basic, and I tried not to leave out anything important while asking it.
I have a very basic question. Why would we call this an encoder-decoder when, for example, an RNN encodes (produces a state s_t) and an FNN decodes (produces the output)? I don’t recall an RNN having an FNN inside it to give me an output. So why does it take a combination of two different kinds of networks to be called an encoder-decoder network? You just need a softmax function at the end to give you the output, so why is another FNN required before it can be labelled an encoder-decoder? Also, if that’s the naming rule, wouldn’t deep CNN models (like ResNet and GoogLeNet) also be categorised as encoder-decoders? They also have FNNs at the end of the CNN layers, making them a combination of two different types of networks.
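To make the picture in my head concrete, here is a minimal NumPy sketch of the kind of RNN I mean, where the state s_t goes straight into a single linear map plus softmax with no separate FNN. The names (`rnn_forward`, `W_x`, `W_s`, `W_o`), the tanh nonlinearity, and the dimensions are my own illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, W_x, W_s, W_o):
    """Run a plain RNN over a sequence xs of input vectors.

    Returns the final state s_T and the per-step output distributions.
    """
    s = np.zeros(W_s.shape[0])  # initial state s_0
    outputs = []
    for x in xs:
        # Recurrent update: this is the part I would call "encoding" (produces s_t).
        s = np.tanh(W_x @ x + W_s @ s)
        # Output: one linear map followed by softmax, no extra hidden layers.
        outputs.append(softmax(W_o @ s))
    return s, outputs

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
input_dim, hidden_dim, num_classes = 4, 8, 3
W_x = rng.normal(size=(hidden_dim, input_dim))
W_s = rng.normal(size=(hidden_dim, hidden_dim))
W_o = rng.normal(size=(num_classes, hidden_dim))
xs = rng.normal(size=(5, input_dim))  # a sequence of 5 input vectors

s_T, ys = rnn_forward(xs, W_x, W_s, W_o)
```

In this sketch the only thing after the recurrent part is `W_o` and the softmax, which is why I am unsure what the extra FNN adds to justify the name.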