Why does having an FFN at the end of an RNN make it an Encoder-Decoder model?

Sorry if I'm a bit verbose in asking this question; it is basic in nature, and I tried not to miss any vital details while putting it across.

I have a very basic question. Why would we call this an encoder-decoder when, for example, the RNN encodes (producing a state s_t) and the FFN decodes (producing the output)? I don’t recall an RNN network needing an FFN to give me an output. So why does it take a combination of two different kinds of networks to be called an encoder-decoder network? You just need a softmax function at the end to give you the output, so why does adding an FFN make it an encoder-decoder? Also, if that’s the criterion, wouldn’t deep CNN models (like ResNet or GoogLeNet) also be categorised as encoder-decoder? They also have FFNs after the CNN layers, making them a combination of two different types of networks.

Having a fully-connected layer after an RNN does not make it an encoder-decoder model.
It’s just a sequential model. :slight_smile:
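To make the distinction concrete, here is a minimal NumPy sketch of that sequential model: a plain RNN rolled over the input, followed by one fully-connected layer and a softmax. All sizes and weights are hypothetical (random, untrained) and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration
input_dim, hidden_dim, num_classes, seq_len = 4, 8, 3, 5

# RNN parameters (random here; a real model would learn these)
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

# Fully-connected output layer
W_hy = rng.normal(size=(num_classes, hidden_dim)) * 0.1
b_y = np.zeros(num_classes)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_classifier(xs):
    # Roll the RNN over the whole sequence...
    h = np.zeros(hidden_dim)
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    # ...then one FC layer + softmax on the final state.
    # This is a plain sequential model, NOT an encoder-decoder:
    # nothing here decodes back into a sequential output domain.
    return softmax(W_hy @ h + b_y)

xs = rng.normal(size=(seq_len, input_dim))
probs = rnn_classifier(xs)  # a probability distribution over num_classes
```

The FC layer and softmax just map the final hidden state to class probabilities; they are an output head, not a decoder.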

If the model has RNN layer(s) that encode the sequential input domain, and further RNN layer(s) on top of those that decode the encoded input into a sequential output domain, we can call it a sequence-to-sequence encoder-decoder model.

The transliteration model shown in the course is an example of a sequence-to-sequence encoder-decoder model, which finally has an FFN+softmax to classify the output characters.
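By contrast, here is a minimal sketch of the encoder-decoder shape (again with hypothetical sizes and untrained random weights, so the outputs are meaningless): an encoder RNN compresses the input sequence into a state, and a separate decoder RNN unrolls from that state to emit an output sequence, with an FC+softmax classifying each output step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only
in_dim, hid, out_classes = 4, 8, 6

# Encoder RNN parameters
We_x = rng.normal(size=(hid, in_dim)) * 0.1
We_h = rng.normal(size=(hid, hid)) * 0.1

# Decoder RNN parameters (input is the previous output, one-hot encoded)
Wd_x = rng.normal(size=(hid, out_classes)) * 0.1
Wd_h = rng.normal(size=(hid, hid)) * 0.1

# FC + softmax head, applied at EVERY decoder step
W_y = rng.normal(size=(out_classes, hid)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def encode(xs):
    # Encoder: roll over the input sequence; the final state
    # is the encoded summary of the whole input.
    h = np.zeros(hid)
    for x in xs:
        h = np.tanh(We_x @ x + We_h @ h)
    return h

def decode(h, steps):
    # Decoder: a second RNN seeded with the encoder's state,
    # feeding each predicted token back in as the next input.
    y = np.zeros(out_classes)  # stand-in for a start token
    out = []
    for _ in range(steps):
        h = np.tanh(Wd_x @ y + Wd_h @ h)
        p = softmax(W_y @ h)           # FC + softmax per output step
        y = np.eye(out_classes)[p.argmax()]
        out.append(int(p.argmax()))
    return out

src = rng.normal(size=(5, in_dim))
tokens = decode(encode(src), steps=4)  # a decoded output sequence
```

The key point is structural: the FC+softmax head exists in both sketches, but only this one has a dedicated decoding stage that produces a sequence from the encoded input. That stage, not the output head, is what earns the name encoder-decoder.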

Sorry, but the confusion still remains. I now have reason to believe that encoder-decoder models have to have a CNN component in them (or maybe not). Hear me out, and please refer to my original query one more time.

In the course (see the 7th video tutorial in the Encoder-Decoder module), the instructor reinforces the idea that in language modelling we are taught how to combine an RNN and an FFN. So essentially he means that using a softmax function is using an FFN (please correct me if this assumption is wrong), applied either at each step or at the end. If that is the case, is it wrong to suggest that an RNN-and-FFN combination can never be an example of an encoder-decoder model (I know that an RNN is sequential, but can it be categorised as an encoder-decoder model)? I see the FFN’s softmax at the end of the RNN network as decoding (but my assessment could be wrong).

There is another aspect I deduce from the theory class that might be an essential feature of an encoder-decoder model: the input from the other model (like the CNN in the video tutorial above) has to be fed either at the beginning or at each time step. That is what a true combination looks like, and hence the encoder-decoder definition fits (i.e. the CNN encodes the input and the RNN decodes it at the output, albeit doing that decoding at each step). If that’s true, doesn’t your suggestion that adding an FFN’s softmax can’t be treated as an encoder-decoder model turn out to be incorrect?

Take your time, no rush, but please give me a thorough explanation when you answer my doubt.