In the previous article, we saw the translation being produced. But we are not done yet: the decoder does not stop until it outputs an EOS token.
So we plug the word "Vamos" into the decoder's embedding layer and unroll the two LSTM cells in each layer. Then we run the output values (the short-term memories, or hidden states) through the same fully connected layer.
The next predicted token is EOS.
How the Decoder Works
This means we have translated the English sentence "let's go" into the correct Spanish sentence.
For the decoder, the context vector, which is created by both layers of the encoder's unrolled LSTM cells, is used to initialize the LSTMs in the decoder. The input to those LSTMs comes from the decoder's own word embedding layer, which starts with the EOS token; after that, each step uses whatever word the output layer predicted at the previous step.
In practice, the decoder keeps predicting words until it predicts the EOS token or reaches some maximum output length.
All these weights and biases are trained using backpropagation.
When training an encoder-decoder model, instead of using the predicted token as input to the decoder LSTMs, we use the known correct token. This is known as teacher forcing (explained in this article).
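A small sketch makes the difference concrete. As in the previous sketch, `decoder_step` is a hypothetical toy stand-in for the real decoder; the point is only that during training each step is fed the known correct previous token, not the model's own prediction.

```python
EOS = "<EOS>"

def decoder_step(token, state):
    # Hypothetical stand-in for embedding -> LSTMs -> fully connected layer.
    predictions = {EOS: "vamos", "vamos": EOS}
    return predictions.get(token, EOS), state

def decode_with_teacher_forcing(target_tokens, initial_state):
    # The inputs are the gold tokens, shifted right: EOS first, then the
    # known correct words, regardless of what the model actually predicted.
    state = initial_state
    inputs = [EOS] + target_tokens[:-1]
    predicted = []
    for gold_input in inputs:
        pred, state = decoder_step(gold_input, state)
        predicted.append(pred)
    return predicted

# Target sentence "vamos" followed by EOS:
print(decode_with_teacher_forcing(["vamos", EOS], None))  # ['vamos', '<EOS>']
```

During training, the predicted tokens are compared against the targets to compute the loss, while the inputs stay fixed to the gold tokens; at inference time, the loop from earlier takes over and feeds each prediction back in.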
What’s Next
That’s it for sequence-to-sequence neural networks.
In the next article, we will continue with the attention mechanism for neural networks.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you’re done! 🚀

