Seq2Seq Models
Sequence‑to‑sequence (Seq2Seq) models transform an input sequence into an output sequence. They are used for machine translation, summarization, question answering, and more. The architecture consists of an encoder and a decoder.
Encoder: processes the input sequence into a context vector. Decoder: generates the output sequence from that context vector.
Encoder
An RNN (LSTM or GRU) reads the input sequence token by token. The final hidden state (or the full set of hidden states) encodes the input's meaning. For long sequences, a single fixed-size context vector becomes a bottleneck; attention (next chapter) addresses this.
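The encoder loop can be sketched in a few lines. This is a minimal illustration using a plain (Elman) RNN cell rather than an LSTM or GRU, with made-up NumPy weights and sizes; the function name and dimensions are assumptions for demonstration only.

```python
import numpy as np

def encode(tokens, embed, W_xh, W_hh, b_h):
    """Run a plain RNN over the input tokens; return the final hidden state.

    That final state serves as the context vector handed to the decoder.
    """
    h = np.zeros(W_hh.shape[0])                  # initial hidden state
    for t in tokens:
        x = embed[t]                             # token embedding lookup
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # recurrent update
    return h

# Illustrative sizes (all made up): vocab 10, embedding 4, hidden 6
rng = np.random.default_rng(0)
embed = rng.normal(size=(10, 4))
W_xh = 0.1 * rng.normal(size=(6, 4))
W_hh = 0.1 * rng.normal(size=(6, 6))
b_h = np.zeros(6)

context = encode([3, 1, 7], embed, W_xh, W_hh, b_h)
print(context.shape)   # (6,)
```

Whatever the input length, the output is one fixed-size vector, which is exactly why attention becomes useful for long sequences.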
Decoder
Another RNN generates the output sequence one token at a time. It takes the encoder's final hidden state as its initial state. At each step, it outputs a token and passes its hidden state to the next step. Training uses teacher forcing (feeding the true previous token instead of the predicted one).
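A training-time decoder with teacher forcing can be sketched the same way. Again this is an illustrative NumPy sketch with a plain RNN cell and invented names and sizes, not a production implementation.

```python
import numpy as np

def decoder_teacher_forced(context, targets, sos, embed, W_xh, W_hh, b_h, W_hy, b_y):
    """Training-time decoding: the true previous token is fed at each step."""
    h = context                                   # encoder's final state seeds the decoder
    prev, logits_per_step = sos, []
    for y_true in targets:
        x = embed[prev]
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)    # recurrent update
        logits_per_step.append(W_hy @ h + b_y)    # scores over the vocabulary
        prev = y_true                             # teacher forcing: use the true token
    return np.stack(logits_per_step)

rng = np.random.default_rng(1)
V, E, H = 10, 4, 6                                # vocab, embedding, hidden sizes (illustrative)
embed = rng.normal(size=(V, E))
W_xh, W_hh = 0.1 * rng.normal(size=(H, E)), 0.1 * rng.normal(size=(H, H))
W_hy = 0.1 * rng.normal(size=(V, H))
b_h, b_y = np.zeros(H), np.zeros(V)

logits = decoder_teacher_forced(np.zeros(H), [4, 2, 9], sos=0,
                                embed=embed, W_xh=W_xh, W_hh=W_hh,
                                b_h=b_h, W_hy=W_hy, b_y=b_y)
print(logits.shape)   # (3, 10)
```

The per-step logits would normally feed a cross-entropy loss against the target tokens.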
Example: Machine Translation
Input (English): "How are you?" → Encoder → Context vector → Decoder → Output (French): "Comment allez‑vous?"
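At inference time there is no ground truth, so the decoder feeds its own prediction back in, typically stopping at an end-of-sequence token. A minimal greedy-decoding sketch, again with an illustrative plain-RNN cell and assumed token IDs for start and end markers:

```python
import numpy as np

def greedy_decode(context, embed, W_xh, W_hh, b_h, W_hy, b_y,
                  sos=0, eos=1, max_len=20):
    """Inference: feed the model's own previous prediction at each step."""
    h, prev, out = context, sos, []
    for _ in range(max_len):                      # cap length in case eos never appears
        x = embed[prev]
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        prev = int(np.argmax(W_hy @ h + b_y))     # greedy: pick the most likely token
        if prev == eos:
            break
        out.append(prev)
    return out

rng = np.random.default_rng(2)
V, E, H = 10, 4, 6                                # illustrative sizes
embed = rng.normal(size=(V, E))
W_xh, W_hh = 0.1 * rng.normal(size=(H, E)), 0.1 * rng.normal(size=(H, H))
W_hy = 0.1 * rng.normal(size=(V, H))
b_h, b_y = np.zeros(H), np.zeros(V)

tokens = greedy_decode(np.zeros(H), embed, W_xh, W_hh, b_h, W_hy, b_y)
```

With random weights the output tokens are meaningless; in a trained model they would be the translated sentence. Beam search is a common alternative to the greedy argmax.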
Encoder RNN → final hidden state → Decoder RNN (start token) → predict next token → repeat until end token.
Teacher Forcing
During training, instead of feeding the decoder's own previous output, we feed the ground truth token. This speeds up convergence and stabilizes training. At inference, we use the predicted token.
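A common variant, not described above, blends the two regimes with a teacher-forcing ratio: at each step, the ground-truth token is fed with some probability, otherwise the model's own prediction. The helper name and token values here are illustrative.

```python
import random

def next_decoder_input(true_token, predicted_token, ratio, rng=random):
    """Pick the decoder's next input during training.

    With probability `ratio`, feed the ground-truth token (teacher forcing);
    otherwise feed the model's own prediction, as at inference time.
    """
    return true_token if rng.random() < ratio else predicted_token

# ratio=1.0 is pure teacher forcing; ratio=0.0 is pure free running
assert next_decoder_input(5, 8, ratio=1.0) == 5
assert next_decoder_input(5, 8, ratio=0.0) == 8
```

Gradually lowering the ratio over training reduces the mismatch between training and inference behavior.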
Two Minute Drill
- Seq2Seq has encoder (reads input) and decoder (generates output).
- Context vector captures input meaning.
- Used for translation, summarization.
- Teacher forcing speeds training.
Need more clarification?
Drop us an email at career@quipoinfotech.com
