Groups
Sequence-to-sequence with attention lets a decoder focus on the most relevant parts of the input at each output step, rather than compressing everything into a single vector.
Long Short-Term Memory (LSTM) networks use gates (forget, input, and output) to control what information to erase, write, and reveal at each time step.