Groups
Category
Sequence-to-sequence with attention lets a decoder focus on the most relevant parts of the input at each output step, rather than compressing everything into a single vector.
Temporal (causal) convolution computes each output at time t using only the current and past inputs, ensuring no future information leakage.