The paper shows that video AIs do not need long, human-like chains of thought to reason well.
Large language models use RoPE to encode word order, but when attention is viewed as complex multiplication, the standard score keeps only the real part of the query-key product and throws away the imaginary part.
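A minimal numerical sketch of that claim (the helper name and the values below are illustrative, not from the paper): RoPE rotates each two-dimensional query/key pair by an angle proportional to its position, and the ordinary dot-product score then equals the real part of the complex product q·conj(k), so the imaginary part never reaches the attention weights.

```python
import numpy as np

def rope_rotate(x, pos, theta):
    """Rotate a (real, imag) pair by pos * theta, i.e. RoPE for one frequency."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

theta = 0.1
q = np.array([0.8, -0.3])   # one query pair, at position m
k = np.array([0.5, 0.7])    # one key pair, at position n
m, n = 5, 2

q_rot = rope_rotate(q, m, theta)
k_rot = rope_rotate(k, n, theta)

# Standard attention score for this pair: a plain real-valued dot product.
score_real = q_rot @ k_rot

# Complex view: the same rotation is multiplication by e^{i * pos * theta},
# and the score is the real part of q_c * conj(k_c).
q_c = complex(q[0], q[1]) * np.exp(1j * m * theta)
k_c = complex(k[0], k[1]) * np.exp(1j * n * theta)
full = q_c * np.conj(k_c)

print(score_real, full.real)   # equal: attention keeps only the real part
print(full.imag)               # the imaginary part is discarded
```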