Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
This paper teaches video-language models to first locate when the supporting evidence appears in a video and then answer using that evidence, instead of mixing both steps together.
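
The relay idea in the RelayLLM summary can be pictured with a toy sketch. Everything here is an assumption made for illustration: the summary does not say how a token is judged "hard", so this sketch uses a simple confidence threshold, and the `Model` interface, `relay_generate`, `toy_small`, and `toy_large` are hypothetical names, not the paper's method or API.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps the tokens so far to (next_token, confidence).
Model = Callable[[List[str]], Tuple[str, float]]


def relay_generate(
    small: Model,
    large: Model,
    prompt: List[str],
    max_new_tokens: int = 16,
    threshold: float = 0.5,  # assumed trigger; the summary does not specify one
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Generate with the small model, deferring to the large one on low-confidence tokens.

    Returns the generated tokens and how many of them the large model supplied.
    """
    tokens = list(prompt)
    generated: List[str] = []
    relayed = 0
    for _ in range(max_new_tokens):
        token, confidence = small(tokens)
        if confidence < threshold:
            # "Hard" token: ask the large model for this single position only.
            token, _ = large(tokens)
            relayed += 1
        if token == eos:
            break
        tokens.append(token)
        generated.append(token)
    return generated, relayed


def toy_small(tokens: List[str]) -> Tuple[str, float]:
    # Toy stand-in: confident everywhere except right after "=".
    if tokens[-1] == "=":
        return "?", 0.1
    script = {"2": "+", "+": "3", "3": "="}
    return script.get(tokens[-1], "<eos>"), 0.9


def toy_large(tokens: List[str]) -> Tuple[str, float]:
    # Toy stand-in: always supplies the answer token with high confidence.
    return "5", 0.99


output, relayed = relay_generate(toy_small, toy_large, ["2"])
print(output, relayed)
```

In this toy run the small model writes every token except the one after "=", so the large model is consulted exactly once.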