Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
CASA is a new way to mix images and text inside a language model that keeps compute and memory costs low while keeping accuracy high.
Large language models usually place words in fixed, ordered position slots, which can waste effort and make it harder to find the important parts of a long or noisy text.
D4RT is a new AI model that turns regular videos into moving 3D scenes (4D) quickly and accurately.