Track4World is a fast, feedforward model that can follow the 3D path of every pixel in a video using just one camera.
People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good, or even better, when you give them a few examples in the prompt (in-context learning).
Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.
FS-Researcher is a two-agent system that lets AI do very long research by saving everything in a computer folder, so its work is not limited by what fits in the model's memory at once.
VERGE is a teamwork system where an AI writer (an LLM) works with a strict math checker (an SMT solver) to make answers both smart and logically sound.
The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.
The paper asks what a truly good diffusion-based language model should look like and lists five must-have properties.