This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.
Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
This paper shows that many AI models that both read images and write images are not truly unified inside: they often understand well but fail to generate (or the other way around).
World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row.
Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.
FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.
The paper introduces RPG-Encoder, a way to turn a whole code repository into one clear map that mixes meaning (semantics) with structure (dependencies).
Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.
WildGraphBench is a new test that checks how well GraphRAG systems find and combine facts from messy, real-world web pages.
The paper asks AI to hunt for insights in big databases without being told exact questions, like a curious scientist instead of a test-taker.
Shampoo is a smart optimizer that can train models better than AdamW, but it used to be slow because it had to compute tricky inverse matrix roots.
Large Vision-Language Models (LVLMs) are great with one picture but get confused when you give them several, often mixing details from different images.