Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.
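To make the "team of helpers" idea concrete, here is a minimal Python sketch of the general pattern: a lead agent splits a job and runs helper agents at the same time. All names are hypothetical, and this is not Kimi K2.5's actual code.

```python
import asyncio

# Illustrative only: a generic "team of helpers" pattern with made-up names.
# Nothing here comes from Kimi K2.5's real implementation.

async def helper(name: str, subtask: str) -> str:
    """Stand-in for one agent handling one piece of the job."""
    await asyncio.sleep(0.1)  # pretend this is a model or tool call
    return f"{name} done: {subtask}"

async def lead(task_pieces: list[str]) -> list[str]:
    # The speedup comes from running the helpers concurrently.
    jobs = [helper(f"helper-{i}", piece) for i, piece in enumerate(task_pieces)]
    return await asyncio.gather(*jobs)

if __name__ == "__main__":
    pieces = ["search the web", "read the PDF", "draft the summary"]
    for line in asyncio.run(lead(pieces)):
        print(line)
```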
LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.
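A rough sketch of what "thinking in its head" can look like in code: the model refines a hidden vector for a few silent steps before each drawing step, and no text is ever produced. The module names and sizes below are made up for illustration, not taken from LatentMorph.

```python
import torch
import torch.nn as nn

# Illustrative only: the general "think silently in latent space" pattern,
# not LatentMorph's actual architecture.

class SilentThinker(nn.Module):
    def __init__(self, dim: int = 256, think_steps: int = 4):
        super().__init__()
        self.think_steps = think_steps
        self.think = nn.GRUCell(dim, dim)  # wordless internal reasoning
        self.draw = nn.Linear(dim, dim)    # stand-in for one drawing step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.zeros_like(x)
        for _ in range(self.think_steps):
            # Refine the hidden "thought"; nothing is decoded into words.
            h = self.think(x, h)
        return self.draw(h)

model = SilentThinker()
canvas = torch.randn(1, 256)   # current state of the image being made
print(model(canvas).shape)     # torch.Size([1, 256])
```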
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
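"See-everything" vs. "see-past-only" is the difference between bidirectional and causal attention. The toy sketch below just shows the two masks side by side; the specific conversion mistake the paper identifies is not reproduced here.

```python
import torch

# Toy illustration: bidirectional attention lets every video frame attend
# to every other frame; causal attention masks out the future.

T = 5  # number of video frames
scores = torch.randn(T, T)  # raw attention scores between frames

bidirectional = scores.softmax(dim=-1)  # every frame sees all frames

# Mask out strictly-future positions, then renormalize.
future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
causal = scores.masked_fill(future, float("-inf")).softmax(dim=-1)

print(bidirectional[0])  # frame 0 attends to future frames too
print(causal[0])         # frame 0 can only attend to itself
```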
This paper studies how AI agents get better while they are working, not just whether they finish the job.
The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to answer.
This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.
Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
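"Thinking in loops" is the adaptive-computation pattern: reuse one block repeatedly and let a small halting head decide when to stop. The sketch below shows that generic pattern with made-up names and sizes, not Loop-ViT's actual design.

```python
import torch
import torch.nn as nn

# Illustrative only: one shared block applied in a loop, with a halting
# head that lets easy inputs exit early.

class LoopedBlock(nn.Module):
    def __init__(self, dim: int = 128, max_loops: int = 8, threshold: float = 0.9):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.halt = nn.Linear(dim, 1)  # predicts "am I done yet?"
        self.max_loops = max_loops
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, int]:
        steps = 0
        for steps in range(1, self.max_loops + 1):
            x = x + self.block(x)  # one more "thought" over the same weights
            if torch.sigmoid(self.halt(x)).mean() > self.threshold:
                break              # easy input: stop early
        return x, steps

model = LoopedBlock()
tokens = torch.randn(1, 16, 128)  # 16 image patches
out, steps_used = model(tokens)
print(f"stopped after {steps_used} loop(s)")
```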
This paper shows that many AI models that both read images and write images are not truly unified inside: they often understand well but fail to generate, or the other way around.
World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row.
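To see where the cost comes from, here is a minimal random-shooting planner over a toy world model: every candidate plan needs a full imagined rollout, so the model is called horizon × candidates times for a single decision. The dynamics function is a made-up stand-in for a learned model.

```python
import numpy as np

def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    # Learned dynamics would go here; this toy just drifts toward the action.
    return 0.9 * state + 0.1 * action

def plan(state, goal, horizon=10, n_candidates=256, seed=0):
    # Random-shooting planner: imagine many futures, keep the best one.
    rng = np.random.default_rng(seed)
    best_cost, best_first_action = np.inf, None
    for _ in range(n_candidates):
        s = state.copy()
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        for a in actions:          # horizon model calls per candidate...
            s = world_model(s, a)  # ...so cost scales as horizon * candidates
        cost = np.linalg.norm(s - goal)
        if cost < best_cost:
            best_cost, best_first_action = cost, actions[0]
    return best_first_action

print(plan(np.zeros(3), goal=np.ones(3)))
```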
Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.
FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.
The paper introduces RPG-Encoder, a way to turn a whole code repository into one clear map that mixes meaning (semantics) with structure (dependencies).
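A crude sketch of the "map" idea, assuming Python source files: record each module's docstring as a stand-in for its meaning, and its imports as its dependencies. RPG-Encoder's actual representation is richer; the function below only illustrates mixing the two signals in one structure.

```python
import ast
import pathlib

def repo_map(root: str) -> dict[str, dict]:
    """Map each .py file to its docstring (meaning) and imports (structure)."""
    graph: dict[str, dict] = {}
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
        imports = sorted(
            {alias.name for node in ast.walk(tree)
             if isinstance(node, ast.Import) for alias in node.names}
            | {node.module for node in ast.walk(tree)
               if isinstance(node, ast.ImportFrom) and node.module}
        )
        graph[str(path)] = {
            "semantics": ast.get_docstring(tree) or "",  # the "meaning" signal
            "depends_on": imports,                       # the "structure" signal
        }
    return graph

if __name__ == "__main__":
    for module, info in repo_map(".").items():
        print(module, "->", info["depends_on"])
```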