This paper proposes ReSID, a new way to turn items into short token codes (Semantic IDs) that are much easier for a recommender to predict.
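ReSID's actual construction isn't shown here, but the general idea behind a Semantic ID, squeezing an item's embedding down to a few discrete tokens, is often done with residual quantization. A minimal sketch under that assumption (the codebooks, sizes, and names below are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_quantize(embedding, codebooks):
    """Greedy residual quantization: at each level, pick the nearest
    codebook vector, subtract it, and quantize what's left over."""
    code, residual = [], embedding.copy()
    for cb in codebooks:                    # cb has shape (num_codes, dim)
        idx = np.argmin(np.linalg.norm(cb - residual, axis=1))
        code.append(int(idx))
        residual = residual - cb[idx]
    return code                             # a short list of small integers

# Toy setup: 3 levels x 32 codes over 8-dim item embeddings.
codebooks = [rng.normal(size=(32, 8)) for _ in range(3)]
item_embedding = rng.normal(size=8)
semantic_id = residual_quantize(item_embedding, codebooks)
print(semantic_id)   # the item's short token code (Semantic ID)
```

The payoff is that a recommender only has to predict three tokens from small vocabularies instead of one item ID out of millions.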
LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
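The "see-everything" vs. "see-past-only" distinction is the standard bidirectional vs. causal attention mask. This sketch shows what flips between the two (it is a generic illustration, not the paper's specific fix):

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Scaled dot-product attention. With causal=True, future positions
    are masked out, so each frame can only look at itself and the past."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)   # hide the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
q = k = v = rng.normal(size=(4, 8))        # 4 "frames", 8-dim features
bi = attention(q, k, v)                    # see-everything
ca = attention(q, k, v, causal=True)       # see-past-only
```

Under the causal mask, the first frame can only attend to itself, so its output is exactly its own value vector; naively reusing weights trained without this mask is where such conversions can go subtly wrong.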
This paper studies how AI agents get better while they are working, not just whether they finish the job.
ECHO-2 is a new way to train AI with reinforcement learning that keeps a small, central trainer busy while farming out the plentiful, parallel work (generating rollouts) to many low-cost computers spread around the world.
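The trainer-plus-remote-workers pattern can be sketched in miniature with a thread pool standing in for the worldwide machines (this is a generic asynchronous-RL skeleton, not ECHO-2's protocol; the rollouts here just return fake rewards):

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def rollout(policy_version, seed):
    """The cheap, parallel work a remote machine would do: run episodes
    with the current policy and send back results (fake rewards here)."""
    rng = random.Random(seed)
    return policy_version, sum(rng.random() for _ in range(8))

def train(num_updates=5, num_workers=4):
    """Central trainer: keep every worker busy and fold finished
    rollouts into an update as soon as each one arrives."""
    version, avg_reward = 0, 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = {pool.submit(rollout, version, s) for s in range(num_workers)}
        while version < num_updates:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                _, reward = fut.result()
                # Asynchronous: accept slightly stale rollouts rather than
                # stalling the trainer until every worker catches up.
                avg_reward = 0.9 * avg_reward + 0.1 * reward
                version += 1
                pending.add(pool.submit(rollout, version, version))
    return version, avg_reward

final_version, final_reward = train()
```

The key design choice is the last comment: the trainer never blocks waiting for stragglers, which is what keeps it busy even when the workers are slow, far away, and unreliable.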
The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.
This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.
Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
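"Thinking in loops" usually means applying the same block repeatedly with a learned signal that says when to stop, as in adaptive computation time. A toy sketch of that control flow (the step and confidence functions below are stand-ins, not Loop-ViT's architecture):

```python
import numpy as np

def looped_forward(x, step_fn, confidence_fn, max_steps=12, threshold=0.9):
    """Apply the same block over and over, halting early once a
    confidence score says the current answer is good enough."""
    state = x
    for step in range(1, max_steps + 1):
        state = step_fn(state)
        if confidence_fn(state) >= threshold:
            break                      # easy input: stop early
    return state, step                 # hard inputs burn more steps

# Toy stand-ins: "solving" means shrinking the state's norm below ~0.11.
step_fn = lambda s: 0.5 * s                          # shared weights per loop
confidence_fn = lambda s: 1.0 / (1.0 + np.linalg.norm(s))

easy = np.ones(4) * 0.1
hard = np.ones(4) * 100.0
_, easy_steps = looped_forward(easy, step_fn, confidence_fn)
_, hard_steps = looped_forward(hard, step_fn, confidence_fn)
print(easy_steps, hard_steps)   # the easy input halts much sooner
```

Because every loop reuses the same weights, extra "thinking" on hard puzzles costs compute at inference time but no extra parameters.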
This paper shows that many AI models that both read images and write images are not truly unified inside: they often understand well but fail to generate (or the other way around).
This paper finds a precise way to describe and fix the Modality Gap, where image and text features that mean the same thing still sit in different regions of the AI's feature space.
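A common way to see the gap is that each modality's embeddings cluster around their own mean, offset from the other modality's. This synthetic sketch shows the phenomenon and the simplest possible fix, per-modality centering (the paper's actual characterization and fix are surely more sophisticated):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic CLIP-like features: matched image/text pairs share content,
# but the image side carries a constant modality offset, so features
# with the same meaning sit in different places.
content = rng.normal(size=(100, 16))
image_feats = content + np.array([2.0] + [0.0] * 15)   # modality offset
text_feats = content

gap_vector = image_feats.mean(axis=0) - text_feats.mean(axis=0)
print(np.linalg.norm(gap_vector))      # size of the modality gap

# Centering each modality removes the constant offset, so matched
# pairs line up again (in this toy, exactly).
image_centered = image_feats - image_feats.mean(axis=0)
text_centered = text_feats - text_feats.mean(axis=0)
```

In real models the offset is not a single clean constant, which is exactly why a precise description of the gap matters before trying to remove it.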
World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row.
Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.