CatRAG is a new way for AI to find the right facts by letting the knowledge graph change its paths based on each question.
VIBE is a new test that checks how well image-editing AI models follow visual instructions like arrows, boxes, and sketches, not just text.
The paper introduces a new way to sample text from masked diffusion language models that is smarter and less greedy.
The paper makes long video generation much faster and lighter on memory by cutting out redundant computation in attention.
The paper tests a simple but bold idea: show code to AI as pictures instead of plain text, then shrink those pictures to save tokens and time.
Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.
ObjEmbed teaches an AI to understand not just whole pictures, but each object inside them, and to link those objects to the right words.
TRIP-Bench is a new test that checks if AI travel agents can plan real trips over many chat turns while following strict rules and changing user requests.
CoDiQ is a recipe for deliberately generating hard-but-solvable math and coding questions, with control over how hard they get during generation.
A2Eval is a two-agent system that automatically builds and runs fair tests for robot-style vision-language models, cutting wasted work while keeping results trustworthy.
This paper argues that a true world model does not come from sprinkling facts into single tasks, but from building a unified system that can see, think, remember, act, and generate across many situations.
This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.