FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.
The paper introduces RPG-Encoder, a way to turn a whole code repository into one clear map that mixes meaning (semantics) with structure (dependencies).
Long tasks trip up most AI systems because they lose track of goals and make small mistakes that snowball over many steps.
The paper asks AI to hunt for insights in big databases without being told exact questions, like a curious scientist instead of a test-taker.
Shampoo is a smart optimizer that can train models better than AdamW, but it used to be slow because it had to compute tricky inverse matrix roots.
Large Vision-Language Models (LVLMs) are great with one picture but get confused when you give them several, often mixing details from different images.
CatRAG is a new way for AI to find the right facts by letting the knowledge graph change its paths based on each question.
VIBE is a new test that checks how well image-editing AI models follow visual instructions like arrows, boxes, and sketches, not just text.
The paper introduces a new way to sample text from masked diffusion language models that is smarter and less greedy.
The paper makes long video generation much faster and lighter on memory by cutting out repeated work in attention.
The paper tests a simple but bold idea: show code to AI as pictures instead of plain text, then shrink those pictures to save tokens and time.
Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.