This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.
Robots learn better when they think about how things move over time than when they redraw every pixel of a video.
BeyondSWE is a new benchmark that tests code agents on tougher, more real-life tasks than single-repo bug fixing.
NOVA is a new video editor that lets you change a few key frames (sparse control) while it carefully keeps the original motion and background details (dense synthesis).
NE-Dreamer is a model-based reinforcement learning agent that skips rebuilding pixels and instead learns by predicting the next step’s hidden features.
This paper introduces HACRL, a way for different kinds of AI agents to learn together during training but still work alone during use.
Track4World is a fast, feedforward AI that can follow the 3D path of every pixel in a video using just one camera.
MemSifter is a smart helper that picks the right memories for a big AI so the big AI doesn’t have to read everything.
ParEVO teaches AI to write fast, safe parallel code for messy, irregular data like big graphs and uneven trees.
PRISM is a new way to help AI think through hard problems by checking each step, not just the final answer.
HiFi-Inpaint is a new AI method that fills a missing area in a photo of a person by inserting a specific product, while keeping tiny details like logos, textures, and small text crisp.
This paper fixes a big flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.
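
The TTRL failure mode in the last summary can be shown with a toy example. This is a minimal sketch, assuming the reward comes from a simple majority vote over sampled answers. The function name `majority_vote_rewards` and the sample answers are hypothetical, and the code illustrates only the flaw, not the paper's fix.

```python
from collections import Counter

def majority_vote_rewards(answers):
    """Reward 1.0 for answers that match the most common answer, else 0.0.

    Hypothetical helper: no ground-truth label is available at test time,
    so the most frequent answer is treated as the pseudo-label.
    """
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return pseudo_label, [1.0 if a == pseudo_label else 0.0 for a in answers]

# Toy rollout: suppose the true answer is "42", but most samples agree on "41".
sampled = ["41", "41", "42", "41", "40"]
label, rewards = majority_vote_rewards(sampled)
print(label)    # 41
print(rewards)  # [1.0, 1.0, 0.0, 1.0, 0.0]
```

The one correct sample gets zero reward while the three matching wrong samples get full reward, so each update pushes the model further toward the shared mistake.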