When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.
RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.
PlenopticDreamer is a new way to remake a video from different camera paths while keeping everything consistent across views and over time.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
This paper teaches AI to look around a 3D place step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.
VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.
Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.
The paper shows that big language models often get stuck with weight magnitudes set by training hyperparameters instead of by the data, which quietly hurts performance.
SmartSearch teaches search agents to fix their own bad search queries while they are thinking, not just their final answers.
Mixture-of-Experts (MoE) models use many small specialist networks and only activate a few per token, but classic LoRA fine-tuning gives every expert the same rank, wasting parameters on the wrong experts.
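
To make the uniform-rank point concrete, here is a minimal sketch of the parameter arithmetic. It relies only on the standard LoRA count of rank × (d_in + d_out) trainable parameters per adapted matrix. The integer importance scores and the proportional `allocate_ranks` rule are hypothetical, chosen purely for illustration; they are not the allocation method of the summarized paper.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def total_params(d_in: int, d_out: int, ranks: list[int]) -> int:
    """Total LoRA parameters when expert i is given ranks[i]."""
    return sum(lora_params(d_in, d_out, r) for r in ranks)


def allocate_ranks(importance: list[int], rank_budget: int, min_rank: int = 1) -> list[int]:
    """Split a fixed total rank across experts in proportion to integer
    importance scores (hypothetical), using largest-remainder rounding.
    Every expert keeps at least `min_rank`."""
    n = len(importance)
    remaining = rank_budget - n * min_rank
    if remaining < 0:
        raise ValueError("rank_budget too small for min_rank per expert")
    total = sum(importance)
    # Integer arithmetic avoids floating-point rounding surprises.
    floors = [remaining * w // total for w in importance]
    remainders = [remaining * w % total for w in importance]
    ranks = [min_rank + f for f in floors]
    leftover = remaining - sum(floors)
    # Hand out the leftover ranks to the largest remainders (ties: lower index first).
    for i in sorted(range(n), key=lambda i: -remainders[i])[:leftover]:
        ranks[i] += 1
    return ranks


if __name__ == "__main__":
    d_in = d_out = 64
    uniform = [8, 8, 8, 8]                     # classic LoRA: same rank for every expert
    scores = [5, 3, 1, 1]                      # hypothetical per-expert importance
    uneven = allocate_ranks(scores, sum(uniform))
    print(uneven, total_params(d_in, d_out, uniform), total_params(d_in, d_out, uneven))
```

Both allocations spend the same parameter budget, because the total depends only on the sum of the ranks; the uneven split simply moves capacity toward the experts that the (assumed) scores mark as more important.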