PlenopticDreamer is a new way to re-render a video along different camera paths while keeping the result consistent across views and over time.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
This paper teaches AI to look around a 3D place step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.
VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.
Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.
The paper shows that big language models often get stuck with weight magnitudes set by training hyperparameters instead of by the data, which quietly hurts performance.
SmartSearch teaches search agents to fix not just their final answers but also their own bad search queries while they are still thinking.
Mixture-of-Experts (MoE) models use many small specialist networks and only activate a few per token, but classic LoRA fine-tuning gives every expert the same rank, wasting parameters on the wrong experts.
AgentOCR turns an agent’s long text history into pictures so it can remember more using fewer tokens.
AT2PO is a new way to train AI agents that work over several turns, like asking the web a question, reading the result, and trying again.