The paper teaches an AI to act like a careful traveler: it looks at a photo, forms guesses about where it might be, and uses real map tools to check each guess.
This paper builds MFMD-Scen, a big benchmark that tests how an AI's true/false judgment about the same money-related claim shifts when the situation around it changes.
This paper teaches a camera to fix nighttime colors by combining a smart rule-based color trick (SGP-LRD) with a learning-by-trying helper (reinforcement learning).
When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.
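The squashing effect can be shown with a toy computation (a hypothetical sketch, not the paper's code): GRPO normalizes each completion's total reward within a group, so completions with very different reward mixes but the same sum receive identical learning signals.

```python
import statistics

def grpo_advantages(total_rewards):
    """Group-relative advantages: (R_i - group mean) / group std."""
    mean = statistics.mean(total_rewards)
    std = statistics.pstdev(total_rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in total_rewards]

# Each completion scored on two criteria, e.g. (helpfulness, correctness).
reward_mixes = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5), (0.2, 0.2)]
totals = [sum(mix) for mix in reward_mixes]  # [1.0, 1.0, 1.0, 0.4]

advs = grpo_advantages(totals)
# The first three completions have very different reward mixes,
# yet their sums tie, so GRPO assigns them the exact same advantage.
print(advs[0] == advs[1] == advs[2])  # True
```

The sketch shows why summing rewards before group normalization loses information: the training signal can no longer tell a balanced completion from a lopsided one.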
RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.
PlenopticDreamer is a new way to remake a video from different camera paths while keeping everything consistent across views and over time.
The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?
This paper teaches AI to look around a 3D place step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
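The relay idea can be sketched as a confidence-gated decoding loop (all names here are hypothetical toy stand-ins; the paper's actual routing rule is more involved): the small model proposes each token and only defers to the large model when its own confidence is low.

```python
import random

random.seed(0)

def small_model(prefix):
    """Toy stand-in: propose the next token with a confidence score."""
    return f"s{len(prefix)}", random.random()

def big_model(prefix):
    """Toy stand-in for the expensive model, called only on hard tokens."""
    return f"B{len(prefix)}"

def relay_decode(n_tokens, threshold=0.3):
    tokens, big_calls = [], 0
    for _ in range(n_tokens):
        tok, conf = small_model(tokens)
        if conf < threshold:       # small model is unsure: relay to big model
            tok = big_model(tokens)
            big_calls += 1
        tokens.append(tok)
    return tokens, big_calls

tokens, big_calls = relay_decode(20)
print(f"{big_calls}/20 tokens needed the big model")
```

With a well-chosen threshold, most tokens never touch the large model, which is where the speedup comes from.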
DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.
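A minimal version of that two-tool loop might look like this (a hypothetical toy, not DocDancer's implementation): the agent first Searches for pages matching the query, then Reads the hits until it finds relevant text.

```python
# Toy document: page number -> text (a real system would parse a PDF).
PAGES = {
    1: "Introduction and motivation.",
    2: "The model uses 12 attention layers.",
    3: "Results: accuracy improves by 4 points.",
}

def search(query):
    """Tool 1: return page numbers whose text mentions any query word."""
    words = query.lower().split()
    return [p for p, text in PAGES.items()
            if any(w in text.lower() for w in words)]

def read(page):
    """Tool 2: return the full text of one page."""
    return PAGES[page]

def answer(query):
    """Search-then-read loop: read matching pages until one is relevant."""
    for page in search(query):
        text = read(page)
        if "accuracy" in text:  # toy relevance check; a real agent uses an LLM
            return text
    return "not found"

print(answer("accuracy results"))
```

The point of restricting the agent to just these two tools is that exploration stays cheap and auditable: every answer traces back to pages the agent actually read.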
VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.
Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.